AI systems have proven to be accurate—in terms of positive predictive value (PPV) and sensitivity—for tasks that are time-consuming or strenuous for health care professionals. The accuracy of these systems is important and a necessary condition for integrating AI into clinical practice. While it may seem natural to connect a technology’s accuracy with expectations about its efficiency, accuracy should not be mistaken for efficiency. Nevertheless, this conflation consistently occurs in academic literature, policy reports, and news items about AI, for example when studies suggest that AI will reduce healthcare costs, resolve staff shortages, optimize care in low-resource settings, and even prevent burnout among health care professionals1,2,3,4,5,6,7,8 (see Supplementary Table 1 for examples of these conflations). In some of these recent publications, AI’s accuracy is thus mistakenly taken as a sufficient condition for achieving efficiency gains, e.g.,1,3,7,8; in other academic papers, the accuracy of a technological system is even deemed equivalent to its efficiency4,5,6. We consider this elision of concepts to be flawed and misleading.
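For reference, these two accuracy metrics have standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN); the formulas below are the conventional textbook definitions and are not drawn from the cited studies:

\[
\mathrm{PPV} = \frac{TP}{TP + FP}, \qquad \mathrm{Sensitivity} = \frac{TP}{TP + FN}
\]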

First, the confusion of accuracy for efficiency in terms of workload reduction is flawed because it rests on too narrow an assessment of what constitutes workload in the first place. AI systems do not emerge out of thin air. A significant amount of human labor and time has been invested in the development and validation of these systems by data scientists, AI engineers, and clinicians. Leaving this labor out of discussions about medical AI paints too favorable a picture of the total amount of human work needed. Furthermore, ongoing input and labor from medical professionals will be necessary, even after systems have been validated and integrated into the daily workflow. For example, AI systems for radiology and pathology will require a constant stream of expert-annotated images to maintain system accuracy9. If these annotations must be completed separately or differently from standard annotation processes, for example in a separate digital system, this additional labor will have to be factored into clinicians’ already heavy workload of clinical assessment, multidisciplinary deliberation, and patient communication. Importantly, health care professionals will likely need to maintain their ability to assess these images without the support of AI. This means that training and new responsibilities will come on top of their work schedules, thereby increasing their workload10. In addition, we must not forget that technology is imperfect11. AI systems will make mistakes, malfunction, or even break down. Mistakes can include biased outcomes, “hallucinations”, and AI drift, which may seriously harm patients and therefore demand countermeasures and increased awareness of these unwanted effects. This underscores the fact that complex technological systems such as medical AI can only function well when supported by an extensive and reliable technical infrastructure and the expertise of people like IT experts and data scientists. These experts will also have to invest substantial labor to keep the systems up to date, to ensure that they remain accurate, and to monitor their proper functioning in the workplace12. Moreover, innovations that may seem like an efficiency win in the short term may become sources of inefficiency in the long term because of systemic changes, as we have learned from other technologies13. Email, for example, enabled the rapid exchange of written text. But email has not merely replaced letters; it has also fundamentally changed what and how we communicate, and thereby led to more frequent communication in the long run. People now spend more time writing and reading emails than they ever did on letters14. This should teach us that even if AI proves to be an accurate tool leading to efficiency gains in a narrow sense, other systemic shifts might nullify this gain. An increase in the availability of accurate AI systems may, for example, prompt institutional or policy recommendations to apply them more frequently or for additional purposes, which might eventually increase rather than decrease the workload of clinicians, regardless of the presence of an AI support system.

Second, even if AI systems are accurate and experimental results support the claim of efficiency gains, we should not underestimate the influence of the human side of technology implementation on such systemic effects. The health care professionals who operate these systems influence whether the possible benefits of the technology will be realized. Their knowledge and (technical) competencies can foster or undermine efficiency; even the most accurate AI system will be inefficient in the hands of a practitioner who is unable to use it correctly. The potential benefits of technological systems can therefore only be realized when they are used adequately in clinical practice and implemented under specific conditions. These conditions include the skills to handle such technologies and the willingness to bear new responsibilities12. Another major variable in this equation is the trust a health care practitioner places in these systems. At least some minimal level of trust is needed to be willing to use an AI system in the first place. Trust is also an important factor when these systems are adopted in the clinical workflow, as it is generally argued that health care experts should stay in the loop, e.g.,15. More importantly, in their interactions with these new technologies, medical experts will have to critically consider when the advice of such systems should be followed in clinical decision-making and when it should be disregarded; in other words, when should health care experts trust and when should they distrust such systems? Given the computational power of medical AI, it can be reasonable for medical experts to follow the algorithm’s advice16. Yet the academic literature indicates that putting too much trust in algorithms can be risky; clinicians may, for example, uncritically adopt an algorithm’s biased or wrong advice17. Too much trust in these systems may yield efficiency gains in the short term, but eventually cause more mistakes and, thus, patient harm and a loss of efficiency in the long run. At the other extreme, when health care professionals do not trust these systems at all and excessively question the accuracy of such algorithms (medical AI systems are typically prone to type 1 errors, or false positives18), this may decrease efficiency in clinical practice due to unnecessary additional tests.

We conclude that it is important to remain conscious and critical about how we talk about the expected benefits of AI in terms of accuracy and efficiency. First, we should refrain from drawing conclusions about systemic effects based on single studies. Hopes that technology will lead to increased efficiency are not unprecedented. However, historical research indicates that such hopes are rarely unequivocally fulfilled10. The systemic effects of these technologies can often only be assessed years after their introduction, with the help of historians, philosophers of technology, sociologists, and empirical insights into the day-to-day experiences of users themselves19. In other words, we cannot be sure of the systemic effects before the technology is introduced to the clinic. Second, to do justice to the broader context and human labor involved in developing and deploying medical AI systems, it will be crucial to distinguish the benefits of AI more clearly in terms of effectiveness (getting more done) and efficiency (doing it with fewer resources)10. Explicitly distinguishing between these two dimensions in future research will help us ascertain whether additional support and work is necessary or whether fewer resources are needed for the same or better results. Third, more research needs to be conducted on the relation between trust and efficiency: How does trust in these systems emerge, and what are its consequences? Is the expectation of efficiency a cause of (unwarranted) trust in AI systems, inducing the aforementioned problems of overreliance? Normative investigations that provide guidance on reasonable grounds for trust (such as accuracy, efficiency, and clinical value16,20) are important in and of themselves, but they will not necessarily result in widespread trust in these systems. For now, it remains to be seen whether accurate AI systems will lead to efficiency gains and workload reduction. In the meantime, we must proceed carefully and continue to critically assess whether emerging AI systems truly meet the needs of clinical practice.