Reaction author(s)

Alfonso Valencia

ICREA professor and director of Life Sciences at the Barcelona National Supercomputing Centre (BSC).

These two independent studies present AI systems designed for clinical patient management. Both represent significant technical advances, although they should be interpreted within their proper context: they are research developments rather than systems currently deployed in real hospitals.

MIRA is an autonomous agent that operates within a simulated electronic health record environment, capable of conducting patient interviews, ordering diagnostic tests, and proposing treatments. When evaluated on hundreds of real emergency department cases, it matched or exceeded physician performance across many of the conditions assessed, although not all of them. The second system, AMIE, is a conversational AI optimised for clinical reasoning across multiple patient visits. It also performed at a level comparable to a panel of primary care physicians, while demonstrating closer adherence to clinical guidelines and recommendations. Such orthodoxy may or may not prove advantageous in real-world settings, where flexibility and adaptation to individual cases are often just as important.

These developments can be regarded as important technical advances with the potential to improve clinical workflows and hospital processes, but they are not yet systems operating in real-world healthcare environments.

From a technical perspective, given the complexity of these systems, it is essential that they are independently evaluated and used by other researchers before confidence can be placed in the validity of the reported results. This is particularly important in relation to issues such as potential contamination between training and evaluation data, a common and serious concern in systems trained on datasets so vast that assessing their provenance and quality becomes extremely challenging. In this regard, openness is crucial. While MIRA is openly available, AMIE is not, making independent evaluation impossible and ultimately limiting the degree of trust that can be placed in its reported performance.

In any case, it is important to emphasise that we are still in the realm of research and development rather than implementation within highly complex and regulated environments such as hospitals. The current limitations remain substantial. These systems are not yet ready to interact with the full complexity of real patients, clinicians, and healthcare systems, including the many forms of interaction that extend beyond text and are critical to everyday clinical practice.

In summary, these are important scientific publications that clearly demonstrate the rapid pace at which AI applications for medical decision-making are advancing, driven in large part, though fortunately not exclusively, by major technology companies. Before such systems can be implemented in real healthcare settings, prospective studies involving real patients and robust ethical oversight will still be required, following the standard, legally mandated process applicable to any medical technology.
