Lucía Ortiz de Zárate
Pre-doctoral researcher in Ethics and Governance of Artificial Intelligence in the Department of Political Science and International Relations at the Autonomous University of Madrid
The study examines experimentally whether ChatGPT (OpenAI) can pass the United States Medical Licensing Exam (USMLE). Passing this exam is a prerequisite for obtaining a licence to practise medicine in the United States, and it tests the ability of medical specialists to apply the knowledge, concepts and principles that are essential for providing patients with the care they need.
The novelty of the paper lies not only in the fact that it is the first experiment conducted for this purpose, but also in its results. According to the researchers, ChatGPT is very close to passing the USMLE, which would require a success rate of at least 60%. The test used in the study contains three types of questions (open response, multiple-choice without justification and multiple-choice with justification). ChatGPT achieved an average of between 52.4% and 75% correct answers, well above the 36.7% achieved only a few months ago with previous models. This rapid improvement in just a few months makes the researchers optimistic about the possibilities of this AI.
While the results may be of great interest, the study has important limitations that call for caution. ChatGPT was tested on 375 questions from the June 2022 edition of the USMLE, published on the official website responsible for the exam. We will therefore have to wait and see what results are obtained when ChatGPT is applied to a larger number of questions and, in turn, is trained with a larger volume of data and more specialised content. In addition, ChatGPT's answers were evaluated by two doctors. Further studies with a larger number of qualified evaluators are needed before the results of this AI can be endorsed.
This type of study demonstrates, on the one hand, the potential of AI for medical applications and, on the other hand, the need to rethink how knowledge is evaluated. In terms of medical practice, AI technologies can be a very significant help to doctors when making diagnoses, prescribing treatments and medicines, and so on. These changes push us to rethink the relationship between AI, doctors and patients. As for evaluation systems, and not only in medicine, the progressive improvement of AI systems such as ChatGPT shows that we need to rethink both our methods of evaluation and the knowledge, skills and content that future professionals need.