Reactions: ChatGPT algorithms could help identify Alzheimer's cases

Artificial intelligence algorithms based on OpenAI's GPT-3 language model, a predecessor of the model behind ChatGPT, can identify speech features that predict the early stages of Alzheimer's disease with 80 per cent accuracy, according to a study published in the journal PLOS Digital Health. The neurodegenerative disease impairs the ability to express oneself, and it is these changes in language that the algorithms can recognise.

28/12/2022 - 08:05 CET


Expert reactions


Alfonso Valencia

ICREA research professor and director of the Life Sciences Department at the Barcelona Supercomputing Center (BSC).

Science Media Centre Spain

The aim of the study is to investigate how useful the natural language processing system GPT-3 is for classifying Alzheimer's cases based on the characteristics of conversations (pauses, intervals).

In particular, the study compares the results obtained with information distilled from GPT-3 (so-called embeddings) against other systems, including different embeddings and task-specific training processes. In all these tests, the embedding-based system is more effective at distinguishing cases from controls, and also in specific tests that quantify the severity of cases (MMSE score). These results may well improve further with more advanced NLP (natural language processing) systems trained on more data; there is currently talk of a possible GPT-4 in 2023.
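To make the idea of an embedding-based classifier concrete, here is a minimal sketch in Python. It assumes each conversation transcript has already been converted into a fixed-length vector (in the study, by GPT-3; here, tiny hand-made three-dimensional vectors that are purely illustrative) and fits a logistic-regression classifier by plain gradient descent. The function names, the data and the choice of logistic regression are assumptions for illustration, not the authors' code or settings.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    """Linear score w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # derivative of log-loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 (classified as a case) or 0 (classified as a control)."""
    return 1 if sigmoid(score(w, b, x)) >= threshold else 0

# Hypothetical toy "embeddings" (real GPT-3 embeddings are much longer).
# Label 1 = Alzheimer's case, 0 = healthy control.
X_train = [
    [0.90, 0.10, 0.30], [0.80, 0.20, 0.40], [0.85, 0.15, 0.35],
    [0.10, 0.90, 0.60], [0.20, 0.80, 0.50], [0.15, 0.85, 0.55],
]
y_train = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X_train, y_train)
print(predict(w, b, [0.875, 0.125, 0.325]))  # expected: 1 (midway between two case vectors)
print(predict(w, b, [0.125, 0.875, 0.575]))  # expected: 0 (midway between two control vectors)
```

Real embeddings have far more dimensions, and in practice a library implementation with regularisation would be used; the point is only that, once text becomes a vector, separating cases from controls is an ordinary supervised-learning problem.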

The foundation of such applications is the ability to find patterns in the correlations between elements, in this case the components of conversations. This is the speciality of machine learning systems and, in particular, what makes NLP systems such as the GPT-3 model used in this study powerful.

One consideration is that the data come from a dataset commonly used in this field (the ADReSSo Challenge), which is very small (237 conversations) and very homogeneous, with no mixing of patients with different diseases. The authors acknowledge the need for validation on datasets external to the one used in the study. This is a basic step in validating any system, and it appears to have been omitted from this publication.
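To illustrate why a small dataset limits what an accuracy figure can show, the sketch below computes an approximate 95% Wilson score interval around an 80% accuracy for several test-set sizes. The sizes are hypothetical: this article does not say how many of the 237 conversations were held out for testing.

```python
import math

def wilson_interval(accuracy: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion measured on n cases."""
    denom = 1 + z**2 / n
    center = (accuracy + z**2 / (2 * n)) / denom
    half = z * math.sqrt(accuracy * (1 - accuracy) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical test-set sizes, chosen only to show how the interval narrows.
for n in (50, 237, 2000):
    low, high = wilson_interval(0.80, n)
    print(f"n={n:5d}: 80% accuracy -> roughly {low:.0%} to {high:.0%}")
```

The smaller the test set, the wider the range of true accuracies consistent with the same headline figure, which is one reason validation on larger, independent data matters.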

The final part of the press release and of the article discusses possible practical applications of the system, with the unfortunate mention of a possible public server. This system is far from ready for that kind of use, and setting up a public server based on these results would be a very bad idea, with highly problematic ethical implications. The potential medical application of such systems, like that of any other AI/ML (artificial intelligence/machine learning) system, is a much more complex matter that requires robust, systematically validated results, as well as resolving a number of ethical questions about confidentiality, reliability and utility.

On the positive side, it is interesting that these technologies are being applied to medical problems, where they can contribute to research on diseases such as Alzheimer's and where the ability of AI/ML to detect complex patterns in data can be very useful.

Alfonso Valencia is a member of the advisory board of SMC Spain. 



Pablo Haya Coll

Researcher at the Computational Linguistics Laboratory of the Autonomous University of Madrid (UAM) and director of Business & Language Analytics (BLA) at the Institute of Knowledge Engineering (IIC)


Speech impairment is an important biomarker of neurodegenerative disorders such as Alzheimer's disease. The article belongs to a line of research that proposes using natural language processing (NLP) techniques for the early detection of Alzheimer's disease through speech. The authors use a classifier based on language models, specifically GPT-3, which determines whether a person is developing Alzheimer's disease, and to what degree, from the text transcribed from a speech sample. The classifier has been validated using real speech from healthy people and people with Alzheimer's disease. The results add new evidence of the advantage of incorporating language models into reasonably complex problems where NLP has a role.
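The "to what degree" part is a regression problem: predicting a severity score such as the MMSE (Mini-Mental State Examination, scored from 0 to 30) from the same kind of text embedding. A minimal sketch, assuming a plain least-squares linear model fitted on tiny made-up two-dimensional vectors; the function names, the data and the choice of an unregularised linear model are illustrative assumptions, not the study's method.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (A^T A) w = A^T y.

    A is X with a leading column of ones for the intercept. The system is
    solved with Gauss-Jordan elimination and partial pivoting.
    """
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def predict_mmse(coef, x):
    """Linear prediction, clipped to the 0-30 range of the MMSE scale."""
    raw = coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))
    return max(0.0, min(30.0, raw))

# Hypothetical 2-dimensional "embeddings" with made-up MMSE scores,
# generated from MMSE = 29 - 15*x1 - 5*x2 so the fit can be checked.
X_train = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4], [0.7, 0.3], [0.9, 0.8], [0.2, 0.6]]
mmse_train = [26.5, 24.0, 19.5, 17.0, 11.5, 23.0]

coef = fit_linear(X_train, mmse_train)
print([round(c, 6) for c in coef])               # expected: approximately [29.0, -15.0, -5.0]
print(round(predict_mmse(coef, [0.4, 0.5]), 6))  # expected: 20.5
```

With real embeddings of hundreds or thousands of dimensions and only a few hundred speakers, a regularised model would be needed to avoid overfitting; the unregularised version is kept here only for readability.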

The real impact of this technology as a diagnostic test is more debatable. Firstly, it would have been useful if the article had included a comparison with the methods currently used for the early detection of Alzheimer's disease. It only includes a comparison with other NLP-based methods.

Secondly, a cost-benefit analysis should take into account the false positive rate, which has not been reported. Open public use, as the authors propose via a website or a mobile app, would mean that far more healthy people than people with Alzheimer's disease take the test. Depending on the false positive rate, many healthy people could be wrongly flagged as developing the disease. This would most likely lead to a disproportionate increase in follow-up tests to check whether the results are correct.
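The base-rate argument can be made concrete with a few lines of arithmetic. The sketch assumes hypothetical figures, since the study did not report a false positive rate: 10,000 people using the app, 2% of whom are developing the disease, and a test with 80% sensitivity and 80% specificity.

```python
def screening_outcomes(n_users, prevalence, sensitivity, specificity):
    """Expected results when a screening test is applied to a population."""
    affected = n_users * prevalence
    healthy = n_users - affected
    true_positives = affected * sensitivity
    false_positives = healthy * (1 - specificity)
    # Positive predictive value: share of positive results that are real cases.
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv

# Hypothetical inputs, not figures from the study.
tp, fp, ppv = screening_outcomes(10_000, 0.02, 0.80, 0.80)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, PPV: {ppv:.1%}")
# expected: true positives: 160, false positives: 1960, PPV: 7.5%
```

Under these assumed figures, fewer than one positive result in ten would correspond to a real case; the actual balance depends on the test's real false positive rate and on who chooses to use it.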

Finally, before this technology could be used as a diagnostic test, it would have to comply with the validation protocols established by the various health agencies. The study presented in the article corresponds to a very early phase, given the limited size and representativeness of its sample.

The author has not responded to our request to declare conflicts of interest


Lucía Ortiz de Zárate

Pre-doctoral researcher in Ethics and Governance of Artificial Intelligence in the Department of Political Science and International Relations at the Autonomous University of Madrid


Medicine is one of the most promising application areas for artificial intelligence. The use of these intelligent systems could lead to very significant improvements in diagnostics, disease detection, etc. Along these lines, the article by Agbavor and Liang of Drexel University looks at how OpenAI's GPT-3 language model, a predecessor of the model behind the ChatGPT chatbot, can be used in the early diagnosis of Alzheimer's and dementia.

Language impairment (slower responses to certain questions, changes in sentence structure, etc.) is an important marker for diagnosing neurodegenerative diseases. Using 237 voice recordings from the ADReSSo Challenge database, the researchers used GPT-3 to build a classifier and showed that it can detect the onset of Alzheimer's with 80% accuracy. These results match and, in some cases, exceed those of other conventional Alzheimer's detection models and tests.

Although these results are promising, the study has some important limitations that point to the need for further, larger and more detailed studies. The sample used in this case is small, so verifying the real usefulness of this and other chatbots will require much larger samples that allow the results to be generalised. In addition, possible biases and other ethical issues in the samples used need to be addressed to ensure that AI works equally well in diagnosing people of any gender, ethnicity, nationality, age, etc. In this sense, studies of this kind highlight the need to incorporate an ethical perspective into any AI study applied to society.
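One practical way to check whether a classifier works equally well across groups is to report its metrics separately for each group. A minimal sketch, assuming hypothetical evaluation records of (group label, true diagnosis, predicted diagnosis); the group names and records are invented for illustration.

```python
from collections import defaultdict

def per_group_metrics(records):
    """Accuracy and false positive rate for each group.

    records: iterable of (group, true_label, predicted_label), with 1 = case, 0 = control.
    """
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "controls": 0, "false_pos": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(y_true == y_pred)
        if y_true == 0:
            c["controls"] += 1
            c["false_pos"] += int(y_pred == 1)
    return {
        group: {
            "accuracy": c["correct"] / c["n"],
            # None when a group has no healthy controls to measure against.
            "false_positive_rate": c["false_pos"] / c["controls"] if c["controls"] else None,
        }
        for group, c in counts.items()
    }

# Hypothetical evaluation records for two invented groups.
records = [
    ("group_A", 1, 1), ("group_A", 0, 0), ("group_A", 0, 0), ("group_A", 1, 1),
    ("group_B", 1, 1), ("group_B", 0, 1), ("group_B", 0, 0), ("group_B", 1, 0),
]
for group, m in per_group_metrics(records).items():
    print(group, m)
# expected:
# group_A {'accuracy': 1.0, 'false_positive_rate': 0.0}
# group_B {'accuracy': 0.5, 'false_positive_rate': 0.5}
```

In this invented example the overall accuracy is 75%, a figure that hides the fact that one group is classified perfectly and the other no better than chance.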

The author has not responded to our request to declare conflicts of interest
Predicting dementia from spontaneous speech using large language models
PLOS Digital Health
Felix Agbavor et al.

Study types:
  • Research article
  • Peer reviewed
  • People
  • Modelling