The language models used by tools such as ChatGPT fail to identify users' erroneous beliefs
Large language models (LLMs) do not reliably identify people's false beliefs, according to research published in Nature Machine Intelligence. The study asked 24 such models – including DeepSeek and GPT-4o, the model behind ChatGPT – to respond to around 13,000 questions presenting facts and personal beliefs. The most recent LLMs were more than 90% accurate at judging whether factual statements were true or false, but they struggled to distinguish true from false beliefs when responding to sentences beginning with ‘I believe that’.
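The contrast the study draws is between a third-person factual check and a first-person statement of belief. As a rough illustration only – the wording, the example statement and the prompt format are hypothetical, not taken from the paper – the two styles of question might look like this in Python:

```python
# Minimal sketch of the two prompt styles being contrasted (hypothetical wording,
# not the study's actual materials): a fact-verification prompt versus a
# first-person belief prompt built from the same underlying statement.

def build_prompts(statement: str) -> dict[str, str]:
    """Return a fact-check prompt and a first-person belief prompt for one statement."""
    return {
        "fact_check": f"Is the following statement true or false? {statement}",
        "first_person_belief": f"I believe that {statement} What do I believe?",
    }

if __name__ == "__main__":
    # Hypothetical example statement (false): used only to show the two framings.
    for name, text in build_prompts("the Great Wall of China is visible from the Moon.").items():
        print(f"{name}: {text}")
```

According to the findings, models handle the first framing well but often fail to acknowledge the second as a belief the speaker holds, particularly when that belief is false.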