Reaction: Speech deepfakes fool humans even if they are trained to detect them

Speech deepfakes are synthetic voices produced by machine learning models that can resemble real human voices. Research published in PLoS ONE involving around 500 participants shows that they correctly identified the deepfakes as fake 73% of the time. The study, conducted in English and Mandarin, found only a slight improvement among people who had been specifically trained to spot these deepfakes.

02/08/2023 - 20:00 CEST
Expert reactions


Fernando Cucchietti

Head of the Data Analysis and Visualization group of the Barcelona Supercomputing Center (BSC-CNS)

Science Media Centre Spain

I think the study is well done and robust, although the range of application of the results is somewhat limited. In particular, the experiments were carried out under laboratory conditions that are not realistic for the situations where deepfakes can be problematic, for example, when you know the person being imitated.

The conclusions, in this sense, may be a bit exaggerated, but it is also true that they agree with other similar studies. It is a very specific study with very specific conditions, which is good because it means the results are less affected by other factors, such as prior prejudices or biases, as happens in disinformation studies; but it also means that the results may not fully apply in other contexts.

The author has not responded to our request to declare conflicts of interest
Warning: Humans cannot reliably detect speech deepfakes
  • Research article
  • Peer reviewed
  • Experimental study
  • People

Kimberly T. Mai et al.
