A study of more than 630 billion words (mostly in English) used on 3 billion web pages concludes that the term 'people' is not gender-neutral: its meaning is biased towards the concept 'men'. The authors write in Science Advances that they see this as "a fundamental bias in the collective view of our species", relevant because the concept 'people' is present "in almost all societal decisions and policies".
This is an interesting article that investigates the concept of "people"/"person" from a gender perspective by analysing billions of words from a corpus of internet data. The article analyses the similarity of the terms used in the corpus when writing about people, men and women. It finds that the terms used to refer to "people"/"person" are more similar to the terms used to refer to men than to those used to refer to women. From this analysis, the authors conclude that there is a gender bias in the concept "people"/"person", prioritising men over women.
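One common way to measure this kind of similarity between terms is to represent each word as a numeric vector and compare the vectors with cosine similarity. The sketch below illustrates the idea only: the three-dimensional vectors are invented for the example and are not data from the study, and the function names `cosine_similarity` and `similarity_gap` are hypothetical. It is not a reproduction of the authors' method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy vectors, for illustration only (NOT taken from the study).
toy_vectors = {
    "person": [0.9, 0.4, 0.1],
    "man": [0.8, 0.5, 0.2],
    "woman": [0.3, 0.9, 0.2],
}


def similarity_gap(vectors, target="person", group_a="man", group_b="woman"):
    """Similarity of target to group_a minus its similarity to group_b.

    A positive value means the target word sits closer to group_a.
    """
    sim_a = cosine_similarity(vectors[target], vectors[group_a])
    sim_b = cosine_similarity(vectors[target], vectors[group_b])
    return sim_a - sim_b


gap = similarity_gap(toy_vectors)
print(f"similarity gap (man minus woman): {gap:.3f}")
```

With real word vectors learned from a large corpus, a consistently positive gap across many related word pairs would be the kind of pattern the article describes; a single pair of toy vectors proves nothing on its own.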
Collective concepts such as 'people' do not simply reflect a possible underlying bias; they also reinforce society's perception of men and women. Therefore, the gender bias found in such an important collective term could have profound implications for our society.
For centuries women have been excluded from the public sphere. It was not until after the French Revolution that this situation began to be reversed, although the echo of its consequences resonates to this day. The article adds further evidence of how this asymmetry has influenced language in favour of a stronger association of the concept 'person' with the concept 'man' than with the concept 'woman'.
The fact that gender bias can be detected computationally is a warning for the development of Artificial Intelligence systems. These systems learn from data, linking differences in language use to model predictions. Depending on the application, such systems may contravene ethical and legal principles, which calls for mitigating measures. However, it should be noted that there are applications where biases can be beneficial, such as detecting diseases whose prevalence differs by gender.
This work demonstrates the potential of computational and statistical techniques to analyse huge volumes of text. However, the paper's conclusions need to be explored further. A large volume of data does not guarantee that the data are representative. It would be interesting to extend the time horizon to understand whether the biases detected are increasing or decreasing and, finally, to move from an analysis based on correlations to models that capture causal relationships.
In particular, the influence of language on thought, and of thought on language, is a controversial relationship that is still under debate.