Author reactions

The announcement of investment in the development of a large, open-source, transparent language model in Spanish and the other co-official languages is welcome news, as existing models, even multilingual ones, have been trained mostly on English data. Recent research indicates that these models rely on internal representations based on English; as a result, the text they generate in other languages, especially low-resource ones, may show linguistic biases and use expressions that are not natural to those languages.

Furthermore, being open source, this language model will be available to any person or institution, facilitating access to natural language processing tools for a wide range of applications and users. Open source also makes it possible to involve wider communities of developers, researchers and language experts in the continuous improvement of the model. Both ELLIS Europe and ELLIS Alicante advocate the development of open science, including the development of open source artificial intelligence systems.   

Transparency is another key feature: it contributes to trust in how these models operate and in their results, and it fosters the exchange of ideas needed to drive innovation. Trust in these systems is a key requirement for their use in society, especially in critical applications where the correct interpretation of language is essential.

Clearly, the inclusion of co-official languages alongside Spanish is an important and necessary step towards the preservation and promotion of linguistic diversity, such a valuable asset for our society.   

What does it add to the existing models?  

ELLIS Europe and ELLIS Alicante believe that if we want artificial intelligence to be socially sustainable, we need to expand access to high-performance computing, especially using renewable energy, encourage open source practices, invest in attracting and retaining the best minds, and demand transparency in the research, deployment and use of AI. This approach not only democratises AI development, but also contributes to a more secure and competitive AI ecosystem. In this context, it is important to develop our own open and transparent language models, trained in our own languages, to minimise bias, and on quality data that does not infringe intellectual property rights. Given the cross-cutting nature of large language models, which can be used in virtually any sector, developing these models ourselves is of strategic value. Furthermore, we cannot forget that Spanish is the mother tongue of more than 480 million people worldwide and the official language of 20 sovereign states. The opportunities for impact are therefore immense.

What will be its main obstacles?

Developing a large language model with internationally competitive performance is a complex task that poses several different kinds of challenges.

Firstly, there are challenges of resources, funding and environmental impact. Creating a large, high-quality language model requires significant resources, both financial and computational. An adequate budget is needed for research, hardware acquisition, hiring of specialised staff and other related expenses. I understand that the Prime Minister's announcement would address this obstacle. Large computational requirements also have a direct impact on the environment, as training and using these models demands large amounts of energy which, if renewable sources are not used, adds to the carbon footprint.

The second challenge is obtaining large amounts of data for training. Collecting, cleaning and labelling this data can be a challenge in itself, especially when dealing with co-official languages with fewer resources. In addition, it is necessary to verify that the data used is not proprietary or protected by intellectual property rights.   

The third major challenge concerns the need for large computing capacities. In this respect, Spain has a supercomputer, MareNostrum 5, located at the Barcelona Supercomputing Center, which would solve this difficulty.   

Fourthly, there is the challenge of talent. The development of cutting-edge language models requires the participation of experts in artificial intelligence, computational linguistics, machine learning and other related fields. Attracting and retaining skilled talent in these fields is a challenge as talent is in short supply and in high demand globally. ELLIS Europe and ELLIS Alicante aim to attract, retain and help inspire the next generation of excellent artificial intelligence research talent in Europe by offering a globally competitive working environment.   

Fifthly, we cannot forget that software is a living thing, in continuous evaluation and improvement. It is not only necessary to subject models to rigorous testing and evaluation to ensure their quality and performance, but also to plan a process of continuous improvement to keep the model up-to-date and relevant in a constantly evolving environment. Keeping up with the latest developments and competing in a rapidly changing technological world can be a constant challenge.  

Finally, we cannot forget the ethical dimension. It is crucial to address ethical issues and mitigate bias, stereotyping and other undesirable behaviour in the development of language models, and to ensure that privacy and security are preserved. At ELLIS Alicante we have a line of research in this regard.
