Reaction author(s)

Rodolfo Zevallos

Researcher in the Language Technologies group at the BSC (Barcelona Supercomputing Center)

The article ‘Joint Speech and Text Machine Translation for up to 100 Languages’ presents SeamlessM4T, a multilingual translation model that marks a major advance in the field by unifying multiple tasks in a single, robust and efficient system. It supports automatic speech recognition (ASR), text-to-text translation (T2TT), text-to-speech translation (T2ST), speech-to-text translation (S2TT) and speech-to-speech translation (S2ST) across a large number of languages. It also stands out for its modular design, which allows each component to be used independently. This flexibility is particularly valuable: it facilitates customisation, optimises resource use and broadens the model's applicability in a variety of practical contexts.

The model's performance compares very favourably with the state of the art. Its robustness to background noise and speaker variability is another strength, maintaining a high level of accuracy even under adverse conditions. Its contribution to more responsible artificial intelligence is also remarkable: it achieves significant reductions in toxicity and includes a systematic assessment of gender bias, both essential for ensuring that it is used fairly.

Finally, given the level of innovation and technical complexity of the model presented in the paper, a more extensive version of the article would be beneficial, as it would allow a more detailed exploration of the methodological and technical aspects that underpin it. It would also be interesting to examine the tokenisation (word segmentation) process further, particularly for morphologically complex languages, where an adequate representation is crucial to improving translation quality.
