Reaction authors

Lluís Montoliu

Research professor at the National Biotechnology Centre (CNB-CSIC) and at the CIBERER-ISCIII


A mouse, a turbot, and a human have roughly the same number of genes: around 20,000 each. Yet everyone understands that we are very different from a mouse or a turbot. In a way, then, the number of genes we have tells us nothing about our complexity or about what we will be like. The solution to this paradox lies in the non-coding genome, the part that does not contain protein-coding information.

Remember that in all these animals, and particularly in us, those 20,000 genes make up barely 2% of the entire genome. This puzzled geneticists for many years. What is in the remaining 98% (that is, in most of the genome)? The answer is that this dark, little-known part of the genome harbors many repetitive sequences, many mobile elements (transposons and retrotransposons) and, most importantly, the regulatory DNA sequences (to which specific proteins bind) that tell a gene when and where to switch on, or when and where to switch off. In other words, the morphological complexity and diversity we see in a mouse, a turbot, and a human is achieved not by increasing or decreasing the number of genes, but precisely by altering these regulatory sequences. These sequences cause the same genes to be activated or deactivated at different times during development, or in different cells, depending on the species. Essentially, if we change the on/off program of the same genes, we generate very different animals. That is ultimately what produces animals as disparate as these three, even though they have a similar number of genes.

For years, algorithms and computer programs have been developed to analyze this non-coding part of the genome, once dismissed as meaningless, searching for specific sequences that other research has shown to be regulatory elements. These elements come in many forms: some are associated with gene activation, others with gene silencing, and some even act as insulators, separating what happens in one gene from what happens in a neighboring one. These predictive programs work by systematically comparing a given DNA sequence, usually not very long (typically a few thousand nucleotides), against sequences known to act as activators, silencers, or insulators (among many other types of regulatory element). Each program usually specializes in detecting one of these types of regulatory element.

All of this has been turned upside down by the arrival of AlphaGenome, whose public presentation has just been published in the journal Nature. This new artificial intelligence from Google, yet another one, can perform all these inspections and deductions for virtually every type of regulatory element at once, and it can do so on enormous stretches of DNA, up to a million letters long. That is something we did not previously know how to do.

Google already surprised the scientific community a few years ago with AlphaFold and its impressive ability to predict the structure and folding of proteins from their amino acid sequence alone. Now it leaves us speechless again with AlphaGenome and its ability to interpret and predict the role of non-coding sequences, the supposedly meaningless sequences that make up a huge portion of our genome. To develop this AI, the researchers trained it on the human and mouse genomes.

Naturally, this will have a significant impact not only on basic research, helping us understand how genes work, but also on more practical, applied questions, such as identifying new relevant DNA sequences in regions of the genome that are usually overlooked or not taken into account. The information AlphaGenome provides will need to be carefully considered and analyzed in genetic diagnosis.

An alteration in any of these regulatory elements that prevents a gene from being activated or silenced when it should be can change the pattern of embryonic development, or produce the symptoms of a disease associated precisely with the abnormal functioning of that gene, without the gene's own sequence having changed at all. In such cases, the mutation lies not within the gene but outside it, in more or less distant DNA sequences that harbor these regulatory elements. AlphaGenome will now efficiently uncover these elements in any segment of the genome we want to analyze, especially in the non-coding genome, the part we used to call meaningless (when we did not understand it). Thanks to AlphaGenome, we can now begin to decipher it with far greater precision and detail than was previously possible.
