A team from the Centre for Genomic Regulation (CRG) and Pompeu Fabra University (UPF) in Barcelona has developed an artificial intelligence tool capable of designing gene regulatory sequences, known as enhancers, that do not exist in nature. When introduced into cells, these synthetic enhancers can increase or decrease gene activity selectively, depending on the type of cell targeted. According to the authors, ‘the potential applications are enormous. It's like writing software, but for biology.’ The results are published in the journal Cell.

Darío Lupiáñez
Principal investigator of the 3D Genomics group at the Andalusian Centre for Developmental Biology (CABD)
What do you think of the study overall? Is it of good quality?
"Although the human genome sequence has been available since 2001, the reality is that, more than two decades later, its functioning remains enigmatic in many respects. Long before that major milestone, we had already managed to decipher the genetic code, i.e. the principles by which genes give rise to proteins. However, genes represent only 2% of the genome; the remaining 98% corresponds to non-coding DNA (once mistakenly dismissed as ‘junk DNA’). Within that 98%, there is another code that is perhaps even more important and whose functioning remains a mystery: the regulatory code.
For cells to function properly, genes must be activated at the right time, in the right place and in the right amount. This activation does not happen randomly, but depends on certain specific DNA sequences called enhancers. Enhancers act as genetic switches that control which genes are turned on and when. Recent studies estimate that the non-coding genome contains millions of enhancers. Despite knowing about their existence for decades, we still do not understand how these sequences work, i.e., what combination of DNA letters (the nucleotides A, C, G and T) allows them to do their job in each specific context. In this sense, this study represents a very important step towards understanding the code that regulates gene activity.
The study, carried out by researchers from the CRG and the UPF, addresses this problem by using artificial intelligence to design thousands of synthetic DNA sequences, each with a different combination of binding motifs. These motifs are very short nucleotide sequences that are recognised and bound by proteins called transcription factors. These proteins are key, as they bind to enhancers and activate or repress genes. The researchers tested these sequences in a model of blood cell differentiation using massively parallel reporter assays, a technology that allows thousands of different sequences to be evaluated at the same time. This approach has made it possible to decipher some of the rules governing how these genetic switches work in different cellular contexts.
In summary, this is excellent work from an experimental and computational point of view, combining tools from synthetic biology, artificial intelligence and cell biology in an innovative way".
How does it fit in with existing evidence and what new insights does it provide? What implications could it have?
"This study explores how combinations of motifs within an enhancer sequence influence its function, thereby establishing rules that help us understand how enhancers work. One of the most relevant observations of this study is that the behaviour of motif combinations can be quite unpredictable and even counterintuitive. For example, certain motifs can activate a gene when acting in isolation; however, when combined with other activating motifs, they can have the opposite effect and reduce expression. In addition, the authors found that these motif combinations do not behave the same way in all cell types. In other words, the same sequence can be highly active in one cell type but completely inactive in another. This highlights that each cell type has its own way of ‘interpreting’ the code, which adds a further layer of complexity to gene regulation.
By analysing thousands of these combinations, the authors built models that can predict which type of sequence will be active in a given context. In this way, they were able to design synthetic enhancers that activate genes specifically in a particular cell type. This kind of targeted activation could be used to steer the differentiation process towards specific cell types or to treat certain genetic diseases. Traditionally, gene therapies have used enhancers that occur naturally in the genome, but these are often not specific enough and can activate genes in unwanted tissues, causing side effects. Designing ‘à la carte’ enhancers, as described in this study, would allow gene expression to be controlled with much greater precision, reducing risks and expanding the possibilities of this type of therapy.
Beyond applications in biomedicine, these findings are also relevant to other scientific fields. Understanding how the regulatory code works can help us predict how certain mutations might contribute to genetic diseases by disrupting enhancer function. In addition, these principles could be applied to other species, opening doors in fields such as agriculture (e.g., designing plants with specific traits) or industrial biotechnology".
Are there any important limitations to consider?
"The work carried out by the authors is impressive, from both a conceptual and a technological point of view. However, the high costs associated with this type of experiment limited the study to 38 transcription factors and 7 different cell types. Despite this remarkable effort, that represents only a small fraction of the combinatorial possibilities, as there are more than 1,600 transcription factors and more than 200 different cell types in the human body. Therefore, to fully decipher the ‘regulatory code’ governing enhancer activity, it would be necessary to analyse many more combinations of motifs in a wider variety of cellular contexts.
This work underscores something that was already suspected: that the combinatorial capacity of the regulatory genome is virtually infinite, providing fertile ground for the emergence of new cell types and functions over the course of evolution. In this sense, this study represents a crucial step towards understanding this regulatory code, demonstrating that it is possible to begin to decipher its rules. Furthermore, it highlights the usefulness of artificial intelligence in tackling highly complex combinatorial problems, laying the foundations for more extensive studies in the future".
- Research article
- Peer reviewed
Frömel et al.