DeepMind and the EMBL European Bioinformatics Institute (EMBL-EBI) have just made artificial intelligence (AI)-based predictions of the three-dimensional structures of almost all catalogued proteins known to science freely and openly available to the scientific community through the AlphaFold Protein Structure Database.
Proteins are the building blocks of life. They are the machinery of cells, perform many functions in the body, and are essential for the regulation of tissues and organs. Their shape is closely linked to their function, so a 3D prediction of a protein's structure gives a better understanding of what it does and how it works.
Structural biology, which, among other things, studies the structure of proteins, is fundamental to understanding how the components of the cell function and is relevant to virtually all areas of life science research. The release of these predictions is therefore a real milestone: it puts a highly valuable tool in the hands of all researchers, in a completely open way, helping them to expand knowledge and address global challenges.
Over the past few months, the EMBL-EBI Protein Data Bank in Europe (PDBe) team and DeepMind have worked frantically to increase the size of the database 200-fold, from around one million to more than 200 million structures, covering almost every organism on Earth whose genome has been sequenced.
The expanded database includes predicted structures for a very wide range of species, including plants, bacteria, animals and other organisms. This opens up new avenues of research in the life sciences that will have an impact on global challenges, from developing an effective malaria vaccine and understanding Parkinson's disease to improving the health of honeybees and combating plastic pollution.
Although these are predictions and some may contain errors, the database will allow scientists to design experiments and test hypotheses based on these structures, greatly facilitating their work. By analogy, we could say that, although these predictions are not the treasure itself, they do represent the map that can lead us to it.
The work and opportunities open to the scientific community are enormous, and the impact on all areas of biology is incalculable. This version of the database also goes a step further by opening up new avenues of research in bioinformatics and computational analysis: researchers can potentially detect patterns and trends in the database that will help us understand the fundamental principles that determine protein folding.