
A rereading of the human genome expands the catalogue of genetic variations

Two studies published today in the journal Nature significantly expand the catalogue of known human genetic variations. The resulting data constitute what may be the most complete view of the human genome to date.

23/07/2025 - 17:00 CEST
Expert reactions


Gemma Marfany

Professor of Genetics at the Universitat de Barcelona (UB) and head of group at CIBERER

Science Media Centre Spain

These two articles address some of the main questions yet to be resolved in decoding the human genome sequence. On one hand, most of the available genetic information comes from the genome of individuals with European ancestry, with very limited representation of other human populations. On the other hand, large-scale sequencing based on short reads left regions uncovered or without precise sequence data due to the high number of repetitive sequences in our genome. It's as if some parts of the genome could be examined through high-precision lenses, while others had to be viewed through thick, distorted, and blurry glass.

Using long-read sequencing technologies, the researchers in these two studies applied complementary strategies to analyze the human genome—either sequencing the genomes of over 1,000 individuals from 26 different populations, but with relatively low depth, or deeply sequencing the genomes of 65 individuals from 28 populations. The results reveal that our genome is highly dynamic, with a great deal of structural variation previously unknown—particularly in regions with many repeats, where transposable elements move and jump, and where centromeres (sequences that define the uniqueness of chromosomes) evolve rapidly. Other notable findings include a more accurate definition of Y chromosome variation across human populations, improved resolution of gene regions involved in human diseases, and the high structural variability in chromosome regions crucial to our immune system, such as the genes in the major histocompatibility complex.

Greater knowledge of the structure, variability, and dynamics of our genome across different human populations will allow us to better understand our evolution and adaptation to diverse environments. It represents a shift from a rough map of our genome—focused on major cities and towns—to a much finer-scale view, complete with isolated houses, rivers, and mountains.

The author has declared they have no conflicts of interest


Lluís Montoliu

Research professor at the National Biotechnology Centre (CNB-CSIC) and at the CIBERER-ISCIII

 

Science Media Centre Spain

The sequencing of the first human genome in 2001 was a remarkable milestone. Being able to read the more than three billion base pairs of the genome (albeit with many gaps and uncertainties) gave us, for the first time, a reference genome against which any individual genome could be compared to identify possible disease-causing mutations. That first sequenced genome did not belong to a single individual but was constructed using genetic data from various people. The technology used at the time allowed only relatively short reads. In 2022, massive sequencing (which generally also produces short reads, of about 150 bases) was combined with long-read sequencing to fill in many of the gaps, adding about 200 million new letters to the human genome. This was the work of a consortium of researchers who named themselves “Telomere-to-Telomere” (T2T), after the telomeres, the ends of chromosomes: something like “from end to end.” In 2023, the sequencing of the Y chromosome, one of the smallest chromosomes and the last still missing, was completed, adding another 30 million letters and bringing the size of the human genome to 3.23 billion base pairs. Any two humans share 99.9% of those letters and differ in only 0.1%, which corresponds to about 3.2 million letters in the copy inherited from our mother and another 3.2 million in the copy inherited from our father.
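The arithmetic behind that 0.1% figure can be checked in a few lines. This is only a minimal sketch that takes the genome size and the shared percentage quoted above at face value; the names `GENOME_SIZE_BP`, `SHARED_FRACTION` and `differing_letters` are illustrative and do not come from the studies.

```python
# Figures quoted in the text above, taken at face value (not re-derived here).
GENOME_SIZE_BP = 3_230_000_000   # 3.23 billion base pairs
SHARED_FRACTION = 0.999          # 99.9% of letters shared by any two people


def differing_letters(genome_size_bp: int, shared_fraction: float) -> int:
    """Letters expected to differ in one inherited copy of the genome."""
    return round(genome_size_bp * (1 - shared_fraction))


per_copy = differing_letters(GENOME_SIZE_BP, SHARED_FRACTION)
both_copies = 2 * per_copy  # one copy from each parent

print(f"{per_copy:,} letters differ per inherited copy")   # 3,230,000
print(f"{both_copies:,} letters across both copies")       # 6,460,000
```

The result, roughly 3.2 million letters per inherited copy, matches the figure given in the text.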

The technology that enables the reading of very long DNA strands (tens or hundreds of thousands of intact bases) made it possible in 2023 to begin discovering the underlying genetic variability between different human genomes. At that time, the genomes of 47 individuals from diverse populations around the world were characterized. This was the first version of the so-called "Pangenome," a collection of genomes that captures the existing genetic variability among human beings. There is no single genome; rather, each population (and essentially each individual) has a slightly different genome, especially in the intergenic regions between genes, which make up a whopping 98% of our genome, leaving just 2% for our twenty thousand genes, which are the ones we need to live.

This week, the journal Nature publishes two related collaborative papers from the T2T consortium, along with contributions from many other international laboratories (mostly German and American), in which the most optimized versions of long-read DNA sequencing technologies have been applied. What these researchers found is a large number of structural variants (SVs) that had previously gone unnoticed. For example, if a 5,000-base DNA segment is repeated several times in tandem and the genome is sequenced using short fragments of 150 bases, the repetitions cannot all be detected, because each copy of the segment is essentially identical; at most a few will be detected. However, if long-read sequencing technology is applied and a very long DNA strand containing all of these tandemly repeated units, whether direct or inverted, can be passed through a nanopore, it becomes possible to deduce that one person has, say, 47 repetitions while another has only 23, and inverted ones at that. In other words, this once again reveals additional underlying genetic variability in our genomes: variability we suspected but did not know about or could not interpret until technologies emerged that allow us to read very long strands of intact DNA, such as the most sophisticated sequencing methods developed by Oxford Nanopore Technologies (ONT) and PacBio.
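The short-read versus long-read argument can be illustrated with a toy example. The sketch below is not the method used in the papers: the sequences are made up, a 16-base unit stands in for the 5,000-base segment, repeat orientation is ignored, and the helper names (`make_region`, `count_copies`) are hypothetical. It only shows that a 150-base read taken from inside the repeat array looks the same in both people, whereas a read spanning the whole array, flanks included, fixes the copy number.

```python
# Hypothetical toy sequences: a short repeat unit bracketed by unique flanks.
UNIT = "ACGTTGCAAGGCTTAC"
LEFT_FLANK = "GATTACAGATTACA"
RIGHT_FLANK = "CCCCGGGGAAAATTTT"
SHORT_READ_LEN = 150


def make_region(copies: int) -> str:
    """Build a region with `copies` tandem repeats between the two flanks."""
    return LEFT_FLANK + UNIT * copies + RIGHT_FLANK


def count_copies(read: str):
    """Return the copy number if the read spans the whole array, else None."""
    start = read.find(LEFT_FLANK)
    end = read.rfind(RIGHT_FLANK)
    if start == -1 or end == -1 or end < start + len(LEFT_FLANK):
        return None  # the read does not reach both sides of the array
    array = read[start + len(LEFT_FLANK):end]
    copies, remainder = divmod(len(array), len(UNIT))
    if remainder or array != UNIT * copies:
        return None  # not a clean run of whole repeat units
    return copies


person_a = make_region(47)
person_b = make_region(23)

# A short read sampled from inside the array, at the same offset in both people.
offset = len(LEFT_FLANK) + 100
short_a = person_a[offset:offset + SHORT_READ_LEN]
short_b = person_b[offset:offset + SHORT_READ_LEN]

print(short_a == short_b)       # True: the short reads are indistinguishable
print(count_copies(short_a))    # None: no flanks, so no copy number
print(count_copies(person_a), count_copies(person_b))  # 47 23
```

Real analyses work with sequencing errors, imperfect repeats and both strand orientations, so this exact-match counting is only meant to convey the principle.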

The first paper reports 65 representative human genomes (expanding the pangenome) comprising 130 haplotypes (contiguous chromosomal fragments inherited together from parents), filling in many of the previously unknown intervals and gaps still present in the human genome. The second paper details the most precise sequencing yet, using long reads from over a thousand human individuals, enabling the identification of up to 100,000 structural variants and 300,000 variable-number tandem repeats. Mobile elements (jumping genes, transposons, and retrotransposons) are proposed as the origin of this structural diversity, along with homologous recombination events, that is, the mixing of sequences based on the similarity of their bases.

We still know very little about the true meaning and impact of having 40 or 400 copies of a specific DNA segment, but what these two publications show is that each individual’s genome is unique, with its own structural variations, which can coexist within a population. Hence the move toward the pangenome (a set of descriptive genomes from dozens of human populations) as our new "reference genome": no longer a single genome, but many genomes. This is the reference we should use to detect the presence or absence of mutations in genes or intergenic sequences that can help diagnose people with genetic diseases. Genetic diagnosis always precedes the development of any potential gene therapy. That’s why these two papers are significant: they reveal the additional complexity of our genome, which is much more variable between individuals than we ever imagined. And this should help us better diagnose patients affected by congenital disorders or genetically based diseases.

The author has not responded to our request to declare conflicts of interest
Publications
Journal
Nature
Study types:
  • Research article
  • Peer reviewed
Topics genetics