COVID Genomes Paint Portrait of an Evolving Pathogen

The COVID-19 pandemic is an unfolding story told in numbers. While news reports focus on the number of tests, cases, hospitalizations, and deaths undulating through time and space, an organization called the GISAID Initiative tracks the number of SARS-CoV-2 genome sequences that researchers have posted, from all over the world. It recently reached a milestone: 75,000.

Consulting genome sequences to follow and predict the spread of an epidemic or pandemic is called genomic epidemiology. It’s important.

Public posting of the first genome of the novel coronavirus from researchers in China, back in early January, got the ball rolling in vaccine development. Some companies were able to plug the new genetic information into vaccine designs already in the works for other viruses.

GISAID began in 2006 to mobilize data gathering for the then-impending bird flu, and was originally called the Global Initiative on Sharing All Influenza Data. Early in 2020 GISAID embraced the novel coronavirus, and it has just added the new swine flu brewing in Asia. The official host nation is the Federal Republic of Germany.

The genome of SARS-CoV-2 is just under 30,000 RNA bases. It is an instruction manual for how the virus spreads, infects, and latches on to the ACE2 receptors that dot many of our cells.

Like all genomes, that of the virus changes. This is one reason why what we thought we knew about the virus changes – because that’s how science works. Scientists who alter their views aren’t making mistakes or misleading anyone. They face unrealistic expectations of science as a final word, which may arise from the popular but erroneous idea of “scientific proof.” We know what we know until something new comes along.

On the Nature of Mutation and Evolution

Changeable genomes arise because of an inherent characteristic of the chemicals that form genomes – the nucleic acids DNA and RNA – the ability to copy (replicate) themselves. And nature is not a perfect copier. Mistakes happen, like typos.

If a mistake in copying DNA or RNA changes the encoded amino acid, then the protein of which the amino acid is a part changes. If the protein changes in a way that alters its function and affects how the organism or virus gets along in the world, then natural selection comes into play.

A change that provides an advantage persists, a process called “positive selection” and popularly summed up as “survival of the fittest.” The particulars depend upon the species. A mammal may be more likely to survive to reproduce if a mutation affects hair pigment in a way that provides camouflage; a virus may mutate in a way that enables it to spread more easily from person to person, or to bind more ACE2 receptors in a throat than older genetic strains.

Mutations aren’t teleological; that is, they don’t alter a trait to suit a purpose or goal, like camouflage or disease resistance. Mutations are random, but are more likely to occur in parts of a genome where the sequence is repetitive, because the replication enzymes can get hung up – like misspelling banana as banama or bananana.

Genome scientists tease out signals of positive selection and reconstruct how the organism or virus is changing, creating a narrative. “These genomes provide invaluable insights into the ongoing evolution of the virus during the pandemic, which might be helpful to eventually mitigate and control the virus spread,” write the authors of a recent preliminary publication that scrutinized 4,894 SARS-CoV-2 genomes from around the world. And of course vaccine design must stay ahead of changing mutational profiles.

The metric that reveals this microevolutionary change is the ratio of nonsynonymous changes, which alter the encoded amino acid, to synonymous changes, which do not. (Bio 101 recap: Every three DNA or RNA bases in a row encode an amino acid, and connected amino acids form proteins.) A ratio of nonsynonymous to synonymous change greater than 1 indicates positive selection: evolution in action.
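To make the synonymous versus nonsynonymous distinction concrete, here is a minimal Python sketch that compares two aligned coding sequences codon by codon, using the standard genetic code. The function name and the toy sequences are hypothetical, invented for illustration. It also reports only raw counts; real analyses normalize by the number of possible synonymous and nonsynonymous sites before forming the ratio, a step omitted here.

```python
from itertools import product

# Standard genetic code, RNA alphabet, codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_changes(ref, var):
    """Count changed codons as synonymous or nonsynonymous.

    Both inputs must be aligned, in-frame RNA coding sequences of equal
    length. Returns a (synonymous, nonsynonymous) tuple of raw counts.
    """
    if len(ref) != len(var) or len(ref) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(ref), 3):
        old, new = ref[i:i + 3], var[i:i + 3]
        if old == new:
            continue  # codon unchanged
        if CODON_TABLE[old] == CODON_TABLE[new]:
            syn += 1      # same amino acid: a silent change
        else:
            nonsyn += 1   # different amino acid: the protein changes
    return syn, nonsyn


# Toy example (hypothetical sequences): GAU->GGU swaps aspartate for glycine,
# while UUA->UUG still encodes leucine.
syn, nonsyn = classify_changes("GAUUUAGGU", "GGUUUGGGU")
print(f"synonymous={syn}, nonsynonymous={nonsyn}")
```

With enough real changes tallied this way and properly normalized, a ratio above 1 would hint at positive selection.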

Computational Tools Track Viral Evolution

Researchers probing the evolution of viruses use computational tools to investigate phylogenetics – how members of a species are related to one another.

Algorithms compare pairs of genes or their parts and sort and order the genomes by how closely the sequences align. It’s a little like recovering multiple saved, partially edited Word documents after a crash and reconstructing which is the most recent, next most recent, and so on, by the extent of the chunks of text that they share.
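The comparing and sorting idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how production phylogenetics software works: it assumes the sequences are already aligned and of equal length, and it just counts mismatched positions (the Hamming distance) against one reference. The names and the short sequences are made up.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_similarity(reference, genomes):
    """Return genome names ordered from most to least similar to the reference."""
    return sorted(genomes, key=lambda name: hamming(reference, genomes[name]))


# Hypothetical toy data: three short "genomes" that drift away from a reference.
reference = "ACGUACGU"
genomes = {
    "sample_C": "UCGAACGA",  # 3 differences
    "sample_A": "ACGUACGA",  # 1 difference
    "sample_B": "ACGAACGA",  # 2 differences
}
print(rank_by_similarity(reference, genomes))
```

Real tools build a full tree from all pairwise comparisons rather than a single ranked list, but the underlying question is the same: who shares the most text with whom.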

Sometimes with DNA or RNA sequences, if the mutation rate is known for a particular gene, a timescale can be superimposed upon the changes. That’s how algorithms (and, before the math was invented, researchers working by hand) construct evolutionary trees of our ancestors.
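The arithmetic behind superimposing a timescale is simple enough to sketch. The Python snippet below assumes a constant mutation rate (a "molecular clock") and a direct ancestor-to-descendant comparison. The function name is hypothetical, and the numbers in the example are illustrative placeholders, not measured values for any virus.

```python
def years_since_ancestor(n_differences, genome_length, rate_per_site_per_year):
    """Estimate elapsed time from a count of accumulated differences.

    Assumes changes accumulate at a steady rate along a single lineage,
    so elapsed time = observed differences / expected differences per year.
    """
    expected_per_year = genome_length * rate_per_site_per_year
    return n_differences / expected_per_year


# Illustrative placeholder inputs: 12 differences across a 30,000-base genome,
# with an assumed rate of 8e-4 changes per site per year.
print(round(years_since_ancestor(12, 30_000, 8e-4), 2))
```

When two sampled sequences are compared with each other rather than with their ancestor, the changes have accumulated along both branches, so the estimate would be halved.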

A second tool, structural analysis, measures the distances between atoms in a viral protein. This information can reveal how the viral spike protein becomes more or less able to latch onto our cells, affecting infectivity.
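At its core, measuring the distance between atoms is geometry on three-dimensional coordinates. The sketch below uses invented coordinates and a hypothetical 4.0-angstrom contact cutoff chosen purely for illustration; real structural work reads coordinates from solved protein structures.

```python
import math


def atom_distance(p, q):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.dist(p, q)


def in_contact(p, q, cutoff=4.0):
    """True if two atoms sit within the cutoff distance (assumed, in angstroms)."""
    return atom_distance(p, q) <= cutoff


# Invented coordinates for one atom on a viral protein and two on a receptor.
spike_atom = (0.0, 0.0, 0.0)
near_receptor_atom = (1.0, 2.0, 2.0)   # 3.0 away
far_receptor_atom = (3.0, 4.0, 12.0)   # 13.0 away

print(atom_distance(spike_atom, near_receptor_atom), in_contact(spike_atom, near_receptor_atom))
print(atom_distance(spike_atom, far_receptor_atom), in_contact(spike_atom, far_receptor_atom))
```

A mutation that nudges an atom across such a cutoff can make or break a point of contact between the two proteins.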

Structural clues and views can translate directly into vaccine design. The mRNA-based vaccine from Moderna, the first to publish phase 1 clinical trial results, for example, encodes a spike tweaked to be more stable than the natural one without sending the immune response hurtling into inflammatory overdrive.

The changes that mutations bring may seem subtle, yet they exert powerful effects. Some changes merely alter the orientations of the side chains of the bulkier amino acids, flipping a protrusion up or down. But that can affect where neighboring amino acids touch in the three-dimensional protein, which can affect function.

The genetic and spatial views of the virus complement each other. The more data we collect, the better we’ll be at predicting effects of new viral variants.

Which Changes are Relevant?

I’ve plowed through several recently published papers that compare SARS-CoV-2 genomes from different times and different places as the infection has circled the globe. Which genomes provide clinically useful information?

Andrew Rambaut and colleagues from the University of Edinburgh, University of Sydney, and University of Cambridge recently reported in Nature Microbiology a way to sort through the tens of thousands of 30,000-base-long sequences to tag those responsible for enhanced spread of the infection.

The researchers used common sense: if a viral variant disappears from global surveillance reports, it’s erased from the elaborate and ever-growing family tree. “By focusing on active virus lineages and those spreading to new locations, this nomenclature will assist in tracking and understanding the patterns and determinants of the global spread of SARS-CoV-2,” they write. In other words, they’re bestowing cancel culture on the virus.

Even with algorithms galore and insightful scientists, it’s hard to draw conclusions from a changeling. So far, viral variants are too alike to conclude that any particular change offers an advantage that fuels transmissibility or alters the symptoms and complications of the disease.

The enemy is still too new, circulating among us for a mere six months, to have revealed its secrets.
