Tracing the evolution of the coronavirus has taken over Anne-Catrin Uhlemann’s life.
Formerly, the Columbia University researcher studied why drug-resistant bacteria often thrived inside immunocompromised people. But when COVID-19 swept over New York last March, the devastation motivated her team to shift focus. Medini Annavajhala, a postdoctoral researcher in the lab, volunteered to begin storing coronavirus specimens at the medical center’s “biobank” in case they were needed for future investigations. Last spring and summer, the team also pivoted to searching for potential mutations in the coronavirus, an early but small effort that didn’t yield much at first but ultimately paid off.
After the variant from the United Kingdom, B.1.1.7, emerged last autumn, the attention on genomic sequencing, the cutting-edge method for spotting mutations, drastically increased in New York. Uhlemann’s lab, too, rapidly ramped up the number of coronavirus genomes it sequenced.
In February, her team helped identify the New York City variant (B.1.526), a new mutant that originated in the five boroughs and potentially reduces the effectiveness of our immune defenses. It is now the dominant variant in the city, outpacing even the ever-dangerous U.K. variant.
“The past three months have been frantic, just trying to figure it all out,” Uhlemann said.
Her lab is one of many around the city hunting down the variants through genome sequencing. That's how we know there are hundreds of variant cases in the metro area—a factor that may explain why progress against the city’s outbreak has stalled and why the virus is resurging in New Jersey. Nationwide data from the Centers for Disease Control and Prevention shows the New York variant now makes up a larger portion of cases than the dangerous strains from South Africa (B.1.351) and Brazil (P.1). Yet B.1.526 still hasn’t received the CDC’s highest threat-level designation: variant of concern.
“There is an interagency group that looks at these variants and classifies them,” CDC Director Rochelle Walensky said Wednesday during a briefing for the White House COVID-19 Task Force. “That interagency group—that CDC is a part of—is actually looking at exactly this question right now.”
Carefully following the variants will be a crucial weapon in how we beat back the pandemic this year. The lengthy process can take a full work week—from acquiring COVID-positive nose swabs collected at hospitals or test sites to uploading the genetic information about the germ into a global database so researchers across borders can compare notes. Gothamist visited four labs around New York City to learn about the process. This is how it works.
Extraction
At Uhlemann’s lab, the “COVID life cycle” begins by pulling positive specimens from the biobank. Researchers keep individuals’ samples in separate areas of the lab, usually separate rooms altogether. They also keep the COVID-related activities physically distant from ongoing projects unrelated to the virus.
“There’s potential for contamination at every step, so keeping things physically separate helps us,” said Annavajhala.
Every nasal swab is like a zoo. It contains human cells along with the cornucopia of viruses and bacteria that regularly reside inside anyone’s nose. So, scientists must extract the coronavirus’s genetic material, called RNA, isolating it from any DNA or other non-COVID nucleic acids found in the swab. But RNA is less stable than DNA and degrades easily. Quality control checks are put into place to ensure a specimen yields enough viral RNA.
At the New York Genome Center, samples are frozen at -80 degrees Celsius (-112 degrees Fahrenheit), March 17th, 2021.
The samples are kept in ultracold freezers between being prepped and sequenced to avoid degradation of the genetic material, March 17th, 2021.
Even in the best scenarios, these leftovers amount to a minuscule number of molecules. So this RNA has to be amplified by making millions of copies of the genetic material, akin to tracing over words written on a sheet of paper to make them look bolder. That’s done through a polymerase chain reaction (PCR) machine, creating a clear liquid kept in a plate holding samples from a few dozen patients. During this step, the RNA is converted into cDNA, or complementary DNA, which is the genetic material that is eventually read and analyzed.
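As a rough illustration of why a modest number of PCR cycles is enough to produce millions of copies, the sketch below models an idealized reaction in which each cycle can at most double the number of molecules. The function names and the `efficiency` parameter are assumptions made for this example; they are not details of any lab’s protocol described here.

```python
def copies_after_cycles(starting_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR: every cycle multiplies the copy count by (1 + efficiency).

    efficiency=1.0 means perfect doubling; real reactions fall somewhat short.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return starting_copies * (1.0 + efficiency) ** cycles


def cycles_needed(starting_copies: int, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles that reaches at least target_copies."""
    if starting_copies <= 0:
        raise ValueError("need at least one starting molecule")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 0 and at most 1")
    copies = float(starting_copies)
    cycles = 0
    while copies < target_copies:
        copies *= 1.0 + efficiency  # one round of amplification
        cycles += 1
    return cycles


if __name__ == "__main__":
    # A single molecule, doubled 20 times, passes the one-million mark.
    print(copies_after_cycles(1, 20))
    # Starting from 100 molecules, fewer cycles are needed to reach a million.
    print(cycles_needed(100, 1_000_000))
```

Because the growth is exponential, a sample that starts with very few intact RNA molecules needs only a handful of extra cycles to catch up, which is one reason the method works on such small amounts of material.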
Enter the sequencer
The scientists add a molecular barcode—an extra piece of nucleic acid—to the specimen, a stage known as library prep. Now, they can put these meticulous preparations into a high-tech machine—a sequencer—to parse out which mutated virus came from which patient. The sequencer is what reads the strands of genetic material.
“You’re throwing millions of fragments of cDNA at the sequencer at the same time, and so it reads these little barcodes and it says, ‘Oh, I know it came from this well on the plate,’” said Annavajhala.
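The sorting step Annavajhala describes is often called demultiplexing. A minimal sketch of the idea follows, assuming each read begins with a short barcode that maps to one well on the plate. The barcode sequences, well names and four-letter barcode length are hypothetical, and real sequencing software typically also tolerates read errors in the barcode, which this example does not.

```python
from collections import defaultdict

# Hypothetical barcode-to-well table; real kits define their own sequences.
BARCODE_TO_WELL = {
    "ACGT": "A1",
    "TGCA": "A2",
    "GATC": "B1",
}
BARCODE_LENGTH = 4


def demultiplex(reads):
    """Group reads by the plate well that their leading barcode maps to.

    The barcode is trimmed from assigned reads. Reads whose prefix matches
    no known barcode are kept whole under the key "unassigned".
    """
    by_well = defaultdict(list)
    for read in reads:
        barcode = read[:BARCODE_LENGTH]
        well = BARCODE_TO_WELL.get(barcode)
        if well is None:
            by_well["unassigned"].append(read)
        else:
            by_well[well].append(read[BARCODE_LENGTH:])
    return dict(by_well)


if __name__ == "__main__":
    reads = ["ACGTTTGACC", "TGCAGGATTA", "ACGTCCATGA", "NNNNACGTAC"]
    for well, fragments in demultiplex(reads).items():
        print(well, fragments)
```

Keeping unmatched reads in their own bucket, rather than discarding them, makes it easy to check how much of a run could not be traced back to a patient sample.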
On a recent Wednesday downtown at the New York Genome Center, associate scientist Atit Raval placed such genetic material mixed with a chemical cocktail onto glass slides called flow cells. The procedure looks a bit like getting an ink cartridge ready to place into a giant printer, or even a Xerox machine. It is, of course, far from that.
Associate scientist Atit Raval places coronavirus genetic material onto a flow cell, a glass slide used during the sequencing process at the New York Genome Center, March 17th, 2021.
The NovaSeq 6000 sequencer at the New York Genome Center, March 17th, 2021.
Flow cells—glass slides—are where a selection of case specimens are placed for the sequencing machine to read, such as this one from the New York Genome Center, March 17th, 2021.
Scientists at the genome center use the NovaSeq 6000, a nearly $1 million system from the biotech company Illumina that takes a day to process the samples. Other sequencers are designed to make studying pathogens more mobile. Annavajhala at Columbia uses an Oxford Nanopore MinION, so small it fits in the palm of your hand.
Once the drops of liquid are inserted into a flow cell, with millions of fragments of nucleic acids from 96 patients, she can plug the device into a computer and watch the data stream onto the screen in real time.
This digital collection is sent to bioinformaticians who can then determine what could be hiding in the data dump, like a mutation called E484K, which scientists call “Eek.” It is found in the variants from South Africa, Brazil, and New York City and may help the virus dodge the body’s immune defenses.
Moin Chowdhury, a 33-year-old supervisor at the city DOH Public Health Lab, looks at genetic data from different viruses, including coronavirus, as it streams onto a computer screen, March 15th, 2021.
Many of the genetic changes don’t mean much.
“A mutation may not have any effect,” said Scott Hughes, the deputy director of the city Health Department’s Public Health Lab. “Some, again, end up as a variant of concern. The job of the bioinformatics person is to design this software, these pipelines that will then take that sequence and make sense of it.”
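To give a sense of what one small piece of such a pipeline might do, here is a simplified sketch that translates two already-aligned stretches of coding sequence into amino acids and reports substitutions in the same notation as E484K: original amino acid, position, new amino acid. The three-codon fragments and the starting position of 483 are toy inputs chosen for illustration, and the function names are assumptions. Real pipelines first align whole genomes to a reference and handle insertions, deletions and sequencing errors, none of which is attempted here.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate DNA codons into one-letter amino acids, ignoring a trailing partial codon."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    return "".join(CODON_TABLE[coding_sequence[i:i + 3]] for i in range(0, usable, 3))


def substitutions(reference: str, sample: str, first_position: int = 1) -> list:
    """List amino-acid changes between two aligned, equal-length coding sequences."""
    ref_protein, sample_protein = translate(reference), translate(sample)
    if len(ref_protein) != len(sample_protein):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        f"{ref_aa}{position}{sample_aa}"
        for position, (ref_aa, sample_aa)
        in enumerate(zip(ref_protein, sample_protein), start=first_position)
        if ref_aa != sample_aa
    ]


if __name__ == "__main__":
    reference = "GTTGAAGGT"  # toy fragment: V, E, G
    changed = "GTTAAAGGT"    # middle codon GAA -> AAA turns E into K
    silent = "GTTGAGGGT"     # middle codon GAA -> GAG still encodes E
    print(substitutions(reference, changed, first_position=483))
    print(substitutions(reference, silent, first_position=483))
```

The second toy sample shows one way a mutation can have no effect: the genetic letters change, but the codon still encodes the same amino acid, so the protein is unchanged.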
Getting everyone on the same page
Genomic surveillance in the U.S. has been slow-going during the pandemic. Soren Germer, the senior vice president of genome technologies at the genome center, said variant tracking efforts have been scattered—similar to what has happened for other portions of the nation’s response.
“There hasn’t been the same coherent, concerted effort in the U.S. today,” Germer said.
The genome center could trace 1,000 samples a day if supplied with enough nose swabs, but getting approvals to obtain patient specimens from other institutions can delay progress. There are fewer bureaucratic setbacks at labs associated with hospitals. Melissa Marton, associate director of the production research lab at the center, added that the lack of government resources for viral sequencing also represents a challenge.
Soren Germer, the senior vice president of genome technologies, and Melissa Marton, associate director of the production research lab, at the New York Genome Center, March 17th, 2021.
In late February, President Joe Biden directed $200 million in federal funding to the Centers for Disease Control and Prevention to identify and track new variants and more than triple the number of cases sequenced, from 7,000 to 25,000 per week. The nation’s labs are currently reporting about 11,000 sequences per week, but the CDC doesn’t release full tallies on the New York variants because that practice is reserved for “variants of concern.” The New York State Department of Health similarly doesn’t publish variant cases on its COVID dashboard or its public archives, unlike its neighbor New Jersey.
The Pandemic Response Lab, a public-private partnership between the health department and the Brooklyn biotech Opentrons, is the city’s most concerted effort to sequence for variants. Lab operators initially hoped to sequence about 2,000 cases a week, but the most recent data from the city show the partnership falling short of this goal, with about 900 sequences per week during March. The city and state health departments’ labs are also analyzing cases, in Manhattan and at the state’s Wadsworth Center in Albany, respectively.
Adriana Heguy, who leads NYU Langone’s Genome Technology Center, noted genomic sequencing efforts have improved in the past month, but she fears more variants will pop up in states without mask mandates. Every new case offers another opportunity for the virus to mutate—a perpetual danger whenever transmission spirals out of control. Understanding mutations can also help vaccine manufacturers tailor booster shots to protect us from emergent versions of the coronavirus or build a universal shot to stop them all, she added.
“We have to see which variants overtake the other variants, and we have to see if the variants are related to, say, escaping immunity,” she said. “All of these things are right now up in the air. Obviously, the next few weeks, few months are going to be critical.”
Editor's note: After publication, the city health department released fresh data on its variant surveillance. This story was updated to reflect the latest numbers.