If you watched the movie Contagion, which came out in 2011, you might be forgiven for believing that it’s not that hard to find “patient zero,” the index patient of a pandemic. Another impression you might have got is that a pandemic starts right off the bat (no pun intended!) when patient zero gets infected by some animal. Both impressions, however, are likely to be dead wrong.
Zoonotic infections by definition involve the jump of a biological agent, such as a virus, from an animal to a human. But, and this is a critical point, most viruses are not optimized to permit human-to-human (h2h) transmission, so often the infection stops immediately, or there is very limited transmission to other humans. A good case in point is HIV (human immunodeficiency virus), which took several decades to be established in the human population. Another example is H5N1, the avian flu virus strain that greatly concerned us in the years 2004-2006. In research for a paper I published on the treatment of H5N1 back in 2006, one of the things that struck me was the absence of h2h transmission. Sure, the mortality rate of 60% was scary, but there has been very limited transmission between humans (mostly isolated families) and never any sustained transmission, otherwise the world would likely be a very different place! The question is, why?
The Phases of a Pandemic
The WHO developed a tool to help us understand the phases of a pandemic back in 2009. It is based on a zoonotic influenza outbreak, but for our purposes that doesn’t matter. Phases 1-3 represent no infection, reported animal-human infections, and limited h2h transmission, respectively (probably an R0 between 1 and 1.1). It’s only in phase 4 that we get into serious trouble. So, I want to approach the origin of the SARS-CoV-2 pandemic from that perspective.
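To get a feel for why an R0 barely above 1 keeps an outbreak smoldering rather than exploding, here is a minimal sketch. It assumes a deliberately crude model in which every generation of cases is simply R0 times larger than the one before. The function name, the 1,000-case threshold, and the comparison value of 2.5 are my own illustrative assumptions and are not part of the WHO tool.

```python
import math


def generations_to_reach(target_cases: float, r0: float, initial_cases: float = 1.0) -> float:
    """Generations needed for cases per generation to grow from
    initial_cases to target_cases, assuming each generation is r0 times
    the size of the previous one (a crude deterministic model)."""
    if r0 <= 1.0:
        # With R0 at or below 1 the chain of transmission never grows.
        return math.inf
    return math.log(target_cases / initial_cases) / math.log(r0)


# 1.05 and 1.1 sit in the range quoted for phase 3;
# 2.5 is an assumed value for a well-adapted virus, for contrast only.
for r0 in (1.05, 1.1, 2.5):
    n = generations_to_reach(1000, r0)
    print(f"R0 = {r0}: about {n:.0f} generations to reach 1,000 cases per generation")
```

Under these assumptions, an R0 of 1.1 needs more than 70 generations of infection to reach 1,000 cases per generation, while an R0 of 2.5 needs fewer than 10. Real outbreaks are noisier than this, and chains with an R0 this close to 1 often die out by chance.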
This Virus May Have Been Around Quite a Long Time
I know that many of you would love for me to tell you that the pandemic started as a result of a lab accident at the now infamous P4 lab of the Wuhan Institute of Virology (WIV) in China. I’m not taking that hypothesis off the table, but I want to propose a new one: this virus has been around quite a while and some of the evidence for phases 1-3 has started to emerge, but it’s not a theory that most people are going to like.
The standard version of the story is that the pandemic started in Wuhan with an outbreak in December of 2019, according to the WHO. Most scientists agree that thousands of coronaviruses have been circulating in Chinese horseshoe bat populations for decades, and that the bats living in the caves in Yunnan province were a likely source. But when did first contact between humans and the actual virus occur?
One intriguing hypothesis advanced by Drs Jonathan Latham and Allison Wilson suggests that first contact was in 2012. In the spring of that year, six miners working in a bat-infested mine in Yunnan province (the Mojiang mineshaft) contracted a mysterious illness whose major characteristics bear an uncanny resemblance to the illness caused by SARS-CoV-2. How do the authors know that? By obtaining and having translated a key Chinese Master’s thesis describing the events. That translated thesis is now in the public domain thanks to two Indian researchers who published their findings in the journal Public Health on October 20 of this year. The coronavirus responsible for the miners’ illness (RaTG13) was genetically analyzed in 2017-18, and has a 96.2% sequence identity to the early samples of the SARS-CoV-2 virus obtained from Wuhan patients in December 2019, whose sequence was published by the Chinese in January of 2020. That mineshaft and many other caves harboring bats in Yunnan province were visited by members of Dr Zheng-li Shi’s lab at the WIV many times over several years, and the likely goal was obtaining coronavirus samples with human pathogenic potential. Dr Shi herself maintained that the three miners who died had succumbed to a fungal infection, but we know that is nonsense. Whatever shenanigans occurred, they are not our focus here.
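A figure like 96.2% sounds very close, so it helps to translate it into raw numbers. The sketch below shows how percent identity is computed for two already aligned sequences, then estimates how many nucleotide differences 96.2% identity implies across a whole genome. The toy sequences are invented, and the genome length of 29,903 nucleotides is my assumption (the approximate length of the SARS-CoV-2 reference genome); neither comes from the studies discussed here.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same base.
    Assumes the two sequences are already aligned and of equal length;
    a gap character ('-') never counts as a match."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        a == b and a != "-"
        for a, b in zip(seq_a.upper(), seq_b.upper())
    )
    return 100.0 * matches / len(seq_a)


# Toy example with invented sequences: 9 of 10 positions agree.
print(percent_identity("ATGCATGCAT", "ATGCATGAAT"))

# Assumption: approximate length of the SARS-CoV-2 reference genome.
GENOME_LENGTH = 29_903
REPORTED_IDENTITY = 0.962  # the 96.2% figure quoted in the text

estimated_differences = round(GENOME_LENGTH * (1 - REPORTED_IDENTITY))
print(f"about {estimated_differences} differing nucleotides")
```

Under that assumed genome length, 96.2% identity still leaves more than a thousand differing nucleotides between the two viruses, which is worth keeping in mind whenever RaTG13 is described as a close relative.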
So, this incident constitutes phase 2. However, one key piece of information is missing: what actually happened to the miners once they were infected? Latham and Wilson make a plausible case that because of the intensity of infection (read: very high viral load), the fact that the lungs were deeply infected, and the possibility of co-infection with other bat coronaviruses, RaTG13 evolved by a form of human passaging, analogous to the passaging techniques used in gain-of-function animal experiments. In other words, RaTG13 was capable under the right circumstances of becoming a pandemic progenitor.
2019: Stranger Things in Europe
On November 11th this year, a fascinating article was published in the Tumori Journal (an Italian cancer journal). The authors investigated the presence of SARS-CoV-2 receptor-binding domain (RBD)–specific antibodies (i.e., antibodies to the spike protein) in blood samples from 959 asymptomatic individuals enrolled in a prospective lung cancer screening trial between September 2019 and March 2020. Antibodies were detected in 111 of 959 (11.6%) individuals, starting from September 2019 (14%), with a cluster of positive cases (>30%) in the second week of February 2020 and the highest number (53.2%) in Lombardy. Study limitations notwithstanding, this is astonishing. First, in plain language, a SARS-CoV-2 strain was circulating as early as September of 2019, and perhaps even earlier. Second, 77% of the patients smoked for over 30 years (the remainder were ex-smokers), 93% were older than 55 years, and 60% were overweight. These were not exactly healthy people and they were all asymptomatic for Covid-19. Also, according to the authors, it wasn’t until November–December 2019 that many general practitioners began reporting the appearance of severe respiratory symptoms in elderly and frail people with atypical bilateral bronchitis, which was attributed, in the absence of news about the new virus, to aggressive forms of seasonal influenza. So where did these infections come from, and more importantly, were the earlier viral infections less severe?
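For readers who like to check the arithmetic, the short sketch below recomputes the overall seropositivity from the counts quoted above and attaches a 95% Wilson score interval to it. The interval is my own addition to show the sampling uncertainty in a sample of this size; it is not a figure reported by the study's authors, and it says nothing about other study limitations such as antibody cross-reactivity.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.
    z = 1.96 corresponds to roughly 95% coverage."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


positives, screened = 111, 959  # counts quoted in the text
prevalence = positives / screened
low, high = wilson_interval(positives, screened)

print(f"{prevalence:.1%} seropositive; 95% interval {low:.1%} to {high:.1%}")
```

The point estimate matches the 11.6% quoted above, and the interval runs from roughly 10% to 14%, so sampling noise alone does not make the overall signal disappear.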
Analysis of sewage in Barcelona, Spain, also turned up some strange results. These were 24-hour composite raw sewage samples from two large wastewater treatment plants analyzed for the presence of SARS-CoV-2 from April 13, 2020, at the peak of the epidemic, to May 25. In addition, frozen archival samples were analyzed from one plant in 2018 (January-March), 2019 (January, March, September-December) and 2020 (January-March). This latter analysis showed that all samples were negative for the presence of SARS-CoV-2 genomes with the exception of March 12, 2019, in which both IP2 and IP4 target assays (fragments from the RNA-dependent RNA polymerase gene) were positive. This study was criticized heavily and remains in the medRxiv archive (a preprint archive); I have not seen it published in the peer-reviewed literature. Whether those results are truly believable remains to be seen.
While we have seen very little information about what happened in Wuhan prior to December, a group of scientists did present a non-peer-reviewed study entitled “Analysis of hospital traffic and search engine data in Wuhan China indicates early disease activity in the Fall of 2019.” While roundly criticized for several reasons, the study used previously validated data streams (satellite imagery of hospital parking lots and Baidu search queries of disease-related terms) to investigate whether something was going on in the fall of 2019 in Wuhan. The authors found an upward trend in hospital traffic and search volume beginning in late summer and early fall 2019. While queries of the respiratory symptom “cough” show seasonal fluctuations coinciding with yearly influenza seasons, “diarrhea” is a more COVID-19-specific symptom and only shows an association with the current epidemic. Their conclusion was that the increase of both signals preceded the documented start of the SARS-CoV-2 pandemic in December. “Digital” studies of this kind have been used in clinical research many times before and have shown remarkable predictive power.
Finally, major genetic research on many SARS-CoV-2 strains, published in preprint form (not peer reviewed, and the full paper is not accessible), suggests that the Indian subcontinent has the highest strain diversity. The researchers also emphasize that Wuhan was not the locus where h2h SARS-CoV-2 transmission first happened, and that before it spread to Wuhan, the virus had already experienced adaptive evolution during its h2h transmission. In my opinion, some adaptive evolution might have occurred but not enough to make any strain infectious enough to start a pandemic.
Collectively these results are still low-level “circumstantial” evidence, but they make me wonder if one or more strains very close in genetic composition to the strain that emerged in Wuhan in late December of 2019 were circulating in Europe for several months and even present for several months earlier in Wuhan. If this really did occur, why didn’t the pandemic take off earlier? Regardless of what the Chinese say, and clearly they are trying hard to deflect all the criticism of the pandemic having originated in China, I suspect it was because two key changes in the coronavirus genome weren’t fully optimized. These changes would enable the virus to move from phase 3 to phase 4. The recent work of the same Chinese researchers seems ironically to support this conclusion; they also make an interesting argument that a fast evolutionary rate, meaning lots of mutations that may have been selected based on evolutionary pressure, may have resulted from loss of genetic material coding for some of the nonstructural proteins (nsps), resulting in loss of replication fidelity. Alright then, what are the key mutations?
The Receptor-binding Domain and the Furin Cleavage Site
The receptor-binding domain (RBD) in the SARS-CoV-2 virus is critical to human infection. A recent key study determined the crystal structure of the spike protein RBD for SARS-CoV-2 in complex with the cell receptor, ACE2. Compared to the RBD for the original SARS coronavirus, an ACE2-binding ridge in the SARS-CoV-2 RBD has a more compact conformation; moreover, several amino acid changes in the SARS-CoV-2 RBD stabilize two virus-binding hotspots at the RBD–ACE2 interface. These structural features of SARS-CoV-2 RBD increase its ACE2-binding affinity. Even more interesting, RaTG13 also uses human ACE2 as its receptor. The authors of the study noted [italics are my emphasis]: “Second, as with SARS-CoV-2, bat RaTG13 RBM [a region of the RBD] contains a similar four-residue [amino acid] motif [a characteristic pattern of amino acids] in the ACE2 binding ridge, supporting the notion that SARS-CoV-2 may have evolved from RaTG13 or a RaTG13-related bat coronavirus. Third, the L486F, Y493Q and D501N residue changes from RaTG13 to SARS-CoV-2 enhance ACE2 recognition and may have facilitated the bat-to-human transmission of SARS-CoV-2. A lysine-to-asparagine mutation at the 479 position in the SARS-CoV RBD [the original SARS] (corresponding to the 493 position in the SARS-CoV-2 RBD) enabled SARS-CoV to infect humans. Fourth, Leu455 contributes favourably to ACE2 recognition, and it is conserved [meaning present in both] between RaTG13 and SARS-CoV-2; its presence in the SARS-CoV-2 RBM may be important for the bat-to-human transmission of SARS-CoV-2.” In summary, several mutations were necessary for the putative RaTG13 progenitor coronavirus to evolve into the pandemic Covid-19 virus, but with enough humans and large loads of virus, these changes could have easily occurred.
The polybasic furin cleavage site not only confers pathogenicity but is essential for the virus to infect human lung cells. All coronaviruses have a spike protein, which is incorporated into the viral envelope and facilitates viral entry into target cells. Specifically, surface unit S1 binds to a cellular receptor while the transmembrane unit S2 facilitates fusion of the viral membrane with a cellular membrane. However, membrane fusion depends on S protein cleavage by host cell proteases at the S1/S2 and the S2′ site, which results in S protein activation. The enzyme furin, ubiquitous in human cells and organs, actually does the cleaving, and because it is literally everywhere in the human body, this means the virus can get in anywhere.
The actual cleavage site in the COVID-19 virus was created by a 12‐nucleotide insert, TCCTCGGCGGGC, coding for a PRRA [proline-arginine-arginine-alanine] amino acid sequence at the S1/S2 junction. Of interest, the two adjacent arginine residues are coded by a pair of CGG codons (CGGCGG), which is rare for coronaviruses: only 5% of arginines are coded by CGG in SARS‐CoV‐2 or RaTG13; moreover, CGGCGG in the new insert is the only doubled instance of this codon in SARS‐CoV‐2. For a progenitor virus such as RaTG13, how would it acquire this magic sequence? One suggestion by Latham and Wilson is that it occurred through recombination with the sequence encoding an epithelial sodium channel protein called ENaC-a, whose furin cleavage site is identical over eight amino acids to that of SARS-CoV-2, when RaTG13 (or something like it) found itself in human epithelial or lung tissues. Obviously, this is a hypothesis, but no one else has been able to explain how the SARS-CoV-2 virus acquired it.
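If you want to see how those 12 nucleotides spell out PRRA, the sketch below translates the insert with the standard genetic code in each of the three possible reading frames. One caveat: as quoted, the insert does not begin on a codon boundary, so I am assuming the relevant frame starts at its second base. In that frame the last codon (GC plus one more base) is completed by whatever nucleotide follows the insert in the genome, so the code tries all four possibilities rather than guessing. The helper names are mine.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate every complete codon of dna starting at offset.
    A trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


INSERT = "TCCTCGGCGGGC"  # the 12-nucleotide insert quoted in the text

# All three reading frames of the insert on its own.
for offset in range(3):
    print(f"frame offset {offset}: {translate(INSERT, offset)}")

# Assumed frame (offset 1): complete the final 'GC' codon with each
# possible following base and collect the resulting peptides.
completed = {translate(INSERT + base, 1) for base in BASES}
print(completed)
```

Only the frame starting at the second base gives proline followed by two arginines, both encoded by CGG. Because every codon beginning with GC codes for alanine, the fourth residue comes out as alanine no matter which base follows, giving PRRA.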
I apologize; this is tough stuff to understand. For those of you who are more visually oriented, look at Figure 2.

Figure 2. Single point amino acid mutations upon a comparison of RaTG13 and SARS-CoV with SARS-CoV-2 are shown in cyan green and light pink spheres, respectively. Mutations among the clinical isolates are shown in red. Comparative structural analysis between RaTG13 and SARS-CoV-2 highlights crucial mutations restricted to the ACE2 receptor-binding domain. The four-amino-acid insert PRRA was observed in all clinical isolates and is absent in RaTG13; this is highlighted as green spheres. The corresponding nucleotide alignment is also shown, highlighting gaps. The insert in the disordered region appears at a solvent-accessible site, which could play a crucial role in receptor binding.
Summing up
There is some evidence that coronavirus strains similar to SARS-CoV-2 might have been circulating for several months, possibly longer, prior to the outbreak in Wuhan. Most probably this was a sideshow if it did indeed occur. The reason I say that is if somewhere along the line the key mutations occurred in an individual who was in another country besides China, “patient zero” would have had to quickly fly to Wuhan without infecting anyone else, otherwise the pandemic would have had more than one focal point. Also, and no one is commenting on this, given the lineage of coronaviruses, if strains similar to SARS-CoV-2 were circulating around the world earlier in 2019, they most likely had their origin in China.
We do not yet know the sequence of genetic events that led some SARS-CoV-2 ancestor to become the virus strain that caused the outbreak in Wuhan, and in all likelihood that ancestor may not be the much-discussed RaTG13 coronavirus strain. Importantly, none of what we have discussed here invalidates the lab accident theory. We just don’t know enough to hedge our bets.
If there’s one thing I have learned this year it is this: just because a thousand virologists and epidemiologists say something is just so, it doesn’t mean they are right.