MERS-Coronavirus Molecular Epidemiology and Genetic Analysis
http://epidemic.bio.ed.ac.uk/node/16/revisions/16/view#overlay-context=coronavirus_analysis - This is an update of an older analysis based on 5 sequences.
We now have 9 complete genome sequences:
Four sequences from the Al-Hasa hospital outbreak have now been isolated and deposited in GenBank by Cotten, M., Watson, S.J., Palser, A.L., Gall, A., Kellam, P., Zumla, A., Memish, Z.A. and the Kingdom of Saudi Arabia, Ministry of Health, Riyadh 11176, Kingdom of Saudi Arabia. The GenBank accessions are KF186564-KF186567.
As these are tightly linked epidemiologically, I have taken the most recent of them (Al-Hasa 1, collected on 2013-05-09) to add to the genetic analysis.
Name | Location | Accession | Source | Date of collection |
---|---|---|---|---|
KSA/EMC | KSA | http://www.ncbi.nlm.nih.gov/nuccore/JX869059.2 - JX869059 | http://epidemic.bio.ed.ac.uk/coronavirus_background - Patient 3 | 2012-06-13 |
Qatar/UK England1 | Qatar | http://www.ncbi.nlm.nih.gov/nuccore/KC667074.1 - KC667074.1 | http://epidemic.bio.ed.ac.uk/coronavirus_background - Patient 4 | 2012-09-12 |
England2 | KSA? | http://www.hpa.org.uk/webc/HPAwebFile/HPAweb_C/1317138176202 - HPA Website | http://epidemic.bio.ed.ac.uk/coronavirus_background - Patient 10 | 2013-02-10 |
Jordan-N3 | Jordan | http://www.ncbi.nlm.nih.gov/nuccore/KC776174 - KC776174 | http://epidemic.bio.ed.ac.uk/coronavirus_background - Patient 1 | between 2012-04-09 and 2012-04-19 |
Munich/Abu_Dhabi | Abu Dhabi | http://www.virology-bonn.de/index.php?id=46 - Institute of Virology Website | http://epidemic.bio.ed.ac.uk/coronavirus_background - Patient 17 | 2013-03-22 |
Al-Hasa 1 | KSA | http://www.ncbi.nlm.nih.gov/nuccore/KF186567 - KF186567 | Unknown | 2013-05-09 |
When did these strains share a common ancestor?
With sequences sampled from different times, we can attempt to estimate the rate of evolution. To do this we estimated a maximum likelihood tree under the GTR + gamma model of substitution using PhyML. This is the unrooted maximum likelihood topology with estimated branch lengths:
A maximum likelihood tree estimated using PhyML and the GTR + G model. Branch lengths are in substitutions per site. The tree is arbitrarily rooted midway between the most distant sequences. Numbers below-left of the nodes are bootstrap percentages of 1000 replicates.
A rate of evolution for these sequences can be estimated by root-to-tip regression with our software http://tree.bio.ed.ac.uk/software/pathogen - Path-O-Gen . Here only one Al-Hasa sequence is used, as these are strongly linked epidemiologically and cannot be considered independent points. The regression plots genetic distance from the root of the tree against the time of isolation of each virus:
The root-to-tip regression of genetic distances against time of isolation using the maximum likelihood tree above. The position of the root of the tree was found to maximize the correlation of this plot.
The estimate of the rate of evolution is given by the slope of the line and the time of the most recent common ancestor by the x-intercept:
Rate: | 1.48 x 10^-3 subst/site/year |
tMRCA: | 2011.48 |
This would place the common ancestor of all sampled viruses in the middle of 2011. This rate of evolution is similar to rates for epidemic influenza A and to at least one estimate of the rate of SARS-CoV evolution in humans. The residuals from the regression line give some indication of the stochasticity of the molecular evolutionary process.
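The relationship between the regression line, the rate and the tMRCA can be sketched in a few lines of Python. This is only an illustration, not the Path-O-Gen implementation: it fits an ordinary least-squares line and does not search for the root position. The function names are hypothetical, the Jordan-N3 date is taken as the midpoint of its reported range (an assumption), and the distances are constructed to lie exactly on the reported line; they are not the real root-to-tip distances.

```python
from datetime import date


def decimal_year(d: date) -> float:
    """Convert a calendar date to a fractional year."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def root_to_tip_fit(times, distances):
    """Least-squares fit of distance against time.

    Returns (rate, tmrca): the slope of the line and its x-intercept.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    tmrca = -intercept / rate  # time at which the fitted distance is zero
    return rate, tmrca


# Collection dates from the table above.
# Jordan-N3: midpoint of 2012-04-09..2012-04-19 (assumption).
dates = [
    date(2012, 6, 13),   # KSA/EMC
    date(2012, 9, 12),   # Qatar/UK England1
    date(2013, 2, 10),   # England2
    date(2012, 4, 14),   # Jordan-N3
    date(2013, 3, 22),   # Munich/Abu_Dhabi
    date(2013, 5, 9),    # Al-Hasa 1
]
times = [decimal_year(d) for d in dates]

# Illustrative distances only: placed exactly on the reported line,
# NOT the actual root-to-tip distances from the tree.
REPORTED_RATE = 1.48e-3    # subst/site/year
REPORTED_TMRCA = 2011.48
distances = [REPORTED_RATE * (t - REPORTED_TMRCA) for t in times]

rate, tmrca = root_to_tip_fit(times, distances)
print(f"rate  = {rate:.2e} subst/site/year")
print(f"tMRCA = {tmrca:.2f}")
```

With real distances the points scatter around the line, and that scatter is the residual variation discussed above.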
What is the nearest non-human host relative?
The closest non-human sequence to the human cases is a http://www.ncbi.nlm.nih.gov/nucleotide/291167238 - short fragment from a CoV isolated from a pipistrelle bat in the Netherlands collected in 2008 . The fragment consists of 332 nucleotides of pp1b, located at nucleotide 15033 in the human CoV genomes. There are 41 differences between the human cases and the bat sequence, giving a divergence of 0.123 subst/site which, at the same rate as above (about 1.5 x 10^-3 subst/site/year), corresponds to an MRCA existing about 40 years ago. With the previous rate of 4.4 x 10^-3 this would be over 150 years. So this fragment can tell us little about the possible location and species of the reservoir host for the human cases.
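The arithmetic behind the 40-year figure can be checked with a short sketch. It assumes that the divergence is shared equally between the two lineages descending from the MRCA, so the time back to the MRCA is the divergence divided by twice the rate. The text does not state this, but it reproduces the quoted figure. No correction for multiple substitutions at the same site is applied, and the function names are hypothetical.

```python
def p_distance(differences: int, length: int) -> float:
    """Uncorrected proportion of differing sites."""
    return differences / length


def years_to_mrca(divergence: float, rate: float) -> float:
    """Time back to the MRCA, assuming both lineages evolve at `rate`
    (subst/site/year), so divergence accumulates at 2 * rate."""
    return divergence / (2.0 * rate)


divergence = p_distance(41, 332)             # human cases vs. bat fragment
years = years_to_mrca(divergence, 1.5e-3)    # rate from the regression above

print(f"divergence   = {divergence:.3f} subst/site")  # 0.123
print(f"time to MRCA = {years:.0f} years")             # 41
```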
Interpretation
Based on the above results and the restricted geographical range of the known cases, it seems unlikely that this virus has been circulating entirely in humans since these sequences shared a common ancestor. Although it is certain that the virus can spread from human to human (familial cases have been noted, as has the large hospital-associated cluster at Al-Hasa, KSA), a single introduction into humans followed by an epidemic would be unlikely to have remained restricted to the Arabian Peninsula (the UK case from January was a transitory visitor to Saudi Arabia).
A more likely interpretation of the data would be multiple zoonotic transmissions from an animal reservoir. If the reservoir has a high contact rate with humans (e.g., a domesticated or farmed animal) then multiple small chains of human transmission could be hypothesized, which would account for the cases that have been described so far.
It is also clear that a group of the more recently isolated viruses is developing (consisting of Qatar/England1, Munich/Abu_Dhabi, England2 and the Al-Hasa sequences). It is possibly more plausible that this represents an emerging cluster of human-circulating cases with a common ancestor in the second half of 2012. http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2813%2960982-4/abstract - A recent paper in the Lancet has suggested the virus has an incubation period of up to 12 days. Between 2012.6 (the tMRCA of this clade) and 2013.35 (the time of the most recent sample) there is time for a minimum of 22 incubation periods. At any significant growth rate (R0 >> 1) this would result in a large number of infections. In this scenario, the recorded cases represent a small fraction of the total number of cases, with a bias towards severe cases and traced contacts. However, if this were the case then the virus would be likely to have spread more widely globally, with occasional severe cases cropping up to indicate this.
With R0 < 1 it is possible to get chains of transmission without reaching high total numbers of cases, but these will generally die out. How long (in terms of the number of infections) these chains will be before they stochastically die out will depend on the value of R0.
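The numbers in this argument can be reproduced with a minimal sketch. Following the text, it treats the 12-day incubation period as the length of one transmission generation, which is a simplification. The R0 values are hypothetical and chosen only for illustration. The expected chain size of 1 / (1 - R0) for R0 < 1 is the standard result for a simple branching process, not something estimated from these data.

```python
DAYS_PER_YEAR = 365.25


def incubation_periods(start_year: float, end_year: float,
                       incubation_days: float) -> int:
    """Whole number of incubation periods that fit between two decimal years."""
    return int((end_year - start_year) * DAYS_PER_YEAR // incubation_days)


def cases_in_generation(r0: float, generations: int) -> float:
    """Expected cases in a given generation from one index case,
    with no depletion of susceptibles or intervention."""
    return r0 ** generations


def expected_chain_size(r0: float) -> float:
    """Expected total cases in a chain (index case included) when R0 < 1."""
    if not 0.0 <= r0 < 1.0:
        raise ValueError("chains are only expected to die out for 0 <= R0 < 1")
    return 1.0 / (1.0 - r0)


generations = incubation_periods(2012.6, 2013.35, 12)
print(f"incubation periods: {generations}")  # 22

# Hypothetical R0 values, for illustration only.
print(f"R0 = 2.0: about {cases_in_generation(2.0, generations):,.0f} cases "
      f"in generation {generations}")
for r0 in (0.5, 0.9):
    print(f"R0 = {r0}: expected chain size {expected_chain_size(r0):.0f}")
```

Even a modest R0 above 1 sustained over 22 generations implies far more infections than have been recorded, whereas an R0 below 1 gives short chains whose expected length grows as R0 approaches 1.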
Once again, sequence data from more individual cases and potentially some consideration of the spatial pattern of these may be able to tell us more about the likely number of zoonotic events and degree of human circulation.
Deeper virus origins
Whilst the (relatively) close phylogenetic relationship of the human virus to bat coronaviruses may indicate bats as an ultimate source of this virus, it seems unlikely that bats are the immediate contact for the human cases, as human-bat contacts are relatively infrequent. Speculatively, it would be plausible that the virus crossed from bats to a domesticated or agricultural animal and then spread widely in the Arabian Peninsula within the last few years. Further surveillance of both bats and other potential reservoirs will undoubtedly be ongoing and the epidemiology of this virus will become clearer.