At the intersection of machine learning and life science
Despite some compelling arguments specialists might make for the ecological benefits of mosquitoes, they are, to put it mildly, a generally unloved species. But far from being a mere nuisance, the mosquito is considered, from a public health perspective, to be the most dangerous animal on the planet. Mosquitoes are vessels for an ever-changing repertoire of viruses, bacteria, and other microorganisms, some of which can be deadly if transmitted to human or animal hosts. This picture is made more complicated by differences between mosquito species, their travel and feeding patterns, and the predicted expansion of their global breeding grounds due to climate change.
Multiple efforts in the past five years have used metagenomic sequencing to catalog the contents of mosquitoes collected from around the world. However, Chan Zuckerberg Biohub’s Amy Kistler points out that these DNA and RNA catalogs emerge from what she calls “the blender approach”: sequencing dozens of mosquitoes mashed together in one sample. Kistler, an Infectious Disease Initiative Group Leader at the Biohub, argues that this approach, combined with an overemphasis on identifying viruses rather than the full swath of microorganisms carried by mosquitoes, misses potentially critical epidemiological information that can inform interventions and treatments.
For example, the information garnered from bulk sequencing approaches fails to address important questions key to understanding the transmission and circulation of pathogens, such as how prevalent a given microbe is within mosquito populations, what the animal reservoirs are for specific microbes, and how bacteria and eukaryotic microbes influence the fitness of viruses in mosquitoes and vice versa.
In a recent paper published in the journal eLife, Kistler’s team took a different, “cherry in every cocktail” approach, sequencing the entire contents of individual mosquitoes collected from various sites in California in the fall of 2017. This unearthed a treasure trove of data that included sequences from dozens of previously unknown viruses, sequences from recent blood meals that identified the mosquitoes’ animal reservoirs, and even sequences that fill in genomic gaps in known viruses. The single-mosquito approach also gave Kistler’s team unprecedented information about co-occurrences of these various sequences, which could ultimately help with xenosurveillance—identifying emerging pathogens circulating in humans and other animals.
“I think that’s where metagenomics really shines; you see everything. You learn about the host, the pathogens they carry, and the animals they’re biting,” says Kistler, who was joined in the work by colleagues from the Gothenburg Global Biodiversity Centre in Sweden; the Alameda County Mosquito Abatement District; Stanford University; and UC San Francisco. “And metagenomics is particularly powerful towards understanding factors like prevalence and linking blood meals and pathogens—if you can measure them in individual insects.”
Current surveillance by vector-control agencies in California involves directed screening of hundreds of mosquitoes for known mosquito-borne human pathogens including West Nile, St. Louis Encephalitis, and Western Equine Encephalitis viruses, to determine whether these are circulating. Although this approach can detect changes from one week to the next that could warrant a call to action, according to Kistler it is not sufficiently quantitative, especially as blood meal sequences make up only a tiny sliver of the data. The single-mosquito approach is more labor-intensive and expensive, but the richness of the data could warrant using it to complement more basic surveillance efforts.
“Single-mosquito analysis allows us to understand, in more quantitative terms, what viruses and other pathogens might be lurking in the mosquitoes and from which animal the mosquitoes might be acquiring or transmitting pathogens,” says Kistler.
In the current work, viruses dominated those sequences that were not from the mosquitoes themselves. In the 148 mosquitoes the team tested from five sites throughout California, the approach identified 24 viruses that are closely related or identical to known species, as well as 46 novel virus species. A few mosquitoes were carrying a shocking amount of virus, with nearly 100% of the non-host sequences coming from a single virus, prompting Kistler to begin referring to mosquitoes as “flying bags of viruses.”
Among the prokaryotic sequences, the team unsurprisingly found Wolbachia bacteria, which most mosquitoes carry. Eukaryotes rounded out the data, and about half of the eukaryotic sequences were from Trypanosomatidae, including some known animal parasite species. They also found sequences from mammals, birds, invertebrates, fungi, plants, and the potential human pathogen phyla Apicomplexa and Nematoda.
The most basic level of information the approach provides is the full mosquito microbe repertoire, along with insights into how particular microbes correlate or anti-correlate with human or animal pathogens. This could ultimately point to new potential control agents like Wolbachia, which is insect-specific and known to interfere with its host’s ability to carry a number of viruses, including dengue, Zika, and chikungunya.
In the blender approach, monitoring relationships between mosquitoes, blood meals, and pathogens requires multiple assays in parallel. Moreover, bulk analysis of multiple mosquitoes makes it impossible to directly link an individual insect’s pathogen load with transmission. The single-mosquito approach allows these data to be connected more directly, using just the one sequencing assay.
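The difference between the two approaches can be illustrated with a minimal sketch. It is not the team's pipeline: the function names, the microbe labels, and the toy data are all hypothetical, and detection is reduced to a simple present/absent call per mosquito. Under those assumptions, per-mosquito results yield a prevalence for each microbe, while a pooled sample can only say whether a microbe was present somewhere in the pool.

```python
from collections import Counter


def prevalence(per_mosquito_hits):
    """Fraction of individual mosquitoes in which each microbe was detected."""
    n = len(per_mosquito_hits)
    if n == 0:
        return {}
    counts = Counter(m for hits in per_mosquito_hits for m in set(hits))
    return {microbe: count / n for microbe, count in counts.items()}


def pooled_detection(per_mosquito_hits):
    """What a pooled ("blender") sample can report: presence or absence only."""
    return set().union(*per_mosquito_hits)


# Hypothetical toy data: four mosquitoes with invented microbe labels.
samples = [
    {"virus_A", "Wolbachia"},
    {"Wolbachia"},
    {"virus_A", "virus_B", "Wolbachia"},
    set(),
]

per_microbe = prevalence(samples)      # one fraction per microbe
pool_result = pooled_detection(samples)  # a set of names, no fractions
```

In this toy case the pooled result lists all three microbes without distinguishing the one found in three of four mosquitoes from the one found in a single insect, which is the information the single-mosquito approach preserves.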
One such connection is between sequences that commonly co-occur with each other. In one of several vignettes in the paper, the team describes sequences from Anaplasma bacteria in more than half of the mosquitoes that also had blood meal sequences from local mule deer, suggesting that anaplasmosis—an illness that in its late stages can lead to respiratory failure—could be a burden on the local deer population.
“There’s a growing encyclopedia of knowledge around animal species derived from new types of sequencing approaches like this one,” Kistler says, in reference to mule deer and other animal reservoirs whose genomes are not yet fully sequenced.
Looking at co-occurrences also allowed the authors to bring to light some viral “dark matter”: sequences that can’t be assigned to any genome because they don’t resemble anything in reference databases. Because they sequenced individual mosquitoes, the team could exploit the fact that all segments of viruses with segmented genomes—such as orthomyxoviruses, which have 6 to 8 segments—will co-occur within mosquitoes that carry the virus, but not within those that don’t carry the virus. Indeed, this co-occurrence analysis allowed the team to identify all of the missing segments from two orthomyxoviruses of mosquitoes that were reported in previous studies, and two novel ones they report in their paper.
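The logic of this co-occurrence analysis can be sketched in a few lines. This is an illustration under stated assumptions, not the method used in the paper: the contig names, the mosquito IDs, the use of Jaccard similarity, and the 0.9 threshold are all hypothetical choices. The idea is that unassigned sequences found in exactly the same mosquitoes as a known viral segment are candidate missing segments of that virus.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of mosquito IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_occurring_contigs(presence, seed, min_jaccard=0.9):
    """Return contigs whose presence pattern across mosquitoes matches the seed's.

    presence maps a contig name to the set of mosquito IDs in which it was found.
    min_jaccard is an arbitrary illustrative cutoff, not a published value.
    """
    seed_samples = presence[seed]
    return sorted(
        name
        for name, samples in presence.items()
        if name != seed and jaccard(seed_samples, samples) >= min_jaccard
    )


# Hypothetical toy data: one recognizable segment plus unassigned contigs.
presence = {
    "known_segment_1": {"m01", "m02", "m07"},
    "unknown_contig_a": {"m01", "m02", "m07"},
    "unknown_contig_b": {"m01", "m02", "m07"},
    "unknown_contig_c": {"m03", "m07"},
    "unknown_contig_d": {"m04"},
}

candidates = co_occurring_contigs(presence, "known_segment_1")
```

Here `unknown_contig_a` and `unknown_contig_b` appear in the same three mosquitoes as the known segment and are returned as candidates, while the other two contigs are not. A pooled sample would offer no such pattern to compare, since every contig would share the same single sample.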
Extending this co-occurrence analysis even further, the team was able to discover—minus any preconceived ideas or information about sequences—that a group of narnaviruses, RNA viruses previously considered to consist of a single naked RNA encoding only an RNA-dependent RNA polymerase, actually has an additional segment consisting of a highly divergent open reading frame. This segment encodes a protein with no significant sequence or structural similarity to any previously known protein motif or domain in sequence databases.
“This conceptually simple approach allows us to break free from reliance on sequence databases, revealing completely novel sequence information. There’s something just so profound about that,” says Kistler.
The single-mosquito approach makes it possible to simultaneously monitor the full collection of microbes, the blood meals, the animal reservoirs, and the prevalence of each microbe. It can also be applied at different time points to follow the evolution of pathogens.
Besides identifying new potential microbial threats to animals and humans, the approach should also be useful for tracking down viruses that we know are still circulating, but that have disappeared from our radar. For instance, Zika virus outbreaks have happened periodically since the 1960s, the most recent one having played out in 2015 and 2016. Because we aren’t sure of Zika’s animal reservoir, we can’t distinguish whether it’s been kept at bay since then because the human population has gained resistance, because the reservoir has died out, or for some other reason entirely. By sequencing individual insects, we may discover reemerging Zika viruses in the mosquito population along with the identity of the animal reservoir, which would help answer these questions.
“If there was a pathogen outbreak happening in an endemic area of the world, I would want to deploy this approach, in addition to the more conventional methods,” says Kistler. “And I think there are certainly ways to scale it to make it cost-effective and feasible.”