Metagenomics-Based Analysis of Population-Level Genomic Variation in Hot Spring Microbial Communities
By: Daniel Hogan, Andrés Williams
Department: Biology
Faculty Advisor: Dr. José R. De La Torre
Metagenomics, the sequencing of genomic DNA from entire microbial communities, has altered our understanding of the microbial world, revealing unsuspected levels of biodiversity. Most metagenomic analyses have focused on obtaining metagenome-assembled genomes, which represent the consensus genome sequences of organisms in the community. However, these datasets also contain information about the genetic variation present within each species in the community. To date, the extent of this microdiversity, along with the environmental and evolutionary forces driving it, has largely been ignored. Prior work in our lab examined the microdiversity of natural populations of thermophilic ammonia-oxidizing archaea (ThAOA) using metagenomic datasets sequenced with Illumina short-read technologies (reads of 100-150 bp). We found great variability in the genetic diversity of ThAOA populations in hot springs across three continents, as indicated by the density of single nucleotide polymorphisms (SNPs). Some populations (Spain) appeared almost clonal, with little or no genetic variation across the entire genome. In contrast, populations in the United States (Nevada) and China (Yunnan) showed substantial diversity across all loci. We have now examined whether other dominant microorganisms in the samples show similar patterns of diversity. In the Spanish community, the second most abundant organism was a close relative of Thermoflexus hugenholtzii. As with the ThAOA in this sample, we found remarkably few SNPs in this T. hugenholtzii relative, suggesting that the microbial community in this engineered hydrothermal system may be extremely young. In springs with genetically diverse populations of ThAOA, we likewise found high SNP densities across the genomes of other abundant microorganisms. More recently, we have looked at genomic variability at larger scales within populations.
Flexible genomic islands, often containing multiple genes or operons, arise when individual cells acquire large genomic fragments through horizontal gene transfer. Genes in flexible islands have higher ratios of non-synonymous to synonymous changes than genes in the core genome. These results suggest that different selective pressures may be acting on genes in the core and flexible regions of the genome. In the future, we will explore the use of long-read sequencing technologies to improve our assessments of genomic variability in these communities.