
Report

Teaser, summary, work performed and final results

Periodic Reporting for period 1 - PopMet (Investigating bacterial strain evolution through metagenomic genome assemblies)

Teaser

In microbial genomics, classical reductionist approaches focusing on single species, genes and genomes are increasingly being replaced by holistic “whole-community” metagenomic studies. In this approach, the DNA of all genomes found in a given ecosystem is randomly sequenced, and...

Summary

In microbial genomics, classical reductionist approaches focusing on single species, genes and genomes are increasingly being replaced by holistic “whole-community” metagenomic studies. In this approach, the DNA of all genomes found in a given ecosystem is randomly sequenced, and the genomic content of the microorganisms within the environment is reconstructed in order to quantify the taxonomic and functional diversity of the community. Earlier metagenomic studies mostly identified taxa at the genus or broader taxonomic levels, typically using highly conserved genes found in all bacteria (marker genes). However, theoretical advances in the use of single nucleotide variants (SNVs) now allow individual bacterial species, or even strains, to be distinguished. These advances are critically important, as the pathogenicity and metabolism of bacteria are often strain-specific. SNVs can also guide the reconstruction of bacterial genomes, thus linking taxonomy to specific functions.
In this project, we exploited the information in SNVs to understand the bacteria and their functional capacities in a diverse set of environments. This differs from classical metagenomics in two ways: we have a higher resolution to differentiate between single strains, and we can recover parts of the genomes and link them to the taxonomic identity of a bacterium, giving us more information about who is doing what in a given environment. This is important for understanding how pathogenic bacteria are distributed in the environment, where they normally occur and when they turn from a “bystander” into an “aggressor” that damages the host. This kind of switch is known for many bacteria; for example, E. coli is a common member of the gut ecosystem but can cause diarrhea under opportunistic circumstances. Understanding bacterial distributions, and what can be considered normal, was thus an important research goal of this project.
A second aim was to investigate how bacterial genomes evolve over time, as this is important for understanding how the phenotype of a bacterium can change. Normally these phenotypic changes are not harmful to the host, but in rare cases they can lead to a switch from a bystander to an aggressor strain.

Work performed

In the PopMet project, the main question was how much strain variability exists among the bacteria living in an environment. This question can be broken down into three parts: first, how bacterial strains can be identified reliably; second, whether the genomes of these bacteria can be reconstructed from metagenomes; and third, whether their “evolutionary progress” can be tracked over time or between different environments.
For the PopMet project, an extensive software pipeline for assembling genomes from metagenomes was developed, henceforth referred to as MATAFILER (https://github.com/hildebra/MATAFILER/). With the help of this pipeline, we analysed the core samples of the PopMet project (a gut microbial time series) as well as other microbial samples.
In a first step, the genomes of selected species were reconstructed and assembled. Subsequently, the same species from different patients were compared to estimate their global genetic divergence. The genetic diversity within each patient’s time series was extremely low in all cases. The first research question was to define and quantify bacterial strains. One established approach uses the 16S rRNA gene as a phylogenetic marker gene, but its resolution is too low to even reliably identify a bacterial species. Instead, I investigated the use of SNVs to resolve species at the sub-species level. Relying on stable, single-copy marker genes in bacterial genomes, I extended the mOTU approach by including 40 and 100 stable core genes and then compared the marker genes of a given species between samples. This was further extended to comparing whole genomes between samples, where these could be recovered. The latter two approaches reliably identified the same species from the same patient as the same bacterial strain.
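The marker-gene comparison between samples can be illustrated with a simplified sketch. It assumes that the marker genes of one species have already been extracted from two samples and aligned gene by gene to equal length; it pools the identity over all shared genes and calls the two samples the same strain above a hypothetical identity `threshold` of 99%. The function names, input layout and threshold are illustrative assumptions, not parameters of the mOTU approach or of MATAFILER.

```python
from typing import Dict

# Hypothetical input: marker gene name -> aligned sequence ("-" marks a gap)
Markers = Dict[str, str]


def marker_identity(markers_a: Markers, markers_b: Markers) -> float:
    """Fraction of identical bases over all marker genes shared by two samples,
    skipping alignment columns that contain a gap."""
    compared = matches = 0
    for gene in sorted(set(markers_a) & set(markers_b)):
        seq_a, seq_b = markers_a[gene], markers_b[gene]
        if len(seq_a) != len(seq_b):
            raise ValueError(f"{gene}: sequences must be aligned to equal length")
        for x, y in zip(seq_a, seq_b):
            if x == "-" or y == "-":
                continue
            compared += 1
            matches += x == y
    return matches / compared if compared else 0.0


def same_strain(markers_a: Markers, markers_b: Markers,
                threshold: float = 0.99) -> bool:
    """Hypothetical rule: same strain if the pooled identity >= threshold."""
    return marker_identity(markers_a, markers_b) >= threshold


sample_t0 = {"geneA": "ACGTACGTAC", "geneB": "TTGACCA-GT"}
sample_t1 = {"geneA": "ACGTACGTAC", "geneB": "TTGACCAAGT", "geneC": "AAAA"}
other_host = {"geneA": "ACGTTCGTAC", "geneB": "TTGACCAAGA"}
print(same_strain(sample_t0, sample_t1))   # True  (19 of 19 bases identical)
print(same_strain(sample_t0, other_host))  # False (17 of 19 bases identical)
```

Pooling over many single-copy genes, rather than relying on one gene such as 16S, is what gives this kind of comparison its extra resolution; the same idea extends to whole-genome alignments when genomes can be recovered.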
The second part of my analysis was to reconstruct bacterial genomes from metagenomes. For this, a new algorithm was implemented, with which we discovered a new bacterial species in a sample that coincided with an antibiotic treatment. This new species probably represents a new family of Clostridiaceae. Using the reconstructed genome, we could show its presence in samples from the same patient before and after the antibiotic treatment. The genomes of four co-occurring species in the same sample were also assembled.
The third question was to estimate genetic variability and population behaviour over time. Here the reconstructed genomes were essential, as they enabled a precise mapping of metagenomic reads to the assembled bacterial genomes of a specific patient. This also raised the question of which algorithms are suitable for calling SNVs in a potentially highly diverse and heterogeneous strain mix within metagenomes. To call genetic variants within a time series, a novel SNV calling pipeline is being developed. Using this pipeline, we identified SNVs that are undergoing fixation over a time course of four years.
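A minimal sketch of how SNVs undergoing fixation might be flagged in a time series is shown below. It assumes the frequency of each variant allele has already been estimated at every time point; a variant is reported when its frequency rises from at most `low` at the first sample to at least `high` at the last. The SNV identifiers and both thresholds are hypothetical, and this is not the pipeline described above; a real analysis would also weigh read depth and statistical support at each time point.

```python
from typing import Dict, List, Tuple

# Hypothetical input: SNV id -> [(time in years, variant allele frequency), ...]
Trajectories = Dict[str, List[Tuple[float, float]]]


def snvs_going_to_fixation(trajectories: Trajectories,
                           low: float = 0.2, high: float = 0.9) -> List[str]:
    """Report SNVs whose allele frequency rises from <= low at the first
    time point to >= high at the last time point."""
    hits = []
    for snv, points in trajectories.items():
        if len(points) < 2:
            continue  # a single time point says nothing about change
        ordered = sorted(points)  # sort by sampling time
        first_freq, last_freq = ordered[0][1], ordered[-1][1]
        if first_freq <= low and last_freq >= high:
            hits.append(snv)
    return sorted(hits)


trajectories = {
    "pos1200_A>G": [(0.0, 0.05), (1.5, 0.40), (4.0, 0.97)],
    "pos88_C>T": [(0.0, 0.30), (4.0, 0.35)],
    "pos501_G>A": [(4.0, 0.95), (0.0, 0.10)],  # input need not be time-ordered
}
print(snvs_going_to_fixation(trajectories))
# ['pos1200_A>G', 'pos501_G>A']
```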

Final results

Two publicly available software packages were developed, one of which is currently submitted for publication. Both will help other researchers with similar questions about bacterial ecology. The first, the MATAFILER pipeline, is a primary tool for metagenome assembly and species-level genome reconstruction. A special focus during its development was on profiling the taxonomic composition of a sample using 16S rDNA, 40 marker genes and 10 marker genes. The second, the rtk (rarefaction toolkit) software, is aimed at exploring the diversity within a bacterial community while accounting for low-abundance bacteria.
These tools were applied to a variety of projects, including research into the global distribution of bacterial subspecies in different countries, how bacteria can be transferred between human hosts by FMT (fecal microbiota transplant), and how antibiotic resistance is distributed among bacteria across the globe. Thus, the project enabled an in-depth understanding of bacterial species in diverse environments, ranging from the human gut microbiome to the oceans of the world. In effect, we showed that bacterial species are often subdivided into subspecies clusters, but that only in very few species do these clusters correspond to the geographic location of the human host. Bacterial strains of the same species can be transferred between human hosts by FMT, although this happens only rarely. We also found very little genetic variation within the species living in a single human host. Exploring microenvironments and the bacteria that inhabit them is an important step towards understanding how nutrients are recycled and what benefits bacteria provide to us, but also towards understanding undesired reactions to bacteria, such as autoimmune responses.
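Rarefaction, the technique the rtk software is named after, subsamples every sample to the same sequencing depth so that diversity estimates become comparable between samples. The sketch below is a generic illustration of rarefaction without replacement followed by a richness estimate; it is not rtk's code or API, the function names and toy counts are hypothetical, and expanding counts into one list entry per read is only practical for small tables.

```python
import random
from collections import Counter
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Draw `depth` reads without replacement from a taxon -> read-count table."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("rarefaction depth exceeds the number of reads")
    # One list entry per read, so sampling without replacement is exact.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def mean_richness(counts: Dict[str, int], depth: int,
                  repeats: int = 10, seed: int = 0) -> float:
    """Average number of taxa observed over repeated rarefactions."""
    rng = random.Random(seed)
    observed = [len(rarefy(counts, depth, seed=rng.randrange(2**32)))
                for _ in range(repeats)]
    return sum(observed) / repeats


# Toy counts: a low-abundance taxon is often lost at shallow depth.
sample = {"Bacteroides": 500, "Faecalibacterium": 300, "rare_taxon": 2}
print(mean_richness(sample, depth=100))
```

Repeating the subsampling and averaging, rather than rarefying once, reduces the chance that a single unlucky draw decides whether a low-abundance taxon is counted.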
To disseminate my knowledge and experience, I taught a visiting school class about evolution and introduced metagenomics to participants of the Biology Olympiad. I was also responsible for a one-day training course in metagenomics (bioinformatics) in Leuven, Belgium. So far, the project has generated six peer-reviewed publications in high-ranking journals. Through these activities, I have encouraged young people to pursue a scientific career and helped strengthen Europe as a knowledge-based economy.

Website & more info

More info: https://github.com/hildebra/Rarefaction.