Report

Summary, work performed and final results

Periodic Reporting for period 2 - MetaPlat (Development of an Easy-to-use Metagenomics Platform for Agricultural Science)

Summary

The MetaPlat project is focussed on creating an easy-to-use integrated hardware and software platform to enable the rapid analysis of large metagenomic datasets. It addresses the subject holistically: the platform handles large metagenomic datasets and produces in-depth analyses and comparisons, thereby allowing researchers to make full use of the data generated for each sample. Achieving this requires many disciplines and skills to be brought together. It also requires a mixture of creative research, focused application and commercial awareness, which together will lead to the development of such a platform.
Dairy livestock can convert tough plant material such as grass into high-quality, high-protein products for human nutrition through microbial fermentation in their digestive tracts, but a by-product of this process is substantial methane production. Methane is a greenhouse gas with a greater heat-trapping capacity than CO2, and livestock produce it in vast quantities worldwide. The dairy and beef industries have significant economic, nutritional and cultural value, so it is not feasible to demand that everybody stop drinking milk and eating meat. Any strategy that aims to mitigate greenhouse gas emissions in agriculture must therefore also maintain the efficiency of cattle in food production, which requires investigating the action of microbes in livestock. With this in mind, we are creating an easy-to-use, high-performance machine learning platform with the objective of enabling the rapid analysis of large metagenomic datasets, in order to better understand the microbial mechanisms behind efficient food production, better meat quality and methane production. The project goal has been broken down into the following core objectives:
• Sample collection, preparation and sequencing
• Curation of the reference databases (a new phylogeny-aware classification, and classification of previously unclassified sequences using machine learning)
• Development of accurate classification algorithms
• Real-time or time-efficient comparison analyses
• Production of statistical and visual representations that convey more useful information
• Platform integration
• Providing insights into probiotic supplement usage, methane production and feed conversion efficiency in cattle

Work performed

Through genomic sequencing and analysis, MetaPlat researchers are now starting to understand which microbes, genes and enzymes are responsible for the worst effects of this methane production. However, genomic sequencing produces so much data that it is extremely difficult to make sense of it all and mine it for patterns. MetaPlat partners in research institutes across the EU are therefore developing machine learning software and high-performance computing tools, which are giving promising results. In controlled experiments with SRUC (Scotland's Rural College), we have identified microbes and molecular pathways that lead to greenhouse gas emissions in dairy production. We have also identified feeding strategies and feed supplements that promote the growth of microbes that reduce methane output while also increasing food production.
This EU-funded research is also giving valuable insight into how microbes can be harnessed to convert grass and other plant material into biofuels.
Specialist ‘transfer of knowledge’ workshops have been delivered to research fellows in both molecular biology and cloud computing. Two international research conferences have also been organised around MetaPlat, in which all partners participated:
• CERC 2016 (http://www.cerc-conference.eu/)
• CERC 2017 (http://www.cerc-conference.eu/)
Several workshops were organised for the project, including the international Workshop on Data Analytics in Metagenomics (http://scm.ulster.ac.uk/~e10267487/DAM2017/index.html), held in November 2017 in conjunction with the IEEE BIBM 2017 conference in Kansas City, USA.
MetaPlat has thus far addressed the following key objectives: sample collection, preparation and sequencing; the development of accurate classification algorithms; time-efficient comparison analyses; visualisation; platform integration; and providing insights into probiotic supplement usage, methane production and feed conversion efficiency in cattle.

Final results

Integrating metagenomics with network analysis to reveal the full extent of microbial gene diversity and complex microbial interactions is a major contribution of MetaPlat. One MetaPlat study investigated the rumen microbial community in cattle by combining metagenomic and network-based approaches. One of its main contributions beyond the state of the art is a random matrix theory-based approach for automatically determining the correlation threshold used to construct the co-abundance network associated with methane emissions. The resulting network exhibits a clear modular structure, with certain trait-specific genes highly over-represented in particular modules. More specifically, all 20 genes previously identified as associated with methane emissions are found in a single module (hypergeometric test, p < 10⁻¹¹). One third of the genes are involved in methane metabolism pathways.
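The report does not describe how the random matrix theory (RMT) threshold is computed. The Python sketch below illustrates the general idea behind RMT-based network construction, under our own assumptions: as weak correlations are removed from a gene co-abundance matrix, the nearest-neighbour spacing distribution of its eigenvalues shifts from the Gaussian orthogonal ensemble (GOE, Wigner surmise) towards a Poisson (exponential) law, and the first threshold at which the Poisson law fits better is chosen. The cubic polynomial unfolding, the Kolmogorov-Smirnov comparison, the threshold grid and all function names are illustrative assumptions, not MetaPlat's implementation. The last function computes a one-sided hypergeometric p-value of the kind used to test whether trait-associated genes are over-represented in a module. NumPy is required.

```python
import numpy as np
from math import comb

def goe_cdf(s):
    """CDF of the Wigner surmise (GOE): spacing law of random, unstructured correlations."""
    return 1.0 - np.exp(-np.pi * s ** 2 / 4.0)

def poisson_cdf(s):
    """CDF of the exponential (Poisson) spacing law: typical of non-random, modular structure."""
    return 1.0 - np.exp(-s)

def unfolded_spacings(eigvals, degree=3):
    """Nearest-neighbour eigenvalue spacings after polynomial unfolding, scaled to mean 1."""
    ev = np.sort(eigvals)
    staircase = np.arange(1, ev.size + 1)        # cumulative count N(E) of eigenvalues <= E
    fit = np.polyfit(ev, staircase, degree)       # smooth approximation of N(E) (assumed cubic)
    s = np.clip(np.diff(np.polyval(fit, ev)), 0.0, None)
    return s / s.mean() if s.mean() > 0 else None  # None when all eigenvalues coincide

def ks_distance(samples, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of samples and a reference CDF."""
    x = np.sort(samples)
    n = x.size
    f = cdf(x)
    return max((np.arange(1, n + 1) / n - f).max(), (f - np.arange(n) / n).max())

def rmt_threshold(corr, thresholds):
    """Smallest |r| cut-off whose thresholded matrix has Poisson-like rather than GOE-like spacings."""
    for t in thresholds:
        adj = np.where(np.abs(corr) >= t, corr, 0.0)  # drop weak co-abundance links
        s = unfolded_spacings(np.linalg.eigvalsh(adj))
        if s is not None and ks_distance(s, poisson_cdf) < ks_distance(s, goe_cdf):
            return t
    return None

def enrichment_p(k, N, K, n):
    """One-sided hypergeometric p-value P(X >= k): a module of n genes drawn from N, K of which are trait genes."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    abundance = rng.normal(size=(40, 60))          # hypothetical toy data: 40 samples x 60 genes
    corr = np.corrcoef(abundance, rowvar=False)    # 60 x 60 gene-gene correlation matrix
    grid = np.round(np.arange(0.10, 0.90, 0.05), 2)
    print("selected threshold:", rmt_threshold(corr, grid))
    print(enrichment_p(4, 10, 4, 4))               # equals 1/210
```

The toy data are random, so the selected threshold has no biological meaning; in practice the matrix would hold gene abundances per animal. For the enrichment test, the arguments are the overlap k, the total number of genes N, the number of trait genes K and the module size n.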
MetaPlat uses an asynchronous, high-throughput queueing system that lends itself both to scaling up (making individual processing nodes more powerful) and to scaling out (adding more processing nodes in parallel). Such queueing systems have a number of advantages. First, their asynchronous nature keeps resource usage as efficient as possible: long-running jobs do not needlessly hold on to I/O resources and their associated threads. Second, loose coupling between a queue and its consumers permits the creation of multiple consumers without significantly affecting the functioning of the queue itself; the queue does not need to 'know' about or manage its consumers. Scaling therefore becomes a relatively simple matter of adding more processes on a multi-core node, or adding more nodes in a distributed system. Some data processing is more complex, because it must recombine the results of parallel and distributed processes. For this, we will in future implement an Actor Model (as exemplified by Akka or the Erlang language), which effectively provides a queueing system at a more fine-grained level.
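As a minimal illustration of the loose coupling described above, the sketch below uses Python's standard asyncio.Queue within a single process: jobs are placed on a queue, any number of interchangeable consumers take them off, and scaling out amounts to starting more consumers without changing the queue. This is a stand-in under stated assumptions; the report does not name MetaPlat's queueing technology, and a distributed deployment would use a networked message broker rather than an in-process queue. The job format, the sleep standing in for an analysis step and the function names are hypothetical.

```python
import asyncio

async def consumer(name, queue, results):
    """Repeatedly take a job from the shared queue; the queue never needs to know this consumer exists."""
    while True:
        job_id, payload = await queue.get()
        try:
            await asyncio.sleep(0.01)            # hypothetical stand-in for a long-running analysis step
            results[job_id] = (name, sum(payload))
        finally:
            queue.task_done()                    # mark the job finished even if processing fails

async def run_pipeline(jobs, n_consumers):
    """Enqueue all jobs, start n_consumers workers and wait until every job is done."""
    queue = asyncio.Queue()
    results = {}
    for job_id, payload in enumerate(jobs):
        queue.put_nowait((job_id, payload))
    # Scaling out means starting more consumers; the queue itself is unchanged.
    workers = [asyncio.create_task(consumer(f"worker-{i}", queue, results))
               for i in range(n_consumers)]
    await queue.join()                           # returns once task_done() has been called for every job
    for w in workers:
        w.cancel()                               # consumers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results

if __name__ == "__main__":
    out = asyncio.run(run_pipeline([[1, 2, 3], [4, 5], [6]], n_consumers=2))
    print({k: v[1] for k, v in sorted(out.items())})  # {0: 6, 1: 9, 2: 6}
```

Because the queue never references its consumers, changing n_consumers is the only edit needed to scale the sketch out.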
The project has also produced visualisations of the metagenomic data, which are crucial for understanding the microbial diversity in the gut. Visualisation tools have been incorporated into the MetaPlat metagenomics pipeline to display microbial data as principal coordinates analysis (PCoA) plots, bar charts and bubble plots. The use of metagenomics in this project has provided unprecedented insight into the form and function of heterogeneous communities of microorganisms and their vast biodiversity, without the need to isolate and culture particular organisms in the lab. Microbial communities affect human and animal health, support the growth of plants, are critical components of all terrestrial and aquatic ecosystems, and can be exploited to produce fuels or chemicals. Metagenomics thus pervades a number of hugely important industries central to economic growth and employment. MetaPlat will allow us to better understand the microbial mechanisms behind efficient food production, better meat quality and methane production.
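PCoA plots are usually produced by computing a pairwise dissimilarity between samples and embedding it in two or three dimensions. The sketch below assumes Bray-Curtis dissimilarity on an abundance table (a common choice for microbial community data; the report does not state which distance MetaPlat uses) and implements classical principal coordinates analysis by double-centring the squared distance matrix and taking its leading eigenvectors. Plotting is omitted, and the function names and toy abundance table are hypothetical. NumPy is required.

```python
import numpy as np

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between samples (rows) of an abundance table."""
    n = counts.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            d[i, j] = d[j, i] = num / den if den > 0 else 0.0
    return d

def pcoa(dist, n_axes=2):
    """Classical principal coordinates analysis of a symmetric distance matrix."""
    n = dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (dist ** 2) @ centring   # double-centred (Gower) matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pos = np.clip(eigvals[:n_axes], 0.0, None)     # negative eigenvalues give zero-length axes
    coords = eigvecs[:, :n_axes] * np.sqrt(pos)    # sample coordinates on the leading axes
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained

if __name__ == "__main__":
    # Hypothetical abundance table: 4 samples x 3 taxa
    table = np.array([[10, 0, 5], [8, 1, 6], [0, 12, 3], [1, 10, 4]], dtype=float)
    coords, explained = pcoa(bray_curtis(table))
    print(coords.round(3))
    print("fraction of variation on axes 1-2:", explained.round(3))
```

Negative eigenvalues can arise because Bray-Curtis is not a Euclidean distance; the sketch simply gives such axes zero length, whereas established tools offer corrections such as the Lingoes or Cailliez methods.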

Website & more info

More info: http://www.metaplat.eu/.