Opendata, web and dolomites

Report

Teaser, summary, work performed and final results

Periodic Reporting for period 1 - VIROGENESIS (Virus discovery and epidemic tracing from high throughput metagenomic sequencing)

Summary

VIROGENESIS addresses the core challenge of advancing methodology that maximizes the use of Next-Generation Sequencing (NGS) data in biomedical and clinical settings. The full potential of NGS for routine use in clinical and health laboratories is hampered because current methodology is not sufficiently adapted to these data. The VIROGENESIS project has identified three specific bioinformatics bottlenecks that prevent the effective use of NGS in clinical and epidemiological settings.
1) NGS datasets typically contain known and unknown RNA and DNA viruses, but current virus detection methods lack the sensitivity needed to discover novel viruses, and unassigned sequence reads are currently discarded from analyses. New metagenomics classifiers are needed that can handle large datasets, characterize unknown viruses and classify known viruses at higher resolution.
2) Once virome data have been assembled and classified, phylogenetic investigations provide information on virus dynamics during outbreaks, when the source and mode of transmission must be identified and the geographic spread quickly mapped. Current inference models are not adapted to large and incompletely assembled NGS data. New algorithms are needed that bring methodological improvements in terms of data size, data format and speed of analysis.
3) Software that can efficiently present the wealth of results from NGS analyses is lacking, which hinders the translation of complex, diverse information into ready-to-use data for clinical researchers and public health officials. Novel visualisations should scale to complex and diverse information and handle the uncertainty present in the output of analyses.
Outbreaks of viral pathogens can cause global epidemics, and timely detection of causative agents and real-time surveillance are crucial for controlling transmission and understanding the factors that contribute to virus spread. NGS technologies have advantages over traditional methods in the rapid and sensitive identification of novel or existing viruses and in tracking population dynamics within a patient or across an entire population. VIROGENESIS supports the use of viral NGS data by allowing more comprehensive and rapid characterization of emerging viruses from clinical and surveillance samples.
Hence, the main objective of VIROGENESIS is to develop software that addresses the major bottlenecks in using NGS data for virus detection and surveillance. The software will be incorporated into existing bioinformatics pipelines and infrastructures and made available as free, modular and open-source software, which offers bioinformatics developers and enterprises opportunities to exploit the solutions commercially. To this end, the project has been divided into three smaller research and innovation objectives. Tools will be developed for rapid discovery, recovery and annotation of potential viral reads from metagenomics data, and for virome shift detection and epidemic tracing starting from the assembled, typed and annotated NGS viral data. Novel visual data encodings will be designed together with user interfaces for the VIROGENESIS tools in a single platform. A fourth objective focuses on strategies to provide VIROGENESIS software to end-users and stakeholders and to integrate the software for the long term. Activities include pilot studies, workshops and training.

Work performed

The work to be carried out during the project is divided into 5 Work Packages (WP), all of which started at the beginning of the project. The first period of VIROGENESIS is characterized by significant advances in each WP, both in research and innovation and in the coordination and exploitation of software solutions. Initially, research efforts focused on prototype development for WP1 (assembly, discovery and typing) and WP2 (virome shift and epidemic tracing). For WP1, a reference-based nucleotide sequence assembler and aligner was developed based on a compressed numerical representation; it achieved a significant reduction in alignment time compared to conventional methods while maintaining accuracy. Progress was made on virus classification based on protein secondary structure and domain identification, in addition to standard sequence matching algorithms (BLAST). A web application was built for accurate, rapid typing of known viruses. For WP2, a novel pipeline for classifying viral species within metagenomes was developed, incorporating results from WP1. New methods were developed for the phylogenetic placement of NGS virome data through ancestral state reconstruction and alignment-free sequence matching. The genealogical transmission model of the popular BEAST software was extended with a hierarchical phylogenetic model, allowing NGS data to be analysed in a Bayesian statistical framework based on bifurcating trees. A new probabilistic version of PhyloType was accompanied by a novel method for fast dating of sequence data. The performance of these solutions is being assessed by the consortium and user representatives using simulated and real metagenomics data. For WP3, the exploration of new visual data encodings focused on the challenges of scalability and uncertainty in new data, as well as on connecting metadata and user interactivity.
The design of interactive Graphical User Interfaces (GUIs) was started for the different tools, and an outline of the general architecture to integrate the tools in bioinformatics was defined. VIROGENESIS solutions are being promoted to the different key partners and stakeholders, including individual clinical and public health researchers, laboratories and institutes, and larger infrastructures (WP4). Pilot studies are being prepared to demonstrate the innovation of VIROGENESIS in the areas of diagnostics, public health and virus discovery. Training modules have been included in the well-regarded international VEME workshop to introduce VIROGENESIS solutions to clinicians, virologists and researchers. The continuation of VIROGENESIS beyond the lifetime of the project is being explored through links with strategic partners and EU initiatives for the uptake of software into large infrastructures. VIROGENESIS solutions will be distributed as open-source software, which allows for commercial exploitation that addresses the needs of public health institutes and diagnostic laboratories. A pilot business case was developed to define the market potential and go-to-market strategy, which may also be useful for other SMEs.
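To make the idea of alignment-free sequence matching on a compressed numerical representation (mentioned above for WP1 and WP2) more concrete, the following minimal sketch packs each k-mer into a single integer using 2 bits per nucleotide and compares sequences by the Jaccard similarity of their k-mer sets. It is an illustration under stated assumptions, not the VIROGENESIS implementation: the 2-bit encoding, the choice of Jaccard similarity, and the names `kmer_codes`, `jaccard` and `best_match` are all hypothetical, and the report does not specify which representation or matching method the project actually uses.

```python
from typing import Dict, Optional, Set

# Illustrative 2-bit code per nucleotide (an assumption, not the project's scheme).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq: str, k: int) -> Set[int]:
    """Return the set of k-mers in seq, each packed into one integer.

    Each base occupies 2 bits. Windows that contain any character other
    than A, C, G or T (for example N) are skipped.
    """
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases
    codes: Set[int] = set()
    value = 0
    valid = 0  # consecutive valid bases ending at the current position
    for base in seq.upper():
        code = BASE_CODE.get(base)
        if code is None:
            value, valid = 0, 0  # restart after an ambiguous base
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= k:
            codes.add(value)
    return codes


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(read: str, references: Dict[str, str], k: int = 4) -> Optional[str]:
    """Return the name of the reference sharing the most k-mer content with
    the read, or None if no reference shares any k-mer with it."""
    read_kmers = kmer_codes(read, k)
    best_name: Optional[str] = None
    best_score = 0.0
    for name, ref in references.items():
        score = jaccard(read_kmers, kmer_codes(ref, k))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Toy sequences invented for this example; they are not real viral genomes.
    toy_refs = {"ref_a": "ACGTACGTAC", "ref_b": "TTTTGGGGCC"}
    print(best_match("CGTACG", toy_refs))
```

Because k-mers are stored as integers, set intersection and union are cheap, which is one reason such representations are attractive for large read sets. A practical classifier would additionally need to consider reverse complements, longer k-mers and a similarity threshold, none of which are covered here.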

Final results

The project was positioned at the ‘idea of application’ stage, but the prototypes will evolve into complete ‘proof-of-concept’ applications. The impact of VIROGENESIS is already visible. Public health has been severely threatened by the recent, explosive outbreaks of Zika virus in the Americas and Yellow Fever virus in Angola. In response, we decided to release an arbovirus typing tool into the public domain early, pending its publication. The tool is widely used (more than 70,000 accessions since its release at http://bioafrica2.mrc.ac.za/rega-genotype/typingtool/aedesviruses/). The uptake of VIROGENESIS software solutions into the 2017 International Bioinformatics Workshop on Virus Evolution and Molecular Epidemiology, which will take place in Lisbon, Portugal (https://rega.kuleuven.be/cev/veme-workshop/2017), enables young and established researchers, virologists and clinicians to keep up with the latest trends in infectious diseases and in pathogen tracking and detection.

Website & more info

More info: http://www.virogenesis.eu.