Periodic Reporting for period 1 - CONTESSA (COuNt data TimE SerieS Analysis: significance tests and sequencing data application)

Summary

The aim of this project is to develop methods for the analysis of time series based on count data. The scope of the project extends to general analysis of count time-series data, such as clustering, classification, perturbation inference and machine learning over sequential count data. The project focuses on count data sets from ribonucleic acid sequencing (RNA-seq) time-course experiments. The project has promising potential applications in biology; recent examples include high-throughput sequencing analyses such as RNA-seq and chromatin immunoprecipitation sequencing (ChIP-seq) and, more recently, single-cell sequencing.

Work performed

The work carried out during the fellowship was planned as a learning period and a very productive research development period. The project led me to develop six research lines, together with a dissemination and communication project, which are detailed in the next section. One paper is under review, others are ready for submission and some are still ongoing. To further develop and publish these research results, I was recently awarded a visiting academic position at the Department of Computer Science.
To disseminate intermediate results, I took part in many research meetings and workshops. I also attended summer schools, organised scientific events myself and had weekly meetings with my supervisors, Neil Lawrence and Eleni Vasilaki.

Final results

As soon as I started my fellowship I realised that the new technology of single-cell RNA sequencing was emerging for the study of gene expression. Given the advent of this technology, I focused instead on this kind of sequencing data and the problems connected to it. I broadened my project to the general analysis of single-cell RNA-seq count data, such as clustering, classification and perturbation inference. At the beginning of my fellowship I joined the group of Professor Neil Lawrence and learned about Gaussian processes (GPs), attending the GP Summer Schools in September 2015 and September 2016 and helping with their organisation. I also learned how to use GitHub to share work, as well as the basics of Python and Jupyter Notebooks.

In Sheffield I collaborated with Marta Milo and Guillaume Hautbergue on Amyotrophic Lateral Sclerosis (ALS) data. I joined a very important project in which a new experimental technology was being tested. I handled new RNA-seq data and developed a custom pipeline covering every step from alignment to differential expression analysis and protein-protein interaction network estimation. The proposed technology, GRASPS (Genome-wide RNA Analysis of Stalled Protein Synthesis), is a novel translatome technology to identify the functional consequences of widespread RNA dysregulation in neurodegeneration. This collaborative work has been presented at the Sheffield neuroscience conference and is currently being revised for journal submission.

In the meantime I started my secondment at the University of Manchester, where I collaborated with the group of Professor Magnus Rattray. There I worked with a computational biology team and started a project on finding co-oscillating genes in a given set of RNA-seq data (bulk or single-cell). This study led us to develop a method and a software tool called PyScope: Detecting oscillatory gene networks. This collaborative work has been presented at the data science 2017 meeting in Manchester and at the ISMB 2017 meeting in Prague. We propose a full analysis pipeline on the resulting gene network to identify communities of significantly co-oscillating genes.

I also focused on network community extraction methods and their validation. This is extremely important when dealing with real biological or social networks. Indeed, one way of summarising a network is through its communities, that is, the main representative groups of strongly connected nodes (elements). It is therefore crucial to be able to rely on robust community extraction methods. This led me to develop, in collaboration with Annamaria Carissimo and Italia Defeis, a method for validating community robustness in networks. We show the results obtained with the proposed technique on simulated and real datasets. This work is currently in its second round of revision at a top statistical journal and was presented at the machine learning conference NIPS 2016 in Barcelona.
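The report does not describe the validation method itself. Purely as a generic illustration of what "robustness" of extracted communities can mean, the hypothetical sketch below (not the authors' procedure) perturbs a graph by randomly deleting edges, re-runs a simple community extraction method (asynchronous label propagation, swapped in here only for illustration), and scores agreement with the original partition using the Rand index. All function names and parameters (`label_propagation`, `perturb`, `robustness`, `p`, `trials`) are assumptions made for this example.

```python
import random
from itertools import combinations


def label_propagation(adj, seed=0, max_iter=100):
    """Illustrative community extraction (hypothetical choice): each node
    repeatedly adopts the most frequent label among its neighbours."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # start with one community per node
    nodes = sorted(adj)
    for _ in range(max_iter):
        changed = False
        rng.shuffle(nodes)
        for v in nodes:
            if not adj[v]:
                continue  # isolated nodes keep their own label
            counts = {}
            for u in adj[v]:
                counts[labels[u]] = counts.get(labels[u], 0) + 1
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    return labels


def rand_index(a, b):
    """Fraction of node pairs on which two partitions agree
    (placed together in both, or apart in both)."""
    pairs = list(combinations(sorted(a), 2))
    agree = sum((a[u] == a[v]) == (b[u] == b[v]) for u, v in pairs)
    return agree / len(pairs)


def perturb(adj, p, rng):
    """Return a copy of the undirected graph with each edge deleted
    independently with probability p."""
    new = {v: set() for v in adj}
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if u < v and rng.random() >= p:
                new[u].add(v)
                new[v].add(u)
    return new


def robustness(adj, p=0.1, trials=20, seed=0):
    """Mean Rand index between the reference partition and the partitions
    of perturbed copies (values close to 1.0 suggest stable communities)."""
    rng = random.Random(seed)
    reference = label_propagation(adj, seed=seed)
    scores = []
    for _ in range(trials):
        noisy = perturb(adj, p, rng)
        labels = label_propagation(noisy, seed=rng.randrange(2**32))
        scores.append(rand_index(reference, labels))
    return sum(scores) / len(scores)


def two_cliques(k=5):
    """Toy graph: two k-cliques joined by a single bridge edge."""
    adj = {v: set() for v in range(2 * k)}
    for block in (range(k), range(k, 2 * k)):
        for u, v in combinations(block, 2):
            adj[u].add(v)
            adj[v].add(u)
    adj[k - 1].add(k)
    adj[k].add(k - 1)
    return adj


if __name__ == "__main__":
    print(round(robustness(two_cliques(), p=0.2), 3))
```

In this sketch, a score close to 1.0 means the communities found on the toy graph survive random edge deletion; a real validation procedure would also compare such scores against a null model to judge significance, which is omitted here.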

Discussing ideas with the ML group in Sheffield, I was introduced to the team of Professor Ernst Wit, who leads a COST Action on networks called COSTNET (COST Action CA15109). I took part in the first COSTNET meeting in 2016. There I exchanged ideas on network validation with Mirko Signorelli, which led to a fruitful collaboration on network validation techniques. We developed an inferential procedure for validating community structure in networks. This work is currently being revised for submission to a statistical journal.

I had the opportunity to take part in the launch of the single-cell facility at BMS, where I was invited to give a talk. Within the facility I started a project on addressing the Fluidigm C1 doublet problem and on detecting the developmental stage of a single cell before sequencing. This work is in collaboration with Max Zwießele, Paul J Gokhale, Marcelo Rivolta and Marta Milo. The Fluidigm C1 is a single-cell analysis system that uses simplified single-cell isolation and cell processing based on I

Website & more info

More info: http://www.sheffield.ac.uk/neuroscience/staff/cutillo.