Opendata, web and dolomites


Data-Driven Genomic Computing



Project "GeCo" data sheet

The following table provides information about the project.


Organization address
city: MILANO
postcode: 20133

contact info: n.a.

Coordinator country: Italy [IT]
Project website: n.a.
Total cost: 2˙500˙000 €
EC max contribution: 2˙500˙000 € (100%)
Programme: 1. H2020-EU.1.1. (EXCELLENT SCIENCE - European Research Council (ERC))
Code Call: ERC-2015-AdG
Funding Scheme: ERC-ADG
Starting year: 2016
Duration (year-month-day): from 2016-09-01 to 2021-08-31


Take a look at the project's partnership.

#  participant            country      role         EC contrib. [€]
1  POLITECNICO DI MILANO  IT (MILANO)  coordinator  2˙500˙000.00


 Project objective

Next-generation sequencing technology has dramatically reduced the cost and time of reading DNA. Huge investments are targeted at sequencing the DNA of large populations, and repositories of well-curated sequence data are being collected. Answers to fundamental biomedical problems are hidden in these data, e.g. how cancer arises, how driving mutations occur, and how much cancer depends on the environment. But genomic computing has not evolved at a comparable pace. Bioinformatics has been driven by specific needs and distracted from a foundational approach; hundreds of methods solve individual problems but miss the broad perspective.

The objective of GeCo is to rethink genomic computing through the lens of basic data management. We will first design the data model, using a few general abstractions that guarantee interoperability between existing data formats. Next, we will design a new-generation query language inspired by classic relational algebra and extended with orthogonal, domain-specific abstractions for genomics. Query processing will trace metadata and computation steps, opening doors to the seamless integration of descriptive statistics and high-level data analysis (e.g., DNA region clustering and extraction of regulatory networks).
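To make the idea of a region-based data model with relational-style operators more concrete, here is a minimal Python sketch. It is not the project's actual data model or query language: the names `Region`, `Sample`, `select` and `count_overlaps`, the half-open coordinate convention, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Region:
    """A genomic interval; half-open coordinates [start, end) are an assumption."""
    chrom: str
    start: int
    end: int
    score: float = 0.0


@dataclass
class Sample:
    """One sample: its regions plus free-form metadata key/value pairs."""
    regions: List[Region]
    metadata: Dict[str, str] = field(default_factory=dict)


def select(
    samples: List[Sample],
    meta_pred: Callable[[Dict[str, str]], bool] = lambda m: True,
    region_pred: Callable[[Region], bool] = lambda r: True,
) -> List[Sample]:
    """Relational-style selection: keep samples whose metadata match,
    filter their regions, and carry the metadata along to the result."""
    return [
        Sample([r for r in s.regions if region_pred(r)], dict(s.metadata))
        for s in samples
        if meta_pred(s.metadata)
    ]


def overlaps(a: Region, b: Region) -> bool:
    """True when two regions share a chromosome and at least one position."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_overlaps(reference: List[Region], sample: Sample) -> List[Tuple[Region, int]]:
    """Domain-specific operator: count sample regions overlapping each reference region."""
    return [(ref, sum(overlaps(ref, r) for r in sample.regions)) for ref in reference]


# Hypothetical data, invented for this example.
samples = [
    Sample(
        [Region("chr1", 100, 200, 5.0), Region("chr1", 150, 300, 1.0), Region("chr2", 10, 50, 9.0)],
        {"tissue": "breast", "assay": "ChIP-seq"},
    ),
    Sample([Region("chr1", 120, 180, 7.0)], {"tissue": "liver", "assay": "ChIP-seq"}),
]
genes = [Region("chr1", 90, 160), Region("chr2", 100, 200)]

breast = select(
    samples,
    meta_pred=lambda m: m.get("tissue") == "breast",
    region_pred=lambda r: r.score >= 2.0,
)
counts = count_overlaps(genes, breast[0])
```

In this sketch the selection works on metadata and on region attributes separately, while the overlap count is the kind of genomics-specific operation that plain relational algebra does not provide out of the box.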

Genomic computing is a “big data” problem; therefore, we will also achieve computational efficiency by using parallel computing on both clusters and public clouds. The choice of a suitable data model and of computational abstractions will boost performance in a principled way. The resulting technology will be applicable to individual and federated repositories, and will be exploited to provide integrated access to curated data, made available by large consortia, through user-friendly search services. Our most far-reaching vision is to move towards an Internet of Genomes exploiting data indexing and crawling. The PI’s background in distributed data, data modelling, query processing and search will drive a radical paradigm shift.
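One way to see why the data model matters for parallelism: regions on different chromosomes never overlap, so the data can be partitioned by chromosome and each partition processed independently. The Python sketch below illustrates this under stated assumptions. It is not the project's engine; the function names are hypothetical, and a local thread pool merely stands in for a cluster or cloud backend.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open by assumption


def partition_by_chrom(intervals: List[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group intervals by chromosome so partitions can be processed independently."""
    parts: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        parts[chrom].append((start, end))
    return dict(parts)


def count_pairs(a: List[Tuple[int, int]], b: List[Tuple[int, int]]) -> int:
    """Count overlapping pairs within one partition (quadratic, kept simple for clarity)."""
    return sum(1 for s1, e1 in a for s2, e2 in b if s1 < e2 and s2 < e1)


def parallel_overlap_count(
    left: List[Interval], right: List[Interval], workers: int = 4
) -> Dict[str, int]:
    """Count overlapping pairs per chromosome, one task per shared chromosome."""
    lp, rp = partition_by_chrom(left), partition_by_chrom(right)
    shared = sorted(set(lp) & set(rp))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: count_pairs(lp[c], rp[c]), shared)
        return dict(zip(shared, results))


# Hypothetical data, invented for this example.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50)]
genes = [("chr1", 150, 550), ("chr2", 40, 80), ("chr3", 1, 2)]
per_chrom = parallel_overlap_count(peaks, genes)
```

Threads in CPython do not speed up CPU-bound work like this; the point of the sketch is only the partition, process and merge structure, which a distributed engine could run across many machines.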


List of publications.
year authors and title journal last update
2018 Marco Masseroli, Arif Canakoglu, Pietro Pinoli, Abdulrahman Kaitoua, Andrea Gulino, Olha Horlova, Luca Nanni, Anna Bernasconi, Stefano Perna, Eirini Stamoulakatou, Stefano Ceri
Processing of big heterogeneous genomic datasets for tertiary analysis of Next Generation Sequencing data
published pages: , ISSN: 1367-4803, DOI: 10.1093/bioinformatics/bty688
Bioinformatics 2019-06-14
2017 Fabio Cumbo, Giulia Fiscon, Stefano Ceri, Marco Masseroli, Emanuel Weitschek
TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas
published pages: , ISSN: 1471-2105, DOI: 10.1186/s12859-016-1419-5
BMC Bioinformatics 18/1 2019-06-14
2017 Vahid Jalili, Matteo Matteucci, Marco Masseroli, Stefano Ceri
Explorative visual analytics on interval-based genomic data and their metadata
published pages: , ISSN: 1471-2105, DOI: 10.1186/s12859-017-1945-9
BMC Bioinformatics 18/1 2019-06-14
2017 Abdulrahman Kaitoua, Pietro Pinoli, Michele Bertoni, Stefano Ceri
Framework for Supporting Genomic Operations
published pages: 443-457, ISSN: 0018-9340, DOI: 10.1109/TC.2016.2603980
IEEE Transactions on Computers 66/3 2019-06-14
2018 Fabrizio Celli, Fabio Cumbo, Emanuel Weitschek
Classification of Large DNA Methylation Datasets for Identifying Cancer Drivers
published pages: , ISSN: 2214-5796, DOI: 10.1016/j.bdr.2018.02.005
Big Data Research 2019-06-14
2017 Alice Cambiaghi, Manuela Ferrario, Marco Masseroli
Analysis of metabolomic data: tools, current strategies and future challenges for omics data integration
published pages: bbw031, ISSN: 1467-5463, DOI: 10.1093/bib/bbw031
Briefings in Bioinformatics 2019-06-14
2017 Stefano Ceri, Abdulrahman Kaitoua, Marco Masseroli, Pietro Pinoli, Francesco Venco
Data Management for Heterogeneous Genomic Datasets
published pages: 1251-1264, ISSN: 1545-5963, DOI: 10.1109/TCBB.2016.2576447
IEEE/ACM Transactions on Computational Biology and Bioinformatics 14/6 2019-06-14
2017 Vahid Jalili, Matteo Matteucci, Marco Masseroli, Stefano Ceri
Indexing Next-Generation Sequencing data
published pages: 90-109, ISSN: 0020-0255, DOI: 10.1016/j.ins.2016.08.085
Information Sciences 384 2019-06-14
2017 Vahid Jalili, Matteo Matteucci, Marco J. Morelli, Marco Masseroli
MuSERA: Multiple Sample Enriched Region Assessment
published pages: bbw029, ISSN: 1467-5463, DOI: 10.1093/bib/bbw029
Briefings in Bioinformatics 18 (3) 2019-06-14
2019 Pietro Pinoli, Stefano Ceri, Davide Martinenghi, Luca Nanni
Metadata management for scientific databases
published pages: 1-20, ISSN: 0306-4379, DOI: 10.1016/
Information Systems 81 2019-04-18
2018 Stefano Perna, Pietro Pinoli, Stefano Ceri, Limsoon Wong
TICA: Transcriptional Interaction and Coregulation Analyzer
published pages: 342-353, ISSN: 1672-0229, DOI: 10.1016/j.gpb.2018.05.004
Genomics, Proteomics & Bioinformatics 16/5 2019-04-18
2019 Andrea Gulino, Abdulrahman Kaitoua, Stefano Ceri
Optimal Binning for Genomics
published pages: 125-138, ISSN: 0018-9340, DOI: 10.1109/tc.2018.2854880
IEEE Transactions on Computers 68/1 2019-04-18
2019 Cheng Wang, Luca Nanni, Boris Novakovic, Wout Megchelenbrink, Tatyana Kuznetsova, Hendrik G. Stunnenberg, Stefano Ceri, Colin Logie
Extensive epigenomic integration of the glucocorticoid response in primary human monocytes and in vitro derived macrophages
published pages: , ISSN: 2045-2322, DOI: 10.1038/s41598-019-39395-9
Scientific Reports 9/1 2019-04-18
2018 Vahid Jalili, Matteo Matteucci, Jeremy Goecks, Yashar Deldjoo, Stefano Ceri
Next Generation Indexing for Genomic Intervals
published pages: 1-1, ISSN: 1041-4347, DOI: 10.1109/tkde.2018.2871031
IEEE Transactions on Knowledge and Data Engineering 2019-04-18

Are you the coordinator (or a participant) of this project? Please send me more information about the "GECO" project.

For instance: the website URL (it has not been provided by EU-opendata yet), the logo, a more detailed description of the project (in plain text, as an RTF file or a Word file), some pictures (as picture files, not embedded in a Word file), the Twitter account, the LinkedIn page, etc.

Send me an email and I will put them on your project's page as soon as possible.

Thanks. Then please add a link to this page on your project's website.

The information about "GECO" is provided by the European Opendata Portal: CORDIS opendata.
