Opendata, web and dolomites


Data-Driven Genomic Computing

Project "GeCo" data sheet

The following table provides information about the project.


Organization address
city: MILANO
postcode: 20133

Coordinator country: Italy [IT]
Project website: (not provided)
Total cost: 2˙500˙000 €
EC max contribution: 2˙500˙000 € (100%)
Programme: 1. H2020-EU.1.1. (EXCELLENT SCIENCE - European Research Council (ERC))
Call code: ERC-2015-AdG
Funding scheme: ERC-ADG
Starting year: 2016
Duration (year-month-day): from 2016-09-01 to 2021-08-31


Take a look at the project's partnership.

#    participant              country        role          EC contrib. [€]
1    POLITECNICO DI MILANO    IT (MILANO)    coordinator   2˙500˙000.00


 Project objective

Next-generation sequencing technology has dramatically reduced the cost and time of reading DNA. Huge investments are targeted at sequencing the DNA of large populations, and repositories of well-curated sequence data are being collected. Answers to fundamental biomedical problems are hidden in these data, e.g. how cancer arises, how driving mutations occur, and how much cancer depends on the environment. But genomic computing has not evolved at a comparable pace. Bioinformatics has been driven by specific needs and distracted from a foundational approach; hundreds of methods solve individual problems but miss the broad perspective.

The objective of GeCo is to rethink genomic computing through the lens of basic data management. We will first design the data model, using a few general abstractions that guarantee interoperability between existing data formats. Next, we will design a new-generation query language inspired by classic relational algebra and extended with orthogonal, domain-specific abstractions for genomics. Query processing will trace metadata and computation steps, opening doors to the seamless integration of descriptive statistics and high-level data analysis (e.g., DNA region clustering and extraction of regulatory networks).
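
To give a rough idea of what a region-based data model with relational-style operators might look like, here is a minimal Python sketch. It is an illustration only and not the project's actual data model or query language: the names `Region`, `Sample`, `select` and `count_overlaps`, the half-open coordinate convention, and the example data are all assumptions made for this page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Region:
    """A genomic interval (hypothetical model: 0-based start, exclusive stop)."""
    chrom: str
    start: int
    stop: int


@dataclass
class Sample:
    """A set of regions plus free-form metadata describing the experiment."""
    regions: List[Region]
    metadata: Dict[str, str] = field(default_factory=dict)


def select(samples: List[Sample],
           predicate: Callable[[Dict[str, str]], bool]) -> List[Sample]:
    """Relational-style selection: keep samples whose metadata satisfy the predicate."""
    return [s for s in samples if predicate(s.metadata)]


def overlaps(a: Region, b: Region) -> bool:
    """True when two regions on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.stop and b.start < a.stop


def count_overlaps(reference: Sample, experiment: Sample) -> Dict[Region, int]:
    """Domain-specific operation: for each reference region, count the
    experiment regions that overlap it."""
    return {
        ref: sum(overlaps(ref, r) for r in experiment.regions)
        for ref in reference.regions
    }


# Made-up example data, for illustration only.
genes = Sample(
    [Region("chr1", 100, 200), Region("chr2", 0, 50)],
    {"type": "annotation"},
)
peaks = Sample(
    [Region("chr1", 150, 160), Region("chr1", 190, 250), Region("chr1", 200, 300)],
    {"type": "ChIP-seq", "disease": "cancer"},
)
control = Sample([Region("chr2", 10, 20)], {"type": "ChIP-seq", "disease": "normal"})

cancer_samples = select([peaks, control], lambda m: m.get("disease") == "cancer")
counts = count_overlaps(genes, cancer_samples[0])
```

In this sketch the metadata selection plays the role of a classic relational operator, while the overlap count stands in for a genomics-specific one; keeping the two orthogonal is the design idea the paragraph above describes.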

Genomic computing is a “big data” problem; we will therefore also achieve computational efficiency by using parallel computing on both clusters and public clouds. The choice of a suitable data model and of computational abstractions will boost performance in a principled way. The resulting technology will be applicable to individual and federated repositories, and will be exploited for providing integrated access to curated data, made available by large consortia, through user-friendly search services. Our most far-reaching vision is to move towards an Internet of Genomes exploiting data indexing and crawling. The PI’s background in distributed data, data modelling, query processing and search will drive a radical paradigm shift.


List of publications.
year authors and title journal last update
2018 Marco Masseroli, Arif Canakoglu, Pietro Pinoli, Abdulrahman Kaitoua, Andrea Gulino, Olha Horlova, Luca Nanni, Anna Bernasconi, Stefano Perna, Eirini Stamoulakatou, Stefano Ceri
Processing of big heterogeneous genomic datasets for tertiary analysis of Next Generation Sequencing data
ISSN: 1367-4803, DOI: 10.1093/bioinformatics/bty688
Bioinformatics 2019-06-14
2017 Fabio Cumbo, Giulia Fiscon, Stefano Ceri, Marco Masseroli, Emanuel Weitschek
TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas
ISSN: 1471-2105, DOI: 10.1186/s12859-016-1419-5
BMC Bioinformatics 18/1 2019-06-14
2017 Vahid Jalili, Matteo Matteucci, Marco Masseroli, Stefano Ceri
Explorative visual analytics on interval-based genomic data and their metadata
ISSN: 1471-2105, DOI: 10.1186/s12859-017-1945-9
BMC Bioinformatics 18/1 2019-06-14
2017 Abdulrahman Kaitoua, Pietro Pinoli, Michele Bertoni, Stefano Ceri
Framework for Supporting Genomic Operations
published pages: 443-457, ISSN: 0018-9340, DOI: 10.1109/TC.2016.2603980
IEEE Transactions on Computers 66/3 2019-06-14
2018 Fabrizio Celli, Fabio Cumbo, Emanuel Weitschek
Classification of Large DNA Methylation Datasets for Identifying Cancer Drivers
ISSN: 2214-5796, DOI: 10.1016/j.bdr.2018.02.005
Big Data Research 2019-06-14
2017 Alice Cambiaghi, Manuela Ferrario, Marco Masseroli
Analysis of metabolomic data: tools, current strategies and future challenges for omics data integration
published pages: bbw031, ISSN: 1467-5463, DOI: 10.1093/bib/bbw031
Briefings in Bioinformatics 2019-06-14
2017 Stefano Ceri, Abdulrahman Kaitoua, Marco Masseroli, Pietro Pinoli, Francesco Venco
Data Management for Heterogeneous Genomic Datasets
published pages: 1251-1264, ISSN: 1545-5963, DOI: 10.1109/TCBB.2016.2576447
IEEE/ACM Transactions on Computational Biology and Bioinformatics 14/6 2019-06-14
2017 Vahid Jalili, Matteo Matteucci, Marco Masseroli, Stefano Ceri
Indexing Next-Generation Sequencing data
published pages: 90-109, ISSN: 0020-0255, DOI: 10.1016/j.ins.2016.08.085
Information Sciences 384 2019-06-14
2017 Vahid Jalili, Matteo Matteucci, Marco J. Morelli, Marco Masseroli
MuSERA: Multiple Sample Enriched Region Assessment
published pages: bbw029, ISSN: 1467-5463, DOI: 10.1093/bib/bbw029
Briefings in Bioinformatics 18 (3) 2019-06-14
2019 Pietro Pinoli, Stefano Ceri, Davide Martinenghi, Luca Nanni
Metadata management for scientific databases
published pages: 1-20, ISSN: 0306-4379, DOI: 10.1016/
Information Systems 81 2019-04-18
2018 Stefano Perna, Pietro Pinoli, Stefano Ceri, Limsoon Wong
TICA: Transcriptional Interaction and Coregulation Analyzer
published pages: 342-353, ISSN: 1672-0229, DOI: 10.1016/j.gpb.2018.05.004
Genomics, Proteomics & Bioinformatics 16/5 2019-04-18
2019 Andrea Gulino, Abdulrahman Kaitoua, Stefano Ceri
Optimal Binning for Genomics
published pages: 125-138, ISSN: 0018-9340, DOI: 10.1109/tc.2018.2854880
IEEE Transactions on Computers 68/1 2019-04-18
2019 Cheng Wang, Luca Nanni, Boris Novakovic, Wout Megchelenbrink, Tatyana Kuznetsova, Hendrik G. Stunnenberg, Stefano Ceri, Colin Logie
Extensive epigenomic integration of the glucocorticoid response in primary human monocytes and in vitro derived macrophages
ISSN: 2045-2322, DOI: 10.1038/s41598-019-39395-9
Scientific Reports 9/1 2019-04-18
2018 Vahid Jalili, Matteo Matteucci, Jeremy Goecks, Yashar Deldjoo, Stefano Ceri
Next Generation Indexing for Genomic Intervals
published pages: 1-1, ISSN: 1041-4347, DOI: 10.1109/tkde.2018.2871031
IEEE Transactions on Knowledge and Data Engineering 2019-04-18

The information about "GECO" is provided by the European Opendata Portal: CORDIS opendata.
