HEISENDATA

HeisenData - Towards a Next-Generation Uncertain-Data Management System

Coordinator: THE RESEARCH COMMITTEE OF THE TECHNICAL UNIVERSITY OF CRETE

Organization address: BUILDING E4 CAMPUS KONOUPIDIANA
city: CHANIA
postcode: 73132

Contact info
Title: Prof.
First name: Nikolaos
Last name: Varotsis
Telephone: -65183
Fax: -56597

Coordinator nationality: Greece [EL]
Total cost: 100 000 €
EC contribution: 100 000 €
Programme: FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
Call code: FP7-PEOPLE-2009-RG
Funding scheme: MC-IRG
Start year: 2010
Period (year-month-day): 2010-03-01 - 2014-02-28

Participants

#  |  participant  |  country  |  role  |  EC contrib. [€]
1  |  THE RESEARCH COMMITTEE OF THE TECHNICAL UNIVERSITY OF CRETE  |  EL (CHANIA)  |  coordinator  |  100 000.00

Project objective

Several real-world applications need to manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings, e.g., for motion prediction and human behavior modeling; information-extraction tools can assign different possible labels with varying degrees of confidence to segments of text, due to the uncertainties and noise present in free-text data. Such probabilistic data analyses require sophisticated machine-learning tools that can effectively model the complex correlation patterns present in real-life data. Unfortunately, to date, approaches to Probabilistic Database Systems (PDBSs) have relied on somewhat simplistic models of uncertainty that can be easily mapped onto existing relational architectures: probabilities are typically associated with individual data tuples, with little or no support for capturing data correlations. This research proposal aims to design and build a novel, extensible PDBS that supports a broad class of statistical models and probabilistic-reasoning tools as first-class system objects, alongside a traditional relational-table store. Our proposed architecture will employ statistical models to effectively encode data-correlation patterns, and promote probabilistic inference as part of the standard database operator repertoire to support efficient and sound query processing. This tight coupling of relational databases and statistical models represents a major departure from conventional database systems, and many of the core system components need to be revisited and fundamentally re-thought. The proposed research will attack several of the key challenges arising in this novel PDBS paradigm (including query processing, query optimization, data summarization, extensibility, and model learning and evolution), build usable prototypes, and investigate key application domains (e.g., information extraction).
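
The limitation of tuple-level probabilities can be illustrated with a minimal Python sketch. It compares the answer to "are both tuples present?" under a tuple-independence assumption with the answer from an explicit joint distribution over two correlated sensor readings. The sensor names, the probabilities, and the function names are hypothetical and chosen only for illustration; the sketch does not reflect the HeisenData design.

```python
# Hypothetical marginal probabilities that each sensor tuple is present.
marginals = {"s1_motion": 0.6, "s2_motion": 0.6}

# Hypothetical joint distribution over (s1_motion, s2_motion) presence.
# Both sensors are assumed to watch the same area, so they are correlated.
joint = {
    (True, True): 0.55,
    (True, False): 0.05,
    (False, True): 0.05,
    (False, False): 0.35,
}


def prob_both_independent(marginals):
    """P(all tuples present) if tuples are treated as independent."""
    p = 1.0
    for value in marginals.values():
        p *= value
    return p


def prob_both_joint(joint):
    """P(both tuples present) read directly from the joint distribution."""
    return joint[(True, True)]


def marginal_from_joint(joint, index):
    """Marginal presence probability of one tuple under the joint model."""
    return sum(p for world, p in joint.items() if world[index])


if __name__ == "__main__":
    print(prob_both_independent(marginals))
    print(prob_both_joint(joint))
```

Both models agree on the per-tuple marginals, yet they disagree on the probability of the conjunctive query, which is the kind of error a tuple-level model cannot avoid when the data are correlated.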

Introduction (Teaser)

An EU team developed data systems that use statistical and probabilistic reasoning to reduce uncertainty. The project helped to unify such methods with conventional databases, in part by developing scalable algorithms and a variety of new tools.

Project description (Article)

Various software applications must manage and make decisions using data with high levels of uncertainty. While certain tools can fill in the gaps to some degree, such tools are generally simplistic and limited.

The EU-funded project 'HeisenData - towards a next-generation uncertain-data management system' (HEISENDATA, http://heisendata.softnet.tuc.gr/) aimed to improve matters. The team planned to design and build new probabilistic database systems (PDBSs), supporting statistical models and probabilistic reasoning in addition to conventional database structures. The project intended to address the challenges involved in supporting such a novel union, including redesign of key system components. HEISENDATA ran for four years to February 2014.

Project work covered three main branches: new probabilistic data synopses for query optimisation, new PDBS algorithms and architectures, and scalable algorithms and tools.

The data synopses involved defining and creating algorithms for building histograms. For various error metrics, the new algorithms constructed optimal or near-optimal histograms and wavelet synopses. Further work introduced probabilistic histograms, which allowed a more accurate representation of the data's uncertainty characteristics.
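
As a rough illustration of what "optimal histogram for an error metric" means, the sketch below implements the classic dynamic program for a histogram that minimises sum-squared error. It is an assumption-laden simplification: the input is taken to be a list of expected frequencies of an uncertain attribute, and the function name `optimal_histogram` is hypothetical. The project's own algorithms, which covered several error metrics and probabilistic histograms, are not reproduced here.

```python
def optimal_histogram(freqs, num_buckets):
    """Return (total_sse, boundaries) for the SSE-optimal histogram.

    freqs: expected frequencies (an assumed input representation).
    boundaries: list of (start, end) index pairs, end exclusive.
    """
    n = len(freqs)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and len(freqs)")

    # Prefix sums of values and squared values give O(1) bucket error.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, f in enumerate(freqs):
        prefix[i + 1] = prefix[i] + f
        prefix_sq[i + 1] = prefix_sq[i] + f * f

    def sse(i, j):
        """Squared error of representing freqs[i:j] by its mean."""
        s = prefix[j] - prefix[i]
        sq = prefix_sq[j] - prefix_sq[i]
        return sq - s * s / (j - i)

    inf = float("inf")
    # best[b][j]: minimum error covering freqs[:j] with exactly b buckets.
    best = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    split = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    best[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                cost = best[b - 1][i] + sse(i, j)
                if cost < best[b][j]:
                    best[b][j] = cost
                    split[b][j] = i

    # Walk back through the split table to recover bucket boundaries.
    boundaries = []
    j = n
    for b in range(num_buckets, 0, -1):
        i = split[b][j]
        boundaries.append((i, j))
        j = i
    boundaries.reverse()
    return best[num_buckets][n], boundaries


if __name__ == "__main__":
    print(optimal_histogram([1, 1, 1, 10, 10, 10], 2))
```

The dynamic program runs in O(B n^2) time for n values and B buckets, which is one reason near-optimal approximations are attractive for large inputs.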

Additionally, the team addressed problems related to unstructured text containing units of structured information. The solutions extended a leading information extraction (IE) model by developing two query approaches. The efficiency and effectiveness of the approaches were compared using real-life data sets. The result was a set of rules for choosing appropriate inference algorithms under various conditions, yielding up to 10-fold speed improvements.

The project also devised a framework for scaling any generic entity resolution algorithm, and demonstrated the framework's effectiveness. Further work helped to integrate the IE pipeline with probabilistic query processing.
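
One common way to scale a generic entity resolution algorithm is blocking: records are partitioned by a cheap key, and the pairwise matcher, treated as a black box, runs only inside each block. The sketch below shows that idea under stated assumptions. The source does not describe how the project's framework works, so this is a stand-in technique, and the names `resolve_with_blocking`, `block_key`, and `match` as well as the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def resolve_with_blocking(records, block_key, match):
    """Cluster records using a black-box matcher applied within blocks.

    records: list of records (any type).
    block_key: function mapping a record to a hashable block key.
    match: function (record_a, record_b) -> bool, the black-box matcher.
    Returns (clusters, comparisons), where clusters is a sorted list of
    lists of record indices.
    """
    parent = list(range(len(records)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Group record indices by their blocking key.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    # Compare only pairs that share a block.
    comparisons = 0
    for members in blocks.values():
        for a, b in combinations(members, 2):
            comparisons += 1
            if match(records[a], records[b]):
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values()), comparisons


if __name__ == "__main__":
    people = [
        {"name": "Ann Smith"},
        {"name": "ann smith"},
        {"name": "Bob Jones"},
        {"name": "Anne Smith"},
    ]
    print(resolve_with_blocking(
        people,
        block_key=lambda r: r["name"][0].lower(),
        match=lambda a, b: a["name"].lower() == b["name"].lower(),
    ))
```

Blocking trades recall for speed: true matches that land in different blocks are never compared, so the choice of blocking key matters as much as the matcher itself.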

HEISENDATA found new statistical methods for processing data with high uncertainties, and integrated the methods into conventional database structures. The work addressed a topic of interest to the academic and commercial sectors.
