
HEISENDATA

HeisenData - Towards a Next-Generation Uncertain-Data Management System

Coordinator	THE RESEARCH COMMITTEE OF THE TECHNICAL UNIVERSITY OF CRETE. Address: BUILDING E4 CAMPUS KONOUPIDIANA, city: CHANIA, postcode: 73132. Contact: Prof. Nikolaos Varotsis, Telephone: -65183, Fax: -56597
Coordinator nationality	Greece [EL]
Total cost	100,000 €
EC contribution	100,000 €
Programme	FP7-PEOPLE: Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
Call code	FP7-PEOPLE-2009-RG
Funding scheme	MC-IRG
Start year	2010
Period (year-month-day)	2010-03-01 - 2014-02-28

Participants

#	participant	country	role	EC contrib. [€]
1	THE RESEARCH COMMITTEE OF THE TECHNICAL UNIVERSITY OF CRETE	EL (CHANIA)	coordinator	100,000.00

Project objective

Several real-world applications need to manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings, e.g., for motion prediction and human behavior modeling; information-extraction tools can assign different possible labels with varying degrees of confidence to segments of text, due to the uncertainties and noise present in free-text data. Such probabilistic data analyses require sophisticated machine-learning tools that can effectively model the complex correlation patterns present in real-life data. Unfortunately, to date, approaches to Probabilistic Database Systems (PDBSs) have relied on somewhat simplistic models of uncertainty that can be easily mapped onto existing relational architectures: probabilities are typically associated with individual data tuples, with little or no support for capturing data correlations.

This research proposal aims to design and build a novel, extensible PDBS that supports a broad class of statistical models and probabilistic-reasoning tools as first-class system objects, alongside a traditional relational-table store. Our proposed architecture will employ statistical models to effectively encode data-correlation patterns, and promote probabilistic inference as part of the standard database operator repertoire to support efficient and sound query processing. This tight coupling of relational databases and statistical models represents a major departure from conventional database systems, and many of the core system components need to be revisited and fundamentally re-thought. The proposed research will attack several of the key challenges arising in this novel PDBS paradigm (including query processing, query optimization, data summarization, extensibility, and model learning and evolution), build usable prototypes, and investigate key application domains (e.g., information extraction).
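
The limitation of tuple-level probabilities noted in the objective can be illustrated with a small sketch. It is not taken from the project: the relation `readings`, the joint distribution `joint`, and both function names are hypothetical, chosen only to show that two tuples with the same marginal probabilities can give different query answers once their correlation is taken into account.

```python
# Hypothetical relation: two sensor readings, each tagged with a tuple-level
# probability of being present (the per-tuple model described in the objective).
readings = {"r1": 0.6, "r2": 0.5}


def prob_any_independent(probs):
    """P(at least one tuple is present), assuming tuples are independent."""
    p_none = 1.0
    for p in probs.values():
        p_none *= 1.0 - p
    return 1.0 - p_none


def prob_any_joint(joint_dist):
    """Same query, answered from an explicit joint distribution over worlds."""
    return sum(p for world, p in joint_dist.items() if any(world))


# Assumed joint distribution over (r1 present, r2 present). Its marginals
# match `readings` (0.6 and 0.5), but the two tuples are positively correlated.
joint = {
    (True, True): 0.5,
    (True, False): 0.1,
    (False, True): 0.0,
    (False, False): 0.4,
}

p_independent = prob_any_independent(readings)  # about 0.8
p_correlated = prob_any_joint(joint)            # about 0.6
```

Under the independence assumption the query "is at least one reading present?" evaluates to roughly 0.8, whereas the assumed joint distribution gives roughly 0.6. A system that stores only per-tuple probabilities cannot distinguish the two cases.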

Introduction (Teaser)

An EU-funded team developed data systems that use statistical and probabilistic reasoning to handle uncertain data. The project helped to unify such methods with conventional databases, in part by developing scalable algorithms and a variety of new tools.

Project description (Article)

Various software applications must manage and make decisions using data with high levels of uncertainty. While certain tools can fill in the gaps to some degree, such tools are generally simplistic and limited.

The EU-funded project 'HeisenData - Towards a next-generation uncertain-data management system' (HEISENDATA, http://heisendata.softnet.tuc.gr/) aimed to improve matters. The team planned to design and build new probabilistic database systems (PDBSs) that support statistical models and probabilistic reasoning alongside conventional database structures. The project intended to address the challenges of such a novel union, including the redesign of key system components. HEISENDATA ran for four years, ending in February 2014.

Project work covered three main branches: new probabilistic data synopses for query optimisation, new PDBS algorithms and architectures, and scalable algorithms and tools.

The work on data synopses involved defining and implementing algorithms for histogram construction. For various error metrics, the new algorithms constructed optimal or near-optimal histograms and wavelet synopses. Further work introduced probabilistic histograms, which allowed a more accurate representation of the data's uncertainty characteristics.
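
To give a flavour of what "optimal histogram" means for one error metric, the following is a minimal sketch of the classic dynamic-programming construction that minimises the sum of squared errors (often called a V-optimal histogram). It is a textbook technique shown for illustration, not the project's own algorithm: it works on a plain list of numbers (for uncertain data these could be, for example, expected frequencies, which is an assumption here) and does not model the probabilistic histograms mentioned above. The function name `v_optimal_histogram` is hypothetical.

```python
def v_optimal_histogram(values, num_buckets):
    """Return (error, boundaries) minimising the total squared error.

    Each bucket is represented by the mean of its values. `boundaries` is a
    list of (start, end) index pairs, with `end` exclusive.
    """
    n = len(values)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and len(values)")

    # Prefix sums give the squared error of any bucket in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def sse(i, j):
        """Squared error of representing values[i:j] by their mean."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[b][j]: minimal error when values[:j] are covered by b buckets.
    best = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    best[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                cand = best[b - 1][i] + sse(i, j)
                if cand < best[b][j]:
                    best[b][j] = cand
                    cut[b][j] = i

    # Walk the stored cut points backwards to recover the bucket boundaries.
    boundaries = []
    j = n
    for b in range(num_buckets, 0, -1):
        i = cut[b][j]
        boundaries.append((i, j))
        j = i
    boundaries.reverse()
    return best[num_buckets][n], boundaries
```

For example, `v_optimal_histogram([1, 1, 1, 10, 10, 10], 2)` places the single bucket boundary between the third and fourth values, where the error is zero. The triple loop costs O(B·n²) time for B buckets over n values, which is one reason near-optimal approximations are of interest for large inputs.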

Additionally, the team addressed problems related to unstructured text containing units of structured information. The solutions extended a leading information extraction (IE) model by developing two query approaches. The efficiency and effectiveness of the approaches were compared using real-life data sets. The result was a set of rules for choosing appropriate inference algorithms under various conditions, yielding up to 10-fold speed improvements.

The project also devised a framework for scaling any generic entity resolution algorithm, and demonstrated the framework's effectiveness. Further work helped to integrate the IE pipeline with probabilistic query processing.
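
The article does not describe how the scaling framework works. One common way to scale a generic pairwise entity resolution algorithm is blocking, in which records are grouped by a cheap key and the expensive matcher runs only within each group. The sketch below illustrates that general idea under this assumption; it is not the project's framework, and `resolve_with_blocking`, `block_key`, `is_match` and the sample records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def resolve_with_blocking(records, block_key, is_match):
    """Run a pairwise matcher only within blocks.

    Returns the set of matched id pairs and the number of comparisons made.
    `is_match` can be any pairwise matcher, so the wrapper stays generic.
    """
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[block_key(rec)].append(rec_id)

    matches = set()
    comparisons = 0
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            comparisons += 1
            if is_match(records[a], records[b]):
                matches.add((a, b))
    return matches, comparisons


# Hypothetical records: block on the first letter, match on the first word.
records = {
    1: "acme corp",
    2: "acme corporation",
    3: "beta ltd",
    4: "beta limited",
    5: "gamma inc",
}
matches, comparisons = resolve_with_blocking(
    records,
    block_key=lambda name: name[0],
    is_match=lambda a, b: a.split()[0] == b.split()[0],
)
```

With these five records, blocking needs 2 comparisons instead of the 10 required to compare every pair. The trade-off is that true matches falling into different blocks are missed, so the choice of blocking key matters.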

HEISENDATA developed new statistical methods for processing highly uncertain data, and integrated the methods into conventional database structures. The work addressed a topic of interest to both the academic and commercial sectors.