Periodic Reporting for period 2 - WDAqua (Answering Questions using Web Data)

Summary

Sharing, connecting, analysing, and understanding data on the Web can provide better services to citizens, communities, and the industry. One way to achieve this is through data-driven question answering, by delivering precise and comprehensive answers to natural language questions, primarily by making better use of the knowledge encoded in the Web of Data. The aim of the WDAqua project was to advance the state of the art in this field by interleaving training, research, and innovation.
The WDAqua project:
- Provided a training programme for young data scientists
- Addressed challenges related to the whole question answering pipeline including 1) understanding of a spoken question, 2) analysis of the question, 3) retrieval of information to answer the question, and 4) presentation of the answer
- Delivered an open source framework and ecosystem of question answering components

The main research challenges addressed in the project can be summarized in the following points:
- Answer questions expressed in different formats
- Exploit knowledge encoded in the Web of Data to enhance question answering
- Scale question answering to the size and dynamicity of the Web
- Provide comprehensible answers for questions and justifications for these answers
- Consider trust and provenance, as well as data access control during question answering
- Discover high-quality datasets suitable for question answering, including cross-lingual, cross-border, and cross-domain settings
- Enable users to easily ask questions and find answers

Fifteen excellent international Early Stage Researchers (ESRs) worked on research projects addressing these challenges and developed several collaborations inside and outside the network. In an interdisciplinary and intersectoral research environment, the ESRs received high-quality training comprising 3 R&D Weeks, 3 Learning Weeks, one Innovation Week, and various workshops, as well as participation in international scientific conferences and workshops, academic and soft-skills courses, and industrial and academic secondments. Over the four years of the project, the young researchers developed the individual skills needed to pursue a successful career in academia and industry, and built a strong network capable of advancing the state of the art in Question Answering over the Web of Data.

Work performed

The highlights of the research outcomes of WDAqua which advance the state of the art in Question Answering include:
- Methods for metadata modeling and management for linked data sources
- Efficient storage and query processing of knowledge graphs from heterogeneous data sources used for Question Answering
- Extensive studies and methods for improving the searchability, accessibility and user interfaces for structured Web datasets
- A Question Answering system over geospatial linked data
- Novel neural network based approaches for speech recognition, semantic parsing, disambiguation, and answer generation for user questions
- A generic method for developing Question Answering systems over various knowledge bases covering multiple languages
- A framework for static and dynamic composition of Question Answering systems and their evaluation

The four-year research programme led to over 70 scientific publications, several of which won awards, including papers at top research venues such as the Web Conference and the International Semantic Web Conference. In addition, WDAqua delivered to the research community a novel open and adaptable Question Answering architecture and platform, various Question Answering components released as open source projects, and open datasets for Question Answering. Assets that can be further exploited in research as well as in industry include 17 GitHub projects, 3 Docker releases, 2 online demos, 12 datasets, 1 patent, and several evaluation reports on various application scenarios.

The WDAqua project offered a high-quality training programme, including peer-based activities, extensive supervision by experts, research visits, local courses, and 12 training events (Learning Weeks, R&D Weeks, technical workshops, Innovation Week, Career Fair). These events covered a broad spectrum of technical and non-technical skills and engaged external trainers, senior scientists, and people from industry.

WDAqua disseminated its results at dozens of international conferences, where the project researchers presented their research contributions, and through workshops and tutorials co-organized with other researchers. It also reached the broader public through a website, social media, and online videos.

Final results

WDAqua made several research contributions to the field of Question Answering using the Web of Data and became a notable Question Answering project in the Semantic Web research community. The 15 young researchers developed into experienced scientists equipped with technical and non-technical skills that support a successful career in academia, industry, and entrepreneurship. Through its various training, research, and innovation activities, WDAqua had an impact in several respects:
- It contributed to structuring doctoral research training at the European level and to the training and capacity building of early-stage researchers, opening different career paths to the ESRs, who are currently employed in academia and industry or act as entrepreneurs.
- It developed cutting-edge technology on Question Answering over the Web of Data, which enables citizens to find answers to questions over heterogeneous data and multilingual sources.
- It delivered several open source software projects and datasets which have been downloaded tens of times by other researchers and practitioners.
- It enables industry to build on the research results produced by the project to develop domain-specific Question Answering systems for several applications, e.g., in the automotive or financial domain.

Website & more info

More info: http://wdaqua.eu.