
Report

Teaser, summary, work performed and final results

Periodic Reporting for period 1 - EURHISFIRM (Historical high-quality company-level data for Europe)


Summary

With economic growth still slow in some parts of Europe, the key societal challenges facing the European Union are investment, growth, and job creation. Unstable capital markets have undermined corporate investment and led to unemployment and social inequality.

To meet these challenges, the European Union needs sound scientific evidence. Big data are promising tools in science, but they lack the depth that historical data (usually paper-based and not yet available in digital form) can provide for understanding the dynamics of the past, the present and the future. Indeed, the current lack of high-quality long-term empirical European data prevents the use and testing of models for analysing structural and cyclical changes. For example, the 2008 Financial Crisis began in the mortgage-backed security (MBS) sector, where risk models went astray because they were calibrated on only five to ten years of historical data taken from a benign period. Had longer-term historical data been available to build empirical models covering longer timespans, a more robust understanding of the underlying currents and risks would have been possible, which could in turn have led to better preparation for, or forewarning of, the crisis.

The heterogeneity of historical business rules and practices across nations and regions calls for a dedicated Research Infrastructure (RI) that can connect to other existing systems. IT research must develop innovative technologies that push forward the technological frontier in the historical social sciences by scaling up the variety, quantity, and quality of long-term data. Digitalized historical sources, as part of the European cultural heritage, represent a shared wealth in citizenship, cultural growth, and economic potential. EURHISFIRM meets the need for establishing state-of-the-art benchmark RIs in Europe in the social sciences and humanities, in which both big and historical data have yet to be fully exploited.

EURHISFIRM therefore designs a world-class RI to connect, collect, collate, align, and share detailed, reliable, and standardized long-term company-level data for Europe to enable researchers, policymakers, scholars, and private companies to analyse, develop, and evaluate effective strategies to promote investment and economic growth. The creation of a vibrant European community will support the project’s development based on innovative technologies to spark a “big data revolution” in the historical social sciences within the open science landscape and to open access to cultural heritage in cooperation with existing RIs. This design will enable ESFRI, member states and other funding bodies to decide on the further preparation and implementation of the RI.

Work performed

1. Economic history: WP4, with the project’s economic historians, catalogued EURHISFIRM’s data sources and selected a common documentation standard (the DDI-Lifecycle standard). In WP7, the economic historians work with the technical developers to ensure that the tools developed can process the sources correctly and can take into account historical nuances by providing initial information to train the deep-learning machines, verifying the results of the tools developed, and producing historical reference documents.
2. Information technology: Most of the deliverable deadlines of the technical WPs (5, 6, 7, 9) fall on or after the end of this reporting period, but work is well underway. WP5 is studying existing database models and the key elements to be considered for the common data model, and has established a methodology for evaluating common data models. It coordinates the inter-WP Working Group on Identification and Standardization (WGIS) to implement project-wide technical standards. WP6 has begun its work on data connection and linkage of independent databases. It will use the consortium’s most advanced databases (Brussels and Paris) as a first test case and has started discussions with other institutions about further test cases. WP7 is developing its own deep-learning-based OCR (optical character recognition) system to recognise the sources’ tabular structures and texts. It is also conducting tests to validate and improve the tools developed. Evaluation of web linking tools (to connect the sources’ data to data found on the World Wide Web) has begun in task 7.4. WP9 has begun investigating the architectural framework by studying materials from existing European RIs and participating in related technical discussions with other WPs.
3. Practical operations: WP3 is working on legal recommendations concerning the RI design in the open science context (e.g. data rights, user and access rights, privacy laws). WP8, in charge of understanding the target users and their needs, has conducted quantitative and qualitative studies. WP10’s work on the business model and governance will begin in the third quarter of 2019, but it has already begun a preliminary assessment of business and governance model alternatives.
4. Project administration: WP1 runs the project’s logistics and administration, coordinates with other WPs in the overall project management, and works with the Executive Committee and the Steering Committee to drive the strategy (including compliance with open science and FAIR data principles through a continuously updated Data Management Plan). WP2 leads the project’s communication and outreach tasks, including the website and social network channels, the project identity and logo, and the organisation of the General Assemblies. The community building task, which concerns both WPs 1 and 2, is also a project-wide effort from all consortium members and a key part of EURHISFIRM’s ambitions.
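
To make the data connection and linkage task of WP6 more concrete, the following is a minimal sketch of how company records from two independent databases might be matched by name. It is an illustration under stated assumptions, not the project's actual method: the record identifiers, company names, suffix list, and similarity threshold are all hypothetical, and the matching relies only on Python's standard-library `difflib.SequenceMatcher` rather than on any EURHISFIRM tool.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"sa", "nv", "ltd", "plc", "ag", "cie", "co"}


def normalise(name: str) -> str:
    """Fold accents, lowercase, drop punctuation and legal-form suffixes."""
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", "", unaccented.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalised company names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """For each left record, keep its best right-hand match above the threshold."""
    links = []
    for left_id, left_name in left.items():
        best_id, best_score = None, 0.0
        for right_id, right_name in right.items():
            score = similarity(left_name, right_name)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_score >= threshold:
            links.append((left_id, best_id, round(best_score, 2)))
    return links


# Hypothetical records; identifiers and names are invented for illustration.
brussels = {
    "BXL-001": "Société Générale des Mines S.A.",
    "BXL-002": "Banque du Commerce",
}
paris = {
    "PAR-104": "Societe Generale des Mines",
    "PAR-377": "Compagnie des Chemins de Fer",
}

links = link_records(brussels, paris)
print(links)
```

In this sketch the first Brussels record is linked to its Paris counterpart despite differences in accents and legal form, while the second finds no match above the threshold. Real historical sources would likely need more than string similarity (for example dates, addresses, or security identifiers) to resolve name changes and mergers.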

Final results

EURHISFIRM will produce a comprehensive design for an RI handling pan-European long-term company-level data. Its federative nature will allow the data sources to maintain their respective characteristics and national idiosyncrasies while enabling cross-referencing with other databases. This will permit the integration of new sources and technologies while being minimally intrusive to existing implementations, and will encourage national research centres to grow both collaboratively and independently. This target aligns with the larger ambition of bringing European research collaboration and scientific excellence to the forefront by using modern technology to make important, untapped historical sources available. The availability of historical financial data will permit a deeper, more evidence-based understanding of current and future European economic events. The project will also address the larger, global challenge of combining modern technology with historical scientific sources, and will serve as a framework for other RIs that seek scientific, technological, and practical durability while promoting open science and FAIR data principles.

The project’s technical core lies at the intersection of economic history and information technology. By the end of the project, EURHISFIRM will have finalised its technical designs based on its deep-learning-based OCR platform, federative common data model, architectural and infrastructural components, and user experience. The project’s practical considerations (business model, legal considerations, target user research, cultural heritage) and project administration (management and strategy, including compliance with open science and FAIR data principles, as well as community building and communications) will also be finalised.

Website & more info

More info: https://eurhisfirm.eu/.