The Lynx project was conceived to mitigate the compliance problems faced by SMEs and companies when engaging in trade abroad, by connecting or interlinking legal and regulatory data from different jurisdictions and institutions at various levels (internationally, nationally, and regionally).
In order to bring the above-mentioned data together, Lynx relies on public open data, on the one hand, and on the formalisms and technologies provided by the Semantic Web and the Linked Data paradigms, on the other. The latter enable data of varying nature to be published in standardised structured formats (RDF), which makes it possible to establish fine-grained, machine-processable relations between individual data elements.
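As an illustration of what such fine-grained relations look like, the following minimal sketch serialises a few RDF statements as N-Triples using plain Python. It is not taken from the Lynx code base: the example.org URIs, the titles, and the helper names (`iri`, `literal`, `to_ntriples`) are assumptions made for this example, while the Dublin Core Terms properties `title` and `references` are existing vocabulary terms.

```python
# Sketch: relate individual resources with RDF triples, serialised as N-Triples.
# All example.org URIs and titles are hypothetical placeholders, not Lynx data.

DCTERMS = "http://purl.org/dc/terms/"


def iri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang=None):
    """Build a quoted literal, escaping backslashes, quotes and newlines."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def to_ntriples(triples):
    """Render (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


national_act = iri("http://example.org/es/act/123")
eu_directive = iri("http://example.org/eu/directive/456")

triples = [
    # A title in Spanish attached to a single national document.
    (national_act, iri(DCTERMS + "title"), literal("Ley de ejemplo", "es")),
    # A link from that document to a resource in another jurisdiction.
    (national_act, iri(DCTERMS + "references"), eu_directive),
    (eu_directive, iri(DCTERMS + "title"), literal("Example directive", "en")),
]

print(to_ntriples(triples))
```

Each printed line is one machine-processable statement, and the second one is the kind of cross-jurisdiction link that a knowledge graph can aggregate. A production setting would more likely rely on an RDF library than on hand-written serialisation.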
With this inspiration, the first objective of the project was declared: the creation of the Lynx multilingual Legal Knowledge Graph (henceforth LKG), a knowledge base related to compliance that integrates data and documents from multiple jurisdictions, in various languages, as well as open standards and sectorial best practice guidelines. This is the Lynx LKG in a strict sense. However, and thanks to the Linked Data principles, all the links of Lynx LKG resources to other resources available as linked data in the wider Web of Data are also considered as part of a more general graph of legal knowledge.
For the purpose of creating, and then exploiting, the Lynx multilingual LKG, an ecosystem of smart services is being developed. In the first stage, services are created to manage the ingestion and integration of documents in the LKG and their subsequent semantic annotation. Then, in order to consume and exploit the wealth of information in the LKG, other types of services are developed, such as semantic search, question answering, translation, summarisation, or recommendation services.
In the final step, the above-mentioned services will be configured into three pilots according to the industry needs represented by the Lynx business cases. These pilots will exploit the knowledge available in the LKG to provide three different compliance solutions, namely, (i) compliance assurance services for contracts; (ii) compliance assurance services in the geothermal energy sector; (iii) compliance solutions for strategy design in labour law.
The first stage in this project consisted in gathering the functional requirements expressed by potential final users of the Lynx platform and the ones identified by the business cases. As a consequence, scenarios and use cases were defined, and a prioritised list of required datasets and business case requirements was obtained, which resulted in the identification of a set of technical requirements. Such requirements were translated into specifications, and were then mapped to microservices. Finally, the microservice architecture was adopted and defined in a blueprint, and its main components were implemented.
A second major achievement for this period was the definition and execution of the Data Management Plan, according to the guidelines provided by the EC. Complementary to this, a CKAN-based Lynx Data Portal was deployed to document datasets of interest for the legal and regulatory domains. In parallel, public repositories of legal and regulatory corpora have been identified and, whenever possible, documents have been collected. Also, web-crawling techniques and other data harvesting methods have been employed to create a collection of documents that covers the needs of the Lynx business cases and that is used in the training of Neural Machine Translation engines. Collected corpora are also being used in the creation of specific terminological resources.
As for the microservices, first implementations are already available. Such services allow a unified treatment of Lynx data and documents, and cover the following functionalities: Word Sense Induction and Disambiguation (WSID), Dictionary Access (DA), Terminology Extraction (TermEx), Named Entity Recognition (NER), Temporal Expression Extraction (TimEx), Geographical Entity Recognition (Geo), Relation Extraction (RelEx), Entity Extraction (EntEx), Cross-lingual search (Sear), Question Answering (QADoc), Semantic Similarity (SeSim), Terminology Query (TermQ), Summarization (Summ), and Neural Machine Translation (Trans). They are all exposed as HTTP REST APIs, and their OpenAPI descriptions are available online.
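To give an idea of how a client might address one of these REST services, the sketch below prepares (but does not send) an HTTP POST request with a JSON body using the Python standard library. The base URL, the `ner` path, and the `text` and `lang` payload fields are hypothetical; the actual paths and parameters are those given in the OpenAPI descriptions of the services.

```python
import json
from urllib.request import Request

# Hypothetical base URL: the real endpoints are defined in the
# OpenAPI descriptions of the Lynx services, not in this sketch.
BASE_URL = "https://api.example.org/lynx"


def build_request(service, text, lang):
    """Prepare a JSON POST request for an annotation-style service.

    The payload fields "text" and "lang" are assumptions for illustration.
    """
    payload = json.dumps({"text": text, "lang": lang}).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{service}",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


req = build_request("ner", "The contract is governed by Spanish law.", "en")
print(req.get_method(), req.full_url)
# Sending the request would use urllib.request.urlopen(req); it is omitted
# here because the endpoint above is a placeholder.
```

Because all services share the same HTTP and JSON conventions in this sketch, switching from one functionality to another only changes the `service` argument, which is the kind of uniform treatment a microservice architecture is meant to offer.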
Derived from the analysis of the pilot requirements, five curation workflows have been defined: one common workflow termed LKG population, and four use-case-specific workflows, including Contract Analysis, Geothermal Project Analysis, and Labour Law Search.
For communication purposes, the project identity set was created, with the website as the main entry point to the project. The website offers a summary of project objectives and pilots; links to related projects and initiatives; pointers to deliverables and articles; relevant data models; access to the aforementioned Lynx Data Portal, to the initial version of the LKG and Lynx services APIs, as well as posts on related events.
As for dissemination and exploitation activities, several workshops have been organised, and Lynx has been present at more than a dozen conferences and events, as well as in scientific publications.
The objectives established in the Lynx project are expected to contribute to compliance management solutions for SMEs and companies involved in internationalization processes. The first steps have already been taken and the embryo of the future LKG has been implemented. It provides access to legislation as open data, from multiple jurisdictions and in various languages in Europe, and builds on the technical specifications of the European Legislation Identifier (ELI) for the sake of interoperability. The LKG integrates legal and standards-related data sources according to Semantic Web formats, and provides a single access point to aggregated data sources. In this regard, it is expected that not only companies, but also citizens, will benefit from such a unified interface to legal data.
Up-to-date, relevant legal and standards-related data sources are being identified and documented for more efficient access through a CKAN-based Data Portal. Language resources (vocabularies, corpora) are also being created and provided through the Data Portal, so that they can be reused to create new opportunities in other sectors.
As for the technologies, first versions of new reusable services and platform technology are in place, which will enable third parties to create repurposed legal technology services. Annotation and extraction services are applied to multilingual legal data, adding value to those resources. Along the same lines, smart search and access functionalities are being developed to navigate interlinked legal and standards-related data in a cross-lingual fashion.
Services will be configured into three pilots to demonstrate the added value of Lynx solutions. Such services, and the algorithms behind them, will be applied over open data sources and company-private ones in a way that is transparent to the user. The solutions based on Lynx are expected to reduce the time devoted to finding the appropriate data and the costs incurred by companies when managing compliance.
More info: http://lynx-project.eu/.