Report

Periodic Reporting for period 1 - LEXICON POETICUM (Lexicon Poeticum: A lexical resource for Old-Norse Icelandic skaldic poetry and its relevant social fields)

Summary

This fellowship begins a long-term project to produce a new dictionary of Old Norse poetry. The corpus was composed between the 9th and 14th centuries by poets from Norway and its settlements throughout Europe, especially Iceland. The dictionary will replace earlier works not updated since the 1930s and is closely related to the Skaldic Project (skaldic.abdn.ac.uk), which has been producing new editions and translations of the corpus.

Lexicon Poeticum gives us an understanding of not only the language but also the thought world of Viking and medieval Scandinavia. The project develops interfaces and data models for treating complex texts, allowing links between the material evidence for the poetry (manuscripts and runic inscriptions) and its words, the poetry itself, and the dictionary. The web application developed by the project (lexiconpoeticum.org) aims to provide an accessible entry point into the words and concepts used by Viking and medieval Scandinavian poets. It is at the same time the interface used to produce the dictionary, updated in real time, and so brings the work of the scholar closer to the broader community.

The project aims to develop and analyse a resource for understanding the lexicon of Old Norse poetry. It does this by integrating as closely as possible with existing dictionaries (especially the Dictionary of Old Norse Prose, ONP: onp.hum.ku.dk) and corpora (the Skaldic Project and the Codex Regius project), which includes overcoming major differences in how each project models its data. It develops a semantic model for a large part of the lexicon based on native ontologies found in related texts.

Work performed

Developed an interface to link words in poetry in the corpus to the dictionary (lemmatisation), including variants from different manuscripts of the same poem
Performed lemmatisation of the published corpus (100%) and further editions in progress (in total 117,000 words)
Added and lemmatised variants from the corpus (approx. 25% completed)
Developed an interface for viewing and querying the resource, as well as for editing and updating it (http://lexiconpoeticum.org)
Connected the wordlist of LP to the existing Dictionary of Old Norse Prose (ONP) through a mixture of automated linking and manual linking and checking
Worked with ONP to make it more accessible and available to other projects so that LP can continue to link to it
Developed a system for importing XML-based corpora and lemmatising them automatically, manually, or both
Incorporated Codex Regius project’s XML edition to supplement the Skaldic Project’s coverage of the poetic corpus (resulting in >99% of total target corpus available)
Started developing an ontology for the words in the lexicon, based on the native ontology found explicitly in native poetological works and implicitly in the extended diction system used by poets
Developed quantitative methods for understanding the lexicon, including a method for comparing lexicon size across corpora of different sizes
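
The mixed automated and manual lemmatisation described in the list above can be illustrated with a minimal sketch. It is not the project's implementation: the Token class, the function names and the rule "link automatically only when a form has exactly one known headword" are all assumptions made for illustration, and the sample tokens are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """One word as it occurs in an edited text or a manuscript variant (hypothetical model)."""
    text_id: str                    # hypothetical identifier for the poem or stanza
    form: str                       # the attested word form
    headword: Optional[str] = None  # dictionary headword, once linked
    needs_review: bool = False      # True if a person must choose the headword


def build_form_index(linked_tokens):
    """Map each attested form to the set of headwords it has already been linked to."""
    index = defaultdict(set)
    for tok in linked_tokens:
        if tok.headword is not None:
            index[tok.form.lower()].add(tok.headword)
    return index


def auto_link(tokens, form_index):
    """Link unambiguous forms automatically and flag all others for manual work."""
    for tok in tokens:
        candidates = form_index.get(tok.form.lower(), set())
        if len(candidates) == 1:
            tok.headword = next(iter(candidates))
        else:
            # Either an unseen form or a form shared by several headwords.
            tok.needs_review = True
    return tokens


# Invented sample data: tokens that have already been lemmatised by hand.
already_linked = [
    Token("poem-a", "konungr", headword="konungr"),
    Token("poem-a", "konungs", headword="konungr"),
    Token("poem-a", "á", headword="á (prep.)"),
    Token("poem-b", "á", headword="á (noun)"),
]
new_tokens = [
    Token("poem-c", "konungs"),
    Token("poem-c", "á"),
    Token("poem-c", "hilmir"),
]

result = auto_link(new_tokens, build_form_index(already_linked))
for tok in result:
    print(tok.form, tok.headword, "review" if tok.needs_review else "linked")
```

In this sketch only the unambiguous form is linked without intervention; the homograph and the unseen form are left for an editor, which is one simple way of combining automated linking with manual checking.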

The project so far has resulted in a number of methodological advancements, the development of a very large public resource, and some initial results in the quantitative analysis of the lexicon in relation to the corpus:

A new method for efficient assisted manual linking of corpus words to dictionary headwords (presented at DHN 2017, Euralex 2018)
A new method for highly accurate automated linking of words to dictionary headwords (presented at Euralex 2018)
A new method, with results, for comparing corpora of different sizes, taking into account the non-linear relationship between lexicon size and corpus size (presented at Saga Conference 2018)
A native ontology implemented for semantic classification of words in the lexicon (presented at ICHLL9)
A model of lexicon size that identifies constants for each corpus, which can then be used to predict lexicon size from corpus size
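
The idea of per-corpus constants that predict lexicon size from corpus size can be sketched as follows. The report does not state the functional form used, so this example assumes a power law of the Heaps type, V = K * N**beta, where N is the number of running words, V the number of distinct headwords, and K and beta the constants fitted for a corpus. The function names and the data points are invented for illustration.

```python
import math


def fit_power_law(observations):
    """Fit V = K * N**beta by ordinary least squares on log-transformed data.

    observations: list of (corpus_size_in_tokens, lexicon_size_in_headwords).
    Returns the pair (K, beta).
    """
    xs = [math.log(n) for n, _ in observations]
    ys = [math.log(v) for _, v in observations]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


def predict_lexicon_size(k, beta, corpus_size):
    """Predicted number of distinct headwords for a corpus of the given size."""
    return k * corpus_size ** beta


# Synthetic growth curve generated from K = 10 and beta = 0.5 (not project data).
sample = [(n, 10 * n ** 0.5) for n in (1_000, 10_000, 100_000)]
k, beta = fit_power_law(sample)
predicted = predict_lexicon_size(k, beta, 40_000)
print(round(k, 3), round(beta, 3), round(predicted, 1))
```

Under this assumption, two corpora of different sizes can be compared by evaluating each fitted curve at the same corpus size rather than comparing raw lexicon counts.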

The progress of the project is also documented on the project web site (see http://skaldic.abdn.ac.uk/m.php?p=doclp&i=989).

Final results

New resources and results:

A lexicon of 15,000 words linked to 700,000 instances of words on manuscript pages
Close integration with the standard prose dictionary with continued co-development of the two dictionaries
Incorporation of the remainder of the poetic corpus from the Codex Regius project, including a new system for incorporating new digital XML editions into both LP and ONP
A highly accurate and very fast mixed automated-manual lemmatisation system, allowing new texts to be rapidly and accurately linked to dictionary headwords
New insights into the lexical richness of the corpus, which uses twice as many words per thousand words of text as the prose corpus, and substantially more than comparable poetic corpora.
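
A measure such as "words per thousand words of text" can be approximated by counting distinct headwords in equal-sized windows of lemmatised text. The report does not say how the figure was computed, so the following is only one possible reading; the function name, the window handling and the toy data (with a window of 4 instead of 1,000) are assumptions.

```python
def types_per_window(lemmas, window=1000):
    """Mean number of distinct headwords in consecutive, equal-sized windows.

    lemmas: sequence of headwords in text order, one per running word.
    A trailing partial window is ignored so that all windows are comparable.
    """
    counts = [
        len(set(lemmas[start:start + window]))
        for start in range(0, len(lemmas) - window + 1, window)
    ]
    if not counts:
        raise ValueError("text is shorter than one window")
    return sum(counts) / len(counts)


# Toy sequences standing in for lemmatised poetry and prose (not project data).
poetry = ["a", "b", "c", "d", "a", "b", "e", "f"]
prose = ["a", "a", "b", "b", "a", "a", "b", "b", "a"]

poetry_rate = types_per_window(poetry, window=4)
prose_rate = types_per_window(prose, window=4)
print(poetry_rate, prose_rate, poetry_rate / prose_rate)
```

Because every window has the same length, the resulting rates do not depend on the total size of each corpus, which matters given the non-linear growth of lexicon size noted earlier in the report.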

The broader impact of the project is still to be realised, but the Fellow has recently been approached by a major games software company in relation to the project.

Website & more info

More info: http://lexiconpoeticum.org.