MULTILEX

Multilingual Lexicon Extraction from Comparable Corpora

 Coordinator JOHANNES GUTENBERG UNIVERSITAET MAINZ

 Organization address: SAARSTRASSE 21
city: MAINZ
postcode: 55099

Contact info
Title: Dr.
First name: Sascha
Last name: Hofmann
Phone: +49 7274 508 35111
Fax: +49 7274 508 35412

 Coordinator nationality Germany [DE]
 Total cost 100,000 €
 EC contribution 100,000 €
 Programme FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
 Call code FP7-PEOPLE-2013-CIG
 Funding scheme MC-CIG
 Start year 2014
 Period (year-month-day) 2014-09-01 - 2018-08-31

 Participants

# participant  country  role  EC contrib. [€]
1  JOHANNES GUTENBERG UNIVERSITAET MAINZ  DE (MAINZ)  coordinator  100,000.00

 Project objective

Given large collections of parallel (i.e. translated) texts, it is well known how to establish correspondences between words across languages by successively applying a sentence-alignment step and a word-alignment step. However, parallel texts are a scarce resource for most language pairs involving lesser-used languages. Human second language acquisition, on the other hand, does not seem to require exposure to large amounts of translated text, which indicates that there must be another way of crossing the language barrier. Apparently, human capabilities are based on comparable resources, i.e. texts or speech on related topics in different languages that are not translations of each other. Comparable (written or spoken) corpora are far more common than parallel corpora and thus offer the chance to overcome the data acquisition bottleneck.

Despite this cognitive motivation, the proposed project will not attempt to simulate the complexities of human second language acquisition. Instead, it will show that information on word and multiword translations can be extracted automatically from comparable corpora by purely technical means. The aim is to push the boundaries of current approaches, which typically utilize correlations between co-occurrence patterns across languages, in several ways:

1) Eliminating the need for initial lexicons by using a bootstrapping approach that requires only a few seed translations.
2) Implementing a new methodology that first establishes alignments between comparable documents across languages and then computes cross-lingual alignments between words and multiword units.
3) Improving the quality of computed word translations by applying an interlingua approach which, by relying on several pivot languages, allows a highly effective multi-dimensional cross-check.
4) Showing that, by looking at foreign citations, translations can even be derived from a single monolingual text corpus.
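The co-occurrence idea that the objective attributes to current approaches can be illustrated with a minimal Python sketch. It is not the project's method: the toy sentences, the two-entry seed lexicon, and the function names `cooc_vector`, `cosine` and `rank_translations` are all assumptions made for illustration. A source word is described by how often it shares a sentence with each seed word, the resulting vector is mapped into the target language through the seed translations, and target candidates are ranked by cosine similarity.

```python
from collections import Counter
from math import sqrt


def cooc_vector(sentences, target, context_words):
    """Count how often `target` shares a sentence with each context word."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        if target in tokens:
            for word in context_words:
                if word != target and word in tokens:
                    counts[word] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(value * v.get(key, 0) for key, value in u.items())
    norm_u = sqrt(sum(value * value for value in u.values()))
    norm_v = sqrt(sum(value * value for value in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_translations(src_sents, tgt_sents, seed, source_word, candidates):
    """Rank target-language candidates for `source_word`, best first."""
    src_vec = cooc_vector(src_sents, source_word, seed.keys())
    # Map source-language dimensions to the target language via the seeds.
    mapped = {seed[word]: count for word, count in src_vec.items()}
    scored = []
    for candidate in candidates:
        tgt_vec = cooc_vector(tgt_sents, candidate, seed.values())
        scored.append((cosine(mapped, tgt_vec), candidate))
    return sorted(scored, reverse=True)


# Hypothetical comparable (not parallel) toy corpora: same topics,
# but the German sentences are not translations of the English ones.
EN_SENTS = [
    "the doctor works in the hospital",
    "the nurse helps the doctor in the hospital",
    "the teacher works in the school",
    "the pupil learns in the school",
]
DE_SENTS = [
    "im krankenhaus behandelt der arzt viele patienten",
    "der arzt verlässt das krankenhaus am abend",
    "der lehrer kommt früh in die schule",
    "in der schule spricht der lehrer mit den eltern",
]
SEED = {"hospital": "krankenhaus", "school": "schule"}  # assumed seed lexicon

ranking = rank_translations(EN_SENTS, DE_SENTS, SEED, "doctor", ["arzt", "lehrer"])
print(ranking[0][1])
```

In a bootstrapping setting such as the one named in point 1, the best-scoring pairs could be added to the seed lexicon and the ranking repeated. That loop is omitted here, and real corpora would need association measures more robust than raw sentence-level counts.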
