AUTOWORDNET

The Automatic Generation of Lexical Databases Analogous to WordNet

Coordinator: UNIVERSITE D'AIX MARSEILLE

Organization address: Boulevard Charles Livon 58
city: Marseille
postcode: 13284

Contact info
Title: Ms.
First name: Celine
Last name: Damon
Telephone: +33 4 91998595
Fax: +33 4 91998599

Coordinator nationality: France [FR]
Total cost: 258,475 €
EC contribution: 258,475 €
Programme: FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
Call code: FP7-PEOPLE-2010-IEF
Funding scheme: MC-IEF
Start year: 2012
Period (year-month-day): 2012-09-01 - 2014-08-31

Participants

#  participant  country  role  EC contrib. [€]
1  UNIVERSITE D'AIX MARSEILLE  FR (Marseille)  coordinator  258,475.00


Project objective

WordNet is a lexical database of English in which words are grouped into sets of synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. WordNet has turned out to be an indispensable resource in natural language processing, and similar lexical databases modeled on it have been created for many other languages.
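To make the data model concrete, here is a minimal Python sketch of synsets and the relations that link them. It is an illustration only: the `Synset` class, the `build_index` helper, and the identifiers such as `bank-1` are hypothetical names chosen for this example and are not taken from WordNet or from the project.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous words expressing one concept (hypothetical structure)."""
    ident: str
    words: frozenset
    # relation name -> list of target synset identifiers
    relations: dict = field(default_factory=dict)

    def link(self, relation, target):
        """Record a directed relation from this synset to `target`."""
        self.relations.setdefault(relation, []).append(target.ident)


def build_index(synsets):
    """Map each word to the identifiers of all synsets that contain it."""
    index = {}
    for synset in synsets:
        for word in sorted(synset.words):
            index.setdefault(word, []).append(synset.ident)
    return index


# Illustrative entries; the identifiers are made up for this sketch.
car = Synset("car-1", frozenset({"car", "automobile"}))
vehicle = Synset("vehicle-1", frozenset({"vehicle"}))
bank_river = Synset("bank-1", frozenset({"bank", "riverbank"}))
bank_money = Synset("bank-2", frozenset({"bank", "depository"}))

car.link("hypernym", vehicle)
index = build_index([car, vehicle, bank_river, bank_money])
```

In this sketch an ambiguous word such as "bank" maps to more than one synset, while the relation table on each synset carries the conceptual-semantic links.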

However, constructing such databases takes many years of work and is very costly. On the other hand, methods for the automatic identification of semantically related words based on large text corpora have reached a considerable degree of maturity, with results coming close to native speakers’ performance. The proposed project aims to further refine and extend these approaches so that a resource similar to WordNet can be generated fully automatically. The developed system will be largely language independent and is to be applied to four European languages, namely English, French, German, and Spanish. The resulting databases will be made freely available on the internet.

The proposed methodology is outlined as follows. Starting from a part-of-speech tagged corpus, various methods for computing related words, such as syntax-based approaches or latent semantic analysis, are applied and the results are systematically compared. The quality is evaluated by comparing the simulation results to a recently published data set comprising the 200,000 human similarity judgments from the Princeton Evocation project, rather than to the well-established but inadequate 80-item TOEFL dataset. To identify synsets, an algorithm for unsupervised word sense induction is applied, and each word in the vocabulary is assigned to one or (if ambiguous) several of the synsets. Finally, to determine the relations between words (e.g. synonymy, hyponymy, holonymy, meronymy), an adapted version of Peter Turney’s approach for computing relational similarities is developed and applied.
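The first two steps of this outline (computing related words and scoring them against human judgments) can be sketched roughly as follows. This is not the project's system: it substitutes plain window-based co-occurrence counts with cosine similarity for the syntax-based and latent semantic analysis methods named above, it uses a toy corpus, and the `human_scores` values are placeholders rather than Princeton Evocation data. All function names are hypothetical. Word sense induction and relation classification are not shown.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count context words within `window` positions of each target word."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


# Toy corpus standing in for a large tagged corpus (assumption).
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the cat".split(),
    "the car needs fuel".split(),
    "the truck needs fuel".split(),
]
vectors = cooccurrence_vectors(corpus)

pairs = [("cat", "dog"), ("car", "truck"), ("cat", "fuel")]
model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
human_scores = [0.8, 0.9, 0.1]  # placeholder values, NOT real judgment data
rho = spearman(model_scores, human_scores)
```

A rank correlation is used here because model similarities and human ratings live on different scales; only their ordering of word pairs is compared. Whether the project used this particular measure is not stated in the description above.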
