TTC

Terminology extraction, translation tools and comparable corpora

 Coordinator UNIVERSITE DE NANTES

 Organization address
address: quai de Tourville 1
city: Nantes
postcode: 44035

contact info
Title: Ms.
First name: Pauline
Last name: BOUDANT
Phone: -40998462
Fax: -40998381

 Coordinator nationality France [FR]
 Total cost 2,663,099 €
 EC contribution 2,025,000 €
 Programme FP7-ICT
Specific Programme "Cooperation": Information and communication technologies
 Call code FP7-ICT-2009-4
 Funding scheme CP
 Start year 2010
 Period (year-month-day) 2010-01-01 - 2012-12-31

 Participants

# participant  country  role  EC contrib. [€] 
1    UNIVERSITE DE NANTES

 Organization address
address: quai de Tourville 1
city: Nantes
postcode: 44035

contact info
Title: Ms.
First name: Pauline
Last name: BOUDANT
Phone: -40998462
Fax: -40998381

FR (Nantes) coordinator 0.00
2    EURINNOV SARL

 Organization address
address: Rue Jean Goujon
city: Paris
postcode: 75008

contact info
Title: Mr.
First name: Matthieu
Last name: Rolland
Phone: -215
Fax: -244

FR (Paris) participant 0.00
3    SOGITEC INDUSTRIES SA

 Organization address
address: Rue Marcel Monge
city: Suresnes
postcode: 92158

contact info
Title: Mr.
First name: Claude
Last name: Méchoulam
Phone: -166
Fax: -166

FR (Suresnes) participant 0.00
4    SYLLABS SARL

 Organization address
address: RUE JEAN BAPTISTE BERLIER - PEPINIERE MASSENA
city: PARIS 13
postcode: 75013

contact info
Title: Ms.
First name: Helena
Last name: Blancafort
Phone: -172
Fax: -177

FR (PARIS 13) participant 0.00
5    TILDE SIA

 Organization address
address: VIENIBAS GATVE
city: RIGA
postcode: 1004

contact info
Title: Mr.
First name: Aivars
Last name: Berzins
Phone: -67604630
Fax: -67605379

LV (RIGA) participant 0.00
6    UNIVERSITAET STUTTGART

 Organization address
address: Keplerstrasse
city: STUTTGART
postcode: 70174

contact info
Title: Dr.
First name: Ulrich
Last name: Heid
Phone: -82720
Fax: -82713

DE (STUTTGART) participant 0.00
7    UNIVERSITY OF LEEDS

 Organization address
address: Woodhouse Lane
city: LEEDS
postcode: LS2 9JT

contact info
Title: Dr.
First name: Serge
Last name: Sharoff
Phone: -7699
Fax: -3699

UK (LEEDS) participant 0.00

 Project objective

The TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) aims to leverage machine translation tools (MT tools), computer-assisted translation tools (CAT tools) and multilingual content management tools by automatically generating bilingual terminologies from comparable corpora in several European languages (i.e. English, French, German and Latvian), as well as in Chinese and Russian.

Comparable corpora are sets of texts that belong to the same domain but are not necessarily translations of one another.

The main steps in automatically generating bilingual terminologies are the automatic extraction of monolingual terminologies and the bilingual alignment of the extracted terminologies. The terminologies will include single-word terms (SWT) and multi-word terms (MWT), as well as their variants.

The TTC project will develop generic methods and tools for the automatic extraction of terminologies, together with alignment algorithms that include adaptors to domains and languages, in order to break the lexical acquisition bottleneck in both statistical and rule-based machine translation. Alignment will be based on several strategies, i.e. lexical strategies (use of compositional methods and of an interlingua representation), contextual strategies (use of cognates, context vectors and labelled links) and corpus strategies (improvement of available corpora, for instance by topical web crawling). The methods developed will require as little prior linguistic knowledge as possible, so as to reduce the gaps in language coverage.

The project will also develop or adapt tools for gathering and managing these comparable corpora and for managing terminologies. In particular, a topical web crawler and an open terminology platform will be developed. This open terminology platform will support tasks such as terminology storage, search, editing and export.

The TTC project will integrate newly developed and existing tools in an online platform, which will be based on Web Services and will use reputable open solutions such as UIMA (Unstructured Information Management Architecture) and EuroTermBank. Existing tools to be integrated in the platform consist of already developed GPL term extraction tools, a framework for contextual analysis, as well as TreeTagger versions, tokenisers and POS taggers for several languages. The platform will allow users to create thematic corpora given some clues (such as terms or documents on a specific domain), to extract monolingual terminology from such corpora, to create a comparable corpus in a target language from a corpus in a source language, to align bilingual terminologies, to choose the tools to apply for terminology extraction, to expand a given corpus and to export monolingual or bilingual terminologies so that they can be used easily in automatic and semi-automatic translation tools.
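
As an illustration of the contextual strategy named in the objective (context vectors), the following is a minimal Python sketch of how a source-language term might be aligned with target-language candidates drawn from a comparable corpus. It is not the TTC implementation: the function names, the toy French and English sentences, and the seed dictionary are all hypothetical, and raw co-occurrence counts with cosine similarity stand in for whatever weighting and similarity measures the project actually uses.

```python
from collections import Counter
from math import sqrt


def context_vector(term, sentences, window=2):
    """Count the words that occur within `window` tokens of `term`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def translate_vector(vec, seed_dict):
    """Map source-language context words to the target language.

    Words missing from the (hypothetical) seed dictionary are dropped.
    """
    out = Counter()
    for word, n in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(src_term, src_sents, tgt_terms, tgt_sents, seed_dict, window=2):
    """Rank target terms by similarity of their contexts to the source term's."""
    src_vec = translate_vector(context_vector(src_term, src_sents, window), seed_dict)
    scored = [(cosine(src_vec, context_vector(t, tgt_sents, window)), t)
              for t in tgt_terms]
    return sorted(scored, reverse=True)


# Toy comparable corpora (assumed data): same domain, not translations of each other.
fr_sents = [
    ["l", "éolienne", "produit", "de", "l", "énergie"],
    ["le", "rotor", "de", "l", "éolienne", "tourne"],
]
en_sents = [
    ["the", "turbine", "produces", "energy"],
    ["the", "rotor", "of", "the", "turbine", "spins"],
    ["the", "bank", "lends", "money"],
]
# Hypothetical seed dictionary used to translate context words.
seed = {"produit": "produces", "tourne": "spins", "de": "of",
        "l": "the", "énergie": "energy", "rotor": "rotor"}

ranked = rank_candidates("éolienne", fr_sents, ["turbine", "bank"], en_sents, seed)
for score, term in ranked:
    print(f"{term}\t{score:.3f}")
```

On this toy data the candidate "turbine" ranks above "bank" because its contexts overlap more with the translated contexts of "éolienne". A realistic setup would filter function words and weight co-occurrences with an association measure rather than raw counts; those choices are left out here to keep the sketch short.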

Other projects from the same programme (FP7-ICT)

HIVE (2008)

Hyper interaction viability experiments


COWIN (2010)

Converging resources to support the value creation in Europe of Microsystems and Smart Miniaturized Systems research projects


NESTER (2008)

Networked embedded and control systems technologies for Europe and Russia
