Opendata, web and dolomites

NonSequeToR SIGNED

Non-sequence models for tokenization replacement

Project "NonSequeToR" data sheet

The following table provides information about the project.

Coordinator	LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN, GESCHWISTER SCHOLL PLATZ 1, 80539 MUENCHEN (website: www.uni-muenchen.de)
Coordinator Country	Germany [DE]
Total cost	2,500,000 €
EC max contribution	2,500,000 € (100%)
Programme	1. H2020-EU.1.1. (EXCELLENT SCIENCE - European Research Council (ERC))
Code Call	ERC-2016-ADG
Funding Scheme	ERC-ADG
Starting year	2017
Duration (year-month-day)	from 2017-10-01 to 2022-09-30

Partnership

Take a look at the project's partnership.

#	participants	country	role	EC contrib. [€]
1	LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN, GESCHWISTER SCHOLL PLATZ 1, 80539 MUENCHEN (website: www.uni-muenchen.de)	DE (MUENCHEN)	coordinator	2,500,000.00

Project objective

Natural language processing (NLP) is concerned with computer-based processing of natural language, with applications such as human-machine interfaces and information access. The capabilities of NLP are currently severely limited compared to humans. NLP has high error rates for languages that differ from English (e.g., languages with higher morphological complexity like Czech) and for text genres that are not well edited (or noisy) and that are of high economic importance, e.g., social media text.

NLP is based on machine learning, which requires as basis a representation that reflects the underlying structure of the domain, in this case the structure of language. But representations currently used are symbol-based: text is broken into surface forms by sequence models that implement tokenization heuristics and treat each surface form as a symbol or represent it as an embedding (a vector representation) of that symbol. These heuristics are arbitrary and error-prone, especially for non-English and noisy text, resulting in poor performance.
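
As a rough illustration of why symbol-based tokenization is fragile on noisy text, the sketch below compares a clean and a noisy spelling of the same sentence. It is not the project's method: the function names, the example sentences, and the use of character trigrams as a stand-in for a tokenization-free representation are all assumptions made for this example.

```python
import re

def heuristic_tokenize(text):
    """Hypothetical heuristic tokenizer: lowercase, then split into
    word-like runs and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def char_ngrams(text, n=3):
    """Set of overlapping character n-grams over the raw lowercased text,
    used here only as a simple stand-in for a non-symbolic view of text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard(a, b):
    """Jaccard overlap between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Assumed example pair: a clean sentence and a social-media-style variant.
clean = "that was so good"
noisy = "that was sooo gooood"

# Each distinct surface form is a separate symbol, so "good" and "gooood"
# share nothing at the token level.
token_overlap = jaccard(heuristic_tokenize(clean), heuristic_tokenize(noisy))

# Character trigrams still share most of their material.
ngram_overlap = jaccard(char_ngrams(clean), char_ngrams(noisy))

print(f"token overlap:   {token_overlap:.2f}")
print(f"trigram overlap: {ngram_overlap:.2f}")
```

With these inputs the trigram overlap comes out higher than the token overlap, because the elongated spellings become entirely new symbols for the tokenizer while most of their character material is unchanged.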

Advances in deep learning now make it possible to take the embedding idea and liberate it from the limitations of symbolic tokenization. I have the interdisciplinary expertise in computational linguistics, computer science and deep learning required for this project and am thus in the unique position to design a radically new robust and powerful non-symbolic text representation that captures all aspects of form and meaning that NLP needs for successful processing.

By creating a text representation for NLP that is not impeded by the limitations of symbol-based tokenization, the foundations are laid to take NLP applications like human-machine interaction, human-human communication supported by machine translation and information access to the next level.

Deliverables

List of deliverables.
Data Management Plan	Open Research Data Pilot	2019-03-25 09:52:52

See the detailed list of NonSequeToR deliverables.

Publications

List of publications.
year	authors and title	journal	last update
2019	Timo Schick, Hinrich Schütze. Learning Semantic Representations for Novel Words: Leveraging Both Form and Context. DOI: 10.5282/ubm/epub.61859		2019-06-06
2018	Philipp Dufter, Hinrich Schütze. A Stronger Baseline for Multilingual Word Embeddings. DOI: 10.5282/ubm/epub.61864		2019-06-06
2019	Apostolos Kemos, Heike Adel, Hinrich Schütze. Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging. DOI: 10.5282/ubm/epub.61846		2019-06-06
2019	Timo Schick, Hinrich Schütze. Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking. DOI: 10.5282/ubm/epub.61863		2019-06-06
2018	Yadollah Yaghoobzadeh, Heike Adel, Hinrich Schuetze. Corpus-Level Fine-Grained Entity Typing. Pages: 835-862, ISSN: 1076-9757, DOI: 10.1613/jair.5601	Journal of Artificial Intelligence Research 61	2019-06-06
2019	Timo Schick, Hinrich Schütze. Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts. DOI: 10.5282/ubm/epub.61844		2019-06-06
2018	Wenpeng Yin, Hinrich Schütze. Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms. Pages: 687-702, ISSN: 2307-387X, DOI: 10.1162/tacl_a_00249	Transactions of the Association for Computational Linguistics 6	2019-06-06
2018	Nina Poerner, Masoud Jalili Sabet, Benjamin Roth and Hinrich Schütze. Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective. DOI: 10.5282/ubm/epub.61865		2019-06-06
2019	Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita, Stefan Rüd, Hinrich Schütze. SMAPH. Pages: 1-42, ISSN: 1046-8188, DOI: 10.1145/3284102	ACM Transactions on Information Systems 37/1	2019-06-06

The information about "NONSEQUETOR" is provided by the European Opendata Portal: CORDIS opendata.