CLSASTS

Rapid Cross-Lingual Speaker Adaptation for Statistical Text-to-Speech Systems

Coordinator: Ozyegin University

Organization address: NISANTEPE MAH ORMAN SOK 13
city: ALEMDAG CEKMEKOY ISTANBUL
postcode: 34794

Contact info
Title: Dr.
First name: Nilay
Last name: Papila
Telephone: +90 216 564 95 68
Fax: +90 216 564 90 57

Coordinator nationality: Turkey [TR]
Total cost: 100,000 €
EC contribution: 100,000 €
Programme: FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
Call code: FP7-PEOPLE-2010-RG
Funding scheme: MC-IRG
Start year: 2011
Period (year-month-day): 2011-02-01 - 2015-01-31

Participants

# participant  country  role  EC contrib. [€]
1  Ozyegin University  TR (ALEMDAG CEKMEKOY ISTANBUL)  coordinator  100,000.00

Project objective

Unit selection has been the dominant approach to text-to-speech (TTS) synthesis over the last decade. More recently, statistical TTS (STS), in which statistical models are used to synthesize speech, has been proposed. The high-quality, intelligible speech it generates, the flexibility it offers in voice, speaker, and emotion conversion, and its small memory requirements make STS a strong candidate to become the dominant TTS technology of the next decade. One of the most exciting research directions in the STS field is speaker adaptation, where the goal is to adapt the voice characteristics to a target speaker. Maximum a posteriori (MAP) adaptation and maximum likelihood linear regression (MLLR) are two common approaches. There is also recent and growing interest in cross-lingual speaker adaptation with STS, where the goal is to generate speech with a speaker's voice characteristics in a target language that the speaker does not speak. Globalization and the need to communicate in multiple languages in social, economic, and political interactions increase the importance of the problem.

This proposal presents a novel rapid adaptation approach to the cross-lingual speaker adaptation problem that applies the eigenvoice adaptation technique to STS systems. In the proposed approach, two sets of eigenvoices are produced, one for the source language and one for the target language. A regression function is then learned between the source and target eigenvoice weights. Given a speaker, weights for the source eigenvoices are computed using a novel, perceptually motivated objective function; the regression function then estimates the target eigenvoice weights, which are used to synthesize speech in the target language. The proposed system is expected to be the first high-performance cross-lingual speaker adaptation method for STS that can work with 5-10 seconds of adaptation data.
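
The pipeline described above (two eigenvoice sets, a regression between their weights, then synthesis from the mapped weights) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the project: speakers are represented by fixed-length "supervectors" of model parameters, eigenvoices are obtained by PCA, the source weights come from a plain least-squares projection (a stand-in for the proposal's perceptually motivated objective, which the abstract does not specify), and the regression is a simple affine map. All function names are hypothetical, and the data are synthetic and noise-free.

```python
import numpy as np


def train_eigenvoices(supervectors, n_components):
    """Return (mean, eigenvoices) from a PCA of per-speaker supervectors."""
    mean = supervectors.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(supervectors - mean, full_matrices=False)
    return mean, vt[:n_components]


def eigenvoice_weights(supervector, mean, eigenvoices):
    """Least-squares weights of one speaker on orthonormal eigenvoices.

    ASSUMPTION: this projection replaces the proposal's perceptually
    motivated objective function, which is not given in the text.
    """
    return eigenvoices @ (supervector - mean)


def fit_weight_regression(src_weights, tgt_weights):
    """Fit an affine map from source-language to target-language weights."""
    design = np.hstack([src_weights, np.ones((len(src_weights), 1))])
    coef, *_ = np.linalg.lstsq(design, tgt_weights, rcond=None)
    return coef


def map_weights(src_weight, coef):
    """Apply the affine map to one speaker's source weights."""
    return np.append(src_weight, 1.0) @ coef


def reconstruct(weights, mean, eigenvoices):
    """Rebuild a supervector from eigenvoice weights."""
    return mean + weights @ eigenvoices


# Synthetic bilingual training speakers (hypothetical, noise-free data):
# both languages' supervectors depend linearly on a shared speaker factor.
rng = np.random.default_rng(0)
n_speakers, k, dim_src, dim_tgt = 30, 3, 40, 50
latent = rng.normal(size=(n_speakers + 1, k))  # last row: held-out speaker
src_all = latent @ rng.normal(size=(k, dim_src))
tgt_all = latent @ rng.normal(size=(k, dim_tgt))

# Step 1: one eigenvoice set per language, from the training speakers only.
src_mean, src_ev = train_eigenvoices(src_all[:-1], k)
tgt_mean, tgt_ev = train_eigenvoices(tgt_all[:-1], k)

# Step 2: regression between source and target eigenvoice weights.
src_w = (src_all[:-1] - src_mean) @ src_ev.T
tgt_w = (tgt_all[:-1] - tgt_mean) @ tgt_ev.T
coef = fit_weight_regression(src_w, tgt_w)

# Step 3: adapt to a new speaker observed only in the source language.
new_src_w = eigenvoice_weights(src_all[-1], src_mean, src_ev)
new_tgt_w = map_weights(new_src_w, coef)
predicted_tgt = reconstruct(new_tgt_w, tgt_mean, tgt_ev)
```

In this toy setup only `k` weights are estimated for the new speaker, which is the property that makes eigenvoice-style adaptation plausible with very little data. A real system would estimate the source weights from a few seconds of speech rather than from a complete supervector, and the mapping would not be exactly linear as it is in this synthetic example.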

Introduction (Teaser)

With rapid globalisation and the need for communication across multiple languages, attention is increasingly being focused on the development of supporting tools and applications. A group of EU-funded researchers is working to contribute to advances in this area that will ultimately help people communicate more effectively.

Other projects in the same programme (FP7-PEOPLE)

SUBSTRATE USE (2011)

Linking substrate consumption to consumer identity in carbon-cycling microbes inhabiting anoxic marine sediments

MUFOCA (2013)

"The behavioral, fMRI, and EEG profiles of multifocal attention"

THZPOWERELECTRONICS (2013)

Enabling Technologies for High Power Terahertz Electronic Circuitry
