

Periodic Reporting for period 1 - ECOLANG (Ecological Language: A multimodal approach to language and the brain)

Teaser

The human brain has evolved the ability to support communication. In everyday settings, language is learned and mostly used in face-to-face contexts in which multiple cues contribute to comprehension: speech, including intonation of the voice, what has been said before, the...

Summary

The human brain has evolved the ability to support communication. In everyday settings, language is learned and mostly used in face-to-face contexts in which multiple cues contribute to comprehension: speech, including intonation of the voice; what has been said before; the face (especially mouth movements); and the hands: gestures that are linked to what is being said, such as pointing to an object while talking about it, as well as gestures that evoke imagery of what is being said. Yet, our understanding of how language is learned and used in adulthood, and of the associated neural circuitry, comes almost exclusively from studying language as merely speech or text, in which the real-world context is eliminated or greatly reduced.

The overarching goal of ECOLANG is to pioneer a new way to empirically study language learning and comprehension, moving from the traditional (reductionist) approach to a real-world approach in which language is analysed in its rich face-to-face multimodal environment. In particular, we ask whether and how children and adults use multimodal cues to learn new vocabulary and to process known words. We further ask how the brain integrates the different types of multimodal information during language learning and comprehension.

Using a real-world approach to language learning and comprehension will provide key insights into the treatment of developmental and acquired language disorders, potentially leading the way to new treatments. It will also provide novel constraints for automatic language processing, leading to improved performance by automatic systems in learning and processing language and in interacting with humans.

Work performed

During the period, we have focused on collecting and annotating the corpus of real-world language. This comprises interactions in which an adult speaker describes sets of objects to a naïve addressee (an adult or a 3-4-year-old child). These interactions are designed to mimic as closely as possible everyday interactions between two people. Of special interest to us is whether speakers use multimodal cues in a different manner when talking to a child rather than to an adult; whether they use the cues differently when talking about objects that are known or new to their partner; and, finally, whether they do so when the objects are present or absent. Each conversational partner is video-recorded. Speakers wear eye-tracking glasses, which record their eye gaze during the interaction, and a microphone. Kinect technology is used to capture speakers’ movements.

We have collected data for 30 caregiver-child dyads and about 25 adult-adult dyads. We have developed protocols for the annotation of the data, focusing on the speaker and including: (a) transcription of speech; (b) coding of the prosodic cues present in the communication; (c) gesture coding; (d) eye-gaze annotation; (e) mouth informativeness (i.e., how informative the lip movements are for a given word). Data annotation is labour intensive and is carried out by combining automatic tools, crowdsourcing and manual coding.
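To make the annotation layers concrete, the sketch below shows one possible way a single word-level record could be represented in Python, bringing together layers (a)-(e) above. The class and field names (WordAnnotation, mouth_informativeness, etc.) are illustrative assumptions and do not reflect the project's actual annotation schema or tools.

from dataclasses import dataclass, field
from typing import Optional, List

# Minimal sketch of a word-level annotation record combining the layers above.
# Field names are illustrative, not the project's actual schema.
@dataclass
class WordAnnotation:
    word: str                                       # orthographic transcription
    onset: float                                    # word onset (seconds)
    offset: float                                   # word offset (seconds)
    pitch_peak_hz: Optional[float] = None           # prosodic cue: F0 peak over the word
    gesture: Optional[str] = None                   # e.g. "point", "iconic", or None
    gaze_target: Optional[str] = None               # e.g. "object", "addressee", "elsewhere"
    mouth_informativeness: Optional[float] = None   # rated informativeness of lip movements

@dataclass
class Utterance:
    speaker: str                                    # e.g. "caregiver"
    addressee: str                                  # e.g. "child" or "adult"
    words: List[WordAnnotation] = field(default_factory=list)

# Example: one annotated word in a caregiver-to-child utterance.
utt = Utterance(
    speaker="caregiver",
    addressee="child",
    words=[WordAnnotation(word="ball", onset=2.31, offset=2.65,
                          pitch_peak_hz=310.0, gesture="point",
                          gaze_target="object", mouth_informativeness=0.62)],
)
print(utt.words[0].gesture)  # -> "point"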

Our main results so far concern the data from children (for which annotation is further along). These are:

1. Caregivers use multimodal cues in their interactions with children when the cues are most useful for learning; thus, they can and do modulate their communication in order to maximise opportunities for learning.

2. Some of the cues show a developmental trend: for example, caregivers use a larger number of onomatopoeic words (e.g., bang) when talking to younger than to older children.

Together, these results begin to build a comprehensive picture of how caregivers support their young children in learning new words and concepts.


In addition to collecting and annotating the corpus, during the period we have begun developing computational models of how the different cues are statistically distributed and integrated in the caregiver-child interactions, building on previous work in computational linguistics in which measures of the predictability of a word are derived from language corpora.
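As an illustration of the kind of corpus-derived predictability measure referred to above, the sketch below computes word surprisal, -log2 P(word | previous word), from bigram counts with add-one smoothing on a toy corpus. This is a generic textbook measure shown for illustration only; it is an assumption, not the project's actual model or data.

import math
from collections import Counter

# Toy corpus standing in for real transcripts (illustrative only).
corpus = "look at the ball look at the dog the ball rolls".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigrams)

def surprisal(prev: str, word: str) -> float:
    """-log2 P(word | prev), estimated with add-one (Laplace) smoothing."""
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return -math.log2(p)

# A frequent continuation vs. an unattested one:
print(round(surprisal("the", "ball"), 2))   # lower surprisal: bigram seen in the corpus
print(round(surprisal("the", "cat"), 2))    # higher surprisal: bigram never seen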

Finally, we carried out a first EEG study investigating the brain signatures of processing the multimodal cues online. Using our annotation protocol, we quantified each of these cues, and we used these measures to predict participants’ EEG responses to words in short passages produced by an actor instructed to read the passages in a natural manner (including gesturing).
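The general logic of this analysis, predicting per-word EEG responses from quantified cues, can be sketched as an ordinary least-squares regression on simulated data, as below. The predictor names, the simulated effect sizes and the use of plain least squares are assumptions made for illustration; they are not the study's actual predictors or analysis pipeline.

import numpy as np

# Illustrative sketch: regress per-word EEG amplitudes on quantified multimodal
# cues with ordinary least squares, using simulated data.
rng = np.random.default_rng(0)
n_words = 500

# Hypothetical per-word predictors derived from an annotation protocol.
surprisal = rng.normal(8.0, 2.0, n_words)            # linguistic predictability
prosodic_stress = rng.integers(0, 2, n_words)        # 1 if the word carries a pitch accent
gesture_present = rng.integers(0, 2, n_words)        # 1 if a gesture co-occurs
mouth_informativeness = rng.uniform(0, 1, n_words)   # rated lip-movement informativeness

# Simulated per-word EEG amplitude (e.g., mean voltage in a fixed time window),
# generated so that each cue contributes, plus noise.
eeg = (-0.5 * surprisal + 0.8 * prosodic_stress + 0.6 * gesture_present
       + 1.2 * mouth_informativeness + rng.normal(0, 1.0, n_words))

# Design matrix with an intercept; estimate regression weights.
X = np.column_stack([np.ones(n_words), surprisal, prosodic_stress,
                     gesture_present, mouth_informativeness])
beta, *_ = np.linalg.lstsq(X, eeg, rcond=None)

for name, b in zip(["intercept", "surprisal", "prosody", "gesture", "mouth"], beta):
    print(f"{name:>10s}: {b:+.2f}")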

We found that the brain activity associated with processing words is always modulated by the multimodal cues. This provides direct neural evidence that language comprehension cannot be reduced to linguistic processing but entails the dynamic integration of speech and the various non-linguistic cues.

Final results

Progress beyond state-of-the-art

ECOLANG pioneers a new way to empirically study spoken language. We study real-world language as a multimodal phenomenon: the input comprises linguistic information, but also prosody, mouth movements, eye gaze and gestures. This contrasts with the current mainstream reductionist approach in which language is most often reduced to speech or text. By bringing in the multimodal context, we blur the traditional distinction between language and communication currently present in linguistics, psychology of language, and neurobiology of language. Crucially, we also study language in real-world settings in which interlocutors are adults or children and learning new words is intertwined with processing known words. This contrasts with current approaches in which language processing is studied in adults and language learning is studied in children.



Expected results

1. Collection and annotation of the multimodal corpus
To our knowledge, this will be the first naturalistic annotated corpus that comprises both adult-to-adult and adult-to-child conversations and that manipulates key aspects of the context. These manipulations are expected to bring about differently weighted combinations of the cues: for example, will visible cues (gesture, mouth movements), in addition to prosody, be more prominent in child-directed than in adult-directed language? Is gesture different when objects are present (pointing to the objects) vs. when they are absent (gestures iconic of referent properties)?

2. Computational models of multimodal language
We will develop the first computational account of how multimodal cues are combined in spoken language, tested against behavioural and brain data.

3. Behavioural and neurophysiological signatures of multimodal language learning and processing
Our experiments will provide the first evidence regarding whether and how the multimodal cues in the input are used in processing and learning.

4. Toward an understanding of the natural organisation of language in the brain
Our imaging experiments will provide the first evidence concerning how neural networks are orchestrated in multimodal language. The patient study will allow us to establish whether there is a causal connection between activation of separable subnetworks and use of specific cues. This may have profound implications for rehabilitation.

Website & more info

More info: http://www.language-cognition-lab.org/research/ecological-language/.