Report

Periodic Reporting for period 1 - Demeco (Default meanings in compound interpretation)

Summary

Languages always change, and existing words are constantly put together in novel combinations such as the compound nouns ‘trout pout’ (lips injected with collagen) or ‘bucket list’ (things to do before you die). Surprisingly, little is known about the actual processes and resources humans use to understand these novel combinations when they encounter them for the first time. This project investigated a central question in the field of linguistics, namely the extent to which people rely on inherent meanings of words in combination (lexical semantics), compared with the extent to which they infer meaning from context (pragmatics).

Focussing on English noun-noun compounds, we proceeded in two steps. First, we investigated how humans interpret new compounds when they see them in the absence of any context. In particular, we wanted to know whether, for a given group of language users, some compounds show more variation in interpretation than others, and if so, what determines the amount of variation shown. Second, we investigated how twenty of these compounds were interpreted in context. Using contexts designed to bias the reader towards unusual interpretations of the compounds, we examined how reading time varied between compounds that have a clearly preferred reading in isolation (a ‘default’ interpretation) and compounds that do not.

This design was intended to enable us to evaluate two rival theories about the relative importance of different information sources to language users. In the theory of generalised conversational implicatures, any default interpretation is expected to arise automatically in the minds of speakers, and therefore needs to be ‘cancelled’ if it is not compatible with the context. According to this theory, the stronger the default interpretation in isolation, the longer it should take to read a compound when the context requires a different interpretation, because of the effort involved in overriding the default. In contrast, in the theory of default semantics, default interpretations only arise in the minds of speakers when they are compatible with the context. According to this theory, interpretations given out of context should have no effect on reading time for compounds in context.

A better understanding of how language works is potentially useful in a wide variety of applications, including language teaching, speech and language therapy, translation and all interactions between humans and computers.

Work performed

We completed the two studies outlined above, the first on compound interpretation in isolation, and the second on compound interpretation in context. We wanted to use compounds that had actually been coined by language users, but would nevertheless be new to our participants. To create this set of novel compounds, we selected 45 compounds that occurred only once in the ukWaC corpus, a huge database of 2 billion words taken from English texts on UK websites. Our assumption was that combinations that occurred so rarely were likely to be new to most speakers of the language.
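The selection step described above, finding noun-noun combinations that occur exactly once in a corpus, can be sketched in a few lines of Python. This is only an illustration, not the project's actual extraction procedure: it assumes the corpus is already available as a sequence of (word, part-of-speech) pairs, and the tag `"NN"` for common nouns and the function name `hapax_noun_noun` are hypothetical choices made for the example.

```python
from collections import Counter


def hapax_noun_noun(tagged_tokens, noun_tag="NN"):
    """Return noun-noun pairs that occur exactly once.

    tagged_tokens: iterable of (word, pos) pairs in corpus order.
    noun_tag: the tag marking common nouns (an assumption; real
              corpora use their own tag sets).
    """
    tokens = list(tagged_tokens)
    counts = Counter()
    # Count every pair of adjacent tokens that are both tagged as nouns.
    for (w1, p1), (w2, p2) in zip(tokens, tokens[1:]):
        if p1 == noun_tag and p2 == noun_tag:
            counts[(w1.lower(), w2.lower())] += 1
    # Keep only the pairs seen exactly once (hapax legomena).
    return sorted(pair for pair, c in counts.items() if c == 1)
```

A real pipeline would need further filtering, for example to deal with sequences of three or more nouns, tagging errors and proper names; candidates found this way would still have to be checked by hand.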

Study 1 investigated the interpretation of these compounds in isolation. In a computer-based questionnaire study, 20 participants gave their interpretations of the compounds and rated how difficult it was to come up with an interpretation. The participants were all native speakers of British English who were aged 16-19 years and had grown up in the East of England. The compounds varied widely in the number of interpretations they received, and only four compounds had a preferred interpretation on which at least half the participants agreed.

To quantify the variation, we introduced three measures: the percentage of participants who gave a non-unique interpretation (convergence), the number of different interpretations given (spread), and the degree of unpredictability of the interpretations (entropy). All three of these measures were found to correlate with the perceived difficulty of interpretation; in other words, the greater the variation in the interpretations given by our participants, the more difficult they found the compound to interpret. This is evidence that some compounds tend more than others towards having a default interpretation, whether defaultness is understood in terms of agreement between speakers (as in the theory of generalised conversational implicatures) or in terms of ease of interpretation (as in the theory of default semantics). Furthermore, these two understandings of defaultness converge, in the sense that the same compounds are more or less ‘default’ irrespective of which approach is adopted.

We then looked to see whether the relative ease of interpretation of different compounds could be explained in terms of their linguistic properties and the properties of other compounds sharing a word, the so-called compound constituent families: for example, the constituent families of ‘bank account’ include ‘bank manager’, ‘bank statement’, ‘bank holiday’, ‘credit account’, ‘cheque account’, ‘eye-witness account’ and so on. We found that the best predictor of interpretation difficulty was the variability in senses of the second noun in its constituent family: the greater the unpredictability of the meaning of the second noun in other compounds, the more difficult the compound was to interpret. This suggests that disambiguating the individual words, especially the final word, is a crucial step in compound interpretation.
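To make the three variation measures from Study 1 concrete, the following Python sketch computes convergence, spread and entropy for a single compound. It rests on several assumptions that the report does not spell out: the free-text answers are taken to have been coded into interpretation labels beforehand, a ‘non-unique’ interpretation is read as one given by at least two participants, and entropy is taken to be Shannon entropy in bits. The function name `variation_measures` is hypothetical.

```python
from collections import Counter
from math import log2


def variation_measures(interpretations):
    """Return (convergence, spread, entropy) for one compound.

    interpretations: list with one coded interpretation label per
    participant (the coding step itself is assumed to be done).
    """
    if not interpretations:
        raise ValueError("need at least one interpretation")
    counts = Counter(interpretations)
    n = len(interpretations)

    # Convergence: percentage of participants whose interpretation was
    # also given by at least one other participant (assumed reading of
    # 'non-unique').
    shared = sum(c for c in counts.values() if c > 1)
    convergence = 100.0 * shared / n

    # Spread: number of distinct interpretations.
    spread = len(counts)

    # Entropy: Shannon entropy of the label distribution, in bits
    # (the log base is an assumption).
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())

    return convergence, spread, entropy
```

For example, if three out of four participants chose the same reading and the fourth gave a reading of their own, convergence would be 75%, spread 2 and entropy about 0.81 bits; if all four agreed, the values would be 100%, 1 and 0. Low spread and entropy together with high convergence would point towards a default interpretation.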

Study 2 investigated the behaviour of twenty of these compounds in context. We created contexts that biased readers towards an unusual interpretation of the compound, and tracked the eye movements of participants as they read these compounds. In addition to the eye-tracking itself, Study 2 also included an extensive pre-test, which we used to check that the contexts were likely to produce the interpretations we expected, and a post-test, to check how the participants had actually interpreted the compounds. At the time of writing, the eye-tracking data is still being analysed.

Final results

Our results so far indicate that compound interpretation in isolation shows much more variation than previously assumed. We are also the first to show a correlation between interpretational difficulty and properties of the compound constituent families, in particular, the variation in senses of the second noun. This finding has major implications for psycholinguistic models of language as well as for computational approaches to language, which often treat words as having a single semantic representation. Our finding forces a rethinking of the role of word sense disambiguation in both these fields. Interpretation of compounds is a particular challenge for computer programs that interpret human language, such as Google Translate, and our findings point to fruitful directions for research to address these challenges. For cognitive science, our results show that individual and even idiosyncratic knowledge of the world is a major factor in language comprehension, possibly as important as the words themselves.

Website & more info

More info: http://www.demeco-project.eu/.