Commentary on Steels, L. & Belpaeme, T.


Word counts:

Abstract: 60 words

Main text: 955 words

References: 360 words

Total text: 1,470 words


Learning colour words is slow: a cross-situational learning account


Paul Vogt1,2
1Language Evolution and Computation Research Unit
School of Philosophy, Psychology and Language Sciences
University of Edinburgh
40 George Square, Edinburgh EH8 9LL,
United Kingdom

2Induction of Linguistic Knowledge
Computational Linguistics
Tilburg University
P.O. Box 90153, 5000 LE Tilburg,
The Netherlands.
+44 131 650 3960

Andrew D. M. Smith
Language Evolution and Computation Research Unit
School of Philosophy, Psychology and Language Sciences
University of Edinburgh
40 George Square, Edinburgh EH8 9LL,
United Kingdom
+44 131 651 1837




Abstract: Research into child language reveals that it takes a long time for children to learn the correct mapping of colour words. Steels and Belpaeme’s guessing game, however, models fast learning of words. We discuss computational studies based on cross-situational learning, which yield results that are more consistent with the empirical child language data than those obtained by Steels and Belpaeme.

Steels and Belpaeme (hereinafter S&B) have successfully shown how computational modelling can contribute greatly to the study of the evolution of language and cognition. S&B have – in our opinion correctly – decided to write the paper from an engineer’s point of view. We feel, however, that their model of linguistic communication would have been more realistic, and therefore the results they obtained more robust, if they had used a model of acquiring colour categories through multiple contexts.


S&B model the communication between agents using the guessing game model, which is, in itself, not unreasonable. Their claim, however, that this game is “equivalent” to colour chip naming experiments carried out by anthropologists (Section 2.4.2) is not justified, in our opinion. The guessing game is primarily a model of learning through corrective feedback, whereas colour chip naming experiments consist of an anthropologist (A) asking an informant (B) to point out, on a chipset, the focal colour of a colour term from B’s language. There are three important differences between the anthropological experiments and the guessing game. Firstly, B is not doing any learning; in fact, it is A who is learning about B’s representation of colour and about B’s language. Secondly, A does not correct B’s responses or provide any feedback about them. Finally, there is no negotiation between A and B about what the words should refer to.


The positive feedback loop between the choice of which words to use and their success in communication is the main learning mechanism in the guessing game. Indeed, S&B claim that the feedback loop is a necessary requirement for cultural language development (Section 5, condition 1), although in fact it is widely accepted that children receive little, if any, corrective feedback while learning words (Bloom 2000; but see Chouinard & Clark 2003 for an alternative account). In computational simulations of lexicon creation and learning, similar to those presented by S&B, we have shown that agents using a cross-situational statistical learner (a variant of Siskind’s (1996) cross-situational learner) can successfully develop a shared vocabulary of grounded word meanings without corrective feedback (Smith 2003; Vogt 2004). In our model, as in guessing games, hearers have to infer what speakers are referring to, but, unlike in guessing games, the agents do not have any way of verifying the effectiveness of their attempts at communication. Instead, the agents use co-variances to learn a mapping between words and categories, based on the co-occurrence of words and potential referents across multiple situations.
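The co-occurrence principle described above can be illustrated with a minimal sketch. It is not the implementation used in the simulations cited; it assumes a fixed one-to-one lexicon held by the speaker, randomly sampled contexts, and a hearer that simply counts how often each word co-occurs with each potential referent. The function name, the parameters and the example words are all hypothetical.

```python
import random
from collections import defaultdict


def cross_situational_learn(lexicon, context_size, episodes, seed=0):
    """Hearer learns word-meaning mappings from co-occurrence counts only.

    lexicon: dict mapping each meaning to the speaker's word for it
             (assumed fixed and one-to-one; an illustrative simplification).
    Returns a dict mapping each heard word to its most frequent co-occurring meaning.
    """
    rng = random.Random(seed)
    meanings = list(lexicon)
    # counts[word][meaning] = number of situations in which both were present
    counts = defaultdict(lambda: defaultdict(int))
    for _ in range(episodes):
        target = rng.choice(meanings)
        distractors = rng.sample(
            [m for m in meanings if m != target], context_size - 1
        )
        word = lexicon[target]
        # No feedback: the hearer never learns which referent was intended,
        # so every object in the context is counted as a candidate.
        for meaning in [target] + distractors:
            counts[word][meaning] += 1
    return {w: max(c, key=c.get) for w, c in counts.items()}


# Hypothetical colour categories and invented words.
speaker_lexicon = {"RED": "bola", "GREEN": "tiki", "YELLOW": "mepu",
                   "BLUE": "sano", "BLACK": "kuri"}
learned = cross_situational_learn(speaker_lexicon, context_size=3, episodes=2000)
print(learned)
```

Because the intended referent is present every time its word is used, while each distractor is present only some of the time, the intended referent eventually has the highest count, even though no single situation identifies it.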


Although young children do learn to relate colour terms to colours, it takes them a considerable length of time to find the appropriate mappings (e.g., Andrick & Tager-Flusberg 1986; Sandhofer & Smith 2001). For instance, it has been estimated that, on average, children required over 1,000 trials to learn the three basic colour terms “red”, “green” and “yellow” (Rice 1980, cited in Sandhofer & Smith 2001). Sandhofer & Smith suggest that children go through different stages in learning colour words: first they appear to learn that colour terms relate to the domain of colour, and only then can they actually learn the correct mapping. This has also been observed by Andrick & Tager-Flusberg (1986), who additionally suggest that children find it difficult to learn the boundaries of colour categories, thus slowing down the learning of colour words. Research into child lexical acquisition is, of course, dominated by the problem of referential indeterminacy, and many constraints have been suggested to explain how children reduce indeterminacy (see, e.g., Bloom 2000). Very few of these accounts, however, allow for the fact that children hear words in multiple contexts, and can use the variation across those contexts to determine the intended reference. Recent empirical research, indeed, shows that a cross-situational model of learning provides a robust account of lexical acquisition in general, and of the acquisition of adjectives, including colour categories, in particular. Houston-Price et al. (2003) suggest that the children in their study used cross-situational learning to disambiguate word reference, even though their experiments were designed with attentional cues. In addition, Mather & Schafer (2004) show that children can learn the reference of nouns by exploiting co-variations across multiple contexts. Akhtar & Montague (1999) demonstrate that children use cross-situational learning to discover the meanings of novel adjectives. Klibanoff & Waxman (2000), furthermore, provide empirical support for their proposal that adjectival categories are learnt cross-situationally, within the context of basic level categories.


A comparison of the guessing game and a cross-situational statistical learner, using computational simulations, has shown that, in the guessing game, coherence in production between agents is considerably higher and learning is much faster (Vogt & Coumans 2003). Agents using cross-situational statistical learning thus have considerable difficulty in arriving at a shared lexicon, though in the end they manage to do so. Note, however, that cross-situational statistical learning improves when agents’ semantic categories are similar (Smith 2003), when learners assume mutual exclusivity (Smith in press), and when the context size is relatively small (Smith & Vogt 2004). The slower rate of acquisition under cross-situational learning is consistent with the empirical evidence that children learn colour words relatively slowly. Importantly, as yet unpublished studies have shown that the category variance among agents in the cross-situational learner tends to be much higher than that seen in the guessing game. This suggests that negotiating category boundaries in the cross-situational learner is more difficult, which could confirm Andrick & Tager-Flusberg’s (1986) finding.
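The difference in learning speed, and the effect of context size, can be illustrated with a toy comparison. This is a sketch under strong assumptions and does not reproduce the simulations of Vogt & Coumans (2003) or Smith & Vogt (2004): the "feedback" learner is an idealised stand-in for the guessing game that acquires a word the first time it is used (because it is told the intended referent), whereas the cross-situational learner counts a word as acquired only once the correct meaning is its unique most frequent co-occurring referent. Both learners observe the same stream of situations, and words and meanings are represented by the same integers purely for convenience. All names are hypothetical.

```python
import random
from collections import defaultdict


def unique_best(meaning_counts):
    """Return the single most frequent meaning, or None if there is a tie."""
    best = max(meaning_counts.values())
    winners = [m for m, n in meaning_counts.items() if n == best]
    return winners[0] if len(winners) == 1 else None


def episodes_to_learn(n_meanings, context_size, seed=0, max_episodes=100_000):
    """Return (feedback_episodes, cross_situational_episodes) for one run.

    Each value is the number of situations observed before the whole
    lexicon has been acquired by the respective learner.
    """
    rng = random.Random(seed)
    meanings = list(range(n_meanings))
    counts = defaultdict(lambda: defaultdict(int))  # word -> meaning -> count
    seen = set()
    feedback_done = None
    for episode in range(1, max_episodes + 1):
        target = rng.choice(meanings)
        others = [m for m in meanings if m != target]
        context = [target] + rng.sample(others, context_size - 1)
        word = target  # assumed one-to-one lexicon: word i names meaning i
        seen.add(word)
        # Idealised feedback learner: one corrected exposure per word suffices.
        if feedback_done is None and len(seen) == n_meanings:
            feedback_done = episode
        # Cross-situational learner: count every candidate referent.
        for m in context:
            counts[word][m] += 1
        if len(seen) == n_meanings and all(
            unique_best(counts[w]) == w for w in meanings
        ):
            return feedback_done, episode
    raise RuntimeError("lexicon not acquired within max_episodes")


for size in (2, 5):
    runs = [episodes_to_learn(10, size, seed=s) for s in range(40)]
    mean_fb = sum(r[0] for r in runs) / len(runs)
    mean_xsl = sum(r[1] for r in runs) / len(runs)
    print(f"context size {size}: feedback {mean_fb:.1f}, cross-situational {mean_xsl:.1f}")
```

In this toy setting the cross-situational learner can never finish before the feedback learner, since it needs at least one exposure to every word, and it typically needs more exposures as the number of distractors in each context grows.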


S&B have presented a model of learning colour words which is fast and based on corrective feedback. Research on child lexical acquisition suggests, however, that colour categories are actually acquired slowly and through cross-situational learning. If cross-situational learning is, indeed, a more plausible model than the guessing game, then the results achieved by S&B may no longer hold for their account of cultural learning.




Akhtar, N. and Montague, L. (1999) Early lexical acquisition: the role of cross-situational learning. First Language 19: 347-358.

Andrick, G. R. and Tager-Flusberg, H. (1986) The acquisition of colour terms. Journal of Child Language 13: 119-134.

Bloom, P. (2000) How children learn the meaning of words. Cambridge, MA: MIT Press.

Chouinard, M. M. and Clark, E. V. (2003) Adult reformulation of child errors as negative evidence. Journal of Child Language 30: 637-669.

Houston-Price, C., Plunkett, K., Harris, P. and Duffy, H. (2003) Developmental change in infants’ use of cues to word meaning. Paper presented to XIth European Conference on Developmental Psychology, Catholic University of Milan, Italy.

Klibanoff, R. S. and Waxman, S. R. (2000) Basic level object categories support the acquisition of novel adjectives: Evidence from preschool-aged children. Child Development 71(3): 649-659.

Mather, E. and Schafer, G. (2004) Object-label covariation: A cue for the acquisition of nouns? Poster presented at the meeting of the International Society of Infant Studies. Chicago.

Rice, N. (1980) Cognition to language. Baltimore, MD: University Park Press.

Sandhofer, C. M. and Smith, L. B. (2001) Why children learn color and size words so differently: Evidence from adults’ learning of artificial terms. Journal of Experimental Psychology: General 130(4): 600-620.

Siskind, J. M. (1996) A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition 61: 39-91.

Smith, A. D. M. (2003) Intelligent Meaning Creation in a Clumpy World Helps Communication. Artificial Life 9(2): 559-574.

Smith, A. D. M. (in press) Mutual Exclusivity: Communicative success despite conceptual divergence. In M. Tallerman (Ed.) Language origins: perspectives on evolution. Oxford: Oxford University Press.

Smith, A. D. M. & Vogt, P. (2004) Lexicon acquisition in an uncertain world. Paper presented at the 5th Evolution of language conference. Leipzig.

Vogt, P. (2004) Minimum cost and the emergence of the Zipf-Mandelbrot law. In J. Pollack, M. Bedau, P. Husbands, T. Ikegami and R. A. Watson (Eds.) Artificial Life IX Proceedings of the Ninth International Conference on the Simulation and Synthesis of Living Systems. The MIT Press.

Vogt, P. and Coumans, H. (2003) Investigating social interaction strategies for bootstrapping lexicon development. Journal of Artificial Societies and Social Simulation 6(1).