Here is the official version (PNAS Early Edition), and here is a pre-print PDF version as well.
Summary
This paper, which was published in the Proceedings of the National Academy of Sciences of the USA (PNAS) on 30 May 2007, has attracted a fair amount of press coverage. Since newspaper stories are often cut to fit the amount of space available and since the published paper goes into a lot of technical detail about both the genetic background and the statistical techniques we used, we have posted the following description of our work for interested readers.
Our paper reports a statistical study of the relationship between the geographical distribution of two genes and the geographical distribution of tone languages.
The two genes, ASPM and Microcephalin, have attracted a lot of attention in the last couple of years, following two papers (1 and 2) published in Science in 2005 by a Chicago research group led by Bruce Lahn. Lahn’s group showed that there are two variants (alleles), one for each of these two genes, which emerged fairly recently (estimated 6,000 years ago for ASPM and 37,000 years ago for Microcephalin) and that these new alleles seem to be spreading quickly in the human species (and are therefore probably “adaptive”, or favoured by natural selection). They also showed that these “derived” alleles (as they are known) are unevenly distributed in the world’s populations, being especially rare in sub-Saharan Africa and most common in Europe, North Africa and Western Asia.
The distribution of the "derived" allele of ASPM in the Old World populations we studied in our paper.
Each circle represents one population and the intensity of blue reflects the allele frequency (min 0%, max 60%).
The distribution of the "derived" allele of Microcephalin in the Old World populations we studied in our paper.
Each circle represents one population and the intensity of green reflects the allele frequency (min 3%, max 100%).
Tone languages are languages (like Chinese, Thai, Yoruba, and Zulu) in which the pitch or “tone” of words and syllables makes a difference to word meaning. For example, in Chinese huār (with a high level pitch) means ‘flower’ and huàr (with a falling pitch) means ‘picture’. In non-tonal languages (like English or Spanish), pitch is only used at the sentence level, for emphasis and overall meanings like questioning. Roughly half the languages in the world are tonal and half are non-tonal, but they’re fairly unevenly distributed: tone languages are the norm in sub-Saharan Africa and are common in Southeast Asia and among Native American languages especially in parts of Central and South America. Non-tone languages are the norm in Europe and Central, South and West Asia, and among the aboriginal languages of Australia. For more details about their distribution you can consult, for example, the entry on tone in the World Atlas of Language Structures.
(Please go here for another Chinese example, with sound files. In Yoruba, igba spoken with different tones means different things (recordings courtesy of Dr. Lawrence Olufemi Adewole of Ile-Ife University, Nigeria): LowHigh = a kind of tree, MidMid = '200', MidHigh = 'gourd' and LowLow = 'time'.)
The distribution of tone languages in the Old World populations we studied in our paper.
Each square represents one population: yellow stands for non-tone languages and gray for tone languages.
(But what about the Americas?)
Superficially, the distribution of the
older (i.e., non-"derived") alleles, as reported by Lahn’s group, resembles the
distribution of tone languages. Because the two genes in question
are known to be involved in
brain growth and development, and
because there is some evidence that differences in performance on language-related experimental tasks
can be linked to differences in brain structure, we hypothesised that the proportion of the
older alleles of ASPM and Microcephalin in a given population
would correlate with whether the language spoken by the population is
tonal.
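For readers who want to see what such a correlation looks like in practice, here is a minimal Python sketch. It codes each population's language as 1 (tonal) or 0 (non-tonal) and computes Pearson's r against an allele frequency. The function name `pearson_r` and all the numbers are invented for illustration; they are not the paper's code or data, and the paper's full analysis involves more than this single statistic.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Invented illustrative values, NOT data from the paper:
# one entry per hypothetical population.
derived_freq = [0.05, 0.10, 0.15, 0.40, 0.45, 0.55]  # "derived" allele frequency
is_tonal = [1, 1, 1, 0, 0, 0]                        # 1 = tone language, 0 = non-tone

r = pearson_r(derived_freq, is_tonal)
print(f"r = {r:.3f}")  # negative here: tone goes with a low "derived" frequency
```

Because the hypothesis is stated in terms of the older alleles, a negative r for the "derived" allele corresponds to a positive r for the older one (its frequency is one minus the "derived" frequency).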
This means that our approach is different from the well-known work of Cavalli-Sforza and his colleagues, which aims to correlate genetic and linguistic classifications of populations, using known or hypothesised historical relations between languages and language families. Their question is whether genetically similar populations also tend to be linguistically similar, where genetic similarity involves many independent loci and linguistic similarity involves historical, ancestor-descendant relationships. Our work instead investigates correlations between genetic markers and typological features of languages: do populations with certain alleles tend to speak languages that share a given feature, without reference to overall genetic similarity or historical linguistic classification?
Language typology studies the ways in which languages can differ. Some of this is fairly familiar: for example, in French and English adjectives and nouns go in the opposite order - that’s word order typology. But there are typological differences in sound structure and word structure, too. In most Australian aboriginal languages, there are no fricative sounds (sounds like S or SH or F), whereas in most European languages there are lots - yet most Australian languages have lots of different N and L and R sounds that many English speakers struggle to tell apart. Or again: in many languages (e.g. Turkish, Inuktitut (Eskimo) and Swahili) the verb forms have lots of prefixes or suffixes to indicate the subject, the object, the tense, and so forth; in English or Chinese there’s hardly any of this kind of marking. All these kinds of differences are what language typology is about.
By
comparing nearly 1000 genetic markers and 26 linguistic
features (the linguistic data with details on our sources and methods can
be found here), we were able to show that, as most people would expect,
there is generally no correlation between
population genetics and language typology – but the relation between tone and the two genes under
study was confirmed to be especially strong in all our
analyses. It’s because there generally isn’t a correlation between
population genetics and language typology that the correlation we’ve found may
be interesting.
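The logic of comparing one correlation against all the others can be sketched in a few lines of Python. The helper below, `share_weaker`, is a hypothetical name, and the list of background correlations is invented; the sketch only shows how a statement of the form "stronger than X% of all the correlations" can be computed, not how the paper's analyses were actually run.

```python
def share_weaker(target_r, all_rs):
    """Fraction of correlations whose absolute value is below |target_r|."""
    if not all_rs:
        raise ValueError("all_rs must not be empty")
    weaker = sum(1 for r in all_rs if abs(r) < abs(target_r))
    return weaker / len(all_rs)

# Invented background correlations (NOT from the paper): most sit near zero,
# as expected when genes and typological features are generally unrelated.
background = [-0.30, -0.12, -0.05, 0.0, 0.02, 0.04, 0.08, 0.11, 0.21, -0.53]

# Hypothetical correlation of interest, already included in the background list.
share = share_weaker(-0.53, background)
print(f"stronger than {share:.0%} of all correlations")
```

Absolute values are used so that strong negative and strong positive correlations both count as strong.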
This relationship remains important and statistically highly significant even when we consider the correlations of tone with ASPM and with Microcephalin simultaneously, after we take into account the fact that neighbouring populations tend to share both genes and languages, and after several further tests. (Go here for more details of what we did.)
The distribution of the correlations between all pairs of genetic markers and linguistic features in our database.
The horizontal axis represents the strength of the correlation (Pearson's r, between -1 and +1, 0 means no correlation).
It can be seen that most correlations are around zero, but that the correlations between tone and ASPM and between tone and Microcephalin lie far out in the tail (each is stronger than 98.6% of all the correlations).
Both of these correlations are also statistically highly significant.
The distribution of tone and non-tone languages as a function of the population frequency of the
"derived" alleles of ASPM (horizontal axis) and Microcephalin (the vertical axis).
Tone languages are represented by empty squares and non-tone languages by black squares.
It can be seen that in the bottom-left quadrant there are only tone languages, in the top-right quadrant only non-tone languages,
while in the top-left quadrant there is a balanced mixture (the Americas fit here, supporting our prediction).
The bottom-right quadrant contains no populations in our sample and the reason is not known.
We believe that this correlation may reflect some sort of predisposition or cognitive bias induced by the two genes in question. We don’t have any detailed idea of what this bias might consist of, but we assume it is very small and would only manifest itself in language change over many generations. We know, of course, that any normal human infant can learn the language of any human community that it’s brought up in – genes don’t play any role at the individual level. But subtle differences in the way children acquire language might lead to changes in the long run. All languages change over time (as anyone who has struggled with Shakespeare knows), and computer simulations and mathematical models have suggested that small differences in the way children acquire language could, over enough generations, give rise to big differences in the way a language is structured. And if those subtle differences are influenced by a child’s genetic make-up, that could explain the kind of correlation we’ve found.
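The amplification idea in the paragraph above can be illustrated with a toy calculation. This is a deliberately simplified sketch, not one of the published simulations or models referred to there. It assumes two competing linguistic variants, labelled neutrally "A" and "B", and a hypothetical per-generation learning bias that slightly discounts variant A; the function names, the starting share and the bias value are all invented for illustration.

```python
def next_generation(share_a, bias):
    """One step of transmission between generations.

    share_a: share of variant A in what learners hear (between 0 and 1).
    bias:    hypothetical small discount on variant A during learning
             (0 means no bias; must be below 1).
    """
    if not 0.0 <= share_a <= 1.0:
        raise ValueError("share_a must be between 0 and 1")
    if not 0.0 <= bias < 1.0:
        raise ValueError("bias must be in [0, 1)")
    kept_a = share_a * (1.0 - bias)  # variant A is acquired slightly less often
    kept_b = 1.0 - share_a           # variant B is acquired as heard
    return kept_a / (kept_a + kept_b)

def simulate(share_start, bias, generations):
    """Return the share of variant A for generation 0, 1, ..., generations."""
    history = [share_start]
    for _ in range(generations):
        history.append(next_generation(history[-1], bias))
    return history

# Assumed values for illustration only: a 1% bias and 500 generations.
history = simulate(0.9, 0.01, 500)
print(f"after 1 generation:    {history[1]:.4f}")
print(f"after 500 generations: {history[-1]:.4f}")
```

In this toy model the change within a single generation is barely noticeable, yet the share of variant A ends up very different after several hundred generations. Which variant is favoured is arbitrary here; the sketch says nothing about tone in particular.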
What about the Americas?
The next step is to do experiments in which we look for evidence of the nature of the predisposition or bias. The work of Patrick Wong and his colleagues provides one possible lead here: they have shown that some monolingual adults find it much harder than others to learn an artificial language vocabulary that makes use of tone or pitch distinctions, and that the differences between these groups show up in subtle differences of brain structure as well. If we could show that these differences also reflect differences in genetic make-up, it would go some way to showing that the correlation we have found is based on a real causal link.
Our work has no immediate practical implications, but its longer term interest would lie in discovering that there’s a causal link between population genetics and language typology. (Again, we haven’t found that: we’ve just demonstrated some very unlikely correlations that suggest there might be such a link.) If that link can be found, then it will fit into the rapidly growing scientific understanding of how genetic make-up influences behaviour and cognitive development. That’s important work with lots of practical ethical dimensions: as science finds out more and more about specific genetic influences, society is really going to have to start dealing with a lot of policy questions that have only been theoretical up till now. But at this point all our paper does is report something that might be a piece of the overall jigsaw puzzle.
What the paper does not show or claim
First, we are not claiming that there is any direct connection between an individual’s genes and an individual’s language. We’re talking about small individual biases adding up to group effects over many generations of language change. People acquire the language(s) they’re exposed to in early childhood, regardless of their genes.
Second, we’re not making any suggestion of “superiority” or “selective advantage” for one language over another. Our work provides absolutely no reason to think that non-tonal languages are easier or “more advanced” than tonal languages (or vice-versa). There’s also no reason to think that there’s any evolutionary advantage to non-tonal languages: Chinese society developed advanced technology and politics and philosophy with a tonal language just as successfully as Eastern Mediterranean societies did at about the same time with non-tonal languages.
Third, we’re not offering any new findings about the effects of these genes on brain development. We make only very limited suggestions about the detailed neurocognitive mechanisms that might be involved. Not much is known about the functions of these genes in brain development anyway, though this is certainly a hot topic in genetics. Since we’re not geneticists, we’re not involved in the front-line biochemical research, so we are not really in a position to speculate about what exactly might be going on in the brain.
Finally, we’re not suggesting that language is involved in the selective pressure for the "derived" alleles of ASPM and Microcephalin. Nobody really knows what the selective pressures were (although a lot of people would certainly like to find out). Bruce Lahn’s group were very explicit that they didn’t know what the selective advantage might be. Some people have even argued that there is no selective advantage and that the whole story is just a matter of genetic drift. We assume that the “cognitive bias” we propose could be an accidental by-product of whatever it is that these genes are doing.
Last updated: 25 June 2007. D.R. Ladd & Dan Dediu