Comparing genes across linguistic families

Comparisons of genes and languages in humans

Published in: Education

  1. 1. Comparing genes across linguistic families. Guido Barbujani. Princeton, Institute for Advanced Study, October 20
  2. 2. “If we possessed a perfect pedigree of mankind, a genealogical arrangement of the races of man would afford the best classification of the various languages now spoken throughout the world...” -Charles Darwin, The Origin of Species, 1859
  3. 3. In turn, language data are useful to help us understand biological diversity and migration processes
  4. 4. A common language frequently signifies a common origin and a related language indicates a common origin further back in time. Such commonality of origin should be reflected by genetic relationship, despite several complicating factors. Robert R. Sokal (1988) Proc. Natl. Acad. Sci. USA
  5. 5. Summary: 1. Migrations are population, not molecular, processes. 2. Classical comparisons of genes and languages. 3. The trouble with the vocabulary and an alternative approach. 4. Comparing genes across language families.
  6. 6. It all began from this. Fig. 1. The first principal component of gene frequencies from 38 independent alleles at the human loci ABO, Rh, MNS, Le, Fy, Hp, PGM1, HLA-A, and HLA-B. Shades indicate different intensities of the first principal component, which accounts for 27 percent of the total variation. P. Menozzi, A. Piazza & L.L. Cavalli-Sforza (1978) Science
  7. 7. Continentwide genetic gradients in Europe L.L. Cavalli-Sforza et al. (1994) Science
  8. 8. Diffusion of Neolithic artifacts in Europe P. Balaresque et al. (2010) PLoS Biology, interpolated from data by R. Pinhasi et
  9. 9. Rationale for the proposal of a Neolithic demic diffusion: European genetic diversity is distributed in gradients. Only gene flow can generate such patterns on the continental scale. No documented migration in post-Neolithic times spans the area from the Levant to the Atlantic coasts. Neolithic technologies may have spread by cultural contact or by migration (most likely, by a combination thereof). Diffusion of Neolithic artifacts cannot produce genetic clines if it is caused only by cultural contacts. Demic diffusion: expanding Neolithic people carried into Europe their know-how, their genes, and perhaps their languages too.
  10. 10. Their languages too? E. Kitchen et al. (2009) Proc. R. Soc. B
  11. 11. Their languages too? C. Renfrew (1987) Archaeology and language: The puzzle of Indo-European origins
  12. 12. Conditions for the origin of genetic gradients by demic diffusion: 1. Demographic growth of farmers. 2. Diffusion, incomplete admixture. 3. Farmers continue to grow in numbers, hunter-gatherers don’t. A.J. Ammerman & L.L. Cavalli-Sforza (1984) The Neolithic Transition and the Genetics of Populations in Europe. But… 0. Low population density
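
The three conditions on slide 12 can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the Ammerman and Cavalli-Sforza model: a one-dimensional chain of demes, logistic growth for farmers only, nearest-neighbour migration, and hunter-gatherers joining the farmer population at a rate that depends on local contact. All function names, parameter names and values (`simulate_cline`, `spread`, `r`, `K`, `m`, `hg_density`, `admix`) are hypothetical.

```python
import numpy as np

def spread(x, m):
    """Move a fraction m out of each deme, half to each neighbour (reflecting ends)."""
    out = m * x
    x = x - out
    x[1:] += out[:-1] / 2   # movers to the right
    x[:-1] += out[1:] / 2   # movers to the left
    x[0] += out[0] / 2      # reflected at the first deme
    x[-1] += out[-1] / 2    # reflected at the last deme
    return x

def simulate_cline(n_demes=30, generations=400, r=0.5, K=1000.0,
                   m=0.1, hg_density=50.0, admix=0.05):
    """Return farmer numbers and the frequency of the 'farmer' allele per deme."""
    farmers = np.zeros(n_demes)
    copies = np.zeros(n_demes)              # carriers of the farmer allele
    hunters = np.full(n_demes, hg_density)  # hunter-gatherers never grow
    farmers[0] = copies[0] = 10.0           # founders in the first deme
    for _ in range(generations):
        # 1. Demographic growth of farmers (logistic, assumed form).
        growth = 1 + r * (1 - farmers / K)
        farmers, copies = farmers * growth, copies * growth
        # 2. Diffusion to neighbouring demes.
        farmers, copies = spread(farmers, m), spread(copies, m)
        # 3. Incomplete admixture: some local hunter-gatherers join the
        #    farmers, bringing none of the farmer allele with them.
        joining = admix * hunters * farmers / (farmers + hunters)
        hunters = hunters - joining
        farmers = farmers + joining
    freq = np.divide(copies, farmers, out=np.zeros(n_demes), where=farmers > 0)
    return farmers, freq

farmers, freq = simulate_cline()
print(np.round(freq[::5], 3))  # allele frequency sampled every five demes
```

In this toy setting the farmer allele is diluted at every step of the expansion, so its frequency is lower at the far end of the chain than at the origin, which is the kind of gradient the slide refers to.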
  13. 13. In the first DNA studies (mtDNA), very old ages were estimated for the main European mutations: “Each cluster can be assigned, in its entirety, to one of the proposed migration phases; the age of each cluster approximates very closely the timing of the migratory event”; “The main mitochondrial variants in Europe predate the Neolithic expansion”. M. Richards et al. (1996, 2000) Am. J. Hum. Genet.
  14. 14. Estimated ages of mitochondrial haplogroups (x 1000), listed in column order Richards et al. 1996, Sykes 1999, Richards et al. 2000. H: 23.5, 11.0-14.0, 15.0-17.2; J: 23.5, 8.5, 6.9-10.9; T: 35.5, 11.0-14.0, 9.6-17.7; IWX: 50.5, 11.0-14.0, X: 20.0, I: 19.9-32.7; K: 17.5, 11.0-14.0, 10.0-15.5; U: 36.5, 5: 50.0, 44.6-54.4. Neolithic contribution overestimated in pre-DNA studies? Hans Bandelt: Haplogroup H, “the signature of the Paleolithic expansion in Europe”
  15. 15. Two basic models: Palaeolithic model (cultural diffusion of food-production technologies) and Neolithic model (demic diffusion of food-production technologies). G. Barbujani (2012) Curr. Biol.
  16. 16. Ok folks, all those with haplogroup H come with me, let’s do the Paleolithic migration. No way Steve, not you. You’re a J, damn it, a J! Wait until the Neolithic! “Each cluster can be assigned, in its entirety, to one of the proposed migration phases; the age of each cluster approximates very closely the timing of the migratory event”
  17. 17. Ancient DNA evidence: Neolithic Europeans did not only carry the J hg, and there is no evidence of the H hg in Paleolithic Europeans. Samples: 21 pre-Neolithic hunter-gatherers, 105 Neolithic farmers. Figure labels, haplogroup estimated age: 20,000; 55,000; 45,000; 7,700; 22,700; 13,600; 12,000
  18. 18. Genetic continuity since Paleolithic times very unlikely in ABC analyses of mtDNA. Posterior probability of Model B: 1,655 to 2,691 times as high as that of Model A. Data: 2 individuals from the Upper Paleolithic, 43 from the Mesolithic (including the two La Braña specimens) and 121 from the Neolithic
  19. 19. It is people who migrate, not haplogroups. Haplogroup ages are not estimates of migration times
  20. 20. Summary: 1. Migrations are population, not molecular, processes. 2. Classical comparisons of genes and languages. 3. The trouble with the vocabulary and an alternative approach. 4. Comparing genes across language families.
  21. 21. Often, genetic isolates are also linguistic isolates F. Calafell & J. Bertranpetit (1993) Am. J. Phys. Anthropol.
  22. 22. In Europe, linguistically-related populations are genetically closer than unrelated populations separated by the same geographic distance. Correlations that are positive and significant: GEO,LANG 26 / 26; GEO,GEN 22 / 26; GEN,LANG 16 / 26; GEN,LANG.GEO 11 / 26. R.R. Sokal (1988) Proc. Natl. Acad. Sci. USA
  23. 23. In agreement with Renfrew’s predictions, four African-Eurasian gradients corresponding to four language families. G. Barbujani & A. Pilastro (1993) Proc. Natl. Acad. Sci. USA
  24. 24. In agreement with Renfrew’s predictions, the divergence of Indo-European languages is estimated at between 7,800 and 9,500 years BP. R.D. Gray & Q.D. Atkinson (2003) Nature
  25. 25. In agreement with Renfrew’s predictions, geographic origin of the Indo-European family inferred in Anatolia. R. Bouckaert et al. (2012) Science
  26. 26. But no gene-language correlation in the Americas
  27. 27. A simple, global correspondence between genetic and linguistic diversity? 1. Do we speak different languages because our genes influence language learning? 2. Do we carry different alleles because we speak different languages? L.L. Cavalli-Sforza et al. (1988) Proc. Natl. Acad. Sci. USA
  28. 28. Summary: 1. Migrations are population, not molecular, processes. 2. Classical comparisons of genes and languages. 3. The trouble with the vocabulary and an alternative approach. 4. Comparing genes across language families.
  29. 29. Many linguists disagree: controversial linguistic classifications; random similarities due to the limited number of sounds humans can produce; impossibility of telling random from significant correspondences if etymologies cannot be traced; overlapping cultural boundaries
  30. 30. The trouble with vocabulary comparisons
  31. 31. An alternative to vocabulary comparisons: structural features of languages in grammar and syntax. Word order, English equivalent, proportion of languages, example languages: SOV, "She him loves.", 45%, Pashto, Japanese, Afrikaans; SVO, "She loves him.", 42%, English, Hausa, Mandarin; VSO, "Loves she him.", 9%, Hebrew, Tuareg, Zapotec; VOS, "Loves him she.", 3%, Malagasy, Baure; OVS, "Him loves she.", 1%, Hixkaryana; OSV, "Him she loves.", <1%, Warao. The Parametric Comparison Method: G. Longobardi & C. Guardiano (2009) Lingua
  32. 32. Summary: 1. Migrations are population, not molecular, processes. 2. Classical comparisons of genes and languages. 3. The trouble with the vocabulary and an alternative approach. 4. Comparing genes across language families.
  33. 33. Genetic data: POPRES populations that match our linguistic database in Europe. 5,886 subjects genotyped at 500,568 loci using the Affymetrix 500K single nucleotide polymorphism (SNP) chip. Populations: England, France, Germany, Greece, Hungary, Ireland, Italy, Poland, Portugal, Romania, Russia, Serbia, Croatia, Spain
  34. 34. Added samples: Basque (20 Spanish Basques) and Finnish (93 Finns). Final sample size: 805 individuals for ~ 220,000 SNPs (MAF > 0.01 and genotyping rate > 98%)
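
The SNP filter quoted on slide 34 (MAF > 0.01 and genotyping rate > 98%) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes genotypes are coded as 0/1/2 copies of one allele with `np.nan` for missing calls, and that the genotyping rate is computed per SNP. The function name `qc_filter` and its parameters are hypothetical.

```python
import numpy as np

def qc_filter(genotypes, min_maf=0.01, min_call_rate=0.98):
    """Keep SNPs (columns) passing minor-allele-frequency and call-rate thresholds.

    genotypes: individuals x SNPs float array of 0/1/2 allele counts, nan = missing.
    Returns the filtered matrix and the boolean mask of retained SNPs.
    """
    called = ~np.isnan(genotypes)
    n_called = called.sum(axis=0)
    call_rate = called.mean(axis=0)
    # Allele frequency among called genotypes (two alleles per individual).
    freq = np.nansum(genotypes, axis=0) / (2 * np.maximum(n_called, 1))
    maf = np.minimum(freq, 1 - freq)
    keep = (maf > min_maf) & (call_rate > min_call_rate)
    return genotypes[:, keep], keep

toy = np.array([[0, 1, 2, 0],
                [1, 1, 2, 0],
                [2, np.nan, 2, 0],
                [0, 1, 2, 0]], dtype=float)
filtered, keep = qc_filter(toy)
print(keep)  # only the first SNP is both polymorphic and fully genotyped
```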
  35. 35. Principal Component Analysis (PCA) of individual genotypes
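
As a rough illustration of slide 35, a PCA of individual genotypes can be computed from the singular value decomposition of the column-centred genotype matrix. The sketch below uses simulated data and plain centring only; whether the study also scaled each SNP is not stated in the slides, so that step is left out. `genotype_pca` and the simulated allele frequencies are hypothetical.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an individuals x SNPs matrix of 0/1/2 allele counts (no missing data)."""
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # individual coordinates
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, explained

# Two simulated populations with very different allele frequencies.
rng = np.random.default_rng(0)
pop_a = rng.binomial(2, 0.2, size=(20, 200))
pop_b = rng.binomial(2, 0.8, size=(20, 200))
scores, explained = genotype_pca(np.vstack([pop_a, pop_b]).astype(float))
print(np.round(explained, 3))
```

With populations this strongly differentiated, the first component separates the two groups; real European samples differ far less, so their PCA plots show overlapping clouds rather than clean clusters.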
  36. 36. PCA of languages: a matrix summarizing structural variation in 15 European languages
  37. 37. Common elements and differences between PCA plots of genomic and linguistic diversity (panels: Language diversity, Genomic diversity). Main inconsistencies: 1. Hungarian genomes are close to those of Indo-European speakers. 2. Romanian genomes are close to those of their geographical, non-Romance-speaking neighbours
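
One way to turn a matrix of structural features like the one on slide 36 into pairwise language distances is sketched below. It assumes each language is a string of parameter states `+`, `-` or `0`, where `0` marks a parameter that is not set in that language, and it uses the share of differing states among parameters set in both languages. This coding and the function `syntactic_distance` are assumptions made for illustration; the slides do not give the exact formula, and the two toy strings are invented.

```python
def syntactic_distance(lang_a, lang_b):
    """Differences divided by (identities + differences), skipping unset parameters."""
    comparable = [(x, y) for x, y in zip(lang_a, lang_b) if x != "0" and y != "0"]
    if not comparable:
        return float("nan")
    differences = sum(x != y for x, y in comparable)
    return differences / len(comparable)

# Hypothetical parameter strings for two languages (four parameters each).
print(syntactic_distance("++-0", "+--+"))  # one difference out of three comparable
```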
  38. 38. Among Indo-European languages, distances inferred from vocabulary and syntax suggest similar clusterings (panels: Vocabulary, Syntax)
  39. 39. In Europe, distances inferred from syntax and DNA suggest similar clusterings (panels: Syntax, Genetic distances)
  40. 40. Path difference distance between linguistic and genetic UPGMA trees. Comparison with the distances obtained in 100,000 pairs of random topologies drawn, with replacement, from the total set of possible topologies for 15 taxa. Probability of obtaining smaller distance values than observed: P<0.004. The close relationship between trees inferred from linguistic and genetic distances is very unlikely to have arisen just by chance
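
The comparison on slide 40 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' procedure: trees are rooted and written as nested tuples of leaf names, path lengths are counted in edges, and random trees are built by repeatedly joining two randomly chosen clusters, which does not sample topologies uniformly as described on the slide. All function names are hypothetical.

```python
import itertools
import math
import random

def leaf_paths(tree, path=()):
    """Yield (leaf, route from the root) for a tree given as nested tuples."""
    if isinstance(tree, tuple):
        for k, child in enumerate(tree):
            yield from leaf_paths(child, path + (k,))
    else:
        yield tree, path

def path_lengths(tree):
    """Number of edges separating every pair of leaves."""
    paths = dict(leaf_paths(tree))
    lengths = {}
    for a, b in itertools.combinations(sorted(paths), 2):
        shared = 0
        for x, y in zip(paths[a], paths[b]):
            if x != y:
                break
            shared += 1
        lengths[(a, b)] = len(paths[a]) + len(paths[b]) - 2 * shared
    return lengths

def path_difference(tree1, tree2):
    """Euclidean distance between the two vectors of leaf-to-leaf path lengths."""
    l1, l2 = path_lengths(tree1), path_lengths(tree2)
    return math.sqrt(sum((l1[k] - l2[k]) ** 2 for k in l1))

def random_tree(leaves, rng):
    """Random rooted binary tree by random joining (NOT uniform over topologies)."""
    nodes = list(leaves)
    while len(nodes) > 1:
        i, j = sorted(rng.sample(range(len(nodes)), 2))
        right, left = nodes.pop(j), nodes.pop(i)
        nodes.append((left, right))
    return nodes[0]

def random_pair_test(tree1, tree2, n_pairs=1000, seed=0):
    """Observed distance and the share of random tree pairs with a smaller one."""
    rng = random.Random(seed)
    leaves = sorted(leaf for leaf, _ in leaf_paths(tree1))
    observed = path_difference(tree1, tree2)
    smaller = sum(
        path_difference(random_tree(leaves, rng), random_tree(leaves, rng)) < observed
        for _ in range(n_pairs)
    )
    return observed, smaller / n_pairs

t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(path_difference(t1, t2))
```

A small share of random pairs falling below the observed distance indicates that the two trees are more alike than expected by chance, which is how the P value on the slide is read.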
  41. 41. Mantel correlations between distance matrices Bonferroni P=0.0006
  42. 42. Mantel and partial Mantel correlations between distance matrices Bonferroni P=0.006
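
For readers unfamiliar with the tests named on slides 41 and 42, here is a minimal sketch of a Mantel test and its partial version. It assumes symmetric distance matrices over the same populations in the same order, Pearson correlation on the upper triangles, a one-sided permutation test that shuffles the rows and columns of the first matrix, and the standard first-order partial correlation to control for a third matrix (for example geography). The function names and the toy data are hypothetical, and no multiple-test correction is applied here.

```python
import numpy as np

def upper(matrix):
    """Upper-triangle entries (diagonal excluded) as a flat vector."""
    i, j = np.triu_indices_from(matrix, k=1)
    return matrix[i, j]

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

def partial_corr(a, b, c):
    """Correlation of a and b with c held constant."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def mantel(A, B, C=None, n_perm=999, seed=0):
    """Return (statistic, one-sided permutation P); partial test if C is given."""
    rng = np.random.default_rng(seed)
    b = upper(B)
    if C is None:
        stat = lambda a: corr(a, b)
    else:
        c = upper(C)
        stat = lambda a: partial_corr(a, b, c)
    observed = stat(upper(A))
    n = A.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        hits += stat(upper(A[np.ix_(p, p)])) >= observed
    return observed, (hits + 1) / (n_perm + 1)

# Toy example: "genetic" distances equal to distances among random points.
rng = np.random.default_rng(1)
points = rng.normal(size=(8, 2))
gen = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
print(mantel(gen, gen))
```

Passing a geographic matrix as `C` asks whether genetic and linguistic distances remain correlated once geography is held constant, which corresponds to the GEN,LANG.GEO notation used on slide 22.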
  43. 43. Main inconsistencies: 1. Hungarian genomes are close to those of Indo-European speakers. 2. Romanian genomes are close to those of their geographical, non-Romance-speaking neighbours. Recent admixture accounts for some PCA inconsistencies
  44. 44. To summarize: 1. Within the Indo-European family, similar trees are inferred from vocabulary and syntactic comparisons. 2. European populations speaking similar languages also tend to resemble each other at the genomic level. 3. Syntax appears to offer a better prediction of genomic distances than geography. 4. Contacts between populations after their separation from a common ancestor can be recognized, and better accounted for, by comparing genomic and linguistic patterns of variation
  45. 45. When did the main human populations separate?
  46. 46. Henn et al. (2012) Proc Natl Acad Sci USA; Scally and Durbin (2012) Nature Rev Genet
  47. 47. Several open questions. Fossil, archaeological and genomic evidence place divergence among continental populations in the interval 120-60 k years ago. When did the main language families diverge? Correlation suggests, but does not prove, common causation. Could the same geographic constraints have led to parallel genetic and linguistic change at different times? Darwin had in mind population trees; but how sure are we that genetic evolution and linguistic change really occurred in a tree-like fashion? Indo-European Documentation Center, University of Texas at Austin
  48. 48. Silvietta Ghirotto; Francesca Tassi; Pino Longobardi (York University); Davide Pettener (University of Bologna); Cristina Guardiano (University of Modena). http://www.langelin.org/
