THE HUMAN Y CHROMOSOME: AN EVOLUTIONARY MARKER COMES OF AGE

REVIEWS

Mark A. Jobling* and Chris Tyler-Smith‡

Until recently, the Y chromosome seemed to fulfil the role of juvenile delinquent among human chromosomes: rich in junk, poor in useful attributes, reluctant to socialize with its neighbours and with an inescapable tendency to degenerate. The availability of the near-complete chromosome sequence, plus many new polymorphisms, a highly resolved phylogeny and insights into its mutation processes, now provide new avenues for investigating human evolution. Y-chromosome research is growing up.

The properties of the Y chromosome read like a list of violations of the rulebook of human genetics: it is not essential for the life of an individual (males have it, but females do well without it), one-half consists of tandemly repeated SATELLITE DNA and the rest carries few genes, and most of it does not recombine. However, it is because of this disregard for the rules that the Y chromosome is such a superb tool for investigating recent human evolution from a male perspective and has specialized, but important, roles in medical and forensic genetics.

The Y chromosome needs to be studied in different ways from the rest of the genome. In the absence of RECOMBINATION, genetic mapping provides no information and physical analyses are of paramount importance, so the availability of the near-complete sequence of the EUCHROMATIC portions of the chromosome might have even greater impact here than on the rest of the genome. In this review, we are mainly concerned with the application of Y-chromosomal DNA variation to investigations of human evolution, but there is also considerable overlap with other areas. Questions about phenotype are relevant because phenotypic effects can lead to evolutionary changes being influenced by natural selection rather than by gene flow and genetic drift.

It seems that the more we know about the Y chromosome the more questions we have. Its sequence [1] allows us to get a comprehensive picture of its structure and organization, and to compile a catalogue of genes (FIG. 1): but how complete is this catalogue, what proportion of genes can be identified and what do they do? Efforts to discover genome-wide sequence variation have identified vast numbers of Y-specific single nucleotide polymorphisms (SNPs): the Ensembl database lists 28,650 at the time of writing, which might seem enough to provide an extremely detailed PHYLOGENETIC TREE of Y-chromosomal lineages. But how many of these SNPs are real, and how many are artefacts that are produced by unknowingly comparing true Y-chromosomal sequences with similar sequences (PARALOGUES) elsewhere on the same or other chromosomes [2]? Also, are these SNPs a representative set of sequence variants from the human population as a whole? The answer is no, because of ascertainment bias (BOX 1) in the range of populations that were surveyed for variation. As well as this possible treasure trove of unverified SNPs, the availability of Y-chromosome sequence means that there are now more than 200 binary polymorphisms that are well characterized, and 100–200 potentially useful new MICROSATELLITES, as well as the ~30 published polymorphic tri-, tetra- and pentanucleotide repeat markers. Finally, there is a robust and developing phylogeny of Y-chromosomal haplotypes (FIG. 3) that are defined by binary polymorphisms (haplogroups), and a unified nomenclature system [3] that allows diversity data from different research groups to be readily integrated.

SATELLITE DNA
A large tandemly-repeated DNA array that spans hundreds of kilobases to megabases.

RECOMBINATION
The formation of a new combination of alleles through meiotic crossing over. Some authors include intrachromosomal gene conversion under this heading. As this has been shown on the Y chromosome, they prefer not to refer to it as 'non-recombining'.

EUCHROMATIN
The part of the genome that is decondensed during interphase, which is transcriptionally active.

*Department of Genetics, University of Leicester, University Road, Leicester LE1 7RH, UK.
‡The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
e-mails: maj4@le.ac.uk; cts@sanger.ac.uk
doi:10.1038/nrg1124

598 | AUGUST 2003 | VOLUME 4 | www.nature.com/reviews/genetics | © 2003 Nature Publishing Group
[Figure 1: map of the Y chromosome, with a scale in Mb and panels for structural features (satellites, >99% X–Y similarity, Y–Y repeats, inversion polymorphism), protein-coding genes (location, copy number, expression pattern) and phenotype (point mutation or gene loss; deletion).]

Figure 1 | Y-chromosomal genes. Our present view of the Y chromosome, which is based on DNA sequence information that was derived largely from a single HAPLOGROUP-R individual [1]. From left to right: cytogenetic features of the chromosome and their approximate locations, which are numbered from the Yp telomere. Structural features include three satellite regions (cen, DYZ19 and Yqh), segments of X–Y identity (PAR1 and PAR2; dark brown) and high similarity (mid brown), and Y–Y repeated sequences in which the regions with greatest sequence identity are designated 'IR' for 'inverted repeat' and 'P' for 'palindrome'. An inversion polymorphism on Yp that distinguishes haplogroup P from most other lineages is indicated. The locations of the 27 distinct Y-specific protein-coding genes are shown; some are present in more than one copy and their expression patterns are summarized. Pseudoautosomal genes and Y-specific non-coding transcripts are not shown. On the right, the phenotypes that are associated with gene inactivation or loss are indicated; some deletions produce no detectable phenotype [111] (black) and represent polymorphisms in the population, whereas others result in infertility (AZFa, AZFb and AZFc) (red), although the contributions of the individual deleted genes are unclear.

PHYLOGENETIC TREE
A diagram that represents the evolutionary relationships between a set of taxa (lineages).

PARALOGUES
Sequences, or genes, that have originated from a common ancestral sequence, or gene, by a duplication event.

MICROSATELLITE
A class of repetitive DNA sequences that are made up of tandemly organized repeats that are 2–8 nucleotides in length. They can be highly polymorphic and are frequently used as molecular markers in population genetics studies.

HAPLOGROUP
A haplotype that is defined by binary markers, which is more stable but less detailed than one defined by microsatellites.
Analysis of the near-complete euchromatic sequence of the Y chromosome has called into question many of the cliches that were previously associated with this chromosome [1]. Also, this new knowledge of DNA sequence, genes, polymorphisms and phylogeny, provides a starting point for a new generation of studies into Y-chromosome diversity in human populations, mutation processes and the role of the Y chromosome in disease. In this review, we briefly cover mutation processes and disease studies, and point the reader to more detailed coverage in the bibliography, but largely focus on the use of the Y chromosome as a marker in studies of human evolution.

Special features of the Y chromosome

Why focus on a piece of the genome that only tells us about one-half of the population? Because of its sex-determining role, the Y chromosome is male specific and constitutively haploid. It passes from father to son, and, unlike other chromosomes, largely escapes meiotic recombination (BOX 2). Two segments (the pseudoautosomal regions) do recombine with the X, but these amount to less than 3 Mb of its ~60-Mb length; for the purposes of this review, 'Y chromosome' refers to the non-recombining majority (also variously known as NRY, NRPY and MSY). The importance of escaping recombination is that haplotypes, which are the combinations of allelic states of markers along the chromosome, usually pass intact from generation to generation. They change only by mutation, rather than the more complex reshuffling that other chromosomes experience, and so preserve a simpler record of their history. Using binary polymorphisms with low mutation rates, such as SNPs, a unique phylogeny can therefore easily be constructed.

Assuming a 1:1 sex ratio, the human population can be represented in microcosm by one man and one woman. This couple carry four copies of each autosome and three X chromosomes, but only one Y chromosome. In the population as a whole, the EFFECTIVE POPULATION SIZE of the Y chromosome is therefore expected to be one-quarter of that of any autosome, one-third of that of the X chromosome and similar to that of the effectively haploid mitochondrial DNA (mtDNA). Assuming that the same mutation processes act on all chromosomes, we therefore expect lower sequence diversity on the Y chromosome than elsewhere in the nuclear genome, which is indeed observed [4,5]. We also expect it to be more susceptible to genetic drift, which involves random changes in the frequency of haplotypes owing to sampling from one generation to the next. Drift accelerates the differentiation between groups of Y chromosomes in different populations, a useful property for investigating past events. However, because of drift, the frequencies of haplotypes can change rapidly through time, so quantifying aspects of these events, such as admixture proportions between populations or past demographic changes, might be unreliable.

Geographical clustering is further influenced by the behaviour of men, who are the bearers of Y chromosomes. Approximately 70% of modern societies practice patrilocality [6,7]: if a man and a woman marry but are not from the same place, it is the woman who moves rather than the man. As a consequence, most men live closer to their birthplaces than do women, and local differentiation of Y chromosomes is enhanced. By contrast, mtDNA, which is transmitted only by women, is expected to show reduced geographical clustering. This has been shown in studies of Europe [8] and island Southeast Asia [9], and also in comparisons of populations that practice patrilocality or matrilocality (in which men move and women do not) in Thailand [10]; here, matrilocal groups show enhanced mtDNA differentiation and reduced differentiation of Y chromosomes.

Mutation and Y-chromosome diversity

As mutation is the only force that acts to diversify Y haplotypes, understanding mutational dynamics is important to understanding the origins of haplotype diversity. More generally, knowledge of the rates and processes of mutation at different classes of marker is fundamental to evolutionary interpretations of diversity data and to understanding genetic disease. In this broader context, studies on the Y chromosome are of particular interest because the mutations observed here are the result of exclusively intra-allelic processes. Although these processes occur on all chromosomes, the haploid Y provides a model for studying them without the complicating factors of inter-allelic events and allelic diversity.

NRY, NRPY AND MSY
Several neologisms have been introduced to refer to the portion of the Y chromosome that excludes the pseudoautosomal regions, for example, non-recombining Y (NRY), non-recombining portion Y (NRPY) and male-specific Y (MSY), but none has achieved wide acceptance.

EFFECTIVE POPULATION SIZE
The size of an idealized population that shows the same amount of genetic drift as the population studied. This is approximately 10,000 individuals for humans, in contrast to the census population size of >6 × 10^9.

Box 1 | Ascertainment bias: a bugbear of human evolutionary studies

The best way to discover the single nucleotide polymorphism (SNP) variation in a population is to resequence the chosen region in all individuals. This approach is standard for the highly variable control region of mitochondrial DNA (mtDNA), and has been used in some studies of X-chromosomal and autosomal segments; however, it is rarely used on the Y chromosome because of low variation and high cost. Most published surveys of Y variation have analysed variants that were previously discovered in a different (often small) set of chromosomes. This can cause ascertainment bias, which is systematic distortion in a data set that is caused by the way in which markers or samples are collected. FIG. 2 shows how misleading this can be. If, for example, variants were discovered in European Y chromosomes and then tested in China, the erroneous conclusion would be drawn that Chinese Y chromosomes showed little variation, and vice versa. A good illustration of the contradictory conclusions that can be reached is provided by two studies of Y-chromosome diversity in China. Using SNPs that were chosen because of their high variability in the south, Su et al. [91] concluded that there was less variation in the north, where only a subset of the southern haplotypes was found. But, when variation was measured by Karafet et al. using a more global set of binary markers, more haplotypes were found in the north than in the south [92]. Y-SNP diversity needs to be interpreted with great caution; microsatellites, which are variable in all populations, provide a less biased measure of diversity.
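The copy-counting argument behind the effective population size ratios (one man and one woman standing in for the whole population) can be written out as a short sketch. The dictionary and function names below are illustrative assumptions, not part of the review, and the model ignores everything except the number of chromosome copies per couple.

```python
from fractions import Fraction

# Chromosome copies carried by one man plus one woman (1:1 sex ratio assumed).
COPIES_PER_COUPLE = {
    "autosome": 4,  # two copies in each parent
    "X": 3,         # two in the woman, one in the man
    "Y": 1,         # one in the man only
}


def relative_effective_size(chromosome: str, reference: str) -> Fraction:
    """Expected effective population size of `chromosome` relative to `reference`,
    taken simply as the ratio of copies carried by the couple."""
    return Fraction(COPIES_PER_COUPLE[chromosome], COPIES_PER_COUPLE[reference])


if __name__ == "__main__":
    print(relative_effective_size("Y", "autosome"))  # 1/4
    print(relative_effective_size("Y", "X"))         # 1/3
```

This reproduces only the one-quarter and one-third expectations stated in the text; real effective sizes also depend on factors such as differences in reproductive variance between the sexes.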
[Figure 2: world map with 48 numbered population samples shown as pie charts. Haplogroup key: A, B, C, D, E, F*, G, H, I, J, K*, L, M, N, O, P*, Q, R.]

Figure 2 | Global distribution of Y haplogroups. Each circle represents a population sample with the frequency of the 18 main Y haplogroups (FIG. 3; shown here in simplified form) identified by the Y Chromosome Consortium (YCC) indicated by the coloured sectors. Note the general similarities between neighbouring populations but large differences between different parts of the world. Data are from published sources (as detailed below) that were chosen to maximize the geographical coverage, sample size and inclusion of relevant markers. In some cases, the limited number of markers could not identify all potential YCC haplogroups and the published data were supplemented by unpublished information from the authors or other studies. Despite this, some populations remain incompletely characterized and the chromosomes assigned here to the PARAGROUPS F*, K* and P* might be reassigned if further typing is carried out. Populations are numbered as follows: 1, !Kung [60]; 2, Biaka Pygmies [60]; 3, Bamileke [60]; 4, Fali [60]; 5, Senegalese [112]; 6, Berbers [60]; 7, Ethiopians [112]; 8, Sudanese [61]; 9, Basques [61]; 10, Greeks [69]; 11, Polish [69]; 12, Saami [69]; 13, Russians [73]; 14, Lebanese [113]; 15, Iranians [113]; 16, Kazbegi (Georgia) [113]; 17, Kazaks [113]; 18, Punjabis [72]; 19, Uzbeks [73]; 20, Forest Nentsi [73]; 21, Khants [73]; 22, Eastern Evenks [73]; 23, Buryats [73]; 24, Evens [73]; 25, Eskimos [73]; 26, Mongolians [73]; 27, Evenks [73]; 28, Northern Han [73]; 29, Tibetans [78,114]; 30, Taiwanese [61]; 31, Japanese [61]; 32, Koreans [113]; 33, Filipinos [115]; 34, Javanese [115]; 35, Malaysians [115]; 36, West New Guineans (highlands) [115]; 37, Papua New Guineans (coast) [115]; 38, Australians (Arnhem) [115]; 39, Australians (Sandy Desert) [115]; 40, Cook Islanders [115]; 41, Tahitians [116]; 42, Maori [117]; 43, Navajos [78]; 44, Cheyenne [78]; 45, Mixtecs [78]; 46, Makiritare [79]; 47, Cayapa [118]; 48, Greenland Inuit [83].

As with all regions of human DNA (except for the mtDNA control region), base substitutional mutation occurs at too low a rate to be analysed directly. However, the secure phylogenetic framework and haploidy of the Y chromosome mean that recurrent mutations can be identified unambiguously, and data that accumulate from resequencing will provide information about the mutational properties of individual bases. The human population is so large that, even given the low average mutation rate of ~2 × 10^-8 per base per generation [11], we expect recurrent mutations to occur at every base of the Y chromosome in each global generation. However, these modern recurrences will usually go undetected. At present, the number of recurrent base substitutions in the Y Chromosome Consortium (YCC) tree is only five (REF. 3), although this is likely to increase as more chromosomes are typed.

Studies of genetic diseases show a strong bias towards fathers as the source of new mutations, and also show increasing mutation rate with paternal age (reviewed in REF. 12). The explanations generally used for these two observations are, respectively, the larger number of cell divisions (and hence DNA replications) in male than in female gametogenesis, and the increase of mutation rate with time through continuing divisions of spermatogenic stem cells. As Y chromosomes pass only through the male germline, its mutagenic properties affect the Y chromosome more than any other. The ratio of male to female mutation rates, the α-factor (α), can be estimated by comparing the number of mutations that have accumulated in homologous autosomal, Y-chromosomal and X-chromosomal sequences over a given time period. Estimates for α vary considerably between studies, but all show a significantly higher mutation rate in the male

PARAGROUP
A group of haplotypes that contain some, but not all, of the descendants of an ancestral lineage.
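The claim that recurrent mutations are expected at every base of the Y chromosome in each global generation follows from simple arithmetic, sketched here. The mutation rate and the census figure come from the text; the assumptions that half of the census population is male and that each man transmits one Y chromosome per generation are simplifications introduced only for this illustration.

```python
MUTATION_RATE_PER_BASE = 2e-8  # per base per generation, as quoted in the text
CENSUS_POPULATION = 6e9        # lower bound quoted for the census population size
MALE_FRACTION = 0.5            # assumption: 1:1 sex ratio


def expected_new_mutations_per_base(rate: float, population: float,
                                    male_fraction: float) -> float:
    """Expected number of new mutations at a single Y-chromosomal base across
    the whole population in one generation (crude: one Y per male)."""
    y_chromosomes = population * male_fraction
    return rate * y_chromosomes


if __name__ == "__main__":
    expected = expected_new_mutations_per_base(
        MUTATION_RATE_PER_BASE, CENSUS_POPULATION, MALE_FRACTION
    )
    # Prints a value of about 60: many independent hits per base per generation.
    print(f"{expected:.0f}")
```

Because the expectation is well above one, each base is expected to mutate many times over in every generation worldwide, even though almost none of these events will be sampled.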
[Figure 3: tree diagram of Y-chromosomal haplogroups A–R, with the defining binary markers on the branches and the haplogroup names at the tips.]

Figure 3 | The phylogenetic tree of binary Y-chromosomal haplogroups. The phylogeny is based on that of the Y Chromosome Consortium (YCC) [3], with minor modifications [72,112,115,119]. Clades and clade names are coloured to match the pie charts in FIG. 2. The Y clade contains all of the haplogroups A–R. Paragroups, which are lineages that are not defined by the presence of a derived marker, are indicated by an asterisk (for example, P*). The nomenclature system allows the union of two haplogroups, such as D and E, to be indicated by juxtaposed letters (DE). A designation such as R(xR1a) indicates the partial typing of markers in a haplogroup, in this case describing all chromosomes in clade R except those in R1a. Details of the markers, together with further information about nomenclature rules, can be found at the YCC web site (see online links box). Note that this phylogeny and nomenclature should be regarded as the YCC2003 Tree, and may be used as a reference.
than in the female germline. A recent estimate [13], based on a comparison of Y and X sequences, gives a value of 2.8 (95% confidence interval limits: 2.3–3.4).

The Y chromosome is rich in low-copy-number repeated sequences, and non-allelic homologous recombination between these paralogues causes both non-pathogenic rearrangements and male infertility through AZFa [14–16], AZFb [17] and AZFc [18] deletions (FIG. 1). There is evidence of GENE CONVERSION between paralogues in two of these cases [19,20]. Such conversion events are expected to be frequent compared with base substitutions, and might influence the probability of rearrangements by altering the lengths of tracts of sequence identity. The controversial proposal has been made that the arrangement of genes that are essential in spermatogenesis (such as the DAZ genes) in multiple copies in paralogous repeats has evolved to protect them, through beneficial Y–Y gene conversion, against the degeneration that results from haploidy [20]. It might be that the lack of crossing over with a homologue makes intrachromosomal exchange more likely in the male-specific region of the Y chromosome than on other chromosomes. Gene conversion is a fundamental and poorly understood process that seems certain to be further clarified by studies on the Y chromosome, within the framework of the phylogeny.

Mutation at Y-specific tri- and tetranucleotide microsatellites has been analysed in deep-rooting pedigrees [21] and father–son pairs [22]. Such experiments typically show few mutation events, and for several markers no mutations at all. Studies that use mutation rates in calculations (BOX 3) therefore often quote average rates, such as 3.17 × 10^-3 per microsatellite per generation [22], over eight tetranucleotide repeats (95% confidence interval limits: 1.89–4.94 × 10^-3). However, it is clear from studies of the relative diversity of individual microsatellites in different lineages (for example, REF. 23) that there are marker- and allele-specific differences in mutation rates. In principle, direct analysis of sperm DNA (which is ideal for Y chromosomes) offers access to these rates. The only study to attempt this gave mutation rates for two tetranucleotide microsatellites that were comparable to those obtained by other methods, but for technical reasons only mutations involving repeat gains were analysed [24]. The overall picture presented by these studies is that the properties of Y-specific microsatellites are not detectably different from those of their autosomal counterparts, which indicates that mutation at these markers might be largely or completely intra-allelic, and is consistent with the widely-held view that replication slippage is the mechanism responsible.

By contrast, the fact that the non-recombining region of the Y chromosome is completely devoid of hypervariable GC-rich minisatellites supports the idea that these loci are largely the by-products of recombination activity [25]. The only highly polymorphic Y-specific minisatellite known is an atypical AT-rich locus [26]. Studies of deep-rooting pedigrees [27] and father–son pairs [28] indicate that its high mutation rate might arise from unequal sister-chromatid exchange, or replication slippage that is facilitated by the secondary structure of repeats.

Selection and Y-chromosome diversity

Aside from mutation, selection is a potentially important force in patterning Y-haplotype diversity in populations. The Y chromosome is subject to purifying selection. For example, absence of the Y chromosome (the 45,X karyotype) leads to Turner syndrome, and the loss or inactivation of Y genes can produce an XY female or hermaphrodite phenotype [29] (FIG. 1), or male

Box 2 | Could the Y chromosome recombine?

There have been controversies over whether mitochondrial DNA (mtDNA) recombines [93]; although the Y chromosome (excluding the pseudoautosomal regions) is assumed to be exempt from recombination, are there any circumstances under which it could occur? Recent work [20] has shown extensive gene conversion between paralogues on the Y chromosome, and, as gene conversion represents a form of recombination, the argument has been made that the Y chromosome is therefore a recombining chromosome. However, here we adhere to the conventional definition of recombination as crossing over that occurs between chromosomal homologues, and therefore exclude gene conversion from this discussion.

Approximately 1 in 1,000 men [94] carry two Y chromosomes (47,XYY), and, in principle, Y–Y recombination might occur here. As both chromosomes are identical, there would be no consequences for haplotype integrity, but the dynamics of intrachromosomal recombination between repeated sequences could alter. In practice, however, most fertile XYY men eliminate one Y chromosome in their germline [95], so there is no possibility of recombination. Those that retain all three sex chromosomes tend to suffer spermatogenic arrest [96], so any potential recombinants never enter the next generation and are not observed.

Recombination might occur between highly similar XY-HOMOLOGOUS sequences and introduce a block of X-chromosomal material into a Y chromosome. This should be recognizable if the transferred segment is not too small, as the minimum sequence divergence between X and Y chromosomes in such blocks (~1%) is much greater than the average sequence divergence between different Y haplotypes (~0.05%). The rate of such events is probably low: they were observed only twice in an interspecific phylogeny of the cat family [97].

There are rare instances in which segments of the Y chromosome are carried on other chromosomes as asymptomatic translocations [98,99]. Recombination (or gene conversion) could occur between a segment from one haplogroup and a normal Y chromosome from a different haplogroup. Typing polymorphisms on a resulting recombinant chromosome would show a burst of HOMOPLASY in the phylogeny, corresponding to a set of markers located together in the recombinant segment. As population sample sizes and numbers of markers increase, it seems probable that such rare recombinant chromosomes will eventually be discovered.

HOMOLOGUES
Genes or sequences that share a common ancestor.

HOMOPLASY
The generation of the same sequence state at a locus by independent routes (convergent evolution).

GENE CONVERSION
The non-reciprocal exchange of sequence.
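The link between the α-factor and a Y-to-X comparison can be illustrated with the standard time-in-each-sex argument: a Y chromosome is always in males, whereas an X chromosome spends two-thirds of its time in females and one-third in males. The sketch below assumes this simple model (it is not necessarily the exact procedure used for the estimate of 2.8 quoted in the text), and the function names are hypothetical.

```python
def y_to_x_rate_ratio(alpha: float) -> float:
    """Expected ratio of Y to X substitution rates for a male-to-female
    mutation rate ratio `alpha`.

    With a female rate of 1 and a male rate of alpha:
      Y rate = alpha              (always transmitted through males)
      X rate = (2 + alpha) / 3    (two-thirds of the time in females)
    """
    return 3.0 * alpha / (2.0 + alpha)


def alpha_from_y_to_x_ratio(ratio: float) -> float:
    """Invert the relation above; only defined for ratios below 3."""
    if not 0 <= ratio < 3:
        raise ValueError("Y/X ratio must lie in [0, 3) under this model")
    return 2.0 * ratio / (3.0 - ratio)


if __name__ == "__main__":
    # An alpha of 2.8 corresponds to the Y evolving about 1.75 times
    # faster than the X under this model.
    ratio = y_to_x_rate_ratio(2.8)
    print(round(ratio, 2))
    print(round(alpha_from_y_to_x_ratio(ratio), 2))
```

Under this model the Y/X ratio saturates at 3 as α grows, so estimates of α become very sensitive to small errors in the observed ratio when male bias is strong.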
  7. 7. REVIEWS Box 3 | Dating Y lineages using microsatellite variation A suitable approach to estimate the time to most recent common ancestor (TMRCA) of a set of closely related Y-chromosomes is to use the microsatellite variation in the chromosomes and mutation rates that are derived from family or other studies. The basis of this approach is intuitively obvious: a Y haplogroup originates Time when a new binary-marker mutation occurs. This happens on a single chromosome, so there is necessarily no associated microsatellite variation. If the lineage spreads, microsatellite mutations will occur: the longer the time, the greater the accumulated variation (see figure). In practice, several factors can complicate the calculation. First, the mutation rate is not known precisely. This is partly because mutations are rare, so the measurement derived from father–son comparisons has a large error, and the use of deep-rooting pedigrees introduces the further problem of distinguishing mutation from non-paternity. However, mutation rates also differ between microsatellites and between alleles (see the section on mutation). Second, a generation time (ranging from 20–35 years) must be chosen to convert a time measured in generations into one measured in years. Third, and perhaps most important, variation will not accumulate at a steady rate, but variants will sometimes be lost by drift, so demography and population structure will have large influences. These are usually poorly known. The effect of these complications is to introduce great uncertainty into the measurements: small errors are more likely to imply that sources of error have been ignored than that a reliable figure has been obtained (see link to BATWING program in the online links box). infertility30 (for example, USP9Y/DFFRY). 
Here, however, we are concerned with the question of past or present differential selection on Y lineages. BALANCING SELECTION seems unlikely because heterozygous advantage is impossible for this haploid locus, and FREQUENCY-DEPENDENT SELECTION has not been found, but could the chromosome have experienced positive selection when an advantageous mutation arose or a change in circumstances conferred an advantage on a previously neutral variant? Because of the lack of recombination, any selection would affect the entire chromosome and produce an increase in frequency of a lineage more rapidly than would be expected by drift. At first, this lineage would be just one haplogroup among many, and could be detected by comparisons between Y lineages, but eventually it might be fixed and have to be detected by comparisons with other loci.

We expect that polymorphisms in Y-encoded proteins (which are now estimated to number 27 distinct types) might influence some phenotypes (FIG. 1). Association studies, which measure the frequencies of Y haplotypes in matched groups of men with different phenotypes, can detect these influences. However, despite a number of searches over the past five years (TABLE 1), it has been difficult to confirm such effects. Many of the studies showed no association, and in those cases where an association was found, it often could not be reproduced31,32 or could be explained by population structure33.

A few associations seem robust. The finding of a higher frequency of XX MALES in haplogroup Y*(xP)34 can be explained by a plausible mechanism: the protective effect of an inversion polymorphism in haplogroup P, which prevents the ectopic recombination event between the PRKX and PRKY genes that produces many XX males. However, this will have little evolutionary impact because of the rarity of XX males. The association of low sperm count with haplogroup K(xP) in the Danes deserves further investigation35. Because of the high geographical specificity of Y variants, the absence of an association in one population does not imply its absence in other populations in which different lineages predominate, so further studies on a larger scale are warranted to take advantage of improved genotyping methods36,37 and phylogenetic resolution3. Analyses of normal individuals could discover lineages that have expanded more rapidly than expected, irrespective of association to any known phenotype. One such example has been detected, but was explained by social rather than biological selection38 (BOX 4). So, the evidence available so far does not indicate that significant differential selection acts on contemporary Y lineages.

Under neutrality and the other simplifying assumptions that are commonly made by population geneticists, such as constant population size and random mating, the coalescence time or time to most recent common ancestor (TMRCA) of a locus is proportional to its effective population size. As we have seen, the effective population size of the Y chromosome is one-quarter of that of autosomes and one-third of that of the X chromosome, so a Y-chromosome TMRCA that is significantly less than one-quarter or one-third of these, respectively, would indicate a departure from neutrality, at least as defined by the models. Many estimates of Y-chromosome coalescence time have been made; among those presented in the past few years, some5,39, but not all40, are recent. Pritchard et al.39 used data on eight microsatellites from 445 chromosomes to estimate a TMRCA of 46 (16–126) thousand years ago (kilo-years ago, kya) under a model of exponential population growth. Similarly, Thomson et al.5 used SNP variation, which was identified using denaturing high-performance liquid chromatography (DHPLC), in ~66 kb of DNA from 53–70 individuals to infer a TMRCA of 59 (40–140) kya under a similar population model. These point estimates are recent compared with TMRCAs of 177 kya for mtDNA41, 535 and 1,860 kya for two regions of the X chromosome42,43 and 800 kya for an autosomal gene44.

BALANCING SELECTION
Selection that favours more than one allele, for example, through heterozygote advantage, and so maintains polymorphism.

FREQUENCY-DEPENDENT SELECTION
Selection that favours the lower-frequency alleles and so maintains polymorphism.

XX MALE
An individual with a 46,XX karyotype but a male phenotype rather than the expected female phenotype.
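The neutral expectation described above reduces to simple arithmetic: if TMRCA scales with effective population size, the ratio of the Y-chromosome TMRCA to that of an autosomal locus should be about one-quarter, and to that of an X-linked locus about one-third. The following sketch compares the point estimates quoted in the text against those expectations. The dictionary labels and the function name `observed_ratios` are illustrative, and the comparison uses point estimates only, so it says nothing about statistical significance given the wide intervals reported (16–126 and 40–140 kya).

```python
# Point estimates of TMRCA in thousands of years (kya), as quoted in the text.
Y_TMRCA_KYA = {
    "Pritchard et al. (microsatellites)": 46,
    "Thomson et al. (SNPs)": 59,
}

# Comparison loci: (TMRCA in kya, expected Y/locus ratio under neutrality).
COMPARISON_LOCI = {
    "X region 1": (535, 1 / 3),
    "X region 2": (1860, 1 / 3),
    "autosomal gene": (800, 1 / 4),
}


def observed_ratios(y_estimates, loci):
    """Return {(y_label, locus_label): (observed_ratio, expected_ratio)}."""
    ratios = {}
    for y_label, y_tmrca in y_estimates.items():
        for locus_label, (locus_tmrca, expected) in loci.items():
            ratios[(y_label, locus_label)] = (y_tmrca / locus_tmrca, expected)
    return ratios


for (y_label, locus_label), (obs, exp) in observed_ratios(
    Y_TMRCA_KYA, COMPARISON_LOCI
).items():
    print(f"{y_label} vs {locus_label}: observed {obs:.3f}, neutral expectation {exp:.3f}")
```

In every pairing the observed ratio of point estimates falls below the neutral expectation, which is the sense in which the text calls the Y-chromosome estimates recent.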
