DOI: 10.1126/science.1237619
Science 341, 562 (2013); G. David Poznik et al.
Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females
This copy is for your personal, non-commercial use only.
The following resources related to this article are available online at www.sciencemag.org (this information is current as of August 7, 2013):
Updated information and services, including high-resolution figures, can be found in the online version of this article at: http://www.sciencemag.org/content/341/6145/562.full.html
Supporting Online Material can be found at: http://www.sciencemag.org/content/suppl/2013/08/01/341.6145.562.DC1.html
A list of selected additional articles on the Science Web sites related to this article can be found at: http://www.sciencemag.org/content/341/6145/562.full.html#related
This article cites 46 articles, 22 of which can be accessed free: http://www.sciencemag.org/content/341/6145/562.full.html#ref-list-1
This article has been cited by 1 article hosted by HighWire Press; see: http://www.sciencemag.org/content/341/6145/562.full.html#related-urls

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright 2013 by the American Association for the Advancement of Science; all rights reserved. The title Science is a registered trademark of AAAS.
Downloaded from www.sciencemag.org on August 7, 2013
Sequencing Y Chromosomes Resolves
Discrepancy in Time to Common
Ancestor of Males Versus Females
G. David Poznik,1,2 Brenna M. Henn,3,4 Muh-Ching Yee,3 Elzbieta Sliwerska,5 Ghia M. Euskirchen,3 Alice A. Lin,6 Michael Snyder,3 Lluis Quintana-Murci,7,8 Jeffrey M. Kidd,3,5 Peter A. Underhill,3 Carlos D. Bustamante3*
The Y chromosome and the mitochondrial genome have been used to estimate when the common
patrilineal and matrilineal ancestors of humans lived. We sequenced the genomes of 69 males
from nine populations, including two in which we find basal branches of the Y-chromosome tree.
We identify ancient phylogenetic structure within African haplogroups and resolve a long-standing
ambiguity deep within the tree. Applying equivalent methodologies to the Y chromosome and
the mitochondrial genome, we estimate the time to the most recent common ancestor (TMRCA) of
the Y chromosome to be 120 to 156 thousand years and the mitochondrial genome TMRCA to
be 99 to 148 thousand years. Our findings suggest that, contrary to previous claims, male lineages
do not coalesce significantly more recently than female lineages.
The Y chromosome contains the longest stretch of nonrecombining DNA in the human genome and is therefore a powerful tool with which to study human history. Estimates of the time to the most recent common ancestor (TMRCA) of the Y chromosome have differed by a factor of about 2 from TMRCA estimates for the mitochondrial genome. Y-chromosome coalescence time has been estimated in the range of 50 to 115 thousand years (ky) (1–3), although larger values have been reported (4, 5), whereas estimates for mitochondrial DNA (mtDNA) range from 150 to 240 ky (3, 6, 7). However, the quality and quantity of data available for these two uniparental loci have differed substantially. Whereas the complete mitochondrial genome has been resequenced thousands of times (6, 8), fully sequenced diverse Y chromosomes have only recently become available. Previous estimates of the Y-chromosome TMRCA relied on short resequenced segments, rapidly mutating microsatellites, or single-nucleotide polymorphisms (SNPs) ascertained in a small panel of individuals and then genotyped in a global panel. These approaches likely underestimate genetic diversity and, consequently, TMRCA (9).
We sequenced the complete Y chromosomes of 69 males from seven globally diverse populations of the Human Genome Diversity Panel (HGDP) and two additional African populations: San (Bushmen) from Namibia, Mbuti Pygmies from the Democratic Republic of Congo, Baka Pygmies and Nzebi from Gabon, Mozabite Berbers from Algeria, Pashtuns (Pathan) from Pakistan, Cambodians, Yakut from Siberia, and Mayans from Mexico (fig. S1). Individuals were selected without regard to their Y-chromosome haplogroups.
1 Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, CA, USA.
2 Department of Statistics, Stanford University, Stanford, CA, USA.
3 Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
4 Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA.
5 Department of Human Genetics and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
6 Department of Psychiatry, Stanford University, Stanford, CA, USA.
7 Institut Pasteur, Unit of Human Evolutionary Genetics, 75015 Paris, France.
8 Centre National de la Recherche Scientifique, URA3012, 75015 Paris, France.
*Corresponding author. E-mail: cdbustam@stanford.edu

2 AUGUST 2013 VOL 341 SCIENCE www.sciencemag.org 562
REPORTS

[Figure 1: exponentially weighted moving averages (EWMA) of filtered read depth (0 to 500) and of the MQ0 ratio, (MQ0/unfiltered depth) EWMA (0 to 1.0), plotted against position (Mb). Legend: depth filter, MQ0 ratio filter, exclusion mask, inclusion mask, compatible site, incompatible site. A schematic of the chromosome from 0 to 59.36 Mb shows the sequence classes X degenerate, X transposed, ampliconic, heterochromatic, pseudoautosomal, and other.]
Fig. 1. Callability mask for the Y chromosome. Exponentially weighted moving averages of read depth (blue line) and the proportion of reads mapping ambiguously (MQ0 ratio; violet line) versus physical position. Regions with values outside the envelopes defined by the dashed lines (depth) or dotted lines (MQ0) were flagged (blue and violet boxes) and merged for exclusion (gray boxes). The complement (black boxes) defines the regions within which reliable genotype calls can be made. Below, a scatter plot indicates the positions of all observed SNVs. Those incompatible with the inferred phylogenetic tree (red) are uniformly distributed. The X-degenerate regions yield quality sequence data, ampliconic sequences tend to fail both filters, and mapping quality is poor in the X-transposed region.

The Y-chromosome reference sequence is 59.36 Mb, but this includes a 30-Mb stretch of constitutive heterochromatin on the q arm, a 3-Mb centromere, 2.65-Mb and 330-kb telomeric pseudoautosomal regions (PAR) that recombine with the X chromosome, and eight smaller gaps. We mapped reads to the remaining 22.98 Mb of assembled reference sequence, which consists of three sequence classes defined by their complexity and degree of homology to the X chromosome (10): X-degenerate, X-transposed, and ampliconic. Both the high degree of self-identity within the ampliconic tracts and the X-chromosome homology of the X-transposed region render portions of the Y chromosome ill suited for short-read sequencing. To address this, we constructed filters that reduced the data to 9.99 million sites (11) (Fig. 1 and fig. S2). We then implemented a haploid model expectation-maximization algorithm to call genotypes (11).
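The caller itself is described only in the supplement. The sketch below illustrates what a haploid expectation-maximization (EM) genotype caller can look like at a single biallelic site. It assumes a symmetric per-read error rate (0.01 here, a hypothetical value), independent reads, and one ALT-allele frequency shared by all samples. The function names and the 0.99 calling threshold are hypothetical and are not taken from the paper.

```python
def site_likelihoods(ref_reads, alt_reads, error_rate):
    """Return (P(reads | REF), P(reads | ALT)) for one haploid sample."""
    like_ref = (1 - error_rate) ** ref_reads * error_rate ** alt_reads
    like_alt = error_rate ** ref_reads * (1 - error_rate) ** alt_reads
    return like_ref, like_alt


def haploid_em(read_counts, error_rate=0.01, n_iter=50, tol=1e-8):
    """EM estimate of the ALT-allele frequency at one site across haploid samples.

    read_counts: list of (ref_reads, alt_reads), one tuple per sample.
    Returns (alt_frequency, per-sample posterior probability of ALT).
    """
    likes = [site_likelihoods(r, a, error_rate) for r, a in read_counts]
    freq = 0.5  # uninformative starting value
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability that each sample carries the ALT allele
        posteriors = []
        for like_ref, like_alt in likes:
            num = freq * like_alt
            den = num + (1 - freq) * like_ref
            posteriors.append(num / den if den > 0 else freq)
        # M-step: the updated frequency is the mean posterior
        new_freq = sum(posteriors) / len(posteriors)
        converged = abs(new_freq - freq) < tol
        freq = new_freq
        if converged:
            break
    return freq, posteriors


def call_genotypes(posteriors, min_posterior=0.99):
    """Call ALT or REF only when confident; otherwise report a missing call (None)."""
    calls = []
    for p in posteriors:
        if p >= min_posterior:
            calls.append("ALT")
        elif p <= 1 - min_posterior:
            calls.append("REF")
        else:
            calls.append(None)
    return calls


freq, post = haploid_em([(3, 0), (0, 4), (2, 0), (0, 0), (1, 1)])
calls = call_genotypes(post)
```

Samples with no reads, or with evenly split reads, fall back on the population frequency and remain uncalled. Missing calls of this kind are what phylogeny-aware imputation (fig. S8) is meant to address.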
We identified 11,640 single-nucleotide variants (SNVs) (fig. S3). A total of 2293 (19.7%) are present in dbSNP (v135), and we assigned haplogroups on the basis of the 390 (3.4%) present in the International Society of Genetic Genealogy (ISOGG) database (12) (fig. S4). At SNVs, median haploid coverage was 3.1x (interquartile range 2.6 to 3.8x) (table S1 and fig. S5), and sequence validation suggests a genotype calling error rate on the order of 0.1% (11).
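At these low depths, some genotypes will inevitably be missing. A rough way to gauge how often is to model per-site read depth as Poisson with the reported mean. This is a simplifying assumption, not the paper's model, and real coverage is often more variable than Poisson. Under it, about 4.5% of sites would receive no reads at the median 3.1x, and about 18% would receive fewer than two. The function name below is hypothetical.

```python
import math


def prob_fewer_than(k, mean_depth):
    """P(X < k) for X ~ Poisson(mean_depth): chance a site is covered by fewer than k reads."""
    return sum(math.exp(-mean_depth) * mean_depth ** i / math.factorial(i) for i in range(k))


# Lower quartile, median, and upper quartile of haploid coverage reported in the text
for depth in (2.6, 3.1, 3.8):
    print(f"{depth}x: P(0 reads) = {prob_fewer_than(1, depth):.3f}, "
          f"P(<2 reads) = {prob_fewer_than(2, depth):.3f}")
```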
Because mutations accumulate over time along a single lengthy haplotype (13), the male-specific region of the Y chromosome provides power for phylogenetic inference. We constructed a maximum likelihood tree from 11,640 SNVs using the Tamura-Nei nucleotide substitution model (Fig. 2) and, in agreement with (14), observe strong bootstrap support (500 replicates) for the major haplogroup branching points. The tree both recapitulates and adds resolution to the previously inferred Y-chromosome phylogeny (fig. S6), and it characterizes branch lengths free of ascertainment bias. We identify extraordinary depth within Africa, including lineages sampled from the San hunter-gatherers that coalesce just short of the root of the entire tree. This stands in contrast to a tree from autosomal SNP genotypes (15), wherein African branches were considerably shorter than others; genotyping arrays primarily rely on SNPs ascertained in European populations and therefore undersample diversity within Africa. Two regions of reduced branch length in our tree correspond to rapid expansions: the out-of-Africa event (downstream of F-M89) and the agriculture-catalyzed Bantu expansions (downstream of E-M2). Among the three hunter-gatherer populations, we find a relatively high number of B2 lineages. Within this haplogroup, six Baka B-M192 individuals form a distinct clade that does not correspond to extant definitions (11) (fig. S7). We estimate this previously uncharacterized structure to have arisen ~35 thousand years ago (kya).
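The text names the Tamura-Nei (TN93) substitution model but does not give its equations. One hedged illustration is the closed-form TN93 pairwise distance, which separates purine transitions (A↔G), pyrimidine transitions (C↔T), and transversions, each weighted by base frequencies. The paper fit a maximum likelihood tree, which uses the same model in a likelihood framework rather than through pairwise distances. This sketch therefore only shows what the model distinguishes. Pooling base frequencies over both sequences and ignoring non-ACGT characters are assumptions made here.

```python
import math


def tn93_distance(seq1, seq2):
    """Tamura-Nei (1993) distance between two aligned DNA sequences (ACGT sites only)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    # Base frequencies pooled over both sequences
    counts = {base: 0 for base in "ACGT"}
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    pi = {base: counts[base] / (2 * n) for base in "ACGT"}
    pi_r, pi_y = pi["A"] + pi["G"], pi["C"] + pi["T"]  # purine and pyrimidine totals
    purines, pyrimidines = set("AG"), set("CT")
    p1 = sum(1 for a, b in pairs if a != b and {a, b} == purines) / n      # A<->G
    p2 = sum(1 for a, b in pairs if a != b and {a, b} == pyrimidines) / n  # C<->T
    q = sum(1 for a, b in pairs if (a in purines) != (b in purines)) / n   # transversions
    ag, ct = pi["A"] * pi["G"], pi["C"] * pi["T"]
    term1 = -(2 * ag / pi_r) * math.log(1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r))
    term2 = -(2 * ct / pi_y) * math.log(1 - pi_y * p2 / (2 * ct) - q / (2 * pi_y))
    term3 = -2 * (pi_r * pi_y - ag * pi_y / pi_r - ct * pi_r / pi_y) * math.log(
        1 - q / (2 * pi_r * pi_y))
    return term1 + term2 + term3
```

With equal base frequencies, TN93 reduces to the simpler Kimura two-parameter distance. This makes a convenient sanity check for any implementation.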
We resolve the polytomy of the Y macrohaplogroup F (16) by determining the branching order of haplogroups G, H, and IJK (Fig. 2 and fig. S6). We identified a single variant (rs73614810, a C→T transition dubbed "M578") for which haplogroup G retains the ancestral allele, whereas its brother clades (H and IJK) share the derived allele. Genotyping M578 in a diverse panel confirmed the finding (table S2). We thereby infer more recent common ancestry between hgH and hgIJK than between either and hgG. M578 defines an early diversification episode of the Y phylogeny in Eurasia (11).

[Figure 2: maximum likelihood tree of the 69 Y chromosomes, with a horizontal axis of 0 to 1200 derived SNVs. Leaves are labeled by most derived mutation and population (for example, A-P262 San, B-M192 Baka, E-P277 Nzebi, G-M406 Pashtun, N-L708 Yakut, O-M95 Cambodian, Q-M3 Maya). Internal branches carry defining ISOGG variants, including A-L419, BT-M42, CT-M168, F-M89, HIJK-M578, KxLT-M526, K-M9, NO-M214, E-M2/M180, and B-M192. Haplogroup color legend: A, B, E, and F to T (non-African).]
Fig. 2. Y-chromosome phylogeny inferred from genomic sequencing. This tree recapitulates the previously known topology of the Y-chromosome phylogeny; however, branch lengths are now free of ascertainment bias. Branches are drawn proportional to the number of derived SNVs. Internal branches are labeled with defining ISOGG variants inferred to have arisen on the branch. Leaves are colored by major haplogroup cluster and labeled with the most derived mutation observed and the population from which the individual was drawn. Previously uncharacterized structure within African hgB2 is indicated in orange. (Inset) Resolution of a polytomy was possible through the identification of a variant for which hgG retains the ancestral allele, whereas hgH and hgIJK share the derived allele.
To account for missing genotypes, we assigned each SNV to the root of the smallest subtree containing all carriers of one allele or the other and inferred that the allele specific to the subtree was derived (fig. S8). We used the chimpanzee Y-chromosome sequence to polarize 398 variants assigned to the deepest split, a task complicated by substantial structural divergence (11, 17).
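The placement rule above can be sketched as follows. This is a minimal illustration that assumes a rooted tree stored as a hypothetical dict mapping each internal node to its children, biallelic sites, and genotypes compatible with the tree. Each allele's carriers are covered by their most recent common ancestor, the allele with the smaller covering subtree is taken as derived, and missing calls are imputed by whether the leaf falls inside that subtree. All names are illustrative. Ties, and variants on the deepest split, cannot be polarized from the tree alone; that is why the text turns to the chimpanzee sequence.

```python
def leaves_under(tree, node):
    """Leaf names below `node`; `tree` maps each internal node to its children."""
    children = tree.get(node, [])
    if not children:
        return {node}
    leaves = set()
    for child in children:
        leaves |= leaves_under(tree, child)
    return leaves


def covering_node(tree, root, carriers):
    """Root of the smallest subtree containing every carrier (their MRCA)."""
    node = root
    while True:
        for child in tree.get(node, []):
            if carriers <= leaves_under(tree, child):
                node = child
                break
        else:
            return node  # no single child contains all carriers


def place_snv(tree, root, genotypes):
    """Assign a biallelic SNV to a branch, call its derived allele, and impute gaps.

    genotypes maps leaf -> allele, or None when the call is missing.
    Returns (node, derived_allele, imputed_genotypes).
    """
    alleles = sorted({g for g in genotypes.values() if g is not None})
    if len(alleles) != 2:
        raise ValueError("need a biallelic site with both alleles observed")
    best = None
    for allele in alleles:
        carriers = {leaf for leaf, g in genotypes.items() if g == allele}
        node = covering_node(tree, root, carriers)
        size = len(leaves_under(tree, node))
        if best is None or size < best[2]:
            best = (node, allele, size)
    node, derived, _ = best
    ancestral = alleles[0] if alleles[1] == derived else alleles[1]
    inside = leaves_under(tree, node)
    imputed = {
        leaf: g if g is not None else (derived if leaf in inside else ancestral)
        for leaf, g in genotypes.items()
    }
    return node, derived, imputed


tree = {"root": ["n1", "E"], "n1": ["n2", "D"], "n2": ["A", "n3"], "n3": ["B", "C"]}
node, derived, imputed = place_snv(
    tree, "root", {"A": "T", "B": "T", "C": None, "D": "C", "E": "C"})
```

In this toy tree, the T carriers (A and B) are covered by node n2, so T is called derived on that branch and the missing call for C (also under n2) is imputed as T. `leaves_under` is recomputed repeatedly for clarity. A real implementation would cache leaf sets.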
We estimated the coalescence time of all Y chromosomes using both a molecular clock–based frequentist estimator and an empirical Bayes approach that uses a prior distribution of TMRCA from coalescent theory and conducts Markov chain simulation to estimate the likelihood of parameters given a set of DNA sequences (GENETREE) (11, 18) (Table 1). To directly compare the TMRCA of the Y chromosome to that of the mtDNA, we estimated their respective mutation rates by calibrating phylogeographic patterns from the initial peopling of the Americas, a recent human event with high-confidence archaeological dating.
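In its simplest form, a molecular-clock estimator divides the mean number of derived SNVs from the root to each leaf by the number of callable sites times the per-site, per-year mutation rate. The sketch below assumes that simple form. The authors' exact frequentist estimator is given in the supplement and may differ. The root-to-tip counts are made-up illustrative values, not data from the paper. The 9.99 million sites come from the filtering step, and the rate and its CI bounds are the Y-chromosome values reported in the calibration paragraph that follows. Plugging in the rate's CI bounds captures rate uncertainty only, not coalescent variance.

```python
def clock_tmrca_years(root_to_tip_snvs, callable_sites, rate_per_bp_per_year):
    """Molecular-clock TMRCA: mean root-to-tip SNV count / (callable sites x rate)."""
    mean_snvs = sum(root_to_tip_snvs) / len(root_to_tip_snvs)
    return mean_snvs / (callable_sites * rate_per_bp_per_year)


def clock_tmrca_interval(root_to_tip_snvs, callable_sites, rate_low, rate_high):
    """Plug in the rate's CI bounds; a faster rate gives a younger TMRCA."""
    return (clock_tmrca_years(root_to_tip_snvs, callable_sites, rate_high),
            clock_tmrca_years(root_to_tip_snvs, callable_sites, rate_low))


# Hypothetical root-to-tip SNV counts, for illustration only
depths = [1100, 1150, 1140]
point = clock_tmrca_years(depths, 9.99e6, 0.82e-9)
low, high = clock_tmrca_interval(depths, 9.99e6, 0.72e-9, 0.92e-9)
```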
Archaeological evidence indicates that humans first colonized the Americas ~15 kya via a rapid coastal migration that reached Monte Verde II in southern Chile by 14.6 kya (19). The two Native American Mayans represent Y-chromosome hgQ lineages, Q-M3 and Q-L54*(xM3), that likely diverged at about the same time as the initial peopling of the continents. Q is defined by the M242 mutation that arose in Asia. A descendent haplogroup, Q-L54, emerged in Siberia and is ancestral to Q-M3. Because the M3 mutation appears to be specific to the Americas (20), it likely occurred after the initial entry, and the prevalence of M3 in South America suggests that it emerged before the southward migratory wave. Consequently, the divergence between these two lineages provides an appropriate calibration point for the Y mutation rate. The large number of variants that have accumulated on the two lineages since their divergence (120 and 126, respectively) contrasts with the pedigree-based estimate of the Y-chromosome mutation rate, which is based on just 4 mutations (21). Using entry to the Americas as a calibration point, we estimate a mutation rate of 0.82 × 10⁻⁹ per base pair (bp) per year [95% confidence interval (CI): 0.72 × 10⁻⁹ to 0.92 × 10⁻⁹/bp/year] (table S3). False negatives have minimal effect on this estimate for two reasons. First, at 5.7x and 8.5x coverage, the probability of observing fewer than two reads at a site is low (observed proportions: 3.1% and 0.6%). Second, the number of unobserved singletons possessed by one individual is offset by a similar number of Q doubletons unobserved in the same individual and thereby misclassified as singletons possessed by the other (11) (figs. S9 and S10). This calibration approach assumes approximate coincidence between the expansion throughout the Americas and the divergence of Q-M3 and Q-L54*(xM3), but we consider deviation from this assumption and identify a strict lower bound on the point of divergence using sequences from the 1000 Genomes Project (11). As a comparison point, we consider the out-of-Africa expansion of modern humans, which dates to approximately 50 kya (22) and yields a similar mutation rate of 0.79 × 10⁻⁹/bp/year.
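The arithmetic behind a calibration of this kind can be sketched as follows. The sketch assumes the rate is the mean number of SNVs per lineage since the split, divided by the number of callable sites times the calibration age. The authors' actual procedure, including how the CI is obtained, is in table S3 and the supplement and may differ. The inputs are the two Mayan branch counts (120 and 126), the 9.99 million filtered sites, and an entry date of ~15 kya. Under these assumptions the result is about 0.82 × 10⁻⁹/bp/year, in line with the reported point estimate. The function name is hypothetical.

```python
def calibrate_rate(snvs_since_split, callable_sites, calibration_years):
    """Per-bp, per-year mutation rate from the mean number of SNVs that each
    lineage accumulated since a split of known age."""
    mean_snvs = sum(snvs_since_split) / len(snvs_since_split)
    return mean_snvs / (callable_sites * calibration_years)


rate = calibrate_rate([120, 126], callable_sites=9.99e6, calibration_years=15_000)
print(f"{rate:.3e} per bp per year")
```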
We constructed an analogous pipeline for high-coverage (>250x) mtDNA sequences from the 69 male samples and an additional 24 females from the seven HGDP populations (11) (fig. S11). As in the Y-chromosome analysis, we calibrated the mtDNA mutation rate using divergence within the Americas. We selected the pan-American hgA2, one of several initial founding haplogroups among Native Americans. The star-shaped phylogeny of hgA2 subclades suggests that its divergence was coincident with the rapid dispersal upon the initial colonization of the continents (23). Calibration on 108 previously analyzed hgA2 sequences (11) (fig. S12) yields a point estimate equivalent to that from our seven Mayan mtDNAs, but with a narrower confidence interval. From this within-human calibration, we estimate a mutation rate of 2.3 × 10⁻⁸/bp/year (95% CI: 2.0 × 10⁻⁸ to 2.5 × 10⁻⁸/bp/year), higher than that from human-chimpanzee divergence but similar to other estimates using within-human calibration points (24, 25).
The global TMRCA estimate for any locus constitutes an upper bound for the time of human population divergence under models without gene flow. We estimate the Y-chromosome TMRCA to be 138 ky (120 to 156 ky) and the mtDNA TMRCA to be 124 ky (99 to 148 ky) (Table 1) (11). Our mtDNA estimate is more recent than those of many previous studies, the majority of which used mutation rates extrapolated from between-species divergence. However, mtDNA mutation rates are subject to a time-dependent decline, with pedigree-based estimates on the faster end of the spectrum and species-based estimates on the slower. Because of this time dependency and the need to calibrate the Y and mtDNA in a comparable manner, it is more appropriate here to use within-human clade estimates of the mutation rate.

Table 1. TMRCA and Ne estimates for the Y chromosome and mtDNA. Pop., population.

                   Y chromosome                          mtDNA
Method             Pop.   n    TMRCA*          Ne        Pop.    n    TMRCA*          Ne
Molecular clock    All    69   139 (120–156)   4500†     All     93   124 (99–148)    9500†
GENETREE‡          San    6    128 (112–146)   3800      Nzebi   18   105 (91–119)    11,500
                   Baka   11   122 (106–137)   1800      Mbuti   6    121 (100–143)   3700

*Employs mutation rate estimated from within-human calibration point. Times measured in ky. †Uses Watterson's estimator, θ̂W. ‡Each coalescent analysis restricted to a single population spanning the ancestral root (11).

[Figure 3: probability density (0.000 to 0.010) of predata TMRCA distributions plotted against time (ky, 0 to 800).]
Fig. 3. Similarity of TMRCA does not imply equivalent Ne of males and females. The TMRCA for a given locus is drawn from a predata (i.e., prior) distribution that is a function of Ne, generation time, sample size, and demographic history. Consider the distribution of possible TMRCAs for a set of 100 uniparental chromosomes. Although the Mbuti mtDNA Ne is twice as large as that of the Baka Y chromosome, the corresponding predata TMRCA distributions overlap considerably.

Rather than assume the mutation rate to be a known constant, we explicitly account for the uncertainty in its estimation by modeling each TMRCA as the ratio of two random variables. We estimate the ratio of the mtDNA TMRCA to that of the Y chromosome to be 0.90 (95% CI: 0.68 to 1.11) (fig. S13). If, as argued above, the divergence of the Y-chromosome Q lineages occurred at approximately the same time as that of the mtDNA A2 lineages, then the TMRCA ratio is invariant to the specific calibration time used. Regardless, the conclusion of parity is robust to possible discrepancy between the divergence times within the Americas (11). Using comparable calibration approaches, the Y and mtDNA coalescence times are not significantly different. This conclusion would hold whether or not an alternative approach would yield more definitive TMRCA estimates.
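The idea of treating each TMRCA as a ratio of random variables can be illustrated with a small Monte Carlo simulation. This sketch holds each locus's divergence fixed at its point value and lets only the two mutation rates vary. Each rate is drawn from a normal distribution whose standard deviation is backed out of its 95% CI, which is a hypothetical modeling choice, not the authors' method. Because it ignores the stochastic variance of branch lengths, its interval is likely to be narrower than the reported 0.68 to 1.11. Its median should sit near the point ratio 124/138 ≈ 0.90.

```python
import random
import statistics


def tmrca_ratio_draws(tmrca_mt, rate_mt, ci_mt, tmrca_y, rate_y, ci_y, n_draws=20_000, seed=1):
    """Monte Carlo draws of TMRCA_mt / TMRCA_y, where each TMRCA = divergence / rate.

    Divergence is held at its point value (TMRCA x rate); only the rates are resampled,
    each from a normal distribution with sd = (CI width) / (2 x 1.96).
    """
    rng = random.Random(seed)
    div_mt, div_y = tmrca_mt * rate_mt, tmrca_y * rate_y
    sd_mt = (ci_mt[1] - ci_mt[0]) / (2 * 1.96)
    sd_y = (ci_y[1] - ci_y[0]) / (2 * 1.96)
    draws = []
    for _ in range(n_draws):
        r_mt = rng.gauss(rate_mt, sd_mt)
        r_y = rng.gauss(rate_y, sd_y)
        draws.append((div_mt / r_mt) / (div_y / r_y))
    return draws


draws = tmrca_ratio_draws(124, 2.3e-8, (2.0e-8, 2.5e-8), 138, 0.82e-9, (0.72e-9, 0.92e-9))
cuts = statistics.quantiles(draws, n=40)  # cuts[0] = 2.5th, cuts[-1] = 97.5th percentile
print(f"median {statistics.median(draws):.2f}, 95% interval {cuts[0]:.2f} to {cuts[-1]:.2f}")
```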
Our observation that the TMRCA of the Y chromosome is similar to that of the mtDNA does not imply that the effective population sizes (Ne) of males and females are similar. In fact, we observe a larger Ne in females than in males (Table 1). Because of its larger Ne, the distribution from which the mitochondrial TMRCA has been drawn is right-shifted with respect to that of the Y-chromosome TMRCA; however, the two distributions have large variances and overlap (Fig. 3).
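Two relationships help connect Ne, the Table 1 footnote, and Fig. 3. First, Watterson's estimator divides the number of segregating sites S by a_n = 1 + 1/2 + ... + 1/(n − 1). For a haploid locus, θ = 2Neμ, where μ is the per-site rate per generation. Second, under a constant-size Kingman coalescent, the TMRCA of n haploid lineages is a sum of exponential waiting times with mean 2Ne(1 − 1/n) generations. The spread of that sum is of the same order as its mean, which is why distributions for Ne values differing twofold can still overlap. The sketch below assumes constant population size and a hypothetical 25-year generation time. It is not the demographic model used with GENETREE, and all function names are illustrative.

```python
import random
import statistics


def watterson_theta(segregating_sites, n):
    """Watterson's estimator: S divided by a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1 / i for i in range(1, n))
    return segregating_sites / a_n


def haploid_ne(theta_per_site, rate_per_bp_per_year, generation_years):
    """Invert theta = 2 * Ne * mu for a haploid locus (mu per site per generation)."""
    return theta_per_site / (2 * rate_per_bp_per_year * generation_years)


def simulate_tmrca_years(n, ne, generation_years, n_reps=5_000, seed=0):
    """Draw predata TMRCAs (years) for n haploid lineages, constant-size coalescent."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_reps):
        t_gen = 0.0
        for k in range(n, 1, -1):
            # With k lineages, coalescence occurs at rate C(k, 2) / Ne per generation
            t_gen += rng.expovariate(k * (k - 1) / 2 / ne)
        draws.append(t_gen * generation_years)
    return draws


def expected_tmrca_years(n, ne, generation_years):
    """Closed form: E[TMRCA] = 2 * Ne * (1 - 1/n) generations."""
    return 2 * ne * (1 - 1 / n) * generation_years


# Ne values from Table 1 (Baka Y chromosome, Mbuti mtDNA); 25-year generations are assumed
y_draws = simulate_tmrca_years(100, 1800, 25)
mt_draws = simulate_tmrca_years(100, 3700, 25, seed=1)
mt_median = statistics.median(mt_draws)
share = sum(t > mt_median for t in y_draws) / len(y_draws)
print(f"Share of Y draws older than the mtDNA median: {share:.2f}")
```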
Dogma has held that the common ancestor of human patrilineal lineages, popularly referred to as the Y-chromosome "Adam," lived considerably more recently than the common ancestor of female lineages, the so-called mitochondrial "Eve." However, we conclude that the mitochondrial coalescence time is not substantially greater than that of the Y chromosome. Indeed, due to our moderate-coverage sequencing and the existence of additional rare divergent haplogroups, our analysis may yet underestimate the true Y-chromosome TMRCA.
References and Notes
1. J. K. Pritchard, M. T. Seielstad, A. Perez-Lezaun, M. W. Feldman, Mol. Biol. Evol. 16, 1791–1798 (1999).
2. R. Thomson, J. K. Pritchard, P. Shen, P. J. Oefner, M. W. Feldman, Proc. Natl. Acad. Sci. U.S.A. 97, 7360–7365 (2000).
3. H. Tang, D. O. Siegmund, P. Shen, P. J. Oefner, M. W. Feldman, Genetics 161, 447–459 (2002).
4. M. F. Hammer, Nature 378, 376–378 (1995).
5. F. Cruciani et al., Am. J. Hum. Genet. 88, 814–818 (2011).
6. M. Ingman, H. Kaessmann, S. Pääbo, U. Gyllensten, Nature 408, 708–713 (2000).
7. R. L. Cann, M. Stoneking, A. C. Wilson, Nature 325, 31–36 (1987).
8. P. A. Underhill, T. Kivisild, Annu. Rev. Genet. 41, 539–564 (2007).
9. M. A. Jobling, C. Tyler-Smith, Nat. Rev. Genet. 4, 598–612 (2003).
10. H. Skaletsky et al., Nature 423, 825–837 (2003).
11. Materials and methods are available as supplementary materials on Science Online.
12. ISOGG, International Society of Genetic Genealogy (2013); available at www.isogg.org/.
13. P. A. Underhill et al., Ann. Hum. Genet. 65, 43–62 (2001).
14. W. Wei et al., Genome Res. 23, 388–395 (2013).
15. J. Z. Li et al., Science 319, 1100–1104 (2008).
16. T. M. Karafet et al., Genome Res. 18, 830–838 (2008).
17. J. F. Hughes et al., Nature 463, 536–539 (2010).
18. R. C. Griffiths, S. Tavaré, Philos. Trans. R. Soc. London B Biol. Sci. 344, 403–410 (1994).
19. T. Goebel, M. R. Waters, D. H. O’Rourke, Science 319, 1497–1502 (2008).
20. M. C. Dulik et al., Am. J. Hum. Genet. 90, 229–246 (2012).
21. Y. Xue et al.; Asan, Curr. Biol. 19, 1453–1457 (2009).
22. R. G. Klein, Evol. Anthropol. 17, 267–281 (2008).
23. S. Kumar et al., BMC Evol. Biol. 11, 293 (2011).
24. S. Y. W. Ho, M. J. Phillips, A. Cooper, A. J. Drummond, Mol. Biol. Evol. 22, 1561–1568 (2005).
25. B. M. Henn, C. R. Gignoux, M. W. Feldman, J. L. Mountain, Mol. Biol. Evol. 26, 217–230 (2009).
Acknowledgments: We thank O. Cornejo, S. Gravel,
D. Siegmund, and E. Tsang for helpful discussions; M. Sikora
and H. Costa for mapping reads from Gabonese samples; and
H. Cann for assistance with HGDP samples. This work was
supported by National Library of Medicine training grant
LM-07033 and NSF graduate research fellowship DGE-1147470
(G.D.P.); NIH grant 3R01HG003229 (B.M.H. and C.D.B.);
NIH grant DP5OD009154 (J.M.K. and E.S.); and Institut
Pasteur, a CNRS Maladies Infectieuses Émergentes Grant,
and a Foundation Simone et Cino del Duca Research Grant
(L.Q.M.). P.A.U. consulted for, P.A.U. and B.M.H. have stock
in, and C.D.B. is on the advisory board of a project at 23andMe.
C.D.B. is on the scientific advisory boards of Personalis, Inc.;
InVitae (formerly Locus Development, Inc.); and Ancestry.com.
M.S. is a scientific advisory member and founder of Personalis,
a scientific advisory member for Genapsys Former, and a
consultant for Illumina and Beckman Coulter Society for
American Medical Pathology. B.M.H. formerly had a paid
consulting relationship with Ancestry.com. Variants have been
deposited to dbSNP (ss825679106–825690384). Individual
level genetic data are available, through a data access
agreement to respect the privacy of the participants for
transfer of genetic data, by contacting C.D.B.
Supplementary Materials
www.sciencemag.org/cgi/content/full/341/6145/562/DC1
Materials and Methods
Supplementary Text
Figs. S1 to S13
Tables S1 to S3
Data File S1
References (26–51)
11 March 2013; accepted 25 June 2013
10.1126/science.1237619
Low-Pass DNA Sequencing of 1200
Sardinians Reconstructs European
Y-Chromosome Phylogeny
Paolo Francalacci,1* Laura Morelli,1† Andrea Angius,2,3 Riccardo Berutti,3,4 Frederic Reinier,3 Rossano Atzeni,3 Rosella Pilu,2 Fabio Busonero,2,5 Andrea Maschio,2,5 Ilenia Zara,3 Daria Sanna,1 Antonella Useli,1 Maria Francesca Urru,3 Marco Marcelli,3 Roberto Cusano,3 Manuela Oppo,3 Magdalena Zoledziewska,2,4 Maristella Pitzalis,2,4 Francesca Deidda,2,4 Eleonora Porcu,2,4,5 Fausto Poddie,4 Hyun Min Kang,5 Robert Lyons,6 Brendan Tarrier,6 Jennifer Bragg Gresham,6 Bingshan Li,7 Sergio Tofanelli,8 Santos Alonso,9 Mariano Dei,2 Sandra Lai,2 Antonella Mulas,2 Michael B. Whalen,2 Sergio Uzzau,4,10 Chris Jones,3 David Schlessinger,11 Gonçalo R. Abecasis,5 Serena Sanna,2 Carlo Sidore,2,4,5 Francesco Cucca2,4*
Genetic variation within the male-specific portion of the Y chromosome (MSY) can clarify the
origins of contemporary populations, but previous studies were hampered by partial genetic
information. Population sequencing of 1204 Sardinian males identified 11,763 MSY single-nucleotide
polymorphisms, 6751 of which have not previously been observed. We constructed an MSY
phylogenetic tree containing all main haplogroups found in Europe, along with many
Sardinian-specific lineage clusters within each haplogroup. The tree was calibrated with
archaeological data from the initial expansion of the Sardinian population ~7700 years ago.
The ages of nodes highlight different genetic strata in Sardinia and reveal the presumptive
timing of coalescence with other human populations. We calculate a putative age for coalescence
of ~180,000 to 200,000 years ago, which is consistent with previous mitochondrial DNA–based
estimates.
New sequencing technologies have provided genomic data sets that can reconstruct past events in human evolution more accurately (1). Sequencing data from the male-specific portion of the Y chromosome (MSY) (2), because of its lack of recombination and low mutation, reversion, and recurrence rates, can be particularly informative for these evolutionary analyses (3, 4). Recently, high-coverage Y chromosome sequencing data from 36 males from different worldwide populations (5) assessed 6662 phylogenetically informative variants and estimated the timing of past events, including a putative coalescence time for modern humans of ~101,000 to 115,000 years ago.
MSY sequencing data reported to date still represent a relatively small number of individuals from a few populations. Furthermore, dating estimates are also affected by the calibration of the
1 Dipartimento di Scienze della Natura e del Territorio, Università di Sassari, 07100 Sassari, Italy.
2 Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy.
3 Center for Advanced Studies, Research and Development in Sardinia (CRS4), Pula, Italy.
4 Dipartimento di Scienze Biomediche, Università di Sassari, 07100 Sassari, Italy.
5 Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
6 DNA Sequencing Core, University of Michigan, Ann Arbor, MI 48109, USA.
7 Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA.
8 Dipartimento di Biologia, Università di Pisa, 56126 Pisa, Italy.
9 Departamento de Genética, Antropología Física y Fisiología Animal, Universidad del País Vasco/Euskal Herriko Unibertsitatea, 48080 Bilbao, Spain.
10 Porto Conte Ricerche, Località Tramariglio, Alghero, 07041 Sassari, Italy.
11 Laboratory of Genetics, National Institute on Aging, Baltimore, MD 21224, USA.
*Corresponding author. E-mail: pfrancalacci@uniss.it (P.F.); fcucca@uniss.it (F.C.)
†Laura Morelli prematurely passed away on 20 February 2013. This work is dedicated to her memory.
www.sciencemag.org/cgi/content/341/6145/562/DC1
Supplementary Materials for
Sequencing Y Chromosomes Resolves Discrepancy in
Time to Common Ancestor of Males Versus Females
G. David Poznik, Brenna M. Henn, Muh-Ching Yee, Elzbieta Sliwerska,
Ghia M. Euskirchen, Alice A. Lin, Michael Snyder, Lluis Quintana-Murci,
Jeffrey M. Kidd, Peter A. Underhill, Carlos D. Bustamante*
*Corresponding author. E-mail: cdbustam@stanford.edu
Published 2 August 2013, Science 341, 562 (2013)
DOI: 10.1126/science.1237619
This PDF file includes:
Materials and Methods
Supplementary Text
Figs. S1 to S13
Tables S1 to S3
References
Other Supplementary Material for this manuscript includes the following:
(available at www.sciencemag.org/cgi/content/full/341/6145/562/DC1)
Data File S1. Sample, phylogeny, and variant data (zipped archive).
Data File S2. Y chromosome genotype calls.
To protect participant privacy, this zipped archive is available through a data access
agreement (DAA) for transfer of genetic data by contacting C.D.B.
Data File S3. Y chromosome mapped sequencing reads.
This BAM file is also available via the DAA described above. Mapping, quality score
recalibration, and indel realignment are described in Materials and Methods.
Table of Contents
Materials and Methods
  Sequencing
  Genotypes
  Validation
  Phylogenetic Inference
  mtDNA Analysis
  Frequentist Estimation of TMRCA
  Empirical Bayesian Estimation of TMRCA and Ne: GENETREE
  Predata Distribution of TMRCA
Supplementary Text
  Novel Y Chromosome Phylogenetic Structure
  Imputation
  Calibration and Mutation Rate Estimation
  Impact of Sequencing Error and Sequence Coverage on TMRCA Estimation
  Calibration Time
  Existence of Rare Yet More Basal Lineages
  Effective Population Size
  Additional Acknowledgements
Supplementary Figures
  Fig. S1. Map of populations.
  Fig. S2. Sequencing read mapping on Xq21.
  Fig. S3. Quality control and genotype calling on the Y chromosome.
  Fig. S4. Cross-tabulation of populations and Y haplogroups.
  Fig. S5. Call rate and mean sequencing coverage on the Y chromosome.
  Fig. S6. Y chromosome phylogenetic backbone.
  Fig. S7. Novel structure in Y hgB2.
  Fig. S8. Phylogeny-aware imputation.
  Fig. S9. Y chromosome hgQ clade with Phase 1 1000 Genomes samples included.
  Fig. S10. Sequencing coverage for Mayan HGDP00856 at singleton sites.
  Fig. S11. mtDNA phylogeny.
  Fig. S12. mtDNA calibration tree.
  Fig. S13. Comparing the Y chromosome TMRCA to that of mtDNA.
Supplementary Tables
  Table S1. Y chromosome summary of samples.
  Table S2. M578 genotyping results.
  Table S3. Mutation rate point estimates.
Supplementary Data
  Data File S1. Sample, phylogeny, and variant data.
  Data File S2. Y chromosome genotype calls.
  Data File S3. Y chromosome mapped sequencing reads.
FTP Addresses and Accession Numbers for External Data
  Y Chromosome hgQ Sequences from the 1000 Genomes Project
  Complete mtDNA hgA2 Sequences: GenBank Accession Numbers
References and Notes
4
Materials and Methods
Sequencing
We prepared genomic libraries (26) from cell lines (HGDP) and blood (Gabonese), then
sequenced the libraries on Illumina HiSeq 2000 machines at the Stanford Center for
Genomics and Personalized Medicine. We used BWA (27) to map paired 101 bp reads to
the GRCh37 human reference, removed PCR duplicates with Picard (28), and then
utilized the Genome Analysis Tool Kit (GATK) (29, 30) to recalibrate quality scores,
perform local realignment around candidate indels, and compute genotype likelihoods.
Genotypes
Callability Mask
To learn directly from the read data the boundaries of the regions within which short-read
sequencing could yield reliable variant calls, we calculated average filtered read depth
across all samples in contiguous 1 kb windows and computed an exponentially-weighted
moving average (EWMA) of these values (Fig. 1). Regions for which the EWMA
deviated from a narrow envelope were identified as problematic. Those of depressed
depth corresponded to ampliconic sequences, within which reads do not map uniquely
and were thus filtered out. Regions of inflated depth corresponded to heterochromatin,
where naïve application of standard genotype calling methods would give the impression
of abundant heterozygosity due to the pileup of highly similar reads around the borders of
unassembled regions. After constructing the depth-based filter, we repeated this
procedure for the MQ0 ratio, the proportion of unfiltered reads with fully ambiguous
mapping. Although the X-transposed region showed no deviation in the depth-based
mask, it failed the MQ0 ratio based mask. In females we found depressed read depth in
the homologous region of the X chromosome (Fig. S2); we hypothesize that in males,
each of whom possesses one X and one Y, there is an equal exchange of mismapped
reads between the two chromosomes. The depth and MQ0 masks were merged and
smoothed, leaving 10.45 Mb of sequence for downstream quality control.
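A minimal sketch of the depth-based mask, assuming per-window mean depths (1 kb windows, averaged across samples) are already computed. The smoothing weight `alpha` and the envelope `rel_tol` around the median are hypothetical settings, since the text does not state the values used; note also that a one-sided EWMA like this one lags slightly at region boundaries.

```python
import statistics
from typing import List


def ewma(values: List[float], alpha: float) -> List[float]:
    """One-sided EWMA: s_k = alpha * x_k + (1 - alpha) * s_{k-1}, seeded with x_0."""
    smoothed: List[float] = []
    s = values[0]
    for x in values:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed


def callability_mask(window_depths: List[float], alpha: float = 0.3,
                     rel_tol: float = 0.25) -> List[bool]:
    """Return True for windows whose smoothed depth stays inside the envelope.

    Depressed depth (e.g. ampliconic sequence) and inflated depth (e.g. near
    heterochromatin) both fall outside median * (1 +/- rel_tol) and are masked.
    alpha and rel_tol are hypothetical, not the study's settings.
    """
    smoothed = ewma(window_depths, alpha)
    center = statistics.median(smoothed)
    low, high = center * (1 - rel_tol), center * (1 + rel_tol)
    return [low <= s <= high for s in smoothed]


# Toy depths: normal, depressed, normal, inflated, normal (80 windows in total)
depths = [10] * 20 + [2] * 10 + [10] * 20 + [40] * 10 + [10] * 20
mask = callability_mask(depths)
```

An analogous pass over per-window MQ0 ratios, merged with this mask, would mirror the two-mask procedure described above.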
Site-Level Quality Control
With the regional mask in hand, we defined a series of site-level quality control filters
(Fig. S3A). Of the 22,974,737 mapped coordinates, 12,532,580 fell within the bounds of
the regional exclusion mask. A further 129,411 were excluded due to an MQ0 ratio
greater than or equal to 0.10, and 170,144 were excluded because more than 20 samples
had missing genotypes, either due to an absence of sequencing reads or to a heterozygous
maximum likelihood genotype (Fig. S3B). The remaining polymorphic sites had a
median depth (across all samples) of 265, and we filtered out all sites whose depth was
outside three median absolute deviations of this value, thus excluding 12,425 with depth
above 371 and 141,512 below 159 (Fig. S3C). Finally, we culled 547 sites with a
heterozygous maximum likelihood genotype in more than seven samples (Fig. S3D). This
left 9,988,118 callable sites. Of 432 ISOGG SNPs with observed variation in our data,
393 pass the regional and mapping quality filters, and of these, just one failed the
missingness filter and a further two the depth filter.
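The filter cascade above can be checked arithmetically, and the depth bounds follow a median ± 3 MAD rule: the reported bounds, 159 and 371, sit 106 reads either side of the median depth of 265. A minimal sketch, assuming the unscaled MAD (the text does not say whether a consistency factor was applied); `mad_bounds` is a hypothetical helper name.

```python
import statistics
from typing import Iterable, Tuple


def mad_bounds(depths: Iterable[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (median - k * MAD, median + k * MAD) using the unscaled MAD."""
    values = list(depths)
    med = statistics.median(values)
    mad = statistics.median(abs(d - med) for d in values)
    return med - k * mad, med + k * mad


# Sites removed at each step, as reported in the text above
mapped = 22_974_737
removed = {
    "regional exclusion mask": 12_532_580,
    "MQ0 ratio >= 0.10": 129_411,
    "missing genotype in > 20 samples": 170_144,
    "depth above upper bound": 12_425,
    "depth below lower bound": 141_512,
    "heterozygous ML genotype in > 7 samples": 547,
}
callable_sites = mapped - sum(removed.values())
print(f"{callable_sites:,}")  # 9,988,118
```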
Genotype Calling
To call genotypes, we implemented a haploid model EM algorithm that treated allele
frequency as the latent variable and used the homozygous state genotype likelihoods
calculated by GATK. Genotypes with a heterozygous maximum likelihood state were
classified as missing because calls in such cases were found to be disproportionately
incompatible with the inferred phylogeny.
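A minimal sketch of haploid EM calling at a single site, under one reading of the description above: each sample contributes likelihoods for the two homozygous (haploid) states, the alternative-allele frequency is re-estimated from the samples' posterior carrier probabilities until it converges, and samples whose maximum-likelihood state was heterozygous (or that have no reads) are left missing. The input format, function names, and convergence settings are assumptions for illustration, not GATK output fields or the authors' code.

```python
from typing import List, Optional, Sequence, Tuple

Lik = Tuple[float, float]  # (P(reads | ref allele), P(reads | alt allele))


def _posterior_alt(p: float, lik: Lik) -> float:
    lr, la = lik
    return p * la / (p * la + (1 - p) * lr)


def haploid_em(liks: Sequence[Lik], p: float = 0.5,
               tol: float = 1e-10, max_iter: int = 500) -> float:
    """Estimate the alternative-allele frequency at one site by EM."""
    eps = 1e-12
    for _ in range(max_iter):
        # E-step: probability that each sample carries the alternative allele
        post = [_posterior_alt(p, lik) for lik in liks]
        # M-step: new frequency is the mean posterior, clamped away from 0 and 1
        p_new = min(max(sum(post) / len(post), eps), 1 - eps)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


def call_genotypes(liks: Sequence[Lik], het_is_ml: Sequence[bool],
                   p: float) -> List[Optional[int]]:
    """Call 0 (ref) / 1 (alt); None for a heterozygous ML state or no reads."""
    calls: List[Optional[int]] = []
    for lik, het in zip(liks, het_is_ml):
        if het or lik[0] == lik[1]:  # heterozygous ML state, or uninformative
            calls.append(None)
        else:
            calls.append(1 if _posterior_alt(p, lik) > 0.5 else 0)
    return calls


# Toy site: three samples support alt, seven support ref, and the last
# sample's maximum-likelihood state was heterozygous.
liks = [(1e-6, 0.9)] * 3 + [(0.9, 1e-6)] * 7
p_hat = haploid_em(liks)
calls = call_genotypes(liks, [False] * 9 + [True], p_hat)
```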
Validation
The false positive rate is kept low primarily by the fact that GATK generally requires at
least 2 reads of support to identify a site as variable. In addition, we exclude sites
incompatible with the phylogeny. Though this filter discards some genuine homoplasic
variants, the class is enriched for false positives, and we have chosen to err on the side of
conservatism. We consider three means of validation.
Sanger Sequencing
We validated Y chromosome genotypes for the 29 male HGDP samples at 46 sites using
a combination of targeted PCR and Sanger sequencing (3 sites), and exome capture
followed by Illumina sequencing (43 sites). Validation failed to yield data for two
genotypes, and we compared the remaining 1,245 genotypes to the main data set to find a
concordance rate of 99.92%. Just one genotype was discordant (M150, hg19 position
21869519, in HGDP00462). The genotype had zero sequencing reads of support, and the
individual had been imputed to carry the reference allele whereas the validation data
indicated that this sample actually carries the non-reference allele. Only one other
sample, the nearest neighbor to HGDP00462, also carried the non-reference allele, and
this illustrates the fact that it is impossible to properly impute missing genotypes for sites
otherwise identified as singletons (Supplementary Text, “Imputation” section).
Minimally Diverged Samples
We also consider private variation among minimally diverged individuals to argue that
sequencing errors are minimized in our study. Specifically, we observe a cluster of five
Baka hgB2 samples with just a handful of singletons per lineage. This group
approximates a replication set and thus gives tight upper bounds on the false positive
variant rate.
Haplogroup Assignments
All HGDP haplogroup assignments were consistent with prior ISOGG designations.
Phylogenetic Inference
We used MEGA5 (31) to construct maximum likelihood phylogenetic trees.
mtDNA Analysis
mtDNA Pipeline
To call mitochondrial haplogroups, we converted sequences from the GRCh37 to the
rCRS coordinate system and imported them into HaploGrep (32), which draws on the
Phylotree database (33). We explicitly utilized data presented in Table 1 of Behar et al.
(34) to polarize alleles for variants assigned to the most ancient split—that between hgL0
and the rest of the tree (Fig. S11). Whereas the mutation rate on the Y chromosome is
sufficiently low that we could regard base substitutions as unique events and simply
discard sites that were incompatible with the phylogeny, excluding sites would have been
inappropriate for the mitochondrial genome, in which a much higher mutation rate has
led to considerable homoplasy. To account for this, we split sites with multiple
substitutions into pseudo-sites, each of which constitutes a unique event. We discarded a
few mutational hotspot sites with evidence for more than four unique substitution events.
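A minimal sketch of the pseudo-site split, assuming the substitution events at each position have already been placed on tree branches. The positions and branch labels in the example are hypothetical; the limit of four events mirrors the hotspot rule above.

```python
from typing import Dict, List, Tuple


def split_homoplasic_sites(events: Dict[int, List[str]],
                           max_events: int = 4) -> List[Tuple[str, str]]:
    """Return (pseudo_site_id, branch) pairs, one per inferred substitution event.

    events maps a position to the branches on which a substitution at that
    position was inferred. Positions with more than max_events events are
    treated as mutational hotspots and dropped.
    """
    pseudo_sites: List[Tuple[str, str]] = []
    for pos, branches in sorted(events.items()):
        if len(branches) > max_events:
            continue  # hotspot: discard the whole site
        for k, branch in enumerate(branches, start=1):
            pseudo_sites.append((f"{pos}.{k}", branch))
    return pseudo_sites


# Hypothetical positions and branch labels
events = {150: ["b3", "b7"], 73: ["b2"], 999: ["b1", "b2", "b3", "b4", "b5"]}
print(split_homoplasic_sites(events))
# [('73.1', 'b2'), ('150.1', 'b3'), ('150.2', 'b7')]
```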
Calibration Based on mtDNA hgA2
Since there are far fewer segregating sites in the mitochondrial genome, and we only had
seven hgA2 lineages, we used 108 publicly available hgA2 Native American sequences
to calibrate. Kumar et al. (23) list 568 accession numbers for mitochondrial genomes, 134
of which belong to hgA2 and are of American descent. We downloaded the subset of 108
entries that included the full mtDNA sequence and, along with the GRCh37 reference
sequence, conducted a multiple alignment using MUSCLE (35). We then called
haplogroups, built a tree (Fig. S12), assigned variants to branches, and resolved
homoplasies as described above.
Frequentist Estimation of TMRCA
The Molecular Clock
Under the infinite sites model, mutations accumulate in a Poisson process of rate µl, the
locus-wide mutation rate. To estimate TMRCA, molecular clock approaches first estimate
the mean number of derived mutations per lineage and then divide by an estimate of the
mutation rate. For both the Y chromosome and the mtDNA, we estimate TMRCA with:
$$\hat{T} = \frac{\bar{D}}{\hat{\mu}_{ly}},$$
where $\bar{D}$ is the sample average of $\{D_i\}$, the inferred number of mutations accumulated by each lineage since the global MRCA:
$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i.$$
We estimated the $\{D_i\}$ using a maximum likelihood phylogeny (Fig. 2), and we estimate the yearly mutation rate, $\mu_{ly}$, as:
$$\hat{\mu}_{ly} = \frac{\bar{C}}{t},$$
where $t$ is the known TMRCA of the calibration subclade and $\bar{C}$ is the sample average of $\{C_i\}$, the number of derived mutations acquired by each lineage since the common ancestor of the subtree:
$$\bar{C} = \frac{1}{n_c}\sum_{i=1}^{n_c} C_i.$$
Here $n_c$ is the number of individuals within the calibration subclade. $\hat{T}$ is therefore a scaled ratio of two random variables:
$$\hat{T} = t\,\frac{\bar{D}}{\bar{C}}.$$
TMRCA Confidence Intervals
From the frequentist perspective, we consider $T$ a fixed but unknown constant, and we are interested in the sampling variance of our estimator conditional on its true value. Since the calibration subtree is a small fraction of the total tree, $\bar{D}$ and $\bar{C}$ are approximately uncorrelated. This fact simplifies the expression for the standard deviation of a ratio of random variables, which is obtained using the δ method (36):
$$\sigma_{\hat{T}|T} \approx \frac{t}{\bar{C}}\sqrt{\left(\frac{\bar{D}}{\bar{C}}\,\sigma_{\bar{C}}\right)^{2} + \sigma^{2}_{\bar{D}|T}}.$$
Since both $\bar{D}$ and $\bar{C}$ are sums of Poisson random variables with a large number of total events, each is well approximated by the normal distribution. Consequently, their ratio is also approximately normally distributed (37). Therefore, if we are able to compute $\sigma_{\bar{D}|T}$ and $\sigma_{\bar{C}}$, we can construct a confidence interval for $T$.
We first consider $\sigma_{\bar{D}|T}$. The $\{D_i\}$ are identically Poisson distributed, but they are not independent due to the shared internal branches (3). Thus,
$$\sigma^{2}_{\bar{D}|T} = \mathrm{Var}[\bar{D}\,|\,T] = \frac{1}{n^{2}}\left[\sum_{i}\mathrm{Var}[D_i\,|\,T] + 2\sum_{i}\sum_{j>i}\mathrm{Cov}[D_i, D_j\,|\,T]\right].$$
Since each $D_i$ is a Poisson random variable, its variance is equal to its mean. Now consider samples $i$ and $j$. The numbers of mutations that have accumulated in each since their MRCA are independent. However, they share all mutations possessed by their MRCA. Thus,
$$\mathrm{Cov}[D_i, D_j\,|\,T] = D_{ij},$$
where $D_{ij}$ is the number of derived variants possessed by the common ancestor of $i$ and $j$. Let $I$ denote the set of internal branches, and let $b_s$ and $b_l$ be the number of descendants and the length of a branch, $b$, respectively. Each internal branch will be shared by $\binom{b_s}{2}$ pairs of individuals. Thus,
$$2\sum_{i}\sum_{j>i}\mathrm{Cov}[D_i, D_j\,|\,T] = 2\sum_{b\in I}\binom{b_s}{2}\,b_l = \sum_{b\in I} b_s(b_s-1)\,b_l,$$
which gives:
$$\sigma_{\bar{D}|T} = \frac{1}{n}\sqrt{\sum_{i} D_i + \sum_{b\in I} b_s(b_s-1)\,b_l}.$$
An identical argument applies to $\sigma_{\bar{C}}$ within the calibration subtree. We therefore construct a 95% confidence interval for TMRCA as:
$$T = \hat{T} \pm z_{0.025}\cdot\sigma_{\hat{T}|T},$$
$$T = t\left[\frac{\bar{D}}{\bar{C}} \pm z_{0.025}\cdot\frac{1}{\bar{C}}\sqrt{\left(\frac{\bar{D}}{\bar{C}}\,\sigma_{\bar{C}}\right)^{2} + \frac{1}{n^{2}}\left(\sum_{i} D_i + \sum_{b\in I} b_s(b_s-1)\,b_l\right)}\;\right].$$
The bias of the point estimator is minimal (36).
Precision of TMRCA Estimation
The standard error for the mean estimate of a Poisson random variable with mean $\mu_l T$ is $\sqrt{\mu_l T/n}$, so the coefficient of variation (the ratio of the standard error to the mean) is $1/\sqrt{n\mu_l T}$ and declines as $n\mu_l T$ grows. On the Y chromosome, $T$ is large and, because the non-recombining locus is so long, $\mu_l$ is quite large as well. Consequently, the standard error for estimating the mean branch length is relatively small, and the greater source of uncertainty lies in estimating the mutation rate, where the time intervals over which mutations have accumulated are shorter and the number of lineages is smaller. However, $\mu_l$ is sufficiently large that we could derive a narrow confidence interval based solely on the two hgQ lineages we had sequenced. In contrast, for the mtDNA, the uncertainty due to $\sigma_{\bar{D}|T}$ exceeds that due to $\sigma_{\bar{C}}$.
An Alternative Frequentist Estimator
An alternative frequentist estimator defines $\bar{D}$ as half the average mutational distance $d_{ij}$ between pairs of individuals that span the ancestral root (3):
$$\bar{D} = \frac{1}{2|L||R|}\sum_{i\in L}\sum_{j\in R} d_{ij}.$$
Here, $L$ and $R$ represent the sets of individuals on the left and right sides of the root. This estimator is less well suited to our data set. We have four Y hgA individuals on the left side of the tree and 65 individuals on the right side. This partition-based approach effectively upweights information from the hgA samples, since all distances are measured with respect to a member of this clade. However, we have lower effective coverage on the internal branches of hgA than elsewhere in the tree. This is due both to the lower number of samples and to the fact that hgA lineages are highly diverged. Consequently, these are exactly the samples for which false negatives have the greatest potential impact. For the sake of comparison, the TMRCA point estimates from this approach are 134 ky and 118 ky for the Y chromosome and mtDNA, respectively.
Estimating the Ratio of mtDNA TMRCA to Y TMRCA
To compare the TMRCA of the Y chromosome to that of the mtDNA, we estimate the ratio:
$$\gamma = \frac{T_m}{T_y} = \frac{t_m M}{t_y Y} = \tau R,$$
where we define $M$ and $Y$ as the fixed but unknown unscaled TMRCA of the mtDNA and Y, respectively, and $R$ as the ratio $M/Y$. The quantity $\tau = t_m/t_y$ is the ratio of coalescence times of the Native American lineages, mtDNA hgA2 and Y chromosome hgQ. Our estimator of $\gamma$ is:
$$\hat{\gamma} = \tau\hat{R} = \tau\,\frac{\hat{M}}{\hat{Y}},$$
where
$$\hat{M} = \bar{D}_m/\bar{C}_m,\qquad \hat{Y} = \bar{D}_y/\bar{C}_y,\qquad \hat{R} = \hat{M}/\hat{Y}.$$
The standard error is:
$$\sigma_{\hat{\gamma}|\gamma} = \tau\,\sigma_{\hat{R}|M,Y}.$$
Since $\hat{R}$ is the ratio of two random variables, its standard error is:
$$\sigma_{\hat{R}|M,Y} \approx \frac{1}{E[\hat{Y}|Y]}\sqrt{\left(\frac{E[\hat{M}|M]}{E[\hat{Y}|Y]}\,\sigma_{\hat{Y}|Y}\right)^{2} + \sigma^{2}_{\hat{M}|M} - 2\rho\,\sigma_{\hat{M}|M}\,\sigma_{\hat{Y}|Y}\,\frac{E[\hat{M}|M]}{E[\hat{Y}|Y]}},$$
where $\rho = \mathrm{Corr}[\hat{M}|M,\ \hat{Y}|Y]$. We cannot disregard the correlation term in this case. If the TMRCAs of male and female lineages are correlated, their estimates will be as well, though the correlation of the estimates would necessarily be less than that of the true values due to the uncertainty in both variables. Confidence bands for $\gamma$ are defined by:
$$\gamma = \tau\left[\frac{\hat{M}}{\hat{Y}} \pm z_{0.025}\cdot\sigma_{\hat{R}|M,Y}\right].$$
To assume zero correlation would be conservative, as positive correlation reduces the variance. We consider representative values of $\rho$ for the sake of comparison (Fig. S13). Again, the bias of the point estimator is minimal (36).
Empirical Bayesian Estimation of TMRCA and Ne: GENETREE
As distributed, GENETREE can handle only 99 sites per run, but we modified the source code to enable runs of several thousand SNPs. First, we perform a grid search to obtain a maximum likelihood estimate for the scaled mutation rate, $\theta = 2N_e\mu_{lg}$, where $\mu_{lg}$ is the locus-wide per generation mutation rate. We then simulate the posterior distribution of TMRCA, conditional on this estimate. We restricted each analysis to a single population so that the assumption of exchangeability of lineages (38) would hold. As the TMRCA is determined by the deepest coalescence in a sample, we exclusively analyzed populations that sample from both sides of the tree (Fig. 2): the San and Baka for the Y chromosome and the Mbuti and Nzebi for the mitochondrial genome. Results from the Baka and Mbuti Pygmy populations are the most directly comparable (Table 1).
We excluded several lineages from the GENETREE analyses. In the Baka, we excluded three samples possessing high levels of autosomal identity by descent with another individual, as inferred with Illumina Omni SNP arrays. We also excluded six Baka hgE samples, as these likely represent West African agriculturalist lineages that introgressed into the Baka a few thousand years ago (39), in violation of the exchangeability assumption of coalescent theory. In the mitochondrial analysis, we removed two Nzebi and one Mbuti because GENETREE does not allow for identical lineages.
Point estimates for the Baka Y chromosomes reflect averages of multiple coalescent runs. Each run subsampled 1500 (of 2927) segregating sites to overcome computational limitations for the full data set. Estimates for the Mbuti mtDNAs reflect averages of multiple coalescent runs, each with a different random seed, as these runs were more variable due to a smaller Poisson mean ($n\mu_l$).
Coalescent theory measures time in units of $N_e$ generations. To convert to years, we use the maximum likelihood estimate of $\theta$, the gender-specific generation time ($g$; Table S3), and the Native American calibration estimate for $\mu_{ly}$, the locus-wide per year mutation rate:
$$\hat{N}_e = \frac{\hat{\theta}}{2\hat{\mu}_{lg}} = \frac{\hat{\theta}}{2g\hat{\mu}_{ly}},$$
$$\hat{T}_{\mathrm{MRCA}} = \hat{T}_c\,\hat{N}_e\,g = \frac{\hat{T}_c\,\hat{\theta}}{2\hat{\mu}_{ly}},$$
where $\hat{T}_c$ is the TMRCA estimated in coalescent units.
GENETREE is suboptimal for our data set. Due to the exchangeability assumption and computational limitations, each analysis draws information from just a subset of the data. Because the full sequence data are highly informative about the underlying gene genealogy, very few random trees are compatible with them. This makes GENETREE a highly inefficient approach to estimating population genetic parameters. Thus, we emphasize the point estimates and confidence intervals derived from the frequentist approach.
Predata Distribution of TMRCA
For a constant population size, the TMRCA of a locus, measured in $N_e$ generations, is given by:
$$T_{\mathrm{MRCA}} = \sum_{i=2}^{n} T_i,$$
where $T_i$ is the time during which $i$ ancestral lineages of the sample existed. Coalescent theory (38) models $T_i$ as an exponential random variable with parameter:
$$\lambda_i = \binom{i}{2}.$$
To obtain the distributions presented in Fig. 3, we simulated five million draws of TMRCA for $n = 100$ lineages and scaled each value by a factor of $N_e \cdot g$ to convert to years.
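A minimal sketch of the predata simulation, drawing each T_i from an exponential distribution with rate C(i, 2) and summing over i = 2, ..., n. The number of draws is reduced from five million so the example runs quickly, and the Ne and g values used for the conversion to years are hypothetical placeholders, not the study's estimates. The analytical mean, 2(1 − 1/n), provides a quick check.

```python
import random
from math import comb
from typing import List


def simulate_tmrca(n: int, draws: int, seed: int = 0) -> List[float]:
    """Draw TMRCA (in units of Ne generations) under the constant-size coalescent.

    While i lineages remain, the waiting time T_i ~ Exponential(rate = C(i, 2));
    TMRCA is the sum of T_i for i = n down to 2.
    """
    rng = random.Random(seed)
    rates = [comb(i, 2) for i in range(2, n + 1)]
    return [sum(rng.expovariate(rate) for rate in rates) for _ in range(draws)]


n = 100
samples = simulate_tmrca(n, draws=10_000)
expected_mean = 2 * (1 - 1 / n)  # sum over i of 1 / C(i, 2)
print(round(expected_mean, 3))  # 1.98

# Convert to years with hypothetical Ne and generation time g
NE_HYPOTHETICAL, G_HYPOTHETICAL = 10_000, 25
years = [t * NE_HYPOTHETICAL * G_HYPOTHETICAL for t in samples]
```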
Supplementary Text
Novel Y Chromosome Phylogenetic Structure
Haplogroup B2
Within hgB2, we identify one clade and three additional lineages that represent
previously uncharacterized structure (Figs. 2, S7). Each lineage represents an ancient
divergence within the Y chromosome phylogeny and carries no known differentiating
mutations downstream of M192 and Page72, which define hgB2b1.
First, in the main text we describe a subclade of B2b1a that encompasses six Baka
individuals. Previously, B2b1a2 was associated with the P70 variant, but because these
six Baka individuals carry the ancestral allele for P70, we propose reassociating P70 with
a new label, “B2b1a2a,” and labeling the new clade “B2b1a2b.” Second, B2b1b was
previously associated with P6, but we have identified an Mbuti individual carrying the
ancestral allele for this variant. Thus, we propose associating P6 with a new label,
“B2b1b1,” and designating the new lineage “B2b1b2.” Finally, we identify two new
lineages within B2b1a1. The individuals representing both of these lineages carry the
ancestral T allele for the M169 variant that defines B2b1a1a, the only extant sublineage
of B2b1a1 not represented.
Haplogroup F
Table S2 presents genotyping results for the M578 variant in a separate panel of
individuals. The results confirm the (G, H, IJK) → (G, (H, IJK)) polytomy resolution.
The demographic fates of hgG and hgHIJK were geographically asymmetric, with the
spread zone of hgG (40) considerably more restricted than that of hgHIJK (Fig. S6). The
latter now spans all continents, including Africa due to the back migration of some
haplogroups (41).
Imputation
We used our phylogeny-aware algorithm (Fig. S8) to impute approximately 5.3 missing
genotypes per Y chromosome variant site and a median of 826 per individual.
Imputation Limitations
It is not possible to impute singletons: when the carrier of a unique allele has zero reads
of support, there is no evidence for variation at the site. Doubletons pose a similar
problem. Let A and B be nearest neighbors in the phylogeny. Consider the case where, at
a given site, A possesses an allele not observed in any other sample, and B has zero reads.
It is impossible to distinguish whether the site is an A singleton or an A/B doubleton.
However, conditional on one sample missing data at a particular site, our imputation
strategy correctly imputes two thirds of tripletons; it fails only in the case where the
lineage of the missing sample is the last to coalesce. For four lineages, there are 18
possible trees. Of these, twelve consist of stepwise coalescence, and the lineage with
missing data is the most diverged in just three. Thus, we correctly impute five-sixths of
quadrupletons.
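The tripleton and quadrupleton fractions can be checked by enumerating every ranked sequence of pairwise coalescences, assuming (as under the standard coalescent) that each sequence is equally likely. Imputation fails only when the sample with missing data is the last lineage to coalesce. The labels are arbitrary, and this is a verification sketch, not the imputation algorithm itself.

```python
from fractions import Fraction
from itertools import combinations
from typing import FrozenSet, Iterator, List, Tuple

Lineage = FrozenSet[str]
History = List[Tuple[Lineage, Lineage]]


def ranked_histories(lineages: List[Lineage]) -> Iterator[History]:
    """Yield every ordered sequence of pairwise merges down to one ancestor."""
    if len(lineages) == 1:
        yield []
        return
    for a, b in combinations(lineages, 2):
        rest = [x for x in lineages if x != a and x != b] + [a | b]
        for tail in ranked_histories(rest):
            yield [(a, b)] + tail


def imputable_fraction(k: int) -> Tuple[Fraction, int]:
    """Share of histories of k carriers in which missing sample 'm' is not last to coalesce."""
    missing = frozenset({"m"})
    start = [missing] + [frozenset({f"s{j}"}) for j in range(1, k)]
    total = ok = 0
    for history in ranked_histories(start):
        total += 1
        last_a, last_b = history[-1]  # the final coalescence event
        if missing not in (last_a, last_b):
            ok += 1
    return Fraction(ok, total), total


print(imputable_fraction(3))  # (Fraction(2, 3), 3)
print(imputable_fraction(4))  # (Fraction(5, 6), 18)
```

For k = 2 the fraction is 0, which is the doubleton problem described above.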
Polarizing Variants on the Branch Spanning the Ancestral Root
Our method to infer the ancestral state at a given site was inapplicable to the 398 variants
assigned to the most ancient (basal) split, as no outgroup for these branches was present
within the data set. For these, we first conducted a LiftOver (42) to map GRCh37
coordinates to those of the chimpanzee reference (PanTro3). Due to the abundance of
large-scale inversions between the two chromosomes (17), it was necessary to BLAT
(43) 101 bp chunks of DNA surrounding each human variant to infer relative orientation.
Ancestral states were thereby inferred for 322 variants, and those of the remaining 76, for
which the corresponding chimpanzee allele could not be inferred, were randomly
assigned in the corresponding proportion.
Homoplasy and the Infinite Sites Model
We deemed a SNV consistent with the tree when we observed no ancestral alleles in the
subtree rooted at the branch to which the SNV was assigned. Most variants (11,279) were
consistent with the tree, and we imputed missing genotypes for those that were. Sites
incompatible with the phylogeny were uniformly distributed across the callable regions
(Fig. 1) and were excluded from downstream analyses. Just 199 (of 361) incompatibilities
were supported by more than one sequencing read. This lack of homoplasy on the Y
chromosome justifies usage of the infinite sites model.
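A minimal sketch of the compatibility rule combined with phylogeny-aware imputation, assuming variants are already polarized and the tree is given as a map from branch name to the leaves below it. Assigning a variant to the smallest clade containing every observed derived carrier is an assumption of this sketch; the function name and toy tree are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Genotypes = Dict[str, Optional[int]]  # sample -> 1 derived, 0 ancestral, None missing


def assign_and_impute(clades: Dict[str, FrozenSet[str]],
                      geno: Genotypes) -> Optional[Tuple[str, Genotypes]]:
    """Place a polarized SNV on a branch, test compatibility, and impute missing calls.

    clades maps each branch name to the leaves below it (terminal branches and
    the root included). Returns None when no derived allele is observed or when
    an ancestral allele is observed inside the assigned subtree.
    """
    carriers = {s for s, g in geno.items() if g == 1}
    if not carriers:
        return None  # e.g. a singleton whose carrier has no reads: no evidence
    # Assigned branch: the smallest clade containing every observed carrier
    branch, leaves = min(
        ((name, members) for name, members in clades.items() if carriers <= members),
        key=lambda item: len(item[1]),
    )
    if any(geno[s] == 0 for s in leaves):
        return None  # incompatible with the phylogeny: excluded downstream
    # Missing samples inside the subtree inherit the derived allele; others, ancestral
    imputed = {s: (1 if s in leaves else 0) if g is None else g for s, g in geno.items()}
    return branch, imputed


# Hypothetical tree (((A,B),C),D): each clade named by its leaves
clades = {name: frozenset(name) for name in ["A", "B", "C", "D", "AB", "ABC", "ABCD"]}
print(assign_and_impute(clades, {"A": 1, "B": None, "C": 1, "D": 0}))
```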
Calibration and Mutation Rate Estimation
Mutation rate estimates are typically based on family pedigrees (14) or species
phylogenies, such as the human-chimpanzee divergence (2, 3). However, just one
pedigree-based rate is available for the Y chromosome, and it is based on a single
pedigree even though the mutation process is highly stochastic. Furthermore, precise
alignment between the human Y chromosome and that of the chimpanzee is difficult due
to extreme structural divergence. Finally, if the Y is subject to a time-dependent mutation
rate, as is mtDNA (24, 25), then neither estimation approach is ideal for dating human
population events.
Instead, we estimate mutation rates using a within-human calibration point, the initial
migration into and expansion throughout the Americas. Well-dated archaeological sites
include Paisley Cave in Oregon, which dates to 14.3 kya (19); Buttermilk Creek in
Central Texas, at 13.2–15.5 kya (44); and Monte Verde II in Southern Chile, 14.6 kya
(45). To date the expansion of genetic lineages unique to the Americas, we follow Goebel
et al. who state that the most parsimonious estimate is that “humans colonized the
Americas around 15 kya” (19). We show that a lack of parity between the expansion
event and the divergence of lineages used for calibration would have minimal effect on
the difference between the TMRCA of the Y and mtDNA if the divergences are within a few
thousand years of one another (Fig. S13, Materials and Methods).
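The ratio estimator from Materials and Methods, γ̂ = τ·M̂/Ŷ with its δ-method band, can be sketched to explore how the calibration ratio τ and the correlation ρ move the result. This is a minimal sketch: plug-in estimates stand in for E[M̂|M] and E[Ŷ|Y], and all numeric inputs in the example are hypothetical.

```python
import math
from typing import Tuple


def gamma_band(m_hat: float, sd_m: float, y_hat: float, sd_y: float,
               tau: float, rho: float = 0.0,
               z: float = 1.959964) -> Tuple[float, float, float]:
    """Point estimate and 95% band for gamma = tau * M/Y by the delta method.

    rho is the assumed correlation between the two estimates; positive rho
    shrinks the band, so rho = 0 is the conservative choice.
    """
    r = m_hat / y_hat
    var = (r * sd_y) ** 2 + sd_m ** 2 - 2 * rho * sd_m * sd_y * r
    sd_r = math.sqrt(var) / y_hat
    return tau * r, tau * (r - z * sd_r), tau * (r + z * sd_r)


# Hypothetical inputs: compare the band under several correlations
for rho in (0.0, 0.3, 0.6):
    print(rho, gamma_band(1.2, 0.1, 1.0, 0.08, tau=1.0, rho=rho))
```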
For reference and comparison, Table S3 summarizes mutation rate point estimates on
four scales. The Y chromosome mutation rates are similar to previous autosomal
phylogenetic-based mutation rates and extended pedigree-based rates, but they are almost
two-fold higher than autosomal mutation rates based on trios (46).
Impact of Sequencing Error and Sequence Coverage on TMRCA Estimation
We developed a method to estimate the variance in estimated TMRCA that is due to the
stochastic nature of the mutation process (Materials and Methods, “Frequentist
Estimation of TMRCA” section). Here we discuss the potential impact of bias due to
sequencing error and modest sequencing coverage. We have estimated TMRCA by
calculating the ratio of two quantities, divergence and the mutation rate, each of which
depends on experimental measurements. The numerator is the average tip-to-root height
of the tree, and we estimate the denominator as the ratio of average branch length within
the calibration subtree to the calibration time. Data for each of the three measurements is
imperfect. In this section, we consider potential biases in the first two, and we consider
calibration time in the next section.
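A minimal sketch of the frequentist estimate and its 95% interval, following the formulas in Materials and Methods: TMRCA is t times mean(D) divided by mean(C), each Poisson variance is estimated by the observed count, and every shared internal branch adds b_s(b_s − 1)b_l to the summed covariance. Passing internal branches as `(b_s, b_l)` pairs is an assumption of this sketch, and the toy tree and calibration time are hypothetical, not data from this study.

```python
import math
from typing import Sequence, Tuple

Branch = Tuple[int, int]  # (b_s descendants, b_l SNVs) for one internal branch


def mean_and_sd(counts: Sequence[int], internal: Sequence[Branch]) -> Tuple[float, float]:
    """Mean per-lineage count and its SD, accounting for shared internal branches."""
    n = len(counts)
    mean = sum(counts) / n
    shared = sum(bs * (bs - 1) * bl for bs, bl in internal)
    return mean, math.sqrt(sum(counts) + shared) / n


def tmrca_ci(d_counts: Sequence[int], d_internal: Sequence[Branch],
             c_counts: Sequence[int], c_internal: Sequence[Branch],
             t: float, z: float = 1.959964) -> Tuple[float, float, float]:
    """Return (T_hat, lower, upper) for T_hat = t * D_bar / C_bar."""
    d_bar, sd_d = mean_and_sd(d_counts, d_internal)
    c_bar, sd_c = mean_and_sd(c_counts, c_internal)
    t_hat = t * d_bar / c_bar
    # delta method, treating D_bar and C_bar as uncorrelated
    sd_t = (t / c_bar) * math.sqrt((d_bar / c_bar * sd_c) ** 2 + sd_d ** 2)
    return t_hat, t_hat - z * sd_t, t_hat + z * sd_t


# Hypothetical tree ((a,b),(c,d)): internal branches of 50 and 40 SNVs, each
# shared by two lineages; two calibration lineages with no internal branches.
print(tmrca_ci([80, 84, 85, 81], [(2, 50), (2, 40)], [10, 12], [], t=1000.0))
```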
Tip-to-Root Height
We measure tip-to-root height as the total number of SNVs assigned to all branches
separating an individual from the common ancestor of all individuals. This sum includes
the singletons of the terminal branch and the shared variants on the internal branches.
Two factors act in opposition to stretch and shrink an observed branch length with respect
to its true value: sequencing error and the total sequencing coverage of the branch, which
itself is influenced both by sequencing coverage of individuals and by sampling density
of the clade rooted at the branch. The primary effect of sequencing error is to stretch
terminal branches, as it is unlikely that random sequencing errors will cluster
phylogenetically. We have demonstrated that genotype error is minimal (Materials and
Methods, “Validation” section). Consequently, branch lengths are not significantly
inflated by sequencing error.
Though modest sequencing coverage translates to unobserved variants near the tips of the
tree, thereby shortening observed heights, the internal branches of the tree, which
constitute the overwhelming majority of any tip-to-root path, have quite high coverage
due to the superposition of sequencing from all descending lineages. Thus, most observed
internal branch lengths cannot differ significantly from their true lengths. Fortunately, the
most divergent sample with the longest terminal branch, the San individual in the hgA-
M51 clade, had higher than average sequencing coverage (6.15×) and, consequently, call
rate (0.985). We observed 1012 private variants in this individual, and we estimate
approximately 22 false negatives—unobserved variants with either a no-call genotype or
just one sequencing read, an event insufficient to identify a site as variable. This worst-
case scenario is less than 2% of the average tip-to-root height. We likely have very few
false negatives in other individuals, even among those of lower coverage, since the lower
coverage samples are clustered in the densely sampled portions of the tree, such as in hgE
and portions of hgB, and the imputation strategy we’ve implemented enables these
lineages to receive credit for variation detected in neighbors and which they can be
inferred to possess. Finally, the maximum observed tip-to-root height (1188) could be
considered a conservative upper bound on the true mean, and it differs from the observed
mean by just 5%.
Branch Lengths in the Calibration Subtree
We now consider how sequencing coverage affects branch lengths in the Y chromosome
hgQ subtree used to estimate the mutation rate. We sequenced Mayan HGDP00856, a
representative of hgQ-M3, to 5.7× coverage and Mayan HGDP00877, whose haplogroup
is labeled hgQ-L54*(xM3) because it carries the L54 mutation but is ancestral at the M3
SNP, to an average depth of 8.5×. Had we sequenced the two Mayan lineages to lower
coverage, we would have artificially boosted TMRCA estimates by underestimating the
mutation rate. However, haploid coverage for the Mayan samples is high enough that
false negatives have little impact on our calibration. The rate of false negatives is
dominated by sites in the terminal branches of the tree with either zero or one sequencing
read for a sample. When an individual has zero or one read at a shared SNP, we can
usually impute its genotype, but it is not possible to impute singletons or to distinguish a
singleton from a doubleton in the presence of missing data (Supplementary Text,
“Imputation” section). Although missing singletons and misclassified doubletons have
little impact on total branch length from the tips to the root of the entire tree, they are
quite important for calibration because singletons constitute a significant portion of
branch length within the calibration subtree.
In our study, the shared hgQ branch is of approximately the same length as the Q-M3 and
Q-L54*(xM3) terminal branches. Consequently, no-call genotypes at singleton sites,
which lead to missing singletons, are counterbalanced by no-call genotypes in the shared
hgQ branch, which lead to doubletons misclassified as singletons. This relies on the fact
that at 5.7× and 8.5× coverage, the no-call rates on the doubleton and singleton branches
are comparable. In general, a no-call due to the presence of just a single sequencing read
is less likely to occur on the doubleton branch than on the singleton branch, but of the
9,988,118 callable sites only 194,966 (2.0%) and 23,989 (0.2%) are covered by just one
read in HGDP00856 and HGDP00877, respectively.
To empirically estimate the false negative rate within the hgQ subtree used for calibration, we incorporated data from the 1000 Genomes Project (47). We downloaded genotype calls (VCF files) for 525 males from Phase 1, called haplogroups, and identified eleven individuals belonging to hgQ[1]. We then downloaded aligned sequence data (BAM files) for these samples, converted from the GRCh37 to hg19 reference, and applied our pipeline to the combined set of 80 individuals (Fig. S9). In the combined analysis, the branch shared by all hgQ lineages grew from 136 to 146 SNPs[2]. One SNP had not been called in either HGDP sample (hg19 position 15825218), and nine SNPs were no-calls in HGDP00856: three due to the absence of reads, and six due to one erroneous read (of 4–10). With perfect data, these nine SNPs would have been classified as doubletons, but they were instead misclassified as HGDP00877 singletons. Thus, for HGDP00856, we can estimate the no-call rate within the hgQ subtree, β0 ≈ 6.8% (10 / 146). Partly because the coverage is higher, we observed no doubletons misclassified as singletons due to missingness in HGDP00877[3]. Thus, for HGDP00877, β0 ≈ 0.7% (1 / 146).
[1] A twelfth, NA19753, was sequenced using SOLiD. We did not include this sequence in our analysis since it is likely to have different error and mapping properties than those generated by Illumina technology.
[2] The exact length is 149, but the difference includes two SNPs that were on the borderline of the depth-based filter in the main study and a net of one SNP discarded due to homoplasy: two in the main study and one in the combined analysis.
Whereas on the shared doubleton branch the no-call rate should sufficiently inform the
type 2 error rate (βd ≈ β0), the no-call rate does not provide complete information for the
terminal branches since GATK, prudently, will most often not designate a site as variable
if there is just one sequencing read with the alternative allele in the entire sample. Thus,
to fully model the singleton type 2 error rate, βs, we must also consider the probability of
observing just one read, β1, since when this occurs at a singleton site, a false negative will
most often result. To do so, we computed the sequencing read depth distribution over all
ten million callable sites for each sample. Scaling this empirical probability mass
function by the number of singletons observed in the individual and censoring to discard
the zero-read and one-read bins, we observe that when coverage exceeds 4×, the expected
read-depth distribution among singletons closely mirrors the observed distribution (Fig.
S10). This suggests that there are few false negatives at sites for which at least two
sequencing reads are observed. Thus, βs ≈ β0 + β1.
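The depth-based check described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the input is assumed to be a flat list of per-site read depths for one sample, and the censored distribution is assumed to be renormalised so that the expected and observed singleton histograms share the same total (the text does not say how this step was done).

```python
from collections import Counter


def depth_pmf(depths):
    """Empirical probability mass function of read depth over callable sites."""
    counts = Counter(depths)
    n = len(depths)
    return {d: c / n for d, c in sorted(counts.items())}


def expected_singleton_depths(pmf, n_singletons, min_depth=2):
    """Expected read-depth histogram among observed singletons.

    Zero-read and one-read bins are censored, because a singleton seen on
    fewer than two reads is usually not called. Renormalising the remaining
    mass is an assumption of this sketch.
    """
    kept = {d: p for d, p in pmf.items() if d >= min_depth}
    total = sum(kept.values())
    return {d: n_singletons * p / total for d, p in kept.items()}


def singleton_false_negative_rate(beta_0, pmf):
    """beta_s ~ beta_0 + beta_1, where beta_1 is the probability of exactly one read."""
    return beta_0 + pmf.get(1, 0.0)


# Toy depths for eight sites (illustrative only, not data from this study).
pmf = depth_pmf([0, 1, 2, 2, 4, 4, 4, 4])
print(expected_singleton_depths(pmf, n_singletons=30))
print(singleton_false_negative_rate(0.068, pmf))
```

In practice the expected histogram would be compared bin by bin with the observed depths at called singleton sites, which is the comparison summarized in Fig. S10.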
When a branch with false negative rate β has true length L and observed length Y, the
number of unobserved variants, X, is given by:

$$X = L - Y = \frac{Y}{1 - \beta} - Y = \frac{\beta Y}{1 - \beta}.$$
On the HGDP00856 singleton branch, we have Y = 126 and, from the empirical read-
depth distribution, β1 = 2.0%. Thus, βs ≈ β0 + β1 = 6.8% + 2.0% = 8.8%, which gives X ≈
12.2 missing singletons. This is likely an overestimate because the no-call rate across all
variable sites, 2.2% (Table S1), is lower than the empirical rate within the subtree, 6.8%.
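The correction for unobserved variants can be checked numerically. The sketch below assumes that each true variant on a branch is missed independently with probability β, so that the observed length is Y = (1 − β)L and X = L − Y = βY / (1 − β); the function name is illustrative.

```python
def missing_variants(observed_length, beta):
    """Expected number of unobserved variants X on a branch.

    Assumes each true variant is missed independently with probability beta,
    so Y = (1 - beta) * L and X = L - Y = beta * Y / (1 - beta).
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return beta * observed_length / (1.0 - beta)


# HGDP00856 singleton branch: Y = 126, beta_s ~ beta_0 + beta_1 = 6.8% + 2.0%.
beta_s = 0.068 + 0.020
print(round(missing_variants(126, beta_s), 1))  # 12.2
```

The same function with Y = 120 and β = 0.9% gives the roughly 1.1 missing singletons quoted for HGDP00877.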
The branch shared by all hgQ-M3 lineages (branch 18 in Fig. S9) affords an opportunity to check the singleton false negative rate for HGDP00856 empirically, since this individual should carry each of these variants. We had correctly called 16 of 17 in our main analysis. This suggests a singleton false negative rate for this sample of 1/17 = 5.9%⁴, but the variance of this particular estimate is high since it is based on just 17 sites, so to be conservative, we use the value of 8.8% estimated above.
For HGDP00877, we have Y = 120 and β1 = 0.2%, which give βs ≈ 0.7% + 0.2% = 0.9% and X ≈ 1.1 missing singletons. This prediction cannot be tested empirically with these data because the lineage is an outgroup to the two hgQ-L54*(xM3) sequences from the 1000 Genomes Project. As discussed above, nine doubletons had previously been classified as HGDP00877 singletons, so accounting for type 2 errors reduces this branch length by 7.9 (9 − 1.1).

³ It is possible that one such SNP exists and is missing in all three hgQ-L54*(xM3) sequences, but this is a low-probability event.
⁴ The lone false negative occurred at hg19 position 22613361. Prior to imputation, we do make the correct call in the combined analysis, because one read was present and it carried the derived A allele.
Putting these two corrections together, we compute the average branch length since the MRCA of the two samples as 125 SNPs, which differs from the observed value of 123 by 1.6%. Thus, one might wish to scale our Y chromosome TMRCA estimates by a factor of 123 / 125 = 0.984. However, the effect of false negatives would be offset by false positives, should one or two exist, so we choose not to.
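The bookkeeping behind the 125-SNP figure can be reproduced as follows. This is a sketch that reuses the independent-miss assumption Y = (1 − β)L; the variable names are illustrative. Note that the text rounds the corrected mean to 125 before forming the ratio; the unrounded mean is about 125.1, which would give a factor of about 0.983 rather than 0.984.

```python
def missing_variants(observed_length, beta):
    """X = beta * Y / (1 - beta), assuming independent misses with rate beta."""
    return beta * observed_length / (1.0 - beta)


y_856, y_877 = 126, 120      # observed terminal-branch lengths (SNPs)
beta_856 = 0.068 + 0.020     # beta_0 + beta_1 for HGDP00856
beta_877 = 0.007 + 0.002     # beta_0 + beta_1 for HGDP00877
misassigned = 9              # hgQ doubletons miscounted as HGDP00877 singletons

# HGDP00856 gains its missing singletons; HGDP00877 loses the misassigned
# doubletons but gains its own (much smaller) number of missing singletons.
corrected_856 = y_856 + missing_variants(y_856, beta_856)
corrected_877 = y_877 - (misassigned - missing_variants(y_877, beta_877))

observed_mean = (y_856 + y_877) / 2            # 123.0
corrected_mean = (corrected_856 + corrected_877) / 2
scale = observed_mean / round(corrected_mean)  # ratio after rounding, as in the text

print(round(corrected_mean), round(scale, 3))
```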
False negatives are not an issue for mitochondria, where all sequences are complete.
Calibration Time
In light of the above, the largest potential source of bias is the calibration time: the dating
of the arrival of humans into the Americas and the approximation of synchronicity of this
arrival with phylogenetic divergences.
Timing of Expansion into the Americas
Archaeological dates for the time of first arrival in the Americas range from 14.3 to 16.5 ky. Goebel et al. (19) conclude that the most parsimonious estimate is that “humans colonized the Americas around 15 kya,” so we elect 15 ky as a reasonable figure for both the maternal and paternal loci. If the true divergence time of American lineages were 14.3 ky, one would need to scale down the TMRCA ranges we report by about 5%. Likewise, for 16.5 ky, an increase of 10% would be required. However, the specific number used has no effect on the relative TMRCA estimates for the two loci, provided the divergences of the two loci were contemporaneous. We consider the case of unequal split times in Fig. S13 (Materials and Methods, “Estimating the Ratio of mtDNA TMRCA to Y TMRCA” subsection).
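Under a strict molecular clock, the mutation rate is a calibration branch length divided by the calibration time, and a TMRCA is a tree depth divided by that rate, so every reported TMRCA scales linearly with the calibration time. The sketch below (hypothetical function name) computes the rescaling factors for the two ends of the archaeological range.

```python
BASE_CALIBRATION_KY = 15.0  # calibration time used in the text


def rescale_factor(alternative_ky, base_ky=BASE_CALIBRATION_KY):
    """Factor by which TMRCA estimates change if the calibration time changes.

    TMRCA = depth / rate and rate = calibration_length / calibration_time,
    so TMRCA is proportional to calibration_time.
    """
    return alternative_ky / base_ky


# Roughly -5% at 14.3 ky and +10% at 16.5 ky.
for t in (14.3, 16.5):
    f = rescale_factor(t)
    print(f"{t} ky: x{f:.3f} ({(f - 1) * 100:+.1f}%)")
```

Because both loci are multiplied by the same factor, their ratio does not depend on the calibration time, provided the splits were contemporaneous.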
Y Chromosome Calibration Point
With 108 sampled lineages, the point of rapid expansion within the Americas among mtDNA hgA2 lineages is clear. However, the corresponding point within Y hgQ is less so. Though we have argued that M3 most likely arose shortly after initial entry to the Americas, it remains possible that hgQ-M3 and hgQ-L54*(xM3) diverged within Siberia or Beringia. When we include lower-coverage 1000 Genomes hgQ lineages, we observe a star-like diversification among the Q-M3 derived lineages (Fig. S9, below branch 18). It is possible that some subset of the 17 M3-equivalent mutations accumulated prior to entry (within Beringia, for example, as has been proposed for mtDNA founding lineages (48)). However, 12 of the 13 sequenced individuals are from Mexico, and this sampling bias could obscure a more upstream initiation of the expansion. For example, it is possible that hgQ-M3 lineages within Greenland do not share all 17 of these mutations. Because just three sequences represent hgQ-L54*(xM3), the phylogenetic structure of this subhaplogroup remains largely unknown, but the root of the sampled hgQ-M3 lineages can be used to calculate a strict lower bound on the mutation rate, as entry to the Americas certainly happened no later than this point.
The 1000 Genomes lineages are unsuitable for calibration because of their lower sequencing coverage (average = 2.9×; Supplementary Text, “Branch Lengths in the Calibration Subtree” subsection), so we are left with a single lineage from our sample, HGDP00856, for this lower-bound calculation. Accounting for false negatives had little effect when two samples were used for calibration, as the growth of the hgQ-M3 branch was offset by a corresponding shrinkage of the hgQ-L54*(xM3) branch due to the hgQ doubletons that were unobserved in HGDP00856 and thereby misclassified as HGDP00877 singletons. However, it is important to correct for type 2 errors when considering this lineage alone. In the main analysis, the observed length of the M3 lineage was 126 mutations: 16 observed M3-equivalent SNPs and 110 post-M3 SNPs. Using a singleton false negative rate of 8.8%, this translates to approximately 10.6 (0.088 × 110 / (1 − 0.088)) unobserved post-M3 SNPs, which gives a calibration length of 120.6 SNPs. This differs from the calibration used in the main text by 1.9%.
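The single-lineage calibration arithmetic can be reproduced as below. The sketch assumes, consistent with the numbers in the text, that the calibration length counts only post-M3 SNPs (the M3-equivalent SNPs lie above the root of the sampled hgQ-M3 lineages) and that missing singletons follow the same independent-miss correction, βY / (1 − β). The function name is illustrative.

```python
def calibration_length(post_m3_observed, beta_s):
    """Observed post-M3 SNPs plus the expected number missed at rate beta_s."""
    missing = beta_s * post_m3_observed / (1.0 - beta_s)
    return post_m3_observed + missing


length = calibration_length(110, 0.088)
print(f"{length:.1f}")  # 120.6
```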
Existence of Rare Yet More Basal Lineages
We emphasize that the estimates we derive refer to coalescence times within our sample. For the mitochondrial genome, we have likely sampled the most divergent branches in the tree (34). For the Y chromosome, by contrast, our TMRCA estimate reaches back only as far as the A1b clade. Inclusion of samples from hgA1a or the newly discovered hgA0 (5) or hgA00 (49) would push the date further back. However, these haplogroups are very rare, and it is difficult to assess whether correspondingly divergent but singular mitochondrial genomes may also await discovery.
Effective Population Size
The Ne differences we observe between males and females are most likely due to a
greater variance in reproductive success among males, a phenomenon influenced by
cultural and demographic factors, such as the practice of polygyny (50). Both purifying
and positive selection could also act to reduce the Ne along the linked regions of the Y
chromosome. However, both forms of selection may have also acted on the mitochondrial
genome. Additional information would be necessary before one could invoke natural
selection as the primary cause of reduced male Ne, and the hypothesis is neither necessary
nor sufficient.
Additional Acknowledgements
This material is based upon work supported by the National Science Foundation Graduate
Research Fellowship under Grant No. DGE-1147470. Any opinions, findings, and
conclusions or recommendations expressed in this material are those of the author(s) and
do not necessarily reflect the views of the National Science Foundation.
Fig. S1. Map of populations.
We sampled Y chromosomes and mtDNAs from nine populations: Baka Pygmies from Gabon, Cambodians, Maya from
Mexico’s Yucatán Peninsula, Mbuti Pygmies from the Democratic Republic of Congo, Mozabite Berbers from Algeria, Nzebi from
Gabon, Pashtuns (Pathan) from Pakistan’s North-West Frontier Province, San from Namibia, and Yakut from Siberia.
Fig. S2. Sequencing read mapping on Xq21.
Total read depth and the depth of MQ0 reads are plotted for 24 HGDP females. Mean values in contiguous 5 kb windows are shown
along chrXq21. Dashed gray lines indicate the region that corresponds to the “X-transposed” segment of the Y chromosome.
[Plot: depth in HGDP females (y-axis, 0–300) versus chrX position (x-axis, 85–96 Mb); series: Filtered Depth and MQ0 Depth; the homologue of the X-transposed region is marked.]
Minor Planet Evidence for Water in the Rocky Debris of a Disrupted Extrasolar...Minor Planet Evidence for Water in the Rocky Debris of a Disrupted Extrasolar...
Minor Planet Evidence for Water in the Rocky Debris of a Disrupted Extrasolar...
 

Recently uploaded

Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Osopher
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...Nguyen Thanh Tu Collection
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptxmary850239
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxAnupam32727
 
6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroomSamsung Business USA
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 

Recently uploaded (20)

Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
 
Chi-Square Test Non Parametric Test Categorical Variable
Chi-Square Test Non Parametric Test Categorical VariableChi-Square Test Non Parametric Test Categorical Variable
Chi-Square Test Non Parametric Test Categorical Variable
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom
 
Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 

Sequencing Y chromosomes resolves discrepancy in time to common ancestor of males versus females

  • 1. G. David Poznik et al., "Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females," Science 341, 562 (2013). DOI: 10.1126/science.1237619. This copy is for your personal, non-commercial use only.

Updated information and services, including high-resolution figures, can be found in the online version of this article at http://www.sciencemag.org/content/341/6145/562.full.html. Supporting Online Material can be found at http://www.sciencemag.org/content/suppl/2013/08/01/341.6145.562.DC1.html. A list of selected additional articles on the Science Web sites related to this article can be found at http://www.sciencemag.org/content/341/6145/562.full.html#related. This article cites 46 articles, 22 of which can be accessed free: http://www.sciencemag.org/content/341/6145/562.full.html#ref-list-1. This article has been cited by 1 article hosted by HighWire Press; see http://www.sciencemag.org/content/341/6145/562.full.html#related-urls. (This information is current as of August 7, 2013.)

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright 2013 by the American Association for the Advancement of Science; all rights reserved. The title Science is a registered trademark of AAAS. Downloaded from www.sciencemag.org on August 7, 2013.
  • 2. Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females. G. David Poznik,1,2 Brenna M. Henn,3,4 Muh-Ching Yee,3 Elzbieta Sliwerska,5 Ghia M. Euskirchen,3 Alice A. Lin,6 Michael Snyder,3 Lluis Quintana-Murci,7,8 Jeffrey M. Kidd,3,5 Peter A. Underhill,3 Carlos D. Bustamante3*

The Y chromosome and the mitochondrial genome have been used to estimate when the common patrilineal and matrilineal ancestors of humans lived. We sequenced the genomes of 69 males from nine populations, including two in which we find basal branches of the Y-chromosome tree. We identify ancient phylogenetic structure within African haplogroups and resolve a long-standing ambiguity deep within the tree. Applying equivalent methodologies to the Y chromosome and the mitochondrial genome, we estimate the time to the most recent common ancestor (TMRCA) of the Y chromosome to be 120 to 156 thousand years and the mitochondrial genome TMRCA to be 99 to 148 thousand years. Our findings suggest that, contrary to previous claims, male lineages do not coalesce significantly more recently than female lineages.

The Y chromosome contains the longest stretch of nonrecombining DNA in the human genome and is therefore a powerful tool with which to study human history. Estimates of the time to the most recent common ancestor (TMRCA) of the Y chromosome have differed by a factor of about 2 from TMRCA estimates for the mitochondrial genome. Y-chromosome coalescence time has been estimated in the range of 50 to 115 thousand years (ky) (1–3), although larger values have been reported (4, 5), whereas estimates for mitochondrial DNA (mtDNA) range from 150 to 240 ky (3, 6, 7). However, the quality and quantity of data available for these two uniparental loci have differed substantially. Whereas the complete mitochondrial genome has been resequenced thousands of times (6, 8), fully sequenced diverse Y chromosomes have only recently become available.
Previous estimates of the Y-chromosome TMRCA relied on short resequenced segments, rapidly mutating microsatellites, or single-nucleotide polymorphisms (SNPs) ascertained in a small panel of individuals and then genotyped in a global panel. These approaches likely underestimate genetic diversity and, consequently, TMRCA (9).

We sequenced the complete Y chromosomes of 69 males from seven globally diverse populations of the Human Genome Diversity Panel (HGDP) and two additional African populations: San (Bushmen) from Namibia, Mbuti Pygmies from the Democratic Republic of Congo, Baka Pygmies and Nzebi from Gabon, Mozabite Berbers from Algeria, Pashtuns (Pathan) from Pakistan, Cambodians, Yakut from Siberia, and Mayans from Mexico (fig. S1). Individuals were selected without regard to their Y-chromosome haplogroups. The Y-chromosome reference sequence is 59.36 Mb, but this includes a 30-Mb stretch of constitutive heterochromatin on the q arm, a 3-Mb centromere, 2.65-Mb and 330-kb telomeric pseudoautosomal regions (PAR) that recombine with the X chromosome, and eight smaller gaps. We mapped reads to the remaining 22.98 Mb of assembled reference sequence, which consists of three sequence classes defined by their complexity and degree of homology to the X chromosome (10): X-degenerate, X-transposed, and ampliconic. Both the high degree of self-identity within the ampliconic tracts and the X-chromosome homology of the X-transposed region render portions of the Y chromosome ill suited for short-read sequencing. To address this, we constructed filters that reduced the data to 9.99 million sites (11)

1 Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, CA, USA. 2 Department of Statistics, Stanford University, Stanford, CA, USA. 3 Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA. 4 Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA.
5 Department of Human Genetics and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. 6 Department of Psychiatry, Stanford University, Stanford, CA, USA. 7 Institut Pasteur, Unit of Human Evolutionary Genetics, 75015 Paris, France. 8 Centre National de la Recherche Scientifique, URA3012, 75015 Paris, France.
*Corresponding author. E-mail: cdbustam@stanford.edu

[Fig. 1 plot: exponentially weighted moving averages (EWMA) of filtered read depth and of the MQ0/unfiltered-depth ratio versus position (Mb) along the 59.36-Mb Y chromosome, with depth-filter, MQ0-ratio-filter, exclusion-mask, and inclusion-mask tracks, compatible and incompatible SNV sites, and sequence classes (X-degenerate, X-transposed, ampliconic, heterochromatic, pseudoautosomal, other).]

Fig. 1. Callability mask for the Y chromosome. Exponentially weighted moving averages of read depth (blue line) and the proportion of reads mapping ambiguously (MQ0 ratio; violet line) versus physical position. Regions with values outside the envelopes defined by the dashed lines (depth) or dotted lines (MQ0) were flagged (blue and violet boxes) and merged for exclusion (gray boxes). The complement (black boxes) defines the regions within which reliable genotype calls can be made. Below, a scatter plot indicates the positions of all observed SNVs. Those incompatible with the inferred phylogenetic tree (red) are uniformly distributed. The X-degenerate regions yield quality sequence data, ampliconic sequences tend to fail both filters, and mapping quality is poor in the X-transposed region.

2 AUGUST 2013 VOL 341 SCIENCE www.sciencemag.org 562 REPORTS
  • 3. (Fig. 1 and fig. S2). We then implemented a haploid model expectation-maximization algorithm to call genotypes (11).

We identified 11,640 single-nucleotide variants (SNVs) (fig. S3). A total of 2293 (19.7%) are present in dbSNP (v135), and we assigned haplogroups on the basis of the 390 (3.4%) present in the International Society of Genetic Genealogy (ISOGG) database (12) (fig. S4). At SNVs, median haploid coverage was 3.1× (interquartile range 2.6 to 3.8×) (table S1 and fig. S5), and sequence validation suggests a genotype calling error rate on the order of 0.1% (11).

Because mutations accumulate over time along a single lengthy haplotype (13), the male-specific region of the Y chromosome provides power for phylogenetic inference. We constructed a maximum likelihood tree from 11,640 SNVs using the Tamura-Nei nucleotide substitution model (Fig. 2) and, in agreement with (14), observe strong bootstrap support (500 replicates) for the major haplogroup branching points. The tree both recapitulates and adds resolution to the previously inferred Y-chromosome phylogeny (fig. S6), and it characterizes branch lengths free of ascertainment bias. We identify extraordinary depth within Africa, including lineages sampled from the San hunter-gatherers that coalesce just short of the root of the entire tree. This stands in contrast to a tree from autosomal SNP genotypes (15), wherein African branches were considerably shorter than others; genotyping arrays primarily rely on SNPs ascertained in European populations and therefore undersample diversity within Africa. Two regions of reduced branch length in our tree correspond to rapid expansions: the out-of-Africa event (downstream of F-M89) and the agriculture-catalyzed Bantu expansions (downstream of E-M2). Among the three hunter-gatherer populations, we find a relatively high number of B2 lineages.
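The haploid expectation-maximization genotype caller is only named in this report; its details are in the supplementary materials. As a hedged illustration of the general idea only, the sketch below assumes each male carries exactly one allele at a biallelic site, models sequencing error with a single error rate, and alternates between posterior genotype probabilities (E-step) and the population allele frequency (M-step). The function name `haploid_em`, the 0.01 error rate, and the 0.5 call threshold are assumptions for illustration, not the authors' settings. At low coverage such as the ~3× median above, the shared frequency matters most: a sample with no reads falls back to it entirely.

```python
def haploid_em(ref_counts, alt_counts, err=0.01, max_iter=100, tol=1e-10):
    """Illustrative EM for one biallelic site in haploid samples (hypothetical helper).

    Each sample carries one allele (ref or alt); each read reports that allele
    with probability 1 - err and the other allele with probability err.
    Returns (alt-allele frequency, per-sample posterior probability of alt).
    Works in linear space for clarity; a real caller would use log-likelihoods
    to avoid underflow at high read depth.
    """
    # Read likelihoods under each haploid genotype; these do not change across iterations.
    lik_ref = [(1 - err) ** r * err ** a for r, a in zip(ref_counts, alt_counts)]
    lik_alt = [err ** r * (1 - err) ** a for r, a in zip(ref_counts, alt_counts)]

    def posteriors(f):
        # E-step: P(sample carries alt | its reads, current frequency f)
        return [f * la / (f * la + (1 - f) * lr) for lr, la in zip(lik_ref, lik_alt)]

    f = 0.5  # uninformative starting frequency
    for _ in range(max_iter):
        new_f = sum(posteriors(f)) / len(lik_ref)  # M-step: mean posterior
        converged = abs(new_f - f) < tol
        f = new_f
        if converged:
            break
    return f, posteriors(f)


if __name__ == "__main__":
    freq, post = haploid_em(ref_counts=[5, 0, 4, 0], alt_counts=[0, 3, 0, 6])
    calls = ["alt" if p > 0.5 else "ref" for p in post]
    print(round(freq, 3), calls)
```

Pooling across samples through the frequency is the design choice that makes EM attractive here; a caller that treated each male independently would have no principled call for sites where he has zero reads.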
Within haplogroup B2, six Baka B-M192 individuals form a distinct clade that does not correspond to extant definitions (11) (fig. S7). We estimate this previously uncharacterized structure to have arisen ~35 thousand years ago (kya).

We resolve the polytomy of the Y macrohaplogroup F (16) by determining the branching order of haplogroups G, H, and IJK (Fig. 2 and fig. S6). We identified a single variant (rs73614810, a C→T transition dubbed "M578") for which haplogroup G retains the ancestral allele, whereas its brother clades (H and IJK) share the derived allele. Genotyping M578 in a diverse panel confirmed the finding (table S2). We thereby infer more recent common ancestry between hgH and hgIJK than between either and hgG. M578 defines an early diversification episode of the Y phylogeny in Eurasia (11).

[Fig. 2 tree: scale 0 to 1200 derived SNVs; leaves labeled by most derived mutation and population (San, Baka, Mbuti, Nzebi, Mozabite, Pashtun, Cambodian, Yakut, Maya); internal branches carry ISOGG labels including BT-M42, CT-M168, F-M89, K-M9, NO-M214, P-M45, E-M2/M180, and B-M192; legend groups haplogroups A, B, E, and F–T (non-African); inset label HIJK-M578.]

Fig. 2. Y-chromosome phylogeny inferred from genomic sequencing. This tree recapitulates the previously known topology of the Y-chromosome phylogeny; however, branch lengths are now free of ascertainment bias. Branches are drawn proportional to the number of derived SNVs. Internal branches are labeled with defining ISOGG variants inferred to have arisen on the branch. Leaves are colored by major haplogroup cluster and labeled with the most derived mutation observed and the population from which the individual was drawn. Previously uncharacterized structure within African hgB2 is indicated in orange. (Inset) Resolution of a polytomy was possible through the identification of a variant for which hgG retains the ancestral allele, whereas hgH and hgIJK share the derived allele.

  • 4. To account for missing genotypes, we assigned each SNV to the root of the smallest subtree containing all carriers of one allele or the other and inferred that the allele specific to the subtree was derived (fig. S8). We used the chimpanzee Y-chromosome sequence to polarize 398 variants assigned to the deepest split, a task complicated by substantial structural divergence (11, 17).

We estimated the coalescence time of all Y chromosomes using both a molecular clock–based frequentist estimator and an empirical Bayes approach that uses a prior distribution of TMRCA from coalescent theory and conducts Markov chain simulation to estimate the likelihood of parameters given a set of DNA sequences (GENETREE) (11, 18) (Table 1). To directly compare the TMRCA of the Y chromosome to that of the mtDNA, we estimated their respective mutation rates by calibrating phylogeographic patterns from the initial peopling of the Americas, a recent human event with high-confidence archaeological dating.

Archaeological evidence indicates that humans first colonized the Americas ~15 kya via a rapid coastal migration that reached Monte Verde II in southern Chile by 14.6 kya (19). The two Native American Mayans represent Y-chromosome hgQ lineages, Q-M3 and Q-L54*(xM3), that likely diverged at about the same time as the initial peopling of the continents. Q is defined by the M242 mutation that arose in Asia. A descendant haplogroup, Q-L54, emerged in Siberia and is ancestral to Q-M3. Because the M3 mutation appears to be specific to the Americas (20), it likely occurred after the initial entry, and the prevalence of M3 in South America suggests that it emerged before the southward migratory wave. Consequently, the divergence between these two lineages provides an appropriate calibration point for the Y mutation rate.
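The calibration logic above reduces to molecular-clock arithmetic. The sketch below is a back-of-envelope illustration, not the authors' estimator (their full method is in the supplementary materials). It assumes the rate equals the variants accumulated on the two Q lineages divided by the lineage count, the number of callable sites, and the calibration time, and it uses a normal approximation to a Poisson count for the interval. Fed the 120 and 126 variants reported in the next paragraph, the 9.99 million filtered sites, and a 15-ky calibration, it lands close to the reported 0.82 × 10⁻⁹/bp/year (95% CI 0.72 to 0.92 × 10⁻⁹); treat that agreement as a consistency check, not confirmation of the method. The function names `calibrated_rate` and `tmrca_years` are hypothetical.

```python
import math


def calibrated_rate(branch_counts, callable_sites, calib_years, z=1.96):
    """Mutation rate per bp per year from lineages that split at a dated event.

    branch_counts  -- derived variants on each lineage since the split
    callable_sites -- number of sites where genotypes can be called reliably
    calib_years    -- assumed age of the split, in years
    Returns (rate, lower, upper). The interval treats the summed count as
    Poisson and uses a normal approximation (an assumption for illustration).
    """
    total = sum(branch_counts)
    # Each lineage accumulates mutations over callable_sites * calib_years bp-years.
    exposure = len(branch_counts) * callable_sites * calib_years
    rate = total / exposure
    half_width = z * math.sqrt(total)  # Poisson sd of the total is sqrt(total)
    return rate, (total - half_width) / exposure, (total + half_width) / exposure


def tmrca_years(mean_root_to_tip, callable_sites, rate):
    """Molecular-clock TMRCA: mean derived SNVs from root to tip / (sites * rate)."""
    return mean_root_to_tip / (callable_sites * rate)


# Inputs from the text: 120 and 126 variants on the two Q lineages,
# 9.99 million callable sites, and a ~15-ky calibration for entry to the Americas.
rate, lo, hi = calibrated_rate([120, 126], 9.99e6, 15_000)
print(f"rate = {rate:.2e} per bp per year (approx. 95% CI {lo:.2e} to {hi:.2e})")
```

The same rate turns a mean root-to-tip SNV count into a TMRCA through `tmrca_years`; the GENETREE estimates in Table 1 instead come from coalescent simulation and are not reproduced by this arithmetic.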
The large number of variants that have accumulated since divergence, 120 and 126, contrasts with the pedigree-based estimate of the Y-chromosome mutation rate, which is based on just 4 mutations (21). Using entry to the Americas as a calibration point, we estimate a mutation rate of 0.82 × 10−9 per base pair (bp) per year [95% confidence interval (CI): 0.72 × 10−9 to 0.92 × 10−9 /bp/year] (table S3). False negatives have minimal effect on this estimate due to the low probability, at 5.7x and 8.5x coverage, of observing fewer than two reads at a site (observed proportions: 3.1% and 0.6%) and due to the fact that the number of unobserved singletons possessed by one individual is offset by a similar number of Q doubletons unobserved in the same individual and thereby misclassified as singletons possessed by the other (11) (figs. S9 and S10). This calibra- tion approach assumes approximate coincidence between the expansion throughout the Americas and the divergence of Q-M3 and Q-L54*(xM3), but we consider deviation from this assumption and identify a strict lower bound on the point of divergence using sequences from the 1000 Ge- nomes Project (11). As a comparison point, we consider the out-of-Africa expansion of modern humans, which dates to approximately 50 kya (22) and yields a similar mutation rate of 0.79 × 10−9 /bp/year. We constructed an analogous pipeline for high coverage (>250x) mtDNA sequences from the 69 male samples and an additional 24 females from the seven HGDP populations (11) (fig. S11). As in the Y-chromosome analysis, we calibrated the mtDNA mutation rate using divergence with- in the Americas. We selected the pan-American hgA2, one of several initial founding haplogroups among Native Americans. The star-shaped phy- logeny of hgA2 subclades suggests that its di- vergence was coincident with the rapid dispersal upon the initial colonization of the continents (23). Calibration on 108 previously analyzed hgA2 sequences (11) (fig. 
S12) yields a point estimate equivalent to that from our seven Mayan mtDNAs, but within a narrower confidence interval. From this within-human calibration, we estimate a mu- tation rate of 2.3 × 10−8 /bp/year (95% CI: 2.0 × 10−8 to 2.5 × 10−8 /bp/year), higher than that from human-chimpanzee divergence but similar to other estimates using within-human calibration points (24, 25). The global TMRCA estimate for any locus con- stitutes an upper bound for the time of human population divergence under models without gene flow. We estimate the Y-chromosome TMRCA to be 138 ky (120 to 156 ky) and the mtDNA TMRCA to be 124 ky (99 to 148 ky) (Table 1) (11). Our mtDNA estimate is more recent than many previous studies, the majority of which used mu- tation rates extrapolated from between-species divergence. However, mtDNA mutation rates are subject to a time-dependent decline, with pedigree- based estimates on the faster end of the spectrum and species-based estimates on the slower. Be- cause of this time dependency and the need to calibrate the Yand mtDNA in a comparable man- ner, it is more appropriate here to use within- human clade estimates of the mutation rate. Rather than assume the mutation rate to be a known constant, we explicitly account for the uncertainty in its estimation by modeling each TMRCA as the ratio of two random variables. We estimate the ratio of the mtDNA TMRCA to that of the Y chromosome to be 0.90 (95% CI: 0.68 to 1.11) (fig. S13). If, as argued above, the divergence of the Y-chromosome Q lineages occurred at approximately the same time as that of the mtDNA A2 lineages, then the TMRCA ratio is invariant to the specific calibration time used. Regardless, the conclusion of parity is robust to possible discrepancy between the di- vergence times within the Americas (11). Using comparable calibration approaches, the Y and Table 1. TMRCA and Ne estimates for the Y chromosome and mtDNA. Pop., population. Method Y chromosome mtDNA Pop. n TMRCA* Ne Pop. 
n TMRCA* Ne Molecular clock All 69 139 (120–156) 4500† All 93 124 (99–148) 9500† GENETREE‡ San 6 128 (112–146) 3800 Nzebi 18 105 (91–119) 11,500 Baka 11 122 (106–137) 1800 Mbuti 6 121 (100–143) 3700 *Employs mutation rate estimated from within-human calibration point. Times measured in ky. †Uses Watterson’s estimator, %qw. ‡Each coalescent analysis restricted to a single population spanning the ancestral root (11). Fig. 3. Similarity of TMRCA does not imply equivalent Ne of males and females. The TMRCA for a given locus is drawn from a predata (i.e., prior) distribution that is a func- tion of Ne, generation time, sample size, and demo- graphic history. Consider the distribution of possible TMRCAs for a set of 100 uniparental chromosomes. Although the Mbuti mtDNA Ne is twice as large as that of the Baka Y chromosome, the corresponding predata TMRCA distributions overlap considerably. 0.0000.0020.0040.0060.0080.010 Time (ky) ProbabilityDensity 0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800 2 AUGUST 2013 VOL 341 SCIENCE www.sciencemag.org564 REPORTS onAugust7,2013www.sciencemag.orgDownloadedfrom
mtDNA coalescence times are not significantly different. This conclusion would hold whether or not an alternative approach would yield more definitive TMRCA estimates.
Our observation that the TMRCA of the Y chromosome is similar to that of the mtDNA does not imply that the effective population sizes (Ne) of males and females are similar. In fact, we observe a larger Ne in females than in males (Table 1). Although, due to its larger Ne, the distribution from which the mitochondrial TMRCA has been drawn is right-shifted with respect to that of the Y-chromosome TMRCA, the two distributions have large variances and overlap (Fig. 3).
Dogma has held that the common ancestor of human patrilineal lineages, popularly referred to as the Y-chromosome "Adam," lived considerably more recently than the common ancestor of female lineages, the so-called mitochondrial "Eve." However, we conclude that the mitochondrial coalescence time is not substantially greater than that of the Y chromosome. Indeed, due to our moderate-coverage sequencing and the existence of additional rare divergent haplogroups, our analysis may yet underestimate the true Y-chromosome TMRCA.
References and Notes
1. J. K. Pritchard, M. T. Seielstad, A. Perez-Lezaun, M. W. Feldman, Mol. Biol. Evol. 16, 1791–1798 (1999).
2. R. Thomson, J. K. Pritchard, P. Shen, P. J. Oefner, M. W. Feldman, Proc. Natl. Acad. Sci. U.S.A. 97, 7360–7365 (2000).
3. H. Tang, D. O. Siegmund, P. Shen, P. J. Oefner, M. W. Feldman, Genetics 161, 447–459 (2002).
4. M. F. Hammer, Nature 378, 376–378 (1995).
5. F. Cruciani et al., Am. J. Hum. Genet. 88, 814–818 (2011).
6. M. Ingman, H. Kaessmann, S. Pääbo, U. Gyllensten, Nature 408, 708–713 (2000).
7. R. L. Cann, M. Stoneking, A. C. Wilson, Nature 325, 31–36 (1987).
8. P. A. Underhill, T. Kivisild, Annu. Rev. Genet. 41, 539–564 (2007).
9. M. A. Jobling, C. Tyler-Smith, Nat. Rev. Genet. 4, 598–612 (2003).
10. H. Skaletsky et al., Nature 423, 825–837 (2003).
11. Materials and methods are available as supplementary materials on Science Online.
12. ISOGG, International Society of Genetic Genealogy (2013); available at www.isogg.org/.
13. P. A. Underhill et al., Ann. Hum. Genet. 65, 43–62 (2001).
14. W. Wei et al., Genome Res. 23, 388–395 (2013).
15. J. Z. Li et al., Science 319, 1100–1104 (2008).
16. T. M. Karafet et al., Genome Res. 18, 830–838 (2008).
17. J. F. Hughes et al., Nature 463, 536–539 (2010).
18. R. C. Griffiths, S. Tavaré, Philos. Trans. R. Soc. London B Biol. Sci. 344, 403–410 (1994).
19. T. Goebel, M. R. Waters, D. H. O'Rourke, Science 319, 1497–1502 (2008).
20. M. C. Dulik et al., Am. J. Hum. Genet. 90, 229–246 (2012).
21. Y. Xue et al.; Asan, Curr. Biol. 19, 1453–1457 (2009).
22. R. G. Klein, Evol. Anthropol. 17, 267–281 (2008).
23. S. Kumar et al., BMC Evol. Biol. 11, 293 (2011).
24. S. Y. W. Ho, M. J. Phillips, A. Cooper, A. J. Drummond, Mol. Biol. Evol. 22, 1561–1568 (2005).
25. B. M. Henn, C. R. Gignoux, M. W. Feldman, J. L. Mountain, Mol. Biol. Evol. 26, 217–230 (2009).
Acknowledgments: We thank O. Cornejo, S. Gravel, D. Siegmund, and E. Tsang for helpful discussions; M. Sikora and H. Costa for mapping reads from Gabonese samples; and H. Cann for assistance with HGDP samples. This work was supported by National Library of Medicine training grant LM-07033 and NSF graduate research fellowship DGE-1147470 (G.D.P.); NIH grant 3R01HG003229 (B.M.H. and C.D.B.); NIH grant DP5OD009154 (J.M.K. and E.S.); and Institut Pasteur, a CNRS Maladies Infectieuses Émergentes Grant, and a Foundation Simone et Cino del Duca Research Grant (L.Q.M.). P.A.U. consulted for, P.A.U. and B.M.H. have stock in, and C.D.B. is on the advisory board of a project at 23andMe. C.D.B. is on the scientific advisory boards of Personalis, Inc.; InVitae (formerly Locus Development, Inc.); and Ancestry.com. M.S.
is a scientific advisory member and founder of Personalis, a scientific advisory member for Genapsys Former, and a consultant for Illumina and Beckman Coulter Society for American Medical Pathology. B.M.H. formerly had a paid consulting relationship with Ancestry.com. Variants have been deposited to dbSNP (ss825679106–825690384). Individual-level genetic data are available, through a data access agreement to respect the privacy of the participants for transfer of genetic data, by contacting C.D.B.
Supplementary Materials
www.sciencemag.org/cgi/content/full/341/6145/562/DC1
Materials and Methods
Supplementary Text
Figs. S1 to S13
Tables S1 to S3
Data File S1
References (26–51)
11 March 2013; accepted 25 June 2013
10.1126/science.1237619

Low-Pass DNA Sequencing of 1200 Sardinians Reconstructs European Y-Chromosome Phylogeny
Paolo Francalacci,1* Laura Morelli,1† Andrea Angius,2,3 Riccardo Berutti,3,4 Frederic Reinier,3 Rossano Atzeni,3 Rosella Pilu,2 Fabio Busonero,2,5 Andrea Maschio,2,5 Ilenia Zara,3 Daria Sanna,1 Antonella Useli,1 Maria Francesca Urru,3 Marco Marcelli,3 Roberto Cusano,3 Manuela Oppo,3 Magdalena Zoledziewska,2,4 Maristella Pitzalis,2,4 Francesca Deidda,2,4 Eleonora Porcu,2,4,5 Fausto Poddie,4 Hyun Min Kang,5 Robert Lyons,6 Brendan Tarrier,6 Jennifer Bragg Gresham,6 Bingshan Li,7 Sergio Tofanelli,8 Santos Alonso,9 Mariano Dei,2 Sandra Lai,2 Antonella Mulas,2 Michael B. Whalen,2 Sergio Uzzau,4,10 Chris Jones,3 David Schlessinger,11 Gonçalo R. Abecasis,5 Serena Sanna,2 Carlo Sidore,2,4,5 Francesco Cucca2,4*
Genetic variation within the male-specific portion of the Y chromosome (MSY) can clarify the origins of contemporary populations, but previous studies were hampered by partial genetic information. Population sequencing of 1204 Sardinian males identified 11,763 MSY single-nucleotide polymorphisms, 6751 of which have not previously been observed.
We constructed a MSY phylogenetic tree containing all main haplogroups found in Europe, along with many Sardinian-specific lineage clusters within each haplogroup. The tree was calibrated with archaeological data from the initial expansion of the Sardinian population ~7700 years ago. The ages of nodes highlight different genetic strata in Sardinia and reveal the presumptive timing of coalescence with other human populations. We calculate a putative age for coalescence of ~180,000 to 200,000 years ago, which is consistent with previous mitochondrial DNA–based estimates.
New sequencing technologies have provided genomic data sets that can reconstruct past events in human evolution more accurately (1). Sequencing data from the male-specific portion of the Y chromosome (MSY) (2), because of its lack of recombination and low mutation, reversion, and recurrence rates, can be particularly informative for these evolutionary analyses (3, 4). Recently, high-coverage Y chromosome sequencing data from 36 males from different worldwide populations (5) assessed 6662 phylogenetically informative variants and estimated the timing of past events, including a putative coalescence time for modern humans of ~101,000 to 115,000 years ago. MSY sequencing data reported to date still represent a relatively small number of individuals from a few populations. Furthermore, dating estimates are also affected by the calibration of the
1 Dipartimento di Scienze della Natura e del Territorio, Università di Sassari, 07100 Sassari, Italy. 2 Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy. 3 Center for Advanced Studies, Research and Development in Sardinia (CRS4), Pula, Italy. 4 Dipartimento di Scienze Biomediche, Università di Sassari, 07100 Sassari, Italy. 5 Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA. 6 DNA Sequencing Core, University of Michigan, Ann Arbor, MI 48109, USA.
7 Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA. 8 Dipartimento di Biologia, Università di Pisa, 56126 Pisa, Italy. 9 Departamento de Genética, Antropología Física y Fisiología Animal, Universidad del País Vasco/Euskal Herriko Unibertsitatea, 48080 Bilbao, Spain. 10 Porto Conte Ricerche, Località Tramariglio, Alghero, 07041 Sassari, Italy. 11 Laboratory of Genetics, National Institute on Aging, Baltimore, MD 21224, USA.
*Corresponding author. E-mail: pfrancalacci@uniss.it (P.F.); fcucca@uniss.it (F.C.)
†Laura Morelli prematurely passed away on 20 February 2013. This work is dedicated to her memory.
www.sciencemag.org/cgi/content/341/6145/562/DC1
Supplementary Materials for
Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females
G. David Poznik, Brenna M. Henn, Muh-Ching Yee, Elzbieta Sliwerska, Ghia M. Euskirchen, Alice A. Lin, Michael Snyder, Lluis Quintana-Murci, Jeffrey M. Kidd, Peter A. Underhill, Carlos D. Bustamante*
*Corresponding author. E-mail: cdbustam@stanford.edu
Published 2 August 2013, Science 341, 562 (2013)
DOI: 10.1126/science.1237619
This PDF file includes:
Materials and Methods
Supplementary Text
Figs. S1 to S13
Tables S1 to S3
References
Other Supplementary Material for this manuscript includes the following:
(available at www.sciencemag.org/cgi/content/full/341/6245/562/DC1)
Data File S1. Sample, phylogeny, and variant data (zipped archive).
Data File S2. Y chromosome genotype calls. To protect participant privacy, this zipped archive is available through a data access agreement (DAA) for transfer of genetic data by contacting C.D.B.
Data File S3. Y chromosome mapped sequencing reads. This BAM file is also available via the DAA described above. Mapping, quality score recalibration, and indel realignment are described in Materials and Methods.
Table of Contents
Materials and Methods
  Sequencing
  Genotypes
  Validation
  Phylogenetic Inference
  mtDNA Analysis
  Frequentist Estimation of TMRCA
  Empirical Bayesian Estimation of TMRCA and Ne: GENETREE
  Predata Distribution of TMRCA
Supplementary Text
  Novel Y Chromosome Phylogenetic Structure
  Imputation
  Calibration and Mutation Rate Estimation
  Impact of Sequencing Error and Sequence Coverage on TMRCA Estimation
  Calibration Time
  Existence of Rare Yet More Basal Lineages
  Effective Population Size
  Additional Acknowledgements
Supplementary Figures
Fig. S1. Map of populations.
Fig. S2. Sequencing read mapping on Xq21.
Fig. S3. Quality control and genotype calling on the Y chromosome.
Fig. S4. Cross-tabulation of populations and Y haplogroups.
Fig. S5. Call rate and mean sequencing coverage on the Y chromosome.
Fig. S6. Y chromosome phylogenetic backbone.
Fig. S7. Novel structure in Y hgB2.
Fig. S8. Phylogeny-aware imputation.
Fig. S9. Y chromosome hgQ clade with Phase 1 1000 Genomes samples included.
Fig. S10. Sequencing coverage for Mayan HGDP00856 at singleton sites.
Fig. S11. mtDNA phylogeny.
Fig. S12. mtDNA calibration tree.
Fig. S13. Comparing the Y chromosome TMRCA to that of mtDNA.
Supplementary Tables
Table S1. Y chromosome summary of samples.
Table S2. M578 genotyping results.
Table S3. Mutation rate point estimates.
Supplementary Data
Data File S1. Sample, phylogeny, and variant data.
Data File S2. Y chromosome genotype calls.
Data File S3. Y chromosome mapped sequencing reads.
FTP Addresses and Accession Numbers for External Data
Y Chromosome hgQ Sequences from the 1000 Genomes Project
Complete mtDNA hgA2 Sequences: GenBank Accession Numbers
References and Notes
Materials and Methods
Sequencing
We prepared genomic libraries (26) from cell lines (HGDP) and blood (Gabonese), then sequenced the libraries on Illumina HiSeq 2000 machines at the Stanford Center for Genomics and Personalized Medicine. We used BWA (27) to map paired 101 bp reads to the GRCh37 human reference, removed PCR duplicates with Picard (28), and then used the Genome Analysis Tool Kit (GATK) (29, 30) to recalibrate quality scores, perform local realignment around candidate indels, and compute genotype likelihoods.
Genotypes
Callability Mask
To learn directly from the read data the boundaries of the regions within which short-read sequencing could yield reliable variant calls, we calculated average filtered read depth across all samples in contiguous 1 kb windows and computed an exponentially weighted moving average (EWMA) of these values (Fig. 1). Regions for which the EWMA deviated from a narrow envelope were identified as problematic. Those of depressed depth corresponded to ampliconic sequences, within which reads do not map uniquely and were thus filtered out. Regions of inflated depth corresponded to heterochromatin, where naïve application of standard genotype calling methods would give the impression of abundant heterozygosity due to the pileup of highly similar reads around the borders of unassembled regions. After constructing the depth-based filter, we repeated this procedure for the MQ0 ratio, the proportion of unfiltered reads with fully ambiguous mapping. Although the X-transposed region showed no deviation in the depth-based mask, it failed the MQ0 ratio based mask. In females we found depressed read depth in the homologous region of the X chromosome (Fig. S2); we hypothesize that in males, each of whom possesses one X and one Y, there is an equal exchange of mismapped reads between the two chromosomes. The depth and MQ0 masks were merged and smoothed, leaving 10.45 Mb of sequence for downstream quality control.
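The depth-based callability mask described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes per-window mean depths have already been computed, and the smoothing factor `alpha` and the envelope bounds `low`/`high` (fractions of the median smoothed depth) are hypothetical values, since the text only says the envelope was narrow. The same smoothing could be reused for per-window MQ0 ratios with different bounds.

```python
import statistics

def ewma(values, alpha):
    """Exponentially weighted moving average: s_k = alpha * x_k + (1 - alpha) * s_{k-1}."""
    smoothed, s = [], None
    for x in values:
        s = x if s is None else alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

def callability_mask(window_depths, alpha=0.3, low=0.8, high=1.2):
    """Return True for 1 kb windows whose smoothed depth stays inside the envelope.

    The envelope [low * median, high * median] is a hypothetical choice.
    Windows below it resemble ampliconic sequence (non-unique mapping);
    windows above it resemble heterochromatin borders (pileups of similar reads).
    """
    smoothed = ewma(window_depths, alpha)
    center = statistics.median(smoothed)
    return [low * center <= s <= high * center for s in smoothed]

# Toy per-window mean depths: a depressed block followed later by an inflated block.
depths = [10] * 20 + [2] * 10 + [10] * 20 + [40] * 10 + [10] * 20
mask = callability_mask(depths)
print(sum(mask), "of", len(mask), "windows pass")
```

Because the EWMA lags behind abrupt changes, windows just after a problematic block can also fall outside the envelope; a real pipeline would follow this with the merging and smoothing step the text mentions.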
Site-Level Quality Control
With the regional mask in hand, we defined a series of site-level quality control filters (Fig. S3A). Of the 22,974,737 mapped coordinates, 12,532,580 fell within the bounds of the regional exclusion mask. A further 129,411 were excluded due to an MQ0 ratio greater than or equal to 0.10, and 170,144 were excluded because more than 20 samples had missing genotypes, either due to an absence of sequencing reads or to a heterozygous maximum likelihood genotype (Fig. S3B). The remaining polymorphic sites had a median depth (across all samples) of 265, and we filtered out all sites whose depth lay more than three median absolute deviations from this value, thus excluding 12,425 with depth above 371 and 141,512 below 159 (Fig. S3C). Finally, we culled 547 sites with a heterozygous maximum likelihood genotype in more than seven samples (Fig. S3D). This left 9,988,118 callable sites. Of 432 ISOGG SNPs with observed variation in our data,
393 pass the regional and mapping quality filters, and of these, just one failed the missingness filter and a further two the depth filter.
Genotype Calling
To call genotypes, we implemented a haploid model EM algorithm that treated allele frequency as the latent variable and used the homozygous state genotype likelihoods calculated by GATK. Genotypes with a heterozygous maximum likelihood state were classified as missing because calls in such cases were found to be disproportionately incompatible with the inferred phylogeny.
Validation
The false positive rate is kept low primarily by the fact that GATK generally requires at least 2 reads of support to identify a site as variable. In addition, we exclude sites incompatible with the phylogeny. Though this filter discards some genuine homoplasic variants, the class is enriched for false positives, and we have chosen to err on the side of conservatism. We consider three means of validation.
Sanger Sequencing
We validated Y chromosome genotypes for the 29 male HGDP samples at 46 sites using a combination of targeted PCR and Sanger sequencing (3 sites), and exome capture followed by Illumina sequencing (43 sites). Validation failed to yield data for two genotypes, and we compared the remaining 1,245 genotypes to the main data set to find a concordance rate of 99.92%. Just one genotype was discordant (M150, hg19 position 21869519, in HGDP00462). The genotype had zero sequencing reads of support, and the individual had been imputed to carry the reference allele whereas the validation data indicated that this sample actually carries the non-reference allele. Only one other sample, the nearest neighbor to HGDP00462, also carried the non-reference allele, and this illustrates the fact that it is impossible to properly impute missing genotypes for sites otherwise identified as singletons (Supplementary Text, "Imputation" section).
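For the haploid EM genotype calling described under Genotype Calling, a minimal sketch is given below. It uses the textbook formulation in which the site's alternate-allele frequency is re-estimated in each M-step while each sample's hemizygous state is averaged over in the E-step; it may differ in detail from the authors' implementation. It assumes linear-scale likelihoods for the reference and alternate states (for example, converted from GATK's Phred-scaled values), and the function names, starting value, and tolerance are hypothetical. Samples whose two likelihoods are equal (no informative reads) are left missing rather than called.

```python
def haploid_em(lik_ref, lik_alt, tol=1e-12, max_iter=1000):
    """EM estimate of the alternate-allele frequency p at one biallelic site.

    lik_ref[i] / lik_alt[i]: likelihood of sample i's reads under the reference /
    alternate hemizygous genotype (linear scale). Returns p and, per sample,
    the posterior probability of carrying the alternate allele.
    """
    p = 0.5
    post = [p] * len(lik_ref)
    for _ in range(max_iter):
        # E-step: posterior probability of the alternate allele given current p.
        post = []
        for lr, la in zip(lik_ref, lik_alt):
            num = p * la
            den = num + (1.0 - p) * lr
            post.append(num / den if den > 0 else p)
        # M-step: the updated frequency is the mean posterior across samples.
        p_new = sum(post) / len(post)
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, post

def call_haploid(lik_ref, lik_alt):
    """Call 0 (ref) or 1 (alt); uninformative samples (equal likelihoods) stay None."""
    p, post = haploid_em(lik_ref, lik_alt)
    calls = [None if lr == la else int(q > 0.5)
             for lr, la, q in zip(lik_ref, lik_alt, post)]
    return p, calls

# Toy site: two clear reference carriers, one clear alternate carrier, one sample without reads.
p, calls = call_haploid([0.99, 0.99, 0.01, 1.0], [0.01, 0.01, 0.99, 1.0])
print(round(p, 3), calls)
```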
Minimally Diverged Samples
We also consider private variation among minimally diverged individuals to argue that sequencing errors are minimized in our study. Specifically, we observe a cluster of five Baka hgB2 samples with just a handful of singletons per lineage. This group approximates a replication set and thus gives tight upper bounds on the false positive variant rate.
Haplogroup Assignments
All HGDP haplogroup assignments were consistent with prior ISOGG designations.
Phylogenetic Inference
We used MEGA5 (31) to construct maximum likelihood phylogenetic trees.
mtDNA Analysis
mtDNA Pipeline
To call mitochondrial haplogroups, we converted sequences from the GRCh37 to the rCRS coordinate system and imported them into HaploGrep (32), which draws on the Phylotree database (33). We explicitly used data presented in Table 1 of Behar et al. (34) to polarize alleles for variants assigned to the most ancient split, that between hgL0 and the rest of the tree (Fig. S11). Whereas the mutation rate on the Y chromosome is sufficiently low that we could regard base substitutions as unique events and simply discard sites that were incompatible with the phylogeny, excluding sites would have been inappropriate for the mitochondrial genome, in which a much higher mutation rate has led to considerable homoplasy. To account for this, we split sites with multiple substitutions into pseudo-sites, each of which constitutes a unique event. We discarded a few mutational hotspot sites with evidence for more than four unique substitution events.
Calibration Based on mtDNA hgA2
Since there are far fewer segregating sites in the mitochondrial genome, and we only had seven hgA2 lineages, we used 108 publicly available hgA2 Native American sequences to calibrate. Kumar et al. (23) list 568 accession numbers for mitochondrial genomes, 134 of which belong to hgA2 and are of American descent. We downloaded the subset of 108 entries that included the full mtDNA sequence and, along with the GRCh37 reference sequence, conducted a multiple alignment using MUSCLE (35). We then called haplogroups, built a tree (Fig. S12), assigned variants to branches, and resolved homoplasies as described above.
Frequentist Estimation of TMRCA
The Molecular Clock
Under the infinite sites model, mutations accumulate in a Poisson process of rate µl, the locus-wide mutation rate. To estimate TMRCA, molecular clock approaches first estimate the mean number of derived mutations per lineage and then divide by an estimate of the mutation rate.
For both the Y chromosome and the mtDNA, we estimate TMRCA with:
$$\hat{T} = \frac{\bar{D}}{\hat{\mu}_{ly}}, \qquad \bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i,$$
where $\bar{D}$ is the sample average of $\{D_i\}$, the inferred number of mutations accumulated by each lineage since the global MRCA.
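The molecular clock estimator above, combined with the calibration rate $\hat{\mu}_{ly} = \bar{C}/t$ and the tree-aware variance derived in the TMRCA Confidence Intervals subsection, can be sketched in a few lines of Python. The helper names, the toy tree, its branch mutation counts, and the use of clade (A,B) as a calibration subclade are all invented for illustration. In the real analysis the calibration subclade is a small fraction of a much larger tree, which is what justifies treating $\bar{D}$ and $\bar{C}$ as uncorrelated; the toy example ignores that caveat.

```python
import math

def root_to_tip(parent, muts, tips):
    """D_i: derived mutations on the path from each tip up to the root.

    parent maps node -> parent (None at the root); muts maps node -> number of
    mutations on the branch directly above that node.
    """
    counts = []
    for tip in tips:
        total, node = 0, tip
        while parent[node] is not None:
            total += muts[node]
            node = parent[node]
        counts.append(total)
    return counts

def shared_branch_term(internal):
    """Sum over internal branches of b_s * (b_s - 1) * b_l."""
    return sum(bs * (bs - 1) * bl for bs, bl in internal)

def tmrca_with_ci(D, internal, C, internal_c, t, z=1.959963984540054):
    """T_hat = t * D_bar / C_bar with a delta-method 95% interval."""
    n, nc = len(D), len(C)
    d_bar, c_bar = sum(D) / n, sum(C) / nc
    var_d = (sum(D) + shared_branch_term(internal)) / n**2     # sigma^2_{D|T}
    var_c = (sum(C) + shared_branch_term(internal_c)) / nc**2  # sigma^2_C
    t_hat = t * d_bar / c_bar
    sd = (t / c_bar) * math.sqrt((d_bar / c_bar) ** 2 * var_c + var_d)
    return t_hat, sd, (t_hat - z * sd, t_hat + z * sd)

# Toy tree ((A,B),(C,D)); per-branch mutation counts are invented.
parent = {"root": None, "AB": "root", "CD": "root",
          "A": "AB", "B": "AB", "C": "CD", "D": "CD"}
muts = {"AB": 4, "CD": 6, "A": 6, "B": 6, "C": 6, "D": 6}
D = root_to_tip(parent, muts, ["A", "B", "C", "D"])  # [10, 10, 12, 12]
internal = [(2, 4), (2, 6)]                           # (b_s, b_l) for AB and CD
# Treat clade (A,B) as a calibration subclade with a hypothetical TMRCA of 15,000 years.
C, internal_c = [6, 6], []
t_hat, sd, ci = tmrca_with_ci(D, internal, C, internal_c, t=15_000)
print(round(t_hat), [round(x) for x in ci])
```

The shared-branch term is what distinguishes this from a naive Poisson interval: lineages that share internal branches are positively correlated, so the variance of $\bar{D}$ is larger than the sum of the per-lineage variances alone would suggest.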
We estimated the $\{D_i\}$ using a maximum likelihood phylogeny (Fig. 2), and we estimate the yearly mutation rate, $\mu_{ly}$, as:
$$\hat{\mu}_{ly} = \frac{\bar{C}}{t}, \qquad \bar{C} = \frac{1}{n_c}\sum_{i=1}^{n_c} C_i,$$
where t is the known TMRCA of the calibration subclade and $\bar{C}$ is the sample average of $\{C_i\}$, the number of derived mutations acquired by each lineage since the common ancestor of the subtree. Here $n_c$ is the number of individuals within the calibration subclade. $\hat{T}$ is therefore a scaled ratio of two random variables:
$$\hat{T} = t\,\frac{\bar{D}}{\bar{C}}.$$
TMRCA Confidence Intervals
From the frequentist perspective, we consider T a fixed but unknown constant, and we are interested in the sampling variance of our estimator conditional on its true value. Since the calibration subtree is a small fraction of the total tree, $\bar{D}$ and $\bar{C}$ are approximately uncorrelated. This fact simplifies the expression for the standard deviation of a ratio of random variables, which is obtained using the δ method (36):
$$\sigma_{\hat{T}|T} \approx \frac{t}{\bar{C}}\sqrt{\left(\frac{\bar{D}}{\bar{C}}\,\sigma_{C}\right)^{2} + \sigma^{2}_{D|T}}.$$
Since both $\bar{D}$ and $\bar{C}$ are sums of Poisson random variables with a large number of total events, each is well approximated by the normal distribution. Consequently, their ratio is also approximately normally distributed (37). Therefore, if we are able to compute $\sigma_{D|T}$ and $\sigma_C$, we can construct a confidence interval for T.
We first consider $\sigma_{D|T}$. The $\{D_i\}$ are identically Poisson distributed, but they are not independent due to the shared internal branches (3). Thus,
$$\sigma^{2}_{D|T} = \mathrm{Var}[\bar{D}\mid T] = \frac{1}{n^{2}}\left[\sum_{i}\mathrm{Var}[D_i\mid T] + 2\sum_{i}\sum_{j>i}\mathrm{Cov}[D_i, D_j\mid T]\right].$$
Since each $D_i$ is a Poisson random variable, its variance is equal to its mean. Now consider samples i and j. The numbers of mutations that have accumulated in each since
their MRCA are independent. However, they share all mutations possessed by their MRCA. Thus,
$$\mathrm{Cov}[D_i, D_j\mid T] = D_{ij},$$
where $D_{ij}$ is the number of derived variants possessed by the common ancestor of i and j. Let I denote the set of internal branches, and let $b_s$ and $b_l$ be the number of descendants and the length of a branch, b, respectively. Each internal branch will be shared by $\binom{b_s}{2}$ pairs of individuals. Thus,
$$2\sum_{i}\sum_{j>i}\mathrm{Cov}[D_i, D_j\mid T] = 2\sum_{b\in I}\binom{b_s}{2}\,b_l = \sum_{b\in I} b_s(b_s-1)\,b_l,$$
which gives:
$$\sigma_{D|T} = \frac{1}{n}\sqrt{\sum_{i} D_i + \sum_{b\in I} b_s(b_s-1)\,b_l}.$$
An identical argument applies to $\sigma_C$ within the calibration subtree. We, therefore, construct a 95% confidence interval for TMRCA as:
$$T = \hat{T} \pm z_{0.025}\cdot\sigma_{\hat{T}|T} = t\left[\frac{\bar{D}}{\bar{C}} \pm z_{0.025}\cdot\frac{1}{\bar{C}}\sqrt{\left(\frac{\bar{D}}{\bar{C}}\,\sigma_C\right)^{2} + \frac{1}{n^{2}}\left(\sum_{i} D_i + \sum_{b\in I} b_s(b_s-1)\,b_l\right)}\,\right].$$
The bias of the point estimator is minimal (36).
Precision of TMRCA Estimation
The standard error for the mean estimate of a Poisson random variable with mean $\mu_l T$ is $\sqrt{\mu_l T / n}$, so the coefficient of variation (the ratio of the standard error to the mean) equals $1/\sqrt{n\mu_l T}$ and declines as $n\mu_l T$ grows. On the Y chromosome, T is large and, because the non-recombining locus is so long, $\mu_l$ is quite large as well. Consequently, the standard error for estimating the mean branch length is relatively small, and the greater source of uncertainty lies in estimating the mutation rate, where the time intervals over which mutations have accumulated are shorter, and the number of lineages is smaller. However, $\mu_l$ is sufficiently large that we could derive a narrow confidence interval based solely on the two hgQ lineages we had sequenced. In contrast, for the mtDNA, the uncertainty due to $\sigma_{D|T}$ exceeds that due to $\sigma_C$.
An Alternative Frequentist Estimator
An alternative frequentist estimator defines $\bar{D}$ as half the average mutational distance $d_{ij}$ between pairs of individuals that span the ancestral root (3):
$$\bar{D} = \frac{1}{2|L||R|}\sum_{i\in L}\sum_{j\in R} d_{ij}.$$
Here, L and R represent the sets of individuals on the left and right sides of the root. This estimator is less well suited to our data set. We have four Y hgA individuals on the left side of the tree and 65 individuals on the right side. This partition-based approach effectively upweights information from the hgA samples, since all distances are measured with respect to a member of this clade. However, we have lower effective coverage on the internal branches of hgA than elsewhere in the tree. This is due to both the lower number of samples and the fact that hgA lineages are highly diverged. Consequently, these are exactly the samples for which false negatives are of greatest potential impact. For the sake of comparison, the TMRCA point estimates from this approach are 134 ky and 118 ky for the Y chromosome and mtDNA, respectively.
Estimating the Ratio of mtDNA TMRCA to Y TMRCA
To compare the TMRCA of the Y chromosome to that of the mtDNA, we estimate the ratio:
$$\gamma = \frac{T_m}{T_y} = \frac{t_m M}{t_y Y} = \tau R,$$
where we define M and Y as the fixed but unknown unscaled TMRCA of the mtDNA and Y, respectively, and R as the ratio M/Y. The quantity $\tau = t_m / t_y$ is the ratio of coalescence times of the Native American lineages, mtDNA hgA2 and Y chromosome hgQ. Our estimator of γ is:
$$\hat{\gamma} = \tau\hat{R} = \tau\,\frac{\hat{M}}{\hat{Y}},$$
where
$$\hat{M} = \bar{D}_m/\bar{C}_m, \qquad \hat{Y} = \bar{D}_y/\bar{C}_y, \qquad \hat{R} = \hat{M}/\hat{Y}.$$
The standard error is:
$$\sigma_{\hat{\gamma}|\gamma} = \tau\,\sigma_{\hat{R}|M,Y}.$$
Since $\hat{R}$ is the ratio of two random variables, its standard error is:
$$\sigma_{\hat{R}|M,Y} \approx \frac{1}{E[\hat{Y}|Y]}\sqrt{\left(\frac{E[\hat{M}|M]}{E[\hat{Y}|Y]}\,\sigma_{\hat{Y}|Y}\right)^{2} + \sigma^{2}_{\hat{M}|M} - 2\rho\,\sigma_{\hat{M}|M}\,\sigma_{\hat{Y}|Y}\,\frac{E[\hat{M}|M]}{E[\hat{Y}|Y]}},$$
where $\rho = \mathrm{Corr}[\hat{M}|M, \hat{Y}|Y]$. We cannot disregard the correlation term in this case. If the TMRCAs of male and female lineages are correlated, their estimates will be as well, though the correlation of the estimates would necessarily be less than that of the true values due to the uncertainty in both variables. Confidence bands for γ are defined by:
$$\gamma = \tau\left[\frac{\hat{M}}{\hat{Y}} \pm z_{0.025}\cdot\sigma_{\hat{R}|M,Y}\right].$$
To assume zero correlation would be conservative, as positive correlation reduces the variance. We consider representative values of ρ for the sake of comparison (Fig. S13). Again, the bias of the point estimator is minimal (36).
Empirical Bayesian Estimation of TMRCA and Ne: GENETREE
As distributed, GENETREE can handle only 99 sites per run, but we modified the source code to enable runs of several thousand SNPs. First, we perform a grid search to obtain a maximum likelihood estimate for the scaled mutation rate, $\theta = 2N_e\mu_{lg}$, where $\mu_{lg}$ is the locus-wide per generation mutation rate. We then simulate the posterior distribution of TMRCA, conditional on this estimate. We restricted each analysis to a single population so that the assumption of exchangeability of lineages (38) would hold. As the TMRCA is determined by the deepest coalescence in a sample, we exclusively analyzed populations that sample from both sides of the tree (Fig. 2): the San and Baka for the Y chromosome and the Mbuti and Nzebi for the mitochondrial genome. Results from the Baka and Mbuti Pygmy populations are the most directly comparable (Table 1).
We excluded several lineages from the GENETREE analyses. In the Baka, we excluded three samples possessing high levels of autosomal identity by descent with another individual, as inferred with Illumina Omni SNP arrays. We also excluded six Baka hgE samples, as these likely represent West African agriculturalist lineages that introgressed into the Baka a few thousand years ago (39), in violation of the exchangeability assumption of coalescent theory. In the mitochondrial analysis we removed two Nzebi and one Mbuti because GENETREE does not allow for identical lineages. Point estimates for the Baka Y chromosomes reflect averages of multiple coalescent runs. Each run subsampled 1500 (of 2927) segregating sites to overcome computational limitations for the full data set. Estimates for the Mbuti mtDNAs reflect averages of multiple coalescent runs, each with a different random seed, as these runs were more variable due to a smaller Poisson mean ($n\mu_l$).
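Returning to the ratio estimator defined under "Estimating the Ratio of mtDNA TMRCA to Y TMRCA", the following sketch evaluates $\hat{\gamma}$ and its 95% band for several assumed values of ρ. It plugs the point estimates $\hat{M}$ and $\hat{Y}$ in for $E[\hat{M}|M]$ and $E[\hat{Y}|Y]$, which is an assumption of this sketch; the function name and every input number are invented for illustration.

```python
import math

def ratio_ci(tau, m_hat, y_hat, sd_m, sd_y, rho, z=1.959963984540054):
    """gamma_hat = tau * M_hat / Y_hat with a delta-method 95% band.

    Plug-in assumption: m_hat and y_hat stand in for E[M_hat|M] and E[Y_hat|Y].
    rho is the assumed correlation between the two estimates.
    """
    r = m_hat / y_hat
    var = (r * sd_y) ** 2 + sd_m ** 2 - 2 * rho * sd_m * sd_y * r
    sd_r = math.sqrt(var) / y_hat
    return tau * r, (tau * (r - z * sd_r), tau * (r + z * sd_r))

# Invented inputs, for illustration only.
for rho in (0.0, 0.25, 0.5):
    gamma_hat, (lo, hi) = ratio_ci(tau=1.0, m_hat=8.0, y_hat=9.0,
                                   sd_m=0.8, sd_y=0.5, rho=rho)
    print(f"rho={rho}: gamma_hat={gamma_hat:.3f}, band=({lo:.3f}, {hi:.3f})")
```

As ρ increases the band narrows, which matches the statement that assuming zero correlation is the conservative choice.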
Coalescent theory measures time in units of Ne generations. To convert to years, we use the maximum likelihood estimate of θ, the gender-specific generation time (g; Table S3), and the Native American calibration estimate for $\mu_{ly}$, the locus-wide per year mutation rate:
$$\hat{N}_e = \frac{\hat{\theta}}{2\hat{\mu}_{lg}} = \frac{\hat{\theta}}{2g\hat{\mu}_{ly}}, \qquad \widehat{\mathrm{TMRCA}} = \hat{T}_c\,\hat{N}_e\,g = \frac{\hat{T}_c\,\hat{\theta}}{2\hat{\mu}_{ly}}.$$
GENETREE is suboptimal for our data set. Due to the exchangeability assumption and computational limitations, each analysis draws information from just a subset of the data. Because the full sequence data are highly informative about the underlying gene genealogy, very few random trees are compatible with them. This makes GENETREE a highly inefficient approach to estimating population genetic parameters. Thus, we emphasize the point estimates and confidence intervals derived from the frequentist approach.
Predata Distribution of TMRCA
For a constant population size, the TMRCA of a locus, measured in Ne generations, is given by:
$$\mathrm{TMRCA} = \sum_{i=2}^{n} T_i,$$
where $T_i$ is the time during which i ancestral lineages of the sample existed. Coalescent theory (38) models $T_i$ as an exponential random variable with parameter:
$$\lambda_i = \binom{i}{2}.$$
To obtain the distributions presented in Fig. 3, we simulated five million draws of TMRCA for n = 100 lineages and scaled each value by a factor of $N_e g$ to convert to years.
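The unit conversion and the predata simulation can be sketched together. The function names are hypothetical; the Ne values in the usage loop are the Baka Y-chromosome and Mbuti mtDNA estimates from Table 1, but the 25-year generation time is a placeholder rather than the sex-specific values of Table S3, and far fewer than five million draws are used to keep the example fast.

```python
import random
import statistics

def genetree_to_years(theta_hat, tc_hat, mu_ly, g):
    """Return (Ne_hat, TMRCA in years) from GENETREE output.

    theta = 2 * Ne * mu_lg with mu_lg = g * mu_ly, so Ne_hat = theta / (2 g mu_ly);
    tc_hat is the TMRCA in units of Ne generations.
    """
    ne_hat = theta_hat / (2.0 * g * mu_ly)
    return ne_hat, tc_hat * ne_hat * g

def predata_tmrca(n, ne, g, draws, seed=1):
    """Sample TMRCAs (years) for n lineages under a constant-size coalescent.

    While i lineages remain, the waiting time T_i is exponential with rate
    i * (i - 1) / 2 in units of Ne generations; TMRCA sums T_i for i = 2..n.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        t = sum(rng.expovariate(i * (i - 1) / 2) for i in range(2, n + 1))
        out.append(t * ne * g)
    return out

# Ne from Table 1; g = 25 years is a placeholder generation time.
for label, ne in (("Baka Y", 1800), ("Mbuti mtDNA", 3700)):
    ky = [x / 1000 for x in predata_tmrca(100, ne, 25, draws=5000)]
    print(f"{label}: mean {statistics.mean(ky):.0f} ky, median {statistics.median(ky):.0f} ky")
```

Because the expected TMRCA for n lineages is $2(1 - 1/n)$ Ne generations, the two distributions are centered roughly in proportion to their Ne values, yet each is wide enough that they overlap substantially, as in Fig. 3.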
Supplementary Text
Novel Y Chromosome Phylogenetic Structure
Haplogroup B2
Within hgB2, we identify one clade and three additional lineages that represent previously uncharacterized structure (Figs. 2, S7). Each lineage represents an ancient divergence within the Y chromosome phylogeny and carries no known differentiating mutations downstream of M192 and Page72, which define hgB2b1. First, in the main text we describe a subclade of B2b1a that encompasses six Baka individuals. Previously, B2b1a2 was associated with the P70 variant, but because these six Baka individuals carry the ancestral allele for P70, we propose reassociating P70 with a new label, "B2b1a2a," and labeling the new clade "B2b1a2b." Second, B2b1b was previously associated with P6, but we have identified a Mbuti individual carrying the ancestral allele for this variant. Thus, we propose associating P6 with a new label, "B2b1b1," and designating the new lineage "B2b1b2." Finally, we identify two new lineages within B2b1a1. The individuals representing both of these lineages carry the ancestral T allele for the M169 variant that defines B2b1a1a, the only extant sublineage of B2b1a1 not represented.
Haplogroup F
Table S2 presents genotyping results for the M578 variant in a separate panel of individuals. The results confirm the (G, H, IJK) → (G, (H, IJK)) polytomy resolution. The demographic fates of hgG and hgHIJK were geographically asymmetric, with the spread zone of hgG (40) considerably more restricted than that of hgHIJK (Fig. S6). The latter now spans all continents, including Africa due to the back migration of some haplogroups (41).
Imputation
We used our phylogeny-aware algorithm (Fig. S8) to impute approximately 5.3 missing genotypes per Y chromosome variant site and a median of 826 per individual.
Imputation Limitations
It is not possible to impute singletons: when the carrier of a unique allele has zero reads of support, there is no evidence for variation at the site.
Doubletons pose a similar problem. Let A and B be nearest neighbors in the phylogeny. Consider the case where, at a given site, A possesses an allele not observed in any other sample, and B has zero reads. It is impossible to distinguish whether the site is an A singleton or an A/B doubleton. However, conditional on one sample missing data at a particular site, our imputation strategy correctly imputes two thirds of tripletons; it fails only in the case where the lineage of the missing sample is the last to coalesce. For four lineages, there are 18 possible trees. Of these, twelve consist of stepwise coalescence, and the lineage with
missing data is the most diverged in just three. Thus, we correctly impute five-sixths of quadrupletons.
Polarizing Variants on the Branch Spanning the Ancestral Root
Our method to infer the ancestral state at a given site was inapplicable to the 398 variants assigned to the most ancient (basal) split, as no outgroup for these branches was present within the data set. For these, we first conducted a LiftOver (42) to map GRCh37 coordinates to those of the chimpanzee reference (PanTro3). Due to the abundance of large-scale inversions between the two chromosomes (17), it was necessary to BLAT (43) 101 bp chunks of DNA surrounding each human variant to infer relative orientation. Ancestral states were thereby inferred for 322 variants, and those of the remaining 76, for which the corresponding chimpanzee allele could not be inferred, were randomly assigned in the corresponding proportion.
Homoplasy and the Infinite Sites Model
We deemed an SNV consistent with the tree when we observed no ancestral alleles in the subtree rooted at the branch to which the SNV was assigned. Most variants (11,279) were consistent with the tree, and we imputed missing genotypes for those that were. Sites incompatible with the phylogeny were uniformly distributed across the callable regions (Fig. 1) and were excluded from downstream analyses. Just 199 (of 361) incompatibilities were supported by more than one sequencing read. This lack of homoplasy on the Y chromosome justifies usage of the infinite sites model.
Calibration and Mutation Rate Estimation
Mutation rate estimates are typically based on family pedigrees (14) or species phylogenies, such as the human-chimpanzee divergence (2, 3). However, only one pedigree-based rate is available for the Y chromosome, and it derives from a single pedigree even though the mutation process is highly stochastic.
Furthermore, precise alignment between the human and chimpanzee Y chromosomes is difficult because of extreme structural divergence. Finally, if the Y is subject to a time-dependent mutation rate, as is mtDNA (24, 25), then neither estimation approach is ideal for dating human population events. Instead, we estimate mutation rates using a within-human calibration point: the initial migration into and expansion throughout the Americas. Well-dated archaeological sites include Paisley Cave in Oregon, which dates to 14.3 kya (19); Buttermilk Creek in Central Texas, at 13.2–15.5 kya (44); and Monte Verde II in Southern Chile, at 14.6 kya (45). To date the expansion of genetic lineages unique to the Americas, we follow Goebel et al., who state that the most parsimonious estimate is that “humans colonized the Americas around 15 kya” (19). We show that a mismatch between the timing of the expansion event and the divergence of the lineages used for calibration would have minimal effect on the difference between the TMRCAs of the Y and mtDNA, provided the divergences are within a few thousand years of one another (Fig. S13, Materials and Methods).
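The within-human calibration reduces to two ratios, as spelled out under “Impact of Sequencing Error and Sequence Coverage on TMRCA Estimation” below: the mutation rate is the average branch length within the calibration subtree divided by the calibration time, and a TMRCA is the average tip-to-root height divided by that rate. The Python sketch below illustrates only this arithmetic; the function names are ours rather than the authors' code, and the inputs (the 15 ky calibration, the observed calibration-subtree length of 123 SNPs, the 9,988,118 callable sites, and the maximum tip-to-root height of 1188 SNVs, treated as a conservative upper bound on the mean) are the figures reported in this supplement.

```python
# Illustrative sketch of a within-human calibration (not the authors' pipeline).
CALLABLE_SITES = 9_988_118  # callable Y-chromosome sites reported in this supplement


def mutation_rate_per_ky(mean_calibration_length: float, calibration_ky: float) -> float:
    """Mutation rate in SNVs per ky over the callable region: the average
    branch length within the calibration subtree divided by the calibration time."""
    return mean_calibration_length / calibration_ky


def tmrca_ky(mean_tip_to_root: float, rate_per_ky: float) -> float:
    """TMRCA in ky: the average tip-to-root height divided by the mutation rate."""
    return mean_tip_to_root / rate_per_ky


rate = mutation_rate_per_ky(123, 15.0)               # 8.2 SNVs per ky
rate_per_bp_per_year = rate / CALLABLE_SITES / 1000  # about 8.2e-10

# 1188 SNVs is the maximum observed tip-to-root height, which the text treats
# as a conservative upper bound on the mean height.
upper_bound_ky = tmrca_ky(1188, rate)                # about 144.9 ky

# TMRCA is proportional to the calibration time, so alternative arrival dates
# rescale every estimate by the same factor (14.3/15 and 16.5/15).
for t in (14.3, 15.0, 16.5):
    scaled = tmrca_ky(1188, mutation_rate_per_ky(123, t))
    print(f"calibration {t} ky -> TMRCA upper bound {scaled:.1f} ky")
```

Because the calibration time enters only through the rate, any alternative arrival date rescales every TMRCA by the same factor, which is why the 14.3 ky and 16.5 ky alternatives discussed under “Timing of Expansion into the Americas” translate into shifts of about 5% and 10%.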
For reference and comparison, Table S3 summarizes mutation rate point estimates on four scales. The Y chromosome mutation rates are similar to previous autosomal phylogeny-based mutation rates and extended pedigree-based rates, but they are almost two-fold higher than autosomal mutation rates based on trios (46).
Impact of Sequencing Error and Sequence Coverage on TMRCA Estimation
We developed a method to estimate the variance in estimated TMRCA that is due to the stochastic nature of the mutation process (Materials and Methods, “Frequentist Estimation of TMRCA” section). Here we discuss potential bias due to sequencing error and modest sequencing coverage. We estimated TMRCA as the ratio of two quantities, divergence and the mutation rate, each of which depends on experimental measurements. The numerator is the average tip-to-root height of the tree, and we estimate the denominator as the ratio of the average branch length within the calibration subtree to the calibration time. The data for each of these three measurements are imperfect. In this section, we consider potential biases in the first two; we consider the calibration time in the next section.
Tip-to-Root Height
We measure tip-to-root height as the total number of SNVs assigned to all branches separating an individual from the common ancestor of all individuals. This sum includes the singletons of the terminal branch and the shared variants on the internal branches. Two opposing factors can stretch or shrink an observed branch length relative to its true value: sequencing error, and the total sequencing coverage of the branch, which is itself influenced both by the sequencing coverage of individuals and by the sampling density of the clade rooted at the branch. The primary effect of sequencing error is to stretch terminal branches, as random sequencing errors are unlikely to cluster phylogenetically.
We have demonstrated that genotype error is minimal (Materials and Methods, “Validation” section). Consequently, branch lengths are not significantly inflated by sequencing error. Although modest sequencing coverage leaves some variants near the tips of the tree unobserved, thereby shortening observed heights, the internal branches of the tree, which constitute the overwhelming majority of any tip-to-root path, have high coverage because sequencing from all descending lineages is superimposed. Thus, most observed internal branch lengths cannot differ significantly from their true lengths. Fortunately, the most divergent sample with the longest terminal branch, the San individual in the hgA-M51 clade, had higher than average sequencing coverage (6.15×) and, consequently, call rate (0.985). We observed 1012 private variants in this individual, and we estimate approximately 22 false negatives: unobserved variants with either a no-call genotype or just one sequencing read, which is insufficient to identify a site as variable. This worst-case scenario is less than 2% of the average tip-to-root height. We likely have very few false negatives in other individuals, even among those of lower coverage, since the lower-coverage samples are clustered in the densely sampled portions of the tree, such as hgE and portions of hgB, and the imputation strategy we implemented enables these lineages to receive credit for variation detected in neighbors that they can be inferred to possess. Finally, the maximum observed tip-to-root height (1188) could be considered a conservative upper bound on the true mean, and it differs from the observed mean by just 5%.
Branch Lengths in the Calibration Subtree
We now consider how sequencing coverage affects branch lengths in the Y chromosome hgQ subtree used to estimate the mutation rate. We sequenced Mayan HGDP00856, a representative of hgQ-M3, to 5.7× coverage and Mayan HGDP00877, whose haplogroup is labeled hgQ-L54*(xM3) because it carries the L54 mutation but is ancestral at the M3 SNP, to an average depth of 8.5×. Had we sequenced the two Mayan lineages to lower coverage, we would have artificially inflated TMRCA estimates by underestimating the mutation rate. However, haploid coverage for the Mayan samples is high enough that false negatives have little impact on our calibration. The rate of false negatives is dominated by sites in the terminal branches of the tree with either zero or one sequencing read for a sample. When an individual has zero or one read at a shared SNP, we can usually impute its genotype, but it is not possible to impute singletons or to distinguish a singleton from a doubleton in the presence of missing data (Supplementary Text, “Imputation” section). Although missing singletons and misclassified doubletons have little impact on total branch length from the tips to the root of the entire tree, they are quite important for calibration because singletons constitute a significant portion of branch length within the calibration subtree. In our study, the shared hgQ branch is approximately the same length as the Q-M3 and Q-L54*(xM3) terminal branches. Consequently, no-call genotypes at singleton sites, which lead to missing singletons, are counterbalanced by no-call genotypes in the shared hgQ branch, which lead to doubletons misclassified as singletons.
This relies on the fact that, at 5.7× and 8.5× coverage, the no-call rates on the doubleton and singleton branches are comparable. In general, a no-call due to the presence of just a single sequencing read is less likely to occur on the doubleton branch than on a singleton branch, but of the 9,988,118 callable sites, only 194,966 (2.0%) and 23,989 (0.2%) are covered by just one read in HGDP00856 and HGDP00877, respectively.
To empirically estimate the false negative rate within the hgQ subtree used for calibration, we incorporated data from the 1000 Genomes Project (47). We downloaded genotype calls (VCF files) for 525 males from Phase 1, called haplogroups, and identified eleven individuals belonging to hgQ¹. We then downloaded aligned sequence data (BAM files) for these samples, converted them from the GRCh37 to the hg19 reference, and applied our pipeline to the combined set of 80 individuals (Fig. S9). In the combined analysis, the branch shared by all hgQ lineages grew from 136 to 146 SNPs². One SNP had not been called in either HGDP sample (hg19 position 15825218), and nine SNPs were no-calls in HGDP00856: three due to the absence of reads, and six due to one erroneous read (out of 4–10). With perfect data, these nine SNPs would have been classified as doubletons, but they were instead misclassified as HGDP00877 singletons. Thus, for HGDP00856, we can estimate the no-call rate within the hgQ subtree as $\beta_0 \approx 6.8\%$ (10/146). Partly because its coverage is higher, we observed no doubletons misclassified as singletons due to missingness in HGDP00877³. Thus, for HGDP00877, $\beta_0 \approx 0.7\%$ (1/146).
Whereas on the shared doubleton branch the no-call rate should adequately capture the type 2 error rate ($\beta_d \approx \beta_0$), it does not provide complete information for the terminal branches, since GATK, prudently, will usually not designate a site as variable if there is just one sequencing read with the alternative allele in the entire sample. Thus, to fully model the singleton type 2 error rate, $\beta_s$, we must also consider the probability of observing just one read, $\beta_1$, since when this occurs at a singleton site, a false negative will usually result. To do so, we computed the sequencing read-depth distribution over all ten million callable sites for each sample. Scaling this empirical probability mass function by the number of singletons observed in the individual and censoring to discard the zero-read and one-read bins, we observe that, when coverage exceeds 4×, the expected read-depth distribution among singletons closely mirrors the observed distribution (Fig. S10). This suggests that there are few false negatives at sites for which at least two sequencing reads are observed. Thus, $\beta_s \approx \beta_0 + \beta_1$. When a branch with false negative rate $\beta$ has true length $L$ and observed length $Y = (1-\beta)L$, the number of unobserved variants, $X$, is given by
$$X = L - Y = \frac{\beta}{1-\beta}\,Y.$$
On the HGDP00856 singleton branch, we have $Y = 126$ and, from the empirical read-depth distribution, $\beta_1 = 2.0\%$. Thus, $\beta_s \approx \beta_0 + \beta_1 = 6.8\% + 2.0\% = 8.8\%$, which gives $X \approx 12.2$ missing singletons. This is likely an overestimate because the no-call rate across all variable sites, 2.2% (Table S1), is lower than the empirical rate within the subtree, 6.8%. The branch shared by all hgQ-M3 lineages (branch 18 in Fig. S9) affords an opportunity to empirically check the singleton false negative rate for HGDP00856, since this individual should possess each of these variants. We had correctly called 16 of 17 in our main analysis. This suggests a singleton false negative rate for this sample of 1/17 = 5.9%⁴, but the variance of this estimate is high because it is based on just 17 sites, so to be conservative, we use the value of 8.8% estimated above. For HGDP00877, we have $Y = 120$ and $\beta_1 = 0.2\%$, which give $\beta_s \approx 0.7\% + 0.2\% = 0.9\%$ and $X \approx 1.1$ missing singletons. This prediction cannot be tested empirically with these data because the lineage is an outgroup to the two hgQ-L54*(xM3) sequences from the 1000 Genomes Project. As discussed above, nine doubletons were previously classified as HGDP00877 singletons, so accounting for type 2 errors reduces this branch length by 7.9 (9 − 1.1). Putting these two corrections together, we compute the average branch length since the MRCA of the two samples as 125 SNPs, which differs from the observed value of 123 by 1.6%. Thus, one might wish to scale our Y chromosome TMRCA estimates by a factor of 123/125 = 0.984. However, the effect of false negatives would be offset by false positives, should one or two exist, so we choose not to. False negatives are not an issue for mitochondria, where all sequences are complete.
¹ A twelfth, NA19753, was sequenced using SOLiD. We did not include this sequence in our analysis since it is likely to have different error and mapping properties than those generated by Illumina technology.
² The exact length is 149, but the difference includes two SNPs that were on the borderline of the depth-based filter in the main study and a net of one SNP discarded due to homoplasy: two in the main study and one in the combined analysis.
³ It is possible that one such SNP exists and is missing in all three hgQ-L54*(xM3) sequences, but this is a low-probability event.
⁴ The lone false negative occurred at hg19 position 22613361. Prior to imputation, we made the correct call in the combined analysis, because one read was present, and it carried the derived A allele.
Calibration Time
In light of the above, the largest potential source of bias is the calibration time: the dating of the arrival of humans in the Americas and the assumption that this arrival was synchronous with the phylogenetic divergences used for calibration.
Timing of Expansion into the Americas
Archaeological dates for the time of first arrival in the Americas range from 14.3 to 16.5 kya. Goebel et al. (19) conclude that the most parsimonious estimate is that “humans colonized the Americas around 15 kya,” so we adopt 15 ky as a reasonable figure for both the maternal and paternal loci. If the true divergence time of American lineages were 14.3 ky, the TMRCA ranges we report would need to be scaled down by about 5%. Likewise, for 16.5 ky, an increase of 10% would be required. However, the specific number used has no effect on the relative TMRCA estimates for the two loci, provided the divergences of the two loci were contemporaneous. We consider the case of unequal split times in Fig. S13 (Materials and Methods, “Estimating the Ratio of mtDNA TMRCA to Y TMRCA” subsection).
Y Chromosome Calibration Point
With 108 sampled lineages, the point of rapid expansion within the Americas among mtDNA hgA2 lineages is clear. However, the corresponding point within Y hgQ is less so.
Though we have argued that M3 most likely arose shortly after the initial entry into the Americas, it remains possible that hgQ-M3 and hgQ-L54*(xM3) diverged within Siberia or Beringia. When we include the lower-coverage 1000 Genomes hgQ lineages, we observe a star-like diversification among the Q-M3-derived lineages (Fig. S9, below branch 18). It is possible that some subset of the 17 M3-equivalent mutations accumulated prior to entry, within Beringia for example, as has been proposed for mtDNA founding lineages (48). However, 12 of the 13 sequenced individuals are from Mexico, and this sampling bias could obscure an earlier start of the expansion. For example, it is possible that hgQ-M3 lineages within Greenland do not share all 17 of these mutations. Because just three sequences represent hgQ-L54*(xM3), the phylogenetic structure of this subhaplogroup remains largely unknown, but the root of the sampled hgQ-M3 lineages can be used to calculate a strict lower bound on the mutation rate, as entry into the Americas certainly happened no later than this point.
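The false-negative correction applied to the calibration subtree can be checked with a few lines of arithmetic. The Python sketch below assumes, as the text does, that each true variant on a branch is missed independently at rate β, so that the observed length is Y = (1 − β)L and the number of unobserved variants is X = βY/(1 − β); the function name `missing_variants` is ours. It reproduces the corrections reported above for HGDP00856 and HGDP00877, as well as the lower-bound calibration length derived in the next paragraph.

```python
def missing_variants(observed_length: float, beta: float) -> float:
    """Expected number of unobserved variants X on a branch.

    Assumes each true variant is missed independently with probability beta,
    so the observed length is Y = (1 - beta) * L and X = L - Y = beta * Y / (1 - beta).
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return beta * observed_length / (1.0 - beta)


# Singleton type 2 error rate: beta_s ~ beta_0 (no-call) + beta_1 (single read).
beta_s_856 = 0.068 + 0.020  # HGDP00856, hgQ-M3 (8.8%)
beta_s_877 = 0.007 + 0.002  # HGDP00877, hgQ-L54*(xM3) (0.9%)

x_856 = missing_variants(126, beta_s_856)  # about 12.2 missing singletons
x_877 = missing_variants(120, beta_s_877)  # about 1.1 missing singletons

# Nine shared hgQ doubletons were misclassified as HGDP00877 singletons,
# so that terminal branch loses 9 - x_877 (about 7.9) SNPs.
corrected_856 = 126 + x_856
corrected_877 = 120 - (9 - x_877)
observed_mean = (126 + 120) / 2                       # 123
corrected_mean = (corrected_856 + corrected_877) / 2  # about 125

# Lower-bound calibration from HGDP00856 alone: 110 post-M3 SNPs.
lower_bound_length = 110 + missing_variants(110, beta_s_856)  # about 120.6

print(f"X(856)={x_856:.1f}  X(877)={x_877:.1f}  "
      f"mean {observed_mean:.0f} -> {corrected_mean:.1f}  "
      f"lower bound {lower_bound_length:.1f}")
```

The same one-line correction serves both the two-sample calibration and the single-lineage lower bound; only the observed branch length and the error rate change.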
The 1000 Genomes lineages are unsuitable for calibration because of their lower sequencing coverage (average = 2.9×; Supplementary Text, “Branch Lengths in the Calibration Subtree” subsection), so we are left with a single lineage from our sample, HGDP00856, for this lower-bound calculation. Accounting for false negatives had little effect when two samples were used for calibration, as the growth of the hgQ-M3 branch was offset by a corresponding shrinkage of the hgQ-L54*(xM3) branch, due to the hgQ doubletons that were unobserved in HGDP00856 and thereby misclassified as HGDP00877 singletons. However, it is important to correct for type 2 errors when considering this lineage alone. In the main analysis, the observed length of the M3 lineage was 126 mutations: 16 observed M3-equivalent SNPs and 110 post-M3 SNPs. Using a singleton false negative rate of 8.8%, this translates to approximately 10.6 (0.088 × 110 / (1 − 0.088)) unobserved post-M3 SNPs, which gives a calibration length of 120.6 SNPs. This differs from the calibration used in the main text by 1.9%.
Existence of Rare Yet More Basal Lineages
We emphasize that the estimates we derive refer to coalescence times within our sample. For the mitochondrial genome, we have likely sampled the most divergent branches in the tree (34). However, for the Y chromosome, our estimate of the TMRCA reaches back only as far as the A1b clade. Inclusion of samples from hgA1a or the newly discovered hgA0 (5) or hgA00 (49) would push the date further back. However, these haplogroups are very rare, and it is difficult to assess whether correspondingly divergent but singular mitochondrial genomes may also await discovery.
Effective Population Size
The Ne differences we observe between males and females are most likely due to greater variance in reproductive success among males, a phenomenon influenced by cultural and demographic factors such as the practice of polygyny (50).
Both purifying and positive selection could also reduce Ne along the linked regions of the Y chromosome. However, both forms of selection may also have acted on the mitochondrial genome. Additional information would be needed before one could invoke natural selection as the primary cause of reduced male Ne, and the hypothesis is neither necessary nor sufficient.
Additional Acknowledgements
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1147470. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Fig. S1. Map of populations. We sampled Y chromosomes and mtDNAs from nine populations including Baka Pygmies from Gabon, Cambodians, Maya from Mexico’s Yucatán Peninsula, Mbuti Pygmies from the Democratic Republic of Congo, Mozabite Berbers from Algeria, Nzebi from Gabon, Pashtuns (Pathan) from Pakistan’s North-West Frontier Province, San from Namibia, and Yakut from Siberia.
Fig. S2. Sequencing read mapping on Xq21. Total read depth and the depth of MQ0 reads are plotted for 24 HGDP females. Mean values in contiguous 5 kb windows are shown along chrXq21. Dashed gray lines indicate the region that corresponds to the “X-transposed” segment of the Y chromosome.
[Plot: x-axis, chrX position (Mb), 85–96; y-axis, depth in HGDP females, 0–300; series: filtered depth and MQ0 depth; marked region: homologue of X-transposed region.]