Y DNA Surname Projects - Some Fresh Ideas

James Irvine's presentation on YDNA Surname projects from the 2015 International Conference on Genetic Genealogy.

  • Background: Genealogist for over 50 years.
    No knowledge of genetics, but 10 years’ experience administering the Irwin DNA project, aka Clan Irwin Surname DNA Study.
  • Irwin project also known as Irwin Clan Surname DNA Study.
    Irwin project is not necessarily typical of Scots clans, but many lessons apply to all surname projects.
  • at-, mt- and x-DNA also used for Deep Ancestry and “chasing cousins”.
  • Testing companies very dependent on “admins” for customer interface – viz. manning of FTDNA stand at WDYTYA.
    Important to recognise Administrators are volunteers whose interests, skills and time availability are, by definition, not limitless!
    This lecture focuses on items 3 and 4.
    A personal thought: as a surname project administrator, to date I have found understanding genetics to be less critical than having time, patience, a good support network, and skills in genealogy, data handling & communicating. I have also been lucky to inherit an interesting surname and to have trained as an engineer. However, to understand NGS SNP criteria I will need more knowledge of genetics.
  • Very lucky this DNA project brings out so many features.
    0.1% ratio is typical of many DNA surname projects
  • “All Scottish Irwins, regardless of spelling, are descended from a common ancestor.”
    Solid lines show confirmed paper trails.
  • 436 “joins”, but this includes some “results pending” and mt-DNA and at-DNA orders, and excludes non-FTDNA data;
    the corrected figure to end-October 2015 is 392 y-DNA test results.
  • Penetration is the ratio of participants tested to world population.
    Note the heavy US bias in the project, but Scotland is not under-represented.
    Study suggests that a penetration of about 0.06% is necessary before a project gets a fair perspective of the diaspora.
  • Distribution of all project participants who know the county in Britain or Ireland of their earliest confirmed paternal ancestor.
    Good correlation with census/Griffith’s Valuation.
    Placenames in green appear in traditional genealogies.
  • For background see Reading List slide at end of lecture.
  • Spelling relevant in Scotland but not elsewhere.
    NB All foregoing data is before considering the significance of DNA test results!
  • - 111 marker panel more useful than 67 panel, but expensive
    - 12 marker panel can be useful, especially with individual “private” SNP test
    - “horses for courses”
  • Full Excel table of results (470 lines, 180 columns) at www.dnastudy.clanirwin.org.
    This slide shows sample of 21 results (of 392), of first 25 markers (of 111), and of
    4 (of the 34) genetic families identified by Administrator.
    Colour key at bottom denotes “genetic distance” from modal value for each marker.
    Some participants with only 12 markers can be categorised, some cannot.
    Challenge for lecture: how are these genetic families best defined, identified and named?
  • Matching and grouping cause much confusion, and little reliable guidance available.
  • Moral: use FTDNA pages to determine GD.
    Average mutation rates of different markers vary by a factor of c.400.
  • TiPs didn’t “arrive” until 2005 and by then the trail-blazing admins had developed their own tools and rules. They are still not popular with admins.
  • TiP Score term conceived by myself and Ralph Taylor. The more I use it the more I realise its potential.
  • FTDNA’s terminology and “Matches” pages cause much confusion.
    Time prevents discussion of the latter; they are more useful for cousin chasing than for surname projects, and are not screened to remove dissimilar surnames.
    Most near matches have TiP score > 95%. I used to use cut-off of 80%, now use 60%, but not critical (for similar surnames).
  • Grouping is biggest challenge for admins.
    Much inertia: most admins “set in their ways”.
  • Fine in theory.
  • Iterative process.
    DNA signature of Modal participant may not be that of common ancestor because (a) small sample size, (b) sample bias (e.g. two branches of the family have procreated at the same rate, but one stayed in UK where DNA sampling is rare, another migrated to USA where DNA sampling common), (c) “Founder effect”, where two branches procreated at different rates, typically one with a relatively lower rate in UK, but another with a higher rate migrated to USA, and (d) “Genetic drift”, the consequence of random mutations irrespective of procreation rates or migration where some lines flourish over time and others dwindle or die out.
    Some genetic families have only one participant if he has a very clear origin.
  • My “Total participants” is a little less than FTDNA’s “Project joins”, as the latter include tests still at the laboratory and mtDNA tests.
    Singletons, initially 50%, are now steady at just 10%.
    Phases:
    - establishment & initial growth, with difficulty in identifying genetic families;
    - recognition of most genetic families;
    - “maturity”: few new genetic families being added, although the project continues to grow.
    The 0.04% and 0.07% on the right are the project’s penetration levels when the second and third phases happened: interesting to compare with other projects.
  • In theory the TMRCA of a genetic family may be estimated by averaging the TMRCAs of members of the family using Magee’s matrix, but I am unsure how to interpret the mathematical result.
  • Note: These wide probability ranges do not include further uncertainties attributable to individual marker mutation rates, back mutations and no. of years per generation.
    Moral: Don’t use TMRCAs based on genetic distance!
  • Not all strict synonyms.
    The term NPE is borrowed from genetics, where it has a narrow interpretation.
    Some genetic genealogists feel this interpretation should be retained, and they and others feel very sensitive about its use in genetic genealogy.
    For genetic genealogy I think a wide interpretation is necessary. I would prefer the term “SDEs”, but this novelty is not widely known.
  • Illegitimacy quite common (today technically 50%!!), but certainly not the only cause of NPEs
    Historically, adoption and formal name-change were rare
    Step father probably most common
  • These terms conceived by Dr John Plant; they are not widely used, but they need to be.
  • This is an example of FTDNA’s “Matches” page.
    Note this is an Elliot with several Irwin near matches.
  • Note this is an Irwin with several Elliot (& Fairburn!) near matches.
  • Touchy subject with many admins and participants. But with understanding, clear explanation and sensitivity I have handled over 50 NPEs without any complaints.
  • Reminder of challenge.
  • This is my spreadsheet analysis of the same data as in the previous slide. Many points arise. Note:
    - half of these examples claim Irish ancestry;
    - range of markers tested, from 12 to 111;
    - 2 brothers with GD of 2/25 (anecdote), and one of 5/37, outside FTDNA “Matches” criterion;
    - clarity of TiP Scores;
    - few pairs of cousins found;
    - e-NPEs and i-NPEs;
    - ability to name all four genetic families (Munster anecdote).
  • Most important slide.
    30 genetic families now identified – for what was thought to be a single source surname!
    The Borders genetic family dominant, with 262 members; probably now the largest such cluster in any surname DNA project.
    Most e-NPEs and all i-NPEs have or used to have other Borders surnames, implying these “events” probably occurred after Irwin settlement in Borders (1300s?) but before migrations to Ulster (1600s).
    Only 5% of Irwin project STR tests via General Fund, but these provide several of the critical genealogies from which their geographical origins can be identified.
  • Example of use of triangulation. Note the sequence in which the tests were taken.
  • Note most participants reside in the New World, many can trace ancestry back to Ireland, but correlation of DNA and available genealogical evidence shows most have Scottish origins.
    Most apparently migrated from the Borders to Ulster in the 17th century, and from Ulster to America in the 18th century.
    Question arises: is project US biased?
  • Project has cast a completely new light on traditional understanding of this Scottish surname.
    “X” indicates where traditional tree was wrong.
    Discoveries had to be handled sensitively.
  • Pretty, but not convinced! Did help to identify sub-groups within Borders genetic family
  • The modal sub-group BA (12 members match 67/67, 30 match 37/37) is probably an example of convergence, with regression towards the mode.
  • Recent breakthroughs in “Next Generation Sequence” SNP tests (e.g. FGC Elite, Chromo2, Big Y) are very powerful, but expensive and difficult to analyse.
  • Deep Ancestry speculates on the geographical distribution of these SNPs.
    L555 recognized by ISOGG mid 2012; still private to Irwin Borders genetic family.
    NGS tests necessary to bring tree into surname era.
  • BigY is FTDNA’s Next Generation Sequencing (NGS) test.
    BAM data is the raw test results, typically 30Gb, i.e. too much to send by e-mail unless compressed.
  • L21 members are very lucky to have Mike Walsh and Alex Williamson (www.ytree.net). This is an online, free-access phylogenetic tree of c.1800 P312/L21 NGS test results that have been copied to Williamson. He lists Private SNPs separately.
    This example shows the 12 L555 testees: the 5th largest such surname group in Williamson’s tree.
  • BigTree data (only), as of 9 Nov. 2015, processed for Irwin project. I disagree with some minor details.
    Shows L555 still unique to Irwins.
    Note how “flat” this sub-clade is compared with, for example, the extensive bifurcation shown in the sub-clades of the phylogenetic trees of Maurice Gleeson.
  • Decision to minimise dependence on 3rd parties was prompted by Williamson’s threat to discontinue his Bigtree. This threat has now lapsed, but my resultant ability to read BAM data has improved my understanding of NGS data and enhanced reliability of TMRCA estimates, as well as avoiding dependence on FGC or FullY analyses.
    All but the last priority achieved in 2 years; the L555 Pack test will be FTDNA’s first surname SNP Pack test.
  • This example for kit 65048 may not be typical, and may be out of date, but the extent of the red and pink cells illustrates the principle that no computerised BigY analysis is necessarily as comprehensive as might be expected. I have even “found” probable SNPs listed in FTDNA’s Matches that were not in the relevant csv file, and “discovered”, by chance, probable SNPs that were not listed by FTDNA, Walsh or Williamson.
  • This is a flow diagram illustrating my appreciation of the various tools available to analyse BigY test results, and some of the parameters used in these analyses.
  • Thanks to Dennis Wright for pointing me in this direction. His webpage at https://dl.dropboxusercontent.com/u/14028750/Testing%20and%20Analysing%20Big-Y.pdf explains how to load and use the BAM IGV Viewer.
    Step 1 is the most difficult!
    Step 2 is tedious.
    Step 3 is easy and most illuminating. Steps 3(1) and 3(2) are iterative. See following slides. L555 is described by some as our project’s “Terminal” SNP for our Borders sub-group.
    Step 4 is most important. As the number of available genetically closely-related BigY test results increases, so does the likelihood of quality ratings that are incompatible. Judgement is thus called for, as no computer program could resolve these occasional conflicts (any more than a computer could describe an oil-painting).
  • Once set up, surprisingly easy to use.
    This slide shows the 12 L555 results for variant 21368012-G-A on one screen!
  • Sources: FTDNA CSV Novel Variants and Known SNPs; FTDNA Matches; FGC/YFull Analyses; haplogroup web sites, e.g. Mike Walsh, Alex Williamson
  • NB 1. Capital A, C, G or T indicate “probable”, lower case a, c, g, t indicate “possible”.
    2. Black boxes identify probable Intermediate and Private SNP blocks.
    3. When identifying probable Intermediate and Private SNPs, compatibility of “possible” quality derived from a single BigY may need subjective revision.
    4. Such revision cannot be undertaken by a computer program.
    5. The more comparable BigY test results the better the insight into Intermediate and Private SNPs.
  • This example shows page 2 of pages 1-3 of my manual analysis of BAM data for the 12 BigY Border Irwin tests to date.
    Raw BAM data is shown in red print (Read counts of SNPs & of Indels, % consistency of SNP Reads).
    Alternate variants in capitals if Read count >10 AND Read consistency > 85%.
    This top page shows pre-L555 and L555 variants: boxed data is probable, unboxed data is possible. Note the Alternate variants for each base pair are the same for all Testees.
    FGC and YFull contributions shown in bright green.
  • This analysis differs slightly from that of Alex Williamson.
    Worryingly, neither his version nor the above correlate with the STR data of these 12 BigY project members.
    The more private SNPs, the older the bifurcation. Note the BA testee from our modal sub-group is apparently not the oldest: an example of “founder effect”?
  • Average mutation rates (“years per SNP”) are derived from radiocarbon dating/ancient DNA/genealogies: YFull use 118 years per SNP (see Adamov D et al, ‘Defining a New Rate Constant for Y-Chromosome SNPs based on Full Sequencing Data’, Russian Journal of Genetic Genealogy 2015 7/1 p76, ex http://dna.cfsna.net/HAP/index.html).
    Dennis Wright and FTDNA use 120 years per SNP.
    For FGC’s NGS tests over a larger sample of the genome, a smaller “years per SNP” ratio is applicable.
  • TMRCAs based on av. mutation rate of 120 years per SNP.
    Mean of AD1200 for L555 block seems credible.
    Starburst/bottleneck/starburst phenomena – striking, no obvious explanation
  • Some individual TMRCAs seem credible, e.g. B9.
    But others clearly not, e.g. B10: need for L555 SNP Pack to avoid reliance on single tests
  • A difference of 1½ SNPs and 170 years seems a lot, and our genealogical evidence suggests that the ISOGG criteria for defining SNPs (as of 3 Nov. 2015) are too restrictive.
  • I have included my “DIY” criteria above simply to put them in context, not to suggest they have more merit than the other criteria.
    Blanks indicate I haven’t got the relevant evidence.
  • Format courtesy of Maurice Gleeson.
    We are making considerable progress at bridging the gap between paper trails and DNA test data.
  • The bad news is that the Borders, Drum, Orkney and Perthshire Irvine/gs are apparently unrelated to each other through male line
    The good news is that :
    - so many American Irwins can now be positively identified as descendants of the Border Irvings;
    - surname is a plural origin name – not surprising, but upsets traditionalists;
    - further developments and revelations likely.
    With 262 members (or 202 even if NPEs and <37 markers excluded), our Border Irwin genetic family is apparently the largest such cluster in all of the 8,000+ surname projects.
    And its 12 BigY test results are the 4th largest surname cluster in Alex Williamson’s Big Tree. These two features make it an excellent case study for statistical analyses by other project admins.
  • Most of this would not have been possible without FTDNA’s vision, stoicism and patience.
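The “years per SNP” arithmetic in the notes above (118 years per SNP for YFull, 120 for FTDNA and Dennis Wright) can be sketched as follows. This is a simplified illustration that makes two assumptions. First, a block’s TMRCA is taken to be the mean number of SNPs the testees carry below it, multiplied by the years-per-SNP rate. Second, that figure is dated back from a reference birth year. The function names and the SNP counts in the example are hypothetical and are not Irwin project data.

```python
def snp_tmrca_years(snp_counts, years_per_snp=120):
    """Mean SNP count below a shared SNP, converted to years (simplified model).

    snp_counts: number of SNPs each testee carries below the shared SNP
    (hypothetical input; real counts come from a BigY/BAM analysis).
    """
    if not snp_counts:
        raise ValueError("need at least one testee")
    mean_snps = sum(snp_counts) / len(snp_counts)
    return mean_snps * years_per_snp


def tmrca_date(snp_counts, reference_birth_year, years_per_snp=120):
    """Approximate calendar year of the common ancestor (a point estimate only)."""
    return reference_birth_year - snp_tmrca_years(snp_counts, years_per_snp)


if __name__ == "__main__":
    counts = [5, 6, 7]  # hypothetical SNP counts for three testees
    for rate in (118, 120):  # the YFull and FTDNA rates quoted in the notes
        print(rate, snp_tmrca_years(counts, rate), tmrca_date(counts, 1950, rate))
```

With these hypothetical counts, switching from 120 to 118 years per SNP shifts the estimate by only 12 years, whereas one SNP more or less in a single testee’s count moves it by about 40 years. The rate must also match the test: as noted above, FGC’s wider coverage calls for a smaller years-per-SNP figure.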

    1. 11th Annual International Conference on Genetic Genealogy, Houston, 13-15 November 2015
        Surname Projects – Some Fresh Ideas
        James M Irvine
        Member: GOONS, ISOGG, OFHS, SGS
    2. DNA: 31 patients Did Not Attend their appointments at this surgery last month.
    3. Overview
        (1) pre-BigY:
            - Background
            - Penetration
            - “Matching”, “Grouping” & “Genetic Families”
            - False Positives & False Negatives
            - TMRCAs
            - “NPEs”
            - Geographic origins
            - SNPs
        (2) BigY & BAM data: use & interpretation
        using the Irwin project to illustrate principles & tools that may be relevant to other surname projects
    4. Surname DNA Projects: their context
        DNA testing:
        - Medical applications
        - Paternity testing
        - Genetic genealogy: mt-DNA, y-DNA, at-DNA and x-DNA tests, used for
            - Deep Ancestry
            - Surname projects (closed or open projects; STR or SNP tests)
            - "chasing cousins"
        - Criminal investigations
        - Archeology ("Ancient DNA")
        y-DNA & surnames only descend through the male line.
    5. Surname DNA Projects: Roles of volunteer Administrators
        1. Agree & refine terms of reference & goals, including “closed” or “open”.
        2. Maintain genetic & genealogical database.
        3. Define & identify genetic families.
        4. “Add value” from genealogical data: identify cousins & geographic origins.
        5. Publicise results.
        6. Liaise with individual participants.
        7. Recruit new participants.
        Always respecting participants’ confidentiality.
    6. Irwin Surname project: Background
        • Scottish lowlands surname
        • strong genealogical traditions, but few “old” pedigrees
        • active clan association in America
        • the DNA project:
            - only represents 0.12% of Irwins etc. in world today, BUT
            - has grown steadily over 10 years
            - has 392 y-DNA STR and 19 “BigY” test results
            - is about the 50th largest of 8,000 surname projects
            - includes largest genetic family in any surname project
            - shows surname typifies Scotch-Irish-America diaspora
            - has associated but separate Autosomal DNA project
    7. The traditional genealogy of the Irwins [chart]
    8. Irwin project: traditionally a single-origin Scottish surname
        [Timeline chart, 1200 to 1800, of traditional Irwin locations: Irvine, Ayrshire; Eskdale, Dumfriesshire; Bonshaw, Dumfriesshire; Drum, Aberdeenshire; Orkney; Dumfries; Castle Irvine; Perth; Shetland; Co. Fermanagh]
    9. Irwin project: growth [chart]
    10. Irwin project: Geographical “penetration”
        Region | Participant's place of residence | All Irwins etc. in world today* | Penetration of project**
        Project size / Population | 392 | 300,000 | 0.12%
        USA | 77% | 61% | 0.13%
        Canada | 6% | 12% | 0.05%
        Australia, New Zealand | 6% | 9% | 0.07%
        England & Wales | 5% | 10% | 0.05%
        Scotland | 5% | 4% | 0.12%
        Ireland (NI & Eire) | 1% | 3% | 0.03%
        Germany, Netherlands | - | 1% | 0.00%
        Unknown, other | - | - | -
        *: Source: www.worldnames.publicprofiler.org/
        **: definition: www.jogg.info/62/files/Irvine.pdf
    11. Irwin project: Origins in UK counties, if known, of participants’ earliest confirmed paternal ancestors
        [Map; number of participants per county or province:]
        Scotland: Shetland 4; Orkney 9; Aberdeenshire 7; Perthshire 4; Ayrshire 1; Dumfriesshire 14
        England: Cumberland 4; Northumberland, Durham 7
        Ulster: Antrim 18; Derry 10; Tyrone 15; Down 2; Armagh 3; Fermanagh 14; Monaghan 1; Cavan 1; Donegal 2
        Rest of Ireland: Connaught 3; Leinster 5; Munster 5
        Also marked on the map: Irvine, Dumfries, Bonshaw, Eskdale, Castle Irvine
    12. The Scotch-Irish
        The term Scots-Irish, or Ulster Scots, refers to Scots who migrated to Ireland, typically in the 17th century from SW Scotland to Ulster.
        • Many Scots took part in the Plantation of Ulster c.1610, either as a landowning Undertaker or as a tenant. Each Undertaker undertook to keep 40 loyal tenants.
        • Other settlers included Border Reivers who had been banished.
        • Most Scots-Irish were Presbyterians.
        • Very few Scots-Irish have pedigrees back to Scotland (unless their ancestors were Undertakers).
        The American term Scotch-Irish refers to descendants of these Ulster settlers who in turn migrated to America, typically in the 18th century to the Appalachian piedmont (PA-GA).
        • Few Scotch-Irish have pedigrees back to Ireland.
    13. Irwin project: Earliest confirmed paternal ancestors
        Spelling: Irwin 32%; Irvine 16%; Erwin 13%; Ervin 8%; Irving 8%; Irvin 8%; Arnwine 1%; Urwin 1%; Other 13%
        Birth date: 1900s 4%; 1800s 29%; 1700s 48%; 1600s 3%; 1500s 1%; 1400s 1%; 1300s 0%; 1200s 1%; Unknown 13%
    14. Irwin project: Marker resolution (% participants)
        No. of markers | 2010 | 2015
        12 | 13% | 5%
        25 | 6% | 1%
        37 | 48% | 54%
        67 | 33% | 26%
        111 | - | 14%
        37 or more | 81% | 94%
    15. Irwin project: Results examples (1)
        Markers (DYS), 12-marker panel | rest of 25-marker panel:
        393 390 394 391 385a 385b 426 388 439 389-1 392 389-2 | 458 459a 459b 455 454 447 437 448 449 464a 464b 464c 464d
        Cluster (1)
        65875 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
        112094 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
        194922 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
        102835 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
        108028 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
        85111 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
        72683 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
        54774 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
        87191 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
        19864 (R1b1): 13 24 14 11 11 15 12 12 12 12 13 28 | 18 9 10 11 11 25 15 20 30 15 16 17 17
        169170 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 31 15 16 17 17
        84825 (R1b1): 13 24 14 10 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 16 16 16 17
        39927 (R1b1): 13 24 14 11 11 14 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 15 16 17
        106520 (R1b1): 13 24 14 11 11 15 12 12 12 13 13 29 | - - - - - - - - - - - - -
        Cluster (2)
        161010 (I1): 13 22 14 10 14 16 11 14 11 12 11 29 | 15 8 9 8 11 22 16 20 28 12 14 14 15
        72309 (I1): 13 22 14 10 14 16 11 14 11 12 11 29 | 15 8 9 8 11 22 16 20 28 12 14 14 15
        Cluster (3)
        51216 (R1b1): 13 24 14 11 11 14 12 12 13 13 13 29 | 17 9 10 11 11 25 15 19 29 14 15 17 18
        29479 (R1b1): 13 24 14 10 11 14 12 12 12 13 13 28 | 17 9 10 11 11 25 15 19 29 14 15 16 17
        Cluster (4)
        75606 (R1b1): 13 24 14 10 11 15 12 12 11 13 13 29 | 17 10 11 11 11 24 15 19 29 15 17 17 17
        22971 (R1b1): 13 24 14 10 11 15 12 12 11 13 13 29 | 17 10 11 11 11 24 15 19 29 15 16 15 17
        Singleton
        84049 (R1b1): 13 25 14 10 11 14 12 12 12 12 14 28 | 17 9 10 11 11 25 15 18 30 16 16 16 17
        Key (colours not reproduced), compared with modal value: >2>; 2>; 1>; =; <1; <2; >2<. Bold: fast-moving markers. Small: GD rule differs.
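The colour key above compares each value with the modal value for its marker. Below is a minimal sketch of that comparison. It uses the first 12 markers of Cluster (1), copied from the table. The function names are hypothetical, and ties between equally common values are not treated specially (`Counter.most_common` simply returns one of them).

```python
from collections import Counter

# First 12 markers (DYS393 ... DYS389-2) of Cluster (1), copied from the table above.
CLUSTER_1 = {
    "65875":  [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
    "112094": [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
    "194922": [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
    "102835": [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
    "108028": [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
    "85111":  [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
    "72683":  [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
    "54774":  [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
    "87191":  [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
    "19864":  [13, 24, 14, 11, 11, 15, 12, 12, 12, 12, 13, 28],
    "169170": [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
    "84825":  [13, 24, 14, 10, 11, 15, 12, 12, 12, 13, 13, 29],
    "39927":  [13, 24, 14, 11, 11, 14, 12, 12, 12, 13, 13, 29],
    "106520": [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],
}


def modal_haplotype(haplotypes):
    """Most common value at each marker position across the given haplotypes."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def deviations(haplotype, modal):
    """Signed difference of each marker from the modal value (testee minus mode)."""
    return [value - mode for value, mode in zip(haplotype, modal)]


if __name__ == "__main__":
    modal = modal_haplotype(CLUSTER_1.values())
    print("modal:", modal)
    for kit, haplotype in CLUSTER_1.items():
        diff = deviations(haplotype, modal)
        if any(diff):  # only list kits that differ from the mode somewhere
            print(kit, diff)
```

These are raw per-marker differences. Kit 19864, for example, differs at both DYS389 values. Genetic-distance rules treat DYS389 specially (slide 17), so a count of raw deviations is not the same thing as a GD.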
    16. Matching & Grouping: Definitions
        Large projects need rigorous definition of terms & procedures to determine:
        (1) if two testees are a near match,
        (2) how matching testees are grouped, &
        (3) how groups should be named
    17. Genetic Distance: Example
        Comparison of two 12-marker STR haplotypes
        DYS          393 390 394 391 385a 385b 426 388 439 389i 392 389ii
        Testee A      13  24  14  11  11   15   12  12  12  13   13  29
        Testee B      13  24  15  11  11   15   11  12  10  13   13  29
        difference     0   0   1   0   0    0    1   0   2   0    0   0
        matching markers: 9/12; mismatching markers: 3/12; Genetic Distance: 4/12
        Genetic Distances are useful for educational & illustrative purposes, BUT:
        1. Special rules apply for multi-copy markers: DYS385, 389, 395, 413, 459, 464, CDY & YCAII.
        2. Four different models for calculating GDs: Stepwise; Infinite alleles; FTDNA hybrid, old & new.
        3. GDs take no account of differing average mutation rates for each marker: e.g. av. rate of CDY is 400 times that of DYS494.
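The arithmetic above can be written as a short function. This is a minimal sketch that uses the stepwise model for every marker, with one special case. DYS389ii is compared after subtracting DYS389i, because 389ii is conventionally reported inclusive of 389i. The other multi-copy rules listed above (DYS385, 459, 464, etc.) are not modelled, and neither are the alternative GD models. The function names are hypothetical.

```python
# Marker order of the 12-marker panel, as in the example above.
MARKERS_12 = ["393", "390", "394", "391", "385a", "385b",
              "426", "388", "439", "389i", "392", "389ii"]
I_389I, I_389II = MARKERS_12.index("389i"), MARKERS_12.index("389ii")


def adjust_389(haplotype):
    """Copy of the haplotype with DYS389ii replaced by (389ii - 389i)."""
    adjusted = list(haplotype)
    adjusted[I_389II] -= adjusted[I_389I]
    return adjusted


def genetic_distance(a, b):
    """Stepwise GD: sum of absolute repeat differences, after the DYS389 adjustment."""
    return sum(abs(x - y) for x, y in zip(adjust_389(a), adjust_389(b)))


def matching_markers(a, b):
    """Number of markers with identical reported values."""
    return sum(x == y for x, y in zip(a, b))


if __name__ == "__main__":
    testee_a = [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29]
    testee_b = [13, 24, 15, 11, 11, 15, 11, 12, 10, 13, 13, 29]
    print(matching_markers(testee_a, testee_b), genetic_distance(testee_a, testee_b))
    kit_19864 = [13, 24, 14, 11, 11, 15, 12, 12, 12, 12, 13, 28]
    print(genetic_distance(testee_a, kit_19864))
```

Testee A’s values happen to equal the Cluster (1) modal values on slide 15. Using the DYS389 adjustment, the 12-marker GDs from the mode for kits 19864, 29479 and 84049 come out as 1, 3 and 5, the figures shown on slide 36. Without the adjustment, 19864 would score 2.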
    18. TiP (Time Predictor)
        TiPs
        - allow for different average mutation rates for each marker;
        - are FTDNA’s most sophisticated tool for matching;
        BUT
        - appear complicated and slow;
        - derivation is “opaque”, and liable to be updated;
        - 2 decimal places (e.g. 96.73%) is misleading;
        - limited to FTDNA testees.
    19. “TiP Score”
        TiP Score:
        - simple, arbitrary tool for project management;
        - 24-generation, no-paper-trail TiP at highest available resolution;
        - best available indicator of the probability of two testees sharing a common ancestor within the surname era;
        - avoids problems of Genetic Distances & matrices;
        - nearest whole % (e.g. 97%) sufficient.
    20. Matching
        A “near match” is a rule-of-thumb, arbitrarily chosen, to determine if two participants share a common ancestor within the surname era, i.e. in the last millennium.
        FTDNA list near matches on their personal yDNA “Matches” pages. They use criteria of GD up to 1/12, 2/25, 4/37 or 7/67, sometimes known as the “1, 2, 4, 7 rule”, or “10% rule”.
        Some surname project administrators use other criteria, e.g.
        • GD: “1, 2, 4, 6 rule”, or
        • GD: “0, 2, 3, 5 rule”
        Irwin project:
        • TiP Score: “60% rule” (for Irwins); “95% rule” (for non-Irwins)
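The GD-based rules above are simple per-panel thresholds. This is a minimal sketch that assumes each rule gives the maximum GD still counted as a near match at 12, 25, 37 and 67 markers. No 111-marker thresholds are stated above, so none are included. The names are hypothetical.

```python
# Maximum genetic distance counted as a near match, per panel size (rules as listed above).
RULES = {
    "1, 2, 4, 7": {12: 1, 25: 2, 37: 4, 67: 7},  # FTDNA "Matches" pages
    "1, 2, 4, 6": {12: 1, 25: 2, 37: 4, 67: 6},
    "0, 2, 3, 5": {12: 0, 25: 2, 37: 3, 67: 5},
}


def is_near_match(gd, markers, rule="1, 2, 4, 7"):
    """True if a GD at the given panel size is within the rule's threshold."""
    thresholds = RULES[rule]
    if markers not in thresholds:
        raise ValueError(f"rule {rule!r} defines no threshold for {markers} markers")
    return gd <= thresholds[markers]


if __name__ == "__main__":
    # Threshold as a share of markers: roughly why the FTDNA rule is called the "10% rule".
    for markers, limit in RULES["1, 2, 4, 7"].items():
        print(f"{limit}/{markers} = {limit / markers:.1%}")
    # Kit 84825 (slide 36): GD 5/37 from the mode, yet a TiP Score of 98%.
    print(is_near_match(5, 37))
```

Kit 84825 fails the FTDNA rule at 37 markers despite a 98% TiP Score. This is the “False negative” labelled on slide 36, and the kind of case the Irwin project’s 60% TiP Score rule is meant to catch.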
    21. False Positives & False Negatives
        • FTDNA’s “Matches” pages are useful for newbies, but are in fact an arbitrary compromise:
        • for comparing similar surnames the “10% rule” is too stringent:
            - 7% of Irwins show as “False Negatives” (e.g. 5/37 or 6/37);
            - 60% TiP Score gives better matching.
        • for comparing dissimilar surnames the “10% rule” is too lax:
            - most “Matches” are “False Positives”, i.e. coincidental;
            - 95% TiP Score gives better screening to identify NPEs, especially when confirmed by terminal SNP test, e.g. L555.
    22. Grouping
        Assigning testees to clusters / groups / genetic families:
        Subjective choice of project administrator:
        • by haplogroup (default used in FTDNA public pages) or SNP
        • by genealogical feature, e.g. surname spelling, or place of residence
        • by near matches, e.g. GD matrix; GD from mode; TiP Score from modal participant
        • other features, e.g. rare / idiosyncratic markers, TMRCAs, cladograms, triangulation
    23. Genetic Distance Matrix: Example
        Genetic Distance Matrix of eight 37-marker STR haplotypes
             A   B   C   D   E   F   G   H
        A    -
        B    0   -
        C    1   4   -
        D    0   1   3   -
        E   13   9   8  16   -
        F    7  11   4   9   1   -
        G    3   8  10   8   0   2   -
        H    6   2   9   7   6  10   9   -
        Interpretation: Two genetic families: A, B, C, D and E, F, G. One Singleton: H
        Problems:
        1-3. Problems inherent in Genetic Distance.
        4. Separate matrices necessary for comparing 12, 25, 37, 67 & 111 markers.
        5. Matrices are very cumbersome for large projects.
    24. Irwin project – justification for use of 60% TiP Score
        [Histogram: frequency of TiP Scores against magnitude of TiP Scores from the project modal haplotype]
    25. Irwin project: Definitions
        • Genetic family: 2 or more participants with TiP Scores > 60% (> 95% for dissimilar surnames).
        • Singleton: unassigned Irwin with TiP Score < 60%.
        • TiP Score: 24-generation, no-paper-trail TiP, at highest available resolution, from modal participant: probability of sharing common ancestor with modal participant within the surname era, i.e. probability of being member of genetic family.
        • Modal participant: participant whose genetic signature is the most typical of the members of a genetic family.
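These definitions reduce to a threshold test on the TiP Score. This is a minimal sketch that assumes the TiP percentage from the modal participant has already been read from FTDNA. FTDNA’s TiP calculation itself is not reproduced. The function names are hypothetical, and the example scores are copied from slide 36.

```python
def tip_score(tip_percent):
    """Round an FTDNA TiP percentage (e.g. 96.73) to the nearest whole % as a TiP Score.

    Note: Python's round() rounds exact halves to the nearest even number.
    """
    return round(tip_percent)


def in_genetic_family(score, similar_surname=True):
    """Irwin project rule: TiP Score from the modal participant above 60%
    (above 95% when the testee's surname is dissimilar)."""
    threshold = 60 if similar_surname else 95
    return score > threshold


if __name__ == "__main__":
    # (kit, surname, TiP Score from the Borders modal participant, similar surname?)
    examples = [
        ("84825", "Erwin", 98, True),
        ("106520", "Irvin", 91, True),
        ("39927", "Elliot", 98, False),
        ("51216", "Irving", 16, True),
        ("84049", "Irwin", 2, True),
    ]
    for kit, surname, score, similar in examples:
        print(kit, surname, score, in_genetic_family(score, similar))
    print(tip_score(96.73))
```

A testee who fails the test against one family’s modal participant, such as 51216, is then compared with the other families’ modal participants (51216 belongs to the Orkney family). Only an Irwin who fits none of them is a Singleton.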
    26. Irwin project: Growth
        [Chart, Nov 2005 to Nov 2015: total participants, genetic families and singletons, with penetration levels of 0.04%, 0.07% and 0.12% marked]
    27. TMRCA (Time to Most Recent Common Ancestor)
        Popular tables/graphs can predict no. of generations/years back to the common ancestor of two participants.
        BUT
        • All TMRCAs are probabilities
        • TMRCAs based on genetic distance:
            - assume some single average mutation rate;
            - even the chosen average mutation rate may be incorrect;
            - ignore back mutations;
            - can be very misleading.
    28. TMRCAs: typical margins of error when predicted by Genetic Distance
        Genetic Distance | Most probable TMRCA | 90% of TMRCAs within
        0/37 | 1 generation = 30 years | 0 - 290 years
        1/37 | 3 generations = 90 years | 0 - 450 years
        2/37 | 6 generations = 180 years | 65 - 580 years
        3/37 | 9 generations = 270 years | 110 - 710 years
        4/37 | 12 generations = 360 years | 165 - 825 years
        5/37 | 15 generations = 450 years | 220 - 930 years
        Assumptions: average mutation rate = 0.0042 per generation; 1 generation = 30 years
        Source: www.dna-project.clan-donald-usa.org/tmrca.htm
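To see why the ranges in this table are so wide, the sketch below computes a probability distribution for the TMRCA from a single genetic distance. It is an illustrative model, not the method behind the Clan Donald table, and rests on these assumptions:

- a flat prior over 1 to 60 generations;
- the table’s single average rate of 0.0042 mutations per marker per generation;
- 30 years per generation;
- a Poisson number of mutations along both lines of descent.

Its figures will therefore differ somewhat from the table. The function names are hypothetical.

```python
import math


def tmrca_distribution(gd, markers=37, mu=0.0042, max_generations=60):
    """Probability that the TMRCA is t generations ago, for t = 1..max_generations.

    Assumes a flat prior over t and a Poisson-distributed number of mutations
    with mean 2 * markers * mu * t (two lines of descent back to the ancestor).
    """
    rate = 2 * markers * mu
    weights = [math.exp(-rate * t) * (rate * t) ** gd for t in range(1, max_generations + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def summarise(gd, years_per_generation=30, coverage=0.90, **model):
    """Most probable TMRCA and an equal-tailed interval, all in years."""
    probs = tmrca_distribution(gd, **model)
    most_probable = max(range(len(probs)), key=probs.__getitem__) + 1
    tail = (1 - coverage) / 2
    cumulative, lower, upper = 0.0, None, None
    for t, p in enumerate(probs, start=1):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = t
        if upper is None and cumulative >= 1 - tail:
            upper = t
    return (most_probable * years_per_generation,
            lower * years_per_generation,
            upper * years_per_generation)


if __name__ == "__main__":
    for gd in range(6):
        best, low, high = summarise(gd)
        print(f"{gd}/37: most probable {best} years, 90% between {low} and {high} years")
```

Under these assumptions the most probable TMRCA for 0/37 and 1/37 comes out at 30 and 90 years, as in the table. The 90% ranges differ in detail because the model differs, but in every case they span hundreds of years. This is before any allowance for uncertainty in the mutation rate itself.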
    29. NPEs: synonyms
        • Non-paternal event (from genetics)
        • Non-paternity event
        • Extra paternity event
        • False paternity event
        • False paternity
        • Misattributed paternity
        • Non-patrilineal transmission
        • Male introgression
        • Ancestral introgression
        • Undocumented Adoption
        • Not the Parent Expected
        • Surname discontinuity
        • Surname Discontinuity Event (my preferred term)
    30. NPEs: possible causes
        Narrow definition (used in genetics):
        • Surrogacy: not yet likely in context of genealogy
        • Illegitimacy outside marriage: boy taking maiden name of mother
        • Infidelity within marriage: boy taking surname of mother’s husband
        Wider definition (when surname & DNA don’t match) also includes:
        • Re-marriage: boy taking surname of step-father
        • Adoption, incl. orphan, waif: boy taking surname of guardian
        • Formal name-change: man taking maiden name of wife or mother
        • Informal name-change, or alias: man taking name of farm, trade or mother
        • Anglicisation of Gaelic or foreign surname
        • Error in genealogy
        Similar symptoms, but not an NPE if father didn’t use a hereditary surname:
        • By-name: man taking name of farm, trade or origin
        • Tenant or vassal: man taking surname of landlord or chief
        • Apprentice or slave: man taking surname of master
    31. Manifestations of NPEs
        • Egressions from a genetic family (“e-NPEs”): same DNA, but different surname, e.g. Irwin DNA, but Elliot surname (possibly an Elliot step-father)
        • Introgressions into a genetic family (“i-NPEs”): same surname, but different DNA, e.g. Elliot DNA, but Irwin surname (possibly an Irwin step-father)
        “One project’s e-NPE is another project’s i-NPE.”
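The two definitions above depend on whose point of view is taken, and a short sketch makes that explicit. It assumes the hard part has already been done: deciding which genetic family a testee’s DNA matches (near-match rules, SNP tests). Here that result is simply passed in as the surname of the matched family. The function name and the variant set are hypothetical. The set is taken from the spellings on slide 13 and is not exhaustive.

```python
# Spellings treated as "the project surname" (from slide 13; illustrative, not exhaustive).
IRWIN_VARIANTS = {"irwin", "irvine", "erwin", "ervin", "irving", "irvin", "urwin", "arnwine"}


def npe_type(testee_surname, dna_family_surname, project_variants):
    """Classify a testee from the point of view of one surname project.

    dna_family_surname: surname of the genetic family the testee's DNA matches.
    """
    has_project_surname = testee_surname.lower() in project_variants
    has_project_dna = dna_family_surname.lower() in project_variants
    if has_project_surname and has_project_dna:
        return "no NPE"
    if has_project_dna:
        return "e-NPE"  # egression: project DNA, different surname
    if has_project_surname:
        return "i-NPE"  # introgression: project surname, different DNA
    return "outside project"


if __name__ == "__main__":
    print(npe_type("Elliot", "Irwin", IRWIN_VARIANTS))  # Irwin DNA, Elliot surname
    print(npe_type("Irwin", "Elliot", IRWIN_VARIANTS))  # Elliot DNA, Irwin surname
    # The same Elliot-surnamed testee with Irwin DNA, seen from an Elliot project:
    print(npe_type("Elliot", "Irwin", {"elliot", "elliott"}))
```

The last call returns "i-NPE" for the testee the Irwin project classed as an e-NPE. This is the sense in which one project’s e-NPE is another project’s i-NPE.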
    32. Examples of Irwin / Elliot e-NPEs
        [FTDNA “Matches” page; surnames of near matches:] Elliott, Elliott, Elliott, Elliott, Irving, Erwin, Elliott, Erwin, Nipper, Irvine, McDonald, Armstrong, Irwin, Snowdon
    33. Examples of Elliot / Irwin i-NPEs
        [FTDNA “Matches” page; surnames of near matches:] Elliott, Fairbairn, Fairbairn, Elliott, Elliott, Elliott, Elliott, Farms, Fairbairn, Fairbairn, Fairbairn, Fairbairn, Fairbairn, Fairbairn
    34. Recognising & handling NPEs
        e-NPEs: testee finds near matches with another surname, & asks admin. to join this second surname project. NB Need stringent matching criteria or evidence of NPE.
        i-NPEs: administrator finds near matches with another surname, & creates a new genetic family within his project. NB i-NPEs are a sensitive subject which may disappoint testees, even if they accept the ‘event’ was not necessarily an illegitimacy or infidelity.
        For all NPEs, if cause & date of the ‘event’ are not known, seek evidence that the two surnames were once neighbours.
    35. Irwin project: Results examples (1)
Markers 1-12 (DYS): 393, 390, 394, 391, 385a, 385b, 426, 388, 439, 389-1, 392, 389-2
Markers 13-25 (DYS): 458, 459a, 459b, 455, 454, 447, 437, 448, 449, 464a, 464b, 464c, 464d
ID | Haplogroup | Markers 1-12 | Markers 13-25
Cluster (1)
65875 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
112094 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
194922 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
102835 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
108028 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
85111 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
72683 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
54774 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
87191 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 16 17 17
19864 | R1b1 | 13 24 14 11 11 15 12 12 12 12 13 28 | 18 9 10 11 11 25 15 20 30 15 16 17 17
169170 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 31 15 16 17 17
84825 | R1b1 | 13 24 14 10 11 15 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 16 16 16 17
39927 | R1b1 | 13 24 14 11 11 14 12 12 12 13 13 29 | 17 9 10 11 11 25 15 20 30 15 15 16 17
106520 | R1b1 | 13 24 14 11 11 15 12 12 12 13 13 29 | - - - - - - - - - - - - -
Cluster (2)
161010 | I1 | 13 22 14 10 14 16 11 14 11 12 11 29 | 15 8 9 8 11 22 16 20 28 12 14 14 15
72309 | I1 | 13 22 14 10 14 16 11 14 11 12 11 29 | 15 8 9 8 11 22 16 20 28 12 14 14 15
Cluster (3)
51216 | R1b1 | 13 24 14 11 11 14 12 12 13 13 13 29 | 17 9 10 11 11 25 15 19 29 14 15 17 18
29479 | R1b1 | 13 24 14 10 11 14 12 12 12 13 13 28 | 17 9 10 11 11 25 15 19 29 14 15 16 17
Cluster (4)
75606 | R1b1 | 13 24 14 10 11 15 12 12 11 13 13 29 | 17 10 11 11 11 24 15 19 29 15 17 17 17
22971 | R1b1 | 13 24 14 10 11 15 12 12 11 13 13 29 | 17 10 11 11 11 24 15 19 29 15 16 15 17
Singleton
84049 | R1b1 | 13 25 14 10 11 14 12 12 12 12 14 28 | 17 9 10 11 11 25 15 18 30 16 16 16 17
Key: each value is compared with the modal value (>2> ; 2> ; 1> ; = ; <1 ; <2 ; >2<). Bold: fast-moving markers. Small type: GD (genetic distance) rule differs.
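The key compares each value with the cluster's modal value. As a minimal Python sketch (not the spreadsheet actually used for this slide), the mode can be taken as the most common value at each marker, and each testee's signed differences from it listed. The data are a subset of the Cluster (1) rows above, and the function names are illustrative.

```python
from collections import Counter

# Marker order as listed on the slide (markers 1-12, then 13-25).
MARKERS = ["393", "390", "394", "391", "385a", "385b", "426", "388", "439",
           "389-1", "392", "389-2", "458", "459a", "459b", "455", "454",
           "447", "437", "448", "449", "464a", "464b", "464c", "464d"]

# A subset of the Cluster (1) rows copied from the slide (25-marker kits only).
CLUSTER_1 = {
    "65875":  "13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17",
    "112094": "13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17",
    "194922": "13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17",
    "19864":  "13 24 14 11 11 15 12 12 12 12 13 28 18 9 10 11 11 25 15 20 30 15 16 17 17",
    "169170": "13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 31 15 16 17 17",
    "84825":  "13 24 14 10 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 16 16 16 17",
    "39927":  "13 24 14 11 11 14 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 15 16 17",
}


def modal_haplotype(haplotypes):
    """Most common value at each marker (ties go to the value seen first)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def differences_from_mode(haplotype, mode):
    """Signed difference (testee minus mode) at each marker that differs."""
    return {MARKERS[i]: value - modal
            for i, (value, modal) in enumerate(zip(haplotype, mode))
            if value != modal}


kits = {kit: [int(v) for v in row.split()] for kit, row in CLUSTER_1.items()}
mode = modal_haplotype(list(kits.values()))
print(mode == kits["65875"])                       # True
print(differences_from_mode(kits["19864"], mode))  # {'389-1': -1, '389-2': -1, '458': 1}
```

With this subset, the computed mode is exactly kit 65875's haplotype, the "Modal participant" of the next slide. The sign convention (testee minus mode) is an assumption. The sketch does not reproduce the slide's direction symbols.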
    36. Irwin project: Results examples (2)
ID | | Earliest confirmed paternal ancestor (surname, forename, dates); residence(s) | Haplogroup | No. of markers tested | Genetic distance from mode (/12, /25, /37, /67, /111) | TiP score from modal | Remarks
SCOTTISH BORDERS ("B")
65875 | U | Irwin, Henry E (c1813); Lancaster Co, PA | R1b1 | 67 | -/-/-/-/- | - | Modal participant
112094 | E | Urwin, William (1783-1851); Co. Durham | R1b1 | 67 | 0/0/0/0/- | 100% |
194922 | U | Ervin, John (1715); N.Ireland; SC | R1b1 | 111 | 0/0/0/0/0 | 100% |
102835 | U | Armstrong (1844-1902); Co.Tyrone; OH | R1b1 | 67 | 0/0/0/0/- | 100% |
108028 | U | Irvine, Andrew (1763-1797); Ireland; PA | R1b1 | 37 | 0/0/0/-/- | 100% |
85111 | U | Irwin, Samuel (1736-1783); Lancaster Co, PA | R1b1 | 67 | 0/0/1/1/- | 100% | 5th cousin of 72683
72683 | U | Irwin, Samuel (1736-1783); Lancaster Co, PA | R1b1 | 111 | 0/0/2/2/5 | 99% | 5th cousin of 85111
54774 | U | Irving, William (fl.1484x1506); Bonshaw, Dumfriesshire | R1b1 | 67 | 0/0/2/3/- | 99% |
87191 | S | Irving, Francis (c1568-1633); Dumfries, Dumfriesshire | R1b1 | 67 | 0/0/1/2/- | 99% | brother of 19864
19864 | S | Irving, Francis (c1568-1633); Dumfries, Dumfriesshire | R1b1 | 67 | 1/2/3/4/- | 99% | brother of 87191
169170 | E | Irvine, John (1662-1732); Eskdale, Dumfriesshire | R1b1 | 37 | 0/1/3/-/- | 99% | Mt. Everest line
84825 | U | Erwin, Matthew (c1695); Co.Antrim?; NC | R1b1 | 67 | 1/3/5/5/7 | 98% | False negative
39927 | C | Elliot, Simon (1897-1955); Co.Fermanagh | R1b1 | 37 | 1/2/4/-/- | 98% | e-NPEs
106520 | U | Irvin, Joe (1744); MD | R1b1 | 12 | 0/-/-/-/- | 91% |
NPE Elliot (1) ("NE1")
161010 | U | Irwin, Hiram (1815); Ireland?; IL | I1 | 67 | 13/28/39/55/- | 0% | 100% with Elliots
72309 | U | Irwin, Andrew (1765-1824); Scotland; TN | I1 | 37 | 13/28/40/-/- | 0% | i-NPEs
ORKNEY (1) ("O1")
51216 | U | Irving, Christe (fl. 1468); Shapinsay, Orkney Isles; NY | R1b1 | 37 | 2/6/11/-/- | 16% | Washington Irving
29479 | E | Irvine, George (c1705-1742); Sandwick, Orkney Isles | R1b1 | 37 | 3/6/11/-/- | 18% | author of this paper
IRISH - Munster ("IM")
75606 | U | O'Ciarmhacain/Irwin, Eoin (1785-1845); Limerick, Ireland; NJ | R1b1 | 67 | 2/8/16/19/- | 1% | Gaelic; Catholic
22971 | I | Irwin, William (1840); Limerick, Ireland | R1b1 | 67 | 2/9/17/20/- | 1% |
Singleton
84049 | U | Irwin, William (c1770-c1810); Leinster, Roscommon | R1b1 | 37 | 5/9/16/-/- | 2% |
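The genetic distance columns count how far each testee's STR values lie from the mode. Below is a minimal Python sketch of a simple step-wise count, in which each repeat difference counts as one step. DYS389-2 is compared after subtracting DYS389-1, and that adjustment reproduces the table's figures for kits such as 19864. The mode is taken from kit 65875, the modal participant. The sketch ignores the special rules applied to some multi-copy markers (the "GD rule differs" cells), so it can disagree with the table: for kit 29479 it gives 8 at 25 markers, where the table shows 6. The function name is illustrative.

```python
# Marker order from the results slide (markers 1-12, then 13-25).
MARKERS = ["393", "390", "394", "391", "385a", "385b", "426", "388", "439",
           "389-1", "392", "389-2", "458", "459a", "459b", "455", "454",
           "447", "437", "448", "449", "464a", "464b", "464c", "464d"]


def parse(row):
    return [int(v) for v in row.split()]


# Kit 65875, the "Modal participant" of the Borders family.
MODE = parse("13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17")

KITS = {
    "19864":  parse("13 24 14 11 11 15 12 12 12 12 13 28 18 9 10 11 11 25 15 20 30 15 16 17 17"),
    "169170": parse("13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 31 15 16 17 17"),
    "84825":  parse("13 24 14 10 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 16 16 16 17"),
    "29479":  parse("13 24 14 10 11 14 12 12 12 13 13 28 17 9 10 11 11 25 15 19 29 14 15 16 17"),
}

I1, I2 = MARKERS.index("389-1"), MARKERS.index("389-2")


def genetic_distance(kit, mode, n_markers):
    """Step-wise genetic distance over the first n_markers markers (sketch only)."""
    total = 0
    for k in range(n_markers):
        if k == I2:
            # DYS389-2 includes the DYS389-1 segment: compare it net of
            # DYS389-1 so that one change there is not counted twice.
            total += abs((kit[I2] - kit[I1]) - (mode[I2] - mode[I1]))
        else:
            total += abs(kit[k] - mode[k])
    return total


for kit_id, values in KITS.items():
    print(kit_id, genetic_distance(values, MODE, 12), genetic_distance(values, MODE, 25))
# 19864 1 2
# 169170 0 1
# 84825 1 3
# 29479 3 8
```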
    37. Irwin project: Genetic families
And we thought Irwin was a single-origin surname!
Origin | Genetic families | % of 392 participants | of which e-NPEs
Scotland, Borders* | 1 | 67% | 17%
i-NPEs | 15 | 10% | 0
Aberdeenshire | 1 | 1% | 0
Forfarshire | 1 | 0% | 0
Perthshire | 1 | 1% | 0
Orkney | 2 | 2% | ?1%
Shetland | 1 | 1% | 0
Unknown | 6 | 3% | ?0-3%
Ireland | 4 | 4% | 1%
Germany/Netherlands | 1 | 2% | 0
Africa | 1 | 0% | 0
Singletons | - | 9% | ?
Total | 34 | 100% | 13-16%
*: with 262 members, this is apparently the largest genetic family in any surname project.
    38. Example of triangulation
[Pedigree chart: "Irvings of Orkney, showing the two lines of descent identified by DNA tests". Irvings were first recorded in Orkney in 1369. The chart descends from Crystie Irwing (fl. 1468, d. before 1504) and Magnus (Irving) (fl. 1470, first of Sabay) through the Orkney families of Sabay, Clovigarth, Overgarson, Yesnaby, Shapinsay, Quholm, Skaebreck, Lie and Quoyloo, down to Washington Irving (1783-1859) and the author. Tested descendants (FTDNA kit no., order of testing): 174038 (4th=), 51216 (2nd), 29479 (1st), 169056 (3rd), 174074 (4th=), 199671 (6th). Genetic families: "Orkney 1" and "Orkney 2".]
    39. Irwin project: Geographic origins
Country | Participant's residence | Residence of earliest confirmed paternal ancestor | Historic place of origin of genetic family
Project size | 392 | 392 | 392
USA | 77% | 21% | -
Canada | 6% | 1% | -
Australia, New Zealand | 6% | - | -
England & Wales | 5% | 3% | -
Ireland (NI & Eire) | 1% | 40% | 5%
Scotland | 5% | 23% | 84%
Germany, Netherlands | - | 1% | 2%
Unknown, other | - | 10% | 9%
    40. Irwin project: Scottish ancestral lines as shown by DNA tests
[Timeline chart, 1200 to 1800, showing the Scottish ancestral lines identified by DNA tests: Irvine, Ayrshire; the Borders, branching into the Eskdale (BE), Bonshaw (BB) and Dumfries & Castle Irvine (BD) lines and 11 other lines (BA, Bel, Ber, B9, B10, B14, B15, B16, B17, B23, B29); Drum, Aberdeenshire; Orkney1 and Orkney2; Perth; Shetland.]
    41. Irwin project: Borders family cladogram
    42. Irwin project: The 15 sub-groups of the Borders family (pre-BigY)
- SNP L555 recognised by ISOGG in mid-2012
- 50 tests to date; no “L555-” results by Irwins or NPEs
Code | Defining STR value | No. of members | Scottish origin | NPE surname | Irish ancestors? | Earliest genealogy | L555
BA | mode | 34 | ? | - | Yes | 1700 | Yes
BB | DYS617=11 | 16 | Bonshaw | - | Yes | 1500 | Yes
BD | DYS576=17 | 15 | Dumfries | - | Yes | 1565 | Yes
BE | DYS449=31 | 19 | Eskdale | - | Yes | 1600 | Yes
Bel | DYS442=13 | 6 | ? | Elliot | Yes | 1800 | Yes
Ber | DYS447=27 | 3 | ? | Errand | ? | 1850 | ?
B9 | DYS459b=9 | 11 | ? | - | Yes | 1700 | Yes
B10 | DYS391=10 | 32 | ? | - | Yes | 1700 | Yes
B14 | DYS570=14 | 7 | ? | - | Yes | 1750 | Yes
B15 | DYS534=15 | 4 | ? | - | ? | 1700 | Yes
B16 | DYS438=16 | 5 | ? | - | Yes | 1800 | ?
B17 | DYS570=17 | 18 | ? | - | Yes | 1650 | Yes
B23 | YCAIIb=23 | 7 | ? | - | Yes | 1650 | Yes
B29 | DYS449=29 | 16 | ? | - | Yes | 1650 | Yes
BX | unassigned | 67 | - | - | Yes | var. | (Yes)
Total members: 262; 202 excluding NPEs and testees with <37 markers.
US descendants: Yes, in all 15 sub-groups.
TMRCA (by STRs), as listed on the slide: 1800, 1750, 1050, 850, 1700, 1300, 750, 1200, BC200, 1700.
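The sub-group codes follow the defining STR values in the table's column order. Below is a minimal Python sketch of how a testee might be sorted by those values. The rule for testees matching none, or several, of the defining values is a hypothetical simplification, not the project's actual procedure. The sketch also ignores the genealogical and SNP evidence shown in the table. Marker keys such as "YCAIIb" are illustrative spellings.

```python
# Defining (off-modal) STR values of the Borders sub-groups, as listed on the
# slide. DYS449 and DYS570 each define two different sub-groups.
SIGNATURES = {
    ("DYS617", 11): "BB", ("DYS576", 17): "BD", ("DYS449", 31): "BE",
    ("DYS442", 13): "Bel", ("DYS447", 27): "Ber", ("DYS459b", 9): "B9",
    ("DYS391", 10): "B10", ("DYS570", 14): "B14", ("DYS534", 15): "B15",
    ("DYS438", 16): "B16", ("DYS570", 17): "B17", ("YCAIIb", 23): "B23",
    ("DYS449", 29): "B29",
}


def assign_subgroup(results):
    """Suggest a Borders sub-group code from a dict of marker -> value.

    Hypothetical rule: exactly one defining value -> that sub-group;
    none -> "BA" (modal); more than one -> "BX" (unassigned).
    Markers that were not tested simply count as not matching.
    """
    hits = sorted({code for (marker, value), code in SIGNATURES.items()
                   if results.get(marker) == value})
    if not hits:
        return "BA"
    if len(hits) == 1:
        return hits[0]
    return "BX"


print(assign_subgroup({"DYS391": 11, "DYS449": 31}))  # BE  (DYS391/DYS449 as in kit 169170)
print(assign_subgroup({"DYS391": 10, "DYS449": 30}))  # B10 (DYS391/DYS449 as in kit 84825)
```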
    43. The two types of y-DNA test
STR tests
- metaphor: "individual leaves on a tree"
- used for: comparing genetic signatures
- sequencing: Sanger
- quantification: analogue, expressed as counts of markers
- FTDNA y-tests: 12/25/37/67/111 markers
- use in surname projects: main tool
- secondary data: haplogroup prediction
SNP ('snip') tests
- metaphor: "branches and twigs"
- used for: building the phylogenetic tree
- Sanger sequencing: binary, e.g. L21+ or L21-; FTDNA tests: single SNP, SNP Pack; use in surname projects: haplogroup confirmation
- Next Generation sequencing: probabilistic, expressed as quality of base pairs; FTDNA test: BigY; use in surname projects: advanced tool (haplogroup projects: support)
- secondary data: STR and mt data
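SNP results are binary (e.g. L21+ or L21-) and ordered along the tree, so a testee's most downstream positive SNP can be read off mechanically, which is not possible with STR counts. Below is a minimal Python sketch. It assumes a simplified single path, M269 > L21 > DF13 > Z251 > L555, built from SNPs named on these slides, with intermediate branches omitted. The function name is illustrative.

```python
# A simplified single path through the tree, upstream to downstream, using
# SNPs named on these slides; intermediate branches are omitted.
PATH = ["M269", "L21", "DF13", "Z251", "L555"]


def terminal_snp(calls):
    """Most downstream positive SNP on PATH.

    calls maps SNP name -> True (derived, e.g. "L21+") or False (ancestral,
    "L21-"); SNPs not tested are simply absent. A positive result below a
    negative one cannot lie on this single path, so it raises an error.
    """
    terminal, first_negative = None, None
    for snp in PATH:
        result = calls.get(snp)
        if result is None:
            continue  # not tested
        if result:
            if first_negative is not None:
                raise ValueError(f"{snp}+ lies below {first_negative}-")
            terminal = snp
        elif first_negative is None:
            first_negative = snp
    return terminal


print(terminal_snp({"M269": True, "L21": True, "L555": True}))   # L555
print(terminal_snp({"L21": True, "DF13": True, "Z251": False}))  # DF13
```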
    44. Irwin project: Phylogenetic tree
[Diagram: "Clan Irwin phylogenetic tree as at 1 Nov. 2015, showing tested members of Irwin genetic families in green, and FTDNA's predictions of Irwin genetic families in red." The tree runs from the genetic "Adam" (200,000-300,000 years before present) through haplogroups E, I, J and R, against a time scale in years before present. It continues down the R1b line (M269, L21, DF13, Z251) into the pre-surname era and the Borders sub-groups under L555 (BA, BB, BD, BE, Bel, Ber, B9, B10, B14, B15, B16, B17, B23, B29). See the Borders Irwin phylogenetic tree for L555 BigY results.]
    45. Part 2: BigY and BAM data – use and interpretation
    46. Example of Williamson’s “BigTree” www.ytree.net
    47. Irwin project, ex “BigTree”
[Diagram: excerpt from the Big Tree, from R-P312 through L21, DF13, Z251 and L555 down to the variants shared by, or private to, individual Irwin project testees. Branches are labelled with named SNPs or novel variant positions. Testees are labelled with FTDNA kit number, surname and genetic family code (Borders sub-groups BA, BB, BD, BE, B9, B10, B14, B17, B23, B29 and BX; also NM, IM, NE2 and IU, with NPEs marked).]
    48. Irwin project: BigY goals
Initial goals
• manage and understand BigY results
• set up a cloud account to share project data
Interim goals
• minimise dependence on 3rd-party analysis tools
• focus on our large L555 (“Borders”) genetic family
• facilitate 1 BigY test for each of 10 main sub-groups
• confirm/refine the project phylogenetic tree and TMRCAs
Current goals
• facilitate FTDNA offering a low-cost L555 “SNP Pack” test
• use SNP Pack data to refine individual TMRCAs
NB I am giving low priority to “naming” novel variants and having them placed on the phylogenetic trees of FTDNA and ISOGG, at least until a robust understanding of the structure of L555 sub-branches has emerged.
    49. Example of limitations of algorithm-based analyses of BigY test results: the private SNPs of FTDNA L555 kit no. 65048
Name | Position | Ref. | Variant | FTDNA vcf | FTDNA csv** | FGC analysis*** | YFull | Williamson: incl. in Big Tree?* | "DIY" BAM data: no. of reads | no. of indels | consistency of reads | SNP status
FGC19532 | 8557914 | G | A | Pass, 1 variant | Known SNP, High conf. | Private >95% | B100 | yes | 75 | 0 | 100% | Probable
FGC19534 | 16642304 | G | C | Pass, 1 variant | Known SNP, High conf. | Private >95% | B100G | yes | 48 | 0 | 100% | Probable
FGC19535 | 16956346 | T | G | Pass, 1 variant | Known SNP, High conf. | Private >95% | B100 | yes | 81 | 0 | 100% | Probable
FGC19537 | 18668146 | C | A | Pass, 1 variant | Known SNP, High conf. | Private >95% | C 98 | yes | 47 | 0 | 98% | Probable
FGC19538 | 18775426 | C | T | Pass, 1 variant | Known SNP, High conf. | Private >95% | B100 | yes | 64 | 0 | 100% | Probable
FGC19539 | 19436082 | G | A | Pass, 1 variant | Known SNP, High conf. | Private >95% | C 96 | yes | 40 | 0 | 98% | Probable
- | 18982587 | G | A | - | Novel variant, High conf. | - | - | - | 34 | 0 | 94% | unstable
- | 18982595 | G | A | - | Novel variant, High conf. | - | - | - | 32 | 0 | 97% | unstable
- | 13226006 | C | A | - | - | Private >40% | - | - | 2 | 0 | 100% | possible
- | 13571571 | C | T | - | - | Private >40% | - | - | 2 | 0 | 100% | possible
- | 10064260 | C | T | - | - | Private >40% | - | - | 2 | 0 | 100% | possible
- | 16275572 | C | A | - | - | - | M100 | - | 2 | 0 | 100% | possible
A608 | 7534406 | G | T | - | Known SNP, High conf. * | - | - | - | 94 | 55 | 67% | no
- | 16344316 | TC | T | Pass, 1 variant | - | -/a | - | - | 73 | 0* | 100% | no
CTS10214 | 19328796 | G | T | Rej'd*, 1 variant | - | - | 1 read | - | 1 | 0 | 100% | no
PF3499 | 14624254 | C | T | - | - | - | >1 read | - | 29 | 0* | 100% | no
Notes: *: no BED coverage. **: FTDNA list 73 other novel variants, of which 13 appear to be private to 65048. ***: FGC and YFull's analyses have many more low-confidence private markers. *: AW lists 20 other low-conf. private markers. *: indel in others' high-conf. tests.
    50. Analysis options for BigY test results
FTDNA data: BAM file; vcf file; csv file; Matches
Computerised algorithms ("science"): FGC analysis; YFull analysis; haplogroup projects, e.g. "Big Tree"
Manual refinement ("art"): surname project admins ("DIY")
Detecting & filtering: high-level SNPs; old SNPs; terminal SNPs; intermediate SNPs; novel SNPs; private SNPs; unique SNPs
Quality: regions; SNPs/indels; no. of reads; consistency of reads; compatibility within sub-clade; stability across haplogroup
Phylogenetic trees; TMRCAs
    51. Process for “DIY” BigY analysis
1. Create a project cloud account; upload the VCF, BAM & BAM.BAI files.
2. Identify relevant variants from the CSV & Matches data, Walsh & Williamson (& FGC/YFull analyses, if used).
3. Use the BAM IGV viewer to:
   (1) filter relevant variants:
       A: pre-L21 (shared by all L555 testees)
       B: L21-L555 (shared by all L555 testees)
       C: L555 block (shared only by L555 testees)
       I: intermediate (shared by some L555 testees)
       Pn: private (unique to each testee)
   (2) determine the SNP quality of each variant:
       “Probable” if >10 reads AND consistency >85%
       “possible” if 2-9 reads OR consistency 70-85%
       “No” if 1 read, OR consistency <70%, OR indel, OR unreliable region.
4. Consider the stability of SNP quality vs. that for closely-related BigY testees.
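Step 3(2) is a simple decision rule. Below is a minimal Python sketch of it, checked against the read counts and consistencies shown for kit 65048 on slide 49. The slide's bands leave exactly 10 reads unassigned; this sketch grades that case as "possible", which is an assumption. The sketch grades one testee's call only and does not cover step 4's comparison across testees. The function name is illustrative.

```python
def snp_quality(reads, consistency, is_indel=False, unreliable_region=False):
    """Grade one variant call for one testee, following step 3(2).

    reads: number of reads covering the position.
    consistency: percentage of those reads showing the variant.
    """
    # "No": 1 read, consistency below 70%, an indel, or an unreliable region.
    if reads <= 1 or consistency < 70 or is_indel or unreliable_region:
        return "No"
    # "Probable": more than 10 reads AND consistency above 85%.
    if reads > 10 and consistency > 85:
        return "Probable"
    # Everything left (2-9 reads, or 70-85%) is "possible".
    # Assumption: exactly 10 reads, unassigned on the slide, also lands here.
    return "possible"


# Read counts and consistencies for kit 65048, from the slide 49 table.
print(snp_quality(75, 100))                # FGC19532 -> Probable
print(snp_quality(2, 100))                 # 13226006 -> possible
print(snp_quality(94, 67, is_indel=True))  # A608     -> No
print(snp_quality(1, 100))                 # CTS10214 -> No
```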
    52. BAM analysis example 1: Use of the BAM IGV viewer (www.broadinstitute.org)
    53. BAM analysis example 2: Construct a matrix of relevant variants and closely-related BigY testees
Columns, for each testee: alternative, no. of reads, no. of indels, alt./reads %. Testees: 1 - 22874 Irvine (BX); 2 - 311268 Cunningham (BX); 3 - 65048 Erwin (B10); 4 - 22642 Irving (B17); 5 - 23026 Irvin (B26); 6 - N126337 Irvin (BA).
Rows: variants on the genome, each with its reference and alternative base. Synonyms and positions of named variants (shown in red) are derived from ybrowse (www.ybrowse.isogg.org).
Named variants (position, reference, alternative): CTS11273 23045843 T A; DF13 2836431 A C; FGC19532 8557914 G A; FGC19534 16642304 G C; FGC19535 16956346 T G; FGC19537 18668146 C A; FGC19538 18775426 C T; FGC4341 8757882 A G; L21 15654428 C G; L555 7647335 G T; PF496 13297909 T G; PF6729 10022033 A G; PR1489 14543997 C C; Z16940 22470652 T T; Z16946 8014468 G A; Z16949 7933047 T TAA CA; Z251 8736334 G A.
Unnamed variants (position, reference, alternative): 8531427 C T; 13226006 C A; 13294119 T T; 13801126 A G; 15093112 G A; 15218377 T A; 16561158 A G; 16630774 G A; 17319595 G A; 18982595 G A; 21368012 G A; 21515424 T A; 21782548 T G; 21950915 G T; 22487613 G T; 23898645 T C; 24479734 T C.
Example entries at 21368012: alternatives A, G, G, A, A, A for testees 1-6; testee 6 shows 32 reads, 0 indels, 94%.
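Each cell of the matrix records the base seen, the number of reads and the percentage of reads agreeing. Below is a minimal Python sketch of that summary. It assumes the per-base read counts have already been tallied, for example from the IGV viewer. The function name and the exact read splits are hypothetical.

```python
from collections import Counter


def summarise_position(base_counts):
    """Summarise one matrix cell from per-base read counts at a position.

    base_counts maps a base ("A", "C", "G", "T") to the number of reads
    showing it (hypothetical input, e.g. tallied from the IGV viewer).
    Returns (majority base, total reads, % of reads showing that base).
    """
    total = sum(base_counts.values())
    if total == 0:
        return None, 0, None
    base, count = Counter(base_counts).most_common(1)[0]
    return base, total, round(100 * count / total)


# A read split consistent with testee 6's "32 reads, 94%" at 21368012.
print(summarise_position({"A": 30, "G": 2}))    # ('A', 32, 94)
# A second, purely hypothetical split.
print(summarise_position({"G": 132, "A": 15}))  # ('G', 147, 90)
```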
    54. BAM analysis example 3: Enter BAM data, sort & filter
Testees: 1 - 22874 Irvine (BX); 2 - 311268 Cunningham (BX); 3 - 65048 Erwin (B10); 4 - 22642 Irving (B17); 5 - 23026 Irvin (B26); 6 - N126337 Irvin (BA). Each testee entry: alternative, no. of reads, no. of indels, alt./reads %.
Named variant | Position | Ref. | Alt. | 1 - 22874 | 2 - 311268 | 3 - 65048 | 4 - 22642 | 5 - 23026 | 6 - N126337 | SNP category | Comments
Block B: L21 to L555
L21 | 15654428 | C | G | G 59 0 100 | G 71 0 98 | G 60 0 100 | G 69 0 96 | G 33 0 97 | g 18 0 78
DF13 | 2836431 | A | C | c 3 0 100 | c? 1 0 100 | C 11 0 91 | c 6 0 100 | c 6 0 100 | c 2 0 100 | | Poor qualities - surprising
Z251 | 8736334 | G | A | a 4 - 100 | - | a 6 0 100 | a 14 0 100 | a 2 0 100 | ?a 7 0 57 | | Poor qualities - surprising
Block C: L555
L555 | 7647335 | G | T | T 51 0 100 | T 54 0 98 | T 76 0 100 | T 91 2 100 | T 36 0 100 | t 9 0 78 | Probable
Z16946 | 8014468 | G | A | A 50 0 94 | A 125 0 100 | A 49 0 100 | A 73 0 100 | A 22 0 100 | A 25 0 88 | Probable
Z16940 | 22470652 | T | T | C 53 0 96 | C 52 0 88 | C 72 0 89 | C 44 0 89 | C 53 0 100 | C 59 0 86 | No | Unreliable region
Z16949 | 7933047 | T | TA | T 46 39 100 | T 76 75 95 | T 38 39 100 | T 47 47 100 | T 54 47 100 | T 94 68 100 | No | Indel
Intermediate block
FCG34569 | 21368012 | G | A | A 85 0 100 | G 147 0 90 | G 82 0 100 | A 80 0 99 | A 48 0 98 | A 32 0 94 | Probable
PF496 | 13297909 | T | G | g 71 0 73 | t? 21 0 67 | T 15 0 100 | T 15 0 93 | T 21 0 100 | g 85 0 65 | No | Conflicts with FCG34569
Private block for 1 - 22874
- | 17319595 | G | A | A 23 0 87 | G 24 0 100 | G 27 0 100 | G 58 0 100 | G 24 0 100 | G 78 0 100 | Probable
- | 21782548 | T | G | G 79 0 100 | T 174 0 100 | T 93 0 100 | T 97 0 100 | T 35 0 97 | T 27 0 100 | Probable
PF6729 | 10022033 | A | G | g 7 0 86 | a? 8 0 85 | a 4 0 100 | a 11 0 64 | ?a 6 0 83 | ?a 5 0 60 | possible
Private block for 2 - 311268
- | 8531427 | C | T | C 63 0 100 | T 47 0 98 | C 44 0 100 | C 47 0 100 | C 69 0 100 | C 72 0 100 | Probable
- | 16561158 | A | G | A 17 0 100 | G 34 0 100 | A 23 0 100 | A 41 0 100 | A 14 0 100 | A 16 0 100 | Probable
- | 21515424 | T | A | T 45 0 100 | A 59 0 98 | T 49 0 100 | T 77 0 99 | T 42 0 100 | T 45 0 100 | Probable
- | 21950915 | G | T | G 47 0 100 | T 63 0 94 | G 61 0 100 | G 54 0 100 | G 29 0 100 | G 42 0 100 | Probable
- | 13801126 | A | G | c 1748 10 81 | G 2281 0 89 | c 1144 1 76 | c 1658 7 71 | ?c 1083 28 57 | ?c 1676 53 63 | No | Indel
Private block for 3 - 65048
FGC19532 | 8557914 | G | A | G 59 0 100 | G 99 0 98 | A 75 0 100 | G 93 0 100 | G 31 0 100 | G 101 0 100 | Probable
FGC19534 | 16642304 | G | C | G 58 0 100 | G 77 0 100 | C 48 0 100 | G 67 0 100 | G 45 0 100 | G 21 0 100 | Probable
FGC19535 | 16956346 | T | G | T 90 0 100 | T 139 0 95 | G 81 0 100 | T 53 0 100 | T 87 0 100 | T 102 0 100 | Probable
FGC19537 | 18668146 | C | A | C 29 0 100 | C 53 0 100 | A 47 0 98 | C 64 0 100 | C 21 0 100 | C 44 0 100 | Probable
FGC19538 | 18775426 | C | T | C 59 0 100 | C 128 0 100 | T 64 0 100 | C 58 0 100 | C 48 0 100 | C 18 0 100 | No | Appears elsewhere in L21
- | 13226006 | C | A | c 4 0 100 | c 4 0 100 | a 2 0 100 | c 6 0 100 | c? 1 0 100 | C 31 0 100 | possible
Private block for 4 - 22642
- | 16630774 | G | A | G 65 0 100 | G 44 0 100 | G 42 0 98 | A 32 0 100 | G 59 0 100 | g 6 0 100 | Probable
- | 22487613 | G | T | G 119 0 98 | G 127 0 93 | G 101 0 99 | T 67 0 88 | G 205 0 99 | G 184 0 100 | Probable
PR1489 | 14543997 | C | C | c 4 0 100 | - | c? 1 0 100 | a 2 0 100 | c 8 0 100 | c 5 0 80 | possible
Private block for 5 - 23026
- | 15218377 | T | A | T 22 0 100 | T 41 0 100 | T 31 0 100 | T 51 0 100 | A 10 0 100 | T 40 0 100 | Probable
- | 24479734 | T | C | T 91 0 100 | T 143 0 100 | T 80 0 100 | T 51 0 100 | C 58 0 98 | T 72 0 100 | Probable
FGC4341 | 8757882 | A | G | A 24 0 100 | A 45 0 98 | A 35 0 100 | A 51 0 100 | g 9 0 100 | a 4 0 100 | possible | Note marginal no. of counts
Private block for 6 - N126337
- | 23898645 | T | C | t 56 0 84 | t 109 0 78 | t 71 0 80 | t 90 0 71 | t 45 0 80 | C 27 0 85 | Probable
- | 15093112 | G | A | G 98 0 100 | G 74 0 99 | G 76 0 100 | G 34 0 100 | G 104 0 100 | a 137 0 84 | possible | Note marginal consistency
- | 13294119 | T | T | C 32 0 100 | C 35 0 100 | C 25 0 92 | c 74 0 62 | C 18 0 100 | t 10 0 70 | possible
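The sort into blocks depends only on which testees carry the alternative base. Below is a minimal Python sketch of that step. It uses the six testees in the slide's column order and calls copied from the table above. It cannot tell blocks A, B and C apart, because all three are shared by every L555 testee; known SNPs such as L21 and L555 are still needed for that. The function name is illustrative.

```python
# Testee kit numbers in the slide's column order (1-6).
TESTEES = ["22874", "311268", "65048", "22642", "23026", "N126337"]


def classify_variant(alt, calls):
    """Sort one variant into a block by which testees carry the alternative base.

    calls: the base entered for each testee, in TESTEES order, as on the
    slide (case and "?" markers ignored); None where there is no BAM data.
    """
    covered = [kit for kit, base in zip(TESTEES, calls) if base is not None]
    derived = [kit for kit, base in zip(TESTEES, calls)
               if base is not None and base.strip("?").upper() == alt.upper()]
    if not derived:
        return "not derived"
    if len(derived) == len(covered):
        return "shared"  # block A, B or C: known SNPs are needed to tell which
    if len(derived) == 1:
        return "private to " + derived[0]
    return "intermediate: " + ", ".join(derived)


print(classify_variant("T", ["T", "T", "T", "T", "T", "t"]))  # L555: shared
print(classify_variant("A", ["A", "G", "G", "A", "A", "A"]))  # FCG34569: intermediate
print(classify_variant("A", ["A", "G", "G", "G", "G", "G"]))  # 17319595: private to 22874
```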
    55. L555 BAM analysis results
BigY L555 data as of 21 Oct 2015, by James Irvine, based on initial work by Dennis Wright.
Key, by source:
James Irvine (.BAM data): A if >85% AND no. of reads >10; a if 70-84% OR no. of reads 2-9; a? if no. of reads 1. Italics in cols. G & H: shared SNPs which DW ignores, additional to DW.
Dennis Wright: A: capitals, tested; A: .bam, not seen in vcf, "Good"; a: .bam, not seen in vcf, "Weak"; ?: inconclusive, 1 or 2 samples, multiple bases; a/-: no .bam test result; A: private to individual; T: inconclusive SNP.
FTDNA VCF(1): A if quality >500; a if quality <500; g: rejected, "1", qual. >500; -: rejected, "1", qual. <500; ?: inconclusive, "0/1".
FTDNA VCF(2): P pass; R rejected; 0 ancestral; 1 derived.
FTDNA CSV: n novel; k known; H high conf.; M med. conf.; u unknown conf.
Alex Williamson: y included as per DW; p private, not terminal; ? private, "?"; ";" 2 entries for 1 SNP!
Mike Walsh (1): 9 tree, official; 8 tree, draft; 7 public, consistent; 6 public, semi-consistent; 4 public, unsure; 3 multi family/surname; 2 single family/surname; 1 single individual; -1 unstable, confirmed.
FGC: S shared, 99, 95%; s shared, 40%; P private, 99, 95%; p private, 40%; * private, 10%.
YFull: - no entry; m >1 read; s 1 read.
Stage/block (all sources): A: Adam to L21, shown at foot of table; B: L21 to L555; C: L555; intermediate: between C and P; P1, P2, P3 ...: private (unique to 1 test).
Unstable region: 22216800-22512940 (T Krahn).
0/1 1 & R 1 - 22874 2 - 311268 6 - N126337 7-54774 8 - 364399 9 - 280156 10 - 87191 11- 160045 12 - 280599 Irvine - BX Cunningham - BX Erwin - B10 Irving - B17 Irvin - B26 Irvin - BA Irving - BB Ervin - BE Ervin - B23 Irving - BD Irwin - B9 Irvin - B14 SNP (Variant/ Indel) Remarks Stage/Block Position b37 Reference Alternative Alternative reads Indels Derived/reads% vcf(1)FTDNA vcf(2)FTDNA csvFTDNA AWilliamson MWalshStage FGC YFull Alternative reads Indels Derived/calls% vcf(1)FTDNA vcf(2)FTDNA csvFTDNA AWilliamson MWalshStage FGC Alternative reads Indels Derived/calls% vcf(1)FTDNA vcf(2)FTDNA csv AWilliamson MWalshStage FGC YFull Alternative reads Indels Derived/calls% vcf(1)FTDNA vcf(2)FTDNA csvFTDNA csv:N:NovelV.;H:HighConf. MWalshStage Alternative reads Indels Derived/calls% vcf(1)FTDNA vcf(2)FTDNA csv MWalshStage Alternative reads Indels Derived/calls% MWalshStage Alternative reads Indels Derived/calls% MWalshStage Alternative reads Indels Derived/calls% MWalshStage Alternative reads Indels Derived/calls% MWalshStage Alternative reads Indels Derived/calls% MWalshStage Alternative reads Indels Derived/calls% MWalshStage Alternative reads Indels Derived/calls% MWalshStage MWalsh-Total Block B: L21 to L555 L21/S145/M529 B 15654428 C G G 59 0 100 y 9 G 71 0 98 y 9 G 60 0 100 9 G 69 0 96 G 1P G kH 9 G 33 0 97 9 g 18 0 78 9 G 46 0 100 G 60 0 100 G 50 0 100 G 61 0 100 G 26 0 96 G 67 0 94 DF13/S521/CTS241 b 28364318 A C c 3 0 100 c 1P - y 9 c? 1 0 100 c 1R y - c 11 0 91 y 9 c 6 0 100 C kH 9 c 6 0 100 9 c 2 0 100 9 c 7 0 86 c 8 0 100 c 7 0 88 C 18 0 100 c 4 0 100 c 5 0 100 Z251/S470 b 8736334 G A a 4 - 100 a 1R ku y S m - - y ? a 6 0 100 y s m a 14 0 100 - 1R ? 
k?u a 2 0 100 ?a 7 0 57 a 7 0 100 a 6 0 83 a 8 0 100 a 9 0 100 a 1 0 100 A 14 0 100 Z18600 FGC only, not covered by BigY 25633952 G A Z16943 B 6351101 T A A 46 0 100 A 1P nH y 7 - - A 62 0 97 A 1P nH y 7 - A 51 0 100 nH y 7 - - A 74 0 100 A nH 7 A 53 0 96 A 1P nH 7 A 71 0 90 7 A 69 0 100 A 66 0 87 A 77 0 100 A 107 0 97 A 75 0 100 A 80 0 100 Z16944 DW had as P1 B 7527372 G A a 37 0 84 - -! y;p? - - - A 24 0 100 A 1P nH y 7 P A 26 0 100 nH y 7 P - A 29 0 100 A 1P A kH 7 A 45 0 100 A 1P nH 7 A 80 0 90 7 A 48 0 100 A 40 0 98 A 66 0 95 A 61 0 98 A 67 0 A 34 0 100 CTS4157/S3741brother of Z16944 (AW); public block? B 15439136 G A G 15 0 100 G 0P kH - - - - G 18 0 100 g 0P kH - - - G 25 0 100 kH y - - - G 38 0 100 G 0P G kH - ?g 4 0 100 g 0P - - - g 6 0 100 G 14 0 100 G 10 0 100 G 10 0 100 g 5 0 100 G 17 0 100 FGC13746 public block withFGC7549? (Donatella) B 9375616 G T T 38 0 97 T 1P nH - 4 - - T 112 0 99 T 1P nH - 4 - T 45 0 100 nH - 4 - - T 64 0 100 T 1P T nH 4 T 38 0 100 T 1P nH 4 T 17 0 82 - T 40 0 98 T 48 0 98 T 36 0 92 T 53 0 100 T 45 0 100 T 59 0 100 FGC8673 public block withFGC7549? (Donatella) B 9852985 A G G 19 0 100 nH y 4 - - g 114 0 75 nH y 4 - G 52 0 100 nH y 4 - - G 38 0 97 G nH 4 G 14 0 100 nH 4 ?g 5 0 40 - g 12 0 83 G 10 0 100 g 7 0 100 G 20 0 100 G 59 0 100 G 10 0 100 -AW found 2015H1 B 22424486 A A A 88 0 86 A 98 0 100 A 61 0 95 A 67 0 97 A 123 0 86 a 92 0 85 a 62 0 84 A 78 0 90 A 88 0 85 A 83 0 85 A 218 0 94 A 61 0 90 Block C: L555 L555/S393 C 7647335 G T T 51 0 100 kH y 7 - m T 54 0 98 T 1P kH y 7 - T 76 0 100 T 1P kH y 7 - m T 91 2 100 T 1P T kH 7 T 36 0 100 T 1P 7 t 9 0 78 - T 35 0 100 T 52 0 100 T 61 0 100 T 43 0 100 T 25 0 94 T 52 0 100 L557/S394 DB omission? 
C 22513691 C G G 54 0 100 G 1P kH y 7 P m G 106 0 95 G 1P kH y 7 P G 68 0 100 G 1P kH y 7 P m G 80 0 100 G 1P G kH 7 G 41 0 98 G 1P 7 ?c 12 0 58 - G 76 0 99 G 73 0 100 G 75 0 99 G 88 0 100 G 55 0 93 G 61 0 100 Z16945 C 7536923 A G G 29 0 100 G 1P nH y 7 - - G 38 0 84 nH y 7 - G 28 0 100 nH y 7 - - G 34 0 100 G 1P nH 7 G 37 0 97 nH 7 g 10 0 76 - G 26 0 96 G 31 0 97 G 43 0 95 G 39 0 100 G 76 0 99 G 18 0 100 Z16946 C 8014468 G A A 50 0 94 nH y 7 - - A 125 0 100 nH y 7 - A 49 0 100 nH y 7 - - A 73 0 100 A nH 7 A 22 0 100 nH 7 A 25 0 88 7 A 33 0 100 A 54 0 95 A 51 0 96 A 75 0 100 A 62 0 98 A 49 0 100 Z16929 c 13493784 A G G 29 0 97 nH y 7 - - G 69 0 97 nH y 7 - G 35 0 100 nH y 7 - - G 45 0 100 G nH 7 g 4 0 100 - - - G 16 0 94 G 21 0 100 G 23 0 100 G 30 0 100 G 10 0 100 G 27 0 100 Z16930 C 15625978 A G G 51 0 100 G 1P nH y 7 - - G 52 0 92 G 1P nH y 7 - G 45 0 100 nH y 7 - - G 102 0 97 G 1P G nH 7 G 35 0 100 nH 7 g 4 0 100 - G 71 0 100 G 78 0 99 G 101 0 96 G 106 0 100 G 49 0 98 G 80 0 98 Z16931 C 16433477 T C C 52 0 100 nH y 7 - - C 80 0 99 nH y 7 - C 60 0 100 nH y 7 - - C 39 0 97 C nH 7 C 53 0 98 nH 7 C 76 0 86 7 C 39 0 92 C 43 0 91 C 78 0 99 C 89 0 100 C 49 0 98 C 51 0 100 Z16932 C 17236526 C T T 34 0 100 nH y 7 - - T 65 0 100 nH y 7 - T 60 0 95 nH y 7 - - T 39 0 100 T nH 7 T 24 0 100 nH 7 t 25 0 84 - T 32 0 97 T 42 0 98 T 46 0 98 T 50 0 100 T 23 0 94 T 27 0 100 Z16933 C 17438536 G C C 25 0 100 nH y 7 - - C 24 0 100 nH y 7 - C 26 0 100 nH y 7 - - C 26 0 100 C nH 7 C 15 0 100 nH 7 t? 
1 0 - C 19 0 100 C 25 0 96 C 25 0 100 C 21 0 100 C 23 0 100 C 30 0 100 Z16934 C 17448751 G C C 16 0 100 nH y 7 - - C 19 0 100 nH y 7 - C 21 0 100 nH y 7 P - C 28 0 100 C nH 7 c 5 0 100 - C 15 0 87 - C 17 0 100 C 22 0 95 C 22 0 100 C 35 0 100 c 2 0 100 C 13 0 100 Z16935 C 17612482 C T T 46 0 95 nH y 7 - - T 145 0 97 nH y 7 - T 91 0 100 nH y 7 P - T 91 0 99 T nH 7 T 44 0 100 nH 7 T 46 0 89 7 T 31 0 97 T 61 0 97 T 77 0 100 T 64 0 98 T 103 0 99 T 60 0 100 S20749 C 18171989 C T T 40 0 95 nH y 7 - - T 30 0 97 nH y 7 - T 36 0 100 nH y 7 - - T 69 0 100 T nH 7 T 41 0 100 nH 7 t 35 0 74 - T 48 0 100 T 57 0 100 T 63 0 98 T 75 0 100 T 28 0 100 T 49 0 96 Z16936 C 19094859 T C C 26 0 100 nH y 7 - - C 61 0 97 nH y 7 - C 57 0 100 nH y 7 - - C 60 0 98 C nH 7 C 19 0 89 nH 7 C 22 0 91 7 C 37 0 97 C 51 0 100 C 35 0 100 C 53 0 92 C 15 0 100 C 38 0 100 Z16937 C 19200522 G T T 71 0 99 nH - 7 - - T 109 0 97 nH - 7 - T 97 0 98 nH y 7 P - T 83 0 100 T nH 7 T 50 0 100 nH 7 t 101 0 85 7 T 64 0 98 T 112 0 100 T 87 0 100 T 96 0 100 T 62 0 89 T 63 0 100 Z16938 C 19548026 G A A 38 0 97 nH - 7 - - A 103 0 97 nH - 7 - A 52 0 100 nH y 7 P - A 77 0 100 A nH 7 A 33 0 100 nH 7 a 50 0 84 7 A 50 0 100 A 71 0 100 A 58 0 95 A 69 0 97 A 36 0 100 A 59 0 98 Z16939 C 21810487 A G G 69 0 99 nH y 7 - - G 84 0 98 nH y 7 - G 75 0 100 nH y 7 - - G 63 0 98 G nH 7 G 70 0 100 nH 7 G 110 0 85 7 G 90 0 90 G 102 0 98 G 107 0 93 G 115 0 99 G 66 0 98 G 84 0 100 Z16942 C 23130578 T A A 50 0 96 nH y 7 - - A 38 0 100 nH y 7 - A 55 0 98 nH y 7 - - A 45 0 100 A nH 7 A 22 0 100 nH 7 a 53 0 75 - A 27 0 100 A 56 0 98 A 58 0 98 A 57 0 95 A 14 0 100 A 52 0 98 Z17660 C 8877028 G C C 13 0 100 nH y 3 p - C 12 0 100 c 1P nH y 3 p c 4 0 100 c 1P -! y? 
- p - C 16 0 100 c 1P C nH 3 c 6 0 100 - - - ?c 13 0 69 - c 12 0 83 C 14 0 100 c 23 0 83 C 20 0 100 c 7 0 100 c 8 0 100 FGC19531 csv had both Novel & Known!; AW had P3c 6643803 C T t 8 0 100 - kH - - P - t 8 0 100 kH - - P t 9 0 100 nH Y 2 P - T 16 0 100 t 1P T nH 2 T 15 0 100 nH 2 t 5 0 80 - T 14 0 100 T 13 0 100 T 13 0 100 T 21 0 100 t 5 6 0 T 11 0 100 FGC19536 c 17576040 G C c 7 0 86 - - - c 2 0 100 - - - c 7 0 100 - - C 12 0 100 c 1P C nH 1 c 6 0 100 - c 7 0 57 - C 11 0 100 c 9 0 100 c 9 0 100 c 9 0 100 c 2 0 100 C 12 0 100 Z16940 n 22470652 T T C 53 0 96 C 1P nH y 7 - - C 52 0 88 nH y 7 - C 72 0 89 nH y 7 - - C 44 0 89 C nH 7 C 53 0 100 c 1P nH 7 C 59 0 86 7 C 36 0 97 C 35 0 86 C 39 0 87 C 55 0 93 C 119 0 96 C 26 0 88 Z16941 n 22470900 C G G 44 0 98 nH y 7 - - g 45 0 84 nH y 7 - G 18 0 100 nH y 7 - - G 31 0 97 G nH 7 G 62 0 98 G 1P nH 7 G 61 0 89 7 G 35 0 91 G 49 0 100 G 47 0 94 G 41 0 100 G 96 0 99 G 38 0 100 L561 AW has P3 FGC16164 is 2888667-672n 2888667-70 C C c 6 2 100 - c 2 4 100 - - 0 13 0 - - m c 2 14 100 - c 6 3 100 c 0P C 15 8 100 c 6 18 100 C 11 8 100 C 14 10 100 C 14 10 100 c 9 3 100 c 5 15 100 Z16947 Indel? n 18680368 T TA T 50 0 100 - - ? 3 - T 90 83 96 TA 1P 3 T 49 47 100 - T 85 0 100 - - T 31 0 100 - t 7 0 100 - T 54 0 100 T 84 0 98 T 60 0 100 T 72 0 100 T 31 0 100 T 59 0 100 Z16948 Indel? n 21613125 TA T T 49 0 100 - - ? 
3 - - T 90 0 100 T 1P 3 T 90 47 100 - - - T 79 0 100 - - - T 37 0 97 - T 41 0 100 - T 65 0 100 T 79 0 100 T 86 0 100 T 88 0 100 T 39 0 100 T 56 0 100 Z16949 MW: long indel n 7933047 T TAA CA T 46 39 100 ta 1P - y 7 - - T 76 75 95 TA 1P - y 7 - T 38 39 100 TA 1P - y 7 - - T 47 47 100 TA 1P - - 7 T 54 47 100 ta 1P - 7 T 94 68 100 7 T 76 68 100 T 113 100 100 T 124 ### 100 T 125 108 100 T 48 45 100 T 94 0 88 MW: short indel n 16344311 TT T T 39 0 100 t 1P - y 3 - - T 110 0 95 T 1P - y 3 - T 10 0 100 t 1P - y - - - T 77 0 100 t 1P - - 3 T 30 1 100 - - - T 23 0 100 - T 34 0 100 T 35 2 100 T 31 0 100 T 39 0 100 T 47 0 100 T 26 0 100 AW has P3 MW: short indel n 16344316 TCT T T 39 0 100 t 1P - y 3 -/a - T 106 0 93 T 1P - y 3 -/a T 73 0 100 t 1P - y;y? - -/a - T 77 0 100 t 1P - - 3 t 5 25 100 - - - t 7 15 100 - t 3 31 100 t 6 30 100 t 5 26 100 T 8 0 29 t 8 39 100 T 5 0 21 ?covered by 18680368? n 18680369 A AA A 52 45 100 2 A 89 86 98 - A 48 47 100 - A 86 78 100 2 A 33 29 100 2 a 7 5 100 - A 55 38 100 A 65 53 100 A 64 56 100 A 79 67 100 A 31 28 100 A 61 55 100 Indel; AW had P1MW: homopolymer n 21613126 AA A A 49 1 100 a 1P - Y 2 - A 10 69 100 - - - a 4 86 100 - - A 79 0 100 - 2 A 37 0 100 2 A 41 0 100 - A 65 0 100 A 79 1 100 A 86 0 100 A 89 1 100 A 39 2 100 A 56 0 100 AW had P2 MW: long indel n 14750280 ACCA GTGT A A 13 0 100 - A 16 0 100 a 1P Y - 2 - a 4 0 100 - - A 10 0 100 - - A 22 0 100 2 a 4 0 100 - A 12 0 100 A 17 0 100 A 15 0 100 A 15 0 100 a 9 0 100 A 13 0 100 FGC16164 Indel; AW had P3MW: long indel n 2888666 CCTG G C c 8 0 100 - - - I -del c 7 0 100 - - -I -del C 13 0 100 Y 1 I -del C 16 0 100 - - c 9 0 100 - C 23 0 96 1 C 24 0 100 C 20 0 100 C 24 0 100 C 18 0 100 C 12 0 100 C 20 0 100 Indel? MW: homopolymer n 6347814 G GAG AA g? 
    [Table: "DIY" BAM analysis of BigY results for 12 Border Irwin kits, giving each kit's base call, read count and consistency percentage at each Y position. Row groups cover homopolymers and indels; intermediate SNPs (FCG34569, PF506, PF6812, CTS11841, PF682, PF496, CTS12439); Block P1, private SNPs for kit 22874 (including PF6729, PF6730, CTS6916, S25968, PF3498); and Block P2, private SNPs for kit 311268.]
    56. L555 Phylogenetic Tree based on "DIY" BAM analysis
    [Figure: tree of SNPs, indels and homopolymers below L555 for the 12 BigY kits (280599, 22874, N126337, 23026, 54774, 280156, 226426, 160045, 65048, 364399, 311268, 87191). It shows the shared L555 block (L555, S393, S394, L557, L561 and numerous Z- and FGC-named SNPs), the intermediate SNP FCG34569, and each kit's private SNPs. Kits are labelled with surname (Irvin, Irvine, Irving, Irwin, Erwin, Ervin, Cunningham) and sub-group (B14, BX(I), BA, B29, BB, B23, B17, B9, B10, BE, BX(C), BD).]
    57. Deriving TMRCAs from BigY tests
    TMRCAs derived from SNPs are easy to calculate:
    TMRCA in years = no. of SNPs x av. no. of years per SNP
    BUT:
    • all TMRCAs are probabilities;
    • TMRCAs from a single test have wide confidence limits; confidence improves if several TMRCAs can be averaged;
    • there are difficulties specific to SNP-based TMRCAs:
      - "av. years per SNP" depends on the type of NGS test (FTDNA use "av. 120 years per SNP");
      - there is no uniformity on what constitutes a relevant SNP, so I use:
        TMRCA in years = ∑(probable SNPs + 0.5 possible SNPs)/n x 120, where n is the number of BigY tests averaged.
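    A minimal Python sketch of the formula above, for readers who want to apply it to their own BigY results. The function name and the (probable, possible) input format are illustrative assumptions and not part of any FTDNA tool. The weights (1 for a probable SNP, 0.5 for a possible one) and the 120 years per SNP come from the slide.

```python
def snp_tmrca_years(counts, years_per_snp=120.0, possible_weight=0.5):
    """Average SNP-based TMRCA, in years, over n BigY testers.

    counts: one (probable, possible) pair of SNP counts per tester,
            each counted from the common ancestor down to that tester.
    years_per_snp: FTDNA's quoted average of 120 years per SNP.
    possible_weight: each "possible" SNP is counted as half a SNP.
    """
    if not counts:
        raise ValueError("need at least one tester")
    # Weighted SNP count for each tester: probable + 0.5 * possible
    weighted = [probable + possible_weight * possible for probable, possible in counts]
    # Average over the n testers, then convert SNPs to years
    return sum(weighted) / len(weighted) * years_per_snp


# Hypothetical example: two testers below the same ancestor
print(snp_tmrca_years([(6, 4), (5, 0)]))  # 780.0
```

    The function returns a single point estimate and no confidence limits. For that reason it is subject to the slide's caveat that a TMRCA from one test is only a rough guide.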
    58. Irwin project: L555 TMRCAs (1): Age of L555 block
    [Figure: timeline at 120 years per SNP. R-L21 runs via DF13 and the L21 starburst (DF21, DF41, DF49, Z251, L1335, S1026, Z1026, ZZ10 and others) to Z16943 and Z16944: 5 SNPs, 600 years, c. BC1700. The L555 block/bottleneck (L555 + 19 other probable SNPs = 20 SNPs) spans 2400 years in the pre-surname era. The Border Irwins starburst, at the start of the surname era, averages 5.5 SNPs, 650 years, c. AD1300, across the 12 BigY kits (280599, 22874, N126337, 23026, 54774, 280156, 226426, 160045, 65048, 364399, 311268, 87191). Per-kit totals of probable plus possible SNPs come to "say" 11, 7.5, 5.5, 5.5, 5, 5, 3.5, 2, 8, 5.5, 4 and 3 SNPs.]
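    The slide's timeline can be reproduced by walking back one block at a time. This sketch assumes ages are counted back from about 1950, roughly the testers' birth years. The slide does not state a reference year, but 1950 minus its 650 years gives its AD1300. The helper names are hypothetical.

```python
# Hypothetical sketch of the slide's timeline arithmetic.
# ASSUMPTION: ages are counted back from c. 1950 (roughly the testers'
# birth years); the slide itself does not give a reference year.
REFERENCE_YEAR = 1950
YEARS_PER_SNP = 120  # FTDNA's BigY average, as quoted on slide 57


def calendar(year: int) -> str:
    """Format a year: positive as AD, otherwise as BC (approximate, no year-0 correction)."""
    return f"AD{year}" if year > 0 else f"BC{-year}"


def date_chain(blocks, reference_year=REFERENCE_YEAR, years_per_snp=YEARS_PER_SNP):
    """Walk back through (label, snp_count) blocks, youngest block first.

    Returns (label, duration_in_years, date_of_older_end) for each block.
    """
    year = reference_year
    rows = []
    for label, snps in blocks:
        duration = round(snps * years_per_snp)
        year -= duration
        rows.append((label, duration, calendar(year)))
    return rows


blocks = [
    ("Border Irwins starburst (av. SNPs per kit)", 5.5),
    ("L555 block/bottleneck", 20),
    ("R-L21 to Z16944", 5),
]
for label, duration, date in date_chain(blocks):
    print(f"{label}: {duration} years, back to {date}")
```

    At exactly 5.5 SNPs the starburst lasts 660 years, giving AD1290, BC1110 and BC1710. The slide rounds these to 650 years, AD1300 and BC1700.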
    59. Irwin project: L555 TMRCAs (2): Ages of individual members
    [Figure: the slide 58 timeline annotated with dates for the 12 BigY kits. "DIY" BigY TMRCAs: c.630, c.1050, c.1230, c.1230, c.1350, c.1350, c.1530, c.1700, c.750, c.1300, c.1350, c.1600. STR TMRCAs, 2011: c.750, c.1800, c.1700, BC200, c.1200, c.1700, c.1000, c.1050, c.1450, c.1750. Earliest genealogy: c.1750, c.1780, c.1700, c.1650, c.1500, c.1650, c.1750, c.1700, c.1700, c.1600, c.1850, c.1565.]
    60. (3): Age of L555 block by other SNP criteria
    Criterion | Per-kit SNPs (12 BigY kits) | Av. SNPs | Years @120 per SNP | Approx. date
    TMRCA = (∑(probable SNPs + 0.5 possible SNPs)/12) x 120 | 11, 7.5, 5.5, 5.5, 5, 5, 3.5, 2, 8, 5.5, 4, 3 | 5.5 | 650 | AD1300
    SNPs as per Williamson's Big Tree | 11, 5, 4.5, 6, 5, 5, 3, 2, 7, 5, 4, 4 | 5.1 | 615 | AD1335
    TMRCA = (∑(probable SNPs)/12) x 120 | 11, 4, 3, 6, 5, 5, 3, 2, 6, 5, 4, 3 | 4.8 | 570 | AD1380
    SNPs as per ISOGG Y Tree criteria | 11, 4, 4, 5, 1, 5, 2, 2, 6, 5, 4, 3 | 4.3 | 520 | AD1430
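    The four rows of this table can be recomputed from the per-kit counts. Doing so makes concrete the point made in slide 61: the same 12 tests give dates more than a century apart, depending on which SNPs are counted. This is a sketch only. The function name is hypothetical, and the reference year of 1950 is an assumption that reproduces the slide's AD dates; the slide itself does not state one.

```python
from statistics import mean

# Per-kit SNP counts for the 12 BigY kits, as listed on the slide
PER_KIT_SNPS = {
    "probable + 0.5 possible": [11, 7.5, 5.5, 5.5, 5, 5, 3.5, 2, 8, 5.5, 4, 3],
    "Williamson's Big Tree": [11, 5, 4.5, 6, 5, 5, 3, 2, 7, 5, 4, 4],
    "probable only": [11, 4, 3, 6, 5, 5, 3, 2, 6, 5, 4, 3],
    "ISOGG Y Tree criteria": [11, 4, 4, 5, 1, 5, 2, 2, 6, 5, 4, 3],
}


def criterion_tmrcas(table, years_per_snp=120, reference_year=1950):
    """For each criterion return (av. SNPs, years, approx. AD date).

    ASSUMPTION: reference_year=1950 is inferred, not stated on the slide.
    """
    results = {}
    for name, counts in table.items():
        avg = mean(counts)            # average SNPs per kit
        years = avg * years_per_snp   # convert SNPs to years
        results[name] = (round(avg, 1), round(years), round(reference_year - years))
    return results


for name, (avg, years, ad) in criterion_tmrcas(PER_KIT_SNPS).items():
    print(f"{name}: av. {avg} SNPs, {years} years, c. AD{ad}")
```

    The first row works out at 655 years (AD1295). The slide rounds this to 650 years and AD1300. The other three rows match the slide exactly.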
    61. Criteria for BigY SNPs
    [Table: SNP criteria compared across FTDNA csv, FGC analysis, YFull analysis, Williamson's Big Tree, D. Wright "DIY", J. Irvine "DIY" and ISOGG Y Tree. Each row's entries are listed in slide order; the cell alignment is not recoverable.]
    • Min. no. of reads/calls: 10, 2, 1-2*, 10, 10, 4
    • Max. no. of reads: none, none, 320?
    • Min. % consistent reads: 99/95/40/10, 85, 85/70, 100/95
    • Stability within haplogroup / within sub-clade: "shared SNPs", excluded, no excl. if known, no, important
    • 22216800-22512940 unstable region: excluded, excluded, included
    • Other "unreliable" regions: included, excluded, excluded
    • Indels?: included, excluded, excluded, excluded
    • Homopolymers, recLOHs: excluded, N/A
    • Min. "Quality" (FTDNA): yes, 500, N/A, N/A
    • "Confidence" (FTDNA): yes, N/A, N/A
    • Max. locations on ISOGG tree: N/A, 3
    • Min. mapping quality average (ISOGG): N/A, 10%
    • Min. extent of base-pairs (ISOGG): N/A, 20
    • Max. segment, repeated alleles (ISOGG): N/A, 5 alleles
    • Av. years per SNP: 120, 118, -, 120, 120, -
    *: depending on region
    NB: The criteria listed are as known to me on 11 Nov. 2015; all are evolving and subject to change. There are clearly both substantive differences and confusion over terminology and definitions. At least in theory, it is inappropriate: (1) to seek TMRCAs without a clear understanding of how relevant "SNPs" are defined, and (2) to use the same "av. years per SNP" ratio for differing definitions of "SNP".
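    Criteria of this kind are what turn raw BigY calls into the "probable" and "possible" SNPs counted on slide 57. The hypothetical Python classifier below illustrates the process. Its thresholds are illustrative values chosen from within the ranges in the table, not any one analyst's published criteria: 10 reads and 95% consistent reads for "probable", 4 reads and 70% for "possible". Only the 22216800-22512940 unstable region comes directly from the slide. The call values in the example are invented.

```python
from dataclasses import dataclass

# HYPOTHETICAL thresholds: (minimum reads, minimum % consistent reads)
PROBABLE = (10, 95.0)
POSSIBLE = (4, 70.0)
# Unstable region listed on the slide (Y-chromosome positions, inclusive)
UNSTABLE_REGIONS = [(22216800, 22512940)]


@dataclass
class Call:
    position: int          # Y-chromosome position of the variant
    reads: int             # number of reads covering the position
    pct_consistent: float  # % of those reads agreeing with the call
    is_indel: bool = False


def classify(call: Call, exclude_indels: bool = True) -> str:
    """Return 'probable', 'possible', 'excluded' or 'rejected'."""
    if exclude_indels and call.is_indel:
        return "excluded"
    if any(lo <= call.position <= hi for lo, hi in UNSTABLE_REGIONS):
        return "excluded"
    if call.reads >= PROBABLE[0] and call.pct_consistent >= PROBABLE[1]:
        return "probable"
    if call.reads >= POSSIBLE[0] and call.pct_consistent >= POSSIBLE[1]:
        return "possible"
    return "rejected"


def weighted_snp_count(calls) -> float:
    """Probable SNPs count 1 and possible SNPs 0.5, as in the slide 57 formula."""
    weights = {"probable": 1.0, "possible": 0.5}
    return sum(weights.get(classify(c), 0.0) for c in calls)


# Invented calls for one tester
calls = [
    Call(15000000, reads=39, pct_consistent=97.0),   # probable
    Call(15000100, reads=7, pct_consistent=86.0),    # possible
    Call(22300000, reads=50, pct_consistent=100.0),  # excluded: unstable region
    Call(15000200, reads=72, pct_consistent=100.0, is_indel=True),  # excluded: indel
    Call(15000300, reads=2, pct_consistent=100.0),   # rejected: too few reads
]
print(weighted_snp_count(calls))  # 1.5
```

    Changing any one threshold, or the decision to exclude indels, changes the weighted count and therefore the TMRCA. This is the inconsistency the slide warns about when the same "av. years per SNP" ratio is applied to different definitions of "SNP".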
    62. The Irwin Surname tree
    [Figure: "The Irwin Surname, showing the genetic and conventional genealogies of some of the project's 33 genetic families and of the Borders genetic family sub-groups (many details omitted)". Bold indicates a BigY test, and brick walls are marked. The genetic part runs from P311/P312 (marked BC2000) via L21, Z251, Z16943 and Z16944 to L555, plus 20 other SNPs (marked AD1200), and FCG34569. Below that are the sub-groups (B9, B14, B17, BE, BD, B10, BX, BA, BB, B23, B29 and others) and paper trails across time bands from the 1300s/1400s to today. Lines shown include the Irvines of Eskdale, Irvings of Bonshaw, Irvings of Dumfries, Irvines of Drum, Irvines of Perthshire, Irvines of Orkney (1) and (2), Irwins of Munster (1) and (2), Irving - NPE Bell (1) and Irving - NPE Elliot (2).]
    63. Main findings relevant to the Irwin project
    • Steady growth over 10 years, now 392 STR test results (94% at 37+ markers)
    • Most participants reside in the USA and typify the Scotch-Irish-American diaspora
    • 40% claim Irish ancestry, but lack paper trails “across the pond”
    • Tradition of a single-origin Scottish surname refuted
    • > 90% of all participants matched to a genetic family
    • 34 genetic families identified, none related to another within the surname era:
      - 22 Scottish, 4 native Irish, 1 German, 1 African, 6 unknown (Scots?)
    • 13-26% of participants from NPEs
    • Border Irwins genetic family is apparently the largest in any surname project:
      - all 262 descended from a Dumfriesshire ancestor who flourished in the 14th century
      - SNP L555 recognised by ISOGG, still unique to Border Irwins
      - tentatively split into 15 sub-groups
      - BigY is yielding further insights, but reliable TMRCAs remain elusive
    64. Findings relevant to other surname projects
    • Small surname projects can learn much from large projects
    • Penetration ratios identify geographic bias
    • Spelling of surname is often misleading
    • FTDNA’s “Matches” pages give false positives & false negatives
    • TMRCA tables using GDs are misleading
    • TiP Scores avoid the many limitations of GDs
    • NPEs should be included
    • BigY:
      - a massive step forward
      - handling of results is unnecessarily cumbersome
      - comprehension of results is difficult & poorly explained
      - BAM data essential for analysing SNP quality
      - “starburst”/“bottleneck” phenomena need investigating
      - need for improved understanding of SNP criteria
      - individual TMRCAs unreliable: need SNP Pack back-up
    65. Further reading
    • www.dnastudy.clanirwin.org
    • www.jogg.info/62/files/Irvine.pdf
    • https://dl.dropboxusercontent.com/u/14028750/Testing%20and%20Analysing%20Big-Y.pdf (use of BAM IGV Viewer)
    • www.borderreivers.co.uk
    • Irving, JB 1907 The Book of the Irvings
    • Maxwell-Irving, AMT 1968 The Irvings of Bonshaw
    • Mackintosh, D 1999 The Irvines of Drum and their Cadet Lines 1300-1750
    • Tough, DLW 1928 The Last Years of a Frontier
    • MacDonald Fraser, G 1971 The Steel Bonnets
    • Perceval-Maxwell, M 1973 The Scottish Migration to Ulster in the Reign of James I
    • Dickson, RJ 1976 Ulster Emigration to Colonial America, 1718-75
    • Fischer, DH 1989 Albion’s Seed
    • Fitzgerald, P 2008 Migration in Irish History, 1607-2007
    66. Acknowledgements
    • All our 392 participants;
    • The many participants, most preferring anonymity, who have donated to our General Fund, helped with our website, and guided & encouraged me;
    • Fellow admins John Cleary, Maurice Gleeson, Kent Irvin, Peter Irvine, Debbie Kennett, Ralph Taylor, Dennis Wright;
    • Catherine Borges, for ISOGG;
    • Bennett Greenspan and his team at FTDNA;
    • My patient wife.
