Aug2013 illumina platinum genomes


  1. Platinum Genomes: Identifying variants using a large pedigree
     Michael A. Eberle, GIAB, August 2013
     © 2010 Illumina, Inc. All rights reserved. Illumina, illuminaDx, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, GoldenGate Indexing, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb, iSelect, CSPro, GenomeStudio, Genetic Energy, HiSeq, and HiScan are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
  2. Platinum Genome project: Improving technology & tools
     • Create a catalogue of highly accurate whole-genome variant calls within a well-characterized pedigree
       – SNPs, indels & CNVs
       – Including high-confidence reference positions
       – Provide direct supporting evidence for every variant call
     • Develop a framework to assess variant callers
     • Provide a path to improve variant callers by supplying better truth data for measuring sensitivity and precision
       – E.g. modifying the SNP filters to maximize accuracy
     [Figure: comparison of the Truth and Test call sets; labels: Correct, FP, FN]
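The Truth/Test comparison on this slide can be sketched as a set comparison. This is a minimal illustration, not the project's assessment code: the function name `assess_calls`, the `(chrom, pos, ref, alt)` variant keys and the example variants are all assumptions made for the sketch.

```python
def assess_calls(truth, test):
    """Compare a test call set with a truth set.

    Both arguments are sets of hashable variant keys; here a key is a
    hypothetical (chrom, pos, ref, alt) tuple.
    """
    correct = truth & test   # in both sets
    fp = test - truth        # called, but not in the truth set
    fn = truth - test        # in the truth set, but not called
    sensitivity = len(correct) / len(truth) if truth else float("nan")
    precision = len(correct) / len(test) if test else float("nan")
    return {
        "correct": len(correct),
        "fp": len(fp),
        "fn": len(fn),
        "sensitivity": sensitivity,
        "precision": precision,
    }


# Made-up variants, for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "A"), ("chr2", 900, "T", "C")}
test = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
        ("chr2", 40, "G", "A"), ("chr3", 7, "A", "C")}

result = assess_calls(truth, test)
```

With these toy sets, three calls are correct, one is a false positive and one is a false negative, so sensitivity and precision are both 3/4.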
  3. NIST GIAB – Pedigree analysis
     [Figure: three-generation pedigree with sample IDs
       grandparents: 12889 12890 12891 12892
       parents: 12877 12878
       children: 12879 12880 12881 12882 12883 12884 12885 12886 12887 12888 12893]
     • All 17 members sequenced to at least 50x depth (PCR-free protocol)
     • Variants are called across the pedigree using different software & technology
     • Inheritance information provides high-confidence, direct validation of variant calls
     • Analysis of SNPs in the parents and 11 children
  4. Pedigree Analysis – Using haplotypes to detect conflicts
     [Figure: phased haplotype sequences for the parents and the children]
     • With a sufficiently large pedigree all four possible inheritance patterns will be observed and most of the genotypes can be phased into haplotypes
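A rough sense of what "sufficiently large" means can be had from a small calculation. The sketch below assumes that, at a given position, each child independently receives one of the four combinations of paternal and maternal haplotype with equal probability; that model and the function name `prob_all_patterns_seen` are assumptions for illustration, not something stated on the slide.

```python
from math import comb


def prob_all_patterns_seen(n_children, n_patterns=4):
    """Probability that every inheritance pattern occurs in at least one child.

    Inclusion-exclusion over the number k of patterns that are missing,
    assuming equally likely, independent patterns per child.
    """
    return sum(
        (-1) ** k * comb(n_patterns, k)
        * ((n_patterns - k) / n_patterns) ** n_children
        for k in range(n_patterns + 1)
    )


p_eleven = prob_all_patterns_seen(11)  # the pedigree here has eleven children
p_four = prob_all_patterns_seen(4)     # a much smaller family, for contrast
```

Under this simplified model a family of four children sees all four patterns at a position less than 10% of the time, while eleven children do so at most positions.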
  5. Using haplotypes to detect conflicts
     [Figure: phased haplotype sequences for the parents and the children, with one call marked "Error at this sample/position"]
     • Individual genotype (GT) accuracy is assessed using surrounding genotype calls across the pedigree
     • Genotypes are parsimoniously phased to minimize the number of conflicts across the pedigree
     • Facilitates assigning conflicts to a sample, imputation of missing data and error correction
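One way to picture how a conflict is assigned to a sample: compare the observed genotypes at a position with every genotype pattern that the inherited haplotypes allow, and take the closest one. The sketch below is an assumption-laden toy, not the method used in the project: it handles a single biallelic position, uses a hypothetical six-child inheritance vector, and names such as `assign_conflicts` are invented for the example.

```python
from itertools import product

# Hypothetical inheritance vector for one region: each child carries one
# paternal haplotype (A or B) and one maternal haplotype (C or D).
CHILDREN = ["AC", "AD", "BC", "BD", "AC", "BD"]
SAMPLES = ["father", "mother"] + [f"child{i + 1}" for i in range(len(CHILDREN))]


def consistent_patterns(children):
    """All genotype vectors (alt-allele counts) allowed by the inheritance."""
    patterns = []
    for a, b, c, d in product((0, 1), repeat=4):
        allele = {"A": a, "B": b, "C": c, "D": d}
        kids = tuple(allele[pat] + allele[mat] for pat, mat in children)
        patterns.append((a + b, c + d) + kids)
    return patterns


def assign_conflicts(observed, children, samples):
    """Return the closest consistent pattern and the samples that disagree."""
    def distance(pattern):
        return sum(o != p for o, p in zip(observed, pattern))

    best = min(consistent_patterns(children), key=distance)
    conflicts = [s for s, o, p in zip(samples, observed, best) if o != p]
    return best, conflicts


# Alt allele on haplotype A only, plus one wrong call in child3 (a "BC" child).
observed = (1, 0, 1, 1, 1, 0, 1, 0)
best, conflicts = assign_conflicts(observed, CHILDREN, SAMPLES)
```

Here the closest consistent pattern differs from the observation only at `child3`, so that sample is flagged and its genotype could be corrected to the value in `best`.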
  6. Analysis of variant calls within the pedigree structure
     • First step is to define the inheritance of the parental chromosomes to the eleven children everywhere in the genome
       – Identified 709 crossover events between the parents and eleven children
     • Variants called across the pedigree using multiple callers
       – E.g. GATK, Cortex, Isaac & CGI for SNPs
     • Define accurate variants as those where the genotypes are 100% consistent with the transmission of the parental haplotypes
       – At any position of the genome there are only 16 possible combinations of genotypes (biallelic & diploid) across the pedigree that are consistent with the inheritance pattern
       – Out of 3^13 (~1.6M) possible genotype combinations
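The 16-versus-3^13 count can be reproduced with a short enumeration. This sketch assumes the parents carry haplotypes A/B and C/D and borrows the children's haplotype pairs from the read-count figure later in the deck as an example inheritance vector; the function names are illustrative only.

```python
from itertools import product

# Example inheritance vector for the eleven children (haplotype pairs taken
# from the labels on the read-count slide; any region has its own vector).
CHILD_HAPLOTYPES = ["CB", "DA", "CB", "DB", "DA", "CB",
                    "CA", "DB", "CB", "CA", "DA"]


def consistent_genotype_vectors(child_haplotypes):
    """Genotype vectors (father, mother, children) as alt-allele counts.

    Each of the four parental haplotypes carries either the reference (0)
    or the alternate (1) allele, giving 2**4 assignments.
    """
    vectors = set()
    for a, b, c, d in product((0, 1), repeat=4):
        allele = {"A": a, "B": b, "C": c, "D": d}
        kids = tuple(allele[h1] + allele[h2] for h1, h2 in child_haplotypes)
        vectors.add((a + b, c + d) + kids)
    return vectors


def is_consistent(genotypes, child_haplotypes):
    """True if the called genotypes match one of the allowed vectors."""
    return tuple(genotypes) in consistent_genotype_vectors(child_haplotypes)


n_consistent = len(consistent_genotype_vectors(CHILD_HAPLOTYPES))
n_possible = 3 ** (2 + len(CHILD_HAPLOTYPES))  # three genotypes, 13 samples
```

For this vector `n_consistent` is 16 and `n_possible` is 1,594,323, so a random set of genotype calls has roughly a 1 in 100,000 chance of passing the consistency test.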
  7. Current state
     • Homozygous positions (GATK)
       – ~2.6B positions identified as homozygous reference across the pedigree
     • SNPs (GATK, Cortex, Isaac & CGI)
       – ~4.7M positions where SNPs agree with transmission of parental chromosomes
       – >95% (4.5M) called consistent with transmission by multiple algorithms/technologies
       – >98% (4.6M) with supporting evidence from other call sets (i.e. same variant called in at least one of the samples)
     • Indels (GATK, Cortex & CGI)
       – ~640k indels consistent with transmission of parental chromosomes
       – Events range in size from 1 to 350bp
     • CNVs (BreakDancer & Grouper)
       – ~772 CNVs, mostly deletions though a couple of duplications
       – Events range from 1kb to 322kb, though break points are still being refined
  8. CNVs
  9. Incorporating larger variants
     • SNPs and small indels work well because the genotypes are highly accurate
       – A single genotyping error in any of the 13 samples will almost never be consistent with the haplotype transmission
     • Developing approaches for other variant types that have lower calling accuracy
       – Many CNV callers do not provide GT information
       – Accuracy is too low to use pedigree-consistency
  10. Incorporating CNVs into this framework
      • Make breakpoint calls within each sample using BreakDancer & Grouper
      • Identify regions of overlap between samples (keeping singletons)
      • Corroborate based on read counts within the putative CNV events
      • Refine to breakpoint resolution
      [Figure: per-sample calls for NA12877, NA12878, NA12879, NA12880, NA12881 and NA12882, and the resulting Test Regions]
      • Count the uniquely aligned reads within the defined break points for the test regions for each sample & identify events where the read counts are consistent with a deletion or duplication
      • For internally consistent events, follow up with targeted analysis to resolve the events to base-pair resolution
      • On average ~150x depth for every event
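The "regions of overlap between samples (keeping singletons)" step can be sketched as an interval merge. This is only an illustration under stated assumptions: calls are `(sample, start, end)` tuples on one chromosome with closed coordinates, the coordinates are made up, and `merge_calls` is a hypothetical helper rather than part of BreakDancer or Grouper.

```python
def merge_calls(calls):
    """Merge overlapping per-sample CNV calls into test regions.

    calls: iterable of (sample, start, end) on a single chromosome.
    Returns a list of (start, end, samples); a call that overlaps nothing
    is kept as a singleton region.
    """
    regions = []
    for sample, start, end in sorted(calls, key=lambda c: (c[1], c[2])):
        if regions and start <= regions[-1][1]:
            # Overlaps the current region: extend it and record the sample.
            regions[-1][1] = max(regions[-1][1], end)
            regions[-1][2].add(sample)
        else:
            regions.append([start, end, {sample}])
    return [(s, e, sorted(samples)) for s, e, samples in regions]


# Made-up coordinates, for illustration only.
calls = [
    ("NA12877", 1000, 9500),
    ("NA12879", 1200, 9400),
    ("NA12881", 900, 9600),
    ("NA12878", 50000, 52000),  # singleton
]
test_regions = merge_calls(calls)
```

Each resulting region would then be passed to the read-count step for every sample in the pedigree, not only the samples that produced a call.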
  11. Using read counts to confirm deletions – 8.5kb deletion
      [Figure: read counts (axis "ReadCounts", 0 to 2000) for the 13 samples, labelled by haplotype pair: AB CD CB DA CB DB DA CB CA DB CB CA DA; copy-number levels 0, 1, 2 with labels Diploid, Haploid, Zero-ploid]
      • Best solution: A=0; B=1; C=1; D=1
      • All samples with haplotype A are consistent with haploid based on read counts
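A "best solution" of this kind can be found by trying every copy-number assignment for the four parental haplotypes and keeping the one that best explains the read counts. The sketch below is an assumption, not the project's method: the read counts are invented, `reads_per_copy` is a hypothetical expected count for one copy of the region, and a least-squares score stands in for whatever model was actually used.

```python
from itertools import product

# Haplotype pairs for the 13 samples, as labelled on the slide.
SAMPLE_HAPLOTYPES = ["AB", "CD", "CB", "DA", "CB", "DB", "DA",
                     "CB", "CA", "DB", "CB", "CA", "DA"]


def fit_haplotype_copies(read_counts, sample_haplotypes,
                         reads_per_copy, max_copies=2):
    """Choose the copy number per haplotype that minimizes squared error."""
    best_fit, best_error = None, float("inf")
    for copies in product(range(max_copies + 1), repeat=4):
        cn = dict(zip("ABCD", copies))
        error = sum(
            (count - reads_per_copy * (cn[h1] + cn[h2])) ** 2
            for count, (h1, h2) in zip(read_counts, sample_haplotypes)
        )
        if error < best_error:
            best_fit, best_error = cn, error
    return best_fit, best_error


# Invented read counts, roughly 500 reads per copy (illustration only).
read_counts = [510, 980, 1030, 470, 1010, 950, 520,
               990, 480, 1020, 1000, 530, 490]
best_fit, best_error = fit_haplotype_copies(
    read_counts, SAMPLE_HAPLOTYPES, reads_per_copy=500)
```

With these toy counts the fit returns A=0 and B=C=D=1, so every sample carrying haplotype A has a single copy of the region, matching the pattern described on the slide.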
  12. Breakdown of 772 “accurate” CNVs (1kb to 322kb in size)
      [Figure: breakdown by caller (BreakDancer, Grouper); counts shown: 266, 408, 98]
  13. Next steps
      • Assembling breakpoints for the 772 CNVs
        – Reassessing the “failed” calls where applicable
      • Incorporating different calling algorithms / methods
        – E.g. SNP inheritance can help identify CNVs that are missed by other methods
        – Including mate pair data (~2kb insert size)
      • Working on different methods to improve our catalogue of ~30bp to 2kb events & incorporating different callers
      • Assigning error modes for “failed” SNPs
        – Many look like cell line mutations & alignment errors
      • Comparing our call set to other datasets to assess accuracy and completeness
        – Other GIAB call sets
        – Fosmid data (Jaffe & Kidd)
  14. Acknowledgements
      Illumina: Morten Kallberg, Xiaoyu Chen, Han-Yu Chuang, Phil Tedder, Sean Humphray, Elliott Margulies, David Bentley
      Oxford: Zamin Iqbal, Gil McVean
      This data and more available at www.platinumgenomes.org