140127 platinum genomes pedigree analyses

Published in: Health & Medicine, Technology

  1. Sample Characterization. Michael A. Eberle, GiaB, January 2014
  2. Pedigree including NA12878
     [Figure: pedigree diagram; sample labels 12889, 12890, 12891, 12892, 12877, 12878 (NA12878), 12879, 12880, 12881, 12882, 12883, 12884, 12885, 12886, 12887, 12888, 12893]
       • All 17 members sequenced to at least 50x depth (PCR-free protocol)
       • Variants are called across the pedigree using different software and technology
       • Inheritance information provides high-confidence, direct validation of variant calls
  3. Why sequence a pedigree?
     [Figure: pedigree of two-letter genotypes with haplotypes marked; one call is annotated "Error: T in blue haplotype should be G"]
     With a sufficiently large pedigree, the transmission of the parental chromosomes can be determined unambiguously.
  4. Why sequence a pedigree?
     [Figure: pedigree of two-letter genotypes; annotations: "Either parent could also be TT", "Could also be GG or GT"]
       • If only the trio were sequenced, this error would not be detected
       • When sequencing a trio, we can never eliminate alternative genotypes in some of the samples
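
Slide 4's point, that a trio alone cannot expose some genotype errors, can be illustrated with a minimal Python sketch. The function names and the example genotypes are hypothetical and are not taken from the slides; the check only tests Mendelian consistency at a single site.

```python
from itertools import product


def possible_child_genotypes(father, mother):
    """Every unphased genotype a child can inherit: one allele from each parent."""
    return {"".join(sorted(f + m)) for f, m in product(father, mother)}


def trio_is_consistent(father, mother, child):
    """True if the child's unphased genotype is Mendelian-consistent with the parents."""
    return "".join(sorted(child)) in possible_child_genotypes(father, mother)


# Hypothetical site: both parents are heterozygous G/T.
print(sorted(possible_child_genotypes("GT", "GT")))
# A child that is truly GT but miscalled as TT still passes the trio check.
print(trio_is_consistent("GT", "GT", "TT"))
# Only calls that no transmission can explain are flagged.
print(trio_is_consistent("GG", "GG", "GT"))
```

Because several child genotypes are compatible with the same pair of parental genotypes, a miscall that stays inside that compatible set is invisible to a trio-only check.
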
  5. A large pedigree identifies most errors
     [Figure: % sites perfectly constrained (y-axis, 0 to 100) vs. number of siblings (x-axis, 1 to 11)]
       • “Perfectly constrained” means we could remove the genotype information of any sample and impute it from the phasing and the other samples' genotypes
       • A trio never positively identifies the genotypes in every sample
       • 2 sibs allow phasing and identify errors in 25% of variant positions
       • More sibs add confidence to more variant calls
       • With 11 sibs, a single error can be identified in >99.7% of the variant positions
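
The "perfectly constrained" definition on slide 5 can be sketched as a brute-force check. This is only an illustration under stated assumptions, not the author's method: the site is biallelic, the transmission vectors (which parental haplotype each child inherited) are already known, and all function and variable names are hypothetical.

```python
from itertools import product


def genotype(a, b):
    """Unphased genotype as a sorted pair of alleles."""
    return tuple(sorted((a, b)))


def pedigree_genotypes(hap_alleles, vectors):
    """Genotypes of father, mother, then each child.

    hap_alleles: alleles on the four parental haplotypes (f0, f1, m0, m1).
    vectors: one (i, j) pair per child; the child carries father's
             haplotype i and mother's haplotype j (assumed known).
    """
    f0, f1, m0, m1 = hap_alleles
    children = [genotype((f0, f1)[i], (m0, m1)[j]) for i, j in vectors]
    return [genotype(f0, f1), genotype(m0, m1)] + children


def is_perfectly_constrained(observed, vectors, alleles=("A", "G")):
    """True if every sample's genotype can be imputed from all the others."""
    observed = [genotype(*g) for g in observed]
    for masked in range(len(observed)):
        candidates = set()
        # Try all allele assignments to the four parental haplotypes.
        for haps in product(alleles, repeat=4):
            predicted = pedigree_genotypes(haps, vectors)
            if all(p == o for k, (p, o) in enumerate(zip(predicted, observed))
                   if k != masked):
                candidates.add(predicted[masked])
        # Constrained only if exactly the observed genotype remains possible.
        if candidates != {observed[masked]}:
            return False
    return True


# Hypothetical trio: one child, all samples heterozygous.
print(is_perfectly_constrained(["AG", "AG", "AG"], [(0, 0)]))
# Hypothetical family of four sibs that together carry every haplotype pair.
four_sibs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(is_perfectly_constrained(["AG", "AG", "AA", "AG", "AG", "GG"], four_sibs))
```

Under these assumptions the trio fails the check because each parent's untransmitted haplotype is observed only in that parent, so masking the parent leaves its genotype undetermined.
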
  6. Cost to add more siblings
     [Figure: % sites perfectly constrained (y-axis, 0 to 100) vs. number of siblings (x-axis, 1 to 11); annotations: "1 Trio of Sequencing", "2 Trios of Sequencing / 4 sibs"]
  7. Understanding conflicts in the pedigree
  8. Somatic/cell-line deletions on chr22
     [Figure: two "# Errors" tracks (y-axis, 0 to 300), labelled "Errors per 50kb" and "Errors in NA12878 & NA12893", and a "Normalized Depth" track (y-axis, 0 to 4)]
  9. Somatic/cell-line deletions on chr22
     [Figure: two "# Errors" tracks (y-axis, 0 to 300), labelled "Errors per 50kb" and "Errors in NA12878 & NA12893", and a "Normalized Depth" track (y-axis, 0 to 4); scale bar: 1 Mb]
       • None of the other children carry this deletion (though noise may indicate mosaic)
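
The "Errors per 50kb" tracks on slides 8 and 9 imply counting inheritance conflicts in fixed-size windows along the chromosome. A minimal sketch of that binning follows; the function name and the example positions are hypothetical, and positions are assumed to be 0-based coordinates on a single chromosome.

```python
from collections import Counter


def errors_per_window(positions, window=50_000):
    """Count error positions in consecutive fixed-size windows.

    Returns a dict mapping each window's start coordinate to its count;
    windows with no errors are omitted.
    """
    counts = Counter(pos // window for pos in positions)
    return {index * window: n for index, n in sorted(counts.items())}


# Hypothetical conflict positions, not data from the slides.
print(errors_per_window([10, 49_999, 50_000, 120_000]))
```

A run of adjacent windows with elevated counts, paired with a drop in normalized depth, is the kind of signal the slides attribute to a cell-line deletion.
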
  10. Read counts for the haplotypes inferred in NA12878 at location of cell line deletion (200x depth)
      [Figure: fraction (y-axis, 0.00 to 0.10) vs. allele counts (x-axis, 0 to 200); series: maternal haplotype (NA12892), paternal haplotype (NA12891)]
        • Inferred the two haplotypes in NA12878 based on the other samples
        • Counts represent the predicted heterozygous locations
  11. Technical replicates validate de novo SNVs
      [Figure: total conflicts (y-axis, up to 4000) vs. results in the technical replicate (x-axis); annotation: "FPs?"]
        • 1843 (~96%) replicate original call
        • 82 (~4%) did not replicate
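
The percentages on slide 11 can be checked from the two counts. This small sketch assumes that the 1843 replicated and 82 non-replicated calls together make up all the calls that were tested; the function name is hypothetical.

```python
def replication_rates(replicated, not_replicated):
    """Fractions of tested calls that did and did not replicate."""
    total = replicated + not_replicated
    return replicated / total, not_replicated / total


yes, no = replication_rates(1843, 82)
print(f"{yes:.1%} replicated, {no:.1%} did not")
```

Under that assumption the counts give roughly 95.7% and 4.3%, in line with the rounded ~96% and ~4% shown on the slide.
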
  12. Thoughts on selecting the next samples for sequencing
        • Identify and sequence pedigrees with multiple siblings
            – WGS every individual in the pedigree to identify haplotype transmission vectors
            – One “high quality” family (2 parents & 4 sibs) provides a “better” reference than two lower-quality trios for the same amount of sequencing
            – Technical replicates allow alternative validation of biologically interesting calls, e.g. de novo mutations, gene conversion, etc.
        • Choose one or two samples to target for long reads if sequencing-limited
            – Sequencing both parents will provide 100% of the variants in the pedigree, though with four children only ~75% will be validated in the children
            – Sequencing a child will guarantee that every variant has been sequenced in at least one of the parents, though the child will only contain ~50% of the variants in the family
        • Quality of the DNA is important
            – CEPH pedigree shows many cell line artifacts that are correctly genotyped but deviate from inheritance
            – Cell line artifacts complicate the analysis
