Imputation for genotyping by sequencing - Emma Huang

Genotyping-by-sequencing (GBS) technology has made dense genotyping cost-effective for many species. However, the high levels of missing data can result in a large loss of information. The popularity of GBS makes the development of efficient imputation approaches a priority. Here we consider imputation under the further difficulty caused by multi-parental experimental crosses. We present an approach to imputing founder genotypes which allows recovery of a large proportion of markers. Once these have been imputed, we compare three approaches to imputing progeny genotypes and apply our strategy to an eight-parent rice population to demonstrate the potential gain from imputation.

Published in: Science, Business

Transcript

  • 1. Imputation for genotyping by sequencing. Emma Huang, Chitra Raghavan, Ramil Mauleon, Karl Broman, Hei Leung. CSIRO MATHEMATICS, INFORMATICS AND STATISTICS AND FOOD FUTURES FLAGSHIP
  • 2. CSIRO MATHEMATICS, INFORMATICS AND STATISTICS AND FOOD FUTURES FLAGSHIP
  • 3. Comparing Designs (FOAM 2014). Figure labels: Resolution/Diversity; Allele frequency/Power; BC; F2; RIL; MAGIC; Natural populations; Experimental Crosses; Biparental Crosses; NAM
  • 4. MAGIC Wheat: Inbreeding; No mixing; 2 generations intercrossing; 3 generations intercrossing; Doubled haploids
  • 5. MAGIC Arabidopsis [crossing diagram; line labels not recoverable] (Kover et al. PLoS Genet 2009)
  • 6. Arabidopsis MAGIC • 19 founders, outcrossed for four generations • Lines from 342 F4 families selfed for 6 generations • Founder lines resequenced (60x coverage), ~3M SNPs • ~500 progeny sequenced (0.5x coverage), ~500K SNPs
  • 7. MAGIC Rice: Indica × Japonica (Bandillo et al. Rice 2013 6:11) • ~2000 lines selfed for 6-8 generations • Preliminary genotyping/phenotyping of 200 lines at S4 • Further genotyping by sequencing (GBS) planned for S8 and founder lines
  • 8. Organisms • Arabidopsis: 125 Mb, diploid • Wheat: 17 Gb, hexaploid • Rice: 430 Mb, diploid
  • 9. Major differences in resources • Arabidopsis: reference genomes, annotation, … • Rice: reference genome (japonica) • Wheat: www.wheatgenome.org
  • 10. Genotypes • Arabidopsis: 60x founders, 0.5x progeny • Wheat: 9K/90K SNP chips • Rice: Low-coverage GBS, founders and progeny
  • 11. Low-coverage GBS • Stretches of missing values where reads don't align • Arabidopsis: 0.5x coverage, 500K/3M SNPs -> 17% of total • Rice: Filtering process reduces 159,522 SNPs -> 12,767 (8%) • How do we make use of the genome structure to fill in the gaps in our knowledge?
  • 12. Genotype Imputation • Missing data (random) • Comparison across studies (systematic)
      1  0  -  1  -  1
      1  0  -  1  -  1
      0  0  -  0  -  1
      1  1  -  1  -  0
      -  -  1  -  0  -
      -  -  1  -  1  -
  • 13. Typical approach • High-density reference panel: Phasing • Low-density targets: HMM, Pedigree • Probabilities: Phases, Imputation
  • 14. History
      Software     Release Date  Author    Institute
      (fast)PHASE  2001/2006     Stephens  Chicago
      MACH         2007          Abecasis  Michigan
      BEAGLE       2007          Browning  Washington
      AlphaImpute  2011          Hickey    Roslin
      IMPUTE(2)    2009/2012     Marchini  Oxford
      SHAPEIT(2)   2011/2013     Delaneau  CNAM
  • 15. Top-down: Reference Panel; Subj_ct 1; S_bj_ct 3; _ubj_c_ 2
  • 16.
      Spacing (/cM)    N  %MISS    %B    %M    %K
      1              200     30  93.7  96.3  79.8
      1              200     40  93.0  95.5  78.8
      1              200     50  92.0  94.8  77.5
      1              400     30  94.3  96.3  80.3
      1              400     40  93.8  95.5  79.4
      1              400     50  92.6  94.8  78.2
      2              200     30  96.7  98.3  83.5
      2              200     40  96.3  98.0  82.3
      2              200     50  95.4  97.6  80.8
      2              400     30  97.0  98.3  84.1
      2              400     40  96.5  98.0  83.1
      2              400     50  96.0  97.6  81.8
      But what happens if our reference panel is incomplete?
  • 17. Simplest solution: get more data • Higher coverage • Different platform • More replicates • …
  • 18. Progeny; Fo_nder A; F_und_r B; Fo__der C; _oun__r D
  • 19. Very simple approach
      Founder
      A   1   1   -   1   1   1   0
      B   1   -   1   0   1   -   1
      C   1   -   -   0   1   -   0
      D   -   1   1   1   -   0   1
      Founder
      0  27  48  26  36  43  43  51
      1  73  52  74  64  57  57  49
  • 20. Very simple approach
      Founder
      A   1   1   ?   1   1?  1   0
      B   1   0   1   0   1?  ?   1
      C   1   0   ?   0   1?  ?   0
      D   0   1   1   1   0   0   1
      Founder
      0  27  48  26  36  43  43  51
      1  73  52  74  64  57  57  49
  • 21. More complicated version • Missing data in progeny • Recombination between markers • Genotyping error in progeny
  • 22. Simulations • MAGIC 8-parent populations • Masked out founder values and progeny values • Varying marker density, sample size, missing % • Imputed founders and used those to impute all data
  • 23. Simulations
      Spacing (/cM)    N  %MISS   %F0   %FC   %FK
      1              200     30  46.9   100  86.6
      1              200     40  24.5   100  85.4
      1              200     50   9.8  99.6  83.9
      1              400     30  47.3   100  88.4
      1              400     40  24.9   100  87.4
      1              400     50  10.1   100  86.2
      2              200     30  47.1   100  90.7
      2              200     40  24.8   100  89.5
      2              200     50  10.0   100  87.8
      2              400     30  47.1   100  92.1
      2              400     40  24.9   100  91.3
      2              400     50  10.0   100  90.1
  • 24. Rice data • 178 F4 lines, 37240 markers after filtering • ~21% missing parents; 38% missing progeny • Masked data on Chr 1 from 1130 markers with full parent data • Simulated 22% missingness • 128 -> 1092 with 96% correctly imputed • For all markers, 25.2% imputed up to 92.7%
  • 25. Relevance to other populations? • Wheat: requirement of map position • Arabidopsis: resequenced founders; detection of other variants? • Density of markers • Level of missingness • Genotyping errors • Heterozygosity
  • 26. CCI Emma Huang t +61 7 3833 5542 e Emma.Huang@csiro.au Thanks! COMPUTATIONAL INFORMATICS AND FOOD FUTURES FLAGSHIP xkcd.com
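
The "very simple approach" on slides 19 and 20 can be illustrated with a short sketch. The slides do not state the exact rule, so the logic below is an assumption: founders are taken to be inbred and to contribute equally to the progeny, so the progeny frequency of allele 1 at a marker should be close to (number of founders carrying allele 1) / (number of founders). Reading the second block of numbers on slide 19 as progeny percentages of alleles 0 and 1 is also an assumption, and the names `impute_founders`, `founders`, and `freq1` are hypothetical. The sketch ignores recombination, progeny missingness, and genotyping error, which slide 21 lists as the harder parts.

```python
from typing import List, Optional, Tuple

Allele = Optional[int]  # 0 or 1; None marks a missing founder call


def impute_founders(
    founders: List[List[Allele]], freq1: List[float]
) -> Tuple[List[List[Allele]], List[int]]:
    """Fill missing founder alleles from progeny allele-1 frequencies.

    Assumes equal founder contributions (an illustrative simplification).
    Returns the imputed matrix and the indices of markers where the
    progeny frequency disagrees with the known founder calls.
    """
    n_founders = len(founders)
    imputed = [row[:] for row in founders]
    flagged: List[int] = []
    for j, f in enumerate(freq1):
        column = [row[j] for row in founders]
        missing = [i for i, a in enumerate(column) if a is None]
        if not missing:
            continue
        known_ones = sum(1 for a in column if a == 1)
        # Estimated number of founders carrying allele 1 at this marker.
        expected_ones = round(f * n_founders)
        need = expected_ones - known_ones
        # Clamp to what the missing cells can supply; flag any mismatch.
        clamped = min(max(need, 0), len(missing))
        if clamped != need:
            flagged.append(j)
        if clamped == 0:
            fill = 0          # every missing founder must carry allele 0
        elif clamped == len(missing):
            fill = 1          # every missing founder must carry allele 1
        else:
            continue          # ambiguous: leave as missing ("?" on slide 20)
        for i in missing:
            imputed[i][j] = fill
    return imputed, flagged


# Founder calls from slide 19 (rows A, B, C, D; "-" written as None).
founders = [
    [1, 1, None, 1, 1, 1, 0],
    [1, None, 1, 0, 1, None, 1],
    [1, None, None, 0, 1, None, 0],
    [None, 1, 1, 1, None, 0, 1],
]
# Row "1" of the second block on slide 19, read as percentages.
freq1 = [0.73, 0.52, 0.74, 0.64, 0.57, 0.57, 0.49]

imputed, flagged = impute_founders(founders, freq1)
for name, row in zip("ABCD", imputed):
    print(name, ["?" if a is None else a for a in row])
print("markers with inconsistent frequencies:", flagged)
```

On the slide 19 data this fills founder D completely, fills B and C at the second marker, and leaves the third and sixth markers unresolved, which matches the pattern on slide 20. The fifth marker is flagged because the observed frequency (57%) is lower than the three known allele-1 founders would predict; that is the column slide 20 marks with "1?".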
