Accuracy Optimization

•  Evaluation of estimate using the D statistic: D =

Estimating haplotype frequencies of Drosophila melanogaster from pooled sequence data

Devin Petersohn*, Aniqa Rahman* and Elizabeth King
* co-first authors

Abstract

Goals and Significance

•  Selection and Population Studies
•  Genotype/Phenotype Mapping
•  Big data processing
•  Cost-effective data collection

Acknowledgments

Results

Results Overview

Methods

•  Increasing pool size to 15 founders does not decrease the accuracy of the algorithm
•  Increased marker density improves the accuracy of the algorithm
•  Window sizes based on genetic location are most accurate
•  Increasing window size improves accuracy up to a breaking point, beyond which %D begins to rise again
  
References

1.  Burke MK et al. 2013. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biology and Evolution 6(1):1–11.
2.  King EG, Macdonald SJ, Long AD. 2012. Properties and power of the Drosophila Synthetic Population Resource for the routine dissection of complex traits. Genetics 191:935–949.
  
DSPR

Conclusions

This project was funded by the NSF, the NIH (F32GM099382), and the University of Missouri Office of Undergraduate Research.
  
Figure 1. Expected and estimated haplotype frequencies of the A1 (above) and AB8 (below) founders for pools 1 and 4 across the genome. Chromosome arms are displayed in varying colors; HMM-inferred frequencies appear in a darker shade and estimated values in a lighter shade.
  
Fly Prep

Pool   Min %D   Chromosome (min)   Max %D   Chromosome (max)   Mean %D   Average coverage
1      0.24     X                  24.51    X                  4.24      59.90
2      0.55     2L                 27.31    X                  3.97      51.68
3      0.93     2L                 20.69    X                  5.68      28.75
4      0.47     2R                 10.65    2L                 2.54      70.12
  
Figure 2. Percent difference (%D) between estimated and HMM-inferred haplotype frequencies in Pool 1 (blue) and Pool 4 (green) across the genome. Pool 4 displayed consistently lower D values than pools 1-3.
  
Figure 3. Average percent difference observed in haplotype estimates as a result of varying marker density in chromosome arm 2R, Pool 1. SNP density was down-sampled by randomly selecting SNPs from the pooled genomic data, from 1K to 140K SNPs in increments of 1K. Accuracy of the estimator suffers below 1K SNPs/Mb but reaches a stable, low %D above this density.
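
The down-sampling described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy SNP positions, and the 2 Mb region length are all assumptions made for the example.

```python
import random


def downsample_snps(positions, n_keep, seed=None):
    """Randomly keep n_keep SNP positions and return them in sorted order."""
    if n_keep > len(positions):
        raise ValueError("cannot keep more SNPs than are available")
    rng = random.Random(seed)  # seeded so a run can be repeated
    return sorted(rng.sample(positions, n_keep))


def snp_density_per_mb(n_snps, region_length_bp):
    """Convert a SNP count over a region into SNPs per megabase."""
    return n_snps / (region_length_bp / 1_000_000)


# Toy data (hypothetical): 20,000 evenly spaced SNPs over a 2 Mb region.
positions = list(range(0, 2_000_000, 100))

# Down-sample in increments of 1K SNPs, as in the caption (toy range only).
subsets = {n: downsample_snps(positions, n, seed=1) for n in range(1000, 5001, 1000)}
densities = {n: snp_density_per_mb(n, 2_000_000) for n in subsets}
```

Each subset would then be passed to the estimator, and the resulting average %D plotted against its density.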
  
Algorithm

The founder ancestry at any given position in each recombinant inbred line (RIL) is determined with a high degree of certainty using the genome sequences of the founders and genotype data for the RILs in a hidden Markov model (HMM) [2]. In this study, the HMM inferences are used as the expected haplotype frequencies in the different pools.
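
As a rough illustration of how per-RIL ancestry calls could be turned into an expected pool frequency, the sketch below counts the founder assigned to each RIL at one position. It assumes a single hard founder call per RIL and an equal contribution of every RIL to the pool; neither assumption is stated on the poster. The founder labels A1 and AB8 appear in Figure 1, while A2, A3 and the pool composition are made up for the example.

```python
from collections import Counter


def expected_haplotype_frequencies(ril_ancestry, founders):
    """Fraction of RILs in a pool carrying each founder haplotype at one position.

    ril_ancestry: one founder label per RIL in the pool (hypothetical input,
                  e.g. the most likely founder reported by an HMM).
    founders:     all founder labels, so that absent founders get frequency 0.
    """
    if not ril_ancestry:
        raise ValueError("pool must contain at least one RIL")
    counts = Counter(ril_ancestry)
    n_rils = len(ril_ancestry)
    return {founder: counts.get(founder, 0) / n_rils for founder in founders}


# Toy pool of eight RILs at a single genomic position.
pool = ["A1", "A1", "A3", "AB8", "A1", "A3", "AB8", "A1"]
freqs = expected_haplotype_frequencies(pool, ["A1", "A2", "A3", "AB8"])
```

If the HMM output were posterior probabilities rather than hard calls, the counts would be replaced by sums of those probabilities.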
  
Table 1. Summary statistics for pools 1-4. The lowest mean D values are observed in pool 4, likely due to its greater average coverage.
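
A summary of the kind shown in Table 1 could be derived from per-position %D values roughly as below. The input layout (a mapping from chromosome arm to a list of %D values), the function name, and the toy numbers are assumptions for illustration; whether the poster's mean is taken over all positions genome-wide is also assumed.

```python
def summarize_percent_d(percent_d_by_arm):
    """Return min, max (with the arm where each occurs) and mean of %D values.

    percent_d_by_arm: dict mapping chromosome arm -> list of %D values
                      (hypothetical layout).
    """
    values = [(d, arm) for arm, ds in percent_d_by_arm.items() for d in ds]
    if not values:
        raise ValueError("no %D values supplied")
    min_d, min_arm = min(values)
    max_d, max_arm = max(values)
    mean_d = sum(d for d, _ in values) / len(values)
    return {
        "min": min_d, "min_arm": min_arm,
        "max": max_d, "max_arm": max_arm,
        "mean": mean_d,
    }


# Toy values, not taken from the poster.
summary = summarize_percent_d({"X": [0.5, 12.0, 3.0], "2L": [1.5, 6.0], "2R": [2.0]})
```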
  
Question

Setting precedents for optimal configurations for haplotype estimation from pooled samples to minimize cost and maximize quantity and accuracy of results.

What are the optimal algorithm settings for estimating haplotype frequencies from pooled sequence data?

[Figure 3 plot: average %D (y-axis) against SNP density in SNPs per Mb (x-axis, 0 to 5000)]
As the cost of genome sequencing decreases, studies that were previously impossible are becoming more feasible. For population geneticists, however, sequencing every individual in a population is often cost prohibitive. Pooled sequencing is a commonly used, cheaper alternative to individual-level sequencing. However, accurately estimating the haplotype frequencies of a population from pooled sequence data remains a challenge. To address this problem, we have developed and refined an algorithm to estimate haplotype frequencies from pooled data. To experimentally validate our method, we used genomic data collected from pooled sets of recombinant inbred lines with a completely known haplotype structure. These lines were derived from a 50-generation controlled cross of 15 homozygous founder lines of Drosophila melanogaster. We validated the predictive accuracy of our haplotype estimator by comparing the haplotype frequency estimates obtained by our method with the known haplotype composition of the pool. We present a study in which the accuracy of the haplotype estimator is tested against variability in raw sequence coverage, SNP density, and the procedure of the algorithm. This algorithm, which can accurately estimate the haplotype frequency of a population from pooled sequence data, has the potential to significantly advance the field of genotype-phenotype mapping, a major goal of modern biology and bioinformatics.

[Figure 2 plot: %D (y-axis, 0 to 15) against position in Mb (x-axis) across chromosome arms X, 2L, 2R, 3L and 3R]
Application

These plots demonstrate varying haplotype frequencies between young and old populations of Drosophila melanogaster in a longevity study [1]. For this region on chromosome 2R, there is a significant difference between haplotype frequencies in the two populations. Different colors represent the 8 different haplotypes.

The algorithm takes as input the allele state of each SNP at each position (e.g., 0 = A, 1 = T) and iteratively refines an initial guess of the haplotype frequencies to minimize the difference between the observed allele counts and the allele counts expected from the founder alleles weighted by haplotype frequency.
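
The poster does not say which optimizer is used, so the following is only a sketch of one way to carry out this kind of refinement: least squares on allele frequencies, solved by projected gradient descent so that the estimated frequencies stay non-negative and sum to 1. All names, the toy founder matrix, and the read counts are assumptions for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {f : f_j >= 0, sum(f) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # the last j satisfying the condition gives the shift
    return [max(x - theta, 0.0) for x in v]


def estimate_haplotype_frequencies(founder_alleles, alt_counts, depths, n_iter=2000):
    """Estimate founder frequencies from pooled allele counts in one window.

    founder_alleles: one row per SNP, one 0/1 entry per founder.
    alt_counts, depths: reads carrying allele 1 and total reads at each SNP.
    """
    observed = [a / d for a, d in zip(alt_counts, depths)]
    k = len(founder_alleles[0])
    freqs = [1.0 / k] * k  # initial guess: all founders equally frequent
    frob_sq = sum(a * a for row in founder_alleles for a in row)
    if frob_sq == 0:
        raise ValueError("founder allele matrix carries no information")
    step = 1.0 / (2.0 * frob_sq)  # conservative step size that guarantees descent
    for _ in range(n_iter):
        # Residual: frequency-weighted founder alleles minus observed frequency.
        residuals = [sum(a * f for a, f in zip(row, freqs)) - y
                     for row, y in zip(founder_alleles, observed)]
        gradient = [2.0 * sum(row[j] * r for row, r in zip(founder_alleles, residuals))
                    for j in range(k)]
        freqs = project_to_simplex([f - step * g for f, g in zip(freqs, gradient)])
    return freqs


# Toy window: 6 SNPs, 3 founders, pool mixed at true frequencies 0.5, 0.3, 0.2.
founder_alleles = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
alt_counts = [50, 30, 20, 80, 50, 70]
depths = [100] * 6
estimate = estimate_haplotype_frequencies(founder_alleles, alt_counts, depths)
```

With noise-free counts the estimate recovers the true mixture; with real read counts the residual would not reach zero and coverage would affect how close the estimate gets.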
  
[Figure 4c plot: average %D (y-axis) against window size in cM (x-axis, 0 to 6)]
Figure 4. The effect of window size on accuracy, with windows defined by (a) number of SNPs, (b) chromosomal position (Kb), and (c) genetic position (cM). The optimal window size is marked on each plot. Windows defined by genetic position give the lowest %D; genetic position is therefore the optimal window metric when the window size is between 0.8 and 3.5 cM (%D: 3.05-3.13).
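
To make the idea of a window defined by genetic position concrete, the sketch below selects the SNPs whose genetic-map position falls inside a window of a given width in cM. Centering the window on the position of interest is an assumption, as are the function name and the toy map positions; the 2 cM width is the optimum reported on the poster.

```python
from bisect import bisect_left, bisect_right


def snps_in_window(genetic_positions_cm, center_cm, window_cm):
    """Indices of SNPs within a window of width window_cm centered at center_cm.

    genetic_positions_cm must be sorted in ascending order.
    """
    half = window_cm / 2.0
    lo = bisect_left(genetic_positions_cm, center_cm - half)
    hi = bisect_right(genetic_positions_cm, center_cm + half)
    return range(lo, hi)


# Toy genetic map positions (cM) for eight SNPs.
positions_cm = [0.0, 0.4, 0.9, 1.5, 2.1, 2.2, 3.0, 4.8]
window = snps_in_window(positions_cm, center_cm=2.0, window_cm=2.0)
selected = [positions_cm[i] for i in window]
```

A window defined by SNP count or by physical distance (Kb) would use the same lookup with SNP indices or base-pair coordinates in place of cM positions.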
  
[Figure 4a plot: average %D (y-axis) against window size in SNPs (x-axis, 0 to 15000). Annotations: optimum = 3.38 %D at 2500 bp; arrow marking the 200-SNP window]
[Figure 4b plot: average %D (y-axis) against window size in Kb (x-axis, 0 to 2000). Annotation: optimum = 3.37 %D at 500 Kb]
[Figure 4c annotation: optimum = 3.05 %D, 2 cM]
[Panel labels: (ho), (hY)]
[Figure 1 plots: haplotype frequency (y-axis) against position in Mb (x-axis) across chromosome arms X, 2L, 2R, 3L and 3R; two panels each for Pool 1 and Pool 4]

Pooled Sequence Haplotype Estimator
