Forsharing cshl2011 sequencing
Short overview talk on exome and genome sequencing and DNAse-seq.

Published in: Technology
Transcript of "Forsharing cshl2011 sequencing"

  1. High-Resolution Views of Cancer Genomes
  2. The Central Dogma
  3. +
  4. Your Nature Paper
  5. Our First Experiment
  6. Overview of BAC in the Genome
  7. Sequencing a BAC
  8. Sequence Coverage
  9. Repeats
  10. Repeats
  11. Repeats are not created equal
  12. Genomic Sequencing: Targeting the Exome
  13.
      • Long oligos synthesized on arrays (DNA)
      • RNA baits synthesized from DNA oligo template
      • RNA baits hybridized to DNA sequencing library
      • Targets captured using beads and biotin-labeled baits
      • RNA bait degraded, leaving sequencing library enriched for target regions
  14. Data Flow
      • FASTQ files generated by Illumina pipeline
      • Aligned to reference genome (hg18, excluding _random, unmapped, and hap) using Novoalign
      • SAM/BAM used extensively
      • Follow Broad Institute GATK pipeline for exome capture
      • Use Picard Java library for quality assessment
      • Processed BAM files available via local HTTP for browsing
  15. Data Pipeline...
      • Samtools import
      • Samtools sort
      • Picard MarkDuplicates
      • GATK Indel Realignment
      • GATK Quality Recalibration
      • Picard QC metrics
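To make the duplicate-marking step in the pipeline more concrete, here is a minimal sketch of the idea behind it. It is not Picard's implementation: the `Read` type and its fields are hypothetical stand-ins for BAM records, and grouping by chromosome, start position, and strand is a simplification.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Minimal stand-in for an aligned read (hypothetical fields, not a BAM record)."""
    name: str
    chrom: str
    pos: int            # 5' alignment start
    strand: str         # "+" or "-"
    base_quals: tuple   # per-base Phred scores


def mark_duplicates(reads):
    """Return the set of read names flagged as duplicates.

    Reads sharing chromosome, start position, and strand are treated as
    copies of one original fragment; the copy with the highest summed base
    quality is kept and the rest are flagged.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.base_quals))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


reads = [
    Read("r1", "chr1", 100, "+", (30, 30, 30)),
    Read("r2", "chr1", 100, "+", (20, 20, 20)),  # same start and strand as r1
    Read("r3", "chr1", 100, "-", (30, 30, 30)),  # opposite strand, kept
    Read("r4", "chr1", 250, "+", (10, 10, 10)),
]
print(sorted(mark_duplicates(reads)))
```

Flagging rather than deleting the extra copies keeps the data intact while letting downstream callers ignore them.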
  16. Realignment around Indels
      • The problem
          • Aligners align each read independently
          • Potentially leads to increased error rates around indels
      • A potential solution
          • Locally realign reads in regions that might harbor an indel
          • Goal is to align reads overlying indels more accurately, reducing errors in each read and, in turn, reducing SNV call error rates
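The problem can be illustrated with a toy example, assuming a made-up reference and a read whose last bases lie past a 2-base deletion. This is only a sketch of why local realignment helps, not the GATK algorithm; the function names are hypothetical.

```python
def mismatches(read, ref, start):
    """Count mismatches for an ungapped placement of read at ref[start:]."""
    window = ref[start:start + len(read)]
    return sum(1 for a, b in zip(read, window) if a != b)


def mismatches_with_deletion(read, ref, start, del_pos, del_len):
    """Count mismatches when the read skips ref[del_pos:del_pos + del_len].

    Assumes the read starts upstream of the deletion (start < del_pos).
    """
    haplotype = ref[:del_pos] + ref[del_pos + del_len:]
    return mismatches(read, haplotype, start)


ref = "GATTACAGGCTTAACCGGTATGCA"
# Read sampled from a genome that lacks the "AA" at reference positions 12-13.
read = "ACAGGCTTCCGG"

# Aligned on its own without a gap, the read end looks like a run of SNVs.
print(mismatches(read, ref, 4))
# Placed against the candidate deletion, the same read matches cleanly.
print(mismatches_with_deletion(read, ref, 4, 12, 2))
```

A read with only a few bases past the indel rarely justifies a gap by itself, which is why the candidate indel has to come from looking at all reads in the region together.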
  17. Quality Recalibration
      • Since most SNV callers will rely on quality scores to estimate error probabilities, having the best possible estimates for error rates is important
      • Reported error rates from the Illumina sequencer generally reflect technical parameters of the base call process, but not other systematic biases
      • Quality recalibration can include covariates to account for systematic biases
          • Cycle count, dinucleotide context, original quality, and sample/library variables
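One way to picture recalibration is as a lookup table of empirical error rates per covariate bin. The sketch below uses only two of the covariates named above (original quality and cycle) and assumes mismatches are counted at sites believed to be non-variant. The add-one smoothing is an illustrative choice, not a statement about what GATK does.

```python
import math
from collections import defaultdict


def empirical_qualities(observations):
    """Map each covariate bin to an empirical Phred-scaled quality.

    observations: iterable of (reported_quality, cycle, is_mismatch) tuples,
    one per sequenced base at a site assumed to be non-variant.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, total bases]
    for reported_q, cycle, is_mismatch in observations:
        bin_counts = counts[(reported_q, cycle)]
        bin_counts[0] += int(is_mismatch)
        bin_counts[1] += 1

    table = {}
    for key, (errors, total) in counts.items():
        # Add-one smoothing keeps the rate away from exactly 0 or 1.
        error_rate = (errors + 1) / (total + 2)
        table[key] = -10 * math.log10(error_rate)
    return table


# Made-up counts: bases reported as Q30 at an early and a late cycle.
observations = (
    [(30, 1, False)] * 98
    + [(30, 75, True)] * 9
    + [(30, 75, False)] * 89
)
for key, quality in sorted(empirical_qualities(observations).items()):
    print(key, round(quality, 1))
```

In this made-up data the late-cycle bases carry the same reported quality as the early ones but a much lower empirical quality, which is the kind of systematic bias the covariates are meant to capture.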
  18. Variant Calling and Evaluation: A developing art
  19. Sequencing Tumor/Normal Pairs
  20. Good SNP
  21. Suspect Variant
  22. Somatic (tumor only) Variant
  23. Likely False Positive (normal only)
  24. LOH
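The tumor/normal categories on the preceding slides can be sketched as a small decision function. This assumes genotypes have already been called for both samples and are encoded as the hypothetical strings "ref", "het", and "hom". The "good" versus "suspect" distinction depends on read-level evidence and is not modeled here.

```python
def classify_site(normal_gt, tumor_gt):
    """Label a site from normal and tumor genotypes.

    Genotypes are "ref" (homozygous reference), "het" (heterozygous),
    or "hom" (homozygous variant).
    """
    if normal_gt == "ref" and tumor_gt == "ref":
        return "no variant"
    if normal_gt == "ref":
        return "somatic (tumor only)"
    if tumor_gt == "ref":
        # A variant seen only in the normal is more likely an artifact.
        return "likely false positive (normal only)"
    if normal_gt == "het" and tumor_gt == "hom":
        return "LOH"
    return "germline"


for pair in [("ref", "het"), ("het", "ref"), ("het", "hom"), ("het", "het")]:
    print(pair, "->", classify_site(*pair))
```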
  25. NCI60 Exome Sequencing: No Normals Available!
  26. Variants by Genomic Location
  27. All Coding Variants
  28. Type 1: in dbSNP, Type 2: not in dbSNP
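The Type 1 / Type 2 split amounts to a membership test against known variants. A minimal sketch, assuming variants are keyed by hypothetical `(chrom, pos, ref, alt)` tuples and that the dbSNP keys fit in memory. The coordinates are made up for illustration.

```python
def split_by_dbsnp(variants, dbsnp_keys):
    """Partition variants into Type 1 (in dbSNP) and Type 2 (not in dbSNP)."""
    type1, type2 = [], []
    for variant in variants:
        (type1 if variant in dbsnp_keys else type2).append(variant)
    return type1, type2


# Made-up variant keys, not real dbSNP entries.
dbsnp_keys = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
variants = [
    ("chr1", 1000, "A", "G"),
    ("chr3", 42, "G", "A"),
]
type1, type2 = split_by_dbsnp(variants, dbsnp_keys)
print(len(type1), "in dbSNP;", len(type2), "novel")
```

Without matched normals, the novel set is where somatic candidates would be found, mixed with rare germline variants.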
  29. Coding, novel (no dbSNP)
  30. Copy Number from Exomes
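The slide does not say how copy number was derived, so the following is only one common approach, sketched under assumptions: per-exon read depth is available for a test sample and a reference sample, and each is scaled by its total depth before taking a log2 ratio. All names and numbers are hypothetical.

```python
import math


def log2_ratios(test_depth, reference_depth):
    """Per-exon log2(test/reference) after scaling each sample by total depth."""
    test_total = sum(test_depth.values())
    reference_total = sum(reference_depth.values())
    ratios = {}
    for exon, ref_count in reference_depth.items():
        test_count = test_depth.get(exon, 0)
        if ref_count == 0 or test_count == 0:
            continue  # skip exons without coverage in either sample
        scaled_test = test_count / test_total
        scaled_ref = ref_count / reference_total
        ratios[exon] = math.log2(scaled_test / scaled_ref)
    return ratios


# Made-up depths: exon e4 has twice the reads in the test sample.
reference_depth = {"e1": 100, "e2": 100, "e3": 100, "e4": 100}
test_depth = {"e1": 100, "e2": 100, "e3": 100, "e4": 200}
for exon, ratio in log2_ratios(test_depth, reference_depth).items():
    print(exon, round(ratio, 2))
```

Capture efficiency varies by exon, so ratios against a reference processed the same way are usually more informative than raw depth.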
  31. Complete Genome Sequencing: Complete Genomics Data
  32. Data
      • Delivery
          • Results delivered via USB
      • Storage
          • Sizes are LARGE
              • 400 GB per sample as delivered, with raw reads included
          • Should use backed-up storage in 2 locations
              • Not trivial to find such storage, so might resort to multiple USB drives
          • Minimize:
              • Data movement
              • Keeping multiple copies indefinitely
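A quick back-of-the-envelope calculation shows how fast this adds up. The 400 GB per sample and the two storage locations come from the slide; the function name and the sample count of 10 are illustrative assumptions.

```python
def storage_needed_gb(n_samples, gb_per_sample=400, locations=2):
    """Total storage in GB when every sample is kept at each location."""
    return n_samples * gb_per_sample * locations


total_gb = storage_needed_gb(10)
print(f"{total_gb} GB (about {total_gb / 1000:.0f} TB) for 10 samples")
```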
  33. Breakdown of Data Sizes
  34. Data
      • Delivery
      • Storage
      • Processing
          • Data are typically tab-delimited text files, so Excel can be useful for examining individual small files
          • Generally, command-line tools are needed
          • Mac OS and Linux are the only supported operating systems, but Windows might work
          • Some analyses (snpdiff) require large memory
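For files too large for Excel, tab-delimited text can be streamed row by row. A minimal sketch using Python's standard `csv` module; the column names (`chromosome`, `begin`, `end`, `varType`) and the rows are hypothetical and stand in for whatever header the delivered files actually use.

```python
import csv
import io


def filter_rows(handle, column, wanted):
    """Yield rows of a tab-delimited file whose `column` equals `wanted`."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row[column] == wanted:
            yield row


# An in-memory stand-in for a large file opened with open(path, newline="").
example = io.StringIO(
    "chromosome\tbegin\tend\tvarType\n"
    "chr1\t1000\t1001\tsnp\n"
    "chr1\t2000\t2003\tdel\n"
    "chr2\t3000\t3001\tsnp\n"
)
snps = list(filter_rows(example, "varType", "snp"))
print([(row["chromosome"], row["begin"]) for row in snps])
```

Because the reader is a generator, only one row is held in memory at a time, which matters at these file sizes.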
  35. Directory Structure
  36. Workflows
      • Tumor/Normal
          • Copy Number
          • Structural Variation
          • Annotated Somatic Variants
      • Germline
          • List of annotated genotypes per individual, summarized into a single file that can be used for filtering
  37. Germline Workflow
  38. Germline Workflow
      • Output
      • Future directions
          • Be "smarter" about inheritance framework
          • Further refinements of comparison to other data types (exomes, SNP arrays, RNA-seq)
  39. Tumor/Normal Workflow
  40. Medvedev et al., Nature 2009
  41. Frequent genetic alterations in three critical signalling pathways.
      The Cancer Genome Atlas Research Network. Nature 000, 1-8 (2008) doi:10.1038/nature07385
  42. Chromatin
      • Chromatin is the complex of protein and DNA that makes up the chromosomes.
      • It is not a static structure.
  43.
      • DNAse is an enzyme that cuts DNA at locations where DNA is accessible
      • These "accessible" regions have been associated with open chromatin
      • Regions of open chromatin are necessary for transcriptional and regulatory machinery to have access to gene neighborhoods and facilitate transcription
  44. DNAse Hypersensitivity
      • Method for finding regions of "open" chromatin
      • In data published with the ENCODE consortium, DNAse hypersensitive (HS) sites were shown to be correlated with:
          • Histone modification
          • Transcription start sites
          • Early replicating regions
          • Transcription factor binding sites (experimentally determined by ChIP/chip, etc.)
      Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. The ENCODE Consortium. Nature, 2007.
  45. DNAse-chip Method
      Crawford, G.E., Davis, S., Scacheri, P.C., Renaud, G., Halawi, M.J., Erdos, M.R., Green, R., Meltzer, P.S., Wolfsberg, T.G., and Collins, F.S. Nat Methods, 2006
  46. DNAse-Seq Method
      Crawford, G.E., Davis, S., Scacheri, P.C., Renaud, G., Halawi, M.J., Erdos, M.R., Green, R., Meltzer, P.S., Wolfsberg, T.G., and Collins, F.S. Nat Methods, 2006
  47. DNAse Sites Relative to Genes
  48. DNAse HS Sites and Gene Expression
      • DNAse HS sites near transcription start sites are associated with actively transcribed genes.
  49. Nucleosome Positioning
      • Distances between sequences in non-DNAse HS regions show an oscillating pattern whose period corresponds to a single turn of the double helix
      • DNAse is known to cut preferentially in the minor groove, which is exposed every 10.4 bases when DNA is wrapped around a nucleosome
      • A nucleosome is wrapped by 147 base pairs of DNA
      • Implication: Nucleosomes are positioned in a highly organized, precise manner
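The oscillating pattern can be looked for by building a histogram of distances between cut sites and asking which period carries the most Fourier power. The sketch below runs on synthetic positions spaced about 10.4 bp apart, not on the talk's data, and the function names are hypothetical.

```python
import cmath
from collections import Counter


def distance_histogram(positions, max_dist):
    """Count pairwise distances (1..max_dist) between cut positions."""
    positions = sorted(positions)
    hist = Counter()
    for i, left in enumerate(positions):
        for right in positions[i + 1:]:
            dist = right - left
            if dist > max_dist:
                break  # positions are sorted, so later ones are farther still
            if dist > 0:
                hist[dist] += 1
    return hist


def dominant_period(hist, candidate_periods):
    """Return the candidate period with the largest Fourier power in hist."""
    def power(period):
        total = sum(count * cmath.exp(2j * cmath.pi * dist / period)
                    for dist, count in hist.items())
        return abs(total) ** 2
    return max(candidate_periods, key=power)


# Synthetic cut sites spaced ~10.4 bp apart and rounded to whole bases.
cuts = [round(k * 10.4) for k in range(50)]
periods = [p / 10 for p in range(80, 131)]  # 8.0 to 13.0 bp in 0.1 bp steps
best = dominant_period(distance_histogram(cuts, 100), periods)
print(best)
```

On real data the histogram would also contain a large non-periodic background, so the peak would be far less clean than in this synthetic case.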
  50. The Last Mile