2013 july 25 systems biology rna seq v2


  1. Cancer Systems Biology: RNA-Seq and Differential Expression Analysis
     Taking Advantage of a Measurement Revolution
     July 25, 2013
     Anne Deslattes Mays, Wellstein/Riegel Laboratory
     Mentor: Anton Wellstein, MD, PhD
     7/25/13 Wellstein/Riegel Laboratory
  2. Talk Outline
     • On the Shoulders of Giants
     • Sequencing Timeline
     • RNA-Seq for Everyone
     • RNA-Sequencing Details
     • Differential Expression Analysis
     • Causality
     • Cancer Therapeutics Example
     • Ask Bigger Questions – Sequencing Everything
  3. Rosalind Franklin
     "pioneered use of x-rays to create images of unorganized matter – such as large biological molecules – not just single crystals" (http://www.pbs.org/wgbh/aso/databank/entries/bofran.html)
     "Franklin made equipment adjustments to produce an extremely fine beam of x-rays. She extracted finer DNA fibers than ever before and arranged them in parallel bundles. Studied fibers' reactions to humid conditions. … allowed her to discover crucial keys to DNA's structure…. Wilkins shared this with Watson & Crick at Cambridge without her knowledge…"
  4. Sequencing Timeline
  5. Human Sequencing Timeline
     Key technical advances: the Celera human sequence was done in one location, on the largest supercomputer in private hands at that time
  13. Cancer Systems Biology: Taking Advantage of the Measurement Revolution
     • Declining sequencing costs, decreasing computing costs
     • How do you leverage all this data?
     Figure: GEO, May 25, 2012 vs. GEO, June 25, 2013
  14. Here is an example RNA-Seq workflow. Stages shown: Experimental Design, Sample Collection, Quality Control, Read Trimming, Differential Analysis, Transcript Identification, Pathway Analysis, Feature Discovery, Sequencing
  15.–21. Figures from http://rnaseq.uoregon.edu/index.html
  22. Replicates: Type I and Type II errors
  23. Detecting Signal vs. Noise
  25. What is the goal of the sequencing experiment?
  28. Before Library Construction: RNA quality must be assessed
     1. Most vendors and cores will assess the quality of the RNA before sequencing
     2. It is important to determine RNA quality before sequencing begins
     Garbage in == garbage out
  29. RNA-Seq
  30. Three steps to get to a fresh sequence with the Illumina Genome Sequence Analyzer
     • Library generation
     • Cluster generation
     • Sequencing
  31. Library Construction: Messenger RNAs are poly-A selected from total RNA and fragmented, and cDNA is synthesized
     1. Poly-A selection (total RNA → mRNA)
     2. mRNA fragmentation
     3. First-strand synthesis (stop here if you want to maintain strand specificity)
     4. Second-strand synthesis
     Other techniques (ribosomal RNA depletion instead of poly-A selection):
     1. Ribo-Zero
     2. RiboMinus
  32. Library Construction: End repair and adenylation result in constructs ready for adapter ligation
     cDNA (single or double stranded)
     1. cDNA is blunt-end repaired and phosphorylated (B.)
     2. An A base is added to prepare for indexed adapter ligation (C.)
  33. Library Construction: Adapter ligation results in cluster-generation-ready constructs
     Index adapter ligation, producing a product ready for amplification on the cBot or the Cluster Station
     1. Strand-specific tags are added to the A base: ligate index adapter (D)
     2. Denature and amplify for final product (E)
  34. Cluster Generation: In the Illumina cBot system, single molecules are isothermally amplified in a flow cell to prepare them for sequencing
     Single DNA molecules hybridize to the lawn of oligos grafted to the surface of the flow cell
     1. Oligo lawn
     2. Oligos hybridize to the adapters that were ligated to the library fragments as they flow through the cell
  35. Cluster Generation: Bound fragments are extended to make copies, and reverse strands are cleaved and washed away
     Bridge amplification results in hundreds of millions of unique clusters
     1. Each fragment is clonally amplified through a series of extensions and isothermal bridge amplifications
     2. Reverse strands are cleaved and washed away
     3. Ends are blocked
     4. Sequencing primer is hybridized to the DNA template
     5. Libraries are ready for sequencing
  36. Sequencing: Hundreds of millions of clusters are sequenced simultaneously
     Four fluorescently labeled, reversibly terminated nucleotides
     1. Each base competes for addition
     2. Natural competition ensures highest accuracy
     3. After each round of synthesis, clusters are excited by a laser and emit a color that identifies the newly added base
     4. The fluorescent label and blocking group are removed, allowing addition of the next nucleotide
     5. Proprietary (Illumina) chemistry reads one base in each cycle
     6. Allows accurate sequencing through difficult regions such as homopolymers and repetitive sequence
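Here is a toy sketch of the idea in step 3. It assumes one intensity reading per dye channel per cycle and a hypothetical A/C/G/T channel order, and it calls each cycle's base from the brightest channel. Illumina's real base caller is proprietary and also corrects for channel cross-talk and phasing, so this only illustrates the principle.

```python
BASES = "ACGT"  # hypothetical channel order, for illustration only


def call_bases(cycles):
    """Call one base per cycle from four-channel fluorescence intensities.

    cycles: list of (a, c, g, t) intensity tuples, one per sequencing cycle.
    Returns the called sequence and, per cycle, the fraction of the total
    signal that came from the brightest channel (a crude confidence measure).
    """
    sequence = []
    purity = []
    for intensities in cycles:
        total = sum(intensities)
        brightest = max(range(len(BASES)), key=lambda i: intensities[i])
        sequence.append(BASES[brightest])
        purity.append(intensities[brightest] / total if total > 0 else 0.0)
    return "".join(sequence), purity


# Two hypothetical cycles: a bright first channel, then a bright third channel.
seq, purity = call_bases([(900, 50, 30, 20), (10, 20, 800, 40)])
print(seq, [round(p, 3) for p in purity])
```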
  37. There are other ways to inquire about the transcriptome
     • Array-based technologies
       – Affymetrix
       – Agilent
       – Known genes and hybridization protocols
     • Microarray
       – 20,000+ array experiments on a single platform
       – Edge effects
       – False positives / false negatives
     • Bead-based arrays
     • Tiling arrays
     • SAGE
  38. What is unique about RNA-Seq?
     • Allows you to discover and profile the entire transcriptome of any organism
     • No probes or primers to design
     • Novel transcripts
     • Novel isoforms
     • Alternative splice sites
     • Rare transcripts
     • cSNPs – all of this in one experiment
  39. After sequencing, we must then perform RNA-Seq data analysis
     1. Quality control: trim your reads
     2. Count reads
        • Align to genome
        • Align to transcriptome
     3. Interpret data
        • Statistical tests (differential expression analysis)
        • Visualization (mapped reads)
        • Pathway analysis
     Not so simple: big data, big compute requirements
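Step 1 (quality trimming) can be sketched as a sliding-window trim. This is a simplified version of the idea behind Trimmomatic's SLIDINGWINDOW step, not Trimmomatic's code. Qualities are assumed to be Phred+33 encoded, and the read is cut at the start of the first window whose mean quality falls below the threshold. The window size of 4 and threshold of 20 are illustrative values, not recommendations from the slides.

```python
def phred33(quality_string):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(seq, qual, window=4, threshold=20):
    """Scan from the 5' end and cut the read at the start of the first
    window whose mean Phred score is below `threshold`.

    Reads shorter than the window are returned unchanged.
    """
    scores = phred33(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            return seq[:start], qual[:start]
    return seq, qual


# 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
print(trimmed_seq, trimmed_qual)
```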
  42. RNA-Seq flow chart, reference (steps 1–4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
     Step 1: align-reads. Gsnap is used to align reads to the genome sequence.
     Inputs: FASTQ PE* reads; reference genome assembly (WGS); existing gene models (gtf files with tss ids)*; gene models mapped to reference
     Processes: FASTQC, trimmomatic, gsnap, samtools
     Outputs: trimmed PE* reads; quality control consensus per-read-length graphs; Gsnap-aligned BAM files (Gsnap.CoordSorted.bam)
     • Tss ids = transcription start site ids, in a gtf file format
     • PE = paired end
     • The gene models that are built with the PASA pipeline can be input to TopHat
     Legend: an unshaded rectangle represents code to be run (a process); a shaded rectangle is a file or a graphic, which may be an input and/or an output; a dark rectangle represents a file that can be displayed as a track in crop-pedia.
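The quality-control output (FASTQC's per-position quality graph) boils down to a mean Phred score at each read position. Below is a minimal sketch that assumes Phred+33 quality strings. FASTQC itself reports considerably more than this.

```python
def per_position_mean_quality(quality_strings, offset=33):
    """Mean Phred score at each read position across all reads.

    quality_strings: iterable of FASTQ quality lines (Phred+33 by default).
    Reads may differ in length (e.g. after trimming); each position is
    averaged only over the reads long enough to reach it.
    """
    sums, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


print(per_position_mean_quality(["IIII", "II##", "I"]))
```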
  43. RNA Alternative Splicing: Why you need gapped aligners
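A gapped (spliced) aligner reports a read that spans an intron as an N (skipped region) operation in the SAM CIGAR string. The minimal sketch below recovers intron coordinates from one alignment. It assumes the SAM conventions of a 1-based POS field and that the M, D, N, = and X operations consume reference bases.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REFERENCE_CONSUMING = set("MDN=X")


def introns_from_cigar(pos, cigar):
    """Return the introns (N operations) of one spliced alignment.

    pos: 1-based leftmost reference position (the SAM POS field).
    cigar: the SAM CIGAR string, e.g. "50M200N50M".
    Introns are 1-based, inclusive (start, end) reference intervals.
    """
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            introns.append((ref, ref + n - 1))
        if op in REFERENCE_CONSUMING:
            ref += n
    return introns


# 50 aligned bases, a 200 bp intron, then 50 more aligned bases.
print(introns_from_cigar(100, "50M200N50M"))
```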
  44. RNA-Seq flow chart, reference (steps 1–4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
     Step 2: assemble-reads. Trinity is used to assemble the RNA-Seq reads in each partition. This can be done in a massively parallel manner, typically requiring little RAM compared with whole de novo RNA-Seq assemblies, and can be executed on standard hardware. The first step (prep_rnaseq_alignments_for_genome_assisted_assembly.pl) partitions the reads according to covered regions.
     Input: Gsnap.CoordSorted.bam
     find Dir_* -name "*reads" > read_files.list
     read_files.list → GG_write_trinity_cmds.pl → Trinity_GG.cmds → ParaFly
     find Dir_* -name "*inity.fasta" -exec cat {} \; | inchworm_accession_incrementer.pl > Trinity_GG.fasta
     Output: Trinity_GG.fasta
     (Legend and footnotes as on slide 42.)
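One way to picture the partitioning step is to group coordinate-sorted alignments into clusters of overlapping reads, each of which is then assembled independently. This is a simplified sketch of that idea, not the logic of prep_rnaseq_alignments_for_genome_assisted_assembly.pl. Alignments are assumed to be hypothetical (chrom, start, end) tuples with inclusive coordinates, sorted by chromosome and then by start, as in a coordinate-sorted BAM.

```python
def partition_by_coverage(alignments):
    """Group sorted alignments into partitions of overlapping reads.

    alignments: list of (chrom, start, end) tuples, sorted by chrom then start.
    Returns a list of dicts with the covered region and the indices of the
    reads that fall into it.
    """
    partitions = []
    for idx, (chrom, start, end) in enumerate(alignments):
        last = partitions[-1] if partitions else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            # Read overlaps the current covered region: extend it.
            last["end"] = max(last["end"], end)
            last["reads"].append(idx)
        else:
            # Gap in coverage (or new chromosome): start a new partition.
            partitions.append({"chrom": chrom, "start": start, "end": end, "reads": [idx]})
    return partitions


reads = [("chr1", 1, 100), ("chr1", 50, 150), ("chr1", 300, 400), ("chr2", 1, 50)]
for part in partition_by_coverage(reads):
    print(part)
```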
  45. RNA-Seq flow chart, reference (steps 1–4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
     Steps 3 and 4: align-transcripts and assemble-transcript-alignments
     Input: Trinity_GG.fasta → Launch_PASA_pipeline.pl
     Outputs:
     • Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.gtf
     • Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.bed
     • Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.gff3
     • Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.fasta
     (Legend and footnotes as on slide 42.)
  46. RNA-Seq flow chart, Step 5: Tuxedo Suite. Using the output of the Trinity genome-guided assembly and the PASA and keygene annotation pipelines → call the Tuxedo suite (in parallel with calling the abundance estimator RSEM)
     Gff3 (gene model) → gff3togtf (convert to gtf format) → gtf (gene model)
     tophat (calls Bowtie2) → Junctions.bed, Accepted.hits.sam
     (Legend and footnotes as on slide 42.)
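The gff3-to-gtf conversion mainly rewrites the attribute column. GFF3 links exons to transcripts through Parent=, while GTF (as read by TopHat and Cufflinks) expects gene_id and transcript_id on every exon line. The minimal sketch below assumes a simple three-level gene → mRNA → exon GFF3 with tab-separated columns and IDs that contain no GFF3-escaped characters. For real data, gffread from the Cufflinks package is a tested tool for this job.

```python
def parse_gff3_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=t1;Parent=g1' into a dict."""
    attributes = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value
    return attributes


def gff3_to_gtf(lines):
    """Convert exon and CDS features of a gene -> mRNA -> exon GFF3 into GTF lines."""
    rows = [line.rstrip("\n").split("\t") for line in lines
            if line.strip() and not line.startswith("#")]
    rows = [cols for cols in rows if len(cols) >= 9]

    # First pass: remember which gene each transcript belongs to.
    transcript_to_gene = {}
    for cols in rows:
        if cols[2] in ("mRNA", "transcript"):
            attrs = parse_gff3_attributes(cols[8])
            transcript_to_gene[attrs["ID"]] = attrs.get("Parent", attrs["ID"])

    # Second pass: one GTF line per exon/CDS and per parent transcript.
    gtf_lines = []
    for cols in rows:
        if cols[2] not in ("exon", "CDS"):
            continue
        attrs = parse_gff3_attributes(cols[8])
        for transcript_id in attrs["Parent"].split(","):
            gene_id = transcript_to_gene.get(transcript_id, transcript_id)
            gtf_attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
            gtf_lines.append("\t".join(cols[:8] + [gtf_attrs]))
    return gtf_lines


# Hypothetical three-line gene model.
example = [
    "##gff-version 3",
    "chr1\tPASA\tgene\t100\t500\t.\t+\t.\tID=g1",
    "chr1\tPASA\tmRNA\t100\t500\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tPASA\texon\t100\t200\t.\t+\t.\tID=e1;Parent=t1",
]
for line in gff3_to_gtf(example):
    print(line)
```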
  47. RNA-Seq Quantitation and Differential Analysis
     Trinity genome-guided assembly (Trinity.fasta) and trimmed PE* reads → abundance estimation (RSEM) → RSEM.isoform.results → quantitation (matrix file with counts per isoform)
     Model building/differential analysis: limma model (design/contrast matrix building), randomForest, pcalg, Genie3.R, DREAM4
     Tuxedo suite: Accepted.hits.sam and transcripts (.gtf/.gff*) → cuffdiff2
     • Transcript annotation file produced by cufflinks, cuffcompare, or another source
     • Counts and read group tracking files are also created
     cuffdiff2 outputs: isoforms.fpkm_tracking, genes.fpkm_tracking, cds.fpkm_tracking, tss_groups.fpkm_tracking, isoform_exp.diff, gene_exp.diff, tss_group_exp.diff, cds_exp.diff
     (Legend and footnotes as on slide 42.)
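Downstream of cuffdiff2, a common first step is to pull the significant genes out of gene_exp.diff. The minimal sketch below uses Python's csv module. It assumes the tab-separated header described in the cuffdiff manual (with columns such as status, gene, log2(fold_change), q_value and significant). Check the column names against your cuffdiff version before relying on it.

```python
import csv
import io


def significant_genes(diff_text, q_cutoff=0.05):
    """Return (gene, log2 fold change, q value) for significant rows.

    diff_text: contents of a cuffdiff gene_exp.diff file (tab-separated,
    with a header line). Rows whose status is not "OK" (e.g. NOTEST) are skipped.
    """
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    hits = []
    for row in reader:
        if row["status"] != "OK":
            continue
        q_value = float(row["q_value"])
        if row["significant"] == "yes" and q_value <= q_cutoff:
            hits.append((row["gene"], float(row["log2(fold_change)"]), q_value))
    return hits


# Typical use (file name as written by cuffdiff):
# with open("gene_exp.diff") as handle:
#     hits = significant_genes(handle.read())
```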
  48. How much RNA-sequencing data, how much computation power, and where do you go to compute?
     How much RNA-sequencing data?
     1. 20 million paired-end reads ~ 2 GB of data
     2. 100 million paired-end reads ~ 10 GB of data
     How much computation power?
     1. More memory and more processors mean less time to compute
     2. If you outsource the analysis, you will still need to store the results somewhere
     • Amazon Web Services: S3 storage; EC2 (Elastic Compute Cloud) on-demand computational facility
     • Georgetown University High Performance Computer Core: matrix.georgetown.edu
     • Penn State Galaxy services (galaxy.psu.edu)
  49. A growing number of tools enable RNA-Seq analysis
  50. What percentage of reads are covered? What percentage of reads are mapped?
     3' bias on transcript reads
     1. 60–80% of reads are mapped
     2. The highest percentage of mapped reads falls at the 3' end of transcripts
     3. Reads need to be quality trimmed
     Mapping tools bias exons to known genes
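Both questions on this slide reduce to simple ratios. The sketch below assumes you already have mapped and total read counts, plus a per-base coverage vector for a transcript ordered 5' to 3'. The 20% end windows and the read counts in the usage lines are hypothetical choices. A 3'/5' ratio well above 1 indicates the kind of 3' bias described here.

```python
def mapping_rate(mapped_reads, total_reads):
    """Fraction of reads that aligned (the slide reports 60-80%)."""
    return mapped_reads / total_reads


def three_prime_bias(coverage, end_fraction=0.2):
    """Mean depth over the 3'-most `end_fraction` of a transcript divided by
    mean depth over the 5'-most `end_fraction`.

    coverage: per-base read depth ordered 5' -> 3'.
    """
    n = max(1, int(len(coverage) * end_fraction))
    five_prime = sum(coverage[:n]) / n
    three_prime = sum(coverage[-n:]) / n
    return three_prime / five_prime if five_prime > 0 else float("inf")


print(mapping_rate(14_000_000, 20_000_000))
print(three_prime_bias(list(range(1, 11))))  # depth rising toward the 3' end
```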
  51. Galaxy is a web-based tool committed to enabling researchers (more than just for RNA-Seq)
  53. How to visualize mapped results?
     • UCSC Genome Browser
     • GBrowse
     • Integrated Genome Browser (IGB)
     • Integrative Genomics Viewer (IGV)
     These share many formats, read many of the outputs generated by the analysis programs, and let you generate your own tracks
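Generating your own track can be as simple as writing a BED file, which the UCSC Genome Browser, IGB and IGV all read. Here is a minimal sketch in which the track name, description and coordinates are hypothetical. BED start coordinates are 0-based and end coordinates are exclusive.

```python
def bed_track(name, description, features):
    """Build a minimal BED custom track as text.

    features: iterable of (chrom, start, end, feature_name) tuples, with
    0-based start and exclusive end, as the BED format requires.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, feature_name in features:
        lines.append(f"{chrom}\t{start}\t{end}\t{feature_name}")
    return "\n".join(lines) + "\n"


text = bed_track("my_reads", "Hypothetical example track",
                 [("chr21", 23600000, 23600500, "region1")])
print(text, end="")
```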
  54. UCSC Genome Browser view (hg19, chr21, approx. 23,600,000–23,650,000; 50 kb scale) showing ENCODE and annotation tracks (DNase clusters, transcription factor ChIP-seq, H3K27Ac, RefSeq and UCSC genes, human mRNAs and spliced ESTs, dbSNP 137, Multiz conservation, ChIA-PET, DNA methylation, 5C chromatin interactions), custom tracks C7 Random and C7 Targeted, and individual mapped paired-end Illumina reads with names such as HWI-ST1129:97:D0LRDACXX:6:2208:3356:23592_2:N:0:CACTCA
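The read names in this view follow the Illumina CASAVA 1.8+ convention: instrument:run:flowcell:lane:tile:x:y, then read number:filtered flag:control number:index sequence. The two halves are normally separated by a space; the names shown here use an underscore. The minimal parser below assumes exactly that layout. It can be used, for example, to confirm that two mates share the same cluster coordinates.

```python
import re


def parse_illumina_read_name(name):
    """Split a CASAVA 1.8+ style read name into its fields.

    Accepts a leading '@' and either a space or an underscore between the
    cluster part and the read part. Raises ValueError for other layouts.
    """
    name = name.lstrip("@")
    parts = re.split(r"[ _](?=\d:[YN]:)", name, maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"unrecognized read name: {name}")
    cluster, read_info = parts
    instrument, run, flowcell, lane, tile, x, y = cluster.split(":")
    read, filtered, control, index = read_info.split(":")
    return {
        "cluster": cluster,  # identical for both mates of a pair
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read),
        "filtered": filtered == "Y",
        "control": int(control),
        "index": index,
    }


r1 = parse_illumina_read_name("HWI-ST1129:97:D0LRDACXX:6:2208:3356:23592_1:N:0:CACTCA")
r2 = parse_illumina_read_name("HWI-ST1129:97:D0LRDACXX:6:2208:3356:23592_2:N:0:CACTCA")
print(r1["cluster"] == r2["cluster"], r2["lane"], r2["read"], r2["index"])
```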
  57. What do RNA-Seq reads look like for GAPDH? Repeat-masked, BLAT-aligned reads allowing 1 or 2 mismatched bases, viewed in IGB 6.7.2
  58. RNA-Seq Differential Expression Analysis
  59. What does GAPDH look like in terms of quantitation?
     Gene     BM RPKM   BM 3SEQ   BM BLAT   HPP RPKM   HPP 3SEQ   HPP BLAT
     CD34     0.7       340       230       8          8          14
     BST1     19.7      5374      -         31         31         -
     CD133    0.2       173       176       16         16         33
     THY1     0         7         -         4          4          -
     A12      -         -         1         -          -          0
     A5       -         -         0         -          -          0
     ALK      0         9         24        0          0          3
     B9       -         -         0         -          -          0
     C1       -         -         0         -          -          0
     C2       -         -         0         -          -          0
     C7       -         -         0         -          -          0
     E7       -         -         0         -          -          0
     E9       -         -         2         -          -          0
     F6       -         -         0         -          -          0
     G12      -         -         0         -          -          0
     GAPDH    3013.2    727831    356289    120.8      5559       2670
     H3       -         -         0         -          -          0
     (BM = TOTAL BM; 3SEQ = 3SEQ counts; BLAT = BLAT reads; "-" = not reported)
     GAPDH, TOTAL BM vs. HPP: BLAT raw read count ratio ≈ 3SEQ count ratio ≈ 130 to 1; RPKM ratio ≈ 24.9
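The ratios quoted on this slide can be checked directly from the GAPDH row (TOTAL BM divided by HPP). The count-based ratios come out near 131 and 133 to 1, while the RPKM ratio is about 24.9. Because the gene length is the same in both samples, the gap between these ratios must come from RPKM's additional division by each library's total number of mapped reads, which presumably differs between the two samples.

```python
# GAPDH row of the table: (TOTAL BM value, HPP value) for each measure.
gapdh = {
    "RPKM": (3013.2, 120.8),
    "3SEQ counts": (727831, 5559),
    "BLAT reads": (356289, 2670),
}

ratios = {measure: bm / hpp for measure, (bm, hpp) in gapdh.items()}
for measure, ratio in ratios.items():
    print(f"{measure}: {ratio:.1f} to 1")
```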
  60. RNA-Seq Quantification Challenge: a problem that exists with RNA-Seq data but not with array data is that longer transcripts produce more reads than shorter transcripts
     One solution that accounts for this is RPKM (FPKM is used by Cufflinks):
     RPKM = 10^9 × C / (N × L)
     C = the number of mappable reads that fall onto the gene's exons
     N = total number of mappable reads in the experiment
     L = the sum of the gene's exon lengths, in base pairs
     RPKM normalizes C by both library size (per million mapped reads) and gene length (per kilobase); dropping the length term gives CPM = 10^6 × C / N. Wold (2008)
     RPKM = reads per kilobase per million; CPM = counts per million
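Here are the two normalizations on this slide as a minimal sketch. The count, library size and gene length in the usage lines are hypothetical. Comparing the two functions shows that RPKM is simply CPM divided by the gene length in kilobases.

```python
def rpkm(count, total_mapped, gene_length_bp):
    """Reads per kilobase of exon per million mapped reads: 10^9 * C / (N * L)."""
    return 1e9 * count / (total_mapped * gene_length_bp)


def cpm(count, total_mapped):
    """Counts per million mapped reads: 10^6 * C / N (no length correction)."""
    return 1e6 * count / total_mapped


# Hypothetical gene: 1,000 reads, 20 million mapped reads, 2,000 bp of exons.
print(rpkm(1000, 20_000_000, 2000), cpm(1000, 20_000_000))
```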
  61. RNA-Seq Quantification Challenge: the DESeq method uses the geometric mean of counts in all samples
     DESeq method:
     1. Construct a "reference sample" by taking, for each gene, the geometric mean of the counts in all samples.
     2. To get the sequencing depth of a sample relative to the reference, calculate for each gene the quotient of the counts in your sample divided by the counts of the reference sample. Now you have, for each gene, an estimate of the depth ratio.
     3. Take the median of all the quotients to get the relative depth of the library.
     The estimateSizeFactors function of the DESeq package does this calculation.
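The median-of-ratios calculation described above can be sketched in a few lines of Python, working in log space. Genes with a zero count in any sample are skipped, since their geometric mean is zero. This illustrates the method; it is not the DESeq package's code, and the counts in the usage line are hypothetical.

```python
import math
from statistics import median


def estimate_size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Reference sample: per-gene geometric mean, kept in log space.
    # Genes with a zero in any sample have a zero geometric mean and are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Per-gene log quotient of this sample against the reference sample.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
print(estimate_size_factors([[10, 20], [20, 40], [30, 60], [0, 5]]))
```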
  62. DESeq: an R package that works with raw counts to determine genes differentially expressed across samples
     • Simon Anders
  66. Given a list of differentially expressed genes, enrichment analysis should now be performed
     • Enrichment analysis lets researchers leverage documented experiments that provide evidence for genes' roles in pathways and functions, helping them determine the results and significance of their own experiments
     • DAVID
       – Gene ontology
       – Functional ontology
     • REVIGO
       – Output of DAVID may be placed in REVIGO for further interpretation and statistical exploration of the significance of discovered sets of genes
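Enrichment tools such as DAVID rest on an over-representation test. Suppose n differentially expressed genes are drawn from a background of N genes, K of which carry an annotation term: how surprising is it to see k or more of them carrying it? Below is a minimal sketch of the standard one-sided hypergeometric (Fisher's exact) p-value. DAVID reports a more conservative variant (the EASE score), and the numbers in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. all genes measured)
    K: background genes annotated with the term
    n: differentially expressed genes
    k: differentially expressed genes annotated with the term
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical: all 4 DE genes fall in a 5-gene term, out of 20 genes measured.
print(enrichment_p_value(4, 4, 5, 20))
```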
  67. Using differentially expressed genes, biological pathways should be explored
     • Differentially expressed genes are put into programs such as Pathway Studio or Ingenuity
     • Shortest-path programs and canonical pathway analysis
     • Enables a researcher to reverse engineer the pathways expressed in a healthy response compared with a diseased response
     • Ideally a pathway reveals the observed phenotype, connecting the gene expression program with the phenotype: genotype → gene expression program → phenotype
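When every edge counts equally, a shortest-path search between two genes in an interaction network is a breadth-first search. The minimal sketch below runs on a hypothetical toy network; GENE_A to GENE_E are placeholders, not real interactions. Commercial tools such as Pathway Studio or Ingenuity use their own curated networks and scoring.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search over an interaction graph given as an adjacency
    dict. Returns the nodes on one shortest path, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


toy_network = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A", "GENE_D"],
    "GENE_C": ["GENE_A", "GENE_D"],
    "GENE_D": ["GENE_B", "GENE_C", "GENE_E"],
    "GENE_E": ["GENE_D"],
}
print(shortest_path(toy_network, "GENE_A", "GENE_E"))
```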
  68. RNA-Sequencing: What is it good for?
     • Transcript annotation
       – Mutation identification
       – Isoform determination
       – Alternative splice variation
     • Differential gene expression
       – Phenotypically segregating experiments
       – Allows us to get at the "how" by looking at the response of an organism, within a particular cell population, to events
       – Good and careful design will allow us to unfold the dynamics of this response and identify targets for altering disease responses to improve one's chances of survival
  70. http://bayes.cs.ucla.edu/home.htm
  74. Acknowledgements
     Dr. Anton Wellstein, Dr. Anna Riegel, Dr. Marcel Schmidt, Dr. Elena Tassi
     The entire lab: Elena, Virginie, Ghada, Ivana, Eveline, Khalid, Eric, the entire Wellstein/Riegel laboratory
     My Committee: Dr. Yuri Gusev, Dr. Anatoly Dritschilo, Dr. Michael Johnson, Dr. Christopher Loffredo, Dr. Habtom Ressom, Dr. Terry Ryan (external committee member)
     High Performance Core Group, Steve Moore, especially Woonki Chung
     Amazon Cloud Services
     Dr. Ann Loraine, UNC, IGB Developer
     Brian Haas, Author, Trinity Suite
  75. Some Resources
     • http://rnaseq.uoregon.edu/index.html
     • http://dx.doi.org/10.1038/npre.2010.4282.1 (DESeq)
     • http://galaxy.psu.edu/
     • http://seqanswers.com/
     • http://www.broadinstitute.org/igv/
     • http://bioviz.org/igb/index.html
     • http://www.illumina.com
     • http://www.otogenetics.com
     • http://www.dnanexus.com
     • http://bioconductor.org/packages/2.12/bioc/html/limma.html
     • http://trinityrnaseq.sourceforge.net/
     • http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
     • http://cufflinks.cbcb.umd.edu/
     • http://brb.nci.nih.gov/BRB-ArrayTools.html
     • http://www.modernatx.com/
  76. Systems Biology History (Wikipedia)
     • Systems biology has its roots in
       – Quantitative modeling of enzyme kinetics
       – Mathematical modeling of population growth
       – Simulations to study neurophysiology
       – Control theory and cybernetics
     • Theorists
       – Ludwig von Bertalanffy: General Systems Theory
       – Alan Lloyd Hodgkin and Andrew Fielding Huxley: constructed a mathematical model that explained the action potential propagating along the axon of a neuron
       – Denis Noble: first computer model of the heart pacemaker
  77. Scientific knowledge is limited (and advanced) by the limits (and advancements) of measurement
     • Ilya Shmulevich, Genomic Signal Processing: "Validity of the model involves observation and measurement, scientific knowledge is limited by the limits of measurement"
     • Erwin Schrödinger, Science, Theory and Man: "It really is the ultimate purpose of all schemes and models to serve as scaffolding for any observations that are at all means observable"
