THE	
  RNA-­‐SEQ	
  ANALYSIS	
  PIPELINE	
  

Alicia	
  Oshlack	
  
Murdoch	
  Childrens	
  Research	
  Ins5tute	
  
Genes	
  and	
  transcripts	
  
Gene	
  

transcript	
  
New genomic technologies are driving the science
Two	
  ways	
  to	
  look	
  at	
  sequencing	
  data	
  

Sequence	
  
of(mapped)	
  read	
  

• genome	
  sequencing	
  ...
Two	
  ways	
  to	
  look	
  at	
  RNA-­‐seq	
  data	
  

Sequence	
  
of(mapped)	
  read	
  
	
  
• Assembly	
  	
  
• De...
Benefits	
  and	
  opportuni5es	
  of	
  RNA-­‐seq	
  
•  All	
  transcripts	
  are	
  sequenced	
  not	
  just	
  ones	
  ...
This	
  talk	
  
•  Analysis	
  of	
  RNA-­‐seq	
  data	
  for	
  the	
  purpose	
  of	
  
determining	
  differen5al	
  ex...
RNA-­‐seq	
  

Pepke	
  et	
  al,	
  Nature	
  Methods,	
  
Raw	
  data	
  
•  Short	
  sequence	
  reads	
  
	
  
•  Quality	
  scores	
  

@SEQ_ID!
GATTTGGGGTTCAAAGCAGTATCGATCAAATA...
RNA-­‐seq	
  analysis	
  steps	
  
Raw	
  sequence	
  reads	
  

Map	
  onto	
  genome	
  
Which	
  transcriptome?	
  
Ann...
RNA-­‐seq	
  analysis	
  steps	
  
Raw	
  sequence	
  reads	
  

Map	
  onto	
  genome	
  
Which	
  transcriptome?	
  
Ann...
Map	
  reads	
  to	
  the	
  genome	
  
•  Aligning	
  millions	
  of	
  short	
  sequencing	
  onto	
  the	
  
3	
  billi...
Sequencing	
  transcripts	
  not	
  the	
  genome	
  
Gene	
  
CDS	
  

transcript	
  

CDS	
  

CDS	
  

CDS	
  

CDS	
  ...
Splice	
  site	
  mapping	
  
•  Build	
  a	
  junc5on	
  library	
  from	
  
all	
  combina5ons	
  of	
  known	
  
exon	
...
Which	
  transcriptome	
  to	
  use?	
  
RNA-­‐seq	
  analysis	
  steps	
  
Raw	
  sequence	
  reads	
  

Map	
  onto	
  genome	
  

Genome	
  guided	
  assembly	
...
Op5on	
  1	
  
•  Use	
  annota5on	
  (known	
  genes)	
  
–  Works	
  well	
  for	
  well	
  studied	
  organisms	
  (hum...
Op5on	
  2:	
  Genome	
  guided	
  transcript	
  
assembly	
  
•  Uses	
  the	
  loca5on	
  and	
  density	
  of	
  reads	...
Op5on	
  3:	
  De	
  novo	
  transcriptome	
  
assembly	
  
•  Assemble	
  transcripts	
  from	
  the	
  data	
  without	
...
Example:	
  Annota5ng	
  the	
  chicken	
  W	
  
chromosome	
  
	
  	
  Z	
  	
  	
  	
  Z	
  

Male	
  

	
  	
  Z	
  	
 ...
There	
  is	
  an	
  annotated	
  chicken	
  genome	
  
•  Chicken	
  W	
  chromosome	
  is	
  poorly	
  assembled	
  
•  ...
 	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  Experimental	
  design	
 ...
Defining	
  the	
  transcriptome	
  
•  Annota5on	
  ~20,000	
  genes	
  
•  Genome	
  guided	
  assembly	
  (Cufflinks)	
  ~...
Annota5on	
  of	
  the	
  chicken	
  W	
  
combined	
  all	
  three	
  approaches	
  
W/W_random Chromsome
Un_random Chrom...
Summariza5on	
  
Take	
  your	
  “transcriptome”	
  and	
  add	
  
up	
  the	
  reads	
  
CDS	
  

Coding	
  Sequence	
  

CDS	
  

Exons	
  

Introns	
  

§ 	
  Reads	
  in	
  exons	
  
§ 	
  Exons	
  +	
  jun...
CDS	
  

Transcript	
  1	
  

CDS	
  

CDS	
  

CDS	
  

Transcript	
  2	
  
Transcript	
  3	
  

CDS	
  

CDS	
  

CDS	
 ...
Summariza5on	
  turns	
  mapped	
  reads	
  
into	
  a	
  table	
  of	
  counts	
  
Tag	
  ID!

A1!

A2!

B1!

B2!

ENSG00...
RNA-­‐seq	
  analysis	
  steps	
  
Raw	
  sequence	
  reads	
  

Map	
  onto	
  genome	
  
Which	
  transcriptome?	
  
Ann...
Assessing	
  differen5al	
  expression	
  (DE)	
  
•  Which	
  genes	
  are	
  changing	
  in	
  their	
  abundance	
  
bet...
Normaliza5on	
  
Accoun5ng	
  for/removing	
  technical	
  
sources	
  of	
  varia5on	
  
Library	
  size	
  =	
  sum	
  of	
  each	
  column	
  
Tag	
  ID!

A1!

A2!

B1!

B2!

ENSG00000124208!

478!

619!

4830...
Simple	
  thought	
  experiment	
  
•  Two	
  samples	
  A	
  and	
  B,	
  sequenced	
  to	
  the	
  same	
  depth	
  
(sa...
Another	
  way	
  to	
  view	
  it	
  
•  Hypothe5cal	
  example:	
  
Sequence	
  6	
  libraries	
  to	
  
the	
  same	
  ...
Marioni	
  et	
  al.	
  2008	
  RNA-­‐seq	
  data	
  
•  5	
  lanes	
  of	
  liver	
  RNA,	
  5	
  lanes	
  of	
  kidney	
...
0.8
0.4
0.0

Same	
  sample.	
  
Technical	
  replicates	
  

Density

(a)

-6

-4

-2

0

2

4

6

0.2
0.0

Different	
  s...
Another	
  way	
  to	
  view	
  the	
  data	
  

2

K1

2

K2

–  e.g.	
  several	
  liver-­‐
specific	
  genes	
  

•  An	...
The	
  adjustment	
  to	
  data	
  analysis	
  is	
  a	
  
scaling	
  constant	
  
•  Assump5on:	
  core	
  set	
  of	
  g...
Sta5s5cal	
  tes5ng	
  for	
  DE	
  
•  For	
  EACH	
  GENE,	
  is	
  the	
  mean	
  expression	
  level	
  
for	
  the	
 ...
DE	
  analysis	
  happens	
  in	
  R	
  
•  Several	
  packages	
  currently	
  available	
  for	
  DE	
  analysis	
  
of	...
Counts	
  from	
  a	
  Single	
  RNA	
  Sample	
  Can	
  be	
  
Modeled	
  with	
  Poisson	
  Distribu5on	
  
Short	
  rea...
Counts	
  from	
  a	
  Single	
  RNA	
  Sample	
  Can	
  be	
  
Modeled	
  with	
  Poisson	
  Distribu5on	
  
Short	
  rea...
A	
  Small	
  RNA-­‐Seq	
  Experiment	
  (Tech	
  
Reps)	
  
Condi5on	
  B	
  

Condi5on	
  A	
  

M1	
  

M3	
  

M2	
  
...
True	
  Technical	
  Reps	
  Show	
  Poisson	
  
Varia5on	
  for	
  Each	
  Gene	
  

Davis	
  McCarthy	
  

Data:	
  	
  ...
A	
  Small	
  RNA-­‐Seq	
  Experiment	
  
(Biological	
  Reps)	
  
Condi5on	
  B	
  

Condi5on	
  A	
  

M1	
  

M3	
  

M...
Biological	
  Replicate	
  Data	
  shows	
  Quadra5c	
  
Mean-­‐Variance	
  Rela5onship	
  
(development	
  cycle	
  of	
 ...
Need	
  to	
  account	
  for	
  biological	
  
varia5on	
  
•  Technical	
  replicate	
  counts	
  for	
  a	
  gene	
  var...
Output	
  of	
  analysis	
  
•  A	
  list	
  of	
  genes	
  with	
  p-­‐values	
  for	
  tes5ng	
  
differen5al	
  expressi...
RNA-­‐seq	
  analysis	
  steps	
  
Raw	
  sequence	
  reads	
  

Map	
  onto	
  genome	
  
Which	
  transcriptome?	
  
Ann...
RNA-­‐seq	
  analysis	
  steps	
  
Raw	
  sequence	
  reads	
  

Map	
  onto	
  genome	
  
Which	
  transcriptome?	
  
Ann...
The	
  future	
  
•  Effec5vely	
  dealing	
  with	
  no	
  genome	
  or	
  poor	
  
quality	
  genomes	
  (our	
  new	
  m...
Acknowledgements	
  
• 
• 
• 
• 
• 
• 

Mark	
  Robinson	
  (Uni	
  Zurich)	
  
Mavhew	
  Young	
  (Cambridge)	
  
Nadia	
...
Upcoming SlideShare
Loading in...5
×

The RNA-­‐SEQ Analysis Pipeline - Alicia Oshlack

699

Published on

High-throughput sequencing of the transcriptome leads to the production of millions of short reads which need to be analysed in order to make biological inference. RNA-seq data are complex and the analysis involves a series of steps for which research is ongoing. However the choice of analysis methodology can have a major impact on the findings of an experiment. In this talk I will outline the major steps involved in analysing an RNA-seq dataset for the purpose of detecting differential expression. I will discuss and compare the available strategies for each stage of the pipeline. I will also point out several areas in RNA-seq data analysis which require further research in order for the full potential of the technology to be Realised.

Published in: Technology

The RNA-­‐SEQ Analysis Pipeline - Alicia Oshlack

  1. 1. THE  RNA-­‐SEQ  ANALYSIS  PIPELINE   Alicia  Oshlack   Murdoch  Childrens  Research  Ins5tute  
  2. 2. Genes  and  transcripts   Gene   transcript  
  3. 3. New genomic technologies are driving the science
  4. 4. Two  ways  to  look  at  sequencing  data   Sequence   of(mapped)  read   • genome  sequencing     • variant  detec5on   • Muta5on  detec5on   • genomic  rearrangements   • Bisulfite-­‐seq  (methyla5on)   • RNA  edi5ng  etc.   Posi5on  of  mapped   read   • RNA-­‐seq   • ChIP-­‐seq     • MeDIP-­‐seq  for  DNA  methyla5on   etc.       4  
  5. 5. Two  ways  to  look  at  RNA-­‐seq  data   Sequence   of(mapped)  read     • Assembly     • Determining  genes/ transcripts     Posi5on  of  mapped   read     • Expression  levels   • Differen5al  expression     5  
  6. 6. Benefits  and  opportuni5es  of  RNA-­‐seq   •  All  transcripts  are  sequenced  not  just  ones  for   which  probes  are  designed  (cf  microarrays)   •  Annota5on  of  new  exons,  transcribed  regions,   genes  or  non-­‐coding  RNAs   •  Whole  transcriptome  sequencing   –  The  ability  to  look  at  alterna5ve  splicing   –  Allele  specific  expression   –  RNA  edi5ng  
  7. 7. This  talk   •  Analysis  of  RNA-­‐seq  data  for  the  purpose  of   determining  differen5al  expression   •  How  much  are  expression  levels  changing   between  samples?  
  8. 8. RNA-­‐seq   Pepke  et  al,  Nature  Methods,  
  9. 9. Raw  data   •  Short  sequence  reads     •  Quality  scores   @SEQ_ID! GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT! +! !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65  
  10. 10. RNA-­‐seq  analysis  steps   Raw  sequence  reads   Map  onto  genome   Which  transcriptome?   Annota5on  based   Genome  guided  assembly   Summarize  reads  to  transcripts   Sta5s5cal  tes5ng:  Determine   differen5ally  expressed  genes   Systems  biology   De  novo  assembly  
  11. 11. RNA-­‐seq  analysis  steps   Raw  sequence  reads   Map  onto  genome   Which  transcriptome?   Annota5on  based   Genome  guided  assembly   Summarize  reads  to  transcripts   Sta5s5cal  tes5ng:  Determine   differen5ally  expressed  genes   Systems  biology   De  novo  assembly  
  12. 12. Map  reads  to  the  genome   •  Aligning  millions  of  short  sequencing  onto  the   3  billion  bases  of  the  human  genome   •  Accuracy  vs  speed   •  Many  aligners  available  (BWA,  Bow5e,  STAR,   Novoalign,…)  
  13. 13. Sequencing  transcripts  not  the  genome   Gene   CDS   transcript   CDS   CDS   CDS   CDS   CDS   CDS   CDS  
  14. 14. Splice  site  mapping   •  Build  a  junc5on  library  from   all  combina5ons  of  known   exon  boundaries   •  Determine  where  splice   junc5ons  occur  using  the   data  itself  -­‐  unbiased  by   annota5on.   •  Several  so_ware  packages   to  do  this  such  as  TopHat,   SplitSeek,  SpliceMap…   Gerber  et  al,  Nat  Methods,  2011  
  15. 15. Which  transcriptome  to  use?  
  16. 16. RNA-­‐seq  analysis  steps   Raw  sequence  reads   Map  onto  genome   Genome  guided  assembly   De   Which  transcriptome?   novo  assembly   Annota5on  based   Summarize  reads  to  transcripts   Sta5s5cal  tes5ng:  Determine   differen5ally  expressed  genes   Systems  biology  
  17. 17. Op5on  1   •  Use  annota5on  (known  genes)   –  Works  well  for  well  studied  organisms  (human,   mouse,  arabidopsis,  drosophila,…)     –  only  as  good  as  your  annota5on   –  No  novel  transcripts  are  analysed  
  18. 18. Op5on  2:  Genome  guided  transcript   assembly   •  Uses  the  loca5on  and  density  of  reads  along   the  genome  to  assemble  transcripts   •  E.g.  Cufflinks   •  Can’t  assemble  across  breaks  in  the  genome   –  Cancer,  poor  genomes  
  19. 19. Op5on  3:  De  novo  transcriptome   assembly   •  Assemble  transcripts  from  the  data  without  using  a   reference  genome   •  “Harder”  than  genome  assembly   –  –  –  –  Orders  of  magnitude  varia5on  in  coverage   Con5gs  are  short   Alterna5ve  isoforms/transcripts  have  overlapping  sequences   *Very*  computa5onally  intensive   •  So_ware  includes     –  –  –  –  Oases  (velvet)   TransAbyss   Trinity     …  
  20. 20. Example:  Annota5ng  the  chicken  W   chromosome      Z        Z   Male      Z      W   Female   Two  hypotheses  for  mechanisms  of  avian  sex  determina5on:   1.  Dominant  ovary  determining  gene  on  W  (cf  mammals)   2.  Dosage  of  Z-­‐linked  genes    
  21. 21. There  is  an  annotated  chicken  genome   •  Chicken  W  chromosome  is  poorly  assembled   •  Are  genes  on  other  chromosomes  really  on   the  W,  in  par5cular  the  random  chromosome?   Chromosome Assembled Size (Mb) Size inc. random (Mb) Estimated Size (Mb) Estimated Genes (Ensembl) Z 69 70 80 796 W 0.24 0.89 18-54 46 Un_random 56 - - 1287
  22. 22.                                                  Experimental  design   +12hour  Blastoderms     Pooled  Samples     12  Female       PCR  Sexing   12  Female   RNA   12  Male   12  Male   Stage  26  paired  gonads  (day  4.5)   Hand  plate  for  PCR  Sexing   16  Female  gonads   16  Female  gonads   RNA   16  Male  gonads   16  Male  gonads   RNA-­‐seq   • Illumina  HiSeq2000   • Paired-­‐end  100bp   • 4  lanes   • >80million  reads/ sample    
  23. 23. Defining  the  transcriptome   •  Annota5on  ~20,000  genes   •  Genome  guided  assembly  (Cufflinks)  ~45,000   genes   •  De  novo  transcriptome  assembly  ~2.5  million   transcripts  (Abyss  with  filtering)!   A  combined  approach   •  Assemble  cufflink  genes  using  transcripts  from   our  de  novo  assembly  
  24. 24. Annota5on  of  the  chicken  W   combined  all  three  approaches   W/W_random Chromsome Un_random Chromosome Autosomes Abyss Transcripts Cufflinks Transcripts Ensembl Transcripts 391 Blastoderm Coverage Gonads Coverage 0 Coverage Genome Abyss Cufflinks Ensembl 1000 1500 2000 2500 3000 base position 26 Full  list  of  W  genes/transcripts  for  differen5al  expression   Ayers  et  al,  2013   3500
  25. 25. Summariza5on   Take  your  “transcriptome”  and  add   up  the  reads  
  26. 26. CDS   Coding  Sequence   CDS   Exons   Introns   §   Reads  in  exons   §   Exons  +  junc5ons   §   All  reads  start  to  end  of  transcript   §   De  novo  methods   CDS   CDS   Splice  Junc5ons  
  27. 27. CDS   Transcript  1   CDS   CDS   CDS   Transcript  2   Transcript  3   CDS   CDS   CDS   CDS   CDS   CDS   CDS   CDS   CDS   CDS   CDS   Even  when  all  transcripts  are  “known”  summariza5on  or  expression   quan5fica5on  is  difficult.  How  do  you  assign  reads  to  transcripts?  
  28. 28. Summariza5on  turns  mapped  reads   into  a  table  of  counts   Tag  ID! A1! A2! B1! B2! ENSG00000124208! 478! 619! 4830! 7165! ENSG00000182463! 27! 20! 48! 55! ENSG00000125835! 132! 200! 560! 408! ENSG00000125834! 42! 60! 131! 99! ENSG00000197818! 21! 29! 52! 44! ENSG00000125831! 0! 0! 0! 0! ENSG00000215443! 4! 4! 9! 7! ENSG00000222008! 30! 23! 0! 0! ENSG00000101444! 46! 63! 54! 53! ENSG00000101333! 2256! 2793! 2702! 2976! …" …  tens  of  thousands  more  tags  …" **  very  high  dimensional  data  **  
  29. 29. RNA-­‐seq  analysis  steps   Raw  sequence  reads   Map  onto  genome   Which  transcriptome?   Annota5on  based   Genome  guided  assembly   Summarize  reads  to  transcripts   Sta5s5cal  tes5ng:  Determine   differen5ally  expressed  genes   Systems  biology   De  novo  assembly  
  30. 30. Assessing  differen5al  expression  (DE)   •  Which  genes  are  changing  in  their  abundance   between  samples?   •  Sta5s5cal  tests  for  DE  (edgeR)  
  31. 31. Normaliza5on   Accoun5ng  for/removing  technical   sources  of  varia5on  
  32. 32. Library  size  =  sum  of  each  column   Tag  ID! A1! A2! B1! B2! ENSG00000124208! 478! 619! 4830! 7165! ENSG00000182463! 27! 20! 48! 55! ENSG00000125835! 132! 200! 560! 408! ENSG00000125834! 42! 60! 131! 99! ENSG00000197818! 21! 29! 52! 44! ENSG00000125831! 0! 0! 0! 0! ENSG00000215443! 4! 4! 9! 7! ENSG00000222008! 30! 23! 0! 0! ENSG00000101444! 46! 63! 54! 53! ENSG00000101333! 2256! 2793! 2702! 2976! …" …  tens  of  thousands  more  tags  …"
  33. 33. Simple  thought  experiment   •  Two  samples  A  and  B,  sequenced  to  the  same  depth   (same  library  size)   •  Every  gene  that  is  expressed  in  A,  is  expressed  in  B   at  the  same  level   •  Say  there  are  a  small  number  of  genes  that  are   expressed  uniquely  to  sample  B,  but  they  are  quite   highly  expressed  (lots  of  reads)   •  Many  genes  within  the  common  set  will  appear   differen5ally  expressed  (B  <  A)  
  34. 34. Another  way  to  view  it   •  Hypothe5cal  example:   Sequence  6  libraries  to   the  same  depth,  with   varying  levels  of  unique-­‐ to-­‐sample  expression   •  Differences  in  observed   counts  among  the   common  genes   Red=low,  goldenyellow=high  
  35. 35. Marioni  et  al.  2008  RNA-­‐seq  data   •  5  lanes  of  liver  RNA,  5  lanes  of  kidney  RNA   •  Compare  two  single  kidney  libraries  (technical   replicates),  a_er  adjus5ng  for  library  size   trimmed   mean   Distribu5on  of  log-­‐ra5os  of   counts     0  counts  are  omived     Red  line  =  trimmed  mean  
  36. 36. 0.8 0.4 0.0 Same  sample.   Technical  replicates   Density (a) -6 -4 -2 0 2 4 6 0.2 0.0 Different  samples.   Liver  vs  Kidney   Density (b) 0.4 log2(Kidney1 NK1) − log2(Kidney2 NK2) -6 -4 -2 0 2 4 log2(Liver NL) - log2(Kidney NK) 6
  37. 37. Another  way  to  view  the  data   2 K1 2 K2 –  e.g.  several  liver-­‐ specific  genes   •  An  adjustment  at   the  analysis  stage   should  be  made   -4 -2 0 2 4 6  log (Liver N ) - log (Kidney N ) 2 L 2 K ● ● ● 5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ●●●● ● ● ●●● ●● ● ●●●●● ● ●● ● ●●● ●● ●●● ● ●●●●●● ● ●● ● ●●●●●●● ● ●●●● ● ● ●●●●●●● ●●●●●●● ● ● ●● ●●●●●●● ●●● ●● ●●●●●●● ●●●●●●● ●●●●●●● ●●●● ●● ●●●●●●● ● ●●●● ● ● ●●●●●●● ● ●●●● ●●●●●● ●●●●●●● ●●●●●●● 0 ●●●●●●● ●●●●●●● ●●● ●●● ●●●●●●● ●●●●●●● ●●●●●●●● ●●●●● ●●●●●●● ● ●●●●●● ●● ●● ●●●●●●● ●● ●●●● ●●●●●●●● ●●●●●●●● ●●●●●●● ●●● ●● ●●●● ●● ●● ●● ●●● ●● ●● ●● ●● ●●●●●●●● ●●●●●● ● ●● ●● ●●●●●●● ●●●● ●●●●●●●● ●●●●●●● ●●●●●●●● ●●● ● ●● ●●● ●●●● ●●●●●●● ● ●●●●●● ●●●●●●● ●●●●●●● ●● ●●●●●●● ●●●●●●● ●● ●●●● -5 •  Sequencing  “real   estate”  is  fixed.   •  Underlying  RNA   -4 composi5on  can  be   -2 0 2 4 6 very   ) − log (Kidney2 log (Kidney1 N different   N ) M = log2(Liver NL) - log2(Kidney NK) (c) ●●●●●●● ●●●●●● ● ●●● ●●●●● ● ●●●●●●●● ●●●●●● ●●●●●●● ●●●●●●● ●●●●●●● ●●●●●●● ●●●● ●● ●●● ●●●●●●● ● ● ●● ● ●●●●●● ● ●●●●●●● ●●●●●●● ● ● ●●●●● ● ●●● ● ●●●●●●● ● ●●●●●●● ● ●●●●● ● ●● ● ●●●● ●● ● ●●● ● ● ●●●●● ●● ● ● ● ●●● ●● ● ●●● ●● ● ●●●●●● ●● ● ●●●● ●● ●●● ● ● ●● ● ● ●● ● ● ●● ●●● ● ● ● ● ● ●● ●●●●●● ● ●●● ● ●● ● ●●●●●●● ● ●●●●● ●● ●●●● ●● ● ● ●●●●●●●● ● ●● ● ● ●●●●●● ●●●●●● ●●● ●●● ● ● ●● ●● ●● ● ● ● ● ●●● ●●●● ●● ●● ● ● ● ●●● ● ● ●● ●● ● ●●● ●●● ● ● ● ● ●● ●●●●●● ●● ● ● ● ●● ● ● ● ●● ● ● ● ●●● ● ●● ● ● ● ●● ● ●● ●● ● ●●● ● ●●● ● ● ● ● ● ●● ●● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ●● ● ● ●●● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●●● ●●●●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●● ● ●●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ●●● ●● ● ● ● ● ●● ● ●●●●●● ● ●● ●● ● ● ● ● ● ● ● ● ●●●●● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ●●● ●● ●● ●●● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●●●● ●●● ● ●●●●●●● ●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●●● ● ●●● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●●●●●● ● ● ● ●●● ●●● ●●● ● ● ● ● ●● ●● ●●● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ●●● ●● ● ●● ● ●● ● ●●●● ● ●● ● ● ● ● ● ● ● ●● ●●●●●●●●●●●●● ● ●●● ●●●●●●●● ●●● ● ● ● ● ● ● ●●●●●●● ● ●●●● ●●●● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ●●●●● ●●●●●●●● ●● ● ● ●● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ●●●●●●●●● ●●●●● ●●● ● ● ●●● ● ● ●●●● ● ●● ● ● ● ●●●●●●●●●● ● ●●●● ● ● ●●● ●●●● ● ● ●● ● ● ●● ●●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ●●●●● ●● ● ●●● ●●● ● ●● ●●● ● ● ● ● ●● ●●●●●●●●● ●● ●●●●● ●● ●●●●● ●●●● ●●● ● ●●● ● ● ● ●● ● ● ●● ●● ● ● ●● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ●●●● ●●●● ●●● ●● ●●●●●●●●●●●●●●●● ●● ●●●●●●● ● ● ●● ● ● ● ● ●●●● ●●● ●●● ● ●●●●●● ●● ● ● ●● ● ●● ● ● ● ●●●● ●●●●●●●●●●●●●● ●●●●● ●●● ●● ●● ● ● ●●●● ● ●●● ● ●● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●●●●●●●●● ●●●●●●●● ●● ●●●●● ●● ● ● ●●●●● ●●●●●●●●●●●● ●●●●●●●●●● ●●● ●●● ●● ●●● ●● ● ● ● ●●●●● ●●●●●●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●● ●● ●● ●●● ● ● ● ● ● ● ●● ●●● ● ● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●● ● ● ● ●● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●●●●●●●● ●● ●●●●●● ●●●●●●●●●●●●●●● ●●● ● ●●● ●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●● ● ●● ● ● ●●● ●● ● ●●●●● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●● ●● ● ● ● ● ● ●●● ● ● ● ● ● ●●● ●●●● ●●●●●●●●● ●●●●● ●● ● ●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ●●● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ●●● ● ●●● ●●●●● ●●● ●●● ● ●●●●●●●●●●●●●● ●● ●●● ●●● ● ● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ●●●●●●●●●●●●●● ●●●●●●●●●●● ●●● ● ●● ●● ● ● ● ●●●●●●●●●●●●●●●●●● ●●●●● ● ● ●● ● ● ●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●● ● ● ●● ● ● ● ● ● ●●●●●●●●●● ●●●●●●● ●●● ● ● ●● ● ● ● ●● ● ●●● ● ●●●●●●●●●●●● ● ● ●●●● ●●●●● ●● ●● ●●●●● ● ● ● ● ●● ● ● ● ● ●● ● ●●●● ● ● ●●●● ●●●● ●● ● ● ● ● ●●●●●●●●●●●● ●●● ●●●●● ●●●●● ●● ● ● ●● ● ● ● ● ● ● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●● ●● ● ● ● ● ● ● ●●●●●●●●●●●●●●●● ●●●●●●● ●●●●●● ●●● ● ● ●● ● ●● ● ●● ● ●●● ● ● ●● ● ●● ●● ●●●●● ●●●●●●●●●●●●● ●●●●● ● ● ●● ●●● ● ●●● ● ● ●● ● ● ● ● ● ●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ●●● ● ●●● ●●●●●●●● ●●● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●● ● ●●● ●●● ● ● ●●● ● ● ●●●●●●●●●●●●●●●● ● ● ● ● ● ●●●●●●●●●● ●●● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ● ●● ●●●●● ●●●●●●●●●●●●●●●●● ● ● ● ●● ●● ● ● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ● ● ● ●●●● ● ●●●●●● ● ●● ●● ● ●● ● ● ●● ●●●●● ●●●●●● ●●●● ● ● ● ● ● ●●● ● ●●●●●● ● ● ●● ●●● ●●●● ●●●●●● ● ● ● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ● ● ● ●● ●● ●●●●●●●●●●●●● ● ● ●●●●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ● ● ● ● ● ● ●●● ● ●●● ● ●●●● ● ● ●● ● ● ● ● ● ●●●●● ●●●●●●● ● ●● ● ●● ● ●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●● ● ●●●●●● ●●● ● ●●● ●●● ●●●●●●●●●●●●●● ● ● ●● ● ●● ● ●● ●● ●●●●●●●●●●●●●●●●●●●●● ●● ● ● ● ● ● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●● ● ● ●● ● ● ● ●● ● ●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ● ●●●●● ●●●●● ● ● ● ● ● ● ● ●● ●●●●● ● ●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ● ● ● ● ●●●●●●●●● ● ●● ●● ● ● ● ●● ●●●●●●● ● ● ●●●●● ●● ● ● ●● ●●●●●●●●●●●●●● ●●● ● ●● ● ● ●●● ● ● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●● ●●● ●● ● ●● ●● ● ● ● ● ●● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●● ● ● ●● ● ●● ● ● ●● ●●●●● ● ● ●●●●●●●●●●●●●●●●●●●● ●● ● ● ● ● ● ● ●● ●● ● ●●●● ●●● ● ●●●● ● ●●●●●●●●●●●●●●●●●●● ●● ●● ●● ● ● ●●●●● ●●●●●●●●●● ●●● ● ●● ● ●●●●● ● ●● ●●●●●●●●●●●●●●●●●●● ●● ● ● ● ● ●● ● ●●●● ●●●●●●●●● ●● ● ● ● ● ●●●●●●● ●●●● ●●●● ● ● ● ●● ●● ● ●● ●●●● ● ● ● ●● ● ● ● ● ●●●●●●●●● ●●●●●●●●●●●●●●● ●●●●●●● ● ● ●● ● ● ●●●●●●● ● ● ●●●●●●●● ● ● ●● ●●●●●●● ● ●●●●●● ●● ● ●● ● ● ● ● ●● ● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●●●●●●●● ●●●●●●●●● ●●● ●●●●● ● ● ● ● ●●● ● ● ● ●●● ●● ●●● ● ● ●●● ● ●● ●●●●● ●●●●● ● ●● ● ● ● ● ● ●●● ●●●●● ● ● ● ● ● ● ● ●●● ●●● ● ● ● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●● ●● ● ● ●●● ●●●●●●●●●●● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●● ● ●●●● ● ●●●●● ● ● ● ● ●● ● ● ● ● ●●●●●●●●●● ●●● ●●●●● ●● ●● ● ● ● ●● ● ● ● ● ● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ●●●● ●●●●●●●●●●●●●●●●●●●●●●● ●● ●●● ● ● ●● ● ●● ● ● ● ● ●●●●●● ● ● ● ●● ● ● ● ●●● ●●●●●●●●●● ●●●●●●● ●●● ● ● ● ● ● ●● ● ● ● ● ●●●●●●●●● ● ●●●●● ● ● ● ● ● ●● ●● ● ● ●●● ●●●●● ● ●●●● ● ●● ●●● ● ●●● ● ● ● ● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●● ● ● ● ● ● ●● ● ● ●●●●●●●●●●● ● ●●●●●●●●● ●● ● ●● ● ●●● ●●●●●●●●●●●●●●●●●●●● ●● ● ●● ●● ●●●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●● ● ●● ●● ● ● ● ●●●●●●●●●●●● ●●●●●●●●●●●●●●● ● ●●● ●●● ● ●●● ● ●●●●●●●●●●●●●●●● ●●●● ●● ● ● ● ●●● ●●●●●●●●●●●● ●●● ● ● ● ●●● ●● ●●● ●● ● ● ●● ● ● ● ● ● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●● ● ● ● ●● ● ● ●●● ●●●●●●●●● ● ● ● ● ● ● ● ● ● ●●●● ●●●●●●●●●●●●●●●●●● ● ● ● ● ● ● ● ● ● ●●●●●● ● ● ● ●● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●● ●●● ● ●● ● ● ●●● ● ●●● ●●●●●●● ●● ● ● ● ● ● ● ● ● ●● ● ●● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ●●● ● ●● ●●●●●●●●●●●●●●●● ● ● ● ● ●●● ● ● ● ● ● ● ●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●● ● ● ● ● ●● ●●●●●●●●●● ●●●●●● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●●●● ●●● ●● ● ● ●●●● ● ● ● ● ● ● ● ●●●●● ●●●● ● ●●●● ●●●●●●●● ● ● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ● ●● ● ● ● ●●●● ●●● ●● ●● ● ● ● ● ● ●●●●●●●●●●●● ●●●●● ●●●● ●● ●● ● ● ● ● ●●●●● ●●●●●●●●● ●●●●●●●●●●● ● ● ●● ● ● ● ● ● ●●●● ● ●● ●●● ●●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●●●●●●● ●● ●●●●● ●●●●● ●● ● ● ● ●●● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ●●●●●●●●●●●●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●●●●●● ●●●●●●●●●●● ●● ●● ●● ● ●●●● ● ●● ●●● ●● ● ● ●● ●● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ●●●● ●●● ●●● ●●●● ●● ●●●●●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ●●●●●●●●●●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ●●●●●●●●●●●●●● ●● ● ●● ●●● ● ● ●●● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ●●● ●●●●●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ●●●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ●● ● ● ● ● ● ● ●●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ●●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ●● ●● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● -20 -15 ● ● ● ● ●● ● ● ● Housekeeping genes Unique to a sample -10 A = log2( Liver NL . Kidney NK )
  38. 38. The  adjustment  to  data  analysis  is  a   scaling  constant   •  Assump5on:  core  set  of  genes  that  do  not  change  in   expression.   •  Pick  a  reference  sample,  compute  TMM  rela5ve  to  reference   •  TMM  (Trimmed  Mean  of  M  values)  M=log  ra5o     •  Adjustment  to  sta5s5cal  analysis:   –  Use  addi5onal  offset  (GLM)   Robinson  &  Oshlack,  Genome  Biology,  2010  
  39. 39. Sta5s5cal  tes5ng  for  DE   •  For  EACH  GENE,  is  the  mean  expression  level   for  the  gene  under  one  condi5on  significantly   different  from  the  mean  expression  level   under  a  different  condi5on?   Tag  ID! A1! A2! B1! B2! ENSG00000124208! 478! 619! 4830! 7165! ENSG00000182463! 27! 20! 48! 55! ENSG00000125835! 132! 200! 560! 408! ENSG00000125834! 42! 60! 131! 99! …" …  tens  of  thousands  more  tags  …" 39  
  40. 40. DE  analysis  happens  in  R   •  Several  packages  currently  available  for  DE  analysis   of  RNA-­‐seq  data  in  R:   –  baySeq,     –  DEGSeq,     –  DESeq     –  edgeR   –  NBPSeq     (+  more  methods  with  less  refined  code,  e.g.  TSPM)   •  edgeR  developed  at  WEHI,  so  this  is  my  favourite   •  edgeR  is  only  package  so  far  with  full  GLM   capabili5es   •  Great  support  on  Bioconductor  mailing  list!  
  41. 41. Counts  from  a  Single  RNA  Sample  Can  be   Modeled  with  Poisson  Distribu5on   Short  reads   RNA  sample   Read  1   Read  2   Read  3   Read  4   Read  5   Read  6   …   Map  reads  to   genome   M  =  total  number  of  reads  ≈  20  million  (for  example)   λg    =  true  propor5on  of  gene  g   yg    =  number  of  reads  for  gene  g       Davis  McCarthy   41  
  42. 42. Counts  from  a  Single  RNA  Sample  Can  be   Modeled  with  Poisson  Distribu5on   Short  reads   RNA  sample   Read  1   Read  2   Read  3   Read  4   Read  5   Read  6   …   Map  reads  to   genome   What  we’re   interested  in   M  =  total  number  of  reads  ≈  20  million   λg    =  true  propor5on  of  gene  g   What  we   yg    =  number  of  reads  for  gene  g   observe       Large  M,  small  λi  à  yg  approximately  Poisson,  μg  =  Mλg   Davis  McCarthy   42  
  43. 43. A  Small  RNA-­‐Seq  Experiment  (Tech   Reps)   Condi5on  B   Condi5on  A   M1   M3   M2   yg1   λg3   λg2   λg1   yg2   Reads  Mi  ≈  20  million   Genes  g  =  1,  …,  30k   λg4   M4   yg3   yg4   E(ygi)  =  Mi  λgi   43  
  44. 44. True  Technical  Reps  Show  Poisson   Varia5on  for  Each  Gene   Davis  McCarthy   Data:     Marioni  et  al.,   Genome  Res,   2008   44  
  45. 45. A  Small  RNA-­‐Seq  Experiment   (Biological  Reps)   Condi5on  B   Condi5on  A   M1   M3   M2   yg1   λg3   λg2   λg1   yg2   Reads  Mi  ≈  20  million   Genes  g  =  1,  …,  30k   λg4   M4   yg3   yg4   E(ygi)  =  Mi  λgi   45   Davis  McCarthy  
  46. 46. Biological  Replicate  Data  shows  Quadra5c   Mean-­‐Variance  Rela5onship   (development  cycle  of  slime  mould,  2  samples  at  hr00,  &  2  at  hr04)   Davis  McCarthy   binned  variance,  sample  variance   Data:   Parikh  et  al,   Genome   Biology,  2010   46  
  47. 47. Need  to  account  for  biological   varia5on   •  Technical  replicate  counts  for  a  gene  vary  according  to  a   Poisson  law,  i.e.  sequencing  varia5on  is  Poisson   •  Biological  CV  (BCV)  is  the  coefficient  of  varia5on  with   which  the  (unknown)  true  abundance  of  the  gene  varies   between  RNA  samples.     •  edgeR  models  a  quadra5c  mean-­‐variance  rela5onship   var(ygi)=  μgi  +  φgμgi2   with  φg=Biological  CV2   Robinson  et  al,  2010  
  48. 48. Output  of  analysis   •  A  list  of  genes  with  p-­‐values  for  tes5ng   differen5al  expression  between  samples   •  A  ranked  list  of  genes  
  49. 49. RNA-­‐seq  analysis  steps   Raw  sequence  reads   Map  onto  genome   Which  transcriptome?   Annota5on  based   Genome  guided  assembly   Summarize  reads  to  transcripts   Sta5s5cal  tes5ng:  Determine   differen5ally  expressed  genes   Systems  biology   De  novo  assembly  
  50. 50. RNA-­‐seq  analysis  steps   Raw  sequence  reads   Map  onto  genome   Which  transcriptome?   Annota5on  based   Genome  guided  assembly   De  novo  assembly   Summarize  reads  to  transcripts   Sta5s5cal  tes5ng:  Determine   differen5ally  expressed  genes   Learn   something!   Systems  biology  
  51. 51. The  future   •  Effec5vely  dealing  with  no  genome  or  poor   quality  genomes  (our  new  method,  Corset)   •  Integra5ng  RNA-­‐seq  data  with  other  genomics   data   •  Analysis  methodology  is  cri5cal  and  s5ll   developing  for  specific  purposes   •  Opportuni5es  to  use  this  data  in  new  and   imagina5ve  ways  –  requires  new  analysis   methodology  
  52. 52. Acknowledgements   •  •  •  •  •  •  Mark  Robinson  (Uni  Zurich)   Mavhew  Young  (Cambridge)   Nadia  Davidson  (MCRI)   Mavhew  Wakefield  (WEHI)   Gordon  Smyth  (WEHI)   Davis  McCarthy  (Oxford)  
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×