David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011

Published in: Health & Medicine, Technology

Transcript

  • 1. Using RNA-Seq to conduct systems-level analysis of embryonic pluripotency, self-renewal and differentiation. David-Emlyn Parfitt, Shen Lab, Irving Cancer Research Center
  • 2. The molecular regulators of self-renewal and pluripotency are not completely defined or characterized. [Figure: mouse blastocyst (3.5 days), mouse egg cylinder (5.5 days) and human blastocyst (5-7 days); inner cell mass and epiblast; derived cell lines mESC, mEpiSC ≈ hESC. Self-renewal and pluripotency factors shown: Nanog, Oct4, Sox2, JAK-STAT, MAPK. Novel master regulators?]
  • 3. Defining the molecular networks associated with stem cell self-renewal, pluripotency and differentiation. [Workflow figure: 150 combinatory chemical treatments; genome-wide GEP data; algorithmic analysis (ARACNe, MINDy); ESC/EpiSC 'interactome'; master regulator analysis; rank; in vitro and in vivo validation.] Which tool to use for expression profiling?
  • 4. Gene Expression Profiling: Microarrays vs RNA-Sequencing. Arrays: well-defined technique; high throughput; discrete measurement; background noise and batch effects; no distinction between isoforms or alleles.
  • 5. Gene Expression Profiling: Microarrays vs RNA-Sequencing. RNA sequencing: total RNA (poly-A tailed), fragment, reverse-transcribe to cDNA.
  • 6. Gene Expression Profiling: Microarrays vs RNA-Sequencing. RNA sequencing: total RNA*, reverse-transcribe to cDNA. Algorithmic and logistic challenge; lengthy library preparation; single-base resolution; low background noise; distinction of isoform and allelic expression; low amount of RNA needed. *Including non-coding RNAs, depending on purification protocol.
  • 7. RNA-Sequencing Methodology: Deciding the parameters. Read length? Efficiency vs faithfulness. Single-end or paired-end reads? Efficiency vs faithfulness; alignment accuracy. Number of reads? Depth of coverage; cost. How many to effectively cover the expressed portion of the mouse genome (the exome, ~50 Mb)?
  • 8. Deciding the parameters: How many 100 bp reads are necessary for comprehensive coverage of the mouse genome? RPKM: normalized measurement of transcript abundance (reads per kilobase of exon model per million mapped reads). RPKM for a particular transcript does not change when the overall number of reads changes, and it is the same for transcripts with the same abundance.
  • 9. Deciding the parameters: How many 100 bp reads are necessary for comprehensive coverage of the mouse genome? RPKM: normalized measurement of transcript abundance (reads per kilobase of exon model per million mapped reads). RPKM for a particular transcript does not change when the overall number of reads changes, and it is the same for transcripts with the same abundance.
  • 10. Deciding the parameters: How many 100 bp reads are necessary for comprehensive coverage of the mouse genome? 100 million 100 bp single-end (SE) reads.
  • 11. Setting the transcript ‘detection’ threshold
      Sample                                  RA-72H-1  RA-72H-2  CM    CM
      Number of raw reads (million)           97.3      88        87    95
      Number of mapped reads (million)        97        87.7      87    94
      Transcripts with RPKM > 0.01 (/27,641)  72%       77%       84%   84%
  • 12. Setting the transcript ‘detection’ threshold
      Sample                                  RA-72H-1  RA-72H-2  CM    CM
      Number of raw reads (million)           97.3      88        87    95
      Number of mapped reads (million)        97        87.7      87    94
      Transcripts with RPKM > 1 (/27,641)     49%       48%       51%   52%
  • 13. RPKM is constant, regardless of number of reads. [Scatter plots: r² = 0.9 and r² = 0.97.] “RPKM for a particular transcript does not change when overall number of reads changes”
  • 14. RPKM becomes relatively constant with increased read number. [Plot: median RPKM (y-axis, 0.5 to 0.95) against reads in millions (x-axis, 20 to 80); labelled values 0.749 and 0.725.] I.e., we are not detecting significantly more genes/transcripts above 20-30 million reads.
  • 15. How many 100 bp reads are necessary for comprehensive coverage of the mouse genome? [Plot: percent of final transcripts (y-axis, 0.7 to 1) against reads in millions (x-axis, 0 to 100), one curve per transcript abundance bin (RPKM): [60,), [30,60), [15,30), [7.5,15), [3.75,7.5), [0.01,3.74).] Between 20 and 30 million 100 bp reads are sufficient to capture ~100% of the most abundant transcripts and 95% of the least abundant.
  • 16. Acknowledgements. Shen Lab: Michael Shen, Hui Zhao, Shen Lab members. Califano Lab: Andrea Califano, Mariano Alvarez, Yufeng Shen, Xiaoyun Sun, Olivier Couronne, Erin Bush.
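
The RPKM definition on slides 8 and 9 and the detection thresholds on slides 11 and 12 can be illustrated with a minimal Python sketch. The function names and the example numbers are hypothetical and are not taken from the talk. The formula is the standard reads-per-kilobase-per-million normalization, which may differ in detail from the pipeline used for these data.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = reads * 1e9 / (transcript length in bp * total mapped reads)
    """
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def fraction_detected(rpkm_values, threshold):
    """Fraction of transcripts whose RPKM exceeds a detection threshold."""
    detected = sum(1 for value in rpkm_values if value > threshold)
    return detected / len(rpkm_values)


# Hypothetical transcript, 2,000 bp long: 500 reads out of 25 million mapped.
shallow = rpkm(500, 2000, 25_000_000)

# Assumption: sequencing four times deeper yields four times as many reads
# for the same transcript, so the normalized value should not move.
deep = rpkm(2000, 2000, 100_000_000)

# Hypothetical RPKM values for four transcripts, scored at the two
# thresholds that appear on slides 11 and 12 (0.01 and 1).
example_values = [0.0, 0.005, 0.5, 3.0]
loose = fraction_detected(example_values, 0.01)
strict = fraction_detected(example_values, 1)

print(shallow, deep, loose, strict)
```

Because the read count and the total number of mapped reads both scale with sequencing depth, the two `rpkm` calls return the same value, which is the property slide 13 describes. Raising the threshold from 0.01 to 1 lowers the detected fraction, in the same direction as the drop between slides 11 and 12.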