Using RNA Seq to conduct systems-level analysis of
embryonic pluripotency, self-renewal and differentiation




                   David-Emlyn Parfitt
        Shen Lab, Irving Cancer Research Center
The molecular regulators of self-renewal and pluripotency are
           not completely defined or characterized

  Embryo stage                    Tissue            Stem cell line
  Mouse blastocyst (3.5 days)     Inner cell mass   mESC
  Mouse egg cylinder (5.5 days)   Epiblast          mEpiSC
  Human blastocyst (5-7 days)                       hESC

  mEpiSC ≈ hESC

  Self-renewal and pluripotency: Nanog, Oct4, Sox2; JAK-STAT, MAPK

  Novel master regulators?
Defining the molecular networks associated with stem cell
      self-renewal, pluripotency and differentiation

[Workflow diagram; labels as shown on the slide]

  150 combinatory chemical treatments
  Genome-wide GEP data (Which tool to use for expression profiling?)
  Algorithmic analysis (ARACNe, MINDy)
  ESC/EpiSC "Interactome"
  Master regulator analysis
  Rank
  In vitro and in vivo validation
Gene Expression Profiling:
Microarrays vs RNA-Sequencing


           Arrays:



                 Well defined technique

                 High throughput


                 Discrete measurement

                 Background noise + batch effect

                 No distinction between isoforms/alleles
Gene Expression Profiling:
Microarrays vs RNA-Sequencing

           RNA Sequencing:

                 Total RNA

                 Fragment

                 Reverse-transcribe to cDNA
Gene Expression Profiling:
Microarrays vs RNA-Sequencing

           RNA Sequencing:

                 Total RNA*

                 Reverse-transcribe to cDNA

                 Algorithmic and logistic challenge

                 Lengthy library preparation

                 Single base resolution

                 Low background noise

                 Distinction of isoform and allelic expression

                 Low amount of RNA needed

                 *Including non-coding RNAs, depending on purification protocol
RNA-Sequencing Methodology:
               Deciding the parameters



                      Read length?
                         -Efficiency vs faithfulness

                      Single end or paired end reads?
                         -Efficiency vs faithfulness
                         -Alignment accuracy

                      Number of reads?
                         -Depth of coverage
                         -Cost


                                 How many to effectively cover
                                 the mouse genome (~50 Mb exome)?
Deciding the parameters:
           How many 100 bp reads are necessary for comprehensive
                     coverage of the mouse genome?

RPKM:

Normalized measurement of transcript abundance

Reads per kilobase of exon model per million mapped
reads

RPKM for a particular transcript does not change
when overall number of reads changes, and it is
the same for transcripts with the same abundance
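
The RPKM definition above can be sketched in a few lines of Python. This is only an illustration: the function name `rpkm` and the example numbers (a 2 kb transcript, 20 and 100 million mapped reads) are assumptions and do not come from the slides.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


# Hypothetical transcript: 2 kb long, 400 reads in a 20-million-read library.
shallow = rpkm(400, 2_000, 20_000_000)

# Same transcript in a library sequenced five times deeper; if its read
# count scales with depth, the normalized value stays the same.
deep = rpkm(2_000, 2_000, 100_000_000)

print(shallow, deep)
```

Both calls give the same value because the read count and the library size grow by the same factor, which is the behaviour the slide describes.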
Deciding the parameters:
How many 100 bp reads are necessary for comprehensive
          coverage of the mouse genome?




             100 million, 100 bp, SE (single-end) reads
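
As a rough back-of-envelope check on that choice, mean depth is reads times read length divided by target size. The sketch below uses the ~50 Mb target size quoted earlier in the deck and assumes reads are spread uniformly. Real RNA-Seq reads are not spread uniformly, because transcript abundances differ by orders of magnitude, so these are averages only. The function names are illustrative, not taken from the talk.

```python
import math


def mean_coverage(n_reads, read_length_bp, target_size_bp):
    """Average depth if reads were spread uniformly over the target."""
    return n_reads * read_length_bp / target_size_bp


def reads_for_coverage(depth, read_length_bp, target_size_bp):
    """Number of reads needed to reach a given average depth."""
    return math.ceil(depth * target_size_bp / read_length_bp)


TARGET_BP = 50_000_000  # ~50 Mb, the figure quoted earlier in the deck

depth_100m = mean_coverage(100_000_000, 100, TARGET_BP)
reads_40x = reads_for_coverage(40, 100, TARGET_BP)

print(depth_100m, reads_40x)
```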
Setting the transcript ‘detection’ threshold

                                      RA-72H-1   RA-72H-2   CM     CM

Number of raw reads (million)           97.3       88       87     95
Number of mapped reads (million)        97         87.7     87     94
Transcripts w. RPKM > 0.01 (/27641)     72%        77%      84%    84%
Transcripts w. RPKM > 1 (/27641)        49%        48%      51%    52%
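
The percentages in this table are the share of annotated transcripts whose RPKM exceeds a cut-off. A minimal sketch of that calculation follows; the RPKM values are made up for illustration and are not data from the talk, and `fraction_detected` is a hypothetical helper name.

```python
def fraction_detected(rpkm_values, threshold):
    """Share of transcripts whose RPKM is strictly above the threshold."""
    if not rpkm_values:
        raise ValueError("need at least one transcript")
    above = sum(1 for value in rpkm_values if value > threshold)
    return above / len(rpkm_values)


# Toy RPKM values for eight hypothetical transcripts (not from the slides).
toy_rpkm = [0.0, 0.005, 0.02, 0.5, 1.2, 3.0, 45.0, 120.0]

loose = fraction_detected(toy_rpkm, 0.01)   # permissive cut-off
strict = fraction_detected(toy_rpkm, 1.0)   # conservative cut-off

print(f"RPKM > 0.01: {loose:.0%}, RPKM > 1: {strict:.0%}")
```

Raising the cut-off from 0.01 to 1 can only lower the detected fraction, which matches the drop between the two rows of the table.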
RPKM is constant, regardless of number of reads

[Two plots, annotated r² = 0.9 and r² = 0.97]

 “RPKM for a particular transcript does not change
 when overall number of reads changes”
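
One way to quantify this kind of agreement is the squared Pearson correlation between RPKM values for the same transcripts at two read depths. The slide does not say which comparisons produced its r² values or whether the data were log-transformed, so the log2(x + 1) transform and the toy numbers below are assumptions made only for illustration.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


def log2p1(values):
    """log2(x + 1), a common transform for abundance data (assumed here)."""
    return [math.log2(v + 1) for v in values]


# Toy RPKM values for five transcripts at full depth and after subsampling.
full_depth = [0.5, 2.0, 8.0, 31.0, 120.0]
subsampled = [0.4, 2.2, 7.5, 33.0, 118.0]

r2 = r_squared(log2p1(full_depth), log2p1(subsampled))
print(round(r2, 4))
```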
RPKM becomes relatively constant with increased read number

[Plot: median RPKM (y-axis, 0.5 to 0.95) vs. reads in millions (x-axis, 20 to 80); labelled values 0.749 and 0.725]

i.e. we are not detecting significantly more genes/transcripts above
20-30 million reads
How many 100 bp reads are necessary for comprehensive
coverage of the mouse genome?

[Plot: percent of final transcripts (y-axis, 0.7 to 1) vs. reads in millions (x-axis, 0 to 100), one curve per transcript abundance bin (RPKM): [0.01,3.74), [3.75,7.5), [7.5,15), [15,30), [30,60), [60,)]

 Between 20 and 30 million 100 bp reads are sufficient to capture
 ~100% of the most abundant transcripts and 95% of the least
 abundant
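
A saturation curve of this kind can be approximated by randomly subsampling the mapped reads and counting how many transcripts are still detected at each depth. The sketch below is a simplified stand-in, not the analysis used in the talk: it detects a transcript by a minimum read count rather than by RPKM bin, and the toy library, function names and parameters are all assumptions.

```python
import random


def detected_at_depth(read_assignments, depth, min_reads=1, seed=0):
    """Count transcripts with at least min_reads reads in a random subsample."""
    if depth > len(read_assignments):
        raise ValueError("depth exceeds the number of available reads")
    rng = random.Random(seed)
    counts = {}
    for transcript in rng.sample(read_assignments, depth):
        counts[transcript] = counts.get(transcript, 0) + 1
    return sum(1 for c in counts.values() if c >= min_reads)


def saturation_curve(read_assignments, depths, min_reads=1, seed=0):
    """Fraction of full-depth transcripts recovered at each subsampled depth."""
    final = detected_at_depth(
        read_assignments, len(read_assignments), min_reads, seed
    )
    if final == 0:
        raise ValueError("no transcript passes min_reads at full depth")
    return [
        (d, detected_at_depth(read_assignments, d, min_reads, seed) / final)
        for d in depths
    ]


# Toy library: one transcript label per mapped read (hypothetical).
library = ["abundant"] * 900 + ["medium"] * 90 + ["rare"] * 10

curve = saturation_curve(library, depths=[100, 500, 1000])
for depth, fraction in curve:
    print(depth, round(fraction, 2))
```

Abundant transcripts are recovered at almost any depth, while rare ones drop out of small subsamples, which is the pattern the abundance bins in the plot are meant to show.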
Acknowledgements




Shen Lab:
Michael Shen
Hui Zhao
Shen Lab Members

Califano Lab:
Andrea Califano
Mariano Alvarez

Yufeng Shen
Xiaoyun Sun

Olivier Couronne
Erin Bush

David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011
