Statistics for Next Generation Sequencing (RNA-Seq)
Distribution?
• 25,000 genes, each with counts over several
  samples
     • 2 conditions, each with several replicates

• Recall: log-Normal for microarrays
     • Based on fits to actual data with many replicates

• No equivalent data for RNA-Seq
     • So go back to first principles
RNA-Seq Setting
RNA-Seq Counts Distribution
Hypergeometric Distribution
Simplifying the Hypergeometric Distribution
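The slides that carried this derivation did not survive extraction, so what follows is only a sketch of one standard reading of it, not necessarily the author's own steps. Assume a library of N fragments, K of which come from a given gene, and n reads drawn without replacement; the read count for that gene is then hypergeometric. When n is a small fraction of N and K/N is small, the count is close to Poisson with λ = nK/N. The values of N, K and n below are made up for illustration.

```python
from math import exp, lgamma, log

def log_choose(a, b):
    """Natural log of the binomial coefficient C(a, b)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def hypergeom_pmf(k, N, K, n):
    """P(k reads from the gene) when n reads are drawn without replacement."""
    return exp(log_choose(K, k) + log_choose(N - K, n - k) - log_choose(N, n))

def poisson_pmf(k, lam):
    """P(k) under a Poisson distribution with mean lam."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

# Illustrative (made-up) sizes: 1e9 fragments, 500 from the gene, 1e7 reads.
N, K, n = 10**9, 500, 10**7
lam = n * K / N  # expected read count for the gene

# Largest pointwise gap between the two distributions over small counts.
max_gap = max(abs(hypergeom_pmf(k, N, K, n) - poisson_pmf(k, lam))
              for k in range(31))
print(f"lambda = {lam}, largest pmf difference for k in 0..30: {max_gap:.2e}")
```

Working in log space with `lgamma` avoids forming the very large binomial coefficients directly.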
The Poisson Distribution
 λ is both the mean and the variance
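As a quick numerical check of the mean-equals-variance property, the sketch below sums the Poisson probabilities directly. The function name, the cutoff `k_max`, and the choice λ = 4.2 are illustrative assumptions, not part of the slides.

```python
from math import exp

def poisson_moments(lam, k_max=200):
    """Mean and variance of Poisson(lam), summed from the pmf up to k_max."""
    p = exp(-lam)            # P(X = 0)
    mean = second = 0.0
    for k in range(k_max + 1):
        mean += k * p        # accumulates E[X]
        second += k * k * p  # accumulates E[X^2]
        p *= lam / (k + 1)   # P(X = k + 1) from P(X = k)
    return mean, second - mean ** 2

mean, var = poisson_moments(4.2)
print(mean, var)  # both should be very close to 4.2
```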
The Poisson Distribution (examples from Wikipedia)
•   The number of soldiers killed by horse-kicks each year in each corps in
    the Prussian cavalry. This example was made famous by a book by Ladislaus
    Josephovich Bortkiewicz (1868–1931).
•   The number of yeast cells used when brewing Guinness beer. This example was
    made famous by William Sealy Gosset (1876–1937).
•   The number of phone calls arriving at a call centre per minute.
•   The number of goals in sports involving two competing teams.
•   The number of deaths per year in a given age group.
•   The number of jumps in a stock price in a given time interval.
•   Under an assumption of homogeneity, the number of times a web server is
    accessed per minute.
•   The number of mutations in a given stretch of DNA after a certain amount of
    radiation.
•   The proportion of cells that will be infected at a given multiplicity of infection.
Is Mean = Variance for NGS?

– Variance ∝ Mean²

[Figure: log scale; the white line is the Poisson line]
Why this Over-Dispersion?

• The Poisson model captures only
  technical variation,
  not biological variation

• Biological variation induces
  more variance than the
  Poisson model captures
–    No reason to expect a difference from
     microarrays, where SD ∝ Mean
         (or Variance ∝ Mean²)
[Figure: SD vs Mean for microarrays]
Handling Over-Dispersion
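The body of this slide was not recovered. A common way to set the problem up, offered here as an assumption rather than the author's exact formulation, treats the Poisson rate itself as a random variable X (biological variation) and the observed count Y as Poisson given X (technical variation). The symbols Y, μ and φ are introduced only for this illustration.

```latex
Y \mid X \sim \mathrm{Poisson}(X), \qquad \mathbb{E}[X] = \mu, \qquad \mathrm{Var}(X) = \phi\,\mu^{2}

\mathbb{E}[Y] = \mathbb{E}\big[\mathbb{E}[Y \mid X]\big] = \mu

\mathrm{Var}(Y) = \mathbb{E}\big[\mathrm{Var}(Y \mid X)\big] + \mathrm{Var}\big(\mathbb{E}[Y \mid X]\big) = \mu + \phi\,\mu^{2}
```

For large μ the quadratic term dominates, which is consistent with Variance ∝ Mean²; for φ = 0 the plain Poisson case returns.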
What Distribution is X?

• Log-Normal for Arrays?

• The combination of log-Normal and Poisson
  doesn’t have a neat closed form (i.e., formula)

• So assume Gamma distribution
   – Poisson + Gamma -> Negative Binomial
   – Traditionally used to fix the problem of over-dispersion
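The Poisson + Gamma mixture can be checked by simulation. The sketch below is a minimal illustration under assumed values (mean 20, Gamma shape 4, fixed seed); the function names are hypothetical, and the simple Poisson sampler is only suitable for moderate rates.

```python
import random
from math import exp
from statistics import mean, variance

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def gamma_poisson_draws(mu, shape, size, seed=0):
    """Counts whose Poisson rate is Gamma-distributed with mean mu."""
    rng = random.Random(seed)
    scale = mu / shape  # Gamma mean = shape * scale = mu
    return [poisson_draw(rng.gammavariate(shape, scale), rng)
            for _ in range(size)]

mu, shape = 20.0, 4.0  # illustrative values, not taken from the slides
counts = gamma_poisson_draws(mu, shape, size=20000)
print("sample mean:", mean(counts))
print("sample variance:", variance(counts))
print("negative binomial variance mu + mu^2/shape:", mu + mu ** 2 / shape)
```

The sample variance should land far above the sample mean, near the negative binomial value, which is the over-dispersion a pure Poisson model cannot produce.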
The Gamma Distribution
[Figure annotation: control on the right tail]
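One way to see what control over the right tail means is to hold the mean fixed and vary the Gamma shape parameter. The sketch below is an illustration under assumed settings (mean 1, threshold at twice the mean, fixed seed); the function name is hypothetical.

```python
import random

def tail_fraction(shape, mean=1.0, factor=2.0, size=50000, seed=0):
    """Fraction of Gamma draws (given mean and shape) exceeding factor * mean."""
    rng = random.Random(seed)
    scale = mean / shape  # keeps the mean fixed as the shape changes
    draws = (rng.gammavariate(shape, scale) for _ in range(size))
    return sum(x > factor * mean for x in draws) / size

for shape in (1.0, 3.0, 10.0):
    print(f"shape={shape}: P(X > 2 * mean) is about {tail_fraction(shape):.4f}")
```

Smaller shape values put more probability far above the mean, so the shape acts as the knob for the heaviness of the right tail.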
The Negative Binomial Distribution
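The formula on this slide was not recovered. For reference, one common parameterization of the negative binomial is given below; the notation (mean μ, shape r) is an assumption chosen for illustration and may differ from the author's.

```latex
P(Y = k) = \frac{\Gamma(k + r)}{k!\,\Gamma(r)}
           \left(\frac{r}{r + \mu}\right)^{r}
           \left(\frac{\mu}{r + \mu}\right)^{k},
\qquad k = 0, 1, 2, \dots

\mathbb{E}[Y] = \mu, \qquad \mathrm{Var}(Y) = \mu + \frac{\mu^{2}}{r}
```

The ratio 1/r is often called the dispersion; as r grows the variance tends to μ and the Poisson distribution is recovered.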
Estimating Parameters
[Figure: curve fit]
For each gene, estimate the mean across replicates,
and then estimate the variance from the curve fit above
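The two-step recipe can be sketched as follows. The fitted curve itself is not recoverable from the slide, so this example assumes the form variance = mean + φ · mean² and fits φ by least squares across genes; the function names, the gene labels and the replicate counts are all hypothetical.

```python
from statistics import mean, variance

def fit_dispersion(counts_by_gene):
    """Least-squares fit of phi in var = mean + phi * mean**2 across genes."""
    num = den = 0.0
    for counts in counts_by_gene.values():
        m = mean(counts)
        v = variance(counts)      # noisy with only a few replicates
        num += (v - m) * m ** 2   # excess over the Poisson variance
        den += m ** 4
    return max(num / den, 0.0) if den > 0 else 0.0

def fitted_variance(counts, phi):
    """Variance read off the fitted curve at this gene's mean."""
    m = mean(counts)
    return m + phi * m ** 2

# Made-up replicate counts for two genes in one condition.
replicates = {"geneA": [6, 14, 10], "geneB": [60, 140, 100]}
phi = fit_dispersion(replicates)
for gene, counts in replicates.items():
    print(gene, mean(counts), fitted_variance(counts, phi))
```

Pooling genes to fit the curve is what makes the per-gene variance usable: with only a handful of replicates, the raw per-gene sample variance is too noisy on its own.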
Bias Correction
Thank You

Introduction to statistics iii
