Introduction to Statistics II

Strand Life Sciences Pvt Ltd

Statistics for Microarray Data

Background

μ, σ²

• Few observations made by a black box

• What is the distribution behind the black box?

• E.g., with what probability will it output a number bigger than 5?

Approach

• Easy to determine with many observations

• With few observations:

• Assume a canonical distribution based on prior knowledge

• Determine parameters of this distribution using the observations, e.g., mean, variance
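
The approach above can be sketched in a few lines of Python. This is only an illustration: the observations are hypothetical values invented for the example, and the Normal distribution is an assumed choice of canonical distribution. Plugging the estimates straight into the fitted distribution also ignores how uncertain they are with so few observations.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical observations from the black box (illustrative values only).
observations = [4.1, 5.3, 3.8, 4.9, 5.6]

# Step 1: assume a canonical distribution (here: Normal).
# Step 2: estimate its parameters from the observations.
mu_hat = mean(observations)       # sample mean
sigma_hat = stdev(observations)   # sample standard deviation (n - 1 denominator)

# Step 3: use the fitted distribution to answer questions,
# e.g. the probability of an output bigger than 5.
fitted = NormalDist(mu_hat, sigma_hat)
p_bigger_than_5 = 1.0 - fitted.cdf(5.0)

print(f"mean={mu_hat:.3f}, sd={sigma_hat:.3f}, P(X > 5)={p_bigger_than_5:.3f}")
```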

Estimating the mean

Estimating the variance σ²

• The (scaled) sample variance follows a Chi-Square distribution if the original distribution was Normal
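
A small simulation can make the Chi-Square statement concrete. For n Normal observations with true variance σ², the quantity (n − 1)s²/σ² follows a Chi-Square distribution with n − 1 degrees of freedom, which has mean n − 1 and variance 2(n − 1). The true mean, true variance, sample size and seed below are assumptions chosen for the sketch.

```python
import random
from statistics import mean, variance

random.seed(0)
n = 3                 # few observations per sample
true_mu = 10.0        # assumed true mean of the Normal black box
true_sigma2 = 4.0     # assumed true variance of the Normal black box
n_sims = 20000

scaled = []
for _ in range(n_sims):
    sample = [random.gauss(true_mu, true_sigma2 ** 0.5) for _ in range(n)]
    s2 = variance(sample)                       # unbiased sample variance
    scaled.append((n - 1) * s2 / true_sigma2)   # Chi-Square(n - 1) if Normal

sim_mean = mean(scaled)      # theory: n - 1
sim_var = variance(scaled)   # theory: 2 * (n - 1)
print(f"simulated mean={sim_mean:.3f}, simulated variance={sim_var:.3f}")
```

With only 3 observations the estimate s² is very noisy, which is the point of the slide: the variance estimate from few replicates cannot be trusted on its own.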

Microarray Data
• Many genes: 25,000

• 2 conditions (or more), many replicates within each condition

• Which genes are differentially expressed between the two conditions?

More Specifically
• For a particular gene
– Each condition is a black box
– Say 3 observations from each black box

• Do both black boxes have the same distribution?
– Assume same canonical distribution
– Do both have the same parameters?

Which Canonical Distribution
• Use data with many replicates

• 418.0294, 295.8019, 272.1220, 315.2978, 294.2242,
379.8320, 392.1817, 450.4758, 335.8242, 265.2478,
196.6982, 289.6532, 274.4035, 246.6807, 254.8710,
165.9416, 281.9463, 246.6434, 259.0019, 242.1968

• Which distribution do these values follow?

What is a QQ Plot

Distribution of log raw intensities across genes on a single array

The QQ plot of log-scale intensities (i.e., actual vs simulated from Normal)

QQ Plot against a Normal Distribution
• 10 + 10 replicates in two groups

• Single group QQ plot

• Combined 2 groups QQ plot

• Combined log-scale QQ plot

• Shapiro-Wilk Test
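
As a rough sketch of what a QQ plot against a Normal distribution computes, the code below pairs the sorted observations with theoretical Normal quantiles and reports the correlation of the pairs; a value closer to 1 means the points lie closer to a straight line. The 20 intensities are the ones listed on the earlier slide. The plotting positions (i + 0.5)/n and the helper names are choices made for this example. A formal check such as the Shapiro-Wilk test (for instance `scipy.stats.shapiro`) is not reproduced here.

```python
import math
from statistics import NormalDist, mean

# The 20 replicate intensities listed on the "Which Canonical Distribution" slide.
intensities = [
    418.0294, 295.8019, 272.1220, 315.2978, 294.2242,
    379.8320, 392.1817, 450.4758, 335.8242, 265.2478,
    196.6982, 289.6532, 274.4035, 246.6807, 254.8710,
    165.9416, 281.9463, 246.6434, 259.0019, 242.1968,
]

def qq_points(values):
    """Return (theoretical, observed) quantiles against a standard Normal."""
    n = len(values)
    observed = sorted(values)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_raw = pearson(*qq_points(intensities))
r_log = pearson(*qq_points([math.log2(v) for v in intensities]))
print(f"QQ correlation, raw scale: {r_raw:.4f}")
print(f"QQ correlation, log scale: {r_log:.4f}")
```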

Which Canonical Distribution

• Assume a log-normal distribution

Benford’s Law
• Frequency distribution of first significant digit

Pr(d ≤ x < d+1) = log10(d+1) − log10(d) for d = 1, …, 9, where x is the number scaled into [1, 10) and log10(x) is assumed to be uniformly distributed in [0, 1)
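
The formula can be checked numerically. The sketch below tabulates the Benford probabilities for the digits 1 to 9 and compares them with first-digit frequencies of simulated numbers whose log10 is uniform on [0, 1). The number of draws and the seed are arbitrary choices for the example.

```python
import math
import random

# Benford probabilities for first significant digit d = 1..9.
benford = {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}

# Simulate x = 10**U with U uniform on [0, 1), so x lies in [1, 10)
# and its integer part is the first significant digit.
random.seed(1)
n_draws = 100000
counts = {d: 0 for d in range(1, 10)}
for _ in range(n_draws):
    x = 10 ** random.random()
    counts[int(x)] += 1

for d in range(1, 10):
    print(d, f"theory={benford[d]:.4f}", f"simulated={counts[d] / n_draws:.4f}")
```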

Differential Expression

Group 1: μ₁, σ₁²
Group 2: μ₂, σ₂²

• Is μ₁ = μ₂?
• Is σ₁ = σ₂? Is variance a function of mean?

SD vs Mean across 3 replicates plotted for all genes

• SD increases linearly with Mean

SD vs Mean across 3 replicates computed for all genes after log-transformation

• SD is flat now, except for very low values

• Another reason to work on the log scale
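
The effect of the log transformation on the SD vs Mean relationship can be reproduced with synthetic data. The sketch below assumes multiplicative (log-normal) noise with a constant SD on the log2 scale; the gene count, expression range and noise level are invented for illustration and are not taken from the slides' data. It compares the median SD in the high-mean half of the genes with the median SD in the low-mean half.

```python
import random
from statistics import mean, median, stdev

random.seed(42)
n_genes, n_reps = 2000, 3
log_sigma = 0.2   # assumed constant SD on the log2 scale

raw_stats, log_stats = [], []
for _ in range(n_genes):
    log_mu = random.uniform(6.0, 14.0)   # assumed true log2 expression level
    logs = [random.gauss(log_mu, log_sigma) for _ in range(n_reps)]
    raws = [2.0 ** v for v in logs]
    raw_stats.append((mean(raws), stdev(raws)))
    log_stats.append((mean(logs), stdev(logs)))

def sd_ratio_high_vs_low(stats):
    """Median SD in the high-mean half divided by median SD in the low-mean half."""
    ordered = sorted(stats)              # sorts (mean, sd) pairs by mean
    half = len(ordered) // 2
    low = median(sd for _, sd in ordered[:half])
    high = median(sd for _, sd in ordered[half:])
    return high / low

raw_ratio = sd_ratio_high_vs_low(raw_stats)   # well above 1: SD grows with mean
log_ratio = sd_ratio_high_vs_low(log_stats)   # near 1: SD is roughly flat
print(f"raw scale ratio={raw_ratio:.2f}, log scale ratio={log_ratio:.2f}")
```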

Differential Expression

Group 1: μ₁, σ₁²
Group 2: μ₂, σ₂²

• Is μ₁ = μ₂?
• Is σ₁ = σ₂? Sort of, yes, after log-transformation

The T-Statistic

• Flattened Normal, or T-Distribution
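
A minimal sketch of a two-sample T-statistic for one gene follows, assuming equal variances in the two groups (the pooled form), which matches the "sort of yes" answer for σ₁ = σ₂ on the log scale. The slides do not show the exact formula used, so the pooled form is an assumption, and the log2 intensities are hypothetical values made up for the example.

```python
import math
from statistics import mean, variance

# Hypothetical log2 intensities for one gene, 3 replicates per condition.
group1 = [8.1, 8.4, 7.9]
group2 = [9.0, 9.3, 8.8]

def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for the equal-variance two-sample t-test."""
    n1, n2 = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

t_stat, df = pooled_t_statistic(group1, group2)
print(f"t={t_stat:.3f} with {df} degrees of freedom")
```

Under the null hypothesis μ₁ = μ₂ and Normal data, this statistic follows a T-distribution with n₁ + n₂ − 2 degrees of freedom, which has heavier tails than the Normal when replicates are few.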

A Problem

SD vs Mean across 3 replicates computed for all genes after log-transformation

• The curve fit here may be a better estimate

• Lots of false positives can be avoided here

• Not much difference here
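
One possible reading of the curve-fit idea is sketched below: with only 3 replicates a per-gene SD can come out very small by chance, which inflates the T-statistic, so the SD is replaced by a smoothed value taken from genes with a similar mean. The running-median smoother, the window size and the synthetic data are all assumptions for illustration; the slides do not say which curve-fitting method was used.

```python
import random
from statistics import mean, median, pstdev, stdev

random.seed(7)
n_genes, n_reps, window = 1000, 3, 101

# Synthetic log-scale data: constant true SD of 0.2 for every gene (assumed).
genes = []
for _ in range(n_genes):
    level = random.uniform(6.0, 14.0)
    reps = [random.gauss(level, 0.2) for _ in range(n_reps)]
    genes.append((mean(reps), stdev(reps)))

genes.sort()                        # order genes by their mean
sds = [sd for _, sd in genes]

# Running median of SD over neighbouring genes: a simple stand-in for a curve fit.
half = window // 2
fitted_sds = []
for i in range(n_genes):
    lo, hi = max(0, i - half), min(n_genes, i + half + 1)
    fitted_sds.append(median(sds[lo:hi]))

print(f"smallest per-gene SD: {min(sds):.4f}")
print(f"smallest fitted SD:   {min(fitted_sds):.4f}")
```

Genes whose own SD falls far below the fitted curve are the ones most likely to produce spuriously large T-statistics, so using the fitted value for them is where false positives would be avoided. For genes whose SD is already close to the curve, it makes little difference.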
