Statistics for Microarray Data
Background

(Figure: a black box with unknown parameters μ, σ²)

• Few observations made by a black box

• What is the distribution behind the black box?

• E.g., with what probability will it output a number
  bigger than 5?
Approach

• Easy to determine with many observations

• With few observations:

• Assume a canonical distribution based on prior
  knowledge

• Determine parameters of this distribution using
  the observations, e.g., mean, variance
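
As a minimal sketch of this approach, assuming the canonical distribution is Normal and using made-up observations (the values and variable names below are hypothetical, not from the slides), the parameters can be estimated from a handful of observations and then used to answer the "bigger than 5" question:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical observations from the black box (illustrative values only).
observations = [4.1, 5.3, 3.8, 4.9, 5.6]

# Estimate the parameters of the assumed Normal distribution.
mu_hat = mean(observations)
sigma_hat = stdev(observations)  # sample SD, uses the n - 1 denominator

# Probability that the black box outputs a number bigger than 5,
# under the fitted Normal model.
fitted = NormalDist(mu_hat, sigma_hat)
p_greater_than_5 = 1.0 - fitted.cdf(5.0)

print(f"mean = {mu_hat:.3f}, sd = {sigma_hat:.3f}, P(X > 5) = {p_greater_than_5:.3f}")
```

With so few observations the estimates themselves are uncertain, which is what the following slides address.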
Estimating the mean
Estimating the variance σ²

• The estimate follows a (scaled) Chi-Square distribution if the original distribution was Normal
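
For reference, the standard estimators are written out below. This assumes the slides use the usual sample mean and the unbiased sample variance; the symbols x_i, n, and s² are notation introduced here, not taken from the slides.

```latex
\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad
\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}
\quad \text{if the } x_i \text{ are independent Normal}(\mu, \sigma^2).
```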
Microarray Data
• Many genes, e.g., 25,000

• 2 conditions (or more), many replicates within
  each condition

• Which genes are differentially expressed
  between the two conditions?
More Specifically
• For a particular gene
  – Each condition is a black box
  – Say 3 observations from each black box


• Do both black boxes have the same
  distribution?
  – Assume same canonical distribution
  – Do both have the same parameters?
Which Canonical Distribution
• Use data with many replicates

• 418.0294, 295.8019, 272.1220, 315.2978, 294.2242,
  379.8320, 392.1817, 450.4758, 335.8242, 265.2478,
  196.6982, 289.6532, 274.4035, 246.6807, 254.8710,
  165.9416, 281.9463, 246.6434, 259.0019, 242.1968


• Which distribution do these values follow?
What is a QQ Plot
Distribution of log raw intensities across genes on a single array
The QQ plot of log scale intensities (i.e., actual vs simulated from normal)
QQ Plot against a Normal Distribution
• 10 + 10 replicates in two groups

• Single group QQ plot

• Combined 2 groups QQ plot

• Combined log-scale QQ plot

• Shapiro-Wilk Test
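
A QQ plot against a Normal distribution pairs the sorted observations with the quantiles a Normal sample of the same size would be expected to have; points close to a straight line suggest the data are roughly Normal. The sketch below computes those pairs for the 20 replicate values listed earlier, on the raw and on the log scale. The plotting positions (i + 0.5)/n and the use of a correlation coefficient as a straightness summary are assumptions made for illustration, not necessarily what the slides used.

```python
from math import log, sqrt
from statistics import NormalDist, mean

# The 20 replicate intensities from the "Which Canonical Distribution" slide.
replicates = [
    418.0294, 295.8019, 272.1220, 315.2978, 294.2242,
    379.8320, 392.1817, 450.4758, 335.8242, 265.2478,
    196.6982, 289.6532, 274.4035, 246.6807, 254.8710,
    165.9416, 281.9463, 246.6434, 259.0019, 242.1968,
]

def qq_points(values):
    """Return (theoretical Normal quantile, observed value) pairs."""
    n = len(values)
    observed = sorted(values)
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

def qq_correlation(values):
    """Pearson correlation of the QQ points; values near 1 mean a straight line."""
    pairs = qq_points(values)
    xs = [t for t, _ in pairs]
    ys = [o for _, o in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_raw = qq_correlation(replicates)
r_log = qq_correlation([log(v) for v in replicates])
print(f"QQ correlation, raw scale: {r_raw:.4f}")
print(f"QQ correlation, log scale: {r_log:.4f}")
```

For a formal test rather than a visual check, `scipy.stats.shapiro` implements the Shapiro-Wilk test and returns a test statistic and a p-value.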
Which Canonical Distribution



• Assume a log-normal distribution
Benford’s Law
• Frequency distribution of first significant digit




• Pr(d <= x < d+1) = log10(1+d) - log10(d) for x in [1, 10) and d = 1, ..., 9, when log10(x) is uniformly distributed in [0, 1]
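
The formula can be checked numerically. The sketch below tabulates the Benford probabilities for first digits 1 to 9 and compares them with first-digit frequencies of x = 10^u, where u runs over an evenly spaced grid on [0, 1) standing in for a uniform distribution. The grid size and function names are choices made here for illustration.

```python
from math import log10

def benford_probability(d):
    """Probability that the first significant digit equals d (1 <= d <= 9)."""
    return log10(1 + d) - log10(d)

probabilities = {d: benford_probability(d) for d in range(1, 10)}

# First digits of x = 10**u for u on an even grid in [0, 1),
# i.e. log10(x) spread uniformly over [0, 1).
grid_size = 9000
counts = {d: 0 for d in range(1, 10)}
for k in range(grid_size):
    u = (k + 0.5) / grid_size
    first_digit = int(10 ** u)  # x lies in [1, 10), so this is 1..9
    counts[first_digit] += 1
frequencies = {d: counts[d] / grid_size for d in counts}

for d in range(1, 10):
    print(f"d={d}: Benford {probabilities[d]:.4f}, grid frequency {frequencies[d]:.4f}")
```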
Differential Expression

(Figure: Group 1 with parameters μ1, σ1²; Group 2 with parameters μ2, σ2²)

• Is μ1 = μ2?

• Is σ1 = σ2? Is variance a function of mean?
(Figure: SD vs Mean across 3 replicates plotted for all genes)

• SD increases linearly with Mean
(Figure: SD vs Mean across 3 replicates computed for all genes after log-transformation)

• SD is flat now, except for very low values

• Another reason to work on the log scale
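
Why the log scale flattens the SD can be seen with a small deterministic sketch. It assumes the replicate-to-replicate variation is multiplicative (each replicate is the gene's expression level times a factor near 1); the factors and levels below are hypothetical.

```python
from math import log2
from statistics import stdev

# Hypothetical multiplicative replicate effects (about +/- 10%).
relative_replicates = [0.9, 1.0, 1.1]

def sd_raw_and_log(level):
    """SD of 3 replicates at a given expression level, on raw and log2 scales."""
    raw = [level * r for r in relative_replicates]
    logged = [log2(v) for v in raw]
    return stdev(raw), stdev(logged)

for level in (100.0, 1000.0, 10000.0):
    sd_raw, sd_log = sd_raw_and_log(level)
    print(f"level {level:>8.0f}: raw SD = {sd_raw:9.3f}, log2 SD = {sd_log:.4f}")
```

On the raw scale the SD grows in proportion to the level, while on the log scale log2(level × r) = log2(level) + log2(r), so the SD depends only on the replicate factors and not on the level.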
Differential Expression

(Figure: Group 1 with parameters μ1, σ1²; Group 2 with parameters μ2, σ2²)

• Is μ1 = μ2?

• Is σ1 = σ2? Sort-of YES
The T-Statistic

• Follows a flattened Normal, i.e., a T-Distribution
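
A minimal sketch of a two-sample t-statistic follows. It assumes the equal-variance (pooled) form, in line with the "σ1 = σ2? Sort-of YES" slide; the function name and the example log-scale intensities are hypothetical, and this is not necessarily the exact formula shown in the slides.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group1, group2):
    """Equal-variance two-sample t-statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    degrees_of_freedom = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / degrees_of_freedom
    standard_error = sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, degrees_of_freedom

# Hypothetical log-scale intensities for one gene, 3 replicates per condition.
condition1 = [8.1, 8.3, 8.2]
condition2 = [9.0, 9.2, 9.1]
t_value, df = pooled_t_statistic(condition1, condition2)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

With 3 replicates per group there are only 4 degrees of freedom, so the reference T-Distribution has much heavier tails than a Normal.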
A Problem
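(Figure: SD vs Mean across 3 replicates computed for all genes after log-transformation, with a fitted curve)

• The curve fit may be a better estimate of the SD than the per-gene value

• In one region of the plot: lots of false positives can be avoided

• In another region: not much difference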
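
One way to read the "curve fit" idea: with only 3 replicates, a gene's SD can come out very small by chance, which inflates its t-statistic. The sketch below is an assumption-laden illustration, not the method from the slides: it uses a running median of SD over genes sorted by mean as a stand-in for the fitted curve, and takes the larger of the per-gene SD and the fitted value. All names and numbers are hypothetical.

```python
from statistics import median

def fitted_sd(means, sds, half_window=2):
    """Running median of SD over genes ordered by mean (a crude curve fit)."""
    order = sorted(range(len(means)), key=lambda i: means[i])
    fitted = [0.0] * len(means)
    for rank, gene in enumerate(order):
        lo = max(0, rank - half_window)
        hi = min(len(order), rank + half_window + 1)
        fitted[gene] = median(sds[order[k]] for k in range(lo, hi))
    return fitted

def conservative_sd(means, sds, half_window=2):
    """Per-gene SD, but never smaller than the fitted curve at that mean."""
    curve = fitted_sd(means, sds, half_window)
    return [max(s, f) for s, f in zip(sds, curve)]

# Hypothetical per-gene summaries on the log scale; gene 1 has an
# implausibly small SD compared with genes of similar mean.
gene_means = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_sds = [0.5, 0.01, 0.5, 0.5, 0.5]

print(fitted_sd(gene_means, gene_sds, half_window=1))
print(conservative_sd(gene_means, gene_sds, half_window=1))
```

Using the adjusted SD in the denominator of the t-statistic shrinks the statistic only for genes whose own SD falls below the curve, and leaves the others unchanged.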
Thank You

Introduction to statistics ii
