Microarray data noise
simulation
Despoina I. Kalfakakou
Interinstitutional postgraduate program
“Information Technologies in Medicine and Biology”
Course: Simulation methods in medicine and biology
Instructor: Dr G. Spyrou
Microarray data
• DNA microarray: a collection of tiny
DNA spots on a surface.
• Used to estimate the expression of a large number of genes at the same time.
• The expression measurements are saved in a TSV file, with rows representing the genes and columns representing the samples.
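A minimal sketch of reading such a file, assuming the first column holds a gene identifier and the first row holds the sample names. The gene and sample names below are made up for illustration and are not taken from the real data set.

```python
import csv
import io

# Hypothetical miniature expression file: first column is the gene ID,
# the remaining columns are samples (all names are illustrative only).
TSV_TEXT = (
    "gene\tnormal_1\tnormal_2\ttumour_1\n"
    "GENE_A\t7.1\t6.9\t9.4\n"
    "GENE_B\t5.0\t5.2\t5.1\n"
)


def read_expression_table(handle):
    """Return (sample_names, {gene: [expression value per sample]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_names = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = [float(value) for value in row[1:]]
    return sample_names, table


samples, expression = read_expression_table(io.StringIO(TSV_TEXT))
print(samples)
print(expression["GENE_A"])
```

For a file on disk, the same function can be given `open(path, newline="")` instead of the in-memory text.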
Microarray data noise
• Biological noise:
– Gene expression is a random and noisy
process.
• “Inner” (intrinsic) noise: the result of the inherent stochasticity of biochemical processes such as transcription and translation.
• “Outer” (extrinsic) noise: variations in the quantities or states of other cellular components (e.g., proteins) that indirectly change the expression of a particular gene.
• Technical noise: Artefacts.
Gene Correlation
Possible correlation patterns:
• X activates Y.
• X suppresses Y.
• W activates both X and Y.
• W activates X, while suppressing Y.
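The last two patterns can be illustrated with a small simulation. This is only a sketch under simple assumptions that are not in the slides: W is standard normal, "activation" is modelled as adding W, "suppression" as subtracting it, and each gene gets independent Gaussian jitter.

```python
import random


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sum((a - mean_u) ** 2 for a in u) ** 0.5
    norm_v = sum((b - mean_v) ** 2 for b in v) ** 0.5
    return cov / (norm_u * norm_v)


rng = random.Random(0)
w = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # common regulator W

# W activates X in both scenarios.
x = [wi + rng.gauss(0.0, 0.5) for wi in w]
# Scenario 1: W also activates Y, so X and Y move together.
y_activated = [wi + rng.gauss(0.0, 0.5) for wi in w]
# Scenario 2: W suppresses Y, so X and Y move in opposite directions.
y_suppressed = [-wi + rng.gauss(0.0, 0.5) for wi in w]

r_same = pearson(x, y_activated)
r_opposite = pearson(x, y_suppressed)
print(round(r_same, 2), round(r_opposite, 2))
```

X and Y never interact directly here, yet they come out strongly correlated (positively or negatively) through the shared regulator.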
Information extraction from real
data {1/2}
• The real data consist of 30 breast tissue samples in 2 states: 20 healthy tissue and 10 tumour tissue.
• 20368 gene expression measurements per
sample.
• Data are already normalized.
Information extraction from real
data {2/2}
1. Per gene study of mean values and standard
deviations of gene expressions for each
state.
2. Significance analysis for the discovery of differentially expressed genes (non-parametric t-test).
3. Significant gene covariance matrix
construction.
4. SVM training using significant gene
expressions.
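Steps 1 and 2 can be sketched for a single gene as follows. The expression values are invented, and the permutation test on the difference in means is a generic non-parametric stand-in chosen for illustration; it is not the SAM procedure itself.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the state labels
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Invented expression values of one gene in the two states.
healthy = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
tumour = [6.9, 7.2, 7.0, 7.4]

# Step 1: per-gene mean and standard deviation for each state.
mean_healthy, sd_healthy = statistics.mean(healthy), statistics.stdev(healthy)
mean_tumour, sd_tumour = statistics.mean(tumour), statistics.stdev(tumour)

# Step 2: is the gene differentially expressed between the states?
p_value = permutation_p_value(healthy, tumour)
print(mean_healthy, mean_tumour, p_value)
```

With more than 20000 genes tested, the per-gene p-values would also need a multiple-testing correction, which this sketch leaves out.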
Simulation Model
• Idea: simulate an “ideal”, noiseless distribution, then apply different noise models to it.
• Final distribution for gene i:
• x_i = a_i + n_i, where a_i is the noiseless distribution and n_i is the noise.
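A minimal sketch of this additive model for one gene. The mean, standard deviation and sample count are illustrative placeholders, and the function name `simulate_gene` is hypothetical; only the exponential rate 0.97 comes from a later slide.

```python
import random


def simulate_gene(mean, sd, noise_sampler, n_samples, rng):
    """Draw the ideal values a_i, the noise n_i, and return x_i = a_i + n_i."""
    ideal = [rng.gauss(mean, sd) for _ in range(n_samples)]
    noise = [noise_sampler(rng) for _ in range(n_samples)]
    observed = [a + n for a, n in zip(ideal, noise)]
    return ideal, noise, observed


rng = random.Random(42)
# Placeholder gene statistics; exponential noise with rate 0.97.
ideal, noise, observed = simulate_gene(
    mean=8.0,
    sd=0.5,
    noise_sampler=lambda r: r.expovariate(0.97),
    n_samples=30,
    rng=rng,
)
print(observed[:3])
```

Keeping the ideal values and the noise separate makes it easy to swap the noise model while reusing the same noiseless draw.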
Ideal Distribution
• Gene i not significant: normal distribution whose mean and standard deviation equal those of the gene in the real data.
• Gene i significant: multivariate normal distribution, with parameters: a vector of the real data mean values in the given state and the covariance matrix Σ of the real correlated significant genes, where:
– Σ[i,j] = cov(i,j), if genes i and j are correlated,
– Σ[i,j] = 0, if they are not correlated, and
– Σ[i,j] = var(i), if i = j.
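The construction of Σ and the multivariate normal draw can be sketched as below. The means, variances and the single correlated pair are made-up values, and sampling through a hand-written Cholesky factor is just one standard way to do it; the slides do not say how the draw was implemented.

```python
import math
import random


def build_sigma(variances, covariances):
    """variances[i] = var(i); covariances maps correlated pairs (i, j) to cov(i, j)."""
    k = len(variances)
    sigma = [[0.0] * k for _ in range(k)]  # uncorrelated pairs stay at 0
    for i in range(k):
        sigma[i][i] = variances[i]
    for (i, j), cov in covariances.items():
        sigma[i][j] = cov
        sigma[j][i] = cov  # keep the matrix symmetric
    return sigma


def cholesky(sigma):
    """Lower-triangular factor with lower * lower^T = sigma."""
    k = len(sigma)
    lower = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            partial = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                lower[i][j] = math.sqrt(sigma[i][i] - partial)
            else:
                lower[i][j] = (sigma[i][j] - partial) / lower[j][j]
    return lower


def sample_mvn(means, lower, rng):
    """One draw from N(means, lower * lower^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in means]
    return [
        mean + sum(lower[i][j] * z[j] for j in range(i + 1))
        for i, mean in enumerate(means)
    ]


rng = random.Random(7)
means = [8.0, 6.5, 7.2]  # illustrative per-gene means
sigma = build_sigma([1.0, 2.0, 0.5], {(0, 1): 0.8})  # only genes 0 and 1 correlated
lower = cholesky(sigma)
draws = [sample_mvn(means, lower, rng) for _ in range(5000)]
print(draws[0])
```

One caveat of zeroing the entries of uncorrelated pairs: the resulting Σ is not guaranteed to be positive definite, in which case `math.sqrt` above raises a `ValueError` and Σ would need to be regularised first.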
Noise {1/3}
• The behavior of the data is studied by adding known
noise models:
– Uniform noise:
– Gaussian noise:
Noise {2/3}
• Poisson noise:
• Cauchy noise:
Noise {3/3}
• χ² noise:
• Exponential noise:
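The six noise models can be sampled with Python's standard library, as in the sketch below. The Gaussian σ, Poisson λ, Cauchy scale, χ² degrees of freedom and exponential λ are the values reported on the result slides; the uniform bounds, the Gaussian mean and the Cauchy location depend on the data and are replaced here by placeholder values. The "center" parameter of the χ² noise is not modelled.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam such as 1."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def cauchy(rng, location, scale):
    """Inverse-CDF sampling of a Cauchy variate."""
    return location + scale * math.tan(math.pi * (rng.random() - 0.5))


def chi_square(rng, df):
    """Chi-square with df degrees of freedom equals Gamma(shape=df/2, scale=2)."""
    return rng.gammavariate(df / 2.0, 2.0)


def make_noise_models(low, high, mean, location):
    """Map each noise model name to a sampler taking a random.Random instance."""
    return {
        "uniform": lambda rng: rng.uniform(low, high),
        "gaussian": lambda rng: rng.gauss(mean, 0.8),
        "poisson": lambda rng: poisson(rng, 1.0),
        "cauchy": lambda rng: cauchy(rng, location, 0.3),
        "chi_square": lambda rng: chi_square(rng, 0.75),
        "exponential": lambda rng: rng.expovariate(0.97),
    }


rng = random.Random(1)
# Placeholder values for the data-dependent parameters.
models = make_noise_models(low=-1.0, high=1.0, mean=0.0, location=0.0)
draws = {
    name: [sampler(rng) for _ in range(5000)] for name, sampler in models.items()
}
for name, values in draws.items():
    # The median is printed because the Cauchy distribution has no mean.
    print(name, round(statistics.median(values), 3))
```

Each sampler has the same signature, so any of them can serve as the noise term n_i of the simulation model.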
Evaluation
– Significance analysis for the discovery of
differentially expressed genes.
– Use of the differentially expressed genes as
test data in the real data trained SVM
classifier.
Real Data Significance Analysis
SAM tool (Significance Analysis of Microarrays).
Upregulated: 70
Downregulated: 236
SVM training using real data
• Linear kernel.
• 10-fold cross validation.
• Accuracy: 90%.
• Truth table (rows: true class; columns: predicted class):
True \ Predicted   Normal   Diseased
Normal             19       1
Diseased           2        8
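The reported accuracy follows directly from the truth table, as this short check shows. Sensitivity and specificity are not stated on the slide; they are derived here from the same four counts, treating "diseased" as the positive class.

```python
# Truth table counts, keyed by (true class, predicted class).
confusion = {
    ("normal", "normal"): 19,
    ("normal", "diseased"): 1,
    ("diseased", "normal"): 2,
    ("diseased", "diseased"): 8,
}

total = sum(confusion.values())
correct = confusion[("normal", "normal")] + confusion[("diseased", "diseased")]
accuracy = correct / total

# Fraction of truly diseased samples that were predicted as diseased.
n_diseased = confusion[("diseased", "normal")] + confusion[("diseased", "diseased")]
sensitivity = confusion[("diseased", "diseased")] / n_diseased

# Fraction of truly normal samples that were predicted as normal.
n_normal = confusion[("normal", "normal")] + confusion[("normal", "diseased")]
specificity = confusion[("normal", "normal")] / n_normal

print(total, accuracy, sensitivity, specificity)
```

The row totals (20 normal, 10 diseased) match the composition of the real data set.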
Uniform noise
Distribution parameters: a=upper bound, b=lower bound.
Significant genes: 267 upregulated, 245 downregulated.
SVM accuracy: 20%
Gaussian noise
Distribution parameters: mv = min(mv), σ=0.8.
Significant genes: 117 upregulated, 222 downregulated.
SVM accuracy: 80%
Poisson noise
Distribution parameters: λ=1.
Significant genes: 62 upregulated, 182 downregulated.
SVM accuracy: 76.667%
Cauchy noise
Distribution parameters: location=min(mv), scale=0.3.
Significant genes: 40 upregulated, 102 downregulated.
SVM accuracy: 63.334%
χ² noise
Distribution parameters: df=0.75, center=0.
Significant genes: 92 upregulated, 269 downregulated.
SVM accuracy: 83.334%
Exponential noise
Distribution parameters: λ=0.97.
Significant genes: 63 upregulated, 224 downregulated.
SVM accuracy: 86.667%
In-depth study of exponential noise
• Different λ values:
λ               0.97      2         0.3
Upregulated     64        389       0
Downregulated   224       532       15
SVM accuracy    86.667%   76.667%   -
• Application of noise more than once:
# of times      1         1.5       2
Upregulated     64        14        0
Downregulated   224       98        48
SVM accuracy    86.667%   83.334%   -
Future applications {1/2}
• ConsensusClusterPlus tool:
Future applications {2/2}
• From ConsensusClusterPlus: the real data can be divided into 4 categories.
• Noise simulation considering these 4 categories.
References {1/2}
• “Novel markers for differentiation of lobular and ductal
invasive breast carcinomas by laser microdissection and
microarray analysis.”, Turashvili et al, BMC Cancer, 2007.
• “Using Gene Expression Noise to Understand Gene
Regulation”, Munsky et al., Science Vol. 336.
• “Simulating Correlated Multivariate Normal Data”, Alison
Kosel, 2009.
• “Interplay between gene expression noise and regulatory
network architecture”, Chalancon et al., Trends in Genetics,
Vol. 28.
• “Models of stochastic gene expression”, Paulsson et al.,
Physics of Life Reviews 2 (2005).
• “Intrinsic and extrinsic contributions to stochasticity in gene
expression”, Swain et al, PNAS Vol. 99.
References {2/2}
• “Intrinsic noise in gene regulatory networks”, Mukund Thattai
and Alexander van Oudenaarden, PNAS Vol. 98.
• “Making sense of microarray data distributions”, Hoyle et al,
Bioinformatics Vol. 18.
• “A Flexible Microarray Data Simulation Model”, Doulaye
Dembele, Microarrays, Vol. 2.
• “Simulation of microarray data with realistic characteristics”,
Nykter et al., BMC Bioinformatics 2006.
• http://statweb.stanford.edu/~tibs/SAM/
• “ConsensusClusterPlus: a class discovery tool with confidence
assessments and item tracking”, Wilkerson et al,
Bioinformatics, 2010.
Thank you
