Microarray data noise
simulation
Despoina I. Kalfakakou
Interinstitutional postgraduate program
“Information Technologies in Medicine and Biology”
Course: Simulation methods in medicine and biology
Instructor: Dr G. Spyrou
Microarray data
• DNA microarray: a collection of tiny
DNA spots on a surface.
• Used to estimate the expression
of a large number of genes at the same
time.
• The expression measurements are saved
in a tsv file, with rows representing the
genes and columns representing the
samples.
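The file layout described above can be read with a few lines of code. This is a minimal sketch using only the Python standard library; the gene names, sample names and values are hypothetical, and the assumption that the first column holds the gene identifier is not stated in the slides.

```python
import csv
import io

# Hypothetical three-gene, four-sample file; a real file would be opened with open(path).
TSV_TEXT = (
    "gene\tS1\tS2\tS3\tS4\n"
    "GENE_A\t7.1\t6.9\t7.4\t7.0\n"
    "GENE_B\t3.2\t3.5\t3.1\t3.3\n"
    "GENE_C\t9.8\t10.1\t9.9\t10.4\n"
)


def read_expression_tsv(handle):
    """Return (sample_names, {gene: [expression value per sample]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_names = header[1:]          # assumed: first column is the gene identifier
    expression = {}
    for row in reader:
        if not row:                    # skip blank lines
            continue
        expression[row[0]] = [float(value) for value in row[1:]]
    return sample_names, expression


samples, expression = read_expression_tsv(io.StringIO(TSV_TEXT))
print(samples)
print(expression["GENE_B"])
```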
Microarray data noise
• Biological noise:
– Gene expression is a random and noisy
process.
• “Inner” (intrinsic) noise: the result of the inherent
stochasticity of biochemical processes such as
transcription and translation.
• “Outer” (extrinsic) noise: variations in the quantities or states
of other cellular components (e.g., proteins)
that indirectly change the expression of a
particular gene.
• Technical noise: Artefacts.
Gene Correlation
Possible sources of correlation between genes X and Y:
•X activates Y.
•X suppresses Y.
•W activates both X and Y.
•W activates X, while suppressing Y.
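The last two cases can be illustrated with a toy simulation: a shared regulator W induces correlation between X and Y even though neither acts on the other. This is only a sketch; the linear dependence on W, the noise level 0.5 and the sample count are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
w = [rng.gauss(0.0, 1.0) for _ in range(2000)]    # hypothetical regulator activity

# W activates both X and Y: both follow W, plus independent noise.
x_act = [wi + rng.gauss(0.0, 0.5) for wi in w]
y_act = [wi + rng.gauss(0.0, 0.5) for wi in w]

# W activates X while suppressing Y: Y follows -W.
y_sup = [-wi + rng.gauss(0.0, 0.5) for wi in w]

r_common_activation = pearson(x_act, y_act)       # expected to be clearly positive
r_activate_suppress = pearson(x_act, y_sup)       # expected to be clearly negative
print(r_common_activation, r_activate_suppress)
```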
Information extraction from real
data {1/2}
• The real data consist of 30 breast tissue
samples in 2 states: 20 healthy tissue and 10
tumour tissue.
• 20368 gene expression measurements per
sample.
• Data are already normalized.
Information extraction from real
data {2/2}
1. Per gene study of mean values and standard
deviations of gene expressions for each
state.
2. Significance analysis for the discovery of
differentially expressed genes
(non-parametric t-test).
3. Significant gene covariance matrix
construction.
4. SVM training using significant gene
expressions.
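Steps 1 and 2 can be sketched for a single gene as follows. This is a plain permutation test on a Welch t-statistic, used here as a stand-in for the non-parametric test; it is not SAM's own statistic. The expression values, the number of permutations and the function names are hypothetical.

```python
import random
import statistics


def summarize(values):
    """Mean and sample standard deviation of one gene within one state."""
    return statistics.mean(values), statistics.stdev(values)


def welch_t(a, b):
    """Welch t-statistic for two groups of unequal size and variance."""
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / se2 ** 0.5


def permutation_p_value(healthy, tumour, n_perm=1000, seed=0):
    """Two-sided p-value obtained by shuffling the state labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(healthy, tumour))
    pooled = list(healthy) + list(tumour)
    n_healthy = len(healthy)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:n_healthy], pooled[n_healthy:])) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p-value of 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical values for one gene: 20 healthy and 10 tumour samples.
rng = random.Random(1)
healthy = [rng.gauss(5.0, 1.0) for _ in range(20)]
tumour = [rng.gauss(8.0, 1.0) for _ in range(10)]

print(summarize(healthy), summarize(tumour))
p_value = permutation_p_value(healthy, tumour)
print(p_value)
```

With 20368 genes, the per-gene p-values would additionally need a multiple-testing correction, which this sketch leaves out.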
Simulation Model
• Idea: simulate an “ideal”, noiseless
distribution, then apply different noise
models to it.
• Final distribution for gene i:
• x_i = a_i + n_i, where a_i is the noiseless distribution and n_i
is the noise.
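The additive model can be written as a small helper that draws the ideal value and the noise value independently and sums them. This is a sketch only: the ideal mean 7.0 and standard deviation 0.4 are hypothetical, and the exponential sampler merely stands in for any of the noise models.

```python
import random


def simulate_gene(ideal_sampler, noise_sampler, n_samples):
    """Draw n_samples values x = a + n for one gene."""
    return [ideal_sampler() + noise_sampler() for _ in range(n_samples)]


rng = random.Random(1)


def ideal():
    # Hypothetical noiseless expression: normal with mean 7.0 and sd 0.4.
    return rng.gauss(7.0, 0.4)


def noise():
    # Example noise model: exponential with rate 0.97.
    return rng.expovariate(0.97)


x = simulate_gene(ideal, noise, 30)    # one value per sample
print(x[:5])
```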
Ideal Distribution
• Gene i not significant: normal distribution whose mean and
standard deviation equal those of gene i in the real data.
• Gene i significant: multivariate normal distribution, with
parameters: a vector of the real data mean values in
the given state and the covariance
matrix Σ of the real correlated significant genes, where:
– Σ[i,j] equals the covariance of genes i and j, if correlated,
– Σ[i,j] = 0, if not correlated, and
– Σ[i,j] = var(i), if i = j.
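The construction of Σ and the sampling step can be sketched as below. The covariance values, the mean vector and the set of correlated pairs are hypothetical, and sampling through a hand-written Cholesky factor is one common approach, not necessarily the one used for the slides.

```python
import math
import random


def build_sigma(cov, correlated_pairs):
    """Keep variances and covariances of correlated pairs; zero everything else."""
    n = len(cov)
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or (i, j) in correlated_pairs or (j, i) in correlated_pairs:
                sigma[i][j] = cov[i][j]
    return sigma


def cholesky(sigma):
    """Lower-triangular L with L * L^T = sigma (sigma must be positive definite)."""
    n = len(sigma)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(sigma[i][i] - s)   # ValueError if not positive definite
            else:
                lower[i][j] = (sigma[i][j] - s) / lower[j][j]
    return lower


def sample_mvn(mean, lower, rng):
    """One draw from N(mean, L * L^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


# Hypothetical empirical covariance of three significant genes.
cov = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.05],
       [0.1, 0.05, 0.5]]
sigma = build_sigma(cov, correlated_pairs={(0, 1)})   # only genes 0 and 1 count as correlated
lower = cholesky(sigma)
sample = sample_mvn([7.0, 6.5, 3.0], lower, random.Random(0))
print(sigma)
print(sample)
```

Zeroing selected entries of a covariance matrix does not guarantee that the result stays positive definite; in that case `math.sqrt` raises an error here and Σ would need to be repaired first.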
Noise {1/3}
• The behavior of the data is studied by adding known
noise models:
– Uniform noise:
– Gaussian noise:
Noise {2/3}
• Poisson noise:
• Cauchy noise:
Noise {3/3}
• χ² noise:
• Exponential noise:
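The six noise models can each be sampled with the Python standard library, as sketched below. The parameter values in the demonstration follow the later result slides where those state them (σ = 0.8, λ = 1, scale = 0.3, df = 0.75, λ = 0.97); the uniform bounds, the zero mean and zero location, and all function names are assumptions.

```python
import math
import random


def uniform_noise(rng, low, high):
    return rng.uniform(low, high)


def gaussian_noise(rng, mean, sigma):
    return rng.gauss(mean, sigma)


def poisson_noise(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def cauchy_noise(rng, location, scale):
    """Inverse-CDF sampling of the Cauchy distribution."""
    return location + scale * math.tan(math.pi * (rng.random() - 0.5))


def chi_square_noise(rng, df):
    """Central chi-square with df degrees of freedom = Gamma(shape=df/2, scale=2)."""
    return rng.gammavariate(df / 2.0, 2.0)


def exponential_noise(rng, lam):
    return rng.expovariate(lam)


rng = random.Random(42)
draws = {
    "uniform": uniform_noise(rng, 0.0, 1.0),      # bounds are hypothetical
    "gaussian": gaussian_noise(rng, 0.0, 0.8),    # mean 0 is hypothetical
    "poisson": poisson_noise(rng, 1.0),
    "cauchy": cauchy_noise(rng, 0.0, 0.3),        # location 0 is hypothetical
    "chi_square": chi_square_noise(rng, 0.75),
    "exponential": exponential_noise(rng, 0.97),
}
print(draws)
```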
Evaluation
– Significance analysis for the discovery of
differentially expressed genes.
– Use of the differentially expressed genes as
test data in the real data trained SVM
classifier.
Real Data Significance Analysis
SAM tool (Significance Analysis of Microarrays).
Upregulated: 70
Downregulated: 236
SVM training using real data
• Linear kernel.
• 10-fold cross validation.
• Truth table (rows: true class; columns: predicted class):

              Predicted
True          Normal   Diseased
Normal        19       1
Diseased      2        8

• Accuracy: 90%.
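The reported accuracy follows directly from the truth table: 19 + 8 = 27 of the 30 samples are classified correctly. A short check, which also derives sensitivity and specificity (two figures not given on the slide):

```python
# Rows are true classes, columns are predicted classes.
confusion = [
    [19, 1],   # true Normal:   predicted Normal, predicted Diseased
    [2, 8],    # true Diseased: predicted Normal, predicted Diseased
]

total = sum(sum(row) for row in confusion)           # 30 samples
correct = confusion[0][0] + confusion[1][1]          # 27 on the diagonal
accuracy = correct / total                           # 27 / 30
sensitivity = confusion[1][1] / sum(confusion[1])    # diseased samples found: 8 / 10
specificity = confusion[0][0] / sum(confusion[0])    # normal samples found: 19 / 20

print(accuracy, sensitivity, specificity)
```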
Uniform noise
Distribution parameters: a=upper bound, b=lower bound.
Significant genes: 267 upregulated, 245 downregulated.
SVM accuracy: 20%
Gaussian noise
Distribution parameters: mv = min(mv), σ=0.8.
Significant genes: 117 upregulated, 222 downregulated.
SVM accuracy: 80%
Poisson noise
Distribution parameters: λ=1.
Significant genes: 62 upregulated, 182 downregulated.
SVM accuracy: 76.667%
Cauchy noise
Distribution parameters: location=min(mv), scale=0.3.
Significant genes: 40 upregulated, 102 downregulated.
SVM accuracy: 63.334%
χ² noise
Distribution parameters: df=0.75, center=0.
Significant genes: 92 upregulated, 269 downregulated.
SVM accuracy: 83.334%
Exponential noise
Distribution parameters: λ=0.97.
Significant genes: 63 upregulated, 224 downregulated.
SVM accuracy: 86.667%
In-depth study of exponential noise
• Different λ values:

λ               0.97      2         0.3
Upregulated     64        389       0
Downregulated   224       532       15
SVM accuracy    86.667%   76.667%   -

• Application of noise more than once:

# of times      1         1.5       2
Upregulated     64        14        0
Downregulated   224       98        48
SVM accuracy    86.667%   83.334%   -
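When comparing the λ values, it helps to recall that an exponential distribution with rate λ has mean 1/λ and standard deviation 1/λ, so a smaller λ adds larger and more variable noise. The sketch below only tabulates these standard moments for the three λ values tested; linking them to the observed gene counts is an interpretation, not a result from the slides.

```python
def exponential_moments(lam):
    """Mean and standard deviation of an exponential distribution with rate lam."""
    return 1.0 / lam, 1.0 / lam


moments = {lam: exponential_moments(lam) for lam in (0.97, 2.0, 0.3)}
for lam, (mean, sd) in moments.items():
    print(f"lambda={lam}: mean={mean:.3f}, sd={sd:.3f}")
```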
Future applications {1/2}
• ConsensusClusterPlus tool:
Future applications {2/2}
• From ConsensusClusterPlus: the real data can
be divided into 4 categories.
• Noise simulation considering these 4
categories.
References {1/2}
• “Novel markers for differentiation of lobular and ductal
invasive breast carcinomas by laser microdissection and
microarray analysis.”, Turashvili et al, BMC Cancer, 2007.
• “Using Gene Expression Noise to Understand Gene
Regulation”, Munsky et al., Science Vol. 336.
• “Simulating Correlated Multivariate Normal Data”, Alison
Kosel, 2009.
• “Interplay between gene expression noise and regulatory
network architecture”, Chalancon et al., Trends in Genetics,
Vol. 28.
• “Models of stochastic gene expression”, Paulsson et al.,
Physics of Life Reviews 2 (2005).
• “Intrinsic and extrinsic contributions to stochasticity in gene
expression”, Swain et al, PNAS Vol. 99.
References {2/2}
• “Intrinsic noise in gene regulatory networks”, Mukund Thattai
and Alexander van Oudenaarden, PNAS Vol. 98.
• “Making sense of microarray data distributions”, Hoyle et al,
Bioinformatics Vol. 18.
• “A Flexible Microarray Data Simulation Model”, Doulaye
Dembele, Microarrays, Vol. 2.
• “Simulation of microarray data with realistic characteristics”,
Nykter et al., BMC Bioinformatics 2006.
• http://statweb.stanford.edu/~tibs/SAM/
• “ConsensusClusterPlus: a class discovery tool with confidence
assessments and item tracking”, Wilkerson et al,
Bioinformatics, 2010.
Thank you
