Enrico Glaab
Luxembourg Centre for Systems Biomedicine
Exploiting technical replicate variance in omics data analysis (RepExplore)
2
Introduction: Variance and replicates
• Biomedical omics datasets may contain different types of variance:
Variance across samples from different individuals:
- Between-group variance
- Within-group variance
Variance across samples from the same individual:
- Intra-individual variance
- Technical variance
• Between- and within-group variance are covered via biological replicates,
intra-individual variance via time-series measurements, and technical
variance via technical replicates
3
Motivation: Loss of technical variance information
• Typically, combinations of biological and summarized technical replicates are
used for case-control analyses:
• Summarization of technical replicates into average measurements can
provide more robust input data, but information on the variance across
technical replicates is lost!
[Figure: mean/median summarization of technical replicates, followed by statistical analysis]
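The information loss described on this slide can be illustrated with a small sketch. The feature names and replicate values below are made up for illustration and do not come from the datasets in this talk; the point is only that two features can share the same replicate mean while differing strongly in replicate variance.

```python
from statistics import mean, variance

# Hypothetical log-scale technical replicates of one sample (illustrative values only)
replicates = {
    "feature_A": [10.0, 10.1, 9.9],   # tight replicates
    "feature_B": [10.0, 12.0, 8.0],   # noisy replicates
}

def summarize(values):
    """Return the replicate mean and the sample variance across technical replicates."""
    return mean(values), variance(values)

summary = {name: summarize(vals) for name, vals in replicates.items()}

for name, (m, v) in summary.items():
    print(f"{name}: mean={m:.2f}, technical variance={v:.3f}")
```

After mean summarization alone, both features would enter the statistical analysis with the same value, although the second one is measured far less reliably.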
4
Example: Relevance of technical variance information
Abundance of the top differential metabolite (L-valine) in the Arabidopsis data by
Andersen et al. (2014) without (a) and with technical error (b, see whiskers)
[Figure panels: (a) without technical error (mean-summarized technical replicates); (b) with technical error]
5
Probability of positive likelihood ratio (PPLR)
• We assume that omics data on a logarithmic scale are approximately normally
distributed
• Using mean (μ) and variance (s²) estimates derived from technical replicate
measurements, differential expression can be scored using the probability of
positive likelihood ratio (PPLR):
where P is the cumulative distribution function for the standard normal curve.
• For probe replicates on DNA microarray chips, Liu et al. propose a dedicated
parameter estimation approach to incorporate the probe-level measurement
error into variance estimates s² (Bioinformatics, 2006)
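The formula referenced on this slide did not survive in this transcript. A form consistent with the definitions given here, and with the PPLR as described by Liu et al., is PPLR = Φ((μ1 − μ2) / sqrt(s1² + s2²)), where the indices 1 and 2 denote the two conditions and Φ is the standard normal cumulative distribution function (called P on the slide). The sketch below assumes this form; the function names and the example numbers are hypothetical.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pplr(mu1, var1, mu2, var2):
    """Probability that condition 1 exceeds condition 2 (assumed PPLR form)."""
    return standard_normal_cdf((mu1 - mu2) / sqrt(var1 + var2))

# Same mean difference, different replicate-derived variances (made-up numbers)
confident = pplr(11.0, 0.05, 10.0, 0.05)
uncertain = pplr(11.0, 2.0, 10.0, 2.0)
print(confident, uncertain)
```

Under this form, values near 1 point to higher abundance in condition 1, values near 0 to higher abundance in condition 2, and values near 0.5 to no clear difference. A large variance pulls the score towards 0.5 even when the mean difference is unchanged.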
6
Comparison: Classical vs. PPLR top-ranked result
Whisker plots of the top differential metabolite in the Arabidopsis data by
Andersen et al. (2014): (a) classical approach (L-valine), (b) PPLR (L-proline)
7
Application to Parkinson's disease transcriptome data
a) Whisker plot of the top differential gene (NDUFB2) in the Parkinson’s disease
dataset by Simunovic et al. (2009); b) Heat map of the top 10 differentially
expressed genes according to the PPLR statistic.
8
Improved robustness of rankings across studies
• Evaluation: compare the similarity of gene rankings across two Parkinson's
disease datasets (Simunovic et al., 2009, and Zheng et al., 2010)
• Two gene ranking methods are compared: PPLR and limma/eBayes (note: the
duplicateCorrelation function is not generally applicable)
• Kendall's tau is used as the similarity measure for rankings
→ Significantly higher similarity of rankings with the PPLR statistic (p < 2.2E-16
for both up- and down-regulated genes)
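Kendall's tau, the similarity measure named on this slide, can be sketched as follows. This is a plain O(n²) tau-a implementation that assumes no tied ranks; the gene labels and rank positions are invented for illustration and are not results from either dataset.

```python
from itertools import combinations

def kendall_tau_a(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as {item: rank} dicts.

    Only items present in both rankings are compared; ties are assumed absent.
    """
    shared = sorted(set(rank_a) & set(rank_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared items")
    concordant = discordant = 0
    for g, h in combinations(shared, 2):
        # Positive product: both rankings order the pair the same way
        sign = (rank_a[g] - rank_a[h]) * (rank_b[g] - rank_b[h])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical gene rankings from two studies
study_1 = {"geneA": 1, "geneB": 2, "geneC": 3, "geneD": 4}
study_2 = {"geneA": 1, "geneB": 3, "geneC": 2, "geneD": 4}
tau = kendall_tau_a(study_1, study_2)
print(tau)
```

A value of 1 means identical rankings and −1 means fully reversed rankings. For genome-scale rankings, a library routine with tie handling and an O(n log n) algorithm would be the more practical choice.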
9
Rankings improve with increasing numbers of replicates
• Apply the method to simulated normal data with standard deviation 1, 100 samples
(two groups, 50 per group) and 1000 features/biomolecules (900
uncorrelated, irrelevant and 100 differential, with a fixed effect size of D = 1)
• Add simulated technical replicates with measurement noise (R function jitter)
• Test the enrichment of the 100 differential features in the final PPLR ranking
→ Enrichment score increases with increasing numbers of replicates
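The simulation setup on this slide can be approximated with the following sketch. Several parts are assumptions rather than the procedure used in the talk: uniform noise stands in for R's jitter, the group-level variance is a simple squared standard error of the replicate-averaged sample values (not a hierarchical estimate), the PPLR is taken as Φ((μ1 − μ2) / sqrt(s1² + s2²)), and the count of true differential features among the top-ranked ones stands in for the enrichment score, which the slide does not define. All function and parameter names are hypothetical.

```python
import random
from math import erf, sqrt

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pplr(mu1, var1, mu2, var2):
    """Assumed PPLR form: Phi((mu1 - mu2) / sqrt(var1 + var2))."""
    z = (mu1 - mu2) / sqrt(var1 + var2)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def simulate_hits(n_features=1000, n_diff=100, n_per_group=50,
                  n_reps=3, effect=1.0, noise=0.5, seed=0):
    """Count true differential features among the top n_diff PPLR scores."""
    rng = random.Random(seed)
    scores = []
    for f in range(n_features):
        shift = effect if f < n_diff else 0.0  # first n_diff features are differential
        group_stats = []
        for group_mean in (shift, 0.0):
            sample_means = []
            for _ in range(n_per_group):
                true_value = rng.gauss(group_mean, 1.0)
                # Technical replicates: uniform noise as a stand-in for R's jitter
                reps = [true_value + rng.uniform(-noise, noise) for _ in range(n_reps)]
                sample_means.append(mean(reps))
            # Simplified plug-in estimate: squared standard error of the group mean
            group_stats.append((mean(sample_means), sample_var(sample_means) / n_per_group))
        (mu1, v1), (mu2, v2) = group_stats
        scores.append((pplr(mu1, v1, mu2, v2), f))
    top = sorted(scores, reverse=True)[:n_diff]
    return sum(1 for _, f in top if f < n_diff)

if __name__ == "__main__":
    for r in (1, 3, 5):
        print(r, simulate_hits(n_reps=r))
```

The loop at the end reruns the simulation for several replicate counts so that the resulting hit counts can be compared. How strongly they differ depends on the assumed noise level relative to the biological standard deviation of 1.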
10
Using replicate variance information in PCA
• Principal Component Analysis (PCA) can be cast as a probabilistic model
(Tipping & Bishop, 1999) where d-dimensional data points y_n can be
reconstructed from a q-dimensional latent point x_n via a linear transformation
(W) and a noise vector ε_n:
• The corresponding data distribution is:
• If each dimension is allowed to have a different noise variance:
• While the maximum likelihood solution for the original model is equivalent to
PCA, Sanguinetti et al. presented an expectation maximization (EM) method
for the model with feature-specific noise variance (Bioinformatics, 2005)
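The equations on this slide did not survive in this transcript. The standard probabilistic PCA formulation by Tipping & Bishop, which matches the description given here, is sketched below. The mean offset μ, the isotropic noise variance σ² and the per-dimension variances σ_1², …, σ_d² are notation assumed here rather than taken from the slide.

```latex
\begin{align}
  y_n &= W x_n + \mu + \varepsilon_n, \qquad
         x_n \sim \mathcal{N}(0, I_q), \quad
         \varepsilon_n \sim \mathcal{N}(0, \sigma^2 I_d) \\
  y_n &\sim \mathcal{N}\left(\mu,\; W W^{\top} + \sigma^2 I_d\right) \\
  y_n &\sim \mathcal{N}\left(\mu,\; W W^{\top} + \operatorname{diag}(\sigma_1^2, \dots, \sigma_d^2)\right)
\end{align}
```

The first line is the generative model, the second the resulting marginal distribution of the data under isotropic noise, and the third the variant in which each dimension has its own noise variance, so that the noise covariance becomes diagonal instead of a multiple of the identity.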
11
Denoising property & results on Parkinson's disease data
• In the new probabilistic PCA, the noise estimate enables an evaluation of
the significance of principal components (PCs) → automated computation
of the maximum number of retainable PCs
• By accounting for measurement error in the model, noisy values are
down-weighted → pre-processing data via the modified PCA tends to provide
tighter sample clusters (confirmed on Parkinson's disease transcriptome
data)
12
Summary
• Technical replicate variance in omics data is usually not constant across
different biomolecules → summarizing replicates only via averaging results
in a loss of valuable information
• On Parkinson's disease transcriptomics data, accounting for technical
replicate variance provided improved robustness of differential expression
rankings across studies and gave tighter clusters in PCA visualizations
• All methods presented have been implemented on a public web-server
(RepExplore: www.repexplore.tk, Bioinformatics, 2015), providing automated
analyses with interactive visualizations and ranking tables
13
References
1. E. Glaab, R. Schneider, RepExplore: Addressing technical replicate variance in proteomics and metabolomics data analysis, Bioinformatics (2015),
31(13), pp. 2235
2. E. Glaab, Using prior knowledge from cellular pathways and molecular networks for diagnostic specimen classification, Briefings in Bioinformatics
(2015), in press (doi: 10.1093/bib/bbv044)
3. E. Glaab, R. Schneider, Comparative pathway and network analysis of brain transcriptome changes during adult aging and in Parkinson's disease,
Neurobiology of Disease (2015), 74, 1-13
4. N. Vlassis, E. Glaab, GenePEN: analysis of network activity alterations in complex diseases via the pairwise elastic net, Statistical Applications in
Genetics and Molecular Biology (2015), 14(2), pp. 221
5. E. Glaab, Building a virtual ligand screening pipeline using free software: a survey, Briefings in Bioinformatics (2015), in press (doi:
10.1093/bib/bbv037)
6. E. Glaab, A. Baudot, N. Krasnogor, R. Schneider, A. Valencia. EnrichNet: network-based gene set enrichment analysis, Bioinformatics, 28(18):i451-i457, 2012
7. E. Glaab, R. Schneider, PathVar: analysis of gene and protein expression variance in cellular pathways using microarray data, Bioinformatics,
28(3):446-447, 2012
8. E. Glaab, J. Bacardit, J. M. Garibaldi, N. Krasnogor, Using rule-based machine learning for candidate disease gene prioritization and sample
classification of cancer gene expression data, PLoS ONE, 7(7):e39932, 2012
9. E. Glaab, A. Baudot, N. Krasnogor, A. Valencia. TopoGSA: network topological gene set analysis,
Bioinformatics, 26(9):1271-1272, 2010
10. E. Glaab, A. Baudot, N. Krasnogor, A. Valencia. Extending pathways and processes using molecular interaction networks to analyse cancer genome
data, BMC Bioinformatics, 11(1):597, 2010
11. H. O. Habashy, D. G. Powe, E. Glaab, N. Krasnogor, J. M. Garibaldi, E. A. Rakha, G. Ball, A. R Green, C. Caldas, I. O. Ellis, RERG (Ras-related and
oestrogen-regulated growth-inhibitor) expression in breast cancer: A marker of ER-positive luminal-like subtype, Breast Cancer Research and
Treatment, 128(2):315-326, 2011
12. E. Glaab, J. M. Garibaldi and N. Krasnogor. ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus
methods with cross-study normalization, BMC Bioinformatics, 10:358, 2009
13. E. Glaab, J. M. Garibaldi, N. Krasnogor. Learning pathway-based decision rules to classify microarray cancer samples, German Conference on
Bioinformatics 2010, Lecture Notes in Informatics (LNI), 173, 123-134
14. E. Glaab, J. M. Garibaldi and N. Krasnogor. VRMLGen: An R-package for 3D Data Visualization on the Web, Journal of Statistical Software, 36(8), 1-18, 2010

More Related Content

What's hot

Complex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningComplex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningDmitry Grapov
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Solutions
 
Novel network pharmacology methods for drug mechanism of action identificatio...
Novel network pharmacology methods for drug mechanism of action identificatio...Novel network pharmacology methods for drug mechanism of action identificatio...
Novel network pharmacology methods for drug mechanism of action identificatio...laserxiong
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p collegeSKUASTKashmir
 
Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Denis C. Bauer
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to BioinformaticsDenis C. Bauer
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experimentsHelena Deus
 
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality?  - William HsiaoHow Can We Make Genomic Epidemiology a Widespread Reality?  - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality? - William HsiaoWilliam Hsiao
 
working_example_poster
working_example_posterworking_example_poster
working_example_posterHuikun Zhang
 
NetBioSIG2013-Talk Thomas Kelder
NetBioSIG2013-Talk Thomas KelderNetBioSIG2013-Talk Thomas Kelder
NetBioSIG2013-Talk Thomas KelderAlexander Pico
 
Personalized medicine via molecular interrogation, data mining and systems bi...
Personalized medicine via molecular interrogation, data mining and systems bi...Personalized medicine via molecular interrogation, data mining and systems bi...
Personalized medicine via molecular interrogation, data mining and systems bi...Gerald Lushington
 
Computer for Biological Research
Computer for Biological ResearchComputer for Biological Research
Computer for Biological ResearchChakard Chalayut
 
NetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-vizNetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-vizAlexander Pico
 
Connecting Metabolomic Data with Context
Connecting Metabolomic Data with ContextConnecting Metabolomic Data with Context
Connecting Metabolomic Data with ContextDmitry Grapov
 

What's hot (20)

Complex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningComplex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine Learning
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic Solutions
 
Guttenberg Resume
Guttenberg ResumeGuttenberg Resume
Guttenberg Resume
 
Novel network pharmacology methods for drug mechanism of action identificatio...
Novel network pharmacology methods for drug mechanism of action identificatio...Novel network pharmacology methods for drug mechanism of action identificatio...
Novel network pharmacology methods for drug mechanism of action identificatio...
 
David
DavidDavid
David
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p college
 
Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1
 
MoM2010: Bioinformatics
MoM2010: BioinformaticsMoM2010: Bioinformatics
MoM2010: Bioinformatics
 
Brief introduction to Bioinformatics
Brief introduction to BioinformaticsBrief introduction to Bioinformatics
Brief introduction to Bioinformatics
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experiments
 
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality?  - William HsiaoHow Can We Make Genomic Epidemiology a Widespread Reality?  - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
 
working_example_poster
working_example_posterworking_example_poster
working_example_poster
 
Kishor Presentation
Kishor PresentationKishor Presentation
Kishor Presentation
 
NetBioSIG2013-Talk Thomas Kelder
NetBioSIG2013-Talk Thomas KelderNetBioSIG2013-Talk Thomas Kelder
NetBioSIG2013-Talk Thomas Kelder
 
Personalized medicine via molecular interrogation, data mining and systems bi...
Personalized medicine via molecular interrogation, data mining and systems bi...Personalized medicine via molecular interrogation, data mining and systems bi...
Personalized medicine via molecular interrogation, data mining and systems bi...
 
Computer for Biological Research
Computer for Biological ResearchComputer for Biological Research
Computer for Biological Research
 
NetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-vizNetBioSIG2012 anyatsalenko-en-viz
NetBioSIG2012 anyatsalenko-en-viz
 
Connecting Metabolomic Data with Context
Connecting Metabolomic Data with ContextConnecting Metabolomic Data with Context
Connecting Metabolomic Data with Context
 

Viewers also liked

Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Enrico Glaab
 
X-omics Data Integration Challenges
X-omics Data Integration ChallengesX-omics Data Integration Challenges
X-omics Data Integration ChallengesCOST action BM1006
 
Knowledge management for integrative omics data analysis
Knowledge management for integrative omics data analysisKnowledge management for integrative omics data analysis
Knowledge management for integrative omics data analysisCOST action BM1006
 

Viewers also liked (6)

Sangeeta Reddy Presentation - Healthcare data
Sangeeta Reddy Presentation - Healthcare dataSangeeta Reddy Presentation - Healthcare data
Sangeeta Reddy Presentation - Healthcare data
 
bioinformatics enabling knowledge generation from agricultural omics data
bioinformatics enabling knowledge generation from agricultural omics databioinformatics enabling knowledge generation from agricultural omics data
bioinformatics enabling knowledge generation from agricultural omics data
 
integration_Aug2015
integration_Aug2015integration_Aug2015
integration_Aug2015
 
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
 
X-omics Data Integration Challenges
X-omics Data Integration ChallengesX-omics Data Integration Challenges
X-omics Data Integration Challenges
 
Knowledge management for integrative omics data analysis
Knowledge management for integrative omics data analysisKnowledge management for integrative omics data analysis
Knowledge management for integrative omics data analysis
 

Similar to Exploiting technical replicate variance analysis

EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...Enrico Glaab
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshopGenomeInABottle
 
CSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning ProjectCSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning Projectbutest
 
Multivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic DataMultivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic DataUC Davis
 
How to analyse large data sets
How to analyse large data setsHow to analyse large data sets
How to analyse large data setsimprovemed
 
Bioinformatics in dermato-oncology
Bioinformatics in dermato-oncologyBioinformatics in dermato-oncology
Bioinformatics in dermato-oncologyJoaquin Dopazo
 
Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei LinChien-Wei Lin
 
Comparative study of artificial neural network based classification for liver...
Comparative study of artificial neural network based classification for liver...Comparative study of artificial neural network based classification for liver...
Comparative study of artificial neural network based classification for liver...Alexander Decker
 
Forum on Personalized Medicine: Challenges for the next decade
Forum on Personalized Medicine: Challenges for the next decadeForum on Personalized Medicine: Challenges for the next decade
Forum on Personalized Medicine: Challenges for the next decadeJoaquin Dopazo
 
American Society for Mass Spectrometry Conference 2013
American Society for Mass Spectrometry Conference 2013American Society for Mass Spectrometry Conference 2013
American Society for Mass Spectrometry Conference 2013Dmitry Grapov
 
MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...
MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...
MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...Syed Ahmad Chan Bukhari, PhD
 
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Golden Helix Inc
 
Challenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical researchChallenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical researchFranciscoJAzuajeG
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Dmitry Grapov
 
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET Journal
 
INBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria LópezINBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria LópezINBIOMEDvision
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923GenomeInABottle
 

Similar to Exploiting technical replicate variance analysis (20)

EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
CSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning ProjectCSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning Project
 
Multivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic DataMultivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic Data
 
How to analyse large data sets
How to analyse large data setsHow to analyse large data sets
How to analyse large data sets
 
Bioinformatics in dermato-oncology
Bioinformatics in dermato-oncologyBioinformatics in dermato-oncology
Bioinformatics in dermato-oncology
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei Lin
 
Comparative study of artificial neural network based classification for liver...
Comparative study of artificial neural network based classification for liver...Comparative study of artificial neural network based classification for liver...
Comparative study of artificial neural network based classification for liver...
 
Forum on Personalized Medicine: Challenges for the next decade
Forum on Personalized Medicine: Challenges for the next decadeForum on Personalized Medicine: Challenges for the next decade
Forum on Personalized Medicine: Challenges for the next decade
 
American Society for Mass Spectrometry Conference 2013
American Society for Mass Spectrometry Conference 2013American Society for Mass Spectrometry Conference 2013
American Society for Mass Spectrometry Conference 2013
 
MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...
MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...
MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...
 
Madhavi
MadhaviMadhavi
Madhavi
 
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
 
Challenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical researchChallenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical research
 
NTU-2019
NTU-2019NTU-2019
NTU-2019
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
 
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
 
INBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria LópezINBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria López
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
 

Recently uploaded

Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantadityabhardwaj282
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptx
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptxBREEDING FOR RESISTANCE TO BIOTIC STRESS.pptx
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptxPABOLU TEJASREE
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxEran Akiva Sinbar
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 

Recently uploaded (20)

Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are important
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx

Exploiting technical replicate variance analysis

  • 1. Enrico Glaab, Luxembourg Centre for Systems Biomedicine. Exploiting technical replicate variance in omics data analysis (RepExplore)
  • 2. Introduction: Variance and replicates
      • Biomedical omics datasets may contain different types of variance.
        Variance across samples from different individuals:
          - Between-group variance
          - Within-group variance
        Variance across samples from the same individual:
          - Intra-individual variance
          - Technical variance
      • Between- and within-group variance are covered by biological replicates, intra-individual variance by time-series measurements, and technical variance by technical replicates.
  • 3. Motivation: Loss of technical variance information
      • Typically, combinations of biological replicates and summarized technical replicates are used for case-control analyses.
        [Figure: the technical replicates of each sample are mean/median-summarized before the statistical analysis]
      • Summarizing technical replicates into average measurements can provide more robust input data, but the information on the variance across technical replicates is lost!
  • 4. Example: Relevance of technical variance information
      [Figure: abundance of the top differential metabolite (L-valine) in the Arabidopsis data by Andersen et al. (2014), (a) without technical error, using mean-summarized technical replicates, and (b) with technical error (see whiskers)]
  • 5. Probability of positive likelihood ratio (PPLR)
      • We assume that omics data on a logarithmic scale are approximately normally distributed.
      • Using mean (μ) and variance (s²) estimates derived from technical replicate measurements, differential expression can be scored with the probability of positive likelihood ratio (PPLR) [formula not preserved in this transcript], where P is the cumulative distribution function of the standard normal distribution.
      • For probe replicates on DNA microarray chips, Liu et al. propose a dedicated parameter estimation approach that incorporates the probe-level measurement error into the variance estimates s² (Bioinformatics, 2006).
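The scoring idea on this slide can be sketched in a few lines. The slide's formula is not available here, so this is an illustration under stated assumptions rather than the RepExplore implementation: it assumes the common form PPLR = P((μ1 − μ2) / sqrt(s1² + s2²)), it takes s² to be the squared standard error of the replicate mean, and the function names `normal_cdf`, `mean_and_sem2` and `pplr` as well as the measurements are hypothetical.

```python
import math
from statistics import mean, variance


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean_and_sem2(replicates):
    """Return the mean and the squared standard error of the mean.

    Assumption: s^2 is taken as sample variance / number of replicates.
    """
    return mean(replicates), variance(replicates) / len(replicates)


def pplr(case, control):
    """PPLR-style score for log-scale replicate measurements of one biomolecule.

    Values near 1 suggest higher abundance in `case`, values near 0 suggest
    lower abundance, and 0.5 means no evidence in either direction.
    """
    mu1, s1 = mean_and_sem2(case)
    mu2, s2 = mean_and_sem2(control)
    denom = math.sqrt(s1 + s2)
    if denom == 0.0:
        # Degenerate case: replicates show no variance at all.
        if mu1 == mu2:
            return 0.5
        return 1.0 if mu1 > mu2 else 0.0
    return normal_cdf((mu1 - mu2) / denom)


# Hypothetical log-scale technical replicates for one metabolite.
case = [8.1, 8.3, 8.2]
control = [7.0, 7.4, 7.1]
score = pplr(case, control)
print(f"PPLR-style score: {score:.4f}")
```

Unlike a score based only on summarized means, this one shrinks towards 0.5 when the replicates of either condition disagree strongly, which is the information that mean/median summarization discards.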
  • 6. Comparison: Classical vs. PPLR top-ranked result
      [Figure: whisker plots of the top differential metabolite in the Arabidopsis data by Andersen et al. (2014): (a) classical approach (L-valine), (b) PPLR (L-proline)]
  • 7. Application to Parkinson's disease transcriptome data
      [Figure: (a) whisker plot of the top differential gene (NDUFB2) in the Parkinson's disease dataset by Simunovic et al. (2009); (b) heat map of the top 10 differentially expressed genes according to the PPLR statistic]
  • 8. Improved robustness of rankings across studies
      • Evaluation: compare the similarity of gene rankings across two Parkinson's disease datasets (Simunovic et al., 2009, and Zheng et al., 2010).
      • Two gene ranking methods are compared: PPLR and limma/eBayes (note: the duplicateCorrelation function is not generally applicable).
      • Kendall's tau is used as the similarity measure for rankings.
      → Significantly higher similarity of rankings with the PPLR statistic (p < 2.2E-16 for both up- and down-regulated genes)
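As a minimal sketch of the similarity measure named on this slide, the following computes Kendall's tau (the tau-a variant, without tie correction) between two rankings of the same genes. The gene identifiers and ranks are hypothetical, and the slide does not say which tau variant or software was used.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as dicts gene -> rank.

    Only genes present in both rankings are compared; tied pairs count as
    neither concordant nor discordant (no tie correction).
    """
    genes = sorted(set(rank_a) & set(rank_b))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    concordant = discordant = 0
    for g, h in combinations(genes, 2):
        product = (rank_a[g] - rank_a[h]) * (rank_b[g] - rank_b[h])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(genes) * (len(genes) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings of four genes in two studies.
study_1 = {"G1": 1, "G2": 2, "G3": 3, "G4": 4}
study_2 = {"G1": 1, "G2": 3, "G3": 2, "G4": 4}
tau = kendall_tau(study_1, study_2)
print(f"Kendall's tau: {tau:.3f}")
```

This pairwise loop is quadratic in the number of genes, so for genome-scale rankings an optimized routine such as `scipy.stats.kendalltau` is the more practical choice.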
  • 9. Rankings improve with increasing numbers of replicates
      • Apply the method to simulated normal data with standard deviation 1, 100 samples (two groups, 50 per group) and 1000 features/biomolecules (900 uncorrelated, irrelevant features and 100 differential features with a fixed effect size of D = 1).
      • Add simulated technical replicates with measurement noise (R function jitter).
      • Test the enrichment of the 100 differential features in the final PPLR ranking.
      → The enrichment score increases with increasing numbers of replicates
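The simulation setup can be sketched as follows. This is a simplified illustration, not the original R analysis, and several choices are assumptions: uniform noise of amplitude `noise` stands in for R's jitter (the slide gives no amplitude), the ranking uses a simplified PPLR-style z-score built from per-sample replicate means, and the count of true differential features among the top 100 stands in for the slide's enrichment score. The function names are hypothetical.

```python
import math
import random


def average(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = average(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def simulate_hits(n_replicates, noise=5.0, n_per_group=50, n_features=1000,
                  n_differential=100, effect=1.0, seed=0):
    """Count true differential features among the top-ranked features.

    Features 0..n_differential-1 are shifted by `effect` in the case group;
    each sample is measured `n_replicates` times with uniform noise.
    """
    rng = random.Random(seed)
    z_scores = []
    for feature in range(n_features):
        shift = effect if feature < n_differential else 0.0
        group_stats = []
        for group_shift in (shift, 0.0):  # case group, then control group
            sample_means = []
            for _ in range(n_per_group):
                true_value = rng.gauss(group_shift, 1.0)
                replicates = [true_value + rng.uniform(-noise, noise)
                              for _ in range(n_replicates)]
                sample_means.append(average(replicates))
            # Group mean and squared standard error of that mean.
            group_stats.append((average(sample_means),
                                sample_variance(sample_means) / n_per_group))
        (mu1, s1), (mu2, s2) = group_stats
        # PPLR = Phi(z) is monotone in z, so ranking by z gives the same order.
        z_scores.append((mu1 - mu2) / math.sqrt(s1 + s2))
    top = sorted(range(n_features), key=lambda f: z_scores[f],
                 reverse=True)[:n_differential]
    return sum(1 for f in top if f < n_differential)


hits_by_replicates = {r: simulate_hits(r) for r in (1, 8)}
for r, hits in hits_by_replicates.items():
    print(f"{r} replicate(s): {hits} of 100 differential features in the top 100")
```

The large default noise amplitude is deliberate: with weak measurement noise nearly all differential features are recovered even from a single replicate, and the effect of adding replicates is hard to see.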
  • 10. Using replicate variance information in PCA
      • Principal Component Analysis (PCA) can be cast as a probabilistic model (Tipping & Bishop, 1999) in which d-dimensional data points y_n are reconstructed from a q-dimensional latent point x_n via a linear transformation (W) and a noise vector ε_n [equation not preserved in this transcript].
      • The corresponding data distribution is: [equation not preserved in this transcript]
      • If each dimension is allowed to have a different noise variance: [equation not preserved in this transcript]
      • While the maximum likelihood solution for the original model is equivalent to PCA, Sanguinetti et al. presented an expectation maximization (EM) method for the model with feature-specific noise variance (Bioinformatics, 2005).
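For orientation, the probabilistic PCA model of Tipping & Bishop (1999) can be written as below, assuming centred data so that the mean term drops out. The third line is a generic form of the feature-specific noise model described on the slide; the exact parameterization on the slide and in Sanguinetti et al. may differ, so treat it as an assumption.

```latex
\begin{align}
  % Generative model: latent point x_n, linear map W, isotropic noise
  y_n &= W x_n + \varepsilon_n,
      \qquad x_n \sim \mathcal{N}(0, I_q),
      \quad \varepsilon_n \sim \mathcal{N}(0, \sigma^2 I_d) \\
  % Marginal data distribution after integrating out x_n
  y_n &\sim \mathcal{N}\!\left(0,\; W W^\top + \sigma^2 I_d\right) \\
  % Assumed generic form with a separate noise variance per dimension
  y_n &\sim \mathcal{N}\!\left(0,\; W W^\top
      + \operatorname{diag}(\sigma_1^2, \dots, \sigma_d^2)\right)
\end{align}
```

Here W is a d × q matrix, and replacing the single variance σ² with one variance per dimension is what allows noisy features to be down-weighted.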
  • 11. Denoising property & results on Parkinson's disease data
      • In the new probabilistic PCA, the noise estimate makes it possible to evaluate the significance of principal components (PCs) → automated computation of the maximum number of retainable PCs
      • By accounting for measurement error in the model, noisy values are down-weighted → pre-processing data via the modified PCA tends to provide tighter sample clusters (confirmed on Parkinson's disease transcriptome data)
  • 12. Summary
      • Technical replicate variance in omics data is usually not constant across different biomolecules → summarizing replicates only by averaging loses valuable information
      • On Parkinson's disease transcriptomics data, accounting for technical replicate variance improved the robustness of differential expression rankings across studies and gave tighter clusters in PCA visualizations.
      • All methods presented have been implemented on a public web server (RepExplore: www.repexplore.tk, Bioinformatics, 2015), which provides automated analyses with interactive visualizations and ranking tables.
  • 13. References
      1. E. Glaab, R. Schneider. RepExplore: Addressing technical replicate variance in proteomics and metabolomics data analysis. Bioinformatics, 31(13), pp. 2235, 2015
      2. E. Glaab. Using prior knowledge from cellular pathways and molecular networks for diagnostic specimen classification. Briefings in Bioinformatics, in press (doi: 10.1093/bib/bbv044), 2015
      3. E. Glaab, R. Schneider. Comparative pathway and network analysis of brain transcriptome changes during adult aging and in Parkinson's disease. Neurobiology of Disease, 74:1-13, 2015
      4. N. Vlassis, E. Glaab. GenePEN: analysis of network activity alterations in complex diseases via the pairwise elastic net. Statistical Applications in Genetics and Molecular Biology, 14(2), pp. 221, 2015
      5. E. Glaab. Building a virtual ligand screening pipeline using free software: a survey. Briefings in Bioinformatics, in press (doi: 10.1093/bib/bbv037), 2015
      6. E. Glaab, A. Baudot, N. Krasnogor, R. Schneider, A. Valencia. EnrichNet: network-based gene set enrichment analysis. Bioinformatics, 28(18):i451-i457, 2012
      7. E. Glaab, R. Schneider. PathVar: analysis of gene and protein expression variance in cellular pathways using microarray data. Bioinformatics, 28(3):446-447, 2012
      8. E. Glaab, J. Bacardit, J. M. Garibaldi, N. Krasnogor. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data. PLoS ONE, 7(7):e39932, 2012
      9. E. Glaab, A. Baudot, N. Krasnogor, A. Valencia. TopoGSA: network topological gene set analysis. Bioinformatics, 26(9):1271-1272, 2010
      10. E. Glaab, A. Baudot, N. Krasnogor, A. Valencia. Extending pathways and processes using molecular interaction networks to analyse cancer genome data. BMC Bioinformatics, 11(1):597, 2010
      11. H. O. Habashy, D. G. Powe, E. Glaab, N. Krasnogor, J. M. Garibaldi, E. A. Rakha, G. Ball, A. R. Green, C. Caldas, I. O. Ellis. RERG (Ras-related and oestrogen-regulated growth-inhibitor) expression in breast cancer: A marker of ER-positive luminal-like subtype. Breast Cancer Research and Treatment, 128(2):315-326, 2011
      12. E. Glaab, J. M. Garibaldi, N. Krasnogor. ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus methods with cross-study normalization. BMC Bioinformatics, 10:358, 2009
      13. E. Glaab, J. M. Garibaldi, N. Krasnogor. Learning pathway-based decision rules to classify microarray cancer samples. German Conference on Bioinformatics 2010, Lecture Notes in Informatics (LNI), 173, 123-134
      14. E. Glaab, J. M. Garibaldi, N. Krasnogor. VRMLGen: An R-package for 3D Data Visualization on the Web. Journal of Statistical Software, 36(8):1-18, 2010