Introduction to Gaussian Processes Gaussian Process Latent Variable Models Applications in single-cell genomics References
Gaussian Process Latent Variable Models &
applications in single-cell genomics
Kieran Campbell
University of Oxford
November 19, 2015
Introduction to Gaussian Processes
Gaussian Process Latent Variable Models
Applications in single-cell genomics
References
Introduction
In (Bayesian) supervised learning, some (non-)linear function
f(x; w), parametrized by w, is assumed to generate the data {x_n, y_n}.
f may take any parametric form, e.g. linear: f(x) = w_0 + w_1 x
Posterior inference can be performed on
p(w | y, X) = \frac{p(y | w, X) \, p(w)}{p(y | X)}    (1)
Predictions for a new point {y_*, x_*} can be made by
marginalising over w:
p(y_* | y, X, x_*) = \int p(y_* | w, X, x_*) \, p(w | y, X) \, dw    (2)
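As a rough illustration of posterior inference and prediction for the linear example f(x) = w_0 + w_1 x, the sketch below assumes a Gaussian prior w ~ N(0, I/alpha) and Gaussian noise with known precision beta. Neither assumption is stated on the slide, and the names `posterior`, `predict`, `alpha` and `beta` are illustrative. Under these assumptions both the posterior and the predictive distribution are available in closed form.

```python
import numpy as np

def posterior(X, y, alpha=1.0, beta=25.0):
    """Posterior N(m, S) over w for y = Phi w + noise, with prior w ~ N(0, I/alpha)."""
    Phi = np.column_stack([np.ones_like(X), X])       # design matrix [1, x]
    S_inv = alpha * np.eye(2) + beta * Phi.T @ Phi     # posterior precision
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ y                           # posterior mean
    return m, S

def predict(x_star, m, S, beta=25.0):
    """Predictive mean and variance of y* at x*, with w marginalised out."""
    phi = np.array([1.0, x_star])
    return phi @ m, 1.0 / beta + phi @ S @ phi

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=50)
y = 0.5 + 2.0 * X + rng.normal(scale=0.2, size=50)    # simulated data, true w = (0.5, 2.0)
m, S = posterior(X, y)
mean_star, var_star = predict(0.3, m, S)
```

The predictive variance is the noise variance 1/beta plus a term reflecting the remaining uncertainty in w.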
Gaussian Process Regression
Gaussian Processes place a non-parametric prior over the functions f(x)
f is always indexed by an ‘input variable’ x
Any finite set of function values {f_i}_{i=1}^N, with f_i = f(x_i), is jointly drawn from a
multivariate Gaussian distribution with zero mean and
covariance matrix K:
p(f_1, \ldots, f_N) = \mathcal{N}(0, K)    (3)
In other words, the process is entirely defined by its second-order statistics K
Choice of Kernel
Behaviour of the GP is defined by the choice of kernel & its parameters
The kernel function K(x, x') becomes the covariance matrix once a set
of points is ‘realised’
A typical choice is the squared exponential (RBF) kernel
K(x, x') = \exp(-\lambda \|x - x'\|^2)    (4)
Intuition: if x and x' are similar, the covariance will be larger and
so f(x) and f(x') will - on average - be closer together
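To make the link between kernel, covariance matrix and random functions concrete, here is a minimal sketch that evaluates the exponentiated squared-distance kernel on a grid of 1-D inputs and draws functions from the zero-mean GP prior. The helper name `sq_exp_kernel`, the grid, the value of lambda and the jitter term are all illustrative choices rather than anything taken from the slides.

```python
import numpy as np

def sq_exp_kernel(x1, x2, lam=1.0):
    """K(x, x') = exp(-lam * (x - x')^2) for 1-D inputs."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-lam * diff**2)

x = np.linspace(0.0, 5.0, 100)
K = sq_exp_kernel(x, x, lam=2.0)        # kernel 'realised' on a set of points

# A small diagonal jitter keeps the Cholesky factorisation numerically stable.
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))

rng = np.random.default_rng(1)
f_samples = L @ rng.standard_normal((len(x), 3))   # three draws from N(0, K)
```

Nearby inputs have covariance close to 1, so sampled function values at neighbouring grid points stay close together; increasing `lam` makes the draws wigglier.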
GPs with noisy observations
So far we have assumed observations of f are noise free - the GP becomes
a function interpolator
Instead, suppose observations y(x) are corrupted by noise, so
y \sim \mathcal{N}(f(x), \sigma^2)
Because everything is Gaussian, we can marginalise over the (latent)
functions f and find
(y_1, \ldots, y_N) \sim \mathcal{N}(0, K + \sigma^2 I)    (5)
Predictions with noisy observations
To make predictions with GPs we only need the covariance between the ‘old’
inputs X and the ‘new’ input x_*:
Let k_* = K(X, x_*) and k_{**} = K(x_*, x_*)
Then, writing K_y = K + \sigma^2 I for the covariance of the noisy observations in (5),
p(f_* | x_*, X, y) = \mathcal{N}(f_* | k_*^T K_y^{-1} y, \; k_{**} - k_*^T K_y^{-1} k_*)    (6)
This highlights the major disadvantage of GPs - to make
predictions we need to invert an n × n matrix - O(n^3)
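The predictive distribution can be sketched in a few lines of NumPy. The version below assumes a squared-exponential kernel with fixed lambda and a known noise standard deviation sigma (both illustrative values), and uses a Cholesky factorisation instead of an explicit inverse, which is the usual numerically safer route but still costs O(n^3). The function names are hypothetical.

```python
import numpy as np

def sq_exp_kernel(x1, x2, lam=1.0):
    diff = x1[:, None] - x2[None, :]
    return np.exp(-lam * diff**2)

def gp_predict(X, y, X_star, lam=1.0, sigma=0.1):
    """Posterior mean and variance of f* at each test input."""
    K_y = sq_exp_kernel(X, X, lam) + sigma**2 * np.eye(len(X))
    k_star = sq_exp_kernel(X, X_star, lam)                 # shape (n, m)
    k_ss = np.ones(len(X_star))                            # K(x*, x*) = 1 for this kernel
    L = np.linalg.cholesky(K_y)                            # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K_y^{-1} y
    v = np.linalg.solve(L, k_star)
    mean = k_star.T @ alpha
    var = k_ss - np.sum(v**2, axis=0)
    return mean, var

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 5, size=30))
y = np.sin(X) + rng.normal(scale=0.1, size=30)             # simulated noisy observations
X_star = np.array([2.5, 10.0])                             # one interior input, one far away
mean, var = gp_predict(X, y, X_star)
```

Far from the training inputs the prediction reverts to the prior: the mean goes to zero and the variance to k_{**}.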
Effect of RBF kernel parameters
Kernel
\kappa(x_p, x_q) = \sigma_f^2 \exp\left(-\frac{1}{2l^2}(x_p - x_q)^2\right) + \sigma_y^2 \delta_{pq}
Parameters
l controls the horizontal length scale
σ_f controls the vertical scale
σ_y is the noise standard deviation (σ_y^2 is the noise variance)
In the figure, (l, σ_f, σ_y) have values
(a) (1, 1, 0.1)
(b) (0.3, 1.08, 0.00005)
(c) (3.0, 1.16, 0.89)
Figure: Rasmussen and Williams 2006
Dimensionality reduction & unsupervised learning
Dimensionality reduction
Want to reduce some observed data Y ∈ R^{N×D} to a set of latent
variables X ∈ R^{N×Q} where Q ≪ D.
Methods
Linear: PCA, ICA
Non-linear: Laplacian eigenmaps, MDS, etc.
Probabilistic: PPCA, GPLVM
Probabilistic PCA (Tipping and Bishop, 1999)
Recall Y is the observed data matrix and X the latent matrix. Then assume
y_n = W x_n + \eta_n
where
W is the linear map from latent space to data space
η_n is Gaussian noise with mean 0 and precision β
Then marginalise out X (with prior x_n ~ N(0, I)) to find
p(y_n | W, \beta) = \mathcal{N}(y_n | 0, W W^T + \beta^{-1} I)
Analytic solution when W spans the principal subspace - probabilistic
PCA.
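The analytic solution can be sketched as follows, using the closed-form maximum-likelihood estimates from Tipping and Bishop (1999): the columns of W are the leading eigenvectors of the sample covariance scaled by sqrt(eigenvalue - sigma^2), and the noise variance sigma^2 = 1/beta is the mean of the discarded eigenvalues. The function name `ppca_ml` and the simulated data are illustrative only.

```python
import numpy as np

def ppca_ml(Y, q):
    """Closed-form maximum-likelihood PPCA fit (up to a rotation of W)."""
    Yc = Y - Y.mean(axis=0)                          # centre the data
    S = Yc.T @ Yc / len(Y)                           # D x D sample covariance
    eigval, eigvec = np.linalg.eigh(S)               # eigh returns ascending order
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # switch to descending
    sigma2 = eigval[q:].mean()                       # noise variance, i.e. 1 / beta
    W = eigvec[:, :q] * np.sqrt(eigval[:q] - sigma2)
    return W, sigma2

rng = np.random.default_rng(3)
W_true = rng.normal(size=(5, 2))
X = rng.normal(size=(2000, 2))                       # latent x_n ~ N(0, I)
Y = X @ W_true.T + rng.normal(scale=0.1, size=(2000, 5))
W, sigma2 = ppca_ml(Y, q=2)
C_model = W @ W.T + sigma2 * np.eye(5)               # W W^T + beta^{-1} I
```

The fitted model covariance `C_model` should be close to the sample covariance of Y, which is one quick sanity check on the fit.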
GPLVM (Lawrence 2005)
Alternative representation (dual probabilistic PCA)
Instead of marginalising the latent factors X, marginalise the mapping W. Let
p(W) = \prod_i \mathcal{N}(w_i | 0, I); then
p(y_{:,d} | X, \beta) = \mathcal{N}(y_{:,d} | 0, X X^T + \beta^{-1} I)
GPLVM
Lawrence’s breakthrough was to realise that the covariance matrix
K = X X^T + \beta^{-1} I
can be replaced by any similarity (kernel) matrix S, as in the GP
framework.
The GP-LVM defines a mapping from the latent space to the observed space.
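A minimal sketch of the resulting objective: the GPLVM log-likelihood is a product of independent zero-mean Gaussians over the D data columns, all sharing one N x N covariance built from the latent positions X. Swapping the linear (dual PPCA) covariance for a non-linear kernel only changes how K is built. The function names and the fixed values of beta and lambda are illustrative, and the gradient-based optimisation of X that a real fit would need is not shown.

```python
import numpy as np

def gplvm_log_lik(Y, K):
    """log p(Y | X) = sum over columns d of log N(y_{:,d} | 0, K)."""
    N, D = Y.shape
    L = np.linalg.cholesky(K)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    A = np.linalg.solve(L, Y)                   # sum(A**2) equals tr(Y^T K^{-1} Y)
    return -0.5 * (D * N * np.log(2 * np.pi) + D * log_det + np.sum(A**2))

def linear_kernel(X, beta):
    return X @ X.T + np.eye(len(X)) / beta      # dual PPCA covariance

def rbf_kernel(X, beta, lam=1.0):
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    return np.exp(-lam * sq) + np.eye(len(X)) / beta

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 2))                    # latent positions (N x Q)
Y = rng.normal(size=(20, 5))                    # placeholder observed data (N x D)
ll_linear = gplvm_log_lik(Y, linear_kernel(X, beta=10.0))
ll_rbf = gplvm_log_lik(Y, rbf_kernel(X, beta=10.0))
```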
GPLVM example - oil flow data
Figure: PCA (left) and GPLVM (right) on multi-phase oil flow data
(Lawrence 2006)
GPLVM shows better separation between oil flow classes (marker shape) compared
to PCA
GPLVM gives uncertainty in the data space. Since this is shared across all
features, it can be visualised in the latent space (pixel intensity)
If we want true uncertainty in the latent space, we need a Bayesian approach to find
p(latent|data)
Bayesian GPLVM
Ideally we want to know the uncertainty in the latent factors
p(latent|data). Approaches to inference:
Metropolis-Hastings - requires lots of tweaking but
‘guaranteed’ for any model
HMC with Stan - fast, requires less tweaking but less support
for arbitrary priors
Variational inference [1]
[1] Titsias, M., & Lawrence, N. (2010). Bayesian Gaussian Process Latent
Variable Model. International Conference on Artificial Intelligence and Statistics, 9, 844-851.
Buettner 2012
Introduce ‘structure preserving’ GPLVM for clustering of single-cell qPCR from
zygote to blastocyst development
Includes a ‘prior’ that preserves local structure by modifying the likelihood
(previously studied [2])
Find that the modified GPLVM gives better separation between different
developmental stages
[2] van der Maaten, L. (2009). Preserving Local Structure in Gaussian
Process Latent Variable Models
Buettner 2015
Use (GP?)-LVM to construct low-rank cell-to-cell covariance based on expression of
a specific gene pathway
Model
y_g \sim \mathcal{N}(\mu_g, X X^T + \sigma_\nu^2 C C^T + \nu_g^2 I)
where
X hidden factor such as cell cycle
C observed covariate
Can then assess gene-gene correlation controlling for hidden factors
Figure: Non-linear PCA of genes not annotated as cell-cycle. Left: before scLVM, right: after.
Bayesian Gaussian Process Latent Variable Models for
pseudotime inference
Pseudotime: artificial measure of a cell's progression through some
process (differentiation, apoptosis, cell cycle)
Cell ordering problem: order high-dimensional transcriptomes
through the process
Current approaches
Monocle
Uses ICA for dimensionality reduction, then the longest path through the minimum spanning tree to assign pseudotime
Uses cubic smoothing splines & a likelihood ratio test for differential expression analysis
Standard analysis is to examine differential expression across
pseudotime
Questions: What is the uncertainty in pseudotime? How does this
impact the false discovery rate of differential expression analysis?
Bayesian GPLVM for pseudotime inference
1. Reduce dimensionality of gene expression data (LE, t-SNE,
PCA or all at once!)
2. Fit Bayesian GPLVM in reduced space (this is essentially a
probabilistic curve)
3. Quantify posterior samples, uncertainty etc
Model
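\gamma \sim \mathrm{Gamma}(\gamma_\alpha, \gamma_\beta)
\lambda_j \sim \mathrm{Exp}(\gamma)
\sigma_j \sim \mathrm{InvGamma}(\alpha, \beta)
t_i \sim \pi_t, \quad i = 1, \ldots, N
\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_P^2)
K^{(j)}(t, t') = \exp(-\lambda_j (t - t')^2)
\mu_j \sim \mathrm{GP}(0, K^{(j)}), \quad j = 1, \ldots, P
x_i \sim \mathcal{N}(\mu(t_i), \Sigma), \quad i = 1, \ldots, N    (7)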
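One way to get a feel for this model is to simulate from it. The sketch below draws hyperparameters, pseudotimes and a noisy curve in a P-dimensional reduced space. The numerical hyperparameter values and the choice of a uniform pi_t are assumptions made for illustration only. Gamma and exponential distributions are taken in their rate parametrisation, whereas NumPy expects scales, hence the reciprocals.

```python
import numpy as np

rng = np.random.default_rng(5)
N, P = 50, 2                                          # cells, reduced dimensions
g_alpha, g_beta = 2.0, 1.0                            # illustrative values for gamma_alpha, gamma_beta
alpha, beta = 3.0, 1.0                                # illustrative InvGamma parameters

gamma = rng.gamma(shape=g_alpha, scale=1.0 / g_beta)  # gamma ~ Gamma(g_alpha, g_beta)
lam = rng.exponential(scale=1.0 / gamma, size=P)      # lambda_j ~ Exp(gamma)
sigma = 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=P)  # sigma_j ~ InvGamma

t = rng.uniform(0.0, 1.0, size=N)                     # assumption: pi_t uniform on [0, 1]
diff2 = (t[:, None] - t[None, :])**2

X = np.empty((N, P))
for j in range(P):
    K_j = np.exp(-lam[j] * diff2) + 1e-6 * np.eye(N)  # jitter for numerical stability
    mu_j = np.linalg.cholesky(K_j) @ rng.standard_normal(N)  # mu_j ~ GP(0, K^(j)) at t
    X[:, j] = mu_j + sigma[j] * rng.standard_normal(N)       # x_i ~ N(mu(t_i), Sigma)
```

Plotting the columns of `X` against each other, coloured by `t`, shows the 'probabilistic curve' that inference then tries to recover from `X` alone.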
Prior issues
How do we define the prior on t, π_t?
Typically want t = (t_1, ..., t_N) to sit uniformly on [0, 1]
t only appears in the likelihood via λ_j (t − t')^2
This means we can arbitrarily rescale λ → λ/ε and t → √ε t (for any ε > 0) and get the
same likelihood
t is equivalent on any subset of [0, 1] - ill-defined problem
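The identifiability problem is easy to check numerically. The sketch below takes the rescaling to be lambda -> lambda / eps together with t -> sqrt(eps) * t for some eps > 0 (the name eps is an assumption about the symbol, chosen so that lambda * (t - t')^2 is unchanged), and confirms that the kernel matrix, and hence the likelihood, is identical.

```python
import numpy as np

def kernel(t, lam):
    return np.exp(-lam * (t[:, None] - t[None, :])**2)

rng = np.random.default_rng(6)
t = rng.uniform(0.0, 1.0, size=10)
lam, eps = 3.0, 0.25

K_original = kernel(t, lam)
K_rescaled = kernel(np.sqrt(eps) * t, lam / eps)   # pseudotimes squeezed into [0, 0.5]
same = np.allclose(K_original, K_rescaled)
```

Because the two kernel matrices agree, the data alone cannot distinguish pseudotimes spread over [0, 1] from pseudotimes squeezed into a sub-interval, so the prior has to do that work.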
Solutions
Corp prior
Want t to ‘fill out’ over [0, 1]
Introduce a repulsive prior
\pi_t(t) \propto \prod_{i=1}^{N} \prod_{j=i+1}^{N} \sin(\pi |t_i - t_j|)    (8)
Non-conjugate & difficult to evaluate the gradient - need Metropolis-Hastings
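A minimal sketch of the unnormalised log-density of this repulsive prior, written as a sum of log sin(pi |t_i - t_j|) over all pairs i < j. The function name `log_corp_prior` and the two test configurations are illustrative.

```python
import numpy as np

def log_corp_prior(t):
    """Unnormalised log pi_t(t): sum over pairs i < j of log sin(pi * |t_i - t_j|)."""
    i, j = np.triu_indices(len(t), k=1)
    return np.sum(np.log(np.sin(np.pi * np.abs(t[i] - t[j]))))

spread = np.linspace(0.05, 0.95, 10)     # pseudotimes filling out [0, 1]
clumped = np.linspace(0.45, 0.55, 10)    # pseudotimes bunched together
lp_spread = log_corp_prior(spread)
lp_clumped = log_corp_prior(clumped)
```

The spread-out configuration receives the higher log-density, and the density falls to zero as any two pseudotimes coincide, which is the repulsion that stops the pseudotimes collapsing onto a sub-interval.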
Constrained random walk inference
If we constrain t to be on [0, 1] and use random walk sampling
(MH, HMC), pseudotimes naturally ‘wander’ towards the boundary
Once there, covariance structure settles them into a steady state
Applications to biological datasets
Applied Bayesian GPLVM to three datasets:
1. Monocle Differentiating human myoblasts (time series) - 155
cells once contamination removed
2. Ear Differentiating cells from mouse cochlear & utricular
sensory epithelia. Pseudotime shows supporting cells (SC)
differentiating into hair cells (HC)
3. Waterfall Adult neurogenesis (PCA captures pseudotime
variation)
Sampling posterior curves
A Monocle dataset, Laplacian eigenmaps representation
B Ear dataset, Laplacian eigenmaps representation
C Waterfall dataset, PCA representation
What does the posterior uncertainty look like? (I)
95% HPD credible interval typically spans ∼ 1/4 of pseudotime
What does the posterior uncertainty look like? (II)
Effect of hyperparameters (Monocle dataset)
Recall
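K(t, t') \propto \exp(-\lambda_j (t - t')^2)
\lambda_j \sim \mathrm{Exp}(\gamma)
\gamma \sim \mathrm{Gamma}(\gamma_\alpha, \gamma_\beta)
|λ| roughly corresponds to arc-length. So what are the effects of
changing γ_α, γ_β?
E[\gamma] = \frac{\gamma_\alpha}{\gamma_\beta}, \quad \mathrm{Var}[\gamma] = \frac{\gamma_\alpha}{\gamma_\beta^2}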
Approximate false discovery rate
How to approximate false discovery rate?
Refit differential expression for each gene across posterior samples of pseudotime
Compute p- and q-values for each sample for each gene
Statistic is the proportion significant at 5% FDR
Differential gene expression is a false positive if proportion significant < 0.95 and q-value < 0.05
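The procedure can be sketched as below. Two things are assumed rather than taken from the slides: q-values are computed with the Benjamini-Hochberg adjustment, and the 'q-value < 0.05' condition refers to a fit on a single point estimate of pseudotime. The p-values are hand-made toy numbers, not results, and the function names are hypothetical.

```python
import numpy as np

def bh_qvalues(p):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D array."""
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q

def approx_fdr(p_point, p_samples, level=0.05, robust=0.95):
    """p_point: (genes,) p-values from a point-estimate pseudotime fit.
    p_samples: (samples, genes) p-values refitted on each posterior sample."""
    q_point = bh_qvalues(p_point)
    q_samples = np.array([bh_qvalues(row) for row in p_samples])
    prop_sig = np.mean(q_samples < level, axis=0)        # proportion significant
    called = q_point < level                             # genes called by the point fit
    false_pos = called & (prop_sig < robust)             # called, but not robustly
    return false_pos.sum() / max(called.sum(), 1)

# Toy example: gene 2 is significant in only half of the posterior samples.
p_point = np.array([0.001, 0.002, 0.003, 0.8])
p_samples = np.array([[0.001, 0.002, 0.003, 0.8],
                      [0.001, 0.002, 0.9, 0.8],
                      [0.001, 0.002, 0.003, 0.8],
                      [0.001, 0.002, 0.9, 0.8]])
fdr = approx_fdr(p_point, p_samples)
```

In the toy example three genes are called by the point fit and one of them is not robust to pseudotime uncertainty, giving an approximate FDR of one third.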
Approximate false discovery rates
Approximate false discovery rate can be very high (∼ 3× larger
than it should be) but is also variable
Integrating multiple dimensionality reduction algorithms
Can very easily integrate multiple sources of data from different
dimensionality reduction algorithms:
p(t, \{X\}) \propto \pi_t(t) \, p(X_{LE} | t) \, p(X_{PCA} | t) \, p(X_{tSNE} | t)    (9)
Natural extension to integrate multiple heterogeneous sources of
data, e.g.
p(t, \{X\}) \propto \pi_t(t) \, p(\text{imaging} | t) \, p(\text{ATAC} | t) \, p(\text{transcriptomics} | t)
(10)
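On the log scale the factorisation becomes a sum: one GP log-likelihood per representation plus the log-prior on t. The sketch below assumes fixed kernel and noise hyperparameters shared by all representations, a flat placeholder prior, and random placeholder data; all of these are simplifications for illustration, and the function names are hypothetical.

```python
import numpy as np

def gp_log_lik(X_rep, t, lam=1.0, sigma=0.1):
    """log p(X_rep | t): an independent GP over t for each column of X_rep."""
    N, P = X_rep.shape
    K = np.exp(-lam * (t[:, None] - t[None, :])**2) + sigma**2 * np.eye(N)
    L = np.linalg.cholesky(K)
    A = np.linalg.solve(L, X_rep)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (P * N * np.log(2 * np.pi) + P * log_det + np.sum(A**2))

def joint_log_post(t, reps, log_prior):
    """Unnormalised log p(t, {X}) = log pi_t(t) + sum over representations."""
    return log_prior(t) + sum(gp_log_lik(X_rep, t) for X_rep in reps.values())

rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0.0, 1.0, size=15))
reps = {name: rng.normal(size=(15, 2)) for name in ("LE", "PCA", "tSNE")}

def flat_prior(t):
    return 0.0                                   # placeholder for log pi_t(t)

lp = joint_log_post(t, reps, flat_prior)
```

Adding a further data source only adds one more term to the sum, provided a likelihood p(data | t) can be written down for it.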
Example: Monocle with LE, PCA & t-SNE
Learning curves for each representation separately:
Joint learning of all representations:
FDR from multiple representation learning
Some good references (I)
Gaussian Processes
Rasmussen, Carl Edward, and Christopher K. I. Williams. "Gaussian processes for machine learning." MIT Press (2006).
GPLVM
Lawrence, Neil D. "Gaussian process latent variable models for visualisation of high
dimensional data." Advances in neural information processing systems 16.3 (2004):
329-336.
Titsias, Michalis K., and Neil D. Lawrence. "Bayesian Gaussian process latent variable
model." International Conference on Artificial Intelligence and Statistics. 2010.
van der Maaten, Laurens. "Preserving local structure in Gaussian process latent variable
models." Proceedings of the 18th Annual Belgian-Dutch Conference on Machine Learning.
2009.
Wang, Ye, and David B. Dunson. "Probabilistic Curve Learning: Coulomb Repulsion and
the Electrostatic Gaussian Process." arXiv preprint arXiv:1506.03768 (2015).
Some good references (II)
Latent variable models in single-cell genomics
Buettner, Florian, and Fabian J. Theis. "A novel approach for resolving differences in
single-cell gene expression patterns from zygote to blastocyst." Bioinformatics 28.18
(2012): i626-i632.
Buettner, Florian, et al. "Computational analysis of cell-to-cell heterogeneity in single-cell
RNA-sequencing data reveals hidden subpopulations of cells." Nature biotechnology 33.2
(2015): 155-160.
Pseudotime
Trapnell, Cole, et al. "The dynamics and regulators of cell fate decisions are revealed by
pseudotemporal ordering of single cells." Nature biotechnology 32.4 (2014): 381-386.
Bendall, Sean C., et al. "Single-cell trajectory detection uncovers progression and regulatory
coordination in human B cell development." Cell 157.3 (2014): 714-725.
Marco, Eugenio, et al. "Bifurcation analysis of single-cell gene expression data reveals
epigenetic landscape." Proceedings of the National Academy of Sciences 111.52 (2014):
E5643-E5650.
Shin, Jaehoon, et al. "Single-Cell RNA-Seq with Waterfall Reveals Molecular Cascades
underlying Adult Neurogenesis." Cell stem cell 17.3 (2015): 360-372.
Leng, Ning, et al. "Oscope identifies oscillatory genes in unsynchronized single-cell
RNA-seq experiments." Nature methods 12.10 (2015): 947-950.
Kieran Campbell University of Oxford
Gaussian Process Latent Variable Models & applications in single-cell genomics

More Related Content

What's hot

Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)neeraj7svp
 
Neural Networks: Support Vector machines
Neural Networks: Support Vector machinesNeural Networks: Support Vector machines
Neural Networks: Support Vector machinesMostafa G. M. Mostafa
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You NeedDaiki Tanaka
 
Time-series forecasting of indoor temperature using pre-trained Deep Neural N...
Time-series forecasting of indoor temperature using pre-trained Deep Neural N...Time-series forecasting of indoor temperature using pre-trained Deep Neural N...
Time-series forecasting of indoor temperature using pre-trained Deep Neural N...Francisco Zamora-Martinez
 
Master Defense Slides (translated)
Master Defense Slides (translated)Master Defense Slides (translated)
Master Defense Slides (translated)Francis Piéraut
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Dmytro Mishkin
 
Data clustering using kernel based
Data clustering using kernel basedData clustering using kernel based
Data clustering using kernel basedIJITCA Journal
 
Co-clustering of multi-view datasets: a parallelizable approach
Co-clustering of multi-view datasets: a parallelizable approachCo-clustering of multi-view datasets: a parallelizable approach
Co-clustering of multi-view datasets: a parallelizable approachAllen Wu
 
Joint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clustersJoint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clustersUniversitat Politècnica de Catalunya
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringAllen Wu
 
Clustering on database systems rkm
Clustering on database systems rkmClustering on database systems rkm
Clustering on database systems rkmVahid Mirjalili
 
Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksSang Jun Lee
 
A NOVEL ANT COLONY ALGORITHM FOR MULTICAST ROUTING IN WIRELESS AD HOC NETWORKS
A NOVEL ANT COLONY ALGORITHM FOR MULTICAST ROUTING IN WIRELESS AD HOC NETWORKS A NOVEL ANT COLONY ALGORITHM FOR MULTICAST ROUTING IN WIRELESS AD HOC NETWORKS
A NOVEL ANT COLONY ALGORITHM FOR MULTICAST ROUTING IN WIRELESS AD HOC NETWORKS cscpconf
 

What's hot (20)

Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
 
Neural Networks: Support Vector machines
Neural Networks: Support Vector machinesNeural Networks: Support Vector machines
Neural Networks: Support Vector machines
 
Csc446: Pattern Recognition
Csc446: Pattern Recognition Csc446: Pattern Recognition
Csc446: Pattern Recognition
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
MUMS Opening Workshop - Materials Innovation Driven by Data and Knowledge Sys...
MUMS Opening Workshop - Materials Innovation Driven by Data and Knowledge Sys...MUMS Opening Workshop - Materials Innovation Driven by Data and Knowledge Sys...
MUMS Opening Workshop - Materials Innovation Driven by Data and Knowledge Sys...
 
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 
Time-series forecasting of indoor temperature using pre-trained Deep Neural N...
Time-series forecasting of indoor temperature using pre-trained Deep Neural N...Time-series forecasting of indoor temperature using pre-trained Deep Neural N...
Time-series forecasting of indoor temperature using pre-trained Deep Neural N...
 
Master Defense Slides (translated)
Master Defense Slides (translated)Master Defense Slides (translated)
Master Defense Slides (translated)
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...
 
Data clustering using kernel based
Data clustering using kernel basedData clustering using kernel based
Data clustering using kernel based
 
Co-clustering of multi-view datasets: a parallelizable approach
Co-clustering of multi-view datasets: a parallelizable approachCo-clustering of multi-view datasets: a parallelizable approach
Co-clustering of multi-view datasets: a parallelizable approach
 
Multidimensional RNN
Multidimensional RNNMultidimensional RNN
Multidimensional RNN
 
Joint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clustersJoint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clusters
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clustering
 
Clustering on database systems rkm
Clustering on database systems rkmClustering on database systems rkm
Clustering on database systems rkm
 
CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)
 
Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural Networks
 
A NOVEL ANT COLONY ALGORITHM FOR MULTICAST ROUTING IN WIRELESS AD HOC NETWORKS
A NOVEL ANT COLONY ALGORITHM FOR MULTICAST ROUTING IN WIRELESS AD HOC NETWORKS A NOVEL ANT COLONY ALGORITHM FOR MULTICAST ROUTING IN WIRELESS AD HOC NETWORKS
A NOVEL ANT COLONY ALGORITHM FOR MULTICAST ROUTING IN WIRELESS AD HOC NETWORKS
 

Similar to Introduction to Gaussian Processes for Dimensionality Reduction and Single-Cell Genomics

Parellelism in spectral methods
Parellelism in spectral methodsParellelism in spectral methods
Parellelism in spectral methodsRamona Corman
 
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20Yuta Kashino
 
Second Order Heuristics in ACGP
Second Order Heuristics in ACGPSecond Order Heuristics in ACGP
Second Order Heuristics in ACGPhauschildm
 
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...Amro Elfeki
 
Streaming Model Transformations by Complex Event Processing
Streaming Model Transformations by Complex Event ProcessingStreaming Model Transformations by Complex Event Processing
Streaming Model Transformations by Complex Event ProcessingIstván Dávid
 
Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...
Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...
Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...Ilya Loshchilov
 
Samplying in Factored Dynamic Systems_Fadel.pdf
Samplying in Factored Dynamic Systems_Fadel.pdfSamplying in Factored Dynamic Systems_Fadel.pdf
Samplying in Factored Dynamic Systems_Fadel.pdfFadel Adoe
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)James McMurray
 
15.sp.dictionary_draft.pdf
15.sp.dictionary_draft.pdf15.sp.dictionary_draft.pdf
15.sp.dictionary_draft.pdfAllanKelvinSales
 
Performance Evaluation of Space Fractional FitzHugh-Nagumo Model: an Implemen...
Performance Evaluation of Space Fractional FitzHugh-Nagumo Model: an Implemen...Performance Evaluation of Space Fractional FitzHugh-Nagumo Model: an Implemen...
Performance Evaluation of Space Fractional FitzHugh-Nagumo Model: an Implemen...Ural-PDC
 
20080620 Formal systems/synthetic biology modelling re-engineered
20080620 Formal systems/synthetic biology modelling re-engineered20080620 Formal systems/synthetic biology modelling re-engineered
20080620 Formal systems/synthetic biology modelling re-engineeredJonathan Blakes
 
proposal_pura
proposal_puraproposal_pura
proposal_puraErick Lin
 
Imecs2012 pp440 445
Imecs2012 pp440 445Imecs2012 pp440 445
Imecs2012 pp440 445Rasha Orban
 
Sen, Z. (1974) Propiedades de muestras pequeñas de modelos estocásticos estac...
Sen, Z. (1974) Propiedades de muestras pequeñas de modelos estocásticos estac...Sen, Z. (1974) Propiedades de muestras pequeñas de modelos estocásticos estac...
Sen, Z. (1974) Propiedades de muestras pequeñas de modelos estocásticos estac...SandroSnchezZamora
 
Risk Classification with an Adaptive Naive Bayes Kernel Machine Model
Risk Classification with an Adaptive Naive Bayes Kernel Machine ModelRisk Classification with an Adaptive Naive Bayes Kernel Machine Model
Risk Classification with an Adaptive Naive Bayes Kernel Machine ModelJessica Minnier
 
Segway and the Graphical Models Toolkit: a framework for probabilistic genomi...
Segway and the Graphical Models Toolkit: a framework for probabilistic genomi...Segway and the Graphical Models Toolkit: a framework for probabilistic genomi...
Segway and the Graphical Models Toolkit: a framework for probabilistic genomi...Michael Hoffman
 

Introduction to Gaussian Processes for Dimensionality Reduction and Single-Cell Genomics

  • 1. Gaussian Process Latent Variable Models & applications in single-cell genomics. Kieran Campbell, University of Oxford. November 19, 2015.
  • 2. Introduction to Gaussian Processes; Gaussian Process Latent Variable Models; Applications in single-cell genomics; References.
  • 3. Introduction. In (Bayesian) supervised learning, some (non-)linear function $f(x; w)$ parametrized by $w$ is assumed to generate data $\{x_n, y_n\}$. $f$ may take any parametric form, e.g. linear $f(x) = w_0 + w_1 x$. Posterior inference can be performed on $p(w \mid y, X) = \frac{p(y \mid w, X)\, p(w)}{p(y \mid X)}$ (1). Predictions of a new point $\{y_*, x_*\}$ can be made by marginalising over $w$: $p(y_* \mid y, X, x_*) = \int dw\, p(y_* \mid w, X, x_*)\, p(w \mid y, X)$ (2).
  • 4. Gaussian Process Regression. Gaussian Processes place a non-parametric prior over the functions $f(x)$. $f$ is always indexed by the 'input variable' $x$. Any finite set of function values $\{f_i\}_{i=1}^N$ is jointly drawn from a multivariate Gaussian distribution with zero mean and covariance matrix $K$: $p(f_1, \ldots, f_N) = \mathcal{N}(0, K)$ (3). In other words, the process is entirely defined by its second-order statistics $K$.
  • 5. Choice of Kernel. The behaviour of the GP is defined by the choice of kernel and its parameters. The kernel function $K(x, x')$ becomes a covariance matrix once a set of points is 'realised'. A typical choice is the squared exponential (RBF) kernel $K(x, x') = \exp(-\lambda \lVert x - x' \rVert^2)$ (4). The intuition is that if $x$ and $x'$ are similar, the covariance will be larger and so $f$ and $f'$ will, on average, be closer together.
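The kernel in (4) and its role as a covariance matrix can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the talk: the function names `sq_exp_kernel` and `sample_gp_prior`, the default `lam`, and the small diagonal `jitter` added for numerical stability are all assumptions made here.

```python
import numpy as np

def _as_2d(x):
    """Treat a 1-D array as N points in one dimension."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x

def sq_exp_kernel(x1, x2, lam=1.0):
    """K(x, x') = exp(-lam * ||x - x'||^2) for every pair of rows."""
    x1, x2 = _as_2d(x1), _as_2d(x2)
    sq_dist = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-lam * sq_dist)

def sample_gp_prior(x, lam=1.0, n_samples=3, jitter=1e-8, seed=0):
    """Draw function values f ~ N(0, K) at the inputs x (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(_as_2d(x))
    # Jitter keeps the Cholesky factorisation stable; it is not part of (4).
    K = sq_exp_kernel(x, x, lam) + jitter * np.eye(n)
    L = np.linalg.cholesky(K)
    return (L @ rng.standard_normal((n, n_samples))).T
```

Larger `lam` makes the covariance fall off faster with distance, so sampled functions vary more rapidly.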
  • 6. GPs with noisy observations. So far we have assumed that observations of $f$ are noise free, in which case the GP becomes a function interpolator. Instead, observations $y(x)$ are corrupted by noise, so $y \sim \mathcal{N}(f(x), \sigma^2)$. Because everything is Gaussian, we can marginalise over the (latent) functions $f$ and find $p(y_1, \ldots, y_N) = \mathcal{N}(0, K + \sigma^2 I)$ (5).
  • 7. Predictions with noisy observations. To make predictions with GPs we only need the covariance between the 'old' inputs $X$ and the 'new' input $x_*$. Let $k_* = K(X, x_*)$ and $k_{**} = K(x_*, x_*)$. Then $p(f_* \mid x_*, X, y) = \mathcal{N}(f_* \mid k_*^T K^{-1} y,\; k_{**} - k_*^T K^{-1} k_*)$ (6), where for noisy observations $K$ stands for the noisy covariance $K + \sigma^2 I$ of (5). This highlights the major disadvantage of GPs: to make predictions we need to invert an $n \times n$ matrix, which costs $O(n^3)$.
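The predictive distribution can be sketched as follows, assuming one-dimensional inputs, the squared exponential kernel of (4), and the noisy covariance $K + \sigma^2 I$ of (5) in place of $K$. The names `rbf`, `gp_predict` and `noise_var` are illustrative choices, not part of the slides. A Cholesky factorisation replaces the explicit inverse; it is still $O(n^3)$ but numerically better behaved.

```python
import numpy as np

def rbf(a, b, lam=1.0):
    """Squared exponential kernel between two sets of 1-D inputs."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    return np.exp(-lam * (a - b.T) ** 2)

def gp_predict(X, y, x_star, lam=1.0, noise_var=0.1):
    """Posterior mean and covariance of f_* at the new inputs x_star."""
    y = np.asarray(y, dtype=float)
    K = rbf(X, X, lam) + noise_var * np.eye(len(y))  # K + sigma^2 I
    k_star = rbf(X, x_star, lam)                     # shape (N, M)
    k_ss = rbf(x_star, x_star, lam)                  # shape (M, M)
    L = np.linalg.cholesky(K)                        # the O(n^3) step
    # alpha = K^{-1} y via two solves with the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = k_star.T @ alpha                          # k_*^T K^{-1} y
    v = np.linalg.solve(L, k_star)
    cov = k_ss - v.T @ v                             # k_** - k_*^T K^{-1} k_*
    return mean, cov
```

Far from the training inputs the mean returns to the zero prior mean and the variance returns to the prior variance of 1.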
  • 8. Effect of RBF kernel parameters. Kernel: $\kappa(x_p, x_q) = \sigma_f^2 \exp\left(-\frac{1}{2l^2}(x_p - x_q)^2\right) + \sigma_y^2 \delta_{qp}$. Parameters: $l$ controls the horizontal length scale; $\sigma_f$ controls the vertical length scale; $\sigma_y$ sets the noise level (noise variance $\sigma_y^2$). In the figure, $(l, \sigma_f, \sigma_y)$ have values (a) (1, 1, 0.1), (b) (0.3, 1.08, 0.00005), (c) (3.0, 1.16, 0.89). Figure: Rasmussen and Williams 2006.
  • 9. Dimensionality reduction & unsupervised learning. Dimensionality reduction: we want to reduce some observed data $Y \in \mathbb{R}^{N \times D}$ to a set of latent variables $X \in \mathbb{R}^{N \times Q}$, where $Q \ll D$. Methods: Linear: PCA, ICA. Non-linear: Laplacian eigenmaps, MDS, etc. Probabilistic: PPCA, GPLVM.
  • 10. Probabilistic PCA (Tipping and Bishop, 1999). Recall that $Y$ is the observed data matrix and $X$ the latent matrix. Then assume $y_n = W x_n + \eta_n$, where $W$ is the linear relationship between latent space and data space and $\eta_n$ is Gaussian noise with mean 0 and precision $\beta$. Then marginalise out $X$ to find $p(y_n \mid W, \beta) = \mathcal{N}(y_n \mid 0, W W^T + \beta^{-1} I)$. There is an analytic solution when $W$ spans the principal subspace: probabilistic PCA.
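The analytic solution can be sketched with the closed-form maximum-likelihood estimates from Tipping and Bishop (1999): the noise variance $\beta^{-1}$ is the mean of the discarded eigenvalues of the sample covariance, and $W$ is built from the leading eigenvectors, up to an arbitrary rotation that is taken to be the identity here. The function name `ppca_ml` and the centring of the data are choices made for this sketch.

```python
import numpy as np

def ppca_ml(Y, q):
    """Closed-form ML estimates (W, sigma2 = 1/beta) for probabilistic PCA.

    Y is an (N, D) data matrix and q < D the latent dimensionality.
    """
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)                 # the model assumes zero-mean data
    S = Yc.T @ Yc / Y.shape[0]              # sample covariance, (D, D)
    evals, evecs = np.linalg.eigh(S)        # ascending eigenvalues
    order = np.argsort(evals)[::-1]         # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()               # average discarded variance
    scale = np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    W = evecs[:, :q] * scale                # columns span the principal subspace
    return W, sigma2
```

With these estimates the model covariance $W W^T + \beta^{-1} I$ matches the sample covariance exactly along the first `q` principal directions and averages the variance over the remaining ones.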
  • 11. GPLVM (Lawrence 2005). Alternative representation (dual probabilistic PCA): instead of marginalising the latent factors $X$, marginalise the mapping $W$. Let $p(W) = \prod_i \mathcal{N}(w_i \mid 0, I)$; then $p(y_{:,d} \mid X, \beta) = \mathcal{N}(y_{:,d} \mid 0, X X^T + \beta^{-1} I)$. GPLVM: Lawrence's breakthrough was to realise that the covariance matrix $K = X X^T + \beta^{-1} I$ can be replaced by any similarity (kernel) matrix $S$, as in the GP framework. The GP-LVM defines a mapping from the latent space to the observed space.
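Because each of the $D$ columns of $Y$ is an independent draw from $\mathcal{N}(0, K)$, the GPLVM objective is a sum of $D$ Gaussian log densities sharing one covariance. The sketch below evaluates it for an arbitrary kernel; `gplvm_log_likelihood`, `linear_kernel` and `rbf_kernel` are names invented for this illustration, and optimising the result over $X$ (which is what fitting a GPLVM involves) is left out.

```python
import numpy as np

def linear_kernel(X):
    """X X^T: recovers dual probabilistic PCA."""
    return X @ X.T

def rbf_kernel(X, lam=1.0):
    """exp(-lam * ||x_i - x_j||^2): a non-linear GPLVM."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-lam * sq)

def gplvm_log_likelihood(X, Y, kernel, beta):
    """log p(Y | X, beta) = sum_d log N(y_{:,d} | 0, kernel(X) + I / beta)."""
    N, D = Y.shape
    K = kernel(X) + np.eye(N) / beta
    L = np.linalg.cholesky(K)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    A = np.linalg.solve(L, Y)               # L^{-1} Y
    quad = np.sum(A ** 2)                   # tr(K^{-1} Y Y^T)
    return -0.5 * (D * N * np.log(2.0 * np.pi) + D * log_det + quad)
```

Passing `linear_kernel` gives the dual PPCA likelihood; swapping in `rbf_kernel` is the only change needed to obtain a non-linear latent variable model.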
  • 12. GPLVM example - oil flow data. Figure: PCA (left) and GPLVM (right) on multi-phase oil flow data (Lawrence 2006). GPLVM shows better separation between oil flow classes (shape) compared to PCA. GPLVM gives uncertainty in the data space. Since this is shared across all features, it can be visualised in the latent space (pixel intensity). If we want true uncertainty in the latent space, we need a Bayesian approach to find p(latent|data).
  • 13. Bayesian GPLVM. Ideally we want to know the uncertainty in the latent factors, p(latent|data). Approaches to inference: Metropolis-Hastings - requires lots of tweaking but 'guaranteed' for any model. HMC with Stan - fast, requires less tweaking but less support for arbitrary priors. Variational inference (footnote 1). Footnote 1: Titsias, M., & Lawrence, N. (2010). Bayesian Gaussian Process Latent Variable Model. Artificial Intelligence, 9, 844-851.
  • 14. Buettner 2012. Introduces a 'structure preserving' GPLVM for clustering of single-cell qPCR data from zygote to blastocyst development. Includes a 'prior' that preserves local structure by modifying the likelihood (previously studied, footnote 2). Finds that the modified GPLVM gives better separation between different developmental stages. Footnote 2: van der Maaten, L. (2009). Preserving Local Structure in Gaussian Process Latent Variable Models.
  • 15. Buettner 2015. Use (GP?)-LVM to construct a low-rank cell-to-cell covariance based on expression of a specific gene pathway. Model: $y_g \sim \mathcal{N}(\mu_g,\; X X^T + \sigma_\nu^2 C C^T + \nu_g^2 I)$, where $X$ is a hidden factor such as cell cycle and $C$ is an observed covariate. Can then assess gene-gene correlation controlling for hidden factors. Figure: non-linear PCA of genes not annotated as cell-cycle. Left: before scLVM, right: after.
  • 16. Bayesian Gaussian Process Latent Variable Models for pseudotime inference. Pseudotime: an artificial measure of a cell's progression through some process (differentiation, apoptosis, cell cycle). Cell ordering problem: order high-dimensional transcriptomes through the process.
  • 17. Current approaches. Monocle: ICA for dimensionality reduction, then the longest path through the minimum spanning tree to assign pseudotime. Uses cubic smoothing splines and a likelihood ratio test for differential expression analysis. The standard analysis is to examine differential expression across pseudotime. Questions: What is the uncertainty in pseudotime? How does this impact the false discovery rate of differential expression analysis?
  • 18. Bayesian GPLVM for pseudotime inference. 1. Reduce the dimensionality of the gene expression data (LE, t-SNE, PCA or all at once!). 2. Fit a Bayesian GPLVM in the reduced space (this is essentially a probabilistic curve). 3. Quantify posterior samples, uncertainty etc.
  • 19. Model (7):
      $\gamma \sim \mathrm{Gamma}(\gamma_\alpha, \gamma_\beta)$
      $\lambda_j \sim \mathrm{Exp}(\gamma)$
      $\sigma_j \sim \mathrm{InvGamma}(\alpha, \beta)$
      $t_i \sim \pi_t, \quad i = 1, \ldots, N$
      $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_P^2)$
      $K^{(j)}(t, t') = \exp(-\lambda_j (t - t')^2)$
      $\mu_j \sim \mathrm{GP}(0, K^{(j)}), \quad j = 1, \ldots, P$
      $x_i \sim \mathcal{N}(\mu(t_i), \Sigma), \quad i = 1, \ldots, N$
  • 20. Prior issues. How do we define the prior on $t$, $\pi_t$? Typically we want $t = (t_1, \ldots, t_n)$ to sit uniformly on $[0, 1]$. $t$ only appears in the likelihood via $\lambda_j (t - t')^2$. This means we can arbitrarily rescale $\lambda \to \lambda / c$ and $t \to \sqrt{c}\, t$ for any constant $c > 0$ and get the same likelihood. $t$ is therefore equivalent on any subset of $[0, 1]$: an ill-defined problem.
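The non-identifiability is easy to check numerically: shrinking the pseudotimes by a factor $\sqrt{c}$ while dividing $\lambda$ by $c$ leaves every entry of the covariance matrix, and hence the likelihood, unchanged. The scaling constant `c` and the helper name `pseudotime_kernel` are notation chosen for this sketch.

```python
import numpy as np

def pseudotime_kernel(t, lam):
    """K(t, t') = exp(-lam * (t - t')^2) for a vector of pseudotimes."""
    t = np.asarray(t, dtype=float)
    return np.exp(-lam * (t[:, None] - t[None, :]) ** 2)

t = np.array([0.1, 0.4, 0.9])
lam, c = 5.0, 0.25

K_original = pseudotime_kernel(t, lam)
# Squeeze the pseudotimes into [0, sqrt(c)] and compensate through lam.
K_rescaled = pseudotime_kernel(np.sqrt(c) * t, lam / c)

# The two matrices agree up to floating-point rounding.
same_covariance = np.allclose(K_original, K_rescaled)
```

Without an informative prior on $t$, the data alone therefore cannot distinguish pseudotimes spread over all of $[0, 1]$ from the same ordering compressed into a sub-interval.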
  • 21. Solutions. Corp prior: we want $t$ to 'fill out' over $[0, 1]$. Introduce the repulsive prior $\pi_t(t) \propto \prod_{i=1}^{N} \prod_{j=i+1}^{N} \sin(\pi |t_i - t_j|)$ (8). It is non-conjugate and its gradient is difficult to evaluate, so Metropolis-Hastings is needed. Constrained random walk inference: if we constrain $t$ to be on $[0, 1]$ and use random walk sampling (MH, HMC), pseudotimes naturally 'wander' towards the boundary. Once there, the covariance structure settles them into a steady state.
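The unnormalised log density of the repulsive prior (8) is a sum over all pairs of pseudotimes, which is what a Metropolis-Hastings acceptance ratio would need. The sketch below only evaluates that quantity; the function name `log_repulsive_prior` is hypothetical and no sampler is included.

```python
import numpy as np

def log_repulsive_prior(t):
    """Unnormalised log of prod_{i<j} sin(pi * |t_i - t_j|) for t in [0, 1]."""
    t = np.asarray(t, dtype=float)
    i, j = np.triu_indices(len(t), k=1)      # all pairs with i < j
    s = np.sin(np.pi * np.abs(t[i] - t[j]))
    # Coincident pseudotimes give sin(0) = 0, i.e. log density -inf.
    with np.errstate(divide="ignore"):
        return float(np.sum(np.log(s)))
```

Configurations with well-separated pseudotimes score higher than clustered ones, and any pair of identical pseudotimes has zero prior density.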
  • 22. Applications to biological datasets. Applied Bayesian GPLVM to three datasets: 1. Monocle: differentiating human myoblasts (time series), 155 cells once contamination removed. 2. Ear: differentiating cells from mouse cochlear and utricular sensory epithelia. Pseudotime shows supporting cells (SC) differentiating into hair cells (HC). 3. Waterfall: adult neurogenesis (PCA captures pseudotime variation).
  • 23. Sampling posterior curves. Figure panels: A, Monocle dataset, Laplacian eigenmaps representation; B, Ear dataset, Laplacian eigenmaps representation; C, Waterfall dataset, PCA representation.
  • 24. What does the posterior uncertainty look like? (I) The 95% HPD credible interval typically spans $\sim 1/4$ of pseudotime.
  • 25. What does the posterior uncertainty look like? (II)
  • 26. Effect of hyperparameters (Monocle dataset). Recall $K(t, t') \propto \exp(-\lambda_j (t - t')^2)$, $\lambda_j \sim \mathrm{Exp}(\gamma)$, $\gamma \sim \mathrm{Gamma}(\gamma_\alpha, \gamma_\beta)$. $|\lambda|$ roughly corresponds to arc-length. So what are the effects of changing $\gamma_\alpha$, $\gamma_\beta$? $E[\gamma] = \gamma_\alpha / \gamma_\beta$, $\mathrm{Var}[\gamma] = \gamma_\alpha / \gamma_\beta^2$.
  • 27. Approximate false discovery rate. How to approximate the false discovery rate? Refit differential expression for each gene across posterior samples of pseudotime. Compute p- and q-values for each sample for each gene. The statistic is the proportion significant at 5% FDR. A differential gene expression call is a false positive if proportion significant < 0.95 and q-value < 0.05.
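The bookkeeping behind this statistic can be sketched as below. Several details are assumptions of the sketch rather than statements from the slides: q-values are computed with the Benjamini-Hochberg adjustment, the "q-value < 0.05" condition is taken to refer to a single point-estimate pseudotime fit, and the approximate FDR is reported as the fraction of point-estimate calls flagged as false positives. The names `bh_qvalues` and `approximate_fdr` are hypothetical.

```python
import numpy as np

def bh_qvalues(p):
    """Benjamini-Hochberg adjusted p-values (assumed q-value definition)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q

def approximate_fdr(p_samples, p_point, alpha=0.05, threshold=0.95):
    """p_samples: (S, G) p-values, one row per posterior pseudotime sample.
    p_point: (G,) p-values from a single point-estimate pseudotime fit.
    """
    q_samples = np.vstack([bh_qvalues(row) for row in np.asarray(p_samples)])
    prop_sig = (q_samples < alpha).mean(axis=0)   # proportion significant
    called = bh_qvalues(p_point) < alpha          # calls from the point fit
    false_pos = called & (prop_sig < threshold)   # rule from the slide
    return false_pos.sum() / max(called.sum(), 1), prop_sig
```

The differential expression test that produces the p-values is deliberately left outside the sketch; any per-gene test refitted on each posterior sample can supply `p_samples`.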
  • 28. Approximate false discovery rates. The approximate false discovery rate can be very high ($\sim 3\times$ larger than it should be) but is also variable.
  • 29. Integrating multiple dimensionality reduction algorithms. Can very easily integrate multiple sources of data from different dimensionality reduction algorithms: $p(t, \{X\}) \propto \pi_t(t)\, p(X_{\mathrm{LE}} \mid t)\, p(X_{\mathrm{PCA}} \mid t)\, p(X_{\mathrm{tSNE}} \mid t)$ (9). Natural extension to integrate multiple heterogeneous sources of data, e.g. $p(t, \{X\}) \propto \pi_t(t)\, p(\mathrm{imaging} \mid t)\, p(\mathrm{ATAC} \mid t)\, p(\mathrm{transcriptomics} \mid t)$ (10).
  • 30. Example: Monocle with LE, PCA & t-SNE. Figures: learning curves for each representation separately; joint learning of all representations.
  • 31. FDR from multiple representation learning
  • 32. Some good references (I)
      Gaussian Processes
      Rasmussen, Carl Edward. "Gaussian processes for machine learning." (2006).
      GPLVM
      Lawrence, Neil D. "Gaussian process latent variable models for visualisation of high dimensional data." Advances in neural information processing systems 16.3 (2004): 329-336.
      Titsias, Michalis K., and Neil D. Lawrence. "Bayesian Gaussian process latent variable model." International Conference on Artificial Intelligence and Statistics. 2010.
      van der Maaten, Laurens. "Preserving local structure in Gaussian process latent variable models." Proceedings of the 18th Annual Belgian-Dutch Conference on Machine Learning. 2009.
      Wang, Ye, and David B. Dunson. "Probabilistic Curve Learning: Coulomb Repulsion and the Electrostatic Gaussian Process." arXiv preprint arXiv:1506.03768 (2015).
  • 33. Some good references (II)
      Latent variable models in single-cell genomics
      Buettner, Florian, and Fabian J. Theis. "A novel approach for resolving differences in single-cell gene expression patterns from zygote to blastocyst." Bioinformatics 28.18 (2012): i626-i632.
      Buettner, Florian, et al. "Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells." Nature biotechnology 33.2 (2015): 155-160.
      Pseudotime
      Trapnell, Cole, et al. "The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells." Nature biotechnology 32.4 (2014): 381-386.
      Bendall, Sean C., et al. "Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development." Cell 157.3 (2014): 714-725.
      Marco, Eugenio, et al. "Bifurcation analysis of single-cell gene expression data reveals epigenetic landscape." Proceedings of the National Academy of Sciences 111.52 (2014): E5643-E5650.
      Shin, Jaehoon, et al. "Single-Cell RNA-Seq with Waterfall Reveals Molecular Cascades underlying Adult Neurogenesis." Cell stem cell 17.3 (2015): 360-372.
      Leng, Ning, et al. "Oscope identifies oscillatory genes in unsynchronized single-cell RNA-seq experiments." Nature methods 12.10 (2015): 947-950.