Sandhya Prabhakaran - A Bayesian Approach To Model Overlapping Objects Available As Distance Data

A Bayesian Approach to Model Overlapping
Objects Available as Distance Data
Sandhya Prabhakaran (1) and Julia E. Vogt (2,3)
(1) Memorial Sloan Kettering Cancer Centre, NYC
(2) University of Basel
(3) Swiss Institute of Bioinformatics
MLconf, NYC
29th March 2019

Two religions in Machine Learning
Frequentists Bayesians
(https://medium.com/datadriveninvestor/bayesian-vs-frequentist-for-dummies-58ce230c3796)

● A coin toss example: 10 heads in 10 tosses (= data given)
● Frequentists:
○ Probability is a point estimate
○ What is the relative frequency of tails = no answer

● Bayesians:
○ Probability is a distribution
○ What is the relative frequency of tails = 0.5
○ A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.
○ More flexible: inference, thinking, planning and reasoning (downstream analyses)
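
The coin toss example can be made concrete with a small sketch. It assumes a Beta(a, b) prior on the probability of tails, which is an illustrative choice and not something stated in the talk; the function names are hypothetical. The point estimate uses only the observed counts, while the Bayesian answer blends the counts with the prior, so a symmetric prior (mean 0.5 before any data) pulls the estimate away from zero.

```python
def mle_tails(heads, tails):
    """Frequentist point estimate: observed relative frequency of tails."""
    return tails / (heads + tails)


def posterior_mean_tails(heads, tails, a, b):
    """Posterior mean of P(tails) under an assumed Beta(a, b) prior.

    The Beta prior is conjugate to the coin-toss likelihood, so the
    posterior is Beta(a + tails, b + heads).
    """
    return (a + tails) / (a + b + heads + tails)


heads, tails = 10, 0  # the data from the slide: 10 heads in 10 tosses

print(mle_tails(heads, tails))                      # 0.0
print(posterior_mean_tails(heads, tails, 1, 1))     # 1/12, about 0.083 (weak prior)
print(posterior_mean_tails(heads, tails, 50, 50))   # 50/110, about 0.455 (strong prior)
```

The stronger the symmetric prior, the closer the posterior mean stays to 0.5 after seeing only ten tosses.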

Bayesian: Clustering of vectorial objects

(Figure: clustering algorithm)

Bayesian: Clustering of non-vectorial objects
(Image courtesy: shutterstock)

Mostly available as pairwise distance data

POCD: Probabilistic model for Overlap Clustering of Distance data

POCD: Overlap Clustering for distance data

● Bayesian clustering model
● Given the pairwise distance matrix D, we infer Z (the cluster assignment matrix)

Z:
● Binary matrix
● Cluster assignment matrix
● Needs to be inferred
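
As an illustration of what an overlapping assignment looks like, here is a small hand-written Z. The values are hypothetical and chosen only to show the structure: rows are objects, columns are clusters, and a row may contain more than one 1.

```python
import numpy as np

# Hypothetical assignment: 4 objects (rows), 3 clusters (columns).
# Object 1 belongs to clusters 0 and 1, which is the "overlap".
Z = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
])

# Number of clusters each object belongs to.
clusters_per_object = Z.sum(axis=1)

# Entry (i, j) counts the clusters that objects i and j share.
shared = Z @ Z.T

print(clusters_per_object)
print(shared)
```

In a partition-style clustering every row of Z would sum to exactly one; allowing larger row sums is what makes the model an overlap clustering.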

● Given the pairwise distance matrix D, we infer Z:
p(Z|D,.) ∝ p(D|Z) p(Z)
(posterior) ∝ (likelihood) × (prior)
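
For reference, the proportionality hides a normalising constant. Written out, with the additional parameters denoted by the dot suppressed on the right-hand side and Z' ranging over all candidate binary assignment matrices, Bayes' rule reads:

```latex
p(Z \mid D, \cdot) \;=\; \frac{p(D \mid Z)\, p(Z)}{\sum_{Z'} p(D \mid Z')\, p(Z')}
```

For N objects and a fixed number K of clusters, the sum already runs over 2^{NK} matrices, so the normaliser cannot be computed by enumeration for realistic N.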

Prior over Z: Indian Buffet process
● As the number of clusters k → ∞, we arrive at the IBP
● No need to fix the number of clusters
p(Z|D,.) ∝ p(D|Z) p(Z)
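
A minimal sketch of drawing Z from the IBP, using the standard "restaurant" construction: object i joins each existing cluster with probability proportional to its current size and then opens a Poisson-distributed number of new clusters. The function name and the parameter value alpha = 2.0 are illustrative assumptions, not taken from the talk.

```python
import numpy as np


def sample_ibp(num_objects, alpha, rng):
    """Draw a binary matrix Z (objects x clusters) from the IBP prior."""
    Z = np.zeros((num_objects, 0), dtype=int)
    for i in range(1, num_objects + 1):
        # m[k] = number of earlier objects already assigned to cluster k.
        m = Z[: i - 1].sum(axis=0)
        # Join existing cluster k with probability m[k] / i.
        Z[i - 1, :] = rng.random(Z.shape[1]) < m / i
        # Open Poisson(alpha / i) brand-new clusters for object i.
        k_new = rng.poisson(alpha / i)
        if k_new > 0:
            new_cols = np.zeros((num_objects, k_new), dtype=int)
            new_cols[i - 1, :] = 1
            Z = np.hstack([Z, new_cols])
    return Z


rng = np.random.default_rng(0)
Z = sample_ibp(num_objects=10, alpha=2.0, rng=rng)
print(Z.shape)  # the number of columns (clusters) is random, not fixed in advance
print(Z)
```

The number of columns differs from draw to draw, which is how the prior avoids fixing the number of clusters.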

Invariant Likelihood: generalised Wishart
● Translation and rotation invariant
p(Z|D,.) ∝ p(D|Z) p(Z)
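
The reason invariance matters can be checked numerically: pairwise distances do not change when the underlying points are rotated or shifted, so D carries no information about position or orientation. The sketch below only demonstrates this property of distance data; it does not implement the generalised Wishart likelihood, and the points, angle and shift are arbitrary illustrative values.

```python
import numpy as np


def squared_distances(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))  # 5 hypothetical objects in 2-D

theta = 0.7  # arbitrary rotation angle in radians
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
shift = np.array([3.0, -1.5])  # arbitrary translation

X_moved = X @ R.T + shift  # rotate, then translate every point

D = squared_distances(X)
D_moved = squared_distances(X_moved)
print(np.allclose(D, D_moved))  # True
```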

Inference using Metropolis Hastings
● MCMC algorithm
● Used in models deploying the IBP
● Asymptotically exact approximations of the posterior
● We need to infer Z and #clusters
p(Z|D,.) ∝ p(D|Z) p(Z)
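
To show the mechanics of Metropolis-Hastings on binary variables, here is a toy sketch. It is not the sampler used for POCD: the target is a stand-in unnormalised posterior over a fixed-length binary vector (independent entries with assumed logits), whereas the real model must also change the number of clusters. All names and values are hypothetical.

```python
import numpy as np


def log_target(z, logits):
    """Toy unnormalised log-posterior over a binary vector z (a stand-in)."""
    return float(np.dot(z, logits))


def metropolis_hastings(logits, num_steps, rng):
    """Single-bit-flip Metropolis-Hastings over binary vectors."""
    z = np.zeros(len(logits), dtype=int)
    samples = np.empty((num_steps, len(logits)), dtype=int)
    for t in range(num_steps):
        # Symmetric proposal: flip one randomly chosen entry.
        proposal = z.copy()
        j = rng.integers(len(logits))
        proposal[j] = 1 - proposal[j]
        # Accept with probability min(1, target(proposal) / target(z)).
        log_accept = log_target(proposal, logits) - log_target(z, logits)
        if np.log(rng.random()) < log_accept:
            z = proposal
        samples[t] = z
    return samples


rng = np.random.default_rng(0)
logits = np.array([2.0, 0.0, -2.0])
samples = metropolis_hastings(logits, num_steps=20000, rng=rng)

# Discard early samples as burn-in, then estimate P(z_j = 1).
estimate = samples[2000:].mean(axis=0)
exact = 1.0 / (1.0 + np.exp(-logits))
print(estimate)
print(exact)
```

Only ratios of the target appear in the acceptance step, so the normalising constant of the posterior is never needed.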

Clustering protein contact maps from HIV Protease inhibitors (PIs)
● Of the 26 FDA-approved anti-HIV drugs:
○ 10 are PIs
● The PIs exhibit similar behaviour
○ Similar chemical structure
● Not readily available
https://www.sciencedirect.com/science/article/pii/S0165614711001398

● Necessary to identify alternative PIs for therapy
○ What are the structural dissimilarities amongst PIs?
● Use Protein Contact Maps of each PI
○ Distances between all amino acid (AA) residue pairs of a protein
○ Vectorise the contact map row-wise
○ Compute the Normalised Information Distance between the vectorised maps
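
The pipeline above can be sketched as follows. The Normalised Information Distance is defined via Kolmogorov complexity and is not computable exactly; a common practical stand-in is the Normalised Compression Distance (NCD), which replaces complexity with compressed length. Whether the authors used this approximation is an assumption here, as are the zlib compressor, the function names, and the random binary maps that stand in for real contact maps.

```python
import zlib

import numpy as np


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(map_a, map_b) -> float:
    """Normalised Compression Distance between two contact maps.

    Each map is vectorised row-wise (C order) before compression.
    """
    a = np.asarray(map_a, dtype=np.uint8).ravel(order="C").tobytes()
    b = np.asarray(map_b, dtype=np.uint8).ravel(order="C").tobytes()
    c_a, c_b, c_ab = compressed_size(a), compressed_size(b), compressed_size(a + b)
    return (c_ab - min(c_a, c_b)) / max(c_a, c_b)


# Hypothetical stand-ins for contact maps: random binary 20 x 20 matrices.
rng = np.random.default_rng(0)
map_a = rng.random((20, 20)) < 0.3
map_b = rng.random((20, 20)) < 0.3

d_same = ncd(map_a, map_a)
d_diff = ncd(map_a, map_b)
print(d_same, d_diff)  # a map is closer to itself than to an unrelated map
```

Evaluating `ncd` on every pair of maps would yield a pairwise distance matrix of the kind the model takes as input D.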

Contact Maps of the Protease Inhibitors

Reading material
● A tutorial on Bayesian nonparametric models:
http://gershmanlab.webfactional.com/pubs/GershmanBlei12.pdf
● Leo Breiman: ‘Statistical Modeling: The Two Cultures’:
https://projecteuclid.org/download/pdf_1/euclid.ss/1009213726
● An abstract of this work, presented as a Spotlight at the Bayesian Nonparametrics Workshop at NeurIPS 2018:
https://drive.google.com/file/d/1ExVpeUomv8Z4mPMu5as_CbmrHjVY0IDV/view
● Tutorials on the latest deep learning papers: https://www.depthfirstlearning.com/ (@DepthFirstLearn)

@sandhya212
Thank you
