
Sandhya Prabhakaran - A Bayesian Approach To Model Overlapping Objects Available As Distance Data


Published in: Technology


  1. A Bayesian Approach to Model Overlapping Objects Available as Distance Data
      Sandhya Prabhakaran (1) and Julia E. Vogt (2, 3)
      (1) Memorial Sloan Kettering Cancer Centre, NYC; (2) University of Basel; (3) Swiss Institute of Bioinformatics
      MLconf, NYC, 29th March 2019
  2-3. Two religions in Machine Learning: Frequentists and Bayesians
      (https://medium.com/datadriveninvestor/bayesian-vs-frequentist-for-dummies-58ce230c3796)
  4-6. Two religions in Machine Learning
      ● A coin toss example: 10 heads in 10 tosses (= data given)
      ● Frequentists:
          ○ Probability is a point estimate
          ○ What is the relative frequency of tails? No answer
      ● Bayesians:
          ○ Probability is a distribution
          ○ What is the relative frequency of tails? 0.5
          ○ A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.
          ○ More flexible: inference, thinking, planning and reasoning (downstream analyses)
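The contrast between a point estimate and a distribution in the coin toss example can be sketched with a conjugate Beta-Binomial update. This is only an illustration: the slides do not specify a prior, so both priors below (a uniform Beta(1, 1) and a strong fair-coin Beta(1000, 1000)) and the helper name `posterior_mean_tails` are assumptions. How far the Bayesian answer moves away from a fair-coin expectation of 0.5 depends entirely on that choice.

```python
from fractions import Fraction


def posterior_mean_tails(heads, tails, prior_heads, prior_tails):
    """Posterior mean of P(tails) under a Beta prior on the coin's bias.

    With a Beta(prior_tails, prior_heads) prior on P(tails) and binomial
    data, the posterior is Beta(prior_tails + tails, prior_heads + heads).
    """
    a = prior_tails + tails
    b = prior_heads + heads
    return Fraction(a, a + b)


heads, tails = 10, 0  # the data from the slide: 10 heads in 10 tosses

# Point estimate (maximum likelihood): the observed relative frequency.
mle_tails = Fraction(tails, heads + tails)

# Assumed priors, not taken from the slides.
uniform_prior = posterior_mean_tails(heads, tails, 1, 1)
strong_fair_prior = posterior_mean_tails(heads, tails, 1000, 1000)

print("point estimate:", mle_tails)
print("posterior mean, uniform prior:", uniform_prior)
print("posterior mean, strong fair-coin prior:", float(strong_fair_prior))
```

Under the strong fair-coin prior, ten heads barely shift the estimate, which mirrors the "horse, donkey, mule" remark: the prior expectation and the observed data are blended.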
  7-8. Bayesian: Clustering
  9-10. Bayesian: Clustering of vectorial objects
      Clustering algorithm
  11-12. Bayesian: Clustering of non-vectorial objects (Image courtesy: shutterstock)
      Mostly available as pairwise distance data
  13. POCD: Probabilistic model for Overlap Clustering of Distance data
  14-19. POCD: Overlap Clustering for distance data
  20. POCD: Overlap Clustering for distance data
      ● Bayesian clustering model
      ● Given pairwise D, we infer Z (the cluster assignment matrix)
  21. POCD: Overlap Clustering for distance data
      Z
      ● Binary matrix
      ● Cluster assignment matrix
      ● Needs to be inferred
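A small example may help picture the binary cluster assignment matrix Z. The matrix values and the helper names `clusters_of` and `members_of` below are hypothetical; they only show how a row with more than one 1 encodes an object that belongs to several clusters at once.

```python
# Rows are objects, columns are clusters.
# Z[i][k] == 1 means object i belongs to cluster k.
Z = [
    [1, 0, 0],
    [1, 1, 0],  # object 1 sits in clusters 0 and 1: an overlap
    [0, 1, 0],
    [0, 0, 1],
]


def clusters_of(Z, i):
    """Return the cluster indices that object i belongs to."""
    return [k for k, z in enumerate(Z[i]) if z == 1]


def members_of(Z, k):
    """Return the object indices assigned to cluster k."""
    return [i for i, row in enumerate(Z) if row[k] == 1]


# Objects whose row contains more than one 1 are the overlapping ones.
overlapping = [i for i in range(len(Z)) if len(clusters_of(Z, i)) > 1]
print("overlapping objects:", overlapping)
print("members of cluster 1:", members_of(Z, 1))
```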
  22-23. POCD: Overlap Clustering for distance data
      ● Bayesian clustering model
      ● Given pairwise D, we infer Z:
          p(Z|D,.) ∝ p(D|Z) p(Z)
          (posterior ∝ likelihood × prior)
  24. POCD: Overlap Clustering for distance data
      Prior over Z: Indian Buffet Process (IBP)
      ● As the number of clusters k → ∞, we arrive at the IBP
      ● No need to fix the number of clusters
      p(Z|D,.) ∝ p(D|Z) p(Z)   (posterior ∝ likelihood × prior)
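To see why an IBP prior removes the need to fix the number of clusters, here is a minimal sketch of the standard "restaurant" generative process for the IBP. The function names, the concentration value `alpha = 2.0` and the seed are assumptions for illustration; this is not the sampler used in POCD.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson variate with Knuth's method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_objects, alpha, rng):
    """Draw a binary matrix Z from the Indian Buffet Process."""
    column_counts = []  # how many objects already use each column
    rows = []
    for i in range(1, num_objects + 1):
        # Object i reuses existing column k with probability m_k / i.
        chosen = {k for k, m_k in enumerate(column_counts)
                  if rng.random() < m_k / i}
        for k in chosen:
            column_counts[k] += 1
        # It also opens Poisson(alpha / i) brand-new columns.
        for _ in range(sample_poisson(alpha / i, rng)):
            column_counts.append(1)
            chosen.add(len(column_counts) - 1)
        rows.append(chosen)
    num_columns = len(column_counts)
    return [[1 if k in row else 0 for k in range(num_columns)] for row in rows]


Z = sample_ibp(num_objects=10, alpha=2.0, rng=random.Random(0))
for row in Z:
    print(row)
```

The number of columns is an outcome of the draw rather than an input, and a row may contain several 1s, which is what allows overlapping clusters.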
  25. POCD: Overlap Clustering for distance data
      Invariant likelihood: generalised Wishart
      ● Translation and rotation invariant
      p(Z|D,.) ∝ p(D|Z) p(Z)   (posterior ∝ likelihood × prior)
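The invariance named on this slide can be checked numerically on the input side: pairwise squared distances do not change when the underlying points are rotated and translated. The sketch below shows only that property of D; it does not implement the generalised Wishart likelihood, and the points, angle and shift are made-up values.

```python
import math


def pairwise_sq_dists(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def rotate_and_translate(points, angle, shift):
    """Rotate 2-D points by `angle` (radians), then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
moved = rotate_and_translate(points, angle=0.7, shift=(5.0, -2.0))

D_original = pairwise_sq_dists(points)
D_moved = pairwise_sq_dists(moved)

# The two distance matrices agree up to floating-point error.
same = all(math.isclose(a, b, abs_tol=1e-9)
           for row_a, row_b in zip(D_original, D_moved)
           for a, b in zip(row_a, row_b))
print("distances unchanged:", same)
```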
  26-27. POCD: Overlap Clustering for distance data
      Inference using Metropolis-Hastings
      ● MCMC algorithm
      ● Commonly used in models that employ the IBP
      ● Asymptotically exact approximation of the posterior
      ● We need to infer Z and the number of clusters
      p(Z|D,.) ∝ p(D|Z) p(Z)   (posterior ∝ likelihood × prior)
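As a rough illustration of Metropolis-Hastings over a binary matrix, the sketch below proposes single-entry flips and accepts them with the usual ratio. Everything here is a simplification under stated assumptions: `log_target` is a toy unnormalised log-posterior (it rewards agreement with a hypothetical `reference` matrix) standing in for log p(D|Z) + log p(Z), and the number of columns is held fixed, whereas inference under an IBP prior would also propose adding and removing columns.

```python
import math
import random


def log_target(Z, reference, strength):
    """Toy unnormalised log-posterior: penalise entries differing from `reference`."""
    mismatches = sum(z != r
                     for z_row, r_row in zip(Z, reference)
                     for z, r in zip(z_row, r_row))
    return -strength * mismatches


def metropolis_hastings(initial, log_prob, num_steps, rng):
    """Single-entry-flip Metropolis-Hastings over a binary matrix."""
    Z = [row[:] for row in initial]
    current = log_prob(Z)
    samples, accepted = [], 0
    for _ in range(num_steps):
        i, k = rng.randrange(len(Z)), rng.randrange(len(Z[0]))
        Z[i][k] ^= 1  # propose flipping one entry
        proposed = log_prob(Z)
        # The flip proposal is symmetric, so accept with
        # probability min(1, exp(proposed - current)).
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            accepted += 1
        else:
            Z[i][k] ^= 1  # reject: undo the flip
        samples.append([row[:] for row in Z])
    return samples, accepted / num_steps


reference = [[1, 0], [1, 1], [0, 1]]  # hypothetical "true" assignments
initial = [[0, 0], [0, 0], [0, 0]]
samples, acceptance_rate = metropolis_hastings(
    initial, lambda Z: log_target(Z, reference, 3.0), 5000, random.Random(1))

# Fraction of entries matching the reference over the second half of the chain.
kept = samples[len(samples) // 2:]
match_fraction = sum(z == r
                     for Z in kept
                     for z_row, r_row in zip(Z, reference)
                     for z, r in zip(z_row, r_row)) / (len(kept) * 6)
print("acceptance rate:", acceptance_rate)
print("match fraction:", match_fraction)
```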
  28. POCD: Overlap Clustering for distance data
      Clustering protein contact maps from HIV Protease inhibitors (PIs)
      ● Of the 26 FDA-approved anti-HIV drugs:
          ○ 10 are PIs
      ● The PIs exhibit similar behaviour
          ○ Similar chemical structure
      ● Not readily available
      https://www.sciencedirect.com/science/article/pii/S0165614711001398
  29-30. POCD: Overlap Clustering for distance data
      Clustering protein contact maps from HIV Protease inhibitors (PIs)
      ● Necessary to identify alternative PIs for therapy
          ○ What are the structural dissimilarities amongst PIs?
      ● Use Protein Contact Maps of each PI
          ○ Distances between all amino acid (AA) residue pairs for a protein
          ○ Row-wise vectorise the contact map
          ○ Compute the Normalised Information Distance
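The contact-map pipeline above (residue-pair distances, row-wise vectorisation, a pairwise information distance) can be sketched as follows. The slides do not say how the Normalised Information Distance is computed; because it is defined through Kolmogorov complexity, which is not computable, this sketch swaps in the Normalised Compression Distance with `zlib` as a practical stand-in. The random coordinates, the 8.0 cutoff for a "contact" and all function names are assumptions for illustration, not the authors' data or code.

```python
import math
import random
import zlib


def contact_map(coords, cutoff):
    """Binary map: 1 where two residues lie within `cutoff` of each other."""
    return [[1 if math.dist(p, q) <= cutoff else 0 for q in coords]
            for p in coords]


def vectorise_rows(matrix):
    """Row-wise flattening of the contact map into a byte string."""
    return bytes(v for row in matrix for v in row)


def ncd(x, y):
    """Normalised Compression Distance, a computable proxy for the NID."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


rng = random.Random(0)


def random_coords(num_residues):
    """Hypothetical residue coordinates; real ones would come from a structure file."""
    return [(rng.uniform(0, 20), rng.uniform(0, 20), rng.uniform(0, 20))
            for _ in range(num_residues)]


maps = [vectorise_rows(contact_map(random_coords(40), 8.0)) for _ in range(3)]

# Pairwise distance matrix D, the kind of input a distance-based model consumes.
D = [[ncd(a, b) for b in maps] for a in maps]
for row in D:
    print([round(v, 3) for v in row])
```

With a real compressor the diagonal is small but not exactly zero and the matrix is only approximately symmetric, so in practice D is often symmetrised before further use.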
  31. POCD: Overlap Clustering for distance data
      Contact Maps of the Protease Inhibitors
  32. POCD: Probabilistic model for Overlap Clustering of Distance data
  33. Reading material
      ● A tutorial on Bayesian nonparametric models: http://gershmanlab.webfactional.com/pubs/GershmanBlei12.pdf
      ● Leo Breiman, ‘Statistical Modeling: The Two Cultures’: https://projecteuclid.org/download/pdf_1/euclid.ss/1009213726
      ● An abstract of this work, presented as a Spotlight at the Bayesian Nonparametrics Workshop at NeurIPS 2018: https://drive.google.com/file/d/1ExVpeUomv8Z4mPMu5as_CbmrHjVY0IDV/view
      ● Tutorials on the latest deep learning papers: https://www.depthfirstlearning.com/ (@DepthFirstLearn)
  34. POCD: Overlap Clustering for distance data
      @sandhya212
      Thank you
