# Sandhya Prabhakaran - A Bayesian Approach To Model Overlapping Objects Available As Distance Data

1. A Bayesian Approach to Model Overlapping Objects Available as Distance Data. Sandhya Prabhakaran¹ and Julia E. Vogt²,³ (¹Memorial Sloan Kettering Cancer Centre, NYC; ²University of Basel; ³Swiss Institute of Bioinformatics). MLconf, NYC, 29th March 2019.
3. Two religions in Machine Learning: Frequentists and Bayesians (source: https://medium.com/datadriveninvestor/bayesian-vs-frequentist-for-dummies-58ce230c3796)
6. Two religions in Machine Learning
   - A coin-toss example: 10 heads in 10 tosses (the given data)
   - Frequentists:
     - Probability is a point estimate.
     - What is the relative frequency of tails? No answer.
   - Bayesians:
     - Probability is a distribution.
     - What is the relative frequency of tails? 0.5 (e.g. from a prior belief that the coin is fair).
     - "A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule."
     - More flexible for downstream analyses: inference, thinking, planning and reasoning.
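Slide 6's coin-toss example can be made concrete with a conjugate update. This is a minimal sketch that assumes a Beta(a, a) prior on the probability of tails; the slides do not name a prior, and the values of `a` below are illustrative. A symmetric Beta prior has mean 0.5, and the posterior mean then moves toward heads, less so the stronger (larger `a`) the prior.

```python
from fractions import Fraction


def beta_posterior(a, b, tails, heads):
    """Conjugate update of a Beta(a, b) prior on P(tails) after observing coin tosses."""
    return a + tails, b + heads


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept as an exact fraction."""
    return Fraction(a, a + b)


tails, heads = 0, 10  # the slide's data: 10 heads in 10 tosses

# Assumed symmetric priors centred on a fair coin, from weak (a=1) to strong (a=100).
for a in (1, 10, 100):
    post_a, post_b = beta_posterior(a, a, tails, heads)
    print(f"a={a}: prior mean {beta_mean(a, a)}, posterior mean {beta_mean(post_a, post_b)}")
```

With the uniform prior (a = 1) the posterior mean for tails is 1/12. With a = 100 it stays close to 0.5, at 10/21. This shows how the prior and the data are weighed against each other.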
7. Bayesian: Clustering
10. Bayesian: Clustering of vectorial objects (figure: clustering algorithm)
12. Bayesian: Clustering of non-vectorial objects, which are mostly available as pairwise distance data (image courtesy: Shutterstock)
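Slide 12 notes that non-vectorial objects are mostly available only as pairwise distances. The sketch below is a hypothetical illustration: strings stand in for such objects and edit distance serves as one possible dissimilarity, and neither choice comes from the talk. It builds the symmetric matrix D that a distance-based clustering model such as POCD takes as input.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def pairwise_distances(objects, dist):
    """Symmetric distance matrix D with a zero diagonal."""
    n = len(objects)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = dist(objects[i], objects[j])
    return D


# Hypothetical non-vectorial objects: we never need coordinates, only D.
seqs = ["kitten", "sitting", "mitten", "fitting"]
D = pairwise_distances(seqs, levenshtein)
for row in D:
    print(row)
```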
13. POCD: Probabilistic model for Overlap Clustering of Distance data
14. POCD: Overlap Clustering for distance data
20. POCD: Overlap Clustering for distance data
    - Bayesian clustering model
    - Given the pairwise distance matrix D, we infer Z (the cluster assignment matrix)
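21. POCD: Overlap Clustering for distance data
    - Z is a binary cluster assignment matrix (rows: objects, columns: clusters); a row containing more than one 1 marks an object that belongs to several overlapping clusters
    - Z needs to be inferred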
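To make the structure of Z concrete, here is a hypothetical 4 × 3 assignment matrix with values invented for illustration. Rows are objects and columns are clusters, so membership, cluster sizes and overlaps can all be read directly off the matrix.

```python
# Hypothetical assignment matrix: 4 objects (rows) x 3 clusters (columns).
Z = [
    [1, 0, 0],
    [1, 1, 0],  # object 1 belongs to clusters 0 and 1: an overlap
    [0, 1, 0],
    [0, 0, 1],
]


def clusters_of(Z, i):
    """Indices of the clusters that object i belongs to."""
    return [k for k, z in enumerate(Z[i]) if z == 1]


def cluster_sizes(Z):
    """Number of objects in each cluster (column sums)."""
    return [sum(col) for col in zip(*Z)]


def overlapping_objects(Z):
    """Objects assigned to more than one cluster (row sums above 1)."""
    return [i for i, row in enumerate(Z) if sum(row) > 1]


print(clusters_of(Z, 1), cluster_sizes(Z), overlapping_objects(Z))  # [0, 1] [2, 2, 1] [1]
```

In a non-overlapping clustering (a partition) every row would contain exactly one 1. Allowing several 1s per row is what makes the clustering overlapping.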
22. POCD: Overlap Clustering for distance data
    - Bayesian clustering model
    - Given the pairwise distance matrix D, we infer Z via Bayes' rule: $p(Z \mid D, \cdot) \propto p(D \mid Z)\, p(Z)$, i.e. posterior ∝ likelihood × prior
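Writing out the normalising constant hidden in the proportionality shows why the posterior over Z is hard to compute directly: the denominator sums over every possible binary matrix Z′. The second line gives the standard Metropolis-Hastings acceptance probability for a move Z → Z* drawn from a proposal distribution q, the inference method named on slide 26. It is a generic form, and the symbols Z′, Z*, q and A are introduced here for illustration; it is not necessarily the exact proposal used in POCD. The intractable denominator cancels, so only the likelihood × prior products are needed.

$$
p(Z \mid D, \cdot) = \frac{p(D \mid Z)\, p(Z)}{\sum_{Z'} p(D \mid Z')\, p(Z')}
$$

$$
A(Z \to Z^{*}) = \min\left(1,\ \frac{p(D \mid Z^{*})\, p(Z^{*})\, q(Z \mid Z^{*})}{p(D \mid Z)\, p(Z)\, q(Z^{*} \mid Z)}\right)
$$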
24. POCD: Overlap Clustering for distance data
    - Prior over Z: the Indian Buffet Process (IBP)
    - Letting the number of clusters K → ∞ in a finite beta-Bernoulli model yields the IBP
    - No need to fix the number of clusters in advance
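A minimal sketch of drawing Z from the IBP prior uses the standard sequential "restaurant" construction. Customer i takes each existing dish k with probability m_k / i, where m_k is the number of earlier customers who took it, and then takes Poisson(α / i) new dishes. Customers play the role of objects and dishes the role of clusters. The function names and the concentration parameter `alpha` are illustrative. The code samples from the prior only and is not the POCD inference code. The number of columns is not fixed in advance, which is the property the slide highlights.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's method for a Poisson(lam) draw; fine for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(n_customers, alpha, seed=0):
    """Draw a binary matrix Z from IBP(alpha): rows are objects, columns are clusters."""
    rng = random.Random(seed)
    dish_counts = []  # m_k: how many customers have taken dish k so far
    rows = []
    for i in range(1, n_customers + 1):
        # Take each existing dish k with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in dish_counts]
        # Then take Poisson(alpha / i) brand-new dishes.
        n_new = sample_poisson(alpha / i, rng)
        row.extend([1] * n_new)
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * n_new
        rows.append(row)
    k_total = len(dish_counts)
    # Pad earlier rows with zeros for dishes introduced after them.
    return [row + [0] * (k_total - len(row)) for row in rows]


Z = sample_ibp(n_customers=10, alpha=2.0, seed=42)
print(len(Z[0]), "clusters")
for row in Z:
    print(row)
```

Under the IBP the expected number of clusters is α·H_N, where H_N is the N-th harmonic number. For N = 10 and α = 2 this averages about 5.86, so the model grows the number of clusters with the data instead of fixing it.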
25. POCD: Overlap Clustering for distance data
    - Invariant likelihood: the generalised Wishart distribution
    - Translation and rotation invariant: it depends on the data only through D, so rotating or translating any underlying vector representation of the objects leaves it unchanged
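The invariance on slide 25 rests on a simple fact: pairwise squared Euclidean distances do not change when the underlying points are rotated and translated. The sketch below checks this numerically on hypothetical random 2-D points. It illustrates the invariance only and does not implement the generalised Wishart density itself.

```python
import math
import random


def sq_dist_matrix(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]


def rotate_translate(points, theta, shift):
    """Rotate 2-D points by angle theta (radians), then translate them by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


rng = random.Random(1)
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5)]
Y = rotate_translate(X, theta=0.7, shift=(3.0, -2.0))

D_X, D_Y = sq_dist_matrix(X), sq_dist_matrix(Y)
max_diff = max(abs(a - b) for row_x, row_y in zip(D_X, D_Y) for a, b in zip(row_x, row_y))
print(f"largest change in any distance: {max_diff:.2e}")  # ~0, up to floating-point rounding
```

Because D is unchanged, any likelihood written as a function of D alone inherits this invariance. The model therefore never has to pick a coordinate frame for the objects.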
26. POCD: Overlap Clustering for distance data
    - Inference using Metropolis-Hastings, an MCMC algorithm
    - Commonly used in models built on the IBP
    - Gives asymptotically exact approximations of the posterior
    - We need to infer both Z and the number of clusters
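This is a minimal single-site Metropolis-Hastings sketch over a binary matrix Z. To stay short it makes two assumptions the slides do not. First, the number of clusters is held fixed, whereas POCD also infers it. Second, `toy_log_post` is a hypothetical stand-in for log p(D | Z) + log p(Z): it scores agreement with an invented target matrix instead of using the generalised Wishart likelihood and the IBP prior. Flipping one uniformly chosen entry is a symmetric proposal, so the acceptance probability reduces to min(1, exp(Δ)), where Δ is the change in log posterior.

```python
import math
import random


def metropolis_hastings_binary(log_post, n_rows, n_cols, n_iter, seed=0):
    """Single-bit-flip Metropolis-Hastings over binary matrices Z (n_rows x n_cols).

    log_post(Z) must return an unnormalised log posterior. The flip proposal is
    symmetric, so the Hastings correction cancels from the acceptance ratio.
    """
    rng = random.Random(seed)
    z = [[0] * n_cols for _ in range(n_rows)]  # start from the empty assignment
    current = log_post(z)
    samples = []
    for _ in range(n_iter):
        i, k = rng.randrange(n_rows), rng.randrange(n_cols)
        z[i][k] ^= 1  # propose: flip one entry of Z
        proposed = log_post(z)
        # Accept with probability min(1, exp(proposed - current)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < proposed - current:
            current = proposed
        else:
            z[i][k] ^= 1  # reject: undo the flip
        samples.append([row[:] for row in z])
    return samples


# Hypothetical target assignment and toy log posterior (not the POCD model).
Z_TARGET = [[1, 0], [1, 1], [0, 1]]


def toy_log_post(z, beta=3.0):
    mismatches = sum(a != b for row, t in zip(z, Z_TARGET) for a, b in zip(row, t))
    return -beta * mismatches


samples = metropolis_hastings_binary(toy_log_post, n_rows=3, n_cols=2, n_iter=5000)

# Discard burn-in, then estimate P(object i is in cluster k) by averaging the samples.
kept = samples[1000:]
p_member = [[sum(s[i][k] for s in kept) / len(kept) for k in range(2)] for i in range(3)]
print(p_member)
```

Averaging the retained samples is how MCMC turns draws into posterior summaries such as per-object cluster-membership probabilities.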
28. POCD: Overlap Clustering for distance data
    - Clustering protein contact maps from HIV protease inhibitors (PIs)
    - Of the 26 FDA-approved anti-HIV drugs, 10 are PIs
    - The PIs exhibit similar behaviour
      - Similar chemical structure
    - Not readily available
    - Source: https://www.sciencedirect.com/science/article/pii/S0165614711001398
30. POCD: Overlap Clustering for distance data
    - Clustering protein contact maps from HIV protease inhibitors (PIs)
    - It is necessary to identify alternative PIs for therapy
      - What are the structural dissimilarities amongst PIs?
    - Use the protein contact map of each PI
      - Distances between all amino-acid (AA) residue pairs of the protein
      - Vectorise each contact map row-wise
      - Compute the Normalised Information Distance between pairs of vectorised maps
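The normalised information distance is defined through Kolmogorov complexity, which is not computable. In practice it is usually approximated by the normalized compression distance (NCD) computed with a real compressor. The sketch below assumes that approximation with zlib; the slides do not say which approximation was used. The helper names, the hypothetical 3-D residue coordinates, and the rounding-and-text encoding of the row-wise vectorised map are all illustrative assumptions.

```python
import math
import random
import zlib


def contact_map(coords):
    """Distances between all residue pairs, given (hypothetical) 3-D residue coordinates."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def vectorise_rowwise(matrix, decimals=1):
    """Concatenate the rows into one byte string (rounding and text encoding are assumptions)."""
    return " ".join(f"{v:.{decimals}f}" for row in matrix for v in row).encode()


def ncd(x, y):
    """Normalized compression distance: a computable stand-in for the NID."""
    cx, cy, cxy = (len(zlib.compress(s, 9)) for s in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(vectors):
    """Pairwise NCD matrix; NCD is only approximately symmetric, so compute i < j and mirror."""
    n = len(vectors)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = ncd(vectors[i], vectors[j])
    return D


# Hypothetical structures: a regular helix-like chain and a random chain of 30 residues.
helix = [(2.3 * math.cos(0.5 * i), 2.3 * math.sin(0.5 * i), 1.5 * i) for i in range(30)]
rng = random.Random(0)
random_chain = [(rng.uniform(0, 40), rng.uniform(0, 40), rng.uniform(0, 40)) for _ in range(30)]

vec_helix = vectorise_rowwise(contact_map(helix))
vec_random = vectorise_rowwise(contact_map(random_chain))
print(ncd(vec_helix, vec_helix), ncd(vec_helix, vec_random))
```

The resulting NCD matrix would play the role of D for the PIs. Near-identical inputs typically give values close to 0 and unrelated inputs values near 1.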
31. POCD: Overlap Clustering for distance data
    - Contact maps of the protease inhibitors
32. POCD: Probabilistic model for Overlap Clustering of Distance data
33. Reading material
    - A tutorial on Bayesian nonparametric models: http://gershmanlab.webfactional.com/pubs/GershmanBlei12.pdf
    - Leo Breiman, ‘Statistical Modeling: The Two Cultures’: https://projecteuclid.org/download/pdf_1/euclid.ss/1009213726
    - An abstract of this work, presented as a spotlight at the Bayesian Nonparametrics Workshop at NeurIPS 2018: https://drive.google.com/file/d/1ExVpeUomv8Z4mPMu5as_CbmrHjVY0IDV/view
    - Tutorials on recent deep learning papers: https://www.depthfirstlearning.com/ (@DepthFirstLearn)
34. POCD: Overlap Clustering for distance data
    - Thank you! (@sandhya212)