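This talk introduces the theory behind Bayesian Deep Learning, a topic that has recently drawn attention, together with recent applications. It briefly explains the theory of Bayesian inference and then presents the theory and applications of Yarin Gal's Monte Carlo Dropout.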
3. Classic Deep Learning
∙ A classification model is expressed as $f(x) = p(y \in c \mid x, \theta)$
"The probability that y belongs to the class c, predicted from the observation x"
∙ Training a model is defined as $\theta^* = \arg\min_\theta \frac{1}{N} \sum_{i=1}^{N} L(x_i, y_i, \theta)$
"Finding the parameter θ* that minimizes the loss metric L"
4. Likelihood
A dataset is denoted as $D = \{(x, y)\}$
$$L(D, \theta) = -\log p(D \mid \theta)$$
∙ The likelihood measures how well the distribution p fits the data.
∙ Minimizing L is maximum likelihood estimation (MLE)
∙ The negative log of the probability density function (PDF) of p is often used as the MLE loss:
  ∙ binary cross entropy (BCE) loss
  ∙ Ordinary Least Squares (OLS) loss
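To make the link between a loss and a likelihood concrete, here is a minimal sketch showing that the BCE loss is the mean negative log-likelihood of a Bernoulli model. The function names, labels, and probabilities are made up for illustration and do not come from the slides.

```python
import math

def bernoulli_nll(labels, probs):
    """Mean negative log-likelihood of binary labels under Bernoulli(p)."""
    total = 0.0
    for y, p in zip(labels, probs):
        likelihood = p if y == 1 else 1.0 - p
        total += -math.log(likelihood)
    return total / len(labels)

def binary_cross_entropy(labels, probs):
    """Textbook BCE: -[y log p + (1 - y) log(1 - p)], averaged over samples."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Illustrative labels and predicted probabilities (assumed values).
labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.6, 0.7]
nll = bernoulli_nll(labels, probs)
bce = binary_cross_entropy(labels, probs)
print(nll, bce)  # the two numbers agree up to floating-point error
```

The same argument applies to OLS: with a Gaussian likelihood of fixed variance, the negative log-likelihood is the squared error up to a scale factor and an additive constant.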
8. Regularized Log Likelihood
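$$L(D, \theta) = -\left(\log p(D \mid \theta) + \log p(\theta)\right)$$
∙ The use of Bayes' rule to incorporate 'prior knowledge' into the problem
∙ Also called maximum a posteriori estimation (MAP)
$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \propto p(D \mid \theta)\, p(\theta)$$
$$-\log p(\theta \mid D) = -\left(\log p(D \mid \theta) + \log p(\theta)\right) + \log p(D)$$
The term log p(D) does not depend on θ, so it can be dropped from the loss, which leaves L(D, θ).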
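As one concrete instance of the regularized loss, assume a zero-mean isotropic Gaussian prior on θ with variance σ². This prior is an illustrative assumption, not something stated on the slide. Under it, the prior term becomes the familiar L2 penalty:

```latex
\begin{aligned}
p(\theta) &= \mathcal{N}(\theta;\, 0,\, \sigma^2 I) \\
-\log p(\theta) &= \frac{1}{2\sigma^2}\,\|\theta\|_2^2 + \text{const} \\
L(D, \theta) &= -\log p(D \mid \theta) + \frac{1}{2\sigma^2}\,\|\theta\|_2^2 + \text{const}
\end{aligned}
```

A smaller prior variance σ² therefore corresponds to a stronger weight-decay coefficient.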
9. MAP and MLE Estimation
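$$\theta^*_{\mathrm{MAP}} = \arg\min_\theta \left[ -\log p(D \mid \theta) - \log p(\theta) \right]$$
$$\theta^*_{\mathrm{MLE}} = \arg\min_\theta \left[ -\log p(D \mid \theta) \right]$$
∙ MLE and MAP estimation only estimate a fixed θ
∙ The resulting predictions are a fixed probability value
∙ In reality, θ might be better expressed as a 'distribution'
$$f(x) = p(y \mid x, \theta^*_{\mathrm{MAP}}) \in \mathbb{R}$$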
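The difference between the two point estimates can be illustrated with a coin-flip model. The sketch below is a toy example, not taken from the talk: the Beta(5, 5) prior, the data (9 heads in 10 flips), and the grid search are all assumptions chosen for readability.

```python
import math

def neg_log_likelihood(theta, heads, flips):
    """-log p(D | theta) for independent coin flips."""
    return -(heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta))

def neg_log_prior(theta, a, b):
    """-log p(theta) for a Beta(a, b) prior, up to an additive constant."""
    return -((a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta))

def argmin_on_grid(loss, steps=9999):
    """Brute-force arg min over an evenly spaced grid in (0, 1)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=loss)

heads, flips = 9, 10   # assumed data
a, b = 5.0, 5.0        # assumed prior: the coin is probably close to fair

theta_mle = argmin_on_grid(lambda t: neg_log_likelihood(t, heads, flips))
theta_map = argmin_on_grid(
    lambda t: neg_log_likelihood(t, heads, flips) + neg_log_prior(t, a, b)
)
print(theta_mle, theta_map)
```

The MAP estimate is pulled from the MLE toward the prior mean of 0.5. Both are still single numbers, which is the limitation the bullets point out.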
10. Bayesian Inference
$$\mathbb{E}_\theta[\, p(y \mid x, D) \,] = \int p(y \mid x, D, \theta)\, p(\theta \mid D)\, d\theta$$
∙ Integrating across all probable values of θ (marginalization)
∙ Solving the integral treats θ as a distribution
∙ For a typical modern deep learning network, $\theta \in \mathbb{R}^{1000000\ldots}$
∙ Integrating over all possible values of θ is intractable (impossible)
11. Bayesian Methods
Instead of directly solving the integral,
$$p(y \mid x, D) = \int p(y \mid x, D, \theta)\, p(\theta \mid D)\, d\theta$$
we approximate the integral and compute
∙ The expectation E[ p(y|x, D) ]
∙ The variance V[ p(y|x, D) ]
using...
∙ Monte Carlo Sampling
∙ Variational Inference (VI)
12. Output Distribution
The predicted distribution p(y|x, D) can be visualized as a mean curve with a band around it:
∙ The grey region is the confidence interval computed from V[ p(y|x, D) ]
∙ The blue line is the mean of the prediction E[ p(y|x, D) ]
13. Why Bayesian Inference?
Modelling uncertainty is becoming important in failure-critical domains:
∙ Autonomous driving
∙ Medical diagnostics
∙ Algorithmic stock trading
∙ Public security
14. Decision Boundary and Misprediction
∙ MLE and MAP estimations lead to a fixed decision boundary
∙ ’Distant samples’ are often mispredicted with very high confidence
∙ Learning a ’distribution’ can fix this problem
15. Adversarial Attacks
∙ Changing even a single pixel can lead to misprediction
∙ These mispredictions have a very high confidence
[2] Su, Jiawei, Danilo Vasconcellos Vargas, and Sakurai Kouichi. "One pixel attack for fooling deep neural networks." arXiv preprint arXiv:1710.08864 (2017).
16. Autonomous Driving
[3] Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" Advances in Neural Information Processing Systems. 2017.
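17. Monte Carlo Integration
$$p(y \mid x, D) = \int p(y \mid x, D, \theta)\, p(\theta \mid D)\, d\theta \approx \frac{1}{S} \sum_{s=1}^{S} p(y \mid x, D, \theta_s)$$
where $\theta_s$ are samples from $p(\theta \mid D)$
∙ Samples are drawn directly from p(θ|D)
∙ When sampling directly from p(θ|D) is not possible, use Markov chain Monte Carlo (MCMC)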
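A minimal sketch of this estimator on a model where the exact answer is known. The Beta-Bernoulli coin model, the prior, and the data counts are assumptions chosen only because the posterior can be sampled directly with the standard library; for this model p(y = 1 | θ) is simply θ.

```python
import random

def posterior_params(prior_a, prior_b, heads, tails):
    """Conjugate update: Beta prior plus coin-flip data gives a Beta posterior."""
    return prior_a + heads, prior_b + tails

def mc_predictive(post_a, post_b, num_samples, rng):
    """Monte Carlo mean and variance of p(y=1 | theta_s), theta_s ~ p(theta | D)."""
    samples = [rng.betavariate(post_a, post_b) for _ in range(num_samples)]
    mean = sum(samples) / num_samples
    variance = sum((s - mean) ** 2 for s in samples) / num_samples
    return mean, variance

rng = random.Random(0)  # fixed seed so the run is repeatable
post_a, post_b = posterior_params(prior_a=2.0, prior_b=2.0, heads=12, tails=4)
mc_mean, mc_var = mc_predictive(post_a, post_b, num_samples=20000, rng=rng)

# Closed-form posterior predictive mean for comparison.
exact_mean = post_a / (post_a + post_b)
print(mc_mean, exact_mean, mc_var)
```

The sample mean approaches the exact predictive probability as S grows, and the sample variance gives the spread that a point estimate would discard.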
19. Variational Inference
∙ Variational inference converts an inference problem into an optimization problem.
∙ Instead of using a complicated distribution such as p(θ|D), we find a tractable approximation q(θ; λ) parameterized by λ
∙ This amounts to minimizing the KL divergence between q and p
∙ Choosing a q that is very different from p leads to bad solutions
$$\underset{\lambda}{\text{minimize}}\ \mathrm{KL}(q(\theta; \lambda) \,\|\, p(\theta \mid D))$$
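The optimization view can be sketched with a toy problem in which both q and the target are univariate Gaussians, so the KL divergence has a closed form. The target parameters and the grid search are illustrative assumptions; practical variational inference uses gradient-based optimizers and targets known only up to a constant.

```python
import math

def kl_gaussians(mean_q, std_q, mean_p, std_p):
    """Closed-form KL(q || p) for two univariate Gaussians."""
    return (math.log(std_p / std_q)
            + (std_q ** 2 + (mean_q - mean_p) ** 2) / (2.0 * std_p ** 2)
            - 0.5)

# Hypothetical target distribution p, standing in for a posterior.
TARGET_MEAN, TARGET_STD = 1.0, 0.5

def fit_by_grid_search():
    """Pick the variational parameters lambda = (mean, std) with the smallest KL."""
    best = None
    for i in range(-20, 21):      # candidate means: -2.0 ... 2.0
        for j in range(1, 21):    # candidate stds: 0.1 ... 2.0
            mean_q, std_q = i / 10.0, j / 10.0
            kl = kl_gaussians(mean_q, std_q, TARGET_MEAN, TARGET_STD)
            if best is None or kl < best[0]:
                best = (kl, mean_q, std_q)
    return best

best_kl, best_mean, best_std = fit_by_grid_search()
far_kl = kl_gaussians(0.0, 1.0, TARGET_MEAN, TARGET_STD)
print(best_kl, best_mean, best_std, far_kl)
```

Because the target lies inside the variational family here, the minimum KL is zero. When the family cannot represent the target, the best achievable KL stays positive.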
21. Evidence Lower Bound (ELBO)
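Because the evidence term p(D) is intractable, optimizing the KL divergence directly is hard.
However, by reformulating the problem,
$$\mathrm{KL}(q(\theta; \lambda) \,\|\, p(\theta \mid D)) = \mathbb{E}_q[-\log p(\theta, D) + \log q(\theta; \lambda)] + \log p(D)$$
$$\log p(D) = \mathrm{KL}(q(\theta; \lambda) \,\|\, p(\theta \mid D)) - \mathbb{E}_q[-\log p(\theta, D) + \log q(\theta; \lambda)]$$
$$\log p(D) \geq \mathbb{E}_q[\log p(\theta, D) - \log q(\theta; \lambda)] \quad \because\ \mathrm{KL}(q(\theta; \lambda) \,\|\, p(\theta \mid D)) \geq 0$$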
22. Evidence Lower Bound (ELBO)
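$$\underset{\lambda}{\text{maximize}}\ \mathcal{L}[q(\theta; \lambda)] = \mathbb{E}_q[\log p(\theta, D) - \log q(\theta; \lambda)]$$
∙ Maximizing the evidence lower bound is equivalent to minimizing the KL divergence
∙ The ELBO equals the log evidence log p(D) exactly when the KL divergence is zero, that is, when q matches the posterior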
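The bound can be checked numerically on a model whose evidence is known in closed form. The sketch below assumes a Beta-Bernoulli coin model with made-up counts and estimates the ELBO by sampling from q; it is a sanity check under those assumptions, not the procedure used in the talk.

```python
import math
import random

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_beta_pdf(theta, a, b):
    """Log density of Beta(a, b) at theta."""
    return ((a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)
            - log_beta_fn(a, b))

def log_joint(theta, heads, tails, prior_a, prior_b):
    """log p(theta, D) = log p(D | theta) + log p(theta)."""
    log_lik = heads * math.log(theta) + tails * math.log(1.0 - theta)
    return log_lik + log_beta_pdf(theta, prior_a, prior_b)

def elbo(q_a, q_b, heads, tails, prior_a, prior_b, num_samples, rng):
    """Monte Carlo estimate of E_q[log p(theta, D) - log q(theta)], q = Beta(q_a, q_b)."""
    total = 0.0
    for _ in range(num_samples):
        theta = rng.betavariate(q_a, q_b)
        total += (log_joint(theta, heads, tails, prior_a, prior_b)
                  - log_beta_pdf(theta, q_a, q_b))
    return total / num_samples

heads, tails, prior_a, prior_b = 9, 1, 2.0, 2.0  # assumed data and prior
rng = random.Random(0)

# Exact log evidence for this conjugate model.
log_evidence = (log_beta_fn(prior_a + heads, prior_b + tails)
                - log_beta_fn(prior_a, prior_b))
# q equal to the true posterior: the bound is tight.
elbo_at_posterior = elbo(prior_a + heads, prior_b + tails,
                         heads, tails, prior_a, prior_b, 5000, rng)
# q equal to the prior: a poor approximation, so the bound is loose.
elbo_at_prior = elbo(prior_a, prior_b, heads, tails, prior_a, prior_b, 5000, rng)
print(log_evidence, elbo_at_posterior, elbo_at_prior)
```

When q is the exact posterior, the integrand is constant in θ and the estimate matches log p(D). Any other q gives a smaller value, and the gap is the KL divergence.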
24. Dropout Regularization
∙ Very popular deep learning regularization method before batch normalization (9000 citations!)
∙ Randomly set weights W_ij to 0 using a mask drawn from a Bernoulli(p) distribution
[4] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
26. Dropout As Variational Approximation
Solving MLE or MAP using dropout is variational inference.
Yarin Gal, PhD Thesis, 2016
The posterior distribution of the weights p(W|D) is approximated using q(W; p).
q(W; p) is the distribution of the weights W with dropout applied:
$$y_i = (W_i y_{i-1} + b_i) \odot r_i, \quad r_i \sim \mathrm{Bern}(p)$$
Since L2 loss and L2 regularization assume W ∼ N(µ, σ²), the resulting distribution q is
$$q(W_{ij}; p) = p\, \mathcal{N}(\mu_{ij}, \sigma_{ij}^2) + (1 - p)\, \mathcal{N}(0, \sigma_{ij}^2)$$
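A minimal sketch of one such dropout layer, assuming p is the probability of keeping a unit (which matches the mixture above, where the non-zero component has weight p). The weights, bias and input are random placeholders, and `dropout_layer` is an illustrative name.

```python
import numpy as np

def dropout_layer(y_prev, W, b, keep_prob, rng):
    """One layer y_i = (W_i y_{i-1} + b_i) * r_i with r_i ~ Bernoulli(keep_prob)."""
    pre_activation = W @ y_prev + b
    r = rng.binomial(1, keep_prob, size=pre_activation.shape)  # 1 = keep, 0 = drop
    return pre_activation * r

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # hypothetical layer weights
b = rng.normal(size=4)
y_prev = rng.normal(size=3)

# Two passes through the same layer can differ: the mask r is resampled each time.
print(dropout_layer(y_prev, W, b, keep_prob=0.8, rng=rng))
print(dropout_layer(y_prev, W, b, keep_prob=0.8, rng=rng))

# Averaged over many masks, each unit approaches keep_prob times its pre-activation.
samples = np.stack([dropout_layer(y_prev, W, b, 0.8, rng) for _ in range(20_000)])
print(samples.mean(axis=0))
print(0.8 * (W @ y_prev + b))
```

Each call draws a new mask, so every forward pass effectively uses a different sample of the weights. This is the property that the variational interpretation relies on.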
27. Dropout As Variational Approximation
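The ELBO is given as
$$\underset{W, p}{\text{maximize}} \;\; \mathcal{L}[q(W; p)] = \mathbb{E}_q\big[\log p(W, D) - \log q(W; p)\big]$$
so the optimization objective becomes
$$\mathcal{L}[q(W; p)] \propto \mathbb{E}_q\Big[\log p(D \mid W) - \frac{p}{2\sigma^2} \|W\|_2^2\Big] \approx \frac{1}{N} \sum_{i=1}^{N} \log p(y_i \mid x_i, W) - \frac{p}{2\sigma^2} \|W\|_2^2$$
∙ If p approaches 1 or 0, q(W; p) collapses to a single mixture component.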
28. Monte Carlo Inference
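$$\mathbb{E}_\theta[p(y \mid x, D)] = \int p(y \mid x, D, \theta)\, p(\theta \mid D)\, d\theta \approx \int p(y \mid x, D, \theta)\, q(\theta; p)\, d\theta = \mathbb{E}_q[p(y \mid x, D)]$$
$$\mathbb{E}_q[p(y \mid x, D)] \approx \frac{1}{T} \sum_{t=1}^{T} p(y \mid x, D, \theta_t), \quad \theta_t \sim q(\theta; p)$$
∙ Prediction is done with dropout turned on, averaging multiple stochastic evaluations.
∙ This is equivalent to Monte Carlo integration with samples drawn from the variational distribution.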
29. Monte Carlo Inference
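$$\mathbb{V}_\theta[p(y \mid x, D)] \approx \frac{1}{S} \sum_{s=1}^{S} \big( p(y \mid x, D, \theta_s) - \mathbb{E}_\theta[p(y \mid x, D)] \big)^2$$
Uncertainty is estimated as the variance of the predictions over samples θ_s drawn from the variational distribution.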
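The mean and variance estimates can be sketched as follows. This is a sketch under several assumptions, not the authors' implementation: the network is a tiny untrained MLP with random placeholder weights, dropout is applied only to the hidden layer with inverted-dropout scaling, and the network's point output is used in place of the predictive density, as is common for regression.

```python
import numpy as np

def stochastic_forward(x, params, keep_prob, rng):
    """One forward pass of a small MLP with dropout left ON at prediction time."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, W1 @ x + b1)                  # hidden layer (ReLU)
    mask = rng.binomial(1, keep_prob, size=h.shape)   # fresh mask = one sample from q
    h = h * mask / keep_prob                          # inverted-dropout scaling
    return W2 @ h + b2

def mc_dropout_predict(x, params, keep_prob, num_passes, rng):
    """Predictive mean and variance from num_passes stochastic forward passes."""
    outputs = np.stack([stochastic_forward(x, params, keep_prob, rng)
                        for _ in range(num_passes)])
    return outputs.mean(axis=0), outputs.var(axis=0)

rng = np.random.default_rng(0)
# Hypothetical, untrained weights: 2 inputs -> 16 hidden units -> 1 output.
params = (rng.normal(size=(16, 2)), rng.normal(size=16),
          rng.normal(size=(1, 16)), rng.normal(size=1))
x = np.array([0.5, -1.0])

mean, variance = mc_dropout_predict(x, params, keep_prob=0.9, num_passes=100, rng=rng)
print("predictive mean:", mean, "predictive variance:", variance)
```

With `keep_prob=1.0` every pass is identical and the variance is zero, so in this sketch all of the reported uncertainty comes from the dropout masks.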
30. Monte Carlo Dropout
Examples from the Mauna Loa CO2 dataset [6]
[6] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." ICML 2016.
31. Monte Carlo Dropout Example
Prediction using only 10 samples [7]
[7] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." ICML 2016.
32. Monte Carlo Dropout Example
Semantic class segmentation [8]
[8] Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" NIPS 2017.
33. Monte Carlo Dropout Example
Spatial depth regression [9]
[9] Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" NIPS 2017.
34. Medical Diagnostics Example
∙ Green: True Positive, Red: False Positive
[10] DeVries, Terrance, and Graham W. Taylor. "Leveraging Uncertainty Estimates for Predicting Segmentation Quality." arXiv preprint arXiv:1807.00502 (2018).
35. Medical Diagnostics Example
∙ Green: True Positive, Blue: False Negative
[11] DeVries, Terrance, and Graham W. Taylor. "Leveraging Uncertainty Estimates for Predicting Segmentation Quality." arXiv preprint arXiv:1807.00502 (2018).
36. Possible Medical Applications
∙ Statistically correct uncertainty quantification
∙ Clinical treatment planning in a bandit setting (reinforcement learning)
37. Possible Applications: Bandit Setting
Maximize the total outcome from multiple slot machines, using an estimated outcome distribution for each machine.
38. Possible Applications: Bandit Setting
Highest predicted outcome, or lowest prediction uncertainty?
Choose the highest predicted outcome, or explore to gather more samples?
(Exploration-exploitation tradeoff)
39. Mice Skin Tumor Treatment
Mice with induced cancer tumors.
Treatment options:
∙ No treatment
∙ 5-FU (100 mg/kg)
∙ Imiquimod (8 mg/kg)
∙ Combination of imiquimod and 5-FU
40. Upper Confidence Bound
Treatment selection policy
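$$a_t = \underset{a \in A}{\arg\max} \, \big[\mu_a(x_t) + \beta \sigma_a^2(x_t)\big]$$
Quality measure (cumulative regret)
$$R(T) = \sum_{t=1}^{T} \Big[\max_{a \in A} \mu_a(x_t) - \mu_{a_t}(x_t)\Big]$$
where A is the set of possible treatments, and µ_a(x), σ²_a(x) are the predicted mean and variance for treatment a at x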
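A short sketch of the selection rule and the regret measure, assuming the per-treatment means and variances are already available as arrays (for example from a Gaussian process or from Monte Carlo dropout). The numbers and the function names `ucb_select` and `cumulative_regret` are hypothetical.

```python
import numpy as np

def ucb_select(means, variances, beta):
    """Pick the treatment that maximizes mean + beta * variance at the current context."""
    return int(np.argmax(means + beta * variances))

def cumulative_regret(mean_outcomes, chosen):
    """Sum over rounds of (best mean outcome - mean outcome of the chosen treatment).

    mean_outcomes has shape (T, number of treatments); chosen holds T treatment indices.
    """
    rounds = np.arange(len(chosen))
    return float(np.sum(mean_outcomes.max(axis=1) - mean_outcomes[rounds, chosen]))

# Hypothetical predictions for 4 treatments at one context x_t.
means = np.array([1.0, 1.5, 1.4, 0.8])
variances = np.array([0.1, 0.1, 0.9, 0.2])

print(ucb_select(means, variances, beta=0.0))   # 1: pure exploitation picks the highest mean
print(ucb_select(means, variances, beta=1.0))   # 2: the uncertainty bonus favors treatment 2

# Regret of choosing treatment 2 and then treatment 1 over two identical rounds.
mean_outcomes = np.tile(means, (2, 1))
print(cumulative_regret(mean_outcomes, np.array([2, 1])))
```

The bonus here uses the variance σ², following the slide; many UCB formulations use the standard deviation instead. Larger β shifts the policy toward exploration.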
41. Upper Confidence Bound
Treatment selection based on a Bayesian method (Gaussian process) led to the longest life expectancy. [12]
[12] Durand, A., C. Achilleos, D. Iacovides, K. Strati, G. D. Mitsis, and J. Pineau. "Contextual Bandits for Adapting Treatment in a Mouse Model of de Novo Carcinogenesis." MLHC 2018.
42. References
∙ Murphy, Kevin P. "Machine learning: a probabilistic perspective." (2012).
∙ Gal, Yarin. "Uncertainty in Deep Learning." Ph.D. thesis (2016).
∙ Blundell, Charles, et al. "Weight uncertainty in neural networks." arXiv preprint arXiv:1505.05424 (2015).
∙ Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." International Conference on Machine Learning. 2016.
∙ Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" Advances in Neural Information Processing Systems. 2017.
43. References
∙ Leibig, Christian, et al. "Leveraging uncertainty information from deep neural networks for disease detection." Scientific Reports 7.1 (2017): 17816.
∙ Durand, A., C. Achilleos, D. Iacovides, K. Strati, G. D. Mitsis, and J. Pineau. "Contextual Bandits for Adapting Treatment in a Mouse Model of de Novo Carcinogenesis." Machine Learning for Healthcare Conference (MLHC). 2018.
∙ Su, Jiawei, Danilo Vasconcellos Vargas, and Kouichi Sakurai. "One pixel attack for fooling deep neural networks." arXiv preprint arXiv:1710.08864 (2017).