SGLD
STOCHASTIC GRADIENT LANGEVIN DYNAMICS
Natan.katz@gmail.com
natank@checkpoint.com
Natan Katz
• Algorithm researcher since my military service
• M.Sc. in Applied Math, Weizmann Institute
• Spent one year at Goethe University with Prof. Kloeden
• Official author in TDS
• More than 10 patents
• Currently at Checkpoint
• Fanatic fan of the Celtics and SGE – so so
Coordinates
• Natan.katz@gmail.com
• Natank@checkpoint.com
• https://www.linkedin.com/in/natan-katz-2936425/
Agenda
• DL – Virtues & Vices
• Uncertainties
• Bayesian Inference
• BNN
• Langevin Dynamics
• PyTorch Optimizer
WHAT DL
DOESN’T DO WELL
A DL Scenario
• We train a CNN to identify images (men versus women)
• Our accuracy can be amazing (98-99%)
Let’s get Cruel
• We offer the model an image of a basketball
• The model outputs “man” or “woman”
Why is that?
Mathematical Observation
We trained a function F such that
F : {space of images}->{“man”, ”woman”}
Statistical Observation
The basketball image lies outside our training data
Anecdotes
Vision (Uri Itai)
• Researchers trained a network to classify tanks versus trees.
• On 200 images, the test accuracy was 100%.
• In a Pentagon test, the model's performance degraded significantly.
The reason: all the tank images had been taken on cloudy days, whereas the tree images had been taken on sunny days.
Text
• In text problems, models often focus on individual words rather than extracting latent phenomena
Why Can’t we handle this ?
DL Training’s Vices
In practice
A NN can't say “I don’t know”
A NN can't explain how it reached its decision
• Models cannot report their uncertainties
• Pointwise training may lead to overfitting
Is it crucial ?
• Medical: an engine that decides whether a tumor is malignant or benign
• Autonomous vehicles: actions are taken based on a DL model's output
• High-frequency trading
A potential solution - Run Inference with dropout
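Running inference with dropout can be sketched as follows. This is a minimal illustration, assuming a tiny one-input network whose weights (`HIDDEN`, `OUTPUT`) are made up for the example; it is not taken from the talk. Dropout stays switched on at prediction time, the same input is passed through many times, and the spread of the outputs serves as a rough uncertainty signal.

```python
import random
from statistics import mean, pstdev

# Hypothetical fixed weights of a tiny 1-input, 4-hidden-unit, 1-output network.
HIDDEN = [0.5, -1.0, 2.0, 1.5]
OUTPUT = [1.0, 0.3, -0.7, 0.4]

def forward_with_dropout(x, hidden_weights, output_weights, drop_prob, rng):
    """One stochastic forward pass with dropout applied to the hidden layer."""
    hidden = [max(0.0, w * x) for w in hidden_weights]  # ReLU activations
    keep = 1.0 - drop_prob
    # Dropout stays ON at inference: keep each unit with probability `keep`
    # and rescale the survivors so the expected activation is unchanged.
    hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    return sum(h * v for h, v in zip(hidden, output_weights))

def mc_dropout_predict(x, hidden_weights, output_weights,
                       drop_prob=0.5, passes=200, seed=0):
    """Return the mean and standard deviation over repeated stochastic passes."""
    rng = random.Random(seed)
    outputs = [
        forward_with_dropout(x, hidden_weights, output_weights, drop_prob, rng)
        for _ in range(passes)
    ]
    return mean(outputs), pstdev(outputs)

mean_out, std_out = mc_dropout_predict(1.0, HIDDEN, OUTPUT)
print(mean_out, std_out)
```

A large standard deviation relative to the mean is a hint that the prediction should not be trusted; with `drop_prob=0.0` the passes coincide and the spread vanishes.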
STATISTICS
UNCERTAINTY
TYPES
Epistemic Uncertainty
Episteme= Knowledge
Uncertainty that can be reduced:
• Improving model’s structure
• Adding data
Aleatoric Uncertainty :
Aleator = Dice Player
Uncertainty that we cannot reduce:
• Stochasticity of a die
• Noisy Labels
STATISTICAL
INFERENCE
• Evidence – The observed data
• Hypothesis – The latent variables
• P(Z|X) – Posterior dist.
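Important Terms
$P(\theta \mid X) = \dfrac{P_r(\theta)\, P_l(X \mid \theta)}{P(X)}$ (posterior dist.)
• We wish to compute the posterior, up to normalization:
$P(\theta \mid X) \propto P_l(X \mid \theta)\, P_r(\theta)$
(the posterior is proportional to the product of the prior and the likelihood)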
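A small worked example may help make the proportionality concrete. The sketch below assumes a hypothetical coin-flip problem (7 heads, 3 tails) and a flat prior, neither of which comes from the talk. It evaluates prior times likelihood on a grid of candidate values of θ and normalizes by their sum, which plays the role of P(X).

```python
def grid_posterior(heads, tails, grid_size=101):
    """Return (thetas, posterior) with posterior proportional to prior * likelihood."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 for _ in thetas]                       # flat prior (an assumption)
    likelihood = [t ** heads * (1 - t) ** tails for t in thetas]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)                        # plays the role of P(X)
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior

thetas, posterior = grid_posterior(heads=7, tails=3)
# Index of the grid point with the highest posterior mass.
best = max(range(len(thetas)), key=lambda i: posterior[i])
print(thetas[best])
```

The normalizer is only needed to turn the scores into probabilities; the location of the peak is already determined by the unnormalized product.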
Statistical Inference Comparison
FREQUENTIST
• No prior knowledge
• Parameters are unknown but fixed: they carry no probability
• Only the data is a random variable (r.v.)
• Training: MLE, maximize P(data | Θ)
• Neural nets are frequentist entities.
(are they?)
BAYESIAN – ART OF BELIEF
• Previous trials are used as prior knowledge
• The observed data is integrated into the parameters' distribution
• Parameters have a probability distribution, which we learn
• Training: MAP, maximize P(Θ | data)
Need a Bayesian analogue
Frequentist: MLE
Bayesian: MAP
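The difference between the two estimates shows up clearly on very little data. The sketch below reuses a hypothetical coin-flip setting with a Beta(a, b) prior; the prior parameters are illustrative assumptions. MLE maximizes P(data | Θ) alone, while MAP maximizes P(Θ | data) and is therefore pulled toward the prior.

```python
def mle_estimate(heads, n):
    """Maximizer of P(data | theta) for n coin flips with `heads` successes."""
    return heads / n

def map_estimate(heads, n, a=2.0, b=2.0):
    """Mode of the Beta(a + heads, b + n - heads) posterior.

    Valid when both posterior parameters exceed 1 (true for a = b = 2).
    """
    return (heads + a - 1) / (n + a + b - 2)

# Three flips, all heads: MLE commits fully, MAP stays more cautious.
print(mle_estimate(3, 3), map_estimate(3, 3))
```

Both are still point estimates; a full Bayesian treatment keeps the whole posterior rather than only its mode.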
BAYESIAN NEURAL NETWORKS
(BNN)
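DL - Pointwise Learning
[Figure: network diagram with nodes x4, x5, x6, x7, u1, u2, u3, z0]
BNN – Posterior Training (Θ-training)
[Figure: the same network diagram, with each weight drawn from a distribution: $w_i \sim \varphi_\Theta$]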
UNCERTAINTY
ESTIMATION
BNN- Prediction
Predictive probability given the training data D
Perform dropout during inference
Uncertainty Estimators
We calculate the variance of the prediction:
• The first term is the epistemic uncertainty: the variance of the means
• The second term is the aleatoric uncertainty: the mean of the variances
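The two terms can be computed directly once several networks have been sampled. The sketch below assumes each sampled network returns a predicted mean and a predicted variance for the same input; the numbers are invented for illustration. Summing the variance of the means and the mean of the variances follows the law of total variance.

```python
from statistics import mean, pvariance

def decompose_uncertainty(means, variances):
    """Split predictive variance into epistemic and aleatoric parts.

    means[t] and variances[t] are the outputs of the t-th sampled network.
    """
    epistemic = pvariance(means)   # variance of the means
    aleatoric = mean(variances)    # mean of the variances
    return epistemic, aleatoric, epistemic + aleatoric

# Hypothetical outputs of three sampled networks for one input.
epistemic, aleatoric, total = decompose_uncertainty(
    means=[0.2, 0.4, 0.6],
    variances=[0.05, 0.05, 0.05],
)
print(epistemic, aleatoric, total)
```

If all sampled networks agree on the mean, the epistemic term is zero and only the noise the networks themselves predict remains.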
BNN EXAMPLE
https://towardsdatascience.com/making-your-neural-network-say-i-dont-know-bayesian-nns-using-pyro-and-pytorch-b1c24e6ab8cd
Problem’s Framework
• MNIST CNN model
SOTA ~99.8%
Experiment Settings
• Vanilla trial: accuracy ~88%
BNN Training
• We use VI with a small number of epochs
For each image in the test set:
• Sample 100 networks
• We obtain 100 outputs per image
• We have 10 digits, each with 100 scores
• If the median of a digit's 100 scores is > 0.2, we accept that digit
(Indeed, we can accept more than a single result)
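The decision rule above can be sketched in a few lines. The function name and the sample scores are hypothetical; only the median test and the 0.2 threshold come from the slide. An empty result means the network refuses to answer, and more than one digit may pass the test.

```python
from statistics import median

def accept_digits(scores_per_digit, threshold=0.2):
    """Return the digits whose median score over the sampled networks exceeds the threshold.

    scores_per_digit[d] holds the scores digit d received from each sampled network.
    An empty list means "I don't know".
    """
    return [
        digit
        for digit, scores in enumerate(scores_per_digit)
        if median(scores) > threshold
    ]

# Hypothetical scores: 100 sampled networks, confident about digit 3 only.
confident = [[0.01] * 100 for _ in range(10)]
confident[3] = [0.9] * 100
print(accept_digits(confident))

# Hypothetical scores where no digit stands out: the model refuses.
unsure = [[0.1] * 100 for _ in range(10)]
print(accept_digits(unsure))
```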
Results
On 10000 images:
• The network refused to decide on 1250 images
• On the remaining 8750 images the accuracy was 96%
• On random data, the network refused to answer on 95% of the images
[Figures: random input; histogram of log-probabilities; notMNIST alphabet letters; a refused MNIST image of the digit 2]
THE MATH BEHIND
CLASSICAL BAYESIAN Q.
Random Process
A function X of the pair (w, t), where:
• w: an outcome of a draw
• t: time index
• If w is fixed, X is a function of t (a sample path; continuous for the diffusions used here)
• If t is fixed, X is a random variable
Robbins & Monro (Stochastic Approximation, 1951)
• An unknown function F and a number θ; we seek $x^*$ that satisfies:
$F(x^*) = \theta$
• Y: a measurable r.v. such that $E[Y(x)] = F(x)$
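The Robbins & Monro iteration for this setting can be sketched as below. The noisy measurement `noisy_line` is a hypothetical stand-in for Y(x), with F(x) = 2x + 1 chosen only for illustration, and the update direction assumes F is increasing. The step sizes 1/n are a common choice whose sum diverges while the sum of their squares converges.

```python
import random

def robbins_monro(noisy_measurement, target, x0, n_steps, seed=0):
    """Search for x* with F(x*) = target using only noisy readings of F."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        step = 1.0 / n  # steps sum to infinity, squared steps sum to a finite value
        # Move against the measured error; assumes F is increasing in x.
        x = x - step * (noisy_measurement(x, rng) - target)
    return x

def noisy_line(x, rng):
    """Hypothetical Y(x): an unbiased noisy reading of F(x) = 2x + 1."""
    return 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)

# F(x*) = 5 is solved by x* = 2.
x_star = robbins_monro(noisy_line, target=5.0, x0=0.0, n_steps=20000)
print(x_star)
```

SGD has the same structure: the mini-batch gradient is the noisy reading, and the target value is zero.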
How Do We Obtain the Posterior?
1. Variational Inference
2. MCMC sampling (Metropolis-Hastings, Gibbs)
3. HMC
4. SGLD
Metropolis-Hastings
MH Properties
• Unbiased, but with huge variance and very slow (every step iterates over the entire data)
• Great History
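A minimal random-walk Metropolis-Hastings sampler is sketched below, assuming a standard normal target chosen only for illustration. Each step evaluates the target at the proposal; when the target is a posterior, that evaluation is a pass over the full dataset, which is where the slowness comes from.

```python
import math
import random
from statistics import mean, pvariance

def metropolis_hastings(log_target, x0, n_steps, proposal_std=1.0, seed=0):
    """Random-walk MH with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, proposal_std)
        logp_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples

# Hypothetical target: standard normal, log-density known up to a constant.
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
kept = chain[1000:]  # drop a short burn-in
print(mean(kept), pvariance(kept))
```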
HMC (Duane 1987, Neal 1995)
• Faster than MH
• It can reach low-density regions
• No mini-batches: it needs many gradient evaluations and many accept-reject steps
SGLD
STOCHASTIC GRADIENT
LANGEVIN DYNAMICS
THE NERDS’ PART
Objectives
Numeric as DL, stochastic as Bayes
• We want a tool that:
• Allows mini-batches
• Is good at finding distributions for sampling weights
• Reduces overfitting
Physics
The Langevin equation describes the motion of a pollen grain in water.
F ~ N(0, t): the Brownian force, caused by collisions with molecules
This equation is an SDE: its solution is a random process
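Overdamped Eq & ML (W & T 2011)
$F(x) - \gamma v_t + \xi_t = 0$, $\xi_t \sim N(0, t)$ (Brownian dynamics, molecular dynamics)
$F(x) = \nabla E(x)$, $v_t = \dfrac{dX}{dt}$
Discretization
$x_{t+1} = x_t + dt\,(\nabla E(x_t) + \xi_t)$
$x_{t+1} = x_t + \epsilon_t \nabla E(x_t) + \varepsilon_t$, where $\epsilon_t$ is the step size and $\varepsilon_t$ is the stochastic term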
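The discretized dynamics can be tried on a toy problem. The sketch below makes two assumptions that are not stated on the slide: the drift is the gradient of the log-density of the distribution we want to sample (so the walk is pushed toward high-probability regions), and the noise variance equals the step size while the drift is scaled by half the step size, which is the usual unadjusted Langevin scaling. The target, a unit-variance Gaussian centered at 1, is chosen only for illustration.

```python
import random
from statistics import mean, pvariance

def langevin_samples(grad_log_density, x0, step_size, n_steps, seed=0):
    """Euler discretization of overdamped Langevin dynamics (no accept-reject step)."""
    rng = random.Random(seed)
    x = x0
    path = []
    for _ in range(n_steps):
        drift = 0.5 * step_size * grad_log_density(x)  # deterministic gradient step
        noise = rng.gauss(0.0, step_size ** 0.5)       # Gaussian noise, variance = step size
        x = x + drift + noise
        path.append(x)
    return path

# Hypothetical target: N(1, 1), whose log-density has gradient -(x - 1).
path = langevin_samples(lambda x: -(x - 1.0), x0=0.0, step_size=0.1, n_steps=50000)
kept = path[1000:]  # drop a short burn-in
print(mean(kept), pvariance(kept))
```

Without the noise term the iteration is plain gradient ascent and collapses to the mode; with it, the iterates spread out according to the target, up to a small bias from the finite step size.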
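SGD: Let’s Batch! (Welling & Teh 2011)
Denote $\nabla E_{mb} + u_t = \nabla E$, with $u_t \sim N(0, V)$ and $V$ bounded ($mb$ = mini-batch)
$x_{t+1} = x_t + \epsilon_t(\nabla E_{mb} + u_t) + \varepsilon_t$
• Ignore the stochastic term $\varepsilon_t$
$P(\theta \mid X) = e^{-E(x)}$
Robbins & Monro -> MAP solution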
The Langevin Term $\varepsilon_t$
• We wish to avoid collapsing to the MAP, as we want to explore (we are Bayesians)
• We can tweak the variances for this purpose!
$\varepsilon_t \sim N(0, \sigma)$
What is σ?
We need it to create a bigger variance than the SGD term
If the SGD term's variance scales as the square of the learning rate (LR), we can take σ = LR
The Solution of W & T
• Welling & Teh 2011
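The update rule shown on this slide did not survive extraction. For reference, the rule as published in Welling & Teh (2011) is reproduced here in that paper's notation, which differs from the symbols used on the neighboring slides: $\theta_t$ are the parameters, $N$ is the dataset size, $n$ the mini-batch size, $\epsilon_t$ the step size and $\eta_t$ the injected noise.

```latex
\Delta\theta_t = \frac{\epsilon_t}{2}
  \left( \nabla \log p(\theta_t)
       + \frac{N}{n} \sum_{i=1}^{n} \nabla \log p(x_{t_i} \mid \theta_t) \right)
  + \eta_t,
\qquad \eta_t \sim N(0, \epsilon_t),
\qquad \sum_{t} \epsilon_t = \infty, \quad \sum_{t} \epsilon_t^2 < \infty
```

The injected noise has variance $\epsilon_t$ while the mini-batch gradient noise enters with variance of order $\epsilon_t^2$, so as the step size decays the injected noise dominates and the iterates behave like Langevin samples rather than SGD iterates.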
Yarin Gal (2015): BNN
https://javierantoran.github.io/assets/pdf/poster_advml.pdf
The Associated Langevin
$\dot{w} = F(w) + \xi_t$ (overdamped Langevin on $w$)
$\varepsilon_t \sim N(0, \sigma)$
$F(w) = \nabla E(w)$
SGLD EXAMPLE
https://henripal.github.io/blog/langevin
SGLD-Optimizer
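The optimizer code shown on this slide was an image and is not reproduced here. As a stand-in, the following is a toy SGLD loop in plain Python, not the linked implementation. It assumes a hypothetical one-parameter model: data drawn from a unit-variance Gaussian with unknown mean, a N(0, 10²) prior, and a constant step size. Each step uses a mini-batch gradient rescaled by N/n, plus Gaussian noise whose variance equals the step size.

```python
import random
from statistics import mean, pstdev

def sgld_gaussian_mean(data, batch_size=50, step_size=2e-5, n_steps=20000,
                       prior_std=10.0, seed=0):
    """Toy SGLD: sample the posterior of the mean of unit-variance Gaussian data."""
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    trace = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        grad_log_prior = -theta / prior_std ** 2
        grad_log_lik = sum(x - theta for x in batch)  # unit-variance likelihood
        # Rescale the mini-batch gradient so it estimates the full-data gradient.
        drift = grad_log_prior + (n_total / batch_size) * grad_log_lik
        noise = rng.gauss(0.0, step_size ** 0.5)      # variance equals the step size
        theta += 0.5 * step_size * drift + noise
        trace.append(theta)
    return trace

# Hypothetical dataset: 1000 draws from N(1.5, 1).
data_rng = random.Random(1)
data = [data_rng.gauss(1.5, 1.0) for _ in range(1000)]
trace = sgld_gaussian_mean(data)[2000:]  # drop burn-in

# This conjugate model has a closed-form posterior to compare against.
posterior_precision = 1 / 10.0 ** 2 + len(data)
analytic_mean = sum(data) / posterior_precision
analytic_std = posterior_precision ** -0.5
print(mean(trace), pstdev(trace), analytic_mean, analytic_std)
```

In a PyTorch optimizer the same two ingredients appear per parameter tensor: the usual gradient step and an added Gaussian noise tensor scaled by the square root of the learning rate. With a constant step size the mini-batch noise slightly inflates the sampled spread.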
notMNIST: measuring the predicted probabilities (model trained on MNIST)
https://github.com/henripal/sgld/blob/master/nbs/mnist.ipynb
Weight variance: CNN on MNIST
Bibliography
• https://towardsdatascience.com/making-your-neural-network-say-i-dont-know-bayesian-nns-using-pyro-and-pytorch-b1c24e6ab8cd
• https://henripal.github.io/blog/langevin
• https://github.com/henripal/sgld/blob/master/sgld/sgld/sgld_optimizer.py
• https://github.com/henripal/sgld/blob/master/sgld/sgld/eval.py
• https://github.com/noahgolmant/SGLD
• https://d1.awsstatic.com/APG/quantifying-uncertainty-in-deep-learning-systems.pdf
• https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.446.9306&rep=rep1&type=pdf
• https://proceedings.neurips.cc/paper/2017/file/2650d6089a6d640c5e85b2b88265dc2b-Paper.pdf
• https://arxiv.org/pdf/1710.07283.pdf
• https://javierantoran.github.io/assets/pdf/poster_advml.pdf
• https://www.cs.toronto.edu/~duvenaud/distill_bayes_net/public/
• http://physics.gu.se/~frtbm/joomla/media/mydocs/LennartSjogren/kap6.pdf
THANK YOU
