SGLD
STOCHASTIC GRADIENT LANGEVIN DYNAMICS
Natan.katz@gmail.com
natank@checkpoint.com
Natan Katz
• Algorithm researcher since my military service
• M.Sc. in applied mathematics, Weizmann Institute
• Spent one year at Goethe University with Prof. Kloeden
• Official author at TDS
• More than 10 patents
• Currently at Checkpoint
• Fanatic fan of the Celtics and SGE – so so
Coordinates
• Natan.katz@gmail.com
• Natank@checkpoint.com
• https://www.linkedin.com/in/natan-katz-2936425/
Agenda
• DL – Virtues & Vices
• Uncertainties
• Bayesian Inference
• BNN
• Langevin Dynamics
• PyTorch Optimizer
WHAT DL
DOESN’T DO WELL
A DL Scenario
• We train a CNN to identify images (men versus women)
• Our accuracy can be amazing (98-99%)
Let’s get Cruel
• We offer the model an image of a basketball
• The model outputs “man” or “woman”
Why is that?
Mathematical Observation
We trained a function F such that
F : {space of images}->{“man”, ”woman”}
Statistical Observation
A basketball image lies outside our training data
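A minimal sketch of why the model still answers: a softmax over two labels has to put its probability mass somewhere, so the classifier cannot return "none of the above". The logits below are made up for illustration; `classify` and `LABELS` are hypothetical names, not part of the talk's code.

```python
import math

LABELS = ["man", "woman"]  # the only outputs F can produce


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most probable label; there is no 'none of the above'."""
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


# Hypothetical logits a network might emit for a basketball image.
basketball_logits = [2.3, -1.1]
label, confidence = classify(basketball_logits)
print(label, round(confidence, 3))
```

Whatever the input is, the output is one of the two labels, often with high confidence.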
Anecdotes
Vision (Uri Itai)
• Researchers trained a network to classify tanks and trees.
• For 200 images, the test accuracy was 100%.
• In a Pentagon test, the model's performance dropped significantly.
The reason was that all the tank images were taken on cloudy days, whereas the tree images were taken on sunny days.
Text
• In text problems, models often focus on individual words rather than extracting latent phenomena
Why can’t we handle this?
DL Training’s Vices
In practice
• A NN can't say “I don’t know”
• A NN can't explain how it reached its decision
• Models cannot report their uncertainties
• Pointwise training may lead to overfitting
Is it crucial?
• Medical: an engine that decides whether a tumor is malignant or benign
• Autonomous vehicles: actions taken based on a DL model's output
• High-frequency trading
A potential solution: run inference with dropout
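A rough illustration of inference with dropout (often called Monte Carlo dropout): keep dropout switched on at prediction time, run many stochastic passes, and read the spread of the outputs as an uncertainty signal. The tiny one-layer network, its weights, and the function names are hypothetical, chosen only to keep the sketch self-contained.

```python
import random
import statistics


def forward_with_dropout(x, weights, p_drop, rng):
    """One stochastic pass: each hidden unit is dropped with probability p_drop."""
    keep = 1.0 - p_drop
    total = 0.0
    for w_in, w_out in weights:
        hidden = max(0.0, w_in * x)           # ReLU hidden unit
        if rng.random() < keep:               # the unit survives dropout
            total += w_out * hidden / keep    # inverted-dropout scaling
    return total


def mc_dropout_predict(x, weights, p_drop=0.5, n_samples=200, seed=0):
    """Mean and standard deviation of the output over many dropout passes."""
    rng = random.Random(seed)
    outputs = [forward_with_dropout(x, weights, p_drop, rng)
               for _ in range(n_samples)]
    return statistics.mean(outputs), statistics.stdev(outputs)


# Hypothetical "trained" weights: (input weight, output weight) per hidden unit.
toy_weights = [(0.5, 1.0), (1.5, -0.4), (-0.7, 0.8), (0.9, 0.3)]
mean_near, std_near = mc_dropout_predict(1.0, toy_weights)
mean_far, std_far = mc_dropout_predict(10.0, toy_weights)
print(std_near, std_far)
```

A single deterministic pass would return one number; the repeated passes return a distribution whose width can be thresholded to say "I don't know".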
STATISTICS
UNCERTAINTY
TYPES
Epistemic Uncertainty
Episteme= Knowledge
Uncertainty that can be reduced:
• Improving model’s structure
• Adding data
Aleatoric Uncertainty
Aleator = dice player
Uncertainty that we cannot reduce:
• Randomness of a die roll
• Noisy labels
STATISTICAL
INFERENCE
• Evidence – The observed data
• Hypothesis – The latent variables
• P(Z|X) – Posterior dist.
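Important Terms
$P(\theta \mid X) = \dfrac{P_r(\theta)\, P_l(X \mid \theta)}{P(X)}$ (posterior distribution)
• We wish to obtain the posterior up to normalization:
$P(\theta \mid X) \propto P_l(X \mid \theta)\, P_r(\theta)$
(the posterior is proportional to the product of the prior and the likelihood)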
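A small worked example of the rule above, on a grid of three candidate parameters. The coin-flip data and the three candidate biases are assumptions made for illustration only.

```python
def posterior(prior, likelihood):
    """Bayes' rule on a discrete grid: posterior is prior * likelihood, normalized."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())      # this is P(X)
    return {h: v / evidence for h, v in unnormalised.items()}


# Hypothetical setup: three candidate coin biases with a uniform prior.
thetas = [0.2, 0.5, 0.8]
prior = {t: 1 / 3 for t in thetas}

# Observed data X: 7 heads and 3 tails.
heads, tails = 7, 3
likelihood = {t: t ** heads * (1 - t) ** tails for t in thetas}

post = posterior(prior, likelihood)
for theta, p in post.items():
    print(theta, round(p, 4))
```

Note that P(X) is only a normalizing constant: the ranking of the hypotheses is already fixed by prior times likelihood.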
Statistical Inference Comparison
FREQUENTIST
• No prior knowledge
• Parameters are unknown but fixed: they carry no probability
• Only the data is a random variable (r.v.)
• Training: MLE, maximize P(data | Θ)
Neural nets are frequentist entities
(are they?)
BAYESIAN – ART OF BELIEF
• Previous trials are used as prior knowledge
• Observed data is integrated into the parameters' distribution
• Parameters have a probability distribution, which we learn
• Training: MAP, maximize P(Θ | data)
Need a Bayesian analogue
Frequentist: MLE
Bayesian: MAP
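To make the MLE versus MAP contrast concrete, here is a sketch for a coin with a Beta(a, b) prior on its bias. The Beta prior and the data (three heads in three tosses) are illustrative assumptions, not taken from the talk.

```python
def mle_bernoulli(heads, n):
    """Maximum likelihood estimate: argmax of P(data | theta)."""
    return heads / n


def map_bernoulli(heads, n, a, b):
    """MAP estimate: mode of the Beta(a + heads, b + n - heads) posterior.

    The closed form below is valid when a > 1 and b > 1.
    """
    return (heads + a - 1) / (n + a + b - 2)


heads, n = 3, 3                                   # three heads in three tosses
theta_mle = mle_bernoulli(heads, n)               # 3 / 3
theta_map = map_bernoulli(heads, n, a=2, b=2)     # (3 + 1) / (3 + 2)
print(theta_mle, theta_map)
```

With so little data the MLE claims the coin always lands heads, while the prior pulls the MAP estimate back toward 0.5. Both are still point estimates; neither carries a distribution over the parameter.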
BAYESIAN NEURAL NETWORKS
(BNN)
DL - Pointwise Learning
[Figure: network diagram with nodes x4, x5, x6, x7, u1, u2, u3, z0]
BNN – Posterior Training (Θ-Training)
[Figure: the same network diagram, with weights drawn from a distribution: w_i ~ φ(Θ)]
UNCERTAINTY
ESTIMATION
BNN - Prediction
Predictive probability given training data D
Perform dropout during inference
Uncertainty Estimators
We calculate the prediction's variance as a sum of two terms:
• The first term is the epistemic uncertainty: the variance of the means
• The second term is the aleatoric uncertainty: the mean of the variances
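A minimal sketch of the two terms, assuming each sampled network returns a predictive mean and a predictive variance for the same input. This is the standard law-of-total-variance split, Var(y) = Var(E[y | θ]) + E[Var(y | θ)], which matches the two bullets above; the numbers are made up for illustration.

```python
import statistics


def decompose_uncertainty(means, variances):
    """Split predictive variance into epistemic and aleatoric parts.

    means[i], variances[i]: predictive mean and variance of sampled network i.
    """
    epistemic = statistics.pvariance(means)    # variance of the means
    aleatoric = statistics.mean(variances)     # mean of the variances
    return epistemic, aleatoric, epistemic + aleatoric


# Hypothetical outputs of 5 sampled networks for a single input.
means = [0.9, 1.1, 1.0, 1.2, 0.8]
variances = [0.04, 0.05, 0.03, 0.04, 0.04]

epistemic, aleatoric, total = decompose_uncertainty(means, variances)
print(epistemic, aleatoric, total)
```

If the sampled networks disagree about the mean, the epistemic term grows; if every network individually reports a noisy output, the aleatoric term grows.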
BNN EXAMPLE
https://towardsdatascience.com/making-your-neural-network-say-i-dont-know-bayesian-nns-using-pyro-and-pytorch-b1c24e6ab8cd
Problem’s Framework
• MNIST CNN model
SOTA ~99.8%
Experiment settings
• Vanilla trial: accuracy ~88%
BNN Training
• We use VI (variational inference) with a small number of epochs
For each image in the test set:
• Sample 100 networks
• We obtain 100 outputs per image
• For each of the 10 digits we have 100 scores
• If the median of a digit's 100 scores is > 0.2, we accept that digit
(Indeed, we can accept more than a single result)
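The decision rule above can be sketched as follows. The scores are synthetic stand-ins for the outputs of 100 sampled networks; `decide` is a hypothetical helper name, and only the 0.2 median threshold comes from the slide.

```python
import random
import statistics


def decide(scores_per_digit, threshold=0.2):
    """scores_per_digit[d] holds one score per sampled network for digit d.

    Returns every digit whose median score exceeds the threshold.
    An empty list means the model refuses to answer.
    """
    return [d for d, scores in enumerate(scores_per_digit)
            if statistics.median(scores) > threshold]


rng = random.Random(0)

# Hypothetical scores for a clear image of a 3: only digit 3 scores high.
confident = [[rng.uniform(0.0, 0.05) for _ in range(100)] for _ in range(10)]
confident[3] = [rng.uniform(0.7, 0.95) for _ in range(100)]

# Hypothetical scores for an out-of-distribution image: no digit stands out.
unsure = [[rng.uniform(0.0, 0.15) for _ in range(100)] for _ in range(10)]

print(decide(confident))
print(decide(unsure))
```

Using the median rather than the mean makes the rule robust to a few sampled networks that happen to be very confident.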
Results
On 10000 images:
• The network refused to decide on 1250 images
• On the rest of 8750 images the accuracy was 96%
• On random data, the network refused to answer on 95% of the images
[Figure: random data, histogram of log prob]
[Figure: notMNIST alphabet letters]
[Figure: refused image, MNIST 2]
THE MATH BEHIND
CLASSICAL BAYESIAN Q.
Random Process
A function X on the pair (w, t), where:
• w is an outcome of a draw
• t is the time index
• If w is fixed, X is a continuous function of t
• If t is fixed, X is a random variable
Robbins & Monro (Stochastic Approximation, 1951)
• An unknown function F and a number θ satisfy $F(x^*) = \theta$
• Y is a measurable r.v. such that $E[Y(x)] = F(x)$
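A sketch of the classic Robbins-Monro iteration for this setting, x_{n+1} = x_n - a_n (Y(x_n) - θ), with step sizes a_n = 1/n (so the sum of a_n diverges while the sum of a_n squared converges). The linear F and the noise level are assumptions chosen so that the true root is known.

```python
import random


def robbins_monro(noisy_f, target, x0, n_steps=5000, seed=0):
    """Find x* with F(x*) = target using only noisy evaluations Y(x)."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n                          # decaying step size
        x -= a_n * (noisy_f(x, rng) - target)  # move against the noisy residual
    return x


# Hypothetical increasing function F(x) = 2x + 1, seen through unit Gaussian noise.
def noisy_f(x, rng):
    return 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)


x_star = robbins_monro(noisy_f, target=5.0, x0=0.0)   # the true root is x = 2
print(x_star)
```

SGD has the same shape: the minibatch gradient plays the role of Y, and the target is a zero gradient.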
How do we obtain the posterior?
1. Variational Inference
2. MCMC sampling (Metropolis-Hastings, Gibbs)
3. HMC
4. SGLD
Metropolis-Hastings
MH Properties
• Unbiased, but with huge variance and very slow (each step iterates over the entire dataset)
• Great history
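For reference, a minimal random-walk Metropolis-Hastings sampler. The target here is a standard normal, an assumption made to keep the sketch self-contained; in a Bayesian model every call to `log_density` would touch the whole dataset, which is the cost mentioned above.

```python
import math
import random


def metropolis_hastings(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk MH with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)                      # a rejected step repeats x
    return samples


def log_std_normal(x):
    """Log density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


samples = metropolis_hastings(log_std_normal, x0=0.0, n_samples=20000)
mh_mean = sum(samples) / len(samples)
mh_var = sum((s - mh_mean) ** 2 for s in samples) / len(samples)
print(mh_mean, mh_var)
```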
HMC (Duane 1987, Neal 1995)
• Faster than MH
• It reaches low-density regions
• No minibatches: it needs many gradient computations and many accept-reject steps
SGLD
STOCHASTIC GRADIENT
LANGEVIN DYNAMICS
THE NERDS’ PART
Objectives
Numeric as DL, stochastic as Bayes
• We want a tool that:
• Allows mini-batches
• Is good at finding distributions for sampling weights
• Reduces overfitting
Physics
The Langevin equation describes the motion of a pollen grain in water:
$F \sim N(0, t)$: the Brownian force, caused by collisions with molecules
This equation is an SDE: its solution is a random process
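Overdamped Eq & ML (W & T 2011)
$F(x) - \gamma v_t + \xi_t = 0, \quad \xi_t \sim N(0, t)$ (Brownian dynamics, molecular dynamics)
$F(x) = \nabla E(x), \quad v_t = \dfrac{dX}{dt}$
Discretization
$x_{t+1} = x_t + dt\,(\nabla E(x_t) + \xi_t)$
$x_{t+1} = x_t + \epsilon_t \nabla E(x_t) + \varepsilon_t$, where $\varepsilon_t$ is the stochastic term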
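A sketch of the discretized dynamics as a sampler, taking E to be the log of the target density so that the drift ∇E pushes toward high probability. The noise scale sqrt(2 * step) is the usual Euler-Maruyama choice for this equation and is an assumption beyond what the slide states; the standard-normal target is also just an example.

```python
import math
import random


def langevin_chain(grad_log_density, x0, step, n_steps, seed=0):
    """Discretized overdamped Langevin: drift along the gradient plus Gaussian noise."""
    rng = random.Random(seed)
    x = x0
    path = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, math.sqrt(2.0 * step))   # stochastic term
        x = x + step * grad_log_density(x) + noise      # gradient step + noise
        path.append(x)
    return path


def grad_log_std_normal(x):
    """Gradient of log N(0, 1): d/dx (-x^2 / 2) = -x."""
    return -x


path = langevin_chain(grad_log_std_normal, x0=3.0, step=0.05, n_steps=50000)
kept = path[1000:]                                       # drop the burn-in
ld_mean = sum(kept) / len(kept)
ld_var = sum((s - ld_mean) ** 2 for s in kept) / len(kept)
print(ld_mean, ld_var)
```

Without the noise term the chain is plain gradient ascent and collapses onto the mode; with it, the iterates spread out according to the target density (up to a small discretization bias).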
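SGD - Let's Batch!! (Welling & Teh 2011)
Denote $\nabla E_{mb} + u_t = \nabla E$, with $u_t \sim N(0, V)$ and V bounded
$x_{t+1} = x_t + \epsilon_t(\nabla E_{mb} + u_t) + \varepsilon_t$
• Ignore the stochastic term $\varepsilon_t$: what remains is SGD
$P(\theta \mid X) \propto e^{E(x)}$ (E is the log-posterior up to a constant)
R & M -> MAP solution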
The Langevin Term $\varepsilon_t$
• We wish to avoid collapsing to the MAP solution, since we want to explore the posterior (we are Bayesians)
• We can tune the variances for this purpose:
$\varepsilon_t \sim N(0, \sigma)$
What is σ?
We need the injected noise to have a bigger variance than the SGD term
If the SGD term's variance scales as the learning rate (LR) squared, we can take σ = LR
The Solution of W & T
• Welling & Teh 2011
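The update from Welling & Teh (2011) is Δθ = (ε/2) (∇log p(θ) + (N/n) Σ ∇log p(x_i | θ)) + η with η ~ N(0, ε), where the sum runs over a minibatch of size n out of N data points. Below is a sketch on a toy problem: inferring the mean of a unit-variance Gaussian under a wide Gaussian prior. The toy model, the constant step size (the paper decays it over time), and all names are illustrative assumptions.

```python
import math
import random


def sgld_gaussian_mean(data, n_steps=20000, batch_size=10, step=1e-3,
                       prior_std=10.0, seed=0):
    """SGLD samples of theta for x_i ~ N(theta, 1), theta ~ N(0, prior_std^2)."""
    rng = random.Random(seed)
    big_n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)               # minibatch
        grad_log_prior = -theta / prior_std ** 2
        # Minibatch gradient of the log-likelihood, rescaled to the full dataset.
        grad_log_lik = (big_n / batch_size) * sum(x - theta for x in batch)
        noise = rng.gauss(0.0, math.sqrt(step))            # eta ~ N(0, step)
        theta += 0.5 * step * (grad_log_prior + grad_log_lik) + noise
        samples.append(theta)
    return samples


data_rng = random.Random(1)
data = [data_rng.gauss(1.5, 1.0) for _ in range(100)]      # synthetic data

samples = sgld_gaussian_mean(data)[2000:]                  # drop the burn-in
sgld_mean = sum(samples) / len(samples)
sgld_std = math.sqrt(sum((s - sgld_mean) ** 2 for s in samples) / len(samples))

# Exact posterior for this conjugate model, for comparison.
post_precision = len(data) + 1.0 / 10.0 ** 2
post_mean = sum(data) / post_precision
post_std = 1.0 / math.sqrt(post_precision)
print(sgld_mean, post_mean, sgld_std, post_std)
```

The injected noise has variance equal to the step size, while the minibatch gradient noise enters scaled by the step size squared, so for small steps the Langevin noise dominates. With a constant step the sample spread is somewhat wider than the exact posterior.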
Yarin Gal (2015) –BNN
https://javierantoran.github.io/assets/pdf/poster_advml.pdf
The Associated Langevin
$\dfrac{dw}{dt} = F(w) + \xi_t$ (overdamped Langevin on w)
$\varepsilon_t \sim N(0, \sigma)$
$F(w) = \nabla E(w)$
SGLD EXAMPLE
https://henripal.github.io/blog/langevin
SGLD-Optimizer
notMNIST: measuring the predicted probability (model trained on MNIST)
https://github.com/henripal/sgld/blob/master/nbs/mnist.ipynb
Weights' variance: CNN on MNIST
Bibliography
• https://towardsdatascience.com/making-your-neural-network-say-i-dont-know-bayesian-nns-using-pyro-and-pytorch-b1c24e6ab8cd
• https://henripal.github.io/blog/langevin
• https://github.com/henripal/sgld/blob/master/sgld/sgld/sgld_optimizer.py
• https://github.com/henripal/sgld/blob/master/sgld/sgld/eval.py
• https://github.com/noahgolmant/SGLD
• https://d1.awsstatic.com/APG/quantifying-uncertainty-in-deep-learning-systems.pdf
• https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.446.9306&rep=rep1&type=pdf
• https://proceedings.neurips.cc/paper/2017/file/2650d6089a6d640c5e85b2b88265dc2b-Paper.pdf
• https://arxiv.org/pdf/1710.07283.pdf
• https://javierantoran.github.io/assets/pdf/poster_advml.pdf
• https://www.cs.toronto.edu/~duvenaud/distill_bayes_net/public/
• http://physics.gu.se/~frtbm/joomla/media/mydocs/LennartSjogren/kap6.pdf
THANK YOU

SGLD Berlin ML GROUP
