Replica Exchange for Multiple-Environment Reinforcement Learning
Author: Dmitri Glusco
Supervisor: Dr. Mykola Maksymenko
Contents
● Introduction to Reinforcement Learning
● Exploration/Exploitation dilemma
● Related work
● Solution
● Evaluation
● Conclusions
● Future work
● Review discussion
Introduction to Reinforcement Learning
https://www.youtube.com/playlist?list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT
Introduction to Reinforcement Learning: Markov Decision Process
https://en.wikipedia.org/wiki/Markov_decision_process
Return
State-value function
Action-value function
Bellman Optimality Equation
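For reference, the four quantities named on these slides have the following standard textbook definitions. The notation is an assumption, since the slide formulas did not survive in this transcript: \gamma is the discount factor, R_t the reward at step t, \pi the policy, and p the environment's transition distribution.

```latex
\begin{align}
G_t &= \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma \le 1 \\
v_\pi(s) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s \right] \\
q_\pi(s, a) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right] \\
q_*(s, a) &= \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \max_{a'} q_*(s', a') \right]
\end{align}
```

The last line is the Bellman optimality equation for the action-value function: the optimal value of taking action a in state s equals the expected immediate reward plus the discounted value of acting optimally afterwards.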
Exploration/Exploitation dilemma
http://dx.doi.org/10.1039/B509983H
● Epsilon-greedy
● Softmax
● Entropy loss
● Regularization
● Dropout
● Noise
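The first two items in the list above are action-selection rules, and a minimal sketch may make the difference concrete. The function names and the list-of-Q-values interface are illustrative assumptions, not the code used in this work.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature=1.0):
    """Boltzmann distribution over actions; a higher temperature flattens it."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample an action in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Epsilon-greedy explores uniformly regardless of how bad an action looks, whereas softmax explores in proportion to the estimated values, so clearly poor actions are tried less often.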
Related work
● “Assessing Generalization in Deep Reinforcement Learning”
● “Population Based Training of Neural Networks”
● “Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space”
Solution
Noise
Agent
DQN, A2C
Metropolis-Hastings Exchange
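A minimal sketch of a Metropolis-Hastings exchange step, written in the standard parallel-tempering form. The mapping to this setup is an assumption made only for illustration: the noise level plays the role of a temperature (inverse temperature 1 / noise), and the energy of a replica is the negative of its recent score. The dictionary keys are hypothetical.

```python
import math
import random


def swap_probability(beta_i, energy_i, beta_j, energy_j):
    """Metropolis-Hastings acceptance probability for swapping two replicas."""
    log_ratio = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def maybe_swap(replicas, i, j, rng=random):
    """Swap the agents held at noise levels i and j with the MH probability.

    Each replica is a dict with hypothetical keys:
      'noise' - noise level acting as a temperature (assumption)
      'score' - recent mean episode score; energy is taken as -score (assumption)
      'agent' - any object holding the model parameters
    """
    a, b = replicas[i], replicas[j]
    p = swap_probability(1.0 / a["noise"], -a["score"],
                         1.0 / b["noise"], -b["score"])
    if rng.random() < p:
        # The agent and its score move together; the noise level stays in place.
        a["agent"], b["agent"] = b["agent"], a["agent"]
        a["score"], b["score"] = b["score"], a["score"]
        return True
    return False
```

Under these assumptions, a high-scoring agent found at a high noise level is always moved to the lower noise level, while the reverse move is accepted only with exponentially small probability.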
Solution: Hypotheses
● Noise improves exploration
● Metropolis-Hastings replica-exchange improves RL training
Solution: Research pipeline architecture
Exchange Types
● No Exchange
● Random Exchange
● Smart Exchange
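One way the three exchange types could be organized is sketched below. The slides do not define them, so the interpretation is an assumption: "No Exchange" never swaps, "Random Exchange" swaps neighbouring replicas with a fixed probability, and "Smart Exchange" uses an acceptance probability supplied by the caller (for example a Metropolis-Hastings criterion). The mode labels, `p_random`, and `accept_prob` are hypothetical names.

```python
import random


def pairs_to_swap(n_replicas, mode, accept_prob=None, p_random=0.5, rng=random):
    """Choose which neighbouring replica pairs (i, i + 1) exchange agents.

    mode: 'none', 'random', or 'smart' (hypothetical labels for the three types).
    accept_prob: callable (i, j) -> probability in [0, 1], used only by 'smart'.
    p_random: fixed swap probability for 'random' (an assumed value).
    """
    if mode not in ("none", "random", "smart"):
        raise ValueError(f"unknown exchange mode: {mode!r}")
    if mode == "smart" and accept_prob is None:
        raise ValueError("'smart' mode needs an accept_prob callable")
    chosen = []
    for i in range(n_replicas - 1):
        if mode == "none":
            p = 0.0
        elif mode == "random":
            p = p_random
        else:
            p = accept_prob(i, i + 1)
        if rng.random() < p:
            chosen.append((i, i + 1))
    return chosen
```

The returned pairs are meant to be applied in order, so an agent can move more than one level in a single round if consecutive pairs are both chosen.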
Evaluation: A2C, No Exchange
● Noise slows learning
● Noise improves exploration
● Stable play score of 500 for agents with higher noise
Evaluation: A2C, Random Exchange
● Distance curves are combined
● Training scores are more volatile
● Play scores are slightly more stable
Evaluation: A2C, Smart Exchange
● Distance curves are combined
● Training scores with low noise are higher and more stable
● Play scores are slightly more stable
Evaluation: DQN, No Exchange
● Inverse distance picture
● High noise dominates the agent
● Decent play score only for agents with low noise
Evaluation: DQN, Random Exchange
Evaluation: DQN, Smart Exchange
● Training scores with low noise are higher and more stable
● Worse play scores
Conclusions: Hypotheses
● Noise improves exploration
● Metropolis-Hastings replica-exchange improves RL training
Future work
● Environments with higher complexity
● Different noise types
● Compare explorations (noise vs agent)
● Exploration potential of the environment
● Parallelize training between exchanges
Review discussion
C1: No reason to have Bellman’s equation in this work (Chapter 1).
A1: Bellman’s equation is a fundamental equation.
C2: Equation on page 8 lacks any explanation.
A2: Agree. Provide more context or remove the formula.
C3: Page 9 hints at some convergence results, but lacks sufficient explanation and assumptions.
A3: It is a well-known property of the detailed balance condition and Metropolis-Hastings sampling that, in the long-time limit, we efficiently sample from the underlying stationary distribution.
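The property referred to in A3 can be stated compactly. The symbols are assumptions introduced here: \pi is the stationary distribution, q the proposal distribution, A the acceptance probability, and P the resulting transition probability, with P(x \to x') = q(x' \mid x)\, A(x \to x') for x \ne x'.

```latex
\begin{align}
\pi(x)\, P(x \to x') &= \pi(x')\, P(x' \to x) \\
A(x \to x') &= \min\left(1, \frac{\pi(x')\, q(x \mid x')}{\pi(x)\, q(x' \mid x)}\right)
\end{align}
```

The first line is the detailed balance condition; the second is the Metropolis-Hastings acceptance rule, which satisfies it by construction. Detailed balance alone only guarantees that \pi is stationary. Convergence to \pi in the long-time limit additionally requires the chain to be ergodic (irreducible and aperiodic).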
C4: Why was multiplicative noise added, and not additive noise, which seems like a more natural choice for Cart Pole?
A4: Multiplicative noise takes into account the scale of the value.
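The distinction in C4/A4 can be illustrated with a short sketch. The slides do not say exactly which values the noise is applied to or which distribution is used, so Gaussian noise on a generic list of values is an assumption, and both function names are hypothetical.

```python
import random


def additive_noise(values, sigma, rng=random):
    """Add zero-mean Gaussian noise with the same spread to every component."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def multiplicative_noise(values, sigma, rng=random):
    """Scale each component by (1 + eps), eps ~ N(0, sigma^2), so the
    perturbation is proportional to the component's magnitude."""
    return [v * (1.0 + rng.gauss(0.0, sigma)) for v in values]
```

With the multiplicative form, a component of magnitude 2.0 is perturbed twice as strongly as one of magnitude 1.0, and a component that is exactly zero is left unchanged. With the additive form, every component receives a perturbation of the same typical size regardless of its scale.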
C5: The choice of DQN and A2C seems overkill and somewhat counterproductive. I would expect that policy methods would be a more natural choice, for instance, cross-entropy or natural evolution strategies.
A5: Focus on the diffusion picture.
C6: Hypothesis 1: whether noise improves exploration. I am not convinced that the charts shown are enough evidence of the first part. I would have liked to see the trajectories of the cart in the space (x, x’), which corresponds to the first two coordinates.
A6: Distance is not bound to the environment and depends only on the model parameters.
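A distance that depends only on the model parameters could be computed as sketched below. The exact metric and reference point are not given on the slides, so the Euclidean norm and the use of a fixed reference vector (for example the initial parameters) are assumptions, and the function name is hypothetical.

```python
import math


def parameter_distance(params, reference):
    """Euclidean (L2) distance between two flattened parameter vectors."""
    if len(params) != len(reference):
        raise ValueError("parameter vectors must have the same length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(params, reference)))
```

Because the inputs are only parameter vectors, the same measure can be tracked for any agent on any environment, which is the point made in A6.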
C7: Hypothesis 2: replica exchange improves training. Improving by getting a better score is meaningless. The real challenge is to justify the usefulness of a method by showing that it trains faster/cheaper than the competitors.
A7: Scores are reported as mean values together with their variance, which shows the stability of the agent. Speed and computational cost are mentioned as future work.
C8: The Appendix indeed feels like it is covering some page quota; the charts are hard to read and might even hint at some data leakage issues, as they suspiciously overlap with each other too much.
A8: Show more results for a more complete picture.
C9: There are minor grammar issues (misplaced articles), but nothing serious.
A9:
Questions?
Extras

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 

Recently uploaded (20)

Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 

RL Training with Metropolis-Hastings Exchange

  • 1. Replica Exchange For Multiple-Environment Reinforcement Learning Author: Dmitri Glusco Supervisor: Dr. Mykola Maksymenko
  • 2. Contents ● Introduction to Reinforcement Learning ● Exploration/Exploitation dilemma ● Related work ● Solution ● Evaluation ● Conclusions ● Future work ● Review discussion
  • 3. Introduction to Reinforcement Learning https://www.youtube.com/playlist?list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT
  • 4. Introduction to Reinforcement Learning: Markov Decision Process https://en.wikipedia.org/wiki/Markov_decision_process Return State-value function Action-value function Bellman Optimality Equation
  • 5. Exploration/Exploitation dilemma http://dx.doi.org/10.1039/B509983H ● Epsilon-greedy ● Softmax ● Entropy loss ● Regularization ● Dropout ● Noise
  • 6. Related work ● “Assessing Generalization in Deep Reinforcement Learning” ● “Population Based Training of Neural Networks” ● “Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space”
  • 8. Solution: Hypotheses ● Noise improves exploration ● Metropolis-Hastings replica-exchange improves RL training
  • 9. Solution: Research pipeline architecture. Exchange Types ● No Exchange ● Random Exchange ● Smart Exchange
  • 10. Evaluation: A2C, No Exchange ● Noise slows learning ● Noise improves exploration ● Stable play score of 500 for agents with higher noise
  • 11. Evaluation: A2C, Random Exchange ● Distance curves are combined ● Training scores more volatile ● Play scores a little bit more stable
  • 12. Evaluation: A2C, Smart Exchange ● Distance curves are combined ● Training scores with low noise are higher and more stable ● Play scores a little bit more stable
  • 13. Evaluation: DQN, No Exchange ● Inverse distance picture ● High noise dominates agent ● Decent play score only for agents with low noise
  • 15. Evaluation: DQN, Smart Exchange ● Training scores with low noise are higher and more stable ● Worse play scores
  • 16. Conclusions. Hypotheses ● Noise improves exploration ● Metropolis-Hastings replica-exchange improves RL training
  • 17. Future work ● Environments with higher complexity ● Different noise types ● Compare explorations (noise vs agent) ● Exploration potential of the environment ● Parallelize training between exchanges
  • 18. Review discussion C1: No reason to have Bellman’s equation in this work (Chapter 1). A1: Bellman’s equation is a fundamental equation. C2: Equation on page 8 lacks any explanation. A2: Agree. Provide more context or remove the formula. C3: Page 9 hints at some convergence results, but lacks sufficient explanation and assumptions. A3: It is a well-known property of the detailed balance condition and Metropolis-Hastings sampling that in the long time limit we efficiently sample from the underlying stationary distribution.
  • 19. Review discussion C4: Why was multiplicative noise added, and not additive noise, which seems like a more natural choice for Cart Pole? A4: Multiplicative noise takes into account the scale of the value. C5: The choice of DQN and A2C seems like overkill and somewhat counterproductive. I would expect that policy methods would be a more natural choice, for instance, cross-entropy or natural evolution strategies. A5: Focus on the diffusion picture.
  • 20. Review discussion C6: Hypothesis 1: whether noise improves exploration. I am not convinced that the charts shown are enough evidence of the first part. I would have liked to see the trajectories of the cart in the space (x, x’), which corresponds to the first two coordinates. A6: Distance is not bound to the environment and depends only on the model parameters. C7: Hypothesis 2: replica exchange improves training. Improving by getting a better score is meaningless. The real challenge is to justify the usefulness of a method by showing that it trains faster/cheaper than the competitors. A7: Scores are reported as mean values with their variance, which shows the stability of the agent. Speed/computation is mentioned as future work.
  • 21. Review discussion C8: The Appendix feels indeed like covering some page quota, charts are hard to read and might even hint at some data leakage issues, as they suspiciously overlap with each other too much. A8: Show more results for a more complete picture. C9: There are minor grammar issues (misplaced articles), but nothing serious. A9:
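
Slide 4 names the return, the state-value function, the action-value function and the Bellman Optimality Equation, but the formulas themselves did not survive in this transcript. For orientation, the standard textbook forms are sketched below. The notation (discount factor \gamma, transition model p, reward R) is an assumption taken from common RL textbooks and is not necessarily the notation used on the original slide.

```latex
% Return: discounted sum of future rewards
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

% State-value and action-value functions under a policy \pi
v_\pi(s)    = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]
q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s,\ A_t = a \right]

% Bellman optimality equations
v_*(s)    = \max_{a} \sum_{s', r} p(s', r \mid s, a)\left[ r + \gamma\, v_*(s') \right]
q_*(s, a) = \sum_{s', r} p(s', r \mid s, a)\left[ r + \gamma \max_{a'} q_*(s', a') \right]
```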

Editor's Notes

  1. Hello, my name is Dmitrii Glushko, my supervisor is Dr. Mykola Maksymenko, and I want to present my master's thesis, which is called Replica Exchange For ...
  2. Here are the contents of my presentation: we will briefly go through an introduction to RL, describe the chosen problem, look at the related work, show the solution and its evaluation, draw conclusions, and discuss future work and the review.
  3. Let’s briefly describe the key concepts of RL. We have an environment and an agent. We can control only the agent, so that is what we train. The agent interacts with the environment by choosing one of the available actions, observes the environment’s state, and receives a reward. The main idea is to maximize the obtained reward.
  4. An RL environment is usually described as a Markov decision process (MDP): a set of states, actions, transitions between them, and rewards. The total reward is called the “return”. A few more notions: the policy is the behavior of the agent (which action it chooses in a given state); the state-value function is the expected return from being in a given state; the action-value function is the expected return from choosing a specific action in a given state; v* and q* are the optimal (maximal) state-value and action-value functions. The foundation of all RL algorithms is the Bellman Optimality Equation for MDPs, which shows that to obtain optimal values we need to choose actions that lead to the optimal value at the next state.
  5. Exploration is the agent’s behavior of choosing actions in order to find state-action combinations it has not seen before. Exploitation is choosing the action currently believed to be optimal. This dilemma can prevent us from finding the best agent, and the result depends heavily on the initial state of the agent, which leads to poor stability of agents in RL. To solve the problem we need to find the optimal balance, efficiently exploring the promising parts of the space.
  6. We reviewed many related papers, but let’s talk about the ones we use to build our own solution. Noise. PBT: hyperparameter optimization. PBT is a technique that trains multiple agents at the same time to find the optimal setup. It is based on a genetic algorithm: copy the best agents, or explore by randomly changing parameters. Hyperparameter optimization: noise flattens the loss function, which helps avoid getting stuck in a local minimum. Hyperparameters are changed by the Metropolis-Hastings exchange rule, which guarantees that in the long run there is no dependence on the initial state (the whole hyperparameter space is explored).
  7. Noise is used as in the “Assessing Generalization” paper, but without the sunblaze environments. Multiple agents and environments are set up to train in parallel, as in PBT, but with a different approach to communication. The Metropolis-Hastings exchange rule is used as in “Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space”, with an adapted exchange rule.
  8. Improves training in terms of having stable best results across different experiments
  9. Appendix slides
  10. Slide for hypotheses Noise makes landscape smoother Metropolis Hastings rule exchange
  11. Slide for hypotheses Noise makes landscape smoother Metropolis Hastings rule exchange
  12. Slide for hypotheses Noise makes landscape smoother Metropolis Hastings rule exchange
  13. Slide for hypotheses Noise makes landscape smoother Metropolis Hastings rule exchange
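
The notes above refer to a Metropolis-Hastings exchange rule between agents without stating it. As a rough illustration only, the sketch below uses the standard replica-exchange (parallel tempering) acceptance probability min(1, exp((beta_i - beta_j) * (E_i - E_j))). The thesis mentions an adapted rule that is not given here, so everything in this block is an assumption: the names `swap_probability` and `try_exchange`, the use of an "energy" (for example a loss or a negative score) per replica, and the mapping from noise level to an inverse temperature `beta`.

```python
import math
import random


def swap_probability(energy_i, energy_j, beta_i, beta_j):
    """Standard replica-exchange acceptance probability.

    Assumption: each replica has a scalar energy (lower is better) and a
    fixed inverse temperature beta (a higher beta means less noise).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # min(1, exp(delta)) without overflowing for large positive delta
    return 1.0 if delta >= 0.0 else math.exp(delta)


def try_exchange(replicas, i, j, rng):
    """Swap the parameters of replicas i and j with the probability above.

    Each replica is a dict with hypothetical keys 'beta', 'energy' and
    'params'. The beta stays with the slot; params and energy move.
    Returns True if the swap was accepted.
    """
    a, b = replicas[i], replicas[j]
    p = swap_probability(a["energy"], b["energy"], a["beta"], b["beta"])
    if rng.random() < p:
        a["params"], b["params"] = b["params"], a["params"]
        a["energy"], b["energy"] = b["energy"], a["energy"]
        return True
    return False


if __name__ == "__main__":
    rng = random.Random(0)
    replicas = [
        {"beta": 1.0, "energy": 2.0, "params": "low-noise agent"},
        {"beta": 0.5, "energy": 1.0, "params": "high-noise agent"},
    ]
    accepted = try_exchange(replicas, 0, 1, rng)
    print(accepted, replicas[0]["params"])
```

In this toy run the low-noise slot starts with the worse energy, so the swap probability is 1 and the better parameters move into the low-noise slot. When the low-noise slot already holds the better replica, the swap is accepted only with probability exp(delta) < 1, which is what distinguishes this rule from a purely random exchange.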