Reinforcement learning (RL) involves trial and error, rewarding actions, and remembering past experiences over time. This technique is used when building sequential decision-making solutions such as self-driving cars, video games, or personalized content recommendations. However, building reinforcement learning models comes with challenges: it can take a long time for the system to learn, and reaching high accuracy is difficult. In this session, we'll explore different reinforcement learning solutions: how to implement relevant user experiences that improve over time based on behavior, using a pre-built API; and how to build a custom model from scratch in Python while increasing learning speed and final performance using Azure Machine Learning and Ray/RLlib.
Making smart decisions in real-time with Reinforcement Learning
1.
2.
3. Making smart decisions in real-time with Reinforcement Learning
Ruth Yakubu
Sr. Cloud Advocate
@RuthieYakubu
4. Agenda
Reinforcement Learning (RL) concepts
RL approaches, challenges and algorithms
Q-Learning methods
Introduction to Azure Personalizer
Demo
Reinforcement Learning on Azure ML
Quick Ray/RLlib framework
Training built-in RL agents using the RLlib framework
Demo
5. Basic Reinforcement Learning
• Learning by experience.
• Goal: choose actions that maximize rewards
• Agent: Dog
• State: Sit, Walk
• Reward: Get a Treat, No Treat
• Environment: Room or Anywhere
• We have the Environment, on which an Agent operates by responding to commands and receiving Rewards and some State information.
• Involves trial and error
• Remembers patterns that lead to success or failure.
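In code, this agent–environment loop is simply repeated observe, act, receive reward. Here is a minimal sketch using a random agent on an OpenAI Gym environment (the environment name and the random policy are illustrative, not part of the original deck):

```python
import gym

env = gym.make("CartPole-v0")    # the Environment
state = env.reset()              # initial State
total_reward = 0.0

done = False
while not done:
    action = env.action_space.sample()             # the Agent picks an action (random here)
    state, reward, done, info = env.step(action)   # Environment returns next State and Reward
    total_reward += reward

print("Episode reward:", total_reward)
```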
7. Q-Learning Algorithm
Start with $Q^*(s, a) = 0$ for all $s, a$
Get initial state $s$
Repeat until convergence of $Q^*$:
  Select action $a$ and get immediate reward $r$ and next state $s'$
  Update the Q-value and the current state:
  $Q^*(s, a) \leftarrow R(s, a) + \gamma \cdot \max_{a'} Q(s', a')$
Note: $\gamma$ (gamma) is a discount factor that ranges between 0 and 1
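As a concrete illustration of the update rule above, here is a minimal sketch of tabular Q-learning, assuming FrozenLake from OpenAI Gym as a small discrete example. The slide shows the simplified form of the update; the sketch uses the standard incremental form with a learning rate, and all hyperparameter values are illustrative:

```python
import numpy as np
import gym

env = gym.make("FrozenLake-v0")                       # small, discrete example environment
q_table = np.zeros((env.observation_space.n, env.action_space.n))

gamma = 0.95      # discount factor (between 0 and 1, as noted above)
alpha = 0.1       # learning rate
epsilon = 1.0     # initial exploration probability

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection (discussed on the next slides)
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, done, _ = env.step(action)

        # Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        td_target = reward + gamma * np.max(q_table[next_state])
        q_table[state, action] += alpha * (td_target - q_table[state, action])
        state = next_state

    # Decay epsilon so the agent explores less as it converges
    epsilon = max(0.05, epsilon * 0.999)
```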
8. Exploration &
Exploitation
• Exploration: process of exploring &
learning more information about
environment
• Exploitation: uses know information
about the environment to gain rewards
quicker
9. How to select actions?
• Common strategies:
• Epsilon-Greedy exploration: with probability $\varepsilon$ execute a random action, otherwise execute the best action $a^* = \arg\max_a Q(s, a)$.
• In practice we need a decreasing schedule for $\varepsilon$ during training, so that the agent explores enough at the beginning and exploits enough as it converges.
• Boltzmann exploration: similar to a softmax distribution $P(a) = \frac{e^{Q(s,a)/T}}{\sum_{a'} e^{Q(s,a')/T}}$, but with a parameter $T$ that controls the spread of the distribution, such that a high value gives a more uniform distribution than a low value.
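To make the two strategies concrete, here is a small sketch of both selection rules operating on one row of a Q-table. The function names and schedule values are illustrative, not taken from the deck:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature):
    """Sample an action from a softmax over Q-values; high T -> near-uniform, low T -> near-greedy."""
    logits = np.asarray(q_values, dtype=np.float64) / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.random.choice(len(q_values), p=probs))

# Example: a decreasing epsilon schedule during training
epsilon = 1.0
for step in range(10000):
    epsilon = max(0.05, epsilon * 0.9995)         # anneal toward mostly-greedy behavior
```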
10.
• Model Based: learns transition and reward models of the environment to compute the optimal policy
• Model Free: learns an optimal policy by interacting with the environment
• Value Based: learns a value function explicitly and computes the policy from that
• Policy Based: learns a policy directly without computing a value function
• Actor Critic: learns both a policy (the actor) and a value function (the critic), which measures how good a policy is
13. Reinforcement Learning challenges
• The environment might be stochastic
• The model of the environment is usually hidden or incomplete
• Actions are interdependent
• There is no supervision
• The feedback received might be partial and/or delayed
• Partial observability
• Actions and/or states might be continuous
14. Some Use Cases for RL
• Game Playing (some famous examples: Backgammon, Atari, Go)
• Operations Research (examples: Pricing, Vehicle Routing)
• Robotic Control
• Dialog Systems
• Energy Optimization
• Resource Allocation (examples: Computation, Networking)
• Autonomous Vehicles
• Computational Finance
15. What does Personalizer do?
• Presents the best action from a given set of input actions
• Uses Reinforcement Learning: exploits the existing model in most cases, occasionally explores new possibilities
• Continuous model updates: updates the scoring model with the training model
16. (Diagram: your app, Action 1/2/3 info, user & context info, reward score.)
17.
18. How does it work?
• Rank API: explore / exploit
• Reward API: reward the chosen action
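To make the Rank/Reward loop concrete, here is a minimal sketch following the pattern of the Personalizer Python SDK quickstart. The endpoint, key, action IDs, and features are placeholders, and the exact class and method names should be verified against the current azure-cognitiveservices-personalizer package:

```python
from azure.cognitiveservices.personalizer import PersonalizerClient
from azure.cognitiveservices.personalizer.models import RankableAction, RankRequest
from msrest.authentication import CognitiveServicesCredentials

# Placeholder endpoint and key for your Personalizer resource
client = PersonalizerClient("https://<your-resource>.cognitiveservices.azure.com/",
                            CognitiveServicesCredentials("<your-key>"))

# Candidate actions with their features (e.g. the "hero" suggestions described in the notes below)
actions = [
    RankableAction(id="play-game", features=[{"type": "game"}]),
    RankableAction(id="watch-movie", features=[{"type": "video"}]),
    RankableAction(id="join-clan", features=[{"type": "social"}]),
]

# Context features describing the user and the moment
context = [{"time_of_day": "evening"}, {"device": "xbox"}]

# Rank API: Personalizer picks the action to show (explore or exploit)
response = client.rank(rank_request=RankRequest(actions=actions, context_features=context))
print("Show:", response.reward_action_id)

# Later, once we know how the user reacted, send a reward score between 0 and 1
client.events.reward(event_id=response.event_id, value=1.0)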
19. Personalizer in Action
• Xbox Home: personalizes the type of content in the hero position and the item in the secondary river. Reward: click and engagement. Results: +40% lift in engagement for items.
• Bing Ads: personalizes the layout and location of ads. Reward: ad click-through. Results: +6% in ad click-through.
• MSN News: personalizes the news content at the top of the page on MSN.com or Edge DHP/NTP. Reward: click on content in the first slot. Results: +25% improvement in news click-through.
21. RL on Azure ML – What is It?
• Fully managed RL service for large-scale distributed simulation and training, using the Ray/RLlib framework.
• Customers create compute clusters and submit simulation/training jobs using the standard Azure ML pattern (Estimator) with the SDK & CLI.
• RL algorithms are in RLlib – deep training is TensorFlow by default, PyTorch possible.
• Available in azureml-sdk 1.0.76
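For context, the submission pattern looked roughly like the sketch below. The RL-specific estimator lived in the preview azureml-contrib package at the time of this deck; the class and parameter names here are assumptions based on that preview and should be checked against the azureml-contrib-reinforcementlearning documentation, and the cluster and script names are hypothetical:

```python
# Sketch only: preview API, names are assumptions (see lead-in above).
from azureml.core import Workspace, Experiment
from azureml.contrib.train.rl import ReinforcementLearningEstimator, Ray

ws = Workspace.from_config()                       # load workspace from config.json
compute_target = ws.compute_targets["rl-cluster"]  # hypothetical cluster name

estimator = ReinforcementLearningEstimator(
    source_directory="src",           # folder containing the training script
    entry_script="train_cartpole.py", # hypothetical RLlib training script
    compute_target=compute_target,
    rl_framework=Ray(),               # use the Ray/RLlib framework
)

run = Experiment(ws, "rl-cartpole").submit(estimator)
run.wait_for_completion(show_output=True)
```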
22. RL Jobs Requirements
• 100s of parallel simulations.
• Training can take multiple days.
• Support for multiple Ray jobs.
• Resilient to simulator/worker failures.
• MLOps pipeline integration.
23. Simulators Support
• OpenAI Gym.
• Custom simulators with the OpenAI Gym Environment interface – worker local or remote in the simulator.
• Windows support.
• Investigating additional simulator support.
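Custom simulators are typically exposed to RLlib through the Gym Env interface. Here is a minimal sketch of such a wrapper; the environment dynamics are a toy placeholder standing in for a real simulator:

```python
import gym
from gym import spaces
import numpy as np

class MySimulatorEnv(gym.Env):
    """Toy example of wrapping a custom simulator in the Gym interface."""

    def __init__(self, config=None):
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self._steps = 0

    def reset(self):
        self._steps = 0
        return np.zeros(4, dtype=np.float32)          # initial observation

    def step(self, action):
        self._steps += 1
        obs = np.random.uniform(-1, 1, size=4).astype(np.float32)
        reward = 1.0 if action == 1 else 0.0          # placeholder reward logic
        done = self._steps >= 200
        return obs, reward, done, {}
```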
24. What is Ray?
• High-performance distributed execution framework targeted at large-scale machine learning and reinforcement learning applications.
• Uses a lightweight API based on dynamic task graphs and actors to express a wide range of applications in a flexible manner.
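A tiny sketch of that API: functions and classes become remote tasks and actors via a decorator, and results are fetched with ray.get (the example values are illustrative):

```python
import ray

ray.init()  # start Ray locally; on a cluster this connects to the head node

@ray.remote
def square(x):
    return x * x

@ray.remote
class Counter:
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1
        return self.value

# Tasks run in parallel across the available workers
print(ray.get([square.remote(i) for i in range(4)]))   # [0, 1, 4, 9]

# Actors hold state between calls
counter = Counter.remote()
print(ray.get(counter.increment.remote()))              # 1
```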
25. What is RLlib?
• Library for Reinforcement Learning built on top of the Ray framework.
• High scalability and unified API.
• Provides abstractions for common RL components: Policy Model, Policy Evaluator, Policy Optimizer.
• Hierarchical and logically centralized control to compose common RL components.
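As a quick illustration of the unified API, here is a minimal training loop for one of RLlib's built-in agents. It assumes the pre-1.0 ray[rllib] package layout from the era of this deck, and CartPole plus the config values are illustrative:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# Built-in PPO agent on a standard Gym environment (TensorFlow is the default backend)
trainer = PPOTrainer(config={
    "env": "CartPole-v0",
    "num_workers": 2,          # parallel rollout workers
})

for i in range(5):
    result = trainer.train()   # one training iteration
    print(i, result["episode_reward_mean"])
```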
Provide a basic understanding of common Reinforcement Learning concepts and approaches, and their mathematical foundations and algorithms.
Understand common challenges in Reinforcement Learning and techniques to address them.
Show how to code a Deep Reinforcement Learning agent from scratch, using as an example a Deep Q-Learning agent.
Show a preview of the upcoming RL infrastructure on Azure ML and how to use it to train agents at scale.
We have the Environment, on which an Agent operates by acting on commands and receiving Rewards and some State information.
The goal here is to train an agent that learns to choose actions that maximize the Rewards received.
At its highest level, an RL system has the structure depicted in this diagram.
We usually model this problem as a Markov Decision Process (MDP).
Agent and environment
State, action, reward
At its highest level, an RL system has the structure depicted in this diagram.
Computing the Q-function this way is known as tabular Q-Learning.
Tabular because we explicitly enumerate the Q-values for all state-action pairs in a table and solve the optimization problem through dynamic programming.
Markov Decision Process (MDP)
A central aspect for Q-Learning to work is a good strategy to choose actions in the environment.
The idea here is that an agent needs to execute actions to explore the environment enough, in order to learn from good experiences. On the other hand, the agent also needs a good policy in order to obtain good experiences from the environment.
This is known as the Exploration vs Exploitation tradeoff.
A common strategy to balance exploration and exploitation is known as the Epsilon-Greedy exploration, where we introduce an uncertainty when choosing the best action. This is what we are going to use in our lab.
In practice, we implement this with an annealing scheme for decreasing the probability to pick random actions as the model converges.
There are other strategies, such as the Boltzmann exploration, which is like a softmax function with an additional parameter that controls the spread of the distribution. By varying this parameter we can also control the uncertainty in picking random actions.
With those definitions, we can categorize RL algorithms in the following classes.
Here we will focus on Model-free approaches, getting into the details of Value-based algorithms.
Solving Q-Learning with a neural network.
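The deep Q-learning agent referred to here replaces the Q-table with a neural network that maps a state to one Q-value per action. Below is a minimal Keras sketch of the network and of a single training step on a batch of experience; it is a simplified illustration (no separate target network or replay buffer shown), and the dimensions and hyperparameters are illustrative:

```python
import numpy as np
import tensorflow as tf

num_states, num_actions = 4, 2        # e.g. CartPole-like dimensions (illustrative)
gamma = 0.99

# Q-network: state in, one Q-value per action out
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(num_states,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(num_actions, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")

def train_step(states, actions, rewards, next_states, dones):
    """One deep Q-learning update on a batch sampled from experience."""
    q_next = model.predict(next_states, verbose=0)
    targets = model.predict(states, verbose=0)
    for i in range(len(states)):
        best_next = np.max(q_next[i])
        targets[i, actions[i]] = rewards[i] + (0 if dones[i] else gamma * best_next)
    model.fit(states, targets, epochs=1, verbose=0)
```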
Here are some examples of use cases that can be solved by RL.
What is Personalizer?
Personalizer implements an AI technique called Reinforcement Learning. Here's how it works.

Suppose we want to display a "hero" action to the user. The user might not be sure what to do next, but we could display one of several suggestions. For a gaming app, that might be: "play a game", "watch a movie", or "join a clan". Based on that user's history and other contextual information -- say, their location, the time of day, and the day of the week -- the Personalizer service will rank the possible actions and suggest the best one to promote.

Hopefully, the user will be happy, but how can we be sure? That depends on what the user does next, and whether that was something we wanted them to do. According to our business logic we'll assign a "reward score" between 0 and 1 to what happens next. For example, spending more time playing a game or reading an article, or spending more money in the store, might lead to higher reward scores. Personalizer feeds that info back into the ranking system for the next time we need to feature an activity.
You only need the Rank API and Reward API to integrate with your application.
Here's how, in the background, the Personalizer API is built on Reinforcement Learning.
Personalizer has been in development at Microsoft for many years. It's used on Xbox devices to determine what activities are featured on the home page, like playing an installed game, purchasing a new game from the store, or watching others play on Mixer. Since the introduction of Personalizer, the Xbox team has seen a significant lift in key engagement metrics.

Personalizer is also used to optimize the placement of ads in Bing search, and the articles featured in MSN News, again with great results in improving engagement from users.
Now you can use Personalizer in your own apps, as well.