Horizon: Deep Reinforcement Learning at Scale
Jason Gauci
Applied RL, Facebook AI
About Me
• Recommender systems @ Google/Apple/Facebook
• TLM on Horizon: A framework for large-scale RL:
https://github.com/facebookresearch/Horizon
• Eternal Terminal: a replacement for ssh/mosh
https://mistertea.github.io/EternalTerminal/
• Programming Throwdown: tech podcast
https://itunes.apple.com/us/podcast/programming-throwdown/id427166321?mt=2
Recommender Systems in 10 Minutes
Recommender Systems
1. Retrieval: Matrix Factorization, Two Tower DNN
2. Event Prediction: DNN, GBDT, Convnets, Seq2seq, etc.
3. Ranking: Black Box Optimization, Bandits, RL
4. Data Science: A/B Tests, Query Engines, User Studies
https://www.mailmunch.com/blog/sales-funnel/
Recommender Systems are Control Systems
1. Retrieval: Control
2. Event Prediction: Signal Processing
3. Ranking: Control
4. Data Science: Causal Analysis
Recommender Systems are Control Systems
Control the user experience
• Explore/exploit
• Freshness
• Slate optimization
Control future models’ data
• Break feedback loops
• De-bias the model
Classification Versus Decision Making
Classification | Decision Making
"What" questions (What will happen?) | "How" questions (How can we do better?)
Trained on ground truth (Hotdog / Not Hotdog) | Trained from another policy (usually a worse one)
Evaluated via accuracy (F1, AUC, NE) | Evaluated via counterfactual evaluation (IPS, DR, MAGIC)
Assume data is perfect | Assume data is flawed (explore/exploit)
Framework For Recommendation
• Action Features: $X_a \in \mathbb{R}^d$
• Context Features: $X_c \in \mathbb{R}^d$
• Session Features: $X_s \in \mathbb{R}^d$
• Event Predictors: $E(X_a, X_c, X_s) \to \mathbb{R}$
Greedy Slate Recommendation:
• Value Function: $V(X_a, X_c, X_s, E_1, E_2, \ldots, E_n) \to \mathbb{R}$
• Control Function: $\pi(V_0, V_1, \ldots, V_n) \to \{0, \ldots, n\}$
• Transition Function: $T(X_a, X_c, X_s, E_1, E_2, \ldots, E_n, \pi) \to X_s'$
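The greedy slate loop above can be sketched in a few lines. This is only an illustration under stated assumptions: the linear value function, the toy click predictor `p_click`, the `shown_topic_*` session counters, and all function names are hypothetical and are not part of Horizon.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]
Predictor = Callable[[Features, Features, Features], float]


def p_click(action: Features, context: Features, session: Features) -> float:
    """Toy event predictor E: item quality, halved each time its topic was already shown."""
    shown = session.get(f"shown_topic_{int(action['topic'])}", 0.0)
    return action["quality"] * 0.5 ** shown


def value(events: Sequence[float], weights: Sequence[float]) -> float:
    """Value function V: an assumed linear combination of the event predictions."""
    return sum(w * e for w, e in zip(weights, events))


def greedy_slate(
    candidates: List[Features],
    context: Features,
    session: Features,
    predictors: Sequence[Predictor],
    weights: Sequence[float],
    slate_size: int,
) -> List[int]:
    """Fill the slate one slot at a time: score, take the argmax, update the session."""
    session = dict(session)  # do not mutate the caller's session features
    remaining = list(range(len(candidates)))
    slate: List[int] = []
    while remaining and len(slate) < slate_size:
        scores = {
            i: value([p(candidates[i], context, session) for p in predictors], weights)
            for i in remaining
        }
        best = max(remaining, key=lambda i: scores[i])  # control function pi
        slate.append(best)
        remaining.remove(best)
        # Transition function T: record that this topic has now been shown.
        key = f"shown_topic_{int(candidates[best]['topic'])}"
        session[key] = session.get(key, 0.0) + 1.0
    return slate


candidates = [
    {"quality": 0.9, "topic": 1.0},
    {"quality": 0.8, "topic": 1.0},
    {"quality": 0.5, "topic": 2.0},
]
slate = greedy_slate(candidates, {}, {}, [p_click], [1.0], slate_size=2)
```

Because the session features change after every pick, the second slot goes to the lower-quality item from a fresh topic rather than to the second-best item overall.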
Discovering The Value Function
• What should we optimize for?
• Ads: Clicks? Conversions? Impressions?
• Feed/Search: Clicks? Time-Spent? Favorable user surveys?
• Answer: All of the above.
• How to combine?
• How to assign credit?
• Differentiable?
Tuning The Value Function
Searching Through Value Functions
Learning Value Functions
• Search is limited
• Curse of dimensionality
• Value models are sequential
• Optimize for long-term value
• Value models should be personalized
• Relationship between event predictors and utility is contextual
• Optimizing metrics is counterfactual
• “If I chose action a’, would metric m increase?”
Learning Value Functions
• Reinforcement Learning is designed around agents who make decisions and improve their actions over time
Hypothesis: We can use RL to learn better value functions
Intro to RL
Reinforcement Learning (RL)
• Agent: Recommendation System
• Reward: User Behavior
• State: Context (inc. historical)
• Action: Content
https://becominghuman.ai/the-very-basics-of-reinforcement-learning-154f28a79071
RL Terms
• State (S)
• Every piece of data needed to decide a single action.
• Example: User/Post/Session features
• Action (A)
• A decision to be made by the system
• Example: Which post to show
• Reward ($R(S, A)$)
• A function of utility based on the current state and action
RL Terms
• Transition ($T(S, A) \to S'$)
• A function that maps state-action pairs to a future state
• Bandit: $T(S, A) = T(S)$
• Policy ($\pi(S, A_0, A_1, \ldots, A_n) \to \{0, \ldots, n\}$)
• A function that, given a state, chooses an action
• Episode
• A sequence of state-action pairs for a single run (e.g. a complete game of Go)
Value Optimization
• Value ($Q(S, A)$)
• The cumulative discounted reward given a state and action
• $Q(s_t, a_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \gamma^3 r_{t+3} + \cdots$
• A good policy becomes: $\pi(s) = \arg\max_a Q(s, a)$
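As a small worked example of the two formulas on this slide, the sketch below computes the discounted sum of rewards for one observed episode and picks the greedy action from a Q table. The function names and the dictionary-based Q table are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def discounted_return(rewards: List[float], gamma: float) -> float:
    """r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for one observed episode."""
    total = 0.0
    # Work backwards so each step is one multiply-add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def greedy_policy(q: Dict[Tuple[str, str], float], state: str, actions: List[str]) -> str:
    """Pick the action with the highest Q value in the given state."""
    return max(actions, key=lambda a: q[(state, a)])


q_table = {("home", "show_video"): 1.5, ("home", "show_photo"): 0.7}
ret = discounted_return([1.0, 0.0, 2.0], gamma=0.5)  # 1 + 0.5*0 + 0.25*2
choice = greedy_policy(q_table, "home", ["show_video", "show_photo"])
```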
Value Regression
• $Q(s_t, a_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \gamma^3 r_{t+3} + \cdots$
• Collect historical data
• Solve with linear regression
• Problem: $r_{t+1}$ also depends on $a_{t+1}$
Credit Assignment Problem
• Current state/action
• X’s turn to move
• What is the value?
• Pretty high
Credit Assignment Problem
• Next State/Action
• Now what is the value?
• Low
• The future actions affect the past value
State Action Reward State Action (SARSA)
• Value Regression
• $Q(s_t, a_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \ldots$
• SARSA
• $Q(s_t, a_t) = r_t + \gamma Q(s_{t+1}, a_{t+1})$
• Idea borrowed from Dynamic Programming
• Using the future Q is more robust
• Value still highly influenced by current policy
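A minimal tabular sketch of the SARSA target above. The slide states the fixed-point equation; the incremental step with a learning rate `lr` is the usual way to move a table toward that target and is an assumption here, as are the function and state names.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

QTable = DefaultDict[Tuple[str, str], float]


def sarsa_update(
    q: QTable,
    s: str,
    a: str,
    r: float,
    s_next: str,
    a_next: str,
    gamma: float = 0.9,
    lr: float = 0.5,
) -> float:
    """Move Q(s, a) toward r + gamma * Q(s', a'), where a' is the action the
    logging policy actually took next (this is what makes SARSA on-policy)."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += lr * (target - q[(s, a)])
    return q[(s, a)]


q: QTable = defaultdict(float)
q[("s1", "right")] = 2.0
new_value = sarsa_update(q, "s0", "right", 1.0, "s1", "right", gamma=0.5, lr=1.0)
```

With `lr=1.0` the entry is set exactly to the target, 1.0 + 0.5 * 2.0.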
Q-Learning: Off-Policy SARSA
• SARSA
• $Q(s_t, a_t) = r_t + \gamma Q(s_{t+1}, a_{t+1})$
• Q-Learning
• $Q(s_t, a_t) = r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})$
• Has better off-policy guarantees
• $\max_{a_{t+1}}$ may be difficult to know/compute
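A minimal tabular sketch of the Q-Learning target, assuming a small discrete action set so the max can be enumerated. The learning rate `lr` and all names are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple

QTable = DefaultDict[Tuple[str, str], float]


def q_learning_update(
    q: QTable,
    s: str,
    a: str,
    r: float,
    s_next: str,
    next_actions: Sequence[str],
    gamma: float = 0.9,
    lr: float = 0.5,
) -> float:
    """Move Q(s, a) toward r + gamma * max_a' Q(s', a').

    The max ignores which action the logging policy took next, which is why
    the update is off-policy. An empty next_actions marks a terminal state.
    """
    best_next = max((q[(s_next, a2)] for a2 in next_actions), default=0.0)
    target = r + gamma * best_next
    q[(s, a)] += lr * (target - q[(s, a)])
    return q[(s, a)]


q: QTable = defaultdict(float)
q[("s1", "left")] = 1.0
q[("s1", "right")] = 3.0
new_value = q_learning_update(q, "s0", "right", 1.0, "s1", ["left", "right"], gamma=0.5, lr=1.0)
```

Even if the logged next action had been "left" (value 1.0), the target uses the better action "right" (value 3.0).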
Policy Gradients
• Q-Learning: $Q(s_t, a_t) = r_t + \gamma \max_{a_{t+1}} [Q(s_{t+1}, a_{t+1})]$
• What if we can’t do $\max_{a_{t+1}} [\ldots]$?
• Policy Gradient
• Approximate $\max_{a_{t+1}} [Q(s_{t+1}, a_{t+1})]$
• $Q(s_t, a_t) = r_t + \gamma A(s_{t+1})$
• Learn $A(s_{t+1})$ assuming Q is perfect:
• Deep Deterministic Policy Gradient
• $L(A(s_{t+1})) = \min(-Q(s_{t+1}, a_{t+1}))$
• Soft Actor Critic
• $L(A(s_{t+1})) = \min(\log(P(A(s_{t+1}) = a_{t+1})) - Q(s_{t+1}, a_{t+1}))$
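To make the "assume Q is perfect" step concrete, here is a toy sketch of the DDPG-style actor loss, reading A(s) as an actor that outputs a continuous action. Everything in it is an assumption for illustration: the critic is a hand-written quadratic rather than a learned network, and the actor is a single weight `w` with A(s) = w * s.

```python
from typing import List


def critic(s: float, a: float) -> float:
    """Assumed 'perfect' Q: the best action in state s is a = 2 * s."""
    return -((a - 2.0 * s) ** 2)


def dq_da(s: float, a: float) -> float:
    """Analytic gradient of the critic with respect to the action."""
    return -2.0 * (a - 2.0 * s)


def train_actor(states: List[float], lr: float = 0.1, epochs: int = 200) -> float:
    """Minimize L(w) = -Q(s, A(s)) with A(s) = w * s by gradient descent on w."""
    w = 0.0
    for _ in range(epochs):
        for s in states:
            a = w * s                   # actor output
            grad_w = -dq_da(s, a) * s   # chain rule: dL/dw = -(dQ/da) * (dA/dw)
            w -= lr * grad_w
    return w


w_star = train_actor([0.5, 1.0, 1.5])
```

The actor never enumerates actions; it follows the critic's gradient, which is what replaces the explicit max when the action space is continuous.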
Applying RL at Scale
Prior State of Applied RL
• Small-scale
• Notable Exceptions: ELF OpenGo, OpenAI Five, AlphaGo
• Simulation-Driven
• Simulators are often deterministic and stationary
Can we train personalized, large-scale RL models and bring them to billions of people?
Applying RL at Scale
• Batch Feature normalization & training
• Because the loss target is dynamic, normalization is critical
• Distributed training
• Synchronous SGD (PASGD should be fine)
• Fixed (but stochastic) policies
• ε-greedy, Softmax, Thompson Sampling
• Fixed policies allow for massive deployment
• No need for checkpointing, online parameter servers
• Counterfactual Policy Evaluation
• Detect anomalies and gain insights offline
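A sketch of two of the fixed stochastic policies named above, ε-greedy and softmax, built on a frozen list of Q values. Returning the probability of the sampled action alongside the action is an assumption about what gets logged; it is what counterfactual evaluation needs later. Function names are illustrative.

```python
import math
import random
from typing import List, Tuple


def epsilon_greedy_probs(q_values: List[float], epsilon: float) -> List[float]:
    """Spread epsilon uniformly over all actions, give the rest to the best one."""
    n = len(q_values)
    best = max(range(n), key=lambda i: q_values[i])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def softmax_probs(q_values: List[float], temperature: float = 1.0) -> List[float]:
    """Action probabilities proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(probs: List[float], rng: random.Random) -> Tuple[int, float]:
    """Sample an action index and return it with its propensity."""
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs[action]


rng = random.Random(0)
eg = epsilon_greedy_probs([1.0, 3.0, 2.0], epsilon=0.3)
sm = softmax_probs([0.0, 0.0])
action, propensity = sample_action(eg, rng)
```

Because the Q values are fixed at serving time, the policy is a pure function of the model output and needs no online parameter updates.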
Horizon: Applied RL Platform
• Robust
• Massively Parallel
• Open Source
• Built on high-performance platforms
• Spark
• PyTorch
• ONNX
• OpenAI Gym & Gridworld Integration tests
Safe, Large-Scale deployment
• Deploy models to 1000s of frontend servers
• Counterfactual Policy Evaluation
• Warm-start for continuous deployment
• Built-in Explore/Exploit policies
Workflow
Preprocessing & Training
• Preprocessing happens as part of training
• Training begins by imitation learning, then pivots to policy maximization
• Time-based or sequence-based discount factor
• Highly optimized with PyTorch 1.0
Counterfactual Policy Evaluation (CPE)
• One-Step (estimate reward)
• Direct Method (DM): Learn reward function for all states/actions
• Inverse Propensity Score (IPS): Boost reward by ratio of action probabilities
• Doubly-Robust: Use DM to reduce IPS variance
• Value (estimate cumulative reward)
• Direct Method: Learn reward and transition functions (model-based RL)
• Sequential DR: Extrapolate one-step CPE across episode
• MAGIC: Sliding window approach
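The three one-step estimators can be sketched on logged data as follows. This is a textbook-style illustration, not Horizon's implementation: the `LoggedSample` record, the callable signatures, and the toy data are all assumptions.

```python
from typing import Callable, List, NamedTuple


class LoggedSample(NamedTuple):
    state: str
    action: int
    reward: float
    logged_prob: float  # propensity of the logging policy for the logged action


TargetPolicy = Callable[[str], List[float]]   # state -> probability per action
RewardModel = Callable[[str, int], float]     # (state, action) -> predicted reward


def ips(samples: List[LoggedSample], target: TargetPolicy) -> float:
    """Inverse Propensity Score: reweight logged rewards by the probability ratio."""
    total = sum(target(s.state)[s.action] / s.logged_prob * s.reward for s in samples)
    return total / len(samples)


def direct_method(samples: List[LoggedSample], target: TargetPolicy, model: RewardModel) -> float:
    """Direct Method: expected model reward under the target policy."""
    total = 0.0
    for s in samples:
        probs = target(s.state)
        total += sum(p * model(s.state, a) for a, p in enumerate(probs))
    return total / len(samples)


def doubly_robust(samples: List[LoggedSample], target: TargetPolicy, model: RewardModel) -> float:
    """Doubly-Robust: DM estimate plus an IPS-weighted correction of the model error."""
    total = 0.0
    for s in samples:
        probs = target(s.state)
        dm = sum(p * model(s.state, a) for a, p in enumerate(probs))
        weight = probs[s.action] / s.logged_prob
        total += dm + weight * (s.reward - model(s.state, s.action))
    return total / len(samples)


# Logging policy: uniform over two actions. True reward: 1.0 for action 1, else 0.0.
logs = [LoggedSample("u", 0, 0.0, 0.5), LoggedSample("u", 1, 1.0, 0.5)]
always_one: TargetPolicy = lambda state: [0.0, 1.0]
biased_model: RewardModel = lambda state, action: 0.5  # deliberately wrong

ips_value = ips(logs, always_one)
dm_value = direct_method(logs, always_one, biased_model)
dr_value = doubly_robust(logs, always_one, biased_model)
```

In this toy case the deliberately biased reward model drags the Direct Method down to 0.5, while the IPS correction inside the doubly-robust estimate recovers the true value of 1.0.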
CPE: Results on OpenAI Cartpole
Mean absolute error (fraction of true value): 3.4%
Production Launches
• Infrastructure
• 360 Video adaptive bitrate
• Marketing/Growth
• Newsfeed Notifications
• Page Notifications
• Ad Coupons
• Recommendations
• M Assistant filtering
• Newsfeed/IG Value Model Optimization
Train your own model!
Questions/Comments?
Jason Gauci jjg@fb.com

More Related Content

What's hot

TopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstTopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David Durst
Spark Summit
 
Improving Power Grid Reliability Using IoT Analytics
Improving Power Grid Reliability Using IoT AnalyticsImproving Power Grid Reliability Using IoT Analytics
Improving Power Grid Reliability Using IoT Analytics
Databricks
 
Scaling AutoML-Driven Anomaly Detection With Luminaire
Scaling AutoML-Driven Anomaly Detection With LuminaireScaling AutoML-Driven Anomaly Detection With Luminaire
Scaling AutoML-Driven Anomaly Detection With Luminaire
Databricks
 
Democratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDemocratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druid
DataWorks Summit
 
Polymorphic Table Functions: The Best Way to Integrate SQL and Apache Spark
Polymorphic Table Functions: The Best Way to Integrate SQL and Apache SparkPolymorphic Table Functions: The Best Way to Integrate SQL and Apache Spark
Polymorphic Table Functions: The Best Way to Integrate SQL and Apache Spark
Databricks
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
Databricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Databricks
 
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Data Con LA
 
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Databricks
 
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
TigerGraph
 
The More the Merrier: Scaling Model Building Infrastructure at Zendesk
The More the Merrier: Scaling Model Building Infrastructure at ZendeskThe More the Merrier: Scaling Model Building Infrastructure at Zendesk
The More the Merrier: Scaling Model Building Infrastructure at Zendesk
Databricks
 
Building a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with DatabricksBuilding a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with Databricks
Databricks
 
Loan Decisioning Transformation
Loan Decisioning TransformationLoan Decisioning Transformation
Loan Decisioning Transformation
DataWorks Summit/Hadoop Summit
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Databricks
 
The Power of Data
The Power of DataThe Power of Data
The Power of Data
DataWorks Summit
 
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott CordoMastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Spark Summit
 
Deep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreathDeep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreath
Databricks
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
DataWorks Summit/Hadoop Summit
 
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
Data Con LA
 
Building A Feature Factory
Building A Feature FactoryBuilding A Feature Factory
Building A Feature Factory
Databricks
 

What's hot (20)

TopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstTopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David Durst
 
Improving Power Grid Reliability Using IoT Analytics
Improving Power Grid Reliability Using IoT AnalyticsImproving Power Grid Reliability Using IoT Analytics
Improving Power Grid Reliability Using IoT Analytics
 
Scaling AutoML-Driven Anomaly Detection With Luminaire
Scaling AutoML-Driven Anomaly Detection With LuminaireScaling AutoML-Driven Anomaly Detection With Luminaire
Scaling AutoML-Driven Anomaly Detection With Luminaire
 
Democratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDemocratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druid
 
Polymorphic Table Functions: The Best Way to Integrate SQL and Apache Spark
Polymorphic Table Functions: The Best Way to Integrate SQL and Apache SparkPolymorphic Table Functions: The Best Way to Integrate SQL and Apache Spark
Polymorphic Table Functions: The Best Way to Integrate SQL and Apache Spark
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
 
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
 
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
 
The More the Merrier: Scaling Model Building Infrastructure at Zendesk
The More the Merrier: Scaling Model Building Infrastructure at ZendeskThe More the Merrier: Scaling Model Building Infrastructure at Zendesk
The More the Merrier: Scaling Model Building Infrastructure at Zendesk
 
Building a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with DatabricksBuilding a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with Databricks
 
Loan Decisioning Transformation
Loan Decisioning TransformationLoan Decisioning Transformation
Loan Decisioning Transformation
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
 
The Power of Data
The Power of DataThe Power of Data
The Power of Data
 
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott CordoMastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott Cordo
 
Deep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreathDeep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreath
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
 
Building A Feature Factory
Building A Feature FactoryBuilding A Feature Factory
Building A Feature Factory
 

Similar to Horizon: Deep Reinforcement Learning at Scale

An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine Learning
Randy Shoup
 
Thomas Jensen. Machine Learning
Thomas Jensen. Machine LearningThomas Jensen. Machine Learning
Thomas Jensen. Machine Learning
Volha Banadyseva
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Spark Summit
 
Making smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement LearningMaking smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement Learning
Ruth Yakubu
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic Algorithm
Vaibhav Varshney
 
Productionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflowProductionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflow
Databricks
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleXavier Amatriain
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
Pierre Gutierrez
 
Utilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerceUtilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerce
Liangjie Hong
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Presentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in InformaticaPresentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in Informatica
Luca Marignati
 
Simulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous DrivingSimulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous Driving
Donal Byrne
 
Online Tuning of Large Scale Recommendation Systems
Online Tuning of Large Scale Recommendation SystemsOnline Tuning of Large Scale Recommendation Systems
Online Tuning of Large Scale Recommendation Systems
Viral Gupta
 
Meetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_AllMeetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_AllBernard Ong
 
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
Richard Robinson
 
Dynamic Search and Beyond
Dynamic Search and BeyondDynamic Search and Beyond
Dynamic Search and Beyond
Grace Hui Yang
 
Alexander Podelko - Context-Driven Performance Testing
Alexander Podelko - Context-Driven Performance TestingAlexander Podelko - Context-Driven Performance Testing
Alexander Podelko - Context-Driven Performance Testing
Neotys_Partner
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
Simon Hughes
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Lucidworks
 
Goal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and RefinementGoal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Emil Lupu
 

Similar to Horizon: Deep Reinforcement Learning at Scale (20)

An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine Learning
 
Thomas Jensen. Machine Learning
Thomas Jensen. Machine LearningThomas Jensen. Machine Learning
Thomas Jensen. Machine Learning
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
 
Making smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement LearningMaking smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement Learning
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic Algorithm
 
Productionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflowProductionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflow
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
Utilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerceUtilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerce
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Presentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in InformaticaPresentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in Informatica
 
Simulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous DrivingSimulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous Driving
 
Online Tuning of Large Scale Recommendation Systems
Online Tuning of Large Scale Recommendation SystemsOnline Tuning of Large Scale Recommendation Systems
Online Tuning of Large Scale Recommendation Systems
 
Meetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_AllMeetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_All
 
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
 
Dynamic Search and Beyond
Dynamic Search and BeyondDynamic Search and Beyond
Dynamic Search and Beyond
 
Alexander Podelko - Context-Driven Performance Testing
Alexander Podelko - Context-Driven Performance TestingAlexander Podelko - Context-Driven Performance Testing
Alexander Podelko - Context-Driven Performance Testing
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
 
Goal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and RefinementGoal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 


Horizon: Deep Reinforcement Learning at Scale

  • 1. Horizon: Deep Reinforcement Learning at Scale Jason Gauci Applied RL, Facebook AI
  • 2. About Me • Recommender systems @ Google/Apple/Facebook • TLM on Horizon: A framework for large-scale RL: https://github.com/facebookresearch/Horizon • Eternal Terminal: a replacement for ssh/mosh https://mistertea.github.io/EternalTerminal/ • Programming Throwdown: tech podcast https://itunes.apple.com/us/podcast/programming-throwdown/id427166321?mt=2
  • 4. Recommender Systems 1. Retrieval Matrix Factorization, Two Tower DNN 2. Event Prediction DNN, GBDT, Convnets, Seq2seq, etc. 3. Ranking Black Box Optimization, Bandits, RL 4. Data Science A/B Tests, Query Engines, User Studies https://www.mailmunch.com/blog/sales-funnel/
  • 5. Recommender Systems are Control Systems 1. Retrieval Control 2. Event Prediction Signal Processing 3. Ranking Control 4. Data Science Causal Analysis
  • 6. Recommender Systems are Control Systems Control the user experience • Explore/exploit • Freshness • Slate optimization Control future models’ data • Break feedback loops • De-bias the model
  • 7. Classification Versus Decision Making
      Classification | Decision Making
      "What" questions (What will happen?) | "How" questions (How can we do better?)
      Trained on ground truth (Hotdog / Not Hotdog) | Trained from another policy (usually a worse one)
      Evaluated via accuracy (F1, AUC, NE) | Counterfactual Evaluation (IPS, DR, MAGIC)
      Assume data is perfect | Assume data is flawed (explore/exploit)
  • 8. Framework For Recommendation • Action Features: $X_a$ (real-valued vector) • Context Features: $X_c$ (real-valued vector) • Session Features: $X_s$ (real-valued vector) • Event Predictors: $E(X_a, X_c, X_s) \to \mathbb{R}$ Greedy Slate Recommendation: • Value Function: $V(X_a, X_c, X_s, E_1, E_2, \ldots, E_n) \to \mathbb{R}$ • Control Function: $\pi(V_0, V_1, \ldots, V_n) \to \{0, \ldots, n\}$ • Transition Function: $T(X_a, X_c, X_s, E_1, E_2, \ldots, E_n, \pi) \to X_s'$
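The greedy slate loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Horizon's implementation: the function names, the dictionary feature format, and the toy click model are all assumptions, and the transition function here omits the policy argument for brevity. Each slot is filled by scoring the remaining candidates with the event predictors and value function, taking the best one (the control function), and then updating the session features (the transition function) before the next slot is scored.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]
EventPredictor = Callable[[Features, Features, Features], float]


def greedy_slate(
    candidates: Sequence[Features],
    context: Features,
    session: Features,
    event_predictors: Sequence[EventPredictor],
    value_fn: Callable[[Features, Features, Features, List[float]], float],
    transition_fn: Callable[[Features, Features, Features, List[float]], Features],
    slate_size: int,
) -> List[int]:
    """Fill a slate one slot at a time, re-scoring after every pick."""
    remaining = list(range(len(candidates)))
    slate: List[int] = []
    while remaining and len(slate) < slate_size:
        scored = []
        for idx in remaining:
            events = [e(candidates[idx], context, session) for e in event_predictors]
            value = value_fn(candidates[idx], context, session, events)
            scored.append((value, idx, events))
        # Control function: pick the candidate with the highest value.
        _, best, best_events = max(scored, key=lambda item: item[0])
        slate.append(best)
        remaining.remove(best)
        # Transition function: the pick changes the session features.
        session = transition_fn(candidates[best], context, session, best_events)
    return slate


# Toy setup; every name and number below is made up for illustration.
def p_click(action: Features, context: Features, session: Features) -> float:
    """Hypothetical click predictor: repeated topics are half as attractive."""
    seen = session.get("seen_topic_%d" % int(action["topic"]), 0.0) > 0
    return action["quality"] * (0.5 if seen else 1.0)


def value(action: Features, context: Features, session: Features,
          events: List[float]) -> float:
    """Hypothetical value function: just the predicted click probability."""
    return events[0]


def transition(action: Features, context: Features, session: Features,
               events: List[float]) -> Features:
    """Hypothetical transition: remember which topic was just shown."""
    updated = dict(session)
    updated["seen_topic_%d" % int(action["topic"])] = 1.0
    return updated


candidates = [
    {"quality": 0.9, "topic": 0.0},
    {"quality": 0.8, "topic": 0.0},
    {"quality": 0.5, "topic": 1.0},
]
slate = greedy_slate(candidates, {}, {}, [p_click], value, transition, slate_size=2)
print(slate)
```

In this toy run the second slot goes to the lower-quality item from a new topic, because the session update halves the predicted click rate of the topic already shown.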
  • 9. Discovering The Value Function • What should we optimize for? • Ads: Clicks? Conversions? Impressions? • Feed/Search: Clicks? Time-Spent? Favorable user surveys? • Answer: All of the above. • How to combine? • How to assign credit? • Differentiable?
  • 10. Tuning The Value Function
  • 12. Learning Value Functions • Search is limited • Curse of dimensionality • Value models are sequential • Optimize for long-term value • Value models should be personalized • Relationship between event predictors and utility is contextual • Optimizing metrics is counterfactual • “If I chose action a’, would metric m increase?”
  • 13. Learning Value Functions • Reinforcement Learning is designed around agents who make decisions and improve their actions over time Hypothesis: We can use RL to learn better value functions
  • 15. Reinforcement Learning (RL) • Agent • Recommendation System • Reward • User Behavior • State • Context (inc. historical) • Action • Content https://becominghuman.ai/the-very-basics-of-reinforcement-learning-154f28a79071
  • 16. RL Terms • State (S) • Every piece of data needed to decide a single action. • Example: User/Post/Session features • Action (A) • A decision to be made by the system • Example: Which post to show • Reward ($R(S, A)$) • A function of utility based on the current state and action
  • 17. RL Terms • Transition ($T(S, A) \to S'$) • A function that maps state-action pairs to a future state • Bandit: $T(S, A) = T(S)$ • Policy ($\pi(S, A_0, A_1, \ldots, A_n) \to \{0, \ldots, n\}$) • A function that, given a state, chooses an action • Episode • A sequence of state-action pairs for a single run (e.g. a complete game of Go)
  • 18. Value Optimization • Value ($Q(S, A)$) • The cumulative discounted reward given a state and action • $Q(s_t, a_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \gamma^3 r_{t+3} + \cdots$ • A good policy becomes: $\pi(s) = \arg\max_a Q(s, a)$
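As a small worked example of the cumulative discounted reward, the sketch below computes it for every step of one logged episode. The function name and the sample rewards are illustrative assumptions. It walks the episode backwards so that each step reuses the return already computed for the step after it.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Made-up episode with three steps and a discount factor of 0.5.
episode_returns = discounted_returns([1.0, 0.0, 2.0], gamma=0.5)
print(episode_returns)  # [1.5, 1.0, 2.0]
```

These per-step returns are the kind of targets a regression on historical data would fit.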
  • 19. Value Regression • $Q(s_t, a_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \gamma^3 r_{t+3} + \cdots$ • Collect historical data • Solve with linear regression • Problem: $r_{t+1}$ also depends on $a_{t+1}$
  • 20. Credit Assignment Problem • Current state/action • X’s turn to move • What is the value? • Pretty high
  • 21. Credit Assignment Problem • Next State/Action • Now what is the value? • Low • The future actions affect the past value
  • 22. State Action Reward State Action (SARSA) • Value Regression • $Q(s_t, a_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \ldots$ • SARSA • $Q(s_t, a_t) = r_t + \gamma Q(s_{t+1}, a_{t+1})$ • Idea borrowed from Dynamic Programming • Using the future Q is more robust • Value still highly influenced by current policy
  • 23. Q-Learning: Off-Policy SARSA • SARSA • $Q(s_t, a_t) = r_t + \gamma Q(s_{t+1}, a_{t+1})$ • Q-Learning • $Q(s_t, a_t) = r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})$ • Has better off-policy guarantees • $\max_{a_{t+1}}$ may be difficult to know/compute
  • 24. Policy Gradients • Q-Learning: $Q(s_t, a_t) = r_t + \gamma \max_{a_{t+1}} [Q(s_{t+1}, a_{t+1})]$ • What if we can’t do $\max_{a_{t+1}} [\ldots]$? • Policy Gradient • Approximate $\max_{a_{t+1}} [Q(s_{t+1}, a_{t+1})]$ • $Q(s_t, a_t) = r_t + \gamma A(s_{t+1})$ • Learn $A(s_{t+1})$ assuming Q is perfect: • Deep Deterministic Policy Gradient • $L(A(s_{t+1})) = \min(-Q(s_{t+1}, a_{t+1}))$ • Soft Actor Critic • $L(A(s_{t+1})) = \min(\log(P(A(s_{t+1}) = a_{t+1})) - Q(s_{t+1}, a_{t+1}))$
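The difference between the SARSA and Q-Learning targets is easiest to see side by side in a tabular setting. The sketch below is a minimal illustration with made-up states, actions, and numbers; it ignores terminal states and uses a plain dictionary as the Q table, both of which are simplifying assumptions. SARSA bootstraps from the next action that was actually logged, while Q-Learning bootstraps from the best next action under the current Q estimates.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def sarsa_target(q: QTable, reward: float, next_state: Hashable,
                 next_action: Hashable, gamma: float) -> float:
    """On-policy target: bootstrap from the next action that was logged."""
    return reward + gamma * q[(next_state, next_action)]


def q_learning_target(q: QTable, reward: float, next_state: Hashable,
                      actions: Sequence[Hashable], gamma: float) -> float:
    """Off-policy target: bootstrap from the best next action under q."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)


def td_update(q: QTable, state: Hashable, action: Hashable,
              target: float, learning_rate: float) -> None:
    """Move Q(state, action) a fraction of the way toward the target."""
    q[(state, action)] += learning_rate * (target - q[(state, action)])


# Made-up Q table: in state "s1", "right" looks better than "left".
q: QTable = defaultdict(float)
q[("s1", "left")] = 1.0
q[("s1", "right")] = 3.0

# Logged transition: (s0, "left") -> reward 1.0 -> s1, where the logging
# policy then took the worse action "left".
sarsa = sarsa_target(q, 1.0, "s1", "left", gamma=0.5)
q_learn = q_learning_target(q, 1.0, "s1", ["left", "right"], gamma=0.5)
print(sarsa, q_learn)  # 1.5 2.5

td_update(q, "s0", "left", q_learn, learning_rate=0.5)
print(q[("s0", "left")])  # 1.25
```

The SARSA target inherits the logging policy's poor next action, which is why its values stay tied to that policy; the Q-Learning target does not, but it requires a max over next actions.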
  • 26. Prior State of Applied RL • Small-scale • Notable Exceptions: ELF OpenGo, OpenAI Five, AlphaGo • Simulation-Driven • Simulators are often deterministic and stationary
  • 27. Prior State of Applied RL • Small-scale • Notable Exceptions: ELF OpenGo, OpenAI Five, AlphaGo • Simulation-Driven • Simulators are often deterministic and stationary Can we train personalized, large-scale RL models and bring them to billions of people?
  • 28. Applying RL at Scale • Batch Feature normalization & training • Because the loss target is dynamic, normalization is critical • Distributed training • Synchronous SGD (PASGD should be fine) • Fixed (but stochastic) policies • Epsilon-greedy, Softmax, Thompson Sampling • Fixed policies allow for massive deployment • No need for checkpointing, online parameter servers • Counterfactual Policy Evaluation • Detect anomalies and gain insights offline
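Two of the fixed stochastic policies named on this slide, epsilon-greedy and softmax, can be sketched as functions that turn Q-values into action probabilities. This is a hedged illustration using only the Python standard library; the function names, the temperature parameter, and the sample Q-values are assumptions, and Thompson Sampling is left out. Returning the probability of the sampled action alongside the action matters because counterfactual evaluation later needs that logged propensity.

```python
import math
import random
from typing import List, Sequence, Tuple


def epsilon_greedy_probs(q_values: Sequence[float], epsilon: float) -> List[float]:
    """Spread epsilon uniformly over all actions; give the rest to the best one."""
    n = len(q_values)
    best = max(range(n), key=lambda i: q_values[i])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def softmax_probs(q_values: Sequence[float], temperature: float) -> List[float]:
    """Boltzmann policy; lower temperature means closer to greedy."""
    top = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs: Sequence[float], rng: random.Random) -> Tuple[int, float]:
    """Sample an action and return it with its propensity for logging."""
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs[action]


# Made-up Q-values for three actions.
q_values = [1.0, 3.0, 2.0]
eg_probs = epsilon_greedy_probs(q_values, epsilon=0.2)
sm_probs = softmax_probs(q_values, temperature=1.0)
action, propensity = sample_action(sm_probs, random.Random(0))
print(eg_probs, sm_probs, action, propensity)
```

Because the policy is a pure function of the model's scores, it can be shipped with a frozen model and needs no online parameter updates, which matches the deployment argument on the slide.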
  • 29. Horizon: Applied RL Platform • Robust • Massively Parallel • Open Source • Built on high-performance platforms • Spark • PyTorch • ONNX • OpenAI Gym & Gridworld Integration tests
  • 30. Safe, Large-Scale deployment • Deploy models to 1000s of frontend servers • Counterfactual Policy Evaluation • Warm-start for continuous deployment • Built-in Explore/Exploit policies
  • 32. Preprocessing & Training • Preprocessing happens as part of training • Training begins by imitation learning, then pivots to policy maximization • Time-based or sequence-based discount factor • Highly optimized with PyTorch 1.0
  • 33. Counterfactual Policy Evaluation (CPE) • One-Step (estimate reward) • Direct Method (DM): Learn reward function for all states/actions • Inverse Propensity Score (IPS): Boost reward by ratio of action probabilities • Doubly-Robust: Use DM to reduce IPS variance • Value (estimate cumulative reward) • Direct Method: Learn reward and transition functions (model-based RL) • Sequential DR: Extrapolate one-step CPE across episode • MAGIC: Sliding window approach
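The three one-step estimators can be written down compactly for a logged bandit dataset. The sketch below uses the standard textbook forms of the Direct Method, Inverse Propensity Score, and Doubly-Robust estimators; it is not taken from Horizon's code, and the data class, function names, and toy numbers are assumptions made for illustration. Here `target_policy(context)` returns the new policy's action probabilities and `reward_model(context, action)` is a learned (possibly biased) reward estimate.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class LoggedSample:
    context: int
    action: int
    reward: float
    propensity: float  # probability the logging policy gave to `action`


Policy = Callable[[int], Sequence[float]]
RewardModel = Callable[[int, int], float]


def direct_method(data: List[LoggedSample], target_policy: Policy,
                  reward_model: RewardModel) -> float:
    """Average the modeled reward under the target policy's action distribution."""
    total = 0.0
    for s in data:
        probs = target_policy(s.context)
        total += sum(p * reward_model(s.context, a) for a, p in enumerate(probs))
    return total / len(data)


def inverse_propensity_score(data: List[LoggedSample], target_policy: Policy) -> float:
    """Reweight each logged reward by target probability / logging probability."""
    total = 0.0
    for s in data:
        weight = target_policy(s.context)[s.action] / s.propensity
        total += weight * s.reward
    return total / len(data)


def doubly_robust(data: List[LoggedSample], target_policy: Policy,
                  reward_model: RewardModel) -> float:
    """Direct Method baseline plus an importance-weighted correction of its error."""
    total = 0.0
    for s in data:
        probs = target_policy(s.context)
        baseline = sum(p * reward_model(s.context, a) for a, p in enumerate(probs))
        weight = probs[s.action] / s.propensity
        total += baseline + weight * (s.reward - reward_model(s.context, s.action))
    return total / len(data)


# Toy data: a uniform logging policy over two actions; action 1 always pays 1.
logged = [
    LoggedSample(0, 0, 0.0, 0.5),
    LoggedSample(0, 1, 1.0, 0.5),
    LoggedSample(0, 1, 1.0, 0.5),
    LoggedSample(0, 0, 0.0, 0.5),
]
always_one: Policy = lambda context: [0.0, 1.0]
biased_model: RewardModel = lambda context, action: [0.2, 0.7][action]

dm = direct_method(logged, always_one, biased_model)
ips = inverse_propensity_score(logged, always_one)
dr = doubly_robust(logged, always_one, biased_model)
print(dm, ips, dr)
```

In this toy case the true value of always choosing action 1 is 1.0. The Direct Method returns the biased model's 0.7, while IPS and Doubly-Robust both recover roughly 1.0; the Doubly-Robust form is the one that uses the model as a baseline to reduce the variance of the importance-weighted term.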
  • 34. CPE: Results on OpenAI Cartpole Mean absolute error (fraction of true value): 3.4%
  • 35. Production Launches • Infrastructure • 360 Video adaptive bitrate • Marketing/Growth • Newsfeed Notifications • Page Notifications • Ad Coupons • Recommendations • M Assistant filtering • Newsfeed/IG Value Model Optimization
  • 36. Train your own model!