SlideShare a Scribd company logo
1 of 19
Download to read offline
CME 241: Reinforcement Learning for Stochastic
Control Problems in Finance
Ashwin Rao
ICME, Stanford University
Ashwin Rao (Stanford) RL for Finance 1 / 19
Overview of the Course
Theory of Markov Decision Processes (MDPs)
Dynamic Programming (DP) Algorithms (a.k.a. model-based)
Reinforcement Learning (RL) Algorithms (a.k.a. model-free)
Plenty of Python implementations of models and algorithms
Apply these algorithms to 3 Financial/Trading problems:
(Dynamic) Asset-Allocation to maximize utility of Consumption
Optimal Exercise/Stopping of Path-dependent American Options
Optimal Trade Order Execution (managing Price Impact)
By treating each of the problems as MDPs (i.e., Stochastic Control)
We will go over classical/analytical solutions to these problems
Then introduce real-world considerations, and tackle with RL (or DP)
Ashwin Rao (Stanford) RL for Finance 2 / 19
What is the flavor of this course?
An important goal of this course is to effectively blend:
Theory/Mathematics
Programming/Algorithms
Modeling of Read-World Finance/Trading Problems
Ashwin Rao (Stanford) RL for Finance 3 / 19
Pre-Requisites and Housekeeping
Theory Pre-reqs: Optimization, Probability, Pricing, Portfolio Theory
Coding Pre-reqs: Data Structures/Algorithms with numpy/scipy
Alternative: Written test to demonstrate above-listed background
Grade based on Class Participation, 1 Exam, and Programming Work
Passing grade fetches 3 credits, can be applied towards MCF degree
Wed and Fri 4:30pm - 5:50pm, 01/07/2019 - 03/15/2019
Classes in GES building, room 150
Appointments: Any time Fridays or an hour before class Wednesdays
Use appointments time to discuss theory as well as your code
Ashwin Rao (Stanford) RL for Finance 4 / 19
Resources
I recommend Sutton-Barto as the companion book for this course
I won’t follow the structure of Sutton-Barto book
But I will follow his approach/treatment
I will follow the structure of David Silver’s RL course
I encourage you to augment my lectures with David’s lecture videos
Occasionally, I will wear away or speed up/slow down from this flow
We will do a bit more Theory & a lot more coding (relative to above)
You can freely use my open-source code for your coding work
I expect you to duplicate the functionality of above code in this course
We will go over some classical papers on the Finance applications
To understand in-depth the analytical solutions in simple settings
I will augment the above content with many of my own slides
All of this will be organized on the course web site
Ashwin Rao (Stanford) RL for Finance 5 / 19
The MDP Framework (that RL is based on)
Ashwin Rao (Stanford) RL for Finance 6 / 19
Components of the MDP Framework
The Agent and the Environment interact in a time-sequenced loop
Agent responds to [State, Reward] by taking an Action
Environment responds by producing next step’s (random) State
Environment also produces a (random) scalar denoted as Reward
Goal of Agent is to maximize Expected Sum of all future Rewards
By controlling the (Policy : State → Action) function
Agent often doesn’t know the Model of the Environment
Model refers to state-transition probabilities and reward function
So, Agent has to learn the Model AND learn the Optimal Policy
This is a dynamic (time-sequenced control) system under uncertainty
Ashwin Rao (Stanford) RL for Finance 7 / 19
Many real-world problems fit this MDP framework
Self-driving vehicle (speed/steering to optimize safety/time)
Game of Chess (Boolean Reward at end of game)
Complex Logistical Operations (eg: movements in a Warehouse)
Make a humanoid robot walk/run on difficult terrains
Manage an investment portfolio
Control a power station
Optimal decisions during a football game
Strategy to win an election (high-complexity MDP)
Ashwin Rao (Stanford) RL for Finance 8 / 19
Why are these problems hard?
Model of Environment is unknown (learn as you go)
State space can be large or complex (involving many variables)
Sometimes, Action space is also large or complex
No direct feedback on “correct” Actions (only feedback is Reward)
Actions can have delayed consequences (late Rewards)
Time-sequenced complexity (eg: Actions influence future Actions)
Agent Actions need to tradeoff between “explore” and “exploit”
Ashwin Rao (Stanford) RL for Finance 9 / 19
Why is RL interesting/useful to learn about?
RL solves MDP problem when Environment Model is unknown
Or when State or Action space is too large/complex
Promise of modern A.I. is based on success of RL algorithms
Potential for automated decision-making in real-world business
In 10 years: Bots that act or behave more optimal than humans
RL already solves various low-complexity real-world problems
RL will soon be the most-desired skill in the Data Science job-market
Possibilities in Finance are endless (we cover 3 important problems)
Learning RL is a lot of fun! (interesting in theory as well as coding)
Ashwin Rao (Stanford) RL for Finance 10 / 19
Optimal Asset Allocation to Maximize Consumption Utility
You can invest in (allocate wealth to) a collection of assets
Investment horizon is a fixed length of time
Each risky asset has an unknown distribution of returns
Transaction Costs & Constraints on trading hours/quantities/shorting
Allowed to consume a fraction of your wealth at specific times
Dynamic Decision: Time-Sequenced Allocation & Consumption
To maximize horizon-aggregated Utility of Consumption
Utility function represents degree of risk-aversion
So, we effectively maximize aggregate Risk-Adjusted Consumption
Ashwin Rao (Stanford) RL for Finance 11 / 19
MDP for Optimal Asset Allocation problem
State is [Current Time, Current Holdings, Current Prices]
Action is [Allocation Quantities, Consumption Quantity]
Actions limited by various real-world trading constraints
Reward is Utility of Consumption less Transaction Costs
State-transitions governed by risky asset movements
Ashwin Rao (Stanford) RL for Finance 12 / 19
Optimal Exercise of Path-Dependent American Options
RL is an alternative to Longstaff-Schwartz algorithm for Pricing
State is [Current Time, History of Spot Prices]
Action is Boolean: Exercise (i.e., Payoff and Stop) or Continue
Reward always 0, except upon Exercise (= Payoff)
State-transitions governed by Spot Price Stochastic Process
Optimal Policy ⇒ Optimal Stopping ⇒ Option Price
Can be generalized to other Optimal Stopping problems
Ashwin Rao (Stanford) RL for Finance 13 / 19
Optimal Trade Order Execution (controlling Price Impact)
You are tasked with selling a large qty of a (relatively less-liquid) stock
You have a fixed horizon over which to complete the sale
Goal is to maximize aggregate sales proceeds over horizon
If you sell too fast, Price Impact will result in poor sales proceeds
If you sell too slow, you risk running out of time
We need to model temporary and permanent Price Impacts
Objective should incorporate penalty for variance of sales proceeds
Which is equivalent to maximizing aggregate Utility of sales proceeds
Ashwin Rao (Stanford) RL for Finance 14 / 19
MDP for Optimal Trade Order Execution
State is [Time Remaining, Stock Remaining to be Sold, Market Info]
Action is Quantity of Stock to Sell at current time
Reward is Utility of Sales Proceeds (i.e., proceeds-variance-adjusted)
Reward & State-transitions governed by Price Impact Model
Real-world Model can be quite complex (Order Book Dynamics)
Ashwin Rao (Stanford) RL for Finance 15 / 19
Landmark Papers we will cover in detail
Merton’s solution for Optimal Portfolio Allocation/Consumption
Longstaff-Schwartz Algorithm for Pricing American Options
Bertsimas-Lo paper on Optimal Execution Cost
Almgren-Chriss paper on Optimal Risk-Adjusted Execution Cost
Sutton et al’s proof of Policy Gradient Theorem
Ashwin Rao (Stanford) RL for Finance 16 / 19
Week by Week (Tentative) Schedule
Week 1: Introduction to RL and Overview of Finance Problems
Week 2: Theory of Markov Decision Process & Bellman Equations
Week 3: Dynamic Programming (Policy Iteration, Value Iteration)
Week 4: Model-free Prediction (RL for Value Function estimation)
Week 5: Model-free Control (RL for Optimal Value Function/Policy)
Week 6: RL with Function Approximation (including Deep RL)
Week 7: Policy Gradient Algorithms
Week 8: Optimal Asset Allocation problem
Week 9: Optimal Exercise of American Options problem
Week 10: Optimal Trade Order Execution problem
Ashwin Rao (Stanford) RL for Finance 17 / 19
Sneak Peek into a couple of lectures in this course
Policy Gradient Theorem and Compatible Approximation Theorem
HJB Equation and Merton’s Portfolio Problem
Ashwin Rao (Stanford) RL for Finance 18 / 19
Similar Courses offered at Stanford
CS 234 (Emma Brunskill - Winter 2019)
AA 228/CS 238 (Mykel Kochenderfer - Autumn 2018)
CS 332 (Emma Brunskill - Autumn 2018)
MS&E 351 (Ben Van Roy - Winter 2019)
MS&E 348 (Gerd Infanger) - Winter 2020)
MS&E 338 (Ben Van Roy - Spring 2019)
Ashwin Rao (Stanford) RL for Finance 19 / 19

More Related Content

What's hot

"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and...
"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and..."Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and...
"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and...Quantopian
 
Forecasting Techniques
Forecasting TechniquesForecasting Techniques
Forecasting Techniquesguest865c0e0c
 
Stock Return Forecast - Theory and Empirical Evidence
Stock Return Forecast - Theory and Empirical EvidenceStock Return Forecast - Theory and Empirical Evidence
Stock Return Forecast - Theory and Empirical EvidenceTai Tran
 
Class notes forecasting
Class notes forecastingClass notes forecasting
Class notes forecastingArun Kumar
 
Arbitrage pricing theory & Efficient market hypothesis
Arbitrage pricing theory & Efficient market hypothesisArbitrage pricing theory & Efficient market hypothesis
Arbitrage pricing theory & Efficient market hypothesisHari Ram
 
Modeling Transaction Costs for Algorithmic Strategies
Modeling Transaction Costs for Algorithmic StrategiesModeling Transaction Costs for Algorithmic Strategies
Modeling Transaction Costs for Algorithmic StrategiesQuantopian
 
Bba 1584 planning n forecasting
Bba 1584 planning n forecastingBba 1584 planning n forecasting
Bba 1584 planning n forecastinglecturer-notes2014
 
An Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing TheoryAn Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing TheoryAkhil Goyal
 
Automatic algorithms for time series forecasting
Automatic algorithms for time series forecastingAutomatic algorithms for time series forecasting
Automatic algorithms for time series forecastingRob Hyndman
 
Technical Interview Workshop Sept 17 2007 V2 Website
Technical Interview Workshop Sept 17 2007 V2 WebsiteTechnical Interview Workshop Sept 17 2007 V2 Website
Technical Interview Workshop Sept 17 2007 V2 WebsiteMcGill Investment Club
 
Forecasting and methods of forecasting
Forecasting and methods of forecastingForecasting and methods of forecasting
Forecasting and methods of forecastingMilind Pelagade
 
Forecasting
ForecastingForecasting
ForecastingSVGANGAD
 
ForecastIT 1. Introduction to Forecasting
ForecastIT 1. Introduction to ForecastingForecastIT 1. Introduction to Forecasting
ForecastIT 1. Introduction to ForecastingDeepThought, Inc.
 
"Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin...
"Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin..."Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin...
"Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin...Quantopian
 

What's hot (20)

"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and...
"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and..."Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and...
"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and...
 
Forecasting Techniques
Forecasting TechniquesForecasting Techniques
Forecasting Techniques
 
Stock Return Forecast - Theory and Empirical Evidence
Stock Return Forecast - Theory and Empirical EvidenceStock Return Forecast - Theory and Empirical Evidence
Stock Return Forecast - Theory and Empirical Evidence
 
Forecasting
ForecastingForecasting
Forecasting
 
Demand Forcasting
Demand ForcastingDemand Forcasting
Demand Forcasting
 
Class notes forecasting
Class notes forecastingClass notes forecasting
Class notes forecasting
 
Arbitrage pricing theory & Efficient market hypothesis
Arbitrage pricing theory & Efficient market hypothesisArbitrage pricing theory & Efficient market hypothesis
Arbitrage pricing theory & Efficient market hypothesis
 
Modeling Transaction Costs for Algorithmic Strategies
Modeling Transaction Costs for Algorithmic StrategiesModeling Transaction Costs for Algorithmic Strategies
Modeling Transaction Costs for Algorithmic Strategies
 
forecasting methods
forecasting methodsforecasting methods
forecasting methods
 
Bba 1584 planning n forecasting
Bba 1584 planning n forecastingBba 1584 planning n forecasting
Bba 1584 planning n forecasting
 
An Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing TheoryAn Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing Theory
 
Automatic algorithms for time series forecasting
Automatic algorithms for time series forecastingAutomatic algorithms for time series forecasting
Automatic algorithms for time series forecasting
 
Technical Interview Workshop Sept 17 2007 V2 Website
Technical Interview Workshop Sept 17 2007 V2 WebsiteTechnical Interview Workshop Sept 17 2007 V2 Website
Technical Interview Workshop Sept 17 2007 V2 Website
 
demand forecasting
demand forecastingdemand forecasting
demand forecasting
 
Forecasting and methods of forecasting
Forecasting and methods of forecastingForecasting and methods of forecasting
Forecasting and methods of forecasting
 
Forecasting
ForecastingForecasting
Forecasting
 
Forecasting model 15 04-31
Forecasting model 15 04-31Forecasting model 15 04-31
Forecasting model 15 04-31
 
ForecastIT 1. Introduction to Forecasting
ForecastIT 1. Introduction to ForecastingForecastIT 1. Introduction to Forecasting
ForecastIT 1. Introduction to Forecasting
 
"Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin...
"Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin..."Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin...
"Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin...
 
Chapter 13
Chapter 13Chapter 13
Chapter 13
 

Similar to RL for Finance Problems: Asset Allocation, Options, Trading

Quantitative management
Quantitative managementQuantitative management
Quantitative managementsmumbahelp
 
Operations research.pptx
Operations research.pptxOperations research.pptx
Operations research.pptxssuserad3989
 
Operation research history and overview application limitation
Operation research history and overview application limitationOperation research history and overview application limitation
Operation research history and overview application limitationBalaji P
 
decision making in Lp
decision making in Lp decision making in Lp
decision making in Lp Dronak Sahu
 
Introduction to Reinforcement Learning.pdf
Introduction to Reinforcement Learning.pdfIntroduction to Reinforcement Learning.pdf
Introduction to Reinforcement Learning.pdfAbhinavNautiyal8
 
Supply Chain Management - Optimization technology
Supply Chain Management - Optimization technologySupply Chain Management - Optimization technology
Supply Chain Management - Optimization technologyNurhazman Abdul Aziz
 
Fdp session rtu session 1
Fdp session rtu session 1Fdp session rtu session 1
Fdp session rtu session 1sprsingh1
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysYasutoTamura1
 
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...QuantInsti
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learningAndres Mendez-Vazquez
 
CA02CA3103 RMTLPP Formulation.pdf
CA02CA3103 RMTLPP Formulation.pdfCA02CA3103 RMTLPP Formulation.pdf
CA02CA3103 RMTLPP Formulation.pdfMinawBelay
 
080324 Miller Opportunity Matrix Asq Presentation
080324 Miller Opportunity Matrix Asq Presentation080324 Miller Opportunity Matrix Asq Presentation
080324 Miller Opportunity Matrix Asq Presentationrwmill9716
 
Operations research ppt
Operations research pptOperations research ppt
Operations research pptraaz kumar
 
Chapter 00 Introduction Operational research
Chapter 00 Introduction Operational researchChapter 00 Introduction Operational research
Chapter 00 Introduction Operational researchMariaSarwat
 
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docxfelicidaddinwoodie
 
Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07Steve Feldman
 

Similar to RL for Finance Problems: Asset Allocation, Options, Trading (20)

Quantitative management
Quantitative managementQuantitative management
Quantitative management
 
Operations research.pptx
Operations research.pptxOperations research.pptx
Operations research.pptx
 
Operation research history and overview application limitation
Operation research history and overview application limitationOperation research history and overview application limitation
Operation research history and overview application limitation
 
decision making in Lp
decision making in Lp decision making in Lp
decision making in Lp
 
Introduction to Reinforcement Learning.pdf
Introduction to Reinforcement Learning.pdfIntroduction to Reinforcement Learning.pdf
Introduction to Reinforcement Learning.pdf
 
Supply Chain Management - Optimization technology
Supply Chain Management - Optimization technologySupply Chain Management - Optimization technology
Supply Chain Management - Optimization technology
 
Fdp session rtu session 1
Fdp session rtu session 1Fdp session rtu session 1
Fdp session rtu session 1
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative ways
 
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
What Can RL do.pptx
What Can RL do.pptxWhat Can RL do.pptx
What Can RL do.pptx
 
CA02CA3103 RMTLPP Formulation.pdf
CA02CA3103 RMTLPP Formulation.pdfCA02CA3103 RMTLPP Formulation.pdf
CA02CA3103 RMTLPP Formulation.pdf
 
Portfolio Planning
Portfolio PlanningPortfolio Planning
Portfolio Planning
 
080324 Miller Opportunity Matrix Asq Presentation
080324 Miller Opportunity Matrix Asq Presentation080324 Miller Opportunity Matrix Asq Presentation
080324 Miller Opportunity Matrix Asq Presentation
 
What is Operation Research?
What is Operation Research?What is Operation Research?
What is Operation Research?
 
Operations research ppt
Operations research pptOperations research ppt
Operations research ppt
 
Optimazation
OptimazationOptimazation
Optimazation
 
Chapter 00 Introduction Operational research
Chapter 00 Introduction Operational researchChapter 00 Introduction Operational research
Chapter 00 Introduction Operational research
 
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
 
Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07
 

More from Ashwin Rao

Stochastic Control/Reinforcement Learning for Optimal Market Making
Stochastic Control/Reinforcement Learning for Optimal Market MakingStochastic Control/Reinforcement Learning for Optimal Market Making
Stochastic Control/Reinforcement Learning for Optimal Market MakingAshwin Rao
 
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree Search
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree SearchAdaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree Search
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree SearchAshwin Rao
 
Fundamental Theorems of Asset Pricing
Fundamental Theorems of Asset PricingFundamental Theorems of Asset Pricing
Fundamental Theorems of Asset PricingAshwin Rao
 
Evolutionary Strategies as an alternative to Reinforcement Learning
Evolutionary Strategies as an alternative to Reinforcement LearningEvolutionary Strategies as an alternative to Reinforcement Learning
Evolutionary Strategies as an alternative to Reinforcement LearningAshwin Rao
 
Understanding Dynamic Programming through Bellman Operators
Understanding Dynamic Programming through Bellman OperatorsUnderstanding Dynamic Programming through Bellman Operators
Understanding Dynamic Programming through Bellman OperatorsAshwin Rao
 
Stochastic Control of Optimal Trade Order Execution
Stochastic Control of Optimal Trade Order ExecutionStochastic Control of Optimal Trade Order Execution
Stochastic Control of Optimal Trade Order ExecutionAshwin Rao
 
Overview of Stochastic Calculus Foundations
Overview of Stochastic Calculus FoundationsOverview of Stochastic Calculus Foundations
Overview of Stochastic Calculus FoundationsAshwin Rao
 
Risk-Aversion, Risk-Premium and Utility Theory
Risk-Aversion, Risk-Premium and Utility TheoryRisk-Aversion, Risk-Premium and Utility Theory
Risk-Aversion, Risk-Premium and Utility TheoryAshwin Rao
 
Value Function Geometry and Gradient TD
Value Function Geometry and Gradient TDValue Function Geometry and Gradient TD
Value Function Geometry and Gradient TDAshwin Rao
 
HJB Equation and Merton's Portfolio Problem
HJB Equation and Merton's Portfolio ProblemHJB Equation and Merton's Portfolio Problem
HJB Equation and Merton's Portfolio ProblemAshwin Rao
 
Policy Gradient Theorem
Policy Gradient TheoremPolicy Gradient Theorem
Policy Gradient TheoremAshwin Rao
 
A Quick and Terse Introduction to Efficient Frontier Mathematics
A Quick and Terse Introduction to Efficient Frontier MathematicsA Quick and Terse Introduction to Efficient Frontier Mathematics
A Quick and Terse Introduction to Efficient Frontier MathematicsAshwin Rao
 
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkRecursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkAshwin Rao
 
Demystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance TradeoffDemystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance TradeoffAshwin Rao
 
Category Theory made easy with (ugly) pictures
Category Theory made easy with (ugly) picturesCategory Theory made easy with (ugly) pictures
Category Theory made easy with (ugly) picturesAshwin Rao
 
Risk Pooling sensitivity to Correlation
Risk Pooling sensitivity to CorrelationRisk Pooling sensitivity to Correlation
Risk Pooling sensitivity to CorrelationAshwin Rao
 
Abstract Algebra in 3 Hours
Abstract Algebra in 3 HoursAbstract Algebra in 3 Hours
Abstract Algebra in 3 HoursAshwin Rao
 
OmniChannelNewsvendor
OmniChannelNewsvendorOmniChannelNewsvendor
OmniChannelNewsvendorAshwin Rao
 
The Newsvendor meets the Options Trader
The Newsvendor meets the Options TraderThe Newsvendor meets the Options Trader
The Newsvendor meets the Options TraderAshwin Rao
 
The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||Ashwin Rao
 

More from Ashwin Rao (20)

Stochastic Control/Reinforcement Learning for Optimal Market Making
Stochastic Control/Reinforcement Learning for Optimal Market MakingStochastic Control/Reinforcement Learning for Optimal Market Making
Stochastic Control/Reinforcement Learning for Optimal Market Making
 
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree Search
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree SearchAdaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree Search
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree Search
 
Fundamental Theorems of Asset Pricing
Fundamental Theorems of Asset PricingFundamental Theorems of Asset Pricing
Fundamental Theorems of Asset Pricing
 
Evolutionary Strategies as an alternative to Reinforcement Learning
Evolutionary Strategies as an alternative to Reinforcement LearningEvolutionary Strategies as an alternative to Reinforcement Learning
Evolutionary Strategies as an alternative to Reinforcement Learning
 
Understanding Dynamic Programming through Bellman Operators
Understanding Dynamic Programming through Bellman OperatorsUnderstanding Dynamic Programming through Bellman Operators
Understanding Dynamic Programming through Bellman Operators
 
Stochastic Control of Optimal Trade Order Execution
Stochastic Control of Optimal Trade Order ExecutionStochastic Control of Optimal Trade Order Execution
Stochastic Control of Optimal Trade Order Execution
 
Overview of Stochastic Calculus Foundations
Overview of Stochastic Calculus FoundationsOverview of Stochastic Calculus Foundations
Overview of Stochastic Calculus Foundations
 
Risk-Aversion, Risk-Premium and Utility Theory
Risk-Aversion, Risk-Premium and Utility TheoryRisk-Aversion, Risk-Premium and Utility Theory
Risk-Aversion, Risk-Premium and Utility Theory
 
Value Function Geometry and Gradient TD
Value Function Geometry and Gradient TDValue Function Geometry and Gradient TD
Value Function Geometry and Gradient TD
 
HJB Equation and Merton's Portfolio Problem
HJB Equation and Merton's Portfolio ProblemHJB Equation and Merton's Portfolio Problem
HJB Equation and Merton's Portfolio Problem
 
Policy Gradient Theorem
Policy Gradient TheoremPolicy Gradient Theorem
Policy Gradient Theorem
 
A Quick and Terse Introduction to Efficient Frontier Mathematics
A Quick and Terse Introduction to Efficient Frontier MathematicsA Quick and Terse Introduction to Efficient Frontier Mathematics
A Quick and Terse Introduction to Efficient Frontier Mathematics
 
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkRecursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
 
Demystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance TradeoffDemystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance Tradeoff
 
Category Theory made easy with (ugly) pictures
Category Theory made easy with (ugly) picturesCategory Theory made easy with (ugly) pictures
Category Theory made easy with (ugly) pictures
 
Risk Pooling sensitivity to Correlation
Risk Pooling sensitivity to CorrelationRisk Pooling sensitivity to Correlation
Risk Pooling sensitivity to Correlation
 
Abstract Algebra in 3 Hours
Abstract Algebra in 3 HoursAbstract Algebra in 3 Hours
Abstract Algebra in 3 Hours
 
OmniChannelNewsvendor
OmniChannelNewsvendorOmniChannelNewsvendor
OmniChannelNewsvendor
 
The Newsvendor meets the Options Trader
The Newsvendor meets the Options TraderThe Newsvendor meets the Options Trader
The Newsvendor meets the Options Trader
 
The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||
 

Recently uploaded

Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixingviprabot1
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 

Recently uploaded (20)

Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixing
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 

RL for Finance Problems: Asset Allocation, Options, Trading

  • 1. CME 241: Reinforcement Learning for Stochastic Control Problems in Finance Ashwin Rao ICME, Stanford University Ashwin Rao (Stanford) RL for Finance 1 / 19
  • 2. Overview of the Course Theory of Markov Decision Processes (MDPs) Dynamic Programming (DP) Algorithms (a.k.a. model-based) Reinforcement Learning (RL) Algorithms (a.k.a. model-free) Plenty of Python implementations of models and algorithms Apply these algorithms to 3 Financial/Trading problems: (Dynamic) Asset-Allocation to maximize utility of Consumption Optimal Exercise/Stopping of Path-dependent American Options Optimal Trade Order Execution (managing Price Impact) By treating each of the problems as MDPs (i.e., Stochastic Control) We will go over classical/analytical solutions to these problems Then introduce real-world considerations, and tackle with RL (or DP) Ashwin Rao (Stanford) RL for Finance 2 / 19
  • 3. What is the flavor of this course? An important goal of this course is to effectively blend: Theory/Mathematics Programming/Algorithms Modeling of Read-World Finance/Trading Problems Ashwin Rao (Stanford) RL for Finance 3 / 19
  • 4. Pre-Requisites and Housekeeping Theory Pre-reqs: Optimization, Probability, Pricing, Portfolio Theory Coding Pre-reqs: Data Structures/Algorithms with numpy/scipy Alternative: Written test to demonstrate above-listed background Grade based on Class Participation, 1 Exam, and Programming Work Passing grade fetches 3 credits, can be applied towards MCF degree Wed and Fri 4:30pm - 5:50pm, 01/07/2019 - 03/15/2019 Classes in GES building, room 150 Appointments: Any time Fridays or an hour before class Wednesdays Use appointments time to discuss theory as well as your code Ashwin Rao (Stanford) RL for Finance 4 / 19
  • 5. Resources I recommend Sutton-Barto as the companion book for this course I won’t follow the structure of Sutton-Barto book But I will follow his approach/treatment I will follow the structure of David Silver’s RL course I encourage you to augment my lectures with David’s lecture videos Occasionally, I will wear away or speed up/slow down from this flow We will do a bit more Theory & a lot more coding (relative to above) You can freely use my open-source code for your coding work I expect you to duplicate the functionality of above code in this course We will go over some classical papers on the Finance applications To understand in-depth the analytical solutions in simple settings I will augment the above content with many of my own slides All of this will be organized on the course web site Ashwin Rao (Stanford) RL for Finance 5 / 19
  • 6. The MDP Framework (that RL is based on) Ashwin Rao (Stanford) RL for Finance 6 / 19
  • 7. Components of the MDP Framework The Agent and the Environment interact in a time-sequenced loop Agent responds to [State, Reward] by taking an Action Environment responds by producing next step’s (random) State Environment also produces a (random) scalar denoted as Reward Goal of Agent is to maximize Expected Sum of all future Rewards By controlling the (Policy : State → Action) function Agent often doesn’t know the Model of the Environment Model refers to state-transition probabilities and reward function So, Agent has to learn the Model AND learn the Optimal Policy This is a dynamic (time-sequenced control) system under uncertainty Ashwin Rao (Stanford) RL for Finance 7 / 19
  • 8. Many real-world problems fit this MDP framework Self-driving vehicle (speed/steering to optimize safety/time) Game of Chess (Boolean Reward at end of game) Complex Logistical Operations (eg: movements in a Warehouse) Make a humanoid robot walk/run on difficult terrains Manage an investment portfolio Control a power station Optimal decisions during a football game Strategy to win an election (high-complexity MDP) Ashwin Rao (Stanford) RL for Finance 8 / 19
  • 9. Why are these problems hard? Model of Environment is unknown (learn as you go) State space can be large or complex (involving many variables) Sometimes, Action space is also large or complex No direct feedback on “correct” Actions (only feedback is Reward) Actions can have delayed consequences (late Rewards) Time-sequenced complexity (eg: Actions influence future Actions) Agent Actions need to tradeoff between “explore” and “exploit” Ashwin Rao (Stanford) RL for Finance 9 / 19
  • 10. Why is RL interesting/useful to learn about? RL solves MDP problem when Environment Model is unknown Or when State or Action space is too large/complex Promise of modern A.I. is based on success of RL algorithms Potential for automated decision-making in real-world business In 10 years: Bots that act or behave more optimal than humans RL already solves various low-complexity real-world problems RL will soon be the most-desired skill in the Data Science job-market Possibilities in Finance are endless (we cover 3 important problems) Learning RL is a lot of fun! (interesting in theory as well as coding) Ashwin Rao (Stanford) RL for Finance 10 / 19
  • 11. Optimal Asset Allocation to Maximize Consumption Utility You can invest in (allocate wealth to) a collection of assets Investment horizon is a fixed length of time Each risky asset has an unknown distribution of returns Transaction Costs & Constraints on trading hours/quantities/shorting Allowed to consume a fraction of your wealth at specific times Dynamic Decision: Time-Sequenced Allocation & Consumption To maximize horizon-aggregated Utility of Consumption Utility function represents degree of risk-aversion So, we effectively maximize aggregate Risk-Adjusted Consumption Ashwin Rao (Stanford) RL for Finance 11 / 19
  • 12. MDP for Optimal Asset Allocation problem State is [Current Time, Current Holdings, Current Prices] Action is [Allocation Quantities, Consumption Quantity] Actions limited by various real-world trading constraints Reward is Utility of Consumption less Transaction Costs State-transitions governed by risky asset movements Ashwin Rao (Stanford) RL for Finance 12 / 19
  • 13. Optimal Exercise of Path-Dependent American Options RL is an alternative to Longstaff-Schwartz algorithm for Pricing State is [Current Time, History of Spot Prices] Action is Boolean: Exercise (i.e., Payoff and Stop) or Continue Reward always 0, except upon Exercise (= Payoff) State-transitions governed by Spot Price Stochastic Process Optimal Policy ⇒ Optimal Stopping ⇒ Option Price Can be generalized to other Optimal Stopping problems Ashwin Rao (Stanford) RL for Finance 13 / 19
  • 14. Optimal Trade Order Execution (controlling Price Impact) You are tasked with selling a large qty of a (relatively less-liquid) stock You have a fixed horizon over which to complete the sale Goal is to maximize aggregate sales proceeds over horizon If you sell too fast, Price Impact will result in poor sales proceeds If you sell too slow, you risk running out of time We need to model temporary and permanent Price Impacts Objective should incorporate penalty for variance of sales proceeds Which is equivalent to maximizing aggregate Utility of sales proceeds Ashwin Rao (Stanford) RL for Finance 14 / 19
  • 15. MDP for Optimal Trade Order Execution State is [Time Remaining, Stock Remaining to be Sold, Market Info] Action is Quantity of Stock to Sell at current time Reward is Utility of Sales Proceeds (i.e., proceeds-variance-adjusted) Reward & State-transitions governed by Price Impact Model Real-world Model can be quite complex (Order Book Dynamics) Ashwin Rao (Stanford) RL for Finance 15 / 19
  • 16. Landmark Papers we will cover in detail Merton’s solution for Optimal Portfolio Allocation/Consumption Longstaff-Schwartz Algorithm for Pricing American Options Bertsimas-Lo paper on Optimal Execution Cost Almgren-Chriss paper on Optimal Risk-Adjusted Execution Cost Sutton et al’s proof of Policy Gradient Theorem Ashwin Rao (Stanford) RL for Finance 16 / 19
  • 17. Week by Week (Tentative) Schedule Week 1: Introduction to RL and Overview of Finance Problems Week 2: Theory of Markov Decision Process & Bellman Equations Week 3: Dynamic Programming (Policy Iteration, Value Iteration) Week 4: Model-free Prediction (RL for Value Function estimation) Week 5: Model-free Control (RL for Optimal Value Function/Policy) Week 6: RL with Function Approximation (including Deep RL) Week 7: Policy Gradient Algorithms Week 8: Optimal Asset Allocation problem Week 9: Optimal Exercise of American Options problem Week 10: Optimal Trade Order Execution problem Ashwin Rao (Stanford) RL for Finance 17 / 19
  • 18. Sneak Peek into a couple of lectures in this course Policy Gradient Theorem and Compatible Approximation Theorem HJB Equation and Merton’s Portfolio Problem Ashwin Rao (Stanford) RL for Finance 18 / 19
  • 19. Similar Courses offered at Stanford CS 234 (Emma Brunskill - Winter 2019) AA 228/CS 238 (Mykel Kochenderfer - Autumn 2018) CS 332 (Emma Brunskill - Autumn 2018) MS&E 351 (Ben Van Roy - Winter 2019) MS&E 348 (Gerd Infanger) - Winter 2020) MS&E 338 (Ben Van Roy - Spring 2019) Ashwin Rao (Stanford) RL for Finance 19 / 19