Reinforcement Learning to Mimic Portfolio Behavior
Yigal Jhirad
April 20, 2020
NVIDIA GTC 2020: Deep Learning and AI Conference
Silicon Valley
GTC 2020: Table of Contents
I. Reinforcement Learning
— Machine Learning Landscape
— Portfolio Mimicking Strategies
— Reinforcement Learning
— Deep Learning Networks
— Model Specification
— Monte Carlo Simulations/Generative Adversarial Network
— Mean/Variance vs. Reinforcement Learning/DQN
II. Summary
III. Author Biography
DISCLAIMER: This presentation is for information purposes only. The presenter accepts no liability for the content of
this presentation, or for the consequences of any actions taken on the basis of the information provided. Although the
information in this presentation is considered to be accurate, this is not a representation that it is complete or should be
relied upon as a sole resource, as the information contained herein is subject to change.
GTC 2020: Artificial Intelligence
 Data: Structured/Unstructured
— Asset Prices, Volatility
— Fundamentals (P/E, PCE, Debt to Equity)
— Macro (GDP Growth, Interest Rates, Oil Prices)
— Technical (Momentum)
— Sentiment Analysis
 Machine Learning
— Unsupervised Learning: Cluster Analysis, Principal Components, Expectation Maximization, Generative Adversarial Network
— Supervised Learning: Neural Networks, Support Vector Machines, Classification & Regression Trees, K-Nearest Neighbors, Regression
— Reinforcement Learning: DQN, Q-Learning, Q-Matrix, Trial & Error
GTC 2020: Portfolio Mimicking Strategies
 Building Tracking Baskets
— Identify alternatives to gain exposure to common factors of an index
— Hedging portfolios to reduce volatility
— Replicate an investment style
 Portfolio Construction and Replication
— Identify portfolio DNA by identifying factor exposures (e.g. Value/Growth, Large Cap vs. Small
Cap, Momentum)
— Robust tracking portfolios should have two main characteristics (both sketched in code below):
– Minimize Tracking Error
– Cointegration (Engle/Granger) to minimize drift: the tendency of portfolios to deviate over
time due to structural bias
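A minimal sketch of these two diagnostics, assuming daily return series as NumPy arrays; the function names and the annualization constant are illustrative, not the presenter's implementation.

```python
import numpy as np

def tracking_error(port_returns: np.ndarray, target_returns: np.ndarray,
                   periods_per_year: int = 252) -> float:
    """Annualized standard deviation of the active return vs. the target."""
    active = port_returns - target_returns
    return float(np.std(active, ddof=1) * np.sqrt(periods_per_year))

def drift(port_returns: np.ndarray, target_returns: np.ndarray) -> np.ndarray:
    """Cumulative performance difference between tracking and target portfolios."""
    return np.cumprod(1.0 + port_returns) - np.cumprod(1.0 + target_returns)
```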
GTC 2020: Portfolio Mimicking Strategies
 Traditional Mean/Variance optimization has limitations
— Stability
— Risk model bias
— Traditional tracking error variance minimization techniques do not explicitly embed
cointegration, e.g. there is no guarantee of a mean-reverting process or that the tracking
error will be stationary
— As a result, the replicating portfolio will “drift” further away from the target and require more
frequent rebalancing
 Cointegration (a test sketch follows)
— Measures long-term relationships and dependencies: long-term equilibrium across asset prices
— Error correction model
— Stationary tracking error
— Minimize the cumulative and maximum performance difference (drift) between the model portfolio and
the target
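A minimal sketch of the Engle/Granger cointegration check and a stationarity test on the tracking spread, using statsmodels; the significance thresholds and the log-spread construction are assumptions.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, coint

def is_cointegrated(port_prices: np.ndarray, target_prices: np.ndarray,
                    alpha: float = 0.05) -> bool:
    """Engle/Granger two-step test: a small p-value rejects 'no cointegration'."""
    _, pvalue, _ = coint(port_prices, target_prices)
    return pvalue < alpha

def spread_is_stationary(port_prices: np.ndarray, target_prices: np.ndarray,
                         alpha: float = 0.05) -> bool:
    """ADF test on the log price spread; a stationary spread limits drift."""
    spread = np.log(port_prices) - np.log(target_prices)
    return adfuller(spread)[1] < alpha
```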
GTC 2020: Portfolio Mimicking Strategies
[Chart] Drift: cumulative performance difference between the tracking portfolios and the target portfolio.
GTC 2020: Reinforcement Learning
[Diagram] The Q-Learning loop: the Agent observes the Environment’s State, selects an Action under its Policy, and receives Rewards and new Observations. In Deep Q-Learning (DQN), a neural network maps state inputs $x_1, \dots, x_5$ to Q-values, replacing the lookup table.

Q-learning update: $Q(s, a) \leftarrow r(s, a) + \gamma \max_{a} Q(s', a)$
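A minimal tabular sketch of this update; the state/action encodings for a portfolio problem are assumptions.

```python
import numpy as np

def q_update(Q: np.ndarray, s: int, a: int, r: float, s_next: int,
             lr: float = 0.1, gamma: float = 0.99) -> None:
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += lr * (td_target - Q[s, a])
```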
GTC 2020: Reinforcement Learning
 So what is it about mimicking portfolios that makes them appropriate for RL?
— Environment/State: Partially Observable/Markov Decision Process
— Data generation process based on historical data, simulated environments with shocks to
environment to reflect risk off regimes
— Agent: Portfolio Management Process
— Action: Portfolio Rebalance
— Rewards: a short-term reward (minimize tracking error) and a long-term reward (minimize drift)
that fit within the overall Bellman Equation
— A Deep Q-Network implements deep Q-learning, replacing the state/action table with a neural
network and learning the value function through backpropagation
— Dynamic Programming/Feedback Loop
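A minimal environment sketch of this mapping, assuming historical or simulated return matrices; the class name, interface, and the 0.1 reward weighting are illustrative assumptions, not the presenter's implementation.

```python
import numpy as np

class MimicEnv:
    """Sketch of the Environment/State/Agent/Action/Rewards mapping above."""

    def __init__(self, asset_returns: np.ndarray, target_returns: np.ndarray):
        self.asset_returns = asset_returns    # (T, n_assets), historical or simulated
        self.target_returns = target_returns  # (T,), target portfolio returns
        self.t, self.cum_diff = 0, 0.0

    def reset(self) -> np.ndarray:
        self.t, self.cum_diff = 0, 0.0
        return self.asset_returns[self.t]

    def step(self, weights: np.ndarray):
        """Action = rebalance to `weights`; reward couples a short-term tracking
        penalty with a long-term drift penalty (weighting is an assumption)."""
        active = float(weights @ self.asset_returns[self.t]) - self.target_returns[self.t]
        self.cum_diff += active
        reward = -active ** 2 - 0.1 * abs(self.cum_diff)
        self.t += 1
        done = self.t >= len(self.target_returns)
        obs = self.asset_returns[min(self.t, len(self.asset_returns) - 1)]
        return obs, reward, done, {}
```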
GTC 2020: Reinforcement Learning
Dynamic Policy Development
 The environment and state will integrate the macro, fundamental, technical and portfolio exposures
— Environment/State/Agent/Action/Rewards
— The portfolio construction process (Agent) interacts with the market environment (Environment), updates the state (State), and makes portfolio decisions (Action) to mimic the target portfolio over time (Rewards)
— The actions lead to a new portfolio, which will “react” to the current environment
— Monte Carlo Simulation may be used to stress the environment and randomly simulate and perturb states
— Effectively, the agent is developing a dynamic policy that leads to rewards
— Utilize a Double DQN
GTC 2020: Deep Learning Networks
 Deep Q-Learning (DQN)
— Uses a neural network framework to maximize rewards
— Model-free, off-policy reinforcement learning method
— Uses the maximum Q-value across all actions available in that state
— Value-based temporal difference (TD) learning
 Double DQN
— Uses one DQN network to select the action (forward pass) and a separate target DQN network to
evaluate it
— Choose an update frequency; use the target model to evaluate the next-state reward
 Exploration vs. Exploitation
— Greedy, ε-greedy with a time-varying schedule, dynamic Boltzmann softmax (both exploration rules are sketched below)
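A minimal sketch of the two exploration rules named above; the decay schedule and temperature values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values: np.ndarray, step: int, eps_start: float = 1.0,
                   eps_end: float = 0.05, decay: float = 1e-4) -> int:
    """Time-varying e-greedy: explore with probability eps(step), else exploit."""
    eps = eps_end + (eps_start - eps_end) * np.exp(-decay * step)
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values: np.ndarray, temperature: float = 1.0) -> int:
    """Boltzmann/softmax exploration: sample an action with weight exp(Q/T)."""
    z = q_values / temperature
    p = np.exp(z - z.max())
    return int(rng.choice(len(q_values), p=p / p.sum()))
```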
Reinforcement Learning: Deep Q-Learning and Double DQN

[Diagram] The partially observable STATE (fundamental/macro/technical inputs, e.g. Price/Earnings and Momentum factors, volatilities/correlations, and simulations) feeds a DQN Network mapping inputs $x_1, \dots, x_5$ to Q-values. A Target DQN Network, periodically updated from the online network, evaluates the short/long-term reward (cointegration/drift).
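A minimal sketch of the Double DQN target computation this diagram describes: the online network selects the next action, and the periodically synced target network evaluates it. Array names and shapes are assumptions.

```python
import numpy as np

def double_dqn_targets(online_next_q: np.ndarray, target_next_q: np.ndarray,
                       rewards: np.ndarray, dones: np.ndarray,
                       gamma: float = 0.99) -> np.ndarray:
    """online_next_q / target_next_q: (batch, n_actions) Q-values at the next state.
    The online network selects argmax actions; the target network evaluates them."""
    next_actions = online_next_q.argmax(axis=1)
    next_values = target_next_q[np.arange(len(target_next_q)), next_actions]
    return rewards + gamma * (1.0 - dones) * next_values

# Periodically copy the online weights into the target network, e.g. every N updates.
```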
GTC 2020: Model Specification & Formulation
 Create a Target Portfolio based on specific rules and filters
— Utilize P/E, Momentum, Mean Reversion, Dividend Yield, Price to Book, and Size to build out the
portfolio (an illustrative screen follows this list)
— Number of securities ranges between 25 and 40 stocks
— Rebalance on a weekly basis to maintain consistency of factor exposures
 Construct a model portfolio based on variance minimization (the MV portfolio)
— A risk model based on a historical covariance matrix
 Construct a model portfolio based on RL: a double DQN network attempts to replicate the
exposures of the target portfolio
— The environment/state used for RL will integrate the valuation and technical factors across
securities
 Both processes can only look through to the portfolio once a month
— Mixed Integer Optimization is used to limit the portfolio to no more than 20 names
 Review tracking risk and drift of these model portfolios vs. the target portfolio
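An illustrative rules-based screen in the spirit of the specification above; the column names, ranking directions, and equal-weighted composite are assumptions, not the presenter's rules.

```python
import pandas as pd

def build_target_portfolio(factors: pd.DataFrame, n_holdings: int = 30) -> pd.Index:
    """Rank the universe on a composite of the named factors and keep the top
    names (25-40 per the specification; 30 here)."""
    composite = (factors["pe"].rank(ascending=True)            # cheaper is better
                 + factors["momentum"].rank(ascending=False)   # stronger is better
                 + factors["dividend_yield"].rank(ascending=False)
                 + factors["price_to_book"].rank(ascending=True))
    return composite.nsmallest(n_holdings).index
```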
GTC 2020: Reinforcement Learning Framework
 Reinforcement Learning Implementation (a loop sketch follows this list)
— Initialize network with random weights
— For each episode:
– For each time step
– Environment passes on state – the market conditions, portfolio positions, etc.
– Select action across a universe of stocks
– Execute action and evaluate short term reward and projected long term reward utilizing a
double DQN implementation
– The environment will “react” to the output and generate a new state
– The new state will integrate the action into its profile
– Randomly shock the environment
 Network hyperparameters
— Learning rate
— Discount Factor for long term rewards
— Incremental vs. mini-batch update mode
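A minimal sketch of the episode loop above; `env` follows the MimicEnv interface sketched earlier, while `agent` (with act/remember/learn) and the shock hook are hypothetical names.

```python
def train(env, agent, n_episodes: int = 500, shock_every: int = 10) -> None:
    """Run episodes: act, observe reward and next state, learn, and
    periodically shock the environment."""
    for episode in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = agent.act(state)          # select an action across the stock universe
            next_state, reward, done, _ = env.step(action)
            agent.remember(state, action, reward, next_state, done)
            agent.learn()                      # double-DQN update, incremental or mini-batch
            state = next_state
        if shock_every and episode % shock_every == 0:
            env.apply_random_shock()           # randomly shock the environment
```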
GTC 2020: Monte Carlo Simulations
 Monte Carlo Simulation
— Shock Correlations/Volatilities (a covariance-shock sketch follows this list)
— Equities: Increase correlations and volatilities to simulate risk-off regimes
— Fixed Income
– Parallel Shifts in Term Structure
– Shock Key Rate Durations
— Commodities: Shock Demand/Supply by perturbing term structure - Backwardation/Contango
— Currencies: Shock Volatilities/Correlation
 Assimilate these scenarios alongside historical realized performance to complement the data generation
process
— Draw out factor exposures that may be latent variables, not very prominent due to the risk model
or risk regime (e.g. low-volatility environments)
— Drift may be an outcome of a low-risk environment, where many factors remain latent, combined with
risk model bias that emphasizes select factors
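A minimal sketch of the correlation/volatility shock for the equity case; the multipliers are illustrative assumptions.

```python
import numpy as np

def shock_covariance(cov: np.ndarray, vol_mult: float = 1.5,
                     corr_shift: float = 0.2) -> np.ndarray:
    """Risk-off stress: scale volatilities up and push pairwise correlations
    toward 1, then rebuild the covariance matrix."""
    vols = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vols, vols)
    corr = (1.0 - corr_shift) * corr + corr_shift * np.ones_like(corr)
    np.fill_diagonal(corr, 1.0)
    shocked_vols = vol_mult * vols
    return corr * np.outer(shocked_vols, shocked_vols)
```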
GTC 2020: Generative Adversarial Network
 Generative Adversarial Network
— Complement simulated data by playing a generator and a discriminator off against each other
to better simulate real-world data (a minimal training sketch follows the diagram below)
— Better absorb time varying and sequential data vs. a one time shock
— Capture long-range relationships such as the presence of volatility clusters
— Simulate stress conditions and identify the impact of shocks on tracking and drift
— Nash Equilibrium
[Diagram] Input data (prices, volatility, correlations, fundamentals, technicals, macro, term structure) and Monte Carlo simulations feed the Generator; the Discriminator weighs the predicted/simulated data against extracts of historical data.
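A minimal GAN sketch for return-window simulation under these inputs, using PyTorch; the architecture, sizes, and learning rates are illustrative assumptions, not the presenter's implementation.

```python
import torch
import torch.nn as nn

latent_dim, seq_len = 16, 64  # noise size and return-window length (assumptions)
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, seq_len))
D = nn.Sequential(nn.Linear(seq_len, 128), nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def gan_step(real_returns: torch.Tensor) -> None:
    """One adversarial round on a (batch, seq_len) tensor of historical windows."""
    batch = real_returns.size(0)
    fake = G(torch.randn(batch, latent_dim))
    # Discriminator: score real windows toward 1, generated windows toward 0.
    d_loss = (bce(D(real_returns), torch.ones(batch, 1))
              + bce(D(fake.detach()), torch.zeros(batch, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: update toward fooling the discriminator.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```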
Summary of Results: Mean/Variance vs. DQN
Mean/Variance, by design, has consistently lower predicted (ex-ante) tracking
error. Will this translate into lower drift?
[Chart] Predicted (ex-ante) rolling tracking risk: Mean/Variance optimization vs. DQN.
Summary of Results: Mean/Variance vs. DQN
Mixed results: Reinforcement Learning was more effective at reducing drift from
2016-2018, while Mean/Variance did better from 2013-2016.
Summary of Results: Mean/Variance vs. DQN
[Chart] Drift: cumulative performance difference between the tracking portfolios and the target portfolio.
GTC 2020: Summary
 Advantages
— Reinforcement Learning complements traditional optimization techniques to better mimic
portfolio behavior and create more robust portfolio replication solutions
— Reward structure fits nicely into optimization framework targeting variance minimization and
cointegration
— Mean/Variance optimization may help RL as a first pass in creating the initial portfolio weights
 Considerations
— Difficult to train
— Optimization: local minima and local convergence due to the presence of non-convexity
— Computationally intensive. Time constraints.
— Apply Genetic algorithms
— Leverage CUDA
 More research needs to be done
Author Biography
 Yigal D. Jhirad, Senior Vice President, is Director of Quantitative and Derivatives Strategies
and Portfolio Manager for Cohen & Steers. Mr. Jhirad heads the firm’s Investment Risk
Committee. Prior to joining the firm in 2007, Mr. Jhirad was an executive director in the
institutional equities division of Morgan Stanley, where he headed the company’s portfolio and
derivatives strategies effort. He was responsible for developing quantitative and derivatives
products to a broad array of institutional clients. Mr. Jhirad holds a BS from the Wharton
School. He is a Financial Risk Manager (FRM), as Certified by the Global Association of Risk
Professionals.
 LinkedIn: linkedin.com/in/yigaljhirad