Applying reinforcement learning to economics 
Neal Hughes 
Australian National University 
neal.hughes@anu.edu.au 
November 17, 2014 
Neal Hughes (ANU) Applying reinforcement learning to economics November 17, 2014 1 / 23
Machine learning 
Machine learning 
- algorithms that 'learn' from data, i.e., build models from data with minimal theory / human involvement
- goes hand in hand with 'Big Data'
Supervised Learning
- estimating functions mapping 'input' variables X to 'target' variables Y
- aka non-parametric regression
Reinforcement learning
- learning to make optimal (reward maximising) decisions in dynamic environments: learning optimal policy functions for Markov Decision Processes (MDPs)
- aka approximate dynamic programming
Reinforcement learning 
[Figure: agent-environment loop. Labels: Agent; Action, a_t; Environment; Reward, r_t; State, s_t; next state, s_{t+1}.]
A (single agent) water storage problem 
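[Figure: schematic of the water storage system with flow nodes 1, 2 and 3. Labels: inflow, I_{t+1}; storage, S_t; release point, F_{1t}; demand node; extraction, E_t; extraction point, F_{2t}; return flow, R_t; end of system, F_{3t}.]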
A (single agent) water storage problem 
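\[ \max_{\{W_t\}_{t=0}^{\infty}} E\left\{ \sum_{t=0}^{\infty} \beta^t \Pi(Q_t, I_t) \right\} \]
Subject to:
\[ S_{t+1} = \min\{ S_t - W_t - \delta_0 \alpha S_t^{2/3} + I_{t+1},\ K \} \]
\[ 0 \le W_t \le S_t \]
\[ Q_t \le \max\{ (1 - \delta_{1b}) W_t - \delta_{1a},\ 0 \} \]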
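The storage transition and the delivery constraint can be sketched in a few lines of Python. This is only an illustration: every parameter value, the inflow distribution, the fixed release rule and the lower clamp at zero are assumptions made here, not values or rules taken from the talk.

```python
import random

def next_storage(S, W, I_next, K, delta0, alpha):
    """S_{t+1} = min(S_t - W_t - delta0*alpha*S_t^(2/3) + I_{t+1}, K)."""
    evaporation = delta0 * alpha * S ** (2.0 / 3.0)
    # The clamp at zero is an added assumption so storage never goes negative.
    return max(min(S - W - evaporation + I_next, K), 0.0)

def max_delivery(W, delta1a, delta1b):
    """Upper bound on Q_t: max((1 - delta1b)*W_t - delta1a, 0)."""
    return max((1.0 - delta1b) * W - delta1a, 0.0)

def simulate(T=5, K=1000.0, S0=500.0, seed=0):
    """Roll the system forward under a hypothetical fixed release rule."""
    rng = random.Random(seed)
    S = S0
    path = []
    for _ in range(T):
        W = 0.3 * S                              # hypothetical rule, satisfies 0 <= W <= S
        Q = max_delivery(W, delta1a=10.0, delta1b=0.1)
        I_next = rng.uniform(0.0, 400.0)         # hypothetical inflow distribution
        S = next_storage(S, W, I_next, K, delta0=0.5, alpha=1.0)
        path.append((W, Q, S))
    return path
```

Calling `simulate()` returns a list of (release, deliverable quantity, next storage) triples, which is the kind of sample path a reinforcement learning method would learn from.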
Why reinforcement learning? 
[Figure: plot of Inflow (GL), vertical axis 0 to 2000, against Storage (GL), horizontal axis 0 to 1000.]
The Q function 
The standard Bellman equation with state value function V(s):
\[ V(s) = \max_{a} \left\{ R(s, a) + \beta \int_S T(s, a, s') V(s') \, ds' \right\} \]
The Bellman equation with action-value function Q(a, s):
\[ Q(a, s) = R(s, a) + \beta \int_S T(s, a, s') \max_{a'} Q(a', s') \, ds' \]
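With finitely many states and actions the integral becomes a sum, and the action-value form can be iterated directly. The sketch below does this for a made-up two-state, two-action problem; the rewards, transitions and discount factor are hypothetical and only illustrate the relation V(s) = max_a Q(a, s).

```python
def q_iteration(R, T, beta, n_iter=500):
    """Iterate Q(a,s) = R(s,a) + beta * sum_s' T(s,a,s') * max_a' Q(a',s').

    R[s][a] is the reward and T[s][a][s2] the transition probability.
    The result is indexed as Q[s][a].
    """
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(n_iter):
        V = [max(Q[s]) for s in range(n_s)]      # V(s) = max_a Q(a, s)
        Q = [[R[s][a] + beta * sum(T[s][a][s2] * V[s2] for s2 in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
    return Q

# Hypothetical problem: in state 0, action 1 pays 1 and moves to state 1;
# in state 1, action 0 pays 2 and stays in state 1.
R = [[0.0, 1.0], [2.0, 0.0]]
T = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
Q = q_iteration(R, T, beta=0.9)
V = [max(row) for row in Q]
policy = [row.index(max(row)) for row in Q]
```

Here staying in state 1 forever is worth 2 / (1 - 0.9) = 20, so V converges to about [19, 20] and the greedy policy read off Q is [1, 0].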
Fitted Q Iteration 
Algorithm 1: Fitted Q Iteration
initialise $s_0$
run a simulation with exploration for $T$ periods
store the samples $\{a_t, s_t, s_{t+1}, r_t\}_{t=0}^{T}$
initialise $Q(a_t, s_t)$
repeat  // iterate until convergence
    for $t = 0$ to $T$ do
        set $\hat{Q}_t = r_t + \beta \max_a Q(a, s_{t+1})$
    end
    estimate $Q$ by regressing $\hat{Q}_t$ against $(a_t, s_t)$
until a stopping rule is satisfied
With large dense data, computing $\max_a Q(a, \cdot)$ for each point is wasteful.
Alternative: max over a sample of points and fit a value function (Fitted Q-V iteration).
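A minimal Python sketch of the fitted Q iteration loop follows. The environment, the three-action set, the cell-averaging regressor and the fixed iteration count used as a stopping rule are all assumptions chosen to keep the example short; they stand in for whatever simulator and regression method is actually used.

```python
import random

ACTIONS = [0.0, 0.5, 1.0]   # hypothetical: share of storage released
N_BINS = 10                 # state bins used by the toy regressor

def step(s, a, rng):
    """Hypothetical environment: release a*s, earn sqrt(release), receive inflow."""
    release = a * s
    s_next = min(s - release + rng.uniform(0.0, 0.3), 1.0)
    return release ** 0.5, s_next

def bin_of(s):
    return min(int(s * N_BINS), N_BINS - 1)

def fit(samples, targets):
    """Regress targets on (a, s) by averaging within each (action, state-bin) cell."""
    sums = [[0.0] * N_BINS for _ in ACTIONS]
    counts = [[0] * N_BINS for _ in ACTIONS]
    for (a_idx, s, _, _), q in zip(samples, targets):
        sums[a_idx][bin_of(s)] += q
        counts[a_idx][bin_of(s)] += 1
    return [[sums[i][j] / counts[i][j] if counts[i][j] else 0.0
             for j in range(N_BINS)] for i in range(len(ACTIONS))]

def fitted_q_iteration(T=5000, beta=0.9, n_iter=60, seed=0):
    rng = random.Random(seed)
    s = 0.5
    samples = []
    for _ in range(T):                        # simulate with purely random exploration
        a_idx = rng.randrange(len(ACTIONS))
        r, s_next = step(s, ACTIONS[a_idx], rng)
        samples.append((a_idx, s, s_next, r))
        s = s_next
    Q = [[0.0] * N_BINS for _ in ACTIONS]     # initialise Q
    for _ in range(n_iter):                   # fixed count stands in for a stopping rule
        targets = [r + beta * max(Q[i][bin_of(s2)] for i in range(len(ACTIONS)))
                   for (_, _, s2, r) in samples]
        Q = fit(samples, targets)             # regress targets on (a_t, s_t)
    return Q
```

The returned `Q[i][j]` estimates the value of action `ACTIONS[i]` in state bin `j`. Note that the max over actions is recomputed for every stored sample on every pass, which is the cost the Q-V variant is meant to avoid.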
Single agent reinforcement learning 
Figure: An approximately equidistant grid in two dimensions (both axes from −4 to 4)
(a) 10000 iid standard normal points
(b) 100 points at least 0.4 apart
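One simple way to thin a dense cloud into points that are at least a given distance apart is a greedy pass, sketched below. The slides do not say how the grid in panel (b) was built, so the greedy rule here is an assumption, and the number of points it keeps need not match the figure.

```python
import math
import random

def sparse_subset(points, radius):
    """Greedy pass: keep a point only if it is at least `radius` from every kept point."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= radius for q in kept):
            kept.append(p)
    return kept

rng = random.Random(0)
cloud = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(10000)]
grid = sparse_subset(cloud, radius=0.4)
```

By construction every discarded point lies within 0.4 of some kept point, so `grid` covers the region the samples occupy without the dense centre dominating.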
Tilecoding 
[Figure: an input space covered by tiling layer 1 and tiling layer 2; the input point X_t activates one tile in layer 1 and one tile in layer 2.]
Single fine grid
[Figure: plot with horizontal axis 0.0 to 1.0 and vertical axis 0.0 to 0.7.]
Single chunky grid 
[Figure: plot with horizontal axis 0.0 to 1.0 and vertical axis 0.0 to 0.7.]
Tilecoding: many chunky grids 
[Figure: plot with horizontal axis 0.0 to 1.0 and vertical axis 0.0 to 0.7.]
Tilecoding 
Fitting
- Averaging
- Averaged Stochastic Gradient Descent
Setup
- Regular grids
- 'Optimal' displacement vectors
- Linear extrapolation
Implementation
- Cython with OpenMP
- Perfect 'hashing'
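To make the idea of many coarse, shifted grids concrete, here is a one-dimensional tile coder fitted by averaging. It is a pure Python sketch, not the Cython/OpenMP implementation mentioned above: the class name, the uniform displacement between layers and the sine target are all assumptions for illustration, and it omits the SGD fit and the linear extrapolation.

```python
import math
import random

class TileCoder1D:
    """Several coarse grids over [lo, hi], each shifted by a fraction of a tile width."""

    def __init__(self, lo, hi, n_tiles, n_layers):
        self.lo, self.n_layers = lo, n_layers
        self.width = (hi - lo) / n_tiles
        self.n_cells = n_tiles + 1          # one extra tile so shifted grids cover [lo, hi]
        self.w = [[0.0] * self.n_cells for _ in range(n_layers)]

    def active(self, x):
        """Index of the single activated tile in each layer."""
        idx = []
        for layer in range(self.n_layers):
            offset = self.width * layer / self.n_layers   # uniform displacement (assumption)
            i = int((x - self.lo + offset) / self.width)
            idx.append(min(max(i, 0), self.n_cells - 1))
        return idx

    def fit_average(self, xs, ys):
        """Set each tile weight to the mean target of the samples that activate it."""
        sums = [[0.0] * self.n_cells for _ in range(self.n_layers)]
        counts = [[0] * self.n_cells for _ in range(self.n_layers)]
        for x, y in zip(xs, ys):
            for layer, i in enumerate(self.active(x)):
                sums[layer][i] += y
                counts[layer][i] += 1
        self.w = [[s / c if c else 0.0 for s, c in zip(srow, crow)]
                  for srow, crow in zip(sums, counts)]

    def predict(self, x):
        """Average the weights of the activated tiles across layers."""
        return sum(self.w[layer][i] for layer, i in enumerate(self.active(x))) / self.n_layers

rng = random.Random(0)
xs = [rng.random() for _ in range(5000)]
ys = [math.sin(2 * math.pi * x) for x in xs]    # hypothetical target function
tc = TileCoder1D(0.0, 1.0, n_tiles=8, n_layers=8)
tc.fit_average(xs, ys)
```

Each layer alone is a chunky step function; averaging eight shifted layers gives a prediction that changes every 1/64 of the input range while each weight is still estimated from the samples in a full-width tile.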
A test case 
[Figure: social welfare as percentage of SDP (vertical axis, 0.993 to 1.000) against number of samples (horizontal axis, 10000 to 80000), with series SDP, TC-A and TC-ASGD.]
A test case 
Table: Computation time
Samples     5000   10000   20000   50000   80000
SDP          6.6     7.2     7.5     7.4     7.4
TC-A         0.4     0.4     0.5     0.6     0.8
TC-ASGD      0.4     0.6     0.9     1.3     1.9
Multi agent problems 
Nash equilibrium concepts for stochastic games (Economics)
- Markov Perfect Equilibrium
- Oblivious Equilibrium
Learning in games (Economics)
- Fictitious play
- Partial best response dynamics
Multi-agent learning (Computer Science / Economics)
- each agent follows a single agent RL method
- or we combine RL with game theory / equilibrium concepts
Multi-agent fitted Q-V iteration
Each agent follows a fitted Q-V iteration algorithm except...
- only a sample of agents update their policies each stage (similar to partial best response)
- each new batch of samples is blended with the existing batch of samples
(similar to

More Related Content

Viewers also liked

The paradox of specialisation: Technological expansion and economic stagnation
The paradox of specialisation: Technological expansion and economic stagnationThe paradox of specialisation: Technological expansion and economic stagnation
The paradox of specialisation: Technological expansion and economic stagnation
anucrawfordphd
 
Fiscal decentralisation and economic growth: evidence from Vietnam
Fiscal decentralisation and economic growth: evidence from VietnamFiscal decentralisation and economic growth: evidence from Vietnam
Fiscal decentralisation and economic growth: evidence from Vietnam
anucrawfordphd
 
Land reforms, labor allocation and economic diversity: evidence from Vietnam ...
Land reforms, labor allocation and economic diversity: evidence from Vietnam ...Land reforms, labor allocation and economic diversity: evidence from Vietnam ...
Land reforms, labor allocation and economic diversity: evidence from Vietnam ...
anucrawfordphd
 
Optimal regulatory regime and competition
Optimal regulatory regime and competitionOptimal regulatory regime and competition
Optimal regulatory regime and competition
anucrawfordphd
 
Could order and ambition emerge from the fragmented climate governance complex?
Could order and ambition emerge from the fragmented climate governance complex?Could order and ambition emerge from the fragmented climate governance complex?
Could order and ambition emerge from the fragmented climate governance complex?
anucrawfordphd
 
Water affordability and state water concessions in Australia
Water affordability and state water concessions in AustraliaWater affordability and state water concessions in Australia
Water affordability and state water concessions in Australia
anucrawfordphd
 
Giving rights to nature: A new institutional approach for overcoming social d...
Giving rights to nature: A new institutional approach for overcoming social d...Giving rights to nature: A new institutional approach for overcoming social d...
Giving rights to nature: A new institutional approach for overcoming social d...
anucrawfordphd
 
Facing our demons: Do mindfulness skills help people deal with failure at work?
Facing our demons: Do mindfulness skills help people deal with failure at work?Facing our demons: Do mindfulness skills help people deal with failure at work?
Facing our demons: Do mindfulness skills help people deal with failure at work?
anucrawfordphd
 
Small states, big effects? Oil price shocks and economic growth in small isla...
Small states, big effects? Oil price shocks and economic growth in small isla...Small states, big effects? Oil price shocks and economic growth in small isla...
Small states, big effects? Oil price shocks and economic growth in small isla...
anucrawfordphd
 
Mental health and disengaged youth
Mental health and disengaged youthMental health and disengaged youth
Mental health and disengaged youth
anucrawfordphd
 
Global public goods and coalition formation under matching mechanisms (discus...
Global public goods and coalition formation under matching mechanisms (discus...Global public goods and coalition formation under matching mechanisms (discus...
Global public goods and coalition formation under matching mechanisms (discus...
anucrawfordphd
 
Global public goods and coalition formation under matching mechanisms
Global public goods and coalition formation under matching mechanismsGlobal public goods and coalition formation under matching mechanisms
Global public goods and coalition formation under matching mechanisms
anucrawfordphd
 
‘Putting a Value On It’. The value that New Zealand educational entrepreneurs...
‘Putting a Value On It’. The value that New Zealand educational entrepreneurs...‘Putting a Value On It’. The value that New Zealand educational entrepreneurs...
‘Putting a Value On It’. The value that New Zealand educational entrepreneurs...
anucrawfordphd
 
Revenue efforts in mineral producing districts In Indonesia: is there a resou...
Revenue efforts in mineral producing districts In Indonesia: is there a resou...Revenue efforts in mineral producing districts In Indonesia: is there a resou...
Revenue efforts in mineral producing districts In Indonesia: is there a resou...
anucrawfordphd
 
Marital assimilation of Central Java people in separate destinations: Investi...
Marital assimilation of Central Java people in separate destinations: Investi...Marital assimilation of Central Java people in separate destinations: Investi...
Marital assimilation of Central Java people in separate destinations: Investi...
anucrawfordphd
 
Where big data meets no data
Where big data meets no dataWhere big data meets no data
Where big data meets no data
anucrawfordphd
 
Shining a light on the Indonesian oil palm and development debate with big data
Shining a light on the Indonesian oil palm and development debate with big dataShining a light on the Indonesian oil palm and development debate with big data
Shining a light on the Indonesian oil palm and development debate with big data
anucrawfordphd
 
Domestic sources of Japanese foreign policy
Domestic sources of Japanese foreign policyDomestic sources of Japanese foreign policy
Domestic sources of Japanese foreign policy
anucrawfordphd
 
Mental health and disengaged youth (discussant paper)
Mental health and disengaged youth (discussant paper)Mental health and disengaged youth (discussant paper)
Mental health and disengaged youth (discussant paper)
anucrawfordphd
 
Trade liberalisation, povery and equality in Indonesia
Trade liberalisation, povery and equality in IndonesiaTrade liberalisation, povery and equality in Indonesia
Trade liberalisation, povery and equality in Indonesia
anucrawfordphd
 

Viewers also liked (20)

The paradox of specialisation: Technological expansion and economic stagnation
The paradox of specialisation: Technological expansion and economic stagnationThe paradox of specialisation: Technological expansion and economic stagnation
The paradox of specialisation: Technological expansion and economic stagnation
 
Fiscal decentralisation and economic growth: evidence from Vietnam
Fiscal decentralisation and economic growth: evidence from VietnamFiscal decentralisation and economic growth: evidence from Vietnam
Fiscal decentralisation and economic growth: evidence from Vietnam
 
Land reforms, labor allocation and economic diversity: evidence from Vietnam ...
Land reforms, labor allocation and economic diversity: evidence from Vietnam ...Land reforms, labor allocation and economic diversity: evidence from Vietnam ...
Land reforms, labor allocation and economic diversity: evidence from Vietnam ...
 
Optimal regulatory regime and competition
Optimal regulatory regime and competitionOptimal regulatory regime and competition
Optimal regulatory regime and competition
 
Could order and ambition emerge from the fragmented climate governance complex?
Could order and ambition emerge from the fragmented climate governance complex?Could order and ambition emerge from the fragmented climate governance complex?
Could order and ambition emerge from the fragmented climate governance complex?
 
Water affordability and state water concessions in Australia
Water affordability and state water concessions in AustraliaWater affordability and state water concessions in Australia
Water affordability and state water concessions in Australia
 
Giving rights to nature: A new institutional approach for overcoming social d...
Giving rights to nature: A new institutional approach for overcoming social d...Giving rights to nature: A new institutional approach for overcoming social d...
Giving rights to nature: A new institutional approach for overcoming social d...
 
Facing our demons: Do mindfulness skills help people deal with failure at work?
Facing our demons: Do mindfulness skills help people deal with failure at work?Facing our demons: Do mindfulness skills help people deal with failure at work?
Facing our demons: Do mindfulness skills help people deal with failure at work?
 
Small states, big effects? Oil price shocks and economic growth in small isla...
Small states, big effects? Oil price shocks and economic growth in small isla...Small states, big effects? Oil price shocks and economic growth in small isla...
Small states, big effects? Oil price shocks and economic growth in small isla...
 
Mental health and disengaged youth
Mental health and disengaged youthMental health and disengaged youth
Mental health and disengaged youth
 
Global public goods and coalition formation under matching mechanisms (discus...
Global public goods and coalition formation under matching mechanisms (discus...Global public goods and coalition formation under matching mechanisms (discus...
Global public goods and coalition formation under matching mechanisms (discus...
 
Global public goods and coalition formation under matching mechanisms
Global public goods and coalition formation under matching mechanismsGlobal public goods and coalition formation under matching mechanisms
Global public goods and coalition formation under matching mechanisms
 
‘Putting a Value On It’. The value that New Zealand educational entrepreneurs...
‘Putting a Value On It’. The value that New Zealand educational entrepreneurs...‘Putting a Value On It’. The value that New Zealand educational entrepreneurs...
‘Putting a Value On It’. The value that New Zealand educational entrepreneurs...
 
Revenue efforts in mineral producing districts In Indonesia: is there a resou...
Revenue efforts in mineral producing districts In Indonesia: is there a resou...Revenue efforts in mineral producing districts In Indonesia: is there a resou...
Revenue efforts in mineral producing districts In Indonesia: is there a resou...
 
Marital assimilation of Central Java people in separate destinations: Investi...
Marital assimilation of Central Java people in separate destinations: Investi...Marital assimilation of Central Java people in separate destinations: Investi...
Marital assimilation of Central Java people in separate destinations: Investi...
 
Where big data meets no data
Where big data meets no dataWhere big data meets no data
Where big data meets no data
 
Shining a light on the Indonesian oil palm and development debate with big data
Shining a light on the Indonesian oil palm and development debate with big dataShining a light on the Indonesian oil palm and development debate with big data
Shining a light on the Indonesian oil palm and development debate with big data
 
Domestic sources of Japanese foreign policy
Domestic sources of Japanese foreign policyDomestic sources of Japanese foreign policy
Domestic sources of Japanese foreign policy
 
Mental health and disengaged youth (discussant paper)
Mental health and disengaged youth (discussant paper)Mental health and disengaged youth (discussant paper)
Mental health and disengaged youth (discussant paper)
 
Trade liberalisation, povery and equality in Indonesia
Trade liberalisation, povery and equality in IndonesiaTrade liberalisation, povery and equality in Indonesia
Trade liberalisation, povery and equality in Indonesia
 

Similar to Applying reinforcement learning to single and multi-agent economic problems

Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
Philippe Laborie
 
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
The Statistical and Applied Mathematical Sciences Institute
 
Workload-aware materialization for efficient variable elimination on Bayesian...
Workload-aware materialization for efficient variable elimination on Bayesian...Workload-aware materialization for efficient variable elimination on Bayesian...
Workload-aware materialization for efficient variable elimination on Bayesian...
Cigdem Aslay
 
Classification
ClassificationClassification
Classification
Arthur Charpentier
 
Traffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equationsTraffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equations
Guillaume Costeseque
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Daniel Valcarce
 
Side 2019, part 2
Side 2019, part 2Side 2019, part 2
Side 2019, part 2
Arthur Charpentier
 
Damiano Pasetto
Damiano PasettoDamiano Pasetto
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
Albert Bifet
 
Hierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureHierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic Architecture
Necip Oguz Serbetci
 
Duy Tan NGUYEN_Multi-objective optimization for inventory management systems ...
Duy Tan NGUYEN_Multi-objective optimization for inventory management systems ...Duy Tan NGUYEN_Multi-objective optimization for inventory management systems ...
Duy Tan NGUYEN_Multi-objective optimization for inventory management systems ...
Duy Tân Nguyễn
 
Stochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithmsStochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithms
Seonho Park
 
1
11
Chapter 00 Introduction Operational research
Chapter 00 Introduction Operational researchChapter 00 Introduction Operational research
Chapter 00 Introduction Operational research
MariaSarwat
 
A brief introduction to Gaussian process
A brief introduction to Gaussian processA brief introduction to Gaussian process
A brief introduction to Gaussian process
Eric Xihui Lin
 
Deep learning by JSKIM
Deep learning by JSKIMDeep learning by JSKIM
Deep learning by JSKIM
Jinseob Kim
 
An energy-efficient flow shop scheduling using hybrid Harris hawks optimization
An energy-efficient flow shop scheduling using hybrid Harris hawks optimizationAn energy-efficient flow shop scheduling using hybrid Harris hawks optimization
An energy-efficient flow shop scheduling using hybrid Harris hawks optimization
journalBEEI
 
How Reliable is Duality Theory in Empirical Work?
How Reliable is Duality Theory in Empirical Work?How Reliable is Duality Theory in Empirical Work?
How Reliable is Duality Theory in Empirical Work?
contenidos-ort
 
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORINGMACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
VisionGEOMATIQUE2014
 

Similar to Applying reinforcement learning to single and multi-agent economic problems (20)

Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
 
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
 
Workload-aware materialization for efficient variable elimination on Bayesian...
Workload-aware materialization for efficient variable elimination on Bayesian...Workload-aware materialization for efficient variable elimination on Bayesian...
Workload-aware materialization for efficient variable elimination on Bayesian...
 
Classification
ClassificationClassification
Classification
 
Traffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equationsTraffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equations
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
 
Side 2019, part 2
Side 2019, part 2Side 2019, part 2
Side 2019, part 2
 
Damiano Pasetto
Damiano PasettoDamiano Pasetto
Damiano Pasetto
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
 
Hierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureHierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic Architecture
 
Duy Tan NGUYEN_Multi-objective optimization for inventory management systems ...
Duy Tan NGUYEN_Multi-objective optimization for inventory management systems ...Duy Tan NGUYEN_Multi-objective optimization for inventory management systems ...
Duy Tan NGUYEN_Multi-objective optimization for inventory management systems ...
 
Stochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithmsStochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithms
 
1
11
1
 
Chapter 00 Introduction Operational research
Chapter 00 Introduction Operational researchChapter 00 Introduction Operational research
Chapter 00 Introduction Operational research
 
A brief introduction to Gaussian process
A brief introduction to Gaussian processA brief introduction to Gaussian process
A brief introduction to Gaussian process
 
Deep learning by JSKIM
Deep learning by JSKIMDeep learning by JSKIM
Deep learning by JSKIM
 
An energy-efficient flow shop scheduling using hybrid Harris hawks optimization
An energy-efficient flow shop scheduling using hybrid Harris hawks optimizationAn energy-efficient flow shop scheduling using hybrid Harris hawks optimization
An energy-efficient flow shop scheduling using hybrid Harris hawks optimization
 
How Reliable is Duality Theory in Empirical Work?
How Reliable is Duality Theory in Empirical Work?How Reliable is Duality Theory in Empirical Work?
How Reliable is Duality Theory in Empirical Work?
 
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORINGMACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
 
pres06-main
pres06-mainpres06-main
pres06-main
 

More from anucrawfordphd

Land reforms, labor allocation and economic diversity: evidence from Vietnam
Land reforms, labor allocation and economic diversity: evidence from VietnamLand reforms, labor allocation and economic diversity: evidence from Vietnam
Land reforms, labor allocation and economic diversity: evidence from Vietnam
anucrawfordphd
 
Open access spatial data for effective disaster risk reduction
Open access spatial data for effective disaster risk reductionOpen access spatial data for effective disaster risk reduction
Open access spatial data for effective disaster risk reduction
anucrawfordphd
 
  The impact of a large rice price increase on welfare and poverty in Bangl...
  The impact of a large rice price increase on welfare and poverty in Bangl...  The impact of a large rice price increase on welfare and poverty in Bangl...
  The impact of a large rice price increase on welfare and poverty in Bangl...anucrawfordphd
 
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysi...
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysi...Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysi...
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysi...anucrawfordphd
 
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysis
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data AnalysisDoes Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysis
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysisanucrawfordphd
 
Pareto Improvements under Matching Mechanisms in a Public Good Economy (discu...
Pareto Improvements under Matching Mechanisms in a Public Good Economy (discu...Pareto Improvements under Matching Mechanisms in a Public Good Economy (discu...
Pareto Improvements under Matching Mechanisms in a Public Good Economy (discu...anucrawfordphd
 
Pareto Improvements under Matching Mechanisms in a Public Good Economy
Pareto Improvements under Matching Mechanisms in a Public Good EconomyPareto Improvements under Matching Mechanisms in a Public Good Economy
Pareto Improvements under Matching Mechanisms in a Public Good Economyanucrawfordphd
 
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...anucrawfordphd
 
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...anucrawfordphd
 
Inward workers' remittances and real exchange rates in South Asia, 1980 - 2011
Inward workers' remittances and real exchange rates in South Asia, 1980 - 2011Inward workers' remittances and real exchange rates in South Asia, 1980 - 2011
Inward workers' remittances and real exchange rates in South Asia, 1980 - 2011anucrawfordphd
 

More from anucrawfordphd (10)

Land reforms, labor allocation and economic diversity: evidence from Vietnam
Land reforms, labor allocation and economic diversity: evidence from VietnamLand reforms, labor allocation and economic diversity: evidence from Vietnam
Land reforms, labor allocation and economic diversity: evidence from Vietnam
 
Open access spatial data for effective disaster risk reduction
Open access spatial data for effective disaster risk reductionOpen access spatial data for effective disaster risk reduction
Open access spatial data for effective disaster risk reduction
 
  The impact of a large rice price increase on welfare and poverty in Bangl...
  The impact of a large rice price increase on welfare and poverty in Bangl...  The impact of a large rice price increase on welfare and poverty in Bangl...
  The impact of a large rice price increase on welfare and poverty in Bangl...
 
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysi...
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysi...Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysi...
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysi...
 
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysis
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data AnalysisDoes Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysis
Does Humanitarian Aid Crowd Out Development Aid? A Dynamic Panel Data Analysis
 
Pareto Improvements under Matching Mechanisms in a Public Good Economy (discu...
Pareto Improvements under Matching Mechanisms in a Public Good Economy (discu...Pareto Improvements under Matching Mechanisms in a Public Good Economy (discu...
Pareto Improvements under Matching Mechanisms in a Public Good Economy (discu...
 
Pareto Improvements under Matching Mechanisms in a Public Good Economy
Pareto Improvements under Matching Mechanisms in a Public Good EconomyPareto Improvements under Matching Mechanisms in a Public Good Economy
Pareto Improvements under Matching Mechanisms in a Public Good Economy
 
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...
 
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...
Output Composition of the Monetary Policy Transmission Mechanism: Is Australi...
 
Inward workers' remittances and real exchange rates in South Asia, 1980 - 2011
Inward workers' remittances and real exchange rates in South Asia, 1980 - 2011Inward workers' remittances and real exchange rates in South Asia, 1980 - 2011
Inward workers' remittances and real exchange rates in South Asia, 1980 - 2011
 

Applying reinforcement learning to single and multi-agent economic problems

  • 1. Applying reinforcement learning to economics Neal Hughes Australian National University neal.hughes@anu.edu.au November 17, 2014 Neal Hughes (ANU) Applying reinforcement learning to economics November 17, 2014 1 / 23
  • 2. Machine learning. Machine learning: algorithms that 'learn' from data, i.e., build models from data with minimal theory or human involvement; goes hand in hand with 'Big Data'. Supervised learning: estimating functions that map 'input' variables X to 'target' variables Y; also known as non-parametric regression. Reinforcement learning: learning to make optimal (reward-maximising) decisions in dynamic environments, i.e., learning optimal policy functions for Markov Decision Processes (MDPs); also known as approximate dynamic programming.
  • 3. Reinforcement learning. [Diagram: the agent observes state $s_t$ and takes action $a_t$; the environment returns reward $r_t$ and the next state $s_{t+1}$.]
  • 4. A (single agent) water storage problem. [Diagram of the river system. Labels: inflow $I_{t+1}$; storage $S_t$; release point $F_{1t}$; demand node; extraction $E_t$; extraction point $F_{2t}$; return flow $R_t$; end of system $F_{3t}$; nodes 1, 2 and 3.]
  • 5. A (single agent) water storage problem.
      $$\max_{\{W_t\}_{t=0}^{\infty}} E\left\{ \sum_{t=0}^{\infty} \beta^t \, \Pi(Q_t, I_t) \right\}$$
      Subject to:
      $$S_{t+1} = \min\{ S_t - W_t - \delta_0 \alpha S_t^{2/3} + I_{t+1},\; K \}$$
      $$0 \le W_t \le S_t$$
      $$Q_t \le \max\{ (1 - \delta_{1b}) W_t - \delta_{1a},\; 0 \}$$
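The storage transition and the delivery constraint of the water storage problem can be sketched in a few lines of Python. This is only an illustration: the slides give no parameter values, so every number below is hypothetical, and `evap_coef` stands for the whole coefficient multiplying the two-thirds power of storage in the evaporation term. The lower bound of zero on next-period storage is a guard added here, not part of the slide.

```python
def step(storage, release, inflow, capacity, evap_coef, loss_fixed, loss_prop):
    """One period of the storage model.

    storage is S_t, release is W_t, inflow is I_{t+1}, capacity is K.
    evap_coef is the coefficient on S_t^(2/3) (hypothetical value).
    loss_fixed and loss_prop are the fixed and proportional delivery
    losses (hypothetical values).
    """
    # Feasibility: 0 <= W_t <= S_t
    release = min(max(release, 0.0), storage)
    evaporation = evap_coef * storage ** (2.0 / 3.0)
    # S_{t+1} = min{S_t - W_t - evaporation + I_{t+1}, K}
    next_storage = min(storage - release - evaporation + inflow, capacity)
    next_storage = max(next_storage, 0.0)  # guard added here, not in the slides
    # Upper bound on deliveries: max{(1 - loss_prop) * W_t - loss_fixed, 0}
    max_delivery = max((1.0 - loss_prop) * release - loss_fixed, 0.0)
    return next_storage, max_delivery


# Hypothetical numbers in GL
s_next, q_max = step(storage=500.0, release=100.0, inflow=300.0,
                     capacity=1000.0, evap_coef=0.1,
                     loss_fixed=5.0, loss_prop=0.1)
```

Any release above current storage is clipped rather than rejected, which keeps a simulation with random exploration inside the feasible set.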
  • 6. Why reinforcement learning? [Figure: inflow (GL) plotted against storage (GL).]
  • 7. The Q function. The standard Bellman equation with state value function $V(s)$:
      $$V(s) = \max_{a} \left\{ R(s, a) + \beta \int_{S} T(s, a, s') \, V(s') \, ds' \right\}$$
      The Bellman equation with action-value function $Q(a, s)$:
      $$Q(a, s) = R(s, a) + \beta \int_{S} T(s, a, s') \max_{a} Q(a, s') \, ds'$$
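The two Bellman equations are linked by a standard identity, written here in the slide's notation. The symbol $\pi^{*}$ for the optimal policy is notation added for this note and does not appear in the slides.

```latex
V(s) = \max_{a} Q(a, s), \qquad \pi^{*}(s) = \arg\max_{a} Q(a, s)
```

Substituting $V(s') = \max_{a} Q(a, s')$ into the bracketed term of the state-value equation gives the action-value equation. The practical appeal of $Q$ is that the greedy action can be read off directly, without evaluating the transition integral again.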
  • 8. Fitted Q Iteration. Algorithm 1: Fitted Q Iteration
      initialise $s_0$
      run a simulation with exploration for T periods
      store the samples $\{a_t, s_t, s_{t+1}, r_t\}_{t=0}^{T}$
      initialise $Q(a_t, s_t)$
      repeat (iterate until convergence)
          for t = 0 to T do
              set $\hat{Q}_t = r_t + \beta \max_a Q(a, s_{t+1})$
          end
          estimate Q by regressing $\hat{Q}_t$ against $(a_t, s_t)$
      until a stopping rule is satisfied
      With large dense data, computing $\max_a Q(a, \cdot)$ for each point is wasteful. Alternative: max over a sample of points and fit a value function (Fitted Q-V iteration).
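The fitted Q iteration loop can be sketched on a toy problem as below. Everything specific is an assumption made for illustration: the integer storage levels, the square-root payoff, the uniform inflow and the discount factor are hypothetical, and the regression step uses the simplest possible regressor (the mean target in each action-state cell) rather than the approximation method used in the talk.

```python
import random
from collections import defaultdict

BETA = 0.9      # discount factor (hypothetical value)
CAPACITY = 5    # toy storage capacity in integer units (hypothetical)


def simulate(T, rng):
    """Simulate with purely random exploration and store (a, s, s_next, r)."""
    samples, s = [], CAPACITY // 2
    for _ in range(T):
        a = rng.randint(0, s)              # random release with 0 <= a <= s
        r = a ** 0.5                       # toy concave payoff (assumed)
        s_next = min(s - a + rng.randint(0, 2), CAPACITY)
        samples.append((a, s, s_next, r))
        s = s_next
    return samples


def regress(samples, targets):
    """Regression step: the mean target within each (a, s) cell."""
    total, count = defaultdict(float), defaultdict(int)
    for (a, s, _, _), y in zip(samples, targets):
        total[(a, s)] += y
        count[(a, s)] += 1
    return {key: total[key] / count[key] for key in total}


def fitted_q_iteration(samples, tol=1e-6, max_iter=500):
    q = {}                                 # Q starts at zero everywhere
    for _ in range(max_iter):
        # Targets: r_t + beta * max_a Q(a, s_{t+1}); unseen cells count as 0
        targets = [r + BETA * max(q.get((a2, s_next), 0.0)
                                  for a2 in range(s_next + 1))
                   for (_, _, s_next, r) in samples]
        new_q = regress(samples, targets)
        change = max(abs(new_q[k] - q.get(k, 0.0)) for k in new_q)
        q = new_q
        if change < tol:                   # stopping rule
            break
    return q


def greedy_policy(q, s):
    """Release that maximises the fitted Q at storage level s."""
    return max(range(s + 1), key=lambda a: q.get((a, s), 0.0))


samples = simulate(5000, random.Random(0))
q = fitted_q_iteration(samples)
policy = [greedy_policy(q, s) for s in range(CAPACITY + 1)]
```

The samples are collected once and reused in every pass, which is what makes this a batch method: only the regression targets change between iterations.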
  • 9. Single agent reinforcement learning. Figure: An approximately equidistant grid in two dimensions. (a) 10000 iid standard normal points; (b) 100 points at least 0.4 apart.
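One simple way to thin a dense sample into an approximately equidistant grid is a greedy distance filter, sketched below. The slides do not say how the grid in the figure was built, so this is an assumed stand-in, and the number of points it keeps depends on the sample and need not match the 100 points in panel (b).

```python
import math
import random


def thin_points(points, radius):
    """Greedy filter: keep a point only if it lies at least `radius`
    away from every point kept so far."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= radius for q in kept):
            kept.append(p)
    return kept


rng = random.Random(0)
# 10000 iid standard normal points in two dimensions, as in panel (a)
cloud = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10000)]
grid = thin_points(cloud, 0.4)
```

By construction every discarded point lies within the radius of some kept point, so the thinned set still covers the region the original sample visited.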
  • 10. Tilecoding. [Diagram: an input space covered by two overlapping tiling layers; the input point $X_t$ activates one tile in layer 1 and one tile in layer 2.]
  • 11. [Single] fine grid. [Figure; only axis ticks survive in the transcript.]
  • 12. Single chunky grid. [Figure; only axis ticks survive in the transcript.]
  • 13. Tilecoding: many chunky grids. [Figure; only axis ticks survive in the transcript.]
  • 14. Tilecoding. Fitting: averaging; averaged stochastic gradient descent (ASGD). Setup: regular grids; 'optimal' displacement vectors; linear extrapolation. Implementation: Cython with OpenMP; perfect 'hashing'.
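A minimal pure-Python sketch of tile coding fitted by averaging follows. It is one plausible reading of the 'averaging' fit, not the author's Cython implementation: the class name `TileCoder`, the uniform diagonal offsets (used in place of the 'optimal' displacement vectors) and the test function are all assumptions, and linear extrapolation and hashing are left out.

```python
import math
import random
from collections import defaultdict


class TileCoder:
    """Several coarse regular grids over [0, 1]^d, each shifted by an offset."""

    def __init__(self, tiles_per_dim, layers):
        self.width = 1.0 / tiles_per_dim
        # Uniform diagonal offsets: an assumption, not the 'optimal' vectors
        self.offsets = [j * self.width / layers for j in range(layers)]
        self.sums = [defaultdict(float) for _ in range(layers)]
        self.counts = [defaultdict(int) for _ in range(layers)]

    def active_tiles(self, x):
        """Return one tile index (a tuple) per layer for input point x."""
        return [tuple(int(math.floor((xi + off) / self.width)) for xi in x)
                for off in self.offsets]

    def fit(self, xs, ys):
        """Fit by averaging: each tile accumulates the targets of its samples."""
        for x, y in zip(xs, ys):
            for layer, tile in enumerate(self.active_tiles(x)):
                self.sums[layer][tile] += y
                self.counts[layer][tile] += 1

    def predict(self, x):
        """Average the tile means over the layers that have data at x."""
        means = [self.sums[layer][tile] / self.counts[layer][tile]
                 for layer, tile in enumerate(self.active_tiles(x))
                 if self.counts[layer].get(tile, 0) > 0]
        return sum(means) / len(means) if means else 0.0


rng = random.Random(0)
xs = [(rng.random(), rng.random()) for _ in range(20000)]
ys = [x[0] + x[1] for x in xs]            # hypothetical target function
coder = TileCoder(tiles_per_dim=5, layers=8)
coder.fit(xs, ys)
estimate = coder.predict((0.43, 0.57))
```

Each layer alone is a chunky grid with few cells and plenty of data per cell; averaging over shifted layers recovers much of the resolution of a fine grid without its data requirements.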
  • 15. A test case. [Figure: social welfare as percentage of SDP against number of samples (10000 to 80000); series: SDP, TC-A, TC-ASGD.]
  • 16. A test case. Table: Computation time, by number of samples
      Method      5000   10000   20000   50000   80000
      SDP          6.6     7.2     7.5     7.4     7.4
      TC-A         0.4     0.4     0.5     0.6     0.8
      TC-ASGD      0.4     0.6     0.9     1.3     1.9
  • 17. Multi agent problems. Nash equilibrium concepts for stochastic games (Economics): Markov Perfect Equilibrium; Oblivious Equilibrium. Learning in games (Economics): fictitious play; partial best response dynamics. Multi-agent learning (Computer Science / Economics): each agent follows a single agent RL method, or we combine RL with game theory / equilibrium concepts.
  • 18. [Multi-agent] fitted Q-V iteration. Each agent follows a fitted Q-V iteration algorithm, except that: only a sample of agents update their policies each stage (similar to partial best response); each new batch of samples is blended with the existing batch of samples (similar to fictitious play).
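The two modifications to the single-agent algorithm can be sketched as small helper functions. The slides do not say how agents are sampled or how batches are blended, so the function names, the random subset rule and the `keep_share` parameter below are all hypothetical.

```python
import random


def choose_updaters(n_agents, share, rng):
    """Pick the subset of agents that re-fit their policy this stage
    (loosely analogous to partial best response)."""
    k = max(1, round(share * n_agents))
    return set(rng.sample(range(n_agents), k))


def blend_batches(old_batch, new_batch, keep_share, rng):
    """Keep a random `keep_share` of the old samples and append the new
    batch (loosely analogous to fictitious play, where past behaviour
    keeps some weight)."""
    n_keep = round(keep_share * len(old_batch))
    return rng.sample(old_batch, n_keep) + list(new_batch)


rng = random.Random(0)
updaters = choose_updaters(n_agents=10, share=0.3, rng=rng)
batch = blend_batches(list(range(100)), list(range(100, 140)), 0.5, rng)
```

Both devices slow the rate at which the environment changes from any one agent's point of view, since each agent's learning problem depends on the policies of the others.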
  • 19. Conclusions. RL can be successfully applied to economic problems. Batch methods (such as fitted Q-V iteration) are suited to our context. Tilecoding is a great approximation method for low-dimensional problems. Our multi-agent method provides a middle ground between macro-DP methods and agent-based / evolutionary methods: it allows us to consider complex multi-agent problems with externalities, but still have near-optimal agents.
  • 20. A (multi-agent) water storage problem. [Diagram of the river system. Labels: inflow $I_{t+1}$; storage $S_t$; release point $F_{1t}$; demand node; extraction $E_t$; extraction point $F_{2t}$; return flow $R_t$; end of system $F_{3t}$; nodes 1, 2 and 3.]
  • 21. Example: capacity sharing. Initial balance: User 1 volume 10 ML, User 1 airspace 40 ML, User 2 volume 50 ML. Total inflow 20 ML; inflow credits +10 ML and +10 ML; internal spill 10 ML. Updated balance: User 1 volume 30 ML, User 1 airspace 20 ML, User 2 volume 50 ML.
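The capacity sharing arithmetic can be reproduced with a short accounting function. The slide gives only the balances, so several things here are inferred or assumed: each user's capacity share is taken as 50 ML (volume plus airspace), inflow shares are taken as equal (from the two +10 ML credits), and spilling a full account's credit to users with airspace in proportion to their inflow shares is an assumed rule.

```python
def credit_inflow(volumes, capacities, inflow_shares, total_inflow):
    """Credit inflow to user accounts; credits that do not fit in a full
    account spill internally to accounts with airspace.

    Returns (updated volumes, internal spill, external spill).
    """
    n = len(volumes)
    volumes = list(volumes)
    pending = [share * total_inflow for share in inflow_shares]
    internal_spill = 0.0
    first_pass = True
    while True:
        overflow = 0.0
        for i, credit in enumerate(pending):
            accepted = min(credit, capacities[i] - volumes[i])
            volumes[i] += accepted
            overflow += credit - accepted
            if not first_pass:
                internal_spill += accepted   # water received from other users
        first_pass = False
        # Users with remaining airspace share any overflow (assumed rule)
        open_users = [i for i in range(n) if capacities[i] - volumes[i] > 1e-12]
        weight = sum(inflow_shares[i] for i in open_users)
        if overflow <= 1e-12 or weight <= 0.0:
            return volumes, internal_spill, overflow
        pending = [overflow * inflow_shares[i] / weight if i in open_users else 0.0
                   for i in range(n)]


# Numbers from the slide (ML); capacities and shares are inferred
updated, internal, external = credit_inflow(
    volumes=[10.0, 50.0], capacities=[50.0, 50.0],
    inflow_shares=[0.5, 0.5], total_inflow=20.0)
```

With these inputs User 2's account is already full, so its 10 ML credit passes to User 1, whose volume rises from 10 ML to 30 ML and whose airspace falls from 40 ML to 20 ML.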
  • 22. A test case. Figure: Mean storage by iteration. [Axes: mean storage $S_t$ (GL) against iteration (0 to 20); series: CS, NS, OA, SWA.]
  • 23. A test case. Figure: Mean social welfare by iteration. [Axes: mean social welfare $\sum_{i=1}^{n} u_{it}$ (\$M) against iteration (0 to 20); series: CS, NS, OA, SWA.]