This document discusses applying reinforcement learning techniques to economic problems. It provides an overview of reinforcement learning and how it can be used to learn optimal policies for problems modeled as Markov decision processes. As an example, it discusses how reinforcement learning can be applied to learn policies for single-agent and multi-agent water storage problems. It also describes some specific reinforcement learning algorithms like fitted Q-iteration that are well-suited for economic problems.
Alternative Infill Strategies for Expensive Multi-Objective Optimisation (Alma Rahat)
Many multi-objective optimisation problems incorporate computationally or financially expensive objective functions. State-of-the-art algorithms therefore construct surrogate model(s) of the mapping from parameter space to objective functions to guide the choice of the next solution to expensively evaluate. Starting from an initial set of solutions, an infill criterion (a surrogate-based indicator of quality) is extremised to determine which solution to evaluate next, until the budget of expensive evaluations is exhausted. Many successful infill criteria depend on multi-dimensional integration, which may result in infill criteria that are themselves impractically expensive. We propose a computationally cheap infill criterion based on the minimum probability of improvement over the estimated Pareto set. We also present a range of set-based scalarisation methods modelling hypervolume contribution, dominance ratio and distance measures. These permit the use of straightforward expected improvement as a cheap infill criterion. We investigated the performance of these novel strategies on standard multi-objective test problems, and compared them with the popular SMS-EGO and ParEGO methods. Unsurprisingly, our experiments show that the best strategy is problem dependent, but in many cases a cheaper strategy is at least as good as more expensive alternatives.
Preprint repository: https://ore.exeter.ac.uk/repository/handle/10871/27157
Coastal flood hazards are amongst the deadliest and most costly natural disasters on the planet. Their underlying processes are, in some regards, in an advanced state of knowledge. Yet, the scale and variety of both causes and effects leave open many challenging questions. And our advanced state of knowledge has failed to realize a reduction in deaths or damages. In this talk, I will address the underlying physical processes of coastal flooding and knowledge gaps, with an eye toward current research and the overarching issue of turning knowledge to action.
Analyzing high-frequency time series is increasingly useful with the current explosion in the availability of these data in several application areas, including but not limited to, climate, finance, health analytics, transportation, etc. This talk will give an overview of two statistical frameworks that could be useful for analyzing high-frequency financial time series leading to quantification of financial risk. These include a distribution free approach using penalized estimating functions for modeling inter-event durations and an approximate Bayesian approach for modeling counts of events in regular intervals. A few other potentially useful lines of research in this area will also be introduced.
This talk will report briefly on some findings from the problem of picking the weights for a weighted function space in QMC. Then it will be mostly about importance sampling. We want to estimate the probability µ of a union of J rare events. The method uses n samples, each of which picks one of the rare events at random, samples conditionally on that rare event happening, and counts the total number of rare events that happen. It was used by Naiman and Priebe for scan statistics, Shi, Siegmund and Yakir for genomic scans, and Adler, Blanchet and Liu for extrema of Gaussian processes. We call it ALOE, for `at least one event'. The ALOE estimate is unbiased and we find that it has a coefficient of variation no larger than √((J + J⁻¹ − 2)/(4n)). The coefficient of variation is also no larger than √((µ̄/µ − 1)/n), where µ̄ is the union bound. Our motivating problem comes from power system reliability, where the phase differences between connected nodes have a joint Gaussian distribution and the J rare events arise from unacceptably large phase differences. In the grid reliability problems, even some events defined by 5772 constraints in 326 dimensions, with probability below 10⁻²², are estimated with a coefficient of variation of about 0.0024 with only n = 10,000 sample values. In a genomic context, the rare events become false discoveries. There we are interested in the possibility of a large number of simultaneous events, not just one or more. Some work with Kenneth Tay will be presented on that problem.
Joint work with Yury Maximov and Michael Chertkov (Los Alamos National Laboratory) and Kenneth Tay (Stanford).
Machine Learning Today: Current Research And Advances From AMLAB, UvA (Advanced-Concepts-Team)
With the deep learning 'revolution' barely a decade old, the field of machine learning is accumulating a growing number of interesting research problems. The Amsterdam Machine Learning Laboratory (AMLAB), headed by Profs. Max Welling and Joris Mooij, has enjoyed considerable participation in the creation of many of these areas. Our research spans many subdisciplines including: approximate Bayesian methods, causal inference, equivariant representations, graph neural networks, spiking neural networks, neural compression, low-cost computation, reinforcement learning, explainable AI, medical imaging, generative modelling, flow models, and many more. In this talk, Daniel Worrall (postdoc) will introduce and showcase some of the recent advances from the lab.
Self-Adapting Large Neighborhood Search: Application to single-mode scheduling (Philippe Laborie)
Providing robust scheduling algorithms that can solve a large variety of scheduling problems with good performance is one of the biggest challenges for practical schedulers today. In this paper we present a robust scheduling algorithm based on Self-Adapting Large Neighborhood Search and apply it to a large panel of single-mode scheduling problems. The approach combines Large Neighborhood Search with a portfolio of neighborhoods and completion strategies, together with Machine Learning techniques, to converge on the most efficient neighborhoods and completion strategies for the problem being solved. The algorithm is evaluated on a set of 21 scheduling benchmarks, most of which are well established in the scheduling community. Despite the generality of the approach, for 17 of the 21 benchmarks its mean relative distance to state-of-the-art problem-specific algorithms is less than 4%. It even outperforms state-of-the-art problem-specific algorithms on 7 benchmarks, clearly showing that our algorithm offers a valuable compromise between robustness and performance.
In this talk we describe a methodology that handles causality to make inference on common-cause failure (CCF) in a situation of missing data. The data are collected in the form of a contingency table, but the only available information is the number of CCFs of different orders and the number of failures due to a given cause. Therefore only the margins of the contingency table are observed; the frequencies in each cell are unknown. Assuming a Poisson model for the counts, we suggest a Bayesian approach and use the inverse Bayes formula (IBF) combined with a Metropolis-Hastings algorithm to make inference on the rate of occurrence for each (cause, order) combination. The performance of the resulting algorithm is evaluated through simulations, and a comparison is made with results obtained from the α-decomposition approach to causality suggested by Zheng et al. (2013).
Workload-aware materialization for efficient variable elimination on Bayesian networks (Cigdem Aslay)
Bayesian networks are general, well-studied probabilistic models that capture dependencies among a set of variables. Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we propose a novel materialization method, which can lead to significant efficiency gains when processing inference queries using the Variable Elimination algorithm. In particular, we address the problem of choosing a set of intermediate results to precompute and materialize, so as to maximize the expected efficiency gain over a given query workload. For the problem we consider, we provide an optimal polynomial-time algorithm and discuss alternative methods. We validate our technique using real-world Bayesian networks. Our experimental results confirm that a modest amount of materialization can lead to significant improvements in the running time of queries, with an average gain of 70%, and reaching up to a gain of 99%, for a uniform workload of queries. Moreover, in comparison with existing junction tree methods that also rely on materialization, our approach achieves competitive efficiency during inference using significantly lighter materialization.
Additive Smoothing for Relevance-Based Language Modelling of Recommender Systems (Daniel Valcarce)
Slides of the presentation given at CERI 2016 for the following paper:
Daniel Valcarce, Javier Parapar, Alvaro Barreiro: Additive Smoothing for Relevance-Based Language Modelling of Recommender Systems. CERI 2016: Article 9.
http://dx.doi.org/10.1145/2934732.2934737
An energy-efficient flow shop scheduling using hybrid Harris hawks optimization (journalBEEI)
The energy crisis has become an environmental problem and has received much attention from researchers. The manufacturing sector is the largest contributor to energy consumption in the world, and one significant way the manufacturing industry can reduce energy consumption is through proper scheduling. Energy-efficient scheduling (EES) is a scheduling problem aimed at reducing energy consumption; one EES setting is the flow shop scheduling problem (FSSP). This article develops a new approach to solving EES in the FSSP. A hybrid Harris hawks optimization (hybrid HHO) algorithm is proposed to solve the EES issue on FSSP while accounting for sequence-dependent setups. Swap and flip procedures are suggested to improve HHO performance, and several other procedures were used as comparisons to assess hybrid HHO. Ten tests were carried out to demonstrate hybrid HHO's performance. Based on the numerical experimental results, hybrid HHO can solve EES problems and proved more competitive than the other algorithms.
How Reliable is Duality Theory in Empirical Work? (contenidos-ort)
Co-authors: Francisco Rosas (Universidad ORT Uruguay) and Sergio H. Lence (Iowa State University).
2016 Agricultural and Applied Economics Association (AAEA) Annual Meetings. July 2016, Boston, MA.
Duality theory, which establishes a relationship between a competitive firm's profit function and its production technology, has been used, for example, to estimate elasticities.
This study highlights precision problems with that theory in some practical applications, due to substantial biases in the estimates of known parameters of a production function.
Applying reinforcement learning to single and multi-agent economic problems
1. Applying reinforcement learning to economics
Neal Hughes
Australian National University
neal.hughes@anu.edu.au
November 17, 2014
Neal Hughes (ANU) Applying reinforcement learning to economics November 17, 2014 1 / 23
2. Machine learning
Machine learning
- algorithms that `learn' from data, i.e., build models from data with minimal theory / human involvement
- goes hand in hand with `Big Data'
Supervised learning
- estimating functions mapping `input' variables X to `target' variables Y
- aka non-parametric regression
Reinforcement learning
- learning to make optimal (reward maximising) decisions in dynamic environments: learning optimal policy functions for Markov Decision Processes (MDPs)
- aka approximate dynamic programming
3. Reinforcement learning
[Diagram: the agent-environment loop. At each step t, the agent in state s_t takes action a_t; the environment returns reward r_t and next state s_{t+1}]
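The loop in the diagram can be sketched in a few lines; the two-state environment and the random-choice agent below are made-up placeholders, not anything from the talk.

```python
# A minimal sketch of the agent-environment loop: the agent observes
# state s_t, picks action a_t, and the environment returns reward r_t
# and next state s_{t+1}. The environment here is an illustrative toy.
import random

random.seed(0)

def env_step(state, action):
    """Toy environment: reward 1 if the action matches the state."""
    reward = 1.0 if action == state else 0.0
    next_state = random.choice([0, 1])
    return reward, next_state

state = 0
total_reward = 0.0
for t in range(100):
    action = random.choice([0, 1])           # agent chooses a_t
    reward, state = env_step(state, action)  # environment returns r_t, s_{t+1}
    total_reward += reward
```

A learning agent replaces the random `choice` with a policy that improves from the observed (state, action, reward) samples.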
4. A (single agent) water storage problem
[Diagram: a river system with storage S_t fed by inflow I_{t+1}. Water is released at the release point (flow F1_t) and extracted at the extraction point (extraction E_t, flow F2_t) for the demand node, with return flow R_t and end-of-system flow F3_t; nodes are numbered 1-3]
5. A (single agent) water storage problem

max over {W_t}_{t=0}^∞ of  E{ Σ_{t=0}^∞ β^t Π(Q_t, I_t) }

Subject to:

S_{t+1} = min{ S_t − W_t − δ_{0a} S_t^{2/3} + I_{t+1}, K }
0 ≤ W_t ≤ S_t
Q_t = max{ (1 − δ_{1b}) W_t − δ_{1a}, 0 }
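The constraints above translate directly into code. The parameter values below (capacity K, evaporation coefficient δ_{0a}, delivery-loss parameters δ_{1a} and δ_{1b}) are illustrative assumptions, not the calibration used in the talk.

```python
# Direct sketch of the storage transition and delivery constraints.
# K, d0a, d1a, d1b are illustrative values only.
K = 1000.0            # storage capacity
d0a = 0.001           # evaporation coefficient (delta_0a)
d1a, d1b = 2.0, 0.05  # delivery loss parameters (delta_1a, delta_1b)

def transition(S, W, I_next):
    """S_{t+1} = min{ S_t - W_t - delta_0a * S_t^(2/3) + I_{t+1}, K }"""
    assert 0.0 <= W <= S  # release constraint: 0 <= W_t <= S_t
    return min(S - W - d0a * S ** (2.0 / 3.0) + I_next, K)

def delivered(W):
    """Q_t = max{ (1 - delta_1b) * W_t - delta_1a, 0 }"""
    return max((1.0 - d1b) * W - d1a, 0.0)
```

The S^{2/3} term scales evaporation with storage surface area, and the delivered quantity Q_t nets out fixed and proportional conveyance losses from the release W_t.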
7. The Q function

The standard Bellman equation with state value function V(s):

V(s) = max_a { R(s, a) + β ∫_S T(s, a, s′) V(s′) ds′ }

The Bellman equation with action-value function Q(a, s):

Q(a, s) = R(s, a) + β ∫_S T(s, a, s′) max_{a′} Q(a′, s′) ds′
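The step connecting the two equations, left implicit on the slide, is the standard identity relating the state value to the action value:

```latex
% The state value is the value of the best action:
V(s) = \max_{a} Q(a, s)
% Substituting this into the Bellman equation for V(s) yields the
% Q-function recursion above, with the max taken inside the integral
% over next states s'.
```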
8. Fitted Q Iteration

Algorithm 1: Fitted Q Iteration
1  initialise s_0
2  run a simulation with exploration for T periods
3  store the samples {a_t, s_t, s_{t+1}, r_t}_{t=0}^T
4  initialise Q(a_t, s_t)
5  repeat // iterate until convergence
6    for t = 0 to T do
7      set Q̂_t = r_t + β · max_a Q(a, s_{t+1})
8    end
9    estimate Q by regressing Q̂_t against (a_t, s_t)
10 until a stopping rule is satisfied

With large dense data, computing max_a Q(a, ·) for each point is wasteful.
Alternative: max over a sample of points and fit a value function (Fitted Q-V iteration).
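A minimal sketch of the algorithm above, using a toy storage environment and a simple linear-in-features regression in place of the approximators discussed later in the talk; the discount rate, feature set, and environment parameters are all illustrative assumptions.

```python
# A sketch of Fitted Q Iteration on a toy storage problem. The toy
# environment, discount rate and quadratic feature set are assumptions.
import numpy as np

rng = np.random.default_rng(0)
beta = 0.9                            # discount factor
K = 1.0                               # storage capacity
actions = np.linspace(0.0, 1.0, 11)   # candidate releases W

def step(s, w):
    """Toy dynamics: release w, receive random inflow, cap at K."""
    inflow = rng.uniform(0.0, 0.3)
    return min(s - w + inflow, K), float(np.sqrt(w))

# Steps 1-3: simulate with an exploratory (random) policy, store samples.
T = 2000
S = np.empty(T); A = np.empty(T); R = np.empty(T); S1 = np.empty(T)
s = 0.5
for t in range(T):
    w = rng.choice(actions[actions <= s])   # feasible random release
    s1, r = step(s, w)
    S[t], A[t], R[t], S1[t] = s, w, r, s1
    s = s1

def feats(a, s):
    # quadratic features of (action, state): Q is linear in parameters
    return np.column_stack([np.ones_like(a), a, s, a * s, a**2, s**2])

def q(theta, a, s):
    return feats(a, s) @ theta

# Steps 4-10: regress the Bellman targets until (approximate) convergence.
theta = np.zeros(6)
for _ in range(30):
    # Q-hat_t = r_t + beta * max_a Q(a, s_{t+1}), max over the action grid
    next_q = np.max([q(theta, np.full(T, a), S1) for a in actions], axis=0)
    targets = R + beta * next_q
    theta, *_ = np.linalg.lstsq(feats(A, S), targets, rcond=None)

def policy(s):
    """Greedy policy: argmax_a Q(a, s) over the action grid."""
    return float(actions[np.argmax(q(theta, actions, np.full_like(actions, s)))])
```

The inner loop is exactly lines 5-10 of Algorithm 1: build regression targets from the current Q estimate, then refit Q on (a_t, s_t).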
11. Single agent reinforcement learning

Figure: An approximately equidistant grid in two dimensions
[Two scatter panels over the range −4 to 4 in each dimension: (a) 10000 iid standard normal points; (b) 100 points at least 0.4 apart]
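One cheap way to produce a set like panel (b) is greedy thinning: keep a point only if it is at least a minimum distance from everything kept so far. This is a sketch of that idea, not necessarily the exact construction behind the figure.

```python
# Greedy thinning sketch: starting from iid standard normal draws
# (as in panel (a)), keep only points at least min_dist away from
# every point already kept, giving an approximately equidistant subset.
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((2000, 2))   # iid standard normal points

def thin(points, min_dist):
    kept = [points[0]]
    for p in points[1:]:
        if all(np.linalg.norm(p - q) >= min_dist for q in kept):
            kept.append(p)
    return np.array(kept)

grid = thin(points, 0.4)   # all pairwise distances are >= 0.4
```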
12. Tilecoding
[Diagram: the input space overlaid with two offset tiling layers; an input point X_t activates one tile in each layer]
17. Tilecoding
Fitting
- Averaging
- Stochastic Gradient Descent
Setup
- Regular grids
- `Optimal' displacement vectors
- Linear extrapolation
Implementation
- Cython with OpenMP
- Perfect `hashing'
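A minimal tile coder along these lines, in plain Python/NumPy rather than Cython with OpenMP. Uniform layer offsets are used as a simplifying assumption in place of the `optimal' displacement vectors, and no perfect hashing is implemented.

```python
# A minimal tile coder for 2-D inputs: several offset grid layers,
# each input activates one tile per layer; prediction averages the
# activated tile weights, and fitting is a per-tile SGD step.
import numpy as np

class TileCoder:
    def __init__(self, n_layers=4, tiles_per_dim=8, low=(0.0, 0.0), high=(1.0, 1.0)):
        self.n_layers = n_layers
        self.tiles = tiles_per_dim
        self.low = np.asarray(low, float)
        self.width = (np.asarray(high, float) - self.low) / tiles_per_dim
        # displace each layer by a fraction of one tile width
        self.offsets = np.outer(np.arange(n_layers) / n_layers, self.width)

    def active_tiles(self, x):
        """One activated tile index per layer for input point x."""
        x = np.asarray(x, float)
        idx = []
        for layer in range(self.n_layers):
            c = np.floor((x - self.low + self.offsets[layer]) / self.width).astype(int)
            c = np.clip(c, 0, self.tiles - 1)
            idx.append(layer * self.tiles**2 + c[0] * self.tiles + c[1])
        return idx

coder = TileCoder()
weights = np.zeros(coder.n_layers * coder.tiles**2)

def predict(x):
    # value estimate = average of the activated tiles' weights
    return float(np.mean(weights[coder.active_tiles(x)]))

def sgd_update(x, target, lr=0.1):
    # stochastic gradient step spread over the activated tiles
    for i in coder.active_tiles(x):
        weights[i] += lr * (target - weights[i])
```

Because each prediction touches only one tile per layer, both evaluation and updates are constant-time in the number of samples, which is what makes the method fast for low-dimensional problems.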
18. A test case

[Plot: social welfare as a percentage of SDP against number of samples (10000 to 80000), comparing SDP, TC-A and TC-ASGD; values range from about 0.993 to 1.000]
19. A test case

Table: Computation time

Samples    5000  10000  20000  50000  80000
SDP         6.6    7.2    7.5    7.4    7.4
TC-A        0.4    0.4    0.5    0.6    0.8
TC-ASGD     0.4    0.6    0.9    1.3    1.9
20. Multi agent problems
Nash equilibrium concepts for stochastic games (Economics)
- Markov Perfect Equilibrium
- Oblivious Equilibrium
Learning in games (Economics)
- Fictitious play
- Partial best response dynamic
Multi-agent learning (Computer Science / Economics)
- each agent follows a single-agent RL method
- or we combine RL with game theory / equilibrium concepts
23. Fitted Q-V iteration algorithm, except...
- only a sample of agents update their policies each stage (similar to partial best response)
- each new batch of samples is blended with the existing batch of samples (similar to fictitious play)
25. Conclusions
- RL can be successfully applied to economic problems
- Batch methods (such as fitted Q-V iteration) are suited to our context
- Tilecoding is a great approximation method for low-dimension problems
- Our multi-agent method provides a middle ground between macro-DP methods and agent-based / evolutionary methods
- Allows us to consider complex multi-agent problems with externalities, but still have near-optimal agents
27. A (multi-agent) water storage problem
[Diagram: the same river system as slide 4 — storage S_t fed by inflow I_{t+1}, release point (flow F1_t), extraction point (extraction E_t, flow F2_t) serving the demand node, return flow R_t, and end-of-system flow F3_t]
28. Example: capacity sharing
[Diagram: initial and updated account balances. A total inflow of 20 ML is credited +10 ML to each user. Initial balance: User 1 volume 10 ML, User 2 volume 50 ML (full), User 1 airspace 40 ML. User 2's credit cannot be stored, so 10 ML spills internally to User 1. Updated balance: User 1 volume 30 ML, User 2 volume 50 ML, User 1 airspace 20 ML]
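Assuming this reading of the slide's example (inflow credited equally, with credit that a full account cannot hold spilling internally to the other account, and 50 ML capacity shares), the update is simple arithmetic:

```python
# The capacity-sharing update as arithmetic. The 50 ML capacity
# shares and the internal spill rule are assumptions from our
# reading of the example, not stated explicitly on the slide.
cap = {1: 50.0, 2: 50.0}      # assumed capacity shares (ML)
vol = {1: 10.0, 2: 50.0}      # initial balances (ML); user 2 is full
credit = {1: 10.0, 2: 10.0}   # 20 ML total inflow, credited +10 ML each

for u in (1, 2):
    vol[u] += credit[u]

# internal spill: credit above user 2's capacity moves to user 1
spill = max(vol[2] - cap[2], 0.0)
vol[2] -= spill
vol[1] = min(vol[1] + spill, cap[1])

airspace1 = cap[1] - vol[1]
# updated balance: vol == {1: 30.0, 2: 50.0}, airspace1 == 20.0
```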
29. A test case

Figure: Mean storage by iteration
[Plot: mean storage S_t (GL) over iterations 0 to 20, ranging roughly from 550 to 800 GL, comparing CS, NS, OA and SWA]
30. A test case

Figure: Mean social welfare by iteration
[Plot: mean social welfare, Σ_{i=1}^n u_it ($M), over iterations 0 to 20, ranging roughly from 192 to 195.5, comparing CS, NS, OA and SWA]