Applying reinforcement learning to single and multi-agent economic problems


This document discusses applying reinforcement learning techniques to economic problems. It provides an overview of reinforcement learning and how it can be used to learn optimal policies for problems modeled as Markov decision processes. As an example, it discusses how reinforcement learning can be applied to learn policies for single-agent and multi-agent water storage problems. It also describes some specific reinforcement learning algorithms like fitted Q-iteration that are well-suited for economic problems.


Applying reinforcement learning to economics
Neal Hughes
Australian National University
neal.hughes@anu.edu.au
November 17, 2014
Neal Hughes (ANU) Applying reinforcement learning to economics November 17, 2014 1 / 23

Machine learning
Machine learning
• algorithms that 'learn' from data, i.e., build models from data with minimal theory / human involvement
• goes hand in hand with 'Big Data'
Supervised learning
• estimating functions that map 'input' variables X to 'target' variables Y
• aka non-parametric regression
Reinforcement learning
• learning to make optimal (reward-maximising) decisions in dynamic environments: learning optimal policy functions for Markov decision processes (MDPs)
• aka approximate dynamic programming
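To make "non-parametric regression" concrete, here is a minimal sketch of one classic estimator, Nadaraya-Watson kernel regression, which predicts Y at a point as a kernel-weighted average of the observed targets without assuming a functional form. The Gaussian kernel and the `bandwidth` parameter are illustrative choices, not something the slides specify.

```python
import math
from typing import Callable, Sequence


def kernel_regression(xs: Sequence[float], ys: Sequence[float],
                      bandwidth: float) -> Callable[[float], float]:
    """Nadaraya-Watson estimator with a Gaussian kernel (an illustrative choice)."""
    def predict(x: float) -> float:
        # Weight every training point by its closeness to x; no parametric model is fitted.
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return predict


# Usage: learn y = x^2 from noiseless samples on [0, 1].
xs = [i / 100 for i in range(101)]
ys = [x * x for x in xs]
f = kernel_regression(xs, ys, bandwidth=0.05)
print(f(0.5))
```

A smaller bandwidth tracks the data more closely at the cost of higher variance with noisy targets.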

Reinforcement learning
[Figure: the agent-environment loop. The agent observes state s_t and takes action a_t; the environment returns reward r_t and the next state s_{t+1}.]
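The loop in the diagram can be sketched as a simulation that records one transition per period. The `toy_step` environment and the random policy below are hypothetical stand-ins; the recorded tuple order (a_t, s_t, s_{t+1}, r_t) follows the sample set used by fitted Q iteration.

```python
import random
from typing import Callable, List, Tuple

# One stored transition: (a_t, s_t, s_{t+1}, r_t).
Sample = Tuple[int, float, float, float]


def simulate(step: Callable[[float, int], Tuple[float, float]],
             policy: Callable[[float], int], s0: float, T: int) -> List[Sample]:
    """Run the agent-environment loop for T periods and record every transition."""
    samples: List[Sample] = []
    s = s0
    for _ in range(T):
        a = policy(s)            # agent observes s_t and chooses a_t
        r, s_next = step(s, a)   # environment returns r_t and s_{t+1}
        samples.append((a, s, s_next, r))
        s = s_next
    return samples


# Hypothetical toy environment: the state halves and the action adds 0 or 1;
# the reward penalises distance from a target level of 1.
def toy_step(s: float, a: int) -> Tuple[float, float]:
    return -(s - 1.0) ** 2, 0.5 * s + a


rng = random.Random(0)
data = simulate(toy_step, lambda s: rng.choice([0, 1]), s0=0.0, T=5)
```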

A (single-agent) water storage problem
[Figure: river system schematic. Inflow I_{t+1} enters storage S_t; flow points numbered 1 to 3 are the release point F1_t, the extraction point F2_t and the end of system F3_t; a demand node takes extraction E_t and returns flow R_t.]

A (single-agent) water storage problem
$$\max_{\{W_t\}_{t=0}^{\infty}} \; \mathbb{E}\left\{ \sum_{t=0}^{\infty} \beta^t \, \Pi(Q_t, I_t) \right\}$$
Subject to:
$$S_{t+1} = \min\left\{ S_t - W_t - \delta_{0a} S_t^{2/3} + I_{t+1}, \; K \right\}$$
$$0 \le W_t \le S_t$$
$$Q_t \le \max\left\{ (1 - \delta_{1b}) W_t - \delta_{1a}, \; 0 \right\}$$
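The constraints can be read as a one-period transition, sketched below, where K is capacity, δ_{0a} scales a storage loss proportional to S_t^{2/3}, and δ_{1a}, δ_{1b} are fixed and proportional delivery losses. Reading the last constraint as an upper bound on Q_t is an interpretation, and the parameter values in the usage lines are hypothetical. Like the slide, the sketch puts no lower bound on S_{t+1}.

```python
def next_storage(S: float, W: float, I_next: float, K: float, delta0a: float) -> float:
    """S_{t+1} = min{S_t - W_t - delta0a * S_t^(2/3) + I_{t+1}, K}."""
    if not 0.0 <= W <= S:
        raise ValueError("the release must satisfy 0 <= W_t <= S_t")
    loss = delta0a * S ** (2.0 / 3.0)      # storage loss scaling with S_t^(2/3)
    return min(S - W - loss + I_next, K)   # anything above capacity K spills


def max_extraction(W: float, delta1a: float, delta1b: float) -> float:
    """Upper bound on Q_t: max{(1 - delta1b) * W_t - delta1a, 0}."""
    return max((1.0 - delta1b) * W - delta1a, 0.0)


# Hypothetical values: capacity 1000 GL, release 200 GL, next inflow 150 GL.
S_next = next_storage(1000.0, 200.0, 150.0, K=1000.0, delta0a=0.1)
Q_cap = max_extraction(200.0, delta1a=5.0, delta1b=0.1)
```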

Why reinforcement learning?
[Figure: scatter plot with Storage (GL) on the horizontal axis (0 to 1000) and Inflow (GL) on the vertical axis (0 to 2000).]

The Q function
The standard Bellman equation with state-value function V(s):
$$V(s) = \max_a \left\{ R(s, a) + \beta \int_S T(s, a, s') \, V(s') \, ds' \right\}$$
The Bellman equation with action-value function Q(a, s):
$$Q(a, s) = R(s, a) + \beta \int_S T(s, a, s') \max_{a'} Q(a', s') \, ds'$$
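With finitely many states, the integral in the Q-function Bellman equation becomes a sum, and the equation can be solved by repeated substitution. This sketch stores Q as Q[s][a] (the reverse of the slide's Q(a, s) argument order), and the two-state problem is hypothetical.

```python
from typing import List


def q_iteration(R: List[List[float]], T: List[List[List[float]]], beta: float,
                tol: float = 1e-10, max_iter: int = 10_000) -> List[List[float]]:
    """Iterate Q(a, s) = R(s, a) + beta * sum_{s'} T(s, a, s') * max_{a'} Q(a', s').

    R[s][a] is the reward, T[s][a][s2] the transition probability; Q is stored as Q[s][a].
    """
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(max_iter):
        V = [max(Q[s2]) for s2 in range(n_s)]          # max_{a'} Q(a', s')
        Q_new = [[R[s][a] + beta * sum(T[s][a][s2] * V[s2] for s2 in range(n_s))
                  for a in range(n_a)] for s in range(n_s)]
        gap = max(abs(Q_new[s][a] - Q[s][a]) for s in range(n_s) for a in range(n_a))
        Q = Q_new
        if gap < tol:
            break
    return Q


# Hypothetical example: state 0 is "low", state 1 is "high"; action 0 waits, action 1 uses.
R = [[0.0, 0.5],
     [0.0, 2.0]]
T = [[[0.0, 1.0], [1.0, 0.0]],   # from low: waiting leads to high, using stays low
     [[0.0, 1.0], [1.0, 0.0]]]   # from high: waiting stays high, using leads to low
Q = q_iteration(R, T, beta=0.9)
V = [max(row) for row in Q]      # V(s) = max_a Q(a, s) recovers the first Bellman equation
```

Here the optimal policy waits when low and uses when high, so V(low) = 0.9 · 2 / 0.19 and V(high) = 2 / 0.19.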

Fitted Q Iteration
Algorithm 1: Fitted Q Iteration
1  initialise s_0
2  run a simulation with exploration for T periods
3  store the samples {a_t, s_t, s_{t+1}, r_t}_{t=0}^{T}
4  initialise Q(a_t, s_t)
5  repeat  // iterate until convergence
6      for t = 0 to T do
7          set Q̂_t = r_t + β max_a Q(a, s_{t+1})
8      end
9      estimate Q by regressing Q̂_t against (a_t, s_t)
10 until a stopping rule is satisfied;
• With large, dense data, computing max_a Q(a, ·) for each point is wasteful
• Alternative: take the max over a sample of points and fit a value function (Fitted Q-V iteration)
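Below is a minimal sketch of Algorithm 1. Two choices are illustrative assumptions: a fixed iteration count stands in for the stopping rule, and the regression in step 9 is a deliberately simple cell-average regressor rather than any particular method from the slides. The two-level toy problem and all helper names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[int, float, float, float]          # (a_t, s_t, s_{t+1}, r_t)
QFunc = Callable[[int, float], float]


def fitted_q_iteration(samples: List[Sample], actions: Sequence[int], beta: float,
                       fit: Callable[[List[Tuple[int, float]], List[float]], QFunc],
                       n_iter: int = 200) -> QFunc:
    X = [(a, s) for a, s, _, _ in samples]

    def q(a: int, s: float) -> float:              # step 4: initialise Q to zero
        return 0.0

    for _ in range(n_iter):                        # step 5: repeat (fixed count here)
        # Step 7: targets Q_hat_t = r_t + beta * max_a Q(a, s_{t+1}).
        y = [r + beta * max(q(b, s_next) for b in actions) for _, _, s_next, r in samples]
        q = fit(X, y)                              # step 9: regress targets on (a_t, s_t)
    return q


def cell_average_fit(n_bins: int):
    """Hypothetical regressor: mean target per (action, state bin) cell, states in [0, 1)."""
    def cell(a: int, s: float) -> Tuple[int, int]:
        return a, min(int(s * n_bins), n_bins - 1)

    def fit(X: List[Tuple[int, float]], y: List[float]) -> QFunc:
        sums: Dict[Tuple[int, int], float] = defaultdict(float)
        counts: Dict[Tuple[int, int], int] = defaultdict(int)
        for (a, s), target in zip(X, y):
            sums[cell(a, s)] += target
            counts[cell(a, s)] += 1

        def q(a: int, s: float) -> float:
            k = cell(a, s)
            return sums[k] / counts[k] if k in counts else 0.0   # unseen cells predict 0
        return q
    return fit


def toy_samples(T: int, seed: int = 0) -> List[Sample]:
    """Hypothetical toy: s = 0.25 (low) or 0.75 (high); a = 0 waits, a = 1 uses."""
    rng = random.Random(seed)
    s, samples = 0.25, []
    for _ in range(T):
        a = rng.choice([0, 1])                     # exploration: uniformly random actions
        if a == 1:                                 # using pays 0.5 (low) or 2 (high), resets to low
            r, s_next = (0.5 if s < 0.5 else 2.0), 0.25
        else:                                      # waiting pays nothing and moves to high
            r, s_next = 0.0, 0.75
        samples.append((a, s, s_next, r))
        s = s_next
    return samples


q = fitted_q_iteration(toy_samples(500), actions=[0, 1], beta=0.9, fit=cell_average_fit(2))
```

Swapping `cell_average_fit` for a richer regressor (for example, a tile-coding fit) leaves the outer loop unchanged.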

Single-agent reinforcement learning
Figure: An approximately equidistant grid in two dimensions
[Panel (a): 10000 iid standard normal points. Panel (b): 100 points at least 0.4 apart. Both axes run from −4 to 4.]
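The slides do not say how the sparse grid in panel (b) was selected. One simple possibility, assumed here, is greedy thinning: scan the dense sample and keep a point only if it lies at least a given radius from every point already kept. By construction, every discarded point is within the radius of some kept point. The number of points it keeps depends on the sample and need not match the 100 shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def approx_equidistant_grid(points: Sequence[Point], radius: float) -> List[Point]:
    """Greedy thinning: keep a point only if it is >= radius from every point kept so far."""
    grid: List[Point] = []
    for p in points:
        if all(math.dist(p, g) >= radius for g in grid):
            grid.append(p)
    return grid


# Dense cloud as in panel (a), thinned with the panel (b) spacing.
rng = random.Random(0)
cloud = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(10000)]
grid = approx_equidistant_grid(cloud, radius=0.4)
```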

Tilecoding
[Figure: an input point X_t in the input space activates one tile in each of two offset tiling layers (activated tile, layer 1; activated tile, layer 2).]
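The activation step in the diagram can be sketched as follows: each tiling layer is a regular grid, and an input point activates exactly one tile per layer. Shifting layer k by k / n_layers of a tile width along every dimension is a simplifying assumption; the slides mention 'optimal' displacement vectors instead.

```python
import math
from typing import List, Sequence, Tuple


def active_tiles(x: Sequence[float], n_layers: int, tiles_per_dim: int,
                 lows: Sequence[float], highs: Sequence[float]) -> List[Tuple[int, Tuple[int, ...]]]:
    """Return the activated tile in each layer as (layer, tile index per dimension).

    Layer k is shifted by k / n_layers of a tile width in every dimension (assumed offsets),
    so each layer needs tiles_per_dim + 1 tiles per dimension to cover the input space.
    """
    tiles = []
    for k in range(n_layers):
        index = []
        for xi, lo, hi in zip(x, lows, highs):
            width = (hi - lo) / tiles_per_dim
            index.append(math.floor((xi - lo + k * width / n_layers) / width))
        tiles.append((k, tuple(index)))
    return tiles


print(active_tiles((0.45, 0.1), n_layers=2, tiles_per_dim=4, lows=(0.0, 0.0), highs=(1.0, 1.0)))
# → [(0, (1, 0)), (1, (2, 0))]
```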

Fine grid
[Figure: plot on axes running from 0.0 to 1.0 (horizontal) and 0.0 to 0.7 (vertical).]

Single chunky grid
[Figure: plot on axes running from 0.0 to 1.0 (horizontal) and 0.0 to 0.7 (vertical).]

Tilecoding: many chunky grids
[Figure: plot on axes running from 0.0 to 1.0 (horizontal) and 0.0 to 0.7 (vertical).]
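The fine-grid, chunky-grid and many-chunky-grids comparison can be reproduced in one dimension with a fit by averaging. In this sketch, a tile's weight is the mean target of the samples that fall in it, and a prediction is the mean of the activated weights across layers. This is an assumed reading of "averaging", not necessarily the author's exact rule. With 8 layers of 0.2-wide tiles, the combined resolution is 0.2 / 8 = 0.025, much finer than a single chunky grid.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def fit_tc_averaging(xs: Sequence[float], ys: Sequence[float], n_layers: int,
                     width: float) -> Callable[[float], float]:
    """1-D tile-coding fit by averaging; layer k is shifted by k / n_layers of a width (assumed)."""
    def tile(x: float, k: int) -> int:
        return math.floor((x + k * width / n_layers) / width)

    sums: List[Dict[int, float]] = [defaultdict(float) for _ in range(n_layers)]
    counts: List[Dict[int, int]] = [defaultdict(int) for _ in range(n_layers)]
    for x, y in zip(xs, ys):
        for k in range(n_layers):
            sums[k][tile(x, k)] += y
            counts[k][tile(x, k)] += 1
    weights = [{i: sums[k][i] / counts[k][i] for i in counts[k]} for k in range(n_layers)]

    def predict(x: float) -> float:
        # Layers whose activated tile received no samples are skipped (no extrapolation here).
        vals = [weights[k][tile(x, k)] for k in range(n_layers) if tile(x, k) in weights[k]]
        return sum(vals) / len(vals) if vals else 0.0
    return predict


xs = [i / 1000 for i in range(1000)]
one_chunky = fit_tc_averaging(xs, xs, n_layers=1, width=0.2)    # single chunky grid
many_chunky = fit_tc_averaging(xs, xs, n_layers=8, width=0.2)   # eight offset chunky grids
print(one_chunky(0.59), many_chunky(0.59))
```

The slides also list linear extrapolation in the setup; this sketch simply skips layers with empty tiles.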

Tilecoding
Fitting
• Averaging
• Averaged stochastic gradient descent
Setup
• Regular grids
• 'Optimal' displacement vectors
• Linear extrapolation
Implementation
• Cython with OpenMP
• Perfect 'hashing'
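The second fitting option can be sketched as a tile-coded linear model trained by stochastic gradient descent on squared error, with the returned weights being the running (Polyak-Ruppert) average of the SGD iterates. The 1-D setup, learning rate, epoch count and uniform layer offsets are all illustrative assumptions. Averaging every weight at every step is simple but costly; a production version would average lazily.

```python
import math
import random
from typing import Callable, List, Sequence


def fit_tc_asgd(xs: Sequence[float], ys: Sequence[float], n_layers: int, n_tiles: int,
                lo: float, hi: float, lr: float = 0.1, epochs: int = 20,
                seed: int = 0) -> Callable[[float], float]:
    """1-D tile-coding fit by averaged SGD; prediction = mean of the activated weights."""
    width = (hi - lo) / n_tiles
    size = n_tiles + 1                       # one spare tile per layer absorbs the offset

    def active(x: float) -> List[int]:
        # Layer k is shifted by k / n_layers of a tile width (assumed); indices are clipped.
        return [min(max(math.floor((x - lo + k * width / n_layers) / width), 0), size - 1)
                for k in range(n_layers)]

    w = [[0.0] * size for _ in range(n_layers)]        # SGD iterate
    w_bar = [[0.0] * size for _ in range(n_layers)]    # running average of the iterates
    rng = random.Random(seed)
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for j in order:
            idx = active(xs[j])
            err = ys[j] - sum(w[k][idx[k]] for k in range(n_layers)) / n_layers
            for k in range(n_layers):
                # Step on 0.5 * err^2; the exact gradient's 1/n_layers factor is folded
                # into lr, so lr = 1 would move this prediction exactly onto ys[j].
                w[k][idx[k]] += lr * err
            t += 1
            for k in range(n_layers):
                for i in range(size):
                    w_bar[k][i] += (w[k][i] - w_bar[k][i]) / t

    def predict(x: float) -> float:
        idx = active(x)
        return sum(w_bar[k][idx[k]] for k in range(n_layers)) / n_layers
    return predict


xs = [i / 500 for i in range(500)]
f = fit_tc_asgd(xs, xs, n_layers=8, n_tiles=5, lo=0.0, hi=1.0)   # learn y = x on [0, 1)
```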

A test case
[Figure: social welfare as percentage of SDP (vertical axis, 0.993 to 1.000) against number of samples (horizontal axis, 10000 to 80000) for SDP, TC-A and TC-ASGD.]

A test case
Table: Computation time

Number of samples    5000   10000   20000   50000   80000
SDP                   6.6     7.2     7.5     7.4     7.4
TC-A                  0.4     0.4     0.5     0.6     0.8
TC-ASGD               0.4     0.6     0.9     1.3     1.9

Multi-agent problems
Nash equilibrium concepts for stochastic games (Economics)
• Markov perfect equilibrium
• Oblivious equilibrium
Learning in games (Economics)
• Fictitious play
• Partial best response dynamic
Multi-agent learning (Computer Science / Economics)
• each agent follows a single-agent RL method
• or we combine RL with game theory / equilibrium concepts
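Fictitious play, one of the learning-in-games dynamics listed above, is short enough to sketch. Each round, every player best-responds to the empirical frequency of the opponent's past actions. In two-player zero-sum games such as matching pennies, those empirical frequencies are known to converge to the mixed equilibrium, here (1/2, 1/2). The lowest-index tie-breaking rule is an arbitrary choice.

```python
from typing import List, Sequence, Tuple


def fictitious_play(A: Sequence[Sequence[float]], B: Sequence[Sequence[float]],
                    n_rounds: int) -> Tuple[List[float], List[float]]:
    """Two-player fictitious play on a bimatrix game.

    A[i][j] and B[i][j] are the payoffs to players 1 and 2 when they play actions i and j.
    Both players best-respond to the opponent's empirical action counts each round (ties go
    to the lowest-index action); the empirical action frequencies are returned.
    """
    n1, n2 = len(A), len(A[0])
    c1, c2 = [0] * n1, [0] * n2
    for _ in range(n_rounds):
        # Payoff of each own action against the opponent's history (unnormalised beliefs).
        u1 = [sum(A[i][j] * c2[j] for j in range(n2)) for i in range(n1)]
        u2 = [sum(B[i][j] * c1[i] for i in range(n1)) for j in range(n2)]
        c1[max(range(n1), key=lambda i: u1[i])] += 1
        c2[max(range(n2), key=lambda j: u2[j])] += 1
    return [c / n_rounds for c in c1], [c / n_rounds for c in c2]


# Matching pennies: player 1 wins on a match, player 2 on a mismatch.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
p1, p2 = fictitious_play(A, B, 10_000)
```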

Fitted Q-V iteration algorithm except...
• only a sample of agents update their policies each stage (similar to partial best response)
• each new batch of samples is blended with the existing batch of samples (similar to …
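The two multi-agent modifications can be sketched as helpers around the single-agent loop. Both helpers are hypothetical. The slides do not give the sampling fraction, and the blending rule assumed here (keep a random share of the old batch and append every new sample) is only one way to blend batches.

```python
import random
from typing import List, Sequence, TypeVar

Item = TypeVar("Item")


def agents_to_update(n_agents: int, fraction: float, rng: random.Random) -> List[int]:
    """Draw the sample of agents that re-fit their policies this stage; the rest keep theirs."""
    k = max(1, round(fraction * n_agents))
    return sorted(rng.sample(range(n_agents), k))


def blend_batches(old: Sequence[Item], new: Sequence[Item], keep_fraction: float,
                  rng: random.Random) -> List[Item]:
    """Hypothetical blending rule: keep a random keep_fraction of old samples, add all new ones."""
    n_keep = round(keep_fraction * len(old))
    return rng.sample(list(old), n_keep) + list(new)


rng = random.Random(1)
updaters = agents_to_update(100, 0.1, rng)
batch = blend_batches(list(range(1000)), list(range(1000, 1500)), 0.5, rng)
```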

Many multi-objective optimisation problems incorporate computationally or financially expensive objective functions. State-of-the-art algorithms therefore construct surrogate model(s) of the parameter space to objective functions mapping to guide the choice of the next solution to expensively evaluate. Starting from an initial set of solutions, an infill criterion — a surrogate-based indicator of quality — is extremised to determine which solution to evaluate next, until the budget of expensive evaluations is exhausted. Many successful infill criteria are dependent on multi-dimensional integration, which may result in infill criteria that are themselves impractically expensive. We propose a computationally cheap infill criterion based on the minimum probability of improvement over the estimated Pareto set. We also present a range of set-based scalarisation methods modelling hypervolume contribution, dominance ratio and distance measures. These permit the use of straightforward expected improvement as a cheap infill criterion. We investigated the performance of these novel strategies on standard multi-objective test problems, and compared them with the popular SMS-EGO and ParEGO methods. Unsurprisingly, our experiments show that the best strategy is problem dependent, but in many cases a cheaper strategy is at least as good as more expensive alternatives. Preprint repository: https://ore.exeter.ac.uk/repository/handle/10871/27157

Contribution à l'étude du trafic routier sur réseaux à l'aide des équations d...

Guillaume Costeseque

MUMS: Transition & SPUQ Workshop - Surge Hazards Working Group Issues and Eff...

The Statistical and Applied Mathematical Sciences Institute

Coastal flood hazards are amongst the deadliest and most costly natural disasters on the planet. Their underlying processes are, in some regards, in an advanced state of knowledge. Yet, the scale and variety of both causes and effects leave open many challenging questions. And our advanced state of knowledge has failed to realize a reduction in deaths or damages. In this talk, I will address the underlying physical processes of coastal flooding and knowledge gaps, with an eye toward current research and the overarching issue of turning knowledge to action.

GDRR Opening Workshop - Modeling Approaches for High-Frequency Financial Time...

The Statistical and Applied Mathematical Sciences Institute

Analyzing high-frequency time series is increasingly useful with the current explosion in the availability of these data in several application areas, including but not limited to, climate, finance, health analytics, transportation, etc. This talk will give an overview of two statistical frameworks that could be useful for analyzing high-frequency financial time series leading to quantification of financial risk. These include a distribution free approach using penalized estimating functions for modeling inter-event durations and an approximate Bayesian approach for modeling counts of events in regular intervals. A few other potentially useful lines of research in this area will also be introduced.

QMC: Transition Workshop - Importance Sampling the Union of Rare Events with ...

The Statistical and Applied Mathematical Sciences Institute

This talk will report briey on some findings from the problem of picking the weights for a weighted function space in QMC. Then it will be mostly about importance sampling. We want to estimate the probability _ of a union of J rare events. The method uses n samples, each of which picks one of the rare events at random, samples conditionally on that rare event happening and counts the total number of rare events that happen. It was used by Naiman and Priebe for scan statistics, Shi, Siegmund and Yakir for genomic scans and Adler, Blanchet and Liu for extrema of Gaussian processes. We call it ALOE, for `at least one event'. The ALOE estimate is unbiased and we find that it has a coefficient of variation no larger than p (J + J�1 � 2)=(4n). The coefficient of variation is also no larger than p (__=_ � 1)=n where __ is the union bound. Our motivating problem comes from power system reliability, where the phase differences between connected nodes have a joint Gaussian distribution and the J rare events arise from unacceptably large phase differences. In the grid reliability problems even some events defined by 5772 constraints in 326 dimensions, with probability below 10�22, are estimated with a coefficient of variation of about 0:0024 with only n = 10;000 sample values. In a genomic context, the rare events become false discoveries. There we are interested in the possibility of a large number of simultaneous events, not just one or more. Some work with Kenneth Tay will be presented on that problem. Joint with Yury Maximov and Michael Chertkov Los Alamos National Laboratory and Kenneth Tay, Stanford

Ceske budevice

Kjetil Haugen

Mutualisation et Segmentation

Arthur Charpentier

Machine Learning Today: Current Research And Advances From AMLAB, UvA

Advanced-Concepts-Team

With the deep learning 'revolution' barely a decade old, the field of machine learning is accumulating a growing number of interesting research problems. The Amsterdam Machine Learning Laboratory (AMLAB), headed by Profs. Max Welling and Joris Mooij, has enjoyed considerable participation in the creation of many of these areas. Our research spans many subdisciplines including: approximate Bayesian methods, causal inference, equivariant representations, graph neural networks, spiking neural networks, neural compression, low-cost computation, reinforcement learning, explainable AI, medical imaging, generative modelling, flow models, and many more. In this talk, Daniel Worrall (postdoc) will introduce and showcase some of the recent advances from the lab.

Providing robust scheduling algorithms that can solve a large variety of scheduling problems with good performance is one of the biggest challenge of practical schedulers today. In this paper we present a robust scheduling algorithm based on Self-Adapting Large Neighborhood Search and apply it to a large panel of single-mode scheduling problems. The approach combines Large Neighborhood Search with a portfolio of neighborhoods and completion strategies together with Machine Learning techniques to converge on the most efficient neighborhoods and completion strategies for the problem being solved. The algorithm is evaluated on a set of 21 scheduling benchmarks, most of which are well established in the scheduling community. Despite the generality of the approach, for 17 benchmarks out of 21, its mean relative distance to state-of-the-art problem specific algorithms is less than 4%. It even outperforms state-of-the-art problem-specific algorithms on 7 benchmarks clearly showing that our algorithm offers a valuable compromise between robustness and performance.

GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...

The Statistical and Applied Mathematical Sciences Institute

In this talk we will describe a methodology to handle the causality to make inference on common-cause failure in a situation of missing data. The data are collected in the form of contingency table but the available information are only the numbers of CCF of different orders and the numbers of failure due to a given cause. Therefore only the margins of the contingency table are observed; thefrequencies in each cell are unknown. Assuming a Poisson model for the count, we suggest a Bayesian approach and we use the inverse Bayes formula (IBF) combined with a Metropolis-Hastings algorithm to make inference on the rate of occurrence for the different combination cause, order. The performance of the resulting algorithm is evaluated through simulations. A comparison is made with results obtained from the _-composition approach to deal with causality suggested by Zheng et al. (2013).

Viewers also liked

The paradox of specialisation: Technological expansion and economic stagnation

anucrawfordphd

Fiscal decentralisation and economic growth: evidence from Vietnam

anucrawfordphd

Land reforms, labor allocation and economic diversity: evidence from Vietnam ...

anucrawfordphd

Optimal regulatory regime and competition

anucrawfordphd

Could order and ambition emerge from the fragmented climate governance complex?

anucrawfordphd

Water affordability and state water concessions in Australia

anucrawfordphd

Giving rights to nature: A new institutional approach for overcoming social d...

anucrawfordphd

Facing our demons: Do mindfulness skills help people deal with failure at work?

anucrawfordphd

Small states, big effects? Oil price shocks and economic growth in small isla...

anucrawfordphd

Mental health and disengaged youth

anucrawfordphd

Global public goods and coalition formation under matching mechanisms (discus...

anucrawfordphd

Global public goods and coalition formation under matching mechanisms

anucrawfordphd

‘Putting a Value On It’. The value that New Zealand educational entrepreneurs...

anucrawfordphd

Revenue efforts in mineral producing districts In Indonesia: is there a resou...

anucrawfordphd

Marital assimilation of Central Java people in separate destinations: Investi...

anucrawfordphd

Where big data meets no data

anucrawfordphd

Shining a light on the Indonesian oil palm and development debate with big data

anucrawfordphd

Domestic sources of Japanese foreign policy

anucrawfordphd

Mental health and disengaged youth (discussant paper)

anucrawfordphd

Trade liberalisation, povery and equality in Indonesia

anucrawfordphd

Viewers also liked (20)

The paradox of specialisation: Technological expansion and economic stagnation

Fiscal decentralisation and economic growth: evidence from Vietnam

Land reforms, labor allocation and economic diversity: evidence from Vietnam ...

Optimal regulatory regime and competition

Could order and ambition emerge from the fragmented climate governance complex?

Water affordability and state water concessions in Australia

Giving rights to nature: A new institutional approach for overcoming social d...

Facing our demons: Do mindfulness skills help people deal with failure at work?

Small states, big effects? Oil price shocks and economic growth in small isla...

Mental health and disengaged youth

Global public goods and coalition formation under matching mechanisms (discus...

Global public goods and coalition formation under matching mechanisms

‘Putting a Value On It’. The value that New Zealand educational entrepreneurs...

Revenue efforts in mineral producing districts In Indonesia: is there a resou...

Marital assimilation of Central Java people in separate destinations: Investi...

Where big data meets no data

Shining a light on the Indonesian oil palm and development debate with big data

Domestic sources of Japanese foreign policy

Mental health and disengaged youth (discussant paper)

Trade liberalisation, povery and equality in Indonesia

Similar to Applying reinforcement learning to single and multi-agent economic problems

Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...

Philippe Laborie

GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...

The Statistical and Applied Mathematical Sciences Institute

Workload-aware materialization for efficient variable elimination on Bayesian...

Cigdem Aslay

Bayesian networks are general, well-studied probabilistic models that capture dependencies among a set of variables. Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we propose a novel materialization method, which can lead to significant efficiency gains when processing inference queries using the Variable Elimination algorithm. In particular, we address the problem of choosing a set of intermediate results to precompute and materialize, so as to maximize the expected efficiency gain over a given query workload. For the problem we consider, we provide an optimal polynomial-time algorithm and discuss alternative methods. We validate our technique using real-world Bayesian networks. Our experimental results confirm that a modest amount of materialization can lead to significant improvements in the running time of queries, with an average gain of 70%, and reaching up to a gain of 99%, for a uniform workload of queries. Moreover, in comparison with existing junction tree methods that also rely on materialization, our approach achieves competitive efficiency during inference using significantly lighter materialization.

Classification

Arthur Charpentier

Traffic flow modeling on road networks using Hamilton-Jacobi equations

Guillaume Costeseque

Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...

Daniel Valcarce

Side 2019, part 2

Arthur Charpentier

Damiano Pasetto

CoupledHydrologicalModeling

Leveraging Bagging for Evolving Data Streams

Albert Bifet

Hierarchical Reinforcement Learning with Option-Critic Architecture

Necip Oguz Serbetci

Duy Tan NGUYEN_Multi-objective optimization for inventory management systems ...

Duy Tân Nguyễn

Stochastic optimization from mirror descent to recent algorithms

Seonho Park

EasyStudy3

Chapter 00 Introduction Operational research

MariaSarwat

A brief introduction to Gaussian process

Eric Xihui Lin

Deep learning by JSKIM

Jinseob Kim

An energy-efficient flow shop scheduling using hybrid Harris hawks optimization

journalBEEI

The energy crisis has become an environmental problem, and this has received much attention from researchers. The manufacturing sector is the most significant contributor to energy consumption in the world. One of the significant efforts made in the manufacturing industry to reduce energy consumption is through proper scheduling. Energy-efficient scheduling (EES) is a problem in scheduling to reduce energy consumption. One of the EES problems is in a flow shop scheduling problem (FSSP). This article intends to develop a new approach to solving an EES in the FSSP problem. Hybrid Harris hawks optimization (hybrid HHO) algorithm is offered to resolve the EES issue on FSSP by considering the sequence-dependent setup. Swap and flip procedures are suggested to improve HHO performance. Furthermore, several procedures were used as a comparison to assess hybrid HHO performance. Ten tests were exercised to exhibit the hybrid HHO accomplishment. Based on numerical experimental results, hybrid HHO can solve EES problems. Furthermore, HHO was proven more competitive than other algorithms.

How Reliable is Duality Theory in Empirical Work?

contenidos-ort

Coautores: Francisco Rosas (Universidad ORT Uruguay) and Sergio H. Lence(Iowa State University). 2016 Agricultural and Applied Economics Association (AAEA) Annual Meetings. July 2016, Boston, MA. La teoría de dualidad, que establece una relación entre la función de beneficios de una firma competitiva y su tecnología de producción, ha sido utilizado por ejemplo para estimar elasticidades. En este estudio se pone en manifiesto problemas de precisión de dicha teoría en algunas aplicaciones prácticas debido a importantes sesgos en la estimaciones de parámetros conocidos de una función de producción.

MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING

VisionGEOMATIQUE2014

pres06-mainBrian Donhauser

Similar to Applying reinforcement learning to single and multi-agent economic problems (20)

Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...

GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...

Workload-aware materialization for efficient variable elimination on Bayesian...

Classification

Traffic flow modeling on road networks using Hamilton-Jacobi equations

Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...

Side 2019, part 2

Damiano Pasetto

Leveraging Bagging for Evolving Data Streams

Hierarchical Reinforcement Learning with Option-Critic Architecture

Duy Tan NGUYEN_Multi-objective optimization for inventory management systems ...

Stochastic optimization from mirror descent to recent algorithms

Chapter 00 Introduction Operational research

A brief introduction to Gaussian process

Deep learning by JSKIM

An energy-efficient flow shop scheduling using hybrid Harris hawks optimization

How Reliable is Duality Theory in Empirical Work?

MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING

pres06-main

Recently uploaded

一比一原版BCU毕业证伯明翰城市大学毕业证成绩单如何办理

ydubwyt

BCU毕业证原版定制【微信：176555708】【伯明翰城市大学毕业证成绩单-学位证】【微信：176555708】（留信学历认证永久存档查询）采用学校原版纸张、特殊工艺完全按照原版一比一制作（包括：隐形水印，阴影底纹，钢印LOGO烫金烫银，LOGO烫金烫银复合重叠，文字图案浮雕，激光镭射，紫外荧光，温感，复印防伪）行业标杆！精益求精，诚心合作，真诚制作！多年品质 ,按需精细制作，24小时接单,全套进口原装设备，十五年致力于帮助留学生解决难题，业务范围有加拿大、英国、澳洲、韩国、美国、新加坡，新西兰等学历材料，包您满意。 ◆◆◆◆◆ — — — — — — — — 【留学教育】留学归国服务中心 — — — — — -◆◆◆◆◆ 【主营项目】一.毕业证【微信：176555708】成绩单、使馆认证、教育部认证、雅思托福成绩单、学生卡等！二.真实使馆公证(即留学回国人员证明,不成功不收费) 三.真实教育部学历学位认证（教育部存档！教育部留服网站永久可查）四.办理各国各大学文凭(一对一专业服务,可全程监控跟踪进度) 如果您处于以下几种情况： ◇在校期间，因各种原因未能顺利毕业……拿不到官方毕业证【微信：176555708】 ◇面对父母的压力，希望尽快拿到； ◇不清楚认证流程以及材料该如何准备； ◇回国时间很长，忘记办理； ◇回国马上就要找工作，办给用人单位看； ◇企事业单位必须要求办理的 ◇需要报考公务员、购买免税车、落转户口 ◇申请留学生创业基金留信网认证的作用: 1:该专业认证可证明留学生真实身份 2:同时对留学生所学专业登记给予评定 3:国家专业人才认证中心颁发入库证书 4:这个认证书并且可以归档倒地方 5:凡事获得留信网入网的信息将会逐步更新到个人身份内，将在公安局网内查询个人身份证信息后，同步读取人才网入库信息 6:个人职称评审加20分 7:个人信誉贷款加10分→ 【关于价格问题（保证一手价格）我们所定的价格是非常合理的，而且我们现在做得单子大多数都是代理和回头客户介绍的所以一般现在有新的单子我给客户的都是第一手的代理价格，因为我想坦诚对待大家不想跟大家在价格方面浪费时间对于老客户或者被老客户介绍过来的朋友，我们都会适当给一些优惠。 8:在国家人才网主办的国家网络招聘大会中纳入资料，供国家高端企业选择人才选择实体注册公司办理，更放心，更安全！我们的承诺：可来公司面谈，可签订合同，会陪同客户一起到教育部认证窗口递交认证材料，客户在教育部官方认证查询网站查询到认证通过结果后付款，不成功不收费！学历顾问：微信：176555708

innovative-invoice-discounting-platforms-in-india-empowering-retail-investors...

Falcon Invoice Discounting

Falcon stands out as a top-tier P2P Invoice Discounting platform in India, bridging esteemed blue-chip companies and eager investors. Our goal is to transform the investment landscape in India by establishing a comprehensive destination for borrowers and investors with diverse profiles and needs, all while minimizing risk. What sets Falcon apart is the elimination of intermediaries such as commercial banks and depository institutions, allowing investors to enjoy higher yields.

what is the best method to sell pi coins in 2024

DOT TECH

The best way to sell your pi coins safely is trading with an exchange..but since pi is not launched in any exchange, and second option is through a VERIFIED pi merchant. Who is a pi merchant? A pi merchant is someone who buys pi coins from miners and pioneers and resell them to Investors looking forward to hold massive amounts before mainnet launch in 2026. I will leave the telegram contact of my personal pi merchant to trade pi coins with. @Pi_vendor_247

PF-Wagner's Theory of Public Expenditure.pptx

GunjanSharma28848

The Evolution of Non-Banking Financial Companies (NBFCs) in India: Challenges...

beulahfernandes8

655264371-checkpoint-science-past-papers-april-2023.pdf

morearsh02

What website can I sell pi coins securely.

DOT TECH

Currently there are no website or exchange that allow buying or selling of pi coins.. But you can still easily sell pi coins, by reselling it to exchanges/crypto whales interested in holding thousands of pi coins before the mainnet launch. Who is a pi merchant? A pi merchant is someone who buys pi coins from miners and resell to these crypto whales and holders of pi.. This is because pi network is not doing any pre-sale. The only way exchanges can get pi is by buying from miners and pi merchants stands in between the miners and the exchanges. How can I sell my pi coins? Selling pi coins is really easy, but first you need to migrate to mainnet wallet before you can do that. I will leave the telegram contact of my personal pi merchant to trade with. Tele-gram. @Pi_vendor_247

Summary of financial results for 1Q2024

InterCars

Isios-2024-Professional-Independent-Trustee-Survey.pdf

Henry Tapper

Scope Of Macroeconomics introduction and basic theories

nomankalyar153

Introduction to Value Added Tax System.ppt

VishnuVenugopal84

Introduction to Indian Financial System ()

Avanish Goel

how can I sell/buy bulk pi coins securely

DOT TECH

how to sell pi coins on Binance exchange

DOT TECH

Currently pi network is not tradable on binance or any other exchange because we are still in the enclosed mainnet. Right now the only way to sell pi coins is by trading with a verified merchant. What is a pi merchant? A pi merchant is someone verified by pi network team and allowed to barter pi coins for goods and services. Since pi network is not doing any pre-sale The only way exchanges like binance/huobi or crypto whales can get pi is by buying from miners. And a merchant stands in between the exchanges and the miners. I will leave the telegram contact of my personal pi merchant. I and my friends has traded more than 6000pi coins successfully Tele-gram @Pi_vendor_247

how to sell pi coins in South Korea profitably.

DOT TECH

Yes. You can sell your pi network coins in South Korea or any other country, by finding a verified pi merchant What is a verified pi merchant? Since pi network is not launched yet on any exchange, the only way you can sell pi coins is by selling to a verified pi merchant, and this is because pi network is not launched yet on any exchange and no pre-sale or ico offerings Is done on pi. Since there is no pre-sale, the only way exchanges can get pi is by buying from miners. So a pi merchant facilitates these transactions by acting as a bridge for both transactions. How can i find a pi vendor/merchant? Well for those who haven't traded with a pi merchant or who don't already have one. I will leave the telegram id of my personal pi merchant who i trade pi with. Tele gram: @Pi_vendor_247 #pi #sell #nigeria #pinetwork #picoins #sellpi #Nigerian #tradepi #pinetworkcoins #sellmypi

What price will pi network be listed on exchanges

DOT TECH

The rate at which pi will be listed is practically unknown. But due to speculations surrounding it the predicted rate is tends to be from 30$ — 50$. So if you are interested in selling your pi network coins at a high rate tho. Or you can't wait till the mainnet launch in 2026. You can easily trade your pi coins with a merchant. A merchant is someone who buys pi coins from miners and resell them to Investors looking forward to hold massive quantities till mainnet launch. I will leave the telegram contact of my personal pi vendor to trade with. @Pi_vendor_247

how to swap pi coins to foreign currency withdrawable.

DOT TECH

As of my last update, Pi is still in the testing phase and is not tradable on any exchanges. However, Pi Network has announced plans to launch its Testnet and Mainnet in the future, which may include listing Pi on exchanges. The current method for selling pi coins involves exchanging them with a pi vendor who purchases pi coins for investment reasons. If you want to sell your pi coins, reach out to a pi vendor and sell them to anyone looking to sell pi coins from any country around the globe. Below is the contact information for my personal pi vendor. Telegram: @Pi_vendor_247

Poonawalla Fincorp and IndusInd Bank Introduce New Co-Branded Credit Card

nickysharmasucks

The unveiling of the IndusInd Bank Poonawalla Fincorp eLITE RuPay Platinum Credit Card marks a notable milestone in the Indian financial landscape, showcasing a successful partnership between two leading institutions, Poonawalla Fincorp and IndusInd Bank. This co-branded credit card not only offers users a plethora of benefits but also reflects a commitment to innovation and adaptation. With a focus on providing value-driven and customer-centric solutions, this launch represents more than just a new product—it signifies a step towards redefining the banking experience for millions. Promising convenience, rewards, and a touch of luxury in everyday financial transactions, this collaboration aims to cater to the evolving needs of customers and set new standards in the industry.

The secret way to sell pi coins effortlessly.

DOT TECH


Applying reinforcement learning to single and multi-agent economic problems

1. Applying reinforcement learning to economics
Neal Hughes
Australian National University
neal.hughes@anu.edu.au
November 17, 2014
Neal Hughes (ANU) Applying reinforcement learning to economics November 17, 2014 1 / 23

2. Machine learning
Machine learning
- algorithms that 'learn' from data, i.e., build models from data with minimal theory / human involvement
- goes hand in hand with 'Big Data'
Supervised learning
- estimating functions mapping 'input' variables X to 'target' variables Y
- a.k.a. non-parametric regression
Reinforcement learning
- learning to make optimal (reward-maximising) decisions in dynamic environments: learning optimal policy functions for Markov Decision Processes (MDPs)
- a.k.a. approximate dynamic programming

3. Reinforcement learning
[Figure: the agent-environment loop. The agent takes action a_t; the environment returns reward r_t and moves from state s_t to s_{t+1}.]
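
The loop in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's code: `ToyStorageEnv`, its capacity, the uniform inflows and the square-root reward are hypothetical, and the agent simply picks a random feasible release (pure exploration). Each pass records one (a_t, s_t, s_{t+1}, r_t) tuple, the kind of sample a batch method such as fitted Q iteration works from.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    """One (a_t, s_t, s_{t+1}, r_t) sample from the agent-environment loop."""
    action: float
    state: float
    next_state: float
    reward: float


class ToyStorageEnv:
    """Hypothetical one-dimensional storage environment (illustration only)."""

    def __init__(self, capacity: float = 100.0, seed: int = 0) -> None:
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.state = capacity / 2

    def step(self, action: float) -> tuple[float, float]:
        release = min(max(action, 0.0), self.state)  # feasible release in [0, S_t]
        reward = release ** 0.5                      # hypothetical concave payoff
        inflow = self.rng.uniform(0.0, 30.0)         # hypothetical inflow distribution
        self.state = min(self.state - release + inflow, self.capacity)
        return reward, self.state


def run_episode(env: ToyStorageEnv, periods: int, seed: int = 1) -> list[Transition]:
    """Agent observes s_t, acts at random, receives r_t and s_{t+1}; store each step."""
    rng = random.Random(seed)
    samples = []
    for _ in range(periods):
        s_t = env.state
        a_t = rng.uniform(0.0, s_t)  # exploration: random feasible action
        r_t, s_next = env.step(a_t)
        samples.append(Transition(a_t, s_t, s_next, r_t))
    return samples


samples = run_episode(ToyStorageEnv(), periods=1000)
print(len(samples))
```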

4. A (single agent) water storage problem
[Diagram of the river system. Labels: Inflow, I_{t+1}; Storage, S_t; Release point, F_{1t}; Demand node; Extraction, E_t; Extraction point, F_{2t}; Return flow, R_t; End of system, F_{3t}; points numbered 1 to 3.]

5. A (single agent) water storage problem
$$\max_{\{W_t\}_{t=0}^{\infty}} E\left\{ \sum_{t=0}^{\infty} \beta^t \, \Pi(Q_t, I_t) \right\}$$
Subject to:
$$S_{t+1} = \min\{ S_t - W_t - \delta_0 \alpha S_t^{2/3} + I_{t+1}, \; K \}$$
$$0 \le W_t \le S_t$$
$$Q_t \le \max\{ (1 - \delta_{1b}) W_t - \delta_{1a}, \; 0 \}$$
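
The constraints on this slide translate directly into functions. A minimal sketch, assuming the transition S_{t+1} = min{S_t - W_t - δ_0 α S_t^{2/3} + I_{t+1}, K}, the feasibility condition 0 ≤ W_t ≤ S_t and the delivery bound Q_t ≤ max{(1 - δ_{1b}) W_t - δ_{1a}, 0}. The function and parameter names and the numbers in the usage lines are hypothetical, and the payoff function Π is left abstract: `discounted_payoff` just discounts a list of already-computed per-period payoffs.

```python
def next_storage(S_t, W_t, I_next, K, delta_0, alpha):
    """S_{t+1} = min{S_t - W_t - delta_0 * alpha * S_t^(2/3) + I_{t+1}, K}."""
    if not 0.0 <= W_t <= S_t:
        raise ValueError("release must satisfy 0 <= W_t <= S_t")
    loss = delta_0 * alpha * S_t ** (2.0 / 3.0)  # storage loss term
    return min(S_t - W_t - loss + I_next, K)     # spill anything above capacity K


def max_delivery(W_t, delta_1a, delta_1b):
    """Upper bound on delivered water: max{(1 - delta_1b) * W_t - delta_1a, 0}."""
    return max((1.0 - delta_1b) * W_t - delta_1a, 0.0)


def discounted_payoff(payoffs, beta):
    """Realised sum_t beta^t * Pi(Q_t, I_t) along one simulated path."""
    return sum(beta ** t * p for t, p in enumerate(payoffs))


# Hypothetical parameter values, for illustration only.
S_next = next_storage(S_t=1000.0, W_t=200.0, I_next=150.0, K=1000.0, delta_0=0.1, alpha=5.0)
Q_cap = max_delivery(W_t=200.0, delta_1a=10.0, delta_1b=0.25)
print(S_next, Q_cap)
```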

6. Why reinforcement learning?
[Figure: plot with axes Storage (GL), 0 to 1000, and Inflow (GL), 0 to 2000.]

7. The Q function
The standard Bellman equation with state value function V(s):
$$V(s) = \max_a \left\{ R(s, a) + \beta \int_S T(s, a, s') \, V(s') \, ds' \right\}$$
The Bellman equation with action-value function Q(a, s):
$$Q(a, s) = R(s, a) + \beta \int_S T(s, a, s') \max_a Q(a, s') \, ds'$$
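
On a finite state space the integral becomes a sum, which makes the link between the two Bellman equations easy to check numerically: iterating on Q and then taking max_a Q(a, s) should give the same answer as iterating on V directly. The two-state, two-action MDP below is hypothetical and chosen so the answer can be worked out by hand; `q_iteration` and `v_iteration` are illustrative names.

```python
def q_iteration(R, T, beta, tol=1e-10, max_iter=10_000):
    """Iterate Q(a, s) = R(s, a) + beta * sum_{s'} T(s, a, s') * max_a Q(a, s')."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_s for _ in range(n_a)]  # indexed Q[a][s]
    for _ in range(max_iter):
        v = [max(Q[a][s] for a in range(n_a)) for s in range(n_s)]
        new_Q = [[R[s][a] + beta * sum(T[s][a][s2] * v[s2] for s2 in range(n_s))
                  for s in range(n_s)] for a in range(n_a)]
        diff = max(abs(new_Q[a][s] - Q[a][s]) for a in range(n_a) for s in range(n_s))
        Q = new_Q
        if diff < tol:
            break
    return Q


def v_iteration(R, T, beta, tol=1e-10, max_iter=10_000):
    """Iterate V(s) = max_a {R(s, a) + beta * sum_{s'} T(s, a, s') * V(s')}."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    for _ in range(max_iter):
        new_V = [max(R[s][a] + beta * sum(T[s][a][s2] * V[s2] for s2 in range(n_s))
                     for a in range(n_a)) for s in range(n_s)]
        diff = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if diff < tol:
            break
    return V


# Hypothetical MDP: R[s][a] rewards and T[s][a][s'] transition probabilities.
R = [[1.0, 0.0], [0.0, 2.0]]
T = [[[1.0, 0.0], [0.0, 1.0]],  # from s=0: a=0 stays, a=1 moves to s=1
     [[1.0, 0.0], [0.0, 1.0]]]  # from s=1: a=0 moves to s=0, a=1 stays
Q = q_iteration(R, T, beta=0.9)
V = v_iteration(R, T, beta=0.9)
print(Q, V)
```

With β = 0.9, staying in state 1 earns 2 per period, so V(1) = 2 / (1 - 0.9) = 20; from state 0 the best plan is to move there at once, giving V(0) = 0.9 × 20 = 18, which beats staying in state 0 forever (1 / 0.1 = 10). Both iterations converge to these values.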

8. Fitted Q Iteration
Algorithm 1: Fitted Q Iteration
initialise s_0
run a simulation with exploration for T periods
store the samples {a_t, s_t, s_{t+1}, r_t} for t = 0, ..., T
initialise Q(a_t, s_t)
repeat  // iterate until convergence
    for t = 0 to T do
        set Q̂_t = r_t + β max_a Q(a, s_{t+1})
    end
    estimate Q by regressing Q̂_t against (a_t, s_t)
until a stopping rule is satisfied
- With large, dense data, computing max_a Q(a, ·) for each point is wasteful.
- Alternative: take the max over a sample of points and fit a value function (Fitted Q-V iteration).
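
Algorithm 1 can be made concrete in pure Python. A sketch under several assumptions: the storage model, the five discrete release shares in `ACTIONS` and the square-root payoff are hypothetical, and the regression in step 9 is a simple stand-in (averaging targets within action-by-state-bin cells) rather than the author's tile-coding regressor. Because cell averaging never amplifies differences between successive Q estimates, each sweep is a β-contraction and the loop stops once the fitted values change by less than `tol`.

```python
import random
from collections import defaultdict

ACTIONS = (0.0, 0.25, 0.5, 0.75, 1.0)  # hypothetical: share of storage released
CAPACITY = 100.0                      # hypothetical storage capacity
N_BINS = 20                           # state bins used by the stand-in regressor


def state_bin(s):
    return min(int(s / CAPACITY * N_BINS), N_BINS - 1)


def simulate(periods, seed=0):
    """Steps 1-3: simulate with random exploration, store (a_t, s_t, s_{t+1}, r_t)."""
    rng = random.Random(seed)
    s = CAPACITY / 2
    samples = []
    for _ in range(periods):
        a = rng.choice(ACTIONS)
        release = a * s
        r = release ** 0.5  # hypothetical concave payoff from released water
        s_next = min(s - release + rng.uniform(0.0, 30.0), CAPACITY)
        samples.append((a, s, s_next, r))
        s = s_next
    return samples


def fit(samples, targets):
    """Step 9 stand-in: regress targets on (a_t, s_t) by cell averaging."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (a, s, _, _), y in zip(samples, targets):
        sums[(a, state_bin(s))] += y
        counts[(a, state_bin(s))] += 1
    table = {key: sums[key] / counts[key] for key in sums}

    def q(a, s):
        return table.get((a, state_bin(s)), 0.0)  # unvisited cells default to 0

    return q


def zero_q(a, s):
    return 0.0


def fitted_q_iteration(samples, beta=0.9, tol=1e-4, max_iter=500):
    q = zero_q  # step 4: initialise Q
    for _ in range(max_iter):  # step 5: repeat
        # steps 6-8: Q-hat_t = r_t + beta * max_a Q(a, s_{t+1})
        targets = [r + beta * max(q(b, s1) for b in ACTIONS) for (_, _, s1, r) in samples]
        new_q = fit(samples, targets)
        change = max(abs(new_q(a, s) - q(a, s)) for (a, s, _, _) in samples)
        q = new_q
        if change < tol:  # step 10: stopping rule
            break
    return q


def greedy_action(q, s):
    return max(ACTIONS, key=lambda a: q(a, s))


samples = simulate(1000)
q = fitted_q_iteration(samples)
print(greedy_action(q, 20.0), greedy_action(q, 90.0))
```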

11. Single agent reinforcement learning
Figure: An approximately equidistant grid in two dimensions. (a) 10000 iid standard normal points; (b) 100 points at least 0.4 apart. [Both panels: axes from -4 to 4.]
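
The slide does not say how the approximately equidistant points in panel (b) were chosen. One simple way, assumed here for illustration, is greedy thinning: scan the sample once and keep a point only if it is at least `min_dist` from every point already kept. The result has the two properties the figure suggests: kept points are at least `min_dist` apart, and every discarded point lies within `min_dist` of a kept one. `thin_points` is a hypothetical name.

```python
import math
import random


def thin_points(points, min_dist):
    """Greedy thinning: keep a point only if it is >= min_dist from all kept points."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept


# Usage mirroring panel (a): 10000 iid standard normal points in two dimensions.
rng = random.Random(42)
cloud = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(10_000)]
grid = thin_points(cloud, min_dist=0.4)
print(len(grid))
```

The greedy pass is O(n × k) for n samples and k kept points, which is cheap at this scale; the number of points kept depends on the random sample rather than being fixed in advance.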

12. Tilecoding
[Figure: the input space covered by two offset tiling layers; an input point X_t activates one tile in tiling layer 1 and one tile in tiling layer 2.]
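
A tile coder maps a point to one active tile per tiling layer. The sketch below assumes inputs scaled to the unit hypercube and uses the simplest displacement, shifting layer k by k / n_layers of a tile width in every dimension; the deck's own setup uses 'optimal' displacement vectors, which this uniform diagonal shift is not. Tile indices are laid out row-major within each layer and layers occupy disjoint ranges, so every tile gets a distinct integer with no collisions, which is one reading of 'perfect hashing'. `active_tiles` and `predict` are hypothetical names.

```python
import math


def active_tiles(x, n_layers, tiles_per_dim):
    """Return one flat tile index per tiling layer for a point x in [0, 1]^d."""
    cells = tiles_per_dim + 1       # one extra cell per dimension absorbs the shift
    layer_size = cells ** len(x)    # number of tiles in one layer
    indices = []
    for k in range(n_layers):
        offset = k / n_layers       # uniform diagonal displacement (assumption)
        flat = 0
        for xi in x:
            coord = int(math.floor(xi * tiles_per_dim + offset))
            coord = min(max(coord, 0), cells - 1)  # clamp points slightly outside [0, 1]
            flat = flat * cells + coord            # row-major index within the layer
        indices.append(k * layer_size + flat)      # layers occupy disjoint index ranges
    return indices


def predict(weights, x, n_layers, tiles_per_dim):
    """Tile-coding estimate: mean of the weights of the active tiles."""
    idx = active_tiles(x, n_layers, tiles_per_dim)
    return sum(weights[i] for i in idx) / n_layers


# Usage: 4 layers of 10 x 10 tiles on the unit square -> 4 * 11**2 = 484 weights.
weights = [0.0] * (4 * 11 ** 2)
print(active_tiles((0.12, 0.0), n_layers=4, tiles_per_dim=10))  # -> [11, 132, 253, 374]
```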

13. Single fine grid
[Figure: horizontal axis 0.0 to 1.0, vertical axis 0.0 to 0.7.]

15. Single chunky grid
[Figure: horizontal axis 0.0 to 1.0, vertical axis 0.0 to 0.7.]

16. Tilecoding: many chunky grids
[Figure: horizontal axis 0.0 to 1.0, vertical axis 0.0 to 0.7.]

17. Tilecoding
Fitting
- Averaging
- Averaged Stochastic Gradient Descent
Setup
- Regular grids
- 'Optimal' displacement vectors
- Linear extrapolation
Implementation
- Cython with OpenMP
- Perfect 'hashing'
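
The two fitting options named on this slide can be compared on a one-dimensional toy problem. This is a sketch, not the author's Cython implementation: the layer count, tile width, learning rate and target function are assumptions. 'Averaging' sets each tile's weight to the mean target of the samples that fall in it; averaged SGD runs stochastic gradient steps on the squared error and returns the running (Polyak-Ruppert) average of the iterates, here averaged from the very first step for simplicity.

```python
import math
import random

N_LAYERS = 8        # number of tiling layers (assumption)
TILES = 10          # tiles per layer across [0, 1] (assumption)
CELLS = TILES + 1   # one extra cell per layer absorbs the shifted grids


def active_tiles(x):
    """Flat index of the active tile in each layer for a scalar x in [0, 1]."""
    out = []
    for k in range(N_LAYERS):
        c = int(math.floor(x * TILES + k / N_LAYERS))  # layer k shifted by k/N_LAYERS of a tile
        out.append(k * CELLS + min(max(c, 0), CELLS - 1))
    return out


def predict(w, x):
    """Tile-coding estimate: mean weight of the active tiles."""
    return sum(w[i] for i in active_tiles(x)) / N_LAYERS


def fit_averaging(xs, ys):
    """'Averaging': each tile's weight is the mean target of the samples inside it."""
    sums = [0.0] * (N_LAYERS * CELLS)
    counts = [0] * (N_LAYERS * CELLS)
    for x, y in zip(xs, ys):
        for i in active_tiles(x):
            sums[i] += y
            counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def fit_asgd(xs, ys, epochs=10, lr=0.1, seed=0):
    """Averaged SGD on squared error; returns the running mean of the weight iterates."""
    rng = random.Random(seed)
    w = [0.0] * (N_LAYERS * CELLS)
    w_bar = [0.0] * len(w)
    order = list(range(len(xs)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for j in order:
            idx = active_tiles(xs[j])
            err = ys[j] - sum(w[i] for i in idx) / N_LAYERS
            for i in idx:
                w[i] += lr * err  # gradient step on active tiles (1/N_LAYERS folded into lr)
            step += 1
            for i in range(len(w)):
                w_bar[i] += (w[i] - w_bar[i]) / step  # running average of iterates
    return w_bar


xs = [i / 999 for i in range(1000)]
ys = [x * x for x in xs]  # hypothetical target function
w_avg = fit_averaging(xs, ys)
w_sgd = fit_asgd(xs, ys)
print(round(predict(w_avg, 0.5), 3), round(predict(w_sgd, 0.5), 3))
```

Averaging needs a single pass over the data, while averaged SGD needs several epochs but targets the least-squares fit; that trade-off is one plausible reason both appear as options.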

18. A test case
[Figure: social welfare as percentage of SDP (0.993 to 1.000) against number of samples (10000 to 80000), for SDP, TC-A and TC-ASGD.]

19. A test case
Table: Computation time
Number of samples | 5000 | 10000 | 20000 | 50000 | 80000
SDP               | 6.6  | 7.2   | 7.5   | 7.4   | 7.4
TC-A              | 0.4  | 0.4   | 0.5   | 0.6   | 0.8
TC-ASGD           | 0.4  | 0.6   | 0.9   | 1.3   | 1.9
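
The computation-time table lends itself to a quick speed-up calculation. The slide does not state the time unit, so only ratios are meaningful; the snippet below simply divides the SDP row by each tile-coding row, using the numbers exactly as given. `speedups` is an illustrative name.

```python
# Computation times from the table (unit not stated on the slide).
SAMPLES = [5000, 10000, 20000, 50000, 80000]
TIMES = {
    "SDP": [6.6, 7.2, 7.5, 7.4, 7.4],
    "TC-A": [0.4, 0.4, 0.5, 0.6, 0.8],
    "TC-ASGD": [0.4, 0.6, 0.9, 1.3, 1.9],
}


def speedups(method):
    """SDP time divided by the given method's time, per sample size."""
    return {n: sdp / t for n, sdp, t in zip(SAMPLES, TIMES["SDP"], TIMES[method])}


for method in ("TC-A", "TC-ASGD"):
    print(method, {n: round(x, 1) for n, x in speedups(method).items()})
```

On these figures TC-A runs roughly 16.5 times faster than SDP at 5000 samples and about 9 times faster at 80000, with the gap narrowing as the sample grows.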

20. Multi-agent problems
Nash equilibrium concepts for stochastic games (Economics)
- Markov Perfect Equilibrium
- Oblivious Equilibrium
Learning in games (Economics)
- Fictitious play
- Partial best response dynamics
Multi-agent learning (Computer Science / Economics)
- each agent follows a single-agent RL method, or
- we combine RL with game theory / equilibrium concepts
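
Fictitious play, one of the learning-in-games concepts on this slide, has each player best-respond to the empirical frequency of the other player's past actions. A minimal two-player sketch follows; the uniform initial counts and lowest-index tie-breaking are assumptions, and matching pennies is used only because its mixed equilibrium (1/2, 1/2) makes convergence of the empirical frequencies easy to see. `fictitious_play` is a hypothetical name.

```python
def fictitious_play(payoff_1, payoff_2, rounds):
    """Two-player fictitious play in a finite game.

    payoff_k[i][j] is player k's payoff when player 1 plays i and player 2 plays j.
    Each round, each player best-responds to the empirical counts of the other
    player's past actions; max() breaks ties toward the lowest index.
    """
    n1, n2 = len(payoff_1), len(payoff_1[0])
    counts_1 = [1] * n1  # uniform initial counts (assumption)
    counts_2 = [1] * n2
    for _ in range(rounds):
        a1 = max(range(n1), key=lambda i: sum(payoff_1[i][j] * counts_2[j] for j in range(n2)))
        a2 = max(range(n2), key=lambda j: sum(payoff_2[i][j] * counts_1[i] for i in range(n1)))
        counts_1[a1] += 1
        counts_2[a2] += 1
    total_1, total_2 = sum(counts_1), sum(counts_2)
    return [c / total_1 for c in counts_1], [c / total_2 for c in counts_2]


MP_1 = [[1, -1], [-1, 1]]  # player 1 wants to match
MP_2 = [[-1, 1], [1, -1]]  # player 2 wants to mismatch
freq_1, freq_2 = fictitious_play(MP_1, MP_2, rounds=10_000)
print(freq_1, freq_2)
```

Play itself keeps cycling through the four action profiles with ever longer runs, but the empirical frequencies settle toward one half each, which is the sense in which fictitious play converges in this zero-sum game.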

21. Multi-agent fitted Q-V iteration
Each agent follows a fitted Q-V iteration algorithm, except that:
- only a sample of agents update their policies each stage (similar to partial best response)
- each new batch of samples is blended with the existing batch of samples (similar to fictitious play)
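
The two changes to single-agent fitted Q-V iteration can be sketched as one stage of an outer loop. The blending rule (keep a random `keep_share` of the old batch and append the new samples), the `update_share` of agents that re-solve each stage, and the `simulate` and `learn` callables are all hypothetical placeholders; in the method described here, `learn` would be a single-agent fitted Q-V iteration step.

```python
import random


def blend_batches(old_batch, new_batch, keep_share, rng):
    """Keep a random share of the existing samples and append the new batch."""
    n_keep = int(round(keep_share * len(old_batch)))
    return rng.sample(old_batch, n_keep) + list(new_batch)


def choose_updaters(agent_ids, update_share, rng):
    """Pick the subset of agents that re-solve their policy this stage (at least one)."""
    n = max(1, int(round(update_share * len(agent_ids))))
    return set(rng.sample(list(agent_ids), n))


def multi_agent_stage(policies, batches, simulate, learn,
                      update_share=0.2, keep_share=0.5, rng=None):
    """One stage: simulate all agents, blend sample batches, update a subset of policies."""
    rng = rng or random.Random(0)
    new_samples = simulate(policies)  # dict: agent id -> samples under current policies
    updaters = choose_updaters(list(policies), update_share, rng)
    for agent in policies:
        batches[agent] = blend_batches(batches.get(agent, []), new_samples[agent], keep_share, rng)
        if agent in updaters:
            policies[agent] = learn(batches[agent])  # e.g. a single-agent fitted Q-V step
    return policies, batches


# Usage with toy stand-ins: each agent produces 4 samples; "learning" returns the batch size.
def toy_simulate(policies):
    return {agent: [(agent, k) for k in range(4)] for agent in policies}


def toy_learn(batch):
    return len(batch)


policies, batches = multi_agent_stage({agent: 0 for agent in range(10)}, {}, toy_simulate, toy_learn)
print(sorted(policies.values()))  # -> [0, 0, 0, 0, 0, 0, 0, 0, 4, 4]
```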

25. Conclusions
- RL can be successfully applied to economic problems.
- Batch methods (such as fitted Q-V iteration) are well suited to our context.
- Tilecoding is an effective approximation method for low-dimensional problems.
- Our multi-agent method provides a middle ground between macro-DP methods and agent-based evolutionary methods.
- It allows us to consider complex multi-agent problems with externalities while still having near-optimal agents.

27. A (multi-agent) water storage problem
[Diagram of the river system. Labels: Inflow, I_{t+1}; Storage, S_t; Release point, F_{1t}; Demand node; Extraction, E_t; Extraction point, F_{2t}; Return flow, R_t; End of system, F_{3t}; points numbered 1 to 3.]

28. Example: capacity sharing
Initial balance: User 1 volume 10 ML; User 1 airspace 40 ML; User 2 volume 50 ML
Total inflow: 20 ML; inflow credits: +10 ML and +10 ML; internal spill: 10 ML
Updated balance: User 1 volume 30 ML; User 1 airspace 20 ML; User 2 volume 50 ML
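
The numbers on this slide can be reproduced with a simple rule. Assuming, from the balances shown, that each user holds a 50 ML capacity share and a 50% inflow share, User 2 starts full, so its 10 ML inflow credit cannot be stored in its own share and becomes an internal spill that lands in User 1's airspace: 10 + 10 + 10 = 30 ML for User 1, leaving 20 ML of airspace. The function below is a hypothetical sketch of that rule; reallocating spill in proportion to each user's remaining airspace is an assumption the slide does not state.

```python
def capacity_sharing_step(volumes, capacities, inflow_shares, inflow):
    """One inflow update under an illustrative capacity-sharing rule.

    Returns the updated volumes and the total excess credit that could not be
    stored in the owner's own capacity share (the internal spill).
    """
    credits = [share * inflow for share in inflow_shares]  # inflow credit per user
    new = [v + c for v, c in zip(volumes, credits)]
    spill = 0.0
    for i, cap in enumerate(capacities):
        if new[i] > cap:             # user is full: excess credit spills internally
            spill += new[i] - cap
            new[i] = cap
    air = [cap - v for cap, v in zip(capacities, new)]  # remaining airspace
    total_air = sum(air)
    if total_air > 0:
        moved = min(spill, total_air)  # any spill beyond total airspace leaves the storage
        new = [v + moved * a / total_air for v, a in zip(new, air)]
    return new, spill


volumes, spill = capacity_sharing_step([10.0, 50.0], [50.0, 50.0], [0.5, 0.5], 20.0)
print(volumes, spill)  # -> [30.0, 50.0] 10.0
```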

29. A test case
Figure: Mean storage by iteration. [Mean storage S_t (GL, 550 to 800) against iteration (0 to 20), for CS, NS, OA and SWA.]

30. A test case
Figure: Mean social welfare by iteration. [Mean social welfare, sum over i = 1..n of u_{it} ($M, 192.0 to 195.5), against iteration (0 to 20), for CS, NS, OA and SWA.]
