
Uncertainty Awareness in Integrating Machine Learning and Game Theory


Invited Talk at Game Theory Workshop 2017 (Japan)



  1. 1. Uncertainty Awareness in Integrating Machine Learning and Game Theory (The Connection between Machine Learning and Game Theory, Seen through Uncertainty) Rikiya Takahashi, SmartNews, Inc. rikiya.takahashi@smartnews.com Mar 5, 2017, Game Theory Workshop 2017 https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating-machine-learning-and-game-theory
  2. 2. About Myself ● Rikiya TAKAHASHI, Ph.D. (高橋 力矢) – Engineer at SmartNews, Inc., 2015–present – Research Staff Member at IBM Research – Tokyo, 2004–2015 ● Research Interests: machine learning, reinforcement learning, cognitive science, behavioral economics, complex systems – Descriptive models of real human behavior – Prescriptive decision making from descriptive models – Robust algorithms that work under high uncertainty ● Limited sample size, high dimensionality, high noise
  3. 3. Example of Previous Work ● Budget-Constrained Markov Decision Process for Marketing-Mix Optimization (Takahashi+, 2013 & 2014) – Pipeline: Historical Data → Consumer Segmentation → Time-Series Predictive Modeling → Optimal Marketing-Mix & Targeting Rules – Weekly marketing actions per strategic segment: EM (e-mail), DM (direct mail), TM (tele-marketing) [diagram of segmentation tree and stimulus-response model omitted]
  4. 4. Example of Previous Work ● Travel-Time Distribution Prediction on a Large Road Network (Takahashi+, 2012) – Pipeline: Road Network & Travel-Time Data by Taxi → Predictive Modeling of Travel-Time Distribution → Route-Choice Recommendation or Traffic Simulation [diagram of a road network with per-link travel-time distributions ψ(y) omitted]
  5. 5. Example of Previous Work ● Bayesian Discrete Choice Modeling for Irrational Compromise Effect (Takahashi & Morimura, 2015) – Explained later today – A Utility Calculator (UC) computes utility samples from each option's attribute vector (e.g., inexpensiveness, product quality) and sends them to a Decision Making System (DMS), which estimates the utilities; the option with the highest share can change across choice sets such as {A, B, C} and {B, C, D} [diagram omitted]
  6. 6. Agenda 1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
  7. 7. Machine Learning (ML) ● A set of inductive disciplines for designing a probabilistic model and estimating its parameters so as to maximize out-of-sample predictive accuracy – Supervised learning: model and fit P(Y|X) – Unsupervised learning: model and fit P(X) ● What machine learners care about – Bias-variance trade-off – Curse of dimensionality
  8. 8. Estimation via Bayes' Theorem ● Basis behind most of today's ML algorithms (data: $D$, model parameter: $\theta$) – Bayesian estimation: posterior distribution $p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{\int_\theta p(D \mid \theta)\,p(\theta)\,d\theta}$, predictive distribution $p(y^* \mid D) = \int_\theta p(y^* \mid \theta)\,p(\theta \mid D)\,d\theta$ – Maximum A Posteriori (MAP) estimation, an approximation of Bayesian estimation: posterior mode $\hat\theta = \arg\max_\theta [\log p(D \mid \theta) + \log p(\theta)]$, predictive distribution $p(y^* \mid D) \simeq p(y^* \mid \hat\theta)$ ● Q. Why place a prior $p(\theta)$? – A1. To quantify uncertainty as a posterior – A2. To avoid overfitting
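A minimal numeric illustration of the MAP-vs-Bayesian distinction above (my own sketch, not part of the talk): for a Beta-Bernoulli model with hypothetical coin-flip data, the Bayesian posterior carries an uncertainty interval, while MAP keeps only the posterior mode.

```python
import numpy as np
from scipy import stats

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # hypothetical data D (coin flips)
a0, b0 = 2.0, 2.0                            # Beta prior p(theta), pulling toward 0.5

# Bayesian estimation: the full posterior is Beta(a0 + #heads, b0 + #tails)
a_post, b_post = a0 + data.sum(), b0 + (len(data) - data.sum())
posterior = stats.beta(a_post, b_post)
print("posterior mean:", posterior.mean(), "95% interval:", posterior.interval(0.95))

# MAP estimation: only the posterior mode, with no uncertainty attached
theta_map = (a_post - 1) / (a_post + b_post - 2)
print("MAP estimate  :", theta_map)
```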
  9. 9. E.g., Gaussian Process Regression (GPR) ● Bayesian Ridge Regression – Unlike MAP Ridge regression (dark gray), input-dependent uncertainty (light gray) is quantified. – prior: $\binom{f}{f^*} \sim N\!\left(0_{n+1},\ \begin{pmatrix} K & k_* \\ k_*^T & K(x^*, x^*) \end{pmatrix}\right)$ where $K = (K_{ij} \equiv K(x_i, x_j))$, $k_* = (K(x_1, x^*), \dots, K(x_n, x^*))^T$, $K(x, x') = \exp(-\gamma \|x - x'\|^2)$ – data likelihood: $\binom{y}{y^*} \sim N\!\left(\binom{f}{f^*},\ \sigma^2 I_{n+1}\right)$ – predictive distribution: $y^* \mid K, x^*, X, y \sim N\!\left(k_*^T (\sigma^2 I_n + K)^{-1} y,\ K(x^*, x^*) - k_*^T (\sigma^2 I_n + K)^{-1} k_* + \sigma^2\right)$
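To make the predictive formula above concrete, here is a short numpy sketch (mine, not the speaker's code) that evaluates the GPR predictive mean and variance at a test input x* with the RBF kernel and noise level sigma^2; the toy data are invented.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # K(x, x') = exp(-gamma * ||x - x'||^2), evaluated pairwise
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))                    # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)     # noisy targets
sigma2 = 0.1 ** 2

K = rbf(X, X)
x_star = np.array([[0.5]])
k_star = rbf(X, x_star)[:, 0]

A = np.linalg.inv(sigma2 * np.eye(len(X)) + K)          # (sigma^2 I_n + K)^{-1}
mean = k_star @ A @ y                                   # predictive mean
var = rbf(x_star, x_star)[0, 0] - k_star @ A @ k_star + sigma2  # predictive variance
print("predictive mean %.3f, predictive std %.3f" % (mean, np.sqrt(var)))
```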
  10. 10. Gap between Deduction & Induction Today's AI is integrating both. Do not divide the work between inductive & deductive researchers. Deductive Mind ● Optimize decisions for a given environment ● Casino owner's mentality ● Game theorist, probabilist, operations researcher Inductive Mind ● Estimate the environment from observations ● Gambler's mentality ● Statistician, machine learner, econometrician
  11. 11. Induction ↔ Deduction ● Typical problem solving in the real world: Dataset $D$ → [inductive process: machine learning, statistics, econometrics, etc.] → estimate of environment $\hat\Theta_D$ → [deductive process: game theory, mathematical programming, Markov Decision Process, etc.] → policy decisions $\hat\pi_D$, where $\forall i \in \{1, \dots, n\}\ \hat\pi_{D,i} = \arg\max_{\pi_i} R(\pi_i \mid \{\hat\pi_{D,j}\}_{j \neq i}, \hat\Theta_D)$ ● The estimate $\hat\Theta_D$ is different from the true environment $\Theta$.
  12. 12. Induction ↔ Deduction ● Same pipeline: Dataset $D$ → estimate of environment $\hat\Theta_D$ → policy decisions $\hat\pi_D$ ● How different is the estimation-based policy $\hat\pi_D$ from the true optimal policy $\pi^*$, defined by $\forall i \in \{1, \dots, n\}\ \pi^*_i = \arg\max_{\pi_i} R(\pi_i \mid \{\pi^*_j\}_{j \neq i}, \Theta)$?
  13. 13. Induction ↔ Deduction ● Typical problem solving in the real world: Dataset $D$ → inductive process → $\hat\Theta_D$ → deductive process → $\hat\pi_D$ ● State-of-the-art AI: integration of machine learning and optimization algorithms, i.e., Dataset $D$ → direct optimization → policy decisions $\check\pi_D$, with the environment estimate $\check\Theta_D$ only as a by-product
  14. 14. See the Difference ● Typical problem solving in the real world: unnecessarily large effort on each subproblem, and vulnerable to estimation error – $\hat\Theta_D$ is accurately fitted to minimize prediction error on dataset D, although minimizing that error is not the final goal – $\hat\pi_D$ is exceedingly optimized given possibly wrong assumptions ● State-of-the-art AI: less effort on needless intermediate estimation, and robust to estimation error – $\check\Theta_D$ is fitted but does not minimize the error for dataset D, and is often less complex than $\hat\Theta_D$ – $\check\pi_D$ is safely optimized with less reliance on $\check\Theta_D$
  15. 15. See the Difference ● Typical problem solving in the real world: solve a hard inductive problem, then solve another hard deductive problem ● State-of-the-art AI: solve one easier problem that involves both induction and deduction ● Recommendation of simple problem solving – Gigerenzer & Taleb, https://www.youtube.com/watch?v=4VSqfRnxvV8
  16. 16. Optimization under Uncertainty ● Interval Estimation (e.g., Bayesian) – Quantify uncertainty – Optimize over all possible environments ● Minimal Estimation (e.g., Vapnik) – Omit the intermediate step – Solve the minimal optimization problem ● Both principles are effective in practice.
  17. 17. Vapnik's Principle (Vapnik, 1995) When solving a problem of interest, do not solve a more general problem as an intermediate step. —Vladimir N. Vapnik ● E.g., classification or regression: predict Y given X – #1. Fit P(X,Y) and infer P(Y|X) by Bayes' theorem – #2. Only fit P(Y|X) ● #2 is better than #1 because it incurs less estimation error – particularly when uncertainty is high: small sample size, high dimensionality, and/or high noise
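A hedged illustration of the principle (my own sketch, not from the talk): fit P(Y|X) directly with logistic regression versus fit a generative model of P(X,Y) (Gaussian naive Bayes) and invert it by Bayes' theorem, then compare held-out accuracy. The dataset and hyperparameters are arbitrary, and which route wins can vary with the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# many noisy features, few informative ones: the regime where Vapnik's advice bites
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

discriminative = LogisticRegression(max_iter=1000).fit(Xtr, ytr)  # models P(Y|X) only
generative = GaussianNB().fit(Xtr, ytr)                           # models P(X|Y)P(Y)

print("fit P(Y|X) only   :", discriminative.score(Xte, yte))
print("fit joint P(X, Y) :", generative.score(Xte, yte))
```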
  18. 18. Batch Reinforcement Learning ● A good example involving both inductive and deductive processes ● Also a good example of how to avoid needlessly hard estimation ● Basis behind the recent success of the Deep Q-Network for playing games (Mnih+, 2013 & 2015) and of AlphaGo (Silver+, 2016)
  19. 19. Markov Decision Process ● Framework for long-term-optimal decision making – S: set of states, A: set of actions, P(s'|s,a): state-transition probability, r(s,a): immediate reward, γ ∈ [0,1]: discounting factor – Optimize the policy π(a|s) for maximal cumulative reward – E.g., customer states (Gold / Silver / Normal) evolve over t = 0, 1, 2, ... under Action #1 (ordinary discount on a flight ticket) or Action #2 (free business-class upgrade), yielding different reward streams [diagram omitted]
  20. 20. Markov Decision Process ● Easy to solve if the environment is known – Via dynamic programming or linear programming when P(s'|s,a) and r(s,a) are given with no uncertainty – Behave myopically as t → ∞ ● For each state s, choose the action a that maximizes r(s,a). – At time (t-1), choose the optimal action that maximizes the immediate reward at time (t-1) plus the expected reward after time t over the state-transition distribution. ● What if the environment is unknown?
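A minimal value-iteration sketch (my example, not the talk's) for the known-environment case described above: with P(s'|s,a) and r(s,a) given exactly, repeated Bellman backups recover the optimal values and a greedy policy; the transition and reward tables are random placeholders.

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
r = rng.uniform(0, 1, size=(n_states, n_actions))                 # r(s, a)

V = np.zeros(n_states)
for _ in range(1000):                  # dynamic programming: Bellman backups
    Q = r + gamma * (P @ V)            # Q[s, a] = r(s, a) + gamma * E[V(s') | s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)              # greedy policy w.r.t. the converged Q
print("optimal state values:", np.round(V, 3), "greedy policy:", policy)
```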
  21. 21. Types of Reinforcement Learning ● Model-based ↔ Model-free ● On-policy ↔ Off-policy ● Value iteration ↔ Policy search ● Model-based approach – 1. System identification: estimate the MDP parameters – 2. Sample multiple MDPs from the interval estimate – 3. Solve every MDP & take the best action of the best MDP ● Optimism in the face of uncertainty
  22. 22. Model-free approach ● Remember: our aim is to obtain the optimal policy. In principle there is no need to estimate the environment. – Act without fully identifying the system: as long as we choose the optimal action, things turn out right in the end. ● Even when estimating, use an intermediate statistic that is less complex than P(s'|s,a) and r(s,a).
  23. 23. Bellman Optimality Equation ● $Q(s,a) = E[r(s,a)] + \gamma\, E_{P(s'|s,a)}\left[\max_{a'} Q(s', a')\right]$ ● The policy is derived once we have an estimate of Q(s,a): $\pi(a|s) = 1$ if $a = \arg\max_{a'} Q(s, a')$, and $0$ otherwise – Simpler than estimating P(s'|s,a) and r(s,a) ● Get an estimate $\hat Q(s,a)$ from episodes $(s_i, a_i, s_i', r_i)_{i=1}^n$
  24. 24. Fitted Q-Iteration (Ernst+, 2005) ● For k = 1, 2, ..., iterate 1) value computation and 2) regression as 1) $\forall i \in \{1, \dots, n\}\quad v_i^{(k)} := r_i + \gamma\, \hat Q_k^{(1)}\!\left(s_i', \arg\max_{a'} \hat Q_k^{(0)}(s_i', a')\right)$ 2) $\forall f \in \{0,1\}\quad \hat Q_{k+1}^{(f)} := \arg\min_{Q \in H} \left[\tfrac{1}{2} \sum_{i \in J_f} \left(v_i^{(k)} - Q(s_i, a_i)\right)^2 + R(Q)\right]$ – H: hypothesis space of functions, $Q_0 \equiv 0$, R: regularization term – Indices 1...n are randomly split into sets $J_0$ and $J_1$ to avoid over-estimation of Q values (Double Q-Learning (Hasselt, 2010)). ● Related to Experience Replay in the Deep Q-Network (Mnih+, 2013 & 2015) – See (Lange+, 2012) for more details.
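A compact Fitted Q-Iteration sketch in the spirit of the slide (my illustration), using a tree-ensemble regressor as in Ernst et al. (2005). The environment, episode data, and hyperparameters are invented, and the J0/J1 index-splitting trick is omitted for brevity, so this is a single-estimator simplification rather than the exact procedure above.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
n, n_actions, gamma = 500, 2, 0.95
S = rng.uniform(-1, 1, size=(n, 2))              # states s_i
A = rng.integers(0, n_actions, size=n)           # actions a_i
S_next = S + rng.normal(0, 0.1, size=S.shape)    # next states s_i'
R = -np.linalg.norm(S_next, axis=1)              # rewards r_i (closer to origin is better)

def q_values(model, states):
    """Evaluate Q(s, a) for every action by appending the action as a feature."""
    return np.column_stack([
        model.predict(np.column_stack([states, np.full(len(states), a)]))
        for a in range(n_actions)])

X = np.column_stack([S, A])
Q_hat = None
for k in range(20):
    if Q_hat is None:
        v = R                                                   # Q_0 == 0
    else:
        v = R + gamma * q_values(Q_hat, S_next).max(axis=1)     # Bellman targets v_i^(k)
    Q_hat = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, v)

print("greedy actions for first 5 states:", q_values(Q_hat, S[:5]).argmax(axis=1))
```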
  25. 25. Policy Gradient ● Accurately fit the policy $\pi_\theta(a|s)$ while only roughly fitting Q(s,a) – More direct toward the final aim – Applicable to continuous-action problems ● Policy Gradient Theorem (Sutton+, 2000): $\underbrace{\nabla_\theta J(\theta)}_{\text{gradient of performance}} = \underbrace{E_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a|s)\, Q^\pi(s,a)\right]}_{\text{expected log-policy times cumulative reward over } s \text{ and } a}$ ● Variations on providing the rough estimate of Q – REINFORCE (Williams, 1992): reward samples – Actor-Critic: regression models (e.g., Natural Gradient (Kakade, 2002), A3C (Mnih+, 2016))
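A toy REINFORCE sketch (mine, not from the slides) for a 3-armed bandit: a softmax policy is updated by gradient ascent using sampled rewards in place of Q(s,a), as in Williams (1992); the arm means and step size are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])    # hypothetical expected rewards per arm
theta = np.zeros(3)                       # policy parameters, one per arm
alpha = 0.1                               # learning rate

for step in range(2000):
    p = np.exp(theta - theta.max()); p /= p.sum()    # softmax policy pi_theta(a)
    a = rng.choice(3, p=p)
    reward = true_means[a] + 0.1 * rng.standard_normal()
    grad_log_pi = -p; grad_log_pi[a] += 1.0          # d/dtheta log pi_theta(a)
    theta += alpha * grad_log_pi * reward            # REINFORCE update

print("learned policy:", np.round(np.exp(theta) / np.exp(theta).sum(), 3))
```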
  26. 26. Functional Approximation in Practice ● Concrete functional form of Q(s,a) and/or π(a|s) – Q should be a universal function approximator: a class of functions that can approximate any function if sufficiently many parameters are introduced. ● Examples of universal approximators: tree ensembles (Random Forest, Gradient Boosted Decision Trees), (deep) neural networks, mixtures of Radial Basis Functions (RBFs)
  27. 27. Functional Approximation in Practice ● Is any universal approximator OK? – No, unfortunately. – A universal approximator is merely asymptotically unbiased. – Better to have ● Low variance in terms of the bias-variance trade-off ● Resistance to the curse of dimensionality ● One reason for deep learning's success – Flexibility to represent multi-modal functions with fewer parameters than nonparametric (RBF or tree) models – Techniques to stabilize numerical optimization ● AdaGrad or ADAM, dropout, ReLU, batch normalization, etc.
  28. 28. Message ● Uncertainty awareness is essential in data-oriented decision making. – No division between induction and deduction – Remove needless intermediate estimation – Fitted Q-Iteration as an illustrative example ● Fewer parameters, less uncertainty
  29. 29. Agenda 1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
  30. 30. Shrinkage Matters in the Real World. ● Q. Why does a prior help avoid over-fitting? – A. Shrinkage towards the prior mean (e.g., 0 in Ridge regression) ● Over-optimization ↔ Over-rationalization? – (e.g., (Takahashi and Morimura, 2015)) ● In a 2-dimensional regression, the Ridge solution lies closer to the prior mean 0 than the Ordinary Least Squares (OLS) solution; the prior mean 0 is independent of the training data. [coefficient-space figure omitted]
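A small numeric illustration of the shrinkage claim (my sketch, with made-up data): on nearly collinear inputs, the ridge solution is pulled toward the prior mean 0 and is far more stable than OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(30)   # nearly collinear columns
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.standard_normal(30)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)                      # ordinary least squares
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)  # MAP with Gaussian prior

print("OLS  :", np.round(beta_ols, 3))    # unstable, far from the prior mean
print("Ridge:", np.round(beta_ridge, 3))  # shrunk toward 0, lower variance
```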
  31. 31. Discrete Choice Modelling Goal: predict the probability of choosing an option from a choice set. Why solve this problem? Brand positioning among competitors; sales promotion (though open to some abuse).
  32. 32. Random Utility Theory as a Rational Model Each human is a rational maximizer of random utility. Theoretical basis behind many statistical marketing models: logit models (e.g., (McFadden, 1980; Williams, 1977; McFadden and Train, 2000)), learning to rank (e.g., (Chapelle and Harchaoui, 2005)), conjoint analysis (Green and Srinivasan, 1978), matrix factorization (e.g., (Lawrence and Urtasun, 2009)), ...
  33. 33. Complexity of Real Humans' Choice An example of choosing a PC (Kivetz et al., 2004); each subject chooses 1 option from a choice set. Options: A (CPU 250 MHz, Mem. 192 MB), B (300 MHz, 160 MB), C (350 MHz, 128 MB), D (400 MHz, 96 MB), E (450 MHz, 64 MB). Observed choices: {A, B, C} → 36:176:144 subjects, {B, C, D} → 56:177:115, {C, D, E} → 94:181:109. Can random utility theory still explain the preference reversals (B ≻ C or C ≻ B)?
  34. 34. Similarity Effect (Tversky, 1972) The top-share choice can change due to correlated utilities. E.g., choosing one color from {Blue, Red} versus {Violet, Blue, Red}.
  35. 35. Attraction Effect (Huber et al., 1982) Introducing an absolutely inferior option A− (= decoy) causes an irregular increase in option A's attractiveness, despite the natural guess that the decoy never affects the choice: if D ≻ A, then D ≻ A ≻ A−; if A ≻ D, then A is superior to both A− and D.
  36. 36. Compromise Effect (Simonson, 1989) Moderate options within each choice set are preferred. This is different from a non-linear utility function with diminishing returns (e.g., √inexpensiveness + √quality).
  37. 37. Positioning of the Proposed Work (Sim.: similarity, Attr.: attraction, Com.: compromise)
Model | Sim. | Attr. | Com. | Mechanism | Predict. for Test Set | Likelihood Maximization
SPM | OK | NG | NG | correlation | OK | MCMC
MDFT | OK | OK | OK | dominance & indifference | OK | MCMC
PD | OK | OK | OK | nonlinear pairwise comparison | OK | MCMC
MMLM | OK | NG | OK | none | OK | Non-convex
NLM | OK | NG | NG | hierarchy | NG | Non-convex
BSY | OK | OK | OK | Bayesian | OK | MCMC
LCA | OK | OK | OK | loss aversion | OK | MCMC
MLBA | OK | OK | OK | nonlinear accumulation | OK | Non-convex
Proposed | OK | NG | OK | Bayesian | OK | Convex
MDFT: Multialternative Decision Field Theory (Roe et al., 2001); PD: Proportional Difference Model (González-Vallejo, 2002); MMLM: Mixed Multinomial Logit Model (McFadden and Train, 2000); SPM: Structured Probit Model (Yai, 1997; Dotson et al., 2009); NLM: Nested Logit Models (Williams, 1977; Wen and Koppelman, 2001); BSY: Bayesian Model of (Shenoy and Yu, 2013); LCA: Leaky Competing Accumulator Model (Usher and McClelland, 2004); MLBA: Multiattribute Linear Ballistic Accumulator Model (Trueblood, 2014)
  38. 38. Key Idea #1: a Dual-Personality Model Regard a human as an estimator of her/his own utility function. Assumption 1: the DMS does not know the original utility function. 1. The UC computes the sample value of every option's utility and sends only these samples to the DMS. 2. The DMS statistically estimates the utility function.
  39. 39. Utility Calculator as the Rational Personality For every context i and option j, the UC computes a noiseless sample of utility $v_{ij}$ by applying the utility function $f_{UC}: \mathbb{R}^{d_X} \to \mathbb{R}$: $v_{ij} = f_{UC}(x_{ij})$, $f_{UC}(x) \triangleq b + w^\top \phi(x)$, where $b$ is a bias term, $\phi: \mathbb{R}^{d_X} \to \mathbb{R}^{d_\phi}$ is a mapping function, and $w \in \mathbb{R}^{d_\phi}$ is a vector of coefficients.
  40. 40. Key Idea #2: the DMS is a Bayesian Estimator The DMS does not know $f_{UC}$ but has the utility samples $\{v_{ij}\}_{j=1}^{m[i]}$. Assumption 2: the DMS places a choice-set-dependent Gaussian Process (GP) prior on the regression of the utility function: $\mu_i \sim N\!\left(0_{m[i]}, \sigma^2 K(X_i)\right)$, $v_i \triangleq (v_{i1}, \dots, v_{im[i]})^\top \sim N\!\left(\mu_i, \sigma^2 I_{m[i]}\right)$, where $K(X_i) = (K(x_{ij}, x_{ij'})) \in \mathbb{R}^{m[i] \times m[i]}$, $\mu_i \in \mathbb{R}^{m[i]}$ is the vector of utilities, $\sigma^2$ is the noise level, $K(\cdot,\cdot)$ is a similarity function, and $X_i \triangleq (x_{i1} \in \mathbb{R}^{d_X}, \dots, x_{im[i]})^\top$. The posterior mean is given as $u^*_i \triangleq E[\mu_i \mid v_i, X_i, K] = K(X_i)\left(I_{m[i]} + K(X_i)\right)^{-1}\left(b 1_{m[i]} + \Phi_i w\right)$.
  41. 41. Convex Optimization for the Model Parameters The likelihood of the entire model is tractable, assuming the choice is given by a logit whose mean utility is the posterior mean $u^*_i$. Thus we can fit the function $f_{UC}$ from the choice data. Conveniently, MAP estimation of $f_{UC}$ is convex for fixed K: $\hat b, \hat w = \arg\max_{b, w} \sum_{i=1}^n \ell\!\left(b H_i 1_{m[i]} + H_i \Phi_i w,\ y_i\right) - \frac{c}{2}\|w\|^2$, where $\ell(u^*_i, y_i) \triangleq \log \frac{\exp(u^*_{i y_i})}{\sum_{j'=1}^{m[i]} \exp(u^*_{i j'})}$ and $H_i \triangleq K(X_i)\left(I_{m[i]} + K(X_i)\right)^{-1}$.
  42. 42. Irrationality as Bayesian Shrinkage Implication from the posterior-mean utility in (1): each option's utility is shrunk toward the prior mean 0, with strong shrinkage for an option dissimilar to the others, due to its high posterior variance (= uncertainty). $u^*_i = \underbrace{K(X_i)\left(I_{m[i]} + K(X_i)\right)^{-1}}_{\text{shrinkage factor}} \underbrace{\left(b 1_{m[i]} + \Phi_i w\right)}_{\text{vector of utility samples}}$ (1) Context effects as Bayesian uncertainty aversion, e.g., with the RBF kernel $K(x, x') = \exp(-\gamma \|x - x'\|^2)$. [Figure omitted: final evaluations of options A–D under the choice sets {A,B,C} and {B,C,D}]
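A sketch of equation (1) in code (my example, with hypothetical attribute values): the posterior-mean utilities u* = K(I + K)^{-1} v under an RBF kernel, showing that the option most dissimilar to the others is shrunk hardest toward the prior mean 0, even if its raw utility sample is the largest.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# three options in a 2-D attribute space; option C sits far from A and B
X_choice_set = np.array([[1.0, 4.0], [1.2, 3.8], [4.0, 1.0]])
v = np.array([1.0, 1.1, 1.2])                     # raw utility samples from the "UC"

K = rbf_kernel(X_choice_set)
u_star = K @ np.linalg.solve(np.eye(3) + K, v)    # posterior mean K (I + K)^{-1} v

print("raw utilities   :", v)
print("shrunk utilities:", np.round(u_star, 3))   # the isolated option C is shrunk most
```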
  43. 43. Recovered Context-Dependent Choice Criteria For a speaker dataset: successfully captured a mixture of objective preference and subjective context effects. Options: A (50 W, 100 USD), B (75 W, 130 USD), C (100 W, 160 USD), D (125 W, 190 USD), E (150 W, 220 USD). Observed choices: {A, B, C} → 45:135:145 subjects, {B, C, D} → 58:137:111, {C, D, E} → 95:155:91. [Figures omitted: recovered evaluation curves versus price per choice set, and average test log-likelihood of LinLogit, NpLogit, LinMix, NpMix, and GPUA on the PC, SP, and SM datasets]
  44. 44. A Result of the p-Beauty Contest by Real Humans Guess 2/3 of the average of all votes (0-100). The observed means are far from the Nash equilibrium of 0 (Camerer et al., 2004; Ho et al., 2006). Table: Average Choice in (2/3)-Beauty Contests (Subject Pool | Group Size | Sample Size | Mean): Caltech Board | 73 | 73 | 49.4; 80 year olds | 33 | 33 | 37.0; High School Students | 20-32 | 52 | 32.5; Economics PhDs | 16 | 16 | 27.4; Portfolio Managers | 26 | 26 | 24.3; Caltech Students | 3 | 24 | 21.5; Game Theorists | 27-54 | 136 | 19.1
  45. 45. Modeling Bounded Rationality Early stopping at step k: Level-k thinking or Cognitive Hierarchy Theory (Camerer et al., 2004); humans cannot predict the infinite future, so use a non-stationary transitional state. Randomization of utility via noise $\varepsilon_{it}$: Quantal Response Equilibrium (McKelvey and Palfrey, 1995). $\forall i \in \{1, \dots, n\}\quad Y_i^{(t)} \mid Y_{-i}^{(t-1)} = \arg\max_{Y} \left[ f_i\!\left(Y, Y_{-i}^{(t-1)}\right) + \varepsilon_{it} \right]$ Both methods essentially work as regularization of rationality: shrinkage toward initial values or uniform choice probabilities.
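A tiny level-k / iterated best-response sketch for the (2/3)-beauty contest (my illustration): each thinking step best-responds to the previous mean, so early stopping at a finite level k leaves the prediction between 50 and the Nash equilibrium of 0, broadly consistent with the experimental means on the previous slide.

```python
mean = 50.0                        # level-0: uniform guessing over 0..100
for k in range(1, 6):
    mean = (2.0 / 3.0) * mean      # level-k best response to the level-(k-1) mean
    print(f"level {k}: predicted mean guess = {mean:.1f}")
# 33.3, 22.2, 14.8, ... : the Nash equilibrium 0 is reached only in the limit
```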
  46. 46. Linking ML with Game Theory (GT) via the Shrinkage Principle – ML, optimization without shrinkage: Maximum-Likelihood estimation; optimal for the training data, but less generalization capability to test data. – ML, optimization with shrinkage: Bayesian estimation; shrinkage towards the prior causes suboptimality for the training data, but more generalization capability to test data. – GT, optimization without shrinkage: Nash Equilibrium; optimal for the given game, but less predictive of real-world decisions. – GT, optimization with shrinkage: Transitional State or Quantal Response Equilibrium; shrinkage towards uniform probabilities causes suboptimality for the given game, but is more predictive of real-world decisions.
  47. 47. Early Stopping and Regularization – ML as a dynamical system for finding the optimal parameters: in (Parameter #1, Parameter #2) space, the trajectory at t = 10, 20, 30, 50 runs from 0 toward the exact maximum-likelihood estimate (e.g., OLS); an early-stopping estimate (e.g., Partial Least Squares) and the exact Bayesian estimate shrunk towards zero (e.g., Ridge regression) both lie between them. – GT as a dynamical system for finding the equilibrium: in the (2/3)-beauty contest, t = 0: mean = 50; t = 1: mean = 34; t = 2: mean = 15 (Level-2 transitional state); t → ∞: mean = 0 (Nash Equilibrium).
  48. 48. Message ● Bayesian shrinkage ↔ Bounded rationality – Dual-personality model for contextual effects – Towards data-oriented & more realistic games: export ML regularization techniques to GT ● Analyze dynamics or uncertainty-aware equilibria – Early-stopped transitional state, or – QRE with uncertainty on each player's utility function
  49. 49. Agenda 1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
  50. 50. Additional Implications from ML ● Multiple equilibria or saddle points? ● Equilibria or “typical” transitional states? – Slow convergence – Plateau of objective function
  51. 51. Recent History in ML ● Roughly 20 years wasted on the local-optimality issue – Neural networks (NNs) were long criticized for the local optimality of their parameter fitting. – The ML community stuck with convex optimization approaches (e.g., Support Vector Machines (Vapnik, 1995)). – Most solutions when fitting high-dimensional NNs, however, turn out to be not local optima but saddle points (Bray & Dean, 2007; Dauphin+, 2014)! – After skipping saddle points by perturbation, most local optima empirically provide similar prediction capabilities. ● Please do not make the same mistake in multi-agent optimization problems (= games)!
  52. 52. Why are most critical points saddle points? ● Consider the spectrum of Hessian matrices of a non-linear function randomly drawn from a Gaussian process. – Local minima: every eigenvalue is positive. Local maxima: every eigenvalue is negative. Saddle point: both positive and negative eigenvalues exist. ● In a high-dimensional function, the Hessian contains both positive and negative eigenvalues with high probability. https://en.wikipedia.org/wiki/Saddle_point
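A quick empirical check of the claim above (my sketch, using a random symmetric matrix as a stand-in for the Hessian of a randomly drawn function): the probability that all eigenvalues share one sign collapses as the dimension grows, so almost every critical point is a saddle.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in [1, 2, 5, 10, 20]:
    all_same_sign = 0
    for _ in range(2000):
        A = rng.standard_normal((d, d))
        H = (A + A.T) / 2                  # random symmetric "Hessian"
        eig = np.linalg.eigvalsh(H)
        all_same_sign += (eig > 0).all() or (eig < 0).all()
    print(f"d={d:2d}: fraction with all eigenvalues of one sign = {all_same_sign / 2000:.3f}")
```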
  53. 53. Open Questions for Multiple Equilibria ● If a game is very complex, involving many parameters in its pay-off or utility functions, then – Are most of its critical points unstable saddle points? – Is the number of equilibria much smaller than we would guess? ● If we obtain a few equilibria of such a complex game, – Do most of those equilibria have similar properties? – Do we really have to obtain the other equilibria?
  54. 54. See Dynamics: “Typical” Transitional State? ● MLers are sensitive to the convergence rate of fitting. – We live in a finite-sample, high-dimensional world: asymptotics alone is powerless, and a computed estimate is not an equilibrium but a transitional state. http://sebastianruder.com/optimizing-gradient-descent/ (Kingma & Ba, 2015)
  55. 55. See Dynamics: “Typical” Transitional State? ● The mixing time of the Markov processes of some games is exponential in the number of players. – E.g., in the Nash demand game (Axtell+, 2000), the equilibrium exhibits equality of wealth while the transitional states exhibit severe inequality. ● What if the number of players is in the thousands or millions? – Severe inequality most of the time
  56. 56. See Dynamics: Trapped in a Plateau? ● Fitting a deep NN is often trapped in plateaus. – Natural gradient descent (Amari, 1997) is often used to escape quickly from a plateau. – In real-world games, are people trapped in plateaus rather than at equilibria? https://www.safaribooksonline.com/library/view/hands-on-machine-learning/9781491962282/ch04.html
  57. 57. Conclusion ● Discussed how uncertainty should be incorporated in inductive & deductive decision making. – Quantifying uncertainty or simpler minimal estimation ● Linked Bayesian shrinkage with bounded rationality – Towards data-oriented regularized equilibrium ● Implications from high-dimensional ML – Saddle points, transitional state, and/or plateau
  58. 58. THANK YOU FOR ATTENDING! Download this material from https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating-machine-learning-and-game-theory
  59. 59. References I Amari, S. (1997). Neural learning in structured parameter spaces - natural Riemannian gradient. In Advances in Neural Information Processing Systems 9, pages 127–133. MIT Press. Axtell, R., Epstein, J., and Young, H. (2000). The emergence of classes in a multi-agent bargaining model. Working papers, Brookings Institution. Bray, A. J. and Dean, D. S. (2007). Statistics of critical points of Gaussian fields on large-dimensional spaces. Physical Review Letters, 98:150201. Bruza, P., Kitto, K., Nelson, D., and McEvoy, C. (2009). Is there something quantum-like about the human mental lexicon? Journal of Mathematical Psychology, 53(5):362–377. Camerer, C. F., Ho, T. H., and Chong, J. (2004). A cognitive hierarchy model of games. Quarterly Journal of Economics, 119:861–898.
  60. 60. References II Chapelle, O. and Harchaoui, Z. (2005). A machine learning approach to conjoint analysis. In Advances in Neural Information Processing Systems 17, pages 257–264. MIT Press, Cambridge, MA, USA. Clarke, E. H. (1971). Multipart pricing of public goods. Public Choice, 2:19–33. Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., and Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems 27, pages 2933–2941. Curran Associates, Inc. de Barros, J. A. and Suppes, P. (2009). Quantum mechanics, interference, and the brain. Journal of Mathematical Psychology, 53(5):306–313.
  61. 61. References III Dotson, J. P., Lenk, P., Brazell, J., Otter, T., Maceachern, S. N., and Allenby, G. M. (2009). A probit model with structured covariance for similarity effects and source of volume calculations. http://ssrn.com/abstract=1396232. González-Vallejo, C. (2002). Making trade-offs: A probabilistic and context-sensitive model of choice behavior. Psychological Review, 109:137–154. Green, P. and Srinivasan, V. (1978). Conjoint analysis in consumer research: Issues and outlook. Journal of Consumer Research, 5:103–123. Ho, T. H., Lim, N., and Camerer, C. F. (2006). Modeling the psychology of consumer and firm behavior with behavioral economics. Journal of Marketing Research, 43(3):307–331. Huber, J., Payne, J. W., and Puto, C. (1982). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9:90–98.
  62. 62. References IV Kakade, S. M. (2002). A natural policy gradient. In Dietterich, T. G., Becker, S., and Ghahramani, Z., editors, Advances in Neural Information Processing Systems 14, pages 1531–1538. MIT Press. Kingma, D. and Ba, J. (2015). Adam: A method for stochastic optimization. In The International Conference on Learning Representations (ICLR), San Diego. Kivetz, R., Netzer, O., and Srinivasan, V. S. (2004). Alternative models for capturing the compromise effect. Journal of Marketing Research, 41(3):237–257. Lawrence, N. D. and Urtasun, R. (2009). Non-linear matrix factorization with Gaussian processes. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), pages 601–608, New York, NY, USA. ACM. McFadden, D. and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15:447–470.
  63. 63. References V McFadden, D. L. (1980). Econometric models of probabilistic choice among products. Journal of Business, 53(3):13–29. McKelvey, R. and Palfrey, T. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6–38. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In Proceedings of The 33rd International Conference on Machine Learning (ICML 2016), pages 1928–1937. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518:529–533. Mogiliansky, A. L., Zamir, S., and Zwirn, H. (2009). Type indeterminacy: A model of the KT (Kahneman-Tversky)-man. Journal of Mathematical Psychology, 53(5):349–361.
  64. 64. References VI Roe, R. M., Busemeyer, J. R., and Townsend, J. T. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108:370–392. Shenoy, P. and Yu, A. J. (2013). A rational account of contextual effects in preference choice: What makes for a bargain? In Proceedings of the Cognitive Science Society Conference. Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–489. Simonson, I. (1989). Choice based on reasons: The case of attraction and compromise effects. Journal of Consumer Research, 16:158–174.
  65. 65. References VII Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057–1063. MIT Press. Takahashi, R. and Morimura, T. (2015). Predicting preference reversals via Gaussian process uncertainty aversion. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015), pages 958–967. Trueblood, J. S. (2014). The multiattribute linear ballistic accumulator model of context effects in multialternative choice. Psychological Review, 121(2):179–205. Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79:281–299. Usher, M. and McClelland, J. L. (2004). Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111:757–769.
  66. 66. References VIII Wen, C.-H. and Koppelman, F. (2001). The generalized nested logit model. Transportation Research Part B, 35:627–641. Williams, H. (1977). On the formulation of travel demand models and economic evaluation measures of user benefit. Environment and Planning A, 9(3):285–344. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256. Yai, T. (1997). Multinomial probit with structured covariance for route choice behavior. Transportation Research Part B: Methodological, 31(3):195–207.
