
- 1. Uncertainty Awareness in Integrating Machine Learning and Game Theory (Seeing the Connection between Machine Learning and Game Theory through Uncertainty) Rikiya Takahashi SmartNews, Inc. rikiya.takahashi@smartnews.com Mar 5, 2017 Game Theory Workshop 2017 https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating-machine-learning-and-game-theory
- 2. About Myself ● Rikiya TAKAHASHI, Ph.D. (高橋 力矢) – Engineer at SmartNews, Inc., from 2015 to present – Research Staff Member at IBM Research – Tokyo, from 2004 to 2015 ● Research interests: machine learning, reinforcement learning, cognitive science, behavioral economics, complex systems – Descriptive models of real human behavior – Prescriptive decision making from descriptive models – Robust algorithms working under high uncertainty ● Limited sample size, high dimensionality, high noise
- 3. Example of Previous Work ● Budget-Constrained Markov Decision Process for Marketing-Mix Optimization (Takahashi+, 2013 & 2014) [Figure: pipeline Historical Data → Consumer Segmentation → Time-Series Predictive Modeling → Optimal Marketing-Mix & Targeting Rules. Weekly contacts (2014/01/01 … 2014/12/31) per segment via EM: e-mail, DM: direct mail, TM: tele-marketing; stimuli (e-mail, TV CM) and responses (browsing, purchase) feed purchase prediction. A segmentation tree with splits such as "Revenues in past 16 weeks > $200?", "#purchases in past 8 weeks > 2?", "#browsing in past 4 weeks > 15?", and "#EMs in past 2 weeks > 2?" yields strategic segments and micro-segments MS #1 … MS #256]
- 4. Example of Previous Work ● Travel-Time Distribution Prediction on a Large Road Network (Takahashi+, 2012) [Figure: Road Network & Travel Time Data by Taxi → Predictive Modeling of Travel Time Distribution (per-link travel-time densities ψ1(y), …, ψ6(y) on a route from A to B through intersections) → Route-Choice Recommendation or Traffic Simulation]
- 5. Example of Previous Work ● Bayesian Discrete Choice Modeling for Irrational Compromise Effect (Takahashi & Morimura, 2015) – Explained later today [Figure: options A, B, C, D placed along inexpensiveness vs. product quality; the option having the highest share differs between choice sets {A, B, C} and {B, C, D}. A Utility Calculator (UC) maps each vector of attributes to a utility sample (e.g., $u_{iA} = 3.26$, $u_{iB} = 3.33$, $u_{iC} = 2.30$) and sends the samples to a Decision Making System (DMS), which forms utility estimates]
- 6. Agenda 1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
- 7. Machine Learning (ML) ● A set of inductive disciplines for designing probabilistic models and estimating their parameters so as to maximize out-of-sample predictive accuracy – Supervised learning: model and fit P(Y|X) – Unsupervised learning: model and fit P(X) ● What machine learners care about – Bias-variance trade-off – Curse of dimensionality
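- 8. Estimation via Bayes' Theorem ● The basis behind most of today's ML algorithms ● Bayesian estimation – Posterior distribution: $p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int_\theta p(D \mid \theta)\, p(\theta)\, d\theta}$ – Predictive distribution: $p(y^* \mid D) = \int_\theta p(y^* \mid \theta)\, p(\theta \mid D)\, d\theta$ ● Maximum A Posteriori (MAP) estimation, an approximation – Posterior mode: $\hat\theta = \arg\max_\theta \left[\log p(D \mid \theta) + \log p(\theta)\right]$ – Predictive distribution: $p(y^* \mid D) \simeq p(y^* \mid \hat\theta)$ ● Data: $D$, model parameter: $\theta$, prior: $p(\theta)$ ● Q. Why place a prior $p(\theta)$? – A1. To quantify uncertainty as the posterior – A2. To avoid overfitting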
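A minimal worked example of the difference between maximum-likelihood, MAP, and fully Bayesian prediction, assuming a Beta-Bernoulli coin model (a conjugate pair chosen here only for illustration; the function name and the Beta(2, 2) prior are hypothetical). After three heads in three tosses, the prior pulls both the MAP estimate and the predictive probability away from the overfitted maximum-likelihood answer of 1, and the posterior variance quantifies the remaining uncertainty.

```python
from fractions import Fraction


def beta_bernoulli_estimates(heads, tails, a=2, b=2):
    """Compare ML, MAP, and Bayesian estimates for a coin-flip model.

    Hypothetical setup: prior theta ~ Beta(a, b), each y ~ Bernoulli(theta).
    Returns exact fractions so the comparison is easy to read.
    """
    a_post, b_post = a + heads, b + tails  # conjugate posterior Beta(a_post, b_post)
    # Maximum likelihood: no prior, argmax_theta log p(D | theta).
    mle = Fraction(heads, heads + tails)
    # MAP: posterior mode, argmax_theta [log p(D | theta) + log p(theta)] (needs a_post, b_post > 1).
    theta_map = Fraction(a_post - 1, a_post + b_post - 2)
    # Bayesian predictive p(y* = 1 | D) = integral of theta * p(theta | D) = posterior mean.
    bayes_pred = Fraction(a_post, a_post + b_post)
    # Posterior variance quantifies the remaining uncertainty about theta.
    post_var = Fraction(a_post * b_post, (a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mle, theta_map, bayes_pred, post_var


# Three heads in three tosses: the MLE predicts "always heads".
mle, theta_map, bayes_pred, post_var = beta_bernoulli_estimates(heads=3, tails=0)
print(mle, theta_map, bayes_pred, post_var)  # 1 4/5 5/7 5/196
```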
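- 9. E.g., Gaussian Process Regression (GPR) ● Bayesian Ridge Regression – Unlike MAP Ridge regression (dark gray in the figure), GPR quantifies input-dependent uncertainty (light gray). – Prior: $\begin{pmatrix} f \\ f_* \end{pmatrix} \sim \mathcal{N}\left(0_{n+1}, \begin{pmatrix} K & k_* \\ k_*^\top & K(x_*, x_*) \end{pmatrix}\right)$, where $K = (K_{ij} \equiv K(x_i, x_j))$, $k_* = (K(x_1, x_*), \dots, K(x_n, x_*))^\top$, and $K(x, x') = \exp(-\gamma \|x - x'\|^2)$ – Data likelihood: $\begin{pmatrix} y \\ y_* \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} f \\ f_* \end{pmatrix}, \sigma^2 I_{n+1}\right)$ – Predictive distribution: $y_* \mid K, x_*, X, y \sim \mathcal{N}\left(k_*^\top (\sigma^2 I_n + K)^{-1} y,\; K(x_*, x_*) - k_*^\top (\sigma^2 I_n + K)^{-1} k_* + \sigma^2\right)$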
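A small NumPy sketch of the GPR predictive distribution above, assuming a one-dimensional toy dataset and the illustrative values γ = 1 and σ² = 0.1 (the helper names are hypothetical). Far from the data, the predictive variance returns to the prior level K(x*, x*) + σ² = 1.1; this is the input-dependent uncertainty that MAP Ridge regression does not report. The predictive mean itself coincides with kernel ridge regression using penalty σ².

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||^2) for all pairs of rows of A and B."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)


def gp_predict(X, y, X_star, gamma=1.0, sigma2=0.1):
    """GP regression predictive mean and variance at the rows of X_star."""
    K = rbf_kernel(X, X, gamma)
    k_star = rbf_kernel(X, X_star, gamma)  # column j is k_* for the j-th test input
    A = sigma2 * np.eye(len(X)) + K        # sigma^2 I_n + K
    mean = k_star.T @ np.linalg.solve(A, y)
    # K(x*, x*) = 1 for the RBF kernel; add sigma^2 for the noise on y*.
    var = 1.0 - np.sum(k_star * np.linalg.solve(A, k_star), axis=0) + sigma2
    return mean, var


X = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = np.sin(X[:, 0])
X_star = np.array([[0.5], [10.0]])  # one input near the data, one far away
mean, var = gp_predict(X, y, X_star)
print(mean, var)
```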
- 10. Gap between Deduction & Induction ● Today's AI is integrating both. Do not divide the work between inductive & deductive researchers. ● Deductive mind – Optimize decisions for a given environment – Casino owner's mentality – Game theorist, probabilist, operations researcher ● Inductive mind – Estimate the environment from observations – Gambler's mentality – Statistician, machine learner, econometrician
- 11. Induction ↔ Deduction ● Typical problem solving in the real world: Dataset $D$ → Inductive process (machine learning, statistics, econometrics, etc.) → Estimate of environment $\hat\Theta_D$ → Deductive process (game theory, mathematical programming, Markov decision process, etc.) → Policy decisions $\hat\pi_D$, where $\forall i \in \{1, \dots, n\}$, $\hat\pi_{D,i} = \arg\max_{\pi_i} R(\pi_i \mid \{\hat\pi_{D,j}\}_{j \neq i}, \hat\Theta_D)$ ● The estimate $\hat\Theta_D$ is different from the true environment $\Theta$.
- 12. Induction ↔ Deduction ● Typical problem solving in the real world: Dataset $D$ → Inductive process (machine learning, statistics, econometrics, etc.) → Estimate of environment $\hat\Theta_D$ → Deductive process (game theory, mathematical programming, Markov decision process, etc.) → Policy decisions $\hat\pi_D$, where $\forall i \in \{1, \dots, n\}$, $\hat\pi_{D,i} = \arg\max_{\pi_i} R(\pi_i \mid \{\hat\pi_{D,j}\}_{j \neq i}, \hat\Theta_D)$ ● How is the estimation-based policy $\hat\pi_D$ different from the true optimal policy $\pi^*$, where $\forall i \in \{1, \dots, n\}$, $\pi_i^* = \arg\max_{\pi_i} R(\pi_i \mid \{\pi_j^*\}_{j \neq i}, \Theta)$?
- 13. Induction ↔ Deduction ● Typical problem solving in the real world: Dataset $D$ → Inductive process (machine learning, statistics, econometrics, etc.) → Estimate of environment $\hat\Theta_D$ → Deductive process (game theory, mathematical programming, Markov decision process, etc.) → Policy decisions $\hat\pi_D$ ● State-of-the-art AI: Dataset $D$ → Direct optimization (integration of machine learning and optimization algorithms) → Policy decisions $\check\pi_D$, with $\check\Theta_D$ as a by-product
- 14. See the Difference ● Typical problem solving in the real world – Unnecessary effort in solving each subproblem; vulnerable to estimation error – $\hat\Theta_D$: accurately fitted with minimal prediction error for dataset $D$, although minimizing the error of this parameter is not the goal – $\hat\pi_D$: exceedingly optimized given a wrong assumption ● State-of-the-art AI – Less effort on needless intermediate estimation; robust to estimation error – $\check\Theta_D$: fitted, but not minimizing the error for dataset $D$; often less complex than $\hat\Theta_D$ – $\check\pi_D$: safely optimized with less reliance on $\check\Theta_D$
- 15. See the Difference ● Typical problem solving in the real world: solve a hard inductive problem, then solve another hard deductive problem ● State-of-the-art AI: solve an easier problem that involves both induction & deduction ● Simple problem solving is also recommended by Gigerenzer & Taleb – https://www.youtube.com/watch?v=4VSqfRnxvV8
- 16. Optimization under Uncertainty ● Interval Estimation (e.g., Bayesian) – Quantify uncertainty – Optimize over all possible environments ● Minimal Estimation (e.g., Vapnik) – Omit intermediate steps – Solve the minimal optimization problem ● Both principles are effective in practice.
- 17. Vapnik's Principle (Vapnik, 1995) "When solving a problem of interest, do not solve a more general problem as an intermediate step." – Vladimir N. Vapnik ● E.g., classification or regression: predict Y given X – #1. Fit P(X,Y) and infer P(Y|X) by Bayes' theorem – #2. Only fit P(Y|X) ● #2 is better than #1 because of its smaller estimation error. – Particularly when uncertainty is high: small sample size, high dimensionality, and/or high noise
- 18. Batch Reinforcement Learning ● A good example of a problem involving both inductive and deductive processes. ● Also a good example of how to avoid needlessly hard estimation. ● The basis behind the recent success of Deep Q-Network in playing games (Mnih+, 2013 & 2015) and AlphaGo (Silver+, 2016)
- 19. Markov Decision Process ● Framework for long-term-optimal decision making – $S$: set of states, $A$: set of actions, $P(s' \mid s, a)$: state-transition probability, $r(s,a)$: immediate reward, $\gamma \in [0,1]$: discount factor – Optimize the policy $\pi(a \mid s)$ for maximal cumulative reward [Figure: customers move among State #1 (e.g., gold customer), State #2 (e.g., silver customer), and State #3 (e.g., normal customer) over t = 0, 1, 2, …; Action #1 (e.g., ordinary discount on a flight ticket) and Action #2 (e.g., free business-class upgrade) produce different sequences of rewards ($, $$, $$$) over time]
- 20. Markov Decision Process ● Easy to solve if the environment is known – Via dynamic programming or linear programming when $P(s' \mid s,a)$ & $r(s,a)$ are given with no uncertainty – Behave myopically at $t \to \infty$: for each state $s$, choose the action $a$ that maximizes $r(s,a)$. – At time $(t-1)$, choose the action that maximizes the immediate reward at time $(t-1)$ plus the expected reward after time $t$ over the state-transition distribution. ● What if the environment is unknown?
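When $P(s' \mid s,a)$ and $r(s,a)$ are fully known, the dynamic-programming route can be sketched as value iteration. The two-state, two-action MDP below is hypothetical (loosely echoing the customer example: one action sacrifices immediate reward to move the customer into a more valuable state), and γ = 0.9 is an illustrative choice.

```python
import numpy as np


def value_iteration(P, r, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Solve a known finite MDP by dynamic programming (value iteration).

    P: shape (S, A, S), P[s, a, s2] = P(s2 | s, a)
    r: shape (S, A), immediate reward r(s, a)
    Returns the optimal Q-table (S, A) and the greedy deterministic policy.
    """
    V = np.zeros(r.shape[0])
    for _ in range(max_iter):
        # Bellman backup: Q(s, a) = r(s, a) + gamma * sum_s2 P(s2 | s, a) V(s2)
        Q = r + gamma * P @ V
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = r + gamma * P @ V
    return Q, Q.argmax(axis=1)


# Hypothetical MDP: state 0 = normal customer, state 1 = gold customer.
# Action 0 (discount) always leads to state 0; action 1 (upgrade) to state 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]])
r = np.array([[1.0, 0.0],    # normal: discount pays 1 now, upgrade pays nothing now
              [2.0, 1.5]])   # gold: discount pays 2 but demotes, upgrade pays 1.5 and keeps gold
Q, policy = value_iteration(P, r)
print(policy)  # [1 1]
```

Here the myopic rule `r.argmax(axis=1)` picks action 0 in both states, while the long-term-optimal policy picks action 1 in both.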
- 21. Types of Reinforcement Learning ● Model-based ↔ Model-free ● On-policy ↔ Off-policy ● Value iteration ↔ Policy search ● Model-based approach – 1. System identification: estimate the MDP parameters – 2. Sample multiple MDPs from the interval estimate – 3. Solve every MDP & take the best action of the best MDP ● Optimism in the face of uncertainty
- 22. Model-free approach ● Remember: our aim is to get the optimal policy. In principle, there is no need to estimate the environment. – Act without fully identifying the system: as long as we choose the optimal action, it turns out right in the end. ● Even when doing estimation, use an intermediate statistic less complex than $P(s' \mid s,a)$ & $r(s,a)$.
- 23. Bellman Optimality Equation ● The policy is derived if we have an estimate of $Q(s,a)$. – Simpler than estimating $P(s' \mid s,a)$ & $r(s,a)$ – $Q(s,a) = \mathbb{E}[r(s,a)] + \gamma\, \mathbb{E}_{P(s' \mid s,a)}\left[\max_{a'} Q(s', a')\right]$ – $\pi(a \mid s) = 1$ if $a = \arg\max_{a'} Q(s, a')$, and $0$ otherwise ● Get an estimate $\hat Q(s,a)$ from episodes $(s_i, a_i, s'_i, r_i)_{i=1}^{n}$
- 24. Fitted Q-Iteration (Ernst+, 2005) ● For $k = 1, 2, \dots$, iterate 1) value computation and 2) regression as 1) $\forall i \in \{1, \dots, n\}$: $v_i^{(k)} := r_i + \gamma\, \hat Q_k^{(1)}\left(s'_i, \arg\max_{a'} \hat Q_k^{(0)}(s'_i, a')\right)$ 2) $\forall f \in \{0, 1\}$: $\hat Q_{k+1}^{(f)} := \arg\min_{Q \in H} \left[\frac{1}{2} \sum_{i \in J_f} \left(v_i^{(k)} - Q(s_i, a_i)\right)^2 + R(Q)\right]$ – $H$: hypothesis space of functions, $Q_0 \equiv 0$, $R$: regularization term – Indices $1, \dots, n$ are randomly split into sets $J_0$ and $J_1$ to avoid over-estimation of Q values (Double Q-Learning (Hasselt, 2010)). ● Related to Experience Replay in Deep Q-Network (Mnih+, 2013 & 2015) – See (Lange+, 2012) for more details.
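A minimal NumPy sketch of the iteration above, under explicit assumptions that are not from the slide: $H$ is the set of linear functions of a one-hot (state, action) encoding, $R(Q)$ is a ridge penalty, and the batch of transitions is synthetic data from a hypothetical deterministic two-state MDP. Returning the average of $\hat Q^{(0)}$ and $\hat Q^{(1)}$ is one simple choice among several.

```python
import numpy as np


def fitted_q_iteration(s, a, s_next, r, n_states, n_actions,
                       gamma=0.9, lam=1e-3, n_iter=200, seed=0):
    """Fitted Q-Iteration with a random double-Q split of the samples.

    Sketch assumptions: H = linear functions of a one-hot (state, action)
    encoding, R(Q) = (lam / 2) * ||w||^2 (ridge), tabular states and actions.
    """
    rng = np.random.default_rng(seed)
    n, d = len(r), n_states * n_actions
    rows = np.arange(n)
    phi = np.zeros((n, d))
    phi[rows, s * n_actions + a] = 1.0            # one-hot features of (s_i, a_i)
    perm = rng.permutation(n)
    J = (perm[: n // 2], perm[n // 2:])           # random split into J_0 and J_1
    W = np.zeros((2, d))                          # weights of Q^(0), Q^(1); start at 0
    for _ in range(n_iter):
        q0_next = W[0].reshape(n_states, n_actions)[s_next]   # Q^(0)(s'_i, .)
        q1_next = W[1].reshape(n_states, n_actions)[s_next]   # Q^(1)(s'_i, .)
        # 1) value computation: a' chosen by Q^(0), evaluated by Q^(1)
        v = r + gamma * q1_next[rows, q0_next.argmax(axis=1)]
        # 2) ridge regression of v on phi, separately on J_0 and J_1
        for f in (0, 1):
            X, y = phi[J[f]], v[J[f]]
            W[f] = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return W.mean(axis=0).reshape(n_states, n_actions)


# Synthetic batch from a hypothetical deterministic two-state MDP:
# from either state, action 0 leads to state 0 and action 1 leads to state 1.
r_table = np.array([[1.0, 0.0], [2.0, 1.5]])
pairs = np.array([(st, ac) for st in range(2) for ac in range(2)] * 50)
s, a = pairs[:, 0], pairs[:, 1]
s_next, r = a.copy(), r_table[s, a]
Q_hat = fitted_q_iteration(s, a, s_next, r, n_states=2, n_actions=2)
print(np.round(Q_hat, 2))
```

No transition model is ever estimated here: only the regression targets $v_i^{(k)}$ built from logged tuples are fitted, which is the "minimal estimation" idea in action.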
- 25. Policy Gradient ● Accurately fit the policy $\pi_\theta(a \mid s)$ while roughly fitting $Q(s,a)$ – More direct with respect to the final aim – Applicable to continuous-action problems ● Policy Gradient Theorem (Sutton+, 2000): $\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a)\right]$, i.e., the gradient of performance equals the expected gradient of the log-policy times the cumulative reward, over $s$ and $a$ ● Variations on providing the rough estimate of Q – REINFORCE (Williams, 1992): reward samples – Actor-Critic: regression models (e.g., Natural Gradient (Kakade, 2002), A3C (Mnih+, 2016))
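To make the policy gradient theorem concrete, here is a sketch of REINFORCE in the simplest possible setting: a one-state problem (a multi-armed bandit) with a softmax policy, so that $Q(a)$ is just the expected reward of arm $a$ and each sampled reward serves as its rough, unbiased estimate. The arm means, learning rate, and step count are illustrative assumptions; no baseline is used, so the gradient variance is higher than in actor-critic methods.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def reinforce_bandit(mean_rewards, n_steps=5000, lr=0.1, noise=0.1, seed=0):
    """REINFORCE for a one-state problem with policy pi_theta(a) = softmax(theta)_a."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(mean_rewards))
    for _ in range(n_steps):
        pi = softmax(theta)
        a = rng.choice(len(theta), p=pi)
        reward = mean_rewards[a] + rng.normal(scale=noise)  # sampled reward ~ rough Q(a)
        grad_log_pi = -pi                   # d/dtheta log softmax(theta)_a = one_hot(a) - pi
        grad_log_pi[a] += 1.0
        theta += lr * reward * grad_log_pi  # stochastic policy-gradient ascent step
    return softmax(theta)


pi = reinforce_bandit(np.array([0.2, 0.5, 1.0]))  # hypothetical arm means
print(np.round(pi, 3))
```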
- 26. Functional Approximation in Practice ● Concrete functional form of $Q(s,a)$ and/or $\pi(a \mid s)$ – Q should be a universal function approximator: a class of functions that can approximate any function if sufficiently many parameters are introduced. ● Examples of universal approximators – Tree ensembles: Random Forest, Gradient Boosted Decision Trees – (Deep) Neural Networks – Mixture of Radial Basis Functions (RBFs)
- 27. Functional Approximation in Practice ● Is any universal approximator OK? – No, unfortunately. – A universal approximator is merely asymptotically unbiased. – Better to have ● Low variance in terms of the bias-variance trade-off ● Resistance to the curse of dimensionality ● One reason for deep learning's success – Flexibility to represent multi-modal functions with fewer parameters than nonparametric (RBF or tree) models – Techniques to stabilize numerical optimization ● AdaGrad or Adam, dropout, ReLU, batch normalization, etc.
- 28. Message ● Uncertainty awareness is essential in data-oriented decision making. – No division between induction and deduction – Removing needless intermediate estimation – Fitted Q-Iteration as an illustrative example ● Fewer parameters, less uncertainty
- 29. Agenda 1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
- 30. Shrinkage Matters in the Real World ● Q. Why does a prior help avoid over-fitting? – A. Shrinkage towards the prior mean (e.g., 0 in Ridge regression) ● Over-optimization ↔ Over-rationalization? – (e.g., (Takahashi and Morimura, 2015)) [Figure: solutions of 2-dimensional Ordinary Least Squares (OLS) and Ridge regression in the (Coefficient #1, Coefficient #2) plane; the Ridge solution is closer to the prior mean 0 than the OLS solution, and the prior mean 0 is independent of the training data]
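The shrinkage in the figure can be reproduced numerically. The sketch below, on synthetic data with a hypothetical true coefficient vector, solves Ridge regression in closed form for increasing penalties λ; λ = 0 gives OLS, and the coefficient norm shrinks monotonically towards the prior mean 0 as λ grows (Ridge is the MAP estimate under a zero-mean Gaussian prior on the coefficients).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
w_true = np.array([1.0, -2.0])                  # hypothetical true coefficients
y = X @ w_true + rng.normal(scale=1.0, size=20)


def ridge(X, y, lam):
    """argmin_w ||y - X w||^2 + lam * ||w||^2 (lam = 0 gives OLS)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


lams = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in lams]
for lam, norm in zip(lams, norms):
    print(f"lambda={lam:>7}: ||w|| = {norm:.3f}")
```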
- 31. Discrete Choice Modelling ● Goal: predict the probability of choosing an option from a choice set. ● Why solve this problem? – Brand positioning among competitors – Sales promotion (yet involving some abuse) Game Theory Workshop 2017 Uncertainty Awareness
- 32. Random Utility Theory as a Rational Model ● Each human is a rational maximizer of random utility. ● Theoretical basis behind many statistical marketing models: – Logit models (e.g., (McFadden, 1980; Williams, 1977; McFadden and Train, 2000)) – Learning to rank (e.g., (Chapelle and Harchaoui, 2005)) – Conjoint analysis (Green and Srinivasan, 1978) – Matrix factorization (e.g., (Lawrence and Urtasun, 2009)), ...
- 33. Complexity of Real Humans' Choices ● An example of choosing a PC (Kivetz et al., 2004): each subject chooses 1 option from a choice set. – Options: A (CPU 250 MHz, memory 192 MB), B (300 MHz, 160 MB), C (350 MHz, 128 MB), D (400 MHz, 96 MB), E (450 MHz, 64 MB) – Number of subjects choosing each option: {A, B, C} → 36 : 176 : 144; {B, C, D} → 56 : 177 : 115; {C, D, E} → 94 : 181 : 109 ● Can random utility theory still explain such preference reversals: B ≻ C or C ≻ B?
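The preference reversal can be checked directly from the counts above. The short script below computes each choice set's shares and the ratio P(B)/P(C). Under a multinomial logit with fixed utilities this ratio would be the same in every choice set containing both B and C (independence of irrelevant alternatives), but in these data it drops from about 1.22 to about 0.32, and the middle option wins every set.

```python
counts = {  # number of subjects choosing each option (Kivetz et al., 2004)
    ("A", "B", "C"): (36, 176, 144),
    ("B", "C", "D"): (56, 177, 115),
    ("C", "D", "E"): (94, 181, 109),
}

shares = {}
for choice_set, n in counts.items():
    total = sum(n)
    shares[choice_set] = {opt: k / total for opt, k in zip(choice_set, n)}
    top = max(shares[choice_set], key=shares[choice_set].get)
    print(choice_set, {o: round(p, 3) for o, p in shares[choice_set].items()}, "top:", top)

# A multinomial logit with fixed utilities implies the same P(B)/P(C) in every set.
ratio_abc = shares[("A", "B", "C")]["B"] / shares[("A", "B", "C")]["C"]
ratio_bcd = shares[("B", "C", "D")]["B"] / shares[("B", "C", "D")]["C"]
print(round(ratio_abc, 2), round(ratio_bcd, 2))  # 1.22 0.32
```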
- 34. Similarity Effect (Tversky, 1972) ● The top-share choice can change due to correlated utilities. – E.g., choosing one color from {Blue, Red} or from {Violet, Blue, Red}?
- 35. Attraction Effect (Huber et al., 1982) ● Introducing an absolutely inferior option A⁻ (= decoy) causes an irregular increase in option A's attractiveness. ● This contradicts the natural guess that a decoy never affects the choice: – If D ≻ A, then D ≻ A ≻ A⁻. – If A ≻ D, then A is superior to both A⁻ and D.
- 36. Compromise Effect (Simonson, 1989) ● Moderate options within each choice set are preferred. ● This is different from a non-linear utility function involving diminishing returns (e.g., $\sqrt{\text{inexpensiveness}} + \sqrt{\text{quality}}$).
- 37. Positioning of the Proposed Work ● Columns: Sim. (similarity) / Attr. (attraction) / Com. (compromise); mechanism; prediction for test set; likelihood maximization – SPM: OK / NG / NG; correlation; OK; MCMC – MDFT: OK / OK / OK; dominance & indifference; OK; MCMC – PD: OK / OK / OK; nonlinear pairwise comparison; OK; MCMC – MMLM: OK / NG / OK; none; OK; non-convex – NLM: OK / NG / NG; hierarchy; NG; non-convex – BSY: OK / OK / OK; Bayesian; OK; MCMC – LCA: OK / OK / OK; loss aversion; OK; MCMC – MLBA: OK / OK / OK; nonlinear accumulation; OK; non-convex – Proposed: OK / NG / OK; Bayesian; OK; convex ● MDFT: Multialternative Decision Field Theory (Roe et al., 2001) ● PD: Proportional Difference Model (González-Vallejo, 2002) ● MMLM: Mixed Multinomial Logit Model (McFadden and Train, 2000) ● SPM: Structured Probit Model (Yai, 1997; Dotson et al., 2009) ● NLM: Nested Logit Models (Williams, 1977; Wen and Koppelman, 2001) ● BSY: Bayesian Model of (Shenoy and Yu, 2013) ● LCA: Leaky Competing Accumulator Model (Usher and McClelland, 2004) ● MLBA: Multiattribute Linear Ballistic Accumulator Model (Trueblood, 2014)
- 38. Key Idea #1: a Dual Personality Model ● Regard a human as an estimator of her/his own utility function. ● Assumption 1: the Decision Making System (DMS) does not know the original utility function. 1. The Utility Calculator (UC) computes the sample value of every option's utility and sends only these samples to the DMS. 2. The DMS statistically estimates the utility function.
- 39. Utility Calculator as Rational Personality ● For every context $i$ and option $j$, the UC computes a noiseless sample of utility $v_{ij}$ by applying the utility function $f_{UC}: \mathbb{R}^{d_X} \to \mathbb{R}$: $v_{ij} = f_{UC}(x_{ij})$, $f_{UC}(x) \triangleq b + w^\top \phi(x)$ – $b$: bias term – $\phi: \mathbb{R}^{d_X} \to \mathbb{R}^{d}$: mapping function – $w \in \mathbb{R}^{d}$: vector of coefficients
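- 40. Key Idea #2: the DMS is a Bayesian Estimator ● The DMS does not know $f_{UC}$ but has utility samples $\{v_{ij}\}_{j=1}^{m[i]}$. ● Assumption 2: the DMS places a choice-set-dependent Gaussian Process (GP) prior on regressing the utility function: $\mu_i \sim \mathcal{N}\left(0_{m[i]}, \sigma^2 K(X_i)\right)$, $K(X_i) = \left(K(x_{ij}, x_{ij'})\right) \in \mathbb{R}^{m[i] \times m[i]}$, $v_i \triangleq (v_{i1}, \dots, v_{im[i]})^\top \sim \mathcal{N}\left(\mu_i, \sigma^2 I_{m[i]}\right)$ – $\mu_i \in \mathbb{R}^{m[i]}$: vector of utility – $\sigma^2$: noise level – $K(\cdot, \cdot)$: similarity function – $X_i \triangleq (x_{i1}, \dots, x_{im[i]})^\top$ with $x_{ij} \in \mathbb{R}^{d_X}$ ● The posterior mean is given as $u_i^* \triangleq \mathbb{E}[\mu_i \mid v_i, X_i, K] = K(X_i)\left(I_{m[i]} + K(X_i)\right)^{-1}\left(b 1_{m[i]} + \Phi_i w\right)$, where $\Phi_i \triangleq (\phi(x_{i1}), \dots, \phi(x_{im[i]}))^\top$.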
- 41. Convex Optimization for Model Parameters ● The likelihood of the entire model is tractable, assuming the choice is given by a logit whose mean utility is the posterior mean $u_i^*$. ● Thus we can fit the function $f_{UC}$ from the choice data. ● Conveniently, MAP estimation of $f_{UC}$ is convex for fixed $K$: $(\hat b, \hat w) = \arg\max_{b, w} \sum_{i=1}^{n} \ell\left(b H_i 1_{m[i]} + H_i \Phi_i w,\; y_i\right) - \frac{c}{2}\|w\|^2$, where $\ell(u_i^*, y_i) \triangleq \log \frac{\exp(u^*_{i y_i})}{\sum_{j'=1}^{m[i]} \exp(u^*_{i j'})}$ and $H_i \triangleq K(X_i)\left(I_{m[i]} + K(X_i)\right)^{-1}$
- 42. Irrationality as Bayesian Shrinkage ● Implication from the posterior-mean utility in (1) – Each option's utility is shrunk towards the prior mean 0. – Shrinkage is strong for an option dissimilar to the others, due to its high posterior variance (= uncertainty). $u_i^* = \underbrace{K(X_i)\left(I_{m[i]} + K(X_i)\right)^{-1}}_{\text{shrinkage factor}} \underbrace{\left(b 1_{m[i]} + \Phi_i w\right)}_{\text{vec. of utility samples}}$ (1) ● Context effects as Bayesian uncertainty aversion – E.g., RBF kernel $K(x, x') = \exp(-\gamma \|x - x'\|^2)$ [Figure: final evaluation of options A, B, C, D plotted along $X_1 = 5 - X_2$ for choice sets {A, B, C} and {B, C, D}]
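A minimal sketch of equation (1), assuming a linear mapping φ(x) = x, an RBF similarity with γ = 0.25, and hypothetical options A to D placed on the trade-off line $X_1 = 5 - X_2$ so that every option has the same raw utility sample (b = 0, w = (1, 1)). Only the shrinkage factor differs across options, yet the middle option of each choice set receives the highest logit choice probability, which is the compromise effect.

```python
import numpy as np


def rbf_gram(X, gamma):
    """Similarity matrix K(X_i) with K(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)


def posterior_mean_utility(X, b, w, gamma):
    """u* = K (I + K)^{-1} (b 1 + Phi w), assuming phi(x) = x so that Phi = X."""
    v = b + X @ w                                   # utility samples sent by the UC
    K = rbf_gram(X, gamma)
    H = np.linalg.solve(np.eye(len(X)) + K, K)      # (I + K)^{-1} K = K (I + K)^{-1}
    return H @ v


def logit_probs(u):
    e = np.exp(u - u.max())
    return e / e.sum()


# Hypothetical options on the trade-off line X1 = 5 - X2, all with equal raw utility.
options = {"A": [1.0, 4.0], "B": [2.0, 3.0], "C": [3.0, 2.0], "D": [4.0, 1.0]}
b, w, gamma = 0.0, np.array([1.0, 1.0]), 0.25      # illustrative parameter values

results = {}
for choice_set in [("A", "B", "C"), ("B", "C", "D")]:
    X = np.array([options[o] for o in choice_set])
    probs = logit_probs(posterior_mean_utility(X, b, w, gamma))
    results[choice_set] = dict(zip(choice_set, probs))
    print(choice_set, {o: round(float(p), 3) for o, p in results[choice_set].items()})
```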
- 43. Recovered Context-Dependent Choice Criteria ● For a speaker dataset: successfully captured a mixture of objective preference and subjective context effects. – Options: A (50 W, 100 USD), B (75 W, 130 USD), C (100 W, 160 USD), D (125 W, 190 USD), E (150 W, 220 USD) – Number of subjects choosing each option: {A, B, C} → 45 : 135 : 145; {B, C, D} → 58 : 137 : 111; {C, D, E} → 95 : 155 : 91 [Figure, left: evaluation vs. price [USD] of options A to E, comparing the objective evaluation with the evaluations in choice sets {A, B, C}, {B, C, D}, {C, D, E}. Figure, right: average log-likelihood on datasets PC, SP, SM for LinLogit, NpLogit, LinMix, NpMix, and GPUA]
- 44. A Result of p-Beauty Contests by Real Humans ● Each player guesses 2/3 of the average of all votes (0–100). ● The observed mean is far from the Nash equilibrium 0 (Camerer et al., 2004; Ho et al., 2006). ● Table: Average Choice in (2/3)-Beauty Contests (subject pool: group size; sample size; Mean[$Y_i$]) – Caltech Board: 73; 73; 49.4 – 80 year olds: 33; 33; 37.0 – High School Students: 20–32; 52; 32.5 – Economics PhDs: 16; 16; 27.4 – Portfolio Managers: 26; 26; 24.3 – Caltech Students: 3; 24; 21.5 – Game Theorists: 27–54; 136; 19.1
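The gap between these means and the Nash equilibrium can be illustrated by truncating iterated best responses. The sketch below assumes that level-0 players guess 50 on average, that a player's best response to a believed average m is (2/3)·m (ignoring the player's own influence on the average), and, for the cognitive hierarchy variant (Camerer et al., 2004), Poisson-distributed levels with mean τ truncated at level 30. The τ values are illustrative and not fitted to the table.

```python
import math


def level_k_guesses(p=2 / 3, level0_mean=50.0, max_level=5):
    """Level-k thinking: level 0 guesses 50 on average; level k best-responds to level k-1."""
    return [level0_mean * p ** k for k in range(max_level + 1)]


def cognitive_hierarchy_mean(tau, p=2 / 3, level0_mean=50.0, max_level=30):
    """Population mean guess under a (truncated) Poisson cognitive hierarchy with mean level tau.

    Level k believes the others are spread over levels 0..k-1 in proportion to the
    Poisson frequencies, and best-responds by guessing p times their believed mean.
    """
    f = [math.exp(-tau) * tau ** k / math.factorial(k) for k in range(max_level + 1)]
    guesses = [level0_mean]
    for k in range(1, max_level + 1):
        w = f[:k]
        believed_mean = sum(wi * g for wi, g in zip(w, guesses)) / sum(w)
        guesses.append(p * believed_mean)
    return sum(fi * g for fi, g in zip(f, guesses)) / sum(f)


print([round(g, 1) for g in level_k_guesses()])  # [50.0, 33.3, 22.2, 14.8, 9.9, 6.6]
print({tau: round(cognitive_hierarchy_mean(tau), 1) for tau in (0.5, 1.5, 3.0)})
```

Both variants stop well short of the equilibrium 0, and a smaller τ (shallower reasoning) leaves the mean guess closer to 50.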
- 45. Modeling Bounded Rationality ● Early stopping at step k: Level-k thinking or Cognitive Hierarchy Theory (Camerer et al., 2004) – Humans cannot predict the infinite future. – Uses a non-stationary transitional state ● Randomization of utility via noise $\varepsilon_{it}$: Quantal Response Equilibrium (McKelvey and Palfrey, 1995) – $\forall i \in \{1, \dots, n\}$: $Y_i^{(t)} \mid Y_{-i}^{(t-1)} = \arg\max_{Y} \left[f_i(Y, Y_{-i}^{(t-1)}) + \varepsilon_{it}\right]$ ● Both methods essentially work as regularization of rationality. – Shrinkage towards initial values or uniform choice probabilities
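A logit QRE can be computed by damped fixed-point iteration on each player's softmax (quantal) response. The sketch below assumes a two-player bimatrix game and uses a standard prisoner's dilemma with illustrative payoffs. With λ = 0 play is uniform, and as λ grows the QRE approaches the Nash equilibrium (mutual defection), which makes the shrinkage towards uniform choice probabilities visible. Damped iteration is a simple choice here, not a method guaranteed to converge for every game.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def logit_qre(A, B, lam, n_iter=2000, damping=0.5):
    """Logit QRE of a bimatrix game via damped fixed-point iteration.

    A[i, j], B[i, j]: payoffs of the row and column player when the row player
    plays i and the column player plays j. lam = 0 gives uniform play.
    """
    x = np.full(A.shape[0], 1.0 / A.shape[0])   # row player's mixed strategy
    y = np.full(A.shape[1], 1.0 / A.shape[1])   # column player's mixed strategy
    for _ in range(n_iter):
        x_new = softmax(lam * (A @ y))          # quantal response to y
        y_new = softmax(lam * (B.T @ x))        # quantal response to x
        x = (1 - damping) * x + damping * x_new
        y = (1 - damping) * y + damping * y_new
    return x, y


# Prisoner's dilemma with illustrative payoffs; action 0 = cooperate, 1 = defect.
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])
B = A.T                                        # symmetric game
p_defect = {lam: logit_qre(A, B, lam)[0][1] for lam in (0.0, 1.0, 5.0)}
print({lam: round(float(p), 3) for lam, p in p_defect.items()})
```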
- 46. Linking ML with Game Theory (GT) via the Shrinkage Principle ● ML – Optimization without shrinkage: Maximum-likelihood estimation, optimal for training data but with less generalization capability to test data – Optimization with shrinkage: Bayesian estimation; shrinkage towards the prior causes suboptimality for training data but gives more generalization capability to test data ● GT – Optimization without shrinkage: Nash Equilibrium, optimal for the given game but less predictive of real-world decisions – Optimization with shrinkage: Transitional State or Quantal Response Equilibrium; shrinkage towards uniform probabilities causes suboptimality for the given game but is more predictive of real-world decisions
- 47. Early Stopping and Regularization ● ML as a dynamical system to find the optimal parameters [Figure: path in the (Parameter #1, Parameter #2) plane starting at 0 and passing t = 10, 20, 30, 50 towards the exact maximum-likelihood estimate (e.g., OLS); also shown are an early-stopping estimate (e.g., Partial Least Squares) and the exact Bayesian estimate shrunk towards zero (e.g., Ridge regression)] ● GT as a dynamical system to find the equilibrium [Figure: mean guesses 50, 34, 15, …, 0 over t = 0, 1, 2, …, t → ∞, ending at the Nash equilibrium; a Level-2 transitional state is marked on the way]
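The ML side of this picture can be reproduced with plain gradient descent on least squares started from 0. With a small enough step size, the coefficient norm grows monotonically with t and converges to the OLS solution, so stopping early acts like shrinkage towards 0. The synthetic data, learning rate, and checkpoint times are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=30)   # synthetic data
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]                      # exact ML estimate


def gd_path(X, y, lr=0.01, checkpoints=(0, 10, 20, 30, 50, 1000)):
    """Gradient descent on 0.5 * ||y - X w||^2 from w = 0, recording w at checkpoints."""
    w = np.zeros(X.shape[1])
    path = {}
    for t in range(max(checkpoints) + 1):
        if t in checkpoints:
            path[t] = w.copy()
        w = w - lr * X.T @ (X @ w - y)
    return path


path = gd_path(X, y)
norms = [np.linalg.norm(path[t]) for t in sorted(path)]
print([round(float(n), 3) for n in norms], round(float(np.linalg.norm(w_ols)), 3))
```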
- 48. Message ● Bayesian shrinkage ↔ Bounded rationality – Dual-personality model for contextual effects – Towards data-oriented & more realistic games: export ML regularization techniques to GT ● Analyze dynamics or uncertainty-aware equilibria – Early-stopped transitional state, or – QRE with uncertainty on each player's utility function
- 49. Agenda 1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
- 50. Additional Implications from ML ● Multiple equilibria or saddle points? ● Equilibria or “typical” transitional states? – Slow convergence – Plateau of objective function
- 51. Recent history in ML ● ~20 years wasted on the local-optimality issue – Neural Networks (NNs) have been criticized for local optimality in fitting their parameters. – The ML community stuck with convex optimization approaches (e.g., Support Vector Machines (Vapnik, 1995)). – Most critical points encountered in fitting high-dimensional NNs, however, turn out to be not local optima but saddle points (Bray & Dean, 2007; Dauphin+, 2014)! – After escaping saddle points by perturbation, most of the local optima empirically provide similar prediction capabilities. ● Please do not make the same mistake in multi-agent optimization problems (= games)!
- 52. Why are most critical points saddle points? ● Look at the spectrum of the Hessian matrix of a non-linear function drawn randomly from a Gaussian process. – Local minima: every eigenvalue is positive. – Local maxima: every eigenvalue is negative. – Saddle point: both positive & negative eigenvalues exist. ● For a high-dimensional function, the Hessian contains both positive & negative eigenvalues with high probability. [Figure: univariate and bivariate functions illustrating these critical points; source: https://en.wikipedia.org/wiki/Saddle_point]
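A crude numerical illustration, assuming random symmetric Gaussian matrices as stand-ins for Hessians at critical points (the Hessian statistics of Gaussian-process functions analyzed by Bray & Dean (2007) differ in detail): the fraction of matrices whose eigenvalues are all positive, i.e., local minima, collapses quickly as the dimension grows, so almost every critical point is a saddle.

```python
import numpy as np


def fraction_all_positive(d, n_samples=20000, seed=0):
    """Fraction of random symmetric d x d Gaussian matrices with all eigenvalues > 0."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(n_samples, d, d))
    H = (G + np.swapaxes(G, 1, 2)) / 2.0        # symmetrize: stand-in "Hessians"
    eigvals = np.linalg.eigvalsh(H)             # batched, ascending order per matrix
    return float(np.mean(eigvals[:, 0] > 0))    # smallest eigenvalue > 0 -> local minimum


fractions = {d: fraction_all_positive(d) for d in range(1, 7)}
for d, frac in fractions.items():
    # By symmetry, about the same fraction is negative definite; the rest are saddles.
    print(f"d={d}: minima {frac:.4f}, saddles about {1 - 2 * frac:.4f}")
```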
- 53. Open Questions for Multiple Equilibria ● If a game is very complex, involving lots of parameters in its pay-off or utility functions, then – Are most of its critical points unstable saddle points? – Is the number of equilibria much smaller than we guess? ● If we obtain a few equilibria of such a complex game, – Do most of these equilibria have similar properties? – Do we really need to obtain the other equilibria?
- 54. See Dynamics: a "Typical" Transitional State? ● ML practitioners are sensitive to the convergence rate in fitting. – We are in the finite-sample & high-dimensional world: asymptotics alone are powerless, and a computed estimate is not an equilibrium but a transitional state. [Figure: convergence of gradient-based optimizers; sources: http://sebastianruder.com/optimizing-gradient-descent/ and (Kingma & Ba, 2015)]
- 55. See Dynamics: a "Typical" Transitional State? ● The mixing time of the Markov processes of some games is exponential in the number of players. – E.g., the Nash demand game in (Axtell+, 2000): the equilibrium has equality of wealth, while transitional states show severe inequality [Figure: equilibrium vs. transitional state] ● What if the number of players is in the thousands or millions? – Severe inequality most of the time
- 56. See Dynamics: Trapped in a Plateau? ● Fitting a deep NN is often trapped in plateaus. – Natural gradient descent (Amari, 1997) is often used to escape quickly from plateaus. – In real-world games, are people trapped in plateaus rather than at equilibria? [Figure source: https://www.safaribooksonline.com/library/view/hands-on-machine-learning/9781491962282/ch04.html]
- 57. Conclusion ● Discussed how uncertainty should be incorporated in inductive & deductive decision making. – Quantifying uncertainty or simpler minimal estimation ● Linked Bayesian shrinkage with bounded rationality – Towards data-oriented regularized equilibrium ● Implications from high-dimensional ML – Saddle points, transitional state, and/or plateau
- 58. THANK YOU FOR ATTENDING! Download this material from https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating-machine-learning-and-game-theory
- 59. References (I) ● Amari, S. (1997). Neural learning in structured parameter spaces - natural Riemannian gradient. In Advances in Neural Information Processing Systems 9, pages 127–133. MIT Press. ● Axtell, R., Epstein, J., and Young, H. (2000). The emergence of classes in a multi-agent bargaining model. Working papers, Brookings Institution - Working Papers. ● Bray, A. J. and Dean, D. S. (2007). Statistics of critical points of Gaussian fields on large-dimensional spaces. Physical Review Letters, 98:150201. ● Bruza, P., Kitto, K., Nelson, D., and McEvoy, C. (2009). Is there something quantum-like about the human mental lexicon? Journal of Mathematical Psychology, 53(5):362–377. ● Camerer, C. F., Ho, T. H., and Chong, J. (2004). A cognitive hierarchy model of games. Quarterly Journal of Economics, 119:861–898.
- 60. References (II) ● Chapelle, O. and Harchaoui, Z. (2005). A machine learning approach to conjoint analysis. In Advances in Neural Information Processing Systems 17, pages 257–264. MIT Press, Cambridge, MA, USA. ● Clarke, E. H. (1971). Multipart pricing of public goods. Public Choice, 2:19–33. ● Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., and Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems 27, pages 2933–2941. Curran Associates, Inc. ● de Barros, J. A. and Suppes, P. (2009). Quantum mechanics, interference, and the brain. Journal of Mathematical Psychology, 53(5):306–313.
- 61. References (III) ● Dotson, J. P., Lenk, P., Brazell, J., Otter, T., Maceachern, S. N., and Allenby, G. M. (2009). A probit model with structured covariance for similarity effects and source of volume calculations. http://ssrn.com/abstract=1396232. ● González-Vallejo, C. (2002). Making trade-offs: A probabilistic and context-sensitive model of choice behavior. Psychological Review, 109:137–154. ● Green, P. and Srinivasan, V. (1978). Conjoint analysis in consumer research: Issues and outlook. Journal of Consumer Research, 5:103–123. ● Ho, T. H., Lim, N., and Camerer, C. F. (2006). Modeling the psychology of consumer and firm behavior with behavioral economics. Journal of Marketing Research, 43(3):307–331. ● Huber, J., Payne, J. W., and Puto, C. (1982). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9:90–98.
- 62. References (IV) ● Kakade, S. M. (2002). A natural policy gradient. In Dietterich, T. G., Becker, S., and Ghahramani, Z., editors, Advances in Neural Information Processing Systems 14, pages 1531–1538. MIT Press. ● Kingma, D. and Ba, J. (2015). Adam: A method for stochastic optimization. In The International Conference on Learning Representations (ICLR), San Diego. ● Kivetz, R., Netzer, O., and Srinivasan, V. S. (2004). Alternative models for capturing the compromise effect. Journal of Marketing Research, 41(3):237–257. ● Lawrence, N. D. and Urtasun, R. (2009). Non-linear matrix factorization with Gaussian processes. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), pages 601–608, New York, NY, USA. ACM. ● McFadden, D. and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15:447–470.
- 63. References (V) ● McFadden, D. L. (1980). Econometric models of probabilistic choice among products. Journal of Business, 53(3):13–29. ● McKelvey, R. and Palfrey, T. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6–38. ● Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In Proceedings of The 33rd International Conference on Machine Learning (ICML 2016), pages 1928–1937. ● Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518:529–533. ● Mogiliansky, A. L., Zamir, S., and Zwirn, H. (2009). Type indeterminacy: A model of the KT (Kahneman-Tversky)-man. Journal of Mathematical Psychology, 53(5):349–361.
- 64. References (VI) ● Roe, R. M., Busemeyer, J. R., and Townsend, J. T. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108:370–392. ● Shenoy, P. and Yu, A. J. (2013). A rational account of contextual effects in preference choice: What makes for a bargain? In Proceedings of the Cognitive Science Society Conference. ● Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–489. ● Simonson, I. (1989). Choice based on reasons: The case of attraction and compromise effects. Journal of Consumer Research, 16:158–174.
- 65. References (VII) ● Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057–1063. MIT Press. ● Takahashi, R. and Morimura, T. (2015). Predicting preference reversals via Gaussian process uncertainty aversion. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015), pages 958–967. ● Trueblood, J. S. (2014). The multiattribute linear ballistic accumulator model of context effects in multialternative choice. Psychological Review, 121(2):179–205. ● Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79:281–299. ● Usher, M. and McClelland, J. L. (2004). Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111:757–769.
- 66. References (VIII) ● Wen, C.-H. and Koppelman, F. (2001). The generalized nested logit model. Transportation Research Part B, 35:627–641. ● Williams, H. (1977). On the formulation of travel demand models and economic evaluation measures of user benefit. Environment and Planning A, 9(3):285–344. ● Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256. ● Yai, T. (1997). Multinomial probit with structured covariance for route choice behavior. Transportation Research Part B: Methodological, 31(3):195–207.