cvpr2011: game theory in CVPR part 2



  1. 1. Table of Contents
Applications
 – Image Segmentation
 – Medical Image Analysis
 – Content-Based Image Retrieval
 – Matching and Inlier Selection
 – Detection of Anomalous Behaviour and Selection of Discriminative Features
Repeated Games and Online Learning
 – Learning in Repeated Games
 – Learning with Experts and Regret-Based Learning
 – Boosting
  3. 3. Image Segmentation
Image segmentation problem: decompose a given image into segments, i.e. regions containing “similar” pixels. It is the first step in many computer vision problems.
Example: segments might be regions of the image depicting the same object.
Semantics problem: how should we infer objects from segments?
  4. 4. Segmentation via Image Labeling (S. Yu and M. Berthod; CVIU 1995)
  5. 5. Markov Random Field Formulation
Model pixel label probability through Markov Random Fields. Assuming a Gibbs distribution, the probability of a particular labeling L is P(L) = (1/Z) exp(−Σc Vc(Lc)), where Vc(Lc) is the clique potential of L in clique c. The posterior probability is P(L|Y) ∝ P(Y|L) P(L), and the MAP solution is obtained by maximizing f(L) = P(Y|L) P(L).
  6. 6. Game-Theoretic Formulation
Use game theory to maximize f(L): a relaxation scheme in which
• pixels are players
• labels are the available strategies
The payoff of pixel i depends on the labels of its neighbours.
Theorem: L is a local maximum of f(L) iff it is a Nash equilibrium of the defined game.
  7. 7. Relaxation Scheme
1. Initialize the labeling L(0)
2. At iteration k, for each pixel i, choose a candidate label; if it improves pixel i's payoff, accept the new label with probability α, otherwise reject it
3. If L(k+1) is a Nash point, stop; else go to step 2
Proposition: for any 0<α<1, L(k) converges to a Nash equilibrium.
Note: randomization is necessary to avoid oscillations.
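A minimal sketch of this randomized relaxation on a toy 1-D labeling problem. The payoff (number of agreeing neighbours, a simple Potts-style potential) and the chain topology are illustrative assumptions, not the paper's exact model:

```python
import random

def relax_labels(neighbors, labels0, num_labels, alpha=0.5, max_iter=1000, seed=0):
    """Randomized relaxation: each pixel best-responds to its neighbours,
    accepting an improving label only with probability alpha (the
    randomization that avoids oscillations)."""
    rng = random.Random(seed)
    labels = list(labels0)
    for _ in range(max_iter):
        improvable = False
        for i, nbrs in enumerate(neighbors):
            # assumed payoff: number of neighbours agreeing with the label
            payoff = lambda l: sum(labels[j] == l for j in nbrs)
            best = max(range(num_labels), key=payoff)
            if payoff(best) > payoff(labels[i]):
                improvable = True
                if rng.random() < alpha:
                    labels[i] = best
        if not improvable:  # Nash point: no pixel can improve its payoff
            break
    return labels

# toy 1-D "image": a chain of 6 pixels with a noisy initial labeling
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(relax_labels(neighbors, [0, 0, 1, 0, 1, 1], num_labels=2))
```

The returned labeling is a Nash point: no single pixel can strictly increase its agreement with its neighbours by switching label.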
  8. 8. Relation to Relaxation Labeling
If the potentials of cliques of size greater than 2 sum to zero, the game is a polymatrix game and is thus related to relaxation labeling. The proposed relaxation scheme thus generalizes (discrete) relaxation labeling to higher-order clique potentials.
  9. 9. Texture Segmentation
  10. 10. Example Segmentation
  11. 11. Integration of region and boundary modules (A. Chakraborty and J. S. Duncan; TPAMI 1999)
  12. 12. Integration of Region and Boundary
The goal is to integrate region-based approaches with boundary-based approaches. The objectives of the region-based and boundary-based modules are incommensurable. Due to the exponential explosion of pixel dependencies in the general case, attempts to integrate the approaches into a single objective function result in ad hoc solutions. Avoid the single-objective problem by casting it into a game-theoretic framework in which the output of one module affects the objective function of the other.
  13. 13. Integration of Region and Boundary
A generalized two-player game in which strategies are a continuum:
• the players are the region module and the boundary module
• the strategies are the possible region and boundary configurations
The payoff for each player is the posterior of the module. The selection of one module enters as a prior in the computation of the posterior of the other module (limited interaction).
  14. 14. Region and Boundary Modules
The region is modeled through a Markov Random Field
 • pixel labels x are estimated by maximizing the posterior conditioned on the intensity observations Y and the boundary prior p
The boundary is modeled through a snake model
 • the boundary curve p is estimated by maximizing the posterior conditioned on the gradient observations I and the region prior x
  15. 15. Synthetic Example (panels: input image, region only, boundary only, region after GT integration, boundary after GT integration, overlaid on the template)
  16. 16. Example segmentation (panels: input image, single-function integration, game-theoretic integration, overlaid on template)
  17. 17. Comparison vs. Single Objective
  18. 18. Example Segmentation (panels: input image, hand segmented, no region integration, GT integration)
  19. 19. Example Segmentation
  20. 20. Example Segmentation (panels: input image, hand segmented, no region integration, GT integration)
  21. 21. Example Segmentation
  22. 22. Segmentation Using Dominant Sets
  23. 23. Graph-based segmentation
Partition_into_dominant_sets(G)
  repeat
    find a dominant set
    remove it from the graph
  until all vertices have been clustered
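The peeling loop above can be sketched with replicator dynamics: iterate the dynamics on the affinity matrix, take the support of the equilibrium as the dominant set, remove it, and repeat. The block-structured toy affinities, thresholds, and iteration counts are illustrative assumptions:

```python
import numpy as np

def dominant_set(A, tol=1e-6, max_iter=2000):
    """Find one dominant set of the affinity matrix A (zero diagonal) by
    running the discrete replicator dynamics x_i <- x_i (Ax)_i / (x'Ax);
    the support of the equilibrium is the cluster."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        Ax = A @ x
        denom = x @ Ax
        if denom == 0:         # no affinities left: lump everything together
            return x >= 0
        new = x * Ax / denom
        done = np.abs(new - x).sum() < tol
        x = new
        if done:
            break
    return x > 10 * tol        # support of the equilibrium

def partition_into_dominant_sets(A):
    """The peeling loop from the slide: extract a dominant set, remove it,
    repeat until every vertex is clustered."""
    idx = np.arange(A.shape[0])
    clusters = []
    while idx.size > 0:
        sup = dominant_set(A[np.ix_(idx, idx)])
        if not sup.any():      # numerical safeguard: cluster everything left
            sup[:] = True
        clusters.append(idx[sup].tolist())
        idx = idx[~sup]
    return clusters

# two well-separated groups encoded as a block-structured affinity matrix
A = np.zeros((6, 6))
A[:3, :3] = 0.9
A[3:, 3:] = 0.8
A[:3, 3:] = A[3:, :3] = 0.05
np.fill_diagonal(A, 0.0)
print(partition_into_dominant_sets(A))   # the two blocks come out as clusters
```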
  24. 24. Experimental setup
  25. 25. Intensity Segmentation
Use the dominant-set framework to cluster pixels into coherent segments. Affinity is based on intensity/color/texture similarity.
M. Pavan and M. Pelillo; TPAMI 2007
  26. 26. Intensity Segmentation
  27. 27. Intensity Segmentation
  28. 28. Intensity Segmentation
M. Pavan and M. Pelillo; TPAMI 2007
  29. 29. Color Segmentation
M. Pavan and M. Pelillo; TPAMI 2007
  30. 30. Texture Segmentation
M. Pavan and M. Pelillo; TPAMI 2007
  31. 31. Texture Segmentation
M. Pavan and M. Pelillo; TPAMI 2007
  32. 32. Out-of-sample Segmentation
Pairwise clustering has problematic scaling behavior. Subsample the pixels and assign out-of-sample pixels after the clustering process has taken place. The assignment can be computed in linear time w.r.t. the size of the sample S.
  33. 33. Intensity Segmentation
M. Pavan and M. Pelillo; NIPS 2004
  34. 34. Color Segmentation
M. Pavan and M. Pelillo; NIPS 2004
  35. 35. Alternative Approach
Recall that the probability of a surviving strategy at equilibrium is related to the centrality of the element within the cluster. Use the element with the highest probability as a class prototype, and assign new elements to the class with the most similar prototype. This is constant time w.r.t. the size of the clusters: ideal for very large datasets (video).
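A sketch of this prototype shortcut. The clusters, equilibrium weights, and similarity function below are made-up toy data, not the paper's:

```python
import numpy as np

def prototypes(clusters, weights):
    # the element with the highest equilibrium probability is the most
    # central one: take it as the class prototype
    return [c[int(np.argmax(w))] for c, w in zip(clusters, weights)]

def assign(new_pts, protos, sim):
    # constant time w.r.t. cluster size: compare only against prototypes
    return [int(np.argmax([sim(p, q) for q in protos])) for p in new_pts]

# toy 1-D example with made-up clusters and equilibrium weights
clusters = [np.array([0.0, 0.2, 0.1]), np.array([5.0, 5.3, 4.9])]
weights  = [np.array([0.2, 0.3, 0.5]), np.array([0.6, 0.2, 0.2])]
sim = lambda a, b: -abs(a - b)
protos = prototypes(clusters, weights)   # most central element of each cluster
print(assign([0.05, 4.7], protos, sim))  # each new point -> nearest prototype
```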
  36. 36. Video Segmentation
A. Torsello, M. Pavan, and M. Pelillo; EMMCVPR 2005
  37. 37. Video Segmentation
A. Torsello, M. Pavan, and M. Pelillo; EMMCVPR 2005
  38. 38. Hierarchical segmentation and integration of boundary information
• Integrate boundary information into the pixel affinity
• Key idea:
 – define a regularizer based on the edge response
 – use it to impose a scale space on dominant sets
• Assume a random walk from pixel to pixel that is more likely to move along, rather than across, edges
• Let L be the Laplacian of the edge-response mesh, with weights derived from the edge response
• A lazy random walk is a stochastic process that, once in node i, moves to node j with probability governed by the edge weights (and otherwise stays put)
  39. 39. Diffusion Kernel
A. Torsello and M. Pelillo; EMMCVPR 2009
  40. 40. Regularized Quadratic Program
• Define the regularized quadratic program
• Proposition: Let λ1(A), λ2(A), ..., λn(A) denote the largest, second largest, ..., smallest eigenvalues of matrix A. If the regularization parameter is large enough relative to these eigenvalues, then ft is a strictly concave function in Δ; further, for sufficiently strong regularization, the only solution of the regularized quadratic program belongs to the interior of Δ.
  41. 41. Selection of relevant scales
• How do we choose the scale at which to separate the levels?
• A good choice should be stable: cohesiveness should not change much.
• Consider the entropy of the selected cluster
 – it is a measure of the size and compactness of the cluster
• Cut on plateaus of the entropy
  42. 42. Hierarchical SegmentsA. Torsello and M. Pelillo; EMMCVPR 2009
  43. 43. Hierarchical SegmentsA. Torsello and M. Pelillo; EMMCVPR 2009
  44. 44. References
S. Yu and M. Berthod. A Game Strategy Approach for Image Labeling. CVIU 1995.
A. Chakraborty and J. S. Duncan. Game-Theoretic Integration for Image Segmentation. PAMI 1999.
M. Pavan and M. Pelillo. A new graph-theoretic approach to clustering and segmentation. CVPR 2003.
M. Pavan and M. Pelillo. Dominant sets and hierarchical clustering. ICCV 2003.
M. Pavan and M. Pelillo. Efficient out-of-sample extension of dominant-set clusters. NIPS 2004.
A. Torsello, M. Pavan, and M. Pelillo. Spatio-temporal segmentation using dominant sets. EMMCVPR 2005.
A. Torsello, S. Rota Bulò, and M. Pelillo. Grouping with asymmetric affinities: A game-theoretic perspective. CVPR 2006.
M. Pavan and M. Pelillo. Dominant sets and pairwise clustering. PAMI 2007.
A. Torsello and M. Pelillo. Hierarchical Pairwise Segmentation Using Dominant Sets and Anisotropic Diffusion Kernels. EMMCVPR 2009.
  45. 45. Medical Image Analysis
  46. 46. Analysis of fMRI images
Problems:
 • detect coherent subregions in cortical areas on the basis of similarities between fMRI time series
 • localize activation foci, i.e., functional networks related to a specific cognitive task
  47. 47. Experimental Setup
The patient is assigned a word-matching Stroop task. This task requires a subject to respond to a particular stimulus dimension while a competing stimulus dimension has to be suppressed.
Top row answers: NO. Bottom row answers: YES.
The brain response is tracked through time.
  48. 48. Parcellation
Parcellation is the process of subdividing a ROI into functional subregions. Apply the replicator equation to the matrix of correlations between the voxel time-series, and extract clusters that form connected and compact components. The results are consistent with brain physiology.
  49. 49. Within-Subject VariabilityInter- and intra-subject variability is low
  50. 50. Meta-Analysis: Recognition of Networks
The reconstruction of functional networks requires the analysis of the co-activation of foci:
1. Extract foci from the activation maxima
2. Compute the co-activation of the foci
  51. 51. Activated FociActivation foci are calculated from the list of activation maxima
  52. 52. Dominant Networks
Use the replicator equation on the co-activation matrix to extract co-activated foci, which form the dominant functional networks.
  53. 53. References
G. Lohmann and S. Bohn. Using Replicator Dynamics for Analyzing fMRI Data of the Human Brain. IEEE Trans. on Medical Imaging 2002.
J. Neumann et al. Meta-Analysis of Functional Imaging Data Using Replicator Dynamics. Human Brain Mapping 2005.
J. Neumann et al. The parcellation of cortical areas using replicator dynamics in fMRI. Neuroimage 2006.
  54. 54. Content-Based Image Retrieval
M. Wang, Z.-L. Ye, Y. Wang, and S.-X. Wang. Dominant sets clustering for image retrieval. Signal Processing, 2008
  55. 55. Content-Based Image Retrieval
Content-based image retrieval focuses on searching a database for images similar to the query image. There exists a semantic gap between the limited descriptive power of low-level visual features and high-level concepts.
Relevance feedback: use user feedback to improve the relevance of the images retrieved for a query image.
  56. 56. Approach
1. Use feature-vector distance for the initial query image
2. The user labels results into relevant (I+) and irrelevant (I−) sets
3. Construct the training set (gi, yi)
4. Train an SVM to obtain the distance d(g) between the features of relevant images and the classifier surface
5. Rearrange the distances d(g) in descending order
6. Cluster the images into similarity clusters with dominant sets and report them in order of importance within each cluster
7. If further refinement is necessary, repeat steps 2 to 7
Dominant sets are used to reassess the relevance order in the positive set (step 6).
  57. 57. Approach
  58. 58. Examples
No dominant-set reassessment of the order of positive examples
  59. 59. Examples
Order of positive examples reassessed through dominant sets
  60. 60. Examples
No dominant-set reassessment of the order of positive examples
  61. 61. Examples
Order of positive examples reassessed through dominant sets
  62. 62. Precision and Recall
  63. 63. Matching and Inlier Selection
  64. 64. Matching Problem
The matching problem is one of finding correspondences within a set of elements, or features. It is central to any recognition task where the object to be recognized is naturally divided into several parts.
Correspondences are allowed to compete with one another in a matching game, a non-cooperative game where
• potential associations between the items to be matched correspond to strategies
• payoffs reflect the degree of compatibility between competing hypotheses
The solutions of the matching problem correspond to ESS's (dominant sets in the association space). The framework can deal with general many-to-many matching problems, even in the presence of asymmetric compatibilities.
  65. 65. Matching game
Let O1 and O2 be the two sets of features that we want to match, and A ⊆ O1 × O2 the set of feasible associations that satisfy the unary constraints. Each feasible association represents a possible matching hypothesis. Let C : A × A → R+ be a set of pairwise compatibilities that measure the support that one association gives to another.
A submatch (or simply a match) is a set of associations which satisfies the pairwise feasibility constraints, plus two additional criteria:
 – high internal compatibility, i.e. the associations belonging to the match are mutually highly compatible
 – low external compatibility, i.e. associations outside the match are scarcely compatible with those inside.
The proposed approach generalizes the association-graph technique described by Barrow and Burstall to continuous structural constraints.
  66. 66. Properties of Matching Games
Domain-specific information is confined to the definition of the compatibility function. We can deal with many-to-many, one-to-many, many-to-one and one-to-one relations, incorporating any hard binary constraints into the compatibilities (by setting them to 0).
Theorem: Consider a matching game with compatibilities C = (cij), cij ≥ 0 and cii = 0. If x ∈ Δ is an ESS, then cij > 0 for all i, j ∈ σ(x).
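A toy matching game along these lines: candidate associations as strategies, a rigidity-style compatibility (an illustrative choice, not the papers' exact one), hard one-to-one constraints set to 0, and replicator dynamics to find the support of the equilibrium:

```python
import numpy as np
from itertools import product

def matching_game(P, Q, n_iter=3000):
    """Match point sets P and Q: strategies are candidate associations (i, j);
    the payoff of two associations is high when they preserve pairwise
    distances (a rigidity compatibility, as used for inlier selection)."""
    assoc = list(product(range(len(P)), range(len(Q))))
    n = len(assoc)
    C = np.zeros((n, n))
    for a, (i, j) in enumerate(assoc):
        for b, (h, k) in enumerate(assoc):
            if a == b or i == h or j == k:
                continue                     # hard one-to-one constraint: 0
            d1 = np.linalg.norm(P[i] - P[h])
            d2 = np.linalg.norm(Q[j] - Q[k])
            C[a, b] = np.exp(-abs(d1 - d2))  # illustrative rigidity compatibility
    x = np.full(n, 1.0 / n)
    for _ in range(n_iter):                  # replicator dynamics -> equilibrium
        x = x * (C @ x) / (x @ C @ x)
    return [assoc[a] for a in range(n) if x[a] > 1e-4]

P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Q = P + np.array([10.0, 0.0])                # a translated copy of P
print(matching_game(P, Q))                   # recovers the identity matching
```

The support of the equilibrium is exactly the mutually rigid set of associations; spurious associations receive low support and die out under the dynamics.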
  67. 67. Matching Examples
A. Albarelli, S. Rota-Bulò, A. Torsello, and M. Pelillo; ICCV 2009
  68. 68. Matching Examples
A. Albarelli, S. Rota-Bulò, A. Torsello, and M. Pelillo; ICCV 2009
  69. 69. GT Matcher and Sparsity
The game-theoretic matcher deviates from the quadratic-assignment tradition in that it is very selective: it limits itself to a cohesive set of associations even if further feasible associations are still available. The matcher is tuned towards low false positives rather than low false negatives, unlike quadratic assignment. Quadratic assignment is greedy, while the game-theoretic matcher favours sparsity in the solutions.
  70. 70. Matching and Inlier selection
There is a domain in which this property is particularly useful: inlier selection. When estimating a transformation acting on some data, we often need to find correspondences between observations before and after the transformation. Inlier selection is the process of selecting correspondences that are consistent with a single global transformation to be estimated, even in the presence of several outlier observations. Example problems include surface registration and point-feature matching.
  71. 71. Matching and Inlier selection
Typical matching strategies are based on random selection (RANSAC) or the use of local information such as feature descriptors. Global coherence checks are only introduced after a first estimation (filtering). Filtering approaches are not very robust w.r.t. outliers (or structured noise). The game-theoretic approach drives the selection of correspondences that satisfy a global compatibility criterion.
  72. 72. Estimation of Similarity Transformation (panels: input images, SIFT features, Lowe (RANSAC), GT matcher)
A. Albarelli, S. Rota-Bulò, A. Torsello, and M. Pelillo; ICCV 2009
  73. 73. Estimation of Similarity Transformation
A. Albarelli, S. Rota-Bulò, A. Torsello, and M. Pelillo; ICCV 2009
  74. 74. Surface Registration (panels: DARCES, Spin Images, GT matcher)
Descriptors are used just to reduce the set of feasible associations A. Compatibilities are related to rigidity constraints (difference in distances between corresponding points). No initial (coarse) motion estimation is required.
A. Albarelli, E. Rodolà, and A. Torsello; CVPR 2010
  75. 75. Surface Registration
A. Albarelli, E. Rodolà, and A. Torsello; CVPR 2010
  76. 76. Surface Registration
A. Albarelli, E. Rodolà, and A. Torsello; CVPR 2010
  77. 77. Point-Matching for Multi-View Bundle Adjustment
Define (local) compatibilities between candidate correspondences through a weak (affine) camera model. We use the orientation and scale information in the feature descriptors to infer an affine transformation between the corresponding features: a correspondence implies a transformation. Two correspondences are compatible if they define similar transformations.
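A sketch of this idea restricted to similarity transforms: each correspondence, together with the relative orientation and scale of the matched features, implies a full transform, and two correspondences are scored by how well each implied transform maps the other's points. The reprojection-error measure is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def implied_transform(p, q, dtheta, dscale):
    """A correspondence p -> q plus the relative feature orientation dtheta
    and scale dscale implies a full similarity transform: rotation, scaling,
    and the translation that maps p onto q."""
    c, s = np.cos(dtheta), np.sin(dtheta)
    R = dscale * np.array([[c, -s], [s, c]])
    t = q - R @ p
    return R, t

def compatibility(corr_a, corr_b, sigma=1.0):
    """Two correspondences are compatible when they imply similar transforms,
    measured here by how well each transform maps the other's source point
    (an assumed reprojection-error measure)."""
    Ra, ta = implied_transform(*corr_a)
    Rb, tb = implied_transform(*corr_b)
    pa, qa = corr_a[0], corr_a[1]
    pb, qb = corr_b[0], corr_b[1]
    err = np.linalg.norm(Ra @ pb + ta - qb) + np.linalg.norm(Rb @ pa + tb - qa)
    return np.exp(-err / sigma)

# two correspondences consistent with the same translation-only transform
a = (np.array([0.0, 0.0]), np.array([5.0, 1.0]), 0.0, 1.0)
b = (np.array([2.0, 3.0]), np.array([7.0, 4.0]), 0.0, 1.0)
print(compatibility(a, b))   # 1.0: both imply exactly the same transform
```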
  78. 78. Experiments
A. Albarelli, E. Rodolà, and A. Torsello; 3DPVT 2010
  79. 79. Experiments
A. Albarelli, E. Rodolà, and A. Torsello; 3DPVT 2010
  80. 80. Experiments
A. Albarelli, E. Rodolà, and A. Torsello; 3DPVT 2010
  81. 81. References
A. Albarelli, S. Rota Bulò, A. Torsello, and M. Pelillo. Matching as a Non-Cooperative Game. ICCV 2009.
A. Albarelli, E. Rodolà, and A. Torsello. Robust Game-Theoretic Inlier Selection for Bundle Adjustment. 3DPVT 2010.
A. Albarelli, E. Rodolà, and A. Torsello. A Game-Theoretic Approach to Fine Surface Registration without Initial Motion Estimation. CVPR 2010.
A. Albarelli, E. Rodolà, and A. Torsello. Imposing Semi-local Geometric Constraints for Accurate Correspondences Selection in Structure from Motion: a Game-Theoretic Perspective. IJCV 2011.
  82. 82. Detection of Anomalous Behaviour and Selection of Discriminative Features
  83. 83. Representing Activities
Represent human activities as sequences of atomic actions. Divide the sequences into fixed-length pieces (n-grams) and represent full actions as bags of n-grams.
R. Hamid et al. Artificial Intelligence 2009
  84. 84. Representing Activities
Common activities are found by extracting dominant sets from the set of bags of n-grams representing the actions. The similarity between actions A and B is computed from the counts of the n-grams of A appearing in B, divided by a normalizing constant. Anomalous events are those that do not fit any cluster.
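The bag-of-n-grams representation is easy to sketch. The normalized-overlap similarity below (histogram intersection over total counts) is an illustrative stand-in for the paper's exact normalization:

```python
from collections import Counter

def ngram_bag(actions, n=3):
    """Represent an activity (a sequence of atomic actions) as a bag of
    fixed-length n-grams."""
    return Counter(tuple(actions[i:i + n]) for i in range(len(actions) - n + 1))

def similarity(A, B):
    """Illustrative normalized overlap between two bags of n-grams;
    the paper's exact normalizing constant may differ."""
    shared = sum(min(A[g], B[g]) for g in A)
    total = max(sum(A.values()), sum(B.values()))
    return shared / total if total else 0.0

# two activities sharing a common prefix of atomic actions
act1 = list("abcabcabc")
act2 = list("abcabcxyz")
print(similarity(ngram_bag(act1), ngram_bag(act2)))
```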
  85. 85. Deciding the (Ab)Normality
Decide the (ab)normality of a new instance based on its closeness to the members of the closest activity class, relative to the average closeness between all the members of that class. A new action i is regular w.r.t. the closest class S if its relative closeness exceeds a threshold T learned from the training set.
  86. 86. Example Activity Classification R. Hamid et al. Artificial Intelligence 2009
  87. 87. Example Anomalous Activity R. Hamid et al. Artificial Intelligence 2009
  88. 88. Selection of Discriminative Features
Adopt a similar approach for the selection of discriminative features among a large number of highly similar features. Extract clusters of similar features and iteratively throw them away, leaving only uncommon and discriminative features. This uses the fact that the dominant-set framework is not a partitioning scheme, but an unsupervised one-class classifier.
  89. 89. Application to Surface Registration
A. Albarelli, E. Rodolà, and A. Torsello; ECCV 2010
  90. 90. Application to Surface Registration
A. Albarelli, E. Rodolà, and A. Torsello; ECCV 2010
  91. 91. Recognition with Textured Background
A. Albarelli, E. Rodolà, and A. Torsello; ICPR 2010
  92. 92. Recognition with Textured Background
A. Albarelli, E. Rodolà, and A. Torsello; ICPR 2010
  93. 93. References
R. Hamid et al. Detection and Explanation of Anomalous Activities: Representing Activities as Bags of Event n-Grams. CVPR 2005.
R. Hamid et al. A Novel Sequence Representation for Unsupervised Analysis of Human Activities. Artificial Intelligence 2009.
A. Albarelli, E. Rodolà, and A. Torsello. Loosely Distinctive Features for Robust Surface Alignment. ECCV 2010.
A. Albarelli, E. Rodolà, A. Cavallarin, and A. Torsello. Robust Figure Extraction on Textured Background: a Game-Theoretic Approach. ICPR 2010.
  94. 94. Repeated Games and Online Learning
  95. 95. Repeated Games
Previous assumptions:
 – players had complete knowledge of the game
 – the game was played only once
What happens if the payoffs are not known, but the game can be repeated? Can a player learn a good strategy from the past plays?
  96. 96. Repeated Games
Previous approach:
 – just compute an optimal/equilibrium strategy
Another approach:
 – learn how to play a game by playing it many times, updating your strategy based on experience
Why?
 – Some of the game's utilities (especially the other players') may be unknown to you
 – The other players may not be playing an equilibrium strategy
 – Computing an optimal strategy can be hard
 – Learning is what humans typically do
  97. 97. Iterated Prisoner's Dilemma
Prisoner's Dilemma payoffs (Player 2, Player 1):
                        Player 1: defect    Player 1: cooperate
Player 2: defect        -10, -10            -1, -25
Player 2: cooperate     -25, -1             -3, -3
What strategies should one apply? Can I adapt to a player willing to cooperate, and cooperate myself?
Although the Prisoner's Dilemma has only one Nash equilibrium (both defect), cooperation can be sustained in the repeated Prisoner's Dilemma if the players are interested enough in future outcomes of the game to take some loss in early plays.
  98. 98. Tit for Tat
Tit for Tat was first entered by Anatol Rapoport in Robert Axelrod's two tournaments, held around 1980. Although extremely simple, it won both.
An agent using this strategy will initially cooperate, then respond in kind to the opponent's previous action: if the opponent was previously cooperative, the agent is cooperative; if not, the agent is not.
Properties:
1. Unless provoked, the agent will always cooperate
2. If provoked, the agent will retaliate
3. The agent is quick to forgive
4. The agent must have a good chance of competing against the opponent more than once
Note: used by BitTorrent to optimize download speed (optimistic unchoking).
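A minimal simulation of Tit for Tat in the iterated Prisoner's Dilemma, using the payoff table from the previous slide. It shows both sides of the story: sustained cooperation against itself, and a single early loss followed by mutual defection against an unconditional defector:

```python
# payoffs from the slide: (player 1 payoff, player 2 payoff)
PAYOFF = {('C', 'C'): (-3, -3), ('C', 'D'): (-25, -1),
          ('D', 'C'): (-1, -25), ('D', 'D'): (-10, -10)}

def tit_for_tat(opp_history):
    # cooperate first, then mirror the opponent's previous move
    return opp_history[-1] if opp_history else 'C'

def always_defect(opp_history):
    return 'D'

def play(s1, s2, rounds=100):
    h1, h2, score = [], [], [0, 0]
    for _ in range(rounds):
        a1, a2 = s1(h2), s2(h1)       # each strategy sees the opponent's history
        p1, p2 = PAYOFF[(a1, a2)]
        score[0] += p1
        score[1] += p2
        h1.append(a1)
        h2.append(a2)
    return score

print(play(tit_for_tat, tit_for_tat))     # [-300, -300]: sustained cooperation
print(play(tit_for_tat, always_defect))   # [-1015, -991]: one loss, then mutual defection
```

Mutual cooperation (100 rounds at -3 each) far outperforms the defection spiral, which is exactly the point of the slide.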
  99. 99. Fictitious Play
One widely used model of learning is the process of fictitious play and its variants (G. W. Brown, 1951). In it, each player presumes that her/his opponents are playing stationary (possibly mixed) strategies: agents behave as if they were facing an unknown but stationary distribution of opponents' strategies. The players choose their actions in each period to maximize that period's expected payoff given their assessment of the distribution of the opponents' actions. The assessment is based on the observed frequencies of the other players' past strategies: at each round, each player best responds to the empirical frequency of play of his opponent.
  100. 100. Convergence of Fictitious Play
Such a method is adequate if the opponent indeed uses a stationary strategy, but flawed if the opponent's strategy is non-stationary; the opponent's strategy may, for example, be conditioned on the fictitious player's last move. One key question about fictitious play is whether or not the play converges:
 – if it does, then the stationarity assumption employed by the players makes sense, at least asymptotically
 – if it does not, then it seems implausible that the players will maintain that assumption
  101. 101. Convergence of Fictitious Play
Convergence properties of fictitious play:
 • If s is a strict Nash equilibrium and s is played at time t in the process of fictitious play, then s is played at all subsequent times (strict Nash equilibria are absorbing)
 • Any pure-strategy steady state of fictitious play must be a Nash equilibrium
 • If the empirical distributions over each player's choices converge, the strategy profile corresponding to the product of these distributions is a Nash equilibrium
 • The empirical distributions converge if the game is zero-sum (Robinson 1951), a 2×2 game (Miyasawa 1961), or solvable by iterated strict dominance (Nachbar 1990)
  102. 102. Convergence of Fictitious Play
Fictitious play might not converge (Shapley 1964).
Modified Rock-Scissors-Paper (Player 2 rows, Player 1 columns):
              Rock    Scissors  Paper
Rock          0,0     1,0       0,1
Scissors      0,1     0,0       1,0
Paper         1,0     0,1       0,0
If the players start by choosing (Rock, Scissors), the play will cycle indefinitely.
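Fictitious play is easy to simulate. The sketch below runs it on Matching Pennies, a zero-sum game, where by the convergence results above the empirical frequencies approach the mixed equilibrium (1/2, 1/2); the initial pseudo-counts and tie-breaking rule are illustrative choices:

```python
import numpy as np

def fictitious_play(A, T=5000):
    """Fictitious play for a two-player zero-sum game with row payoffs A
    (column payoffs -A): each player best-responds to the empirical
    frequency of the opponent's past plays."""
    n, m = A.shape
    row_counts = np.zeros(n)
    col_counts = np.zeros(m)
    row_counts[0] = col_counts[0] = 1            # arbitrary initial play
    for _ in range(T):
        # row player maximizes against the column's empirical frequencies
        i = int(np.argmax(A @ (col_counts / col_counts.sum())))
        # column player minimizes the row payoff against the row's frequencies
        j = int(np.argmin((row_counts / row_counts.sum()) @ A))
        row_counts[i] += 1
        col_counts[j] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Matching Pennies: zero-sum, so the empirical frequencies converge
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = fictitious_play(A)
print(x, y)   # both approach the mixed equilibrium (1/2, 1/2)
```

Running the same loop on the modified Rock-Scissors-Paper above instead produces the indefinite cycling Shapley observed.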
  103. 103. Sequential Prediction
In a sequential prediction problem, a predictor (or forecaster) observes a sequence of symbols. At each time t = 1, 2, ..., before the t-th symbol of the sequence is revealed, the forecaster guesses its value st on the basis of the previous t−1 observations.
GOAL: limit the number of prediction mistakes without making any statistical assumption on the way the data sequence is generated.
  104. 104. Stationary Stochastic Process
In classical statistical learning theory, the sequence of outcomes is assumed to be a realization of a stationary stochastic process. Statistical properties of the process, and effective prediction rules, are estimated on the basis of past observations. In such a setup, the risk of a prediction rule may be defined as the expected value of some loss function, and different rules are compared based on their risk.
  105. 105. Game against the Environment
We want to abandon the idea of a stationary stochastic process in favor of an unknown (even adversarial) mechanism. The forecaster plays a game against the environment, which can, in principle, respond to the forecaster's previous predictions. The goal of the forecaster is to maximize the payoff associated with his predictions; the goal of the environment is to minimize the forecaster's payoff.
  106. 106. Learning with Experts
Without a probabilistic model, the notion of risk cannot be defined, and there is no obvious baseline against which to measure the forecaster's performance. We provide such a baseline by introducing reference forecasters, also called experts.
  107. 107. Experts
At time t, the experts provide advice in the form of a vector of predicted symbols. Think of experts as classifiers, observing the environment and giving a prediction. Experts are not perfect (each expert can be wrong on any observation). We want to get good predictions (high payoff) based on expert advice; a good prediction is consistent with the performance of the best experts.
  108. 108. Game against the Environment
The forecaster does not have
• knowledge of the game (payoff function)
• knowledge of the environment's strategy profile
The forecaster knows the payoffs received by each strategy against each previous play of the environment. However, this knowledge is based on the actual pure strategies selected by the environment, not its strategy profile.
  109. 109. Prediction Game
The game is played repeatedly in a sequence of rounds.
1. The environment chooses a mixed strategy yt and plays a pure strategy drawn according to the distribution yt
2. The experts provide their predictions
3. The forecaster chooses an expert according to a mixed strategy xt
The forecaster is permitted to observe the payoff of each expert i, that is, the payoff it would have obtained had it played following pure strategy (expert) i.
  110. 110. Prediction Game
The goal of the forecaster is to do almost as well as the best expert against the actual sequence of plays; that is, the cumulative payoff should not be “much worse” than that of the best (mixed) expert in hindsight.
  111. 111. Learnability
There are two main questions regarding this prediction game:
1. Is there a solution? I.e., is there a strategy that will work even in this adversarial environment?
2. Can we learn such a solution based on the previous plays?
  112. 112. Minimax and Learnability
The environment's goal is to minimize the forecaster's payoff: a zero-sum two-player game. We are within the hypotheses of the minimax theorem: there exists a strategy x that guarantees the forecaster the value v of the game against any play of the environment, and v is the best the forecaster can be guaranteed, since there exists a strategy y for the environment that caps the forecaster's payoff at v. Moreover, (x, y) is a Nash equilibrium.
  113. 113. Regret
We define the external regret of having played the strategy sequence x = (x1, ..., xT) w.r.t. expert e as the loss in payoff we incurred by not having followed e's advice. The learner's goal is to minimize the maximum regret w.r.t. any expert.
  114. 114. Minimax and Learnability
If the forecaster predicts according to a Nash equilibrium x, he is guaranteed a payoff v even against an adversarial environment.
Theorem: Let G be a zero-sum game with value v. If the forecaster plays for T steps a procedure with external regret R, then its average payoff is at least v − R/T.
An algorithm can thus seek to minimize the external regret.
  115. 115. Regret-Based Learning
Assume {0,1} utilities, and consider the loss L(x,y) = 1 − u(x,y) rather than the utility u(x,y).
Greedy algorithm: at each time t, select the (mixed) strategy x that, if played for the first t−1 plays, would have given the minimum regret. The greedy algorithm's loss is bounded in terms of N·Lmin, where Lmin is the loss incurred by the best expert and N is the number of experts. No deterministic algorithm can obtain a better ratio than N!
  116. 116. Weighted Majority Forecaster
The idea is to assign each expert i a weight wi that reflects its past performance, and to pick an expert with probability proportional to its weight.
Randomized Weighted Majority (RWM) Algorithm
Initially: wi = 1 and pi = 1/N, for i = 1, ..., N.
At time t: if expert i incurred a loss, let wi ← wi(1 − η); else leave wi unchanged. Play the mixed profile pi = wi / Σj wj.
Randomized Weighted Majority achieves expected loss at most (1 + η)·Lmin + (ln N)/η.
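A compact sketch of Randomized Weighted Majority on synthetic 0/1 expert losses; the loss rates and round count are made-up test data. The forecaster's expected loss respects the (1 + η)·Lmin + (ln N)/η bound:

```python
import numpy as np

def rwm(expert_losses, eta=0.1):
    """Randomized Weighted Majority: multiply an expert's weight by (1 - eta)
    each time it errs, and play the mixed profile p_i = w_i / sum_j w_j.
    Returns the cumulative expected loss of the forecaster."""
    T, N = expert_losses.shape
    w = np.ones(N)
    total = 0.0
    for t in range(T):
        p = w / w.sum()
        total += expert_losses[t] @ p           # expected loss of the mixed play
        w *= (1 - eta) ** expert_losses[t]      # penalize the experts that erred
    return total

# synthetic 0/1 losses for 3 experts over 200 rounds (expert 0 errs least)
rng = np.random.default_rng(1)
L = (rng.random((200, 3)) < np.array([0.1, 0.4, 0.5])).astype(float)
lmin = L.sum(axis=0).min()
total = rwm(L)
print(total <= (1 + 0.1) * lmin + np.log(3) / 0.1)   # the RWM bound holds
```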
  117. 117. Experts and Boosting
In general, experts can be any (weak) predictors/classifiers, and finding an optimal strategy over expert advice is equivalent to finding an optimal combination of classifiers. Boosting is the problem of converting a weak learning algorithm that performs just slightly better than random guessing into one that performs with arbitrarily good accuracy. Boosting works by running the weak learning algorithm many times on many distributions, and combining the selected hypotheses into a final hypothesis.
  118. 118. Boosting (Y. Freund and R. E. Schapire, 1996)
Boosting proceeds in rounds, alternating:
1. The booster constructs a distribution p(k) on the sample space X and passes it to the weak learner
2. The weak learner produces a hypothesis h(k) ∈ H with error at most ½ − γ
After T rounds, the hypotheses h(1), ..., h(T) are combined into a final hypothesis h.
Main questions:
 – How do we choose p(k)?
 – How do we combine the hypotheses?
  119. 119. Boosting and Games
Define a two-player game where
 – the booster's strategies are related to the possible samples
 – the weak learner's strategies are the available hypotheses
 – the booster's payoff is defined in terms of the agreement of the hypothesis with the target concept
The booster's goal is to feed bad distributions to the weak learner. Due to the minimax theorem, on every sample less than half of the hypotheses (weighted by the equilibrium strategy h*) are wrong: combine them by weighted majority (weighted according to h*).
  120. 120. Boosting and Games
Alternate the game:
1. The weak learner returns a hypothesis ht with guaranteed weighted error at most ½ − γ on the current distribution
2. The booster responds by using the randomized-weighted-majority approach to compute the distribution pt over samples
After T repetitions, the boosted hypothesis h* on a new sample x is obtained by majority voting among h1(x), ..., hT(x).
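This alternation can be sketched with multiplicative weights over the samples. The three-hypotheses toy problem (each hypothesis right on only 2 of 3 samples, so individually weak) and the weight-doubling rule are illustrative assumptions standing in for the RWM update:

```python
import numpy as np

def boost(correct, T=5):
    """Booster-vs-weak-learner game: the booster keeps multiplicative weights
    over the samples (doubling the weight of misclassified ones); the weak
    learner answers with the hypothesis of lowest weighted error; the final
    hypothesis is a majority vote. correct[h, s] says whether hypothesis h
    is right on sample s."""
    H, S = correct.shape
    w = np.ones(S)
    chosen = []
    for _ in range(T):
        err = (~correct) @ (w / w.sum())   # weighted error of each hypothesis
        h = int(np.argmin(err))            # weak learner: pick the best hypothesis
        chosen.append(h)
        w[~correct[h]] *= 2.0              # booster: reweight the hard samples
    votes = correct[chosen].sum(axis=0)
    return votes > len(chosen) / 2         # majority vote per sample

# three hypotheses, each right on 2 of 3 samples: individually weak,
# but their boosted majority is perfect
correct = np.array([[True, True, False],
                    [True, False, True],
                    [False, True, True]])
print(boost(correct))   # the boosted majority classifies every sample correctly
```

As the booster shifts weight onto the samples the current hypothesis misses, the weak learner is forced to cover them, and the majority vote over the chosen hypotheses becomes correct everywhere.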
  121. 121. References
Y. Freund and R. E. Schapire. Game Theory, On-line Prediction and Boosting. COLT 1996.
D. Fudenberg and D. K. Levine. The Theory of Learning in Games. The MIT Press, 1998.
N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
A. Blum and Y. Mansour. Chapter 4, “Learning, Regret Minimization, and Equilibria”, in N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani (Eds.), Algorithmic Game Theory, 2007.