09 cv mil_classification

  1. 1. Computer vision: models, learning and inference. Chapter 9: Classification models. Please send errata to s.prince@cs.ucl.ac.uk
  2. 2. Structure • Logistic regression • Bayesian logistic regression • Non-linear logistic regression • Kernelization and Gaussian process classification • Incremental fitting, boosting and trees • Multi-class classification • Random classification trees • Non-probabilistic classification • Applications. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 2
  3. 3. Models for machine vision Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 3
  4. 4. Example application: Gender Classification. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 4
  5. 5. Type 1: Model Pr(w|x) – Discriminative. How to model Pr(w|x)? – Choose an appropriate form for Pr(w) – Make the parameters a function of x – The function takes parameters θ that define its shape. Learning algorithm: learn parameters θ from training data x, w. Inference algorithm: just evaluate Pr(w|x). Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 5
  6. 6. Logistic Regression. Consider the two-class problem. • Choose a Bernoulli distribution over the world state w. • Make its parameter λ a function of x. Model the activation as a linear function of x; this creates a number in (−∞, ∞), which the logistic sigmoid maps to (0, 1). Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 6
  7. 7. Two parameters. Learning by standard methods (ML, MAP, Bayesian). Inference: just evaluate Pr(w|x). Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 7
  8. 8. Neater Notation. To make notation easier to handle, we • Attach a 1 to the start of every data vector • Attach the offset to the start of the gradient vector φ. New model: Pr(w = 1 | x) = sig[φᵀx]. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 8
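To make the model on the last two slides concrete, here is a minimal NumPy sketch (function and variable names are my own, not from the book): a 1 is prepended to the data vector so the offset becomes the first element of φ, the linear activation a = φᵀx is computed, and the logistic sigmoid maps it to the Bernoulli parameter Pr(w = 1 | x).

```python
import numpy as np

def sigmoid(a):
    # Logistic sigmoid: maps any real activation to (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def predict_prob(phi, x):
    # Prepend a 1 so the offset phi[0] is absorbed into the dot product
    x_aug = np.concatenate(([1.0], x))
    a = phi @ x_aug              # linear activation a = phi^T x
    return sigmoid(a)            # Pr(w = 1 | x)

# Example: 2-D input, parameters [offset, slope_1, slope_2]
phi = np.array([-0.5, 2.0, 1.0])
print(predict_prob(phi, np.array([0.3, 0.1])))
```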
  9. 9. Logistic regression. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 9
  10. 10. Maximum Likelihood. Take the logarithm, then take the derivative. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 10
  11. 11. Derivatives. Unfortunately, there is no closed-form solution – we cannot get an expression for φ in terms of x and w. We have to use a general-purpose technique: iterative non-linear optimization. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 11
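As a sketch of what the optimizer works with: the Bernoulli log likelihood and its gradient for the sigmoid model above. The gradient form Σᵢ (sig(aᵢ) − wᵢ) xᵢ is the standard logistic-regression result; setting it to zero has no closed-form solution, so it is handed to an iterative routine. Names and the clipping constant are my own.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def neg_log_likelihood(phi, X, w):
    # X: (N, D) data with a leading column of ones; w: (N,) labels in {0, 1}
    p = np.clip(sigmoid(X @ phi), 1e-12, 1 - 1e-12)   # avoid log(0)
    return -np.sum(w * np.log(p) + (1 - w) * np.log(1 - p))

def neg_log_likelihood_grad(phi, X, w):
    # d(-log L)/d(phi) = sum_i (sig(a_i) - w_i) x_i; no closed-form zero,
    # so an iterative non-linear optimizer is required
    return X.T @ (sigmoid(X @ phi) - w)
```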
  12. 12. Optimization. Goal: minimize the cost function (or objective function). How can we find the minimum? Basic idea: • Start with an estimate • Take a series of small steps toward the minimum • Make sure that each step decreases the cost • When we can't improve any further, we must be at a minimum. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 12
  13. 13. Local Minima. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 13
  14. 14. Convexity. If a function is convex, then it has only a single minimum. We can tell whether a function is convex by looking at its second derivatives. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 14
  15. 15. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 15
  16. 16. Gradient-Based Optimization. • Choose a search direction s based on the local properties of the function • Perform an intensive search along the chosen direction. This is called line search • Then set the new estimate θ[t+1] = θ[t] + λ·ŝ. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 16
  17. 17. Gradient Descent. Consider standing on a hillside. Look at the gradient where you are standing. Find the steepest direction downhill. Walk in that direction for some distance (line search). Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 17
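A minimal sketch of this loop: step in the negative-gradient (steepest-descent) direction and pick the step length with a crude grid-based line search; a real implementation would use a proper line-search routine. The cost and gradient functions are assumed to be supplied by the caller.

```python
import numpy as np

def gradient_descent(cost, grad, theta0, n_steps=100):
    theta = theta0.copy()
    for _ in range(n_steps):
        s = -grad(theta)                           # steepest direction downhill
        # Crude line search: try a grid of step lengths and keep the best
        lambdas = np.logspace(-4, 0, 20)
        costs = [cost(theta + lam * s) for lam in lambdas]
        best = lambdas[int(np.argmin(costs))]
        if cost(theta + best * s) >= cost(theta):  # no improvement: stop
            break
        theta = theta + best * s
    return theta
```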
  18. 18. Finite differences. What if we can't compute the gradient? Compute a finite-difference approximation, e.g. ∂f/∂θⱼ ≈ [f(θ + ε eⱼ) − f(θ)] / ε, where eⱼ is the unit vector in the jth direction. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 18
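A sketch of the finite-difference approximation (forward differences and the value of ε are my choices; central differences are more accurate but need twice the evaluations):

```python
import numpy as np

def finite_difference_gradient(f, theta, eps=1e-6):
    # df/dtheta_j ~= (f(theta + eps * e_j) - f(theta)) / eps
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    f0 = f(theta)
    for j in range(len(theta)):
        e_j = np.zeros_like(theta)
        e_j[j] = 1.0                 # unit vector in the jth direction
        g[j] = (f(theta + eps * e_j) - f0) / eps
    return g
```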
  19. 19. Steepest Descent Problems Close up Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 19
  20. 20. Second Derivatives. In higher dimensions, the second derivatives determine how far we should move in each direction, and they change the best direction to move in. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 20
  21. 21. Newton's Method. Approximate the function with a Taylor expansion, take the derivative, and re-arrange. Then add a line search over the step length. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 21
  22. 22. Newton's Method. The matrix of second derivatives is called the Hessian. It is expensive to compute via finite differences. If it is positive definite, then the function is convex. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 22
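A sketch of one Newton update: solve HΔ = −g for the step rather than inverting the Hessian explicitly, and optionally scale the step by a line-search factor λ (λ = 1 gives the pure Newton step). The gradient and Hessian functions are assumed to be supplied.

```python
import numpy as np

def newton_step(grad, hessian, theta, lam=1.0):
    g = grad(theta)
    H = hessian(theta)                  # matrix of second derivatives
    delta = np.linalg.solve(H, -g)      # Newton direction from the Taylor expansion
    return theta + lam * delta          # lam chosen by line search
```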
  23. 23. Newton vs. Steepest Descent Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 23
  24. 24. Line Search. Gradually narrow down the range. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 24
  25. 25. Optimization for Logistic Regression. Derivatives of the log likelihood. The Hessian is positive definite, so the problem is convex. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 25
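For logistic regression the gradient and Hessian of the negative log likelihood take the standard forms sketched below; the Hessian Σᵢ sig(aᵢ)(1 − sig(aᵢ)) xᵢxᵢᵀ is positive semi-definite, which is what makes the cost convex. Variable names are my own.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_and_hessian(phi, X, w):
    # X: (N, D) with a leading column of ones; w: (N,) labels in {0, 1}
    p = sigmoid(X @ phi)
    g = X.T @ (p - w)                   # gradient of the negative log likelihood
    s = p * (1.0 - p)                   # per-point weights sig(a_i)(1 - sig(a_i))
    H = X.T @ (X * s[:, None])          # Hessian: sum_i s_i x_i x_i^T (PSD)
    return g, H
```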
  26. 26. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 26
  27. 27. Maximum likelihood fits. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 27
  28. 28. Structure • Logistic regression • Bayesian logistic regression • Non-linear logistic regression • Kernelization and Gaussian process classification • Incremental fitting, boosting and trees • Multi-class classification • Random classification trees • Non-probabilistic classification • Applications. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 28
  29. 29. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 29
  30. 30. Bayesian Logistic Regression. Likelihood as before. Prior (not conjugate). Apply Bayes' rule: there is no closed-form solution for the posterior. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 30
  31. 31. Laplace Approximation. Approximate the posterior distribution with a normal: • Set the mean to the MAP estimate • Set the covariance so that the curvature matches that of the posterior at the MAP estimate. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 31
  32. 32. Laplace Approximation. Find the MAP solution by optimizing the log posterior, then approximate with a normal centred at that solution, with covariance set from the second derivatives there. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 32
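A sketch of the Laplace approximation, assuming a negative-log-posterior function and its Hessian are available (scipy's BFGS minimizer is my choice of optimizer, not something specified in the deck): the mean is the MAP estimate and the covariance is the inverse Hessian of the negative log posterior at that point.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_approximation(neg_log_posterior, neg_log_posterior_hessian, phi_init):
    # 1. MAP estimate: minimize the negative log posterior
    res = minimize(neg_log_posterior, phi_init, method="BFGS")
    mu = res.x
    # 2. Covariance: inverse Hessian of the negative log posterior at the MAP point
    Sigma = np.linalg.inv(neg_log_posterior_hessian(mu))
    return mu, Sigma
```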
  33. 33. Laplace Approximation (figure: prior, actual posterior, approximated posterior). Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 33
  34. 34. Inference. Can re-express the prediction in terms of the activation, using the transformation properties of normal distributions. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 34
  35. 35. Approximation of Integral Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 35
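The integral over the activation has no closed form; a common approximation (the MacKay-style "moderated" sigmoid, which replaces the integral of a sigmoid against a Gaussian by a sigmoid of a rescaled mean) gives a simple prediction rule. A sketch under that assumption, where mu and Sigma are the Laplace-approximation moments of φ and x_aug is the data vector with a leading 1:

```python
import numpy as np

def predictive_prob(x_aug, mu, Sigma):
    # Activation a = phi^T x is normal under the Laplace posterior
    mu_a = x_aug @ mu
    var_a = x_aug @ Sigma @ x_aug
    # Approximate the integral of sigmoid * Gaussian with a rescaled sigmoid
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var_a / 8.0)
    return 1.0 / (1.0 + np.exp(-kappa * mu_a))   # Pr(w = 1 | x)
```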
  36. 36. Bayesian Solution. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 36
  37. 37. Structure • Logistic regression • Bayesian logistic regression • Non-linear logistic regression • Kernelization and Gaussian process classification • Incremental fitting, boosting and trees • Multi-class classification • Random classification trees • Non-probabilistic classification • Applications. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 37
  38. 38. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 38
  39. 39. Non-linear regression. Same idea as for regression: • Apply a non-linear transformation • Build the model as usual. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 39
  40. 40. Non-linear regression. Example transformations. Fit using optimization (also over the transformation parameters). Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 40
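A sketch of the transformation step: map each input through fixed non-linear basis functions (arc-tan projections here, one of the families used in this chapter; the random projections are purely illustrative), then fit an ordinary logistic regression on the transformed vector z.

```python
import numpy as np

def arctan_features(X, A, b):
    # z_k = arctan(alpha_k^T x + beta_k); columns of A and entries of b define the basis functions
    Z = np.arctan(X @ A + b)
    # Prepend a column of ones so the linear model on z keeps an offset term
    return np.hstack([np.ones((Z.shape[0], 1)), Z])

# Example: 2-D inputs mapped through 5 (randomly chosen) arc-tan basis functions
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
A = rng.normal(size=(2, 5))
b = rng.normal(size=5)
Z = arctan_features(X, A, b)   # fit logistic regression on Z exactly as before
```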
  41. 41. Non-linear regression in 1D Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 41
  42. 42. Non-linear regression in 2D Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 42
  43. 43. Structure • Logistic regression • Bayesian logistic regression • Non-linear logistic regression • Kernelization and Gaussian process classification • Incremental fitting, boosting and trees • Multi-class classification • Random classification trees • Non-probabilistic classification • Applications. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 43
  44. 44. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 44
  45. 45. Dual Logistic Regression. KEY IDEA: the gradient vector φ is just a vector in the data space, so it can be represented as a weighted sum of the data points. Now solve for the weights ψ: one parameter per training example. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 45
  46. 46. Maximum Likelihood. Likelihood and derivatives: these depend only on inner products of the data points! Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 46
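Because everything depends on the data only through inner products, each inner product can be replaced by a kernel evaluation. A sketch with an RBF kernel (the kernel choice, length scale, and names are mine): the activation for a new point is a weighted sum of kernel evaluations against the training points, with one dual parameter per training example.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 * length_scale^2))
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-d2 / (2.0 * length_scale**2))

def dual_activation(X_train, psi, x_new):
    # a(x_new) = sum_j psi_j k(x_new, x_j): one dual parameter psi_j per training example
    k = rbf_kernel(x_new[None, :], X_train)[0]
    return k @ psi
```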
  47. 47. Kernel Logistic Regression Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 47
  48. 48. ML vs. Bayesian. The Bayesian case is known as Gaussian process classification. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 48
  49. 49. Relevance vector classification. Apply a sparse prior to the dual variables. As before, write as a marginalization of the dual variables. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 49
  50. 50. Relevance vector classification. Applying the sparse prior to the dual variables gives the likelihood. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 50
  51. 51. Relevance vector classification. Use the Laplace approximation result. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 51
  52. 52. Relevance vector classification. Previous result, plus a second approximation. To solve, alternately update the hidden variables h and the mean and variance of the Laplace approximation. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 52
  53. 53. Relevance vector classification. Results: most hidden variables increase to large values. This means the prior over the corresponding dual variable is very tight around zero, so the final solution depends on only a very small number of examples – efficient. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 53
  54. 54. Structure • Logistic regression • Bayesian logistic regression • Non-linear logistic regression • Kernelization and Gaussian process classification • Incremental fitting & boosting • Multi-class classification • Random classification trees • Non-probabilistic classification • Applications. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 54
  55. 55. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 55
  56. 56. Incremental Fitting. Previously wrote a = φᵀx; now write the activation as a sum of non-linear terms, a = φ0 + Σk φk f[x, ξk]. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 56
  57. 57. Incremental Fitting. KEY IDEA: greedily add terms one at a time. STAGE 1: fit φ0, φ1, ξ1. STAGE 2: fit φ0, φ2, ξ2 (keeping earlier terms fixed). ... STAGE K: fit φ0, φK, ξK. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 57
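A sketch of the greedy stage-wise loop: at each stage all previously fitted terms are held fixed and only the offset plus the newest term's weight and parameters are optimized. The weak-term family, loss, and optimizer (Nelder–Mead here) are placeholders, not choices made in the deck.

```python
import numpy as np
from scipy.optimize import minimize

def fit_incrementally(X, w, weak_fn, n_stages, neg_log_lik):
    """Greedily fit a = phi0 + sum_k phi_k * weak_fn(X, xi_k), one term per stage."""
    terms = []                                      # fitted (phi_k, xi_k) pairs, kept fixed
    phi0 = 0.0
    for _ in range(n_stages):
        def objective(params):
            p0, phi_new, xi_new = params[0], params[1], params[2:]
            a = p0 + sum(pk * weak_fn(X, xk) for pk, xk in terms)
            a = a + phi_new * weak_fn(X, xi_new)    # only the newest term varies
            return neg_log_lik(a, w)
        res = minimize(objective, np.zeros(2 + X.shape[1]), method="Nelder-Mead")
        phi0 = res.x[0]
        terms.append((res.x[1], res.x[2:]))
    return phi0, terms
```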
  58. 58. Incremental Fitting. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 58
  59. 59. Derivative. It is worth considering the form of the derivative in the context of the incremental fitting procedure (actual label vs. predicted label). Points contribute more to the derivative if they are still misclassified: the later classifiers become increasingly specialized to the difficult examples. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 59
  60. 60. Boosting. Incremental fitting with step functions. Each step function is called a "weak classifier". We can't take the derivative with respect to the step function's parameters, so we have to just use exhaustive search. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 60
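A sketch of one boosting round under those constraints: since the step function is not differentiable in its parameters, the round exhaustively scores a predefined set of candidate weak-classifier outputs (and a coarse grid over the new weight) and keeps whichever most reduces the loss. The candidate set, grid, and loss function are assumptions for illustration.

```python
import numpy as np

def boosting_round(a_current, w, candidates, neg_log_lik):
    # candidates: list of (N,) arrays holding each weak classifier's {0, 1} outputs
    best = None
    for c in candidates:                           # exhaustive search over weak classifiers
        for phi_k in np.linspace(-3.0, 3.0, 25):   # coarse grid over the new weight
            loss = neg_log_lik(a_current + phi_k * c, w)
            if best is None or loss < best[0]:
                best = (loss, phi_k, c)
    _, phi_k, c = best
    return a_current + phi_k * c, phi_k, c         # updated activations and chosen term
```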
  61. 61. Boosting. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 61
  62. 62. Branching Logistic Regression. A different way to make non-linear classifiers: a new activation in which one term is a gating function. • The gate returns a number between 0 and 1 • If it is 0 we get one logistic regression model • If it is 1 we get a different logistic regression model. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 62
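A sketch of the branching activation: a sigmoid gate g(x) in (0, 1) blends two linear logistic-regression activations, so the gate value selects (softly) between the two models. Parameter names (omega for the gate, phi_left/phi_right for the branches) are my own.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def branching_activation(x_aug, omega, phi_left, phi_right):
    g = sigmoid(omega @ x_aug)                 # gating value in (0, 1)
    # gate ~ 0 -> left model; gate ~ 1 -> right model; in between -> a blend
    return (1.0 - g) * (phi_left @ x_aug) + g * (phi_right @ x_aug)
```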
  63. 63. Branching Logistic Regression Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 63
  64. 64. Logistic Classification Trees Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 64
  65. 65. Structure • Logistic regression • Bayesian logistic regression • Non-linear logistic regression • Kernelization and Gaussian process classification • Incremental fitting, boosting and trees • Multi-class classification • Random classification trees • Non-probabilistic classification • Applications. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 65
  66. 66. Multiclass Logistic Regression. For multiclass recognition, choose a distribution over w and then make the parameters of this distribution a function of x. The softmax function maps real activations {ak} to numbers between zero and one that sum to one. The parameters are vectors {φk}. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 66
  67. 67. Multiclass Logistic Regression. The softmax function maps activations, which can take any real value, to parameters of a categorical distribution, which lie between 0 and 1. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 67
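A sketch of the softmax mapping; subtracting the maximum activation before exponentiating is a standard numerical-stability trick, not something stated on the slide. The outputs are positive and sum to one, so they can serve directly as the categorical parameters.

```python
import numpy as np

def softmax(a):
    # a: (K,) real activations a_k = phi_k^T x -> categorical parameters
    e = np.exp(a - np.max(a))        # shift by the max for numerical stability
    return e / np.sum(e)

print(softmax(np.array([2.0, -1.0, 0.5])))   # positive entries that sum to 1
```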
  68. 68. Multiclass Logistic Regression. To learn the model, maximize the log likelihood. There is no closed-form solution; learn with non-linear optimization. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 68
  69. 69. Structure • Logistic regression • Bayesian logistic regression • Non-linear logistic regression • Kernelization and Gaussian process classification • Incremental fitting, boosting and trees • Multi-class classification • Random classification trees • Non-probabilistic classification • Applications. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 69
  70. 70. Random classification tree. Key idea: • Binary tree • Randomly chosen function at each split • Choose the threshold t to maximize the log probability. For a given threshold, the parameters can be computed in closed form. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 70
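A sketch of choosing one split: draw a random linear split function, scan a grid of candidate thresholds, and score each by the log probability of the labels under the closed-form leaf parameters (the empirical class frequencies on each side of the split). The candidate-threshold grid and the smoothing constant are my simplifications.

```python
import numpy as np

def leaf_log_prob(w):
    # Closed-form leaf parameters: empirical class frequencies (w holds integer {0, 1} labels)
    counts = np.bincount(w, minlength=2).astype(float)
    lam = (counts + 1e-9) / (counts.sum() + 2e-9)
    return np.sum(counts * np.log(lam))

def choose_split(X, w, rng):
    alpha = rng.normal(size=X.shape[1])                      # randomly chosen split function
    proj = X @ alpha
    best = None
    for t in np.quantile(proj, np.linspace(0.1, 0.9, 9)):    # candidate thresholds
        score = leaf_log_prob(w[proj < t]) + leaf_log_prob(w[proj >= t])
        if best is None or score > best[0]:
            best = (score, alpha, t)
    return best[1], best[2]                                  # chosen direction and threshold
```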
  71. 71. Random classification tree. Related models: Fern • A tree where all of the functions at a level are the same • The threshold may be the same or different • Very efficient to implement. Forest • A collection of trees • Average the results to get a more robust answer • Similar to a 'Bayesian' approach – an average of models with different parameters. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 71
  72. 72. Structure • Logistic regression • Bayesian logistic regression • Non-linear logistic regression • Kernelization and Gaussian process classification • Incremental fitting, boosting and trees • Multi-class classification • Random classification trees • Non-probabilistic classification • Applications. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 72
  73. 73. Non-probabilistic classifiers. Most people use non-probabilistic classification methods such as neural networks, AdaBoost, and support vector machines. This is largely for historical reasons. Probabilistic approaches: • No serious disadvantages • Naturally produce estimates of uncertainty • Easily extensible to the multi-class case • Easily related to each other. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 73
  74. 74. Non-probabilistic classifiers. Multi-layer perceptron (neural network) • Non-linear logistic regression with sigmoid functions • Learning is known as backpropagation • The transformed variable z is the hidden layer. AdaBoost • Very closely related to LogitBoost • Performance is very similar. Support vector machines • Similar to relevance vector classification, but the objective function is convex • No measure of certainty • Not easily extended to the multi-class case • Produce solutions that are less sparse • More restrictions on the kernel function. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 74
  75. 75. Structure • Logistic regression • Bayesian logistic regression • Non-linear logistic regression • Kernelization and Gaussian process classification • Incremental fitting, boosting and trees • Multi-class classification • Random classification trees • Non-probabilistic classification • Applications. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 75
  76. 76. Gender Classification. Incremental logistic regression with 300 arc-tan basis functions. Results: 87.5% correct (humans: 95%). Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 76
  77. 77. Fast Face Detection (Viola and Jones 2001). Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 77
  78. 78. Computing Haar Features Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 78
  79. 79. Pedestrian Detection. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 79
  80. 80. Semantic segmentation. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 80
  81. 81. Recovering surface layout Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 81
  82. 82. Recovering body pose. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 82
