1. Computer vision: models, learning and inference. Chapter 8: Regression. Please send errata to s.prince@cs.ucl.ac.uk
2. Structure
• Linear regression
• Bayesian solution
• Non-linear regression
• Kernelization and Gaussian processes
• Sparse linear regression
• Dual linear regression
• Relevance vector regression
• Applications
3. Models for machine vision
4. Body Pose Regression
Encode the silhouette as a 100×1 vector and the body pose as a 55×1 vector; learn the relationship between them.
5. Type 1: Model Pr(w|x) - Discriminative
How to model Pr(w|x)?
– Choose an appropriate form for Pr(w)
– Make the parameters a function of x
– The function takes parameters θ that define its shape
Learning algorithm: learn the parameters θ from training data x,w
Inference algorithm: just evaluate Pr(w|x)
6. Linear Regression
• For simplicity we will assume that each dimension of the world is predicted separately.
• Concentrate on predicting a univariate world state w.
Choose a normal distribution over the world w. Make
• the mean a linear function of the data x
• the variance constant
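The slide's equations are images; in the book's notation the model being set up is presumably:

```latex
Pr(w_i \mid \mathbf{x}_i, \boldsymbol{\theta}) =
  \mathrm{Norm}_{w_i}\!\left[\phi_0 + \boldsymbol{\phi}^{\mathsf{T}}\mathbf{x}_i,\; \sigma^2\right]
```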
7. Linear Regression
8. Neater Notation
To make notation easier to handle, we
• attach a 1 to the start of every data vector
• attach the offset φ₀ to the start of the gradient vector φ
New model: Pr(wᵢ|xᵢ) = Norm[φᵀxᵢ, σ²]
9. Combining Equations
We have one equation for each x,w pair. The likelihood of the whole dataset is the product of these individual distributions and can be written as a single normal distribution over the vector of world states w, where the matrix X collects the data vectors as its columns.
10. Learning
Maximum likelihood: substitute in the normal likelihood, take the derivative with respect to φ, set the result to zero and re-arrange to solve for φ.
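A concrete sketch of slide 10's closed-form solution, with the formulas reconstructed from the surrounding slides (X holds the 1-augmented data vectors as columns, w is the vector of world states, and the data here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
I, D = 50, 3                                   # training examples, data dimensions
X = np.vstack([np.ones((1, I)),                # attach a 1 to every data vector (slide 8)
               rng.standard_normal((D, I))])   # columns are the data vectors x_i
w = rng.standard_normal(I)                     # world states

phi = np.linalg.solve(X @ X.T, X @ w)          # phi = (X X^T)^{-1} X w
resid = w - X.T @ phi
sigma2 = resid @ resid / I                     # ML estimate of the variance
```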
12. Regression Models
13. Structure • Linear regression • Bayesian solution • Non-linear regression • Kernelization and Gaussian processes • Sparse linear regression • Dual linear regression • Relevance vector regression • Applications
14. Bayesian Regression
(We concentrate on φ and come back to σ² later!)
Combine the normal likelihood with a normal prior over φ using Bayes' rule.
15. Posterior Distribution over Parameters
The posterior over φ is normal, with mean and covariance determined by the quantities sketched below.
16. Inference
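A sketch of slides 15-16 under the usual assumptions (spherical normal prior Norm(0, σ_p²·I) on φ); the exact slide equations were images, so these are the standard results rather than a transcription:

```python
import numpy as np

rng = np.random.default_rng(1)
D, I = 4, 30
X = np.vstack([np.ones((1, I)), rng.standard_normal((D - 1, I))])
w = rng.standard_normal(I)
sig2, sig2_p = 0.5, 1.0                       # noise and prior variances

A = X @ X.T / sig2 + np.eye(D) / sig2_p       # posterior precision matrix
A_inv = np.linalg.inv(A)
phi_mean = A_inv @ X @ w / sig2               # posterior mean over phi (slide 15)

x_star = np.concatenate([[1.0], rng.standard_normal(D - 1)])
w_mean = x_star @ phi_mean                    # predictive mean (slide 16)
w_var = x_star @ A_inv @ x_star + sig2        # predictive variance
```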
17. Fitting Variance
• We'll fit the variance σ² with maximum likelihood
• Optimize the marginal likelihood (the likelihood after the gradients φ have been integrated out)
18. Practical Issue
Problem: in high dimensions, the D×D matrix A may be too big to invert.
Solution: re-express using the Matrix Inversion Lemma. In the final expression the inverses are I×I, not D×D.
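A numerical check of the rewrite (a sketch; the constant σ²/σ_p² inside the I×I inverse follows from the prior and noise variances assumed above):

```python
import numpy as np

rng = np.random.default_rng(2)
D, I = 200, 15                                 # many dimensions, few examples
X = rng.standard_normal((D, I))
sig2, sig2_p = 0.5, 1.0

# Direct form: a D x D inverse of A = X X^T / sig2 + I_D / sig2_p
A_inv_direct = np.linalg.inv(X @ X.T / sig2 + np.eye(D) / sig2_p)

# Matrix Inversion Lemma form: only an I x I inverse is required
M = np.linalg.inv(X.T @ X + (sig2 / sig2_p) * np.eye(I))
A_inv_lemma = sig2_p * (np.eye(D) - X @ M @ X.T)

assert np.allclose(A_inv_direct, A_inv_lemma)
```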
19. Structure • Linear regression • Bayesian solution • Non-linear regression • Kernelization and Gaussian processes • Sparse linear regression • Dual linear regression • Relevance vector regression • Applications
20. Regression Models
21. Non-Linear Regression
GOAL: keep the maths of linear regression, but extend it to more general functions.
KEY IDEA: you can make a non-linear function from a linearly weighted sum of non-linear basis functions.
22. Non-linear regression
Linear regression becomes non-linear regression by replacing x with z = f[x]. In other words, create z by evaluating x against basis functions, then linearly regress w against z.
23. Example: polynomial regression
A special case of the model above, in which the basis functions are the powers 1, x, x², ...
24. Radial basis functions
25. Arc Tan Functions
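A sketch of the three families of basis functions on slides 23-25 (the centers and the scale λ are illustrative choices, not values from the deck):

```python
import numpy as np

def poly_basis(x, degree=3):
    # z = [1, x, x^2, ..., x^degree] (slide 23)
    return np.array([x ** k for k in range(degree + 1)])

def rbf_basis(x, centers, lam=1.0):
    # z_k = exp(-(x - alpha_k)^2 / (2 lambda^2)) (slide 24)
    return np.exp(-(x - centers) ** 2 / (2 * lam ** 2))

def arctan_basis(x, centers, lam=1.0):
    # z_k = arctan(lambda * (x - alpha_k)) (slide 25)
    return np.arctan(lam * (x - centers))

centers = np.linspace(-3, 3, 7)                # alpha_1 ... alpha_7
z = rbf_basis(0.5, centers)                    # then regress w against z
```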
26. Non-linear regression
Create z by evaluating x against basis functions, then linearly regress w against z.
27. Non-linear regression
Create z by evaluating x against basis functions, then linearly regress w against z.
28. Maximum Likelihood
Same as linear regression, but with Z substituted for X.
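For example, the non-linear ML fit with the RBF basis might look like this (a sketch with made-up training data):

```python
import numpy as np

rng = np.random.default_rng(3)
x_train = rng.uniform(-3, 3, size=40)
w_train = np.sin(x_train) + 0.1 * rng.standard_normal(40)

centers = np.linspace(-3, 3, 7)
Z = np.exp(-(x_train[None, :] - centers[:, None]) ** 2 / 2.0)  # K x I design matrix
Z = np.vstack([np.ones((1, Z.shape[1])), Z])                   # attach the 1s again
phi = np.linalg.solve(Z @ Z.T, Z @ w_train)                    # (Z Z^T)^{-1} Z w
```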
29. Structure • Linear regression • Bayesian solution • Non-linear regression • Kernelization and Gaussian processes • Sparse linear regression • Dual linear regression • Relevance vector regression • Applications
30. Regression Models
31. Bayesian Approach
Learn σ² from the marginal likelihood as before. The final predictive distribution has the same form as in the linear case, with Z substituted for X.
33. The Kernel Trick
Notice that the final equation doesn't need the data itself, only dot products between data items of the form z_iᵀz_j. So we take data x_i and x_j, pass them through the non-linear function to create z_i and z_j, and then take dot products z_iᵀz_j.
34. The Kernel Trick
Key idea: define a "kernel" function that does all of this together. It
• takes data x_i and x_j
• returns a value for the dot product z_iᵀz_j
If we choose this function carefully, it will correspond to some underlying z = f[x]. We never compute z explicitly; it can be very high or even infinite dimensional.
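A minimal example: the RBF kernel of slide 36, which returns z_iᵀz_j for an implicit z that is infinite dimensional (the length-scale λ is an assumed parameter):

```python
import numpy as np

def rbf_kernel(xi, xj, lam=1.0):
    # k(x_i, x_j) = exp(-(x_i - x_j)^T (x_i - x_j) / (2 lambda^2))
    d = np.atleast_1d(xi) - np.atleast_1d(xj)
    return np.exp(-(d @ d) / (2 * lam ** 2))
```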
35. Gaussian Process Regression
Before kernelization the predictive distribution is written in terms of dot products ZᵀZ; after, each of these is replaced by the kernel function.
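A sketch of the kernelized predictive computation, derived from the Bayesian equations above with ZᵀZ replaced by the kernel matrix (the prior variance σ_p² on the weights is an assumption carried over from the earlier slides):

```python
import numpy as np

def gp_predict(x_star, X, w, lam=1.0, sig2=0.1, sig2_p=1.0):
    # K[X, X] and K[x*, X] via the RBF kernel, for scalar inputs
    K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * lam ** 2))
    k_star = np.exp(-(x_star - X) ** 2 / (2 * lam ** 2))
    M = np.linalg.inv(K + (sig2 / sig2_p) * np.eye(len(X)))
    mean = k_star @ M @ w                              # predictive mean
    var = sig2_p * (1.0 - k_star @ M @ k_star) + sig2  # predictive variance
    return mean, var
```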
36. Example Kernels
(The RBF kernel is equivalent to having an infinite number of radial basis functions at every position in space. Wow!)
37. RBF Kernel Fits
38. Fitting Variance
• We'll fit the variance σ² with maximum likelihood
• Optimize the marginal likelihood (the likelihood after the gradients have been integrated out)
• We now have to use non-linear optimization
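A sketch of that optimization: under the model the marginal likelihood of the training worlds is w ~ Norm(0, σ_p²·K + σ²·I), and σ² can be fit by one-dimensional non-linear optimization (minimize_scalar and the log parameterization are implementation choices, not from the deck):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_marginal(log_sig2, K, w, sig2_p=1.0):
    # negative log of Norm_w[0, sig2_p * K + sig2 * I], up to a constant
    C = sig2_p * K + np.exp(log_sig2) * np.eye(len(w))
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + w @ np.linalg.solve(C, w))

# usage, given the kernel matrix K of the training data:
# res = minimize_scalar(neg_log_marginal, args=(K, w))
# sig2 = np.exp(res.x)
```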
39. Structure • Linear regression • Bayesian solution • Non-linear regression • Kernelization and Gaussian processes • Sparse linear regression • Dual linear regression • Relevance vector regression • Applications
40. Regression Models
41. Sparse Linear Regression
Perhaps not every dimension of the data x is informative. A sparse solution forces some of the coefficients in φ to be zero.
Method: apply a different prior on φ that encourages sparsity, namely a product of t-distributions.
42. Sparse Linear Regression
Apply a product of t-distributions to the parameter vector φ. As before, we use the normal likelihood. Now the prior is not conjugate to the normal likelihood, so we cannot compute the posterior in closed form.
43. Sparse Linear Regression
To make progress, write the t-distribution as the marginal of a joint distribution over φ and hidden variables {h_d}; the prior covariance becomes a diagonal matrix with the hidden variables {h_d} on its diagonal.
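The identity being used here is presumably the standard scale-mixture representation of the t-distribution:

```latex
\mathrm{Stud}_{\phi_d}\!\left[0, 1, \nu\right]
  = \int \mathrm{Norm}_{\phi_d}\!\left[0, 1/h_d\right]\,
         \mathrm{Gam}_{h_d}\!\left[\nu/2, \nu/2\right] \, dh_d
```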
44. Sparse Linear Regression
Substituting in the prior, we still cannot compute the posterior exactly, but we can approximate it.
45. Sparse Linear Regression
To fit the model, update the variance σ² and the hidden variables {h_d}, choosing each in turn to maximize the approximate marginal likelihood.
46. Sparse Linear Regression
After fitting, some of the hidden variables become very big. This implies that the prior is tightly fitted around zero, and the corresponding dimensions can be eliminated from the model.
47. Sparse Linear Regression
This doesn't work for the non-linear case, as we need one hidden variable per dimension, which becomes intractable with a high-dimensional transformation. To solve this problem, we move to the dual model.
48. Structure • Linear regression • Bayesian solution • Non-linear regression • Kernelization and Gaussian processes • Sparse linear regression • Dual linear regression • Relevance vector regression • Applications
49. Dual Linear Regression
KEY IDEA: the gradient φ is just a vector in the data space, so it can be represented as a weighted sum of the data points, φ = Xψ.
Now solve for ψ: one parameter per training example.
50. Dual Linear Regression
Original linear regression models w in terms of φ; substituting the dual variables φ = Xψ gives dual linear regression, a model in ψ.
51. Maximum Likelihood
The maximum likelihood solution for the dual variables ψ gives the same result as before once φ = Xψ is substituted back.
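A sketch of the dual fit (when D is much larger than I, the I×I system below is the cheap one; the recovered φ matches the primal solution whenever both are well defined):

```python
import numpy as np

rng = np.random.default_rng(4)
D, I = 100, 20                                 # more dimensions than examples
X = rng.standard_normal((D, I))
w = rng.standard_normal(I)

psi = np.linalg.solve(X.T @ X, w)              # psi = (X^T X)^{-1} w, an I x I system
phi = X @ psi                                  # back to the original parameters
```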
52. Bayesian Case
Compute the distribution over the dual parameters ψ; Bayes' rule gives a normal posterior.
53. Bayesian Case
Predictive distribution as above. Notice that both the maximum likelihood and Bayesian cases depend on the data only through dot products XᵀX. They can be kernelized!
54. Structure • Linear regression • Bayesian solution • Non-linear regression • Kernelization and Gaussian processes • Sparse linear regression • Dual linear regression • Relevance vector regression • Applications
55. Regression Models
56. Relevance Vector Machine
Combines the ideas of
• dual regression (one parameter per training example)
• sparsity (most of the parameters are zero)
i.e., a model that depends only sparsely on the training data.
57. Relevance Vector Machine
Using the same approximations as for the sparse model, we get a similar fitting problem. To solve it, update the variance σ² and the hidden variables {h_d} alternately. Notice that this only depends on dot products, and so it can be kernelized.
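A hedged sketch of the RVM objective: marginalizing the dual weights gives w ~ Norm(0, K·diag(1/h)·K + σ²·I). The slides alternate closed-form updates; here the same marginal likelihood is simply maximized numerically, which exposes the same sparsity (a large h_d prunes training example d):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal(params, K, w):
    # params = [log h_1 ... log h_I, log sig2]
    log_h, log_sig2 = params[:-1], params[-1]
    C = K @ np.diag(np.exp(-log_h)) @ K + np.exp(log_sig2) * np.eye(len(w))
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + w @ np.linalg.solve(C, w))

# usage, with K the I x I kernel matrix of the training data:
# res = minimize(neg_log_marginal, np.zeros(len(w) + 1), args=(K, w))
# h, sig2 = np.exp(res.x[:-1]), np.exp(res.x[-1])
```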
58. Structure • Linear regression • Bayesian solution • Non-linear regression • Kernelization and Gaussian processes • Sparse linear regression • Dual linear regression • Relevance vector regression • Applications
59. Body Pose Regression (Agarwal and Triggs 2006)
Encode the silhouette as a 100×1 vector and the body pose as a 55×1 vector; learn the relationship between them.
60. Shape Context
Returns a 60×1 vector for each of 400 points around the silhouette.
61. Dimensionality Reduction
Cluster the 60-D shape-context space (based on all training data) into 100 vectors. Assign each 60×1 vector to the closest cluster (a Voronoi partition). The final data vector is a 100×1 histogram over the distribution of assignments.
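A sketch of this vector-quantization step (k-means and the toy data stand in for details the slide doesn't give):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(5)
train_descriptors = rng.standard_normal((10000, 60))   # all training shape contexts
centers, _ = kmeans2(train_descriptors, 100, seed=0)   # 100 cluster centers in 60-D

def encode(descriptors):
    # descriptors: (400, 60) shape contexts for one silhouette
    d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assignments = d.argmin(axis=1)                     # nearest center (Voronoi)
    return np.bincount(assignments, minlength=100) / len(assignments)
```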
62. Results
• 2636 training examples; the solution depends on only 6% of these
• 6 degree average error
63. Displacement experts
64. Regression
• Not actually used much in vision
• But the main ideas all apply to classification:
– non-linear transformations
– kernelization
– dual parameters
– sparse priors