# 08 cv mil_regression


1. Computer vision: models, learning and inference, Chapter 8: Regression. ©2011 Simon J.D. Prince. Please send errata to s.prince@cs.ucl.ac.uk
2. Structure: • Linear regression • Bayesian solution • Non-linear regression • Kernelization and Gaussian processes • Sparse linear regression • Dual linear regression • Relevance vector regression • Applications
3. Models for machine vision
4. Body Pose Regression. Encode the silhouette as a 100x1 vector and the body pose as a 55x1 vector; learn the relationship between them.
5. Type 1: Model Pr(w|x) (discriminative). How to model Pr(w|x)? • Choose an appropriate form for Pr(w) • Make the parameters a function of x • The function takes parameters θ that define its shape. Learning algorithm: learn the parameters θ from training data x, w. Inference algorithm: just evaluate Pr(w|x).
6. Linear Regression. • For simplicity we assume that each dimension of the world is predicted separately. • Concentrate on predicting a univariate world state w. Choose a normal distribution over the world w and make: • the mean a linear function of the data x • the variance constant.
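The slide's equation does not survive in this transcript; in the book's notation the model reads (a reconstruction, not a verbatim copy of the slide):

```latex
% Normal over the world state w_i, mean linear in x_i, constant variance
Pr(w_i \,|\, \mathbf{x}_i, \boldsymbol{\theta}) =
  \mathrm{Norm}_{w_i}\!\left[\phi_0 + \boldsymbol{\phi}^{\mathsf{T}}\mathbf{x}_i,\; \sigma^2\right]
```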
7. Linear Regression
8. Neater Notation. To make the notation easier to handle, we: • attach a 1 to the start of every data vector • attach the offset φ0 to the start of the gradient vector φ, giving a new, more compact model.
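The compact form the slide arrives at can be reconstructed in the book's notation as:

```latex
\mathbf{x}_i \leftarrow \begin{bmatrix} 1 \\ \mathbf{x}_i \end{bmatrix}, \qquad
\boldsymbol{\phi} \leftarrow \begin{bmatrix} \phi_0 \\ \boldsymbol{\phi} \end{bmatrix}, \qquad
Pr(w_i \,|\, \mathbf{x}_i, \boldsymbol{\theta}) =
  \mathrm{Norm}_{w_i}\!\left[\boldsymbol{\phi}^{\mathsf{T}}\mathbf{x}_i,\; \sigma^2\right]
```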
9. Combining Equations. We have one equation for each x, w pair. The likelihood of the whole dataset is the product of these individual distributions and can be written compactly in matrix form.
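The matrix form referred to on the slide is, reconstructed in the book's notation:

```latex
Pr(\mathbf{w} \,|\, \mathbf{X}) =
  \mathrm{Norm}_{\mathbf{w}}\!\left[\mathbf{X}^{\mathsf{T}}\boldsymbol{\phi},\; \sigma^2\mathbf{I}\right],
\quad \text{where } \mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_I], \;\;
\mathbf{w} = [w_1, \ldots, w_I]^{\mathsf{T}}
```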
10. Learning. Maximum likelihood: substitute the model into the likelihood, take the derivative with respect to the parameters, set the result to zero, and re-arrange to obtain a closed-form solution.
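The closed-form maximum-likelihood solution (φ = (XXᵀ)⁻¹Xw, with the ML variance as the mean squared residual) can be sketched in Python. The synthetic data and parameter values below are illustrative assumptions, not from the slides:

```python
import numpy as np

def fit_linear_ml(X, w):
    """Maximum-likelihood fit of Pr(w|x) = Norm[phi^T x, sigma^2].

    X : (D, I) data matrix, one column per example, first row all ones.
    w : (I,) world states.
    """
    phi = np.linalg.solve(X @ X.T, X @ w)      # phi = (X X^T)^{-1} X w
    resid = w - X.T @ phi
    sigma_sq = resid @ resid / w.size          # ML estimate of the variance
    return phi, sigma_sq

# Illustrative use: recover known parameters from noisy synthetic data
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
X = np.vstack([np.ones_like(x), x])            # prepend the constant 1
w = 0.5 + 2.0 * x + rng.normal(0, 0.1, 200)    # true offset 0.5, slope 2.0
phi, sigma_sq = fit_linear_ml(X, w)
```

With 200 samples and noise standard deviation 0.1, the estimates land close to the generating values.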
11. (figure)
12. Regression Models
13. Structure (outline repeated)
14. Bayesian Regression. (We concentrate on φ and come back to σ² later.) The ingredients are the likelihood, the prior, and Bayes' rule.
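The three equations the slide labels can be reconstructed in the book's notation, assuming a spherical normal prior with variance σp² on φ:

```latex
\text{Likelihood:}\quad
  Pr(\mathbf{w}\,|\,\mathbf{X},\boldsymbol{\phi}) =
  \mathrm{Norm}_{\mathbf{w}}\!\left[\mathbf{X}^{\mathsf{T}}\boldsymbol{\phi},\; \sigma^2\mathbf{I}\right]
\qquad
\text{Prior:}\quad
  Pr(\boldsymbol{\phi}) = \mathrm{Norm}_{\boldsymbol{\phi}}\!\left[\mathbf{0},\; \sigma_p^2\mathbf{I}\right]
\qquad
\text{Bayes' rule:}\quad
  Pr(\boldsymbol{\phi}\,|\,\mathbf{X},\mathbf{w}) =
  \frac{Pr(\mathbf{w}\,|\,\mathbf{X},\boldsymbol{\phi})\,Pr(\boldsymbol{\phi})}{Pr(\mathbf{w}\,|\,\mathbf{X})}
```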
15. Posterior Distribution over Parameters.
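The posterior the slide shows has the standard Bayesian linear-regression form (a reconstruction; σp² denotes the prior variance):

```latex
Pr(\boldsymbol{\phi}\,|\,\mathbf{X},\mathbf{w}) =
  \mathrm{Norm}_{\boldsymbol{\phi}}\!\left[\tfrac{1}{\sigma^2}\mathbf{A}^{-1}\mathbf{X}\mathbf{w},\;
  \mathbf{A}^{-1}\right],
\qquad
\mathbf{A} = \tfrac{1}{\sigma^2}\mathbf{X}\mathbf{X}^{\mathsf{T}} + \tfrac{1}{\sigma_p^2}\mathbf{I}
```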
16. Inference
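A minimal sketch of Bayesian linear-regression inference: compute the posterior over φ, then the predictive mean and variance at a new point. The data, noise variance σ², and prior variance σp² below are illustrative assumptions:

```python
import numpy as np

def bayes_linear_fit(X, w, sigma_sq, sigma_p_sq):
    """Posterior over phi given prior phi ~ Norm[0, sigma_p_sq * I].

    X: (D, I) training data (one column per example), w: (I,) world states.
    """
    D = X.shape[0]
    A = X @ X.T / sigma_sq + np.eye(D) / sigma_p_sq
    A_inv = np.linalg.inv(A)
    phi_mean = A_inv @ X @ w / sigma_sq        # posterior mean of phi
    return phi_mean, A_inv

def predict(x_star, phi_mean, A_inv, sigma_sq):
    """Predictive distribution Norm[mu, var] for a new data vector x_star."""
    mu = x_star @ phi_mean
    var = x_star @ A_inv @ x_star + sigma_sq   # parameter + noise uncertainty
    return mu, var

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 300)
X = np.vstack([np.ones_like(x), x])
w = 0.5 + 2.0 * x + rng.normal(0, 0.1, 300)
phi_mean, A_inv = bayes_linear_fit(X, w, sigma_sq=0.01, sigma_p_sq=10.0)
mu, var = predict(np.array([1.0, 0.5]), phi_mean, A_inv, 0.01)
```

At x = 0.5 the true value is 0.5 + 2.0·0.5 = 1.5, and the predictive variance is slightly larger than the noise variance alone because of the remaining uncertainty in φ.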
17. Fitting Variance. • We fit the variance σ² with maximum likelihood. • Optimize the marginal likelihood (the likelihood after the gradient parameters φ have been integrated out).
18. Practical Issue. Problem: in high dimensions, the matrix A may be too big to invert. Solution: re-express using the matrix inversion lemma. In the final expression the inverses are I×I, not D×D.
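The matrix inversion lemma (Woodbury identity) behind this step can be checked numerically. The sizes and variances below are illustrative; the point is that the right-hand route only ever inverts an I×I matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
D, I = 500, 20                      # high-dimensional data, few examples
X = rng.normal(size=(D, I))
sigma_sq, sigma_p_sq = 0.1, 2.0

# Direct route: invert the D x D matrix A
A = X @ X.T / sigma_sq + np.eye(D) / sigma_p_sq
direct = np.linalg.inv(A)

# Woodbury route: only an I x I inverse is required
inner = np.linalg.inv(X.T @ X + (sigma_sq / sigma_p_sq) * np.eye(I))
woodbury = sigma_p_sq * (np.eye(D) - X @ inner @ X.T)

assert np.allclose(direct, woodbury)
```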
19. Structure (outline repeated)
20. Regression Models
21. Non-Linear Regression. GOAL: keep the maths of linear regression, but extend it to more general functions. KEY IDEA: you can make a non-linear function from a linearly weighted sum of non-linear basis functions.
22. Non-linear Regression. Linear regression becomes non-linear regression when the data x are replaced by a transformed vector z = f[x]. In other words, create z by evaluating x against the basis functions, then linearly regress w against z.
23. Example: Polynomial Regression. A special case of the non-linear model in which the basis functions are polynomial powers of x.
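Polynomial regression reuses the linear-regression machinery with Z in place of X. A sketch, assuming a cubic basis and illustrative synthetic data:

```python
import numpy as np

def poly_basis(x, degree=3):
    """Evaluate x against polynomial basis functions: z = [1, x, x^2, ...]."""
    return np.vstack([x**k for k in range(degree + 1)])   # (degree+1, I)

def fit_nonlinear_ml(x, w, basis=poly_basis):
    """Same maximum-likelihood fit as linear regression, with Z in place of X."""
    Z = basis(x)
    phi = np.linalg.solve(Z @ Z.T, Z @ w)
    return phi

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 200)
w = 1.0 - x + 0.5 * x**3 + rng.normal(0, 0.05, 200)       # cubic ground truth
phi = fit_nonlinear_ml(x, w)

z_star = poly_basis(np.array([0.5]))
pred = (z_star.T @ phi)[0]        # prediction at x = 0.5; truth is 0.5625
```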
24. Radial basis functions
25. Arc tan functions
26. Non-linear regression (slide repeated)
27. Non-linear regression (slide repeated)
28. Maximum Likelihood. Same as linear regression, but substitute Z for X.
29. Structure (outline repeated)
30. Regression Models
31. Bayesian Approach. Learn σ² from the marginal likelihood as before, then form the final predictive distribution.
32. (figure)
33. The Kernel Trick. Notice that the final equation doesn't need the data itself, only dot products between data items of the form ziᵀzj. So we take data xi and xj, pass them through the non-linear function to create zi and zj, and then take the dot product ziᵀzj.
34. The Kernel Trick. Key idea: define a "kernel" function that does all of this together: it • takes data xi and xj • returns the value of the dot product ziᵀzj. If we choose this function carefully, it corresponds to some underlying transformation z = f[x]. We never compute z explicitly; z can be very high or even infinite dimensional.
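The trick can be demonstrated concretely with a kernel whose feature map is known in closed form. For 2-D inputs, the polynomial kernel k(xi, xj) = (xiᵀxj + 1)² equals the dot product of explicit 6-D feature vectors; the specific test points are illustrative:

```python
import numpy as np

def poly_kernel(x_i, x_j):
    """Kernel returning z_i^T z_j without ever forming z: k = (x_i^T x_j + 1)^2."""
    return (x_i @ x_j + 1.0) ** 2

def explicit_z(x):
    """The 6-D feature map this kernel implicitly computes (2-D input)."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

x_i = np.array([0.3, -1.2])
x_j = np.array([2.0, 0.5])
k_val = poly_kernel(x_i, x_j)               # never touches z
dot_val = explicit_z(x_i) @ explicit_z(x_j)  # same number, via explicit z
```

For the RBF kernel of the next slides there is no finite z at all, yet the kernel value is still well defined.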
35. Gaussian Process Regression. Before and after kernelization: every dot product of transformed data ziᵀzj is replaced by a kernel evaluation k[xi, xj].
36. Example Kernels. (The RBF kernel is equivalent to having an infinite number of radial basis functions, one at every position in space. Wow!)
37. RBF Kernel Fits
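A minimal sketch of RBF-kernel regression using the standard Gaussian-process predictive equations (not necessarily the slide's exact parameterization); the length-scale, noise variance, and training function are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(a, b, lam=0.5):
    """RBF kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * lam**2))

def gp_predict(x_train, w_train, x_test, sigma_sq=0.01, lam=0.5):
    """Standard GP predictive mean and variance with an RBF kernel."""
    K = rbf_kernel(x_train, x_train, lam) + sigma_sq * np.eye(x_train.size)
    K_star = rbf_kernel(x_test, x_train, lam)
    alpha = np.linalg.solve(K, w_train)
    mean = K_star @ alpha
    # k(x*, x*) = 1 for the RBF kernel; subtract the explained variance
    var = 1.0 - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1) + sigma_sq
    return mean, var

x_train = np.linspace(-1, 1, 30)
w_train = np.sin(3 * x_train)
mean, var = gp_predict(x_train, w_train, x_train)
```

At the training points the predictive mean closely tracks the targets, and the predictive variance stays positive everywhere.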
38. Fitting Variance. • We fit the variance σ² with maximum likelihood. • Optimize the marginal likelihood (the likelihood after the gradient parameters have been integrated out). • This now requires non-linear optimization.
39. Structure (outline repeated)
40. Regression Models
41. Sparse Linear Regression. Perhaps not every dimension of the data x is informative. A sparse solution forces some of the coefficients in φ to be zero. Method: apply a different prior on φ that encourages sparsity: a product of t-distributions.
42. Sparse Linear Regression. Apply a product of t-distributions as the prior on the parameter vector. As before, we use the normal likelihood, but now the prior is not conjugate to it, so we cannot compute the posterior in closed form.
43. Sparse Linear Regression. To make progress, write the prior as the marginal of a joint distribution over φ and a diagonal matrix of hidden variables {hd}.
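The marginalization the slide describes is the standard scale-mixture construction of the t-distribution (a reconstruction; H is the diagonal matrix of hidden variables hd):

```latex
Pr(\boldsymbol{\phi}) = \int Pr(\boldsymbol{\phi}, \mathbf{h})\, d\mathbf{h}
  = \int \mathrm{Norm}_{\boldsymbol{\phi}}\!\left[\mathbf{0},\; \mathbf{H}^{-1}\right]
    \prod_{d=1}^{D} \mathrm{Gam}_{h_d}\!\left[\tfrac{\nu}{2}, \tfrac{\nu}{2}\right] d\mathbf{h},
\qquad
\mathbf{H} = \mathrm{diag}(h_1, \ldots, h_D)
```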
44. Sparse Linear Regression. Substituting in the prior, we still cannot compute the posterior exactly, but we can approximate it.
45. Sparse Linear Regression. To fit the model, alternate between updating the variance σ² and the hidden variables {hd}.
46. Sparse Linear Regression. After fitting, some of the hidden variables become very large. This implies that the prior is tightly fitted around zero, so the corresponding dimensions can be eliminated from the model.
47. Sparse Linear Regression. This doesn't work for the non-linear case: we need one hidden variable per dimension, which becomes intractable with a high-dimensional transformation. To solve this problem, we move to the dual model.
48. Structure (outline repeated)
49. Dual Linear Regression. KEY IDEA: the gradient φ is just a vector in the data space, so it can be represented as a weighted sum of the data points, φ = Xψ. Now solve for ψ: one parameter per training example.
50. Dual Linear Regression. Substituting the dual variables φ = Xψ into the original linear regression gives the dual linear regression model.
51. Maximum Likelihood. The maximum-likelihood solution in the dual variables ψ gives the same result as before once converted back to φ.
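A sketch of the noiseless maximum-likelihood case in the dual parameterization, assuming the I×I Gram matrix XᵀX is invertible; the dimensions and data are illustrative:

```python
import numpy as np

# Dual linear regression: phi = X psi, one parameter per training example.
# Illustrative high-dimensional, few-example setting (D >> I).
rng = np.random.default_rng(3)
D, I = 100, 10
X = rng.normal(size=(D, I))           # columns are training examples
w = rng.normal(size=I)                # world states

psi = np.linalg.solve(X.T @ X, w)     # dual solution: psi = (X^T X)^{-1} w
phi = X @ psi                         # recover the primal parameters

pred = X.T @ phi                      # training predictions
```

Note that only the I×I Gram matrix XᵀX is ever inverted, never a D×D matrix, and the training predictions Xᵀφ reproduce w exactly in this noiseless setting.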
52. Bayesian Case. Compute the distribution over the dual parameters ψ.
53. Bayesian Case. Predictive distribution. Notice that both the maximum-likelihood and the Bayesian cases depend on the data only through dot products XᵀX, so they can be kernelized!
54. Structure (outline repeated)
55. Regression Models
56. Relevance Vector Machine. Combines two ideas: • dual regression (one parameter per training example) • sparsity (most of the parameters are zero). That is, a model that depends only sparsely on the training data.
57. Relevance Vector Machine. Using the same approximations as for the sparse model, we get a problem that is solved by updating the variance σ² and the hidden variables {hd} alternately. Notice that this only depends on dot products, and so it can be kernelized.
58. Structure (outline repeated)
59. Body Pose Regression (Agarwal and Triggs 2006). Encode the silhouette as a 100x1 vector and the body pose as a 55x1 vector; learn the relationship between them.
60. Shape Context. Returns a 60x1 vector for each of 400 points around the silhouette.
61. Dimensionality Reduction. Cluster the 60-D descriptor space (based on all training data) into 100 vectors. Assign each 60x1 vector to the closest cluster (a Voronoi partition). The final data vector is a 100x1 histogram over the distribution of assignments.
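The vector-quantization step can be sketched as follows. The random cluster centers stand in for the output of a clustering step (e.g. k-means) run on all training descriptors; the function and variable names are illustrative:

```python
import numpy as np

def vq_histogram(descriptors, centers):
    """Vector-quantize descriptors and return a normalized assignment histogram.

    descriptors: (N, 60) shape-context vectors for one silhouette.
    centers:     (100, 60) cluster centers learned from all training data.
    """
    # Squared distance from every descriptor to every center
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)                    # closest cluster per point
    hist = np.bincount(assignments, minlength=centers.shape[0])
    return hist / hist.sum()                           # 100-bin data vector

rng = np.random.default_rng(4)
centers = rng.normal(size=(100, 60))     # stand-in for clustering output
descriptors = rng.normal(size=(400, 60)) # 400 points around one silhouette
h = vq_histogram(descriptors, centers)
```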
62. Results. • 2636 training examples; the solution depends on only 6% of these. • Average error of 6 degrees.
63. Displacement experts
64. Regression. • Not actually used much in vision. • But the main ideas all apply to classification: non-linear transformations, kernelization, dual parameters, sparse priors.