Tracking in Video using a Time-Series Transformation Learning approach

    Tracking in Video using a Time-Series Transformation Learning approach - Presentation Transcript

    1. Tracking in Video using a Time-Series Transformation Learning Approach
       Harsh Sharma, Dept. of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign
       Group Meeting: September 25, 2007
    2. - Learning Appearance Manifolds from Video. CVPR 2005. Ali Rahimi, Ben Recht, Trevor Darrell (MIT CS and AI Lab). http://www.csail.mit.edu/~rahimi/papers/cvpr2005.pdf
       - Learning to Transform Time Series with a Few Examples. Ali Rahimi (MIT CS and AI Lab). PhD thesis, February 2005.
       - Learning to Transform Time Series with a Few Examples. Ali Rahimi, Ben Recht (MIT CS and AI Lab). To appear in IEEE Trans. on Pattern Analysis and Machine Intelligence (October 2007).
    3. Outline
       - Introduction
         - The Problem at hand
         - How we can solve the tracking problem
         - Time-Series Transformation Learning based Tracking
       - Background
         - Some Notation
         - Reproducing Kernel Hilbert Spaces
         - Nonlinear Regression with Tikhonov Regularization
       - Semi-Supervised Nonlinear Regression with Dynamics
         - Using unlabeled training data
         - Learning the optimal Mapping
       - Results
         - Lip Tracking
         - Arm Tracking
       - Some issues not covered in the paper
    7. Tracking in Video Sequences
       - Objective: obtain the pose of a target from a series of observations (sensor measurements).
       - Essentially, this means transforming one time series into another.
    10. What we know already...
        - Changes in the appearance of a scene are governed by a low-dimensional, time-varying physical process.
        - Articulated-body tracking: motion of limbs.
        - Lip tracking: change in lip shape/contour configuration.
    11. Lip Tracking...
    12. Articulated-Body Tracking...
    19. Approaches to Tracking
        1. Nonlinear System Identification
           - Estimate the dynamics of low-dimensional states and simultaneously learn mappings from states to observations (images/video frames).
           - Involves estimating state-transition and state-specific observation-generating functions for a generative model.
        2. Manifold Learning
           - Perform nonlinear dimensionality reduction to recover low-dimensional representations of images while preserving their geometric attributes in the high-dimensional space.
        The underlying idea (a low-dimensional representation of the observations) is the same in both; nonlinear system identification additionally models the dynamics in the low-dimensional space explicitly.
    26. Issues with these Methods!
        1. Nonlinear System Identification
           - Computationally intensive: function representations such as MLPs do not scale to image-sized observations.
           - Optimization methods (for finding the MAP estimate of the observation function) are prone to local optima.
        2. Manifold Learning
           - Temporal coherence between adjacent samples of the input time series is ignored, even though it provides useful information about the manifold's neighborhood structure and local geometry.
           - Sparsely sampled manifolds can make neighboring points hard to identify and can cause failure to recover meaningful geometric attributes.
    32. An important video-specific issue
        - Video sequences contain a huge number of images, so labeling them (to train reliable tracking systems) requires huge amounts of resources.
        - Given enough input-output examples, nonlinear regression techniques can learn and represent any smooth mapping.
        - "Enough" for video sequences is unfortunately very large.
        - Another approach: utilize unlabeled data... but how?
    37. Proposed Approach
        - Main contribution: a nonlinear regression model with an RBF kernel representation for the mapping, augmented with a prior on the dynamics of the output.
        - The prior makes unlabeled data usable (one can think of it as semi-supervised manifold learning).
    39. Proposed Approach
        - Main contribution: a nonlinear regression model with an RBF kernel representation for the mapping, augmented with a prior on the dynamics of the output.
        - This results in a different nonlinear system identification algorithm: observations are mapped to states rather than vice versa, which provides computational advantages over existing approaches (see Assumption below).
    43. Proposed Approach
        - Main contribution: a nonlinear regression model with an RBF kernel representation for the mapping, augmented with a prior on the dynamics of the output.
        - RBF kernel representation: the optimization problem is quadratic in the states and the mapping parameters, hence computationally tractable, not subject to local optima, and scalable to high-dimensional observations.
    44. Proposed Approach
        - Main contribution: a nonlinear regression model with an RBF kernel representation for the mapping, augmented with a prior on the dynamics of the output.
        - Assumption: the dynamics of the output time series (low-dimensional states) are linear and Gaussian.
    46. Function Fitting
        - Input time series: $X = \{x_t\}_{t=1}^{T}$, with $x_t \in \mathbb{R}^M$.
        - Output time series: $Y = \{y_t\}_{t=1}^{T}$, with $y_t \in \mathbb{R}^N$.
        - Example: articulated motion tracking, where $M \approx 10^6$ and $N \approx 20$.
        - Notational convenience: sets of vectors such as $X$ also denote the matrices obtained by stacking the constituent vectors horizontally.
    47. Function Fitting
        - Goal: given a set of input-output examples $\{x_i, y_i\}_{i=1}^{T}$, find a function $f : \mathbb{R}^M \to \mathbb{R}^N$.
        - Note: unlabeled training tokens are not considered at the moment.
    48. Function Fitting
        Finding the best $f$:
        - Define a loss function $V(y, z)$ between output labels.
        - To account for the possibly ill-posed nature of the problem, add a regularizer $P(f)$.
        The optimization problem can then be stated as
        \[ \min_{f} \sum_{i=1}^{T} V(f(x_i), y_i) + \lambda \cdot P(f) \tag{1} \]
    49. Function Fitting
        Form of the mapping $f$: here, an RBF kernel representation
        \[ f_\theta(x) = \sum_{j=1}^{J} \theta_j \, k(x, \xi_j) \tag{2} \]
        - $\{\xi_j\}_{j=1}^{J}$: pre-specified kernel centers, $\xi_j \in \mathbb{R}^M$.
        - $k : \mathbb{R}^M \times \mathbb{R}^M \to \mathbb{R}$: a function of the Euclidean distance between its arguments.
        - $\{\theta_j\}_{j=1}^{J}$: parameters of $f$, $\theta_j \in \mathbb{R}^N$.
        - With a quadratic loss $V$ in equation (1), estimating $\theta$ is a least-squares problem.
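The least-squares fit of equation (2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian RBF, stores one example per row (the slides stack vectors as columns), and adds a tiny ridge term purely for numerical stability. The function names and the toy data are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian RBF k(a, b) = exp(-||a - b||^2 / sigma^2) for every row pair."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)

def fit_rbf(X, Y, centers, sigma, ridge=1e-8):
    """Least-squares theta (J x N) for f(x) = sum_j theta_j k(x, xi_j).

    X is T x M and Y is T x N, one example per row. The small ridge term is
    only a numerical safeguard (an assumption of this sketch).
    """
    Phi = gaussian_kernel(X, centers, sigma)               # T x J design matrix
    gram = Phi.T @ Phi + ridge * np.eye(centers.shape[0])  # J x J normal equations
    return np.linalg.solve(gram, Phi.T @ Y)

def predict_rbf(X_query, centers, theta, sigma):
    """Evaluate f_theta at each row of X_query."""
    return gaussian_kernel(X_query, centers, sigma) @ theta

# Toy usage: fit one period of a sine with J = 10 pre-specified centers.
X = np.linspace(0.0, 1.0, 50)[:, None]
Y = np.sin(2.0 * np.pi * X)
centers = np.linspace(0.0, 1.0, 10)[:, None]
theta = fit_rbf(X, Y, centers, sigma=0.2)
mse = float(np.mean((predict_rbf(X, centers, theta, sigma=0.2) - Y) ** 2))
print(f"training MSE: {mse:.2e}")
```

Because the centers are fixed in advance, the only unknowns are the weights, which is why the problem stays linear in the parameters.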
    51. Reproducing Kernel Hilbert Spaces how to choose the stabilizer P (f )?
    52. Reproducing Kernel Hilbert Spaces
        Moore-Aronszajn Theorem: Let $T$ be an index set. To every positive definite function $K$ on $T \times T$ there corresponds a unique RKHS $\mathcal{H}_K$ of real-valued functions on $T$, and vice versa. Letting $\langle \cdot, \cdot \rangle_{\mathcal{H}_K}$ denote the associated inner product, we have for every $f \in \mathcal{H}_K$ and every $t \in T$,
        \[ \langle K(t, \cdot), f \rangle_{\mathcal{H}_K} = f(t). \]
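    56. Reproducing Kernel Hilbert Spaces
        Moore-Aronszajn Theorem: implications for $k : \mathbb{R}^M \times \mathbb{R}^M \to \mathbb{R}$
        - Every positive definite kernel $k$ from $\mathbb{R}^M \times \mathbb{R}^M$ to $\mathbb{R}$ defines an inner product on bounded functions (domain: some compact subset of $\mathbb{R}^M$, range: $\mathbb{R}$).
        - The inner product is defined to satisfy the Reproducing Property $\langle k(x, \cdot), f(\cdot) \rangle_{\mathcal{H}_k} = f(x)$.
        - The norm is defined as $\|f\|_{\mathcal{H}_k} = \sqrt{\langle f, f \rangle_{\mathcal{H}_k}}$.
    57. Reproducing Kernel Hilbert Spaces
        By Mercer's Theorem (not stated here), $k$ has a countable representation on a compact domain:
        \[ k(x_1, x_2) = \sum_{i=1}^{\infty} \lambda_i \, \phi_i(x_1) \, \phi_i(x_2) \]
        where the functions $\phi_i$ are linearly independent.
    58. Reproducing Kernel Hilbert Spaces
        Mercer's Theorem + Reproducing Property ⇒ $\{\phi_i(\cdot)\}$ is a countable basis for $\mathcal{H}_k$:
        \[ f(x) = \langle f(\cdot), k(x, \cdot) \rangle = \Big\langle f(\cdot), \sum_{i=1}^{\infty} \lambda_i \phi_i(\cdot) \phi_i(x) \Big\rangle = \sum_{i=1}^{\infty} \phi_i(x) \, \lambda_i \langle f(\cdot), \phi_i(\cdot) \rangle = \sum_{i=1}^{\infty} \phi_i(x) \, c_i \tag{3} \]
        where $c_i = \lambda_i \langle f(\cdot), \phi_i(\cdot) \rangle$ are the coefficients of $f$ in the basis defined by the $\phi_i$.
    59. Reproducing Kernel Hilbert Spaces
        Linear Independence + Reproducing Property ⇒ the $\phi_i(\cdot)$ are mutually orthogonal (orthonormal after rescaling by $\sqrt{\lambda_i}$):
        \[ \phi_j(x) = \langle \phi_j(\cdot), k(x, \cdot) \rangle = \sum_{i=1}^{\infty} \phi_i(x) \, \lambda_i \langle \phi_j(\cdot), \phi_i(\cdot) \rangle \;\Rightarrow\; \langle \phi_i, \phi_j \rangle = \delta_{ij} / \lambda_i \]
    60. Reproducing Kernel Hilbert Spaces
        Then,
        \[ \|f\|_{\mathcal{H}_k}^2 = \langle f, f \rangle_{\mathcal{H}_k} = \Big\langle \sum_{i=1}^{\infty} \phi_i(x) c_i, \sum_{i=1}^{\infty} \phi_i(x) c_i \Big\rangle = \sum_{i,j} c_i c_j \langle \phi_i, \phi_j \rangle = \sum_{i} c_i^2 / \lambda_i \tag{4} \]
    61. Reproducing Kernel Hilbert Spaces
        For a Gaussian kernel, $k(x_1, x_2) = \exp\left(-\|x_1 - x_2\|^2 / \sigma_k^2\right)$:
        - The $\phi_i$ are sinusoidal, and the $\lambda_i$ are positive and decay with increasing $i$.
        - Hence $\|f\|_{\mathcal{H}_k}$ under this kernel penalizes the high-frequency content of $f$ more than the low-frequency content.
        - Hence $P(f) = \|f\|_{\mathcal{H}_k}^2$ will favor smoother $f$'s.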
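The smoothness claim can be checked numerically with a small sketch. It relies on the standard fact that a kernel expansion $f = \sum_i c_i k(\cdot, x_i)$ has squared RKHS norm $c^\top K c$. The grid, kernel width, jitter value and the two test signals are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_gram(x, sigma):
    """Gram matrix K[i, j] = exp(-(x_i - x_j)^2 / sigma^2) for 1-D inputs."""
    diff = x[:, None] - x[None, :]
    return np.exp(-diff**2 / sigma**2)

def rkhs_norm_sq(K, y, jitter=1e-6):
    """Squared RKHS norm c^T K c of the kernel expansion with (K + jitter I) c = y.

    The jitter keeps the solve well behaved; it is an assumption of this sketch.
    """
    c = np.linalg.solve(K + jitter * np.eye(len(y)), y)
    return float(c @ K @ c)

x = np.linspace(0.0, 1.0, 40)
K = gaussian_gram(x, sigma=0.1)

y_low = np.sin(2.0 * np.pi * x)         # one cycle over the interval
y_high = np.sin(2.0 * np.pi * 8.0 * x)  # eight cycles over the interval

norm_low = rkhs_norm_sq(K, y_low)
norm_high = rkhs_norm_sq(K, y_high)
print(f"low-frequency norm^2:  {norm_low:.3g}")
print(f"high-frequency norm^2: {norm_high:.3g}")
```

Both signals have the same amplitude, so any difference in the two norms comes from how the kernel weights their frequency content.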
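    64. the Tikhonov problem
        With $f : \mathbb{R}^M \to \mathbb{R}^N$ written componentwise as $f(x) = [f^1(x), \ldots, f^N(x)]$, solve for each component $d$:
        \[ \min_{f^d} \sum_{i=1}^{T} V\big(f^d(x_i), y_i^d\big) + \lambda_k \|f^d\|_{\mathcal{H}_k}^2 \tag{5} \]
        Representer Theorem: for an RKHS norm and an RBF kernel, the minimizer of (5) is representable as
        \[ f^d(x) = \sum_{i=1}^{T} c_i^d \, k(x, x_i) \tag{6} \]
    65. the Tikhonov problem
        Then, for a quadratic loss function $V(x, y) = (x - y)^2$:
        \[ \min_{c^d} \|K c^d - y^d\|^2 + \lambda_k \cdot c^{d\top} K c^d \tag{7} \]
        where $K = [k(x_i, x_j)]_{T \times T}$, $c^d = [c_1^d \cdots c_T^d]^\top$ and $y^d = [y_1^d \cdots y_T^d]^\top$.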
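Setting the gradient of (7) to zero gives $K\big((K + \lambda_k I)c^d - y^d\big) = 0$, so $c^d = (K + \lambda_k I)^{-1} y^d$ is a minimizer. The sketch below solves that system for one output coordinate and evaluates the objective; the Gaussian kernel, the function names and the random toy data are assumptions for illustration.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / sigma^2) over rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / sigma**2)

def tikhonov_coefficients(K, y, lam_k):
    """Solve (K + lam_k I) c = y, which makes the gradient of (7) vanish."""
    return np.linalg.solve(K + lam_k * np.eye(K.shape[0]), y)

def tikhonov_objective(K, y, c, lam_k):
    """Value of ||K c - y||^2 + lam_k * c^T K c."""
    r = K @ c - y
    return float(r @ r + lam_k * (c @ K @ c))

# Toy usage with T = 25 random inputs in R^3 and one output coordinate.
rng = np.random.default_rng(0)
X = rng.standard_normal((25, 3))
y = rng.standard_normal(25)
K = gaussian_gram(X, sigma=1.5)
c = tikhonov_coefficients(K, y, lam_k=0.1)
print("objective at the solution:", tikhonov_objective(K, y, c, 0.1))
```

For an $N$-dimensional output, the same solve is repeated (or batched) once per coordinate $d$, since the coordinates decouple in (5).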
    70. penalty function S
        - $S$ favors label sequences exhibiting plausible temporal dynamics.
        - Set of labeled outputs: $Z = \{z_i\}_{i \in L}$, with $z_i \in \mathbb{R}^N$.
        - $L$: index set for the labeled training data.
    71. penalty function S
        \[ \min_{f, Y} \sum_{i=1}^{T} V(f(x_i), y_i) + \lambda_l \cdot \sum_{i \in L} V(f(x_i), z_i) + \lambda_s \cdot S(Y) + \lambda_k \cdot P(f) \tag{8} \]
    73. So, what should S be?
        Assumptions:
        - Each coordinate of the state evolves independently of the other coordinates.
        - A reasonable model for the time evolution of the state is a linear-Gaussian random walk process.
        State sequence: $S^d = \{s_t^d\}_{t=1,\ldots,T}$, $S^d \in \mathbb{R}^{3T}$, where each $s_t^d = [y_t^d, \dot{y}_t^d, \ddot{y}_t^d]^\top$ stacks "position", "velocity" and "acceleration".
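    75. So, what should S be?
        \[ s_t^d = A s_{t-1}^d + \omega_t^d \tag{9} \]
        \[ A = \begin{pmatrix} 1 & \alpha_v & 0 \\ 0 & 1 & \alpha_a \\ 0 & 0 & 1 \end{pmatrix} \tag{10} \]
        - $\alpha_v$, $\alpha_a$: scalars.
        - $\omega_t \sim \mathcal{N}(0, \Lambda_\omega)$.
        - $S(y^d) = y^{d\top} \Omega_y^{-1} y^d$: the negative log-likelihood of the "position" component $y$ in the above process.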
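One way to make the penalty concrete is to build the covariance of the position sequence implied by (9) and (10) and evaluate $y^\top \Omega^{-1} y$. The sketch below is an illustration under assumptions the slides do not state: the first state is taken to be zero-mean Gaussian with a user-supplied covariance `Sigma1`, and constant terms and scale factors of the log-likelihood are ignored. All function names are hypothetical.

```python
import numpy as np

def transition_matrix(alpha_v, alpha_a):
    """The matrix A of equation (10)."""
    return np.array([[1.0, alpha_v, 0.0],
                     [0.0, 1.0, alpha_a],
                     [0.0, 0.0, 1.0]])

def position_covariance(T, alpha_v, alpha_a, Lambda, Sigma1):
    """Covariance (T x T) of the positions y_1..y_T under s_t = A s_{t-1} + w_t.

    Assumes s_1 ~ N(0, Sigma1) and w_t ~ N(0, Lambda), independent over time.
    """
    A = transition_matrix(alpha_v, alpha_a)
    # Marginal state covariances: Sigma_t = A Sigma_{t-1} A^T + Lambda.
    Sigmas = [np.asarray(Sigma1, dtype=float)]
    for _ in range(1, T):
        Sigmas.append(A @ Sigmas[-1] @ A.T + Lambda)
    Omega = np.zeros((T, T))
    for t in range(T):
        M = Sigmas[t].copy()        # Cov(s_t, s_t)
        for u in range(t, T):
            # Entry [0, 0] of Cov(s_u, s_t) is Cov(y_u, y_t).
            Omega[u, t] = Omega[t, u] = M[0, 0]
            M = A @ M               # Cov(s_{u+1}, s_t) = A Cov(s_u, s_t)
    return Omega

def dynamics_penalty(y, Omega):
    """S(y) = y^T Omega^{-1} y for one output coordinate."""
    return float(y @ np.linalg.solve(Omega, y))

Omega = position_covariance(5, 0.5, 0.5, 0.1 * np.eye(3), np.eye(3))
print("penalty of a constant sequence:", dynamics_penalty(np.ones(5), Omega))
```

Sequences that are unlikely under the random-walk model receive a larger penalty, which is what lets the prior constrain the unlabeled frames.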
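    78. Cost Functional after incorporating S(Y)
        \[ \min_{f^d, y^d} \sum_{i=1}^{T} \big(f^d(x_i) - y_i^d\big)^2 + \lambda_l \cdot \sum_{i \in L} \big(f^d(x_i) - z_i^d\big)^2 + \lambda_k \cdot \|f^d\|_{\mathcal{H}_k}^2 + \lambda_s \cdot y^{d\top} \Omega_y^{-1} y^d \tag{11} \]
        With $I = L \cup \{1, \cdots, T\}$,
        \[ f^d(x) = \sum_{i \in I} c_i^d \, k(x, x_i) \tag{12} \]
    79. Cost Functional after incorporating S(Y)
        \[ \min_{c^d, y^d} \|K_T c^d - y^d\|^2 + \lambda_l \cdot \|K_L c^d - z^d\|^2 + \lambda_k \cdot c^{d\top} K c^d + \lambda_s \cdot y^{d\top} \Omega_y^{-1} y^d \tag{13} \]
        where
        - $K$: kernel matrix for the total training data (labeled + unlabeled),
        - $K_T$: rows of $K$ corresponding to unlabeled examples,
        - $K_L$: rows of $K$ corresponding to labeled examples,
        - $z^d = \{z_i^d\}_{i \in L}$.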
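    83. Solving the Cost Functional
        Written as a quadratic form in $(c^d, y^d)$:
        \[ \min_{c^d, y^d}
           \begin{bmatrix} c^d \\ y^d \end{bmatrix}^\top
           \begin{bmatrix} K_T^\top K_T + \lambda_k K + \lambda_l K_L^\top K_L & -K_T^\top \\ -K_T & I + \lambda_s \Omega_y^{-1} \end{bmatrix}
           \begin{bmatrix} c^d \\ y^d \end{bmatrix}
           + \begin{bmatrix} -2 \lambda_l K_L^\top z^d \\ 0 \end{bmatrix}^\top
           \begin{bmatrix} c^d \\ y^d \end{bmatrix} \tag{14} \]
        Naming the blocks:
        \[ \min_{c^d, y^d}
           \begin{bmatrix} c^d \\ y^d \end{bmatrix}^\top
           \begin{bmatrix} P_{cc} & P_{cy} \\ P_{cy}^\top & P_{yy} \end{bmatrix}
           \begin{bmatrix} c^d \\ y^d \end{bmatrix}
           + \begin{bmatrix} -2 \lambda_l K_L^\top z^d \\ 0 \end{bmatrix}^\top
           \begin{bmatrix} c^d \\ y^d \end{bmatrix} \]
        Setting the derivatives to zero:
        \[ \begin{bmatrix} P_{cc} & P_{cy} \\ P_{cy}^\top & P_{yy} \end{bmatrix}
           \begin{bmatrix} c^d \\ y^d \end{bmatrix}
           = \begin{bmatrix} \lambda_l K_L^\top z^d \\ 0 \end{bmatrix} \]
        \[ c^{d*} = \lambda_l \left( P_{cc} - P_{cy} P_{yy}^{-1} P_{cy}^\top \right)^{-1} K_L^\top z^d \tag{15} \]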
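The closed form (15) can be sketched in a few lines of NumPy. This is an illustrative sketch for one output coordinate, not the authors' code: the inverse covariance `Omega_inv` is assumed to be precomputed, `rows_T` and `rows_L` are hypothetical index arrays selecting the rows of `K` that form $K_T$ and $K_L$, and the Schur complement is assumed to be invertible.

```python
import numpy as np

def solve_semisupervised(K, rows_T, rows_L, z, Omega_inv, lam_k, lam_l, lam_s):
    """Return (c, y) satisfying the stationarity system that leads to (15)."""
    K_T, K_L = K[rows_T], K[rows_L]
    P_cc = K_T.T @ K_T + lam_k * K + lam_l * (K_L.T @ K_L)
    P_cy = -K_T.T
    P_yy = np.eye(len(rows_T)) + lam_s * Omega_inv
    # Schur complement of P_yy: P_cc - P_cy P_yy^{-1} P_cy^T.
    schur = P_cc - P_cy @ np.linalg.solve(P_yy, P_cy.T)
    c = lam_l * np.linalg.solve(schur, K_L.T @ z)
    # Second block row of the system: P_cy^T c + P_yy y = 0.
    y = -np.linalg.solve(P_yy, P_cy.T @ c)
    return c, y

def objective(K, rows_T, rows_L, z, Omega_inv, lam_k, lam_l, lam_s, c, y):
    """Value of the cost (13) at (c, y)."""
    r_T = K[rows_T] @ c - y
    r_L = K[rows_L] @ c - z
    return float(r_T @ r_T + lam_l * (r_L @ r_L)
                 + lam_k * (c @ K @ c) + lam_s * (y @ Omega_inv @ y))

# Toy usage: 30 frames, 4 of them labeled. A second-difference smoothness
# matrix stands in for the inverse covariance of the dynamics (an assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq / 4.0)
rows_T = np.arange(30)
rows_L = np.array([0, 7, 15, 29])
z = rng.standard_normal(4)
D = np.diff(np.eye(30), n=2, axis=0)
Omega_inv = D.T @ D + 1e-3 * np.eye(30)
args = (K, rows_T, rows_L, z, Omega_inv, 0.1, 10.0, 1.0)
c, y = solve_semisupervised(*args)
print("cost at the solution:", objective(*args, c, y))
```

Eliminating $y^d$ first keeps the linear solve the size of the coefficient vector, and the inferred labels for the unlabeled frames come out of the second block row at no extra modeling cost.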
    85. Experiment Details
        - 2000-frame video.
        - Training data: first 1500 frames, 5 of them labeled.
        - Test data: last 500 frames.
        - Parameters fit by minimizing leave-one-out cross-validation error on the labeled points.
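The leave-one-out selection step can be sketched generically as below. To keep the example short, a plain kernel ridge regressor stands in for the full semi-supervised model, so this only illustrates the cross-validation loop, not the talk's actual tuning procedure. The parameter grids, function names and toy data are assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian RBF exp(-||a - b||^2 / sigma^2) between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / sigma**2)

def loo_error(X_lab, z_lab, sigma, lam):
    """Mean squared leave-one-out error on the labeled points.

    Uses kernel ridge regression as a stand-in regressor (an assumption).
    """
    n = len(z_lab)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i                      # hold out labeled point i
        K = gaussian_kernel(X_lab[keep], X_lab[keep], sigma)
        c = np.linalg.solve(K + lam * np.eye(n - 1), z_lab[keep])
        pred = gaussian_kernel(X_lab[i:i + 1], X_lab[keep], sigma) @ c
        errors.append(float((pred[0] - z_lab[i]) ** 2))
    return float(np.mean(errors))

def select_parameters(X_lab, z_lab, sigmas, lams):
    """Grid search: return the (sigma, lam) pair with the lowest LOO error."""
    grid = [(s, l) for s in sigmas for l in lams]
    return min(grid, key=lambda p: loo_error(X_lab, z_lab, *p))

# Toy usage with 12 labeled points in R^2.
rng = np.random.default_rng(1)
X_lab = rng.standard_normal((12, 2))
z_lab = np.sin(X_lab[:, 0])
sigmas, lams = [0.5, 1.0, 2.0], [1e-3, 1e-1]
best = select_parameters(X_lab, z_lab, sigmas, lams)
print("selected (sigma, lambda):", best)
```

With only a handful of labeled frames, each held-out point removes a large share of the supervision, so the grid is best kept coarse.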
    86. Labeled Frames in Training Data
    87. Tracked Regions in Unlabeled Training and Test Data
    89. Experiment Details
        - 2300-frame video.
        - Training data: first 1500 frames, 12 of them labeled.
        - Test data: last 800 frames.
    90. Labeled Frames in Training Data
    91. Labeled Frames in Training Data
    92. Labeled Frames in Training Data
    93. Tracked Regions in Unlabeled Training Data
    94. Tracked Regions in Test Data
    96. Some issues not covered in the paper
        Rahimi's PhD thesis covers the following additional details:
        - Choosing examples to label.
        - Tuning the parameters $\lambda_k$, $\lambda_l$, $\lambda_s$, $\Lambda_\omega$, $\alpha_v$, $\alpha_a$ and the RBF kernel parameter $\sigma^2$.
        - Algorithm variation: nearest-neighbors representation for the mapping.
        - Algorithm variation: noise-free examples (i.e., when the given labels are accurate).
    97. Variation in Error with choice and number of labeled examples
    98. Variation in Error with Kernel and Regularization Parameters
    99. Conclusions
        1. When pose is the dominant factor, abundant unlabeled data obviates the need for hand-crafted image representations.
        2. Simple dynamical models capture sufficient temporal coherence to learn an appearance model.
        3. A combination of labeled and unlabeled examples can obviate the need for sophisticated appearance models.

    anantsparshi, 3 years ago

    © All Rights Reserved
