SlideShare a Scribd company logo
1 of 52
Download to read offline
2d-3D Relations

Cristian Sminchisescu, TTI-C/ UToronto



People Tracking Tutorial, CVPR 2006
David Forsyth, Deva Ramanan, Cristian Sminchisescu
Inferring 3D from 2D
• History
• Monocular vs. multi-view analysis
• Difficulties
  – structure of the solution and ambiguities
  – static and dynamic ambiguities
• Modeling frameworks for inference and learning
  – top-down (generative, alignment-based)
  – bottom-up (discriminative, predictive, exemplar-based)
  – Learning joint models
• Take-home points
History of Analyzing Humans in Motion
• Markers (Etienne Jules Marey, 1882)

                                        chronophotograph




• Multiple Cameras
 (Eadweard Muybridge , 1884)
Human motion capture today
            120 years and still fighting …

• VICON ~ 100,000 $
   – Excellent performance,
     de-facto standard for special
     effects, animation, etc


• But heavily instrumented
   – Multiple cameras
   – Markers in order to simplify the image correspondence
   – Special room, simple background

 Major challenge: Move from the laboratory to the real world
What is so different between multi-
  view and single-view analysis?
• Different emphasis on the relative importance
  of measurement and prior knowledge
  – Depth ambiguities
  – Self-occluded body parts

• Similar techniques at least one-way
  – Transition monocular->multiview straightforward

• Monocular as the `robust limit’ of multi-view
  – Multiple cameras unavailable, or less effective in
    real-world environments due to occlusion from other
    people, objects, etc.
3D Human Motion Capture Difficulties
                         General poses
                         Self-occlusions
                         Difficult to segment the
                         individual limbs


                         Loss of 3D information in
                         the monocular projection              Different body sizes
                         Partial Views



                                  Several people, occlusions



                                  Reduced observability of
                                  body parts due to loose
Accidental allignments
                                  fitting clothing
     Motion blur
Levels of 3d Modeling
              This section                                Photo         Synthetic




•   Coarse body model        •   Complex body model   •    Complex body model

•   30 - 35 d.o.f            •   50 - 60 d.o.f        •    ? (hundreds) d.o.f

•   Simple appearance        •   Simple appearance    •    Sophisticated modeling
    (implicit texture map)        (edge histograms)        of clothing and lighting
Difficulties
• High-dimensional state space (30-60 dof)

• Complex appearance due to articulation, deformation,
  clothing, body proportions

• Depth ambiguities and self-occlusion

• Fast motions, only vaguely known a-priori
   – External factors, objects, sudden intentions…


• Data association (what is a human part, what is
  background – see the Data Association section)
Difficulties, more concretely
                                 Depth ambiguities




                                        Occlusions
                                        (missing data)
                                          Left arm



Data association                                  Preservation of
                   Left / right leg ?           physical constraints
  ambiguities
Articulated 3d from 2d Joint Positions
       Structure of the monocular solution: Lee and Chen, CVGIP 1985 (!)

• Characterizes the space of solutions, assuming
   – 2d joint positions + limb lengths                                      Jp



   – internal camera parameters
                                                                                      Js



• Builds an interpretation tree of projection-              ps



  consistent hypotheses (3d joint positions)                                     J1        J2
                                                       O         p1

                                                             camera plane


   –   obtained by forward-backward flips in-depth
   –   O(2# of body parts) solutions
   –   In principle, can prune some by physical reasoning
   –   But no procedure to compute joint angles, hence
       difficult to reason about physical constraints

• Not an automatic 3d reconstruction method
   – select the true solution (out of many) manually
                                                                   Taylor, CVIU 2000
• Adapted for orthographic cameras (Taylor 2000)
Why is 3D-from-monocular hard? <v>
          Static, Kinematic, Pose Ambiguities




        • Monocular static pose optima
             – ~ 2Nr of Joints, some pruned by physical constraints
             – Temporally persistent

Sminchisescu and Triggs ‘02
Trajectory Ambiguities <v>
               General smooth dynamics
 Model / image          Filtered       Smoothed




Sminchisescu and Jepson ‘04
                              2 (out of several) plausible trajectories
Visual Inference in a 12d Space
     6d rigid motion + 6d learned latent coordinate




   Interpretation #1
Points at camera when
  conversation ends
  (before the turn)




 Interpretation #2
Says `salut’ when
conversation ends
 (before the turn)
Trajectory Ambiguities <v>
           Learned latent space and smooth dynamics

             Interpretation #1
     Says `salut’ when conversation ends
                (before the turn)




               Interpretation #2
  Points at camera when conversation ends
               (before the turn)

• Image consistent, smooth, typically human…


Sminchisescu and Jepson ‘04
The Nature of 3D Ambiguities
     State


             x3

             x2



             x1


                    r1                    Observation

• Persistent over long time-scales (each S-branch)
• Loops (a, b, c) have limited time-scale support,
  hence ambiguity cannot be resolved by extending it
Generative vs. Discriminative Modelling
                      G                                     x is the model state
                                                            r are image observations
                                                             Goal:      pθ (x | r )

                                                             θ are parameters to learn
             x               r                     r
                                                            given training set of (r,x) pairs
                       D

                                 pθ ( x | r ) α pθ (r | x ) ⋅ p ( x )

• Predict state distributions from             • Optimize alignment with image
  image features                                 features
• Learning to `invert’ perspective             • Can learn state representations,
  projection and kinematics is difficult         dynamics, observation models;
  and produces multiple solutions                but difficult to model human
    – Multivalued mappings ≡ multimodal          appearance
      conditional state distributions          • State inference is expensive,
• Temporal extensions necessary                  need effective optimization
Temporal Inference (tracking)
• Generative (top-down) chain models
  (Kalman Filter, Extended KF, Condensation)


                                               Model the
                                               observation


• Discriminative (bottom-up) chain models
 (Conditional Bayesian Mixture Of Experts Markov Model - BM3E,
  Conditional Random Fields -CRF, Max. Entropy Models - MEMM)


                                           Condition on
                                           the observation
Temporal Inference
• x t state at time t
• O t = (o1 , o 2 ,...o t )
observations up to time t
p (x t | O t )                                          time t




             p (x t +1 | O t )



             p(o t +1 | x t +1 )



             p(x t +1 | O t +1 )                        time t+1



                 cf. CONDENSATION, Isard and Blake, 1996
Generative / Alignment Methods

• Modeling
• Methods for temporal inference
• Learning low-dimensional representations
  and parameters
Model-based Multiview
           Reconstruction
              Kehl, Bray and van Gool ‘05




• Body represented as a textured 3D mesh
• Tracking by minimizing distance between 3d
  points on the mesh and volumetric
  reconstruction obtained from multiple cameras
Generative 3D Reconstruction
            Annealed Particle Filter
                   (Deutscher, Blake and Reid, ‘99-01)
                                                              monocular

Careful design

• Dynamics
• Observation likelihood
   – edge + silhouettes
• Annealing-based
  search procedure,
  improves over particle
  filtering
• Simple background and
  clothing

                               Improved results (complex motions) when
                               multiple cameras (3-6) were used
Generative 3D Reconstruction
          Sidenbladh, Black and Fleet, ’00-02; Sigal et al ‘04

             Monocular                                 Multi-camera




• Condensation-based filter
• Dynamical models
   – walking, snippets
• Careful learning of observation         •   Non-parametric belief
                                              propagation, initialization by limb
  likelihood distributions                    detection and triangulation
Kinematic Jump Sampling
    Sminchisescu & Triggs ’01,’03                                                               mi = (µi , Σi )
                                                                                                                  p t −1
                             C=SelectSamplingChain(mi)
C         C1           CM

                            Candidate Sampling Chains


                                                                                     s=CovarianceScaledSampling(mi)
                                                                                      T=BuildInterpretationTree (s,C)
                                                                                         E=InverseKinematics(T)

                                                                                                                      E



                                                          z
                                             R
                                         z

                                                                  P1


                                                 Rx1Ry1Rz1
                                                                                        Prune and locally optimize E
                                                              z




                                                                                                                    pt
                                                   P2



                                                        Rx2

                                                         z             Rx3Ry3

                                                                  P3            P4

                                                                           z
Kinematic Jump Sampling <v>




Sminchisescu and Triggs ‘03
What can we learn?
• Low-dimensional perceptual representations,
  dynamics (unsupervised)
  – What is the intrinsic model dimensionality?
  – How to preserve physical constraints?
  – How to optimize efficiently?


• Parameters (typically supervised)
  – Observation likelihood (noise variance, feature weighting)
  – Can learn separately (easier) but how well we do?
  – Best to learn by doing (i.e. inference)
     • Maximize the probability of the right answer on the training data,
       hence learning = inference in a loop
     • Need efficient inference methods
Intrinsic Dimension Estimation <v>
       and Latent Representation for Walking
                                                  Intrinsic dimension estimation
• 2500 samples from motion capture                        d = lim r →0
                                                                         log N (r )
                                                                         log(1 / r )
• The Hausdorff dimension (d) is
  effectively 1, lift to 3 for more flexibility

• Use non-linear embedding to learn the
  latent 3d space embedded in an
  ambient 30d human joint angle space




                                                                 Optimize in 3d
                                                                 latent space
Sminchisescu and Jepson ‘04
3D Model-Based Reconstruction
               (Urtasun, Fleet, Hertzmann and Fua’05)




• Track human joints using the WSL tracker (Jepson et al’01)
• Optimize model joint re-projection error in a low-dimensional
  space obtained using probabilistic PCA (Lawrence’04)
Learning Empirical Distribution of
     Edge Filter Responses
       (original slide courtesy of Michael Black)




      pon(F)                                  poff (F)
Likelihood ratio, pon/ poff , used for edge detection
      Geman & Jednyak and Konishi, Yuille, & Coughlan
Learning Dependencies
(original slide courtesy of Michael Black); Roth, Sigal and Black’04
Learning Dependencies
   (original slide courtesy of Michael Black); Roth, Sigal and Black ’04




Filter responses are not conditionally independent
Leaning by Maximum Entropy
The effect of learning on the
                  trajectory distribution




              Before                                  After
 • Learn body proportions + parameters of the observation model (weighting
   of different feature types, variances, etc)
 • Notice reduction in uncertainty
 • The ambiguity diminishes significantly but does not disappear
Sminchisescu, Welling and Hinton ‘03
Conditional /Discriminative/ Indexing
              Methods
 •   Nearest-neighbor, snippets
 •   Regression
 •   Mixture of neural networks
 •   Conditional mixtures of experts
 •   Probabilistic methods for temporal
     Integration
Discriminative 3d: Nearest Neighbor
      Parameter Sensitive Hashing (PSH)
                     Shakhnarovich, Viola and Darell ’03




• Relies on database of (observation, state) pairs rendered
  artificially
   – Locates samples that have observation components similar to the
     current image data (nearest neighbors) and use their state as putative
     estimates

• Extension to multiple cameras and tracking by non-linear model
  optimization (PSH used for initialization Demirdjan et al, ICCV05)
   – Foreground / background segmentation from stereo
Discriminative 3d: Nearest Neighbor Matching
    2D->3D Pose + Annotation
                         Ramanan and Forsyth ‘03

                             user                    3D motion
          Annotations
                              +                      library
     {run,walk, wave, etc.}
                                                    StandWave

                               Motion Synthesizer
    match 1/2 second clips of motion

                                                    3D pose and
                 model                               annotation
                 build          detect


original video                           2D track
2D->3D pose + annotation <v>
       Ramanan and Forsyth’03
Discriminative 3d: Regression Methods
                   Aggarwal and Triggs ‘04, Elgammal & Lee ‘04

• (A&T) 3d pose recovery by non-linear
  regression against silhouette
  observations represented as shape
  context histograms
    – Emphasis on sparse, efficient predictions,
      good generalization

• (A&T) Careful study of dynamical
  regression-based predictors for walking
  and extensions to mixture of regressors
   (HCI’05)


• (E&L) pose from silhouette regression
  where the dimensionality of the input is
  reduced using non-linear embedding
    – Latent (input) to joint angle (output) state
      space map based on RBF networks
Discriminative 3d:
    Specialized Mappings Architecture
                    Rosales and Sclaroff ‘01
• Static 3D human pose
  estimation from
  silhouettes (Hu moments)

• Approximates the
  observation-pose
  mapping from training
  data
   – Mixture of neural networks
   – Models the joint distribution

• Uses the forward model
  (graphics rendering) to
  verify solutions
Conditional Bayesian Mixtures of Experts




                                                         Probability (gate) of Expert
                                            Cluster 1
                                            Cluster 2
              Single Expert
                                            Cluster 3
                                            Expert 1
                                            Expert 2
                                            Expert 3




     Data Sample                 Multiple Experts        Expert Proportions (Gates)
                                                        vs. uniform coefficients (Joint)
• A single expert cannot represent multi-valued relations
• Multiple experts can focus on representing parts of the data
• But the expert contribution (importance) is contextual
   – Disregarding context introduces systematic error (invalid extrapolation)
• The experts need observation-sensitive mixing proportions
Discriminative Temporal Inference
           BM3E= Conditional Bayesian Mixture of Experts Markov Model
  • `Bottom-up’ chain

                                                                                           Conditions on
                                                                                           the observation

                  p(xt | R t ) =    ∫ p (x
                                   x t −1
                                             t   | x t −1 , rt ) p ( x t −1 | R t −1 ), where R t = (r1 ,..., rt )

                             Local conditional                         Temporal (filtered) prior

• The temporal prior is a Gaussian mixture
• The local conditional is a Bayesian mixture of Gaussian experts
• Integrate pair-wise products of Gaussians analytically



Sminchisescu, Kanaujia, Li, Metaxas ‘05
Turn during Dancing <v>




                    Notice imperfect silhouettes
Sminchisescu, Kanaujia, Li, Metaxas ‘05
Washing a Window <v>
              Input                       Filtered   Smoothed




Sminchisescu, Kanaujia, Li, Metaxas ‘05
Low-dimensional Discriminative Inference
   • The pose prediction problem is highly structured
        – Human joint angles are correlated, not independent
        – Learn conditional mixtures between low-dimensional
          spaces decorrelated using kernel PCA (kBME)




                                          RVM – Relevance Vector Machine
                                          KDE – Kernel Dependency Estimator




Sminchisescu, Kanaujia, Li, Metaxas ‘05
Low-dimensional Discriminative Inference
                         (translation removed for better comparison)


                                             4d              10d




                                            6d               56d
Sminchisescu, Kanaujia, Li, Metaxas ‘05
Evaluation on artificially generated
 silhouettes with 3d ground truth
    (average error / average maximum error,
          per joint angle, in degrees)




                      84




• NN = nearest neighbor
• RVM = relevance vector machine
• BME = conditional Bayesian mixture of experts
Evaluation, low-dimensional models
            (average error / joint angle in degrees)




• KDE-RR=ridge regressor between low-dimensional spaces
• KDE-RVM=RVM between low-dimensional spaces
   – Unimodal methods average competing solutions
• kBME=conditional Bayesian mixture between low-
  dimensional state and observation spaces
   – Training and inference is about 10 time faster
Self-supervised Learning of a
Joint Generative-Recognition Model
     • Maximize the probability of the (observed)
       evidence (e.g. images of humans)

                                 pθ ( x, r )                  p ( x, r )
log pθ (r ) = log ∫ Qυ (x | r)               ≥ ∫ Qυ (x | r)log θ         = KL(Qυ (x | r) || pθ ( x, r ))
                   x
                                 Qυ (x | r) x                 Qυ (x | r)
log pθ (r ) − KL(Qυ (x | r) || pθ ( x | r )) = KL(Qυ (x | r) || pθ ( x, r ))


     • Hence, the KL divergence between what the
       generative model p infers and what the
       recognition model Q predicts, with tight bound at
                                 Qυ (x | r) = pθ ( x | r )
Self-supervised Learning of a
 Joint Generative-Recognition Model




• Local optimum for parameters
• Recognition model is a conditional mixture of data-
  driven mean field experts
   – Fast expectations, dimensions decouple
Generalization under clutter



                                       Mixture
                                       of
                                       experts


 Single
 expert




Sminchisescu, Kanaujia, Metaxas ‘06
Take home points
• Multi-view 3d reconstruction reliable in the lab
   – Measurement-oriented
   – Geometric, marker-based
      • correspondence + triangulation
   – Optimize multi-view alignment
      • generative, model-based
   – Data-association in real-world (occlusions) open

• Monocular 3d as robust limit of multi-view
   – Difficulties: depth perception + self-occlusion
   – Stronger dependency on efficient non-convex
     optimization and good observation models
   – Increased emphasis on prior vs. measurement
Take home points (contd.)
• Top-down / Generative / Alignment Models
  – Flexible, but difficult to model human appearance
  – Difficult optimization problems, local optima
  – Can learn constrained representations and
    parameters
       • Can handle occlusion, faster search (low-d)
       • Fewer local optima -- the best more likely true solutions

• Discriminative / Conditional / Exemplar–based
  Models
  –   Need to model complex multi-valued relations
  –   Replace inference with indexing / prediction
  –   Good for initialization, recovery from failure, on-line
  –   Still need to deal with segmentation / data association
For more information
 http://www.cs.toronto.edu/~crismin
 http://ttic.uchicago.edu/~crismin

 crismin@cs.toronto.edu
 crismin@nagoya.uchicago.edu

  For a tutorial on 3d human sensing from
  monocular images, see:

http://www.cs.toronto.edu/~crismin/PAPERS/tutorial_iccv05.pdf
Cristian Sminchisescu thanks the
      following collaborators on various
     elements of the research described
         in the 2d-3d relations section
•   Geoffrey Hinton (Toronto)
•   Allan Jepson (Toronto)
•   Atul Kanaujia (Rutgers)
•   Zhiguo Li (Rutgers)
•   Dimitris Metaxas (Rutgers)
•   Bill Triggs (INRIA)
•   Max Welling (UCIrvine)

More Related Content

What's hot

Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformHuman Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformFadwa Fouad
 
Application of feature point matching to video stabilization
Application of feature point matching to video stabilizationApplication of feature point matching to video stabilization
Application of feature point matching to video stabilizationNikhil Prathapani
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Fcv learn le_cun
Fcv learn le_cunFcv learn le_cun
Fcv learn le_cunzukun
 
Human Action Recognition Using 3D Joint Information and HOOFD Features
Human Action Recognition Using 3D Joint Information and HOOFD FeaturesHuman Action Recognition Using 3D Joint Information and HOOFD Features
Human Action Recognition Using 3D Joint Information and HOOFD FeaturesBarış Üstündağ
 
State of art pde based ip to bt vijayakrishna rowthu
State of art pde based ip to bt  vijayakrishna rowthuState of art pde based ip to bt  vijayakrishna rowthu
State of art pde based ip to bt vijayakrishna rowthuvijayakrishna rowthu
 
Olga Senyukova - Machine Learning Applications in Medicine
Olga Senyukova - Machine Learning Applications in MedicineOlga Senyukova - Machine Learning Applications in Medicine
Olga Senyukova - Machine Learning Applications in MedicineOlsen222
 
Kccsi 2012 a real-time robust object tracking-v2
Kccsi 2012   a real-time robust object tracking-v2Kccsi 2012   a real-time robust object tracking-v2
Kccsi 2012 a real-time robust object tracking-v2Prarinya Siritanawan
 
Accelarating Optical Quadrature Microscopy Using GPUs
Accelarating Optical Quadrature Microscopy Using GPUsAccelarating Optical Quadrature Microscopy Using GPUs
Accelarating Optical Quadrature Microscopy Using GPUsPerhaad Mistry
 
Shearlet Frames and Optimally Sparse Approximations
Shearlet Frames and Optimally Sparse ApproximationsShearlet Frames and Optimally Sparse Approximations
Shearlet Frames and Optimally Sparse ApproximationsJakob Lemvig
 
Recent Advances in Object-based Change Detection.pdf
Recent Advances in Object-based Change Detection.pdfRecent Advances in Object-based Change Detection.pdf
Recent Advances in Object-based Change Detection.pdfgrssieee
 
SIGGRAPH ASIA 2012 Stereoscopic Cloning Presentation Slide
SIGGRAPH ASIA 2012 Stereoscopic Cloning Presentation SlideSIGGRAPH ASIA 2012 Stereoscopic Cloning Presentation Slide
SIGGRAPH ASIA 2012 Stereoscopic Cloning Presentation SlideI-Chao Shen
 

What's hot (18)

Background subtraction
Background subtractionBackground subtraction
Background subtraction
 
Video stabilization
Video stabilizationVideo stabilization
Video stabilization
 
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformHuman Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
 
Compressive DIsplays: SID Keynote by Ramesh Raskar
Compressive DIsplays: SID Keynote by Ramesh RaskarCompressive DIsplays: SID Keynote by Ramesh Raskar
Compressive DIsplays: SID Keynote by Ramesh Raskar
 
Application of feature point matching to video stabilization
Application of feature point matching to video stabilizationApplication of feature point matching to video stabilization
Application of feature point matching to video stabilization
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Fcv learn le_cun
Fcv learn le_cunFcv learn le_cun
Fcv learn le_cun
 
2007 112
2007 1122007 112
2007 112
 
ICIP 2012
ICIP 2012ICIP 2012
ICIP 2012
 
Human Action Recognition Using 3D Joint Information and HOOFD Features
Human Action Recognition Using 3D Joint Information and HOOFD FeaturesHuman Action Recognition Using 3D Joint Information and HOOFD Features
Human Action Recognition Using 3D Joint Information and HOOFD Features
 
State of art pde based ip to bt vijayakrishna rowthu
State of art pde based ip to bt  vijayakrishna rowthuState of art pde based ip to bt  vijayakrishna rowthu
State of art pde based ip to bt vijayakrishna rowthu
 
Olga Senyukova - Machine Learning Applications in Medicine
Olga Senyukova - Machine Learning Applications in MedicineOlga Senyukova - Machine Learning Applications in Medicine
Olga Senyukova - Machine Learning Applications in Medicine
 
Kccsi 2012 a real-time robust object tracking-v2
Kccsi 2012   a real-time robust object tracking-v2Kccsi 2012   a real-time robust object tracking-v2
Kccsi 2012 a real-time robust object tracking-v2
 
Accelarating Optical Quadrature Microscopy Using GPUs
Accelarating Optical Quadrature Microscopy Using GPUsAccelarating Optical Quadrature Microscopy Using GPUs
Accelarating Optical Quadrature Microscopy Using GPUs
 
Shearlet Frames and Optimally Sparse Approximations
Shearlet Frames and Optimally Sparse ApproximationsShearlet Frames and Optimally Sparse Approximations
Shearlet Frames and Optimally Sparse Approximations
 
Iris recognition
Iris recognitionIris recognition
Iris recognition
 
Recent Advances in Object-based Change Detection.pdf
Recent Advances in Object-based Change Detection.pdfRecent Advances in Object-based Change Detection.pdf
Recent Advances in Object-based Change Detection.pdf
 
SIGGRAPH ASIA 2012 Stereoscopic Cloning Presentation Slide
SIGGRAPH ASIA 2012 Stereoscopic Cloning Presentation SlideSIGGRAPH ASIA 2012 Stereoscopic Cloning Presentation Slide
SIGGRAPH ASIA 2012 Stereoscopic Cloning Presentation Slide
 

Viewers also liked (16)

Lenihans Kit &amp; Bath
Lenihans Kit &amp; BathLenihans Kit &amp; Bath
Lenihans Kit &amp; Bath
 
Jagmohancrawl
JagmohancrawlJagmohancrawl
Jagmohancrawl
 
Bakers’ Kitchen
Bakers’ KitchenBakers’ Kitchen
Bakers’ Kitchen
 
10 slides
10 slides10 slides
10 slides
 
Kitchen Remodel 1
Kitchen Remodel 1Kitchen Remodel 1
Kitchen Remodel 1
 
Updatedpdxcruslideshow vision-100818000958-phpapp01
Updatedpdxcruslideshow vision-100818000958-phpapp01Updatedpdxcruslideshow vision-100818000958-phpapp01
Updatedpdxcruslideshow vision-100818000958-phpapp01
 
Christmas Time
Christmas TimeChristmas Time
Christmas Time
 
10 slides
10 slides10 slides
10 slides
 
Two
TwoTwo
Two
 
prova
provaprova
prova
 
10 slides
10 slides10 slides
10 slides
 
Kanade2
Kanade2Kanade2
Kanade2
 
10 slides
10 slides10 slides
10 slides
 
Payroll tables-1214838920274746-8
Payroll tables-1214838920274746-8Payroll tables-1214838920274746-8
Payroll tables-1214838920274746-8
 
Nister iccv2005tutorial
Nister iccv2005tutorialNister iccv2005tutorial
Nister iccv2005tutorial
 
The Shaping Of American Education
The Shaping Of American EducationThe Shaping Of American Education
The Shaping Of American Education
 

Similar to 2d3d

Deep learning-for-pose-estimation-wyang-defense
Deep learning-for-pose-estimation-wyang-defenseDeep learning-for-pose-estimation-wyang-defense
Deep learning-for-pose-estimation-wyang-defenseWei Yang
 
Learning to Perceive the 3D World
Learning to Perceive the 3D WorldLearning to Perceive the 3D World
Learning to Perceive the 3D WorldNAVER Engineering
 
Areva10 Technical Day
Areva10 Technical DayAreva10 Technical Day
Areva10 Technical DaySDTools
 
Computer Vision Structure from motion
Computer Vision Structure from motionComputer Vision Structure from motion
Computer Vision Structure from motionWael Badawy
 
Computer Vision sfm
Computer Vision sfmComputer Vision sfm
Computer Vision sfmWael Badawy
 
論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])
論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])
論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])Masaya Kaneko
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooJaeJun Yoo
 
Computer Vision - Stereo Vision
Computer Vision - Stereo VisionComputer Vision - Stereo Vision
Computer Vision - Stereo VisionWael Badawy
 
Geometry Processingで学ぶSparse Matrix
Geometry Processingで学ぶSparse MatrixGeometry Processingで学ぶSparse Matrix
Geometry Processingで学ぶSparse MatrixJun Saito
 
Computational Tools for Extracting, Representing and Analyzing Facial Features
Computational Tools for Extracting, Representing and Analyzing Facial FeaturesComputational Tools for Extracting, Representing and Analyzing Facial Features
Computational Tools for Extracting, Representing and Analyzing Facial Featuressaulnml
 
Keynote at Tracking Workshop during ISMAR 2014
Keynote at Tracking Workshop during ISMAR 2014Keynote at Tracking Workshop during ISMAR 2014
Keynote at Tracking Workshop during ISMAR 2014Darius Burschka
 
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Dongmin Choi
 
Scrdet++ analysis
Scrdet++ analysisScrdet++ analysis
Scrdet++ analysisNEHA Kapoor
 
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal StructuresLarge Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structuresayubimoak
 
Various object detection and tracking methods
Various object detection and tracking methodsVarious object detection and tracking methods
Various object detection and tracking methodssujeeshkumarj
 
BayesianDecisionTheoryCaseStudiesf .pptx
BayesianDecisionTheoryCaseStudiesf .pptxBayesianDecisionTheoryCaseStudiesf .pptx
BayesianDecisionTheoryCaseStudiesf .pptxJulius346776
 

Similar to 2d3d (20)

Deep learning-for-pose-estimation-wyang-defense
Deep learning-for-pose-estimation-wyang-defenseDeep learning-for-pose-estimation-wyang-defense
Deep learning-for-pose-estimation-wyang-defense
 
Learning to Perceive the 3D World
Learning to Perceive the 3D WorldLearning to Perceive the 3D World
Learning to Perceive the 3D World
 
Areva10 Technical Day
Areva10 Technical DayAreva10 Technical Day
Areva10 Technical Day
 
Computer Vision Structure from motion
Computer Vision Structure from motionComputer Vision Structure from motion
Computer Vision Structure from motion
 
Computer Vision sfm
Computer Vision sfmComputer Vision sfm
Computer Vision sfm
 
論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])
論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])
論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
 
Computer Vision - Stereo Vision
Computer Vision - Stereo VisionComputer Vision - Stereo Vision
Computer Vision - Stereo Vision
 
Geometry Processingで学ぶSparse Matrix
Geometry Processingで学ぶSparse MatrixGeometry Processingで学ぶSparse Matrix
Geometry Processingで学ぶSparse Matrix
 
Computational Tools for Extracting, Representing and Analyzing Facial Features
Computational Tools for Extracting, Representing and Analyzing Facial FeaturesComputational Tools for Extracting, Representing and Analyzing Facial Features
Computational Tools for Extracting, Representing and Analyzing Facial Features
 
Keynote at Tracking Workshop during ISMAR 2014
Keynote at Tracking Workshop during ISMAR 2014Keynote at Tracking Workshop during ISMAR 2014
Keynote at Tracking Workshop during ISMAR 2014
 
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
 
Scrdet++ analysis
Scrdet++ analysisScrdet++ analysis
Scrdet++ analysis
 
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal StructuresLarge Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures
 
10833762.ppt
10833762.ppt10833762.ppt
10833762.ppt
 
Modeling full scale-data(2)
Modeling full scale-data(2)Modeling full scale-data(2)
Modeling full scale-data(2)
 
Various object detection and tracking methods
Various object detection and tracking methodsVarious object detection and tracking methods
Various object detection and tracking methods
 
PPT s02-machine vision-s2
PPT s02-machine vision-s2PPT s02-machine vision-s2
PPT s02-machine vision-s2
 
BayesianDecisionTheoryCaseStudiesf .pptx
BayesianDecisionTheoryCaseStudiesf .pptxBayesianDecisionTheoryCaseStudiesf .pptx
BayesianDecisionTheoryCaseStudiesf .pptx
 
Reza talk
Reza talkReza talk
Reza talk
 

More from Jag Mohan Singh

More from Jag Mohan Singh (20)

10 slides
10 slides10 slides
10 slides
 
10 slides
10 slides10 slides
10 slides
 
10 slides
10 slides10 slides
10 slides
 
Jagmohan presentation2008
Jagmohan presentation2008Jagmohan presentation2008
Jagmohan presentation2008
 
Keynote original
Keynote originalKeynote original
Keynote original
 
Keynote original hyperlink
Keynote original hyperlinkKeynote original hyperlink
Keynote original hyperlink
 
Keynote original hyperlink
Keynote original hyperlinkKeynote original hyperlink
Keynote original hyperlink
 
3dbody outline
3dbody outline3dbody outline
3dbody outline
 
Jagmohan presentation2008
Jagmohan presentation2008Jagmohan presentation2008
Jagmohan presentation2008
 
Kievbrandingshortopt 101123142128-phpapp02
Kievbrandingshortopt 101123142128-phpapp02Kievbrandingshortopt 101123142128-phpapp02
Kievbrandingshortopt 101123142128-phpapp02
 
Payroll tables-1214838920274746-8
Payroll tables-1214838920274746-8Payroll tables-1214838920274746-8
Payroll tables-1214838920274746-8
 
Finalprez webversion-101012091602-phpapp02
Finalprez webversion-101012091602-phpapp02Finalprez webversion-101012091602-phpapp02
Finalprez webversion-101012091602-phpapp02
 
Payroll tables-1214838920274746-8
Payroll tables-1214838920274746-8Payroll tables-1214838920274746-8
Payroll tables-1214838920274746-8
 
Ppt c 1
Ppt c 1Ppt c 1
Ppt c 1
 
Unfont
UnfontUnfont
Unfont
 
Fonts2
Fonts2Fonts2
Fonts2
 
Fonts
FontsFonts
Fonts
 
Linkyes 101005122038-phpapp01
Linkyes 101005122038-phpapp01Linkyes 101005122038-phpapp01
Linkyes 101005122038-phpapp01
 
Ppt Print To Cutepdf Hyperlink
Ppt Print To Cutepdf HyperlinkPpt Print To Cutepdf Hyperlink
Ppt Print To Cutepdf Hyperlink
 
Pptx Print2cutepdf Hyperlink
Pptx Print2cutepdf HyperlinkPptx Print2cutepdf Hyperlink
Pptx Print2cutepdf Hyperlink
 

Recently uploaded

Russian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts Service
Russian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts ServiceRussian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts Service
Russian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts Servicedoor45step
 
TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...
TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...
TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...door45step
 
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts ServiceIndian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Servicedoor45step
 
Bobbie goods colorinsssssssssssg book.pdf
Bobbie goods colorinsssssssssssg book.pdfBobbie goods colorinsssssssssssg book.pdf
Bobbie goods colorinsssssssssssg book.pdflunavro0105
 
Zagor VČ OP 055 - Oluja nad Haitijem.pdf
Zagor VČ OP 055 - Oluja nad Haitijem.pdfZagor VČ OP 055 - Oluja nad Haitijem.pdf
Zagor VČ OP 055 - Oluja nad Haitijem.pdfStripovizijacom
 
Dxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call Girl
Dxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call GirlDxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call Girl
Dxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call GirlYinisingh
 
Abu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call Girls
Abu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call GirlsAbu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call Girls
Abu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call Girlshayawit234
 
How to order fake Lakeland University diploma?
How to order fake Lakeland University diploma?How to order fake Lakeland University diploma?
How to order fake Lakeland University diploma?melodolykelton
 
Benjamin Portfolio Process Work Slideshow
Benjamin Portfolio Process Work SlideshowBenjamin Portfolio Process Work Slideshow
Benjamin Portfolio Process Work Slideshowssuser971f6c
 
Retail Store Scavanger Hunt - Foundation College Park
Retail Store Scavanger Hunt - Foundation College ParkRetail Store Scavanger Hunt - Foundation College Park
Retail Store Scavanger Hunt - Foundation College Parkjosebenzaquen
 
Strip Zagor Extra 322 - Dva ortaka.pdf
Strip   Zagor Extra 322 - Dva ortaka.pdfStrip   Zagor Extra 322 - Dva ortaka.pdf
Strip Zagor Extra 322 - Dva ortaka.pdfStripovizijacom
 
8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available
8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available
8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Availabledollysharma2066
 
Escort Service in Al Nahda +971509530047 UAE
Escort Service in Al Nahda +971509530047 UAEEscort Service in Al Nahda +971509530047 UAE
Escort Service in Al Nahda +971509530047 UAEvecevep119
 
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 60009654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000Sapana Sha
 
Iffco Chowk Call Girls : ☎ 8527673949, Low rate Call Girls
Iffco Chowk Call Girls : ☎ 8527673949, Low rate Call GirlsIffco Chowk Call Girls : ☎ 8527673949, Low rate Call Girls
Iffco Chowk Call Girls : ☎ 8527673949, Low rate Call Girlsashishs7044
 
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts ServiceIndian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Servicedoor45step
 
Escort Service in Al Rigga +971509530047 UAE
Escort Service in Al Rigga +971509530047 UAEEscort Service in Al Rigga +971509530047 UAE
Escort Service in Al Rigga +971509530047 UAEvecevep119
 
New_Cross_Over (Comedy storyboard sample)
New_Cross_Over (Comedy storyboard sample)New_Cross_Over (Comedy storyboard sample)
New_Cross_Over (Comedy storyboard sample)DavonBrooks
 
Russian⚡ Call Girls In Sector 40 Noida✨8375860717⚡Escorts Service
Russian⚡ Call Girls In Sector 40 Noida✨8375860717⚡Escorts ServiceRussian⚡ Call Girls In Sector 40 Noida✨8375860717⚡Escorts Service
Russian⚡ Call Girls In Sector 40 Noida✨8375860717⚡Escorts Servicedoor45step
 
STAR Scholars Program Brand Guide Presentation
STAR Scholars Program Brand Guide PresentationSTAR Scholars Program Brand Guide Presentation
STAR Scholars Program Brand Guide Presentationmakaiodm
 

Recently uploaded (20)

Russian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts Service
Russian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts ServiceRussian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts Service
Russian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts Service
 
TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...
TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...
TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...
 
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts ServiceIndian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
 
Bobbie goods colorinsssssssssssg book.pdf
Bobbie goods colorinsssssssssssg book.pdfBobbie goods colorinsssssssssssg book.pdf
Bobbie goods colorinsssssssssssg book.pdf
 
Zagor VČ OP 055 - Oluja nad Haitijem.pdf
Zagor VČ OP 055 - Oluja nad Haitijem.pdfZagor VČ OP 055 - Oluja nad Haitijem.pdf
Zagor VČ OP 055 - Oluja nad Haitijem.pdf
 
Dxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call Girl
Dxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call GirlDxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call Girl
Dxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call Girl
 
Abu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call Girls
Abu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call GirlsAbu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call Girls
Abu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call Girls
 
How to order fake Lakeland University diploma?
How to order fake Lakeland University diploma?How to order fake Lakeland University diploma?
How to order fake Lakeland University diploma?
 
Benjamin Portfolio Process Work Slideshow
Benjamin Portfolio Process Work SlideshowBenjamin Portfolio Process Work Slideshow
Benjamin Portfolio Process Work Slideshow
 
Retail Store Scavanger Hunt - Foundation College Park
Retail Store Scavanger Hunt - Foundation College ParkRetail Store Scavanger Hunt - Foundation College Park
Retail Store Scavanger Hunt - Foundation College Park
 
Strip Zagor Extra 322 - Dva ortaka.pdf
Strip   Zagor Extra 322 - Dva ortaka.pdfStrip   Zagor Extra 322 - Dva ortaka.pdf
Strip Zagor Extra 322 - Dva ortaka.pdf
 
8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available
8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available
8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available
 
Escort Service in Al Nahda +971509530047 UAE
Escort Service in Al Nahda +971509530047 UAEEscort Service in Al Nahda +971509530047 UAE
Escort Service in Al Nahda +971509530047 UAE
 
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 60009654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000
 
Iffco Chowk Call Girls : ☎ 8527673949, Low rate Call Girls
Iffco Chowk Call Girls : ☎ 8527673949, Low rate Call GirlsIffco Chowk Call Girls : ☎ 8527673949, Low rate Call Girls
Iffco Chowk Call Girls : ☎ 8527673949, Low rate Call Girls
 
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts ServiceIndian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
 
Escort Service in Al Rigga +971509530047 UAE
Escort Service in Al Rigga +971509530047 UAEEscort Service in Al Rigga +971509530047 UAE
Escort Service in Al Rigga +971509530047 UAE
 
New_Cross_Over (Comedy storyboard sample)
New_Cross_Over (Comedy storyboard sample)New_Cross_Over (Comedy storyboard sample)
New_Cross_Over (Comedy storyboard sample)
 
Russian⚡ Call Girls In Sector 40 Noida✨8375860717⚡Escorts Service
Russian⚡ Call Girls In Sector 40 Noida✨8375860717⚡Escorts ServiceRussian⚡ Call Girls In Sector 40 Noida✨8375860717⚡Escorts Service
Russian⚡ Call Girls In Sector 40 Noida✨8375860717⚡Escorts Service
 
STAR Scholars Program Brand Guide Presentation
STAR Scholars Program Brand Guide PresentationSTAR Scholars Program Brand Guide Presentation
STAR Scholars Program Brand Guide Presentation
 

2d3d

  • 1. 2d-3D Relations Cristian Sminchisescu, TTI-C/ UToronto People Tracking Tutorial, CVPR 2006 David Forsyth, Deva Ramanan, Cristian Sminchisescu
  • 2. Inferring 3D from 2D • History • Monocular vs. multi-view analysis • Difficulties – structure of the solution and ambiguities – static and dynamic ambiguities • Modeling frameworks for inference and learning – top-down (generative, alignment-based) – bottom-up (discriminative, predictive, exemplar-based) – Learning joint models • Take-home points
  • 3. History of Analyzing Humans in Motion • Markers (Etienne Jules Marey, 1882) chronophotograph • Multiple Cameras (Eadweard Muybridge , 1884)
  • 4. Human motion capture today 120 years and still fighting … • VICON ~ 100,000 $ – Excellent performance, de-facto standard for special effects, animation, etc • But heavily instrumented – Multiple cameras – Markers in order to simplify the image correspondence – Special room, simple background Major challenge: Move from the laboratory to the real world
  • 5. What is so different between multi- view and single-view analysis? • Different emphasis on the relative importance of measurement and prior knowledge – Depth ambiguities – Self-occluded body parts • Similar techniques at least one-way – Transition monocular->multiview straightforward • Monocular as the `robust limit’ of multi-view – Multiple cameras unavailable, or less effective in real-world environments due to occlusion from other people, objects, etc.
  • 6. 3D Human Motion Capture Difficulties General poses Self-occlusions Difficult to segment the individual limbs Loss of 3D information in the monocular projection Different body sizes Partial Views Several people, occlusions Reduced observability of body parts due to loose Accidental allignments fitting clothing Motion blur
  • 7. Levels of 3d Modeling This section Photo Synthetic • Coarse body model • Complex body model • Complex body model • 30 - 35 d.o.f • 50 - 60 d.o.f • ? (hundreds) d.o.f • Simple appearance • Simple appearance • Sophisticated modeling (implicit texture map) (edge histograms) of clothing and lighting
  • 8. Difficulties • High-dimensional state space (30-60 dof) • Complex appearance due to articulation, deformation, clothing, body proportions • Depth ambiguities and self-occlusion • Fast motions, only vaguely known a-priori – External factors, objects, sudden intentions… • Data association (what is a human part, what is background – see the Data Association section)
  • 9. Difficulties, more concretely Depth ambiguities Occlusions (missing data) Left arm Data association Preservation of Left / right leg ? physical constraints ambiguities
  • 10. Articulated 3d from 2d Joint Positions Structure of the monocular solution: Lee and Chen, CVGIP 1985 (!) • Characterizes the space of solutions, assuming – 2d joint positions + limb lengths Jp – internal camera parameters Js • Builds an interpretation tree of projection- ps consistent hypotheses (3d joint positions) J1 J2 O p1 camera plane – obtained by forward-backward flips in-depth – O(2# of body parts) solutions – In principle, can prune some by physical reasoning – But no procedure to compute joint angles, hence difficult to reason about physical constraints • Not an automatic 3d reconstruction method – select the true solution (out of many) manually Taylor, CVIU 2000 • Adapted for orthographic cameras (Taylor 2000)
  • 11. Why is 3D-from-monocular hard? <v> Static, Kinematic, Pose Ambiguities • Monocular static pose optima – ~ 2Nr of Joints, some pruned by physical constraints – Temporally persistent Sminchisescu and Triggs ‘02
  • 12. Trajectory Ambiguities <v> General smooth dynamics Model / image Filtered Smoothed Sminchisescu and Jepson ‘04 2 (out of several) plausible trajectories
  • 13. Visual Inference in a 12d Space 6d rigid motion + 6d learned latent coordinate Interpretation #1 Points at camera when conversation ends (before the turn) Interpretation #2 Says `salut’ when conversation ends (before the turn)
  • 14. Trajectory Ambiguities <v> Learned latent space and smooth dynamics Interpretation #1 Says `salut’ when conversation ends (before the turn) Interpretation #2 Points at camera when conversation ends (before the turn) • Image consistent, smooth, typically human… Sminchisescu and Jepson ‘04
  • 15. The Nature of 3D Ambiguities State x3 x2 x1 r1 Observation • Persistent over long time-scales (each S-branch) • Loops (a, b, c) have limited time-scale support, hence ambiguity cannot be resolved by extending it
  • 16. Generative vs. Discriminative Modelling G x is the model state r are image observations Goal: pθ (x | r ) θ are parameters to learn x r r given training set of (r,x) pairs D pθ ( x | r ) α pθ (r | x ) ⋅ p ( x ) • Predict state distributions from • Optimize alignment with image image features features • Learning to `invert’ perspective • Can learn state representations, projection and kinematics is difficult dynamics, observation models; and produces multiple solutions but difficult to model human – Multivalued mappings ≡ multimodal appearance conditional state distributions • State inference is expensive, • Temporal extensions necessary need effective optimization
  • 17. Temporal Inference (tracking) • Generative (top-down) chain models (Kalman Filter, Extended KF, Condensation) Model the observation • Discriminative (bottom-up) chain models (Conditional Bayesian Mixture Of Experts Markov Model - BM3E, Conditional Random Fields -CRF, Max. Entropy Models - MEMM) Condition on the observation
  • 18. Temporal Inference • x t state at time t • O t = (o1 , o 2 ,...o t ) observations up to time t p (x t | O t ) time t p (x t +1 | O t ) p(o t +1 | x t +1 ) p(x t +1 | O t +1 ) time t+1 cf. CONDENSATION, Isard and Blake, 1996
  • 19. Generative / Alignment Methods • Modeling • Methods for temporal inference • Learning low-dimensional representations and parameters
  • 20. Model-based Multiview Reconstruction Kehl, Bray and van Gool ‘05 • Body represented as a textured 3D mesh • Tracking by minimizing distance between 3d points on the mesh and volumetric reconstruction obtained from multiple cameras
  • 21. Generative 3D Reconstruction Annealed Particle Filter (Deutscher, Blake and Reid, ‘99-01) monocular Careful design • Dynamics • Observation likelihood – edge + silhouettes • Annealing-based search procedure, improves over particle filtering • Simple background and clothing Improved results (complex motions) when multiple cameras (3-6) were used
  • 22. Generative 3D Reconstruction Sidenbladh, Black and Fleet, ’00-02; Sigal et al ‘04 Monocular Multi-camera • Condensation-based filter • Dynamical models – walking, snippets • Careful learning of observation • Non-parametric belief propagation, initialization by limb likelihood distributions detection and triangulation
  • 23. Kinematic Jump Sampling Sminchisescu & Triggs ’01,’03 mi = (µi , Σi ) p t −1 C=SelectSamplingChain(mi) C C1 CM Candidate Sampling Chains s=CovarianceScaledSampling(mi) T=BuildInterpretationTree (s,C) E=InverseKinematics(T) E z R z P1 Rx1Ry1Rz1 Prune and locally optimize E z pt P2 Rx2 z Rx3Ry3 P3 P4 z
  • 24. Kinematic Jump Sampling <v> Sminchisescu and Triggs ‘03
  • 25. What can we learn? • Low-dimensional perceptual representations, dynamics (unsupervised) – What is the intrinsic model dimensionality? – How to preserve physical constraints? – How to optimize efficiently? • Parameters (typically supervised) – Observation likelihood (noise variance, feature weighting) – Can learn separately (easier) but how well we do? – Best to learn by doing (i.e. inference) • Maximize the probability of the right answer on the training data, hence learning = inference in a loop • Need efficient inference methods
  • 26. Intrinsic Dimension Estimation <v> and Latent Representation for Walking Intrinsic dimension estimation • 2500 samples from motion capture d = lim r →0 log N (r ) log(1 / r ) • The Hausdorff dimension (d) is effectively 1, lift to 3 for more flexibility • Use non-linear embedding to learn the latent 3d space embedded in an ambient 30d human joint angle space Optimize in 3d latent space Sminchisescu and Jepson ‘04
  • 27. 3D Model-Based Reconstruction (Urtasun, Fleet, Hertzmann and Fua’05) • Track human joints using the WSL tracker (Jepson et al’01) • Optimize model joint re-projection error in a low-dimensional space obtained using probabilistic PCA (Lawrence’04)
  • 28. Learning Empirical Distribution of Edge Filter Responses (original slide courtesy of Michael Black) pon(F) poff (F) Likelihood ratio, pon/ poff , used for edge detection Geman & Jednyak and Konishi, Yuille, & Coughlan
  • 29. Learning Dependencies (original slide courtesy of Michael Black); Roth, Sigal and Black’04
  • 30. Learning Dependencies (original slide courtesy of Michael Black); Roth, Sigal and Black ’04 Filter responses are not conditionally independent Leaning by Maximum Entropy
  • 31. The effect of learning on the trajectory distribution Before After • Learn body proportions + parameters of the observation model (weighting of different feature types, variances, etc) • Notice reduction in uncertainty • The ambiguity diminishes significantly but does not disappear Sminchisescu, Welling and Hinton ‘03
  • 32. Conditional /Discriminative/ Indexing Methods • Nearest-neighbor, snippets • Regression • Mixture of neural networks • Conditional mixtures of experts • Probabilistic methods for temporal Integration
  • 33. Discriminative 3d: Nearest Neighbor Parameter Sensitive Hashing (PSH) Shakhnarovich, Viola and Darell ’03 • Relies on database of (observation, state) pairs rendered artificially – Locates samples that have observation components similar to the current image data (nearest neighbors) and use their state as putative estimates • Extension to multiple cameras and tracking by non-linear model optimization (PSH used for initialization Demirdjan et al, ICCV05) – Foreground / background segmentation from stereo
  • 34. Discriminative 3d: Nearest Neighbor Matching 2D->3D Pose + Annotation Ramanan and Forsyth ‘03 user 3D motion Annotations + library {run,walk, wave, etc.} StandWave Motion Synthesizer match 1/2 second clips of motion 3D pose and model annotation build detect original video 2D track
  • 35. 2D->3D pose + annotation <v> Ramanan and Forsyth’03
  • 36. Discriminative 3d: Regression Methods Aggarwal and Triggs ‘04, Elgammal & Lee ‘04 • (A&T) 3d pose recovery by non-linear regression against silhouette observations represented as shape context histograms – Emphasis on sparse, efficient predictions, good generalization • (A&T) Careful study of dynamical regression-based predictors for walking and extensions to mixture of regressors (HCI’05) • (E&L) pose from silhouette regression where the dimensionality of the input is reduced using non-linear embedding – Latent (input) to joint angle (output) state space map based on RBF networks
  • 37. Discriminative 3d: Specialized Mappings Architecture Rosales and Sclaroff ‘01 • Static 3D human pose estimation from silhouettes (Hu moments) • Approximates the observation-pose mapping from training data – Mixture of neural networks – Models the joint distribution • Uses the forward model (graphics rendering) to verify solutions
  • 38. Conditional Bayesian Mixtures of Experts Probability (gate) of Expert Cluster 1 Cluster 2 Single Expert Cluster 3 Expert 1 Expert 2 Expert 3 Data Sample Multiple Experts Expert Proportions (Gates) vs. uniform coefficients (Joint) • A single expert cannot represent multi-valued relations • Multiple experts can focus on representing parts of the data • But the expert contribution (importance) is contextual – Disregarding context introduces systematic error (invalid extrapolation) • The experts need observation-sensitive mixing proportions
  • 39. Discriminative Temporal Inference BM3E= Conditional Bayesian Mixture of Experts Markov Model • `Bottom-up’ chain Conditions on the observation p(xt | R t ) = ∫ p (x x t −1 t | x t −1 , rt ) p ( x t −1 | R t −1 ), where R t = (r1 ,..., rt ) Local conditional Temporal (filtered) prior • The temporal prior is a Gaussian mixture • The local conditional is a Bayesian mixture of Gaussian experts • Integrate pair-wise products of Gaussians analytically Sminchisescu, Kanaujia, Li, Metaxas ‘05
  • 40. Turn during Dancing <v> Notice imperfect silhouettes Sminchisescu, Kanaujia, Li, Metaxas ‘05
  • 41. Washing a Window <v> Input Filtered Smoothed Sminchisescu, Kanaujia, Li, Metaxas ‘05
  • 42. Low-dimensional Discriminative Inference • The pose prediction problem is highly structured – Human joint angles are correlated, not independent – Learn conditional mixtures between low-dimensional spaces decorrelated using kernel PCA (kBME) RVM – Relevance Vector Machine KDE – Kernel Dependency Estimator Sminchisescu, Kanaujia, Li, Metaxas ‘05
  • 43. Low-dimensional Discriminative Inference (translation removed for better comparison) 4d 10d 6d 56d Sminchisescu, Kanaujia, Li, Metaxas ‘05
  • 44. Evaluation on artificially generated silhouettes with 3d ground truth (average error / average maximum error, per joint angle, in degrees) 84 • NN = nearest neighbor • RVM = relevance vector machine • BME = conditional Bayesian mixture of experts
  • 45. Evaluation, low-dimensional models (average error / joint angle in degrees) • KDE-RR=ridge regressor between low-dimensional spaces • KDE-RVM=RVM between low-dimensional spaces – Unimodal methods average competing solutions • kBME=conditional Bayesian mixture between low- dimensional state and observation spaces – Training and inference is about 10 time faster
  • 46. Self-supervised Learning of a Joint Generative-Recognition Model • Maximize the probability of the (observed) evidence (e.g. images of humans) pθ ( x, r ) p ( x, r ) log pθ (r ) = log ∫ Qυ (x | r) ≥ ∫ Qυ (x | r)log θ = KL(Qυ (x | r) || pθ ( x, r )) x Qυ (x | r) x Qυ (x | r) log pθ (r ) − KL(Qυ (x | r) || pθ ( x | r )) = KL(Qυ (x | r) || pθ ( x, r )) • Hence, the KL divergence between what the generative model p infers and what the recognition model Q predicts, with tight bound at Qυ (x | r) = pθ ( x | r )
  • 47. Self-supervised Learning of a Joint Generative-Recognition Model • Local optimum for parameters • Recognition model is a conditional mixture of data- driven mean field experts – Fast expectations, dimensions decouple
  • 48. Generalization under clutter Mixture of experts Single expert Sminchisescu, Kanaujia, Metaxas ‘06
  • 49. Take home points • Multi-view 3d reconstruction reliable in the lab – Measurement-oriented – Geometric, marker-based • correspondence + triangulation – Optimize multi-view alignment • generative, model-based – Data-association in real-world (occlusions) open • Monocular 3d as robust limit of multi-view – Difficulties: depth perception + self-occlusion – Stronger dependency on efficient non-convex optimization and good observation models – Increased emphasis on prior vs. measurement
  • 50. Take home points (contd.) • Top-down / Generative / Alignment Models – Flexible, but difficult to model human appearance – Difficult optimization problems, local optima – Can learn constrained representations and parameters • Can handle occlusion, faster search (low-d) • Fewer local optima -- the best more likely true solutions • Discriminative / Conditional / Exemplar–based Models – Need to model complex multi-valued relations – Replace inference with indexing / prediction – Good for initialization, recovery from failure, on-line – Still need to deal with segmentation / data association
  • 51. For more information http://www.cs.toronto.edu/~crismin http://ttic.uchicago.edu/~crismin crismin@cs.toronto.edu crismin@nagoya.uchicago.edu For a tutorial on 3d human sensing from monocular images, see: http://www.cs.toronto.edu/~crismin/PAPERS/tutorial_iccv05.pdf
  • 52. Cristian Sminchisescu thanks the following collaborators on various elements of the research described in the 2d-3d relations section • Geoffrey Hinton (Toronto) • Allan Jepson (Toronto) • Atul Kanaujia (Rutgers) • Zhiguo Li (Rutgers) • Dimitris Metaxas (Rutgers) • Bill Triggs (INRIA) • Max Welling (UCIrvine)