A Unified Framework for Multi-Target Tracking and Collective Activity Recognition

Wongun Choi and Silvio Savarese
VisionLab, University of Michigan, Ann Arbor
Our Goal
• Multiple target tracking
• Recognize activities at different levels of granularity
  – Atomic activity & pose (e.g., walking facing-front, walking facing-back)
  – Pairwise interaction (e.g., walking-side-by-side, moving-to-opposite)
  – Collective activity (e.g., crossing)
• Solve all problems jointly!!
Background

Prior work has investigated each of these problems largely in isolation:

Target Tracking: Wu et al, 2007; Avidan, 2007; Zhang et al, 2008; Breitenstein et al, 2009; Ess et al, 2009; Wojek et al, 2009; Geiger et al, 2011; Brendel et al, 2011; Pirsiavash et al, 2011

Atomic Activity: Bobick & Davis, 2001; Efros et al, 2003; Schuldt et al, 2004; Dollar et al, 2005; Niebles et al, 2006; Laptev et al, 2008; Rodriguez et al, 2008; Wang & Mori, 2009; Gupta et al, 2009; Liu et al, 2009; Marszalek et al, 2009; Liu et al, 2011

Pairwise Interaction: Zhou et al, 2008; Ryoo & Aggarwal, 2009; Yao et al, 2010; Choi et al, 2010; Patron-Perez et al, 2010

Collective Activity: Choi et al, 2009; Li et al, 2009; Lan et al, 2010; Ryoo & Aggarwal, 2010; Choi et al, 2011; Khamis et al, 2011; Lan et al, 2012; Khamis et al, 2012; Amer et al, 2012

Hierarchy of activities: Lan et al, 2010; Khamis et al, 2012; Amer et al, 2012
Background

Target Tracking · Atomic Activity · Pairwise Interaction · Collective Activity
Contributions
• Bottom-up activity understanding.

[Hierarchy figure, bottom-up: trajectories → atomic activities (Walking) → interactions (Approaching) → collective activity (Gathering)]

[Example frames: trajectories interpreted as "Meeting and Leaving" vs. "Crossing"]
Contributions
• Bottom-up activity understanding.
• Contextual information propagates top-down.

[Hierarchy figure, top-down: collective activity (Gathering) → interactions (Approaching) → atomic activities (Walking) → trajectories]

[Example frames: the "Meeting and Leaving" and "Crossing" contexts guide how the broken trajectories are associated]

Related: simple social force models (Pellegrini et al, 2009; Choi et al, 2010; Leal-Taixe et al, 2012; etc.): repulsion & attraction only.
Outline

• Joint Model
• Inference/Training method
• Experimental evaluation
• Conclusion
Hierarchical Activity Model
Input: video with tracklets
Hierarchical Activity Model

[Factor graph, unrolled over frames t-1, t, t+1: per-target observations O_A feed atomic activity variables A; atomic activities connect to pairwise interaction variables I; interactions connect to a per-frame collective activity variable C, which also has its own observation O_C. Right: a single frame with nodes C; I12, I13, I23; A1, A2, A3; O1, O2, O3; O_C.]
Hierarchical Activity Model

[Same factor graph; the model is written as an energy function that factorizes into local potentials, one per edge type in the graph (a hedged sketch of such a factorization follows below).]
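A hedged sketch of how such an energy might factorize over the potentials described on the following slides (the notation and exact terms here are assumptions, not copied from the paper):

\Psi(C, I, A, f \mid O) =
    \sum_{t}\sum_{i} \psi\!\left(A_i^{t}, O_{A,i}^{t}\right)
  + \sum_{t}\sum_{i<j} \psi\!\left(I_{ij}^{t}, A_i^{t}, A_j^{t}, f\right)
  + \sum_{t} \psi\!\left(C^{t}, \mathbf{I}^{t}\right)
  + \sum_{t} \psi\!\left(C^{t}, O_C^{t}\right)
  + \sum_{t}\left[ \psi\!\left(C^{t}, C^{t+1}\right)
  + \sum_{i} \psi\!\left(A_i^{t}, A_i^{t+1}\right)
  + \sum_{i<j} \psi\!\left(I_{ij}^{t}, I_{ij}^{t+1}\right)\right]
  + \Psi_{\text{track}}(f)

The first four sums correspond to the atomic-observation, interaction-atomic, collective-interaction, and collective-observation potentials, the bracketed sums to the activity transition potentials, and the last term to trajectory estimation.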
Atomic-Observation Potential

Atomic Activity Models
• Action: BoW with STIP (Dollar et al, 06; Niebles et al, 07)
• Pose: HoG (Dalal and Triggs, 05)

[Factor graph with the atomic activity-observation (A-O_A) edges highlighted; an illustrative scoring sketch follows below.]
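To make the ingredients concrete, here is a minimal sketch (an assumption about the form, not the paper's implementation) of a linear atomic-observation score built from a BoW histogram of quantized STIP features and a HoG pose descriptor; the interfaces and weight vectors are hypothetical:

import numpy as np

def bow_histogram(stip_codeword_ids, vocab_size):
    """L1-normalized bag-of-words histogram over quantized STIP descriptors
    extracted around one tracklet."""
    ids = np.asarray(stip_codeword_ids, dtype=int)
    hist = np.bincount(ids, minlength=vocab_size).astype(float)
    return hist / max(hist.sum(), 1.0)

def atomic_observation_score(stip_codeword_ids, hog_descriptor, w_action, w_pose):
    """Linear compatibility between one atomic-activity hypothesis (an action/pose
    label pair) and its image observations; w_action and w_pose are the learned
    weight vectors associated with that label."""
    h = bow_histogram(stip_codeword_ids, len(w_action))
    return float(np.dot(w_action, h) + np.dot(w_pose, hog_descriptor))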
Interaction-Atomic Potential

Learned interaction model: Standing-in-a-line
Observed atomic activities: A: Standing, Facing-left | A: Standing, Facing-left
→ compatible with the learned model, so the potential is high.

[Factor graph with the interaction-atomic (I-A) edges highlighted.]
Interaction-Atomic Potential

Learned interaction model: Standing-in-a-line
Observed atomic activities: A: Standing, Facing-right | A: Standing, Facing-left
→ incompatible with the learned model, so the potential is low (an illustrative sketch of such a compatibility score follows below).
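As a hedged illustration only (the label set, feature, and weights below are hypothetical, not taken from the paper), an interaction-atomic compatibility could be scored as a learned linear function of the two targets' poses and relative location:

import numpy as np

POSES = ["left", "right", "front", "back"]   # hypothetical pose vocabulary

def interaction_atomic_score(pose_i, pose_j, rel_location, w_interaction):
    """Compatibility between one interaction hypothesis (e.g. "standing-in-a-line")
    and the observed atomic activities of two people.  w_interaction is the learned
    weight vector for that interaction over a joint feature of (pose_i, pose_j,
    relative location)."""
    f = np.zeros(len(POSES) * len(POSES) + 2)
    f[POSES.index(pose_i) * len(POSES) + POSES.index(pose_j)] = 1.0   # pose co-occurrence
    f[-2:] = rel_location                                             # e.g. (dx, dy) between targets
    return float(np.dot(w_interaction, f))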
Collective-Interaction Potential

Learned collective activity model: Queuing
Observed interactions: standing-side-by-side; one-after-the-other; one-after-the-other; one-after-the-other
→ compatible with the learned model for queuing, so the potential is high.

[Factor graph with the collective activity-interaction (C-I) edges highlighted.]
Collective-Interaction Potential

Learned collective activity model: Queuing
Observed interactions: facing-each-other; facing-opposite-side
→ incompatible with the learned model for queuing, so the potential is low (a hedged formulation follows below).
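One plausible way to write such a potential (this specific form is an assumption, not copied from the paper) is as a sum of learned co-occurrence weights between the collective activity label and each observed interaction label:

\psi\!\left(C^{t}, \mathbf{I}^{t}\right) \;=\; \sum_{i<j} \mathbf{w}^{\top}\,\phi\!\left(C^{t}, I_{ij}^{t}\right),
\qquad
\phi(C, I)_{(c,k)} \;=\; \mathbb{1}[C = c]\;\mathbb{1}[I = k],

so each (collective activity, interaction) pair carries one learned weight: frequent pairs such as (queuing, one-after-the-other) receive large weights, while rare pairs such as (queuing, facing-each-other) receive small or negative ones.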
Collective-Observation Potential

Collective Activity
• STL (crowd context) descriptor computed over all targets (Choi et al, 09)

[Factor graph with the collective activity-observation (C-O_C) edges highlighted.]
Activity Transition Potential

Smooth activity transition: collective activities, interactions, and atomic activities are encouraged to change smoothly between adjacent frames (a hedged example of such a term follows below).

[Factor graph with the temporal edges between frames t-1, t, t+1 highlighted.]
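For illustration, one common parameterization of such a smoothness term (an assumption here, not necessarily the paper's) is a Potts-style potential that rewards keeping the same label across consecutive frames:

\psi\!\left(C^{t}, C^{t+1}\right) \;=\; w_{c}\,\mathbb{1}\!\left[C^{t} = C^{t+1}\right],

with analogous terms for the interaction variables I and the atomic activity variables A.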
Trajectory Estimation

[Factor graph with the tracklet-association term (variable f) highlighted.]
Tracklet Association Problem

Input: fragmented trajectories (tracklets), caused by
• detector failures
• occlusion between targets
• scene clutter
• etc.

Output: a set of trajectories with correct IDs (shown by color).
Tracklet Association Model

Association hypotheses are scored with simple match costs, c:
• Location affinity
• Appearance/Color
• ...
Both cues can be ambiguous, e.g. at the crossing point or when people wear similar clothing (a sketch of such costs follows below).

[Factor graph with the association variable highlighted; "?" marks in the figure indicate ambiguous matches.]
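A rough sketch of what such pairwise costs might look like (the field names, the Gaussian location model, and the Bhattacharyya color similarity are assumptions for illustration, not the paper's definitions):

import numpy as np

def location_affinity(end_pos, end_vel, start_pos, gap_frames, sigma=1.0):
    """How well a constant-velocity extrapolation of tracklet A predicts the
    starting position of tracklet B after gap_frames frames."""
    predicted = np.asarray(end_pos) + np.asarray(end_vel) * gap_frames
    d2 = float(np.sum((predicted - np.asarray(start_pos)) ** 2))
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))

def appearance_affinity(hist_a, hist_b):
    """Color-histogram similarity (Bhattacharyya coefficient) between two tracklets."""
    return float(np.sum(np.sqrt(np.asarray(hist_a) * np.asarray(hist_b))))

def match_cost(track_a, track_b, w_loc=1.0, w_app=1.0):
    """Cost of linking tracklet A to tracklet B: lower is better."""
    return -(w_loc * location_affinity(track_a["end_pos"], track_a["end_vel"],
                                       track_b["start_pos"], track_b["gap"])
             + w_app * appearance_affinity(track_a["hist"], track_b["hist"]))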
Tracklet Association Model

Interaction labels provide additional context for association. For example, given that the activity is "Crossing", candidate associations that are consistent with the learned crossing model receive high energy, while inconsistent ones receive low energy.

[Factor graph; crossing example with two alternative ways of linking the broken trajectories.]
Outline

• Joint Model
• Inference/Training method
• Experimental evaluation
• Conclusion
Inference

Maximizing the joint energy over all variables is a non-convex problem!!
Inference

The inference alternates between two sub-problems (a schematic sketch follows below):
• Activity recognition given the association f: iterative belief propagation.
• Tracklet association given C, I, A: a novel branch-and-bound method.
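A schematic of the alternation, with the two solvers and the initializer passed in as callables (their interfaces here are assumptions; this shows the control flow only, not the actual implementation):

def joint_inference(tracklets, model, init_association, bp_solver, bb_solver, n_iters=5):
    """Alternate between the two sub-problems, here simply for a fixed number of iterations."""
    f = init_association(tracklets)                 # e.g. a greedy initial matching
    labels = None
    for _ in range(n_iters):
        labels = bp_solver(tracklets, f, model)     # (C, I, A) given association f
        f = bb_solver(tracklets, labels, model)     # association f given (C, I, A)
    return labels, f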
Training
• Model weights are learned in a max-margin framework using a Structural SVM (Tsochantaridis et al, 2004); the generic objective is sketched below.
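For reference, the generic margin-rescaled structural SVM objective of this kind (the loss Delta and joint feature map Phi below are placeholders; the paper's exact choices may differ) is:

\min_{\mathbf{w},\;\xi \ge 0}\;\; \frac{1}{2}\|\mathbf{w}\|^{2} + \frac{\lambda}{N}\sum_{n=1}^{N}\xi_{n}
\quad\text{s.t.}\quad
\mathbf{w}^{\top}\!\left[\Phi(x_{n}, y_{n}) - \Phi(x_{n}, y)\right] \;\ge\; \Delta(y_{n}, y) - \xi_{n}
\qquad \forall n,\;\forall y \ne y_{n},

where x_n is a video with its tracklets and y = (C, I, A, f) collects the activity and association variables.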
Outline

• Joint Model
• Inference/Training method
• Experimental evaluation
• Conclusion
Experiments
• Collective Activity Dataset (Choi et al, 2009)
  – 44 videos with multiple people
  – Crossing, Waiting, Queuing, Walking, Talking

Annotations:
• Target identities
• Interactions: approaching, leaving, passing-by, facing-each-other, etc.
• Atomic activities: pose (facing-right, facing-left, …) and action (walking, standing)
Experiments
• New Dataset
  – 32 videos with multiple people
  – Gathering, Talking, Dismissal, Walking-together, Chasing, Queuing

Annotations:
• Target identities
• Interactions: approaching, walking-in-oppos.., facing-each-other, standing-in-a-row, etc.
• Atomic activities: pose (facing-right, facing-left, …) and action (walking, standing, running)
Classification Results

Overall collective activity classification accuracy:

Collective Activity Dataset (2009)
  Crowd Context (Choi et al, 2009): 73.3%
  Ours: 79.9%  (+6.6%)

New Dataset
  Crowd Context (Choi et al, 2009): 74.3%
  Ours: 79.2%  (+4.9%)
Target Association (results on the VSWS09 dataset)

                            Tracklet   No Interaction   With Interaction   With GT Activities
# of errors                   1556          1109               894                736
Improvement over tracklet       0%         28.73%            42.54%             52.76%
Example Classification Result

Interaction labels:
AP: approaching
FE: facing-each-other
SR: standing-in-a-row
...
Example Classification Result

Atomic activities:
Action: W - walking, S - standing
Pose (8 directions): L - left, LF - left/front, F - front, RF - right/front, etc.
Example Classification Result

Pair interactions:
• AP: approaching
• ...
• FE: facing-each-other
• SS: standing-side-by-side
• SQ: standing-in-a-queue
Example Classification Result

Tracklet association:
Color/number: target ID
Solid boxes: tracklets
Dashed boxes: match hypotheses
Conclusion
• Propose a novel model for joint activity recognition and target association.

[Single-frame factor graph: C; I12, I13, I23; A1, A2, A3; O1, O2, O3; O_C]
Conclusion
• Propose a novel model for joint activity recognition and target association.
• High-level contextual information helps improve target association accuracy significantly.
Conclusion
• Propose a novel model for joint activity recognition and target association.
• High-level contextual information helps improve target association accuracy significantly.
• Best classification results on collective activity recognition to date.
Thanks to
Yingze Bao, Byungsoo Kim, Min Sun, Yu Xiang, ONR, and the anonymous reviewers
Branch-and-Bound Association

The association objective is not PSD, hence non-convex.

• General search algorithm
• Guarantees the exact solution
• Requires
  – a branch operation
  – a bound operation
Branch-and-Bound Illustration

• Branch: split a node Q into two sub-problems Q0 and Q1.
• Bound: compute an upper bound U(·) and a lower bound L(·) for each node; when U(Q1) < L(Q0), the sub-problem Q1 can be discarded.

[Illustration: bars showing U(Q) and L(Q) for Q0, Q1, and Q.]
Branch-and-Bound Association

• Bound
  – Lower bound, evaluated for each interaction variable (e.g., I12, I34)
Branch-and-Bound Association

• Bound (cont.)
  – Per interaction variable: only one non-zero activation per line in Hi.
Branch-and-Bound Association

• Bound (cont.)
  – Lower bound: solved as a Binary Integer Program
  – Upper bound
Branch-and-Bound Association

• Branch
  – Divide the problem into two disjoint sub-problems,
    e.g. Q0 => f = [1, x, x, x, x, x, ….]
         Q1 => f = [0, x, x, x, x, x, ….]
  – Find the most ambiguous variable to branch on.

(A generic best-first sketch of this procedure follows below.)
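The following is a generic best-first branch-and-bound sketch over a binary association vector f. The bound function, the objective, and the variable-selection rule are passed in as placeholders; it illustrates only the branch/bound/prune mechanism, not the paper's specific lower bound (binary integer program) or upper bound.

import heapq
import itertools

def branch_and_bound(n_vars, upper_bound, objective, pick_variable):
    """Maximize `objective` over binary vectors f of length n_vars.
    upper_bound(partial) must upper-bound the objective of every completion of a
    partial assignment (a dict: index -> 0/1); pick_variable(partial) returns an
    unassigned index, e.g. the most ambiguous one.  All callables are
    problem-specific placeholders."""
    counter = itertools.count()                       # tie-breaker for the heap
    best_val, best_f = float("-inf"), None
    queue = [(-upper_bound({}), next(counter), {})]   # max-heap via negated bounds
    while queue:
        neg_ub, _, node = heapq.heappop(queue)
        if -neg_ub <= best_val:
            continue                                  # prune: cannot beat the incumbent
        if len(node) == n_vars:                       # complete assignment: score exactly
            val = objective(node)
            if val > best_val:
                best_val, best_f = val, node
            continue
        var = pick_variable(node)
        for value in (0, 1):                          # branch into two disjoint sub-problems
            child = dict(node)
            child[var] = value
            ub = upper_bound(child)
            if ub > best_val:
                heapq.heappush(queue, (-ub, next(counter), child))
    return best_val, best_f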


Editor's Notes

  1. Good afternoon. I'm Wongun Choi from the University of Michigan. This is joint work with my advisor, Silvio Savarese.
  2. Consider a video sequence with multiple people,
  3. our goal is to understand behavior of all individuals in the scene.
  4. Firstly, we want to estimate the trajectories of all individuals
  5. Also, we want to recognize the semantic activities of the people at different levels of granularity. Such activities include: [show1] single-person activities in isolation, which we call atomic activities, [show2] such as walking or standing, [show3] and the pose of individuals, [show4] such as facing front or back.
  6. We also want to recognize the interplay between pairs of people, which we call interactions, for instance [show1] walking side by side or [show2] moving in opposite directions.
  7. And finally, we want to identify the overall behavior of all people in the scene, the collective activity, [show1] for example crossing.
  8. Most importantly, we address all the problems in a unified framework.
  9. So far, a large body of literature has proposed methods for [show1] tracking multiple targets and recognizing [show2] atomic activities, [show3] pairwise interactions, [show4] and collective activities, [show5] but most of the time these problems are addressed in isolation. [show6] Some exceptions are shown here, but they only focus on solving atomic and collective activity recognition.
  10. As opposed to previous works, we propose to solve all four problems in a joint fashion. Let me explain this concept in a bit more detail.
  11. Our model is able to transfer information in a bottom-up fashion. [show1] From the estimated trajectories of individuals and their atomic activities, we can obtain robust characterizations of interactions and collective activities.
  12. For instance, [show1] if we observe these trajectories, [show2] we can easily infer that the activity is meeting and leaving, [show3] but if we see these trajectories, [show4] we will say it is a crossing activity.
  13. At the same time, information flows from the [show1] top down so as to provide critical contextual information to the lower levels of the hierarchy: collective activities help understand interactions; interactions help understand atomic activities and track associations.
  14. Let me give you an example of how activity understanding helps trajectory estimation. [show1] For example, if we are given this set of broken trajectories and [show2] we know that the underlying activity is meeting and leaving, we can interpret these trajectories as follows. [show3]
  15. Now, given the same broken trajectories, if we know that the activity is crossing, [show1] we can interpret the trajectories as follows. [show2] A similar concept has been introduced as a social force model in previous works; however, those works only considered a few hand-designed types of interactions, such as [show3] repulsion and attraction. We generalize this concept and let high-level activity understanding guide the process of associating tracks.
  16. 3 minutes up to this!!! This is an outline of today's talk.
  17. Let's begin with our joint model.
  18. Given a video with a set of short trajectory fragments, [show1] tracklets, the activities of all individuals are encoded in a hierarchical graphical model using a factor graph. The components of our factor graph model are shown on the left.
  19. [show1] From each of the tracklets, we obtain observation cues for individuals. [show2] Grounded on each observation, we model the atomic activity with variable A. [show3] We encode the pairwise interaction between individuals with variable I. [show4] Finally, we assign one collective activity variable, C, to characterize the overall behavior of the individuals. [show5] We also provide a top-down observation cue for variable C. [show6] By considering the temporal relationships among variables as well, our full model can be compactly represented as the graph shown here.
  20. The model can be written as the energy function shown on the right. [show1] The energy factorizes into multiple local potentials, each of which encodes a relationship among variables. Let's see the details of what each factor represents.
  21. The highlighted potential encodes the compatibility between atomic activity models and the corresponding observations. [show1] Atomic activities are modeled by a BoW representation built on spatio-temporal interest point features, [show2] and pose is described using HoG.
  22. The second potential psi(I, A, f) captures the compatibility between interaction models and observations of atomic activities. For instance, [show1] here we illustrate a visualization of a possible learnt model for the "standing-in-a-line" interaction. This model captures the property that two people who "stand in a line" tend to be located nearby and face the same direction. [show2] Thus, if we are given these observations of atomic activities – standing and facing left – which are compatible with the learnt "standing-in-a-line" interaction, [show3] the potential is high.
  23. On the other hand, if we are given these observations of atomic activities – standing, facing left, facing right – the compatibility with the model is weak, and thus [show1] the potential is low. [show2] Such a relationship can be compactly represented by the equation below.
  24. Similarly, the potential psi(C, I) encodes the compatibility between collective activity models and a set of observations of pairwise interactions. [show1] For example, here we illustrate a visualization of a possible learnt model for the collective activity "queuing". This model captures the probability of occurrence of interaction labels such as "one-after-the-other" and "facing-each-other". For the collective activity "queuing", the interaction "one-after-the-other" is highly probable to occur; the interaction "facing-each-other" is much less so. [show2] Thus, if we observe this set of interactions, [show3] standing-side-by-side, [show4] one-after-the-other, [show5] and so on, with queuing, [show3] the potential psi(C, I) is high.
  25. On the other hand, [show1] if we observe these interactions, facing-each-other and facing-opposite-direction, [show2] the potential is low, since these are not compatible with the learned model for queuing. [show3] Such a relationship is encoded by this equation.
  26. Similarly to the bottom-up cues for atomic activities, the highlighted potential encodes the compatibility between collective activity models and the corresponding observations. [show1] These are obtained using the crowd context descriptor introduced by Choi et al in 2009.
  27. We also encode temporal smoothness between [show1] collective activities, [show2] interactions, [show3] and atomic activities in adjacent time frames.
  28. The last term captures the potential related to trajectory estimation. Before we talk about it, let me briefly discuss how we define the tracking problem.
  29. Suppose we have the activity "people crossing"; [show1] it is likely that our low-level observations won't be just a pair of clean tracks such as the red and black ones.
  30. Rather, we observe a fragmented set of trajectories – which we call tracklets. This is because of [show1] detection failures, occlusion, scene clutter, and so on. Given such a set of initial inputs, our goal is [show2] to obtain trajectories with consistent IDs by associating the tracklets.
  31. As in traditional track association problems, we introduce [show1] the variable f for capturing association hypotheses. A simple solution is to have the association cost vector c encode properties such as [show2] location affinity, color similarity, and so on. As this example shows, these properties don't always work: [show3] location affinity becomes ambiguous at the point of crossing, [show4] and color or appearance similarity is not always reliable when people wear similar clothing.
  32. In addition to traditional cues, we advocate that the interaction labels provide critical contextual information to guide the process of associating tracklets. [show1] Such information is encoded by the interaction potential that we discussed earlier. [show1] For instance, if we know the interaction is "crossing", this type of association [next]
  33. [cont.] will give rise to high energy, [show1] since this association is compatible with the learnt model for crossing,
  34. whereas this type of association [show1] will give rise to low energy. In the interest of time, we skip the details of the mathematics here.
  35. Let’s see how we solve the inference problem and train the model.
  36. The inference problem can be represented as finding the configuration of all variables that maximizes the joint energy function. [show1] The optimization, however, is computationally very demanding.
  37. Thus, we introduce a new iterative method to solve the problem. [show1] Given an initial tracklet association, we obtain activity labels using [show2] iterative belief propagation, [show3] and given activity labels, [show4] we obtain the tracklet association using our novel branch-and-bound method. In the interest of time, we skip the details of each inference method.
  38. Finally, we obtain the model parameters from a set of training data using a structural SVM framework.
  39. Let’s discuss our experimental evaluation
  40. For the evaluation, we use the Collective Activity Dataset, proposed by us in 2009. In addition to the collective activity labels, we provide annotations for [show1] target identities, [show2] interactions between pairs of people, [show3] and atomic properties of individuals.
  41. We also collected an additional dataset to test our framework, composed of 32 videos with 6 collective activities. We similarly provide labels for target identities, interactions, and atomic activities.
  42. Firstly, we compare the collective activity classification accuracy of a baseline method using the crowd context descriptor we introduced in 2009 and 2011 with that of our full hierarchical representation. By utilizing the hierarchical structure in our model, we obtain a good improvement in overall classification, about 6%, on the Collective Activity Dataset. We observe a similar improvement, about 5%, on the new dataset.
  43. Now we analyze the tracklet association results. The first row shows the number of errors in tracklet matching, and the second row shows the percentage improvement over the input tracklets.
  44. By solving the tracklet association without interaction cues, we obtain about a 30% improvement over the input tracklets, i.e. about 400 fewer matching errors.
  45. By incorporating our interaction model with the estimated activity labels, we obtain a further 14% improvement, i.e. about 200 fewer matching errors than the baseline without interactions.
  46. Notice that if ground-truth activity labels are given, we obtain an upper-bound target association error of 736, corresponding to a 53% improvement over the input. More quantitative evaluation can be found in the paper.
  47. Here we show exemplar results obtained on the newly proposed dataset. The estimated collective activity label is overlaid on top, and interaction labels are displayed between pairs of people.
  48. Now let me show results from a more complex sequence. Here, we show the estimated atomic activity label for each individual, overlaid below each bounding box, where …
  49. and interactions between pairs of people. SQ represents the interaction standing-in-a-queue.
  50. Finally, we show the estimated collective activity variable on top.
  51. Here, the video shows the tracklet association result obtained using our unified framework. The color and number on top of each bounding box show the identity of the target, solid boxes represent tracklets, and dashed boxes show smoothed paths that associate two tracklets. As you can see, our model keeps the targets' identities consistent even after the occlusion by associating the tracklets correctly.
  52. Finally, we show an exemplar comparison between the tracklet association results obtained without and with the interaction model. [show1] Due to the severe occlusion incurred by the car motion, [show2] the traditional method without interaction context loses the targets' identities and assigns new IDs to all targets after the occlusion. [show3] On the other hand, when we include the interaction in the association model, [show4] we keep the identities of the targets even in such a challenging scenario.
  53. In this paper, we proposed a novel model that seamlessly relates target tracking with atomic activity, interaction, and collective activity understanding. We show that by solving all of these problems together, we can achieve a better understanding of high-level activities. Most interestingly, we also show that high-level activity understanding helps improve trajectory estimation.