Choi ECCV12 presentation


  1. A Unified Framework for Multi-Target Tracking and Collective Activity Recognition. Wongun Choi and Silvio Savarese, University of Michigan, Ann Arbor.
  2. (image-only slide)
  3. Our Goal
  4. Our Goal • Multiple target tracking
  5. Our Goal • Multiple target tracking • Recognize activities at different levels of granularity – Atomic activity & pose (e.g., Walking / Facing-back, Walking / Facing-front)
  6. Our Goal • Multiple target tracking • Recognize activities at different levels of granularity – Atomic activity & pose – Pairwise interaction (e.g., Moving-to-Opposite, Walking-Side-by-Side)
  7. Our Goal • Multiple target tracking • Recognize activities at different levels of granularity – Atomic activity & pose – Pairwise interaction – Collective activity (e.g., Crossing)
  8. Our Goal • Multiple target tracking • Recognize activities at different levels of granularity – Atomic activity & pose – Pairwise interaction – Collective activity • Solve all problems jointly!
  9. Background (each problem has mostly been investigated in isolation):
     • Target Tracking: Wu et al. 2007; Avidan 2007; Zhang et al. 2008; Breitenstein et al. 2009; Ess et al. 2009; Wojek et al. 2009; Geiger et al. 2011; Brendel et al. 2011; Pirsiavash et al. 2011
     • Atomic Activity: Bobick & Davis 2001; Efros et al. 2003; Schuldt et al. 2004; Dollar et al. 2005; Niebles et al. 2006; Laptev et al. 2008; Rodriguez et al. 2008; Wang & Mori 2009; Gupta et al. 2009; Liu et al. 2009; Marszalek et al. 2009; Liu et al. 2011
     • Pairwise Interaction: Zhou et al. 2008; Ryoo & Aggarwal 2009; Yao et al. 2010; Choi et al. 2010; Patron-Perez et al. 2010
     • Collective Activity: Choi et al. 2009; Li et al. 2009; Lan et al. 2010; Ryoo & Aggarwal 2010; Choi et al. 2011; Khamis et al. 2011; Lan et al. 2012; Khamis et al. 2012; Amer et al. 2012
     • Hierarchy of activities: Lan et al. 2010; Khamis et al. 2012; Amer et al. 2012
  10. Background: Target Tracking | Atomic Activity | Pairwise Interaction | Collective Activity
  11. Contributions • Bottom-up activity understanding: trajectories → atomic activities (Walking) → interactions (Approaching) → collective activity (Gathering).
  12. Contributions • Bottom-up activity understanding (video examples: Meeting and Leaving; Crossing).
  13. Contributions • Bottom-up activity understanding. • Contextual information propagates top-down: collective activity (Gathering) → interactions (Approaching) → atomic activities (Walking) → trajectories.
  14. Contributions • Bottom-up activity understanding. • Contextual information propagates top-down (video example: Meeting and Leaving).
  15. Contributions • Bottom-up activity understanding. • Contextual information propagates top-down, in contrast to simple social force models of repulsion and attraction, etc. (Pellegrini et al. 2009; Choi et al. 2010; Leal-Taixe et al. 2012). (Video example: Crossing.)
  16. Outline • Joint Model • Inference/Training method • Experimental evaluation • Conclusion
  17. Outline • Joint Model • Inference/Training method • Experimental evaluation • Conclusion
  18. Hierarchical Activity Model. Input: video with tracklets.
  19. Hierarchical Activity Model. [Graphical model over frames t-1, t, t+1: collective activity C with observation OC, pairwise interactions I (I12, I13, I23), and per-target atomic activities A1, A2, A3 with observations O1, O2, O3.]
  20. Hierarchical Activity Model. [Same graphical model, collapsed per frame: C, I(t), A(t) and observations OC, OA(t).]
  21. Atomic-Observation Potential. Atomic activity models: • Action: bag-of-words (BoW) over spatio-temporal interest points (STIP) (Dollar et al. 06; Niebles et al. 07) • Pose: HoG (Dalal and Triggs 05).
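A minimal sketch of the two atomic observations named on slide 21, assuming skimage for the HoG pose descriptor and scikit-learn for the bag-of-words codebook; the STIP detector is treated as an external step that already yields local descriptors, and none of these library choices or parameters come from the talk.

```python
# Sketch of the two atomic observations on slide 21 (assumed libraries and
# parameters; the talk only names BoW-over-STIP and HoG, not this code).
import numpy as np
from skimage.feature import hog          # HoG descriptor (Dalal & Triggs 05)
from sklearn.cluster import KMeans       # codebook for the bag of words

def pose_feature(person_patch):
    """HoG descriptor of a grayscale person crop (pose observation)."""
    return hog(person_patch, orientations=8,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def build_codebook(train_stip_descriptors, k=100):
    """Cluster STIP descriptors from training videos into k visual words."""
    return KMeans(n_clusters=k, n_init=10).fit(train_stip_descriptors)

def action_feature(tracklet_stip_descriptors, codebook):
    """Normalized bag-of-words histogram over one tracklet's STIP descriptors."""
    words = codebook.predict(tracklet_stip_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```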
  22. Interaction-Atomic Potential. Example: interaction I: Standing-in-a-line scored against atomic activities A: Standing / Facing-left and A: Standing / Facing-left (a consistent configuration).
  23. Interaction-Atomic Potential. Example: the same interaction scored against A: Standing / Facing-right and A: Standing / Facing-left (poses inconsistent with standing in a line).
  24. Collective-Interaction Potential. Example: collective activity C: Queuing scored with interactions I: one-after-the-other (and I: standing-side-by-side).
  25. Collective-Interaction Potential. Example: C: Queuing scored with interactions I: facing-opposite-side and I: facing-each-other (less compatible with queuing).
  26. Collective-Observation Potential. Collective activity observation: spatio-temporal local (STL) descriptors of all targets (Choi et al. 09).
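The STL observation cited on slide 26 comes from Choi et al. 2009. A rough, illustrative sketch of a descriptor in that spirit (bin the people around an anchor person by distance, relative angle, and pose); the bin layout and counts here are assumptions, not the parameters of the original descriptor.

```python
# Illustrative STL-style descriptor: histogram the people around an anchor
# person by distance ring, relative angle, and pose. Bin edges, counts, and
# the pose encoding are assumptions, not the parameters of Choi et al. 2009.
import numpy as np

def stl_descriptor(anchor_pos, anchor_facing, others,
                   radii=(0.5, 2.0, 5.0), n_angles=8, n_poses=8):
    """others: iterable of ((x, y) position, pose index in [0, n_poses))."""
    desc = np.zeros((len(radii), n_angles, n_poses))
    for pos, pose in others:
        d = np.asarray(pos, dtype=float) - np.asarray(anchor_pos, dtype=float)
        ring = np.searchsorted(radii, np.linalg.norm(d))
        if ring >= len(radii):
            continue                      # too far from the anchor to count
        # angle of the neighbor relative to the anchor's facing direction
        rel = (np.arctan2(d[1], d[0]) - anchor_facing) % (2.0 * np.pi)
        a_bin = int(rel / (2.0 * np.pi / n_angles)) % n_angles
        desc[ring, a_bin, pose] += 1.0
    return desc.ravel()
```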
  27. Activity Transition Potential. Encourages smooth activity transitions between consecutive frames.
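Taken together, slides 19 to 27 describe one joint score over the collective activity C, pairwise interactions I, atomic activities A, and the tracklet association f. A schematic form of that sum, with simplified notation (the exact terms and arguments are in the paper):

```latex
\Psi(C, I, A, f, O) \;=\;
     \sum_{i}   \psi_{AO}\big(A_i,\, O_{A_i}\big)        % atomic-observation      (slide 21)
  +  \sum_{i<j} \psi_{IA}\big(I_{ij},\, A_i,\, A_j\big)  % interaction-atomic      (slides 22-23)
  +  \psi_{CI}\big(C,\, I\big)                           % collective-interaction  (slides 24-25)
  +  \psi_{CO}\big(C,\, O_C\big)                         % collective-observation  (slide 26)
  +  \sum_{t}   \psi_{T}\big(C^{t},\, C^{t+1}\big)       % activity transition     (slide 27)
  +  \psi_{f}\big(f,\, O\big)                            % tracklet match costs    (slides 28-34)
```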
  28. Trajectory Estimation.
  29. Tracklet Association Problem.
  30. Tracklet Association Problem. Input: fragmented trajectories (tracklets), caused by detector failures, occlusions between targets, scene clutter, etc. Output: a set of trajectories with correct identities (color-coded).
  31. Tracklet Association Model. Simple match costs c: location affinity, appearance/color, ...
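As a point of reference for slide 31, a baseline that uses only the simple match costs (location affinity plus appearance) and solves the linking as a bipartite assignment with SciPy; field names and weights are illustrative, and the activity-aware joint association described on the following slides is deliberately not reproduced here.

```python
# Baseline linking from the "simple match costs" of slide 31: location
# affinity plus appearance/color distance, solved as a bipartite assignment.
# Field names and weights are illustrative assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_cost(tail, head, w_loc=1.0, w_app=1.0):
    """Cost of linking the end of tracklet `tail` to the start of `head`."""
    loc = np.linalg.norm(tail["end_pos"] - head["start_pos"])      # location affinity
    app = np.linalg.norm(tail["color_hist"] - head["color_hist"])  # appearance/color
    return w_loc * loc + w_app * app

def associate(ending_tracklets, starting_tracklets):
    """One-to-one linking that minimizes the total match cost."""
    cost = np.array([[match_cost(a, b) for b in starting_tracklets]
                     for a in ending_tracklets])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```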
  32. Tracklet Association Model (example scene: Crossing).
  33. Tracklet Association Model (Crossing example, continued).
  34. Tracklet Association Model (Crossing example, continued).
  35. Outline • Joint Model • Inference/Training method • Experimental evaluation • Conclusion
  36. Inference. The joint problem is non-convex.
  37. Inference. Alternate between the two sub-problems: activity recognition given the association f (iterative belief propagation) and tracklet association given C, I, A (novel branch and bound).
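A skeleton of the alternating scheme on slide 37, with `recognize_activities`, `associate_tracklets`, `joint_score`, and `initial_association` as hypothetical placeholders for the belief-propagation and branch-and-bound sub-solvers and for the model score:

```python
# Skeleton of the alternating inference on slide 37. recognize_activities
# (belief propagation), associate_tracklets (branch and bound), joint_score,
# and initial_association are placeholders, not functions from the paper.
def joint_inference(tracklets, observations, n_iters=10, tol=1e-6):
    f = initial_association(tracklets)              # e.g. keep tracklets unlinked
    prev_score = -float("inf")
    C = I = A = None
    for _ in range(n_iters):
        # Step 1: activity recognition given the current association f
        C, I, A = recognize_activities(observations, f)
        # Step 2: tracklet association given the current labels (C, I, A)
        f = associate_tracklets(tracklets, C, I, A)
        score = joint_score(C, I, A, f, observations)
        if score - prev_score < tol:                # stop once the score plateaus
            break
        prev_score = score
    return C, I, A, f
```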
  38. Training. Model weights are learned in a max-margin framework using a structural SVM (Tsochantaridis et al. 2004).
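Slide 38 refers to the margin-rescaled structural SVM of Tsochantaridis et al. 2004, which in its standard form learns the weights w as

```latex
\min_{w,\ \xi \ge 0} \;\; \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{i=1}^{N}\xi_i
\qquad \text{s.t.} \qquad
w^{\top}\Psi(x_i, y_i) - w^{\top}\Psi(x_i, y) \;\ge\; \Delta(y_i, y) - \xi_i
\quad \forall i,\ \forall y \neq y_i
```

where \(\Psi\) is the joint feature map assembled from the potentials above, \(\Delta(y_i, y)\) is a loss between labelings, and C here is the SVM regularization constant, not the collective activity label.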
  39. Outline • Joint Model • Inference/Training method • Experimental evaluation • Conclusion
  40. Experiments. Collective Activity Dataset (Choi et al. 2009): 44 videos with multiple people. Collective activities: Crossing, Waiting, Queuing, Walking, Talking. Annotations: target identities; interactions (Approaching, Leaving, Passing-by, Facing-each-other, etc.); atomic activities (pose: facing-right, facing-left, ...; action: walking, standing).
  41. Experiments. New Dataset: 32 videos with multiple people. Collective activities: Gathering, Talking, Dismissal, Walking-together, Chasing, Queuing. Annotations: target identities; interactions (Approaching, Walking-in-opposite-direction, Facing-each-other, Standing-in-a-row, etc.); atomic activities (pose: facing-right, facing-left, ...; action: walking, standing, running).
  42. Classification Results. Overall collective activity classification accuracy: Collective Activity Dataset: Ours 79.9% vs. Crowd Context (Choi et al. 2009) 73.3%, a gain of +6.6%; New Dataset: Ours 79.2% vs. Crowd Context 74.3%, a gain of +4.9%.
  43. Target Association (results on dataset VSWS09). Tracklets only: 1556 association errors (0% improvement).
  44. Target Association. Model without interactions: 1109 errors, a 28.73% improvement over the tracklets.
  45. Target Association. Model with interactions: 894 errors, a 42.54% improvement.
  46. Target Association. With ground-truth activities: 736 errors, a 52.76% improvement.
  47. Example Classification Result. Interaction labels: AP: approaching, FE: facing-each-other, SR: standing-in-a-row, ...
  48. Example Classification Result. Atomic activities. Action: W: walking, S: standing. Pose (8 directions): L: left, LF: left/front, F: front, RF: right/front, etc.
  49. Example Classification Result. Pairwise interactions: AP: approaching, ..., FE: facing-each-other, SS: standing-side-by-side, SQ: standing-in-a-queue.
  50. Example Classification Result.
  51. Example Classification Result. Tracklet association: color/number: target ID; solid boxes: tracklets; dashed boxes: match hypotheses.
  52. Conclusion • We propose a novel model for joint activity recognition and target association.
  53. Conclusion • We propose a novel model for joint activity recognition and target association. • High-level contextual information significantly improves target association accuracy (example: Crossing).
  54. Conclusion • We propose a novel model for joint activity recognition and target association. • High-level contextual information significantly improves target association accuracy. • Best collective activity classification results to date.
  55. Thanks to Yingze Bao, Byungsoo Kim, Min Sun, Yu Xiang, ONR, and the anonymous reviewers.
  56. (image-only slide)
  57. Branch-and-Bound Association. The association objective is not PSD, hence non-convex. • General search algorithm • Guarantees an exact solution • Requires: – a branch operation – a bound operation
  58. Branch-and-Bound Illustration. Branch: split the candidate set Q into subsets Q0 and Q1. Bound: compute an upper bound U(Q) and a lower bound L(Q) for each subset; a subset can be pruned when its upper bound falls below another subset's lower bound, e.g. U(Q1) < L(Q0).
  59. Branch-and-Bound Association. Bound: a lower bound is computed for each interaction variable (e.g., I12, I34).
  60. Branch-and-Bound Association. Bound (cont.): per interaction variable, only one non-zero activation per line in Hi is allowed.
  61. Branch-and-Bound Association. Bound (cont.): – Lower bound: solved as a binary integer program. – Upper bound.
  62. Branch-and-Bound Association. Branch: divide the problem into two disjoint sub-problems by fixing one association variable, e.g. Q0: f = [1, x, x, x, x, x, ...], Q1: f = [0, x, x, x, x, x, ...]; branch on the most ambiguous variable.
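A generic sketch of the branch-and-bound search outlined on slides 57 to 62, over binary association vectors f: maximize a score, prune sub-problems whose upper bound cannot beat the best lower bound found so far, and branch on the most ambiguous free variable. The `upper_bound`, `lower_bound`, and `ambiguity` callables stand in for the binary-integer-programming bounds and the branching rule described in the talk; they are assumptions, not the paper's implementation.

```python
# Generic branch and bound over binary association vectors f (slides 57-62).
# upper_bound(Q) -> float, lower_bound(Q) -> (value, full assignment), and
# ambiguity(Q) -> index of a free variable are assumed callables.
import heapq

def branch_and_bound(n_vars, upper_bound, lower_bound, ambiguity):
    best_val, best_f = -float("inf"), None
    root = {}                                   # partial assignment {index: 0/1}
    frontier = [(-upper_bound(root), 0, root)]  # max-heap on the upper bound
    tie = 1                                     # unique tie-breaker for heapq
    while frontier:
        neg_ub, _, Q = heapq.heappop(frontier)
        if -neg_ub <= best_val:
            continue                            # prune: cannot beat the incumbent
        val, f = lower_bound(Q)                 # any feasible completion of Q
        if val > best_val:
            best_val, best_f = val, f
        if len(Q) == n_vars:
            continue                            # leaf: every variable is fixed
        i = ambiguity(Q)                        # most ambiguous free variable
        for b in (0, 1):                        # branch: fix f[i] = 0 and f[i] = 1
            child = dict(Q)
            child[i] = b
            ub = upper_bound(child)
            if ub > best_val:
                heapq.heappush(frontier, (-ub, tie, child))
                tie += 1
    return best_f, best_val
```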
