Choi ECCV12 presentation
1. A Unified Framework for Multi-Target Tracking and Collective Activity Recognition
Wongun Choi and Silvio Savarese
University of Michigan, Ann Arbor
5. Our Goal
• Multiple target tracking
• Recognize activities at different levels of granularity
  – Atomic activity & pose
[Figure: pedestrians annotated with atomic labels, e.g. Walking / Facing-back, Walking / Facing-front]
6. Our Goal
• Multiple target tracking
• Recognize activities at different levels of granularity
  – Atomic activity & pose
  – Pairwise interaction
[Figure: pairs annotated with interaction labels, e.g. Walking-Side-by-Side, Moving-to-Opposite]
7. Our Goal
• Multiple target tracking
• Recognize activities at different levels of granularity
  – Atomic activity & pose
  – Pairwise interaction
  – Collective activity
[Figure: scene labeled with the collective activity Crossing]
8. Our Goal
• Multiple target tracking
• Recognize activities at different levels of granularity
  – Atomic activity & pose
  – Pairwise interaction
  – Collective activity
• Solve all problems jointly!!
[Figure: scene labeled with the collective activity Crossing]
9. Background
Prior work has investigated each problem largely in isolation:
• Target Tracking: Wu et al. 2007; Avidan 2007; Zhang et al. 2008; Breitenstein et al. 2009; Ess et al. 2009; Wojek et al. 2009; Geiger et al. 2011; Brendel et al. 2011; Pirsiavash et al. 2011
• Atomic Activity: Bobick & Davis 2001; Efros et al. 2003; Schuldt et al. 2004; Dollar et al. 2005; Niebles et al. 2006; Laptev et al. 2008; Rodriguez et al. 2008; Wang & Mori 2009; Gupta et al. 2009; Liu et al. 2009; Marszalek et al. 2009; Liu et al. 2011
• Pairwise Interaction: Zhou et al. 2008; Ryoo & Aggarwal 2009; Yao et al. 2010; Choi et al. 2010; Patron-Perez et al. 2010
• Collective Activity: Choi et al. 2009; Li et al. 2009; Lan et al. 2010; Ryoo & Aggarwal 2010; Choi et al. 2011; Khamis et al. 2011; Lan et al. 2012; Khamis et al. 2012; Amer et al. 2012
Hierarchy of activities: Lan et al. 2010; Khamis et al. 2012; Amer et al. 2012
15. Contributions
• Bottom-up activity understanding.
• Contextual information propagates top-down.
[Figure: the collective activity Crossing guiding track association, compared with a simple social force model based on repulsion & attraction (Pellegrini et al. 2009; Choi et al. 2010; Leal-Taixe et al. 2012; etc.)]
19. Hierarchical Activity Model
[Figure: factor graph unrolled over frames t-1, t, t+1. A collective activity variable C (with observation O_C) sits above pairwise interaction variables I_12, I_13, I_23; these sit above atomic activity variables A_1, A_2, A_3, each grounded in a tracklet observation O_1, O_2, O_3 (O_A(t)).]
20. Hierarchical Activity Model
[Figure: the same factor graph, showing the temporal links between the C, I, and A variables across frames t-1, t, t+1.]
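As a rough schematic of how such a model is typically written (our own notation and an assumption about the exact factorization, not copied from the paper), the joint energy sums the local potentials discussed on the following slides:

$$
\Psi(C, I, A, f \mid O) \;=\; \psi(A, O_A) \;+\; \psi(I, A, f) \;+\; \psi(C, I) \;+\; \psi(C, O_C) \;+\; \psi_{\mathrm{time}}(C, I, A) \;+\; \psi(f)
$$

Here the terms correspond, in order, to the atomic-observation potential (slide 21), the interaction-atomic potential (slides 22-23), the collective-interaction potential (slides 24-25), the collective-observation potential (slide 26), the activity transition potential (slide 27), and the trajectory estimation term (slide 28).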
21. Atomic-Observation Potential
Atomic Activity Models
• Action: BoW with STIP (Dollar et al. 06; Niebles et al. 07)
• Pose: HoG (Dalal and Triggs, 05)
[Figure: the atomic-observation potential highlighted in the factor graph.]
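As a minimal sketch of the action branch on this slide (a generic BoW-over-STIP pipeline; the function names, use of scikit-learn, and codebook size are our assumptions, not the authors' implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(train_stip_descriptors: np.ndarray, k: int = 200) -> KMeans:
    """Cluster STIP descriptors pooled from training videos into k visual words."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(train_stip_descriptors)

def bow_action_descriptor(stip_descriptors: np.ndarray, codebook: KMeans) -> np.ndarray:
    """L1-normalized histogram of visual-word assignments for one tracklet."""
    words = codebook.predict(stip_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # guard against empty tracklets
```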
22. Interaction-Atomic Potential
[Figure: a learned model for the interaction I: Standing-in-a-line. The observed atomic activities (A: Standing, Facing-left; A: Standing, Facing-left) are compatible with the model, so the potential is high.]
23. Interaction-Atomic Potential
[Figure: the same Standing-in-a-line model. The observed atomic activities (A: Standing, Facing-right; A: Standing, Facing-left) are incompatible with the model, so the potential is low.]
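Schematically (our notation; the pairwise feature $\phi$ is an illustrative assumption), this potential scores how well each pair's atomic labels and relative geometry agree with the hypothesized interaction label:

$$
\psi(I, A, f) \;=\; \sum_{(i,j)} \mathbf{w}_{I_{ij}}^{\top}\, \phi\big(A_i, A_j, f_i, f_j\big)
$$

where $\phi$ encodes, e.g., the two atomic activity and pose labels and the relative location of targets $i$ and $j$ along their associated tracks, so that compatible configurations (standing, facing left, located in a line) score high and incompatible ones score low.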
24. Collective-Interaction Potential
[Figure: a learned model for the collective activity C: Queuing. The observed interactions (I: one-after-the-other, I: one-after-the-other, I: standing-side-by-side) are compatible with the model, so the potential is high.]
25. Collective-Interaction Potential
[Figure: the same Queuing model. The observed interactions (I: facing-opposite-side, I: facing-each-other, I: facing-each-other) are incompatible with the model, so the potential is low.]
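In the same schematic spirit (again our notation, not the paper's exact form), this potential rewards interaction labels that co-occur frequently with the hypothesized collective activity:

$$
\psi(C, I) \;=\; \sum_{(i,j)} \mathbf{w}^{\top}\, \phi\big(C, I_{ij}\big)
$$

where $\phi(C, I_{ij})$ is an indicator over (collective activity, interaction) label pairs, so $\mathbf{w}$ stores a learned co-occurrence score for each combination: high for (Queuing, one-after-the-other), low for (Queuing, facing-each-other).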
26. Collective-Observation Potential
Collective Activity
• STL descriptor of all targets (Choi et al. 09)
[Figure: the collective-observation potential highlighted in the factor graph.]
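To give the flavor of such a crowd-context style descriptor (a loose sketch only; the actual STL descriptor of Choi et al. 09 differs in its binning and temporal pooling, and every parameter below is an illustrative assumption):

```python
import numpy as np

def crowd_context(anchor_xy, people_xy, people_pose,
                  n_poses=8, ring_edges=(2.0, 6.0)):
    """Histogram of neighbors per (distance ring, pose label) around one anchor.

    people_pose: integer pose label in [0, n_poses) per neighbor.
    """
    desc = np.zeros((len(ring_edges), n_poses))
    for xy, pose in zip(people_xy, people_pose):
        d = np.linalg.norm(np.asarray(xy, float) - np.asarray(anchor_xy, float))
        for ring, edge in enumerate(ring_edges):
            if d <= edge:            # assign neighbor to the innermost ring it fits
                desc[ring, pose] += 1
                break
    return desc.ravel()              # flattened (ring x pose) histogram
```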
27. Activity Transition Potential
Smooth activity transitions across adjacent frames.
[Figure: temporal links between the C, I, and A variables at t-1, t, t+1 highlighted in the factor graph.]
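A minimal sketch of one such smoothness term, written for the collective activity chain (analogous terms would tie the I and A chains across frames; the exact form is our assumption):

$$
\psi_{\mathrm{time}}\big(C^{(t)}, C^{(t+1)}\big) \;=\; \mathbf{w}^{\top}\, \phi\big(C^{(t)}, C^{(t+1)}\big)
$$

where $\phi$ indicates the label pair, so transitions between identical or frequently consecutive labels receive higher scores.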
28. Trajectory Estimation
[Figure: the trajectory estimation (tracklet association) term highlighted in the factor graph.]
30. Tracklet Association Problem
Input: fragmented trajectories (tracklets), due to
• Detector failures
• Occlusion between targets
• Scene clutter
• etc.
Output: a set of trajectories with correct IDs (color).
31. Tracklet Association Model
Simple match costs, c:
• Location affinity
• Appearance/color
• ...
[Figure: association hypotheses f between tracklets in the factor graph; "?" marks matches where these cues are ambiguous.]
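A minimal sketch of such a simple match cost, assuming ground-plane positions and color histograms per tracklet (the weights, speed cap, and Bhattacharyya appearance term are our illustrative choices, not the paper's):

```python
import numpy as np

def match_cost(end_pos, start_pos, gap_frames, end_hist, start_hist,
               w_loc=1.0, w_app=1.0, max_speed=2.0):
    """Cost of linking the end of one tracklet to the start of another
    `gap_frames` frames later; lower is better."""
    # Location affinity: penalize implied speed beyond a plausible walking speed.
    dist = np.linalg.norm(np.asarray(end_pos, float) - np.asarray(start_pos, float))
    loc_cost = max(dist / max(gap_frames, 1) - max_speed, 0.0)
    # Appearance: Bhattacharyya distance between normalized color histograms.
    app_cost = -np.log(np.sum(np.sqrt(end_hist * start_hist)) + 1e-9)
    return w_loc * loc_cost + w_app * app_cost
```

As the slide's question marks suggest, both terms degrade exactly where association matters most: at crossing points and among similarly dressed targets.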
32. Tracklet Association Model
[Figure: given the collective activity Crossing, a candidate tracklet association is proposed.]
33. Tracklet Association Model
[Figure: an association compatible with the learned model for Crossing gives rise to high energy.]
34. Tracklet Association Model
[Figure: an association incompatible with the learned model for Crossing gives rise to low energy.]
37. Inference
Alternate between two subproblems:
• Activity recognition given f: iterative belief propagation
• Tracklet association given C, I, A: novel branch and bound
[Figure: the factor graph split into the two subproblems; the example collective activity is Crossing.]
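The alternation itself can be sketched as below (a skeleton under our assumptions: the two solvers are passed in as callables standing in for iterative belief propagation and the branch-and-bound of slide 62):

```python
def joint_inference(f0, infer_activities, associate_tracklets, max_iters=10):
    """Alternate the two sub-solvers until the association stops changing.

    infer_activities(f)          -> (C, I, A)  # stands in for belief propagation
    associate_tracklets(C, I, A) -> f          # stands in for branch-and-bound
    """
    f = f0                                     # initial tracklet association
    C = I = A = None
    for _ in range(max_iters):
        C, I, A = infer_activities(f)          # activity recognition given f
        f_new = associate_tracklets(C, I, A)   # association given C, I, A
        if f_new == f:                         # fixed point: labels and
            break                              # association agree
        f = f_new
    return C, I, A, f
```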
38. Training
• Model weights are learned in a max-margin framework using a structural SVM (Tsochantaridis et al. 2004).
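For reference, the standard margin-rescaled structural SVM objective of Tsochantaridis et al. 2004 is the generic form below; the loss $\Delta$ and joint feature map $\Phi$ are model-specific and not spelled out on this slide:

$$
\min_{\mathbf{w},\,\xi \ge 0}\; \frac{1}{2}\|\mathbf{w}\|^2 + \lambda \sum_n \xi_n
\quad \text{s.t.}\quad
\mathbf{w}^{\top}\big(\Phi(x_n, y_n) - \Phi(x_n, y)\big) \;\ge\; \Delta(y_n, y) - \xi_n \quad \forall n,\; \forall y \ne y_n
$$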
43. Target Association
                            Tracklet
# of errors                 1556
Improvement over tracklets  0%
Results on the VSWS09 dataset.
44. Target Association
                            Tracklet   No Interaction
# of errors                 1556       1109
Improvement over tracklets  0%         28.73%
Results on the VSWS09 dataset.
45. Target Association
                            Tracklet   No Interaction   With Interaction
# of errors                 1556       1109             894
Improvement over tracklets  0%         28.73%           42.54%
Results on the VSWS09 dataset.
46. Target Association
                            Tracklet   No Interaction   With Interaction   With GT Activities
# of errors                 1556       1109             894                736
Improvement over tracklets  0%         28.73%           42.54%             52.76%
Results on the VSWS09 dataset.
47. Example Classification Result
Interaction labels
AP: approaching
FE: facing-each-other
SR: standing-in-a-row
...
48. Example Classification Result
Atomic activities
Action:
  W - walking
  S - standing
Pose (8 directions):
  L - left
  LF - left/front
  F - front
  RF - right/front
  etc.
49. Example Classification Result
Pair-interactions
• AP: approaching
• FE: facing-each-other
• SS: standing-side-by-side
• SQ: standing-in-a-queue
• ...
51. Example Classification Result
Tracklet association
Color/number: target ID
Solid boxes: tracklets
Dashed boxes: match hypotheses
52. Conclusion
• Propose a novel model for joint activity recognition and target association.
[Figure: single-frame hierarchy with C at the top, interactions I_12, I_13, I_23, atomic activities A_1, A_2, A_3, and observations O_1, O_2, O_3, O_C.]
53. Conclusion
• Propose a novel model for joint activity recognition and target association.
• High-level contextual information helps improve target association accuracy significantly.
[Figure: the Crossing example.]
54. Conclusion
• Propose a novel model for joint activity recognition and target association.
• High-level contextual information helps improve target association accuracy significantly.
• Best classification results on collective activity recognition to date.
55. Thanks to
Yingze Bao, Byungsoo Kim, Min Sun, Yu Xiang,
ONR and anonymous reviewers
62. Branch-and-Bound Association
• Branch
  – Divide the problem into two disjoint sub-problems, e.g.
    Q0 => f = [1, x, x, x, x, x, ...]
    Q1 => f = [0, x, x, x, x, x, ...]
  – Find the most ambiguous variable.
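A compact best-first branch-and-bound skeleton matching this slide's branching rule (generic; `score` and `upper_bound` are illustrative stand-ins for the paper's objective and bound, and for simplicity we branch on the first undecided variable rather than the most ambiguous one):

```python
import heapq

def branch_and_bound(n, score, upper_bound):
    """Maximize score(f) over binary association vectors f in {0,1}^n.

    score(f)       -> value of a complete assignment (list of 0/1)
    upper_bound(p) -> optimistic bound for a partial assignment (0/1/None entries)
    """
    best_val, best_f = float("-inf"), None
    root = [None] * n                        # all match variables undecided
    heap, tie = [(-upper_bound(root), 0, root)], 1
    while heap:
        neg_bound, _, partial = heapq.heappop(heap)
        if -neg_bound <= best_val:
            continue                         # prune: cannot beat incumbent
        undecided = [i for i, v in enumerate(partial) if v is None]
        if not undecided:
            val = score(partial)
            if val > best_val:
                best_val, best_f = val, partial
            continue
        i = undecided[0]
        for bit in (1, 0):                   # branch: Q0 fixes f_i = 1, Q1 fixes f_i = 0
            child = list(partial)
            child[i] = bit
            bound = upper_bound(child)
            if bound > best_val:
                heapq.heappush(heap, (-bound, tie, child))
                tie += 1
    return best_f, best_val
```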
Editor's Notes
Good afternoon. I'm Wongun Choi from the University of Michigan. This is joint work with my advisor, Silvio Savarese.
Consider a video sequence with multiple people.
Our goal is to understand the behavior of all individuals in the scene.
First, we want to estimate the trajectories of all individuals.
We also want to recognize the semantic activities of the people at different levels of granularity. Such activities include: [show1] single-person activities in isolation, which we call atomic activities, [show2] such as walking or standing, [show3] and the pose of individuals, [show4] such as facing-front or facing-back.
We also want to recognize the interplay between pairs of people, which we call interactions, for instance, [show1] walking side by side or [show2] moving in opposite directions.
And finally, we want to identify the overall behavior of all people in the scene, the collective activity, [show1] for example crossing.
Most importantly, we address all the problems in a unified framework.
So far, a large body of literature has proposed methods for [show1] tracking multiple targets and recognizing [show2] atomic activities, [show3] pairwise interactions, [show4] and collective activities [show5], but most of the time these problems are addressed in isolation. [show6] Some exceptions are shown here, but they only focus on solving atomic and collective activity recognition.
As opposed to previous works, we propose to solve all four problems in a joint fashion. Let me explain this concept in a little more detail.
Our model is able to transfer information in a bottom-up fashion. [show1] From the estimation of the trajectories of individuals and their atomic activities, we can obtain robust characterizations of interactions and collective activities.
For instance, [show1] if we observe these trajectories, [show2] we can easily infer that the activity is meeting and leaving, [show3] but if we see these trajectories, [show4] we will say it is a crossing activity.
At the same time, information flows from [show1] top to bottom so as to provide critical contextual information to the lower levels of the hierarchy: collective activities help understand interactions; interactions help understand atomic activities and track associations.
Let me give you an example of how activity understanding helps trajectory estimation. [show1] For example, if we are given this set of broken trajectories and [show2] we know that the underlying activity is meeting and leaving, we can interpret these trajectories as follows. [show3]
Now, given the same broken trajectories, if we know that the activity is crossing, [show1] we can interpret the trajectories as follows. [show2] A similar concept was also introduced as a social force model in previous works; however, these works considered only a few hand-designed types of interactions, such as [show3] repulsion and attraction. We generalize this concept and let high-level activity understanding guide the process of associating tracks.
3 minutes up to this!! This is an outline of today's talk.
Let's begin with our joint model.
Given a video with a set of short trajectory fragments, [show1] tracklets, the activities of all individuals are encoded as a hierarchical graphical model using a factor graph. The components of our factor graph model are shown on the left.
[show1] From each of the tracklets, we obtain an observation cue for each individual. [show2] Grounded on each observation, we model the atomic activity with variable A. [show3] We encode the pairwise interaction between individuals with variable I. [show4] Finally, we assign one collective activity variable, C, to characterize the overall behavior of the individuals. [show5] We also provide a top-down observation cue for variable C. [show6] By considering the temporal relationships among variables as well, our full model can be compactly represented as the graph shown here.
The model can be written as an energy function, as shown on the right side. [show1] The energy can be factorized into multiple local potentials, each of which encodes relationships among variables. Let's see the details of what each factor represents.
The highlighted potential encodes the compatibility between atomic activity models and the corresponding observations. [show1] Atomic activities are modeled by a BoW representation equipped with spatio-temporal interest point features, [show2] and poses are described using HOG.
The second potential, psi(I, A, f), captures the compatibility between interaction models and observations of atomic activities. For instance, [show1] here we illustrate a visualization of a possible learnt model for the "standing-in-a-line" interaction. This model captures the property that two people who "stand in a line" tend to be located nearby and face the same direction. [show2] Thus, if we are given these observations of atomic activities, standing and facing left, which are compatible with the learnt "standing-in-a-line" interaction, [show3] the potential is high.
On the other hand, if we are given these observations of atomic activities, standing, facing left, facing right, the compatibility with the model is weak, and thus [show1] the potential is low. [show2] Such a relationship can be compactly represented by the equation below.
Similarly, the potential psi(C, I) encodes the compatibility between collective activity models and a set of observations of pairwise interactions. [show1] For example, here we illustrate a visualization of a possible learnt model for the collective activity "queuing". This model captures the probability of occurrence of interaction labels such as "one-after-the-other" and "facing-each-other". For the collective activity "queuing", the interaction "one-after-the-other" is highly probable to occur; the interaction "facing-each-other" is much less so. [show2] Thus, if we observe this set of interactions, [show3] standing-side-by-side, [show4] one-after-the-other, [show5] and so on, together with queuing, the potential psi(C, I) is high.
On the other hand, [show1] if we observe these interactions, facing-each-other and facing-opposite-direction, [show2] the potential is low, since these are not compatible with the learned model for queuing. [show3] Such a relationship is encoded by this equation.
Similar to the bottom-up cues for atomic activities, the highlighted potential encodes the compatibility between collective activity models and the corresponding observations. [show1] These are obtained using the crowd context descriptor introduced by Choi et al. in 2009.
Also, we encode temporal smoothness between [show1] collective activities, [show2] interactions, [show3] and atomic activities in adjacent time frames.
The last term captures the potential related to trajectory estimation. Before we talk about it, let me briefly discuss how we define the tracking problem.
Suppose we have the activity "people crossing"; [show1] it is likely that our low-level observations won't be just a pair of clean tracks such as the red and black ones.
Rather, we observe a fragmented set of trajectories, which we call tracklets. This is because of [show1] detection failures, occlusions, scene clutter, and so on. Given such a set of initial inputs, our goal is [show2] to obtain trajectories with consistent IDs by associating the tracklets.
As in traditional track association problems, we introduce [show1] the variable f to capture association hypotheses. A simple solution is to have the association cost vector c encode properties such as [show2] location affinity, color similarity, and so on. As this example shows, these properties don't always work: [show3] location affinity becomes ambiguous at the point of crossing, and [show4] color or appearance similarity is not always reliable when people wear similar clothing.
In addition to traditional cues, we advocate that the interaction labels provide critical contextual information to guide the process of associating tracklets. [show1] Such information is encoded by the interaction potential that we discussed earlier. [show1] For instance, if we know the interaction is "crossing", this type of association [next]
[cont.] will give rise to high energy, [show1] since this association is compatible with the learnt model for crossing,
whereas this type of association [show1] will give rise to low energy. In the interest of time, we skip the details of the mathematics here.
Let’s see how we solve the inference problem and train the model.
The inference problem can be represented as finding the configuration of all variables that maximizes the joint energy function. [show1] The optimization, however, is computationally very demanding.
Thus, we introduce a new iterative method to solve the problem. [show1] Given an initial tracklet association, we obtain activity labels using [show2] iterative belief propagation, [show3] and given activity labels, [show4] we obtain the tracklet association using our novel branch-and-bound method. In the interest of time, we skip the details of each inference method.
Finally, we obtain the model parameters from a set of training data using a structural SVM framework.
Let’s discuss our experimental evaluation
For the evaluation, we use the collective activity dataset, proposed by us in 2009. In addition to the collective activity labels, we provide annotations for [show1] target identities, [show2] interactions between pairs of people, [show3] and atomic properties of individuals.
Also, we collect an additional dataset to test our framework, composed of 32 videos with 6 collective activities. We similarly provide labels for target identities, interactions, and atomic activities.
First, we compare the collective activity classification accuracy of a baseline method using the crowd context descriptor we introduced in 2009 and 2011, and of our full hierarchical representation. We obtain a good improvement, about 6%, in overall classification by utilizing the hierarchical structure in our model, tested on the collective activity dataset. Again, we observe a similar improvement, about 5%, on the new dataset.
Now, we analyze the tracklet association results. The first row shows the number of errors in tracklet matching, and the second row shows the % improvement over the input tracklets.
By solving the tracklet association without interaction cues, we obtain about a 30% improvement over the tracklets, i.e., about 400 fewer matching errors.
By incorporating our interaction model with the estimated activity labels, we obtain a 14% further improvement, i.e., about 200 fewer matching errors than the baseline without interaction.
Notice that if ground-truth activity labels are given, we obtain an upper bound on the target association error of 736, corresponding to a 53% improvement over the input. More quantitative evaluation can be found in the paper.
Here we show exemplar results obtained on the newly proposed dataset. The estimated collective activity label is overlaid on top, and interaction labels are displayed between pairs of people.
Now let me show results from a more complex sequence. Here, we show the estimated atomic activity label for each individual, overlaid below each bounding box, where ....
and interactions between pairs of people. SQ represents the interaction standing-in-a-queue.
Finally, we show the estimated collective activity variable on top.
Here, the video shows the tracklet association result we obtained using our unified framework. The color and number on top of each bounding box show the identity of the target, solid boxes represent tracklets, and dashed boxes show smoothed paths that associate two tracklets. As you can see, our model keeps the targets' identities consistent even after occlusion by associating tracklets correctly.
Finally, we show an exemplar comparison between the tracklet association results obtained without and with the interaction model. [show1] Due to the severe occlusion incurred by the car motion, [show2] the traditional method without interaction context loses the targets' identities and assigns new IDs to all targets after the occlusion. [show3] On the other hand, when we include the interaction in the association model, [show4] we can keep the identities of the targets even in such a challenging scenario.
In this paper, we proposed a novel model that seamlessly relates target tracking with atomic activity, interaction, and collective activity understanding. We also show that by solving all of these together, we can achieve a better understanding of high-level activities. Most interestingly, we show that high-level activities can help improve trajectory estimation.