Presentation for the Berlin Computer Vision Group, December 2020 on deep learning methods for image segmentation: Instance segmentation, semantic segmentation, and panoptic segmentation.
Presenter: Yunjey Choi (Master's student, Korea University)
Yunjey Choi majored in Computer Science at Korea University and is currently a Master's student studying Machine Learning. He enjoys coding and sharing what he has learned with others. He studied Deep Learning with TensorFlow for a year and now studies Generative Adversarial Networks with PyTorch. He has implemented several papers in TensorFlow and published a PyTorch tutorial on GitHub.
Overview:
The Generative Adversarial Network (GAN), first proposed by Ian Goodfellow in 2014, is a generative model that estimates the distribution of real data through adversarial training. GANs have recently emerged as one of the most popular research areas, with numerous related papers appearing every day.
Finding it hard to read all the GAN papers pouring out? That's fine. Once you thoroughly understand the basic GAN, newly published papers become easy to follow as well.
In this talk I aim to share everything I know about GANs. It should suit those who are completely new to GANs, those curious about the theory behind them, and those wondering how GANs can be applied.
Presentation video: https://youtu.be/odpjk7_tGY0
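To make the adversarial training idea concrete, here is a minimal PyTorch sketch of a GAN training loop on a toy 1-D distribution; the network sizes, learning rates, and data are illustrative assumptions, not the presenter's code.

```python
import torch
import torch.nn as nn

# Toy GAN: G maps noise to 1-D samples, D tries to tell real from generated.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0           # "real" data drawn from N(2, 0.5)
    fake = G(torch.randn(64, 8))                    # generated samples

    # discriminator step: push real toward 1, generated toward 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generator step: fool the discriminator (generated toward 1)
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```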
GTC 2021: Counterfactual Learning to Rank in E-commerce (GrubhubTech)
Many e-commerce companies have extensive logs of user behavior such as clicks and conversions. However, if supervised learning is applied naively, systems can suffer from poor performance due to bias and feedback loops. Using techniques from counterfactual learning, we can leverage log data in a principled manner to model user behavior and build personalized recommender systems. At Grubhub, a user journey begins with recommendations, and the vast majority of conversions are powered by recommendations. Our recommender policies can drive user behavior to increase orders and/or profit. Accordingly, the ability to rapidly iterate and experiment is very important. Because of our powerful GPU workflows, we can iterate 200% more rapidly than with counterpart CPU workflows. Developers iterate on ideas with notebooks powered by GPUs. Hyperparameter spaces are explored up to 8x faster with multi-GPU Ray clusters. Solutions are shipped from notebooks to production in half the time with nbdev. With our accelerated DS workflows and deep learning on GPUs, we were able to deliver a +12.6% conversion boost in just a few months. In this talk we present modern techniques for industrial recommender systems powered by GPU workflows: first a short background on counterfactual learning techniques, followed by practical information and data from our industrial application.
By Alex Egg, accepted to Nvidia GTC 2021 Conference
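For readers unfamiliar with the counterfactual-learning building blocks mentioned above, here is a minimal NumPy sketch of off-policy evaluation with inverse propensity scoring (IPS); the data, propensities, and policies are simulated for illustration and are not Grubhub's production system.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

propensity = rng.uniform(0.05, 0.95, n)                  # P(show item) under the logging policy
shown = rng.random(n) < propensity                       # logged action
reward = (shown & (rng.random(n) < 0.1)).astype(float)   # conversion observed only if shown
new_shows = rng.random(n) < 0.5                          # action a candidate policy would take

# IPS: keep log entries where the new policy agrees with the logged action,
# reweighted by 1 / probability of that logged action under the logging policy.
agrees = (new_shows == shown).astype(float)
logged_prob = np.where(shown, propensity, 1.0 - propensity)
ips_value = np.mean(agrees * reward / logged_prob)
print(f"estimated reward per user under the candidate policy: {ips_value:.4f}")
```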
Recommender Systems represent one of the most widespread and impactful applications of predictive machine learning models.
Amazon, YouTube, Netflix, Facebook and many other companies generate an important fraction of their revenues thanks to their ability to model and accurately predict users' ratings and preferences.
In this presentation we cover the following points:
→ introduction to recommender systems
→ working with explicit vs implicit feedback
→ content-based vs collaborative filtering approaches
→ user-based and item-item methods (a minimal sketch follows after this list)
→ machine learning and deep learning models
→ pros & cons of the methods: scalability, accuracy, explainability
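To make the item-item idea concrete, here is a minimal NumPy sketch on a toy implicit-feedback matrix; the data and dimensions are illustrative, not from the presentation.

```python
import numpy as np

# rows = users, columns = items, 1 = interaction (implicit feedback)
interactions = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0],
], dtype=float)

# cosine similarity between item columns
norms = np.linalg.norm(interactions, axis=0, keepdims=True)
item_sim = (interactions.T @ interactions) / (norms.T @ norms + 1e-9)

# score items for user 0 by summing similarities to the items they interacted with
user = interactions[0]
scores = item_sim @ user
scores[user > 0] = -np.inf            # do not recommend already-seen items
print("recommended item for user 0:", int(scores.argmax()))
```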
This work addresses the challenge of hate speech detection in Internet memes and, to the best of our knowledge, is the first to use visual information to detect hate speech automatically. Memes are pixel-based multimedia documents that combine photos or illustrations with phrases which, together, usually adopt a funny meaning. However, hate memes are also used to spread hate through social networks, so their automatic detection would help reduce their harmful societal impact. In our experiments, we built a dataset of 5,020 memes to train and evaluate a multi-layer perceptron over the visual and language representations, both independently and fused. Our results indicate that the model can learn to detect some of the memes, but that the task is far from solved with this simple architecture. While previous work focuses on linguistic hate speech, our experiments indicate that the visual modality can be much more informative for hate speech detection in memes than the linguistic one.
https://github.com/imatge-upc/hate-speech-detection
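As a rough illustration of the fusion idea described above, here is a minimal PyTorch sketch of an MLP over concatenated visual and language embeddings; the embedding sizes and layer widths are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FusedMLP(nn.Module):
    """Classify a meme as hateful or not from a visual and a text embedding."""
    def __init__(self, img_dim=2048, txt_dim=768, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hidden, 1),          # single logit: hateful vs. not hateful
        )

    def forward(self, img_emb, txt_emb):
        return self.net(torch.cat([img_emb, txt_emb], dim=1))

model = FusedMLP()
img_emb = torch.rand(16, 2048)   # e.g. features from a pretrained image encoder
txt_emb = torch.rand(16, 768)    # e.g. features from a pretrained text encoder
logits = model(img_emb, txt_emb)                                      # (16, 1)
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (16, 1)).float())
```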
Dimensionality reduction is the process of converting a data set with a large number of dimensions into one with fewer dimensions, while ensuring that it still conveys similar information concisely.
Concept
R code
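The slides use R, but the same idea can be sketched in a few lines of NumPy (PCA via the SVD of centered data); the data here are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                 # 200 samples, 50 dimensions

X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)   # rows of Vt = principal directions

k = 5
X_reduced = X_centered @ Vt[:k].T              # (200, 5): same data, far fewer dimensions
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(f"variance retained by {k} components: {explained:.1%}")
```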
This presentation was prepared as part of the curriculum studies for CSCI-659 Topics in Artificial Intelligence Course - Machine Learning in Computational Linguistics.
It was prepared under guidance of Prof. Sandra Kubler.
Slides explaining the distinction between bagging and boosting in light of the bias-variance trade-off, followed by some lesser-known aspects of supervised learning: the effect of the tree-split metric on feature importance, the effect of the decision threshold on classification accuracy, and how to adjust the model's classification threshold.
Note: the limitations of the accuracy metric (baseline accuracy), alternative metrics, their use cases, and their advantages and limitations are briefly discussed.
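A minimal sketch of the threshold-adjustment point, using synthetic scores (not the slides' data): the same model scores, cut at different thresholds, trade precision against recall.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
scores = np.clip(y_true * 0.3 + rng.normal(0.4, 0.2, 1000), 0, 1)   # toy predicted probabilities

for threshold in (0.3, 0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```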
Explainable deep learning with applications in Healthcare, by Sunil Kumar Vupp... (Analytics India Magazine)
We have started relying on the decisions made by deep learning models, yet why and how they work remain big questions for most of us. We will try to open the black box of deep learning, which is essential for building the trust needed for widespread adoption. The speaker addresses the importance of feature visualization and localization in deep learning models, especially convolutional neural networks, and shares the results of applying methods such as activation maps, deconvolution, and Grad-CAM in healthcare.
Bias: It is the amount by which Machine Learning (ML) model predictions differ from the actual value of the target.
Variance: It is the amount by which the ML model prediction would change if we estimate it using different training datasets.
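A minimal sketch of how these two quantities can be estimated empirically, by retraining a deliberately simple model on many resampled training sets and inspecting its predictions at one test point (synthetic data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = np.sin
x_test, y_test = 1.0, np.sin(1.0)

preds = []
for _ in range(200):
    x = rng.uniform(0, 2 * np.pi, 30)              # a fresh training set each time
    y = true_f(x) + rng.normal(0, 0.3, 30)         # noisy targets
    coeffs = np.polyfit(x, y, deg=1)               # an (intentionally) too-simple model
    preds.append(np.polyval(coeffs, x_test))

preds = np.array(preds)
bias = preds.mean() - y_test    # how far the average prediction is from the target
variance = preds.var()          # how much predictions change across training sets
print(f"bias = {bias:.3f}, variance = {variance:.3f}")
```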
Reinforcement learning: policy gradient (part 1), by Bean Yen
The policy gradient theorem is from "Reinforcement Learning: An Introduction". DPG and DDPG are from the original papers.
original link https://docs.google.com/presentation/d/1I3QqfY6h2Pb0a-KEIbKy6v5NuZtnTMLN16Fl-IuNtUo/edit?usp=sharing
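For reference, the policy gradient theorem from Sutton and Barto can be written in its expectation form as (standard textbook notation, not copied from the slides):

$$\nabla_\theta J(\theta) \;\propto\; \mathbb{E}_\pi\!\left[\, q_\pi(S_t, A_t)\, \nabla_\theta \ln \pi(A_t \mid S_t, \theta) \,\right]$$

where $\pi(a \mid s, \theta)$ is the parameterized policy and $q_\pi$ its action-value function.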
Computer Vision (CV) aims to teach computers to achieve human-level vision capabilities. Applications of CV in self-driving cars, robotics, healthcare, education, and the multitude of apps that let customers convey information with their smartphone cameras have made it one of the most popular fields in Artificial Intelligence. Recent advances in deep learning, data storage, and computing capability have led to the huge success of CV. Computer vision comprises several tasks, such as classification, object detection, image segmentation, optical character recognition, scene reconstruction, and many others.
In this presentation I will talk about applying transfer learning, image classification, and object detection, and the metrics required to measure them on still images. The increase in accuracy on CV tasks over the past decade is due to Convolutional Neural Networks (CNNs), the base used in architectures such as ResNet and VGGNet. I will go through how to use these pre-trained models for image classification and feature extraction. One of the breakthroughs in object detection came with single-shot detection, where the bounding box and the class of the object are predicted simultaneously. This leads to low latency during inference (155 frames per second) and high accuracy. This is the framework behind object detection with YOLO, and I will explain how to use YOLO for specific use cases.
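As a hint of what "using pre-trained models" looks like in practice, here is a minimal PyTorch/torchvision transfer-learning sketch: a pretrained ResNet is frozen as a feature extractor and only a new classification head is trained. The model choice, class count, and batch are illustrative assumptions, not necessarily what the talk uses.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                        # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 10)     # new head for a 10-class task

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.rand(8, 3, 224, 224)                # stand-in for a real image batch
labels = torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```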
Seeing what a GAN cannot generate: paper review (QuantUniversity)
Paper review: Bau, David, et al. "Seeing What a GAN Cannot Generate." 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 4501-4510.
Presenter: Hwalsuk Lee (Naver Clova)
Date: November 2017
(Current) NAVER Clova Vision
(Current) TensorFlow KR (TFKR) organizing team
Overview:
The focus of recent deep learning research is shifting rapidly from supervised to unsupervised learning. In computer vision in particular, the research trend is moving from recognition techniques (supervised learning that finds the information present in an image) to generative techniques (unsupervised learning that synthesizes an image containing specified information).
This seminar briefly reviews how the two pillars of generative modeling, the VAE (variational autoencoder) and the GAN (generative adversarial network), work, and shares results from the major related papers.
The lecture is organized so that even attendees without a deep learning background can understand the concepts behind VAE and GAN, the two methodologies for training generative models, and grasp the current state of the art.
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017 (StampedeCon)
This technical session provides a hands-on introduction to TensorFlow using Keras in the Python programming language. TensorFlow is Google's scalable, distributed, GPU-powered compute-graph engine that machine learning practitioners use for deep learning. Keras provides a Python-based API that makes it easy to create well-known types of neural networks in TensorFlow. Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network architectural components, it is now possible to train neural networks of much greater complexity. Deep learning allows a model to learn hierarchies of information in a way that is similar to the function of the human brain.
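A minimal example of the kind of model such a session builds with the Keras API (a small dense network on MNIST; illustrative, not the session's exact code):

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
```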
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sept-2016-member-meeting-mit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Vivienne Sze, Assistant Professor at MIT, delivers the presentation "Energy-efficient Hardware for Embedded Vision and Deep Convolutional Neural Networks" at the September 2016 Embedded Vision Alliance Member Meeting. Sze describes the results of her team's recent research on optimized hardware for deep learning.
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
This talk was presented at Startup Master Class 2017 (http://aaiitkblr.org/smc/) at Christ College, Bangalore, hosted by the IIT Kanpur Alumni Association and co-presented by the IIT KGP Alumni Association, IITACB, PanIIT, IIMA and IIMB alumni.
My co-presenter was Biswa Gourav Singh, and Navin Manaswi was a contributor.
http://dataconomy.com/2017/04/history-neural-networks/ - timeline for neural networks
Decomposing image generation into layout prediction and conditional synthesis (Naeem Shehzad)
In this presentation you can learn how to decompose image generation into layout prediction and conditional synthesis. I present all the material in a convenient way, and I hope you find it easy to follow.
Thank you.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V... (Wasswaderrick3)
In this book, we use conservation-of-energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity, and from this we derive the Poiseuille flow equation, the transition flow equation, and the turbulent flow equation. In situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross-sectional areas connected together. We also extend our energy-conservation techniques to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes' equation of terminal velocity and the turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium, and at the general equation of terminal velocity.
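For orientation, the familiar energy-per-unit-weight form of Bernoulli's equation extended with a viscous head-loss term, which reduces to the classical equation when losses vanish, is (standard textbook form, not a quotation from the book):

$$\frac{P_1}{\rho g} + \frac{v_1^2}{2g} + z_1 \;=\; \frac{P_2}{\rho g} + \frac{v_2^2}{2g} + z_2 + h_f$$

where $h_f$ is the head loss due to viscous (friction) effects; setting $h_f = 0$ recovers the classical Bernoulli equation.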
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol..." (Studia Poinsotiana)
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
Seminar on U.V. Spectroscopy, by SAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
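The quantitative basis of UV-Vis absorption measurements is the Beer-Lambert law (standard form, added here for reference):

$$A \;=\; \log_{10}\frac{I_0}{I} \;=\; \varepsilon\, c\, l$$

where $A$ is the absorbance, $I_0$ and $I$ are the incident and transmitted intensities, $\varepsilon$ the molar absorptivity, $c$ the analyte concentration, and $l$ the path length.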
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... (University of Maribor)
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R_1/2 ∼ 50−200 pc, stellar masses of M⋆ ∼ 10^7−10^8 M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
What are greenhouse gases, and how many gases affect the Earth? (moosaasad1975)
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how are the weather and the climate affected?
Phenomics-assisted breeding in crop improvement (IshaGoswami9)
As the global population increases and approaches about 9 billion by 2050, and with climate change, it is difficult to meet the food requirements of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics controlled by multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can be linked to genomic information for crop improvement at all growth stages have become as important as genotyping. Thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition (tutorial)
1. GaitSet
Regarding Gait as a Set for Cross-View Gait Recognition
2. Gait Recognition: Identify persons by their gait
Long distance: gait vs. face, fingerprint, iris…
No need for cooperation: gait vs. fingerprint, iris…
Robust to appearance change: gait vs. person re-id
Has broad applications in crime prevention, forensic identification and social security.
Classical approach: build a gait template (segmentation and alignment, remove color & texture, extract temporal information through pixel-level operations).
3. A template aggregates information at the pixel level.
That is not a sound design: obviously, it loses temporal information.
4. A template aggregates information at the pixel level.
That is not a sound design: obviously, it loses temporal information.
The neural network ought to have access to each frame.
Then… → a gait sequence?
5. A template aggregates information at the pixel level.
That is not a sound design: obviously, it loses temporal information.
The neural network ought to have access to each frame.
Then… → a gait sequence?
But order in a sequence causes a bunch of issues: how to unify the frame rate? Unify walking speed? Align the first frame?
6. A template aggregates information at the pixel level.
That is not a sound design: obviously, it loses temporal information.
The neural network ought to have access to each frame.
Then… → a gait sequence?
But order in a sequence causes a bunch of issues: how to unify the frame rate? Unify walking speed? Align the first frame?
Let's get rid of order! → a SET
7. Regarding Gait as a Set
                          Single Image   Sequence of Images   Set of Images
Permutation invariance         √                 ×                  √
Views                        Single           Multiple           Multiple
Walking Condition            Single            Single            Multiple
8. GaitSet: Set Pooling (SP)
Use a CNN (convolution & pooling) to extract a feature map from each silhouette in the input set.
Then what?
9. GaitSet: Set Pooling (SP)
Use a CNN (convolution & pooling) to extract a feature map from each silhouette in the input set.
Set of feature maps → feature map of the set.
10. GaitSet: Set Pooling (SP)
• A permutation-invariant function: permuting the elements of the input set should not influence the output, formulated as
  G({v_j | j = 1, 2, …, n}) = G({v_π(j) | j = 1, 2, …, n}),
  where π is any permutation.
• Able to take a set with arbitrary cardinality, to ensure the flexibility of the model: in real-life scenarios the number of a person's gait silhouettes can be arbitrary.
11. GaitSet: Set Pooling (SP)
• Statistical functions: max(·), mean(·), median(·)
• Joint functions:
  max(·) + mean(·) + median(·)
  1_1C(cat(max(·), mean(·), median(·))), where 1_1C is a 1×1 convolutional layer and cat is concatenation
• Attention: compute global statistics (cat of max, mean, median followed by a 1×1 convolution) over the set, copy them n times to match the n frame-level feature maps (n, c, h, w), use them to learn an attention map that refines each frame-level feature map, and finally take the max over frames to obtain the set-level feature z(c, h, w).
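As a concrete illustration of the max-based Set Pooling described above, here is a minimal PyTorch sketch; the toy CNN, silhouette size, and tensor shapes are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class SetPooling(nn.Module):
    """Aggregate frame-level feature maps (n, c, h, w) into a single
    set-level feature map (c, h, w) with a permutation-invariant max."""
    def forward(self, frame_feats):
        return frame_feats.max(dim=0).values       # max over the frame axis

# toy frame-level CNN (illustrative, not the paper's architecture)
frame_cnn = nn.Sequential(
    nn.Conv2d(1, 32, 5, padding=2), nn.LeakyReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.LeakyReLU(), nn.MaxPool2d(2),
)

silhouettes = torch.rand(30, 1, 64, 44)            # a set of 30 silhouettes
frame_feats = frame_cnn(silhouettes)               # (30, 64, 16, 11)
set_feat = SetPooling()(frame_feats)               # (64, 16, 11), order-independent

# permutation invariance: shuffling the set leaves the output unchanged
perm = torch.randperm(silhouettes.size(0))
assert torch.allclose(set_feat, SetPooling()(frame_cnn(silhouettes[perm])))
```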
12. GaitSet: Horizontal Pyramid Mapping (HPM)
• Strips help the network focus on features at different scales.
• Different discriminative information has different scales: movement of hands, feet, head, shoulders…; movement of arms, legs…; movement of the whole body.
• Horizontal strips are commonly used in person re-id.
13. GaitSet: Horizontal Pyramid Mapping (HPM)
• Strips help the network focus on features at different scales.
• To be more efficient, we split the set-level feature map (c × h × w) instead of the original silhouettes.
• The map is split along the height dimension at S scales, giving n = Σ_{s=1}^{S} 2^{s-1} strips z_{s,t} in total.
• Each strip is pooled into a vector f' by GAP + GMP (GAP: Global Average Pooling, GMP: Global Max Pooling) and mapped to the final feature f by its own fully connected layer fc_{s,t}.
• Parameters of the FCs are not shared: different strips represent features with different scales at different positions.
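A minimal PyTorch sketch of the HPM idea follows; the number of scales, channel count, and output dimension are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HorizontalPyramidMapping(nn.Module):
    """Split the set-level feature map into horizontal strips at several
    scales, pool each strip with GAP + GMP, and map each strip through
    its own (non-shared) fully connected layer."""
    def __init__(self, in_channels=64, out_dim=256, scales=4):
        super().__init__()
        self.scales = scales
        n_strips = sum(2 ** s for s in range(scales))        # n = sum_{s=1}^{S} 2^{s-1}
        self.fcs = nn.ModuleList([nn.Linear(in_channels, out_dim) for _ in range(n_strips)])

    def forward(self, x):                                    # x: (batch, c, h, w)
        feats = []
        for s in range(self.scales):
            for strip in x.chunk(2 ** s, dim=2):             # split along the height axis
                feats.append(strip.mean(dim=(2, 3)) + strip.amax(dim=(2, 3)))  # GAP + GMP
        return torch.stack([fc(f) for fc, f in zip(self.fcs, feats)], dim=1)

hpm = HorizontalPyramidMapping()
set_feat = torch.rand(8, 64, 16, 11)       # a batch of set-level feature maps
reps = hpm(set_feat)                       # (8, 15, 256): one feature vector per strip
```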
14. GaitSet: Main Pipeline
CNN: extract information from each silhouette
→ SP: aggregate frame-level information into set-level information
→ HPM: get a discriminative representation
• Loss: a triplet loss for each strip
• Test: concatenate all strips to get the representation of the input silhouette set
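The per-strip triplet loss can be sketched as a batch-all variant as follows; the margin, feature shapes, and batch composition are illustrative assumptions, not the authors' training code.

```python
import torch
import torch.nn.functional as F

def strip_triplet_loss(reps, labels, margin=0.2):
    """reps: (batch, n_strips, dim) strip features; labels: (batch,) identity ids."""
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)              # (B, B)
    pos_mask = same_id & ~torch.eye(len(labels), dtype=torch.bool)    # anchor != positive
    neg_mask = ~same_id                                               # different identity
    valid = pos_mask.unsqueeze(2) & neg_mask.unsqueeze(1)             # (B, B, B) over (a, p, n)

    total = 0.0
    for s in range(reps.size(1)):                                     # one triplet loss per strip
        f = F.normalize(reps[:, s], dim=1)
        dist = torch.cdist(f, f)                                      # pairwise distances
        hinge = F.relu(dist.unsqueeze(2) - dist.unsqueeze(1) + margin)  # d(a,p) - d(a,n) + margin
        total = total + hinge[valid].mean()
    return total / reps.size(1)

reps = torch.rand(8, 15, 256, requires_grad=True)   # e.g. HPM output: 15 strips of 256-d features
labels = torch.randint(0, 4, (8,))                  # 4 identities in the batch
loss = strip_triplet_loss(reps, labels)
loss.backward()
```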
15. GaitSet: Pipeline
• Multi-layer Global Pipeline (MGP):
Shallow layers focus on local, fine-grained information; deep layers focus on global, coarse-grained information.
Use MGP to collect set-level information at various levels.
16. Ablation (on CASIA-B)
• Set vs. GEI (lines 1~2): with an identical network, the set representation exceeds GEI by over 10%.
• HPM: shared vs. non-shared FCs (lines 2~3): non-shared exceeds shared by around 2~3%.
• MGP (last 2 lines): exceeds by around 1~3%.
• Different SP functions (lines 3~8): the best choice differs per condition (CL: max(·); BG: 1_1C(cat(max, mean, median)); NM: attention); max(·) is chosen for its simplicity.
CASIA-B: 124 subjects × 11 views × 10 videos per view = 13,640 videos, covering 3 walking conditions (NM: normal; BG: carrying a bag; CL: wearing a coat).
19. Fast
GaitSet directly learns a representation instead of measuring the similarity between a pair of gaits.
• Learn a representation (DATA → Network → Representation): √ linear cost, (n + m) × network complexity, for n probe samples and m gallery samples.
• Measure similarity between pairs of samples (DATA1 → Network, DATA2 → Network → same ID?): × quadratic cost, n × m × network complexity.
The average computational complexity for one sample on CASIA-B is 8.6 GFLOPs.
20. Flexible: Limited Silhouettes
[Plot: rank-1 accuracy (%) vs. number of randomly selected silhouettes (0 to 100), compared with using all images (95.0%); accuracy rises from 25.0% with very few silhouettes through 82.5% at 7 silhouettes toward roughly 94.8%.]
• Robustness: reaches an 82.5% accuracy with only 7 silhouettes.
• Our method does learn the motional gait information.
• The accuracy rises monotonically with the number of silhouettes.
• One gait period contains around 25 silhouettes; more silhouettes will NOT bring much more motional information. Consistently, the accuracy is close to the best performance at this point.
21. Flexible: Multiple Views & Walking Conditions
GaitSet makes full use of each silhouette: an input set can contain any number of non-consecutive silhouettes filmed from different viewpoints under different walking conditions.
Set contains two views:
• Combines both parallel and vertical information.
• Generally, the larger the difference between the two views, the better the results.
Set contains two walking conditions:
• The accuracies rise as the number of silhouettes increases.
• BG & CL carry complementary information.