Human Action Recognition Using 3D
Joint Information and Pyramidal
HOOFD Features
MSc Thesis by
Barış Can Üstündağ
Thesis Advisor: Prof. Dr. Mustafa Ünel
(Note: images to be added here)
• Introduction to Human Action Recognition
– Motivation, Applications
– Related Work
• Human Action Recognition Using 3D Joint Information and HOOFD
Features
– Acquiring Depth Data
– Feature Extraction
• 3D Joints
• HOOFD
– Feature Representation
– Classification
• Experiments
– Datasets
• MSR Action 3D Dataset
• MSR Action Pairs Dataset
• MSRC-12 Gesture Dataset
• Conclusions & Future Work
Outline
• Motion Perception
– Gunnar Johansson [1971]
• Sequence of images for
Human Motion Analysis
• ‘Moving Light Displays’
enable identification of
people and gender
• Motion Capture [2014]
– Dawn of the Planet of
the Apes
Motivation
• Vast amount of Data
YouTube
• More than 34K hours of video
uploaded every day
Surveillance Cameras
• ~30 M cameras in the US
• ~700K video hours every day
Motivation
• Video Categorization
– How many human-pixels are there?
– Movies: 35%, TV: 34%, YouTube: 40%
Motivation
• Rehabilitation
– 15M people suffer from stroke every year
– Automated systems
– Gamification
Motivation - Application
• Release of Low-cost Depth Cameras
– Kinect (2010)
– Google Tango (developers only, 2014)
– Leap Motion (2013)
• Effective and robust performance even with
– Complex backgrounds
– Challenging viewpoints
– Occlusions
Motivation – Why depth?
(Images: Google Tango, Leap Motion)
• Intensity Based
– Extraction of Cuboids
– Motion History Images, Motion Energy Images
• Depth Map Based
– Depth Motion Maps
– Histogram of Oriented 4D Normals
• Skeletal Data Based
– SMIJ – Sequence of Most Informative Joints
– HOJ3D – Histogram of 3D Joint Locations
Related Work
Related Work
• Extraction of Cuboids,
Dollar et al. [CVPR, 2005]
• Motion History Images / Motion Energy Images,
Gorelick et al. [PAMI, 2007]
Intensity Based
Related Work
• Histogram of Oriented 4D Normals (HON4D),
Oreifej et al. [CVPR, 2013]
• Depth Motion Maps,
Yang et al. [JRTIP, 2012]
Depth Map Based
Related Work
• Sequence of Most Informative Joints (SMIJ),
Ofli et al. [CVIU, 2013]
• View Invariant Human
Action Recognition
Using Histogram of
3D Joints,
Xia et al. [CVPR, 2012]
Skeletal Data
Based
• Introduction to Human Action Recognition
– Motivation, Applications
– Related Work
• Human Action Recognition Using 3D Joint Information and HOOFD
Features
– Acquiring Depth Data
– Feature Extraction
• 3D Joints
• HOOFD
– Feature Representation
– Classification
• Experiments
– Datasets
• MSR Action 3D Dataset
• MSR Action Pairs Dataset
• MSRC-12 Gesture Dataset
• Conclusions & Future Work
Outline
Human Action Recognition Using 3D Joint Information and HOOFD Features
• Acquiring Depth Data: depth acquisition; formation of shadows; eliminating the noise
• Feature Extraction: 3D Joints; HOOFD
• Feature Representation: Signal Warping; Pyramidal HOOFD Features
• Classification: Naive Bayes; Support Vector Machines
• Kinect
– Depth data acquisition is accomplished using the 'Light Coding' method
• Before the depth data can be used in an application, two issues must be handled:
– Formation of shadows
– Eliminating the noise
Acquiring Depth Data → Feature Extraction → Feature Representation → Classification
• Shadows
– Generated by the foreground objects
• Noise
– Rough object boundaries cause gaps and holes in the depth data
• Bilateral Filter
– Weighted average combining a space term (pixel distance) and a range term (depth difference); see the sketch below
Acquiring Depth Data → Feature Extraction → Feature Representation → Classification
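A minimal sketch of this pre-processing step, assuming OpenCV and NumPy; the 16-bit depth format, the file name, and the filter parameters are illustrative assumptions rather than the thesis settings.

```python
import cv2
import numpy as np

# Hypothetical 16-bit Kinect depth frame (millimeters); zero pixels are shadow/holes.
depth_mm = cv2.imread("depth_frame.png", cv2.IMREAD_UNCHANGED)

# Bilateral filtering: each output pixel is a weighted average of its neighbors,
# where sigmaSpace controls the space term (pixel distance) and sigmaColor the
# range term (depth difference), so object edges are preserved while noise is smoothed.
depth_f = depth_mm.astype(np.float32)
smoothed = cv2.bilateralFilter(depth_f, d=5, sigmaColor=30.0, sigmaSpace=5.0)

# Shadow/hole mask (depth == 0) that a later hole-filling step could use.
hole_mask = (depth_mm == 0).astype(np.uint8)
```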
• Joint Features
– 20 joints are provided by the Kinect SDK
– 10 joint angles and their derivatives are calculated:

$$\dot{\theta}_k = \frac{\theta_k - \theta_{k-1}}{T}$$
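To make the joint-feature computation concrete, here is a hedged NumPy sketch. The helper names, the (num_frames, 20, 3) array shape, and the joint-triplet indices are illustrative assumptions; only the finite-difference derivative above comes from the slide.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (radians) at joint b, formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def angle_sequence(joints, ia, ib, ic):
    """joints: (num_frames, 20, 3) array of 3D joint positions (illustrative shape)."""
    return np.array([joint_angle(f[ia], f[ib], f[ic]) for f in joints])

def angle_derivative(theta, T=1.0 / 30.0):
    """Finite-difference derivative; T is the frame period (e.g. 1/30 s for Kinect)."""
    return np.diff(theta, prepend=theta[0]) / T
```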
• Joint Features
– Mapped to spherical coordinates
– Origin is aligned to the hip center
– Radius parameter is discarded
Acquiring Depth Data → Feature Extraction → Feature Representation → Classification
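A small sketch of this spherical mapping, assuming NumPy, a (20, 3) joint array, and that index 0 is the hip-center joint (an assumption); azimuth and elevation are kept while the radius is dropped, as stated on the slide.

```python
import numpy as np

def to_spherical_angles(joints):
    """Map joints to spherical coordinates with the hip center as origin.
    joints: (20, 3) array; assumes index 0 is the hip-center joint.
    Returns azimuth and elevation per joint; the radius is discarded."""
    rel = joints - joints[0]                                 # translate origin to hip center
    x, y, z = rel[:, 0], rel[:, 1], rel[:, 2]
    azimuth = np.arctan2(y, x)                               # angle in the x-y plane
    elevation = np.arctan2(z, np.sqrt(x**2 + y**2) + 1e-8)   # angle above that plane
    return azimuth, elevation
```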
• Histogram of Oriented Optical Flows from Depth (HOOFD)
Acquiring Depth Data → Feature Extraction → Feature Representation → Classification
– Optical Flow from Depth Data
• Mapping of depth data to an intensity image
• Depth values (z) represented as intensities (I)
• Yields an optical flow field that is invariant to sudden changes in brightness
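A minimal sketch of the depth-to-intensity mapping; the clipping range max_depth_mm and the 8-bit output format are illustrative assumptions.

```python
import numpy as np

def depth_to_intensity(depth_mm, max_depth_mm=4000.0):
    """Map raw depth values (z, in mm) to an 8-bit intensity image I so that
    standard optical-flow methods can be applied to the depth stream."""
    z = np.clip(depth_mm.astype(np.float32), 0, max_depth_mm)
    return (255.0 * z / max_depth_mm).astype(np.uint8)
```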
- Optical Flow
• 2D displacement of pixel patches on the
image plane
• Brightness Constancy Equation
• Linearizing assuming small (u,v) using Taylor
Series Expansion
• Histogram of Oriented Optical Flows
from Depth (HOOFD)
$$I(x, y, t) = I(x + \Delta x,\ y + \Delta y,\ t + \Delta t)$$

$$I_x(x,y,t)\,u(x,y) + I_y(x,y,t)\,v(x,y) + I_t(x,y,t) = 0,
\qquad u(x,y) = \frac{\Delta x}{\Delta t},\quad v(x,y) = \frac{\Delta y}{\Delta t}$$
• Optical Flow – Lucas-Kanade Method
• Apply it within a local patch
• Minimize using the Least-Squares method

$$E(u,v) = \sum_{x,y}\big[I_x(x,y,t)\,u + I_y(x,y,t)\,v + I_t(x,y,t)\big]^2$$

$$\begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix} =
-\begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$$

$$A\,\mathbf{u} = \mathbf{b}, \qquad \mathbf{u} = \left(A^{T}A\right)^{-1}A^{T}\mathbf{b}$$
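A hedged NumPy sketch of the patch-wise least-squares solution above; in practice one would typically use cv2.calcOpticalFlowPyrLK, and the window size here is an illustrative choice.

```python
import numpy as np

def lucas_kanade_patch(I_prev, I_next, cx, cy, win=7):
    """Least-squares optical flow (u, v) for one patch centered at (cx, cy)."""
    Ix = np.gradient(I_prev.astype(np.float32), axis=1)   # horizontal image gradient
    Iy = np.gradient(I_prev.astype(np.float32), axis=0)   # vertical image gradient
    It = I_next.astype(np.float32) - I_prev.astype(np.float32)  # temporal gradient

    r = win // 2
    sl = (slice(cy - r, cy + r + 1), slice(cx - r, cx + r + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)   # N x 2 gradient matrix
    b = -It[sl].ravel()

    # Solve A u = b in the least-squares sense, i.e. u = (A^T A)^{-1} A^T b.
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # (u, v)
```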
• Optical Flow – Horn-Schunck Method
• Assumption: global smoothness of the flow over the whole image

Smoothness error:
$$E_s = \iint_D \left(u_x^2 + u_y^2 + v_x^2 + v_y^2\right)\,dx\,dy$$

Error in the brightness constancy equation:
$$E_c = \iint_D \left(I_x u + I_y v + I_t\right)^2\,dx\,dy$$

Minimize:
$$E_c + \lambda E_s \quad (\lambda:\ \text{smoothness weight})$$
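A minimal iterative Horn-Schunck sketch that minimizes E_c + λE_s via the standard Euler-Lagrange update; λ, the iteration count, and the averaging kernel are illustrative values, not the thesis settings.

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(I1, I2, lam=0.1, n_iter=100):
    """Dense optical flow (u, v) between two frames under a global smoothness prior."""
    I1 = I1.astype(np.float32); I2 = I2.astype(np.float32)
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1

    u = np.zeros_like(I1); v = np.zeros_like(I1)
    avg_k = np.array([[0, 0.25, 0], [0.25, 0, 0.25], [0, 0.25, 0]], np.float32)

    for _ in range(n_iter):
        u_avg = convolve(u, avg_k); v_avg = convolve(v, avg_k)
        # Closed-form update derived from the Euler-Lagrange equations of E_c + lam * E_s.
        num = Ix * u_avg + Iy * v_avg + It
        den = lam + Ix**2 + Iy**2
        u = u_avg - Ix * num / den
        v = v_avg - Iy * num / den
    return u, v
```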
• Histogram of Oriented Optical Flow from
Depth
• Binning according to:
– Primary Angle between the flow vector and the horizontal axis
– Magnitude of the flow vector
• Orientation & Magnitude images
Histogram Binning example with bin size = 4
$$\theta = \tan^{-1}\!\left(\frac{v}{u}\right), \qquad M = \sqrt{u^2 + v^2}$$
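A sketch of the binning step, assuming NumPy; weighting each orientation bin by the flow magnitude is my assumption about how the two binning criteria are combined, and the bin count of 4 follows the slide's example.

```python
import numpy as np

def hoofd_histogram(u, v, n_bins=4):
    """Magnitude-weighted orientation histogram of flow vectors (u, v)."""
    theta = np.arctan2(v, u)             # orientation w.r.t. the horizontal axis
    mag = np.sqrt(u**2 + v**2)           # flow magnitude
    hist, _ = np.histogram(theta, bins=n_bins, range=(-np.pi, np.pi), weights=mag)
    # Normalize so histograms are comparable across patches and frames.
    return hist / (hist.sum() + 1e-8)
```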
• Signal Warping
– If an action instance is longer than the target length -> discard frames
– If it is shorter -> replicate and insert frames
(see the sketch below)
Acquiring Depth Data → Feature Extraction → Feature Representation → Classification
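A compact sketch of signal warping via uniform index resampling, which discards frames for long instances and replicates frames for short ones; the exact frame-selection rule in the thesis may differ.

```python
import numpy as np

def warp_sequence(frames, target_len):
    """Warp a per-frame feature sequence to a fixed length.
    frames: (num_frames, feat_dim) array."""
    n = len(frames)
    # Uniformly spaced indices: long sequences drop frames, short ones repeat them.
    idx = np.round(np.linspace(0, n - 1, target_len)).astype(int)
    return frames[idx]
```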
• Pyramidal HOOFD Features
– Histogram of Oriented Optical Flow from Depth
After obtaining the optical flows:
1. Patches are extracted around each joint
2. HOOFDs are calculated in a pyramidal fashion (see the sketch below)
(Figure: joint patch subdivided into pyramid Level 1, Level 2 and Level 3 grids)
Acquiring Depth Data → Feature Extraction → Feature Representation → Classification
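A sketch of the pyramidal computation, reusing hoofd_histogram from the earlier sketch; the 32-pixel patch size and the 1x1 / 2x2 / 4x4 grids for Levels 1-3 are illustrative assumptions rather than the thesis configuration.

```python
import numpy as np
# Assumes hoofd_histogram(u, v, n_bins) from the earlier HOOFD sketch.

def pyramidal_hoofd(u, v, joints_2d, patch=32, levels=3, n_bins=4):
    """Concatenate per-cell HOOFDs over a spatial pyramid around each joint.
    u, v: dense flow fields; joints_2d: (num_joints, 2) pixel coordinates."""
    feats = []
    half = patch // 2
    for (jx, jy) in joints_2d.astype(int):
        pu = u[jy - half:jy + half, jx - half:jx + half]
        pv = v[jy - half:jy + half, jx - half:jx + half]
        for level in range(1, levels + 1):
            cells = 2 ** (level - 1)                 # 1, 2, 4 cells per side
            for cu, cv in zip(np.array_split(pu, cells, axis=0),
                              np.array_split(pv, cells, axis=0)):
                for su, sv in zip(np.array_split(cu, cells, axis=1),
                                  np.array_split(cv, cells, axis=1)):
                    feats.append(hoofd_histogram(su.ravel(), sv.ravel(), n_bins))
    return np.concatenate(feats)
```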
• Supervised learning methods
– Training examples are labeled with known classes
• Example application: spam filtering in an e-mail client
– Examples: Naive Bayes, Support Vector Machines
Acquiring Depth Data → Feature Extraction → Feature Representation → Classification
• Naive Bayes Classifier
– Assumes independence between features
• For example: a red 'Volkswagen' with 17-inch wheels; each of these features is assumed to contribute independently to the probability that the car is a 'Volkswagen' (see the sketch below)
Acquiring Depth Data → Feature Extraction → Feature Representation → Classification
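A minimal scikit-learn sketch; GaussianNB (a Gaussian likelihood per feature dimension) is one common Naive Bayes variant and is an assumption here, as are the random placeholder feature shapes.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Placeholder feature vectors (e.g. concatenated joint + pyramidal HOOFD features)
# and action labels; shapes are illustrative only.
X_train = np.random.rand(100, 256)
y_train = np.random.randint(0, 12, size=100)

# GaussianNB models each feature dimension independently given the class,
# which is exactly the Naive Bayes independence assumption.
clf = GaussianNB()
clf.fit(X_train, y_train)
predicted_actions = clf.predict(np.random.rand(10, 256))
```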
• Support Vector Machines
– Find the optimal separating hyperplane that defines the decision boundary between two classes (see the sketch below)
Acquiring Depth Data → Feature Extraction → Feature Representation → Classification
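A matching scikit-learn sketch; the linear kernel, the C value, and the feature standardization are illustrative choices rather than the thesis configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder action features/labels (illustrative shapes only).
X_train, y_train = np.random.rand(100, 256), np.random.randint(0, 12, 100)
X_test, y_test = np.random.rand(20, 256), np.random.randint(0, 12, 20)

# Standardize features, then fit an SVM; scikit-learn handles the multi-class
# case internally by combining one-vs-one binary SVMs.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm.fit(X_train, y_train)
print("accuracy:", svm.score(X_test, y_test))
```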
• Introduction to Human Action Recognition
– Motivation, Applications
– Related Work
• Action Recognition Using 3D Joint Information and HOOFD Features
– Acquiring Depth Data
– Feature Extraction
• 3D Joints
• HOOFD
– Feature Representation
– Classification
• Experiments
– Datasets
• MSR Action 3D Dataset
• MSR Action Pairs Dataset
• MSRC-12 Gesture Dataset
• Conclusions & Future Work
Outline
• Datasets
– MSR Action 3D
• 10 Subjects
• 20 Actions
– MSR Action Pairs
• 10 Subjects
• 12 Actions
– MSRC-12 Gesture
• 30 Subjects
• 12 Actions
Experiments
Experiment - 1
Settings
• Dataset: MSRC-12 Gesture
• Feature: Joint Features
• Evaluation protocols:
– Leave-one-subject-out cross-validation
– 50% Training / 50% Test
– 75% Training / 25% Test
Experiment - 2
Settings
• Feature: HOOFD Features
• Dataset: MSR Action 3D
• Ratio: 50% Training 50% Test
(Figure: 'Smash' action vs. 'Forward Punch' action)
Experiment - 3
Settings
• Feature: HOOFD Features
• Dataset: MSR Action Pairs
• Ratio: 50% Training 50% Test
Conclusion & Future Work
• We developed a novel human action recognition framework by fusing 3D Joint information and
HOOFD features
• We proposed a new feature called Histogram of Oriented Optical Flow from Depth (HOOFD)
• Several experiments with publicly available datasets were conducted to assess the performance of
the proposed technique.
• Comparisons with state-of-the-art algorithms demonstrate the success of our approach.
• As future work,
– The potential of HOOFD will be explored further
– Other popular classification approaches will be employed (Bag of Words, Random Forest, Boosted Trees)
Thank You ...
Questions?
Editor's Notes

  • #23 Brightness values of individual pixels in a local patch are preserved. By linearizing the equation around I(x, y, t) using a Taylor series expansion we obtain the second equation.
  • #24 Even though we assume the equation equals 0, in practice it does not. We therefore discretize the equation, apply it within a local patch, and obtain this cost function. Minimizing this function using least squares gives the optical flow vectors.
  • #25 The literature also offers a method proposed by Horn and Schunck, which introduces a global smoothness constraint over the whole image. This is useful for correcting errors caused by the gaps and holes in the depth data. Smoothness is enforced by minimizing the velocities (the optical flow vectors).
  • #41 HON4D: to make the descriptors more discriminative, they quantize the 4D space using the vertices of a polychoron. Dictionary learning with group sparsity; geometric constraint with temporal pyramid matching.