SlideShare a Scribd company logo
1 of 62
Classic Video Datasets and
Algorithms
Qasim Nadeem, Divya Thuremella
Overview
● Background
● Action Recognition Dense Trajectories Paper
○ Main Idea
○ Ways to Track
○ Dense Trajectories
○ Descriptors
● Datasets
○ KTH
○ UCF 101
○ Hollywood-2
○ HMDB
● Evaluation
Background: Optical Flow (and
Lukas Kanade Tracking)
Optical Flow
● 2D vector field describing apparent motion in
images
Calculating Optical Flow
Calculating Optical Flow
Paper: Action Recognition by
Dense Trajectories
Main Idea: Videos as Trajectories
● Dense trajectories = represent video as
3D (height x width x time) voxels
○ each voxel is connected to other voxels
where this object was previously
Main Idea: Videos as Trajectories
Ways to Track
1)Lukas Kanade Tracker
a) Baseline
2) Find SIFT features and match between frames
a) Too expensive
3) Dense Trajectories (proposed method)
Ways to Track
1) Lukas Kanade Tracker
a) Baseline that Dense Trajectories method is being compared
to in analysis section
2) Find SIFT features and match from frame to frame
a) Too expensive and hard to get enough features
3) Dense Trajectories (proposed method)
Ways to Track
1) Lukas Kanade Tracker
a) Baseline
2)Find SIFT features and match between
frames
a) Too expensive
3) Dense Trajectories (proposed method)
Ways to Track
1) Lukas Kanade Tracker
a) Baseline that Dense Trajectories method is being compared
to in analysis section
2) Find SIFT features and match from fr
Ways to Track
1) Lukas Kanade Tracker
a) Baseline
2) Find SIFT features and match between frames
a) Too expensive
3) Dense Trajectories (proposed method)
Ways to Track
1) Lukas Kanade Tracker
a) Baseline that Dense Trajectories method is being compared
to in analysis section
2) Find SIFT features and match from frame to frame
a) Too expensive and hard to get enough features
3) Dense Trajectories (proposed method)
<--Best
Method
Dense Trajectories
● Image → WxW grid, tracked in each scale
● If not connected to prev track, start a new one
● update each point in box with median of all
points in that box
Dense Trajectories
● Image → WxW grid, tracked in each
scale
● If not connected to prev track, start a new one
● update each point in box with median of all
points in that box
Dense Trajectories
● Image → WxW grid, tracked in each
scale
Each box at each scale is
tracked separately
Dense Trajectories
● Image → WxW grid, tracked in each scale
● If not connected to prev track, start a
new one
● update each point in box with median of all
points in that box
Dense Trajectories
● if not connected to a prev
track, start a new one
● when something moves
○ the track it moves to ends
○ the track it moves from replaces
it
○ new track starts where it moves
Dense Trajectories
● if not connected to a prev
track, start a new one
○ the length of a trajectory is
limited to L frames
■ when trajectory > L
(length), it’s removed
■ because trajectories drift
Dense Trajectories
● update each point in box with median of all
optical flow (u,v) vectors of that box
Trajectories → Descriptors
Descriptors
● Trajectory
○ Displacement vector
● HOG
● HOF
● MBH
Descriptors
● HOG
○ Apply x and y derivative filters →
(Ix, Iy) → (magnitude, direction) →
histogram them over NxN pixels
● HOF
● MBH
Descriptors
● HOG:
Descriptors
● HOG
● HOF
○ Same as HOG but instead of (Ix, Iy), use
optical flow (u, v) = (It/Ix, It/Iy) → (mag, dir)
→ histogram over NxN pixels
● MBH
Descriptors for the voxels of trajectories
● HOG
○ Apply x and y derivative filters → (Ix, Iy) → (magnitude, direction)
→ histogram them over NxN pxels
● HOF
○ Same as HOG but instead of (Ix, Iy), use optical flow (u,
v) = (It/Ix, It/Iy) → (mag, dir) → histogram over NxN
pixels
● MBH
○ Same as HOF but histogram is weighted by the magnitude
Descriptors
● HOG
● HOF
● MBH
○ Same as HOF but histogram is
weighted by the magnitude and
expressed as (Ix, Iy)
Descriptors for the voxels of trajectories
● HOG
○ Apply x and y derivative filters → (Ix, Iy) → (magnitude, direction)
→ histogram them over NxN pxels
● HOF
○ Same as HOG but instead of (Ix, Iy), use optical flow (u, v) =
(It/Ix, It/Iy) → (mag, dir) → histogram over NxN pixels
● MBH
○ Same as HOF but histogram is weighted by
the magnitude
Specifics of Experimental Setup
● Sampling step size of W = 5 is dense enough to
give good results
● Used 8 spatial scales spaced by 1/ √ 2
● Experimentally, trajectory length L = 15 frames
● Voxel is nσ ×nσ ×nτ where nσ = 2, nτ = 3
Specifics of Evaluation Setup
● take 100,000 random samples of hog
descriptors from all the training videos
● k-means cluster them into 4,000 words
● map any new descriptor to those words
● Classify using a non-linear SVM with a chi-
squared kernel
4 Important Datasets
● KTH Human Actions (2004)
● Hollywood-2 (2009)
● HMDB-51: Human Motion DataBase (2011)
● UCF 101 (2012)
KTH Human Actions (2004)
● ‘Simple’/’Controlled’ clips intentionally captured
● Total 2391 clips
● 25 FPS; Avg length of 4 sec (~100 frame clips)
● 160x120 spatial resolution
● Homogeneous background
● Static camera
KTH Human Actions (2004)
● 6 action classes; 4 scenarios; 25 actors;
● Homogeneous background; static camera
Hollywood-2 (2009)
● 12 action classes; 10 scene classes; total 3669 clips
● ~20.1 hours of video in total
● Clips from 69 Hollywood Movies (different movies for
test & train)
● Automatic action annotation (!)
● Manual verification afterwards to clean-up
Hollywood-2 (2009)
● Scripts describe with scenes, characters, transcribed
dialogs and human action (free online websites..)
● Subtitles have time information but only precise
speech
● Align speech sections between subtitles and scripts
● Transfer time information to scene descriptions in
scripts
Hollywood-2 (2009)
A “Text” Action Classifier
● Train a Regularized Perceptron text
classifier for each action class
● Assign action labels to scene descriptions
● Does much better than hand-tuned
regular-expression matching
Hollywood-2 (2009)
Hollywood-2 (2009)
● Sample video clips can contain multiple actions (probably true of other
datasets too)
● Also gave conditional probability estimates: p(scene|action) and
p(action|scene) using the movie scripts for clips not cleaned
HMDB (2011)
● 7000 manually annotated clips from YouTube & Movies
● 51 action classes (>= 100 clips each)
● 90+% accuracy on existing popular datasets (KTH, Weizmann etc)
● Interesting experiment to show that HMDB’s action categories mainly
differ in motion rather than static poses
● Contrary to UCF-50 & Hollywood2: “solvable” with static information
alone
● Shown on Left:
i) Hand-waving
ii) Drinking
iii) Sword Fighting
iv) Diving
v) Running
vi) Kicking
● Large variation in
camera
viewpoint/motion,
cluttered background,
position/scale/appear
ance of actors
UCF 101 (2012)
● “Realistic” action videos taken from YouTube
● Extension of earlier UCF-50
● Examples: ...Apply Eye Makeup, Archery, Baby Crawling, Blowing
Candles, Body Weight Squats, Boxing Punching Bag, Hammering,
Handstand Push-ups, Handstand Walking, Walking with a dog, Wall
Push-ups…
● ~Twice as big as UCF-50, HMDB-51
● Authors’ claim: “Most challenging data set to date”;
Largest diversity in actions, variations in camera motion, object
appearance/pose/scale/viewpoint, cluttered background,
illumination conditions..
● Authors’ Baseline result (w/ standard BOW approach): 43.90%
UCF 101 (2012)
● 5 broad categories of actions
(101):
i) Human-Object Interaction
ii) Body-Motion Only
iii) Human- Human Interaction
iv) Playing Musical
Instruments
v) Sports
● 25 Groups (4-7 videos): clips
with commonalities (similar
background, viewpoint etc)
Number
Of
Videos
Clip
Lengths
Evaluation
Action Recognition by Dense Trajectories by
Wang, Klaser, Schmid, Cheng-Lin CVPR’11
Results on four Datasets
Comparing to the state-of-
the-art
Per-class accuracy analysis on YouTube
Per-class AP analysis on Hollywood2
Varying hyper-parameters of the System
● L (Trajectory Length)
● W (Step Size)
● N (Neighborhood Size)
● n_sigma * n_sigma * n_tau (Grid Structure)
Improvements made by IDT:
1. Uses 2 types of features:
○ SURF focuses on blob-type structures,
○ Harris Corner Detector fires on corners edges
2. Remove majority trajectories
○ RANSAC on optical flow,SURF, & Harris
○ Correct for majority homography
○ threshold magnitude of the (u, v)
Improvements made by IDT:
1. Uses 2 types of features:
○ SURF focuses on blob-type structures,
○ Harris Corner Detector fires on corners and edges
2. Estimate the camera homography using RANSAC & correct for it
○ Ransac on optical flow & feature transforms of SURF & Harris
○ Correct for majority homography
○ remove majority trajectories by thresholding magnitude of the
displacement vectors
More Improvements made by IDT:
3. Human detector bounding box
○ mask to remove feature matches inside for
when homography
4. Fisher Vector > Bag of Features
○ Uses PCA, Gaussian Mixture model, then
classifies by Linear SVM
● ...
Thoughts on the paper (!!)
;

More Related Content

Similar to Classic video datasets and algorithms.pptx

Action_recognition-topic.pptx
Action_recognition-topic.pptxAction_recognition-topic.pptx
Action_recognition-topic.pptxcomputerscience98
 
Multimedia basic video compression techniques
Multimedia basic video compression techniquesMultimedia basic video compression techniques
Multimedia basic video compression techniquesMazin Alwaaly
 
Survey on optical flow estimation with DL
Survey on optical flow estimation with DLSurvey on optical flow estimation with DL
Survey on optical flow estimation with DLLeapMind Inc
 
SPPRA'2013 Paper Presentation
SPPRA'2013 Paper PresentationSPPRA'2013 Paper Presentation
SPPRA'2013 Paper PresentationActiveEon
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachSymeon Papadopoulos
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachREVEAL - Social Media Verification
 
BallCatchingRobot
BallCatchingRobotBallCatchingRobot
BallCatchingRobotgauravbrd
 
Shai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble trackingShai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble trackingwolf
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Luba Elliott
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaPreferred Networks
 
Dl1 deep learning_algorithms
Dl1 deep learning_algorithmsDl1 deep learning_algorithms
Dl1 deep learning_algorithmsArmando Vieira
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Sergey Karayev
 
TRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationTRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationGeorge Awad
 
People counting in low density video sequences2
People counting in low density video sequences2People counting in low density video sequences2
People counting in low density video sequences2Ahmed Tememe
 

Similar to Classic video datasets and algorithms.pptx (20)

Temporal Segment Network
Temporal Segment NetworkTemporal Segment Network
Temporal Segment Network
 
Action_recognition-topic.pptx
Action_recognition-topic.pptxAction_recognition-topic.pptx
Action_recognition-topic.pptx
 
Multimedia basic video compression techniques
Multimedia basic video compression techniquesMultimedia basic video compression techniques
Multimedia basic video compression techniques
 
Survey on optical flow estimation with DL
Survey on optical flow estimation with DLSurvey on optical flow estimation with DL
Survey on optical flow estimation with DL
 
SPPRA'2013 Paper Presentation
SPPRA'2013 Paper PresentationSPPRA'2013 Paper Presentation
SPPRA'2013 Paper Presentation
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
 
BallCatchingRobot
BallCatchingRobotBallCatchingRobot
BallCatchingRobot
 
Shai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble trackingShai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble tracking
 
Spatio-temporal reasoning for traffic scene understanding
Spatio-temporal reasoning for traffic scene understandingSpatio-temporal reasoning for traffic scene understanding
Spatio-temporal reasoning for traffic scene understanding
 
WaveNet
WaveNetWaveNet
WaveNet
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
 
Presentation sim opencv
Presentation sim opencvPresentation sim opencv
Presentation sim opencv
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
 
Dl1 deep learning_algorithms
Dl1 deep learning_algorithmsDl1 deep learning_algorithms
Dl1 deep learning_algorithms
 
Mvs adas
Mvs adasMvs adas
Mvs adas
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
 
TRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationTRECVID 2016 : Concept Localization
TRECVID 2016 : Concept Localization
 
Isvc08
Isvc08Isvc08
Isvc08
 
People counting in low density video sequences2
People counting in low density video sequences2People counting in low density video sequences2
People counting in low density video sequences2
 

Recently uploaded

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 

Recently uploaded (20)

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

Classic video datasets and algorithms.pptx

  • 1. Classic Video Datasets and Algorithms Qasim Nadeem, Divya Thuremella
  • 2. Overview ● Background ● Action Recognition Dense Trajectories Paper ○ Main Idea ○ Ways to Track ○ Dense Trajectories ○ Descriptors ● Datasets ○ KTH ○ UCF 101 ○ Hollywood-2 ○ HMDB ● Evaluation
  • 3. Background: Optical Flow (and Lukas Kanade Tracking)
  • 4. Optical Flow ● 2D vector field describing apparent motion in images
  • 7. Paper: Action Recognition by Dense Trajectories
  • 8. Main Idea: Videos as Trajectories ● Dense trajectories = represent video as 3D (height x width x time) voxels ○ each voxel is connected to other voxels where this object was previously
  • 9. Main Idea: Videos as Trajectories
  • 10. Ways to Track 1)Lukas Kanade Tracker a) Baseline 2) Find SIFT features and match between frames a) Too expensive 3) Dense Trajectories (proposed method)
  • 11. Ways to Track 1) Lukas Kanade Tracker a) Baseline that Dense Trajectories method is being compared to in analysis section 2) Find SIFT features and match from frame to frame a) Too expensive and hard to get enough features 3) Dense Trajectories (proposed method)
  • 12. Ways to Track 1) Lukas Kanade Tracker a) Baseline 2)Find SIFT features and match between frames a) Too expensive 3) Dense Trajectories (proposed method)
  • 13. Ways to Track 1) Lukas Kanade Tracker a) Baseline that Dense Trajectories method is being compared to in analysis section 2) Find SIFT features and match from fr
  • 14. Ways to Track 1) Lukas Kanade Tracker a) Baseline 2) Find SIFT features and match between frames a) Too expensive 3) Dense Trajectories (proposed method)
  • 15. Ways to Track 1) Lukas Kanade Tracker a) Baseline that Dense Trajectories method is being compared to in analysis section 2) Find SIFT features and match from frame to frame a) Too expensive and hard to get enough features 3) Dense Trajectories (proposed method) <--Best Method
  • 16. Dense Trajectories ● Image → WxW grid, tracked in each scale ● If not connected to prev track, start a new one ● update each point in box with median of all points in that box
  • 17. Dense Trajectories ● Image → WxW grid, tracked in each scale ● If not connected to prev track, start a new one ● update each point in box with median of all points in that box
  • 18. Dense Trajectories ● Image → WxW grid, tracked in each scale Each box at each scale is tracked separately
  • 19. Dense Trajectories ● Image → WxW grid, tracked in each scale ● If not connected to prev track, start a new one ● update each point in box with median of all points in that box
  • 20. Dense Trajectories ● if not connected to a prev track, start a new one ● when something moves ○ the track it moves to ends ○ the track it moves from replaces it ○ new track starts where it moves
  • 21. Dense Trajectories ● if not connected to a prev track, start a new one ○ the length of a trajectory is limited to L frames ■ when trajectory > L (length), it’s removed ■ because trajectories drift
  • 22. Dense Trajectories ● update each point in box with median of all optical flow (u,v) vectors of that box
  • 24. Descriptors ● Trajectory ○ Displacement vector ● HOG ● HOF ● MBH
  • 25. Descriptors ● HOG ○ Apply x and y derivative filters → (Ix, Iy) → (magnitude, direction) → histogram them over NxN pixels ● HOF ● MBH
  • 27. Descriptors ● HOG ● HOF ○ Same as HOG but instead of (Ix, Iy), use optical flow (u, v) = (It/Ix, It/Iy) → (mag, dir) → histogram over NxN pixels ● MBH
  • 28. Descriptors for the voxels of trajectories ● HOG ○ Apply x and y derivative filters → (Ix, Iy) → (magnitude, direction) → histogram them over NxN pxels ● HOF ○ Same as HOG but instead of (Ix, Iy), use optical flow (u, v) = (It/Ix, It/Iy) → (mag, dir) → histogram over NxN pixels ● MBH ○ Same as HOF but histogram is weighted by the magnitude
  • 29. Descriptors ● HOG ● HOF ● MBH ○ Same as HOF but histogram is weighted by the magnitude and expressed as (Ix, Iy)
  • 30. Descriptors for the voxels of trajectories ● HOG ○ Apply x and y derivative filters → (Ix, Iy) → (magnitude, direction) → histogram them over NxN pxels ● HOF ○ Same as HOG but instead of (Ix, Iy), use optical flow (u, v) = (It/Ix, It/Iy) → (mag, dir) → histogram over NxN pixels ● MBH ○ Same as HOF but histogram is weighted by the magnitude
  • 31. Specifics of Experimental Setup ● Sampling step size of W = 5 is dense enough to give good results ● Used 8 spatial scales spaced by 1/ √ 2 ● Experimentally, trajectory length L = 15 frames ● Voxel is nσ ×nσ ×nτ where nσ = 2, nτ = 3
  • 32. Specifics of Evaluation Setup ● take 100,000 random samples of hog descriptors from all the training videos ● k-means cluster them into 4,000 words ● map any new descriptor to those words ● Classify using a non-linear SVM with a chi- squared kernel
  • 33. 4 Important Datasets ● KTH Human Actions (2004) ● Hollywood-2 (2009) ● HMDB-51: Human Motion DataBase (2011) ● UCF 101 (2012)
  • 34. KTH Human Actions (2004) ● ‘Simple’/’Controlled’ clips intentionally captured ● Total 2391 clips ● 25 FPS; Avg length of 4 sec (~100 frame clips) ● 160x120 spatial resolution ● Homogeneous background ● Static camera
  • 35. KTH Human Actions (2004) ● 6 action classes; 4 scenarios; 25 actors; ● Homogeneous background; static camera
  • 36. Hollywood-2 (2009) ● 12 action classes; 10 scene classes; total 3669 clips ● ~20.1 hours of video in total ● Clips from 69 Hollywood Movies (different movies for test & train) ● Automatic action annotation (!) ● Manual verification afterwards to clean-up
  • 37. Hollywood-2 (2009) ● Scripts describe with scenes, characters, transcribed dialogs and human action (free online websites..) ● Subtitles have time information but only precise speech ● Align speech sections between subtitles and scripts ● Transfer time information to scene descriptions in scripts
  • 39. A “Text” Action Classifier ● Train a Regularized Perceptron text classifier for each action class ● Assign action labels to scene descriptions ● Does much better than hand-tuned regular-expression matching
  • 41. Hollywood-2 (2009) ● Sample video clips can contain multiple actions (probably true of other datasets too) ● Also gave conditional probability estimates: p(scene|action) and p(action|scene) using the movie scripts for clips not cleaned
  • 42.
  • 43.
  • 44. HMDB (2011) ● 7000 manually annotated clips from YouTube & Movies ● 51 action classes (>= 100 clips each) ● 90+% accuracy on existing popular datasets (KTH, Weizmann etc) ● Interesting experiment to show that HMDB’s action categories mainly differ in motion rather than static poses ● Contrary to UCF-50 & Hollywood2: “solvable” with static information alone
  • 45. ● Shown on Left: i) Hand-waving ii) Drinking iii) Sword Fighting iv) Diving v) Running vi) Kicking ● Large variation in camera viewpoint/motion, cluttered background, position/scale/appear ance of actors
  • 46. UCF 101 (2012) ● “Realistic” action videos taken from YouTube ● Extension of earlier UCF-50 ● Examples: ...Apply Eye Makeup, Archery, Baby Crawling, Blowing Candles, Body Weight Squats, Boxing Punching Bag, Hammering, Handstand Push-ups, Handstand Walking, Walking with a dog, Wall Push-ups… ● ~Twice as big as UCF-50, HMDB-51 ● Authors’ claim: “Most challenging data set to date”; Largest diversity in actions, variations in camera motion, object appearance/pose/scale/viewpoint, cluttered background, illumination conditions.. ● Authors’ Baseline result (w/ standard BOW approach): 43.90%
  • 47. UCF 101 (2012) ● 5 broad categories of actions (101): i) Human-Object Interaction ii) Body-Motion Only iii) Human- Human Interaction iv) Playing Musical Instruments v) Sports ● 25 Groups (4-7 videos): clips with commonalities (similar background, viewpoint etc)
  • 48.
  • 51. Evaluation Action Recognition by Dense Trajectories by Wang, Klaser, Schmid, Cheng-Lin CVPR’11
  • 52. Results on four Datasets
  • 53. Comparing to the state-of- the-art
  • 55. Per-class AP analysis on Hollywood2
  • 56. Varying hyper-parameters of the System ● L (Trajectory Length) ● W (Step Size) ● N (Neighborhood Size) ● n_sigma * n_sigma * n_tau (Grid Structure)
  • 57.
  • 58. Improvements made by IDT: 1. Uses 2 types of features: ○ SURF focuses on blob-type structures, ○ Harris Corner Detector fires on corners edges 2. Remove majority trajectories ○ RANSAC on optical flow,SURF, & Harris ○ Correct for majority homography ○ threshold magnitude of the (u, v)
  • 59. Improvements made by IDT: 1. Uses 2 types of features: ○ SURF focuses on blob-type structures, ○ Harris Corner Detector fires on corners and edges 2. Estimate the camera homography using RANSAC & correct for it ○ Ransac on optical flow & feature transforms of SURF & Harris ○ Correct for majority homography ○ remove majority trajectories by thresholding magnitude of the displacement vectors
  • 60. More Improvements made by IDT: 3. Human detector bounding box ○ mask to remove feature matches inside for when homography 4. Fisher Vector > Bag of Features ○ Uses PCA, Gaussian Mixture model, then classifies by Linear SVM
  • 61. ● ... Thoughts on the paper (!!)
  • 62. ;

Editor's Notes

  1. Put slide scroll back in again
  2. Generally explain what’s going on Slide in between
  3. Specific parameters the paper found best
  4. Specific parameters the paper found best
  5. STATS: Common Objects in Context (COCO). A collection of more than 120 thousand images with descriptions (C5, C40) Flickr 8K. A collection of 8 thousand described images taken from flickr.com. Flickr 30K. A collection of 30 thousand described images taken from flickr.com.
  6. ..
  7. ..
  8. ..
  9. ..
  10. ..
  11. ..
  12. ..
  13. ..
  14. ..
  15. ..
  16. -Each clip was validated by at least two human observers -using Amazon Mechanical Turk, 14 joint locations manually annotated at every frame for 5 randomly selected clips from each of 9 action categories of the UCF Sports dataset. Classifying the features derived from the joint locations at single frames results in a recognition rate above 98% (chance level 11%) -For HMDB, jjoint locations at single frames now reaches only 35% (chance level 10%)
  17. -Argued that 90+% accuracy w/ known methods on existing popular datasets (KTH, Weizmann etc) -Each clip was validated by at least two human observers
  18. ..
  19. ..
  20. ..
  21. ..
  22. ..
  23. ROUGH DESCRIPTION: [9] => 2010 paper (Boston Univ.) An approach for human action recognition that integrates multiple feature channels from several entities such as objects, scenes and people. Formulate the problem in a multiple instance learning (MIL) framework, based on multiple feature channels. And use a discriminative approach. Youtube dataset is also ‘weakly annotated’ in the sense that it’s not known where/when the action in clip occurs (possibly issue with most datasets). [31] => 2010 paper (INRIA in France) Built upon the bag-of-features approach with non-local cues by i) decompose video into region classes & ii) augment local features with region class labels. Regions by: (i) motion-based foreground segmentation, (ii) person detection, (iii) static action detection and (iv) object detection
  24. Better accuracy on 8/11 action classes on YouTube
  25. Better Average-Precision on 8/12 action classes on Hollywood2
  26. => L is the number of frames “tracked” in a trajectory. After L frames, trajectory removed from tracking process =>
  27. Experiments done on validation sets of Youtube/Hollywood2 to see results of varying hyper-parameters