From Unsupervised to Semi-Supervised Event Detection
Wen-Sheng Chu
Robotics Institute, Carnegie Mellon University
July 9, 2013
A joint presentation of an ECCV'12 paper and a CVPR'13 paper.

Transcript of "From Unsupervised to Semi-Supervised Event Detection"

  1. 1. From Unsupervised to Semi-Supervised Event Detection. Wen-Sheng Chu, Robotics Institute, Carnegie Mellon University. July 9, 2013. With Jeffrey Cohn and Fernando De la Torre. 1
  2. 2. Outline 1. Unsupervised Temporal Commonality Discovery (Chu et al, ECCV’12) 2. Personalized Facial Action Unit Detection (Chu et al, CVPR’13) 2
  3. 3. Unsupervised Commonality Discovery in Images Where are the repeated patterns? 3 (Chu’10, Mukherjee’11, Collins’12)
  4. 4. Unsupervised Commonality Discovery in Videos? • We name it Temporal Commonality Discovery (TCD). • Goal: Given two videos, discover common events in an unsupervised fashion. 4
  5. 5. TCD is hard! 1) No prior knowledge on commonalities – We do not know what, where, or how many commonalities exist in the videos. 2) Exhaustive search is computationally prohibitive – E.g., two videos with 300 frames have >8,000,000,000 possible matches (all possible locations × all possible lengths, per sequence). 5
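The >8,000,000,000 figure on the slide can be reproduced with a back-of-the-envelope count, assuming each subsequence is an independent choice of start frame and length (ignoring that long subsequences cannot start near the end of the video):

```python
# Back-of-the-envelope count of the TCD search space for two 300-frame
# videos: a subsequence is a (start, length) pair, so roughly n * n
# candidates per video, squared over the pair of videos.
n = 300
per_sequence = n * n       # candidate (start, length) subsequences per video
pairs = per_sequence ** 2  # candidate matches across the two videos
print(pairs)               # 8100000000, i.e. > 8 * 10**9 as on the slide
```

This is exactly why the slides turn to branch-and-bound instead of exhaustive search.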
  6. 6. Formulation 6 Integer programming!
  7. 7. Optimization: Interpretation 7
  8. 8. Optimization: Naive Search Complexity 8
  9. 9. Optimization: Branch-and-Bound • Similar to the idea of ESS (Lampert’08), we search the space by splitting intervals. 9
  10. 10. Optimization: Branch-and-Bound • Bounding histogram bins 10
  11. 11. Optimization: Branch-and-Bound 1. Bounding L1 distance: 2. Intersection similarity: 3. χ² distance: 11
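The slide lists three bounding functions, but their formulas did not survive the transcript. As an interval-arithmetic illustration only (not the paper's exact derivation), a lower bound on the L1 distance can be assembled bin by bin when each histogram bin count is only known up to an interval, because the candidate subsequences have not yet been narrowed down to a single one:

```python
# Interval-arithmetic sketch of bounding histogram bins: each bin count is
# boxed by (lo, hi) over a set of candidate subsequences, and the per-bin
# minima of |a - b| sum to a lower bound on the L1 distance.

def l1_lower_bound(bounds1, bounds2):
    """Lower-bound the L1 distance between two histograms whose bin
    counts are only known to lie in (lo, hi) intervals."""
    total = 0.0
    for (lo1, hi1), (lo2, hi2) in zip(bounds1, bounds2):
        # |a - b| over a in [lo1, hi1], b in [lo2, hi2] is minimized at
        # max(0, lo1 - hi2, lo2 - hi1): zero whenever the intervals overlap.
        total += max(0.0, lo1 - hi2, lo2 - hi1)
    return total

# Overlapping bins contribute 0; disjoint bins contribute their gap.
print(l1_lower_bound([(1, 3), (5, 6)], [(2, 4), (0, 2)]))  # 0.0 + 3.0 = 3.0
```

Such a bound is what lets branch-and-bound discard whole sets of candidate subsequence pairs at once.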
  12. 12. Unlikely search regions (B1,E1,B2,E2; -10) Searching Structure (B1,E1,B2,E2; 32) Priority queue (sorted by bound scores) … (B1,E1,B2,E2; -50) (B1,E1,B2,E2; -105) State S = (Rectangle set; score) 12
  13. 13. (B1,E1,B2,E2; -105) Algorithm (B1,E1,B2,E2; 32) Priority queue (sorted by bound scores) … (B1,E1,B2,E2; -50) (B1,E1,B2,E2; -105) Top state 1. Pop out the top state 2. Split 13
  14. 14. (B1,E1,B2,E2; -105) Algorithm (B1,E1,B2,E2; 32) Priority queue (sorted by bound scores) … (B1,E1,B2,E2; -50) Top state (B1,E’1,B2,E2; -76) (B1,E’’1,B2,E2; -61) 3. Compute bounding scores 4. Push back the split states 14
  15. 15. Algorithm (B1,E1,B2,E2; 32) Priority queue (sorted by bound scores) … (B1,E1,B2,E2; -50) Top state (B1,E’1,B2,E2; -76) (B1,E’’1,B2,E2; -61) • The algorithm stops when the top state contains a unique rectangle, omitting most of the search space with large distances. 15
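The four steps on slides 13–15 (pop the top state, split, compute bounding scores, push back, stop at a unique state) can be sketched as a generic best-first branch-and-bound; the toy objective and lower bound below stand in for TCD's interval states and bounding functions:

```python
# Minimal best-first branch-and-bound with a priority queue, mirroring
# the slides' loop. Toy problem: find the integer x in [0, 1023] that
# minimizes f(x); an admissible lower bound plays the role of the TCD
# bounding functions.
import heapq

def f(x):
    return (x - 700) ** 2

def lower_bound(lo, hi):
    # f is quadratic with its minimum at 700, so its smallest value on
    # [lo, hi] is attained at the interval point closest to 700.
    return f(min(max(700, lo), hi))

def branch_and_bound(lo=0, hi=1023):
    queue = [(lower_bound(lo, hi), lo, hi)]      # states sorted by bound score
    while queue:
        bound, lo, hi = heapq.heappop(queue)     # 1. pop the top state
        if lo == hi:                             # top state holds a unique point
            return lo, bound
        mid = (lo + hi) // 2                     # 2. split the interval
        for child in ((lo, mid), (mid + 1, hi)): # 3. bound both halves
            heapq.heappush(queue, (lower_bound(*child),) + child)  # 4. push back
    return None

print(branch_and_bound())  # (700, 0)
```

Because the bound never overestimates, regions with large distances stay buried in the queue and are never expanded.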
  16. 16. Compare with Relevant Work 1. Difference between TCD and ESS [1]/STBB [2] – Different learning framework: • Unsupervised vs. supervised – New bounding functions for TCD 2. Difference between TCD and [3] – Different objective: • Commonality discovery vs. temporal clustering [1] “Efficient subwindow search: A branch and bound framework for object localization”, PAMI 2009. [2] “Discriminative video pattern search for efficient action detection”, PAMI 2011. 16
  17. 17. Experiment (1): Synthesized Sequence Histograms of the discovered pair of subsequences 17
  18. 18. Experiment (2): Discover Common Facial Actions • RU-FACS dataset* – Interview videos with 29 subjects – 5000~8000 frames/video – Collect 100 segments containing smiling mouths (AU-12) – Evaluate in terms of average precision 18 * “Automatic recognition of facial actions in spontaneous expressions”, Journal of Multimedia 2006.
  19. 19. Experiment (2): Discover Common Facial Actions 19
  20. 20. Experiment (2): Speed Evaluation • Parametric settings for Sliding Windows (SW) • Speed: #evaluations of the distance function • Log of #evaluations: log(n_SW / n_TCD) • Quality of discovered patterns: d(r_i^SW) − d(r^TCD) 20
  21. 21. Experiment (2): Discover Common Facial Actions • Compare with LCCS* on -distance 21 * “Frame-level temporal calibration of unsynchronized cameras by using Longest Consecutive Common Subsequence”, ICASSP 2009.
  22. 22. Experiment (3): Discover Multiple Common Human Motions • CMU-Mocap dataset: – http://mocap.cs.cmu.edu/ • 15 sequences from Subject 86 • 1200~2600 frames and up to 10 actions/seq • Exclude the comparison with SW because it needs >10^12 evaluations 22
  23. 23. Experiment (3): Discover Multiple Common Human Motions 23
  24. 24. Experiment (3): Discover Multiple Common Human Motions • Compare with LCCS* on -distance 24
  25. 25. Extension: Video Indexing • Goal: Given a query , find the best common subsequence in the target video • A straightforward extension: Temporal Search Space 25
  26. 26. A Prototype for Video Indexing 26
  27. 27. Summary 27
  28. 28. Questions? [1] “Common Visual Pattern Discovery via Spatially Coherent Correspondences,” In CVPR 2010. [2] “MOMI-cosegmentation: simultaneous segmentation of multiple objects among multiple images,” In ACCV 2010. [3] “Scale invariant cosegmentation for image groups,” In CVPR 2011. [4] “Random walks based multi-image segmentation: Quasiconvexity results and GPU-based solutions,” In CVPR 2012. [5] “Frame-level temporal calibration of unsynchronized cameras by using Longest Consecutive Common Subsequence,” In ICASSP 2009. [6] “Efficient ESS with submodular score functions,” In CVPR 2011. 28 http://humansensing.cs.cmu.edu/wschu/
  29. 29. Outline 1. Unsupervised Temporal Commonality Discovery (Chu et al, ECCV’12) 2. Selective Transfer Machine for Personalized Facial Action Unit Detection (Chu et al, CVPR’13) 29
  30. 30. AU 6+12 Facial Action Units (AU) 30
  31. 31. Main Idea 31
  32. 32. Related Work: Features 32
  33. 33. Related Work: Classifiers 33
  34. 34. Feature Bias Person specific! 34
  35. 35. Occurrence Bias 35
  36. 36. Selective Transfer Machine (STM) Formulation Maximizes margin of penalized SVM Minimize distribution mismatch 36
  37. 37. Goal (1): Maximize penalized SVM margin margin penalized loss 37
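Goal (1)'s penalized hinge loss amounts to scaling each training example's loss by its selection weight. A minimal sketch with scikit-learn, whose `sample_weight` argument exposes exactly that knob (the weights below are made-up stand-ins, not values STM would learn):

```python
# Weighted-SVM sketch of Goal (1): per-example weights scale the hinge
# loss during training. Weights are hypothetical selection coefficients.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w = np.array([1.0, 0.1, 0.1, 1.0])  # hypothetical selection weights

clf = SVC(kernel="linear")
clf.fit(X, y, sample_weight=w)      # penalized (reweighted) hinge loss
print(clf.predict([[2.5]])[0])      # 1
```

Down-weighting examples that look unlike the test person is what "personalizes" the classifier without needing that person's labels.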
  38. 38. Goal (2): Minimize Distribution Mismatch • Kernel Mean Matching (KMM)* 38 * “Covariate shift by kernel mean matching”, Dataset shift in machine learning, 2009.
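The KMM idea can be illustrated in its simplest, linear-kernel form: choose nonnegative training weights so the weighted training mean matches the test mean in feature space. KMM proper solves a QP with box constraints in an RKHS; the NNLS sketch below keeps only the core moment-matching idea and is not the paper's formulation:

```python
# Linear-kernel illustration of Kernel Mean Matching via nonnegative
# least squares: reweight a generic training pool so its mean matches a
# shifted "test person" distribution.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))  # generic training pool
X_test = rng.normal(2.0, 1.0, size=(50, 2))    # shifted target-person data

n = len(X_train)
# Solve min_w ||(1/n) X_train^T w - mean(X_test)||^2  subject to  w >= 0
w, _ = nnls(X_train.T / n, X_test.mean(axis=0))

mismatch = np.linalg.norm(X_train.T @ w / n - X_test.mean(axis=0))
print(mismatch)  # essentially zero: the reweighted mean matches the test mean
```

Training examples far from the test distribution end up with near-zero weight, which is the "selection" in Selective Transfer Machine.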
  39. 39. Goal (2): Minimize Distribution Mismatch Groundtruth Bad estimator for testing data! 39
  40. 40. Goal (2): Minimize Distribution Mismatch • Selection by reweighting training data • Better fitting (vs. groundtruth)! 40
  41. 41. 41
  42. 42. Optimization: Alternate Convex Search 42
  43. 43. Optimization: Alternate Convex Search 43
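Alternate Convex Search applies because STM's objective is biconvex: convex in the classifier for fixed weights, and convex in the weights for fixed classifier. A toy biconvex objective shows the scheme, with each half-step a closed-form convex solve (the function and constants are invented for illustration):

```python
# Sketch of Alternate Convex Search on a toy biconvex objective
# f(x, y) = (x*y - 1)^2 + 0.1*(x^2 + y^2): quadratic (hence convex) in x
# for fixed y and vice versa, so each half-step minimizes exactly.

def f(x, y):
    return (x * y - 1) ** 2 + 0.1 * (x ** 2 + y ** 2)

def acs(y=0.5, iters=100):
    for _ in range(iters):
        x = y / (y ** 2 + 0.1)   # argmin_x f(x, y) with y fixed
        y = x / (x ** 2 + 0.1)   # argmin_y f(x, y) with x fixed
    return x, y

x, y = acs()
print(round(f(x, y), 4))  # 0.19, the value at the stationary point x = y
```

Each half-step can only decrease the objective, so the alternation converges to a partial optimum; the same monotone-descent argument is what justifies alternating between the SVM solve and the weight update in STM.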
  44. 44. Compare with Relevant Work 44 [1] "Covariate shift by kernel mean matching," Dataset shift in machine learning, 2009. [2] "Transductive inference for text classification using support vector machines," In ICML 1999. [3] "Domain adaptation problems: A DASVM classification technique and a circular validation strategy," PAMI 2010.
  45. 45. Experiments • Features – SIFT descriptors on 49 facial landmarks – Preserve 98% energy using PCA 45

     Datasets     #Subjects   #Videos   #Frm/vid    Content
     CK+          123         593       ~20         Neutral→Peak
     GEMEP-FERA   7           87        20~60       Acting
     RU-FACS      29          29        5000~7500   Interview
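The "preserve 98% energy using PCA" step maps directly onto scikit-learn, where a fractional `n_components` keeps just enough components to explain that share of the variance (random data stands in for the real SIFT descriptors here):

```python
# PCA keeping 98% of the energy, as in the slide's feature pipeline.
# Random data is a stand-in for SIFT descriptors at facial landmarks.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
sift = rng.normal(size=(500, 128))   # stand-in: 500 frames x 128-D descriptors

pca = PCA(n_components=0.98)         # keep enough components for 98% variance
reduced = pca.fit_transform(sift)

print(reduced.shape)                 # (500, k) with k <= 128
```

Fixing an energy fraction rather than a component count keeps the dimensionality choice consistent across datasets with different feature statistics.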
  46. 46. Experiment (1): Synthetic Data 46
  47. 47. Experiment (2): Comparison with Person-specific (PS) Classifiers • Two protocols – PS1: train/test are separate data of the same subject – PS2: training subjects include the test subject (same protocol as in [2]) • GEMEP-FERA 47
  48. 48. Experiment (2): Selection Ability of STM 48
  49. 49. Experiment (3): CK+ • 123 subjects, 593 videos, ~20 frames/video 49
  50. 50. Experiment (4): GEMEP-FERA 50 • 7 subjects, 87 videos, 20~60 frames/video
  51. 51. Experiment (5): RU-FACS • 29 subjects, 29 videos, 5000~7000 frames/video 51
  52. 52. Summary • Person-specific biases exist among face-related problems, especially facial expression • We propose to alleviate the biases by personalizing classifiers using STM • Next – Joint optimization in terms of – Reduce the memory cost using SMO – Explore more potential biases in face problems, e.g., occurrence bias 52
  53. 53. Questions? [1] “Covariate shift by kernel mean matching,” Dataset shift in machine learning, 2009. [2] “Transductive inference for text classification using support vector machines,” In ICML 1999. [3] “Domain adaptation problems: A DASVM classification technique and a circular validation strategy,” PAMI 2010. [4] “Integrating structured biological data by kernel maximum mean discrepancy,” Bioinformatics 2006. [5] “Meta-analysis of the first facial expression recognition challenge,” IEEE Trans. on Systems, Man, and Cybernetics, Part B, 2012. 53 http://humansensing.cs.cmu.edu/wschu/