CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Millions of learners today use how-to videos to master new skills in a variety of domains. But browsing such videos is often tedious and inefficient because video player interfaces are not optimized for the unique step-by-step structure of such videos. This research aims to improve the learning experience of existing how-to videos with step-by-step annotations.

We first performed a formative study to verify that annotations are actually useful to learners. We created ToolScape, an interactive video player that displays step descriptions and intermediate result thumbnails in the video timeline. Learners in our study performed better and gained more self-efficacy using ToolScape versus a traditional video player.

To add the needed step annotations to existing how-to videos at scale, we introduce a novel crowdsourcing workflow. It extracts step-by-step structure from an existing video, including step times, descriptions, and before and after images. We introduce the Find-Verify-Expand design pattern for temporal and visual annotation, which applies clustering, text processing, and visual analysis algorithms to merge crowd output. The workflow does not rely on domain-specific customization, works on top of existing videos, and recruits untrained crowd workers. We evaluated the workflow on Mechanical Turk, using 75 cooking, makeup, and Photoshop videos from YouTube. Results show that our workflow can extract steps with a quality comparable to that of trained annotators across all three domains, achieving 77% precision and 81% recall.

Transcript:

  1. Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos. Juho Kim, Phu Nguyen, Sarah Weir, Philip J. Guo, Robert C. Miller, Krzysztof Z. Gajos
  2. how-to videos online
  3. learning from how-to videos: limited by video player interfaces
  4. Watching Example
  5. Problem in Watching: It's difficult to navigate to specific parts you're interested in.
  6. Problem in Watching: It's difficult to navigate to specific parts you're interested in (find, repeat, skip).
  7. How-to Video: Step-by-Step Nature (example step: "Apply gradient map")
  8. Completeness & detail of step-by-step instructions are integral to task performance (Eiriksdottir and Catrambone, 2011). Proactive & random access and semantic indices in instructional videos lead to better task performance and learner satisfaction (Zhang et al., 2006). Interactivity can help overcome the difficulties of perception and comprehension; stopping, starting and replaying an animation can allow reinspection (Tversky et al., 2002).
  9. Design Insight: Enable step-by-step navigation with high interactivity.
  10. ToolScape: Step-aware video player
  11. The ToolScape timeline: work-in-progress images, parts with no visual progress, step labels & links.
  12. Enhance existing how-to videos with step-level interactivity & annotation.
  13.–14. Research Questions: Does step-by-step navigation help learners? (preliminary user study) How can we annotate an existing how-to video with step-by-step information? (crowdsourcing annotation workflow)
  15. Study: Photoshop Design Tasks. 12 novice Photoshop users; manually annotated videos.
  16. Baseline vs. ToolScape (the two study conditions)
  17. With ToolScape, learners will… H1: feel more confident about their design skills (self-efficacy gain). H2: believe they produced better designs (self-rating on designs produced). H3: actually produce better designs (external rating on designs produced).
  18. H1: Higher self-efficacy gain with ToolScape. Four 7-point Likert-scale questions; Mann–Whitney U test (Z=2.06, p<0.05). [Bar chart (0–7 scale, error bars = standard error): self-efficacy gain of 1.4 with ToolScape vs. 0.1 with Baseline]
  19. H2: Higher self-rating with ToolScape. One 7-point Likert-scale question; Mann–Whitney U test (Z=2.70, p<0.01). [Bar chart (0–7 scale, error bars = standard error): 5.3 with ToolScape vs. 3.5 with Baseline]
  20. H3: External raters rank ToolScape designs higher (lower rank is better). Wilcoxon signed-rank test (W=317, Z=-2.79, p<0.01, r=0.29); Krippendorff's alpha = 0.753. [Bar chart (error bars = standard error): mean rank 5.7 with ToolScape vs. 7.3 with Baseline]
  21. Non-sequential video navigation: step-level navigation was clicked 8.9 times per task. "It is great for skipping straight to relevant portions of the tutorial." "It was also easier to go back to parts I missed."
  22. Research Questions (revisited): Does step-by-step navigation help learners? (preliminary user study) How can we annotate an existing how-to video with step-by-step information? (crowdsourcing annotation workflow)
  23. Annotations for a step-aware video player: step time, step label, before/after results.
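Concretely, each step on the timeline carries those three pieces of metadata. A minimal sketch of one annotation record, assuming a schema of my own devising (field names are not from the paper):

```python
from dataclasses import dataclass

@dataclass
class StepAnnotation:
    """One step extracted from a how-to video (hypothetical schema)."""
    time_sec: float      # when the step occurs in the video
    label: str           # short description, e.g. "Apply gradient map"
    before_sec: float    # timestamp of a frame showing the state before the step
    after_sec: float     # timestamp of a frame showing the result after the step
```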
  24. Design Goals for the Annotation Method: domain-independent; works on existing videos; usable by untrained annotators.
  25. Crowdsourcing
  26. Multi-stage crowdsourcing workflow: FIND (when & what are the steps?), VERIFY (vote & improve), EXPAND (before/after the steps?).
  27.–31. Workflow walkthrough: Input video → FIND → VERIFY → EXPAND → Output timeline.
  32. Stage 1. FIND candidate steps
  33. Labeling a step
  34. Time-based Clustering
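The talk doesn't spell out the clustering algorithm, but the idea is that timestamps submitted by different workers for the same step land close together and can be merged into one candidate step. A minimal sketch, assuming a simple gap threshold (the 5-second value is arbitrary, not the paper's):

```python
def cluster_times(times, max_gap=5.0):
    """Group timestamps (seconds) into clusters of nearby candidates.

    A timestamp within max_gap seconds of the previous one joins its cluster;
    each cluster is summarized by its mean, treated as one candidate step.
    """
    clusters = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return [sum(c) / len(c) for c in clusters]

# e.g. three workers marking roughly the same two steps:
print(cluster_times([12.0, 13.5, 14.2, 40.1, 41.0]))  # -> two steps, ~13.2s and ~40.6s
```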
  35. Stage 2. VERIFY steps by voting/improving
  36. Quality control for Stage 2: majority voting; ties broken by string matching to combine "similar enough" labels, keeping the longer string ("grate three cups of cheese" beats "grate cheese").
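A hedged sketch of that rule: majority voting over labels, with near-duplicate labels pooled by string similarity and the longer, more detailed one kept. The use of difflib and the 0.6 cutoff are my assumptions, not the paper's implementation:

```python
from collections import Counter
from difflib import SequenceMatcher

def pick_label(votes, similarity=0.6):
    """Majority-vote a step label, merging 'similar enough' strings.

    Votes for near-duplicate labels are pooled under the longer string,
    so detail is preserved ('grate three cups of cheese' > 'grate cheese').
    """
    merged = Counter()
    for label, count in Counter(votes).items():
        for kept in list(merged):
            if SequenceMatcher(None, label.lower(), kept.lower()).ratio() >= similarity:
                longer = max(label, kept, key=len)
                merged[longer] = merged.pop(kept) + count
                break
        else:
            merged[label] += count
    return merged.most_common(1)[0][0]

print(pick_label(["grate cheese",
                  "grate three cups of cheese",
                  "add salt"]))
# -> 'grate three cups of cheese' (two pooled votes beat one for 'add salt')
```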
  37. Stage 3. EXPAND with before/after images
  38. Quality control for Stage 3: majority voting; ties broken by pixel diff to combine "similar enough" frames, choosing the frame closer to the step.
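The same pattern applied to frames, sketched with numpy; the pixel-diff threshold is an assumption, and frames are assumed to share one resolution:

```python
import numpy as np

def mean_pixel_diff(frame_a, frame_b):
    """Mean absolute per-pixel difference between two same-sized H x W x 3 arrays."""
    return float(np.mean(np.abs(frame_a.astype(int) - frame_b.astype(int))))

def pick_frame(candidates, step_time, similar_thresh=10.0):
    """Pick a before/after frame from worker-submitted (timestamp, frame) pairs.

    Frames whose pixel diff falls under similar_thresh count as the same choice;
    the largest group wins, and within it the frame closest in time to the step
    is chosen, mirroring the talk's tie-breaking rule.
    """
    groups = []
    for ts, frame in candidates:
        for group in groups:
            # compare against the group's first frame as its representative
            if mean_pixel_diff(frame, group[0][1]) < similar_thresh:
                group.append((ts, frame))
                break
        else:
            groups.append([(ts, frame)])
    best_group = max(groups, key=len)  # majority vote over visually-similar groups
    return min(best_group, key=lambda tf: abs(tf[0] - step_time))
```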
  39. Evaluation. Generalizable? 75 Photoshop / cooking / makeup videos. Accurate? Precision and recall against trained annotators' labels.
  40. Across all domains, ~80% precision and recall.

        Domain     Precision  Recall
        Cooking    0.77       0.84
        Makeup     0.74       0.77
        Photoshop  0.79       0.79
        All        0.77       0.81
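For slide 40's numbers, an extracted step presumably counts as correct when it lines up with a step marked by the trained annotators. A sketch of such scoring, assuming one-to-one matching within a tolerance window (the 5-second tolerance is my guess, informed by slide 42's 2.7-second average offset):

```python
def precision_recall(extracted, ground_truth, tolerance=5.0):
    """Score crowd-extracted step times (seconds) against trained annotators'.

    Each extracted step may match at most one unused ground-truth step within
    `tolerance` seconds. Returns (precision, recall); assumes non-empty inputs.
    """
    unused = sorted(ground_truth)
    matches = 0
    for t in sorted(extracted):
        hit = next((g for g in unused if abs(g - t) <= tolerance), None)
        if hit is not None:
            unused.remove(hit)
            matches += 1
    return matches / len(extracted), matches / len(ground_truth)

print(precision_recall([10, 31, 55, 90], [12, 29, 60]))  # -> (0.75, 1.0)
```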
  41. Conceptual-level differences: "Now apply the bronzer to your face evenly" vs. "Apply the bronzer to the forehead", "Apply the bronzer to the cheekbones", "Apply the bronzer to the jawline".
  42. Timing is 2.7 seconds off on average, while the ground truth has one step every 17.3 seconds, so the average error is small relative to the gap between steps.
  43. Cost: $1.07 per minute of video. 111 HITs per video (3 workers per task); $2.50 per video (Find + Verify); $4.85 per video (Find + Verify + Expand); $0.32 per step (time + label + before/after).
  44. Contributions: a study showing increased interactivity improved task performance & self-efficacy; a crowd video annotation method & the Find-Verify-Expand design pattern; an evaluation that fully extracted 75 existing videos across 3 domains at ~80% accuracy.
  45. Ongoing Work: Beyond low-level steps, toward hierarchical solution structure extraction. (Catrambone, R. The subgoal learning model: Creating better examples so that students can solve novel problems. Journal of Experimental Psychology: General, 127 (1998).)
  46. Ongoing Work: Beyond low-level steps (hierarchical solution structure extraction). Learnersourcing: learners as a crowd; motivated and qualified, with a feedback loop between learners & the system.
  47. Future of How-to Video Learning. What if we had 1000s of fully annotated videos? Flexible learning paths with multiple videos; step-level search and recommendation; patterns from multiple solutions.
  48. Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos. Juho Kim, MIT CSAIL. juhokim@mit.edu, juhokim.com. Acknowledgement: This work was supported in part by Quanta Computer & the Samsung Fellowship.
