CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Juho Kim, Phu Nguyen, Sarah Weir, Philip J. Guo,
Robert C. Miller, Krzysztof Z. Gajos
Crowdsourcing Step-by-Step
Information Extraction to
Enhance Existing How-to Videos

learning from how-to videos:
limited by video player interfaces

Problem in Watching

It’s difficult to navigate to
specific parts you’re interested in.
find
repeat
skip

How-to Video: Step-by-Step Nature
Apply
gradient map

Completeness & detail of step-by-step instructions are
integral to task performance.
Eiriksdottir and Catrambone, 2011
Proactive & random access, semantic indices in
instructional videos: better task performance and learner
satisfaction
Zhang et al., 2006
Interactivity can help overcome the difficulties of
perception and comprehension. Stopping, starting and
replaying an animation can allow reinspection.
Tversky et al., 2002

Design Insight
Enable step-by-step navigation with high interactivity

ToolScape: Step-aware video player

Interface elements:
•  work-in-progress images
•  parts with no visual progress
•  step labels & links

enhance existing how-to videos with
step-level interactivity & annotation

Research Questions
Does step-by-step navigation help learners?
Preliminary user study
How can we annotate an existing how-to
video with step-by-step information?
Crowdsourcing annotation workflow

Study: Photoshop Design Tasks
12 novice Photoshop users
manually annotated videos

With ToolScape, learners will…

H1. feel more confident about their design
skills.
- self-efficacy gain

H2. believe they produced better designs.
- self-rating on designs produced

H3. actually produce better designs.
- external rating on designs produced

H1. Higher self-efficacy gain with ToolScape
–  Four 7-point Likert scale questions
–  Mann-Whitney U test (Z=2.06, p<0.05), error bar: standard error
–  Self-efficacy gain (axis 0-7): ToolScape 1.4, Baseline 0.1; a value of 3.8 is shown for both conditions

H2. Higher self-rating with ToolScape
–  One 7-point Likert scale question
–  Mann-Whitney U test (Z=2.70, p<0.01), error bar: standard error
–  Self-rating (axis 0-7): ToolScape 5.3, Baseline 3.5

H3. External raters rank ToolScape designs higher.
–  Ranking: lower is better
–  Wilcoxon signed-rank test (W=317, Z=-2.79, p<0.01, r=0.29), error bar: standard error
–  Krippendorff’s alpha = 0.753
–  Rank (axis 0-12): ToolScape 5.7, Baseline 7.3

Non-sequentially navigating video
Step-level navigation: clicked 8.9 times per task
“It is great for skipping straight to relevant
portions of the tutorial.”

“It was also easier to go back to parts I missed.”

Annotations for Step-Aware Video Player
•  step time
•  step label
•  before/after results
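
The three annotation types above map naturally onto a small record per step. The following is a minimal sketch, not ToolScape's actual data model: the `Step` class, its field names, the `current_step` helper, and the sample label "Open the image" are all hypothetical and chosen for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    time: float                          # step time, in seconds from video start
    label: str                           # step label shown in the player
    before_time: Optional[float] = None  # timestamp of the "before" frame
    after_time: Optional[float] = None   # timestamp of the "after" frame


def current_step(steps: List[Step], playhead: float) -> Optional[Step]:
    """Return the latest step starting at or before the playhead, if any.

    Assumes `steps` is sorted by time.
    """
    times = [s.time for s in steps]
    i = bisect_right(times, playhead)
    return steps[i - 1] if i > 0 else None


steps = [
    Step(12.0, "Open the image"),
    Step(47.5, "Apply gradient map"),
]
print(current_step(steps, 50.0).label)
```

A player could call such a lookup on every time update to highlight the active step, and use `Step.time` as the seek target when a step link is clicked.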

Design Goals for Annotation Method
•  domain-independent
•  existing videos
•  untrained annotators

Multi-stage crowdsourcing workflow
Input video
Stage 1. FIND: When & what are the steps?
Stage 2. VERIFY: Vote & improve
Stage 3. EXPAND: Before/after the steps?
Output timeline

Stage 2. VERIFY steps by voting/improving

Quality control for Stage 2
•  Majority voting
•  Breaking ties
– String matching to combine
“similar enough” labels

– Longer string

“grate three cups of cheese” > “grate cheese”
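
The Stage 2 rules above can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the slide does not say which string-matching method is used, so `similar` below substitutes a simple word-overlap test, and the 0.8 threshold and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Assumed 'similar enough' test: share of the shorter label's words
    that also appear in the longer label."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return False
    return len(ta & tb) / min(len(ta), len(tb)) >= threshold


def pick_label(votes: List[str]) -> str:
    """Majority vote; on a tie, merge similar labels, then prefer longer ones."""
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(v.strip() for v in votes)
    ranked = counts.most_common()
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]  # clear majority winner

    # Tie: combine similar labels, keeping the longer string as representative.
    merged: Dict[str, int] = {}
    for label, n in counts.items():
        for rep in list(merged):
            if similar(label, rep):
                keep = max(label, rep, key=len)
                merged[keep] = merged.pop(rep) + n
                break
        else:
            merged[label] = n
    # Most votes wins; remaining ties go to the longer label.
    return max(merged, key=lambda l: (merged[l], len(l)))


print(pick_label(["grate cheese", "grate three cups of cheese", "add salt"]))
```

With one vote each, the two cheese labels are combined and the longer, more specific one is returned, matching the example on the slide.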

Stage 3. EXPAND with
before/after images

Quality control for Stage 3
•  Majority voting
•  Breaking ties:
– Pixel diff to combine
“similar enough” frames

– Choose what’s closer to the step
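
A comparable sketch for Stage 3, again under assumptions: frames are modeled as small grayscale grids, "pixel diff" is taken to be the mean absolute per-pixel difference, and the `max_diff` threshold and all function names are hypothetical. The slide does not specify these details.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # grayscale pixels, 0-255


def pixel_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two same-sized frames."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def pick_frame(candidates: List[Tuple[float, Frame]], votes: List[int],
               step_time: float, max_diff: float = 10.0) -> float:
    """Return the timestamp of the winning candidate frame."""
    if not candidates or len(candidates) != len(votes):
        raise ValueError("need one vote count per candidate")
    top = max(votes)
    winners = [c for c, n in zip(candidates, votes) if n == top]
    if len(winners) == 1:
        return winners[0][0]  # clear majority winner

    # Tie: combine near-identical frames; represent each group by the
    # member whose timestamp is closest to the step.
    groups: List[list] = []  # each entry: [timestamp, frame, votes]
    for (t, frame), n in zip(candidates, votes):
        for g in groups:
            if pixel_diff(frame, g[1]) <= max_diff:
                g[2] += n
                if abs(t - step_time) < abs(g[0] - step_time):
                    g[0], g[1] = t, frame
                break
        else:
            groups.append([t, frame, n])
    # Most votes wins; remaining ties go to the frame closest to the step.
    best = max(groups, key=lambda g: (g[2], -abs(g[0] - step_time)))
    return best[0]


dark = [[0, 0], [0, 0]]
dark2 = [[2, 0], [0, 0]]
bright = [[200, 200], [200, 200]]
shots = [(10.0, dark), (14.0, dark2), (30.0, bright)]
print(pick_frame(shots, [1, 1, 1], step_time=15.0))
```

Here the two dark frames are treated as the same image, their votes are pooled, and the one nearer to the step time is kept.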

Evaluation
•  Generalizable?

75 Photoshop / Cooking / Makeup videos
•  Accurate?
precision and recall
against trained annotators’ labels

Across all domains,
~80% precision and recall

Domain      Precision   Recall
Cooking     0.77        0.84
Makeup      0.74        0.77
Photoshop   0.79        0.79
All         0.77        0.81
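
For readers unfamiliar with the metrics, the sketch below shows one way precision and recall could be computed for step times. It is illustrative only: the slides do not state how crowd steps were matched to the trained annotators' steps, so the greedy one-to-one matching and the `tolerance` parameter are assumptions, and the timestamps are made up.

```python
from typing import List, Tuple


def precision_recall(predicted: List[float], truth: List[float],
                     tolerance: float) -> Tuple[float, float]:
    """Match each predicted step time to at most one ground-truth time
    within `tolerance` seconds, then compute precision and recall."""
    unmatched = sorted(truth)
    hits = 0
    for p in sorted(predicted):
        near = [t for t in unmatched if abs(t - p) <= tolerance]
        if near:
            # Consume the closest ground-truth step so it is matched only once.
            unmatched.remove(min(near, key=lambda t: abs(t - p)))
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


crowd = [5.0, 21.0, 40.0, 70.0]          # hypothetical crowd step times (s)
expert = [4.0, 20.0, 38.0, 55.0, 90.0]   # hypothetical ground-truth times (s)
print(precision_recall(crowd, expert, tolerance=5.0))
```

In this toy case three of four crowd steps match (precision 0.75) and three of five expert steps are recovered (recall 0.6).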

Conceptual Level Differences
•  “Now apply the bronzer to your face
evenly”
•  “Apply the bronzer to the forehead”
•  “Apply the bronzer to the cheekbones”
•  “Apply the bronzer to the jawline”

Timing is 2.7 seconds off on average
Ground truth: one step every 17.3 seconds

Cost: $1.07 per minute of video
• 111 HITs / video (3 workers / task)
• $2.50 / video (Find + Verify)
• $4.85 / video (Find + Verify + Expand)
• $0.32 / step (time + label + before/after)
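
As a rough consistency check, the reported costs imply an average video length and step count. These derived numbers are back-of-the-envelope estimates that assume all figures are averages over the same set of videos; they are not stated on the slide.

```python
cost_per_video = 4.85    # Find + Verify + Expand, from the slide
cost_per_minute = 1.07   # from the slide
cost_per_step = 0.32     # from the slide

# Derived estimates (assumptions, not reported values).
implied_minutes = cost_per_video / cost_per_minute
implied_steps = cost_per_video / cost_per_step

print(f"implied video length: {implied_minutes:.2f} minutes")
print(f"implied steps per video: {implied_steps:.1f}")
```

The result, roughly 4.5 minutes and 15 steps per video, is in the same range as the ground-truth rate of one step every 17.3 seconds.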

Contributions

•  Study: increased interactivity improved
task performance & self-efficacy
•  Crowd video annotation method &
Find-Verify-Expand design pattern
•  Evaluation: fully extracted 75 existing videos
across 3 domains, ~80% precision and recall

Ongoing Work: Beyond low-level steps
hierarchical solution structure extraction
Catrambone, R. The subgoal learning model: Creating better examples so that
students can solve novel problems. Journal of Experimental Psychology: General, 127 (1998).
Learnersourcing: learners as a crowd
•  Motivated, qualified
•  Feedback loop between learners & system

Future of How-to Video Learning
What if we had 1000s of
fully annotated videos?
•  Flexible learning paths with multiple videos
•  Step-level search, recommendation
•  Patterns from multiple solutions

Crowdsourcing Step-by-Step Information Extraction to
Enhance Existing How-to Videos
Juho Kim
MIT CSAIL

juhokim@mit.edu

juhokim.com
Acknowledgement: This work was supported in part by
Quanta Computer & the Samsung Fellowship.
