Strategies for Practical Active Learning
Robert Munro, PhD
VP of Machine Learning, CrowdFlower
@WWRob
Open Data Science Conference #ODSC
November 3, 2017
My Background
Disaster Response/Recovery
Stanford PhD: NLP in Health and Disaster Response
Product for NLP at AWS’s Amazon AI
VP of ML at CrowdFlower: Annotation and Human-in-the-Loop ML
What is Active Learning?
Active Learning
What is Active Learning?
• Selecting the optimal data to manually label for Machine Learning
Why is it important?
• The right data can increase accuracy more than the algorithm
Why is it overlooked?
• Active Learning is everywhere in industry, but appears in <5% of academic papers
Active Learning
Selecting the optimal data to manually label for Machine Learning
Often a continuous feedback loop
“Please identify
pictures of cats,
like this one”
“Ok!”
“Are these cats?”
Why is Active Learning Important?
Human resources are limited. What is the right data to focus on?
“Please identify
pictures of cats,
like this one”
“Ok!”
“Are these cats?”
Why is Active Learning Overlooked?
Mentions in ACM papers for AI-related terms (http://dl.acm.org/):
Academia has largely ignored Active Learning
Background:
ImageNet and TensorFlow
ImageNet
~1 million images labeled with 1,000+ categories
The categories are from WordNet, a hierarchy of terms
Source: http://image-net.org/explore
TensorFlow:
an open-source Machine Learning library
We will use a pre-trained Deep Learning model for ImageNet
Deep Learning models for images are networks of ‘layers’, where each layer is a further refinement from raw pixels to the target label
Figure: Matthew Zeiler and Rob Fergus, ZF Net
TensorFlow’s ImageNet model
Example output:
[['canoe', 0.90240431], ['paddle, boat paddle', 0.042475685], ['gondola', 0.0011620093], ['sandbar, sand bar', 0.0011261732], ['snorkel', 0.00047367468]]
Predicting that this image is a ‘canoe’ with 90.2% confidence, a ‘paddle/boat paddle’ with 4.2% confidence, etc.
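As a rough sketch of how output like this is produced, assuming TensorFlow 2.x and a pre-trained InceptionV3 from tf.keras.applications (not the talk's own classify_image script):

```python
import numpy as np
import tensorflow as tf

# Pre-trained ImageNet classifier (downloads weights on first use)
model = tf.keras.applications.InceptionV3(weights="imagenet")

def predict_labels(image_path, top=5):
    # InceptionV3 expects 299x299 RGB input
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(299, 299))
    x = tf.keras.preprocessing.image.img_to_array(img)
    x = tf.keras.applications.inception_v3.preprocess_input(x[np.newaxis, ...])
    probs = model.predict(x, verbose=0)
    # decode_predictions maps class indices back to human-readable ImageNet labels
    top_preds = tf.keras.applications.inception_v3.decode_predictions(probs, top=top)[0]
    return [(label, float(score)) for (_, label, score) in top_preds]

print(predict_labels("canoe.jpg"))  # e.g. [('canoe', 0.90), ('paddle', 0.04), ...]
```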
Active Learning:
What should humans review to add new labels?
Starter code
“Active Learning with TensorFlow and ImageNet”
https://github.com/rmunro/active_learning_imagenet
Starter code and images for using Active Learning to apply ImageNet labels to new sports-related images
Ambiguous items
Example:
the top two predictions have 36.9% and 32.2% confidence
[['volleyball', 0.36908466], ['balance beam, beam', 0.32213417], ['stage', 0.020542733], ['basketball', 0.019910889], ['horizontal bar, high bar', 0.011983166]]
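A minimal sketch of selecting ambiguous items, assuming each image's predictions are stored as a confidence-sorted list of (label, confidence) pairs like the output above:

```python
def most_ambiguous(predictions, n=100):
    # predictions: {image_path: [(label, confidence), ...]} sorted by confidence.
    # The smallest gap between the top two confidences = the most ambiguous item.
    def margin(preds):
        return preds[0][1] - preds[1][1]
    return sorted(predictions, key=lambda path: margin(predictions[path]))[:n]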
Low confidence items
Example:
the top prediction has only 11.2% confidence
[['parachute, chute', 0.11202857], ['geyser', 0.075139046], ['wing', 0.074320331], ['cliff, drop, drop-off', 0.074191555], ['balloon', 0.053766355]]
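Under the same assumed prediction structure, least-confidence selection is one line:

```python
def least_confident(predictions, n=100):
    # The lower the top prediction's score, the more the model is guessing
    return sorted(predictions, key=lambda path: predictions[path][0][1])[:n]
```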
Randomly selected items
Evaluate accuracy on a random set of items
The most valuable items to label are confidently wrong
[['volleyball', 0.80830169], ['rugby ball', 0.029293904], ['bathing cap, swimming cap', 0.020639554], ['soccer ball', 0.020503236], ['bikini, two-piece', 0.011906843]]
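A sketch of drawing the random evaluation set (names here are illustrative, not from the talk's code):

```python
import random

def random_sample(unlabeled_paths, n=100, seed=42):
    # A random sample is the only unbiased way to measure accuracy,
    # and it surfaces items the model gets wrong with high confidence
    return random.Random(seed).sample(list(unlabeled_paths), n)
```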
Advanced Active Learning
Clustering (unsupervised or 2nd-to-last layer)
Select equal numbers from all clusters
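A sketch under assumptions the talk doesn't specify: `embeddings` is an (n_items, dim) array taken from the network's second-to-last layer, and scikit-learn's KMeans does the clustering.

```python
import random
from sklearn.cluster import KMeans

def sample_per_cluster(paths, embeddings, n_clusters=10, per_cluster=10, seed=42):
    # Cluster on the embedding, then draw an equal number from every cluster,
    # so the labeled set covers the data's variety
    cluster_ids = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(embeddings)
    rng = random.Random(seed)
    selected = []
    for c in range(n_clusters):
        members = [p for p, cid in zip(paths, cluster_ids) if cid == c]
        rng.shuffle(members)
        selected.extend(members[:per_cluster])
    return selected
```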
Advanced Active Learning
Using external resources:
e.g. WordNet distance between top predictions
[['lawn mower, mower', 0.44160703], ['crash helmet', 0.18804552], ['vacuum, vacuum cleaner', 0.038397752], ['go-kart', 0.03737054], ['motor scooter, scooter', 0.033097573]]
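A sketch of scoring that distance with NLTK's WordNet interface (an assumption; ImageNet classes are really keyed by WordNet IDs, so the string lookup here is a simplification):

```python
from nltk.corpus import wordnet as wn  # requires the WordNet corpus downloaded

def top_two_wordnet_distance(preds):
    # Look up each label's first synset; e.g. 'lawn mower, mower' -> 'lawn_mower'
    synsets = [wn.synsets(label.split(',')[0].replace(' ', '_'))
               for label, _ in preds[:2]]
    if not all(synsets):
        return None  # a label was not found in WordNet
    similarity = synsets[0][0].path_similarity(synsets[1][0])
    return None if similarity is None else 1.0 - similarity  # higher = further apart
```

Confident predictions that are semantically far apart (a lawn mower vs a crash helmet) suggest genuine model confusion worth a human look.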
Active Learning Exceptions
What if low confidence items are not spread across all classes?
E.g.: Squash or Racquetball?
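If not, two confusable classes can soak up the whole annotation budget. One mitigation, sketched with an illustrative per-class cap over the same assumed prediction structure:

```python
from collections import defaultdict

def least_confident_with_cap(predictions, n=100, per_class_cap=10):
    # Rank by least confidence, but let no single predicted class
    # contribute more than `per_class_cap` items to the batch
    ranked = sorted(predictions, key=lambda path: predictions[path][0][1])
    counts, selected = defaultdict(int), []
    for path in ranked:
        top_label = predictions[path][0][0]
        if counts[top_label] < per_class_cap:
            counts[top_label] += 1
            selected.append(path)
        if len(selected) == n:
            break
    return selected
```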
Active Learning Exceptions
What if you care about some types of labels more than others?
Over-sample the labels you care about.
Use clustering or external resources.
Be careful about introducing bias!
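One possible way to over-sample, sketched under the same assumptions (the 75% share is an illustrative choice, not from the talk):

```python
def prioritized_sample(predictions, priority_labels, n=100, priority_share=0.75):
    # Reserve most of the batch for items predicted as a priority label,
    # but keep some of everything else to limit the bias
    ranked = sorted(predictions, key=lambda path: predictions[path][0][1])
    hits = [p for p in ranked if predictions[p][0][0] in priority_labels]
    rest = [p for p in ranked if predictions[p][0][0] not in priority_labels]
    k = int(n * priority_share)
    return hits[:k] + rest[:n - k]
```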
Interface Design for Annotations
Trade-offs:
• More repetitive work is faster but more error-prone due to boredom
• Less repetitive work is slower but more accurate
Workflows:
• What is the best interface to get unbiased data for evaluation?
• What is the fastest interface to get human verification on confident model predictions?
For starter code on interfaces, see:
https://github.com/rmunro/annotation_imagenet
Getting Human Judgments
In reality, you can choose:
– Crowdsourced workers
– Trained contractors
– Business Process Outsourcers
– Your own in-house annotators
– Some combination of the above
Ensuring Quality Annotations
1. Embed ‘gold’ (known) answers to quiz workers and track accuracy
2. Select the right annotators for the job
3. Give the same job to multiple people, and track agreement
4. Break up complex tasks into simpler ones
5. Remove ordering effects and ‘priming’
6. Subjective task? Use Bayesian truth serum
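A sketch of checks 1 and 3 from the list above; the data structures are assumptions, not the talk's code:

```python
from collections import Counter

def gold_accuracy(annotations, gold):
    # annotations: {item_id: label} from one worker; gold: {item_id: label}
    # of embedded known answers. Returns the worker's accuracy on gold items.
    scored = [item for item in annotations if item in gold]
    if not scored:
        return None
    return sum(annotations[i] == gold[i] for i in scored) / len(scored)

def majority_label(labels):
    # labels: one item's labels from several workers;
    # returns the majority label and the agreement ratio
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)
```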
Getting started
Annotate ~10% of new data randomly. This is your baseline.
Use random, held-out data for accuracy:
micro-F1, macro-F1, ROC, entropy / information gain
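For the F-scores, a sketch using scikit-learn (an assumption; the talk doesn't name a library):

```python
from sklearn.metrics import f1_score

def evaluate(y_true, y_pred):
    # micro-F1 weights every item equally; macro-F1 weights every class equally
    return {
        "micro_f1": f1_score(y_true, y_pred, average="micro"),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
    }
```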
Getting started
Annotate ~90% of new data using ambiguous or low-confidence items. Compare the 10% subset to the baseline.
More accurate? Continue!
Getting started
Annotate ~90% of new data using ambiguous or low-confidence items. Compare the 10% subset to the baseline.
Less accurate? Try more advanced strategies!
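Putting the two steps together, a sketch of one selection batch, reusing the illustrative `random_sample` and `least_confident` helpers from earlier (all assumptions, not the talk's code):

```python
def select_batch(predictions, batch_size=1000):
    # ~10% random (keeps the unbiased baseline growing),
    # ~90% chosen by active learning
    n_random = batch_size // 10
    batch = set(random_sample(predictions.keys(), n=n_random))
    remaining = {p: v for p, v in predictions.items() if p not in batch}
    return list(batch) + least_confident(remaining, n=batch_size - n_random)
```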
Getting started
Does accuracy start to plateau or decline relative to the baseline? You might be biased towards a subset:
Increase the % of randomly selected items
Still not working? Look into clustering and other methods to ensure data variety
Summary
What is Active Learning?
• Selecting the optimal data to manually label for Machine Learning
• You now know how to do this!
Why is it important?
• The right data can increase accuracy more than the algorithm
• Test this for yourself!
Why is it overlooked?
• Active Learning is everywhere in industry, but appears in <5% of academic papers
• Please share your results!
Thank You
