Active learning

Ragib Ahsan
Committee
Prof. Xinhua Zhang (Chair)
Prof. Brian Ziebart
Prof. Jon A Solworth

Overview
● What is active learning?
● Does active learning make any difference?
● Active learning from multiple oracles
● Active learning with weak and strong oracle
● Multiple oracles with varying expertise

What is Active Learning?
● Introduced in education by the 1990s
● Let students participate actively
● Doing things rather than just listening
● Inspired machine learning
● Also known as Query Learning

Contrast to passive learning
[Figure: passive learning vs. active learning]

Applications
● Settings where labeled data are scarce or costly
● Speech Recognition
○ Word-level annotation can take ten times longer than the actual audio (Zhu, 2005)
● Medical Diagnosis
○ Labels require expert doctors
● Document Classification

Active Learning Examples
Pool-based active learning (Settles, 2009)

Active Learning Examples
(a) Toy dataset of two Gaussians; (b) a logistic regression model reaches 70% accuracy; (c) logistic regression with active querying reaches 90% accuracy (Settles, 2009)
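The pool-based loop behind this example can be sketched roughly as follows. This is a minimal illustration, not the setup from Settles (2009): the gradient-descent logistic regression, the function names (`fit_logistic`, `most_uncertain`, `active_learn`), and the choice of "probability closest to 0.5" as the query rule are all assumptions made for the sketch.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(model, x):
    # model = [w1, w2, b] for a 2-D input x = (x1, x2).
    w1, w2, b = model
    return sigmoid(w1 * x[0] + w2 * x[1] + b)

def fit_logistic(xs, ys, steps=300, lr=0.5):
    # Plain batch gradient descent on the logistic loss (hypothetical helper).
    model = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in zip(xs, ys):
            err = predict_proba(model, x) - y
            grad[0] += err * x[0]
            grad[1] += err * x[1]
            grad[2] += err
        model = [m - lr * g / len(xs) for m, g in zip(model, grad)]
    return model

def most_uncertain(model, pool, candidates):
    # Uncertainty sampling: pick the candidate whose predicted
    # probability is closest to 0.5.
    return min(candidates, key=lambda i: abs(predict_proba(model, pool[i]) - 0.5))

def active_learn(pool, oracle, seed, budget):
    # Start from a few labeled seed indices, then query one point per round.
    labels = {i: oracle(pool[i]) for i in seed}
    for _ in range(budget):
        idx = list(labels)
        model = fit_logistic([pool[i] for i in idx], [labels[i] for i in idx])
        candidates = [i for i in range(len(pool)) if i not in labels]
        query = most_uncertain(model, pool, candidates)
        labels[query] = oracle(pool[query])
    idx = list(labels)
    final = fit_logistic([pool[i] for i in idx], [labels[i] for i in idx])
    return final, labels
```

A passive baseline would replace `most_uncertain` with a uniformly random choice from `candidates`; comparing the two on the same pool gives the kind of contrast shown in panels (b) and (c).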

Human Active Learning
[Source: JSLHR]

Does AL make any difference?
“Learners do benefit from the opportunity to actively select examples during learning. But it is very difficult to assess the magnitude of difference that active learning makes compared to passive learning”
Laughlin (1973)
There were conflicting claims throughout the literature on the effectiveness of active learning.

“People make inappropriate queries to assess simple logical hypotheses such as if p then q (frequently examining q instances to see if they are p, and failing to explore not-q instances)”
Wason et al. (1972)
“If the learning task is properly construed, humans actually do a great job in asking questions”
Gigerenzer et al. (2002)
Oaksford et al. (2007)

Castro et al. (2008) addressed these questions:
[Q1] Do humans perform better when they can select their own examples for labeling,
compared to passive observation of labeled examples?
[Q2] If so, do they achieve the full benefit of active learning suggested by statistical
learning theory?
[Q3] If they do not, can machine learning be used to enhance human performance?
[Q4] Do the answers to these questions vary depending upon the difficulty of the
learning problem?

Task Formulation
● Binary classification on the interval [0, 1]
● Unknown decision boundary
● Classes 0 and 1
● n samples
● X_i ∈ [0, 1], Y_i ∈ {0, 1}
● Y_i is correct with probability 1 − ε
● 0 ≤ ε < 1/2
[Source: Castro et al. (2008)]
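To make the noise model concrete, the following sketch simulates an oracle whose label is correct with probability 1 − ε. The boundary name `theta`, the convention that points at or above the boundary belong to class 1, and both function names are assumptions for illustration; the slides do not fix them.

```python
import random

def noisy_label(x, theta, eps, rng):
    # True label: 1 at or above the boundary, 0 below (assumed convention).
    true_label = 1 if x >= theta else 0
    # Flip the label with probability eps, so it is correct with probability 1 - eps.
    return 1 - true_label if rng.random() < eps else true_label

def empirical_flip_rate(theta, eps, n, seed=0):
    # Fraction of n uniformly drawn points whose returned label is wrong.
    rng = random.Random(seed)
    flips = 0
    for _ in range(n):
        x = rng.random()
        if noisy_label(x, theta, eps, rng) != (1 if x >= theta else 0):
            flips += 1
    return flips / n
```

With `eps = 0` the oracle is noiseless, which is the case covered by the first error bound; any `eps` below 1/2 keeps the majority of labels correct.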

Error bound (ε = 0)
● Passive Learning
○ Random sampling
○ Error: O(1/n)
● Active Learning
○ Binary search
○ Error: O(2^{-n})
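The gap between the two rates can be seen in a small noiseless simulation. This is a sketch under assumptions: the boundary is called `theta`, points at or above it are labeled 1, and the passive learner estimates the boundary as the midpoint between the closest labeled points on either side.

```python
import random

def passive_estimate(theta, n, rng):
    # Label n uniformly random points, then place the boundary midway between
    # the largest point labeled 0 and the smallest point labeled 1.
    xs = [rng.random() for _ in range(n)]
    left = max((x for x in xs if x < theta), default=0.0)
    right = min((x for x in xs if x >= theta), default=1.0)
    return (left + right) / 2

def active_estimate(theta, n):
    # Binary search: each query halves the interval that contains the boundary,
    # so after n queries the interval has width 2^{-n}.
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        if mid >= theta:   # oracle answers "class 1"
            hi = mid
        else:              # oracle answers "class 0"
            lo = mid
    return (lo + hi) / 2
```

After n queries the binary-search estimate is within 2^{-(n+1)} of the boundary, whereas the gap left by random sampling shrinks only on the order of 1/n.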

Error bound (ε > 0)
● Passive learning
● Active learning
[ Maximum Likelihood Estimate ]

Experiment
A few 3D visual stimuli and their X values used in the experiment.
Participants were asked to guess the decision boundary after every three iterations.

Experiment
● Random
○ No queries
● Human Active
○ Active queries
● Machine Yoked
○ Machine makes query
○ Human observes

Results
[Figure; x-axis: iteration n. Source: Castro et al. (2008)]

Answers
[Q1] Do humans perform better when they can select their own examples for labeling, compared to passive observation of labeled examples?
Answer: Yes, at low noise levels.
[Q2] If so, do they achieve the full benefit of active learning suggested by statistical learning theory?
Answer: No, humans show slower decay constants.
[Q3] If they do not, can machine learning be used to enhance human performance?
Answer: Inconclusive.
[Q4] Do the answers to these questions vary depending upon the difficulty of the learning problem?
Answer: Yes, they vary with the noise level.

Conclusion
● Simple learning task
● Machine Yoked Learning
● Impact on:
○ Fields of psychology and cognitive sciences
○ Intelligent tutoring systems

Multiple Oracle: Challenges
● How to select the most informative query?
● How to select the best oracle to ask questions?
● How to deal with disagreement among the oracles?
● How to deal with a noisy or weak oracle?

Weak and strong labeler
● Zhang and Chaudhuri (2015) considered exactly two oracles
● One standard oracle
○ Accurate but costly
● One weak oracle
○ Noisy but cheap
● Goal
○ Reduce number of queries to standard oracle
○ No impact on accuracy

Observations
● Difference classifier to predict disagreement between the strong and weak labelers
○ Might not be statistically consistent
○ Can use a cost-sensitive difference classifier
● Active learning queries a localized region of space
○ Train the difference classifier on that localized region

Disagreement Based Active Learning (DBAL)
[Figure: version space V_t containing hypotheses h_1, ..., h_7 and h*; input space X containing points x_1, ..., x_8]
h_1(x_1) = h_2(x_1) = h_3(x_1) = h_4(x_1) = h_5(x_1)
h_1(x_3) ≠ h_2(x_3) = h_3(x_3) = h_4(x_3) = h_5(x_3)
h_1(x_4) = h_2(x_4) = h_3(x_4) = h_4(x_4) = h_5(x_4)
The hypotheses disagree only on x_3, so query x_3 from the oracle O and update V_t.
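The query rule illustrated above can be sketched for a finite hypothesis class. The threshold classifiers, the realizable (noise-free) oracle, and the function name `dbal` are assumptions chosen to keep the example small; they are not taken from the slides.

```python
def dbal(points, oracle, thresholds):
    # Hypotheses are threshold classifiers h_t(x) = 1 if x >= t else 0.
    # The version space holds every threshold consistent with the labels so far.
    version_space = list(thresholds)
    queries = 0
    for x in points:
        predictions = {int(x >= t) for t in version_space}
        if len(predictions) > 1:
            # x lies in the disagreement region: ask the oracle and prune.
            y = oracle(x)
            queries += 1
            version_space = [t for t in version_space if int(x >= t) == y]
        # Otherwise every surviving hypothesis agrees on x, so no query is needed.
    return version_space, queries

if __name__ == "__main__":
    thresholds = [i / 10 for i in range(11)]
    stream = [0.05, 0.95, 0.45, 0.55, 0.52, 0.2, 0.8]
    print(dbal(stream, lambda x: int(x >= 0.5), thresholds))
```

In this run only the first four points fall in the disagreement region; the remaining three are labeled for free because all surviving hypotheses agree on them.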

Problem Formulation
● Unlabeled Distribution, U
● Input space, X
● Label space, Y
● Hypothesis class, H
● Data distribution, D
● Excess error
● Goal: learn a classifier with small excess error using as few queries to O as possible
● Strong oracle, O
● Weak oracle, W

Algorithm
● Three key ideas
○ Difference classifier
○ Disagreement region DIS(V)
■ Region of the input space where two member classifiers disagree
○ Epoch-based agnostic CAL
■ Train a fresh difference classifier in each epoch
[Source: Theory of Active Learning (Steve Hanneke, 2014)]

Algorithm
● Initialize error ε_0 and total number of epochs k_0, and draw n_0 examples to form the labeled dataset S_0
● In each iteration k, up to k' iterations:
○ Set target error
○ Draw n_k unlabeled samples
○ Identify disagreement region A_k
○ Train difference classifier h_df with A_k, O, W
○ Active learning using h_df
■ Draw m_k examples, use h_df to decide whether to query O or W for each, and add the labeled data to S_k
● Return a classifier learned from the labeled dataset S_k'
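The role of the difference classifier can be illustrated with a heavily simplified, single-epoch sketch. It is not the algorithm of Zhang and Chaudhuri (2015): there are no epochs, no disagreement regions, and no cost-sensitive training. The two oracles, the interval-shaped difference classifier, and all function names are hypothetical.

```python
def strong_oracle(x):
    # Accurate but costly labeler O (assumed: threshold at 0.5).
    return int(x >= 0.5)

def weak_oracle(x):
    # Cheap labeler W (assumed: wrong only near the boundary, in [0.4, 0.6]).
    y = strong_oracle(x)
    return 1 - y if 0.4 <= x <= 0.6 else y

def train_difference_classifier(sample):
    # Query both oracles on a small sample and fit the smallest interval
    # covering the points where they disagree. These queries also cost
    # strong-oracle labels, which the sketch does not count.
    disagree = [x for x in sample if strong_oracle(x) != weak_oracle(x)]
    if not disagree:
        return lambda x: False
    lo, hi = min(disagree), max(disagree)
    return lambda x: lo <= x <= hi

def label_with_cheapest_oracle(points, h_df):
    # Ask O only where h_df predicts that W would disagree with O.
    labeled, strong_queries = [], 0
    for x in points:
        if h_df(x):
            labeled.append((x, strong_oracle(x)))
            strong_queries += 1
        else:
            labeled.append((x, weak_oracle(x)))
    return labeled, strong_queries
```

When the difference classifier covers the region where W errs, every returned label matches O while only the points inside that region are charged to the strong oracle.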

Performance Guarantee
● First term for learning, second for training difference classifier
● Second term is lower order term when d ≈ d’
● Fitting the difference classifier does not incur a high overhead

AL from crowds
● Multiple experts in supervised learning (Raykar et al., 2009 and Yan et al., 2010)
● NLP tasks from AMT data (Snow et al., 2008)
● Yan et al. (2011) proposed a method for active learning from multiple annotators
● Focus:
○ Most informative query
○ Most useful annotator

Proposed Model
[Source: Yan et al. (2011)]

Algorithm
● Two key steps
○ Select a sample to label next
○ Select the best annotator to label
● Select sample
○ Uncertainty sampling
■ Select the sample about which the classifier is least certain

Algorithm: Select Sample
Where, and (ᾶ > 0)
Separating hyperplane:

Algorithm: Select Annotator
(3.6)
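The two selection steps can be sketched generically as follows. The slide formulas are not reproduced here, so this is not the exact criterion of Yan et al. (2011): picking the point closest to a linear separating hyperplane and the annotator with the lowest estimated error at that point are simplifying assumptions, and `select_sample`, `select_annotator`, and `error_models` are hypothetical names.

```python
def select_sample(pool, w, b):
    # Uncertainty sampling for a linear classifier: choose the pool point
    # with the smallest absolute score |w . x + b|, i.e. nearest the hyperplane.
    def score(i):
        return abs(sum(wj * xj for wj, xj in zip(w, pool[i])) + b)
    return min(range(len(pool)), key=score)

def select_annotator(x, error_models):
    # error_models[t](x) is an assumed estimate of annotator t's error rate at x;
    # choose the annotator expected to be most reliable for this sample.
    return min(range(len(error_models)), key=lambda t: error_models[t](x))
```

For example, with two annotators who are each reliable on one half of the input space, `select_annotator` routes a queried point to whichever annotator covers the half it falls in.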

Experiment
(left) Labels, (center) areas of labeler expertise, and (right) annotator selection information for the simplified two-dimensional Galaxy Dim Data (Yan et al., 2011)

Experiment: Baselines
● active learning+majority vote
○ Active query based on majority vote of all annotators
● random sample+multi-labeler
○ Multi-labeler algorithm on randomly sampled examples
● random sample+majority vote
○ Random sampling with majority vote
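The majority-vote aggregation used by two of these baselines can be sketched as below. The tie-breaking rule (smallest label wins) is an assumption; the slides do not say how ties are resolved.

```python
from collections import Counter

def majority_vote(labels):
    # Return the label given by the most annotators.
    counts = Counter(labels)
    top = max(counts.values())
    # Assumed tie-break: among equally frequent labels, pick the smallest.
    return min(label for label, count in counts.items() if count == top)
```

Unlike the multi-labeler model, this treats every annotator as equally reliable on every sample.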

Experimental Result
Accuracy comparisons on text data for the polarity, focus and the evidence labelings (Yan et al., 2011)

More Analyses
● Decision boundary intersects all regions of expertise
● Comparison with single-oracle AL
● Specialized vs. general expertise
[Source: Yan et al. (2011)]

Future Direction
● More Applications
○ Real world problems
● Optimal number of oracles
○ Do multiple oracles always perform better than a single oracle?
○ Is there an optimal number of oracles that works best?
● Cost function associated with labeling
○ Choose single vs. multiple oracles
● General expertise
○ Each of the multiple oracles has general expertise

References
● Castro, Rui M. et al. (2008). “Human Active Learning”. In: NIPS.
● Gigerenzer, Gerd and Reinhard Selten (2002). Bounded rationality: The adaptive toolbox. MIT Press.
● Hanneke, Steve (2014). “Theory of Active Learning”.
● Laughlin, Patrick R. (1973). “Focusing strategy in concept attainment as a function of instructions and task”. In: Journal of Experimental Psychology.
● Oaksford, Mike and Nick Chater (2007). Bayesian rationality: The probabilistic approach to human reasoning. Oxford University Press.
● Raykar, Vikas C. et al. (2009). “Supervised learning from multiple experts: whom to trust when everyone lies a bit”. In: ICML.
● Settles, Burr (2009). Active Learning Literature Survey. Computer Sciences Technical Report 1648. University of Wisconsin–Madison.
● Snow, Rion et al. (2008). “Cheap and Fast - But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks”. In: EMNLP.
● Wason, Peter Cathcart and Philip N. Johnson-Laird (1972). Psychology of reasoning: Structure and content. Vol. 86. Harvard University Press.
● Yan, Yan et al. (2010). “Modeling annotator expertise: Learning when everybody knows a bit of something”. In: AISTATS.
● Yan, Yan et al. (2011). “Active Learning from Crowds”. In: ICML.
● Zhang, Chicheng and Kamalika Chaudhuri (2015). “Active Learning from Weak and Strong Labelers”. In: NIPS.
● Zhu, Xiaojin (2005). “Semi-supervised Learning with Graphs”. AAI3179046. PhD thesis. Pittsburgh, PA, USA.

Appendix: WeakStrong Algorithm

Appendix: WeakStrong Performance Guarantee
