The document presents a method for actively evaluating classifiers on large datasets. It aims to estimate the true accuracy of a classifier on an unlabeled dataset given a small labeled subset. It proposes using a stratified estimate, where the labeled and unlabeled data are stratified into buckets based on a learned hash function of instance features. Weights for each bucket are estimated based on the fraction of unlabeled instances in that bucket. This allows accuracy estimation to scale to large unlabeled datasets.
Active Evaluation of Classifiers on Large Datasets
Namit Katariya, Arun Iyer, Sunita Sarawagi
IIT Bombay
International Conference on Data Mining (ICDM), December 13, 2012
Problem setup

Setup
- A classifier C(x) deployed on a large unlabeled dataset D
- Goal: estimate the true accuracy of C(x) on D
Given
- A labeled set L that is small or unrepresentative of D, so the measured accuracy on the labeled set $\neq$ the true accuracy on the data
- A human labeler
A compelling problem in many real-life applications
Requirements

1. Work with an arbitrary classifier, e.g. a user script
   - Existing work assumes that C(x) is probabilistic and can output well-calibrated $\Pr(y \mid x)$ values.
2. Provide interactive speed to the user in the loop even when D is very large
   - Even a single sequential scan of D may not be practical; D can be accessed only via an index.
3. Require the user to label as few additional instances as possible
   - Similar to active learning, but different: active learning is usually used in the context of learning classifiers, whereas here the goal is evaluation.
Outline

The problem has two aspects:
1. Accuracy estimation (what this talk is about)
   - Given a fixed L, what is the best estimator of the true accuracy?
   - How to do this scalably on a large D?
2. Instance selection (not covered in this talk; details in the paper)
   - Selecting instances from D to be labeled by a human and added to L, performed in a loop.
Accuracy estimation

- The simple averaged estimate $\hat{\mu}_R = \frac{1}{n} \sum_{i \in L} a_i$ (where $a_i \in \{0,1\}$ indicates whether C is correct on instance i) is poor when L is small or biased
- A classical solution: the stratified estimate
  - Stratify L and D into B buckets $(L_1, D_1), \ldots, (L_B, D_B)$
  - Measure the accuracy $\hat{\mu}_b$ of $L_b$ in each bucket b
  - Estimate the weight $w_b$ as the fraction of D instances in bucket b: $w_b = |D_b| / |D|$
  - Stratified estimate: $\hat{\mu}_S = \sum_b w_b \hat{\mu}_b$
- Error of $\hat{\mu}_S$ $\ll$ error of $\hat{\mu}_R$ if instances within a bucket are homogeneous
- Two challenges:
  1. Selecting a stratification strategy that puts instances with the same error in the same bucket
  2. Finding $w_b$ scalably when D is large $\Rightarrow$ cannot stratify the whole of D
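As a concrete reading of the stratified estimate above, here is a minimal Python sketch; the function and variable names are our own, and skipping unlabeled buckets that contain no labeled instances is a simplification the slide does not address:

```python
import numpy as np

def stratified_accuracy_estimate(h, L_feats, L_correct, D_feats):
    """Stratified estimate mu_S = sum_b w_b * mu_b.
    h         : maps a feature vector to a bucket id
    L_feats   : feature vectors of the labeled set L
    L_correct : 0/1 numpy array, 1 where C(x) was correct on L
    D_feats   : feature vectors of the unlabeled set D
    """
    L_buckets = np.array([h(f) for f in L_feats])
    D_buckets = np.array([h(f) for f in D_feats])
    estimate = 0.0
    for b in np.unique(D_buckets):
        w_b = np.mean(D_buckets == b)        # w_b = |D_b| / |D|
        labeled = L_buckets == b
        if labeled.any():                    # buckets with no labeled data
            estimate += w_b * L_correct[labeled].mean()  # are skipped here
    return estimate
```

Note that this sketch scans all of D to compute the weights; the scalable alternative is the sampling scheme presented later.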
Stratification strategy (h)

- Existing approach: bin the $\Pr(y \mid x)$ values assumed to be provided by a classifier (Bennett and Carvalho; Druck and McCallum)
  - Not applicable, since we wish to handle arbitrary classifiers
- Our proposal:
  - F(x, C): a feature representation of the input x and the result of deploying C on x
  - Learn a hash function h on these features using the labeled data L
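The slides leave the exact form of F(x, C) open; purely as an illustrative assumption, one simple construction concatenates the raw instance features with the classifier's output:

```python
import numpy as np

def F(x, C):
    """Hypothetical feature map F(x, C): the input features of x together
    with the result of deploying C on x. The concatenation below (features
    plus predicted label) is an illustrative guess, not the paper's design."""
    return np.concatenate([np.asarray(x, dtype=float),
                           [float(C(x))]])   # append C's prediction on x
```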
Learning hyperplanes

Background
- Hash function: a concatenation of r bits
  - Bit k = $\mathrm{sign}(w_k \cdot F(x))$, i.e. the side of hyperplane $w_k$ on which x lies
  - $w_k$: parameters to be learnt
- Learning problem: non-smooth and non-convex
- Existing work: learning distance-preserving hash functions
  - Learn the $w_k$ parameters sequentially; smooth the sign function through various tricks
- Our problem can be expressed as a special case of learning a distance-preserving hash function
  - Exploit the 0/1 nature of accuracy values rather than optimizing a black-box distance measure
  - Allows a more efficient algorithm
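A minimal sketch of the r-bit bucket assignment just described; the matrix name W and packing the bits into an integer bucket id are our own choices:

```python
import numpy as np

def hash_bucket(W, f):
    """r-bit hash from r hyperplanes: bit k = sign(w_k . f).
    W: (r x d) hyperplane matrix, f: d-dimensional feature vector F(x, C)."""
    bits = (W @ f) > 0                               # side of each w_k
    return int(bits @ (1 << np.arange(len(bits))))   # bits -> bucket id
```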
Learning hyperplanes

Our approach
- Main step: find the best value of $w_k$ for a bit k, assuming the hyperplanes of the other bits are fixed
  - For each bucket formed from the remaining hyperplanes, arbitrarily choose which side of hyperplane $w_k$ it wants to call positive or negative
  - Use the logistic loss to find the optimal $w_k$ so that, in each bucket, $w_k$ correctly puts points on the chosen side
  - Can be solved optimally using a standard logistic classifier
- Refine the side chosen as positive in each bucket until convergence
- We see in our experiments that our method provides much better results than existing algorithms
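A sketch of one coordinate step of this procedure, under stated assumptions: we initialize each bucket's arbitrary side choice by sending correct points to the positive side, and we omit the outer loop that refines those choices until convergence; the helper names are ours:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def refit_bit(k, W, feats, correct):
    """Refit hyperplane w_k with the other bits held fixed.
    feats   : (n x d) features F(x, C) of the labeled set L
    correct : 0/1 numpy array of accuracy values on L
    """
    other = [j for j in range(len(W)) if j != k]
    # Bucket each point using the fixed bits of the remaining hyperplanes.
    buckets = ((feats @ W[other].T) > 0) @ (1 << np.arange(len(other)))
    target = np.zeros(len(feats))
    for b in np.unique(buckets):
        m = buckets == b
        target[m] = correct[m]   # arbitrary initial side choice per bucket
    if len(np.unique(target)) < 2:
        return W[k]              # nothing to separate; keep the old w_k
    # Logistic loss drives points in every bucket to their chosen side.
    return LogisticRegression().fit(feats, target).coef_.ravel()
```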
Estimating bucket weights without sequentially hashing D

- $\hat{\mu}_S = \sum_b w_b \hat{\mu}_b = \frac{1}{N} \sum_{i \in D} \hat{\mu}_{h(f_i)}$
- Proposal sampling: $\hat{\mu}_{S_q} = \frac{1}{N} \frac{1}{m} \sum_{i \in Q} \frac{\hat{\mu}_{h(f_i)}}{q(i)}$
- The optimal $q(i) \propto \hat{\mu}_{h(f_i)}$ is impossible without assigning each $i \in D$ to a bucket of $h(\cdot)$
- The allowed q(i) are the ones which assign the same probability to all instances i within an index partition $D_u$ of D
- Claim: under the above restriction, $q_u \propto \sqrt{\sum_b \hat{\mu}_b^2 \, p(b \mid u)}$, where $p(b \mid u)$ is the fraction of $i \in D_u$ with $h(f_i) = b$
- Estimating $p(b \mid u)$: initially depends on the labeled data and a small static sample; refine the estimate as more instances get sampled from $D_u$
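To make the sampling scheme concrete, a small Python sketch; the claim gives $q_u$ only up to normalization, and drawing uniformly within a partition plus all function names are our additions:

```python
import numpy as np

def proposal_weights(mu, p_b_given_u):
    """q_u proportional to sqrt(sum_b mu_b^2 * p(b|u)) per index partition u.
    mu          : (B,) per-bucket accuracy estimates mu_b from L
    p_b_given_u : (U x B) estimated bucket distribution inside each partition
    """
    q = np.sqrt(p_b_given_u @ (mu ** 2))
    return q / q.sum()

def sample_estimate(mu, h, partitions, q, m, N, rng=np.random.default_rng(0)):
    """Importance-sampled mu_Sq = (1/(N*m)) * sum_i mu_{h(f_i)} / q(i).
    partitions[u] : feature vectors reachable via index partition u
    q(i) spreads q[u] uniformly over the |D_u| instances of partition u."""
    total = 0.0
    for _ in range(m):
        u = rng.choice(len(q), p=q)                    # pick a partition
        f = partitions[u][rng.integers(len(partitions[u]))]
        q_i = q[u] / len(partitions[u])                # per-instance prob.
        total += mu[h(f)] / q_i
    return total / (N * m)
```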
Results

Summary of datasets used
- TableAnnote: annotate columns of Web tables with type nodes of an ontology
- Spam: classify web pages as spam or not
- DNA: a binary DNA classification task
- HomeGround, HomePredicted: datasets of (entity, web page) instances; decide if the web page is a homepage for the entity

Dataset        #Features  Seed (L)  Unlabeled (D)  Seed accuracy (%)  True accuracy (%)
TableAnnote    42         541       11,954,983     56.4               16.5
Spam           1000       5000      350,000        86.4               93.2
DNA            800        100,000   50,000,000     72.2               77.9
HomeGround     66         514       1060           50.4               32.8
HomePredicted  66         8658      13,951,053     83.2               93.9
Results

Comparison of estimation strategies on the TableAnnote dataset
- Random: random sampling
- PropSample: sampling from the proposal distribution of Sawade et al. (2010)
- ScoreBins: stratification on classifier scores
- SeqProj hash: learn hyperplanes using the method of Wang et al. (2010)
- BRE hash: learn hyperplanes using the method of Kulis and Darrell (2009)
Results

Figure: Comparing methods of sampling from indexed data for estimating bucket weights ("Our" vs. "Uniform") on TableAnnote, HomePredicted, and DNA. X-axis: data sample size; y-axis: difference with full scan (0.0% to 1.2%).
Summary

1. Addressed the challenge of calibrating a classifier's accuracy on large unlabeled datasets, given small amounts of labeled data and a human labeler
2. Proposed a stratified-sampling-based method for accuracy estimation that provides better estimates than simple averaging, and better selection of instances for labeling than random sampling
3. Achieved between 15% and 62% relative reduction in error compared to existing approaches
4. Made the algorithm scalable by proposing optimal sampling strategies for accessing indexed unlabeled data directly
5. Achieved close-to-optimal performance while reading three orders of magnitude fewer instances on large datasets
References

Bennett, P. N. and Carvalho, V. R. (2010). Online stratified sampling: evaluating classifiers at web-scale. In CIKM.
Druck, G. and McCallum, A. (2011). Toward interactive training and evaluation. In CIKM.
Kulis, B. and Darrell, T. (2009). Learning to hash with binary reconstructive embeddings. In NIPS.
Sawade, C., Landwehr, N., Bickel, S., and Scheffer, T. (2010). Active risk estimation. In ICML.
Wang, J., Kumar, S., and Chang, S. (2010). Sequential projection learning for hashing with compact codes. In ICML.