1. Inconsistency and Outliers
Active Learning by Outlier Detection
Inconsistency Robustness Symposium 2011
Neil Rubens
Assistant Professor
University of Electro-Communications
Tokyo, Japan
2. Outline
Inconsistency Robustness is a multi-disciplinary issue. We discuss some aspects of Inconsistency Robustness from the perspective of Machine Learning:
• What is Inconsistency?
• Can Inconsistency be Useful?
• Measuring Inconsistency
4. Outlier Types
• Spatial Outlier
– unlabeled data
• Model Outlier (our focus)
– labeled data
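A minimal sketch of how the two outlier types might be scored, assuming one-dimensional inputs. The function names, the nearest-neighbour distance score, and the residual score are illustrative assumptions and are not definitions taken from the slides.

```python
def spatial_outlier_scores(xs):
    """Score each unlabeled input by its distance to the nearest other input."""
    return [min(abs(x - other) for j, other in enumerate(xs) if j != i)
            for i, x in enumerate(xs)]


def model_outlier_scores(xs, ys, model):
    """Score each labeled point by how far its label is from the model's prediction."""
    return [abs(y - model(x)) for x, y in zip(xs, ys)]


# Hypothetical data: inputs cluster near 1..4, with one far-away input at 10.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
# Hypothetical labels: they follow y = 2x except at x = 3.0.
ys = [2.0, 4.0, 9.0, 8.0, 20.0]

spatial = spatial_outlier_scores(xs)
by_model = model_outlier_scores(xs, ys, model=lambda x: 2.0 * x)

spatial_outlier = xs[spatial.index(max(spatial))]   # needs no labels
model_outlier = xs[by_model.index(max(by_model))]   # needs labels and a model
print(spatial_outlier, model_outlier)
```

The spatial score uses only the inputs, while the model score requires both labels and a fitted model, which is the distinction the slide draws.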
5. Causes of Outliers
• Faulty data
– Entry error, malfunction, etc.
• Chance/Deviation
• Incorrect Model (our focus)
http://www.dkimages.com/discover/previews/852/20223083.JPG
6. Typical Treatment of Outliers
• Assume that the learned model is correct and discard points that don’t agree with the model
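The typical treatment can be sketched as follows, assuming a least-squares line as the learned model and a fixed residual threshold. Both choices, the data, and the names `fit_line` and `discard_outliers` are hypothetical illustrations.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def discard_outliers(points, threshold):
    """Trust the fitted model and drop points whose residual exceeds the threshold."""
    slope, intercept = fit_line(points)
    return [(x, y) for x, y in points
            if abs(y - (slope * x + intercept)) <= threshold]


# Hypothetical data: y = 2x, except for the point at x = 3.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 16.0), (4.0, 8.0), (5.0, 10.0)]
kept = discard_outliers(points, threshold=3.0)
print(kept)
```

Note that the fitted line is itself pulled toward the discarded point, so the decision depends on the model being roughly right to begin with.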
7. Our Focus: Atypical Treatment of Outliers
• Assume that the data is right, and that the model is wrong
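One way to illustrate the atypical treatment is to keep every data point and revise the model class until nothing disagrees with it. This is a minimal sketch under assumed model classes (constant, then linear) and an assumed disagreement threshold; it is not presented as the author's procedure.

```python
def fit_constant(points):
    """Simplest model: always predict the mean label."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y


def fit_linear(points):
    """Richer model: least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def count_disagreements(points, model, threshold):
    """Number of data points that the model fails to explain."""
    return sum(1 for x, y in points if abs(y - model(x)) > threshold)


def revise_model(points, fitters, threshold):
    """Try model classes in order; return the first that agrees with all the data."""
    for name, fit in fitters:
        if count_disagreements(points, fit(points), threshold) == 0:
            return name
    return None


# Hypothetical data generated by y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
fitters = [("constant", fit_constant), ("linear", fit_linear)]

outliers_under_constant = count_disagreements(points, fit_constant(points), 1.0)
chosen = revise_model(points, fitters, threshold=1.0)
print(outliers_under_constant, chosen)
```

Under the constant model four of the five points look like outliers; treating them as correct data leads to replacing the model instead of discarding the points.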
8. Obtaining Data Could Be “COSTLY”
• Medicine:
– diagnosis: pain, time, $
– drug discovery: $$$, time
• User Interaction:
– effort, time
• Expertise Elicitation:
– $, time
Due to the abundance of data, one may mistakenly dismiss this problem as impractical. While unlabeled data is abundant, labeled data is rather scarce. Even if the overall amount of labeled data is large enough, there may still be a need for additional labeled data to enable personalization (a common focus). Moreover, obtaining labeled data could be expensive.
9. [Diagram over draft text; slide title not recoverable]
Diagram labels: Unlabeled Data, Sampling, Multiple Hypothesis, Hypothesis/Model Selection, f(x, θ); axes x1, x2, y
• Not all outliers are bad: outliers may be good
• Bad data: outliers are bad and are ignored unless the objective is anomaly detection
The goal of machine learning is to learn an accurate predictive model from the data. Data that is inconsistent with the learned model and/or existing data is referred to as an outlier.
... assumption that the current model is accurate, and requires just some tweaking. However, if the current model is inaccurate, it should be changed significantly, instead of ignoring the incompatibility and continuing to make minor tweaks. This phenomenon occurs frequently during the early stages of the learning process [7], [6], or in a non-stationary environment in which changes may occur in the underlying model [2]. This issue is exacerbated in AL settings in which ... gradient descent ... (except the number of samples we can make is very small).
Contributions
a) State the problem: ...
b) Say why it’s an interesting problem: not all of the outliers are bad.
c) Say what your solution achieves: ...
d) Say what follows from your solution: if we discard outliers, we might be discarding the most informative data points.
10. [Diagram over draft text; slide title not recoverable]
Diagram labels: f(x, θ); axes x1, x2, y
Learned model is often assumed to be approximately correct, therefore using ...
• Consistent Sample
– Little is learned
– Does not allow reducing the # of hypotheses
• Inconsistent Sample
– Will not agree with some of the hypotheses (regardless of the output values)
– Number of hypotheses is reduced
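The difference between consistent and inconsistent samples can be sketched with a small set of candidate hypotheses. This example assumes one-dimensional threshold classifiers and made-up query points; the names `predict`, `update`, and `is_inconsistent` are hypothetical.

```python
def predict(threshold, x):
    """Hypothesis h_t: label 1 if x >= t, otherwise 0."""
    return 1 if x >= threshold else 0


def update(hypotheses, x, label):
    """Keep only the hypotheses (thresholds) that agree with the labeled sample."""
    return [t for t in hypotheses if predict(t, x) == label]


def is_inconsistent(hypotheses, x):
    """A sample is inconsistent if the current hypotheses disagree about its label."""
    return len({predict(t, x) for t in hypotheses}) > 1


hypotheses = [1.0, 2.0, 3.0, 4.0, 5.0]   # assumed candidate thresholds
consistent_x, inconsistent_x = 0.5, 3.5  # assumed query points

# Every hypothesis predicts 0 at x = 0.5, so labeling it removes nothing.
after_consistent = update(hypotheses, consistent_x, 0)

# The hypotheses disagree at x = 3.5, so either label removes some of them.
after_label_0 = update(hypotheses, inconsistent_x, 0)
after_label_1 = update(hypotheses, inconsistent_x, 1)

print(len(after_consistent), len(after_label_0), len(after_label_1))
```

Whichever label the inconsistent sample turns out to have, the number of remaining hypotheses shrinks, which is why such samples are the informative ones to label.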
13. Model Selection
(a) under-fit (b) over-fit (c) appropriate fit
Figure 8: Dependence between model complexity and accuracy.
If there is no inconsistency between the training and testing data, then the most complex model would tend to be selected.
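The effect can be illustrated with a toy model family, assuming piecewise-constant regression on [0, 1) where the number of bins stands in for model complexity. The data, the noise pattern, and all names below are hypothetical choices made for this sketch.

```python
def fit_bins(xs, ys, n_bins):
    """Piecewise-constant model on [0, 1): one mean per bin (more bins = more complex)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def mse(model, xs, ys):
    """Mean squared error of the model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_complexity(train, evaluation, candidates):
    """Fit each candidate on `train` and pick the one with the lowest error on `evaluation`."""
    xs, ys = train
    return min(candidates, key=lambda k: mse(fit_bins(xs, ys, k), *evaluation))


# Assumed truth: a step from 0 to 1 at x = 0.5, observed with fixed noise.
xs = [(i + 0.5) / 8 for i in range(8)]
truth = [0.0] * 4 + [1.0] * 4
noise = [0.3, 0.1, -0.3, -0.1] * 2
train = (xs, [t + n for t, n in zip(truth, noise)])
test = (xs, [t - n for t, n in zip(truth, noise)])  # same truth, different noise

candidates = [1, 2, 4, 8]
picked_without_inconsistency = select_complexity(train, train, candidates)
picked_with_inconsistency = select_complexity(train, test, candidates)
print(picked_without_inconsistency, picked_with_inconsistency)
```

When the evaluation data agrees perfectly with the training data, the 8-bin model that memorizes every point wins; when the two sets differ in their noise, the 2-bin model that matches the underlying step is selected.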
14. Change Detection / Model Correction
Is inconsistency caused by noise (or minor factors) or by changes in the underlying model?
– Applications: medical diagnostics, intrusion detection, network analysis, finance
http://www.satimagingcorp.com/galleryimages/high-resolution-landsat-satellite-imagery-oman.jpg
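One simple heuristic for separating noise from a change in the underlying model is to check whether large residuals persist. The sketch below assumes a residual threshold and a minimum run length; it is an illustrative rule and not a method described in the slides.

```python
def classify_inconsistency(residuals, threshold, min_run):
    """Return "change" if at least `min_run` consecutive residuals exceed the
    threshold with the same sign, otherwise "noise"."""
    run, last_sign = 0, 0
    for r in residuals:
        sign = (r > threshold) - (r < -threshold)  # +1, -1, or 0 if within threshold
        if sign != 0 and sign == last_sign:
            run += 1
        elif sign != 0:
            run = 1
        else:
            run = 0
        last_sign = sign
        if run >= min_run:
            return "change"
    return "noise"


# Hypothetical residuals (observed value minus the current model's prediction).
isolated_spike = [0.1, -0.2, 3.0, 0.1, -0.1]
persistent_shift = [0.1, 0.0, 2.5, 2.7, 3.1, 2.9]

spike_result = classify_inconsistency(isolated_spike, threshold=1.0, min_run=3)
shift_result = classify_inconsistency(persistent_shift, threshold=1.0, min_run=3)
print(spike_result, shift_result)
```

An isolated large residual is treated as noise, while a sustained run in one direction suggests that the model itself needs correcting.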
15. Conclusion
• Inconsistency could be useful for:
– Hypothesis Learning
– Model Selection
– Model Correction
Neil Rubens
Assistant Professor
Active Intelligence Group
Laboratory for Knowledge Computing
University of Electro-Communications
Tokyo, Japan
http://ActiveIntelligence.org