The Web is inundated with information in many different formats, including semi-structured and unstructured data. Machine Reading is a research area that aims to build systems that can read natural-language text, extract knowledge, and store it in knowledge bases. Machine Reading systems are thus developed to produce language-understanding technology that automatically processes text in affordable time. This tutorial explores the idea of automatically reading the Web using Machine Reading techniques. Four of the most successful Machine Reading approaches intended to Read the Web (namely the KnowItAll, YAGO, NELL and DBPedia systems) will be presented and discussed. The principles, the subtleties, and the current results of each approach will be addressed. On-line resources from each approach will be explored, and future directions for each system will be pointed out. YAGO, KnowItAll, NELL and DBPedia are not the only research efforts focusing on Reading the Web; they were selected for this tutorial because they represent four different and very relevant approaches to the problem, not because they are the only relevant ones. Besides the four aforementioned systems, some other independent contributions to the Read the Web idea will be mentioned as related work.
Machine Reading the Web: beyond Named Entity Recognition and Relation Extraction
1. Estevam R. Hruschka Jr. (Federal University of São Carlos)
Machine Reading the Web: Beyond Named Entity Recognition and Relation Extraction
2. Disclaimers
• Previous versions of this tutorial were presented at IBERAMIA2012 (http://iberamia2012.dsic.upv.es/tutorials/) and WWW2013 (http://www2013.org/program/machine-reading-the-web/). Also, a short version was presented at the ECMLPKDD2015 Summer School (http://www.ecmlpkdd2015.org/summer-school/ss-schedule).
• Feel free to e-mail me (estevam.hruschka@gmail.com) with questions about this tutorial or any feedback/suggestions/criticisms. Your feedback can help improve the quality of these slides, so it is very welcome.
• As with many tutorials' slides, these slides were prepared to be presented first and studied later. Thus, they are meant to be more self-contained than slides from a paper presentation.
3. Disclaimers
• Due to time constraints, I do not intend to cover all the algorithms and publications related to YAGO, KnowItAll, NELL and DBPedia. What I intend, instead, is to give an overview of all four projects and of the main approach to "Read the Web" used in each one.
• YAGO, KnowItAll, NELL and DBPedia are not the only research efforts focusing on "Reading the Web". They were selected for this tutorial because they represent four different and very relevant approaches to the problem; this does not mean they are the best (or the only relevant) ones.
4. Outline
• Machine Learning
• Machine Reading
• Reading the Web
  – YAGO
  – KnowItAll
  – NELL
  – DBPedia
5. Outline
• Machine Learning
• Machine Reading
• Reading the Web
  – YAGO
  – KnowItAll
  – NELL
  – DBPedia
25. Outline
• Machine Learning
• Machine Reading
• Reading the Web
  – DBPedia
  – YAGO
  – KnowItAll
  – NELL
26. Machine Learning
• What is Machine Learning? The field of Machine Learning seeks to answer the question: "How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?" [Mitchell, 2006]
27. Machine Learning
• What is Machine Learning? A machine learns with respect to a particular:
  – task T
  – performance metric P
  – type of experience E
  if the system reliably improves its performance P at task T, following experience E. [Mitchell, 1997]
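Mitchell's (T, P, E) framing can be made concrete with a toy sketch. The spam-filter scenario, word lists, and examples below are invented for illustration; they are not from the tutorial.

```python
def train(examples):
    """E (experience): labeled (text, is_spam) pairs; memorize words seen in spam."""
    spam_words = set()
    for text, is_spam in examples:
        if is_spam:
            spam_words.update(text.lower().split())
    return spam_words

def classify(spam_words, text):
    """T (task): label a message as spam if it shares a word with known spam."""
    return any(w in spam_words for w in text.lower().split())

def accuracy(spam_words, test_set):
    """P (performance metric): fraction of test messages labeled correctly."""
    correct = sum(classify(spam_words, t) == y for t, y in test_set)
    return correct / len(test_set)

experience = [("win money now", True), ("meeting at noon", False)]
model = train(experience)
print(accuracy(model, [("win a prize", True), ("lunch at noon", False)]))
```

The system "learns" in Mitchell's sense if P (accuracy) reliably improves as E (the labeled examples) grows.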
28. Machine Learning
• Examples of Machine Learning approaches for different tasks (T), performance metrics (P) and experiences (E):
  – data mining
  – autonomous discovery
  – database updating
  – programming by example
  – pattern recognition
62. Semi-supervised Learning (one simple anecdotal approach)
[Scatter plot: two labeled series (Series1, Series2) plus unlabeled points; both axes run 0-25.]
• What model should be chosen?
63. Outline
• Machine Learning
• Machine Reading
• Reading the Web
  – DBPedia
  – YAGO
  – KnowItAll
  – NELL
64. Machine Reading
• "The autonomous understanding of text" [Etzioni et al., 2007]
• "One of the most important methods by which human beings learn is by reading" [Clark et al., 2007]; thus, why not build machines capable of learning by reading?
65. Machine Reading
• "The problem of deciding what was implied by a written text, of reading between the lines, is the problem of inference." [Norvig, 2007]
• Typically, Machine Reading is different from Natural Language Processing alone
66. It’s about the disappearance forty years ago of Harriet Vanger, a young
scion of one of the wealthiest families in Sweden, and about her uncle,
determined to know the truth about what he believes was her murder.
Blomkvist visits Henrik Vanger at his estate on the tiny island of Hedeby.
The old man draws Blomkvist in by promising solid evidence against Wennerström.
Blomkvist agrees to spend a year writing the Vanger family history as a cover for the real
assignment: the disappearance of Vanger's niece Harriet some 40 years earlier. Hedeby is
home to several generations of Vangers, all part owners in Vanger Enterprises. Blomkvist
becomes acquainted with the members of the extended Vanger family, most of whom resent
his presence. He does, however, start a short lived affair with Cecilia, the niece of Henrik.
After discovering that Salander has hacked into his computer, he persuades her to assist
him with research. They eventually become lovers, but Blomkvist has trouble getting close
to Lisbeth who treats virtually everyone she meets with hostility. Ultimately the two
discover that Harriet's brother Martin, CEO of Vanger Industries, is secretly a serial killer.
A 24-year-old computer hacker sporting an assortment of tattoos and body piercings
supports herself by doing deep background investigations for Dragan Armansky, who, in
turn, worries that Lisbeth Salander is “the perfect victim for anyone who wished her ill."
Machine Reading (slide adapted from [Hady et al., 2011])
67. Machine Reading (adapted from [Hady et al., 2011]): the same passage, with co-referent entity mentions marked "same".
68. Machine Reading (adapted from [Hady et al., 2011]): the same passage, with further co-referent mentions marked "same".
69. Machine Reading (adapted from [Hady et al., 2011]): the same passage, now annotated with relations between entities: uncleOf, owns, hires, headOf.
70. Machine Reading (adapted from [Hady et al., 2011]): the same passage, with additional relations: affairWith, enemyOf.
71. Machine Reading
• One important (initial) approach to machine reading is to extract facts from text and store them in a structured form.
• Facts can be seen as entities and their relations
• An ontology is one of the most common representations for the extracted facts
80. Machine Reading
• Named Entity Resolution/Recognition
• Relation Extraction
• Co-reference and Polysemy Resolution
• Relation Discovery
• Inference
• Knowledge Base
• Document/Sentence Understanding (Micro-Reading)
81. Machine Reading
• Named Entity Resolution/Recognition
• Relation Extraction
• Co-reference and Polysemy Resolution
• Relation Discovery
• Inference
• Knowledge Base
• Document/Sentence Understanding (Micro-Reading)
82. Machine Reading
• Named Entity Resolution/Recognition
  – Semi-structured data: the "Low-Hanging Fruit"
    • Wikipedia infoboxes & categories
    • HTML lists & tables, etc.
  – Free text
    • Hearst patterns; clustering by verbal phrases
    • Natural-language processing
    • Advanced patterns & iterative bootstrapping ("Dual Iterative Pattern Relation Extraction")
83. Named Entity Recognition
• Named Entity Recognition [Nadeau & Sekine, 2007]
  – the term "Named Entity" was coined for the Sixth Message Understanding Conference (MUC-6) (Grishman & Sundheim, 1996).
  – an important sub-task of IE is called "Named Entity Recognition and Classification (NERC)".
84. Named Entity Recognition [Nadeau & Sekine, 2007]
• recognize information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions.
• In Machine Reading, many other entities: product, kitchen item, sport, etc.
85. Named Entity Resolution
• Named Entity Resolution [Theobald & Weikum, 2012]
  – Which individual entities belong to which classes?
    • instanceOf (Surajit Chaudhuri, computer scientists),
    • instanceOf (Barbara Liskov, computer scientists),
    • instanceOf (Barbara Liskov, female humans), …
86. Named Entity Recognition
• Named Entity Recognition as a machine learning task.
  – Supervised Learning: text → NLP tools (POS, Parse Trees) → Feature Extraction → Classifier
87. Named Entity Recognition
• Named Entity Recognition as a Machine Learning task.
  – Supervised Learning
  – Possible features [Ratinov & Roth, 2009], [Khambhatla, 2004], [Zhou et al., 2005]
    • Words "around" and including entities
    • POS (Part-Of-Speech)
    • Prefixes and suffixes
    • Capitalization
    • Number of words
    • Number of characters
    • First word, last word
    • Gazetteer matches
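A minimal sketch of how a few of the feature types listed above could be computed for one token of a tokenized sentence. The function name and feature keys are invented for this illustration; real NER systems use much richer feature sets.

```python
def token_features(tokens, i, window=2):
    """Features for tokens[i]: surrounding words, affixes, capitalization, length."""
    tok = tokens[i]
    feats = {
        "word": tok.lower(),
        "prefix3": tok[:3].lower(),   # prefixes and suffixes
        "suffix3": tok[-3:].lower(),
        "is_capitalized": tok[:1].isupper(),
        "num_chars": len(tok),
    }
    # words "around" the candidate token, within a fixed window
    for off in range(-window, window + 1):
        if off != 0 and 0 <= i + off < len(tokens):
            feats[f"word@{off}"] = tokens[i + off].lower()
    return feats

tokens = "Barbara Liskov teaches at MIT".split()
print(token_features(tokens, 0))
```

Each token's feature dictionary would then be fed to the classifier shown in the pipeline.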
88. Named Entity Recognition
• Supervised Learning: text → NLP tools (POS, Parse Trees) → Feature Extraction → Classifier
89. Named Entity Recognition
• Supervised Learning: text → NLP tools (POS, Parse Trees) → Feature Extraction → Classifier (with Kernels)
90. Named Entity Recognition [Bach & Badaskar, 2007]
• Supervised Learning using Kernels
  – A Kernel defines similarity implicitly in a higher-dimensional space
  – Can be based on Strings, Word Sequences, Parse Trees, etc.
    • For strings, similarity ∝ number of common substrings (or subsequences)
    • Recommended reading on string kernels: [Lodhi et al., 2002]
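The intuition "similarity ∝ number of common substrings" can be sketched directly. This is only the intuition, not a real string kernel: the kernels of [Lodhi et al., 2002] operate over gapped subsequences with a decay factor, which this toy count omits.

```python
def substrings(s, max_len=3):
    """All substrings of s up to length max_len."""
    return {s[i:i + n] for n in range(1, max_len + 1)
            for i in range(len(s) - n + 1)}

def substring_similarity(a, b):
    """Count substrings (up to length 3) shared by the two strings."""
    return len(substrings(a) & substrings(b))

print(substring_similarity("Microsoft", "Microsoft Corp"))
print(substring_similarity("Microsoft", "banana"))
```

A kernelized classifier never materializes the substring space explicitly; the kernel function alone supplies the pairwise similarities.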
92. Named Entity Recognition
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and NE instances.
  [Diagram: a set of labeled pattern examples]
93. Named Entity Recognition
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and NE instances.
  [Diagram: labeled pattern examples, e.g. "X is headquartered in", "is the CEO of X"]
94. Named Entity Recognition
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and NE instances.
  [Diagram: the labeled pattern examples ("X is headquartered in", "is the CEO of X") feed an NE Instances Classifier]
95. Named Entity Recognition
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and NE instances.
  [Diagram: the NE Instances Classifier outputs a set of labeled instances]
96. Named Entity Recognition
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and NE instances.
  [Diagram: the labeled instances now include Google and Apple]
97. Named Entity Recognition
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and NE instances.
  [Diagram: the labeled instances feed an NE Pattern Classifier, closing the bootstrap loop]
98. Named Entity Recognition
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and NE instances.
  [Diagram: the full bootstrap loop; caption: What about unsupervised?]
99. Named Entity Recognition
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and NE instances.
  [Diagram: the same bootstrap loop with the example labels removed; caption: What about unsupervised?]
100. Named Entity Recognition
• Unsupervised Approaches
  – Bootstrapping can generate a large number of patterns and NE instances.
  [Diagram: the same bootstrap loop of pattern and instance classifiers]
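The bootstrap loop pictured in the preceding slides can be sketched in a few lines: seed patterns find new NE instances in text, and the new instances in turn suggest new patterns. The corpus and seed patterns below are invented for illustration; a real system would score and filter candidates rather than accept them all.

```python
import re

corpus = [
    "Google is headquartered in Mountain View",
    "Tim Cook is the CEO of Apple",
    "Apple is headquartered in Cupertino",
]
# seed patterns: X marks the slot where a company name occurs
patterns = [r"(\w+) is headquartered in", r"is the CEO of (\w+)"]

# Step 1: patterns extract NE instances
companies = set()
for sentence in corpus:
    for pat in patterns:
        m = re.search(pat, sentence)
        if m:
            companies.add(m.group(1))

# Step 2: the new instances propose their surrounding contexts as new patterns
new_patterns = set()
for sentence in corpus:
    for company in companies:
        if company in sentence:
            new_patterns.add(sentence.replace(company, "X"))

print(companies)
```

Iterating the two steps grows both the pattern set and the instance set, which is exactly what makes bootstrapping productive but also prone to semantic drift.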
101. Named Entity Recognition
• [Ratinov & Roth, 2009]
103. Machine Reading
• Named Entity Resolution/Extraction
• Relation Extraction
• Co-reference and Polysemy Resolution
• Relation Discovery
• Inference
• Knowledge Base Representation
• Document/Sentence Understanding (Micro-Reading)
104. Machine Reading
• Relation Extraction
  – Semi-structured data: the "Low-Hanging Fruit"
    • Wikipedia infoboxes & categories
    • HTML lists & tables, etc.
  – Free text
    • Hearst patterns; clustering by verbal phrases
    • Natural-language processing
    • Advanced patterns & iterative bootstrapping ("Dual Iterative Pattern Relation Extraction")
105. Machine Reading
• Relation Extraction [Theobald & Weikum, 2012]
  – Which instances (pairs of individual entities) are there for given binary relations with specific type signatures?
    • hasAdvisor (JimGray, MikeHarrison)
    • hasAdvisor (HectorGarcia-Molina, Gio Wiederhold)
    • hasAdvisor (Susan Davidson, Hector Garcia-Molina)
    • graduatedAt (JimGray, Berkeley)
    • graduatedAt (HectorGarcia-Molina, Stanford)
    • hasWonPrize (JimGray, TuringAward)
    • bornOn (JohnLennon, 9Oct1940)
    • diedOn (JohnLennon, 8Dec1980)
    • marriedTo (JohnLennon, YokoOno)
106. Relation Extraction [Bach & Badaskar, 2007]
• Extracting semantic relations between entities in text
• Relation extraction as a Machine Learning task.
  – Supervised Learning: text → NLP tools (POS, Parse Trees) → Feature Extraction → Classifier
107. Relation Extraction [Bach & Badaskar, 2007]
• Relation extraction as a Machine Learning task.
  – Supervised Learning
  – Possible features [Khambhatla, 2004], [Zhou et al., 2005]
    • Words between and including entities
    • Types of entities (person, location, etc.)
    • Number of entities between the two entities; whether both entities belong to the same chunk
    • Number of words separating the two entities
    • Path between the two entities in a parse tree
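Two of the listed features can be sketched for a candidate entity pair in a tokenized sentence: the words between the entities and their distance. The entity spans are given by hand here; a real system would take them from an NER step, and parse-tree paths would come from a parser.

```python
def pair_features(tokens, e1, e2):
    """e1, e2: (start, end) token spans of the two entities, e1 before e2."""
    between = tokens[e1[1]:e2[0]]          # words between the two entities
    return {
        "words_between": " ".join(between).lower(),
        "num_words_between": len(between), # distance feature
        "e1_head": tokens[e1[1] - 1].lower(),
        "e2_head": tokens[e2[1] - 1].lower(),
    }

tokens = "Jim Gray graduated at Berkeley".split()
print(pair_features(tokens, (0, 2), (4, 5)))
```

The resulting feature dictionary is what the classifier in the pipeline consumes for each candidate pair.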
108. Relation Extraction [Bach & Badaskar, 2007]
• Extracting semantic relations between entities in text
• Relation extraction as a classification task.
  – Supervised Learning: text → NLP tools (POS, Parse Trees, NER) → Feature Extraction → Classifier
109. Relation Extraction [Bach & Badaskar, 2007]
• Extracting semantic relations between entities in text
• Relation extraction as a classification task.
  – Supervised Learning: text → NLP tools (POS, Parse Trees, NER) → Feature Extraction → Classifier (with Kernels)
110. Relation Extraction [Bach & Badaskar, 2007]
• Supervised Learning using Kernels
  – A Kernel defines similarity implicitly in a higher-dimensional space
  – Can be based on Strings, Word Sequences, Parse Trees, etc.
    • For strings, similarity ∝ number of common substrings (or subsequences)
    • Recommended reading on string kernels: [Lodhi et al., 2002]
112. Relation Extraction
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: a set of labeled pattern examples]
113. Relation Extraction
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: labeled pattern examples, e.g. "X is headquartered in Y", "Y is the headquarter of X"]
114. Relation Extraction
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the labeled pattern examples feed a Pair-of-Instances Classifier]
115. Relation Extraction
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the Pair-of-Instances Classifier outputs a set of labeled instance pairs]
116. Relation Extraction
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the labeled pairs now include Google-Mountain View and Apple-Cupertino]
117. Relation Extraction
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the labeled pairs feed a Pattern Classifier, closing the bootstrap loop]
118. Relation Extraction
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the full bootstrap loop; caption: What about unsupervised?]
119. Relation Extraction
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the same bootstrap loop with the example labels removed; caption: What about unsupervised?]
120. Relation Extraction
• Unsupervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the same bootstrap loop of pattern and pair-of-instances classifiers]
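Pattern-based relation extraction with the two example patterns from the slides can be sketched with regular expressions. The sentences are invented for illustration, and the crude "capitalized words" heuristic stands in for proper entity recognition.

```python
import re

# The two example patterns, with named slots for the company X and the city Y.
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"  # crude proxy for an entity mention
patterns = [
    re.compile(rf"(?P<X>{NAME}) is headquartered in (?P<Y>{NAME})"),
    re.compile(rf"(?P<Y>{NAME}) is the headquarter of (?P<X>{NAME})"),
]

def extract_headquartered_in(sentences):
    """Return (company, city) pairs matched by either pattern."""
    pairs = set()
    for s in sentences:
        for pat in patterns:
            for m in pat.finditer(s):
                pairs.add((m.group("X"), m.group("Y")))
    return pairs

sentences = [
    "Google is headquartered in Mountain View, California.",
    "Cupertino is the headquarter of Apple.",
]
print(extract_headquartered_in(sentences))
```

Note that the two surface patterns map to the same relation with swapped argument order, which the named groups make explicit.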
122. Machine Reading
• Named Entity Resolution/Extraction
• Relation Extraction
• Co-reference and Polysemy Resolution
• Relation Discovery
• Inference
• Knowledge Base Representation
• Document/Sentence Understanding (Micro-Reading)
123. Co-Reference and Polysemy Resolution
• Co-reference: expressions that refer to the same entity
Example (figure) taken from: http://nlp.stanford.edu/projects/coref.shtml
124. Co-Reference and Polysemy Resolution
• Co-reference: expressions that refer to the same entity (within-document co-reference)
Example (figure) taken from: http://nlp.stanford.edu/projects/coref.shtml
125. Co-Reference and Polysemy Resolution — the same within-document co-reference example (figure from http://nlp.stanford.edu/projects/coref.shtml).
126. Co-Reference and Polysemy Resolution
• Co-reference: expressions that refer to the same entity
Example (figure) adapted from [Krishnamurthy & Mitchell, 2011]: the mentions "apple computer" and "Apple Computer"
127. Co-Reference and Polysemy Resolution
• Co-reference: expressions that refer to the same entity
Example (figure) adapted from [Krishnamurthy & Mitchell, 2011]: the mentions "apple", "apple computer" and "Apple Computer"
128. Co-Reference and Polysemy Resolution
• Co-reference: expressions that refer to the same entity (cross-document co-reference)
Example (figure) adapted from [Krishnamurthy & Mitchell, 2011]: the mentions "apple", "apple computer" and "Apple Computer"
129. Co-Reference and Polysemy Resolution
• Co-reference: expressions that refer to the same entity (cross-document co-reference)
• Which names denote which entities? [Theobald & Weikum, 2012]
  – means ("Lady Di", Diana Spencer),
  – means ("Diana Frances Mountbatten-Windsor", Diana Spencer), …
  – means ("Madonna", Madonna Louise Ciccone),
  – means ("Madonna", Madonna (painting by Edvard Munch)), …
130. Co-Reference and Polysemy Resolution
• Polysemy: the capacity for a sign (such as a word, phrase, or symbol) to have multiple meanings [Wikipedia]
131. Co-Reference and Polysemy Resolution
• Polysemy: the capacity for a sign (such as a word, phrase, or symbol) to have multiple meanings [Wikipedia]
Example (figure) adapted from [Krishnamurthy & Mitchell, 2011]: the mention "apple" may denote apple (the fruit) or Apple Computer
132. Co-Reference and Polysemy Resolution
• Co-Reference and Polysemy together
Example (figure) adapted from [Krishnamurthy & Mitchell, 2011]: the mentions "apple" and "apple computer" versus the entities apple (the fruit) and Apple Computer
133. Co-Reference and Polysemy Resolution
• Co-reference and Polysemy:
  – Supervised Learning: text → NLP tools (POS, Parse Trees) → Feature Extraction → Classifier
134. Co-Reference and Polysemy Resolution
• Co-Reference Resolution
  – Supervised Learning
  – Possible features [Bengtson & Roth, 2008]
135. Co-Reference and Polysemy Resolution
• Co-Reference Resolution
  – Supervised Learning
  – Possible features [Bengtson & Roth, 2008]
136. Co-Reference and Polysemy Resolution
• Co-reference and Polysemy:
  – Supervised Learning: text → NLP tools (POS, Parse Trees) → Feature Extraction → Classifier (with Kernels)
137. Co-Reference and Polysemy Resolution
• Supervised Learning using Kernels
  – A Kernel defines similarity implicitly in a higher-dimensional space
  – Can be based on Strings, Word Sequences, Parse Trees, etc.
    • For strings, similarity ∝ number of common substrings (or subsequences)
    • Recommended reading on string kernels: [Lodhi et al., 2002]
138. Co-Reference and Polysemy Resolution
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: a set of labeled pattern examples]
139. Co-Reference and Polysemy Resolution
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: labeled pattern examples, e.g. "X also known as Y"]
140. Co-Reference and Polysemy Resolution
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the labeled pattern examples feed a Pair-of-Instances Classifier]
141. Co-Reference and Polysemy Resolution
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the Pair-of-Instances Classifier outputs a set of labeled instance pairs]
142. Co-Reference and Polysemy Resolution
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the labeled pairs now include Apple Computer - Apple]
143. Co-Reference and Polysemy Resolution
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the labeled pairs feed a Pattern Classifier, closing the bootstrap loop]
144. Co-Reference and Polysemy Resolution
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the full bootstrap loop; caption: What about unsupervised?]
145. Co-Reference and Polysemy Resolution
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the same bootstrap loop with the example labels removed; caption: What about unsupervised?]
146. Co-Reference and Polysemy Resolution
• Semi-supervised Approaches
  – Bootstrapping can generate a large number of patterns and relation instances.
  [Diagram: the same bootstrap loop of pattern and pair-of-instances classifiers]
147. Co-Reference and Polysemy Resolution
• Co-Reference Resolution: [Singh et al., 2011], [Krishnamurthy & Mitchell, 2011], [Dutta & Weikum, 2015]
• Polysemy Resolution: [Krishnamurthy & Mitchell, 2011], [Galárraga et al., 2014]
148. Machine Reading
• Named Entity Resolution/Extraction
• Relation Extraction
• Co-reference and Synonym Resolution
• Relation Discovery
• Inference
• Knowledge Base Representation
• Document/Sentence Understanding (Micro-Reading)
149. Machine Reading
• Relation Discovery
  – Which new relations are there for a given pair of entities?
    • hasAdvisor (JimGray, MikeHarrison)
150. Machine Reading
• Relation Discovery
  – Which new relations are there for a given pair of entities?
    • hasAdvisor (JimGray, MikeHarrison)
    • hasCoAuthor (HectorGarcia-Molina, Gio Wiederhold)
151. Machine Reading
• Relation Discovery
  – Which new relations are there for a given pair of entities?
    • hasAdvisor (JimGray, MikeHarrison)
    • hasCoAuthor (HectorGarcia-Molina, Gio Wiederhold)
    • graduatedAt (JimGray, Berkeley)
152. Machine Reading
• Relation Discovery
  – Which new relations are there for a given pair of entities?
    • hasAdvisor (JimGray, MikeHarrison)
    • hasCoAuthor (HectorGarcia-Molina, Gio Wiederhold)
    • graduatedAt (JimGray, Berkeley)
    • studiedAt (HectorGarcia-Molina, Stanford)
    • bornOn (JohnLennon, 9Oct1940)
    • releasedAlbum (JohnLennon, 10Dec1965)
153. Relation Discovery
[Diagram: sets of labeled pattern examples and labeled instance pairs feed a Clustering Algorithm]
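The clustering step in the diagram can be sketched with a toy example: entity pairs that occur with similar connecting phrases are grouped together, and each group suggests a new relation. The data is invented for illustration, and grouping by the exact phrase is a trivial stand-in for real context clustering over millions of occurrences.

```python
from collections import defaultdict

# (entity pair, connecting phrase observed in text)
contexts = [
    (("JimGray", "Berkeley"), "graduated at"),
    (("HectorGarciaMolina", "Stanford"), "graduated at"),
    (("JimGray", "MikeHarrison"), "was advised by"),
]

# Trivial clustering: one cluster per distinct connecting phrase.
clusters = defaultdict(set)
for pair, phrase in contexts:
    clusters[phrase].add(pair)

print(dict(clusters))
```

Each resulting cluster ("graduated at", "was advised by") is a candidate for a new, previously unmodeled relation.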
154. Machine Reading
• Named Entity Resolution/Extraction
• Relation Extraction
• Co-reference and Synonym Resolution
• Relation Discovery
• Inference
• Knowledge Base Representation
• Document/Sentence Understanding (Micro-Reading)
155. Inference
• Inference is the act or process of deriving logical conclusions from premises known or assumed to be true [Wikipedia]
156. Inference
• Manually crafted inference rules
• Automatically learned inference rules
• Data mining the Knowledge Base
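A manually crafted inference rule applied to a small knowledge base can be sketched as follows. The rule (if X graduatedAt U and U locatedIn C, infer livedIn(X, C)) and the facts are invented for illustration; they are not rules from any of the presented systems.

```python
# Tiny KB of (relation, subject, object) triples.
kb = {
    ("graduatedAt", "JimGray", "Berkeley"),
    ("locatedIn", "Berkeley", "California"),
}

def infer_lived_in(kb):
    """Apply one hand-written rule: graduatedAt(X,U) & locatedIn(U,C) => livedIn(X,C)."""
    inferred = set()
    for rel1, x, u in kb:
        for rel2, u2, c in kb:
            if rel1 == "graduatedAt" and rel2 == "locatedIn" and u == u2:
                inferred.add(("livedIn", x, c))
    return inferred

print(infer_lived_in(kb))
```

Automatically learned rules have the same shape; the difference is that the rule bodies are mined from the KB instead of written by hand.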
157. Machine Reading
• Named Entity Resolution/Extraction
• Relation Extraction
• Co-reference and Synonym Resolution
• Relation Discovery
• Inference
• Knowledge Base Representation
• Document/Sentence Understanding (Micro-Reading)
162. Document/Sentence Understanding (MicroRead)
• "The scientist observed the butterfly with the blue circle"
• "The scientist observed the butterfly with the blue microscope"
163. Document/Sentence Understanding (MicroRead) — the same pair of sentences, highlighting the ambiguous attachment of "with the blue …".
164. Outline
• Machine Learning
• Machine Reading
• Reading the Web
  – DBPedia
  – YAGO
  – KnowItAll
  – NELL
165. Outline
• Machine Learning
• Machine Reading
• Reading the Web
  – DBPedia
  – YAGO
  – KnowItAll
  – NELL
168. DBPedia
• Mapping Wikipedia semi-structured data into RDF triples
• Semi-structured data: the "Low-Hanging Fruit"
169. DBPedia
• How to Read Wikipedia semi-structured data? [Lehmann et al., 2014]
  – Parse the Wikipedia markup language
  – Overcome the lack-of-standards problem
    • The same properties might have different names
    • "Datebirth" and "Birth_date"
    • "Birthplace" and "Birth_place"
  – Instead of "Modeling the World", try to structure the available information
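The lack-of-standards problem above can be sketched with a small attribute-normalization table. The synonym table and property names here are a tiny invented stand-in for DBpedia's community-maintained infobox mappings, not its actual mapping language.

```python
# Different raw infobox attribute names mapped to one canonical property.
ATTRIBUTE_MAP = {
    "datebirth": "birthDate",
    "birth_date": "birthDate",
    "birthplace": "birthPlace",
    "birth_place": "birthPlace",
}

def to_triples(subject, infobox):
    """Map raw infobox attributes of one article to normalized RDF-style triples."""
    return {(subject, ATTRIBUTE_MAP.get(k.lower(), k), v)
            for k, v in infobox.items()}

triples = to_triples("Leonard_Cohen", {"Birth_date": "1934-09-21",
                                       "Birthplace": "Montreal"})
print(triples)
```

Normalizing attribute names before emitting triples is what lets articles with differently named infobox fields end up with comparable RDF properties.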
174. YAGO
• Yet Another Great Ontology - YAGO
• Main Goal: building a conveniently searchable, large-scale, highly accurate knowledge base of common facts in a machine-processable representation
176. YAGO
• Turn the Web into a Knowledge Base [Weikum et al., 2009]
  – Building a comprehensive Knowledge Base of human knowledge
  – knowledge from Wikipedia and WordNet
  – the ontology checks itself for precision
177. YAGO
• The knowledge base is automatically constructed from Wikipedia
• Each article in Wikipedia becomes an entity in the KB (e.g., since Leonard Cohen has an article in Wikipedia, LeonardCohen becomes an entity in YAGO).
185. YAGO
• Certain categories are exploited to deliver type information (e.g., the article about Leonard Cohen is in the category Canadian male poets, so he becomes a Canadian poet).
188. YAGO
• For each category of a page [Hoffart et al., 2012]
– Using shallow parsing, determine the head word of the category name. In the example of Canadian poets, the head word is poets.
– If the head word is in the plural, propose the category as a class and the article entity as an instance
– Link the class to the WordNet taxonomy (most frequent sense of the head word in WordNet)
• Only countable nouns can appear in plural form
• Only countable nouns can be ontological classes
• Thematic categories (such as Canadian poetry) are different from conceptual categories
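The heuristic above can be sketched in a few lines. This is a deliberately crude stand-in: real YAGO uses shallow parsing to find the head word and WordNet plus exception lists to validate it, whereas here the head word is simply the last token and the plural test is a naive suffix check:

```python
# Hypothetical sketch of YAGO's category heuristic:
# plural head word -> the category is proposed as a class,
# and the article entity becomes an instance of it.

def head_word(category):
    """Naive stand-in for shallow parsing: take the last word as the head."""
    return category.split()[-1]

def proposes_class(category):
    """Crude plural test; YAGO instead checks WordNet and exception lists."""
    head = head_word(category)
    return head.endswith("s") and not head.endswith("ss")

print(proposes_class("Canadian male poets"))  # True  -> class; article = instance
print(proposes_class("Canadian poetry"))      # False -> thematic category, skipped
```

The suffix test correctly separates the two slide examples, but a production system needs morphological analysis (e.g., "women", "glass") and the exception lists described on the next slide.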
189. YAGO
• Head words that are not conceptual even though they appear in the plural (such as stubs in Canadian poetry stubs) are in the first list of exceptions.
• Words that do not map to their most frequent sense, but to a different sense, are in the second exception list
– The word capital, e.g., refers to the main city of a country in the majority of cases and not to the financial amount, which is the most frequent sense in WordNet.
190. YAGO
• About 100 manually defined relations
– wasBornOnDate
– locatedIn
– hasPopulation
• Categories and infoboxes are exploited to deliver facts (instances of relations).
• Manually defined patterns map categories and infobox attributes to fact templates
– Infobox attribute born=Montreal, thus wasBornIn(LeonardCohen, Montreal)
• Pattern-based extractions resulted in 2 million extracted entities and 20 million facts
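The mapping from infobox attributes to fact templates can be sketched as a lookup table plus an instantiation step. The template table below is a made-up fragment following the slide's born=Montreal example, not YAGO's actual pattern files:

```python
# Illustrative sketch of manually defined patterns that map infobox
# attributes to fact templates (relation names follow the slide's example).
TEMPLATES = {
    "born": "wasBornIn",          # born=Montreal -> wasBornIn(entity, Montreal)
    "population": "hasPopulation",
}

def attribute_to_fact(entity, attribute, value):
    """Instantiate a fact template for one infobox attribute, if mapped."""
    relation = TEMPLATES.get(attribute)
    if relation is None:
        return None   # unmapped attributes produce no fact
    return (entity, relation, value)

print(attribute_to_fact("LeonardCohen", "born", "Montreal"))
# ('LeonardCohen', 'wasBornIn', 'Montreal')
```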
191. YAGO
• Based on declarative rules (stored in text files)
• The rules take the form of subject-predicate-object triples, so they are basically additional facts
• There are different types of rules
192. YAGO
• Factual rules: definition of all relations, their domains and ranges, and the definition of the classes that make up the YAGO hierarchy of literal types.
• Implication rules: express that if certain facts appear in the knowledge base, then another fact shall be added. Horn-clause rules.
• Replacement rules: for interpreting micro-formats, cleaning up HTML tags, and normalizing numbers.
• Extraction rules: apply primarily to patterns found in the Wikipedia infoboxes, but also to Wikipedia categories, article titles, and even other regular elements in the source such as headings, links, or references.
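A replacement rule of the kind described above can be sketched with a couple of regular expressions. These regexes are hypothetical stand-ins, not YAGO's actual rule files, which are declarative text files:

```python
import re

# Sketch of what a YAGO-style "replacement rule" does before extraction:
# strip HTML tags and normalize number formats.
def apply_replacement_rules(text):
    text = re.sub(r"<[^>]+>", "", text)               # clean up HTML tags
    text = re.sub(r"(\d),(?=\d{3}\b)", r"\1", text)   # 1,704,694 -> 1704694
    return text

print(apply_replacement_rules("<b>Population:</b> 1,704,694"))
# Population: 1704694
```

Running such rules first means the downstream extraction rules can match against clean, uniform text instead of raw wiki markup.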
193. YAGO
• These rule types support the Machine Reading components of Knowledge Representation, Inference, and Information Extraction
204. YAGO
• Ontology Representation
– Entities and Relations of public interest
– Formats: TSV, RDF, XML, N3, Web Interface
– Learns
• Instances and patterns from Wikipedia;
• Taxonomy from WordNet;
• Geotag information from GeoNames.
205. YAGO
• Named Entity Resolution/Extraction [Theobald & Weikum, 2012]
– Based on rules and patterns extracted from Wikipedia
– Disambiguation is a relevant issue
– Semi-structured data: the “Low-Hanging Fruit”
• Wikipedia infoboxes & categories
• HTML lists & tables, etc.
206. YAGO
• Named Entity Resolution/Extraction, as above, relies on Natural Language Processing and Machine Learning
207. Machine Reading
It’s about the disappearance forty years ago of Harriet Vanger, a young scion of one of the wealthiest families in Sweden, and about her uncle, determined to know the truth about what he believes was her murder.
Blomkvist visits Henrik Vanger at his estate on the tiny island of Hedeby. The old man draws Blomkvist in by promising solid evidence against Wennerström. Blomkvist agrees to spend a year writing the Vanger family history as a cover for the real assignment: the disappearance of Vanger's niece Harriet some 40 years earlier. Hedeby is home to several generations of Vangers, all part owners in Vanger Enterprises. Blomkvist becomes acquainted with the members of the extended Vanger family, most of whom resent his presence. He does, however, start a short-lived affair with Cecilia, the niece of Henrik. After discovering that Salander has hacked into his computer, he persuades her to assist him with research. They eventually become lovers, but Blomkvist has trouble getting close to Lisbeth, who treats virtually everyone she meets with hostility. Ultimately the two discover that Harriet's brother Martin, CEO of Vanger Industries, is secretly a serial killer.
A 24-year-old computer hacker sporting an assortment of tattoos and body piercings supports herself by doing deep background investigations for Dragan Armansky, who, in turn, worries that Lisbeth Salander is “the perfect victim for anyone who wished her ill."
This slide was adapted from [Hady et al., 2011]
209. YAGO
• Relation Extraction [Theobald & Weikum, 2012]
– Based on rules and patterns extracted from Wikipedia
– Semi-structured data: the “Low-Hanging Fruit”
• Wikipedia infoboxes & categories
• HTML lists & tables, etc.
210. YAGO
• Relation Extraction, as above, relies on Natural Language Processing and Machine Learning
214. Machine Reading
• The same passage, annotated: repeated mentions of each entity are marked “same”, and relations between entities are labeled uncleOf, owns, hires, and headOf
This slide was adapted from [Hady et al., 2011]
216. YAGO
• YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages
– New relations specifically designed to cover time, space, and context
– Translated Wikipedia pages as sources for other languages
217. YAGO
• YAGO3 [Mahdisoltani, Biega & Suchanek, 2015]
– An extension of the YAGO knowledge base
– Built from the Wikipedias in multiple languages
– Fuses the multilingual information with the English WordNet
– Uses categories, infoboxes, and Wikidata to learn the meaning of infobox attributes across languages
– 10 different languages
– Precision of 95%-100% in the attribute mapping
– Enlarges YAGO by 1M new entities and 7M new facts
218. YAGO
• More on YAGO:
– Very nice tutorials:
• “Knowledge Bases for Web Content Analytics” at WWW 2015, Florence, May 2015
• “Semantic Knowledge Bases from Web Sources” at IJCAI 2011, Barcelona, July 2011
• “Harvesting Knowledge from Web Data and Text” at CIKM 2010, Toronto, October 2010
• “From Information to Knowledge: Harvesting Entities and Relationships from Web Sources” at PODS 2010, Indianapolis, June 2010
– Project Website:
• http://www.mpi-inf.mpg.de/yago-naga/
222. YAGO
• More on YAGO (http://www.mpi-inf.mpg.de/yago-naga/)
?X <hasChild> ?C
?Y <hasChild> ?C
=> ?X <isMarriedTo> ?Y
223. YAGO
• The rule above illustrates Machine Learning and Inference
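The implication rule shown above, (?X hasChild ?C) and (?Y hasChild ?C) => (?X isMarriedTo ?Y), can be forward-chained over a triple store in a few lines. The entity names in this toy knowledge base are made up for illustration:

```python
# Runnable sketch of forward-chaining one Horn-clause implication rule
# over a toy set of (subject, predicate, object) triples.
kb = {
    ("Marie", "hasChild", "Irene"),
    ("Pierre", "hasChild", "Irene"),
    ("Eve", "hasChild", "Abel"),
}

def apply_rule(kb):
    """Return every (x, isMarriedTo, y) supported by a shared child."""
    inferred = set()
    for x, p1, c1 in kb:
        for y, p2, c2 in kb:
            if p1 == p2 == "hasChild" and c1 == c2 and x != y:
                inferred.add((x, "isMarriedTo", y))
    return inferred

print(sorted(apply_rule(kb)))
# [('Marie', 'isMarriedTo', 'Pierre'), ('Pierre', 'isMarriedTo', 'Marie')]
```

Note that the rule can overgenerate (two parents of the same child need not be married), which is one reason automatically learned rules are usually weighted with confidence scores rather than applied as hard logic.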
225. Outline
• Machine Learning
• Machine Reading
• Reading the Web
– DBPedia
– YAGO
– KnowItAll
– NELL
230. KnowItAll
• Motivation: a New Paradigm for Search [Etzioni, 2008]
– The future of Web Search
– Read the Web instead of retrieving Web pages to perform Web Search
231. KnowItAll
• Information Extraction (IE) + tractable inference
– IE(sentence) = who did what?
• speaker(P. Smith, ECMLPKDD2012)
– Inference = uncover implicit information
• Will the Pittsburgh Steelers be champions again?
• Open Information Extraction [Banko et al., 2007]
232. Open Information Extraction [Banko et al., 2007]
• Open IE systems avoid specific nouns and verbs
• Extractors are unlexicalized: formulated only in terms of
– syntactic tokens (e.g., part-of-speech tags)
– closed-word classes (e.g., of, in, such as)
• Open IE extractors focus on generic ways in which relationships are expressed in English
– naturally generalizing across domains
233. Open Information Extraction [Banko et al., 2007]
• Focusing on generic ways of expressing relationships amounts to Relation Discovery
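The idea of an unlexicalized extractor can be made concrete with a small sketch. This is not the actual TextRunner/ReVerb implementation: the POS patterns and the noun-phrase heuristic are drastically simplified, and the input is assumed to be already POS-tagged:

```python
# Hypothetical sketch of an unlexicalized Open IE extractor: relation
# phrases are matched purely on part-of-speech tags (a verb optionally
# followed by a preposition), never on specific words.
def extract(tagged):
    """tagged: list of (token, pos) pairs; returns (arg1, rel, arg2) triples."""
    triples = []
    for i, (tok, pos) in enumerate(tagged):
        if pos.startswith("VB"):                         # any verb form
            rel = [tok]
            j = i + 1
            if j < len(tagged) and tagged[j][1] == "IN":  # verb + preposition
                rel.append(tagged[j][0])
                j += 1
            # crude argument heuristic: nearest nouns on either side
            left = [t for t, p in tagged[:i] if p.startswith("NN")]
            right = [t for t, p in tagged[j:] if p.startswith("NN")]
            if left and right:
                triples.append((left[-1], " ".join(rel), right[0]))
    return triples

sent = [("Cohen", "NNP"), ("was", "VBD"), ("born", "VBN"),
        ("in", "IN"), ("Montreal", "NNP")]
print(extract(sent))
```

Because the pattern mentions only POS tags and closed-class words, the same extractor applies unchanged to sentences about sports, science, or business, which is exactly the cross-domain generalization the slide describes.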
234. Open Information Extraction
• Open IE systems are traditionally based on three steps [Etzioni et al., 2011]:
– 1. Label: Sentences are automatically labeled with extractions using heuristics or distant supervision. Unsupervised Learning
235. Open Information Extraction
– 2. Learn: A relation phrase extractor is learned using a sequence-labeling graphical model (e.g., a CRF). Supervised Learning