The document discusses lessons learned from the author's personal journey in search engineering. It covers insights from library science about treating search as an information-seeking context and communicating with users. It also discusses the importance of entity detection and how to leverage corpus features to improve extraction. The author realized that queries vary in difficulty and systems need to recognize this and adapt accordingly. The key takeaway is that search should be treated as a communication problem rather than just a ranking task.
20.
for i in [1..n]
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj+1 wj+2 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs ∪ {s}
                a.prob ← b.prob · Pc(s)
                B[i] ← B[i] ∪ {a}
    sort B[i] by prob
    truncate B[i] to size k
People search for entities. Recognize them!
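The pseudocode above can be sketched in Python. This is an illustrative translation, not the talk's implementation: an empty-segmentation base case B[0] replaces the slide's separate handling of the whole-prefix segment, and the phrase scores in Pc are invented toy numbers.

```python
def segmentations(words, Pc, k=10):
    """Top-k query segmentations via the slide's dynamic program.

    B[i] holds the k most probable segmentations of words[0:i].
    Pc(s) scores a candidate segment (e.g., a normalized phrase
    frequency); a segmentation's probability is the product of its
    segments' scores.
    """
    n = len(words)
    B = [[] for _ in range(n + 1)]
    B[0] = [([], 1.0)]  # empty segmentation with probability 1
    for i in range(1, n + 1):
        candidates = []
        for j in range(i):  # new segment covers words[j:i]
            s = " ".join(words[j:i])
            p = Pc(s)
            if p > 0:
                for segs, prob in B[j]:
                    candidates.append((segs + [s], prob * p))
        candidates.sort(key=lambda c: c[1], reverse=True)
        B[i] = candidates[:k]  # keep only the k best
    return B[n]

# Toy phrase probabilities, purely illustrative.
PHRASES = {"new york": 0.3, "new": 0.1, "york": 0.1, "times": 0.2,
           "new york times": 0.25}
top = segmentations("new york times".split(),
                    lambda s: PHRASES.get(s, 0.0))
```

With these toy scores the single-segment reading "new york times" outranks "new york" + "times".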
22. Problem: entity detection systems process each document separately. Why not take advantage of corpus features?
23. Give your documents the right to vote!
Use a high-recall method to collect candidates.
• e.g., all title-case spans of words other than a single word beginning a sentence.
Process each document separately.
• Each candidate is assigned an entity type, or no type at all.
If a candidate is mostly assigned a single entity type, extrapolate to all its occurrences.
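The voting step above can be sketched in a few lines. A minimal sketch, with assumptions: the "mostly" criterion is taken to be a simple >50% majority, and the labels and type names are invented.

```python
from collections import Counter, defaultdict

def vote_entity_types(per_doc_labels, majority=0.5):
    """Let the documents vote on each candidate's entity type.

    per_doc_labels: (candidate, type_or_None) pairs emitted by a
    high-recall collector run on each document separately. If a
    candidate is mostly assigned a single type, extrapolate that
    type to all of its occurrences.
    """
    votes = defaultdict(Counter)
    for candidate, etype in per_doc_labels:
        votes[candidate][etype] += 1
    decided = {}
    for candidate, counter in votes.items():
        etype, count = counter.most_common(1)[0]
        # Extrapolate only when one real type wins a clear majority.
        if etype is not None and count / sum(counter.values()) > majority:
            decided[candidate] = etype
    return decided

# Hypothetical per-document labels: the local detector misses one
# "San Francisco" occurrence and disagrees with itself on "Paris".
labels = [("San Francisco", "LOC"), ("San Francisco", "LOC"),
          ("San Francisco", None), ("Paris", "LOC"), ("Paris", "PER")]
decided = vote_entity_types(labels)
```

The corpus-level vote recovers the missed "San Francisco" occurrence, while the split vote on "Paris" leaves it undecided.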
24. Looking for topics? Use idf, and its cousin ridf.
Inverse document frequency (idf)
• Too low? Probably a stop word.
• Too high? Could be noise.
Residual inverse document frequency (ridf)
• Predict idf using a Poisson model.
• ridf is the difference between observed idf and predicted idf.
"a good keyword is far from Poisson" [Church and Gale, 1995]
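Residual idf can be computed directly: predict idf from a Poisson model in which a term's occurrences scatter independently across documents, then take the difference. A minimal sketch, with invented collection statistics:

```python
import math

def idf(df, D):
    """Inverse document frequency: -log2(df / D)."""
    return -math.log2(df / D)

def ridf(df, cf, D):
    """Residual idf: observed idf minus Poisson-predicted idf.

    A Poisson model with rate cf / D (collection frequency over the
    number of documents) predicts that the term appears in
    D * (1 - e^(-cf/D)) documents. Good keywords are burstier than
    Poisson: their observed df is lower than predicted, so their
    ridf is high.
    """
    predicted_df_fraction = 1 - math.exp(-cf / D)
    return idf(df, D) + math.log2(predicted_df_fraction)

# Invented statistics: two terms with the same collection frequency,
# one concentrated in a few documents, one spread thinly.
D, cf = 100_000, 1_000
bursty = ridf(df=100, cf=cf, D=D)   # topical term: high ridf
diffuse = ridf(df=990, cf=cf, D=D)  # Poisson-like term: ridf near zero
```

Both terms occur 1,000 times, so plain frequency cannot separate them; ridf can.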
34.
We can segment the information need from the query.
36. And we can look at our relevance scores.
[Score distribution plots: Navigational vs. Exploratory]
37. There are many pre- and post-retrieval signals. [Claudia Hauff, Query Difficulty for Digital Libraries, 2009]
38. Take-away for search engine developers: queries vary in difficulty. Recognize and adapt.
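Two illustrative difficulty signals, one pre-retrieval and one post-retrieval. The specific formulas and the example numbers are assumptions for the sketch, not the talk's recommendations; the literature offers many variants.

```python
import math
import statistics

def avg_idf(query_terms, df, D):
    """Pre-retrieval signal: mean idf of the query's terms.

    df maps each term to its document frequency; D is the collection
    size. A query made entirely of common words tends to be harder.
    """
    return statistics.mean(-math.log2(df.get(t, 1) / D)
                           for t in query_terms)

def top_score_drop(scores):
    """Post-retrieval signal: relative drop from the top score to the
    mean of the remaining scores. A sharp drop suggests a navigational
    query with one clear answer; a flat curve suggests an exploratory
    query.
    """
    best, rest = scores[0], scores[1:]
    return (best - statistics.mean(rest)) / best

# Invented examples on both sides of each signal.
common = avg_idf(["the", "who"], {"the": 90_000, "who": 50_000},
                 D=100_000)
rare = avg_idf(["coltrane"], {"coltrane": 50}, D=100_000)
navigational = top_score_drop([9.5, 3.1, 2.9, 2.8, 2.7])
exploratory = top_score_drop([4.1, 4.0, 3.9, 3.9, 3.8])
```

A system that computes signals like these before and after retrieval can route hard queries to heavier processing, which is one way to "recognize and adapt."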
39. Review
1. Lessons from Library Science
• Act like a librarian. Communicate with users.
2. Adventures with Information Extraction
• Entity detection is crucial. And it isn't that hard.
3. A Moment of Clarity
• Queries vary in difficulty. Recognize and adapt.