Socializing Search. Professionally.

Sriram Sankar
Principal Staff Engineer
Recruiting Solutions

Daniel Tunkelang
Head, Query Understanding

Whether you’ve tried to find an Apache committer…

you’ve probably used LinkedIn Search.


Let’s talk about…

• Infrastructure

• Quality

LinkedIn Search leverages the economic graph.


Social means that relevance is highly personalized.


Machine-learned ranking, socially.
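• Relevance models incorporate user features:
score = P(Document | Query, User)

• Our model: a decision tree with logistic regression leaves.
[Figure: decision tree. Internal nodes test a feature (x2 = ?, x10 < 0.1234 ?); each leaf holds its own logistic regression over transformed features:
b0 + b1 T(x1) + ... + bn xn
a0 + a1 P(x1) + ... + an Q(xn)
g0 + g1 R(x1) + ... + gn Q(xn)]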
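A minimal sketch of a tree with logistic regression leaves, to make the model shape concrete. The class names, the split threshold, and all weights are illustrative assumptions rather than LinkedIn's actual model, and the feature transforms shown on the slide are assumed to have been applied to the inputs already.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class Leaf:
    """Logistic regression leaf: sigmoid(bias + sum(w_i * x_i))."""
    bias: float
    weights: Sequence[float]

    def predict(self, x: Sequence[float]) -> float:
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Split:
    """Internal node: routes the feature vector on a boolean test."""
    test: Callable[[Sequence[float]], bool]
    if_true: "Node"
    if_false: "Node"

    def predict(self, x: Sequence[float]) -> float:
        branch = self.if_true if self.test(x) else self.if_false
        return branch.predict(x)


Node = Union[Leaf, Split]

# Hypothetical two-leaf model; every number here is made up.
model = Split(
    test=lambda x: x[0] < 0.1234,
    if_true=Leaf(bias=0.0, weights=[1.0, 0.0]),
    if_false=Leaf(bias=-1.0, weights=[0.0, 2.0]),
)

score = model.predict([0.05, 0.5])  # routed to the if_true leaf
```

The tree picks which linear model applies, so different segments of users or queries can weight the same features differently.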

LinkedIn’s focus: entity-oriented search.

Company

Name
Search

Employees

Jobs


Query understanding can act as a relevance filter.

for i in [1..n]
    B[i] ← {}
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj+1 wj+2 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs U {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] U {a}
    sort B[i] by prob, highest first
    truncate B[i] to size k
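The segmentation beam search above can be sketched in runnable form as follows. This is one possible reading of the pseudocode, not the production code: the function name `segment`, the tuple representation of a segmentation, and the toy probability table `TOY_PC` are assumptions made for illustration. An empty-prefix entry at position 0 lets the "whole prefix is one segment" case share the inner loop.

```python
from typing import Callable, List, Tuple

# A segmentation is (segments, probability).
Segmentation = Tuple[Tuple[str, ...], float]


def segment(words: List[str], pc: Callable[[str], float], k: int = 5) -> List[Segmentation]:
    """Return up to k segmentations of `words`, most probable first."""
    n = len(words)
    # beams[i] holds the best segmentations of the first i words.
    beams: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    beams[0] = [((), 1.0)]  # empty prefix with probability 1
    for i in range(1, n + 1):
        candidates: List[Segmentation] = []
        for j in range(i):
            s = " ".join(words[j:i])  # candidate final segment
            p = pc(s)
            if p > 0:
                for segs, prob in beams[j]:
                    candidates.append((segs + (s,), prob * p))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams[i] = candidates[:k]  # keep only the k best
    return beams[n]


# Toy segment probabilities; the values are invented for the example.
TOY_PC = {"warren": 0.1, "buffett": 0.1, "warren buffett": 0.5}

result = segment("warren buffett".split(), lambda s: TOY_PC.get(s, 0.0), k=2)
# result[0] is (("warren buffett",), 0.5): the single-entity reading wins.
```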

Less is more.
warren buffett


Coming soon: entity-driven search assist.
link
Jobs at LinkedIn
People currently working at LinkedIn
People who used to work at LinkedIn

Search

Infrastructure

Lucene
• The index: a map from terms to documents
• Provides an API to add documents to the index and remove them from it
• Provides an API to query the index

[Figure: two sample documents (1 and 2) made of filler text plus the terms Daniel, Sriram, and LinkedIn, shown alongside their inverted index and forward index.]
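To make the two structures concrete, here is a toy in-memory index with add, remove, and query operations. It is a sketch of the idea only and does not use or mirror Lucene's API; the class `ToyIndex`, its method names, and the whitespace tokenization are assumptions.

```python
from collections import defaultdict
from typing import Dict, Set


class ToyIndex:
    def __init__(self) -> None:
        # Inverted index: term -> ids of documents containing it.
        self.inverted: Dict[str, Set[int]] = defaultdict(set)
        # Forward index: document id -> stored text.
        self.forward: Dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.remove(doc_id)  # replace any earlier version
        self.forward[doc_id] = text
        for term in set(text.lower().split()):
            self.inverted[term].add(doc_id)

    def remove(self, doc_id: int) -> None:
        text = self.forward.pop(doc_id, None)
        if text is None:
            return
        for term in set(text.lower().split()):
            self.inverted[term].discard(doc_id)
            if not self.inverted[term]:
                del self.inverted[term]  # drop empty posting lists

    def query(self, *terms: str) -> Set[int]:
        """AND query: ids of documents containing every term."""
        if not terms:
            return set()
        postings = [self.inverted.get(t.lower(), set()) for t in terms]
        return set.intersection(*postings)


index = ToyIndex()
index.add(1, "Daniel works at LinkedIn")
index.add(2, "Sriram works at LinkedIn")
both = index.query("linkedin")              # {1, 2}
only_daniel = index.query("daniel", "linkedin")  # {1}
```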

A standard scoring capability is built in


• Extremely easy to build a search engine
• But difficult to get sophisticated

The LinkedIn Search Stack
[Diagram: Request → Query Rewriter → Index Retrieval → Scorer → Sorter/Blender → Response. Offline Data Building supplies data to the stack, and Live Updates supply updates.]

Search Index Served by Lucene
• Inverted index
• Forward index
• Static rank based document ordering

Offline Data Builds on Hadoop
• Multi-stage map-reduce pipeline allows complex data processing
• Produces a sharded, single-segment Lucene index with documents sorted by static rank
• Produces data models for use in query rewriting
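The sharding and static-rank ordering can be illustrated with a small in-memory sketch. The partitioning rule (document id modulo shard count), the `static_rank` field name, and the function `build_shards` are assumptions for the example; the slides do not say how documents are assigned to shards, and the real build runs as a map-reduce pipeline.

```python
from typing import Dict, List


def build_shards(docs: List[Dict], num_shards: int) -> List[List[Dict]]:
    """Partition docs into shards, each sorted best-static-rank first."""
    shards: List[List[Dict]] = [[] for _ in range(num_shards)]
    for doc in docs:
        # Assumed partitioning rule: id modulo the number of shards.
        shards[doc["id"] % num_shards].append(doc)
    for shard in shards:
        shard.sort(key=lambda d: d["static_rank"], reverse=True)
    return shards


# Made-up documents and static ranks.
docs = [
    {"id": 1, "static_rank": 0.2},
    {"id": 2, "static_rank": 0.9},
    {"id": 3, "static_rank": 0.7},
    {"id": 4, "static_rank": 0.1},
]
shards = build_shards(docs, num_shards=2)
shard_ids = [[d["id"] for d in shard] for shard in shards]  # [[2, 4], [3, 1]]
```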

Live Data Updates
• Feed-based framework to support updates to offline data builds
• Lucene enhanced with a partial index update capability

Query Rewriting (and Planning)
• Accepts the raw query and user metadata
• Produces a Lucene retrieval query and metadata for scoring
• May use data models built offline
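A hypothetical sketch of the rewriter's contract: raw query and user metadata in, retrieval query and scoring metadata out. The term-to-field table `TERM_FIELDS` stands in for a data model built offline, and the field names, the `RewrittenQuery` type, and the metadata keys are all invented for illustration. The output string uses the `field:term` form of Lucene's classic query syntax.

```python
from dataclasses import dataclass, field
from typing import Dict

# Stand-in for an offline-built model mapping terms to the field they
# most likely refer to (hypothetical contents).
TERM_FIELDS = {"linkedin": "company", "engineer": "title"}


@dataclass
class RewrittenQuery:
    retrieval_query: str
    scoring_metadata: Dict[str, object] = field(default_factory=dict)


def rewrite(raw_query: str, user: Dict[str, object]) -> RewrittenQuery:
    clauses = []
    fields = []
    for term in raw_query.lower().split():
        fld = TERM_FIELDS.get(term, "text")  # fall back to free text
        fields.append(fld)
        clauses.append(f"{fld}:{term}")
    return RewrittenQuery(
        retrieval_query=" AND ".join(clauses),
        scoring_metadata={"user_id": user.get("id"), "fields": fields},
    )


rq = rewrite("LinkedIn engineer", {"id": 42})
# rq.retrieval_query == "company:linkedin AND title:engineer"
```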

Index Retrieval
• The Lucene query built by the query rewriter is used to retrieve documents from the Lucene index
• Documents are retrieved in static rank order (best document first)
• Retrieval may be terminated early, because documents arrive in static rank order
• No scoring is performed during retrieval
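Early termination over a static-rank-ordered index can be sketched as below. The function `retrieve`, the representation of a document as an id plus a set of terms, and the sample data are assumptions for illustration; the point is only that the scan can stop at the first `limit` matches because better documents come first.

```python
from typing import Iterable, List, Set, Tuple

Doc = Tuple[int, Set[str]]  # (doc id, terms in the document)


def retrieve(docs_by_static_rank: Iterable[Doc], required: Set[str], limit: int) -> List[int]:
    """Return the first `limit` matching doc ids, scanning best-first."""
    hits: List[int] = []
    if limit <= 0:
        return hits
    for doc_id, terms in docs_by_static_rank:
        if required <= terms:  # document contains every required term
            hits.append(doc_id)
            if len(hits) == limit:
                break  # early termination: later docs rank lower
    return hits


# Made-up documents, already ordered by static rank (best first).
DOCS: List[Doc] = [
    (7, {"a", "b"}),
    (3, {"a"}),
    (9, {"a", "b"}),
    (1, {"a", "b"}),
]

top_two = retrieve(DOCS, {"a", "b"}, limit=2)  # [7, 9]; doc 1 is never read
```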

Scoring
• Scoring is performed after retrieval
• Its input is the retrieved document (including its forward index entry), a description of how the retrieval query matched the document, and the scoring metadata produced by the rewriter
• Costly features can be computed offline during the index building process in Hadoop, e.g., tf/idf calculations
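A sketch of a post-retrieval scorer that takes the three inputs listed above. The linear scoring rule, the field weights, the connection boost, and every key name are hypothetical; the slides describe the inputs but not the scoring function itself.

```python
from typing import Dict, List, Sequence, Tuple

Retrieved = Tuple[Dict, Sequence[str]]  # (forward-index doc, matched fields)


def score(doc: Dict, matched_fields: Sequence[str], metadata: Dict) -> float:
    s = doc.get("static_rank", 0.0)  # feature assumed precomputed offline
    weights = metadata.get("field_weights", {})
    s += sum(weights.get(f, 0.0) for f in matched_fields)
    # Assumed personalization signal: boost the searcher's connections.
    if doc.get("id") in metadata.get("connections", set()):
        s += metadata.get("connection_boost", 0.0)
    return s


def rank(retrieved: List[Retrieved], metadata: Dict) -> List[int]:
    """Score every retrieved document, then sort best first."""
    scored = [(score(doc, matched, metadata), doc["id"]) for doc, matched in retrieved]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc_id for _, doc_id in scored]


# Made-up retrieval results and rewriter metadata.
retrieved = [
    ({"id": 1, "static_rank": 0.9}, ["text"]),
    ({"id": 2, "static_rank": 0.5}, ["name"]),
]
metadata = {
    "field_weights": {"name": 1.0, "text": 0.1},
    "connections": {2},
    "connection_boost": 0.5,
}
ordered = rank(retrieved, metadata)  # [2, 1]
```

In this example document 2 arrives second in static rank order but ends up first after scoring, which is why scoring is a separate step from retrieval.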

Summary
Quality
• LinkedIn Search leverages the economic graph.
• Social means that relevance is highly personalized.
• Less is more: query understanding is a relevance filter.
• Moving in the direction of suggesting structured queries.
System
• Powered by Lucene, but with additional components.
• Offline data builds on Hadoop, with partial index updates for live data.
• Index uses static ranking and early termination.
• Scoring is performed outside of Lucene.

Sriram Sankar
ssankar@linkedin.com
https://linkedin.com/in/sriramxsankar

Daniel Tunkelang
dtunkelang@linkedin.com
https://linkedin.com/in/dtunkelang