Socializing Search. Professionally.
Sriram Sankar
Principal Staff Engineer
Recruiting Solutions

Daniel Tunkelang
Head, Query Understanding
Whether you’ve tried to find an Apache committer…
…or an Apache commander,

3
you’ve probably used LinkedIn Search.

4
Let’s talk about…

• Infrastructure

• Quality
5
LinkedIn Search leverages the economic graph.

6
Social means that relevance is highly personalized.

7
Machine-learned ranking, socially.
• Relevance models incorporate user features:
  score = P(Document | Query, User)

• Our model: tree with logistic regression leaves.

[Figure: decision tree. Internal nodes test "X2 = ?" and "X10 < 0.1234 ?"; each leaf holds a logistic regression model:]
  b0 + b1 T(x1) + ... + bn xn
  a0 + a1 P(x1) + ... + an Q(xn)
  g0 + g1 R(x1) + ... + gn Q(xn)
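
A minimal sketch of how a tree with logistic regression leaves could produce a score: the tree routes a feature vector for a (query, user, document) triple to one leaf, and that leaf's logistic regression turns the features into a probability. The feature names, the categorical value "connected", and all weights are made up for illustration, and the per-feature transforms shown on the slide (T, P, Q, R) are left out.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical leaf models: leaf name -> (bias, per-feature coefficients).
LEAVES = {
    "a": (-1.0, {"x1": 2.0, "x3": 0.5}),
    "b": (0.0, {"x1": 1.0}),
    "g": (0.5, {"x1": -1.0, "x3": 1.5}),
}


def route(features):
    """Walk the (made-up) tree and return the name of the leaf reached."""
    if features["x2"] == "connected":   # categorical split, like "X2 = ?"
        return "b"
    if features["x10"] < 0.1234:        # numeric split, like "X10 < 0.1234 ?"
        return "a"
    return "g"


def score(features):
    """Apply the logistic regression stored at the leaf the tree selects."""
    bias, weights = LEAVES[route(features)]
    z = bias + sum(w * features[name] for name, w in weights.items())
    return sigmoid(z)
```

The tree decides which linear model applies, so different segments of (query, user) pairs can weight the same features differently.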
8
LinkedIn’s focus: entity-oriented search.

Company

Name
Search

Employees

Jobs

9
Query understanding can act as a relevance filter.

for i in [1..n]
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj+1 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs ∪ {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] ∪ {a}
    sort B[i] by prob
    truncate B[i] to size k
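
The segmentation procedure above can be sketched in Python as a beam search over prefixes of the query. This is an illustration under assumptions, not the production code: `pc` is a plain dictionary standing in for the segment probability Pc, the probabilities in the usage note are invented, and the tail segment starts at the word after position j so that no word is counted twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segmentation:
    segs: tuple   # segments chosen so far, in order
    prob: float   # product of the segment probabilities


def segment(words, pc, k=3):
    """Return up to k most probable segmentations of `words`, best first.

    `pc` maps a candidate segment (a string) to its probability; segments
    missing from `pc` have probability 0 and are never used.
    """
    n = len(words)
    beams = [[] for _ in range(n + 1)]   # beams[i] covers words[:i]
    for i in range(1, n + 1):
        whole = " ".join(words[:i])      # the prefix as a single segment
        if pc.get(whole, 0.0) > 0:
            beams[i].append(Segmentation((whole,), pc[whole]))
        for j in range(1, i):
            tail = " ".join(words[j:i])  # words j+1..i in 1-based terms
            p = pc.get(tail, 0.0)
            if p > 0:
                for b in beams[j]:
                    beams[i].append(Segmentation(b.segs + (tail,), b.prob * p))
        beams[i].sort(key=lambda a: a.prob, reverse=True)
        del beams[i][k:]                 # keep only the k best per prefix
    return beams[n]
```

For example, with `pc = {"warren": 0.1, "buffett": 0.1, "warren buffett": 0.5, "jobs": 0.4}`, calling `segment(["warren", "buffett", "jobs"], pc)` ranks the segmentation `("warren buffett", "jobs")` first. A query with no segmentation of nonzero probability returns an empty list.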

10
Less is more.
warren buffett

11
Coming soon: entity-driven search assist.
link
Jobs at LinkedIn
People currently working at LinkedIn
People who used to work at LinkedIn

Search
Infrastructure

Lucene
• Map of terms to documents – the index
• Provides an API to add documents to the index and remove them from it
• Provides an API to query the index
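
To make the three operations concrete, here is a toy in-memory index in Python. It is not Lucene's API (Lucene is a Java library with its own classes); the class name `TinyIndex` and its methods are invented for this sketch, and tokenization is a bare whitespace split.

```python
from collections import defaultdict


class TinyIndex:
    """Toy index offering the same three operations: add, remove, query."""

    def __init__(self):
        self.postings = defaultdict(set)   # inverted index: term -> doc ids
        self.docs = {}                     # forward index: doc id -> text

    def add(self, doc_id, text):
        self.remove(doc_id)                # re-adding replaces the old version
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        text = self.docs.pop(doc_id, None)
        if text is None:
            return                         # unknown document: nothing to do
        for term in set(text.lower().split()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]    # drop empty posting sets

    def query(self, *terms):
        """Return ids of documents containing every term (AND semantics)."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*sets)) if sets else []
```

The `postings` dictionary plays the role of the inverted index (term to documents) and `docs` the role of the forward index (document to content).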

13
[Figure: two sample documents, numbered 1 and 2, made of filler text ("BLAH") plus the terms Daniel, Sriram, and LinkedIn. Panels: Inverted Index (listing the terms Daniel, Sriram, LinkedIn) and Forward Index (listing documents 1 and 2).]
14
A standard scoring capability is built in

15
• Extremely easy to build a search engine
• But difficult to get sophisticated

16
The LinkedIn Search Stack
[Figure: Request → Query Rewriter → Index Retrieval → Scorer → Sorter/Blender → Response. Side inputs: Live Updates (arrow labeled "Updates") and Offline Data Building (arrow labeled "Data").]
17
Search Index Served by Lucene
• Inverted index
• Forward index
• Static-rank-based document ordering

18
Offline Data Builds on Hadoop
• Multi-stage map-reduce pipeline allows complex data processing
• Produces sharded, single-segment Lucene index with documents sorted by static rank
• Produces data models for use in query rewriting
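
As a rough illustration of the index layout described above, the sketch below partitions documents into shards and orders each shard by static rank, best first. It runs in memory rather than as a map-reduce job and does not write Lucene segments; the `id` and `static_rank` fields and the modulo sharding rule are assumptions made for the example.

```python
def build_shards(docs, num_shards):
    """Partition docs into shards, each sorted by static rank (best first).

    `docs` is a list of dicts with hypothetical "id" and "static_rank" keys.
    """
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        # Assumed sharding rule: document id modulo the number of shards.
        shards[doc["id"] % num_shards].append(doc)
    for shard in shards:
        shard.sort(key=lambda d: d["static_rank"], reverse=True)
    return shards
```

Because each shard is already in static rank order, a reader that walks a shard from the start sees the best documents first.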

19
Live Data Updates
• Feed-based framework to support updates to offline data builds
• Lucene enhanced with a partial index update capability

20
Query Rewriting (and Planning)
• Accepts raw query and user metadata
• Produces Lucene retrieval query and metadata for scoring
• May use data models built offline
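
The shape of the rewriting step can be sketched as a function from (raw query, user metadata) to (retrieval query, scoring metadata). Everything concrete here is hypothetical: the retrieval query is a plain dictionary rather than a Lucene query object, `entity_model` stands in for a data model built offline, and the user fields `id` and `connections` are invented.

```python
def rewrite(raw_query, user, entity_model):
    """Turn a raw query into (retrieval_query, scoring_metadata).

    `entity_model` maps a known phrase to the field it most likely
    refers to, e.g. {"warren buffett": "name"}.
    """
    text = raw_query.strip().lower()
    field = entity_model.get(text)
    if field is not None:
        # Whole query is a known entity: search it as a phrase in one field.
        retrieval_query = {"must": [{"field": field, "phrase": text}]}
    else:
        # Otherwise require every token, in any field.
        retrieval_query = {
            "must": [{"field": "*", "term": t} for t in text.split()]
        }
    scoring_metadata = {
        "user_id": user["id"],
        "connections": set(user["connections"]),
    }
    return retrieval_query, scoring_metadata
```

The retrieval query only decides which documents come back; the scoring metadata is carried along so a later scoring step can personalize the order.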

21
Index Retrieval
• Lucene query built by query rewriter is used to retrieve documents from the Lucene index
• Documents are retrieved in static rank order (best document first)
• Retrieval may be early-terminated – given that retrieval is in static rank order
• No scoring is performed during retrieval
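
Early termination follows directly from the ordering: if posting lists are stored best document first, retrieval can stop once it has enough matches, because anything later has a lower static rank. The sketch below assumes a hypothetical `postings` dictionary whose lists are all in the same static rank order. For brevity it builds lookup sets over the longer lists, which a real engine would avoid by merging the lists lazily.

```python
def retrieve(postings, terms, limit):
    """Return up to `limit` doc ids matching all terms, in static rank order.

    `postings[term]` is a list of doc ids sorted by static rank, best first.
    No scores are computed here; the order comes from the index alone.
    """
    if limit <= 0 or not terms:
        return []
    lists = sorted((postings.get(t, []) for t in terms), key=len)
    others = [set(lst) for lst in lists[1:]]
    hits = []
    for doc_id in lists[0]:               # shortest list, best document first
        if all(doc_id in s for s in others):
            hits.append(doc_id)
            if len(hits) == limit:
                break                     # early termination
    return hits
```

With `limit` set well above the number of results shown, a later scoring step still has room to reorder the candidates.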

22
Scoring
• Scoring is performed after retrieval
• Its input is the retrieved document (i.e., includes the forward index), a description of how the retrieval query matched the document, and the scoring metadata produced by the rewriter
• Costly features can be computed offline during the index building process in Hadoop – e.g., tf/idf calculations
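
A sketch of a scoring step with the three inputs listed above: the retrieved document (looked up in a forward index), a description of how the query matched, and metadata from the rewriter. The additive scoring formula, the field names, and the keys `field_weights`, `user_connections`, and `connection_boost` are all invented for illustration; the slides describe a machine-learned model, not this hand-written sum.

```python
def score_hits(hits, forward_index, scoring_metadata):
    """Score retrieved hits and return (doc_id, score) pairs, best first.

    Each hit is (doc_id, matched_fields): the document plus a description
    of which fields the retrieval query matched.
    """
    field_weights = scoring_metadata["field_weights"]
    connections = scoring_metadata["user_connections"]
    scored = []
    for doc_id, matched_fields in hits:
        doc = forward_index[doc_id]
        s = sum(field_weights.get(f, 0.0) for f in matched_fields)
        s += doc["static_rank"]            # feature precomputed offline
        if doc_id in connections:          # personalized feature
            s += scoring_metadata["connection_boost"]
        scored.append((doc_id, s))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

Note that the final order can differ from retrieval order: a document with a lower static rank can win once match quality and the searcher's connections are taken into account.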

23
Summary
Quality
• LinkedIn Search leverages the economic graph.
• Social means that relevance is highly personalized.
• Less is more: query understanding is a relevance filter.
• Moving in the direction of suggesting structured queries.
System
• Powered by Lucene, but with additional components.
• Offline data builds on Hadoop, partial index updates.
• Index uses static ranking and early termination.
• Scoring performed outside of Lucene.

24
Sriram Sankar
ssankar@linkedin.com
https://linkedin.com/in/sriramxsankar

Daniel Tunkelang
dtunkelang@linkedin.com
https://linkedin.com/in/dtunkelang
25
