Recruiting Solutions
Search at LinkedIn
Sriram Sankar, Principal Staff Engineer
Kumaresh Pattabiraman, Senior Product Manager
https://www.youtube.com/watch?v=obCHKPYHuhA
Search at LinkedIn
- Personalized professional search
- Part of a bigger product experience
- But a really big part of it
Some history . . .
Approach to Search
- Off-the-shelf components (Lucene)
- Extended to address Lucene limitations (Sensei, Bobo, Zoie, Content Store)
- Specialized verticals (Cleo, Krati)
- Stack adopted for other purposes (recommendations, newsfeed, ads, analytics, etc.)
Lucene
An open-source API that supports search functionality:
- Add new documents to the index
- Delete documents from the index
- Construct queries
- Search the index using the query
- Score the retrieved documents
The Search Index
- Inverted index: a mapping from (search) terms to the list of documents they appear in
- Forward index: a mapping from documents to metadata about them
[Figure: an inverted index and a forward index built from two example documents, one mentioning Kumaresh and LinkedIn, the other mentioning Sriram and LinkedIn; the inverted index is keyed by the terms Kumaresh, Sriram, and LinkedIn.]
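To make the two structures concrete, here is a minimal Python sketch that builds both from a toy corpus. The two documents are made up to mirror the figure, and the metadata stored in the forward index (document length) is a hypothetical example; a real Lucene index stores far richer per-document and per-term data.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build an inverted index (term -> sorted doc ids) and a forward
    index (doc id -> metadata) from a {doc_id: text} mapping."""
    inverted = defaultdict(set)
    forward = {}
    for doc_id, text in docs.items():
        terms = text.lower().split()
        for term in terms:
            inverted[term].add(doc_id)
        # Hypothetical per-document metadata: length in terms.
        forward[doc_id] = {"length": len(terms)}
    # Keep posting lists sorted by doc id so they can be merged cheaply.
    return {term: sorted(ids) for term, ids in inverted.items()}, forward


docs = {
    1: "blah Kumaresh blah LinkedIn blah",
    2: "blah Sriram blah LinkedIn blah blah",
}
inverted, forward = build_indexes(docs)
print(inverted["linkedin"])  # [1, 2]
print(forward[2])            # {'length': 6}
```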
The Search Index
- The lists are called posting lists
- Up to hundreds of millions of posting lists
- Up to hundreds of millions of documents
- Posting lists may contain as few as a single hit and as many as tens of millions of hits
- Terms can be
  - words in the document
  - inferred attributes about the document
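Answering a conjunctive query means intersecting posting lists, which is why they are kept sorted. The sketch below shows the classic two-pointer merge, starting from the shortest list so the working set shrinks early. The doc ids and the `industry:software` term (an inferred attribute, in the sense above) are illustrative assumptions.

```python
def intersect(a, b):
    """Intersect two posting lists sorted by doc id using a two-pointer merge."""
    i, j = 0, 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_all(posting_lists):
    """AND several posting lists together, starting from the shortest one."""
    if not posting_lists:
        return []
    ordered = sorted(posting_lists, key=len)
    result = ordered[0]
    for postings in ordered[1:]:
        if not result:
            break  # nothing left that could match
        result = intersect(result, postings)
    return result


postings = {
    "kumaresh": [3, 8, 42],                      # hypothetical doc ids
    "industry:software": [1, 3, 5, 13, 42, 99],  # an inferred-attribute term
}
print(intersect_all(list(postings.values())))  # [3, 42]
```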
Lucene Queries
- "Sriram Sankar"
- Sriram Kumaresh
- +Sriram +LinkedIn
- +Kumaresh connection:418001
- +Kumaresh industry:software connection:418001^4
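In Lucene's classic query syntax, a quoted string is a phrase, unprefixed terms are optional (OR-ed by default), `+` marks a term as required, `field:value` restricts a term to a field, and `^N` boosts a clause. The toy parser below handles only the forms in the examples above, to show how each query decomposes into clauses; it is not Lucene's `QueryParser`, and the `Clause` type is a hypothetical name.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    field: Optional[str]  # None means the default field
    text: str             # a single term or a phrase
    required: bool        # True when prefixed with '+'
    boost: float = 1.0    # taken from a trailing '^N'


# Optional '+', optional 'field:', then a quoted phrase or a bare term,
# then an optional '^N' boost.
TOKEN = re.compile(r'(\+)?(?:(\w+):)?("[^"]*"|[^\s^"]+)(?:\^(\d+(?:\.\d+)?))?')


def parse(query):
    clauses = []
    for plus, field, text, boost in TOKEN.findall(query):
        clauses.append(Clause(field or None, text.strip('"'), plus == "+",
                              float(boost) if boost else 1.0))
    return clauses


for clause in parse("+Kumaresh industry:software connection:418001^4"):
    print(clause)
```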
Lucene Scoring
- As documents are added to the index, Lucene maintains some metadata on the terms (e.g., term positions, tf/idf)
- Lucene accepts scoring information via query modifications, boosts, etc.
- Lucene assigns a score to each retrieved document using this information
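The sketch below shows how term statistics and query boosts can combine into a score, using a simplified textbook tf-idf (raw term frequency times `log(N / df)`). This is an assumption for illustration: Lucene's actual similarity formulas differ in their tf and idf transforms and normalizations.

```python
import math


def tfidf_score(query_terms, doc_terms, doc_freq, num_docs, boosts=None):
    """Sum over query terms of boost * tf * idf for one document.
    Simplified textbook tf-idf, not Lucene's exact similarity."""
    boosts = boosts or {}
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        df = doc_freq.get(term, 0)
        if tf == 0 or df == 0:
            continue  # term absent: contributes nothing
        idf = math.log(num_docs / df)  # rarer terms weigh more
        score += boosts.get(term, 1.0) * tf * idf
    return score


doc = ["kumaresh", "software", "software"]
print(tfidf_score(["kumaresh", "software"], doc,
                  doc_freq={"kumaresh": 1, "software": 2}, num_docs=4,
                  boosts={"software": 4.0}))
```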
Sensei
Layer over Lucene that provides:
- Sharding
- Cluster management
- Enhanced query language
Sensei BQL
SELECT *
FROM cars
WHERE price > 2000.00
USING RELEVANCE MODEL my_model (favoriteColor:"black", favoriteTag:"cool")
  DEFINED AS (String favoriteColor, String favoriteTag)
  BEGIN
    float boost = 1.0;
    if (tags.contains(favoriteTag))
      boost += 0.5;
    if (color.equals(favoriteColor))
      boost += 1.2;
    return _INNER_SCORE * boost;
  END
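For readers unfamiliar with BQL, this is a Python transliteration of the relevance model above, to show what it computes per document: it starts from the engine's inner score and multiplies it by a boost that grows when the document's tags contain the favorite tag and its color matches the favorite color. The parameter names and the assumption that `tags` is a set of strings are illustrative only.

```python
def my_model(inner_score, tags, color, favorite_color="black", favorite_tag="cool"):
    """Python version of the BQL relevance model; inner_score plays the
    role of _INNER_SCORE, and the defaults mirror the query's arguments."""
    boost = 1.0
    if favorite_tag in tags:
        boost += 0.5
    if color == favorite_color:
        boost += 1.2
    return inner_score * boost


print(my_model(2.0, {"cool", "fast"}, "black"))  # approximately 5.4
```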
Live Updates – Zoie and Content Store
- The index reader has to be reopened before earlier live updates become visible
- The only way to perform a live update is to replace the entire document, which also requires access to the unchanged attributes
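The second limitation is why a separate store of full documents is useful: a change to one field must be turned into a complete replacement document before it can be re-added. A minimal sketch, assuming a hypothetical content store modeled as a dict of doc id to field map; reindexing itself is out of scope here.

```python
def apply_partial_update(content_store, doc_id, changed_fields):
    """Merge changed fields into the stored full document and return the
    complete document that must replace the old one in the index."""
    full_doc = dict(content_store[doc_id])  # copy the unchanged attributes
    full_doc.update(changed_fields)
    content_store[doc_id] = full_doc
    return full_doc


store = {7: {"name": "Sriram", "title": "Engineer", "company": "LinkedIn"}}
print(apply_partial_update(store, 7, {"title": "Principal Staff Engineer"}))
```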
Zoie
Search Content Store
[Diagram: the Search Content Store and the Lucene Index, updated from Activity Feeds via inserts and deletes]
Faceting
Bobo
Typeahead (Instant Search)
- Results as you type
- Conventional wisdom: inverted indices cannot support typeahead
- Cleo, Krati
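Typeahead has to answer a prefix (the partial word typed so far) rather than a whole term. One generic way to do this, shown below as an assumption and not as how Cleo works, is a binary search over a sorted term dictionary followed by a short forward scan.

```python
import bisect


def prefix_matches(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms from a sorted list that start with `prefix`."""
    start = bisect.bisect_left(sorted_terms, prefix)  # first term >= prefix
    out = []
    for i in range(start, len(sorted_terms)):
        term = sorted_terms[i]
        if not term.startswith(prefix) or len(out) == limit:
            break  # sorted order: no later term can match
        out.append(term)
    return out


terms = sorted(["kumaresh", "kumar", "linkedin", "sriram", "sankar", "kumaon"])
print(prefix_matches(terms, "kuma"))  # ['kumaon', 'kumar', 'kumaresh']
```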
Fast forward to last year, and growing pains . . .
Scalability
- Rebuilding the index from scratch was extremely difficult
- Not possible to use complex algorithms during indexing
- Live updates only at document granularity
- Inflexible scoring, at both the Lucene and Sensei levels
Fragmentation
- Too many open-source components glued together, with primary developers spread across many companies
- Different instantiations started to diverge to deal with their specific growing pains, leading to diverging stacks and distracted engineers
Our new search stack . . .
Two verticals already in production
Life of a Query
[Diagram: User Query -> Query Rewriter/Planner -> multiple Search Shards -> Results Merging -> Search Results]
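The fan-out and merge steps of this flow can be sketched as scatter-gather: send the rewritten query to every shard in parallel, have each return its own top k, and keep the global top k. Because every shard returns its local top k, the global top k is guaranteed to be among the gathered hits. Shards are modeled here as hypothetical callables returning `(score, doc_id)` pairs; networking, timeouts, and partial failures are ignored.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def merge_shard_results(shard_results, k):
    """Gather (score, doc_id) hits from all shards and keep the global top k."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


def scatter_gather(shards, query, k):
    """Query every shard in parallel, then merge their local top-k lists."""
    with ThreadPoolExecutor() as pool:
        per_shard = list(pool.map(lambda search: search(query, k), shards))
    return merge_shard_results(per_shard, k)


shards = [
    lambda q, k: [(0.9, "a"), (0.4, "b")],
    lambda q, k: [(0.95, "c"), (0.1, "d")],
    lambda q, k: [(0.5, "e")],
]
print(scatter_gather(shards, "sriram", 3))  # [(0.95, 'c'), (0.9, 'a'), (0.5, 'e')]
```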
Life of a Query – Within A Search Shard
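[Diagram: inside a shard, the Rewritten Query is run against the INDEX; each candidate is retrieved (Retrieve a Document) and scored (Score the Document), and the Top Results are returned from the shard]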
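The retrieve-and-score loop inside a shard only ever needs the best k documents, so a bounded min-heap is a natural fit: the weakest of the current top k sits at the root and is evicted when a better document arrives. In this sketch, `score_fn` is a hypothetical stand-in for whatever scorer the shard runs.

```python
import heapq


def shard_top_k(candidate_docs, score_fn, k):
    """Score each retrieved document and keep the k best using a min-heap."""
    heap = []  # (score, doc_id) pairs; heap[0] is the weakest of the top k
    for doc_id in candidate_docs:
        score = score_fn(doc_id)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)  # best first


scores = {1: 0.2, 2: 0.9, 3: 0.5, 4: 0.7}
print(shard_top_k([1, 2, 3, 4], scores.get, 2))  # [(0.9, 2), (0.7, 4)]
```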
Life of a Query – Within A Rewriter
[Diagram: Query -> a chain of Rewriter Modules, each backed by its own DATA MODEL and sharing a Rewriter State -> Rewritten Query]
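A rewriter built from pluggable modules can be sketched as a pipeline: each module takes the current query plus a shared state object, consults its own data model, and returns a rewritten query. The two modules and their dictionaries below (spelling correction, title synonyms) are hypothetical examples, not LinkedIn's actual rewriters.

```python
from typing import Callable, Optional

RewriterModule = Callable[[str, dict], str]

# Hypothetical data models, one per module.
SPELLING = {"linkdin": "linkedin"}
TITLE_SYNONYMS = {"swe": "software engineer"}


def spell_correct(query: str, state: dict) -> str:
    words = query.split()
    fixed = [SPELLING.get(w, w) for w in words]
    state["corrected"] = fixed != words  # later modules can read this
    return " ".join(fixed)


def expand_titles(query: str, state: dict) -> str:
    return " ".join(TITLE_SYNONYMS.get(w, w) for w in query.split())


def rewrite(query: str, modules: list, state: Optional[dict] = None) -> str:
    """Run the query through each module in order, sharing one state dict."""
    state = {} if state is None else state
    for module in modules:
        query = module(query, state)
    return query


print(rewrite("swe linkdin", [spell_correct, expand_titles]))
```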
Life of Data - Offline
[Diagram: offline pipeline from Raw Data to Derived Data, producing the INDEX and several DATA MODELs]
Benefits of New Stack
- A complete search engine
- Frequent reindexing possible (a full reset)
- Resharding becomes easy
- Clear separation of infrastructure and relevance functions
- A single stack with a single identity!
Early Termination
- We order documents in the index by a static rank, from most important to least important
- An offline relevance algorithm assigns each document the static rank used for this sorting
- This allows retrieval to be terminated early (assuming a strong correlation between static rank and the importance of a result for a specific query)
- This also happens to work well for personalized search
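Early termination can be sketched as follows, under the assumption that doc ids are assigned in static-rank order, so walking a posting list front to back visits the most important documents first. Once k matches are found, the rest of the list is skipped. The `matches` predicate is a hypothetical stand-in for full query evaluation.

```python
def early_terminated_search(posting_list, matches, k):
    """posting_list holds doc ids in static-rank order (most important first).
    Stop as soon as k matching documents have been collected."""
    results = []
    examined = 0
    for doc_id in posting_list:
        examined += 1
        if matches(doc_id):
            results.append(doc_id)
            if len(results) == k:
                break  # early termination: skip the rest of the list
    return results, examined


hits, examined = early_terminated_search(range(1_000_000), lambda d: d % 3 == 0, 5)
print(hits, examined)  # [0, 3, 6, 9, 12] 13
```

Only 13 of the million postings are examined; the trade-off is that a high-scoring document with low static rank can be missed, which is why the strong-correlation assumption matters.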
New Strategy for Live Updates
- Lucene segments are "document-partitioned"
- We have enhanced Lucene with "term-partitioned" segments
- We use 3 term-partitioned segments:
  - Base index (never changed)
  - Live update buffer
  - Snapshot index
- Fault tolerant and performant
- No more content store!
[Diagram: Base Index, Snapshot Index, and Live Update Buffer]
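The slides do not describe how the three segments interact, so the following is only one plausible reading, labeled as an assumption: each segment maps terms to posting lists, a live update writes a fresh copy of the affected term's posting list into the buffer, a lookup uses the newest segment that knows the term, and the buffer is periodically folded into the snapshot while the base stays untouched. Class and method names are hypothetical, and fault tolerance is not modeled.

```python
class TermPartitionedIndex:
    """Hypothetical three-segment index: base (immutable), snapshot, buffer."""

    def __init__(self, base):
        self.base = base     # term -> sorted doc ids; never modified
        self.snapshot = {}   # periodically absorbs the buffer
        self.buffer = {}     # most recent live updates

    def postings(self, term):
        # Assumption: the newest segment that contains the term wins.
        for segment in (self.buffer, self.snapshot, self.base):
            if term in segment:
                return segment[term]
        return []

    def add_term(self, term, doc_id):
        self.buffer[term] = sorted(set(self.postings(term)) | {doc_id})

    def remove_term(self, term, doc_id):
        self.buffer[term] = [d for d in self.postings(term) if d != doc_id]

    def flush(self):
        """Fold the live update buffer into the snapshot index."""
        self.snapshot.update(self.buffer)
        self.buffer = {}


idx = TermPartitionedIndex({"linkedin": [1, 2], "sriram": [2]})
idx.add_term("linkedin", 5)
print(idx.postings("linkedin"), idx.base["linkedin"])  # [1, 2, 5] [1, 2]
```

Because only the affected terms' posting lists are rewritten, an update touches a few terms rather than a whole document, which fits the slide's point that the content store is no longer needed.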
Data Distribution
- BitTorrent-based data distribution framework
- More details at a later time
Relevance
- Offline analysis, resulting in a better index and data models
- Query rewriting, for better and more accurate recall
- Scoring, to fine-tune each of the retrieved results
- Reranking, selecting the top results for overall result-set quality
- Blending, to combine results from multiple verticals
Machine-Learned Scorers
- Goal: to automatically build a function whose arguments are interesting features of the query and the document
- The input to the machine learning system is a set of training data that describes how the function should behave on various combinations of feature values
- The function takes the form of a standard template; a linear formula is commonly used (due to its simplicity)
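A linear scorer is just a learned bias plus a weighted sum of feature values. The feature names and weights below are hypothetical; in practice the weights would come from training on the data described above.

```python
def linear_score(weights, bias, features):
    """score = bias + sum of weights[f] * features[f]; missing features count as 0."""
    return bias + sum(w * features.get(name, 0.0) for name, w in weights.items())


# Hypothetical learned weights for two query/document features.
weights = {"title_match": 2.0, "same_industry": 0.5}
print(linear_score(weights, 0.1, {"title_match": 1.0, "same_industry": 1.0}))
```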
Linear Regression on a Single Feature
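With a single feature, the least-squares fit of $y \approx b_0 + b_1 x$ has a closed form: $b_1$ is the covariance of $x$ and $y$ divided by the variance of $x$, and $b_0 = \bar{y} - b_1 \bar{x}$. The sketch below assumes hypothetical training pairs where `x` is a feature value and `y` a relevance label.

```python
def fit_single_feature(xs, ys):
    """Ordinary least squares for y ~ b0 + b1 * x, in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


print(fit_single_feature([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```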
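LinkedIn Scorer: Different Linear Models for Different Intents
- Relevance models incorporate user features: $\text{score} = P(\text{Document} \mid \text{Query}, \text{User})$
- Tree with linear regression leaves
[Figure: a decision tree whose internal nodes test features, e.g. $x_2 = ?$ and $x_{10} < 0.1234$, and whose leaves each hold a separate linear model:
$b_0 + b_1 T(x_1) + \dots + b_n x_n$,
$a_0 + a_1 P(x_1) + \dots + a_n Q(x_n)$,
$g_0 + g_1 R(x_1) + \dots + g_n Q(x_n)$]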
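A tree with linear leaves routes each (query, document, user) feature vector down a few tests to pick an intent-specific linear model, then evaluates that model. The sketch mirrors the figure's shape, but the slide does not show the value tested at $x_2$, so `x2 == 1.0` is an assumption. The leaf weights are made up, and the feature transforms $T, P, Q, R$ are replaced by the identity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Features = Dict[str, float]


@dataclass
class Leaf:
    bias: float
    weights: Dict[str, float]

    def score(self, x: Features) -> float:
        # A linear model: bias + sum of weight * feature value.
        return self.bias + sum(w * x.get(name, 0.0) for name, w in self.weights.items())


@dataclass
class Split:
    test: Callable[[Features], bool]
    if_true: "Node"
    if_false: "Node"

    def score(self, x: Features) -> float:
        # Route to one subtree, then score with that subtree's model.
        return (self.if_true if self.test(x) else self.if_false).score(x)


Node = Union[Leaf, Split]

tree = Split(
    test=lambda x: x["x2"] == 1.0,  # assumed split value for "X2=?"
    if_true=Leaf(0.5, {"x1": 2.0}),
    if_false=Split(
        test=lambda x: x["x10"] < 0.1234,
        if_true=Leaf(0.1, {"x1": 1.0}),
        if_false=Leaf(0.0, {"x1": 0.5, "x10": 1.0}),
    ),
)
print(tree.score({"x1": 1.0, "x2": 1.0, "x10": 0.5}))  # 2.5
```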
Going Forward
- Further standardize infrastructure for relevance components
- Scatter-gather
- Java GC issues
- Extend infrastructure to browser/device
- Reintegrate diverging stacks
Product Overview
LinkedIn’s Vision
“Create economic opportunity for every member of the global workforce”
The Economic Graph
Search is core to the economic graph vision
Who uses search?
- Casual User: LinkedIn as professional identity
- Job Seeker: LinkedIn as a way to get the day job
- Outbound professional (Recruiter / Sales): LinkedIn as the day job
Casual User
- Name Search
- Topic Search
Instant: Name Search
Search all members by name or approximate name
Unified Search: Topic Search
One federated search result page with all relevant entities about the topic
Outbound professional
Exploratory people search
Instant: Search Suggestions
Entity-aware suggestions for companies, skills & titles
Instant: Just one keystroke
From name search to exploratory search
People Search
Explore using facets and advanced search fields
People Search
Leverage the network through shared connections
Recruiter & Sales Navigator
Products powered by search
Job Seeker
Job Search
Instant: Search Suggestions
Entity-aware suggestions for companies, skills & titles
Job Search
Explore using facets and advanced search fields
Job Search
Leverage the network through your relationship to the job poster or your connections at the company
Other Search Users include…
Students – University Search
Information Seekers / Researchers - Content Search
Advertisers / Content Marketers – Company & Group Search
Bringing it all together
Search the economic graph of 300 Million+ members:
- 300M profiles
- 3B endorsements
- 300K jobs
- 3M companies
- 2M groups
- 25K schools
- 100M+ pieces of professional content
One index, one unified search stack
Users, Product, Platform
The Hive Think Tank: Unpacking AI for Healthcare The Hive Think Tank: Unpacking AI for Healthcare
The Hive Think Tank: Unpacking AI for Healthcare
 
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Search at LinkedIn by Sriram Sankar and Kumaresh Pattabiraman

  • 1. Search at LinkedIn (Recruiting Solutions)
      – Sriram Sankar, Principal Staff Engineer
      – Kumaresh Pattabiraman, Senior Product Manager
  • 3. Search at LinkedIn
      – Personalized professional search
      – Part of a bigger product experience
      – But a really big part of it
  • 5. Approach to Search
      – Off-the-shelf components (Lucene)
      – Extended to address Lucene limitations (Sensei, Bobo, Zoie, Content Store)
      – Specialized verticals (Cleo, Krati)
      – Stack adopted for other purposes (recommendations, newsfeed, ads, analytics, etc.)
  • 6. Lucene
      An open-source API that supports search functionality:
      – Add new documents to the index
      – Delete documents from the index
      – Construct queries
      – Search the index using the query
      – Score the retrieved documents
  • 7. The Search Index
      – Inverted index: mapping from (search) terms to the list of documents they appear in
      – Forward index: mapping from documents to metadata about them
  • 8. [Figure: two example documents, one mentioning Kumaresh and LinkedIn and the other mentioning Sriram and LinkedIn, shown as an inverted index (terms Kumaresh, Sriram, LinkedIn mapped to document numbers) and as a forward index]
  • 9. The Search Index
      – The lists are called posting lists
      – Up to hundreds of millions of posting lists
      – Up to hundreds of millions of documents
      – Posting lists may contain as few as a single hit and as many as tens of millions of hits
      – Terms can be:
          – words in the document
          – inferred attributes about the document
  • 10. Lucene Queries
      – "Sriram Sankar"
      – Sriram Kumaresh
      – +Sriram +LinkedIn
      – +Kumaresh connection:418001
      – +Kumaresh industry:software connection:418001^4
  • 11. Lucene Scoring
      – As documents are added to the index, Lucene maintains some metadata on the terms (e.g., term position, tf/idf)
      – Lucene accepts scoring information via query modifications, boosts, etc.
      – Lucene assigns a score to each retrieved document using this information
  • 12. Sensei
      Layer over Lucene that provides:
      – Sharding
      – Cluster management
      – Enhanced query language
  • 14. Sensei BQL
        SELECT * FROM cars
        WHERE price > 2000.00
        USING RELEVANCE MODEL my_model (favoriteColor:"black", favoriteTag:"cool")
          DEFINED AS (String favoriteColor, String favoriteTag)
          BEGIN
            float boost = 1.0;
            if (tags.contains(favoriteTag))
              boost += 0.5;
            if (color.equals(favoriteColor))
              boost += 1.2;
            return _INNER_SCORE * boost;
          END
  • 15. Live Updates – Zoie and Content Store
      – The index reader has to be reopened before earlier live updates become visible
      – The only way to perform a live update is to replace the entire document, which also requires access to the unchanged attributes
  • 20. Typeahead (Instant Search)
      – Results as you type
      – Conventional wisdom: inverted indices cannot support typeahead
      – Cleo, Krati
  • 21. Fast forward to last year, and growing pains...
  • 22. Scalability
      – Rebuilding the index from scratch is extremely difficult
      – Not possible to use complex algorithms during indexing
      – Live updates at document granularity
      – Inflexible scoring, at both the Lucene and Sensei levels
  • 23. Fragmentation
      – Too many open-source components glued together, with primary developers spread across many companies
      – Different instantiations starting to diverge to deal with their specific growing pains, so diverging stacks and distracted engineers
  • 24. Our new search stack...
      – Two verticals already in production
  • 25. Life of a Query
      [Diagram: User Query → Query Rewriter/Planner → multiple Search Shards → Results Merging → Search Results]
  • 26. Life of a Query – Within a Search Shard
      [Diagram: Rewritten Query → retrieve a document from the INDEX → score the document → top results → Top Results From Shard]
  • 27. Life of a Query – Within a Rewriter
      [Diagram: Query → a sequence of Rewriter Modules, each backed by its own DATA MODEL and sharing Rewriter State → Rewritten Query]
  • 28. Life of Data – Offline
      [Diagram: Raw Data → Derived Data → INDEX, with several DATA MODEL outputs also produced offline]
  • 29. Benefits of New Stack
      – A complete search engine
      – Frequent reindexing possible (a full reset)
      – Resharding becomes easy
      – Clear separation of infrastructure and relevance functions
      – A single stack with a single identity!
  • 30. Early Termination
      – We order documents in the index by a static rank, from most important to least important
      – An offline relevance algorithm assigns each document the static rank used for this sorting
      – This allows retrieval to terminate early (assuming a strong correlation between static rank and the importance of a result for a specific query)
      – Happens to work well with personalized search too
  • 31. New Strategy for Live Updates
      – Lucene segments are "document-partitioned"
      – We have enhanced Lucene with "term-partitioned" segments
      – We use 3 term-partitioned segments:
          – Base index (never changed)
          – Live update buffer
          – Snapshot index
      – Fault tolerant and performant
      – No more content store!
  • 33. Data Distribution
      – BitTorrent-based data distribution framework
      – More details at a later time
  • 34. Relevance
      – Offline analysis: produces a better index and data models
      – Query rewriting: for better and more accurate recall
      – Scoring: to fine-tune each of the retrieved results
      – Reranking: selecting the top results for overall result-set quality
      – Blending: combining results from multiple verticals
  • 35. Machine Learned Scorers
      – Goal: automatically build a function whose arguments are interesting features of the query and the document
      – The input to the machine learning system is a set of training data describing how the function should behave on various combinations of feature values
      – The function takes the form of standard templates; a linear formula is commonly used (due to its simplicity)
  • 36. Linear Regression on a Single Feature [figure]
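  • 37. LinkedIn Scorer: Different Linear Models for Different Intents
      – Relevance models incorporate user features: $\text{score} = P(\text{Document} \mid \text{Query}, \text{User})$
      – Tree with linear regression leaves
      [Diagram: a decision tree whose internal nodes test conditions such as $X_2 = {?}$ and $X_{10} < 0.1234$, with a separate linear model at each leaf, e.g. $a_0 + a_1 P(x_1) + \dots + a_n Q(x_n)$, $b_0 + b_1 T(x_1) + \dots + b_n x_n$, and $g_0 + g_1 R(x_1) + \dots + g_n Q(x_n)$]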
  • 38. Going Forward
      – Further standardize infrastructure for relevance components
      – Scatter-gather
      – Java GC issues
      – Extend infrastructure to browser/device
      – Reintegrate diverging stacks
  • 40. LinkedIn's Vision
      "Create economic opportunity for every member of the global workforce"
  • 42. Search is core to the economic graph vision
  • 43. Who uses search?
      – Job seeker: LinkedIn as a way to get the day job
      – Casual user: LinkedIn as professional identity
      – Outbound professional (recruiter/sales): LinkedIn as the day job
  • 45. Instant: Name Search
      – Search all members by name or approximate name
  • 46. Unified Search: Topic Search
      – One federated search results page with all relevant entities about the topic
  • 48. Instant: Search Suggestions
      – Entity-aware suggestions for companies, skills & titles
  • 49. Instant: Just One Keystroke
      – From name search to exploratory search
  • 50. People Search
      – Explore using facets and advanced search fields
  • 51. People Search
      – Leverage the network through shared connections
  • 52. Recruiter & Sales Navigator
      – Products powered by search
  • 54. Instant: Search Suggestions
      – Entity-aware suggestions for companies, skills & titles
  • 55. Job Search
      – Explore using facets and advanced search fields
  • 56. Job Search
      – Leverage the network through your relationship to the job poster or your connections at the company
  • 57. Other Search Users include...
      – Students: University Search
      – Information seekers/researchers: Content Search
      – Advertisers/content marketers: Company & Group Search
  • 58. Bringing it all together
      – 300 million+ members: search the economic graph of 300M profiles
      – 3B endorsements, 300K jobs, 3M companies, 2M groups, 25K schools, 100M+ pieces of professional content
      – One index, one unified search stack
      [Diagram layers: Users, Product, Platform]
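
Slides 7 and 9 describe an inverted index of posting lists alongside a forward index of per-document metadata. A toy Java sketch of that structure, assuming naive lowercase whitespace tokenization and indexing each metadata entry as a key:value term (one way to model the slide's "inferred attributes"); ToyIndex and its methods are hypothetical names, not Lucene's API:

```java
import java.util.*;

// Toy in-memory index: an inverted index (term -> posting list of doc IDs)
// plus a forward index (doc ID -> metadata). Illustrative only.
class ToyIndex {
    // Posting lists kept sorted by doc ID.
    final Map<String, TreeSet<Integer>> inverted = new HashMap<>();
    final Map<Integer, Map<String, String>> forward = new HashMap<>();

    void add(int docId, String text, Map<String, String> metadata) {
        // Words in the document become terms.
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            inverted.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
        // Attributes about the document also become searchable terms.
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            String term = (e.getKey() + ":" + e.getValue()).toLowerCase();
            inverted.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        forward.put(docId, metadata);
    }

    List<Integer> postings(String term) {
        return new ArrayList<>(inverted.getOrDefault(term.toLowerCase(), new TreeSet<>()));
    }

    Map<String, String> metadata(int docId) {
        return forward.getOrDefault(docId, Map.of());
    }
}
```

Looking up "linkedin" after indexing both example documents returns the posting list [1, 2], while the forward index answers the opposite question: what is known about document 2.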
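
The query strings on slide 10 use Lucene's classic query syntax, where + marks a required clause, a bare term is optional, and ^ sets a boost. A simplified sketch of how such a query might filter and rank documents. The parser, the class names, and the additive scoring are illustrative assumptions, phrase queries are skipped, and none of this is Lucene's implementation:

```java
import java.util.*;

// Simplified model of classic query-syntax semantics:
//   +term   required clause (must match)
//   term    optional clause (adds to the score when it matches)
//   term^4  clause with boost 4
// Score = sum of boosts of matching clauses (not Lucene's formula).
class ToyQuery {
    static final class Clause {
        final String term;
        final boolean required;
        final float boost;
        Clause(String term, boolean required, float boost) {
            this.term = term;
            this.required = required;
            this.boost = boost;
        }
    }

    static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String raw : query.trim().split("\\s+")) {
            boolean required = raw.startsWith("+");
            String body = required ? raw.substring(1) : raw;
            float boost = 1.0f;
            int caret = body.lastIndexOf('^');
            if (caret > 0) {
                boost = Float.parseFloat(body.substring(caret + 1));
                body = body.substring(0, caret);
            }
            clauses.add(new Clause(body.toLowerCase(), required, boost));
        }
        return clauses;
    }

    // Returns the document's score, or -1 if it does not match the query.
    static float score(List<Clause> clauses, Set<String> docTerms) {
        float score = 0;
        boolean anyRequired = false;
        boolean anyMatched = false;
        for (Clause c : clauses) {
            boolean hit = docTerms.contains(c.term);
            if (c.required && !hit) return -1;   // a missing required clause rejects the doc
            anyRequired |= c.required;
            if (hit) {
                anyMatched = true;
                score += c.boost;
            }
        }
        // With no required clauses, at least one optional clause must match.
        return (anyRequired || anyMatched) ? score : -1;
    }
}
```

Under this model, +Kumaresh industry:software connection:418001^4 only admits documents containing Kumaresh, and among those ranks the ones in the searcher's connection list highest because of the boost.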
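
Slide 11 mentions the per-term statistics Lucene keeps, such as term frequency and tf/idf. The sketch below computes textbook tf-idf, summing tf(t, d) · ln(N / df(t)) over the query terms, to show why rarer terms count for more. Lucene's actual similarities differ in the details (classic tf-idf with extra normalization, and BM25 as the default since Lucene 6):

```java
import java.util.*;

// Textbook tf-idf over a tiny corpus of tokenized documents.
class TfIdf {
    static double score(List<String> queryTerms, List<String> doc, List<List<String>> corpus) {
        int n = corpus.size();
        double total = 0;
        for (String term : queryTerms) {
            long tf = doc.stream().filter(term::equals).count();          // occurrences in this doc
            long df = corpus.stream().filter(d -> d.contains(term)).count(); // docs containing the term
            if (tf == 0 || df == 0) continue;          // term absent: no contribution
            total += tf * Math.log((double) n / df);   // idf = ln(N / df)
        }
        return total;
    }
}
```

In a three-document corpus where "linkedin" appears in two documents and "kumaresh" in one, a match on "kumaresh" scores ln 3, while a match on "linkedin" scores only ln 1.5.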
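
For readers unfamiliar with BQL, the relevance model on slide 14 reduces to the per-document function below, written in plain Java. It assumes the slide's my_color was meant to be the favoriteColor parameter, and it treats innerScore as a stand-in for _INNER_SCORE. The method signature is hypothetical, not Sensei's API:

```java
import java.util.*;

// Plain-Java rendering of the BQL relevance model's boost logic.
class CarRelevance {
    static float score(float innerScore, Set<String> tags, String color,
                       String favoriteColor, String favoriteTag) {
        float boost = 1.0f;
        if (tags.contains(favoriteTag)) boost += 0.5f;   // tag match
        if (color.equals(favoriteColor)) boost += 1.2f;  // color match
        return innerScore * boost;                        // scale the engine's own score
    }
}
```

A black car tagged "cool" therefore ends up with 2.7 times its original score, while a car matching neither preference keeps its original score.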
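
Slide 15 notes that a live update can only replace a whole document, so changing one field means fetching every unchanged field first. A hypothetical sketch of that read-merge-reindex cycle. An in-memory map stands in for the content store and a list records what would be sent to the indexer; neither is Zoie's API:

```java
import java.util.*;

// Why whole-document replacement forces a content store.
class PartialUpdate {
    final Map<Integer, Map<String, String>> contentStore = new HashMap<>();
    final List<Map<String, String>> reindexed = new ArrayList<>();  // full docs sent to the indexer

    void update(int docId, Map<String, String> changedFields) {
        // 1. Read the unchanged attributes from the content store.
        Map<String, String> full = new HashMap<>(contentStore.getOrDefault(docId, Map.of()));
        // 2. Overlay the changed fields.
        full.putAll(changedFields);
        // 3. Persist the merged document and replace it wholesale in the index.
        contentStore.put(docId, full);
        reindexed.add(full);
    }
}
```

Even a one-field title change costs a content-store read plus a full reindex of the document, which is the overhead the later term-partitioned design removes.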
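
Slide 20's results-as-you-type reduces to prefix lookup. A minimal sketch over a sorted dictionary (java.util.TreeMap), where each distinct lowercased name maps to a hypothetical member ID. This shows only the basic operation, not how Cleo works:

```java
import java.util.*;

// Prefix completion over a sorted dictionary.
class Typeahead {
    private final TreeMap<String, Integer> dictionary = new TreeMap<>();

    void add(String name, int memberId) {
        dictionary.put(name.toLowerCase(), memberId);  // one entry per distinct name
    }

    // Up to `limit` names starting with the prefix, in lexicographic order.
    List<String> complete(String prefix, int limit) {
        String p = prefix.toLowerCase();
        List<String> out = new ArrayList<>();
        // Every string that starts with p sorts in [p, p + '\uffff').
        for (String key : dictionary.subMap(p, true, p + Character.MAX_VALUE, false).keySet()) {
            if (out.size() == limit) break;
            out.add(key);
        }
        return out;
    }
}
```

Each keystroke becomes a range scan over the sorted keys. Ranking the completions, for example by network distance, would be a separate step.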
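
Slide 25 shows a rewritten query fanning out to several search shards, with the results merged before they are returned. A sketch of that scatter-gather step, assuming simple modulo routing of documents to shards. Broker, Shard, Hit, and shardFor are hypothetical names:

```java
import java.util.*;
import java.util.stream.Collectors;

// Scatter-gather: every shard returns its local top k; the broker merges them.
class Broker {
    static final class Hit {
        final int docId;
        final double score;
        Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
        double score() { return score; }
    }

    interface Shard {
        List<Hit> search(String rewrittenQuery, int k);
    }

    // Modulo routing of documents to shards (an assumption; real systems vary).
    static int shardFor(int docId, int numShards) {
        return Math.floorMod(docId, numShards);
    }

    static List<Hit> search(List<Shard> shards, String rewrittenQuery, int k) {
        return shards.parallelStream()
                .flatMap(s -> s.search(rewrittenQuery, k).stream())          // scatter, then gather
                .sorted(Comparator.comparingDouble(Hit::score).reversed())    // merge by score
                .limit(k)
                .collect(Collectors.toList());
    }
}
```

Asking each shard for k results is enough for an exact global top k, because no document outside a shard's local top k can beat every document inside it.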
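
Inside a shard (slide 26), each retrieved document is scored and only the best results leave the shard. A common way to do this is a size-k min-heap, sketched here with a caller-supplied scoring function standing in for the real scorer:

```java
import java.util.*;
import java.util.function.IntToDoubleFunction;

// Per-shard retrieve -> score -> keep top k, using O(k) memory.
class ShardSearcher {
    static List<Integer> topK(int[] retrievedDocs, IntToDoubleFunction scorer, int k) {
        // Min-heap on score: the root is the weakest of the current top k.
        PriorityQueue<double[]> heap = new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));
        for (int doc : retrievedDocs) {
            double s = scorer.applyAsDouble(doc);
            if (heap.size() < k) {
                heap.add(new double[] {doc, s});
            } else if (k > 0 && s > heap.peek()[1]) {
                heap.poll();                       // evict the weakest result
                heap.add(new double[] {doc, s});
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add((int) heap.poll()[0]);
        Collections.reverse(result);               // best first
        return result;
    }
}
```

The heap keeps memory at O(k) no matter how long the posting lists are, which matters when a term has tens of millions of hits.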
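
Slide 27 shows rewriter modules, each backed by a data model, transforming shared rewriter state into the rewritten query. A sketch under stated assumptions: the state is a list of query positions, each holding alternative terms, and there are two hypothetical modules, a lowercaser and a synonym-table expander (the editor's notes list synonyms among the real rewrites). The result is rendered in Lucene-style +(a b) syntax:

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Query rewriting as a pipeline of modules over shared state.
class QueryRewriter {
    static UnaryOperator<List<Set<String>>> normalizer() {
        return state -> {
            List<Set<String>> out = new ArrayList<>();
            for (Set<String> alts : state) {
                Set<String> lowered = new TreeSet<>();
                for (String a : alts) lowered.add(a.toLowerCase());
                out.add(lowered);
            }
            return out;
        };
    }

    // The synonym table plays the role of this module's data model.
    static UnaryOperator<List<Set<String>>> synonyms(Map<String, Set<String>> table) {
        return state -> {
            List<Set<String>> out = new ArrayList<>();
            for (Set<String> alts : state) {
                Set<String> expanded = new TreeSet<>(alts);
                for (String a : alts) expanded.addAll(table.getOrDefault(a, Set.of()));
                out.add(expanded);
            }
            return out;
        };
    }

    static String rewrite(String query, List<UnaryOperator<List<Set<String>>>> modules) {
        List<Set<String>> state = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) state.add(new TreeSet<>(Set.of(token)));
        for (UnaryOperator<List<Set<String>>> m : modules) state = m.apply(state);
        // Every position is required; alternatives within a position are ORed.
        StringJoiner q = new StringJoiner(" ");
        for (Set<String> alts : state) {
            q.add(alts.size() == 1 ? "+" + alts.iterator().next() : "+(" + String.join(" ", alts) + ")");
        }
        return q.toString();
    }
}
```

Module order matters in this design. Running the synonym module before normalization misses "Software", because the table is keyed on lowercase terms.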
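
Slide 30's early termination relies on assigning document IDs in static-rank order, so that every posting list is already sorted from most to least important. A sketch assuming the cutoff rule is simply "stop after limit matches"; production systems may use other budgets:

```java
import java.util.*;
import java.util.function.IntPredicate;

// Early termination over a static-rank-ordered index (doc ID 0 = most important).
class EarlyTermination {
    static List<Integer> retrieve(int[] postingList, IntPredicate matches, int limit) {
        List<Integer> out = new ArrayList<>();
        if (limit <= 0) return out;
        for (int docId : postingList) {          // ascending ID == descending static rank
            if (matches.test(docId)) {
                out.add(docId);
                if (out.size() == limit) break;  // stop: later docs rank lower statically
            }
        }
        return out;
    }
}
```

The slide's caveat applies here. Stopping early is only safe when static rank correlates strongly with query-specific importance, because any document past the cutoff is never scored.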
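
Slide 31 does not spell out how term-partitioned segments work. One plausible reading, sketched below as an assumption rather than LinkedIn's design: each segment holds complete posting lists for the terms it contains, and a newer segment's list for a term supersedes the older ones. Under this reading, updates never touch the base index:

```java
import java.util.*;

// Three term-partitioned segments: live buffer (newest), snapshot, base (immutable).
class TermPartitionedIndex {
    final Map<String, TreeSet<Integer>> base = new HashMap<>();      // never changed
    final Map<String, TreeSet<Integer>> snapshot = new HashMap<>();
    final Map<String, TreeSet<Integer>> liveBuffer = new HashMap<>();

    // The newest segment containing the term holds its complete posting list.
    TreeSet<Integer> postings(String term) {
        if (liveBuffer.containsKey(term)) return liveBuffer.get(term);
        if (snapshot.containsKey(term)) return snapshot.get(term);
        return base.getOrDefault(term, new TreeSet<>());
    }

    // Copy the term's current full list into the live buffer, then modify it there.
    void addTerm(int docId, String term) {
        liveBuffer.computeIfAbsent(term, t -> new TreeSet<>(postings(t))).add(docId);
    }

    void removeTerm(int docId, String term) {
        liveBuffer.computeIfAbsent(term, t -> new TreeSet<>(postings(t))).remove(docId);
    }

    // Periodically fold the live buffer into the snapshot; the base stays untouched.
    void takeSnapshot() {
        snapshot.putAll(liveBuffer);
        liveBuffer.clear();
    }
}
```

Under this reading, an update only rewrites the posting lists of the terms it touches, so no unchanged document attributes are needed. That is consistent with the slide's "No more content store!"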
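
Slide 36 plots linear regression on a single feature. Below is a sketch of the closed-form ordinary least squares fit, b1 = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)² and b0 = ȳ - b1·x̄. The training values in the usage note are made up:

```java
// Ordinary least squares for one feature: score ≈ b0 + b1 * x.
class SingleFeatureRegression {
    final double b0, b1;

    SingleFeatureRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least 2 paired points");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= x.length;
        meanY /= y.length;
        double cov = 0, var = 0;
        for (int i = 0; i < x.length; i++) {
            cov += (x[i] - meanX) * (y[i] - meanY);
            var += (x[i] - meanX) * (x[i] - meanX);
        }
        if (var == 0) throw new IllegalArgumentException("feature has zero variance");
        b1 = cov / var;            // slope
        b0 = meanY - b1 * meanX;   // intercept
    }

    double predict(double x) { return b0 + b1 * x; }
}
```

For example, fitting the points (0, 1), (1, 3), (2, 5), (3, 7) recovers b0 = 1 and b1 = 2. The multi-feature linear scorers on slide 35 generalize this to one weight per feature.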
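
Slide 37's tree with linear regression leaves can be sketched as internal nodes that route a feature vector by a test (such as x10 < 0.1234) and leaves that each apply their own linear model. Feature transforms like the slide's T, P, Q, and R are assumed to have been applied beforehand. The split and weights in the usage note are placeholders, not LinkedIn's model:

```java
import java.util.function.Predicate;

// Model tree: splits choose an intent-specific linear model.
abstract class ModelTree {
    abstract double score(double[] x);

    static final class Leaf extends ModelTree {
        final double intercept;
        final double[] weights;
        Leaf(double intercept, double... weights) {
            this.intercept = intercept;
            this.weights = weights;
        }
        double score(double[] x) {
            double s = intercept;
            for (int i = 0; i < weights.length; i++) s += weights[i] * x[i];
            return s;
        }
    }

    static final class Split extends ModelTree {
        final Predicate<double[]> test;
        final ModelTree ifTrue, ifFalse;
        Split(Predicate<double[]> test, ModelTree ifTrue, ModelTree ifFalse) {
            this.test = test;
            this.ifTrue = ifTrue;
            this.ifFalse = ifFalse;
        }
        double score(double[] x) {
            return (test.test(x) ? ifTrue : ifFalse).score(x);
        }
    }
}
```

With a split on x[1] < 0.5 and two leaves, the same feature vector can be weighted very differently depending on which side of the split it falls. This is how one scorer can serve different query intents.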

Editor's Notes

  1. Video: not a dig at anyone, but trying to show that we need to do some unique stuff. We are on a journey and have made a lot of progress, but we still have a long way to go. Kumaresh will focus on our product experiences at the end.
  2. Like most other companies needing to integrate search into their products
  3. Conventional wisdom: CS276 notes, Facebook, etc. LinkedIn is not alone in this.
  4. Other growing companies should keep all of this in mind. Rebuilding: no index enhancements; resharding is limited to adding shards at the end. Live updates require a content store.
  5. Other growing companies should keep all of this in mind. Rebuilding: no index enhancements; resharding is limited to adding shards at the end. Live updates require a content store.
  6. Unifying infrastructure always pays dividends, even if it is not the perfect fit for each use case. Typeahead (instant) is in production, so no more Cleo.
  7. Leaving out frontend and device-side stuff.
  8. Scoring taken out of Lucene
  9. Rewriting examples: intent recognition, stemming, synonyms, personalization. Rationale for data models: examples are intent models, synonym tables, etc.
  10. May no longer have Lucene