How Lucene Powers the LinkedIn Segmentation & Targeting Platform
Lucene/SOLR Revolution EU, November 2013
Hien Luu, Raj Rangaswamy
©2013 LinkedIn Corporation. All Rights Reserved.
About Us
Hien Luu
Rajasekaran Rangaswamy
Agenda
§  A little bit about LinkedIn
§  Segmentation & Targeting Platform Overview
§  How Lucene powers Segmentation & Targeting Platform
§  Q&A

Our Mission
Connect the world’s professionals to make them
more productive and successful.

Our Vision
Create economic opportunity for every
professional in the world.

Members First!
The world’s largest professional network
Over 65% of members are now international
•  >30M
•  >90% of Fortune 100 companies use LinkedIn Talent Solutions to hire
•  >3M Company Pages
•  19 languages
•  >5.7B professional searches in 2012

Other Company Facts
•  Headquartered in Mountain View, Calif., with offices around the world!
•  LinkedIn has ~4200 full-time employees located around the world
Source: http://press.linkedin.com/about

Segmentation & Targeting

Bhaskar Ghosh

Attribute types
Segmentation & Targeting
1. Create attributes
§  Name
§  Email
§  State
§  Occupation
§  Etc.

2. Attributes Added to Table
Name | Email | State | Occupation
John Smith | jsmith@blah.com | California | Engineer
Jane Smith | smithj@mail.com | Nevada | HR Manager
Jane Doe | jdoe@email.com | California | Engineer
… | | |

3. Create Target Segment: California, Engineer
Name | Email | State | Occupation
John Smith | jsmith@blah.com | California | Engineer
Jane Doe | jdoe@email.com | California | Engineer

4. Export List & Send Vendor
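
The four steps above amount to filtering a table on attribute values and exporting the matching rows. A minimal Python sketch using the example rows from the slide; the helper name `build_segment` and the dict-per-row layout are illustrative assumptions, not the platform's actual implementation:

```python
# Example rows from the slide; one dict per member (illustrative layout).
MEMBERS = [
    {"name": "John Smith", "email": "jsmith@blah.com", "state": "California", "occupation": "Engineer"},
    {"name": "Jane Smith", "email": "smithj@mail.com", "state": "Nevada", "occupation": "HR Manager"},
    {"name": "Jane Doe", "email": "jdoe@email.com", "state": "California", "occupation": "Engineer"},
]


def build_segment(rows, **criteria):
    """Return rows whose attributes equal every given criterion (hypothetical helper)."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]


# Step 3: target segment "California, Engineer".
segment = build_segment(MEMBERS, state="California", occupation="Engineer")
# Step 4: the list that would be exported.
export_list = [row["email"] for row in segment]
print(export_list)
```

A linear scan like this is fine for three rows; the rest of the deck is about why an inverted index is used once the table has hundreds of millions of rows.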
  

Segmentation & Targeting

§  Business definition
–  The business would like to launch new campaigns often
–  The business would like to specify targeting criteria using an arbitrary set of attributes
–  Attributes need to be computed to fulfill the targeting criteria
–  The attribute data resides on Hadoop or TD
–  The business is most comfortable with a SQL-like language
Segmentation & Targeting

•  Attribute Computation Engine
•  Attribute Serving Engine
Segmentation & Targeting

Attribute Computation Engine
•  Attribute consolidation
•  Self-service
•  Support various data sources
•  Attribute availability
Segmentation & Targeting

[Figure: attribute computation. Labels: PB, TB, TB, ~238M, ~440]

Segmentation & Targeting

Attribute Serving Engine
•  Build segments
•  Build lists
•  Self-service
•  Attribute predicate expression
Segmentation & Targeting

[Figure: Serving Engine over the LinkedIn Member Attribute table (~238M, ~440). Operations: count, filter, sum, complex expressions]

LinkedIn Segmentation & Targeting Platform
Who are the job seekers?

Who are the LinkedIn Talent Solution prospects in Europe?

Who are North American recruiters that don’t work for a competitor?

LinkedIn Segmentation & Targeting Platform

Complex tree-like attribute predicate expressions

Agenda
§  Architecture
–  Indexer Architecture
–  Serving Architecture

§  Load Balanced Model
§  Next Steps - Distributed Model
§  DocValues
§  Lessons Learnt
§  Why not use an existing solution?

Architecture

[Figure: architecture diagram. Layers: Attribute Serving Engine, Attribute Computation Engine, Data Storage Layer. Components: Attribute Indexing, Attribute Creation Engine, Attribute Serving Engine, Attribute Materialization Engine, Attribute Metastore]

Indexer
•  Inputs: Attribute Definitions (mysql attribute store); Avro data in HDFS
•  Hadoop Indexer MR
   –  Mapper: K=> AvroKey<GenericRecord>, V=> AvroValue<NullWritable>
   –  Reducer: K=> NullWritable, V=> LuceneDocumentWrapper
   –  LuceneOutputFormat, RecordWriter, LuceneDocumentWrapper, Document
•  HDFS: shard 1, shard 2, … shard n
•  Index Merger
•  Web Servers: Index
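
The indexer flow above (partition records into shards, index each shard, then merge) can be sketched in plain Python. This is only an illustration under stated assumptions: the modulo partitioning in `shard_of`, the dict-based inverted index, and the sample records are all hypothetical, and no Hadoop or Lucene API is used.

```python
from collections import defaultdict


def shard_of(member_id, num_shards):
    """Pick a shard for a record (assumption: simple modulo partitioning)."""
    return member_id % num_shards


def build_shard_index(records):
    """Build a tiny inverted index: (attribute, value) -> sorted member IDs."""
    index = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["member_id"]):
        for attr, value in rec.items():
            if attr != "member_id":
                index[(attr, value)].append(rec["member_id"])
    return dict(index)


def merge_indexes(shard_indexes):
    """Merge per-shard indexes into one by unioning posting lists."""
    merged = defaultdict(set)
    for index in shard_indexes:
        for term, postings in index.items():
            merged[term].update(postings)
    return {term: sorted(ids) for term, ids in merged.items()}


# Made-up records standing in for the Avro attribute data.
records = [
    {"member_id": 1, "state": "California", "occupation": "Engineer"},
    {"member_id": 2, "state": "Nevada", "occupation": "HR Manager"},
    {"member_id": 3, "state": "California", "occupation": "Engineer"},
]
num_shards = 2
shards = defaultdict(list)
for rec in records:
    shards[shard_of(rec["member_id"], num_shards)].append(rec)

shard_indexes = [build_shard_index(shards[s]) for s in range(num_shards)]
merged = merge_indexes(shard_indexes)
print(merged[("state", "California")])
```

Building shards independently and merging afterwards keeps each indexing task small, which matches the later lesson about creating many partitions and merging them in a different process.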
Serving

[Figure: serving flow. JSON Predicate Expression, JSON Lucene Query Parser, Inverted Index (three shown), Segment & List]
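
To make the serving flow concrete, here is a small Python sketch of evaluating a tree-like predicate expression against an inverted index with set operations. The JSON shape (`op`, `args`, `term`), the attribute names, and the toy index are hypothetical; the slides do not show the real predicate format or the query parser. The example predicate encodes the earlier question about North American recruiters who do not work for a competitor.

```python
import json

# Toy inverted index: (attribute, value) -> set of member IDs (made-up data).
INDEX = {
    ("region", "North America"): {1, 2, 3, 4},
    ("occupation", "Recruiter"): {2, 3, 5},
    ("employer", "Competitor"): {3},
}
ALL_MEMBERS = {1, 2, 3, 4, 5}

# Hypothetical JSON shape; the real predicate format is not shown in the slides.
PREDICATE = json.loads("""
{"op": "and", "args": [
    {"term": ["region", "North America"]},
    {"term": ["occupation", "Recruiter"]},
    {"op": "not", "args": [{"term": ["employer", "Competitor"]}]}
]}
""")


def evaluate(node):
    """Recursively evaluate a predicate tree to a set of matching member IDs."""
    if "term" in node:
        attr, value = node["term"]
        return set(INDEX.get((attr, value), set()))
    results = [evaluate(child) for child in node["args"]]
    if node["op"] == "and":
        return set.intersection(*results)
    if node["op"] == "or":
        return set.union(*results)
    if node["op"] == "not":
        return ALL_MEMBERS - results[0]
    raise ValueError(f"unknown operator: {node['op']}")


segment = evaluate(PREDICATE)
print(sorted(segment), len(segment))
```

In a Lucene-based system the same tree would presumably be translated into a boolean query rather than evaluated with Python sets; the recursion over the tree is the part this sketch is meant to show.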
Serving – Load Balanced Model

[Figure: HTTP Request, Load Balancer, Web Server 1 (Shard 1), Web Server 2 (Shard 2), … Web Server n (Shard n), Shared Drive]
Serving – Load Balanced Model

But Wait…..
•  Is load balancing alone good enough?
•  What about distribution and failover?

Next Steps - Distributed Model

Apache Helix
•  A generic cluster management framework
•  Used to manage partitioned and replicated resources in distributed systems
•  Built on top of ZooKeeper; hides the complexity of ZK primitives
•  Provides distributed features such as leader election, two-phase commit, etc. via a state machine model
http://helix.incubator.apache.org/
Next Steps - Distributed Model

HTTP Request → Load Balancer → Scatter Gather

Web Server 1    | Web Server 2    | Web Server 3
Shard 1 active  | Shard 2 active  | Shard 3 active
Shard 2 standby | Shard 3 standby | Shard 1 standby

Next Steps - Distributed Model

HTTP Request → Load Balancer → Scatter Gather

Web Server 1    | Web Server 2    | Web Server 3
Shard 1 active  | Shard 2 active  | Shard 3 failure
Shard 2 standby | Shard 3 active  | Shard 1 failure
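
The failover shown in the two diagrams can be sketched as a small state update: when a server fails, the standby copy of each shard it was serving is promoted to active. This is an illustrative Python model only. The names `fail_server` and `scatter_gather` and the server labels are assumptions, and the sketch does not use the Helix API, which would drive such transitions through its state machine in a real deployment.

```python
# Replica placement from the first diagram: each shard has an active and a standby copy.
placement = {
    "shard1": {"active": "web1", "standby": "web3"},
    "shard2": {"active": "web2", "standby": "web1"},
    "shard3": {"active": "web3", "standby": "web2"},
}


def fail_server(placement, server):
    """Return a new placement after `server` fails; standbys on live servers are promoted."""
    updated = {}
    for shard, roles in placement.items():
        active, standby = roles["active"], roles["standby"]
        if active == server:
            # Promote the standby copy; no standby remains until a new one is assigned.
            active, standby = standby, None
        elif standby == server:
            standby = None
        updated[shard] = {"active": active, "standby": standby}
    return updated


def scatter_gather(placement, query_shard):
    """Send the query to every shard's active copy and sum the partial counts."""
    return sum(query_shard(shard, roles["active"]) for shard, roles in placement.items())


after = fail_server(placement, "web3")
print(after["shard3"])
# Hypothetical per-shard query that reports 10 matches from each shard.
total = scatter_gather(after, lambda shard, server: 10)
print(total)
```

After `web3` fails, shard 3 is served from `web2`, which matches the second diagram, and a scatter-gather query still reaches all three shards.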

DocValues – Use Case
•  Once segments are built, users want to forecast, i.e. see a target revenue projection for the campaigns that they want to run
•  Campaigns can be run on various revenue models
•  This involves adding per-member propensity scores and dollar amounts

DocValues – Why not Stored Fields?
•  Stored fields have one indirection per document, resulting in two disk seeks per document
   –  Document ID → .fdx: fetch file pointer to field data
   –  .fdt: scan by id until field is found
•  Performance cost quickly adds up when fetching millions of documents
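
A back-of-the-envelope calculation shows how quickly two seeks per document add up. The 5 ms seek time below is an assumed figure for illustration, not a measurement from the talk, and the estimate is a worst case that ignores the OS page cache and sequential reads.

```python
def stored_field_fetch_seconds(num_docs, seeks_per_doc=2, seek_ms=5.0):
    """Estimate worst-case time spent on disk seeks when fetching stored fields.

    seek_ms is an assumed figure for illustration, not a measurement.
    """
    return num_docs * seeks_per_doc * seek_ms / 1000.0


for n in (1_000, 1_000_000):
    print(n, stored_field_fetch_seconds(n))
```

Under these assumptions, a thousand documents cost about 10 seconds of seeking and a million documents cost about 10,000 seconds, which is why per-document stored-field lookups are a poor fit for aggregating over a whole segment.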

DocValues – Why not Field Cache?
•  Is memory resident
•  Works fine when there is enough memory
•  But keeping millions of un-inverted values in memory is impractical
•  Additional cost to parse values (from String and to String)

DocValues
•  Dense column based storage (1 Value per Document and 1 Column per field and segment)
•  Accepts primitives
•  No conversion from/to String needed
•  Loads 80x-100x faster than building a FieldCache
•  All the work is done during Indexing
•  DocValue fields can be indexed and stored too
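
The column layout described above can be illustrated with plain typed arrays: one dense array per field, indexed by document ID, holding primitives that need no String parsing. This Python sketch is an analogy rather than Lucene's DocValues API; the column contents, the segment hits, and the helper `sum_column` are made up for illustration.

```python
from array import array

# One dense column per field: index = document ID, value = a primitive (made-up numbers).
propensity = array("d", [0.5, 0.25, 0.0, 1.0])
dollars = array("d", [100.0, 200.0, 50.0, 40.0])


def sum_column(doc_ids, column):
    """Add up one column's values over the documents in a segment."""
    return sum(column[d] for d in doc_ids)


segment_doc_ids = [0, 1, 3]  # e.g. the hits of a segment query
print(sum_column(segment_doc_ids, propensity))
print(sum_column(segment_doc_ids, dollars))
```

Each lookup is a direct array access by document ID, in contrast to the two-step stored-field lookup, which is what makes summing scores and dollar amounts over millions of hits practical.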

Lessons Learnt
Indexing
•  Reuse index writers, field and document instances
•  Create many partitions and merge them in a different process
•  Rebuild (bootstrap) the entire index if possible
•  Use partial updates with caution
•  Analyze the index
Serving
•  Reuse a single instance of IndexSearcher
•  Limit usage of stored fields and term vectors
•  Plan for load balancing and failover
•  Cache term frequencies
•  Use different machines for serving and indexing

Why not use an existing solution?
•  Doesn’t allow dynamic schema
•  Difficult to bootstrap indexes built in Hadoop
•  Indexing elevates query latency

•  Doesn’t allow dynamic schema
•  Difficult to bootstrap indexes built in Hadoop
•  Larger memory overhead
•  Comparatively slow

Questions?
More info: data.linkedin.com

©2013 LinkedIn Corporation. All Rights Reserved.

More Related Content

What's hot

Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
Hadoop data access layer v4.0
Hadoop data access layer v4.0Hadoop data access layer v4.0
Hadoop data access layer v4.0SpringPeople
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopDataWorks Summit
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4Nigel Jones
 
Foreign Data Wrappers and You with Postgres
Foreign Data Wrappers and You with PostgresForeign Data Wrappers and You with Postgres
Foreign Data Wrappers and You with PostgresEDB
 
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneOpen-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneInnovative Management Services
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkAgnihotriGhosh2
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesMax De Marzi
 
OpenLink Virtuoso - Management & Decision Makers Overview
OpenLink Virtuoso - Management & Decision Makers OverviewOpenLink Virtuoso - Management & Decision Makers Overview
OpenLink Virtuoso - Management & Decision Makers OverviewKingsley Uyi Idehen
 
Virtuoso Universal Server Overview
Virtuoso Universal Server OverviewVirtuoso Universal Server Overview
Virtuoso Universal Server Overviewrumito
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Hitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIHitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIAndrew Brust
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...Alex Gorbachev
 
Fifth Elephant Apache Atlas Talk
Fifth Elephant Apache Atlas TalkFifth Elephant Apache Atlas Talk
Fifth Elephant Apache Atlas TalkVimal Sharma
 

What's hot (20)

Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
Hadoop data access layer v4.0
Hadoop data access layer v4.0Hadoop data access layer v4.0
Hadoop data access layer v4.0
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise Hadoop
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4
 
Foreign Data Wrappers and You with Postgres
Foreign Data Wrappers and You with PostgresForeign Data Wrappers and You with Postgres
Foreign Data Wrappers and You with Postgres
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneOpen-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
 
Big data course
Big data  courseBig data  course
Big data course
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and spark
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
OpenLink Virtuoso - Management & Decision Makers Overview
OpenLink Virtuoso - Management & Decision Makers OverviewOpenLink Virtuoso - Management & Decision Makers Overview
OpenLink Virtuoso - Management & Decision Makers Overview
 
Virtuoso Universal Server Overview
Virtuoso Universal Server OverviewVirtuoso Universal Server Overview
Virtuoso Universal Server Overview
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
Hitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIHitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BI
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...
 
Fifth Elephant Apache Atlas Talk
Fifth Elephant Apache Atlas TalkFifth Elephant Apache Atlas Talk
Fifth Elephant Apache Atlas Talk
 

Similar to How Lucene Powers the LinkedIn Segmentation and Targeting Platform

How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHien Luu
 
Migrating from Oracle to Postgres
Migrating from Oracle to PostgresMigrating from Oracle to Postgres
Migrating from Oracle to PostgresEDB
 
The Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle DatabasesThe Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle DatabasesEDB
 
Key Methodologies for Migrating from Oracle to Postgres
Key Methodologies for Migrating from Oracle to PostgresKey Methodologies for Migrating from Oracle to Postgres
Key Methodologies for Migrating from Oracle to PostgresEDB
 
LinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationLinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationAmy W. Tang
 
The Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldThe Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldMaria Colgan
 
Atlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQAtlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQServiceRocket
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaMarketingArrowECS_CZ
 
The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationEmbarcadero Technologies
 
SharePoint Databases: What you need to know (201509)
SharePoint Databases: What you need to know (201509)SharePoint Databases: What you need to know (201509)
SharePoint Databases: What you need to know (201509)Alan Eardley
 
SharePoint Migrations Pitfalls from the Crypt
SharePoint Migrations Pitfalls from the CryptSharePoint Migrations Pitfalls from the Crypt
SharePoint Migrations Pitfalls from the CryptJohn Mongell
 
LinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting PlatformLinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting PlatformHien Luu
 
LinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data ApplicationLinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data ApplicationDataWorks Summit
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)Sid Anand
 
Introduction to Active Directory
Introduction to Active DirectoryIntroduction to Active Directory
Introduction to Active DirectoryJalpesh Vadgama
 
(Oracle) DBA and Other Skills Needed in 2020
(Oracle) DBA and Other Skills Needed in 2020(Oracle) DBA and Other Skills Needed in 2020
(Oracle) DBA and Other Skills Needed in 2020Markus Michalewicz
 
SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?Nicolas Georgeault
 
#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?Tammy Bednar
 
Oracle ADF Architecture TV - Planning & Getting Started - Team, Skills and D...
Oracle ADF Architecture TV -  Planning & Getting Started - Team, Skills and D...Oracle ADF Architecture TV -  Planning & Getting Started - Team, Skills and D...
Oracle ADF Architecture TV - Planning & Getting Started - Team, Skills and D...Chris Muir
 

Similar to How Lucene Powers the LinkedIn Segmentation and Targeting Platform (20)

How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
 
Migrating from Oracle to Postgres
Migrating from Oracle to PostgresMigrating from Oracle to Postgres
Migrating from Oracle to Postgres
 
The Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle DatabasesThe Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle Databases
 
Key Methodologies for Migrating from Oracle to Postgres
Key Methodologies for Migrating from Oracle to PostgresKey Methodologies for Migrating from Oracle to Postgres
Key Methodologies for Migrating from Oracle to Postgres
 
LinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationLinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data Application
 
The Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldThe Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous World
 
Atlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQAtlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQ
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management Platforma
 
The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: Collaboration
 
SharePoint Databases: What you need to know (201509)
SharePoint Databases: What you need to know (201509)SharePoint Databases: What you need to know (201509)
SharePoint Databases: What you need to know (201509)
 
SharePoint Migrations Pitfalls from the Crypt
SharePoint Migrations Pitfalls from the CryptSharePoint Migrations Pitfalls from the Crypt
SharePoint Migrations Pitfalls from the Crypt
 
LinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting PlatformLinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting Platform
 
LinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data ApplicationLinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data Application
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
 
Introduction to Active Directory
Introduction to Active DirectoryIntroduction to Active Directory
Introduction to Active Directory
 
(Oracle) DBA and Other Skills Needed in 2020
(Oracle) DBA and Other Skills Needed in 2020(Oracle) DBA and Other Skills Needed in 2020
(Oracle) DBA and Other Skills Needed in 2020
 
SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?
 
#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?
 
Oracle ADF Architecture TV - Planning & Getting Started - Team, Skills and D...
Oracle ADF Architecture TV -  Planning & Getting Started - Team, Skills and D...Oracle ADF Architecture TV -  Planning & Getting Started - Team, Skills and D...
Oracle ADF Architecture TV - Planning & Getting Started - Team, Skills and D...
 
Muruga logeswaran CV-Senior .Net Developer
Muruga logeswaran CV-Senior .Net DeveloperMuruga logeswaran CV-Senior .Net Developer
Muruga logeswaran CV-Senior .Net Developer
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr

How Lucene Powers the LinkedIn Segmentation and Targeting Platform

  • 1. How Lucene Powers LinkedIn Segmentation & Targeting Platform
       Lucene/SOLR Revolution EU, November 2013
       Hien Luu, Raj Rangaswamy
       ©2013 LinkedIn Corporation. All Rights Reserved.
  • 2. About Us: Hien Luu, Rajasekaran Rangaswamy
  • 3. Agenda
       § Little bit about LinkedIn
       § Segmentation & Targeting Platform Overview
       § How Lucene powers Segmentation & Targeting Platform
       § Q&A
  • 4. Our Mission: Connect the world’s professionals to make them more productive and successful.
       Our Vision: Create economic opportunity for every professional in the world.
       Members First!
  • 5. The world’s largest professional network
       Over 65% of members are now international
       >30M
       >90% of Fortune 100 companies use LinkedIn Talent Solutions to hire
       >3M Company Pages
       19 languages
       >5.7B professional searches in 2012
  • 6. Other Company Facts
       • Headquartered in Mountain View, Calif., with offices around the world
       • LinkedIn has ~4200 full-time employees located around the world
       Source: http://press.linkedin.com/about
  • 7. Segmentation & Targeting
  • 9. Segmentation & Targeting Bhaskar Ghosh Attribute types
  • 10. Segmentation & Targeting
       1. Create attributes: Name, Email, State, Occupation, etc.
       2. Attributes added to table:
            Name        Email            State       Occupation
            John Smith  jsmith@blah.com  California  Engineer
            Jane Smith  smithj@mail.com  Nevada      HR Manager
            Jane Doe    jdoe@email.com   California  Engineer
            …
       3. Create target segment (California, Engineer):
            Name        Email            State       Occupation
            John Smith  jsmith@blah.com  California  Engineer
            Jane Doe    jdoe@email.com   California  Engineer
       4. Export list & send to vendor
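
The four steps on slide 10 can be sketched in a few lines of Python. This is only an illustration of the workflow, not the platform's code: the `MEMBERS` table reuses the slide's example rows, and `build_segment` and `export_list` are hypothetical helper names introduced here.

```python
import csv
import io

# Hypothetical member table mirroring the example rows on the slide.
MEMBERS = [
    {"name": "John Smith", "email": "jsmith@blah.com", "state": "California", "occupation": "Engineer"},
    {"name": "Jane Smith", "email": "smithj@mail.com", "state": "Nevada", "occupation": "HR Manager"},
    {"name": "Jane Doe", "email": "jdoe@email.com", "state": "California", "occupation": "Engineer"},
]


def build_segment(rows, **criteria):
    """Step 3: keep the rows whose attributes match every targeting criterion."""
    return [row for row in rows if all(row.get(attr) == value for attr, value in criteria.items())]


def export_list(segment, fields=("name", "email")):
    """Step 4: render the segment as CSV text that could be handed to a vendor."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(segment)
    return buffer.getvalue()


segment = build_segment(MEMBERS, state="California", occupation="Engineer")
csv_text = export_list(segment)
```

A linear scan like this is fine for three rows; the rest of the deck is about doing the same thing over a table with roughly 238M rows, which is where an inverted index comes in.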
  • 11. Segmentation & Targeting
       § Business definition
         – Business would like to launch new campaigns often
         – Business would like to specify targeting criteria using an arbitrary set of attributes
         – Attributes need to be computed to fulfill the targeting criteria
         – The attribute data resides on Hadoop or TD
         – Business is most comfortable with a SQL-like language
  • 12. Segmentation & Targeting: Attribute Computation Engine, Attribute Serving Engine
  • 13. Segmentation & Targeting: Attribute Computation Engine
       • Attribute consolidation
       • Self-service
       • Support various data sources
       • Attribute availability
  • 14. Segmentation & Targeting: Attribute computation (diagram; labels: PB, TB, TB, ~238M, ~440)
  • 15. Segmentation & Targeting: Attribute Serving Engine
       • Self-service
       • Attribute predicate expression
       • Build segments
       • Build lists
  • 16. Segmentation & Targeting: Serving Engine over the LinkedIn Member Attribute table (~238M, ~440)
       Operations: count, filter, sum, complex expressions
  • 17. LinkedIn Segmentation & Targeting Platform
       Who are the job seekers?
       Who are the LinkedIn Talent Solution prospects in Europe?
       Who are North American recruiters that don’t work for a competitor?
  • 18. LinkedIn Segmentation & Targeting Platform: complex tree-like attribute predicate expressions
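
A tree-like predicate expression maps naturally onto set operations over an inverted index: leaves look up posting sets, and inner nodes intersect, union or complement them. The sketch below is a minimal stand-in for what Lucene does with boolean queries. The node layout (`op`, `attr`, `value`, `children`, `child`) and the sample rows are assumptions made for illustration, loosely modelled on the "North American recruiters that don't work for a competitor" question from slide 17.

```python
from functools import reduce


def build_inverted_index(rows):
    """Map each (attribute, value) pair to the set of doc IDs that contain it."""
    index = {}
    for doc_id, row in enumerate(rows):
        for attr, value in row.items():
            index.setdefault((attr, value), set()).add(doc_id)
    return index


def evaluate(node, index, all_docs):
    """Recursively evaluate a predicate tree; returns the matching doc IDs."""
    op = node["op"]
    if op == "term":
        return index.get((node["attr"], node["value"]), set())
    if op == "not":
        return all_docs - evaluate(node["child"], index, all_docs)
    # "and" / "or" nodes are assumed to have at least one child.
    children = [evaluate(child, index, all_docs) for child in node["children"]]
    if op == "and":
        return reduce(set.intersection, children)
    if op == "or":
        return reduce(set.union, children)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical member rows; the doc ID is the row's position in the list.
rows = [
    {"region": "North America", "role": "recruiter", "competitor": False},
    {"region": "North America", "role": "recruiter", "competitor": True},
    {"region": "Europe", "role": "recruiter", "competitor": False},
    {"region": "North America", "role": "engineer", "competitor": False},
]

expression = {
    "op": "and",
    "children": [
        {"op": "term", "attr": "region", "value": "North America"},
        {"op": "term", "attr": "role", "value": "recruiter"},
        {"op": "not", "child": {"op": "term", "attr": "competitor", "value": True}},
    ],
}

index = build_inverted_index(rows)
matches = evaluate(expression, index, set(range(len(rows))))
segment_size = len(matches)  # the "count" operation from slide 16
```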
  • 19. Agenda
       § Architecture
         – Indexer Architecture
         – Serving Architecture
       § Load Balanced Model
       § Next Steps - Distributed Model
       § DocValues
       § Lessons Learnt
       § Why not use an existing solution?
  • 20. Architecture (diagram)
       Layers: Attribute Serving Engine, Attribute Computation Engine, Data Storage Layer
       Components: Attribute Indexing, Attribute Creation Engine, Attribute Serving Engine, Attribute Materialization Engine, Attribute Metastore
  • 21. Indexer (diagram)
       Inputs: Avro data in HDFS; Attribute Definitions (mysql attribute store)
       Hadoop Indexer MR:
         – Mapper: K => AvroKey<GenericRecord>, V => AvroValue<NullWritable>
         – Reducer: K => NullWritable, V => LuceneDocumentWrapper
         – LuceneOutputFormat, RecordWriter: LuceneDocumentWrapper, Document, Index
       Output: shard 1, shard 2, shard n (HDFS); Index Merger; Web Servers
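
The indexer diagram describes a build-in-parallel, merge-afterwards pipeline: the MapReduce job writes one small index per shard, and a separate Index Merger combines them before they reach the web servers. The following is a rough in-memory sketch of that shape only. The modulo partitioning rule, the record fields and the function names are assumptions; the real job emits Lucene indexes through `LuceneOutputFormat`, and the deck does not say how the merge itself is implemented.

```python
from collections import defaultdict


def assign_shard(member_id, num_shards):
    """Assumed partitioning rule: spread members across shards by ID."""
    return member_id % num_shards


def build_shard_indexes(records, num_shards):
    """Stand-in for the reducers: one small inverted index per shard."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for record in records:
        shard = shards[assign_shard(record["member_id"], num_shards)]
        for attr, value in record.items():
            if attr != "member_id":
                shard[(attr, value)].add(record["member_id"])
    return shards


def merge_indexes(indexes):
    """Stand-in for the Index Merger: union the posting sets term by term."""
    merged = defaultdict(set)
    for index in indexes:
        for term, postings in index.items():
            merged[term] |= postings
    return merged


# Hypothetical attribute records, as they might come out of the Avro input.
records = [
    {"member_id": 1, "state": "California", "occupation": "Engineer"},
    {"member_id": 2, "state": "Nevada", "occupation": "HR Manager"},
    {"member_id": 3, "state": "California", "occupation": "Engineer"},
    {"member_id": 4, "state": "Nevada", "occupation": "Engineer"},
]

shards = build_shard_indexes(records, num_shards=2)
merged = merge_indexes(shards)
```

Keeping the merge in its own step matches the later "Lessons Learnt" advice to create many partitions and merge them in a different process.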
  • 22. Serving (diagram): JSON predicate expression; JSON Lucene Query Parser; inverted indexes; segment & list
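
The deck does not show the JSON format that the "JSON Lucene Query Parser" accepts, so the schema below (`and`, `or`, `not`, `term`) is purely hypothetical. The sketch shows one plausible way such a parser could work: walk the JSON tree and emit a string in Lucene's classic query syntax (`field:"value"`, `AND`, `OR`, `NOT`, parentheses). A production parser would more likely build Lucene query objects directly rather than a string.

```python
import json


def quote(value):
    """Quote a value as a phrase, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_lucene_query(node):
    """Translate a (hypothetical) JSON predicate tree into classic query syntax."""
    if "term" in node:
        term = node["term"]
        return f'{term["attr"]}:{quote(term["value"])}'
    if "not" in node:
        # The classic parser needs a positive clause next to NOT, so a
        # negation should only appear as a child of an "and" node.
        return f"NOT {to_lucene_query(node['not'])}"
    for key, operator in (("and", " AND "), ("or", " OR ")):
        if key in node:
            return "(" + operator.join(to_lucene_query(child) for child in node[key]) + ")"
    raise ValueError(f"unsupported node: {node}")


predicate_json = """
{"and": [
    {"term": {"attr": "state", "value": "California"}},
    {"term": {"attr": "occupation", "value": "Engineer"}},
    {"not": {"term": {"attr": "employer", "value": "Competitor Inc"}}}
]}
"""

query_string = to_lucene_query(json.loads(predicate_json))
```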
  • 24. Serving – Load Balanced Model (diagram)
       HTTP request; Load Balancer
       Web Server 1 / Shard 1, Web Server 2 / Shard 2, Web Server n / Shard n
       Shared Drive
  • 25. Serving – Load Balanced Model
       But wait…
       • Is load balancing alone good enough?
       • What about distribution and failover?
  • 27. Next Steps - Distributed Model
       • A generic cluster management framework
       • Used to manage partitioned and replicated resources in distributed systems
       • Built on top of ZooKeeper; hides the complexity of ZK primitives
       • Provides distributed features such as leader election, two-phase commit, etc. via a state machine model
       http://helix.incubator.apache.org/
  • 28. Next Steps - Distributed Model (diagram)
       HTTP request; Load Balancer; Scatter Gather
       Web Server 1: Shard 1 active, Shard 2 standby
       Web Server 2: Shard 2 active, Shard 3 standby
       Web Server 3: Shard 3 active, Shard 1 standby
  • 29. Next Steps - Distributed Model (diagram, after a failure)
       HTTP request; Load Balancer; Scatter Gather
       Web Server 1: Shard 1 active, Shard 2 standby
       Web Server 2: Shard 2 active, Shard 3 active
       Web Server 3: Shard 3 failure, Shard 1 failure
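
The two distributed-model diagrams show an active/standby layout in which a standby replica is promoted when the server holding the active copy fails. The toy router below illustrates that transition. It does not use Helix or ZooKeeper, and `ShardRouter`, `mark_failed` and the server names are made up for this sketch; in the design described on the slides, the state machine would be driven by the cluster manager.

```python
class ShardRouter:
    """Tracks which server is active for each shard and which hold standbys."""

    def __init__(self, placement):
        # placement: {server: {"active": [shards], "standby": [shards]}}
        self.active = {}   # shard -> server currently serving it
        self.standby = {}  # shard -> servers holding a standby replica
        for server, roles in placement.items():
            for shard in roles["active"]:
                self.active[shard] = server
            for shard in roles["standby"]:
                self.standby.setdefault(shard, []).append(server)

    def mark_failed(self, server):
        """Remove a failed server and promote standbys for its active shards."""
        for servers in self.standby.values():
            if server in servers:
                servers.remove(server)
        for shard, owner in list(self.active.items()):
            if owner == server:
                candidates = self.standby.get(shard, [])
                if candidates:
                    self.active[shard] = candidates.pop(0)
                else:
                    del self.active[shard]  # no replica left for this shard

    def route(self):
        """Scatter plan: which server to query for each shard."""
        return dict(self.active)


# Layout from the first diagram (server names are placeholders).
router = ShardRouter({
    "web1": {"active": ["shard1"], "standby": ["shard2"]},
    "web2": {"active": ["shard2"], "standby": ["shard3"]},
    "web3": {"active": ["shard3"], "standby": ["shard1"]},
})
before = router.route()
router.mark_failed("web3")
after = router.route()
```

After `web3` fails, shard 3 is served by `web2`, matching the second diagram; shard 1 stays on `web1` but no longer has a standby.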
  • 31. DocValues – Use Case
       • Once segments are built, users want to forecast, that is, see a target revenue projection for the campaigns that they want to run
       • Campaigns can be run on various revenue models
       • This involves adding per-member propensity scores and dollar amounts
  • 32. DocValues – Why not Stored Fields?
       • Stored fields have one indirection per document, resulting in two disk seeks per document:
         Document ID → .fdx (fetch file pointer to field data) → .fdt (scan by id until field is found)
       • Performance cost quickly adds up when fetching millions of documents
  • 33. DocValues – Why not Field Cache?
       • Is memory resident
       • Works fine when there is enough memory
       • But keeping millions of un-inverted values in memory is impossible
       • Additional cost to parse values (from String and to String)
  • 34. DocValues
       • Dense column-based storage (1 value per document and 1 column per field and segment)
       • Accepts primitives
       • No conversion from/to String needed
       • Loads 80x-100x faster than building a FieldCache
       • All the work is done during indexing
       • DocValue fields can be indexed and stored too
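
The contrast between stored fields and DocValues is essentially row-oriented versus column-oriented access. The sketch below imitates it with plain Python structures: `stored_fields` keeps one string-valued record per document, while the `array` columns hold one typed value per doc ID. The forecast formula (propensity × dollar amount, summed over the segment) and all numbers are assumptions for illustration; the deck only says that per-member propensity scores and dollar amounts are added up.

```python
from array import array

# Row-oriented stand-in for stored fields: one record per document,
# values kept as strings that must be parsed on every read.
stored_fields = [
    {"propensity": "0.5", "amount": "200.0"},
    {"propensity": "0.25", "amount": "100.0"},
    {"propensity": "0.75", "amount": "400.0"},
    {"propensity": "0.125", "amount": "80.0"},
]

# Column-oriented stand-in for DocValues: one dense, typed column per field,
# indexed directly by doc ID, with no string conversion at read time.
propensity_column = array("d", [0.5, 0.25, 0.75, 0.125])
amount_column = array("d", [200.0, 100.0, 400.0, 80.0])


def forecast_from_stored_fields(doc_ids):
    """Fetch and parse a whole record for every matching document."""
    total = 0.0
    for doc_id in doc_ids:
        record = stored_fields[doc_id]
        total += float(record["propensity"]) * float(record["amount"])
    return total


def forecast_from_columns(doc_ids):
    """Read only the two needed columns, by doc ID."""
    return sum(propensity_column[d] * amount_column[d] for d in doc_ids)


segment_doc_ids = [0, 2]  # doc IDs matched by some segment query
row_total = forecast_from_stored_fields(segment_doc_ids)
column_total = forecast_from_columns(segment_doc_ids)
```

Both functions return the same number; the difference the slides point to is in how much work each lookup costs once the segment holds millions of documents.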
  • 36. Lessons Learnt
       Indexing
       • Reuse index writers, field and document instances
       • Create many partitions and merge them in a different process
       • Rebuild (bootstrap) entire index if possible
       • Use partial updates with caution
       • Analyze the index
       Serving
       • Reuse a single instance of IndexSearcher
       • Limit usage of stored fields and term vectors
       • Plan for load balancing and failover
       • Cache term frequencies
       • Use different machines for serving and indexing
  • 38. Why not use an existing solution?
       First solution considered:
       • Doesn’t allow dynamic schema
       • Difficult to bootstrap indexes built in Hadoop
       • Indexing elevates query latency
       Second solution considered:
       • Doesn’t allow dynamic schema
       • Difficult to bootstrap indexes built in Hadoop
       • Larger memory overhead
       • Comparatively slow
  • 39. Questions? More info: data.linkedin.com