LinkedIn Member Segmentation Platform: A Big Data Application

LinkedIn Segmentation & Targeting
Platform: A Big Data Application
Hadoop Summit, June 2013
Hien Luu, Sid Anand
©2013 LinkedIn Corporation. All Rights Reserved.

Our mission
Connect the world’s professionals to make
them more productive and successful

Over 200M members and counting
2 4 8
17
32
55
90
145
2004 2005 2006 2007 2008 2009 2010 2011 2012
LinkedIn Members (Millions)
200+
The world’s largest professional network
Growing at more than 2 members/sec
Source :
http://press.linkedin.com/about

*
>88%Fortune 100 Companies
use LinkedIn Talent Soln to hire
Company Pages
>2.9M
Professional searches in 2012
>5.7B
Languages
19
>30MFastest growing demographic:
Students and NCGs
The world’s largest professional network
Over 64% of members are now international
Source :

Other Company Facts
*
• Headquartered in Mountain View, Calif., with offices around the world!
• As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around
the world
Source :

Agenda
 Company Overview
• Big Data @ LinkedIn
• The Segmentation & Targeting Problem
• Solution : LinkedIn Segmentation & Targeting Platform
• Q & A

Big Data @ LinkedIn

LinkedIn : Big Data Story
Our Big Data Story depends on Infrastructure!
• On-line Data Infrastructure
• Near-line Data Infrastructure
• Offline Data Infrastructure
Oracle or
Espresso
Updates
Web
Serving
Teradata
Data Streams
Near-lineOn-line Off-line

Big Data Story : On-line Data
On-line Data Infrastructure
• Supports typical OLTP requirements
• Highly concurrent R/W access
• Transactional guarantees
• Back-up & Recovery
• Supports a central LinkedIn Data Principle!
• “All data everywhere”
• All OLTP databases need to provide a
time-line consistent change stream
• For this, we developed and open-
sourced Databus!
Oracle or
Espresso
Updates
Web
Serving
On-line

Oracle or
Espresso Data Change Events
Search
Index
Graph
Index
Read
Replicas
Updates
Standar
dization
A user updates the company, title, & school on his profile. He also accepts a
connection
The write is made to an Oracle or Espresso Master and DataBus replicates it:
• the profile change is applied to the Standardization service
 E.g. the many forms of IBM were canonicalized for search-friendliness
• …. and to the Search Index
 Recruiters can find you immediately by new keywords
• the connection change is applied to the Graph Index service
 The user can now start receiving feed updates from his new connections

Databus streams also update Hadoop!
Oracle or
Espresso
Search
Index
Graph
Index
Read
Replica
Updates
Standar
dization
Data Change Events

Big Data Story : Near-line & Off-line Data
2 Main Sources of Data @ LinkedIn
• User-provided data
• e.g. Member Profile data (e.g. employment, education history, endorsements)
• Tracking data via web site instrumentation
• e.g. pages viewed, email opened/sent, social gestures : posts/likes/shares
Oracle or
Espresso
Updates
Databus
Web
Servers
Teradata

The
Segmentation & Targeting
Problem

Segmentation & Targeting Attribute types
Bhaskar Ghosh

Step 1 : Take some information about users
Member ID Join Date Country Responded to
Promotion X1
1 01/01/2013 FR F
2 01/02/2013 BE F
3 01/03/2013 FR F
4 02/01/2013 FR T
Step 2 : Provide some targeting criteria for a new promotion
Pick members where
• Join Date between('01/01/2013", '01/31/2013") and
• Country="FR" and
• Responded to Promotion X1="F"
 Members 1 & 3
Step 3 : Target them for a different email campaign (promotion_X2)

Step 1 : Take some information about users
Member ID Join Date Country Responded to
Promotion X1
1 01/01/2013 FR F
2 01/02/2013 BE F
3 01/03/2013 FR F
4 02/01/2013 FR T
Step 2 : Provide some targeting criteria for a new promotion
Pick members where
• Join Date between('01/01/2013", '01/31/2013") and
• Country="FR" and
• Responded to Promotion X1="F"
 Members 1 & 3
Step 3 : Target them for a different email campaign (promotion_X2)
Attributes
Segment
Definition
Segment

Problem Definition
• The business wants to launch new campaigns often
• The business wants to specify targeting criteria (segment
definitions) using an arbitrary set of attributes
• The attributes often need to be computed to fulfill the targeting
criteria
• This data resides on Hadoop or TD
• The business is most comfortable with SQL-like languages

Segmentation & Targeting Solution

Attribute
Computation
Engine
Attribute
Serving
Engine

Attribute
Computation
Engine
Self-service
Support various
data sources
Attribute
consolidation
Attribute
availability

Attribute computation
~225M
PB
TB
TB
~240

LinkedIn Segmentation & Targeting Platform
Attribute Portal Web Application
Attribute & Definition
Metadata

Attribute &
Definition
Metadata
TD Executor
Hive Executor
Pig Executor
REST
REST
REST

M/R
Stitcher
/path/dataset1
/path/dataset2
/path/dataset3
/path/dataset4
/path/lnkd_big_table
Data
Loader
Attribute consolidation & availability

LinkedIn big table, the most sought after data
Segmentation
Propensity
Model
Ad hoc analysis
LinkedIn big table

Attribute
Serving
Engine
Self-service
Attribute predicate
expression
Build
segments
Build lists

Serving Engine
$
count filter sum
complex
expressions
Σ1234
LinkedIn big table
~225M
~240

Inverted
Index
Inverted
Index
Inverted
Index
M/R
Indexer
LinkedIn big table
Attribute &
Definition
Metadata

Who are north American recruiters that
don’t work for a competitor?
Who are the LinkedIn Talent Solution prospects
in Europe?
Who are the job seekers?

JSON Predicate
Expression
JSON Lucene
Query Parser
Inverted
Index
Inverted
Index
Inverted
Index
Segment &
List

Complex tree-like attribute predicate expressions

A marketing campaign is represented by a list

Conclusion
Move at business speed and scale at LinkedIn scale
 Segmentation & Targeting Platform
– Self-service
– Multiple data sources & massive data volume
– Support complex expression evaluation in seconds
– Attribute availability at business speed

Engineering Team
 Jessica Ho
 Swetha Karthik
 Raj Rangaswamy
 Tony Tong
 Ajinkya Harkare
 Hien Luu
 Sid Anand

Questions?
More info: data.linkedin.com

LinkedIn Member Segmentation Platform: A Big Data Application

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (20)

Similar to LinkedIn Member Segmentation Platform: A Big Data Application

Similar to LinkedIn Member Segmentation Platform: A Big Data Application (20)

More from DataWorks Summit

More from DataWorks Summit (20)

Recently uploaded

Recently uploaded (20)

LinkedIn Member Segmentation Platform: A Big Data Application

Editor's Notes