LinkedIn Segmentation & Targeting
Platform: A Big Data Application
Hadoop Summit, June 2013
Hien Luu, Sid Anand
©2013 LinkedIn Corporation. All Rights Reserved.
About Us
Hien Luu, Sid Anand
Our mission
Connect the world’s professionals to make
them more productive and successful
Over 200M members and counting
LinkedIn Members (Millions)
Year      2004  2005  2006  2007  2008  2009  2010  2011  2012
Members      2     4     8    17    32    55    90   145  200+
The world’s largest professional network
Growing at more than 2 members/sec
Source: http://press.linkedin.com/about
The world’s largest professional network
• Over 64% of members are now international
• >88% of Fortune 100 companies use LinkedIn Talent Solutions to hire
• Company Pages: >2.9M
• Professional searches in 2012: >5.7B
• Languages: 19
• Fastest growing demographic: Students and NCGs (>30M)
Source: http://press.linkedin.com/about
Other Company Facts
• Headquartered in Mountain View, Calif., with offices around the world!
• As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world
Source: http://press.linkedin.com/about
Agenda
✓ Company Overview
• Big Data @ LinkedIn
• The Segmentation & Targeting Problem
• Solution : LinkedIn Segmentation & Targeting Platform
• Q & A
Big Data @ LinkedIn
LinkedIn : Big Data Story
Our Big Data Story depends on Infrastructure!
• On-line Data Infrastructure
• Near-line Data Infrastructure
• Off-line Data Infrastructure
[Diagram: three tiers labeled On-line, Near-line, and Off-line; elements shown: Oracle or Espresso, Updates, Web Serving, Data Streams, Teradata]
Big Data Story : On-line Data
On-line Data Infrastructure
• Supports typical OLTP requirements
  – Highly concurrent R/W access
  – Transactional guarantees
  – Back-up & Recovery
• Supports a central LinkedIn Data Principle!
  – “All data everywhere”
  – All OLTP databases need to provide a time-line consistent change stream
  – For this, we developed and open-sourced Databus!
[Diagram: On-line tier; elements shown: Web Serving, Updates, Oracle or Espresso]
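To make "time-line consistent change stream" concrete, here is a minimal sketch in Python. It is not Databus's API: the `ChangeLog`, `ChangeEvent`, and `Consumer` names and the `scn` (commit sequence number) field are assumptions made for illustration. The idea shown is that every change gets a monotonically increasing sequence number, and a consumer applies changes strictly in that order starting from its last checkpoint.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeEvent:
    scn: int      # hypothetical commit sequence number, increases with each commit
    table: str
    key: int
    value: dict


@dataclass
class ChangeLog:
    """Append-only log; events are stored in commit order."""
    events: list = field(default_factory=list)

    def append(self, table, key, value):
        event = ChangeEvent(len(self.events) + 1, table, key, value)
        self.events.append(event)
        return event

    def read_since(self, checkpoint):
        """Return every event committed after `checkpoint`, in commit order."""
        return [e for e in self.events if e.scn > checkpoint]


class Consumer:
    """Applies events in commit order and remembers how far it has read."""

    def __init__(self):
        self.checkpoint = 0
        self.state = {}

    def poll(self, log):
        for event in log.read_since(self.checkpoint):
            self.state[(event.table, event.key)] = event.value
            self.checkpoint = event.scn


log = ChangeLog()
log.append("profile", 1, {"title": "Engineer"})
log.append("profile", 1, {"title": "Senior Engineer"})

consumer = Consumer()
consumer.poll(log)                      # catches up on the first two events
log.append("profile", 2, {"title": "Recruiter"})
consumer.poll(log)                      # resumes from the checkpoint, no replays
```

Because the consumer only ever moves its checkpoint forward, it sees the same order of changes the source database committed, which is the property the slide asks every OLTP database to provide.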
Big Data Story : On-line Data
[Diagram: Updates are written to Oracle or Espresso; Data Change Events flow to Standardization, Search Index, Graph Index, and Read Replicas]
A user updates the company, title, & school on their profile and also accepts a connection.
The write is made to an Oracle or Espresso master, and Databus replicates it:
• the profile change is applied to the Standardization service
  → E.g. the many forms of IBM are canonicalized for search-friendliness
• ... and to the Search Index
  → Recruiters can find you immediately by new keywords
• the connection change is applied to the Graph Index service
  → The user can now start receiving feed updates from their new connections
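The fan-out described above can be sketched as a small publish/subscribe loop. This is an illustrative sketch only: the event shapes, the `COMPANY_ALIASES` table, and the handler names are assumptions, not the actual Databus consumers. It shows one profile change being standardized and indexed for search, and one connection change updating a graph index.

```python
from collections import defaultdict

# Hypothetical alias table standing in for the Standardization service.
COMPANY_ALIASES = {
    "ibm": "IBM",
    "i.b.m.": "IBM",
    "international business machines": "IBM",
}


def standardize_company(raw):
    """Map the many spellings of a company name to one canonical form."""
    cleaned = raw.strip()
    return COMPANY_ALIASES.get(cleaned.lower(), cleaned)


search_index = defaultdict(set)   # keyword -> member ids
graph_index = defaultdict(set)    # member id -> connected member ids


def on_profile_change(event):
    company = standardize_company(event["company"])
    for keyword in (company, event["title"], event["school"]):
        search_index[keyword.lower()].add(event["member_id"])


def on_connection_change(event):
    a, b = event["member_id"], event["other_id"]
    graph_index[a].add(b)
    graph_index[b].add(a)


# Each change type is delivered to every handler subscribed to it.
SUBSCRIBERS = {
    "profile": [on_profile_change],
    "connection": [on_connection_change],
}


def replicate(event):
    for handler in SUBSCRIBERS.get(event["type"], []):
        handler(event)


replicate({"type": "profile", "member_id": 7, "company": "I.B.M.",
           "title": "Data Engineer", "school": "MIT"})
replicate({"type": "connection", "member_id": 7, "other_id": 9})
```

After these two events, member 7 is findable under the canonical keyword `ibm`, and members 7 and 9 appear in each other's connection sets.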
Big Data Story : On-line Data
Databus streams also update Hadoop!
[Diagram: Updates are written to Oracle or Espresso; Data Change Events flow to Standardization, Search Index, Graph Index, and Read Replica]
Big Data Story : Near-line & Off-line Data
2 Main Sources of Data @ LinkedIn
• User-provided data
  – e.g. Member Profile data (employment, education history, endorsements)
• Tracking data via web site instrumentation
  – e.g. pages viewed, email opened/sent, social gestures: posts/likes/shares
[Diagram: elements shown: Updates, Oracle or Espresso, Databus, Web Servers, Teradata]
The Segmentation & Targeting Problem
Segmentation & Targeting
Segmentation & Targeting Attribute types
Bhaskar Ghosh
Segmentation & Targeting
Step 1 : Take some information about users
Member ID   Join Date    Country   Responded to Promotion X1
1           01/01/2013   FR        F
2           01/02/2013   BE        F
3           01/03/2013   FR        F
4           02/01/2013   FR        T
Step 2 : Provide some targeting criteria for a new promotion
Pick members where
• Join Date between("01/01/2013", "01/31/2013") and
• Country="FR" and
• Responded to Promotion X1="F"
→ Members 1 & 3
Step 3 : Target them for a different email campaign (promotion_X2)
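The three steps can be reproduced with a few lines of Python. This is a sketch of the example only, not the platform's implementation; the field names are made up, and the dates are assumed to be in MM/DD/YYYY form, which is the reading consistent with the slide's result.

```python
from datetime import date, datetime


def parse(text):
    """Parse a date written as MM/DD/YYYY (assumed format)."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# Step 1: attributes per member (field names are illustrative).
members = [
    {"member_id": 1, "join_date": parse("01/01/2013"), "country": "FR", "responded_x1": False},
    {"member_id": 2, "join_date": parse("01/02/2013"), "country": "BE", "responded_x1": False},
    {"member_id": 3, "join_date": parse("01/03/2013"), "country": "FR", "responded_x1": False},
    {"member_id": 4, "join_date": parse("02/01/2013"), "country": "FR", "responded_x1": True},
]


# Step 2: the segment definition, written as a predicate over attributes.
def in_segment(member):
    return (
        date(2013, 1, 1) <= member["join_date"] <= date(2013, 1, 31)
        and member["country"] == "FR"
        and not member["responded_x1"]
    )


# Step 3: the resulting segment is the list targeted by promotion_X2.
segment = [m["member_id"] for m in members if in_segment(m)]
print(segment)  # [1, 3]
```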
Segmentation & Targeting
• Attributes: the information about users (Step 1)
• Segment Definition: the targeting criteria (Step 2)
• Segment: the members that match the criteria (here, Members 1 & 3)
Segmentation & Targeting
Problem Definition
• The business wants to launch new campaigns often
• The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes
• The attributes often need to be computed to fulfill the targeting criteria
• This data resides on Hadoop or Teradata (TD)
• The business is most comfortable with SQL-like languages
Segmentation & Targeting Solution
Segmentation & Targeting
• Attribute Computation Engine
• Attribute Serving Engine
Attribute Computation Engine
• Self-service
• Support various data sources
• Attribute consolidation
• Attribute availability
Attribute computation
[Figure: attribute computation; labels shown: ~225M, PB, TB, TB, ~240]
LinkedIn Segmentation & Targeting Platform
[Diagram: Attribute Portal Web Application, with Attribute & Definition Metadata]
[Diagram: Attribute & Definition Metadata alongside three executors, each labeled REST: TD Executor, Hive Executor, Pig Executor]
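One way to picture the relationship between attribute definitions and the three executors is a dispatcher keyed by data source. The sketch below is hypothetical throughout: `AttributeDefinition`, `Executor`, `dispatch`, and the `"td"` / `"hive"` / `"pig"` source labels are illustrative names, and the in-memory `submit` stands in for whatever REST call the real executors expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeDefinition:
    name: str
    source: str   # assumed labels: "td", "hive", or "pig"
    script: str   # the query or script that computes the attribute


class Executor:
    """Stand-in for a remote executor; records what was submitted."""

    def __init__(self, label):
        self.label = label
        self.submitted = []

    def submit(self, definition):
        self.submitted.append(definition.name)
        return f"{self.label}:{definition.name}"


EXECUTORS = {
    "td": Executor("td"),
    "hive": Executor("hive"),
    "pig": Executor("pig"),
}


def dispatch(definition):
    """Route a definition to the executor that matches its data source."""
    try:
        executor = EXECUTORS[definition.source]
    except KeyError:
        raise ValueError(f"unsupported source: {definition.source}") from None
    return executor.submit(definition)


job_id = dispatch(AttributeDefinition("is_job_seeker", "hive", "SELECT ..."))
```

Keeping the routing in a lookup table means a new data source only needs a new entry, which fits the self-service goal of letting users define attributes without engineering changes.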
LinkedIn Segmentation & Targeting Platform
[Diagram: M/R Stitcher; inputs /path/dataset1, /path/dataset2, /path/dataset3, /path/dataset4; output /path/lnkd_big_table; Data Loader]
Attribute consolidation & availability
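The stitching step can be illustrated as a merge of several per-attribute datasets into one wide row per member. This is an in-memory sketch under assumptions (the `member_id` join key and the sample attributes are invented for the example); a real MapReduce job would do the same grouping by key in its reduce phase.

```python
def stitch(datasets):
    """Merge rows from every dataset into one wide row per member_id."""
    big_table = {}
    for rows in datasets:
        for row in rows:
            member_id = row["member_id"]
            merged = big_table.setdefault(member_id, {"member_id": member_id})
            merged.update(row)
    return big_table


# Two small datasets, each contributing different attributes.
dataset1 = [
    {"member_id": 1, "country": "FR"},
    {"member_id": 2, "country": "BE"},
]
dataset2 = [
    {"member_id": 1, "responded_x1": False},
]

big_table = stitch([dataset1, dataset2])
```

Members that are missing from a dataset simply lack that attribute in the stitched row, which behaves like a full outer join on `member_id`.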
LinkedIn Segmentation & Targeting Platform
LinkedIn big table, the most sought-after data
• Segmentation
• Propensity Model
• Ad hoc analysis
Segmentation & Targeting
Attribute Serving Engine
• Self-service
• Attribute predicate expression
• Build segments
• Build lists
Serving Engine
• count, filter, sum
• complex expressions
[Figure: LinkedIn big table; labels shown: ~225M, ~240]
LinkedIn Segmentation & Targeting Platform
[Diagram: an M/R Indexer, the LinkedIn big table, Attribute & Definition Metadata, and multiple Inverted Indexes]
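An inverted index makes count and filter operations cheap because each (attribute, value) pair maps directly to the set of matching members. The sketch below is a plain-Python illustration, not the platform's indexer; the row layout and attribute names are assumptions carried over from the earlier four-member example.

```python
from collections import defaultdict


def build_index(rows):
    """Map each (attribute, value) pair to the set of member ids that have it."""
    index = defaultdict(set)
    for row in rows:
        for attribute, value in row.items():
            if attribute != "member_id":
                index[(attribute, value)].add(row["member_id"])
    return index


rows = [
    {"member_id": 1, "country": "FR", "responded_x1": False},
    {"member_id": 2, "country": "BE", "responded_x1": False},
    {"member_id": 3, "country": "FR", "responded_x1": False},
    {"member_id": 4, "country": "FR", "responded_x1": True},
]

index = build_index(rows)

# filter: intersect the posting sets of the two conditions
matches = index[("country", "FR")] & index[("responded_x1", False)]

# count: the size of the resulting set
match_count = len(matches)
```

The filter never scans the rows again; it only intersects two precomputed sets, which is why this layout suits interactive segment building.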
LinkedIn Segmentation & Targeting Platform
Who are North American recruiters that don’t work for a competitor?
Who are the LinkedIn Talent Solution prospects in Europe?
Who are the job seekers?
LinkedIn Segmentation & Targeting Platform
[Diagram: JSON Predicate Expression → JSON Lucene Query Parser → Inverted Indexes → Segment & List]
Complex tree-like attribute predicate expressions
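A tree-like predicate expression can be evaluated recursively against inverted-index posting sets. The sketch below is illustrative only: the JSON shape (`and` / `or` / `not` nodes with `attribute` / `value` leaves), the sample index, and the member ids are all assumptions, and it uses plain Python sets rather than Lucene. It encodes the question "North American recruiters that don't work for a competitor".

```python
import json

# Hypothetical posting sets: (attribute, value) -> member ids.
INDEX = {
    ("region", "NA"): {1, 2, 5},
    ("region", "EU"): {3, 4},
    ("role", "recruiter"): {1, 2, 3},
    ("employer", "competitor"): {2},
}
ALL_MEMBERS = {1, 2, 3, 4, 5}


def evaluate(node):
    """Recursively evaluate a predicate tree into a set of member ids."""
    if "and" in node:
        result = set(ALL_MEMBERS)
        for child in node["and"]:
            result &= evaluate(child)
        return result
    if "or" in node:
        result = set()
        for child in node["or"]:
            result |= evaluate(child)
        return result
    if "not" in node:
        return ALL_MEMBERS - evaluate(node["not"])
    # Leaf node: look up the posting set for one attribute/value pair.
    return set(INDEX.get((node["attribute"], node["value"]), set()))


expression = json.loads("""
{
  "and": [
    {"attribute": "region", "value": "NA"},
    {"attribute": "role", "value": "recruiter"},
    {"not": {"attribute": "employer", "value": "competitor"}}
  ]
}
""")

segment = evaluate(expression)
```

Each operator maps to one set operation (intersection, union, complement), so arbitrarily nested expressions reduce to a handful of set operations over the index.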
A marketing campaign is represented by a list
Conclusion
Move at business speed and scale at LinkedIn scale
➢ Segmentation & Targeting Platform
– Self-service
– Multiple data sources & massive data volume
– Support complex expression evaluation in seconds
– Attribute availability at business speed
Engineering Team
▪ Jessica Ho
▪ Swetha Karthik
▪ Raj Rangaswamy
▪ Tony Tong
▪ Ajinkya Harkare
▪ Hien Luu
▪ Sid Anand
Questions?
More info: data.linkedin.com

Arti Languages Pre Seed Pitchdeck 2024.pdfwill854175
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Dublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptxDublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptxKunal Gupta
 
The Critical Role of Spatial Data in Today's Data Ecosystem
The Critical Role of Spatial Data in Today's Data EcosystemThe Critical Role of Spatial Data in Today's Data Ecosystem
The Critical Role of Spatial Data in Today's Data EcosystemSafe Software
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2DianaGray10
 
QMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfQMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfROWELL MARQUINA
 
Bitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactiveBitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactivestartupro
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024BookNet Canada
 
A PowerPoint Presentation on Vikram Lander pptx
A PowerPoint Presentation on Vikram Lander pptxA PowerPoint Presentation on Vikram Lander pptx
A PowerPoint Presentation on Vikram Lander pptxatharvdev2010
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 

Recently uploaded (20)

BoSEU24 | Bill Thompson | Talk From Another Century
BoSEU24 | Bill Thompson | Talk From Another CenturyBoSEU24 | Bill Thompson | Talk From Another Century
BoSEU24 | Bill Thompson | Talk From Another Century
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Dynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationDynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientation
 
Arti Languages Pre Seed Pitchdeck 2024.pdf
Arti Languages Pre Seed Pitchdeck 2024.pdfArti Languages Pre Seed Pitchdeck 2024.pdf
Arti Languages Pre Seed Pitchdeck 2024.pdf
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Dublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptxDublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptx
 
The Critical Role of Spatial Data in Today's Data Ecosystem
The Critical Role of Spatial Data in Today's Data EcosystemThe Critical Role of Spatial Data in Today's Data Ecosystem
The Critical Role of Spatial Data in Today's Data Ecosystem
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
 
QMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfQMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdf
 
Bitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactiveBitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactive
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
 
A PowerPoint Presentation on Vikram Lander pptx
A PowerPoint Presentation on Vikram Lander pptxA PowerPoint Presentation on Vikram Lander pptx
A PowerPoint Presentation on Vikram Lander pptx
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 

LinkedIn Segmentation & Targeting Platform: A Big Data Application

  • 1. LinkedIn Segmentation & Targeting Platform: A Big Data Application Hadoop Summit, June 2013 Hien Luu, Sid Anand ©2013 LinkedIn Corporation. All Rights Reserved.
  • 3. Our mission: Connect the world’s professionals to make them more productive and successful
  • 4. Over 200M members and counting. The world’s largest professional network, growing at more than 2 members/sec. [Chart: LinkedIn Members (Millions) by year: 2004: 2, 2005: 4, 2006: 8, 2007: 17, 2008: 32, 2009: 55, 2010: 90, 2011: 145, 2012: 200+] Source: http://press.linkedin.com/about
  • 5. The world’s largest professional network. Over 64% of members are now international. >88% of Fortune 100 companies use LinkedIn Talent Solutions to hire; >2.9M Company Pages; >5.7B professional searches in 2012; 19 languages; >30M students and NCGs (the fastest growing demographic). Source: http://press.linkedin.com/about
  • 6. Other Company Facts • Headquartered in Mountain View, Calif., with offices around the world! • As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world. Source: http://press.linkedin.com/about
  • 7. Agenda ✓ Company Overview • Big Data @ LinkedIn • The Segmentation & Targeting Problem • Solution : LinkedIn Segmentation & Targeting Platform • Q & A
  • 8. Big Data @ LinkedIn
  • 9. LinkedIn : Big Data Story. Our Big Data Story depends on Infrastructure! • On-line Data Infrastructure • Near-line Data Infrastructure • Off-line Data Infrastructure [Diagram labels: Oracle or Espresso; Updates; Web Serving; Teradata; Data Streams; On-line; Near-line; Off-line]
  • 10. Big Data Story : On-line Data. On-line Data Infrastructure • Supports typical OLTP requirements • Highly concurrent R/W access • Transactional guarantees • Back-up & Recovery • Supports a central LinkedIn Data Principle! • “All data everywhere” • All OLTP databases need to provide a time-line consistent change stream • For this, we developed and open-sourced Databus! [Diagram labels: Oracle or Espresso; Updates; Web Serving; On-line]
  • 11. Big Data Story : On-line Data. A user updates the company, title, & school on his profile. He also accepts a connection. The write is made to an Oracle or Espresso master and Databus replicates it: • the profile change is applied to the Standardization service → e.g. the many forms of IBM were canonicalized for search-friendliness • … and to the Search Index → recruiters can find you immediately by new keywords • the connection change is applied to the Graph Index service → the user can now start receiving feed updates from his new connections [Diagram labels: Oracle or Espresso; Updates; Data Change Events; Standardization; Search Index; Graph Index; Read Replicas]
  • 12. Big Data Story : On-line Data. Databus streams also update Hadoop! [Diagram labels: Oracle or Espresso; Updates; Data Change Events; Standardization; Search Index; Graph Index; Read Replica]
  • 13. Big Data Story : Near-line & Off-line Data. 2 Main Sources of Data @ LinkedIn • User-provided data • e.g. Member Profile data (employment, education history, endorsements) • Tracking data via web site instrumentation • e.g. pages viewed, email opened/sent, social gestures : posts/likes/shares [Diagram labels: Oracle or Espresso; Updates; Databus; Web Servers; Teradata]
  • 14. The Segmentation & Targeting Problem
  • 16. Segmentation & Targeting Attribute types Bhaskar Ghosh
  • 17. Segmentation & Targeting
       Step 1 : Take some information about users
         Member ID | Join Date  | Country | Responded to Promotion X1
         1         | 01/01/2013 | FR      | F
         2         | 01/02/2013 | BE      | F
         3         | 01/03/2013 | FR      | F
         4         | 02/01/2013 | FR      | T
       Step 2 : Provide some targeting criteria for a new promotion
         Pick members where
         • Join Date between('01/01/2013', '01/31/2013') and
         • Country="FR" and
         • Responded to Promotion X1="F"
         → Members 1 & 3
       Step 3 : Target them for a different email campaign (promotion_X2)
  • 18. Segmentation & Targeting. The same example as slide 17, with labels added: the member table holds the Attributes, the targeting criteria are the Segment Definition, and the resulting members (1 & 3) are the Segment.
  • 19. Segmentation & Targeting: Problem Definition • The business wants to launch new campaigns often • The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes • The attributes often need to be computed to fulfill the targeting criteria • This data resides on Hadoop or TD • The business is most comfortable with SQL-like languages
  • 20. Segmentation & Targeting Solution
  • 21. Segmentation & Targeting: Attribute Computation Engine; Attribute Serving Engine
  • 22. Segmentation & Targeting: Attribute Computation Engine • Self-service • Support various data sources • Attribute consolidation • Attribute availability
  • 23. Segmentation & Targeting: Attribute computation [Diagram labels: ~225M; PB; TB; TB; ~240]
  • 24. LinkedIn Segmentation & Targeting Platform: Attribute Portal [Diagram labels: Web Application; Attribute & Definition Metadata]
  • 25. LinkedIn Segmentation & Targeting Platform [Diagram labels: Attribute & Definition Metadata; TD Executor; Hive Executor; Pig Executor; REST (x3)]
  • 26. LinkedIn Segmentation & Targeting Platform: Attribute consolidation & availability [Diagram labels: M/R Stitcher; /path/dataset1; /path/dataset2; /path/dataset3; /path/dataset4; /path/lnkd_big_table; Data Loader]
  • 27. LinkedIn Segmentation & Targeting Platform: LinkedIn big table, the most sought after data [Diagram labels: Segmentation; Propensity Model; Ad hoc analysis; LinkedIn big table]
  • 28. Segmentation & Targeting: Attribute Serving Engine • Self-service • Attribute predicate expression • Build segments • Build lists
  • 29. Segmentation & Targeting: Serving Engine [Diagram labels: count; filter; sum; complex expressions; LinkedIn big table; ~225M; ~240]
  • 30. LinkedIn Segmentation & Targeting Platform [Diagram labels: LinkedIn big table; M/R Indexer; Inverted Index (x3); Attribute & Definition Metadata]
  • 31. LinkedIn Segmentation & Targeting Platform • Who are the North American recruiters that don’t work for a competitor? • Who are the LinkedIn Talent Solution prospects in Europe? • Who are the job seekers?
  • 32. LinkedIn Segmentation & Targeting Platform [Diagram labels: JSON Predicate Expression; JSON Lucene Query Parser; Inverted Index (x3); Segment & List]
  • 33. LinkedIn Segmentation & Targeting Platform: Complex tree-like attribute predicate expressions
  • 34. LinkedIn Segmentation & Targeting Platform: A marketing campaign is represented by a list
  • 35. Conclusion. Move at business speed and scale at LinkedIn scale ▪ Segmentation & Targeting Platform – Self-service – Multiple data sources & massive data volume – Support complex expression evaluation in seconds – Attribute availability at business speed
  • 36. Engineering Team ▪ Jessica Ho ▪ Swetha Karthik ▪ Raj Rangaswamy ▪ Tony Tong ▪ Ajinkya Harkare ▪ Hien Luu ▪ Sid Anand
  • 37. Questions? More info: data.linkedin.com
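
The three-step example on slides 17 and 18 can be sketched in a few lines of Python. This is only an illustration of the idea, not the platform's implementation: the field names (`member_id`, `join_date`, `country`, `responded_x1`) are made up for the sketch, and the slide's dates are assumed to be in MM/DD/YYYY format.

```python
from datetime import date

# Step 1: attributes, copied from the slide-17 table.
# Dates are assumed to be MM/DD/YYYY; field names are hypothetical.
members = [
    {"member_id": 1, "join_date": date(2013, 1, 1), "country": "FR", "responded_x1": False},
    {"member_id": 2, "join_date": date(2013, 1, 2), "country": "BE", "responded_x1": False},
    {"member_id": 3, "join_date": date(2013, 1, 3), "country": "FR", "responded_x1": False},
    {"member_id": 4, "join_date": date(2013, 2, 1), "country": "FR", "responded_x1": True},
]


def in_segment(member):
    """Step 2: the segment definition (targeting criteria) from the slide."""
    return (
        date(2013, 1, 1) <= member["join_date"] <= date(2013, 1, 31)
        and member["country"] == "FR"
        and not member["responded_x1"]
    )


# Step 3: the segment is the set of members to target with promotion_X2.
segment = [m["member_id"] for m in members if in_segment(m)]
print(segment)  # [1, 3]
```

Member 2 drops out on country and member 4 on both join date and the earlier response, which leaves members 1 and 3, matching the slide.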

Editor's Notes

  1. We’re making great strides toward our mission: LinkedIn has over 225 million members, and we’re now adding more than two members per second. This is the fastest rate of absolute member growth in the company’s history. Sixty-four percent of LinkedIn members are currently located outside of the United States. LinkedIn counts executives from all 2012 Fortune 500 companies as members; its corporate talent solutions are used by 88 of the Fortune 100 companies. More than 2.9 million companies have LinkedIn Company Pages. LinkedIn members did over 5.7 billion professionally-oriented searches on the platform in 2012. [See http://press.linkedin.com/about for a complete list of LinkedIn facts and stats.]
  2. Email campaign & ad targeting: acquire new paid customers; retain and engage existing customers; promote new products; training and other important announcements. * Talk about the speed of changing segmentation and targeting criteria.
  3. Professional identity; social data; behavioral data.
  4. Given the business problem that Sid outlined, the solution we came up with has two parts. The first part computes attributes based on their attribute definitions. The second part serves the attribute values to define segments, effectively performing user segmentation.
  5. The attribute computation engine needs to support these four high-level requirements. Self-service: there needs to be an easy way for someone on the business team to express the computational logic that computes a set of attributes for their marketing campaigns; the engine takes care of the complexity of executing that logic, in terms of when and how to run it and where to store the result. Support for various data sources: data lives in multiple places (TD and Hadoop), and we need to support that; fortunately, SQL and HiveQL are very similar. Attribute consolidation: once all the attributes are computed, they need to be consolidated into a single dataset to make it easy for everyone to consume and analyze. Attribute availability: register the dataset with Hive and copy the data onto the TD system for business folks to consume.
  6. At a high level, the attribute computation engine needs to be able to compute attributes that come from different data sets, and some of these data sets are huge. This presents all kinds of interesting challenges, as you can imagine. The output of the computation engine is a big table: ~225M rows, one for each member, and ~240 columns, one for each attribute. Behavioral data: site engagement, OL transactions, searches, comments, discussions, and so on. Social data: connections, follows, endorsements. Demographic data (this comes from the member profile): location, gender, title, function, seniority, education.
  7. Self-service way to manage attributes. A web application that a member of the marketing operations or business analyst team can use to express the computation logic in the form of a SQL SELECT statement; we call that an attribute definition. The SQL statement is either a Teradata SQL statement or a Hive QL statement. The web application validates the SQL statements, and it also extracts the attribute names and their types, which are useful for various purposes. The metadata about the attribute definitions and attributes is captured in a MySQL database. For Hive QL queries, we support Hive hints as well as general tuning parameters like split size. Once an attribute definition passes the validation step, it goes through an approval process, which is designed to make sure there are no duplicate attributes and that the query is properly tuned. One of the benefits of this attribute portal is that it centralizes attribute definitions and makes it easy to discover attributes, the logic behind them, and their data sources. When someone starts working on a marketing campaign, they first identify the targeting criteria based on the goals of the campaign; from the set of targeting criteria, they identify the member attributes they need.
  8. Attribute computing workhorse. These executors are scheduled to run on a regular basis. They contact the attribute definition metadata repository to retrieve the attribute definitions to execute, and they execute the queries in parallel using APIs. TD executor: executes queries using JDBC and stores the results in temporary tables. We use an in-house library called LASSEN, an M/R library that leverages the MapReduce framework to quickly and efficiently download the data to HDFS. Hive executor: programmatically executes the Hive queries. One of the classes in Hive is not thread-safe, so we can’t execute Hive QL queries in parallel using multiple threads; we use multiple Hive executors instead. Pig executor: executes Pig script files and has the ability to rerun only the failed scripts. Interesting runtime details: we have all kinds of queries, simple ones and complex ones. The complex ones may take hours to complete. However, we don’t want a query that takes 5 or 6 hours, because that would delay the attribute computing phase for all the queries. Our system has a built-in mechanism to kill a long-running query that exceeds a certain amount of time. What about failed queries? Even though we validate them at attribute definition submission time, some of them will fail at runtime for various reasons. Our system is built to be resilient against these failed queries: only the attributes of the failed queries will be unavailable. Our system collects accounting information about each query, so we know how many queries completed successfully, how many failed, and how long each one took. The output of each attribute definition is stored in a separate folder, so if we have 50 attribute definitions, the results of those queries are scattered across 50 places on Hadoop.
  9. Once the executors have finished executing and materializing the attributes, the job of the stitcher is to combine all these attributes into a single data set, which I call the LinkedIn big table. It is a MapReduce job, and it acts as a gateway that performs some validations, e.g. a member ID must not be less than 0, or certain values can’t be longer than a certain length. The output of the stitcher is a single data set in Avro format that contains one record for every LinkedIn member. This output is also registered in Hive for data scientists to consume. To make the LinkedIn big table available to business analysts for further analysis and insights, the same data set is copied onto TD via the Data Loader component. The whole process (executing the attribute definitions, i.e. the SELECT statements, stitching the attributes together into a single dataset, and loading the data onto TD) takes about 5 to 6 hours. Not all attributes need to be refreshed daily, so we have the concepts of partial refresh and full refresh. Partial refresh: only the needed subset of attribute definitions is executed, which takes much less time (2 to 3 hours vs. 5 to 6 hours).
  10. LinkedIn big table: 200GB. The LinkedIn big table is used for multiple purposes. Propensity models: ranking models, where each member is assigned a score indicating how likely the member is to belong to a certain class of members (e.g. job seekers) or to take an action (e.g. upgrade to a paid subscription). Business analysts and data scientists: for their own analysis. It is the most sought-after data: a very rich data set that contains all kinds of interesting attributes about our members, all in a single place. Because the heavy lifting has been done and the data is available in a single place, others don’t have to hunt down the data sets.
  11. Self-service: a web application for business analysts and the marketing team to use, including someone who is not familiar with SQL; the UI supports drag and drop. An attribute predicate expression is basically a boolean expression that evaluates to true or false by comparing an attribute value to an expected value, for example, whether the value of the country attribute is United States, or whether a member has more than 30 connections. In order to build segments, we need a way to express attribute predicates (e.g. country is Canada or United States), save the expression, and evaluate it at a later point. Building segments: combining various attribute predicates into a segment. Building lists: combining segments to target a certain member population for a marketing campaign.
  12. Based on the requirements I talked about on the previous slide, the serving engine needs to support the following features/operations. Count: how many members meet certain criteria. Filter: the members that meet certain criteria. Sum: each member is assigned a lifetime value for a particular product, so we need the ability to compute the total dollar amount of a segment based on the members that meet the defined criteria. Complex nested expressions with support for conjunction (and) and disjunction (or). The core problem that the serving engine needs to solve is to support arbitrary predicate expressions against any of the attributes and return the result in a reasonable amount of time. We basically think of this as an information retrieval problem, and we found Lucene to be pretty good at this kind of problem, so we leverage Lucene.
  13. MapReduce application. Consumes data in Avro format and creates Lucene indexes. Uses a custom Writable to wrap a Lucene document. Each Lucene document contains all the 240+ attributes for one member. Uses a custom OutputFormat to build a Lucene index segment, which is stored on the local disk of the reducer task and copied onto HDFS at the end of the reduce task. LinkedIn big table: 200GB. Index: 175GB. * # of map and reduce tasks
  14. The job seeker question requires only one attribute: job seeker status. The Talent Solution prospects question requires two attributes: whether a member is a Talent Solution prospect, and the country they work in. The recruiter question needs three attributes: whether a member is a recruiter, the country that member works in, and whether the company they work for is considered a competitor of LinkedIn.
  15. JSON predicate expression: we use JSON to define the format of the predicate expression. JSON is well suited for this purpose: it supports nested data structures, is fairly flexible, and is easy to parse. The format supports different data types, and for each data type certain operators are supported. A JSON predicate expression consists of an attribute name, data type, operator, and one or more values. The JSON predicate expression is the contract between the browser and the server. The predicate expression is stored in MySQL and evaluated at run time.
  16. Web application with a UI for defining segments and lists. Segment builder: drag arbitrary attributes and build predicate expressions. With the click of a button, the marketing team can get a sense of how many members meet the criteria defined in the segment. This gives them a chance to change the criteria to increase or decrease the count. Segments are meant as building blocks.
  17. Segments are building blocks and are reusable. Each marketing campaign is represented by a list, which is a collection of segments; each segment can be one of two types. Inclusions: include members that meet the defined criteria of each of the selected segments (net count and raw count). Exclusions: exclude those members.
  18. One of the things we are working on is improving the turnaround time for attributes: the time from when an attribute is defined to when it is available for building segments.
  19. * Give a shout-out to the engineering team that works on this platform.
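
Notes 11, 12, 15 and 17 describe nested attribute predicate expressions in JSON, the count/filter/sum operations, and lists built from inclusion and exclusion segments. The sketch below illustrates those ideas in plain Python. It is not the platform's code and it does not use Lucene: a table scan stands in for the inverted index. The JSON key names (`attribute`, `type`, `operator`, `values`, `and`, `or`), the operator names, the sample rows and the `ltv` field are all assumptions made for the example, as is treating a list's inclusions as a union of segments.

```python
import json

# Toy stand-in for the LinkedIn big table: one row per member.
# All values are made up for illustration.
BIG_TABLE = {
    1: {"country": "US", "is_recruiter": True, "connections": 120, "ltv": 50.0},
    2: {"country": "CA", "is_recruiter": True, "connections": 12, "ltv": 20.0},
    3: {"country": "FR", "is_recruiter": False, "connections": 45, "ltv": 10.0},
    4: {"country": "US", "is_recruiter": False, "connections": 300, "ltv": 75.0},
}

# Hypothetical operator names; each compares an attribute value to the
# expected value(s) carried by the predicate.
OPERATORS = {
    "eq": lambda actual, values: actual == values[0],
    "in": lambda actual, values: actual in values,
    "gt": lambda actual, values: actual > values[0],
}


def evaluate(expr, table=BIG_TABLE):
    """Filter: return the set of member IDs matching a nested predicate."""
    if "and" in expr:  # conjunction of child expressions
        result = set(table)
        for child in expr["and"]:
            result &= evaluate(child, table)
        return result
    if "or" in expr:  # disjunction of child expressions
        result = set()
        for child in expr["or"]:
            result |= evaluate(child, table)
        return result
    test = OPERATORS[expr["operator"]]  # leaf: a single attribute predicate
    return {
        member_id
        for member_id, row in table.items()
        if test(row[expr["attribute"]], expr["values"])
    }


def build_list(inclusions, exclusions):
    """A list (campaign): members in any inclusion segment, minus exclusions."""
    return set().union(*inclusions) - set().union(*exclusions)


# (country in US or CA) and (is a recruiter or has more than 30 connections)
segment_json = """
{"and": [
  {"attribute": "country", "type": "string", "operator": "in", "values": ["US", "CA"]},
  {"or": [
    {"attribute": "is_recruiter", "type": "boolean", "operator": "eq", "values": [true]},
    {"attribute": "connections", "type": "int", "operator": "gt", "values": [30]}
  ]}
]}
"""
segment = evaluate(json.loads(segment_json))
count = len(segment)                                       # Count
total_ltv = sum(BIG_TABLE[m]["ltv"] for m in segment)      # Sum

heavy_users = evaluate(
    {"attribute": "connections", "type": "int", "operator": "gt", "values": [200]}
)
campaign = build_list(inclusions=[segment], exclusions=[heavy_users])

print(sorted(segment), count, total_ltv, sorted(campaign))
```

Because the expression is plain JSON, it can be stored as a string and evaluated later, which is the save-now, evaluate-at-run-time behavior the notes describe. A production system would translate the same tree into index queries rather than scanning rows.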