LinkedIn Segmentation & Targeting Platform: A Big Data Application

This talk was given by Hien Luu (Senior Software Engineer at LinkedIn) and Siddharth Anand (Senior Staff Software Engineer at LinkedIn) at the Hadoop Summit (June 2013).

Published in: Technology

  1. 1. LinkedIn Segmentation & Targeting Platform: A Big Data Application Hadoop Summit, June 2013 Hien Luu, Sid Anand ©2013 LinkedIn Corporation. All Rights Reserved.
  2. 2. About Us: Hien Luu, Sid Anand
  3. 3. Our mission: Connect the world’s professionals to make them more productive and successful
  4. 4. Over 200M members and counting. The world’s largest professional network, growing at more than 2 members/sec. LinkedIn Members (Millions): 2004: 2; 2005: 4; 2006: 8; 2007: 17; 2008: 32; 2009: 55; 2010: 90; 2011: 145; 2012: 200+. Source: http://press.linkedin.com/about
  5. 5. The world’s largest professional network. Over 64% of members are now international.
      • >88% of Fortune 100 companies use LinkedIn Talent Solutions to hire
      • Company Pages: >2.9M
      • Professional searches in 2012: >5.7B
      • Languages: 19
      • Fastest growing demographic: students and NCGs (>30M)
      Source: http://press.linkedin.com/about
  6. 6. Other Company Facts
      • Headquartered in Mountain View, Calif., with offices around the world
      • As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world
      Source: http://press.linkedin.com/about
  7. 7. Agenda
      ✓ Company Overview
      • Big Data @ LinkedIn
      • The Segmentation & Targeting Problem
      • Solution: LinkedIn Segmentation & Targeting Platform
      • Q & A
  8. 8. Big Data @ LinkedIn
  9. 9. LinkedIn: Big Data Story. Our Big Data story depends on infrastructure!
      • On-line Data Infrastructure
      • Near-line Data Infrastructure
      • Off-line Data Infrastructure
      [Diagram labels: Web Serving, Updates, Oracle or Espresso, Data Streams, Teradata; tiers: On-line, Near-line, Off-line]
  10. 10. Big Data Story: On-line Data. On-line Data Infrastructure:
      • Supports typical OLTP requirements
        • Highly concurrent R/W access
        • Transactional guarantees
        • Back-up & Recovery
      • Supports a central LinkedIn Data Principle: “All data everywhere”
        • All OLTP databases need to provide a time-line consistent change stream
        • For this, we developed and open-sourced Databus!
      [Diagram labels: Web Serving, Updates, Oracle or Espresso; tier: On-line]
  11. 11. Big Data Story: On-line Data. A user updates the company, title, & school on his profile. He also accepts a connection. The write is made to an Oracle or Espresso master, and Databus replicates it:
      • The profile change is applied to the Standardization service
        • E.g. the many forms of IBM were canonicalized for search-friendliness
      • ... and to the Search Index
        • Recruiters can find you immediately by new keywords
      • The connection change is applied to the Graph Index service
        • The user can now start receiving feed updates from his new connections
      [Diagram labels: Updates, Oracle or Espresso, Data Change Events, Standardization, Search Index, Graph Index, Read Replicas]
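
The fan-out described on this slide can be sketched in a few lines of Python. This is a toy, in-memory illustration and not the Databus API: `ChangeStream`, `subscribe`, `publish`, the `seq` field, and the three consumer functions are hypothetical names chosen for this example. It only shows the idea that one ordered stream of change events feeds several independent consumers.

```python
from collections import defaultdict
from typing import Callable, Dict, List

ChangeEvent = Dict[str, object]
Consumer = Callable[[ChangeEvent], None]


class ChangeStream:
    """Toy stand-in for an ordered change stream (hypothetical; not the Databus API)."""

    def __init__(self) -> None:
        self._consumers: Dict[str, List[Consumer]] = defaultdict(list)
        self._next_seq = 1  # monotonically increasing sequence number

    def subscribe(self, source: str, consumer: Consumer) -> None:
        self._consumers[source].append(consumer)

    def publish(self, source: str, payload: Dict[str, object]) -> None:
        # Stamp each event with a sequence number so every consumer
        # sees changes in the same order.
        event = {"seq": self._next_seq, "source": source, **payload}
        self._next_seq += 1
        for consumer in self._consumers[source]:
            consumer(event)


applied: List[str] = []  # records which consumer handled which event


def standardization(event: ChangeEvent) -> None:
    applied.append(f"standardize:{event['seq']}")


def search_index(event: ChangeEvent) -> None:
    applied.append(f"search:{event['seq']}")


def graph_index(event: ChangeEvent) -> None:
    applied.append(f"graph:{event['seq']}")


stream = ChangeStream()
stream.subscribe("profile", standardization)
stream.subscribe("profile", search_index)
stream.subscribe("connection", graph_index)

# A profile update followed by an accepted connection.
stream.publish("profile", {"member_id": 1, "company": "I.B.M."})
stream.publish("connection", {"member_id": 1, "connected_to": 2})
print(applied)
```

Profile changes reach both the standardization and search consumers, while the connection change reaches only the graph consumer, mirroring the routing on the slide.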
  12. 12. Big Data Story: On-line Data. Databus streams also update Hadoop! [Diagram labels: Updates, Oracle or Espresso, Data Change Events, Standardization, Search Index, Graph Index, Read Replica]
  13. 13. Big Data Story: Near-line & Off-line Data. 2 main sources of data @ LinkedIn:
      • User-provided data
        • e.g. Member Profile data (employment, education history, endorsements)
      • Tracking data via web site instrumentation
        • e.g. pages viewed, email opened/sent, social gestures: posts/likes/shares
      [Diagram labels: Updates, Oracle or Espresso, Databus, Web Servers, Teradata]
  14. 14. The Segmentation & Targeting Problem
  15. 15. Segmentation & Targeting
  16. 16. Segmentation & Targeting Attribute types Bhaskar Ghosh
  17. 17. Segmentation & Targeting
      Step 1: Take some information about users

      | Member ID | Join Date  | Country | Responded to Promotion X1 |
      |-----------|------------|---------|---------------------------|
      | 1         | 01/01/2013 | FR      | F                         |
      | 2         | 01/02/2013 | BE      | F                         |
      | 3         | 01/03/2013 | FR      | F                         |
      | 4         | 02/01/2013 | FR      | T                         |

      Step 2: Provide some targeting criteria for a new promotion. Pick members where
      • Join Date between('01/01/2013', '01/31/2013') and
      • Country = "FR" and
      • Responded to Promotion X1 = "F"
      → Members 1 & 3
      Step 3: Target them for a different email campaign (promotion_X2)
  18. 18. Segmentation & Targeting (the example from slide 17, annotated): the user information in Step 1 is the Attributes; the targeting criteria in Step 2 are the Segment Definition; the resulting members (1 & 3) are the Segment
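
The three steps in this example can be sketched directly in Python. The field names (`member_id`, `join_date`, `country`, `responded_x1`) and the `in_segment` helper are illustrative assumptions, and the dates are read as MM/DD/YYYY, which is the reading under which the slide's answer (members 1 & 3) holds.

```python
from datetime import date

# Step 1: attributes, one row per member (values taken from the slide's table).
members = [
    {"member_id": 1, "join_date": date(2013, 1, 1), "country": "FR", "responded_x1": False},
    {"member_id": 2, "join_date": date(2013, 1, 2), "country": "BE", "responded_x1": False},
    {"member_id": 3, "join_date": date(2013, 1, 3), "country": "FR", "responded_x1": False},
    {"member_id": 4, "join_date": date(2013, 2, 1), "country": "FR", "responded_x1": True},
]


def in_segment(member: dict) -> bool:
    """Step 2: the segment definition, a conjunction of three predicates."""
    return (
        date(2013, 1, 1) <= member["join_date"] <= date(2013, 1, 31)
        and member["country"] == "FR"
        and not member["responded_x1"]
    )


# Step 3: the segment is the list of member IDs to target with promotion_X2.
segment = [m["member_id"] for m in members if in_segment(m)]
print(segment)  # [1, 3]
```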
  19. 19. Segmentation & Targeting: Problem Definition
      • The business wants to launch new campaigns often
      • The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes
      • The attributes often need to be computed to fulfill the targeting criteria
      • This data resides on Hadoop or Teradata (TD)
      • The business is most comfortable with SQL-like languages
  20. 20. Segmentation & Targeting Solution
  21. 21. Segmentation & Targeting: Attribute Computation Engine; Attribute Serving Engine
  22. 22. Segmentation & Targeting: Attribute Computation Engine: self-service; support for various data sources; attribute consolidation; attribute availability
  23. 23. Segmentation & Targeting: Attribute computation [diagram labels: ~225M, PB, TB, TB, ~240]
  24. 24. LinkedIn Segmentation & Targeting Platform [diagram labels: Attribute Portal Web Application; Attribute & Definition Metadata]
  25. 25. LinkedIn Segmentation & Targeting Platform [diagram labels: Attribute & Definition Metadata; TD Executor (REST), Hive Executor (REST), Pig Executor (REST)]
  26. 26. LinkedIn Segmentation & Targeting Platform: Attribute consolidation & availability [diagram labels: /path/dataset1, /path/dataset2, /path/dataset3, /path/dataset4, M/R Stitcher, /path/lnkd_big_table, Data Loader]
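
The slide only shows several datasets flowing through an M/R Stitcher into a single big table, so the following is a minimal sketch of what such consolidation could look like, not the platform's actual job. It assumes every dataset is keyed by a `member_id` column (an assumption; the slides do not name the join key), and it runs in memory rather than as MapReduce. The `stitch` function is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


def stitch(datasets: Iterable[List[Row]], key: str = "member_id") -> Dict[object, Row]:
    """Merge per-attribute datasets into one wide row per key."""
    big_table: Dict[object, Row] = defaultdict(dict)
    for dataset in datasets:
        for row in dataset:
            # "Map" step: split the row into (key, attributes).
            attrs = {name: value for name, value in row.items() if name != key}
            # "Reduce" step: merge all attributes that share the same key.
            big_table[row[key]].update(attrs)
    return dict(big_table)


# Two small attribute datasets, standing in for /path/dataset1 and /path/dataset2.
dataset1 = [{"member_id": 1, "country": "FR"}, {"member_id": 2, "country": "BE"}]
dataset2 = [{"member_id": 1, "responded_x1": False}]

big_table = stitch([dataset1, dataset2])
print(big_table)
```

Members missing from a dataset simply lack that attribute in the wide row, which is one plausible way to handle sparse attributes.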
  27. 27. LinkedIn Segmentation & Targeting Platform: LinkedIn big table, the most sought-after data [diagram labels: LinkedIn big table; Segmentation, Propensity Model, Ad hoc analysis]
  28. 28. Segmentation & Targeting: Attribute Serving Engine: self-service; attribute predicate expression; build segments; build lists
  29. 29. Segmentation & Targeting: Serving Engine: count, filter, sum, and complex expressions over the LinkedIn big table [diagram labels: $, Σ, 1234, ~225M, ~240]
  30. 30. LinkedIn Segmentation & Targeting Platform [diagram labels: LinkedIn big table, Attribute & Definition Metadata, M/R Indexer, Inverted Index (x3)]
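
To illustrate what an indexer over the big table produces, here is a minimal inverted-index sketch in plain Python. It is not Lucene and not the platform's M/R Indexer: the function name `build_inverted_index`, the `(attribute, value)` posting key, and the sample attributes are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

PostingKey = Tuple[str, object]


def build_inverted_index(big_table: Dict[int, Dict[str, object]]) -> Dict[PostingKey, Set[int]]:
    """Map each (attribute, value) pair to the set of member IDs that have it."""
    index: Dict[PostingKey, Set[int]] = defaultdict(set)
    for member_id, attrs in big_table.items():
        for name, value in attrs.items():
            index[(name, value)].add(member_id)
    return dict(index)


big_table = {
    1: {"country": "FR", "job_seeker": True},
    2: {"country": "BE", "job_seeker": False},
    3: {"country": "FR", "job_seeker": False},
}

index = build_inverted_index(big_table)
print(sorted(index[("country", "FR")]))  # [1, 3]
```

With this layout, an equality predicate becomes a single lookup, and combining predicates becomes set intersection or union over the posting sets.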
  31. 31. LinkedIn Segmentation & Targeting Platform
      • Who are North American recruiters that don’t work for a competitor?
      • Who are the LinkedIn Talent Solution prospects in Europe?
      • Who are the job seekers?
  32. 32. LinkedIn Segmentation & Targeting Platform [diagram labels: JSON Predicate Expression, JSON Lucene Query Parser, Inverted Index (x3), Segment & List]
  33. 33. LinkedIn Segmentation & Targeting Platform: Complex tree-like attribute predicate expressions
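
A tree-like predicate expression can be evaluated recursively against inverted indexes using set operations. The sketch below is an assumption-heavy illustration: the slides do not show the JSON format, so the schema (`op`, `args`, `arg`, `attr`, `value`), the `evaluate` function, and the sample index contents are all hypothetical. The expression encodes the earlier question about North American recruiters who do not work for a competitor.

```python
import json
from typing import Dict, Set, Tuple

Index = Dict[Tuple[str, object], Set[int]]


def evaluate(node: dict, index: Index, universe: Set[int]) -> Set[int]:
    """Recursively evaluate a predicate tree into a set of member IDs."""
    op = node["op"]
    if op == "term":
        # Leaf: look up the posting set for (attribute, value).
        return set(index.get((node["attr"], node["value"]), set()))
    if op == "not":
        # Negation is taken relative to the set of all members.
        return universe - evaluate(node["arg"], index, universe)
    # "and" / "or" nodes are assumed to have at least one child.
    results = [evaluate(child, index, universe) for child in node["args"]]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical posting sets for four members.
index: Index = {
    ("region", "NA"): {1, 2, 4},
    ("role", "recruiter"): {1, 2, 3},
    ("employer", "competitor"): {2},
}
universe = {1, 2, 3, 4}

expression = json.loads("""
{"op": "and", "args": [
  {"op": "term", "attr": "region", "value": "NA"},
  {"op": "term", "attr": "role", "value": "recruiter"},
  {"op": "not", "arg": {"op": "term", "attr": "employer", "value": "competitor"}}
]}
""")

segment = evaluate(expression, index, universe)
print(sorted(segment), len(segment))
```

Because the result is a set, a count is just its size, and the same set can be saved as a list for a campaign.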
  34. 34. LinkedIn Segmentation & Targeting Platform: A marketing campaign is represented by a list
  35. 35. Conclusion: Move at business speed and scale at LinkedIn scale
      • Segmentation & Targeting Platform
        • Self-service
        • Multiple data sources & massive data volume
        • Support complex expression evaluation in seconds
        • Attribute availability at business speed
  36. 36. Engineering Team: Jessica Ho, Swetha Karthik, Raj Rangaswamy, Tony Tong, Ajinkya Harkare, Hien Luu, Sid Anand
  37. 37. Questions? More info: data.linkedin.com