LinkedIn Member Segmentation Platform: A Big Data Application

Creating member segments is one of the main functions of a marketing team at any Internet company. Marketing teams constantly create member segments tailored to the needs of marketing campaigns, and these needs change frequently. There is therefore a strong need for a self-service member segmentation platform that is easy to use and scales to large member data sets. This presentation covers the architecture of the LinkedIn Member Segmentation platform and how it leverages Hadoop technologies such as Apache Pig and Apache Hive, together with an enterprise data warehouse system such as Teradata, to provide a self-service way to create and manage member segments. It also covers some of the interesting challenges and lessons learned from building this platform.

Published in: Technology, Business

  1. LinkedIn Segmentation & Targeting Platform: A Big Data Application. Hadoop Summit, June 2013. Hien Luu, Sid Anand. ©2013 LinkedIn Corporation. All Rights Reserved.
  2. About Us: Hien Luu, Sid Anand
  3. Our mission: Connect the world’s professionals to make them more productive and successful
  4. Over 200M members and counting. The world’s largest professional network • Growing at more than 2 members/sec • [Chart: LinkedIn Members (Millions), 2004 to 2012: 2, 4, 8, 17, 32, 55, 90, 145, 200+] Source: http://press.linkedin.com/about
  5. The world’s largest professional network • >88% of Fortune 100 companies use LinkedIn Talent Solutions to hire • Company Pages: >2.9M • Professional searches in 2012: >5.7B • Languages: 19 • Fastest growing demographic: Students and NCGs, >30M • Over 64% of members are now international. Source: http://press.linkedin.com/about
  6. Other Company Facts • Headquartered in Mountain View, Calif., with offices around the world • As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world. Source: http://press.linkedin.com/about
  7. Agenda • Company Overview • Big Data @ LinkedIn • The Segmentation & Targeting Problem • Solution: LinkedIn Segmentation & Targeting Platform • Q & A
  8. Big Data @ LinkedIn
  9. LinkedIn : Big Data Story. Our Big Data Story depends on Infrastructure! • On-line Data Infrastructure • Near-line Data Infrastructure • Off-line Data Infrastructure [Diagram labels: Oracle or Espresso, Updates, Web Serving, Data Streams, Teradata; tiers: On-line, Near-line, Off-line]
  10. Big Data Story : On-line Data. On-line Data Infrastructure • Supports typical OLTP requirements: highly concurrent R/W access, transactional guarantees, back-up & recovery • Supports a central LinkedIn Data Principle: “All data everywhere” • All OLTP databases need to provide a time-line consistent change stream • For this, we developed and open-sourced Databus! [Diagram labels: Oracle or Espresso, Updates, Web Serving, On-line]
  11. Big Data Story : On-line Data. [Diagram labels: Oracle or Espresso, Updates, Data Change Events, Standardization, Search Index, Graph Index, Read Replicas] A user updates the company, title, & school on his profile. He also accepts a connection. The write is made to an Oracle or Espresso master and Databus replicates it: • the profile change is applied to the Standardization service (e.g. the many forms of IBM are canonicalized for search-friendliness) • ... and to the Search Index (recruiters can find you immediately by new keywords) • the connection change is applied to the Graph Index service (the user can now start receiving feed updates from his new connections)
  12. Big Data Story : On-line Data. Databus streams also update Hadoop! [Diagram labels: Oracle or Espresso, Updates, Data Change Events, Standardization, Search Index, Graph Index, Read Replica]
  13. Big Data Story : Near-line & Off-line Data. 2 Main Sources of Data @ LinkedIn • User-provided data, e.g. member profile data (employment, education history, endorsements) • Tracking data via web site instrumentation, e.g. pages viewed, emails opened/sent, social gestures (posts/likes/shares) [Diagram labels: Oracle or Espresso, Updates, Databus, Web Servers, Teradata]
  14. The Segmentation & Targeting Problem
  15. Segmentation & Targeting
  16. Segmentation & Targeting: Attribute types. Bhaskar Ghosh
  17. Segmentation & Targeting
      Step 1: Take some information about users
        Member ID | Join Date  | Country | Responded to Promotion X1
        1         | 01/01/2013 | FR      | F
        2         | 01/02/2013 | BE      | F
        3         | 01/03/2013 | FR      | F
        4         | 02/01/2013 | FR      | T
      Step 2: Provide some targeting criteria for a new promotion. Pick members where • Join Date between('01/01/2013', '01/31/2013') and • Country="FR" and • Responded to Promotion X1="F" → Members 1 & 3
      Step 3: Target them for a different email campaign (promotion_X2)
  18. Segmentation & Targeting: the same example as the previous slide, with its parts labeled. The member table of Step 1 holds the Attributes, the targeting criteria of Step 2 are the Segment Definition, and the result (Members 1 & 3) is the Segment.
  19. Segmentation & Targeting: Problem Definition • The business wants to launch new campaigns often • The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes • The attributes often need to be computed to fulfill the targeting criteria • The data behind these attributes resides on Hadoop or Teradata (TD) • The business is most comfortable with SQL-like languages
  20. Segmentation & Targeting Solution
  21. Segmentation & Targeting • Attribute Computation Engine • Attribute Serving Engine
  22. Segmentation & Targeting: Attribute Computation Engine • Self-service • Support various data sources • Attribute consolidation • Attribute availability
  23. Segmentation & Targeting: Attribute computation [Figure labels: ~225M, PB, TB, TB, ~240]
  24. LinkedIn Segmentation & Targeting Platform [Diagram labels: Attribute Portal, Web Application, Attribute & Definition Metadata]
  25. LinkedIn Segmentation & Targeting Platform [Diagram labels: Attribute & Definition Metadata; TD Executor (REST), Hive Executor (REST), Pig Executor (REST)]
  26. LinkedIn Segmentation & Targeting Platform: Attribute consolidation & availability [Diagram labels: /path/dataset1, /path/dataset2, /path/dataset3, /path/dataset4, M/R Stitcher, /path/lnkd_big_table, Data Loader]
  27. LinkedIn Segmentation & Targeting Platform: LinkedIn big table, the most sought-after data. Uses of the LinkedIn big table • Segmentation • Propensity Model • Ad hoc analysis
  28. Segmentation & Targeting: Attribute Serving Engine • Self-service • Attribute predicate expression • Build segments • Build lists
  29. Segmentation & Targeting: Serving Engine • count • filter • sum • complex expressions [Figure labels: LinkedIn big table, ~225M, ~240]
  30. LinkedIn Segmentation & Targeting Platform [Diagram labels: LinkedIn big table, Attribute & Definition Metadata, M/R Indexer, Inverted Index (x3)]
  31. LinkedIn Segmentation & Targeting Platform: example questions • Who are the North American recruiters that don’t work for a competitor? • Who are the LinkedIn Talent Solution prospects in Europe? • Who are the job seekers?
  32. LinkedIn Segmentation & Targeting Platform [Diagram labels: JSON Predicate Expression, JSON Lucene Query Parser, Inverted Index (x3), Segment & List]
  33. LinkedIn Segmentation & Targeting Platform: Complex tree-like attribute predicate expressions
  34. LinkedIn Segmentation & Targeting Platform: A marketing campaign is represented by a list
  35. Conclusion: Move at business speed and scale at LinkedIn scale • Segmentation & Targeting Platform: self-service; multiple data sources & massive data volume; complex expression evaluation in seconds; attribute availability at business speed
  36. Engineering Team • Jessica Ho • Swetha Karthik • Raj Rangaswamy • Tony Tong • Ajinkya Harkare • Hien Luu • Sid Anand
  37. Questions? More info: data.linkedin.com
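
As a rough illustration of slides 17 and 26 to 33, the following Python sketch stitches per-attribute datasets into one wide table, builds an inverted index over it, and evaluates a tree-like predicate expression to produce the segment from the slide 17 example. It is a minimal in-memory sketch, not the platform's implementation: the slides describe Map/Reduce jobs, Lucene-based inverted indexes, and JSON expressions, while everything named here (stitch, build_index, evaluate, the attribute names, and the "and"/"or"/"not"/"eq"/"between" operators) is an assumption made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-attribute datasets keyed by member ID (values from slide 17).
join_dates = {1: date(2013, 1, 1), 2: date(2013, 1, 2),
              3: date(2013, 1, 3), 4: date(2013, 2, 1)}
countries = {1: "FR", 2: "BE", 3: "FR", 4: "FR"}
responded_x1 = {1: False, 2: False, 3: False, 4: True}


def stitch(**datasets):
    """Join attribute datasets on member ID into one wide row per member."""
    table = defaultdict(dict)
    for attribute, values in datasets.items():
        for member_id, value in values.items():
            table[member_id][attribute] = value
    return dict(table)


def build_index(table):
    """Map each (attribute, value) pair to the set of member IDs that have it."""
    index = defaultdict(set)
    for member_id, row in table.items():
        for attribute, value in row.items():
            index[(attribute, value)].add(member_id)
    return index


def evaluate(node, index, all_members):
    """Recursively evaluate a tree-shaped predicate expression against the index."""
    op = node["op"]
    if op == "and":
        return set.intersection(*(evaluate(c, index, all_members) for c in node["args"]))
    if op == "or":
        return set.union(*(evaluate(c, index, all_members) for c in node["args"]))
    if op == "not":
        return all_members - evaluate(node["arg"], index, all_members)
    if op == "eq":
        return set(index.get((node["attr"], node["value"]), set()))
    if op == "between":
        matches = set()
        for (attr, value), members in index.items():
            if attr == node["attr"] and node["low"] <= value <= node["high"]:
                matches |= members
        return matches
    raise ValueError(f"unknown operator: {op}")


# Dict-based stand-in for the segment definition of slide 17, Step 2.
segment_definition = {
    "op": "and",
    "args": [
        {"op": "between", "attr": "join_date",
         "low": date(2013, 1, 1), "high": date(2013, 1, 31)},
        {"op": "eq", "attr": "country", "value": "FR"},
        {"op": "eq", "attr": "responded_x1", "value": False},
    ],
}

big_table = stitch(join_date=join_dates, country=countries, responded_x1=responded_x1)
index = build_index(big_table)
segment = evaluate(segment_definition, index, set(big_table))
print(sorted(segment))  # [1, 3], i.e. Members 1 & 3 as on slide 17
```

Evaluating predicates as set operations over index postings means the cost of a query depends on the attributes it touches rather than on the width of the table, which is one plausible reason to index the big table instead of scanning it for every segment.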
