LinkedIn Stream Experimentation Framework


Talk from Strata Santa Clara 2014. Given with Xin Fu and Bee-Chung Chen


Transcript

  • 1. LinkedIn’s STREAM EXPERIMENTATION FRAMEWORK Joseph Adler, Bee-Chung Chen, and Xin Fu O’Reilly Strata Conference February 12 2014 ©2014 LinkedIn Corporation. All Rights Reserved.
  • 3. The LinkedIn Stream: As on many social networks, the centerpiece of LinkedIn’s home page is a news stream. It contains: • Updates about users’ networks • News stories and shares • Recommendations
  • 4. The LinkedIn Stream: We operate at a large scale. • 277+ million members • 75+ million monthly unique users • 5000+ employees
  • 5. The LinkedIn Stream: Today, we’ll tell you how we experiment with new content in the stream: • Creating new content • Maximizing relevance • Managing tests
  • 6. History of the LinkedIn Stream: Network updates were introduced in 2006. Back then, LinkedIn had: • 5mm members • 875k monthly uniques • 70 employees
  • 7. History of the LinkedIn Stream: In practice this meant: • Slow-changing content, a small number of updates, a weekly visit rate ‣ No ranking/optimization • Small number of active tests, limited analytics resources ‣ Primitive resources for A/B tests • Limited engineering resources ‣ Hacky solution for testing new content...
  • 8. History of the LinkedIn Stream: We experimented with new content using a system called the Analytics Prototype Engine, or APE. It was implemented as an ad slot on the home page. Big wins included: • People You May Know • Groups You Might Like • Jobs You Might Be Interested In
  • 9. History of the LinkedIn Stream: We added more content over the next couple of years: • Status updates • Twitter content • Group discussions • OpenSocial content (TripIt, GitHub, and more...)
  • 10. History of the LinkedIn Stream: By 2009, the stream looked very similar to the stream today. LinkedIn was much bigger than when we first added a news stream... • 55mm members • 36mm monthly uniques • 500 employees (end of year)
  • 11. History of the LinkedIn Stream: … but the infrastructure hadn’t changed much and we were experiencing growing pains: • No system for ranking and optimization ‣ Users were overwhelmed with low-relevance updates • No system for A/B testing ‣ Overlapping A/B tests, poor experiment design, difficult analysis • No system for rapid prototyping/testing ‣ APE was making the site slow and unstable, and was shut down
  • 12. History of the Stream: In the rest of this talk, we’ll tell you how we’ve addressed these challenges (and used a lot of data science to make this happen).
  • 13. Content Insertion: In the beginning (2006), experiments happened outside the stream through APE: • Easy data uploads • Management UI • Templates
  • 14. Content Insertion: Most new content experiments boil down to one thing: creating experimental data. We wanted the data experts to be able to create experiments easily by focusing on data, not on writing production code (and wrestling with build systems, deployment processes, etc.). We created a system that lets data scientists push new content into the stream by writing scripts (in Pig, Hive, etc.).
  • 15. Content Insertion: Project Gorilla brought the spirit of APE back to the home page, inside the stream. [Architecture diagram; labels as extracted: nhome USCP Federator Gorilla First Pass Ranker Gorilla Voldemort Store Gorilla Batch Gorilla jobs]
  • 16. Content Insertion: What does this consist of? • An Apache Pig UDF for pushing content • A batch process that filters, consolidates, and ranks updates • A process that pushes data from Hadoop into Voldemort (our NoSQL key/value store) • An online system that fetches updates from the store and mixes them into the stream
  • 17. Content Insertion: Our implementation is very simple: • LinkedIn production systems use rest.li as an API (JSON data + schema) • We create data offline on Hadoop, put it in Voldemort, and surface it through an API. This means that we can experiment easily using existing templates, tracking, etc.; we just have to change the data that’s rendered. (We’re also experimenting with a similar real-time system based on Apache Samza.)
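The offline-to-online flow described on slides 16 and 17 can be sketched roughly as follows. This is a toy illustration under stated assumptions, not LinkedIn's implementation: a plain dictionary stands in for Voldemort, and the function names, record fields, and the score-based mixing rule are all hypothetical.

```python
import json

# Hypothetical stand-in for the key/value store (the talk uses Voldemort).
store = {}

def batch_job(candidate_updates, max_per_member=2):
    """Offline step: filter, consolidate, and rank candidate updates per member."""
    by_member = {}
    for update in candidate_updates:
        if update["score"] <= 0:  # filter out candidates with no expected value
            continue
        by_member.setdefault(update["member_id"], []).append(update)
    for member_id, updates in by_member.items():
        updates.sort(key=lambda u: u["score"], reverse=True)
        yield member_id, updates[:max_per_member]

def push_to_store(ranked):
    """Push step: write one JSON value per member key."""
    for member_id, updates in ranked:
        store[member_id] = json.dumps(updates)

def fetch_for_stream(member_id, organic_items):
    """Online step: fetch stored updates and mix them into the stream by score."""
    experimental = json.loads(store.get(member_id, "[]"))
    return sorted(organic_items + experimental,
                  key=lambda u: u["score"], reverse=True)

candidates = [
    {"member_id": 1, "type": "pymk", "score": 0.9},
    {"member_id": 1, "type": "job", "score": 0.0},
    {"member_id": 1, "type": "group", "score": 0.4},
    {"member_id": 1, "type": "news", "score": 0.2},
]
push_to_store(batch_job(candidates))
stream = fetch_for_stream(1, [{"member_id": 1, "type": "share", "score": 0.5}])
```

Because only the stored data changes between experiments, the online side stays the same from one experiment to the next, which is the property the slide emphasizes.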
  • 18. Relevance Optimization: Bring each individual user the most relevant items from different sources, optimizing for one or more measurable objectives.
  • 19. Relevance Optimization: • Maximize users’ clicks on items in the stream • Rank items according to their click rates ‣ Click rate: the probability that a user would click an item • Predict the click rate based on: ‣ User features: profile, visit pattern, interests, … ‣ Item features: type, topics, keywords, … ‣ User-item interaction features ‣ Context: device, time of day, previous page, …
  • 20. Relevance Optimization: Large-scale logistic regression • Input: a set of past users’ responses to items, one (response, feature vector) pair per row ‣ Response 1: (Gender=M, JobTitle=CEO, ItemType=JobChange, ...) ‣ Response 0: (Gender=F, JobTitle=Engineer, ItemType=Article, ...) ‣ … • Output: model parameters • Challenge: the data is too large to fit on a single machine • Solution: train the model using MapReduce on Hadoop
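To make the input and output of slide 20 concrete, here is a minimal single-machine sketch of logistic regression on one-hot categorical features. It is an illustration only: the two training rows are the ones shown on the slide, while the function names, the `name=value` encoding, the learning rate, and the use of plain stochastic gradient ascent are assumptions, not details from the talk.

```python
import math

def featurize(row):
    """One-hot encode categorical features as 'name=value' keys, plus a bias."""
    return [f"{name}={value}" for name, value in sorted(row.items())] + ["bias"]

def predict(weights, row):
    """Predicted click probability for one feature vector."""
    z = sum(weights.get(feature, 0.0) for feature in featurize(row))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=200, learning_rate=0.5):
    """Fit the model parameters by stochastic gradient ascent on the log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for response, row in examples:
            error = response - predict(weights, row)
            for feature in featurize(row):
                weights[feature] = weights.get(feature, 0.0) + learning_rate * error
    return weights

# The two (response, feature vector) rows shown on the slide.
examples = [
    (1, {"Gender": "M", "JobTitle": "CEO", "ItemType": "JobChange"}),
    (0, {"Gender": "F", "JobTitle": "Engineer", "ItemType": "Article"}),
]
weights = train(examples)
click_rate = predict(weights, examples[0][1])
```

Ranking then amounts to sorting a user's candidate items by `predict`, highest first.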
  • 21. Relevance Optimization: Large-scale logistic regression with ADMM. [Diagram: a large input data set is split into Partition 1, Partition 2, Partition 3, …, Partition K; a logistic regression is fit on each partition; the per-partition results feed a consensus computation]
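The partition-then-consensus structure in the diagram matches the standard consensus form of ADMM, which can be sketched as below. This is a small single-process illustration, not the Hadoop implementation from the talk: it is unregularized, each local problem is solved approximately with plain gradient descent, and the function names, step sizes, and toy data are assumptions.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def local_update(data, z, u, rho, steps=50, lr=0.1):
    """Approximately minimize local logistic loss + (rho/2) * ||x - z + u||^2."""
    x = list(z)
    for _ in range(steps):
        # Gradient of the proximal term that pulls x toward the consensus z.
        grad = [rho * (xi - zi + ui) for xi, zi, ui in zip(x, z, u)]
        # Gradient of the logistic loss on this partition only.
        for features, label in data:
            p = sigmoid(sum(xi * fi for xi, fi in zip(x, features)))
            for j, fj in enumerate(features):
                grad[j] += (p - label) * fj
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x

def admm_logistic(partitions, dim, rho=1.0, iterations=30):
    """Consensus ADMM: local fits per partition, then an averaging step."""
    z = [0.0] * dim                          # consensus parameters
    us = [[0.0] * dim for _ in partitions]   # scaled dual variables
    for _ in range(iterations):
        xs = [local_update(d, z, u, rho) for d, u in zip(partitions, us)]
        z = [sum(x[j] + u[j] for x, u in zip(xs, us)) / len(partitions)
             for j in range(dim)]
        us = [[u[j] + x[j] - z[j] for j in range(dim)] for x, u in zip(xs, us)]
    return z

# Toy data: features are [bias, value]; the label is 1 when value is positive.
partitions = [
    [([1.0, 2.0], 1), ([1.0, -1.0], 0)],
    [([1.0, 1.5], 1), ([1.0, -2.0], 0)],
]
consensus = admm_logistic(partitions, dim=2)
```

In a MapReduce setting, the calls to `local_update` would presumably run in parallel, one per partition, with the averaging step playing the role of the consensus computation.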
  • 29. Relevance Optimization: Diversity. Users get tired when seeing items of the same type many times in the stream. Example (group discussions), drop in click rate: • 2 consecutive discussions: 21% • 3 consecutive discussions: 48%
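One simple way to act on these numbers is a greedy re-ranker that discounts an item's predicted click rate when it would extend a run of same-type items. The sketch below is an assumption-laden illustration, not the method from the talk: it applies the group-discussion drops (21% and 48%) to every item type, reuses the 48% figure for runs longer than three, and all names and sample scores are hypothetical.

```python
# Drop in click rate for the 2nd and 3rd consecutive item of the same type,
# taken from the group-discussion example in the talk. Applying these to all
# item types, and reusing 48% beyond the 3rd item, is an assumption.
DROP = {1: 0.0, 2: 0.21, 3: 0.48}

def discount(run_length):
    return 1.0 - DROP[min(run_length, 3)]

def rerank(items):
    """items: list of (item_id, item_type, predicted_click_rate) tuples."""
    remaining = list(items)
    ranked = []
    run_type, run_length = None, 0
    while remaining:
        def adjusted(item):
            # Length of the same-type run if this item were placed next.
            length = run_length + 1 if item[1] == run_type else 1
            return item[2] * discount(length)
        best = max(remaining, key=adjusted)
        remaining.remove(best)
        if best[1] == run_type:
            run_length += 1
        else:
            run_type, run_length = best[1], 1
        ranked.append(best)
    return ranked

items = [
    ("d1", "discussion", 0.10),
    ("d2", "discussion", 0.09),
    ("d3", "discussion", 0.08),
    ("a1", "article", 0.075),
]
order = [item_id for item_id, _, _ in rerank(items)]
```

With these sample scores, the article is promoted above the second discussion even though its raw click rate is lower, because the second discussion's rate is discounted by 21%.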
  • 30. Relevance Optimization: Multi-Objective Optimization • Different items in the stream generate different kinds of value ‣ Clicks ‣ Social actions: like, share, comment, … ‣ Revenue from sponsored items • One approach: maximize revenue subject to the constraint that clicks and social actions stay within ε% of optimal • It requires extensive experiments!
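The constrained approach on slide 30 could be written, for example, as the following program. The notation is an assumption introduced here for illustration: x denotes a ranking policy, R(x), C(x), and S(x) its expected revenue, clicks, and social actions, and C* and S* the best values of clicks and social actions achievable when each is optimized on its own.

```latex
\[
\begin{aligned}
\max_{x} \quad & R(x) \\
\text{s.t.} \quad & C(x) \ge \left(1 - \tfrac{\varepsilon}{100}\right) C^{*},
  \qquad C^{*} = \max_{x'} C(x'), \\
& S(x) \ge \left(1 - \tfrac{\varepsilon}{100}\right) S^{*},
  \qquad S^{*} = \max_{x'} S(x').
\end{aligned}
\]
```

Choosing ε is a product decision as much as a technical one, which is one reading of why the slide stresses extensive experiments.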
  • 31. Experimentation Framework: Stream experiments are carried out on LinkedIn’s central experimentation platform: • A one-stop solution for feature A/B testing, ramping, and advanced targeting needs • Built-in power calculation to aid experiment design • Automated reporting and analysis capabilities [Mock-up of UI]
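As an illustration of what a power calculation for experiment design involves, the sketch below estimates the sample size per group needed to detect a relative lift in a click rate, using the textbook normal-approximation formula for comparing two proportions. The talk does not describe the platform's actual calculation; the function name, default significance level and power, and the example rates are assumptions.

```python
import math
from statistics import NormalDist

def sample_size_per_group(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate members needed per group for a two-sided two-proportion test."""
    p1 = base_rate
    p2 = base_rate * (1.0 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)               # quantile for the target power
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical example: 5% baseline click rate, detect a 10% relative lift.
needed = sample_size_per_group(0.05, 0.10)
```

Smaller lifts require far larger groups, since the required size grows with the inverse square of the difference between the two rates.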
  • 32. Experimentation Framework: • History: members were assigned to test groups based on the modulo of their member IDs ‣ Very high likelihood of range overlaps between tests ‣ A single experiment could skew the results of other tests running on the same page • Now: a deterministic pseudo-random algorithm computes treatment assignment • Improved logging of treatment assignment • Automated scorecards • Record of historical experiments
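The talk does not say which deterministic pseudo-random algorithm is used. A common way to get that property, shown here purely as an assumed sketch, is to hash the experiment ID together with the member ID, so that each experiment splits members independently while any given member always lands in the same group. The function name, key format, and bucket granularity are hypothetical.

```python
import hashlib

def assign(member_id, experiment_id, treatments):
    """treatments: list of (name, percent) pairs whose percents sum to 100."""
    key = f"{experiment_id}:{member_id}".encode()
    digest = hashlib.sha256(key).digest()
    # Map the hash to a bucket in [0, 100) with 0.01 granularity.
    bucket = (int.from_bytes(digest[:8], "big") % 10000) / 100.0
    upper = 0.0
    for name, percent in treatments:
        upper += percent
        if bucket < upper:
            return name
    return treatments[-1][0]  # guard against rounding in the percent totals

split = [("control", 50), ("treatment", 50)]
group = assign(42, "stream_ranking_v2", split)  # hypothetical experiment name
```

By contrast, a rule such as `member_id % 100 < 50` puts exactly the same members into the treatment group of every test that uses it, which is the overlap problem the slide describes.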
  • 33. Experimentation Framework: • History: focus on product-specific metrics ‣ Stream relevance change ⇒ CTR ‣ Profile redesign ⇒ # of profile views • Now: standardized, tiered metric system ‣ Sitewide Tier 1 metrics ‣ Product-specific Tier 2 / Tier 3 metrics ‣ Comprehensive understanding of feature impact [Mock-up of UI]
  • 34. Conclusions: LinkedIn has always experimented with site content. As we’ve grown, we’ve had to rethink how we experiment. Key lessons: • Managing experimentation at scale is hard • Scale means users, content volume, and employees • Invest in platforms if they save time, money, or labor.