
SF Data Science: Developing Data Products

Examples, techniques, and lessons learned building data products over the last 4 years at LinkedIn.

Pete Skomoroch is a Principal Data Scientist at LinkedIn where he leads a team focused on building data products leveraging LinkedIn's powerful identity and reputation data.

The talk describes some techniques and best practices applied to develop products like LinkedIn Skills & Endorsements.

This talk was presented at the SF Data Science Meetup on September 19, 2013.

Published in: Technology, Education

SF Data Science: Developing Data Products

  1. Developing Data Products. SF Data Science Meetup. Pete Skomoroch (@peteskomoroch). September 19, 2013. ©2012 LinkedIn Corporation. All Rights Reserved.
  2. Developing Data Products: Examples, Techniques, & Lessons Learned
  3. Our Mission: Connect the world’s professionals to make them more productive and successful. Our Vision: Create economic opportunity for every professional in the world. Members First!
  4. LinkedIn is the leading professional network site. Worldwide Workforce: 3,300M+. Worldwide Professionals: 640M+. LinkedIn Members: 238M+.
  5. LinkedIn profiles represent our professional identity. 238M Members; 238M Member Profiles.
  6. We have a lot of data.
  7. We have a lot of data. And (like everyone else), we store it in Hadoop.
  8. We have a lot of data. And (like everyone else), we store it in Hadoop. And people build awesome things with that data.
  9. What do we mean by data products?
  10. Building products from data at LinkedIn. A few examples: People You May Know, Skills and Endorsements, Year in Review, Network Updates Digest, InMaps, Who’s viewed my profile, Collaborative Filtering, Groups You May Like, and more…
  11. Collaborative Filtering: LinkedIn Skill Pages
  12. Classification: giving structure to unstructured data (diagram label: Extract)
  13. Clustering & Disambiguation
  14. De-duplication and Normalization
  15. Network Algorithms: Relevance & Ranking
  16. Prediction: Personalized Skill Recommendations
  17. Skill Endorsements: Over 2 Billion and Growing
  18. Social Proof and the Skill Endorsement Graph
  19. The Economic Graph: Skills, Jobs, People, Locations… (diagram label: Location)
  20. Lessons learned developing data products
  21. Collect the right data at the right time
  22. Large amounts of data can reveal new patterns [Chart: probability of job title vs. time since graduation]
  23. Be wary of “black-box” approaches
  24. Look at your data
  25. Aggregate statistics can be misleading [Chart]
  26. Build a viewer app, “micro-listen”
  27. Algorithmic intuition: include data geeks in design
  28. OODA: Think like a jet fighter
  29. OODA: Observe, Orient, Decide, Act
  30. OODA: The speed you can move determines victory
  31. Red teaming: what can go wrong likely will
  32. Error data is valuable, analyze it and adapt
  33. Conclusion: tips for developing data products: Collect the right data at the right time; Large amounts of data can reveal new patterns; Be wary of “black box” approaches; Look at your raw data; Aggregate statistics can be misleading; Build and use viewer apps; Include data geeks in design process; OODA: Think like a jet fighter; Red-teaming: anticipate edge cases; Find opportunity in your error data
  34. Questions? @peteskomoroch
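
The tips "Aggregate statistics can be misleading" and "Look at your raw data" can be illustrated with a small sketch. The two samples below are made up for illustration and are not data from the talk, and the `summarize` helper is a hypothetical name. Both samples share the same mean and median, yet a look at the individual values shows they have nothing else in common.

```python
from statistics import mean, median, pstdev

# Made-up illustrative samples (an assumption, not data from the talk).
steady = [5, 5, 5, 5, 5, 5]
split = [0, 0, 0, 10, 10, 10]


def summarize(values):
    """Return summary statistics plus a crude text histogram of the raw values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    # One '#' per occurrence, keyed by value, in ascending order of value.
    histogram = {v: "#" * n for v, n in sorted(counts.items())}
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": pstdev(values),  # population standard deviation
        "histogram": histogram,
    }


steady_summary = summarize(steady)
split_summary = summarize(split)

for name, summary in (("steady", steady_summary), ("split", split_summary)):
    print(name, summary)
```

Comparing only `mean` and `median` would suggest the two samples are alike. The spread and the histogram, which stay close to the raw values, show that they are not.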
