Talk given at the Big Boulder conference hosted by Gnip in Boulder, Colorado on June 21, 2012. This talk provides an intro to Data Science at LinkedIn, and highlights the types of roles a Data Science team can play at a data-driven company. We use data (1) to create products that truly serve our members, (2) to derive insights, and (3) to generate wisdom which enables us to take the products and company to the next level. LinkedIn's data on 160+ million professionals' careers and networks provides a fascinating playground for data scientists to discover insights about career trends, the social web and the economy.
2. Connect the world’s professionals to make
them more productive and successful
3. LinkedIn at a glance
• Founded in 2003
• 160M+ members
• 2 new signups per second
• Executives from all Fortune 500 companies
• 42% are “decision makers”
• Average household income: $107,000
• ~4B annual people searches
• Over 200 countries & territories
• 17 different languages
• Amazing data set to slice and dice
4. What does the Data Science team at LinkedIn
do?
da·ta noun pl but singular or pl in constr, often attributive \ˈdā-tə, ˈda- also ˈdä-\
• Information in numerical form that can be
digitally transmitted or processed
Source : http://www.merriam-webster.com
5. Web Logs = Data
Parse, Normalize, Standardize
Normalized Data = Information
In that case... Information
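The parse, normalize, standardize steps on slide 5 could look roughly like the sketch below. The talk does not describe LinkedIn's log layout or pipeline, so everything here is an assumption for illustration: the Common Log Format input, the field names, and the helper names `parse`, `normalize`, `standardize` and `to_information` are all hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed input layout: Common Log Format. The talk does not specify
# the actual log format, so this pattern is illustrative only.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse(line):
    """Parse: split one raw log line into named string fields (None if malformed)."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def normalize(record):
    """Normalize: convert strings to typed values and timestamps to UTC."""
    ts = datetime.strptime(record["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": record["ip"],
        "timestamp_utc": ts.astimezone(timezone.utc).isoformat(),
        "method": record["method"],
        "path": record["path"],
        "status": int(record["status"]),
        "bytes": 0 if record["size"] == "-" else int(record["size"]),
    }

def standardize(record):
    """Standardize: map equivalent paths to one form (no query, no trailing slash, lowercase)."""
    path = record["path"].split("?")[0].rstrip("/").lower() or "/"
    return {**record, "path": path}

def to_information(line):
    """Run the three steps in order; malformed lines yield None."""
    raw = parse(line)
    return standardize(normalize(raw)) if raw else None

# Made-up example line, not real traffic.
sample = '203.0.113.7 - - [21/Jun/2012:10:15:32 -0600] "GET /Profile/View/?id=42 HTTP/1.1" 200 5120'
info = to_information(sample)
```

Keeping the three steps as separate functions mirrors the slide: raw lines are data, and only the output of the last step is treated as information.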
9. What is Data Science?
Using (multiple) data elements in clever ways to solve
iterative or auxiliary data problems that when combined
solve a data problem that might otherwise be intractable.
What makes a data scientist?
Data Scientist = Curiosity + Intuition + Data gathering +
Standardization + Statistics + Modeling + Visualization +
Communication
analytics &
data science
10. What do we do with the data?
• Products: data-driven product design
• Insights
• Wisdom
26. The 10 most attractive startups to Bay Area engineers
27. What do we do with the data?
• Products
• Insights
• Wisdom
28. Strategic Analyses:
Using data to drive the business.
• What is the value of an action that a user takes on the
site?
• What early behavior on the site is predictive of future
engagement?
• What is the value of a user?
• Does mobile usage impact site engagement?
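As a rough illustration of the "early behavior predictive of future engagement" question, one simple starting point is to compare later engagement rates between users who did and did not show a given early behavior. This is only a sketch: the function name `engagement_rate_by_behavior`, the field names, and the toy records are hypothetical and are not LinkedIn data or the team's method.

```python
from collections import defaultdict

def engagement_rate_by_behavior(users, behavior_key, engaged_key="engaged_later"):
    """Return {behavior_value: share of users who were engaged later}."""
    counts = defaultdict(lambda: [0, 0])  # behavior value -> [engaged, total]
    for user in users:
        bucket = counts[user[behavior_key]]
        bucket[0] += 1 if user[engaged_key] else 0
        bucket[1] += 1
    return {value: engaged / total for value, (engaged, total) in counts.items()}

# Made-up toy records, purely to show the call shape (not real data).
toy_users = [
    {"added_photo_week1": True, "engaged_later": True},
    {"added_photo_week1": True, "engaged_later": True},
    {"added_photo_week1": True, "engaged_later": False},
    {"added_photo_week1": False, "engaged_later": True},
    {"added_photo_week1": False, "engaged_later": False},
    {"added_photo_week1": False, "engaged_later": False},
]
rates = engagement_rate_by_behavior(toy_users, "added_photo_week1")
```

A gap between the two rates shows association only. Deciding whether the early behavior actually drives engagement would need a controlled experiment or a model that accounts for confounders.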
30. Keep your profile up to date, but...
I am a highly motivated, innovative dynamic professional
with extensive experience in analytics, who has a proven
track record of being a problem solver and a team player
in fast paced entrepreneurial environments
33. Data Products on your LinkedIn homepage:
All of them are data products!
34. Standardization - The
challenges
We can standardize the title Software
Engineer with 6000+ variations including:
s.w engineer
sw. enginner
sofware engineer
sw. engineer
software enginer - china
We can standardize the company IBM with
8000+ variations including:
ibm - ireland
ibm research
TJ Watson Labs
International Buss. Machines
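A minimal sketch of how title variations like those on slide 34 could be mapped to one canonical form, using simple cleanup plus fuzzy string matching from Python's standard `difflib`. The talk does not say how LinkedIn does this, so the abbreviation table, the canonical list, the 0.8 cutoff, and the function names are all assumptions for illustration.

```python
from difflib import get_close_matches

# Hypothetical abbreviation table covering the slide's examples.
ABBREVIATIONS = {"sw": "software", "s.w": "software"}

# Hypothetical canonical vocabulary; a real one would be far larger.
CANONICAL_TITLES = ["software engineer", "data scientist", "product manager"]

def normalize_title(raw):
    """Lowercase, drop a ' - location' suffix, and expand known abbreviations."""
    text = raw.lower().split(" - ")[0]
    tokens = []
    for token in text.split():
        token = token.strip(".,")
        token = ABBREVIATIONS.get(token, token)
        if token:
            tokens.append(token)
    return " ".join(tokens)

def standardize_title(raw, cutoff=0.8):
    """Return the closest canonical title, or None if nothing is similar enough."""
    matches = get_close_matches(normalize_title(raw), CANONICAL_TITLES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

variations = [
    "s.w engineer",
    "sw. enginner",
    "sofware engineer",
    "sw. engineer",
    "software enginer - china",
]
standardized = [standardize_title(v) for v in variations]
```

String similarity handles misspellings and suffixes such as "ibm - ireland", but not aliases such as "TJ Watson Labs", which share no characters with "IBM". Those would need an explicit lookup table or another source of knowledge.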
35. Big Data that scales.
What Technologies do we use?
Crowdsourcing