Presentation from a talk given at Boston Big Data Innovation Summit, September 2012.
Summary: The Data Science team at LinkedIn focuses on 3 main goals: (1) providing data-driven business and product insights, (2) creating data products, and (3) extracting interesting insights from our data such as analysis of the economic status of the country or identifying hot companies in a certain geographic region. In this talk I describe how we ensure that our products are data driven -- really data infused at the core -- and share interesting insights we uncover using LinkedIn's rich data. We discuss what makes a good data scientist, and what techniques and technologies LinkedIn data scientists use to convert our rich data into actionable product and business insights, to create data-driven products that truly serve our members.
LinkedIn at a Glance
• Founded in 2003
• 175M+ members
• 2 new signups per second
• Executives from all Fortune 500 companies
• 80% are “decision makers”
• Average household income in US: $86,000
• ~4B annual people searches
• Over 200 countries & territories
• 17 different languages
Your Professional Identity
Amazing dataset that we can slice and dice. By seniority, by job function – we can ask many interesting questions.
So what data do we have?
175M+ professional profiles
What does the Data Science team at LinkedIn do with the data?
• Product and Business Insights
• Build Data Products
• Extract Insights we Share Externally
da·ta noun pl but singular or pl in constr, often attributive ˈdā-tə, ˈda- also ˈdä-
Information in numerical form that can be digitally transmitted or processed
Source: http://www.merriam-webster.com
From data to information
Web Logs = Data → Parse, Normalize, Standardize → Normalized Data = Information
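A minimal sketch of the parse, normalize, standardize steps named on this slide. The pipe-delimited log format, the field names, and the `DEVICE_ALIASES` vocabulary are assumptions made up for illustration; they are not LinkedIn's actual log schema or tooling.

```python
from datetime import datetime, timezone

# Hypothetical raw log format (an assumption): "epoch_seconds|member_id|device|page"
RAW_LOGS = [
    "1346500800|42|iPad|/Home ",
    "1346504400|42|DESKTOP|/profile",
    "not a log line",
]

# Hypothetical mapping from raw device strings to a fixed vocabulary.
DEVICE_ALIASES = {"ipad": "tablet", "iphone": "phone", "desktop": "desktop"}


def parse(line):
    """Parse: split one raw line into four fields, or return None if malformed."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 4 or not parts[0].isdigit() or not parts[1].isdigit():
        return None
    return parts


def normalize(parts):
    """Normalize: convert types and lowercase free-text fields."""
    ts, member_id, device, page = parts
    return {
        "time": datetime.fromtimestamp(int(ts), tz=timezone.utc),
        "member_id": int(member_id),
        "device": device.lower(),
        "page": page.lower(),
    }


def standardize(record):
    """Standardize: map device names onto the fixed vocabulary."""
    out = dict(record)
    out["device"] = DEVICE_ALIASES.get(record["device"], "other")
    return out


def to_information(lines):
    """Turn raw log lines (data) into clean records (information)."""
    records = []
    for line in lines:
        parts = parse(line)
        if parts is not None:  # malformed lines are skipped
            records.append(standardize(normalize(parts)))
    return records


records = to_information(RAW_LOGS)
```

Keeping the three steps as separate functions makes it easy to report how many lines each stage drops or rewrites, which is often the first thing to check when a downstream number looks wrong.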
If you can’t measure it, you can’t fix it.
Measure everything.
Know thyself: What’s going on?
In the form of reporting, knowing the numbers, understanding usage of products, patterns in the data, segments of users, tracking the growth and health of the ecosystem.
Rethinking our Mobile App: what do people do on this page? Where do they go next? How many drop off? What is the stickiest product? What works, what doesn’t?
At what times in the day are people using different devices?
[Chart: desktop usage vs. iPad devices accessing linkedin.com via browser, by hour of the day]
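One way to answer the time-of-day question is to count page views per device and hour, then compare each device's hourly share. The sketch below assumes events have already been reduced to `(hour_of_day, device)` pairs; the `EVENTS` sample is invented for illustration and does not reflect real usage numbers.

```python
from collections import Counter

# Hypothetical page-view events as (hour_of_day, device); values are invented.
EVENTS = [
    (9, "desktop"), (9, "desktop"), (10, "desktop"),
    (21, "ipad"), (21, "ipad"), (22, "ipad"), (9, "ipad"),
]


def hourly_share(events):
    """Return {device: list of 24 floats}, each the share of that device's views in that hour."""
    counts = Counter(events)                         # views per (hour, device)
    totals = Counter(device for _, device in events)  # views per device
    return {
        device: [counts[(hour, device)] / total for hour in range(24)]
        for device, total in totals.items()
    }


def peak_hour(shares, device):
    """Return the hour (0-23) with the largest share of views for a device."""
    hours = shares[device]
    return max(range(24), key=lambda h: hours[h])


shares = hourly_share(EVENTS)
```

Comparing shares rather than raw counts keeps a high-volume device from hiding the daily pattern of a low-volume one.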
data → information → knowledge → insights → wisdom
Wisdom: What’s the next needle mover?
Strategic Analyses: Using data to drive the business.
• What is the value of an action that a user takes on the site?
• What early behavior on the site is predictive of future engagement?
• What is the value of a user?
• What is mobile’s impact on social actions?
• How does mobile usage impact desktop site engagement?
What is Data Science?
Using (multiple) data elements in clever ways to solve iterative or auxiliary data problems that when combined solve a data problem that might otherwise be intractable.
What makes a data scientist?
Data Scientist = Curiosity + Intuition + Product & Business Sense + Data gathering + Standardization + Statistics + Modeling + Visualization + Communication
analytics & data science
What does the Data Science team at LinkedIn do with the data?
• Product and Business Insights
• Build Data Products
• Extract Insights we Share Externally
If your name is Chip, you are likely in sales!
The 10 Most Attractive Startups to Bay Area Engineers
The Power of Aggregation
Before employees worked at: Yahoo! (169), Google (96), Oracle (78), Microsoft (72), IBM (43)
Before employees worked at: Google (475), Microsoft (448), LinkedIn (169), Apple, Inc. (154), eBay (133)
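Counts like the ones above can be produced by aggregating individual career histories. The following is a minimal sketch, assuming each member's history is an ordered list of employer names; the `HISTORIES` sample and company names are hypothetical, and this is not presented as the method used for the slide.

```python
from collections import Counter

# Hypothetical career histories: one list per member, employers in chronological order.
HISTORIES = [
    ["A Corp", "B Inc", "Target Co"],
    ["B Inc", "Target Co"],
    ["C LLC", "Target Co", "D Ltd"],
    ["B Inc", "C LLC"],
]


def previous_employers(histories, company):
    """Count the employer each member held immediately before joining `company`."""
    counts = Counter()
    for history in histories:
        # Walk consecutive (previous, current) employer pairs.
        for prev, curr in zip(history, history[1:]):
            if curr == company:
                counts[prev] += 1
    return counts.most_common()


ranking = previous_employers(HISTORIES, "Target Co")
```

No single profile says anything about talent flow between companies; the signal only appears once many histories are counted together, which is the point of the slide.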
It’s all about the people who do end-to-end data science
Our partners to make it all happen
• Product and Design: use data to influence the design of the product, and user experience & interaction
• Marketing: build models to predict members’ propensity to act on an email campaign "call to action". When is the best time to message that user, and what does it depend on?
• Business Operations: e.g. how is the transition to mobile impacting ads and subscription upsells?
• Executive team: on strategic questions
• Engineering: understanding how data is tracked and implemented
• Data Services: how do we build tools & infrastructure to democratize the data?
Above all – maintaining the mindset of a data-driven company.