Big Data Ecosystem @ LinkedIn

Copied for archival purposes from http://www.isim.ac.in/Infovision%202012/presentations/sunilshirguppilinkedin.pdf. All rights are reserved by the owner of the original document.

1. Big Data EcoSystem @ LinkedIn
   October 20, 2012
   LinkedIn Confidential ©2013 All Rights Reserved
2. Sunil Shirguppi
   Head of Data Services, International
   LinkedIn Corporation
   http://www.linkedin.com/in/sunilshirguppi
3. Outline
   • LinkedIn Overview
   • Data Science
   • Big Data Eco-System
   • Learnings
4. Our Mission
   Connect the world’s professionals to make them more productive and successful
5. We are the professional profile of record
   Googled yourself lately? Don’t feel bad, we all do it.
6. Executives from all companies are LinkedIn members
7. The LinkedIn Opportunity
   Fundamentally transforming the way the world works
   Connect talent with opportunity at massive scale
8. The World’s Largest Professional Network
   • LinkedIn members (millions), 2004 to 2010: 2, 4, 8, 17, 32, 55, 90
   • 175M+ members*
   • 82% of Fortune 100 companies use LinkedIn to hire
   • Company Pages: >2M**
   • New members joining: ~2/sec
   • Professional searches in 2011: ~4.2B
   *as of Nov 4, 2011; **as of June 30, 2011
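To put the "~2 new members per second" figure in daily and yearly terms, here is some back-of-the-envelope arithmetic. It simply scales the slide's approximate rate and is not a number reported in the deck.

$$
\begin{aligned}
2\,\frac{\text{members}}{\text{s}} \times 86{,}400\,\frac{\text{s}}{\text{day}} &\approx 172{,}800\,\frac{\text{members}}{\text{day}} \\
172{,}800\,\frac{\text{members}}{\text{day}} \times 365\,\frac{\text{days}}{\text{yr}} &\approx 6.3 \times 10^{7}\,\frac{\text{members}}{\text{yr}}
\end{aligned}
$$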
9. Multiple revenue channels
   • Premium Subscriptions
   • Self Serve Ads
   • Hiring Solutions
   • Marketing Solutions
10. Let’s talk Data…
11. Business is recognizing the importance of analytics
12. What makes a Data Scientist?
    Data Scientist = Curiosity + Intuition + Data gathering + Standardization + Statistics + Modeling + Visualization + Communication
13. Big Data at LinkedIn
    [Chart from Philip Russom, Research Director, TDWI]
14. What do we do with Data?
    • Data Standardization
    • Build innovative data products to help professionals
    • Draw insights
    • Drive the business
    Before we can do that, we have to overcome a few challenges:
    • Scale
    • Standardization
    • Infrastructure
15. A Few Data-Driven Products
    • Pandora
    • Search for People
    • Events You May Be Interested In
    • Groups browse maps
16. How do we do it?
17. LinkedIn Sample Data Stack
    Crowdsourcing
18. Big Data at LinkedIn: high-level data environment
    [Diagram: Users, Application, Online Data Store, Near-Line Data Store, Offline Data Store, Web Logs]
    • The challenges are so complex that no off-the-shelf product, or small set of technologies, can address them.
    • We built our own combination of toolsets/technologies to meet specific requirements.
19. LinkedIn Data Stack – Online
    [Diagram: same data environment as slide 18]
    Capabilities:
    • Rich structures (e.g. indexes)
    • Change capture capability
20. LinkedIn Data Stack – Nearline
    [Diagram: same data environment as slide 18]
    Systems: Voldemort, Zoie, Bobo, Sensei, D-Graph
    Capabilities:
    • Key-value access
    • Search platform
    • Distributed graph engine
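Slide 20 lists Voldemort for key-value access. Dynamo-style key-value stores, the family Voldemort belongs to, commonly spread keys across nodes with a consistent-hash ring. The sketch below is a generic, in-memory illustration of that idea under our own assumptions. The class `HashRing`, its `preference_list` method, and the node names are hypothetical. This is not Voldemort's API or configuration.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring (hypothetical sketch, not Voldemort's API)."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node is placed at `vnodes` positions on the ring
        # so that keys spread more evenly across nodes.
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(value):
        # Stable 64-bit position from MD5 (Python's built-in hash() is salted per process).
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def preference_list(self, key, replicas=2):
        """Return up to `replicas` distinct nodes for `key`, walking
        clockwise from the key's position on the ring."""
        distinct_nodes = {node for _, node in self._ring}
        replicas = min(replicas, len(distinct_nodes))
        start = bisect.bisect(self._positions, self._hash(key))
        chosen = []
        for offset in range(len(self._ring)):
            _, node = self._ring[(start + offset) % len(self._ring)]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == replicas:
                    break
        return chosen


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.preference_list("member:12345", replicas=2))
```

When a node is added to the ring, only the keys whose clockwise successor changes are moved. A naive `hash(key) % n` scheme would instead reshuffle almost every key.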
21. LinkedIn Data Stack – Pipeline
    [Diagram: same data environment as slide 18]
    Capabilities:
    • Messaging for site events, monitoring
    • Change data capture streams
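Slides 19 and 21 describe change capture on the online store, with the resulting change data capture streams feeding the rest of the stack. The deck does not name the pipeline systems. The sketch below therefore assumes an in-memory append-only log in place of a real messaging system. The online store records each write as an ordered change event. A nearline consumer, here a toy search index, applies those events starting from its own offset. `ChangeLog`, `OnlineStore`, and `NearlineIndex` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class ChangeEvent:
    seq: int               # position in the log; consumers resume from here
    key: str
    value: Optional[str]   # None marks a delete


class ChangeLog:
    """Append-only, ordered change stream (hypothetical stand-in for a real CDC system)."""

    def __init__(self):
        self._events: List[ChangeEvent] = []

    def append(self, key: str, value: Optional[str]) -> None:
        self._events.append(ChangeEvent(len(self._events), key, value))

    def read_from(self, offset: int) -> List[ChangeEvent]:
        return self._events[offset:]


class OnlineStore:
    """Primary key-value store that publishes every write to the change log."""

    def __init__(self, log: ChangeLog):
        self._data: Dict[str, str] = {}
        self._log = log

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._log.append(key, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
        self._log.append(key, None)


class NearlineIndex:
    """Toy inverted index kept up to date by consuming the change log."""

    def __init__(self):
        self.offset = 0  # next log position to apply
        self._docs: Dict[str, str] = {}
        self._postings: Dict[str, Set[str]] = {}

    def _unindex(self, key: str) -> None:
        for word in self._docs.pop(key, "").lower().split():
            self._postings.get(word, set()).discard(key)

    def catch_up(self, log: ChangeLog) -> None:
        for event in log.read_from(self.offset):
            self._unindex(event.key)  # drop the previous version, if any
            if event.value is not None:
                self._docs[event.key] = event.value
                for word in event.value.lower().split():
                    self._postings.setdefault(word, set()).add(event.key)
            self.offset = event.seq + 1

    def search(self, word: str) -> Set[str]:
        return set(self._postings.get(word.lower(), set()))


log = ChangeLog()
store = OnlineStore(log)
index = NearlineIndex()
store.put("member:1", "Data Scientist")
store.put("member:2", "Sales Manager")
index.catch_up(log)
print(sorted(index.search("data")))    # ['member:1']
store.put("member:1", "Sales Director")  # a profile edit flows through the log
index.catch_up(log)
print(sorted(index.search("sales")))   # ['member:1', 'member:2']
```

The consumer tracks its own offset. As a result, it can lag behind, restart, or be rebuilt from offset 0 without touching the online store. That decoupling is the usual motivation for routing changes through a log instead of having the application write to every downstream system.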
22. LinkedIn Data Stack – Offline
    [Diagram: same data environment as slide 18]
    Capabilities:
    • Machine learning, ranking, relevance
    • Warehouse and analytics
23. LinkedIn with Hadoop, Aster, and Teradata
    Integrated Data Warehouse
    • Exec Dashboards
    • Ad hoc/OLAP
    • Complex SQL
    • SQL
    Data transformation & batch processing
    • Image processing
    • Search indexes
    • Graph (PYMK)
    • MapReduce
    Analytic Platform for data discovery
    • nPath Pattern/Path
    • Clickstream analysis
    • A/B site testing
    • Data Sciences discovery
    • SQL-MapReduce
    Connectors: Aster/Teradata Bi-Directional Connector; Aster/Teradata Hadoop Connectors
    • Batch data transformations for engineering groups using HDFS + MapReduce
    • Interactive MapReduce analytics for the enterprise using MapReduce Analytics & SQL-MapReduce
    • Integration with structured data, operational intelligence, scalable distribution of analytics
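Slide 23 lists Graph (PYMK, People You May Know) among the batch workloads that run on MapReduce. One textbook way to generate such candidates with MapReduce is to count common connections, also called "triangle closing." For each member, the mapper emits every pair of that member's connections. The reducer counts how many members each pair shares and drops pairs that are already connected. The sketch below simulates the map, shuffle, and reduce phases in plain Python on a made-up toy graph. It illustrates the technique only. It is not LinkedIn's PYMK algorithm or Hadoop's API, and `people_you_may_know` and the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: member -> set of direct connections (undirected).
CONNECTIONS = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "eve"},
    "dan": {"ann", "eve"},
    "eve": {"cat", "dan"},
}


def map_phase(member, friends):
    """Emit ((a, b), 1) for each pair of `member`'s connections, and
    ((member, friend), None) markers for pairs that are already connected."""
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), 1  # a and b share `member` as a connection
    for friend in friends:
        yield tuple(sorted((member, friend))), None


def shuffle(pairs):
    """Group mapper output by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(pair, values):
    """Count shared connections; skip pairs that are already connected."""
    if None in values:
        return None
    return pair, sum(values)


def people_you_may_know(connections, min_common=1):
    mapped = (kv for m, fs in connections.items() for kv in map_phase(m, fs))
    scored = []
    for pair, values in shuffle(mapped).items():
        result = reduce_phase(pair, values)
        if result and result[1] >= min_common:
            scored.append(result)
    # Most common connections first; ties broken alphabetically.
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


for (a, b), common in people_you_may_know(CONNECTIONS):
    print(f"{a} <-> {b}: {common} common connection(s)")
# ann <-> eve: 2 common connection(s)
# cat <-> dan: 2 common connection(s)
# bob <-> dan: 1 common connection(s)
# bob <-> eve: 1 common connection(s)
```

On a real cluster, the mapper emits quadratically many pairs for highly connected members. Production systems therefore typically cap or sample connections per member. That choice is left out here.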
24. It’s a global economy
    Country connectedness on LinkedIn
25. Data deep dives
    Job migration after financial collapse
26. How often do people change jobs?
27. Visualization is important
28. If your name is Chip, you are likely in sales!
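One way to quantify a claim like "if your name is Chip, you are likely in sales" is lift. Lift is the share of members with a given first name who work in sales, divided by the share of all members who work in sales. The deck does not say how this insight was computed. The sketch below assumes lift as the measure and uses invented toy records purely for illustration. The `lift` function and the `MEMBERS` data are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical toy records: (first_name, job_function). Not real LinkedIn data.
MEMBERS = [
    ("Chip", "Sales"), ("Chip", "Sales"), ("Chip", "Engineering"),
    ("Ann", "Engineering"), ("Ann", "Sales"), ("Raj", "Engineering"),
    ("Raj", "Marketing"), ("Mei", "Engineering"), ("Mei", "Finance"),
    ("Leo", "Engineering"),
]


def lift(records: Iterable[Tuple[str, str]], name: str, function: str) -> float:
    """P(function | name) / P(function). Values above 1 mean the name is
    over-represented in that job function relative to all members."""
    records = list(records)
    overall = sum(1 for _, f in records if f == function) / len(records)
    with_name = [f for n, f in records if n == name]
    if not with_name or overall == 0:
        raise ValueError("name or function not present in records")
    conditional = sum(1 for f in with_name if f == function) / len(with_name)
    return conditional / overall


print(round(lift(MEMBERS, "Chip", "Sales"), 2))  # 2.22
```

With real data, rare names produce noisy ratios. A minimum count per name would typically be required before reporting a lift.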
29. Industry Growth
30. Buzzwords
31. What next?
    • Self service analytics
    • Metadata framework
    • Integrate reporting solutions
    • Go Mobile!
    • Scalability and Data Quality
32. Challenges
    • Data volumes and availability
      – Billion+ rows every day
      – Users in global locations need data
    • Multiple platforms
      – Agile development
      – Data Integration
    • Data Quality
      – User input data
      – Data standardization
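To put "Billion+ rows every day" in throughput terms, here is a back-of-the-envelope average. It assumes the load is spread evenly across the day. Real traffic is not even, so peak rates will be higher.

$$
\frac{10^{9}\ \text{rows/day}}{86{,}400\ \text{s/day}} \approx 1.16 \times 10^{4}\ \text{rows/s}
$$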
33. Key Learnings
    • Self Service
      – Making data accessible to key stakeholders in a timely manner creates tremendous value.
      – Visualization is more important than we think.
    • Measuring your future investments
      – Performance is not the only measure
      – Company fundamentals matter
    • As a data team, be in control of your destiny
      – Identify what to measure and lead by metrics
      – Become the think tank
34. Web 3.0 – It’s all about data!!
35. ULTIMATELY…
36. It is all about the people!
37. Thank You!
