Your SlideShare is downloading. ×
0
Big Data EcoSystem @ LinkedInOctober 20, 2012LinkedIn Confidential ©2013 All Rights Reserved
Sunil Shirguppi                         Head of Data Services- International                         LinkedIn Corporation ...
OutlineLinkedIn OverviewData ScienceBig Data Eco-SystemLearningsLinkedIn Confidential ©2013 All Rights Reserved   3
Our Mission     Connect the world’s professionalsto make them more productive and successfulLinkedIn Confidential ©2013 Al...
Googled yourself lately?Don’t feel bad, we all do it.  We are the professional profile of record
Executives from all  Companies areLinkedIn members
The LinkedIn OpportunityConnect talent with opportunity at massive scale                                                  ...
The World’s Largest Professional Network                                                175M+       90     *              ...
Multiple revenue channels     Premium Subscriptions     Self Serve Ads     Hiring Solutions     Marketing Solutions
Let’s talk Data…
Business is recognizing the importance of analytics
What makes a Data Scientist?Data Scientist = Curiosity + Intuition + Datagathering + Standardization + Statistics + Modeli...
Big Data at LinkedIn* Chart from Philip Russom- Research Director: TDWILinkedIn Confidential ©2013 All Rights Reserved    ...
What do we do with Data?     Data Standardization     Build innovative data products to help professionals     Draw ins...
Few Data-Driven Products                                                  Pandora Search for People                       ...
How do we do it?
LinkedIn Sample Data Stack                             Crowdsourcing
Big Data at LinkedIn         High-level data environment                                                  Near-Line       ...
LinkedIn Data Stack – Online                             Systems                              Capabilities                ...
LinkedIn Data Stack – Nearline                             Systems                                         Capabilities   ...
LinkedIn Data Stack – Pipeline                             Systems                            Capabilities                ...
LinkedIn Data Stack – Offline                             Systems                            Capabilities                 ...
LinkedIn with Hadoop, Aster, and Teradata                         Aster/Teradata                                    Aster/...
It’s a global economy    Country connectedness on LinkedIn
Data deep dives          Job migration after financial collapse
How Often do people change jobs?
Visualization is important
If your name is Chip, you are likely in sales!
Industry Growth                  31
Buzzwords
What next?•   Self service analytics•   Metadata framework•   Integrate reporting solutions•   Go Mobile!•   Scalability a...
Challenges• Data volumes and availability    – Billion+ rows every day    – Users in Global locations need data  Data Qua...
Key Learnings Self Service   – Making data accessible to key stakeholders in a timely     manner creates tremendous value...
Web 3.0 – It’s all about data!!LinkedIn Confidential ©2013 All Rights Reserved                 36
ULTIMATELY…
It is all about the people!
Thank You!LinkedIn Confidential ©2013 All Rights Reserved        39
Infovision sunil shirguppi  _ large scale BI and analytics
Infovision sunil shirguppi  _ large scale BI and analytics
Upcoming SlideShare
Loading in...5
×

Infovision sunil shirguppi _ large scale BI and analytics

606

Published on

Big Data EcoSystem @ LinkedIn

Sunil Shirguppi

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
606
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
34
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Transcript of "Infovision sunil shirguppi _ large scale BI and analytics "

  1. 1. Big Data EcoSystem @ LinkedInOctober 20, 2012LinkedIn Confidential ©2013 All Rights Reserved
  2. 2. Sunil Shirguppi Head of Data Services- International LinkedIn Corporation http://www.linkedin.com/in/sunilshirguppiLinkedIn Confidential ©2013 All Rights Reserved
  3. 3. OutlineLinkedIn OverviewData ScienceBig Data Eco-SystemLearningsLinkedIn Confidential ©2013 All Rights Reserved 3
  4. 4. Our Mission Connect the world’s professionalsto make them more productive and successfulLinkedIn Confidential ©2013 All Rights Reserved 4
  5. 5. Googled yourself lately?Don’t feel bad, we all do it. We are the professional profile of record
  6. 6. Executives from all Companies areLinkedIn members
  7. 7. The LinkedIn OpportunityConnect talent with opportunity at massive scale + Fundamentally transforming the way the world worksLinkedIn Confidential ©2013 All Rights Reserved 7
  8. 8. The World’s Largest Professional Network 175M+ 90 * ~2/sec New Members joining 55 >2M Company Pages 32 82% Fortune 100 Companies use LinkedIn to** hire 17 2 4 8 ~4.2B Professional 2004 2005 2006 2007 2008 2009 2010 searches in 2011 LinkedIn Members (Millions) *as of Nov 4, 2011 **as of June 30, 2011 LinkedIn Confidential ©2013 All Rights Reserved 8
  9. 9. Multiple revenue channels  Premium Subscriptions  Self Serve Ads  Hiring Solutions  Marketing Solutions
  10. 10. Let’s talk Data…
  11. 11. Business is recognizing the importance of analytics
  12. 12. What makes a Data Scientist?Data Scientist = Curiosity + Intuition + Datagathering + Standardization + Statistics + Modeling+ Visualization + Communication
  13. 13. Big Data at LinkedIn* Chart from Philip Russom- Research Director: TDWILinkedIn Confidential ©2013 All Rights Reserved 13
  14. 14. What do we do with Data?  Data Standardization  Build innovative data products to help professionals  Draw insights  Drive the businessBefore we can do that... There are a few challenges that we have to overcome • Scale • Standardization • Infrastructure
  15. 15. Few Data-Driven Products Pandora Search for People Groups browse maps Events You May Be Interested InLinkedIn Confidential ©2013 All Rights Reserved 15
  16. 16. How do we do it?
  17. 17. LinkedIn Sample Data Stack Crowdsourcing
  18. 18. Big Data at LinkedIn High-level data environment Near-Line Data Store Application Online Data Offline Data Store Store Users Web Logs Challenges so complex which Built our own combination of off-the-shelf or a few toolsets/ technologies to technologies can’t address meet specific requirementsLinkedIn Confidential ©2013 All Rights Reserved 19
  19. 19. LinkedIn Data Stack – Online Systems Capabilities • Rich structures (e.g. indexes) • Change capture capability Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 20
  20. 20. LinkedIn Data Stack – Nearline Systems Capabilities Voldemort • Key value access Zoie Bobo Sensei • Search platform D-Graph • Distributed Graph engine Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 21
  21. 21. LinkedIn Data Stack – Pipeline Systems Capabilities • Messaging for site events, monitoring • Change data capture streams Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 22
  22. 22. LinkedIn Data Stack – Offline Systems Capabilities • Machine learning, ranking, relevance • Warehouse and analytics Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 23
  23. 23. LinkedIn with Hadoop, Aster, and Teradata Aster/Teradata Aster/Teradata Hadoop Connectors Bi-Directional ConnectorData transformation Analytic Platform for data Integrated Data& batch processing discovery Warehouse• Image processing • nPath Pattern/Path • Exec Dashboards• Search indexes • Clickstream analysis • Adhoc/OLAP• Graph (PYMK) • A/B site testing • Complex SQL• MapReduce • Data Sciences discovery • SQL • SQL-MapReduce Interactive MapReduce Batch data transformations for Integration with structured data, analytics for the enterprise using engineering groups using HDFS + operational intelligence, scalable MapReduce Analytics & MapReduce distribution of analytics SQL-MapReduceLinkedIn Confidential ©2013 All Rights Reserved 24
  24. 24. It’s a global economy Country connectedness on LinkedIn
  25. 25. Data deep dives Job migration after financial collapse
  26. 26. How Often do people change jobs?
  27. 27. Visualization is important
  28. 28. If your name is Chip, you are likely in sales!
  29. 29. Industry Growth 31
  30. 30. Buzzwords
  31. 31. What next?• Self service analytics• Metadata framework• Integrate reporting solutions• Go Mobile!• Scalability and Data Quality
  32. 32. Challenges• Data volumes and availability – Billion+ rows every day – Users in Global locations need data  Data Quality – User input data – Data standardization • Multiple platforms – Agile development – Data Integration
  33. 33. Key Learnings Self Service – Making data accessible to key stakeholders in a timely manner creates tremendous value. – Viz is more important than we think• Measuring your future investments – Performance is not the only measure – Company fundamentals matter• As an Data team, be in control of your destiny – Identify what to measure and lead by metrics – Become the Think-tank
  34. 34. Web 3.0 – It’s all about data!!LinkedIn Confidential ©2013 All Rights Reserved 36
  35. 35. ULTIMATELY…
  36. 36. It is all about the people!
  37. 37. Thank You!LinkedIn Confidential ©2013 All Rights Reserved 39
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×