Infovision sunil shirguppi  _ large scale BI and analytics
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share

Infovision sunil shirguppi _ large scale BI and analytics

  • 887 views
Uploaded on

Big Data EcoSystem @ LinkedIn ...

Big Data EcoSystem @ LinkedIn

Sunil Shirguppi

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
887
On Slideshare
886
From Embeds
1
Number of Embeds
1

Actions

Shares
Downloads
32
Comments
0
Likes
1

Embeds 1

http://www.docseek.net 1

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Big Data EcoSystem @ LinkedInOctober 20, 2012LinkedIn Confidential ©2013 All Rights Reserved
  • 2. Sunil Shirguppi Head of Data Services- International LinkedIn Corporation http://www.linkedin.com/in/sunilshirguppiLinkedIn Confidential ©2013 All Rights Reserved
  • 3. OutlineLinkedIn OverviewData ScienceBig Data Eco-SystemLearningsLinkedIn Confidential ©2013 All Rights Reserved 3
  • 4. Our Mission Connect the world’s professionalsto make them more productive and successfulLinkedIn Confidential ©2013 All Rights Reserved 4
  • 5. Googled yourself lately?Don’t feel bad, we all do it. We are the professional profile of record
  • 6. Executives from all Companies areLinkedIn members
  • 7. The LinkedIn OpportunityConnect talent with opportunity at massive scale + Fundamentally transforming the way the world worksLinkedIn Confidential ©2013 All Rights Reserved 7
  • 8. The World’s Largest Professional Network 175M+ 90 * ~2/sec New Members joining 55 >2M Company Pages 32 82% Fortune 100 Companies use LinkedIn to** hire 17 2 4 8 ~4.2B Professional 2004 2005 2006 2007 2008 2009 2010 searches in 2011 LinkedIn Members (Millions) *as of Nov 4, 2011 **as of June 30, 2011 LinkedIn Confidential ©2013 All Rights Reserved 8
  • 9. Multiple revenue channels  Premium Subscriptions  Self Serve Ads  Hiring Solutions  Marketing Solutions
  • 10. Let’s talk Data…
  • 11. Business is recognizing the importance of analytics
  • 12. What makes a Data Scientist?Data Scientist = Curiosity + Intuition + Datagathering + Standardization + Statistics + Modeling+ Visualization + Communication
  • 13. Big Data at LinkedIn* Chart from Philip Russom- Research Director: TDWILinkedIn Confidential ©2013 All Rights Reserved 13
  • 14. What do we do with Data?  Data Standardization  Build innovative data products to help professionals  Draw insights  Drive the businessBefore we can do that... There are a few challenges that we have to overcome • Scale • Standardization • Infrastructure
  • 15. Few Data-Driven Products Pandora Search for People Groups browse maps Events You May Be Interested InLinkedIn Confidential ©2013 All Rights Reserved 15
  • 16. How do we do it?
  • 17. LinkedIn Sample Data Stack Crowdsourcing
  • 18. Big Data at LinkedIn High-level data environment Near-Line Data Store Application Online Data Offline Data Store Store Users Web Logs Challenges so complex which Built our own combination of off-the-shelf or a few toolsets/ technologies to technologies can’t address meet specific requirementsLinkedIn Confidential ©2013 All Rights Reserved 19
  • 19. LinkedIn Data Stack – Online Systems Capabilities • Rich structures (e.g. indexes) • Change capture capability Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 20
  • 20. LinkedIn Data Stack – Nearline Systems Capabilities Voldemort • Key value access Zoie Bobo Sensei • Search platform D-Graph • Distributed Graph engine Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 21
  • 21. LinkedIn Data Stack – Pipeline Systems Capabilities • Messaging for site events, monitoring • Change data capture streams Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 22
  • 22. LinkedIn Data Stack – Offline Systems Capabilities • Machine learning, ranking, relevance • Warehouse and analytics Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 23
  • 23. LinkedIn with Hadoop, Aster, and Teradata Aster/Teradata Aster/Teradata Hadoop Connectors Bi-Directional ConnectorData transformation Analytic Platform for data Integrated Data& batch processing discovery Warehouse• Image processing • nPath Pattern/Path • Exec Dashboards• Search indexes • Clickstream analysis • Adhoc/OLAP• Graph (PYMK) • A/B site testing • Complex SQL• MapReduce • Data Sciences discovery • SQL • SQL-MapReduce Interactive MapReduce Batch data transformations for Integration with structured data, analytics for the enterprise using engineering groups using HDFS + operational intelligence, scalable MapReduce Analytics & MapReduce distribution of analytics SQL-MapReduceLinkedIn Confidential ©2013 All Rights Reserved 24
  • 24. It’s a global economy Country connectedness on LinkedIn
  • 25. Data deep dives Job migration after financial collapse
  • 26. How Often do people change jobs?
  • 27. Visualization is important
  • 28. If your name is Chip, you are likely in sales!
  • 29. Industry Growth 31
  • 30. Buzzwords
  • 31. What next?• Self service analytics• Metadata framework• Integrate reporting solutions• Go Mobile!• Scalability and Data Quality
  • 32. Challenges• Data volumes and availability – Billion+ rows every day – Users in Global locations need data  Data Quality – User input data – Data standardization • Multiple platforms – Agile development – Data Integration
  • 33. Key Learnings Self Service – Making data accessible to key stakeholders in a timely manner creates tremendous value. – Viz is more important than we think• Measuring your future investments – Performance is not the only measure – Company fundamentals matter• As an Data team, be in control of your destiny – Identify what to measure and lead by metrics – Become the Think-tank
  • 34. Web 3.0 – It’s all about data!!LinkedIn Confidential ©2013 All Rights Reserved 36
  • 35. ULTIMATELY…
  • 36. It is all about the people!
  • 37. Thank You!LinkedIn Confidential ©2013 All Rights Reserved 39