Infovision sunil shirguppi  _ large scale BI and analytics
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Infovision sunil shirguppi _ large scale BI and analytics

on

  • 841 views

Big Data EcoSystem @ LinkedIn

Big Data EcoSystem @ LinkedIn

Sunil Shirguppi

Statistics

Views

Total Views
841
Views on SlideShare
840
Embed Views
1

Actions

Likes
1
Downloads
31
Comments
0

1 Embed 1

http://www.docseek.net 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Infovision sunil shirguppi _ large scale BI and analytics Presentation Transcript

  • 1. Big Data EcoSystem @ LinkedInOctober 20, 2012LinkedIn Confidential ©2013 All Rights Reserved
  • 2. Sunil Shirguppi Head of Data Services- International LinkedIn Corporation http://www.linkedin.com/in/sunilshirguppiLinkedIn Confidential ©2013 All Rights Reserved
  • 3. OutlineLinkedIn OverviewData ScienceBig Data Eco-SystemLearningsLinkedIn Confidential ©2013 All Rights Reserved 3
  • 4. Our Mission Connect the world’s professionalsto make them more productive and successfulLinkedIn Confidential ©2013 All Rights Reserved 4
  • 5. Googled yourself lately?Don’t feel bad, we all do it. We are the professional profile of record
  • 6. Executives from all Companies areLinkedIn members
  • 7. The LinkedIn OpportunityConnect talent with opportunity at massive scale + Fundamentally transforming the way the world worksLinkedIn Confidential ©2013 All Rights Reserved 7
  • 8. The World’s Largest Professional Network 175M+ 90 * ~2/sec New Members joining 55 >2M Company Pages 32 82% Fortune 100 Companies use LinkedIn to** hire 17 2 4 8 ~4.2B Professional 2004 2005 2006 2007 2008 2009 2010 searches in 2011 LinkedIn Members (Millions) *as of Nov 4, 2011 **as of June 30, 2011 LinkedIn Confidential ©2013 All Rights Reserved 8
  • 9. Multiple revenue channels  Premium Subscriptions  Self Serve Ads  Hiring Solutions  Marketing Solutions
  • 10. Let’s talk Data…
  • 11. Business is recognizing the importance of analytics
  • 12. What makes a Data Scientist?Data Scientist = Curiosity + Intuition + Datagathering + Standardization + Statistics + Modeling+ Visualization + Communication
  • 13. Big Data at LinkedIn* Chart from Philip Russom- Research Director: TDWILinkedIn Confidential ©2013 All Rights Reserved 13
  • 14. What do we do with Data?  Data Standardization  Build innovative data products to help professionals  Draw insights  Drive the businessBefore we can do that... There are a few challenges that we have to overcome • Scale • Standardization • Infrastructure
  • 15. Few Data-Driven Products Pandora Search for People Groups browse maps Events You May Be Interested InLinkedIn Confidential ©2013 All Rights Reserved 15
  • 16. How do we do it?
  • 17. LinkedIn Sample Data Stack Crowdsourcing
  • 18. Big Data at LinkedIn High-level data environment Near-Line Data Store Application Online Data Offline Data Store Store Users Web Logs Challenges so complex which Built our own combination of off-the-shelf or a few toolsets/ technologies to technologies can’t address meet specific requirementsLinkedIn Confidential ©2013 All Rights Reserved 19
  • 19. LinkedIn Data Stack – Online Systems Capabilities • Rich structures (e.g. indexes) • Change capture capability Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 20
  • 20. LinkedIn Data Stack – Nearline Systems Capabilities Voldemort • Key value access Zoie Bobo Sensei • Search platform D-Graph • Distributed Graph engine Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 21
  • 21. LinkedIn Data Stack – Pipeline Systems Capabilities • Messaging for site events, monitoring • Change data capture streams Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 22
  • 22. LinkedIn Data Stack – Offline Systems Capabilities • Machine learning, ranking, relevance • Warehouse and analytics Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 23
  • 23. LinkedIn with Hadoop, Aster, and Teradata Aster/Teradata Aster/Teradata Hadoop Connectors Bi-Directional ConnectorData transformation Analytic Platform for data Integrated Data& batch processing discovery Warehouse• Image processing • nPath Pattern/Path • Exec Dashboards• Search indexes • Clickstream analysis • Adhoc/OLAP• Graph (PYMK) • A/B site testing • Complex SQL• MapReduce • Data Sciences discovery • SQL • SQL-MapReduce Interactive MapReduce Batch data transformations for Integration with structured data, analytics for the enterprise using engineering groups using HDFS + operational intelligence, scalable MapReduce Analytics & MapReduce distribution of analytics SQL-MapReduceLinkedIn Confidential ©2013 All Rights Reserved 24
  • 24. It’s a global economy Country connectedness on LinkedIn
  • 25. Data deep dives Job migration after financial collapse
  • 26. How Often do people change jobs?
  • 27. Visualization is important
  • 28. If your name is Chip, you are likely in sales!
  • 29. Industry Growth 31
  • 30. Buzzwords
  • 31. What next?• Self service analytics• Metadata framework• Integrate reporting solutions• Go Mobile!• Scalability and Data Quality
  • 32. Challenges• Data volumes and availability – Billion+ rows every day – Users in Global locations need data  Data Quality – User input data – Data standardization • Multiple platforms – Agile development – Data Integration
  • 33. Key Learnings Self Service – Making data accessible to key stakeholders in a timely manner creates tremendous value. – Viz is more important than we think• Measuring your future investments – Performance is not the only measure – Company fundamentals matter• As an Data team, be in control of your destiny – Identify what to measure and lead by metrics – Become the Think-tank
  • 34. Web 3.0 – It’s all about data!!LinkedIn Confidential ©2013 All Rights Reserved 36
  • 35. ULTIMATELY…
  • 36. It is all about the people!
  • 37. Thank You!LinkedIn Confidential ©2013 All Rights Reserved 39