Infovision sunil shirguppi  _ large scale BI and analytics
Upcoming SlideShare
Loading in...5
×
 

Infovision sunil shirguppi _ large scale BI and analytics

on

  • 826 views

Big Data EcoSystem @ LinkedIn

Big Data EcoSystem @ LinkedIn

Sunil Shirguppi

Statistics

Views

Total Views
826
Views on SlideShare
825
Embed Views
1

Actions

Likes
1
Downloads
31
Comments
0

1 Embed 1

http://www.docseek.net 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Infovision sunil shirguppi  _ large scale BI and analytics Infovision sunil shirguppi _ large scale BI and analytics Presentation Transcript

  • Big Data EcoSystem @ LinkedInOctober 20, 2012LinkedIn Confidential ©2013 All Rights Reserved
  • Sunil Shirguppi Head of Data Services- International LinkedIn Corporation http://www.linkedin.com/in/sunilshirguppiLinkedIn Confidential ©2013 All Rights Reserved
  • OutlineLinkedIn OverviewData ScienceBig Data Eco-SystemLearningsLinkedIn Confidential ©2013 All Rights Reserved 3 View slide
  • Our Mission Connect the world’s professionalsto make them more productive and successfulLinkedIn Confidential ©2013 All Rights Reserved 4 View slide
  • Googled yourself lately?Don’t feel bad, we all do it. We are the professional profile of record
  • Executives from all Companies areLinkedIn members
  • The LinkedIn OpportunityConnect talent with opportunity at massive scale + Fundamentally transforming the way the world worksLinkedIn Confidential ©2013 All Rights Reserved 7
  • The World’s Largest Professional Network 175M+ 90 * ~2/sec New Members joining 55 >2M Company Pages 32 82% Fortune 100 Companies use LinkedIn to** hire 17 2 4 8 ~4.2B Professional 2004 2005 2006 2007 2008 2009 2010 searches in 2011 LinkedIn Members (Millions) *as of Nov 4, 2011 **as of June 30, 2011 LinkedIn Confidential ©2013 All Rights Reserved 8
  • Multiple revenue channels  Premium Subscriptions  Self Serve Ads  Hiring Solutions  Marketing Solutions
  • Let’s talk Data…
  • Business is recognizing the importance of analytics
  • What makes a Data Scientist?Data Scientist = Curiosity + Intuition + Datagathering + Standardization + Statistics + Modeling+ Visualization + Communication
  • Big Data at LinkedIn* Chart from Philip Russom- Research Director: TDWILinkedIn Confidential ©2013 All Rights Reserved 13
  • What do we do with Data?  Data Standardization  Build innovative data products to help professionals  Draw insights  Drive the businessBefore we can do that... There are a few challenges that we have to overcome • Scale • Standardization • Infrastructure
  • Few Data-Driven Products Pandora Search for People Groups browse maps Events You May Be Interested InLinkedIn Confidential ©2013 All Rights Reserved 15
  • How do we do it?
  • LinkedIn Sample Data Stack Crowdsourcing
  • Big Data at LinkedIn High-level data environment Near-Line Data Store Application Online Data Offline Data Store Store Users Web Logs Challenges so complex which Built our own combination of off-the-shelf or a few toolsets/ technologies to technologies can’t address meet specific requirementsLinkedIn Confidential ©2013 All Rights Reserved 19
  • LinkedIn Data Stack – Online Systems Capabilities • Rich structures (e.g. indexes) • Change capture capability Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 20
  • LinkedIn Data Stack – Nearline Systems Capabilities Voldemort • Key value access Zoie Bobo Sensei • Search platform D-Graph • Distributed Graph engine Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 21
  • LinkedIn Data Stack – Pipeline Systems Capabilities • Messaging for site events, monitoring • Change data capture streams Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 22
  • LinkedIn Data Stack – Offline Systems Capabilities • Machine learning, ranking, relevance • Warehouse and analytics Near-Line Data Store Application Online Data Offline Data Store Store Users Web LogsLinkedIn Confidential ©2013 All Rights Reserved 23
  • LinkedIn with Hadoop, Aster, and Teradata Aster/Teradata Aster/Teradata Hadoop Connectors Bi-Directional ConnectorData transformation Analytic Platform for data Integrated Data& batch processing discovery Warehouse• Image processing • nPath Pattern/Path • Exec Dashboards• Search indexes • Clickstream analysis • Adhoc/OLAP• Graph (PYMK) • A/B site testing • Complex SQL• MapReduce • Data Sciences discovery • SQL • SQL-MapReduce Interactive MapReduce Batch data transformations for Integration with structured data, analytics for the enterprise using engineering groups using HDFS + operational intelligence, scalable MapReduce Analytics & MapReduce distribution of analytics SQL-MapReduceLinkedIn Confidential ©2013 All Rights Reserved 24
  • It’s a global economy Country connectedness on LinkedIn
  • Data deep dives Job migration after financial collapse
  • How Often do people change jobs?
  • Visualization is important
  • If your name is Chip, you are likely in sales!
  • Industry Growth 31
  • Buzzwords
  • What next?• Self service analytics• Metadata framework• Integrate reporting solutions• Go Mobile!• Scalability and Data Quality
  • Challenges• Data volumes and availability – Billion+ rows every day – Users in Global locations need data  Data Quality – User input data – Data standardization • Multiple platforms – Agile development – Data Integration
  • Key Learnings Self Service – Making data accessible to key stakeholders in a timely manner creates tremendous value. – Viz is more important than we think• Measuring your future investments – Performance is not the only measure – Company fundamentals matter• As an Data team, be in control of your destiny – Identify what to measure and lead by metrics – Become the Think-tank
  • Web 3.0 – It’s all about data!!LinkedIn Confidential ©2013 All Rights Reserved 36
  • ULTIMATELY…
  • It is all about the people!
  • Thank You!LinkedIn Confidential ©2013 All Rights Reserved 39