Big Data Usage in Linkedin
 


Information Excellence presentation, September 2012, by Hari Shankar, LinkedIn Big Data Engineer, on Big Data usage at LinkedIn


    Big Data Usage in Linkedin Presentation Transcript

    • Information Excellence, 2012 Sep Session. Harvesting Information Excellence. Recruiting Solutions
    • Today’s Speakers: Hari Shankar, Big Data Engineer, LinkedIn. Big Data Usage and Implementation in LinkedIn. Thank you for hosting us today. Information Excellence, informationexcellence.wordpress.com
    • Big data and Hadoop. September 2012. Hari Shankar Menon, Software engineer, LinkedIn
    • About me: LinkedIn Engineering, Data warehouse team. Previously, Software engineer @Clickable: worked on building the reporting and analytics platform on Hadoop and HBase. Hadoop and open-source enthusiast
    • Agenda: About LinkedIn; Data Infrastructure overview; Hadoop@LinkedIn; Challenges
    • Our mission: Connect the world’s professionals to make them more productive and successful
    • LinkedIn by numbers: 175M+ members; ~2 new members joining per second; >2M Company Pages; 85% of Fortune 100 companies use LinkedIn to hire**; ~4.2B professional searches in 2011. Chart, LinkedIn Members (Millions): 2004: 2; 2005: 4; 2006: 8; 2007: 17; 2008: 32; 2009: 55; 2010: 90. *as of Nov 4, 2011 **as of June 30, 2011
    • What is big data? (*Chart from Philip Russom, Research Director, TDWI)
    • Infrastructure technologies: Search technologies (SenseiDB, Zoie, Bobo); Primary data store (Front-end); Document-oriented store; Distributed key-value store; Distributed PubSub messaging; Database change replication
    • Open source: http://data.linkedin.com/opensource
    • What is Hadoop; Evolution of Hadoop; Impact
    • Hadoop@LinkedIn: Recommendation systems (generating recommendations, modeling, A/B testing, grandfathering); Data warehouse/ETL (raw data storage, aggregations, heavy lifting); Data sciences (strategic analyses, experimentation sandbox)
    • The Recommendations opportunity: Relevance/Latency; Offline computation; Caching. (Figure labels: Pandora Search for People; Events You May Be Interested In; Groups browse maps)
    • Improving recommendations: Mathematical modeling; A/B Testing; Grandfathering
    • Hadoop in the Data warehouse: Longer retention; Source of truth; Complex transformations; Lower retention; Ad-hoc analysis; Algorithmic computations
    • Hadoop in Data Sciences: Deep dives; Sandbox; Hackday projects
    • Data Insights - 1: Job migration after financial collapse
    • Data Insights - 2
    • Data Insights - 3
    • Challenges: 1. User adoption of new technologies; 2. Real-time processing; 3. Graph/Network algorithms; 4. Making data accessible
    • User adoption
    • Real-time processing. Challenges: random reads/writes; warm-up time. Solutions: parts of the problem that can be moved offline?; HBase, Voldemort
    • Map-reduce-incompatible problems: Graph problems; Traditional joins
    • Making data accessible: Hadoop: tons of data
    • Finally! No silver bullet; Hadoop: offline processing; Scalability by design
    • www.linkedin.com/in/harisreekumar; www.linkedin.com/company/linkedin/careers
    • About Information Excellence Group: Community Focused; Volunteer Driven; Knowledge Share; Accelerated Learning; Collective Excellence; Distilled Knowledge; Shared, Non Conflicting Goals; Validation / Brainstorm platform; Mentor, Guide, Coach; Satisfied, Empowered Professional; Richer Industry and Academia; Progress Information Excellence Towards an Enriched Profession, Business and Society
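
The "Map-reduce-incompatible problems" slide lists traditional joins without saying why they are awkward. One possible reading is that a relational join has to be recast as a tag, shuffle, and combine sequence. The sketch below simulates a reduce-side inner join in plain Python, with no Hadoop involved. The table names, sample rows, and function names are illustrative assumptions and do not come from the talk.

```python
from collections import defaultdict
from itertools import product


def map_phase(members, views):
    """Tag each record with its source table and emit (join_key, tagged_value)."""
    for member_id, name in members:
        yield member_id, ("member", name)
    for member_id, page in views:
        yield member_id, ("view", page)


def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Per key, pair every member record with every view record (inner join)."""
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "member"]
        pages = [value for tag, value in groups[key] if tag == "view"]
        for name, page in product(names, pages):
            yield key, name, page


def reduce_side_join(members, views):
    """Run the three simulated phases end to end and collect the joined rows."""
    return list(reduce_phase(shuffle(map_phase(members, views))))


# Hypothetical sample data: (member_id, name) and (member_id, page_viewed).
members = [(1, "alice"), (2, "bob"), (3, "carol")]
views = [(1, "/jobs"), (2, "/groups"), (1, "/events")]
joined = reduce_side_join(members, views)
```

Every record from both inputs passes through the shuffle, and each reducer must hold all values for a key before it can emit pairs. That cost is one plausible reason a join that is a single statement in SQL counts as a poor fit for MapReduce.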