Big Data Usage in LinkedIn


Information Excellence presentation, September 2012, by Hari Shankar, Big Data Engineer at LinkedIn, on big data usage at LinkedIn

Published in: Technology

  1. Information Excellence: 2012 Sep Session. Harvesting Information Excellence. Recruiting Solutions
  2. Today’s Speakers: Hari Shankar, Big Data Engineer, LinkedIn. Big Data Usage and Implementation in LinkedIn. Thank you for hosting us today. Information Excellence, informationexcellence.wordpress.com
  3. Big data and Hadoop. September 2012. Hari Shankar Menon, Software engineer, LinkedIn
  4. About me: LinkedIn Engineering, Data warehouse team. Previously, Software engineer @Clickable – worked on building the reporting and analytics platform on Hadoop and HBase. Hadoop and open-source enthusiast
  5. Agenda: About LinkedIn; Data Infrastructure overview; Hadoop@LinkedIn; Challenges
  6. Our mission: Connect the world’s professionals to make them more productive and successful
  7. LinkedIn by numbers: 175M+; ~2/sec new members joining; >2M Company Pages; 85% of Fortune 100 companies use LinkedIn to hire**; ~4.2B professional searches in 2011. Chart, LinkedIn Members (Millions): 2004: 2; 2005: 4; 2006: 8; 2007: 17; 2008: 32; 2009: 55; 2010: 90. *as of Nov 4, 2011; **as of June 30, 2011
  8. About LinkedIn; Data Infrastructure overview; Hadoop@LinkedIn; Challenges
  9. What is big data? * Chart from Philip Russom, Research Director, TDWI
  10. Infrastructure technologies: Search technologies (SenseiDB, Zoie, Bobo); Primary data store (Front-end); Document-oriented store; Distributed key-value store; Distributed PubSub messaging; Database change replication
  11. Open source: http://data.linkedin.com/opensource
  12. About LinkedIn; Data Infrastructure overview; Hadoop@LinkedIn; Challenges
  13. What is Hadoop; Evolution of Hadoop; Impact
  14. Hadoop@LinkedIn: Recommendation systems (generating recommendations, modeling, A/B testing, grandfathering); Data warehouse/ETL (raw data storage, aggregations, heavy lifting); Data sciences (strategic analyses, experimentation sandbox)
  15. The Recommendations opportunity: Relevance/Latency; Offline computation; Caching. Figure labels: Pandora; Search for People; Events You May Be Interested In; Groups browse maps
  16. Improving recommendations: Mathematical modeling; A/B Testing; Grandfathering
  17. Hadoop in the Data warehouse: Longer retention; Complex transformations; Algorithmic computations. Source of truth; Lower retention; Ad-hoc analysis
  18. Hadoop in Data Sciences: Deep dives; Sandbox; Hackday projects
  19. Data Insights - 1: Job migration after financial collapse
  20. Data Insights - 2
  21. Data Insights - 3
  22. About LinkedIn; Data Infrastructure overview; Hadoop@LinkedIn; Challenges
  23. Challenges: 1. User adoption of new technologies; 2. Real-time processing; 3. Graph/Network algorithms; 4. Making data accessible
  24. User adoption
  25. Real-time processing. Challenges: Random reads/writes; Warm-up time. Solutions: Parts of the problem that can be moved offline?; HBase, Voldemort
  26. Map-reduce-incompatible problems: Graph problems; Traditional joins
  27. Making data accessible: Hadoop → Tons of data
  28. Finally! No silver bullet. Hadoop → Offline processing. Scalability by design
  29. www.linkedin.com/in/harisreekumar; www.linkedin.com/company/linkedin/careers
  30. About Information Excellence Group: Community Focused; Volunteer Driven; Knowledge Share; Accelerated Learning; Collective Excellence; Distilled Knowledge; Shared, Non Conflicting Goals; Validation / Brainstorm platform; Mentor, Guide, Coach; Satisfied, Empowered Professional; Richer Industry and Academia. Progress Information Excellence Towards an Enriched Profession, Business and Society
