Data infrastructure and Hadoop at LinkedIn
 

  • As a social media company, LinkedIn deals with a lot of data, and we face many of the challenges that come with it. (Sell LI; Hadoop user group.)
  • For us, fundamentally changing the way the world works begins with our mission statement: to connect the world’s professionals and entrepreneurs to make them more productive and successful. This means not only helping people find their dream jobs, but also enabling them to be great at the jobs they’re already in. A platform that lets us become more productive. Talent is THE driving force for success and economic opportunity; that holds true for both individual professionals and the companies they work for. At our core, LinkedIn is in the business of connecting talent with opportunity at massive scale. We are able to do this in an unprecedented way due to the convergence of two unique trends: (1) scalable infrastructure that connects hundreds of millions of people in milliseconds, and (2) extraordinary shifts in online behavior related to the way people represent their identities, build their networks, and share information and knowledge. This is fundamentally changing the way we live, play, and, of course, work. And that’s where LinkedIn is focused: on fundamentally transforming the way the world works. These factors enable LI to connect talent and opportunity.
  • With north of 175 million members, we’re making great strides toward our mission of connecting the world’s professionals to make them more productive and successful. For us this means not only helping people find their dream jobs, but also enabling them to be great at the jobs they’re already in. With terabytes of data flowing through our systems, generated from members’ profiles, their connections, and their activity on LinkedIn, we have amassed rich, structured data on one of the most influential, affluent, and highly educated audiences on the web. This huge semi-structured data set is updated in real time and is growing at a tremendous pace. We are all very excited about the data opportunity at LinkedIn.
  • The power of LinkedIn’s platform grows exponentially as we continue to add more members, get them to come back more often, and give them more reasons to engage on the site. These three actions drive network effects that form a virtuous cycle on LinkedIn. As membership grows and activity on the platform increases, it improves the quantity and quality of data propagated throughout the network, which we then use to create better and more relevant products and services for our members and customers. We have recommendation solutions for everyone: individuals, recruiters, and advertisers. In our view, recommendations are ubiquitous and permeate the whole site. This enables professionals to be more productive.
    Volume: generally large, several TB and sometimes PB.
    Variety: 80% of the data is unstructured, growing at 15 times the rate of structured data.
    Velocity: high velocity.
    User data (more structured); traffic data (real-time); 3rd-party data (batch, but unstructured).
    Example
  • Need for various technologies: one size doesn’t fit all.
  • History: Google paper, Doug Cutting, Yahoo. Storage and computation. Synonymous with big data. Empowering: made a lot of new ideas feasible and spawned a new bunch of startups. Ability to store and process => more data to store. Maybe 2 slides. NAS systems and OLAP existed, but were not feasible at this scale. Hadoop democratized scalable data processing.
  • We have recommendation solutions for everyone: individuals, recruiters, and advertisers. In our view, recommendations are ubiquitous and permeate the whole site. Very visible value addition: the right information to the right user at the right time. Integral to the virality of the network.
    Problems: computation-intensive algorithms; a variety of recommendations; lots of A/B testing required.
  • We have recommendation solutions for everyone: individuals, recruiters, and advertisers. In our view, recommendations are ubiquitous and permeate the whole site. 50% of job views/applications by members are a direct result of recommendations. Similar results across all recommendations.
  • We have recommendation solutions for everyone: individuals, recruiters, and advertisers. In our view, recommendations are ubiquitous and permeate the whole site. Aggregations; complex transformations; long-term data storage; load sharing (?).
  • The Hadoop impact: moving ETL jobs to Hadoop has helped make data available for ad hoc queries by data scientists.
  • We have a unique perspective into data. Before the collapse, we saw substantial spikes in user activity for the following 5 companies during major financial events. One hypothesis is that many of the employees left the financial industry. According to the LinkedIn data set, that just isn’t true. (Bank of America acquired Merrill Lynch, and Nomura acquired Lehman Brothers’ franchise in the Asia Pacific region.) Barclays was by far the biggest beneficiary, scooping up 10% of the laid-off talent, followed by Credit Suisse at 1.5% and Citigroup at 1.1%.
  • We have recommendation solutions for everyone: individuals, recruiters, and advertisers. In our view, recommendations are ubiquitous and permeate the whole site.
  • ENG SLIDE: What is a data scientist? What are the different technologies, big data challenges, and opportunities? Open source: InMaps (hackday projects), full-fledged products.
  • Add images for SQL/MapReduce.
  • Hadoop is, and will always be, optimized for sequential reads and throughput rather than speed of completion.
  • Use abstractions! Pig and Hive.
  • No random reads/writes. Native APIs are insufficient.
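
The advice in the notes above to prefer abstractions such as Pig and Hive over the native APIs can be illustrated with a minimal sketch. It does not use Hadoop at all: it only mimics, in memory, the map, shuffle, and reduce steps that a hand-written MapReduce job has to spell out for a simple "count views per page" aggregation. The `PAGE_VIEWS` records and all function names are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical page-view records: (member_id, page). Not real LinkedIn data.
PAGE_VIEWS = [
    ("m1", "jobs"),
    ("m2", "profile"),
    ("m1", "jobs"),
    ("m3", "jobs"),
    ("m2", "feed"),
]


def map_phase(records):
    """Emit a (page, 1) pair for each record, as a mapper would."""
    for _member, page in records:
        yield page, 1


def shuffle_phase(pairs):
    """Group values by key, standing in for the framework's sort/shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def count_views_by_page(records):
    """Chain the three phases into one aggregation."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(count_views_by_page(PAGE_VIEWS))
```

With an abstraction layer, the same aggregation is typically a single declarative statement, for example a `GROUP BY` with `COUNT(*)` in Hive over a (hypothetical) page-views table, and the tool generates the map and reduce steps.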
