Big Data World 2013 - How LinkedIn leveraged its data to become the world's largest professional network
  • LinkedIn – I am currently a data scientist at LinkedIn, one of the world's most advanced big data companies. LivePerson – I previously worked at LivePerson, where I was the first person hired to build their big data solution, so I have experienced both the very beginning of big data solutions and the cutting edge. I will share with you the lessons I've learned while working on big data at both ends of the spectrum. I also have a business degree from the Israeli Institute of Technology and a computer science degree from Ben-Gurion University.
  • This is what I am going to talk about. I chose these subjects because they answer the most burning questions I had, both when I was starting with big data and when I was perfecting my craft.
  • The term Big Data is used in many ways, so before I start talking about big data, I want to explain what it is.
  • Yes, there is an entry in the Oxford English Dictionary for Big Data
  • The key word here is "standard". Before Big Data, standard methods and tools were enough to process the data we had; now they are not. So what happened?
  • Data created opportunities, which in turn created demand for even more data, and the amount of data in the world grew larger and larger.
  • So what are those big data opportunities I mentioned? The best way to see them is through examples.
  • Amazon, the e-commerce giant, analyzes data about its shoppers. It analyzes what products they are looking at, what products they are searching for and, most importantly, what products they are buying. This analysis enables them to produce a product I am sure you have all seen ...
  • Here we can see that if I look at the book "Big Data Analytics", Amazon provides me with recommendations for similar books. [Show increase in sales] So why did it increase sales so much? The logic here is simple: the more products customers see, the higher the chance they will buy something. Amazon wants to show us as many products as it can in order to get us to buy something.
  • My second example is Netflix. Netflix is an American company that started as a DVD rental service and quickly became a streaming platform for movies and TV shows. It has about 30 million subscribers. At the end of each movie, Netflix asks the viewer to rate the movie they just watched. Netflix has billions of movie ratings from millions of users, and it uses this data to create the following product.
  • Using our rating history, Netflix calculates a unique "taste" for every one of its subscribers and uses this taste to recommend movies to them. This product is so important to Netflix that in 2006 it offered a prize of a million dollars to whoever could improve its algorithm by more than 10%. [Show statistics] So why is this recommendation engine so important? The more users find movies they like on Netflix, the longer they will keep their subscription, earning money for Netflix.
  • My third example is a small Israeli startup. Waze is a GPS mobile app that tracks where people are and at what speed they are travelling.
  • Waze uses this data to compute traffic maps that show which streets have traffic jams and to route you accordingly, providing much better route suggestions than apps that don't use traffic information. After gaining more than 50 million users for its app, Waze was acquired by Google for about 1.1 billion dollars. Side note: I understand there will be a talk later today by a Korean company that does something very similar.
  • The above examples, and many more, lead me to the first lesson I've learned about big data.
  • These are great examples. But to dive even deeper into big data applications, let's look at the company I currently work for, LinkedIn. Since we said that Big Data is more about business than data, let me first show you what LinkedIn's business is.
  • LinkedIn is the largest professional social network in the world. It has more than 225M members. Our largest markets today are North America and Europe, but Asia is growing very well too, with several countries having more than a million members on LinkedIn.
  • Not only does LinkedIn have a lot of members, it also makes significant revenue. Across its 3 business lines, LinkedIn made almost a billion dollars last year and about 325 million in the first quarter of 2013.
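The revenue figures quoted in this note can be checked against the quarterly values printed on the revenue slide in the transcript. A minimal sketch, assuming those 17 values are quarterly totals in chronological order and that the last one is Q1 2013 (the variable names are illustrative):

```python
# Quarterly revenue in $ millions, copied from the revenue slide.
# Assumption: values are in chronological order and the last one is Q1 2013.
quarterly_revenue = [23, 28, 30, 39, 45, 55, 62, 82, 94,
                     121, 139, 168, 188, 228, 252, 304, 325]

q1_2013 = quarterly_revenue[-1]               # most recent quarter on the chart
revenue_2012 = sum(quarterly_revenue[-5:-1])  # the four quarters before it

print(revenue_2012)  # 972, i.e. "almost a billion dollars"
print(q1_2013)       # 325, i.e. "about 325 million"
```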
  • These 3 product lines are Premium Subscriptions, Marketing Solutions and Talent Solutions. Let's dive more deeply into each one of them to understand them better.
  • The premium subscriptions business is for LinkedIn users who want extra features on LinkedIn. Those features might be better analytics about who viewed their profile and the ability to contact anyone on LinkedIn through InMails, LinkedIn's personal messaging system. This product really separates LinkedIn from other social networks, in that some of the users of the network pay extra to use it.
  • Marketing Solutions is more similar to what you can find on other social networks. We offer companies the ability to market their products to our members. Since LinkedIn is a professional network, with most members having a job or even a lucrative one, the target population is very appealing to marketers.
  • Our third product line, and the largest in terms of revenue, is Talent Solutions. Here companies like Sony, Walmart and L'Oréal pay for their recruiters to have additional functionality for their recruiting needs. This is almost like another product inside LinkedIn for our recruiter members. This product line brings in about 57% of LinkedIn's revenue.
  • LinkedIn's number 1 mission is connecting talent with opportunity: both helping companies find new talent and helping our 225+ million members find new opportunities when they need them. One of the first big data applications at LinkedIn was to help members find a new job, and I will now dive deep into how it was done.
  • JYMBII (Jobs You May Be Interested In) is a big data product that matches members with job postings on LinkedIn. For example, here I am, along with some of the jobs companies posted on LinkedIn. For every job, we compute a score for how good a fit this job is for the member. Here you can see that I am a good match for a data scientist position at Facebook, and not such a good match for a product manager position at Yahoo.
  • After creating scores for all the jobs in our database, we create a small widget on our homepage where every member can see their top-matching jobs.
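Once every job has a score for a member, picking what the widget shows is a top-N selection. A minimal sketch, assuming the scores are already available as a dictionary; the function name and the score values are invented for illustration and are not LinkedIn's:

```python
import heapq

def top_matching_jobs(scores, n=3):
    """Return the n (job, score) pairs with the highest scores, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])

# Hypothetical scores for one member; the numbers are made up.
scores = {
    "Data Scientist": 0.92,
    "Software Engineer": 0.71,
    "Product Manager": 0.18,
    "Account Executive": 0.05,
}

widget = top_matching_jobs(scores, n=3)
print(widget)
```

`heapq.nlargest` avoids sorting the full job list, which matters when the number of postings is much larger than the handful shown in the widget.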
  • I will walk you through the 3 pillars of every big data product – Design, Algorithms and Infrastructure/Framework.
  • Let's start with design. In a consumer-oriented company design is very important, because this is how users interact with your product. Also, in many cases, design is the hardest thing for a single small team to change, because so many teams are involved. In most companies the big data team is separate from the team that works on the main product, so those of you who have already started implementing big data solutions probably know how difficult it is to run tests on the main product. Try to do anything you can to bypass other teams in your organization to test your big data solutions. When LinkedIn's Data Science team decided to build JYMBII, they wanted a very simple way to test whether their product was working without making too many changes to the main site. This is how they did it: they started with email. Here you can see how the actual email looks today, where I got some recommendations for jobs I might be interested in. The reason they chose email is that it is a way to test your product on a small subset of users, without affecting everyone who comes to your website, and there is no need to make any changes to the main website.
  • After the initial emails showed great success and that people were actually interested, our team built this very small widget that shows the top jobs you might be interested in. Again, it was done with minimal integration with the main website, by having this widget replace one of the ads we had on the site for a certain percentage of users.
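Showing the widget to "a certain percentage of users" is commonly done with deterministic bucketing, so that a member sees the same variant on every visit. The sketch below is one way to do that, not a description of LinkedIn's system; the experiment name, the 5% share and both function names are assumptions.

```python
import hashlib

def in_rollout(member_id: str, experiment: str, percent: float) -> bool:
    """Deterministically place roughly `percent` % of members in the test group."""
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # stable bucket in [0, 9999]
    return bucket < percent * 100

def homepage_slot(member_id: str) -> str:
    """Choose what fills the slot: the jobs widget for the test group, else the ad."""
    # Hypothetical experiment name and rollout share.
    return "jobs_widget" if in_rollout(member_id, "jymbii_widget", 5.0) else "ad"

print(homepage_slot("member-123"))
```

Hashing the experiment name together with the member ID keeps the test groups of different experiments independent of each other.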
  • After the great success of the widget, jobs now have their own section on the LinkedIn website, where users can search for jobs and more. Having the jobs section resulted in 1,000 times more users looking at LinkedIn jobs than before. Remember, JYMBII did not start with its own section of the website, but grew to have one.
  • My main message about how to design data products is to start simple and grow with success.
  • Let's now talk about algorithms, or how LinkedIn matches members with job postings. The first iteration of the algorithm was very simple. We look at the member's profile, we look at the job posting and we do keyword matching, very similar to how recruiters screen candidate resumes for a potential match. In this example we can see that my profile is a pretty decent match for this job opportunity. There is no need for a natural language processing expert or a computer science PhD to implement this algorithm. It is pretty simple and worked pretty well for our first prototype.
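The keyword-matching idea described here can be sketched in a few lines. This is an illustrative version under stated assumptions, not the actual first-iteration algorithm: the tokenizer, the stopword list and the choice to score by the share of job keywords found in the profile are all assumptions.

```python
import re

# Small illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "the", "to", "with"}

def keywords(text: str) -> set:
    """Lowercase the text and return its distinct non-stopword tokens."""
    words = re.findall(r"[a-z0-9+#]+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def keyword_match_score(profile: str, job_posting: str) -> float:
    """Fraction of the job posting's keywords that also appear in the profile."""
    job_words = keywords(job_posting)
    if not job_words:
        return 0.0
    return len(job_words & keywords(profile)) / len(job_words)

# Hypothetical profile and postings.
profile = "Data scientist with Hadoop, Python and machine learning experience"
print(keyword_match_score(profile, "Data scientist: machine learning, Python, Hadoop"))  # 1.0
print(keyword_match_score(profile, "Product manager for mobile advertising roadmap"))    # 0.0
```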
  • When the first prototype of the email succeeded, the team moved to improve the algorithm a bit further, adding features like education and experience, which are also very important for determining the candidate's fit for a position. These changes improved the recommendations even further, resulting in more people engaging with jobs on the LinkedIn website.
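Adding education and experience to the match can be pictured as combining several per-feature scores into one number. The sketch below uses a plain weighted sum; the data classes, the degree scale and the weights are hypothetical choices made for illustration, and the talk does not say how the features were actually combined.

```python
from dataclasses import dataclass

@dataclass
class Member:
    skills: set
    degree_level: int        # hypothetical scale: 0 none, 1 bachelor's, 2 master's, 3 doctorate
    years_experience: float

@dataclass
class Job:
    required_skills: set
    min_degree_level: int
    min_years_experience: float

# Illustrative weights, not tuned on any data.
WEIGHTS = {"skills": 0.6, "education": 0.2, "experience": 0.2}

def fit_score(member: Member, job: Job) -> float:
    """Weighted sum of skill overlap, education fit and experience fit, in [0, 1]."""
    if job.required_skills:
        skills = len(member.skills & job.required_skills) / len(job.required_skills)
    else:
        skills = 0.0
    education = 1.0 if member.degree_level >= job.min_degree_level else 0.0
    if job.min_years_experience > 0:
        experience = min(member.years_experience / job.min_years_experience, 1.0)
    else:
        experience = 1.0
    return (WEIGHTS["skills"] * skills
            + WEIGHTS["education"] * education
            + WEIGHTS["experience"] * experience)

member = Member({"python", "hadoop"}, degree_level=2, years_experience=5)
job = Job({"python", "hadoop"}, min_degree_level=1, min_years_experience=3)
print(fit_score(member, job))
```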
  • Finally, now that we have our jobs page on the website, where users can search for jobs, save jobs and apply for jobs, we can use all of these signals to recommend jobs similar to the ones users found themselves. All of these improvements resulted in 50% more accurate job recommendations for our members.
  • The message for algorithms is the same as for design: don't try to implement something very difficult before you know your customers even want it. Start simple and grow with success.
  • Here is a quote from a Twitter engineering manager that I like very much. What it says is that most of the time Hadoop doesn't solve a big data problem; it actually brings a set of new problems to deal with, even before we know that what we are trying to build is worth building.
  • The first JYMBII prototype was developed using very simple technology: Oracle, some Perl scripts and, in between, some shell scripts. The process involved someone copying files manually from one computer to another, running some scripts on that computer and then copying back the results. The process was so inefficient that it took 6 weeks to run. But 6 weeks is better than never.
  • After the success of the initial product, LinkedIn decided to make an infrastructure investment, buying parallel databases from companies like Greenplum and Aster Data. This sped up the process so that it ran in a single week instead of 6.
  • Eventually LinkedIn not only moved to Hadoop but also built its own infrastructure, with projects like Kafka, Voldemort and Zoie. You can find more information about them on the LinkedIn open source page. Now we are generating new recommendations every day, which is roughly 50 times better than having them every 6 weeks. You have probably figured out the second lesson by now ...
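The refresh times quoted across the three infrastructure stages can be turned into speedups with simple arithmetic. A small sketch, assuming "every day" means one full run takes about a day; the stage labels are shorthand for the setups described in the notes:

```python
DAYS_PER_WEEK = 7

# Time to produce one full batch of recommendations at each stage, in days.
# Assumption: daily recommendations correspond to a run time of about one day.
stages = {
    "Oracle + Perl/shell scripts": 6 * DAYS_PER_WEEK,
    "parallel database": 1 * DAYS_PER_WEEK,
    "Hadoop-based pipeline": 1,
}

baseline = stages["Oracle + Perl/shell scripts"]
speedups = {name: baseline / days for name, days in stages.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.0f}x")
```

Under these assumptions the final stage comes out at 42x, which the talk rounds to 50.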
  • One of the most important questions, one that kept me busy for a long time as well, is where you find big data experts. Before I give you the answer, I would like to show you 2 graphs.
  • Here you can see that at the beginning of 2011 the demand for big data experts was 30 times higher than the year before. Now it is even higher. Everyone is looking for big data experts.
  • Here is a graph from LinkedIn's own analytics team. Here you can see that 33% of the people who started a job as a data scientist or analyst are new to this kind of job. You can probably see where I am going with this. Many people who work in big data are new to big data. LinkedIn realized this quickly, and here is the proof ...
  • Here is an actual LinkedIn job posting from 2008, when LinkedIn had just started with big data. The key message is this: no specific technical skills are required. Here is an example of how LinkedIn has implemented this strategy with 2 of my colleagues.
  • Joseph Adler came to LinkedIn from Netflix, where he did Operations Engineering. Now he is one of our top experts on big data and has even written a very successful book about it.
  • Jason is a new data scientist at LinkedIn. Prior to that he was a radar signal processing expert. He is still just at the beginning of his career at LinkedIn, but so far he is doing very well and educating himself quickly.
  • My third lesson is a bit hard to chew, but if you follow my previous 2, it becomes easier. Look for big data experts everywhere and at all times, but don't let it stop you from starting your projects.
  • So how do you start a big data project? I would like to show you a very simple recipe you could follow.
  • As always, in order to make it clearer, I will use an example to guide us through the recipe. People You May Know is a LinkedIn Big Data product that traverses your profile and the entire LinkedIn graph to suggest people you should connect with. Let's see how we can use our recipe to create big data applications such as People You May Know.
  • Important business metric – how often members visit the website.
    Correlating factors – how many new items they have on their news feed. But that is not the root cause; something else is affecting it.
    Causing factors – how many connections they have.
    Product – recommend new connections to users: People You May Know.
    Beware of the second-system effect: how many of you have been involved with projects where the first prototype was pretty successful and the second one was much bigger and failed?
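The "correlating factors" ingredient can be sketched as computing a correlation between the business metric and each candidate factor. The numbers below are made-up toy values for five hypothetical members, used only to show the mechanics; they are not LinkedIn data, and the function and variable names are illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up toy values for five hypothetical members.
visits_per_week = [1, 2, 4, 5, 8]         # the business metric
factors = {
    "new feed items": [3, 5, 9, 12, 20],
    "connections": [10, 40, 90, 150, 300],
}

for name, values in factors.items():
    print(name, round(pearson(visits_per_week, values), 3))
```

A high correlation only nominates a factor. Telling a correlating factor from a causing one, as the note does for feed items versus connections, still requires changing the factor and measuring the effect.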

Presentation Transcript

  • How LinkedIn leveraged its data to become the world's largest professional network
  • About me: Vitaly Gordon. ©2013 LinkedIn Corporation. All Rights Reserved.
  • Agenda
    1 What is Big Data?
    2 Big Data Applications
    3 LinkedIn’s Big Data Solutions
    4 Finding Experts
    5 Big Data Recipe
    6 Summary
  • 1 What is Big Data?
  • "Data sets that are too large and complex to manipulate or interrogate with standard methods or tools." Oxford Dictionary
  • Big Data Growth (chart, logarithmic axis from 1E+00 to 1E+09): Storage Growth, Data Growth
  • 2 Big Data Applications
  • increase in sales
  • of watched content
  • 40M users in 18 months
  • Big Data is more about Business than Data
  • 3 LinkedIn’s Big Data Solutions
  • LinkedIn Revenue: Quarterly Revenue ($ millions), split into Hiring Solutions, Marketing Solutions and Premium Subscriptions. Values in chart order, Q1 through Q4 of each year and ending with Q1 2013: 23, 28, 30, 39, 45, 55, 62, 82, 94, 121, 139, 168, 188, 228, 252, 304, 325
  • Premium Subscriptions
  • Marketing Solutions
  • Talent Solutions
  • Connecting Talent With Opportunity
  • Jobs You May Be Interested In (JYMBII) – Case Study: Software Engineer at …; Data Scientist at …; Product Manager at …
  • Jobs You May Be Interested In – Case Study: Design
  • JYMBII – Building The Product: Design, Algorithms, Framework
  • Design
  • Design: 1,000X more users
  • Start simple, grow with success
  • Algorithms
  • Algorithms: 50% better results
  • Start simple, grow with success
  • Technology: "Some people, when confronted with a big data problem, think, 'I'll use Hadoop.' Now they have a big data problem and a big Hadoop cluster." Dmitry Ryaboy, Twitter Engineering Manager
  • Technology Advancement
  • Technology Advancement: 50X faster (Kafka)
  • Start simple, grow with success
  • 4 Finding Experts
  • Finding Data Experts: Increase in demand for big data experts X
  • Finding Data Experts: 33% are new analytics experts
  • Finding Data Experts: Be challenged at LinkedIn. We're looking for superb analytical minds of all levels to expand our small team that will build some of the most innovative products at LinkedIn. No specific technical skills are required (we'll help you learn SQL, Python, and R). You should be extremely intelligent, have a quantitative background, and be able to learn quickly and work independently. This is the perfect job for someone who's really smart, driven, and extremely skilled at creatively solving problems. You'll learn statistics, data mining, programming, and product design, but you've gotta start with what we can't teach – intellectual sharpness and creativity.
  • LinkedIn Experts
  • Don't wait for a big data expert to knock on your door - create your own
  • 5 Big Data Recipe
  • Big Data Recipe
    INGREDIENTS
    1. Important business metric
    2. Correlating factors
    3. Causing factors
    4. Product to affect the behavior
    METHOD OF PREPARATION
    1. Build a simple prototype
    2. Measure the effect
    3. Improve logic and scale
    4. Measure the effect
    5. Improve logic and scale
    6. Measure the effect
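The recurring "Measure the effect" step in the recipe can be as simple as comparing the business metric between members who saw the prototype and members who did not. A minimal sketch, with a hypothetical function name and invented example numbers:

```python
def relative_lift(control_total, control_members, treated_total, treated_members):
    """Relative change in the per-member metric for the treated group vs. control."""
    control_rate = control_total / control_members
    treated_rate = treated_total / treated_members
    return (treated_rate - control_rate) / control_rate

# Invented example: total weekly visits for two groups of 500 members each.
lift = relative_lift(control_total=1000, control_members=500,
                     treated_total=1200, treated_members=500)
print(f"{lift:.0%}")  # 20%
```

A real measurement would also check that the difference is larger than what random variation between the groups could produce before moving on to "Improve logic and scale".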
  • 6 Summary
  • 감사합니다 (Thank you)