Data By The People, For The People


Daniel Tunkelang
Director, Data Science at LinkedIn

Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)

LinkedIn has a unique data collection: the 175M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.


Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.

Published in: Technology


  1. Data By The People, For The People. Daniel Tunkelang. Director, Data Science. LinkedIn Recruiting Solutions
  2. Why do 175M+ people use LinkedIn?
  3. Identity: find and be found
  4. Insights: discover and share knowledge
  5. People use LinkedIn because of other people.
  6. People as Users + People as Data. Unique opportunities and challenges! Search; Recommendations; Networking
  7. Search
  8. People search is personal!
  9. But not all relevance factors are personal. (Figure labels: Good, Bad)
  10. People are semi-structured objects.
        for i in [1..n]
          s ← w1 w2 … wi
          if Pc(s) > 0
            a ← new Segment()
            a.segs ← {s}
            a.prob ← Pc(s)
            B[i] ← {a}
          for j in [1..i-1]
            for b in B[j]
              s ← wj wj+1 … wi
              if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs U {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] U {a}
          sort B[i] by prob
          truncate B[i] to size k
  11. LinkedIn uses scale to derive structure. (Figure label: Software Developer)
  12. Social network is more than a ranking signal.
  13. People are a gateway to other entities.
  14. Search: Summary. People finding people. People being found. People finding content. Through other people.
  15. Recommendations
  16. Recommendation products at LinkedIn: Similar Profiles; Connections; Network updates; Events You May Be Interested In; News
  17. LinkedIn’s recommender ecosystem. Recommendations drive: > 50% of connections; > 50% of job applications; > 50% of group joins
  18. Inputs for recommender systems: Social Graph; Content; Behavior; Queries; Page Views; Actions; …
  19. Jobs You Might Be Interested In
  20. How LinkedIn matches people to jobs. (Diagram; recoverable labels only.) Job fields: title, industry, geo, description, company, functional area, … User Base fields: General (expertise, specialties, education, headline, geo, experience, …) and Current Position (title, summary, tenure length, industry, functional area, …). Corpus Stats (derived): transition probabilities, connectivity, yrs of experience to reach title, education needed for this title, … Matching: Binary (exact matches: geo, industry, …) and Soft (transition probabilities, similarity, …). Features named in the figure: Similarity (candidate expertise, job description); Similarity (candidate specialties, job description); Transition probability (candidate industry, job industry); Title Similarity; Similarity (headline, title). Values shown in the figure: 0.56, 0.2, 0.43, 0.8, 0.7. Output label: Filtered.
  21. Is job-hunting socially contagious? [Posse, 2012]
  22. Social referral. Suggest based on connection strength and relevance to target user. 2x conversion! [Amin et al, 2012]
  23. Suggested skill endorsements
  24. Recommendations: Summary. Content is king. Connections provide social dimension. Context determines where and when a recommendation is appropriate.
  25. Networking
  26. People You May Know
  27. Closing the triangles. (Figure: Carol, Alice, Bob, with a "?" between Alice and Bob.) Triads suggest and affect relationships. [Simmel, 1908], [Granovetter, 1973] Triangle closing is a Big Data problem. [Shah, 2011] Use machine learning to rank candidates.
  28. Shared connections as a signal
  29. Power of social proof
  30. More power of social proof …
  31. Networking: Summary. Close triangles to suggest connections. Connections as social proof. Unleash the power of weak ties.
  32. Conclusion. People use LinkedIn because of other people. Primary use cases: find and be found; discover and share knowledge. People are at the heart of LinkedIn’s products: Search, Recommendations, Networking.
  33. Thank You! We’re Hiring! 175M+; 2/sec; 62% non U.S.; 25th most visited website worldwide (Comscore 6-12); >2M Company pages; 85% of Fortune 500 Companies use LinkedIn to hire. Chart: LinkedIn Members (Millions), 2004 to 2011; plotted values 2, 4, 8, 17, 32, 55, 90. Learn more at
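
The segmentation pseudocode on slide 10 can be read as a beam-limited dynamic program over query prefixes. The Python below is a minimal sketch of that reading, not LinkedIn's implementation. Several things are assumptions made for illustration: the names `Segmentation`, `segment`, `toy_pc`, and `TOY_PROBS`, the toy probabilities themselves, and the choice to let the segment that extends a segmentation of the first `j` words start at word `j + 1` so that segments never overlap.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Segmentation:
    """One way of splitting a query prefix into segments, with its probability."""
    segs: Tuple[str, ...]
    prob: float


def segment(words: Sequence[str], pc: Callable[[str], float], k: int = 5) -> List[Segmentation]:
    """Return up to k highest-probability segmentations of `words`.

    `pc` plays the role of Pc(s) on the slide: the probability of a candidate
    segment, with 0 meaning the segment is not allowed.
    """
    n = len(words)
    # beams[i] holds the best segmentations of the first i words (B[i] on the slide).
    beams: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    for i in range(1, n + 1):
        candidates: List[Segmentation] = []
        # Case 1: the whole prefix w1..wi is a single segment.
        s = " ".join(words[:i])
        p = pc(s)
        if p > 0:
            candidates.append(Segmentation((s,), p))
        # Case 2: extend a segmentation of the first j words with one more segment.
        for j in range(1, i):
            s = " ".join(words[j:i])  # assumption: words j+1..i, so nothing overlaps
            p = pc(s)
            if p > 0:
                for b in beams[j]:
                    candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        # Sort by probability and truncate to beam size k.
        candidates.sort(key=lambda a: a.prob, reverse=True)
        beams[i] = candidates[:k]
    return beams[n]


# Hypothetical segment probabilities, for illustration only.
TOY_PROBS = {"software": 0.05, "engineer": 0.04, "google": 0.03, "software engineer": 0.02}


def toy_pc(s: str) -> float:
    return TOY_PROBS.get(s, 0.0)


best = segment("software engineer google".split(), toy_pc, k=2)
print(best[0].segs)
```

With these toy numbers the top segmentation keeps "software engineer" together as one segment and "google" as another, which is the kind of structure a people-search query needs before it can be matched against profile fields such as title and company.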
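
Slides 27 and 28 describe closing triangles and using shared connections as a signal. A minimal sketch of candidate generation under that idea follows. It ranks candidates by shared-connection count only, which is a stand-in for the machine-learned ranking the slide mentions. The function name `suggest_connections`, the graph representation, and the members other than Alice, Bob, and Carol are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def suggest_connections(graph: Dict[str, Set[str]], member: str, top_n: int = 3) -> List[Tuple[str, int]]:
    """Suggest non-connections of `member`, ranked by number of shared connections."""
    friends = graph.get(member, set())
    shared: Counter = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the member and anyone already connected: those triangles are closed.
            if candidate != member and candidate not in friends:
                shared[candidate] += 1
    # Most shared connections first; break ties alphabetically for stable output.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy undirected graph: Alice and Bob both know Carol and Dave, but not each other.
graph = {
    "Alice": {"Carol", "Dave"},
    "Bob": {"Carol", "Dave"},
    "Carol": {"Alice", "Bob"},
    "Dave": {"Alice", "Bob", "Erin"},
    "Erin": {"Dave"},
}

suggestions = suggest_connections(graph, "Alice")
print(suggestions)
```

Here Bob is suggested to Alice ahead of Erin because two open triangles (through Carol and through Dave) point at Bob. At the scale the slide calls a Big Data problem, this two-hop expansion would be computed in a distributed batch job rather than in memory.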