
Scale, Structure, and Semantics

13,317 views

Keynote at 2012 Semantic Technology and Business Conference

Scale, Structure, and Semantics
Daniel Tunkelang, LinkedIn

Science fiction has a mixed track record when it comes to anticipating technological innovations. While Jules Verne fared well with his predictions of submarine and space technology, artificial intelligence hasn't produced anything like Arthur C. Clarke's HAL 9000.

Instead, we've managed to elicit intelligence from machines through unexpected means. Search engines have achieved remarkable success in organizing the world's information by crawling the web, indexing documents, and exploiting link structure to establish authoritativeness. At LinkedIn, we apply large-scale analytics to terabytes of semistructured data to deliver products and insights that serve our 150M+ members. Semantics emerge when we apply the right analytical techniques to a sufficient quality and quantity of data.

In this talk, I will describe how LinkedIn's huge and rich graph of relationship data powers the products our users love. I believe that the lessons we have learned apply broadly to other semantic applications. While the quantity and quality of data are the main challenges in delivering a semantically rich experience, the key is to create the right ecosystem, one that gives people an incentive to give you good data, which then forms the basis for great data products.

Published in: Technology, Education

Scale, Structure, and Semantics

  1. Scale, Structure, and Semantics. Daniel Tunkelang, Principal Data Scientist at LinkedIn. Recruiting Solutions.
  2. Take-Aways: Communication trumps knowledge representation. Communication is the problem and the solution.
  3. Overview: 1. Knowledge representation is overrated. 2. Computation is underrated. 3. We have a communication problem.
  4. The Bad News: 1. Knowledge representation is overrated. 2. Computation is underrated. 3. We have a communication problem.
  5. AI: a dream deferred.
  6. Memex: the Computer Science Version
  7. Cyc
  8. Freebase
  9. Wolfram Alpha
  10. Knowledge representation is overrated. Today’s knowledge repositories are: incomplete, inconsistent, inscrutable, and not sustained by economic incentives. 1986 estimate of effort to complete Cyc: 250,000 rules + 350 person-years.
  11. The Good News: 1. Knowledge representation is overrated. 2. Computation is underrated. 3. We have a communication problem.
  12. Deep Blue vs.
  13. Watson
  14. Plain Old Search Engines are Pretty Good Too. http://blog.stephenwolfram.com/2011/01/jeopardy-ibm-and-wolframalpha/
  15. The Unreasonable Effectiveness of Data: simple models + lots of data >> elaborate models + less data; machine translation: parallel corpora >> elaborate rules for syntactic and semantic patterns; semantic web formalism just means semantic interpretation on shorter strings between angle brackets. (Alon Halevy, Peter Norvig, and Fernando Pereira, 2009)
  16. Today’s Challenge: 1. Knowledge representation is overrated. 2. Computation is underrated. 3. We have a communication problem.
  17. Semi-structured Data. Michael K. Bergman, http://www.mkbergman.com/
  18. Semi-structured Data at LinkedIn. Summary: "I lead a data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn’s members. Prior to LinkedIn, I led a local search quality team at Google and was a founding employee of faceted search pioneer Endeca (acquired by Oracle in 2010), where…" Profile XML shown alongside the summary (tags in slide order): <person> <id> <first-name /> <last-name /> <location> <name> <country> <code> </country> </location> <industry> … </person>
  19. Semi-structured Search is a Killer App
  20. Another Example: Helping a Friend. "Dear Daniel, I'm attaching the resume of an old friend who just moved up to the Bay Area. He has a very strong background in: mobile / wireless applications; start-ups and new product launches; international expansion. Best regards, XXX"
  21. Company Search
  22. Semi-structured Data Empowers Users
  23. Data-Driven Recommendations
  24. Data-Driven Computation Serves Communication
      for i in [1..n]
        s ← w1 w2 … wi
        if Pc(s) > 0
          a ← new Segment()
          a.segs ← {s}
          a.prob ← Pc(s)
          B[i] ← {a}
        for j in [1..i-1]
          for b in B[j]
            s ← wj wj+1 … wi
            if Pc(s) > 0
              a ← new Segment()
              a.segs ← b.segs U {s}
              a.prob ← b.prob * Pc(s)
              B[i] ← B[i] U {a}
        sort B[i] by prob
        truncate B[i] to size k
  25. Recommendations Leverage Semi-structured Data. [Diagram: job fields (title, industry, geo, description, company, functional area, …) are matched against user fields (general: expertise, specialties, education, headline, geo, experience, …; current position: title, summary, tenure length, industry, functional area, …). Matching uses binary exact matches (geo, industry, …), soft similarity on text (candidate expertise vs. job description, candidate specialties vs. job description, headline vs. title, title similarity), and transition probabilities derived from corpus stats (candidate industry to job industry, years of experience to reach a title, education needed for a title). Example scores shown in the diagram: 0.56, 0.2, 0.43, 0.8, 0.7.]
  26. Skills: A Practical Knowledge Representation
  27. Data-Driven Query Expansion for Recall
  28. Data-Driven Query Refinement for Precision
  29. There is no perfect schema or vocabulary. And even if there were, not everyone would use it. Knowledge representation has only succeeded within narrow scope. Brute force is surprisingly effective but does not leverage the user as an intelligent partner.
  30. Communication is the problem and the solution. A rich communication channel fills gaps in the system’s knowledge representation and in the user’s knowledge. Use data science to make the system smart, but be humble and empower the human user. "You've got the brawn / I've got the brains / Let's make lots of money" (Pet Shop Boys, “Opportunities”)
  31. The Future is Upon Us
  32. One More Thing: “More data beats clever algorithms but better data beats more data.” (Monica Rogati @ Strata 2012)
  33. Thank You! Questions? Contact: dtunkelang@linkedin.com. We’re Hiring!
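
The pseudocode on slide 24 reads like a beam search over segmentations of an n-word query, and a minimal Python sketch of that reading is given here. Several things are assumptions rather than anything the slides confirm: that Pc(s) is a corpus-derived probability that the string s forms a coherent segment, that B[i] holds the k best segmentations of the first i words, and that a candidate extending B[j] covers the words after position j (the slide's subscripts are ambiguous after extraction). The names `Segmentation`, `segment_query`, `TOY_PC`, and `toy_pc`, along with the toy probabilities, are hypothetical and exist only for illustration. Segments are kept in an ordered tuple instead of the set the slide uses, so the output preserves query order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Segmentation:
    segs: Tuple[str, ...]  # segments in query order
    prob: float            # product of the segment probabilities


def segment_query(words: List[str],
                  pc: Callable[[str], float],
                  k: int = 3) -> List[Segmentation]:
    """Return up to k most probable segmentations of the query words."""
    n = len(words)
    # beams[i] holds the top-k segmentations of the first i words.
    beams: Dict[int, List[Segmentation]] = {}
    for i in range(1, n + 1):
        candidates: List[Segmentation] = []
        # Case 1: the whole prefix w1..wi is a single segment.
        s = " ".join(words[:i])
        p = pc(s)
        if p > 0:
            candidates.append(Segmentation((s,), p))
        # Case 2: extend a kept segmentation of the first j words
        # with one new segment covering the remaining words up to wi.
        for j in range(1, i):
            s = " ".join(words[j:i])
            p = pc(s)
            if p <= 0:
                continue
            for b in beams[j]:
                candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        # Sort by probability and truncate the beam to size k.
        candidates.sort(key=lambda a: a.prob, reverse=True)
        beams[i] = candidates[:k]
    return beams.get(n, [])


# Hypothetical segment probabilities, standing in for corpus statistics.
TOY_PC = {
    "software": 0.1,
    "engineer": 0.1,
    "software engineer": 0.05,
    "san": 0.01,
    "francisco": 0.01,
    "san francisco": 0.04,
}


def toy_pc(s: str) -> float:
    """Look up a toy probability; unseen strings get probability 0."""
    return TOY_PC.get(s, 0.0)
```

With these toy numbers, `segment_query("software engineer san francisco".split(), toy_pc)` ranks the segmentation `("software engineer", "san francisco")` first, because the two multi-word segments are more probable than any split into single words. Truncating each beam to k keeps the work bounded at roughly n² · k candidate extensions, at the cost of possibly discarding a prefix that would have led to the best full segmentation.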
