
Michael Wick - Human Machine Cooperation: User Corrections for AKBC

Published in: Education, Technology


  1. 1. Human Machine Cooperation: User Corrections for AKBC Michael Wick, Karl Schultz, Andrew McCallum University of Massachusetts, Amherst.
  2. 2. Motivation • KBs for real-world decision making • Problem: data needs integration • AKBC/IE is scalable, but inaccurate • Humans are more accurate, but lack coverage • Question: how do we combine human and machine KBC?
  3. 3. Goal: build a database of every scientist in the world.
  4. 4. Knowledge Base Construction [Pipeline diagram: text docs (.pdf, .html) and structured data (.bib) → entity extraction and relation extraction → entity mentions and relation mentions → entity resolution (coref) → KB of entities and relations; a query returns the “truth” as its answer. Example: the mentions “Wei Li”, “W. Li”, and “Xinghua U.” yield Attends(Wei Li, Xinghua U.).] Problem: (1) errors snowball in the IE pipeline (2) errors persist in the DB forever
  5. 5. KB Coreference Errors. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming. Record labels: id=5, id=3, id=1
  6. 6. KB Coreference Errors. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming. Record labels: id=5, id=3, id=1. Coref?
  7. 7. KB Coreference Errors. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming. Record labels: id=5, id=3, id=1. Coref? Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... LOW
  8. 8. KB Coreference Errors. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming. Record labels: id=5, id=3, id=1. Coref? NO. Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... LOW
  9. 9. Human Edits to Coreference “Fernando Pereira with id=5 is Fernando Pereira with id=3”
  10. 10. Human Edits to Coreference “Fernando Pereira with id=5 is Fernando Pereira with id=3” “Fernando Pereira with id=2 is Fernando Pereira with id=1”
  11. 11. Human Edits to Coreference “Fernando Pereira with id=5 is Fernando Pereira with id=3” “Fernando Pereira with id=2 is Fernando Pereira with id=1” “Fernando Pereira with id=5 is Fernando Pereira with id=4”
  12. 12. How should these edits be managed?
  13. 13. Edits to Coreference: KB with coref errors
  14. 14. Edits to Coreference: KB with coref errors
  15. 15. Edits to Coreference: KB with coref errors. Stream of user edits: a good edit (must-link) and a bad edit (must-link)
  16. 16. Edits to Coreference: KB with coref errors. Stream of user edits: a good edit (must-link) and a bad edit (must-link). Incorporate edits: how do we resolve conflicts?
  17. 17. Strategy 1: Most recent edit gets priority. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Edit order: 2 then 1
  18. 18. Strategy 1: Most recent edit gets priority. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Edit order: 2 then 1
  19. 19. Strategy 1: Most recent edit gets priority. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Edit order: 2 then 1
  20. 20. Strategy 1: Most recent edit gets priority. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Edit order: 1 then 2
  21. 21. Strategy 1: Most recent edit gets priority. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Edit order: 1 then 2
  22. 22. Strategy 1: Most recent edit gets priority. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Edit order: 1 then 2
  23. 23. Strategy 2: Deterministic integration of edits. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Enforce transitivity
  24. 24. Strategy 2: Deterministic integration of edits. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Enforce transitivity
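
A minimal sketch of what deterministic integration with enforced transitivity could look like, assuming every must-link edit is treated as a hard constraint. The union-find structure, the function name `apply_must_links`, and the record ids are illustrative assumptions, not taken from the slides. It shows why a single bad edit is costly under this strategy: transitivity drags a whole cluster along with it.

```python
class UnionFind:
    """Disjoint-set structure; merging is transitive by construction."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def apply_must_links(ids, edits):
    """Apply every must-link edit as a hard constraint and return the clusters."""
    uf = UnionFind(ids)
    for a, b in edits:
        uf.union(a, b)
    clusters = {}
    for x in ids:
        clusters.setdefault(uf.find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical ids: records 1, 2, 3 are one person; record 4 is someone else.
good_edits = [(1, 2), (2, 3)]
bad_edit = (3, 4)
print(apply_must_links([1, 2, 3, 4, 5], good_edits + [bad_edit]))
# prints [[1, 2, 3, 4], [5]]
```

The bad edit links only records 3 and 4, yet records 1 and 2 end up merged with record 4 as well.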
  25. 25. How should edits be managed?
  26. 26. How should edits be managed? • User modification of “the truth” is risky • Humans disagree • Humans make mistakes • “Truth” changes over time
  27. 27. How should edits be managed? • User modification of “the truth” is risky • Humans disagree • Humans make mistakes • “Truth” changes over time • Our approach: • edits as statistical evidence • “truth” inferred from evidence
  28. 28. What is the truth?
  29. 29. What is the truth? [Diagram labels: The truth; Evidence]
  30. 30. What is the truth? [Diagram labels: The truth; Evidence: unstructured data (e.g., PDFs)]
  31. 31. What is the truth? [Diagram labels: The truth; Evidence: structured data (e.g., ACM, DBLP), unstructured data (e.g., PDFs)]
  32. 32. What is the truth? [Diagram labels: The truth; Evidence: structured data (e.g., ACM, DBLP), user edits, unstructured data (e.g., PDFs)]
  33. 33. What is the truth? [Diagram labels: The truth (inferred by MCMC); IE models (e.g., CRFs); Evidence: structured data (e.g., ACM, DBLP), user edits, unstructured data (e.g., PDFs)]
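
As a rough illustration of inferring a clustering by MCMC, here is a toy Metropolis sampler over cluster assignments. It is a sketch under stated assumptions, not the model from the talk: the pairwise `affinity` scores, the mention names, and the scoring function are all hypothetical, and the proposal (relabel one uniformly chosen mention with a uniformly chosen label) is picked only because it is symmetric.

```python
import math
import random


def score(assign, affinity):
    """Sum pairwise affinities over mention pairs placed in the same cluster."""
    total = 0.0
    for (a, b), w in affinity.items():
        if assign[a] == assign[b]:
            total += w
    return total


def metropolis(mentions, affinity, steps=2000, seed=0):
    """Sample clusterings and return the best-scoring one seen."""
    rng = random.Random(seed)
    labels = list(range(len(mentions)))
    assign = {m: i for i, m in enumerate(mentions)}  # start from singletons
    current = score(assign, affinity)
    best, best_score = dict(assign), current
    for _ in range(steps):
        m = rng.choice(mentions)
        old = assign[m]
        assign[m] = rng.choice(labels)  # symmetric proposal
        proposed = score(assign, affinity)
        # Accept with probability min(1, exp(proposed - current)).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            current = proposed
            if current > best_score:
                best, best_score = dict(assign), current
        else:
            assign[m] = old  # reject: restore the previous label
    return best, best_score


# Hypothetical evidence: positive affinity favours merging, negative opposes it.
mentions = ["m1", "m2", "m3"]
affinity = {("m1", "m2"): 2.0, ("m1", "m3"): -1.5, ("m2", "m3"): -1.0}
best, best_score = metropolis(mentions, affinity)
print(best, best_score)
```

In this framing a user edit would simply contribute another term to the score, so it can be outweighed by other evidence rather than overwriting the result.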
  34. 34. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh”
  35. 35. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh” “The CRF Fernando Pereira is the Prolog Fernando Pereira”
  36. 36. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh” “The CRF Fernando Pereira is the Prolog Fernando Pereira” “The NLP Fernando Pereira is the MPEG Fernando Pereira”
  37. 37. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh” → [Name: Fernando Pereira; Institution: Google]. “The CRF Fernando Pereira is the Prolog Fernando Pereira” “The NLP Fernando Pereira is the MPEG Fernando Pereira”
  38. 38. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh” → [Name: Fernando Pereira; Institution: Google] and [Name: Fernando Pereira; Institution: U. Edinburgh]. “The CRF Fernando Pereira is the Prolog Fernando Pereira” “The NLP Fernando Pereira is the MPEG Fernando Pereira”
  39. 39. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh” → [Name: Fernando Pereira; Institution: Google] must-link [Name: Fernando Pereira; Institution: U. Edinburgh]. “The CRF Fernando Pereira is the Prolog Fernando Pereira” “The NLP Fernando Pereira is the MPEG Fernando Pereira”
  40. 40. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh” → [Name: Fernando Pereira; Institution: Google] must-link [Name: Fernando Pereira; Institution: U. Edinburgh]. “The CRF Fernando Pereira is the Prolog Fernando Pereira” → [Name: Fernando Pereira; Topics: CRF] must-link [Name: Fernando Pereira; Topics: Prolog]. “The NLP Fernando Pereira is the MPEG Fernando Pereira”
  41. 41. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh” → [Name: Fernando Pereira; Institution: Google] must-link [Name: Fernando Pereira; Institution: U. Edinburgh]. “The CRF Fernando Pereira is the Prolog Fernando Pereira” → [Name: Fernando Pereira; Topics: CRF] must-link [Name: Fernando Pereira; Topics: Prolog]. “The NLP Fernando Pereira is the MPEG Fernando Pereira” → [Name: Fernando Pereira; Topics: NLP] must-link [Name: Fernando Pereira; Topics: MPEG]
  42. 42. Human Edits: Mentions Added to DB. Existing records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]. Added mentions: [Name: Fernando Pereira; Institution: Google], [Name: Fernando Pereira; Institution: U. Edinburgh], [Name: Fernando Pereira; Topics: CRF], [Name: Fernando Pereira; Topics: Prolog], [Name: Fernando Pereira; Topics: NLP], [Name: Fernando Pereira; Topics: MPEG]
  43. 43. Human Edits: Perform Coreference. Existing records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]. Added mentions: [Name: Fernando Pereira; Institution: Google], [Name: Fernando Pereira; Institution: U. Edinburgh], [Name: Fernando Pereira; Topics: CRF], [Name: Fernando Pereira; Topics: Prolog], [Name: Fernando Pereira; Topics: NLP], [Name: Fernando Pereira; Topics: MPEG]
  44. 44. Human Edits: Perform Coreference. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming
  45. 45. Human Edits: Perform Coreference. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming. Coref? Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... LOW 4. Should-link... YES
  46. 46. Human Edits: Perform Coreference. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming. Coref? YES. Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... LOW 4. Should-link... YES
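
One way to picture an edit becoming evidence is as a pair of partial mention records that carry a shared link tag and are added to the database like any other mention. The sketch below assumes that representation; the `Mention` class, the `should_link` field, and `edit_to_mentions` are hypothetical names, not the authors' schema.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_edit_ids = count(1)  # hypothetical source of unique edit tags


@dataclass
class Mention:
    name: str
    attributes: dict = field(default_factory=dict)
    should_link: Optional[int] = None  # tag shared by the two halves of one edit


def edit_to_mentions(name, left_attrs, right_attrs):
    """Turn one user edit into two partial mentions joined by a shared tag."""
    tag = next(_edit_ids)
    return (
        Mention(name, dict(left_attrs), tag),
        Mention(name, dict(right_attrs), tag),
    )


# The three edits from the slides, expressed as mention pairs.
db = []
db.extend(edit_to_mentions("Fernando Pereira", {"Institution": "Google"}, {"Institution": "U. Edinburgh"}))
db.extend(edit_to_mentions("Fernando Pereira", {"Topics": "CRF"}, {"Topics": "Prolog"}))
db.extend(edit_to_mentions("Fernando Pereira", {"Topics": "NLP"}, {"Topics": "MPEG"}))
print(len(db))  # prints 6
```

Because the edit lives in the database as ordinary mentions, nothing is overwritten: the coreference step later decides where each half belongs.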
  47. 47. Incorrect edit. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: Superior Tecnic; Topics: MPEG; Venues: ICIP
  48. 48. Incorrect edit. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: Superior Tecnic; Topics: MPEG; Venues: ICIP. Coref?
  49. 49. Incorrect edit. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: Superior Tecnic; Topics: MPEG; Venues: ICIP. Coref? Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... NO 4. Should-link... YES
  50. 50. Incorrect edit. Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP. Right record: First: Fernando; Last: Pereira; Institution: Superior Tecnic; Topics: MPEG; Venues: ICIP. Coref? NO. Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... NO 4. Should-link... YES
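
The two worked examples (a correct edit accepted, an incorrect edit rejected) can be mimicked with a simple weighted feature score in which should-link is just one feature among several. This is a sketch only: the linear form, the weights, and the zero threshold are made-up assumptions chosen to reproduce the YES/NO outcomes on the slides, whereas the talk itself refers to a hierarchical CRF.

```python
# Hypothetical weights; chosen for illustration, not learned from data.
WEIGHTS = {
    "institution_overlap": {"YES": 1.0, "NO": -1.0},
    "venue_overlap": {"YES": 1.0, "NO": -1.0},
    "topic_overlap": {"HIGH": 1.0, "LOW": 0.5, "NO": -1.0},
    "should_link": {"YES": 2.0, "NO": 0.0},
}


def coref_score(features):
    """Add up the weight of each observed feature value."""
    return sum(WEIGHTS[name][value] for name, value in features.items())


def is_coref(features):
    """Merge the two records only when the total evidence is positive."""
    return coref_score(features) > 0.0


no_edit = {"institution_overlap": "NO", "venue_overlap": "NO",
           "topic_overlap": "LOW", "should_link": "NO"}
good_edit = dict(no_edit, should_link="YES")
bad_edit = dict(good_edit, topic_overlap="NO")

print(is_coref(no_edit))    # prints False
print(is_coref(good_edit))  # prints True
print(is_coref(bad_edit))   # prints False
```

With these weights the edit tips the balance only when the remaining evidence is at least weakly compatible, which is the behaviour the slides describe.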
  51. 51. Experiments: 1. Build initial KB with automatic coreference
  52. 52. Experiments: 1. Build initial KB with automatic coreference
  53. 53. Experiments: 1. Build initial KB with automatic coreference 2. Simulate user edits: good edits (must-link) and bad edits (must-link)
  54. 54. Experiments: 1. Build initial KB with automatic coreference 2. Simulate user edits: good edits (must-link) and bad edits (must-link) 3. Apply edits: our probabilistic approach vs. two deterministic approaches
  55. 55. Hierarchical + Human Edits: Better incorporation of correct human edits. [Plot: database quality versus the number of correct human edits. x-axis: No. of human edits (0 to 30); y-axis: F1 accuracy (0.55 to 0.80). Edit incorporation strategies in the legend: epistemological (probabilistic) reasoning, overwrite, maximally satisfy. Curve annotations: our probabilistic reasoning, local satisfaction, traditional overwrite.]
  56. 56. Hierarchical + Human Edits: More robust to incorrect human edits. [Plot: database quality versus the number of errorful human edits. x-axis: 0 to 60; y-axis: Precision (0.4 to 0.8). Edit incorporation strategies in the legend: epistemological (probabilistic) reasoning, complete trust in users. Curve annotations: our probabilistic reasoning, complete trust in humans.]
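
The plots report F1 and precision, but the transcript does not say how they were computed. As an assumption, the sketch below uses pairwise precision, recall, and F1 over coreferent record pairs, a common choice for clustering quality; the function names and the example clusters are hypothetical.

```python
from itertools import combinations


def coref_pairs(clusters):
    """Return the set of unordered record pairs placed in the same cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return pairs


def pairwise_prf(predicted, gold):
    """Pairwise precision, recall, and F1 of a predicted clustering."""
    pred, true = coref_pairs(predicted), coref_pairs(gold)
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(true) if true else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical case: a bad must-link merged record 4 into the cluster {1, 2, 3}.
gold = [[1, 2, 3], [4]]
predicted = [[1, 2, 3, 4]]
precision, recall, f1 = pairwise_prf(predicted, gold)
print(precision, recall, round(f1, 3))  # prints 0.5 1.0 0.667
```

The example shows why blindly trusted bad edits hurt precision in particular: one wrong merge adds three wrong pairs.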
  57. 57. Come see our poster! • Technical details including - Hierarchical CRF for coreference - MCMC for inference • Probabilistic incorporation of human edits • Epistemological Databases. THANK YOU
