Michael Wick - Human Machine Cooperation: User Corrections for AKBC
Published in: Education, Technology
  • Speaker notes:
      * Reminder of an epistemological database: streaming evidence is stored, truth is inferred
      * “Coref is the foundation for everything”
      * “Coref everywhere”
      * I will speak today about our work on scaling coreference to large data sets
  • Transcript

    • 1. Human Machine Cooperation: User Corrections for AKBC. Michael Wick, Karl Schultz, Andrew McCallum. University of Massachusetts, Amherst.
    • 2. Motivation
        • KBs for real-world decision making
        • Problem: data needs integration
            • AKBC/IE is scalable, but inaccurate
            • Humans are more accurate, but lack coverage
        • Question: how do we combine human and machine KBC?
    • 3. Goal: build a database of every scientist in the world.
    • 4. Knowledge Base Construction
        • [Pipeline diagram] Inputs (text docs such as .pdf and .html, structured data such as .bib) -> entity and relation extraction (mentions) -> entity and relation resolution (coref) -> KB -> query -> “truth” answer
        • Example: entity mentions “Wei Li”, “W. Li”, “Xinghua U.”; extracted relation Attends(Wei Li, Xinghua U.)
        • Problem: (1) errors snowball in the IE pipeline; (2) errors persist in the DB forever
    • 5-8. KB Coreference Errors
        • First record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP
        • Second record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming
        • Entity ids shown on the slide: id=5, id=3, id=1
        • Coref? NO
        • Features: 1. Institution overlap: NO 2. Venue overlap: NO 3. Topic overlap: LOW
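The overlap features on these slides can be illustrated with a small sketch. The slides report the features only as NO or LOW and do not say how overlap is measured, so the Jaccard measure, the `Entity` class, and the `overlap_features` function below are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """Hypothetical KB record with the attribute bags shown on the slide."""
    first: str
    last: str
    institutions: frozenset = field(default_factory=frozenset)
    topics: frozenset = field(default_factory=frozenset)
    venues: frozenset = field(default_factory=frozenset)


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty (an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_features(x, y):
    """Compute the three overlap features named on the slide."""
    return {
        "institution_overlap": jaccard(x.institutions, y.institutions),
        "venue_overlap": jaccard(x.venues, y.venues),
        "topic_overlap": jaccard(x.topics, y.topics),
    }


# The two Fernando Pereira records from the slide.
google = Entity(
    "Fernando", "Pereira",
    institutions=frozenset({"Google", "UPenn"}),
    topics=frozenset({"CRF", "IE", "NLP"}),
    venues=frozenset({"ICML", "NIPS", "EMNLP"}),
)
edinburgh = Entity(
    "Fernando", "Pereira",
    institutions=frozenset({"U. Edinburgh", "SRI"}),
    topics=frozenset({"logic programming", "AI", "urban traffic modeling", "NLP"}),
    venues=frozenset({"Logic programming"}),
)

features = overlap_features(google, edinburgh)
```

Under this measure the institution and venue overlaps are zero, and the topic overlap is small because only NLP is shared, which matches the NO / NO / LOW pattern on the slide.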
    • 9-11. Human Edits to Coreference
        • “Fernando Pereira with id=5 is Fernando Pereira with id=3”
        • “Fernando Pereira with id=2 is Fernando Pereira with id=1”
        • “Fernando Pereira with id=5 is Fernando Pereira with id=4”
    • 12. How should these edits be managed?
    • 13-16. Edits to Coreference
        • KB with coref errors
        • Stream of user edits: a good edit (must-link) and a bad edit (must-link)
        • Incorporate edits: how do we resolve conflicts?
    • 17-19. Strategy 1: Most recent edit gets priority. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Edit order: 2 then 1
    • 20-22. Strategy 1: Most recent edit gets priority. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Edit order: 1 then 2
    • 23-24. Strategy 2: Deterministic integration of edits. Edit 1: good edit (must-link). Edit 2: bad edit (must-link). Enforce transitivity
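A minimal sketch of why deterministic integration with enforced transitivity is fragile, assuming must-link edits are applied as hard constraints with a union-find structure. The slides do not describe the implementation, so `UnionFind`, `apply_must_links`, and the mention names are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure; merging sets enforces transitivity of must-links."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def apply_must_links(edits):
    """Apply every must-link as a hard constraint and return the clusters."""
    uf = UnionFind()
    for a, b in edits:
        uf.union(a, b)
    clusters = {}
    for mention in list(uf.parent):
        clusters.setdefault(uf.find(mention), set()).add(mention)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical mention names: one good edit and one bad edit.
edits = [
    ("pereira_google", "pereira_edinburgh"),  # good must-link
    ("pereira_google", "pereira_mpeg"),       # bad must-link
]
clusters = apply_must_links(edits)
```

Here the single bad edit pulls the MPEG mention into the same cluster as both other mentions, because transitivity propagates every must-link without weighing it against other evidence.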
    • 25-27. How should edits be managed?
        • User modification of “the truth” is risky
            • Humans disagree
            • Humans make mistakes
            • “Truth” changes over time
        • Our approach:
            • edits as statistical evidence
            • “truth” inferred from evidence
    • 28-33. What is the truth?
        • The truth: inferred by MCMC
        • Evidence: structured data (e.g., ACM, DBLP), user edits, unstructured data (e.g., PDFs)
        • IE models (e.g., CRFs)
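The slide says only that the truth is inferred by MCMC. As a rough illustration of the idea, the sketch below runs a Metropolis-Hastings sampler over cluster labels for a handful of mentions. The pairwise affinity scores, the proposal (relabel one random mention), and the function names are all assumptions; this is not the hierarchical model used in the talk.

```python
import math
import random


def score(assign, affinity):
    """Sum the affinities of all mention pairs placed in the same cluster."""
    total = 0.0
    for (i, j), weight in affinity.items():
        if assign[i] == assign[j]:
            total += weight
    return total


def mh_coref(n, affinity, steps=2000, temperature=1.0, seed=0):
    """Metropolis-Hastings over cluster labels; returns the best state visited."""
    rng = random.Random(seed)
    assign = list(range(n))  # start with every mention in its own cluster
    current = score(assign, affinity)
    best, best_score = assign[:], current
    for _ in range(steps):
        i = rng.randrange(n)
        old_label = assign[i]
        assign[i] = rng.randrange(n)  # symmetric proposal: relabel one mention
        proposed = score(assign, affinity)
        # Accept with probability min(1, exp((proposed - current) / T)).
        log_u = math.log(1.0 - rng.random())
        if log_u < (proposed - current) / temperature:
            current = proposed
            if current > best_score:
                best, best_score = assign[:], current
        else:
            assign[i] = old_label  # reject: restore the previous label
    return best, best_score


# Hypothetical affinities: mentions 0 and 1 are compatible, mention 2 is not.
affinity = {(0, 1): 1.5, (0, 2): -2.0, (1, 2): -2.0}
best, best_score = mh_coref(3, affinity)
```

Because the sampler scores whole configurations, a new piece of evidence changes the scores rather than overwriting the current clustering directly.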
    • 34-41. Human Edits as Evidence
        • “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh”
            • [Name: Fernando Pereira; Institution: Google] must-link [Name: Fernando Pereira; Institution: U. Edinburgh]
        • “The CRF Fernando Pereira is the Prolog Fernando Pereira”
            • [Name: Fernando Pereira; Topics: CRF] must-link [Name: Fernando Pereira; Topics: Prolog]
        • “The NLP Fernando Pereira is the MPEG Fernando Pereira”
            • [Name: Fernando Pereira; Topics: NLP] must-link [Name: Fernando Pereira; Topics: MPEG]
    • 42. Human Edits: Mentions Added to DB
        • First record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP
        • Second record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming
        • Mentions added from edits:
            • [Name: Fernando Pereira; Institution: Google] and [Name: Fernando Pereira; Institution: U. Edinburgh]
            • [Name: Fernando Pereira; Topics: CRF] and [Name: Fernando Pereira; Topics: Prolog]
            • [Name: Fernando Pereira; Topics: NLP] and [Name: Fernando Pereira; Topics: MPEG]
    • 43-46. Human Edits: Perform Coreference
        • First record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP
        • Second record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming
        • Coref? YES
        • Features: 1. Institution overlap: NO 2. Venue overlap: NO 3. Topic overlap: LOW 4. Should-link: YES
    • 47-50. Incorrect edit
        • First record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP
        • Second record: First: Fernando; Last: Pereira; Institution: Superior Tecnic; Topics: MPEG; Venues: ICIP
        • Coref? NO
        • Features: 1. Institution overlap: NO 2. Venue overlap: NO 3. Topic overlap: NO 4. Should-link: YES
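The contrast between the correct and the incorrect edit can be sketched as a linear score in which should-link is one weighted feature among several, so it can be outvoted. The weights below are hypothetical and hand-picked to reproduce the YES and NO outcomes on the slides; the talk does not give the actual model parameters, and the numeric overlap values are likewise assumed.

```python
# Hypothetical weights, chosen only so the outcomes match the slides.
WEIGHTS = {
    "bias": -2.0,
    "institution_overlap": 4.0,
    "venue_overlap": 4.0,
    "topic_overlap": 6.0,
    "should_link": 1.5,
}


def coref_score(features):
    """Linear score: bias plus the weighted sum of the feature values."""
    return WEIGHTS["bias"] + sum(WEIGHTS[name] * value
                                 for name, value in features.items())


def is_coref(features):
    """Decide coreference by thresholding the score at zero."""
    return coref_score(features) > 0.0


# Google/UPenn record vs. U. Edinburgh/SRI record: low topic overlap (NLP).
edinburgh_pair = {"institution_overlap": 0.0, "venue_overlap": 0.0,
                  "topic_overlap": 1.0 / 6.0, "should_link": 0.0}
# Google/UPenn record vs. the MPEG record: no overlap at all.
mpeg_pair = {"institution_overlap": 0.0, "venue_overlap": 0.0,
             "topic_overlap": 0.0, "should_link": 0.0}

without_edit = is_coref(edinburgh_pair)
good_edit = is_coref({**edinburgh_pair, "should_link": 1.0})
bad_edit = is_coref({**mpeg_pair, "should_link": 1.0})
```

With these weights the should-link evidence is enough to tip the pair that already shares a topic, but not the pair with no overlap, so the incorrect edit is effectively ignored.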
    • 51-54. Experiments
        1. Build initial KB with automatic coreference
        2. Simulate user edits: good edits (must-link) and bad edits (must-link)
        3. Apply edits: our probabilistic approach vs. two deterministic approaches
    • 55. Hierarchical + Human Edits: better incorporation of correct human edits
        • [Plot] Database quality versus the number of correct human edits. y-axis: F1 accuracy (0.55 to 0.80); x-axis: no. of human edits (0 to 30)
        • Edit incorporation strategies in the legend: Epistemological (probabilistic), Overwrite, Maximally satisfy
        • Curve annotations: our probabilistic reasoning; local satisfaction; traditional overwrite
    • 56. Hierarchical + Human Edits: more robust to incorrect human edits
        • [Plot] Database quality versus the number of errorful human edits. y-axis: precision (0.4 to 0.8); x-axis: 0 to 60
        • Edit incorporation strategies in the legend: Epistemological (probabilistic), Complete trust in users
        • Curve annotations: our probabilistic reasoning; complete trust in humans
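The plots report F1 and precision for the resulting database but the transcript does not say which coreference metric is used. The sketch below assumes the common pairwise variant, where every pair of mentions in the same cluster counts as one predicted link; `coref_pairs` and `pairwise_prf` are illustrative names.

```python
from itertools import combinations


def coref_pairs(clusters):
    """Return the set of unordered mention pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_prf(predicted, gold):
    """Pairwise precision, recall, and F1 of a predicted clustering."""
    pred_pairs, gold_pairs = coref_pairs(predicted), coref_pairs(gold)
    true_pos = len(pred_pairs & gold_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 1.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: a bad must-link merges mention "d" into the cluster.
gold = [{"a", "b", "c"}, {"d"}]
over_merged = [{"a", "b", "c", "d"}]
precision, recall, f1 = pairwise_prf(over_merged, gold)
```

In this example the over-merged clustering keeps full recall but loses precision, which is the kind of damage the second plot attributes to trusting errorful edits completely.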
    • 57. Come see our poster!
        • Technical details including
            - Hierarchical CRF for coreference
            - MCMC for inference
        • Probabilistic incorporation of human edits
        • Epistemological Databases
        THANK YOU
