Michael Wick - Human Machine Cooperation: User Corrections for AKBC

  1. Human Machine Cooperation: User Corrections for AKBC. Michael Wick, Karl Schultz, Andrew McCallum. University of Massachusetts, Amherst.
  2. Motivation
     • KBs for real-world decision making
     • Problem: data needs integration
       • AKBC/IE is scalable, but inaccurate
       • Humans are more accurate, lack coverage
     • Question: how do we combine human and machine KBC?
  3. Goal: build a database of every scientist in the world.
  4. Knowledge Base Construction
     [Figure: IE pipeline. Inputs: text docs (.pdf, .html) and structured data (.bib). Stages: entity extraction (entity mentions), relation extraction (relation mentions), resolution (coref) producing entities and relations, then the KB, which answers a query with the “truth”. Example mentions: “Wei Li”, “W. Li”, “Xinghua U.”; example relation: Attends(Wei Li, Xinghua U.).]
     Problem: (1) errors snowball in the IE pipeline; (2) errors persist in the DB forever
  8. KB Coreference Errors
     Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP
     Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming
     Record labels on the slide: id=5, id=3, id=1
     Coref? NO
     Features:
       1. Institution overlap... NO
       2. Venue overlap... NO
       3. Topic overlap... LOW
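The three overlap features on this slide can be illustrated with a small sketch. The slides do not say how overlap is measured, so the Jaccard measure, the `jaccard` and `overlap_features` helpers, and the dictionary layout below are assumptions made for illustration only.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# The two Fernando Pereira records from the slide (field names are hypothetical).
left = {
    "institution": ["Google", "UPenn"],
    "topics": ["CRF", "IE", "NLP"],
    "venues": ["ICML", "NIPS", "EMNLP"],
}
right = {
    "institution": ["U. Edinburgh", "SRI"],
    "topics": ["logic programming", "AI", "urban traffic modeling", "NLP"],
    "venues": ["Logic programming"],
}


def overlap_features(x, y):
    """One overlap score per field shared by both records."""
    return {field: jaccard(x[field], y[field])
            for field in ("institution", "venues", "topics")}


features = overlap_features(left, right)
```

Under this assumed measure, institution and venue overlap are zero and topic overlap is small (only NLP is shared), which is in line with the NO / NO / LOW readings on the slide.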
  11. Human Edits to Coreference
      “Fernando Pereira with id=5 is Fernando Pereira with id=3”
      “Fernando Pereira with id=2 is Fernando Pereira with id=1”
      “Fernando Pereira with id=5 is Fernando Pereira with id=4”
  12. How should these edits be managed?
  16. Edits to Coreference
      KB with coref errors
      Stream of user edits: good edit (must-link), bad edit (must-link)
      Incorporate edits: how do we resolve conflicts?
  19. Strategy 1: Most recent edit gets priority. Edit 1: good edit (must-link); Edit 2: bad edit (must-link). Edit order: 2 then 1
  22. Strategy 1: Most recent edit gets priority. Edit 1: good edit (must-link); Edit 2: bad edit (must-link). Edit order: 1 then 2
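A minimal sketch of why "most recent edit gets priority" is order-sensitive. The slides do not spell out what each must-link attaches, so this assumes (hypothetically) that both edits try to attach the same mention to different entities and that a later edit simply overwrites an earlier one. The names `apply_most_recent`, `mention_a`, `entity_correct`, and `entity_wrong` are illustrative.

```python
def apply_most_recent(edits):
    """Apply must-link edits in order; a later edit for the same mention
    overwrites whatever an earlier edit said."""
    entity_of = {}
    for mention, target_entity in edits:
        entity_of[mention] = target_entity
    return entity_of


good_edit = ("mention_a", "entity_correct")  # Edit 1 (hypothetical)
bad_edit = ("mention_a", "entity_wrong")     # Edit 2 (hypothetical)

order_2_then_1 = apply_most_recent([bad_edit, good_edit])
order_1_then_2 = apply_most_recent([good_edit, bad_edit])
```

With the order "2 then 1" the good edit survives; with "1 then 2" the bad edit wins, so the final KB depends on arrival order rather than on which edit is right.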
  24. Strategy 2: Deterministic integration of edits. Edit 1: good edit (must-link); Edit 2: bad edit (must-link). Enforce transitivity
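One way to read "deterministic integration" with enforced transitivity is to satisfy every must-link and take the transitive closure. The sketch below uses a union-find structure for that; it is an assumed illustration, not the authors' implementation, and the mention names `a`, `b`, `c` are hypothetical.

```python
class UnionFind:
    """Disjoint sets over arbitrary hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent, then step up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b


def integrate(must_links):
    """Satisfy every must-link; transitivity follows from the merges."""
    uf = UnionFind()
    for a, b in must_links:
        uf.union(a, b)
    return uf


good_edit = ("a", "b")  # a and b really are the same person
bad_edit = ("a", "c")   # c is a different person

uf = integrate([good_edit, bad_edit])
b_merged_with_c = uf.find("b") == uf.find("c")

# Reversing the edit order gives the same partition.
uf_reversed = integrate([bad_edit, good_edit])
same_in_reverse = uf_reversed.find("b") == uf_reversed.find("c")
```

The result no longer depends on edit order, but a single bad must-link drags `b` and `c` into one entity, which illustrates how deterministic integration propagates a user's mistake.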
  27. How should edits be managed?
      • User modification of “the truth” is risky
        • Humans disagree
        • Humans make mistakes
        • “Truth” changes over time
      • Our approach:
        • edits as statistical evidence
        • “truth” inferred from evidence
  33. What is the truth?
      • The truth: inferred by MCMC
      • IE models (e.g., CRFs)
      • Evidence:
        • Structured data (e.g., ACM, DBLP)
        • User edits
        • Unstructured data (e.g., PDFs)
  41. Human Edits as Evidence
      • “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh”
        Name: Fernando Pereira, Institution: Google [must-link] Name: Fernando Pereira, Institution: U. Edinburgh
      • “The CRF Fernando Pereira is the Prolog Fernando Pereira”
        Name: Fernando Pereira, Topics: CRF [must-link] Name: Fernando Pereira, Topics: Prolog
      • “The NLP Fernando Pereira is the MPEG Fernando Pereira”
        Name: Fernando Pereira, Topics: NLP [must-link] Name: Fernando Pereira, Topics: MPEG
  42. Human Edits: Mentions Added to DB
      Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP
      Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming
      Edit mentions:
        • Name: Fernando Pereira, Institution: Google | Name: Fernando Pereira, Institution: U. Edinburgh
        • Name: Fernando Pereira, Topics: CRF | Name: Fernando Pereira, Topics: Prolog
        • Name: Fernando Pereira, Topics: NLP | Name: Fernando Pereira, Topics: MPEG
  43. Human Edits: Perform Coreference
      Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP
      Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming
      Edit mentions:
        • Name: Fernando Pereira, Institution: Google | Name: Fernando Pereira, Institution: U. Edinburgh
        • Name: Fernando Pereira, Topics: CRF | Name: Fernando Pereira, Topics: Prolog
        • Name: Fernando Pereira, Topics: NLP | Name: Fernando Pereira, Topics: MPEG
  46. Human Edits: Perform Coreference
      Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP
      Right record: First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming
      Coref? YES
      Features:
        1. Institution overlap... NO
        2. Venue overlap... NO
        3. Topic overlap... LOW
        4. Should-link... YES
  50. Incorrect edit
      Left record: First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP
      Right record: First: Fernando; Last: Pereira; Institution: Superior Tecnic; Topics: MPEG; Venues: ICIP
      Coref? NO
      Features:
        1. Institution overlap... NO
        2. Venue overlap... NO
        3. Topic overlap... NO
        4. Should-link... YES
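The slides treat a user edit as one more feature ("Should-link") rather than a hard constraint. The sketch below shows how that could behave with a simple weighted sum. The weights, the threshold at zero, and the names `WEIGHTS`, `coref_score`, and `is_coref` are all hypothetical; the talk itself uses a hierarchical CRF with MCMC inference, which this does not reproduce.

```python
# Hypothetical weights per (feature, value); unlisted pairs contribute 0.
WEIGHTS = {
    ("institution_overlap", "NO"): -1.0,
    ("venue_overlap", "NO"): -1.0,
    ("topic_overlap", "NO"): -1.5,
    ("topic_overlap", "LOW"): 0.0,
    ("should_link", "YES"): 2.5,
}


def coref_score(features):
    """Sum the weights of the observed feature values."""
    return sum(WEIGHTS.get(item, 0.0) for item in features.items())


def is_coref(features):
    """Merge only when the combined evidence is positive."""
    return coref_score(features) > 0


# Google/UPenn record vs. U. Edinburgh/SRI record, before any edit.
no_edit = {"institution_overlap": "NO", "venue_overlap": "NO",
           "topic_overlap": "LOW"}
# Same pair after a correct user edit adds should-link evidence.
correct_edit = dict(no_edit, should_link="YES")
# Google/UPenn record vs. the MPEG record, with an incorrect user edit.
incorrect_edit = {"institution_overlap": "NO", "venue_overlap": "NO",
                  "topic_overlap": "NO", "should_link": "YES"}

decisions = (is_coref(no_edit), is_coref(correct_edit), is_coref(incorrect_edit))
```

With these assumed weights the correct edit tips the decision to YES, while the incorrect edit is outweighed by the remaining evidence and the decision stays NO, matching the outcomes shown on the slides.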
  54. Experiments
      1. Build initial KB with automatic coreference
      2. Simulate user edits: good edit (must-link), bad edit (must-link)
      3. Apply edits: our probabilistic approach vs. two deterministic approaches
  55. Hierarchical + Human Edits: better incorporation of correct human edits
      [Plot: database quality versus the number of correct human edits. x-axis: no. of human edits (0 to 30); y-axis: F1 accuracy (0.55 to 0.80). Legend (edit incorporation strategy): Epistemological (probabilistic) reasoning, Overwrite, Maximally satisfy. Curve annotations: “Our probabilistic reasoning”, “Local satisfaction”, “Traditional overwrite”.]
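The plot reports "F1 accuracy" without saying which coreference F1 is meant. As one common choice, pairwise F1 can be sketched as follows; treating the metric as pairwise is an assumption, and `coref_pairs`, `pairwise_f1`, and the mention ids are illustrative names.

```python
from itertools import combinations


def coref_pairs(clusters):
    """All unordered mention pairs that share a cluster."""
    return {frozenset(pair)
            for cluster in clusters
            for pair in combinations(sorted(cluster), 2)}


def pairwise_f1(predicted, gold):
    """Harmonic mean of pairwise precision and recall."""
    pred_pairs, gold_pairs = coref_pairs(predicted), coref_pairs(gold)
    true_positives = len(pred_pairs & gold_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


gold = [{"m1", "m2", "m3"}, {"m4"}]
predicted = [{"m1", "m2"}, {"m3", "m4"}]
score = pairwise_f1(predicted, gold)
```

Here one of two predicted pairs is correct (precision 1/2) and one of three gold pairs is recovered (recall 1/3), so the pairwise F1 is 0.4.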
  56. Hierarchical + Human Edits: more robust to incorrect human edits
      [Plot: database quality versus the number of errorful human edits. x-axis: 0 to 60; y-axis: precision (0.4 to 0.8). Legend (edit incorporation strategy): Epistemological (probabilistic) reasoning, Complete trust in users. Curve annotations: “Our probabilistic”, “Complete trust in humans”.]
  57. Come see our poster!
      • Technical details including
        - Hierarchical CRF for coreference
        - MCMC for inference
      • Probabilistic incorporation of human edits
      • Epistemological Databases
      THANK YOU
