Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

  1. 1. Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries. Danyun Xu, Gong Cheng, Yuzhong Qu (Nanjing University, China). Presented at ESWC 2014, Crete, Greece.
  2. 2. Coreference resolution
     • TimBL: givenName: “Tim”; surname: “Berners-Lee”; altName: “Tim BL”; type: Scientist; gender: “male”; isDirectorOf: W3C
     • TBL: name: “Tim Berners-Lee”; type: ComputerScientist; type: RoyalSocietyFellow; sex: “Male”; invented: WWW; founded: WSRI
     • Wendy: fullName: “Wendy Hall”; type: ComputerScientist; type: RoyalSocietyFellow; sex: “Female”; birthplace: London; founded: WSRI
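To make the running example concrete, here is a minimal Python sketch (identifiers and data structure are illustrative, not taken from the authors' implementation) that represents each entity description as a list of property-value (PV) pairs; coreference resolution asks whether two such descriptions denote the same real-world entity.

```python
# Entity descriptions from the running example, as lists of property-value
# (PV) pairs. Names and structure are illustrative only.
TimBL = [
    ("givenName", "Tim"),
    ("surname", "Berners-Lee"),
    ("altName", "Tim BL"),
    ("type", "Scientist"),
    ("gender", "male"),
    ("isDirectorOf", "W3C"),
]

TBL = [
    ("name", "Tim Berners-Lee"),
    ("type", "ComputerScientist"),
    ("type", "RoyalSocietyFellow"),
    ("sex", "Male"),
    ("invented", "WWW"),
    ("founded", "WSRI"),
]

# The coreference question: do TimBL and TBL describe the same person?
```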
  3. 3. Methods with humans in the loop (or, coordinating “ings”) • Active learning • Crowdsourcing • Pay-as-you-go
  4. 4. Methods with humans in the loop (or, coordinating “ings”) • Active learning • Crowdsourcing • Pay-as-you-go. [Workflow diagram: candidate coreferent entity pairs (TimBL / Wendy, TimBL / TBL, ChrisB / Bizer, …) are selected, presented to a human, and verified.]
  5. 5. Methods with humans in the loop (or, coordinating “ings”) • Active learning • Crowdsourcing • Pay-as-you-go. [Same workflow; the existing focus is highlighted: selecting which candidate pairs to show.]
  6. 6. Methods with humans in the loop (or, coordinating “ings”) • Active learning • Crowdsourcing • Pay-as-you-go. [Same workflow; our focus is highlighted: how to present the selected pairs for verification.]
  7. 7. Present entire entity descriptions?
  8. 8. Present a compact comparative summary!
     • From TimBL: givenName: “Tim”; surname: “Berners-Lee”; isDirectorOf: W3C
     • From TBL: name: “Tim Berners-Lee”; invented: WWW
  9. 9. Present a compact comparative summary! Which property-value (PV) pairs are more helpful?
  10. 10. Four aspects of a good comparative summary 1. Reflecting commonality 2. Reflecting difference 3. Providing information on identity 4. Providing diverse information
  11. 11. 1. Commonality • Common PV pairs = comparable properties + similar values (entity descriptions of TimBL and TBL as on slide 2)
  12. 12. 1. Commonality • Common PV pairs = comparable properties + similar values • More helpful properties = more like an Inverse Functional Property (IFP) (entity descriptions of TimBL and TBL as on slide 2)
  13. 13. 1. Commonality (details) • Comparability between properties • Learned from known coreferent entities • String similarity Comparable properties = Properties having similar values
  14. 14. 1. Commonality (details) • Comparability between properties • Learned from known coreferent entities • String similarity • Similarity between values • String similarity Comparable properties = Properties having similar values
  15. 15. 1. Commonality (details) • Comparability between properties • Learned from known coreferent entities • String similarity • Similarity between values • String similarity • Likeness to an IFP • Estimated based on the data set: Likeness = (number of distinct values) / (number of all values). Comparable properties = properties having similar values.
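A minimal sketch of the two measurable ingredients of commonality, assuming a generic string-similarity measure and a data set given as (subject, property, value) triples; the helper names, the similarity measure, and the toy data are illustrative, not the authors' code.

```python
from difflib import SequenceMatcher

def string_sim(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1] (a stand-in for whatever
    string measure is actually used)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def ifp_likeness(triples, prop):
    """Likeness to an Inverse Functional Property, estimated from the data set:
    (number of distinct values of prop) / (number of all values of prop)."""
    values = [v for (_s, p, v) in triples if p == prop]
    return len(set(values)) / len(values) if values else 0.0

# Toy data set of (subject, property, value) triples, illustrative only.
triples = [
    ("TimBL", "surname", "Berners-Lee"),
    ("TBL", "name", "Tim Berners-Lee"),
    ("TimBL", "gender", "male"),
    ("TBL", "sex", "Male"),
    ("ChrisB", "sex", "Male"),
    ("Wendy", "sex", "Female"),
]

# Similar values on comparable properties suggest a common PV pair.
print(string_sim("Berners-Lee", "Tim Berners-Lee"))  # fairly high

# 'surname' takes only distinct values here (likeness 1.0), so a shared
# surname is strong evidence; 'sex' repeats values (2 distinct / 3 uses),
# so a shared sex value is weak evidence.
print(ifp_likeness(triples, "surname"), ifp_likeness(triples, "sex"))
```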
  16. 16. 1. Commonality (weakness) • Only reflecting commonality can be misleading.
     • TBL: name: “Tim Berners-Lee”; type: ComputerScientist; type: RoyalSocietyFellow; sex: “Male”; invented: WWW; founded: WSRI
     • Wendy: fullName: “Wendy Hall”; type: ComputerScientist; type: RoyalSocietyFellow; sex: “Female”; birthplace: London; founded: WSRI
  17. 17. 2. Difference • Different PV pairs = comparable properties + dissimilar values (entity descriptions of TBL and Wendy as on slide 16)
  18. 18. 2. Difference • Different PV pairs = comparable properties + dissimilar values • More helpful properties = more like a Functional Property (FP) (entity descriptions of TBL and Wendy as on slide 16)
  19. 19. 2. Difference (details) • Comparability between properties • Learned from known coreferent entities • String similarity • Dissimilarity between values • String similarity • Likeness to a FP • Estimated based on the data set: Likeness = (number of distinct subjects) / (number of all subjects).
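Likeness to a Functional Property can be sketched the same way; again a toy illustration over invented triples, not the authors' implementation.

```python
def fp_likeness(triples, prop):
    """Likeness to a Functional Property, estimated from the data set:
    (number of distinct subjects of prop) / (number of all subjects of prop).
    Close to 1 means each subject has at most one value, so dissimilar values
    on comparable properties are strong evidence against coreference."""
    subjects = [s for (s, p, _v) in triples if p == prop]
    return len(set(subjects)) / len(subjects) if subjects else 0.0

# Toy (subject, property, value) triples, illustrative only.
triples = [
    ("TBL", "type", "ComputerScientist"),
    ("TBL", "type", "RoyalSocietyFellow"),
    ("Wendy", "type", "ComputerScientist"),
    ("TBL", "sex", "Male"),
    ("Wendy", "sex", "Female"),
]
print(fp_likeness(triples, "type"))  # 2 distinct subjects / 3 uses, about 0.67
print(fp_likeness(triples, "sex"))   # 2 / 2 = 1.0: behaves like an FP
```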
  20. 20. 3. Information on identity (entity descriptions of TimBL and TBL as on slide 2)
  21. 21. 3. Information on identity (details) • Information on identity • Estimated based on the data set: information = 1 − log(number of entities having this PV pair) / log(number of all entities).
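Slide 21's formula translates directly into code; the counts in the example calls are invented for illustration.

```python
import math

def identity_information(entities_with_pv: int, all_entities: int) -> float:
    """information = 1 - log(#entities having this PV pair) / log(#all entities).
    PV pairs shared by few entities say more about an entity's identity."""
    return 1.0 - math.log(entities_with_pv) / math.log(all_entities)

# Invented counts, just to show the behaviour of the formula:
print(identity_information(2, 10_000))      # rare PV pair   -> close to 1
print(identity_information(5_000, 10_000))  # common PV pair -> close to 0
```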
  22. 22. 4. Diversity of information • Overlapping PV pairs = similar properties or similar values
     • TimBL: givenName: “Tim”; surname: “Berners-Lee”; altName: “Tim BL”; type: Scientist; gender: “male”; isDirectorOf: W3C (overlapping PV pairs highlighted)
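A rough sketch of an overlap test between two PV pairs, using the same generic string similarity and an arbitrary 0.5 threshold; the slides give neither a concrete measure nor a threshold, so both are assumptions.

```python
from difflib import SequenceMatcher

def sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def overlapping(pv1, pv2, threshold=0.5):
    """Two PV pairs overlap if their properties or their values are similar;
    a diverse summary should avoid selecting both. Threshold is illustrative."""
    (p1, v1), (p2, v2) = pv1, pv2
    return sim(p1, p2) >= threshold or sim(v1, v2) >= threshold

print(overlapping(("givenName", "Tim"), ("altName", "Tim BL")))     # True: "Tim" and "Tim BL" are similar
print(overlapping(("surname", "Berners-Lee"), ("gender", "male")))  # False
```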
  23. 23. To find an optimal summary (or, to find the most helpful PV pairs) • Maximize • Commonality • Difference • Information on identity • Diversity of information • Subject to • A length limit
  24. 24. To find an optimal summary (or, to find the most helpful PV pairs) • Maximize • Commonality • Difference • Information on identity • Diversity of information • Subject to • A length limit • Formulated as a binary quadratic knapsack problem • Solved by GRASP-based local search
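The slides name only the formulation: a binary quadratic knapsack problem solved by GRASP-based local search. The toy Python sketch below shows the shape of that problem, with invented benefit and penalty numbers and a simple greedy-plus-swap search standing in for the actual solver.

```python
import itertools
import random

def total_score(selected, benefit, penalty):
    """Sum of individual benefits minus pairwise redundancy penalties."""
    s = sum(benefit[i] for i in selected)
    s -= sum(penalty.get((min(i, j), max(i, j)), 0.0)
             for i, j in itertools.combinations(selected, 2))
    return s

def select_pv_pairs(candidates, benefit, penalty, limit, iterations=500, seed=0):
    """Greedy construction followed by random swap moves; a simplified
    stand-in for the GRASP-based local search used in the paper."""
    rng = random.Random(seed)
    current = set(sorted(candidates, key=lambda i: benefit[i], reverse=True)[:limit])
    current_score = total_score(current, benefit, penalty)
    others = set(candidates) - current
    for _ in range(iterations):
        if not others:
            break
        out = rng.choice(sorted(current))
        into = rng.choice(sorted(others))
        trial = (current - {out}) | {into}
        trial_score = total_score(trial, benefit, penalty)
        if trial_score > current_score:
            current, current_score = trial, trial_score
            others = set(candidates) - current
    return current, current_score

# Invented benefits (commonality/difference/identity information) and a
# pairwise penalty (overlap) for five candidate PV pairs, with a length
# limit of three pairs in the summary.
benefit = {0: 0.9, 1: 0.8, 2: 0.75, 3: 0.4, 4: 0.3}
penalty = {(0, 1): 0.7}   # e.g. two name-like PV pairs that largely overlap
print(select_pv_pairs(benefit.keys(), benefit, penalty, limit=3))
```

A real GRASP run would repeat randomized greedy constructions followed by local search from multiple starts; the single greedy start here is only meant to make the objective and the length constraint tangible.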
  25. 25. Evaluation method • 4 approaches to be blindly tested • 20 subjects (university students) • 24 random tasks for each subject • 4 approaches * (3 positive cases + 3 negative cases) • Sorted in random order. [Figure: an entity summary, e.g. givenName: “Tim”; surname: “Berners-Lee”; isDirectorOf: W3C; name: “Tim Berners-Lee”; invented: WWW, is presented to a subject, who verifies the pair as Coreferent, Non-coreferent, or Not sure.]
  26. 26. Data sets and tasks • Data sets Places Films
  27. 27. Data sets and tasks • Data sets: Places, Films • Tasks (Places examples): http://dbpedia.org/resource/Paris,_Texas sameAs http://sws.geonames.org/4717560/ (positive case); http://dbpedia.org/resource/Paris sameAs http://sws.geonames.org/2988507/ (positive case)
  28. 28. Data sets and tasks • Data sets: Places, Films • Tasks: the sameAs links above give positive cases; the “Paris” disambiguation page points to both entities, and the cross pairs it suggests (e.g. http://dbpedia.org/resource/Paris,_Texas with http://sws.geonames.org/2988507/) give negative cases.
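As an aside, sameAs links like these can be pulled from DBpedia directly. The sketch below assumes the public DBpedia SPARQL endpoint and the third-party SPARQLWrapper library, neither of which appears in the slides; it only shows how such positive cases could be collected.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Collect owl:sameAs links from DBpedia to GeoNames as candidate positive cases.
# Endpoint and library are assumptions, not part of the slides.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?dbpedia ?geonames WHERE {
        ?dbpedia owl:sameAs ?geonames .
        FILTER(STRSTARTS(STR(?geonames), "http://sws.geonames.org/"))
    } LIMIT 100
""")
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["dbpedia"]["value"], "sameAs", row["geonames"]["value"])
```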
  29. 29. Approaches
     • NOSUMM: present entire entity descriptions
     • GENERIC: information on identity [3] + diversity of information
     • COMPSUMM: commonality + difference + information on identity + diversity of information
     • COMPSUMM-C: commonality + information on identity + diversity of information (COMPSUMM without the difference criterion)
     [3] Gong Cheng et al.: RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization. ISWC 2011.
  30. 30. Results (1) • Accuracy of verification • COMPSUMM ≈ NOSUMM > COMPSUMM-C > GENERIC
  31. 31. Results (2) • Efficiency of verification • COMPSUMM > NOSUMM (2.7–2.9 times faster)
  32. 32. Take-home messages • Provide entity summaries for verifying coreference. • improves efficiency (2.7–2.9 times faster) • without notably affecting accuracy • Provide comparative (but not just generic) summaries. • Show both commonality and difference.
  33. 33. Future work • Present = Summarize + Visualize. [Workflow diagram as before: candidate coreferent entity pairs (TimBL / Wendy, TimBL / TBL, ChrisB / Bizer, …) → Select & Present → Verify; our focus is the Present step.]
  34. 34. Thanks for your attention
  35. 35. Results (3) • Erroneous decisions • COMPSUMM-C > COMPSUMM (mostly in negative cases)
  36. 36. Performance testing • Offline computation • Comparability between properties (the learning part) • Likeness to an IFP/FP • Information on identity
  37. 37. Performance testing • Offline computation • Comparability between properties (the learning part) • Likeness to an IFP/FP • Information on identity • Online computation • Similarity between properties/values • Optimization • Results • Places (DBpedia and GeoNames): 24ms per case • Films (DBpedia and LinkedMDB): 35ms per case
