Coreference Resolution

Coreference Resolution presentation by Shumin Wu and Nicolas Nicolov of J.D. Power and Associates

  • 02/21/10 23:38 © 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
  • 100648 mentions; 67038 coref groups. Longest mention: http://images.myride.com/images/non-vehicle/Misc/Shows/2007%20Sema/Jeep/Wrangler/OEM/JeepWranglerUltimatewith392HEMIfromMopar2_ (544x408).jpg
    Counts by type: CookingMethod 8; CameraFeature 9575; Vehicles.SUVs 2347; FoodFeature 348; Person 13505; Vehicles.Cars 10258; Units.Money 986; FoodPart 13; Vehicles.Trucks 467; Location 245; Units 1192; Time.Date 103; CameraPart 3240; Time.Month 171; Time.DaysOfTheWeek 49; Food 801; Marketing 124; Time.Year 1443; CookingTool 3; CarFeature 6119; Time.OClock 21; CarPart 15478; Vehicles 905; GeoPolitical 138; Units.Rate 764; GeoPolitical.City 360; Beverage 82; Time 924; Organization 8206; CameraAccessory 352; Meal 28; Units.Age 57; Facility 1052; Descriptor 13087; GeoPolitical.USStates 127; GeoPolitical.Countries 316; Time.Duration 540; Camera 5197; nutrient 247; GeoPolitical.Nationalities 195
  • Coreference Resolution

    1. 1. Coreference Resolution: A Machine Learning Approach. Shumin Wu (shumin.wu@colorado.edu), Ph.D. Candidate in Computer Science, University of Colorado at Boulder, The Center for Spoken Language Research, 1777 Exposition Drive, Boulder, Colorado 80301, U.S.A. Nicolas Nicolov (nicolas_nicolov@jdpa.com), Senior Director, Science, J.D. Power and Associates, McGraw-Hill, Web Intelligence Division, 4888 Pearl East Circle, Boulder, CO 80301, U.S.A.
    2. 2. <ul><li>What is Coreference? </li></ul><ul><li>Coreference Performance Measures </li></ul><ul><ul><li>MUC6 (Message Understanding Conference) F-measure </li></ul></ul><ul><ul><li>B 3 (Bagga, Baldwin, Biermann) </li></ul></ul><ul><ul><li>CEAF (Constrained Entity-Alignment F-Measure) </li></ul></ul><ul><li>ICWSM JDPA Corpus </li></ul><ul><li>Approaches </li></ul><ul><ul><li>Heuristic </li></ul></ul><ul><ul><li>Machine Learning </li></ul></ul>Outline
    3. 3. Coreference: Audi is an automaker that makes luxury cars and SUVs. The company was born in Germany. It was established by August Horch in 1910. Horch had previously founded another company and his models were quite popular. Audi started with four cylinder models. By 1914, Horch's new cars were racing and winning. August Horch left the Audi company in 1920 to take a position as an industry representative for the German motor vehicle industry federation. Currently Audi is a subsidiary of the Volkswagen group and produces cars of outstanding quality.
    4. 4. <ul><li>Sentiment Analysis (SA) </li></ul><ul><ul><li>Use coreference resolution to find sentiment elements of “Audi” the company vs. German auto industry. </li></ul></ul><ul><li>Search/Question Answering (QA) </li></ul><ul><ul><li>Query for bio of “Jim Martin” the computer scientist vs. “Jim Martin” the politician </li></ul></ul><ul><li>Machine Translation (MT) </li></ul><ul><ul><li>Chinese zero anaphora resolution : </li></ul></ul><ul><ul><li>看了很多相机后, [ 我 ] 买了个松下, 因为 [ 它 ] 镜头好。 </li></ul></ul><ul><ul><li>After looking at many cameras, I bought a Panasonic, because it has a good lens. </li></ul></ul>Coreference Applications
    5. 5. MUC6 F-measure [Diagram: reference and system output entities over mentions a1 a2 a3 a4 b1 b2 c1 c2 c3] Count the number of corresponding links between mentions. Precision = 4/5 Recall = 4/6 F-measure = 2 * Precision * Recall / (Precision + Recall) = 0.727
    6. 6. MUC6 F-measure Degenerate Case: all mentions form individual singleton entities. [Diagram: reference and system output entities over mentions a1 a2 a3 a4 b1 b2 c1 c2 c3] Precision = N/A Recall = 0 F-measure = N/A. Discounts single-mention entities.
    7. 7. MUC6 F-measure Degenerate Case 2: all mentions form one big entity. [Diagram: reference and system output entities over mentions a1 a2 a3 a4 b1 b2 c1 c2 c3] Precision = 6/8 Recall = 1 F-measure = 0.857 (!). Does not adequately penalize dense links.
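The link counts on the three MUC6 slides can be reproduced with a short Python sketch. The entity groupings below are an assumption: the diagrams did not survive in this transcript, so `SYSTEM` is simply one partition that is consistent with the reported precision and recall. The names `muc_side` and `muc` are illustrative, and the counting follows the usual link-based MUC formulation rather than any code from the presentation.

```python
from fractions import Fraction

# Assumed groupings (the slide diagrams are not preserved in the transcript).
REFERENCE = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
SYSTEM = [{"a1", "a2", "a3"}, {"a4", "b1", "b2"}, {"c1"}, {"c2", "c3"}]
ALL = set().union(*REFERENCE)
SINGLETONS = [{m} for m in sorted(ALL)]
ONE_BIG = [ALL]


def muc_side(key, response):
    """Return (common links, total links) for the entities in `key`.

    An entity with n mentions needs n - 1 links. The links it shares with
    `response` are n minus the number of pieces `response` splits it into.
    """
    owner = {m: i for i, ent in enumerate(response) for m in ent}
    common = total = 0
    for ent in key:
        # A mention absent from `response` counts as its own piece.
        pieces = {owner.get(m, ("missing", m)) for m in ent}
        common += len(ent) - len(pieces)
        total += len(ent) - 1
    return common, total


def muc(reference, system):
    """Return (precision, recall, F); None stands for an undefined value."""
    r_num, r_den = muc_side(reference, system)
    p_num, p_den = muc_side(system, reference)
    recall = Fraction(r_num, r_den) if r_den else None
    precision = Fraction(p_num, p_den) if p_den else None
    if precision is None or recall is None or precision + recall == 0:
        return precision, recall, None
    return precision, recall, 2 * precision * recall / (precision + recall)


print(muc(REFERENCE, SYSTEM))      # precision 4/5, recall 2/3, F 8/11 (about 0.727)
print(muc(REFERENCE, SINGLETONS))  # precision undefined, recall 0, F undefined
print(muc(REFERENCE, ONE_BIG))     # precision 3/4, recall 1, F 6/7 (about 0.857)
```

Exact fractions are used so the results can be compared directly with the ratios on the slides; the singleton case returns `None` for precision because the system output contains no links at all.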
    8. 8. Roadmap <ul><li>What is Coreference? </li></ul><ul><li>Coreference Performance Measures </li></ul><ul><ul><li>MUC6 (Message Understanding Conference) F-measure </li></ul></ul><ul><ul><li>B 3 (Bagga, Baldwin, Biermann) </li></ul></ul><ul><ul><li>CEAF (Constrained Entity-Alignment F-Measure) </li></ul></ul><ul><li>ICWSM JDPA Corpus </li></ul><ul><li>Approaches </li></ul><ul><ul><li>Heuristic </li></ul></ul><ul><ul><li>Machine Learning (Robust Risk Minimization) </li></ul></ul>
    9. 9. B³ F-measure [Diagram: reference and system output entities over mentions a1 a2 a3 a4 b1 b2 c1 c2 c3] For each mention, compute the proportion of corresponding mentions between reference and system entity. Precision = 1/9*(3/3+3/3+3/3+1/3+2/3+2/3+1/1+2/2+2/2) = 0.852 Recall = 1/9*(3/4+3/4+3/4+1/4+2/2+2/2+1/3+2/3+2/3) = 0.685 F-measure = 0.759
    10. 10. B³ F-measure Degenerate Case: all mentions form individual singleton entities. [Diagram: reference and system output entities over mentions a1 a2 a3 a4 b1 b2 c1 c2 c3] Precision = 1 Recall = 1/3 F-measure = 0.5
    11. 11. B³ F-measure Degenerate Case 2: all mentions form one big entity. [Diagram: reference and system output entities over mentions a1 a2 a3 a4 b1 b2 c1 c2 c3] Precision = 1/9*(4/9+4/9+4/9+4/9+2/9+2/9+3/9+3/9+3/9) = 0.358 Recall = 1 F-measure = 0.527. Which system entity maps to which reference entity?
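The per-mention averaging behind B³ can be checked with a few lines of Python. As before, the entity groupings are an assumption reconstructed from the fractions on the slides, since the diagrams are missing from the transcript; `b_cubed` is an illustrative name, and the sketch assumes reference and system cover exactly the same mentions.

```python
from fractions import Fraction

# Assumed groupings, chosen to match the per-mention fractions on the slides.
REFERENCE = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
SYSTEM = [{"a1", "a2", "a3"}, {"a4", "b1", "b2"}, {"c1"}, {"c2", "c3"}]
ALL = set().union(*REFERENCE)
SINGLETONS = [{m} for m in sorted(ALL)]
ONE_BIG = [ALL]


def b_cubed(reference, system):
    """Return (precision, recall, F) averaged over all mentions."""
    ref_of = {m: ent for ent in reference for m in ent}
    sys_of = {m: ent for ent in system for m in ent}
    n = len(ref_of)
    # For each mention: overlap of its two entities, divided by the size
    # of the system entity (precision) or the reference entity (recall).
    precision = sum(
        Fraction(len(ref_of[m] & sys_of[m]), len(sys_of[m])) for m in ref_of
    ) / n
    recall = sum(
        Fraction(len(ref_of[m] & sys_of[m]), len(ref_of[m])) for m in ref_of
    ) / n
    return precision, recall, 2 * precision * recall / (precision + recall)


p, r, f = b_cubed(REFERENCE, SYSTEM)
print(float(p), float(r), float(f))   # 23/27, 37/54, 1702/2241 (about 0.852, 0.685, 0.76)
print(b_cubed(REFERENCE, SINGLETONS)) # precision 1, recall 1/3, F 1/2
print(b_cubed(REFERENCE, ONE_BIG))    # precision 29/81, recall 1, F 29/55 (about 0.527)
```

Unlike the link-based measure, every mention contributes a term here, so singleton entities are scored and the one-big-entity output is penalized through its low precision.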
    12. 12. Roadmap <ul><li>What is Coreference? </li></ul><ul><li>Coreference Performance Measures </li></ul><ul><ul><li>MUC6 (Message Understanding Conference) F-measure </li></ul></ul><ul><ul><li>B 3 (Bagga, Baldwin, Biermann) </li></ul></ul><ul><ul><li>CEAF (Constrained Entity-Alignment F-Measure) </li></ul></ul><ul><li>ICWSM JDPA Corpus </li></ul><ul><li>Approaches </li></ul><ul><ul><li>Heuristic </li></ul></ul><ul><ul><li>Machine Learning (Robust Risk Minimization) </li></ul></ul>
    13. 13. CEAF [Diagram: reference and system output entities over mentions a1 a2 a3 a4 b1 b2 c1 c2 c3] Find the one-to-one entity mapping between reference (R) and system (S) that maximizes the similarity measure.
    14. 14. CEAF Degenerate Case: all mentions form individual singleton entities. [Diagram: reference and system output entities over mentions a1 a2 a3 a4 b1 b2 c1 c2 c3]
    15. 15. CEAF Degenerate Case 2: all mentions form one big entity. [Diagram: reference and system output entities over mentions a1 a2 a3 a4 b1 b2 c1 c2 c3]
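The CEAF formulas are not preserved in this transcript, so the following Python sketch fills the gap under stated assumptions: the similarity between two entities is taken to be the number of mentions they share (the mention-based variant of CEAF), the entity groupings are the same reconstructed ones used for the other measures, and the best one-to-one mapping is found by brute force. The name `ceaf_m` is illustrative; practical implementations typically use the Kuhn-Munkres (Hungarian) algorithm instead of enumerating mappings.

```python
from fractions import Fraction
from itertools import permutations

# Assumed groupings (the slide diagrams are not preserved in the transcript).
REFERENCE = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
SYSTEM = [{"a1", "a2", "a3"}, {"a4", "b1", "b2"}, {"c1"}, {"c2", "c3"}]
ALL = set().union(*REFERENCE)
SINGLETONS = [{m} for m in sorted(ALL)]
ONE_BIG = [ALL]


def ceaf_m(reference, system):
    """Return (precision, recall, F) using shared-mention count as similarity."""
    # Map every entity of the smaller side to a distinct entity of the larger.
    if len(reference) <= len(system):
        small, large = reference, system
    else:
        small, large = system, reference
    best = max(
        sum(len(ent & large[j]) for ent, j in zip(small, perm))
        for perm in permutations(range(len(large)), len(small))
    )
    # Normalize by each side's similarity with itself (its mention count).
    precision = Fraction(best, sum(len(ent) for ent in system))
    recall = Fraction(best, sum(len(ent) for ent in reference))
    if best == 0:
        return precision, recall, Fraction(0)
    return precision, recall, 2 * precision * recall / (precision + recall)


print(ceaf_m(REFERENCE, SYSTEM))      # 7/9 for precision, recall and F
print(ceaf_m(REFERENCE, SINGLETONS))  # 1/3 for all three
print(ceaf_m(REFERENCE, ONE_BIG))     # 4/9 for all three
```

These values are computed by the sketch, not taken from the slides. Because each system entity can be credited to at most one reference entity, both degenerate outputs score low.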
    16. 16. Performance Measures Summary <ul><li>MUC6 F-measure </li></ul><ul><ul><li>Ignores single mention entities. </li></ul></ul><ul><ul><li>Potentially biased toward large clusters. </li></ul></ul><ul><ul><li>No one-to-one entity mapping guarantee. </li></ul></ul><ul><li>B 3 </li></ul><ul><ul><li>Set view of mentions in an entity. </li></ul></ul><ul><ul><li>Based on number of corresponding mentions between entities averaged over total number of mentions. </li></ul></ul><ul><ul><li>Does not provide one-to-one entity mapping. </li></ul></ul><ul><li>CEAF </li></ul><ul><ul><li>One-to-one entity mapping. </li></ul></ul><ul><ul><li>Optimal mapping can be tuned to a different similarity measure. </li></ul></ul>
    17. 17. Roadmap <ul><li>What is Coreference? </li></ul><ul><li>Coreference Performance Measures </li></ul><ul><ul><li>MUC6 </li></ul></ul><ul><ul><li>B 3 </li></ul></ul><ul><ul><li>CEAF </li></ul></ul><ul><li>ICWSM JDPA Corpus </li></ul><ul><li>Approaches </li></ul><ul><ul><li>Heuristic </li></ul></ul><ul><ul><li>Machine Learning (Robust Risk Minimization) </li></ul></ul>
    18. 18. ICWSM JDPA Coreference Corpus <ul><li>The JDPA Corpus consists of user-generated content (blog posts) containing opinions about automobiles and digital cameras. They have been manually annotated for named, nominal, and pronominal mentions of entities. Entities are marked with the aggregate sentiment expressed toward them in the document. Mentions of each entity are marked as co-referential. Mentions are assigned semantic types consisting of the Automatic Content Extraction (ACE) mention types and additional domain-specific types. Meronymy (part-of and feature-of) and instance relations are also annotated. Expressions which convey sentiment toward an entity are annotated with the polarity of their prior and contextual sentiments as well as the mentions they target. The following modifiers are annotated. These may target other modifiers or sentiment expressions: </li></ul><ul><ul><li>negators (expressions which invert the polarity of a sentiment expression or modifier) </li></ul></ul><ul><ul><li>neutralizers (expressions that do not commit the speaker to the truth of the target sentiment expression or modifier) </li></ul></ul><ul><ul><li>committers (expressions which shift the commitment of the speaker toward the truth of a sentiment expression or modifier) </li></ul></ul><ul><ul><li>intensifiers (expressions which shift the intensity of a sentiment expression or modifier) </li></ul></ul><ul><li>Additionally, we have annotated when the opinion holder of a sentiment expression is someone other than the author of the blog by linking the expression to the holder. We also annotate when two entities are compared on a particular dimension. </li></ul>
    19. 19. ICWSM JDPA Corpus: Mention Types <ul><li>Person, Organization, Location </li></ul><ul><li>GeoPolitical (Countries, USStates, Nationalities, City) </li></ul><ul><li>Time (Year, Month, Date, Duration, Days, OClock) </li></ul><ul><li>Units (Money, Age) </li></ul><ul><li>Vehicles (Cars, SUVs, Trucks) </li></ul><ul><li>CarPart, CarFeature </li></ul><ul><li>Camera (Part, Feature, Accessory) </li></ul><ul><li>Meal, Food, Beverage, FoodFeature, FoodPart, CookingMethod, Marketing, CookingTool </li></ul><ul><li>Descriptor </li></ul>
    20. 20.
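    21. 21. ICWSM JDPA Corpus Statistics <ul><li>Mentions: 100,648 </li></ul><ul><li>Entities: 67,038 </li></ul>
    [Figure: number of entities with x mentions in them, plotted against the number of mentions x.]
    <table>
    <tr><th>Mention type</th><th>Entities*</th></tr>
    <tr><td>Person</td><td>13,505</td></tr>
    <tr><td>Organization</td><td>8,206</td></tr>
    <tr><td>Location</td><td>245</td></tr>
    <tr><td>City</td><td>360</td></tr>
    <tr><td>US State</td><td>127</td></tr>
    <tr><td>Country</td><td>316</td></tr>
    <tr><td>Nationality</td><td>195</td></tr>
    <tr><td>Facility</td><td>1,052</td></tr>
    <tr><td>Vehicles</td><td>13,977</td></tr>
    <tr><td>CarPart</td><td>15,478</td></tr>
    <tr><td>CarFeature</td><td>6,119</td></tr>
    <tr><td>Camera</td><td>5,197</td></tr>
    <tr><td>CameraFeature</td><td>9,575</td></tr>
    <tr><td>CameraPart</td><td>3,240</td></tr>
    <tr><td>CameraAccessory</td><td>352</td></tr>
    <tr><td>Descriptor</td><td>13,087</td></tr>
    <tr><td>…</td><td>…</td></tr>
    </table>
    *Entity × DocId: “Audi” in doc5 and in doc7 are considered different entities.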
    22. 22. Roadmap <ul><li>What is Coreference? </li></ul><ul><li>Coreference Performance Measures </li></ul><ul><ul><li>MUC6 </li></ul></ul><ul><ul><li>B³ </li></ul></ul><ul><ul><li>CEAF </li></ul></ul><ul><li>ICWSM JDPA Corpus </li></ul><ul><li>Approaches </li></ul><ul><ul><li>Heuristic </li></ul></ul><ul><ul><li>Machine Learning (Robust Risk Minimization) </li></ul></ul>
    23. 23. Coreference Approaches <ul><li>Build mention-pair model </li></ul><ul><ul><li>Select mention-pair features: </li></ul></ul><ul><ul><ul><li>Lexical match </li></ul></ul></ul><ul><ul><ul><li>Distance between mentions </li></ul></ul></ul><ul><ul><ul><li>Syntactic features </li></ul></ul></ul><ul><ul><li>Heuristics </li></ul></ul><ul><ul><li>Machine learning </li></ul></ul><ul><ul><ul><li>Classifiers: MaxEnt, RRM, SVM, etc. </li></ul></ul></ul><ul><li>Cluster compatible entities, mentions </li></ul><ul><ul><li>Greedy clustering: </li></ul></ul><ul><ul><ul><li>Forward, reverse direction, sentence to document </li></ul></ul></ul><ul><ul><li>BellTree </li></ul></ul>
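The greedy clustering step can be illustrated with a minimal Python sketch. It assumes a forward pass in which each mention joins the first earlier entity that contains a compatible mention; the function name `greedy_cluster` and the toy `same_string` predicate are hypothetical stand-ins for the mention-pair model, not the presenters' implementation.

```python
from typing import Callable, List

def greedy_cluster(mentions: List[str],
                   compatible: Callable[[str, str], bool]) -> List[List[int]]:
    """Forward greedy clustering: attach each mention to the first earlier
    entity containing a compatible mention, otherwise start a new entity."""
    entities: List[List[int]] = []  # each entity is a list of mention indices
    for i, mention in enumerate(mentions):
        for entity in entities:
            if any(compatible(mentions[j], mention) for j in entity):
                entity.append(i)
                break
        else:
            # no existing entity accepted the mention
            entities.append([i])
    return entities

# Hypothetical compatibility test: case-insensitive exact string match.
def same_string(a: str, b: str) -> bool:
    return a.lower() == b.lower()

clusters = greedy_cluster(["Audi", "the car", "audi", "The car"], same_string)
print(clusters)  # [[0, 2], [1, 3]]
```

A reverse-direction variant would iterate over the mentions from last to first; the pairwise predicate stays the same.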
    24. 24. Mention Pair Features Considered <ul><li>Lexical </li></ul><ul><ul><li>String match: exact, left or right substring </li></ul></ul><ul><ul><li>Acronym (GM & General Motors) </li></ul></ul><ul><ul><li>Edit distance (Toyota & Toyoda) </li></ul></ul><ul><ul><li>Lemma: words of the entire mention, the head noun, and determiner (if present) </li></ul></ul><ul><ul><li>Capitalization: whole word or only first letter </li></ul></ul><ul><ul><li>Number: whether NP is a number or starts with a number </li></ul></ul><ul><li>Distance-based </li></ul><ul><ul><li>Word distance: number of in-between words </li></ul></ul><ul><ul><li>Sentence distance: number of in-between sentences </li></ul></ul><ul><ul><li>Mention distance: number of in-between mentions </li></ul></ul><ul><li>Syntax-based </li></ul><ul><ul><li>Part-of-speech tag of the mention head </li></ul></ul><ul><li>Pronoun </li></ul><ul><ul><li>Gender: masculine, feminine, neuter </li></ul></ul><ul><ul><li>Number: singular, plural </li></ul></ul><ul><ul><li>Person: first, second, third </li></ul></ul><ul><ul><li>Possessive </li></ul></ul><ul><ul><li>Reflexive </li></ul></ul>
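A few of the lexical and distance features listed above could be computed roughly as follows. This is a sketch only: the function names, the feature keys, and the distance bucket edges (5 and 20 words) are assumptions made for illustration, not values taken from the slides.

```python
from typing import Dict

def is_acronym(short: str, long: str) -> bool:
    """True if `short` equals the initials of the words in `long`."""
    initials = "".join(word[0] for word in long.split())
    return len(short) > 1 and short.upper() == initials.upper()

def pair_features(m1: str, m2: str, words_between: int) -> Dict[str, bool]:
    """Binary features for one mention pair (illustrative subset)."""
    a, b = m1.lower(), m2.lower()
    return {
        "exact_match": a == b,
        "left_substring": a != b and (a.startswith(b) or b.startswith(a)),
        "right_substring": a != b and (a.endswith(b) or b.endswith(a)),
        "acronym": is_acronym(m1, m2) or is_acronym(m2, m1),
        "both_capitalized": m1[:1].isupper() and m2[:1].isupper(),
        # word distance turned into binary buckets (edges are arbitrary)
        "dist_le_5": words_between <= 5,
        "dist_le_20": words_between <= 20,
    }

feats = pair_features("GM", "General Motors", words_between=12)
print(feats["acronym"], feats["exact_match"], feats["dist_le_20"])  # True False True
```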
    25. 25. Roadmap <ul><li>What is Coreference? </li></ul><ul><li>Coreference Performance Measures </li></ul><ul><ul><li>MUC6 </li></ul></ul><ul><ul><li>B³ </li></ul></ul><ul><ul><li>CEAF </li></ul></ul><ul><li>ICWSM JDPA Corpus </li></ul><ul><li>Approaches </li></ul><ul><ul><li>Heuristic </li></ul></ul><ul><ul><li>Machine Learning (Robust Risk Minimization) </li></ul></ul>
    26. 26. Coreference Heuristics <ul><li>Compatible mentions: </li></ul><ul><ul><li>Exact string match of capitalized mentions </li></ul></ul><ul><ul><ul><li>“Audi” & “Audi” </li></ul></ul></ul><ul><ul><li>Exact string match of mentions within a sentence </li></ul></ul><ul><ul><ul><li>“car” & “car” </li></ul></ul></ul><ul><ul><li>Acronyms </li></ul></ul><ul><ul><ul><li>“GM” & “General Motors” </li></ul></ul></ul><ul><ul><li>First person pronoun </li></ul></ul><ul><ul><ul><li>“I” & “me” </li></ul></ul></ul><ul><ul><li>Second person pronoun </li></ul></ul><ul><ul><ul><li>“you” & “yours” </li></ul></ul></ul><ul><ul><li>Third person pronoun with gender, number agreement in the same sentence </li></ul></ul><ul><ul><ul><li>“he” & “him”, “she” & “her” </li></ul></ul></ul><ul><ul><li>Non-numeral mentions with edit distance < 15% of the length of the mention </li></ul></ul><ul><ul><ul><li>“engine” & “eengine”, “Toyota” & “Toyoda” </li></ul></ul></ul><ul><li>Incompatible mentions: </li></ul><ul><ul><li>Different acronyms </li></ul></ul><ul><ul><ul><li>“GM” & “BMW” </li></ul></ul></ul><ul><ul><li>Person, gender, number disagreement </li></ul></ul><ul><ul><ul><li>“I” & “you”, “he” & “she”, “car” & “cars” </li></ul></ul></ul>
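Some of these rules can be written as a small three-valued predicate. The sketch below covers only the pronoun-person, exact-match, and edit-distance rules; the pronoun lists, the function names, and the choice to measure the 15% threshold against the longer mention are all assumptions, since the slide does not say which length is used.

```python
from typing import Optional

FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
SECOND_PERSON = {"you", "your", "yours", "yourself"}

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def person(mention: str) -> Optional[int]:
    m = mention.lower()
    return 1 if m in FIRST_PERSON else 2 if m in SECOND_PERSON else None

def compatible(m1: str, m2: str, same_sentence: bool = False,
               max_ratio: float = 0.15) -> Optional[bool]:
    """True = link, False = never link, None = no rule fired."""
    p1, p2 = person(m1), person(m2)
    if p1 is not None and p2 is not None:
        return p1 == p2                      # "I" & "me" vs. "I" & "you"
    if m1 == m2:
        # capitalized anywhere, lower-case only within one sentence
        return True if (m1[:1].isupper() or same_sentence) else None
    a, b = m1.lower(), m2.lower()
    if not (a[:1].isdigit() or b[:1].isdigit()):
        if edit_distance(a, b) < max_ratio * max(len(a), len(b)):
            return True                      # "engine" & "eengine"
    return None

print(compatible("I", "me"), compatible("I", "you"), compatible("engine", "eengine"))
```

Under this particular reading, “Toyota” & “Toyoda” (one edit over six characters) lands just above the 15% threshold and returns `None`, so the original system presumably normalized the distance differently or applied a rounding rule that the slide does not show.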
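    27. 27. Heuristic System: Results The order of clustering (local, i.e. mentions within a sentence, to global; forward; or reverse direction) did not alter our results.
    <table>
    <tr><th>Configuration</th><th>MUC-F</th><th>MUC-P</th><th>MUC-R</th><th>B³-F</th><th>B³-P</th><th>B³-R</th></tr>
    <tr><td>Unlinked entities</td><td>--</td><td>--</td><td>0</td><td>71.8</td><td>100</td><td>56.0</td></tr>
    <tr><td>Single entity (w/ all mentions)</td><td>61.4</td><td>44.3</td><td>100</td><td>8.6</td><td>4.5</td><td>100</td></tr>
    <tr><td>w/ edit distance</td><td>66.2</td><td>64.0</td><td>68.5</td><td>78.6</td><td>76.8</td><td>80.4</td></tr>
    <tr><td>Edit distance at sentence level</td><td>64.8</td><td>63.7</td><td>66.0</td><td>78.7</td><td>77.8</td><td>79.4</td></tr>
    <tr><td>w/o edit distance</td><td>70.0</td><td>75.8</td><td>64.9</td><td>83.0</td><td>86.8</td><td>79.5</td></tr>
    <tr><td>Edit distance + cardinality match</td><td>70.4</td><td>78.7</td><td>63.7</td><td>83.7</td><td>89.2</td><td>78.8</td></tr>
    </table>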
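The two baseline rows are easier to read with the B³ definition in hand: leaving every mention unlinked gives perfect B³ precision, while putting all mentions in a single entity gives perfect B³ recall. The sketch below uses the usual mention-averaged B³ formulation on a made-up five-mention document; the function name `b_cubed` and the toy key are assumptions, and the numbers it prints are not corpus results.

```python
from typing import Dict, List, Set, Tuple

def b_cubed(key: List[List[str]],
            response: List[List[str]]) -> Tuple[float, float, float]:
    """Mention-averaged B-cubed precision, recall and F1.
    Assumes key and response partition the same set of mentions."""
    key_of: Dict[str, Set[str]] = {m: set(c) for c in key for m in c}
    resp_of: Dict[str, Set[str]] = {m: set(c) for c in response for m in c}
    mentions = list(key_of)
    p = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
    r = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)
    return p, r, 2 * p * r / (p + r)

# Toy gold annotation: two entities with 2 and 3 mentions.
key = [["a1", "a2"], ["b1", "b2", "b3"]]
unlinked = [[m] for cluster in key for m in cluster]   # every mention alone
single = [[m for cluster in key for m in cluster]]     # one big entity

p_u, r_u, f_u = b_cubed(key, unlinked)
p_s, r_s, f_s = b_cubed(key, single)
print(round(p_u, 2), round(r_u, 2))  # 1.0 0.4
print(round(p_s, 2), round(r_s, 2))  # 0.52 1.0
```

The F column of the table follows from the harmonic mean in the same way, e.g. 2 · 100 · 56.0 / (100 + 56.0) ≈ 71.8 for the unlinked baseline.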
    28. 28. Roadmap <ul><li>What is Coreference? </li></ul><ul><li>Coreference Performance Measures </li></ul><ul><ul><li>MUC6 </li></ul></ul><ul><ul><li>B³ </li></ul></ul><ul><ul><li>CEAF </li></ul></ul><ul><li>ICWSM JDPA Corpus </li></ul><ul><li>Approaches </li></ul><ul><ul><li>Heuristic </li></ul></ul><ul><ul><li>Machine Learning (Robust Risk Minimization) </li></ul></ul>
    29. 29. Machine Learning Approach <ul><li>Feature values </li></ul><ul><ul><li>Converted to binary (scalar values are quantized) </li></ul></ul><ul><li>Training sample selection </li></ul><ul><ul><li>Positive samples formed with pairs of consecutive coreferent mentions, negative samples formed using any mentions between consecutive coreferent mentions </li></ul></ul><ul><ul><li>All mention pairs within a small window (a few sentences) </li></ul></ul><ul><li>Robust Risk Minimization (Generalized Winnow) </li></ul><ul><ul><li>Linear classifier </li></ul></ul><ul><ul><li>Multiplicative weight update (quickly discounts irrelevant features after a few iterations) </li></ul></ul><ul><ul><li>Class margin (m) can be converted to probability: </li></ul></ul>
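The first training sample selection scheme can be sketched as follows. The sketch assumes that each negative sample pairs an intervening mention with the later of the two coreferent mentions; the slide does not spell out that pairing, and the function name `select_training_pairs` is hypothetical.

```python
from typing import Dict, List, Tuple

def select_training_pairs(entity_ids: List[int]) -> List[Tuple[int, int, int]]:
    """entity_ids[i] is the gold entity of mention i, in document order.
    Returns (antecedent_index, anaphor_index, label) triples."""
    samples: List[Tuple[int, int, int]] = []
    last_seen: Dict[int, int] = {}  # entity id -> index of its latest mention
    for j, entity in enumerate(entity_ids):
        if entity in last_seen:
            i = last_seen[entity]
            samples.append((i, j, 1))  # consecutive coreferent mentions
            # every mention strictly between them yields a negative sample
            samples.extend((k, j, 0) for k in range(i + 1, j))
        last_seen[entity] = j
    return samples

# Mentions a1, b1, b2, a2 with gold entities a=0 and b=1.
pairs = select_training_pairs([0, 1, 1, 0])
print(pairs)  # [(1, 2, 1), (0, 3, 1), (1, 3, 0), (2, 3, 0)]
```

Compared with taking all pairs in a window, this scheme produces fewer negatives, which is one common way to limit class imbalance in mention-pair training data.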
    30. 30. Robust Risk Minimization (RRM) <ul><li>RRM was proposed by Tong Zhang; best CoNLL’03 chunker. </li></ul><ul><li>Separate positive and negative weights </li></ul><ul><li>Multiplicative weight update </li></ul>
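The two ingredients named on this slide, separate positive and negative weights and a multiplicative update, can be illustrated with a plain Balanced Winnow-style learner. This is explicitly not the RRM algorithm, whose update rule is not reproduced on the slide; the promotion factor `alpha`, the mistake-driven schedule, and the three toy features are assumptions chosen for the illustration.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (binary feature vector, label in {+1, -1})

def score(pos: List[float], neg: List[float], x: Sequence[int]) -> float:
    """Linear score using the difference of positive and negative weights."""
    return sum((p - n) * xi for p, n, xi in zip(pos, neg, x))

def train(data: List[Sample], n_features: int, alpha: float = 2.0,
          epochs: int = 10) -> Tuple[List[float], List[float]]:
    pos = [1.0] * n_features
    neg = [1.0] * n_features
    for _ in range(epochs):
        for x, y in data:
            if y * score(pos, neg, x) <= 0:       # mistake (or zero margin)
                for i, xi in enumerate(x):
                    pos[i] *= alpha ** (y * xi)   # promote or demote w+
                    neg[i] *= alpha ** (-y * xi)  # move w- the opposite way
    return pos, neg

# Hypothetical features: [bias, string_match, gender_mismatch].
data: List[Sample] = [([1, 1, 0], +1), ([1, 0, 1], -1),
                      ([1, 0, 0], -1), ([1, 1, 1], +1)]
pos, neg = train(data, n_features=3)
predictions = [1 if score(pos, neg, x) > 0 else -1 for x, _ in data]
print(predictions)  # [1, -1, -1, 1]
```

Because each update multiplies a weight rather than adding to it, a feature that keeps appearing in mistakes of one sign shrinks or grows geometrically, which is the behaviour the slides describe as quickly discounting irrelevant features.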
    31. 31. Bell Tree Input sequence: a₁, b₁, b₂, a₂ … Lots of states in the search space to explore! <ul><li>After a₁: [a₁] </li></ul><ul><li>After b₁: [a₁,b₁] · [a₁][b₁] </li></ul><ul><li>After b₂: [a₁,b₁,b₂] · [a₁,b₁][b₂] · [a₁,b₂][b₁] · [a₁][b₁,b₂] · [a₁][b₁][b₂] </li></ul><ul><li>After a₂: [a₁,b₁,b₂,a₂] · [a₁,b₁,b₂][a₂] · [a₁,a₂][b₁,b₂] · [a₁][b₁,b₂,a₂] · [a₁][b₁,b₂][a₂] · [a₁,b₁][b₂][a₂] · [a₁,b₁,a₂][b₂] · [a₁,b₁][b₂,a₂] · [a₁,b₂,a₂][b₁] · [a₁,b₂][b₁,a₂] · [a₁,b₂][b₁][a₂] · [a₁,a₂][b₁][b₂] · [a₁][b₁,a₂][b₂] · [a₁][b₁][b₂,a₂] · [a₁][b₁][b₂][a₂] </li></ul>
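The growth of the search space can be reproduced by generating the tree level by level: every node is expanded by linking the next mention to each existing entity or by starting a new entity. The sketch below is a minimal illustration with hypothetical function names; the level sizes it produces are the Bell numbers, which is where the tree gets its name.

```python
from typing import Iterator, List

Partition = List[List[str]]

def expand(partition: Partition, mention: str) -> Iterator[Partition]:
    """Yield every child of a Bell tree node: link `mention` to each
    existing entity in turn, then start a new entity."""
    for t in range(len(partition)):
        yield [ent + [mention] if i == t else ent
               for i, ent in enumerate(partition)]
    yield partition + [[mention]]

def bell_tree_levels(mentions: List[str]) -> List[List[Partition]]:
    """Level k holds all partitions of the first k+1 mentions."""
    levels: List[List[Partition]] = [[[[mentions[0]]]]]
    for mention in mentions[1:]:
        levels.append([child
                       for node in levels[-1]
                       for child in expand(node, mention)])
    return levels

levels = bell_tree_levels(["a1", "b1", "b2", "a2"])
sizes = [len(level) for level in levels]
print(sizes)  # [1, 2, 5, 15]
```

With a fifth mention the last level already has 52 nodes, which is why a practical system has to prune the tree rather than enumerate it.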
    32. 32. Bell Tree Coreference Model <ul><li>Given an input sequence of mentions m₁, m₂, …, mₖ, …, mₙ, and: </li></ul><ul><ul><li>eₜ: entity </li></ul></ul><ul><ul><li>Eₖ: set of partial entities containing mentions m₁…mₖ </li></ul></ul><ul><ul><li>Aₖ: index of entity which the next mention should merge with </li></ul></ul><ul><ul><li>L: binary (1 = link to existing entity, 0 = create entity) </li></ul></ul><ul><li>Define </li></ul><ul><li>link model (link mention to existing entity): </li></ul><ul><li>creation model (start a new entity): </li></ul><ul><li>Entities assumed to be independent </li></ul><ul><li>Link probability derived from the most probable mention pair </li></ul><ul><li>Tunable to encourage or penalize entity creation </li></ul>
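The formulas for the two models did not survive in this transcript. One reading that is consistent with the notes on this slide and with the worked numbers on the "Bell Tree in Action" slide (for example, 0.6 = 1 − 0.4 for starting a new entity with b₁) is sketched below. It is an assumption, not a transcription, and the exact form of the tunable creation factor is not recoverable from the text, so it is omitted here.

```latex
% Link model: score of attaching m_{k+1} to entity e_t, taken from the
% most probable mention pair (entities treated independently).
P(L = 1 \mid E_k, m_{k+1}, A_k = t) \approx \max_{m \in e_t} P(L = 1 \mid m, m_{k+1})

% Creation model: starting a new entity is the complement of the best link.
P(L = 0 \mid E_k, m_{k+1}) \approx 1 - \max_{t} P(L = 1 \mid E_k, m_{k+1}, A_k = t)
```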
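    33. 33. Bell Tree in Action Input sequence: a₁, b₁, b₂, a₂, c₁ … Coreference probability:
    <table>
    <tr><th></th><th>a₁</th><th>b₁</th><th>b₂</th><th>a₂</th><th>c₁</th></tr>
    <tr><th>a₁</th><td>1</td><td></td><td></td><td></td><td></td></tr>
    <tr><th>b₁</th><td>0.4</td><td>1</td><td></td><td></td><td></td></tr>
    <tr><th>b₂</th><td>0.2</td><td>0.9</td><td>1</td><td></td><td></td></tr>
    <tr><th>a₂</th><td>0.8</td><td>0.1</td><td>0.3</td><td>1</td><td></td></tr>
    <tr><th>c₁</th><td>0.4</td><td>0.3</td><td>0.4</td><td>0.2</td><td>1</td></tr>
    </table>
    <ul><li>After a₁: [a₁] p=1 </li></ul>
    <ul><li>After b₁: [a₁,b₁] p=0.4 · [a₁][b₁] p=0.6 </li></ul>
    <ul><li>After b₂: [a₁,b₁,b₂] p=0.36 · [a₁,b₁][b₂] p=0.04 · [a₁,b₂][b₁] p=0.12 · [a₁][b₁,b₂] p=0.54 · [a₁][b₁][b₂] p=0.06 </li></ul>
    <ul><li>After a₂: [a₁,b₁,b₂,a₂] p=0.288 · [a₁,b₁,b₂][a₂] p=0.072 · [a₁,a₂][b₁,b₂] p=0.432 · [a₁][b₁,b₂,a₂] p=0.162 · [a₁][b₁,b₂][a₂] p=0.108 </li></ul>
    <ul><li>After c₁: [a₁,b₁,b₂,a₂,c₁] p=0.1152 · [a₁,b₁,b₂,a₂][c₁] p=0.1728 · [a₁,a₂,c₁][b₁,b₂] p=0.1728 · [a₁,a₂][b₁,b₂,c₁] p=0.1728 · [a₁,a₂][b₁,b₂][c₁] p=0.2592 </li></ul>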
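The p values on this slide can be reproduced with a few lines of Python. The sketch assumes that linking a mention to an entity is scored by the highest pairwise probability within that entity, and that creating a new entity is scored by one minus the best link score. That reading is an assumption, but it matches every number shown above; the names `PAIR`, `link_prob`, and `children` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Partition = List[List[str]]

# Pairwise coreference probabilities copied from the slide's matrix.
PAIR: Dict[FrozenSet[str], float] = {frozenset(k): v for k, v in {
    ("a1", "b1"): 0.4, ("a1", "b2"): 0.2, ("b1", "b2"): 0.9,
    ("a1", "a2"): 0.8, ("b1", "a2"): 0.1, ("b2", "a2"): 0.3,
    ("a1", "c1"): 0.4, ("b1", "c1"): 0.3, ("b2", "c1"): 0.4,
    ("a2", "c1"): 0.2,
}.items()}

def link_prob(entity: List[str], mention: str) -> float:
    """Score of the most probable mention pair between entity and mention."""
    return max(PAIR[frozenset((m, mention))] for m in entity)

def children(partition: Partition, score: float,
             mention: str) -> List[Tuple[Partition, float]]:
    """Scored children: one per existing entity (link), then creation."""
    links = [link_prob(entity, mention) for entity in partition]
    out: List[Tuple[Partition, float]] = []
    for t, p in enumerate(links):
        child = [e + [mention] if i == t else e for i, e in enumerate(partition)]
        out.append((child, score * p))
    out.append((partition + [[mention]], score * (1.0 - max(links))))
    return out

# Follow the path [a1] -> [a1][b1] -> [a1][b1,b2] -> [a1,a2][b1,b2].
node, score = [["a1"]], 1.0
for mention, pick in [("b1", 1), ("b2", 1), ("a2", 0)]:
    node, score = children(node, score, mention)[pick]
final = children(node, score, "c1")
print([round(s, 4) for _, s in final])  # [0.1728, 0.1728, 0.2592]
```

Note that the child scores of a node do not sum to the parent's score under this scheme (0.12 + 0.54 + 0.06 ≠ 0.6), so they act as ranking scores for search rather than as a normalized distribution.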
    34. 34. Discussion <ul><li>How can coreference scoring measures be evaluated? </li></ul><ul><ul><li>Consistency: </li></ul></ul><ul><ul><ul><li>Does a better score correspond to a better human judgment of the output? </li></ul></ul></ul><ul><ul><ul><li>Do all measures agree on which of two sets of output scores higher? </li></ul></ul></ul><ul><ul><li>Application specific: </li></ul></ul><ul><ul><ul><li>Does a better score translate to better application performance (sentiment analysis, machine translation)? </li></ul></ul></ul><ul><li>Techniques for picking mention-pair training samples </li></ul><ul><ul><li>Cluster mention-pairs and pick minority class samples within each cluster. </li></ul></ul>
    35. 35. Future Work <ul><li>Coreference model </li></ul><ul><ul><li>Features </li></ul></ul><ul><ul><ul><li>Entity class type </li></ul></ul></ul><ul><ul><ul><li>Dependency and/or semantic role features for sentence level mentions: parse tree path, predicate, arguments </li></ul></ul></ul><ul><ul><li>Classification </li></ul></ul><ul><ul><ul><li>Training sample selection: select mention pairs with discriminatory features </li></ul></ul></ul><ul><ul><ul><li>Multi-class: classify between mentions with strong compatibility indicators and mentions with weak compatibility indicators </li></ul></ul></ul><ul><ul><ul><li>Algorithms: SVM, Random Forest </li></ul></ul></ul><ul><li>Clustering </li></ul><ul><ul><li>Algorithm: SVM cluster, soft CSP </li></ul></ul><ul><ul><li>Different similarity metrics </li></ul></ul>
    36. 36. Acknowledgements <ul><li>Dr. Xiaoqiang Luo, IBM T.J. Watson Research Center </li></ul><ul><li>Prof. Martha Palmer, Univ. of Colorado </li></ul><ul><li>Prof. James Martin, Univ. of Colorado </li></ul><ul><li>Jason Kessler, J.D. Power and Associates </li></ul><ul><li>Dr. Miriam Eckert, J.D. Power and Associates </li></ul>
    37. 37. References <ul><li>Amit Bagga & Breck Baldwin. 1998. Algorithms for Scoring Coreference Chains. 1st International Conference on Language Resources and Evaluation Workshop on Linguistic Coreference, pp. 563–566. </li></ul><ul><li>Dan Cristea & Oana Postolache. 2005. How to Deal with Wicked Anaphora. Anaphora Processing: Linguistic, Cognitive and Computational Modelling, ed. by A. Branco, T. McEnery & R. Mitkov, pp. 17–46. John Benjamins: Amsterdam & Philadelphia. </li></ul><ul><li>Thomas Finley & Thorsten Joachims. 2005. Supervised Clustering with Support Vector Machines. 22nd International Conference on Machine Learning (ICML’05), pp. 217–224, New York, N.Y., U.S.A. ACM. </li></ul><ul><li>Xiaoqiang Luo, Abe Ittycheriah, Hongyan Jing, Nanda Kambhatla & Salim Roukos. 2004. A Mention-Synchronous Coreference Resolution Algorithm Based on the Bell Tree. 42nd Annual Meeting on Association for Computational Linguistics (ACL’04), page 135, Morristown, N.J., U.S.A. ACL. </li></ul><ul><li>Xiaoqiang Luo. 2005. On Coreference Resolution Performance Metrics. Human Language Technology and Empirical Methods in Natural Language Processing (HLT’05), pp. 25–32, Morristown, N.J., U.S.A. ACL. </li></ul><ul><li>Jason S. Kessler & Nicolas Nicolov. 2009. Targeting Sentiment Expressions through Supervised Ranking of Linguistic Configurations. 3rd International AAAI Conference on Weblogs and Social Media (ICWSM’09), San Jose, California, U.S.A. </li></ul><ul><li>Nicolas Nicolov. 2003. Book review: “Anaphora Resolution” (R. Mitkov). IEEE Computational Intelligence Bulletin, Vol. 2, No. 1, pp. 31–32, June 2003. </li></ul><ul><li>Oana Postolache & Corina Forascu. 2004. A Coreference Model on Excerpts from a Novel. European Summer School in Logic Language and Information (ESSLLI'04), pp. 202–213. Nancy, France. </li></ul><ul><li>Marc Vilain, John Burger, John Aberdeen, Dennis Connolly & Lynette Hirschman. 1995. A Model-Theoretic Coreference Scoring Scheme. 6th Conference on Message Understanding (MUC6 ’95), pp. 45–52, Morristown, N.J., U.S.A. ACL. </li></ul><ul><li>Tong Zhang, Fred Damerau & David Johnson. 2002. Text Chunking Based on a Generalization of Winnow. Journal of Machine Learning Research, 2:615–637. </li></ul>
    38. 38. Thank you!
