The role of linguistic information for shallow language processing

Invited talk at Knowledge Engineering: Principles and Techniques (KEPT2007), Cluj-Napoca, Romania, June 2007



1. The Role of Linguistic Information for Shallow Language Processing
   Constantin Orasan
   Research Group in Computational Linguistics, University of Wolverhampton
   http://www.wlv.ac.uk/~in6093/
2. We need to be able to process language automatically:
   - To have better access to information
   - To interact better with computers
   - To have texts translated from one language to another
   … so why not replicate the way humans process language?
3. Process language in a similar manner to humans
   - "natural language systems must not simply understand the shallow surface meaning of language, but must also be able to understand the deeper implications and inferences that a user is likely to intend and is likely to take from language" (Waltz, 1982)
   - Also referred to as deep processing
4. Deep vs. shallow linguistic processing
   - Deep processing: tries to build an elaborate representation of the document in order to "understand" it and make inferences
   - Shallow processing: extracts bits of information which could be useful for the task (e.g. shallow surface meaning), but makes no attempt to understand the document
5. Purpose of this talk
   - To show that deep processing has limited applicability
   - To show that it is possible to improve the performance of shallow methods by adding linguistic information
   - Text summarisation is used as the example
6. Structure
   - Introduction
   - FRUMP
   - Shallow processing for automatic summarisation
   - Evaluation
   - Conclusions
7. Automatic summarisation
   - Attempts to produce summaries by automatic means
   - Produces extracts:
     - extract and rearrange
     - uses units from the source verbatim
   - Produces abstracts:
     - understand and generate
     - rewords the information in the source
8. Automatic abstraction
   - Many methods try to replicate the way humans produce summaries
   - Very popular in the 1980s because it fitted the overall AI trend
   - The abstracts are quite good in terms of coherence and cohesion
   - Such systems tend to keep the information in an intermediate format
9. FRUMP
   - The most famous automatic abstracting system
   - Attempts to understand parts of the document
   - Uses 50 sketchy scripts
   - Discards information which is not relevant to the script
   - Words from the source are used to select the relevant script
10. Example of script
   - The ARREST script:
     - The police go to where the suspect is
     - There is optional fighting between the suspect and the police
     - The suspect is apprehended
     - The suspect is taken to a police station
     - The suspect is charged
     - The suspect is incarcerated or released on bond
11. System organisation
   - Relies on:
     - a PREDICTOR, which takes the current context and predicts the next events
     - a SUBSTANTIATOR, which verifies and fleshes out the predictions
   - If the PREDICTOR is wrong, the system backtracks
   - The SUBSTANTIATOR relies on textual information and inferences
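The slides describe FRUMP only at this high level, so the following is a toy sketch rather than FRUMP's actual implementation: a sketchy script as an ordered list of expected events, a PREDICTOR that proposes the next event, and a SUBSTANTIATOR that looks for lexical evidence in the input. All names and the keyword-matching heuristic are invented for illustration; the real system worked over conceptual representations and inferences.

```python
# Toy sketch only: FRUMP's real PREDICTOR/SUBSTANTIATOR used conceptual
# analysis and inference, not the naive keyword matching shown here.
ARREST_SCRIPT = [
    "police arrive at the suspect's location",
    "fight between suspect and police",       # optional event
    "suspect is apprehended",
    "suspect is taken to a police station",
    "suspect is charged",
    "suspect is incarcerated or released on bond",
]

def skim(sentences, script):
    """Keep only the input sentences that substantiate a predicted event."""
    understood = []
    for event in script:                      # PREDICTOR: expect this event next
        keywords = set(event.split())
        for sentence in sentences:            # SUBSTANTIATOR: look for evidence
            if keywords & set(sentence.lower().split()):
                understood.append((event, sentence))
                break                         # prediction confirmed; move on
    return understood                         # everything else is discarded
```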
12. The output
   - Example of summary:
     - "A bomb explosion in a Philippines Airlines jet has killed the person who planted the bomb and injured 3 people."
   - The output can be in several languages
   - It is very coherent and brief
13. Limitations
   - It works very well when it can understand the text, but …
   - Language is ambiguous, so it is common to misunderstand a text (e.g. "Carter and Sadat embraced under a cherry tree in the White House garden, a symbolic gesture belying the differences between the two governments" → MEETING script)
14. Limitations (II)
   - It can handle only scripts which are predefined
   - It can deal only with information which is encoded in the scripts
   - It can make inferences only about concepts it knows
   - … it is domain dependent and cannot be easily adapted to other domains
15. Limitations (III)
   - Sometimes it can misunderstand some scripts, with funny results:
     - Input: "Vatican City. The death of the Pope shakes the world. He passed away …"
     - Summary: "Earthquake in the Vatican. One dead."
16. "natural language systems must not simply understand the shallow surface meaning of language, but must also be able to understand the deeper implications and inferences that a user is likely to intend and is likely to take from language" (Waltz, 1982)
    "… there seems to be no prospect for anything other than narrow-domain natural-language systems for the foreseeable future" (Waltz, 1982)
17. Automatic extraction
   - Uses various shallow methods to determine which sentences are important
   - It is fairly domain independent
   - Extracts units (e.g. sentences, paragraphs) and usually presents them in the order in which they appear
   - The extracts are not very coherent, but they can give the gist of the text
18. Purpose of this research
   - Show how different types of linguistic information can be used to improve the quality of automatic summaries
   - Build automatic summarisers which rely on an increasing number of modules
   - Combine this information
   - Assess each of the summarisers
19. Setting of this research
   - A corpus of 65 scientific articles from JAIR was used
   - Over 600,000 words in total
   - The articles were available in electronic format
   - They contain author-produced summaries
   - Summaries of 2%, 3%, 5%, 6% and 10% of the source length were produced
20. Evaluation metric
   - Cosine similarity between the automatic extract and the human-produced abstract
   - It would be very interesting to repeat the experiments using alternative evaluation metrics, e.g. ROUGE
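As a concrete reference point, here is a minimal sketch of the metric as described: cosine similarity over bag-of-words vectors of the extract and the abstract. The tokenisation and the absence of term weighting are assumptions; the talk does not specify them.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(extract: str, abstract: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two texts."""
    a = Counter(extract.lower().split())
    b = Counter(abstract.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(f * f for f in a.values())) * sqrt(sum(f * f for f in b.values()))
    return dot / norm if norm else 0.0
```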
21. Extracts vs. abstracts
   - Human abstract: "The main operations in Inductive Logic Programming (ILP) are generalization and specialization, which only make sense in a generality order."
   - Extract:
     - S16: Inductive Logic Programming (ILP) is a subfield of Logic Programming and Machine Learning that tries to induce clausal theories from given sets of positive and negative examples.
     - S24: The two main operations in ILP for modification of a theory are generalization and specialization.
     - S26: These operations only make sense within a generality order.
23. Extracts vs. abstracts (II)
   - It is not possible to obtain a 100% match between extracts and abstracts
   - There is therefore an upper limit for extracts
   - This upper limit is represented by the set of sentences which maximises the similarity with the human abstract
24. Determining the upper limit
   - Find the set of sentences which maximises the similarity with the human abstract
   - Two approaches:
     - a greedy algorithm
     - a genetic algorithm
   - More details in Orasan (2005)
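The slides do not give the algorithms, so the following is only a plausible sketch of the greedy variant: repeatedly add the sentence that most increases similarity with the human abstract until the target size is reached. The similarity function is passed in as a parameter (e.g. cosine_similarity from the earlier sketch); Orasan (2005) describes the actual procedure.

```python
def greedy_upper_limit(sentences, abstract, target_size, sim):
    """Greedily pick sentences that maximise sim(extract, abstract).

    sim: a similarity function such as cosine_similarity above.
    """
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < target_size:
        best = max(remaining,
                   key=lambda s: sim(" ".join(chosen + [s]), abstract))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```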
25. The upper limit [chart]
26. Baseline
   - A very simple method which does not employ much knowledge
   - The first and last sentence of each paragraph are selected
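A sketch of this positional baseline, assuming paragraphs are given as lists of sentence strings:

```python
def positional_baseline(paragraphs):
    """Select the first and last sentence of every paragraph."""
    extract = []
    for paragraph in paragraphs:
        extract.append(paragraph[0])
        if len(paragraph) > 1:    # avoid duplicating one-sentence paragraphs
            extract.append(paragraph[-1])
    return extract
```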
27. The upper and lower limit [chart]
28. Term-based summarisation
   - One of the most popular summarisation methods
   - It is rarely used on its own
   - Assumes that the importance of a sentence can be determined on the basis of the importance of the words it contains
   - Various methods can be used to determine the importance of words
29. Term frequency
   - The importance of a word is determined by how frequent it is
   - This is misleading for very frequent function words such as articles and prepositions
   - A stop list can be used to filter out such words
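A minimal sketch of term-frequency scoring with a stop list: each sentence is scored by the summed corpus frequency of its content words. The stop list below is a tiny illustrative stand-in for a real one.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are"}  # illustrative

def term_frequency_scores(sentences):
    """Score each sentence by the summed frequency of its content words."""
    def content_words(sentence):
        return [w for w in sentence.lower().split() if w not in STOP_WORDS]
    freq = Counter(w for s in sentences for w in content_words(s))
    return [sum(freq[w] for w in content_words(s)) for s in sentences]
```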
30. TF*IDF
   - Very popular method in information retrieval and automatic summarisation
   - IDF = inverse document frequency
   - A word which is frequent across a whole collection of documents is unlikely to be important for an individual document, even if it is quite frequent there
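A sketch of one common TF*IDF formulation (the talk does not say which variant was used), assuming the document itself is part of the collection so document frequency is never zero:

```python
from collections import Counter
from math import log

def tf_idf(document, collection):
    """document: list of words; collection: list of word lists (incl. document)."""
    tf = Counter(document)
    n_docs = len(collection)
    doc_sets = [set(d) for d in collection]
    # Weight = raw term frequency times log of inverse document frequency.
    return {w: tf[w] * log(n_docs / sum(w in d for d in doc_sets)) for w in tf}
```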
33. Indicating phrases
   - Indicating phrases are groups of words which can indicate the importance or "un-importance" of a sentence
   - They are usually meta-discourse markers
   - They are genre dependent
   - E.g. in this paper, we present, we conclude that, for example, we believe
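A sketch of how indicating phrases might be turned into sentence scores. The split into positive and negative markers below is an assumption; the slide lists the phrases without classifying them.

```python
POSITIVE_PHRASES = ["in this paper", "we present", "we conclude that", "we believe"]
NEGATIVE_PHRASES = ["for example"]        # assumed to mark "un-importance"

def indicating_phrase_score(sentence: str) -> int:
    """+1 for each positive marker in the sentence, -1 for each negative one."""
    s = sentence.lower()
    return (sum(p in s for p in POSITIVE_PHRASES)
            - sum(p in s for p in NEGATIVE_PHRASES))
```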
35. More accurate word frequencies
   - Concepts can be referred to by pronouns, which means that the words denoting those concepts do not receive accurate frequency scores
   - A pronoun resolution algorithm was employed to determine the antecedents of pronouns and obtain more accurate frequency scores for words
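A sketch of the frequency adjustment, assuming an anaphora resolver has already mapped each pronoun to an antecedent word: every resolved pronoun simply counts as an extra occurrence of its antecedent.

```python
from collections import Counter

def adjusted_frequencies(words, resolved_antecedents):
    """words: all tokens; resolved_antecedents: one antecedent per resolved pronoun."""
    freq = Counter(words)
    freq.update(resolved_antecedents)     # credit the concept, not the pronoun
    return freq
```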
36. Mitkov's Anaphora Resolution System (MARS)
   - Relies on a set of boosting and impeding indicators to choose the antecedent from a set of candidates:
     - Prefer: subjects, terms, closer candidates
     - Penalise: indefinite NPs, distant candidates
   - A third of the pronouns in the corpus were annotated with anaphoric information
   - MARS: 51% success rate
   - More in Mitkov, Evans and Orasan (2002)
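A toy sketch of the boosting/impeding idea. The indicators and weights here are invented for illustration; the published MARS indicators and scores are given in Mitkov, Evans and Orasan (2002).

```python
def mars_style_score(candidate: dict) -> float:
    """Aggregate boosting and impeding indicators for one antecedent candidate."""
    score = 0.0
    if candidate.get("is_subject"):  score += 1.0  # boosting: prefer subjects
    if candidate.get("is_term"):     score += 1.0  # boosting: prefer domain terms
    if candidate.get("indefinite"):  score -= 1.0  # impeding: penalise indefinite NPs
    score -= 0.5 * candidate.get("distance", 0)    # impeding: penalise distance
    return score

def choose_antecedent(candidates):
    """Pick the candidate with the highest aggregate indicator score."""
    return max(candidates, key=mars_style_score)
```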
38. Combination of modules
   - Used a linear combination of the previous modules:
     - term-based summariser enhanced with anaphora resolution
     - indicating phrases
     - positional clues
   - The scores assigned by each module are normalised, and each module receives a weight of 1
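A sketch of this equal-weight linear combination, assuming each module produces one non-negative score per sentence and normalisation is by the module's maximum score (the exact normalisation used is not stated in the slides):

```python
def combine_modules(score_lists):
    """score_lists: one list of per-sentence scores per module."""
    def normalise(scores):
        top = max(scores)
        return [s / top if top else 0.0 for s in scores]
    normalised = [normalise(scores) for scores in score_lists]
    return [sum(column) for column in zip(*normalised)]  # weight 1 per module
```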
40. Discourse information
   - Use a genetic algorithm to produce extracts in which:
     - the score assigned by the "Combined" summariser is high
     - consecutive sentences feature the same entities
   - Loosely implements Centering Theory
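The slide suggests a fitness function balancing content and entity continuity; a hedged sketch follows. The actual fitness function, its weights, and the GA operators are not given in the slides.

```python
def extract_fitness(extract, combined_scores, sentence_entities, alpha=1.0):
    """extract: ordered sentence indices; sentence_entities: one entity set per sentence."""
    # Content term: reward sentences the "Combined" summariser scores highly.
    content = sum(combined_scores[i] for i in extract)
    # Continuity term: reward shared entities between consecutive sentences.
    continuity = sum(len(sentence_entities[i] & sentence_entities[j])
                     for i, j in zip(extract, extract[1:]))
    return content + alpha * continuity   # alpha is an illustrative weight
```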
42. Conclusions
   - It is possible to improve the accuracy of shallow automatic summarisers by using additional linguistic information
   - The linguistic information is relatively simple and easy to obtain
   - … but things are not always the way we expect (see Orasan 2006)
   - The methods are domain independent
44. Thank you
