Published in: Design, Technology, Education
Novel Approach of Domain Specific Ellipsis Handling in Question Answering Systems

Rahul Chitturi
Language Technology Research Center, IIIT-Hyderabad, INDIA

Abstract

Human conversations often tend to be incomplete. Many a time, we tend to shorten our conversations. The omission from a text of one or more words that are obviously understood, but that must be supplied to make a construction grammatically correct, is called ellipsis [1]. In a conversation, the computer should be in a position to handle ellipsis depending on the context, previous dialogues and knowledge. Given a domain specific question answering system, we deal with how to handle ellipsis in that particular domain. In this paper we classify ellipsis into three types and try to provide a solution for each of the three cases, taking the Railway Domain as an example. The algorithm is evaluated by comparing its results with those of well known Question Answering Systems, which shows that this approach is portable across domain specific systems.

1 Introduction

The development of widespread computer technology has changed many of our daily practices. Unfortunately, even today computers lack the very basic sense of naturalness in communicating with man. The creators of computer technology can lessen the disruptive force of the technology by practicing good design. Well designed computer systems should be useful, usable, easily learned, easily communicative and should perform functions that let people do the things they want to do. It is this fundamental necessity which is ultimately leading computer scientists to overcome this barrier by concentrating on natural means of communication. A great deal of research is now being carried out in Natural Language Processing, Vision, and related areas. The problem which we deal with in this paper is Ellipsis Handling in a Natural Language Dialogue System.

Ellipsis structures pose a crucial problem for Natural Language Processing systems designed to provide text understanding or to handle dialogues. They contain information which is not overtly expressed, but which must be recovered through the identification of an antecedent or previous occurrences.

In a domain specific dialogue system, a machine answers queries specific to that domain. For the dialogue to be as natural as possible, the system should be able to handle incomplete questions. For the machine to understand the query, the complete query corresponding to an incomplete query has to be generated. Let us see an example of ellipsis in the railway reservation domain. The numbered queries are from the actual conversation and their exact meanings are given correspondingly.

Example 1

Query 1: What is the next train to Calcutta?
Answer: Train number 1024.

Query 2: When does it start?
Exact query: When does the train 1024 to Calcutta start?

Query 3: Which platform?
Exact query: To which platform will the train 1024 arrive?

Query 4: When does it arrive in Bangalore?
Exact query: When does that train 1024 arrive in Bangalore?

Query 5: And Delhi?
Exact query: When does that train 1024 arrive in Delhi?

The problem which we deal with in this paper is: given a conversation as in Example 1, the exact (complete) queries should be obtained. Complete queries are the queries for which SQL queries can be generated. This problem is first handled with the syntactic cues from the preceding queries. If these do not give enough of a clue, then semantic cues are used to handle the situation, which is not generally done in QA systems. Present QA systems like AnswerBus [8], Quartz [9] and Pai [10] do not handle this ellipsis, which is quite essential in a natural conversation. Even popular systems like START use only the syntactic information to handle ellipsis [7]. In our paper, we present a semantic approach which handles many of the complex ellipses to make the conversation more natural. This comparison is made in the evaluation section (Section 7).

2 Issues in handling ellipsis in a question answering system

2.1 Identifying the complete queries

Identifying the completeness of a given sentence is the most basic issue in ellipsis handling. In Example 1, the first query is a complete sentence and the rest are incomplete sentences. It is quite complex to identify a complete sentence. Even if the sentence structure is considered, for a given complete structure there can be sentences that are not complete [2].

2.2 Scope of the context

Generally it is unclear how many queries should be kept in the memory, so that if they are referred to, the required knowledge can be appropriately retained. It is difficult to retrieve the desired query from its elliptical notation in the given knowledge base. This is clearly seen in Example 1, where all the information from the first query is indispensable for handling the ellipsis in query 5. So, the problem here is how many previous queries should be kept in the memory and also the way in which they should be stored.

2.3 Entities in the domain

Generally, there is a mapping difficulty between the entities in the Entity Relationship Diagram of a Database Management System and the entities in the domain that is being queried. It is worthwhile to note that the entities in the DBMS are different from the entities that are to be modeled semantically, as in Dialog Systems. This can be well understood from the discussions in the later part of the paper.

The queries in a question answering system can be divided into three types. This classification also depends on the type of domain and the type of queries that are going to be handled. Based on the experience gained from the structure of the queries in the Railway Reservation Domain, the following classification is generalized to all question answering systems.

3 Difference between the ellipsis in discourse and in question answering systems

3.1 Ellipsis in Discourse

The author of reference [6] mentions that there are various ways to describe the different types of ellipsis occurring in English and other languages. Sanders (1977) uses alphabetic characters to identify the six different positions in which ellipsis can occur, ranging from the first position in the first clause (position A) to the last position in the second clause (position F):

ABC&DEF

Although there is disagreement about precisely which positions permit ellipsis in English, most would agree that English allows ellipsis in positions C, D, and E. Example (2) illustrates C-Ellipsis: ellipsis of a constituent at the end of the first clause (marked by brackets) that is identical to a constituent (placed in italics) at the end of the second clause.

Example 2 (C Type): The author wrote [ ] and the copy-editor revised the introduction to the book.

Examples (3) and (4) illustrate D- and E-Ellipsis: ellipsis of, respectively, the first and second parts of the second clause.

Example 3 (D Type): The students completed their course work and [ ] left for summer vacation.

Example 4 (E Type): Sally likes fish, and her mother [ ] hamburgers.

These types predominantly concern intra-sentential ellipses.

3.2 Ellipsis in Question Answering System

As seen in Example 1, the ellipsis in QA systems is very different from the general ellipsis. These are basically inter-sentential ellipses. The case in Example 2 does not come into the picture in QA systems. Also, Examples 3 and 4 differ a lot in structure from Example 1. The author of reference [6] mentions that 86% of the elliptical coordinations are of type D, while C accounts for 2% and E for 5.5%. So, the ellipsis in QA systems cannot be treated in the same way as general ellipsis.

4 Classification of elliptical queries in a question answering system

In this paper, we classify the ellipsis in a question answering system into three types. The first two types have syntactic cues. The third type is based on semantic cues.

4.1 Type 1 (Preposition Based)

Though this seems to be very trivial for ellipsis handling, most of the ellipsis in a domain can be handled by this type. This type of ellipsis is identified by the prepositions in the query or sentence. This is easily understood from the following example:

Example 5
Query 1: Is there any train from X (station) to Y (station)?
Query 2: To S (station)? Or From T (station)?

4.2 Type 2 (Grouping Based)

Let us see the following example:

Example 6
Query 1: At what time will X (Train) arrive?
Query 2: What about Y (Train)?

In this case there are no prepositions. So these can be handled only by identifying the group of the Noun Phrase (NP) to which the new value belongs. One might wonder how this is different from the previous type (refer 4.1). Let us now see the following example:

Example 7
Query 1: When is the train from Bombay to Delhi?
Query 2: To Calcutta?

If we use the grouping based method then we give no importance to the preposition. This results in ambiguity about which entity the new value should replace (Bombay or Delhi?).

4.3 Type 3 (Semantic Based)

All those ellipses which cannot be classified as one of the above two types come under this type. For this type, a semantic diagram can be built from the Entity Relationship diagram of the DBMS of the given domain. This can be easily understood by looking at the diagrams (please refer to Fig. 1 and Fig. 2).

Example 8
Query 1: When will the train X arrive?
Query 2: To which platform?

Every query can be handled by this type. But as this type is related to semantics, it gives only basic semantic relations. The first two types, which are syntactically solvable, are more accurate in giving the exact relationship.

5 Algorithm for Ellipsis handling

In this paper the ellipsis handling problem is divided into four parts. First, the completeness of the queries is identified. Then the entities in the query are mapped to those of the domain. The queries along with their mapped entities are then analyzed. Finally, the analyzed queries are kept in memory so that the ellipsis in subsequent queries can be handled.

5.1 Identifying the complete queries

The syntactic structure could be used with its corresponding semantics to obtain the semantics for the complete sentence. In this case, the anaphoric expression is constrained to have the same semantics as the complete expression [3]. But in our case, since this is a domain specific system, the queries in the domain are limited. Finally, these have to be mapped to the DBMS queries. So, a set of complete queries can be identified which are related to that domain and which can be mapped to DBMS queries. These can be treated as complete queries. All these complete queries are stored in the beginning. As simulating a human conversation is a very complex problem, some laborious work has to be done in the initial stages of the system. This can even be automated using speech recognition systems in the field of our
interest. To enact a human conversation, a lot of data is required for training the system. Using speech recognition systems, the queries in the domain can be obtained. With little human intervention these queries can be checked for completeness. These can then be used as templates for the complete queries.

Example 9
1) Will the {Train_specific} go from {Source} to {Destination}?

Train_specific is a specific train {"Train number 2039", "Delhi Express", etc.}
Source is a station or place {"Delhi", "Mumbai station", etc.}
Destination is a station or place {"Delhi", "Mumbai station", etc.}

[Figure 1: Entity Relationship Diagram for Railway Reservation System. Labels recoverable from the diagram: entities Train, Station, Platform, Passenger, Booking Counter, Concession; relationships Travels, Arrives, Books, Offers, Avails; attributes including Num, Name, Seats, Source, Destination, Time, Days, PNR, Address, Location, Id, Type, Percentage.]

5.2 Matcher

For each entity in the domain, all the possible values for that entity are stored in the semantic graph. So, whenever a noun phrase appears, it is matched against all the possible values of each entity. Thus the noun phrases which are entities in our domain are identified. The entities need not be noun phrases, but in this paper we used only a defined set of noun phrases as the entities. The output of the matcher is given to the ellipsis handler.

5.3 Ellipsis Handler

The following methods have to be employed one after the other in the given order.

5.3.1 Preposition Based Ellipsis

The prepositions which are important in handling ellipsis in the given domain are noted. Whenever one of these prepositions occurs before a semantic entity, the two together can be treated as a separate preposition entity, which is different from both the original entity and the preposition. The most recent value of this preposition entity is kept in the memory. So, if the same type of preposition entity occurs in the next incomplete sentence, the previously entered value is replaced by the present value.

Example 10
Is there any train from X (place/station) to Y (place/station)?
To Delhi? ; {To Station_name} is together treated as the entity "destination". Then "to Y" should be replaced with "to Delhi".

5.3.2 Group Based Ellipsis

The entities which are left after processing the prepositions will fall into some group. For example "Delhi Express", "Train number 4567", etc. refer to a specific train. If an incomplete query comes, then the value for that group in the previous complete query is replaced with the new value.

Example 11
At what time will X (Group: Train_specific) arrive?
What about Y (Group: Train_specific)? This Y is substituted in the previous complete query in the place of X.

5.3.3 Semantic Based Ellipsis

In the semantic graph, some entities have relations with one another. The basic relationships between the possible semantic entities should be kept in the database in the beginning.

Generally, while speaking, more stress is put on the head noun of the sentence. So, the head noun of a complete dialog is identified. Then, whenever an incomplete dialog appears, the relationship between the head noun of the previous complete query and the head noun of the incomplete dialog is identified. If the database contains queries with only those two heads as entities, then they are returned. If no relationship exists between the two entities, then null is returned.

Example 12
When will the train X (Group: Train_specific) arrive?
Train X is the head NP of the query.
To which platform (Group: Platform)?
Platform is the head NP of the query.

Then the relationship between the Train_specific group and the Platform group is identified, and all the queries with only these two semantic entities are returned.

These three types (4.1, 4.2, and 4.3) are not mutually exclusive, but the procedure and the order in which they are applied are very important. As shown in Example 7, if the solution for the second type is applied first, there will be some problems. So, one has to apply the solutions for these types one after the other. As the first two types are more accurate, first apply 4.1, then 4.2. If the queries cannot be handled by these two types, then apply 4.3. This approach handles most of the ellipsis in the domain.

5.4 Scope of the Context

It is very complex to know how many queries should be kept in the memory. It depends on the type of domain. For example, the Interactive NLI agent [4] supports natural language queries and
commands along with a search history, so that users can base their queries on the previous search results.

If the dialogues in the domain are kept in the memory, it becomes very difficult to handle the queries. So, a hash of all the entities is maintained. That is, only entities are stored. At first, the entities are given default values. If some other value occurs, then the most recent value for that entity is stored.

[Figure 2: Semantic Graph. Edges indicate basic relations between the semantic entities, which are in ovals, and their attributes, which are in rectangles. Labels recoverable from the diagram: Pnr (Pnr_number, Passenger name, Address); Train (Train type); Specific Train (Train name, Train number); Booking Counter (Counter id); Platform (Platform number); Concession (Concession Type); Station; Source {To station}; Destination {From station}; relations R1 to R9. An example basic relation between Train and Platform: To which platform will the train arrive?]

In Example 1, the word "train" in query 1 is identified as the entity "Train". Similarly, Calcutta is identified as "Destination" (Destination is an intermediate station at which the train arrives). In query 3, the word "platform" is identified as "Platform". In query 4, Bangalore is identified as "Destination", so Calcutta, the latest stored value for the entity "Destination", is replaced with Bangalore. In query 5, Delhi is identified as "Destination", so Bangalore is replaced with Delhi, which becomes the latest value for the entity "Destination".

[Figure 3: Schematic diagram of the mechanism of ellipsis handling. Components shown: Natural language query; Defaults and possible values of entities; Entity Identifier; Templates; Complete query Identifier; Incomplete query; Memory Storage; Ellipsis Classifier; Semantic Graph; Ellipsis Handler; Ellipsis free query.]

6 Snapshot of the output

The following is the exact dialog with the system:

user: Tell me about the availability status of all train running from secundrabad to Delhi on 03-03-05?
sys: ***Tell me about the availability status of all {train; Train} running {from secundrabad; Source} {to Delhi; Destination} on {03-03-05; Date}?
sys: THIS IS ASSUMED AS A COMPLETE QUERY
------------------------------------------------------------------
user: to calcutta ?
sys: ***{to calcutta;To_Station} ?
sys: DID YOU MEAN? Preposition based ellipsis
sys: Tell me about the availability status of all train running from secundrabad to Calcutta on 03-03-05?
sys: ***Tell me about the availability status of all {train; Train} running {from Secundrabad; Source} {to Calcutta; Destination} on {03-03-05; Date}?
------------------------------------------------------------------
user: What about on 04-03-05?
sys: ***what about on {04-03-05; Date}?
sys: DID YOU MEAN? Grouping based ellipsis
sys: Tell me about the availability status of all train running from secundrabad to Calcutta on 04-03-05?
sys: ***Tell me about the availability status of all {train; Train} running {from secundrabad; Source} {to Calcutta; Destination} on {04-03-05; Date}?
-----------------------------------------------------------------
user: To which platform?
sys: recognized the query as: ***To which {platform; Platform}?
sys: DID YOU MEAN? Semantic based ellipsis
/* Complete queries with {Train; Platform} Train is the head NP of the latest complete query and the Platform is the head NP of the previous query */
sys: to which platform will the train X (the name of the train running from secundrabad to Calcutta on 04-03-05) arrive ?
sys: ***to which {platform; Platform} will the {train; Train} arrive
-----------------------------------------------------------------

7 Evaluation

Many of the present QA systems do not handle ellipsis. So, the results of our algorithm are compared with those of the START system, which does handle ellipsis. However, the START system handles ellipsis considering only the previous query in a dialogue as the context [7]. Generally, in human conversation this is not the case. In our approach, the context is carried across all the incomplete queries until the next complete query occurs.

This approach is tested in three domains: Railway Reservation System, Course Registration System, and Library Maintenance System. For each system we took 100 test cases such that all three types of ellipsis are covered. The test cases are dialogs in that domain which have some inter-sentential ellipses as in Example 1. These are tested with our algorithm and with the START system. Tables 1-3 show the results for all three domains and ellipsis types.

Table 1. Comparison with START system in Railway Reservation Domain

System                 Accuracy for      Accuracy for      Accuracy for
                       Type 1 Ellipsis   Type 2 Ellipsis   Type 3 Ellipsis
Total test cases       40                35                25
Algorithm discussed    100%              97.14%            80%
START system           57.5%             42.85%            0%

Table 2. Comparison with START system in Course Registration Domain

System                 Accuracy for      Accuracy for      Accuracy for
                       Type 1 Ellipsis   Type 2 Ellipsis   Type 3 Ellipsis
Total test cases       35                30                35
Algorithm discussed    100%              93.3%             65.71%
START system           54.2%             40%               0%

Table 3. Comparison with START system in Library Maintenance Domain

System                 Accuracy for      Accuracy for      Accuracy for
                       Type 1 Ellipsis   Type 2 Ellipsis   Type 3 Ellipsis
Total test cases       40                30                30
Algorithm discussed    100%              96.6%             83.3%
START system           52.5%             50%               0%

8 Future work

The endeavor from here onwards would be to identify ellipses involving NPs and verb groups, and also for semantic nets. The idea would be to make the system respond instantaneously to user queries and to use each query as a supervised input to generate a database which relates to similar patterns easily the next time they are keyed in. A speech component identifying prosody effects is also planned.

9 Acknowledgements

I would like to thank Dr. Dipti Misra Sharma and Prof. Rajeev Sangal, LTRC, IIIT-Hyderabad, who helped us a lot in this project with their feedback.

10 References

1.
2. Mary Dalrymple, Stuart M. Shieber, and Fernando Pereira. 1991. Ellipsis and higher-order unification. Linguistics and Philosophy, 14:399-452.
3. Andrew Kehler. Common Topics and Coherent Situations: Interpreting Ellipsis in the Context of Discourse Inference. In Proceedings of the 32nd Annual Conference of the Association for Computational Linguistics (ACL-94), pp. 50-57, Las Cruces, June 1994.
4. Lee Jong-Hyeok, Rho Hyunchul, Park Young-Tack, Choi Joongmin, Seo Jungyu. Interactive NLI Agent For Multi-Agent Web Search Model. Geunbae, Jong-Hyeok (1998).
5. Koeneman, Olaf, Sergio Baauw & Frank Wijnen (1998). Reconstruction in VP-ellipsis: Reflexive vs. non-reflexive predicates. Poster presented at the 11th Annual CUNY Conference on Human Sentence Processing. New Brunswick, NJ, March 19-21, 1998.
6. Charles F. Meyer, University of Massachusetts, Boston. English Corpus Linguistics: An Introduction. Series: Studies in English Language.
7. Boris Katz, MIT CSAIL. Discourse and Dialog in the START Question Answering System. SIGDial 04.
8. Pinto, D., Branstein, M., Coleman, R., King, M., Li, W., Wei, X. and Croft, W.B. QuASM: A System for Question Answering Using Semi-Structured Data. JCDL 2002 Joint Conference on Digital Libraries, pp. 46-55. IR-244, 2002.
9. David Ahn, Valentin Jijkoun, Gilad Mishne, Karin Müller, Maarten de Rijke, and Stefan Schlobach (Informatics Institute, University of Amsterdam). A Recall Oriented Approach to Open Domain Question Answering.
10.
11. Jay Budzik and Kristian J. Hammond. Learning for Question Answering and Text Classification: Integrating Knowledge-Based and Statistical Techniques. AAAI Workshop on Text Classification. Menlo Park, CA, 1998.
12. Sanda Harabagiu, Marius Pasca, and Steven Maiorano. Experiments with open-domain textual question answering. COLING-2000. Association for Computational Linguistics/Morgan Kaufmann, Aug 2000.
13. Sanda Harabagiu, Mihai Surdeanu, Rada Mihalcea, Roxana Girju, Vasile Rus, Finley Lacatusu, Paul Morarescu and Razvan Bunescu. Answering Complex, List and Context Questions with LCC's Question-Answering Server. Tenth Text REtrieval Conference (TREC-10). Gaithersburg, MD. November 13-16, 2001.
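
The ordering described in Section 5.3 (preposition based first, then grouping based, then semantic based) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the system evaluated in the paper: the names ENTITY_VALUES, PREPOSITION_SLOTS, GROUP_SLOTS, BASIC_RELATIONS, group_of and EllipsisHandler are hypothetical, the value lists are toy data, and the matcher is reduced to a plain lookup of known values.

```python
# Hypothetical toy data: possible values per entity group (cf. the matcher, Section 5.2).
ENTITY_VALUES = {
    "station": {"secundrabad", "delhi", "calcutta", "bombay"},
    "train": {"delhi express", "train 1024"},
    "platform": {"platform"},
}
# Preposition entities (Section 5.3.1): (preposition, group) -> template slot.
PREPOSITION_SLOTS = {("from", "station"): "Source", ("to", "station"): "Destination"}
# Groups that identify a slot without any preposition (Section 5.3.2).
GROUP_SLOTS = {"train": "Train_specific"}
# Basic relations between two head entities (Section 5.3.3), stored as query templates.
BASIC_RELATIONS = {
    ("Train_specific", "platform"): "To which platform will {Train_specific} arrive?",
}


def group_of(phrase):
    """Return the group whose stored values contain the phrase, or None."""
    for group, values in ENTITY_VALUES.items():
        if phrase.lower() in values:
            return group
    return None


class EllipsisHandler:
    def __init__(self, template, slots, head_slot):
        self.template = template    # template of the latest complete query
        self.slots = dict(slots)    # most recent value of every entity
        self.head_slot = head_slot  # head NP entity of the latest complete query

    def resolve(self, fragment):
        """Return (ellipsis type, completed query) or (None, None)."""
        words = fragment.rstrip(" ?").split()
        # Type 1: a noted preposition directly followed by a known entity value.
        for i in range(len(words) - 1):
            value = " ".join(words[i + 1:])
            slot = PREPOSITION_SLOTS.get((words[i].lower(), group_of(value)))
            if slot:
                self.slots[slot] = value
                return "preposition", self.template.format(**self.slots)
        # Type 2: longest trailing phrase that belongs to an unambiguous group.
        for i in range(len(words)):
            value = " ".join(words[i:])
            slot = GROUP_SLOTS.get(group_of(value))
            if slot:
                self.slots[slot] = value
                return "grouping", self.template.format(**self.slots)
        # Type 3: basic relation between the previous head entity and a fragment entity.
        for word in words:
            related = BASIC_RELATIONS.get((self.head_slot, group_of(word)))
            if related:
                return "semantic", related.format(**self.slots)
        return None, None


if __name__ == "__main__":
    handler = EllipsisHandler(
        "When will {Train_specific} go from {Source} to {Destination}?",
        {"Train_specific": "train 1024", "Source": "Bombay", "Destination": "Delhi"},
        head_slot="Train_specific",
    )
    for fragment in ("To Calcutta?", "What about Delhi Express?", "To which platform?"):
        print(handler.resolve(fragment))
```

In this sketch, stations are deliberately left out of GROUP_SLOTS: as Example 7 points out, a bare station name could replace either the source or the destination, so only the preposition stage is allowed to place it. The slots dictionary plays the role of the hash of entities from Section 5.4, holding only the most recent value of each entity.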