Using construction grammar in conversational systems
Using Construction Grammar in Conversational Systems Marie-Claire Jenkins, PhD Thesis (High level overview)
Overview This thesis was motivated by the machine's limitations in understanding natural language and in forming responses. The limitations and complexities of current search engine querying were also a factor. Conversational systems are good for testing possible solutions and are useful on the web. We used methods that are not common in these systems: - Construction Grammar (CxG) - OWL ontologies - Lexical semantics - A new stemmer (UEA-Lite)
What I'm going to talk about - Conversational systems: what they are, how they work and what their limitations are - The Turing test and the Loebner prize - 2 early experimental systems that we built - OWL ontologies vs databases - Construction grammar and Fluid construction grammar - UEA-Lite stemmer - Machine learning component - KIA system diagram - Evaluation methods and learnings
Things I covered in my research: - Natural language understanding - Natural language generation - Human computer interaction - Service oriented systems Things I didn't cover in my research: - Knowledge acquisition - Open domains - Affective behaviour - Everything else
Conversational systems They are more commonly referred to as "chatbots" or "Artificial Conversational Entities". They converse with a user in natural language and simulate a human-human conversation. They need to: - "Understand" the user input - Retrieve relevant information - Generate a natural language response There are 3 different kinds of chatbots...
Social chatbots Their purpose is to chat freely about anything at all with a user, much like you would with a friend. They are used online for fun.
Educational chatbots Their purpose is to help the user learn about something such as a new language, history or geography. They are often used in schools.
Service oriented chatbots Their purpose is to help customers find their way around the website and also to answer questions about their products & services.
How they work A variety of methods are used, but the most popular are: - Database driven - AIML (Artificial Intelligence Markup Language, XML-based) - Canned responses - Stochastic methods - Supervised learning - Named entity recognition - Templates
Phrase-based systems "Phrase-based systems" are seen as generalized templates at the sentence level (like phrase structure rules) or at the discourse level. 1 - A phrasal pattern is selected [subject noun verb] 2 - Each part of the pattern is expanded [noun modifiers] 3 - When every phrasal pattern has been replaced by one or more words: END. They are very difficult to build because the phrasal interrelationships must be clearly specified, otherwise there can be inappropriate phrase expansions.
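The three steps can be sketched in a few lines of Python. This is only an illustration of the general technique, not code from any of the systems discussed: the rule names (`SENTENCE`, `SUBJECT`, ...) and the vocabulary are invented for the example.

```python
import random

# Hypothetical phrase-structure rules: each pattern expands to one of several
# sequences of sub-patterns or literal words. Names and words are invented.
RULES = {
    "SENTENCE": [["SUBJECT", "VERB", "OBJECT"]],
    "SUBJECT": [["the", "NOUN"], ["the", "MODIFIER", "NOUN"]],
    "OBJECT": [["a", "NOUN"], ["a", "MODIFIER", "NOUN"]],
    "NOUN": [["customer"], ["product"]],
    "MODIFIER": [["new"], ["returning"]],
    "VERB": [["orders"], ["reviews"]],
}


def expand(symbol, rng):
    """Expand a pattern recursively until only words remain (step 3: END)."""
    if symbol not in RULES:  # a literal word, nothing left to expand
        return [symbol]
    words = []
    for part in rng.choice(RULES[symbol]):  # steps 1 and 2: select, then expand
        words.extend(expand(part, rng))
    return words


print(" ".join(expand("SENTENCE", random.Random(0))))
```

Nothing in these rules prevents an expansion such as "the product orders a customer", which is the kind of inappropriate expansion that appears when the interrelationships between phrases are not specified.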
Feature-based systems In "Feature-based systems" each possible alternative is represented by a feature and each sentence is specified by its features. Sentence generation is achieved by applying the features one after another until the sentence is fully determined. Features may include: positive/negative, past/present, statement/question… Strength: any distinction in language can be a feature. Weakness: it is very hard to maintain feature inter-relationships and to control feature selection.
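As a rough illustration of the idea, the sketch below realises one sentence from the three example features above (polarity, tense, statement/question). The function `realise`, the feature names and the sample vocabulary are assumptions made for this example and do not come from an existing system.

```python
def realise(features):
    """Build an English sentence from a dictionary of features.

    Hypothetical feature set: polarity (positive/negative), tense
    (present/past) and mood (statement/question), plus the content words.
    """
    subj, obj = features["subject"], features["object"]
    verb = features["verb"]  # dict holding the base, present and past forms
    aux = "does" if features["tense"] == "present" else "did"
    negative = features["polarity"] == "negative"
    subj_cap = subj[0].upper() + subj[1:]

    if features["mood"] == "question":
        neg = "n't" if negative else ""
        return f"{aux.capitalize()}{neg} {subj} {verb['base']} {obj}?"
    if negative:
        return f"{subj_cap} {aux} not {verb['base']} {obj}."
    return f"{subj_cap} {verb[features['tense']]} {obj}."


stock = {"base": "stock", "present": "stocks", "past": "stocked"}
print(realise({"subject": "the shop", "object": "umbrellas", "verb": stock,
               "polarity": "negative", "tense": "past", "mood": "question"}))
```

Even with three features the branches already depend on each other (negation changes the verb form, questions move the auxiliary), which hints at why feature inter-relationships become hard to maintain as the feature set grows.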
Observations from live data Tests on dialogue from the human-human customer service system on a large commercial website reveal that there is no consistency in language or phrase formulation. There is a very small amount of Formulaic language (canned responses). A question was never formulated in the same way and never answered in the same way (apart from formulaicity). This makes it hard for us to produce templates or anticipate user utterances.
More Limitations Main issues with existing systems: - Scalability - Knowledge & information storage - User input disambiguation - Response generation (word order, vocabulary, etc...) - Knowledge/information retrieval - Anaphora - Managing the dialogue - Displaying appropriate behaviour (affective issues) - Knowledge assimilation - Evaluation
Turing test "A machine is termed capable of thinking if it can, under certain prescribed conditions imitate a human by answering questions sufficiently well to deceive a human questioner for a reasonable period of time." (Turing) Objections to the test include that it does not prove intelligence or "understanding". My personal opinion has changed since the beginning of my PhD research: "The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." (Dijkstra)
Loebner prize This yearly contest is run by Hugh Loebner, who has offered a $100,000 prize for the 1st chatbot to pass the Turing test. This test is controversial. Marvin Minsky said: "I do hope that someone will volunteer to violate this proscription so that Mr. Loebner will indeed revoke his stupid prize, save himself some money, and spare us the horror of this obnoxious and unproductive annual publicity campaign."
Loebner prize diagram (Michael Mauldin, Carnegie Mellon)
John We built a conversational chatbot and entered it into the Loebner prize (2006). It was designed & built in 2 months and operated on a closed domain. Reason: to run on a small database requiring little manual labour. We used n-grams, weighted responses, a vector approach, Perl, Brill, UEA-Lite, wildcards and AIML. We were a finalist and we learned that: - A small database worked for a small amount of time - A database system makes for laborious build and limited information (well used systems work much better) - Template methods are limited - Canned responses are awkward - AIML is restrictive
KIA: the HCI tests We designed a system to research human-machine interaction and human behaviour: this is a test on humans and not on the system. We included functions meant to test user persistence with query repair, emotive response, language, etc. Results: users persist, are emotive, are sensitive to interface design, and more. Details are available in our paper.
Databases vs OWL ontologies: Databases focus on local semantics and ontologies on global semantics. In ontologies the semantics are explicit and in databases implicit. Ontologies allow data to be reused whereas database schemas cannot be reused. Ontologies are portable between websites, which facilitates maintenance and construction. Restrictions in databases do not allow for all of the necessary relations to be built into the data.
OWL flavour We used OWL (Web Ontology Language) as it is more expressive than other semantic web languages and is built to enable ontologies to be created easily. It is a semantic markup language and an extension of RDF (Resource Description Framework). There are different subsets of OWL: OWL Full, OWL Lite and OWL DL (Description Logic). We chose to use OWL DL.
Why Ontologies & why OWL DL? Taxonomies are not as expansive as ontologies. "At one extreme there are ontologies and the other mind maps and pathfinder networks, and in between taxonomies and browserable hierarchies". (Brewster and Wilks) Ontologies have a greater potential for inference and a greater degree of formality. OWL DL has stricter restrictions, which are necessary in our type of system. It has maximum expressiveness without losing computational completeness (all entailments are guaranteed to be computed) and decidability (all computations will finish in finite time) of reasoning systems.
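To give a feel for what "potential for inference" means, here is a minimal sketch in plain Python. It is not OWL and not the ontology used in the thesis: the class names, the individual `thinkpad_x1` and the helper functions are all invented. It only shows how explicitly stated subclass relations let a system derive facts that were never stored directly.

```python
# Hypothetical subclass axioms (child -> parent) and one asserted individual.
SUBCLASS_OF = {
    "Laptop": "Computer",
    "Computer": "Electronics",
    "Electronics": "Product",
}
INSTANCE_OF = {"thinkpad_x1": "Laptop"}


def ancestors(cls):
    """Return all superclasses of cls, nearest first, by following the links."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_a(individual, cls):
    """True if the individual belongs to cls directly or through inheritance."""
    direct = INSTANCE_OF[individual]
    return cls == direct or cls in ancestors(direct)


# "thinkpad_x1 is a Product" was never asserted; it is inferred.
print(is_a("thinkpad_x1", "Product"))
```

A real OWL DL reasoner handles far richer axioms (property restrictions, disjointness, cardinality) while still guaranteeing that reasoning terminates.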
Construction Grammar It is a cognitive linguistic method and it is: - Constraint based - Generative - Non-derivational - A monostratal grammatical model - Incorporates the cognitive and interactional foundations of language - Consists of taxonomies of families of constructions - Uses entire constructions as the primary unit of grammar - Is a pairing of form and meaning (metonomic) - Frames used in CxG != regular frames because the argument structure types invoke frames which designate event types - The verb alone is not the main unit of meaning, the construction itself is
Constructions Words Sentences Constructions make sense in computing
Example of CxG Semantics: relational predicate involving a singer Syntactics: predicate requires arguments and "Heather" is the subject Generative Grammar Construction Grammar
Advantages of CxG - Adapts to changing language patterns easily - Takes into consideration both semantics and syntactics - Constructions are easier to manage than words as the atomic unit - Allows for integration into bigger collections of constructions - Can be computed
UEA-Lite stemmer After testing the system with all available stemmers, we realised that we needed to design our own to facilitate topic/construction detection. UEA-Lite stems conservatively to orthographically correct word forms and recognizes words which do not need to be stemmed. There are Perl, Java and Ruby versions. More information here (an updated paper to follow soon)
Machine learning - It identifies constructions (NP or VP); the syntactic pole and the semantic pole feed in information so that constructions are loaded with both meaning and form information. - The machine learning engine finds sets of constructions which commonly work in conjunction with each other or that have been used in conjunction in the past. - The weights are adjusted each time a new construction is added. This happens when the system encounters a new instance. - The engine runs through this data and calculates the probability of finding the right matches to the query information.
Algorithms - Jaccard distance to weight the constructions (how often different constructions are found in conjunction, partial or complete) - Naive Bayes algorithm groups all of the constructions into categories according to their different features in our training set (requires little training data) Once the data has been processed through the Naive Bayes algorithm we know which constructions are often found with others, and in what order. We look not only at the syntax but also at the semantics, both in isolation and in conjunction with each other. The role of the classifier is to determine which categories future constructions belong to, and also to tell us which constructions are a likely match to a query.
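The Jaccard distance itself is easy to compute. The sketch below assumes, purely for illustration, that each construction is represented by the set of utterance ids in which it was observed. The construction labels and ids are made up and the thesis system may represent co-occurrence differently.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:  # two empty sets are treated as identical
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical data: construction -> ids of the utterances it occurred in.
occurrences = {
    "NP:delivery_time": {1, 2, 3, 5},
    "VP:how_long_take": {1, 2, 3, 4},
    "NP:refund": {6, 7},
}

# Small distance: the two constructions are usually found together.
print(jaccard_distance(occurrences["NP:delivery_time"],
                       occurrences["VP:how_long_take"]))
# Distance 1.0: the two constructions never co-occur.
print(jaccard_distance(occurrences["NP:delivery_time"],
                       occurrences["NP:refund"]))
```

A distance close to 0 means two constructions almost always appear in conjunction, so it can serve as a weight when deciding which constructions to combine.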
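Naïve Bayes for CxG P(constructions) doesn't change over time and is the same for every category, so it can be left out when comparing categories. Naive Bayes estimates a multinomial distribution over categories, which is the prior distribution of categories, P(cat). We can therefore say that: $\text{best category} = \arg\max_{cat \in cats} P(\text{constructions} \mid cat)\, P(cat)$ If $c_1, c_2, \ldots, c_n$ are the constructions in the document, and they are assumed to be conditionally independent given the category, then: $\text{best category} = \arg\max_{cat \in cats} P(c_1 \mid cat)\, P(c_2 \mid cat) \cdots P(c_n \mid cat)\, P(cat)$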
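The decision rule above can be sketched as a small multinomial Naive Bayes classifier. This is an illustrative sketch rather than the thesis implementation: the training examples and category names are invented, add-one (Laplace) smoothing is an assumption added so that unseen constructions do not zero out a category, and log probabilities are summed instead of multiplying raw probabilities.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (constructions, category) pairs."""
    cat_counts = Counter()
    cons_counts = defaultdict(Counter)
    vocab = set()
    for constructions, cat in examples:
        cat_counts[cat] += 1
        cons_counts[cat].update(constructions)
        vocab.update(constructions)
    return cat_counts, cons_counts, vocab


def best_category(constructions, model):
    """argmax over categories of log P(cat) + sum of log P(c_i | cat)."""
    cat_counts, cons_counts, vocab = model
    total = sum(cat_counts.values())
    scores = {}
    for cat in cat_counts:
        score = math.log(cat_counts[cat] / total)  # log prior, P(cat)
        denom = sum(cons_counts[cat].values()) + len(vocab)
        for c in constructions:
            # Add-one smoothing (an assumption of this sketch).
            score += math.log((cons_counts[cat][c] + 1) / denom)
        scores[cat] = score
    return max(scores, key=scores.get)


# Hypothetical training data for a customer-service domain.
model = train([
    (["NP:delivery_time", "VP:how_long_take"], "delivery"),
    (["NP:parcel", "VP:track"], "delivery"),
    (["NP:refund", "VP:get_back"], "returns"),
    (["NP:refund", "NP:faulty_item"], "returns"),
])
print(best_category(["NP:refund"], model))
```

Because P(constructions) is the same for every category, it never appears in the score: only the prior and the per-construction likelihoods are compared.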
System diagram As you can see, there are many more components to the system than are covered in this presentation.
Evaluation methods There are not any robust evaluation methods for conversational systems but we found that a mixture of the following worked well: - Human evaluation (feedback form) - "POURPRE" to evaluate sentence complexity (Jimmy Lin) - Expected vs given response score Evaluation is not finished as yet but the initial results are encouraging, with good knowledge retrieval and construction selection.
Things that didn't work Using LSI/PLSI to determine the similarity between individual utterances in order to extract useful constructions failed. The reasons: - LSI is an information retrieval method and Q&A systems require a higher level of accuracy. - Information retrieval uses a hammer and every problem is a nail. - Subtler systems require a more delicate approach. - It is very hard to get LSI to scale to sentence level, which is interesting as it has been proven that it doesn't scale. - The fact that it can't capture polysemy is ok because we disambiguate prior to this and append information to constructions.
Fluid Construction Grammar (FCG) (also didn't work!) - Bi-directional (using rules) - Selects meanings and maps them into the real world - "Fluid" because it takes into consideration the fact that users change and update their grammars often - User input can be broken down syntactically in order to gain meaning from the grammatical components, whilst also being able to map the semantic relationships BUT: it is not developed enough to work well in our system. Also: bi-directional rules are very hard to write.
Some Outcomes & Learnings - Construction Grammar is a useful method for NLU & NLG - OWL ontologies are well suited to these systems - Stemming affects the system greatly - Fluid CxG is not practical at this time - Better evaluation methods need to be developed - The Turing test is not useful as it does not prove machine intelligence or understanding - User perception is a key area of research
Applications & Future work - Assisted search - Summarization systems - Content creation - Speech systems - Sentiment analysis - More powerful AI module - Anaphora resolution - Open domain testing - Improved machine learning - Further work on query disambiguation methods
Thank you Find me at: http://www.scienceforseo.com http://twitter.com/missmcj Google reader