200801229 final presentation
Presentation on Textual commitment system for natural language understanding

Presentation Transcript

  • 1. Introduction · Structural Features · Our Approach · Results and Analysis · Acknowledgement · References
    Towards Building a Text Commitment System
    Gaurav Arora (200801229)
    Supervisor: Prof. Prasenjit Majumder
    Dhirubhai Ambani Institute of Information and Communication Technology
  • 2. Outline
    1 Introduction: Problem Definition; Natural Language Understanding; Literature Survey and Usage; Approach Overview
    2 Structural Features
    3 Our Approach: Generating a Model for Simple Sentences; Extracting Similar POS Patterns and Sentence Generation
    4 Results and Analysis
  • 3. Problem Definition: Textual Commitment
    Publicly Held Beliefs: A Textual Commitment system simplifies a complex sentence into a set of simple sentences, each expressing a publicly held belief conveyed by the complex sentence.
    Origin: Textual Commitment was proposed by LCC (Language Computer Corporation) as a core component module for Natural Language Understanding.
  • 4. Problem Definition: Textual Commitment Example
    Complex Sentence Text: "The Extra Girl" (1923) is a story of a small-town girl, Sue Graham (played by Mabel Normand), who comes to Hollywood to be in the pictures. This Mabel Normand vehicle, produced by Mack Sennett, followed earlier films about the film industry and also paved the way for later films about Hollywood, such as King Vidor's "Show People" (1928).
    Simplified Sentences:
    T1. "The Extra Girl" is a story of a small-town girl.
    T2. "The Extra Girl" is a story of Sue Graham.
    T3. Sue Graham is a small-town girl.
    T4. Sue Graham [was] played by Mabel Normand.
    T5. Sue Graham comes to Hollywood to be in the pictures.
    T6. A Mabel Normand vehicle was produced by Mack Sennett.
  • 5. Natural Language Understanding: Language Understanding Components
    By machine reading, or understanding text, we mean the formation of a coherent set of beliefs based on a textual corpus and a background theory.
    Textual Entailment systems determine whether one sentence is entailed by another.
    Language understanding features: noisy; limited scope; corpus-wide statistics; minimal reasoning; bottom-up; general; very fast.
  • 6. Literature Survey and Usage: Question Answering to QA4MRE
    Question Answering (QA) systems have an upper bound of about 60% accuracy in system performance.
    Current QA systems place little emphasis on understanding and analyzing text.
    To tackle this 60% upper bound, QA4MRE focuses on understanding a single document, with emphasis on components such as Textual Commitment and Textual Entailment.
  • 7. Literature Survey and Usage: Textual Entailment
    The PASCAL Recognising Textual Entailment (RTE) Challenge has been a reputed evaluation campaign for Textual Entailment research for the past 7 years.
    Researchers use logic provers to detect entailment and overcome the need for background knowledge, with a performance upper bound of 71%.
    LCC's proposed Textual Commitment approach obtained a 9% improvement over this upper bound.
  • 8. Literature Survey and Usage: Textual Entailment Classes
    Figure: Textual Entailment Classes
  • 9. Literature Survey and Usage: Textual Commitment Approach
    LCC Heuristic Approach: LCC's TC system uses a series of extraction heuristics to enumerate a subset of the discourse commitments inferable from either the text or the hypothesis.
    Statistical Approach for Textual Commitment: Since these heuristics are not available, we decided to build a Textual Commitment system using statistical features of language.
  • 10. Approach Overview: Statistical Approach for Textual Commitment
    Learning grammatical structural rules of simple sentences (POS tags).
    Converting complex sentences into structural elements.
    Finding similar rules for generating simple sentences.
    Generating simple sentences in natural language based on the rules.
    Example Part-of-Speech Tagging: They-PRP were-VBD easy-JJ as-IN they-PRP levelled-VBD
    Feature: The key feature for statistical Textual Commitment generation is part-of-speech tagging.
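The tagging step above can be sketched in a few lines. This is a minimal lookup-based sketch, not the tagger the project used (a trained tagger such as NLTK's or Stanford's would be standard); the tiny lexicon exists only to reproduce the slide's example, and unknown words defaulting to NN is an assumption.

```python
# Hypothetical minimal POS tagger: lexicon lookup with an NN fallback.
# A real system would use a trained statistical tagger instead.
LEXICON = {
    "they": "PRP", "were": "VBD", "easy": "JJ",
    "as": "IN", "levelled": "VBD",
}

def pos_tag(sentence):
    """Tag each whitespace token via lexicon lookup; unknowns become NN."""
    return [(w, LEXICON.get(w.lower(), "NN")) for w in sentence.split()]

tagged = pos_tag("They were easy as they levelled")
print(" ".join("%s-%s" % (w, t) for w, t in tagged))
# They-PRP were-VBD easy-JJ as-IN they-PRP levelled-VBD
```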
  • 11. Simple Sentence Distribution
    Figure: Distribution of sentence types in English
  • 12. Comparison of POS Tags
    Figure: Distribution of POS tags in simple sentences
  • 13. Generating Model for Simple Sentences: Module 1 Block Diagram
  • 14. Generating Model for Simple Sentences: Basic Components
    Tri-gram language model generation on POS tags.
    Artificial generation of POS patterns.
    Ranking of artificially generated patterns based on the created language model.
    Example Ranked POS-tag Patterns:
    -53.7293 DT NN VBD VBN
    -54.0778 PRP VBP RB VBN
    -54.2327 NNP NN NNP NNP
    -54.7982 PRP VBP RB JJ
    -55.3234 NNP NNP NN NNP
    Total Generated Rules: 9606406
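The Module 1 idea — train an n-gram model over the POS sequences of simple sentences, then rank candidate patterns by log-probability — can be sketched as below. The project used SRILM for the actual language model; this stand-in uses add-one smoothing and a toy training set, and the vocabulary size of 36 (the Penn Treebank tag count) is an illustrative assumption.

```python
import math
from collections import defaultdict

def train_trigrams(tag_sequences):
    """Count trigrams and their bigram histories over padded POS sequences."""
    tri, bi = defaultdict(int), defaultdict(int)
    for seq in tag_sequences:
        padded = ["<s>", "<s>"] + seq + ["</s>"]
        for i in range(2, len(padded)):
            tri[tuple(padded[i-2:i+1])] += 1
            bi[tuple(padded[i-2:i])] += 1
    return tri, bi

def logprob(pattern, tri, bi, vocab_size=36):
    """Add-one-smoothed trigram log10-probability of a POS pattern."""
    padded = ["<s>", "<s>"] + pattern + ["</s>"]
    score = 0.0
    for i in range(2, len(padded)):
        num = tri[tuple(padded[i-2:i+1])] + 1
        den = bi[tuple(padded[i-2:i])] + vocab_size
        score += math.log10(num / den)
    return score

# Toy training corpus of simple-sentence POS sequences.
train = [["DT", "NN", "VBD", "VBN"], ["PRP", "VBP", "RB", "VBN"],
         ["DT", "NN", "VBD", "JJ"]]
tri, bi = train_trigrams(train)
candidates = [["DT", "NN", "VBD", "VBN"], ["NNP", "NNP", "NN", "NNP"]]
ranked = sorted(candidates, key=lambda p: logprob(p, tri, bi), reverse=True)
print(ranked[0])  # the pattern seen in training ranks first
```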
  • 15. Generating Model for Simple Sentences: Distribution of POS Tags in Simple Sentence Tokens
    Figure: Distribution of POS tags in simple sentence tokens
  • 16. Generating Model for Simple Sentences: Distribution of Rules by LM Score
    Total rules: 9606406
    Number of rules by score threshold:
    score > -100: 679545
    score > -90: 170662
    score > -80: 27280
    score > -70: 2328
    score > -65: 474
    score > -60: 76
    score > -70, word length 5-6: 1594
    score > -83, word length 7-8: 3110
    Total rules considered for matching: 4704
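The filtering implied by the last three lines above — keep only patterns whose LM score clears a length-dependent threshold — can be sketched directly, using the thresholds from the slide (> -70 for 5-6-tag patterns, > -83 for 7-8-tag patterns); the helper name and sample rules are illustrative.

```python
# Hypothetical rule filter using the slide's length-dependent thresholds.
def keep_rule(pattern, score):
    """Keep a POS pattern only if its LM score clears the threshold
    for its length; other lengths are discarded."""
    if 5 <= len(pattern) <= 6:
        return score > -70
    if 7 <= len(pattern) <= 8:
        return score > -83
    return False

rules = [(["DT", "NN", "VBD", "VBN", "IN"], -53.7),   # length 5, passes
         (["NNP"] * 7, -90.0)]                         # length 7, fails
kept = [p for p, s in rules if keep_rule(p, s)]
print(len(kept))  # 1
```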
  • 17. Extracting Similar POS Patterns and Sentence Generation: Modules 2 and 3 Block Diagram
  • 18. Extracting Similar POS Patterns and Sentence Generation: Basic Components
    Extraction of POS tags and chunks from complex sentences.
    Chunks are noun phrases: words that occur together and must also occur together in the simple sentences.
    POS rules from Module 1 are treated as virtual documents.
    Searching for rules/documents similar to the chunks and POS tags of the complex sentence.
    Xapian is used for search; phrase search ensures that chunk tags occur together in the matched rules.
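The retrieval step can be sketched without Xapian: treat each Module 1 rule as a "virtual document" of POS tokens, require every chunk's tag sequence to occur contiguously (the phrase-search constraint), and rank matches by tag overlap. This pure-Python stand-in and its overlap scoring are assumptions; the real system relied on Xapian's own ranking.

```python
# Pure-Python stand-in for the Xapian phrase-search step.
def contains_phrase(rule, chunk):
    """True if the chunk's tag sequence occurs contiguously in the rule."""
    n = len(chunk)
    return any(rule[i:i+n] == chunk for i in range(len(rule) - n + 1))

def match_rules(rules, sentence_tags, chunks):
    """Return (score, rule) pairs for rules containing every chunk as a
    phrase, scored by the fraction of rule tags seen in the sentence."""
    scored = []
    for rule in rules:
        if all(contains_phrase(rule, c) for c in chunks):
            overlap = sum(1 for t in rule if t in sentence_tags)
            scored.append((overlap / len(rule), rule))
    return sorted(scored, reverse=True)

rules = [["NNP", "NNP", "VBD", "VBN", "IN", "DT", "NN"],
         ["NNP", "VBD", "NNP", "NNP"]]          # lacks the VBD VBN phrase
chunks = [["NNP", "NNP"], ["VBD", "VBN"]]
sent_tags = {"NNP", "VBD", "VBN", "IN", "DT", "NN", "WP"}
print(match_rules(rules, sent_tags, chunks)[0][1])
```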
  • 19. Extracting Similar POS Patterns and Sentence Generation: Module I/O
    Example sentence: A Revenue Cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan's White House hostess.
    Example frequency of POS tags and chunks in the complex sentence:
    POS tags: WP=1, VBN=1, IN=3, NNP=8, DT=2, VBD=2, ...
    Chunks: DT NN NN=1, VBD VBN=1, NNP NNP NNP=1, ...
    Example extracted patterns from Xapian:
    91% NNP NNP VBD VBN IN DT NN RB
    86% NNP VBD RB VBN IN DT NN RB
  • 20. Extracting Similar POS Patterns and Sentence Generation: Simple Sentence Generation - Basic Components
    All chunks in the similar POS-tag rules are replaced with their chunk values.
    Additional rules with different chunk values are added if a chunk maps to more than one value.
    After chunk replacement, the remaining POS tags are filled with values.
    This module generates many noisy sentences.
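The substitution step above can be sketched as follows: walk along the matched POS pattern, emit a chunk's surface value whenever its tag sequence lines up, and otherwise fill the single tag from a tag-to-word map. The chunk mapping, fill dictionary, and greedy left-to-right matching are illustrative assumptions.

```python
# Hypothetical chunk-substitution generator for a matched POS pattern.
def generate(pattern, chunk_values, word_for_tag):
    """Replace chunk tag sequences with their values, then fill the
    remaining single tags; greedy left-to-right matching."""
    out, i = [], 0
    while i < len(pattern):
        for tags, value in chunk_values.items():
            n = len(tags)
            if tuple(pattern[i:i+n]) == tags:
                out.append(value)
                i += n
                break
        else:  # no chunk starts here: fill the lone tag
            out.append(word_for_tag.get(pattern[i], ""))
            i += 1
    return " ".join(out)

chunk_values = {("NNP", "NNP"): "Harriet Lane",
                ("VBD", "VBN"): "was named",
                ("DT", "NN"): "the ship"}
word_for_tag = {"IN": "for"}
print(generate(["DT", "NN", "VBD", "VBN", "IN", "NNP", "NNP"],
               chunk_values, word_for_tag))
# the ship was named for Harriet Lane
```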
  • 21. Extracting Similar POS Patterns and Sentence Generation: Simple Sentence Generation - Module I/O
    Chunk-value mapping: NNP NNP-1=White House, NNP NNP-0=Harriet Lane, VBD VBN-0=was named, DT NN-0=the ship, RB IN-0=niece of
    Example: A Revenue Cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan's White House hostess.
    Simple sentences (noisy output):
    Harriet Lane President James Buchanan niece
    Harriet Lane served for hostess
    Harriet Lane was for the ship
    Buchanan's White House the ship hostess
    Harriet Lane was the ship
  • 22. Recall
    The recall of the system is important: its output is the input to Textual Entailment and other Natural Language Understanding modules.
    The recall of our statistical Textual Commitment system is 0.23.
    System recall was calculated on 5 complex sentences.
    The recall value shows positive signs for a more sophisticated statistical Textual Commitment system.
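The metric on this slide reduces to the fraction of gold-standard commitments that the system actually generated; noisy extra output lowers precision but not recall. A one-line sketch, with purely illustrative sets:

```python
# Recall = |generated commitments that match gold| / |gold commitments|.
def recall(generated, gold):
    return len(set(generated) & set(gold)) / len(gold)

gold = {"t1", "t2", "t3", "t4"}            # illustrative gold commitments
generated = {"t1", "noisy-a", "noisy-b"}   # one hit plus noisy output
print(recall(generated, gold))  # 0.25
```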
  • 23. Analysis
    An additional module is required to rank good sentences and remove noisy ones.
    A more sophisticated Natural Language Generation module is needed.
    Generating simple-sentence patterns from complex sentences rather than artificially generating rules.
    Finding a more suitable model: a combination of bi-gram and tri-gram models, or a bi-gram model.
  • 24. Acknowledgement
    I would like to express my sincere thanks to Prof. Prasenjit Majumder for providing the opportunity to work under his esteemed guidance, helping throughout the project, and providing valuable critical suggestions on my work. Additionally, I would like to thank the SRILM and Xapian teams for helping me work with their open-source software.
  • 25. References
    Hickl: A discourse commitment-based framework for recognizing textual entailment.
    Anselmo Peñas et al.: Overview of QA4MRE at CLEF 2011: Question Answering for Machine Reading Evaluation. Working Notes of CLEF (2011).
    L. Bentivogli (FBK-irst) et al.: The Sixth PASCAL Recognizing Textual Entailment Challenge (2010).
    Olly Betts: Xapian, version 1.2.9.
    Asher Stern and Ido Dagan: A Confidence Model for Syntactically-Motivated Entailment Proofs. In Proceedings of RANLP 2011.
    Katrin Kirchhoff et al.: Factored Language Models Tutorial (2008).
    The IEEE website (2002). [Online]. Available: http://www.ieee.org/
    SRILM Language Modelling Toolkit (2010).