• Like

Loading…

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

Natural Language Checking with Program Checking Tools

  • 7,131 views
Uploaded on

Written text is an important component in the process of knowledge acquisition and communication. Poorly written text fails to deliver clear ideas to the reader no matter how revolutionary and …

Written text is an important component in the process of knowledge acquisition and communication. Poorly written text fails to deliver clear ideas to the reader no matter how revolutionary and ground-breaking these ideas are. Providing text with good writing style is essential to transfer ideas smoothly. While we have sophisticated tools to check for stylistic problems in program code, we do not apply the same techniques for written text. In this paper we present TextLint, a rule-based tool to check for common style errors in natural language. TextLint provides a structural model of written text and an extensible rule-based checking mechanism.

More in: Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
7,131
On Slideshare
0
From Embeds
0
Number of Embeds
4

Actions

Shares
Downloads
14
Comments
0
Likes
2

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Natural Language Checking with Program Checking Tools Fabrizio Perin, Lukas Renggli, Jorge Ressia
  • 2. Programming Languages Parser Syntax Compiler Style
  • 3. Programming Languages Parser Syntax Compiler Program Style Checker
  • 4. Programming Natural Languages Languages Parser Syntax Compiler Program Style Checker
  • 5. Programming Natural Languages Languages Spell Checker Parser Syntax Grammar Compiler Checker Program Style Checker
  • 6. Programming Natural Languages Languages Spell Checker Parser Syntax Grammar Compiler Checker Program Style TextLint Checker
  • 7. (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 8. (2) we implement an object-oriented model used to represent natural text in Smalltalk; .txt (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and .html (4) we demonstrate a graphical user interface that presents and explains the .tex problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 9. (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 10. · The Markup models L TEX or HTML commands depending on the filetype A (2) we implement an object-oriented model used to represent natural text in of the input. All document elements answer the message text which returns a plain string representation of the modeled text entity ignoring markup tokens. Furthermore Smalltalk; all elements know their source interval in the document. The relationship among the elements in the model are depicted in Figure 3. Element (3) we demonstrate a pattern matcher for the detection of style issues in text() interval() 1* 1* 1* natural language; and Document Paragraph Sentence Phrase 1 1 * SyntacticElement (4) we demonstrate a graphical user interface that presents and explains the * text() interval() problems detected by the tool. Word Punctuation Whitespace Fig. 3. The TextLint model and the relationships between its classes. Markup 3 From Strings to Objects Text Parsing To build the high-level document model from the flat input string we use Model PetitParser [7]. PetitParser is a framework targeted at parsing formal languages (e.g., programming languages), but we employ it in this project to parse natural Validation Failures 4 Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 11. representation of the modeled text entity ignoring markup tokens. Furthermore all elements know their source interval in the document. The relationship among the elements in the model are depicted in Figure 3. Element text() interval() 1* 1* 1* Document Paragraph Sentence Phrase 1 1 * SyntacticElement * text() interval() raries: For parsing natural languages we use PetitParser [7], a flexible rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. he contributions of this paper are: 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Word Punctuation Whitespace Markup Smalltalk; 3) we demonstrate a pattern matcher for the detection of style issues in natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Fig. 3. The TextLint model and the relationships between its classes. Model Validation Failures Rules Styles GUI
  • 12. representation of the modeled text entity ignoring markup tokens. Furthermore all elements know their source interval in the document. The relationship among the elements in the model are depicted in Figure 3. Element text() interval() 1* 1* 1* Document Paragraph Sentence Phrase 1 1 * SyntacticElement * text() interval() raries: For parsing natural languages we use PetitParser [7], a flexible rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. he contributions of this paper are: 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Word Punctuation Whitespace Markup Smalltalk; 3) we demonstrate a pattern matcher for the detection of style issues in natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Fig. 3. The TextLint model and the relationships between its classes. Model Validation Failures Rules Styles GUI
  • 13. representation of the modeled text entity ignoring markup tokens. Furthermore all elements know their source interval in the document. The relationship among the elements in the model are depicted in Figure 3. Element text() interval() 1* 1* 1* Document Paragraph Sentence Phrase 1 1 * SyntacticElement * text() interval() raries: For parsing natural languages we use PetitParser [7], a flexible rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. he contributions of this paper are: 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Word Punctuation Whitespace Markup Smalltalk; 3) we demonstrate a pattern matcher for the detection of style issues in natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Fig. 3. The TextLint model and the relationships between its classes. Model Validation Failures Rules Styles GUI
  • 14. representation of the modeled text entity ignoring markup tokens. Furthermore all elements know their source interval in the document. The relationship among the Other Language the model are depicted in Figure 3. elements in Models Element text() interval() 1* 1* 1* Document Paragraph Sentence Phrase 1 1 * SyntacticElement * text() interval() raries: For parsing natural languages we use PetitParser [7], a flexible rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. he contributions of this paper are: 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Word Punctuation Whitespace Markup Smalltalk; 3) we demonstrate a pattern matcher for the detection of style issues in natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Fig. 3. The TextLint model and the relationships between its classes. Model Validation Failures Rules Styles GUI
  • 15. (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 16. (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 17. Avoid "a lot" Avoid "irregardless" Avoid "a" Avoid "one of the most" Avoid "allow to" Avoid "regarded as" Avoid "an" Avoid "required to" Avoid "as to whether" Avoid "somehow" Avoid "can not" Avoid "stuff" Avoid "case" Avoid "the fact is" Avoid "certainly" Avoid "the fact that" Avoid "could" Avoid "the truth is" Avoid "currently" Avoid "thing" Avoid "different than" Avoid "thus" Avoid "doubt but" Avoid "true fact" Avoid "each and every one" Avoid "would" Avoid "enormity" Avoid comma Avoid "factor" Avoid connectors repetition Avoid "funny" Avoid continuous punctuation Avoid "help but" Avoid continuous word repetition Avoid "help to" Avoid contraction Avoid "however" Avoid joined sentences Avoid "importantly" raries: For parsing natural languages we use PetitParser [7], a flexible Avoid long paragraph Avoid "in order to" rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour Avoid long sentence Avoid "in regards to" , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. Avoid passive voice Avoid "in terms of" he contributions of this paper are: Avoid qualifier Avoid "insightful" 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Avoid whitespace Smalltalk; Avoid "interesting" 3) we demonstrate a pattern matcher for the detection of style issues in Avoid word repetition natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI
  • 18. Avoid "a lot" Avoid "irregardless" Avoid "a" Avoid "one of the most" Avoid "allow to" Avoid "regarded as" Avoid "an" Avoid "required to" Avoid "as to whether" Avoid "somehow" Avoid "can not" Avoid "stuff" Avoid "case" Avoid "the fact is" Avoid "certainly" Avoid "the fact that" Avoid "could" Avoid "the truth is" Avoid "currently" Avoid "thing" Avoid "different than" (self  word:  ‘somehow’) Avoid "thus" Avoid "doubt but" Avoid "true fact" Avoid "each and every one" Avoid "would" Avoid "enormity" Avoid comma Avoid "factor" Avoid connectors repetition Avoid "funny" Avoid continuous punctuation Avoid "help but" Avoid continuous word repetition Avoid "help to" Avoid contraction Avoid "however" Avoid joined sentences Avoid "importantly" raries: For parsing natural languages we use PetitParser [7], a flexible Avoid long paragraph Avoid "in order to" rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour Avoid long sentence Avoid "in regards to" , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. Avoid passive voice Avoid "in terms of" he contributions of this paper are: Avoid qualifier Avoid "insightful" 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Avoid whitespace Smalltalk; Avoid "interesting" 3) we demonstrate a pattern matcher for the detection of style issues in Avoid word repetition natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI
  • 19. Avoid "a lot" Avoid "irregardless" Avoid "a" Avoid "one of the most" Avoid "allow to" Avoid "regarded as" Avoid "an" Avoid "required to" Avoid "as to whether" Avoid "somehow" Avoid "can not" Avoid "stuff" Avoid "case" Avoid "the fact is" Avoid "certainly" Avoid "the fact that" Avoid "could" Avoid "the truth is" (self  punctuation)  ,  (self  punctuation) Avoid "currently" Avoid "thing" Avoid "different than" Avoid "thus" Avoid "doubt but" Avoid "true fact" Avoid "each and every one" Avoid "would" Avoid "enormity" Avoid comma Avoid "factor" Avoid connectors repetition Avoid "funny" Avoid continuous punctuation Avoid "help but" Avoid continuous word repetition Avoid "help to" Avoid contraction Avoid "however" Avoid joined sentences Avoid "importantly" raries: For parsing natural languages we use PetitParser [7], a flexible Avoid long paragraph Avoid "in order to" rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour Avoid long sentence Avoid "in regards to" , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. Avoid passive voice Avoid "in terms of" he contributions of this paper are: Avoid qualifier Avoid "insightful" 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Avoid whitespace Smalltalk; Avoid "interesting" 3) we demonstrate a pattern matcher for the detection of style issues in Avoid word repetition natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI
  • 20. Avoid "a lot" Avoid "irregardless" Avoid "a" Avoid "one of the most" Avoid "allow to" Avoid "regarded as" Avoid "an" Avoid "required to" Avoid "as to whether" Avoid "somehow" Avoid "can not" Avoid "stuff" Avoid "case" Avoid "the fact is" Avoid "certainly" Avoid "the fact that" (self  wordIn:  #('am'  'are'  'were'  'being'  ...  ))  ,   Avoid "could" Avoid "the truth is" Avoid "currently" Avoid "thing" (self  separator  star)  ,   Avoid "different than" Avoid "thus" ((self  wordSatisfying:  [  :value  |  value  endsWith:  'ed'  ])  /   Avoid "doubt but" Avoid "true fact" Avoid "each and every one" Avoid "would"  (self  wordIn:  #('awoken'  'been'  'born'  'beat'  ...  ))) Avoid "enormity" Avoid comma Avoid "factor" Avoid connectors repetition Avoid "funny" Avoid continuous punctuation Avoid "help but" Avoid continuous word repetition Avoid "help to" Avoid contraction Avoid "however" Avoid joined sentences Avoid "importantly" raries: For parsing natural languages we use PetitParser [7], a flexible Avoid long paragraph Avoid "in order to" rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour Avoid long sentence Avoid "in regards to" , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. Avoid passive voice Avoid "in terms of" he contributions of this paper are: Avoid qualifier Avoid "insightful" 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Avoid whitespace Smalltalk; Avoid "interesting" 3) we demonstrate a pattern matcher for the detection of style issues in Avoid word repetition natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI
  • 21. (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 22. (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. scientificPaperStyle  :=  TLTextLintRule  allRules Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces -­‐  TLWordRepetitionInParagraphRule the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 23. (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 24. (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 25. (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 26. raries: For parsing natural languages we use PetitParser [7], a flexible rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. he contributions of this paper are: 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Smalltalk; 3) we demonstrate a pattern matcher for the detection of style issues in natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI
  • 27. raries: For parsing natural languages we use PetitParser [7], a flexible rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. he contributions of this paper are: 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Smalltalk; 3) we demonstrate a pattern matcher for the detection of style issues in natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI
  • 28. Validation
  • 29. Issues Words t t1 t2 t3 t4 Fig. 6. Evolution of a paper from beginning to publication. 7.1 History of a Paper
  • 30. -74% Avoid ‘currently’ -25% Avoid ‘certainly’ -24% Avoid ‘would’ -20% Avoid ‘factor’ -20% Avoid long paragraph -13% Avoid ‘thus’ -10% Avoid ‘however’ -7% Avoid ‘case’ -5% Avoid ‘can not’ -5% Avoid ‘could’ -4% Avoid passive voice -3% Avoid ‘insightful’ -3% Avoid ‘stuff’ -1% Avoid joined sentences Avoid ‘as to whether’ 0% Avoid ‘different than’ 0% Avoid ‘doubt but’ 0% Avoid ‘each and every one’ 0% Avoid ‘enormity’ 0% Avoid ‘help but’ 0% Avoid ‘in regards to’ 0% Avoid ‘irregardless’ 0% Avoid ‘regarded as’ 0% Avoid ‘the fact is’ 0% Avoid ‘the truth is’ 0% Avoid ‘true fact’ 0% Avoid comma 0% Avoid qualifier 2% Avoid ‘funny’ 5% Avoid ‘one of the most’ 5% Avoid ‘importantly’ 9% Avoid long sentence 10% Avoid ‘an’ 10% Avoid continuous punctuation 15% Avoid ‘interesting’ 17% Avoid ‘required to’ 17% Avoid ‘a’ 23% Avoid ‘in order to’ 23% Avoid continuous word repetition 24% Avoid ‘in terms of’ 24% Avoid ‘somehow’ 25% Avoid ‘help to’ 27% Avoid ‘the fact that’ 32% Avoid whitespace 45% Avoid ‘allow to’ 46% Avoid ‘a lot’ 55% Avoid ‘thing’ 70% Avoid contraction 73% Fig. 7. Effectiveness of various TextLint rules. a more in-depth discussion of tools that comment on writing style could be included.￿
  • 31. Future Work ‣ Natural Language Model ‣ Styles for Other Domains ‣ More Rules
  • 32. textlint.lukas-renggli.ch @textlint