Natural Language Checking with Program Checking Tools

1. Natural Language Checking with Program Checking Tools. Fabrizio Perin, Lukas Renggli, Jorge Ressia

8. Syntax: Parser, Compiler (programming languages); Spell Checker, Grammar Checker (natural languages).
Style: Program Checker (programming languages); TextLint (natural languages).

15. (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool.
[Fig. 2. Data Flow through TextLint. Labels: Text, Parsing, Model, Validation, Failures, Rules, Styles, GUI.]
Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules that model the stylistic checks. Section 5 describes how stylistic rules are defined in …
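Judging from the labels of Figure 2, text is parsed into a model, the model is validated against the rules of a style, and the resulting failures are shown in the GUI. The following is a minimal Python sketch of that flow, not TextLint's Smalltalk implementation. The names Token, Failure, parse, validate and avoid_somehow are hypothetical, and a flat token list stands in for the richer document model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-ins for TextLint's Smalltalk classes.
@dataclass(frozen=True)
class Token:
    kind: str                  # "word", "whitespace" or "punctuation"
    text: str
    interval: Tuple[int, int]  # half-open (start, end) offsets into the source

@dataclass(frozen=True)
class Failure:
    rule: str
    interval: Tuple[int, int]

TOKEN_RE = re.compile(
    r"(?P<word>\w+(?:'\w+)?)|(?P<whitespace>\s+)|(?P<punctuation>[^\w\s])")

def parse(text: str) -> List[Token]:
    """Parsing: turn the flat input string into a list of token objects."""
    return [Token(m.lastgroup, m.group(), m.span()) for m in TOKEN_RE.finditer(text)]

Rule = Callable[[List[Token]], List[Failure]]

def validate(model: List[Token], style: List[Rule]) -> List[Failure]:
    """Validation: run every rule of the style and collect failures in text order."""
    failures = [failure for rule in style for failure in rule(model)]
    return sorted(failures, key=lambda failure: failure.interval)

def avoid_somehow(model: List[Token]) -> List[Failure]:
    """An example rule: flag every occurrence of the word 'somehow'."""
    return [Failure('Avoid "somehow"', token.interval)
            for token in model
            if token.kind == "word" and token.text.lower() == "somehow"]

if __name__ == "__main__":
    # A GUI would display these failures; here they are just printed.
    for failure in validate(parse("Somehow it works."), [avoid_somehow]):
        print(failure.rule, failure.interval)
```

Keeping the rules as plain functions over the model means a style is simply the list of rules passed to validate.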

16. Input formats: .txt, .html, .tex
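Because the input may be plain text, HTML or LaTeX (.txt, .html, .tex), the plain-text view of a document must skip markup in a way that depends on the file type. The Python sketch below illustrates the idea under stated assumptions. The two regular expressions are crude guesses at "LaTeX command" and "HTML tag", not TextLint's markup grammar, and MARKUP_PATTERNS, markup_spans and plain_text are hypothetical names.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Hypothetical, deliberately simple markup patterns keyed by file extension.
MARKUP_PATTERNS = {
    ".tex": re.compile(r"\\[A-Za-z]+\*?|[{}]"),  # LaTeX commands and braces
    ".html": re.compile(r"<[^>]*>"),             # HTML tags
    ".txt": None,                                # plain text has no markup
}

def markup_spans(filename: str, source: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of markup tokens, chosen by file type."""
    pattern = MARKUP_PATTERNS.get(Path(filename).suffix.lower())
    if pattern is None:
        return []
    return [m.span() for m in pattern.finditer(source)]

def plain_text(filename: str, source: str) -> str:
    """Drop markup tokens and keep everything else in source order."""
    pieces, last = [], 0
    for start, end in markup_spans(filename, source):
        pieces.append(source[last:start])
        last = end
    pieces.append(source[last:])
    return "".join(pieces)
```

For example, plain_text("paper.tex", "\\emph{very} nice") yields "very nice". The same string in a .txt file is returned unchanged.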

18. Markup models LaTeX or HTML commands, depending on the file type of the input. All document elements answer the message text, which returns a plain string representation of the modeled text entity, ignoring markup tokens. Furthermore, all elements know their source interval in the document. The relationships among the elements in the model are depicted in Figure 3.
[Fig. 3. The TextLint model and the relationships between its classes. Element (text(), interval()): Document, Paragraph, Sentence, Phrase. SyntacticElement (text(), interval()): Word, Punctuation, Whitespace, Markup. Associations are one-to-many (1 to *).]
3 From Strings to Objects
To build the high-level document model from the flat input string we use PetitParser [7]. PetitParser is a framework targeted at parsing formal languages (e.g., programming languages), but we employ it in this project to parse natural …
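The element model of Figure 3 can be mirrored with a handful of classes. This Python sketch is only an illustration. The class names follow the figure, but three details are our own assumptions: offsets are stored as start/stop, Markup children are skipped when composing text(), and interval() requires at least one child.

```python
from dataclasses import dataclass, field
from typing import Tuple

@dataclass
class SyntacticElement:
    """Leaf of the model: a slice of the source string."""
    source: str
    start: int
    stop: int  # exclusive

    def interval(self) -> Tuple[int, int]:
        return (self.start, self.stop)

    def text(self) -> str:
        return self.source[self.start:self.stop]

class Word(SyntacticElement): pass
class Punctuation(SyntacticElement): pass
class Whitespace(SyntacticElement): pass
class Markup(SyntacticElement): pass

@dataclass
class Element:
    """Composite node: Document > Paragraph > Sentence > Phrase > syntactic elements."""
    children: list = field(default_factory=list)

    def text(self) -> str:
        # Plain-text view: markup tokens are skipped.
        return "".join(child.text() for child in self.children
                       if not isinstance(child, Markup))

    def interval(self) -> Tuple[int, int]:
        # Assumes at least one child; spans from the first child to the last.
        return (self.children[0].interval()[0], self.children[-1].interval()[1])

class Phrase(Element): pass
class Sentence(Element): pass
class Paragraph(Element): pass
class Document(Element): pass

if __name__ == "__main__":
    src = "Hi <b>you</b>."
    phrase = Phrase([Word(src, 0, 2), Whitespace(src, 2, 3), Markup(src, 3, 6),
                     Word(src, 6, 9), Markup(src, 9, 13), Punctuation(src, 13, 14)])
    doc = Document([Paragraph([Sentence([phrase])])])
    print(doc.text(), doc.interval())  # Hi you. (0, 14)
```

Because every node answers text() and interval(), a rule can report a problem at any level (word, sentence, paragraph) and still point back to the exact source range.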

19. Libraries: For parsing natural languages we use PetitParser [7], a flexible parsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour, an engine for scripting browsers. Glamour reifies the notion of a browser and defines the flow of data between different user interface widgets. The contributions of this paper are: (1) we apply ideas from program checking to the domain of natural language; (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool.
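PetitParser builds grammars by composing small parsers, for instance with sequence (,), ordered choice (/) and repetition (star), as the rule patterns in these slides show. The Python sketch below imitates only choice and repetition, to show how a flat string becomes a list of tokens with source intervals. It is a hand-rolled illustration, not PetitParser's API, and all function names in it are hypothetical.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def run_of(predicate: Callable[[str], bool], kind: str) -> Parser:
    """One or more characters satisfying predicate, as (kind, text, (start, end))."""
    def parse(text: str, pos: int):
        end = pos
        while end < len(text) and predicate(text[end]):
            end += 1
        if end == pos:
            return None
        return (kind, text[pos:end], (pos, end)), end
    return parse

def single(predicate: Callable[[str], bool], kind: str) -> Parser:
    """Exactly one character satisfying predicate."""
    def parse(text: str, pos: int):
        if pos < len(text) and predicate(text[pos]):
            return (kind, text[pos], (pos, pos + 1)), pos + 1
        return None
    return parse

def choice(*parsers: Parser) -> Parser:
    """Ordered choice, in the spirit of PetitParser's `/`: the first success wins."""
    def parse(text: str, pos: int):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None
    return parse

def star(parser: Parser) -> Parser:
    """Zero or more repetitions, in the spirit of PetitParser's `star`.
    Assumes the inner parser consumes input whenever it succeeds."""
    def parse(text: str, pos: int):
        values = []
        while (result := parser(text, pos)) is not None:
            value, pos = result
            values.append(value)
        return values, pos
    return parse

word = run_of(str.isalnum, "word")
whitespace = run_of(str.isspace, "whitespace")
punctuation = single(lambda c: not c.isalnum() and not c.isspace(), "punctuation")
tokens = star(choice(word, whitespace, punctuation))
```

Calling tokens("Hi, you!", 0) returns the five tokens with their intervals and the end position 8. Punctuation is parsed one character at a time so that later patterns can match two marks in a row.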

22. Other Language Models

25. TextLint rules: Avoid "a lot"; Avoid "a"; Avoid "allow to"; Avoid "an"; Avoid "as to whether"; Avoid "can not"; Avoid "case"; Avoid "certainly"; Avoid "could"; Avoid "currently"; Avoid "different than"; Avoid "doubt but"; Avoid "each and every one"; Avoid "enormity"; Avoid "factor"; Avoid "funny"; Avoid "help but"; Avoid "help to"; Avoid "however"; Avoid "importantly"; Avoid "in order to"; Avoid "in regards to"; Avoid "in terms of"; Avoid "insightful"; Avoid "interesting"; Avoid "irregardless"; Avoid "one of the most"; Avoid "regarded as"; Avoid "required to"; Avoid "somehow"; Avoid "stuff"; Avoid "the fact is"; Avoid "the fact that"; Avoid "the truth is"; Avoid "thing"; Avoid "thus"; Avoid "true fact"; Avoid "would"; Avoid comma; Avoid connectors repetition; Avoid continuous punctuation; Avoid continuous word repetition; Avoid contraction; Avoid joined sentences; Avoid long paragraph; Avoid long sentence; Avoid passive voice; Avoid qualifier; Avoid whitespace; Avoid word repetition.
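Many of these rules flag a fixed word or phrase. Below is a minimal Python sketch of such phrase rules. It assumes case-insensitive matching and whole-word boundaries (our choices, not documented on the slides) and compiles each phrase into a regular expression that tolerates any whitespace, including line breaks, between its words. The phrase list is a sample taken from the slide, and check is a hypothetical name.

```python
import re
from typing import List, Tuple

# A sample of multi-word entries from the rule list; the others would be added the same way.
AVOIDED_PHRASES = ["a lot", "in order to", "the fact that", "each and every one"]

def phrase_pattern(phrase: str) -> re.Pattern:
    # Whole words only (\b), separated by any run of whitespace,
    # so a phrase broken across a line is still found.
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)

RULES = {phrase: phrase_pattern(phrase) for phrase in AVOIDED_PHRASES}

def check(text: str) -> List[Tuple[str, Tuple[int, int]]]:
    """Return (rule name, (start, end)) for every avoided phrase, in text order."""
    hits = [(f'Avoid "{phrase}"', m.span())
            for phrase, pattern in RULES.items()
            for m in pattern.finditer(text)]
    return sorted(hits, key=lambda hit: hit[1])
```

With this approach "A lottery" is not flagged, because "lot" must end at a word boundary.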

26. Pattern for the rule Avoid "somehow":
(self word: 'somehow')
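The pattern (self word: 'somehow') matches a whole word rather than a substring. Here is a Python approximation. The word helper is hypothetical, and ignoring case is our own assumption.

```python
import re
from typing import Callable, List, Tuple

# Words are runs of letters, optionally with one apostrophe part (e.g. "don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def word(expected: str) -> Callable[[str], List[Tuple[int, int]]]:
    """Hypothetical helper: build a pattern matching one whole word, ignoring case."""
    expected = expected.lower()

    def matches(text: str) -> List[Tuple[int, int]]:
        return [m.span() for m in WORD_RE.finditer(text)
                if m.group().lower() == expected]

    return matches

avoid_somehow = word("somehow")
```

Applied to "Somehow, it somehow works; somehowever is not a word.", the pattern reports (0, 7) and (12, 19). A plain substring search would also flag "somehowever".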

27. Pattern matching two consecutive punctuation marks, as in the rule Avoid continuous punctuation:
(self punctuation) , (self punctuation)
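Once the text is split into tokens, (self punctuation) , (self punctuation) is simply a sequence of two punctuation tokens with nothing in between. The Python sketch below uses a hypothetical tokenizer that treats each punctuation character as its own token and reports the source interval of every such pair. In this sketch a run of three marks yields two overlapping pairs.

```python
import re
from typing import List, Tuple

# Each punctuation character becomes its own token; words and whitespace are runs.
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<separator>\s+)|(?P<punctuation>[^\w\s])")

def tokens(text: str) -> List[Tuple[str, Tuple[int, int]]]:
    """Return (kind, (start, end)) for every token."""
    return [(m.lastgroup, m.span()) for m in TOKEN_RE.finditer(text)]

def continuous_punctuation(text: str) -> List[Tuple[int, int]]:
    """Intervals where a punctuation token is directly followed by another one."""
    toks = tokens(text)
    return [(first[1][0], second[1][1])
            for first, second in zip(toks, toks[1:])
            if first[0] == "punctuation" and second[0] == "punctuation"]
```

"Really?! Yes." yields [(6, 8)], while "Fine, thanks." yields nothing because the comma is followed by a separator.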

28. Pattern for the rule Avoid passive voice: a form of "to be", any number of separators, then either a word ending in "ed" or an irregular past participle:
(self wordIn: #('am' 'are' 'were' 'being' ... )) , (self separator star) , ((self wordSatisfying: [ :value | value endsWith: 'ed' ]) / (self wordIn: #('awoken' 'been' 'born' 'beat' ... )))
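The passive-voice pattern can be approximated in Python by walking a token list. This is a sketch under stated assumptions. The word sets contain only the words visible on the slide (the full lists are elided there), matching ignores case, and passive_voice is a hypothetical name.

```python
import re
from typing import List, Tuple

# Only the words visible on the slide; the full lists are elided there ("...").
BE_FORMS = {"am", "are", "were", "being"}
IRREGULAR_PARTICIPLES = {"awoken", "been", "born", "beat"}

TOKEN_RE = re.compile(r"(?P<word>[A-Za-z]+)|(?P<separator>\s+)|(?P<other>[^A-Za-z\s]+)")

def passive_voice(text: str) -> List[Tuple[int, int]]:
    """Find 'be-form, separator*, participle' sequences and return their intervals."""
    toks = [(m.lastgroup, m.group().lower(), m.span()) for m in TOKEN_RE.finditer(text)]
    hits = []
    for i, (kind, value, (start, _)) in enumerate(toks):
        if kind != "word" or value not in BE_FORMS:
            continue
        j = i + 1
        while j < len(toks) and toks[j][0] == "separator":  # (self separator star)
            j += 1
        if j < len(toks) and toks[j][0] == "word":
            participle = toks[j][1]
            # Regular participle ending in "ed", or one from the irregular list.
            if participle.endswith("ed") or participle in IRREGULAR_PARTICIPLES:
                hits.append((start, toks[j][2][1]))
    return hits
```

The "ends with ed" test is a heuristic. "Errors were fixed." is flagged correctly, but so is "The cars are red." Each hit is therefore a suggestion for the writer rather than a certain error.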

30. A style selects which rules apply. For example, a style for scientific papers uses all rules except the one for word repetition within a paragraph:
scientificPaperStyle := TLTextLintRule allRules - TLWordRepetitionInParagraphRule
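The style definition above is set arithmetic over rules. Below is a Python equivalent using frozenset difference. The rule names are hypothetical stand-ins for TextLint's rule classes.

```python
from typing import FrozenSet, Iterable

# Hypothetical rule names standing in for TextLint's rule classes.
ALL_RULES: FrozenSet[str] = frozenset({
    "AvoidContraction",
    "AvoidLongSentence",
    "AvoidPassiveVoice",
    "WordRepetitionInParagraph",
})

def style_without(excluded: Iterable[str]) -> FrozenSet[str]:
    """A style as a set of rules: every known rule except the excluded ones."""
    return ALL_RULES - frozenset(excluded)

scientific_paper_style = style_without({"WordRepetitionInParagraph"})
```

Defining a style by subtraction means any rule added later is automatically part of the style. This is convenient, but it can also enable checks that were never reviewed for that kind of document.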

36. Validation

37. 7.1 History of a Paper
[Fig. 6. Evolution of a paper from beginning to publication. Time axis: t (t1 to t4); plotted: Issues, Words.]

38. Fig. 7. Effectiveness of various TextLint rules: Avoid "currently" -74%; Avoid "certainly" -25%; Avoid "would" -24%; Avoid "factor" -20%; Avoid long paragraph -20%; Avoid "thus" -13%; Avoid "however" -10%; Avoid "case" -7%; Avoid "can not" -5%; Avoid "could" -5%; Avoid passive voice -4%; Avoid "insightful" -3%; Avoid "stuff" -3%; Avoid joined sentences -1%; Avoid "as to whether" 0%; Avoid "different than" 0%; Avoid "doubt but" 0%; Avoid "each and every one" 0%; Avoid "enormity" 0%; Avoid "help but" 0%; Avoid "in regards to" 0%; Avoid "irregardless" 0%; Avoid "regarded as" 0%; Avoid "the fact is" 0%; Avoid "the truth is" 0%; Avoid "true fact" 0%; Avoid comma 0%; Avoid qualifier 2%; Avoid "funny" 5%; Avoid "one of the most" 5%; Avoid "importantly" 9%; Avoid long sentence 10%; Avoid "an" 10%; Avoid continuous punctuation 15%; Avoid "interesting" 17%; Avoid "required to" 17%; Avoid "a" 23%; Avoid "in order to" 23%; Avoid continuous word repetition 24%; Avoid "in terms of" 24%; Avoid "somehow" 25%; Avoid "help to" 27%; Avoid "the fact that" 32%; Avoid whitespace 45%; Avoid "allow to" 46%; Avoid "a lot" 55%; Avoid "thing" 70%; Avoid contraction 73%.
… a more in-depth discussion of tools that comment on writing style could be included.

39. Future Work
‣ Natural Language Model
‣ Styles for Other Domains
‣ More Rules

41. textlint.lukas-renggli.ch @textlint
