Natural Language Checking
with Program Checking Tools
Fabrizio Perin, Lukas Renggli, Jorge Ressia
[Diagram: Syntax and Style for Programming Languages and Natural Languages. Labels: Parser, Compiler, Program Checker, Spell Checker, Grammar Checker, TextLint.]
[Figure: Text, Parsing, Model, Validation, Failures; Rules, Styles; GUI.]
Fig. 2. Data Flow through TextLint.
Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces
the natural text model of TextLint and Section 3 details how text documents
are parsed and the model is composed. Section 4 presents the rules which
model the stylistic checks. Section 5 describes how stylistic rules are defined in
Input file types: .txt, .html, .tex
· The Markup models LaTeX or HTML commands depending on the filetype
of the input.
All document elements answer the message text, which returns a plain string
representation of the modeled text entity, ignoring markup tokens. Furthermore,
all elements know their source interval in the document. The relationships among
the elements in the model are depicted in Figure 3.
[Class diagram: Element (text(), interval()) with Document, Paragraph, Sentence, Phrase, linked by 1-to-* relationships; SyntacticElement (text(), interval()) with Word, Punctuation, Whitespace, Markup.]
Fig. 3. The TextLint model and the relationships between its classes.
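To make the two messages concrete, here is a minimal sketch of such a model. It is written in Python purely for illustration (TextLint itself is implemented in Smalltalk), it covers only Sentence and the four syntactic elements, and the constructor arguments and the half-open interval convention are assumptions, not the tool's actual design.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SyntacticElement:
    """A token of the input: word, punctuation, whitespace, or markup."""
    source: str  # the exact characters as they appear in the input
    start: int   # offset of the first character in the document

    def text(self) -> str:
        return self.source

    def interval(self) -> Tuple[int, int]:
        # Assumed convention: half-open interval [start, stop).
        return (self.start, self.start + len(self.source))


class Word(SyntacticElement):
    pass


class Punctuation(SyntacticElement):
    pass


class Whitespace(SyntacticElement):
    pass


class Markup(SyntacticElement):
    def text(self) -> str:
        # Markup tokens are ignored in the plain-text representation.
        return ""


@dataclass
class Sentence:
    elements: List[SyntacticElement]

    def text(self) -> str:
        return "".join(e.text() for e in self.elements)

    def interval(self) -> Tuple[int, int]:
        # Spans from the first element's start to the last element's stop.
        return (self.elements[0].interval()[0], self.elements[-1].interval()[1])


# Hand-built model of the LaTeX input: Hello \emph{world}.
sentence = Sentence([
    Word("Hello", 0),
    Whitespace(" ", 5),
    Markup("\\emph{", 6),
    Word("world", 12),
    Markup("}", 17),
    Punctuation(".", 18),
])
print(sentence.text())      # Hello world.
print(sentence.interval())  # (0, 19)
```

Keeping the interval on every element is what lets a checker report a problem found in the plain text at the right position in the marked-up source.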
3 From Strings to Objects
To build the high-level document model from the flat input string we use
PetitParser [7]. PetitParser is a framework targeted at parsing formal languages
(e.g., programming languages), but we employ it in this project to parse natural
[...] libraries: For parsing natural languages we use PetitParser [7], a flexible
parsing framework that makes it easy to define parsers and to dynamically
reuse, compose, transform and extend grammars. Furthermore, we use Glamour,
an engine for scripting browsers. Glamour reifies the notion of a browser
and defines the flow of data between different user interface widgets.
The contributions of this paper are:
(1) we apply ideas from program checking to the domain of natural language;
(2) we implement an object-oriented model used to represent natural text in
Smalltalk;
(3) we demonstrate a pattern matcher for the detection of style issues in
natural language; and
(4) we demonstrate a graphical user interface that presents and explains the
problems detected by the tool.
Other Language Models
Avoid "a lot"
Avoid "a"
Avoid "allow to"
Avoid "an"
Avoid "as to whether"
Avoid "can not"
Avoid "case"
Avoid "certainly"
Avoid "could"
Avoid "currently"
Avoid "different than"
Avoid "doubt but"
Avoid "each and every one"
Avoid "enormity"
Avoid "factor"
Avoid "funny"
Avoid "help but"
Avoid "help to"
Avoid "however"
Avoid "importantly"
Avoid "in order to"
Avoid "in regards to"
Avoid "in terms of"
Avoid "insightful"
Avoid "interesting"
Avoid "irregardless"
Avoid "one of the most"
Avoid "regarded as"
Avoid "required to"
Avoid "somehow"
Avoid "stuff"
Avoid "the fact is"
Avoid "the fact that"
Avoid "the truth is"
Avoid "thing"
Avoid "thus"
Avoid "true fact"
Avoid "would"
Avoid comma
Avoid connectors repetition
Avoid continuous punctuation
Avoid continuous word repetition
Avoid contraction
Avoid joined sentences
Avoid long paragraph
Avoid long sentence
Avoid passive voice
Avoid qualifier
Avoid whitespace
Avoid word repetition
Pattern for the rule Avoid "somehow":
(self word: 'somehow')
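The following is a rough sketch of what such a single-word pattern does, written in Python for illustration only and not taken from TextLint. The names `tokenize`, `word` and `find_failures` are hypothetical, and matching case-insensitively is an assumption.

```python
import re

# Words, whitespace runs, and single punctuation marks.
TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    """Split text into (token, start offset) pairs."""
    return [(m.group(), m.start()) for m in TOKEN.finditer(text)]


def word(expected):
    """Return a predicate that matches one specific word (case-insensitive by assumption)."""
    return lambda token: token.lower() == expected.lower()


def find_failures(text, predicate):
    """Return the source interval (start, stop) of every matching token."""
    return [(start, start + len(tok)) for tok, start in tokenize(text) if predicate(tok)]


failures = find_failures("Somehow the tool somehow works.", word("somehow"))
print(failures)  # [(0, 7), (17, 24)]
```

Each reported interval points back into the original text, so a user interface can highlight the offending word.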
Pattern for the rule Avoid continuous punctuation:
(self punctuation) , (self punctuation)
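In PetitParser the comma operator composes two parsers into a sequence. The sketch below imitates that idea in Python over a list of tokens. It is an illustration under stated assumptions, not TextLint's implementation: `sequence`, `punctuation` and `matches` are hypothetical helpers, and a punctuation token is assumed to be any single character that is neither a word character nor whitespace.

```python
import re

TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    """Split text into words, whitespace runs, and single punctuation marks."""
    return [m.group() for m in TOKEN.finditer(text)]


def punctuation(tokens, i):
    """Match one punctuation token at index i; return the next index or None."""
    if i < len(tokens) and re.fullmatch(r"[^\w\s]", tokens[i]):
        return i + 1
    return None


def sequence(*parsers):
    """Match the given parsers one after the other (the role of ',' in the pattern)."""
    def parse(tokens, i):
        for parser in parsers:
            i = parser(tokens, i)
            if i is None:
                return None
        return i
    return parse


def matches(pattern, tokens):
    """Return every (start, stop) token index range at which the pattern matches."""
    found = []
    for i in range(len(tokens)):
        j = pattern(tokens, i)
        if j is not None:
            found.append((i, j))
    return found


continuous_punctuation = sequence(punctuation, punctuation)
tokens = tokenize("Wait,, what?!")
print(matches(continuous_punctuation, tokens))  # [(1, 3), (5, 7)]
```

Token ranges 1 to 3 and 5 to 7 correspond to ",," and "?!" in the input.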
Pattern for the rule Avoid passive voice:
(self wordIn: #('am' 'are' 'were' 'being' ... )) ,
(self separator star) ,
((self wordSatisfying: [ :value | value endsWith: 'ed' ]) /
 (self wordIn: #('awoken' 'been' 'born' 'beat' ... )))
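The pattern reads as: an auxiliary verb, then any number of separators, then either a word ending in "ed" or an irregular participle. The Python sketch below mirrors that structure for illustration; it is not TextLint's code. The combinator names are hypothetical, a separator is assumed to be a whitespace token, and the two word lists contain only the entries visible above (the originals are elided with "...").

```python
import re

TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    return [m.group() for m in TOKEN.finditer(text)]


def word_satisfying(condition):
    """Match one alphabetic token whose lowercase form satisfies the condition."""
    def parse(tokens, i):
        if i < len(tokens) and tokens[i].isalpha() and condition(tokens[i].lower()):
            return i + 1
        return None
    return parse


def word_in(words):
    return word_satisfying(lambda w: w in words)


def separator(tokens, i):
    """Match one whitespace token (assumed meaning of 'separator')."""
    if i < len(tokens) and tokens[i].isspace():
        return i + 1
    return None


def star(parser):
    """Match the parser zero or more times."""
    def parse(tokens, i):
        while True:
            j = parser(tokens, i)
            if j is None:
                return i
            i = j
    return parse


def choice(*parsers):
    """Ordered choice (the role of '/'): the first parser that succeeds wins."""
    def parse(tokens, i):
        for parser in parsers:
            j = parser(tokens, i)
            if j is not None:
                return j
        return None
    return parse


def sequence(*parsers):
    """Match the parsers one after the other (the role of ',')."""
    def parse(tokens, i):
        for parser in parsers:
            i = parser(tokens, i)
            if i is None:
                return None
        return i
    return parse


AUXILIARIES = {"am", "are", "were", "being"}    # only the entries shown above
IRREGULAR = {"awoken", "been", "born", "beat"}  # only the entries shown above

passive_voice = sequence(
    word_in(AUXILIARIES),
    star(separator),
    choice(word_satisfying(lambda w: w.endswith("ed")), word_in(IRREGULAR)),
)


def find(pattern, tokens):
    """Return the matched text for every position where the pattern matches."""
    found = []
    for i in range(len(tokens)):
        j = pattern(tokens, i)
        if j is not None:
            found.append("".join(tokens[i:j]))
    return found


text = "The rules were defined by users. They are born curious. We are happy."
hits = find(passive_voice, tokenize(text))
print(hits)  # ['were defined', 'are born']
```

A purely lexical heuristic like this one will also flag adjectives such as "are tired", so its reports are best read as hints rather than errors.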
scientificPaperStyle := TLTextLintRule allRules - TLWordRepetitionInParagraphRule
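The expression above defines a style as the set of all rules minus one rule. A small Python analogue of that idea follows; it is only a sketch, and the `Rule` classes and the `all_rules` helper are hypothetical stand-ins rather than TextLint's classes.

```python
class Rule:
    """Hypothetical base class standing in for a TextLint rule."""


class AvoidSomehowRule(Rule):
    pass


class PassiveVoiceRule(Rule):
    pass


class WordRepetitionInParagraphRule(Rule):
    pass


def all_rules():
    """Collect every direct subclass of Rule as an immutable set."""
    return frozenset(Rule.__subclasses__())


# A style is a set of rules; set difference drops the unwanted ones.
scientific_paper_style = all_rules() - {WordRepetitionInParagraphRule}

print(sorted(rule.__name__ for rule in scientific_paper_style))
# ['AvoidSomehowRule', 'PassiveVoiceRule']
```

Treating styles as plain sets means that new styles can be derived from existing ones with ordinary set operations.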
[Plot: Issues and Words over time t (t1, t2, t3, t4); label: Validation.]
Fig. 6. Evolution of a paper from beginning to publication.
7.1 History of a Paper
Avoid "currently"                   -74%
Avoid "certainly"                   -25%
Avoid "would"                       -24%
Avoid "factor"                      -20%
Avoid long paragraph                -20%
Avoid "thus"                        -13%
Avoid "however"                     -10%
Avoid "case"                         -7%
Avoid "can not"                      -5%
Avoid "could"                        -5%
Avoid passive voice                  -4%
Avoid "insightful"                   -3%
Avoid "stuff"                        -3%
Avoid joined sentences               -1%
Avoid "as to whether"                 0%
Avoid "different than"                0%
Avoid "doubt but"                     0%
Avoid "each and every one"            0%
Avoid "enormity"                      0%
Avoid "help but"                      0%
Avoid "in regards to"                 0%
Avoid "irregardless"                  0%
Avoid "regarded as"                   0%
Avoid "the fact is"                   0%
Avoid "the truth is"                  0%
Avoid "true fact"                     0%
Avoid comma                           0%
Avoid qualifier                       2%
Avoid "funny"                         5%
Avoid "one of the most"               5%
Avoid "importantly"                   9%
Avoid long sentence                  10%
Avoid "an"                           10%
Avoid continuous punctuation         15%
Avoid "interesting"                  17%
Avoid "required to"                  17%
Avoid "a"                            23%
Avoid "in order to"                  23%
Avoid continuous word repetition     24%
Avoid "in terms of"                  24%
Avoid "somehow"                      25%
Avoid "help to"                      27%
Avoid "the fact that"                32%
Avoid whitespace                     45%
Avoid "allow to"                     46%
Avoid "a lot"                        55%
Avoid "thing"                        70%
Avoid contraction                    73%
Fig. 7. Effectiveness of various TextLint rules.
a more in-depth discussion of tools that comment on writing style could be included.
Future Work
‣ Natural Language Model
‣ Styles for Other Domains
‣ More Rules
textlint.lukas-renggli.ch
@textlint

Natural Language Checking with Program Checking Tools

  • 1.
    Natural Language Checking withProgram Checking Tools Fabrizio Perin, Lukas Renggli, Jorge Ressia
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 15.
    (2) we implementan object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 16.
    (2) we implementan object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in .txt .html .tex
  • 17.
    (2) we implementan object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 18.
    (2) we implementan object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI Fig. 2. Data Flow through TextLint. Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in · The Markup models LATEX or HTML commands depending on the filetype of the input. All document elements answer the message text which returns a plain string representation of the modeled text entity ignoring markup tokens. Furthermore all elements know their source interval in the document. The relationship among the elements in the model are depicted in Figure 3. Element text() interval() Document Paragraph Sentence Phrase 1 * 1 * 1 * SyntacticElement text() interval() Word Punctuation Whitespace Markup 1 * 1 * Fig. 3. The TextLint model and the relationships between its classes. 3 From Strings to Objects To build the high-level document model from the flat input string we use PetitParser [7]. PetitParser is a framework targeted at parsing formal languages (e.g., programming languages), but we employ it in this project to parse natural 4
  • 19.
    raries: For parsingnatural languages we use PetitParser [7], a flexible rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. he contributions of this paper are: 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Smalltalk; 3) we demonstrate a pattern matcher for the detection of style issues in natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI representation of the modeled text entity ignoring markup tokens. Furthermore all elements know their source interval in the document. The relationship among the elements in the model are depicted in Figure 3. Element text() interval() Document Paragraph Sentence Phrase 1 * 1 * 1 * SyntacticElement text() interval() Word Punctuation Whitespace Markup 1 * 1 * Fig. 3. The TextLint model and the relationships between its classes.
  • 20.
    raries: For parsingnatural languages we use PetitParser [7], a flexible rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. he contributions of this paper are: 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Smalltalk; 3) we demonstrate a pattern matcher for the detection of style issues in natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI representation of the modeled text entity ignoring markup tokens. Furthermore all elements know their source interval in the document. The relationship among the elements in the model are depicted in Figure 3. Element text() interval() Document Paragraph Sentence Phrase 1 * 1 * 1 * SyntacticElement text() interval() Word Punctuation Whitespace Markup 1 * 1 * Fig. 3. The TextLint model and the relationships between its classes.
  • 21.
    raries: For parsingnatural languages we use PetitParser [7], a flexible rsing framework that makes it easy to define parsers and to dynamically use, compose, transform and extend grammars. Furthermore, we use Glamour , an engine for scripting browsers. Glamour reifies the notion of a browser d defines the flow of data between different user interface widgets. he contributions of this paper are: 1) we apply ideas from program checking to the domain of natural language; 2) we implement an object-oriented model used to represent natural text in Smalltalk; 3) we demonstrate a pattern matcher for the detection of style issues in natural language; and 4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool. Text Parsing Model Validation Failures Rules Styles GUI representation of the modeled text entity ignoring markup tokens. Furthermore all elements know their source interval in the document. The relationship among the elements in the model are depicted in Figure 3. Element text() interval() Document Paragraph Sentence Phrase 1 * 1 * 1 * SyntacticElement text() interval() Word Punctuation Whitespace Markup 1 * 1 * Fig. 3. The TextLint model and the relationships between its classes.
  • 22.
    Other Language Models
  • 23.
    (2) we implement an object-oriented model used to represent natural text in Smalltalk; (3) we demonstrate a pattern matcher for the detection of style issues in natural language; and (4) we demonstrate a graphical user interface that presents and explains the problems detected by the tool.
    [Diagram labels: Text, Parsing, Model, Validation, Failures; Rules, Styles; GUI]
    Fig. 2. Data Flow through TextLint.
    Figure 2 gives an overview of the architecture of TextLint. Section 2 introduces the natural text model of TextLint and Section 3 details how text documents are parsed and the model is composed. Section 4 presents the rules which model the stylistic checks. Section 5 describes how stylistic rules are defined in
  • 25.
    Avoid "a lot"; Avoid "a"; Avoid "allow to"; Avoid "an"; Avoid "as to whether"; Avoid "can not"; Avoid "case"; Avoid "certainly"; Avoid "could"; Avoid "currently"; Avoid "different than"; Avoid "doubt but"; Avoid "each and every one"; Avoid "enormity"; Avoid "factor"; Avoid "funny"; Avoid "help but"; Avoid "help to"; Avoid "however"; Avoid "importantly"; Avoid "in order to"; Avoid "in regards to"; Avoid "in terms of"; Avoid "insightful"; Avoid "interesting"; Avoid "irregardless"; Avoid "one of the most"; Avoid "regarded as"; Avoid "required to"; Avoid "somehow"; Avoid "stuff"; Avoid "the fact is"; Avoid "the fact that"; Avoid "the truth is"; Avoid "thing"; Avoid "thus"; Avoid "true fact"; Avoid "would"; Avoid comma; Avoid connectors repetition; Avoid continuous punctuation; Avoid continuous word repetition; Avoid contraction; Avoid joined sentences; Avoid long paragraph; Avoid long sentence; Avoid passive voice; Avoid qualifier; Avoid whitespace; Avoid word repetition
  • 26.
    Avoid "somehow": (self word: 'somehow')
  • 27.
    Avoid continuous punctuation: (self punctuation) , (self punctuation)
  • 28.
    Avoid passive voice:
    (self wordIn: #('am' 'are' 'were' 'being' ... )) ,
    (self separator star) ,
    ((self wordSatisfying: [ :value | value endsWith: 'ed' ]) /
     (self wordIn: #('awoken' 'been' 'born' 'beat' ... )))
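To make the passive-voice pattern on this slide easier to read, here is a minimal Python sketch of the same idea. It is not TextLint's pattern matcher. It assumes that `,`, `/` and `star` carry their usual PetitParser meaning (sequence, ordered choice, zero-or-more repetition); the token representation, the lower-casing of words and all function names are hypothetical, and the word lists are the abbreviated ones shown on the slide.

```python
from typing import Callable, Optional, Sequence, Tuple

Token = Tuple[str, str]  # (kind, text), e.g. ("word", "are")
Pattern = Callable[[Sequence[Token], int], Optional[int]]


def word_satisfying(predicate: Callable[[str], bool]) -> Pattern:
    """Match one word token whose lower-cased text satisfies predicate."""
    def match(tokens, pos):
        if pos < len(tokens) and tokens[pos][0] == "word" \
                and predicate(tokens[pos][1].lower()):
            return pos + 1
        return None
    return match


def word_in(words) -> Pattern:
    """Match one word token contained in the given word list."""
    allowed = set(words)
    return word_satisfying(lambda text: text in allowed)


def separator() -> Pattern:
    """Match one whitespace token."""
    def match(tokens, pos):
        if pos < len(tokens) and tokens[pos][0] == "space":
            return pos + 1
        return None
    return match


def star(pattern: Pattern) -> Pattern:
    """Zero or more repetitions of pattern; never fails."""
    def match(tokens, pos):
        while True:
            nxt = pattern(tokens, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return match


def seq(*patterns: Pattern) -> Pattern:
    """All patterns one after the other."""
    def match(tokens, pos):
        for pattern in patterns:
            pos = pattern(tokens, pos)
            if pos is None:
                return None
        return pos
    return match


def choice(*patterns: Pattern) -> Pattern:
    """The first alternative that matches."""
    def match(tokens, pos):
        for pattern in patterns:
            nxt = pattern(tokens, pos)
            if nxt is not None:
                return nxt
        return None
    return match


def find_all(pattern: Pattern, tokens: Sequence[Token]):
    """Return (start, stop) token ranges of all non-overlapping matches."""
    hits, pos = [], 0
    while pos < len(tokens):
        end = pattern(tokens, pos)
        if end is not None and end > pos:
            hits.append((pos, end))
            pos = end
        else:
            pos += 1
    return hits


passive_voice = seq(
    word_in(["am", "are", "were", "being"]),
    star(separator()),
    choice(
        word_satisfying(lambda text: text.endswith("ed")),
        word_in(["awoken", "been", "born", "beat"]),
    ),
)

tokens = [("word", "Rules"), ("space", " "), ("word", "are"), ("space", " "),
          ("word", "defined"), ("space", " "), ("word", "here"), ("punct", ".")]
hits = find_all(passive_voice, tokens)
print(hits)  # token ranges flagged as passive voice
```

The "ends with 'ed'" test is a heuristic: it catches regular participles cheaply, while the explicit word list covers irregular ones. It can also flag non-participles such as "were red", which is the usual trade-off of a purely lexical rule.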
  • 30.
    scientificPaperStyle := TLTextLintRule allRules - TLWordRepetitionInParagraphRule
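The expression on this slide builds a style by taking all rules and removing one. A minimal Python sketch of that idea, assuming a style is simply a set of rules, might look as follows. Apart from `TLWordRepetitionInParagraphRule`, the rule names and the `style_without` helper are hypothetical.

```python
from typing import FrozenSet

# Hypothetical rule identifiers; only the last one is named on the slide.
ALL_RULES: FrozenSet[str] = frozenset({
    "AvoidSomehowRule",
    "AvoidPassiveVoiceRule",
    "AvoidLongSentenceRule",
    "TLWordRepetitionInParagraphRule",
})


def style_without(all_rules: FrozenSet[str], *excluded: str) -> FrozenSet[str]:
    """Build a style as the set difference: every rule except the excluded ones."""
    return all_rules - frozenset(excluded)


scientific_paper_style = style_without(ALL_RULES, "TLWordRepetitionInParagraphRule")
print(sorted(scientific_paper_style))
```

Expressing a style as a difference from the full rule set means that rules added later are picked up automatically, while the exclusions stay explicit.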
  • 36.
  • 37.
    [Plot: axis t (t1, t2, t3, t4); series Issues and Words]
    Fig. 6. Evolution of a paper from beginning to publication.
    7.1 History of a Paper
  • 38.
    Avoid ‘currently’: -74%; Avoid ‘certainly’: -25%; Avoid ‘would’: -24%; Avoid ‘factor’: -20%; Avoid long paragraph: -20%; Avoid ‘thus’: -13%; Avoid ‘however’: -10%; Avoid ‘case’: -7%; Avoid ‘can not’: -5%; Avoid ‘could’: -5%; Avoid passive voice: -4%; Avoid ‘insightful’: -3%; Avoid ‘stuff’: -3%; Avoid joined sentences: -1%; Avoid ‘as to whether’: 0%; Avoid ‘different than’: 0%; Avoid ‘doubt but’: 0%; Avoid ‘each and every one’: 0%; Avoid ‘enormity’: 0%; Avoid ‘help but’: 0%; Avoid ‘in regards to’: 0%; Avoid ‘irregardless’: 0%; Avoid ‘regarded as’: 0%; Avoid ‘the fact is’: 0%; Avoid ‘the truth is’: 0%; Avoid ‘true fact’: 0%; Avoid comma: 0%; Avoid qualifier: 2%; Avoid ‘funny’: 5%; Avoid ‘one of the most’: 5%; Avoid ‘importantly’: 9%; Avoid long sentence: 10%; Avoid ‘an’: 10%; Avoid continuous punctuation: 15%; Avoid ‘interesting’: 17%; Avoid ‘required to’: 17%; Avoid ‘a’: 23%; Avoid ‘in order to’: 23%; Avoid continuous word repetition: 24%; Avoid ‘in terms of’: 24%; Avoid ‘somehow’: 25%; Avoid ‘help to’: 27%; Avoid ‘the fact that’: 32%; Avoid whitespace: 45%; Avoid ‘allow to’: 46%; Avoid ‘a lot’: 55%; Avoid ‘thing’: 70%; Avoid contraction: 73%
    Fig. 7. Effectiveness of various TextLint rules.
    a more in-depth discussion of tools that comment on writing style could be included.
  • 39.
    Future Work ‣ Natural Language Model ‣ Styles for Other Domains ‣ More Rules
  • 41.