NTHU Natural Language Processing Term Project Intro

NLP Lab Term Project
Preposition Error Correction
羅右鈞
程尚謙
李巧雯

Goals of the project
• Our system lets users input an English sentence
without tags around possible errors, then
automatically detects and corrects preposition errors.

Functions of the project
• Automatically detect and correct preposition errors
• Replacement preposition correction (RT)
• price for the tickets → price of the tickets
• Unwanted preposition correction (UT)
• discuss about the issue → discuss the issue
• Missing preposition correction (MT)
• listen music → listen to music
• Provide example sentences for each correction

Work Flow
[Figure: workflow diagram. Components: Input, Detect Error, Correct Error, Generate related examples, Output; Extract Patterns (from the VOA Corpus) producing Patterns; Linggle API.]

Methodology
• Extract patterns from the VOA corpus
• For each sentence, generate the n-grams that contain a
preposition (n = 3, 4, 5).
• Keep the n-grams that start and end with a content
word or a preposition and contain no symbols.
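The extraction step above can be sketched as follows. This is only an illustration under assumptions: the tag names (`N`, `V`, `ADJ`, `ADV`, `PREP`, `DT`), the set of tags treated as content words, and the helper names `is_symbol` and `extract_ngrams` are hypothetical, and the input is assumed to be already tokenized and part-of-speech tagged.

```python
import string

# Assumed tag inventory; a real system would take tags from a POS tagger.
CONTENT_TAGS = {"N", "V", "ADJ", "ADV"}
PREP_TAG = "PREP"


def is_symbol(token):
    """True if the token consists only of punctuation characters."""
    return all(ch in string.punctuation for ch in token)


def extract_ngrams(tagged, sizes=(3, 4, 5)):
    """Return n-grams, as tuples of (word, tag), that contain a preposition,
    start and end with a content word or preposition, and hold no symbol."""
    allowed_edges = CONTENT_TAGS | {PREP_TAG}
    kept = []
    for n in sizes:
        for i in range(len(tagged) - n + 1):
            gram = tuple(tagged[i:i + n])
            tags = [tag for _, tag in gram]
            if PREP_TAG not in tags:
                continue  # no preposition inside this window
            if any(is_symbol(word) for word, _ in gram):
                continue  # windows containing symbols are discarded
            if tags[0] in allowed_edges and tags[-1] in allowed_edges:
                kept.append(gram)
    return kept


sentence = [("price", "N"), ("of", "PREP"), ("the", "DT"),
            ("tickets", "N"), (".", ".")]
for gram in extract_ngrams(sentence):
    print(" ".join(word for word, _ in gram))
```

For this toy sentence only "of the tickets" and "price of the tickets" survive: the other windows either lack a preposition, end in a determiner, or contain the final period.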

Methodology
• Transform the n-grams into part-of-speech n-grams
and group them by part-of-speech sequence. The
sequences with the highest frequency become the patterns.
• Ex. N PREP DT N, V PREP DT N, V PREP ADJ N, …
• For RT and UT, a pattern is (pattern, index of the
preposition)
• Ex. (V PREP ADJ N, 1)
• For MT, a pattern is (pattern, set of indices where a
preposition may be inserted)
• Ex. (V ADJ N, set([1, 2]))
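A minimal sketch of how such patterns could be built is shown below. The frequency threshold `min_count` and the function names are hypothetical. The slides do not say how the MT patterns are derived; the sketch assumes they come from deleting the preposition from a frequent pattern and recording the position where it stood.

```python
from collections import Counter, defaultdict


def mine_patterns(ngrams, min_count=2):
    """Group n-grams by their POS sequence and keep the frequent sequences."""
    counts = Counter(tuple(tag for _, tag in gram) for gram in ngrams)
    return [pattern for pattern, c in counts.most_common() if c >= min_count]


def rt_ut_patterns(patterns):
    """(pattern, index of the preposition) for every PREP position."""
    return [(p, i) for p in patterns for i, tag in enumerate(p) if tag == "PREP"]


def mt_patterns(patterns):
    """Map each preposition-free sequence to the set of insertion indices.

    Assumption: an MT pattern is a frequent pattern with its PREP removed.
    """
    slots = defaultdict(set)
    for pattern, i in rt_ut_patterns(patterns):
        slots[pattern[:i] + pattern[i + 1:]].add(i)
    return dict(slots)


ngrams = [
    (("listen", "V"), ("to", "PREP"), ("loud", "ADJ"), ("music", "N")),
    (("talk", "V"), ("about", "PREP"), ("local", "ADJ"), ("news", "N")),
    (("price", "N"), ("of", "PREP"), ("the", "DT"), ("tickets", "N")),
]
patterns = mine_patterns(ngrams, min_count=2)
print(rt_ut_patterns(patterns))
print(mt_patterns(patterns))
```

With this toy input only V PREP ADJ N occurs twice, so it is the single pattern kept. If V ADJ PREP N were also frequent, the MT entry for V ADJ N would hold both indices 1 and 2, matching the example on the slide.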

Methodology
• Preposition error detection
• When the user inputs a sentence, our system finds the
n-grams that may contain a preposition error by
matching the patterns against the part-of-speech
n-grams of the sentence.
• If the matched n-grams overlap, the longer n-gram takes
priority.
• Ex. “We have discussed about the issue a lot of times.”
• ('have', 'V'), ('discussed', 'V'), ('about', 'PREP'), ('the', 'DT'), ('issue', 'N')
• ('discussed', 'V'), ('about', 'PREP'), ('the', 'DT'), ('issue', 'N')
• ('have', 'V'), ('discussed', 'V'), ('about', 'PREP')
• ('about', 'PREP'), ('the', 'DT'), ('issue', 'N')
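The matching and overlap rule can be sketched as below. The function names are hypothetical, the tags for words not shown on the slide (such as `PRON` for "We") are assumed, and the greedy longest-first selection is one plausible reading of "give priority to the longer n-gram", not necessarily the exact procedure used in the project.

```python
def find_candidates(tagged, patterns):
    """Return (start, end) spans whose POS sequence equals some pattern."""
    tags = [tag for _, tag in tagged]
    spans = []
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == tuple(pattern):
                spans.append((i, i + n))
    return spans


def prefer_longer(spans):
    """Greedy choice: visit spans longest first, skip any that overlap a kept one."""
    chosen = []
    for start, end in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(end <= cs or start >= ce for cs, ce in chosen):
            chosen.append((start, end))
    return sorted(chosen)


tagged = [("We", "PRON"), ("have", "V"), ("discussed", "V"), ("about", "PREP"),
          ("the", "DT"), ("issue", "N"), ("a", "DT"), ("lot", "N"),
          ("of", "PREP"), ("times", "N")]
patterns = [("V", "V", "PREP", "DT", "N"), ("V", "PREP", "DT", "N"),
            ("V", "V", "PREP"), ("PREP", "DT", "N")]

for start, end in prefer_longer(find_candidates(tagged, patterns)):
    print([word for word, _ in tagged[start:end]])
```

All four patterns match inside "have discussed about the issue", and because the matches overlap, only the 5-gram is kept as the candidate passed on to correction.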

Methodology
• Automatic preposition error correction
• Transform the n-gram into a query for Linggle to get
possible corrections
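[Figure: correction flow. The query is sent to the Linggle API; when there is no result, the query is generalized and sent again. Output is produced in two cases: 1. there are results; 2. there is still no result after generalizing the query two times.]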
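The retry logic of this step can be sketched as follows. The slides do not describe the Linggle request format or how a query is generalized, so both are passed in as callables; `fake_search`, `drop_first_word`, and the `<prep>` slot notation are stand-ins invented for this example and are not the real Linggle API or query syntax.

```python
def correct_with_fallback(query, search, generalize, max_generalizations=2):
    """Send the query; on an empty result, generalize it and retry.

    Stops as soon as there are results, or after the query has been
    generalized max_generalizations times without any result.
    Returns (results, last query tried).
    """
    for attempt in range(max_generalizations + 1):
        results = search(query)
        if results:
            return results, query
        if attempt < max_generalizations:
            query = generalize(query)
    return [], query


# Hypothetical stand-ins for the real service and generalization strategy.
FAKE_INDEX = {"listened <prep> music": ["listened to music"]}


def fake_search(query):
    return FAKE_INDEX.get(query, [])


def drop_first_word(query):
    return " ".join(query.split()[1:])


results, used = correct_with_fallback(
    "have listened <prep> music", fake_search, drop_first_word)
print(results, "from query:", used)
```

Here the first query finds nothing, one generalization step shortens it, and the second request succeeds. A query that still has no result after two generalizations yields an empty list.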

Methodology
• Generating related sentences for a correction
• Use the corrected phrase returned by Linggle
• Ex. [('know', 'V'), ('about', 'PREP'), ('the', 'DT'), ('weather', 'N')]
• Convert the phrase into n-grams made up of
content words such as nouns, verbs, adjectives, etc.
• Ex. ['know', 'about'], ['the', 'weather'], ['know', 'about', 'the'],
['about', 'the', 'weather'], ['know', 'about', 'the', 'weather']
• Map those n-grams against each sentence in the corpus
and calculate a score for each sentence
• Display the sentences with the highest scores as example
evidence for each correction
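A small sketch of the example-ranking step is given below. The slides do not define the scoring function, so this sketch assumes a simple one: a sentence earns the length of every n-gram it contains, which makes longer matches count more. The function names and the three-sentence corpus are made up for illustration.

```python
import string


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [w.strip(string.punctuation) for w in sentence.lower().split()]


def contains(tokens, gram):
    """True if gram occurs as a contiguous run inside tokens."""
    n = len(gram)
    return any(tokens[i:i + n] == gram for i in range(len(tokens) - n + 1))


def score_sentence(sentence, grams):
    """Assumed score: sum of the lengths of all n-grams found in the sentence."""
    tokens = tokenize(sentence)
    return sum(len(g) for g in grams if contains(tokens, g))


def top_examples(corpus, grams, k=2):
    """Return up to k sentences with a positive score, best first."""
    scored = [(score_sentence(s, grams), s) for s in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: -pair[0])
    return [s for _, s in scored[:k]]


grams = [["know", "about"], ["the", "weather"], ["know", "about", "the"],
         ["about", "the", "weather"], ["know", "about", "the", "weather"]]
corpus = [
    "The weather is nice today.",
    "Farmers want to know about the weather before planting.",
    "I like music.",
]
print(top_examples(corpus, grams))
```

The sentence containing the whole phrase matches all five n-grams and ranks first, the sentence that only contains "the weather" ranks second, and the unrelated sentence is dropped.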

Data and resources
• VOA corpus
• Linggle API

System Architecture
• We build the project with modern web technologies and a
client-server architecture.
• Client part
• React (a JavaScript UI library maintained by Facebook)
• Server part
• Node runs as the web server, and all business logic is
implemented in Python (using Flask to build a RESTful API)

