3. Probabilistic parsing
• Dynamic algorithms
• Machine learning
• Training sets
(e.g. Stanford POS parser)
Extremely powerful, but requires significant knowledge of computational linguistics and a huge time investment, so…
4. Rule-based pattern matching
1. There is a man on your left. T / F
If true, a man is on your left. Stop.
If false, proceed to 2.
2. There is a woman on your left. T / F
If true, there is a woman on your left. Stop.
If false, there is nobody on your left. Stop.
True/false statements
5. Rule-based pattern matching
Decision-tree algorithm
There is a man on your left.
    Yes. STOP
    No. → There is a woman on your left.
        Yes. STOP
        No. → There is nobody on your left. STOP
Assumptions:
1. Only adults are present
2. There is no third gender
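The true/false sequence can be sketched as a small decision function. This is only an illustration in Python: the function name `describe_left` and its two boolean inputs are hypothetical, and the sketch relies on the same two assumptions listed above.

```python
def describe_left(man_on_left: bool, woman_on_left: bool) -> str:
    """Check the two true/false statements in order; stop at the first true one."""
    # Statement 1: There is a man on your left.
    if man_on_left:
        return "There is a man on your left."
    # Statement 2: There is a woman on your left.
    if woman_on_left:
        return "There is a woman on your left."
    # Both statements were false, so under the stated assumptions nobody is there.
    return "There is nobody on your left."


print(describe_left(False, True))  # There is a woman on your left.
```

The order of the checks matters: the second statement is only evaluated when the first one is false.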
6. Regular expressions (Regex)
There is a man. /\bman\b/;
There is a woman. /\bwoman\b/;
The discrete words “man” and “woman” will
be identified, generating a “true” result.
Literal characters    Special characters (meta & class)
                      (escapes are preceded by a backslash \; flags follow the closing /)
b as in boy           \b  boundary (of word)
s as in sun           \s  space (whitespace)
I as in left          i   case insensitive (finds upper & lower case)
g as in green         g   global (finds all matches)
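The effect of the word boundary can be checked with a short sketch. It uses Python's `re` module rather than the /…/ literal notation on the slides; `\b` has the same meaning in both, and the sample sentence is taken from the slide.

```python
import re

sentence = "There is a woman."

# Without boundaries, the letters "man" are found inside "woman".
print(bool(re.search(r"man", sentence)))        # True
# With \b on both sides, only the discrete word "man" would match.
print(bool(re.search(r"\bman\b", sentence)))    # False
print(bool(re.search(r"\bwoman\b", sentence)))  # True
```

This is why the boundaries are needed before the man/woman decision can be trusted.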
7. Regular expressions (Regex)
1. /help/
2. /\bhelp/ \ – escape, \b – boundary
3. /\bhelp\b/
4. /\bHelp\b/
5. /\bhelp\b/i i – case insensitive
6. /\bhelp\b/g g – global (finds all matches)
7. /\bhelp\b/gi
8. /\bgrey\b/gi
9. /\bgr(a|e)y\b/gi | – pipe (Boolean “or”)
10. /\bgr[ae]y\b/gi
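The progression above can be tried out with a small sketch in Python. The helper `all_matches` and the sample strings are made up for illustration; `re.IGNORECASE` plays the role of the i flag, and collecting every match with `re.finditer` plays the role of the g flag.

```python
import re

text = "Help! I need help. A helper is helpful."


def all_matches(pattern, string, flags=0):
    """Return every matched substring (the equivalent of the g flag)."""
    return [m.group(0) for m in re.finditer(pattern, string, flags)]


# No boundaries: also matches inside "helper" and "helpful".
print(all_matches(r"help", text))                     # ['help', 'help', 'help']
# Boundaries on both sides: only the discrete lower-case word.
print(all_matches(r"\bhelp\b", text))                 # ['help']
# Case insensitive: "Help" is found as well.
print(all_matches(r"\bhelp\b", text, re.IGNORECASE))  # ['Help', 'help']

colours = "Grey or gray? Greyhound."
# The pipe and the character class accept the same two spellings.
print(all_matches(r"\bgr(a|e)y\b", colours, re.IGNORECASE))  # ['Grey', 'gray']
print(all_matches(r"\bgr[ae]y\b", colours, re.IGNORECASE))   # ['Grey', 'gray']
```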
8. Regular expressions (Regex)
e.g. /\bmaybe\b/gi;
\ – escape (from normal characters)
i – case insensitive
\b – boundary
g – global (finds all matches)
1. I think that maybe he can understand. T/F
2. He may be able to understand. T/F
3. Maybe, he can understand. T/F
4. Maybelline is a company name. T/F
5. Maybe, he said maybe. T/F
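The five sentences can be checked mechanically. The sketch below is one way to do it in Python, assuming `re.IGNORECASE` for the i flag and `findall` for the g flag; the variable names are illustrative.

```python
import re

MAYBE = re.compile(r"\bmaybe\b", re.IGNORECASE)

sentences = [
    "I think that maybe he can understand.",
    "He may be able to understand.",
    "Maybe, he can understand.",
    "Maybelline is a company name.",
    "Maybe, he said maybe.",
]

# Number of matches per sentence; zero means the statement is false.
counts = [len(MAYBE.findall(s)) for s in sentences]

for number, (sentence, count) in enumerate(zip(sentences, counts), start=1):
    print(number, count > 0, count, sentence)
```

Sentences 2 and 4 give no match: "may be" is two words, and in "Maybelline" there is no word boundary after "Maybe". Sentence 5 gives two matches because all matches are collected.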
9. Proofreading
Target
• one of the + singular noun (an error: a plural noun is needed)
Regex
• /\bone of the\b/gi;
Execute
• Check that the phrase “one of the” is followed by a plural noun
Eg. 1
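A possible way to support this check is to list the word that follows each occurrence of the phrase, so that a proofreader can inspect it. In this Python sketch the capture group, the function name and the sample text are additions for illustration; deciding whether the listed word is a plural noun is still left to the reader.

```python
import re

# The slide's pattern, extended with a capture group for the following word
# (the capture group is an assumption added for this sketch).
ONE_OF_THE = re.compile(r"\bone of the\b\s+(\w+)", re.IGNORECASE)


def words_after_one_of_the(text):
    """Return the word following each occurrence of 'one of the'."""
    return ONE_OF_THE.findall(text)


sample = "One of the reason is cost. It is one of the main problems."
print(words_after_one_of_the(sample))  # ['reason', 'main']
```

The second hit shows why the check is manual: the next word may be an adjective, with the plural noun coming later.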
10. Difficult-to-read tags
Introduction Purpose Method Results Discussion
<segment features='problem;introduction;rhetorical_moves' state='active'>We
address the problem of model-based object recognition.</segment> <segment
features='purpose;rhetorical_moves' state='active'>Our aim is to localize and
recognize road vehicles from monocular images or videos in calibrated traffic
scenes.</segment> <segment features='method;rhetorical_moves' state='active'>A
3-D deformable vehicle model with 12 shape parameters is set up as prior
information, and its pose is determined by three parameters, which are its position
on the ground plane and its orientation about the vertical axis under ground-plane
constraints.</segment> <segment features='purpose;rhetorical_moves'
state='active'>An efficient local gradient-based method is proposed to evaluate the
fitness between the projection of the vehicle model and image data, which is
combined into a novel evolutionary computing framework to estimate the 12 shape
parameters and three pose parameters by iterative evolution.</segment> <segment
features='background;introduction;rhetorical_moves' state='active'>The recovery of
pose parameters achieves vehicle localization, whereas the shape parameters are
used for vehicle recognition.</segment> <segment
features='method;rhetorical_moves' state='active'>Numerous experiments are
Eg. 2
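One way to make such tagged text easier to read is to pull out each label together with its sentence. The Python sketch below is only an assumption about how this could be done: the pattern, the function name `readable` and the choice of the first feature as the label are all illustrative, and the sample is the first two segments shown above.

```python
import re

tagged = (
    "<segment features='problem;introduction;rhetorical_moves' state='active'>We "
    "address the problem of model-based object recognition.</segment> <segment "
    "features='purpose;rhetorical_moves' state='active'>Our aim is to localize and "
    "recognize road vehicles from monocular images or videos in calibrated traffic "
    "scenes.</segment>"
)

# Capture the features attribute and the text between the tags.
SEGMENT = re.compile(r"<segment\s+features='([^']*)'[^>]*>(.*?)</segment>", re.DOTALL)


def readable(text):
    """Return (label, sentence) pairs, using the first feature as the label."""
    pairs = []
    for features, sentence in SEGMENT.findall(text):
        label = features.split(";")[0]
        # Collapse line breaks and repeated spaces inside the sentence.
        pairs.append((label, " ".join(sentence.split())))
    return pairs


for label, sentence in readable(tagged):
    print(f"[{label}] {sentence}")
```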
13. Regular “-ed”
False positives:
• learned /d/ /id/
Pron    Preceding sound         Potential regex
/id/    d, t                    /[dt]ed\b/gi;
/t/     voiceless consonants    /[sf]ed\b/gi;
/d/     voiced consonants       /[zv]ed\b/gi;
/d/     vowel                   /(ow|i|ay)ed\b/gi;
Pronunciation of “ed” is dictated by the sound of the preceding letter(s).
| – Boolean “or”, so x|y means either x or y.
d|ted means d or ted, but adding brackets changes the grouping:
(d|t)ed means ded or ted.
Eg. 3
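The rules in the table can be applied in sequence, as in the Python sketch below. The patterns are the ones from the table (with the word boundary written as `\b`); the names `ED_RULES` and `ed_pronunciation` are hypothetical, and the sketch covers only the sample letters listed, not every voiced or voiceless consonant.

```python
import re

# (pronunciation, pattern) pairs, tried in order; the first match wins.
ED_RULES = [
    ("/id/", re.compile(r"[dt]ed\b", re.IGNORECASE)),
    ("/t/", re.compile(r"[sf]ed\b", re.IGNORECASE)),
    ("/d/", re.compile(r"[zv]ed\b", re.IGNORECASE)),
    ("/d/", re.compile(r"(ow|i|ay)ed\b", re.IGNORECASE)),
]


def ed_pronunciation(word):
    """Return the predicted pronunciation of -ed, or None if no rule applies."""
    for label, pattern in ED_RULES:
        if pattern.search(word):
            return label
    return None


for word in ["wanted", "passed", "loved", "played", "learned"]:
    print(word, ed_pronunciation(word))
```

With these four patterns "learned" is not annotated at all, because n is not among the listed letters; each extra rule added to cover it would need checking against false positives such as the one noted above.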
14. Pronunciation of “th-”
Pron    Feature                 Potential regex
/θ/     Voiceless initial th    /\bth/gi;
                                - This is the default.
/t/     th pronounced as t      /\b(thomas|thames|thyme)/gi;
                                - This deals with special cases.
/ð/     Voiced initial th       /\btha(n|t)\b/gi;
                                /\bthe(m|y)\b/gi;
                                /\bth(eir|ere|ese|)\b/gi;
                                /\bthis\b/gi;
                                /\btho(se|ugh)\b/gi;
                                /\bthus\b/gi;
                                - These convert /θ/ to /ð/.
The pronunciation of initial “th” can be predicted by a rule: in function words,
the initial th is voiced. Used in sequence, these regexes can annotate many
“th” sounds.
Eg. 4
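The idea of running the patterns in sequence can be sketched in Python as follows. The function name `th_sound` and the three-step structure are assumptions made for illustration. Two details also differ from the table and are assumptions of this sketch: the special-case alternatives are grouped in brackets so that the boundary applies to each of them, and the empty alternative in the third voiced pattern is left out.

```python
import re

I = re.IGNORECASE
DEFAULT = re.compile(r"\bth", I)                     # any initial th
SPECIAL = re.compile(r"\b(thomas|thames|thyme)", I)  # th pronounced as t
VOICED = [                                           # function words
    re.compile(r"\btha(n|t)\b", I),
    re.compile(r"\bthe(m|y)\b", I),
    re.compile(r"\bth(eir|ere|ese)\b", I),
    re.compile(r"\bthis\b", I),
    re.compile(r"\btho(se|ugh)\b", I),
    re.compile(r"\bthus\b", I),
]


def th_sound(word):
    """Annotate the initial th of a word; later steps override earlier ones."""
    label = None
    if DEFAULT.search(word):
        label = "θ"  # step 1: voiceless by default
    if SPECIAL.search(word):
        label = "t"  # step 2: special cases
    if any(pattern.search(word) for pattern in VOICED):
        label = "ð"  # step 3: voiced in function words
    return label


for word in ["think", "Thomas", "that", "thanks", "though", "thought"]:
    print(word, th_sound(word))
```

The trailing boundary keeps "thanks" and "thought" voiceless even though they begin like "than" and "though".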