# Pedagogic application of regular expressions

Using regular expressions in online language-learning tools lets learners identify particular features in a text and receive feedback on them as necessary, e.g. finding errors and offering suggestions on how to rewrite them.

Published in: Education
### Pedagogic application of regular expressions

1. John Blake, Japan Advanced Institute of Science and Technology. Pedagogic application of regular expressions. /\bbetween\W+(?:\w+\W+){1,2}?to\b/gi;
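The slide shows the title pattern without explanation. The sketch below is one way to try it out, assuming it is meant to flag phrases such as "between 5 to 10" (where "between 5 and 10" would be expected). The function name `findBetweenTo` and the sample sentences are illustrative and do not come from the slides.

```javascript
// Pattern from the title slide, with backslashes restored.
// Reads as: "between", then one or two words, then "to".
const betweenTo = /\bbetween\W+(?:\w+\W+){1,2}?to\b/gi;

// Hypothetical helper: return every matching phrase in the text.
// match() with a global regex returns all matches, or null if there are none.
function findBetweenTo(text) {
  return text.match(betweenTo) || [];
}

console.log(findBetweenTo("Values between 5 to 10 were recorded."));
// [ 'between 5 to' ]
console.log(findBetweenTo("Values between 5 and 10 were recorded."));
// []
```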
2. Overview
   - Introduction: probabilistic parsing; rule-based pattern matching; regular expressions
   - Pedagogic applications: modality detector; error detector; other (tagged corpora, pronunciation of “ed”)
3. Probabilistic parsing
   - Dynamic algorithms
   - Machine learning
   - Training sets (e.g. Stanford POS parser)

   Extremely powerful, but requires significant knowledge of computational linguistics and a huge time investment, so…
4. Rule-based pattern matching: true/false statements
   1. There is a man on your left. T / F. If true, a man is on your left. Stop. If false, proceed to 2.
   2. There is a woman on your left. T / F. If true, there is a woman on your left. Stop. If false, there is nobody on your left. Stop.
5. Rule-based pattern matching: decision-tree algorithm
   - There is a man on your left. Yes: STOP. No: go to the next statement.
   - There is a woman on your left. Yes: STOP. No: there is nobody on your left. STOP.
   - Assumptions: (1) only adults are present; (2) there is no third gender.
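The two true/false statements above form a small decision tree. A minimal sketch of it follows, under the slide's own assumptions (only adults are present, no third gender). The function and parameter names are illustrative.

```javascript
// Hypothetical helper: each question is asked only if the previous answer was "no".
function describeLeft(manOnLeft, womanOnLeft) {
  if (manOnLeft) {
    return "There is a man on your left."; // statement 1 is true: stop
  }
  if (womanOnLeft) {
    return "There is a woman on your left."; // statement 2 is true: stop
  }
  return "There is nobody on your left."; // both statements are false: stop
}

console.log(describeLeft(false, true));
```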
6. Rule-based pattern matching: regular expressions (regexp|regex)
   - There is a man. /\bman\b/;
   - There is a woman. /\bwoman\b/;
   - The discrete words “man” and “woman” will be identified, generating a “true” result.
7. Regular expressions (regex), e.g. /\bmaybe\b/gi;
   - \ : escape (from normal characters)
   - \b : word boundary
   - i : case insensitive
   - g : global (find every match, not only the first)

   Does the pattern match?
   1. I think that maybe he can understand. T/F
   2. He may be able to understand T/F
   3. Maybe, he can understand. T/F
   4. Maybelline is a company name. T/F
   5. Maybe, he said maybe. T/F
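The five quiz sentences can be checked mechanically. The sketch below counts how often the slide's pattern matches in each sentence. The helper `countMatches` is an illustrative name, not part of the slides.

```javascript
// Pattern from the slide, with the backslashes restored.
const maybe = /\bmaybe\b/gi;

const sentences = [
  "I think that maybe he can understand.",
  "He may be able to understand",
  "Maybe, he can understand.",
  "Maybelline is a company name.",
  "Maybe, he said maybe.",
];

// Count matches in one sentence; match() returns null when nothing is found.
function countMatches(text, pattern) {
  return (text.match(pattern) || []).length;
}

const counts = sentences.map((s) => countMatches(s, maybe));
console.log(counts); // [ 1, 0, 1, 0, 2 ]
```

Sentences 2 and 4 give no match: "may be" is two words, and the trailing \b rejects "Maybelline". Sentence 5 yields two matches because of the g and i flags.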
8. Pedagogic applications
   - Modality detector
   - Online error detectors: Common error detector (Morrall, 2000-14); Corpus-based error detector (Blake, 2012-15)
   - Other applications: annotation highlighter; ideas for pronunciation, grammar and vocab
9. Situation (App. 1)
   - Students: graduate students, researchers
   - Aim: write research articles
   - Problems: lack of familiarity with the genre, lack of language, lack of content
10. Tentative language & approximation (App. 1)

    | Type | Examples |
    | --- | --- |
    | Modal verbs | may, might, would, can |
    | Lexical verbs | seem, appear, suggest |
    | Modal adverbs | perhaps, probably, possibly |
    | Modal adjectives | probable, possible, uncertain |
    | Modal nouns | assumption, claim, possibility |

    | # | Approximation |
    | --- | --- |
    | 49% | almost a half, nearly 50%, less than 1 in 2 |
11. Material mismatch (App. 1): Students from different faculties studying tentative language (hedging) and approximation in academic writing use generic materials prepared by the teacher.
12. Lack of face validity (App. 1): Some students do not want to “waste time” dealing with materials not appropriate to their major. They expect materials tailored to their exact needs.
13. Solution: Modality detector (App. 1)
14. Solution: Modality detector (App. 1). Individualized instruction:
    - Student selects appropriate text
    - Student inputs relevant text
    - Regex identifies hedges & approximation
    - Execute command labels & highlights
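The slides do not show how the modality detector is implemented. The following is only a sketch of the "regex identifies, execute command highlights" step, assuming the word list from the tentative-language slide and `<mark>` tags for highlighting. All names here are illustrative.

```javascript
// Hedging words taken from the "Tentative language" slide. A real tool would
// presumably use a longer list and also cover inflected forms (seems, suggested).
const hedges = [
  "may", "might", "would", "can",
  "seem", "appear", "suggest",
  "perhaps", "probably", "possibly",
  "probable", "possible", "uncertain",
  "assumption", "claim", "possibility",
];

// One alternation wrapped in word boundaries: /\b(?:may|might|...)\b/gi
const hedgePattern = new RegExp("\\b(?:" + hedges.join("|") + ")\\b", "gi");

// Wrap every match in <mark> so that a browser would highlight it.
function highlightHedges(text) {
  return text.replace(hedgePattern, "<mark>$&</mark>");
}

console.log(highlightHedges("These results may suggest a possible link."));
// These results <mark>may</mark> <mark>suggest</mark> a <mark>possible</mark> link.
```

Because the match is case-insensitive, a sentence such as "We met in May." is highlighted too, which is the kind of false positive the next slide warns about.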
15. Warning: False positives (App. 1). More complex regexes reduce false positives.
16. Piles of unmarked homework (App. 2): Responding to written work takes too much time and is repetitive, since many students make the same surface-level mistakes.
17. No time to respond (App. 2). Teachers are expected to:
    - Identify the location of errors
    - Explain the errors (if necessary)
    - Correct the errors (if necessary)

    All of which take lots of time.
18. Solution: Error detector (App. 2)
    - Identification: student inputs own work; regex identifies expected errors
    - Explanation: execute command selects and displays prepared explanation
    - Correction: student corrects work and submits improved version
19. Error classification (App. 2). An ethnographic survey of the literature on writing scientific research articles revealed five key criteria (Blake & Blake, 2015):

    | Type | Description |
    | --- | --- |
    | Accuracy | factual and language errors |
    | Brevity | too many words |
    | Clarity | vague or ambiguous terms |
    | Objectivity | emotive language |
    | Formality | abbreviations, contractions, & informal terms |
20. (App. 2)
21. Specific example (App. 2)
    - Error: one of the + singular noun
    - Regex: /\bone of the\b/gi;
    - Execute: Check that the phrase “one of the” is followed by a plural noun
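As a sketch of how this rule might run, the code below finds each occurrence of the phrase and attaches the prepared explanation. The function name `flagOneOfThe` and the shape of the returned objects are assumptions for illustration. The slides do not show the detector's code.

```javascript
// Pattern and advice text taken from the slide.
const oneOfThe = /\bone of the\b/gi;
const advice = 'Check that the phrase "one of the" is followed by a plural noun.';

// Hypothetical helper: report where each occurrence starts, plus the advice.
// matchAll() needs the g flag and yields one match object per occurrence.
function flagOneOfThe(text) {
  return Array.from(text.matchAll(oneOfThe), (m) => ({
    index: m.index,
    phrase: m[0],
    advice: advice,
  }));
}

const flags = flagOneOfThe("This is one of the reason. One of the results is clear.");
console.log(flags.length); // 2
```

The pattern flags both sentences even though only the first contains an error. Deciding which is which is left to the student, in line with the slide on harnessing false positives.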
22. False positives (App. 2): harnessed in the learning process by forcing student engagement.
23. Difficult-to-read tags (App. 3). Moves: Introduction, Purpose, Method, Results, Discussion.
    - <segment features='problem;introduction;rhetorical_moves' state='active'>We address the problem of model-based object recognition.</segment>
    - <segment features='purpose;rhetorical_moves' state='active'>Our aim is to localize and recognize road vehicles from monocular images or videos in calibrated traffic scenes.</segment>
    - <segment features='method;rhetorical_moves' state='active'>A 3-D deformable vehicle model with 12 shape parameters is set up as prior information, and its pose is determined by three parameters, which are its position on the ground plane and its orientation about the vertical axis under ground-plane constraints.</segment>
    - <segment features='purpose;rhetorical_moves' state='active'>An efficient local gradient-based method is proposed to evaluate the fitness between the projection of the vehicle model and image data, which is combined into a novel evolutionary computing framework to estimate the 12 shape parameters and three pose parameters by iterative evolution.</segment>
    - <segment features='background;introduction;rhetorical_moves' state='active'>The recovery of pose parameters achieves vehicle localization, whereas the shape parameters are used for vehicle recognition.</segment>
    - <segment features='method;rhetorical_moves' state='active'>Numerous experiments are
25. Easy-to-read tags (App. 3). Moves: Introduction, Purpose, Method, Results, Discussion. http://www.jaist.ac.jp/~johnb/Movehighlighter.html
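How the move highlighter converts the tagged corpus into readable output is not shown in the slides. The sketch below illustrates one possible approach with a regex, assuming that the first entry in each `features` attribute names the rhetorical move. The label format and the function name are invented for illustration.

```javascript
// Two tagged segments copied from the "Difficult-to-read tags" slide.
const tagged =
  "<segment features='problem;introduction;rhetorical_moves' state='active'>" +
  "We address the problem of model-based object recognition.</segment> " +
  "<segment features='purpose;rhetorical_moves' state='active'>" +
  "Our aim is to localize and recognize road vehicles from monocular images " +
  "or videos in calibrated traffic scenes.</segment>";

// Capture the features attribute and the sentence inside each segment.
const segmentPattern = /<segment features='([^']*)'[^>]*>([\s\S]*?)<\/segment>/g;

// Hypothetical helper: turn each segment into "[MOVE] sentence".
function labelSegments(xml) {
  return Array.from(xml.matchAll(segmentPattern), (m) => {
    const move = m[1].split(";")[0]; // assumption: first feature names the move
    return "[" + move.toUpperCase() + "] " + m[2].trim();
  });
}

console.log(labelSegments(tagged).join("\n"));
```

A regex is adequate for flat, regular tags like these. Nested or irregular markup would call for a proper XML parser.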
26. Ideas for you and your students
    - Pronunciation: regular “ed” (/t/, /d/, /id/); th (voiced or voiceless)
    - Grammar: tenses, e.g. perfect continuous (been + ing); quantifiers: [U] much, little; [C] many, few; [U/C] lots of, a lot of
    - Vocabulary: colours (red, blue; crimson red, cobalt blue); body parts (hand, eyes, leg; hand out, eye up, leg it)
27. Regular “ed”. Pronunciation of “ed” is dictated by the sound of the preceding letter(s).
    - /id/ after d, t: /(d|t)ed\b/gi;
    - /t/ after voiceless consonants: /(s|f)ed\b/gi;
    - /d/ after voiced consonants: /(z|v)ed\b/gi;
    - /d/ after a vowel: /(ow|i|ay)ed\b/gi;
    - False positives: learned (/d/ or /id/)
    - | is a Boolean “or”, so x|y means either x or y. d|ted means d or ted, but with brackets (d|t)ed means ded or ted.
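The four patterns can be combined into a small classifier. This is a sketch only: the rule order, the `edSound` name and the choice to return `null` for uncovered endings are assumptions. The g flag is dropped because `test()` on a global regex keeps state between calls.

```javascript
// Patterns from the slide, checked in the order listed there.
const edRules = [
  { sound: "/id/", pattern: /(d|t)ed\b/i },      // after d, t
  { sound: "/t/", pattern: /(s|f)ed\b/i },       // after voiceless consonants
  { sound: "/d/", pattern: /(z|v)ed\b/i },       // after voiced consonants
  { sound: "/d/", pattern: /(ow|i|ay)ed\b/i },   // after a vowel
];

// Hypothetical helper: return the predicted sound, or null if no rule applies.
function edSound(word) {
  const rule = edRules.find((r) => r.pattern.test(word));
  return rule ? rule.sound : null;
}

for (const w of ["wanted", "passed", "loved", "played", "learned"]) {
  console.log(w, edSound(w));
}
```

The patterns look at spelling rather than sound and cover only a few endings, so a word such as "learned" falls through with no prediction here.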
28. Pronunciation of “th”. The pronunciation of “th” can be predicted by the rule that in function words the initial th is voiced.
    - /ð/ (voiced initial th): /\btha(n|t|)\b/gi; /\bthe(\b|ir|m|re|se|y)\b/gi; /\bthis\b/gi; /\btho(se|ugh|)\b/gi; /\bthus\b/gi;
    - /θ/ (voiceless initial th): /\bth/gi;
    - /t/ (th pronounced as t): /\bthomas|thames|thyme/gi;
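A sketch of how these patterns might be applied in sequence follows. Two things are assumptions rather than part of the slides: the order of the checks (exceptions first, then function words, then the voiceless default), and the grouping of the /t/ pattern as `\b(?:thomas|thames|thyme)\b` so that the word boundaries apply to all three words. The name `thSound` is illustrative.

```javascript
// Exceptions where "th" is pronounced /t/ (grouped form, see the note above).
const thAsT = /\b(?:thomas|thames|thyme)\b/i;

// Function-word patterns from the slide (g flag dropped so test() is stateless).
const voicedTh = [
  /\btha(n|t|)\b/i,
  /\bthe(\b|ir|m|re|se|y)\b/i,
  /\bthis\b/i,
  /\btho(se|ugh|)\b/i,
  /\bthus\b/i,
];

// Any other word starting with "th" is treated as voiceless.
const initialTh = /\bth/i;

// Hypothetical helper: predict the sound of an initial "th", or null if absent.
function thSound(word) {
  if (thAsT.test(word)) return "/t/";
  if (voicedTh.some((p) => p.test(word))) return "/ð/";
  if (initialTh.test(word)) return "/θ/";
  return null;
}

for (const w of ["the", "though", "think", "theory", "Thames", "cat"]) {
  console.log(w, thSound(w));
}
```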
29. References
    - Blake, J. (2012, November 28-30). Corpus-based academic written error detector. Conference proceedings of the 20th International Conference on Computers in Education. Nanyang Technological University, Singapore.
    - Blake, X. and Blake, J. (2015, January 29-31). Academic literacy: Mentor and mentee perspectives. Poster presented at 35th International Conference of ThaiTESOL, Bangkok, Thailand.
    - Morrall, A. (2000-2014). Common Error Detector. [Online tool] http://www2.elc.polyu.edu.hk/cill/errordetector.htm
30. Any questions, comments or suggestions? johnb@jaist.ac.jp