Personalised statistical writing analysis

PowerPoint slides from JAECS 2013, Sendai, Japan.

  1. Personalised statistical writing analysis
     John Blake, Japan Advanced Institute of Science and Technology
  2. Overview
     • Introduction: context and impetus; focus and process
     • Five aspects: statistical analysis
     • Personalised writing analysis: sample extracts
     • Interview survey
     • Future direction
  3. Context
     • Proofreading for faculty
     • Writing assistance for PhD candidates
     [Chart labels: 70%; 50% science]
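  4. Impetus
     An exchange of 21 emails about one research article (RA), covering various points, including:
     • "I would like to use 'minor scary incident' consistently."
     • "I would like to use 'minor scary incident' consistently rather than 'near miss'."
     • "I asked the submission venue. 'Near accident' seems to be the usual term, so I have revised it to that."
     • "I changed it to 'near-miss incident'. Professor …. suggested that I follow the instructions."
     • "I changed every 'Near miss incident' to 'Near miss incidents'."
     Progression of the term: minor scary incident → near-miss incident (Japanese: ヒヤリ・ハット, hiyari-hatto)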
  5. Focus
     Enable research articles to meet the generic expectations of:
     • Accuracy, by being factually correct
     • Clarity, by avoiding ambiguity
     • Formality, by adopting an appropriate style
     Rhetorical structure, logic, originality, flawed methods, etc. are important, but…
  6. Five aspects of generic integrity
     1. Vocabulary fit
     2. Readability
     3. Word type balance
     4. Style and usage
     5. Lexicogrammatical errors
     Plus: summary statistics
     Bhatia, V. K. (1993). Analysing genre: Language use in professional settings. London: Longman.
  7. Process for each research article
     • Create a target corpus (TC)
     • Analyse the RA and the TC
     • Identify errors in the RA
     • Compile ratios where possible
     • Create a feedback document
  8. Five aspects
     • Vocabulary fit: keyness of the RA and the TC
     • Readability: readability statistics of the RA and the TC
     • Word type balance: ratio of GSL (General Service List), AWL (Academic Word List) and off-list words in the RA and the TC
     • Style and usage: markedness, modality, register
     • Lexicogrammar: vocabulary and grammatical errors
  9. Aspect 1: Vocabulary fit
     • Scott & Tribble (2006, p. 56): keyness "[is what a text] boils down to"
     • Hyland (2011): paper-journal fit
     Key words, prepared using AntConc 3.2.4w with the Brown Corpus as reference:
     • TC: firm, knowledge, market, international, foreign, performance, research, variables, markets, countries, export, country, relationship, business, model
     • RA: organizational, TMSs, coordination, DOPPO, expertise, interactions, mechanisms, BLOCK, employee, leader, team, coordinate, informal, information, management
     TC = 243 RAs, c. 2.1 million words; RA = 10k words
     Hyland, K. (2011). Welcome to the Machine: Thoughts on writing for scholarly publication. Journal of Second Language Teaching and Research, 1(1), 58-68.
     Scott, M., & Tribble, C. (2006). Textual patterns: Key words and corpus analysis in language education. Amsterdam/Philadelphia: John Benjamins.
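The key-word lists above were produced in AntConc. As a rough illustration of what such a keyness comparison computes, the Python sketch below ranks words by the log-likelihood (G2) statistic, one widely used keyness measure. The tokenizer, the function names and the toy counts in the test are illustrative assumptions, not a reproduction of the AntConc settings used for the slide.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> tuple[Counter, int]:
    """Lower-case word tokens and their total count (a deliberately simple tokenizer)."""
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    return Counter(tokens), len(tokens)


def log_likelihood(freq_study: int, size_study: int, freq_ref: int, size_ref: int) -> float:
    """Log-likelihood (G2) keyness of one word: study text vs. reference corpus."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common (per word) in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # by convention 0 * log(0) contributes nothing
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def key_words(study: Counter, study_size: int, ref: Counter, ref_size: int, top_n: int = 15) -> list[str]:
    """Positive key words: relatively more frequent in the study text, ranked by G2."""
    scores = {}
    for word, f_study in study.items():
        f_ref = ref.get(word, 0)
        if f_study / study_size > f_ref / ref_size:
            scores[word] = log_likelihood(f_study, study_size, f_ref, ref_size)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With two hypothetical strings `ra_text` and `reference_text`, the call would be `key_words(*word_counts(ra_text), *word_counts(reference_text))`. Only positive key words are kept, since the slide lists words that are characteristic of each text rather than words it underuses.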
  10. [Word clouds of the TC and RA key words, prepared using Wordle; RA = 10k words]
  11. Aspect 2: Readability
     [Bar chart (scale 0-25): Gunning fog index, Flesch-Kincaid grade level and mean sentence length, draft vs. target]
     • Bogert (1985) & McClure (1987): factors affecting readability
     • Gilquin & Paquot (2008): learner academic writing is rather "chatty"
     • Research articles tend to have a higher reading difficulty.
     Bogert, J. (1985). In defense of the Fog Index. Business Communication Quarterly, 48(2), 9-12.
     Gilquin, G., & Paquot, M. (2008). Too chatty: Learner academic writing and register variation. English Text Construction, 1(1), 41-61.
     McClure, G. (1987). Readability formulas: Useful or useless? IEEE Transactions on Professional Communication, 30(1), 12-15.
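For readers unfamiliar with the measures in the chart, the sketch below computes mean sentence length, the Gunning fog index, 0.4 × (words/sentences + 100 × complex words/words), and the Flesch-Kincaid grade level, 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The syllable counter and sentence splitter are crude heuristics assumed for illustration; the standard fog index also excludes some three-syllable words (such as proper nouns), which this sketch does not.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict[str, float]:
    """Mean sentence length, Gunning fog index and Flesch-Kincaid grade level."""
    # Naive sentence split on terminal punctuation; abbreviations such as
    # "e.g." will be miscounted as sentence breaks.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    syllables = [count_syllables(w) for w in words]
    # Simplification: every word of three or more syllables counts as "complex".
    complex_words = sum(1 for s in syllables if s >= 3)
    words_per_sentence = len(words) / len(sentences)
    return {
        "mean_sentence_length": words_per_sentence,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_words / len(words)),
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * sum(syllables) / len(words)
        - 15.59,
    }
```

Because both indices grow with sentence length, a draft whose mean sentence length is well below the target (as for subject 1 later in the deck) can still match the target on the fog index if it uses more long words.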
  12. Aspect 3: Word type balance
     Coverage of academic text by vocabulary level (Nation, 2001, p. 17):
       1st 1000 words: 73.5%
       2nd 1000 words: 4.6%
       AWL: 8.5%
       Other: 13.3%
     RA analysed by Web VP Classic v4 (Cobb, 2013):
       First 2k words: 69%
       AWL: 16%
       Off-list: 15%
     Used in EAP courses at PolyU and CityU in Hong Kong
     Cobb, T. (2013). Web Vocabprofile. www.lextutor.ca/vp/
     Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
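A word type profile like the one above can be approximated by assigning each token to the first list that contains it. This sketch assumes the caller supplies the word lists (the tiny sets in the test are hypothetical), and it matches exact word forms, whereas VocabProfile works with word families, so its percentages would differ.

```python
import re
from collections import Counter


def word_type_balance(text: str, first_1k: set[str], second_1k: set[str], awl: set[str]) -> dict[str, float]:
    """Percentage of tokens in the 1st 1000 list, 2nd 1000 list, AWL, or none (off-list)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    bands = Counter()
    for token in tokens:
        # Each token is assigned to the first list that contains it.
        if token in first_1k:
            bands["1k"] += 1
        elif token in second_1k:
            bands["2k"] += 1
        elif token in awl:
            bands["AWL"] += 1
        else:
            bands["off-list"] += 1
    return {band: 100 * bands[band] / len(tokens) for band in ("1k", "2k", "AWL", "off-list")}
```

Running the same function on the draft and on the target corpus gives the paired "Yours %" and "Target %" columns used in the personalised feedback.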
  13. Aspect 4: Style and usage errors
     Marked usage: "People provide first"; ratio: 0:9 (COCA); suggestion: "People first provide"
     • Hyland (1998): hedging
     • Robb (2003): "Google as a quick 'n' dirty corpus tool"
     Corpora: IS, KS, MS, BNC, COCA, WAC
     Hyland, K. (1998). Hedging in scientific research articles. Amsterdam: John Benjamins.
     Robb, T. (2003). Google as a quick 'n' dirty corpus tool. TESL-EJ, 7(2).
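The ratios on this slide compare how often the marked form and the suggested form occur in a reference corpus. The corpora named here are queried through their own interfaces; purely as an assumption for illustration, the sketch below counts both phrases in a local plain-text sample instead and reports the result as marked:suggested.

```python
import re


def phrase_count(corpus_text: str, phrase: str) -> int:
    """Case-insensitive count of a phrase, allowing any whitespace between its words."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, corpus_text, flags=re.IGNORECASE))


def usage_ratio(corpus_text: str, marked: str, suggested: str) -> str:
    """Marked:suggested frequency ratio, in the format used on the slides (e.g. 0:9)."""
    return f"{phrase_count(corpus_text, marked)}:{phrase_count(corpus_text, suggested)}"
```

The word-boundary anchors stop "engage in" from also matching inside "engage into", which matters for pairs like the one in subject 7's feedback.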
  14. Aspect 5: Lexicogrammatical errors
     Grammatical or vocabulary errors (incorrect form → correct form; comment):
       1. Taking account differences → Taking account of differences; preposition
       2. this study answers to two questions → this study answers two questions; answer to s.b. / answer s.th.
       3. former employee → a former employee; employee [singular]
       4. to participate to this study → to participate in this study; collocation (participate in)
       5. emphasis is given on XX → emphasis is placed on XX; collocation (give to / place on)
       6. for being responsible → to be responsible; general vs. specific purpose
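The errors in this table were identified by hand. Recurring ones could, hypothetically, be flagged automatically with simple pattern rules; the sketch below encodes rows 1, 4 and 5 as regular expressions. The rule list and the function name are illustrative assumptions, and rules of this kind only catch errors that have already been seen.

```python
import re

# Hypothetical rules taken from rows 1, 4 and 5 of the table:
# (pattern for the incorrect form, suggested correction, comment).
RULES = [
    (r"\btaking account(?!\s+of\b)", "taking account of", "preposition"),
    (r"\bparticipate to\b", "participate in", "collocation (participate in)"),
    (r"\bemphasis is given on\b", "emphasis is placed on", "collocation (give to / place on)"),
]


def flag_errors(text: str) -> list[tuple[str, str, str]]:
    """Return (matched text, correction, comment) for every rule that fires."""
    findings = []
    for pattern, correction, comment in RULES:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            findings.append((match.group(0), correction, comment))
    return findings
```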
  15. Summary statistics
     • Based on requests for an easy-to-understand evaluation
     • Caveat: subjective evaluations disguised as statistics
  16. Personalised writing analysis: selected statistics for subject 1
     Readability (yours / target):
       Gunning fog index: 13.2 / 13.2
       Mean sentence length: 15.49 / 19.37
       Mean number of clauses per sentence: 1.19 / 1.54
       Lexical density: 0.63 / 0.57
     Word type balance (yours % / target %):
       1k words: 68.58 / 74.39
       2k words: 6.69 / 5.29
       AWL: 16.36 / 7.67
       Off-list words: 8.36 / 12.65
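Lexical density in this table is the proportion of content words among all words. A minimal sketch, assuming a short hypothetical function-word list stands in for a proper part-of-speech analysis, computes it and reports the gap between a draft and the target; the test reuses two of subject 1's values from the table.

```python
import re

# Hypothetical, deliberately short function-word list; a real analysis would use a
# fuller list or a part-of-speech tagger to separate function and content words.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "by", "as",
    "and", "or", "but", "is", "are", "was", "were", "be", "been", "it", "this",
    "that", "these", "those", "we", "they", "he", "she", "i", "you",
}


def lexical_density(text: str) -> float:
    """Share of tokens that are not in FUNCTION_WORDS (treated as content words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def compare(yours: dict[str, float], target: dict[str, float]) -> dict[str, float]:
    """Difference (yours minus target) for each statistic, rounded to 2 d.p."""
    return {name: round(yours[name] - target[name], 2) for name in yours}
```

A negative gap for mean sentence length and a positive one for lexical density, as here, suggests short but densely packed sentences relative to the target corpus.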
  17. Personalised writing analysis: selected statistics for subject 4
     Style and usage (sentence; ratio; comment or correction):
       1. minor scary incidents; 1:58,700 (WAC); near-miss incidents
       2. falling-accident; 0:19 (COCA); slips, trips and falls OR falling objects
       3. a medical examination by interview; 1:525 (WAC), 0:1 (COCA); a medical consultation
       4. According to sex; 1:18 (WAC); According to the gender
       5. 175 indoor workers; n/a; Use "One hundred and …."
       6. Tomio, T. (1995) proposes; n/a; Omit initials in in-text citations unless …
  18. Personalised writing analysis: selected statistics for subject 7
     Style and usage (sentence; ratio; comment or correction):
       1. people provide first their expertise …; 0:9 (COCA); people first provide their expertise …
       2. XX also engage into XX; 1:9000 (COCA); XX also engage in XX
       3. The XX structure limits become; n/a; Use "limits" for boundaries and "limitations" for restrictions/inabilities
       4. future studies are able to; n/a; Use "may be" to show uncertainty
       5. employee simultaneous participation; 0:5 (WAC); simultaneous participation of employees
  19. Interview survey
     • Interviewer: me
     • Subjects: 4 faculty, 1 PhD candidate (5 participants)
     • Nationalities: 3 Japanese, 2 non-Japanese
     • Interview time: 30 minutes
     • Location: private office on campus
     • Dates of interviews: June to July 2013
     • Semi-structured interviews, e.g. "What revisions did you make to your paper since…..?" and "How can I make the feedback more useful?"
  20. Survey results
     • Explanatory notes: too long
     • Key word lists: couldn't understand
     • Three readability scores: too complex
     • Raw ratios: too difficult, e.g. 47:211,120 (approximately 1:4500)
     • Lexico-grammatical errors
     • Word type balance
     • Ratios for style and usage
  21. Incremental improvements (made)
     1. Create a summary statistics scorecard
     2. Use a word tag cloud for vocabulary fit
     3. Shorten explanatory notes
     4. Simplify and approximate ratios
     5. Show word type balance graphically with percentages
     6. Select the "most useful" readability measure(s): mean sentence and word length?
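Interviewees found raw ratios such as 47:211,120 hard to read, which is why improvement 4 approximates them (for example as 1:4500). One way to automate this, sketched below under the assumption that two significant figures are enough, is to divide the larger count by the smaller and round; the function names are hypothetical.

```python
from math import floor, log10


def round_sig(x: float, sig: int = 2) -> float:
    """Round a positive number to `sig` significant figures."""
    return round(x, sig - 1 - floor(log10(x)))


def _fmt(x: float) -> str:
    # Drop a trailing ".0" so 4500.0 is shown as 4500.
    return str(int(x)) if float(x).is_integer() else str(x)


def simplify_ratio(a: int, b: int, sig: int = 2) -> str:
    """Turn a raw ratio a:b into 1:N (or N:1) with N rounded to `sig` significant figures."""
    if a == 0 or b == 0:
        return f"{a}:{b}"  # nothing to simplify, e.g. 0:9
    if a <= b:
        return f"1:{_fmt(round_sig(b / a, sig))}"
    return f"{_fmt(round_sig(a / b, sig))}:1"
```

For example, `simplify_ratio(47, 211120)` gives "1:4500", while zero counts such as 0:9 are left unchanged because they are already easy to read.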
  22. Future developments
     • Integration of the metrics into a one-stop online portal where researchers can submit drafts (thanks to a reviewer for the idea)
     • Statistical comparison of draft and published versions to evaluate the success of the feedback
  23. Any questions, suggestions or comments?
     John Blake, johnb@jaist.ac.jp
