
Chi-Un Lei "Text Mining and Educational Discourse"







Usage Rights

© All Rights Reserved


Chi-Un Lei "Text Mining and Educational Discourse" Presentation Transcript

  • 1. Text Mining and Educational Discourse. Dr. Chi-Un Lei, Dept. of Electrical and Electronic Eng. LASI-HK 2014 (adapted from LASI workshop 2014)
  • 2. Words from the Speaker: “The key insight communicated through this workshop is that … If we can understand the connection between socio-psychological processes and language by means of the social signals encoded in them, we can structure computational models of language interactions more effectively.” --- Carolyn Penstein Rosé
  • 3. Outline
      • Theoretical: connection between discourse and learning
      • From rich but implicit constructs to explicit features that capture the essence for machine learning
      • Hands-on: machine learning for text extraction and classification
      • Diagram labels: Automatic Analysis of Conversation; Conversational Interventions; Positive Learning Outcomes
  • 4. Educational Discourse
      • Diagram labels: Sociolinguistics; Discourse Analysis; Language and Identity; Language Use; Machine Learning; Multi-Level Modeling; Applied Statistics; Computational Models of Discourse Analysis
  • 5. Souffle Framework
      • Analysis of discussions for learning
      • Diagram labels: Engagement; Transactive; Knowledge Integration; Person; Authority
  • 6. Transactivity
      • Building on an idea expressed earlier in a conversation
      • Using a reasoning statement
      • Example:
          • “We don't want tmax to be at 570 both for the material and [the Environment]”
          • “Well, for power and efficiency, we want a high tmax, but environmentally, we want a lower one.”
  • 7. (no text in transcript)
  • 8. (no text in transcript)
  • 9. System of Engagement
      • Showing openness to the existence of other perspectives
      • Examples:
          • Nuclear is a good choice
          • I consider nuclear to be a good choice
          • There’s no denying that nuclear is a superior choice
          • Is nuclear a good choice?
  • 10. (no text in transcript)
  • 11. (no text in transcript)
  • 12. What is machine learning?
      • Automatically or semi-automatically:
          • Inducing concepts (i.e., rules) from data
          • Finding patterns in data (for human and computer)
          • Explaining data
          • Making predictions
      • Diagram labels: Data; Learning Algorithm; Model; New Data; Classification Engine; Prediction
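The flow named in the slide's diagram labels (data, learning algorithm, model, new data, classification engine, prediction) can be sketched in a few lines. This is a toy illustration only: the word-count scorer, the function names `train` and `predict`, and the training sentences are assumptions made up for this example, not the method used in the workshop or in LightSide.

```python
from collections import Counter, defaultdict

def train(examples):
    """Learning algorithm: count how often each word appears under each label."""
    model = defaultdict(Counter)
    for text, label in examples:
        model[label].update(text.lower().split())
    return model

def predict(model, text):
    """Classification engine: pick the label whose training words best match the text."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words) for label, counts in model.items()}
    return max(scores, key=scores.get)

# Hypothetical labeled data, invented for illustration.
training_data = [
    ("cows make cheese", "farm"),
    ("cows eat grass", "farm"),
    ("hamsters eat seeds", "pets"),
    ("hamsters run on wheels", "pets"),
]

model = train(training_data)             # Data -> Learning Algorithm -> Model
print(predict(model, "cows eat seeds"))  # New Data -> Classification Engine -> Prediction
```

The model here is nothing more than per-label word counts, which makes the "garbage in, garbage out" point concrete: the predictions can only reflect whatever words happened to appear in the training data.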
  • 13. (no text in transcript)
  • 14. Keep this picture in mind…
      • Machine learning isn’t magic
      • But it can be useful for identifying meaningful patterns in your data when used properly
      • Proper use requires insight into your data
      • Otherwise, GIGO (Garbage In, Garbage Out)
      • Think like a computer!
  • 15. Machine Learning for Text Mining
      • Basic features: “Bag of Words”
      • Represent text as a vector where each position corresponds to a term
      • Vocabulary (vector positions, in order): Cheese, Cows, Eat, Hamsters, Make, Seeds
      • Examples:
          • Cows make cheese. (110010)
          • Cheese make cows. (110010)
          • Hamsters eat seeds. (001101)
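The bag-of-words vectors on this slide can be reproduced with a short sketch. The function name `bag_of_words` and the simple whitespace tokenization are assumptions for illustration; real toolkits handle tokenization and vocabulary building more carefully.

```python
def bag_of_words(text, vocabulary):
    """Return a binary vector: 1 if the vocabulary term occurs in the text, else 0."""
    # Lowercase and strip punctuation so "cheese." matches "cheese".
    tokens = {token.strip(".,!?").lower() for token in text.split()}
    return [1 if term in tokens else 0 for term in vocabulary]

# Vocabulary in the same order as the slide.
VOCABULARY = ["cheese", "cows", "eat", "hamsters", "make", "seeds"]

for sentence in ["Cows make cheese.", "Cheese make cows.", "Hamsters eat seeds."]:
    vector = bag_of_words(sentence, VOCABULARY)
    print(sentence, "".join(str(bit) for bit in vector))
```

The first two sentences map to the same vector because a bag of words discards word order, which is exactly the limitation the slide's example highlights.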
  • 16. Basic Types of Features
      • Unigram: single words (e.g. prefer, sandwich)
      • Bigram: pairs of words next to each other (e.g. eat bread)
      • Simple lexical patterns: e.g. “common denominator” versus “common multiple”
      • Punctuation: “You think the answer is 9?” vs. “You think the answer is 9.”
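A minimal sketch of how unigram, bigram, and punctuation features might be extracted from one utterance. The helper names (`tokenize`, `ngrams`, `extract_features`) and the `PUNCT_` prefix are hypothetical choices for this example, not LightSide's feature names.

```python
import re

PUNCTUATION = {".", "?", "!"}

def tokenize(text):
    # Keep words and sentence punctuation as separate tokens.
    return re.findall(r"[a-z0-9']+|[.?!]", text.lower())

def ngrams(tokens, n):
    # Join each run of n adjacent tokens with an underscore.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(text):
    tokens = tokenize(text)
    words = [t for t in tokens if t not in PUNCTUATION]
    features = set(ngrams(words, 1)) | set(ngrams(words, 2))
    # Punctuation features let a model tell a question from a statement.
    features.update("PUNCT_" + t for t in tokens if t in PUNCTUATION)
    return features

question = extract_features("You think the answer is 9?")
statement = extract_features("You think the answer is 9.")
print(sorted(question ^ statement))  # the only features that differ
```

For the slide's punctuation example, the word features of the two utterances are identical, so only the punctuation feature separates them.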
  • 17. Part of Speech (POS) Tagging
      • POS bigrams capture syntactic or stylistic information, e.g. “the answer which is …” vs. “which is the answer”
      • POS bigrams are pairs of POS tags next to each other:
          • DT_NN: “Determiner”_“Noun, singular or mass”
          • NNP_NNP: “Proper noun, singular”_“Proper noun, singular”
      • Example tag: JJR (Adjective, comparative)
      • Tag list: http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
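To see why POS bigrams carry information that word features miss, here is a small sketch using the slide's two phrases. The tags below were assigned by hand for illustration using the Penn Treebank tag set (an assumption; in practice an automatic tagger would supply them), and `pos_bigrams` is a hypothetical helper name.

```python
def pos_bigrams(tagged_tokens):
    """Return POS bigrams such as 'DT_NN' from a list of (word, tag) pairs."""
    tags = [tag for _, tag in tagged_tokens]
    return [f"{first}_{second}" for first, second in zip(tags, tags[1:])]

# Hand-tagged phrases (illustrative tags, not tagger output).
phrase_a = [("the", "DT"), ("answer", "NN"), ("which", "WDT"), ("is", "VBZ")]
phrase_b = [("which", "WDT"), ("is", "VBZ"), ("the", "DT"), ("answer", "NN")]

print(pos_bigrams(phrase_a))
print(pos_bigrams(phrase_b))
```

Both phrases contain the same four words, so their unigram features are identical, yet their POS bigram sets differ (`NN_WDT` appears only in the first, `VBZ_DT` only in the second).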
  • 18. Feature Space Customizations
      • Machine learning algorithms look for features that are good predictors, NOT features that are necessarily meaningful
      • Look for approximations, e.g. there is no need to do a complete syntactic analysis to find questions:
          • Look for question marks
          • Look for wh-terms that occur immediately before an auxiliary verb
      • Combined features
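The question-detection approximation on this slide can be sketched as a combined feature. The word lists `WH_TERMS` and `AUXILIARIES` are short, incomplete lists assumed for this example, and `question_features` is a hypothetical helper name.

```python
import re

# Small illustrative word lists (assumptions, not exhaustive).
WH_TERMS = {"what", "when", "where", "which", "who", "whom", "whose", "why", "how"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did", "can", "could",
               "will", "would", "should", "has", "have", "had"}

def question_features(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    has_question_mark = "?" in text
    # A wh-term immediately followed by an auxiliary verb, e.g. "what is".
    wh_before_aux = any(first in WH_TERMS and second in AUXILIARIES
                        for first, second in zip(tokens, tokens[1:]))
    return {
        "has_question_mark": has_question_mark,
        "wh_before_aux": wh_before_aux,
        # Combined feature: either surface cue suggests a question.
        "looks_like_question": has_question_mark or wh_before_aux,
    }

print(question_features("What is the common denominator"))
print(question_features("You think the answer is 9."))
```

Because this is an approximation and not a syntactic analysis, it will also fire on relative clauses such as "the answer which is 9". That is acceptable when the feature is still a good predictor on the data at hand.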
  • 19. LightSide
      • Easy UI
      • Feature Extraction
      • Model Building / Machine Learning
      • Error Analysis
      • Data Structuring
      • Free/open-source for adoption and extension
  • 20. Feature Extraction
  • 21. Machine Learning and Evaluation
  • 22. Recap … “The key insight communicated through this workshop is that … If we can understand the connection between socio-psychological processes and language by means of the social signals encoded in them, we can structure computational models of language interactions more effectively.” --- Carolyn Penstein Rosé
  • 23. Examples of Part of Speech Tagging
      1. CC Coordinating conjunction
      2. CD Cardinal number
      3. DT Determiner
      4. EX Existential there
      5. FW Foreign word
      6. IN Preposition or subordinating conjunction
      7. JJ Adjective
      8. JJR Adjective, comparative
      9. JJS Adjective, superlative
      10. LS List item marker
      11. MD Modal
      12. NN Noun, singular or mass
      13. NNS Noun, plural
      14. NNP Proper noun, singular
      15. NNPS Proper noun, plural
      16. PDT Predeterminer
      17. POS Possessive ending
      18. PRP Personal pronoun
      19. PRP$ Possessive pronoun
      20. RB Adverb
      21. RBR Adverb, comparative
      22. RBS Adverb, superlative
      Source: http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
  • 24. Examples of Part of Speech Tagging
      23. RP Particle
      24. SYM Symbol
      25. TO to
      26. UH Interjection
      27. VB Verb, base form
      28. VBD Verb, past tense
      29. VBG Verb, gerund/present participle
      30. VBN Verb, past participle
      31. VBP Verb, non-3rd ps. sing. present
      32. VBZ Verb, 3rd ps. sing. present
      33. WDT wh-determiner
      34. WP wh-pronoun
      35. WP$ Possessive wh-pronoun
      36. WRB wh-adverb
      Source: http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
  • 25. Findings
      • Transactivity (Berkowitz & Gibbs, 1983)
          • Moderating effect on learning (Joshi & Rosé, 2007; Russell, 2005; Kruger & Tomasello, 1986; Teasley, 1995)
          • Moderating effect on knowledge sharing in working groups (Gweon et al., 2011)
      • Engagement (Martin & White, 2005)
          • Correlational analysis: strong correlation between displayed openness of group members and articulation of reasoning (R = .72) (Dyke et al., in press)
          • Intervention study: causal effect on propensity to articulate ideas in group chats (effect size .6 standard deviations) (Kumar et al., 2011)
          • Mediating effect of idea contribution on learning in scientific inquiry (Wang et al., 2011)