
Towards Automatic Analysis of Online Discussions among Hong Kong Students



HU, Xiao (University of Hong Kong)
Author(s) bear(s) the responsibility in case of any infringement of the Intellectual Property Rights of third parties.
CITE was notified by the author(s) that if the presentation slides contain any personal particulars, records and personal data (as defined in the Personal Data (Privacy) Ordinance) such as names, email addresses, photos of students, etc, the author(s) have/has obtained the corresponding person's consent.


  1. Towards Automatic Analysis of Online Discussions Among Hong Kong Students
     Xiao Hu, University of Hong Kong
     CITE Research Symposium 2013, May 12, 2013
  2. Outline
     • Goals and Purposes
     • Data Mining and Applications to Online Discussions
       • Classification
       • Association Rule Mining
     • Findings
     • More questions to answer
     • Bridging research and teaching
  3. Goals and Purposes
     • Online discussions are widely used in education
       • Effective for communication and collaboration
     • Need tools to monitor online discussions
     • Data mining may help (semi-)automatically identify various patterns in online discussions, for example:
       • Threads that need interventions
       • Outcome predictions
       • Role identification (e.g., question raiser, answer provider, etc.)
       • Network analysis of student groups
       • Assessment of discussion quality
       • ...
  4. This Study
     • How effective is it to mine online discussions of HK students?
     • A case study on 1,965 discussion posts
       • on the subject of global warming
       • collected from five primary or secondary schools in Hong Kong from years 2006-2009
       • 383 discussion threads involving 1 to 21 participants
     • Two commonly used data mining techniques
       • Classification
       • Association rule mining
  5. What is Data Mining?
     • To identify patterns (or to prove there are no patterns) in a dataset
     • DM is NOT querying databases
       • Where you know what you are looking for
       • E.g., total sales in the past three years
     • DM is NOT statistical testing
       • Where you know the hypotheses
       • E.g., H0: the means of two groups are equal
     • DM is discovery-based
       • Find out unknown patterns, generate hypotheses
     • DM is iterative
       • Exhaustively explore very large data sets
  6. Data Mining: Classification
     • Functionality: to assign one of a number of class labels to each instance of your data
     • Examples of classification tasks:
       • Predicting tumor cells as benign or malignant
       • Classifying credit card transactions as legitimate or fraudulent
       • Categorizing news stories as finance, weather, entertainment, sports, etc.
       • Categorizing library materials by catalogs
       • Predicting whether a post in an online forum will get replies or not
  7. How Does Classification Work?
     • Given a collection of data (training set)
       • Each instance contains a set of attributes; one of the attributes is the class label
     • Find (calculate) a model for the class label as a function of the values of the other attributes
     • Goal: previously unseen data can then be fed to the model, and the model assigns a class label as accurately as possible
     • Performance measure: accuracy
       • How many instances are correctly classified
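The accuracy measure named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the study; the function name `accuracy` and the five labels below are made up for the example.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of instances whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for five held-out posts (not data from the study).
true_labels = ["reply", "no_reply", "reply", "reply", "no_reply"]
predicted = ["reply", "reply", "reply", "no_reply", "no_reply"]

# Three of the five predictions match the true labels.
print(accuracy(true_labels, predicted))
```

Accuracy is normally reported on instances the model did not see during training, which is why the example speaks of held-out posts.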
  8. An Illustrative Example (1)
     Training Data:
       NAME  RANK            YEARS  TENURED
       Mike  Assistant Prof  3      no
       Mary  Assistant Prof  7      yes
       Bill  Professor       2      yes
       Jim   Associate Prof  7      yes
       Dave  Assistant Prof  6      no
       Anne  Associate Prof  3      no
     Training Data -> Classification Algorithms -> Classifier (Model):
       IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
  9. An Illustrative Example (2)
     Classifier (Model), produced by the classification algorithms:
       IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
     Unseen Data: (Jeff, Professor, 4)
     Tenured?
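The rule from the two example slides can be checked directly. The sketch below hard-codes the rule as a Python function (the name `predict_tenured` is an assumption for illustration), replays it on the six training rows, and then applies it to the unseen instance (Jeff, Professor, 4).

```python
# Training rows from the slide: (name, rank, years, tenured).
TRAINING_DATA = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]


def predict_tenured(rank, years):
    """The slide's rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"


# How many training rows does the rule label correctly?
correct = sum(
    predict_tenured(rank, years) == tenured
    for _, rank, years, tenured in TRAINING_DATA
)
training_accuracy = correct / len(TRAINING_DATA)

# Apply the model to the unseen instance from the second example slide.
jeff = predict_tenured("Professor", 4)

print(training_accuracy, jeff)
```

The rule reproduces every label in the training table, and it labels Jeff as tenured because his rank is Professor even though his years do not exceed 6.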
  10. Classifying Online Discussions (1)
     • Task 1: threads with one vs. many participants
       • To predict whether a post belongs to a thread involving only one participant or a thread involving many (> 14) participants
     • Attributes used to build the classification model
       • Words in the posts: individual words (unigrams) and two consecutive words (bigrams)
     • Classification algorithm: Naive Bayesian
       • Empirically effective in text categorization
     • Performance: 79.07%
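A small sketch of the approach described on this slide: unigram and bigram attributes fed to a Naive Bayes classifier. The slides do not name the software used, so this is a plain-Python multinomial Naive Bayes with add-one smoothing written for illustration. The class `NaiveBayes`, the helper `features`, and the four toy posts are assumptions, not the study's data or code.

```python
import math
from collections import Counter, defaultdict


def features(text):
    """Lowercased unigrams followed by bigrams of consecutive words."""
    words = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(features(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for f in features(text):
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy posts, labeled by thread type (not data from the study).
texts = [
    "carbon dioxide raises the temperature",
    "global warming melts the ice",
    "i agree with you",
    "yes i think so too",
]
labels = ["one", "one", "many", "many"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("i agree"), model.predict("carbon dioxide"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a feature that is absent from one class from zeroing out that class's score.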
  11. Classifying Online Discussions (2)
     • Task 2: initial posts with vs. without replies
       • To predict whether an initial post is likely to get replies or not
     • Attributes used to build the classification model
       • Words in the posts: individual words (unigrams) and two consecutive words (bigrams)
     • Classification algorithm: Naive Bayesian
       • Empirically effective in text categorization
     • Performance: 64%
     • Need to look deeper: mine patterns in each category
  12. Data Mining: Association Rules
     • Functionality: to find associative relations between patterns frequently occurring in your data
       • {Pattern A} => {Pattern B} with a certain probability
     • Examples of association rule mining tasks:
       • Basket (shopping cart) analysis: customers buying product A often also buy product B
       • Medical diagnosis: a patient with symptoms A is likely to have disease B
       • Protein sequences: the appearance of amino acid A indicates a greater chance of also having amino acid C
       • Online discussions: a post with word or phrase A is likely to be in class B
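The "certain probability" attached to a rule {A} => {B} is commonly expressed as the rule's confidence, computed from the support of the item sets involved. The sketch below shows both measures on a made-up shopping-basket example. The function names and the four baskets are assumptions for illustration, and the slides do not state which measures the study used.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0  # the rule never fires, so there is no evidence for it
    both = support(transactions, set(antecedent) | set(consequent))
    return both / antecedent_support


# Hypothetical shopping baskets (illustrative only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule {bread} => {milk}: bread appears in 3 of 4 baskets, bread and milk together in 2.
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

For online discussions the same computation applies with each post treated as a transaction whose items are its words or phrases plus its class label.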
  13. Mining Association Rules from Online Discussions (1)
     • Task 1: Words and phrases strongly associated with threads with one or many participants
       Rank  One participant  Many participants
       1     dioxide          i agree
       2     carbon dioxide   agree
       3     carbon           i
       4     temperature      greenhouse gases
       5     global warming   i think
       6     global           think
       7     warming          yes
       8     power            carbon dioxide
       9     air              global warming
       10    water            yeah
  14. Mining Association Rules from Online Discussions (2)
     • Task 2: Words and phrases strongly associated with initial posts with or without replies
       Rank  Has no reply       Has replies
       1     global warming     protect
       2     earth’s            melt
       3     global             world
       4     warming            warming
       5     earth              sea
       6     s                  i
       7     greenhouse         ice
       8     effect             rise
       9     gases              global warming
       10    greenhouse effect  global
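One way to produce ranked lists like the two tables above is to score each word or phrase by the confidence of the rule {term} => {class} and keep only terms that occur often enough. The slides do not say which measure or thresholds produced the rankings, so the sketch below is an assumption throughout: the helper names, the `min_count` threshold, and the five toy posts are all hypothetical.

```python
from collections import Counter


def terms(text):
    """Distinct unigrams and bigrams in a post."""
    words = text.lower().split()
    return set(words) | {" ".join(pair) for pair in zip(words, words[1:])}


def rank_terms(posts, target, min_count=2):
    """Rank terms by the confidence of the rule {term} => {target class}.

    posts: list of (text, label) pairs.
    Only terms found in at least `min_count` posts of the target class are kept.
    """
    total = Counter()
    in_target = Counter()
    for text, label in posts:
        for term in terms(text):
            total[term] += 1
            if label == target:
                in_target[term] += 1
    scored = [
        (term, in_target[term] / total[term])
        for term in total
        if in_target[term] >= min_count
    ]
    # Highest confidence first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical initial posts labeled by whether they drew replies.
posts = [
    ("we must protect the ice", "replies"),
    ("protect the sea", "replies"),
    ("the ice will melt", "replies"),
    ("global warming is the greenhouse effect", "no_reply"),
    ("global warming and the greenhouse effect", "no_reply"),
]

for term, conf in rank_terms(posts, "replies"):
    print(term, round(conf, 2))
```

The minimum-count filter plays the role of a support threshold: without it, a term that appears in a single post would score a confidence of 1.0 and crowd out more reliable terms.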
  15. Findings and Future Work
     • Data mining techniques were able to find patterns in online discussions among Hong Kong students
     • It was feasible to distinguish threads and posts in contrasting categories
     • The same techniques can be applied to distinguish
       • Shallow and deep discussions (depth of threads)
       • Confusion level of posts (needs annotations on training data)
       • Speech acts of posts (needs annotations on training data)
       • Emotions in the posts (needs annotations on training data)
  16. Integrating Research and Teaching
     • Both data mining techniques are discussed and practiced in the Data Mining course in the Bachelor of Science in Information Management (BSIM 0018)
     • The tool used in this project is also taught in the course
     • Projects like this can be students’ course projects,
  17. Thank you!
     Questions, comments, and suggestions are appreciated!
     Xiao Hu: