© Biplab C. Debnath ICT 6522: Data Warehousing and Mining 10th August, 2016
IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, VOL. 7, NO. 3, JULY-SEPTEMBER 2014
Mining Social Media Data for Understanding
Students’ Learning Experiences
Xin Chen, Student Member, IEEE, Mihaela Vorvoreanu, and Krishna Madhavan
Presented By
Biplab Chandra Debnath
ID: 1015312004
Institute of Information and Communication Technology (IICT)
Bangladesh University of Engineering and Technology (BUET)
Contents
• Objectives
• Introduction
• Related Work
• Data Collection
• Inductive Content Analysis
• Naïve Bayes Multi-Label Classifier
• Comparison Experiment
• Detect Students’ Problems From Purdue Data Set
• Limitations and Future Work
• Conclusion
Objectives
• Demonstrating a workflow of social media data sense-making for educational data mining.
• Integrating qualitative analysis with large-scale data mining techniques.
• Exploring engineering students’ informal conversations on Twitter.
• Understanding the issues and problems students encounter in their learning experiences.
Introduction
Related Work
• Public Discourse on the Web
    • Goffman’s theory (the notion of front-stage and back-stage of people’s social performances)
• Mining Twitter Data
    • Analysis of tweets with the hashtag #iranElection
    • Popular classification models (decision tree, logistic regression, maximum entropy, boosting, SVM)
• Learning Analytics and Educational Data Mining
    • CMS, VLE, EDM (blackboard.com)
    • Identifying students’ academic performance
Data Collection
• Radian6 (http://www.salesforce.com/)
• Twitter APIs
• Keywords: engineer, students, campus, class, homework, professor, and lab.
• The Twitter hashtag #engineeringProblems occurred most frequently.
• 25,284 tweets with the hashtag #engineeringProblems, posted from 10,239 unique Twitter accounts.
• Only 2,785 tweets considered.
• 39,095 tweets with the hashtag #engineeringProblems, posted from 5,592 unique Twitter accounts (Purdue University)
Development of Categories
• Non-mutually exclusive categories
Naïve Bayes Multi-Label Classification
• The Naïve Bayes classifier is effective on this data set compared with other multi-label classifiers.
• Text Pre-Processing
• Naïve Bayes multi-label classifier
• Evaluation Measures for Multi-Label Classifier
• Classification Result
Text Pre-Processing
• Remove hashtags, substitute negative words with a common token, and collapse repeating letters (huuungryyy → hungry)
• Stem with the Krovetz stemmer in the Lemur information retrieval toolkit
• Remove common stop words, but keep extent words (much, more, all, always, still, only)
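The cleaning steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list is a tiny made-up sample (not the Lemur list), hashtags are dropped whole, and stemming is left out because the Krovetz stemmer lives in the Lemur toolkit rather than the Python standard library. The function name `preprocess` is hypothetical.

```python
import re

# Illustrative stop-word sample (an assumption, not the Lemur stop-word list).
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "in", "to", "of", "and", "i", "so"}

def preprocess(tweet: str) -> list[str]:
    """Lower-case a tweet, strip links, mentions and hashtags, collapse
    runs of three or more identical letters, and drop stop words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)      # remove links
    text = re.sub(r"[@#]\w+", " ", text)           # remove mentions and hashtags
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)   # huuungryyy -> hungry
    tokens = re.findall(r"[a-z]+", text)           # keep letter-only tokens
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("So huuungryyy in the lab #engineeringProblems"))
```

Doubled letters such as the "ll" in "all" are left alone, because only runs of three or more letters are collapsed.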
Naïve Bayes multi-label classifier
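One common way to build a multi-label Naïve Bayes classifier is to train one binary model per category and assign every category whose model prefers "in the category". The sketch below assumes that formulation. The toy tweets, the category names, the Laplace smoothing, and the names `BinaryNB`, `train_multilabel`, and `predict_labels` are all illustrative assumptions, not the presented paper's exact setup.

```python
import math
from collections import Counter

class BinaryNB:
    """Multinomial Naive Bayes for one category versus the rest."""

    def fit(self, docs, flags):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for tokens, flag in zip(docs, flags):
            self.counts[flag].update(tokens)
            self.n_docs[flag] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def log_score(self, tokens, flag):
        total = sum(self.counts[flag].values())
        # Laplace-smoothed log prior plus log likelihood of each known word.
        score = math.log((self.n_docs[flag] + 1) / (sum(self.n_docs.values()) + 2))
        for t in tokens:
            if t in self.vocab:
                score += math.log((self.counts[flag][t] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, tokens):
        return self.log_score(tokens, True) > self.log_score(tokens, False)

def train_multilabel(docs, label_sets, categories):
    """Train one binary model per category (labels are not mutually exclusive)."""
    return {c: BinaryNB().fit(docs, [c in ls for ls in label_sets]) for c in categories}

def predict_labels(models, tokens):
    """Return every category whose binary model says 'in the category'."""
    return {c for c, m in models.items() if m.predict(tokens)}

# Toy training data (made up for illustration).
docs = [["no", "sleep", "exam"], ["exam", "homework", "all", "night"],
        ["no", "friends", "weekend"], ["sleep", "late", "lab"]]
label_sets = [{"sleep", "study load"}, {"study load"}, {"social"}, {"sleep"}]
models = train_multilabel(docs, label_sets, ["sleep", "study load", "social"])
print(predict_labels(models, ["exam", "homework"]))
```

Because each category is decided independently, a tweet can receive several labels or none at all.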
Example Based Classification Measures
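Example-based measures score each tweet by comparing its true label set Y with its predicted label set Z, then average over all tweets. The sketch below uses the standard set-based definitions (accuracy as |Y∩Z|/|Y∪Z|, precision as |Y∩Z|/|Z|, recall as |Y∩Z|/|Y|, F1 as 2|Y∩Z|/(|Y|+|Z|)). Treating an empty denominator as 0 is an assumption made here, and the function names are hypothetical.

```python
def _ratio(num, den):
    """Safe division; an empty denominator counts as 0 (an assumption)."""
    return num / den if den else 0.0

def example_based_scores(true_sets, pred_sets):
    """Average per-tweet accuracy, precision, recall and F1 over all tweets."""
    n = len(true_sets)
    acc = prec = rec = f1 = 0.0
    for y, z in zip(true_sets, pred_sets):
        inter = len(y & z)
        acc += _ratio(inter, len(y | z))
        prec += _ratio(inter, len(z))
        rec += _ratio(inter, len(y))
        f1 += _ratio(2 * inter, len(y) + len(z))
    return {"accuracy": acc / n, "precision": prec / n,
            "recall": rec / n, "f1": f1 / n}

# Two toy tweets (made up for illustration).
true_sets = [{"sleep", "study load"}, {"social"}]
pred_sets = [{"sleep"}, {"social", "sleep"}]
scores = example_based_scores(true_sets, pred_sets)
print(scores)
```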
Label-Based Evaluation Measures
Macro-averaged F1 is higher for classifiers that work better on smaller categories.
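The difference between the two averages shows up clearly on a toy example. Micro-averaging pools the true-positive, false-positive, and false-negative counts of all categories before computing F1, so large categories dominate. Macro-averaging computes F1 per category and then takes the plain mean, so a small category weighs as much as a large one. The data and function names below are made up for illustration.

```python
def label_counts(true_sets, pred_sets, label):
    """True positives, false positives and false negatives for one label."""
    pairs = list(zip(true_sets, pred_sets))
    tp = sum(label in y and label in z for y, z in pairs)
    fp = sum(label not in y and label in z for y, z in pairs)
    fn = sum(label in y and label not in z for y, z in pairs)
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    den = 2 * tp + fp + fn
    return 2 * tp / den if den else 0.0

def micro_macro_f1(true_sets, pred_sets, labels):
    counts = [label_counts(true_sets, pred_sets, lab) for lab in labels]
    macro = sum(f1_from_counts(*c) for c in counts) / len(labels)
    micro = f1_from_counts(*(sum(col) for col in zip(*counts)))
    return micro, macro

# Toy data: one large and one small category. The classifier labels
# every tweet "study load" and therefore misses the small category.
true_sets = [{"study load"}] * 8 + [{"diversity"}] * 2
pred_sets = [{"study load"}] * 10
micro, macro = micro_macro_f1(true_sets, pred_sets, ["study load", "diversity"])
print(micro, macro)
```

Here the micro-averaged F1 is 0.8 while the macro-averaged F1 is only about 0.44, because the missed small category pulls the macro average down.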
Label-based accuracy is not a very effective measure because it does not account for label imbalance.
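A small numeric illustration of why accuracy misleads under imbalance: if only 5 of 100 tweets belong to a category, a classifier that answers "not in the category" every time is still 95% accurate while finding none of the relevant tweets. The numbers and function names are hypothetical.

```python
def label_accuracy(true_flags, pred_flags):
    """Share of tweets whose in/out decision for one label is correct."""
    return sum(t == p for t, p in zip(true_flags, pred_flags)) / len(true_flags)

def label_recall(true_flags, pred_flags):
    """Share of tweets truly in the category that the classifier found."""
    positives = sum(true_flags)
    hits = sum(t and p for t, p in zip(true_flags, pred_flags))
    return hits / positives if positives else 0.0

# 5 tweets in the category, 95 outside; the classifier always says "out".
true_flags = [True] * 5 + [False] * 95
pred_flags = [False] * 100

print(label_accuracy(true_flags, pred_flags))  # 0.95
print(label_recall(true_flags, pred_flags))    # 0.0
```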
Comparison Experiment: SVM and M3L
• Same training and testing data sets
• The one-versus-all SVM multi-label classifier classified all tweets as “not in the category” for every category.
• The Max-Margin Multi-Label (M3L) classifier takes label correlation into account.
• Its performance is better than that of the simplistic one-versus-all SVM classifier.
• It is still not as good as the Naïve Bayes classifier, because SVM is not a probabilistic model.
Detect Students’ Problems From Purdue Data Set
Limitations
• First, not all students are active on Twitter.
• Second, only the negative aspects of learning experiences were considered, not the positive ones.
• Third, only the prominent themes with a relatively large number of tweets in the data were identified.
• Fourth, the qualitative analysis reveals that there are correlations among the themes.
Future Work
• First, the “manipulation” of personal image online may need to be taken into consideration in future work.
• Second, future work can compare both the good and bad things to investigate the tradeoffs with which students struggle.
• Third, future work can design more sophisticated algorithms to reveal the hidden information in the “long tail”.
• Fourth, future work could specifically address the correlations among these student problems.
Conclusion
• Through a qualitative content analysis, we found that engineering students are largely struggling with the heavy study load, and are not able to manage it successfully.
• Heavy study load leads to many consequences including lack of social engagement, sleep problems, and other psychological and physical health problems.
• This detector can be applied as a monitoring mechanism to identify at-risk students.
