Data mining on social networks for students learning experiences

© Biplab C. Debnath ICT 6522: Data Warehousing and Mining 10th August, 2016
IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, VOL. 7, NO. 3, JULY-SEPTEMBER 2014
Mining Social Media Data for Understanding
Students’ Learning Experiences
Xin Chen, Student Member, IEEE, Mihaela Vorvoreanu, and Krishna Madha
Presented By
Biplab Chandra Debnath
ID: 1015312004
Institute of Information and Communication Technology (IICT)
Bangladesh University of Engineering and Technology (BUET)

Contents
 Objectives
 Introduction
 Related Work
 Data Collection
 Inductive Content Analysis
 Naïve Bayes Multi-Label Classifier
 Comparison Experiment
 Detect Students' Problems From Purdue Data Set
 Limitations and Future Work
 Conclusion

Objectives
 Demonstrate a workflow for making sense of social media
data in educational data mining.
 Integrate qualitative analysis with large-scale data
mining techniques.
 Explore engineering students' informal conversations on
Twitter.
 Understand the issues and problems students encounter
in their learning experiences.

Introduction

Related Work
 Public Discourse on the Web
 Goffman’s theory (notion of front-stage and back-stage of people’s
social performances)
 Mining Twitter Data
 Analyze tweets with the hashtag #iranElection
 Popular classification models (decision tree, logistic regression,
maximum entropy, boosting, SVM)
 Learning Analytics and Educational Data Mining
 CMS, VLE, EDM (blackboard.com)
 Identify students' academic performance

Data Collection
 Radian6 (http://www.salesforce.com/)
 Twitter APIs
 Keywords: engineer, students, campus, class, homework,
professor, and lab.
 The Twitter hashtag #engineeringProblems occurred most
frequently.
 25,284 tweets with the hashtag #engineeringProblems, posted
from 10,239 unique Twitter accounts.
 Only 2,785 tweets were considered.
 39,095 tweets with the hashtag #engineeringProblems, posted
from 5,592 unique Twitter accounts (Purdue University).

Development of Categories
 Non-mutually exclusive categories

Naïve Bayes Multi-Label Classification
 The Naïve Bayes classifier is more effective on this data set than the
other multi-label classifiers.
 Text Pre-Processing
 Naïve Bayes multi-label classifier
 Evaluation Measures for Multi-Label Classifier
 Classification Result

Text Pre-Processing
 Remove all hashtags, negative emotions, and repeated letters
(e.g., huuungryyy).
 Apply the Krovetz stemmer from the Lemur information
retrieval toolkit.
 Remove the common stop words (much, more, all, always,
still, only).
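
A minimal sketch of the cleaning steps listed on this slide, using only the Python standard library. The function name `preprocess`, the rule "collapse three or more identical characters to one", and the tokenizer are assumptions for illustration. The stop list contains only the examples given on the slide. Krovetz stemming and the handling of negative emotions are left out, because the slide does not say how they were done.

```python
import re

# Stop-word examples copied from the slide; a real stop list would be longer.
STOP_WORDS = {"much", "more", "all", "always", "still", "only"}

def preprocess(tweet, stop_words=STOP_WORDS):
    """Return a list of cleaned tokens for one tweet (illustrative only)."""
    text = tweet.lower()
    # Drop hashtags such as #engineeringProblems.
    text = re.sub(r"#\w+", " ", text)
    # Assumed rule: collapse runs of 3+ identical characters to one.
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Keep alphabetic tokens only.
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in stop_words]

tokens = preprocess("Still sooo huuungryyy after lab #engineeringProblems")
print(tokens)
```

With this rule, "huuungryyy" becomes "hungry", while ordinary double letters (as in "class") are left alone.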

Naïve Bayes multi-label classifier
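
The slide content for this step is not in the transcript. As a rough sketch of one common way to build a multi-label Naïve Bayes classifier, the code below trains one binary multinomial Naïve Bayes model per category and assigns every category whose probability reaches a threshold. The class `BinaryNB`, the Laplace smoothing, the 0.5 threshold, and the toy tweets are assumptions, not details taken from the slides. The category names come from the conclusion slide.

```python
import math
from collections import Counter

class BinaryNB:
    """Multinomial Naive Bayes for one category (in / not in), Laplace-smoothed."""

    def fit(self, docs, flags):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = Counter(flags)
        for tokens, flag in zip(docs, flags):
            self.counts[flag].update(tokens)
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def prob_in(self, tokens):
        """Estimated probability that the document belongs to the category."""
        total_docs = sum(self.n_docs.values())
        log_score = {}
        for flag in (True, False):
            # Smoothed class prior.
            lp = math.log((self.n_docs[flag] + 1) / (total_docs + 2))
            denom = sum(self.counts[flag].values()) + len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    lp += math.log((self.counts[flag][t] + 1) / denom)
            log_score[flag] = lp
        # Normalise the two log scores into a probability.
        m = max(log_score.values())
        e_in = math.exp(log_score[True] - m)
        e_out = math.exp(log_score[False] - m)
        return e_in / (e_in + e_out)

def train(docs, label_sets, categories):
    """One binary model per category (labels are not mutually exclusive)."""
    return {c: BinaryNB().fit(docs, [c in y for y in label_sets])
            for c in categories}

def predict(models, tokens, threshold=0.5):
    """Return every category whose probability reaches the threshold."""
    return {c for c, m in models.items() if m.prob_in(tokens) >= threshold}

# Toy, made-up training tweets (already tokenized).
docs = [["no", "sleep", "exam"],
        ["homework", "all", "night", "no", "sleep"],
        ["no", "friends", "weekend", "lab"],
        ["lab", "homework", "due"]]
label_sets = [{"sleep problems"},
              {"heavy study load", "sleep problems"},
              {"lack of social engagement"},
              {"heavy study load"}]
categories = ["heavy study load", "sleep problems", "lack of social engagement"]

models = train(docs, label_sets, categories)
predicted = predict(models, ["homework", "no", "sleep"])
print(sorted(predicted))
```

Because each category has its own model, a tweet can receive several labels or none, which matches the non-mutually exclusive categories.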

Example Based Classification Measures
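
The formulas for this slide are not in the transcript. The sketch below uses the standard example-based definitions for multi-label classification, where each measure is computed per tweet from the true label set Y and the predicted label set Z and then averaged over all tweets. The function name `example_based` and the conventions for empty label sets are assumptions.

```python
def example_based(true_sets, pred_sets):
    """Example-based accuracy, precision, recall and F1, averaged over tweets."""
    n = len(true_sets)
    acc = prec = rec = f1 = 0.0
    for y, z in zip(true_sets, pred_sets):
        inter = len(y & z)
        union = len(y | z)
        # Assumed conventions: an empty-vs-empty pair counts as a perfect
        # match for accuracy and F1; empty denominators give 0 otherwise.
        acc += inter / union if union else 1.0
        prec += inter / len(z) if z else 0.0
        rec += inter / len(y) if y else 0.0
        f1 += 2 * inter / (len(y) + len(z)) if union else 1.0
    return {"accuracy": acc / n, "precision": prec / n,
            "recall": rec / n, "f1": f1 / n}

true_sets = [{"heavy study load", "sleep problems"}, {"sleep problems"}]
pred_sets = [{"heavy study load"},
             {"sleep problems", "lack of social engagement"}]
scores = example_based(true_sets, pred_sets)
print(scores)
```

In this toy case each tweet has one correct label out of two in the union, so accuracy is 0.5, while precision and recall are both 0.75.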

Label-Based Evaluation Measures

Macro-averaged F1 is higher for classifiers that work better on
smaller categories.

Label-based accuracy is not a very effective measure because it
does not account for label imbalance.
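
A small sketch of why these two remarks hold, using the usual definitions of micro- and macro-averaged F1. The helper names, the per-label accuracy definition ((TP + TN) / N, averaged over categories), and the made-up data are assumptions. The toy classifier gets the large category right and never predicts the small one.

```python
def label_counts(true_sets, pred_sets, categories):
    """Per-category (tp, fp, fn, tn) counts."""
    counts = {}
    for c in categories:
        pairs = [(c in y, c in z) for y, z in zip(true_sets, pred_sets)]
        tp = sum(1 for t, p in pairs if t and p)
        fp = sum(1 for t, p in pairs if not t and p)
        fn = sum(1 for t, p in pairs if t and not p)
        tn = sum(1 for t, p in pairs if not t and not p)
        counts[c] = (tp, fp, fn, tn)
    return counts

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(counts):
    # Average of per-category F1: every category weighs the same.
    return sum(f1(tp, fp, fn) for tp, fp, fn, _ in counts.values()) / len(counts)

def micro_f1(counts):
    # F1 of the pooled counts: large categories dominate.
    tp = sum(v[0] for v in counts.values())
    fp = sum(v[1] for v in counts.values())
    fn = sum(v[2] for v in counts.values())
    return f1(tp, fp, fn)

def label_accuracy(counts):
    return sum((tp + tn) / (tp + fp + fn + tn)
               for tp, fp, fn, tn in counts.values()) / len(counts)

# 10 made-up tweets: "big" applies to 8 of them, "small" to 2.
true_sets = [{"big"}] * 6 + [{"big", "small"}] * 2 + [set()] * 2
# Classifier that is perfect on "big" and never predicts "small".
pred_sets = [y & {"big"} for y in true_sets]

counts = label_counts(true_sets, pred_sets, ["big", "small"])
macro, micro, acc = macro_f1(counts), micro_f1(counts), label_accuracy(counts)
print(macro, micro, acc)
```

Here accuracy stays at 0.9 and micro-averaged F1 near 0.89 although the small category is missed entirely; only macro-averaged F1 (0.5) reflects that.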

Comparison Experiment: SVM and M3L

 The same training and testing data sets were used.
 The one-versus-all SVM multi-label classifier labelled every
tweet as "not in the category" for every category.
 The Max-Margin Multi-Label (M3L) classifier takes label
correlation into account.
 Its performance is better than that of the simplistic
one-versus-all SVM classifier.
 But it is still not as good as the Naïve Bayes classifier.
 This is because SVM is not a probabilistic model.

Detect Students' Problems From Purdue Data Set

Limitations
 First, not all students are active on Twitter.
 Second, only the negative aspects of learning experiences
are considered, not the positive ones.
 Third, only the prominent themes with a relatively large
number of tweets in the data were identified.
 Fourth, the qualitative analysis reveals that there are
correlations among the themes.

Future Work
 First, the "manipulation" of personal image online may
need to be taken into consideration in future work.
 Second, future work can compare both the good and bad
things to investigate the tradeoffs with which students
struggle.
 Third, future work can design more sophisticated
algorithms to reveal the hidden information in the
"long tail".
 Fourth, future work could specifically address the
correlations among these student problems.

Conclusion
 Through a qualitative content analysis, we found that
engineering students largely struggle with a heavy
study load and are not able to manage it successfully.
 A heavy study load leads to many consequences, including
lack of social engagement, sleep problems, and other
psychological and physical health problems.
 This detector can be applied as a monitoring mechanism
to identify at-risk students.
