A Three-Step Data-Mining Analysis of Top-Ranked Higher Education Institutions’ Communication on Facebook
Álvaro Figueira, André Fonseca
CRACS/INESC TEC and University of Porto
arf@dcc.fc.up.pt, andre.fonseca@pgawind.eu
The ranking and the dataset.
Rank  Institution                            Location        Nat.  QoE  AE  QoF  Pub.  Infl.  Cit.  BI   Pat.  Score
1     Harvard University                     USA             1     1    1   1    1     1      1     1    2     100.00
2     Stanford University                    USA             2     8    2   2    5     3      2     3    7     96.86
3     Massachusetts Institute of Technology  USA             3     2    12  3    14    2      3     2    1     95.72
4     University of Cambridge                United Kingdom  1     3    10  6    10    7      17    13   52    93.14
5     University of Oxford                   United Kingdom  2     7    14  9    6     6      4     9    19    92.20
6     Columbia University                    USA             4     13   6   10   13    12     13    14   4     90.80
7     University of California, Berkeley     USA             5     6    24  5    12    4      7     7    22    88.26
8     University of Chicago                  USA             6     11   13  8    23    17     11    16   85    87.13
9     Princeton University                   USA             7     4    15  4    99    26     23    39   122   86.04
10    Yale University                        USA             8     9    27  11   17    10     32    18   48    81.20
Columns: Rank = World Rank; Nat. = National Rank; QoE = Quality of Education; AE = Alumni Employment; QoF = Quality of Faculty; Pub. = Publications; Infl. = Influence; Cit. = Citations; BI = Broad Impact; Pat. = Patents.
All ten institutions are in the top 0.1% of the ranking.
CWUR 2017 - World University Rankings. (https://cwur.org/2017.php)
Step 1a: understanding the communication strategy.
• Posting frequency
• Period of the day of the posts
• Number of posts in each weekday
• Frequency of posting per month
• Topics of posting and the self-image
• Topical words in posts
• Sentiment words in posts
• Distribution of posts in topic areas
• Sentiment analysis
• Fading patterns
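The timing items above (period of the day, posts per weekday, posts per month) can be sketched as simple counts over post timestamps. This is a minimal illustration, not the authors' pipeline: the sample timestamps, the `period_of_day` boundaries, and the function names are all assumptions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical post timestamps (ISO 8601); real data would come from each HEI's page.
POSTS = [
    "2017-03-06T09:15:00",
    "2017-03-06T14:40:00",
    "2017-03-07T20:05:00",
    "2017-04-01T08:30:00",
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def period_of_day(hour: int) -> str:
    """Bucket an hour into a coarse period; the boundaries are an assumption."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def posting_profile(timestamps):
    """Count posts per weekday, per period of the day, and per month."""
    parsed = [datetime.fromisoformat(t) for t in timestamps]
    return {
        "by_weekday": Counter(WEEKDAYS[d.weekday()] for d in parsed),
        "by_period": Counter(period_of_day(d.hour) for d in parsed),
        "by_month": Counter(d.strftime("%Y-%m") for d in parsed),
    }


profile = posting_profile(POSTS)
```

Running the same function once per institution would give directly comparable posting profiles.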
Step 1b: understanding the response patterns.
• Frequency and intensity of comments
• Topics of comments
• Distribution of posts in topic areas
• Topics of posting and the self-image
• Sentiment analysis
• Aggregated sentiment in each HEI
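One possible reading of "aggregated sentiment in each HEI" is the mean sentiment of the comments each institution receives. The sketch below assumes every comment already carries a sentiment score in [-1, 1]; how that score is produced is not stated in the slides, and the sample data and names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (HEI, sentiment score) pairs, one per comment; scores assumed in [-1, 1].
SCORED_COMMENTS = [
    ("Harvard University", 0.8),
    ("Harvard University", -0.2),
    ("Yale University", 0.5),
    ("Yale University", 0.1),
    ("Yale University", -0.3),
]


def aggregate_sentiment(scored_comments):
    """Return the mean comment sentiment per institution."""
    by_hei = defaultdict(list)
    for hei, score in scored_comments:
        by_hei[hei].append(score)
    return {hei: mean(scores) for hei, scores in by_hei.items()}


aggregated = aggregate_sentiment(SCORED_COMMENTS)
```

A mean is only one choice of aggregate; a median or a share of positive comments would fit the same structure.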
Step 2: comparing HEIs through created metrics.
• Most active fans
• Feedback from Most Active Fan (MAF)
• Number of posts by a fan vs. number of users
• Type of post where comments are made
• Score of each HEI vs. the number of fans
• Fading patterns
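Two of the metrics above can be sketched as counts over comment authorship. The slides do not define the Most Active Fan precisely, so the code assumes it is the user with the most comments on an institution's page; the sample data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical (HEI, user id) pairs, one per comment.
COMMENT_AUTHORS = [
    ("Stanford University", "u1"),
    ("Stanford University", "u1"),
    ("Stanford University", "u1"),
    ("Stanford University", "u2"),
    ("Stanford University", "u3"),
    ("University of Oxford", "u4"),
]


def most_active_fan(comments, hei):
    """Return (user, comment count) for the top commenter, or None if no comments."""
    counts = Counter(user for h, user in comments if h == hei)
    if not counts:
        return None
    return counts.most_common(1)[0]


def activity_distribution(comments, hei):
    """Map 'number of comments by a fan' to 'number of fans with that count'."""
    per_user = Counter(user for h, user in comments if h == hei)
    return Counter(per_user.values())


maf = most_active_fan(COMMENT_AUTHORS, "Stanford University")
distribution = activity_distribution(COMMENT_AUTHORS, "Stanford University")
```

Here `distribution` relates the number of comments by a fan to the number of users at that activity level, which is the shape needed for a "posts by a fan vs. number of users" plot.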
Step 3: predictive analytics.
• In the prediction phase, we built three models, one for each of three
targets:
• the engagement a post will have in the next 3 days;
• the average sentiment of the response it will have in the next 4 days;
• the fading of the response in the next 3 days.
• For the prediction we used “Random Forests”.
• To guard against overfitting, we split the dataset into 80% for training
and 20% for testing (held out and never used in training).
• The results showed a classifier accuracy slightly above 80% and an
F1-measure of 84%; each value is the average of the respective metric.
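The evaluation setup can be illustrated with a minimal sketch of an 80/20 hold-out split and the two reported metrics. It does not train a Random Forest, and it is not the authors' code: the binary labels, the toy predictions, and the function names are assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 100 hypothetical post ids split 80/20; the test ids never appear in training.
train_ids, test_ids = train_test_split(range(100))

# Toy labels and predictions (assumed binary, e.g. high vs. low engagement).
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
acc = accuracy(y_true, y_pred)
f1 = f1_measure(y_true, y_pred)
```

With a fitted model, `y_pred` would be its predictions on the held-out 20%, and the metrics would be averaged as the slide describes.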