CITE: "Rantanen, A., Salminen, J., & Jansen, B. J. (2018). Determining Online Brand Reputation with Machine Learning from Social Media Mentions: A Study in the Banking Context. Presented at the 13th Global Brand Conference, Northumbria University, UK, 2–4 May."
1. Determining online brand reputation with machine learning from social media mentions: A study in the banking context
Anette Rantanen1, Joni Salminen1,3, Filip Ginter2, Bernard J. Jansen3
1Turku School of Economics, 2University of Turku, 3Qatar Computing
Research Institute
2. Outline
1. Research purpose
2. Research design
3. Data collection
4. Training data annotation
5. Convolutional neural network
6. Results
7. Evaluation
8. Improvements
3. Research purpose
The purpose of this research is to develop and test a machine
learning model for automatic classification of online brand
reputation. Earlier machine learning efforts have mostly focused on
simple sentiment analysis, classifying positive, neutral and negative
brand mentions (e.g., Cambria, Schuller, Xia, & Havasi, 2013).
However, brand reputation is a much more complex construct,
including concepts, such as trust, quality, relatability, and other
psychologically advanced dimensions (Aaker, 1997).
4. Research design
To create a robust classification scheme, we first extensively review
the existing literature and then apply a convolutional neural network
(CNN) to detect and classify these brand dimensions from real
consumer discussions. We validate our approach by classifying social
media mentions for two organizations in the banking sector.
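The CNN step can be sketched as a Kim-style 1-D convolution over word embeddings: filters slide across token windows, max-over-time pooling keeps the strongest filter response, and a softmax produces class probabilities. The layer sizes, random weights, and seven-class output below are illustrative assumptions for a forward pass only, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's real hyperparameters).
vocab_size, embed_dim, n_filters, width, n_classes = 100, 8, 4, 3, 7
E = rng.normal(size=(vocab_size, embed_dim))        # embedding table
W = rng.normal(size=(n_filters, width, embed_dim))  # conv filters, width-3 windows
U = rng.normal(size=(n_filters, n_classes))         # output projection

def classify(token_ids):
    """Return class probabilities for one tokenized post (forward pass only)."""
    x = E[token_ids]                                # (seq_len, embed_dim)
    seq_len = x.shape[0]
    # Slide each filter over every window of `width` consecutive tokens.
    conv = np.array([
        [np.sum(W[f] * x[t:t + width]) for t in range(seq_len - width + 1)]
        for f in range(n_filters)
    ])
    pooled = np.maximum(conv, 0).max(axis=1)        # ReLU + max-over-time pooling
    logits = pooled @ U
    p = np.exp(logits - logits.max())               # numerically stable softmax
    return p / p.sum()

probs = classify(np.array([5, 12, 42, 7, 19]))
```

In a trained model the weights would of course be learned from the annotated posts; this sketch only shows how a mention flows through the network to a distribution over reputation classes.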
5. Data collection
• 2 large Finnish banks
• 2 large online social networks (> 10M messages)
8. Classification framework
Multidimensional reputation scales include Fortune Magazine’s annual
America’s Most Admired Companies (AMAC) index, the reputation quotient
(RQ) scale by Fombrun, Gardberg, and Sever (2000), and Walsh and
Beatty’s (2007) customer-based reputation (CBR) scale. The AMAC index
covers both corporate management’s point of view and the customer
perspective, while the reputation quotient scale also takes into
account other corporate stakeholders, such as employees. We adopt the
perspective that online brand reputation is (a) a multi-dimensional
construct and (b) defined to a large extent by customers. We combine
constructs from the prior literature into a six-dimensional
classification scheme.
10. Training data annotation
In total, we retrieve 18,807 social media posts. Two research assistants
and one of the authors then code 2,057 randomly sampled posts, so
that each post is classified either into one of the defined categories or
into a neutral category. To validate the manual coding, we calculate Fleiss’
Kappa and find satisfactory agreement (κ = 0.61). The manual coding
is a necessary step for generating training data for the supervised
machine learning model.
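Fleiss' Kappa compares observed inter-rater agreement against the agreement expected by chance. A minimal implementation, with an invented toy rating matrix (the paper's actual 2,057-post, three-coder data is not reproduced here):

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a (n_items, n_categories) matrix where each
    cell holds the number of raters who assigned that category."""
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()
    # Marginal share of each category across all ratings.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    # Per-item agreement: fraction of rater pairs that agree.
    P_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()  # observed vs chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 4 posts, 3 raters, 2 categories (invented counts).
k = fleiss_kappa([[3, 0], [0, 3], [2, 1], [1, 2]])
```

A value of 1.0 means perfect agreement; the paper's κ = 0.61 sits in the range commonly read as substantial agreement.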
15. Evaluation
• Precision = number of true positives / total number of predicted positives
• Recall = number of true positives / actual number of positives
• F1 = harmonic mean of precision and recall
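The three metrics above can be computed per class directly from the definitions; the labels and predictions below are invented for illustration.

```python
def prf(y_true, y_pred, label):
    """Per-class precision, recall, and F1 for one class label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented toy predictions for illustration.
y_true = ["quality(+)", "quality(+)", "neutral", "quality(+)"]
y_pred = ["quality(+)", "neutral", "quality(+)", "quality(+)"]
p, r, f = prf(y_true, y_pred, "quality(+)")
```

Because F1 is the harmonic mean, it is pulled toward the weaker of precision and recall, which is why it is the usual single summary score for imbalanced classes.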
17. Improvements
• Simplification of the classification scheme via open coding
• Theoretically complex frameworks fit poorly with the reality of people in
social media: what is an “innovative brand”?
• Additional training data: class balance and representativeness
• Trying out other machine learning models (e.g., random forest,
XGBoost)
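A random-forest baseline of the kind proposed above could be sketched with scikit-learn's TF-IDF vectorizer and `RandomForestClassifier`; the posts and labels below are invented, and this pipeline is a hypothetical comparison point, not the authors' experiment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Invented toy posts labeled with reputation classes from the scheme.
posts = [
    "great service and friendly staff",
    "terrible app, keeps crashing",
    "fast loan decision, very satisfied",
    "hidden fees, not trustworthy",
]
labels = ["quality(+)", "quality(-)", "agreeable(+)", "reliable(-)"]

# TF-IDF features feeding a random forest, as an alternative to the CNN.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(posts, labels)
pred = model.predict(["very satisfied with the service"])
```

Such bag-of-words baselines are a natural sanity check, since they need far less training data than a neural network to reach a stable result.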
19. Interpretation
As seen from Table 3, the strongest classes are neutral, agreeable(+), and
quality(+). Precision is strong for the class responsible(-), but its recall is
weaker, i.e., there was actually an even larger number of comments that
should have fallen into this category, but they were incorrectly classified
into the neutral category. For quality(+), the situation was the opposite: the
machine predicted more quality(+) comments than there actually were, so the
precision of the class was weaker than its recall. The quality(+) comments
seem to have been particularly confused with agreeable(+) comments, which is
not surprising, because the two were difficult to distinguish even in the
manual coding. The weakest classes are reliable(-), quality(-),
responsible(+), and innovative(+). There is very little training data for the
responsible(+) category, so reliable conclusions cannot be drawn.
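The precision/recall asymmetry described above can be reproduced on an invented confusion matrix (the numbers are illustrative, not the paper's Table 3): many true responsible(-) posts drain into the neutral column, hurting recall while leaving precision intact.

```python
import numpy as np

# Illustrative confusion matrix for three classes:
# rows = actual class, columns = predicted class (invented counts).
classes = ["responsible(-)", "quality(+)", "neutral"]
cm = np.array([
    [ 8,  0, 12],   # many responsible(-) posts predicted neutral -> low recall
    [ 1, 30,  4],   # quality(+) mostly recovered -> high recall
    [ 1, 10, 90],   # some neutral posts predicted quality(+) -> lowers its precision
])

precision = cm.diagonal() / cm.sum(axis=0)  # column-wise: of predicted, how many correct
recall = cm.diagonal() / cm.sum(axis=1)     # row-wise: of actual, how many found
```

Here responsible(-) shows the high-precision/low-recall pattern, while quality(+) shows the opposite, matching the interpretation given above.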