Twitter Analytics for hashtag #supplychain. Considering Twitter and Twitter data for supply chain practice and research
Review of a research paper:
"Insights from hashtag #supplychain and Twitter Analytics: Considering Twitter and Twitter data for supply chain practice and research"
Bongsug (Kevin) Chae
Department of Management, College of Business Administration, Kansas State University, United States
Received 1 March 2014, Accepted 28 December 2014, Available online 5 January 2015
Hello everybody, I am Dennis Kappen, and I am here to present the research paper “Insights from hashtag #supplychain and Twitter Analytics: Considering Twitter and Twitter data for supply chain practice and research”.
At the outset, let us establish that supply chain is a complex domain: a system of organizations, people, activities, information, and resources involved in moving a product or service from supplier to customer.
Based on a report by Bernard Marr, best-selling author, keynote speaker, and leading business and data expert: “Supply chain management is a field where Big Data and analytics have obvious applications. Until recently, however, businesses have implemented big data analytics in areas of operation such as marketing or manufacturing more easily than in supply chain management.”
This is where this research paper fits in. A few takeaways from this paper are: to understand how the researchers extracted intelligence from 22,399 #supplychain tweets, and to understand the proposed framework for data analytics, which combines three methodologies: descriptive analytics (DA); content analytics (CA), which integrates text mining and sentiment analysis; and network analytics (NA), which relies on network visualization and metrics.
A further takeaway is the set of findings on the usage of #supplychain tweets, which are shown to be used by different groups of supply chain professionals and organizations (e.g., news services, IT companies, logistics providers, manufacturers) for information sharing, hiring professionals, and communicating with stakeholders, among others.
Let's follow the sections of the paper in order: Introduction, Background, Framework, Research method (Results), Discussion, Implications, Limitations and Future Work.
We know that social media has been used for marketing, brand management, product and service promotion, customer engagement, recruitment, sales forecasting, and education.
At the same time, big data has been extensively used for stock price predictions, prevention of epidemics, early event monitoring, election predictions, crisis management, brand management, public relations, information diffusion, public opinions, and similar areas of application.
Within this intersection of social media, big data and SCM there has been a slow growth in the usage of social media
From this image that I found, we can see that while the flow of physical goods can be unidirectional, the flow of information is bidirectional
There has also been a focus on using social media for product traceability data, geo-location and mapping, and supply chain visibility and intelligence.
So the question is: why was Twitter used as the medium for data analytics in this research paper?
The main reasons, as explained by the authors, were the open access to the Twitter API, the aim of understanding the relevance of Twitter within the SCM context, and the aim of developing a framework for analyzing supply chain tweets of three types: tweets, replies, and retweets.
The authors set out to analyse #supplychain tweets and look for relevance within these four attributes:
tweet characteristics, topic characteristics, user characteristics, and sentiment characteristics.
Through the streaming APIs, only about 1% of publicly available Twitter data can be acquired. Through data providers such as GNIP and DataSift, which offer what is known as the Twitter Firehose, 100% of Twitter data is available, but this can prove to be quite expensive.
Analysis of the collected data can be challenging: the data are less structured because of free text and informal expressions, and at the same time more enriched because of elements such as user profiles, follower information, hashtags, and URLs.
The use of diverse research methods and metrics is therefore necessary to extract intelligence from the highly enriched and unstructured social media data.
For this reason, the authors propose a framework comprising descriptive analytics (DA), content analytics (CA), and network analytics (NA).
Let's look at these modules in detail in the next few slides.
Descriptive analytics focuses on descriptive statistics, such as the number of tweets, the distribution of different types of tweets, and the number of hashtags.
Most of us have conducted surveys in our grad studies, and we can relate to this situation.
While a small number of metrics (e.g., sample size, response rate, responder profile) are used for the survey data, the enriched nature of Twitter data requires intelligence extraction, using a large set of metrics regarding tweets such as number of tweets, word counts, @user per tweet, number of hashtags
Knowing who tweets, replies, and retweets is important both for practitioners looking for business value from Twitter and researchers studying a phenomenon
A large portion of tweets contain one or more URLs in their texts. URLs could point to news releases, reports, articles, and more. Thus, analyzing URLs can reveal what topical interests and information are considered important among Twitter users.
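To make the descriptive metrics concrete, here is a minimal sketch of how tweet types, hashtags, @mentions, and URLs could be counted. The sample tweets, the regular expressions, and the function names are my own assumptions for illustration; the authors only say that they used script languages, not how.

```python
import re
from collections import Counter

# Hypothetical sample tweets; not taken from the paper's dataset.
tweets = [
    "RT @scm_news: Port delays hit retailers http://example.com/a #supplychain #logistics",
    "@alice thanks for the report! #supplychain",
    "Hiring a procurement analyst http://example.com/jobs #supplychain #jobs #careers",
]

HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")

def tweet_type(text):
    """Classify a tweet as retweet, reply, or original using simple text conventions."""
    if text.startswith("RT @"):
        return "retweet"
    if text.startswith("@"):
        return "reply"
    return "original"

def describe(tweets):
    """Return basic descriptive metrics for a list of tweet texts."""
    types = Counter(tweet_type(t) for t in tweets)
    hashtags = Counter(h.lower() for t in tweets for h in HASHTAG.findall(t))
    with_url = sum(1 for t in tweets if URL.search(t))
    return {
        "n_tweets": len(tweets),
        "types": dict(types),
        "hashtags_per_tweet": sum(hashtags.values()) / len(tweets),
        "mentions_per_tweet": sum(len(MENTION.findall(t)) for t in tweets) / len(tweets),
        "share_with_url": with_url / len(tweets),
        "top_hashtags": hashtags.most_common(3),
    }

metrics = describe(tweets)
```

The same counters would scale to the full set of tweets; only the way the tweet type is detected would need to follow the fields actually delivered by the API.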
Content analytics (CA) refers to a broad set of natural language processing (NLP) techniques, which enable computers to derive meaning from human or natural language input. The authors additionally used text mining methods.
Social media data are primarily texts and thus “unstructured” in nature.
A Tweet's text is informal and composed of a short list of words, hashtags, URLs, and other information. Thus, a careful consideration of text cleaning and processing is a prerequisite for intelligence gathering.
A tweet may contain not just information, but also opinions
Hence advanced text mining techniques such as sentiment analysis may be key for extracting data.
Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, or service , is positive, negative, or neutral.
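As a rough illustration of the idea, a lexicon-based scorer can be sketched in a few lines. This is a toy under stated assumptions: the word list and weights are invented, and it is not the SentiStrength algorithm that the authors used.

```python
import re

# Tiny illustrative lexicon; the word weights are assumptions, not SentiStrength's.
LEXICON = {"loving": 2, "excellent": 2, "great": 2, "exciting": 2,
           "terrible": -3, "abuse": -3, "worries": -2, "suffering": -2}

def sentiment_score(text):
    """Sum the lexicon weights of the words in a tweet (0 means neutral)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)

def label(score):
    """Map a numeric score to positive, negative, or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

example = label(sentiment_score("Another terrible experience #supplychain #logistics"))
```

A tweet with no lexicon words scores 0 and is labelled neutral, which is the category most #supplychain tweets fall into according to the results later in the talk.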
Text mining transforms unstructured texts (or documents) into formatted data (or documents), using such techniques as tokenization, n-grams, stemming, and removing stop words (unnecessary words)
Those transformed texts can be used for text summarization, key word analysis, word frequency analysis, and text clustering, using machine learning algorithms, such as clustering and association analysis.
Not to be said (definitions for reference):
Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens.
An n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words, or base pairs, according to the application. The n-grams are typically collected from a text or speech corpus.
A phoneme /ˈfoʊniːm/ is one of the units of sound (or gesture in the case of sign languages, see chereme) that distinguish one word from another in a particular language. The difference in meaning between the English words kill and kiss is a result of the exchange of the phoneme /l/ for the phoneme /s/. Two words that differ in meaning through a contrast of a single phoneme form a minimal pair.
Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base, or root form.
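The text mining steps named above (tokenization, stop-word removal, stemming, n-grams) can be sketched as follows. This is a simplified illustration: the stop-word list is tiny, and `naive_stem` is a crude suffix stripper I made up for the example, not a real stemming algorithm such as Porter's.

```python
import re

# A very small stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "of", "to", "and", "for"}

def tokenize(text):
    """Lowercase the text, drop URLs, and split it into word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z0-9#@']+", text)

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def ngrams(tokens, n):
    """Return all contiguous n-token sequences."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stop_words(tokenize("Tracking the risks in global supply chains http://example.com"))
stems = [naive_stem(t) for t in tokens]
bigrams = ngrams(stems, 2)
```

The resulting stems and bigrams are the kind of formatted data that word frequency analysis or clustering would then work on.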
Twitter users engage with one another through @replies and retweets, so network information can be extracted from Twitter data using techniques and metrics from network theory.
Nodes (which are Twitter users) and edges (which are the links or relationships between users ) are two basic terms in the theory. Network topology refers to a layout of the nodes and the edges based on the information of reply and retweet in Twitter. This network visualization uncovers patterns in interactions among users
With Twitter data, there can be two kinds of topological networks: the friendship network and the @reply (or mention) network.
Friendship networks can be constructed based on the information of follower and following. Also, the conversation using @reply creates interpersonal relationships among Twitter users.
Centrality (or popularity) analysis uses node-level metrics, such as degree and betweenness centrality, revealing influential actors in the network
Degree centrality, a key metric, shows who has the most ties (or degrees) to others in the network, that is, how many nodes are adjacent to a focal node. Betweenness centrality, by contrast, also takes into account the more distant paths that pass through the focal node.
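For a directed @reply network, degree can be split into in-degree (replies or mentions received) and out-degree (replies sent). A minimal sketch, with made-up user names and edges, could look like this:

```python
from collections import Counter

# Hypothetical @reply edges (sender, receiver); user names are made up.
edges = [("ann", "acme"), ("bob", "acme"), ("cat", "acme"),
         ("acme", "ann"), ("bob", "cat")]

def degree_counts(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for sender, receiver in edges:
        out_deg[sender] += 1
        in_deg[receiver] += 1
    return in_deg, out_deg

in_deg, out_deg = degree_counts(edges)
most_mentioned = in_deg.most_common(1)[0]
```

Here the hypothetical account `acme` receives the most replies, so it has the highest in-degree. Betweenness centrality needs shortest paths between all pairs of nodes and is not covered by this sketch.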
While centrality analysis mainly focuses on individual nodes (or users in Twitter), community analysis explores network-level characteristics. For example, network density represents the portion of all possible connections between nodes, and, thus, it is a measure of network cohesion.
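Network density can be written as the number of edges present divided by the number of edges possible. The sketch below assumes a simple graph without self-loops; whether the authors treated the @reply network as directed or undirected is not stated in this talk, so the `directed` flag is my assumption.

```python
def density(n_nodes, n_edges, directed=False):
    """Share of all possible connections that are actually present."""
    possible = n_nodes * (n_nodes - 1)   # ordered pairs of distinct nodes
    if not directed:
        possible /= 2                    # unordered pairs
    return n_edges / possible

# Node and edge counts as reported later in this talk for the @reply network.
reported = density(5447, 9238)
```

Treating the edges as undirected gives roughly 0.0006, which is in line with the very low density of 0.001 reported in the results once rounded to three decimals.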
Modularity analysis identifies specific communities from the network through visualization. Modularity is a measure of how strongly the network is divided into modules
To summarize the framework: descriptive analytics includes tweet metrics, user metrics, and URL metrics.
Content analytics includes word analysis, hashtag analysis, and sentiment analysis.
Network analytics includes topological analysis, centrality analysis, and community analysis.
Let us now look at the results from using the above framework:
The authors initially conducted a series of keyword and hashtag searches, including “supply chain”, “SCM”, “logistics”, “supply chain management”, and so on. This led them to find that #supplychain was the most prevalent hashtag used by supply chain professionals and organizations.
To perform descriptive analytics , the authors used script languages (Bruns and Burgess, 2011) and multiple statistical and data mining techniques.
Script languages were needed to extract information such as users and hashtags from the dataset of tweets
Active users are calculated based on the number of tweets (which are original tweets + retweets + @replies).
4313 unique users were seen within the dataset
The visibility of users can be calculated from the responses they receive: visibility = @replies received + retweets received.
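The two user metrics can be sketched as simple counts. The record layout `(author, type, target)` and the user names are assumptions for illustration only; the paper's actual data format is not described in this talk.

```python
from collections import Counter

# Hypothetical records: (author, type, target); target is the user replied to or retweeted.
records = [
    ("jobs_bot", "original", None),
    ("jobs_bot", "original", None),
    ("jobs_bot", "original", None),
    ("ann", "reply", "acme"),
    ("bob", "retweet", "acme"),
    ("acme", "original", None),
]

def activity(records):
    """Tweets sent per user: original tweets + retweets + @replies."""
    return Counter(author for author, _, _ in records)

def visibility(records):
    """@replies received + retweets received per user."""
    return Counter(target for _, kind, target in records
                   if kind in ("reply", "retweet") and target)

active = activity(records)
visible = visibility(records)
```

In this toy data the most active account (`jobs_bot`) receives no replies or retweets at all, while `acme` is the most visible with only one tweet of its own, which mirrors the point that active users are not necessarily visible users.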
This slide shows the most active users and the most visible users: the most active users are not necessarily the most visible users.
The authors also visualized the activity of the most visible users. The result also suggests that highly visible users tend to be active users as well
It is interesting to note that 89% (19,890) of the total tweets contained one or more URLs.
11,362 different URLs were found from these tweets. Especially active and visible users include URLs in their tweets: almost
From the word analysis perspective:
The most popular words in tweets were logistics (found in 3961 tweets), jobs (3874), manufacturing (1309), careers (1160), risk (744), procurement (719), warehouse (614), software (605), sustainability (602), import (549), export (544), freight (526), sourcing (525), global (515), operations (503), retail (470), cloud (314), data (276), re-shoring (276), visibility (238), and security (219), among others.
An important aspect of this table is the clustering of tweets into five popular themes, which were logistics, sustainability, manufacturing, risk, and software.
However, my concern is that the authors did not explain, or I may have missed it, how they deduced these five themes.
This is a collection of popular hashtags that were evident in the hashtag analysis
Before we look at sentiment analysis, just to recap: sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc., is positive, negative, or neutral. The authors chose SentiStrength because its sentiment analysis algorithm is designed for informal texts such as tweets.
Fig 3. shows the sentiments at the entire dataset level. Many tweets appear to be neutral (neither positive nor negative), as indicated by a high portion (67%) of tweets with the score 0.
Another large portion (28%) of tweets scores either -1 or +1, which is relatively neutral. However, some tweets contain either strongly negative or positive sentiment.
Figure 4 shows the tweets separated into five clusters, using such themes as corporate social responsibility (CSR), Risk, Logistics, Manufacturing, and IT.
Again, my concern is the missing rationale for how these themes were derived, and why they do not match the earlier themes.
This table shows some exemplar tweets with relatively strong sentiments.
Some examples were ….
This table shows a few examples from each group or cluster
Some examples were…
From the network analytics perspective, topological analysis revealed 5447 nodes, which are the users who sent or received @replies, and 9238 edges, which are the relationships between those users through @reply.
Once the shortest path length from every node to all other nodes is calculated, the diameter is the longest of all the calculated shortest paths in a network
I found this image on Stackoverflow which illustrates a network with 6 nodes and 7 edges or links…
The authors indicate that the average path length was 5.82, meaning that, on average, any two connected users are about six steps away from each other.
From the centrality analysis perspective, just to recap: centrality analysis is a simple measure of a node's connectedness with others and can be an indicator of a user's popularity.
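Shortest path lengths in an unweighted network can be found with breadth-first search, and both the diameter and the average path length follow from them. The small network below is hypothetical, and the sketch only averages over pairs that are actually connected, which is an assumption about how disconnected pairs are handled.

```python
from collections import deque

# Hypothetical undirected network as an adjacency list: a - b - c - d.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}

def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def path_stats(graph):
    """Return (diameter, average path length) over all reachable node pairs."""
    lengths = [d for s in graph
               for t, d in shortest_path_lengths(graph, s).items() if t != s]
    return max(lengths), sum(lengths) / len(lengths)

diameter, avg_len = path_stats(graph)
```

For this four-node chain the longest shortest path runs from `a` to `d`, so the diameter is 3 hops.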
Degree centrality, a key metric, explains who has the most ties (or degrees) to others in the network
From this image that I found, you can see that nodes 3 and 4 have degree 4
In-degree means the number of links that lead into a node; out-degree means the number of links that lead out of the node.
Furthermore, this is another image to explain the highest number of edges: the node marked in red has the highest degree centrality.
Closeness centrality is based on the average shortest distance from a node to all other nodes: the node with the lowest average distance is the most central.
I have purposely placed these slides here so that we could review a table in the next slide
Based on this rationale, the analysis shows that companies (like @deloitte and @toyotaequipment) and industry presses (e.g., @ILmagazine) in the SCM field draw many replies or mentions, as indicated by their high in-degree values. Some individual users (like @Lcecere) are noticeably active users.
While centrality is low, analysis shows the presence of supply chain companies, logistics providers and retailers
The authors refer to Blondel et al. (2008), and I looked up this image from that paper to explain modularity, which is essentially one measure of the structure of networks or graphs. Blondel et al. defined a greedy algorithm for optimizing modularity.
It was designed to measure the strength of division of a network into modules (also called groups, clusters or communities). Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules.
Modularity is often used in optimization methods for detecting community structure in networks.
(Each pass is made of two phases: one where modularity is optimized by allowing only local changes of communities; one where the found communities are aggregated in order to build a new network of communities. The passes are repeated iteratively until no increase of modularity is possible.)
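To show what is being optimized, here is a small sketch that computes the standard modularity score Q for a given split of an undirected, unweighted network into communities. It only evaluates one fixed partition; it does not implement the iterative passes of Blondel et al. described above. The example network and the community labels are made up.

```python
from collections import defaultdict

# Hypothetical undirected network: two triangles joined by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

def modularity(edges, community):
    """Q = sum over communities of (internal edges / m) - (total degree / 2m)^2."""
    m = len(edges)
    internal = defaultdict(int)   # edges with both ends in the community
    degree = defaultdict(int)     # sum of node degrees in the community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

q = modularity(edges, community)
q_single = modularity(edges, {node: 0 for node in community})
```

Splitting the network into its two triangles gives a clearly positive Q, while putting every node into one community gives Q = 0, so a higher score indicates dense connections within modules and sparse connections between them.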
Based on this protocol (which, IMHO, is also not detailed), the community analysis showed a very low graph density (0.001), indicating that the entire #supplychain network was sparsely distributed (not cohesive).
The modularity-analysis revealed over 400 communities in the #supplychain network. A majority of these communities are quite small-scale.
There were four large communities, each of which represents over 5% of the total nodes. These four communities were about corporate social responsibility, sustainability, manufacturing, and SCM
This I like because there was a rationale on how they deduced the four communities…
In the discussion section, the paper links back to the four attributes of the research question stated in the introduction, namely tweet characteristics, topic characteristics, user characteristics, and sentiment characteristics.
From the tweet Characteristics perspective, supply chain tweets were conversational and engaging
The rate of hashtags in #supplychain tweets also draws attention. User inclusion of hashtags indicates “topical” relevance
Two factors played a key role in the diffusion of supply chain information. First, content: widely diffused tweets were about timely issues and challenges, while job-related tweets were the least retweeted. Second, the number of hashtags was positively associated with the degree of diffusion through retweets: tweets widely diffused through retweets tend to contain six hashtags on average, which is twice that of less popular tweets.
Under topic characteristics, the topics indicated were logistics, manufacturing, sustainability, corporate social responsibility (CSR), risk management, and IT.
Topics of interest in tweets were #BigData, #SocialMedia, #careers, #equity, #CRM, #SaaS, #humanrights, #horsemeat, #fashion, #RFID, #fairtrade, #ClimateChange, #Android, #ethics, #analytics, and #humantrafficking.
Interestingly some active users belonged to job or career services, SCM practitioners, trade magazines
Not all active users are highly visible. Visible users receive many @reply (or mentions) and their tweets are retweeted.
Analysis also shows that a relatively small percentage of users accounts for a large portion of tweets: the top 1% (45 users) and the next 9% (395 users) account for almost 40% and 34% of tweets, respectively.
#supplychain tweets contained relatively low sentiment. Descriptive analytics and content analytics helped to explain this: most #supplychain tweets were about events, SCM news and reports, jobs, and advertisements.
SCM-related incidents during the period of data collection influence sentiment. Major events, such as the Bangladesh garment factory collapse, draw emotional, largely negative tweets. However, this event happened after the end of data collection for this research study.
Positive sentiment was found in promotional tweets; as an example: “How @Sony strengthened its supply chain and added value”.
Risk-related tweets tend to carry far more negative sentiment than those of other topics. There appear to be disproportionately more negative tweets in the topic of manufacturing.
This is a table that identifies implications of Twitter usage within the SCM context.
From a professional use perspective, Twitter usage is relevant for learning, promoting, and networking.
Organizational use serves as a communication platform and as a way to spread positive images through tweets and retweets.
Additionally, #supplychain Twitter usage information can serve as a hiring tool, as a sales channel, for market sensing, and for sensing supply chain related events.
The relatively short duration of data collection, which lasted a little over two months, is a limitation
The authors used hashtag #supplychain to search and collect Twitter data. Another, perhaps better, approach would be using multiple keywords (e.g., supply chain) and/or hashtags (e.g., #manufacturing, #logistics) in data collection. This approach would enable research using a large quantity of supply chain-related Twitter data.
One area for future work is developing detailed, practical guidelines that could help companies design industry applications, using Twitter and other social media platforms, for diverse supply chain activities, including new product development, stakeholder engagement, supply chain risk management, and market sensing.
The other area is studying the impact of social media (and big data) investment (e.g., technologies, data scientists, and data repositories) on supply chain performance.
IMHO, the research method section should have defined a set procedure for how the analysis was carried out. The authors just link to past papers and do not list the algorithms or how they were used, which left this unclear.
This lack of clarity about the analysis method raises the issue of the reliability and repeatability of the method.
Results are reported within the Research method section, leading to confusion.
The themes from the word analysis do not match the themes from the sentiment analysis. Why does this happen, and why is there no explanation of the difference?
Thank you for listening and I look forward to your questions and discussions
#supplychain and Twitter Analytics
Presented by: Dennis L. Kappen, Ph.D. Candidate,
GAMER Lab, FBIT
Image Source: http://room214.com/social/getting-to-know-twitter-analytics
12/12/2016 Dennis L. Kappen Twitter :3d_ideation 2
Image Source: http://cdn2.business2community.com/wp-content/uploads/2015/02/twitter-analytics.png.png
Moving Product or Service > to Customer
Supply Chain Management > SCM
from 22,399 #supplychain tweets.
descriptive analytics (DA),
content analytics (CA) integrating text
mining and sentiment analysis,
network analytics (NA)
Usage of supply chain tweets
Image Source: http://cdn2.business2community.com/wp-content/uploads/2015/02/twitter-analytics.png.png
Research method (Results)
Limitations and Future
Image Source: https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAQWAAAAJGE4ODJkOWY0LTVmYTctNDA5NC04ZDRkLWMwNWEwZWU5MzY1NQ.jpg
Slow growth in the usage of social media
product traceability data
geo-location and mapping data
supply chain visibility and intelligence
Access to Twitter data using Twitter
Application Programming Interface (API)
Understand Twitter within SCM context
Analyse: tweets, replies, and retweets
Twitter Firehoses are
Analysis is expensive
from the highly
Descriptive analytics (DA)
▪ INTELLIGENCE extraction
▪ METRICS: number of tweets, word counts,
@user per tweet, number of hashtags, etc.
▪ WHO tweets, replies, and retweets?
▪ WHICH URLs?
Image Source: http://wersm.com/wp-
Content analytics (CA)
Texts are unstructured
Text cleaning and processing
Information and opinions
Advanced text mining:
Twitter Analytics relies on
automatic text processing
techniques and algorithms
Network analytics (NA)
Users engage using @reply
▪ Nodes (users)
▪ Edges (relationships)
▪ friendship network and @reply (or
Network analytics (NA)
500 million tweets/day (2015)
#supplychain 22,399 tweets
Original tweets (58%)
Image Source: http://yogfitness.com/wp-
4313 unique users
Active users = original tweets
Visibility = @replies received
+ retweets received
Image Source: http://yogfitness.com/wp-
Active users = original tweets + retweets+@replies
Visibility = @replies received + retweets received
One or more URLs = 89%
Active and Visible users include URLs in their
Top URLs include:
SCM online newspaper sites, and
articles about manufacturing leadership,
and Big Data.
3839 different hashtags appear 62,575 times
Exemplar tweets Sentiment
This development is pretty exciting for me #supplychain #sustainability 3 (Positive)
Samsung assures excellent working conditions in China. 〈http://t.co/9VcBIL1XAj
#workingconditions #supplychain #responsiblesourcing #rtw〉
Loving the feedback that I am getting on this report today! #supplychain
As parked 787 s multiply? Boeing cash drain worries grow 〈http://t.co/up6V9sBYbi
CHaINA Magazine: Apple Finds Child Labor Abuses in Its Supply Chain – As the
controversy regarding. 〈http://t.co/kt6MMXvng #SupplyChain〉
Walmart US Chief Says Sales are Suffering Because of Restocking Issues
Exemplar tweets Sentiment
CSR Investigation of #Apple #SupplyChain by #SACOM reveals continued
#labor abuse throughout despite Apple's claims.
Loving this Green Supply Chain infographic 〈http://t.co/M0spSUs6WG
#supplychain #CSR #infographic @SCInsightsLLC〉
IT Supply Chain Disruption a Major Threat to Business from @forbes
〈http://t.co/raX48u4Zcw #supplychain #IT #analytics〉
How @Sony strengthened its supply chain and added value
〈http://t.co/B0DTiWPAnm #GrnBz via @GreenBiz #IT #supplychain〉
Logistics Could I expect to receive my parcel today? I doubt it. Another terrible
experience from @Parcel2Go #expensive #supplychain #logistics
Great day @ #DalhousieUniversity! Excellent case study presentations
by 16 teams. Join Careers in Motion @JDIcareers #logistics #supplychain
Risk With new #supplychain controls in place IKEA meatballs back after
horsemeat scare 〈http://t.co/YKvYT1ov1H #risk〉
Community Analysis (network-level)
Network was sparsely distributed
Over 400 communities in the #supplychain
Companies Industry Press Individual users
(Blondel et al., 2008)
73% of #supplychain tweets contained two or
Factors for information diffusion
Image Source: http://blog.mytmc.com/wp-
Topics of interest
Image Source: http://http://erau.edu/-
Characteristics of users
Job or career services
Active users vs Visible users
#supplychain had low sentiment
Major events influencing sentiments
Short data collection period
#supplychain vs keywords
Impact of social media on supply chain
Research Methods section
Themes in word analysis and sentiment
No explanation of the evolution of these
themes (inductive or deductive ?)