This document discusses a study that analyzed social media data from Twitter to understand public inconvenience caused by metro rail construction projects in four major Indian cities. The study collected tweets related to noise, vibration, traffic and other issues. It then conducted social network analysis to identify influential Twitter accounts. Text analytics, topic modeling and sentiment analysis were used to analyze the tweets and identify discussion topics and public sentiment towards the projects. The analysis of social media data provided insights into the key issues faced by the public and their reactions, which can help metro agencies improve engagement and address public concerns.
Analyzing public sentiment on Indian metro construction projects via social media
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/360807044
Harnessing social media data for analyzing public inconvenience in
construction of Indian metro rail projects
Article in CSI Transactions on ICT · May 2022
DOI: 10.1007/s40012-022-00356-9
Ganesh Devkar
CEPT University
All content following this page was uploaded by Ganesh Devkar on 24 May 2022.
ORIGINAL RESEARCH
Harnessing social media data for analyzing public inconvenience
in construction of Indian metro rail projects
Srinjoy Das1 • Ganesh Devkar1
Received: 21 March 2022 / Accepted: 6 May 2022
© CSI Publications 2022
Abstract The metro rail links many parts of a large city
and offers one of the best modes of transit across it.
However, the construction phase of the metro causes
inconveniences to the people living in cities by signifi-
cantly escalating the noise and dust pollution. During the
construction of metro rails, citizens face problems in their
everyday lives, especially as the elevated corridors go
through thickly populated areas and areas with heavy vehicular
traffic, creating traffic snarls and impediments. The literature
review revealed additional concerns, such as public
green cover depletion, potholes, building waste disposal,
and vibrational issues. With the advent of social media,
urban citizens use platforms like Twitter to express their
opinions on inconveniences. The interactions over these
social media platforms constitute big data. This data has
immense potential for public engagement, accountability,
and timely resolution of public inconvenience. The exam-
ination and analysis of social media posts on these issues
are important as such analysis would inform the metro rail
agency about the factors that affect the public and their
emotional reaction to a specific activity or series of activ-
ities performed by the construction team. In this work,
Twitter posts about the inconveniences associated with
metro rail projects in four cities across India were ana-
lyzed. Social network analysis (SNA), text analytics, topic
modeling, and sentiment analysis were conducted to ana-
lyze the data. SNA helped find the Twitter accounts with
above-average betweenness centralities, while text
analytics and topic modeling helped find latent discussion
topics among the stakeholders. Sentiment analysis gave an
idea of the public sentiments towards the projects.
Keywords Inconvenience · Nuisance · Sentiment analysis ·
Topic modeling · Text analysis · Social network analysis
1 Introduction
The railway-focused ‘‘Mass Rapid Transit System’’ has
been recognized as a solution to traffic and environmental
pollution problems faced in major cities worldwide [1].
However, the construction of such systems can signifi-
cantly exacerbate public inconvenience. Citizens face
many inconveniences in their daily lives due to the con-
struction of metro rails, especially when the metro route
traverses thickly populated areas and high vehicular traffic
zones, causing traffic flow impediments. Along with this,
the construction of metro rail projects results in loss of
public green cover, noise pollution, potholes on roads,
waste from the construction work, and vibrational issues.
With the emergence of online platforms like Twitter,
Facebook, and Instagram, citizens have been using these
social platforms to discuss their woes and the inconve-
nience caused during the construction of such projects.
These interactions that use text, photos, and videos con-
stitute big data [1]. This data on the feelings of urban cit-
izens has immense potential in the comprehension of
public interactions with ongoing construction processes. It
can help devise more meaningful engagement with affected
people and users.
Srinjoy Das
srinjoy.das@alum.cept.ac.in
Ganesh Devkar
ganesh.devkar@cept.ac.in
1 Faculty of Technology, CEPT University, Ahmedabad, India
CSIT
https://doi.org/10.1007/s40012-022-00356-9
Twitter is a popular social media site used by citizens
and transportation agencies [2]. The analysis of big data on
this social media platform has been gaining attention in
scholarly work in recent years. In the construction man-
agement domain, Ninan et al. [3] have analyzed Twitter
content to identify major inconveniences perceived by the
public in construction projects. Our research analyses the
different inconveniences, how the general public reacts to
such inconveniences, and how they interact. Previous lit-
erature reviews and current news by media point to various
problems caused by the metro rail construction in different
cities of India. However, not much research exists on
Indian public interactions with and reactions towards such
projects and the analysis of online data from social media.
2 Literature review
2.1 Public inconvenience in construction projects
Public inconvenience is an act or omission that affects the
general public, or a substantial portion of them, and
interferes with the privileges that would otherwise be
enjoyed by members of the society [4]. A construction
project can cause numerous aggravations throughout the
project’s span. The inconvenience caused can take many
forms like traffic jams, diminished income, unexpected
mishaps, and air and noise pollution.
Hadi [5] conducted five different case studies involving
the construction of a junior school, a housing development,
a supermarket, a hospital, and a leisure facility in England.
It was observed that the residents in the neighborhood of
these projects reported problems like noise, dust, vibration,
pollution, health and safety, dirt, traffic congestion, and
litter during the construction phase of these projects. Glass
and Simmonds [6] validated the findings of Hadi [5] when
they conducted case studies on housing in the UK to
understand the degree to which construction inconve-
niences impacted residents. As with most construction
sites, the most critical issues were dirt, dust, noise, and
traffic movements, consistent with Hadi’s findings [5].
Comments about construction machinery disturbing resi-
dents at night were noteworthy and mirrored Schexnay-
der’s observations [7].
Duminda [8] performed questionnaire surveys on road
projects and found that traffic delays, air and noise pollution,
interruptions to utility services during road construction,
disturbance to worksite accessibility, and disturbance to
drains were the most significant inconveniences to pedes-
trians. A report on Aberdeen Western Peripheral route in
Scotland states that construction activities can cause nui-
sances like noise, vibration, dust, and amenity loss. In
highway work, truck drivers find the glare from construction
lights inconvenient while driving [9]. The main nuisances
of night-time construction are noise and
light from construction sites [10]. Noise, vibration, glare,
and dust effects of work activities can disturb owners of
adjacent property [11].
Kukadia et al. [12] found that dirt and dust from con-
struction activities caused significant irritation to people in
the neighborhood. Mechanical operations create a lot of
noise and vibration in the construction process, irritating
the residents [13]. Ray [14] described obstacles that businesses
face on corridors planned for transit development.
Construction adversely affects their revenue because
of lack of access, parking loss (temporary for staging or
long-term), interruptions in water and electricity service,
and experiential nuisances such as dust and noise. Extant
literature points to the most frequently reported
inconvenience parameters: noise pollution,
dust pollution, traffic congestion/delay, vibration,
parking, health and safety, water pollution, litter, mishaps,
and glare from construction lights.
The construction of a metro rail project in a city typi-
cally spans 10–15 years. A stream of literature focuses on
the public inconveniences caused by the metro rail project.
It reflects the ongoing emphasis on Mass Rapid Transit
Systems to improve urban transportation in major cities
worldwide. However, during construction, diverted road
traffic from existing road corridors raises the traffic load on
adjacent roads during peak hours, contributing to conges-
tion and traffic jams. Elevated metro corridors cause noise
pollution and vibration issues and hamper the privacy of
residents who live close to the existing or proposed metro
[15]. Other problems include loss of trees and public green
cover, potholes on roads, and waste from construction
work.
2.2 Twitter as a preferred social medium
for research
Online social media can furnish interested stakeholders
with project information, announcements, and archives and
provide a platform for information exchange or conversation.
Web-based media, such as Twitter, WhatsApp, and
Facebook, support various media formats, including video.
Online media allow stakeholders to share and obtain information
rapidly, effectively, and effortlessly. Such tools inform
individuals about issues and invite users to engage in various
ways [16].
Nik-Bakht and El-Diraby [17] explain that social media
activity allows stakeholders to participate in construction
decision-making. Social media act as a bridge between
various stakeholders by improving knowledge management
[18]. Hence, in developed and developing countries, con-
cerned agencies like government bodies and transport
departments engage the public in conversation with the
help of online social media. It helps maintain clarity of
information and essential communication between the two
stakeholders. Public views help judge infrastructure pro-
jects by providing updates on future changes to strategy
[19]. For example, in the USA, the State Department of
Transportation (DOT) uses Twitter to disseminate impor-
tant information to the public about traffic conditions (e.g.,
collisions, road delays, and congestion) [20]. Since its
launch in 2008, the Washington State DOT has acted as a
social media model for national and international transport
agencies, with extensive notable social media use, attitude,
and brand-building activities for the public that it serves
[21]. Social media interactions guarantee high customer
satisfaction for infrastructure services (e.g., transportation
systems, control of water supplies, and electrical power
supply).
With the rise of mainstream online media platforms
like Twitter, Facebook, and Instagram, citizens use
such social platforms to discuss their woes and the inconveniences
caused by the construction of significant infrastructure
projects. Reports state that Twitter has 316 million
monthly active users, and 500 million tweets are posted per
day, of which approximately 80% are posted from mobile
devices. This presents academic researchers with a
plethora of opportunities to analyze a wide range of
issues.
Twitter is open, and its data are easy to access. It is
useful because it limits each tweet to a short length (140
characters, raised to 280 in 2017), which keeps messages succinct
and understandable. The use of Twitter to survey online data is
an emerging trend as it can be simultaneously accessed by
many people. Twitter offers ways in which many tweets
can be collected in real-time through computer programs.
The Twitter streaming API, for instance, is a freely
accessible resource that enables real-time access to the
global stream of tweets [22]. Different filters may be
applied to collate tweets of interest, e.g., tweets being
posted within a particular geographical area, tweets con-
taining specific unique keywords, or tweets posted by a
selected group of users. The data obtained can then be
analyzed (e.g., using data mining techniques) to understand
the current situation. It is easier to sort tweets on Twitter
with a strong hashtag trend.
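As an illustration only, the three kinds of filter described above (keywords, geographical area, and selected users) can be sketched in Python over an in-memory list. The `Tweet` record, the sample tweets, the user names, and the bounding box are hypothetical; the sketch does not call any Twitter API.

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    user: str
    text: str
    lat: float
    lon: float

def matches(tweet, keywords=None, bbox=None, users=None):
    """Return True if the tweet passes every filter that was supplied."""
    if keywords and not any(k.lower() in tweet.text.lower() for k in keywords):
        return False
    if bbox:
        south, west, north, east = bbox
        if not (south <= tweet.lat <= north and west <= tweet.lon <= east):
            return False
    if users and tweet.user not in users:
        return False
    return True

# Hypothetical sample data, for illustration only.
tweets = [
    Tweet("resident_a", "Metro work noise all night near Powai", 19.12, 72.90),
    Tweet("resident_b", "Lovely weather in Pune today", 18.52, 73.86),
    Tweet("commuter_c", "Traffic jam again due to metro pillars", 12.97, 77.59),
]

# Rough, illustrative bounding box (south, west, north, east); an assumption.
MUMBAI_BBOX = (18.89, 72.77, 19.27, 72.99)

noise_in_mumbai = [t for t in tweets if matches(t, keywords=["noise"], bbox=MUMBAI_BBOX)]
metro_anywhere = [t for t in tweets if matches(t, keywords=["metro"])]
```

Each filter is optional, so the same function covers a keyword-only search and a combined keyword and location search.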
This research aims to discover the nuisances caused by
metro projects in India by analyzing online big data from
Twitter. It also examines the interactions among people
and how they react to such problems. In the context of
India, hardly any research has been done on the application
of Twitter data analysis for the construction industry. The
literature survey helped us understand the different incon-
venience parameters and identify the research gap that
there are not enough analyses on Twitter data about the
inconvenience of metro construction in India.
3 Research Methodology
The research philosophy of Constructivism was adopted by
analyzing multiple case studies and using different analysis
tools like text analytics, topic modeling, sentiment analy-
sis, and social network analysis to measure and interpret
the results. Metro rail projects were studied across four
major Indian cities—Mumbai, Pune, Bangalore, and
Kolkata. Different phases of these metro rail projects are
under construction. These projects are located in different
parts of the country, so the selection avoids undue
representation of, or bias towards, any one geographical
location. The tweets on the social media platform Twitter
constituted the essential data in the study. Tweets about
metro rail projects related to the research objectives were
collected and analyzed through a quantitative research
design. The process involved was as follows:
3.1 Data collection
The first step was the development of a set of predefined
keywords like ‘‘noise,’’ ‘‘vibration,’’ ‘‘traffic,’’ and their
synonyms related to our research to filter out the necessary
tweets. As an example, to extract tweets about ‘‘noise
pollution’’ caused by the metro rail construction, the fol-
lowing keywords were used: ‘‘noise,’’ ‘‘loudness,’’
‘‘racket,’’ and ‘‘disturbance.’’ Further, a search query was
developed to screen Twitter data, e.g., ‘‘metro Mumbai
traffic jam (#Mumbai OR #metro) until:2021-02-15
since:2012-01-01.’’
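A minimal sketch of how such search strings could be assembled from a keyword list is given below. The function name and its parameters are hypothetical; only the query layout and the noise synonyms come from the text above, and the actual queries used in the study may have been built differently.

```python
def build_query(city, topic_terms, hashtags, since, until):
    """Assemble a search string laid out like the example query in the text."""
    tag_clause = " OR ".join(f"#{tag}" for tag in hashtags)
    terms = " ".join(topic_terms)
    return f"metro {city} {terms} ({tag_clause}) until:{until} since:{since}"

# Reproduces the example query quoted in the text.
query = build_query("Mumbai", ["traffic", "jam"], ["Mumbai", "metro"],
                    since="2012-01-01", until="2021-02-15")

# One query per synonym of "noise", as listed in the text.
NOISE_TERMS = ["noise", "loudness", "racket", "disturbance"]
noise_queries = [
    build_query("Mumbai", [term], ["Mumbai", "metro"],
                since="2012-01-01", until="2021-02-15")
    for term in NOISE_TERMS
]
```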
Web crawlers are often used by researchers to collect
and screen data from social media platforms [19]. This
research used a web crawler named Scrapehero to collect
relevant tweets. After selecting the relevant pages, the
crawler automatically extracted the data and exported it to
Excel.
3.2 Data pre-processing
The first step was to de-duplicate the tweets, as the software
cannot distinguish between similar tweets, which leads
to redundant data. Once the desired tweets had been sorted
in this first step, they were pre-processed for further
analysis. The tweets derived in the first stage were termed
raw tweets. Several components of the tweet text, such as
emoticons, user references, phrases, and Uniform Resource
Locator (URL) addresses, were deleted. The
reasoning was that concatenated terms are always sporadic,
which increases the difficulty of data characterization and
decreases the efficacy of data classification, text analytics,
topic modeling, and sentiment analysis [19].
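The de-duplication step can be sketched as follows. The rule for deciding that two tweets are duplicates (ignore case, extra white space, and a leading retweet marker) is an assumption made for illustration; the text does not specify the rule that was actually applied.

```python
import re

def dedup_key(text):
    """Key under which two tweets count as duplicates (an assumed rule)."""
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)   # drop a leading retweet marker
    return re.sub(r"\s+", " ", text).strip().lower()

def deduplicate(tweets):
    """Keep the first occurrence of each tweet, preserving the original order."""
    seen, unique = set(), []
    for tweet in tweets:
        key = dedup_key(tweet)
        if key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique

# Hypothetical raw tweets.
raw = [
    "Metro work is too noisy",
    "metro  work is too noisy ",
    "RT @someone: Metro work is too noisy",
    "Traffic jam at Hinjawadi",
]
unique_tweets = deduplicate(raw)
```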
3.2.1 Normalization and stemming
Normalization was performed by transforming tweets into
lower case to enable easier comparison with a dictionary
[23]. For example, a tweet that reads, ‘‘@metrorailPune Is
there a plan to take care of possible noise and vibrations
created by Pune Metro so that the residents along the route
do not get DISTURBED?’’ would eventually look like this
‘‘@metrorailPune is there a plan to take care of possible
noise and vibrations created by Pune metro so that the
residents along the route do not get disturbed.’’ In addition,
any extra white space between characters was removed
[19]. Stemming is the process in which derived words are
reduced to their roots, e.g., words such as ‘‘disturbed’’ and
‘‘disturbing’’ are converted to ‘‘disturb.’’ The Stanford NLP
library provides various algorithms, such as Porter stemming,
for this process [23].
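The two operations can be illustrated with a short Python sketch. The `crude_stem` function below is a deliberately simple suffix stripper written for this illustration; it is not the Porter algorithm and is not the stemmer used in the study.

```python
import re

def normalize(text):
    """Lower-case the text and collapse runs of white space."""
    return re.sub(r"\s+", " ", text.lower()).strip()

# Suffixes tried in order; an illustrative list, far simpler than Porter's rules.
SUFFIXES = ("ing", "ed", "s")

def crude_stem(word):
    """Strip one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

normalized = normalize("Residents  do not get   DISTURBED")
stems = [crude_stem(w) for w in ["disturbed", "disturbing", "noise"]]
```

A real stemmer handles many cases this sketch does not, such as doubled consonants and irregular plurals.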
3.2.2 Removal of stopwords, special characters, and URLs
The next step involved deleting unnecessary words called
stopwords, like ‘‘a,’’ ‘‘an,’’ and ‘‘the’’ that do not contribute
to the tweet’s meaning or sentiment. It was also necessary
to get rid of symbols like #, $, *, and URLs in the tweet
message. These words increase the complexity of the
sentence without adding any tangible value to the entire
data set. The ‘‘@’’ symbol before usernames present in the
tweet was also removed to avoid confusion in data analysis
in the subsequent stages.
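A minimal sketch of this cleaning step is shown below. The stopword list is a tiny illustrative sample, the example URL is a placeholder, and lower-casing is folded in for convenience; none of these choices is taken from the study.

```python
import re

# Tiny illustrative stopword list; a real list would be much longer.
STOPWORDS = {"a", "an", "the", "is", "to", "of", "by", "so", "that", "and"}

def clean(text):
    """Remove URLs, symbols such as @ # $ *, and stopwords; return the tokens."""
    text = re.sub(r"https?://\S+", " ", text)     # URLs first, before symbols are stripped
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)   # drops '@' but keeps the username itself
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOPWORDS]

tokens = clean("@metrorailPune the #noise is a problem https://example.com/x")
```

URLs are removed before the symbol pass; otherwise the punctuation inside a URL would be stripped and its fragments would survive as tokens.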
3.3 Text analytics and topic modeling
After pre-processing all the datasets, text analytics was
conducted on the text documents. Text analytics, that is, the
computer-based analysis and visualization of textual data,
comprises a range of linguistic, statistical, and machine
learning tools for business intelligence, exploratory data
analysis, research, and investigation that model and structure
the information content of textual sources [24]. A typical
visualization feature of text analytics software is a
two-dimensional (2D) map that identifies keywords or themes in
the source text, shows the relative frequency or value of those
words/themes, and represents certain aspects of the
relationships between them [35].
Latent Dirichlet Allocation (LDA) is a standard topic
modeling technique [26]. It is a probabilistic generative
model that classifies a collection of tweets into
latent topics [25]. Each topic (as defined by LDA) is a
collection of frequently co-occurring words, and each document
is assigned a probability of being related to each of the
topics [22].
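To make the idea concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the implementation used in the study; the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (theta, topic_word, vocab), where
    theta[d][k] is the estimated probability of topic k in document d and
    topic_word[k][w] counts how often word w was assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wid = assignments[d][i], word_id[w]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    theta = [
        [(c + alpha) / (sum(row) + num_topics * alpha) for c in row]
        for row in doc_topic
    ]
    return theta, topic_word, vocab

# Hypothetical pre-processed tweets (token lists).
docs = [
    ["traffic", "jam", "road", "metro"],
    ["tree", "cut", "green", "cover"],
    ["traffic", "road", "congestion", "metro"],
    ["tree", "forest", "carshed", "cut"],
]
theta, topic_word, vocab = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` is the per-document topic distribution described above, and the largest counts in a row of `topic_word` give that topic's characteristic keywords.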
3.4 Sentiment analysis
VADER sentiment is a lexicon-based sentiment classifier used to
mark the sentiment of each text in a corpus [27]. A sentiment
lexicon is one in which words have been assigned semantic
ratings, usually between -1 and 1 [28]. In addition,
VADER sentiment combines sentiment scores from individual
words to create sentence scores. Booster words
(e.g., ‘‘very’’ in very joyful) and negation words (e.g., ‘‘not’’
in not happy) also modify the sentence sentiment. The sentence
score is then normalized into a compound score using
lexicon-based rules. The compound score ranges from
-1 to +1, with 0 being the neutral score, and each text is
labeled positive, negative, or neutral accordingly. In the study,
sentiment analysis was performed to investigate the pub-
lic’s overall sentiment, opinion, and perception toward the
metro rail project. It provides an idea of how people feel
about an ongoing metro construction despite the nuisances
in their daily lives.
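The scoring logic can be sketched with a toy lexicon-based scorer. This is not the VADER library: the word ratings, booster increments, and negation handling below are hypothetical simplifications. The squashing function `x / sqrt(x^2 + alpha)` with `alpha = 15` and the 0.05 labeling threshold follow, to the best of our knowledge, the conventions of the reference VADER implementation, but the text above does not state which thresholds the study used.

```python
import math

# Hypothetical word ratings and modifiers, for illustration only.
LEXICON = {"good": 1.9, "happy": 2.7, "joyful": 2.8,
           "bad": -2.5, "noise": -1.0, "jam": -1.2}
BOOSTERS = {"very": 0.3, "extremely": 0.3}
NEGATIONS = {"not", "no", "never"}

def compound_score(text, alpha=15):
    """Sum word ratings (with boosters and negations) and squash into (-1, 1)."""
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score = LEXICON[tok]
        prev = tokens[i - 1] if i > 0 else ""
        if prev in BOOSTERS:
            # A booster pushes the rating further from zero.
            score += BOOSTERS[prev] if score > 0 else -BOOSTERS[prev]
        if prev in NEGATIONS or (i > 1 and tokens[i - 2] in NEGATIONS):
            score = -score  # simplified: a nearby negation flips the sign
        total += score
    return total / math.sqrt(total * total + alpha)

def label(compound, threshold=0.05):
    """Map a compound score to positive, negative, or neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

examples = ["very joyful", "not happy", "metro station opens monday"]
labels = [label(compound_score(t)) for t in examples]
```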
3.5 Social network analysis
Social network analysis is the mapping and measurement
of relationships between individuals, communities, orga-
nizations, computers, URLs, and other related information
entities. It focuses on interactions between entities such as
people, groups, societies, associations, or organizations
[29, 30]. These relationships or connections are conceptu-
alized as nodes and connectors in SNA [31]. Nodes are
entities, while connectors are node-to-node relations [32].
Pre-processed data was analyzed from the social network
perspective to understand and visualize interactions
between the different stakeholders.
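One way to obtain nodes and connectors from tweets is to treat each author and each mentioned account as a node and each mention as a connector. The sketch below assumes this is applied to the tweet text while the ‘‘@’’ markers are still present; the author names are hypothetical, and the edge definition is an assumption rather than the study's documented procedure.

```python
import re
from collections import Counter

def mention_edges(tweets):
    """tweets: iterable of (author, text). Count (author, mentioned user) pairs."""
    edges = Counter()
    for author, text in tweets:
        for mentioned in re.findall(r"@(\w+)", text):
            if mentioned.lower() != author.lower():   # ignore self-mentions
                edges[(author, mentioned)] += 1
    return edges

# Hypothetical authors; the handles are ones quoted elsewhere in the text.
tweets = [
    ("resident_a", "@CPMumbaiPolice @metrorailPune too much noise at night"),
    ("resident_b", "@metrorailPune traffic jam near the pillars"),
    ("resident_a", "@metrorailPune still noisy"),
]
edges = mention_edges(tweets)
nodes = {user for pair in edges for user in pair}
```

The edge counts can serve as connector weights when the network is drawn or analyzed.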
4 Background of case studies
Four metro projects located in different parts of India were
selected. These projects have attracted public attention,
mainly owing to inconveniences during construction.
Mumbai is the most densely populated city in India. For
our case study of the metro project, the selected period was
from 2015 to February 2021. The construction of Phase 2
of the Mumbai metro commenced with the underground
Line 3, the most controversial one. Line 2, Line 4, Line 5,
Line 6, and Line 7 are also under construction in different
phases. The metro lines other than Line 3 are primarily
elevated metro corridors. This provides a good mix of
different construction activities for studying the different
types of inconveniences they cause.
People have complained of displacement and reconstruc-
tion, noise pollution, tree-felling, and the violation of
environmental standards. In 2014, the state government
sanctioned 30 hectares of forest land from the no-development
area to the corporation to construct a metro car
shed. It triggered the Aarey protests, which prompted cit-
izens’ groups from all over Mumbai to launch the ‘‘Save
Aarey’’ campaign and file multiple court cases [33].
The Pune Metro has also been mired in
certain controversies regarding public issues. Ever since the
initial plans were made, Pune residents have opposed elevated
routes because they felt that the narrow roads would not be
able to handle the traffic resulting from the construction of
the pillars of elevated routes. According to city activists,
flyovers along the metro route and narrow roads on
the metro corridor would cause traffic congestion and
interruptions. Currently, three lines are under construction
in Pune. Phase 2 of the entire metro network is for Line 2,
and the construction work started in May 2016. Line 3
started construction in January 2020. For our data collection,
the period from January 2016 to February 2021
was considered.
Bangalore, a city in southern India, is infamous for its
traffic congestion problems. There has been a demand for
the metro project in the city since the early 2000s. The
construction of Phase 2 consists of the south extension of
the Green Line and the west extension of the Purple Line,
which began in October 2016. The Yellow Line and the
Pink Line have also been under construction since 2017
and 2018, respectively. However, the project has not been
without its fair share of criticism from citizens during the
construction phase. There is a significant loss of natural
green cover, and there have been extreme traffic woes due
to the already existing traffic situation in the city. The
timeline for data collection was from January 2016 to
February 2021.
In another significant city, Kolkata, the much-awaited
East–West Metro project turned out to be an ordeal for some of
the residents of Bow Bazaar, one of the most congested areas
of central Kolkata [34]. Many houses
in a narrow lane unexpectedly developed cracks and began
to cave in after a tunnel-boring machine of the Metro project,
operating 14 m below the ground, struck an aquifer (an
underground layer of water-bearing permeable rock). Two
buildings had collapsed entirely, and several more had
sustained significant damage. Another notable incident was
the collapse of the Majherhat Bridge due to the vibration
caused by Line 3. Line 2 has been under construction since
2010, and its progress has been very slow over the past
decade. The timeline for tweet extraction was selected as
the period from January 2010 to February 2021.
5 Data analysis
The collected data were analyzed using the different tools
mentioned in the research methodology section. This sec-
tion presents the outcomes of this analysis with references
to each case study project and investigates the pattern from
the cross-case analysis.
5.1 Inconvenience parameters comparison
After the first de-duplication step, as mentioned in
Sect. 3.2, the entire dataset of tweets was read manually.
Then the tweets were categorized according to the context
in which they were written. This had to be done because
the search query using the keywords for the inconvenience
and the city name sometimes returned tweets unrelated to
the construction work of the metro. Examples of relevant
and irrelevant tweets are given below.
Irrelevant Tweet—‘‘Your voice is drowned out at the
ticket counter of #MumbaiMetro by the noise of the
train—perfect metaphor for what #Reliance really
wants’’.
Relevant Tweet—‘‘ @CPMumbaiPolice Metro is good
but the machines on site are making too much noise.
Probably contractors are using outdated machines. And
it can continue till late in night till 1. I stay at A S Marg,
near Powai lake, Powai My good Mumbai Police pl do
something to stop this.’’
The keywords ‘‘noise’’ and ‘‘MumbaiMetro’’ used in the
search query are present in the irrelevant tweet as well.
However, here the tweet did not talk about the noise
emanating from the construction work of the metro but the
general noise from the train at the ticket counter. Hence,
this tweet was not included in our sample. In the relevant tweet,
the same keywords are present. The tweet talked about the
late-night noise from the metro construction machinery that
caused disturbance to residents. This is an example of
public inconvenience caused by metro construction that our
research aims to explore. The web crawler cannot distin-
guish between these two types of tweets as it tends to
extract every tweet containing the said keywords. Since
this process could not be entirely automated, a manual step
was required to weed out irrelevant data.
All irrelevant tweets were deleted, and in the end, 3018
tweets were analyzed for Mumbai, 796 tweets for Pune,
890 tweets for Bangalore, and 657 tweets for Kolkata.
Table 1 ranks the inconveniences faced by the
people in each city. ‘‘Traffic’’ was the most reported
inconvenience in Mumbai, Pune, and Bangalore, whereas, for
Kolkata, it was ‘‘Safety Concern.’’ This reflects that traffic is
a serious concern in the first three cities due to their already
congested, unplanned growth over the past few decades.
The construction of the metro rail further deteriorated the
situation. Kolkata residents were scared by the building
and Majherhat bridge collapses during the construction,
as discussed in Sect. 4. Among the categories with any
tweets, the least reported inconvenience for both Mumbai and
Bangalore was ‘‘Vibration,’’ while for Kolkata, it was
‘‘Construction Waste’’ (tied with ‘‘Noise’’ at 0.30%).
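The rankings read off Table 1 can be checked with a few lines of Python. The percentages below are copied from Table 1; the dictionary layout and the helper names are illustrative assumptions.

```python
CATEGORIES = ["Traffic", "Environmental Damage", "Miscellaneous", "Noise",
              "Construction Waste", "Dust", "Safety Concern", "Vibration",
              "Project Delay"]

# Percentage of tweets per category, in the row order of Table 1.
TABLE_1 = {
    "Mumbai":    [63.49, 13.45, 10.24, 9.08, 1.39, 1.09, 1.06, 0.20, 0.00],
    "Pune":      [65.19, 18.73, 12.78, 1.27, 1.01, 0.00, 1.01, 0.00, 0.00],
    "Bangalore": [56.14, 17.41, 12.05, 2.68, 3.46, 5.13, 2.34, 0.78, 0.00],
    "Kolkata":   [27.09, 0.91, 0.00, 0.30, 0.30, 0.76, 43.23, 7.46, 19.94],
}

def most_reported(city):
    """Category with the largest share of tweets in the given city."""
    shares = dict(zip(CATEGORIES, TABLE_1[city]))
    return max(shares, key=shares.get)

def least_reported(city):
    """All categories tied for the smallest non-zero share, sorted by name."""
    shares = dict(zip(CATEGORIES, TABLE_1[city]))
    nonzero = {k: v for k, v in shares.items() if v > 0}
    low = min(nonzero.values())
    return sorted(k for k, v in nonzero.items() if v == low)

for city in TABLE_1:
    print(city, most_reported(city), least_reported(city))
```

Categories with a zero share are excluded from the minimum, and ties are returned together, which matters for Pune and Kolkata where two categories share the lowest non-zero value.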
5.2 Sentiment analysis
Table 2 compares the three different sentiment types across
the four cities. In all cities, it was found that ‘‘neutral’’
sentiments were the least prevalent. The highest percentage
of tweets in all cities except Pune were ‘‘Negative.’’ This
implies that except for Pune, which had 46% positive
tweets, the overall perception of the project was more
negative while considering all the tweets mentioning public
inconveniences.
The percentage of negative tweets was highest for
Kolkata, followed by Mumbai and Bangalore. The three
cities were very close to each other in terms of negativity.
However, Kolkata had the lowest share of positive tweets, at a
significantly lower 25%. This can be attributed
to the building and bridge collapse incidents in Kolkata,
which created a negative impression of the project in the
public’s mind. Mumbai and Bangalore were close to each
other regarding positive sentiments, while Pune had the
highest share of positive tweets. Higher positivity in Pune could
be due to the lack of construction mishaps and the absence of
significant politically driven protests about any issue.
Kolkata had the highest percentage of neutral tweets
among all the cities. Mumbai, Pune, and Bangalore had
similar percentages of neutral tweets (12 to 13%).
5.3 Text analytics and topic modeling
Multi-dimensional scaling (MDS) was performed on all the text
documents containing the tweets of the cities. The relative
size of the bubbles in the MDS plot corresponds to the relative
frequency of occurrence of the terms. The
MDS visualization also showed term clustering using
various bubble colors; the clustering is based on the adjacency
of terms when mapped to the 2D plot space and is
only indicative [35]. KH Coder was used to create the MDS
plot.
Figure 1 shows the MDS plots for all the cities, with the
clusters of words that have the highest frequencies and
correlation. For the Mumbai plot, the blue cluster forms the
most dominant topic, comprising keywords like Mumbai,
Metro, Chief Minister, congestion, project, need, travel
time, pain, and stop, while the yellow cluster at
the center contains words like an hour, cause, road, time,
construction, problem, congestion, and jam.
From there, we can deduce some topics of discussion
like:
1. Need for the metro to reduce traffic congestion
2. Inconvenience caused by metro works on the road
3. Noise pollution caused by the construction of the
underground metro corridor
4. Loss of green cover due to the construction of metro
car shed at Aarey forest
Similarly, for Pune, the following topics can be
deduced:
1. Requests made to politicians and official bodies for a
metro, flyover, and ring road construction
2. Inconvenience caused by metro works on the road,
cutting of trees, and traffic congestion at Hinjawadi
3. Metro will help in reducing pollution and traffic congestion
on roads
4. Public transport like buses and metro solve transportation
problems in the city
Table 1 Inconvenience parameter distribution across different cities
Sr. No  Inconveniences         Mumbai   Pune     Bangalore  Kolkata
1       Traffic                63.49%   65.19%   56.14%     27.09%
2       Environmental Damage   13.45%   18.73%   17.41%     0.91%
3       Miscellaneous          10.24%   12.78%   12.05%     0.00%
4       Noise                  9.08%    1.27%    2.68%      0.30%
5       Construction Waste     1.39%    1.01%    3.46%      0.30%
6       Dust                   1.09%    0.00%    5.13%      0.76%
7       Safety Concern         1.06%    1.01%    2.34%      43.23%
8       Vibration              0.20%    0.00%    0.78%      7.46%
9       Project Delay          0.00%    0.00%    0.00%      19.94%
Table 2 Sentiment comparison across cities
Sentiments  Mumbai  Pune  Bangalore  Kolkata
Positive    38%     46%   40%        25%
Neutral     13%     12%   12%        24%
Negative    49%     42%   48%        51%
For Bangalore, the following main topics can be
deduced:
1. Congestion on the road caused by metro construction
2. Involvement of political parties and destruction of
lakes by construction debris
3. Irritation caused due to noise pollution
4. Urge politicians to finish the metro project fast to ease
the traffic problems
For Kolkata, the following topics can be deduced:
1. Extreme delay in the project
2. Metro construction works caused many building col-
lapses and damaged many others that developed cracks
The topic modeling was done using TACIT, a popular tool
used by researchers. The analysis was done using LDA, as
discussed in Sect. 3.3. A total of 10 probable latent topics
were calculated per city. Each topic contained a maximum
of 19 keywords, and the topic was deduced from them. The
topics with the highest probability value per city are listed
in Table 3.
For Mumbai, the debate among people regarding
environmental concerns due to the cutting of trees in the Aarey
colony was the strongest. The chief minister of Maharashtra
had also been very frequently mentioned during the
interactions. The topmost latent topic of discussion among
people of Pune was the traffic issues faced by people on the
road due to the construction of the metro, while some were
also hopeful that the metro construction would help solve
the traffic congestion in the streets of Pune. Also, people
were concerned about the cutting and felling of trees for
construction around the city, posing environmental threats.
The most probable topic of discussion among the citi-
zens of Bangalore was the concern for cutting and felling
trees and the filling of natural lakes in the city due to metro
rail construction.
The citizens in Kolkata were most concerned about the
building collapse incident in Bow Bazaar. The protests
through tweets questioned the irresponsible attitude of the
metro authorities and contractors and the overall safety
standards followed during construction.
Fig. 1 MDS plot of all cities: Clockwise from top left- Mumbai, Pune, Kolkata and Bangalore
5.4 Social network analysis
The social network analysis was conducted using NodeXL. A
datasheet of all the stakeholders across the four cases was
prepared, listing the users mentioned in each tweet. Harel and
Koren (HK) presented a fast multi-level layout algorithm for
drawing graphs [36]. This method comprises a two-phase process
that recursively coarsens the graph to arrive at a multi-level
representation. The graph is first embedded in a
high-dimensional space and then projected onto a
two-dimensional plane using principal component analysis. This
Harel-Koren Fast Multiscale Layout was used to plot the graphs
for all the cities. Figures 2, 3, 4, and 5 show the indicative
SNA graphs of all the cities. Table 4 shows the various SNA
metrics calculated across the cities.
Table 3 Topic Modeling results across cities
Cities     Topic Keywords
Mumbai     Chiefminister, mumbaimetro, arrey, greencover, deforestation, environment, tree, destruction, loss, carbon, footprint, metro, transport, protest, colony, cut, stop, carshed, construction
Pune       Pune, metro, traffic, metro, work, will, road, punemetro, trees, city, metrorailpune, s dev, traffic, roads, problem, city, citytraffic, due, project
Bangalore  Bangalore, metro, traffic, metro, work, trees, cut, road, will, bengaluru, roads s, city, traffic, nammametro, due, south, city, cpronammametro, construction
Kolkata    Kolkata, metro, metrorail, work, bowbazar, collapse, construction, metro s, west, work, east, city, traffic, eastline, road, taratala, traffic, railway, building
Fig. 2 SNA graph of Mumbai
Nodes: People and communities are the network’s nodes,
while the connections depict their interactions or flows.
SNA analyzes interpersonal interactions in both visual and
statistical modes. Mumbai had the most nodes (3031),
while Kolkata had the fewest (559) among the four cities.
This shows that the number of stakeholders using Twitter
was highest in Mumbai and lowest in Kolkata.
Edges: An edge joins two vertices, which are called its
endpoints or end vertices. A directed edge is an ordered pair
of nodes, drawn as an arrow from one node to the other. An
undirected edge ignores orientation and treats both endpoints
equally. Mumbai had the most edges, whereas Kolkata had the
fewest. This signifies that Mumbai had the largest number of
tweets directed at one or more stakeholders in order to
communicate with that person or organization. Such directed
tweets were fewest in Kolkata.
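The node and edge counts can be illustrated with a small mention network. The sketch below assumes each record pairs a tweet's author with the users it mentions. The handles are hypothetical, and the study's actual NodeXL datasheet layout may differ.

```python
def build_mention_graph(tweets):
    """Build node and directed-edge collections from (author, mentions) records."""
    nodes = set()
    edges = []
    for author, mentions in tweets:
        nodes.add(author)
        for user in mentions:
            nodes.add(user)
            edges.append((author, user))  # directed edge: author -> mentioned user
    return nodes, edges


# Hypothetical records, for illustration only.
tweets = [
    ("resident_a", ["metro_agency", "cm_office"]),
    ("resident_b", ["metro_agency"]),
    ("metro_agency", ["resident_a"]),
    ("resident_c", []),
]
nodes, edges = build_mention_graph(tweets)
print(len(nodes), len(edges))  # 5 4
```

A tweet without mentions adds a node but no edge, which is one reason node and edge counts need not move together across cities.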
Average Geodesic Distance: The shortest path, also
known as a geodesic path, is a path between two nodes in a
graph with the fewest edges. If the graph is weighted, it is a
path with the smallest sum of edge weights. Geodesic
distance, also known as the shortest distance, is the length
of a geodesic path. Geodesic paths between two nodes are not
always unique, but the geodesic distance is well-defined since
all such paths have the same length. It is a metric for the
number of steps required for one network member to reach
another. It measures transmission efficiency, since a longer
path length indicates that information must travel through more
participants to spread across the network. Bangalore had the
highest value (4.37), which means that information needed
longer paths to reach the intended stakeholder, while Kolkata
had the lowest value (3.35).
Diameter: The network diameter is the largest distance
between any two nodes in the graph, that is, the maximum
geodesic distance. According to Table 4, Bangalore had the
largest diameter (16), followed by Mumbai (12) and Pune (10),
while Kolkata had the smallest (9). This shows that the
Bangalore network was the most spread out and the Kolkata
network the most compact.
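Both the average geodesic distance and the diameter can be obtained by running a breadth-first search from every node. The sketch below assumes an unweighted graph treated as undirected and skips unreachable pairs. NodeXL's exact conventions may differ, and the edge list is hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return geodesic distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def geodesic_summary(edges):
    """Return (average geodesic distance, diameter) over reachable node pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total, pairs, diameter = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    average = total / pairs if pairs else 0.0
    return average, diameter


# Hypothetical edge list: a chain a-b-c-d with an extra branch b-e.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]
print(geodesic_summary(edges))  # (1.8, 3)
```

In this toy graph the longest shortest path runs from `a` to `d` (or from `d` to `e`) and has three steps, so the diameter is 3.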
Fig. 3 SNA graph of Bangalore
Graph Density: Network density is the ratio of the
connections that exist in a network to the maximum possible
number of connections. A potential connection is one that
could exist between two nodes. A higher number of connections
between people in a network is intuitively linked to a tighter
structure and greater cohesiveness, so density is a measure of
a structure’s cohesiveness. The density of a graph is a number
between 0 and 1, where 1 represents a situation in which each
node is connected to every other node. The network of Mumbai
was less completely linked than the other three: it had an
unweighted density of 0.0003, indicating that only 0.03% of
the possible links were present. Kolkata had the highest graph
density, 0.0019. This shows that the share of possible
connections actually realized was highest in Kolkata and
lowest in Mumbai.
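The density definition reduces to a one-line ratio. The sketch below assumes a simple graph without self-loops or duplicate edges. Whether the study's networks were treated as directed is not stated here, so the function exposes that choice as a parameter, and the counts are a toy example.

```python
def graph_density(num_nodes, num_edges, directed=True):
    """Ratio of existing edges to the maximum possible number of edges."""
    if num_nodes < 2:
        return 0.0
    possible = num_nodes * (num_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible //= 2                      # unordered pairs
    return num_edges / possible


# Toy example: 5 nodes and 4 edges.
print(graph_density(5, 4, directed=True))   # 0.2
print(graph_density(5, 4, directed=False))  # 0.4
```

Because the number of possible edges grows roughly with the square of the node count, larger networks tend to show lower density even when they contain more edges.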
Average Betweenness Centrality: Centrality indicators
are used to identify the most significant vertices in a graph.
Betweenness centrality measures the number of times a node
serves as a bridge along the shortest path between two other
nodes. Mumbai had the highest average betweenness centrality,
and Kolkata had the lowest.
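Betweenness centrality can be computed with Brandes' algorithm. The sketch below is a minimal version for unweighted graphs treated as undirected and without normalization. It is not NodeXL's implementation, and the star-shaped toy network is hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps every node to the set of its neighbours.
    """
    centrality = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


# Hypothetical star network: every resident mentions the same agency.
adj = {
    "agency": {"r1", "r2", "r3"},
    "r1": {"agency"},
    "r2": {"agency"},
    "r3": {"agency"},
}
scores = betweenness_centrality(adj)
print(scores["agency"])                    # 3.0
print(sum(scores.values()) / len(scores))  # 0.75
```

The hub lies on the shortest path between all three pairs of residents, so it scores 3, while each resident scores 0. The average over the four nodes is 0.75.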
Table 5 shows the top ten stakeholders with the highest
betweenness centralities across the four cities. The
stakeholders have been listed based on the professional or
societal position deduced from their Twitter handles.
Across the four cities, it is observed that the most signifi-
cant stakeholders were:
• Chief Minister/Chief Minister’s Office (CMO)
• Prime Minister/Prime Minister’s Office (PMO)
• Metro Rail Agency
Most tweets were directed towards the CMO in Mumbai
and Bangalore, while at least one user categorized under
‘‘Common People’’ was common across Pune, Bangalore,
and Kolkata. The Prime Minister was common across
Mumbai, Pune, and Bangalore, while the PMO was more
prevalent in Kolkata. The metro rail agency and minister of
railways were the most mentioned for Pune and Kolkata,
respectively.
Fig. 4 SNA graph of Kolkata
6 Conclusion
This research identified various inconveniences caused by
metro construction from tweets collected through the Twitter
API with the help of multiple keywords. Irrelevant tweets
picked up by the API were manually reviewed and eliminated.
Sentiment analysis for all four cities was done using
VADER. Pune had the highest percentage of positive
tweets (46%), and Kolkata had the highest percentage of
negative tweets (51%). Kolkata also had the most neutral
tweets (24%) among all four cities.
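Percentages of this kind are typically derived by labelling each tweet from its VADER compound score. The sketch below assumes the conventional cut-offs of ±0.05 documented for VADER. The thresholds actually applied in the study are not stated in this section, and the scores are hypothetical.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(compound_scores):
    """Return the percentage of tweets carrying each sentiment label."""
    if not compound_scores:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    labels = Counter(label_from_compound(c) for c in compound_scores)
    total = len(compound_scores)
    return {
        name: 100.0 * labels[name] / total
        for name in ("positive", "negative", "neutral")
    }


# Hypothetical compound scores for eight tweets.
scores = [0.6, -0.3, 0.0, 0.2, -0.7, 0.04, 0.5, -0.05]
print(sentiment_shares(scores))
# {'positive': 37.5, 'negative': 37.5, 'neutral': 25.0}
```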
Text analytics and topic modeling were done using
KH Coder and TACIT. Other than the word “Metro,” the
word “Traffic” was the most frequent across the text corpus
of Mumbai, Pune, and Bangalore, while for Kolkata, the most
frequent words were “Bow Bazar” and “Work.” Social network
analysis conducted using NodeXL showed the most prominent
stakeholders across the four cities. Topic modeling revealed
the most probable underlying themes of discussion among the
stakeholders.
This study helped to identify the problems that
ordinary people tweeted about most. Sentiment analysis
can help decision-makers consider public opinion, both
broadly and in depth, as public opinion becomes
increasingly relevant to management during the
feasibility review, design, and post-evaluation phases
of metro rail projects.
This study would help metro authorities and construction
companies understand the woes of the public and get a
clear picture of how people interact online. It is critical to
study this matter, as it can directly inform any organization
or authority of how a particular activity or sequence of
actions executed by the construction team affects the
public. In the era of big data, this approach can assist
professionals and authorities in being proactive, participating
in public discourse, and serving the public better during
the construction phase. Serving the public is, after all, the
primary purpose of a metro rail project.
Fig. 5 SNA graph of Pune
Table 4 SNA metrics across cities
SNA Metrics                      Mumbai     Pune       Bangalore  Kolkata
Nodes                            3031       1042       959        559
Edges                            3298       1862       896        814
Diameter                         12         10         16         9
Average Geodesic Distance        3.96       4.03       4.37       3.35
Graph Density                    0.0003     0.0015     0.0009     0.0019
Average Betweenness Centrality   4377.72    2364.54    1096.30    819.29
6.1 Research limitations
The research is an example of how sophisticated computing
techniques can be applied to construction informatics.
There are, however, a few limitations to this study. The
study analyzed a cross-sectional view of Twitter data within
a limited time frame. Although social media platforms like
Twitter have made inroads into the social life of citizens,
especially in urban settings, Twitter is yet to become a
popular mechanism for social interaction and is far from
mainstream use for voicing concerns in public life. Hence,
the Twitter data analyzed in this study may not fully reflect
the public inconvenience caused by metro construction
projects in India. Twitter users are generally young: 44% are
aged 18–24, 31% are aged 25–29, and 26% are aged 30–49. Since
younger people use Twitter more, the group that expresses
inconvenience on the platform is limited. In the coming
years, when the platform is expected to gain significant
popularity among more users (especially among the young
generation in urban settings), social media data and its
analysis will become more relevant and practical in policy
settings.
This research assumed that the content tweeted by users is
accurate. However, a user may post misinformation, and
there was no available method to check the veracity of a
tweet or verify the user’s identity, which makes the
authenticity of the tweets questionable. Tweets in local
languages were not included in the analysis. This is a
drawback because only a limited number of people posted
about inconveniences in English, so the analysis was not
comprehensive. Furthermore, socioeconomic and cultural
factors were not considered in the analysis.
6.2 Future scope
This research is an initial exploration into the different
inconveniences caused by the metro project’s construction
in Mumbai, Pune, Bangalore, and Kolkata through analysis
of tweets using Text Analytics, Topic Modeling, Sentiment
Analysis, and Social Network Analysis. There is immense
scope in the future to explore other domains. Further
research could enhance the accuracy of the sentiment
classification algorithm and develop various visualization
tools to present extracted online textual data. If comparable
data is gathered, social, cultural, and time influences on
public opinion would become clearer. There is a need for
better tools to extract and analyze local language tweets as
currently available tools only work with English words,
which results in the exclusion of many relevant tweets. An
improved emotion dictionary construction tool that can
automatically or semi-automatically evaluate attitudinal
terms and deal with out-of-vocabulary (OOV) words
should be developed. Better tools are required to identify
sarcasm in tweets. An extensive tool to detect the profes-
sion and the type of user handle on Twitter would be
helpful as this activity is currently performed manually,
consuming considerable time. There were also instances
when tweets unrelated to the public inconvenience were
extracted during the data collection. There is scope to
introduce more comprehensive textual analytical functions
to enable the crawler to collect data that accurately matches
the concerned subject.
Table 5 Top ten major stakeholders with highest betweenness centralities
Rank  Mumbai                                   Pune                     Bangalore                              Kolkata
1     Chief Minister                           Metro Rail Agency        Chief Minister                         Minister of Railways
2     Police                                   Chief Minister           Activist                               Common People
3     Chief Minister’s Office                  Data Analyst             South City Political Action Committee  Media
4     IAS Officer                              Common People            NGO worker                             Chief Minister
5     Metro Rail Agency                        Chief Minister’s Office  Prime Minister                         Common People
6     Prime Minister                           Traffic Police           Metro Rail Agency                      Economist
7     High Profile Politician                  Common People            Traffic Police                         Media
8     NGO                                      Police                   Common People                          Prime Minister’s Office
9     Minister of Road Transport and Highways  Prime Minister           Activism Page                          Metro Rail Agency
10    Metro Rail Agency                        Multi-professional       Chief Minister’s Office                Coder
Funding The authors received no financial support for this article’s
research, authorship, and/or publication.
Declarations
Conflict of interest The author(s) declare no potential conflicts of
interest concerning research, authorship, and/or publication of this
article.
References
1. Sharma N, Dhyani R, Gangopadhyay S (2013) Critical issues
related to metro rail projects in India. J Infrastruct Dev
5(1):67–86. https://doi.org/10.1177/0974930613488296
2. Manetti G, Bellucci M, Bagnoli L (2017) Stakeholder engage-
ment and public information through social media: a study of
canadian and american public transportation agencies. Am Rev
Public Adm 47(8):991–1009. https://doi.org/10.1177/
0275074016649260
3. Ninan J, Clegg S, Mahalingam A (2019) Branding and govern-
mentality for infrastructure megaprojects: the role of social
media. Int J Project Manage 37(1):59–72. https://doi.org/10.1016/
j.ijproman.2018.10.005
4. Pasayat A (n.d.) Kachrulal Bhagirath Agrawal Ors vs State Of
Maharashtra Ors on 22 September, 2004. Retrieved from
https://indiankanoon.org/doc/293583/
5. Hadi M (2001) DTI construction industry directorate and forestry
commission project report: prepared for: CD Framework: Best
value UK timber in construction Approved on behalf of BRE
6. Glass J, Simmonds M (2007) ‘‘considerate construction’’: case
studies of current practice. Eng Constr Archit Manag
14(2):131–149. https://doi.org/10.1108/09699980710731263
7. Schexnayder CJ (n.d.) Mitigation of night-time construction
noise, vibrations and other nuisances. Synthesis Practice 218,
National Cooperative Highway Research Program, Transporta-
tion Research Board, Washington, DC.
8. Duminda JMS (2010) Strategy to minimize user inconvenience
during road rehabilitation. University of Moratuwa
9. Griffith A, Lynde M (2002) Assessing public inconvenience in
highway work zones (No. FHWA-OR-RD-02-20). Oregon Dept
of Transportation Research Unit.
10. Shane JS, Amr Kandil CJS (2011) Nighttime construction
impacts on safety, quality, and productivity (Issue 10). https://
onlinepubs.trb.org/onlinepubs/nchrp/docs/NCHRP10-78_FR.pdf
11. Ferguson A (2012) Qualitative evaluation of transportation con-
struction related social costs and their impacts on the local
community (Issue May) [The University of Texas at Arlington].
https://rc.library.uta.edu/uta-ir/handle/10106/11165
12. Kukadia V, Upton S, Grimwood C (2003) Controlling particles,
vapour and noise pollution from construction sites; parts 1–5: site
preparation, demolition, earthworks and landscaping. BRE
Pollution Guide, pp 1–8
13. Xue X, Zhang R, Zhang X, Yang RJ, Li H (2015) Environmental
and social challenges for urban subway construction: an empirical
study in China. Int J Project Manag 33(3):576–588. https://doi.
org/10.1016/j.ijproman.2014.09.003
14. Ray R (2017) Open for business? Effects of los angeles metro rail
construction on adjacent businesses. J Trans Land Use
10(1):725–742. https://doi.org/10.5198/jtlu.2017.932
15. Chakraborty, D (2010) Mumbai Residents Oppose Elevated
Metro Corridors. Avail online at http://www.projectsmonitor.com
corridors, accessed in January 2011.
16. United States Environmental Protection Agency (n.d.) Public
Participation Guide: Social Media. Retrieved from
https://www.epa.gov/international-cooperation/public-participation-guide-social-media
17. Nik-Bakht M, El-Diraby TE (2020) Beyond chatter: profiling
community discussion networks in urban infrastructure projects.
J Infrastruct Syst 26(3):05020006.
18. Perera S, Victoria M, Brand S (2015) Use of social media in
construction industry: a case study. Going North for Sustain-
ability: Leveraging Knowledge and Innovation for Sustainable
Construction and Development: Proceedings of the International
Council for Research and Innovation in Building and Construc-
tion (CIB2015), 23-25 November 2015, South Bank University,
London, UK, 462-473.
19. Qi B, Costin A, Jia M (2020) A framework with efficient
extraction and analysis of Twitter data for evaluating public
opinions on transportation services. Travel Behav Soc 21:10–23.
https://doi.org/10.1016/j.tbs.2020.05.005
20. Kocatepe A, Ulak MB, Lores J, Ozguven EE, Yazici A (2018)
Exploring the reach of departments of transportation tweets:
What drives public engagement? Case Stud Trans Policy
6(4):683–694
21. Wojtowicz J, Wallace WA (2016) Use of social media by
transportation agencies for traffic management. Transp Res Rec
2551(1):82–89. https://doi.org/10.3141/2551-10
22. Basu R, Khatua A, Jana A, Ghosh S (2017) Harnessing twitter
data for analyzing public reactions to transportation policies:
evidence from the odd-even policy in Delhi, India. (November).
Retrieved from https://www.researchgate.net/publication/
321997978_Harnessing_Twitter_Data_for_Analyzing_Public_
Reactions_to_Transportation_Policies_Evidences_from_the_
Odd-Even_Policy_in_Delhi_India
23. Kaur N, Pushe V, Kaur R (2014) Natural language processing
interface for synonym. Int J Comput Sci Mob Comput
3(7):638–642
24. Hu X, Liu H (2012) Text analytics in social media. In: Mining
text data (pp 385–414). Springer, Boston, MA.
25. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation.
J Mach Learn Res 3:993–1022
26. Sujon M, Dai F (2021) Social media mining for understanding
traffic safety culture in washington state using twitter data.
J Comput Civ Eng 35(1):04020059. https://doi.org/10.1061/
(asce)cp.1943-5487.0000943
27. Hutto C, Gilbert E (2014) Vader: a parsimonious rule-based
model for sentiment analysis of social media text. In: Proceedings
of the international AAAI conference on web and social media,
vol 8, No. 1, pp 216–225
28. Borg A, Boldt M (2020) Using VADER sentiment and SVM for
predicting customer response sentiment. Expert Syst Appl
162:113746. https://doi.org/10.1016/j.eswa.2020.113746
29. Scott J (1988) Social network analysis. Sociology 22(1):109–127
30. Williams NL, Ferdinand N, Pasian B (2015) Online stakeholder
interactions in the early stage of a megaproject. Proj Manag J
46(6):92–110
31. Carrasco JA, Hogan B, Wellman B, Miller EJ (2008) Collecting
social network data to study social activity-travel behavior: an
egocentric approach. Environ Plann B Plann Des 35(6):961–980
32. Borgatti SP, Mehra A, Brass DJ, Labianca G (2009) Network
analysis in the social sciences. Science 323(5916):892–895
33. Johari A (2018) From parsis to adivasis, Mumbai’s metro project
faces heat from citizens. Retrieved from
https://scroll.in/article/881470/from-parsis-to-adivasis-mumbais-underground-metro-project-faces-heat-from-citizens
34. Chattyopadhay S (2019) In Kolkata, houses collapse during
Metro tunnelling work. Retrieved from https://frontline.thehindu.
com/dispatches/article29332315.ece
35. Palmer S, Udawatta N (2019) Characterising ‘‘Green Building’’
as a topic in Twitter. Constr Innov 19(4):513–530. https://doi.org/
10.1108/CI-02-2018-0007
36. Harel D, Koren Y (2002) A fast multi-scale method for drawing
large graphs. J Graph Algorithm Appl 6(3):177–202. https://doi.
org/10.7155/jgaa.00051