A visualisation project report built with R packages (ggplot) on the Global Terrorism dataset (1970-2015). It uses different interactive graphs, with colour combinations chosen so that colour-blind readers can also see the patterns.
ANALYSIS OF GLOBAL TERRORISM (PATH TO PEACE)
ABSTRACT
[In this project I have used various visualization methods to analyse different facets of the Global Terrorism Data. Different types of graphs, such as line graphs, bar graphs, maps and a word cloud, have been used for this purpose. Most of these graphs are interactive, with the facility to zoom in/out and hover the mouse over a data point. The statistical and data analysis software R is used for the entire project.]
SIDDARTH CHAUDHARY
X16137001
Data Visualization Report
M.Sc Data Analytics
National College of Ireland
1 Introduction
Terrorism is one of the three major problems the world is facing; the other two are food and pollution. Unfortunately, as technology advances, the techniques used to kill people and to convince people to kill others are becoming easier. We all appreciate the internet, search engines and social media for bringing the world together, but on the other hand they are becoming prima facie channels for the promotion of terrorism. We cannot directly stop these attacks, but we can analyse these activities in depth, which might help us find the root causes of terrorism in this world and the path it will lead us down in future.
2 Literature Review
According to Heinz Kohut [1], a need for revenge arises when a person has suffered a narcissistic injury. Anyone involved in any form of terrorism has their own reasons. Sometimes the person or group is motivated by another person or group that has suffered, or they may have a personal motive such as revenge, wealth or power. As these groups become stronger and bigger, they are used by influential people and political parties for their own advantage. Rummel [2] estimates that 170 million civilians were killed in the 20th century, and that Stalin, Mao and Hitler were responsible for 100 million of them.
Lotto [3] analysed various causes of terrorism; the important ones are humiliation, exploitation and being a victim of violence. According to Volkan [4], Germany's humiliation was responsible for the origin of World War II. According to Bacevich [5] and Lotto [3], between World War II and 1960 colonial rule ended in many countries, such as India, and many secular and democratic governments came to power. This was a time when the whole world was taking a new shape. Then from 1980, starting with Jimmy Carter's presidency, the US took wide military action in the Middle East. There have been military interventions in 43 African and Middle Eastern countries with majority-Muslim populations. Various factors lay behind these wars, such as protecting oil supplies, the Cold War, rescuing hostages and, sometimes, attacking non-supporting countries.
Lee [6] analysed another important cause of terrorism: oil. He presented a detailed study of the association between oil and terrorism. According to Freeman [7], terrorist groups need a lot of funding for training, weapons and carrying out attacks; for example, the 9/11 attack is estimated to have cost half a million dollars. According to Lee [6], this money comes predominantly from oil production because of its huge profits, religious orientation and commercial advantages. Lee [6] also explained another aspect: oil-producing countries become terrorism targets, either to impact the world economy or enemy countries, or to raise funds. To test his hypothesis, Lee used the GTD (Global Terrorism Data) and the International Terrorism: Attributes of Terrorist Events (ITERATE) database and applied various statistical models.
3 Dataset Understanding
1.1 About Data
Global Terrorism Data is an open-source dataset published on Kaggle [8]. It contains details of international and national terrorist events that happened around the globe between 1970 and 2016. The data covers more than 150,000 cases. For each case it gives the date, time, location, latitude, longitude, weapons, casualties and responsible group, along with a description of the case. It includes data on 83,000 bombings and 11,000 kidnappings.
1.2 Data Reduction
Originally there are 137 columns in this data, out of which I have selected 18 columns for this analysis. The table below explains the important fields and the reason for their selection. Although this data qualifies for very advanced, high-level analytics, given the scope of this assignment I have tried to answer some questions that need numeric data. The remaining fields, which I have not selected, are descriptive and could be used for text and sentiment analytics.
Seq | Name of field | Type of field | Description | Code example | Selected / reason
1 | Id | Numeric | Unique id of each incident | 1, 2 | No / It does not provide any substantial information
2 | Year | Numeric | Year of the incident | 1970-2015 | Yes / Helpful in finding the relationship of events with respect to time
3 | Month | Numeric | Month of the incident | 1, 2 | Yes / Helpful in finding the relationship of seasons with events
4 | Day | Numeric | Day of the incident | 1, 5 etc. | Yes / Useful if we want to check whether there is any pattern in the event date
5 | Time | Numeric | Time of the incident | 12:30 | Yes / Useful if we want to know the preferred time of attack
6 | Country_name | Text | Country where incident happened | India, Turkey | Yes / Helpful in understanding which countries are attacked more and why
7 | Region_txt | Text | Region of the incident | North America | Yes / Helps in region-based analysis
8 | City | Text | City where incident happened | Denver | Yes / Helps in finding which cities are attacked more and when
9 | Latitude | Text | Latitude of city | | Yes / Helps in mapping the location
10 | Longitude | Numeric | Longitude of city | | Yes / Helps in mapping the location
11 | attack_type | Text | Type of attack | Bombing, Assassination | Yes / Helps in finding patterns in attacks with respect to time and countries
12 | target_type | Text | Target type | Government, Police | Yes / Helps in finding patterns in target type with respect to time and countries. We can also check the relation between attack type and target type
13 | target_sub_type | Text | Target subtype | Embassy/Consulate | Yes
14 | nationality_of_target | Text | Nationality of target | USA/Germany | Yes / Which countries are targeted more and at what time
15 | weapon_type | Text | Type of weapon used in attack | Bomb/Firearms | Yes / Helps in understanding the weapons used in attacks
16 | Minor_injured_people | Numeric | Number of minors injured in attack | 1, 2 | Yes / To understand how and when children are more impacted
17 | Major_injured_people | Numeric | Number of non-minors injured in attack | 1, 2 | Yes / To understand which places are more prone to attack
18 | People_killed | Numeric | Number of people killed | 1, 2 | Yes / Which attacks are more dangerous
Table-1 Data columns of data set
1.3 Data Cleaning
A lot of cleaning has been done on the data, especially on the text fields. Many names and summary details contained special characters, which were removed. Some spellings of cities and states were corrected.
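A cleaning step of this kind could be sketched in base R as follows. This is only an illustration under assumptions: the sample values and the helper name clean_text are hypothetical and are not taken from the project code.

```r
# Hypothetical raw text values; the real columns and contents may differ.
raw_city <- c("  Bagh#dad ", "New  Delhi!!", "Bel*fast", "Denver")

# Illustrative helper (assumed name): remove special characters and tidy spaces.
clean_text <- function(x) {
  x <- gsub("[^[:alnum:][:space:]]", "", x)  # drop anything that is not a letter, digit or space
  x <- gsub("[[:space:]]+", " ", x)          # collapse repeated whitespace
  trimws(x)                                  # strip leading and trailing spaces
}

clean_city <- clean_text(raw_city)
print(clean_city)
```

Spelling corrections would still need a manual lookup of known variants, since a regular expression cannot decide which spelling is the right one.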
2 Analysis and Visualization tool
I have used R for this entire project. R is a statistical analysis software with convenient data modelling and visualization packages. The code written for this project has been uploaded to the submission link on Moodle. The R packages used include plotly, graphics and stats.
3 Data visualization
For each visualization I have performed the following steps.
1. Business question: It states the problem/pattern which we are trying to find from the data.
2. Data preparation: It explains how the data was prepared in order to visualise or analyse it and answer the question. All the data setup is done in R.
3. Data visualization: It is the graphical representation of the data, produced with the help of R.
4. Data analysis: It is the analysis of the graph and the deductions which can be made from it.
5. Statistical analysis: It is the statistical analysis which supports the analysis made from the graph.
4 How has terrorism grown in the past 45 years?
Data preparation: Initially the data was loaded into R. Then it was grouped by year with the help of the table function in R.
Data Visualization: I have prepared a histogram and a line graph to validate the pattern. The graphs were prepared with the help of plotly and the plot function in R. The first one is an interactive graph prepared with plotly.
Graph-1 Line graph    Graph-2 Histogram
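The grouping and plotting described here might look roughly like the sketch below. The data frame incidents and its column year are assumed stand-ins for the real data, and the plotly part only runs when that package is installed.

```r
# Toy stand-in for the incident data; `year` is an assumed column name.
incidents <- data.frame(year = c(1970, 1970, 1971, 1972, 1972, 1972))

# table() counts the rows per year, as in the data preparation step.
cases_per_year <- as.data.frame(table(year = incidents$year),
                                responseName = "cases",
                                stringsAsFactors = FALSE)
cases_per_year$year <- as.integer(cases_per_year$year)

# Static line graph with the base plot function.
plot(cases_per_year$year, cases_per_year$cases, type = "l",
     xlab = "Year", ylab = "Number of cases")

# Interactive line graph, built only if plotly is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(cases_per_year, x = ~year, y = ~cases,
                       type = "scatter", mode = "lines+markers")
}
```

Printing p in an interactive session would open the plotly graph with zoom and hover support.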
Analysis: As the graphs show, there has been tremendous growth in terrorism over the past 45 years. There is no fixed trend or cycle in the graph. From 1970 to 2000 there is one peak, which is consistent with the view in the literature review ([3], [5]) that around the 1960s there was not much terrorism because new countries were coming to power and were busy establishing themselves. The decade 1990-2000 saw fewer cases than the previous decade. Various factors could explain this, such as economic conditions or the war in the Gulf, but this will need further analysis by regression.
After 2005 terrorism has risen sharply. Below is the overall statistical summary of the number of cases per year. The yearly count has reached the third-quartile level twice over the whole period, which explains the sudden drop after 2000.
Statistic | Min | 1st Quartile | Median | Mean | 3rd Quartile | Max
Year | 1971 | | 1986 | | | 2014
Cases | 470 | 1332 | 2860 | 3484 | 3887 | 16840
Statistical analysis: I performed an ANOVA between year and the number of cases in each year. I chose ANOVA because year is treated as a factor (categorical data), so an ANOVA is a better choice than regression. Following are the coefficients and F statistics of the ANOVA.
Source | Df | Sum square | Mean square | F Value | P Value
year | 44 | | 911099 | 52.63 | 2.2e-16 ***
Residuals | 540 | 9337023 | 17291 | |
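A one-way ANOVA with year as a factor can be run with aov from the stats package. The sketch below uses synthetic monthly counts for three years, so the numbers are purely illustrative and do not reproduce the table above.

```r
set.seed(1)

# Synthetic monthly counts for three years (illustrative values only).
monthly <- data.frame(
  year  = rep(c(1970, 1990, 2014), each = 12),
  cases = c(rpois(12, 50), rpois(12, 300), rpois(12, 1400))
)

# Year is wrapped in factor() so that it is treated as a category, not a number.
fit <- aov(cases ~ factor(year), data = monthly)

# summary() returns the table with Df, Sum Sq, Mean Sq, F value and Pr(>F).
anova_table <- summary(fit)[[1]]
print(anova_table)
```

With k years as factor levels the first row has k - 1 degrees of freedom, which is why 45 years give 44 degrees of freedom in the report.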
This data qualifies for time series analysis and, as further analysis, can be correlated with various factors of the world economy. I also plotted the autocorrelation (ACF) and partial autocorrelation (PACF) graphs to see the relation between consecutive years; there is a relationship up to a time lag of 7 years.
Graph-3 ACF Graph-4 PACF
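The ACF and PACF can be computed with the acf and pacf functions from the stats package. The following sketch uses a synthetic yearly series in place of the real counts, and the 1.96 / sqrt(n) bound is the usual approximate 95% limit, stated here as an assumption about how significance is judged.

```r
set.seed(42)

# Synthetic yearly series with persistence, standing in for cases per year.
yearly_cases <- as.numeric(arima.sim(model = list(ar = 0.8), n = 45)) + 100
series <- ts(yearly_cases, start = 1970, frequency = 1)

# plot = FALSE returns the values; call plot(acf_out) to draw the graph.
acf_out  <- acf(series, lag.max = 10, plot = FALSE)
pacf_out <- pacf(series, lag.max = 10, plot = FALSE)

# Lags whose autocorrelation exceeds the approximate 95% bound.
bound <- 1.96 / sqrt(length(series))
significant_lags <- which(abs(acf_out$acf[-1]) > bound)
print(significant_lags)
```

Note that acf reports lag 0 (always 1) as its first value, whereas pacf starts at lag 1.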
5 Is there any relationship between month and number of attacks?
The reason for this analysis is to check whether there are any months in which attacks are more frequent. For example, in many Asian countries there are more attacks in summer than in winter.
Data preparation: The data was grouped by month and year. This was done in R with the help of the table function.
Data Visualization: I have prepared a mosaic plot for this question. Mosaic plots are well suited to comparing matrix data where the row and column variables can be the same or different. In this graph, year is on the x axis and month on the y axis. The size of each rectangle shows the number of attacks carried out in that month and year. I have chosen different colours with different intensities in such a way that red-green and yellow-blue never appear as a pair. This avoids confusion for colour-blind readers and on black-and-white prints.
Graph-5-Mosaic plot of Number of cases in each year and month
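A mosaic plot of this kind can be drawn with mosaicplot from the graphics package. The sketch below uses random data and the Okabe-Ito palette, which is a commonly recommended colour-blind-friendly palette and is swapped in here for illustration; it is not necessarily the colour scheme used in Graph-5.

```r
set.seed(7)

# Random year/month pairs standing in for the incident data (assumed column names).
incidents <- data.frame(
  year  = sample(2010:2013, 400, replace = TRUE),
  month = sample(1:12, 400, replace = TRUE)
)

# Two-way table of counts; factor levels keep every year and month present.
counts <- table(Year  = factor(incidents$year, levels = 2010:2013),
                Month = factor(incidents$month, levels = 1:12))

# Okabe-Ito colours, recycled to give one colour per month.
okabe_ito <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
               "#0072B2", "#D55E00", "#CC79A7", "#999999")
month_cols <- rep(okabe_ito, length.out = ncol(counts))

mosaicplot(counts, color = month_cols, las = 2,
           main = "Number of cases in each year and month")
```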
Analysis: Since the mosaic plot is unable to provide any concrete answer, I plotted a boxplot and performed an ANOVA on this data to analyse the relationship between month and the total number of terrorist cases.
Statistical Analysis: As the box plot shows, there is not much difference between the boxes for the various months. January, September and December are a little smaller than the other months, but this difference does not seem to be significant. To confirm this, I used ANOVA to test the variability. Following is the output of the F statistics of the ANOVA.
Source | Df | Sum square | Mean square | F Value
Cases | 1 | 116.8 | 116.808 | 8.4
Residuals | 583 | 8073.2 | 13.08 |
Significance levels: 0.001, 0.01, 0.05
Table-2 Statistical analysis of month/year impact
As the F statistic of the ANOVA shows, the results are not very significant. So we do not reject the null hypothesis that there is not much difference in the total number of cases in each month.
6 Which countries have seen the most terrorism cases and killings?
Data preparation: The data was grouped by country in R with the help of the table function.
Data Visualization: To prepare this graph, the plot_geo function of the plotly package is used. It requires all the data to be arranged by country names currently recognised by the UN. Some countries existed in the past or have since been divided into countries with new names; these are difficult to visualize with plot_geo because it would need an old map of the world, so those countries were dropped from the list. I have chosen different shades of red to show the number of terrorism cases: a deeper red means more cases and white means no cases. There are three maps, plotting the number of incidents in each country, the number of killings in each country and the number of US citizens killed in each country. All these graphs are interactive, with the facility to zoom in/out.
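A choropleth of cases per country could be set up along the following lines. The data frame and its column names are assumptions, the counts are made up, and the map is only built when plotly is installed.

```r
# Toy incident list; `country` is an assumed column name.
incidents <- data.frame(
  country = c("Iraq", "Iraq", "Iraq", "India", "India", "Pakistan")
)

# Count incidents per country with table().
cases_by_country <- as.data.frame(table(country = incidents$country),
                                  responseName = "cases",
                                  stringsAsFactors = FALSE)

if (requireNamespace("plotly", quietly = TRUE)) {
  # locationmode = "country names" matches rows to the map by country name,
  # which is why historical country names cannot be placed.
  map <- plotly::plot_geo(cases_by_country, locationmode = "country names")
  map <- plotly::add_trace(map, z = ~cases, color = ~cases, colors = "Reds",
                           locations = ~country, text = ~country)
  map <- plotly::colorbar(map, title = "Cases")
}
```

The same structure would work for the killings maps by replacing the counted column with a sum of people killed per country.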
Map-1
Map-1 shows the countries in different shades according to the number of cases. As the map shows, the Middle East, Afghanistan, Pakistan and India have faced the highest number of cases.
Map-2
Map-2 shows the countries where the highest number of killings occurred over the last 45 years. Again, Iraq, Afghanistan, Pakistan, India and South America are among the top 10.
Map-3
Map-3 shows the number of US citizens killed throughout the world in different cases. There has not been any significant number except in Iraq and Afghanistan, due to the war situation in these two countries.
To find the correlation between US citizen killings and the total number of killings, I carried out a correlation analysis between the two data sets. The correlation coefficient of the two data sets is 0.27.
Graph-8 is the scatter plot of the total number of killings and US citizen killings in each incident. As the graph shows, there is not much correlation between the two, and the number of US citizens killed is less than 100 while the other component is in the thousands. So we can rule out the possibility of US citizens being the motive in all cases.
Graph-8
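The correlation and scatter plot can be reproduced in outline with cor and plot. The vectors below are hypothetical per-incident figures with assumed names, so the resulting coefficient is not the 0.27 reported for the real data.

```r
# Hypothetical per-incident figures (made-up values).
killed_total <- c(0, 2, 5, 12, 40, 1, 3, 150)
killed_us    <- c(0, 0, 1, 0, 2, 0, 0, 3)

# Pearson correlation coefficient between the two series.
r <- cor(killed_total, killed_us, method = "pearson")
print(r)

# Scatter plot of total killings against US citizen killings.
plot(killed_total, killed_us,
     xlab = "Total people killed", ylab = "US citizens killed")
```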
7 What type of person is targeted the most?
To answer this question we need to perform text analytics on the summary details provided for each incident. There is a column named summary which provides the basic details of each incident, such as what, where and how. The data from this column was extracted and converted into one-grams of text (an n-gram corresponds to breaking the text into sequences of n words). In other words, each word of the text is separated, cleaned and counted. The collection of text being analysed is known as a corpus in text analytics.
Data Visualization: The visualization tool used for this question is a word cloud. A word cloud is a collection of words where the size of each word corresponds to its frequency.
Graph-9
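The one-gram counting behind a word cloud can be sketched in base R as below. The summaries, the stop-word list and the threshold of 2 are illustrative assumptions (the report used 3000 on the full data), and the wordcloud call only runs if that package is installed.

```r
# Hypothetical incident summaries.
summaries <- c(
  "Gunmen attacked a police station.",
  "A bomb exploded near a police checkpoint; civilians were injured.",
  "Assailants attacked civilians at a market."
)

# Split into lower-case words (one-grams) and drop empty strings.
words <- unlist(strsplit(tolower(summaries), "[^a-z]+"))
words <- words[nchar(words) > 0]

# Remove a small, illustrative set of stop words.
stop_words <- c("a", "at", "near", "were", "the")
words <- words[!words %in% stop_words]

# Count each word and keep those at or above the frequency threshold.
word_freq <- sort(table(words), decreasing = TRUE)
min_count <- 2
frequent  <- word_freq[word_freq >= min_count]
print(frequent)

# Draw the cloud only if the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(frequent), as.integer(frequent),
                       min.freq = min_count)
}
```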
Analysis: Although a word cloud is just a pictorial representation of the text data, it can impart a lot of information. The word cloud in Graph-9 consists of all words which are repeated more than 3000 times. So this single graph is sufficient to tell who is targeted the most.
People impacted: Citizens, Military, Police, Diplomatic
Property: Private, Government
Places impacted: Institution, Religious
This information closely matches the information extracted above from the various graphs and statistical analyses. This data can be further used to perform sentiment analysis.
8 Which countries have been attacked more in the last 45 years?
This is one of the most important questions that this dataset can answer. Some countries are more impacted than others, or there may be a shift in attacks over time. We can find out which country was most attacked in which year. This will also be a reflection of the world and country economy. Developed countries are major targets of attack, to impact share markets or trade. But sometimes developing countries are also hit, to lower their growth rate or perhaps for other reasons.
Graph-10
Graph-11 Graph-12
Analysis: Since the above question has a broad scope and there are many ways to represent this information, I have chosen an interactive graph to display these facts. Graph-10 is a line graph of year against number of cases, with a different coloured line for each country. If we put the cursor on a line, it shows the country name and the number of cases in that year. This graph can be summarized or expanded over durations of 5, 10, 15 and 45 years. Graph-11 is a zoomed version of the first graph for the period 2007 to 2015. The countries visible in the first and second graphs are the ones with the highest number of cases in the last 45 years. Graph-12 is a bar graph which provides this information precisely, along with figures.
Further Analysis: If this data is combined with various other economic and resource data (such as oil, gas and manpower), we could try to predict terrorism cases in the near future for these countries.
9 Conclusion
The Global Terrorism Database is a vast dataset with numerous possible visualizations and analyses. This data set can be used for time series analysis to forecast future cases of terrorism in different countries. It can also be used to forecast the types of weapons and places targeted in the near future. What we have done in this project is just the tip of the iceberg, with huge possibilities for different causal analyses which can be used as precautionary measures to maintain peace in this world.
10 References
[1] Kohut, H., 1972. Thoughts on narcissism and narcissistic rage. The Psychoanalytic Study of the Child, 27(1), pp.360-400.
[2] Rummel, R.J., 1997. Death by Government. Transaction Publishers.
[3] Muenster, B. and Lotto, D., 2010. The social psychology of humiliation and revenge. The Fundamentalist Mindset: Psychological Perspectives on Religion, Violence, and History, pp.71-89.
[4] Volkan, V.D., 1988. The Need to Have Enemies and Allies: From Clinical Practice to International Relationships. Jason Aronson.
[5] Bacevich, A.J., 2017. America's War for the Greater Middle East: A Military History. Random House Trade Paperbacks.
[6] Lee, C.Y., 2016. Oil and Terrorism: Uncovering the Mechanisms. Journal of Conflict Resolution, p.0022002716673702.
[7] Freeman, M., 2011. The Sources of Terrorist Financing: Theory and Typology. Studies in Conflict and Terrorism, 34(6), pp.461-475.
[8] https://www.kaggle.com/datasets