Data Warehousing and Business Intelligence
Project
on
Impact of Travel & Tourism on Economy
Shantanu Deshpande
x18125514
Video Link: https://youtu.be/1upKlsPfWJ4
MSc/PGDip Data Analytics – 2019/20
Submitted to: Prof. Sean Heeney
National College of Ireland
Project Submission Sheet – 2019/2020
School of Computing
Student Name: Shantanu Deshpande
Student ID: x18125514
Programme: MSc Data Analytics
Year: 2019/20
Module: Data Warehousing and Business Intelligence
Lecturer: Prof. Sean Heeney
Submission Due Date: 12/04/2019
Project Title: Impact of Travel & Tourism on the Economy
I hereby certify that the information contained in this (my submission) is information
pertaining to my own individual work that I conducted for this project. All information
other than my own contribution is fully and appropriately referenced and listed in the
relevant bibliography section. I assert that I have not referred to any work(s) other than
those listed. I also include my TurnItIn report with this submission.
ALL materials used must be referenced in the bibliography section. Students are
encouraged to use the Harvard Referencing Standard supplied by the Library. To use
another author’s written or electronic work is an act of plagiarism and may result in
disciplinary action. Students may be required to undergo a viva (oral examination) if there
is suspicion about the validity of their submitted work.
Signature:
Date: April 12, 2019
PLEASE READ THE FOLLOWING INSTRUCTIONS:
1. Please attach a completed copy of this sheet to each project (including multiple copies).
2. You must ensure that you retain a HARD COPY of ALL projects, both for
your own reference and in case a project is lost or mislaid. It is not sufficient to keep
a copy on computer. Please do not bind projects or place in covers unless specifically
requested.
3. Assignments that are submitted to the Programme Coordinator office must be placed
into the assignment box located outside the office.
Office Use Only
Signature:
Date:
Penalty Applied (if applicable):
Table 1: Mark sheet – do not edit
Criteria Mark Awarded Comment(s)
Objectives of 5
Related Work of 10
Data of 25
ETL of 20
Application of 30
Video of 10
Presentation of 10
Total of 100
Project Check List
This section captures the core requirements that the project entails, represented as a
checklist for convenience.
Used LATEX template
Three Business Requirements listed in introduction
At least one structured data source
At least one unstructured data source
At least three sources of data
Described all sources of data
All sources of data are less than one year old, i.e. released after 17/09/2017
Inserted and discussed star schema
Completed logical data map
Discussed the high level ETL strategy
Provided 3 BI queries
Detailed the sources of data used in each query
Discussed the implications of results in each query
Reviewed at least 5-10 appropriate papers on topic of your DWBI project
Impact of Travel & Tourism on the Economy
Shantanu Deshpande
x18125514
April 12, 2019
Abstract
Tourism to a destination is often under-appreciated in terms of its economic
importance. Advances in the transportation industry have produced bigger and
faster airplanes, which have made travelling to different locations more
affordable and convenient. This has led to an increase in tourist numbers
worldwide. Very often, this growth has a strongly positive impact on a country's
key development indicators. To understand its importance, I have built a Data
Warehouse and a Business Intelligence model that uses tourist arrival data as
its main source and compares it with other development indicators, such as the
unemployment rate and GDP, to observe the impact of tourism on the country's
development.
1 Introduction
Travelling has become a customary habit for people across the globe, whether for
business or for leisure. Improved air connectivity between countries has reduced
both travel time and travel costs, and this is one of the major factors that have
boosted tourism globally. Tourists contribute to sales, profits, jobs, tax revenues,
and income in an area. The most direct effects occur within the primary tourism
sectors: lodging, restaurants, transportation, amusements, and retail trade. Through
secondary effects, tourism affects most sectors of the economy. An economic impact
analysis of tourism activity normally focuses on changes in sales, income, and
employment in a region resulting from tourism activity Stynes (1997). In view of
these economic advantages, many countries have started taking steps to ease the visa
norms for tourists Kumar (2018). In this project, I analyze the year-wise tourism
rate across several countries and compare it with a few key country development
indicators, such as GDP and the unemployment rate, to identify the relationship
between them. Using this data warehouse, I develop a MOLAP (Multidimensional
Online Analytical Processing) cube that addresses the following business queries:
(1) Is there a relationship between the number of tourists visiting a country and the
unemployment rate of a country?
(2) Does the growth in international travel and tourism affect the GDP of the country?
(3) How have the top 5 European countries performed in terms of the global GDP
contribution of the Travel and Tourism industry?
2 Data Sources
For implementing this project, I have made use of 6 data sets fetched from 5 different
sources, of which 5 are structured and 1 is unstructured. They are summarized in
brief below.
Source | Type | Brief Summary
World Bank | Structured | Country-wise data on the international arrival rate, which is useful for my query
OECD | Structured | Country-wise unemployment rate, which is joined with the World Bank data for my first query
Wikipedia | Unstructured | An online encyclopedia containing the relevant data table related to tourism, which helped me with my second query
WTTC | Structured | Data on the economic impact of the travel and tourism industry, which is useful for comparison with the worldwide GDP contribution
Statista | Structured | Statistical data containing the worldwide total GDP contribution, used as a benchmark for one of the queries
Table 2: Summary of sources of data used in the project
2.1 Source 1: OECD
The data set represents the yearly unemployment rate for around 42 countries and
spans 2014 to 2018. The data is periodically updated on the website and was
downloaded in CSV format. It was subsequently loaded into R for cleaning. Two
columns that were not useful were removed from the dataset, and the Country code
column was converted to Country name so that the join operation could be performed.
The following columns were used for the query-
1) Country
2) Year
3) Unemployment Rate
The dataset was downloaded from the following URL-
URL: https://data.oecd.org/unemp/unemployment-rate.htm
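The cleaning described above was done in R and that code is not reproduced in this section. As a rough illustration only, the same three steps (drop unused columns, map country codes to names, skip missing values) can be sketched in Python. The column names (LOCATION, TIME, Value, Flag Codes), the lookup table and every data value below are assumptions made up for the example, not taken from the OECD file.

```python
# Illustrative sketch of the OECD cleaning step, not the author's R code.
# Column names, the code-to-name lookup and all values are hypothetical.

raw_rows = [
    {"LOCATION": "AAA", "TIME": "2016", "Value": "8.4", "Flag Codes": ""},
    {"LOCATION": "BBB", "TIME": "2016", "Value": "", "Flag Codes": "E"},
    {"LOCATION": "CCC", "TIME": "2017", "Value": "27.5", "Flag Codes": ""},
]

# Hypothetical lookup from country code to country name (the report takes
# the names from the World Bank data set).
code_to_name = {"AAA": "Country A", "BBB": "Country B", "CCC": "Country C"}


def clean_unemployment(rows, lookup):
    """Keep Country/Year/Unemployment Rate and drop incomplete rows."""
    cleaned = []
    for row in rows:
        name = lookup.get(row["LOCATION"])
        value = row["Value"].strip()
        if name is None or value == "":
            continue  # plays the role of na.omit(): skip rows with gaps
        cleaned.append(
            {
                "Country": name,
                "Year": int(row["TIME"]),
                "UnemploymentRate": float(value),
            }
        )
    return cleaned


cleaned = clean_unemployment(raw_rows, code_to_name)
for record in cleaned:
    print(record)
```

Replacing the code with the name before loading means the later join on country and year can use the same key in every staging table.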
Figure 1: Unemployment Rate
2.2 Source 2: World Bank
This data set contains 9 years of tourist arrival data for around 242 countries. In the
cleaned data, each row corresponds to the number of tourist arrivals in a country for a
particular year. This is a periodically updated dataset which is available for download
on databank.worldbank.org. The file format was CSV, and the file was loaded into R for
cleaning, which included transposing the year columns into rows and removing the NULL
values. The relevant R code used for the cleaning process is given in the appendix.
After cleaning, there were 4 columns of data-
1) Country
2) Country Code
3) Year
4) Arrival Nos.
URL: https://databank.worldbank.org/data/reports.aspx?source=2series=ST.INT.ARVL
This dataset is used for my first query.
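The transpose step can be pictured as a wide-to-long reshape: one column per year becomes one row per country and year. The sketch below is a Python illustration of that idea under stated assumptions, not the R code from the appendix. The sample rows are invented, and treating ".." and empty strings as missing-value markers is an assumption about the export format.

```python
# Illustrative wide-to-long reshape; sample values are made up.

wide_rows = [
    {"Country": "Country A", "Country Code": "AAA",
     "2015": "100", "2016": "..", "2017": "120"},
    {"Country": "Country B", "Country Code": "BBB",
     "2015": "", "2016": "50", "2017": "55"},
]

ID_COLUMNS = ("Country", "Country Code")
MISSING = ("", "..", None)  # assumed markers for missing values


def wide_to_long(rows, id_columns=ID_COLUMNS):
    """Turn one column per year into one row per (country, year)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue  # identifier columns are repeated on every output row
            if value in MISSING:
                continue  # drop missing observations
            record = {key: row[key] for key in id_columns}
            record["Year"] = int(column)
            record["Arrivals"] = float(value)
            long_rows.append(record)
    return long_rows


long_rows = wide_to_long(wide_rows)
for record in long_rows:
    print(record)
```

With all years in a single column, the data can be joined to the unemployment data on country and year.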
Figure 2: Arrival Rate
2.3 Source 3: Wikipedia (Unstructured)
Wikipedia is an online encyclopaedia and one of the most popular repositories for general
reference work. My third dataset has been taken from this website. Here, I fetched the
data on countries showing strong international travel and tourism growth between 2010
and 2016. The data was located using Selector Gadget and then loaded into R for cleaning.
The countries listed are among the less developed ones; however, in recent years they
have seen a massive inflow of tourists. This may be due to steps taken by their
governments to boost tourism. The following columns were used for the query-
1) Country
2) Percent growth (tourist nos.)
The dataset was downloaded from the below URL-
https://en.wikipedia.org/wiki/Tourism
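The scraped table needs two small fixes before it can be loaded: the rank column is dropped and the percent sign is stripped so the growth value becomes numeric. The report does this in R with tidyr. The Python sketch below shows the same idea on invented rows, so the country labels and percentages are placeholders rather than the Wikipedia figures.

```python
# Illustrative cleanup of a scraped (rank, country, growth) table.
# Rows are placeholders, not the values from the Wikipedia page.

scraped_rows = [
    ("1", "Country A", "73.5%"),
    ("2", "Country B", " 30.1 %"),
    ("3", "Country C", "22%"),
]


def parse_percent(text):
    """Convert a string such as '30.1 %' to the float 30.1."""
    return float(text.strip().rstrip("%").strip())


def clean_growth_table(rows):
    """Drop the rank column and convert the growth column to a number."""
    return [
        {"Country": country.strip(), "GrowthPercent": parse_percent(growth)}
        for _rank, country, growth in rows
    ]


growth_table = clean_growth_table(scraped_rows)
for record in growth_table:
    print(record)
```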
Figure 3: Tourism Growth Rate
2.4 Source 4: World Travel & Tourism Council (Structured)
Two separate datasets have been taken from the World Travel & Tourism Council (WTTC).
The WTTC hosts data on the economics of the travel and tourism industry for countries
around the globe through a separate Data Gateway portal. Access to the data was
provided after a written email request stating the purpose of the study and how the
data would be used. The data for the relevant countries was fetched individually and
then merged using R to form two separate datasets that were used for two queries. The
data on this website is periodically updated, hence no publication date is available.
The data incorporates the below columns-
1) Year
2) Country
3) Local Currency in Bn Nominal prices
4) Local Currency in Bn Real prices
5) Percentage growth
6) Percentage of GDP
7) USD in bn Nominal prices
8) USD in bn Real prices
The datasets were downloaded from the below URL-
URL: https://www.wttc.org/datagateway
Figure 4: Tourism GDP Contribution
Figure 5: Email Request
Figure 6: Access provided to data portal
2.5 Source 5: Statista (Structured)
Statista is a German online portal for statistics, which makes available data collected
by market and opinion research institutes as well as data derived from the economic
sector. The company provides statistics and survey results, which are presented in
charts and tables (Statista). The dataset for the third query has been downloaded from
Statista. The data represents the total contribution of the travel and tourism industry
to the global economy from 2006 to 2017, with values in trillion US dollars. The data
was downloaded in Excel format with two sheets: the first, Overview, contained the
metadata, and the second contained the relevant data for the query. Using R, the first
sheet was removed and the file was converted to CSV format. The release date of the
report is March 2018. The data has the following three columns-
1) Year
2) Direct Contribution (in trillion USD)
3) Total Contribution (in trillion USD)
The link to the dataset is given below-
URL: https://www.statista.com/statistics/233223/travel-and-tourism--total-economic-contribution-worldwide/
Figure 7: Worldwide T&T GDP Contribution
3 Related Work
The growth of the global travel industry has been substantial and will presumably
continue for a long time to come. The travel industry's significance to the economies
of numerous industrial and developing nations has also increased drastically. The
following points explore the topic in more detail, drawing on past work relevant to the
project requirements. A study that reviewed around 114 articles identified openness to
trade and tourist security as the critical success factors for the growth of the tourism
industry Kristo (2014). Studies suggest that an increase in tourism demand may alter
the country's patterns of production and specialization, in particular by crowding out
internationally traded sectors (i.e. export and import-competing sectors) Sahli & Nowak
(2005). This subsequently affects the GDP of the country and also the employment rate;
however, the impact may be positive or negative depending on the residents' attitude to
the growing number of tourists. While the initial phases of tourism development are
typically met with a lot of enthusiasm on the part of local residents because of the
apparent financial advantages, it is only natural that, as undesirable changes take
place in the physical environment and in the type of tourist being attracted, this
feeling gradually becomes increasingly negative. In order to have sustainable tourism
growth, these factors need to be monitored periodically.
Underdeveloped countries, beset by incapacitating rural poverty, have extensive
potential to attract travellers looking for new, authentic experiences in areas of
unexploited natural and cultural riches. The direct effects occur within the primary
tourism sectors: hotels, restaurants, transportation, and retail trade Stynes (1997).
Underdeveloped countries generally lack strong employment sectors, hence the economy
does not flourish at the required rate. For such countries, tourism is one sector that
can help lift the economy. The challenging part for the government is to allocate funds
for marketing tourism at a global level.
However, a growing rate of tourism does not necessarily mean a significant increase
in the revenue generated by this industry. The spending behaviour of tourists must be
taken into account so as to gain the most from each individual tourist, as this leads
to overall growth in the sector as well as in the economy.
4 Data Model
The literature reviewed above clarified what my methodology should be and how to
proceed while keeping all my business requirements in view. I have incorporated the
country development indicators in order to build a system that surfaces meaningful
relationships and thereby enables the respective government bodies to take steps to
improve the country's economy. The four key decisions to be made during the design of
a dimensional model according to the Kimball approach are-
1. Selecting the business process.
2. Declaring the grains.
3. Identifying the dimensions.
4. Identifying the facts.
Here, I have used Kimball's bottom-up approach, in which the data marts are created
first and the data warehouse is then built from them. In general, a data mart is made
up of its dimension tables and a fact table, and data marts are assembled to form a
data warehouse. I chose Kimball's approach because, throughout his work, Ralph Kimball
consistently supported involving the end users in the process Chhabra and Pahwa (2016).
For this project, I have assembled data for the last 4-5 years and provide a filtered
analysis so that end users can make sound decisions. To achieve this, I joined my
datasets on the key values that they share. The dataset from section 2.1, which
contains the unemployment rate, is joined with the dataset from section 2.2 on country
and year. Similarly, the other datasets have been joined on either country or year.
From the resulting data, I created 2 dimensions, DimCountry and DimYear, which are
discussed in brief below-
DimCountry: This dimension table consists of all the countries that are present in
our data tables. As the data sources covered different countries, I had to write a
SQL query with a UNION operation in order to collect all the countries required for
the queries. The attributes contained in this dimension are Country ID and Country
Name. Country ID is a key that I generated in SSIS and is referenced from the fact
table as a foreign key.
DimYear: In this dimension, we have used only the Year attribute along with the
Year ID, which is a primary key that I generated in SSIS. The Year ID acts as a
foreign key in my fact table. With the help of DimYear we can observe how our
measures change over time.
Now, let us discuss the facts that act as the measures and support the business
process-
Fact Table: The fact table is created with the help of the dimension tables and holds
the primary keys of both dimensions as foreign keys. The fact table plays a crucial
role in setting up the ground for our business query requirements. It contains all the
measures required for a thorough analysis of our BI queries. The fact table comprises
the following columns-
Country ID: Foreign key referencing the primary key of DimCountry
Year ID: Foreign key referencing the primary key of DimYear
Arrival Rate: The arrival rate of tourists in a particular country over the year.
Unemployment Rate: The rate of unemployment of a country over the year.
Tourism growth percent: The tourism growth rate between 2010-2016 for 10 less developed
countries.
TnT GDP Contribution10: The contribution of travel and tourism to the country's GDP.
TnT TotalGDPContribution: The overall contribution of the travel and tourism industry
to the worldwide GDP.
TnT GDP Contribution5: The contribution of travel and tourism to the country's GDP.
All the above dimensions and facts are our base and form our data mart. Kimball's
approach is followed and the following schema is formed:
Figure 8: Star Schema
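The star schema described above can be written down as table definitions. The project builds these tables in SQL Server through SSIS; the sketch below uses Python's built-in sqlite3 module purely as a stand-in so that the structure can be run anywhere. The exact column names and data types are assumptions based on the attribute list in this section.

```python
import sqlite3

# Stand-in for the SQL Server tables: two dimensions and one fact table.
# Column names and types are assumed from the description in this section.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimCountry (
        CountryID   INTEGER PRIMARY KEY,
        CountryName TEXT NOT NULL UNIQUE
    );

    CREATE TABLE DimYear (
        YearID INTEGER PRIMARY KEY,
        Year   INTEGER NOT NULL UNIQUE
    );

    CREATE TABLE FactTable (
        CountryID                INTEGER REFERENCES DimCountry (CountryID),
        YearID                   INTEGER REFERENCES DimYear (YearID),
        ArrivalRate              REAL,
        UnemploymentRate         REAL,
        TourismGrowthPercent     REAL,
        TnT_GDP_Contribution10   REAL,
        TnT_TotalGDPContribution REAL,
        TnT_GDP_Contribution5    REAL
    );
    """
)

# List the tables that now exist in the in-memory database.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

Each fact row points at exactly one country and one year, which is what lets the cube slice every measure along those two dimensions.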
5 Logical Data Map
The Logical Data Map below documents how the star schema was obtained. It lists all of the dimensions and facts
that have been used and how they were transformed before loading.
Table 3: Logical Data Map describing all transformations, sources and destinations
for all components of the data model illustrated in Figure 8
Source | Column | Destination | Column | Type | Transformation
1,2,3,4 | Country | DimCountry | Country name | Dimension | In a few of the data sources the country name was spelled incorrectly, so names were matched using the match() function
1,2,4,5 | Year | DimYear | Year | Dimension | Values contained a junk prefix ('x'), which was removed using the separate() function
2 | Tourist arrival | FactTable | Arrival Rate | Fact | Values were converted to float type
1 | Value | FactTable | Unemployment Rate | Fact | Values were converted to float; null values were removed using the na.omit() function
3 | Percentage | FactTable | Tourism growth percent | Fact | The percent symbol was separated using the separate() function
4 | USD in bn real prices | FactTable | TnT GDP contribution10 | Fact | Values were converted to float type
5 | Total contribution | FactTable | TnT Total GDP Contribution | Fact | Values were converted to float type
4 | USD in bn real prices | FactTable | TnT GDP contribution5 | Fact | Values were converted to float type
6 ETL Process
Data warehouses are primarily used for decision-making, hence the foremost requirement
of a data warehouse is correct data, which avoids misleading calculations. The ETL
process consists of Extraction, Transformation and Loading, which involve the
following prominent tasks:
a) identification of relevant information at the source level.
b) extraction of the appropriate information.
c) integration of the information obtained from multiple sources into a common format.
d) transforming the integrated data model through a cleaning process based on business
rules or requirements.
e) loading the processed information onto the data warehouse / data mart Rahm & Do
(2000).
Mentioned below is the ETL strategy that I have applied in my project.
Figure 9: Cube Formation
6.1 Extraction:
The first data set for my project was extracted from OECD. It had 8 columns in total,
covered the year-wise unemployment rate for around 42 countries, and was extracted in
CSV format.
The second data set was extracted from the World Bank. It has 16 columns and contains
the tourist arrival rate for 269 countries. The data is laid out horizontally, with one
column per year, so I had to perform a column transpose in R.
The third data source is an unstructured one, extracted from Wikipedia. From here, I
extracted a data table listing the countries that showed strong growth in international
travel and tourism between 2010-2016. It had two columns: country name and the
percentage growth in tourist arrivals.
The fourth and fifth data sets were extracted from the World Travel & Tourism Council
(WTTC). Depending on the query requirements, the countries were individually selected
along with the required attributes and downloaded as multiple files. These files were
then grouped and split into two using R for two separate BI queries.
The source for my sixth dataset is Statista. The data extracted from this website
consisted of two Excel sheets. The first sheet was of no use, hence it was removed, and
the second sheet, which contained the contribution of travel and tourism to the global
GDP, was converted into CSV format.
6.2 Transformation:
Data cleaning, also called data cleansing or scrubbing, is an important function of
transformation. It deals with detecting and removing errors and inconsistencies from
data so as to improve its quality. Data quality problems are commonly observed in
single data collections, such as files and databases, the causes being misspellings
during data entry, missing information or other invalid data Rahm & Do (2000). All the
datasets extracted above from multiple sources need to be transformed into a clean
format that makes them suitable for loading into our data warehouse. Transformation is
one of the most time-consuming activities, as all the data sources must be thoroughly
cleaned in order to produce a reliable business solution. The cleaning mechanisms used
for each source are described below.
OECD: First, I scanned the data extracted from my first source and identified the
attributes required for my query. The columns that were not required were removed from
the dataset using R. The dataset contained only country codes and not country names.
In order to achieve a successful join, I had to replace the country codes with country
names. For this, I used the match() function in R and took the country names from my
second dataset, which was sourced from the World Bank. Thereafter, I found a few
missing values, which were removed using the na.omit() function in R.
World Bank: As mentioned above, the data was laid out horizontally, i.e. for each year
there was a separate column with the value in the respective cell. For performing the
join operation, I needed all the years in one single column. For this, I used the t()
function in R while keeping the first two columns intact. Then I used the rep()
function to repeat the values in the first two columns so that they lined up with the
corresponding transposed values. There were some bad entries in the dataset, which were
replaced with NA values and subsequently removed using the na.omit() function.
Wikipedia: The third dataset is unstructured. The data table was located using Selector
Gadget in Chrome and loaded into R for cleaning. For this, I used two R libraries,
tidyr and htmltab. The data contained an initial rank column that was not required and
was removed. Another column included the percentage (%) symbol next to the number, so I
used the separate() function to remove the symbol and converted the result into numeric
values. I also had to correct the names of a couple of countries that were wrongly
spelt during extraction.
WTTC: As described above, country-wise datasets were extracted for each relevant query,
which resulted in multiple files from the source. For cleaning, the libraries readxl,
data.table and tidyr were used. The data, in this case as well, was laid out
horizontally; it was transposed and the country name was repeated to match the number
of corresponding transposed values. The year column contained a junk prefix, x, that
was removed using the separate() function.
Statista: The data extracted from the final source had two sheets. The first one held
the metadata, which was not required and was thus removed using R. The second sheet
contained three columns that were already clean; I only had to remove the initial two
rows and change the column names.
All the above transformations have been carried out in RStudio and the relevant R code
is given in the Appendix.
6.3 Loading:
After the completion of the above stages, we have to move our cleaned data to the
staging area. The data is brought from SSMS into SSIS, where it is staged from a Flat
File Source to an OLE DB Destination. SSIS offers several components, of which the OLE
DB Destination is the one that loads data into database tables using SQL commands. In
this stage, I created 6 flat file sources that load the data into staging tables.
Additionally, I created an Execute SQL task to create the dimension tables from the
staged data. Within this task, I have written a SQL query to create the dimension
tables, followed by an insert query to insert the data from all the staging tables. By
completing this process, I got two dimension tables, which are then used for populating
our fact table. For the fact table, another Execute SQL task is used, and the output of
the previous SQL task for the dimensions is given as its input. In the SQL task for the
fact table, we create the fact table and run a SQL query that obtains the values
required for our business queries from all the staging tables. After this, the final
step is to populate the fact table with the relevant measures.
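The load order described above (staging tables, then dimensions built with UNION, then the fact table built with joins) can be sketched as follows. This is an illustration under assumptions, not the SSIS package itself: sqlite3 stands in for SQL Server, only two of the six staging tables are shown, and the table names, column names and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two hypothetical staging tables with made-up rows.
    CREATE TABLE StgArrivals (Country TEXT, Year INTEGER, Arrivals REAL);
    CREATE TABLE StgUnemployment (Country TEXT, Year INTEGER, Rate REAL);
    INSERT INTO StgArrivals VALUES ('A', 2016, 10.0), ('A', 2017, 12.0), ('B', 2016, 5.0);
    INSERT INTO StgUnemployment VALUES ('A', 2016, 8.0), ('A', 2017, 7.0), ('C', 2016, 9.0);

    -- Dimensions: UNION collects each distinct country and year once.
    CREATE TABLE DimCountry (CountryID INTEGER PRIMARY KEY AUTOINCREMENT, CountryName TEXT UNIQUE);
    CREATE TABLE DimYear (YearID INTEGER PRIMARY KEY AUTOINCREMENT, Year INTEGER UNIQUE);
    INSERT INTO DimCountry (CountryName)
        SELECT Country FROM StgArrivals UNION SELECT Country FROM StgUnemployment ORDER BY 1;
    INSERT INTO DimYear (Year)
        SELECT Year FROM StgArrivals UNION SELECT Year FROM StgUnemployment ORDER BY 1;

    -- Fact table: one row per (country, year) that has at least one measure.
    CREATE TABLE FactTable (CountryID INTEGER, YearID INTEGER,
                            ArrivalRate REAL, UnemploymentRate REAL);
    INSERT INTO FactTable
        SELECT c.CountryID, y.YearID, a.Arrivals, u.Rate
        FROM DimCountry c
        CROSS JOIN DimYear y
        LEFT JOIN StgArrivals a ON a.Country = c.CountryName AND a.Year = y.Year
        LEFT JOIN StgUnemployment u ON u.Country = c.CountryName AND u.Year = y.Year
        WHERE a.Arrivals IS NOT NULL OR u.Rate IS NOT NULL;
    """
)

fact_rows = conn.execute(
    """
    SELECT c.CountryName, y.Year, f.ArrivalRate, f.UnemploymentRate
    FROM FactTable f
    JOIN DimCountry c ON c.CountryID = f.CountryID
    JOIN DimYear y ON y.YearID = f.YearID
    ORDER BY c.CountryName, y.Year
    """
).fetchall()
for row in fact_rows:
    print(row)
```

The LEFT JOINs keep a country-year row even when only one source has a value for it, leaving the other measure empty instead of dropping the row.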
While executing the query that populates the fact table, it is important to have proper
SQL joins in place so that the correct values get populated; otherwise the results will
be misleading. This was one of the most challenging parts, as I had 6 different
datasets with differing values and needed to experiment with the join queries multiple
times. Each time the complete process is executed, the data gets loaded again, creating
duplicate values. To avoid this, I added a truncate query at the start of the SSIS
process. It truncates all the staging tables and drops the dimension and fact tables
before the process reruns, so that the data gets populated properly every time. Once
the fact table is populated, we have to create our schema. Star schemas
characteristically consist of fact tables linked to associated dimension tables via
primary/foreign key relationships. OLAP cubes can be equivalent in content to, or more
often derived from, a relational star schema Kimball/Ross (2016).
In SSAS, I first created a new data source, which defines the connection to the
database that SSIS had populated. This imported all of my tables, including dimensions
and facts, into SSAS. Our main aim was to obtain a star schema, for which we followed
Kimball's methodology of data modelling. To accomplish this, I then created a data
source view in which I chose the tables that were required. Here I selected my two
dimensions and the fact table, and after processing, the data source view gave me the
desired star schema design. The final step was to create the MOLAP cube. For this, a
new cube had to be built, within which our existing dimension and fact tables were
selected. Next, I named the cube and clicked the process button to deploy it. Once the
deployment reports success, we can move over to the browser section and drag the fact
count field into the query field to confirm whether the fact table has been populated
properly. With this, we can say that my cube is successfully deployed.
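The duplicate-row problem and its fix can be shown with a small rerun test: if every run first clears the staging table and rebuilds the fact table, running the load twice leaves the same row count. The sketch below is an assumption-laden stand-in (sqlite3 instead of SSIS and SQL Server, one invented staging table); SQLite has no TRUNCATE statement, so DELETE is used in its place.

```python
import sqlite3


def reload(conn, rows):
    """Clear staging, rebuild the fact table, and return the fact row count."""
    conn.execute("DELETE FROM StgArrivals")  # stand-in for TRUNCATE TABLE
    conn.execute("DROP TABLE IF EXISTS FactTable")
    conn.executemany("INSERT INTO StgArrivals VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE TABLE FactTable AS "
        "SELECT Country, Year, Arrivals AS ArrivalRate FROM StgArrivals"
    )
    conn.commit()
    # Same idea as checking the fact count in the cube browser.
    return conn.execute("SELECT COUNT(*) FROM FactTable").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StgArrivals (Country TEXT, Year INTEGER, Arrivals REAL)")

# Hypothetical staged rows.
rows = [("A", 2016, 10.0), ("B", 2016, 5.0)]

first_count = reload(conn, rows)
second_count = reload(conn, rows)
print(first_count, second_count)
```

If the two clearing statements are left out, the second run doubles the staging rows, which is the duplication described above.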
7 Application
In the sections above, we observed the successful deployment of our cube, which we now
use to answer our business queries. I chose these queries because they are useful for
reviewing the points discussed in section 3. The results of these queries, together
with the previous related work on the subject, are discussed in detail in section 7.4.
We now evaluate our 3 business queries and check our results:
7.1 BI Query 1: Is there a relationship between the number of
tourists visiting a country and the unemployment rate of a
country?
The sources contributing to this query are data source (2.1) and data source (2.2).
Figure 10 shows a visual comparison of the arrival rate and the unemployment rate for
each country. From the graph, we can observe an inverse relationship between the two:
as the tourist arrival rate increases, the unemployment rate decreases. Hence we can
say that tourism is one of the factors that can reduce the unemployment rate of a
country. Surprisingly, the graph also shows that although the tourist arrival rate in
South Africa is increasing, it is not reducing the unemployment rate; in fact,
unemployment is increasing as well.
Figure 10: Results for BI Query 1
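The report reads this relationship off the chart. One way to put a number on an inverse relationship of this kind is the Pearson correlation coefficient, which is close to -1 when one series falls as the other rises. The sketch below is an addition for illustration and not part of the original analysis; both series are invented and do not come from the project's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical yearly values for one country (not real data).
arrivals_millions = [8.0, 8.5, 9.1, 9.8, 10.4]
unemployment_rate = [11.0, 10.1, 9.0, 8.2, 7.5]

r = pearson(arrivals_millions, unemployment_rate)
print(round(r, 3))
```

A strongly negative value is consistent with an inverse relationship, but it does not by itself show that tourism causes the change in unemployment.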
7.2 BI Query 2: Does the growth in International travel and tourism
affect the GDP of the country?
The sources contributing to this query are data source (2.3) and data source (2.5).
The first graph in Figure 11 shows the percentage growth in international travel and
tourism between 2010-2016. Our analysis shows that GDP also increased significantly
over these years, which indicates a distinct relationship between the tourism industry
and the GDP of a country. For only 1 country, Sao Tome and Principe, the GDP growth is
not significant although the tourism rate has increased by almost 30%.
Figure 11: Results for BI Query 2
7.3 BI Query 3: How have the top 5 European countries performed
in terms of the global GDP contribution of the
Travel and Tourism industry?
The sources contributing to this query are data source (2.4) and data source (2.5).
Figure 12 shows the year-wise growth in the worldwide GDP contribution of the travel
and tourism industry. In the second graph we can see the top 5 European countries by
tourist numbers and the GDP contribution of their tourism industries. As visible from
the graph, the GDP contribution of France and Spain is below the average global GDP
contribution of the travel and tourism industry, whereas the United Kingdom and Italy
have outperformed the global GDP growth rate.
Figure 12: Results for BI Query 3
7.4 Discussion
With the business queries answered and the graphs in hand, we can discuss the
implications of each in turn.
BI query 1 gave us the relation between the tourist arrival rate and the unemployment
rate of a country. We can observe that as tourism grows in a country, unemployment
falls, i.e. the two have an inverse relation. Tourist spending, as also mentioned in
section 3, lies primarily in the purchase of goods and services from a variety of
industries, with usually rather less than two-thirds of the expenditure being in the
hotels and restaurants normally identified with the tourism industry (De Kadt 1979).
This spending pattern thereby creates jobs for locals.
The second query gives us the relation between the growth of tourism and the GDP of
the country. The countries studied were some of the less developed countries. As
described in section 3, less developed countries usually have unexploited natural and
cultural riches. If local bodies take proper promotional initiatives, tourists would be
willing to explore new places and thereby boost the country's economy. From our graph,
we can notice a strong positive relation between the two attributes, which highlights
the significant impact tourism can have on the economy.
The third and final query visualizes, on a global level, how the top 5 most-visited
European countries have performed over recent years in terms of the tourism
contribution to GDP. From the graph, it can be noted that although tourist numbers are
high in France and Spain, their contribution to the global economy is below the average
growth rate. Their governments could work on strategies that encourage tourists to
spend more while they travel within these countries. Germany, Italy and the United
Kingdom, by contrast, have contributed almost equally and at a similar rate to the
global GDP contribution.
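The relations discussed above could also be quantified rather than read off the graphs. The following is a minimal sketch, assuming a data frame with one row per year; the data frame `toy`, its column names and all of its numbers are made up for illustration and are not taken from the project data.

```r
# Illustrative values only (hypothetical, not from the project data sets)
toy <- data.frame(
  Year            = 2009:2014,
  Tourist_Arrival = c(1.0, 1.2, 1.5, 1.9, 2.4, 3.0),  # millions of arrivals
  Unemployment    = c(9.5, 9.1, 8.4, 7.9, 7.0, 6.2),  # percent
  Tourism_GDP     = c(0.8, 0.9, 1.1, 1.4, 1.8, 2.1)   # US$ bn
)

# Pearson correlation: a negative value means the two series move in
# opposite directions, a positive value that they move together
r_unemployment <- cor(toy$Tourist_Arrival, toy$Unemployment)
r_gdp          <- cor(toy$Tourist_Arrival, toy$Tourism_GDP)

print(round(c(unemployment = r_unemployment, gdp = r_gdp), 3))
```

A correlation coefficient only describes how two series move together; on its own it does not show that tourism causes the change.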
8 Conclusion and Future Work
Using the data fetched for analysis, I built a data warehouse that helps correlate
tourism parameters with country development indicators. The graphs show that the
desired results were achieved. I observed that, in recent years, there has been strong
growth in the tourism industry worldwide. This growth is primarily due to reduced
travel times and costs and better means of communication. The notable aspect is the
positive impact on key country development indicators. Many countries have started
acknowledging this by allocating part of their budget towards the betterment of
tourism-related services. However, it is equally important to consistently monitor the
impact on the GDP contribution, because a rising tourism rate does not necessarily
improve GDP at a similar rate. Future work could analyze tourist sentiment to
understand what factors are considered before deciding to visit a place. This could be
compared across countries and modelled to improve the tourism rate. Understanding the
spending pattern of tourists is also important in order to monitor the impact of the
tourism industry on the GDP contribution.
References
De Kadt, E. (1979), ‘Tourism: Passport to development’, Perspectives on the social and
cultural effects of tourism in developing countries.
Kimball/Ross (2016), ‘Star schema OLAP cube - Kimball dimensional modeling
techniques’.
Kristo, J. (2014), ‘Evaluating the tourism-led economic growth hypothesis in a developing
country: The case of Albania’, Mediterranean Journal of Social Sciences 5(8), 39.
Kumar, V. R. (2018), ‘Ease visa norms for free movement of tourists, says
United Nations body’,
https://www.thehindubusinessline.com/news/variety/ease-visa-norms-for-free-movement-of-tourists-says-united-nations-body/article20602398.ece1.
Rahm, E. & Do, H. H. (2000), ‘Data cleaning: Problems and current approaches’, IEEE
Data Eng. Bull. 23(4), 3–13.
Sahli, M. & Nowak, J.-J. (2005), ‘Migration, unemployment and net benefits of inbound
tourism in a developing country’.
Stynes, D. J. (1997), ‘Economic impacts of tourism’, Illinois Bureau of Tourism,
Department of Commerce and Community Affairs.
Appendix
R code example
#Extraction and cleaning Unemployment Rate - Data Source 1
setwd("C:/Users/shant/Desktop/Data-Files")
unemployment <- data.frame(read.csv("Unemployment-total.csv"))
arrival <- data.frame(read.csv("International-arrival.csv"))
#Changing the column names
colnames(unemployment) <- c("Country", "Indicator", "Subject", "Measure", "Fr
#removing columns that are not required
unemployment[,4:5] <- NULL
unemployment[6] <- NULL
#replacing country codes with country names
unemployment$Country <- arrival$Country.Name[match(unemployment$Country, arri
#removing NA values
unemployment <- na.omit(unemployment)
write.csv(unemployment, "Unemployment-rate-cleaned.csv", row.names = FALSE)
#Extraction and cleaning of Arrival data - Data Source 2
#install.packages("reshape2")
library(reshape2)
setwd("C:/Users/shant/Desktop/Data-Files")
arrival_rate <- data.frame(read.csv("International-arrival.csv", stringsAsFac
head(arrival_rate)
arrival_rate[,1:2] <- NULL
arrival_rate <- subset(arrival_rate, select = -c(3,4))
head(arrival_rate)
#arrival_rate <- melt(arrival_rate, id = c("Country.Name", "Country.Code"))
data <- t(arrival_rate)
arrival_rate <- cbind(arrival_rate[rep(1:nrow(arrival_rate), each=10), 1:2], #this
Year=c(2009:2018), #this gives the year column
Tourist_Arrival=as.vector(data[3:12, ])) # the Average Educat
arrival_rate$Country <- NULL
arrival_rate$Tourist_Arrival <- as.character(arrival_rate$Tourist_Arrival)
arrival_rate$Tourist_Arrival[arrival_rate$Tourist_Arrival == ".."] <- "NA"
#arrival_rate <- arrival_rate[!(arrival_rate$Tourist_Arrival == "NA")]
arrival_rate <- arrival_rate[-c(2640:2690), ]
#arrival_rate <- na.omit(arrival_rate)
write.csv(arrival_rate, "arrival-rate-cleanned.csv", row.names = FALSE)
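The reshaping step above relies on hard-coded row and column positions. The following is a minimal base-R sketch of the same wide-to-long idea on a small made-up table. It assumes one column per year prefixed with `X` (as `read.csv` does for numeric headers) and `..` marking missing values; the names `wide`, `year_cols` and `long` and all figures are hypothetical.

```r
# Hypothetical wide table: one row per country, one column per year
wide <- data.frame(
  Country.Name = c("Iceland", "Georgia"),
  X2009 = c("494000", ".."),
  X2010 = c("489000", "1067000"),
  stringsAsFactors = FALSE
)

# Pick out the year columns by name instead of by position
year_cols <- grep("^X[0-9]{4}$", names(wide), value = TRUE)

# unlist() walks the year columns one after another, so the whole country
# vector is repeated once per year and each year once per country
long <- data.frame(
  Country         = rep(wide$Country.Name, times = length(year_cols)),
  Year            = rep(as.integer(sub("^X", "", year_cols)), each = nrow(wide)),
  Tourist_Arrival = unlist(wide[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Turn the ".." placeholder into a real missing value, then convert to numeric
long$Tourist_Arrival[long$Tourist_Arrival == ".."] <- NA
long$Tourist_Arrival <- as.numeric(long$Tourist_Arrival)

# Drop rows without an arrival figure
long <- long[!is.na(long$Tourist_Arrival), ]
print(long)
```

Selecting columns by name and filtering on `is.na()` avoids depending on fixed row ranges, which would shift if the source file gained or lost rows.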
#Extracting and cleaning Wikipedia data - Data source 3
#install.packages("htmltab")
library(tidyr)
#working
setwd("C:/Users/shant/Desktop/Data-Files")
library(htmltab)
url <- "https://en.wikipedia.org/wiki/Tourism"
tourism <- htmltab(doc = url, which = 5)
#Delete Columns
tourism <- tourism[, -c(1)]
tourism <- separate(tourism, "Percentage", into = c("Percentage","dump"))
tourism$dump <- NULL
tourism$Percentage <- as.numeric(tourism$Percentage)
tourism$Country[5] <- "Sao Tome and Principe"
tourism$Country[6] <- "Srilanka"
write.csv(tourism, "highesttourismgrowth.csv", row.names = FALSE)
# Extracting and cleaning GDP contribution - WTTC
#install.packages("data.table", repos = "https://ftp.heanet.ie/mirrors/cran.r
#install.packages("readxl", repos = "https://ftp.heanet.ie/mirrors/cran.r-pro
#install.packages("tidyr", repos = "https://ftp.heanet.ie/mirrors/cran.r-proj
library(readxl)
library(data.table)
library(tidyr)
#cleaning the raw Azerbaijan data
setwd("C:/Users/shant/Desktop/Data-Files/GDP-contribution")
azerbaijan <- data.frame(read_excel("Azerbaijan-tourism-gdp.xls"))
azerbaijan <- data.frame(t(azerbaijan))
setDT(azerbaijan, keep.rownames = TRUE)[]
azerbaijan <- azerbaijan[-1,-9]
azerbaijan <- azerbaijan[-c(1:11), ]
colnames(azerbaijan) <- c("Year", "Country", "Local currencyin bn (Nominal pr
str(azerbaijan)
azerbaijan$Country <- as.factor(azerbaijan$Country)
azerbaijan$Country[is.na(azerbaijan$Country)] <- "Azerbaijan"
azerbaijan$Year <- gsub("[()]", "", azerbaijan$Year)
azerbaijan <- separate(azerbaijan, col = Year, into = c("Junk","Year"), sep =
#typeof(azerbaijan$"US$ in bn (Real prices)")
#typeof(azerbaijan$"Percentage growth")
#typeof(azerbaijan$"Percentage of GDP")
azerbaijan <- azerbaijan[,-1]
write.csv(azerbaijan, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/aze
#cleaning the raw Cameroon data
cameroon <- data.frame(read_excel("Cameroon-tourism-gdp.xls"))
cameroon <- data.frame(t(cameroon))
setDT(cameroon, keep.rownames = TRUE)[]
cameroon <- cameroon[-1,-9]
cameroon <- cameroon[-c(1:11), ]
colnames(cameroon) <- c("Year", "Country", "Local currencyin bn (Nominal pric
str(cameroon)
cameroon$Country <- as.factor(cameroon$Country)
cameroon$Country[is.na(cameroon$Country)] <- "Cameroon"
cameroon$Year <- gsub("[()]", "", cameroon$Year)
cameroon <- separate(cameroon, col = Year, into = c("Junk","Year"), sep = "X"
cameroon <- cameroon[,-1]
write.csv(cameroon, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/camer
#cleaning the raw Georgia data
georgia <- data.frame(read_excel("Georgia-tourism-gdp.xls"))
georgia <- data.frame(t(georgia))
setDT(georgia, keep.rownames = TRUE)[]
georgia <- georgia[-1,-9]
georgia <- georgia[-c(1:11), ]
colnames(georgia) <- c("Year", "Country", "Local currencyin bn (Nominal price
str(georgia)
georgia$Country <- as.factor(georgia$Country)
georgia$Country[is.na(georgia$Country)] <- "Georgia"
georgia$Year <- gsub("[()]", "", georgia$Year)
georgia <- separate(georgia, col = Year, into = c("Junk","Year"), sep = "X")
georgia <- georgia[,-1]
write.csv(georgia, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/georgi
#cleaning the raw Iceland data
iceland <- data.frame(read_excel("Iceland-tourism-gdp.xls"))
iceland <- data.frame(t(iceland))
setDT(iceland, keep.rownames = TRUE)[]
iceland <- iceland[-1,-9]
iceland <- iceland[-c(1:11), ]
colnames(iceland) <- c("Year", "Country", "Local currencyin bn (Nominal price
iceland$Country[is.na(iceland$Country)] <- "Iceland"
iceland$Year <- gsub("[()]", "", iceland$Year)
iceland <- separate(iceland, col = Year, into = c("Junk","Year"), sep = "X")
iceland <- iceland[,-1]
write.csv(iceland, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/icelan
#cleaning the raw Kyrgyzstan data
kyrgyzstan <- data.frame(read_excel("Kyrgyzstan-tourism-gdp.xls"))
kyrgyzstan <- data.frame(t(kyrgyzstan))
setDT(kyrgyzstan, keep.rownames = TRUE)[]
kyrgyzstan <- kyrgyzstan[-1,-9]
kyrgyzstan <- kyrgyzstan[-c(1:11), ]
colnames(kyrgyzstan) <- c("Year", "Country", "Local currencyin bn (Nominal pr
str(kyrgyzstan)
kyrgyzstan$Country <- as.factor(kyrgyzstan$Country)
kyrgyzstan$Country[is.na(kyrgyzstan$Country)] <- "Kyrgyzstan"
kyrgyzstan$Year <- gsub("[()]", "", kyrgyzstan$Year)
kyrgyzstan <- separate(kyrgyzstan, col = Year, into = c("Junk","Year"), sep =
kyrgyzstan <- kyrgyzstan[,-1]
write.csv(kyrgyzstan, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/kyr
#cleaning the raw Myanmar data
myanmar <- data.frame(read_excel("Myanmar-tourism-gdp.xls"))
myanmar <- data.frame(t(myanmar))
setDT(myanmar, keep.rownames = TRUE)[]
myanmar <- myanmar[-1,-9]
myanmar <- myanmar[-c(1:11), ]
colnames(myanmar) <- c("Year", "Country", "Local currencyin bn (Nominal price
myanmar$Country[is.na(myanmar$Country)] <- "Myanmar"
myanmar$Year <- gsub("[()]", "", myanmar$Year)
myanmar <- separate(myanmar, col = Year, into = c("Junk","Year"), sep = "X")
myanmar <- myanmar[,-1]
write.csv(myanmar, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/myanma
#cleaning the raw Qatar data
qatar <- data.frame(read_excel("Qatar-tourism-gdp.xls"))
qatar <- data.frame(t(qatar))
setDT(qatar, keep.rownames = TRUE)[]
qatar <- qatar[-1,-9]
qatar <- qatar[-c(1:11), ]
colnames(qatar) <- c("Year", "Country", "Local currencyin bn (Nominal prices)
qatar$Country[is.na(qatar$Country)] <- "Qatar"
qatar$Year <- gsub("[()]", "", qatar$Year)
qatar <- separate(qatar, col = Year, into = c("Junk","Year"), sep = "X")
qatar <- qatar[,-1]
write.csv(qatar, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/qatar-t
#cleaning the raw Sao Tome data
saotome <- data.frame(read_excel("Sao-Tome-tourism-gdp.xls"))
saotome <- data.frame(t(saotome))
setDT(saotome, keep.rownames = TRUE)[]
saotome <- saotome[-1,-9]
saotome <- saotome[-c(1:11), ]
colnames(saotome) <- c("Year", "Country", "Local currencyin bn (Nominal price
saotome$Country[is.na(saotome$Country)] <- "Sao Tome and Principe"
saotome$Year <- gsub("[()]", "", saotome$Year)
saotome <- separate(saotome, col = Year, into = c("Junk","Year"), sep = "X")
saotome <- saotome[,-1]
write.csv(saotome, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/saotom
#cleaning the raw Srilanka data
srilanka <- data.frame(read_excel("Srilanka-tourism-gdp.xls"))
srilanka <- data.frame(t(srilanka))
setDT(srilanka, keep.rownames = TRUE)[]
srilanka <- srilanka[-1,-9]
srilanka <- srilanka[-c(1:11), ]
colnames(srilanka) <- c("Year", "Country", "Local currencyin bn (Nominal pric
srilanka$Country <- "Srilanka"
srilanka$Year <- gsub("[()]", "", srilanka$Year)
srilanka <- separate(srilanka, col = Year, into = c("Junk","Year"), sep = "X"
srilanka <- srilanka[,-1]
write.csv(srilanka, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/srila
#cleaning the raw Sudan data
sudan <- data.frame(read_excel("Sudan-tourism-gdp.xls"))
sudan <- data.frame(t(sudan))
setDT(sudan, keep.rownames = TRUE)[]
sudan <- sudan[-1,-9]
sudan <- sudan[-c(1:11), ]
colnames(sudan) <- c("Year", "Country", "Local currencyin bn (Nominal prices)
sudan$Country[is.na(sudan$Country)] <- "Sudan"
sudan$Year <- gsub("[()]", "", sudan$Year)
sudan <- separate(sudan, col = Year, into = c("Junk","Year"), sep = "X")
sudan <- sudan[,-1]
write.csv(sudan, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/sudan-t
#merging all the cleaned files
sudan <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/sudan-
colnames(sudan)
srilanka <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/sril
colnames(srilanka)
saotome <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/saoto
colnames(saotome)
qatar <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/qatar-
colnames(qatar)
myanmar <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/myanm
colnames(myanmar)
kyrgyzstan <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/ky
colnames(kyrgyzstan)
iceland <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/icela
colnames(iceland)
georgia <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/georg
colnames(georgia)
cameroon <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/came
colnames(cameroon)
azerbaijan <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/az
#colnames(azerbaijan) <- trimws(colnames(azerbaijan))
demo <- rbind(sudan, srilanka, saotome, qatar, myanmar, kyrgyzstan, iceland,
#demo$Local.currencyin.bn..Nominal.prices. <- as.numeric(demo$Local.currencyi
#demo$Local.currency.in.bn..Real.prices. <- as.numeric(demo$Local.currency.in
#demo$Percentage.growth <- as.numeric(demo$Percentage.growth)
#demo$Percentage.of.GDP <- as.numeric(demo$Percentage.of.GDP)
#demo$US..in.bn..Nominal.prices. <- as.numeric(demo$US..in.bn..Nominal.prices
#demo$US..in.bn..Real.prices. <- as.numeric(demo$US..in.bn..Real.prices.)
write.csv(demo, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/10merged-
col <- data.frame(read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contributio
colnames(col) <- c("Year", "Country", "Local currencyin bn (Nominal prices)",
write.csv(col, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/10merged-
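The ten per-country blocks above repeat the same steps with only the country name changing, so they could be folded into one function applied over a list. The following is a minimal sketch of that pattern only: `clean_country` is a hypothetical stand-in that covers just the country fill and the year cleanup, and the two small tables and their numbers are made up, since the real steps depend on the layout of the WTTC files.

```r
# Hypothetical stand-in for the per-country cleaning steps; the real script
# also transposes the sheet, drops rows and renames the columns
clean_country <- function(raw, country) {
  raw$Country <- country                               # fill in the country name
  raw$Year <- as.integer(gsub("[()X]", "", raw$Year))  # "X2010" -> 2010
  raw
}

# Made-up inputs standing in for the tables read from the Excel files
raw_tables <- list(
  Sudan = data.frame(Year = c("X2010", "X2011"), Percentage.of.GDP = c(2.1, 2.3)),
  Qatar = data.frame(Year = c("X2010", "X2011"), Percentage.of.GDP = c(5.0, 5.4))
)

# Apply the same cleaning to every table, passing the list name as the country
cleaned <- Map(clean_country, raw_tables, names(raw_tables))

# Stack the cleaned tables into one data frame, as the rbind() call does above
merged <- do.call(rbind, cleaned)
rownames(merged) <- NULL
print(merged)
```

With this shape, adding another country means adding one list entry rather than copying a block, and the intermediate per-country CSV files are no longer needed for the merge.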
#Extraction and cleaning Statista code -
library(readxl)
library(plyr)
library(tibble)
library(dplyr)
library(sqldf)
demo2 <- data.frame(read_excel("C:/Users/shant/Desktop/FinalDWData/OECD/Stati
demo2
demo2 <- demo2[-(1:2), ]
demo2
#demo$Casualty_Class <- demo2$label[match(demo$Casualty_Class, demo2$code)]
colnames(demo2) <- c("Year", "Direct Contribution (in trillion USD)", "Total
demo2$`Direct Contribution (in trillion USD)` <- as.numeric(demo2$`Direct Con
demo2$Year <- as.numeric(demo2$Year)
demo2$`Total Contribution (in trillion USD)` <- as.numeric(demo2$`Total Contr
write.csv(demo2, "C:/Users/shant/Desktop/FinalDWData/OECD/Statista/Imp-data/G

Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Impact of Travel & Tourism on Economy

  • 3. Table 1: Mark sheet – do not edit
Criteria | Mark Awarded | Comment(s)
Objectives | of 5 |
Related Work | of 10 |
Data | of 25 |
ETL | of 20 |
Application | of 30 |
Video | of 10 |
Presentation | of 10 |
Total | of 100 |
  • 4. Project Check List
This section captures the core requirements of the project, presented as a check list for convenience.
Used LaTeX template
Three Business Requirements listed in introduction
At least one structured data source
At least one unstructured data source
At least three sources of data
Described all sources of data
All sources of data are less than one year old, i.e. released after 17/09/2017
Inserted and discussed star schema
Completed logical data map
Discussed the high level ETL strategy
Provided 3 BI queries
Detailed the sources of data used in each query
Discussed the implications of results in each query
Reviewed at least 5-10 appropriate papers on topic of your DWBI project
  • 5. Impact of Travel & Tourism on the Economy Shantanu Deshpande x18125514 April 12, 2019
Abstract
The economic importance of tourism to a destination is often under-appreciated. Advances in the transportation industry have brought bigger and faster airplanes, which have made travelling to different locations more affordable and convenient. This has led to an increase in tourist numbers worldwide. Very often, this growth has a strongly positive impact on a country's key development indicators. To understand its importance, I have built a Data Warehouse and a Business Intelligence model that uses tourist arrival data as the main source and compares it with other development indicators, such as the unemployment rate and GDP, to observe the impact on the country's development.
1 Introduction
Travelling has become a customary habit for people across the globe, whether for business or for leisure. Improved air connectivity between countries has reduced travel time and travel costs, and this is one of the major factors that have boosted tourism globally. Tourists contribute to sales, profits, jobs, tax revenues, and income in an area. The most direct effects occur within the primary tourism sectors: lodging, restaurants, transportation, amusements, and retail trade. Through secondary effects, tourism affects most sectors of the economy. An economic impact analysis of tourism activity normally focuses on changes in sales, income, and employment in a region resulting from tourism activity (Stynes 1997). In view of these economic advantages, many countries have started easing visa norms for tourists (Kumar 2018).
In this project, I analyze the year-wise tourism rate across several countries and compare it with a few key country development indicators, such as GDP and the unemployment rate, to determine the relationship between them. Using this data warehouse, I develop a MOLAP (Multidimensional Online Analytical Processing) cube that addresses the following business queries:
(1) Is there a relationship between the number of tourists visiting a country and the unemployment rate of that country?
(2) Does the growth in international travel and tourism affect the GDP of the country?
(3) How have the top 5 European countries performed in terms of the global GDP contribution of the Travel and Tourism industry?
  • 6. 2 Data Sources
For implementing this project, I have used 6 data sets fetched from 5 different sources, of which 5 are structured and 1 is unstructured. They are summarized below.
Source | Type | Brief Summary
World Bank | Structured | Used because it has country-wise data on the international arrival rate, which is useful for my query
OECD | Structured | Contains the country-wise unemployment rate, which is joined with the World Bank data for my first query
Wikipedia | Unstructured | An online encyclopedia containing the relevant data table related to tourism, which helped me with my second query
WTTC | Structured | Contains data on the economic impact of the travel and tourism industry, which is useful for comparison with the worldwide GDP contribution
Statista | Structured | Statistical data containing the worldwide total GDP contribution, used as a benchmark for one of the queries
Table 2: Summary of sources of data used in the project
  • 7. 2.1 Source 1: OECD
The data set contains the yearly unemployment rate for around 42 countries and spans 2014 to 2018. The data is periodically updated on the website and was downloaded in CSV format. It was then loaded into R for cleaning. Two columns that were not useful were removed from the dataset, and the Country code column was converted to Country name so that a join operation could be performed. The following columns were used for the query: 1) Country 2) Year 3) Unemployment Rate
The dataset was downloaded from the following URL: https://data.oecd.org/unemp/unemployment-rate.htm
Figure 1: Unemployment Rate
  • 8. 2.2 Source 2: World Bank
This data set contains 9 years of tourist arrival data for around 242 countries. Each row corresponds to the number of tourist arrivals in a country for a particular year. This is a periodically updated dataset that is available for download on databank.worldbank.org. The file format was CSV, and the file was loaded into R for cleaning, which included transposing the rows and columns and removing the NULL values. The R code used for the cleaning process is given in the appendix. After cleaning, there were 4 columns of data: 1) Country 2) Country Code 3) Year 4) Arrival Nos.
URL: https://databank.worldbank.org/data/reports.aspx?source=2series=ST.INT.ARVL
This dataset is used for my first query.
Figure 2: Arrival Rate
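The wide-to-long reshaping described for the World Bank file can be illustrated with a small sketch. The project itself did this in R; the version below uses only the Python standard library as a stand-in, and the column headers, the ".." missing-value marker, the placeholder countries and all numbers are made-up assumptions, not the real extract.

```python
import csv
import io

# Hypothetical wide-format extract: one column per year.
# Country names, codes and values are placeholders; ".." marks a missing entry.
WIDE_CSV = """Country,Country Code,2015,2016,2017
Country A,AAA,100,110,120
Country B,BBB,50,..,70
"""

ID_COLUMNS = ("Country", "Country Code")


def wide_to_long(text):
    """Turn one-column-per-year rows into (country, code, year, arrivals) rows."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, value in record.items():
            if column in ID_COLUMNS:
                continue
            try:
                arrivals = float(value)
            except ValueError:
                continue  # drop missing or malformed entries
            rows.append(
                (record["Country"], record["Country Code"], int(column), arrivals)
            )
    return rows


long_rows = wide_to_long(WIDE_CSV)
for row in long_rows:
    print(row)
```

Each output row then carries the four columns listed above (Country, Country Code, Year, Arrival Nos.), which is the shape needed for a join on country and year.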
  • 9. 2.3 Source 3: Wikipedia (Unstructured)
Wikipedia is an online encyclopaedia and a popular repository for general reference work. My third dataset was taken from this website. Here, I fetched the data on countries showing strong international travel and tourism growth between 2010 and 2016. The data was grabbed using Selector Gadget and then loaded into R for cleaning. The countries listed are among the most underdeveloped ones; however, in recent years they have seen a massive inflow of tourists. This may be due to steps taken by their governments to boost tourism. The following columns were used for the query: 1) Country 2) Percent growth (tourist nos.)
The dataset was downloaded from the following URL: https://en.wikipedia.org/wiki/Tourism
Figure 3: Tourism Growth Rate
  • 10. 2.4 Source 4: World Travel Tourism Council (Structured)
Two separate datasets were taken from the World Travel Tourism Council (WTTC). The council hosts data on the economics of the travel and tourism industry for countries around the globe through a separate Data Gateway portal. Access to the data was granted after I sent a written email stating the purpose of the study and how the data would be used. The data for the relevant countries was fetched individually and then merged using R to form two separate datasets, which were used for two queries. The data on this website is periodically updated, hence no publication date is available. The data contains the following columns: 1) Year 2) Country 3) Local Currency in Bn Nominal prices 4) Local Currency in Bn Real prices 5) Percentage growth 6) Percentage of GDP 7) USD in bn Nominal prices 8) USD in bn Real prices
The datasets were downloaded from the following URL: https://www.wttc.org/datagateway
Figure 4: Tourism GDP Contribution
  • 11. Figure 5: Email Request
Figure 6: Access provided to data portal
2.5 Source 5: Statista (Structured)
Statista is a German online portal for statistics, which makes available data collected by market and opinion research institutes and data derived from the economic sector. The company provides statistics and survey results, which are presented in charts and tables (Statista). The dataset for the third query was downloaded from Statista. The data represents the total contribution of the travel and tourism industry to the global economy from 2006 to 2017, with values in trillion US dollars. The data was downloaded as an Excel file with two sheets: the first, Overview, contained the metadata, and the second contained the relevant data for the query. Using R, the first sheet was removed and the file was converted to CSV format. The release date of the report is March 2018. The data has the following three columns: 1) Year 2) Direct Contribution (in trillion USD) 3) Total Contribution (in trillion USD)
The link to the dataset is given below.
URL: https://www.statista.com/statistics/233223/travel-and-tourism–total-economic-contribution-worldwide/
  • 12. Figure 7: Worldwide T&T GDP Contribution
3 Related Work
The growth of the global travel industry has been momentous and will presumably continue for a long time to come. The industry's significance to the economies of many industrial and developing nations has also increased drastically. The following points explore the topic in more detail, drawing on past work in line with the project requirements. A study covering around 114 articles identified openness to trade and tourist security as the critical success factors for the growth of the tourism industry (Kristo 2014). Studies suggest that an increase in tourism demand may alter a country's patterns of production and specialization, in particular by crowding out internationally traded sectors, i.e. export and import-competing sectors (Sahli & Nowak 2005). This subsequently affects the GDP of the country and also the employment rate; however, the impact may be positive or negative depending on the residents' attitude to the growing number of tourists. The early phases of tourism development are typically met with enthusiasm by local residents because of the apparent financial advantages. As undesirable changes take place in the physical environment and in the type of tourist being attracted, this attitude gradually becomes more negative. For tourism growth to be sustainable, these factors need to be monitored periodically. Underdeveloped countries, beset by debilitating rural poverty, have considerable potential to attract travellers looking for new, authentic experiences in areas of unexploited natural and cultural riches.
The direct effects occur within the primary tourism sectors: hotels, restaurants, transportation, and retail trade (Stynes 1997). Underdeveloped countries
  • 13. generally lack established employment sectors, so the economy does not grow at the required rate. For such countries, tourism is a sector that can help lift the economy. The challenge for the government is to allocate funds for marketing tourism at a global level. However, a growing tourism rate does not necessarily mean a significant increase in the revenue generated by the industry. The spending behaviour of tourists must be taken into account in order to benefit as much as possible from each individual tourist, as this leads to overall growth in the sector as well as the economy.
4 Data Model
The literature discussed above clarified what my methodology should be and how to proceed while keeping all my business requirements in view. I have tried to incorporate the country development indicators in order to build a system that reveals significant relationships and thereby enables the respective government bodies to take steps to improve the country's economy. According to the Kimball approach, four key decisions are made during the design of a dimensional model: 1. Selecting the business process. 2. Declaring the grain. 3. Identifying the dimensions. 4. Identifying the facts.
Here, I have used Kimball's bottom-up approach, in which the data marts are created first and the data warehouse is then built from them. In general, a data mart is made up of the dimension tables and the fact table, and the data marts are assembled to form a data warehouse. I chose Kimball's approach because, throughout his work, Ralph Kimball consistently supported the involvement of end users in the process (Chhabra and Pahwa 2016). For this project, I have assembled data from the last 4-5 years and provide a filtered analysis that helps end users make sound decisions.
To achieve this, I have joined my datasets on the basis of the key values that they share. The dataset from Section 2.1, which contains the unemployment rate, is joined with the dataset from Section 2.2 on the basis of country and year. Similarly, the other datasets have been joined on the basis of either country or year. From the resulting data, I have created 2 dimensions, DimCountry and DimYear, which are discussed briefly below.
DimCountry: This dimension table consists of all the countries that are present in our data tables. As the data sources contained different countries, I had to write a SQL query with a UNION operation in order to obtain all the countries required for the queries. The attributes of this dimension are Country ID and Country Name. Country ID was created by me in SSIS and is used in the fact table as a foreign key reference.
DimYear: In this dimension, we have used only the Year attribute along with the Year ID, which is a primary key that I generated in SSIS. The Year ID acts as a foreign key in my fact table. With the help of DimYear, we can observe how our measures change over time.
Now, let us discuss the facts that will act as the measures and help us in the business
  • 14. process.
Fact Table: The fact table is created with the help of the dimension tables and contains the primary keys of both dimensions. It plays a crucial role in laying the groundwork for our business query requirements and contains all the measures required for a thorough analysis of our BI queries. The fact table comprises the following columns:
Country ID: Foreign key referencing the primary key of DimCountry.
Year ID: Foreign key referencing the primary key of DimYear.
Arrival Rate: The arrival rate of tourists in a particular country for the year.
Unemployment Rate: The unemployment rate of a country for the year.
Tourism growth percent: The tourism growth rate between 2010 and 2016 for 10 less developed countries.
TnT GDP Contribution10: The contribution of travel and tourism to the country's GDP.
TnT TotalGDPContribution: The overall contribution of the travel and tourism industry to worldwide GDP.
TnT GDP Contribution5: The contribution of travel and tourism to the country's GDP.
All the above dimensions and facts form the base of our data mart. Following Kimball's approach, the schema below is formed:
Figure 8: Star Schema
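The star schema and the UNION-based population of the dimensions can be sketched as follows. This is only an illustration under stated assumptions: the project used SQL Server with SSIS, whereas the sketch uses Python's built-in sqlite3 module as a stand-in; the staging table names (StgArrivals, StgUnemployment), the exact column spellings and the placeholder rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions with generated surrogate keys
CREATE TABLE DimCountry (
    CountryID   INTEGER PRIMARY KEY,
    CountryName TEXT NOT NULL UNIQUE
);
CREATE TABLE DimYear (
    YearID INTEGER PRIMARY KEY,
    Year   INTEGER NOT NULL UNIQUE
);
-- Fact table: two foreign keys plus the measures named in the report
CREATE TABLE FactTable (
    CountryID                INTEGER REFERENCES DimCountry(CountryID),
    YearID                   INTEGER REFERENCES DimYear(YearID),
    ArrivalRate              REAL,
    UnemploymentRate         REAL,
    TourismGrowthPercent     REAL,
    TnT_GDP_Contribution10   REAL,
    TnT_TotalGDPContribution REAL,
    TnT_GDP_Contribution5    REAL
);
-- Hypothetical staging tables with placeholder rows
CREATE TABLE StgArrivals (Country TEXT, Year INTEGER, Arrivals REAL);
CREATE TABLE StgUnemployment (Country TEXT, Year INTEGER, Rate REAL);
INSERT INTO StgArrivals VALUES ('Country A', 2016, 100), ('Country B', 2016, 50);
INSERT INTO StgUnemployment VALUES ('Country B', 2016, 7.5), ('Country C', 2017, 4.0);
""")

# UNION removes duplicates, so each country and year appears once per dimension.
conn.execute("""
    INSERT INTO DimCountry (CountryName)
    SELECT Country FROM StgArrivals
    UNION
    SELECT Country FROM StgUnemployment
""")
conn.execute("""
    INSERT INTO DimYear (Year)
    SELECT Year FROM StgArrivals
    UNION
    SELECT Year FROM StgUnemployment
""")

country_count = conn.execute("SELECT COUNT(*) FROM DimCountry").fetchone()[0]
year_count = conn.execute("SELECT COUNT(*) FROM DimYear").fetchone()[0]
print(country_count, year_count)
```

Because the sources cover different sets of countries, taking the union across staging tables is what guarantees that every country needed by any query has a row in DimCountry.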
  • 15. 5 Logical Data Map
The Logical Data Map below describes how I arrived at the star schema. It lists all of the dimensions and facts that have been used and how they were transformed before loading.
Table 3: Logical Data Map describing all transformations, sources and destinations for all components of the data model illustrated in Figure 8
Source | Column | Destination | Column | Type | Transformation
1,2,3,4 | Country | DimCountry | Country name | Dimension | In a few data sources the country name was spelled incorrectly, so names were matched using the match() function
1,2,4,5 | Year | DimYear | Year | Dimension | Contained a junk prefix ('x'), which was removed using the separate() function
2 | Tourist arrival | FactTable | Arrival Rate | Fact | Values were converted to float type
1 | Value | FactTable | Unemployment Rate | Fact | Values were converted to float; null values removed using the na.omit() function
3 | Percentage | FactTable | Tourism growth percent | Fact | Percent symbol was separated using the separate() function
4 | USD in bn real prices | FactTable | TnT GDP contribution10 | Fact | Values were converted to float type
5 | Total contribution | FactTable | TnT Total GDP Contribution | Fact | Values were converted to float type
4 | USD in bn real prices | FactTable | TnT GDP contribution5 | Fact | Values were converted to float type
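The value-level transformations in the map (stripping the 'x' prefix from years, separating the percent symbol, converting to float, dropping nulls) are simple enough to sketch. The project performed them in R with separate() and na.omit(); the helper names below are hypothetical and the sketch uses plain Python as a stand-in.

```python
def clean_year(label):
    """Remove a junk 'x' prefix from a year label, e.g. 'x2016' -> 2016."""
    return int(label.strip().lstrip("xX"))


def clean_percent(text):
    """Drop a trailing percent symbol and convert to float, e.g. '24.5%' -> 24.5."""
    return float(text.strip().rstrip("%").strip())


def to_float_or_none(text):
    """Convert to float, returning None for missing or malformed values."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


# Placeholder records: (year label, raw value)
raw = [("x2015", "3.2"), ("x2016", ".."), ("x2017", "4.1")]
cleaned = [
    (clean_year(year), value)
    for year, text in raw
    if (value := to_float_or_none(text)) is not None  # analogous to dropping nulls
]
print(cleaned)
```

Doing the conversion and the null check in one pass keeps only rows that can actually be loaded into a numeric fact column.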
  • 16. 6 ETL Process
Data warehouses are primarily used for decision-making, so the foremost requirement of a data warehouse is correct data, which avoids misleading calculations. The ETL process consists of Extraction, Transformation and Loading, which involve the following main tasks: a) identifying the relevant information at the source level; b) extracting the appropriate information; c) integrating the information obtained from multiple sources into a common format; d) transforming the integrated data through a cleaning process based on business rules or requirements; e) loading the processed information into the data warehouse or data mart (Rahm & Do 2000). The ETL strategy that I applied in my project is described below.
Figure 9: Cube Formation
6.1 Extraction:
The first data set for my project was extracted from OECD. The dataset had a total of 8 columns, includes the year-wise unemployment rate for around 42 countries, and was extracted in CSV format. The second source of data was extracted from the World Bank. The dataset has 16 columns and contains the tourist arrival rate for 269 countries. The data is laid out horizontally by year, so I had to perform a column transpose in R. The third data source is unstructured and was extracted from Wikipedia. From here, I extracted a data table listing the countries that showed strong growth in international travel and tourism between 2010 and 2016. It had two columns: country name and percentage growth in tourist arrivals. The fourth and fifth data sets were extracted from the World Travel Tourism Council (WTTC). Depending on the query requirements, the countries were individually selected along with the required attributes and downloaded as multiple files. These files were then
  • 17. grouped and split into two using R for two separate BI queries. The source for my sixth dataset is Statista. The data extracted from this website consisted of two Excel sheets. The first sheet was of no use, so it was removed, and the second sheet was converted to CSV format. The second sheet contained the contribution of travel and tourism to global GDP.
6.2 Transformation:
Data cleaning, also called data cleansing or scrubbing, is an important function of transformation. It deals with detecting and removing errors and inconsistencies from data in order to improve data quality. Data quality problems are generally observed in single data collections, such as files and databases, and are caused by misspellings during data entry, missing information or other invalid data (Rahm & Do 2000). All the datasets extracted from the multiple sources above need to be transformed into a clean format, which makes them suitable for loading into our data warehouse. Transformation is one of the most time-consuming activities, as all the data sources must be thoroughly cleaned in order to produce a reliable business solution. The cleaning mechanisms that I used for this project are described below.
First, I scanned the data extracted from my first source, OECD, and identified the attributes required for my query. The columns that were not required were removed from the dataset using R. The dataset contained only the country codes and not the country names. To achieve a successful join, I had to replace the country codes with the country names. For this, I used the match() function in R and took the country names from my second dataset, which was sourced from the World Bank. I then found that there were a few missing values, which were removed using the na.omit() function in R.
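The code-to-name replacement and the removal of missing values for the OECD data can be sketched as below. The report did this in R with match() and na.omit(); this Python stand-in uses a dictionary lookup instead, and the column names (LOCATION, TIME, Value), the lookup pairs and the rows are assumptions for illustration only.

```python
# Hypothetical lookup taken from a second dataset that holds both code and name.
code_to_name = {"AAA": "Country A", "BBB": "Country B"}

# Placeholder rows shaped like a code-keyed unemployment extract.
unemployment = [
    {"LOCATION": "AAA", "TIME": 2016, "Value": 5.0},
    {"LOCATION": "BBB", "TIME": 2016, "Value": None},  # missing measure
    {"LOCATION": "ZZZ", "TIME": 2016, "Value": 7.5},   # code with no match
]


def add_names_and_drop_missing(rows, names):
    """Replace codes with names; drop rows with no match or no value."""
    cleaned = []
    for row in rows:
        name = names.get(row["LOCATION"])
        if name is None or row["Value"] is None:
            continue
        cleaned.append(
            {"Country": name, "Year": row["TIME"], "UnemploymentRate": row["Value"]}
        )
    return cleaned


result = add_names_and_drop_missing(unemployment, code_to_name)
print(result)
```

Rows whose code has no counterpart in the lookup are dropped here, because they could not take part in a join on country name anyway.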
In the World Bank dataset, as mentioned above, the data was laid out horizontally, i.e. for each year there was a separate column with the value in the respective cell. To perform the join operation, I needed all the years in a single column. For this, I used the t() function in R while keeping the first two columns intact. Then I used the rep() function to repeat the values in the first two columns so that they lined up with the corresponding transposed values. There were some bad entries in the dataset, which were replaced with NA values and subsequently removed using the na.omit() function.

The third dataset is an unstructured dataset sourced from Wikipedia. The data table was located using SelectorGadget in Chrome and loaded into R for cleaning. For this, I used two R libraries, tidyr and htmltab. The data contained an initial rank column that was not required and was removed. Another column included the percentage (%) symbol alongside the number, so I used the separate() function to remove the symbol and converted the values to numeric. I also had to correct the names of a couple of countries that were misspelt during extraction.

The fourth data source for this project is the World Travel & Tourism Council (WTTC). As described above, country-wise datasets were extracted for the relevant queries, which resulted in multiple files from this source. For cleaning, the libraries readxl, data.table and tidyr were used. The data in this case was also laid out horizontally, so it was transposed and the country name was repeated to match the number of corresponding transposed values. The year column contained a junk prefix, X, that was removed using the separate() function.

The fifth and final data source for this project is Statista. The data extracted from here had two sheets. The first contained metadata that was not required and was removed using R. The second sheet contained three columns that were already
in a clean state; I only had to remove the first two rows and rename the columns.

All of the above transformations were carried out in RStudio, and the relevant R code is given in the Appendix.

6.3 Loading:

After completing the above stages, the raw data has to be moved to the staging area. The data is staged in SSIS by passing it from a Flat File Source to an OLE DB Destination, with the target database managed through SSMS. Of the several components available in SSIS, the OLE DB Destination is the one that loads data into database tables, optionally by means of an SQL command. In this stage, I created 6 flat files in my staging area to hold the raw data.

Additionally, I created an Execute SQL Task to build the dimension tables from the staged data. Within this task, I wrote an SQL query to create the dimension tables, followed by an insert query to load them with data from all the staging tables. This process gave me two dimension tables, which are then used for populating the fact table.

For the fact table, a second Execute SQL Task is used, which takes the output of the dimension task as its input. In this task, I created the fact table and wrote an SQL query that obtains, from all the staging tables, the values required for the business queries. The final step is to populate the fact table with the relevant measures. When executing the query that populates the fact table, it is important to have the correct SQL joins in place; otherwise the fact table will be populated with misleading values. This was one of the most challenging parts, as I had 6 different datasets with differing values and needed to experiment with the join queries multiple times.

Each time the complete process is executed, the data is loaded again, which creates duplicate values.
In order to avoid this scenario, I added a truncate query at the start of the SSIS process. It truncates all the staging tables and drops the dimension and fact tables before the process is rerun, so that the data is populated correctly every time.

Once the fact table is populated, the schema has to be created. Star schemas characteristically consist of fact tables linked to associated dimension tables via primary/foreign key relationships. OLAP cubes can be equivalent in content to, or more often derived from, a relational star schema (Kimball/Ross 2016).

In SSAS, I first created a new data source, which defines the connection to the database populated by SSIS. This imported all of my tables, dimensions and facts alike, into SSAS. The main aim was to obtain a star schema, for which I followed Kimball's methodology of data modelling. To accomplish this, I created a data source view in which I selected the tables I required. Here I chose my two dimensions and the fact table, and after processing the data source view I obtained the intended star schema design.

The final step was to create the MOLAP cube. For this, a new cube had to be built, within which the existing dimension and fact tables were selected. Next, I named the cube and clicked the process button to deploy it. Once the deployment is reported as successful, we can move to the browser section, drag the fact count field into the query field and confirm whether the fact table has been populated properly. With this, the cube is successfully deployed.
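The importance of joining the staging tables on the correct keys can be illustrated with a small sketch. It is written in R rather than in the SQL used inside the SSIS tasks, and every table name, column name and value in it is a hypothetical toy example, not the project's actual schema.

```r
# Hypothetical staging tables with toy values
stg_unemployment <- data.frame(
  Country = c("CountryA", "CountryA", "CountryB"),
  Year = c(2015, 2016, 2015),
  UnemploymentRate = c(10, 9, 20),
  stringsAsFactors = FALSE
)
stg_arrivals <- data.frame(
  Country = c("CountryA", "CountryA", "CountryB", "CountryB"),
  Year = c(2015, 2016, 2015, 2016),
  TouristArrivals = c(100, 120, 300, 330),
  stringsAsFactors = FALSE
)

# Two dimension tables, each with a surrogate key
dim_country <- data.frame(
  Country = sort(unique(c(stg_unemployment$Country, stg_arrivals$Country))),
  stringsAsFactors = FALSE
)
dim_country$CountryKey <- seq_len(nrow(dim_country))
dim_year <- data.frame(
  Year = sort(unique(c(stg_unemployment$Year, stg_arrivals$Year)))
)
dim_year$YearKey <- seq_len(nrow(dim_year))

# Correct: inner join on BOTH natural keys, one row per country and year
measures <- merge(stg_unemployment, stg_arrivals, by = c("Country", "Year"))

# Incorrect: joining on Country alone pairs every year of one table with
# every year of the other, which multiplies rows and misaligns measures
bad <- merge(stg_unemployment, stg_arrivals, by = "Country")

# Fact table: surrogate keys plus measures
fact <- merge(merge(measures, dim_country, by = "Country"), dim_year, by = "Year")
fact <- fact[order(fact$CountryKey, fact$YearKey),
             c("CountryKey", "YearKey", "UnemploymentRate", "TouristArrivals")]
rownames(fact) <- NULL
print(fact)
```

Comparing nrow(measures) with nrow(bad) shows the row multiplication that an incomplete join condition causes, and checking that each key combination occurs only once in the fact table is a quick way to detect it.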
7 Application

In the sections above, we saw the successful deployment of the cube, which we now use to answer our business queries. The following are the business queries that I think are useful for reviewing the constraints discussed in section 3. The results of these queries, along with the previous related work on the subject, are discussed in detail in section 7.4. We now evaluate the 3 business queries and check the results.

7.1 BI Query 1: Is there a relationship between the number of tourists visiting a country and the unemployment rate of a country?

The sources contributing to this query are data source (2.1) and data source (2.2).

Figure 10 shows a visual comparison of the tourist arrival rate and the unemployment rate for each country. From the graph, we can observe an inverse relationship between the two: as the tourist arrival rate increases, the unemployment rate decreases. This suggests that tourism is one of the factors that can reduce the unemployment rate of a country. Surprisingly, the graph shows that although the tourist arrival rate in South Africa is increasing, it is not reducing the unemployment rate; in fact, unemployment is increasing as well.

Figure 10: Results for BI Query 1
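A visual comparison like the one above can be complemented by a per-country correlation coefficient. The sketch below is a minimal illustration with hypothetical toy values (not the project data): country A follows the inverse pattern, while country B shows both series rising, as described for South Africa.

```r
# Hypothetical toy data: arrivals and unemployment for two countries
toy <- data.frame(
  Country = rep(c("A", "B"), each = 5),
  Year = rep(2012:2016, times = 2),
  Arrivals = c(10, 12, 14, 16, 18, 20, 22, 24, 26, 28),
  Unemployment = c(9, 8, 7, 6, 5, 24, 25, 26, 27, 28),
  stringsAsFactors = FALSE
)

# Pearson correlation between the two series, computed per country;
# a negative value indicates an inverse relationship
by_country <- sapply(
  split(toy, toy$Country),
  function(d) cor(d$Arrivals, d$Unemployment)
)
print(round(by_country, 2))
```

A coefficient close to -1 supports an inverse relationship for that country and a value close to +1 the opposite; with only a few yearly observations per country, such figures are indicative at best.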
7.2 BI Query 2: Does the growth in international travel and tourism affect the GDP of the country?

The sources contributing to this query are data source (2.3) and data source (2.5).

The first graph in Figure 11 shows the percentage growth in international travel and tourism between 2010 and 2016. Our analysis shows that GDP also increased significantly over these years, which indicates a distinct relationship between the tourism industry and the GDP of a country. Only for one country, Sao Tome and Principe, is the GDP growth not significant, although the tourism rate has increased by almost 30%.

Figure 11: Results for BI Query 2
7.3 BI Query 3: How have the top 5 European countries performed in terms of the global GDP contribution of the Travel and Tourism industry?

The sources contributing to this query are data source (2.4) and data source (2.5).

Figure 12 shows the year-wise growth in the worldwide GDP contribution of the travel and tourism industry. In the second graph, we can see the top 5 European countries with the highest tourism numbers and the GDP contribution of their tourism industries. As the graph shows, the GDP contribution of France and Spain is below the average global GDP contribution of the travel and tourism industry, whereas the United Kingdom and Italy have outperformed the global GDP growth rate.

Figure 12: Results for BI Query 3
7.4 Discussion

Now that the business queries are complete and the graphs are available, let us discuss the implications of each of them in detail.

BI Query 1 gave us the relation between the tourist arrival rate and the unemployment rate of a country. We can observe that the tourism rate has an inverse relation with the unemployment rate of a country, i.e. as tourism grows, unemployment decreases. Tourism spending, as also mentioned in section 3, lies primarily in the purchase of goods and services from a variety of industries, with usually rather less than two-thirds of tourists' expenditure being in the hotels and restaurants normally identified with the tourism industry (De Kadt 1979). This spending pattern thereby creates jobs for local people.

The second query gives us the relation between the growth of tourism and the GDP of a country. The countries studied were some of the less developed countries. As described in section 3, less developed countries usually have unexploited natural and cultural riches. If the local bodies take proper promotional initiatives, tourists would be willing to explore new places and thereby boost the country's economy. From our graph, we can see a strong positive relation between the two attributes, which highlights the significant impact tourism can have on the economy.

The third and final query visualizes, at a global level, how the 5 European countries most popular with tourists have performed over recent years in terms of the tourism contribution to GDP. From the graph, it can be noted that although the tourism numbers are high in France and Spain, their contribution to the global economy is below the average growth rate.
The governments of these countries can work on strategies that encourage tourists to spend more while travelling within them. Germany, Italy and the United Kingdom, by contrast, have contributed almost equally and at a similar rate to the global GDP contribution.

8 Conclusion and Future Work

With all the data that was fetched for analysis, I have tried to build a data warehouse that helps in correlating the tourism parameter with country development indicators. From the graphs, we can say that we were able to achieve the desired results. What I observed is that in recent years there has been very positive growth in the tourism industry worldwide. This growth is primarily due to reduced travel times and costs, as well as better means of communication. The notable aspect is the positive impact on the key country development indicators. Many countries have started acknowledging this by allocating part of their budget to the improvement of tourism-related services. However, it is equally important to monitor the impact on the GDP contribution consistently, because an increasing tourism rate does not necessarily mean that GDP will improve at a similar rate.

Future work in this study could analyze tourist sentiment in order to understand which factors are considered before deciding to visit a place. This could be compared across different countries and modelled to improve the tourism rate. Understanding the spending pattern of tourists is also important in order to monitor the impact of the tourism industry on the GDP contribution.
References

De Kadt, E. (1979), ‘Tourism: Passport to development’, Perspectives on the social and cultural effects of tourism in developing countries.
Kimball/Ross (2016), ‘Star schema olap cube | kimball dimensional modeling techniques’.
Kristo, J. (2014), ‘Evaluating the tourism-led economic growth hypothesis in a developing country: The case of albania’, Mediterranean Journal of Social Sciences 5(8), 39.
Kumar, V. R. (2018), ‘Ease visa norms for free movement of tourists, says united nations body’, https://www.thehindubusinessline.com/news/variety/ease-visa-norms-for-free-movement-of-tourists-says-united-nations-body/article20602398.ece1.
Rahm, E. & Do, H. H. (2000), ‘Data cleaning: Problems and current approaches’, IEEE Data Eng. Bull. 23(4), 3–13.
Sahli, M. & Nowak, J.-J. (2005), ‘Migration, unemployment and net benefits of inbound tourism in a developing country’.
Stynes, D. J. (1997), ‘Economic impacts of tourism’, Illinois Bureau of Tourism, Department of Commerce and Community Affairs.

Appendix

R code example

#Extraction and cleaning Unemployment Rate - Data Source 1
setwd("C:/Users/shant/Desktop/Data-Files")
unemployment <- data.frame(read.csv("Unemployment-total.csv"))
arrival <- data.frame(read.csv("International-arrival.csv"))
#Changing the column names
colnames(unemployment) <- c("Country", "Indicator", "Subject", "Measure", "Fr
#removing columns that are not required
unemployment[,4:5] <- NULL
unemployment[6] <- NULL
#replacing country codes with country names
unemployment$Country <- arrival$Country.Name[match(unemployment$Country, arri
#removing NA values
unemployment <- na.omit(unemployment)
write.csv(unemployment, "Unemployment-rate-cleaned.csv", row.names = F)

#Extraction and cleaning of Arrival data - Data Source 2
#install.packages("reshape2")
library(reshape2)
setwd("C:/Users/shant/Desktop/Data-Files")
arrival_rate <- data.frame(read.csv("International-arrival.csv", stringsAsFac
head(arrival_rate)
arrival_rate[,1:2] <- NULL
arrival_rate <- subset(arrival_rate, select = -c(3,4))
head(arrival_rate)
#arrival_rate <- melt(arrival_rate, id = c("Country.Name", "Country.Code"))
data <- t(arrival_rate)
arrival_rate <- cbind(arrival_rate[rep(1:nrow(arrival_rate), each=10), 1:2], #this
                      Year = c(2009:2018), #this gives the year column
                      Tourist_Arrival = as.vector(data[3:12, ])) # the Average Educat
arrival_rate$Country <- NULL
arrival_rate$Tourist_Arrival <- as.character(arrival_rate$Tourist_Arrival)
arrival_rate$Tourist_Arrival[arrival_rate$Tourist_Arrival == ".."] <- "NA"
#arrival_rate <- arrival_rate[!(arrival_rate$Tourist_Arrival == "NA")]
arrival_rate <- arrival_rate[-c(2640:2690), ]
#arrival_rate <- na.omit(arrival_rate)
write.csv(arrival_rate, "arrival-rate-cleanned.csv", row.names = F)

#Extracting and cleaning Wikipedia data - Data source 3
#install.packages("htmltab")
library(tidyr) #working
setwd("C:/Users/shant/Desktop/Data-Files")
library(htmltab)
url <- "https://en.wikipedia.org/wiki/Tourism"
tourism <- htmltab(doc = url, which = 5)
#Delete Columns
tourism <- tourism[, -c(1)]
tourism <- separate(tourism, "Percentage", into = c("Percentage","dump"))
tourism$dump <- NULL
tourism$Percentage <- as.numeric(tourism$Percentage)
tourism$Country[5] <- "Sao Tome and Principe"
tourism$Country[6] <- "Srilanka"
write.csv(tourism, "highesttourismgrowth.csv", row.names = F)

# Extracting and cleaning GDP contribution - WTTC
#install.packages("data.table", repos = "https://ftp.heanet.ie/mirrors/cran.r
#install.packages("readxl", repos = "https://ftp.heanet.ie/mirrors/cran.r-pro
#install.packages("tidyr", repos = "https://ftp.heanet.ie/mirrors/cran.r-proj
library(readxl)
library(data.table)
library(tidyr)

#cleaning the raw Azerbaijan data
setwd("C:/Users/shant/Desktop/Data-Files/GDP-contribution")
azerbaijan <- data.frame(read_excel("Azerbaijan-tourism-gdp.xls"))
azerbaijan <- data.frame(t(azerbaijan))
setDT(azerbaijan, keep.rownames = TRUE)[]
azerbaijan <- azerbaijan[-1,-9]
azerbaijan <- azerbaijan[-c(1:11), ]
colnames(azerbaijan) <- c("Year", "Country", "Local currencyin bn (Nominal pr
str(azerbaijan)
azerbaijan$Country <- as.factor(azerbaijan$Country)
azerbaijan$Country[is.na(azerbaijan$Country)] <- "Azerbaijan"
azerbaijan$Year <- gsub("[()]", "", azerbaijan$Year)
azerbaijan <- separate(azerbaijan, col = Year, into = c("Junk","Year"), sep =
#typeof(azerbaijan$"US$ in bn (Real prices)")
#typeof(azerbaijan$"Percentage growth")
#typeof(azerbaijan$"Percentage of GDP")
azerbaijan <- azerbaijan[,-1]
write.csv(azerbaijan, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/aze

#cleaning the raw Cameroon data
cameroon <- data.frame(read_excel("Cameroon-tourism-gdp.xls"))
cameroon <- data.frame(t(cameroon))
setDT(cameroon, keep.rownames = TRUE)[]
cameroon <- cameroon[-1,-9]
cameroon <- cameroon[-c(1:11), ]
colnames(cameroon) <- c("Year", "Country", "Local currencyin bn (Nominal pric
str(cameroon)
cameroon$Country <- as.factor(cameroon$Country)
cameroon$Country[is.na(cameroon$Country)] <- "Cameroon"
cameroon$Year <- gsub("[()]", "", cameroon$Year)
cameroon <- separate(cameroon, col = Year, into = c("Junk","Year"), sep = "X"
cameroon <- cameroon[,-1]
write.csv(cameroon, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/camer

#cleaning the raw Georgia data
georgia <- data.frame(read_excel("Georgia-tourism-gdp.xls"))
georgia <- data.frame(t(georgia))
setDT(georgia, keep.rownames = TRUE)[]
georgia <- georgia[-1,-9]
georgia <- georgia[-c(1:11), ]
colnames(georgia) <- c("Year", "Country", "Local currencyin bn (Nominal price
str(georgia)
georgia$Country <- as.factor(georgia$Country)
georgia$Country[is.na(georgia$Country)] <- "Georgia"
georgia$Year <- gsub("[()]", "", georgia$Year)
georgia <- separate(georgia, col = Year, into = c("Junk","Year"), sep = "X")
georgia <- georgia[,-1]
write.csv(georgia, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/georgi

#cleaning the raw Iceland data
iceland <- data.frame(read_excel("Iceland-tourism-gdp.xls"))
iceland <- data.frame(t(iceland))
setDT(iceland, keep.rownames = TRUE)[]
iceland <- iceland[-1,-9]
iceland <- iceland[-c(1:11), ]
colnames(iceland) <- c("Year", "Country", "Local currencyin bn (Nominal price
iceland$Country[is.na(iceland$Country)] <- "Iceland"
iceland$Year <- gsub("[()]", "", iceland$Year)
iceland <- separate(iceland, col = Year, into = c("Junk","Year"), sep = "X")
iceland <- iceland[,-1]
write.csv(iceland, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/icelan

#cleaning the raw Kyrgyzstan data
kyrgyzstan <- data.frame(read_excel("Kyrgyzstan-tourism-gdp.xls"))
kyrgyzstan <- data.frame(t(kyrgyzstan))
setDT(kyrgyzstan, keep.rownames = TRUE)[]
kyrgyzstan <- kyrgyzstan[-1,-9]
kyrgyzstan <- kyrgyzstan[-c(1:11), ]
colnames(kyrgyzstan) <- c("Year", "Country", "Local currencyin bn (Nominal pr
str(kyrgyzstan)
kyrgyzstan$Country <- as.factor(kyrgyzstan$Country)
kyrgyzstan$Country[is.na(kyrgyzstan$Country)] <- "Kyrgyzstan"
kyrgyzstan$Year <- gsub("[()]", "", kyrgyzstan$Year)
kyrgyzstan <- separate(kyrgyzstan, col = Year, into = c("Junk","Year"), sep =
kyrgyzstan <- kyrgyzstan[,-1]
write.csv(kyrgyzstan, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/kyr

#cleaning the raw Myanmar data
myanmar <- data.frame(read_excel("Myanmar-tourism-gdp.xls"))
myanmar <- data.frame(t(myanmar))
setDT(myanmar, keep.rownames = TRUE)[]
myanmar <- myanmar[-1,-9]
myanmar <- myanmar[-c(1:11), ]
colnames(myanmar) <- c("Year", "Country", "Local currencyin bn (Nominal price
myanmar$Country[is.na(myanmar$Country)] <- "Myanmar"
myanmar$Year <- gsub("[()]", "", myanmar$Year)
myanmar <- separate(myanmar, col = Year, into = c("Junk","Year"), sep = "X")
myanmar <- myanmar[,-1]
write.csv(myanmar, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/myanma

#cleaning the raw Qatar data
qatar <- data.frame(read_excel("Qatar-tourism-gdp.xls"))
qatar <- data.frame(t(qatar))
setDT(qatar, keep.rownames = TRUE)[]
qatar <- qatar[-1,-9]
qatar <- qatar[-c(1:11), ]
colnames(qatar) <- c("Year", "Country", "Local currencyin bn (Nominal prices)
qatar$Country[is.na(qatar$Country)] <- "Qatar"
qatar$Year <- gsub("[()]", "", qatar$Year)
qatar <- separate(qatar, col = Year, into = c("Junk","Year"), sep = "X")
qatar <- qatar[,-1]
write.csv(qatar, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/qatar-t

#cleaning the raw Sao Tome data
saotome <- data.frame(read_excel("Sao-Tome-tourism-gdp.xls"))
saotome <- data.frame(t(saotome))
setDT(saotome, keep.rownames = TRUE)[]
saotome <- saotome[-1,-9]
saotome <- saotome[-c(1:11), ]
colnames(saotome) <- c("Year", "Country", "Local currencyin bn (Nominal price
saotome$Country[is.na(saotome$Country)] <- "Sao Tome and Principe"
saotome$Year <- gsub("[()]", "", saotome$Year)
saotome <- separate(saotome, col = Year, into = c("Junk","Year"), sep = "X")
saotome <- saotome[,-1]
write.csv(saotome, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/saotom

#cleaning the raw Srilanka data
srilanka <- data.frame(read_excel("Srilanka-tourism-gdp.xls"))
srilanka <- data.frame(t(srilanka))
setDT(srilanka, keep.rownames = TRUE)[]
srilanka <- srilanka[-1,-9]
srilanka <- srilanka[-c(1:11), ]
colnames(srilanka) <- c("Year", "Country", "Local currencyin bn (Nominal pric
srilanka$Country <- "Srilanka"
srilanka$Year <- gsub("[()]", "", srilanka$Year)
srilanka <- separate(srilanka, col = Year, into = c("Junk","Year"), sep = "X"
srilanka <- srilanka[,-1]
write.csv(srilanka, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/srila

#cleaning the raw Sudan data
sudan <- data.frame(read_excel("Sudan-tourism-gdp.xls"))
sudan <- data.frame(t(sudan))
setDT(sudan, keep.rownames = TRUE)[]
sudan <- sudan[-1,-9]
sudan <- sudan[-c(1:11), ]
colnames(sudan) <- c("Year", "Country", "Local currencyin bn (Nominal prices)
sudan$Country[is.na(sudan$Country)] <- "Sudan"
sudan$Year <- gsub("[()]", "", sudan$Year)
sudan <- separate(sudan, col = Year, into = c("Junk","Year"), sep = "X")
sudan <- sudan[,-1]
write.csv(sudan, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/sudan-t

#merging all the cleaned files
sudan <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/sudan-
colnames(sudan)
srilanka <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/sril
colnames(srilanka)
saotome <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/saoto
colnames(saotome)
qatar <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/qatar-
colnames(qatar)
myanmar <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/myanm
colnames(myanmar)
kyrgyzstan <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/ky
colnames(kyrgyzstan)
iceland <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/icela
colnames(iceland)
georgia <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/georg
colnames(georgia)
cameroon <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/came
colnames(cameroon)
azerbaijan <- read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contribution/az
#colnames(azerbaijan) <- trimws(colnames(azerbaijan))
demo <- rbind(sudan, srilanka, saotome, qatar, myanmar, kyrgyzstan, iceland,
#demo$Local.currencyin.bn..Nominal.prices. <- as.numeric(demo$Local.currencyi
#demo$Local.currency.in.bn..Real.prices. <- as.numeric(demo$Local.currency.in
#demo$Percentage.growth <- as.numeric(demo$Percentage.growth)
#demo$Percentage.of.GDP <- as.numeric(demo$Percentage.of.GDP)
#demo$US..in.bn..Nominal.prices. <- as.numeric(demo$US..in.bn..Nominal.prices
#demo$US..in.bn..Real.prices. <- as.numeric(demo$US..in.bn..Real.prices.)
write.csv(demo, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/10merged-
col <- data.frame(read.csv("C:/Users/shant/Desktop/Data-Files/GDP-contributio
colnames(col) <- c("Year", "Country", "Local currencyin bn (Nominal prices)",
write.csv(col, "C:/Users/shant/Desktop/Data-Files/GDP-contribution/10merged-

#Extraction and cleaning Statista code -
library(readxl)
library(plyr)
library(tibble)
library(dplyr)
library(sqldf)
demo2 <- data.frame(read_excel("C:/Users/shant/Desktop/FinalDWData/OECD/Stati
demo2
demo2 = demo2[-1:-2, ]
demo2
#demo$Casualty_Class <- demo2$label[match(demo$Casualty_Class, demo2$code)]
colnames(demo2) <- c("Year", "Direct Contribution (in trillion USD)", "Total
demo2$`Direct Contribution (in trillion USD)` <- as.numeric(demo2$`Direct Con
demo2$Year <- as.numeric(demo2$Year)
demo2$'Total Contribution (in trillion USD)' <- as.numeric(demo2$'Total Contr
write.csv(demo2, "C:/Users/shant/Desktop/FinalDWData/OECD/Statista/Imp-data/G
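
The World Bank and WTTC cleaning steps in this appendix both turn a table with one column per year into a table with one row per country and year, using t() and rep(). A minimal base-R sketch of that reshaping as a single reusable function is given here for illustration. The function name wide_to_long, its arguments and the toy data are hypothetical and are not part of the project code; the sketch also assumes that every non-identifier column is a numeric year column named like X2009.

```r
# Hypothetical helper: reshape one-column-per-year data into long format
wide_to_long <- function(wide, id_cols, value_name = "Value") {
  year_cols <- setdiff(names(wide), id_cols)
  # Repeat each identifier row once per year column
  idx <- rep(seq_len(nrow(wide)), each = length(year_cols))
  long <- wide[idx, id_cols, drop = FALSE]
  # read.csv() prefixes numeric column names with "X"; strip it to get the year
  long$Year <- rep(as.integer(sub("^X", "", year_cols)), times = nrow(wide))
  # t() puts years in rows, so as.vector() walks year by year within a country
  long[[value_name]] <- as.vector(t(as.matrix(wide[, year_cols, drop = FALSE])))
  rownames(long) <- NULL
  long
}

# Toy input (not project data): two countries, two years, one missing value
wide <- data.frame(
  Country.Name = c("A", "B"),
  X2009 = c(1, 3),
  X2010 = c(2, NA),
  stringsAsFactors = FALSE
)

long <- na.omit(wide_to_long(wide, id_cols = "Country.Name",
                             value_name = "Tourist_Arrival"))
print(long)
```

Wrapping the steps in one function would let each per-country file be processed by the same call instead of a copied block, which reduces the chance of the blocks drifting apart.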