Table of Contents
Climate Change, a myth or reality?
Introduction
Tools used
Data Preparation
The Data
Structure of the Data
Normalizing the data
Problems with the dataset
Data exploration using SQL
Data visualization using Tableau
Data modeling using R
Summary
Bonus: Challenges faced
Climate Change, a myth or reality?
Introduction
Climate change is viewed by some as one of the major threats to the planet, but by others as
nothing but a hoax. The best way to decide where to stand is to analyze climate data over
time and see whether there has indeed been a significant change. For this project,
temperatures recorded over time in Cincinnati are studied for any effects of climate
change.
Tools used
Microsoft SQL Server was used to import and understand the data, RStudio was used to build
plots and models, and Tableau was used for visualizations.
Data Preparation
The Data
The data used for this project is called "Climate Change: Earth Surface Temperature Data".
It consists of global land and ocean temperatures and land temperatures by city, state and
country, measured over roughly 270 years, from 1743 to 2013. For this project, the dataset
of land temperatures by city was used. The original source of the data is Berkeley Earth.
Structure of the Data
The dataset contains 7 columns.
Variables Considered in Global Temperatures by City
dt          AverageTemperature  AverageTemperatureUncertainty  City   Country  Latitude  Longitude
1743-11-01  6.068               1.737                          Århus  Denmark  57.05N    10.33
1744-04-01  5.788               3.624                          Århus  Denmark  57.05N    10.33
The data has the following variables
Variable                       Description
dt                             The date, in year-month-day format
AverageTemperature             Average temperature in the region, in degrees Celsius
AverageTemperatureUncertainty  The 95% confidence interval of the average temperature
City                           The city
Country                        The country
Latitude                       The latitude
Longitude                      The longitude
Normalizing the data
The data can be divided into the following tables and attributes.
Normalized Tables
Table                Attributes
Date                 The date, in year-month-day format
Average Temperature  Average temperature in the region; the 95% confidence interval of the average temperature
City                 The city, latitude, longitude
Country              The country
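The split into tables can be sketched in Python as follows. This is only an illustration,
not the SQL Server schema used in the project: the function name normalize, the integer
keys and the dictionary-based "tables" are assumptions, and the Date table is folded into
the measurement rows as a key column to keep the sketch short. The two input rows are the
sample rows shown under "Structure of the Data".

```python
# Illustrative sketch: split flat rows into Country, City and measurement tables.
# All names and the integer keys are hypothetical, not the project's schema.

rows = [
    # dt, AverageTemperature, AverageTemperatureUncertainty, City, Country, Latitude, Longitude
    ("1743-11-01", "6.068", "1.737", "Århus", "Denmark", "57.05N", "10.33"),
    ("1744-04-01", "5.788", "3.624", "Århus", "Denmark", "57.05N", "10.33"),
]


def normalize(flat_rows):
    """Return (countries, cities, measurements) built from flat dataset rows."""
    countries = {}      # country name -> country_id
    cities = {}         # (city name, country_id) -> (city_id, latitude, longitude)
    measurements = []   # (dt, city_id, temperature, uncertainty)
    for dt, temp, unc, city, country, lat, lon in flat_rows:
        # Reuse the existing key if the country was seen before.
        country_id = countries.setdefault(country, len(countries) + 1)
        key = (city, country_id)
        if key not in cities:
            cities[key] = (len(cities) + 1, lat, lon)
        city_id = cities[key][0]
        measurements.append((dt, city_id, temp, unc))
    return countries, cities, measurements


countries, cities, measurements = normalize(rows)
print(countries)
print(cities)
print(measurements)
```

Each city and country is stored once, and every measurement row refers to its city by key
instead of repeating the name and coordinates.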
Problems with the dataset
1) All values are stored as strings, whatever their actual data type
2) There are many null values
3) There are duplicate entries
4) Some rows have empty average temperature and uncertainty values
5) Not all days have entries
6) For the initial years (1743 to around 1850), temperature and uncertainty were probably
not measured with the sophisticated equipment used in later years, as the data from the
two periods differ
7) The maximum temperature noted is 29.832 and the minimum is -11.071, which could reflect
a limitation of the data or the mode of measurement, as temperatures can clearly be
greater than 30 degrees Celsius
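A minimal sketch of how the first few problems (string-typed values, empty fields and
duplicate rows) might be handled is given below in Python. The project itself did this
cleaning in SQL Server; the helper names to_decimal and clean are hypothetical, and the
second and third input rows are made up to show an empty reading and a duplicate.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def to_decimal(text):
    """Return a Decimal, or None when the field is empty or not numeric."""
    if text is None:
        return None
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    # Treat NaN and infinities as missing values too.
    return value if value.is_finite() else None


def clean(raw_rows):
    """Convert string fields to typed values and drop exact duplicate rows."""
    seen = set()
    cleaned = []
    for dt_text, temp_text, unc_text in raw_rows:
        row = (date.fromisoformat(dt_text), to_decimal(temp_text), to_decimal(unc_text))
        if row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


raw = [
    ("1743-11-01", "6.068", "1.737"),
    ("1743-12-01", "", ""),             # made-up row with empty readings
    ("1743-11-01", "6.068", "1.737"),   # made-up duplicate of the first row
]
cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Empty readings are kept as rows with None values rather than dropped, so that gaps in the
record stay visible in later steps.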
Data exploration using SQL
1) There are 3448 cities from different countries
2) There are 3239 records for Cincinnati
3) After setting empty and non-numeric values to NULL, converting dt to the date datatype,
and converting average temperature and average temperature uncertainty to the decimal
datatype, the resulting rows are as follows:
4) The dates span roughly 270 years, from 1743-11-01 to 2013-09-01
5) The range of temperature and uncertainty was checked for each of the 12 months
6) Ordering by month shows that the lowest temperature is in January and the highest
temperature is in July
7) The uncertainty decreases over the years; this could be due to the increasing accuracy
of measurements over the years
8) Ordering by year shows that the highest temperature was in 2012 and the lowest in 1779,
so on a rough scale there could be an increase in temperature from 1779 to 2012.
However, there are many anomalies in the data.
Sample possible anomaly, lowest temperature year
Maximum temperature years
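The month and year orderings described in the list above can be sketched with two grouped
queries. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's
sqlite3 module rather than Microsoft SQL Server, the table name readings and column name
avg_temp are hypothetical, and the sample readings are made up purely to exercise the
queries.

```python
import sqlite3

# Made-up sample readings (NOT taken from the dataset).
sample = [
    ("2000-01-01", -1.0),
    ("2000-07-01", 24.0),
    ("2001-01-01", -3.0),
    ("2001-07-01", 27.0),
    ("2001-08-01", None),   # missing reading stored as NULL
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (dt TEXT, avg_temp REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", sample)

# Mean temperature per calendar month, coldest month first.
by_month = conn.execute(
    """
    SELECT CAST(strftime('%m', dt) AS INTEGER) AS month, AVG(avg_temp) AS mean_temp
    FROM readings
    WHERE avg_temp IS NOT NULL
    GROUP BY month
    ORDER BY mean_temp
    """
).fetchall()

# Mean temperature per year, in chronological order.
by_year = conn.execute(
    """
    SELECT CAST(strftime('%Y', dt) AS INTEGER) AS year, AVG(avg_temp) AS mean_temp
    FROM readings
    WHERE avg_temp IS NOT NULL
    GROUP BY year
    ORDER BY year
    """
).fetchall()
conn.close()

coldest_month, warmest_month = by_month[0][0], by_month[-1][0]
print(coldest_month, warmest_month)
print(by_year)
```

In SQL Server the month and year would more likely be extracted with MONTH(dt) and
YEAR(dt); strftime is the SQLite equivalent used here.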
Data visualization using Tableau
The Average Temperatures over the world for the latest year
The Average Temperature increase over the years for the world
There is a clear rise in temperature worldwide over the years
The Average Temperature increase over the years for Cincinnati
The anomalies in the data can be clearly seen in this graph
The average temperatures for various months for Cincinnati
Data modeling using R
Plotting the correlations and distributions of the data
It can be seen that the highest correlation is between temperature uncertainty and year;
as stated earlier, this could be due to the increase in recording precision.
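For reference, the correlation between two variables such as year and uncertainty can be
computed as a Pearson coefficient. The sketch below is written in Python rather than R, the
function name pearson is hypothetical, and the values are made up to mimic uncertainty
shrinking over time; they are not the project's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values: uncertainty shrinking as the year increases.
years = [1750, 1800, 1850, 1900, 1950, 2000]
uncertainty = [3.0, 2.5, 1.5, 0.8, 0.4, 0.3]

r = pearson(years, uncertainty)
print(round(r, 3))
```

A value close to -1 means uncertainty falls almost linearly as the year increases, which is
the pattern the correlation plot points to.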
Plotting the model
The fit has an R-squared of 0.05568, which implies that the model explains only about 5.6% of
the variance in the data; the rest is noise or anomalies. Hence, for the city of Cincinnati,
climate change is not strongly observed, as opposed to the global values. This could be
primarily because more general data works better on the global scale. Working with global
data would lead to more accurate results.
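To make the R-squared figure concrete, the sketch below fits a straight line by ordinary
least squares and computes R-squared as 1 minus the ratio of residual to total sum of
squares. It assumes the model is a simple linear fit of temperature against year, which the
report does not state explicitly; it is written in Python rather than R, the function name
fit_line is hypothetical, and the yearly values are made up.

```python
def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the share of variance explained.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up yearly mean temperatures (NOT the project's data).
years = [2000, 2001, 2002, 2003, 2004]
temps = [11.0, 12.5, 10.5, 12.0, 11.5]

slope, intercept, r_squared = fit_line(years, temps)
print(slope, intercept, r_squared)
```

A small R-squared with a positive slope, as in this made-up sample, means the fitted trend
is upward but accounts for very little of the year-to-year variation.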
Summary
The earth is expected to get hotter over the years; this is known as climate change. This
analysis shows a clear rise in global temperature over the years, but the same rise is not
clearly observed for Cincinnati alone. Cleaning the data by removing influential outliers,
collecting more data over a longer period, and using more sophisticated equipment to
measure temperatures and predict their rise would improve the results. It would also
benefit the study to factor in more parameters, such as a region's carbon emissions,
energy-saving policies, population and ozone levels, and the azimuth and elevation at a
given latitude/longitude. Most importantly, taking steps towards living a greener life
would help us reduce the speed at which the temperature is rising.
Bonus: Challenges faced
The biggest challenge was that the dataset I wanted to use was too big for R and SQL
execution, so I had to resort to a smaller subset (Cincinnati). I have not yet figured out
how to deal with a dataset that large, even though it was rich in information. Ultimately,
I learned that the most practical approach was to cut the data down.