Climate change model forecast global temperature out to 2100 (Gaetan Lion)
This study leverages a VAR model introduced in an earlier presentation to forecast global temperature out to 2100, and assesses how likely we are to keep such temperatures at or under the +1.5 degree Celsius threshold.
Presenting climate change models that estimate and forecast global temperature levels associated with, or caused by, CO2 concentration (ppm) levels. These models also replicate IPCC scenarios.
Analysis of the impact cloth diapering has on household utilities: natural gas, electricity, and water.
Plus, estimates for costs of the cloth diapers and accessories.
Statistical Evaluation of Spatial Interpolation Methods for Small-Sampled Region. A Case Study of Temperature Change Phenomenon in Bangladesh (Beniamino Murgante)
Avit Bhowmik, Pedro Cabral - Institute of Statistics and Information Management, New University of Lisbon
Study of Extreme Weather Events (hot & cold day or wave) over Bihar Region (Editor IJLRES)
In this paper an attempt has been made to study heat waves/cold waves and hot/cold days for the Bihar region using 46 years of data (1969-2014). The representative months taken are December to February for cold waves/cold days and March to June for heat waves/hot days. The results of the decadal analysis show that the frequency of heat waves/hot days increased during the last decade (2005 to 2014) for almost all stations of the Bihar region. Similarly, the frequency of cold waves/cold days also increased during the last two decades (1995 to 2004 and 2005 to 2014) over the region. Gaps in the data from part-time observatories have not been taken into account in the final decision.
Forbes CO2 and temperature presentation for Earth Day at CUA April 22 2015 ... (Kevin Forbes)
Extended Abstract
Introduction
While the vast majority of climate scientists have concluded that the changes in the climate over the past few decades can be attributed to human activity [Doran and Zimmerman, 2009], there has been a degree of reluctance to attribute specific weather events to elevated CO2 concentrations. For example, Coumou and Rahmstorf [2012] have noted that there has been an exceptionally high incidence of extreme weather events over the past decade and that some of the events can be linked to climate change but nevertheless concede that particular events “cannot be directly attributed to global warming.” Moreover, the World Meteorological Organization has noted that the incidence of extreme weather events matches IPCC projections, but qualifies this conclusion by stating that “it is impossible to say that an individual weather or climate event was ‘caused’ by climate change….” [World Meteorological Organization, 2011, p 15]. This claim of “attribution impossibility” is not a minor shortcoming; it leaves the causes of extreme events open to question, allowing climate skeptics to attribute the increased incidence of extreme events to so-called “natural variability.” In the United States, this has undermined the political consensus necessary to adopt robust, cost-effective policies to reduce CO2 emissions.
This paper explores the relationship between CO2 and weather by addressing whether there is a causal relationship between the atmospheric concentration level of carbon dioxide and hourly temperature. The analysis begins by noting that traditional correlation analysis is not capable of addressing whether there is a causal relationship between CO2 and temperature because statistical methods alone cannot render results that establish or reject causality between two variables that are contemporaneously correlated. Nevertheless, it is possible to address the issue of causality by using more advanced statistical techniques.
An Approach to Establishing Causality
This paper addresses the issue of causality between CO2 and temperature by following the research of the Nobel Laureate Clive Granger [1969], who defined causality in terms of whether lagged values of a variable lead to more accurate predictions of some other variable. In his words, “The definition of causality … is based entirely on the predictability of some series, say Xt. If some other series Yt contains information in past terms that helps in the prediction of Xt … then Yt is said to cause Xt.” [Granger, 1969, p 430]. This study embraces this view of causality by examining whether lagged values of CO2 lead to more accurate forecasts of temperature. The specific approach adopted here is to exploit the diurnal nature of the variation in the hourly CO2 concentration levels by using the CO2 concentration level in hour t – 24 as an explanatory variable. This variable has a 0.96 correlation with the CO2 level in hour t but i
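As an illustrative aside (not code from the paper), the lag-24 idea described above can be sketched in base R; the data frame 'hourly' and its columns temp_c and co2_ppm are assumptions:

# Sketch of the lagged-CO2 regression idea: regress hourly temperature on
# the CO2 concentration observed 24 hours earlier.
n <- nrow(hourly)                                    # 'hourly' is an assumed data frame
co2_lag24 <- c(rep(NA, 24), hourly$co2_ppm[seq_len(n - 24)])
fit <- lm(hourly$temp_c ~ co2_lag24)                 # NA rows are dropped automatically
summary(fit)  # in Granger's sense, a significant lag coefficient supports causality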
Analysis of Indian weather data sets using data mining techniques (csandit)
India has typical weather conditions consisting of various seasons and geographical conditions. The country has extreme high temperatures in the Rajasthan desert, a cold climate in the Himalayas, and heavy rainfall at Cherrapunji. These extreme variations in temperature make it difficult to infer or predict the weather effectively, and call for scientific techniques such as machine learning algorithms for the effective study and prediction of weather conditions. In this paper, we applied the K-means clustering algorithm to group similar data sets together, and also applied the J48 classification technique along with linear regression analysis.
Count of Candies - Happy Halloween Day (Ravi Nakulan)
Here we have a quite small sample of data containing counts of candies from 2008-2018 in Hamilton city. It is interesting to see if we can identify any independent variable that may impact the count of candies. Remember, Halloween is celebrated on October 31st, so it may fall on a weekday or a weekend.
I decided to use "Weather" conditions to identify the impact of rain, cold, or severe wind. Wind seems to have the most impact, because we can protect ourselves from rain and cold but not from the wind.
Forecasting Temperatures in Bangladesh: An Application of SARIMA Models (Premier Publishers)
Climate change is presently among the most significant topics of discussion, and temperature is one of its main components. In this study, it is observed that the minimum temperature fluctuates more than the maximum temperature. Several candidate SARIMA models were established for the maximum and minimum temperature series following the Box-Jenkins methodology. The best model for maximum temperature is SARIMA (1, 0, 0)(1, 1, 0)12 and for minimum temperature is SARIMA (2, 0, 1)(2, 1, 0)12, selected based on AIC. The model validation outcomes show that the projected values fit the original data well, with the lower and upper limits containing the bulk of the original data. The identified models are therefore suitable for projecting monthly maximum and minimum temperature in Bangladesh. The selected SARIMA models give two-year predictions of monthly maximum and minimum temperatures that can help decision makers establish priorities for preparing against forthcoming weather fluctuations. The forecasts also show that the minimum temperature of Bangladesh will continue its upward trend, a reflection of a changing climate across the entire country.
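As a hedged illustration of the models named in this abstract, both can be fitted with base R's arima(); the monthly series tmax and tmin are assumptions standing in for the paper's data:

# SARIMA (1,0,0)(1,1,0)12 for maximum and (2,0,1)(2,1,0)12 for minimum
# temperature; tmax/tmin are assumed ts objects with frequency = 12.
fit_max <- arima(tmax, order = c(1, 0, 0),
                 seasonal = list(order = c(1, 1, 0), period = 12))
fit_min <- arima(tmin, order = c(2, 0, 1),
                 seasonal = list(order = c(2, 1, 0), period = 12))
AIC(fit_max); AIC(fit_min)       # the selection criterion used in the paper
predict(fit_max, n.ahead = 24)   # a two-year (24-month) forecast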
ICLR wildfire season forecast 2018 (Richard Carr, Canadian Forest Service) (glennmcgillivray)
On May 16, ICLR hosted a special webinar which provided a forecast of the 2018 wildfire season. The session was led by Richard Carr, Fire Research Analyst for the Canadian Forest Service. The interactive webinar summarized the current conditions in Canada and provided a forecast for the 2018 fire season. Richard provides fire weather processing for the Canadian Wildland Fire Information System (CWFIS) and international projects. He provides fire weather briefings to the Canadian Forest Service (CFS) fire group, the Canadian Interagency Forest Fire Centre (CIFFC), and to federal emergency response personnel, and helps provide seasonal forecasting of fire risk. He also helps provide information to the North American Seasonal Fire Assessment Outlook, and the North American Drought Monitor via AAFC’s Canadian Drought Monitor. He represents NRCan-CFS in the CIFFC Forest and Fire Meteorology Working Group.
VISUAL ANALYSIS OF ELECTRICITY DEMAND: ENERGY DASHBOARD GRAPHICS Graphical Da... (Fatma ÇINAR)
A real-time interactive data management approach for the impulse and response analysis technique, using the lattice and ggplot2 graphical packages in R, has been employed. Average consumption, peak consumption, and daily consumption data have been used, and temperature data is also employed to highlight the significance of the relationship between consumption and weather conditions. The demand for electricity, together with the factors affecting it, has been analysed and visualised with multi-dimensional matrix graphics based on energy dashboard software.
ICLR Forecast: 2019 Wildfire Season (May 17, 2019) (glennmcgillivray)
On May 17, 2019, ICLR provided a forecast of the 2019 wildfire season led by Richard Carr from the Canadian Forest Service. The interactive webinar summarized the current conditions in Canada and provided a forecast for the 2019 wildfire season.
Richard Carr provides fire weather processing for the Canadian Wildland Fire Information System (CWFIS) and international projects. He also provides fire weather briefings to the Canadian Forest Service (CFS) fire group, the Canadian Interagency Forest Fire Centre (CIFFC), and to federal emergency response personnel, and helps provide seasonal forecasting of fire risk. Richard helps provide information to the North American Seasonal Fire Assessment Outlook, and the North American Drought Monitor via AAFC’s Canadian Drought Monitor. Richard represents NRCan-CFS in the CIFFC Forest and Fire Meteorology Working Group.
Global climate change has far-reaching effects on natural ecosystems and socio-economic systems, and it is an issue that governments, the scientific community, and the general public pay close attention to today. Meanwhile, climate change has strong regional characteristics: in the global context of climate warming, climate change trends and intensity are not entirely consistent across regions. Therefore, strengthening small-region climate change research plays an extremely important role in local agricultural production, livelihoods, and disaster prevention. On the basis of the monthly average temperature series at the Guyuan meteorological station from 1957 to 2011, temperature trends were analyzed with the Mann-Kendall test and the Pettitt jump test. Linear regression analysis showed that the annual average temperature in Guyuan City is on an increasing trend, with an average rate of 0.3071 ℃/10a. The annual highest temperature, the annual lowest temperature, and the annual average temperature in Guyuan City all showed an upward trend under the Mann-Kendall test. The biggest change is in the annual lowest temperature, at a rate of 0.60 ℃/10a, followed by the annual average temperature and the annual highest temperature. The Pettitt jump test showed that the annual lowest temperature in Guyuan City changed earliest, before 1984, while the annual average temperature and annual highest temperature changed around 1993. Multiple regression analysis showed that changes in temperature in Guyuan City mainly occurred after the 1980s, with a significant upward trend into the 21st century, in accordance with the Pettitt jump test results.
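For readers who want to reproduce the same style of analysis, a minimal R sketch follows; the Kendall and trend packages and the annual series tavg are assumptions, not materials from the study:

# Trend and change-point tests of the kind named above.
library(Kendall)                     # provides MannKendall()
library(trend)                       # provides pettitt.test()
# tavg: assumed annual mean temperature series, e.g. ts(values, start = 1957)
MannKendall(tavg)                    # monotonic-trend test
pettitt.test(tavg)                   # change-point ("jump") test
coef(lm(tavg ~ time(tavg)))[2] * 10  # linear warming rate in deg C per decade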
An analysis of weather variables of the Puente Campuzano community, located at Taxco, Guerrero, is presented. The weather station is located at the Polytechnic University of Guerrero State, at geographical coordinates 18.44°, -99.58°. A descriptive statistical analysis is performed on the main variables used in the design of greenhouses and photovoltaic systems, in bioclimatic architecture approaches, and in other real applications. The analyzed variables are environmental temperature, surface temperature, solar irradiance, and wind velocity. Location and dispersion values were obtained for each physical parameter. Histograms of the data distributions of temperature, irradiance, and wind velocity are presented, along with a summary of the main statistical characteristics of the different distributions. All the data distributions have a left asymmetry, and only the surface temperature has a mode, of 45.50 °C. The monthly irradiance mean is 634.02 W/m², the mean wind velocity is 2.75 m/s, and the means of environmental and surface temperature are 26.67 °C and 31.30 °C, respectively.
Data Science - Part X - Time Series Forecasting (Derek Kane)
This lecture provides an overview of time series forecasting techniques and the process of creating effective forecasts. We will go through some of the popular statistical methods, including time series decomposition, exponential smoothing, Holt-Winters, ARIMA, and GLM models. These topics will be discussed in detail, and we will go through the calibration and diagnostics of effective time series models on a number of diverse datasets.
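As a quick, hedged taste of those methods in base R (using R's built-in monthly co2 series, not the lecture's datasets):

# Decomposition, Holt-Winters smoothing, and a simple ARIMA fit.
plot(decompose(co2))             # trend / seasonal / remainder decomposition
hw <- HoltWinters(co2)           # Holt-Winters exponential smoothing
predict(hw, n.ahead = 12)        # 12-month-ahead forecast
arima(co2, order = c(1, 1, 1),   # an illustrative, not tuned, ARIMA model
      seasonal = list(order = c(0, 1, 1), period = 12))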
Loan Project Buying a House For this assignment, you will analy.docx (SHIVA101531)
Loan Project: Buying a House
For this assignment, you will analyze a home mortgage loan.
1. Find a description, asking price, and real estate taxes of a house for sale, and decide on a purchase price you would be willing to pay (assuming you have the means). Find a current market interest rate for a 30-year fixed-rate mortgage having a down payment of 20 percent of the purchase price.
2. Compute the down payment, amount financed, and the monthly mortgage payment (showing how to use the appropriate financial formula; an illustrative computation follows this task list).
3. Compute the monthly amount of real estate taxes and add to the monthly mortgage payment to get the total monthly amount paid.
4. Suppose that in order to qualify for the loan, the total monthly amount paid cannot exceed 30 percent of monthly income. What is the minimum monthly income needed to qualify for the loan? What is the minimum annual income needed? (Note: This is a simplified minimum income requirement calculation, for the purposes of this project, as it does not take into account other costs such as insurance or other loans or assets currently held.)
5. Construct an amortization table (using spreadsheet software or online resources such as http://www.bankrate.com).
6. Assume that the first payment is made in January of the current year. Find the month and year of the last payment. Find the date of the first month when the amount applied to the principal exceeds the amount of interest paid. How many of the 360 payments have been made at this point?
7. Assuming that the mortgage is held for the full 30 years, compute the total principal paid and the total interest paid.
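Purely as an illustration of task 2, with assumed numbers that are not part of the assignment, the standard fixed-rate payment formula M = P*r / (1 - (1 + r)^-n) can be computed in R:

# Monthly mortgage payment with hypothetical inputs.
price <- 300000                # assumed purchase price
P <- price - 0.20 * price      # amount financed after the 20% down payment
r <- 0.065 / 12                # assumed 6.5% annual rate, converted to monthly
n <- 30 * 12                   # 360 monthly payments
M <- P * r / (1 - (1 + r)^-n)  # standard amortization payment formula
round(M, 2)                    # monthly principal-and-interest payment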
Your report must include
· name of project and your name
· house's description, asking price, and real estate taxes, the purchase price, and the current market interest rate (include references)
· computations and answers for tasks 2, 3, and 4, amortization table for task 5, answers for task 6, and computations and answers for task 7
· conclusion (a paragraph summary describing the results you found to be particularly interesting, and why)
Additional details and discussion will be provided in WebTycho conferences.
Statistics Project
For this assignment, you will implement a project involving statistical procedures. The topic may be something that is related to your work, a hobby, or something you found interesting. If you choose, you may use the example described below.
The project report must include
· name of project and your name
· purpose of project
· data (provide the raw data used, and cite the source)—the sample size must be at least 10
· median, sample mean, range, sample variance, and sample standard deviation (show work)
· frequency distribution
· histogram
· percentage of data within one standard deviation of the mean, percentage of data within two standard deviations of the mean, percentage of data within three standard deviations of the mean (include explanation and interpretation --- do your percentages imply that t ...
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
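A toy R sketch of just the first of those ideas, skipping vertices whose rank has already converged, is given below; the graph representation, tolerance, and absence of dangling nodes are all assumptions:

# Power-iteration PageRank that stops recomputing converged vertices.
# adj_in: assumed list where adj_in[[v]] holds the in-neighbour ids of v.
# outdeg: out-degree of each vertex (assumed > 0, i.e. no dangling nodes).
pagerank_skip <- function(adj_in, outdeg, d = 0.85, tol = 1e-10, max_iter = 100) {
  n <- length(adj_in)
  r <- rep(1 / n, n)
  active <- rep(TRUE, n)                # vertices still being recomputed
  for (iter in seq_len(max_iter)) {
    r_new <- r
    for (v in which(active)) {
      ins <- adj_in[[v]]
      r_new[v] <- (1 - d) / n + d * sum(r[ins] / outdeg[ins])
      if (abs(r_new[v] - r[v]) < tol) active[v] <- FALSE  # converged: skip later
    }
    r <- r_new
    if (!any(active)) break             # every vertex has converged
  }
  r
}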
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex; Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Table of Contents
Climate Change, a myth or reality?
Introduction
Tools used
Data Preparation
The Data
Structure of the Data
Normalizing the data
Problems with the dataset
Data exploration using SQL
Data visualization using Tableau
Data modeling using R
Summary
Bonus: Challenges faced
Climate Change, a myth or reality?
Introduction
Climate change is viewed by some as one of the major threats to the planet, but to others it is nothing but a hoax. The best way to understand where to stand is to analyze climate change data over time and see if there has indeed been a significant change. For this project, temperature recorded over time in Cincinnati is studied for any effects of climate change.
Tools used
Microsoft SQL Server was used to import and understand the data, R Studio was used to build plots and models, and Tableau was used for visualizations.
Data Preparation
The Data
The data used for this particular project is called "Climate Change: Earth Surface Temperature Data". The data consists of Global Land and Ocean Temperatures and Land Temperatures by City, State, and Country, measured over 270 years, from 1743 to 2013. For the purpose of this project, the dataset depicting land temperatures by city has been taken. The original source of the data is Berkeley Earth.
Structure of the Data
The dataset contains 7 columns:
Variables Considered in Global Temperatures by City

dt | AverageTemperature | AverageTemperatureUncertainty | City | Country | Latitude | Longitude
1743-11-01 | 6.068 | 1.737 | Århus | Denmark | 57.05N | 10.33E
1744-04-01 | 5.788 | 3.624 | Århus | Denmark | 57.05N | 10.33E
The data has the following variables:

Variable | Description
dt | The date, in year-month-day format
AverageTemperature | Average temperature in the region
AverageTemperatureUncertainty | The 95% confidence interval of the average temperature
City | The city
Country | The country
Latitude | The latitude
Longitude | The longitude
Normalizing the data
The data can be divided into the following tables and attributes:

Normalized Tables
Table | Attributes
Date | The date, in year-month-day format
Average Temperature | Average temperature in the region; the 95% confidence interval of the average temperature
City | The city, latitude, longitude
Country | The country
Problems with the dataset
1) All the data types are stored as strings
2) There are many null values
3) There are duplicate entries
4) There are empty rows for average temperature and uncertainty values
5) Not all days have entries
6) For the initial years (1743 to around 1850), temperatures and uncertainty were probably not measured with the sophisticated equipment used in later years, as there is a difference in the data between the two periods
7) The maximum temperature noted is 29.832 and the minimum is -11.071, which could be a limitation of the data or the mode of measurement, as temperatures can clearly be greater than 30 degrees Celsius
Data exploration using SQL
1) There are 3448 cities from different countries
2) There are 3239 records for Cincinnati
3) After making empty and non-integer values null, and converting dt to a date datatype and average temperature and average temperature uncertainty to a decimal datatype, the resultant rows are as follows (screenshot omitted; a rough R equivalent of this cleanup is sketched after this list)
4) The dates spread over 270 years, 1743 to 2013: from 01-11-1743 to 01-09-2013
5) Checking the range of temperature and uncertainty for the 12 months
6) When ordering by month, it can be seen that the lowest temperature is in January and the highest temperature is in July
7) The uncertainty reduces over the years; this could be due to an increase in the accuracy of measurements over the years
8) When ordering by year, it can be seen that the highest temperature was in 2012 and the lowest in 1779, hence there could be an increase in temperature from 1779 to 2012 on a rough scale. However, there are many anomalies in the data.
(Screenshots: a sample possible anomaly, the lowest-temperature year; and the maximum-temperature years.)
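The report performs this cleanup and exploration in SQL Server; purely as an illustrative aside, a rough R equivalent (the file and column names follow the Kaggle release of the dataset, and are assumptions here) might look like:

# Hedged R equivalent of the SQL cleanup and exploration steps above.
city <- read.csv("GlobalLandTemperaturesByCity.csv", stringsAsFactors = FALSE)
city$dt <- as.Date(city$dt)                                      # strings -> dates
city$AverageTemperature <- as.numeric(city$AverageTemperature)   # "" becomes NA
city$AverageTemperatureUncertainty <- as.numeric(city$AverageTemperatureUncertainty)
city <- unique(city)                              # drop duplicate entries

length(unique(city$City))                         # number of cities (3448 per the report)
cin <- subset(city, City == "Cincinnati")
nrow(cin)                                         # Cincinnati records (3239 per the report)
range(cin$dt, na.rm = TRUE)                       # first and last observation dates

cin$month <- as.integer(format(cin$dt, "%m"))     # range of temperature by month (step 5)
aggregate(AverageTemperature ~ month, data = cin, FUN = range)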
Data visualization using Tableau
The average temperatures over the world for the latest year
The average temperature increase over the years for the world
There is a clear rise in temperature over the years across the world.
The average temperature increase over the years for Cincinnati
The anomalies in the data can be clearly seen in this graph.
The average temperatures for various months for Cincinnati
Data modeling using R
Plotting the correlations and distributions of the data
It can be seen that the highest correlation is between temperature uncertainty and year; this, as stated, could be due to an increase in recording precision.
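The plotting code is not shown in the report; one plausible base-R version, reusing the assumed cin data frame from the cleanup sketch above, would be:

# Correlations and distributions for the Cincinnati subset.
cin$year <- as.numeric(format(cin$dt, "%Y"))
vars <- cin[, c("year", "AverageTemperature", "AverageTemperatureUncertainty")]
pairs(vars)                       # pairwise scatter plots of the three variables
cor(vars, use = "complete.obs")   # correlation matrix with NA rows dropped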
Plotting the model
The fit has an R-squared of 0.05568, which implies that the model explains only 5.6% of the data; the rest is noise or anomalies. Hence, for the city of Cincinnati, climate change is not majorly observed, as opposed to the global values. This could primarily be because more general data works better on the global scale; working with global data would lead to more accurate results.
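The exact model specification is not shown in the report; a simple linear trend consistent with the reported fit (an assumption, not the report's code) would be:

# Average temperature regressed on year for Cincinnati.
fit <- lm(AverageTemperature ~ year, data = cin)
summary(fit)$r.squared                       # about 0.056 per the report
plot(cin$year, cin$AverageTemperature, pch = 20)
abline(fit, col = "red")                     # fitted trend line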
Summary
The earth is expected to get hotter over the years; this is known as climate change. From this analysis, there is a clear rise in global temperature over the years, but that is not observed as clearly for Cincinnati alone. Cleaning the data by removing influential outliers and collecting more data over a longer time would improve the results, along with more sophisticated equipment to measure temperatures and predict their rise. It would also benefit the study to factor in more parameters, such as the carbon emissions of a region, the energy-saving policies of a region, the population of a region, the ozone levels of a region, the azimuth and elevation at a given lat/long, etc. Most importantly, taking steps towards living a greener life would help us reduce the speed at which the temperature is rising.
Bonus: Challenges faced
The biggest challenge came when the dataset I wanted to use was too big for my R and SQL executions, so I had to resort to using a smaller dataset (Cincinnati). I have not figured out how to deal with a dataset that large, which is a pity since it was rich in information. Ultimately, I have learnt that the best approach is to cut down the data.