SlideShare a Scribd company logo
1 of 32
COMP41680 - Sample API Assignment¶
In [5]:
import os
import urllib.request
import csv
import pandas as pd
Task 1: Identify one or more suitable web APIs¶
API Chosen:
A single API that was chosen for this assignment was that
provided by www.worldweatheronline.com
Specifically, the historic weather data API -
http://developer.worldweatheronline.com/api/docs/historical-
weather-api.aspx
The API is no longer freely available but they give out a free 60
day trial upon signing up, this entitles the user to 500 calls to
the API per day.
The API key I received which works here is
fbaf429501ff4c7f92b8463217d103
In [3]:
api_key = "fbaf429501ff4c7f92b8463217d103"
Task 2: Collect data your chosen API(s)¶
Collecting Raw Data - Functions needed:
The following 3 functions were written to allow multiple calls
of the API as only limited data is available per call.
These function are commented throughout and are called below:
In [4]:
#create a file with set headings - 2 diff types of data to store
def create_file (file_loc, headings):
with open(file_loc, "w",newline='') as write_file: #as in
get_and_write_data function
f = csv.writer(write_file)
f.writerow(headings)
write_file.close()
#function to call the API, retreive the raw csv data, and write to
a file
def get_and_write_data(link, file_loc):
response = urllib.request.urlopen(link)
html = response.read().decode()
with open(file_loc, "a",newline='') as write_file: #open the
file / create it, newline ='' to prevent blank lines being written
f = csv.writer(write_file)
lines = html.strip().split("n")
for l in lines:
if l[0] =="#": # prevent it from writing the comments in
the return of each API call
continue
elif l[0:10] in ["Not Availa", "There is n"]: #prevent it
from writing lines where no data is present (i.e. returns saying
- "Not Available" or "There is no weather data available for the
date provided. Past data is available from 1 July, 2008 onwards
only.")
continue
else: #if doesn't have those it is data and so should be
written
l = l.split(",") #it comes in as a String, so convert to
a list for later easier writing and manipulation
f.writerow(l)
#print ("Line Written")
write_file.close()
#return print ("Monthly Data Appending to Raw File -
Completed")
# function to take in parameters set and then use this data to
build a link
# to be passed into the get_and_write_data function
def get_raw_data(file_loc, api_key, location, year, month):
#month needs to be a string to avoid invalid token errors for
ints as the API needs a leading 0 for single digit months
while year <=2016: #iterate for all years available in api,
namely July 2008 to date
if month == "02": #need to change end date in the call to
the API as it doesn't return full values if the date doesn't exist,
e.g. 31st of February
end_day = "28"
elif month in ["04", "06", "09", "11"]:
end_day = "30"
else:
end_day = "31"
# the bulding of the link is what decides the data returned,
it's available in hourly intervals,
# for any location in different formats, the documentation
below outlines the possibilities
#
http://developer.worldweatheronline.com/api/docs/historical-
weather-api.aspx
link =
"http://api.worldweatheronline.com/premium/v1/past-
weather.ashx?key="+ api_key + "&q="+ location
+"&format=csv&date="+ str(year) + "-"+ month +"-
01&enddate="+ str(year) + "-"+ month +"-"+ str(end_day)
+"&tp=24"#
get_and_write_data(link, file_loc)
year = year+1
Task 3: Parse the collected data, and store it in an appropriate
file format¶
Collecting Raw Data and writing raw data to CSV:
The following code retreives the raw data using the above
Functions from the API and writes it to a CSV file.
This data needs extensive cleaning and manipulation before it
can be used.
In [98]:
###Set Variable get the raw data from the API and store in the
File location set here
location = "Dublin"
raw_file_loc = "weather-data-raw.csv"
create_file (raw_file_loc, " ") # create a file with no headings to
store the raw data, no headings needed as the data returns 2
distinct CSV lines with different # of columns
# the api only returns 1 month worth of data at a time
# so a loop to iterate over all months beginning at Jan is needed
# the API needs the month in 0x format, therefore months 1-9
need to have a 0 added to the front,
# therefore the conversion between int and str was necessary
here and a string is passed through to the funtion
month = 1
while month <= 12:
if month <10:
month = "0" + str(month)
else:
month = str(month)
get_raw_data(raw_file_loc, api_key, "location", 2008,
month)
month = int(month)+1
print("Raw Data Collection Completed n")
Begin Raw Data Collection
Raw Data Collection Completed
Task 4: Load and represent the data using an appropriate data
structure. Apply any pre-processing steps to
clean/filter/combine the data¶
Parsing Raw Data:
The raw data returns alternating lines of values, 8 CSVs for
each day, and 24 columns of "Hourly data" which is also a daily
average as the call to the API has been configured.
These need to be parsed and the data that is to be used later
saved, while only 1 data set was needed, I decided to keep and
write both sets to different files, for future proofing.
In [ ]:
hourly_file = "weather-data-hourly.csv"
daily_file = "weather-data-daily.csv"
#these are the headings as provided by the API documentation
hourly_headings =
["date","time","tempC","tempF","windspeedMiles","windspeed
Kmph","winddirdegree","winddir16point","weatherCode","weat
herIconUrl","weatherDesc","precipMM","humidity","visibilityK
m","pressureMB","cloudcover","HeatIndexC","HeatIndexF","De
wPointC","DewPointF","WindChillC","WindChillF","WindGust
Miles","WindGustKmph","FeelsLikeC","FeelsLikeF"]
daily_headings =
["date","maxtempC","maxtempF","mintempC","mintempF","sun
rise","sunset","moonrise","moonset"]
#call on the function to create the files as needed
create_file(hourly_file, hourly_headings)
create_file(daily_file, daily_headings)
# open the raw data and then based on the length of the line,
write to appropriate file
# the len of the lines is actually around 58-62 and over 180, so
100 was chosen for safety,
# this can be easily changed in future if the API changes
raw_data = open(raw_file_loc, "r")
lines = raw_data.readlines()
for l in lines:
# print (len(l))
if len(l) <= 100:
with open(daily_file, "a",newline='') as daily:
df = csv.writer(daily)
l = l.split(",")
df.writerow(l)
daily.close()
elif len(l) >101:
with open(hourly_file, "a",newline='') as hourly:
hf = csv.writer(hourly)
l = l.split(",")
hf.writerow(l)
hourly.close()
else:
continue
raw_data.close()
Utilising Pandas and further Data Modification With the CSV
files written these are imported using Pandas.2 columns were
chosen for analysis, namely Temperature and Precipitation for
each dayThe date field was stored as a String, so this was
converted to a Datetime to allow for time analysis.
In [5]:
hourly_data = pd.read_csv(hourly_file)
daily_data = pd.read_csv(daily_file)
#convert date string to datetime -
http://stackoverflow.com/questions/17134716/convert-
dataframe-column-type-from-string-to-datetime
pd.options.mode.chained_assignment = None # default='warn'
## suppress warning regarding A value is trying to be set on a
copy of a slice from a DataFrame. - same warning was
appearing using a For loop, the index and .loc, and that loop
took 5 minutes to run on my machine
hourly_data['date'] = pd.to_datetime(hourly_data['date']) #
removed from to_datetime {, format="YYYY-MM-DD"}
#for i in simplified_data.index:
#
simplified_data.loc[i,'date']=pd.to_datetime(simplified_data.loc
[i, 'date'])
simplified_data = hourly_data[["date", "tempC", "precipMM"]]
# extract temp and precip data for analysis and visualisation
simplified_data = simplified_data.sort_values(by=['date']) #
reorder the data by date
In [101]:
hourly_data[0:5]
Out[101]:
datetimetempCtempFwindspeedMileswindspeedKmphwinddirde
greewinddir16pointweatherCodeweatherIconUrl...HeatIndexCHe
atIndexFDewPointCDewPointFWindChillCWindChillFWindGus
tMilesWindGustKmphFeelsLikeCFeelsLikeF02009-01-
0124744111794E113http://cdn.worldweatheronline.net/images/
wsymb......744235438132043812009-01-
02246421828121ESE116http://cdn.worldweatheronline.net/imag
es/wsymb......642235133223613322009-01-
03245411423172S113http://cdn.worldweatheronline.net/images/
wsymb......541-13133771133732009-01-
04246431219314NW122http://cdn.worldweatheronline.net/imag
es/wsymb......642337337101733742009-01-
05245421728100E176http://cdn.worldweatheronline.net/images/
wsymb......5411330332642033
5 rows × 26 columns
In [102]:
simplified_data[0:5]
Out[102]:
datetempCprecipMM13292008-07-011515.813302008-07-
02169.913312008-07-031425.513322008-07-04154.613332008-
07-051637.5
Missing Data
Final Pre-Processing steps are to look for missing data to see if
further pre-processing is needed.
In [138]:
#look for missing data
simplified_data.isnull().sum() # no missing values in the
reduced dataset
Out[138]:
date 0
tempC 0
precipMM 0
dtype: int64
In [104]:
simplified_data.dtypes.value_counts()
Out[104]:
float64 1
int64 1
datetime64[ns] 1
dtype: int64
There's no Null's in the data, there's also no strings either, this
means there's therefore no values in it such as "Not Available"
or for example "No moonrise" in moonrise column, etc.
Both of these are highly indicative that all values are present.
The final Pre-processing step is to get Monthly averages to
create a reduced size data set that can be easier visualised, but
still accurate and indicative of the months rain and temperature.
In [107]:
monthly =
simplified_data.groupby([pd.Grouper(key='date',freq='M')]) #
http://stackoverflow.com/questions/32982012/grouping-
dataframe-by-custom-date
avg_month = monthly.mean() #create a new DF based on the
mean of the groupby object created above
print(avg_month[0:5])
tempC precipMM
date
2008-07-31 16.419355 8.290323
2008-08-31 17.064516 7.906452
2008-09-30 15.833333 5.186667
2008-10-31 13.032258 5.477419
2008-11-30 10.766667 3.250000
Task 5: Analyse and summarise the cleaned dataset¶
Descriptive Statistics
Initially of the Data Set containing all daily data:
In [110]:
print("nSimplified_data columnns:n" +
str(simplified_data.columns) + "n")
print("Simplified_data Descriptive Stats:n")
print(simplified_data.describe())
Simplified_data columnns:
Index(['date', 'tempC', 'precipMM'], dtype='object')
Simplified_data Descriptive Stats:
tempC precipMM
count 2740.000000 2740.000000
mean 12.941241 3.267628
std 4.517523 5.706110
min 2.000000 0.000000
25% 10.000000 0.100000
50% 13.000000 1.100000
75% 16.000000 3.800000
max 25.000000 52.400000
In [111]:
print("Descriptive Stats:n")
print(avg_month.describe())
Descriptive Stats:
tempC precipMM
count 92.000000 92.000000
mean 12.742613 3.327383
std 3.926780 2.076563
min 4.000000 0.100000
25% 9.403226 1.858871
50% 12.768817 2.784516
75% 16.108333 4.209516
max 21.354839 12.500000
As can be seen from comparing both descriptive stats, the
monthly average, seems to have removed outliers (e.g. max
precipitation 52mm), has reduced the standard deviation, but the
quartiles have remained largely the same.
Matplotlib and Pandas Graphing
In [112]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
Line Graphs and Area Plot¶
In [28]:
plt.figure()
avg_month.plot()
plt.title("Avg Monthly Temperature and Precipitation in Dublin
since July 2008n")
plt.ylabel("Temperature C | Precipitation MM")
plt.xlabel("Time")
plt.show()
<matplotlib.figure.Figure at 0x245e32cd9e8>
In [35]:
avg_month.plot.area(stacked=False)
Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x245e5c3d5c0>
Basic Line Graph and Area Plot show how temp and precip
interact, as expected Temp increases and falls based on time of
year.
Precipitation doesn't seem to follow the same expected trend. It
seems the Irish reputation for never ending rain is well
deserved, although it would appear to have fallen in recent
years.Stacked Histogram¶
Shows the distribution of the data.
In [33]:
avg_month.plot.hist(stacked=True)
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at
0x245e5a93860>ScatterPlots¶
Explore the data, look for patterns, outliers, etc.
In [38]:
avg_month.plot.scatter(x="tempC", y="precipMM", s=50 )
Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x245e6cb6cf8>
In [114]:
plt.scatter(avg_month['tempC'], avg_month['precipMM'])
plt.show()
In [39]:
from pandas.tools.plotting import scatter_matrix
scatter_matrix(avg_month, alpha=0.2, figsize=(6, 6),
diagonal='kde')
Out[39]:
array([[<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E6D16278>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E6DAC5F8>],
[<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E6DF5F98>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E6E30D30>]], dtype=object)
In [40]:
from pandas.tools.plotting import scatter_matrix
scatter_matrix(daily_data, alpha=0.2, figsize=(6, 6),
diagonal='kde')
Out[40]:
array([[<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E6C462B0>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E7606278>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E764F470>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E76859E8>],
[<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E76CEB38>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E6D7B160>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E4FA96D8>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E4F505C0>],
[<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E77666A0>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E77B08D0>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E77ED470>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E78354A8>],
[<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E7875C50>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E78C4278>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E78FDA90>,
<matplotlib.axes._subplots.AxesSubplot object at
0x00000245E7949CC0>]], dtype=object)Dual Axis Line
Graphs¶
In [54]:
plt.figure()
ax = avg_month.plot(secondary_y=['precipMM'])
ax.set_ylabel("Temperature C")
ax.right_ax.set_ylabel("Precipitation MM")
plt.title("Avg Monthly Temperature and Precipitation in Dublin
since July 2008n")
plt.xlabel("Time")
plt.show()
<matplotlib.figure.Figure at 0x245e8cd2208>
In [55]:
avg_month.plot(subplots=True, figsize=(6, 6));
Final Manipulation, Exploration and Visualisation¶Temperature
v Precipitation¶
For the purposes of this exploration, 2 new Data Frames were
created, grouping by Temp and Precipitation and comparing to
the mean value of the other, this data was then explored as
outlined further below
In [132]:
#x="tempC", y="precipMM"
avg_month_temp = avg_month.groupby("tempC")
temp_data = avg_month_temp.mean() #create a new DF based
on the mean of the groupby object created above
print(temp_data[4:7])
precipMM
tempC
4.000000 12.500000
5.333333 0.100000
6.322581 4.441935
6.392857 2.585714
6.903226 2.574194
In [135]:
plt.figure()
avg_month_temp.mean().plot()#secondary_y=['precipMM'])
plt.title("Avg amount (mm) of Precipitation as Temperature
increases (Dublin since July 2008)n")
plt.xlabel("Temperature - C")
plt.ylabel("Precipitation MM")
plt.show()
<matplotlib.figure.Figure at 0x245e039cf98>
In [136]:
#x="tempC", y="precipMM"
avg_month_precip = avg_month.groupby("precipMM")
precip_data = avg_month_precip.mean() #create a new DF
based on the mean of the groupby object created above
print(precip_data[0:1])
tempC
precipMM
0.100000 5.333333
0.722581 11.387097
0.733333 19.166667
In [137]:
plt.figure()
avg_month_precip.mean().plot()#secondary_y=['precipMM'])
plt.title("Avg Temperature as amount of Precipitation increases
(Dublin since July 2008)n")
plt.xlabel("Precipitation - MM")
plt.ylabel("Temperature - C")
plt.show()
<matplotlib.figure.Figure at 0x245e0386b38>
Tentative Conclusion¶
Further in-depth studies and tests could be carried out to test
the statistical significant of the results, and to incorporate other
meterological datasets. However, based on the current data,
there does not seem to be a strong relationship between level of
rain compared to temperature.
So it doesn't really matter how hot it gets in Dublin, we can still
expect rain!
COMP47670 Assignment 1: Data Collection & Preparation
Deadline: Monday 23rd March 2020
Overview:
The objective of this assignment is to collect a dataset from one
or more open web APIs of your choice, and use Python to
preprocess and analyse the collected data.
The assignment should be implemented as a single Jupyter
Notebook (not a script). Your notebook should be clearly
documented, using comments and Markdown cells to explain the
code and results.
Tasks:
For this assignment you should complete the following tasks:
1. Data identification:
· Choose at least one open web API as your data source (i.e. not
a static or pre-collected dataset). If you decide to use more than
one API, these APIs should be related in some way.
2. Data collection:
· Collect data from your API(s) using Python. Depending on the
API(s), you may need to repeat the collection process multiple
times to download sufficient data.
· Store the collected data in an appropriate file format for
subsequent analysis (e.g. JSON, XML, CSV).
3. Data preparation and analysis:
· Load and represent the data using an appropriate data structure
(i.e. records/items as rows, described by features as columns).
· Apply any preprocessing steps that might be required to clean
or filter the data before analysis. Where more than one API is
used, apply suitable data integration methods.
· Analyse, characterise, and summarise the cleaned dataset,
using tables and plots where appropriate. Clearly explain and
interpret any analysis results which are produced.
· Summarise any insights which you gained from your analysis
of the data. Suggest ideas for further analysis which could be
performed on the data in future.
Guidelines:
· The assignment should be completed individually. Any
evidence of plagiarism will result in a 0 grade.
· Submit your assignment via the COMP47670 Brightspace
page. Your submission should be in the form of a single ZIP file
containing the notebook (i.e. IPYNB file) and your data. If your
data is too large to upload, please include a smaller sample of
the data in the ZIP file.
· In the notebook please clearly state your full name and your
student number. Also provide links to the home pages for the
API(s) which you used.
· Hard deadline: Submit by the end of Monday 23rd March 2020
· 1-5 days late: 10% deduction from overall mark
· 6-10 days late: 20% deduction from overall mark
· No assignments accepted after 10 days without extenuating
circumstances approval and/or medical certificate.

More Related Content

Similar to COMP41680 - Sample API Assignment¶In [5] .docx

in C++ Design a class named Employee The class should keep .pdf
in C++ Design a class named Employee The class should keep .pdfin C++ Design a class named Employee The class should keep .pdf
in C++ Design a class named Employee The class should keep .pdf
adithyaups
 
Introduction to SQL Report tool
Introduction to SQL Report toolIntroduction to SQL Report tool
Introduction to SQL Report tool
Russell Frearson
 
I have this python code and i am trying to Build a multivari.pdf
I have this python code and i am trying to Build a multivari.pdfI have this python code and i am trying to Build a multivari.pdf
I have this python code and i am trying to Build a multivari.pdf
sukhvir71
 
PHP Continuous Data Processing
PHP Continuous Data ProcessingPHP Continuous Data Processing
PHP Continuous Data Processing
Michael Peacock
 
MMYERS Portfolio
MMYERS PortfolioMMYERS Portfolio
MMYERS Portfolio
Mike Myers
 
Advance Sql Server Store procedure Presentation
Advance Sql Server Store procedure PresentationAdvance Sql Server Store procedure Presentation
Advance Sql Server Store procedure Presentation
Amin Uddin
 

Similar to COMP41680 - Sample API Assignment¶In [5] .docx (20)

Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
in C++ Design a class named Employee The class should keep .pdf
in C++ Design a class named Employee The class should keep .pdfin C++ Design a class named Employee The class should keep .pdf
in C++ Design a class named Employee The class should keep .pdf
 
Weather data meets ibm cloud. part 3 transformation and aggregation of weat...
Weather data meets ibm cloud. part 3   transformation and aggregation of weat...Weather data meets ibm cloud. part 3   transformation and aggregation of weat...
Weather data meets ibm cloud. part 3 transformation and aggregation of weat...
 
Matlab data import
Matlab data importMatlab data import
Matlab data import
 
Leveraging Quandl
Leveraging Quandl Leveraging Quandl
Leveraging Quandl
 
pio_present
pio_presentpio_present
pio_present
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Introduction to SQL Report tool
Introduction to SQL Report toolIntroduction to SQL Report tool
Introduction to SQL Report tool
 
HTML Flight Scraper
HTML Flight Scraper HTML Flight Scraper
HTML Flight Scraper
 
Composable and streamable Play apps
Composable and streamable Play appsComposable and streamable Play apps
Composable and streamable Play apps
 
I have this python code and i am trying to Build a multivari.pdf
I have this python code and i am trying to Build a multivari.pdfI have this python code and i am trying to Build a multivari.pdf
I have this python code and i am trying to Build a multivari.pdf
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 
Weather data meets ibm cloud. part 1 ingestion and processing of weather da...
Weather data meets ibm cloud. part 1   ingestion and processing of weather da...Weather data meets ibm cloud. part 1   ingestion and processing of weather da...
Weather data meets ibm cloud. part 1 ingestion and processing of weather da...
 
PHP Continuous Data Processing
PHP Continuous Data ProcessingPHP Continuous Data Processing
PHP Continuous Data Processing
 
MMYERS Portfolio
MMYERS PortfolioMMYERS Portfolio
MMYERS Portfolio
 
Pumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisPumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency Analysis
 
RESTful API In Node Js using Express
RESTful API In Node Js using Express RESTful API In Node Js using Express
RESTful API In Node Js using Express
 
Pumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisPumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency Analysis
 
Unsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleUnsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at Scale
 
Advance Sql Server Store procedure Presentation
Advance Sql Server Store procedure PresentationAdvance Sql Server Store procedure Presentation
Advance Sql Server Store procedure Presentation
 

More from pickersgillkayne

CompetencyIn this project, you will demonstrate your mastery of .docx
CompetencyIn this project, you will demonstrate your mastery of .docxCompetencyIn this project, you will demonstrate your mastery of .docx
CompetencyIn this project, you will demonstrate your mastery of .docx
pickersgillkayne
 
CompetencyExplore advocacy opportunities in the community..docx
CompetencyExplore advocacy opportunities in the community..docxCompetencyExplore advocacy opportunities in the community..docx
CompetencyExplore advocacy opportunities in the community..docx
pickersgillkayne
 
CompetencyExamine leaderships role in executing successful change.docx
CompetencyExamine leaderships role in executing successful change.docxCompetencyExamine leaderships role in executing successful change.docx
CompetencyExamine leaderships role in executing successful change.docx
pickersgillkayne
 
CompetencyDescribe the atmosphere, biosphere, hydrosphere, g.docx
CompetencyDescribe the atmosphere, biosphere, hydrosphere, g.docxCompetencyDescribe the atmosphere, biosphere, hydrosphere, g.docx
CompetencyDescribe the atmosphere, biosphere, hydrosphere, g.docx
pickersgillkayne
 
CompetencyDevelop examples of ethical and privacy concerns a.docx
CompetencyDevelop examples of ethical and privacy concerns a.docxCompetencyDevelop examples of ethical and privacy concerns a.docx
CompetencyDevelop examples of ethical and privacy concerns a.docx
pickersgillkayne
 
CompetencyAssess the causes and consequences of historical e.docx
CompetencyAssess the causes and consequences of historical e.docxCompetencyAssess the causes and consequences of historical e.docx
CompetencyAssess the causes and consequences of historical e.docx
pickersgillkayne
 
CompetencyApply data analytic methodologies to diverse popul.docx
CompetencyApply data analytic methodologies to diverse popul.docxCompetencyApply data analytic methodologies to diverse popul.docx
CompetencyApply data analytic methodologies to diverse popul.docx
pickersgillkayne
 
CompetencyAssess the development of societal standards in re.docx
CompetencyAssess the development of societal standards in re.docxCompetencyAssess the development of societal standards in re.docx
CompetencyAssess the development of societal standards in re.docx
pickersgillkayne
 
CompetencyAnalyze the evolution of social media standards an.docx
CompetencyAnalyze the evolution of social media standards an.docxCompetencyAnalyze the evolution of social media standards an.docx
CompetencyAnalyze the evolution of social media standards an.docx
pickersgillkayne
 
CompetencyAnalyze the evolution of social media standards and .docx
CompetencyAnalyze the evolution of social media standards and .docxCompetencyAnalyze the evolution of social media standards and .docx
CompetencyAnalyze the evolution of social media standards and .docx
pickersgillkayne
 
CompetencyAnalyze leadership and management roles in change ma.docx
CompetencyAnalyze leadership and management roles in change ma.docxCompetencyAnalyze leadership and management roles in change ma.docx
CompetencyAnalyze leadership and management roles in change ma.docx
pickersgillkayne
 
Competency Checklist and Professional Development Resources .docx
Competency Checklist and Professional Development Resources .docxCompetency Checklist and Professional Development Resources .docx
Competency Checklist and Professional Development Resources .docx
pickersgillkayne
 
CompetenciesDetermine the historical impact of art on mode.docx
CompetenciesDetermine the historical impact of art on mode.docxCompetenciesDetermine the historical impact of art on mode.docx
CompetenciesDetermine the historical impact of art on mode.docx
pickersgillkayne
 

More from pickersgillkayne (20)

CompetencyIn this project, you will demonstrate your mastery of .docx
CompetencyIn this project, you will demonstrate your mastery of .docxCompetencyIn this project, you will demonstrate your mastery of .docx
CompetencyIn this project, you will demonstrate your mastery of .docx
 
CompetencyExplore advocacy opportunities in the community..docx
CompetencyExplore advocacy opportunities in the community..docxCompetencyExplore advocacy opportunities in the community..docx
CompetencyExplore advocacy opportunities in the community..docx
 
CompetencyEvaluate the role and impact of financial principl.docx
CompetencyEvaluate the role and impact of financial principl.docxCompetencyEvaluate the role and impact of financial principl.docx
CompetencyEvaluate the role and impact of financial principl.docx
 
CompetencyEvaluate the impact of innovation on team success..docx
CompetencyEvaluate the impact of innovation on team success..docxCompetencyEvaluate the impact of innovation on team success..docx
CompetencyEvaluate the impact of innovation on team success..docx
 
CompetencyEvaluate the role of identity, diverse segments, a.docx
CompetencyEvaluate the role of identity, diverse segments, a.docxCompetencyEvaluate the role of identity, diverse segments, a.docx
CompetencyEvaluate the role of identity, diverse segments, a.docx
 
CompetencyExamine leaderships role in executing successful change.docx
CompetencyExamine leaderships role in executing successful change.docxCompetencyExamine leaderships role in executing successful change.docx
CompetencyExamine leaderships role in executing successful change.docx
 
CompetencyEvaluate psychological theories and their insights.docx
CompetencyEvaluate psychological theories and their insights.docxCompetencyEvaluate psychological theories and their insights.docx
CompetencyEvaluate psychological theories and their insights.docx
 
CompetencyEmploy contemporary economic principles that guide.docx
CompetencyEmploy contemporary economic principles that guide.docxCompetencyEmploy contemporary economic principles that guide.docx
CompetencyEmploy contemporary economic principles that guide.docx
 
CompetencyDetermine how the environment and economies are in.docx
CompetencyDetermine how the environment and economies are in.docxCompetencyDetermine how the environment and economies are in.docx
CompetencyDetermine how the environment and economies are in.docx
 
CompetencyDescribe the atmosphere, biosphere, hydrosphere, g.docx
CompetencyDescribe the atmosphere, biosphere, hydrosphere, g.docxCompetencyDescribe the atmosphere, biosphere, hydrosphere, g.docx
CompetencyDescribe the atmosphere, biosphere, hydrosphere, g.docx
 
CompetencyDevelop examples of ethical and privacy concerns a.docx
CompetencyDevelop examples of ethical and privacy concerns a.docxCompetencyDevelop examples of ethical and privacy concerns a.docx
CompetencyDevelop examples of ethical and privacy concerns a.docx
 
CompetencyAssess the causes and consequences of historical e.docx
CompetencyAssess the causes and consequences of historical e.docxCompetencyAssess the causes and consequences of historical e.docx
CompetencyAssess the causes and consequences of historical e.docx
 
CompetencyApply data analytic methodologies to diverse popul.docx
CompetencyApply data analytic methodologies to diverse popul.docxCompetencyApply data analytic methodologies to diverse popul.docx
CompetencyApply data analytic methodologies to diverse popul.docx
 
CompetencyAssess the development of societal standards in re.docx
CompetencyAssess the development of societal standards in re.docxCompetencyAssess the development of societal standards in re.docx
CompetencyAssess the development of societal standards in re.docx
 
CompetencyAnalyze the evolution of social media standards an.docx
CompetencyAnalyze the evolution of social media standards an.docxCompetencyAnalyze the evolution of social media standards an.docx
CompetencyAnalyze the evolution of social media standards an.docx
 
CompetencyAnalyze the evolution of social media standards and .docx
CompetencyAnalyze the evolution of social media standards and .docxCompetencyAnalyze the evolution of social media standards and .docx
CompetencyAnalyze the evolution of social media standards and .docx
 
CompetencyAnalyze leadership and management roles in change ma.docx
CompetencyAnalyze leadership and management roles in change ma.docxCompetencyAnalyze leadership and management roles in change ma.docx
CompetencyAnalyze leadership and management roles in change ma.docx
 
CompetencyAnalyze financial statements to assess performance a.docx
CompetencyAnalyze financial statements to assess performance a.docxCompetencyAnalyze financial statements to assess performance a.docx
CompetencyAnalyze financial statements to assess performance a.docx
 
Competency Checklist and Professional Development Resources .docx
Competency Checklist and Professional Development Resources .docxCompetency Checklist and Professional Development Resources .docx
Competency Checklist and Professional Development Resources .docx
 
CompetenciesDetermine the historical impact of art on mode.docx
CompetenciesDetermine the historical impact of art on mode.docxCompetenciesDetermine the historical impact of art on mode.docx
CompetenciesDetermine the historical impact of art on mode.docx
 

Recently uploaded

QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lessonQUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
httgc7rh9c
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
EADTU
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
AnaAcapella
 

Recently uploaded (20)

FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdfFICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
 
dusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningdusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learning
 
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lessonQUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
UGC NET Paper 1 Unit 7 DATA INTERPRETATION.pdf
UGC NET Paper 1 Unit 7 DATA INTERPRETATION.pdfUGC NET Paper 1 Unit 7 DATA INTERPRETATION.pdf
UGC NET Paper 1 Unit 7 DATA INTERPRETATION.pdf
 
How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Simple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdfSimple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Introduction to TechSoup’s Digital Marketing Services and Use Cases
Introduction to TechSoup’s Digital Marketing  Services and Use CasesIntroduction to TechSoup’s Digital Marketing  Services and Use Cases
Introduction to TechSoup’s Digital Marketing Services and Use Cases
 
Economic Importance Of Fungi In Food Additives
Economic Importance Of Fungi In Food AdditivesEconomic Importance Of Fungi In Food Additives
Economic Importance Of Fungi In Food Additives
 
Our Environment Class 10 Science Notes pdf
Our Environment Class 10 Science Notes pdfOur Environment Class 10 Science Notes pdf
Our Environment Class 10 Science Notes pdf
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 

COMP41680 - Sample API Assignment¶In [5] .docx

  • 1. COMP41680 - Sample API Assignment¶ In [5]: import os import urllib.request import csv import pandas as pd Task 1: Identify one or more suitable web APIs¶ API Chosen: A single API that was chosen for this assignment was that provided by www.worldweatheronline.com Specifically, the historic weather data API - http://developer.worldweatheronline.com/api/docs/historical- weather-api.aspx The API is no longer freely available but they give out a free 60 day trial upon signing up, this entitles the user to 500 calls to the API per day. The API key I received which works here is fbaf429501ff4c7f92b8463217d103
  • 2. In [3]: api_key = "fbaf429501ff4c7f92b8463217d103" Task 2: Collect data your chosen API(s)¶ Collecting Raw Data - Functions needed: The following 3 functions were written to allow multiple calls of the API as only limited data is available per call. These function are commented throughout and are called below: In [4]: #create a file with set headings - 2 diff types of data to store def create_file (file_loc, headings): with open(file_loc, "w",newline='') as write_file: #as in get_and_write_data function f = csv.writer(write_file) f.writerow(headings) write_file.close()
  • 3. #function to call the API, retreive the raw csv data, and write to a file def get_and_write_data(link, file_loc): response = urllib.request.urlopen(link) html = response.read().decode() with open(file_loc, "a",newline='') as write_file: #open the file / create it, newline ='' to prevent blank lines being written f = csv.writer(write_file) lines = html.strip().split("n") for l in lines: if l[0] =="#": # prevent it from writing the comments in the return of each API call continue elif l[0:10] in ["Not Availa", "There is n"]: #prevent it from writing lines where no data is present (i.e. returns saying - "Not Available" or "There is no weather data available for the date provided. Past data is available from 1 July, 2008 onwards only.") continue else: #if doesn't have those it is data and so should be written l = l.split(",") #it comes in as a String, so convert to a list for later easier writing and manipulation f.writerow(l) #print ("Line Written") write_file.close() #return print ("Monthly Data Appending to Raw File - Completed") # function to take in parameters set and then use this data to build a link # to be passed into the get_and_write_data function def get_raw_data(file_loc, api_key, location, year, month): #month needs to be a string to avoid invalid token errors for ints as the API needs a leading 0 for single digit months
  • 4. while year <=2016: #iterate for all years available in api, namely July 2008 to date if month == "02": #need to change end date in the call to the API as it doesn't return full values if the date doesn't exist, e.g. 31st of February end_day = "28" elif month in ["04", "06", "09", "11"]: end_day = "30" else: end_day = "31" # the bulding of the link is what decides the data returned, it's available in hourly intervals, # for any location in different formats, the documentation below outlines the possibilities # http://developer.worldweatheronline.com/api/docs/historical- weather-api.aspx link = "http://api.worldweatheronline.com/premium/v1/past- weather.ashx?key="+ api_key + "&q="+ location +"&format=csv&date="+ str(year) + "-"+ month +"- 01&enddate="+ str(year) + "-"+ month +"-"+ str(end_day) +"&tp=24"# get_and_write_data(link, file_loc) year = year+1 Task 3: Parse the collected data, and store it in an appropriate file format¶ Collecting Raw Data and writing raw data to CSV: The following code retreives the raw data using the above Functions from the API and writes it to a CSV file. This data needs extensive cleaning and manipulation before it can be used.
  • 5. In [98]: ###Set Variable get the raw data from the API and store in the File location set here location = "Dublin" raw_file_loc = "weather-data-raw.csv" create_file (raw_file_loc, " ") # create a file with no headings to store the raw data, no headings needed as the data returns 2 distinct CSV lines with different # of columns # the api only returns 1 month worth of data at a time # so a loop to iterate over all months beginning at Jan is needed # the API needs the month in 0x format, therefore months 1-9 need to have a 0 added to the front, # therefore the conversion between int and str was necessary here and a string is passed through to the funtion month = 1 while month <= 12: if month <10: month = "0" + str(month) else: month = str(month) get_raw_data(raw_file_loc, api_key, "location", 2008, month) month = int(month)+1 print("Raw Data Collection Completed n") Begin Raw Data Collection
  • 6. Raw Data Collection Completed Task 4: Load and represent the data using an appropriate data structure. Apply any pre-processing steps to clean/filter/combine the data¶ Parsing Raw Data: The raw data returns alternating lines of values, 8 CSVs for each day, and 24 columns of "Hourly data" which is also a daily average as the call to the API has been configured. These need to be parsed and the data that is to be used later saved, while only 1 data set was needed, I decided to keep and write both sets to different files, for future proofing. In [ ]: hourly_file = "weather-data-hourly.csv" daily_file = "weather-data-daily.csv" #these are the headings as provided by the API documentation hourly_headings = ["date","time","tempC","tempF","windspeedMiles","windspeed Kmph","winddirdegree","winddir16point","weatherCode","weat herIconUrl","weatherDesc","precipMM","humidity","visibilityK m","pressureMB","cloudcover","HeatIndexC","HeatIndexF","De
  • 7. wPointC","DewPointF","WindChillC","WindChillF","WindGust Miles","WindGustKmph","FeelsLikeC","FeelsLikeF"] daily_headings = ["date","maxtempC","maxtempF","mintempC","mintempF","sun rise","sunset","moonrise","moonset"] #call on the function to create the files as needed create_file(hourly_file, hourly_headings) create_file(daily_file, daily_headings) # open the raw data and then based on the length of the line, write to appropriate file # the len of the lines is actually around 58-62 and over 180, so 100 was chosen for safety, # this can be easily changed in future if the API changes raw_data = open(raw_file_loc, "r") lines = raw_data.readlines() for l in lines: # print (len(l)) if len(l) <= 100: with open(daily_file, "a",newline='') as daily: df = csv.writer(daily) l = l.split(",") df.writerow(l) daily.close() elif len(l) >101: with open(hourly_file, "a",newline='') as hourly: hf = csv.writer(hourly) l = l.split(",") hf.writerow(l) hourly.close() else: continue raw_data.close() Utilising Pandas and further Data Modification With the CSV
  • 8. files written these are imported using Pandas.2 columns were chosen for analysis, namely Temperature and Precipitation for each dayThe date field was stored as a String, so this was converted to a Datetime to allow for time analysis. In [5]: hourly_data = pd.read_csv(hourly_file) daily_data = pd.read_csv(daily_file) #convert date string to datetime - http://stackoverflow.com/questions/17134716/convert- dataframe-column-type-from-string-to-datetime pd.options.mode.chained_assignment = None # default='warn' ## suppress warning regarding A value is trying to be set on a copy of a slice from a DataFrame. - same warning was appearing using a For loop, the index and .loc, and that loop took 5 minutes to run on my machine hourly_data['date'] = pd.to_datetime(hourly_data['date']) # removed from to_datetime {, format="YYYY-MM-DD"} #for i in simplified_data.index: # simplified_data.loc[i,'date']=pd.to_datetime(simplified_data.loc
  • 9. [i, 'date']) simplified_data = hourly_data[["date", "tempC", "precipMM"]] # extract temp and precip data for analysis and visualisation simplified_data = simplified_data.sort_values(by=['date']) # reorder the data by date In [101]: hourly_data[0:5]
  • 12. Final Pre-Processing steps are to look for missing data to see if further pre-processing is needed. In [138]: #look for missing data simplified_data.isnull().sum() # no missing values in the reduced dataset Out[138]:
  • 13. date 0 tempC 0 precipMM 0 dtype: int64 In [104]: simplified_data.dtypes.value_counts()
  • 14. Out[104]: float64 1 int64 1 datetime64[ns] 1 dtype: int64 There's no Null's in the data, there's also no strings either, this means there's therefore no values in it such as "Not Available" or for example "No moonrise" in moonrise column, etc. Both of these are highly indicative that all values are present. The final Pre-processing step is to get Monthly averages to create a reduced size data set that can be easier visualised, but still accurate and indicative of the months rain and temperature. In [107]:
  • 15. monthly = simplified_data.groupby([pd.Grouper(key='date',freq='M')]) # http://stackoverflow.com/questions/32982012/grouping- dataframe-by-custom-date avg_month = monthly.mean() #create a new DF based on the mean of the groupby object created above print(avg_month[0:5]) tempC precipMM date 2008-07-31 16.419355 8.290323 2008-08-31 17.064516 7.906452 2008-09-30 15.833333 5.186667 2008-10-31 13.032258 5.477419 2008-11-30 10.766667 3.250000 Task 5: Analyse and summarise the cleaned dataset¶ Descriptive Statistics Initially of the Data Set containing all daily data: In [110]: print("nSimplified_data columnns:n" + str(simplified_data.columns) + "n") print("Simplified_data Descriptive Stats:n") print(simplified_data.describe())
  • 16. Simplified_data columnns: Index(['date', 'tempC', 'precipMM'], dtype='object') Simplified_data Descriptive Stats: tempC precipMM count 2740.000000 2740.000000 mean 12.941241 3.267628 std 4.517523 5.706110 min 2.000000 0.000000 25% 10.000000 0.100000 50% 13.000000 1.100000 75% 16.000000 3.800000 max 25.000000 52.400000 In [111]: print("Descriptive Stats:n") print(avg_month.describe())
  • 17. Descriptive Stats: tempC precipMM count 92.000000 92.000000 mean 12.742613 3.327383 std 3.926780 2.076563 min 4.000000 0.100000 25% 9.403226 1.858871 50% 12.768817 2.784516 75% 16.108333 4.209516 max 21.354839 12.500000 As can be seen from comparing both descriptive stats, the monthly average, seems to have removed outliers (e.g. max precipitation 52mm), has reduced the standard deviation, but the quartiles have remained largely the same. Matplotlib and Pandas Graphing In [112]: import matplotlib import matplotlib.pyplot as plt %matplotlib inline Line Graphs and Area Plot¶
  • 18. In [28]: plt.figure() avg_month.plot() plt.title("Avg Monthly Temperature and Precipitation in Dublin since July 2008n") plt.ylabel("Temperature C | Precipitation MM") plt.xlabel("Time") plt.show() <matplotlib.figure.Figure at 0x245e32cd9e8>
  • 19. In [35]: avg_month.plot.area(stacked=False) Out[35]: <matplotlib.axes._subplots.AxesSubplot at 0x245e5c3d5c0> Basic Line Graph and Area Plot show how temp and precip interact, as expected Temp increases and falls based on time of year. Precipitation doesn't seem to follow the same expected trend. It seems the Irish reputation for never ending rain is well deserved, although it would appear to have fallen in recent years.Stacked Histogram¶ Shows the distribution of the data.
  • 21. In [38]: avg_month.plot.scatter(x="tempC", y="precipMM", s=50 ) Out[38]: <matplotlib.axes._subplots.AxesSubplot at 0x245e6cb6cf8>
  • 23. In [39]: from pandas.tools.plotting import scatter_matrix scatter_matrix(avg_month, alpha=0.2, figsize=(6, 6), diagonal='kde') Out[39]: array([[<matplotlib.axes._subplots.AxesSubplot object at 0x00000245E6D16278>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E6DAC5F8>], [<matplotlib.axes._subplots.AxesSubplot object at 0x00000245E6DF5F98>,
  • 24. <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E6E30D30>]], dtype=object) In [40]: from pandas.tools.plotting import scatter_matrix scatter_matrix(daily_data, alpha=0.2, figsize=(6, 6), diagonal='kde')
  • 25. Out[40]: array([[<matplotlib.axes._subplots.AxesSubplot object at 0x00000245E6C462B0>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E7606278>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E764F470>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E76859E8>], [<matplotlib.axes._subplots.AxesSubplot object at 0x00000245E76CEB38>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E6D7B160>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E4FA96D8>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E4F505C0>], [<matplotlib.axes._subplots.AxesSubplot object at 0x00000245E77666A0>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E77B08D0>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E77ED470>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E78354A8>], [<matplotlib.axes._subplots.AxesSubplot object at 0x00000245E7875C50>, <matplotlib.axes._subplots.AxesSubplot object at
  • 26. 0x00000245E78C4278>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E78FDA90>, <matplotlib.axes._subplots.AxesSubplot object at 0x00000245E7949CC0>]], dtype=object)Dual Axis Line Graphs¶ In [54]: plt.figure() ax = avg_month.plot(secondary_y=['precipMM']) ax.set_ylabel("Temperature C") ax.right_ax.set_ylabel("Precipitation MM") plt.title("Avg Monthly Temperature and Precipitation in Dublin since July 2008n") plt.xlabel("Time") plt.show() <matplotlib.figure.Figure at 0x245e8cd2208>
  • 27. In [55]: avg_month.plot(subplots=True, figsize=(6, 6)); Final Manipulation, Exploration and Visualisation¶Temperature v Precipitation¶ For the purposes of this exploration, 2 new Data Frames were created, grouping by Temp and Precipitation and comparing to the mean value of the other, this data was then explored as outlined further below
  • 28. In [132]: #x="tempC", y="precipMM" avg_month_temp = avg_month.groupby("tempC") temp_data = avg_month_temp.mean() #create a new DF based on the mean of the groupby object created above print(temp_data[4:7]) precipMM tempC 4.000000 12.500000 5.333333 0.100000 6.322581 4.441935 6.392857 2.585714 6.903226 2.574194 In [135]:
  • 29. plt.figure() avg_month_temp.mean().plot()#secondary_y=['precipMM']) plt.title("Avg amount (mm) of Precipitation as Temperature increases (Dublin since July 2008)n") plt.xlabel("Temperature - C") plt.ylabel("Precipitation MM") plt.show() <matplotlib.figure.Figure at 0x245e039cf98> In [136]: #x="tempC", y="precipMM" avg_month_precip = avg_month.groupby("precipMM") precip_data = avg_month_precip.mean() #create a new DF based on the mean of the groupby object created above print(precip_data[0:1]) tempC precipMM 0.100000 5.333333
  • 30. 0.722581 11.387097 0.733333 19.166667 In [137]: plt.figure() avg_month_precip.mean().plot()#secondary_y=['precipMM']) plt.title("Avg Temperature as amount of Precipitation increases (Dublin since July 2008)n") plt.xlabel("Precipitation - MM") plt.ylabel("Temperature - C") plt.show() <matplotlib.figure.Figure at 0x245e0386b38> Tentative Conclusion¶ Further in-depth studies and tests could be carried out to test the statistical significant of the results, and to incorporate other meterological datasets. However, based on the current data, there does not seem to be a strong relationship between level of
  • 31. rain compared to temperature. So it doesn't really matter how hot it gets in Dublin, we can still expect rain! COMP47670 Assignment 1: Data Collection & Preparation Deadline: Monday 23rd March 2020 Overview: The objective of this assignment is to collect a dataset from one or more open web APIs of your choice, and use Python to preprocess and analyse the collected data. The assignment should be implemented as a single Jupyter Notebook (not a script). Your notebook should be clearly documented, using comments and Markdown cells to explain the code and results. Tasks: For this assignment you should complete the following tasks: 1. Data identification: · Choose at least one open web API as your data source (i.e. not a static or pre-collected dataset). If you decide to use more than one API, these APIs should be related in some way. 2. Data collection: · Collect data from your API(s) using Python. Depending on the API(s), you may need to repeat the collection process multiple times to download sufficient data. · Store the collected data in an appropriate file format for subsequent analysis (e.g. JSON, XML, CSV). 3. Data preparation and analysis: · Load and represent the data using an appropriate data structure (i.e. records/items as rows, described by features as columns). · Apply any preprocessing steps that might be required to clean or filter the data before analysis. Where more than one API is used, apply suitable data integration methods. · Analyse, characterise, and summarise the cleaned dataset, using tables and plots where appropriate. Clearly explain and interpret any analysis results which are produced.
  • 32. · Summarise any insights which you gained from your analysis of the data. Suggest ideas for further analysis which could be performed on the data in future. Guidelines: · The assignment should be completed individually. Any evidence of plagiarism will result in a 0 grade. · Submit your assignment via the COMP47670 Brightspace page. Your submission should be in the form of a single ZIP file containing the notebook (i.e. IPYNB file) and your data. If your data is too large to upload, please include a smaller sample of the data in the ZIP file. · In the notebook please clearly state your full name and your student number. Also provide links to the home pages for the API(s) which you used. · Hard deadline: Submit by the end of Monday 23rd March 2020 · 1-5 days late: 10% deduction from overall mark · 6-10 days late: 20% deduction from overall mark · No assignments accepted after 10 days without extenuating circumstances approval and/or medical certificate.