Weather and Climate Data: Not Just for Meteorologists
Margriet Groenendijk
ODSC Europe
London – 13 October 2017
Weather Forecast
@MargrietGr
Weather Forecast in a Notebook
Jupyter notebooks
Data: The Weather Company API
Get free access here: https://ibm.com/cloud
Demo
But... where does weather data come from?
Observations and Models
Weather Data
http://www.metoffice.gov.uk/datapoint/
https://www.wunderground.com/weather/api/
https://business.weather.com/products/the-weather-company-data-packages
https://climexp.knmi.nl
http://www.ecmwf.int/en/forecasts/datasets
But… I want a map
netCDF binary files (*.nc)
from netCDF4 import Dataset, num2date
import numpy as np
# open the HadCRUT4 median temperature file and list its contents
cfile = 'assets/HadCRUT.4.5.0.0.median.nc'
dataset = Dataset(cfile)
print(dataset.data_model)
print(dataset.variables)
Data is here:
https://crudata.uea.ac.uk/cru/data/temperature/
NETCDF4
OrderedDict([(u'latitude', <type 'netCDF4._netCDF4.Variable'>
float32 latitude(latitude)
standard_name: latitude
long_name: latitude
point_spacing: even
units: degrees_north
axis: Y
unlimited dimensions:
units: days since 1850-1-1 00:00:00
calendar: gregorian
start_year: 1850
...
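The listing shows that the time axis is stored as numbers with the unit "days since 1850-1-1 00:00:00" and a gregorian calendar. num2date (imported above) turns these numbers into dates. As a library-free sketch of what that conversion does, assuming the Gregorian calendar (which agrees with numpy's proleptic Gregorian dates for anything after 1582), the offsets can simply be added to the origin. The helper name and the offsets below are made up for illustration.

```python
import numpy as np

def days_since_to_datetime64(values, origin="1850-01-01"):
    """Convert 'days since <origin>' offsets to numpy datetime64 values.

    Assumes a Gregorian calendar; other CF calendars (e.g. 360_day)
    need netCDF4.num2date or cftime instead.
    """
    # work in whole seconds so fractional days (e.g. 0.5) are kept
    seconds = np.round(np.asarray(values, dtype=float) * 86400).astype("int64")
    return np.datetime64(origin, "s") + seconds.astype("timedelta64[s]")

# hypothetical offsets: the origin, 31 days later, and half a day later
dates = days_since_to_datetime64([0, 31, 0.5])
print(dates)
```

For real files, num2date remains the safer choice because it reads the calendar attribute and handles non-Gregorian model calendars.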
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm, colors
from mpl_toolkits.basemap import Basemap, addcyclic, shiftgrid
# read the coordinates and gridded temperatures from the dataset opened above
# ('temperature_anomaly' is the HadCRUT4 variable name; check dataset.variables)
lats = dataset.variables['latitude'][:]
lons = dataset.variables['longitude'][:]
temperature = dataset.variables['temperature_anomaly'][:]
# define the area to plot and the projection to use (Miller cylindrical)
m = Basemap(llcrnrlon=-180, llcrnrlat=-60, urcrnrlon=180, urcrnrlat=80,
            projection='mill')
# convert the latitude, longitude and temperatures to
# map coordinates to be plotted
t1 = temperature[0, :, :]        # first time step in the file
t1, lon = addcyclic(t1, lons)    # repeat the first longitude at lon + 360
january, longitude = shiftgrid(180., t1, lon, start=False)
x, y = np.meshgrid(longitude, lats)
px, py = m(x, y)
plt.rcParams['font.size'] = 12
plt.rcParams['figure.figsize'] = [8.0, 6.0]
plt.figure()
palette = cm.RdYlBu_r
rmin, rmax = -30., 30.           # limits of the colour scale
pal_norm = colors.Normalize(vmin=rmin, vmax=rmax, clip=False)
m.drawcoastlines(linewidth=0.5)
m.drawmapboundary(fill_color=(1.0, 1.0, 1.0))
# pass the norm so the colour scale really spans rmin..rmax
cf = m.pcolormesh(px, py, january, cmap=palette, norm=pal_norm)
cbar = plt.colorbar(cf, orientation='horizontal', shrink=0.95)
cbar.set_label('Mean Temperature in January')
plt.tight_layout()
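Why add a cyclic point before plotting? On a global grid the last longitude column does not join up with the first, which leaves a gap at the dateline. The numpy sketch below illustrates the idea behind Basemap's addcyclic (it is not Basemap's own code): the first column is repeated at lon + 360, and np.meshgrid then builds the 2-D coordinate arrays that the projection converts to map coordinates. The function name and the tiny grid are hypothetical.

```python
import numpy as np

def add_cyclic_column(field, lons):
    """Append the first longitude column of a (lat, lon) grid at lon + 360."""
    field = np.asarray(field)
    lons = np.asarray(lons)
    cyclic_field = np.concatenate([field, field[:, :1]], axis=1)
    cyclic_lons = np.append(lons, lons[0] + 360.0)
    return cyclic_field, cyclic_lons

# hypothetical 2 x 4 grid with 90-degree longitude spacing
lats = np.array([-45.0, 45.0])
lons = np.array([-135.0, -45.0, 45.0, 135.0])
field = np.arange(8.0).reshape(2, 4)

cyc_field, cyc_lons = add_cyclic_column(field, lons)
# one (x, y) pair per grid cell, as in the map code above
x, y = np.meshgrid(cyc_lons, lats)
print(x.shape)  # (2, 5)
```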
This was all about measurements.
What about forecasts and predictions?
Climate Models
[Figure: Climate Model Experiments. Global temperature anomalies 1850-2100, with panels for all parameters; atmospheric parameters; carbon cycle parameters; sulphur cycle and oceanic parameters]
What can I use weather data for?
Historic Weather Data → Build model → Predict with Weather Forecast Data
Example: Traffic Collisions and Weather
(Photo: www.pexels.com/photo/blur-cars-dew-drops-125510/)
NYPD Traffic Collisions
1,119,577 collisions since July 2012
https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95
Weather
Historic Weather
https://business.weather.com/products/weather-data-packages
Explore Data with PixieDust
Load into Pandas DataFrame
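Loading the collisions into a pandas DataFrame mostly means turning the date and time fields into a single timestamp, which the later merge and hourly grouping rely on. A minimal sketch, assuming separate DATE and TIME columns in month/day/year and hour:minute form; the column names and rows below are hypothetical, so check them against the header of the real file.

```python
import io
import pandas as pd

# a few made-up rows in the assumed shape of the collisions CSV
csv_text = """DATE,TIME,BOROUGH,NUMBER OF PERSONS INJURED
07/01/2012,0:05,BROOKLYN,1
07/01/2012,0:40,QUEENS,0
07/01/2012,1:15,MANHATTAN,2
"""

collisions = pd.read_csv(io.StringIO(csv_text))
# combine the separate date and time strings into one timestamp column
collisions["Date"] = pd.to_datetime(
    collisions["DATE"] + " " + collisions["TIME"], format="%m/%d/%Y %H:%M"
)
print(collisions[["Date", "BOROUGH"]].head())
```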
Explore Weather Data with PixieDust
Hypothesis
The number of traffic collisions is influenced by the weather
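Combine collisions and weather data
# match each collision with the most recent weather observation at most
# one hour earlier (both frames must be sorted on their join keys)
newdf = pd.merge_asof(collisions.sort_values(by='Date'),
                      weather.sort_values(by='date'),
                      left_on='Date', right_on='date',
                      tolerance=pd.Timedelta('1h'))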
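merge_asof pairs each collision with the last weather row whose date is not later than the collision time (the default direction='backward'), and the tolerance drops matches more than an hour old. A toy version with made-up timestamps and a hypothetical temp column shows the effect, including a collision with no observation in the preceding hour:

```python
import pandas as pd

# hypothetical collision times and hourly weather observations
collisions = pd.DataFrame({
    "Date": pd.to_datetime(["2012-07-01 00:05", "2012-07-01 01:40",
                            "2012-07-01 05:30"]),
    "injured": [1, 0, 2],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2012-07-01 00:00", "2012-07-01 01:00",
                            "2012-07-01 02:00"]),
    "temp": [22.0, 21.5, 21.0],
})

newdf = pd.merge_asof(
    collisions.sort_values(by="Date"),
    weather.sort_values(by="date"),
    left_on="Date", right_on="date",
    tolerance=pd.Timedelta("1h"),  # no match if the last observation is older than 1 hour
)
print(newdf)
```

The 05:30 collision gets NaN weather values because the latest observation (02:00) is outside the one-hour tolerance; such rows may need to be dropped or filled before modelling.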
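Combine collisions and weather data
# aggregate the collisions per hour (year, month, day, hour);
# numeric_only=True skips text columns such as the borough
times = pd.DatetimeIndex(collisions.Date)
collisions1 = collisions.groupby(
    [times.year, times.month, times.day, times.hour]).sum(numeric_only=True)
collisions1['Hour'] = collisions1.index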
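The grouping keys above come from the timestamp itself, so every row falls into one calendar hour. A small sketch on made-up rows, with a hypothetical injured column, shows both the number of collisions and the summed injuries per hour; the keys are renamed only so the result has readable level names.

```python
import pandas as pd

# hypothetical collisions with a timestamp and an injury count
collisions = pd.DataFrame({
    "Date": pd.to_datetime(["2012-07-01 00:05", "2012-07-01 00:40",
                            "2012-07-01 01:15"]),
    "injured": [1, 0, 2],
})

times = pd.DatetimeIndex(collisions.Date)
keys = [times.year.rename("year"), times.month.rename("month"),
        times.day.rename("day"), times.hour.rename("hour")]
# number of collisions and total injuries in each hour
hourly = collisions.groupby(keys)["injured"].agg(["count", "sum"])
print(hourly)
```

Grouping on collisions.Date.dt.floor("h") instead would give a single timestamp key per hour, which may be easier to join with hourly weather observations.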
Combine collisions and weather data
Analysis with scikit-learn
from sklearn import preprocessing
import pandas as pd
# rescale every feature to the range [0, 1]
min_max_scaler = preprocessing.MinMaxScaler()
np_scaled = min_max_scaler.fit_transform(X_collisions)
X_normalized = pd.DataFrame(np_scaled)
X_normalized.head()
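Min-max scaling maps each column to (x - min) / (max - min), so the smallest value becomes 0 and the largest 1. A quick check on a made-up two-column matrix (say temperature and precipitation) that MinMaxScaler gives the same result as the formula:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# hypothetical features: temperature (C) and precipitation (mm)
X = np.array([[10.0, 0.0],
              [20.0, 5.0],
              [30.0, 10.0]])

scaled = MinMaxScaler().fit_transform(X)
# the same transform by hand, column by column
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(scaled)
```

Because the minimum and maximum are learned from the data passed to fit, a scaler fitted on all rows before train_test_split also sees the test rows; fitting on the training split only avoids that leak.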
Simple linear regression
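from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# hold out part of the rows (25% by default) for testing
X_train, X_test, y_train, y_test = train_test_split(
    X_normalized, y_collisions, random_state=42)
# the normalize argument was removed in scikit-learn 1.2;
# the features are already scaled above
lr = LinearRegression(fit_intercept=True)
lr.fit(X_train, y_train)
print('Training set score: {:.2f}'.format(lr.score(X_train, y_train)))
print('Test set score: {:.2f}'.format(lr.score(X_test, y_test)))
Training set score: 0.14
Test set score: 0.13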
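For a regressor, score() returns the coefficient of determination R² = 1 - SS_res / SS_tot, so a score of about 0.13 means the model explains roughly 13% of the variance in the test data. The sketch below checks that identity on synthetic data; the features and target are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))            # hypothetical scaled features
y = 2.0 * X[:, 0] + rng.normal(0, 1, size=200)  # hypothetical noisy target

lr = LinearRegression().fit(X, y)
r2 = lr.score(X, y)
# R^2 by hand: 1 - residual sum of squares / total sum of squares
pred = lr.predict(X)
r2_manual = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3), round(r2_manual, 3))
```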
Random Forest
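from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators=5, random_state=2)
forest.fit(X_train, y_train)
print('Accuracy on training set: {:.3f}'.format(forest.score(X_train, y_train)))
print('Accuracy on test set: {:.3f}'.format(forest.score(X_test, y_test)))
Accuracy on training set: 0.948
Accuracy on test set: 0.041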
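For a classifier, score() is the mean accuracy, i.e. the fraction of rows whose collision count is predicted exactly, so it is a much stricter measure than R². The large gap between training and test accuracy suggests the forest memorises the training rows. The sketch below reproduces that pattern on synthetic data where the (hypothetical) counts are unrelated to the features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 3))   # hypothetical features
y = rng.integers(0, 10, size=300)      # hypothetical counts, unrelated to X

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
forest = RandomForestClassifier(n_estimators=5, random_state=2).fit(X_train, y_train)

train_acc = forest.score(X_train, y_train)
test_acc = forest.score(X_test, y_test)
# the same accuracy computed as the share of exact matches
exact_match = np.mean(forest.predict(X_test) == y_test)
print(f"train {train_acc:.3f}, test {test_acc:.3f}")
```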
Tree feature importances
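A fitted forest exposes impurity-based importances in feature_importances_, one value per input column, summing to 1. Pairing them with the column names gives a ranking like the one plotted on this slide. The weather feature names and the target below are hypothetical, with the target built to depend on precipitation only:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
# hypothetical weather features; names are illustrative only
features = pd.DataFrame({
    "temperature": rng.uniform(-5, 30, 400),
    "precipitation": rng.uniform(0, 10, 400),
    "wind_speed": rng.uniform(0, 20, 400),
})
# hypothetical target that depends only on precipitation
y = (features["precipitation"] > 5).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(features, y)
importances = pd.Series(forest.feature_importances_, index=features.columns)
print(importances.sort_values(ascending=False))
```

Impurity-based importances can favour features with many distinct values; sklearn.inspection.permutation_importance on held-out data is a common cross-check.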
Additional Data Needed
Speed limit
Number of lanes
Traffic volume
Population density
Road quality
Usage by trucks and buses
Pavement rating
Width
https://data.cityofnewyork.us/Transportation/Street-Pavement-Rating/2cav-chmn
For each collision, find the nearest road based on latitude and longitude
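One way to find the nearest road, sketched here under the assumption that each road is represented by a single reference point (for example a segment midpoint), is a BallTree with the haversine metric, which expects [latitude, longitude] in radians and returns great-circle distances in radians. All coordinates below are made up; real pavement-rating data has line geometries, so distances to points only approximate distances to the roads.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0

# hypothetical road reference points as (lat, lon) in degrees
roads = np.array([[40.7128, -74.0060],
                  [40.6782, -73.9442],
                  [40.7282, -73.7949]])
# hypothetical collision locations
collision_points = np.array([[40.7130, -74.0055],
                             [40.7300, -73.8000]])

# the haversine metric works on [lat, lon] in radians
tree = BallTree(np.radians(roads), metric="haversine")
dist, idx = tree.query(np.radians(collision_points), k=1)
nearest_road = idx[:, 0]
dist_km = dist[:, 0] * EARTH_RADIUS_KM  # radians to kilometres
print(nearest_road, dist_km.round(3))
```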
Hope you will now consider weather and climate data in your data science projects!
Historic Weather Data → Build model → Predict with Weather Forecast Data
Thank you!
Margriet Groenendijk, PhD
Data Scientist
Developer Advocate
mgroenen@uk.ibm.com
@MargrietGr
Slides
https://www.slideshare.net/MargrietGroenendijk/presentations
Blog
https://medium.com/ibm-watson-data-lab
IBM Data Science Experience
https://datascience.ibm.com
PixieDust
https://ibm-watson-data-lab.github.io/pixiedust/
Notebooks
https://github.com/ibm-watson-data-lab/python-notebooks
IBM Cloud
https://ibm.com/cloud