Weather is part of our everyday lives. Who doesn’t check the rain radar before heading out, or the weather forecast when planning a weekend away? But where does this data come from, and what is it made of? The answer is a mix of measurements, models and statistics, meaning that the use of weather and climate data can get complex very quickly.
These slides provide a brief overview of the science behind weather and climate forecasts and give you the tools to get started with weather data, even if you aren't a meteorologist. Learn how to connect weather data to other data sources, how to visualize weather and climate data in an interactive weather dashboard embedded in a Python notebook, and other ways you can use weather data for yourself, with examples using weather APIs, maps, PixieDust and machine learning.
15.
import scipy
import matplotlib
from pylab import *
from mpl_toolkits.basemap import Basemap, addcyclic, shiftgrid, maskoceans
# define the area to plot and projection to use
m = Basemap(llcrnrlon=-180, llcrnrlat=-60, urcrnrlon=180, urcrnrlat=80, projection='mill')
@MargrietGr
# convert the latitude, longitude and temperatures to raster coordinates to be plotted
t1 = temperature[0, :, :]
t1, lon = addcyclic(t1, lons)
january, longitude = shiftgrid(180., t1, lon, start=False)
x, y = np.meshgrid(longitude, lats)
px, py = m(x, y)
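The grid shift above can be illustrated without Basemap. The following is a minimal plain-Python sketch, not Basemap's implementation: it assumes the source longitudes run from 0 to 360 degrees east and reorders a single latitude row so that they run from -180 to 180. The function name shift_to_minus180_180 and the sample values are hypothetical, and the sketch does not add the wrap-around point that addcyclic provides.

```python
def shift_to_minus180_180(lons, row):
    """Reorder one latitude row so its longitudes run from -180 to 180."""
    # Express longitudes at or beyond 180 degrees east as negative (west) values.
    wrapped = [lon - 360.0 if lon >= 180.0 else lon for lon in lons]
    # Sort by longitude and carry each data value along with its longitude.
    pairs = sorted(zip(wrapped, row))
    new_lons = [lon for lon, _ in pairs]
    new_row = [value for _, value in pairs]
    return new_lons, new_row


# Hypothetical sample: four longitudes and one temperature per longitude.
lons = [0.0, 90.0, 180.0, 270.0]
row = [10.0, 20.0, 30.0, 40.0]
new_lons, new_row = shift_to_minus180_180(lons, row)
print(new_lons)  # [-180.0, -90.0, 0.0, 90.0]
print(new_row)   # [30.0, 40.0, 10.0, 20.0]
```

After the shift, the data lines up with a map whose left edge is at -180 degrees, which matches the llcrnrlon=-180 corner used for the projection.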
33. Analysis with scikit-learn
import pandas as pd
from sklearn import preprocessing
# scale every feature column to the range [0, 1]
min_max_scaler = preprocessing.MinMaxScaler()
np_scaled = min_max_scaler.fit_transform(X_collisions)
X_normalized = pd.DataFrame(np_scaled)
X_normalized.head()
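To show what min-max scaling does to a single feature column, here is a plain-Python sketch of the formula (x - min) / (max - min). The helper min_max_scale and the temperature values are hypothetical illustrations, not part of the original slides. MinMaxScaler applies the same idea to each column of the feature matrix.

```python
def min_max_scale(column):
    """Scale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical daily temperatures in degrees Celsius.
temperatures = [-5.0, 0.0, 15.0, 35.0]
scaled = min_max_scale(temperatures)
print(scaled)  # [0.0, 0.125, 0.5, 1.0]
```

Scaling this way keeps features measured in different units, such as temperature and precipitation, on a comparable range before they are fed into a model.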
34. Simple linear regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# hold out part of the data to test the model on unseen samples
X_train, X_test, y_train, y_test = train_test_split(X_normalized, y_collisions, random_state=42)
# normalize=False was the default; the parameter was removed in scikit-learn 1.2
lr = LinearRegression(fit_intercept=True)
lr.fit(X_train, y_train)
Training set score: 0.14
Test set score: 0.13
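For a regression model in scikit-learn, the score method returns the coefficient of determination, R squared, so the values above are read here as R squared (an assumption, since the slide does not show the call that produced them). The sketch below computes R squared by hand in plain Python. The function r_squared and the sample numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
score = r_squared(observed, predicted)
print(round(score, 2))  # 0.9
```

A score of 1 means the predictions explain all of the variation in the target. A score near 0.14 means the model explains only a small part of it.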
35. Random Forest
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators=5, random_state=2)
forest.fit(X_train, y_train)
Accuracy on training set: 0.948
Accuracy on test set: 0.041
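For a classifier, accuracy is the fraction of samples whose predicted label matches the true label exactly. A large gap between training and test accuracy, as in the numbers above, is a typical sign of overfitting. The sketch below shows the calculation in plain Python. The function accuracy and the sample labels are hypothetical, and treating the target as a collision count is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Hypothetical labels (for example, collision counts per location and day).
train_true = [0, 1, 2, 0, 3]
train_pred = [0, 1, 2, 0, 2]  # the model has largely memorised the training data
test_true = [1, 0, 4, 2, 5]
test_pred = [0, 0, 3, 1, 2]   # and misses most unseen samples

train_acc = accuracy(train_true, train_pred)
test_acc = accuracy(test_true, test_pred)
print(train_acc)  # 0.8
print(test_acc)   # 0.2
```

Because a near miss counts as wrong, exact-match accuracy is a harsh measure when the target has many possible values.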
37. Additional Data Needed
Speed limit
Number of lanes
Traffic volume
Population density
Road quality
Usage by trucks and buses
Pavement rating
Width
https://data.cityofnewyork.us/Transportation/Street-Pavement-Rating/2cav-chmn
For each collision, find the nearest road based on Latitude and Longitude
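One way to sketch the nearest-road lookup is with the haversine great-circle distance. The example below is a simplification under stated assumptions: each road is represented by a single reference point, and the road names and coordinates are hypothetical. A real road dataset describes line segments and would call for a spatial library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_road(collision_lat, collision_lon, roads):
    """Return the road whose reference point lies closest to the collision."""
    return min(
        roads,
        key=lambda road: haversine_km(collision_lat, collision_lon,
                                      road["lat"], road["lon"]),
    )


# Hypothetical road reference points and one hypothetical collision location.
roads = [
    {"name": "road_a", "lat": 40.7590, "lon": -73.9845},
    {"name": "road_b", "lat": 40.7128, "lon": -74.0060},
]
nearest = nearest_road(40.7580, -73.9855, roads)
print(nearest["name"])  # road_a
```

This linear scan checks every road for every collision. For a large collision dataset, a spatial index would be a more practical choice.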
38. Hope you will now consider weather and climate data in your data science projects!