N-D labeled arrays and datasets in Python
Watch the talk on YouTube:
https://www.youtube.com/watch?v=X0pAhJgySxk
For more info:
http://xray.readthedocs.org
http://github.com/xray/xray
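The core idea behind xray, selecting data by coordinate label rather than integer position, can be sketched in plain Python. This is an illustrative toy only, not the xray/xarray API; the class and method names here are made up for the sketch.

```python
# A toy illustration of label-based indexing, the idea that xray
# (now xarray) generalizes to N dimensions and full datasets.
class LabeledArray:
    def __init__(self, data, labels):
        # data: flat list of values; labels: one coordinate label per value
        self.data = list(data)
        self.index = {label: i for i, label in enumerate(labels)}

    def sel(self, label):
        # Look up a value by its coordinate label instead of its position.
        return self.data[self.index[label]]

temps = LabeledArray([11.2, 13.5, 9.8],
                     ["2024-01-01", "2024-01-02", "2024-01-03"])
print(temps.sel("2024-01-02"))  # select by date label, not by index 1
```

The real library extends this to multiple named dimensions, alignment, and NumPy-style arithmetic.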
This document discusses algorithm analysis and complexity, including:
1) Common sorting algorithms have quadratic time complexity O(N^2) while searching algorithms can have exponential complexity like O(2^N).
2) It provides an example of determining the complexity of two functions, showing one is O(N) while the other is O(N^N).
3) Vectors in Lisp can be used to initialize a data structure of a given length by applying a function to each index, and operations on vectors have constant time complexity.
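The vector-initialization pattern in point 3 has a direct analogue in Python, shown here as a sketch in place of Lisp:

```python
# Initialize a sequence of length n by applying a function to each index,
# analogous to building a Lisp vector with a per-index initializer.
def init_vector(n, fn):
    return [fn(i) for i in range(n)]

squares = init_vector(6, lambda i: i * i)
print(squares)     # [0, 1, 4, 9, 16, 25]
print(squares[3])  # indexed access is constant time, as with Lisp vectors
```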
This document provides an R tutorial for an undergraduate climate workshop. It introduces key concepts in R including data types, arrays, matrices, data frames, packages, and basic plotting. It demonstrates how to perform calculations, subset data, install and load packages, create different plot types like histograms and maps, and use functions like quantile and quilt.plot. Exercises include drawing a histogram of ozone values and calculating quantiles.
The document contains examples of algebraic expressions that can be factorized. Specifically, it lists 14 different expressions involving variables like x, y, z, a, b, c, m, n, and coefficients. The expressions include differences of terms, sums of terms, products of terms with common factors that can be pulled out, and expressions within parentheses that can be distributed and combined.
The document outlines the functional specifications for a creditor aging report. It takes input parameters such as vendor number and date range, and generates a report with 10 columns showing amounts owed in different aging periods: current, 1-30 days, 31-60 days, 61-90 days, 91-180 days, 181-365 days, 1-2 years, 2-3 years, 3-5 years, and over 5 years. The logic places each amount owed in the appropriate column based on the number of days between the invoice date and the current date.
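The bucketing logic described above can be sketched as follows. This is a hedged illustration, not the actual report specification: the function name, bucket table, and boundary convention (an invoice is "current" at 0 days) are assumptions.

```python
# Place an invoice amount into one of ten aging buckets based on the
# number of days between the invoice date and the report date.
from datetime import date

BUCKETS = [
    (0, "current"), (30, "1-30 days"), (60, "31-60 days"),
    (90, "61-90 days"), (180, "91-180 days"), (365, "181-365 days"),
    (730, "1-2 years"), (1095, "2-3 years"), (1825, "3-5 years"),
]

def aging_bucket(invoice_date, as_of):
    days = (as_of - invoice_date).days
    for upper, label in BUCKETS:
        if days <= upper:
            return label
    return "over 5 years"

print(aging_bucket(date(2024, 1, 10), date(2024, 2, 20)))  # 41 days -> "31-60 days"
```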
This document discusses calculating the number of digits needed to write out large numbers. It finds that 11 digits are needed for 43,000,000,000. It also calculates that 95 digits are needed for 9^99, 369,693,100 digits for 9^(9^9), 78 digits for (9^9)^9, and 18 digits for 99^9. It concludes by showing the relative sizes of these numbers.
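These digit counts can be checked directly. For numbers small enough to construct, count the digits of the integer; for 9^(9^9), which is far too large to build, use the identity digits(b^e) = floor(e * log10(b)) + 1.

```python
# Verify the digit counts: construct the smaller numbers outright,
# and use logarithms for the enormous tower 9^(9^9).
import math

def digits_of_power(base, exponent):
    # digits(base^exponent) = floor(exponent * log10(base)) + 1
    return math.floor(exponent * math.log10(base)) + 1

print(len(str(43_000_000_000)))  # 11
print(len(str(9**99)))           # 95
print(len(str((9**9)**9)))       # 78  (this is 9^81)
print(len(str(99**9)))           # 18
print(digits_of_power(9, 9**9))  # 369693100 digits for 9^(9^9)
```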
A player is running from first to second base at 25 ft/s when they are 10 ft from second base. The distance from home plate to each base is 60 ft. To find the rate of change of the player's distance from home plate, the Pythagorean theorem is used to relate the distances. Plugging in the known values of 50 ft from first base and 10 ft from second base yields a rate of change of the player's distance from home plate of 1250/√6100 ≈ 16 ft/s.
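A quick numeric check of the related-rates setup: with the right angle at first base, s² = 60² + x², where x is the distance run past first base, so differentiating gives ds/dt = x·(dx/dt)/s. Note the result must be less than the runner's own 25 ft/s here.

```python
# Related-rates check: s^2 = 60^2 + x^2, so ds/dt = x * (dx/dt) / s.
import math

base_path = 60.0  # ft between consecutive bases
x = 50.0          # ft past first base (10 ft short of second)
dx_dt = 25.0      # runner's speed along the base path, ft/s

s = math.hypot(base_path, x)  # current distance from home plate
ds_dt = x * dx_dt / s         # rate of change of that distance
print(round(ds_dt, 1))        # ~16.0 ft/s
```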
R is a programming language and software environment for statistical analysis and graphics. It allows users to analyze data, create visualizations, and perform statistical tests. Common R commands include functions to get and set the working directory, list objects in the workspace, remove objects, view and set options, save and load the command history, and save and load the entire workspace. R supports various data structures like vectors, arrays, matrices, data frames, and lists to store and manipulate different types of data. Data can be input into R from files, databases, and Excel spreadsheets. Graphs and visualizations created in R can be exported to file formats like PNG, JPEG, PDF and others.
1) The document outlines assignments due on Tuesday May 11th including an odds math worksheet and turning in math CDs. It also provides a warm up with probability and geometry problems.
2) The lesson discusses simplifying square roots using the product property of square roots. It gives examples of simplifying square-root expressions including √20, √24, √27, √125, √48, √216, √210, and √1000.
3) Additional context is provided about converting between square feet and square inches using multiplication.
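The product-property simplification in point 2 can be sketched programmatically: pull the largest perfect-square factor out of the radicand, so that √n = k·√m with m square-free. The function name and return convention are illustrative, not from the lesson.

```python
# Simplify sqrt(n) by extracting the largest perfect-square factor:
# returns (k, m) such that sqrt(n) == k * sqrt(m) and m is square-free.
def simplify_sqrt(n):
    k, m = 1, n
    f = 2
    while f * f <= m:
        while m % (f * f) == 0:
            m //= f * f
            k *= f
        f += 1
    return k, m

print(simplify_sqrt(20))    # (2, 5):  sqrt(20)  = 2*sqrt(5)
print(simplify_sqrt(216))   # (6, 6):  sqrt(216) = 6*sqrt(6)
print(simplify_sqrt(1000))  # (10, 10): sqrt(1000) = 10*sqrt(10)
```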
This document describes the process of distributing reciprocal space grid points (G-vectors) across multiple processors for parallel computation in density functional theory (DFT) calculations. It involves 4 steps:
1) Initializing FFT descriptors and allocating data across processors
2) Mapping G-vectors to processors by iterating through grid points and checking if the vector satisfies cutoff criteria
3) Counting the number of G-vectors assigned to each processor
4) Sorting and distributing the G-vectors to optimize load balancing across processors
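The steps above can be sketched in simplified form. This is a pure-Python toy under stated assumptions (a small grid, a hypothetical cutoff, and round-robin assignment for load balancing), not the actual DFT code:

```python
# Simplified sketch: generate reciprocal-space grid points, keep those
# inside a kinetic-energy cutoff, then distribute them across processors.
from itertools import product

def gvectors_within_cutoff(nmax, gcut2):
    # Step 2: iterate grid points, keep vectors with |G|^2 <= cutoff.
    return [g for g in product(range(-nmax, nmax + 1), repeat=3)
            if g[0]**2 + g[1]**2 + g[2]**2 <= gcut2]

def distribute(gvecs, nproc):
    # Step 4: sort by |G|^2 and deal out round-robin so each processor
    # gets a similar mix of short and long vectors (load balancing).
    ordered = sorted(gvecs, key=lambda g: g[0]**2 + g[1]**2 + g[2]**2)
    return [ordered[p::nproc] for p in range(nproc)]

gvecs = gvectors_within_cutoff(nmax=4, gcut2=16)
buckets = distribute(gvecs, nproc=4)
print([len(b) for b in buckets])  # step 3: per-processor G-vector counts
```

Round-robin dealing guarantees the per-processor counts differ by at most one.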
This document contains 3 sections describing quadratic functions f(x) with their vertex, y-intercept, zeros, domain, and range. The first function is f(x) = 10 - 3x - x^2, the second is f(x) = 2x^2 - 12x, and the third is f(x) = (2 - x)(5 + x). For each function, the document lists the relevant properties to be determined but does not show the calculations or results.
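Since the document lists the properties without working them out, here is the calculation for the first function, f(x) = 10 - 3x - x², i.e. a = -1, b = -3, c = 10; the other two work the same way.

```python
# Vertex, y-intercept, and zeros of f(x) = 10 - 3x - x^2.
import math

a, b, c = -1.0, -3.0, 10.0
f = lambda x: a * x * x + b * x + c

vx = -b / (2 * a)                  # vertex x-coordinate: -b/(2a)
vy = f(vx)                         # vertex y-coordinate
disc = b * b - 4 * a * c           # discriminant
zeros = sorted([(-b - math.sqrt(disc)) / (2 * a),
                (-b + math.sqrt(disc)) / (2 * a)])

print((vx, vy))  # vertex (-1.5, 12.25); since a < 0 the range is y <= 12.25
print(c)         # y-intercept 10.0
print(zeros)     # zeros [-5.0, 2.0]
```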
The trapezoidal rule approximates the area under a curve by dividing it into trapezoids: each trapezoid's area is the average of the function values at the endpoints of its sub-interval multiplied by the sub-interval width, and the total is the sum of these areas. An example calculates the area under y=1+x^3 from 0 to 1 using n=4 sub-intervals, giving an approximate result of 1.265625. The document also provides an example of using the trapezoidal rule with n=8 sub-intervals to estimate the area under the curve y=x from 0 to 3.
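The rule as described is a few lines of code; note that for n=4 on y=1+x³ over [0, 1] it evaluates exactly to 1.265625, and that it is exact for the straight line y=x.

```python
# Composite trapezoidal rule: average the endpoint values of each
# sub-interval and multiply by the sub-interval width h.
def trapezoid(f, a, b, n):
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))      # half-weight on the two endpoints
    for i in range(1, n):
        total += f(a + i * h)        # full weight on interior points
    return total * h

print(trapezoid(lambda x: 1 + x**3, 0.0, 1.0, 4))  # 1.265625 (exact: 1.25)
print(trapezoid(lambda x: x, 0.0, 3.0, 8))         # 4.5 (exact for a line)
```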
The document contains an assignment list, reminders for SAT testing, and math problems and examples for Lesson 74. It provides instructions to complete Set 74 of even problems by Monday and mentions the possibility of extra credit on Test #10. Students are reminded to bring a calculator and SSR book for upcoming SAT testing. The math problems cover probability, geometry, simplifying fractions and percentages. Examples show how to simplify square roots using properties.
This document discusses calculating the limit of the composition of two functions f and g as x approaches a value. For part (a), as x approaches 0 from the positive side, g(x) approaches 2 from the positive side and f(x) approaches negative infinity as x approaches 2 from the positive side. For part (b), as x approaches 5 from the negative side, g(x) approaches positive infinity and as x approaches positive infinity, f(x) approaches 2.
The document presents two calculations of the volume of a solid of revolution:
1) Using the disc method, the volume of the solid bounded by the parabola x + y = 3 and the x-axis is 9π.
2) Using the shell method, the volume of the solid bounded by the same parabola and the planes y = 0 and y = 3 is also 9π.
Scientific notation writes very large or very small numbers in a normalized way as the product of a number between 1 and 10 and a power of 10. It is convenient for calculators and widely used by scientists, mathematicians, and engineers. Some examples of numbers in scientific notation and their standard decimal forms are provided, and conversions in both directions are demonstrated.
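In Python the standard library handles both directions of the conversion; the sample numbers below are illustrative, not taken from the document.

```python
# Standard decimal -> scientific notation via string formatting,
# and scientific notation -> float via float() parsing.
n = 43_000_000_000
print(f"{n:.1e}")        # '4.3e+10'  (4.3 x 10^10)
print(f"{0.00056:.1e}")  # '5.6e-04'  (5.6 x 10^-4)
print(float("4.3e10"))   # 43000000000.0, back to standard form
```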
This lecture discusses Python's datetime and os modules. It shows how to use datetime to work with dates, times, and datetimes, including creating, formatting, and manipulating them. It also demonstrates various os module functions for working with directories and files, such as getting/changing the current directory, listing contents, creating/removing directories, renaming files, checking file attributes, and more.
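A minimal tour of both modules, covering the kinds of operations the lecture lists; the specific dates and file names are illustrative, and the os calls run inside a throwaway temporary directory to stay self-contained.

```python
import datetime
import os
import tempfile

# datetime: create, format, and manipulate dates and times
d = datetime.datetime(2024, 5, 12, 13, 0)
print(d.strftime("%Y-%m-%d %H:%M"))             # formatting -> 2024-05-12 13:00
print((d + datetime.timedelta(days=7)).date())  # date arithmetic -> 2024-05-19

# os: directories and files
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))          # create a directory
    print(os.listdir(tmp))                      # list contents -> ['sub']
    src = os.path.join(tmp, "a.txt")
    open(src, "w").close()
    os.rename(src, os.path.join(tmp, "b.txt"))  # rename a file
    print(os.path.exists(os.path.join(tmp, "b.txt")))  # check it exists
```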
This document contains a log-log plot with x and y values listed. It also contains instructions to (a) plot the points on log-log axes with a uniform scale, draw a line of best fit, (b) use the y-intercept to determine p, use the gradient to determine q, and use the line equation to determine y.
This document contains several algebra problems involving factoring and expanding expressions. It provides the steps to solve equations like (x+3)(x+7) = x^2 + 10x + 21 and determines coefficients of expressions like (x+3)(x-5) = x^2 - 2x - 15. It also covers topics like simplifying radicals, finding coefficients from expanded forms, and using formulas for factoring cubic expressions.
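The two quoted expansions can be verified by multiplying coefficient lists, where the list index is the power of x:

```python
# Multiply two polynomials given as coefficient lists [constant, x, x^2, ...].
def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

print(poly_mul([3, 1], [7, 1]))   # [21, 10, 1]  -> x^2 + 10x + 21
print(poly_mul([3, 1], [-5, 1]))  # [-15, -2, 1] -> x^2 - 2x - 15
```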
This document contains 8 math problems of varying difficulty including multiplication, addition, subtraction and division. The problems range from single digit operations like 6x9+8 to longer multi-digit problems such as 967-878 and 9876+1899. Solving all the problems completely would require showing the step-by-step work and calculations.
This document discusses convolutional neural networks (CNNs) for classifying images from the MNIST dataset. It begins with an introduction to MNIST, then describes the basic architecture of a CNN including convolution, ReLU, pooling, affine transformations, and softmax layers. It also discusses implementing CNNs in Python using frameworks like TensorFlow and Keras. The goal is to classify MNIST images with a CNN running on a GPU to take advantage of deep learning.
Ejercicios de ecuaciones diferenciales (Differential Equations Exercises), by Jimena Perez
1. This document lists 20 differential equations problems assigned as homework for a class.
2. The problems cover a range of differential equation types including separable, exact, linear, and higher order equations.
3. The document provides the list of problems but no further context or explanations for solving the equations.
New Capabilities in the PyData Ecosystem, by Turi, Inc.
This document summarizes new capabilities in the PyData ecosystem of tools for scientific computing and data science in Python. It focuses on Bokeh and Dask, which enable interactive visualization and out-of-core parallel computing respectively. Bokeh allows creating interactive web-based visualizations without writing JavaScript, while Dask enables parallel computing on large datasets that exceed memory using task scheduling. The document also briefly mentions related tools like Blaze, NumPy, Pandas, Jupyter notebooks, and conda for package and environment management.
Numba is a just-in-time compiler for Python that can optimize numerical code to achieve speeds comparable to C/C++ without requiring the user to write C/C++ code. It works by compiling Python functions to optimized machine code using type information. Numba supports NumPy arrays and common mathematical functions. It can automatically optimize loops and compile functions for CPU or GPU execution. Numba allows users to write high-performance numerical code in Python without sacrificing readability or development speed.
This document discusses tools for making NumPy and Pandas code faster and able to run in parallel. It introduces the Dask library, which allows users to work with large datasets in a familiar Pandas/NumPy style through parallel computing. Dask implements parallel DataFrames, Arrays, and other collections that mimic their Pandas/NumPy counterparts. It can scale computations across multiple cores on a single machine or across many machines in a cluster. The document provides examples of using Dask to analyze large CSV and text data in parallel through DataFrames and Bags. It also discusses scaling computations from a single laptop to large clusters.
Numba: Flexible analytics written in Python with machine-code speeds and avo..., by PyData
Numba provides a way to write high-performance Python code using NumPy-like syntax. It works by compiling Python code that uses NumPy arrays and loops into fast machine code via the LLVM compiler. This allows code written in Python to achieve performance comparable to C/C++ with few or no code changes. Numba supports CPU and GPU execution via backends such as CUDA. It can further improve the performance of numerical Python code with features like releasing the global interpreter lock while compiled functions execute.
This document describes the evolution of IPython and Jupyter, from their beginnings as an interactive Python shell to becoming a multi-language platform for interactive computing and document publishing. It explains how Jupyter's generic REPL protocol allows code to be executed in multiple languages, and how tools such as JupyterHub, nbviewer, and notebooks have driven its adoption in education, research, and scientific communication.
A variety of algorithms may be applied depending on the nature of the earth science exploration. Some algorithms may perform significantly better than others for particular objectives. For example, convolutional neural networks (CNN) are good at interpreting images, while artificial neural networks (ANN) perform well in soil classification[4] but are more computationally expensive to train than support-vector machines (SVM). The application of machine learning has become popular in recent decades, as the development of other technologies such as unmanned aerial vehicles (UAVs),[5] ultra-high-resolution remote sensing, and high-performance computing units[6] has led to the availability of large, high-quality datasets and more advanced algorithms.
Apache Cassandra for Timeseries- and Graph-Data, by Guido Schmutz
Apache Cassandra has proven to be one of the best solutions for storing and retrieving data at high velocity and high volume.
In the first part of the talk we will discuss how the storage model of Cassandra is ideal for time series use cases, which are often of high velocity and high volume. Time series data is everywhere today: Internet of Things, sensor data, transactional data, social media streams. We go over examples of how to best build data models.
We will also cover pairing Apache Spark with Apache Cassandra to create a real time data analytics platform.
The second part of the talk will present Titan:db, an open source distributed graph database built on top of Cassandra that can power real-time applications with thousands of concurrent users over graphs with billions of edges. It exposes a property graph data model directly atop Cassandra, which makes storing and querying relationship data fast, easy, and scalable to huge graphs. This talk demonstrates how Titan's features enable complex, multi-relational databases in Cassandra and discusses how Titan:db has been used in a customer case to store social network data.
The document summarizes a student's course project presentation on creating a calendar program. The program displays a calendar for any given year, showing month names, days of the week, and dates. It allows the user to input the year to display. The presentation describes the motivation, implementation requirements, algorithms, demonstration of code, inputs, outputs, future improvements, references, and concludes with a thank you.
Paul Dix [InfluxData]: The Journey of InfluxDB | InfluxDays 2022, by InfluxData
The document summarizes the evolution of InfluxDB from its initial release in 2013 to the current IOx engine. It started as a time series database that stored time series data and associated metadata. Over time it incorporated features like tags, line protocol, the TSM storage engine, and an inverted index to improve querying capabilities. Version 2.0 refocused it as an all-in-one platform with a new query language called Flux, and aims to be cloud-first. The latest engine, IOx, leverages a columnar database and a federated architecture to solve challenges of scale, providing SQL support and the ability to deploy in cloud or edge environments.
Time series analysis in R allows one to analyze how a variable changes over time. The ts() function is used to create time series objects by specifying the data vector, start and end dates, and frequency. Common applications include sales analysis, inventory analysis, and analyzing trends in variables like COVID-19 cases over time. Multivariate time series can also be created to analyze multiple related time series in a single plot.
As the leap second approaches, there is no better time to reflect on our misconceptions about time and numerals, past catastrophes and possible mitigation techniques.
Computer Science Presentation for various MATLAB toolboxes, by ThinHunh47
The document describes modeling a satellite constellation using ephemeris data. It allows defining a constellation mission with start/end dates and duration. Ephemeris data for satellite position and velocity over time is loaded from a file. A satellite scenario is created using the ephemeris data to model the satellite positions. Ground stations in the US, Germany and India are defined. Access coverage between the satellites and ground stations over the scenario duration can then be analyzed.
The document describes a scenario to analyze access between a constellation of 40 low-Earth orbit satellites and a ground station located at MathWorks Natick. A satellite scenario is created in MATLAB and the constellation satellites are added along with their orbital parameters. Each satellite is equipped with a conical sensor camera with a 90-degree field of view. The ground station representing MathWorks Natick is also added with a minimum elevation angle of 30 degrees. Access analysis is performed between each camera and the ground station to determine the times each camera can photograph the site. The results show the start and end times of each access interval over the 6-hour period from 1:00 PM to 7:00 PM UTC on May 12.
Paper introduction: Is Space-Time Attention All You Need for Video Understanding? by Toru Tamaki
Gedas Bertasius, Heng Wang, Lorenzo Torresani, "Is Space-Time Attention All You Need for Video Understanding?" ICML2021
https://proceedings.mlr.press/v139/bertasius21a.html
R is an open source statistical computing platform that is rapidly growing in popularity within academia. It allows for statistical analysis and data visualization. The document provides an introduction to basic R functions and syntax for assigning values, working with data frames, filtering data, plotting, and connecting to databases. More advanced techniques demonstrated include decision trees, random forests, and other data mining algorithms.
D3.js - A picture is worth a thousand words by Apptension
This document provides an overview of D3.js, a JavaScript library for data visualization. It discusses why data visualization is useful, some key concepts in D3 like selections, entering and updating data, and creating reusable components. It also covers transitions, scales, axes, SVG, and common layouts. The document encourages exploring more examples on the bl.ocks website and concludes by thanking the audience.
Fun with D3.js: Data Visualization Eye Candy with Streaming JSON by Tomomi Imura
The document discusses creating dynamic bubble charts using D3.js and streaming JSON data from PubNub. It explains how to (1) create a static bubble chart with D3, (2) make the chart dynamic by subscribing to a PubNub data stream and updating the bubbles on new data, and (3) add smooth transitions as bubbles enter, update, and exit using D3's data binding and transition methods. The full article provides more details on implementing this dynamic bubble chart with animated transitions between data updates.
Datastax Day 2016: Cassandra data modeling basics by Duyhai Doan
This document discusses data modeling with Apache Cassandra. It covers:
1. The objectives of data modeling like reducing query latency and avoiding disasters
2. Choosing the right partition key which is the main entry point for queries and helps distribute data
3. Using clustering columns to simulate one-to-many relationships and enable sorting and range queries
4. Other critical details like avoiding huge partitions, sub-partitioning techniques, and how deletes create tombstones
This document provides an overview of random decision forests for classification using Apache Spark's MLlib. It begins with an introduction to decision trees using sample datasets. It then demonstrates how to build and evaluate a decision tree model on the Covertype dataset using MLlib. The document discusses techniques for improving decision trees such as hyperparameter tuning. It introduces the idea of random decision forests as an ensemble method that leverages multiple decision trees to improve accuracy. It explains how random decision forests create diversity among trees and shows how to train a random forest classifier on the Covertype data, achieving around 96% accuracy.
A Century Of Weather Data - Midwest.io by Randall Hunt
This document summarizes the key considerations and performance tests for storing and querying a large weather dataset containing over 2.5 billion data points. It describes the schema design using MongoDB to embed data and index on location. Bulk loading of data took 10 hours on a single server but only 3 hours on a sharded cluster. Queries for a single data point were fastest on the cluster at under 1 ms, while worldwide queries ran at 310 per second. Analytics like maximum temperature took 2.5 hours on a single server but only 2 minutes on the cluster. The cluster provided much higher throughput and better performance for complex queries, at a higher cost.
data_selection
October 19, 2022
[1]: # Data Selection
[2]: import numpy as np
[3]: # This is weather data recorded in Memphis during summer (June to September).
# Column 0: month
# Column 1: temperature in Fahrenheit
# Column 2: precipitation in inches
data = np.array([
[6, 70, 3],
[7, 75, 3],
[6, 85, 4],
[7, 90, 4],
[7, 91, 5],
[8, 85, 2],
[8, 87, 4],
[6, 83, 5],
[8, 77, 3],
[6, 69, 6],
[9, 68, 1],
[6, 80, 6],
[9, 65, 3],
[9, 75, 4],
[9, 80, 5]])
[4]: data.shape
[4]: (15, 3)
[5]: # Select the data for row 0:
data[0, :]
# row_selection: 0
# column_selection: all
[5]: array([ 6, 70, 3])
[6]: # Select the data of column 2:
data[:, 2]
# row_selection: all
# column_selection: 2
[6]: array([3, 3, 4, 4, 5, 2, 4, 5, 3, 6, 1, 6, 3, 4, 5])
[7]: # Get the data for the first five rows.
data[0:5, :]
[7]: array([[ 6, 70, 3],
[ 7, 75, 3],
[ 6, 85, 4],
[ 7, 90, 4],
[ 7, 91, 5]])
[8]: # Get the data for the first five rows,
# and the first two columns.
data[0:5, 0:2]
[8]: array([[ 6, 70],
[ 7, 75],
[ 6, 85],
[ 7, 90],
[ 7, 91]])
[9]: # Get the data for the last two columns,
# and the first five rows.
data[0:5, 1:3]
[9]: array([[70, 3],
[75, 3],
[85, 4],
[90, 4],
[91, 5]])
[10]: # or can be written as
data[:5, 1:]
[10]: array([[70, 3],
[75, 3],
[85, 4],
[90, 4],
[91, 5]])
[11]: # or can be written as
data[:5, -2:]
[11]: array([[70, 3],
[75, 3],
[85, 4],
[90, 4],
[91, 5]])
[12]: # Get the last 4 rows
data[-4:, :]
[12]: array([[ 6, 80, 6],
[ 9, 65, 3],
[ 9, 75, 4],
[ 9, 80, 5]])
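Cells [9] through [11] show three equivalent spellings of "first five rows, last two columns". This cell is my addition, not part of the original notebook; it rebuilds the same array and checks the equivalence with np.array_equal:

```python
import numpy as np

# Same Memphis weather array used in the notebook above.
data = np.array([
    [6, 70, 3], [7, 75, 3], [6, 85, 4], [7, 90, 4], [7, 91, 5],
    [8, 85, 2], [8, 87, 4], [6, 83, 5], [8, 77, 3], [6, 69, 6],
    [9, 68, 1], [6, 80, 6], [9, 65, 3], [9, 75, 4], [9, 80, 5]])

# Three spellings of the same selection.
a = data[0:5, 1:3]  # explicit start:stop on both axes
b = data[:5, 1:]    # omitted bounds default to the array edges
c = data[:5, -2:]   # negative indices count back from the end
print(np.array_equal(a, b) and np.array_equal(b, c))  # → True
```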
[13]: # Find the temperature values, and store them in a variable
temp = data[:, 1]
[14]: temp
[14]: array([70, 75, 85, 90, 91, 85, 87, 83, 77, 69, 68, 80, 65, 75, 80])
[15]: # Find the month values, and store them in a variable
month = data[:, 0]
[16]: month
[16]: array([6, 7, 6, 7, 7, 8, 8, 6, 8, 6, 9, 6, 9, 9, 9])
[17]: # Find the maximum temperature
np.max(temp)
[17]: 91
[18]: # Find the index (or position) of the maximum temperature
np.argmax(temp)
[18]: 4
[19]: # Find the month that corresponds to the maximum temperature
data[np.argmax(temp), 0]
[19]: 7
[20]: m = np.argmax(temp)
data[m, 0]
[20]: 7
[21]: # boolean selection
[22]: # Find all the temperatures below 70 degrees
data[temp < 70, 1]
[22]: array([69, 68, 65])
[23]: # Find the months with temperatures below 70 degrees
data[temp < 70, 0]
[23]: array([6, 9, 9])
[24]: np.unique(data[temp < 70, 0])
[24]: array([6, 9])
[25]: # Find all the temperatures for the month of August
data[month == 8, 1]
[25]: array([85, 87, 77])
[26]: # Find the average temperature for August
np.average(data[month == 8, 1])
[26]: 83.0
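The August average generalizes to every month with one boolean mask per unique month value. This cell is a sketch I added, not part of the original notebook:

```python
import numpy as np

data = np.array([
    [6, 70, 3], [7, 75, 3], [6, 85, 4], [7, 90, 4], [7, 91, 5],
    [8, 85, 2], [8, 87, 4], [6, 83, 5], [8, 77, 3], [6, 69, 6],
    [9, 68, 1], [6, 80, 6], [9, 65, 3], [9, 75, 4], [9, 80, 5]])
month = data[:, 0]
temp = data[:, 1]

# Average temperature per month: mask the rows for each month, then average.
averages = {m: np.average(temp[month == m]) for m in np.unique(month)}
print(averages)  # the value for month 8 is 83.0, matching the cell above
```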
[27]: # Find the temperatures above 80 for June
data[(month == 6) & (temp > 80), 1]
# & means and
[27]: array([85, 83])
[28]: # Find the temperatures for the months of June, July, and August
data[month != 9, 1]
[28]: array([70, 75, 85, 90, 91, 85, 87, 83, 77, 69, 80])
[29]: data[(month == 6) | (month == 7) | (month == 8), 1]
[29]: array([70, 75, 85, 90, 91, 85, 87, 83, 77, 69, 80])
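The chained | comparisons in cell [29] can also be written with np.isin, which tests each element for membership in a list of values. This alternative is my addition, not part of the original notebook:

```python
import numpy as np

data = np.array([
    [6, 70, 3], [7, 75, 3], [6, 85, 4], [7, 90, 4], [7, 91, 5],
    [8, 85, 2], [8, 87, 4], [6, 83, 5], [8, 77, 3], [6, 69, 6],
    [9, 68, 1], [6, 80, 6], [9, 65, 3], [9, 75, 4], [9, 80, 5]])
month = data[:, 0]

# Temperatures for June, July, and August via a membership test.
summer = data[np.isin(month, [6, 7, 8]), 1]
print(summer)  # selects the same rows as month != 9
```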
This document discusses using the Pandas Python data analysis library. It describes how Pandas makes it easy to load, manipulate, and visualize complex tabular data. Specific features highlighted include loading data from files, creating and manipulating dataframe data structures, performing element-wise math operations on columns, working with time series data through indexing and resampling, and quick visualization of data.
This document provides an overview of the Java 8 Date and Time API as defined by JSR 310. It describes the key classes like Instant, LocalDate, LocalDateTime, and ZonedDateTime and how they can be used to work with dates, times, durations, and timezones. Examples are given showing how to get the current date and time, parse and format dates, add/subtract periods, and convert between different date and time representations.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf by Chart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
How to Get CNIC Information System with Paksim Ga.pptx by danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Monitoring and Managing Anomaly Detection on OpenShift.pdf by Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Digital Marketing Trends in 2024 | Guide for Staying Ahead by Wask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
GraphRAG for Life Science to increase LLM accuracy by Tomaz Bratanic
GraphRAG for the life science domain: retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Introduction of Cybersecurity with OSS at Code Europe 2024 by Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Building Production Ready Search Pipelines with Spark and Milvus by Zilliz
Spark is a widely used ETL tool for processing, indexing, and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data into vector representations, and push the vectors to the Milvus vector database for search serving.
Ocean Lotus threat actors project by John Sitima 2024 (1).pptx by SitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Driving Business Innovation: Latest Generative AI Advancements & Success Story by Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
5th LF Energy Power Grid Model Meet-up Slides by DanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Taking AI to the Next Level in Manufacturing.pdf by ssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed.pdf by Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers by akankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.