Improving access to geospatial Big Data in the hydrology domain

1. Improving access to geospatial Big Data in the hydrology domain. Claudia Vitolo (1,2) and Wouter Buytaert (1). (1) Imperial College London, (2) Brunel University London. Big Data and Spatial Analytics - Business and Industrial Section, Royal Statistical Society, London, UK - 18.11.2015

2. Outline 1. Background 2. Open Data and data access approaches 3. Demo 4. Conclusions

3. 1. Background

4. What is Hydrology? Hydrology is the scientific study of the movement, distribution, and quality of water on Earth. Source: Hydrology. In Wikipedia, The Free Encyclopedia.

5. What do (river) hydrologists do? ▣ Collect data on climate, soil, geology, topography, etc. ▣ Set up a model ▣ Calibrate the model with observed water levels and stream flows □ locations □ time intervals ▣ Use models to analyse scenarios and make predictions

6. Big Data in Hydrology. Information: ▣ Topography & bathymetry ▣ Geology ▣ Soil & Moisture ▣ Land cover ▣ Weather & Climate ▣ Hydrometry ▣ Quality samples ▣ Groundwater ▣ Infrastructure. Format: ▣ Plain text ▣ Raster ▣ Vector ▣ Binary ▣ Markup languages ▣ Graphs & networks ▣ CAD drawings

10. Big Data challenges: ▣ Get large volumes of heterogeneous data ▣ Mash up information and use it to make decisions

11. 2. Open Data and data access approaches

12. Open Data “Open data and content can be freely used, modified, and shared by anyone for any purpose” Source: http://opendefinition.org/

17. The National River Flow Archive (NRFA) River flow data from gauging station networks across the UK including networks operated by: ● Environment Agency (England), ● Natural Resources Wales, ● Scottish Environment Protection Agency, ● Rivers Agency (Northern Ireland). http://nrfa.ceh.ac.uk/

18. Point & click (GUI) vs programmatic (API) data retrieval. GUI PROS: simple and intuitive. CONS: not scalable, not flexible. API PROS: scalable, fast and flexible. CONS: requires programming skills.

19. Application Programming Interface (API) [Diagram labels: USER/CLIENT, API, SERVER]

20. The NRFA’s API ▣ metadata catalogue, ▣ catalogue filters, ▣ time series of gauged daily data, ▣ time series of catchment monthly rainfall.

21. How does an API work? server/format/service?X=1&Y=2&Z=3
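
The request pattern on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original talk: the function name `build_query_url` is hypothetical, and the pattern is reproduced exactly as written on the slide, without a scheme such as `http://`.

```python
from urllib.parse import urlencode


def build_query_url(server, fmt, service, **params):
    """Assemble 'server/format/service?key=value&...' from its parts.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not an API from the slides.
    """
    # urlencode escapes each value and joins the key=value pairs with '&'
    query = urlencode(params)
    return f"{server}/{fmt}/{service}?{query}"


# Placeholder values taken from the slide's generic pattern
url = build_query_url("server", "format", "service", X=1, Y=2, Z=3)
print(url)  # prints server/format/service?X=1&Y=2&Z=3
```

Keeping the parameters as keyword arguments means the same helper can serve any service that follows the pattern, whatever its parameter names are.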

22. How does an API work? server/format/service?X=1&Y=2&Z=3 QUESTION A: How do I get information on station “18019” from the NRFA catalogue?

23. How does an API work? server/format/service?X=1&Y=2&Z=3 QUESTION A: How do I get information on station “18019” from the NRFA catalogue? ANSWER: nrfaapps.ceh.ac.uk/nrfa/json/stationSummary?db=nrfa_public&stn=18019

24. How does an API work? server/format/service?X=1&Y=2&Z=3 QUESTION B: How do I get the time series of gauged daily data for station “18019”?

25. How does an API work? server/format/service?X=1&Y=2&Z=3 QUESTION B: How do I get the time series of gauged daily data for station “18019”? ANSWER: nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=18019&dt=gdf
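
To see how the two answers map onto the server/format/service pattern, the sketch below splits each URL back into its parts. It is an illustration under assumptions: `split_query_url` is a hypothetical helper, the `http://` scheme is added only so that the standard-library parser can recognise the host, and no request is sent (whether these 2015 endpoints are still live is not checked here).

```python
from urllib.parse import urlsplit, parse_qsl


def split_query_url(url):
    """Split 'server/format/service?params' into its four components.

    Hypothetical helper for illustration only.
    """
    # Assumption: prepend a scheme so urlsplit can identify the host part
    parts = urlsplit("http://" + url)
    # The last two path segments are the format and the service;
    # anything before them belongs to the server address
    *prefix, fmt, service = parts.path.strip("/").split("/")
    return {
        "server": "/".join([parts.netloc, *prefix]),
        "format": fmt,
        "service": service,
        "params": dict(parse_qsl(parts.query)),
    }


# The two answers exactly as given on the slides
answer_a = "nrfaapps.ceh.ac.uk/nrfa/json/stationSummary?db=nrfa_public&stn=18019"
answer_b = "nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=18019&dt=gdf"

for answer in (answer_a, answer_b):
    print(split_query_url(answer))
```

Both answers share the same server and the same `db` and `stn` parameters; they differ in the format (`json` vs `xml`), the service, and the extra `dt=gdf` parameter in answer B.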

26. From machine-readable to human-readable formats: JSON, XML, plain text
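
As a rough illustration of the step from a machine-readable to a human-readable format, the sketch below parses a small JSON string and prints it as aligned plain text. The payload is entirely made up for this example: its field names and values are assumptions and do not reflect the actual NRFA response schema.

```python
import json

# Hypothetical payload, for illustration only; NOT the real NRFA schema
raw = '{"station": {"id": "18019", "name": "Example station", "operator": "Example agency"}}'


def to_plain_text(json_text):
    """Turn the (assumed) 'station' object into aligned 'key : value' lines."""
    record = json.loads(json_text)["station"]
    # Pad every key to the length of the longest one so the colons line up
    width = max(len(key) for key in record)
    return "\n".join(f"{key.ljust(width)} : {value}" for key, value in record.items())


report = to_plain_text(raw)
print(report)
```

The same idea applies to XML responses, where a parser such as the standard library's `xml.etree.ElementTree` would replace `json.loads`.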

27. R libraries to interface APIs ▣ raincpc: download and process the Climate Prediction Center's (CPC) daily rainfall data ▣ rnoaa: an interface to NOAA Climate data API ▣ soilDB: read data from USDA-NCSS soil databases. ▣ waterData: retrieve, analyse, and calculate anomalies of daily hydrologic time series data. ▣ rnrfa: an interface to the UK National River Flow Archive data API.

28. 3. Demo

29. The R package rnrfa. API interface: ▣ make request ▣ parse response ▣ retrieve and filter metadata catalogue ▣ get time series of gauged daily data and catchment monthly rainfall. API interface + external libraries: ▣ make maps ▣ create interactive tables and plots ▣ simplify and speed up reporting!

30. Example of dynamic report ▣ Find all the stations operated by Natural Resources Wales ▣ Retrieve time series of daily flows ▣ Run a basic analysis ▣ Create interactive plot, table and map
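
The first three steps of this workflow can be sketched as follows. This is not the demo from the talk, which used the R package rnrfa. It is a Python outline under stated assumptions: the catalogue and the flow values are made-up in-memory stand-ins for the data an API call would return, the function names are hypothetical, and the "basic analysis" is simply the mean and maximum of each series. The plotting and mapping step is left out.

```python
from statistics import mean

# Made-up catalogue and daily flows (m3/s), for illustration only; NOT NRFA data
catalogue = [
    {"id": "A", "operator": "Natural Resources Wales"},
    {"id": "B", "operator": "Environment Agency"},
    {"id": "C", "operator": "Natural Resources Wales"},
]
daily_flows = {"A": [1.0, 2.0, 3.0], "B": [4.0, 4.0], "C": [0.5, 1.5]}


def stations_for(operator, stations):
    """Step 1: keep the ids of the stations run by the given operator."""
    return [s["id"] for s in stations if s["operator"] == operator]


def summarise(ids, flows):
    """Steps 2 and 3: look up each series and compute basic statistics."""
    return {i: {"mean": mean(flows[i]), "max": max(flows[i])} for i in ids}


welsh = stations_for("Natural Resources Wales", catalogue)
summary = summarise(welsh, daily_flows)
for station_id, stats in summary.items():
    print(f"{station_id}: mean={stats['mean']:.2f} max={stats['max']:.2f}")
```

In a real report the two dictionaries would be filled from the metadata catalogue and the gauged daily flow service, so the filtering and analysis code would not need to change when the data are updated.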

31. 4. Conclusions

32. Summary ▣ Big Data: Large volumes of heterogeneous spatio-temporal data are becoming increasingly open in the hydrology domain. ▣ GUIs vs APIs: GUIs may be the easiest way to browse data, but they are not the most efficient. APIs are fast and scalable. ▣ Hardware/software: The hardware and software burden is on the data provider's side. There is no need to update local datasets, because you always access the latest version. ▣ R as interface: R is an easy-to-learn language, widely used by statisticians and scientists. It provides a number of libraries to obtain and parse data from the web. ▣ Reproducible workflows: Query databases, filter information, convert coordinates, and generate plots and maps for reproducible reporting. ▣ Scalability & Interoperability: The same approach gathers information for a single site or for multiple sites. At larger scale, computing can be made more efficient by using cloud facilities.

39. Thanks! Any questions? Claudia Vitolo Twitter: @clavitolo Email: claudia.vitolo@gmail.com Blog: http://claudiavitolo.com/
