Improving access to geospatial Big Data in the hydrology domain

Big Data and Spatial Analytics - Business and Industrial Section
Royal Statistical Society, London, UK - 18.11.2015


  1. 1. Improving access to geospatial Big Data in the hydrology domain. Claudia Vitolo (1,2) and Wouter Buytaert (1). (1) Imperial College London; (2) Brunel University London. Big Data and Spatial Analytics - Business and Industrial Section, Royal Statistical Society, London, UK - 18.11.2015
  2. 2. Outline 1. Background 2. Open Data and access approaches 3. Demo 4. Conclusions
  3. 3. 1. Background
  4. 4. What is Hydrology? Hydrology is the scientific study of the movement, distribution, and quality of water on Earth. Source: Hydrology. In Wikipedia, The Free Encyclopedia.
  5. 5. What do (river) hydrologists do? ▣ Collect data on climate, soil, geology, topography, etc. ▣ Set up a model ▣ Calibrate the model against observed water levels and stream flows □ locations □ time intervals ▣ Use models to analyse scenarios and make predictions
  6. 6. Big Data in Hydrology Information: ▣ Topography & bathymetry ▣ Geology ▣ Soil & Moisture ▣ Land cover ▣ Weather & Climate ▣ Hydrometry ▣ Quality samples ▣ Groundwater ▣ Infrastructures Format: ▣ Plain text ▣ Raster ▣ Vector ▣ Binary ▣ Markup Languages ▣ Graphs & networks ▣ CAD drawings
  10. 10. Big Data challenges: ▣ Get large volume of heterogeneous data ▣ Mash-up information and use it to make decisions
  11. 11. 2. Open Data and data access approaches
  12. 12. Open Data “Open data and content can be freely used, modified, and shared by anyone for any purpose” Source: http://opendefinition.org/
  17. 17. The National River Flow Archive (NRFA) River flow data from gauging station networks across the UK including networks operated by: ● Environment Agency (England), ● Natural Resources Wales, ● Scottish Environment Protection Agency, ● Rivers Agency (Northern Ireland). http://nrfa.ceh.ac.uk/
  18. 18. GUI PROS: simple and intuitive CONS: not scalable, not flexible Point & click (GUI) vs programmatic (API) data retrieval API PROS: scalable, fast and flexible CONS: requires programming skills
  19. 19. Application Programming Interface SERVER USER/CLIENT API
  20. 20. The NRFA’s API ▣ metadata catalogue, ▣ catalogue filters, ▣ time series of gauged daily data, ▣ time series of catchment monthly rainfall.
  23. 23. How does an API work? server/format/service?X=1&Y=2&Z=3 QUESTION A: How do I get information on station “18019” from the NRFA catalogue? ANSWER: nrfaapps.ceh.ac.uk/nrfa/json/stationSummary?db=nrfa_public&stn=18019
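
The `server/format/service?X=1&Y=2&Z=3` pattern above can be sketched in a few lines of Python. This is only an illustration: the helper name `build_api_url` and the `http` scheme are assumptions (the slide shows the URL without a scheme), and the endpoint is reproduced as it appeared in 2015, so it may no longer respond. The code only assembles the URL and does not contact the server.

```python
from urllib.parse import urlencode


def build_api_url(server, fmt, service, params, scheme="http"):
    """Assemble a URL following the server/format/service?X=1&Y=2 pattern.

    The scheme is an assumption: the slide gives the URL without one.
    """
    # Dictionaries keep insertion order, so the parameters appear in the
    # same order in which they were supplied.
    query = urlencode(params)
    return f"{scheme}://{server}/{fmt}/{service}?{query}"


# Answer to Question A, rebuilt from its parts.
station_url = build_api_url(
    server="nrfaapps.ceh.ac.uk/nrfa",
    fmt="json",
    service="stationSummary",
    params={"db": "nrfa_public", "stn": "18019"},
)
print(station_url)
```

Separating the server, format, service and parameters makes it easy to switch station or output format without editing the URL by hand.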
  25. 25. How does an API work? server/format/service?X=1&Y=2&Z=3 QUESTION B: How do I get the time series of gauged daily data for station “18019”? ANSWER: nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=18019&dt=gdf
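
The `waterml2` service named in the answer above suggests a WaterML 2.0 response, which stores a time series as a sequence of time-value pairs. As a rough sketch of what parsing such a response could look like, the Python below reads a small hand-written sample. The sample is heavily simplified and is not actual NRFA output; the function name `parse_measurements` and the values are made up for illustration. Only the namespace URI and the `MeasurementTVP`, `time` and `value` element names come from the WaterML 2.0 standard.

```python
import xml.etree.ElementTree as ET

WML2_URI = "http://www.opengis.net/waterml/2.0"
WML2_NS = {"wml2": WML2_URI}

# Hand-written, simplified sample (assumption): a real response wraps the
# series in more elements and carries metadata omitted here.
SAMPLE_WATERML2 = """
<wml2:Collection xmlns:wml2="http://www.opengis.net/waterml/2.0">
  <wml2:MeasurementTimeseries>
    <wml2:point>
      <wml2:MeasurementTVP>
        <wml2:time>2000-01-01T00:00:00</wml2:time>
        <wml2:value>1.25</wml2:value>
      </wml2:MeasurementTVP>
    </wml2:point>
    <wml2:point>
      <wml2:MeasurementTVP>
        <wml2:time>2000-01-02T00:00:00</wml2:time>
        <wml2:value>1.5</wml2:value>
      </wml2:MeasurementTVP>
    </wml2:point>
  </wml2:MeasurementTimeseries>
</wml2:Collection>
"""


def parse_measurements(xml_text):
    """Return a list of (timestamp, value) pairs found in a WaterML 2.0 document."""
    root = ET.fromstring(xml_text.strip())
    series = []
    # iter() searches the whole tree, so the nesting depth does not matter.
    for tvp in root.iter(f"{{{WML2_URI}}}MeasurementTVP"):
        time = tvp.findtext("wml2:time", namespaces=WML2_NS)
        value = tvp.findtext("wml2:value", namespaces=WML2_NS)
        if time is None or value is None:
            continue  # skip incomplete points
        series.append((time, float(value)))
    return series


daily_flows = parse_measurements(SAMPLE_WATERML2)
print(daily_flows)
```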
  26. 26. From machine-readable to human-readable formats: JSON, XML, plain text
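
A minimal sketch of the machine-readable to human-readable step, assuming a JSON response that holds a flat record. The field names and values in `SAMPLE_JSON` are placeholders invented for this example, not the actual NRFA schema, and `to_plain_text` is a hypothetical helper.

```python
import json

# Placeholder record (assumption): not real NRFA field names or values.
SAMPLE_JSON = '{"id": "18019", "name": "Example station", "catchmentArea": 12.3}'


def to_plain_text(json_text):
    """Render a flat JSON object as aligned 'key : value' lines."""
    record = json.loads(json_text)
    width = max(len(key) for key in record)  # pad keys to the longest one
    return "\n".join(f"{key.ljust(width)} : {value}" for key, value in record.items())


print(to_plain_text(SAMPLE_JSON))
```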
  27. 27. R libraries to interface APIs ▣ raincpc: download and process the Climate Prediction Center's (CPC) daily rainfall data ▣ rnoaa: an interface to NOAA Climate data API ▣ soilDB: read data from USDA-NCSS soil databases. ▣ waterData: retrieve, analyse, and calculate anomalies of daily hydrologic time series data. ▣ rnrfa: an interface to the UK National River Flow Archive data API.
  28. 28. 3. Demo
  29. 29. The R package rnrfa API interface: ▣ make request ▣ parse response ▣ retrieve and filter metadata catalogue ▣ get time series of gauged daily data and catchment monthly rainfall API interface + external libraries: ▣ make maps ▣ create interactive tables and plots ▣ simplify and speed up reporting!
  30. 30. Example of dynamic report ▣ Find all the stations operated by Natural Resources Wales ▣ Retrieve time series of daily flows ▣ Run a basic analysis ▣ Create interactive plot, table and map
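
The first three steps of the report above can be sketched as follows. This is not the rnrfa implementation: it is a Python illustration over a tiny in-memory catalogue, with made-up station identifiers, a hypothetical `operator` field and invented flow values. In a real workflow the catalogue and the daily flows would come from the API, and the plotting, table and mapping steps would be handled by external libraries.

```python
from statistics import mean

# Made-up catalogue and flow data (assumptions, for illustration only).
catalogue = [
    {"id": "A", "operator": "Natural Resources Wales"},
    {"id": "B", "operator": "Environment Agency"},
    {"id": "C", "operator": "Natural Resources Wales"},
]
daily_flows = {
    "A": [1.0, 2.0, 3.0],
    "B": [5.0, 5.0, 5.0],
    "C": [0.5, 1.5, 4.0],
}


def stations_by_operator(catalogue, operator):
    """Step 1: keep the identifiers of stations run by the given operator."""
    return [row["id"] for row in catalogue if row["operator"] == operator]


def summarise(flows):
    """Step 3: a basic analysis, here the minimum, mean and maximum flow."""
    return {"min": min(flows), "mean": mean(flows), "max": max(flows)}


welsh_ids = stations_by_operator(catalogue, "Natural Resources Wales")
# Step 2 is a dictionary lookup here; with a live API it would be one
# request per station.
report = {station: summarise(daily_flows[station]) for station in welsh_ids}
print(report)
```

The same loop works for one station or many, which is the scalability argument made for programmatic access.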
  31. 31. 4. Conclusions
  32. 32. Summary ▣ Big Data: Large volumes of heterogeneous spatio-temporal data are becoming increasingly open in the hydrology domain. ▣ GUIs vs APIs: GUIs may be the easiest way to browse data, but they are not the most efficient. APIs are fast and scalable. ▣ Hardware/software: The hardware and software burden is on the data provider's side. There is no need to update your datasets: you always access the latest version. ▣ R as interface: R is an easy-to-learn language, widely used by statisticians and scientists. It provides a number of libraries to obtain and parse data from the web. ▣ Reproducible workflows: Query databases, filter information, convert coordinates, and generate plots and maps for reproducible reporting. ▣ Scalability & Interoperability: The same approach gathers information for a single site or for multiple sites. At larger scale, computing can be made more efficient by using cloud facilities.
  39. 39. Thanks! Any questions? Claudia Vitolo Twitter: @clavitolo Email: claudia.vitolo@gmail.com Blog: http://claudiavitolo.com/