This document discusses using cloud computing to enable data-intensive hydrologic modeling. The goals are to rapidly prototype watershed models anywhere using large national datasets, perform real-time forecasting and analysis, and make models and software accessible via web services. Challenges include the computational intensity of modeling large datasets and distributing complex workflows. The proposed strategy is to develop a PIHM (Penn State Integrated Hydrologic Model) cloud prototype that distributes model components and workflows over cloud resources for research and education. This would allow processing large datasets, calibrating models by running hundreds of simulations in parallel, and making models and results accessible online.
3. Issues
• Data- and computationally intensive!
• 100s of terabytes of Essential Terrestrial Variable (ETV) data to model watersheds anywhere in the USA
• 1000s of terabytes of data to model watersheds around the world
• Federal data servers are slow
• No central data store for our ETV data needs
• Complex workflows to automate data processing and model development
• Computation requirements vary per project
• IT is expensive! We are focused on research only
5. Our definition of “Cloud”
• Dynamically scalable (virtualized) resources, from desktop and HPC cluster to NCSA Blue Waters and grid
• Resources are provided as web-based services (data, software)
• Data-intensive and parallel computing
• A private-cloud-to-private-cloud conduit between PSU and NCSA for hydrologic research
• This is a prototype!
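As a concrete illustration of "resources provided as web-based services," the sketch below builds a query for a data service and parses a CSV time-series response. The endpoint URL, parameter names, and response format are illustrative assumptions, not an actual PIHM cloud API:

```python
# Sketch of consuming a hydrologic data web service.
# BASE_URL and the query parameters are hypothetical, for illustration only.
from urllib.parse import urlencode

BASE_URL = "http://example.org/pihm/data"  # hypothetical data service

def build_query(watershed_id, variable, start, end):
    """Build a REST query URL for one watershed time series."""
    params = {"watershed": watershed_id, "var": variable,
              "start": start, "end": end}
    return BASE_URL + "?" + urlencode(params)

def parse_series(csv_text):
    """Parse a 'time,value' CSV response into (timestamp, float) pairs."""
    series = []
    for line in csv_text.strip().splitlines()[1:]:  # skip header row
        ts, val = line.split(",")
        series.append((ts, float(val)))
    return series

url = build_query("WE-38", "precip_mm", "2009-01-01", "2009-01-31")
sample = "time,value\n2009-01-01,4.2\n2009-01-02,0.0\n"
print(url)
print(parse_series(sample))
```

In a real deployment the response would come from an HTTP GET against the service; the point is that data access reduces to constructing a URL and parsing a structured payload, which any client on the grid can do.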
12. PIHM Cloud Re-Analysis and Forecast
• With NCSA we are developing a PIHM cloud prototype to distribute the PIHM web-service workflow and model components over the cloud for research and education.
• Calibrate models: spawn 100s of dataflow executions across the parameter space to process, compute, analyze, and visualize the transformed results.
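The calibration step above amounts to a parallel parameter sweep: evaluate many candidate parameter sets concurrently and keep the best-scoring one. A minimal sketch follows; `run_simulation` is a stand-in stub, and the parameter names are assumptions, not the real PIHM model interface:

```python
# Sketch of a parallel calibration sweep over a parameter grid.
# run_simulation is a placeholder that would normally launch one PIHM run
# and return a misfit score against observations.
from multiprocessing import Pool
from itertools import product

def run_simulation(params):
    """Placeholder for one model run; returns a mock misfit score."""
    conductivity, porosity = params
    return abs(conductivity - 0.3) + abs(porosity - 0.45)

def calibrate(grid, workers=4):
    """Evaluate every parameter combination in parallel; return (score, params)."""
    combos = list(product(*grid))
    with Pool(workers) as pool:
        scores = pool.map(run_simulation, combos)
    return min(zip(scores, combos))

if __name__ == "__main__":
    grid = [[0.1, 0.2, 0.3, 0.4],   # candidate hydraulic conductivities
            [0.35, 0.45, 0.55]]     # candidate porosities
    print(calibrate(grid))
```

On a cloud back end the `Pool` would be replaced by the workflow engine dispatching each combination to its own node, which is what makes running 100s of simulations at once practical.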
27. ArcPIHM
• PIHM will soon be available as a toolbox for ESRI users
• Development plans include protocols to encourage further modularity, so other developers can plug their own code (e.g., other physics engines, datasets) into the PIHM workflow
• Consume CUAHSI HydroServer and HydroGML resources
30. Conclusion
• Data- and computationally intensive watershed simulations!
• 1000s of terabytes of data required to model any watershed in the USA
• Workflows to automate data processing and distribute the computation on the cloud
• What is needed is fast access to data centers that are close to HPC resources
31. Thank you for listening
Visit http://www.pihm.psu.edu for more information and updates.