Published in: Career

  1. 1. Ph.D. Proposal<br />March 1st, 2010<br />The role of Workflow Engine and Coupling Framework in the Development of Hydrologic Modeling System<br />Presenter: Bo Lu<br />Advisor: Dr. Michael Piasecki<br />Department of Civil, Architectural & Environmental Engineering<br />Drexel University<br />
  2. 2. Background<br /><ul><li>The focus of the modeling community has shifted.
  3. 3. A plethora of hydrologic models has been created.</li></ul>Singh and Woolhiser (2002) summarized more than 70 hydrologic models, such as <br /><ul><li> Stanford Watershed Model (SWM)/Hydrologic Simulation Package Fortran IV (HSPF)
  4. 4. Tank Model
  5. 5. Soil and Water Assessment Tool (SWAT)
  6. 6. Système Hydrologique Européen (SHE) …
  7. 7. High-performance computing technologies have been developed.
  8. 8. GRID
  9. 9. CLOUD …</li></ul>A current focus is on developing holistic models that couple hydrologic processes with chemical, physical, and biological processes.<br />
  10. 10. Background<br /><ul><li>Problem: building such models is beyond the expertise of any individual researcher.
  11. 11. The paradigm of Community Modeling has therefore emerged.</li></ul>HIGHLIGHTS<br /><ul><li> developing complex, evolving, and adaptable modeling systems through collaborative partnership (Voinov et al., 2008).
  12. 12. emphasizing the sharing of methods, model codes, and data.
  13. 13. providing infrastructure that helps community members communicate, publish codes/data, and develop models.</li></ul>ADVANTAGES<br /><ul><li> improving the efficiency of model development.
  14. 14. broadening the usage of legacy models/codes.
  15. 15. avoiding duplicated effort on programming functionally similar models or tools.</li></li></ul><li>Background<br /><ul><li>Community Modeling System</li></ul>Category 1: a collection of standalone models and tools<br />e.g. Chesapeake Bay community modeling system<br /><ul><li>an open source system of watershed and estuary models that specifically serves the study of the Chesapeake Bay region.</li></ul>Models are independent executable programs that have not yet been brought to operate within one framework.<br /><ul><li>an assembly of watershed, hydrodynamic, biogeochemical models and additional modeling tools.
  16. 16. Chesapeake Bay Environmental Observatory (CBEO) data.</li></ul>Category 2: a dominant model with modular software architecture<br />e.g. Weather Research and Forecasting model (WRF)<br /><ul><li>designed for mesoscale numerical weather prediction.</li></ul>Single-source-code approach: only fully compliant codes can be integrated into the model.<br /><ul><li>features several dynamic cores, a data assimilation system, and a modular architecture that allows parallel computation and model extensibility
  17. 17. involves the efforts of 16 leading groups.</li></li></ul><li>Background<br />Category 3: a generic component-based coupling framework<br />e.g. Community Surface Dynamics Modeling System (CSDMS)<br /><ul><li> Common Component Architecture (CCA): a set of tools and standards for modularizing component models.
  18. 18. BABEL: a language-interoperability tool that generates “glue code” to bridge communication between component models written in different programming languages.
  19. 19. Ccaffeine: a Graphical User Interface (GUI) for linking component models.
  20. 20. a variety of terrestrial, marine, coastal and hydrological models.</li></ul>Coupling frameworks provide the technical support for constructing community modeling systems (CMSs). <br /> They make it easier and more flexible to integrate new and legacy models and to build multi-component model systems. <br />Some coupling frameworks designed for community modeling:<br /><ul><li> Earth System Modeling Framework (ESMF)
  21. 21. The Invisible Modelling Environment(TIME)
  22. 22. Open Modeling Interface and environment(OpenMI)</li></ul>…<br />
  23. 23. Background<br /><ul><li>Workflow Engines</li></ul>What is the “scientific workflow”? <br /><ul><li> a programmatic realization of a flowchart.
  24. 24. a data-driven procedure.
  25. 25. a pipeline of individual data operations, transformations, and analysis steps. </li></ul>A step can be<br /><ul><li> a software program or a model carrying out a specific computational task.
  26. 26. a web service (or a set of web services) accessing online data or invoking remote modeling.
  27. 27. a service distributing computations in a high-performance computing environment.</li></ul>What are workflow engines?<br /> Software applications that facilitate composing, executing, archiving, and sharing scientific workflows.<br />
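The definition above (a pipeline of individual steps, each consuming the previous step's output) can be illustrated with a minimal Python sketch. It is not TRIDENT code: `run_workflow`, the three toy steps, and the precipitation values are all hypothetical, and the provenance log is only a simplified stand-in for what a real workflow engine records.

```python
from typing import Any, Callable, Dict, List, Tuple

# A step is any callable that takes the previous step's output (hypothetical type).
Step = Callable[[Any], Any]


def run_workflow(steps: List[Tuple[str, Step]], data: Any) -> Tuple[Any, List[Dict[str, str]]]:
    """Run the steps in order, feeding each output into the next step.

    Returns the final data product and a simple provenance log that
    records which step produced which kind of output.
    """
    provenance: List[Dict[str, str]] = []
    for name, step in steps:
        data = step(data)
        provenance.append({"step": name, "output_type": type(data).__name__})
    return data, provenance


# Three toy steps standing in for data access, data preparation, and analysis.
def fetch_precipitation(_: Any) -> List[float]:
    return [0.0, 2.5, 10.0, 4.0]  # mm per time step, made-up values


def mm_to_m(values: List[float]) -> List[float]:
    return [v / 1000.0 for v in values]


def total(values: List[float]) -> float:
    return sum(values)


result, log = run_workflow(
    [("fetch", fetch_precipitation), ("convert", mm_to_m), ("sum", total)],
    None,
)
```

Because every step only sees the output of the step before it, steps can be swapped or reordered without touching the runner, which is the property a workflow composer relies on.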
  28. 28. Background<br /><ul><li>Motivation
  29. 29. Construct and add a CMS within the context of the CUAHSI Hydrologic Information System(HIS)
  30. 30. Focus
  31. 31. Build hydrologic modeling systems by adopting the TRIDENT workflow engine and the OpenMI coupling framework.
  32. 32. Provide an environment for designing models that seamlessly integrate data access, data preparation, model execution, and model result analysis via TRIDENT.
  33. 33. Contribute to the community by providing optional models for carrying out hydrologic modeling tasks.
  34. 34. Can be further expanded into a CMS by the community.</li></li></ul><li>Research Problems<br /><ul><li>General Question
  35. 35. To what extent can workflow engines and coupling frameworks facilitate and promote community modeling, and how well do these systems lend themselves to successful implementation as a CMS?
  36. 36. Specific Questions
  37. 37. When modularizing an existing model into OpenMI and TRIDENT, will its performance be affected?
  38. 38. In which of these two environments will the same model perform better?
  39. 39. Is either approach really suitable for creating a CMS, or should approaches currently being taken by systems such as CSDMS be favored?
  40. 40. In using a workflow engine, how difficult is it to create an entire code segment library that can be easily assembled and disassembled using building blocks that are as fine-grained as a specific method?</li></li></ul><li>Objectives<br /><ul><li>Examine the performance of TRIDENT and OpenMI in modularizing and coupling models, and demonstrate their capabilities in building CMSs.
  41. 41. Create a library of component models, so-called activities in TRIDENT, for hydrologic modeling and model pre-/post- processing.
  42. 42. Establish a standard system for defining variable data types in TRIDENT.
  43. 43. Build up workflow sequences that represent two distinct hydrologic models:
  44. 44. Penn State Integrated Hydrologic Model(PIHM)
  45. 45. A hydrologic model assembled from component models that simulate separate hydrologic processes
  46. 46. Create an OpenMI-compliant PIHM model along with a NetCDF module serving as data ports. </li></li></ul><li>Literature Review<br /><ul><li>Workflow Engine
  47. 47. “Workflow” was first used in the business domain</li></ul>“The automation of a business process, in whole or parts, where documents, information or tasks are passed from one participant to another to be processed, according to a set of procedural rules” <br /> ----Workflow Management Coalition (<br /><ul><li>In recent years, “workflow” has been introduced into scientific and engineering domains </li></ul> A paradigm for representing and managing complex distributed scientific computations (Gil et al., 2007). <br /><ul><li>Workflow engines aim to streamline the creation and execution of scientific workflows, so that scientists can design, execute, archive, share, and re-run analytical procedures more easily.</li></li></ul><li>Literature Review<br /><ul><li>Popular workflow engines
  48. 48. GridNexus(Brown et al.,2005)
  49. 49. Triana(Taylor et al.,2003)
  50. 50. Kepler(Ludäscher et al.,2006)
  51. 51. Pegasus(Deelman et al.,2005)
  52. 52. TRIDENT(Microsoft,2009)
  53. 53. Taverna(Oinn et al.,2004)
  54. 54. Most workflow engines support:
  55. 55. high-performance computing, parallel or concurrent execution, and distributed computations in the GRID environment.
  56. 56. Capturing provenance. Provenance specifies who, how, what and which resources are used in the workflow, and describes the derivation flow of data products.
  57. 57. Sharing workflow through publication mechanisms or repositories. For example, publishing workflow on the myExperiment web site (
  58. 58. Composing workflows by drag and drop in a GUI. The output of one workflow can be mapped to the input of multiple components.
  59. 59. Automatic, end-to-end execution without external intervention. Alternatively, interactive workflows can also be created. </li></li></ul><li>Literature Review<br /><ul><li>Coupling Framework
  60. 60. Model reusability</li></ul>How to reuse legacy codes? <br />How to standardize the way new models are programmed?<br /><ul><li>Model integration</li></ul>How to improve the modularity and portability of models?<br />How to link models and enable communication between them? <br /><ul><li>Efficiency of model development</li></ul> We must be more efficient about the way we apply science through modeling, so as to leave sufficient time to do science (Argent, 2004). <br />
  61. 61. Literature Review<br /><ul><li>Some popular Coupling Frameworks
  62. 62. Spatial Modeling Environment (SME, Maxwell and Costanza, 1994)
  63. 63. Modular Modeling System (MMS, Leavesley et al.,1996)
  64. 64. Dynamic Information Architecture System(DIAS, Campbell et al.,1998)
  65. 65. Interactive Component Modeling System(ICMS, Reed et al.,1999)
  66. 66. Tarsier(Watson et al.,2001)
  67. 67. Object Modeling System (OMS, David et al., 2002)
  68. 68. The Invisible Modelling Environment (TIME, Rahman et al., 2003)
  69. 69. ModCom(Hillyer et al.,2003)
  70. 70. Earth System Modeling Framework(ESMF, Hill et al.,2004)
  71. 71. Open Modeling Interface and environment(OpenMI, Gregersen et al.,2005)
  72. 72. Next generation framework for aquatic modeling of the Earth System (NextFrAMES, Fekete,2009)</li></li></ul><li>Literature Review<br /><ul><li>Most coupling frameworks feature:
  73. 73. Object-oriented architecture.
  74. 74. Standard interfaces or protocols for modularizing models.
  75. 75. A set of development tools or classes that facilitate standardizing legacy codes.
  76. 76. A workbench for model linkage, execution, and management. A visual canvas is usually provided for linking models by drag and drop.
  77. 77. An extensible library of core components that encapsulate scientific algorithms and methods.
  78. 78. A set of tools for common tasks such as data visualization, basic data analysis, and data operations involving temporal/spatial interpolation.</li></li></ul><li>TRIDENT<br /><ul><li> TRIDENT Architecture</li></ul>workflows(.twp)<br />Supported Services<br />Interactive Execution Service<br />workflows(.xoml)<br />Provenance Recording Service<br />Activities(.dll)<br />Standard Classes<br />Schedule Execution Service <br />Workflows (.wfl)<br />myExperiment website<br />Publish : workflows<br />Message Passing Service <br />Workflow Composer<br />ManagementStudio<br />Workflow Application<br />WORD Add-in<br /><ul><li>Composing, executing, monitoring and recording workflows
  79. 79. Managing workflows, activities, users, and workflow provenance
  80. 80. Embedding and running workflows in Word documents</li></ul>invoke<br /><ul><li> Executing a workflow on its located server
  81. 81. Scheduling workflow execution
  82. 82. Loading/running workflows from local/remote database
  85. 85. Running multiple workflows on different nodes of a server cluster</li></ul>TRIDENT SQL DATABASE<br />
  86. 86. OpenMI<br /><ul><li> What does OpenMI provide?
  87. 87. Standard Interfaces (C#, JAVA)
  88. 88. Development support tools
  89. 89. Spatial data operations
  90. 90. Wrapper
  91. 91. small tools for reading/writing XML files, converting calendar…
  92. 92. Configuration Editor: GUI for linking and running models
  93. 93. request-reply mechanism
  94. 94. GetValues method
  95. 95. On a time step basis
  96. 96. Uni-/bi- directional, logical decision chain </li></li></ul><li>PIHM<br /><ul><li>Overland Flow</li></ul>2D St.Venant eq.<br /><ul><li>Unsaturated Flow</li></ul>1D Richards’ eq.<br /> PDEs<br /><ul><li>Saturated Flow</li></ul>2D Richards’ eq.<br /> FVM<br /><ul><li>Channel Flow</li></ul>1D St.Venant eq.<br /> ODEs<br /><ul><li>Plant Interception
  97. 97. Evapotranspiration</li></ul> ODEs<br /><ul><li>Snowmelt</li></ul>Local ODE system<br />Global ODE system<br />
  98. 98. Research Plan<br />Phase 1: Developing the TRIDENT shelled hydrological modeling system<br /><ul><li> Data Access Library
  99. 99. Data Processing Library
  100. 100. Hydrologic Model Library
  101. 101. Post-analysis Library</li></ul>Phase 2: Developing the OpenMI compliant PIHM model<br /><ul><li> PIHM
  102. 102. NetCDF</li></ul>Phase 3: Field Testing<br /><ul><li> PIHM in TRIDENT
  103. 103. PIHM in OpenMI
  104. 104. A loosely-coupled hydrologic model in TRIDENT</li></li></ul><li>Part 1:Workflow engine embedded HMS<br /><ul><li> Data Access Library</li></ul>Variable Semantic Checking<br />Get Web Services in Box<br />Get Data Via WaterOneFlow<br />HIS Central Metadata Web Service<br />Get Variable Catalog in Box<br />Variable Datatype Checking<br />WaterML File Parser<br />Read NetCDF File<br />Get Data From Local Repository<br />Read Excel File<br />Read CSV File<br />Access SQL Database<br />WaterOneFlow Web Service<br />SQL Database<br />
  105. 105. Temporal Extent<br />Variable Semantic Checking…<br />Get Web Services In Box<br />HIS Central Metadata WS<br />Web Service IDs<br />Time Series Data/MetaData<br /> Updated Variable list<br />Get Variable Catalog In Box<br />Variable Metadata<br />Get Time Series Data<br />WaterOneFlowWS<br />Updated Variable Metadata<br />WaterML<br />Parse<br />Part 1:Workflow engine embedded HMS<br /><ul><li>Get Data Via WaterOneFlow workflow </li></ul> Variables list (prep,temp…)<br /> Geographical Extent(lat/long)<br />Ontology Dictionary<br />UI<br />Variable DataType Filter<br />UI<br />
  106. 106. Part 1:Workflow engine embedded HMS<br /><ul><li> Data Processing Library</li></ul>Unit Converter WS<br />Local Unit Converter<br />Unit Converter<br />Linear Interpolation<br />Temporal Interpolation<br />Time Series Data Process<br />Unit Converter WS<br />Polynomial Interpolation<br />Spatial Interpolation<br />Kriging Interpolation<br />Inverse Distance Weighted Interpolation<br />Geospatial data Processing WS<br />WPS Geospatial Processor<br />Geospatial Data Process<br />DEM Processing <br />…<br />Fill Sink<br />Watershed Delineation<br />Local Geospatial Processor<br />Vector Processing <br />…<br />Polygon2Polyline<br />Triangulation<br />
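Of the spatial interpolation activities listed above, inverse distance weighting is the simplest to write down. The following is a minimal sketch only, assuming planar coordinates and a default power of 2; the function name `idw` and its signature are hypothetical and do not describe the actual TRIDENT activity.

```python
import math
from typing import List, Tuple


def idw(points: List[Tuple[float, float, float]], x: float, y: float, power: float = 2.0) -> float:
    """Inverse distance weighted estimate at (x, y) from (xi, yi, value) samples."""
    num = 0.0
    den = 0.0
    for xi, yi, vi in points:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return vi  # the target coincides with a sample point
        w = 1.0 / d ** power  # closer samples get larger weights
        num += w * vi
        den += w
    if den == 0.0:
        raise ValueError("at least one sample point is required")
    return num / den
```

For example, a target halfway between two gauges reporting 10 and 20 receives equal weights from both and therefore their mean.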
  107. 107. DEM(Outlet, threshold)<br />Unit Consistent TSD<br />Cleaned/Sorted TSD<br />Local Geospatial Processor <br />Unit Converter<br />Watershed Delineation<br />Unit Library<br />Time Series Data/MetaData<br />Triangulation<br />Geospatial Processor WS Interface<br />Updated TSD<br />Mesh Lat/Lon<br />Spatial Interpolation<br />Watershed Delineation<br />Geospatial Data<br />rules<br />TSD Check<br />Temporal Interpolation<br />Mesh Lat/Lon Calculation<br />WPS Web Service<br />Triangulation<br />Part 1:Workflow engine embedded HMS<br /><ul><li>A data processing workflow </li></ul>Geospatial Time Series<br />
  108. 108. Part 1:Workflow engine embedded HMS<br /><ul><li> Hydrologic model library</li></ul>Penman-Monteith<br />Evapotranspiration<br />Makkink Method<br />Green-Ampt method<br />Runoff Yield<br />SCS Curve<br />Hydrological Components<br />SCS Unit Hydrograph<br />Direct Runoff Routing<br />Snyder Unit Hydrograph<br />Base Flow<br />Linear Reservoir<br />Recession Baseflow<br />Channel flow routing<br />Muskingum-Cunge<br />PIHM Engine<br />Kinematic Wave<br />Parameters Window<br />Parameters Input<br />PIHM<br />Default value set<br />Input Files Preparation<br />Geospatial Data File Preparation<br />
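As an illustration of what one activity in the runoff yield group might compute, here is a sketch of the standard SCS curve number relation in metric units, with S = 25400/CN - 254 (mm) and the conventional initial abstraction Ia = 0.2 S. The function name and parameters are hypothetical; the slides do not show how the library implements this method.

```python
def scs_runoff_mm(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth (mm) from rainfall depth p_mm using the SCS curve number method."""
    if not 0.0 < cn <= 100.0:
        raise ValueError("curve number must be in (0, 100]")
    s = 25400.0 / cn - 254.0   # potential maximum retention, mm
    ia = ia_ratio * s          # initial abstraction, mm
    if p_mm <= ia:
        return 0.0             # all rainfall is abstracted, no runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)
```

With CN = 100 the retention is zero and all rainfall becomes runoff, which gives a quick sanity check for any implementation.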
  109. 109. Routing Module<br />Part 1:Workflow engine embedded HMS<br /><ul><li>A conceptual hydrologic model workflow </li></ul>Subbasin, river network<br />TSD, metadata<br />Subbasin<br />River system buildup<br />interpolation module<br />Schematic diagram of dendritic river routing system<br />Areal average TSD<br />Evaporation Module<br />Discharge hydrograph at outlet<br />Runoff Generation module<br />Runoff routing module<br />Baseflow module<br />discharge hydrograph at the outlet of each subbasin<br />
  110. 110. Part 1:Workflow engine embedded HMS<br /><ul><li>Post-Analysis library</li></ul>Water Balance Check<br />Model Output Statistics<br />Simulated&Observed Data Comparison<br />Automatic Parameter Calibration_SCE-UA<br />Parameter Calibration<br />Model performance Analysis<br />Time Series Display<br />Geospatial Data Display<br />Display Window<br />Write to Excel File <br />Write to SQL Database<br />Display Window<br />Write to NetCDF File <br />Data Visualization<br />MapWinGIS<br />Data Storage &<br />Data Storage<br />Data Visualization<br />SQL Database<br />
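The water balance check and the simulated-versus-observed comparison named above could look roughly like the sketch below. The slides do not say which statistics the library uses; the Nash-Sutcliffe efficiency is chosen here only as a common example, and both function names are hypothetical.

```python
from typing import Sequence


def water_balance_residual(precip: float, evap: float, runoff: float, storage_change: float) -> float:
    """Residual of P - ET - Q - dS; a value near zero indicates a closed balance."""
    return precip - evap - runoff - storage_change


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    denom = sum((o - mean_obs) ** 2 for o in observed)
    if denom == 0.0:
        raise ValueError("observed series has zero variance")
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return 1.0 - num / denom
```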
  111. 111. Two demo workflows<br /><ul><li>Get Data via WaterOneFlow
  112. 112. Terrain Processing</li></ul>Raw DEM( 1176*883 cells)<br />Sink filled DEM<br />Flow Direction<br />Strahler Network order<br />Flow Accumulation<br /> Watershed and River Network (.shp)<br /> Stream order<br /> Stream Raster<br />Watershed Grid <br />
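The terrain processing chain above goes from a sink-filled DEM to flow direction and then flow accumulation. The slides do not name the algorithm, so the sketch below assumes the common D8 scheme (each cell drains to its steepest downslope neighbour) on a DEM that has already been sink-filled. All names are hypothetical, and accumulation here counts the cell itself.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[float]]
Cell = Tuple[int, int]

# Offsets (row, col) of the eight neighbours considered by D8.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_direction(dem: Grid) -> List[List[Optional[Cell]]]:
    """For each cell, return the neighbour with the steepest downward slope, or None."""
    rows, cols = len(dem), len(dem[0])
    direction: List[List[Optional[Cell]]] = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbours are sqrt(2) cell widths away.
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope = slope
                        direction[r][c] = (nr, nc)
    return direction


def flow_accumulation(dem: Grid, direction: List[List[Optional[Cell]]]) -> List[List[int]]:
    """Count the cells draining through each cell, including the cell itself."""
    rows, cols = len(dem), len(dem[0])
    acc = [[1] * cols for _ in range(rows)]
    # Visit cells from highest to lowest so upstream totals are final before being passed on.
    cells = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)), reverse=True)
    for _, r, c in cells:
        target = direction[r][c]
        if target is not None:
            tr, tc = target
            acc[tr][tc] += acc[r][c]
    return acc
```

Cells whose accumulation exceeds a chosen threshold would then be marked as stream cells, which is one way the "Stream Raster" product in the list above can be derived.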
  113. 113. Part 2: OpenMI compliant PIHM model<br /><ul><li>PIHM input port</li></ul>----Time series data should be input in the initialization state<br /><ul><li>Time series stored as TSR
  114. 114. Elements contain pointers to TSRs
  115. 115. At each time step, elements read data from their TSRs, possibly with interpolation.</li></ul>Keep the original port<br /><ul><li>PIHM output port</li></ul>Export data at each time step<br />
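The input port described above (time series loaded once at initialization, elements holding references to them, values interpolated at each time step) can be pictured with the following sketch. It is not PIHM's C code: the class names are hypothetical, and linear interpolation with constant extrapolation at the ends is an assumption, since the slide only says interpolation is "possibly involved".

```python
from bisect import bisect_right
from typing import List, Sequence


class TimeSeries:
    """A forcing record loaded once at initialization (stand-in for a TSR)."""

    def __init__(self, times: Sequence[float], values: Sequence[float]) -> None:
        if len(times) != len(values) or len(times) == 0:
            raise ValueError("times and values must be non-empty and of equal length")
        if any(t2 <= t1 for t1, t2 in zip(times, times[1:])):
            raise ValueError("times must be strictly increasing")
        self.times: List[float] = list(times)
        self.values: List[float] = list(values)

    def value_at(self, t: float) -> float:
        """Linearly interpolate at time t, holding the end values outside the record."""
        if t <= self.times[0]:
            return self.values[0]
        if t >= self.times[-1]:
            return self.values[-1]
        i = bisect_right(self.times, t)  # index of the first time greater than t
        t0, t1 = self.times[i - 1], self.times[i]
        v0, v1 = self.values[i - 1], self.values[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


class Element:
    """A mesh element that references a shared series instead of copying it."""

    def __init__(self, precipitation: TimeSeries) -> None:
        self.precipitation = precipitation

    def forcing(self, t: float) -> float:
        return self.precipitation.value_at(t)
```

Because many elements can point at the same `TimeSeries` object, the record is stored once regardless of mesh size, which matches the pointer-based design on the slide.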
  116. 116. Part 2: OpenMI compliant PIHM model<br /><ul><li> Model Construction</li></ul>Input Data Files<br />OpenMI Wrapper (C# code)<br />PIHM Engine(C code)<br />OpenMI Wrapper (C# code)<br />NetCDF Engine(C code)<br />NetCDF4EngineDllAcess<br />PIHM Wrapper<br />PIHMEngineDllAcess<br />NetCDF Core<br />NetCDF4EngineDotNetAcess<br />PIHMEngineWrapper<br />PIHM Core<br />NetCDF4EngineWrapper<br />Hydrologic Variables XML<br /><ul><li>PIHMReadData()
  117. 117. PIHMInitialize()</li></ul>Output NetCDF files<br /><ul><li>PIHMPerformTimeStep()
  118. 118. PIHMFinish()</li></li></ul><li>Part 2: OpenMI compliant PIHM model<br />
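The engine functions listed above (`PIHMInitialize()`, `PIHMPerformTimeStep()`, `PIHMFinish()`) fit the pull-driven, time-step-based `GetValues` mechanism described for OpenMI earlier in the deck. The sketch below shows that control flow in Python with a toy linear-store engine in place of PIHM. The class and method names are hypothetical and simplified; they are not the OpenMI interfaces or the author's C# wrapper.

```python
from typing import Callable, Optional


class ToyEngine:
    """Stand-in for a wrapped model engine (not PIHM): a single linear store."""

    def __init__(self, dt: float, k: float) -> None:
        self.dt = dt
        self.k = k  # outflow coefficient per unit time
        self.time = 0.0
        self.storage = 0.0
        self.outflow = 0.0
        self.finished = False

    def initialize(self) -> None:
        self.time = 0.0
        self.storage = 0.0
        self.outflow = 0.0

    def perform_time_step(self, inflow: float) -> None:
        self.outflow = self.k * self.storage
        self.storage += (inflow - self.outflow) * self.dt
        self.time += self.dt

    def finish(self) -> None:
        self.finished = True


class LinkableComponent:
    """Wrapper that answers requests for values at a given time (request-reply)."""

    def __init__(self, engine: ToyEngine, provider: Optional[Callable[[float], float]] = None) -> None:
        self.engine = engine
        self.provider = provider  # upstream get_values, or None for no inflow
        self.engine.initialize()

    def get_values(self, requested_time: float) -> float:
        # Step the engine until it reaches the requested time, pulling the
        # input for each step from the upstream provider before stepping.
        while self.engine.time < requested_time:
            next_time = self.engine.time + self.engine.dt
            inflow = self.provider(next_time) if self.provider else 0.0
            self.engine.perform_time_step(inflow)
        return self.engine.outflow


# A constant rainfall source feeding two chained components.
upstream = LinkableComponent(ToyEngine(dt=1.0, k=0.5), provider=lambda t: 1.0)
downstream = LinkableComponent(ToyEngine(dt=1.0, k=0.5), provider=upstream.get_values)
q = downstream.get_values(3.0)
downstream.engine.finish()
upstream.engine.finish()
```

Only the last component in the chain is asked for a value; every upstream component advances because its downstream neighbour requests data, which is what makes the chain pull-driven.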
  119. 119. Part 3: Field Testing<br /><ul><li>Apply three hydrologic models to simulate rainfall-runoff processes
  120. 120. Water balance check
  121. 121. Modelled & Observed runoff comparison
  122. 122. Data Requirements
  123. 123. Geospatial data( DEM, land cover, soil).
  124. 124. Driving force: precipitation, temperature, relative humidity, solar radiation, vapor pressure, wind velocity.
  125. 125. Stream discharge or water level at outlets. </li></li></ul><li>Part 3: Field Testing<br /><ul><li>Shale Hills
  126. 126. 0.08, located in the Valley and Ridge physiographic province of central Pennsylvania.
  127. 127. Upstream channel is ephemeral.
  128. 128. 44 wells for measuring soil moisture; 4 gages.
  129. 129. The soil profile is typically silt loam, 0.6 to 2.5 meters deep. </li></li></ul><li>Part 3: Field Testing<br /><ul><li> Schuylkill watershed
  130. 130. approximately 4,962, located in Southeastern Pennsylvania.
  131. 131. approximately 209 km in length from its headwaters at Tuscarora Springs to its mouth at the Delaware River.
  132. 132. mean annual temperature: 11 °C
  133. 133. average annual precipitation:</li></ul> 109-127 cm/yr<br /><ul><li>soil types: silt loams, shaly loams</li></li></ul><li>Part 3: Field Testing<br /><ul><li> Cedar Creek
  134. 134. approximately 12,600, located mostly in Iowa and partially in Minnesota .
  135. 135. approximately 483 km in total length from its headwaters in Dodge County, Minnesota, to its confluence with the Iowa River.
  136. 136. average annual temperature: 7-10 °C
  137. 137. average annual precipitation: 600-900 mm
  138. 138. US soil taxonomy: Mollisols and Alfisols</li></li></ul><li>Part 3: Field Testing<br /><ul><li> Data Collection</li></li></ul><li>REFERENCE<br />
Deelman, E., Singh, G., Su, M.H., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Vahi, K., Berriman, G., Good, J., Laity, A., Jacob, J.C., and Katz, D.S., 2005. Pegasus: A framework for mapping complex scientific workflows onto distributed systems. Scientific Programming Journal, 13(3), pp.219-237.<br />
Gil, Y., Deelman, E., Ellisman, M., Fahringer, T., Fox, G., Gannon, D., Goble, C., Livny, M., Moreau, L., and Myers, J., 2007. Examining the challenges of scientific workflows. IEEE Computer, 40(12), pp.24-32.<br />
Hill, C., DeLuca, C., Balaji, V., Suarez, M., and Silva, A.D., 2004. The architecture of the Earth System Modeling Framework. Computing in Science and Engineering, 6, pp.18-28.<br />
Leavesley, G.H., Markstrom, S.L., Brewer, M.S., and Viger, R.J., 1996. The modular modeling system (MMS) -- the physical process modeling component of a database-centred decision support system for water and power management. Water, Air and Soil Pollution, 90, pp.303-311.<br />
Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E.A., Tao, J., and Zhao, Y., 2006. Scientific workflow management and the Kepler system. Concurrency and Computation: Practice & Experience, 18(10), pp.1039-1065.<br />
Maxwell, T. and Costanza, R., 1994. Spatial Ecosystem Modeling in a Distributed Computational Environment. In: Bergh, J.v.d., Straaten, J.v.d. (Eds.), Towards Sustainable Development: Concepts, Methods, and Policy. Island Press, Washington, D.C., pp.111-138.<br />
Oinn, T., Addis, M., Ferris, J., Marvin, D., Senger, M., Greenwood, M., Carver, T., Glover, K., Pocock, M.R., Wipat, A., and Li, P., 2004. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20(17), pp.3045-3054.<br />
Reed, M., Cuddy, S.M., and Rizzoli, A.E., 1999. A framework for modeling multiple resource management issues - an open modeling approach. Environmental Modelling and Software, 14, pp.503-509.<br />
Singh, V.P. and Woolhiser, D.A., 2002. Mathematical modeling of watershed hydrology. Journal of Hydrologic Engineering, 7, pp.270-292.<br />
Taylor, I.J., Shields, M., and Wang, D.I., 2003. Distributed P2P computing within Triana: A galaxy visualization test case. In: 17th International Parallel and Distributed Processing Symposium (IPDPS), Nice, France, 22-26th April, pp.16-27.<br />
Voinov, A., Zaslavskiy, I., Arctur, D., Duffy, C., and Seppelt, R., 2008. Community modeling, and data-model interoperability. In: Proceedings of 4th Biennial Meeting of the International Environmental Modelling and Software Society, Barcelona, Catalonia, 7-10th July, pp.2035-2047.<br />
  139. 139. Thank you!<br />