Ph.D. Proposal<br />March 1st, 2010<br />The role of Workflow Engine and Coupling Framework in the Development of Hydrologic Modeling System<br />Presenter: Bo Lu<br />Advisor: Dr. Michael Piasecki<br />Department of Civil, Architectural & Environmental Engineering<br />Drexel University<br />
Background<br /><ul><li>The focus of the modeling community has shifted.
A plethora of hydrologic models have been created.</li></ul>Singh and Woolhiser (2002) summarized more than 70 hydrologic models, such as <br /><ul><li> Stanford Watershed Model (SWM)/Hydrologic Simulation Package Fortran IV (HSPF)
CLOUD …</li></ul>The current focus is on developing holistic models that couple hydrologic processes with chemical, physical, and biological processes.<br />
Background<br /><ul><li>Problem: building such models is beyond the knowledge of any individual researcher.
The paradigm of Community Modeling has therefore emerged.</li></ul>HIGHLIGHTS<br /><ul><li> developing complex, evolving, and adaptable modeling systems through a collaborative partnership (Voinov et al., 2008).
emphasizing the sharing of methods, model codes, and data.
providing infrastructure that helps community members communicate, publish codes/data, and develop models.</li></ul>ADVANTAGES<br /><ul><li> improving the efficiency of model development.
avoiding wasted effort on programming functionally similar models or tools.</li></li></ul><li>Background<br /><ul><li>Community Modeling System</li></ul>Category 1: a collection of standalone models and tools<br />e.g. Chesapeake Bay community modeling system<br /><ul><li>an open source system of watershed and estuary models that specifically serves the study of the Chesapeake Bay region.</li></ul>Models are independent executable programs that have not yet been made to operate within one framework.<br /><ul><li>an assembly of watershed, hydrodynamic, biogeochemical models and additional modeling tools.
Chesapeake Bay Environmental Observatory (CBEO) data.</li></ul>Category 2: a dominant model with modular software architecture<br />e.g. Weather Research and Forecasting model (WRF)<br /><ul><li>designed for mesoscale numerical weather prediction.</li></ul>Single-source-code approach: only fully compliant code can be integrated into the model.<br /><ul><li>features a number of dynamical cores, a data assimilation system, and a modular architecture that allows parallel computation and model extensibility
involves the efforts of 16 leading groups.</li></li></ul><li>Background<br />Category 3: a generic component-based coupling framework<br />e.g. Community Surface Dynamics Modeling System (CSDMS)<br /><ul><li> Common Component Architecture (CCA): a set of tools and standards for modularizing component models.
BABEL: a language-interoperability tool that generates “glue code” to bridge communication between component models written in different programming languages.
Ccaffeine: a Graphical User Interface (GUI) for linking component models.
a variety of terrestrial, marine, coastal and hydrological models.</li></ul>Coupling frameworks provide technical support for the construction of CMSs. <br /> They make it easier and more flexible to integrate new/legacy models and to build multi-component model systems. <br />Some coupling frameworks designed for community modeling:<br /><ul><li> Earth System Modeling Framework (ESMF)
a pipeline of individual data operations, transformations, and analysis steps. </li></ul>A step can be<br /><ul><li> a software program or a model carrying out a specific computation task.
a web service (or a set of web services) accessing online data or invoking remote modeling.
a service distributing computations in a high-performance computing environment.</li></ul>What are workflow engines?<br /> Software applications that facilitate composing, executing, archiving and sharing scientific workflows.<br />
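The notion of a workflow as a pipeline of steps can be sketched in a few lines of Python. This is only an illustrative sketch, not how TRIDENT or any other workflow engine is implemented: the names `Step` and `run_workflow`, the three sample steps, and the sample input are all hypothetical, and the list of executed step names is a simplified stand-in for what an engine would archive.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Step:
    """One workflow step: a named operation applied to the data in the pipeline."""
    name: str
    run: Callable[[Any], Any]


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Run the steps in order, feeding each step's output to the next one.

    Returns the final result and the ordered list of executed step names.
    """
    executed = []
    for step in steps:
        data = step.run(data)
        executed.append(step.name)
    return data, executed


# Hypothetical steps: parse raw text, convert inches to millimetres, sum up.
steps = [
    Step("parse", lambda text: [float(v) for v in text.split(",")]),
    Step("inches_to_mm", lambda values: [v * 25.4 for v in values]),
    Step("total", lambda values: sum(values)),
]

result, log = run_workflow(steps, "0.5,1.0,0.25")
print(round(result, 2), log)
```

Each step here is a local function, but under the same assumption a step could equally wrap a web service call or a job submitted to a computing cluster, since the runner only relies on the step accepting the previous output.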
When an existing model is modularized for OpenMI and TRIDENT, will its performance be affected?
In which of these two environments will the same model perform better?
Is either approach really suitable for creating a CMS, or should approaches currently being taken by systems such as CSDMS be favored?
In using a workflow engine, how difficult is it to create an entire code segment library that can be easily assembled and disassembled using building blocks that are as fine-grained as a specific method?</li></li></ul><li>Objectives<br /><ul><li>Examine the performance of TRIDENT and OpenMI in modularizing and coupling models, and demonstrate their capabilities in building CMSs.
Create a library of component models, so-called activities in TRIDENT, for hydrologic modeling and model pre-/post-processing.
Establish a standard system for defining variable data types in TRIDENT.
Build workflow sequences that represent two distinct hydrologic models:
A hydrologic model integrated from component models that simulate separate hydrologic processes
Create an OpenMI-compliant PIHM model, along with a NetCDF module serving as data ports. </li></li></ul><li>Literature Review<br /><ul><li>Workflow Engine
“Workflow” was first used in the business domain</li></ul>“The automation of a business process, in whole or parts, where documents, information or tasks are passed from one participant to another to be processed, according to a set of procedural rules” <br /> ----Workflow Management Coalition (http://www.wfmc.org)<br /><ul><li>In recent years, “workflow” has been introduced into scientific and engineering domains </li></ul> A paradigm for representing and managing complex distributed scientific computations (Gil et al., 2007). <br /><ul><li>Workflow engines aim to streamline the creation and execution of scientific workflows, so that scientists can design, execute, archive, share and re-run analytical procedures with more ease.</li></li></ul><li>Literature Review<br /><ul><li>Popular workflow engines
high-performance computing, parallel or concurrent execution, and distributed computations in the GRID environment.
Capturing provenance. Provenance specifies who, how, what and which resources are used in the workflow, and describes the derivation flow of data products.
Sharing workflows through publication mechanisms or repositories. For example, publishing workflows on the myExperiment web site (http://www.myexperiment.org).
Composing workflows by drag-and-drop on a GUI. Output of one workflow can be mapped to input of multiple components.
Automatic and holistic execution without any external intervention. Alternatively, interactive workflows can also be created. </li></li></ul><li>Literature Review<br /><ul><li>Coupling Framework
Model reusability</li></ul>How to reuse legacy codes? <br />How to standardize the way of programming new models?<br /><ul><li>Model integration</li></ul>How to improve the modularity and portability of models?<br />How to link models and bridge their communications? <br /><ul><li>Efficiency of model development</li></ul> We must be more efficient about the way we apply science through modeling, so as to leave sufficient time to do science (Argent, 2004). <br />
Literature Review<br /><ul><li>Some popular Coupling Frameworks
Spatial Modeling Environment (SME, Maxwell and Costanza, 1994)
Modular Modeling System (MMS, Leavesley et al., 1996)
Dynamic Information Architecture System (DIAS, Campbell et al., 1998)
Interactive Component Modeling System (ICMS, Reed et al., 1999)
Standard interfaces or protocols for modularizing models.
A set of development tools or classes that facilitate standardizing legacy codes.
A workbench for model linkage, execution and management. A visual canvas is usually provided that allows linking models by drag-and-drop.
An extensible library of core components that encapsulate scientific algorithms and methods.
A set of tools carrying out common tasks, such as data visualization, basic data analysis, and data operations that deal with temporal/spatial interpolation.</li></li></ul><li>TRIDENT<br /><ul><li> TRIDENT Architecture</li></ul>workflows(.twp)<br />Supported Services<br />Interactive Execution Service<br />workflows(.xoml)<br />Provenance Recording Service<br />Activities(.dll)<br />Standard Classes<br />Schedule Execution Service <br />Workflows (.wfl)<br />myExperiment website<br />Publish : workflows<br />Message Passing Service <br />Workflow Composer<br />ManagementStudio<br />Workflow Application<br />WORD Add-in<br /><ul><li>Composing, executing, monitoring and recording workflows
A loosely-coupled hydrologic model in TRIDENT</li></li></ul><li>Part 1:Workflow engine embedded HMS<br /><ul><li> Data Access Library</li></ul>Variable Semantic Checking<br />Get Web Services in Box<br />Get Data Via WaterOneFlow<br />HIS Central Metadata Web Service<br />Get Variable Catalog in Box<br />Variable Datatype Checking<br />WaterML File Parser<br />Read NetCDF File<br />Get Data From Local Repository<br />Read Excel File<br />Read CSV File<br />Access SQL Database<br />WaterOneFlow Web Service<br />SQL Database<br />
Temporal Extent<br />Variable Semantic Checking…<br />Get Web Services In Box<br />HIS Central Metadata WS<br />Web Service IDs<br />Time Series Data/MetaData<br /> Updated Variable list<br />Get Variable Catalog In Box<br />Variable Metadata<br />Get Time Series Data<br />WaterOneFlowWS<br />Updated Variable Metadata<br />WaterML<br />Parse<br />Part 1:Workflow engine embedded HMS<br /><ul><li>Get Data Via WaterOneFlow workflow </li></ul> Variables list (prep,temp…)<br /> Geographical Extent(lat/long)<br />Ontology Dictionary<br />UI<br />Variable DataType Filter<br />UI<br />
Part 1:Workflow engine embedded HMS<br /><ul><li> Data Processing Library</li></ul>Unit Converter WS<br />Local Unit Converter<br />Unit Converter<br />Linear Interpolation<br />Temporal Interpolation<br />Time Series Data Process<br />Unit Converter WS<br />Polynomial Interpolation<br />Spatial Interpolation<br />Kriging Interpolation<br />Inverse Distance Weighted Interpolation<br />Geospatial data Processing WS<br />WPS Geospatial Processor<br />Geospatial Data Process<br />DEM Processing <br />…<br />Fill Sink<br />Watershed Delineation<br />Local Geospatial Processor<br />Vector Processing <br />…<br />Polygon2Polyline<br />Triangulation<br />
At each time step, elements access data from the TSR, with interpolation operations possibly involved.</li></ul>Keep the original port<br /><ul><li>PIHM output port</li></ul>Export data at each time step<br />
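When a model time step falls between two observation times, the value requested at that step has to be interpolated. Below is a minimal sketch of linear temporal interpolation, assuming the series is held as two plain Python lists sorted by time. The function name `interpolate_at` and the sample data are hypothetical and do not reflect the actual TSR, PIHM, or OpenMI interfaces.

```python
from bisect import bisect_left
from typing import Sequence


def interpolate_at(times: Sequence[float], values: Sequence[float], t: float) -> float:
    """Linearly interpolate a time series at time t.

    `times` must be sorted in ascending order. A request outside the
    covered range raises ValueError instead of extrapolating.
    """
    if len(times) != len(values) or not times:
        raise ValueError("times and values must be non-empty and equally long")
    if t < times[0] or t > times[-1]:
        raise ValueError("t lies outside the time series")
    i = bisect_left(times, t)      # first index with times[i] >= t
    if times[i] == t:              # exact hit, no interpolation needed
        return values[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical hourly series (seconds, values) queried at half-hour offsets.
obs_times = [0.0, 3600.0, 7200.0]
obs_values = [0.0, 2.0, 1.0]
print(interpolate_at(obs_times, obs_values, 1800.0))
print(interpolate_at(obs_times, obs_values, 5400.0))
```

Refusing to extrapolate is a design choice in this sketch; a coupled model might instead hold the last known value or request more data from the provider.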
US soil taxonomy: Mollisols and Alfisols</li></li></ul><li>Part 3: Field Testing<br /><ul><li> Data Collection</li></li></ul><li>REFERENCE<br />Deelman, E., Singh, G., Su, M.H., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Vahi, K., Berriman, G., Good, J., Laity, A., Jacob, J.C., and Katz, D.S., 2005. Pegasus: A framework for mapping complex scientific workflows onto distributed systems. Scientific Programming Journal, 13(3), pp.219-237.<br />
Gil, Y., Deelman, E., Ellisman, M., Fahringer, T., Fox, G., Gannon, D., Goble, C., Livny, M., Moreau, L., and Myers, J., 2007. Examining the challenges of scientific workflows. IEEE Computer, 40(12), pp.24-32.<br />
Hill, C., DeLuca, C., Balaji, V., Suarez, M., and Silva, A.D., 2004. The architecture of the Earth System Modeling Framework. Computing in Science and Engineering, 6, pp.18-28.<br />
Leavesley, G.H., Markstrom, S.L., Brewer, M.S., and Viger, R.J., 1996. The modular modeling system (MMS) -- the physical process modeling component of a database-centred decision support system for water and power management. Water, Air and Soil Pollution, 90, pp.303-311.<br />
Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E.A., Tao, J., and Zhao, Y., 2006. Scientific workflow management and the Kepler system. Concurrency and Computation: Practice & Experience, 18(10), pp.1039-1065.<br />
Maxwell, T. and Costanza, R., 1994. Spatial Ecosystem Modeling in a Distributed Computational Environment. In: Bergh, J.v.d., Straaten, J.v.d. (Eds.), Towards Sustainable Development: Concepts, Methods, and Policy. Island Press, Washington, D.C., pp.111–138.<br />
Oinn, T., Addis, M., Ferris, J., Marvin, D., Senger, M., Greenwood, M., Carver, T., Glover, K., Pocock, M.R., Wipat, A., and Li, P., 2004. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20(17), pp.3045-3054.<br />
Reed, M., Cuddy, S.M., and Rizzoli, A.E., 1999. A framework for modeling multiple resource management issues - an open modeling approach. Environmental Modelling and Software, 14, pp.503-509.<br />
Singh, V.P. and Woolhiser, D.A., 2002. Mathematical modeling of watershed hydrology. Journal of Hydrologic Engineering, 7, pp.270-292.<br />
Taylor, I.J., Shields, M., and Wang, D.I., 2003. Distributed P2P computing within Triana: A galaxy visualization test case. In: 17th International Parallel and Distributed Processing Symposium (IPDPS), Nice, France, 22-26th April, pp.16-27.<br />
Voinov, A., Zaslavskiy, I., Arctur, D., Duffy, C., and Seppelt, R., 2008. Community modeling, and data-model interoperability. In: Proceedings of 4th Biennial Meeting of International Environmental Modelling and Software Society, Barcelona, Catalonia, 7-10th July, pp.2035-2047.<br />