OGCE Workflow Toolkit for Multi-Scale Science Applications Suresh Marru Pervasive Technology Institute Indiana University ODI Gateway/Pipeline Evaluation NOAO Jan 21st 2010
Provide temporary storage for the pipeline and for the science gateway, interface with archive, and data transfer from ODI and to compute resources
IU Bloomington data center
NSF funded project to create a high throughput 535 TB Lustre storage solution
Short term storage for applications with high bandwidth needs
IU added a wide area version in 2008
Enhanced wide area security
Data Capacitor Architecture Clients Metadata Servers Object Storage Object Storage Force-10 Switch Twenty-four object storage servers attached to shared storage
Massive Data Storage System MDSS Architecture
Science Gateway A Science Gateway is a community-developed set of tools, applications, and data that is integrated via a portal or a suite of applications, usually in a graphical user interface, that is customized to meet the needs of the targeted community. More Info - http://www.teragrid.org/gateways/
Gateways Advantages Increase access To instruments Increase capabilities To analyze data Improve workforce development For underserved populations Increase outreach Increase public awareness Public sees value in investments in large facilities
There are approximately 30 gateways using the TeraGrid
Example Gateway Accomplishments LEAD - access to radar data NVO – access to sky surveys OOI – access to sensor data PolarGrid – access to polar ice sheet data SIDGrid – analysis tools GridChem – developing multiscale coupling How would this have been done before gateways? How many details do we want each individual scientist to need to know?
TeraGrid Advantages and Challenges What’s different when the resource doesn’t belong just to me? Resource discovery Accounting Security Proposal-based requests for resources (peer-reviewed access) Code scaling and performance numbers Justification of resources Gateway citations Tremendous benefits at the high end, but even more work for the developers Potential impact on science is huge Small number of developers can impact thousands of scientists But need a way to train and fund those developers and provide them with appropriate tools
Example Gateway: LEAD
Weather is Local, High-Impact, Heterogeneous and Rapidly Evolving…Yet Our Technologies and Thinking are Static Rain and Snow Fog Rain and Snow Snow and Freezing Rain Intense Turbulence Severe Thunderstorms
LEAD Dynamic Adaptive Infrastructure Storms Forming Forecast Model Streaming Observations Data Mining Instrument Steering Refine forecast grid
NSF Engineering Research Center for Collaborative Adaptive Sensing of the Atmosphere (CASA) UMass/Amherst, OU, CSU, UPRM Concept: inexpensive, phased array Doppler radars on cell towers and buildings Dynamically adaptive sensing of multiple targets while simultaneously meeting multiple end-user needs
LEAD Workflow Requirements Run jobs on-demand on TeraGrid. Deadline driven workflows (severe weather tracking) Users ranging from 8th grade students to seasoned researchers. Run jobs on Multiple TeraGrid resources to decrease turn-around time. Must be able to integrate to Portal with very user friendly web interface.
Workflow Survey in 2003(http://www.extreme.indiana.edu/swf-survey/)
High Level LEAD Architecture Workflow graph Application services Compute Engine User Portal Workflow Engine Fault Tolerance & scheduler Event Notification Bus Portal server MyLEAD Agent service Data Management Service Data Catalog service Providence Collection service MyLEAD User Metadata catalog Data Storage
LEAD Pioneering Technology
LEAD Scientists and Educational Interactions Lowering the barrier for using complex end-to-end weather technologies Democratize Empower Facilitate End Users Developers Researcherss
Open Grid Computing Environments Generalize, Harden, Build Test
Flexible Layered Service Oriented Architecture User Interactions Other Clients XBaya GUI Web Portal XBaya Core Event Bus Middleware Services GFac Services Workflow Engine (ODE) XRegistry XMCCat Metadata Catalog Compute & Data Resources Computational Cloud Local Lab Resources Computational Grids
OGCE Workflow Suite Generic Service Toolkit Tool to wrap command-line applications as web services Handles file staging & job submission and monitoring Extensible runtime for security, resource brokering & urgent computing Generic Factory service for on-demand creation of application services XRegistry Information repository for the OGCE workflow suite Register, search, retrieve & share XML documents User & hierarchical group based authorization XBaya GUI based tool to compose & monitor workflows Extensible support for compiler plug-ins like BPEL, Jython, SCUFL Dynamic Workflow Execution support to start, pause, resume, rewind of workflow executions Apache ODE Scientific Workflow Extensions XBaya GUI integration for BPEL Generation Asynchronous support for long running workflows Instrumented with fine grained monitoring Eventing System Supports both WS-Eventing and WS-Notification Standards Very scalable Persistent Message Box for clients behind firewalls and with intermittent network glitches.
Generic Application Service Factory
Application Wrapper Framework c Application Factory
Scientific Applications are wrapped into web services by filling in a web-form.
The Application Factory generates a web service for each application with I/O interfaces.
Registers WSDL for the service with a registry
Each service generates a stream of notifications that log the service actions back to the XMCCat Metadata Catalog, user monitoring, and provenance tracking tools
App Service Run program & publish events
Service Monitoring via Events 1 2 3 4 5 6
The service output is a stream of events
I am running your request
I have started to move your input files
I have all the files
I am running your application
The application is finished
I am moving the output to you file space
I am done
These are automatically generated by the service using a distributed event system(WS-Eventing / WS-Notification)
Topic based pub-sub system witha well known “channel”
Application Service Instance Notification Channel x x publisher listener
Interoperable XBaya Workflow Architecture BPEL 1.1 BPEL 2.0 SCUFL Abstract DAG Model Composition and Monitoring Python Dynamic Enactor/Interpreter Jython Based Enactor GPEL Engine Apache ODE Engine Taverna Python Runtime Message Bus
WS-BPEL Business Process Execution Language for Web Services (WS-BPEL) De-facto standard for specifying web service based business processes and service compositions Basic activities Invoke, Receive, Assign.. Structured activities Sequence, Flow, ForEach,..
Workflow Composition, Execution & Monitoring XBaya enables users to construct, share, execute and monitor sequence of tasks executing on their local workstations to high-end compute resources.
GPEL Grid Process Execution Language BPEL4WS based home grown research workflow engine Supports a subset of BPEL4WS 1.1 One of the very early adaptations of BPEL efforts Specifically designed for eScience Usage Long running workflow support Decoupled client
Benefits of Porting to Apache ODE
XML Metadata Catalog (XMC Cat) Taming Complex Scientific Metadata Schemas “A significant need exists in many disciplines for long-term, distributed, and stable data and metadata repositories”
NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure
“Metadata is key to being able to share results”
UK e-Science Core Programme Study
More Info: Scott Jensen, Beth Plale
Simple Recovery Architecture Portal BPEL Workflow Engine Application Performance Models Fault Tolerance/ Recovery Service Resource Reliability Models Application Service OVP/ RST/ MIG NWS, MDS BQP Notification Service Deadline & Success Probability 36 36
OGCE Workflow Usage Flow Scientist/Application provider registers application description with Registry Service. Workflow Author constructs the workflow with multiple wrapped application services. Workflow is compiled and deployed to the ODE workflow Engine. Workflow inputs are captured by XBaya and workflow is launched to ODE. Workflow system and possibly some services publish notifications to the Message bus reporting the progress of the workflow. XBaya monitoring system listens to notifications and color the workflow components to present workflow progress.