OGCE Workflow Suite for Science
         Gateways

  Suresh Marru, Raminder Singh,
 Chathura Herath & Marlon Pierce

        Indiana University
OGCE

[Diagram labels: Gateways (LEAD, GridChem, …); TeraGrid User Portal; TG GIG; Generalize, Harden, Build, Test; Gateways/E-Science Community]
Requirements from gateways

• Gateways demand scientific workflow systems
  to be:
  – Flexible
  – Dynamic
  – Interactive
  – Technology Adaptive
  – Interoperable with Emerging Computational
    Resources and their job management interfaces
OGCE Workflow Suite
• Generic Service Toolkit
   –   Tool to wrap command-line applications as web services
   –   Handles file staging & job submissions
   –   Extensible runtime for security, resource brokering & urgent computing
   –   Generic Factory service for on-demand creation of application services
• XRegistry
   – Information repository for the OGCE workflow suite
   – Register, search, retrieve & share XML documents
   – User & hierarchical group-based authorization
• XBaya
   – GUI-based tool to compose & monitor workflows
   – Extensible support for compiler plug-ins like BPEL & Jython
   – Dynamic workflow execution support to start, pause, resume and
     rewind workflow executions
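
The first Generic Service Toolkit bullet (wrapping a command-line application so it can be called as a service, with file staging handled for the caller) can be illustrated with a minimal local sketch. This is not the toolkit's API: `run_wrapped_app` and its return fields are hypothetical names, file staging is reduced to a local copy, and no web service layer or job submission is involved.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_wrapped_app(command, input_files=()):
    """Hypothetical wrapper: stage inputs, run the command, report results."""
    with tempfile.TemporaryDirectory() as workdir:
        # Stand-in for file staging: copy inputs into a scratch directory.
        for path in input_files:
            shutil.copy(path, workdir)
        # Run the command-line application inside the scratch directory.
        completed = subprocess.run(
            command, cwd=workdir, capture_output=True, text=True
        )
        # Collect the names of the files left behind by the run.
        files = sorted(p.name for p in Path(workdir).iterdir())
        return {
            "exit_code": completed.returncode,
            "stdout": completed.stdout,
            "files": files,
        }


if __name__ == "__main__":
    # The "application" here is just a Python one-liner, used for portability.
    result = run_wrapped_app(
        [sys.executable, "-c", "print('hello from the wrapped app')"]
    )
    print(result["exit_code"], result["stdout"].strip())
```

A real deployment would replace the local copy and `subprocess.run` call with remote staging and a job submission interface.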

                         OGCE Workflow Tutorial
Features
• Security
   –   Authentication and authorization
   –   Secure invocations between services
   –   Support for gateway community accounts
   –   Support for multiple user accounts
• Reliability
   – Retry job submissions and file staging
   – Fault Tolerance and Recovery service
        • Over-provisioning and migration
• Compatibility
   – Taverna, Kepler and Triana
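
The Reliability bullet on retrying job submissions and file staging can be sketched as a generic retry loop with exponential backoff. The function name `submit_with_retry` and its parameters are assumptions made for illustration; the slides do not describe how the suite actually schedules retries.

```python
import time


def submit_with_retry(submit, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call submit() until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except Exception as error:  # a real system would catch narrower errors
            last_error = error
            if attempt < max_attempts:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.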


Application Services
• Workflows are built by composing web services
  – Fortran applications are “wrapped” by an Application Factory, which
    generates a web service for the app.
     • Registers WSDL for the service with a registry
  – Each service generates a stream of notifications that log the
    service actions back to the XMC Cat Metadata Catalog.

[Diagram labels: Application Factory; App Service; Run program & publish events]
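
The factory pattern on this slide (a factory creates a service for an application, registers a description of it, and the service reports its actions) can be sketched in a few lines. Everything here is hypothetical: the `Registry` and `ApplicationFactory` classes, the dictionary used in place of a WSDL document, and the `notify` callback used in place of a real notification system.

```python
class Registry:
    """Hypothetical in-memory stand-in for a service registry."""

    def __init__(self):
        self._entries = {}

    def register(self, name, description):
        self._entries[name] = description

    def lookup(self, name):
        return self._entries[name]


class ApplicationFactory:
    """Hypothetical factory that wraps a callable as a named service."""

    def __init__(self, registry, notify):
        self.registry = registry
        self.notify = notify  # callback taking (service_name, event_text)

    def create_service(self, name, run):
        def service(*args):
            self.notify(name, "started")
            result = run(*args)
            self.notify(name, "finished")
            return result

        # A plain dict stands in for the WSDL description of the service.
        self.registry.register(name, {"service": name, "operation": "run"})
        return service


if __name__ == "__main__":
    events = []
    factory = ApplicationFactory(Registry(), lambda n, e: events.append((n, e)))
    double = factory.create_service("double", lambda x: 2 * x)
    print(double(21), events)
```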
Workflow Composition, Execution
                & Monitoring
XBaya enables users to
 construct, share, execute
 and monitor sequences of
 tasks running on anything
 from their local
 workstations to high-end
 compute resources.
Service Monitoring via Events
• The service output is a stream of events
    –   I am running your request.
    –   I have started to move your input files.
    –   I have all the files.
    –   I am running your application.
    –   The application is finished.
    –   I am moving the output to your file space.
    –   I am done.
• These are automatically generated
  by the service using a
  distributed event system
  (WS-Eventing / WS-Notification)
    – Topic-based pub-sub system with
      a well-known “channel”.

[Diagram labels: Application Service Instance (events 1 to 6); Notification Channel; listener (Subscribe, Topic=x); publisher]
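
The topic-based publish/subscribe idea on this slide can be sketched in memory as follows. This is an illustration of the pattern only, not of WS-Eventing or WS-Notification; the `NotificationChannel` class and the topic name are hypothetical.

```python
from collections import defaultdict


class NotificationChannel:
    """Hypothetical in-memory channel: listeners subscribe to a topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, listener):
        self._subscribers[topic].append(listener)

    def publish(self, topic, message):
        # Deliver only to listeners that subscribed to this exact topic.
        for listener in self._subscribers.get(topic, ()):
            listener(message)


SERVICE_EVENTS = [
    "I am running your request.",
    "I have started to move your input files.",
    "I have all the files.",
    "I am running your application.",
    "The application is finished.",
    "I am moving the output to your file space.",
    "I am done.",
]

if __name__ == "__main__":
    channel = NotificationChannel()
    channel.subscribe("x", print)  # the listener subscribes with Topic=x
    for event in SERVICE_EVENTS:   # the service instance acts as publisher
        channel.publish("x", event)
```

Because listeners are keyed by topic, a workflow monitor can follow one service instance without seeing events from others on the same channel.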
OGCE Workflow Tools




WRF-Static running
  on Tungsten
Workflow Suite Architecture
XML Metadata Catalog (XMC Cat)
                         Taming Complex Scientific Metadata Schemas

“A significant need exists in many disciplines for long-term,
   distributed, and stable data and metadata repositories”
   –   NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure

“Metadata is key to being able to share results”
  –    UK e-Science Core Programme Study

[Diagram labels: Portal; Workflow; Message Bus; Metadata Catalog; Compose Workflow; Monitor Workflows; Query for Workflows; Search Results; Notifications; Record Workflow Configuration and Inputs; Record Workflow Outputs; Intermediate Results]

                                 More Info: Scott Jensen
Applications
• LEAD
   – Lower entry barrier to using weather analysis tools
   – Improve detection, analysis & prediction of mesoscale weather
• Motif-Network
   – Transformation of sequenced genomes to “domain-space”
• Cyber-Infrastructure Evaluation
   – Performance evaluation of future supercomputer architectures
• ADAM
   – Algorithms for feature extraction, data normalization, classification
     and normalization
• GridChem
   – Molecular Chemistry Grid helping researchers run chemistry
     applications in a grid environment

LEAD: A Weather Forecasting Workflow (1/2)
[Workflow diagram: numbered components and their inputs, as far as they can be recovered from the figure]
   1.  Terrain Preprocessor (input: terrain data files)
   2.  WRF Static Preprocessor (input: surface, terrestrial data files)
   3.  3D Model Data Interpolator (Initial Boundary Conditions)
   4.  88D Radar Re-mapper (input: radar data, Level II)
   5.  NIDS Radar Re-mapper (input: radar data, Level III)
   6.  Satellite Data Re-mapper (input: satellite data)
   7.  ADAS
   8.  ADAM (data mining: look for storm signature)
   9.  3D Model Data Interpolator (Lateral Boundary Conditions)
   10. ARPS Ensemble Generator
   11. ARPS to WRF Data Interpolator
   12. WRF (drawn as several stacked boxes)
   13. WRF to ARPS Data Interpolator
   14. ARPS Plotting Program
   15. IDV
Other inputs: NAM, RUC, GFS data; surface data, upper air mesonet data and wind profiler data
Annotations: run once per forecast region; repeat periodically for new data; visualization on user's request; triggered if a storm is detected
Legend: Static data, Real time data, Initialization, Analysis, Data Mining, Forecast, Visualization
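
A workflow like this one is a dependency graph, and an engine has to run each component only after its predecessors have finished. The sketch below derives one valid execution order with Python's standard `graphlib` module (Python 3.9+). The component names come from the slide, but the dependency edges in `DEPENDS_ON` are assumptions chosen for illustration; the arrows of the original figure are not recoverable from the text.

```python
from graphlib import TopologicalSorter

# Maps each component to the components it waits for.
# ASSUMPTION: these edges are illustrative, not read from the LEAD diagram.
DEPENDS_ON = {
    "ADAS": {
        "88D Radar Re-mapper",
        "NIDS Radar Re-mapper",
        "Satellite Data Re-mapper",
    },
    "ARPS to WRF Data Interpolator": {"ADAS"},
    "WRF": {"ARPS to WRF Data Interpolator", "WRF Static Preprocessor"},
    "WRF to ARPS Data Interpolator": {"WRF"},
    "ARPS Plotting Program": {"WRF to ARPS Data Interpolator"},
}


def execution_order(depends_on):
    """Return one order in which every component follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, component in enumerate(execution_order(DEPENDS_ON), start=1):
        print(step, component)
```

Components with no dependency between them, such as the three re-mappers here, may appear in any relative order and could run in parallel.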
LEAD: A Weather Forecasting Workflow (2/2)




          WRF-Static running
            on Tungsten




Motif-Network: Whole Genome
               workflow
• Domain webs of large genomes
   – Input list of amino acid sequences
   – Identify all known domains
   – Construct webs
[Diagram labels: Ensemble-type processing (minimal network reqs), capacity-type computing; Parallel processing, capability-type computing]




                          Jeff Tilson, RENCI
CI: Execute Sub-Workflow

• Input a campaign step filename
• Execute GAMESS per step
  specification




                      Jeff Tilson, RENCI
Example: “Optimal” Weather
Prediction Using Dynamic Adaptivity
[Diagram labels: Streaming Observations; Data Mining; Storms Forming; Forecast Model; Instrument Steering; Refine forecast grid; On-Demand Grid Computing]
Analyze & Predict

Discover & Visualize

Research & Reproducibility

Education & Outreach
Live Demo & Questions?





Ogce Workflow Suite Tg09
