Ogce Workflow Suite


Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Scientific workflow requirements challenge WS-BPEL
  • Ogce Workflow Suite

    1. 1. OGCE Workflow Toolkit for Multi-Scale Science Applications<br />Suresh Marru<br />Pervasive Technology Institute<br />Indiana University<br />ODI Gateway/Pipeline Evaluation <br />NOAO<br />Jan 21st 2010<br />
    2. 2. TeraGrid<br />
    3. 3. Data Capacitor<br /><ul><li>Provide temporary storage for the pipeline and for the science gateway, interface with archive, and data transfer from ODI and to compute resources
    4. 4. IU Bloomington data center
    5. 5. NSF funded project to create a high throughput 535 TB Lustre storage solution
    6. 6. Short term storage for applications with high bandwidth needs
    7. 7. IU added a wide area version in 2008
    8. 8. Enhanced wide area security</li></li></ul><li>Data Capacitor Architecture<br />Clients<br />Metadata Servers<br />Object Storage<br />Object Storage<br />Force-10 Switch<br />Twenty-four object storage servers attached to shared storage<br />
    9. 9. Massive Data Storage System<br />MDSS Architecture<br />
    10. 10. <ul><li>Deep
    11. 11. Wide
    12. 12. Open</li></li></ul><li>Science Gateway<br />A Science Gateway is a community-developed set of tools, applications, and data that is integrated via a portal or a suite of applications, usually in a graphical user interface, that is customized to meet the needs of the targeted community.<br />More Info - http://www.teragrid.org/gateways/ <br />
    13. 13. Gateways Advantages<br />Increase access<br />To instruments<br />Increase capabilities<br />To analyze data<br />Improve workforce development<br />For underserved populations<br />Increase outreach<br />Increase public awareness<br />Public sees value in investments in large facilities<br />
    14. 14. There are approximately 30 gateways using the TeraGrid<br />
    15. 15. Example Gateway Accomplishments<br />LEAD - access to radar data<br />NVO – access to sky surveys<br />OOI – access to sensor data<br />PolarGrid – access to polar ice sheet data<br />SIDGrid – analysis tools<br />GridChem – developing multiscale coupling<br />How would this have been done before gateways?<br />How many details do we want each individual scientist to need to know?<br />
    16. 16. TeraGrid Advantages and Challenges<br />What’s different when the resource doesn’t belong just to me?<br />Resource discovery<br />Accounting<br />Security<br />Proposal-based requests for resources (peer-reviewed access)<br />Code scaling and performance numbers<br />Justification of resources<br />Gateway citations<br />Tremendous benefits at the high end, but even more work for the developers<br />Potential impact on science is huge<br />Small number of developers can impact thousands of scientists<br />But need a way to train and fund those developers and provide them with appropriate tools<br />
    17. 17. Example Gateway: LEAD<br />
    18. 18. Weather is Local, High-Impact, Heterogeneous and Rapidly Evolving…Yet Our Technologies and Thinking are Static<br />Rain and<br />Snow<br />Fog<br />Rain and<br />Snow<br />Snow and<br />Freezing Rain<br />Intense<br />Turbulence<br />Severe<br />Thunderstorms<br />
    19. 19. LEAD Dynamic Adaptive Infrastructure <br />Storms Forming<br />Forecast Model<br />Streaming<br />Observations<br />Data Mining<br />Instrument Steering<br />Refine forecast grid<br />
    20. 20. NSF Engineering Research Center for Collaborative Adaptive Sensing of the Atmosphere (CASA)<br />UMass/Amherst, OU, CSU, UPRM<br />Concept: inexpensive, phased array Doppler radars on cell towers and buildings<br />Dynamically adaptive sensing of multiple targets while simultaneously meeting multiple end-user needs<br />
    21. 21. LEAD Workflow Requirements<br />Run jobs on-demand on TeraGrid.<br />Deadline driven workflows (severe weather tracking)<br />Users ranging from 8th grade students to seasoned researchers.<br />Run jobs on Multiple TeraGrid resources to decrease turn-around time.<br />Must be able to integrate to Portal with very user friendly web interface.<br />
    22. 22. Workflow Survey in 2003(http://www.extreme.indiana.edu/swf-survey/)<br />
    23. 23. High Level LEAD Architecture<br />Workflow graph<br />Application <br />services<br />Compute Engine<br />User Portal<br />Workflow<br />Engine<br />Fault<br />Tolerance<br />& scheduler<br />Event Notification Bus<br />Portal<br />server<br />MyLEAD <br />Agent<br />service<br />Data<br />Management<br />Service<br />Data<br />Catalog<br />service<br />Providence<br />Collection<br />service<br />MyLEAD User<br />Metadata<br />catalog<br />Data Storage<br />
    24. 24. LEAD Pioneering Technology<br />
    25. 25. LEAD Scientists and Educational Interactions<br />Lowering the barrier for using complex end-to-end weather technologies<br />Democratize<br />Empower<br />Facilitate<br />End Users<br />Developers<br />Researcherss<br />
    26. 26.
    27. 27. Layered Architecture<br />
    28. 28. Open Grid Computing Environments<br />Generalize, Harden, Build Test<br />
    29. 29. Flexible Layered Service Oriented Architecture<br />User Interactions<br />Other Clients<br />XBaya GUI<br />Web Portal<br />XBaya Core<br />Event Bus<br />Middleware Services<br />GFac Services<br />Workflow Engine (ODE)<br />XRegistry<br />XMCCat Metadata Catalog<br />Compute & Data Resources<br />Computational Cloud <br />Local Lab Resources<br />Computational Grids<br />
    30. 30. OGCE Workflow Suite<br />Generic Service Toolkit<br />Tool to wrap command-line applications as web services<br />Handles file staging & job submission and monitoring<br />Extensible runtime for security, resource brokering & urgent computing<br />Generic Factory service for on-demand creation of application services<br />XRegistry<br />Information repository for the OGCE workflow suite<br />Register, search, retrieve & share XML documents<br />User & hierarchical group based authorization<br />XBaya<br />GUI based tool to compose & monitor workflows<br />Extensible support for compiler plug-ins like BPEL, Jython, SCUFL<br />Dynamic Workflow Execution support to start, pause, resume, rewind of workflow executions<br />Apache ODE Scientific Workflow Extensions<br />XBaya GUI integration for BPEL Generation<br />Asynchronous support for long running workflows<br />Instrumented with fine grained monitoring<br />Eventing System<br />Supports both WS-Eventing and WS-Notification Standards<br />Very scalable<br />Persistent Message Box for clients behind firewalls and with intermittent network glitches.<br />
    31. 31. Generic Application Service Factory<br />
    32. 32. Application Wrapper Framework<br />c<br />Application Factory<br /><ul><li>Scientific Applications are wrapped into web services by filling in a web-form.
    33. 33. The Application Factory generates a web service for each application with I/O interfaces.
    34. 34. Registers WSDL for the service with a registry
    35. 35. Each service generates a stream of notifications that log the service actions back to the XMCCat Metadata Catalog, user monitoring, and provenance tracking tools</li></ul>App <br />Service<br />Run program<br />& publish events<br />
    36. 36. Service Monitoring via Events<br />1<br />2<br />3<br />4<br />5<br />6<br /><ul><li>The service output is a stream of events
    37. 37. I am running your request
    38. 38. I have started to move your input files
    39. 39. I have all the files
    40. 40. I am running your application
    41. 41. The application is finished
    42. 42. I am moving the output to you file space
    43. 43. I am done
    44. 44. These are automatically generated by the service using a distributed event system(WS-Eventing / WS-Notification)
    45. 45. Topic based pub-sub system witha well known “channel”</li></ul>Application<br />Service<br />Instance<br />Notification<br />Channel<br />x<br />x<br />publisher<br />listener<br />
    46. 46. Workflow Composer<br />
    47. 47. Interoperable XBaya Workflow Architecture<br />BPEL 1.1<br />BPEL 2.0<br />SCUFL<br />Abstract DAG Model<br />Composition and Monitoring<br />Python<br />Dynamic Enactor/Interpreter<br />Jython<br />Based Enactor<br />GPEL <br />Engine<br />Apache <br />ODE Engine<br />Taverna<br />Python Runtime<br />Message Bus<br />
    48. 48. WS-BPEL<br />Business Process Execution Language for Web Services (WS-BPEL)<br />De-facto standard for specifying web service based business processes and service compositions<br />Basic activities<br />Invoke, Receive, Assign..<br />Structured activities<br />Sequence, Flow, ForEach,..<br />
    49. 49. Workflow Composition, Execution & Monitoring<br /> XBaya enables users to construct, share, execute and monitor sequence of tasks executing on their local workstations to high-end compute resources.<br />
    50. 50. GPEL<br />Grid Process Execution Language<br />BPEL4WS based home grown research workflow engine<br />Supports a subset of BPEL4WS 1.1 <br />One of the very early adaptations of BPEL efforts<br />Specifically designed for eScience Usage<br />Long running workflow support<br />Decoupled client<br />
    51. 51. Benefits of Porting to Apache ODE<br />
    52. 52. XML Metadata Catalog (XMC Cat)<br />Taming Complex Scientific Metadata Schemas<br />“A significant need exists in many disciplines for long-term, distributed, and stable data and metadata repositories”<br /><ul><li>NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure</li></ul>“Metadata is key to being able to share results”<br /><ul><li>UK e-Science Core Programme Study</li></ul>More Info: Scott Jensen, Beth Plale<br />
    53. 53. Simple Recovery Architecture<br />Portal<br />BPEL<br />Workflow <br />Engine<br />Application <br />Performance <br />Models<br />Fault Tolerance/<br />Recovery Service<br />Resource <br />Reliability <br />Models<br />Application <br />Service<br />OVP/<br />RST/ <br />MIG <br />NWS, MDS<br />BQP<br />Notification <br />Service<br />Deadline &<br />Success <br />Probability<br />36<br />36<br />
    54. 54. OGCE Workflow Usage Flow<br />Scientist/Application provider registers application description with Registry Service.<br />Workflow Author constructs the workflow with multiple wrapped application services.<br />Workflow is compiled and deployed to the ODE workflow Engine.<br />Workflow inputs are captured by XBaya and workflow is launched to ODE.<br />Workflow system and possibly some services publish notifications to the Message bus reporting the progress of the workflow.<br />XBaya monitoring system listens to notifications and color the workflow components to present workflow progress.<br />
    55. 55. Packaged, Downloadable Software<br />http://www.collab-ogce.org/ogce/index.php/Main_Page<br />
    56. 56. ODI Overview – Image Acquisition<br />WIYN Buffer<br />ODI<br />Data Capacitor<br />Integration Server<br />Data Capacitor<br />Science Gateway<br />End Users<br />MDSS Archive<br />TeraGrid Compute<br />Resources<br />
    57. 57. Overview – Image Copy<br />WIYN Buffer<br />ODI<br />Data Capacitor<br />Integration Server<br />Data Capacitor<br />Science Gateway<br />End Users<br />MDSS Archive<br />TeraGrid Compute<br />Resources<br />
    58. 58. Overview – Image Transfer to Data Capacitor<br />WIYN Buffer<br />ODI<br />Data Capacitor<br />Integration Server<br />Data Capacitor<br />Science Gateway<br />End Users<br />MDSS Archive<br />TeraGrid Compute<br />Resources<br />
    59. 59. Overview – Image Ingestions into the Archive<br />WIYN Buffer<br />ODI<br />Data Capacitor<br />Integration Server<br />Data Capacitor<br />Science Gateway<br />End Users<br />MDSS Archive<br />TeraGrid Compute<br />Resources<br />
    60. 60. Overview – Clean Up<br />WIYN Buffer<br />ODI<br />Data Capacitor<br />Integration Server<br />Data Capacitor<br />Science Gateway<br />End Users<br />MDSS Archive<br />TeraGrid Compute<br />Resources<br />
    61. 61. Overview – Automated Tier 1 Processing<br />WIYN Buffer<br />ODI<br />Data Capacitor<br />Integration Server<br />Data Capacitor<br />Science Gateway<br />End Users<br />MDSS Archive<br />TeraGrid Compute<br />Resources<br />
    62. 62. Overview – User Driven Tier 2 Processing<br />WIYN Buffer<br />ODI<br />Data Capacitor<br />Integration Server<br />Data Capacitor<br />Science Gateway<br />End Users<br />MDSS Archive<br />TeraGrid Compute<br />Resources<br />
    63. 63. Overview – User Driven Tier 2 Processing<br />WIYN Buffer<br />ODI<br />Data Capacitor<br />Integration Server<br />Data Capacitor<br />Science Gateway<br />End Users<br />MDSS Archive<br />TeraGrid Compute<br />Resources<br />
    64. 64. Molecular Chemistry Gateway <br /><ul><li>Challenges:
    65. 65. Some apps have rich Client Gui’s, a challenge with asynchronous long running workflows
    66. 66. Workflow Verification Service
    67. 67. Parametric sweep scheduling, monitoring iteration steps, graphical composition
    68. 68. Human Interaction into Workflows</li></li></ul><li>Example: Assimilation Workflow<br />
    69. 69. Motif Network Genome Analysis<br />Ensemble-type processing<br />(minimal network reqs)<br />Capacity-type computing<br />Parallel processing<br />Capability-type computing<br /><ul><li>Domain webs of large genomes
    70. 70. Input list of amino acid sequences
    71. 71. Identify all known domains
    72. 72. Construct webs</li></li></ul><li>BioVLAB Cloud Use Case<br />
    73. 73. Live Demo & Questions?<br />