Opportunities and Challenges for Running Scientific Workflows on the Cloud
Yong Zhao, Xubo Fei, Ioan Raicu, Shiyong Lu

Cloud Computing has proven to be one of the great disruptive technologies of our time, and the effects of its increasing adoption and maturation will continue to ripple out. Cloud Computing is here to stay, as developers become more aware of its immense potential.

1. Opportunities and Challenges for Running Scientific Workflows on the Cloud
Yong Zhao, Xubo Fei, Ioan Raicu, Shiyong Lu
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2011 International Conference
Ying Lian, Computer Science, WSU
2. Overview
• INTRODUCTION
• OPPORTUNITIES
• CHALLENGES
• RESEARCH DIRECTIONS
• CONCLUSIONS
3. INTRODUCTION
• There is something in the air.
4. INTRODUCTION
• Cloud computing is gaining tremendous momentum in both academia and industry.
• “Cloud Computing”: a large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet.
• It has mostly been applied to Web and business applications; a link is still missing for supporting workflow applications.
5. INTRODUCTION
• Goal: manage and run workflow applications on the Cloud, especially data-intensive scientific workflows.
• Several scientific workflow management systems (SWFMSs) have already been applied to this problem.
• Cloud Workflow: the specification, execution, and provenance tracking of scientific workflows, together with the management of the data and computing resources needed to run scientific workflows on the Cloud.
• The following sections cover what this means, the challenges, and the research opportunities.
6. OPPORTUNITIES
• Keyword: infinite computing resources
• 1. The scale of scientific problems that scientific workflows can address is now greatly increased. It was previously upper-bounded by the size of a dedicated resource pool, with limited resource-sharing extensions in the form of virtual organizations.
  • Data size (e.g. GenBank doubles in size every 9-12 months) calls for vast storage space.
  • Application complexity (e.g. protein simulation by iterative algorithms over huge parameter spaces) calls for massive computing resources.
7. OPPORTUNITIES
• 2. The on-demand resource allocation mechanism of the Cloud has a number of advantages over traditional cluster/Grid environments for scientific workflows:
  • Improved resource utilization: different stages of a workflow require different numbers of resources.
  • Faster turnaround time for end users, through dynamic scale-out and scale-in.
  • A new generation of workflows: collaborative scientific workflows, which favor user interaction and collaboration patterns.
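The utilization argument can be made concrete with a small Python sketch. It makes several simplifying assumptions: tasks within a stage are independent and of equal length, and on-demand nodes are held only while their stage runs. The `Stage` type and the `fixed_pool` and `on_demand` functions are hypothetical and do not come from any SWFMS. The example workflow is a wide fan-out stage followed by a single merge task.

```python
import math
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    tasks: int          # independent tasks in this stage (assumed equal length)
    task_hours: float   # hours per task


def fixed_pool(stages, pool_size):
    """A dedicated pool of `pool_size` nodes is held for the entire workflow."""
    busy = sum(s.tasks * s.task_hours for s in stages)
    # Each stage runs in ceil(tasks / pool_size) waves while the whole pool stays allocated.
    makespan = sum(math.ceil(s.tasks / pool_size) * s.task_hours for s in stages)
    return busy / (pool_size * makespan), makespan


def on_demand(stages, max_nodes):
    """Each stage scales out to the nodes it can use (capped at max_nodes), then scales in."""
    busy = held = makespan = 0.0
    for s in stages:
        if s.tasks == 0:
            continue
        nodes = min(s.tasks, max_nodes)
        waves = math.ceil(s.tasks / nodes)
        busy += s.tasks * s.task_hours
        held += nodes * waves * s.task_hours   # node-hours actually allocated
        makespan += waves * s.task_hours
    return busy / held, makespan


stages = [Stage("fan-out", 100, 1.0), Stage("merge", 1, 2.0)]
print(fixed_pool(stages, 100))  # utilization 0.34, makespan 3.0 hours
print(on_demand(stages, 100))   # utilization 1.0, makespan 3.0 hours
```

Both strategies finish in the same time in this toy case. However, the fixed pool keeps 99 nodes idle during the two-hour merge, which is exactly the unequal per-stage demand that the slide points to.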
8. OPPORTUNITIES
• 3. Much more room for trade-offs between performance and cost.
  • Spectrum of resource investment: dedicated private resources, hybrid local and Cloud resources, or full outsourcing to Clouds.
  • Cloud computing brings opportunities to improve the performance/cost ratio.
  • However, optimizing this ratio and building automatic trade-off mechanisms remain challenging.
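A toy Python sketch shows why the performance/cost trade-off is not trivial. It searches pool sizes for the cheapest configuration that meets a deadline. The billing model here is an assumption: every instance is charged per started hour for the whole run. The price of 0.10 per instance-hour is hypothetical, and tasks are again assumed to be independent and of equal length. The function names are illustrative only.

```python
import math


def makespan_hours(n_tasks, task_hours, n_instances):
    """Independent, equal-length tasks run in waves of n_instances."""
    return math.ceil(n_tasks / n_instances) * task_hours


def cost(n_tasks, task_hours, n_instances, price_per_hour):
    """Assumed billing model: every instance is charged per started hour for the whole run."""
    hours = makespan_hours(n_tasks, task_hours, n_instances)
    return n_instances * math.ceil(hours) * price_per_hour


def cheapest_within_deadline(n_tasks, task_hours, deadline, price_per_hour, max_instances):
    """Exhaustively search pool sizes; return (instances, cost, makespan) or None."""
    best = None
    for n in range(1, max_instances + 1):
        t = makespan_hours(n_tasks, task_hours, n)
        if t > deadline:
            continue
        c = cost(n_tasks, task_hours, n, price_per_hour)
        # Prefer lower cost; among equal costs, prefer the faster configuration.
        if best is None or (c, t) < (best[1], best[2]):
            best = (n, c, t)
    return best


# 100 half-hour tasks, a 5-hour deadline, and a hypothetical price of 0.10 per instance-hour.
print(cheapest_within_deadline(100, 0.5, 5.0, 0.10, 100))  # (50, 5.0, 1.0)
```

Under this billing assumption, 10, 25, or 50 instances all cost the same (50 billed instance-hours), but finish in 5, 2, or 1 hours respectively. Going to 100 instances halves the makespan again at double the cost. An automatic trade-off mechanism would need to navigate this kind of irregular landscape, and real pricing models make it harder still.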
9. CHALLENGES
• Architectural challenges
• Integration challenges
• Computing challenges
• Data management challenges
• Language challenges
• Service management challenges
10. Architectural Challenges
• User interface customizability and support
• Reproducibility support
• Integration of heterogeneous and distributed services and software tools
• Management of heterogeneous and distributed data products
• High-end computing support
• Workflow monitoring and failure handling
• Interoperability
11. Reference Architecture for SWFMSs
12. Deploying the architecture: solutions
• Operation layer in the Cloud: the SWFMS runs outside the Cloud.
  Pro: no concern about vendor lock-in.
  Con: the SWFMS itself cannot benefit from the Cloud's scalability.
• Task management layer in the Cloud: tasks are not run on a batch-based schedule.
  Pro: tasks can be deployed immediately, without waiting in sequence.
  Con: cost of storing provenance and data products.
• Workflow management layer in the Cloud: only the presentation layer is deployed on a client machine.
  Pro: suitable for ad hoc, domain-specific requirements.
  Con: more dependent on the Cloud platform.
• All in the Cloud: the SWFMS runs inside the Cloud and is accessed via a Web browser.
  Pro: highly scalable (Software as a Service).
  Con: cost, dependency, and vendor lock-in.
13. Integration Challenges
How can scientific workflow systems be integrated with Cloud infrastructure and resources?
• Operation layer: applications, services, and tools are hosted in the Cloud, while the scheduling and management of the workflow happen outside the Cloud (e.g. applications built on the Google Maps service use ad hoc scripts and programs to glue the services together).
• Task management layer: resource provisioning (e.g. Nimbus).
• Workflow management layer: debugging, monitoring, and provenance tracking.
• All in the Cloud: porting issues. A workflow engine is needed at the Cloud end, and a Web interface or thin client at the user end.
14. Language Challenges
• MapReduce: a widely used computing model with two key functions, Map and Reduce (white-box).
• SwiftScript: a general-purpose coordination language in which existing applications can be invoked without modification (black-box).
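A minimal Python sketch illustrates the white-box versus black-box distinction. It is not Hadoop's API and not SwiftScript syntax. The names `map_fn`, `reduce_fn`, `run_mapreduce`, and `run_blackbox` are hypothetical, and the MapReduce runs in a single process. The "legacy application" is a Python one-liner standing in for an existing executable.

```python
import subprocess
import sys
from collections import defaultdict
from itertools import chain


# White-box style: the programmer supplies the Map and Reduce logic directly.
def map_fn(line):
    return [(word, 1) for word in line.split()]


def reduce_fn(word, counts):
    return word, sum(counts)


def run_mapreduce(lines):
    groups = defaultdict(list)
    # "Shuffle": group every intermediate value emitted by map_fn under its key.
    for key, value in chain.from_iterable(map(map_fn, lines)):
        groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


# Black-box style: an existing program is invoked unmodified; the coordination
# layer only wires its inputs and outputs together.
def run_blackbox(argv, stdin_text):
    result = subprocess.run(argv, input=stdin_text, capture_output=True, text=True, check=True)
    return result.stdout


print(run_mapreduce(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}

# Stand-in for a legacy application: a Python one-liner that upper-cases its input.
legacy_app = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_blackbox(legacy_app, "protein.pdb"))  # PROTEIN.PDB
```

In the white-box case the framework must see and call the user's functions. In the black-box case it never looks inside the program and only manages its inputs, outputs, and invocation.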
15. Language Challenges
• Handle the mapping of input and output data onto logical structures.
• Support large-scale parallelism, via either implicit parallelism or explicit declarations.
• Support data partitioning and task partitioning.
• Require a scalable, reliable, and efficient runtime system that supports Cloud-scale task scheduling and dispatching, and provides error recovery and fault tolerance.
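Three of these requirements can be sketched in plain Python: mapping physical files onto a logical structure, data partitioning, and implicit parallelism. The sketch is loosely in the spirit of a coordination language but is not SwiftScript. The `map_inputs`, `partition`, and `parallel_foreach` helpers and the file-name pattern are hypothetical, and local threads stand in for Cloud workers that a real runtime would dispatch to.

```python
from concurrent.futures import ThreadPoolExecutor


def map_inputs(names, pattern):
    """Hypothetical mapper: bind logical array indices to physical file names."""
    return {i: pattern.format(name=n) for i, n in enumerate(names)}


def partition(items, n_parts):
    """Data partitioning: n_parts contiguous chunks whose sizes differ by at most one."""
    k, r = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + k + (1 if i < r else 0)
        parts.append(items[start:end])
        start = end
    return parts


def parallel_foreach(fn, items, max_workers=4):
    """Implicit parallelism: apply fn to every item concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


files = map_inputs(["1abc", "2xyz"], "inputs/{name}.pdb")
print(files)                          # {0: 'inputs/1abc.pdb', 1: 'inputs/2xyz.pdb'}
chunks = partition(list(range(10)), 3)
print(chunks)                         # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(parallel_foreach(sum, chunks))  # [6, 15, 24]
```

The caller never spawns or joins workers explicitly. Parallelism is implied by applying a function over a collection, which is the property the slide asks of a Cloud workflow language.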
16. Computing Challenges
• The workflow system may not be able to talk to Cloud resources directly, so middleware services are needed (e.g. Nimbus or Falkon to handle resource provisioning and task dispatching).
• This becomes more complicated once workflow resource requirements, data dependencies, and Cloud virtualization are taken into account.
• An SWFMS will try to recover automatically when non-fatal errors happen. Smart re-run: detailed execution information is logged so that the workflow can be restarted.
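The smart re-run idea logs enough execution information that a failed workflow can restart without repeating finished work. It can be sketched in Python as below. This is a minimal illustration under stated assumptions and not Swift's actual restart-log format. The `TransientError` class, the JSON-lines log, and the `run_workflow(tasks, log_path, max_retries)` interface are all hypothetical. Tasks are assumed to run sequentially, and only errors raised as `TransientError` count as non-fatal and are retried.

```python
import json
import os
import tempfile


class TransientError(Exception):
    """Hypothetical marker for non-fatal failures (e.g. a lost worker) that are safe to retry."""


def run_workflow(tasks, log_path, max_retries=3):
    """Run tasks in order, retrying non-fatal failures and logging each completion.

    tasks: dict of task id -> zero-argument callable (a hypothetical interface).
    Tasks already recorded in the restart log at log_path are skipped on a re-run.
    Any exception other than TransientError propagates immediately as a fatal error.
    """
    done = set()
    if os.path.exists(log_path):
        with open(log_path) as f:
            done = {json.loads(line)["task"] for line in f if line.strip()}

    executed = []
    with open(log_path, "a") as log:
        for task_id, fn in tasks.items():
            if task_id in done:
                continue  # finished in an earlier run: skip it
            for attempt in range(1, max_retries + 1):
                try:
                    fn()
                    break
                except TransientError:
                    if attempt == max_retries:
                        raise  # retries exhausted: give up on this run
            # Record the completion only after the task has succeeded.
            log.write(json.dumps({"task": task_id, "attempts": attempt}) + "\n")
            log.flush()
            executed.append(task_id)
    return executed


# Demo: the second run re-executes only the task that failed fatally the first time.
state = {"b_calls": 0, "c_ok": False}

def task_b():
    state["b_calls"] += 1
    if state["b_calls"] == 1:
        raise TransientError("worker lost")  # retried automatically

def task_c():
    if not state["c_ok"]:
        raise ValueError("bad input")  # fatal: stops this run

demo = {"a": lambda: None, "b": task_b, "c": task_c}
log_file = os.path.join(tempfile.mkdtemp(), "restart.log")
try:
    run_workflow(demo, log_file)
except ValueError:
    pass
state["c_ok"] = True                 # e.g. the user fixed the input
print(run_workflow(demo, log_file))  # ['c']
```

The log entry is written and flushed only after each success, which is what makes the restart safe. A production system would also need to record task inputs, so that changed inputs invalidate earlier entries.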
17. Data management challenges
• As data intensiveness increases, managing data resources and the dataflow between storage and compute resources becomes the bottleneck.
• Data locality: CPUs keep getting cheaper while data volumes inflate, so data location, rather than computational resources, is the greatest challenge.
• Combining compute and data management: the amount of data movement must be minimized; otherwise raw resources will be significantly underutilized.
• Provenance: the derivation history of a data product. It must be tracked across service providers and across different abstraction layers. Secure access to provenance is also currently missing.
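Two of these points can be sketched in a few lines of Python: locality-aware placement that minimizes data movement, and a provenance query over derivation history. The greedy heuristic below is one simple choice among many and is not the method of any particular SWFMS. The `Task` type, the helper functions, and the example datasets are hypothetical, and the sketch assumes we already know which nodes hold a replica of each dataset.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    inputs: dict  # input dataset name -> size in MB


def mb_to_move(task, node, replicas):
    """MB that must be transferred if task runs on node (replicas: dataset -> set of nodes)."""
    return sum(size for ds, size in task.inputs.items() if node not in replicas.get(ds, set()))


def schedule_locality_aware(tasks, nodes, replicas, slots_per_node=1):
    """Greedy placement: each task takes the free node that minimizes data movement."""
    free = {n: slots_per_node for n in nodes}
    plan, moved = {}, 0
    # Place data-heavy tasks first, since they gain the most from running near their data.
    for task in sorted(tasks, key=lambda t: sum(t.inputs.values()), reverse=True):
        candidates = [n for n in nodes if free[n] > 0]
        if not candidates:
            raise RuntimeError("not enough free slots")
        best = min(candidates, key=lambda n: mb_to_move(task, n, replicas))
        free[best] -= 1
        plan[task.name] = best
        moved += mb_to_move(task, best, replicas)
    return plan, moved


def lineage(product, derived_from):
    """Provenance query: all upstream products that `product` was derived from."""
    seen, stack = set(), [product]
    while stack:
        for parent in derived_from.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


tasks = [Task("align", {"reads": 500}), Task("annotate", {"genome": 300})]
replicas = {"reads": {"node2"}, "genome": {"node1"}}
print(schedule_locality_aware(tasks, ["node1", "node2"], replicas))
# ({'align': 'node2', 'annotate': 'node1'}, 0)
print(sorted(lineage("report", {"report": ["annotated"], "annotated": ["genome", "reads"]})))
# ['annotated', 'genome', 'reads']
```

In this example the locality-aware plan moves no data at all. Placing `align` on node1 and `annotate` on node2 would instead move 800 MB.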
18. Service management challenges
• Engineering the components of an SWFMS as services:
  • thousands of services have been developed and made available for the myExperiment project;
  • the LEAD system has developed a tool to wrap ordinary science applications and convert them into services.
• Orchestrating and invoking services from an SWFMS.
• Managing the large number of service instances.
• Moving data across different service instances.
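The following Python sketch illustrates, in spirit only, two of these concerns. The first is wrapping an ordinary command-line application as a service-style operation. The second is spreading calls across many instances of one logical service. It is not the LEAD wrapping tool, and it is not the API of any real service framework. `WrappedApp`, `ServiceRegistry`, and the round-robin policy are hypothetical choices, and the "science application" is a Python one-liner standing in for a real executable.

```python
import itertools
import subprocess
import sys


class WrappedApp:
    """Expose an unmodified command-line program as a service-style operation.

    Illustrative only: this is not the LEAD wrapping tool or any real service API.
    """

    def __init__(self, name, argv_template):
        self.name = name
        self.argv_template = argv_template  # e.g. ["prog", "--size", "{n}"]

    def invoke(self, **params):
        # Fill placeholders in each argument, then run the program without a shell.
        argv = [arg.format(**params) for arg in self.argv_template]
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return result.stdout.strip()


class ServiceRegistry:
    """Track many instances per logical service and spread calls round-robin."""

    def __init__(self):
        self._instances = {}
        self._cursors = {}

    def register(self, service, instance):
        self._instances.setdefault(service, []).append(instance)
        # Rebuild the rotation so the new instance is included.
        self._cursors[service] = itertools.cycle(list(self._instances[service]))

    def call(self, service, **params):
        instance = next(self._cursors[service])
        return instance.name, instance.invoke(**params)


# Demo: two instances of a stand-in "science application" (a Python one-liner).
registry = ServiceRegistry()
doubler = [sys.executable, "-c", "print({x} * 2)"]
registry.register("double", WrappedApp("instance-1", doubler))
registry.register("double", WrappedApp("instance-2", doubler))
print(registry.call("double", x=21))  # ('instance-1', '42')
print(registry.call("double", x=5))   # ('instance-2', '10')
```

Round-robin is the simplest possible dispatch policy. At the scale the slide describes, a registry would also need health checks, load-aware selection, and data-aware placement, so that data does not move needlessly between instances.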
19. RESEARCH DIRECTIONS
• Emphasize the workflow reference architecture and direct research effort to the foregoing layers.
• Great leaps in middleware development: resource management, monitoring, messaging.
• Many-Task Computing (MTC): applied in preliminary form on Grids and supercomputers, and expected to be greatly improved for the Cloud.
• Scripting: a mixture of semantics, combining applications and services, …
• Cost optimization: very challenging, but also rewarding.
20. RESEARCH DIRECTIONS
• SWFMS security
• Access control: critical because of the nature of Clouds (dynamic, with large-scale data and service sharing).
• Information flow control: ensure that information related to a scientific workflow is propagated only to authorized ends.
• Secure electronic transaction protocols: needed for the pay-as-you-go pricing model.
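One common way to frame information flow control is to attach to each data product the set of principals allowed to read it. A product derived from several inputs is then readable only by principals allowed to read all of them, and every release is checked against that set. The Python sketch below illustrates this idea. The `DataProduct` type, the `derive` and `release` helpers, and the principal names are hypothetical. A real SWFMS would also need authenticated principals and enforcement at every hop between services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    readers: frozenset  # principals allowed to see this product


def derive(name, *inputs):
    """A derived product (at least one input) is readable only by those who may read every input."""
    readers = frozenset.intersection(*(p.readers for p in inputs))
    return DataProduct(name, readers)


def release(product, recipient):
    """Information flow check before a product is sent to an endpoint."""
    if recipient not in product.readers:
        raise PermissionError(f"{recipient} may not receive {product.name}")
    return product


raw = DataProduct("patient-scans", frozenset({"lab", "pi"}))
atlas = DataProduct("reference-atlas", frozenset({"lab", "pi", "public"}))
result = derive("segmentation", raw, atlas)
print(sorted(result.readers))  # ['lab', 'pi']
```

Intersecting reader sets is conservative: combining public data with restricted data yields restricted data. This is the propagation property that information flow control is meant to guarantee.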
21. CONCLUSIONS
• As more customers and applications migrate to the Cloud, the need for workflow systems to manage complex tasks will become more urgent.
• So far, mash-ups and MapReduce-style task management have been acting in place of a workflow system in the Cloud.
• The paper discusses the opportunities and challenges of bringing workflow systems into the Cloud.
• The authors identify key research directions for realizing scientific workflows in Cloud environments.
22. Thank You!
