This document discusses opportunities and challenges for running scientific workflows on cloud computing platforms. It begins by introducing cloud computing and scientific workflows. It then describes three main opportunities: 1) increased scale to address larger problems, 2) on-demand resource allocation for improved efficiency, and 3) more flexibility to optimize performance and cost. Several challenges are also outlined, including architectural integration issues, data management difficulties, and the need for cloud-compatible workflow languages and services. The document concludes by proposing research directions to address these challenges and better support scientific workflows in cloud environments.
Opportunities and Challenges for Running Scientific Workflows on the Cloud
1. Opportunities and Challenges for
Running Scientific Workflows
on the Cloud
Yong Zhao, Xubo Fei, Ioan Raicu, Shiyong Lu
Cyber-Enabled Distributed Computing and Knowledge Discovery
(CyberC), 2011 International Conference
Ying Lian
Computer Science, WSU
4. INTRODUCTION
Cloud computing is gaining tremendous momentum in both
academia and industry.
“Cloud Computing”: a large-scale distributed computing
paradigm that is driven by economies of scale, in which a
pool of abstracted, virtualized, dynamically-scalable,
managed computing power, storage, platforms, and
services are delivered on demand to external customers
over the Internet.
Cloud computing has mostly been applied to Web applications
and business applications; the link needed to support
workflow applications is still missing.
5. INTRODUCTION
Manage and run workflow applications on the cloud
(especially data-intensive scientific workflows)
Several scientific workflow management systems
(SWFMSs) have been applied.
Cloud workflow: the specification, execution, and
provenance tracking of scientific workflows, as well as
the management of data and computing resources to
enable running scientific workflows on the Cloud.
The following sections cover the meaning, the challenges,
and the research opportunities.
6. OPPORTUNITIES
Keywords: Infinite computing resource
1. The scale of scientific problems that scientific
workflows can address is greatly increased; it was
previously bounded by the size of a dedicated resource
pool, with limited resource sharing in the form of
virtual organizations.
Data size (e.g. GenBank doubling every 9-12 months)
demands vast storage space.
Application complexity (e.g. protein simulation by
iterative algorithms over huge parameter spaces)
demands massive computing resources.
7. OPPORTUNITIES
2. The on-demand resource allocation
mechanism in Cloud has a number of
advantages over the traditional cluster/Grid
environments for scientific workflows:
Improved resource utilization: different stages of a
workflow require unequal numbers of resources.
Faster turn-around time for end users: dynamic scale
out/in
Enables a new generation of workflow: collaborative
scientific workflows, in which user interaction and
collaboration patterns are favored.
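The dynamic scale-out/scale-in above can be sketched as a simple provisioning policy. This is an illustrative toy, not any system's actual mechanism: TASKS_PER_WORKER and the pool bounds are hypothetical tuning knobs, and in practice the result of rescale would drive calls to a real provider's API.

```python
# Toy auto-scaling policy: size the worker pool to the pending-task backlog.
TASKS_PER_WORKER = 10          # hypothetical: tasks one worker handles well
MIN_WORKERS, MAX_WORKERS = 1, 100

def target_workers(queue_length: int) -> int:
    """How many workers the current task queue justifies."""
    wanted = -(-queue_length // TASKS_PER_WORKER)   # ceiling division
    return max(MIN_WORKERS, min(MAX_WORKERS, wanted))

def rescale(current_workers: int, queue_length: int) -> int:
    """Scale decision: positive means acquire nodes, negative means release."""
    return target_workers(queue_length) - current_workers
```

For example, with 10 workers running and 25 tasks queued, rescale(10, 25) asks to release 7 workers; an empty queue shrinks the pool to the minimum rather than to zero.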
8. OPPORTUNITIES
3. Much bigger room for trade-off between
performance and cost.
A spectrum of resource investment: from dedicated
private resources, to hybrid local & cloud, to full
outsourcing on clouds.
Cloud computing brings opportunities to improve the
performance/cost ratio, but optimizing this ratio and
automating the trade-off remain challenging.
9. CHALLENGES
Architectural challenges
Integration challenges
Computing challenges
Data management challenges
Language challenges
Service management challenges
10. Architectural Challenges
User interface customizability and support
Reproducibility support
Heterogeneous and distributed services and
software tools integration
Heterogeneous and distributed data product
management
High-end computing support
Workflow monitoring and failure handling
Interoperability
12. Deploy the architecture: solutions
Operation layer in the Cloud: SWFMS running out of the Cloud
Pro: no concern of vendor lock-in
Con: the SWFMS itself cannot benefit from the scalability
Task management layer in the Cloud: not on a batch-based schedule
Pro: deploy immediately without sequence
Con: cost of storage of provenance & data products
Workflow management layer in the Cloud: presentation layer deployed at a client machine
Pro: suitable for ad hoc domain-specific requirements
Con: more dependent on the Cloud platform
All-in-the-Cloud: SWFMS inside the Cloud, accessed via Web browser
Pro: highly scalable (Software as a Service)
Cons: cost, dependency, vendor lock-in
13. Integration Challenges
How to integrate scientific workflow systems with Cloud
infrastructure and resources ?
Operation layer : Applications, services, and tools hosted in
the Cloud and the scheduling and management of a
workflow are outside the Cloud. (e.g. Google Map service
use ad hoc scripts and programs to glue the services
together)
Task management layer: resource provisioning. (e.g. Nimbus)
Workflow management layer: Debugging, monitoring, and
provenance tracking
All in cloud: porting issue. Need a workflow engine at cloud
end, and web interface or thin client at user end
14. Language Challenges
MapReduce: a widely used computing model with two
key functions, Map and Reduce. -- White-box
SwiftScript serves as a general-purpose coordination
language, where existing applications can be invoked
without modification. -- Black-box
15. Language Challenges
Handle the mapping from input and output data into
logical structures.
Support large-scale parallelism via either implicit
parallelism or explicit declarations.
Support data partitioning and task partitioning.
Require a scalable, reliable, and efficient runtime system
that can support Cloud-scale task scheduling and
dispatching, and provide error recovery and fault tolerance.
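The white-box Map/Reduce model from slide 14 can be illustrated with a minimal in-memory word count, in standard-library Python only; a real deployment would hand the two functions to a runtime such as Hadoop rather than run them in one process.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Reduce: sum the counts emitted for a single word."""
    return word, sum(counts)

def mapreduce(lines):
    # Shuffle: sort the intermediate pairs so equal keys are adjacent,
    # then reduce each key group.
    pairs = sorted(kv for line in lines for kv in map_fn(line))
    return dict(reduce_fn(k, (c for _, c in g))
                for k, g in groupby(pairs, key=itemgetter(0)))
```

For instance, mapreduce(["a b a", "b a"]) yields {"a": 3, "b": 2}; the user writes only the white-box Map and Reduce functions, while partitioning and grouping belong to the runtime.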
16. Computing Challenges
A workflow system may not be able to talk to Cloud
resources directly, so middleware services are needed
(e.g. Nimbus or Falkon to handle resource provisioning
and task dispatching).
Matters get more complicated when considering workflow
resource requirements, data dependencies, and Cloud
virtualization.
A SWFMS will try to recover automatically when non-fatal
errors happen. Smart re-run: detailed execution
information is logged for workflow restart.
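The recovery idea above (log what finished so a workflow can restart past non-fatal errors) can be sketched as follows. The in-memory completed set stands in for the detailed execution log a real SWFMS would persist, and RuntimeError stands in for whatever error class counts as non-fatal.

```python
def run_workflow(tasks, completed=None, max_retries=3):
    """Run (name, task) pairs in order, skipping tasks already in the
    restart log and retrying non-fatal failures up to max_retries times."""
    completed = set() if completed is None else completed
    for name, task in tasks:
        if name in completed:
            continue                      # finished in a previous run
        for attempt in range(max_retries):
            try:
                task()
                completed.add(name)
                break
            except RuntimeError:          # modeled as the non-fatal error class
                if attempt == max_retries - 1:
                    raise                 # give up after exhausting retries
    return completed
```

A task that fails once and then succeeds still ends up in the log, and restarting with the returned log re-executes only the tasks that never completed.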
17. Data management challenges
As data intensiveness increases, the management of
data resources and of the dataflow between storage and
compute resources becomes the bottleneck.
Data locality: CPUs are cheap while data sizes inflate, so
data location, rather than computational resources, is the
main challenge.
Combining compute and data management: the amount of
data movement must be minimized; otherwise, significant
underutilization of raw resources results.
Provenance: the derivation history of a data product, tracked
across service providers and across different abstraction
layers. Secure access is another missing piece.
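Minimizing data movement, as argued above, often reduces to placing a task on the node that already stores most of its input. A toy placement function makes the idea concrete; the replica map and node names here are hypothetical, not any scheduler's real interface.

```python
def place_task(task_inputs, replicas, nodes):
    """Pick the node holding the most input bytes for this task.
    replicas maps a file name to {node: bytes of that file stored there}."""
    def local_bytes(node):
        return sum(replicas.get(f, {}).get(node, 0) for f in task_inputs)
    return max(nodes, key=local_bytes)
```

With a 3 GB reference genome replicated on node n1 and a 1 GB read set on n2, an alignment task reading both is placed on n1, so only the smaller file moves.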
18. Service management challenges
The engineering of the components of an SWFMS as
services:
thousands of services developed and available for the
myExperiment project
the LEAD system has developed a tool to wrap and convert
ordinary science applications into services
The orchestration and invocation of services from an
SWFMS
managing the large number of service instances
data movements across different service instances
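The wrapping mentioned above (turning an ordinary command-line science application into an invocable service, as the LEAD tool does) can be sketched generically. The template mechanism below is a hypothetical illustration, not LEAD's actual tool.

```python
import shlex
import subprocess

def wrap_as_service(command_template):
    """Wrap a command-line application as a callable 'service'.
    The template's {param} placeholders are filled per invocation."""
    def service(**params):
        cmd = shlex.split(command_template.format(**params))
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return done.stdout
    return service
```

For example, wrap_as_service("echo {msg}") yields a function whose call service(msg="hello") runs the tool and returns its standard output; an SWFMS could then orchestrate such wrapped services uniformly.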
19. RESEARCH DIRECTIONS
Emphasis on a workflow reference architecture, directing
research effort to the foregoing layers
A great leap in middleware development: resource
management, monitoring, messaging
Many-task computing (MTC): preliminarily applied in Grids
and supercomputers, expected to improve greatly on the
Cloud
Scripting: mixture of semantics, combination of
application of services…
Cost optimization: very challenging, but rewarding too
20. RESEARCH DIRECTIONS
SWFMS security
Access control: critical because of the nature of
clouds (dynamic, with large-scale data and service sharing)
Information flow control: assure that scientific-workflow-related
information propagates only to authorized ends
Secure electronic transaction protocols: for the pay-as-you-go
pricing model
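The information-flow requirement above (workflow-related information must reach only authorized ends) can be sketched as a label check. The sensitivity levels and names below are hypothetical placeholders, not a real policy.

```python
# Toy information-flow check: data may flow only to an endpoint cleared
# at or above the data's sensitivity label.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}

def may_flow(data_label: str, endpoint_clearance: str) -> bool:
    """Allow propagation only when the endpoint's clearance dominates
    the data's label."""
    return LEVELS[endpoint_clearance] >= LEVELS[data_label]
```

A workflow engine would consult such a check before shipping an intermediate data product to a service instance, refusing, say, a confidential result bound for an internal-only endpoint.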
21. CONCLUSIONS
As more customers and applications migrate into the Cloud,
the requirement for a workflow system to manage
complex tasks will become more urgent
For now, mash-ups and MapReduce-style task management
have been acting in place of a workflow system in the
Cloud
The opportunities and challenges in bringing workflow
systems into the Cloud are discussed
Key research directions in realizing scientific
workflows in Cloud environments are identified
Cloud computing has proven to be one of the great disruptive technologies of our time, and the effects of its increasing adoption and maturation will ripple out. Cloud computing is here to stay, and its role will only grow as developers become more aware of its immense potential.