This document summarizes a joint workshop on workflow allocations and scheduling on Infrastructure as a Service (IaaS) platforms. It discusses using on-demand resources to more efficiently allocate workflows compared to static allocations. It proposes two algorithms, Eager and Deferred, to determine allocations within a given budget limit. Simulations using synthetic workflows showed Deferred guarantees budget constraints while Eager is faster. For small applications and budgets, Deferred is preferred. For larger applications and budgets approaching task parallelism saturation, Eager performs better. The document also discusses a prototype system using Nimbus, Phantom and DIET to deploy workflows on IaaS resources.
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME (Hongjoo Lee)
A 45-minute talk about collecting home network performance measures, analyzing and forecasting time series data, and building an anomaly detection system.
In this talk, we go through the whole process of data mining and knowledge discovery. First, we write a script that runs a speed test periodically and logs the metric. Then we parse the log data, convert it into a time series, and visualize the data over a certain period.
Next we conduct some data analysis: finding trends, forecasting, and detecting anomalous data. Several statistical and deep learning techniques are used for the analysis, including ARIMA (Autoregressive Integrated Moving Average) and LSTM (Long Short-Term Memory).
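Before reaching for ARIMA or an LSTM, a simple statistical baseline makes the anomaly-detection step concrete. The sketch below is not from the talk; the synthetic speed data and the rolling z-score threshold are illustrative assumptions:

```python
import random
import statistics

# Synthetic "download speed" series: stable around 95 Mbps with one injected drop.
random.seed(0)
speeds = [95 + random.gauss(0, 2) for _ in range(48)]
speeds[30] = 40.0  # simulated outage

def rolling_zscore_anomalies(series, window=12, threshold=3.0):
    """Flag points that deviate strongly from the trailing window's mean."""
    anomalies = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu = statistics.mean(past)
        sigma = statistics.stdev(past)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

print(rolling_zscore_anomalies(speeds))  # the injected drop at index 30 is flagged
```

A model-based detector (ARIMA residuals, or an LSTM's prediction error) replaces the rolling mean with a learned forecast, but the thresholding logic stays the same.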
ML-SRHWT: Machine Learning Based Superlative Rapid HAAR Wavelet Transformation... (Jumlesha Shaik)
Abstract: This paper introduces a digital image coding technique called ML-SRHWT (Machine Learning based image coding by Superlative Rapid HAAR Wavelet Transform). Digital images are compressed using the Superlative Rapid HAAR Wavelet Transform (SRHWT) model, and Least Squares Support Vector Machine regression predicts the hyper-coefficients obtained with a QPSO model. The mathematical models discussed briefly in the paper are SRHWT, which performs well and reduces complexity compared with FHAAR, and EQPSO, which replaces the least good particle in QPSO with the new best particle obtained. Compared with the JPEG and JPEG2000 standards, ML-SRHWT is found to be the better technique.
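The HAAR transform underlying such coders reduces to pairwise averages (approximation) and differences (detail); compression comes from quantizing or discarding small details. A minimal one-level sketch of the standard 1-D Haar transform follows; it is illustrative only and is not the paper's SRHWT/QPSO pipeline:

```python
def haar_1d(signal):
    """One level of the (unnormalized) Haar wavelet transform:
    pairwise averages (approximation) and differences (detail)."""
    approx = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def haar_1d_inverse(approx, detail):
    """Exact reconstruction from averages and differences."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

s = [9.0, 7.0, 3.0, 5.0]
a, d = haar_1d(s)
print(a, d)                    # [8.0, 4.0] [1.0, -1.0]
print(haar_1d_inverse(a, d))   # [9.0, 7.0, 3.0, 5.0]
```

Applying the transform recursively to the approximation coefficients yields the multi-level decomposition used in image coders.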
Improving initial generations in PSO algorithm for transportation network des... (ijcsit)
The Transportation Network Design Problem (TNDP) aims to select the best project sets from among a number of new projects. Recently, metaheuristic methods have been applied to solve the TNDP in the sense of finding better solutions sooner. PSO is a metaheuristic method based on stochastic optimization and is a parallel evolutionary computation technique. The PSO system initializes with a number of random solutions and searches for the optimal solution by improving generations. This paper studies the behavior of PSO with respect to improving the initial generation and the fitness value domain, in order to find better solutions than previous attempts.
An improved SPFA algorithm for single source shortest path problem using forw... (IJMIT Journal)
We present an improved SPFA algorithm for the single-source shortest path problem. For a random graph, the empirical average time complexity is O(|E|), where |E| is the number of edges of the input network. SPFA maintains a queue of candidate vertices and adds a vertex to the queue only if that vertex is relaxed. In the improved SPFA, the MinPoP principle is employed to improve the quality of the queue. We theoretically analyse the advantage of this new algorithm and experimentally demonstrate that it is efficient.
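For reference, the baseline SPFA that the paper improves can be sketched in a few lines. This is the plain queue-based Bellman-Ford variant; the MinPoP queue-ordering principle described above is not implemented here:

```python
from collections import deque

def spfa(graph, source):
    """Queue-based Bellman-Ford: a vertex re-enters the queue
    only when its tentative distance has just been relaxed."""
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    queue = deque([source])
    in_queue = {v: v == source for v in graph}
    while queue:
        u = queue.popleft()
        in_queue[u] = False
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if not in_queue[v]:
                    queue.append(v)
                    in_queue[v] = True
    return dist

g = {'a': [('b', 2), ('c', 5)], 'b': [('c', 1)], 'c': []}
print(spfa(g, 'a'))  # {'a': 0, 'b': 2, 'c': 3}
```

Variants like the one in the paper change which vertex leaves the queue next; the relaxation logic itself is unchanged.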
"Scalable Link Discovery for Modern Data-Driven Applications" as presented in the 15th International Semantic Web Conference ISWC, Doctoral Consortium, October 18th, 2016, held in Kobe, Japan
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Asynchronous parallel algorithms are developed to solve massive optimization problems in distributed data systems; they can run in parallel on multiple nodes with little or no synchronization. Recently they have been successfully applied to a range of difficult problems in practice. However, the existing theories are mostly based on fairly restrictive assumptions on the delays and cannot explain the convergence and speedup properties of such algorithms. In this talk we give an overview of distributed optimization and discuss some new theoretical results on the convergence of the asynchronous parallel stochastic gradient algorithm with unbounded delays. Simulated and real data are used to demonstrate the practical implications of these theoretical results.
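The role of delays can be illustrated with a toy experiment: gradient descent on f(x) = x², where each update uses a gradient computed at a stale iterate, mimicking an asynchronous worker reading an out-of-date parameter. The delay model and step size below are illustrative assumptions, not the talk's setting:

```python
import random

random.seed(1)
history = [5.0]      # iterate history; history[-1] is the current parameter
step = 0.1
max_delay = 4        # a worker may read a parameter up to 4 updates old

for t in range(200):
    stale_index = max(0, len(history) - 1 - random.randint(0, max_delay))
    stale_x = history[stale_index]   # stale read by an asynchronous worker
    grad = 2 * stale_x               # gradient of x**2 at the stale point
    history.append(history[-1] - step * grad)

print(abs(history[-1]))  # small: the iterates still approach the minimizer 0
```

With a small enough step size the iterates converge despite the staleness; the theoretical results mentioned above characterize when this keeps holding as delays grow unbounded.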
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni... (MLconf)
Fast, Cheap and Deep – Scaling Machine Learning: Distributed high-throughput machine learning is both a challenge and a key enabling technology. Using a Parameter Server template, we are able to distribute algorithms efficiently over multiple GPUs and in the cloud. This allows us to design very fast recommender systems, factorization machines, classifiers, and deep networks. This degree of scalability allows us to tackle computationally expensive problems efficiently, yielding excellent results, e.g. in visual question answering.
A multi-objective hybrid ACO-PSO optimization algorithm for virtual machine p... (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of engineering and technology.
An Effective PSO-inspired Algorithm for Workflow Scheduling (IJECEIAES)
The Cloud is a computing platform that provides on-demand access to a shared pool of configurable resources such as networks, servers and storage that can be rapidly provisioned and released with minimal management effort from clients. At its core, Cloud computing focuses on maximizing the effectiveness of the shared resources. Workflow scheduling is therefore one of the challenges that the Cloud must tackle, especially when a large number of tasks are executed on geographically distributed servers. This entails the need for an effective scheduling algorithm that minimizes task completion time (makespan). Although workflow scheduling has been the focus of many researchers, only a handful of efficient solutions have been proposed for Cloud computing. In this paper, we propose LPSO, a novel algorithm for the workflow scheduling problem based on the Particle Swarm Optimization method. Our proposed algorithm not only ensures fast convergence but also prevents getting trapped in local extrema. We ran realistic scenarios using CloudSim and found that LPSO is superior to previously proposed algorithms, with a negligible deviation between the solution found by LPSO and the optimal solution.
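The particle-swarm mechanism that such schedulers build on can be sketched on a toy instance: independent tasks are mapped onto machines by rounding each particle's continuous coordinates, with makespan as the fitness. This is a generic PSO with conventional coefficient choices, not the LPSO algorithm of the paper:

```python
import random

random.seed(42)
durations = [4, 2, 7, 1, 3, 5]   # six independent tasks
n_machines = 2

def makespan(position):
    """Decode a continuous position into machine indices and
    return the maximum machine load."""
    loads = [0.0] * n_machines
    for task, x in enumerate(position):
        loads[int(round(x)) % n_machines] += durations[task]
    return max(loads)

n_particles, n_iters = 20, 60
dim = len(durations)
pos = [[random.uniform(0, n_machines - 1) for _ in range(dim)] for _ in range(n_particles)]
vel = [[0.0] * dim for _ in range(n_particles)]
pbest = [p[:] for p in pos]
gbest = min(pbest, key=makespan)

for _ in range(n_iters):
    for i in range(n_particles):
        for d in range(dim):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (0.7 * vel[i][d]                         # inertia
                         + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull
                         + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social pull
            pos[i][d] += vel[i][d]
        if makespan(pos[i]) < makespan(pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=makespan)

print(makespan(gbest))  # best makespan found; the optimum for these durations is 11
```

Real workflow schedulers additionally encode task dependencies and data transfer costs in the fitness function; the swarm update itself is unchanged.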
VLSI Implementation of 32-Bit Unsigned Multiplier Using CSLA & CLAA (IJMTST Journal)
In this project we compare the performance of different adders implemented in multipliers, based on the area and time needed for calculation. The CLAA-based multiplier has a delay of 99 ns for the multiplication operation, and the CSLA-based multiplier has nearly the same delay. However, the CSLA-based multiplier reduces the area needed by the CLAA-based multiplier by 31% to complete the multiplication operation.
Mining Frequent Closed Graphs on Evolving Data Streams (Albert Bifet)
Graph mining is a challenging task by itself, and even more so when processing data streams that evolve in real time. Data stream mining faces hard constraints on processing time and space, and also needs to provide for concept drift detection. In this talk we present a framework for studying graph pattern mining on time-varying streams and large datasets.
International Journal of Engineering Research and Applications (IJERA) is an open-access, online, peer-reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low-Power VLSI Design, etc.
Introduction to Deep Learning with Python (indico data)
A presentation by Alec Radford, Head of Research at indico Data Solutions, on deep learning with Python's Theano library.
The emphasis of the presentation is high performance computing, natural language processing (using recurrent neural nets), and large scale learning with GPUs.
Video of the talk available here: https://www.youtube.com/watch?v=S75EdAcXHKk
Wave File Feature Extraction using Reduced LBP (IJECEIAES)
In this work, we present a novel approach for extracting features of a digital wave file; the approach is presented, implemented and tested. A signature, or key, is created for any wave file, and this signature is reduced to minimize the effort required by digital signal processing applications. The resulting features array can then be used as a key to recover a wave file from a database of several wave files using reduced local binary patterns (RLBP). Experimental results show that the proposed RLBP method is at least 3 times faster than the CSLBP method, which means that the proposed method is more efficient.
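The local-binary-pattern idea behind such signatures is simple for a 1-D signal: encode each sample by comparing its neighbours to it, then histogram the codes into a fixed-length key. The sketch below is a plain 1-D LBP with a hypothetical histogram reduction; it does not reproduce the paper's exact RLBP or CSLBP definitions:

```python
def lbp_1d(samples, radius=1):
    """1-D local binary pattern: one bit per neighbour,
    set when the neighbour is >= the centre sample."""
    codes = []
    for i in range(radius, len(samples) - radius):
        code = 0
        for j in range(i - radius, i + radius + 1):
            if j == i:
                continue
            code = (code << 1) | (1 if samples[j] >= samples[i] else 0)
        codes.append(code)
    return codes

def histogram_signature(codes, n_bins=4):
    """Collapse the codes into a fixed-length histogram usable as a key."""
    hist = [0] * n_bins
    for c in codes:
        hist[c % n_bins] += 1
    return hist

wave = [0, 3, 1, 4, 1, 5, 9, 2]
print(lbp_1d(wave))                       # [0, 3, 0, 3, 1, 0]
print(histogram_signature(lbp_1d(wave)))  # [3, 1, 0, 2]
```

Because the histogram is fixed-length, it can index a database of wave files regardless of their durations.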
Three types of service model: SaaS, PaaS, IaaS.
Four types of deployment model: public, private, hybrid and community cloud.
During the load balancing process, a few issues are yet to be fully addressed. Among them are:
some nodes are overutilized while other nodes are underutilized;
improper workload distribution in a Cloud environment results in resource utilization overhead and, in turn, inefficient use of energy;
response time of jobs;
communication cost of servers;
maintenance cost of VMs;
throughput and overload of any single node.
By addressing the concern of load balancing, we aim to address multiple facets of the Cloud, viz. (a) resource utilization, (b) CPU time, (c) migration time.
Problem statement
Problems raised while dealing with load balancing:
How to minimize the CPU time?
How to increase the resource utilization?
How to decrease the energy consumption and migration time?
International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS (csandit)
The ability to automatically mine and extract useful information from large datasets has been a common concern for organizations (having large datasets) over the last few decades. Over the internet, data is increasing steadily, and consequently the capacity to collect and store very large data is significantly increasing. Existing clustering algorithms are not always efficient and accurate in solving clustering problems for large datasets, and the development of accurate and fast data classification algorithms for very large scale datasets is still a challenge. In this paper, various algorithms and techniques, especially an approach using a non-smooth optimization formulation of the clustering problem, are proposed for solving the minimum sum-of-squares clustering problem in very large datasets. This research also develops an accurate and real-time L2-DC algorithm, based on the incremental approach, to solve the minimum sum-of-squares clustering problem.
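For orientation, the minimum sum-of-squares objective targeted here is the one classically attacked by Lloyd's k-means iteration. The 1-D sketch below illustrates that objective only; it is not the paper's non-smooth L2-DC or incremental approach:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternately assign points to the nearest
    centre and move each centre to the mean of its cluster, which
    locally minimises the sum of squared distances."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centres[c]) ** 2)
            clusters[nearest].append(p)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres

data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
print(sorted(kmeans(data, 2)))  # one centre near 1.0, one near 9.5
```

Each full pass over the data costs O(nk), which is exactly what incremental and optimization-based formulations try to reduce on very large datasets.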
Task Scheduling using Hybrid Algorithm in Cloud Computing Environments (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Efficient Dynamic Scheduling Algorithm for Real-Time Multi-Core Systems (iosrjce)
An imprecise computation model is used in a dynamic scheduling algorithm with a heuristic function to schedule task sets. A task is characterized by its ready time, worst-case computation time, deadline and resource requirements. A task failing to meet its deadline and resource requirements on time is split into a mandatory part and an optional part. These sub-tasks can execute concurrently on multiple cores, thus achieving the parallelization provided by the multi-core system. The mandatory part produces acceptable results, while the optional part refines the result further. To study the effectiveness of the proposed scheduling algorithm, extensive simulation studies have been carried out. The performance of the proposed scheduling algorithm is compared with the myopic and improved myopic scheduling algorithms. The simulation studies show that the schedulability of the task-split myopic algorithm is always higher than that of the myopic and improved myopic algorithms.
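The mandatory/optional split can be made concrete with a deliberately simplified single-core sketch: mandatory parts are scheduled earliest-deadline-first, and an optional part runs only if it still fits before its deadline. The task tuples are invented for illustration, and the paper's multi-core parallelism and resource requirements are omitted:

```python
# Each task: (name, ready_time, deadline, mandatory_time, optional_time).
tasks = [
    ("A", 0, 10, 4, 3),
    ("B", 0, 6, 3, 2),
    ("C", 2, 12, 2, 5),
]

def schedule_imprecise(tasks):
    """Earliest-deadline-first over mandatory parts; an optional part
    runs only when slack remains (the imprecise computation model)."""
    time, plan = 0, []
    for name, ready, deadline, mand, opt in sorted(tasks, key=lambda t: t[2]):
        time = max(time, ready)
        if time + mand > deadline:
            plan.append((name, "rejected"))   # cannot even meet the mandatory part
            continue
        time += mand
        if time + opt <= deadline:            # slack left: refine the result
            time += opt
            plan.append((name, "full"))
        else:
            plan.append((name, "mandatory only"))
    return plan

print(schedule_imprecise(tasks))
```

Here task B completes fully, while A and C deliver only their acceptable mandatory results; on a multi-core system the optional parts could run in parallel instead of being dropped.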
AI optimizing HPC simulations (presentation from the 6th EULAG Workshop) (byteLAKE)
See our presentation from the 6th International EULAG Users Workshop. We talked about taking HPC into "Industry 4.0" by implementing smart techniques to optimize codes in terms of performance and energy consumption. The presentation explains how machine learning can dynamically optimize HPC simulations and introduces byteLAKE's software autotuning solution.
Find out more about byteLAKE at: www.byteLAKE.com
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM (ecij)
In the current era, cloud computing has become one of the most powerful and rapidly emerging technologies; IT companies have already changed the way they buy and design hardware through it, and it is a highly useful utility that can also make software more attractive. Load balancing research in cloud technology is one of the hottest topics of modern times. In this paper, by examining various proposed algorithms, the topic of load balancing in cloud computing is surveyed and the algorithms are compared to provide a gist of the latest directions in this research area. The paper shows that using a genetic algorithm yields the most flexible balance.
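The genetic-algorithm approach can be sketched on a toy instance: a chromosome assigns each task to a VM, and fitness is the maximum VM load (lower means better balanced). The population size, rates, and costs below are illustrative assumptions, not the paper's specific configuration:

```python
import random

random.seed(7)
costs = [6, 2, 5, 3, 7, 1, 4, 2]   # task costs; total load is 30
n_vms, pop_size, gens = 3, 30, 80

def max_load(chrom):
    """Fitness: the load of the busiest VM under this assignment."""
    loads = [0] * n_vms
    for task, vm in enumerate(chrom):
        loads[vm] += costs[task]
    return max(loads)

pop = [[random.randrange(n_vms) for _ in costs] for _ in range(pop_size)]
for _ in range(gens):
    pop.sort(key=max_load)                       # selection: keep the fittest half
    survivors = pop[:pop_size // 2]
    children = []
    while len(children) < pop_size - len(survivors):
        a, b = random.sample(survivors, 2)
        cut = random.randrange(1, len(costs))    # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.2:                # mutation: reassign one task
            child[random.randrange(len(costs))] = random.randrange(n_vms)
        children.append(child)
    pop = survivors + children

best = min(pop, key=max_load)
print(max_load(best))  # the perfectly balanced maximum load here is 10
```

Real cloud balancers would fold response time, migration cost, and energy into the fitness function, which is exactly where the multi-objective variants discussed above come in.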
(R)evolution of the computing continuum - A few challenges (Frederic Desprez)
Initially proposed to interconnect computers worldwide, the Internet has significantly evolved to become in two decades a key element in almost all our activities. This (r)evolution mainly relies on the progress that has been achieved in computation and communication fields and that has led to the well-known and widely spread Cloud Computing paradigm.
With the emergence of the Internet of Things (IoT), stakeholders expect a new revolution that will push, once again, the limits of the Internet, in particular by favouring the convergence between physical and virtual worlds. This convergence is about to be made possible thanks to the development of minimalist sensors as well as complex industrial physical machines that can be connected to the Internet through edge computing infrastructures.
Among the obstacles to this new generation of Internet services is the development of a convenient and powerful framework that should allow operators and devops to manage the life-cycle of both the digital infrastructures and the applications deployed on top of them, throughout the cloud-to-IoT continuum.
In this keynote, Frédéric Desprez and his colleague Adrien Lebre presented research issues and provided preliminary answers to identify whether the challenges brought by this new paradigm are an evolution or a revolution for our community.
SILECS/SLICES - Super Infrastructure for Large-Scale Experimental Computer Sc... (Frederic Desprez)
The aim of the SILECS and SLICES projects is to design and build a large infrastructure for experimental research on various aspects of distributed computing, from small connected objects to the large data centres of tomorrow. This infrastructure will allow end-to-end experimentation with software and applications at all levels of the software layers, from event capture (sensors, actuators) to data processing and storage, to radio transmission management and dynamic deployment of edge computing services, enabling reproducible research on all-point programmable networks. SILECS is the French node of a European infrastructure called SLICES.
Super Infrastructure for Large-Scale Experimental Computer Science, (Almost) everything you wanted to know about SILECS/SLICES but didn't dare to ask. Presentation at the "journées du GDR RSD", Nantes, Jan. 23, 2020.
SILECS: Super Infrastructure for Large-scale Experimental Computer Science (Frederic Desprez)
SILECS, based on two existing infrastructures (FIT and Grid'5000), aims to provide a large, robust, trustable and scalable instrument for research in distributed computing and networks. Experiments on the Internet of Things, data centers, cloud computing, security services, and the networks connecting them will be possible, in a reproducible way, on various hardware and software. This instrument will offer a multi-platform experimental infrastructure (HPC, Cloud, Big Data, Software Defined Storage, IoT, wireless, Software Defined Network / Radio) capable of exploring the infrastructures that will be deployed tomorrow, and it will assist researchers and industry in how to design, build and operate a multi-scale, robust and safe computer system. Diverse digital resources (compute, storage, links, IO devices) are assembled to support a "playground" at scale.
Challenges and Issues of Next Cloud Computing Platforms (Frederic Desprez)
Cloud computing has now crossed the frontiers of research to reach industry. It is used every day, whether to exchange emails or to make reservations on web sites. However, much research work remains to be done to improve the performance and functionality of these platforms of tomorrow. In this talk, I give an overview of some of the theoretical and applied research done at INRIA, particularly around Cloud distribution, energy monitoring and management, massive data processing and exchange, and resource management.
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ... (Frederic Desprez)
The increasing complexity of available infrastructures (hierarchical, parallel, distributed, etc.) with specific features (caches, hyper-threading, dual cores, etc.) makes it extremely difficult to build analytical models that allow for satisfying prediction. This raises the question of how to validate algorithms and software systems when a realistic analytic study is not possible. As for many other sciences, the answer is experimental validation. However, such experiments rely on the availability of an instrument able to validate every level of the software stack and offering different hardware and software facilities for compute, storage, and network resources.
Almost ten years after its beginnings, the Grid'5000 testbed has become one of the most complete testbeds for designing and evaluating large-scale distributed systems. Initially dedicated to the study of large HPC facilities, Grid'5000 has evolved to address wider concerns related to desktop computing, the Internet of Services and, more recently, the Cloud computing paradigm. We now target new processor features such as hyper-threading, turbo boost, and power management, as well as large applications managing big data. In this keynote we address both the issue of experiments in HPC and computer science and the design and usage of the Grid'5000 platform for various kinds of applications.
Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
1. Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
Eddy Caron (1), Frédéric Desprez (2), Adrian Muresan (1), Frédéric Suter (3), Kate Keahey (4)
(1) École Normale Supérieure de Lyon, France
(2) INRIA
(3) IN2P3 Computing Center, CNRS, IN2P3
(4) UChicago, Argonne National Laboratory
Joint-lab workshop
2. Outline
Context
Theory
Models
Proposed solution
Simulations
Practice
Architecture
Application
Experimentation
Conclusions and perspectives
3. Workflows are a common pattern in scientific applications
applications built on legacy code
applications built as an aggregate
use inherent task parallelism
phenomena having an inherent workflow structure
Workflows are omnipresent!
4. Figure: Workflow application examples: (a) Ramses, (b) Montage, (c) Ocean current modeling, (d) Epigenomics
7. Classic model of resource provisioning
static allocations in a grid environment
researchers compete for resources
researchers tend to over-provision and under-use
workflow applications have a non-constant resource demand
This is inefficient, but can it be improved?
Yes!
How?
on-demand resources
automate resource provisioning
smarter scheduling strategies
10. Why on-demand resources?
more efficient resource usage
eliminate overbooking of resources
can be easily automated
unlimited resources ∗ =⇒ budget limit
Our goal
consider a more general model of workflow apps
consider on-demand resources and a budget limit
find a good allocation strategy
11. Related work
Functional workflows
Bahsi, E.M., Ceyhan, E., Kosar, T.: Conditional Workflow Management: A Survey and Analysis. Scientific
Programming 15(4), 283–297 (2007)
biCPA
Desprez, F., Suter, F.: A Bi-Criteria Algorithm for Scheduling Parallel Task Graphs on Clusters. In: Proc.
of the 10th IEEE/ACM Intl. Symposium on Cluster, Cloud and Grid Computing. pp. 243–252 (2010)
Chemical programming for workflow applications
Fernandez, H., Tedeschi, C., Priol, T.: A Chemistry Inspired Workflow Management System for Scientific
Applications in Clouds. In: Proc. of the 7th Intl. Conference on E-Science. pp. 39–46 (2011)
Pegasus
Malawski, M., Juve, G., Deelman, E., Nabrzyski, J.: Cost- and Deadline-Constrained Provisioning
for Scientific Workflow Ensembles in IaaS Clouds. 24th IEEE/ACM International Conference on
Supercomputing (SC12) (2012)
12. Outline
Context
Theory
Models
Proposed solution
Simulations
Practice
Architecture
Application
Experimentation
Conclusions and perspectives
14. Application model
Non-deterministic (functional) workflows
An application is a graph G = (V, E), where
V = {v_i | i = 1, ..., |V|} is a set of vertices
E = {e_i,j | (i, j) ∈ {1, ..., |V|} × {1, ..., |V|}} is a set of edges representing precedence and flow constraints
Vertices
a computational task [parallel, moldable]
an OR-split [transitions described by random variables]
an OR-join
16. Platform model
A provider of on-demand resources from a catalog:
C = {vm_i = (nCPU_i, cost_i) | i ≥ 1}
nCPU_i is the number of equivalent virtual CPUs
cost_i is a monetary cost per running hour (Amazon-like)
communications follow a bounded multi-port model
Makespan
C = max_i C(v_i) is the global makespan, where C(v_i) is the finish time of task v_i ∈ V
Cost of a schedule S
Cost = Σ_{vm_i ∈ S} (Tend_i − Tstart_i) × cost_i
Tstart_i, Tend_i are the start and end times of vm_i
cost_i is the catalog cost of virtual resource vm_i
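To make the cost model concrete, here is a minimal Python sketch; the catalog entries, names, and the started-hour billing option are illustrative assumptions, not the paper's exact setup:

```python
from math import ceil

# Hypothetical Amazon-like catalog: name -> (nCPU, cost per running hour).
CATALOG = {
    "m1.small": (1, 0.09),
    "m1.large": (4, 0.36),
}

def schedule_cost(schedule, catalog=CATALOG, per_started_hour=False):
    """Cost = sum over vm_i in S of (Tend_i - Tstart_i) * cost_i, times in hours.

    With per_started_hour=True, each VM's runtime is rounded up to a whole
    hour, as Amazon-like providers bill per started hour."""
    total = 0.0
    for vm_type, t_start, t_end in schedule:
        _, cost_per_hour = catalog[vm_type]
        hours = t_end - t_start
        if per_started_hour:
            hours = ceil(hours)
        total += hours * cost_per_hour
    return total

# An m1.small running 2h plus an m1.large running 1.5h.
schedule = [("m1.small", 0.0, 2.0), ("m1.large", 0.5, 2.0)]
print(schedule_cost(schedule))                         # ≈ 0.72
print(schedule_cost(schedule, per_started_hour=True))  # ≈ 0.90
```

The started-hour variant illustrates why a schedule that looks cheapest under the continuous formula can cost more once billing granularity is taken into account.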
17. Problem statement
Given
G a workflow application
C a provider of resources from the catalog
B a budget
find a schedule S such that
Cost ≤ B (the budget limit is not exceeded)
the makespan C is minimized
with a predefined confidence.
18. Proposed approach
1. Decompose the non-DAG workflow into DAG sub-workflows
2. Distribute the budget to the sub-workflows
3. Determine allocations by adapting an existing allocation
approach
20. Step 2: Allocating budget
1. Compute the number of executions of each sub-workflow
the number of transitions of the edge connecting its parent OR-split to its start node
described by a random variable following a distinct normal distribution, together with a confidence parameter
2. Give each sub-workflow a share of the budget proportional to its work contribution.
Work contribution of a sub-workflow G^i
the sum of the average execution times of its tasks
average execution time computed over the catalog C
the task speedup model is taken into consideration
multiple executions of a sub-workflow are also considered
22. Step 3: Determining allocations
Two algorithms based on the bi-CPA algorithm.
Eager algorithm
one allocation for each task
good trade-off between makespan and average time-cost area
fast algorithm
considers allocation-time cost estimations only
Deferred algorithm
outputs multiple allocations for each task
good trade-off between makespan and average time-cost area
slower algorithm
one allocation is chosen at scheduling time
23. Algorithm parameters
T_A^over, T_A^under: over- and under-estimates of the average work allocated to tasks
T_CP: duration of the critical path
B~: estimation of the budget used when T_A and T_CP meet
T_A keeps increasing as we increase the allocation of tasks and T_CP keeps decreasing, so they will eventually meet. When they do, we have a trade-off between the average work in tasks and the makespan.
p(v_i): number of processing units allocated to task v_i
24. The eager allocation algorithm
1: for all v ∈ V^i do
2:   Alloc(v) ← {min_{vm_i ∈ C} nCPU_i}
3: end for
4: Compute B~
5: while T_CP > T_A^over and Σ_{j=1}^{|V^i|} cost(v_j) ≤ B^i do
6:   for all v_i ∈ Critical Path do
7:     Determine Alloc'(v_i) such that p'(v_i) = p(v_i) + 1
8:     Gain(v_i) ← T(v_i, Alloc(v_i)) / p(v_i) − T(v_i, Alloc'(v_i)) / p'(v_i)
9:   end for
10:  Select v such that Gain(v) is maximal
11:  Alloc(v) ← Alloc'(v)
12:  Update T_A^over and T_CP
13: end while
Algorithm 1: Eager-allocate(G^i = (V^i, E^i), B^i)
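The loop structure above can be sketched in Python under strong simplifications: a chain-shaped sub-workflow (so the critical path is every task), Amdahl-style task times T(v, p) = seq + par/p, a flat price per CPU-hour, and a T_A proxy computed as the average allocated work. All names and the p_max safety cap are hypothetical; this illustrates the eager loop, not the authors' implementation.

```python
def t(task, p):
    """T(v, p): Amdahl-style runtime of a task on p CPUs (hours)."""
    seq, par = task
    return seq + par / p

def eager_allocate(tasks, budget, price_per_cpu_hour=0.09, p_max=64):
    alloc = [1] * len(tasks)                      # lines 1-3: minimal allocation

    def t_cp():                                   # chain workflow: the critical
        return sum(t(tk, p) for tk, p in zip(tasks, alloc))  # path is all tasks

    def cost():                                   # estimate: p CPUs billed for
        return sum(t(tk, p) * p * price_per_cpu_hour         # the task duration
                   for tk, p in zip(tasks, alloc))

    def t_a():                                    # average work allocated to tasks
        return sum(t(tk, p) * p for tk, p in zip(tasks, alloc)) / len(tasks)

    while t_cp() > t_a() and cost() <= budget:    # line 5
        # line 8: gain of granting one more CPU to each (critical-path) task
        gains = [t(tasks[i], alloc[i]) / alloc[i]
                 - t(tasks[i], alloc[i] + 1) / (alloc[i] + 1)
                 for i in range(len(tasks))]
        best = max(range(len(tasks)), key=gains.__getitem__)  # line 10
        if alloc[best] >= p_max:                  # safety cap (not in Algorithm 1)
            break
        alloc[best] += 1                          # line 11
    return alloc

# Two tasks described as (sequential hours, parallelizable hours).
print(eager_allocate([(0.5, 4.0), (0.2, 8.0)], budget=10.0))
```

Since the allocated work T(v, p) × p grows with p while the critical-path length shrinks, the loop is guaranteed to terminate, mirroring the meeting argument on the algorithm-parameters slide.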
25. Methodology
Simulation using SimGrid
Used 864 synthetic workflows for three types of applications
Fast Fourier Transform
Strassen matrix multiplication
Random workloads
Used a virtual resource catalog inspired by Amazon EC2
Used a classic list-scheduler for task mapping
Measured
Cost and makespan after task mapping
Name #VCPUs Network performance Cost / hour
m1.small 1 moderate 0.09
m1.med 2 moderate 0.18
m1.large 4 high 0.36
m1.xlarge 8 high 0.72
m2.xlarge 6.5 moderate 0.506
m2.2xlarge 13 high 1.012
m2.4xlarge 26 high 2.024
c1.med 5 moderate 0.186
c1.xlarge 20 high 0.744
cc1.4xlarge 33.5 10 Gigabit Ethernet 0.186
cc2.8xlarge 88 10 Gigabit Ethernet 0.744
28. First conclusions
Eager is fast but cannot guarantee the budget constraint after mapping
Deferred is slower, but guarantees the budget constraint
Past a certain budget, they yield identical allocations
For small applications and small budgets, Deferred should be preferred.
When the size of the applications increases or the budget limit approaches task parallelism saturation, Eager is preferable.
29. Outline
Context
Theory
Models
Proposed solution
Simulations
Practice
Architecture
Application
Experimentation
Conclusions and perspectives
30. Architecture
Figure: System architecture. The scientist sends a workflow to the DIET (MADAG) workflow engine and receives the result. The engine acquires and releases resources through the Phantom service, which manages autoscaling groups of Nimbus VMs (VM1 to VMn, each running a Phantom frontend and SeDs for the Grafic1 and Ramses tasks), and the workflow tasks are deployed on those VMs.
31. Main components
Nimbus
open-source IaaS provider
provides low-level resources (VMs)
compatible with the Amazon EC2 interface
used a FutureGrid install
32. Main components
Phantom
auto-scaling and high availability provider
high-level resource provider
subset of the Amazon auto-scale service
part of the Nimbus platform
used a FutureGrid install
still under development
33. Main components
MADAG
workflow engine
part of the DIET (Distributed Interactive Engineering Toolkit) software
one service implementation per task
each service launches its associated task
supports DAG, PTG and functional workflows
34. Main components
Client
describes the workflow in XML
implements the services
calls the workflow engine
no explicit resource management
selects the IaaS provider to deploy on
35. How does it work?
Figure: System architecture (same diagram as slide 30), walking through the flow: workflow submission to DIET (MADAG), resource acquisition through Phantom and Nimbus, task deployment on the VMs, and result delivery to the scientist.
36. RAMSES
N-body simulations of dark matter interactions
backbone of galaxy formation studies
AMR (Adaptive Mesh Refinement) workflow application
parallel (MPI) application
can refine at different zoom levels
38. Methodology
used a FutureGrid Nimbus installation as a testbed
measured running time for static and dynamic allocations
estimated cost for each allocation
varied maximum number of used resources
40. Results
Figure: Running times (MM:SS) for a 26 × 26 × 26 box simulation as a function of the number of VMs (0 to 16), comparing static runtime, dynamic runtime, and dynamic VM allocation time.
41. Results
Figure: Estimated costs (units) for a 26 × 26 × 26 box simulation as a function of the number of VMs (0 to 16), comparing static and dynamic allocations.
42. Outline
Context
Theory
Models
Proposed solution
Simulations
Practice
Architecture
Application
Experimentation
Conclusions and perspectives
43. Conclusions
proposed two algorithms, Eager and Deferred, each with their pros and cons
on-demand resources better match the resource usage of workflows
on-demand resources have a VM allocation overhead
the allocation overhead decreases with the number of VMs
for RAMSES, the cost is greatly reduced
45. Perspectives
preallocate VMs
spot instances
smarter scheduling strategies
determine, per application type, where the tipping point lies
compare our algorithms with others
Collaborations
continue the collaboration with the Nimbus/FutureGrid teams
on the algorithms themselves (currently too complicated for an actual implementation)
on understanding (obtaining models of) clouds and virtualized platforms
going from theoretical algorithms to (accurate) simulations and an actual implementation
46. References
Eddy Caron, Frédéric Desprez, Adrian Muresan and Frédéric Suter. Budget Constrained Resource Allocation for Non-Deterministic Workflows on an IaaS Cloud. 12th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP-12), Fukuoka, Japan, September 4-7, 2012.
Adrian Muresan, Kate Keahey. Outsourcing computations for galaxy simulations. In eXtreme Science and Engineering Discovery Environment 2012 (XSEDE12), Chicago, Illinois, USA, June 15-19, 2012. Poster session.
Adrian Muresan. Scheduling and deployment of large-scale applications on Cloud platforms. PhD thesis, Laboratoire de l'Informatique du Parallélisme (LIP), ENS Lyon, France, December 10, 2012.
47.
48. Algorithm parameters
T_A^over (Eager):
T_A^over = (1/B~) × Σ_{j=1}^{|V^i|} (T(v_j, Alloc(v_j)) × cost(v_j))
T_A^under (Deferred):
T_A^under = (1/B~) × Σ_{j=1}^{|V^i|} (T(v_j, Alloc(v_j)) × cost_under(v_j))
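Both estimators above are the same weighted sum normalized by B~; a minimal sketch, with hypothetical names (cost_under stands for the lower-bound cost estimate that Deferred uses):

```python
def t_a(times, costs, b_tilde):
    """T_A = (1 / B~) * sum_j T(v_j, Alloc(v_j)) * cost(v_j).

    Pass the over-estimated costs to get T_A^over (Eager) and the
    under-estimated costs to get T_A^under (Deferred)."""
    return sum(t * c for t, c in zip(times, costs)) / b_tilde

# Two tasks of 2h and 1h, costing 0.25/h and 0.5/h, against B~ = 1.0.
print(t_a([2.0, 1.0], [0.25, 0.5], 1.0))  # 1.0
```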
49. Budget distribution algorithm
1: ω* ← 0
2: for all G^i = (V^i, E^i) ⊆ G do
3:   nExec^i ← CDF^-1(D^i, Confidence)
4:   ω^i ← (1/|C|) × Σ_{v_j ∈ V^i} Σ_{vm_k ∈ C} T(v_j, vm_k) × nExec^i
5:   ω* ← ω* + ω^i
6: end for
7: for all G^i ⊆ G do
8:   B^i ← B × (ω^i / ω*) × (1 / nExec^i)
9: end for
Algorithm 2: Share-Budget(B, G, Confidence)
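Algorithm 2 can be sketched in Python under stated assumptions: each D^i is taken to be a normal law given by (mu, sigma), CDF^-1 is its inverse CDF rounded to a whole execution count (floored at 1), and task runtimes over the catalog are supplied as a table. Names are illustrative; this is not the authors' code.

```python
from statistics import NormalDist

def share_budget(budget, subworkflows, catalog, confidence=0.95):
    """Sketch of Share-Budget(B, G, Confidence).

    subworkflows: list of (task_times, mu, sigma), where task_times[j][k] is
    T(v_j, vm_k), the runtime of task j on catalog VM type k, and (mu, sigma)
    parameterize the normal law D^i of the sub-workflow's execution count.
    Returns the per-execution budget B^i of each sub-workflow."""
    omegas, n_execs = [], []
    for task_times, mu, sigma in subworkflows:
        # nExec^i <- CDF^-1(D^i, Confidence), rounded to a whole count
        n_exec = max(1, round(NormalDist(mu, sigma).inv_cdf(confidence)))
        # omega^i: average work over the catalog, times the execution count
        omega = sum(sum(row) for row in task_times) / len(catalog) * n_exec
        omegas.append(omega)
        n_execs.append(n_exec)
    omega_star = sum(omegas)
    # B^i = B * (omega^i / omega*) * (1 / nExec^i)
    return [budget * (w / omega_star) / n for w, n in zip(omegas, n_execs)]

catalog = ["m1.small", "m1.large"]
# Sub-workflow 1: one task (2h or 1h), about 1 execution;
# sub-workflow 2: one task (4h or 2h), about 2 executions.
shares = share_budget(10.0, [([[2.0, 1.0]], 1.0, 0.1),
                             ([[4.0, 2.0]], 2.0, 0.1)], catalog)
print(shares)  # ≈ [2.0, 4.0]
```

Dividing by nExec^i at the end gives the budget for a single execution, so repeated runs of a heavily re-executed sub-workflow do not overdraw its share.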