This document discusses a framework for certifying workflows in a component-based cloud computing platform for high-performance computing (HPC) services. It presents a Scientific Workflow Component Certifier (SWC2) that uses the mCRL2 model checking toolset to verify workflows meet safety and liveness properties. It describes translating scientific workflows specified in SAFeSWL into the mCRL2 input language and evaluating default and application-specific properties. As a case study, it models and certifies a MapReduce workflow in the HPC Shelf system, checking 20 formal properties and measuring certification times for different components.
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
FACS2017-Presentation.pdf
1. Certification of Workflows in a Component-Based
Cloud of High Performance Computing Services
Allberson Bruno de Oliveira Dantas1
Francisco Heron de Carvalho Junior1
Luís Soares Barbosa2
1Mestrado e Doutorado em Ciência da Computação, Universidade Federal do Ceará,
Brazil
2HASLab INESC TEC & Universidade do Minho, Campus de Gualtar, Braga, Portugal
2. Agenda
I Context
I HPC Shelf;
I A Component Certification Framework for HPC Shelf;
I Results
I SWC2 (Scientific Workflow Component Certifier);
I A small scale case study (MapReduce).
I Conclusions and Further Works.
3. Context - HPC Shelf - Problem Posed
How to offer to Application Providers
some confidence that components of Parallel
Computing Systems they built do what
Component Developers declare they do in
Contextual Contracts ?
4. Context - HPC Shelf - The Solution
A Certification Framework for
Components of Parallel Computing Systems !
A. B. de O. Dantas, F. H. de Carvalho Junior, L. S. Barbosa, A
Framework for Certification of Large-scale Component-based
Parallel Computing Systems in a Cloud Computing Platform for
HPC Services, Proceedings of the 7th International Conference on
Cloud Computing and Services Science (CLOSER’2017), Porto,
Portugal, 2017.
5. Context - HPC Shelf - The Certification Framework
I Computations;
I Virtual Platforms;
I Connectors;
I Data Sources;
I Certifiers;
I Tacticals;
I Service Bindings;
I Action Bindings;
I Certification Bindings.
6. Context - HPC Shelf - The Certification Framework
I Expert Users;
I Application Providers;
I Component Developers;
I Platform Maintainers;
I Certifier.
10. Context - HPC Shelf - The Solution
A. B. de O. Dantas, F. H. de Carvalho Junior, L. S. Barbosa, A
Framework for Certification of Large-scale Component-based
Parallel Computing Systems in a Cloud Computing Platform for
HPC Services, Proceedings of the 7th International Conference on
Cloud Computing and Services Science (CLOSER’2017), Porto,
Portugal, 2017.
I Case Study on Certification of Computation Components;
I C4 (Computation Component Certifier Component).
11. Problem - HPC Shelf - Certifying Workflows
This paper is concerned with ...
I Certification of Workflows of Parallel Computing Systems with
respect to safety and liveness properties;
I
.
.
.
12. Why Certifying Workflows in HPC Shelf ?
I The interest in SWfMS’s;
“Workflows have been largely applied by scientists and
engineers for the design, execution and monitoring of
reusable data processing task pipelines in scientific
discovery and data analysis”
I Workflows are commonly represented by components that absorb
all the orchestration logic required to solve a specific problem;
I Ensuring the reliability of workflows is a challenging task:
I Deadlocks must be avoided;
I Crucial operations must be effectively executed;
I The protocol that defines the order in which certain critical
operations must be executed must be respected;
I No faulty behaviors should be induced by badly designed workflows.
13. Problem - HPC Shelf - Certifying Workflows
This paper is concerned with ...
I Certification of Workflows of Parallel Computing Systems with
respect to safety and liveness properties;
I SWC2 (Scientific Workflow Component Certifier);
I A tactical component for the mCRL2 toolset.
14. Why mCRL2 ?
A tool for modeling and analysis of communicating systems
I System behaviors in mCRL2 are specified in a process algebra
reminiscent of ACP (Algebra of Communicating Processes);
I Processes are built from a set of user-declared actions and a small
number of combinators;
I Actions can be parameterized by data and conditional constructs;
I Data is defined in terms of abstract, equational data types;
I Behaviors given operationally, using Labeled Transition Systems;
I Properties are expressed in a modal logic with fixed points,
extending Kozen’s propositional modal µ-calculus with data
variables and quantification over data domains;
I Model checking.
15. The SWC2 Certifier
Workflow patterns
I What are the relevant properties a scientific workflow is expected
to comply?
I Experience with SAFe specifications and a state-of-the-art survey
on widespread SWfMSs:
I Coarse-Grained Components;
I Few orchestration constructors;
I Abstract workflows;
I Separation of Interface from Implementation;
I Incremental Instantiation of Computational Resources;
I Internal Workflows;
I Clustering.
16. The SWC2 Certifier
Internal Workflows in HPC Shelf
Each component workflow of a component C consists of a set of rules
of the form:
C ::= act1 → act2 ↓ | > → act2 ↓ | act1 → act2
8
(component rule)
17. The SWC2 Certifier
Formal Properties and Contextual Contracts
Properties:
I default;
I application;
I ad hoc.
Contextual signature of SWC2:
SWC2 [deadlock_absence = D : DAType,
infinite_loop_absence = I : ILAType,
life_cycle_verification = L : LCVType,
ad_hoc_properties = A : AHType]
A concrete certifier for SWC2:
SWC2Impl : SWC2 [deadlock_absence = DeadlockAbsence,
infinite_loop_absence = InfiniteLoopAbsence,
life_cycle_verification = LifeCycleVerification,
ad_hoc_properties = AdHocProperties]
18. The SWC2 Certifier
Formal Properties and Contextual Contracts
Tactical component:
Tactical [message_passing_interface = M : MPIType,
number_of_nodes = N : Integer, number_of_cores = C : Integer]
mCRL2 [version = V : VersionType, message_passing_interface = M : MPIType,
number_of_nodes = N : Integer, number_of_cores = C : Integer]
extends Tactical [message_passing_interface = M,
number_of_nodes = N,number_of_cores = C]
0 <sequence>
1 <perform action="resolve" id_port="mCRL2-life-cycle"/>
2 <perform action="deploy" id_port="mCRL2-life-cycle"/>
3 <perform action="instantiate" id_port="mCRL2-life-cycle"/>
4 <perform action="verify_perform" id_port="mCRL2-verify"/>
5 <perform action="release" id_port="mCRL2-life-cycle"/>
6 </sequence>
Figure: The orchestration code of the certifier SWC2Impl in TCOL.
19. The SWC2 Certifier
Formal Properties and Contextual Contracts
A concrete tactical component fot mCRL2:
mCRL2Impl : mCRL2 [version = 201409.1,
number_of_cores = 4,
message_passing_interface = MPICH2]
The contextual contract for mCRL2 in SWC2Impl:
mCRL2 [message_passing_interface = MPI]
A contract for certifying a workflow (provided in a certification binding
between the workflow component and a SWC2 component):
SWC2 [deadlock_absence = DeadlockAbsence, ad_hoc_properties = AdHocProperties]
20. Translating SAFeSWL to mCRL2
The Translation Process
A formal grammar of the orchestration subset of SAFeSWL:
T ::= L | G | T1; T2 | T1||T2 | repeat T (task)
L ::= act | break | continue | start(h, act) | wait(h) | cancel(h) (literal)
G ::= act↓T | act↓T + G (guarded tasks)
21. Translating SAFeSWL to mCRL2
The Translation Process
I The semantics of W consists of a task TW is given by the rules
described ahead, and initial state hTW , stop, ∅, ∅, ∅, ∅i;
I The symbol stop denotes task completion;
I Each execution state consists of a tuple hT1, T2, E, L, S, Fi:
I T1 is the next task to be evolved;
I T2 is the following task to be evolved;
I E are the actions enabled in the components;
I L is a stack of pairs with the beginning and the end of the repeat
blocks scoping the current task;
I S is a set of pairs with actions asynchronously activated that have
not yet been finished and their handles;
I F is a set of handles associated to finished asynchronous actions.
23. Default Properties in mCRL2
Deadlock absence:
I DA : [true∗]htrueitrue
I there is always a possible next action at every point in the
workflow
Infinite loop absence:
I ILA : ∀i : Nat.[true∗]htrue ∗ .break(i)itrue
I verifies if all mCRL2 break(i) actions can occur from a certain
point on, where i is the index of the related repeat
Life-cycle restrictions:
I LC1 : ∀c : Nat.[!resolve(c) ∗ .deploy(c)]false
htrue ∗ .resolve(c).!release(c) ∗ .deploy(c)itrue
I a deploy may not be performed before a resolve
25. Case Study - MapReduce
MapReduce is a programming model implemented by a number of
large-scale data parallel processing frameworks. A user must specify:
I a map function, which is applied by a set of parallel mapper
processes to each element of an input list of key/value pairs
(KV-pairs), returning a set of elements in an intermediary list of
KV-pairs;
I a reduce function, which is applied by a set of parallel reducer
processes to all intermediate values associated with the same key
across all mappers, yielding a list of output pairs.
26. Case Study - MapReduce
Figure: A classic example of word counting with MapReduce.
27. Case Study - MapReduce
We have designed a framework of components for MapReduce
computations in HPC Shelf, comprising the following components:
I Mapper;
I Reducer;
Computations
I Splitter;
I Shuffler.
Connectors
I ReadPairsEnviromnentPortType;
I WritePairsEnviromnentPortType;
I IteratorEnvironmentPortType;
Environment Port Types
I TaskSourceChunkPortType.
I TaskChunkPortType;
Task Port Types
30. Case Study - MapReduce
Certification of the MapReduce Workflow
Figure: Certification architecture for the MapReduce workflow.
31. Case Study - MapReduce
MapReduce Ad Hoc Properties
The MapReduce ad hoc properties include safety and liveness ones.
The safety group describes precedences of execution between two
distinct components or component actions (2 examples of 11):
SbM : [!compute(3, 352) ∗ .compute(5, 551)]false
MbC : [true ∗ .compute(5, 551).!compute(3, 352) ∗ .
compute(5, 551)]false
The liveness group includes the following set of properties:
LIV1 : htrue ∗ .guard(3, 342)i true
LIV2 : ∀c : Nat, a : Nat. νX.µY .[compute(c, a)]Y && [!compute(c, a)]X
LIV3 : [true∗](∀c : Nat, a : Nat =>
µY .([!compute(c, a)]Y && < true > true))
32. Case Study - MapReduce
MapReduce Workflow Certification Time
The 20 formal properties (both default and ad hoc) verified were
distributed among the 4 units of the tactical component, and proven.
Figure: MapReduce workflow certification times.
33. Conclusions and Future Work
I The role of SWC2 within an application is clear:
Increasing the confidence on the underlying workflow
specification and avoiding anomalous or erroneous
behaviours introduced by design errors.
I Such an approach presents a reflexive character:
A component that certify the orchestration of the other
components of a parallel computing system.
I We have advanced the work with our certification framework for
HPC Shelf by introducing the possibility of verifying
workflow components against behavioural properties;
I New tactical components and certifiers can be smoothly added to
deal with the verification of different classes of properties over
components of diferent kinds.
34. Conclusions and Future Work
I The work on workflow certification in HPC Shelf, as well as its
certification framework, will be continuosly an ongoing project;
I e.g. a certifier for connectors (Reo ?).
I At present, an initial prototype was developed in C#/MPI and
validated through benchmark examples, such as MapReduce;
I Implementations are available at
github.com/UFC-MDCC-HPC/HPC-Shelf-Certification;
I we are looking forlarger case studies !
I The abstractions behind verifying workflow properties through the
certification framework we propose for HPC Shelf could be
employed by other SWfMS systems (e.g. Pegasus or Taverna)
I Possibly, part os further works ...