S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

S-Cube Learning Package

Data Dependency:
Inferring Data Attributes in Service Orchestrations
Based on Sharing Analysis

Universidad Politécnica de Madrid (UPM)

Learning Package Categorization

S-Cube

WP-JRA-2.2: Adaptable Coordinated Service
Compositions

Models and Mechanisms For Coordinated
Service Compositions

Data Dependency:
Inferring Data Attributes in Service Orchestrations Based on
Sharing Analysis

Table of Contents

1 Introduction and Background
2 Motivation and Problem Statement
3 Overview of the Approach
Contexts and Concept Lattices
Horn Clause Programs
Workflows in Horn Clause Form
Sharing Analysis
Obtaining and Interpreting Results

4 Application to Fragment Identification
5 Conclusions

These slides have been prepared for offline viewing. Throughout the presentation, running commentaries, notes and
additional remarks will be displayed on the margins using the condensed font, like here.
Please refer to the publication list at the end for more details.

1 Introduction and Background

SOA and Web Services
Service Oriented Architecture (SOA):
• Flexible set of computing system design and implementation principles
• Emphasis on loose coupling between services and with the OS
• Distribution over Internet/intranet
• Actors: service providers and consumers
• Intrinsic dynamism and adaptability
• Functionality often in the form of Web services
Web services:
• Interoperability, platform independence
• Data exchange standards: XML
• Several technological flavors:
  WSDL/SOAP-based
  RESTful
• Typical implementation platforms:
  Java / .NET application servers, BPEL, Web server scripts (e.g. PHP)

We mention just some of the key features of SOA and Web Services that correspond to a “high-level” view of the area, i.e., without taking into account the details of implementation technologies and infrastructure. There are many standards and technologies in the service world. Also, different service provision platforms offer varying degrees of functionality to users and designers. For a more detailed introduction to the SOA design philosophy, platforms, tools, and techniques, please refer to the list of publications.

Service Compositions
Service compositions aggregate individual services to achieve a more complex or cross-organizational task:
• Combining loosely coupled components
• Compositions expose themselves as services
• Often described using workflows (control/data)
• Complex control and latent parallelism
• Potentially long-running
• Centralized control ⇒ orchestration
• Subject to migration, adaptation, fragmentation, etc.
• Described using abstract & executable formalisms & languages

Service compositions are typically designed to reflect some underlying technological or business process. Compositions thus allow creation of new “higher level” functionality from existing services as building blocks. A service composition often involves services from different subsystems within an organization, as well as external services. Orchestrations have a centralized control and data flow (a workflow of activities). They are usually described using a general-purpose or specialized language or formal notation.

Data in Service Compositions
Data in service compositions represents inputs, intermediate results, internal messages, and final results:
• Workflow activities operate on data (access, combine, transform, etc.)
• Therefore data dependencies are as important as control
• Data is atomic or structured using rich information formats (XML trees)
• Uses query languages (e.g. XPath) to search and access fields and nested elements
• Behavior of control structures typically depends on data

Analyzing data together with control allows us to answer questions about composition behavior depending on the input data and other received messages. E.g., we can ask whether some conditional branch in the workflow will be taken for a given kind of data, or what values some intermediate data fields would have for the given input. This kind of problem has long been studied in program analysis; getting exact answers is generally hard, and undecidable in the presence of loops.

2 Motivation and Problem Statement

Example Medical Workflow

[Fig. 1. An example drug prescription workflow. Data items: x: Patient ID, y: Medical history, z: Medication record. Activities: a1: Retrieve medical history and a2: Retrieve medication record, in parallel; then either a3: Continue last prescription (stable) or a4: Select new medication (¬stable); finally a5: Log treatment.]
• Written using BPMN (Business Process Modeling Notation)
• A high-level (non-executable) description

This workflow shows a simplified drug prescription process in a health organization. At the entry, the patient identifies him/herself (item x, Patient ID). The patient’s medical history (y) and medication record (z) are then retrieved in parallel (activities a1 and a2).

Example Medical Workflow (contd.)

[Fig. 1. An example drug prescription workflow.]

Aiming at fragmentation that respects data privacy.

Depending on whether the patient’s condition is stable or not, either the earlier prescription is continued (activity a3), or a new medication is selected (activity a4). Finally, the treatment of the patient is logged (activity a5).
In this example, we consider data privacy attributes. Data may contain confidential information on the patient’s medical and medication history, including insurance coverage. A fragment should contain activities that access data of only a certain privacy level. The fragments can then be distributed based on what privacy clearance they require.

Example Sub-Workflow

[Fig. 2. Selection of new medication. Inputs y: Medical history and z: Medication record, intermediate item c: Criterion, output p: Prescription candidate. Activities a41: Run tests to produce medication criteria and a42: Search medication databases are followed by the test “Result sufficiently specific?” (yes: exit the loop; no: repeat).]

Sub-workflow for medication selection (component service a4)
• involves looping based on intermediate data.

To make things more interesting, we “unpack” one of our component services, a4, from the main workflow and represent it as a sub-workflow with its own inputs (items y and z), outputs (item p), and intermediate data (item c). As soon as there is a loop involved (taking the “no” branch), the analysis becomes more complex. An exact analysis of the orchestration state after the loop would require the discovery of a loop invariant, which is generally a difficult problem. As we will show in the next section, we find our way around this obstacle by employing abstract interpretation techniques that give us a conservative approximation of the loop behavior.

Data Attributes
User-defined attributes can be used to characterize data in a given analysis domain
• Application-dependent view
• Simplified data model: sets of properties instead of complex structures
• User (designer) chooses relevant attributes, describing e.g.:
  information content
  privacy/confidentiality levels ⇐ our example
  ownership
  other aspects of quality
• Possibly: a combination of views
• Known or assumed for input data; implicit (through control/data dependencies) for the rest of the workflow
Question: How to infer attributes (i.e. properties) of intermediate and resulting data items?
• Based on control flow and data dependencies

Reasoning about the whereabouts of data in the execution of a service composition is simpler if we track only data attributes and not the entire complex data structures. This fits very well with an approach to program analysis known as abstract interpretation, where infinite data domains are abstracted into finite ones. E.g., knowing privacy levels of input data, we can try to infer the privacy levels of intermediate data and the individual activities in the workflow. Of course, we have to know how data tests and operations depend on and affect data attributes on the abstract plane.

Knowing Data Attributes
Knowledge of data attributes at design time:
• Supporting fragmentation
  Fragment: a part of an orchestration that can be distributed for remote execution
  What parts can be identified and enacted in a distributed fashion?
• Checking data compliance
  Content of messages exchanged with/between component services in an orchestration
  Is “sufficient” data passed to components?
• Robust top-down development
  Modular structure of service orchestrations
  Refining specifications of workflow (sub-)components
Also useful at runtime:
• Updating predictions with actual data
• On-demand analysis

Analysis of data attributes for components of a service orchestration at design time is an instance of static analysis, where properties are inferred from the specification, and not by running the orchestration. The static analysis approach can be combined with monitoring and adaptation mechanisms, and the analysis can be performed on a live executing instance of the orchestration. That can give more accurate results, because by looking at the live instance we can learn the actual values of data up to that point in execution, and update the analysis accordingly.

Problem Statement
PROBLEM
To infer user-defined attributes for data items and activities on different levels in an orchestration, automatically from:
known attributes of input data
• defined or assumed by the designer
control structure
• including complex control structures, such as parallel flows, conditional
branches and loops.
data operations
• reading or writing data, including tests, assignments and service
invocations

The aim is to provide the automated, mechanical inference of data attributes, ideally using a tool that can be invoked at
design time. The concrete tool implementation depends on the language in which the orchestration is written (e.g., BPEL,
BPMN, etc.)
In this learning package, we generally present the approach and ideas for each step in the analysis process. These steps
can be adapted to a particular orchestration language and turned into a fully automated tool chain.

Overview

[Fig. 3. Overview of the approach.]
User perspective (above the line):
• Input data context: input data items i1, i2, i3, ... against attributes α1, α2, α3, ...
• Workflow definition
• Resulting context: data items o1, o2, o3, ... against attributes α1, α2, α3, ...
• Input concept lattice and resulting concept lattice
Underlying techniques and artifacts (below the line):
• Horn clause program:
    w(X1,X2,A1,Y1,A2,Y2,A3,Z1,A4,Z2):-
        A1=f1(X1),
        Y1=f1Y1(X1),
        A2=f2(X2),
        Y2=f2Y2(X2),
        A3=f3(Y1,Y2),
        ...
• Input substitution:
    ...
    X1=f(U1,U2),
    X2=f(U1),
    X3=f,
    ...
• Sharing analysis:
    - Abstract interpretation
    - Sharing+freeness domain
    - CiaoDE / CiaoPP suite
• Abstract substitution:
    [[X1,A1,Y1,A3,Z1],
     [A3,Z1,A4,Z2],
     [X2,A4,Z2],
     [X2,A2,Y2,A3,Z1,A4,Z2]]

Above the line are artifacts that the user works with directly. The input data context describes user-defined attributes of the inputs to the orchestration, and accompanies the workflow definition, like the one in our example. The results are returned in the form of the resulting context, which gives back the attributes for the intermediate data items, results, and activities. The intermediate steps below the line are explained in the slides that follow.

Overview (contd.)
The approach to automated attribute inference takes as input:
• an input data context that identifies the input data items to the workflow
and their attributes
• a workflow definition in some appropriate formalism (e.g. BPMN in our
example)
and produces as output:
• a resulting context that presents inferred attributes of all intermediate
data items and activities in the given workflow.
The key steps in the process include:
• Conceptualizing the input data context in the form of a concept lattice,
and preparing the input substitution for the analysis.
• Turning the given workflow definition into a Horn Clause program that
is fed to the analysis, along with the input substitution.
• Performing sharing analysis and using its result, the abstract
substitution, to construct the resulting concept lattice.
• Interpreting the resulting concept lattice to produce the resulting
context.
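To make the pipeline a little more concrete, here is a minimal sketch in Prolog. It is not sharing analysis: it simply executes, once, the five goals of the Fig. 3 clause that are actually shown there (the rest is elided), under the input substitution X1=f(U1,U2), X2=f(U1) from the same figure. Then, for each marker variable Ui, it lists the program variables whose values contain Ui, which mimics the shape of an abstract substitution (a list of sharing groups). The predicate names w_prefix/7, occurs_in/2 and sharing_groups/1 are hypothetical. A real analyzer of the kind named in Fig. 3 computes such groups abstractly, so that the result holds for all executions, including loops.

```prolog
% HYPOTHETICAL truncation of the Fig. 3 clause: only the goals shown there.
w_prefix(X1, X2, A1, Y1, A2, Y2, A3) :-
    A1 = f1(X1),
    Y1 = f1Y1(X1),
    A2 = f2(X2),
    Y2 = f2Y2(X2),
    A3 = f3(Y1, Y2).

% occurs_in(+V, +T): variable V occurs in term T (identity test, not unification).
occurs_in(V, T) :-
    term_variables(T, Vs),
    member(W, Vs),
    W == V, !.

% sharing_groups(-Groups): one group per input marker Ui, holding the names
% of the program variables whose concrete values contain Ui after one run.
sharing_groups(Groups) :-
    X1 = f(U1, U2),                      % input substitution from Fig. 3
    X2 = f(U1),
    w_prefix(X1, X2, A1, Y1, A2, Y2, A3),
    Named = ['X1'=X1, 'X2'=X2, 'A1'=A1, 'Y1'=Y1, 'A2'=A2, 'Y2'=Y2, 'A3'=A3],
    findall(Group,
            ( member(U, [U1, U2]),
              findall(N, (member(N=T, Named), occurs_in(U, T)), Group) ),
            Groups).
```

Querying `?- sharing_groups(G).` gives one group for U1 containing all seven variables and one for U2 containing only X1, A1, Y1 and A3: data derived solely from X2 never picks up U2. Reading each Ui as a marker for some input attribute is the intuition behind the abstract substitution shown in Fig. 3.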

Outline of the Section

This Overview section starts with two subsections that introduce
some important background notions.
• Subsection Contexts and Concept Lattices briefly introduces the key
notions of Formal Concept Analysis (FCA), like contexts and concept
lattices that are used in the rest of the text for representing (and
reasoning about) inputs and outputs of the proposed analysis
approach.
• Subsection Horn Clause Programs presents the key ideas behind
logic (Horn Clause) programs, gives an informal introduction to their
form and meaning, and presents the notions of structured terms,
substitutions, and unification, which are all referred to later. It also
introduces Prolog syntax.
These two subsections do not describe steps of the approach as
such. Rather, they supply the notions whose understanding is
necessary for understanding the steps in the approach.

Outline of the Section (contd.)

The remaining subsections describe steps in the process of automated
attribute inference:
• Subsection Workflows in Horn Clause Form starts from a rather
generalized way of describing workflows that involve complex data and
control dependencies, and describes how such workflows can be
turned into a Horn Clause form amenable to sharing analysis.
• Next, subsection Sharing Analysis first defines the notion of sharing
in logic programs, building on the notion of substitution, introduced
earlier. Next, it describes the notion of abstract substitution, which is
used in the actual analysis as the domain for abstract interpretation. It
also describes how an initial substitution for the analysis is set up using
attributes from the input concept lattice.
• Finally, subsection Obtaining and Interpreting Results explains how
the result of the sharing analysis, in the form of abstract substitution, is
turned into a resulting concept lattice, and then used to generate the
resulting context, which is the end result.

3.1 Contexts and Concept Lattices

Contexts: Objects and Attributes

[Fig. 4. Two examples of contexts. (a) Characteristics of medical databases: objects Medical history and Medication record; attributes Symptoms, Tests, Coverage. (b) Types of identity documents: objects Passport, National Id Card, Driving License, Social Security Card; attributes Name, Address, PIN, SSN.]

Notion of context in Formal Concept Analysis (FCA):
• Set A of attributes (columns)
• Set O of objects (rows)
• Boolean object-attribute relation ρ ⊆ O × A

Formal Concept Analysis is a branch of lattice theory concerned with knowledge representation and reasoning. A context is simply a table that associates objects with attributes. As an illustration, we give two contexts: one that describes the content of medical databases, and another that describes the information contained in different identity documents. Objects (rows) stand for some meaningful entities, and attributes (columns) are chosen by the user to represent relevant notions in the application domain.

Concepts
The idea behind a concept is a close connection between subsets of objects and attributes.
Objects → Attributes
• For an arbitrary subset of objects $B \subseteq O$, let $B' = \{a \in A \mid \forall o \in B,\ o \rho a\}$
  “all and only those attributes that belong to all objects from B”
Attributes → Objects
• For an arbitrary subset of attributes $D \subseteq A$, let $D' = \{o \in O \mid \forall a \in D,\ o \rho a\}$
  “all and only those objects that have all attributes from D”
Iff $B' = D$ and $D' = B$, then $(B, D)$ is a concept
• $B'' = (B')' = D' = B$, $\quad D'' = (D')' = B' = D$

From the definition, in a concept $(B, D)$ we need to know only $B$ or $D$ to find the other using $(\cdot)'$. That means we can choose to work with objects or attributes, whichever is more convenient. E.g., we can start from a single attribute $a$ and calculate $\{a\}''$ to find the most general concept that has $a$. Or, we can start with an object $o$ and calculate $\{o\}''$ to find the most specific concept containing $o$. Because $B'' = B$ and $D'' = D$, we say that concepts are closed under $(\cdot)''$, i.e., $(\cdot)''$ is a closure.
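The derivation operators can be tried out directly in Prolog. The sketch below encodes the identity-document context of Fig. 4(b), but the incidence relation (which document carries which attribute) is made up for illustration, since the table marks are not reproduced here. All predicate names (obj/1, attr/1, has/2, intent/2, extent/2, concept/2) are hypothetical. Sets are represented as lists in the order of the obj/1 and attr/1 facts, so concept/2 expects its arguments in that order.

```prolog
% Context of Fig. 4(b): objects and attributes.
obj(passport).
obj(national_id_card).
obj(driving_license).
obj(social_security_card).

attr(name).
attr(address).
attr(pin).
attr(ssn).

% has(O, A) encodes o rho a. HYPOTHETICAL incidence, for illustration only.
has(passport, name).
has(national_id_card, name).
has(national_id_card, address).
has(national_id_card, pin).
has(driving_license, name).
has(driving_license, address).
has(social_security_card, name).
has(social_security_card, ssn).

% intent(+B, -D): D = B', all and only the attributes shared by every object in B.
intent(B, D) :-
    findall(A, (attr(A), forall(member(O, B), has(O, A))), D).

% extent(+D, -B): B = D', all and only the objects that have every attribute in D.
extent(D, B) :-
    findall(O, (obj(O), forall(member(A, D), has(O, A))), B).

% concept(+B, +D): (B, D) is a concept iff B' = D and D' = B.
concept(B, D) :-
    intent(B, D),
    extent(D, B).
```

For example, `?- extent([pin], B), intent(B, D).` computes $\{pin\}'$ and then $\{pin\}''$, giving B = [national_id_card] and D = [name, address, pin]: in this made-up context, the most general concept having the attribute PIN.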

Concept Lattices
• Concept lattice ordering: $(B_1, D_1) \le (B_2, D_2) \iff B_1 \subseteq B_2 \iff D_2 \subseteq D_1$
• Lesser: $(B_1, D_1)$ adds attributes ($D_2 \subseteq D_1$), i.e., it is more specific
• Greater: $(B_2, D_2)$ includes objects ($B_1 \subseteq B_2$), i.e., it is more general
• Top ($\top$): the most general concept (all objects)
• Bottom ($\bot$): the most specific concept (all attributes)
• It is a complete lattice $(L, \le, \vee, \wedge)$: for any $x, y \in L$, the least upper bound $z = x \vee y$ satisfies $x \le z$ and $y \le z$, and $z \le w$ for every $w \in L$ with $x \le w$ and $y \le w$; the greatest lower bound $x \wedge y$ is defined symmetrically

[Fig. 5. Concept lattices for the contexts of Fig. 4. (a) Concept lattice for medical databases (labels: Symptoms, Tests, Coverage, Medical history, Medication record). (b) Concept lattice for identity documents (labels: Name, Address, PIN, SSN, Passport, National ID, Driving License, Soc. Sec. Card).]

The concept lattice is often shown using a variant of Hasse diagrams, as in Fig. 5. Nodes represent concepts. The top concept is visually at the top, and the bottom concept at the bottom. Callouts show the objects and attributes new to each concept: objects (inherited by all upwards nodes) above the line, and attributes (inherited by all downwards nodes) below the line. Concepts may have one or both parts of the annotation empty; in the latter case, the annotation is not shown. The example concept lattices in Fig. 5 correspond to the sample contexts for medical databases and personal identification documents; in both, the most specific concept (at the bottom) is empty.
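Continuing the Prolog sketch, the concept lattice of the medical-database context of Fig. 4(a) can be enumerated by closing every subset of attributes: every concept (B, D) equals (D′, D′′), so no concept is missed. Again the incidence relation is hypothetical, chosen only for illustration (both records mention symptoms, only the history has test results, only the medication record has coverage data), and all predicate names are made up. leq/2 implements the ordering by extents, and top/1 and bottom/1 pick out the top and bottom concepts.

```prolog
% Context of Fig. 4(a) with a HYPOTHETICAL incidence relation.
obj(medical_history).
obj(medication_record).

attr(symptoms).
attr(tests).
attr(coverage).

has(medical_history, symptoms).
has(medical_history, tests).
has(medication_record, symptoms).
has(medication_record, coverage).

% Derivation operators B' and D' (sets as lists in fact order).
intent(B, D) :- findall(A, (attr(A), forall(member(O, B), has(O, A))), D).
extent(D, B) :- findall(O, (obj(O), forall(member(A, D), has(O, A))), B).

% sublist_of(+L, -S): S is a subset of L, keeping L's order.
sublist_of([], []).
sublist_of([X|Xs], [X|Ys]) :- sublist_of(Xs, Ys).
sublist_of([_|Xs], Ys) :- sublist_of(Xs, Ys).

% all_concepts(-Cs): close every attribute subset S into (S', S'');
% sort/2 removes the duplicates produced by different subsets.
all_concepts(Cs) :-
    findall(A, attr(A), All),
    findall(B-D, (sublist_of(All, S), extent(S, B), intent(B, D)), Raw),
    sort(Raw, Cs).

% leq(+C1, +C2): (B1, D1) =< (B2, D2) iff B1 is a subset of B2.
leq(B1-_, B2-_) :- subset(B1, B2).

% The top concept is above every concept; the bottom one is below every concept.
top(T) :- all_concepts(Cs), member(T, Cs), forall(member(C, Cs), leq(C, T)).
bottom(Bot) :- all_concepts(Cs), member(Bot, Cs), forall(member(C, Cs), leq(Bot, C)).
```

With this made-up relation there are four concepts: the top one has both records and only the attribute symptoms, and the bottom one has no objects and all three attributes.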

About Logic Programs
Logic programs represent a computation task as a set of logical rules and facts.
Logical rules model if-then inferences:
• B1 ∧ B2 ∧ · · · ∧ Bn → H: if B1, . . . , Bn are all true (n ≥ 0), then we conclude that H is true.
• Often written as H ← B1 ∧ B2 ∧ · · · ∧ Bn
• H is the head of the rule
• B1 ∧ · · · ∧ Bn is the body of the rule
• H ← (the case n = 0) is a fact (H is always true)

Example
ancestor(x, y) ← parent(x, y)                      (a parent is an ancestor)
ancestor(x, y) ← parent(z, y) ∧ ancestor(x, z)     (a parent’s ancestor is an ancestor)

Logic programming is one of the classical programming paradigms, along with imperative, object-oriented and functional programming. The example gives rules for the “x is an ancestor of y” relation, written as ancestor(x, y), using also the parent(x, y) relation. Logic programs are declarative, because they state the rules and the problem to be solved (e.g. finding somebody’s ancestors or descendants), not the sequence of steps to solve it. That makes logic programs relatives of SQL, but far more powerful.

Elements of Horn Clause Programs
Elements of logic programs include:
• Predicates that describe logical properties or relations, such as ancestor/2 and parent/2 (where /n means “with n arguments”)
• Atoms that apply predicates, such as “x is an ancestor of y”, written as ancestor(x, y)
• Variables x, y, z that stand for arbitrary objects in a rule (implicitly ∀-quantified)
• Constants that name distinct objects (such as Alice, Bob, Carol and Dennis below)
In Horn Clause program rules, H and each of B1, . . . , Bn are atoms.

The elements of logic programs (predicates, terms, variables, constants, etc.) correspond to the notions in First Order Logic (FOL). As in FOL, we assume that predicate and constant names refer to distinct entities, unlike differently named variables, which can refer to the same object. Also, p/1 and p/2 are two different predicates (with one and two arguments, respectively), even though they share the same name p.

Continued Example: Parent Fact Database
parent(Alice, Bob) ←
parent(Dennis, Bob) ←
parent(Carol, Dennis) ←

The simplified structure of Horn Clause programs allows efficient reasoning, i.e. derivation of logical consequences from known facts and rules.

Executing Horn-Clause Programs
Executing a logic program means searching for a proof of a logical statement known as the query, finding variable values along the way.

Sample Query 1: Find Bob’s ancestors
Query: ancestor(x, Bob)
Answers: x = Alice, x = Carol, x = Dennis

Sample Query 2: Find Carol’s descendants
Query: ancestor(Carol, y)
Answers: y = Dennis, y = Bob

Sample Query 3: Find Alice’s ancestors
Query: ancestor(x, Alice)
Answer: no solution (cannot prove for any x)

The sample queries compute different things (or fail, in the last case) depending on the query; in C or Java we would have to program separate procedures for “find person’s ancestors”, “find person’s descendants”, etc. In case of success, the variables in the query point to objects for which the query can be proven from the program. The “magic” is done by the under-the-hood inference engine that takes the program and the query and performs a systematic search for a proof. The result may be a failure, a single solution, or multiple solutions (possibly an infinite number of them).
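The three sample queries can be reproduced in Prolog, whose syntax is introduced a few slides later (constants start with a lowercase letter, so Alice becomes alice). findall/3, a standard built-in, collects all answers to a goal into a list, in the order produced by Prolog’s depth-first search. The wrapper names bobs_ancestors/1, carols_descendants/1 and alices_ancestors/1 are hypothetical, introduced only for this sketch.

```prolog
% parent(P, C): P is a parent of C (the fact database above).
parent(alice, bob).
parent(dennis, bob).
parent(carol, dennis).

% A parent is an ancestor; a parent's ancestor is an ancestor.
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(Z, Y), ancestor(X, Z).

% Sample queries 1-3, each collecting all answers (hypothetical wrappers).
bobs_ancestors(Xs)     :- findall(X, ancestor(X, bob), Xs).
carols_descendants(Ys) :- findall(Y, ancestor(carol, Y), Ys).
alices_ancestors(Xs)   :- findall(X, ancestor(X, alice), Xs).
```

`?- bobs_ancestors(Xs).` yields Xs = [alice, dennis, carol], the same three answers as on the slide in search order, and `?- alices_ancestors(Xs).` yields the empty list, matching the “no solution” case.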

Handling Structures
Structured terms have the shape f(t1, t2, . . . , tm), m ≥ 0, where f is a functor, and each ti is again a term.

Example: Peano Arithmetics
Program: number(0) ←
         number(s(x)) ← number(x)
         succ(x, s(x)) ←
Query 1: number(x)
Answers: x = 0, x = s(0), x = s(s(0)), x = s(s(s(0))), . . .
Query 2: succ(x, s(0))
Answer: x = 0

Note that f(t1, . . . , tn) is NOT a function call in the sense of C, Python or Haskell. It can be thought of as a data record with name f and n fields; for n = 0, f is simply a constant. Structured terms can be (and often are) nested, as in the examples of Peano arithmetic and lists. Lists are very frequently used data structures and a common tool in logic programming, but structured terms can equally represent nodes in a tree or a graph, records that store information, or any other kind of data container we need.

Lists are common structures, with the constant [ ] representing the empty list, and the functor “.” (dot) used to put together the head and the tail:
• [4] is the same as .(4, [ ])
• [1, 2, 3, 4] is the same as .(1, .(2, .(3, .(4, [ ]))))
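A runnable version of the Peano example is sketched below, with two names changed as an assumption of this sketch: number/1 is an ISO built-in type test and succ/2 is a built-in in common systems such as SWI-Prolog, so they are renamed nat/1 and peano_succ/2. limit/2 comes from SWI-Prolog’s library(solution_sequences) and is used only to take the first answers of the infinite enumeration of Query 1; head_tail/3 is a hypothetical helper. In SWI-Prolog 7 and later the list constructor is named '[|]' rather than '.', but [H|T] patterns work the same way.

```prolog
:- use_module(library(solution_sequences)).   % limit/2 (SWI-Prolog)

% Peano naturals: 0, s(0), s(s(0)), ...
nat(0).
nat(s(X)) :- nat(X).

% peano_succ(X, Y): Y is the successor of X.
peano_succ(X, s(X)).

% first_nats(+N, -Xs): the first N answers of Query 1, nat(X).
first_nats(N, Xs) :- findall(X, limit(N, nat(X)), Xs).

% head_tail(+List, -H, -T): split a non-empty list into head and tail.
head_tail([H|T], H, T).
```

Query 2 becomes `?- peano_succ(X, s(0)).`, which answers X = 0, and `?- first_nats(3, Xs).` gives Xs = [0, s(0), s(s(0))].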

Unification and Substitutions
An atom of the form t1 = t2 expresses syntactical equality.
• it succeeds if t1 and t2 are identical, or can be made identical by substituting terms for some variables in t1 and t2.
• we are interested in the substitution which introduces the least amount of information: the most general unifier (MGU)
• a substitution maps (binds) variables to terms

Unification Examples
Unification              MGU
1 = 0                    none (failure)
s(0) = s(x)              θ = {x → 0}
f(0) = s(x)              none (failure)
s(x) = s(y)              θ = {x → y}
f(s(x), x) = f(z, 1)     θ = {z → s(1), x → 1}
f(s(x), y) = f(1, z)     none (failure)

Running a query means finding a substitution that makes the query true, by composing the MGUs obtained for each of B1, . . . , Bn in a rule body.

Note that t1 = t2 is just a nicer way of writing =(t1, t2). Unification is implicit in parameter passing in clause heads. For instance, we can rewrite the rule “succ(x, s(x)) ←” as “succ(x, y) ← y = s(x)”. Equally, the rule “number(s(x)) ← number(x)” can be rewritten as “number(y) ← y = s(x) ∧ number(x)”.
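Each row of the unification table can be checked with Prolog’s =/2, which computes the MGU and applies it to the variables involved. The predicates row1/0 to row6/0 (some with arguments) and no_finite_unifier/0 are hypothetical names that just group the examples. One caveat, as standard Prolog behavior: =/2 usually omits the occurs check, so for terms such as X = f(X) the ISO built-in unify_with_occurs_check/2 is the sound choice.

```prolog
% Rows with a unifier succeed and leave the MGU's bindings on their arguments.
% Rows marked "none" are wrapped in \+ so they succeed because unification fails.
row1 :- \+ 1 = 0.                       % none (failure)
row2(X) :- s(0) = s(X).                 % {x -> 0}
row3 :- \+ f(0) = s(_).                 % none: functors f and s differ
row4(X, Y) :- s(X) = s(Y).              % {x -> y}: X and Y become aliases
row5(X, Z) :- f(s(X), X) = f(Z, 1).     % {z -> s(1), x -> 1}
row6 :- \+ f(s(_), _) = f(1, _).        % none: s(_) cannot match 1

% With the occurs check, X = f(X) has no (finite) unifier.
no_finite_unifier :- \+ unify_with_occurs_check(X, f(X)).
```

For instance, `?- row5(X, Z).` answers X = 1, Z = s(1), exactly the MGU in the table.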

Prolog and Friends
Prolog is a programming language based on Horn Clause rules.
Concrete language syntax:
• clauses end with a full stop (“.”)
• uses “:-” instead of “←”, comma instead of “∧”
• variables start with uppercase letters or “_”
• predicate names, functors and constants start with lowercase letters
Powerful analysis tools and techniques based on “clean” program semantics.

Examples in Prolog
Ancestors:
    ancestor(X,Y):- parent(X,Y).
    ancestor(X,Y):- parent(Z,Y), ancestor(X,Z).
Peano arith.:
    % number/1 is a built-in type test in ISO Prolog, so the predicate is named nat/1 here.
    nat(0).
    nat(s(X)):- nat(X).
    add(0, X, X):- nat(X).
    add(s(X), Y, s(Z)):- add(X,Y,Z).

The full Prolog language includes “impure” features, such as dynamic fact updates and I/O. Modern Prolog systems contain extensions such as constraint logic programming (CLP) and tabling. However, “pure” Prolog programs have a close relationship with logical theories, and reasoning about them in a sound fashion is easier than in other executable formalisms. We will use Prolog to encode objects and attributes in an executable form which will capture the structure of a workflow, and Prolog analysis tools to automatically derive attributes.

3.3 Workflows in Horn Clause Form

Anatomy of a Workflow
In general, workflows may contain complex data and control dependencies:
• sequences, conditional branches, and loops
• parallel flows, with pre- and post-conditions
• data items are read and written by activities
• inputs: possibly complex XML information sets

[Figure: a schematic workflow whose activities read and write data items x, y, z, with data-dependent tests such as x?, y? and (x, y)?]

Understanding how data is handled throughout the workflow is non-trivial.
• what information items / parts are used? where?

There are many concrete workflow definition languages that can be used to specify control structures and data operations. Here, we use an abstract workflow representation where both control and data dependencies are shown explicitly. Analyzing the content of data items at all points in a workflow is an instance of a general program analysis problem. To solve it, especially in the presence of loops and complex data structures, approximation techniques such as abstract interpretation are usually needed.

Example of (Enriched) Workflow
To make the analysis of workflow control and data dependencies easier, let us first “distill” our BPMN workflow example into a simplified abstract form below (elements to be clarified in the slides that follow).
• We keep only the activity tags (a1, . . . , a5), control dependencies between them, and labels for data items read/written by the activities.
• We abstract the looping in the sub-workflow as a structured activity of repeat-until type with a separate body sub-workflow.

[Figure: enriched workflow. Each activity carries a read/write annotation: a1 reads x, d and writes y; a2 reads x, e and writes z; a3 and a4 read y, z and write nothing; a5 reads x and writes nothing. Control flow uses AND gateways and an OR join before a5. Activity a4 is a repeat-until loop whose body runs a41 (reads y, z; writes c) and then a42 (reads c; writes p); the loop exit depends on p.]

C = {pre–a4 ≡ done–a1 ∧ ¬succ–a1 ∧ done–a2,
     pre–a3 ≡ done–a1 ∧ succ–a1 ∧ done–a2,
     pre–a5 ≡ done–a3 ∨ done–a4}
Loop body of a4: C = {pre–a42 ≡ done–a41}

Example of (Enriched) Workflow (cont.)
Workflow control structure:
• includes activities a1, . . . , a5
• arrows show control dependencies (e.g. a4 depends on a1 and a2)
• independent activities may run in parallel (e.g. a1 and a2)
• different join types (AND/OR)
Data dependencies based on read/write annotations:
• an annotation $\frac{R_i}{W_i}$ for each activity $a_i$
• $R_i$ is the set of data items read, $W_i$ is the set of data items written


Set C of logical control preconditions:
• activity preconditions pre–ai are expressed using propositional formulas
• done–aj means “aj has finished”
• succ–aj means “aj has achieved its (user-defined) goal”
• easily models sequences and AND/OR/XOR parallel flows
Helps detect possible deadlocks and race conditions:
• deadlocks appear in case of circular dependencies (pre–ai depends, directly or indirectly, on done–ai)
• race conditions appear when two activities that access the same data
item, at least one of them writing it, can execute in parallel
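As a rough illustration of the race-condition criterion, the sketch below flags pairs of activities that are not ordered by control dependencies and where one writes a data item that the other reads or writes. It rests on a simplifying assumption: every done–aj occurring in pre–ai is treated as an ordering edge aj → ai, ignoring the ∧/∨/¬ structure of the formulas, so mutually exclusive branches (such as a3 and a4) are conservatively treated as possibly parallel. The names RW, DEPS, ancestors and races are hypothetical.

```python
from itertools import combinations

# Read/write annotations (Ri, Wi) transcribed from the example workflow.
RW = {
    "a1": ({"x", "d"}, {"y"}),
    "a2": ({"x", "e"}, {"z"}),
    "a3": ({"y", "z"}, set()),
    "a4": ({"y", "z"}, set()),
    "a5": ({"x"}, set()),
}

# Assumed encoding: activity -> activities whose done-flag appears in its
# precondition (the logical structure of the formulas in C is ignored).
DEPS = {"a1": set(), "a2": set(), "a3": {"a1", "a2"},
        "a4": {"a1", "a2"}, "a5": {"a3", "a4"}}


def ancestors(act, deps):
    """All activities that must finish before `act` (transitive closure)."""
    seen, stack = set(), list(deps[act])
    while stack:
        a = stack.pop()
        if a not in seen:
            seen.add(a)
            stack.extend(deps[a])
    return seen


def races(rw, deps):
    """Unordered activity pairs where one writes an item the other touches."""
    found = []
    for a, b in combinations(sorted(rw), 2):
        if a in ancestors(b, deps) or b in ancestors(a, deps):
            continue  # ordered by control flow: cannot run in parallel
        (ra, wa), (rb, wb) = rw[a], rw[b]
        if wa & (rb | wb) or wb & ra:
            found.append((a, b))
    return found


print(races(RW, DEPS))  # []
```

In the example, a1 and a2 may run in parallel but only share x, which both merely read, so no race is reported.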



Based on the control preconditions, we can find legal orderings of
activities, i.e. orderings that respect the preconditions:
• this is possible only if there are no deadlocks / race conditions
(which can be checked efficiently using, e.g., SAT solvers)
All legal orderings are equivalent from the point of view of data
handling.
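One standard way to obtain a legal ordering is a topological sort (Kahn's algorithm) of a simplified dependency graph; this is a swapped-in illustration, not the SAT-based checking mentioned above. The graph is an assumption: every done–aj occurring in pre–ai becomes an edge aj → ai. If some activities never become ready, the graph contains a cycle, i.e. a potential deadlock. The names DEPS and legal_ordering are hypothetical.

```python
from collections import deque

# Assumed encoding: activity -> activities whose done-flag appears in its
# precondition (the logical structure of the formulas in C is ignored).
DEPS = {"a1": set(), "a2": set(), "a3": {"a1", "a2"},
        "a4": {"a1", "a2"}, "a5": {"a3", "a4"}}


def legal_ordering(deps):
    """Kahn's algorithm: return one ordering that respects deps,
    or raise ValueError if a circular dependency (deadlock) exists."""
    indegree = {a: len(pre) for a, pre in deps.items()}
    successors = {a: [] for a in deps}
    for a, pre in deps.items():
        for p in pre:
            successors[p].append(a)
    ready = deque(sorted(a for a, d in indegree.items() if d == 0))
    order = []
    while ready:
        a = ready.popleft()
        order.append(a)
        for s in sorted(successors[a]):
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(deps):
        raise ValueError("circular dependency: possible deadlock")
    return order


print(legal_ordering(DEPS))  # ['a1', 'a2', 'a3', 'a4', 'a5']
```

Since all legal orderings are equivalent for data handling, any ordering returned here could serve as the clause body order in a Horn clause encoding of the workflow.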



Sub-workflows can be used to model complex constructs:
• in our case, activity a4 is a repeat-until loop
• the body of the loop is a sub-workflow (with a41 and a42)
Sub-workflows also allow modular development and/or assembly of
workflows.

Workflow as a Horn Clause Program
We represent the workflow symbolically in a Horn clause form for further analysis:
• the representation is not operationally equivalent to the workflow
• the predicate w stands for the workflow
• the clause body reflects a legal ordering of activities
• variables stand for data items and activities
Sub-workflows and complex activities are in separate predicates.

w(X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5):-
    A1=f1(X,D),                    % a_1
    Y=f1_Y(X,D),
    A2=f2(X,E),                    % a_2
    Z=f2_Z(X,E),
    A3=f3(Y,Z),                    % a_3
    a_4(Y,Z,A4,A41,C,A42,P),       % a_4
    A5=f5(X).                      % a_5

a_4(Y,Z,A4,A41,C,A42,P):-
    w2(Y,Z,A41,C2,A42,P2),
    A4=f4(P2),
    a_4x(Y,Z,C2,P2,C,P,A4,A41,A42).

% exit: the last iteration's C2, P2 become the final C, P
a_4x(_,_,C,P,C,P,_,_,_).
% repeat: run the loop body again
a_4x(X,Z,_,_,C,P,A4,A41,A42):-
    a_4(X,Z,A4,A41,C,A42,P).

w2(Y,Z,A41,C,A42,P):-
    A41=f41(Y,Z),                  % a_41
    C=f41_C(Y),
    A42=f42(C),                    % a_42
    P=f42_P(C).

Note that in Prolog syntax, an underscore (“_”) represents a new, fresh variable that stands for an arbitrary term.
Predicate w2 represents the body of the loop, and predicates a_4 and a_4x model the repeat-until construct.

Workflow as a Horn Clause Program (cont.)
For each activity, we model data dependencies with unifications:
• Ai = fi(Ri) stands for “activity ai reads data items from set Ri”
• for each item z ∈ Wi written by ai, z = fi_z(Q) stands for “z is written using data items from Q ⊆ Ri”
Such a Horn clause representation can be derived mechanically and, in principle, automatically.

The choice of functors (fi and fi_z) is purely symbolic and is not significant for the subsequent sharing analysis. The purpose of
the unifications in the Horn clause representation is to express functional dependencies between activities and data items in
a workflow, not to actually compute them.
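Because the Horn clause form can be derived mechanically, a small generator is easy to sketch. The Python below emits the unifications Ai = fi(Ri) and z = fi_z(Q) as Prolog text, following the naming used in the workflow program (f1, f1_Y, and so on). The helper names activity_goals and clause, and the convention that Q is given explicitly for each written item, are assumptions for illustration; calls to sub-workflow predicates such as a_4 are passed through as ready-made goals.

```python
def activity_goals(name, reads, writes):
    """Unifications for one simple activity `name` (e.g. "a1"):
    A1=f1(R...), plus Z=f1_Z(Q...) for every written item Z."""
    idx = name[1:]                                   # "a41" -> "41"
    goals = [f"A{idx}=f{idx}({','.join(reads)})"]
    for item, sources in writes.items():             # sources: Q, a subset of reads
        goals.append(f"{item}=f{idx}_{item}({','.join(sources)})")
    return goals


def clause(head, goals):
    """Render a Prolog clause with one goal per line."""
    return head + ":-\n    " + ",\n    ".join(goals) + "."


body = (activity_goals("a1", ["X", "D"], {"Y": ["X", "D"]})
        + activity_goals("a2", ["X", "E"], {"Z": ["X", "E"]})
        + activity_goals("a3", ["Y", "Z"], {})
        + ["a_4(Y,Z,A4,A41,C,A42,P)"]                # complex activity: own predicate
        + activity_goals("a5", ["X"], {}))
print(clause("w(X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5)", body))
```

The printed clause has the same goals, in the same order, as the body of w (without the % comments).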

Sharing in Logic Programs
Sharing analysis of logic programs tries to infer how data is shared between variables:
• sharing is always relative to a substitution θ upon successful execution of a query
• two variables x, y are said to share if the terms xθ and yθ (i.e. after applying θ to x and y) contain some common variable.

Example
θ = {x → s(y)}:  xθ = s(y), yθ = y;  x and y share
θ = {}:  xθ = x, yθ = y;  x and y do NOT share
θ = {x → s(w), y → f(1, z)}:  xθ = s(w), yθ = f(1, z);  x and y do NOT share
θ = {x → [1, w, f(z)], y → n(z, s(w))}:  xθ = [1, w, f(z)], yθ = n(z, s(w));  x and y share w and z

Sharing analysis tries to find out all possible sharings between variables in case of successful executions. This requires taking into account all possible substitutions on exit from a query. That is generally impractical and often impossible, since there may be many, or even an infinite number of, possible substitutions. To make sharing analysis viable, we often resort to some sort of approximation that reduces the repertoire of possible sharing cases to a finite, manageable size, while remaining safe, i.e. not missing any potential sharing.
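The definition of sharing can be checked directly on concrete substitutions. The sketch below uses a hypothetical Python encoding of terms (variables are strings, compound terms are tuples whose first element is the functor, a list is written with a "list" functor, numbers are constants) and assumes idempotent substitutions, so that xθ is a single dictionary lookup. It reproduces the four examples above.

```python
def term_vars(t):
    """Variables of a term: str = variable, tuple = (functor, arg1, ...),
    anything else (e.g. int) = constant."""
    if isinstance(t, str):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(term_vars(arg) for arg in t[1:]))
    return set()


def shared_vars(theta, x, y):
    """Variables common to x·theta and y·theta (idempotent theta assumed)."""
    return term_vars(theta.get(x, x)) & term_vars(theta.get(y, y))


examples = [
    {"x": ("s", "y")},
    {},
    {"x": ("s", "w"), "y": ("f", 1, "z")},
    {"x": ("list", 1, "w", ("f", "z")), "y": ("n", "z", ("s", "w"))},
]
for theta in examples:
    common = shared_vars(theta, "x", "y")
    print("share" if common else "do NOT share", sorted(common))
```

Note that this only inspects one concrete substitution; a sharing analysis has to cover all substitutions a program may produce, which is what motivates the abstraction.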

Abstract Substitution Domain
Instead of looking at (possibly infinite number of) concrete
substitutions, we can perform analysis on a simplified abstract level.
Abstract substitutions approximate terms with sets of contained
variables (not concerned with the exact shape of terms):

Concrete: θ = {x → f(u, g(v)), y → h(5, u), z → i(v, w)}
Variables in the range of θ, and the domain variables that share them:
u is shared by {x, y};  v is shared by {x, z};  w occurs only in {z}
Abstract: Θ = {{x, y}, {x, z}, {z}}

By operating in the abstract substitution domain, the analysis task becomes simplified and finite. The shown abstract
substitution domain is not the only applicable choice. For instance, we can work with pair-wise sharing etc. Different sharing
domains also differ with respect to the computational cost of the analysis and precision. The domain used here is known to
be more precise (in the sense of avoiding over-approximation), but exponential in time with respect to the number of
variables involved. It is also often combined with additional freeness or groundness information.
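The step from a concrete substitution to its set-sharing abstraction can be sketched as follows: for each variable occurring in the range of θ, collect the domain variables whose image contains it. The term encoding (strings for variables, tuples with the functor first) and the name abstract_sharing are assumptions of this sketch; it abstracts a single substitution and does not reproduce the program analysis itself.

```python
def term_vars(t):
    """Variables of a term: str = variable, tuple = (functor, arg1, ...),
    anything else = constant."""
    if isinstance(t, str):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(term_vars(arg) for arg in t[1:]))
    return set()


def abstract_sharing(theta, domain):
    """Set-sharing abstraction of a concrete substitution: for every variable
    occurring in some x·theta (x in domain), the set of domain variables whose
    image contains it."""
    occurs_in = {}
    for x in domain:
        for v in term_vars(theta.get(x, x)):
            occurs_in.setdefault(v, set()).add(x)
    return {frozenset(group) for group in occurs_in.values()}


theta = {"x": ("f", "u", ("g", "v")), "y": ("h", 5, "u"), "z": ("i", "v", "w")}
print(sorted(sorted(g) for g in abstract_sharing(theta, ["x", "y", "z"])))
# [['x', 'y'], ['x', 'z'], ['z']]
```

Encoding the input substitution init1 the same way (X = f1(Name, Pin), D = f2(Symptoms, Tests), E = f3(Symptoms, Coverage)) yields {{x}, {d, e}, {d}, {e}}.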

Workflow Input Substitutions
We include the information on user-defined attributes of input data to
the workflow, by setting up the initial substitution for inputs (x, d, e in
our case):
init1(X, D, E):-
    X = f1(Name, Pin),
    D = f2(Symptoms, Tests),
    E = f3(Symptoms, Coverage).

This reflects the positioning of the inputs in the initial context / concept lattice.
The initial concrete substitution coded here maps to the initial abstract
substitution Θ = {{x}, {d, e}, {d}, {e}}:
• “x has some components not shared with d or e”
• “d and e share something” (Symptoms), but
• “both d and e have some private (not shared) components”
(Tests and Coverage, respectively)

Again, note that the choice of functors (f1, f2, f3) and variable names that stand for the attributes (Name, PIN, etc.) is
not significant for the abstract sharing analysis.

3.5 Obtaining and Interpreting Results

Sharing Results
The resulting abstract substitution (a) shows sharing for data items and activities.
It is as if the data item and activity variables shared a set of hidden variables u1, …, u7 (b).

(a) The resulting substitution:
1 [[X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
2  [X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
3  [X,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
4  [X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
5  [D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P],
6  [D,A1,Y,A3,A4,A41,C,A42,P],
7  [E,A2,Z,A3,A41]]

Top-level variables     Recovered hidden variables
X, A5                   {u1, u2, u3, u4}
E                       {u1, u3, u5, u7}
D                       {u1, u2, u5, u6}
A2, Z                   {u1, u2, u3, u4, u5, u7}
A1, Y, A42, C, P        {u1, u2, u3, u4, u5, u6}
A3, A4, A41             {u1, u2, u3, u4, u5, u6, u7}

[Figure: (b) Points in the resulting sharing lattice over u1, …, u7, labeled x, a5; e; d; a2, z; a1, y, p, a42, c]
Fig. 8. Abstract substitution and the recovered hidden variables.

The sharing results shown in (a) were obtained from sharing and freeness (shfr) analysis in the CiaoPP analysis suite. The results are safe in the sense that all possible sharing is included. However, they may contain a degree of over-approximation, i.e. the analysis may conservatively assume sharing where it cannot be ruled out with certainty.
From an abstract substitution, we do not know which variables are shared, so one possibility is to “recover” a sufficient number of hidden variables that are shared in a manner compatible with the abstract substitution.

Minimal Hidden Variable Recovery
How many hidden variables are needed to comply with the sharing results?
• As many as there are sharing settings.
• The hidden variables are counterparts of the user-defined attributes used in the input substitution.
A straightforward algorithm to recover a minimal set of resulting hidden variables U:

function RECOVERSUBSTVARS(V, Θ)
    n ← |Θ|; U ← {u1, u2, ..., un}      ▷ n = |Θ| fresh variables in U
    S : V → ℘(U); S ← const(∅)          ▷ the initial value for the result
    for x ∈ V, i ∈ {1..n} do            ▷ for each variable and subst. setting
        if x ∈ Θ[i] then                ▷ if the variable appears in the setting
            S ← S[x → S(x) ∪ {ui}]      ▷ add ui to its resulting set
        end if
    end for
    return U, S
end function

The proof that it is sufficient to “invent” a hidden variable for each sharing setting in the resulting abstract sharing, and that fewer hidden variables than that would not do, follows from the definition of abstract sharing and the monotonicity of logic programs.
For any non-empty abstract substitution, there is an infinite number of compatible concrete substitutions even with a fixed set of hidden variables, because the shape of terms may be arbitrary. That suffices in our case, because we just want to know what is shared and not exactly how.
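A direct Python rendering of RECOVERSUBSTVARS may help when experimenting with other analysis outputs. It is a sketch: Θ is represented as a Python list of settings (transcribed from the resulting substitution (a)), Python sets stand for elements of ℘(U), and the names THETA, VARS and recover_subst_vars are hypothetical.

```python
# Sharing settings Θ[1..7], transcribed from the resulting substitution (a).
THETA = [
    ["X", "D", "E", "A1", "Y", "A2", "Z", "A3", "A4", "A41", "C", "A42", "P", "A5"],
    ["X", "D", "A1", "Y", "A2", "Z", "A3", "A4", "A41", "C", "A42", "P", "A5"],
    ["X", "E", "A1", "Y", "A2", "Z", "A3", "A4", "A41", "C", "A42", "P", "A5"],
    ["X", "A1", "Y", "A2", "Z", "A3", "A4", "A41", "C", "A42", "P", "A5"],
    ["D", "E", "A1", "Y", "A2", "Z", "A3", "A4", "A41", "C", "A42", "P"],
    ["D", "A1", "Y", "A3", "A4", "A41", "C", "A42", "P"],
    ["E", "A2", "Z", "A3", "A41"],
]


def recover_subst_vars(variables, theta):
    """Return (U, S): one fresh hidden variable u_i per sharing setting
    theta[i-1], and S mapping each variable to the u_i of its settings."""
    hidden = [f"u{i}" for i in range(1, len(theta) + 1)]
    result = {x: set() for x in variables}          # S <- const(empty set)
    for x in variables:
        for u, setting in zip(hidden, theta):
            if x in setting:                        # x appears in this setting
                result[x].add(u)                    # add u_i to its set
    return hidden, result


VARS = ["X", "D", "E", "A1", "Y", "A2", "Z", "A3", "A4", "A41", "C", "A42", "P", "A5"]
U, S = recover_subst_vars(VARS, THETA)
for x in ("X", "E", "D", "A2", "A1", "A3"):
    print(x, sorted(S[x]))
```

For instance, X is mapped to {u1, u2, u3, u4} and E to {u1, u3, u5, u7}, matching the corresponding rows of the hidden-variable table.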

Resulting Lattice (Recovered)
We can now construct the resulting concept lattice:
• activity and data item variables as objects
• recovered hidden variables as attributes
• activities are highlighted
[Figure: concept lattice over u1, …, u7 with concepts labeled x, a5; e; d; a2, z; a1, y, p, a42, c; a3, a4, a41]
Fig. 9. The resulting concept lattice.
To interpret the resulting lattice in terms of the original user-defined attributes, we observe that sharing analysis preserves the ordering between concepts in the lattice. That means that the “lesser” concepts in the resulting lattice inherit all original attributes from the input data items (shown in boldface). Therefore, we use the resulting lattice over hidden variables as a skeleton to “paint” intermediate data items and activities with the original user-defined attributes.
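A minimal sketch of the “painting” step, under a simplifying assumption: an item inherits the user-defined attributes of every input item whose set of hidden variables is contained in its own (i.e. of every input concept above it in the lattice). The hidden-variable sets are transcribed from the recovered-variables table for a few items, and the input attributes follow init1 (with PIN spelled as in the context's column header). The helper name paint is hypothetical, and this is not claimed to be the authors' exact procedure.

```python
# Hidden variables per item, transcribed from the recovered-variables table.
HIDDEN = {
    "x": {"u1", "u2", "u3", "u4"},
    "d": {"u1", "u2", "u5", "u6"},
    "e": {"u1", "u3", "u5", "u7"},
    "a1": {"u1", "u2", "u3", "u4", "u5", "u6"},
    "a2": {"u1", "u2", "u3", "u4", "u5", "u7"},
    "a3": {"u1", "u2", "u3", "u4", "u5", "u6", "u7"},
    "a5": {"u1", "u2", "u3", "u4"},
}
# User-defined attributes of the input items, following init1.
INPUT_ATTRS = {
    "x": {"Name", "PIN"},
    "d": {"Symptoms", "Tests"},
    "e": {"Symptoms", "Coverage"},
}


def paint(hidden, input_attrs):
    """Give every item the attributes of each input item that lies above it
    in the lattice, i.e. whose hidden-variable set is a subset of its own."""
    return {
        item: set().union(*(attrs for inp, attrs in input_attrs.items()
                            if hidden[inp] <= hvars))
        for item, hvars in hidden.items()
    }


for item, attrs in paint(HIDDEN, INPUT_ATTRS).items():
    print(item, sorted(attrs))
```

With this rule, a2 (which reads x and e) receives Name, PIN, Symptoms and Coverage but not Tests, while a3 receives all five attributes.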

Resulting Context
Finally, after assigning user-defined attributes to concepts in the resulting lattice, we can create the resulting context:
[Table: resulting context with attributes Name, PIN, Symptoms, Tests, Coverage; input rows x; d; e above the line, and rows a2, z; a1, y, p, a42, c; a3, a4, a41; a5 below it]
Fig. 10. The resulting context for the two analysis cases.
• The input data items (above the line) keep the initial attributes
• Intermediate data items and activities (below the line) are added with their assigned attributes
The resulting context is a simple tabular form that is presented to the user as the result of the sharing analysis. The user starts with the input context (above the line) and the workflow definition, while all other steps are intermediate, mechanical, and ideally fully automated.
For activities, attributes indicate the properties of data visible (read) by an activity. For data items, attributes describe the information content of the data and what it was derived from.
Note that the sharing analysis is conservative in the sense that all attributes that cannot be decidedly ruled out are included.

4 Application to Fragment Identification

Fragmentation Example (Information Flow)
[Figure: the main medical workflow (a1: Retrieve …; a2: Retrieve medication record; a3: Continue last prescription, if stable; a4: Select new medication, if not stable; a5: Log treatment) and the workflow for service a4 (a41: Run tests to produce medication criteria; a42: Search databases; repeated until the result is sufficiently specific), placed in swim-lanes for the Health Organization, Medical Examiners, Registry, Medication Provider, and Archive]
Fig. 11. An example fragmentation for the drug prescription workflow.
Distributing execution of the workflow(s) across organizations:
• Fragment: a subset of activities sharing a common property
• Fragments assigned to swim-lanes (partners)
• Property: access level to sensitive data
  Medical examiners cannot see insurance coverage.
  Medication providers cannot see medical tests.
  Registry can see only the patient ID.
The authors were also partially supported by Spanish MEC project 2008-05624/TIN DOVES and CM project P2009/TIC/1465 (PROMETIDOS).
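Once activities carry attributes, the access-level property can be checked mechanically. The sketch below lists, for each activity, the partners allowed to see every attribute that activity carries. Both tables are assumptions for illustration: the clearances paraphrase the three restrictions above (reading “patient ID” as PIN and giving the Health Organization full access), and the activity attribute sets are illustrative rather than copied from the resulting context. The sketch does not pick a single assignment or minimize the number of fragments; the names CLEARANCE, ACTIVITY_ATTRS and eligible_partners are hypothetical.

```python
ALL = {"Name", "PIN", "Symptoms", "Tests", "Coverage"}

# Assumed clearances: attributes each partner may see.
CLEARANCE = {
    "Health Organization": ALL,
    "Medical Examiners": ALL - {"Coverage"},    # cannot see insurance coverage
    "Medication Provider": ALL - {"Tests"},     # cannot see medical tests
    "Registry": {"PIN"},                        # only the patient ID (assumed = PIN)
}

# Illustrative attributes of the data visible to each activity.
ACTIVITY_ATTRS = {
    "a1": {"Name", "PIN", "Symptoms", "Tests"},
    "a2": {"Name", "PIN", "Symptoms", "Coverage"},
    "a3": ALL,
    "a5": {"Name", "PIN"},
}


def eligible_partners(activity_attrs, clearance):
    """Partners whose clearance covers all attributes an activity sees."""
    return {act: sorted(p for p, allowed in clearance.items() if attrs <= allowed)
            for act, attrs in activity_attrs.items()}


for act, partners in eligible_partners(ACTIVITY_ATTRS, CLEARANCE).items():
    print(act, partners)
```

A fragmentation scheme then groups activities assigned to the same partner; under these assumed tables, a3 can only be placed with the Health Organization.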

Another Input Context Example

1 INITIAL SUBSTITUTION
init2(X,D,E):-
    X=f1(Name, Address, SSN),
    D=f2(SSN, Tests, Coverage),
    E=f3(SSN, Coverage).

2 SHARING RESULTS
u1 [[X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
u2  [X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
u3  [X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
u4  [D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P],
u5  [D,A1,Y,A3,A4,A41,C,A42,P]]

3 RESULTING LATTICE
[Figure: concept lattice over u1, …, u5 with concepts labeled x, a5; e; d; z, a2; y, p, c, a1, a3, a4, a41, a42]

4 RESULTING CONTEXT
[Table: attributes Name, Address, SSN, Tests, Coverage for rows x, a5; d; e; z, a2; y, p, c and the other activities]

5 RESULTING FRAGMENTATION SCHEME
Swimlane                 Activities
Health Organization      a1, a3, a4, a41, a42
Medical Examiners        (empty)
Medication Provider      a2
Registry Archive         a5

Conclusions
Representing inputs and intermediate data and activities using FCA
contexts and concept lattices allows lattice-based formulation,
interpretation, and reasoning on data attributes.
Sharing analysis of logic programs is a powerful technique for inferring
(abstract) sharing between data items, including sharing of data attributes.
• Supports complex data and control structures
• Applicable both at design and run-time
Applications include fragmentation (as illustrated), but also:
Data compliance checking – to verify that sufficient information is
exchanged between component services
Robust top-down development – by refining component /
sub-workflow specifications.
Future work: developing translators from concrete executable
languages (BPEL, XPDL, YAWL, etc.) into Horn clause programs to
facilitate automatic analysis. Also: analyzing stateful conversations
between compositions.

References

The content of this presentation is based on the following publications:

Dragan Ivanovic, Manuel Carro, and Manuel Hermenegildo.
Automated Attribute Inference in Complex Service Workflows Based on Sharing
Analysis.
Proceedings of the 8th International Conference on Service Computing - IEEE SCC
2011, IEEE Press, 2011.
Dragan Ivanovic, Manuel Carro, and Manuel Hermenegildo.
Automatic Fragment Identification in Workflows Based on Sharing Analysis.
In Mathias Weske, Jian Yang, Paul Maglio, and Marcelo Fantinato, editors,
Service-Oriented Computing – ICSOC 2010, number 6470 in LNCS. Springer Verlag,
2010.

References
Some pointers on Web service analysis and fragmentation:
Daniel Martin, Daniel Wutke, and Frank Leymann.
A Novel Approach to Decentralized Workflow Enactment.
In EDOC ’08: Proceedings of the 2008 12th International IEEE Enterprise Distributed
Object Computing Conference, pages 127–136, Washington, DC, USA, 2008. IEEE
Computer Society.

Ustun Yildiz and Claude Godart.
Information Flow Control with Decentralized Service Compositions.
In Proceedings of ICWS 2007, pages 9–17, 2007.

Oliver Kopp, Rania Khalaf, and Frank Leymann.
Deriving Explicit Data Links in WS-BPEL Processes.
In International Conference on Services Computing (SCC), 2008.

Rania Khalaf.
Note on Syntactic Details of Split BPEL-D Business Processes.
Technical Report 2007/2, IAAS, U. Stuttgart, July 2007.

References

Some pointers to Formal Concept Analysis (FCA):

Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors.
Formal Concept Analysis, Foundations and Applications.
Volume 3626 of Lecture Notes in Computer Science. Springer, 2005.

Claudio Carpineto and Giovanni Romano.
Concept Data Analysis: Theory and Applications.
Wiley, 2004.

B. A. Davey and H. A. Priestley.
Introduction to Lattices and Order.
Cambridge University Press, 2nd edition, 2002.

Sergei O. Kuznetsov and Sergei A. Obiedkov.
Comparing performance of algorithms for generating concept lattices.
J. Exp. Theor. Artif. Intell., 14(2-3):189–216, 2002.

References

Some pointers on program analysis in general and logic programming:

P. Cousot and R. Cousot.
Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by
Construction or Approximation of Fixpoints.
In ACM Symposium on Principles of Programming Languages (POPL’77). ACM
Press, 1977.
F. Nielson, H. R. Nielson, and C. Hankin.
Principles of Program Analysis.
Springer, 2005. Second Ed.

M. V. Hermenegildo, F. Bueno, M. Carro, P. López, E. Mera, J.F. Morales, and G.
Puebla.
An Overview of Ciao and its Design Philosophy.
Theory and Practice of Logic Programming, 2011. http://arxiv.org/abs/1102.5497.

Acknowledgements

The research leading to these results has received funding
from the European Community’s Seventh Framework
Programme [FP7/2007-2013] under grant agreement 215483
(S-Cube).
