S-Cube Learning Package


                Data Dependency:
Inferring Data Attributes in Service Orchestrations
            Based on Sharing Analysis

        Universidad Politécnica de Madrid (UPM)
Learning Package Categorization

                               S-Cube



            WP-JRA-2.2: Adaptable Coordinated Service
                         Compositions



              Models and Mechanisms For Coordinated
                       Service Compositions



                           Data Dependency:
     Inferring Data Attributes in Service Orchestrations Based on
                            Sharing Analysis
Table of Contents

1   Introduction and Background
2   Motivation and Problem Statement
3   Overview of the Approach
        Contexts and Concept Lattices
        Horn Clause Programs
        Workflows in Horn Clause Form
        Sharing Analysis
        Obtaining and Interpreting Results

4   Application to Fragment Identification
5   Conclusions


These slides have been prepared for offline viewing. Throughout the presentation, running commentaries, notes and
additional remarks will be displayed on the margins using the condensed font, like here.
Please refer to the publication list at the end for more details.
1   Introduction and Background
SOA and Web Services
  Service Oriented Architecture (SOA):
    • Flexible set of computing system design and implementation principles
    • Emphasis on loose coupling between services and with OS
    • Distribution over Internet/intranet
    • Actors: service providers and consumers
    • Intrinsic dynamism and adaptability
    • Functionality often in the form of Web services
  Web services:
    • Interoperability, platform independence
    • Data exchange standards: XML
    • Several technological flavors:
             WSDL/SOAP-based
             RESTful
    • Typical implementation platforms:
             Java / .NET application servers, BPEL, Web server scripts (e.g. PHP)

  We mention just some of the key features of SOA and Web Services that correspond to a “high-level” view of the area, i.e. without taking into account the details of implementation technologies and infrastructure.
  There are many standards and technologies in the service world. Also, different service provision platforms offer varying degrees of functionality to users and designers.
  For a more detailed introduction to the SOA design philosophy, platforms, tools, and techniques, please refer to the list of publications.
Service Compositions
  Service compositions aggregate individual services to achieve a more complex or cross-organizational task:
    • Combining loosely coupled components
    • Compositions expose themselves as services
    • Often described using workflows (control/data)
    • Complex control and latent parallelism
    • Potentially long-running
    • Centralized control ⇒ orchestration
    • Subject to migration, adaptation, fragmentation, etc.
    • Described using abstract & executable formalisms & languages

  Service compositions are typically designed to reflect some underlying technological or business process.
  Compositions thus allow creation of new “higher level” functionality from existing services as building blocks. A service composition often involves services from different subsystems within an organization, as well as external services.
  Orchestrations have a centralized control and data flow (a workflow of activities). They are usually described using a general-purpose or specialized language or formal notation.
Data in Service Compositions

   Data in service compositions represents inputs, intermediate results, internal messages, and final results:
     • Workflow activities operate on data (access, combine, transform, etc.)
     • Therefore data dependencies are as important as control
     • Data is atomic or structured using rich information formats (XML trees)
     • Uses query languages (e.g. XPath) to search and access fields and nested elements
     • Behavior of control structures typically depends on data.

   Analyzing data together with control allows us to answer questions about composition behavior depending on the input data and other received messages.
   E.g., we can ask whether some conditional branch in the workflow will be taken for a given kind of data, or what values some intermediate data fields would have for the given input.
   This kind of problem has long been studied in program analysis, and getting exact answers is generally hard, and undecidable in the presence of loops.
2   Motivation and Problem Statement
Example Medical Workflow

   [Fig. 1. An example drug prescription workflow. Data items: x (Patient ID), y (Medical history), z (Medication record). Activities: a1 (Retrieve medical history) and a2 (Retrieve medication record) run in parallel; then, on the "stable" branch, a3 (Continue last prescription), or, on the "¬stable" branch, a4 (Select new medication); finally a5 (Log treatment).]

        Written using BPMN (Business Process Modeling Notation).
             • A high-level (non-executable) description.

This workflow shows a simplified drug prescription process in a health organization. At the entry, the patient identifies him/herself (item x, Patient ID). The patient’s medical history (y) and medication record (z) are then retrieved in parallel (activities a1 and a2).
Example Medical Workflow (contd.)
   [Fig. 1, repeated: the example drug prescription workflow.]

        Aiming at fragmentation that respects data privacy.

Depending on whether the patient’s condition is stable or not, either the earlier prescription is continued (activity a3), or a new medication is selected (activity a4). Finally, the treatment of the patient is logged (activity a5).
In this example, we consider data privacy attributes. Data may contain confidential information on the patient’s medical and medication history, including insurance coverage. A fragment should contain activities that access data of only a certain privacy level. The fragments can then be distributed based on what privacy clearance they require.
Example Sub-Workflow

   [Fig. 2. Selection of new medication. Data items: y (Medical history) and z (Medication record) as inputs, c (Criterion) as intermediate data, p (Prescription candidate) as output. Activities: a41 (Run tests to produce medication criteria) and a42 (Search medication databases), followed by the test "Result sufficiently specific?" with a "no" and a "yes" branch.]

             Sub-Workflow for medication selection (component service a4)
                  • involves looping based on intermediate data.

     To make things more interesting, we “unpack” one of the component services, a4, from the main workflow and represent it as a sub-workflow with its own inputs (items y and z), outputs (item p), and intermediate data (item c).
     As soon as there is a loop involved (taking the “no” branch), the analysis becomes more complex. An exact analysis of the orchestration state after the loop would require discovering the loop invariant, which is generally a difficult problem. As we will show in the next section, we find our way around this obstacle by employing abstract interpretation techniques that give us a conservative approximation of the loop behavior.
Data Attributes
   User-defined attributes can be used to characterize data in a given analysis domain
     • Application dependent view
     • Simplified data model: sets of properties instead of complex structures
     • User (designer) chooses relevant attributes, describing e.g.:
              information content
              privacy/confidentiality levels ⇐ our example
              ownership
              other aspects of quality
     • Possibly: a combination of views
     • Known or assumed for input data, implicit in control/data dependencies in the workflow
   Question: How to infer attributes (i.e. properties) of intermediate and resulting data items?
     • Based on control flow and data dependencies

   Reasoning about the whereabouts of data in the execution of a service composition is simpler if we track only data attributes and not the entire complex data structures. This fits very well an approach to program analysis known as abstract interpretation, where infinite data domains are abstracted into finite ones.
   E.g., knowing privacy levels of input data, we can try to infer the privacy levels of intermediate data and the individual activities in the workflow.
   Of course, we have to know how data tests and operations depend on and affect data attributes on the abstract plane.
Knowing Data Attributes
   Knowledge of data attributes at design time:
     • Supporting fragmentation
            Fragment: a part of orchestration that can be distributed for remote execution
            What parts can be identified and enacted in a distributed fashion?
     • Checking data compliance
            Content of messages exchanged with/between component services in an orchestration
            Is “sufficient” data passed to components?
     • Robust top-down development
            Modular structure of service orchestrations
            Refining specifications of workflow (sub-)components
   Also useful at runtime:
     • Updating predictions with actual data
     • On-demand analysis

   Analysis of data attributes for components of a service orchestration at design time is an instance of static analysis, where properties are inferred from specification, and not by running the orchestration.
   The static analysis approach can be combined with monitoring and adaptation mechanisms, and the analysis can be performed on a live executing instance of the orchestration.
   That can give more accurate results, because by looking at the live instance we can learn the actual values of data up to that point in execution, and update the analysis accordingly.
Problem Statement
PROBLEM
To infer user-defined attributes for data items and activities on different levels in an orchestration, automatically from:
      known attributes of input data
            • defined or assumed by the designer
       control structure
            • including complex control structures, such as parallel flows, conditional
               branches and loops.
       data operations
            • reading or writing data, including tests, assignments and service
               invocations


The aim is to provide the automated, mechanical inference of data attributes, ideally using a tool that can be invoked at
design time. The concrete tool implementation depends on the language in which the orchestration is written (e.g., BPEL,
BPMN, etc.)
In this learning package, we present the general approach and the ideas behind each step in the analysis process. These steps
can be adapted to a particular orchestration language and turned into a fully automated tool chain.
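
To make the problem statement concrete, the following Python sketch propagates attribute sets along data dependencies until nothing changes. It is deliberately naive and is not the sharing analysis presented in the following sections. The dependency table loosely follows the medication-selection sub-workflow (activities a41 and a42), while the exact read/write dependencies and the attributes assigned to the inputs y and z are assumptions made only for illustration.

```python
# Naive attribute propagation over data dependencies (illustrative only).
# Assumed dependencies: a41 reads y and z and writes c (derived from y);
# a42 reads c and writes p. The loop of the sub-workflow is covered by
# iterating until nothing changes any more.
ACTIVITIES = {
    "a41": {"reads": {"y", "z"}, "writes": {"c": {"y"}}},
    "a42": {"reads": {"c"}, "writes": {"p": {"c"}}},
}

# Assumed attributes of the input data items (hypothetical values).
INPUT_ATTRS = {
    "y": {"Symptoms", "Tests"},
    "z": {"Tests", "Coverage"},
}


def infer_attributes(input_attrs, activities):
    """Return (data item -> attributes, activity -> attributes)."""
    data = {item: set(attrs) for item, attrs in input_attrs.items()}
    acts = {name: set() for name in activities}
    changed = True
    while changed:  # terminates: the sets only grow and are bounded
        changed = False
        for name, spec in activities.items():
            # An activity inherits the attributes of everything it reads.
            seen = set().union(*(data.get(d, set()) for d in spec["reads"]))
            if not seen <= acts[name]:
                acts[name] |= seen
                changed = True
            # A written item inherits the attributes of its sources.
            for out, sources in spec["writes"].items():
                new = set().union(*(data.get(d, set()) for d in sources))
                current = data.setdefault(out, set())
                if not new <= current:
                    current |= new
                    changed = True
    return data, acts


data_attrs, activity_attrs = infer_attributes(INPUT_ATTRS, ACTIVITIES)
print(sorted(data_attrs["p"]))
print(sorted(activity_attrs["a41"]))
```

Iterating to a fixpoint is what makes the sketch safe in the presence of loops: the result may over-approximate, but it never misses an attribute that some execution could carry along.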
3   Overview of the Approach
Overview
   [Fig. 3. Overview of the approach. User perspective: the input data context (inputs i1, i2, i3, ... against attributes α1, α2, α3, ...), the workflow definition, and the resulting context (items o1, o2, o3, ... against the same attributes). Underlying techniques and artifacts: input concept lattice, Horn clause program, input substitution, sharing analysis (abstract interpretation, sharing+freeness domain, CiaoDE / CiaoPP suite), abstract substitution, and resulting concept lattice.]

   Horn clause program (excerpt shown in the figure):
       w(X1,X2,A1,Y1,A2,Y2,A3,Z1,A4,Z2):-
           A1=f1(X1),
           Y1=f1Y1(X1),
           A2=f2(X2),
           Y2=f2Y2(X2),
           A3=f3(Y1,Y2),
           ...

   Input substitution (excerpt shown in the figure):
       ...
       X1=f(U1,U2),
       X2=f(U1),
       X3=f,
       ...

   Abstract substitution (excerpt shown in the figure):
       [[X1,A1,Y1,A3,Z1],
        [A3,Z1,A4,Z2],
        [X2,A4,Z2],
        [X2,A2,Y2,A3,Z1,A4,Z2]]

     Above the line are artifacts that the user works with directly. The input data context describes user-defined attributes of the inputs to the orchestration, and accompanies the workflow definition, like the one in our example. The results are returned in the form of the resulting context, which gives back the attributes for the intermediate data items, results, and activities.
     The intermediate steps below the line are explained in the slides that follow.
Overview (contd.)
   The approach to automated attribute inference takes as input:
     • an input data context that identifies the input data items to the workflow
       and their attributes
     • a workflow definition in some appropriate formalism (e.g. BPMN in our
       example)
   and gives at output:
     • a resulting context that presents inferred attributes of all intermediate
       data items and activities in the given workflow.
   The key steps in the process include:
     • Conceptualizing the input data context in the form of a concept lattice,
       and preparing the input substitution for the analysis.
     • Turning the given workflow definition into a Horn Clause program that
       is fed to the analysis, along with the input substitution.
     • Performing sharing analysis and using its result, the abstract
       substitution, to construct the resulting concept lattice.
     • Interpreting the resulting concept lattice to produce the resulting
       context.
Outline of the Section

   This Overview section starts with two subsections that introduce
   some important background notions.
     • Subsection Contexts and Concept Lattices briefly introduces the key
       notions of Formal Concept Analysis (FCA), like contexts and concept
       lattices that are used in the rest of the text for representing (and
       reasoning about) inputs and outputs of the proposed analysis
       approach.
     • Subsection Horn Clause Programs presents the key ideas behind
       logic (Horn Clause) programs, gives an informal introduction to their
       form and meaning, and presents the notions of structured terms,
       substitutions, and unification, which are all referred to later. It also
       introduces Prolog syntax.
   These two subsections do not describe steps of the approach as
   such. Rather, they supply the notions whose understanding is
   necessary for understanding the steps in the approach.
Outline of the Section (contd.)

   The remaining subsections describe steps in the process of automated
   attribute inference:
     • Subsection Workflows in Horn Clause Form starts from a rather
       generalized way of describing workflows that involve complex data and
       control dependencies, and describes how such workflows can be
       turned into a Horn Clause form amenable to sharing analysis.
     • Next, subsection Sharing Analysis first defines the notion of sharing
       in logic programs, building on the notion of substitution, introduced
       earlier. Next, it describes the notion of abstract substitution, which is
       used in the actual analysis as the domain for abstract interpretation. It
       also describes how an initial substitution for the analysis is set up using
       attributes from the input concept lattice.
     • Finally, subsection Obtaining and Interpreting Results explains how
       the result of the sharing analysis, in the form of abstract substitution, is
       turned into a resulting concept lattice, and then used to generate the
       resulting context, which is the end result.
3.1   Contexts and Concept Lattices
Contexts: Objects and Attributes

                           Symptoms        Tests       Coverage
     Medical history
    Medication record
             (a) Characteristics of medical databases.

                             Name        Address       PIN    SSN
                Passport
       National Id Card
        Driving License
    Social Security Card
                 (b) Types of identity documents.

                 Fig. 4.   Two examples of contexts. [The marks in the table cells are not recoverable here.]

    Notion of context in Formal Concept Analysis (FCA)
        • Set A of attributes (columns)
        • Set O of objects (rows)
        • Boolean object-attribute relation ρ ⊆ O × A

    Formal Concept Analysis is a branch of mathematical lattice theory concerned with knowledge representation and reasoning.
    An FCA context is simply a table that associates objects with attributes.
    The examples above show two contexts: one that describes the content of medical databases, and another that describes the information contained in different identity documents.
    Objects (rows) stand for some meaningful entities, and attributes (columns) are chosen by the user to represent relevant notions in the application domain.
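
A context can be written down directly as a set of (object, attribute) pairs. The Python sketch below does so for the identity-document example: the object and attribute names are those of Fig. 4(b), but the incidence pairs in `RHO` are invented for illustration and should not be read as the actual content of the figure. The helper names `has_attribute` and `cross_table` are likewise hypothetical.

```python
# A formal context (O, A, rho) represented with plain Python sets.
# NOTE: the pairs in RHO are illustrative assumptions, not the marks
# of the original figure.
OBJECTS = {"Passport", "National Id Card", "Driving License",
           "Social Security Card"}
ATTRIBUTES = {"Name", "Address", "PIN", "SSN"}
RHO = {
    ("Passport", "Name"), ("Passport", "PIN"),
    ("National Id Card", "Name"), ("National Id Card", "Address"),
    ("National Id Card", "PIN"),
    ("Driving License", "Name"), ("Driving License", "Address"),
    ("Social Security Card", "Name"), ("Social Security Card", "SSN"),
}


def has_attribute(obj, attr):
    """True if the object-attribute relation rho holds for the pair."""
    return (obj, attr) in RHO


def cross_table():
    """Return the column headers and one row of marks per object."""
    columns = sorted(ATTRIBUTES)
    rows = {
        obj: ["x" if has_attribute(obj, attr) else "." for attr in columns]
        for obj in sorted(OBJECTS)
    }
    return columns, rows


columns, rows = cross_table()
print(" " * 22 + "  ".join(columns))
for obj, marks in rows.items():
    print(f"{obj:<22}" + "  ".join(m.ljust(len(c)) for m, c in zip(marks, columns)))
```

Keeping ρ as a set of pairs mirrors the definition ρ ⊆ O × A literally, which makes the later set-based operations on contexts straightforward to express.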
Concepts
                                                          From the definition, in
                                                          concept (B, D) we need to
  The idea behind a concept is a close connection         know only B or D to find the
                                                          other using (·)′.
  between subsets of objects and attributes.
                                                          That means we can choose
  Objects → Attributes                                    to work with objects or
    • For an arbitrary subset of objects B ⊆ O            attributes, whatever is more
                                                          convenient.
      let B′ = {a ∈ A | ∀o ∈ B, o ρ a}
                                                          E.g., we can start from a
      “all and only those attributes that belong to all
                                                          single attribute a and
      objects from B”                                     calculate {a}′ to find the
  Attributes → Objects                                    most general concept that
                                                          has a.
    • For an arbitrary subset of attributes D ⊆ A
                                                          Or, we can start with an
      let D′ = {o ∈ O | ∀a ∈ D, o ρ a}                    object o and calculate {o}′
      “all and only those objects that have all           to find the most specific
      attributes from D”                                  concept containing o.
                                                          Because B″ = B and
  Iff B′ = D and D′ = B then (B, D) is a concept          D″ = D, we say that
      • B″ = (B′)′ = D′ = B, D″ = (D′)′ = B′ = D          concepts are closed under
                                                          (·)″, i.e., (·)″ is a closure.
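
The two derivation operators above can be sketched in a few lines of Python. This is only an illustration: the relation RHO and the object and attribute names are made up for the example, and the function names attrs_of, objs_of, concept_from_attribute and concept_from_object are hypothetical helpers, not part of the approach.

```python
# Hypothetical context: objects are identity documents, attributes are fields.
RHO = {
    ("passport", "name"), ("passport", "pin"),
    ("driving_license", "name"), ("driving_license", "address"),
    ("national_id", "name"), ("national_id", "pin"), ("national_id", "address"),
}
OBJECTS = {o for o, _ in RHO}
ATTRIBUTES = {a for _, a in RHO}


def attrs_of(objs):
    """B': all and only those attributes that belong to all objects in objs."""
    return {a for a in ATTRIBUTES if all((o, a) in RHO for o in objs)}


def objs_of(attrs):
    """D': all and only those objects that have all attributes in attrs."""
    return {o for o in OBJECTS if all((o, a) in RHO for a in attrs)}


def concept_from_attribute(a):
    """Most general concept that has attribute a: ({a}', {a}'')."""
    extent = objs_of({a})
    return extent, attrs_of(extent)


def concept_from_object(o):
    """Most specific concept containing object o: ({o}'', {o}')."""
    intent = attrs_of({o})
    return objs_of(intent), intent
```

Starting from either side yields a pair (B, D) with B′ = D and D′ = B, which is why it is enough to store only one of the two sets.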
Concept Lattices
  Concept lattice with ordering ≤:                        The concept lattice is often
    • (B1, D1) ≤ (B2, D2) ⇔ B1 ⊆ B2 ⇔ D2 ⊆ D1             shown using a variant of
    • Lesser concept adds attributes                      Hasse diagrams (bottom left).
      (D2 ⊆ D1, i.e. (B1, D1) is more specific)           Nodes represent concepts.
    • Greater concept includes more objects               The top concept is visually at
      (B1 ⊆ B2, i.e. (B2, D2) is more general)            the top, and the bottom
    • Top (⊤): the most general concept (all objects)     concept is visually at the
    • Bottom (⊥): the most specific concept (all          bottom.
      attributes)                                         Callouts show objects new to
                                                          the concept (inherited by all
  It is a complete lattice                                upwards nodes) above the
                                                          line, and attributes new to the
                                                          concepts (inherited by all
                                                          downwards nodes) below the
                                                          line.
                                                          The example concept lattices
                                                          (bottom left) correspond to
                                                          the sample contexts for
                                                          medical databases and personal
                                                          identification documents.

  [Figures: (a) Concept lattice for medical databases. (b) Concept lattice for identity documents.
  Node labels: Name, Symptoms, Tests, Coverage, Medical history, Medication record, PIN, Address,
  SSN, Passport, Driving License, Soc. Sec. Card, National ID.]
3   2   Horn Clause Programs
About Logic Programs
                                                                   Logic programming is one of
     Logic programs represent a computation task as a              the classical programming
     set of logical rules and facts.                               paradigms, along with
                                                                   imperative, object-oriented
     Logical rules model if-then inferences:                       and functional programming.
       • B1 ∧ B2 ∧ · · · ∧ Bn → H: if B1 , . . . , Bn are all      The example gives rules for
                                                                   the “x is an ancestor of y”
            true (n ≥ 0), then we conclude that H is true.
                                                                   relation, written as
        •   Often written as H ← B1 ∧ B2 ∧ · · · ∧ Bn              ancestor (x , y ), using also
        •   H is the head of the rule                              parent (x , y ) relation.
        •   B1 ∧ · · · ∧ Bn is the body of the rule                Logic programs are
        •   H ← (the case n = 0) is a fact (H is always true)      declarative, because they
                                                                   state the rules and the
                                                                   problem to be solved (e.g.
Example                                                            finding somebody’s ancestors
                                                                   or descendants), not the
ancestor (x , y ) ← parent (x , y )   (a parent is an ancestor)    sequence of steps to solve it.
ancestor (x , y ) ← parent (z , y )∧     (a parent’s ancestor is   That makes logic programs
                    ancestor (x , z )              an ancestor)    relatives of SQL, but far more
                                                                   powerful.
Elements of Horn Clause Programs
     Elements of logic programs include:                        The elements of logic
                                                                programs (predicates, terms,
       • Predicates that describe logical properties or         variables, constants, etc.)
         relations, such as ancestor /2 and parent /2           correspond to the notions in
         (where /n means “with n arguments”)                    First Order Logic (FOL).
       • Atoms that apply predicates, such as “x is an          As in FOL, we assume that
                                                                predicate and constant
         ancestor of y”, written as ancestor (x , y )           names refer to distinct
       • Variables x , y , z that stand for arbitrary objects   entities – unlike differently
         in a rule (implicitly ∀-quantified)                     named variables that can
       • Constants that name distinct objects (such as          refer to the same object.

         Alice, Bob, Carol and Dennis below)                    Also, p/1 and p/2 are two
                                                                different predicates — with
     In the rules of a Horn Clause program, H and each of      one and two arguments,
     B1 , . . . , Bn are atoms.                                 respectively — even though
                                                                they share the same name p.

Continued Example: Parent Fact Database                         The simplified structure of
                                                                Horn Clause programs allows
parent (Alice, Bob) ←                                           efficient reasoning, i.e.
                                                                derivation of logical
parent (Dennis, Bob) ←
                                                                consequences from known
parent (Carol, Dennis) ←                                        facts and rules.
Executing Horn-Clause Programs
                                                     The sample queries compute
    Executing a logic program means searching for    different things (or fail, in the
    a proof of a logical statement known as the      last case) depending on the
                                                     query – in C or Java we
    query, finding variable values along the way.     would have to program
                                                     separate procedures for “find
Sample Query 1: Find Bob’s ancestors                 person’s ancestors” and “find
                                                     person’s descendants” etc.
Query:   ancestor (x , Bob)
                                                     In case of success, the
Answers: x = Alice, x = Carol, x = Dennis            variables in the query may
                                                     point to objects for which the
                                                     query can be proven from the
Sample Query 2: Find Carol’s descendants             program.

Query:   ancestor (Carol, y )                        The “magic” is done by the
                                                     under-the-hood inference
Answers: y = Dennis, y = Bob                         engine that takes the program
                                                     and the query and performs a
                                                     systematic search for a proof.
Sample Query 3: Find Alice’s ancestors               The result may be a failure, a
Query:    ancestor (x , Alice)                       single solution, or multiple
                                                     solutions (possibly an infinite
Answer:   no solution (cannot prove for any x)       number of them).
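
To see where the three answers come from, the ancestor rules can be evaluated by hand. The Python sketch below does this bottom-up (repeatedly applying the two rules until no new facts appear), which is a swapped-in technique and not how a Prolog engine searches for proofs; it only reaches the same answers for this small example. The helper names ancestor_facts and query are assumptions made for the illustration.

```python
# Parent facts from the running example: parent(Alice, Bob), etc.
PARENT = {("Alice", "Bob"), ("Dennis", "Bob"), ("Carol", "Dennis")}


def ancestor_facts(parent):
    """Apply both ancestor rules until a fixpoint is reached."""
    # Rule 1: ancestor(x, y) <- parent(x, y)
    ancestor = set(parent)
    while True:
        # Rule 2: ancestor(x, y) <- parent(z, y) and ancestor(x, z)
        derived = {(x, y)
                   for (z, y) in parent
                   for (x, z2) in ancestor
                   if z2 == z}
        if derived <= ancestor:
            return ancestor
        ancestor |= derived


def query(facts, x=None, y=None):
    """Return all facts matching the query; None plays the role of a variable."""
    return sorted((a, b) for (a, b) in facts
                  if x in (None, a) and y in (None, b))


ANCESTOR = ancestor_facts(PARENT)
bobs_ancestors = [a for (a, _) in query(ANCESTOR, y="Bob")]
carols_descendants = [d for (_, d) in query(ANCESTOR, x="Carol")]
alices_ancestors = query(ANCESTOR, y="Alice")  # empty: the query fails
```

The same set of derived facts answers all three sample queries, which mirrors the point in the margin: one set of rules serves both the “ancestors” and the “descendants” questions.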
Handling Structures
     Structured terms have the shape f (t1 , t2 , . . . , tm ),     Note that f (t1 , . . . , tm ) is
     m ≥ 0, where f is a functor, and each of ti is                 NOT a function call in the
                                                                    sense of C, Python or
     again a term.                                                  Haskell. It can be thought of
                                                                    as a data record with name f
Example: Peano Arithmetic                                           and m fields. For m = 0, f is
                                                                    simply a constant.
Program: number (0) ←
         number (s(x )) ← number (x )                               It goes without saying that
                                                                    structured terms can be (and
         succ (x , s(x )) ←
                                                                    often are) nested, as in the
Query 1:   number (x )                                              examples of Peano
Answers:   x = 0, x = s(0), x = s(s(0)), x = s(s(s(0))), . . .      arithmetic and lists.

Query 2:   succ (x , s(0))                                          Lists are very frequently used
                                                                    data structures, and are a
Answer:    x = 0
                                                                    common tool in logic
                                                                    programming. However,
     Lists are common structures, with constant [ ]                 structured terms can be used
     representing the empty list, and functor “.” (dot)             to represent nodes in a tree
     used to put together the head and the tail:                    or a graph, records that store
                                                                    information, or other kinds of
        • [4] is the same as .(4, [ ])                              data containers we need.
        • [1, 2, 3, 4] is the same as .(1, .(2, .(3, .(4, [ ]))))
Unification and Substitutions
        An atom of the form t1 = t2 expresses syntactic equality.
            • it succeeds if t1 and t2 are identical, or can be made identical by
                substituting terms for some variables in t1 and t2.
            • we are interested in the substitution which introduces the least amount
                of information: the most general unifier (MGU)
            • a substitution maps (binds) variables to terms

Unification Examples
       Unification              MGU                             Unification                            MGU
          1=0               none (failure)                    s (x ) = s (y )                 θ = {x → y }
      s(0) = s(x )          θ = {x → 0}                  f (s(x ), x ) = f (z , 1)       θ = {z → s(1), x → 1}
      f (0) = s(x )         none (failure)               f (s(x ), y ) = f (1, z )             none (failure)

        Running a query means finding a substitution that makes the query
        true, by composing the MGUs obtained from each of B1 , . . . , Bn in a rule body.

Note that t1 = t2 is just a nicer way of writing = (t1 , t2 ). Unification is implicit in parameter passing in clause heads. For
instance, we can rewrite the rule “succ (x , s(x )) ←” as “succ (x , y ) ← y = s(x )”. Equally, the rule
“number (s(x )) ← number (x )” can be rewritten as “number (y ) ← y = s(x ) ∧ number (x )”.
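
The unification examples in the table can be reproduced with a small Python sketch of syntactic unification. The term encoding is an assumption made for this illustration only: variables are Var objects, a structured term f(t1, ..., tm) is a tuple ("f", t1, ..., tm), and constants are plain Python values. The sketch includes an occurs check, which many Prolog systems omit by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logic variable, identified by its name."""
    name: str


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst):
    """True if var appears inside term under the current bindings."""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term[1:])
    return False


def unify(t1, t2, subst=None):
    """Return a most general unifier as a dict of bindings, or None on failure."""
    subst = {} if subst is None else subst
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, Var):
        return None if occurs(t1, t2, subst) else {**subst, t1: t2}
    if isinstance(t2, Var):
        return None if occurs(t2, t1, subst) else {**subst, t2: t1}
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # different constants or different functors


def resolve(term, subst):
    """Apply the bindings in subst all the way through term."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return (term[0],) + tuple(resolve(arg, subst) for arg in term[1:])
    return term


x, y, z = Var("x"), Var("y"), Var("z")
theta = unify(("f", ("s", x), x), ("f", z, 1))  # f(s(x), x) = f(z, 1)
```

Bindings are kept in a chained form, so resolve(z, theta) is needed to read off the fully substituted term s(1).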
Prolog and Friends
      Prolog is a programming language based on                  The full Prolog language
                                                                 includes “impure” features,
      Horn Clause rules.                                         such as dynamic fact updates
      Concrete language syntax:                                  and I/O. Modern Prolog
                                                                 systems contain extensions
          •     clauses end with a full stop (“.”)               such as constraint logic
          •     uses “:-” instead of “←”, comma instead of “∧”   programming (CLP) and
          •     variables start with uppercase letters or “_”    tabling.
          •     predicate names, functors and constants start    However, “pure” Prolog
                with lowercase letters                           programs have a close
                                                                 relationship with logical
      Powerful analysis tools and techniques based on            theories. Reasoning about
                                                                 them in a sound fashion is
      “clean” program semantics.                                 easier than in other
                                                                 executable formalisms.
Examples in Prolog                                               We will use Prolog to encode
Ancestors:       ancestor(X,Y):- parent(X,Y).                    objects and attributes in an
                 ancestor(X,Y):- parent(Z,Y), ancestor(X,Z).     executable form which will
Peano arith.:    number(0).                                      capture the structure of a
                 number(s(X)):- number(X).                       workflow, and Prolog analysis
                 add(0, X, X):- number(X).                       tools to automatically derive
                 add(s(X), Y, s(Z)):- add(X,Y,Z).                attributes.
3   3   Workflows in Horn Clause Form
Anatomy of a Workflow
                                                        There are many concrete
  In general, workflows may contain complex data         workflow definition languages
                                                        that can be used to specify
  and control dependencies:                             control structures and data
    •   sequences, conditional branches, and loops      operations. Here, we use an
    •   parallel flows, with pre- and post-conditions    abstract workflow
                                                        representation where both
    •   data items are read and written by activities   control and data
    •   inputs: possibly complex XML information sets   dependencies are shown
  [Figure: workflow sketch with data item labels x, y   explicitly.
  and z, and the question “x? y? x, y?”]
                                                        Analyzing content of data
                                                        items at all points in a
                                                        workflow is an instance of a
                                                        general program analysis
                                                        problem. To solve it, especially
                                                        in the presence of loops and
                                                        complex data structures,
  Understanding how data is handled throughout          approximation techniques
  the workflow is non-trivial.                          such as abstract
    • what information items / parts are used? where?   interpretation are usually
                                                        needed.
Example of (Enriched) Workflow
   To make the analysis of workflow control and data dependencies
   easier, let us first “distill” our BPMN workflow example into a simplified
   abstract form below (elements to be clarified in the slides that follow).
      • We keep only the activity tags (a1 , . . . , a5 ), control dependencies
            between them, and labels for data items read/written by the activities.
      • We abstract the looping in the sub-workflow as a structured activity of
            repeat-until type with a separate body sub-workflow.
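   [Figure: abstract workflow. Activities a1 and a2 are joined (AND) before a3 and a4;
   a3 and a4 are joined (OR) before a5. a4 is a repeat-until loop whose body is the
   sub-workflow a41, a42 and whose exit depends on p.]

   Activity   Reads   Writes
   a1         x, d    y
   a2         x, e    z
   a3         y, z    −
   a4         y, z    −
   a5         x       −
   a41        y, z    c
   a42        c       p

   C = {pre–a4 ≡ done–a1 ∧ ¬succ–a1 ∧ done–a2,
        pre–a3 ≡ done–a1 ∧ succ–a1 ∧ done–a2,
        pre–a5 ≡ done–a3 ∨ done–a4}

   C (body of a4) = {pre–a42 ≡ done–a41}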
Example of (Enriched) Workflow (cont.)
   Workflow control structure:
    • includes activities a1 , . . . , a5
      • arrows show control dependencies (e.g. a4 depends on a1 and a2 )
      • independent activities may run in parallel (e.g. a1 and a2 )
      • different join types (AND/OR)
   Data dependencies based on read/write annotations
      • each activity ai carries an annotation with the sets Ri and Wi
      • Ri is the set of data items read, Wi is the set of data items written
Example of (Enriched) Workflow (cont.)

   Set C of logical control preconditions:
     • activity preconditions pre– ai expressed using propositional formulas
     • done– aj means “aj has finished”
     • succ– aj means “aj has achieved its (user-defined) goal”
          • easily models sequences and AND/OR/XOR parallel flows
   Helps detect possible deadlocks and race conditions:
          • deadlocks appear in case of circular dependencies (pre– ai → done– ai )
          • race conditions appear when two activities that read/write the same
            data item can execute in parallel
Example of (Enriched) Workflow (cont.)


   Based on control preconditions, we can find legal orderings of
   activities that respect the preconditions:
          • only if there are no deadlocks/race conditions
            (which can be checked efficiently using, e.g., SAT solvers)
   All legal orderings are equivalent from the point of view of data
   handling.
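
A legal ordering can be illustrated with a short Python sketch. It does not use a SAT solver as mentioned above; instead it treats the preconditions as a plain dependency graph and uses topological sorting, which is a simplification. Two assumptions are made for the illustration: the OR-join before a5 is treated as “after both a3 and a4”, and the names DEPENDS_ON, READS, WRITES, legal_ordering and races are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Control dependencies read off the preconditions in C (done-aj parts only).
DEPENDS_ON = {
    "a1": set(), "a2": set(),
    "a3": {"a1", "a2"}, "a4": {"a1", "a2"},
    "a5": {"a3", "a4"},  # assumption: order a5 after both branches
}
# Read/write annotations of the top-level activities.
READS = {"a1": {"x", "d"}, "a2": {"x", "e"}, "a3": {"y", "z"},
         "a4": {"y", "z"}, "a5": {"x"}}
WRITES = {"a1": {"y"}, "a2": {"z"}, "a3": set(), "a4": set(), "a5": set()}


def legal_ordering(depends_on):
    """Return one ordering respecting the dependencies, or None on a cycle."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None  # circular dependency: a potential deadlock


def predecessors(depends_on):
    """Map each activity to all activities that must finish before it."""
    closure = {}

    def visit(act):
        if act not in closure:
            closure[act] = set()
            for pred in depends_on[act]:
                closure[act] |= {pred} | visit(pred)
        return closure[act]

    for act in depends_on:
        visit(act)
    return closure


def races(depends_on, reads, writes):
    """List unordered activity pairs where one writes an item the other uses."""
    before = predecessors(depends_on)
    acts = sorted(depends_on)
    found = []
    for i, a in enumerate(acts):
        for b in acts[i + 1:]:
            if a in before[b] or b in before[a]:
                continue  # ordered by control flow, cannot run in parallel
            shared = (writes[a] & (reads[b] | writes[b])) | (writes[b] & reads[a])
            if shared:
                found.append((a, b, shared))
    return found


order = legal_ordering(DEPENDS_ON)
```

In this example a1 and a2 may run in parallel and both read x, but neither writes an item the other uses, so no race is reported and any topological order is acceptable.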
Example of (Enriched) Workflow (cont.)


   Sub-workflows can be used to model complex constructs:
       • in our case, activity a4 is a repeat-until loop
       • the body of the loop is a sub-workflow (with a41 and a42)
   Sub-workflows also allow modular development and/or assembly of
   workflows.
Workflow as a Horn Clause Program
        We represent the workflow symbolically in                          w(X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5):-
        a Horn Clause form for further                                           A1=f1(X,D), % a_1
                                                                                 Y=f1_Y(X,D),
        analysis.                                                                A2=f2(X,E), % a_2
                                                                                 Z=f2_Z(X,E),
                                                                                 A3=f3(Y,Z), % a_3
             • the representation is not                                         a_4(Y,Z,A4,A41,C,A42,P), % a_4
                operationally equivalent                                         A5=f5(X). % a_5

                                                                          a_4(Y,Z,A4,A41,C,A42,P):-
        The predicate w stands for the                                           w2(Y,Z,A41,C2,A42,P2),
                                                                                 A4=f4(P2),
        workflow                                                                  a_4x(Y,Z,C2,P2,C,P,A4,A41,A42).

             • clause body reflects a legal                                a_4x(_,_,C,P,C,P,_,_,_).
                                                                          a_4x(X,Z,_,_,C,P,A4,A41,A42):-
                ordering of activities                                           a_4(X,Z,A4,A41,C,A42,P).
             • variables stand for data items                             w2(Y,Z,A41,C,A42,P):-
                and activities                                                   A41=f41(Y,Z), % a_41
                                                                                 C=f41_C(Y),
                                                                                 A42=f42(C), % a_42
        Sub-workflows and complex activities                                      P=f42_P(C).
        are in separate predicates.

Note that in Prolog syntax, an underscore (“ ”) represents a new, fresh variable that stands for an arbitrary term.
Predicate w2 represents the body of the loop, and predicates a 4 and a 4x model the repeat-until construct.
Workflow as a Horn Clause Program (cont.)
        For each activity, we model data dependencies with unifications:

            • Ai = fi(Ri) stands for “activity ai reads data items from set Ri”
            • for each item z ∈ Wi written by ai, z = fiz(Q) stands for
              “z is written using data items from Q ⊆ Ri”

        Such a Horn Clause representation can be derived mechanically and, in
        principle, automatically.


Choice of functors (fi and fiz ) is purely symbolic and is not significant for the subsequent sharing analysis. The purpose of
the unifications in the Horn Clause representation is to express functional dependencies between activities and data items in
a workflow, and not to actually calculate them.
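
As a rough illustration of how mechanical this derivation can be, the following Python sketch emits the unification goals for an activity from its read set and its written items. The function names and the input format (a list of read variables and a mapping from each written item to its sources) are assumptions made for this example; they are not part of the original tool chain.

```python
def activity_goals(index, reads, writes):
    """Return the unification goals for activity a_<index>.

    reads  -- variables read by the activity (the set R_i, in a fixed order)
    writes -- maps each written item z to the variables Q it is computed from
    """
    # A_i = f_i(R_i): the activity reads the data items in R_i
    goals = ["A{0}=f{0}({1})".format(index, ",".join(reads))]
    # z = f_i_z(Q): item z is written using the data items in Q
    for item, sources in writes.items():
        goals.append("{0}=f{1}_{0}({2})".format(item, index, ",".join(sources)))
    return goals


def clause(head, goals):
    """Assemble a Horn clause from a head and a list of body goals."""
    return head + ":-\n    " + ",\n    ".join(goals) + "."


# The loop body w2 of the example workflow: a41 followed by a42.
loop_body = clause(
    "w2(Y,Z,A41,C,A42,P)",
    activity_goals("41", ["Y", "Z"], {"C": ["Y"]})
    + activity_goals("42", ["C"], {"P": ["C"]}),
)
print(loop_body)
```

The printed clause has the same goals as the w2 predicate of the workflow program. Control constructs such as the repeat-until loop would still need their own translation schemes, which this sketch does not cover.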
3   4   Sharing Analysis
Sharing in Logic Programs
      Sharing analysis of logic programs tries to infer
      how data is shared between variables:

        • sharing is always relative to a substitution θ
          upon successful execution of a query
        • two variables x, y are said to share if the terms
          xθ and yθ (i.e. after applying θ to x and y)
          contain some common variable.

Example

  θ                                      xθ                yθ               Result
  {x → s(y)}                             s(y)              y                x and y share
  {}                                     x                 y                x and y do NOT share
  {x → s(w), y → f(1, z)}                s(w)              f(1, z)          x and y do NOT share
  {x → [1, w, f(z)], y → n(z, s(w))}     [1, w, f(z)]      n(z, s(w))       x and y share w and z

Sharing analysis tries to find out all possible sharings between variables in case of successful executions. This
requires inclusion of all possible substitutions on exit from a query. That is generally impractical and often impossible,
since there may be many or even an infinite number of possible substitutions to be taken into account.
To make sharing analysis viable, we often resort to some sort of approximation that reduces the repertoire of possible
sharing cases to a finite, manageable size, while remaining safe, i.e. not missing any potential sharing.
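
The definition of sharing can be checked on the four example substitutions with a short Python sketch. The term encoding is an assumption made for this example: a variable is a string, a compound term is a tuple (functor, arg1, ...), a list is encoded as a compound term with the functor "list", and anything else is a constant. The substitution is assumed to be idempotent, so applying it once is enough.

```python
def term_variables(term):
    """Collect the variables that occur in a term."""
    if isinstance(term, str):          # a variable
        return {term}
    if isinstance(term, tuple):        # compound term: (functor, arg1, ...)
        found = set()
        for arg in term[1:]:
            found |= term_variables(arg)
        return found
    return set()                       # a constant


def shared_variables(theta, x, y):
    """Variables common to x.theta and y.theta; x and y share iff non-empty."""
    x_theta = theta.get(x, x)          # unbound variables map to themselves
    y_theta = theta.get(y, y)
    return term_variables(x_theta) & term_variables(y_theta)


example_1 = {"x": ("s", "y")}
example_2 = {}
example_3 = {"x": ("s", "w"), "y": ("f", 1, "z")}
example_4 = {"x": ("list", 1, "w", ("f", "z")), "y": ("n", "z", ("s", "w"))}

for theta in (example_1, example_2, example_3, example_4):
    print(sorted(shared_variables(theta, "x", "y")))
```

An empty result means that x and y do not share under the given substitution.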
Abstract Substitution Domain
        Instead of looking at (possibly infinite number of) concrete
        substitutions, we can perform analysis on a simplified abstract level.
        Abstract substitutions approximate terms with sets of contained
        variables (not concerned with the exact shape of terms):

          Concrete:    θ = { x → f(u, g(v)), y → h(5, u), z → i(v, w) }

          Domain variable:       u            v            w
          shared by:           {x, y}       {x, z}        {z}

          Abstract:    Θ = { {x, y}, {x, z}, {z} }

By operating in the abstract substitution domain, the analysis task becomes simplified and finite. The shown abstract
substitution domain is not the only applicable choice. For instance, we can work with pair-wise sharing etc. Different sharing
domains also differ with respect to the computational cost of the analysis and precision. The domain used here is known to
be more precise (in the sense of avoiding over-approximation), but exponential in time with respect to the number of
variables involved. It is also often combined with additional freeness or groundness information.
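
The abstraction step for the example above can be sketched in Python as follows. This is only an illustration of the set-sharing abstraction of a single concrete substitution, not of the analysis itself (which computes abstract substitutions without enumerating concrete ones). The term encoding (strings for variables, tuples for compound terms) and the function names are assumptions, and every program variable of interest is assumed to be bound in θ.

```python
def term_variables(term):
    """Collect the variables that occur in a term."""
    if isinstance(term, str):          # a variable
        return {term}
    if isinstance(term, tuple):        # compound term: (functor, arg1, ...)
        found = set()
        for arg in term[1:]:
            found |= term_variables(arg)
        return found
    return set()                       # a constant


def abstract_sharing(theta):
    """For each variable in the range of theta, the set of program variables
    whose terms contain it; the abstraction is the set of these groups."""
    shared_by = {}
    for program_var, term in theta.items():
        for domain_var in term_variables(term):
            shared_by.setdefault(domain_var, set()).add(program_var)
    return {frozenset(group) for group in shared_by.values()}


theta = {
    "x": ("f", "u", ("g", "v")),
    "y": ("h", 5, "u"),
    "z": ("i", "v", "w"),
}
big_theta = abstract_sharing(theta)
print(sorted(sorted(group) for group in big_theta))
```

The shape of the terms (f, g, h, i and the constant 5) plays no role in the result; only the occurrences of u, v and w matter.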
Workflow Input Substitutions
        We include the information on user-defined attributes of input data to
        the workflow, by setting up the initial substitution for inputs (x, d, e in
        our case):
                                           init1(X, D, E):-
                                             X= f1(Name, Pin),
                                             D= f2(Symptoms, Tests),
                                             E= f3(Symptoms, Coverage).

        Reflects positioning of inputs in the initial context / concept lattice.
        The initial concrete substitution coded here maps to the initial abstract
        substitution Θ = {{x }, {d , e}, {d }, {e}}
            • “x has some components not shared with d or e”
            • “d and e share something” (Symptoms), but
            • “both d and e have some private (not shared) components”
                (Tests and Coverage, respectively)

Again, note that the choice of functors (f1, f2, f3) and variable names that stand for the attributes (Name, PIN, etc.) is
not significant for the abstract sharing analysis.
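
Since the initial substitution only restates the input context, it could be generated from it. The Python sketch below does this for the context used in init1; the function name, the dictionary layout of the context and the exact spacing of the generated clause are assumptions made for this example.

```python
def init_clause(name, context):
    """Build the text of an initial-substitution clause from an input context.

    context -- maps each input data item to the list of its attributes;
               attribute names must be valid Prolog variable names.
    """
    items = list(context)
    head = "{0}({1})".format(name, ", ".join(item.upper() for item in items))
    goals = []
    for position, item in enumerate(items, start=1):
        # The functor name f<position> is arbitrary, as noted in the text.
        goals.append("{0}=f{1}({2})".format(
            item.upper(), position, ", ".join(context[item])))
    return head + ":-\n    " + ",\n    ".join(goals) + "."


input_context = {
    "x": ["Name", "Pin"],
    "d": ["Symptoms", "Tests"],
    "e": ["Symptoms", "Coverage"],
}
init1_text = init_clause("init1", input_context)
print(init1_text)
```

Attributes that appear in several rows (here Symptoms) become variables shared between the corresponding inputs, which is what yields the sharing group {d, e}.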
3   5   Obtaining and Interpreting Results
Sharing Results
                 1    [[X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
                 2     [X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
                 3     [X,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
                 4     [X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
                 5     [D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P],
                 6     [D,A1,Y,A3,A4,A41,C,A42,P],
                 7     [E,A2,Z,A3,A41]]
                                       (a) The resulting substitution

                               Top-level variables     Recovered hidden variables
                                     X, A5             {u1, u2, u3, u4}
                                        E              {u1, u3, u5, u7}
                                        D              {u1, u2, u5, u6}
                                     A2, Z             {u1, u2, u3, u4, u5, u7}
                               A1, Y, A42, C, P        {u1, u2, u3, u4, u5, u6}
                                  A3, A4, A41          {u1, u2, u3, u4, u5, u6, u7}
                               (b) Points in the resulting sharing lattice.

                     Fig. 8.   Abstract substitution and the recovered hidden variables.

           The resulting abstract substitution (a) shows sharing for data items and activities.

           It is as if the data item and activity variables shared a set of hidden variables u1, ..., u7 (b).

The sharing results shown in (a) were obtained from sharing and freeness (shfr) analysis in the CiaoPP analysis suite.
The results are safe in the sense that all possible sharing is included. However, they may contain a degree of
over-approximation, i.e. they may conservatively assume sharing where it cannot be ruled out with certainty.
From an abstract substitution, we do not know which variables are shared, so one possibility is to “recover” a sufficient
number of hidden variables that are shared in a manner compatible with the abstract substitution.
Minimal Hidden Variable Recovery
   How many hidden variables are needed to comply with the sharing results?

       • As many as there are sharing settings.
       • The hidden variables are counterparts of the user-defined attributes used in the input
         substitution.

   A straightforward algorithm to recover a minimal set of resulting hidden variables U:

  function RECOVERSUBSTVARS(V, Θ)
      n ← |Θ|; U ← {u1, u2, ..., un}        // n = |Θ| fresh variables in U
      S : V → ℘(U); S ← const(∅)            // the initial value for the result
      for x ∈ V, i ∈ {1..n} do              // for each variable and subst. setting
          if x ∈ Θ[i] then                  // if the variable appears in the setting
              S ← S[x → S(x) ∪ {ui}]        // add ui to its resulting set
          end if
      end for
      return U, S
  end function

The proof that it is sufficient to “invent” a hidden variable for each sharing setting in the resulting abstract sharing,
and that fewer hidden variables than that would not do, follows from the definition of abstract sharing and the
monotonicity of logic programs.
For any non-empty abstract substitution, there is an infinite number of compatible concrete substitutions even with a
fixed set of hidden variables, because the shape of terms may be arbitrary. That suffices in our case, because we just
want to know what is shared and not exactly how.
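
A direct Python transcription of the recovery function may help to see what it returns. It is shown on the small abstract substitution Θ = {{x, y}, {x, z}, {z}} from the Abstract Substitution Domain slide; representing Θ as an ordered list of sets and naming the hidden variables "u1", "u2", ... are choices made for this sketch.

```python
def recover_subst_vars(variables, theta):
    """Assign to each variable the hidden variables of the settings it occurs in.

    variables -- the top-level variables V
    theta     -- the abstract substitution as an ordered list of sharing settings
    """
    n = len(theta)
    hidden = ["u{}".format(i) for i in range(1, n + 1)]   # n fresh variables
    result = {x: set() for x in variables}                # S <- const(empty set)
    for x in variables:
        for i, setting in enumerate(theta):
            if x in setting:                              # x appears in setting i
                result[x].add(hidden[i])                  # add u_i to S(x)
    return hidden, result


theta = [{"x", "y"}, {"x", "z"}, {"z"}]
U, S = recover_subst_vars(["x", "y", "z"], theta)
for variable in ("x", "y", "z"):
    print(variable, sorted(S[variable]))
```

Here x is assigned {u1, u2}, y is assigned {u1} and z is assigned {u2, u3}, so x and y share through u1, x and z share through u2, and u3 is private to z.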
Resulting Lattice (Recovered)

       [Fig. 9. The resulting concept lattice: a diagram over the hidden variables u1, ..., u7
       with the object nodes {x, a5}, {e}, {d}, {a2, z}, {a1, y, p, a42, c} and {a3, a4, a41}.]

       We can now construct the resulting concept lattice:

             • activity and data item variables as objects
             • recovered hidden variables as attributes
             • activities are highlighted

To interpret the resulting lattice in terms of the original user-defined attributes, we observe that sharing analysis
preserves ordering between concepts in the lattice. That means that the “lesser” concepts in the resulting lattice inherit
all original attributes from the input data items (shown in boldface). Therefore, we use the resulting lattice over hidden
variables as a skeleton to “paint” intermediate data items and activities with the original user-defined attributes.
Resulting Context
     Finally, after assigning user-defined attributes to concepts in the resulting lattice, we can
     create the resulting context:

       [Fig. 10. The resulting context: a table with columns Name, PIN, Symp., Tests, Cover.;
       rows above the line: x, d, e; rows below the line: {a2, z}, {a1, y, p, a42, c},
       {a3, a4, a41}, a5.]

        • The input data items (above the line) keep the initial attributes
        • Intermediate data items and activities (below the line) are assigned attributes

The resulting context is a simple tabular form that is presented to the user as the result of the sharing analysis.
The user starts with the input context (above the line) and the workflow definition, while all other steps are
intermediate, mechanical and ideally fully automated.
For activities, attributes indicate the properties of data visible (read) by an activity. For data items, attributes
describe the information content of data and what it was derived from.
Note that the sharing analysis is conservative in the sense that all attributes that cannot be decidedly ruled out are
included.
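
The attribute assignment can be sketched in Python under one stated assumption about the inheritance rule: an object inherits the attributes of every input data item whose set of hidden variables is contained in its own set. This is our reading of “lesser concepts inherit all original attributes from the input data items”, not a rule spelled out in the slides. The hidden variable sets are taken from Fig. 8(b), with i standing for ui and one representative object per lattice node.

```python
def paint(hidden, input_attributes):
    """Assign user-defined attributes to every object in the lattice.

    hidden           -- maps each object to its set of recovered hidden variables
    input_attributes -- maps each input data item to its user-defined attributes
    """
    painted = {}
    for obj, obj_hidden in hidden.items():
        attributes = set()
        for item, item_attributes in input_attributes.items():
            # Assumed rule: inherit from inputs that lie above obj in the lattice.
            if hidden[item] <= obj_hidden:
                attributes |= item_attributes
        painted[obj] = attributes
    return painted


hidden = {
    "x": {1, 2, 3, 4}, "a5": {1, 2, 3, 4},
    "e": {1, 3, 5, 7},
    "d": {1, 2, 5, 6},
    "a2": {1, 2, 3, 4, 5, 7},
    "a1": {1, 2, 3, 4, 5, 6},
    "a3": {1, 2, 3, 4, 5, 6, 7},
}
input_attributes = {
    "x": {"Name", "PIN"},
    "d": {"Symptoms", "Tests"},
    "e": {"Symptoms", "Coverage"},
}
context = paint(hidden, input_attributes)
for obj in sorted(context):
    print(obj, sorted(context[obj]))
```

Under this rule the input items keep exactly their initial attributes, a5 sees only Name and PIN, a2 sees everything except Tests, and a1 sees everything except Coverage.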
4   Application to Fragment Identification
Fragmentation Example (Information Flow)
       [Fig. 11. An example fragmentation for the drug prescription workflow.
       Swim-lanes: Health Organization, Medical Examiners, Medication Provider, Registry, Archive.
       Main medical workflow: a1: Retrieve medical history; a2: Retrieve medication record;
       a3: Continue last prescription (stable); a4: Select new medication (¬stable);
       a5: Log treatment.
       Workflow for service a4: a41: Run tests to produce medication criteria;
       a42: Search medication databases; repeated until the result is sufficiently specific.]
                      Distributing execution of the workflow(s) across organizations
            • Fragment: a subset of activities sharing a common property
            • Fragments assigned to swim-lanes (partners)
            • Property: access level to sensitive data
                        Medical examiners cannot see insurance coverage
                        Medication providers cannot see medical tests
                        Registry can see only the patient ID.
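
Given the attributes inferred for each activity, assigning activities to swim-lanes could look like the following Python sketch. Several things here are assumptions made for the example and are not stated in the slides: that the patient ID corresponds to the attributes Name and PIN, that the Health Organization may see every attribute, and that each activity goes to the first (most restricted) lane whose clearance covers everything the activity can see. The activity attribute sets are likewise assumed to come from the resulting context of the first example.

```python
ALL_ATTRIBUTES = {"Name", "PIN", "Symptoms", "Tests", "Coverage"}

# Clearances in order of preference, most restricted lane first (assumed policy).
clearances = [
    ("Registry", {"Name", "PIN"}),
    ("Medical Examiners", ALL_ATTRIBUTES - {"Coverage"}),
    ("Medication Provider", ALL_ATTRIBUTES - {"Tests"}),
    ("Health Organization", ALL_ATTRIBUTES),
]


def assign_swimlanes(activity_attributes, clearances):
    """Map each activity to the first lane allowed to see all of its attributes."""
    assignment = {}
    for activity, attributes in activity_attributes.items():
        assignment[activity] = None            # no admissible lane found yet
        for lane, allowed in clearances:
            if attributes <= allowed:
                assignment[activity] = lane
                break
    return assignment


activity_attributes = {
    "a1": {"Name", "PIN", "Symptoms", "Tests"},
    "a2": {"Name", "PIN", "Symptoms", "Coverage"},
    "a3": ALL_ATTRIBUTES,
    "a5": {"Name", "PIN"},
}
fragments = assign_swimlanes(activity_attributes, clearances)
for activity in sorted(fragments):
    print(activity, "->", fragments[activity])
```

Because the analysis is conservative, an activity may be placed in a more privileged lane than strictly necessary, but never in a lane that lacks the required clearance.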
Another Input Context Example

1        INITIAL SUBSTITUTION

    init2(X,D,E):-
      X=f1(Name, Address, SSN),
      D=f2(SSN, Tests, Coverage),
      E=f3(SSN, Coverage).

2        SHARING RESULTS

    u1     [[X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
    u2      [X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
    u3      [X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
    u4      [D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P],
    u5      [D,A1,Y,A3,A4,A41,C,A42,P]]

3        RESULTING LATTICE

    [Figure: concept lattice over the hidden variables u1, ..., u5 with the object nodes
    {x, a5}, {e}, {d}, {z, a2} and {y, p, c, a1, a3, a4, a41, a42}.]

4        RESULTING CONTEXT

    [Table: columns Name, Address, SSN, Tests, Coverage; rows {x, a5}, d, e, {z, a2},
    {y, p, c, other a·}.]

5        RESULTING FRAGMENTATION SCHEME

       Swimlane                  Activities
       Health Organization       a1, a3, a4, a41, a42
       Medical Examiners         (empty)
       Medication Provider       a2
       Registry / Archive        a5
5   Conclusions
Conclusions
   Representing inputs and intermediate data and activities using FCA
   contexts and concept lattices allows lattice-based formulation,
   interpretation, and reasoning on data attributes.
   Sharing analysis of logic programs is a powerful technique for
   inferring (abstract) sharing, including sharing of data attributes.
     • Supports complex data and control structures
     • Applicable both at design and run-time
   Applications include fragmentation (as illustrated), but also:
   Data compliance checking – to verify that sufficient information is
                 exchanged between component services
   Robust top-down development – by refining component /
                 sub-workflow specifications.
   Future work: developing translators from concrete executable
   languages (BPEL, XPDL, YAWL, etc.) into Horn clause programs to
   facilitate automatic analysis. Also: analyzing stateful conversations
   between compositions.
References


The content of this presentation is based on the following publications:

    Dragan Ivanovic, Manuel Carro, and Manuel Hermenegildo.
    Automated Attribute Inference in Complex Service Workflows Based on Sharing
    Analysis.
    Proceedings of the 8th International Conference on Service Computing - IEEE SCC
    2011, IEEE Press, 2011.
    Dragan Ivanovic, Manuel Carro, and Manuel Hermenegildo.
    Automatic Fragment Identification in Workflows Based on Sharing Analysis.
    In Mathias Weske, Jian Yang, Paul Maglio, and Marcelo Fantinato, editors,
    Service-Oriented Computing – ICSOC 2010, number 6470 in LNCS. Springer Verlag,
    2010.
References
Some pointers on Web service analysis and fragmentation:
    Daniel Martin, Daniel Wutke, and Frank Leymann.
    A Novel Approach to Decentralized Workflow Enactment.
    In EDOC ’08: Proceedings of the 2008 12th International IEEE Enterprise Distributed
    Object Computing Conference, pages 127–136, Washington, DC, USA, 2008. IEEE
    Computer Society.

    Ustun Yildiz and Claude Godart.
    Information Flow Control with Decentralized Service Compositions.
    In Proceedings of ICWS 2007, pages 9–17, 2007.

    Oliver Kopp, Rania Khalaf, and Frank Leymann.
    Deriving Explicit Data Links in WS-BPEL Processes.
    In International Conference on Services Computing (SCC), 2008.

    Rania Khalaf.
    Note on Syntactic Details of Split BPEL-D Business Processes.
    Technical Report 2007/2, IAAS, U. Stuttgart, July 2007.
References

Some pointers to Formal Concept Analysis (FCA):

   Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors.
   Formal Concept Analysis, Foundations and Applications.
   Volume 3626 of Lecture Notes in Computer Science. Springer, 2005.

   Claudio Carpineto and Giovanni Romano.
   Concept Data Analysis: Theory and Applications.
   Wiley, 2004.

   B. A. Davey and H. A. Priestley.
   Introduction to Lattices and Order.
   Cambridge University Press, 2nd edition, 2002.

   Sergei O. Kuznetsov and Sergei A. Obiedkov.
   Comparing performance of algorithms for generating concept lattices.
   J. Exp. Theor. Artif. Intell., 14(2-3):189–216, 2002.
References

Some pointers on program analysis in general and logic programming:

    P. Cousot and R. Cousot.
    Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by
    Construction or Approximation of Fixpoints.
    In ACM Symposium on Principles of Programming Languages (POPL’77). ACM
    Press, 1977.
    F. Nielson, H. R. Nielson, and C. Hankin.
    Principles of Program Analysis.
    Springer, 2005. Second Ed.

    M. V. Hermenegildo, F. Bueno, M. Carro, P. López, E. Mera, J.F. Morales, and G.
    Puebla.
    An Overview of Ciao and its Design Philosophy.
    Theory and Practice of Logic Programming, 2011. http://arxiv.org/abs/1102.5497.
Acknowledgements




   The research leading to these results has received funding
   from the European Community’s Seventh Framework
   Programme [FP7/2007-2013] under grant agreement 215483
   (S-Cube).

S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

  • 1. S-Cube Learning Package Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis ´ Universidad Politecnica de Madrid (UPM)
  • 2. Learning Package Categorization S-Cube WP-JRA-2.2: Adaptable Coordinated Service Compositions Models and Mechanisms For Coordinated Service Compositions Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis
  • 3. Table of Contents 1 Introduction and Background 2 Motivation and Problem Statement 3 Overview of the Approach Contexts and Concept Lattices Horn Clause Programs Workflows in Horn Clause Form Sharing Analysis Obtaining and Interpreting Results 4 Application to Fragment Identification 5 Conclusions These slides have been prepared for offline viewing. Throughout the presentation, running commentaries, notes and additional remarks will be displayed on the margins using the condensed font, like here. Please refer to the publication list at the end for more details.
  • 4. 1 Introduction and Background
  • 5. SOA and Web Services Service Oriented Architecture (SOA): We mention just some of the • Flexible set of computing system design and key features of SOA and Web implementation principles Services, that correspond to • Emphasis on loose coupling between services a “high-level” view of the area, i.e. without taking into and with OS account the details of • Distribution over Internet/intranet implementation technologies • Actors: service providers and consumers and infrastructure. • Intrinsic dynamism and adaptability There are many standards • Functionality often in the form of Web services and technologies in the service world. Also, different Web services: service provision platforms • Interoperability, platform independence offer varying degrees of functionality to users and • Data exchange standards: XML designers. • Several technological flavors: For a more detailed WSDL/SOAP-based introduction to the SOA RESTful design philosophy, platforms, • Typical implementation platforms: tools, and techniques, please refer to the list of publications. Java / .NET application servers, BPEL, Web server scripts (e.g. PHP)
  • 6. Service Compositions Service compositions are typically designed to reflect some underlying Service compositions aggregate individual technological or business services to achieve a more complex or process. cross-organizational task: Compositions thus allow creation of new “higher level” • Combining loosely coupled components functionality from existing • Compositions expose themselves as services services as building blocks. A • Often described using workflows (control/data) service composition often involves services from • Complex control and latent parallelism different subsystems within • Potentially long-running an organization, as well as • Centralized control ⇒ orchestration external services. • Subject to migration, adaptation, fragmentation, Orchestrations have a etc. centralized control and data • Described using abstract & executable flow (a workflow of activities). They are usually described formalisms & languages using a general-purpose or specialized language or formal notation.
  • 7. Data in Service Compositions Analyzing data together with control allows us to answer Data in service compositions represents inputs, questions about composition intermediate results, internal messages, and behavior depending on the input data and other received final results: messages. • Workflow activities operate on data (access, E.g., we can ask whether combine, transform, etc.) some conditional branch in • Therefore data dependencies as important as the workflow will be taken for a given kind of data, or what control values would some • Data is atomic or structured using rich intermediate data fields have information formats (XML trees) for the given input. • Uses query languages (e.g. XPath) to search That kind of problems has and access fields and nested elements been long studied in program analysis, and getting exact • Behavior of control structures typically depends answers is generally hard on data. and undecidable in presence of loops.
  • 8. 2 Motivation and Problem Statement
  • 9. Example Medical Workflow
    [Fig. 1. An example drug prescription workflow. Data items: x: Patient ID; y: Medical history; z: Medication record. Activities: a1: Retrieve medical history; a2: Retrieve medication record; a3: Continue last prescription (branch “stable”); a4: Select new medication (branch “¬stable”); a5: Log treatment.]
    Written using BPMN (Business Process Modeling Notation).
      • A high-level (non-executable) description.
    Notes: This workflow shows a simplified drug prescription process in a health organization. At the entry, the patient identifies him/herself (item x, Patient ID). The patient’s medical history (y) and medication record (z) are then retrieved in parallel (activities a1 and a2).
  • 10. Example Medical Workflow (contd.)
    [Fig. 1, repeated: the example drug prescription workflow.]
    Aiming at fragmentation that respects data privacy.
    Notes: Depending on whether the patient’s condition is stable or not, either the earlier prescription is continued (activity a3), or a new medication is selected (activity a4). Finally, the treatment of the patient is logged (activity a5). In this example, we consider data privacy attributes. Data may contain confidential information on the patient’s medical and medication history, including insurance coverage. A fragment should contain activities that access data of only a certain privacy level. The fragments can then be distributed based on what privacy clearance they require.
  • 11. Example Sub-Workflow
    [Fig. 2. Selection of new medication. Inputs: y: Medical history; z: Medication record. Activities: a41: Run tests to produce medication criteria (writes c: Criterion); a42: Search medication databases (writes p: Prescription candidate). Decision: “Result sufficiently specific?”, with “no” looping back and “yes” exiting.]
    Sub-workflow for medication selection (component service a4)
      • involves looping based on intermediate data.
    Notes: To make things more interesting, we “unpack” one of the component services, a4, from the main workflow and represent it as a sub-workflow with its own inputs (items y and z), outputs (item p), and intermediate data (item c). As soon as there is a loop involved (taking the “no” branch), the analysis becomes more complex. An exact analysis of the orchestration state after the loop would require discovering the loop invariant, which is a generally difficult problem. As we will show in the next section, we find our way around this obstacle by employing abstract interpretation techniques that give us a conservative approximation of the loop behavior.
  • 12. Data Attributes
    User-defined attributes can be used to characterize data in a given analysis domain:
      • Application dependent view
      • Simplified data model: sets of properties instead of complex structures
      • User (designer) chooses relevant attributes, describing e.g.:
          information content
          privacy/confidentiality levels ⇐ our example
          ownership
          other aspects of quality
      • Possibly: a combination of views
      • Known or assumed for input data, implicit in control/data dependencies in the workflow
    Question: How to infer attributes (i.e. properties) of intermediate and resulting data items?
      • Based on control flow and data dependencies
    Notes: Reasoning about the whereabouts of data in the execution of a service composition is simpler if we track only data attributes and not the entire complex data structures. This fits very well with an approach to program analysis known as abstract interpretation, where infinite data domains are abstracted into finite ones. E.g., knowing privacy levels of input data, we can try to infer the privacy levels of intermediate data and the individual activities in the workflow. Of course, we have to know how data tests and operations depend on and affect data attributes on the abstract plane.
  • 13. Knowing Data Attributes
    Knowledge of data attributes at design time:
      • Supporting fragmentation
          Fragment: a part of orchestration that can be distributed for remote execution
          What parts can be identified and enacted in a distributed fashion?
      • Checking data compliance
          Content of messages exchanged with/between component services in an orchestration
          Is “sufficient” data passed to components?
      • Robust top-down development
          Modular structure of service orchestrations
          Refining specifications of workflow (sub-)components
    Also useful at runtime:
      • Updating predictions with actual data
      • On-demand analysis
    Notes: Analysis of data attributes for components of a service orchestration at design time is an instance of static analysis, where properties are inferred from the specification, and not by running the orchestration. The static analysis approach can be combined with monitoring and adaptation mechanisms, and the analysis can be performed on a live executing instance of the orchestration. That can give more accurate results, because by looking at the live instance we can learn the actual values of data up to that point in execution, and update the analysis accordingly.
  • 14. Problem Statement
    PROBLEM: To infer user-defined attributes for data items and activities on different levels in an orchestration, automatically from:
      • known attributes of input data: defined or assumed by the designer
      • control structure: including complex control structures, such as parallel flows, conditional branches and loops
      • data operations: reading or writing data, including tests, assignments and service invocations
    Notes: The aim is to provide automated, mechanical inference of data attributes, ideally using a tool that can be invoked at design time. The concrete tool implementation depends on the language in which the orchestration is written (e.g., BPEL, BPMN, etc.). In this learning package, we present the general approach and the ideas behind each step in the analysis process. These steps can be adapted to a particular orchestration language and turned into a fully automated tool chain.
  • 15. 3 Overview of the Approach
  • 16. Overview
    [Fig. 3. Overview of the approach.
     User perspective: Input data context (table of inputs i1, i2, i3, ... against attributes α1, α2, α3, ...) → Workflow definition → Resulting context (table of items o1, o2, o3, ... against attributes α1, α2, α3, ...).
     Underlying techniques and artifacts: Input concept lattice → Input substitution → Sharing analysis → Abstract substitution → Resulting concept lattice, with the workflow definition translated into a Horn clause program that is fed to the sharing analysis.
     Horn clause program (sample): w(X1,X2,A1,Y1,A2,Y2,A3,Z1,A4,Z2):- A1=f1(X1), Y1=f1Y1(X1), A2=f2(X2), Y2=f2Y2(X2), A3=f3(Y1,Y2), ...
     Input substitution (sample): ... X1=f(U1,U2), X2=f(U1), X3=f, ...
     Sharing analysis: abstract interpretation; Sharing+freeness domain; CiaoDE / CiaoPP suite.
     Abstract substitution (sample): [[X1,A1,Y1,A3,Z1], [A3,Z1,A4,Z2], [X2,A4,Z2], [X2,A2,Y2,A3,Z1,A4,Z2]]]
    Notes: Above the line are artifacts that the user works with directly. The input data context describes user-defined attributes of the inputs to the orchestration, and accompanies the workflow definition, like the one in our example. The results are returned in the resulting context, which gives back the attributes for the intermediate data items, results, and activities. The intermediate steps below the line are explained in the slides that follow.
  • 17. Overview (contd.)
    The approach to automated attribute inference takes as input:
      • an input data context that identifies the input data items to the workflow and their attributes
      • a workflow definition in some appropriate formalism (e.g. BPMN in our example)
    and produces as output:
      • a resulting context that presents inferred attributes of all intermediate data items and activities in the given workflow.
    The key steps in the process include:
      • Conceptualizing the input data context in the form of a concept lattice, and preparing the input substitution for the analysis.
      • Turning the given workflow definition into a Horn Clause program that is fed to the analysis, along with the input substitution.
      • Performing sharing analysis and using its result, the abstract substitution, to construct the resulting concept lattice.
      • Interpreting the resulting concept lattice to produce the resulting context.
  • 18. Outline of the Section
    This Overview section starts with two subsections that introduce some important background notions.
      • Subsection Contexts and Concept Lattices briefly introduces the key notions of Formal Concept Analysis (FCA), such as contexts and concept lattices, which are used in the rest of the text for representing (and reasoning about) inputs and outputs of the proposed analysis approach.
      • Subsection Horn Clause Programs presents the key ideas behind logic (Horn Clause) programs, gives an informal introduction to their form and meaning, and presents the notions of structured terms, substitutions, and unification, which are all referred to later. It also introduces Prolog syntax.
    These two subsections do not describe steps of the approach as such. Rather, they supply the notions needed to understand those steps.
  • 19. Outline of the Section (contd.)
    The remaining subsections describe steps in the process of automated attribute inference:
      • Subsection Workflows in Horn Clause Form starts from a rather generalized way of describing workflows that involve complex data and control dependencies, and describes how such workflows can be turned into a Horn Clause form amenable to sharing analysis.
      • Next, subsection Sharing Analysis first defines the notion of sharing in logic programs, building on the notion of substitution introduced earlier. It then describes the notion of abstract substitution, which is used in the actual analysis as the domain for abstract interpretation. It also describes how an initial substitution for the analysis is set up using attributes from the input concept lattice.
      • Finally, subsection Obtaining and Interpreting Results explains how the result of the sharing analysis, in the form of an abstract substitution, is turned into a resulting concept lattice, and then used to generate the resulting context, which is the end result.
  • 20. 3.1 Contexts and Concept Lattices
  • 21. Contexts: Objects and Attributes
    [Fig. 4. Two examples of contexts.
     (a) Characteristics of medical databases. Objects (rows): Medical history, Medication record. Attributes (columns): Symptoms, Tests, Coverage.
     (b) Types of identity documents. Objects (rows): Passport, National Id Card, Driving License, Social Security Card. Attributes (columns): Name, Address, PIN, SSN.]
    Notion of context in Formal Concept Analysis (FCA):
      • Set A of attributes (columns)
      • Set O of objects (rows)
      • Boolean object-attribute relation ρ ⊆ O × A
    Notes: Formal Concept Analysis is a branch of mathematical lattice theory concerned with knowledge representation and reasoning. A context is simply a table that associates objects with attributes. The examples on the left show two contexts: one that describes the content of medical databases, and another that describes the information contained in different identity documents. Objects (rows) stand for some meaningful entities, and attributes (columns) are chosen by the user to represent relevant notions in the application domain.
  • 22. Concepts
    The idea behind a concept is a close connection between subsets of objects and attributes.
    Objects → Attributes
      • For an arbitrary subset of objects B ⊆ O, let B′ = {a ∈ A | ∀o ∈ B, o ρ a}
        “all and only those attributes that belong to all objects from B”
    Attributes → Objects
      • For an arbitrary subset of attributes D ⊆ A, let D′ = {o ∈ O | ∀a ∈ D, o ρ a}
        “all and only those objects that have all attributes from D”
    Iff B′ = D and D′ = B, then (B, D) is a concept
      • B″ = (B′)′ = D′ = B, D″ = (D′)′ = B′ = D
    Notes: From the definition, in a concept (B, D) we need to know only B or D to find the other using (·)′. That means we can choose to work with objects or attributes, whichever is more convenient. E.g., we can start from a single attribute a and calculate {a}′ to find the most general concept that has a. Or, we can start with an object o and calculate {o}′ to find the most specific concept containing o. Because B″ = B and D″ = D, we say that concepts are closed under (·)″, i.e., (·)″ is a closure.
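The two derivation operators (objects → attributes and attributes → objects) can be sketched in a few lines of Python. This is only an illustration: the cross-table below is a hypothetical filling of the medical databases context (the slide's actual cross marks are not reproduced here), and the names `context`, `attrs_of`, `objs_of` and `is_concept` are invented for this sketch.

```python
# Hypothetical context: object -> set of attributes it has (assumed for illustration).
context = {
    "Medical history": {"Symptoms", "Tests"},
    "Medication record": {"Symptoms", "Coverage"},
}
all_attributes = set().union(*context.values())


def attrs_of(objs):
    """Objects -> attributes: the attributes shared by every object in objs."""
    result = set(all_attributes)
    for obj in objs:
        result &= context[obj]
    return result


def objs_of(attrs):
    """Attributes -> objects: the objects that have every attribute in attrs."""
    return {obj for obj, has in context.items() if attrs <= has}


def is_concept(objs, attrs):
    """(objs, attrs) is a concept iff each side is the derivation of the other."""
    return attrs_of(objs) == attrs and objs_of(attrs) == objs
```

For instance, `objs_of({"Symptoms"})` yields both objects, so under this assumed context the pair of all objects and `{"Symptoms"}` is the most general concept that has the attribute Symptoms.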
  • 23. Concept Lattices
    Concept lattice with the ordering
      • (B1, D1) ≤ (B2, D2) ⇔ B1 ⊆ B2 ⇔ D2 ⊆ D1
      • Lesser concept adds attributes (D2 ⊆ D1, i.e. (B1, D1) is more specific)
      • Greater concept includes the objects of lesser concepts (B1 ⊆ B2, i.e. (B2, D2) is more general)
      • Top (⊤): the most general concept (all objects), and Bottom (⊥): the most specific concept (all attributes)
    It is a complete lattice.
    [Figure: (a) Concept lattice for medical databases, with node labels Symptoms; Tests; Coverage; Medical history; Medication record. (b) Concept lattice for identity documents, with node labels Name; PIN; Address; SSN; Passport; Driving License; Soc. Sec. Card; National ID.]
    Notes: The concept lattice is often shown using a variant of Hasse diagrams (bottom left). Nodes represent concepts. The top concept is visually at the top, and the bottom concept is visually at the bottom. Callouts show new objects introduced by the concept (inherited by all upwards nodes) above the line, and attributes new to the concept (inherited by all downwards nodes) below the line. The example concept lattices (bottom left) correspond to the sample contexts for medical databases and personal identification documents.
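A small Python sketch can make the lattice ordering concrete: it enumerates all concepts of a context by closing every subset of objects, and compares concepts by inclusion of their object sets. The cross-table is again a hypothetical filling of the medical databases context, and `concepts` and `leq` are names made up for this illustration; the brute-force enumeration is only practical for small contexts.

```python
from itertools import combinations

# Hypothetical context: object -> set of attributes it has (assumed for illustration).
context = {
    "Medical history": {"Symptoms", "Tests"},
    "Medication record": {"Symptoms", "Coverage"},
}
all_attributes = set().union(*context.values())


def attrs_of(objs):
    """Attributes shared by every object in objs."""
    result = set(all_attributes)
    for obj in objs:
        result &= context[obj]
    return result


def objs_of(attrs):
    """Objects that have every attribute in attrs."""
    return {obj for obj, has in context.items() if attrs <= has}


def concepts():
    """All concepts (B, D), found by closing every subset of objects."""
    found = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            d = attrs_of(set(subset))
            b = objs_of(d)
            found.add((frozenset(b), frozenset(d)))
    return found


def leq(c1, c2):
    """(B1, D1) <= (B2, D2) iff B1 is a subset of B2."""
    return c1[0] <= c2[0]
```

Under the assumed context this gives four concepts: the top (all objects, `{"Symptoms"}`), one concept per object, and the bottom (no objects, all attributes).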
  • 24. 3.2 Horn Clause Programs
  • 25. About Logic Programs
    Logic programs represent a computation task as a set of logical rules and facts.
    Logical rules model if-then inferences:
      • B1 ∧ B2 ∧ · · · ∧ Bn → H: if B1, . . . , Bn are all true (n ≥ 0), then we conclude that H is true.
      • Often written as H ← B1 ∧ B2 ∧ · · · ∧ Bn
      • H is the head of the rule
      • B1 ∧ · · · ∧ Bn is the body of the rule
      • H ← (the case n = 0) is a fact (H is always true)
    Example
      ancestor(x, y) ← parent(x, y)                      (a parent is an ancestor)
      ancestor(x, y) ← parent(z, y) ∧ ancestor(x, z)     (a parent’s ancestor is an ancestor)
    Notes: Logic programming is one of the classical programming paradigms, along with imperative, object-oriented and functional programming. The example gives rules for the “x is an ancestor of y” relation, written as ancestor(x, y), using also the parent(x, y) relation. Logic programs are declarative, because they state the rules and the problem to be solved (e.g. finding somebody’s ancestors or descendants), not the sequence of steps to solve it. That makes logic programs relatives of SQL, but far more powerful.
  • 26. Elements of Horn Clause Programs
    Elements of logic programs include:
      • Predicates that describe logical properties or relations, such as ancestor/2 and parent/2 (where /n means “with n arguments”)
      • Atoms that apply predicates, such as “x is an ancestor of y”, written as ancestor(x, y)
      • Variables x, y, z that stand for arbitrary objects in a rule (implicitly ∀-quantified)
      • Constants that name distinct objects (such as Alice, Bob, Carol and Dennis below)
    In the rules of a Horn Clause program, H and each of B1, . . . , Bn are atoms.
    Continued Example: Parent Fact Database
      parent(Alice, Bob) ←
      parent(Dennis, Bob) ←
      parent(Carol, Dennis) ←
    Notes: The elements of logic programs (predicates, terms, variables, constants, etc.) correspond to the notions in First Order Logic (FOL). As in FOL, we assume that predicate and constant names refer to distinct entities, unlike differently named variables, which can refer to the same object. Also, p/1 and p/2 are two different predicates (with one and two arguments, respectively), even though they share the same name p. The simplified structure of Horn Clause programs allows efficient reasoning, i.e. derivation of logical consequences from known facts and rules.
  • 27. Executing Horn-Clause Programs
    Executing a logic program means searching for a proof of a logical statement known as the query, finding variable values along the way.
    Sample Query 1: Find Bob’s ancestors
      Query: ancestor(x, Bob)
      Answers: x = Alice, x = Carol, x = Dennis
    Sample Query 2: Find Carol’s descendants
      Query: ancestor(Carol, y)
      Answers: y = Dennis, y = Bob
    Sample Query 3: Find Alice’s ancestors
      Query: ancestor(x, Alice)
      Answer: no solution (cannot prove for any x)
    Notes: The sample queries compute different things (or fail, in the last case) depending on the query. In C or Java we would have to program separate procedures for “find person’s ancestors”, “find person’s descendants”, etc. In case of success, the variables in the query may point to objects for which the query can be proven from the program. The “magic” is done by the under-the-hood inference engine that takes the program and the query and performs a systematic search for a proof. The result may be a failure, a single solution, or multiple solutions (possibly infinitely many).
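To see where the sample answers come from, the ancestor rules can be evaluated outside a logic programming system. The Python sketch below derives all ancestor facts bottom-up until nothing new appears; note that this is a different strategy from the top-down proof search a Prolog engine performs, and it is used here only because it is short. The lowercase names and the function `ancestor_facts` are choices made for this illustration.

```python
# parent(X, Y) facts from the running example, read as "X is a parent of Y".
parent = {("alice", "bob"), ("dennis", "bob"), ("carol", "dennis")}


def ancestor_facts(parent):
    """Derive every ancestor(X, Y) fact by applying the two rules to a fixpoint."""
    # Rule 1: ancestor(X, Y) <- parent(X, Y)
    ancestor = set(parent)
    while True:
        # Rule 2: ancestor(X, Y) <- parent(Z, Y), ancestor(X, Z)
        derived = {(x, y) for (z, y) in parent for (x, z2) in ancestor if z2 == z}
        if derived <= ancestor:
            return ancestor
        ancestor |= derived


ancestor = ancestor_facts(parent)
bobs_ancestors = {x for (x, y) in ancestor if y == "bob"}
carols_descendants = {y for (x, y) in ancestor if x == "carol"}
alices_ancestors = {x for (x, y) in ancestor if y == "alice"}
```

The three sets correspond to the three sample queries; the same derived relation answers all of them, which mirrors the point that one logic program serves several kinds of query.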
  • 28. Handling Structures
    Structured terms have the shape f(t1, t2, . . . , tm), m ≥ 0, where f is a functor, and each ti is again a term.
    Example: Peano Arithmetic
      Program: number(0) ←
               number(s(x)) ← number(x)
               succ(x, s(x)) ←
      Query 1: number(x)
      Answers: x = 0, x = s(0), x = s(s(0)), x = s(s(s(0))), . . .
      Query 2: succ(x, s(0))
      Answer: x = 0
    Lists are common structures, with constant [ ] representing the empty list, and functor “.” (dot) used to put together the head and the tail:
      • [4] is the same as .(4, [ ])
      • [1, 2, 3, 4] is the same as .(1, .(2, .(3, .(4, [ ]))))
    Notes: Note that f(t1, . . . , tm) is NOT a function call in the sense of C, Python or Haskell. It can be thought of as a data record with name f and m fields. For m = 0, f is simply a constant. Structured terms can be (and often are) nested, as in the examples of Peano arithmetic and lists. Lists are very frequently used data structures, and are a common tool in logic programming. However, structured terms can also be used to represent nodes in a tree or a graph, records that store information, or other kinds of data containers we need.
  • 29. Unification and Substitutions
    An atom of the form t1 = t2 expresses syntactic equality.
      • it succeeds if t1 and t2 are identical, or can be made identical by substituting terms for some variables in t1 and t2
      • we are interested in the substitution which introduces the least amount of information: the most general unifier (MGU)
      • a substitution maps (binds) variables to terms
    Unification Examples
      Unification              MGU
      1 = 0                    none (failure)
      s(0) = s(x)              θ = {x → 0}
      f(0) = s(x)              none (failure)
      s(x) = s(y)              θ = {x → y}
      f(s(x), x) = f(z, 1)     θ = {z → s(1), x → 1}
      f(s(x), y) = f(1, z)     none (failure)
    Running a query means finding a substitution that makes the query true, by combining the MGUs from each of B1, . . . , Bn in a rule body.
    Notes: Note that t1 = t2 is just a nicer way of writing =(t1, t2). Unification is implicit in parameter passing in clause heads. For instance, we can rewrite the rule “succ(x, s(x)) ←” as “succ(x, y) ← y = s(x)”. Equally, the rule “number(s(x)) ← number(x)” can be rewritten as “number(y) ← y = s(x) ∧ number(x)”.
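The unification examples can be reproduced with a compact Python sketch of syntactic unification. The term encoding is an assumption of this illustration, not something the slides prescribe: variables are `Var` objects, compound terms are tuples `(functor, arg1, ...)`, and anything else is a constant. The helper names (`walk`, `occurs`, `unify`, `resolve`) are likewise invented here, and the occurs check is included for safety even though Prolog systems usually omit it by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst):
    """True if var occurs inside term under the current bindings."""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term[1:])
    return False


def unify(t1, t2, subst=None):
    """Return a most general unifier extending subst, or None on failure."""
    subst = {} if subst is None else subst
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, Var):
        return None if occurs(t1, t2, subst) else {**subst, t1: t2}
    if isinstance(t2, Var):
        return None if occurs(t2, t1, subst) else {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        if t1[0] != t2[0] or len(t1) != len(t2):
            return None  # different functor or arity
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # two different constants, or constant vs. compound term


def resolve(term, subst):
    """Apply the bindings in subst throughout term."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return (term[0],) + tuple(resolve(arg, subst) for arg in term[1:])
    return term


X, Y, Z = Var("x"), Var("y"), Var("z")
theta = unify(("f", ("s", X), X), ("f", Z, 1))
```

Here `theta` binds z to s(x) and x to 1 in "triangular" form, so `resolve(Z, theta)` gives the fully substituted term s(1), matching the MGU in the table.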
  • 30. Prolog and Friends
    Prolog is a programming language based on Horn Clause rules.
    Concrete language syntax:
      • clauses end with a full stop (“.”)
      • uses “:-” instead of “←”, comma instead of “∧”
      • variables start with uppercase letters or “_”
      • predicate names, functors and constants start with lowercase letters
    Powerful analysis tools and techniques based on “clean” program semantics.
    Examples in Prolog
      Ancestors:
        ancestor(X,Y):- parent(X,Y).
        ancestor(X,Y):- parent(Z,Y), ancestor(X,Z).
      Peano arith.:
        number(0).
        number(s(X)):- number(X).
        add(0, X, X):- number(X).
        add(s(X), Y, s(Z)):- add(X,Y,Z).
    Notes: The full Prolog language includes “impure” features, such as dynamic fact updates and I/O. Modern Prolog systems contain extensions such as constraint logic programming (CLP) and tabling. However, “pure” Prolog programs have a close relationship with logical theories. Reasoning about them in a sound fashion is easier than in other executable formalisms. We will use Prolog to encode objects and attributes in an executable form which will capture the structure of a workflow, and Prolog analysis tools to automatically derive attributes.
  • 31. 3.3 Workflows in Horn Clause Form
  • 32. Anatomy of a Workflow
    In general, workflows may contain complex data and control dependencies:
      • sequences, conditional branches, and loops
      • parallel flows, with pre- and post-conditions
      • data items are read and written by activities
      • inputs: possibly complex XML information sets
    [Figure: schematic workflow over data items x, y and z, with question marks (x? y? x, y?) on some of the data flows.]
    Understanding how data is handled throughout the workflow is non-trivial.
      • what information items / parts are used? where?
    Notes: There are many concrete workflow definition languages that can be used to specify control structures and data operations. Here, we use an abstract workflow representation where both control and data dependencies are shown explicitly. Analyzing the content of data items at all points in a workflow is an instance of a general program analysis problem. To solve it, especially in the presence of loops and complex data structures, approximation techniques such as abstract interpretation are usually needed.
  • 33. Example of (Enriched) Workflow
    To make the analysis of workflow control and data dependencies easier, let us first “distill” our BPMN workflow example into a simplified abstract form below (elements to be clarified in the slides that follow).
      • We keep only the activity tags (a1, . . . , a5), control dependencies between them, and labels for data items read/written by the activities.
      • We abstract the looping in the sub-workflow as a structured activity of repeat-until type with a separate body sub-workflow.
    [Figure: abstract workflow. Activities with read/write annotations: a1 reads x, d and writes y; a2 reads x, e and writes z; a3 reads y, z and writes nothing; a4 reads y, z and writes nothing; a5 reads x and writes nothing. a1 and a2 start from an AND-split; a3 and a4 follow an AND-join; a5 follows an OR-join.
     C = {pre–a4 ≡ done–a1 ∧ ¬succ–a1 ∧ done–a2, pre–a3 ≡ done–a1 ∧ succ–a1 ∧ done–a2, pre–a5 ≡ done–a3 ∨ done–a4}
     a4: repeat-until loop, exit depends on p. Loop body: a41 reads y, z and writes c; a42 reads c and writes p; C = {pre–a42 ≡ done–a41}.]
  • 34. Example of (Enriched) Workflow (cont.)
    [Figure: the abstract workflow from the previous slide, repeated.]
    Workflow control structure:
      • includes activities a1, . . . , a5
      • arrows show control dependencies (e.g. a4 depends on a1 and a2)
      • independent activities may run in parallel (e.g. a1 and a2)
      • different join types (AND/OR)
    Data dependencies based on read/write annotations:
      • annotation Ri / Wi for each activity ai
      • Ri is the set of data items read, Wi is the set of data items written
  • 35. Example of (Enriched) Workflow (cont.)
    [Figure: the abstract workflow from slide 33, repeated.]
    Set C of logical control preconditions:
      • activity preconditions pre–ai expressed using propositional formulas
      • done–aj means “aj has finished”
      • succ–aj means “aj has achieved its (user-defined) goal”
      • easily models sequences and AND/OR/XOR parallel flows
    Helps detect possible deadlocks and race conditions:
      • deadlocks appear in case of circular dependencies (pre–ai → done–ai)
      • race conditions appear when two activities that read/write the same data item can execute in parallel
  • 36. Example of (Enriched) Workflow (cont.)
    [Figure: the abstract workflow from slide 33, repeated.]
    Based on control preconditions, we can find legal orderings of activities that respect the preconditions:
      • only if there are no deadlocks/race conditions (which can be efficiently checked using e.g. SAT solvers)
    All legal orderings are equivalent from the point of view of data handling.
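As a rough illustration of finding a legal ordering, the Python sketch below reduces each precondition to the set of activities it mentions (through done– or succ–), orders activities topologically, and flags pairs that are not ordered by any dependency yet touch a common data item with at least one write. This is a deliberately conservative simplification and not the SAT-based check mentioned above: it ignores the propositional structure of the preconditions, so mutually exclusive branches such as a3 and a4 are treated as if they could overlap. The dictionaries and function names are assumptions of this sketch, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations

# Activities mentioned in each precondition (assumed reading of the set C).
depends_on = {
    "a1": set(),
    "a2": set(),
    "a3": {"a1", "a2"},
    "a4": {"a1", "a2"},
    "a5": {"a3", "a4"},
}
# Read/write annotations of the top-level activities.
reads = {"a1": {"x", "d"}, "a2": {"x", "e"}, "a3": {"y", "z"},
         "a4": {"y", "z"}, "a5": {"x"}}
writes = {"a1": {"y"}, "a2": {"z"}, "a3": set(), "a4": set(), "a5": set()}


def legal_ordering(deps):
    """One ordering that respects all dependencies, or None if they are circular."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


def predecessors(activity, deps):
    """All activities that must finish before the given one (transitively)."""
    seen, stack = set(), list(deps[activity])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


def possible_races(deps, reads, writes):
    """Unordered activity pairs where one writes an item the other reads or writes."""
    races = []
    for p, q in combinations(sorted(deps), 2):
        if p in predecessors(q, deps) or q in predecessors(p, deps):
            continue
        if writes[p] & (reads[q] | writes[q]) or writes[q] & (reads[p] | writes[p]):
            races.append((p, q))
    return races
```

With the annotations above, `legal_ordering(depends_on)` places a1 and a2 before a3 and a4, and a5 last, while `possible_races(depends_on, reads, writes)` is empty because a1 and a2 only share the item x, which both merely read.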
  • 37. Example of (Enriched) Workflow (cont.)
    [Figure: the abstract workflow from slide 33, repeated.]
    Sub-workflows can be used to model complex constructs:
      • in our case, activity a4 is a repeat-until loop
      • the body of the loop is a sub-workflow (with a41 and a42)
    Sub-workflows also allow modular development and/or assembly of workflows.
  • 38. Workflow as a Horn Clause Program
    We represent the workflow symbolically in a Horn Clause form for further analysis.
      • the representation is not operationally equivalent to the workflow
    The predicate w stands for the workflow:
      • clause body reflects a legal ordering of activities
      • variables stand for data items and activities
    Sub-workflows and complex activities are in separate predicates.

      w(X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5):-
          A1=f1(X,D),                          % a_1
          Y=f1_Y(X,D),
          A2=f2(X,E),                          % a_2
          Z=f2_Z(X,E),
          A3=f3(Y,Z),                          % a_3
          a_4(Y,Z,A4,A41,C,A42,P),             % a_4
          A5=f5(X).                            % a_5
      a_4(Y,Z,A4,A41,C,A42,P):-
          w2(Y,Z,A41,C2,A42,P2),
          A4=f4(P2),
          a_4x(Y,Z,C2,P2,C,P,A4,A41,A42).
      a_4x(_,_,C,P,C,P,_,_,_).
      a_4x(X,Z,_,_,C,P,A4,A41,A42):-
          a_4(X,Z,A4,A41,C,A42,P).
      w2(Y,Z,A41,C,A42,P):-
          A41=f41(Y,Z),                        % a_41
          C=f41_C(Y),
          A42=f42(C),                          % a_42
          P=f42_P(C).

    Notes: In Prolog syntax, an underscore (“_”) represents a new, fresh variable that stands for an arbitrary term. Predicate w2 represents the body of the loop, and predicates a_4 and a_4x model the repeat-until construct.
  • 39. Workflow as a Horn Clause Program (cont.)
    [Horn clause program for w, a_4, a_4x and w2, repeated from the previous slide.]
    For each activity, we model data dependencies with unifications:
      • Ai = fi(Ri) stands for “activity ai reads data items from set Ri”
      • for each item z ∈ Wi written by ai, z = fiz(Q) stands for “z is written using data items from Q ⊆ Ri”
    Such a Horn Clause representation can be derived mechanically and, in principle, automatically.
    Notes: The choice of functors (fi and fiz) is purely symbolic and is not significant for the subsequent sharing analysis. The purpose of the unifications in the Horn Clause representation is to express functional dependencies between activities and data items in a workflow, and not to actually calculate them.
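To illustrate how mechanical this derivation is, the following Python sketch generates the unification goals for one activity from its read/write annotation. The function `goals_for` and its input format (a list of read items and a mapping from each written item to the items it is computed from) are assumptions made for this illustration and not part of the original tool chain.

```python
def goals_for(activity, reads, writes):
    """Build the Prolog goals for one activity as text.

    activity: tag such as "a1" or "a41"
    reads:    list of data item names read by the activity
    writes:   dict mapping each written item to the read items it depends on
    """
    index = activity.lstrip("a")
    read_args = ",".join(item.upper() for item in reads)
    goals = [f"A{index}=f{index}({read_args})"]
    for item, sources in writes.items():
        source_args = ",".join(src.upper() for src in sources)
        goals.append(f"{item.upper()}=f{index}_{item.upper()}({source_args})")
    return goals


a1_goals = goals_for("a1", ["x", "d"], {"y": ["x", "d"]})
a41_goals = goals_for("a41", ["y", "z"], {"c": ["y"]})
a3_goals = goals_for("a3", ["y", "z"], {})
```

The resulting strings have the same shape as the goals in the clause bodies for w and w2, for example `A1=f1(X,D)` followed by `Y=f1_Y(X,D)` for activity a1.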
  • 40. 3.4 Sharing Analysis
  • 41. Sharing in Logic Programs
    Sharing analysis of logic programs tries to infer how data is shared between variables:
      • sharing is always relative to a substitution θ upon successful execution of a query
      • two variables x, y are said to share if the terms xθ and yθ (i.e. after applying θ to x and y) contain some common variable.

    Example
        Substitution θ                          Terms                                   Sharing
        {x → s(y)}                              xθ = s(y), yθ = y                       x and y share
        {}                                      xθ = x, yθ = y                          x and y do NOT share
        {x → s(w), y → f(1, z)}                 xθ = s(w), yθ = f(1, z)                 x and y do NOT share
        {x → [1, w, f(z)], y → n(z, s(w))}      xθ = [1, w, f(z)], yθ = n(z, s(w))      x and y share w and z

    Sharing analysis tries to find all possible sharing between variables in case of successful executions. This requires including all possible substitutions on exit from a query. That is generally impractical and often impossible, since there may be many, or even an infinite number of, possible substitutions to take into account. To make sharing analysis viable, we often resort to some sort of approximation that reduces the repertoire of possible sharing cases to a finite, manageable size while remaining safe, i.e. not missing any potential sharing.
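The definition of sharing on this slide can be checked directly on the four example substitutions. The sketch below is only an illustration: the term encoding (variables as strings, constants as integers, compound terms as tuples whose first element is the functor) and the function names are assumptions made here, and the Prolog list `[1, w, f(z)]` is encoded with a made-up functor `list`.

```python
def variables(term):
    """Return the set of variable names occurring in a term."""
    if isinstance(term, str):            # a variable
        return {term}
    if isinstance(term, tuple):          # compound term: (functor, arg1, ..., argN)
        found = set()
        for arg in term[1:]:             # term[0] is the functor name, not a variable
            found |= variables(arg)
        return found
    return set()                         # constants contain no variables


def apply_subst(theta, var):
    """Apply substitution theta (dict: variable -> term) to a variable."""
    return theta.get(var, var)           # unbound variables map to themselves


def share(theta, x, y):
    """Variables common to x·theta and y·theta; an empty set means no sharing."""
    return variables(apply_subst(theta, x)) & variables(apply_subst(theta, y))


examples = [
    {"x": ("s", "y")},
    {},
    {"x": ("s", "w"), "y": ("f", 1, "z")},
    {"x": ("list", 1, "w", ("f", "z")), "y": ("n", "z", ("s", "w"))},
]
for theta in examples:
    common = share(theta, "x", "y")
    print(sorted(common) if common else "x and y do not share")
```

The sketch applies θ in a single step, so it assumes the substitution is idempotent, as all four examples are.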
  • 42. Abstract Substitution Domain
    Instead of looking at a (possibly infinite) number of concrete substitutions, we can perform the analysis on a simplified abstract level. Abstract substitutions approximate terms with sets of contained variables (they are not concerned with the exact shape of terms):

        Concrete:  θ = { x → f(u, g(v)), y → h(5, u), z → i(v, w) }
                   u is shared by {x, y}, v is shared by {x, z}, w occurs only in {z}
        Abstract:  Θ = { {x, y}, {x, z}, {z} }

    By operating in the abstract substitution domain, the analysis task becomes simplified and finite.
    The abstract substitution domain shown is not the only applicable choice. For instance, we can work with pair-wise sharing. Different sharing domains also differ in precision and in the computational cost of the analysis. The domain used here is known to be more precise (in the sense of avoiding over-approximation), but exponential in time with respect to the number of variables involved. It is also often combined with additional freeness or groundness information.
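The step from the concrete θ to the abstract Θ on this slide can be sketched in a few lines. This is an illustration under assumptions: terms are encoded as nested tuples `(functor, arg1, ...)` with variables as strings, the name `abstract_sharing` is made up here, and every top-level variable is assumed to be bound by θ.

```python
def variables(term):
    """Return the set of variable names occurring in a term."""
    if isinstance(term, str):
        return {term}
    if isinstance(term, tuple):
        found = set()
        for arg in term[1:]:             # skip the functor in position 0
            found |= variables(arg)
        return found
    return set()                         # constants


def abstract_sharing(theta):
    """Abstract a concrete substitution into a set of sharing settings.

    Each variable occurring in the bound terms contributes one setting:
    the set of top-level variables whose terms contain it.
    """
    settings = {}
    for top_level, term in theta.items():
        for inner in variables(term):
            settings.setdefault(inner, set()).add(top_level)
    return {frozenset(group) for group in settings.values()}


theta = {
    "x": ("f", "u", ("g", "v")),
    "y": ("h", 5, "u"),
    "z": ("i", "v", "w"),
}
Theta = abstract_sharing(theta)
print(sorted(sorted(group) for group in Theta))
```

Because the result is a set of sets, two inner variables with the same occurrence pattern collapse into a single setting, which is what makes the abstract domain finite.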
  • 43. Workflow Input Substitutions
    We include the information on user-defined attributes of the workflow's input data by setting up the initial substitution for the inputs (x, d, e in our case):

        init1(X, D, E):-
            X = f1(Name, Pin),
            D = f2(Symptoms, Tests),
            E = f3(Symptoms, Coverage).

    This reflects the positioning of the inputs in the initial context / concept lattice.
    The initial concrete substitution coded here maps to the initial abstract substitution Θ = {{x}, {d, e}, {d}, {e}}:
      • "x has some components not shared with d or e"
      • "d and e share something" (Symptoms), but
      • "both d and e have some private (not shared) components" (Tests and Coverage, respectively)
    Again, note that the choice of functors (f1, f2, f3) and of the variable names that stand for the attributes (Name, Pin, etc.) is not significant for the abstract sharing analysis.
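How an input context turns into the `init1` clause and into the initial abstract substitution can be sketched as below. The helper names `init_clause` and `initial_sharing` and the dictionary encoding of the context are assumptions of this sketch; the attribute names and the functor numbering follow the slide.

```python
def init_clause(name, context):
    """Render the initial substitution as a Prolog clause (text only)."""
    head = f"{name}({', '.join(item.upper() for item in context)})"
    goals = [
        f"    {item.upper()} = f{i}({', '.join(attrs)})"
        for i, (item, attrs) in enumerate(context.items(), start=1)
    ]
    return f"{head} :-\n" + ",\n".join(goals) + "."


def initial_sharing(context):
    """One sharing setting per attribute: the input items that carry it."""
    owners = {}
    for item, attrs in context.items():
        for attr in attrs:
            owners.setdefault(attr, set()).add(item)
    return {frozenset(items) for items in owners.values()}


# Input context of the running example: item -> user-defined attributes.
context = {
    "x": ["Name", "Pin"],
    "d": ["Symptoms", "Tests"],
    "e": ["Symptoms", "Coverage"],
}
clause = init_clause("init1", context)
Theta0 = initial_sharing(context)
print(clause)
print(sorted(sorted(group) for group in Theta0))
```

Name and Pin both belong to x alone, so they collapse into the single setting {x}; this is why Θ has four settings for five attributes.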
  • 44. 3.5 Obtaining and Interpreting Results
  • 45. Sharing Results
    (a) The resulting substitution:
        1  [[X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
        2   [X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
        3   [X,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
        4   [X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
        5   [D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P],
        6   [D,A1,Y,A3,A4,A41,C,A42,P],
        7   [E,A2,Z,A3,A41]]

    (b) Points in the resulting sharing lattice:
        Top-level variables    Recovered hidden variables
        X, A5                  {u1, u2, u3, u4}
        E                      {u1, u3, u5, u7}
        D                      {u1, u2, u5, u6}
        A2, Z                  {u1, u2, u3, u4, u5, u7}
        A1, Y, A42, C, P       {u1, u2, u3, u4, u5, u6}
        A3, A4, A41            {u1, u2, u3, u4, u5, u6, u7}

    Fig. 8. Abstract substitution and the recovered hidden variables.

    The resulting abstract substitution (a) shows sharing for data items and activities. It is as if the data item and activity variables shared a set of hidden variables u1, ..., u7 (b).
    The sharing results shown in (a) were obtained from sharing and freeness (shfr) analysis in the CiaoPP analysis suite. The results are safe in the sense that all possible sharing is included. However, they may contain a degree of over-approximation, i.e. they may conservatively assume sharing where it cannot be ruled out with certainty.
    From an abstract substitution, we do not know which variables are shared, so one possibility is to "recover" a sufficient number of hidden variables that are shared in a manner compatible with the abstract substitution.
  • 46. Minimal Hidden Variable Recovery
    How many hidden variables are needed to comply with the sharing results?
      • As many as there are sharing settings.
      • The hidden variables are counterparts of the user-defined attributes used in the input substitution.
    A straightforward algorithm recovers a minimal set of resulting hidden variables U:

        function RecoverSubstVars(V, Θ)
            n ← |Θ|; U ← {u1, u2, ..., un}        ▷ n = |Θ| fresh variables in U
            S : V → ℘(U); S ← const(∅)            ▷ the initial value for the result
            for x ∈ V, i ∈ {1..n} do              ▷ for each variable and subst. setting
                if x ∈ Θ[i] then                  ▷ if the variable appears in the setting
                    S ← S[x → S(x) ∪ {ui}]        ▷ add ui to its resulting set
                end if
            end for
            return U, S
        end function

    The proof that it is sufficient to "invent" one hidden variable for each sharing setting in the resulting abstract sharing, and that fewer hidden variables would not do, follows from the definition of abstract sharing and the monotonicity of logic programs.
    For any non-empty abstract substitution, there is an infinite number of compatible concrete substitutions even with a fixed set of hidden variables, because the shape of terms may be arbitrary. That suffices in our case, because we just want to know what is shared and not exactly how.
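The RecoverSubstVars pseudocode translates almost line by line into Python. The sketch below is a direct transcription under two assumptions made here: Θ is passed as an ordered list of sets, and hidden variables are represented by the strings `"u1"`, `"u2"`, and so on. The input used is the initial abstract substitution Θ = {{x}, {d, e}, {d}, {e}} from slide 43.

```python
def recover_subst_vars(variables, theta):
    """Recover one hidden variable per sharing setting.

    variables -- the top-level variables V
    theta     -- the abstract substitution as a list of sets (settings)
    Returns (U, S): the hidden variables and, for each top-level variable,
    the set of hidden variables it contains.
    """
    hidden = [f"u{i}" for i in range(1, len(theta) + 1)]   # n = |Θ| fresh names
    result = {x: set() for x in variables}                 # S ← const(∅)
    for x in variables:
        for u, setting in zip(hidden, theta):
            if x in setting:                               # x ∈ Θ[i]
                result[x].add(u)                           # add u_i to S(x)
    return hidden, result


V = ["x", "d", "e"]
Theta = [{"x"}, {"d", "e"}, {"d"}, {"e"}]
U, S = recover_subst_vars(V, Theta)
for x in V:
    print(x, sorted(S[x]))
```

Here u2 plays the role of the shared attribute (Symptoms on slide 43), while u3 and u4 are the private parts of d and e.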
  • 47. Resulting Lattice (Recovered)
    [Figure: Fig. 9. The resulting concept lattice, over the hidden variables u1, ..., u7, with nodes {x, a5}, {e}, {d}, {a2, z}, {a1, y, p, a42, c} and {a3, a4, a41}.]
    We can now construct the resulting concept lattice:
      • activity and data item variables as objects
      • recovered hidden variables as attributes
      • activities are highlighted
    To interpret the resulting lattice in terms of the original user-defined attributes, we observe that sharing analysis preserves the ordering between concepts in the lattice. That means that the "lesser" concepts in the resulting lattice inherit all original attributes from the input data items (shown in boldface). Therefore, we use the resulting lattice over hidden variables as a skeleton to "paint" intermediate data items and activities with the original user-defined attributes.
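The ordering of the lattice nodes can be recomputed from the hidden variable sets in table (b) on slide 45. The sketch below is a simplified illustration, not a full FCA lattice construction: it only groups objects with identical hidden variable sets and links each group to the groups directly below it (those with strictly more hidden variables). The function names are assumptions made here.

```python
# Hidden variable sets from table (b) on slide 45 (u1..u7 written as 1..7).
groups = [
    (("x", "a5"), {1, 2, 3, 4}),
    (("e",), {1, 3, 5, 7}),
    (("d",), {1, 2, 5, 6}),
    (("a2", "z"), {1, 2, 3, 4, 5, 7}),
    (("a1", "y", "a42", "c", "p"), {1, 2, 3, 4, 5, 6}),
    (("a3", "a4", "a41"), {1, 2, 3, 4, 5, 6, 7}),
]
hidden = {name: frozenset(us) for names, us in groups for name in names}


def concept_nodes(hidden):
    """Group objects that carry exactly the same hidden variables."""
    nodes = {}
    for name, us in hidden.items():
        nodes.setdefault(us, []).append(name)
    return nodes


def hasse_edges(nodes):
    """Pairs (upper, lower) where lower adds hidden variables to upper
    and no other node lies strictly between the two."""
    sets = list(nodes)
    edges = []
    for upper in sets:
        for lower in sets:
            if upper < lower and not any(upper < mid < lower for mid in sets):
                edges.append((nodes[upper], nodes[lower]))
    return edges


nodes = concept_nodes(hidden)
edges = hasse_edges(nodes)
for upper, lower in edges:
    print(upper, "->", lower)
```

In this ordering a node lower in the lattice carries more hidden variables, which matches the remark that "lesser" concepts inherit the attributes of the input items above them.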
  • 48. Resulting Context
    Finally, after assigning user-defined attributes to the concepts in the resulting lattice, we can create the resulting context:
    [Table: Fig. 10. The resulting context. Columns: Item, Name, PIN, Symp., Tests, Cover. Rows above the line: x, d, e. Rows below the line: a2, z; a1, y, p, a42, c; a3, a4, a41; a5.]
      • The input data items (above the line) keep the initial attributes
      • Intermediate data items and activities (below the line) are added with the assigned attributes
    The resulting context is a simple tabular form that is presented to the user as the result of the sharing analysis. The user starts with the input context (above the line) and the workflow definition, while all other steps are intermediate, mechanical and ideally fully automated.
    For activities, attributes indicate the properties of the data visible to (read by) an activity. For data items, attributes describe the information content of the data and what it was derived from.
    Note that the sharing analysis is conservative in the sense that all attributes that cannot be decidedly ruled out are included.
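One way to read the "painting" step is sketched below: an item inherits the attributes of every input item whose hidden variables it contains. This is an interpretation of the slides, not a confirmed description of the authors' procedure; the function name `paint` is an assumption, the hidden variable sets come from table (b) on slide 45, and the input attributes come from `init1` on slide 43 (with Pin written as PIN, as in the context header).

```python
# Input context (above the line): item -> user-defined attributes.
input_context = {
    "x": {"Name", "PIN"},
    "d": {"Symptoms", "Tests"},
    "e": {"Symptoms", "Coverage"},
}

# Hidden variable sets from table (b) on slide 45 (u1..u7 written as 1..7).
groups = [
    (("x", "a5"), {1, 2, 3, 4}),
    (("e",), {1, 3, 5, 7}),
    (("d",), {1, 2, 5, 6}),
    (("a2", "z"), {1, 2, 3, 4, 5, 7}),
    (("a1", "y", "a42", "c", "p"), {1, 2, 3, 4, 5, 6}),
    (("a3", "a4", "a41"), {1, 2, 3, 4, 5, 6, 7}),
]
hidden = {name: set(us) for names, us in groups for name in names}


def paint(hidden, input_context):
    """Assign user-defined attributes to every item (assumed reading).

    An item lies at or below an input item in the lattice when its hidden
    variable set includes that of the input item; it then inherits all of
    that input item's attributes.
    """
    context = {}
    for item, us in hidden.items():
        attrs = set()
        for source, source_attrs in input_context.items():
            if hidden[source] <= us:
                attrs |= source_attrs
        context[item] = attrs
    return context


resulting_context = paint(hidden, input_context)
for item, attrs in resulting_context.items():
    print(item, sorted(attrs))
```

Under this reading the input items keep exactly their initial attributes, in line with the first bullet on the slide.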
  • 49. 4 Application to Fragment Identification
  • 50. Fragmentation Example (Information Flow)
    [Figure: Fig. 11. An example fragmentation for the drug prescription workflow. Panels: main medical workflow; workflow for service a4. Swim-lanes: Health Organization, Medical Examiners, Medication Provider, Registry Archive. Activities: a1: Retrieve medical history; a2: Retrieve medication record; a3: Continue last prescription (stable); a4: Select new medication (¬stable); a41: Run tests to produce medication criteria; a42: Search medication databases; a5: Log treatment. Loop condition in a4: "Result sufficiently specific?" (yes / no).]
    Distributing execution of the workflow(s) across organizations:
      • Fragment: a subset of activities sharing a common property
      • Fragments are assigned to swim-lanes (partners)
      • Property: access level to sensitive data
    Medical examiners cannot see insurance coverage.
    Medication providers cannot see medical tests.
    The registry can see only the patient ID.
  • 51. Another Input Context Example
    1 INITIAL SUBSTITUTION
        init2(X,D,E):-
            X=f1(Name, Address, SSN),
            D=f2(SSN, Tests, Coverage),
            E=f3(SSN, Coverage).

    2 SHARING RESULTS
        u1  [[X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
        u2   [X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
        u3   [X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
        u4   [D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P],
        u5   [D,A1,Y,A3,A4,A41,C,A42,P]]

    3 RESULTING LATTICE
    [Figure: lattice over the hidden variables u1, ..., u5, with nodes {x, a5}, {e}, {d}, {z, a2} and {y, p, c, a1, a3, a4, a41, a42}.]

    4 RESULTING CONTEXT
    [Table: columns Name, Address, SSN, Tests, Coverage; rows x, a5; d; e; z, a2; y, p, c and the other activities.]

    5 RESULTING FRAGMENTATION SCHEME
        Swimlane               Activities
        Health Organization    a1, a3, a4, a41, a42
        Medical Examiners      (empty)
        Medication Provider    a2
        Registry Archive       a5
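The five steps on this slide can be replayed end to end in a short sketch. Several things here are assumptions rather than facts from the slides: the `clearance` table (which attributes each swim-lane may see, guessed from the access rules on slide 50, treating Name, Address and SSN as the patient ID), the rule that an activity inherits the attributes of every input item whose hidden variables it contains, and the rule that each activity goes to the admissible swim-lane with the smallest clearance. With these assumptions the sketch reproduces the fragmentation scheme shown above.

```python
# Sharing results from the slide; setting i corresponds to hidden variable u(i+1).
sharing = [
    "X D E A1 Y A2 Z A3 A4 A41 C A42 P A5",
    "X D A1 Y A2 Z A3 A4 A41 C A42 P A5",
    "X A1 Y A2 Z A3 A4 A41 C A42 P A5",
    "D E A1 Y A2 Z A3 A4 A41 C A42 P",
    "D A1 Y A3 A4 A41 C A42 P",
]
settings = [set(line.split()) for line in sharing]

# Input context encoded by init2.
input_context = {
    "X": {"Name", "Address", "SSN"},
    "D": {"SSN", "Tests", "Coverage"},
    "E": {"SSN", "Coverage"},
}

# ASSUMPTION: attributes each swim-lane is cleared to see.
clearance = {
    "Health Organization": {"Name", "Address", "SSN", "Tests", "Coverage"},
    "Medical Examiners": {"Name", "Address", "SSN", "Tests"},
    "Medication Provider": {"Name", "Address", "SSN", "Coverage"},
    "Registry Archive": {"Name", "Address", "SSN"},
}
activities = ["A1", "A2", "A3", "A4", "A41", "A42", "A5"]


def hidden_vars(var):
    """Indices of the sharing settings in which the variable appears."""
    return {i for i, setting in enumerate(settings) if var in setting}


def attributes(var):
    """Attributes inherited from input items lying above var in the lattice."""
    attrs = set()
    for source, source_attrs in input_context.items():
        if hidden_vars(source) <= hidden_vars(var):
            attrs |= source_attrs
    return attrs


def assign(activity):
    """ASSUMPTION: choose the admissible lane with the smallest clearance."""
    needed = attributes(activity)
    allowed = [lane for lane, seen in clearance.items() if needed <= seen]
    return min(allowed, key=lambda lane: len(clearance[lane]))


scheme = {lane: [] for lane in clearance}
for activity in activities:
    scheme[assign(activity)].append(activity)
for lane, members in scheme.items():
    print(f"{lane}: {', '.join(members) if members else '(empty)'}")
```

Changing the clearance table or the tie-breaking rule changes the scheme, so the match with the slide should be read as a consistency check on these assumptions rather than as the authors' fragmentation procedure.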
  • 52. 5 Conclusions
  • 53. Conclusions
    Representing inputs, intermediate data and activities using FCA contexts and concept lattices allows lattice-based formulation, interpretation, and reasoning on data attributes.
    Sharing analysis of logic programs is a powerful technique for (abstract) sharing analysis, including sharing of data attributes.
      • Supports complex data and control structures
      • Applicable both at design time and at run time
    Applications include fragmentation (as illustrated), but also:
      • Data compliance checking: verifying that sufficient information is exchanged between component services
      • Robust top-down development: refining component / sub-workflow specifications
    Future work: developing translators from concrete executable languages (BPEL, XPDL, YAWL, etc.) into Horn clause programs to facilitate automatic analysis. Also: analyzing stateful conversations between compositions.
  • 54. References The content of this presentation is based on the following publications: Dragan Ivanovic, Manuel Carro, and Manuel Hermenegildo. Automated Attribute Inference in Complex Service Workflows Based on Sharing Analysis. Proceedings of the 8th International Conference on Service Computing - IEEE SCC 2011, IEEE Press, 2011. Dragan Ivanovic, Manuel Carro, and Manuel Hermenegildo. Automatic Fragment Identification in Workflows Based on Sharing Analysis. In Mathias Weske, Jian Yang, Paul Maglio, and Marcelo Fantinato, editors, Service-Oriented Computing – ICSOC 2010, number 6470 in LNCS. Springer Verlag, 2010.
  • 55. References Some pointers on Web service analysis and fragmentation: Daniel Martin, Daniel Wutke, and Frank Leymann. A Novel Approach to Decentralized Workflow Enactment. In EDOC ’08: Proceedings of the 2008 12th International IEEE Enterprise Distributed Object Computing Conference, pages 127–136, Washington, DC, USA, 2008. IEEE Computer Society. Ustun Yildiz and Claude Godart. Information Flow Control with Decentralized Service Compositions. In Proceedings of ICWS 2007, pages 9–17, 2007. Oliver Kopp, Rania Khalaf, and Frank Leymann. Deriving Explicit Data Links in WS-BPEL Processes. In International Conference on Services Computing (SCC), 2008. Rania Khalaf. Note on Syntactic Details of Split BPEL-D Business Processes. Technical Report 2007/2, IAAS, U. Stuttgart, July 2007.
  • 56. References Some pointers to Formal Concept Analysis (FCA): Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors. Formal Concept Analysis, Foundations and Applications. Volume 3626 of Lecture Notes in Computer Science. Springer, 2005. Claudio Carpineto and Giovanni Romano. Concept Data Analysis: Theory and Applications. Wiley, 2004. B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 2nd edition, 2002. Sergei O. Kuznetsov and Sergei A. Obiedkov. Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intell., 14(2-3):189–216, 2002.
  • 57. References Some pointers on program analysis in general and logic programming: P. Cousot and R. Cousot. Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In ACM Symposium on Principles of Programming Languages (POPL'77). ACM Press, 1977. F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer, 2005. Second Ed. M. V. Hermenegildo, F. Bueno, M. Carro, P. López, E. Mera, J.F. Morales, and G. Puebla. An Overview of Ciao and its Design Philosophy. Theory and Practice of Logic Programming, 2011. http://arxiv.org/abs/1102.5497.
  • 58. Acknowledgements The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under grant agreement 215483 (S-Cube).