S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

  • S-Cube Learning Package
    Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis
    Universidad Politécnica de Madrid (UPM)
  • Learning Package Categorization
    S-Cube
    • WP-JRA-2.2: Adaptable Coordinated Service Compositions
      • Models and Mechanisms for Coordinated Service Compositions
        • Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis
  • Table of Contents
    1 Introduction and Background
    2 Motivation and Problem Statement
    3 Overview of the Approach
      Contexts and Concept Lattices
      Horn Clause Programs
      Workflows in Horn Clause Form
      Sharing Analysis
      Obtaining and Interpreting Results
    4 Application to Fragment Identification
    5 Conclusions
    These slides have been prepared for offline viewing. Throughout the presentation, running commentaries, notes and additional remarks will be displayed in the margins using the condensed font, like here. Please refer to the publication list at the end for more details.
  • 1 Introduction and Background
  • SOA and Web Services
    Service Oriented Architecture (SOA):
    • Flexible set of computing system design and implementation principles
    • Emphasis on loose coupling between services and with the OS
    • Distribution over Internet/intranet
    • Actors: service providers and consumers
    • Intrinsic dynamism and adaptability
    • Functionality often in the form of Web services
    Web services:
    • Interoperability, platform independence
    • Data exchange standards: XML
    • Several technological flavors: WSDL/SOAP-based, RESTful
    • Typical implementation platforms: Java / .NET application servers, BPEL, Web server scripts (e.g. PHP)
    Margin note: We mention just some of the key features of SOA and Web services, corresponding to a “high-level” view of the area, i.e. without taking into account the details of implementation technologies and infrastructure. There are many standards and technologies in the service world, and different service provision platforms offer varying degrees of functionality to users and designers. For a more detailed introduction to the SOA design philosophy, platforms, tools, and techniques, please refer to the list of publications.
  • Service Compositions
    Service compositions aggregate individual services to achieve a more complex or cross-organizational task:
    • Combining loosely coupled components
    • Compositions expose themselves as services
    • Often described using workflows (control/data)
    • Complex control and latent parallelism
    • Potentially long-running
    • Centralized control ⇒ orchestration
    • Subject to migration, adaptation, fragmentation, etc.
    • Described using abstract & executable formalisms & languages
    Margin note: Service compositions are typically designed to reflect some underlying technological or business process. Compositions thus allow the creation of new “higher level” functionality from existing services as building blocks. A service composition often involves services from different subsystems within an organization, as well as external services. Orchestrations have a centralized control and data flow (a workflow of activities). They are usually described using a general-purpose or specialized language or formal notation.
  • Data in Service Compositions
    Data in service compositions represents inputs, intermediate results, internal messages, and final results:
    • Workflow activities operate on data (access, combine, transform, etc.)
    • Therefore, data dependencies are as important as control dependencies
    • Data is atomic or structured using rich information formats (XML trees)
    • Query languages (e.g. XPath) are used to search and access fields and nested elements
    • The behavior of control structures typically depends on data
    Margin note: Analyzing data together with control allows us to answer questions about composition behavior depending on the input data and other received messages. E.g., we can ask whether some conditional branch in the workflow will be taken for a given kind of data, or what values some intermediate data fields would have for the given input. Such problems have long been studied in program analysis. Getting exact answers is generally hard, and undecidable in the presence of loops.
  • 2 Motivation and Problem Statement
  • Example Medical Workflow
    [Fig. 1. An example drug prescription workflow. Starting from x: Patient ID, activities a1: Retrieve medical history (producing y: Medical history) and a2: Retrieve medication record (producing z: Medication record) run in parallel. Then, if the patient is stable, a3: Continue last prescription runs; otherwise (¬stable), a4: Select new medication runs. Finally, a5: Log treatment runs.]
    • Written using BPMN (Business Process Modeling Notation)
    • A high-level (non-executable) description
    Margin note: This workflow shows a simplified drug prescription process in a health organization. At the entry, the patient identifies him/herself (item x, Patient ID). The patient's medical history (y) and medication record (z) are then retrieved in parallel (activities a1 and a2).
  • Example Medical Workflow (contd.)
    • Aiming at fragmentation that respects data privacy
    Margin note: Depending on whether the patient's condition is stable or not, either the earlier prescription is continued (activity a3) or a new medication is selected (activity a4). Finally, the treatment of the patient is logged (activity a5). In this example, we consider data privacy attributes. Data may contain confidential information on the patient's medical and medication history, including insurance coverage. A fragment should contain activities that access data of only a certain privacy level. The fragments can then be distributed based on the privacy clearance they require.
  • Example Sub-Workflow
    [Fig. 2. Selection of new medication. Inputs y: Medical history and z: Medication record feed a41: Run tests to produce medication criteria (producing c: Criterion), followed by a42: Search medication databases (producing p: Prescription candidate). A test “Result sufficiently specific?” exits on “yes” and repeats the body on “no”.]
    Sub-workflow for medication selection (component service a4):
    • involves looping based on intermediate data
    Margin note: To make things more interesting, we “unpack” one of our component services, a4, from the main workflow and represent it as a sub-workflow with its own inputs (items y and z), outputs (item p), and intermediate data (item c). As soon as a loop is involved (taking the “no” branch), the analysis becomes more complex. An exact analysis of the orchestration state after the loop would require discovering a loop invariant, which is generally a difficult problem. As we will show in the next section, we find our way around this obstacle by employing abstract interpretation techniques that give us a conservative approximation of the loop behavior.
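A small fixpoint computation illustrates the conservative treatment of the loop. The Python sketch below is not the sharing analysis used in this package. It is a simplified stand-in that assumes each activity's output carries the union of the attributes of everything it reads. To make the loop non-trivial, it also assumes, hypothetically, that a41 consults the previous candidate p. The attribute names (symptoms, tests, coverage) and the helpers run_body and loop_fixpoint are illustrative only.

```python
from typing import Dict, FrozenSet, List, Tuple

Attrs = FrozenSet[str]
# (activity, items read, item written): hypothetical read/write sets for a4's body
Activity = Tuple[str, Tuple[str, ...], str]

BODY: List[Activity] = [
    ("a41", ("y", "p"), "c"),  # assumption: criteria also depend on the previous candidate
    ("a42", ("c", "z"), "p"),
]


def run_body(env: Dict[str, Attrs], body: List[Activity]) -> Dict[str, Attrs]:
    """One abstract pass over the loop body (an over-approximation)."""
    new_env = dict(env)
    for _name, reads, written in body:
        acc: set = set()
        for item in reads:
            acc |= new_env.get(item, frozenset())
        # Join with whatever the item may already carry from earlier iterations.
        new_env[written] = frozenset(acc) | new_env.get(written, frozenset())
    return new_env


def loop_fixpoint(env: Dict[str, Attrs], body: List[Activity]) -> Tuple[Dict[str, Attrs], int]:
    """Repeat the body until the abstract state stops changing.

    Attribute sets only grow and the attribute universe is finite, so this terminates.
    """
    passes = 0
    while True:
        passes += 1
        new_env = run_body(env, body)
        if new_env == env:
            return env, passes
        env = new_env


inputs = {"y": frozenset({"symptoms", "tests"}), "z": frozenset({"coverage"})}
result, passes = loop_fixpoint(inputs, BODY)
print(sorted(result["c"]), passes)
```

After the first pass, c carries only the attributes of y. The second pass adds coverage through p, and the third pass confirms that nothing changes any more. Attribute sets only grow within a finite universe, so the iteration always stops. This is the reason abstract interpretation can give a safe answer where an exact loop invariant would be hard to find.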
  • Data Attributes
    User-defined attributes can be used to characterize data in a given analysis domain:
    • Application-dependent view
    • Simplified data model: sets of properties instead of complex structures
    • The user (designer) chooses relevant attributes, describing e.g.:
      • information content
      • privacy/confidentiality levels ⇐ our example
      • ownership
      • other aspects of quality
    • Possibly: a combination of views
    • Known or assumed for input data, implicit in the control/data dependencies in the workflow
    Question: How to infer attributes (i.e. properties) of intermediate and resulting data items?
    • Based on control flow and data dependencies
    Margin note: Reasoning about the whereabouts of data in the execution of a service composition is simpler if we track only data attributes and not the entire complex data structures. This fits very well with an approach to program analysis known as abstract interpretation, where infinite data domains are abstracted into finite ones. E.g., knowing the privacy levels of input data, we can try to infer the privacy levels of intermediate data and of the individual activities in the workflow. Of course, we have to know how data tests and operations depend on and affect data attributes on the abstract plane.
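The privacy example can be read concretely by taking attributes from a small ordered scale, where written data inherits the highest level of the data it was computed from. The Python sketch below assumes a hypothetical three-level Privacy scale and hypothetical input levels for x and for two data sources d and e. It also assumes illustrative read/write sets for a3, a4 and a5. It is a simple forward propagation, not the sharing-based inference this package develops.

```python
from enum import IntEnum
from typing import Dict, List, Set, Tuple


class Privacy(IntEnum):
    """A hypothetical three-level privacy scale; higher means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# (activity, items read, items written), listed in one legal execution order.
# The read/write sets are illustrative assumptions. Both XOR branches (a3, a4)
# are kept, which over-approximates the levels of the data they touch.
ACTIVITIES: List[Tuple[str, Set[str], Set[str]]] = [
    ("a1", {"x", "d"}, {"y"}),
    ("a2", {"x", "e"}, {"z"}),
    ("a3", {"y", "z"}, {"p"}),
    ("a4", {"y", "z"}, {"p"}),
    ("a5", {"x", "p"}, set()),
]


def infer_levels(
    inputs: Dict[str, Privacy], activities: List[Tuple[str, Set[str], Set[str]]]
) -> Tuple[Dict[str, Privacy], Dict[str, Privacy]]:
    data = dict(inputs)
    activity_level: Dict[str, Privacy] = {}
    for name, reads, writes in activities:
        # An activity is as sensitive as the most sensitive item it reads.
        level = max((data[item] for item in reads), default=Privacy.PUBLIC)
        activity_level[name] = level
        for item in writes:
            # Written data inherits that level, joined with any earlier value.
            data[item] = max(level, data.get(item, Privacy.PUBLIC))
    return data, activity_level


data, levels = infer_levels(
    {"x": Privacy.INTERNAL, "d": Privacy.CONFIDENTIAL, "e": Privacy.INTERNAL},
    ACTIVITIES,
)
print({name: level.name for name, level in levels.items()})
```

Under these assumptions, a2 is the only activity that never touches confidential data, so it is a candidate for a fragment with lower clearance requirements. Keeping both XOR branches in the sequence is a deliberately conservative choice.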
  • Knowing Data Attributes
    Knowledge of data attributes at design time:
    • Supporting fragmentation
      Fragment: a part of an orchestration that can be distributed for remote execution. What parts can be identified and enacted in a distributed fashion?
    • Checking data compliance
      Content of messages exchanged with/between component services in an orchestration. Is “sufficient” data passed to components?
    • Robust top-down development
      Modular structure of service orchestrations. Refining specifications of workflow (sub-)components.
    Also useful at runtime:
    • Updating predictions with actual data
    • On-demand analysis
    Margin note: Analysis of data attributes for components of a service orchestration at design time is an instance of static analysis, where properties are inferred from the specification rather than by running the orchestration. The static analysis approach can be combined with monitoring and adaptation mechanisms, and the analysis can be performed on a live executing instance of the orchestration. That can give more accurate results, because the live instance reveals the actual values of data up to that point in execution, and the analysis can be updated accordingly.
  • Problem Statement
    PROBLEM: To infer user-defined attributes for data items and activities on different levels in an orchestration, automatically, from:
    • known attributes of input data, defined or assumed by the designer
    • the control structure, including complex control structures such as parallel flows, conditional branches and loops
    • data operations: reading or writing data, including tests, assignments and service invocations
    The aim is to provide automated, mechanical inference of data attributes, ideally using a tool that can be invoked at design time. The concrete tool implementation depends on the language in which the orchestration is written (e.g., BPEL, BPMN, etc.).
    In this learning package, we present the general approach and the ideas behind each step of the analysis process. These steps can be adapted to a particular orchestration language and turned into a fully automated tool chain.
  • 3 Overview of the Approach
  • Overview
    User perspective: input data context (attributes α1, α2, α3, ... of input items i1, i2, i3, ...) + workflow definition → resulting context (attributes α1, α2, α3, ... of items o1, o2, o3, ...)
    Underlying techniques and artifacts:
    • Input concept lattice → input substitution, e.g.
        X1=f(U1,U2), X2=f(U1), X3=f, ...
    • Horn clause program, e.g.
        w(X1,X2,A1,Y1,A2,Y2,A3,Z1,A4,Z2) :-
            A1=f1(X1), Y1=f1Y1(X1),
            A2=f2(X2), Y2=f2Y2(X2),
            A3=f3(Y1,Y2), ...
    • Sharing analysis: abstract interpretation, sharing+freeness domain, CiaoDE / CiaoPP suite
    • Abstract substitution, e.g.
        [[X1,A1,Y1,A3,Z1], [A3,Z1,A4,Z2], [X2,A4,Z2], [X2,A2,Y2,A3,Z1,A4,Z2]], ...
    • Resulting concept lattice → resulting context
    Margin note: Fig. 3. Overview of the approach. Above the line are the artifacts that the user works with directly. The input data context describes user-defined attributes of the inputs to the orchestration and accompanies the workflow definition, like the one in our example. The results are returned in the resulting context, which gives back the attributes for the intermediate data items and activities. The intermediate steps below the line are explained in the slides that follow.
  • Overview (contd.)
    The approach to automated attribute inference takes as input:
    • an input data context that identifies the input data items to the workflow and their attributes
    • a workflow definition in some appropriate formalism (e.g. BPMN in our example)
    and produces as output:
    • a resulting context that presents the inferred attributes of all intermediate data items and activities in the given workflow.
    The key steps in the process are:
    • Conceptualizing the input data context in the form of a concept lattice, and preparing the input substitution for the analysis.
    • Turning the given workflow definition into a Horn Clause program that is fed to the analysis, along with the input substitution.
    • Performing sharing analysis and using its result, the abstract substitution, to construct the resulting concept lattice.
    • Interpreting the resulting concept lattice to produce the resulting context.
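The standard set-sharing abstraction shows how a concrete substitution relates to an abstract one. For each variable occurring in the bindings, it records the set of program variables whose terms contain that variable. The Python sketch below applies this to the input substitution shown in the overview figure (X1=f(U1,U2), X2=f(U1), X3=f). The term encoding and function names are assumptions made for illustration. The actual analysis tools also track freeness and operate on whole programs, which this sketch does not attempt.

```python
from typing import Dict, FrozenSet, Set, Union

# A term is a variable (string starting with an uppercase letter, as in Prolog),
# a constant (any other string, or a number), or a compound tuple (functor, arg1, ...).
Term = Union[str, int, tuple]


def term_vars(term: Term) -> Set[str]:
    """Variables occurring in a term."""
    if isinstance(term, str):
        return {term} if term[:1].isupper() else set()
    if isinstance(term, tuple):
        out: Set[str] = set()
        for arg in term[1:]:
            out |= term_vars(arg)
        return out
    return set()


def sharing_groups(subst: Dict[str, Term]) -> Set[FrozenSet[str]]:
    """Set-sharing abstraction of a substitution.

    For every variable U occurring in the bindings, collect the program variables
    whose binding contains U; each such set is one sharing group.
    """
    occurrences: Dict[str, Set[str]] = {}
    for prog_var, term in subst.items():
        for u in term_vars(term):
            occurrences.setdefault(u, set()).add(prog_var)
    return {frozenset(group) for group in occurrences.values()}


# Input substitution from the overview figure: X1=f(U1,U2), X2=f(U1), X3=f
theta = {"X1": ("f", "U1", "U2"), "X2": ("f", "U1"), "X3": "f"}
groups = sharing_groups(theta)
print(sorted(sorted(g) for g in groups))
```

The result contains the groups {X1, X2} (both contain U1) and {X1} (only X1 contains U2). X3 appears in no group, which in set-sharing means it is ground.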
  • Outline of the Section
    This Overview section starts with two subsections that introduce some important background notions.
    • Subsection Contexts and Concept Lattices briefly introduces the key notions of Formal Concept Analysis (FCA), such as contexts and concept lattices, which are used in the rest of the text for representing (and reasoning about) the inputs and outputs of the proposed analysis approach.
    • Subsection Horn Clause Programs presents the key ideas behind logic (Horn Clause) programs, gives an informal introduction to their form and meaning, and presents the notions of structured terms, substitutions and unification, which are all referred to later. It also introduces Prolog syntax.
    These two subsections do not describe steps of the approach as such. Rather, they supply the notions needed to understand those steps.
  • Outline of the Section (contd.)
    The remaining subsections describe the steps in the process of automated attribute inference:
    • Subsection Workflows in Horn Clause Form starts from a rather general way of describing workflows that involve complex data and control dependencies, and describes how such workflows can be turned into a Horn Clause form amenable to sharing analysis.
    • Next, subsection Sharing Analysis first defines the notion of sharing in logic programs, building on the notion of substitution introduced earlier. It then describes the notion of abstract substitution, which is used in the actual analysis as the domain for abstract interpretation. It also describes how an initial substitution for the analysis is set up using attributes from the input concept lattice.
    • Finally, subsection Obtaining and Interpreting Results explains how the result of the sharing analysis, in the form of an abstract substitution, is turned into a resulting concept lattice. That lattice is then used to generate the resulting context, which is the end result.
  • 3.1 Contexts and Concept Lattices
  • Contexts: Objects and Attributes
    Notion of context in Formal Concept Analysis (FCA):
    • Set A of attributes (columns)
    • Set O of objects (rows)
    • Boolean object-attribute relation ρ ⊆ O × A
    [Fig. 4. Two examples of contexts. (a) Characteristics of medical databases: objects Medical history, Medication record; attributes Symptoms, Tests, Coverage. (b) Types of identity documents: objects Passport, National Id Card, Driving License, Social Security Card; attributes Name, Address, PIN, SSN.]
    Margin note: Formal Concept Analysis is a branch of lattice theory concerned with knowledge representation and reasoning. A context is simply a table that associates objects with attributes. As an illustration, we show two contexts: one that describes the content of medical databases, and another that describes the information contained in different identity documents. Objects (rows) stand for some meaningful entities, and attributes (columns) are chosen by the user to represent relevant notions in the application domain.
  • Concepts
    The idea behind a concept is a close connection between subsets of objects and attributes.
    Objects → Attributes
    • For an arbitrary subset of objects B ⊆ O, let B′ = {a ∈ A | ∀o ∈ B, oρa}: “all and only those attributes that belong to all objects from B”
    Attributes → Objects
    • For an arbitrary subset of attributes D ⊆ A, let D′ = {o ∈ O | ∀a ∈ D, oρa}: “all and only those objects that have all attributes from D”
    (B, D) is a concept iff B′ = D and D′ = B. Then:
    • B′′ = (B′)′ = D′ = B, and D′′ = (D′)′ = B′ = D
    Margin note: From the definition, in a concept (B, D) we need to know only B or D to find the other using (·)′. That means we can choose to work with objects or attributes, whichever is more convenient. E.g., we can start from a single attribute a and calculate {a}′ to find the most general concept that has a. Or, we can start with an object o and calculate {o}′ to find the most specific concept containing o. Because B′′ = B and D′′ = D, we say that concepts are closed under (·)′′, i.e., (·)′′ is a closure.
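The derivation operators translate directly into code. The Python sketch below uses a tiny hypothetical context (objects o1 to o3, attributes a, b, c), not the contexts of Fig. 4. The helper names intent, extent and is_concept are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, Set

# A tiny hypothetical context: object -> set of attributes it has (the relation ρ).
RHO: Dict[str, Set[str]] = {
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c"},
}
OBJECTS: FrozenSet[str] = frozenset(RHO)
ATTRIBUTES: FrozenSet[str] = frozenset().union(*RHO.values())


def intent(objects: Iterable[str]) -> FrozenSet[str]:
    """B′: all and only the attributes shared by every object in B."""
    objs = list(objects)
    return frozenset(a for a in ATTRIBUTES if all(a in RHO[o] for o in objs))


def extent(attributes: Iterable[str]) -> FrozenSet[str]:
    """D′: all and only the objects that have every attribute in D."""
    attrs = set(attributes)
    return frozenset(o for o in OBJECTS if attrs <= RHO[o])


def is_concept(b: Iterable[str], d: Iterable[str]) -> bool:
    """(B, D) is a concept iff B′ = D and D′ = B."""
    b, d = frozenset(b), frozenset(d)
    return intent(b) == d and extent(d) == b


# Most general concept having attribute b: ({b}′, {b}′′)
b_ext = extent({"b"})
print(sorted(b_ext), sorted(intent(b_ext)))
print(is_concept({"o1"}, {"a", "b"}))
```

Here {b}′ = {o1, o3} and {o1, o3}′ = {a, b}, so ({o1, o3}, {a, b}) is the most general concept having b. By contrast, ({o1}, {a, b}) is not a concept, because {a, b}′ also contains o3.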
  • Concept Lattices
    Concept lattice:
    • (B1, D1) ≤ (B2, D2) ⇔ B1 ⊆ B2 ⇔ D2 ⊆ D1
    • The lesser concept (B1, D1) is more specific: it has fewer objects (B1 ⊆ B2) and adds attributes (D2 ⊆ D1)
    • The greater concept (B2, D2) is more general
    • Top (⊤): the most general concept (all objects)
    • Bottom (⊥): the most specific concept (all attributes)
    • It is a complete lattice: every set of concepts has a least upper bound (join) and a greatest lower bound (meet)
    [Fig. 5. (a) Concept lattice for medical databases. (b) Concept lattice for identity documents.]
    Margin note: The concept lattice is often represented visually using a variant of Hasse diagrams. Nodes represent concepts. The top concept is drawn at the top, and the bottom concept at the bottom. Callouts show the attributes that are new to a concept (inherited by all nodes below it) and the objects that are new to a concept (inherited by all nodes above it). The example concept lattices correspond to the sample contexts for medical databases and personal identification documents.
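For small contexts, all concepts can be found by closing every subset of attributes, and the order, join and meet follow from the definitions of (·)′ and ≤. The Python sketch below uses a tiny hypothetical context (objects o1 to o3, attributes a, b, c). The enumeration is naive and exponential in the number of attributes, so it only serves to make the lattice structure tangible. The function names are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Tiny hypothetical context: object -> attributes (the relation ρ).
RHO: Dict[str, Set[str]] = {"o1": {"a", "b"}, "o2": {"a", "c"}, "o3": {"a", "b", "c"}}
OBJECTS = frozenset(RHO)
ATTRIBUTES = frozenset().union(*RHO.values())
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent B, intent D)


def intent(objs) -> FrozenSet[str]:
    return frozenset(a for a in ATTRIBUTES if all(a in RHO[o] for o in objs))


def extent(attrs) -> FrozenSet[str]:
    return frozenset(o for o in OBJECTS if set(attrs) <= RHO[o])


def concept_of_attrs(attrs) -> Concept:
    """Close an attribute set D into the concept (D′, D′′)."""
    b = extent(attrs)
    return b, intent(b)


def all_concepts() -> Set[Concept]:
    """Close every attribute subset; naive, but fine for tiny contexts."""
    names = sorted(ATTRIBUTES)
    return {concept_of_attrs(combo)
            for k in range(len(names) + 1)
            for combo in combinations(names, k)}


def leq(c1: Concept, c2: Concept) -> bool:
    """(B1, D1) ≤ (B2, D2) iff B1 ⊆ B2 (equivalently D2 ⊆ D1)."""
    return c1[0] <= c2[0]


def join(c1: Concept, c2: Concept) -> Concept:
    """Least upper bound: ((D1 ∩ D2)′, D1 ∩ D2)."""
    return concept_of_attrs(c1[1] & c2[1])


def meet(c1: Concept, c2: Concept) -> Concept:
    """Greatest lower bound: (B1 ∩ B2, (B1 ∩ B2)′)."""
    b = c1[0] & c2[0]
    return b, intent(b)


def covers(concepts) -> List[Tuple[Concept, Concept]]:
    """Hasse diagram edges (lower, upper) with no concept strictly in between."""
    cs = list(concepts)
    edges = []
    for lo in cs:
        for hi in cs:
            if lo != hi and leq(lo, hi) and not any(
                    m not in (lo, hi) and leq(lo, m) and leq(m, hi) for m in cs):
                edges.append((lo, hi))
    return edges


concepts = all_concepts()
for b, d in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(b), sorted(d))
```

This context yields four concepts: the top ({o1, o2, o3}, {a}), the bottom ({o3}, {a, b, c}), and two incomparable concepts for b and c. The join of those two is the top and their meet is the bottom. covers returns the four edges of the Hasse diagram.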
  • 3.2 Horn Clause Programs
  • About Logic Programs
    Logic programs represent a computation task as a set of logical rules and facts.
    Logical rules model if-then inferences:
    • B1 ∧ B2 ∧ · · · ∧ Bn → H: if B1, . . . , Bn are all true (n ≥ 0), then we conclude that H is true
    • Often written as H ← B1 ∧ B2 ∧ · · · ∧ Bn
    • H is the head of the rule
    • B1 ∧ · · · ∧ Bn is the body of the rule
    • H ← (the case n = 0) is a fact (H is always true)
    Example
        ancestor(x, y) ← parent(x, y)                     (a parent is an ancestor)
        ancestor(x, y) ← parent(z, y) ∧ ancestor(x, z)    (a parent's ancestor is an ancestor)
    Margin note: Logic programming is one of the classical programming paradigms, along with imperative, object-oriented and functional programming. The example gives rules for the “x is an ancestor of y” relation, written as ancestor(x, y), using also the parent(x, y) relation. Logic programs are declarative, because they state the rules and the problem to be solved (e.g. finding somebody's ancestors or descendants), not the sequence of steps to solve it. That makes logic programs relatives of SQL, but far more powerful.
  • Elements of Horn Clause Programs
    Elements of logic programs include:
    • Predicates that describe logical properties or relations, such as ancestor/2 and parent/2 (where /n means “with n arguments”)
    • Atoms that apply predicates, such as “x is an ancestor of y”, written as ancestor(x, y)
    • Variables x, y, z that stand for arbitrary objects in a rule (implicitly ∀-quantified)
    • Constants that name distinct objects (such as Alice, Bob, Carol and Dennis below)
    In the rules of a Horn Clause program, H and each of B1, . . . , Bn are atoms.
    Continued Example: Parent Fact Database
        parent(Alice, Bob) ←
        parent(Dennis, Bob) ←
        parent(Carol, Dennis) ←
    Margin note: The elements of logic programs (predicates, terms, variables, constants, etc.) correspond to the notions in First Order Logic (FOL). As in FOL, we assume that predicate and constant names refer to distinct entities, unlike differently named variables, which can refer to the same object. Also, p/1 and p/2 are two different predicates (with one and two arguments, respectively), even though they share the same name p. The simplified structure of Horn Clause programs allows efficient reasoning, i.e. derivation of logical consequences from known facts and rules.
  • Executing Horn-Clause Programs
    Executing a logic program means searching for a proof of a logical statement known as the query, finding variable values along the way.
    Sample Query 1: Find Bob's ancestors
        Query: ancestor(x, Bob)
        Answers: x = Alice, x = Carol, x = Dennis
    Sample Query 2: Find Carol's descendants
        Query: ancestor(Carol, y)
        Answers: y = Dennis, y = Bob
    Sample Query 3: Find Alice's ancestors
        Query: ancestor(x, Alice)
        Answer: no solution (cannot prove for any x)
    Margin note: The sample queries compute different things (or fail, in the last case) depending on the query. In C or Java, we would have to program separate procedures for “find a person's ancestors”, “find a person's descendants”, etc. In case of success, the variables in the query point to objects for which the query can be proven from the program. The “magic” is done by the under-the-hood inference engine, which takes the program and the query and performs a systematic search for a proof. The result may be a failure, or a single or multiple solutions (possibly an infinite number of them).
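A few lines of Python reproduce the answers above. This sketch evaluates the two ancestor rules bottom-up, repeatedly deriving new facts until nothing changes (as Datalog engines do). A Prolog engine instead searches top-down from the query. For this fact database, both strategies give the same answers. The function names are illustrative.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str]

# parent(P, C): P is a parent of C (the fact database from the slides)
PARENT: Set[Fact] = {("Alice", "Bob"), ("Dennis", "Bob"), ("Carol", "Dennis")}


def ancestor_relation(parent: Set[Fact]) -> Set[Fact]:
    """Bottom-up evaluation of the two ancestor rules until no new facts appear."""
    ancestor = set(parent)  # ancestor(x, y) <- parent(x, y)
    while True:
        # ancestor(x, y) <- parent(z, y), ancestor(x, z)
        derived = {(x, y) for (z, y) in parent for (x, z2) in ancestor if z2 == z}
        if derived <= ancestor:
            return ancestor
        ancestor |= derived


def query(rel: Set[Fact], x: Optional[str] = None, y: Optional[str] = None) -> Set[Fact]:
    """Answers for rel(x, y); None plays the role of an unbound variable."""
    return {(a, b) for (a, b) in rel if (x is None or a == x) and (y is None or b == y)}


ANC = ancestor_relation(PARENT)
print(sorted(a for a, _ in query(ANC, y="Bob")))    # ['Alice', 'Carol', 'Dennis']
print(sorted(b for _, b in query(ANC, x="Carol")))  # ['Bob', 'Dennis']
print(query(ANC, y="Alice"))                        # set(): no solution
```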
  • Handling Structures
    Structured terms have the shape f(t1, t2, . . . , tn), n ≥ 0, where f is a functor and each ti is again a term.
    Example: Peano Arithmetic
        Program:
            number(0) ←
            number(s(x)) ← number(x)
            succ(x, s(x)) ←
        Query 1: number(x)
            Answers: x = 0, x = s(0), x = s(s(0)), x = s(s(s(0))), . . .
        Query 2: succ(x, s(0))
            Answer: x = 0
    Lists are common structures, with the constant [ ] representing the empty list, and the functor “.” (dot) used to put together the head and the tail:
    • [4] is the same as .(4, [ ])
    • [1, 2, 3, 4] is the same as .(1, .(2, .(3, .(4, [ ]))))
    Margin note: Note that f(t1, . . . , tn) is NOT a function call in the sense of C, Python or Haskell. It can be thought of as a data record with name f and n fields. For n = 0, f is simply a constant. Structured terms can be (and often are) nested, as in the examples of Peano arithmetic and lists. Lists are very frequently used data structures and a common tool in logic programming. However, structured terms can also be used to represent nodes in a tree or a graph, records that store information, or any other kind of data container we need.
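Structured terms are just records, so they map naturally onto nested tuples. The Python sketch below uses an assumed encoding: a compound term f(t1, ..., tn) becomes the tuple ("f", t1, ..., tn), and constants are plain strings or numbers. With it, the sketch enumerates the first answers of Query 1 lazily and builds list terms from the dot functor. The function names are illustrative.

```python
from itertools import islice
from typing import Iterator, List, Union

# Terms as plain data: f(t1, ..., tn) is the tuple ("f", t1, ..., tn);
# constants are strings or numbers. Nothing is "called": tuples are just records.
Term = Union[str, int, tuple]


def show(t: Term) -> str:
    """Render a term in the f(t1, ..., tn) notation used on the slides."""
    if isinstance(t, tuple):
        functor, *args = t
        return f"{functor}({', '.join(show(a) for a in args)})"
    return str(t)


def numbers() -> Iterator[Term]:
    """Enumerate the (infinitely many) answers to number(x): 0, s(0), s(s(0)), ..."""
    t: Term = 0
    while True:
        yield t
        t = ("s", t)


def to_list_term(items: List[Term]) -> Term:
    """Build a list term from the head/tail functor '.' and the constant '[]'."""
    term: Term = "[]"
    for item in reversed(items):
        term = (".", item, term)
    return term


print([show(t) for t in islice(numbers(), 4)])
print(show(to_list_term([4])))
print(show(to_list_term([1, 2, 3, 4])))
```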
  • Unification and Substitutions
    An atom of the form t1 = t2 expresses syntactic equality:
    • it succeeds if t1 and t2 are identical, or can be made identical by substituting terms for some variables in t1 and t2
    • we are interested in the substitution that introduces the least amount of information: the most general unifier (MGU)
    • a substitution maps (binds) variables to terms
    Unification Examples
        Unification              MGU
        1 = 0                    none (failure)
        s(x) = s(y)              θ = {x → y}
        s(0) = s(x)              θ = {x → 0}
        f(s(x), x) = f(z, 1)     θ = {z → s(1), x → 1}
        f(0) = s(x)              none (failure)
        f(s(x), y) = f(1, z)     none (failure)
    Running a query means finding a substitution that makes the query true, by composing the MGUs obtained from each of B1, . . . , Bn in a rule body.
    Margin note: Note that t1 = t2 is just a nicer way of writing =(t1, t2). Unification is implicit in parameter passing in clause heads. For instance, we can rewrite the rule “succ(x, s(x)) ←” as “succ(x, y) ← y = s(x)”. Equally, the rule “number(s(x)) ← number(x)” can be rewritten as “number(y) ← y = s(x) ∧ number(x)”.
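The textbook unification algorithm computes the MGUs in the table. The Python version below follows Prolog's naming convention, so variables start with an uppercase letter and the slide's x, y, z become X, Y, Z. Compound terms are encoded as tuples. The sketch performs the occurs check, which Prolog's =/2 usually omits for speed (ISO Prolog provides unify_with_occurs_check/2 for that purpose). The function names are illustrative.

```python
from typing import Dict, Optional, Union

# Variables: strings starting with an uppercase letter (Prolog style);
# constants: other strings or numbers; compound terms: tuples (functor, args...).
Term = Union[str, int, tuple]
Subst = Dict[str, Term]


def is_var(t: Term) -> bool:
    return isinstance(t, str) and t[:1].isupper()


def walk(t: Term, s: Subst) -> Term:
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v: str, t: Term, s: Subst) -> bool:
    """Occurs check: does variable v appear inside t (under s)?"""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])


def unify(t1: Term, t2: Term) -> Optional[Subst]:
    """Return the most general unifier of t1 and t2, or None on failure."""
    s: Subst = {}
    stack = [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if is_var(a) or is_var(b):
            var, other = (a, b) if is_var(a) else (b, a)
            if occurs(var, other, s):
                return None
            s[var] = other
        elif (isinstance(a, tuple) and isinstance(b, tuple)
              and a[0] == b[0] and len(a) == len(b)):
            stack.extend(zip(a[1:], b[1:]))  # same functor and arity: unify arguments
        else:
            return None  # functor, arity or constant clash
    return {v: resolve(v, s) for v in s}


def resolve(t: Term, s: Subst) -> Term:
    """Apply the substitution fully, so each binding is shown in solved form."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, s) for a in t[1:])
    return t


print(unify(("f", ("s", "X"), "X"), ("f", "Z", 1)))
```

Running it on the six examples of the table reproduces the results listed there. For instance, the call above returns {'X': 1, 'Z': ('s', 1)}, which corresponds to θ = {z → s(1), x → 1}.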
  • Prolog and Friends
    Prolog is a programming language based on Horn Clause rules.
    Concrete language syntax:
    • clauses end with a full stop (“.”)
    • uses “:-” instead of “←”, and a comma instead of “∧”
    • variables start with uppercase letters or “_”
    • predicate names, functors and constants start with lowercase letters
    • Powerful analysis tools and techniques based on “clean” program semantics
    Examples in Prolog:
      Ancestors:
        ancestor(X,Y) :- parent(X,Y).
        ancestor(X,Y) :- parent(Z,Y), ancestor(X,Z).
      Peano arithmetic:
        number(0).
        number(s(X)) :- number(X).
        add(0, X, X) :- number(X).
        add(s(X), Y, s(Z)) :- add(X,Y,Z).
    Margin note: The full Prolog language includes “impure” features, such as dynamic fact updates and I/O. Modern Prolog systems contain extensions such as constraint logic programming (CLP) and tabling. However, “pure” Prolog programs have a close relationship with logical theories, and reasoning about them in a sound fashion is easier than for other executable formalisms. We will use Prolog to encode objects and attributes in an executable form that captures the structure of a workflow, and Prolog analysis tools to derive attributes automatically.
  • 3.3 Workflows in Horn Clause Form
  • Anatomy of a Workflow
    In general, workflows may contain complex data and control dependencies:
    • sequences, conditional branches, and loops
    • parallel flows, with pre- and post-conditions
    • data items are read and written by activities
    • inputs: possibly complex XML information sets
    [Figure: an abstract workflow whose activities read and write data items x, y, z and branch on conditions over them.]
    Understanding how data is handled throughout the workflow is non-trivial:
    • what information items / parts are used? where?
    Margin note: There are many concrete workflow definition languages that can be used to specify control structures and data operations. Here, we use an abstract workflow representation where both control and data dependencies are shown explicitly. Analyzing the content of data items at all points in a workflow is an instance of a general program analysis problem. To solve it, especially in the presence of loops and complex data structures, approximation techniques such as abstract interpretation are usually needed.
  • Example of (Enriched) Workflow
    To make the analysis of workflow control and data dependencies easier, let us first “distill” our BPMN workflow example into the simplified abstract form below (its elements are clarified in the slides that follow).
    • We keep only the activity tags (a1, . . . , a5), the control dependencies between them, and labels for the data items read/written by the activities.
    • We abstract the looping in the sub-workflow as a structured activity of repeat-until type with a separate body sub-workflow.
    [Figure: the enriched workflow. a1 and a2 run in parallel (AND). After both finish, either a3 or a4 runs, depending on succ–a1, and the branches join (OR) before a5. Each activity is annotated with the data items it reads and writes (x, d, e, y, z, p). Activity a4 is a repeat-until loop whose exit depends on p; its body runs a41 (reading y, z, writing c) and then a42 (reading c, writing p).]
    C = {pre–a4 ≡ done–a1 ∧ ¬succ–a1 ∧ done–a2,
         pre–a3 ≡ done–a1 ∧ succ–a1 ∧ done–a2,
         pre–a5 ≡ done–a3 ∨ done–a4}
    a4: repeat-until loop, exit depends on p; body: C = {pre–a42 ≡ done–a41}
  • Example of (Enriched) Workflow (cont.)
Workflow control structure:
    • includes activities a1, ..., a5
    • arrows show control dependencies (e.g. a4 depends on a1 and a2)
    • independent activities may run in parallel (e.g. a1 and a2)
    • different join types (AND/OR)
Data dependencies are based on read/write annotations:
    • a read/write annotation (Ri, Wi) for each activity ai
    • Ri is the set of data items read, Wi is the set of data items written
  • Example of (Enriched) Workflow (cont.)
Set C of logical control preconditions:
    • activity preconditions pre–ai are expressed using propositional formulas
    • done–aj means “aj has finished”
    • succ–aj means “aj has achieved its (user-defined) goal”
    • this easily models sequences and AND/OR/XOR parallel flows
This helps to detect possible deadlocks and race conditions:
    • deadlocks appear in case of circular dependencies (pre–ai → done–ai)
    • race conditions appear when two activities that can execute in parallel access the same data item and at least one of them writes it
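The race-condition criterion above can be checked mechanically once the read/write sets are known. The following is a minimal Python sketch. It assumes the annotations Ri/Wi read off the example figure and a hand-supplied list of activity pairs that may run in parallel (here only a1 and a2). The names `READS`, `WRITES`, `conflicts` and `races` are illustrative and not part of the original approach.

```python
# Read/write annotations (Ri, Wi) as read off the example figure;
# a4 is treated as a whole here, its body a41/a42 is left out.
READS = {
    "a1": {"x", "d"}, "a2": {"x", "e"}, "a3": {"y", "z"},
    "a4": {"y", "z"}, "a5": {"x"},
}
WRITES = {"a1": {"y"}, "a2": {"z"}, "a3": set(), "a4": set(), "a5": set()}


def conflicts(a, b, reads, writes):
    """Items accessed by both a and b where at least one of them writes."""
    return (writes[a] & (reads[b] | writes[b])) | (writes[b] & reads[a])


def races(parallel_pairs, reads, writes):
    """Map each pair of possibly parallel activities to its conflicting items."""
    found = {}
    for a, b in parallel_pairs:
        items = conflicts(a, b, reads, writes)
        if items:
            found[(a, b)] = items
    return found


# a1 and a2 may run in parallel; both read x, but neither writes what the other uses.
print(races([("a1", "a2")], READS, WRITES))  # {}

# Hypothetical variant in which a2 also overwrites x: now there is a race on x.
print(races([("a1", "a2")], READS, dict(WRITES, a2={"z", "x"})))
# {('a1', 'a2'): {'x'}}
```

This sketch does not derive the set of parallel pairs from the preconditions C. That step would need a search over the propositional formulas.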
  • Example of (Enriched) Workflow (cont.)
Based on the control preconditions, we can find legal orderings of activities, i.e. orderings that respect the preconditions:
    • this is possible only if there are no deadlocks or race conditions (which can be checked efficiently using, e.g., SAT solvers)
All legal orderings are equivalent from the point of view of data handling.
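For a workflow this small, legal orderings can simply be enumerated. The Python sketch below encodes the preconditions C of the main workflow as predicates over the set of finished activities. It treats the outcome succ–a1 as a given parameter and lists every maximal sequence in which each activity runs at most once, and only after its precondition holds. This brute-force search is an illustrative assumption and is exponential in the number of activities. It is not the SAT-based checking mentioned above.

```python
# Preconditions C of the main workflow as predicates over the set of
# finished activities ("done") and the outcome of a1 ("succ_a1").
PRE = {
    "a1": lambda done, succ_a1: True,
    "a2": lambda done, succ_a1: True,
    "a3": lambda done, succ_a1: {"a1", "a2"} <= done and succ_a1,
    "a4": lambda done, succ_a1: {"a1", "a2"} <= done and not succ_a1,
    "a5": lambda done, succ_a1: "a3" in done or "a4" in done,
}


def legal_orderings(pre, succ_a1):
    """All maximal sequences in which every activity starts only after its
    precondition holds; each activity runs at most once."""
    results = []

    def extend(order):
        done = set(order)
        ready = [a for a in pre if a not in done and pre[a](done, succ_a1)]
        if not ready:              # nothing else can start: ordering complete
            results.append(order)
            return
        for a in ready:
            extend(order + [a])

    extend([])
    return results


for ordering in legal_orderings(PRE, succ_a1=True):
    print(ordering)
# ['a1', 'a2', 'a3', 'a5']
# ['a2', 'a1', 'a3', 'a5']
```

With `succ_a1=False`, the same function yields the two orderings through a4 instead of a3. In either case a1 and a2 may come in any order, which is harmless for data handling because they do not conflict on any data item.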
  • Example of (Enriched) Workflow (cont.)
Sub-workflows can be used to model complex constructs:
    • in our case, activity a4 is a repeat-until loop
    • the body of the loop is a sub-workflow (with a41 and a42)
Sub-workflows also allow modular development and/or assembly of workflows.
  • Workflow as a Horn Clause Program
We represent the workflow symbolically in a Horn clause form for further analysis:
    w(X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5) :-
        A1 = f1(X,D),               % a_1
        Y  = f1_Y(X,D),
        A2 = f2(X,E),               % a_2
        Z  = f2_Z(X,E),
        A3 = f3(Y,Z),               % a_3
        a_4(Y,Z,A4,A41,C,A42,P),    % a_4
        A5 = f5(X).                 % a_5

    a_4(Y,Z,A4,A41,C,A42,P) :-
        w2(Y,Z,A41,C2,A42,P2),
        A4 = f4(P2),
        a_4x(Y,Z,C2,P2,C,P,A4,A41,A42).

    a_4x(_,_,C,P,C,P,_,_,_).
    a_4x(X,Z,_,_,C,P,A4,A41,A42) :-
        a_4(X,Z,A4,A41,C,A42,P).

    w2(Y,Z,A41,C,A42,P) :-
        A41 = f41(Y,Z),             % a_41
        C   = f41_C(Y),
        A42 = f42(C),               % a_42
        P   = f42_P(C).
    • the representation is not operationally equivalent to the workflow
The predicate w stands for the workflow:
    • the clause body reflects a legal ordering of activities
    • variables stand for data items and activities
Sub-workflows and complex activities are in separate predicates.
Note that in Prolog syntax, an underscore (“_”) represents a new, fresh variable that stands for an arbitrary term. Predicate w2 represents the body of the loop, and predicates a_4 and a_4x model the repeat-until construct.
  • Workflow as a Horn Clause Program (cont.)
For each activity in the Horn clause program of the previous slide, we model data dependencies with unifications:
    • Ai = fi(Ri) stands for “activity ai reads data items from set Ri”
    • for each item z ∈ Wi written by ai, z = fiz(Q) stands for “z is written using data items from Q ⊆ Ri”
Such a Horn clause representation can be derived mechanically and, in principle, automatically.
The choice of functors (fi and fiz) is purely symbolic and is not significant for the subsequent sharing analysis. The purpose of the unifications in the Horn clause representation is to express functional dependencies between activities and data items in a workflow, not to actually calculate them.
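To illustrate that the unifications can be derived mechanically, here is a small Python generator for the flat part of the clause body. The input format and the naming scheme are assumptions chosen to mimic the slide. The input gives the activity tag, the read set Ri, and for each written item z the subset Q ⊆ Ri it is computed from. Functors are named fi and fi_Z, and variable names are upper-cased. The loop a_4 and its auxiliary predicates are not generated.

```python
# Hypothetical encoding of the flat part of the main workflow (a_4 omitted):
# (activity, items read Ri, {written item z: items Q it is computed from}).
ACTIVITIES = [
    ("a1", ["x", "d"], {"y": ["x", "d"]}),
    ("a2", ["x", "e"], {"z": ["x", "e"]}),
    ("a3", ["y", "z"], {}),
    ("a5", ["x"], {}),
]


def var(name):
    """Prolog variable name for a data item or activity tag."""
    return name.upper()


def args(names):
    return ",".join(var(n) for n in names)


def horn_body(activities):
    """Unifications Ai = fi(Ri) and, for each written z, Z = fi_Z(Q)."""
    goals = []
    for act, reads, writes in activities:
        i = act[1:]  # "a41" -> "41"
        goals.append(f"{var(act)}=f{i}({args(reads)})")
        for item, deps in writes.items():
            assert set(deps) <= set(reads), "Q must be a subset of Ri"
            goals.append(f"{var(item)}=f{i}_{var(item)}({args(deps)})")
    return ",\n    ".join(goals) + "."


print(horn_body(ACTIVITIES))
```

The printed goals match the corresponding lines of the clause for w. Running the generator on the loop body, with a41 writing c from y alone, gives `A41=f41(Y,Z)` and `C=f41_C(Y)`, as in w2.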
  • 3.4 Sharing Analysis
  • Sharing in Logic Programs
Sharing analysis of logic programs tries to infer how data is shared between variables:
    • sharing is always relative to a substitution θ upon successful execution of a query
    • two variables x, y are said to share if the terms xθ and yθ (i.e. after applying θ to x and y) contain some common variable
Example
    θ = {x → s(y)}                            xθ = s(y), yθ = y                      x and y share
    θ = {}                                    xθ = x, yθ = y                         x and y do NOT share
    θ = {x → s(w), y → f(1, z)}               xθ = s(w), yθ = f(1, z)                x and y do NOT share
    θ = {x → [1, w, f(z)], y → n(z, s(w))}    xθ = [1, w, f(z)], yθ = n(z, s(w))     x and y share w and z
Sharing analysis tries to find all possible sharing between variables in the case of successful executions. This requires the inclusion of all possible substitutions on exit from a query. That is generally impractical and often impossible, since there may be many, or even an infinite number of, possible substitutions to be taken into account. To make sharing analysis viable, we often resort to some sort of approximation that reduces the repertoire of possible sharing cases to a finite, manageable size, while remaining safe, i.e. not missing any potential sharing.
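The sharing test on this slide can be written down directly for concrete, idempotent substitutions. The following Python sketch assumes a simple term encoding: variables are strings, compound terms are tuples headed by their functor, Prolog lists are Python lists, and other values are constants. It then checks the four examples from the table.

```python
def apply(term, theta):
    """Apply a substitution (dict: variable -> term) to a term.
    Assumes theta is idempotent, so a single pass suffices."""
    if isinstance(term, str):                       # variable
        return theta.get(term, term)
    if isinstance(term, tuple):                     # (functor, arg1, ...)
        return (term[0],) + tuple(apply(a, theta) for a in term[1:])
    if isinstance(term, list):                      # Prolog list
        return [apply(a, theta) for a in term]
    return term                                     # constant


def variables(term):
    """Set of variables occurring in a term (functor names are skipped)."""
    if isinstance(term, str):
        return {term}
    if isinstance(term, tuple):
        return set().union(*(variables(a) for a in term[1:]))
    if isinstance(term, list):
        return set().union(*(variables(a) for a in term))
    return set()


def shared(x, y, theta):
    """Variables common to x-theta and y-theta; x and y share iff non-empty."""
    return variables(apply(x, theta)) & variables(apply(y, theta))


EXAMPLES = [
    {"x": ("s", "y")},
    {},
    {"x": ("s", "w"), "y": ("f", 1, "z")},
    {"x": [1, "w", ("f", "z")], "y": ("n", "z", ("s", "w"))},
]
for theta in EXAMPLES:
    common = shared("x", "y", theta)
    print("share" if common else "do NOT share", sorted(common))
# share ['y']
# do NOT share []
# do NOT share []
# share ['w', 'z']
```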
  • Abstract Substitution Domain
Instead of looking at a (possibly infinite) number of concrete substitutions, we can perform the analysis on a simplified abstract level. Abstract substitutions approximate terms with the sets of variables they contain (not concerned with the exact shape of terms):
    Concrete: θ = {x → f(u, g(v)), y → h(5, u), z → i(v, w)}
    Abstract: Θ = {{x, y}, {x, z}, {z}}
    (the three settings are the sets of domain variables that contain u, v and w, respectively)
By operating in the abstract substitution domain, the analysis task becomes simplified and finite. The abstract substitution domain shown here is not the only applicable choice. For instance, we can work with pair-wise sharing, etc. Different sharing domains also differ with respect to the computational cost and the precision of the analysis. The domain used here is known to be more precise (in the sense of avoiding over-approximation), but exponential in time with respect to the number of variables involved. It is also often combined with additional freeness or groundness information.
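For a single concrete substitution, the abstraction step is straightforward. For each variable in the range, collect the domain variables whose image contains it. Here is a Python sketch under the same hypothetical term encoding (strings as variables, tuples headed by a functor as compound terms). Domain variables without a binding are treated as mapped to themselves.

```python
def variables(term):
    """Variables of a term: str = variable, tuple = (functor, *args), else constant."""
    if isinstance(term, str):
        return {term}
    if isinstance(term, tuple):
        return set().union(*(variables(a) for a in term[1:]))
    return set()


def abstract(theta, domain):
    """Abstract substitution: for each variable u in the range of theta,
    the set of domain variables whose image contains u."""
    groups = {}
    for v in domain:
        for u in variables(theta.get(v, v)):
            groups.setdefault(u, set()).add(v)
    return {frozenset(g) for g in groups.values()}


theta = {
    "x": ("f", "u", ("g", "v")),
    "y": ("h", 5, "u"),
    "z": ("i", "v", "w"),
}
print(sorted(sorted(g) for g in abstract(theta, ["x", "y", "z"])))
# [['x', 'y'], ['x', 'z'], ['z']]
```

Applied to the input substitution coded by init1 (x bound to f1(Name, Pin), d to f2(Symptoms, Tests), e to f3(Symptoms, Coverage)), the same function yields {{x}, {d, e}, {d}, {e}}.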
  • Workflow Input Substitutions
We include the information on user-defined attributes of the input data to the workflow by setting up the initial substitution for the inputs (x, d, e in our case):
    init1(X, D, E) :-
        X = f1(Name, Pin),
        D = f2(Symptoms, Tests),
        E = f3(Symptoms, Coverage).
This reflects the positioning of the inputs in the initial context / concept lattice.
The initial concrete substitution coded here maps to the initial abstract substitution Θ = {{x}, {d, e}, {d}, {e}}:
    • “x has some components not shared with d or e”
    • “d and e share something” (Symptoms), but
    • “both d and e have some private (not shared) components” (Tests and Coverage, respectively)
Again, note that the choice of functors (f1, f2, f3) and of the variable names that stand for the attributes (Name, PIN, etc.) is not significant for the abstract sharing analysis.
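To relate init1 to the input context, the following Python sketch builds the context it encodes. Each input item has, as attributes, the variables that occur in its term. The sketch enumerates the formal concepts of that context by brute force. It also computes the extent of each attribute. These extents, {x}, {d, e}, {d} and {e}, coincide with the settings of the initial abstract substitution Θ. The encoding and function names are illustrative.

```python
from itertools import combinations

# Input context read off init1: object -> its user-defined attributes.
CONTEXT = {
    "x": {"Name", "PIN"},
    "d": {"Symptoms", "Tests"},
    "e": {"Symptoms", "Coverage"},
}


def concepts(context):
    """All formal concepts (extent, intent), found by closing every subset of
    objects. Exponential in the number of objects: toy examples only."""
    objects = list(context)
    all_attrs = set().union(*context.values())
    found = set()
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            intent = set(all_attrs)
            for o in subset:
                intent &= context[o]
            extent = {o for o in objects if intent <= context[o]}
            found.add((frozenset(extent), frozenset(intent)))
    return found


def attribute_extents(context):
    """For each attribute, the set of objects that have it."""
    extents = {}
    for obj, attrs in context.items():
        for a in attrs:
            extents.setdefault(a, set()).add(obj)
    return {frozenset(s) for s in extents.values()}


for extent, intent in sorted(concepts(CONTEXT), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
print(sorted(sorted(s) for s in attribute_extents(CONTEXT)))
```

The six concepts range from ({x, d, e}, no attributes) at the top to (no objects, all attributes) at the bottom. Between them lies ({d, e}, {Symptoms}), the shared part of d and e.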
  • 3.5 Obtaining and Interpreting Results
  • Sharing Results
The sharing results shown in (a) were obtained from the sharing and freeness (shfr) analysis in the CiaoPP analysis suite.
(a) The resulting substitution:
    1 [[X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
    2 [X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
    3 [X,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
    4 [X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
    5 [D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P],
    6 [D,A1,Y,A3,A4,A41,C,A42,P],
    7 [E,A2,Z,A3,A41]]
(b) Points in the resulting sharing lattice:
    Top-level variables      Recovered hidden variables
    X, A5                    {u1, u2, u3, u4}
    E                        {u1, u3, u5, u7}
    D                        {u1, u2, u5, u6}
    A2, Z                    {u1, u2, u3, u4, u5, u7}
    A1, Y, A42, C, P         {u1, u2, u3, u4, u5, u6}
    A3, A4, A41              {u1, u2, u3, u4, u5, u6, u7}
Fig. 8. Abstract substitution and the recovered hidden variables.
The results are safe in the sense that all possible sharing is included. However, they may contain a degree of over-approximation, i.e. the analysis may conservatively assume sharing where it cannot be ruled out with certainty. From an abstract substitution, we do not know which variables are shared, so one possibility is to “recover” a sufficient number of hidden variables that are shared in a manner compatible with the abstract substitution. It is as if the data item and activity variables shared a set of hidden variables u1, ..., u7.
  • Minimal Hidden Variable Recovery
How many hidden variables are needed to comply with the sharing results?
    • As many as there are sharing settings.
    • The hidden variables are counterparts of the user-defined attributes used in the input substitution.
A straightforward algorithm to recover a minimal set of resulting hidden variables U:
    function RecoverSubstVars(V, Θ)
        n ← |Θ|; U ← {u1, u2, ..., un}      ▷ n = |Θ| fresh variables in U
        S : V → ℘(U); S ← const(∅)           ▷ the initial value for the result
        for x ∈ V, i ∈ {1..n} do             ▷ for each variable and subst. setting
            if x ∈ Θ[i] then                 ▷ if the variable appears in the setting
                S ← S[x ↦ S(x) ∪ {ui}]       ▷ add ui to its resulting set
            end if
        end for
        return U, S
    end function
The proof that it is sufficient to “invent” a hidden variable for each sharing setting in the resulting abstract sharing, and that fewer hidden variables than that would not do, follows from the definition of abstract sharing and the monotonicity of logic programs. For any non-empty abstract substitution, there is an infinite number of compatible concrete substitutions, even with a fixed set of hidden variables, because the shape of terms may be arbitrary. That suffices in our case, because we just want to know what is shared, not exactly how.
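RecoverSubstVars translates almost line by line into Python. In this sketch, hidden variables are named u1, u2, ..., and the sharing settings are copied from the CiaoPP output on the Sharing Results slide. The function name and the data layout are illustrative choices, not taken from the original implementation.

```python
# Sharing settings from the CiaoPP results on the "Sharing Results" slide.
SHARING = [s.split(",") for s in (
    "X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5",
    "X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5",
    "X,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5",
    "X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5",
    "D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P",
    "D,A1,Y,A3,A4,A41,C,A42,P",
    "E,A2,Z,A3,A41",
)]
VARS = SHARING[0]  # the first setting lists all 14 variables


def recover_subst_vars(variables, theta):
    """One fresh hidden variable u_i per setting theta[i]; each variable x
    receives u_i for every setting it appears in."""
    hidden = [f"u{i}" for i in range(1, len(theta) + 1)]
    result = {x: set() for x in variables}      # S <- const(empty set)
    for x in variables:
        for u, setting in zip(hidden, theta):
            if x in setting:
                result[x].add(u)                # S <- S[x -> S(x) U {u_i}]
    return hidden, result


U, S = recover_subst_vars(VARS, SHARING)
for var in VARS:
    print(var, sorted(S[var], key=lambda u: int(u[1:])))
```

For example, X and A5 both receive {u1, u2, u3, u4}, and E receives {u1, u3, u5, u7}, as in Fig. 8(b).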
  • Resulting Lattice (Recovered)
[Figure: Fig. 9. The resulting concept lattice over the hidden variables u1, ..., u7, with the item groups x, a5; d; e; a2, z; a1, y, p, a42, c; and a3, a4, a41.]
We can now construct the resulting concept lattice:
    • activity and data item variables as objects
    • recovered hidden variables as attributes
    • activities are highlighted
To interpret the resulting lattice in terms of the original user-defined attributes, we observe that sharing analysis preserves the ordering between concepts in the lattice. That means that the “lesser” concepts in the resulting lattice inherit all original attributes from the input data items (shown in boldface). Therefore, we use the resulting lattice over hidden variables as a skeleton to “paint” intermediate data items and activities with the original user-defined attributes.
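Here is a Python sketch of the “painting” step. It rests on one reading of the rule above, which is our assumption: an item inherits the user-defined attributes of every input item whose set of hidden variables is contained in its own, that is, of every input concept that lies above it. The hidden-variable sets are those of Fig. 8(b), written as indices i of ui. The input attributes are those encoded by init1.

```python
# Hidden-variable sets per item group (Fig. 8(b)), as indices i of u_i;
# the inputs x, d, e are listed on their own, separately from a5.
HIDDEN = {
    "x": {1, 2, 3, 4},
    "d": {1, 2, 5, 6},
    "e": {1, 3, 5, 7},
    "a5": {1, 2, 3, 4},
    "a2, z": {1, 2, 3, 4, 5, 7},
    "a1, y, p, a42, c": {1, 2, 3, 4, 5, 6},
    "a3, a4, a41": {1, 2, 3, 4, 5, 6, 7},
}
# User-defined attributes of the input items, as encoded in init1.
INPUT_ATTRS = {
    "x": {"Name", "PIN"},
    "d": {"Symptoms", "Tests"},
    "e": {"Symptoms", "Coverage"},
}
COLUMNS = ["Name", "PIN", "Symptoms", "Tests", "Coverage"]


def paint(hidden, input_attrs):
    """An item inherits the attributes of every input whose hidden-variable
    set is contained in its own, i.e. every input concept above it."""
    return {
        item: set().union(*(attrs for src, attrs in input_attrs.items()
                            if hidden[src] <= own))
        for item, own in hidden.items()
    }


for item, attrs in paint(HIDDEN, INPUT_ATTRS).items():
    print(f"{item:18}", " ".join("✓" if c in attrs else "·" for c in COLUMNS))
```

Under this reading, a2 and z inherit the attributes of x and e but not Tests. Likewise, a1, y, p, a42 and c inherit those of x and d but not Coverage.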
  • Resulting Context
Finally, after assigning user-defined attributes to the concepts in the resulting lattice, we can create the resulting context over the attributes Name, PIN, Symptoms, Tests and Coverage:
    • the input data items x, d and e (above the line) keep their initial attributes
    • the intermediate data items and activities (below the line), grouped as a2, z; a1, y, p, a42, c; a3, a4, a41; and a5, are added with the assigned attributes
Fig. 10. The resulting context for the two analysis cases.
The resulting context is a simple tabular form that is presented to the user as the result of the sharing analysis. The user starts with the input context (above the line) and the workflow definition, while all other steps are intermediate, mechanical and ideally fully automated. For activities, the attributes indicate the properties of the data visible to (read by) an activity. For data items, the attributes describe the information content of the data and what it was derived from. Note that the sharing analysis is conservative in the sense that all attributes that cannot be decidedly ruled out are included.
  • 4 Application to Fragment Identification
  • Fragmentation Example (Information Flow)
[Figure: Fig. 11. An example fragmentation for the drug prescription workflow. The main medical workflow contains a1: Retrieve medical history; a2: Retrieve medication record; a3: Continue last prescription (if stable); a4: Select new medication (if not stable); a5: Log treatment. The workflow for service a4 contains a41: Run tests to produce medication criteria; a42: Search medication databases, repeated until the result is sufficiently specific. The activities are laid out across the swim-lanes Health Organization, Medical Examiners, Medication Provider, and Registry & Archive.]
Distributing execution of the workflow(s) across organizations:
    • Fragment: a subset of activities sharing a common property
    • Fragments assigned to swim-lanes (partners)
    • Property: access level to sensitive data
        ◦ Medical examiners cannot see insurance coverage
        ◦ Medication providers cannot see medical tests
        ◦ Registry can see only the patient ID
  • Another Input Context Example
1 INITIAL SUBSTITUTION
    init2(X,D,E) :-
        X = f1(Name, Address, SSN),
        D = f2(SSN, Tests, Coverage),
        E = f3(SSN, Coverage).
2 SHARING RESULTS
    u1 [[X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
    u2 [X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
    u3 [X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],
    u4 [D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P],
    u5 [D,A1,Y,A3,A4,A41,C,A42,P]]
3 RESULTING LATTICE
    [Figure: concept lattice over the hidden variables u1, ..., u5, with the item groups x, a5; d; e; z, a2; and y, p, c, a1, a3, a4, a41, a42.]
4 RESULTING CONTEXT
    Attributes Name, Address, SSN, Tests and Coverage over the items x, a5; d; e; z, a2; and y, p, c and the other activities.
5 RESULTING FRAGMENTATION SCHEME
    Swimlane               Activities
    Health Organization    a1, a3, a4, a41, a42
    Medical Examiners      (empty)
    Medication Provider    a2
    Registry & Archive     a5
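The second example can be replayed end to end in a few lines. This Python sketch first recovers one hidden variable per sharing setting. It then paints each activity with the attributes of the inputs that init2 encodes, under the same inheritance rule assumed for the first example: an activity inherits the attributes of every input whose hidden-variable set is contained in its own. Finally, it checks the two access restrictions from the fragmentation example: Medical Examiners must not see Coverage, and the Medication Provider must not see Tests. The Registry constraint is left out because the slides do not say which attributes form the patient ID here. All names are illustrative.

```python
# Sharing settings u1..u5 for init2, from the slide.
SHARING = [s.split(",") for s in (
    "X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5",
    "X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5",
    "X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5",
    "D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P",
    "D,A1,Y,A3,A4,A41,C,A42,P",
)]
INPUTS = {  # attributes of the inputs, as encoded in init2
    "X": {"Name", "Address", "SSN"},
    "D": {"SSN", "Tests", "Coverage"},
    "E": {"SSN", "Coverage"},
}
ACTIVITIES = ["A1", "A2", "A3", "A4", "A41", "A42", "A5"]
# Attributes each restricted partner must not see (from the fragmentation example).
FORBIDDEN = {
    "Medical Examiners": {"Coverage"},
    "Medication Provider": {"Tests"},
}


def hidden_sets(sharing):
    """Variable -> indices of the sharing settings it occurs in (one u_i each)."""
    result = {}
    for i, setting in enumerate(sharing, start=1):
        for v in setting:
            result.setdefault(v, set()).add(i)
    return result


def painted_attrs(hidden, inputs, item):
    """Attributes inherited from inputs whose hidden set lies within item's."""
    return set().union(*(a for src, a in inputs.items() if hidden[src] <= hidden[item]))


def allowed(sharing, inputs, activities, forbidden):
    """For each restricted partner, the activities whose painted attributes it may see."""
    hidden = hidden_sets(sharing)
    return {
        partner: [a for a in activities if not painted_attrs(hidden, inputs, a) & banned]
        for partner, banned in forbidden.items()
    }


print(allowed(SHARING, INPUTS, ACTIVITIES, FORBIDDEN))
# {'Medical Examiners': ['A5'], 'Medication Provider': ['A2', 'A5']}
```

The result agrees with the scheme above in two respects: no activity other than a5 is admissible for the Medical Examiners, and a2 is admissible for the Medication Provider. Placing a5 with Registry & Archive is a further choice that this sketch does not model.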
  • 5 Conclusions
  • Conclusions
Representing inputs, intermediate data and activities using FCA contexts and concept lattices allows lattice-based formulation, interpretation, and reasoning on data attributes.
Sharing analysis of logic programs is a powerful technique for reasoning about (abstract) data sharing, including the sharing of data attributes:
    • it supports complex data and control structures
    • it is applicable both at design time and at run time
Applications include fragmentation (as illustrated), but also:
    • data compliance checking: verifying that sufficient information is exchanged between component services
    • robust top-down development: refining component / sub-workflow specifications
Future work: developing translators from concrete executable languages (BPEL, XPDL, YAWL, etc.) into Horn clause programs to facilitate automatic analysis. Also: analyzing stateful conversations between compositions.
  • References
The content of this presentation is based on the following publications:
Dragan Ivanovic, Manuel Carro, and Manuel Hermenegildo. Automated Attribute Inference in Complex Service Workflows Based on Sharing Analysis. Proceedings of the 8th International Conference on Service Computing - IEEE SCC 2011, IEEE Press, 2011.
Dragan Ivanovic, Manuel Carro, and Manuel Hermenegildo. Automatic Fragment Identification in Workflows Based on Sharing Analysis. In Mathias Weske, Jian Yang, Paul Maglio, and Marcelo Fantinato, editors, Service-Oriented Computing – ICSOC 2010, number 6470 in LNCS. Springer Verlag, 2010.
  • References
Some pointers on Web service analysis and fragmentation:
Daniel Martin, Daniel Wutke, and Frank Leymann. A Novel Approach to Decentralized Workflow Enactment. In EDOC ’08: Proceedings of the 2008 12th International IEEE Enterprise Distributed Object Computing Conference, pages 127–136, Washington, DC, USA, 2008. IEEE Computer Society.
Ustun Yildiz and Claude Godart. Information Flow Control with Decentralized Service Compositions. In Proceedings of ICWS 2007, pages 9–17, 2007.
Oliver Kopp, Rania Khalaf, and Frank Leymann. Deriving Explicit Data Links in WS-BPEL Processes. In International Conference on Services Computing (SCC), 2008.
Rania Khalaf. Note on Syntactic Details of Split BPEL-D Business Processes. Technical Report 2007/2, IAAS, U. Stuttgart, July 2007.
  • References
Some pointers to Formal Concept Analysis (FCA):
Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors. Formal Concept Analysis, Foundations and Applications. Volume 3626 of Lecture Notes in Computer Science. Springer, 2005.
Claudio Carpineto and Giovanni Romano. Concept Data Analysis: Theory and Applications. Wiley, 2004.
B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 2nd edition, 2002.
Sergei O. Kuznetsov and Sergei A. Obiedkov. Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intell., 14(2-3):189–216, 2002.
  • References
Some pointers on program analysis in general and logic programming:
P. Cousot and R. Cousot. Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In ACM Symposium on Principles of Programming Languages (POPL’77). ACM Press, 1977.
F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer, 2005. Second Ed.
M. V. Hermenegildo, F. Bueno, M. Carro, P. López, E. Mera, J.F. Morales, and G. Puebla. An Overview of Ciao and its Design Philosophy. Theory and Practice of Logic Programming, 2011. http://arxiv.org/abs/1102.5497.
  • Acknowledgements The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under grant agreement 215483 (S-Cube).