Ikc 2015

This talk presents techniques for improving fragment reuse in workflows.

Published in: Education

  1. Mining Workflow Repositories for Improving Fragments Reuse. Mariem Harmassi, Daniela Grigori, Khalid Belhajjame (LAMSADE, Université Paris Dauphine)
  2. Workflows
       • Figures: a business process specified using the BPMN notation; a scientific workflow system (Taverna)
       • "A workflow consists of an orchestrated and repeatable pattern of business activity enabled by the systematic organization of resources into processes that transform materials, provide services, or process information" (Workflow Coalition)
       • IKC 2015
  3. Scientific Workflows
       • Scientific workflows are increasingly used by scientists as a means for specifying and enacting their experiments.
       • They tend to be data intensive.
       • The data sets obtained as a result of their enactment can be stored in public repositories to be queried, analyzed and used to feed the execution of other workflows.
  4. Workflows are difficult to design
       • The design of scientific workflows, just like that of business processes, can be a difficult task.
           • Deep knowledge of the domain
           • Awareness of the resources, e.g., programs and web services, that can enact the steps of the workflow
       • Publish and share workflows, and promote their reuse.
           • myExperiment, CrowdLabs, Galaxy, and various business process repositories
       • Reuse is still an aim.
           • There are no capabilities that support the user in identifying the workflows, or fragments thereof, that are relevant for the task at hand.
  5. Fragment look-up in the life cycle of workflow design
       • Diagram labels: Design Workflow, Search Fragments, Run Workflow, Publish Workflow, Workflow repositories
  6. Workflow Fragments Search
       • Why is it useful?
           • The workflow designer knows the steps of the fragment and their dependencies, but does not know the resources (programs or web services) that can be used for their implementation.
           • The designer may want to know how colleagues and third parties designed the fragment (best practices).
       • Elements of the solution
           1. Filtering: instead of searching the whole repository, we limit the workflows to be examined to those that are relevant to the user.
           2. Identify the fragments that are recurrent in the workflows retrieved in (1).
  7. 1 - Filtering step
       • Diagram labels: Workflow XML, Workflow graph, List of keywords, List of keywords & synonyms, WordNet, BP Repository, Filter, Else
  8. 2 - Identify Recurrent Fragments
       • We use graph mining algorithms to identify the fragments in the repository that are recurrent.
       • We use the SUBDUE algorithm.
       • Which graph representation should be used to represent (workflow) fragments?
       • We examined a number of workflow representations.
  9. Representation A and Representation B (diagrams)
       • Representation A: nodes att 1 to att 5; other labels: next, operator, operand, type, And, Xor, sequence
       • Representation B: nodes att 1 to att 5; other labels: next, Split-And, sp-and, Join-Xor, J-Xor, sequence
  10. Representation C (diagram, two graphs)
       • First graph: nodes att 1 to att 5; labels S-att1-att2, S-att1-att3, seq-att2-att4, seq-att4-att5
       • Second graph: nodes att 1, att 2, att 3, att 5; labels S-att1-att2, S-att1-att3, seq-att3-att5
  11. Representation D and Representation D1 (diagrams)
       • Representation D: nodes att 1 to att 5; labels And_att1_att2, And_att1_att3, SEQ_att2_att4, XOR_att3_att5, XOR_att4_att5
       • Representation D1: nodes att 1 to att 5; labels And, And, SEQ, XOR, XOR
  12. Experiments
       • 1st experiment: to assess the suitability of the graph representations for mining workflow graphs
           • Effectiveness: Precision/Recall
           • Memory space: Disk space, DIV
           • Execution time
       • 2nd experiment: to assess the impact of the filtering step in narrowing the search to relevant workflow fragments
  13. Experiment 1: Dataset
       • We created three datasets of workflow specifications, containing respectively 30, 42, and 71 workflows.
       • 9 of these workflows are similar to each other and, as such, contain recurrent structures that should be detected by the mining algorithm.
       • Despite the small size of the collection, these datasets allowed us to distinguish, to a certain extent, between the different representations.
  14. Experiment 1: Input Data Size
  15. Experiment 1: Effectiveness (Precision/Recall)
  16. Representation A and Representation B (diagrams, repeated from slide 9)
       • Representation A: nodes att 1 to att 5; other labels: next, operator, operand, type, And, Xor, sequence
       • Representation B: nodes att 1 to att 5; other labels: next, Split-And, sp-and, Join-Xor, J-Xor, sequence
  17. Experiment 1: Effectiveness (Precision/Recall)
  18. Experiment 1: Execution Time
       • Chart annotations: ≥ 55 times, ≥ 25 times, ≈ 4 times, ≈ 5 times
  19. Experiment 1: Summary
       • Control nodes: recurrent patterns typical of the coding scheme related to the model rule (affects Recall)
       • Labeling the edges: specializations of the same abstract workflow (affects Precision)
       • Xor as a set of alternatives: duplication, loss of information (affects Recall and Precision)
       • Representation D1 therefore seems to be the one that performs best.
  20. Experiment 2
       • Data sets: all Taverna 1 workflows (498 workflows) from myExperiment
       • User query: we use a small fragment from a workflow in myExperiment.
  21. Conclusion
       • Methodology for improving reusability
           • Model of representation D + Filter
       • Improve the filter; test other similarity measures
       • Need to assess the usefulness in practice of the techniques presented, and how they can be incorporated in the workflow design life cycle
       • In the context of the Contextual and Aggregated Information Retrieval (CAIR) project
  22. Mining Workflow Repositories for Improving Fragments Reuse. Mariem Harmassi, Daniela Grigori, Khalid Belhajjame (LAMSADE, Université Paris Dauphine)
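
The two solution steps from slides 6 to 11 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, the function names are invented for the example, the match rule (at least one shared term) is a guess at what the filter does, and the edge list is one reading of the Representation D1 diagram on slide 11. The `v`/`d` line format is modeled on the style of SUBDUE input files and should be checked against the tool's documentation before use.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table standing in for a WordNet lookup.
SYNONYMS: Dict[str, Set[str]] = {
    "fetch": {"retrieve", "download"},
    "align": {"alignment"},
}


def expand_keywords(keywords: Set[str]) -> Set[str]:
    """Return the lower-cased keywords together with their known synonyms."""
    expanded = {k.lower() for k in keywords}
    for keyword in list(expanded):
        expanded |= SYNONYMS.get(keyword, set())
    return expanded


def filter_repository(query: Set[str], repository: Dict[str, Set[str]]) -> List[str]:
    """Keep the workflows sharing at least one term with the expanded query (assumed rule)."""
    terms = expand_keywords(query)
    return [
        name
        for name, keywords in repository.items()
        if terms & {k.lower() for k in keywords}
    ]


# (source activity, target activity, control-flow label)
Edge = Tuple[str, str, str]

# Slide 11 example, read as Representation D1: plain operator labels on edges.
D1_EDGES: List[Edge] = [
    ("att1", "att2", "And"),
    ("att1", "att3", "And"),
    ("att2", "att4", "SEQ"),
    ("att3", "att5", "XOR"),
    ("att4", "att5", "XOR"),
]


def to_graph_lines(edges: List[Edge]) -> List[str]:
    """Serialise edges as 'v <id> <label>' and 'd <src> <dst> <label>' lines."""
    ids: Dict[str, int] = {}
    for source, target, _ in edges:
        for activity in (source, target):
            # Number activities in order of first appearance, starting at 1.
            ids.setdefault(activity, len(ids) + 1)
    vertex_lines = [f"v {i} {name}" for name, i in ids.items()]
    edge_lines = [f"d {ids[s]} {ids[t]} {label}" for s, t, label in edges]
    return vertex_lines + edge_lines
```

Calling `filter_repository({"fetch"}, repo)` on a small dictionary of workflow names and keyword sets returns the names that survive the filter; `to_graph_lines(D1_EDGES)` then yields the text lines that a graph miner would read. Keeping only the operator name on each edge, rather than the operator plus the activity names as in Representation D, is what lets structurally identical fragments with different activities share the same edge labels.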
