Mariem Harmassi, Daniela Grigori, Khalid Belhajjame
LAMSADE, Université Paris Dauphine
Mining Workflow Repositories for Improving Fragments Reuse
Workflows
[Figure: a business process specified using the BPMN notation]
[Figure: a scientific workflow system (Taverna)]
A workflow consists of an orchestrated and repeatable pattern of business activity enabled by the systematic organization of resources into processes that transform materials, provide services, or process information (Workflow Coalition)
IKC 2015
Scientific Workflows
• Scientific workflows are increasingly used by scientists as a means for specifying and enacting their experiments.
• They tend to be data intensive.
• The data sets obtained as a result of their enactment can be stored in public repositories to be queried, analyzed and used to feed the execution of other workflows.
Workflows are difficult to design
• The design of scientific workflows, just like that of business processes, can be a difficult task. It requires:
  • Deep knowledge of the domain
  • Awareness of the resources, e.g., programs and web services, that can enact the steps of the workflow
• Publish and share workflows, and promote their reuse.
  • myExperiment, CrowdLabs, Galaxy, and various other business process repositories
• Reuse is still an aim.
  • There are no capabilities that support the user in identifying the workflows, or fragments thereof, that are relevant for the task at hand.
Fragment look-up in the life cycle of workflow design
[Figure: workflow design life cycle. Diagram labels: Design Workflow, Search Fragments, Run Workflow, Publish Workflow, Workflow repositories]
Workflow Fragments Search
• Why is it useful?
  • The workflow designer knows the steps of the fragment and their dependencies, but does not know the resources (programs or web services) that can be used for their implementation.
  • The designer may want to know how colleagues and third parties designed the fragment (best practices).
• Elements of the solution
  1. Filtering: instead of searching the whole repository, we limit the workflows to be examined to those that are relevant to the user.
  2. Identify the fragments that are recurrent in the workflows retrieved in (1).
1 - Filtering step
[Figure: filtering step. Diagram labels: Workflow XML, Workflow graph, List of keywords, List of keywords & synonyms, WordNet, BP Repository, Filter, Else]
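The filtering step can be illustrated with a minimal sketch. It assumes that a workflow is reduced to a list of activity labels and that the query keywords are expanded with synonyms before matching. The SYNONYMS table is a hypothetical, hand-written stand-in for a WordNet lookup, and the function names and the sample repository are invented for illustration; the slides do not specify how matching is actually done.

```python
import re

# Hypothetical stand-in for a WordNet lookup; the entries are made up for illustration.
SYNONYMS = {
    "fetch": {"retrieve", "get"},
    "sequence": {"seq"},
}

def expand(keywords):
    """Return the keywords plus any synonyms known for them."""
    expanded = set(keywords)
    for word in keywords:
        expanded |= SYNONYMS.get(word, set())
    return expanded

def tokens(label):
    """Split an activity label such as 'Retrieve_Sequence' into lowercase words."""
    return {t for t in re.split(r"[^a-z0-9]+", label.lower()) if t}

def filter_repository(query_keywords, repository):
    """Keep the workflows whose activity labels share a word with the expanded query."""
    wanted = expand({k.lower() for k in query_keywords})
    return [
        name
        for name, labels in repository.items()
        if any(tokens(label) & wanted for label in labels)
    ]

# Toy repository: workflow name -> activity labels (assumed shape, not from the slides).
repository = {
    "wf_blast": ["Retrieve_Sequence", "Run_BLAST"],
    "wf_plot": ["Load_Table", "Draw_Chart"],
    "wf_align": ["get-seq", "Align"],
}

relevant = filter_repository(["fetch", "sequence"], repository)
print(relevant)
```

Only the workflows returned here would be passed on to the mining step, which is the point of filtering: the expensive graph mining runs on a smaller, more relevant set.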
2 - Identify Recurrent Fragments
• We use graph mining algorithms to identify the fragments in the repository that are recurrent.
  • We use the SUBDUE algorithm.
• Which graph representation should be used to represent (workflow) fragments?
  • We examined a number of workflow representations.
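To make the notion of a recurrent fragment concrete, here is a deliberately simplified sketch. It is not SUBDUE: it only counts how many workflows contain each chain of two consecutive edges and keeps the chains that reach a minimum support. The graph encoding (lists of directed edges between activity labels), the function names and the sample workflows are assumptions made for illustration.

```python
from collections import Counter

def two_step_paths(edges):
    """Return every chain a -> b -> c found in a list of directed (source, target) edges."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)
    return {
        (a, b, c)
        for a, targets in successors.items()
        for b in targets
        for c in successors.get(b, ())
    }

def recurrent_fragments(workflows, min_support=2):
    """Count, per fragment, the number of workflows containing it; keep the frequent ones."""
    support = Counter()
    for edges in workflows.values():
        # two_step_paths returns a set, so each workflow counts at most once per fragment.
        support.update(two_step_paths(edges))
    return {frag: n for frag, n in support.items() if n >= min_support}

# Toy workflows: name -> directed edges between activity labels (hypothetical data).
workflows = {
    "wf1": [("fetch", "align"), ("align", "plot")],
    "wf2": [("fetch", "align"), ("align", "plot"), ("plot", "save")],
    "wf3": [("load", "plot")],
}

frequent = recurrent_fragments(workflows)
print(frequent)
```

A real graph miner searches for substructures of arbitrary shape and size, so this fixed-size counting only conveys the idea of support across a repository.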
Representation A
[Figure: example graph. Diagram labels: att1 to att5; operator, operand, type, next; And, Xor, sequence]
Representation B
[Figure: example graph. Diagram labels: att1 to att5; next; Split-And, Join-Xor, sequence; sp-and, J-Xor]
Representation C
[Figure: two example graphs. First graph: att1 to att5 with labels S-att1-att2, S-att1-att3, seq-att2-att4, seq-att4-att5. Second graph: att1, att2, att3, att5 with labels S-att1-att2, S-att1-att3, seq-att3-att5]
Representation D and Representation D1
[Figure: two example graphs over att1 to att5. One uses the labels And_att1_att2, And_att1_att3, SEQ_att2_att4, XOR_att3_att5, XOR_att4_att5; the other uses the labels And, And, SEQ, XOR, XOR]
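The two labeling styles visible in the figure can be sketched as follows. The sketch assumes that each label sits on a directed edge between two activities and that the relations are the five shown in the figure; which style corresponds to D and which to D1 is not stated in the transcript, so the function names describe the labeling instead. All names are hypothetical.

```python
# Control-flow relations of the example: (operator, source activity, target activity).
# The triples are read off the figure labels; treating them as directed edges is an assumption.
RELATIONS = [
    ("And", "att1", "att2"),
    ("And", "att1", "att3"),
    ("SEQ", "att2", "att4"),
    ("XOR", "att3", "att5"),
    ("XOR", "att4", "att5"),
]

def qualified_edges(relations):
    """Label each edge with the operator and both activity names, e.g. 'And_att1_att2'."""
    return [(src, dst, f"{op}_{src}_{dst}") for op, src, dst in relations]

def operator_edges(relations):
    """Label each edge with the operator only, e.g. 'And'."""
    return [(src, dst, op) for op, src, dst in relations]

def distinct_labels(edges):
    """Collect the set of edge labels used by an encoding."""
    return {label for _, _, label in edges}

qualified = distinct_labels(qualified_edges(RELATIONS))
bare = distinct_labels(operator_edges(RELATIONS))
print(len(qualified), sorted(bare))
```

With operator-only labels, the same edge label can match across workflows whatever the activities are called; with qualified labels, an edge only matches when both endpoints carry the same names. This is a property of the encodings as sketched here, not a result reported in the slides.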
Experiments
• 1st experiment: to assess the suitability of the graph representations for mining workflow graphs
  • Effectiveness: precision/recall
  • Memory space: disk space, DIV
  • Execution time
• 2nd experiment: to assess the impact of the filtering step in narrowing the search to relevant workflow fragments.
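For the effectiveness measure, a minimal sketch of precision and recall is given below. It assumes that the fragments returned by the miner and the fragments expected to be found can each be identified by a name, so that both are plain sets; the identifiers and the sample values are invented for illustration and are not the experiment's data.

```python
def precision_recall(found, expected):
    """Precision: share of found fragments that were expected.
    Recall: share of expected fragments that were found."""
    found, expected = set(found), set(expected)
    hits = len(found & expected)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall

# Hypothetical fragment identifiers, not taken from the experiments.
expected = {"frag_a", "frag_b", "frag_c", "frag_d"}
found = {"frag_a", "frag_b", "frag_x"}

p, r = precision_recall(found, expected)
print(p, r)
```

In this toy case two of the three returned fragments are expected ones, and two of the four expected fragments are recovered.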
Experiment 1: Dataset
• We created three datasets of workflow specifications, containing respectively 30, 42, and 71 workflows.
• 9 of these workflows are similar to each other and, as such, contain recurrent structures that should be detected by the mining algorithm.
• Despite the small size of the collection, these datasets allowed us to distinguish, to a certain extent, between the different representations.
Experiment 1: Input Data Size
Experiment 1: Effectiveness (Precision/Recall)
Representation A and Representation B
[Figure: the two example graphs, repeated from the earlier slide]
Experiment 1: Effectiveness (Precision/Recall)
Experiment 1: Execution Time
[Chart annotations: ≥ 55 times, ≥ 25 times, ≈ 4 times, ≈ 5 times]
Experiment 1: Summary
• Control nodes: recurrent patterns typical of the coding scheme related to the model rule (recall affected)
• Labeling the edges: specializations of the same abstract workflow (precision affected)
• Xor as a set of alternatives: duplication, loss of information (recall and precision affected)
• Representation D1 therefore seems to be the one that performs best.
Experiment 2
• Data sets: all Taverna 1 workflows (498 workflows) from myExperiment
• User query: we use a small fragment from a workflow in myExperiment.
Conclusion
• Methodology for improving reusability
  • Representation model D + filter
• Improve the filter
  • Test other similarity measures
• Need to assess the usefulness of the techniques presented in practice, and how they can be incorporated in the workflow design life cycle.
In the context of the Contextual and Aggregated Information Retrieval (CAIR) project

Editor's Notes

  1. Workflows are increasingly used by scientists as a means for specifying and enacting their experiments. Such workflows are often data intensive [5]. The data sets obtained by their enactment have several applications, e.g., they can be used to understand new phenomena or confirm known facts, and therefore such data sets are worth storing (or preserving) for future analyses.
  2. - Scientific workflows have been used to encode in-silico experiments. - The design of scientific workflows can be a difficult task. It requires deep knowledge of the domain as well as awareness of the programs and services available for implementing the workflow steps. - In 2009, De Roure and coauthors pointed out the advantages of sharing and reusing workflows from scientific workflow repositories such as myExperiment, CrowdLabs, Galaxy and others. - The problem is that these repositories keep growing, and many problems relating to the reuse of available workflows have emerged; for example, it has become difficult to distinguish a special use case from a usage pattern. - Mining techniques are therefore a good solution. Let's discuss the most important contributions in mining workflows.
  3. Filtering: Our system extracts from this graph file (the user's workflow) a set of words, namely the words that appear in the labels of the activity nodes. Note that a label may contain more than one word, concatenated with a separator; we extract the complete list of words. We then submit this list to the JAWS API for WordNet, which returns all the synonyms of each word. At this point we have a semantically enriched list. We search the repository using this enriched list: a workflow is retained if it contains a word from the list.
  4. The concept is simple. First, the user enters a workflow (or sub-workflow) in XML format; we transform it into a graph format and then extract the list of unique words mentioned in all the labels of the workflow. We build a list of the keywords and their synonyms using WordNet (the Java API for WordNet Searching (JAWS) retrieves the synsets of a given label from WordNet). We then select from the repository only the BPs/workflows that match at least one word from this list.
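The filtering step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the `SYNONYMS` dictionary is a hypothetical stand-in for the WordNet lookup that the notes perform through JAWS, and the repository contents, workflow names and separator set are invented for the example.

```python
import re

# Hypothetical stand-in for the WordNet synonym lookup (done via JAWS in the notes).
SYNONYMS = {
    "fetch": {"retrieve", "get"},
    "sequence": {"succession"},
}

def label_words(label):
    """Split an activity label into lowercase words on common separators."""
    return {w.lower() for w in re.split(r"[\s_\-.]+", label) if w}

def enriched_keywords(labels, synonyms=SYNONYMS):
    """Collect the unique words of all labels, then add their synonyms."""
    words = set()
    for label in labels:
        words |= label_words(label)
    enriched = set(words)
    for word in words:
        enriched |= synonyms.get(word, set())
    return enriched

def filter_repository(query_labels, repository, synonyms=SYNONYMS):
    """Keep the workflows that share at least one word with the enriched list."""
    keywords = enriched_keywords(query_labels, synonyms)
    return [
        name
        for name, labels in repository.items()
        if any(label_words(label) & keywords for label in labels)
    ]

# Invented repository: workflow name -> labels of its activity nodes.
repository = {
    "wf1": ["retrieve_protein", "run_blast"],
    "wf2": ["plot_histogram"],
    "wf3": ["Get Sequence", "align"],
}
selected = filter_repository(["fetch_sequence"], repository)
print(selected)
```

Only the retained workflows would then be encoded as graphs and passed to the mining step, which is where the reduction in input size comes from.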
  5. The challenges to be addressed are the following: – Which mining algorithm should be employed to find frequent patterns in the repository? – Which graph representation is best suited for formatting workflows for mining frequent fragments? – How should we deal with the heterogeneity of the labels used by different users to model the activities of their workflows within the repository?
  6. We conducted two experiments. The first aims to validate our proposed representation models, D and D1, and to show the drawbacks of the other models by comparing the efficiency and effectiveness of all the models. The second assesses the impact of the semantic filter. Regarding effectiveness, we focus on demonstrating the drawback of representation model C when it comes to extracting recurrent fragments that contain an XOR link. We therefore manually created a synthetic dataset which ensures that the following sub-structure is the most recurrent. As the size of the synthetic dataset is limited (9 BPs), we extended it into three datasets by adding workflows from the Taverna 1 repository, while preserving the property that the most recurrent sub-workflow is the one already presented.
  7. Model A is the most expensive in terms of the disk space required to encode the base in graph format. As expected, model C required more than twice as many bits (and as many nodes and edges) as the models we propose, D and D1; however, this ratio decreases on larger bases, reaching between a quarter and a tenth. This decrease is due to the content of these bases, which have a low percentage of BPs with XOR nodes. Model B comes third: it requires roughly 25% to 40% more than models D and D1 in terms of the number of nodes, edges and bits used. Models D and D1 require the same number of edges and nodes to encode the input data; however, labeling the edges consumes more bits.
  8. We are not concerned with correctly classifying negative instances; we just do not want too many of them polluting our results. Model C: In these experiments, as expected, model C led to the worst qualitative performance. Its recall varies between 0% and 61.54%, with an average of around 35%. Model C can, at best, discover only one alternative at a time (in our case there are two alternatives attached to the XOR node). Model A: The top extracted substructures are more significant than those of model C, and less significant than those of the other models. However, on larger databases the results show a dramatic decline in the quality of its substructures, reaching 0% in terms of precision and recall, which means that no extracted substructure is related to the user's expectation. This limitation can be explained by the excessive use of control nodes: on large input data, their percentage becomes quite significant, leading the Subdue algorithm to consider them important substructures. Model B: Model B performs much better than the previous two models, A and C. It successfully retrieved almost 67% of the BP elements of the target substructure, more than twice as many as model C and between 13% and 66% more than model A. Models B and D, on the other hand, led to very similar accuracy. Although model B was able to discover more relevant BP elements than model D (about 10% more), it also returned more useless or irrelevant BP elements (around 7%). Labeling the edges leads to specializations of the same abstract workflow template and consequently affects the quality of the results returned (decreased recall). Model D: Models D and D1 share a common behavior that distinguishes them from the other models: both achieve a good precision rate.
This is because these two models do not use control nodes and thereby avoid their negative influence on the results. In the models that do use them, the percentage of control nodes becomes quite significant on large input data, leading the Subdue algorithm to treat substructures that are typical of the model's coding scheme as significant (decreased precision). The results of the first experiment clearly show that model D1 achieves the best performance on all levels without exception. Accuracy = (TP + TN) / (TP + TN + FP + FN)
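As a small illustration of the measures used in this note, the sketch below computes precision, recall and accuracy in Python. The element sets and the true-negative count are hypothetical values chosen for the example, not figures from the experiments.

```python
def precision(tp, fp):
    """Fraction of extracted elements that are relevant."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of the target elements that were extracted."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

# Hypothetical BP elements: the target sub-structure vs. what the miner returned.
target = {"check_order", "xor_split", "ship", "refund"}
extracted = {"check_order", "xor_split", "ship", "log_event"}

tp = len(target & extracted)   # relevant elements that were found
fp = len(extracted - target)   # irrelevant elements that were returned
fn = len(target - extracted)   # relevant elements that were missed
tn = 6                         # assumed count of irrelevant elements correctly left out

p = precision(tp, fp)
r = recall(tp, fn)
a = accuracy(tp, tn, fp, fn)
print(p, r, a)
```

Precision and recall ignore the true negatives entirely, which matches the remark that correctly classifying negative instances is not the concern here.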
  10. Model A is the most expensive in terms of execution time, between about 25 and 55 times more than models D and D1. Let us compare the other models. Although model B performs better than model C on the qualitative level, model C appears to be far less expensive. As expected, models D and D1 led to very similar performances, with model D1 performing slightly better.
  11. The results of the second experiment show that the semantic filter reduced the input data size (in bits) by 99%, which dramatically improved the execution time (36 times less).
  12. Decrease the disk space; decrease the RAM; decrease the execution time; increase the quality of results.