This document proposes a method to improve the reuse of workflow fragments by mining workflow repositories. It evaluates different graph representations of workflows and uses the SUBDUE algorithm to identify recurrent fragments. An experiment compares representations on precision, recall, memory usage, and time. Representation D1, which labels edges and nodes, performed best. A second experiment assesses how filtering workflows by keywords impacts finding relevant fragments for a user query. The method aims to incorporate workflow fragment search capabilities into the design lifecycle to promote reuse.
Linking the prospective and retrospective provenance of scripts - Khalid Belhajjame
Scripting languages like Python, R, and MATLAB have seen significant use across a variety of scientific domains. To assist scientists in the analysis of script executions, a number of mechanisms, e.g., noWorkflow, have been recently proposed to capture the provenance of script executions. The provenance information recorded can be used, e.g., to trace the lineage of a particular result by identifying the data inputs and the processing steps that were used to produce it. By and large, the provenance information captured for scripts is fine-grained in the sense that it captures data dependencies at the level of individual script statements, and does so for every variable within the script. While useful, the amount of recorded provenance information can be overwhelming for users and cumbersome to use. This suggests the need for abstraction mechanisms that focus attention on the specific parts of provenance relevant for analyses. Toward this goal, we advocate that fine-grained provenance information recorded as the result of script execution can be abstracted using user-specified, workflow-like views. Specifically, we show how the provenance traces recorded by noWorkflow can be mapped to the workflow specifications generated by YesWorkflow from scripts based on user annotations. We examine the issues in constructing a successful mapping, provide an initial implementation of our solution, and present competency queries illustrating how a workflow view generated from the script can be used to explore the provenance recorded during script execution.
I gave this talk at TAPP 2014 during the provenance week in Cologne, on inferring fine-grained dependencies between data (ports) in scientific workflows. -- khalid
The document discusses assisting designers in composing workflows through the reuse of frequent workflow fragments mined from repositories. It proposes an approach that involves mining fragments, representing workflows as graphs, homogenizing activity labels, and allowing users to search for fragments using keywords and activities from their initial workflow. Fragments are retrieved based on relevance to keywords and compatibility to specified activities, then ranked and presented to users for composition. Experiments assess different graph representations for mining fragments in terms of effectiveness, size and runtime. The approach aims to help designers reuse best practices from repositories when specifying new workflows.
Stacks in algorithms and data structures - faran nawaz
This document discusses stacks and their applications such as converting infix expressions to postfix and prefix expressions. It covers expression evaluation and different notations like polish notation. The key advantages of postfix notation are that it avoids ambiguities of infix notation and allows simple left to right evaluation of expressions using a stack. The document provides examples of infix, prefix and postfix expressions and the steps to convert between the notations. It also describes the process of postfix expression evaluation using a stack.
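To make the evaluation procedure concrete, here is a minimal Java sketch of postfix evaluation with a stack; the space-separated tokens and four-operator grammar are simplifying assumptions, not details from the document.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixEval {
    // Evaluates a space-separated postfix expression, e.g. "2 3 4 * +".
    static int evaluate(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : expr.split("\\s+")) {
            if (token.matches("-?\\d+")) {
                stack.push(Integer.parseInt(token)); // operand: push onto the stack
            } else {
                int right = stack.pop();             // operator: pop two operands,
                int left = stack.pop();              // apply it, push the result
                switch (token) {
                    case "+": stack.push(left + right); break;
                    case "-": stack.push(left - right); break;
                    case "*": stack.push(left * right); break;
                    case "/": stack.push(left / right); break;
                    default: throw new IllegalArgumentException("unknown operator: " + token);
                }
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 3 4 * +")); // infix 2 + 3 * 4 -> 14
    }
}
```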
This document is a presentation on data structures in C# by Mr. Mahmoud R. Alfarra. It introduces C# and its uses in different applications. It covers various data types in C#, calculations and logical operations, control statements like if/else and loops. The document also discusses arrays, methods, and classes in C#. It provides examples to explain concepts like selection statements, iteration, and calling methods. The presentation aims to provide an introduction to the principles of C# for learning purposes.
This document provides an outline and overview of linked lists. It defines a linked list as a collection of nodes that are linked together by references to the next node. Each node contains a data field and a reference field. It describes how to implement a linked list using a self-referential class with fields for data and a reference to the next node. It then outlines common linked list operations like insertion and deletion at different positions as well as sorting and searching the linked list.
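A minimal Java sketch of the self-referential class and front insertion described above (the class and method names are illustrative, not taken from the document):

```java
public class LinkedListDemo {
    // Self-referential node: a data field plus a reference to the next node.
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    Node head;

    // Insert at the front: O(1), just rewire the head reference.
    void insertFirst(int value) {
        Node node = new Node(value);
        node.next = head;
        head = node;
    }

    // Linear search: follow next references until the value is found.
    boolean contains(int value) {
        for (Node cur = head; cur != null; cur = cur.next) {
            if (cur.data == value) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LinkedListDemo list = new LinkedListDemo();
        list.insertFirst(2);
        list.insertFirst(1);
        System.out.println(list.contains(2)); // true
    }
}
```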
Chapter 3: basic sorting algorithms data structure - Mahmoud Alfarra
The document provides an outline and introduction for a chapter on basic sorting algorithms, including bubble sort, selection sort, and insertion sort algorithms. It includes pseudocode examples and explanations of each algorithm. It notes that bubble sort is one of the slowest but simplest algorithms, involving values "floating" to their correct positions. Selection sort finds the smallest element and places it in the first position, then repeats to find the next smallest. Insertion sort works by moving larger elements to the right to make room for smaller elements inserted from the left.
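As a concrete instance of one of the three algorithms, here is a generic Java insertion sort that shifts larger elements right to make room, as described; it is a sketch, not the chapter's own pseudocode:

```java
public class InsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];          // element to insert into the sorted prefix a[0..i-1]
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];     // shift larger elements one slot to the right
                j--;
            }
            a[j + 1] = key;          // drop the element into its position
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 4, 6, 1, 3};
        sort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, 3, 4, 5, 6]
    }
}
```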
JIMS Vasant Kunj-II teaches the C language to first-semester BCA students. In this PDF, you can read the fundamentals of arrays. JIMS Vasant Kunj-II is one of the best BCA colleges in Delhi NCR with an updated curriculum.
Chapter 4: basic search algorithms data structure - Mahmoud Alfarra
1) The document discusses two common search algorithms: sequential search and binary search. Sequential search looks at each item in a list sequentially until the target is found. Binary search works on a sorted list and divides the search space in half at each step.
2) It provides pseudocode examples of how each algorithm works step-by-step to find a target value in a list or array.
3) Binary search is more efficient than sequential search when the list is sorted, as it can significantly reduce the number of comparisons needed to find the target. Sequential search is used when the list is unsorted.
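A minimal Java rendering of the two searches (illustrative; the chapter's pseudocode may differ):

```java
public class SearchDemo {
    // Sequential search: works on unsorted data, O(n) comparisons.
    static int sequentialSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) return i;
        }
        return -1;
    }

    // Binary search: requires sorted data, halves the range each step, O(log n).
    static int binarySearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] == target) return mid;
            if (a[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11};
        System.out.println(sequentialSearch(sorted, 7)); // 3
        System.out.println(binarySearch(sorted, 7));     // 3
    }
}
```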
The BCA Department of Information Technology and Software Development teaches Java and Advanced Java in the third and fifth semesters. The department faculty teach the software with the latest tools used by IT experts in software companies. We teach Java and Advanced Java in Eclipse, NetBeans, and IntelliJ.
JIMS Vasant Kunj-II provides the best BCA course. This is one of the best BCA colleges in Delhi NCR. Admissions for 2022 are open and interested students can apply.
www.jimssouthdelhi.com
We at JIMS Vasant Kunj-II use the latest tools for all the languages included in the curriculum.
Our BCA curriculum is kept up to date with industry demands and standards.
This document discusses Java lambdas and streams. It begins with an introduction to the speaker, Oleg Tsal-Tsalko, and provides an overview of Java 8 streams including their benefits and common operations. It then covers lambda expressions, functional interfaces, and how lambdas and streams have influenced existing Java classes. The document concludes by providing instructions for downloading a test project to practice using lambdas and streams.
A short introduction to, and practical experiment with, the new features available to the Java craftsman.
Java 8 is mostly about new functional programming paradigms.
Lambdas, method references and the stream API are powerful and yet sometimes hard-to-understand concepts. These slides are mainly about describing those features in a way that is easy to understand, through analogies, thought experiments and discussions.
This document discusses data structures in Java. It begins with an introduction to data structures and their classification as linear, non-linear, static, or dynamic memory allocation. Linear structures include linked lists, stacks, and queues, while non-linear structures are trees and graphs. The document then covers generic programming in Java, collection classes like ArrayLists and LinkedLists, and applications of common data structures. Big-O notation for analyzing algorithms is also introduced. The document contains examples and source code to demonstrate working with collections in Java.
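A small, self-contained sketch of the collection classes mentioned above; the complexity comments reflect standard JDK behavior rather than figures from the document:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class CollectionsDemo {
    public static void main(String[] args) {
        // ArrayList: array-backed, O(1) random access, O(n) insertion at the front.
        List<String> array = new ArrayList<>();
        array.add("stack");
        array.add("queue");
        System.out.println(array.get(0)); // fast positional access

        // LinkedList: node-backed, O(1) insertion at either end, O(n) random access.
        LinkedList<String> linked = new LinkedList<>();
        linked.addFirst("tree");
        linked.addLast("graph");
        System.out.println(linked.getFirst());
    }
}
```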
This is a 3 part series on Java8 Features. Drop me an email for a discussion - singh.marut@gmail.com
Code is available at https://github.com/singhmarut/java8training
Videos available at my youtube channel https://www.youtube.com/channel/UCBM4yHwfjQ_syW6Lz8kYpmA
This presentation provides an overview of the new features of Java 8, namely default methods, functional interfaces, lambdas, method references, streams and Optional vs NullPointerException.
This presentation by Arkadii Tetelman (Lead Software Engineer, GlobalLogic) was delivered at Java.io 3.0 conference in Kharkiv on March 22, 2016.
In this report, we produce a dynamic analysis approach which extracts all function definitions that can be hoisted, using the dynamic analysis framework Jalangi. This approach was evaluated on the following JS libraries: Q1, Underscore and Lodash. The accuracy of this approach was 100%, 50%, and 100% respectively.
Keywords: Hoisting Functions, Nested Functions, Dynamic Analysis.
This document provides an outline and overview of hashing and hash tables. It defines hashing as a technique for storing data to allow for fast insertion, retrieval, and deletion. A hash table uses an array and hash function to map keys to array indices. Collision resolution techniques like linear probing are discussed. The document also summarizes the Hashtable class in .NET, which uses buckets and load factor to avoid collisions. Examples of hash functions and using the Hashtable class are provided.
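A minimal sketch of the linear-probing idea described above, assuming a fixed capacity and no deletion or resizing, so it illustrates the collision-resolution scheme rather than the .NET Hashtable class:

```java
public class LinearProbingTable {
    private final String[] keys = new String[16];
    private final int[] values = new int[16];

    private int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % keys.length; // hash -> array index
    }

    void put(String key, int value) {
        int i = indexFor(key);
        // Linear probing: on a collision, step to the next slot until a free
        // slot (or the same key) is found. Assumes the table never fills up.
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;
        }
        keys[i] = key;
        values[i] = value;
    }

    Integer get(String key) {
        int i = indexFor(key);
        while (keys[i] != null) {
            if (keys[i].equals(key)) return values[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable();
        t.put("alpha", 1);
        t.put("beta", 2);
        System.out.println(t.get("beta")); // 2
    }
}
```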
This is a 3 part series on Java8 Features. Drop me an email for a discussion - singh.marut@gmail.com
https://github.com/singhmarut/java8training
Videos available at my youtube channel https://www.youtube.com/channel/UCBM4yHwfjQ_syW6Lz8kYpmA
The document discusses new features in Java 8 including lambda expressions, method references, default methods, static methods, and streams. Lambda expressions allow implementing functional interfaces with anonymous functions. Method references provide a shorthand for lambda expressions that call existing methods. Default methods allow adding new methods to interfaces without breaking existing implementations. Static methods enable calling methods on interfaces without instantiating them. Streams provide a new way to process collections through functional-style operations like map, filter, and reduce.
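The listed features in one compact example, invented here for illustration rather than taken from the document:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Java8Features {
    // Default and static methods on an interface.
    interface Greeter {
        String name();
        default String greet() { return "Hello, " + name(); }   // default method
        static Greeter of(String n) { return () -> n; }          // static method + lambda
    }

    public static void main(String[] args) {
        Greeter g = Greeter.of("Java 8");
        System.out.println(g.greet());

        // Streams: process a collection in functional style.
        List<String> names = Arrays.asList("ada", "alan", "grace", "edsger");
        List<String> result = names.stream()
                .filter(n -> n.startsWith("a"))     // lambda expression
                .map(String::toUpperCase)           // method reference
                .collect(Collectors.toList());
        System.out.println(result); // [ADA, ALAN]
    }
}
```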
SherLog: Error Diagnosis Through Connecting Clues from Run-time Logs - Lisong Guo
The document describes SherLog, a tool that helps debug systems by connecting clues from runtime logs. It infers possible failure-inducing execution paths and constraints along the paths by matching log sequences to control and data flow in source code. It also symbolically executes paths to infer the value flow of variables. The tool was evaluated on real-world bugs and found to infer useful reproduction information.
The document discusses different types of variable storage classes in C programming:
- Automatic variables are local to the function they are declared in.
- External variables have scope from their point of declaration to the end of the program.
- Static variables are local to the function they are declared in but retain their value between calls.
A sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order.
What is a sorting algorithm?
The bubble sort
The selection sort
The insertion sort
The Quick sort
The Shell Sort
This document discusses stack organization and operations. A stack is a last-in, first-out data structure where items added last are retrieved first. It uses a stack pointer to track the top of the stack. Common operations are push, which adds an item to the top of the stack, and pop, which removes an item from the top. Stacks can be implemented with registers, using a stack pointer and data register. Reverse Polish notation places operators after operands, making it suitable for stack-based expression evaluation.
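A register-style rendering of the same push/pop behavior, sketched in Java with an explicit stack pointer; the fixed capacity is an assumption for illustration:

```java
public class ArrayStack {
    private final int[] data = new int[64];
    private int sp = 0; // stack pointer: index of the next free slot (top is sp - 1)

    void push(int value) {
        if (sp == data.length) throw new IllegalStateException("stack overflow");
        data[sp++] = value;     // store, then advance the pointer
    }

    int pop() {
        if (sp == 0) throw new IllegalStateException("stack underflow");
        return data[--sp];      // retreat the pointer, then load
    }

    public static void main(String[] args) {
        ArrayStack s = new ArrayStack();
        s.push(1);
        s.push(2);
        System.out.println(s.pop()); // 2: last in, first out
    }
}
```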
This document provides an overview of Java 8 lambda expressions. It begins with an introduction and background on anonymous classes in previous Java versions. The main topics covered include lambda syntax, lambda expressions vs closures, capturing variables in lambda expressions, type inference, and functional interfaces. It also discusses stream operations like filter, map, flatMap, and peek. Finally, it covers parallelism and how streams can leverage multiple cores to improve performance.
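A runnable sketch of the stream operations named above, with an invented data set; the parallel variant at the end shows how the same pipeline can leverage multiple cores:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamOps {
    public static void main(String[] args) {
        List<List<Integer>> nested = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5, 6));

        List<Integer> evensDoubled = nested.stream()
                .flatMap(List::stream)                      // flatMap: flatten nested lists
                .peek(n -> System.out.println("saw " + n))  // peek: observe without changing
                .filter(n -> n % 2 == 0)                    // filter: keep even numbers
                .map(n -> n * 2)                            // map: transform each element
                .collect(Collectors.toList());
        System.out.println(evensDoubled); // [4, 8, 12]

        // Parallelism: the same pipeline can run across cores with parallelStream().
        int sum = nested.parallelStream()
                .flatMap(List::stream)
                .mapToInt(Integer::intValue)
                .sum();
        System.out.println(sum); // 21
    }
}
```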
This document discusses function types in Scala, including defining functions as objects or classes that extend function types. It provides examples of defining a factorial function recursively and using function types, as well as discussing recursion patterns like base cases and inductive steps. It also briefly mentions fold operations like foldLeft and foldRight, and using for comprehensions as a substitute for map/flatMap/filter operations.
A use case designed in the context of the DataONE provenance working group, illustrating how the provenance traces generated by different workflow engines can be queried via the D-PROV model.
These slides introduce the second edition of ProvBench, which I am leading, to collect a corpus of provenance data for benchmarking for the provenance (and scientific) community.
This document proposes representing scientific workflows as first-class citizens called research objects. It presents a model for workflow research objects that aggregates all necessary elements to understand an investigation. These include experiments, annotations, results, datasets and provenance. Research objects are encoded using semantic technologies like RDF and follow standards such as the Object Exchange model. The lifecycle of research objects is also described.
I gave this talk at the EDBT 2014 conference, which took place in Athens, Greece.
I show how data examples can be used to characterize the behavior of scientific modules. I present a new method that automatically generates the data examples, and show that such data examples are useful for the human user to understand the task of the modules, and that they can be used to assist curators in repairing broken workflows (i.e., workflows for which one or more modules are no longer supplied by their providers).
A Sightseeing Tour of PROV and Some of its Extensions - Khalid Belhajjame
This document provides an overview of the PROV provenance model and some of its extensions. It discusses the motivation for provenance, the history and development of the PROV model, its key concepts of entities, activities, and agents. It also describes extensions like ProvONE and PAV that build upon PROV to model workflow and scientific provenance.
This is a keynote that I gave at the polyweb workshop on the state of the art of data science reproducibility. In the first part, I review tools that have been developed over the last few years. In the second part, I focus on proposals that I have been involved in to facilitate workflow reproducibility and preservation.
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams - Lightbend
Audience: Architects, Data Scientists, Developers
Technical level: Introductory
From home intrusion detection, to self-driving cars, to keeping data center operations healthy, Machine Learning (ML) has become one of the hottest topics in software engineering today. While much of the focus has been on the actual creation of the algorithms used in ML, the less talked-about challenge is how to serve these models in production, often utilizing real-time streaming data.
The traditional approach to model serving is to treat the model as code, which means that ML implementation has to be continually adapted for model serving. As the amount of machine learning tools and techniques grows, the efficiency of such an approach is becoming more questionable. Additionally, machine learning and model serving are driven by very different quality of service requirements; while machine learning is typically batch, dealing with scalability and processing power, model serving is mostly concerned with performance and stability.
In this webinar with O’Reilly author and Lightbend Principal Architect, Boris Lublinsky, we will define an alternative approach to model serving, based on treating the model itself as data. Using popular frameworks like Akka Streams and Apache Flink, Boris will review how to implement this approach, explaining how it can help you:
* Achieve complete decoupling between the model implementation for machine learning and model serving, enforcing better standardization of your model serving implementation.
* Enable dynamic updates of the served model without having to restart the system.
* Utilize Tensorflow and PMML as model representations, and use them for building a “real-time updatable” model serving architecture.
The document discusses complex event processing (CEP) and its use of Oracle Continuous Query Language (CQL) to process streaming data. CEP can process hundreds of thousands of events per second and detect patterns. CQL extends SQL to enable developers to query streaming data. The document provides examples of how CEP can be used in applications like monitoring financial transactions and algorithmic stock trading. It demonstrates the use of CQL concepts like streams, relations, operators and patterns to implement solutions.
Microservices Part 4: Functional Reactive Programming - Araf Karsh Hamid
ReactiveX combines the best ideas from the Observer pattern, the Iterator pattern, and functional programming. It allows for asynchronous and event-based programming by using the Observer pattern to push data to observers, rather than using a synchronous pull-based approach.
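A bare-bones sketch of the push-based Observer idea that underlies ReactiveX. This is a hand-rolled toy, not the Rx API; real Observable/Observer types add scheduling, completion and error signals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class PushSource<T> {
    private final List<Consumer<T>> observers = new ArrayList<>();

    // Observers subscribe; the source then pushes values to them,
    // instead of observers pulling values through an iterator.
    public void subscribe(Consumer<T> observer) {
        observers.add(observer);
    }

    public void emit(T value) {
        for (Consumer<T> o : observers) o.accept(value); // push to every observer
    }

    public static void main(String[] args) {
        PushSource<String> source = new PushSource<>();
        source.subscribe(v -> System.out.println("observer A got " + v));
        source.subscribe(v -> System.out.println("observer B got " + v));
        source.emit("event-1");
    }
}
```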
The document discusses object oriented programming concepts. It describes key object oriented programming concepts like encapsulation, inheritance, polymorphism, and message passing. It also discusses benefits of object oriented programming like modularity, reusability and extensibility. Some disadvantages discussed are compiler overhead, runtime overhead and need for reorientation of developers to object oriented thinking. The document also covers C++ programming concepts like data types, control flow, functions, arrays, strings, pointers and storage classes.
Over time, Machine Learning inference workloads became more and more demanding in terms of latency and throughput. Moreover, many inference workloads compute predictions based on a limited number of models that are deployed in the system. This scenario provides large rooms for optimizations of runtime and memory, which current systems fall short in exploring because they employ a black-box model of ML models and tasks.
On the opposite side, Pretzel adopts a white-box description of ML models, which allows the framework to perform optimizations over deployed models and running tasks, saving memory and increasing the overall system performance. In particular, Pretzel can properly schedule ML jobs on NUMA machines, whose complexities may impact latencies and efficiency aspects.
In this talk we will show the motivations behind Pretzel, its current design and possible future developments.
PhD Thesis: Mining abstractions in scientific workflows - dgarijo
Slides of the presentation for my PhD dissertation. I strongly recommend downloading the slides, as they have animations that are easier to see in PowerPoint. The abstract of the thesis is as follows: "Scientific workflows have been adopted in the last decade to represent the computational methods used in in silico scientific experiments and their associated research products. Scientific workflows have demonstrated to be useful for sharing and reproducing scientific experiments, allowing scientists to visualize, debug and save time when re-executing previous work. However, scientific workflows may be difficult to understand and reuse. The large amount of available workflows in repositories, together with their heterogeneity and lack of documentation and usage examples may become an obstacle for a scientist aiming to reuse the work from other scientists. Furthermore, given that it is often possible to implement a method using different algorithms or techniques, seemingly disparate workflows may be related at a higher level of abstraction, based on their common functionality. In this thesis we address the issue of reusability and abstraction by exploring how workflows relate to one another in a workflow repository, mining abstractions that may be helpful for workflow reuse. In order to do so, we propose a simple model for representing and relating workflows and their executions, we analyze the typical common abstractions that can be found in workflow repositories, we explore the current practices of users regarding workflow reuse and we describe a method for discovering useful abstractions for workflows based on existing graph mining techniques. Our results expose the common abstractions and practices of users in terms of workflow reuse, and show how our proposed abstractions have potential to become useful for users designing new workflows".
This document proposes extending algorithmic skeletons with event-driven programming to address the inversion of control problem in skeleton frameworks. It introduces event listeners that can be registered at event hooks within skeletons to access runtime information. This allows implementing non-functional concerns like logging and performance monitoring separately from the core parallel logic. The approach is implemented in the Skandium skeleton library, and examples are given of a logger and online performance monitor built using it. An analysis shows the overhead of processing events is negligible, at around 20 microseconds per event.
Operationalizing Machine Learning: Serving ML Models - Lightbend
Join O’Reilly author and Lightbend Principal Architect, Boris Lublinsky, as he discusses one of the hottest topics in software engineering today: serving machine learning models.
Typically with machine learning, different groups are responsible for model training and model serving. Data scientists often introduce their own machine-learning tools, causing software engineers to create complementary model-serving frameworks to keep pace. It’s not a very efficient system. In this webinar, Boris demonstrates a more standardized approach to model serving and model scoring:
* How to develop an architecture for serving models in real time as part of input stream processing
* How this approach enables data science teams to update models without restarting existing applications
* Different ways to build this model-scoring solution, using several popular stream processing engines and frameworks
CS 301 Computer Architecture (.docx) - faithxdunce63732
This document summarizes the results of simulations run to analyze the performance of different processor configurations with varying levels of instruction-level parallelism. The key findings are:
1) For processors with significant memory latency, there is little performance difference between simple in-order and more complex out-of-order designs, as memory latency dominates execution time.
2) Supporting just two concurrently pending instructions provides most of the benefit of more complex out-of-order execution, while greatly reducing hardware complexity.
3) As the mismatch between processor and memory system performance increases, all designs see similar performance, regardless of the level of instruction-level parallelism exploited.
The document discusses using Model Driven Architecture (MDA) to reengineer legacy software systems in a more automated way compared to traditional reengineering approaches. MDA provides platform independent and specific models that can be used to generate code for different platforms, formalizing the mapping of services between source and target platforms. Several papers are referenced that propose techniques for static and dynamic analysis of code to generate UML models as part of the reengineering process using MDA.
Cooperative Task Execution for Apache Spark - Databricks
Apache Spark has enabled a vast assortment of users to express batch, streaming, and machine learning computations, using a mixture of programming paradigms and interfaces. Lately, we observe that different jobs are often implemented as part of the same application to share application logic, state, or to interact with each other. Examples include online machine learning, real-time data transformation and serving, low-latency event monitoring and reporting. Although the recent addition of Structured Streaming to Spark provides the programming interface to enable such unified applications over bounded and unbounded data, the underlying execution engine was not designed to efficiently support jobs with different requirements (i.e., latency vs. throughput) as part of the same runtime. It therefore becomes particularly challenging to schedule such jobs to efficiently utilize the cluster resources while respecting their requirements in terms of task response times. Scheduling policies such as FAIR could alleviate the problem by prioritizing critical tasks, but the challenge remains, as there is no way to guarantee no queuing delays. Even though preemption by task killing could minimize queuing, it would also require task resubmission and loss of progress, leading to wasted cluster resources. In this talk, we present Neptune, a new cooperative task execution model for Spark with fine-grained control over resources such as CPU time. Neptune utilizes Scala coroutines as a lightweight mechanism to suspend task execution with sub-millisecond latency and introduces new scheduling policies that respect diverse task requirements while efficiently sharing the same runtime. Users can directly use Neptune for their continuous applications as it supports all existing DataFrame, DataSet, and RDD operators. We present an implementation of the execution model as part of Spark 2.4.0 and describe the observed performance benefits from running a number of streaming and machine learning workloads on an Azure cluster.
Speaker: Konstantinos Karanasos
The document provides an introduction to an advanced production accounting (APA) framework called APA-FP-IMF that is applied to a tin-iron flotation plant case study. The framework models the plant using a unit-operation-port-state superstructure (UOPSS) and solves the resulting bilinear data reconciliation problem to determine if any gross errors are present in the plant data. The analysis finds no statistically detectable gross errors.
Advanced Production Accounting (APA) uses statistical data reconciliation and regression to clean past production data when processes are assumed to be at steady-state. It defines a simultaneous mass and volume problem with density. This is depicted in an oil refinery flowsheet using the unit-operation-port-state superstructure (UOPSS). Key differences from prior work are that UOPSS uses "ports" to represent flows, requiring fewer quality measurements. Industrial Modeling Framework (IMF) implements the mathematical formulations using IMPRESS modeling language and solvers like SECQPE and SORVE for data reconciliation problems. IMFs provide pre-configured models for industrial projects.
This document discusses online analytical processing (OLAP) for business intelligence using a 3D architecture. It proposes the Next Generation Greedy Dynamic Mix based OLAP algorithm (NGGDM-OLAP) which uses a mix of greedy and dynamic approaches for efficient data cube modeling and multidimensional query results. The algorithm constructs execution plans in a top-down manner by identifying the most beneficial view at each step. The document also describes OLAP system architecture, multidimensional data modeling, different OLAP analysis models, and concludes that integrating OLAP and data mining tools can benefit both areas.
Many Machine Learning inference workloads compute predictions based on a limited number of models that are deployed together in the system. These models often share common structure and state. This scenario provides large rooms for optimizations of runtime and memory, which current systems fall short in exploring because they employ a black-box model of ML models and tasks, thus being unaware of optimization and sharing opportunities.
On the opposite side, Pretzel adopts a white-box description of ML models, which allows the framework to perform optimizations over deployed models and running tasks, saving memory and increasing the overall system performance. In this talk we will show the motivations behind Pretzel, its current design and possible future developments.
The document provides an introduction to data structures and algorithm analysis. It explains that a program consists of data organized in a structure and an algorithm, i.e., a sequence of steps to solve a problem. A data structure is how data is organized in memory and an algorithm is the step-by-step process. It describes abstraction as focusing on relevant problem properties to define entities called abstract data types that specify what can be stored and what operations can be performed. Algorithms transform data structures from one state to another and are analyzed based on their time and space complexity.
The document summarizes different statistical methods for analyzing the capacity of an Oracle database server:
Simple math models can analyze single metrics like CPU usage but have low precision. Linear regression analysis uses logical I/O to predict CPU utilization and found a database could handle 2000% more workload. Queuing theory models CPU and I/O as queues and uses Erlang C formulas to forecast capacity and scalability in a more precise way than simple models. The document provides examples of applying these methods to capacity analysis.
This document discusses a framework for certifying workflows in a component-based cloud computing platform for high-performance computing (HPC) services. It presents a Scientific Workflow Component Certifier (SWC2) that uses the mCRL2 model checking toolset to verify workflows meet safety and liveness properties. It describes translating scientific workflows specified in SAFeSWL into the mCRL2 input language and evaluating default and application-specific properties. As a case study, it models and certifies a MapReduce workflow in the HPC Shelf system, checking 20 formal properties and measuring certification times for different components.
Presented in this short document is a description of what we call "Advanced" Property Tracking or Tracing (APT). APT is the term given to the technique of predicting, simulating, calculating or estimating the properties (i.e., densities, compositions, conditions, qualities, etc.) in a network or superstructure with significant inventory using statistical data reconciliation and regression (DRR).
Lineage-Preserving Anonymization of the Provenance of Collection-Based Workflows - Khalid Belhajjame
I gave this talk at the EDBT'2020 conference. It shows how the provenance of workflows can be anonymized without compromising lineage relationships between the data records that are used and generated by the modules that compose the workflow.
Privacy-Preserving Data Analysis Workflows for eScience - Khalid Belhajjame
This document discusses an approach for preserving privacy in scientific workflows that use large datasets. It proposes using k-anonymity to anonymize sensitive workflow data. Parameter dependencies are leveraged to identify sensitive parameters and infer appropriate anonymity degrees. The approach was tested on 20 workflows, with overhead less than 1 millisecond. This preliminary work aims to assist scientists in anonymizing workflow data while enabling exploration of provenance and data products.
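For intuition about the anonymization target, here is a generic sketch of the k-anonymity property itself: a table is k-anonymous when every combination of quasi-identifier values occurs at least k times. This is not the paper's method or its parameter-dependency inference, just the property being enforced:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KAnonymityCheck {
    // Each row is the tuple of quasi-identifier values for one record.
    static boolean isKAnonymous(List<List<String>> rows, int k) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (List<String> row : rows) {
            counts.merge(row, 1, Integer::sum); // count identical quasi-identifier tuples
        }
        return counts.values().stream().allMatch(c -> c >= k);
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("30-39", "75*"),   // generalized age range and zip prefix
                Arrays.asList("30-39", "75*"),
                Arrays.asList("40-49", "76*"));
        System.out.println(isKAnonymous(rows, 2)); // false: the "40-49" group has 1 record
    }
}
```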
- The document discusses evaluating "why-not" queries against scientific workflow provenance. Why-not queries help understand why a data item was not returned by a workflow execution.
- It proposes a solution for evaluating why-not queries in workflows with black-box modules that do not preserve attribute information from inputs. The solution explores workflow modules from sink to source to identify "picky" modules responsible for a data item not appearing in results.
- To identify picky modules, it harvests information from the web by searching for traces of scientific module invocations to find valid candidate inputs and determine if a module accepts them or is likely picky. It conducts an experiment using real workflows to test the effectiveness of
Converting scripts into reproducible workflow research objects - Khalid Belhajjame
1) The document presents a methodology to convert script-based experiments into reproducible workflow research objects (WROs). This addresses issues of understanding, reusing, and reproducing experiments conducted through scripts.
2) The methodology involves 5 steps: generate an abstract workflow, create an executable workflow, refine the workflow, record provenance data, and annotate and check the quality of the conversion.
3) Applying the methodology to a molecular dynamics simulation case study, the authors demonstrate how scripts can be transformed into WROs containing workflows, annotations, provenance data, and other resources needed for reproducibility.
This document discusses research objects and scientific workflows. It introduces research objects as a way to aggregate all elements needed to understand a research investigation, including datasets, results, experiments, and provenance. Scientific workflows are presented as tools for automating data-intensive scientific activities, with prospective and retrospective provenance capturing the intended and actual methods. The document outlines an approach to summarizing complex workflows using semantic annotations of workflow motifs and reduction primitives like collapse and eliminate. This distills provenance traces for improved understanding and querying.
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotations - Khalid Belhajjame
Scientific workflows have become the workhorse of big-data analytics for scientists. As well as being repeatable and optimizable pipelines that bring together datasets and analysis tools, workflows make up an important part of the provenance of data generated from their execution. By faithfully capturing all stages in the analysis, workflows play a critical part in building up the audit-trail (a.k.a. provenance) metadata for derived datasets and contribute to the veracity of results. Provenance is essential for reporting results, reporting the method followed, and adapting to changes in the datasets or tools. These functions, however, are hampered by the complexity of workflows and consequently the complexity of data trails generated from their instrumented execution. In this paper we propose the generation of workflow description summaries in order to tackle workflow complexity. We elaborate reduction primitives for summarizing workflows, and show how primitives, as building blocks, can be used in conjunction with semantic workflow annotations to encode different summarization strategies. We report on the effectiveness of the method through experimental evaluation using real-world workflows from the Taverna system.
A talk given at the EDBT/ICDT 2010 conference. For more details, visit the project website at http://img.cs.manchester.ac.uk/dataspaces/dataspaces.html
These slides are intended for master's students (MIBS & MIFB) at UUM. They are also useful for readers interested in the topic of contemporary Islamic banking.
Chapter wise All Notes of First year Basic Civil Engineering.pptx - Denish Jangid
Chapter wise All Notes of First year Basic Civil Engineering
Syllabus
Chapter-1
Introduction to the objective, scope and outcome of the subject
Chapter 2
Introduction: Scope and Specialization of Civil Engineering, Role of civil Engineer in Society, Impact of infrastructural development on economy of country.
Chapter 3
Surveying: Object Principles & Types of Surveying; Site Plans, Plans & Maps; Scales & Unit of different Measurements.
Linear Measurements: Instruments used. Linear Measurement by Tape, Ranging out Survey Lines and overcoming Obstructions; Measurements on sloping ground; Tape corrections, conventional symbols. Angular Measurements: Instruments used; Introduction to Compass Surveying, Bearings and Longitude & Latitude of a Line, Introduction to total station.
Levelling: Instrument used Object of levelling, Methods of levelling in brief, and Contour maps.
Chapter 4
Buildings: Selection of site for Buildings, Layout of Building Plan, Types of buildings, Plinth area, carpet area, floor space index, Introduction to building byelaws, concept of sun light & ventilation. Components of Buildings & their functions, Basic concept of R.C.C., Introduction to types of foundation
Chapter 5
Transportation: Introduction to Transportation Engineering; Traffic and Road Safety: Types and Characteristics of Various Modes of Transportation; Various Road Traffic Signs, Causes of Accidents and Road Safety Measures.
Chapter 6
Environmental Engineering: Environmental Pollution, Environmental Acts and Regulations, Functional Concepts of Ecology, Basics of Species, Biodiversity, Ecosystem, Hydrological Cycle; Chemical Cycles: Carbon, Nitrogen & Phosphorus; Energy Flow in Ecosystems.
Water Pollution: Water Quality standards, Introduction to Treatment & Disposal of Waste Water. Reuse and Saving of Water, Rain Water Harvesting. Solid Waste Management: Classification of Solid Waste, Collection, Transportation and Disposal of Solid. Recycling of Solid Waste: Energy Recovery, Sanitary Landfill, On-Site Sanitation. Air & Noise Pollution: Primary and Secondary air pollutants, Harmful effects of Air Pollution, Control of Air Pollution. . Noise Pollution Harmful Effects of noise pollution, control of noise pollution, Global warming & Climate Change, Ozone depletion, Greenhouse effect
Text Books:
1. Palancharmy, Basic Civil Engineering, McGraw Hill publishers.
2. Satheesh Gopi, Basic Civil Engineering, Pearson Publishers.
3. Ketki Rangwala Dalal, Essentials of Civil Engineering, Charotar Publishing House.
4. BCP, Surveying volume 1
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organised by the Excellence Foundation for South Sudan on 08th and 09th June 2024 from 1 PM to 3 PM on each day.
How to Make a Field Mandatory in Odoo 17 - Celine George
In Odoo, making a field required can be done through both Python code and XML views. When you set the required attribute to True in Python code, it makes the field required across all views where it's used. Conversely, when you set the required attribute in XML views, it makes the field required only in the context of that particular view.
Leveraging Generative AI to Drive Nonprofit Innovation - TechSoup
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Services experts provided customer-specific use cases and dove into low/no-code tools that are quick and easy to deploy through Amazon Web Services (AWS).
This presentation was provided by Steph Pollock of The American Psychological Association’s Journals Program, and Damita Snow, of The American Society of Civil Engineers (ASCE), for the initial session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session One: 'Setting Expectations: a DEIA Primer,' was held June 6, 2024.
How to Add Chatter in the Odoo 17 ERP Module - Celine George
In Odoo, the chatter is like a chat tool that helps you work together on records. You can leave notes and track things, making it easier to talk with your team and partners. Inside chatter, all communication history, activity, and changes will be displayed.
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP - RAHUL
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet.Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels.
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur naturally.
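For context, NDVI (Normalized Difference Vegetation Index) is computed per pixel as (NIR - Red) / (NIR + Red). A minimal numpy sketch, assuming the red and near-infrared bands have already been loaded as arrays (e.g., from satellite imagery):

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red), in [-1, 1]."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    denom = nir + red
    # Avoid division by zero where both bands are 0 (e.g., nodata pixels).
    return np.where(denom == 0, 0.0, (nir - red) / np.where(denom == 0, 1, denom))

# Toy example: healthy vegetation reflects strongly in the near-infrared.
nir = np.array([[0.5, 0.4], [0.3, 0.1]])
red = np.array([[0.1, 0.1], [0.2, 0.1]])
print(ndvi(nir, red))  # values near +1 indicate dense green vegetation
```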
Ikc 2015
1. Mariem Harmassi, Daniela Grigori, Khalid
Belhajjame
LAMSADE, Université Paris Dauphine
Mining Workflow Repositories for
Improving Fragments Reuse
2. Workflows
A business process specified
using the BPMN notation
A Scientific Workflow system
(Taverna)
A workflow consists of an orchestrated and repeatable pattern of business
activity enabled by the systematic organization of resources into
processes that transform materials, provide services, or process
information (Workflow Management Coalition)
3. Scientific Workflows
Scientific workflows are
increasingly used by scientists
as a means for specifying and
enacting their experiments.
They tend to be data intensive
The data sets obtained as a
result of their enactment can
be stored in public repositories
to be queried, analyzed and
used to feed the execution of
other workflows.
4. Workflows are difficult to design
The design of scientific workflows, just like
business process, can be a difficult task
Deep knowledge of the domain
Awareness of the resources, e.g., programs and
web services, that can enact the steps of the
workflow
Publish and share workflows, and promote
their reuse.
myExperiment, CrowdLabs, Galaxy, and various
other business process repositories
Reuse, however, remains only an aim:
there are no capabilities that support the user in
identifying the workflows, or fragments thereof, that
are relevant for the task at hand.
5. Fragment look-up in the life cycle of
workflow design
[Diagram: the workflow design life cycle: Design Workflow, Search Fragments, Run Workflow, Publish Workflow, with the fragment search step drawing on workflow repositories.]
6. Workflow Fragments Search
Why is it useful?
The workflow designer knows the steps of the
fragment and their dependencies, but does not
know the resources (programs or web services) that
can be used for their implementation.
The designer may want to know how colleagues
and third parties designed the fragment (best
practices)
Elements of the solution
1. Filtering: instead of searching the whole repository,
we limit the number of workflows in the repository
to be examined to those that are relevant to the
user
2. Identify the fragments that are recurrent in the
workflows retrieved in (1)
7. 1 - Filtering step
[Diagram of the filtering pipeline: the user's workflow XML is converted to a workflow graph; a list of keywords is extracted from the activity labels and enriched with synonyms from WordNet; the BP repository is then filtered against this enriched list.]
8. 2- Identify Recurrent Fragments
We use graph mining algorithms to identify the
fragments in the repository that are recurrent.
We use the SUBDUE algorithm.
Which graph representation should be used to represent
(workflow) fragments?
We examined a number of workflow representations; a sketch
of a labeled-graph encoding is given below.
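Purely as an illustration of the kind of encoding being compared, here is a sketch of a small workflow as a labeled directed graph in which both nodes and edges carry labels, in the spirit of the D1 representation; the activity names and the networkx-based encoding are mine, not the paper's actual input format:

```python
import networkx as nx

# A small workflow as a labeled directed graph. In a D1-style
# representation, both activity nodes and edges carry labels,
# so a graph miner such as SUBDUE can match on the full structure.
g = nx.DiGraph()
g.add_node("fetch_sequences", label="activity")
g.add_node("align", label="activity")
g.add_node("render_tree", label="activity")
g.add_edge("fetch_sequences", "align", label="dataflow")
g.add_edge("align", "render_tree", label="dataflow")

# Graph miners typically consume such labeled graphs serialized
# to their own input format (SUBDUE uses a plain vertex/edge list).
for u, v, d in g.edges(data=True):
    print(u, "->", v, d["label"])
```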
12. Experiments
1st experiment: To assess the suitability of the
graph representations for mining workflow graphs
Effectiveness: precision/recall (formulas recalled below)
Memory space: disk space, DIV
Execution time
2nd experiment: To assess the impact of the
filtering step in narrowing the search to relevant
workflow fragments.
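For reference, the standard definitions of these effectiveness metrics, where TP, FP, and FN count the true-positive, false-positive, and false-negative elements among the returned fragments:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
```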
13. Experiment 1: Dataset
We created three datasets of workflow
specifications, containing respectively 30, 42, and
71 workflows.
9 out of these workflows are similar to each other
and, as such, contain recurrent structures that
should be detected by the mining algorithm.
Despite the small size of the collection, these
datasets allowed us to distinguish, to a certain
extent, between the different representations.
19. Experiment 1: Summary
Control nodes: SUBDUE picks up recurrent patterns that are
typical of the representation's coding scheme rather than of
the workflows themselves (hurts precision)
Labeling the edges: produces specializations of the same
abstract workflow (hurts recall)
XOR as a set of alternatives: duplication and loss of
information (hurts recall and precision)
Representation D1 therefore seems to be the one that
performs best.
20. Experiment 2
Data sets: All Taverna 1 workflows (498
workflows) from myExperiment
User query: We use a small fragment from a
workflow in myExperiment.
21. Conclusion
Methodology for improving reusability:
representation model D + filter
Improve the filter
Test other similarity measures
Need to assess the usefulness of the techniques
presented in practice, and how they can be
incorporated into the workflow design life cycle
In the context of the Contextual and Aggregated
Information Retrieval (CAIR) project
22. Mariem Harmassi, Daniela Grigori, Khalid
Belhajjame
LAMSADE, Université Paris Dauphine
Mining Workflow Repositories for
Improving Fragments Reuse
Editor's Notes
Workflows are increasingly used by scientists as a means for specifying and enacting their experiments. Such workflows are often data intensive [5]. The data sets obtained by their enactment have several applications, e.g., they can be used to understand new phenomena or confirm known facts, and therefore such data sets are worth storing (or preserving) for future analyses.
-scientific workflows have been used to encode in-silico experiments.
-The design of scientific workflows can be a difficult task. It requires a deep knowledge of the domain as well as awareness of the programs and services available for implementing the workflow steps.
-In 2009, De Roure and coauthors pointed out the advantages of sharing and reusing workflows from scientific workflow repositories like myExperiment, CrowdLabs, Galaxy, and others.
-The problem is that the size of these repositories is continuously growing, and many problems relating to the reuse of available workflows have emerged; for example, it becomes difficult to distinguish a special use case from a usage pattern.
-So using mining techniques is a good solution.
Let's discuss the most important contributions in mining workflows.
Filtering
Our system extracts from this graph file (the user's workflow) a set of words: the words occurring in the labels of the activity nodes. Note that a label may contain more than one word, concatenated with a separator; we extract the complete list of words.
We then submit this list to the JAWS API for WordNet, which returns the list of all synonyms for each word.
This gives us a semantically enriched list.
We then search using this last list: if a workflow contains a word from this list, it is retained.
The concept is simple. First, the user enters their workflow (sub-workflow) in an XML format; we transform it into a graph format and then extract the list of unique words mentioned in all the labels of the workflow. We establish a list of the keywords and their synonyms thanks to WordNet (using the Java API for WordNet Searching (JAWS) to retrieve the synsets of a given label from WordNet).
After that, we select from the repository only the BPs/workflows that match at least one word from this last list.
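The authors rely on JAWS, a Java API. Purely as an illustration of the same filtering idea, here is a sketch using NLTK's WordNet interface; the function names and the workflow dictionary structure are assumptions of mine, not the system's actual code:

```python
# Illustrative only: requires `pip install nltk` and a one-time
# `nltk.download('wordnet')`. The authors used JAWS (Java) instead.
from nltk.corpus import wordnet as wn

def label_words(labels):
    """Split activity labels into individual words (labels may
    concatenate several words with a separator such as '_' or '-')."""
    words = set()
    for label in labels:
        words.update(label.lower().replace("-", "_").split("_"))
    return words

def expand_with_synonyms(words):
    """Enrich a word list with all WordNet synonyms (synset lemmas)."""
    expanded = set(words)
    for w in words:
        for synset in wn.synsets(w):
            expanded.update(lemma.name().lower() for lemma in synset.lemmas())
    return expanded

def filter_repository(repository, user_labels):
    """Keep only workflows whose labels share at least one word
    with the semantically enriched keyword list."""
    keywords = expand_with_synonyms(label_words(user_labels))
    return [wf for wf in repository if label_words(wf["labels"]) & keywords]

# Toy usage: a repository of two workflows, each with activity labels.
repo = [{"name": "wf1", "labels": ["align_sequence"]},
        {"name": "wf2", "labels": ["render_chart"]}]
print([wf["name"] for wf in filter_repository(repo, ["sequence_alignment"])])
# expected: ['wf1'] (shares the word 'sequence' with the query)
```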
The challenges to be addressed are the following :
– Which mining algorithm to employ for finding frequent patterns in the repository?
– Which graph representation is best suited for formatting workflows for mining frequent fragments?
– How to deal with the heterogeneity of the labels used by different users to
model the activities of their workflows within the repository?
We conducted two experiments. The first aims to validate our proposed representation model D/D1 and to show the drawbacks of the other models. The second experiment aims to validate the filter.
We compare the efficiency and effectiveness of the models. On the effectiveness side, we focus on demonstrating the drawback of representation model C when it comes to extracting recurrent fragments that contain the XOR link. So, we manually created a synthetic dataset which ensures that a chosen sub-structure is the most recurrent. As the size of the synthetic dataset is limited (9 BPs), we extended it into three datasets by adding workflows from the Taverna 1 repository, while preserving the property that the most recurrent sub-workflow is the one already presented.
We compared the efficiency and effectiveness of the representation models.
The second experiment assesses the impact of the semantic filter.
Model A is the most expensive in terms of the disk space required to encode the base in graph format.
Concerning model C, as expected: it required more than twice the number of edges, nodes, and bits required by the models that we propose, namely D and D1. However, this overhead decreases to between a quarter and a tenth with larger bases. This decrease is due to the content of these bases, which have a low percentage of BPs with XOR nodes.
In third position comes model B: it requires between 25% and 40% more than models D and D1 in terms of the number of nodes, edges, and bits used.
Models D and D1 require the same number of edges and nodes to encode the input data; however, the labeling of edges consumes more bits to encode.
We don't care about correctly classifying negative instances; we just don't want too many of them polluting our results.
Model C: concerning this experimentation, as expected, model C led to the worst qualitative performance. C achieves a recall rate that varies between 0% and 61.54%, with an average recall around 35%. Model C can, at best, discover only one alternative at a time (in our case there are 2 alternatives attached to the XOR node).
Model A: the top extracted substructures are more significant than those of model C, and less significant than those of the other models. However, when it comes to larger databases, the results show a dramatic decline in the quality of its sub-structures, reaching 0% in terms of precision and recall, which means no extracted substructure relates to the user's expectation.
This limitation can be explained by the excessive use of control nodes. On large input data, their percentage becomes quite significant, leading the SUBDUE algorithm to consider them as important sub-structures.
Model B: model B performs much better than the previous two models, A and C. In fact, model B successfully retrieved almost 67% of the BP elements of the target sub-structure: more than twice as many as model C, and between 13% and 66% more than model A.
Comparing model B to model D: on the other hand, models B and D led to very similar accuracy performances. Although model B was able to discover more relevant BP elements than model D (about 10% more), it returned more useless or irrelevant BP elements (around 7% more).
Labeling the edges leads to specializations of the same abstract workflow template and consequently affects the quality of the results returned (decreases recall).
Model D: we can notice a common performance between models D and D1 which distinguishes them from the other models: both led to a good precision rate. This performance is due to the fact that these two models do not use control nodes and thereby avoid a negative influence on the results.
For the models that do use control nodes, on large input data their percentage becomes quite significant, leading the SUBDUE algorithm to consider typical sub-structures of the model's coding scheme as significant (decreases precision).
The results of the first experiment show clearly that model D1 records the best performance at every level, without exception.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Model A is the most expensive in terms of execution time: around 25 to 55 times more than models D and D1.
Let us compare the other models: although model B performs better than model C at the qualitative level, model C seems to be far less expensive.
As expected, models D and D1 led to very similar performances, with model D1 performing slightly better.
The results of the second experiment show that the use of the semantic filter caused a 99% reduction in the input data size (bits), which dramatically improved the execution time (36 times less):
Decreases disk space
Decreases RAM usage
Decreases execution time
Increases the quality of the results