Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Like this presentation? Why not share!

- MIning Software Repositories (MSR) ... by Ahmed Lamkanfi 1083 views
- Mining Software Repositories by Israel Herraiz 5939 views
- OOW Unconference 2010: Mining the A... by karlarao 2872 views
- Software bug prediction by Muthukumaran Kasi... 4497 views
- Data mining slides by smj 40352 views
- Data mining (lecture 1 & 2) conecpt... by Saif Ullah 32216 views

305 views

255 views

255 views

Published on

In many experimental domains, especially e-Science, workﬂow management systems are gaining increasing attention to design and execute in-silico experiments involving data analysis tools. As a by-product, a repository of workﬂows is generated, that formally describes experimental protocols and the way diﬀerent tools are combined inside experiments.

In this paper we describe the use of the SUBDUE graph clustering algorithm to discover sub-workﬂows from a repository. Since sub-workﬂows represent signiﬁcant usage patterns of tools, the discovered knowledge can be exploited by scientists to learn by-example about design practices, or

to retrieve and reuse workﬂows. Such a knowledge, ultimately, leverages the potential of scientiﬁc workﬂow repositories to become a knowledge-asset. A set of experiments

is conducted on the

myExperiment repository to assess the

eﬀectiveness of the approach.

Published in:
Technology

No Downloads

Total views

305

On SlideShare

0

From Embeds

0

Number of Embeds

0

Shares

0

Downloads

5

Comments

0

Likes

1

No embeds

No notes for slide

- 1. Mining Usage Patterns from a Repository of Scientiﬁc Workﬂows Claudia Diamantini, Domenico Potena, Emanuele Storti ´ DII, Universita Politecnica delle Marche, Ancona, Italy SAC 2012, Riva del Garda, March 26-30Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 1 / 21
- 2. IntroductionProcess mining techniques allow for extracting knowledge from eventlogs, i.e. traces of running processes: audit trails of a workﬂow management system transaction logs of an enterprise resource planning systemMain applications: process discovery: what is really happening? conformance check: are we doing what was agreed upon? process extension: how can we redesign the process? Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 2 / 21
- 3. IntroductionProcess mining techniques allow for extracting knowledge from eventlogs, i.e. traces of running processes: audit trails of a workﬂow management system transaction logs of an enterprise resource planning systemMain applications: process discovery: what is really happening? conformance check: are we doing what was agreed upon? process extension: how can we redesign the process? Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 2 / 21
- 4. IntroductionProcess mining techniques allow for extracting knowledge from eventlogs, i.e. traces of running processes: audit trails of a workﬂow management system transaction logs of an enterprise resource planning systemMain applications: process discovery: what is really happening? conformance check: are we doing what was agreed upon? process extension: how can we redesign the process? Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 2 / 21
- 5. MotivationLittle work about how to extract knowledge from a set of models (i.e.:process schemas) understanding of common patterns of usage (best/worst practices) support in process design process integration (from multiple sources) process optimization/re-engineering indexing of processes and their retrieval Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 3 / 21
- 6. MethodologyProcesses schemas have an inherent graph structure: manygraph-mining tools availableApproach 1 representation of original processes into graphs 2 usage of graph-mining techniques to extract SUBsWhich representation form?Which speciﬁc technique?Graph clustering techniques (sub-processes are clusters) with ahierarchical approach (various levels of abstractions, more/lessspeciﬁcity) Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
- 7. MethodologyProcesses schemas have an inherent graph structure: manygraph-mining tools availableApproach 1 representation of original processes into graphs 2 usage of graph-mining techniques to extract SUBsWhich representation form?Which speciﬁc technique?Graph clustering techniques (sub-processes are clusters) with ahierarchical approach (various levels of abstractions, more/lessspeciﬁcity) Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
- 8. MethodologyProcesses schemas have an inherent graph structure: manygraph-mining tools availableApproach 1 representation of original processes into graphs 2 usage of graph-mining techniques to extract SUBsWhich representation form?Which speciﬁc technique?Graph clustering techniques (sub-processes are clusters) with ahierarchical approach (various levels of abstractions, more/lessspeciﬁcity) Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
- 9. MethodologyProcesses schemas have an inherent graph structure: manygraph-mining tools availableApproach 1 representation of original processes into graphs 2 usage of graph-mining techniques to extract SUBsWhich representation form?Which speciﬁc technique?Graph clustering techniques (sub-processes are clusters) with ahierarchical approach (various levels of abstractions, more/lessspeciﬁcity) Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
- 10. SUBDUESUBDUE [Joyner, 2000] graph-based hierarchical clustering algorithm suited for discrete-valued and structured data searches for substructures (i.e., subgraphs) that best compress the input graph, according to MDLComments: to compress G with a SUB: to replace every occurrence of SUB in G with a single new node SUB best compresses G: num. of bits needed to represent G after the compression is minimum Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 5 / 21
- 11. SUBDUESUBDUE [Joyner, 2000] graph-based hierarchical clustering algorithm suited for discrete-valued and structured data searches for substructures (i.e., subgraphs) that best compress the input graph, according to MDLComments: to compress G with a SUB: to replace every occurrence of SUB in G with a single new node SUB best compresses G: num. of bits needed to represent G after the compression is minimum Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 5 / 21
- 12. SUBDUESUBDUE [Joyner, 2000] graph-based hierarchical clustering algorithm suited for discrete-valued and structured data searches for substructures (i.e., subgraphs) that best compress the input graph, according to MDLComments: to compress G with a SUB: to replace every occurrence of SUB in G with a single new node SUB best compresses G: num. of bits needed to represent G after the compression is minimum Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 5 / 21
- 13. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUBSubsequent SUBs may be deﬁned in terms of previously deﬁnedSUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
- 14. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUBSubsequent SUBs may be deﬁned in terms of previously deﬁnedSUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
- 15. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUBSubsequent SUBs may be deﬁned in terms of previously deﬁnedSUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
- 16. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUBSubsequent SUBs may be deﬁned in terms of previously deﬁnedSUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
- 17. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUBSubsequent SUBs may be deﬁned in terms of previously deﬁnedSUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
- 18. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUBSubsequent SUBs may be deﬁned in terms of previously deﬁnedSUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
- 19. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUBSubsequent SUBs may be deﬁned in terms of previously deﬁnedSUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
- 20. SUBDUE (cont’d)Traditional cluster evaluation is not applicable to hierarchic domains interClusterDistance quality = intraClusterDistanceDesirable features of a good clustering: few and big clusters (large coverage, good generality) minimal/no overlap (better deﬁned concepts)New indexes completeness: % of original nodes/edges still present in the ﬁnal lattice representativeness (of a SUB): % of input graphs holding the SUB at least once signiﬁcance: qualitative measure of the meaningfulness of a cluster w.r.t. the domain and application Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
- 21. SUBDUE (cont’d)Traditional cluster evaluation is not applicable to hierarchic domains interClusterDistance quality = intraClusterDistanceDesirable features of a good clustering: few and big clusters (large coverage, good generality) minimal/no overlap (better deﬁned concepts)New indexes completeness: % of original nodes/edges still present in the ﬁnal lattice representativeness (of a SUB): % of input graphs holding the SUB at least once signiﬁcance: qualitative measure of the meaningfulness of a cluster w.r.t. the domain and application Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
- 22. SUBDUE (cont’d)Traditional cluster evaluation is not applicable to hierarchic domains interClusterDistance quality = intraClusterDistanceDesirable features of a good clustering: few and big clusters (large coverage, good generality) minimal/no overlap (better deﬁned concepts)New indexes completeness: % of original nodes/edges still present in the ﬁnal lattice representativeness (of a SUB): % of input graphs holding the SUB at least once signiﬁcance: qualitative measure of the meaningfulness of a cluster w.r.t. the domain and application Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
- 23. SUBDUE (cont’d)Traditional cluster evaluation is not applicable to hierarchic domains interClusterDistance quality = intraClusterDistanceDesirable features of a good clustering: few and big clusters (large coverage, good generality) minimal/no overlap (better deﬁned concepts)New indexes completeness: % of original nodes/edges still present in the ﬁnal lattice representativeness (of a SUB): % of input graphs holding the SUB at least once signiﬁcance: qualitative measure of the meaningfulness of a cluster w.r.t. the domain and application Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
- 24. RepresentationApproach 1 representation of original processes into graphs 2 usage of graph-mining techniques to extract SUBsCommon operators: sequence split-AND (parallel split), split-XOR (exclusive choice) join-AND (synchronization), join-XOR (simple merge)How to map the process to the graph? Several choices Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 8 / 21
- 25. Representation models: A rule lowest level of compactness every operator is represented by a node called operator, linked to another node specifying its type (AND/OR/SEQ) and to its operands join/split are distinguishable by the number of in/out-going arcs Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 9 / 21
- 26. Representation models: B rule medium level of compactness the closest to the original representation join/split are replaced by different nodes, one for each kind of operator Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 10 / 21
- 27. Representation models: C rule highest level of compactness implicit representation of operators multiple alternative graphs in case of SPLIT-XOR or JOIN-XOR arcs are labeled to preserve information about domain/range nodes Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 11 / 21
- 28. EvaluationComments the three representations hold the same information, with no loss easy extension with more attributes (actor name/role, info about time/resources, ...): attribute name → edge value → a nodeNeed of experimentations to assess which one performs best w.r.t.the quality of clustering. Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 12 / 21
- 29. EvaluationComments the three representations hold the same information, with no loss easy extension with more attributes (actor name/role, info about time/resources, ...): attribute name → edge value → a nodeNeed of experimentations to assess which one performs best w.r.t.the quality of clustering. Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 12 / 21
- 30. EvaluationComments the three representations hold the same information, with no loss easy extension with more attributes (actor name/role, info about time/resources, ...): attribute name → edge value → a nodeNeed of experimentations to assess which one performs best w.r.t.the quality of clustering. Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 12 / 21
- 31. my ExperimentDataset: 258 processes (XScuﬂ language for Taverna framework) e-Science WF for distributed computation over scientiﬁc data applications: local tools, Web Services, scriptsSetting: WF extraction/parsing preprocessing translation into graphs (A,B,C rules) SUBDUE execution Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 13 / 21
- 32. Experimental resultsOutput lattice (fragment)Red boxes = top level SUBs Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 14 / 21
- 33. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (ﬁrst 10 top SUBs) 99.22% 31.78% 23.26% Signiﬁcance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03Considerations: A model: low signiﬁcance, high execution time No deﬁnitively best choice between B and C: B model: higher representativeness → indexing C model: higher signiﬁcance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
- 34. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (ﬁrst 10 top SUBs) 99.22% 31.78% 23.26% Signiﬁcance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03Considerations: A model: low signiﬁcance, high execution time No deﬁnitively best choice between B and C: B model: higher representativeness → indexing C model: higher signiﬁcance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
- 35. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (ﬁrst 10 top SUBs) 99.22% 31.78% 23.26% Signiﬁcance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03Considerations: A model: low signiﬁcance, high execution time No deﬁnitively best choice between B and C: B model: higher representativeness → indexing C model: higher signiﬁcance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
- 36. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (ﬁrst 10 top SUBs) 99.22% 31.78% 23.26% Signiﬁcance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03Considerations: A model: low signiﬁcance, high execution time No deﬁnitively best choice between B and C: B model: higher representativeness → indexing C model: higher signiﬁcance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
- 37. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (ﬁrst 10 top SUBs) 99.22% 31.78% 23.26% Signiﬁcance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03Considerations: A model: low signiﬁcance, high execution time No deﬁnitively best choice between B and C: B model: higher representativeness → indexing C model: higher signiﬁcance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
- 38. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (ﬁrst 10 top SUBs) 99.22% 31.78% 23.26% Signiﬁcance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03Considerations: A model: low signiﬁcance, high execution time No deﬁnitively best choice between B and C: B model: higher representativeness → indexing C model: higher signiﬁcance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
- 39. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (ﬁrst 10 top SUBs) 99.22% 31.78% 23.26% Signiﬁcance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03Considerations: A model: low signiﬁcance, high execution time No deﬁnitively best choice between B and C: B model: higher representativeness → indexing C model: higher signiﬁcance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
- 40. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (ﬁrst 10 top SUBs) 99.22% 31.78% 23.26% Signiﬁcance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03Considerations: A model: low signiﬁcance, high execution time No deﬁnitively best choice between B and C: B model: higher representativeness → indexing C model: higher signiﬁcance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
- 41. Use CaseScenario: a scientist wants to use web service X for DNA alignmentBaseline: browse my Experiment to ﬁnd every workﬂow with X: too many results given a workﬂow: common practice or exception?New approach: search for X in the lattice’s SUBs: 1 ﬁnd all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
- 42. Use CaseScenario: a scientist wants to use web service X for DNA alignmentBaseline: browse my Experiment to ﬁnd every workﬂow with X: too many results given a workﬂow: common practice or exception?New approach: search for X in the lattice’s SUBs: 1 ﬁnd all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
- 43. Use CaseScenario: a scientist wants to use web service X for DNA alignmentBaseline: browse my Experiment to ﬁnd every workﬂow with X: too many results given a workﬂow: common practice or exception?New approach: search for X in the lattice’s SUBs: 1 ﬁnd all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
- 44. Use CaseScenario: a scientist wants to use web service X for DNA alignmentBaseline: browse my Experiment to ﬁnd every workﬂow with X: too many results given a workﬂow: common practice or exception?New approach: search for X in the lattice’s SUBs: 1 ﬁnd all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
- 45. Use CaseScenario: a scientist wants to use web service X for DNA alignmentBaseline: browse my Experiment to ﬁnd every workﬂow with X: too many results given a workﬂow: common practice or exception?New approach: search for X in the lattice’s SUBs: 1 ﬁnd all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
- 46. Use CaseScenario: a scientist wants to use web service X for DNA alignmentBaseline: browse my Experiment to ﬁnd every workﬂow with X: too many results given a workﬂow: common practice or exception?New approach: search for X in the lattice’s SUBs: 1 ﬁnd all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
- 47. Use case (cont’d)Applications: reference during process design (learn-by-example) process retrieval: ﬁnd every my Experiment WF containing such a pattern Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 17 / 21
- 48. Use case (cont’d)Applications: reference during process design (learn-by-example) process retrieval: ﬁnd every my Experiment WF containing such a pattern Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 17 / 21
- 49. DiscussionA graph-based clustering technique (SUBDUE) to recognize the mostfrequent/common subprocesses among a set of workﬂows/processschemas.Comments several representation alternatives for application of SUBDUE evaluations on specialized e-Science domain useful to ﬁnd typical patterns and schemas of usage for tools/tasks Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 18 / 21
- 50. ConclusionApplications: recognition of reference processes (common/best/worst practices) organize a process repository (by indexing processes through substructures) enterprise integration: ﬁnd similarities (differences, overlapping, complementariness) in BP in different companiesFuture work: extending experimentations, especially with (real) Business Processes management of heterogeneity (syntactic/semantic, level of granularity) Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 19 / 21
- 51. Mining Usage Patterns from a Repository of Scientiﬁc Workﬂows Claudia Diamantini, Domenico Potena, Emanuele Storti ´ DII, Universita Politecnica delle Marche, Ancona, Italy SAC 2012, Riva del Garda, March 26-30Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 20 / 21
- 52. SUBDUESubdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
- 53. SUBDUESubdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
- 54. SUBDUESubdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
- 55. SUBDUESubdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
- 56. SUBDUESubdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
- 57. SUBDUESubdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
- 58. SUBDUESubdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
- 59. SUBDUESubdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints 2 Compress the graph with the best SUB Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
- 60. SUBDUESubdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints 2 Compress the graph with the best SUB Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
- 61. SUBDUESubdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints 2 Compress the graph with the best SUBSubsequent SUBs may be deﬁned in terms of previously deﬁned SUBsIterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21

No public clipboards found for this slide

×
### Save the most important slides with Clipping

Clipping is a handy way to collect and organize the most important slides from a presentation. You can keep your great finds in clipboards organized around topics.

Be the first to comment