Full paper: http://boole.diiga.univpm.it/paper/sac2012.pdf
In many experimental domains, especially e-Science, workﬂow management systems are gaining increasing attention to design and execute in-silico experiments involving data analysis tools. As a by-product, a repository of workﬂows is generated, that formally describes experimental protocols and the way diﬀerent tools are combined inside experiments.
In this paper we describe the use of the SUBDUE graph clustering algorithm to discover sub-workﬂows from a repository. Since sub-workﬂows represent signiﬁcant usage patterns of tools, the discovered knowledge can be exploited by scientists to learn by-example about design practices, or
to retrieve and reuse workﬂows. Such a knowledge, ultimately, leverages the potential of scientiﬁc workﬂow repositories to become a knowledge-asset. A set of experiments
is conducted on the
myExperiment repository to assess the
eﬀectiveness of the approach.