Semantic data mining: an ontology based approach

  1. 1. Semantic Data Mining: an Ontology Based Approach Agnieszka Lawrynowicz Institute of Computing Science Poznan University of Technology April 12, 2016 Seminar of the Institute of Computing Science Poznan University of Technology Agnieszka Lawrynowicz Semantic Data Mining: an Ontology Based Approach 1
  2. 2. Outline: Introduction to semantic data mining; Ontology in computer science; Semantic meta-mining (▸ Use case: e-LICO Intelligent Discovery Assistant ▸ Background knowledge: Data Mining OPtimization Ontology ▸ DM method: pattern discovery with Fr-ONT-Qu ▸ Sharing: standardization of data mining and machine learning schemas); Summary
  3. 3. Outline (as on slide 2)
  4. 4. Introduction: data mining. Input: a data table, text documents, ... Output: a model, a pattern set. [Figure: data → DATA MINING → model, patterns]
  5. 5. Introduction: using background knowledge in data mining. Using background knowledge in data mining has been extensively researched: hierarchy/taxonomy of attributes (Michalski et al., 1986; Srikant and Agrawal, 1995); Inductive Logic Programming (Muggleton, 1991; Lavrac and Dzeroski, 1994); relational learning (Quinlan, 1993; de Raedt, 2008); semantic data mining tutorial at ECML/PKDD 2011 (Lavrac, Vavpetic, Lawrynowicz, Potoniec, Hilario, Kalousis).
  6. 6. Introduction: relational data mining. Input: a relational database, a graph, a set of logical facts, ... Output: a model, a pattern set. [Figure: RELATIONAL DATA MINING → model, patterns]
  7. 7. Semantic data mining. Input: a data table, text documents, Web pages, a relational database, a graph, a set of logical facts, ..., plus one or more ontologies. Output: a model, a pattern set. [Figure: data and ontologies (linked by annotations, mappings, vocabulary re-use) → SEMANTIC DATA MINING → model, patterns]
  8. 8. Outline (as on slide 2)
  9. 9. Ontology in computer science. “engineering artefact [...]” (Guarino 98). “An ontology is a formal specification (→ machine interpretation) of a shared (→ group of people, consensus) conceptualization (→ abstract model of phenomena, concepts) of a domain of interest (→ domain knowledge)” (Gruber 93, Studer 98). Ontology = formal specification of terminological knowledge (most often from a particular domain).
  10. 10. Semantic Web layer cake. [Figure labels: Semantic Web language stack; ontology modeling languages; data]
  11. 11. Ontologies + data = knowledge graph. [Figure: example graph. Nodes: reviewer1, paper10, MetaReviewer, Reviewer, PeerReviewedPaper, owl:Restriction. Properties: metaReviews, reviews, reviewedBy. Edge labels by layer: RDF (rdf:type); RDFS (rdfs:domain, rdfs:range, rdfs:subPropertyOf, rdfs:subClassOf); OWL (owl:onProperty, owl:someValuesFrom, owl:inverseOf).]
  12. 12. Logical meaning of OWL. Description Logics (DLs): a family of first-order-logic-based formalisms suitable for representing knowledge, especially terminologies and ontologies; they underpin the Web Ontology Language (OWL). Basic building blocks: concepts, roles, constructors, individuals. Example. TBox: atomic concepts: Reviewer, Paper; roles: reviews, metaReviews, reviewedBy; constructors: ⊓, ∃; axiom (concept definition): PeerReviewedPaper ≡ Paper ⊓ ∃reviewedBy.Reviewer; axiom (concept description, “each meta-reviewer is a reviewer”): MetaReviewer ⊑ Reviewer. ABox: fact assertion: metaReviews(reviewer1, paper10).
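To make the TBox/ABox split on slide 12 concrete, here is a minimal forward-chaining sketch in plain Python. It is not a DL reasoner and not part of the talk. The two class axioms come from the slide; the sub-property, inverse, and domain axioms are assumptions read off the figure labels of slide 11, and the fact that paper10 is a Paper is an assumed extra ABox assertion.

```python
# Illustrative forward chaining over a tiny TBox/ABox (not a real DL reasoner).
SUBCLASS = {"MetaReviewer": "Reviewer"}     # from the slide: MetaReviewer ⊑ Reviewer
SUBPROPERTY = {"metaReviews": "reviews"}    # assumption: metaReviews is a sub-property of reviews
INVERSE = {"reviews": "reviewedBy"}         # assumption: reviewedBy is the inverse of reviews
DOMAIN = {"metaReviews": "MetaReviewer"}    # assumption: domain of metaReviews is MetaReviewer


def saturate(types, roles):
    """Apply the rules until no new class or role assertion is derived."""
    types, roles = set(types), set(roles)
    changed = True
    while changed:
        new_types, new_roles = set(), set()
        for prop, subj, obj in roles:
            if prop in SUBPROPERTY:
                new_roles.add((SUBPROPERTY[prop], subj, obj))
            if prop in INVERSE:
                new_roles.add((INVERSE[prop], obj, subj))
            if prop in DOMAIN:
                new_types.add((subj, DOMAIN[prop]))
            # PeerReviewedPaper ≡ Paper ⊓ ∃reviewedBy.Reviewer (right-to-left direction)
            if (prop == "reviewedBy" and (subj, "Paper") in types
                    and (obj, "Reviewer") in types):
                new_types.add((subj, "PeerReviewedPaper"))
        for individual, cls in types:
            if cls in SUBCLASS:
                new_types.add((individual, SUBCLASS[cls]))
        changed = not (new_types <= types and new_roles <= roles)
        types |= new_types
        roles |= new_roles
    return types, roles


# ABox: metaReviews(reviewer1, paper10) is from the slide; Paper(paper10) is assumed.
types, roles = saturate({("paper10", "Paper")},
                        {("metaReviews", "reviewer1", "paper10")})
print(sorted(types))
```

Under these assumptions the saturation derives that reviewer1 is a Reviewer and that paper10 is a PeerReviewedPaper, neither of which is stated explicitly in the ABox.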
  13. 13. Outline (as on slide 2)
  14. 14. Overview of meta-learning. Meta-learning (learning to learn): the application of machine learning techniques to meta-data about past machine learning experiments; the goal is to modify some aspect of the learning process to improve the performance of the resulting model. Meta-mining: meta-learning applied to the full data mining process.
  15. 15. Overview of the e-LICO system (EU FP7 2009-2012). [Excerpt shown on the slide:] “[...] platform. This section presents the different components of the e-LICO architecture (Figure 1) and shows how they interact to achieve the user's knowledge discovery goal. The e-LICO infrastructure (depicted in the figure under the dashed line) is the means by which the data-mining platform is delivered to scientists. The innovative core of the e-LICO platform is the Intelligent Discovery Assistant (IDA, above the dashed line) with its planner and meta-learner. However, to deliver the data-mining platform to its scientist users, there are several other services and components. Figure 1 shows an overview of e-LICO's components and how they interact with each other. Figure 1. Overview of the e-LICO system. There are two user-facing components for the e-LICO platform; these allow scientists to access data-mining operators and/or other data processing services, to compose them into workflows and execute them, collecting the results for interpretation or further analysis. These two central infrastructure components are:”
  16. 16. Background knowledge: DM OPtimization Ontology
  17. 17. Data Mining OPtimization Ontology (DMOP): the primary goal of DMOP is to support all decision-making steps that determine the outcome of the data mining process; development started in the EU FP7 project e-LICO (2009-2012); DMOP v5.5: 723 classes, 111 properties, 4291 axioms; highly axiomatized; represented in the Web Ontology Language (OWL 2).
  18. 18. Competency questions. “Given a data mining task/data set, which of the valid or applicable workflows/algorithms will yield optimal results (or at least better results than the others)?” “Given a set of candidate workflows/algorithms for a given task/data set, which data set/workflow/algorithm characteristics should be taken into account in order to select the most appropriate one?” There are also more fine-grained questions, e.g.: “Which induction algorithms should I use (or avoid) when my dataset has many more variables than instances?”
  19. 19. Architecture of the DMOP knowledge base and its satellite triple stores. [Figure labels: TBox: DMOP; ABox: Operator DB; DMEX-DB1, DMEX-DB2, ..., DMEX-DBk; OWL 2; RDF triple store; formal conceptual framework of the data mining domain; accepted knowledge of DM tasks, algorithms, operators; specific DM applications: datasets, workflows, results; meta-miner's training data; meta-miner's prior DM knowledge.]
  20. 20. The core concepts of DMOP (simplified). [Figure: Fig. 1. The core concepts of DMOP.]
  21. 21. DMOP: algorithm representation
  22. 22. Alignment of DMOP with DOLCE 1/3. Two main reasons to align DMOP with a foundational ontology: (1) considerations about attributes and data properties: extant non-foundational ontology solutions were partial re-inventions of how these are treated in a foundational ontology; (2) reuse of the foundational ontology's object properties.
  23. 23. Alignment of DMOP with DOLCE 2/3
  24. 24. Alignment of DMOP with DOLCE 3/3. Perdurant: DM-Experiment and DM-Operation are subclasses of dolce:process. Endurant: most DM classes, such as algorithm, software, strategy, task, and optimization problem, are subclasses of dolce:non-physical-endurant. Quality: characteristics and parameters of DM entities are made subclasses of dolce:abstract-quality. Abstract: for identifying discrete values, classes are added as subclasses of dolce:abstract-region. Object properties: DMOP reuses mainly DOLCE's parthood, quality, and quale relations. Each of the four main DOLCE branches has been used.
  25. 25. Qualities and attributes 1/3. How to handle ‘attributes’ in OWL ontologies and, in a broader context, measurements? Easy way: an attribute is a binary functional relation between a class and a datatype: Elephant ⊑ =1 hasWeight.integer; Elephant ⊑ =1 hasWeightPrecise.real; Elephant ⊑ =1 hasWeightImperial.integer (in lbs). Drawback: this builds application decisions about how to store the data (and in which unit) into the ontology.
  26. 26. Qualities and attributes 2/3. How to handle ‘attributes’ in OWL ontologies and, in a broader context, measurements? More elaborate way: unfold the notion of an object's property (e.g. weight) from one attribute/OWL data property into at least two properties: ▸ one OWL object property from the object to the ‘reified attribute’ (a “quality property” represented as an OWL class) ▸ and another property to the value(s). This way is favoured in foundational ontologies; it solves the problem of non-reusability of the ‘attribute’ and prevents duplication of data properties. In DMOP, measurements are more like values for parameters.
  27. 27. Qualities and attributes 3/3. ModelingAlgorithm ⊑ =1 dolce:has-quality.LearningPolicy; LearningPolicy ⊑ =1 dolce:has-quale.Eager-Lazy; Eager-Lazy ⊑ ≤1 hasDataValue.anyType. LearningPolicy is a subclass of dolce:quality; Eager-Lazy is a subclass of dolce:abstract-region. In this way, the ontology can be linked to many different applications, which may even use different data types yet still agree on the meaning of the characteristics and parameters (‘attributes’) of the algorithms, tasks, and other DM endurants.
  28. 28. Meta-modeling in DMOP 1/4. Only processes (executions of workflows) and operations (executions of operators) consume inputs and produce outputs. DM algorithms (as well as operators and workflows) can only specify the type of input or output. Inputs and outputs (the DM-Dataset and DM-Hypothesis class hierarchies, respectively) are modeled as subclasses of the IO-Object class.
  29. 29. Meta-modeling in DMOP 2/4. DM algorithms: classes or individuals? Individuals. Problem: expressing the types of inputs/outputs associated with an algorithm. “C4.5 specifiesInputClass CategoricalLabeledDataSet”: C4.5 is an individual (instance of DM-Algorithm), while CategoricalLabeledDataSet is a class (subclass of DM-Hypothesis).
  30. 30. Meta-modeling in DMOP 3/4. Initial solution: one artificial class per algorithm, with a single instance corresponding to that particular algorithm. Problem: hasInput, hasOutput, specifiesInputClass, and specifiesOutputClass were assigned a common range, IO-Object. “C4.5 specifiesInputClass Iris”? Here C4.5 is an individual (instance of DM-Algorithm) and Iris is an individual (instance of DM-Hypothesis). Iris is a concrete dataset. Clearly, no DM algorithm is designed to handle only one particular dataset.
  31. 31. Meta-modeling in DMOP 4/4. Final solution: the weak form of punning available in OWL 2. IO-Class: a meta-class, the class of all classes of input and output objects. “C4.5 specifiesInputClass CategoricalLabeledDataSet”: C4.5 is an individual (instance of DM-Algorithm) and CategoricalLabeledDataSet is an individual (instance of IO-Class). “DM-Process hasInput some CategoricalLabeledDataSet”: DM-Process is a class (subclass of dolce:process) and CategoricalLabeledDataSet is a class (subclass of IO-Object).
  32. 32. DM method: Fr-ONT-Qu semantic pattern miner
  33. 33. Data mining as search. Learning in description logics (DLs) and from other relational data can be seen as search in a space of concepts / RDF triples / clauses / (conjunctive / SPARQL) queries, ... It is possible to impose an ordering on this search space, e.g., using subsumption as a natural quasi-order and generality relation between DL concepts ▸ if D ⊑ C then C covers all instances that are covered by D. Refinement operators may be applied to traverse the space by computing a set of specializations (resp. generalizations) of a concept / RDF triples / clauses / (conjunctive / SPARQL) queries, ...
  34. 34. Properties of refinement operators. Consider a downward refinement operator ρ, and let C ρ D denote a refinement chain from a DL concept C to D. Complete: each point in the lattice is reachable (for D ⊑ C there exists E such that E ≡ D and a refinement chain C ρ ... ρ E). Weakly complete: for any concept C with C ⊑ ⊤, a concept E with E ≡ C can be reached from ⊤. Finite: ρ(C) is finite for any concept C. Redundant: there exist two different refinement chains from C to D. Proper: C ρ D implies C ≢ D. Ideal = complete + proper + finite.
  35. 35. Learning in DLs and in clausal languages is hard. Lehmann and Hitzler (ILP 2007, MLJ 2010) proved for many DLs, and Nienhuys-Cheng and de Wolf (1997) for clausal languages, that no ideal refinement operator exists.
  36. 36. Fr-ONT-Qu: an algorithm for mining patterns in RDF(S) data. Patterns are expressed as SPARQL queries; the generality relation is taxonomical subsumption; the algorithm consists of a refinement operator ρ and a strategy to select the best patterns for further refinement. Example SPARQL query: SELECT ?x WHERE { ?x rdf:type :Paper . ?x rdf:type :PeerReviewedPaper . ?x :reviewedBy ?y }, where SELECT ?x is the head and the graph pattern in braces is the body.
  37. 37. New generality relation: taxonomical subsumption. Taxonomically closed pattern: a pattern Q is taxonomically closed, or t-closed, w.r.t. the background knowledge G if for each triple of the form (?x rdf:type c) in Q, Q also contains the transitive closure of (?x rdf:type c) w.r.t. G, and for each triple of the form (?x p ?y) in Q, Q also contains the transitive closure of (?x p ?y) w.r.t. G. Taxonomical subsumption: given two patterns Q1 and Q2 over a ρdf dataset G, and their t-closures Q1^t and Q2^t respectively, Q1 taxonomically subsumes (t-subsumes) Q2 iff there exists a mapping σ such that the set of triple patterns and FILTER expressions of σ(body(Q1^t)) is a subset of the set of triple patterns and FILTER expressions of body(Q2^t).
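The two definitions on slide 37 can be sketched in a few lines of Python. This is a simplified illustration, not the Fr-ONT-Qu implementation: patterns are sets of (subject, predicate, object) triples, FILTER expressions are ignored, the mapping σ is found by brute force, and the head variable ?x is assumed to map to itself. The class and property taxonomies (including the property name metaReviewedBy) are hypothetical.

```python
from itertools import product

# Hypothetical background knowledge G: direct super-classes and super-properties.
SUBCLASS_OF = {"PeerReviewedPaper": ["Paper"], "JournalPaper": ["Paper"]}
SUBPROPERTY_OF = {"metaReviewedBy": ["reviewedBy"]}


def ancestors(name, parents):
    """All transitive ancestors of `name` in a parent map."""
    seen, stack = set(), [name]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def t_closure(pattern):
    """Add every super-class / super-property triple implied by the taxonomy."""
    closed = set(pattern)
    for subj, pred, obj in pattern:
        if pred == "rdf:type":
            closed |= {(subj, pred, cls) for cls in ancestors(obj, SUBCLASS_OF)}
        else:
            closed |= {(subj, sup, obj) for sup in ancestors(pred, SUBPROPERTY_OF)}
    return closed


def variables(pattern):
    return sorted({term for s, _, o in pattern for term in (s, o)
                   if term.startswith("?")})


def t_subsumes(q1, q2, head="?x"):
    """True if some variable mapping sends the t-closure of q1 into that of q2."""
    c1, c2 = t_closure(q1), t_closure(q2)
    free = [v for v in variables(c1) if v != head]
    for image in product(variables(c2), repeat=len(free)):
        sigma = dict(zip(free, image))
        sigma[head] = head  # assumption: the head variable is kept fixed
        mapped = {(sigma.get(s, s), p, sigma.get(o, o)) for s, p, o in c1}
        if mapped <= c2:
            return True
    return False


general = {("?x", "rdf:type", "Paper")}
specific = {("?x", "rdf:type", "PeerReviewedPaper"), ("?x", "reviewedBy", "?y")}
print(t_subsumes(general, specific), t_subsumes(specific, general))
```

The general pattern t-subsumes the specific one because the t-closure of the specific pattern contains (?x rdf:type Paper); the converse fails because no mapping can produce the PeerReviewedPaper triple.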
  38. 38. Input of the algorithm: a declarative bias (B) to limit the search space (i.e. the classes and properties to use) and a maximal number of iterations; two thresholds, one for keeping good-enough patterns and one for refining the best patterns; a choice among several quality measures to use with the thresholds (e.g. support on the knowledge base); the beam search size.
  39. 39. Example. B: classes: PeerReviewedPaper, JournalPaper; property: reviewedBy. Step 1: refine every pattern from the previous iteration by adding a single restriction for a variable already existing in the pattern. E.g. for the pattern {?x rdf:type :Paper .}, the refinements are: ▸ {?x rdf:type :Paper . ?x rdf:type :PeerReviewedPaper .} ▸ {?x rdf:type :Paper . ?x rdf:type :JournalPaper .} ▸ {?x rdf:type :Paper . ?x :reviewedBy ?y}. Step 2: evaluate the patterns (with some quality measure, such as support on a data set) and select only the best ones. Step 3: repeat steps 1-2 as long as there are patterns for refinement and the maximal number of iterations is not exceeded.
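The refine / evaluate / select loop of slide 39 can be sketched as a small beam search. This is a deliberately simplified illustration under stated assumptions, not Fr-ONT-Qu itself: every restriction is attached to ?x only, a single support threshold replaces the two thresholds of slide 38, there is no trie and no t-closure, and the four papers in DATA are made up.

```python
# Declarative bias B from the slide, written as (predicate, object) restrictions on ?x.
BIAS = [("rdf:type", ":PeerReviewedPaper"),
        ("rdf:type", ":JournalPaper"),
        (":reviewedBy", "?y")]

# Hypothetical toy data: the facts known about each paper. Objects of :reviewedBy
# are abstracted to "?y" because only the existence of a reviewer matters here.
DATA = {
    "paper1": {("rdf:type", ":Paper"), ("rdf:type", ":PeerReviewedPaper"),
               (":reviewedBy", "?y")},
    "paper2": {("rdf:type", ":Paper"), ("rdf:type", ":JournalPaper"),
               ("rdf:type", ":PeerReviewedPaper"), (":reviewedBy", "?y")},
    "paper3": {("rdf:type", ":Paper")},
    "paper4": {("rdf:type", ":Paper"), ("rdf:type", ":JournalPaper")},
}


def support(pattern):
    """Fraction of instances that satisfy every restriction in the pattern."""
    return sum(1 for facts in DATA.values() if pattern <= facts) / len(DATA)


def refine(pattern):
    """Step 1: add one restriction from the bias that the pattern lacks."""
    return {pattern | {restriction} for restriction in BIAS
            if restriction not in pattern}


def mine(start, min_support, beam_size, max_iterations):
    kept, frontier = [], [start]
    for _ in range(max_iterations):
        candidates = {q for p in frontier for q in refine(p)}
        # Step 2: evaluate, keep the good-enough patterns, best first.
        good = sorted((q for q in candidates if support(q) >= min_support),
                      key=lambda q: (-support(q), sorted(q)))
        if not good:
            break
        for q in good:
            if q not in kept:
                kept.append(q)
        frontier = good[:beam_size]  # Step 3: only the best ones are refined again.
    return kept


start = frozenset({("rdf:type", ":Paper")})
patterns = mine(start, min_support=0.5, beam_size=2, max_iterations=5)
for q in patterns:
    print(support(q), sorted(q))
```

On this toy data the first iteration produces exactly the three refinements listed on the slide, each with support 0.5; later iterations only keep the pattern that combines :PeerReviewedPaper with :reviewedBy.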
  40. 40. Refinement operator ρ: uses a trie data structure; ρ is (locally) finite and complete.
  41. 41. Pattern based classification 1/2
  42. 42. Pattern based classification 2/2. We learn features that are optimized with regard to the (classification) task.
  43. 43. Propositionalisation 1/2
  44. 44. Propositionalisation 2/2. In this way, learned features may be consumed by any off-the-shelf ‘attribute-value’ classification algorithm.
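As a rough illustration of propositionalisation, each mined pattern can become one binary column: 1 if the pattern covers the instance, 0 otherwise. The sketch below assumes patterns and instances are plain sets of (predicate, object) facts; the workflows, the shortened predicate names, and the operator name rm:Some-Other-Operator are hypothetical.

```python
# Hypothetical instances: workflows described by (predicate, object) facts.
WORKFLOWS = {
    "wf1": {("executes", "rm:RM-Decision_Tree"),
            ("implements", "ClassificationTreeInductionAlgorithm")},
    "wf2": {("executes", "weka:Weka-J48"),
            ("implements", "ClassificationTreeInductionAlgorithm")},
    "wf3": {("executes", "rm:Some-Other-Operator")},
}

# Hypothetical mined patterns, each a set of restrictions that must all hold.
PATTERNS = [
    frozenset({("implements", "ClassificationTreeInductionAlgorithm")}),
    frozenset({("executes", "rm:RM-Decision_Tree")}),
]


def propositionalise(instances, patterns):
    """One row per instance, one 0/1 column per pattern (1 = pattern covers it)."""
    return {name: [int(pattern <= facts) for pattern in patterns]
            for name, facts in instances.items()}


table = propositionalise(WORKFLOWS, PATTERNS)
for name, row in table.items():
    print(name, row)
```

The resulting rows form an ordinary attribute-value table, so they could be passed, together with a class label, to any standard classifier.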
  45. 45. Comparative experiments on classification of semantic data 1/2. We considered published work with available results and datasets (including the ESWC 2008 and ESWC 2012 best papers); various types of methods: kernel methods, a statistical relational classifier, concept learning algorithms. We strictly followed the tasks, protocols, and experimental setups of those methods.
  46. 46. Comparative experiments on classification of semantic data 2/2. On the classification task, Fr-ONT-Qu outperformed state-of-the-art approaches to classification of Semantic Web data (see “Pattern based feature construction in semantic data mining” by A. Lawrynowicz and J. Potoniec, IJSWIS 10(1), 2014): kernel methods by Bloehdorn et al. (2007) and Loesch et al. (ESWC 2012 best paper); the statistical relational classifier SPARQL-ML by Kiefer et al. (ESWC 2008 best paper); the concept learning algorithms DL-FOIL by Fanizzi et al. (2008) and the cutting-edge CELOE variant of DL-Learner by Lehmann (2009).
  47. 47. What is RapidMiner? 1/2
  48. 48. What is RapidMiner? 2/2
  49. 49. RapidMiner XML based workflow representation
  50. 50. Creating a (meta-)dataset for meta-mining. [Figure labels: baseline datasets: 11 UCI datasets; baseline DM experiment set: 1581 executed RapidMiner workflows; Data Characteristics Tool (DCT); DMOP ontology; transformation to RDF; DMOP-based repository of DM processes (DMEX-DB); dataset for training the meta-miner: 85 million RDF triples.]
  51. 51. Propositionalisation. [Figure labels: DMOP-based RDF repository of DM processes; workflow patterns; dataset characteristics; features; dataset.] [Excerpt shown on the slide:] “[...] ?opex2 dmop:hasParameterSetting ?front1 . ?front0 dmop:executes rm:DM-Operator . ?front0 dmop:implements ?front2 . ?front2 a dmop:DM-Algorithm . ?front2 a dmop:InductionAlgorithm . ?front2 a dmop:ModelingAlgorithm . ?front2 a dmop:ClassificationModelingAlgorithm . ?front2 a dmop:ClassificationTreeInductionAlgorithm . } was mined when Fr-ONT-Qu traversed down the algorithm class hierarchy, specializing variable ?front2. In this way, it is possible to abstract from the level of operators (algorithm implementations) to the level of algorithms and their taxonomy. For instance, both the rm:RM-Decision_Tree and weka:Weka-J48 operators implement a classification tree induction algorithm, and one may generalize over them. The patterns containing class hierarchies provide expressivity similar to that of patterns mined in so-called generalized association rule mining. The following pattern covers only those workflows that contain the ‘Decision Tree’ operator, for which the parameter minimal size for split has a value between 2 and 5.5: Q2 = select distinct ?x where { Bd ∪ ?opex2 dmop:executes ?front0 . ?opex2 dmop:executes rm:RM-Decision_Tree . ?opex2 dmop:hasParameterSetting ?front1 . ?front0 dmop:executes rm:DM-Operator . ?front1 dmop:setsValueOf ?front2 . ?front1 dmop:hasValue ?front3 . filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 16.000000) . ?front2 dmop:hasParameterKey 'minimal_size_for_split' . ?front1 dmop:hasValue ?front3 . filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 9.000000) . ?front1 dmop:hasValue ?front3 . filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 5.500000) . }”
  52. 52. Semantic meta-mining results. McNemar's test was performed for pairs of classifiers, with the null hypothesis that a classifier built using dataset characteristics and a mined pattern set has the same error rate as the baseline, which used dataset characteristics and only the names of the machine learning DM operators. The test confirmed that classifiers trained using workflow patterns performed significantly better (in terms of accuracy) than the baseline.
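For readers unfamiliar with McNemar's test, the sketch below computes it for two classifiers evaluated on the same test instances, using only the standard library. It uses the continuity-corrected chi-square form; whether the study used this variant or an exact one is not stated on the slide, so treat that choice as an assumption. The labels and predictions are made up.

```python
import math


def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test on paired predictions.

    Returns (statistic, p_value). Only the discordant pairs matter:
    b = A correct and B wrong, c = A wrong and B correct.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up example: 20 test instances, all with true label 1.
y_true = [1] * 20
pred_patterns = [1] * 10 + [0] * 2 + [1] * 8   # classifier using workflow patterns
pred_baseline = [0] * 10 + [1] * 2 + [1] * 8   # baseline classifier
statistic, p_value = mcnemar(y_true, pred_patterns, pred_baseline)
print(round(statistic, 3), round(p_value, 3))
```

Here the pattern-based classifier is right where the baseline is wrong on 10 instances and the reverse holds on 2, which is enough to reject the null hypothesis of equal error rates at the 0.05 level.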
  53. 53. Sharing: Standardization of DM/ML schemas
  54. 54. Evolution of the field of DM/ML ontologies. [Figure: timeline, 2008 to 2016. Ontologies/vocabularies: OntoDM, DMOP, DMWF, Exposé, MEX, ML Schema Core. Events: Data Mining Ontology Jamboree (Slovenia), OpenML 2016 (Netherlands), W3C Machine Learning Schema Community Group. Platforms: Experiment Databases, OpenML.]
  55. 55. OntoDM. Pance Panov, Larisa N. Soldatova, Saso Dzeroski: Ontology of core data mining entities. Data Min. Knowl. Discov. 28(5-6): 1222-1265 (2014). Built in compliance with the upper-level ontologies BFO, OBI, IAO; modularized; incorporates structured data mining. Use case: generic, middle-level ontology for ML; representing QSAR entities for drug design, used by the Eve Robot Scientist.
  56. 56. DMOP: Data Mining Optimization Ontology. C. Maria Keet, Agnieszka Lawrynowicz, Claudia d’Amato, Alexandros Kalousis, Phong Nguyen, Raul Palma, Robert Stevens, Melanie Hilario: The Data Mining OPtimization Ontology. J. Web Sem. 32: 43-53 (2015). Development started in the e-LICO EU FP7 project (2009-2012); detailed internal characteristics (‘qualities’) of algorithms. Use case: meta-learning (‘whitebox’), meta-mining; used to produce the Intelligent Discovery Assistant for RapidMiner.
  57. 57. Exposé. Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, Geoffrey Holmes: Experiment databases - A new way to share, organize and learn from experiments. Machine Learning 87(2): 127-158 (2012). Re-uses OntoDM (at the top level) and DMOP (at the bottom level); superseded by the OpenML DB schema. Use case: experiment databases, ExpML markup.
  58. 58. Early work towards aligning DM/ML ontologies (2010): DMO Ontology Jamboree, Jožef Stefan Institute, Slovenia
  59. 59. MEX vocabulary. Diego Esteves, Diego Moussallem, Ciro Baron Neto, Tommaso Soru, Ricardo Usbeck, Markus Ackermann, Jens Lehmann: MEX vocabulary: a lightweight interchange format for machine learning experiments. SEMANTICS 2015: 169-176. Lightweight interchange format; maps to PROV. Use case: annotating ML experiments and interchanging ML metadata.
  60. 60. How to make existing DM/ML ontologies compatible?
  61. 61. W3C Machine Learning Schema Community Group (2015) https://www.w3.org/community/ml-schema/
  62. 62. OpenML, Lorentz Center, Netherlands (2016). First draft of ML Schema Core: https://github.com/ML-Schema/core
  63. 63. Sharing beyond the DM/ML domain. Mapping DMOP to workflow ontologies (Research Objects, OPMW) (ROHub hosted by Poznan Supercomputing and Networking Center)
  64. 64. Semantic data mining: more information. Semantic data mining tutorial at ECML/PKDD 2011: http://videolectures.net/ecmlpkdd2011_lavrac_vavpetic_mining/ Topics: peculiarities of the learning setting (Open World Assumption; what is a “truly semantic” similarity measure?; ...); methods, applications, tools.
  65. 65. Summary. Semantic data mining: data mining with ontologies as background/prior knowledge, most often from structured data. Ontologies are best engineered with use cases in mind. Learning in description logics and clausal languages is hard: heuristics, dealing with peculiarities. Fr-ONT-Qu semantic pattern mining algorithm: theoretical properties, practical evaluation. Use case: semantic meta-mining for constructing an Intelligent Data Mining Assistant. Importance of interoperability (for scientific reproducibility, for inter-domain applications).
  66. 66. Acknowledgements. Polish National Science Center under the SONATA program, “ARISTOTELES: Methodology and algorithms for automatic revision of ontologies in task based scenarios” (2014/13/D/ST6/02076) (2015-2018). Foundation for Polish Science under the POMOST programme, cofinanced from European Union, Regional Development Fund (POMOST/2013-7/8) (2013-2015). EU FP7 ICT-2007.4.4 (231519), “e-LICO: An e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Science” (2009-2012). Fr-ONT-Qu and the meta-mining experiments were done jointly with Jedrzej Potoniec. Contributors to the development of DMOP and/or other e-LICO infrastructure used in the research described in this presentation: Melanie Hilario, C. Maria Keet, Claudia d’Amato, Huyen Do, Simon Fischer, Dragan Gamberger, Lina Al-Jadir, Simon Jupp, Alexandros Kalousis, Joerg-Uwe Kietz, Petra Kralj Novak, Babak Mougouie, Phong Nguyen, Raul Palma, Floarea Serban, Robert Stevens, Anze Vavpetic, Jun Wang, Derry Wijaya, Adam Woznica
