Topic Maps for Association Rule Mining. Tomáš Kliegr, Jan Zemánek, Marek Ovečka. Department of Information and Knowledge Engineering, Faculty of Informatics and Statistics, University of Economics, Prague.
Data Mining using CRISP-DM The goal of data mining is to obtain useful non-trivial patterns from the data. Analytical Report
Common data mining tasks: Association rules, e.g. Sex(M) and Salary(Low) and District(Havlickuv Brod) => Quality(Bad). Clustering. Classification.
Association Rule Mining. Unlike clustering and classification, association rules provide true “nuggets”: rules meeting selected interest measures. EXAMPLE: Duration(2y+) and District(Prague) => Loan Quality(good), where Duration(2y+) and District(Prague) is the antecedent and Loan Quality(good) is the consequent. THE PROBLEM WITH INTEREST MEASURES: It is usually not possible to tweak the interest measure thresholds so that only the really interesting rules are output. To be on the safe side, we often get (many!) more rules than desired. THE QUEST FOR TOPIC MAPS: Automatically select the really interesting rules from the rules output. Help with searching through the results.
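The thresholding problem can be illustrated with a minimal Python sketch. It assumes support and confidence as the interest measures (the slides do not name specific measures), and the four records are invented toy data, not rows from the dataset used in the talk.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def matches(record: Record, literals: Dict[str, str]) -> bool:
    """True if the record has every attribute=value pair in `literals`."""
    return all(record.get(attr) == value for attr, value in literals.items())


def support_confidence(
    records: List[Record],
    antecedent: Dict[str, str],
    consequent: Dict[str, str],
) -> Tuple[float, float]:
    """Support = share of all records matching both sides of the rule;
    confidence = share of antecedent-matching records that also match
    the consequent."""
    n_ante = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(
        1 for r in records if matches(r, antecedent) and matches(r, consequent)
    )
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical toy records (attribute names are illustrative).
records = [
    {"Duration": "2y+", "District": "Prague", "LoanQuality": "good"},
    {"Duration": "2y+", "District": "Prague", "LoanQuality": "good"},
    {"Duration": "2y+", "District": "Prague", "LoanQuality": "bad"},
    {"Duration": "1y", "District": "Brno", "LoanQuality": "good"},
]

support, confidence = support_confidence(
    records,
    antecedent={"Duration": "2y+", "District": "Prague"},
    consequent={"LoanQuality": "good"},
)
```

A rule is output whenever both values clear the chosen thresholds, which is why loose thresholds produce far more rules than an analyst wants to read.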
The quest: More precise tasks, or automatic rule filtering. The lingua franca for exchange of data mining models is PMML.
Predictive Model Markup Language (PMML). XML Schema. PMML is the leading standard for statistical and data mining models. Supported by over 20 vendors and organizations. Covers the technical part of the CRISP-DM cycle. http://www.dmg.org/pmml_examples/index.html
PMML is “just” an XML Schema. Developed for deploying mining models. Good for migration from one data mining environment to another. But: No explicit links between nodes. Verbose. Self-contained: lacks support for interlinking multiple PMML documents and for interlinking PMML with other information.
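To make “no explicit links between nodes” concrete, here is a minimal Python sketch that resolves the ID-based references in an association model by hand. The fragment is a simplified, namespace-free illustration built from the PMML element names Item, Itemset, ItemRef and AssociationRule; a real PMML document carries a namespace and further required elements and attributes that are omitted here, and the item values are invented.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: not a complete or valid PMML document.
PMML_FRAGMENT = """
<AssociationModel>
  <Item id="1" value="Duration=2y+"/>
  <Item id="2" value="District=Prague"/>
  <Item id="3" value="LoanQuality=good"/>
  <Itemset id="10"><ItemRef itemRef="1"/><ItemRef itemRef="2"/></Itemset>
  <Itemset id="11"><ItemRef itemRef="3"/></Itemset>
  <AssociationRule antecedent="10" consequent="11"
                   support="0.5" confidence="0.67"/>
</AssociationModel>
"""


def resolve_rules(xml_text):
    """Follow the id/itemRef attributes to turn each rule into
    (antecedent item values, consequent item values)."""
    root = ET.fromstring(xml_text)
    # Item id -> item value
    items = {i.get("id"): i.get("value") for i in root.findall("Item")}
    # Itemset id -> list of item values, resolved through ItemRef
    itemsets = {
        s.get("id"): [items[ref.get("itemRef")] for ref in s.findall("ItemRef")]
        for s in root.findall("Itemset")
    }
    return [
        (itemsets[r.get("antecedent")], itemsets[r.get("consequent")])
        for r in root.findall("AssociationRule")
    ]


rules = resolve_rules(PMML_FRAGMENT)
```

The links between a rule, its itemsets and their items exist only as matching attribute values, so every consumer has to rebuild them; in a topic map the same links would be explicit associations.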
Association Rule Mining Ontology. The ontology is a „semantization“ of the PMML XML Schema. DESIGN GUIDELINES: The key design principle was to allow easy transformation of data from PMML to AROn. SCOPE: The ontology is limited to the subset of PMML relevant to association rule mining: 60 topic types, 50 association types and 20 occurrence types. USE: No automatic transformation is available yet, but we are working on one using the OKS framework. Currently, data can be input using Ontopoly.
Design guidelines: Elements. xs:element is mapped to a topic type. Topics are assigned the same names as PMML nodes, but respecting spaces between words and capitalization. Superclasses are introduced for semantically similar XML nodes. Named elements used as children in other elements that carry most of the semantics of their parents are merged with the parent. If an XML element has a directly corresponding topic type in the ontology, the URI of the XML element within the schema is used as the subject identifier.
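Two of the element guidelines, naming and subject identifiers, can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' transformation: the word-splitting rule is one plausible reading of “respecting spaces between words”, and the `schema#ElementName` URI form and the example.org schema address are hypothetical.

```python
import re


def topic_name(element_name: str) -> str:
    """Turn an XML element name into a spaced topic name,
    e.g. 'AssociationRule' -> 'Association Rule'."""
    # Space between a lowercase letter or digit and a following capital.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", element_name)
    # Space before the last capital of an acronym run ('PMMLHeader').
    spaced = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)
    return spaced


def subject_identifier(schema_uri: str, element_name: str) -> str:
    """Assumed URI form: the schema address plus the element name as fragment."""
    return f"{schema_uri}#{element_name}"


name = topic_name("AssociationRule")
identifier = subject_identifier("http://example.org/pmml.xsd", "AssociationRule")
```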
Design guidelines: Attributes. An enumeration restriction on an attribute is mapped as a topic type with an enumeration superclass (this is a workaround for missing TMCL support in OKS). Attributes that could be interpreted as references to other elements become associations. Other attributes become occurrence types.
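The three attribute rules can be read as a small decision procedure. The Python sketch below applies them to an invented XSD fragment; the element and attribute names are hypothetical, and since deciding whether an attribute “could be interpreted as a reference” is a human judgement, it is modelled here as a hand-supplied set of names.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment, for illustration only.
XSD_FRAGMENT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ExampleNode">
    <xs:complexType>
      <xs:attribute name="kind">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="a"/>
            <xs:enumeration value="b"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="itemRef" type="xs:string"/>
      <xs:attribute name="support" type="xs:double"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def classify_attribute(attr, reference_names):
    """Apply the three attribute guidelines in order."""
    if attr.find(f".//{XS}enumeration") is not None:
        return "topic type (enumeration superclass)"
    if attr.get("name") in reference_names:
        return "association"
    return "occurrence type"


root = ET.fromstring(XSD_FRAGMENT)
mapping = {
    a.get("name"): classify_attribute(a, reference_names={"itemRef"})
    for a in root.iter(f"{XS}attribute")
}
```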
Design guidelines: Associations. Names for association types are chosen arbitrarily so that they are most descriptive. Introduce fewer rather than more associations: this minimizes the effort when populating the ontology from PMML and avoids unnecessary inflation of the topic map. Link only the semantically closest topics. Additional „soft“ relations can be introduced with inference statements or derived with tolog.
Design guidelines: Role types. Topic types used to map PMML elements are used as role types, unless multiple topics are permitted in an association end. In that case a superclass is used as the role type, or a new role type is introduced.
Two alternative association rule representations: (Item-Itemset) and (Boolean Attributes).
Ongoing work. Support for background knowledge („already known association rules“). Support for schema mapping („linking of background knowledge with mining results“). Already in the ontology, distinguished by the base of the subject identifier. Schema Mapping: http://keg.vse.cz/sma/XXX Background Knowledge: http://keg.vse.cz/bko/xxx
Data Mining Use case. PREDICT LOAN QUALITY: Find client characteristics that could be used to predict their attitude to paying back a loan. BASED ON PAST RECORDS: Input data: records on already given loans.
The data: 6181 clients in the PKDD’99 financial dataset. Data were preprocessed, i.e.
… and perhaps 9997 other association rules. Preprocessed data. Association Rule Learner.
WE CAN’T PRESENT ALL 10,000 RULES TO THE CLIENT. ASK CLIENT WHAT HE KNOWS: If the loan duration is more than two years and the loan was given in the Prague district, we can expect good loan quality. … background knowledge
Semantize the results
Formalize Background Knowledge
Schema Mapping. Background knowledge can use a different “vocabulary” than the data. If we are to use background knowledge in querying, we need to interlink it with the data. The same approach would apply if we interlink several mining models (PMMLs).
Deleting information with Topic Maps. Find association rules that subsume background knowledge. Visualization of a tolog query.
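The filtering idea can be sketched outside the topic map in plain Python. In the talk this step is a tolog query; the code below is only a stand-in. It assumes one possible definition of subsumption, namely that a mined rule subsumes a background rule when it contains all of the background rule's antecedent and consequent literals. The slides do not define the term, and the mined rules are invented examples.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]


def rule(antecedent, consequent) -> Rule:
    """Convenience constructor from any iterables of literals."""
    return Rule(frozenset(antecedent), frozenset(consequent))


def subsumes(mined: Rule, background: Rule) -> bool:
    """Assumed definition: every background literal appears on the
    same side of the mined rule."""
    return (
        background.antecedent <= mined.antecedent
        and background.consequent <= mined.consequent
    )


# Background knowledge stated by the client in the use case.
background = rule(["Duration(2y+)", "District(Prague)"], ["LoanQuality(good)"])

# Hypothetical mined rules.
mined: List[Rule] = [
    rule(["Duration(2y+)", "District(Prague)", "Sex(M)"], ["LoanQuality(good)"]),
    rule(["Salary(Low)", "District(Havlickuv Brod)"], ["LoanQuality(bad)"]),
]

already_known = [r for r in mined if subsumes(r, background)]
novel = [r for r in mined if not subsumes(r, background)]
```

Only the rules in `novel` would then be shown to the client, since the others restate what the client already knows.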
Summary. Methodology for transferring XML Schema to Topic Maps. Association Rule Mining Ontology based on PMML. Easily extensible to other data mining algorithms. Initial attempts to formalize background knowledge. Initial attempts to use Topic Maps for schema mapping. AROn On-Line: http://maiana.topicmapslab.de/u/lmaicher/tm/kliegr

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 

Topic Maps for Association Rule Mining

  • 1. Topic Maps for Association Rule Mining. Tomáš Kliegr, Jan Zemánek, Marek Ovečka. Department of Information and Knowledge Engineering, Faculty of Informatics and Statistics, University of Economics, Prague
  • 2. Data Mining using CRISP-DM The goal of data mining is to obtain useful non-trivial patterns from the data. Analytical Report
  • 3. Common data mining tasks: association rules, clustering, classification. Example association rule: Sex(M) and Salary(Low) and District(Havlickuv Brod) => Quality(Bad)
  • 4. Association Rule Mining. EXAMPLE: Duration(2y+) and District(Prague) => Loan Quality(good); the left-hand side is the antecedent, the right-hand side the consequent. Unlike clustering and classification, association rules provide true “nuggets”: rules meeting selected interest measures. THE PROBLEM WITH INTEREST MEASURES: It is usually not possible to tweak the interest measure thresholds so that only the really interesting rules are output. To be on the safe side, we often get (many!) more rules than desired. THE QUEST FOR TOPIC MAPS: Select the really interesting rules from the rules output automatically. Help searching through the results.
  • 5. The quest: more precise tasks or automatic rule filtering. The lingua franca for exchange of data mining models is PMML.
  • 6. Predictive Model Markup Language (PMML): an XML Schema. PMML is the leading standard for statistical and data mining models. Supported by over 20 vendors and organizations. Covers the technical part of the CRISP-DM cycle. http://www.dmg.org/pmml_examples/index.html
  • 7. PMML is “just” an XML Schema. Developed for deploying mining models. Good for migration from one data mining environment to another. But: no explicit links between nodes; verbose; self-contained. Lacks support for interlinking multiple PMML documents and for interlinking PMML with other information.
  • 8. Association Rule Mining Ontology. The ontology is a “semantization” of the PMML XML Schema. DESIGN GUIDELINES: The key design principle was to allow easy transformation of data from PMML to AROn. SCOPE: The ontology is limited to the subset of PMML relevant to association rule mining: 60 topic types, 50 association types and 20 occurrence types. USE: No automatic transformation is available yet, but we are working on one using the OKS framework. Currently, data can be input using Ontopoly.
  • 9. Design guidelines: Elements. xs:element is mapped to a topic type. Topics are assigned the same names as PMML nodes, but respecting spaces between words and capitalization. Superclasses are introduced for semantically similar XML nodes. Named elements used as children in other elements that carry most of the semantics of their parents are merged with the parent. If an XML element has a directly corresponding topic type in the ontology, the URI of the XML element within the schema is used as subject identifier.
  • 10. Design guidelines: Attributes. An enumeration restriction on an attribute is mapped as a topic type with an enumeration superclass (this is a workaround for missing TMCL support in OKS). Attributes that could be interpreted as references to other elements become associations. Other attributes become occurrence types.
  • 11. Design guidelines: Associations. Names for association types are arbitrarily chosen so that they are most descriptive. Introduce fewer rather than more associations: this minimizes the effort when populating the ontology from PMML and avoids unnecessary inflation of the topic map. Link only the semantically closest topics. Additional “soft” relations can be introduced with inference statements or derived with tolog.
  • 12. Design guidelines: Role types. Topic types used to map PMML elements are used as role types, unless multiple topics are permitted in an association end. In that case a superclass is used as the role type, or a new role type is introduced.
  • 14. Ongoing work. Support for background knowledge (“already known association rules”). Support for schema mapping (“linking of background knowledge with mining results”). Both are already in the ontology, distinguished by the base of the subject identifier: Schema Mapping http://keg.vse.cz/sma/XXX, Background Knowledge http://keg.vse.cz/bko/xxx
  • 15. Data Mining Use case. PREDICT LOAN QUALITY: Find client characteristics that could be used to predict their attitude to paying back a loan. BASED ON PAST RECORDS: Input data: records on already given loans.
  • 16. The data: 6181 clients in the PKDD’99 financial dataset. Data were preprocessed, i.e.
  • 17. Preprocessed data. Association Rule Learner. … And perhaps 9997 other association rules.
  • 18. WE CAN’T PRESENT ALL 10,000 RULES TO THE CLIENT. ASK THE CLIENT WHAT HE KNOWS: If loan duration is more than two years and the loan was given in Prague district, we can expect good loan quality. … background knowledge
  • 21. Schema Mapping. Background knowledge can use a different “vocabulary” than the data. If we are to use background knowledge in querying, we need to interlink it with the data. The same approach would apply if we interlink several mining models (PMMLs).
  • 22. Deleting information with Topic Maps. Find association rules that subsume background knowledge. Visualization of a tolog query.
  • 23. Summary. Methodology for transferring XML Schema to Topic Maps. Association Rule Mining Ontology based on PMML. Easily extensible to other data mining algorithms. Initial attempts to formalize background knowledge. Initial attempts to use Topic Maps for schema mapping. AROn On-Line: http://maiana.topicmapslab.de/u/lmaicher/tm/kliegr
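
The filtering idea from slides 18 and 22 (drop mined rules that subsume what the client already knows) can be illustrated outside of Topic Maps with a few lines of Python. This is only a rough sketch, not the tolog query shown in the deck. It assumes that a mined rule "subsumes" a background rule when it contains every antecedent item and every consequent item of that background rule; the slides do not define the term, so this definition is an assumption. The names Rule, make_rule, subsumes and drop_known are hypothetical, and the first mined rule is made-up sample data (the second is the rule from slide 3).

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple

# An item is an (attribute, value) pair, e.g. ("District", "Prague").
Item = Tuple[str, str]


class Rule(NamedTuple):
    antecedent: FrozenSet[Item]
    consequent: FrozenSet[Item]


def make_rule(antecedent: Iterable[Item], consequent: Iterable[Item]) -> Rule:
    """Build a rule from plain iterables of (attribute, value) pairs."""
    return Rule(frozenset(antecedent), frozenset(consequent))


def subsumes(mined: Rule, known: Rule) -> bool:
    """ASSUMED definition: the mined rule contains every item of the
    known rule on both the antecedent and the consequent side."""
    return (known.antecedent <= mined.antecedent
            and known.consequent <= mined.consequent)


def drop_known(mined_rules: Iterable[Rule], background: Iterable[Rule]) -> List[Rule]:
    """Keep only the mined rules that subsume no background rule."""
    background = list(background)
    return [r for r in mined_rules
            if not any(subsumes(r, k) for k in background)]


# Background knowledge stated by the client on slide 18.
background = [
    make_rule([("Duration", "2y+"), ("District", "Prague")],
              [("Loan Quality", "good")]),
]

mined = [
    # Made-up rule: a more specific variant of the background rule.
    make_rule([("Duration", "2y+"), ("District", "Prague"), ("Sex", "M")],
              [("Loan Quality", "good")]),
    # Rule from slide 3.
    make_rule([("Sex", "M"), ("Salary", "Low"), ("District", "Havlickuv Brod")],
              [("Quality", "Bad")]),
]

remaining = drop_known(mined, background)
print(len(remaining), "rule(s) left after filtering")
```

Under this assumed definition the first mined rule is filtered out because it only restates the client's knowledge with an extra condition, while the second rule is kept. The sketch compares attribute and value names literally, so it would only work across different vocabularies after the kind of schema mapping described on slide 21.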