This document discusses demos and tools for linking knowledge discovery (KDD) and linked data. It summarizes several tools that integrate linked data and KDD processes like data preprocessing, mining, and postprocessing. OpenRefine, RapidMiner, R, Matlab, ProLOD++, DL-Learner, Spark, KNIME, and Gephi were highlighted as tools that support tasks like enriching data, running SPARQL queries, loading RDF data, and visualizing linked data. The document concludes by asking about gaps and how to increase adoption, noting linked data could benefit KDD with validation, enrichment, and reasoning over semantic web data.
2. Demos and tools: what for?
Papers are one thing…
…but what can I practically do with Linked Data?
We wanted some answers:
How much do Linked Data people know about KDD tools?
What can KDD people do with Linked Data?
3. Demos and tools: what did we do?
We asked the Linked Data community to provide us with tools
We looked at the KDD tools we knew to see whether (and how) they integrate Linked Data
Are we missing something? Are we wrong about something?
Tell us here https://goo.gl/DSTAFm
4. What can Linked Data do for KDD?
Preprocessing Mining Postprocessing
Validating
Enriching
Reasoning
Mining
Visualising
Interpreting
Open Refine X
Rapidminer-LD X X
Rapidminer-RMonto X X
R – SPARQL pkg X X
Matlab – SciSPARQL X X
ProLOD++ X X
DL-Learner X
Spark – GraphX&RDF X X
Knime – SPARQL nodes X X
Gephi – SemanticWebImport X X X
Dedalo X
5. Open Refine – RDF extension
Open Refine
tool for working with (messy) data
reconcile, clean, match data
RDF refine[1]
• Reconcile/interlink
• SPARQL endpoints, RDF dumps
• Search the Web for related RDF datasets
• Export RDF
• Use existing vocabularies (auto-completion)
[1] Maali et al. – DERI research centre, Ireland
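The "Export RDF" step can be pictured with a small Python sketch that turns cleaned table rows into N-Triples. This is only an illustration under stated assumptions, not the RDF Refine implementation: the base URI, the column-to-property mapping, and the sample rows are all hypothetical.

```python
# Toy illustration only: this is not the RDF Refine export code.
# BASE, the example.org property, and the sample rows are hypothetical.

BASE = "http://example.org/city/"
PROPS = {
    "name": "http://www.w3.org/2000/01/rdf-schema#label",
    "country": "http://example.org/prop/country",
}


def escape_literal(text):
    """Escape characters that an N-Triples string literal cannot hold raw."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(rows, key="id"):
    """Emit one triple per mapped, non-empty column of each row."""
    lines = []
    for row in rows:
        subject = f"<{BASE}{row[key]}>"
        for column, prop in PROPS.items():
            value = row.get(column, "").strip()  # trim messy whitespace
            if value:
                lines.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return lines


rows = [
    {"id": "1", "name": " Galway ", "country": "Ireland"},
    {"id": "2", "name": "Poznan", "country": ""},
]
for line in rows_to_ntriples(rows):
    print(line)
```

Empty cells are skipped rather than exported as empty literals, which is one possible design choice for messy input.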
6. Rapidminer – LOD extension
Rapidminer
A tool to perform data mining tasks
Each process is a chain of operators
e.g. CSV import operator, Data Transformation operators, Classification operators, etc.
Linked Data extension[2]
Enriching data with information from Linked Data (Linkers)
Input Linked Data (SPARQL and Data importers)
Explaining patterns with Linked Data
[2] Paulheim et al. – University of Mannheim
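The enrichment idea (link local records to Linked Data entities, then pull in their properties as new attributes) can be sketched in Python. This is a rough sketch under assumptions, not the extension's operators: the triples, the `ex:` names, the values, and the label lookup are all made up for illustration.

```python
# Toy illustration only: not the RapidMiner Linked Data extension.
# Every URI, property name, and value below is hypothetical.

# A tiny stand-in for a Linked Data source: (subject, predicate, object).
TRIPLES = [
    ("ex:Mannheim", "ex:population", "300000"),
    ("ex:Mannheim", "ex:country", "ex:Germany"),
    ("ex:Leipzig", "ex:country", "ex:Germany"),
]

# "Linker" step: map local string labels to entity URIs.
LABEL_TO_URI = {"mannheim": "ex:Mannheim", "leipzig": "ex:Leipzig"}


def link(label):
    """Return the entity URI for a label, or None when no link is found."""
    return LABEL_TO_URI.get(label.strip().lower())


def enrich(records, label_field="city"):
    """Add each linked entity's properties to the record as new attributes."""
    index = {}
    for subject, predicate, obj in TRIPLES:
        index.setdefault(subject, {})[predicate] = obj
    enriched = []
    for record in records:
        uri = link(record[label_field])
        extra = index.get(uri, {})  # unlinked records gain no attributes
        enriched.append({**record, "uri": uri, **extra})
    return enriched


records = [{"city": "Mannheim ", "sales": 10}, {"city": "Atlantis", "sales": 3}]
for row in enrich(records):
    print(row)
```

The enriched rows could then be passed to any mining step as an ordinary attribute table.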
7. Rapidminer – RMonto extension
Rapidminer
A tool to perform data mining tasks
Each process is a chain of operators
e.g. CSV import operator, Data Transformation operators, Classification operators, etc.
RMonto extension[3]
Loading Data (SPARQL, RDF files)
Data transformation
Pattern Mining
Data extension
[3] Potoniec et al. – University of Poznan
8. R – CRAN SPARQL package
R programming language
Statistical computing and graphics
Need to explain more?
SPARQL package[4]
• SPARQL queries (local/endpoints)
• Update data in the triple store
• Retrieve results as data frame for further processing
[4] van Hage et al. – SynerScope
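To show what "retrieve results as data frame" amounts to, here is a small Python sketch (not the R package itself) that flattens a response in the W3C SPARQL 1.1 Query Results JSON Format into columns and rows. The response body and the example.org URIs are hypothetical; a real response would come from an endpoint.

```python
import json

# Hypothetical response body in the SPARQL 1.1 Query Results JSON Format.
RESPONSE = """
{
  "head": {"vars": ["tool", "label"]},
  "results": {"bindings": [
    {"tool": {"type": "uri", "value": "http://example.org/tool/R"},
     "label": {"type": "literal", "value": "R"}},
    {"tool": {"type": "uri", "value": "http://example.org/tool/KNIME"}}
  ]}
}
"""


def bindings_to_table(body):
    """Return (columns, rows); unbound variables become None."""
    data = json.loads(body)
    columns = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        rows.append(
            [binding[var]["value"] if var in binding else None for var in columns]
        )
    return columns, rows


columns, rows = bindings_to_table(RESPONSE)
print(columns)
for row in rows:
    print(row)
```

A variable may be unbound in a solution (for example with OPTIONAL patterns), so the sketch fills missing cells with None instead of assuming every column is present.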
9. Matlab SPARQL extensions
MATLAB SciSPARQL Link (MSL)[5]
• Client-Server interface
• MATLAB (scientific computing) + SciSPARQL (scientific SPARQL queries)
• Populate, update, and query SSDM databases using SPARQL queries
MatlabSPARQL
• Run queries against SPARQL endpoints
• Download data as Matlab structures
• Export in CSV format
[5] He – Uppsala University
10. ProLOD++
Profiling and Mining Linked Data[6]
Web platform for Linked Data
Merging heterogeneous sources
Cleansing, preprocessing
Analysis and exploration
Mining and profiling
[6] Abedjan et al. – Hasso Plattner Institute, Germany
11. DL-Learner
OWL-based machine learning tool for supervised learning[7]
Supports knowledge construction
• Learn definitions for classes
• Find similar instances
• Classify instances
Reasoner adapters (e.g. FaCT++, Pellet)
Data import (OWL, N-Triples, SPARQL endpoints)
Command Line interface or Protégé Plugin
[7] Lehmann et al. – University of Leipzig, Germany
12. KNIME
Data analytics platform
Workflows are chains of nodes
KNIME SPARQL Node
• SPARQL queries against endpoints
• Connection between KNIME and Apache Jena
• Results as string tables
13. Gephi – Semantic Web Import
Gephi: graph visualization & exploration
Networks, complex systems
Dynamic and hierarchical graphs
Semantic Web Import
SPARQL queries
Statistics on the imported graph
Graph filtering and cleaning
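The import, statistics, and filtering steps can be sketched in plain Python. This is not Gephi or its plugin: the triples are hypothetical, and treating any object that starts with a double quote as a literal is a simplifying assumption.

```python
from collections import Counter

# Hypothetical triples, e.g. as returned by a SPARQL CONSTRUCT query.
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:carol", "ex:knows", "ex:dave"),
    ("ex:alice", "ex:name", '"Alice"'),
]


def to_edges(triples):
    """Keep resource-to-resource triples as edges; drop literal values."""
    return [(s, o) for s, _p, o in triples if not o.startswith('"')]


def degrees(edges):
    """Count how many edges touch each node (in-degree plus out-degree)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def filter_by_degree(edges, degree, minimum):
    """Keep only edges whose two endpoints both reach the minimum degree."""
    return [
        (s, t) for s, t in edges if degree[s] >= minimum and degree[t] >= minimum
    ]


edges = to_edges(TRIPLES)
degree = degrees(edges)
print(degree.most_common())
print(filter_by_degree(edges, degree, minimum=2))
```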
15. Dedalo
Patterns are explained with knowledge from Linked Data
Machine Learning
positive vs. negative observations
Logic Programming
reasoning upon examples
Linked Data as knowledge base
Graph Search
clever exploration of the Linked Data graphs
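A toy Python sketch of the general idea: walk the Linked Data graph from each observation, and keep the (path, value) pair that best separates the positive observations from the negative ones. This is a simplification under stated assumptions, not Dedalo's actual search strategy: the graph, the depth limit of 2, and the use of the F-measure as the score are all choices made for illustration.

```python
# Toy illustration only; the graph and all "ex:" names are hypothetical.
GRAPH = {
    "ex:paper1": [("ex:topic", "ex:SemanticWeb")],
    "ex:paper2": [("ex:topic", "ex:LinkedData")],
    "ex:paper3": [("ex:topic", "ex:Databases")],
    "ex:SemanticWeb": [("ex:broader", "ex:Web")],
    "ex:LinkedData": [("ex:broader", "ex:Web")],
    "ex:Databases": [("ex:broader", "ex:DataManagement")],
}


def walks(start, max_depth):
    """Yield (predicate path, end node) pairs reachable within max_depth."""
    frontier = [((), start)]
    for _ in range(max_depth):
        next_frontier = []
        for path, node in frontier:
            for predicate, target in GRAPH.get(node, []):
                step = (path + (predicate,), target)
                next_frontier.append(step)
                yield step
        frontier = next_frontier


def f_measure(covered, positives):
    """Harmonic mean of precision and recall of a candidate explanation."""
    true_positives = len(covered & positives)
    if not true_positives:
        return 0.0
    precision = true_positives / len(covered)
    recall = true_positives / len(positives)
    return 2 * precision * recall / (precision + recall)


def best_explanation(positives, negatives, max_depth=2):
    """Return the (path, value) pair with the highest F-measure."""
    coverage = {}
    for item in positives | negatives:
        for candidate in walks(item, max_depth):
            coverage.setdefault(candidate, set()).add(item)
    return max(coverage.items(), key=lambda kv: f_measure(kv[1], positives))


positives = {"ex:paper1", "ex:paper2"}
negatives = {"ex:paper3"}
explanation, covered = best_explanation(positives, negatives)
print(explanation, covered)
```

In this made-up graph, no single topic covers both positive papers, but following a second link to the broader topic does, which is the kind of explanation that only shows up when the graph is explored beyond the first hop.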
16. Discussion and conclusions
Why are those tools not enough?
What are they missing?
Why do KDD people not use Linked Data more?
What should the Linked Data community do to make Linked Data more appealing?
Does anybody care about it?
Should we care?