LD4KD 2015
Linked Data 4 Knowledge Discovery
Demos and tools
Demos and tools: what for?
Papers are one thing…
…but what can I practically do with Linked Data?
We wanted some answers:
How much do Linked Data people know about KDD tools?
What can KDD people do with Linked Data?
Demos and tools: what did we do?
We asked the Linked Data community to provide us with tools
We looked at the KDD tools we knew to see whether (and how) they integrate Linked Data
Are we missing something? Are we wrong about something?
Tell us here: https://goo.gl/DSTAFm
What can Linked Data do for KDD?
Preprocessing Mining Postprocessing
Validating
Enriching
Reasoning
Mining
Visualising
Interpreting
Open Refine X
Rapidminer-LD X X
Rapidminer-RMonto X X
R – SPARQL pkg X X
Matlab – SciSPARQL X X
ProLOD++ X X
DL-Learner X
Spark – GraphX&RDF X X
Knime – SPARQL nodes X X
Gephi – SemanticWebImport X X X
Dedalo X
Open Refine – RDF extension
Open Refine
tool for working with (messy) data
reconcile, clean, match data
RDF Refine [1]
• Reconcile/interlink
• SPARQL endpoints, RDF dumps
• Search the Web for related RDF datasets
• Export RDF
• Use existing vocabularies (auto-completion)
[1] Maali et al. – DERI research centre, Ireland
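Reconciliation means matching messy cell values to the URIs of existing resources. As a rough illustration only (this is not RDF Refine's actual matching algorithm), the sketch below normalises labels and uses Python's `difflib` for fuzzy matching. The `candidates` mapping of URI to label is assumed to have been fetched beforehand, and the function names are hypothetical.

```python
import difflib
import unicodedata


def normalise(label: str) -> str:
    """Lower-case, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def reconcile(value, candidates, cutoff=0.8):
    """Return the URI whose label best matches value, or None if nothing is close enough.

    candidates maps URI -> label (e.g. rdfs:label values retrieved earlier).
    If two labels normalise to the same string, the last one wins.
    """
    by_label = {normalise(label): uri for uri, label in candidates.items()}
    matches = difflib.get_close_matches(normalise(value), list(by_label), n=1, cutoff=cutoff)
    return by_label[matches[0]] if matches else None
```

With candidates `{"http://example.org/Poznan": "Poznań", "http://example.org/Uppsala": "Uppsala"}`, the cell value `"Upsala"` is reconciled to the Uppsala URI despite the typo. Raising `cutoff` trades recall for precision.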
RapidMiner – LOD extension
RapidMiner
A tool to perform data mining tasks
Each process is a chain of operators
e.g. CSV import, data transformation and classification operators
Linked Data extension [2]
Enriching data with information from Linked Data (Linkers)
Input Linked Data (SPARQL and Data importers)
Explaining patterns with Linked Data
[2] Paulheim et al. – University of Mannheim
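Enriching data with Linked Data works in two steps: a linker attaches a URI to each example, and the attributes of that URI are then added as new columns. The sketch below imitates that idea in plain Python under stated assumptions. `simple_link` is a hypothetical pattern-based linker that builds a DBpedia-style URI from a label, and `enrich` merges SPARQL JSON result bindings that are assumed to have been fetched already for that URI. None of these functions are part of the extension itself.

```python
from urllib.parse import quote


def simple_link(label, base="http://dbpedia.org/resource/"):
    """Guess a resource URI from a label by URI pattern (hypothetical linker)."""
    return base + quote(label.strip().replace(" ", "_"))


def attribute_query(uri):
    """SPARQL query retrieving every (predicate, object) pair of a resource."""
    return f"SELECT ?p ?o WHERE {{ <{uri}> ?p ?o }}"


def enrich(row, label_column, bindings):
    """Return a copy of row with a URI column plus one column per predicate.

    bindings are the "results"/"bindings" entries of a SPARQL JSON response
    to attribute_query(). Multi-valued predicates keep only the last value.
    """
    enriched = dict(row)
    enriched["uri"] = simple_link(row[label_column])
    for binding in bindings:
        enriched[binding["p"]["value"]] = binding["o"]["value"]
    return enriched
```

Keeping query construction separate from merging makes it easy to swap in a different linker or endpoint without touching the enrichment step.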
RapidMiner – RMonto extension
RapidMiner
A tool to perform data mining tasks
Each process is a chain of operators
e.g. CSV import, data transformation and classification operators
RMonto extension [3]
Loading Data (SPARQL, RDF files)
Data transformation
Pattern Mining
Data extension
[3] Potoniec et al. – University of Poznan
R – CRAN SPARQL package
R programming language
Statistical computing and graphics
Need to explain more? 
SPARQL package [4]
• SPARQL queries (local/endpoints)
• Update data in the triple store
• Retrieve results as data frame for further processing
[4] van Hage et al. – SynerScope
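The R package hands query results back as a data frame. A Python analogue of that last step, sketched with the standard library only (this is not the R package's code), turns a response in the W3C SPARQL 1.1 Query Results JSON Format into a column-oriented table. The function name is hypothetical.

```python
import json


def results_to_columns(response_text):
    """Convert SPARQL 1.1 JSON results into {variable: [values, ...]}.

    Variables left unbound (e.g. by OPTIONAL) become None, so every column
    has the same length, as in a data frame.
    """
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    columns = {var: [] for var in variables}
    for binding in data["results"]["bindings"]:
        for var in variables:
            term = binding.get(var)
            columns[var].append(term["value"] if term else None)
    return columns
```

The resulting dictionary can be passed straight to most data-frame constructors for further processing.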
Matlab SPARQL extensions
MATLAB SciSPARQL Link (MSL) [5]
• Client-Server interface
• MATLAB (scientific computing) + SciSPARQL (scientific SPARQL queries)
• populate, update, and query SSDM databases using SPARQL queries
MatlabSPARQL
• Run queries against SPARQL endpoints
• Download data as Matlab structures
• Export in CSV format
[5] He – Uppsala University
ProLOD++
Profiling and Mining Linked Data [6]
Web platform for Linked Data
Merging heterogeneous sources
Cleansing, preprocessing
Analysis and exploration
Mining and profiling
[6] Abedjan et al. – Hasso Plattner Institute, Germany
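Profiling starts with simple statistics over the data. As a hypothetical illustration of one such statistic (ProLOD++ itself computes much richer profiles), the sketch below reports, for each predicate of an in-memory list of triples, how many triples use it, how many distinct subjects it describes, and what fraction of all subjects it covers.

```python
from collections import Counter, defaultdict


def profile_predicates(triples):
    """Per predicate: number of triples, distinct subjects, and subject coverage."""
    triple_count = Counter()
    subjects_per_pred = defaultdict(set)
    all_subjects = set()
    for s, p, o in triples:
        triple_count[p] += 1
        subjects_per_pred[p].add(s)
        all_subjects.add(s)
    return {
        p: {
            "triples": triple_count[p],
            "subjects": len(subjects_per_pred[p]),
            # Share of all subjects that have at least one value for p.
            "coverage": len(subjects_per_pred[p]) / len(all_subjects),
        }
        for p in triple_count
    }
```

Low-coverage predicates are often the first candidates to inspect when cleansing heterogeneous sources.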
DL-Learner
OWL-based machine learning tool for supervised learning
Supports the construction of knowledge
• Learn definitions for classes
• Find similar instances
• Classify instances
Reasoner adapters (e.g. FaCT++, Pellet)
Data import (OWL, N-Triples, SPARQL endpoints)
Command-line interface or Protégé plugin
[7] Lehmann et al. – University of Leipzig, Germany
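Learning a class definition means finding a description that covers all positive examples and none of the negative ones. DL-Learner does this with refinement operators over OWL class expressions and a reasoner; the sketch below is a drastic simplification that only searches conjunctions of asserted class names, without any reasoning. The `types` mapping and the function name are assumptions for illustration.

```python
from itertools import combinations


def learn_definition(types, positives, negatives, max_len=2):
    """Find the shortest conjunction of class names covering all positives and no negatives.

    types maps each individual to the set of classes it is asserted to belong to.
    At least one positive example is required. Returns a tuple of class names,
    or None if no conjunction of up to max_len classes separates the examples.
    """
    # Only classes shared by every positive example can appear in the definition.
    shared = sorted(set.intersection(*(set(types[i]) for i in positives)))
    for length in range(1, max_len + 1):
        for conj in combinations(shared, length):
            required = set(conj)
            # Reject the conjunction if any negative example satisfies it too.
            if not any(required <= set(types[i]) for i in negatives):
                return conj
    return None
```

With Berlin and Paris as positives and Hamburg as a negative, a `types` mapping in which only the first two are asserted as `Capital` yields the definition `("Capital",)`.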
KNIME
Data analytics platform
Workflows are chains of nodes
KNIME SPARQL Node
• SPARQL queries against endpoints
• Connection between KNIME and Apache Jena
• Results as string tables
Gephi – Semantic Web Import
Gephi: graph visualization & exploration
Networks, complex systems
Dynamic and hierarchical graphs
Semantic Web Import
SPARQL queries
Statistics on the imported graph
Graph filtering and cleaning
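Once a SPARQL result has been imported as a graph, the first statistics are usually node degrees. The sketch below computes them in plain Python rather than in Gephi. Two simplifications are assumptions: literal objects are detected crudely as values that do not start with `http`, and the reported average is the mean total (in plus out) degree.

```python
from collections import Counter


def degree_statistics(edges, drop_literals=True):
    """Return (in_degree, out_degree, average_degree) for directed (s, p, o) edges.

    When drop_literals is True, edges whose object does not look like an IRI
    (here: does not start with "http") are filtered out first, a crude form
    of graph cleaning. average_degree is 2 * |edges| / |nodes|.
    """
    if drop_literals:
        edges = [e for e in edges if e[2].startswith("http")]
    out_deg = Counter(s for s, _, _ in edges)
    in_deg = Counter(o for _, _, o in edges)
    nodes = set(out_deg) | set(in_deg)
    average = 2 * len(edges) / len(nodes) if nodes else 0.0
    return in_deg, out_deg, average
```

Dropping literals before computing statistics keeps attribute values such as names from showing up as spurious leaf nodes.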
Spark – Linked Data processing
Spark – Large scale data processing
GraphX
• graph managing
• parallel computation
• graph algorithms
RDF processing plugins
• banana-rdf
• SparkRDF
• ScalaRDFProcessing
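PageRank is one of the graph algorithms GraphX ships with. To show what such an algorithm computes on an RDF-derived graph, the sketch below is a single-machine power-iteration PageRank in plain Python. It is not GraphX's distributed implementation, which differs in details such as normalisation and convergence tolerance; the damping factor of 0.85 and the fixed iteration count are assumed defaults.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed list of (source, target) pairs.

    Ranks are normalised to sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank
```

To apply it to RDF, each triple whose object is a resource can be turned into a `(subject, object)` edge before calling `pagerank`.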
Dedalo
Patterns are explained with knowledge from Linked Data
Machine Learning
positive vs. negative observations
Logic Programming
reasoning over examples
Linked Data as knowledge base
Graph Search
heuristic exploration of the Linked Data graph
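The idea can be sketched as follows: starting from the items in a pattern (positives) and outside it (negatives), follow property paths through the graph and score each reached (path, value) pair by how well it separates the two groups. This is a simplified, exhaustive stand-in under stated assumptions (Dedalo's own search is guided rather than exhaustive): the graph is an in-memory dictionary of node to (property, object) pairs, the score is the F-measure, and all names are hypothetical.

```python
from collections import defaultdict


def reachable(graph, start, max_depth):
    """Yield (path, value) pairs, where path is the tuple of properties followed from start."""
    frontier = [((), start)]
    for _ in range(max_depth):
        next_frontier = []
        for path, node in frontier:
            for prop, obj in graph.get(node, []):
                step = (path + (prop,), obj)
                yield step
                next_frontier.append(step)
        frontier = next_frontier


def best_explanation(graph, positives, negatives, max_depth=2):
    """Return the (path, value) that best separates positives from negatives, and its F-measure."""
    covered = defaultdict(set)  # (path, value) -> items that reach value via path
    for item in list(positives) + list(negatives):
        for path_value in reachable(graph, item, max_depth):
            covered[path_value].add(item)
    pos, best, best_f = set(positives), None, 0.0
    for path_value, items in covered.items():
        tp = len(items & pos)
        if tp == 0:
            continue
        precision = tp / len(items)
        recall = tp / len(pos)
        f = 2 * precision * recall / (precision + recall)
        if f > best_f:
            best, best_f = path_value, f
    return best, best_f
```

For example, if two books in a cluster have British authors and a third book outside it has a French author, the best explanation is the path `("author", "nationality")` ending in the UK value, which no single-hop property can express.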
Discussion and conclusions
Why are those tools not enough?
What are they missing?
Why don't KDD people use Linked Data more?
What should the Linked Data community do to
make Linked Data more appealing?
Does anybody care about it?
Should we care?
THANKS
FOR YOUR ATTENTION!
