This document discusses interaction with linked data, focusing on visualization techniques. It begins with an overview of the linked data visualization process, including extracting data analytically, applying visualization transformations, and generating views. It then covers challenges like scalability, handling heterogeneous data, and enabling user interaction. Various visualization techniques are classified and examples are provided, including bar charts, graphs, timelines, and maps. Finally, linked data visualization tools and examples using tools like Sigma, Sindice, and Information Workbench are described.
This presentation covers the whole spectrum of Linked Data production and exposure. After a grounding in the Linked Data principles and best practices, with special emphasis on the VoID vocabulary, we cover R2RML, operating on relational databases, Open Refine, operating on spreadsheets, and GATECloud, operating on natural language. Finally we describe the means to increase interlinkage between datasets, especially the use of tools like Silk.
Big Linked Data - Creating Training Curricula (EUCLID project)
This presentation includes an overview of the basic rules to follow when developing training and education curricula for Linked Data and Big Linked Data.
This presentation gives details on technologies and approaches towards exploiting Linked Data by building LD applications. In particular, it gives an overview of popular existing applications and introduces the main technologies that support implementation and development. Furthermore, it illustrates how data exposed through common Web APIs can be integrated with Linked Data in order to create mashups.
This presentation addresses the main issues of Linked Data and scalability. In particular, it gives details on approaches and technologies for clustering, distributing, sharing, and caching data. Furthermore, it addresses the means for publishing data through cloud deployment and the relationship between Big Data and Linked Data, exploring how some of those solutions can be transferred to the context of Linked Data.
This presentation looks in detail at SPARQL (SPARQL Protocol and RDF Query Language) and introduces approaches for querying and updating semantic data. It covers the SPARQL algebra and the SPARQL protocol, and provides examples of reasoning over Linked Data. We use examples from the music domain, which can be directly tried out and run over the MusicBrainz dataset. This includes gaining some familiarity with the RDFS and OWL languages, which allow developers to formulate generic and conceptual knowledge that can be exploited by automatic reasoning services in order to enhance the power of querying.
Usage of Linked Data: Introduction and Application Scenarios (EUCLID project)
This presentation introduces the main principles of Linked Data, the underlying technologies and background standards. It provides basic knowledge for how data can be published over the Web, how it can be queried, and what are the possible use cases and benefits. As an example, we use the development of a music portal (based on the MusicBrainz dataset), which facilitates access to a wide range of information and multimedia resources relating to music.
Existing data management approaches assume control over schema, data and data generation, which is not the case in open, de-centralised environments such as the Web. The lack of control means that there are social processes necessary to generate 'ordo ab chao' and hence a new life cycle model is necessary.
Based on our experience with Linked Data publishing and consumption over the past years, we have identified the involved parties and fundamental phases, which give rise to a multitude of so-called Linked Data life cycles.
If you want to hear me speak to the slides, you might want to check out the following videos on YouTube:
Part 1: http://www.youtube.com/watch?v=AFJSMKv5s3s
Part 2: http://www.youtube.com/watch?v=G6YJSZdXOsc
Part 3: http://www.youtube.com/watch?v=OagzNpDEPJg
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage (Ontotext)
Scholars, book researchers, and museum directors who try to find the underlying connections between resources face many issues. Scholars in particular continuously emphasize the role of the digital humanities and the value of linked data in cultural heritage information systems.
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud (Ontotext)
This webinar will break the roadblocks that prevent many from reaping the benefits of heavyweight Semantic Technology in small scale projects. We will show you how to build Semantic Search & Analytics proof of concepts by using managed services in the Cloud.
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters (New York University)
This presentation, given to the University of Iowa Libraries on Nov. 17, 2014, discusses 1) the alphabet soup of metadata standards, e.g. CDM, VRA, CCO, METS, MODS, RDF, including sample tagging and their applications for digital libraries, and 2) why metadata matters. It does not address metadata issues and tools for metadata creation, extraction, transformation, quality control, syndication, and ingest.
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data (Sören Auer)
Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live, as well as the recently launched DBpedia benchmark.
NISO Webinar:
Experimenting with BIBFRAME: Reports from Early Adopters
About the Webinar
In May 2011, the Library of Congress officially launched a new modeling initiative, the Bibliographic Framework Initiative, as a linked data alternative to MARC. The Library then announced the proposed model, called BIBFRAME, in November 2012. Since then, the library world has been moving from mainly theorizing about the BIBFRAME model to practical experimentation and testing. This experimentation is iterative, and continues to shape the model so that it is stable enough, and broadly acceptable enough, for adoption.
In this webinar, several institutions will share their progress in experimenting with BIBFRAME within their library system. They will discuss the existing, developing, and planned projects happening at their institutions. Challenges and opportunities in exploring and implementing BIBFRAME in their institutions will be discussed as well.
Agenda
Introduction
Todd Carpenter, Executive Director, NISO
Experimental Mode: The National Library of Medicine and experiences with BIBFRAME
Nancy Fallgren, Metadata Specialist Librarian, National Library of Medicine, National Institutes of Health, US Department of Health and Human Services (DHHS)
Exploring BIBFRAME at a Small Academic Library
Jeremy Nelson, Metadata and Systems Librarian, Colorado College
Working with BIBFRAME for discovery and production: Linked data for Libraries/Linked Data for Production
Nancy Lorimer, Head, Metadata Dept, Stanford University Libraries
The slide set used for an introduction/tutorial on DBpedia use cases, concepts and implementation aspects, held during the DBpedia community meeting in Dublin on 9 February 2015. (Slide creators: M. Ackermann, M. Freudenberg; additional presenter: Ali Ismayilov)
Linked Data for the Masses: The approach and the Software (IMC Technologies)
Title: Linked Data for the Masses: The approach and the Software
@ EELLAK (GFOSS) Conference 2010
Athens, Greece
15/05/2010
Creator: George Anadiotis (R&D Director)
Linking Open, Big Data Using Semantic Web Technologies - An Introduction (Ronald Ashri)
The Physics Department of the University of Cagliari and the Linkalab Group invited me to talk about the Semantic Web and Linked Data - this is simply an introduction to the technologies involved.
Managing a company of geeks is quite particular. These employees of a special kind want new technologies, time to experiment, knowledge sharing, pair programming, a say in company strategy, and real freedom of speech.
Speaker: Luc Legardeur, President of Xebia, at Devoxx France 2015
Tyler Baldwin, Yunyao Li, Bogdan Alexe, Ioana Roxana Stanoi: Automatic Term Ambiguity Detection. ACL (2) 2013: 804-809
Abstract:
While the resolution of term ambiguity is important for information extraction (IE) systems, the cost of resolving each instance of an entity can be prohibitively expensive on large datasets. To combat this, this work looks at ambiguity detection at the term, rather than the instance, level. By making a judgment about the general ambiguity of a term, a system is able to handle ambiguous and unambiguous cases differently, improving throughput and quality. To address the term ambiguity detection problem, we employ a model that combines data from language models, ontologies, and topic modeling. Results over a dataset of entities from four product domains show that the proposed approach achieves an F-measure of 0.96, significantly above the baseline.
It is quite often observed that when people use retrieval systems, they are not just searching for documents or text passages, but for some information contained inside them, related to entities such as persons, organizations, locations, events, times, etc. The goal is to find various kinds of valuable semantic information about real-world entities embedded in different web pages and databases. But it is difficult to find specific or exact information about entities with present search engines. So we need search engines that will interpret our queries across different domains and extract structured information about entities.
A Comparison of NER Tools w.r.t. a Domain-Specific Vocabulary (Timm Heuss)
Presentation held at SEMANTiCS 2014, accompanying this paper: http://doi.acm.org/10.1145/2660517.2660520
In this paper we compare several state-of-the-art Linked Data knowledge extraction tools with regard to their ability to recognise entities from a controlled, domain-specific vocabulary. This includes tools that offer APIs as a service, locally installed platforms, as well as a UIMA-based approach as a reference. We evaluate under realistic conditions, with natural language source texts from keywording experts of the Städel Museum Frankfurt. The goal is to find first hints as to which tool approach or strategy is more convincing for domain-specific tagging/annotation, towards a working solution of the kind demanded by GLAMs world-wide.
Roy Tennant, Senior Program Officer, OCLC Research
As library collections shift from print materials to digital formats, and as the web enables ubiquitous and instantaneous discovery of information, library users expect to find and access materials online. It’s not enough to have pages “on the web”; library data must be “woven into the web” and integrated into the sites and services that library users frequent daily – Google, Wikipedia, social networks. When information about a library’s collection is locked up behind a specific web site (such as an OPAC), it is often exceedingly difficult for services, such as search engines, to consume that data. Information seekers need to be connected back to their local library resources from wherever they are on the web. The imperative is to make library data available in new data formats that are native to the web, exposing it to the wider web community, making it easily discoverable by other sites, services, and ultimately consumers. Roy Tennant will shed light on what linked data is and how to re-envision, expose and share library data as entities that are part of the web.
(http://lod2.eu/BlogPost/webinar-series) In this webinar Michael Martin presents CubeViz, a faceted browser for statistical data utilizing the RDF Data Cube vocabulary, the state of the art in representing statistical data in RDF. This vocabulary is compatible with SDMX and is increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz generates a faceted browsing widget that can be used to interactively filter the observations to be visualized in charts. Based on the selected structure, CubeViz offers suitable chart types and options that can be selected by users.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
The process and steps followed in creating a successful visualization, illustrated with Encyclopaedia of Life data and a Tableau visualization prototype.
Presentation of the European project ECHOES, given on 28 June 2018 in Leiden (Netherlands), where CSUC showed the project's objectives and main characteristics to Dutch technology companies.
SEMANCO - Integrating multiple data sources, domains and tools in urban ener... (Álvaro Sicilia)
Semantic interoperability based on ontologies provides an alternative to centralized standard data models. It helps to integrate heterogeneous data produced by loosely coupled information systems and to interlink these data with different tools in ad hoc situations. In the SEMANCO project (www.semanco-project.eu) we have used semantic technologies to create energy models of urban areas encompassing a variety of data sources and domains (building, geospatial, energy, climate, socioeconomic). The semantically modelled data has been made accessible to a set of simulation and analysis tools. The interoperability among the data sources, and between these and the tools that interact with them, is assured by a Semantic Energy Information Framework (SEIF) developed in the project. Access to the data and tools takes place in the SEMANCO integrated platform. In this paper we describe the work carried out to integrate an existing simulation software, URSOS, with the semantic data model. The functionalities of the tool and the integrated platform have been demonstrated in an application case carried out in the city of Manresa, Spain.
December 2013 webinar from the EUCLID project on managing large volumes of Linked Data
webinar recording at https://vimeo.com/84126769 and https://vimeo.com/84126770
more info on EUCLID: http://euclid-project.eu/
Presented in : JIST2015, Yichang, China
Prototype: http://rc.lodac.nii.ac.jp/rdf4u/
Video: https://www.youtube.com/watch?v=z3roA9-Cp8g
Abstract: It is known that the Semantic Web and Linked Open Data (LOD) are powerful technologies for knowledge management, and explicit knowledge is expected to be represented in the RDF format (Resource Description Framework), but ordinary users stay away from RDF due to the technical skills required. Since a concept map or node-link diagram can enhance the learning ability of learners from beginner to advanced level, RDF graph visualization can be a suitable tool for making users familiar with semantic technology. However, an RDF graph generated from a whole query result is not suitable for reading, because it is highly connected, like a hairball, and poorly organized. To make a graph presenting knowledge more readable, this research introduces an approach to sparsify a graph using the combination of three main functions: graph simplification, triple ranking, and property selection. These functions are mostly based on the interpretation of RDF data as knowledge units, together with statistical analysis, in order to deliver an easily readable graph to users. A prototype is implemented to demonstrate the suitability and feasibility of the approach. It shows that the simple and flexible graph visualization is easy to read and makes a good impression on users. In addition, the attractive tool helps to inspire users to realize the advantageous role of linked data in knowledge management.
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge (Rathachai Chawuthai)
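The triple-ranking idea described in the RDF4U abstract can be sketched in a few lines of plain Python. This is an illustrative heuristic of my own, not the paper's actual algorithm: triples whose predicate is rarely used are treated as more informative and ranked first, while very common predicates (like rdf:type) sink to the bottom.

```python
from collections import Counter

# Toy triples as (subject, predicate, object) tuples.
triples = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Bob",   "rdf:type", "ex:Person"),
    ("ex:Carol", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:discovered", "ex:Radium"),
]

# Count how often each predicate occurs; rarer predicates are
# considered more informative and ranked higher.
freq = Counter(p for _, p, _ in triples)
ranked = sorted(triples, key=lambda t: freq[t[1]])

print(ranked[0])  # ('ex:Alice', 'ex:discovered', 'ex:Radium')
```

A visualizer could then keep only the top-ranked triples, sparsifying the "hairball" graph the abstract describes while preserving the most distinctive statements.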
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf (Peter Spielvogel)
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... (James Anderson)
3. Motivation: Music! (2)
EUCLID – Interaction with Linked Data 3
• Our aim: build a music-based portal using Linked Data technologies
• So far, we have studied different mechanisms to consume Linked Data:
  • Executing SPARQL queries
  • Dereferencing URIs
  • Downloading RDF dumps
  • Extracting RDFa data
• The output of these mechanisms corresponds to data in machine-readable formats
5. Motivation: Music! (4)
Visualization techniques are needed in order to transform the machine-readable data into this:
Source: http://musicbrainz.fluidops.net/
6. Motivation: Music! (5)
In addition, visualization techniques allow for:
• Telling a story
• Engaging our pattern-matching brain
• Identifying data characteristics which cannot be directly inferred from statistical properties:
  • Anscombe's quartet: four datasets that look very different when plotted, yet share the same summary statistics.
Image: http://en.wikipedia.org/wiki/Anscombe's_quartet
Source: Donaldson, I. and Lamere, P. Using Visualizations for Music Discovery
Image: Chan, W., Qu, H., Mak, W. Visualizing the Semantic Structure in Classical Musical Works.
7. Agenda
1. Linked Data visualization
2. Linked Data search
3. Methods for Linked Data analysis
9. LD Visualization Techniques
• Linked Data visualization techniques should provide graphical representations of the information within the LD datasets
• Visualization techniques should be selected according to:
  – The type of data: specific types of data should be visualized in a certain way
  – The purpose of the visualization: depending on the type of analysis/application to employ
10. LD Visualization Techniques (2)
Overview of the Linked Data Visualization process
• (Raw) RDF data: instance data, taxonomies, ontologies, vocabularies.
• Analytically extracted data: a subset of the data called the region of interest (ROI), obtained via data extraction mechanisms, for example, SPARQL queries.
• Visualization abstraction: obtained by applying visualization transformations to render the data into displayable information.
• View: the final result. Visual mapping transformations produce a graphical representation of the data using the selected visualization technique.
• User interaction (optional): the user interacts (click, zoom, etc.) with the visualization, which may trigger a new visualization process.
[Pipeline: RDF data → (data extraction) → analytically extracted data → (visualization transformation) → visualization abstraction → (visual mapping transformation) → view]
Process partially based on: Brunetti, J.M.; Auer, S.; García, R. The Linked Data Visualization Model.
11. LD Visualization Techniques (3)
Example of the Linked Data Visualization process
Data extraction — SPARQL query: retrieve the number of releases per country of The Beatles:

  SELECT ?country (COUNT(?release) AS ?releases)
  WHERE {
    <http://dbpedia.org/resource/The_Beatles> foaf:made ?release .
    ?release a mo:Release ;
             mo:label ?label .
    ?label foaf:based_near ?country .
  }
  GROUP BY ?country
  ORDER BY DESC(?releases)

Analytically extracted data:

  country          releases
  United Kingdom   225
  United States    140
  Germany          30
  Luxembourg       29
  …

Visualization transformation — formatting the names of the countries:

  ?country_code2 := REPLACE(str(?country), "http://ontologi.es/place/", "", "i")
  ?country_code  := REPLACE(?country_code2, "%", "", "i")

  country_code   releases
  GB             225
  US             140
  DE             30
  LU             29
  …

Visual mapping transformation — selecting the visualization technique (input, output):

  #widget : HeatMap |
  input = 'country_code' |
  output = {{ 'releases' }}

View: the rendered heat map. The visualization transformation and the visual mapping transformation can be performed in a single step.
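The three stages of this example can be sketched in plain Python; this is a minimal illustration, not part of any tool's API — the endpoint call and the actual widget rendering are elided, and the function names `to_country_code` and `to_widget` are made up for the sketch:

```python
import re

# Stage 1 output (data extraction): rows as returned by the SPARQL
# query above (illustrative country URIs).
extracted = [
    {"country": "http://ontologi.es/place/GB%", "releases": 225},
    {"country": "http://ontologi.es/place/US%", "releases": 140},
    {"country": "http://ontologi.es/place/DE%", "releases": 30},
    {"country": "http://ontologi.es/place/LU%", "releases": 29},
]

def to_country_code(uri):
    """Visualization transformation: strip the namespace and the
    trailing '%', mirroring the two REPLACE expressions above."""
    code = re.sub(r"http://ontologi\.es/place/", "", uri, flags=re.I)
    return re.sub(r"%", "", code, flags=re.I)

def to_widget(rows):
    """Visual mapping transformation: bind the prepared columns to a
    (hypothetical) heat-map widget configuration."""
    return {
        "widget": "HeatMap",
        "input": [to_country_code(r["country"]) for r in rows],
        "output": [r["releases"] for r in rows],
    }

view = to_widget(extracted)
```

As the slide notes, the last two stages can be collapsed into one step: `to_widget` already calls `to_country_code` while building the widget configuration.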
13. Challenges for Linked Data Visualization
• Enabling user interaction
  – Users must be able to navigate through the data by exploiting the connections between Linked Data resources
  – The user might edit the underlying data to enrich it by:
    • Creating additional metadata
    • Highlighting or correcting errors
    • Validating data
• Supporting data reusability
  – The output (the plotted data or the visualization itself) might be encoded using standard ontologies and vocabularies
• Scalability
  – Linked Data visualization techniques should support the display of large amounts of data in an efficient way
14. Challenges for Linked Open Data Visualization
• Extracting data from different repositories
  – A Linked Data set might be partitioned into several repositories
  – The region of interest (ROI) might include data from different data sets, requiring access to distributed repositories
• Handling heterogeneous data
  – The same data (concepts) might be modeled differently, for example, using different vocabularies
  – Certain values might have different formats, for example, dates represented as DD-MM-YYYY, MM-DD-YYYY or just YYYY
• Dealing with missing values
  – Due to the semi-structured nature of Linked Data, some instances might have missing values for certain properties
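The last two challenges can be made concrete with a small sketch. Assuming the three date formats mentioned above, a pragmatic normalizer reduces them to the granularity they all share (the year) — note that DD-MM-YYYY and MM-DD-YYYY cannot be told apart in general, which is exactly why heterogeneity is a challenge:

```python
import re

def normalize_year(value):
    """Reduce DD-MM-YYYY, MM-DD-YYYY and plain YYYY to the year.
    The day/month order is ambiguous, but the year position is the
    same in both formats, so the year is safely recoverable.
    Returns None for missing values, which Linked Data instances
    may legitimately lack."""
    if value is None:
        return None
    value = value.strip()
    if re.fullmatch(r"\d{4}", value):
        return int(value)
    m = re.fullmatch(r"\d{2}-\d{2}-(\d{4})", value)
    if m:
        return int(m.group(1))
    return None  # unrecognized format: treat as missing
```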
15. Classification of Visualization Techniques
• Comparison of attributes / values: bar/column and pie charts, line charts, histograms
• Analysis of relationships and hierarchies: graphs, arc diagrams, matrices, node-link visualizations, space-filling techniques (treemaps, icicles and sunburst, circle packing, rose diagrams)
• Analysis of temporal or geographical events: timelines, maps
• Analysis of multi-dimensional data: parallel coordinates, radar/star charts, scatter plots
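An application that recommends chart types can keep this classification as a simple lookup structure; the task keywords below are illustrative shorthand for the four rows, not an established taxonomy:

```python
# The task-to-technique classification as a lookup table.
TECHNIQUES = {
    "comparison": ["bar/column chart", "pie chart", "line chart",
                   "histogram"],
    "relationships": ["graph", "arc diagram", "matrix",
                      "node-link visualization", "treemap", "icicle",
                      "sunburst", "circle packing", "rose diagram"],
    "spatio-temporal": ["timeline", "map"],
    "multi-dimensional": ["parallel coordinates", "radar/star chart",
                          "scatter plot"],
}

def suggest(task):
    """Return candidate visualization techniques for a task keyword."""
    return TECHNIQUES.get(task, [])
```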
16. Comparison of Attributes / Values
Bar/column chart: allows the comparison of values of different categories.
Pie chart: useful for performing comparison of percentages or proportions.
Line chart: allows visualizing data as a series of data points, where the measurement points (x-axis) are ordered.
Histogram: graphical representation of the distribution of the data.
Image sources: http://mbostock.github.io/protovis/, http://musicbrainz.fluidops.net
17. Analysis of Relationships and Hierarchies
Graph: the data entries are represented as nodes and the links as edges.
Arc diagram: the nodes are displayed in one dimension, and the arcs represent the connections.
Adjacency matrix diagram: the nodes are displayed as rows and columns, and the links between the nodes are entries in the matrix.
Node-link visualizations: the data is organized in hierarchies.
Source of images: http://mbostock.github.io/protovis/
18. Analysis of Relationships and Hierarchies (2)
Space-filling techniques:
Treemaps: subdivide the area into rectangles.
Icicles and sunburst: hierarchies are represented by adjacencies.
Circle packing: containment is used to represent the hierarchies.
Rose diagrams: the sectors span equal angles, and the data is represented by the radial extent of each area.
Source of images: http://mbostock.github.io/protovis/
19. Analysis of Temporal or Geographical Events
Timeline: shows discrete data points in time or continuous data in time.
Maps:
• Location maps: display geo-points on a map
• Choropleth maps: aggregate data by geographical area
• Dorling cartograms: aggregate data and replace each area with a circle
Image sources: http://mbostock.github.io/protovis/, http://www.kottke.org/08/08/2008-movie-box-office-chart, http://musicbrainz.fluidops.net, Google Maps API
20. Analysis of Multidimensional Data
Scatter plot: displays the values of two variables as a collection of points, revealing correlations between them.
Radar/star chart: displays multivariate data as a two-dimensional chart. The axes correspond to the variables.
Parallel coordinates: allow visualizing high-dimensional data. Each vertical axis denotes a dimension, and a multidimensional point is represented as a polyline with vertices on the axes.
Source of images: http://mbostock.github.io/protovis/
21. Other Visualization Techniques
• Text-based visualizations: tag clouds
• Some of the previously presented techniques can be combined to produce more complex data visualizations
Examples: Phrase Net of Beatles lyrics (Source: http://many-eyes.com); DBpedia music genres (Source: http://www.wordle.net)
22. Applications of Linked Data Visualization Techniques
• Getting an overview of the data
• Identifying relevant resources, classes or properties in datasets
• Learning about certain underlying characteristics of the data, e.g., vocabularies or ontologies
• Detecting missing links between nodes in an RDF graph
• Discovering new paths between nodes in an RDF graph
• Identifying hidden patterns in the data
• Finding errors or atypical values (outliers)
23. Linked Data Visualization Tool Requirements
The requirements for visualization tools that consume Linked Data can be summarized as follows:
• Data navigation and exploration capabilities in order to understand the structure and the content
• Exploiting data structures:
  • Links, to visualize hierarchies or graphs
  • Multi-dimensional data
• User interaction:
  • Basic and advanced querying
  • Filtering values
  • Interactive UI: responsive to the user input
• Publication/syndication of the graphical representation of the data
• Data extraction, in order to export the data such that it can be reused by third parties
24. Linked Data Visualization Tool Types
1. LD browsers with text-based representation
  • Dereference URIs to retrieve the resource description
  • Use a textual representation of LD resources
  • Display texts and images adequately
  • Mainly support exploratory browsing and knowledge discovery
2. LD and RDF browsers with visualization options
  • Exploit pictures, graphics, images and other visual representations of the data
  • Support user interaction: allow querying, filtering and jumping between resources
  • Suitable for browsing and knowledge discovery as well as analytic activities
25. Linked Data Visualization Tool Types (2)
3. Visualization toolkits
  • Frameworks providing a wide range of visualization techniques
  • General toolkits support LD visualization by applying a set of transformations to the data
  • Some toolkits are specially designed to consume LD
4. SPARQL visualization
  • These tools allow transforming the output of SPARQL queries into graphics
  • Contact SPARQL endpoints in order to evaluate the query
  • Suitable for analytical activities
26. Linked Data Visualization Tool Types (3)
• LD browsers with text-based presentations: Sig.ma, Sindice, OpenLink RDF Browser, Marbles, Disco Hyperdata Browser, Piggy Bank (SIMILE), Zitgist DataViewer, iLOD, URI Burner, Dipper – Talis Platform Browser
• LD and RDF browsers with visualization options: Tabulator, IsaViz, OpenLink Data Explorer, RDF Gravity, RelFinder, DBpedia Mobile, LESS, SIMILE Exhibit, Haystack, FoaF Explorer, Humboldt, LENA, Noadster
• Visualization toolkits
  – Linked Data tools: Information Workbench, Visual RDF (by Graves), LOD Live, LOD Visualization
  – General data tools: Data-Driven Documents (D3), NetworkX, Many Eyes, Tableau, Prefuse
• SPARQL visualization: Information Workbench, Google Visualization API, SPARQL package for R, Gruff (for AllegroGraph)
27. Linked Data Visualization Examples (1)
Sig.ma (Source: http://sig.ma/search?q=The+Beatles)
• Keyword search
• Retrieves information from different LD sources
• Displays values per predicate
• Displays the source for each value
28. Linked Data Visualization Examples (2)
Sig.ma (Source: http://sig.ma/search?q=The+Beatles)
• Displays values per predicate: may include (redundant) information in different languages, for example "années" and "anno"
• URIs are clickable, allowing navigation through RDF resources
Summary:
• Sig.ma lists all the triples and groups them per predicate
• Useful for browsing predicates and values within data sets
• The meaning of the values is not evident
29. Linked Data Visualization Examples (3)
Sindice (Source: http://sindice.com/search?q=The+Beatles)
• Keyword search
• Filtering per type of document
• Retrieves links to documents
• Allows accessing cached documents
• Allows inspecting resources
30. Linked Data Visualization Examples (4)
Sindice
• Both interfaces (cache triples and live triples) display the set of triples related to the inspected resource
31. Linked Data Visualization Examples (5)
Information Workbench
• Demo available at: http://musicbrainz.fluidops.net
• Displays human-readable content about Linked Data resources
• Supports visualization techniques (different types of charts, maps, timelines, etc.) to plot results from SPARQL queries
• Allows the user to interact with the displayed data
35. Linked Data Visualization Examples (9)
Information Workbench: user interaction
LD visualizations must support navigation through the data
Source: http://musicbrainz.fluidops.net/resource/Analytical5
36. Linked Data Visualization Examples (9)
Information Workbench: SPARQL visualization
Implements widgets which allow:
• Retrieving the ROI via SPARQL queries
• Selecting the appropriate visualization technique
• Configuring parameters of the visualization
37. Linked Data Visualization Examples (10)
Information Workbench: SPARQL visualization
SPARQL query — top ten The Beatles releases according to the sum of track durations in minutes:

  SELECT ?release ((SUM(xsd:double(?duration/60000))) AS ?avg)
  WHERE {
    <http://dbpedia.org/resource/The_Beatles> foaf:made ?release .
    ?release mo:record ?record .
    ?record mo:track ?track .
    ?track mo:duration ?duration .
  }
  GROUP BY ?release
  ORDER BY DESC(?avg)
  LIMIT 10

Result set: the top ten releases and their total track durations.
38. Linked Data Visualization Examples (11)
Information Workbench: SPARQL visualization
Widget — visualization: bar chart

  {{#widget: BarChart |
  query = 'SELECT (COUNT(?Release) AS ?COUNT) ?label
           WHERE {
             <http://musicbrainz.org/artist/8538e728-ca0b-4321-b7e5-cff6565dd4c0#_> foaf:made ?Release .
             ?Release rdf:type mo:Release .
             ?Release dc:title ?label .}
           GROUP BY ?label
           ORDER BY DESC(?COUNT)
           LIMIT 20'
  | settings = 'Settings:barvertical_mb'
  | asynch = 'true'
  | input = 'label'
  | output = 'COUNT'
  | height = '300'}}
39. Linked Data Visualization Examples (12)
Information Workbench: SPARQL visualization
Other visualizations of the same result set (top ten The Beatles releases by total track duration): line chart, pie chart.
40. Linked Data Visualization Examples (13)
Information Workbench: automated widget suggestion
• Suggested visualizations for a result set: (1) bar chart, (2) line chart, (3) pie chart, table, pivot view
• Selecting a suggested visualization builds it automatically
41. Linked Data Visualization Examples (14)
Other tools
LOD Live (Source: http://en.lodlive.it)
• Graph visualizations
• Interactive UI (the graph can be expanded by clicking on the nodes)
• Live access to SPARQL endpoints
LOD Visualization (Source: http://lodvisualization.appspot.com)
• Hierarchy visualizations: treemaps and trees
• Live access to SPARQL endpoints (supporting JSON and SPARQL 1.1)
42. Linking Open Data Cloud Visualization (1)
"The Linking Open Data cloud diagram" by Richard Cyganiak and Anja Jentzsch (Source: http://lod-cloud.net)
• The nodes correspond to Linked Data sets
• The edges represent connections between Linked Data sets
• The size of the nodes is proportional to the number of triples in each data set
• The datasets are categorized by knowledge domains, represented with colors
43. Linking Open Data Cloud Visualization (2)
"Linked Open Data Cloud" generated with Gephi (Image source: http://twitpic.com/17qj1h)
• The central cluster (green) displays DBpedia as a central focus
• The size of the nodes reflects the size of the datasets
• The length of the connections encodes information about the data structure
Source: A. Dadzie and M. Rowe. Approaches to Visualizing Linked Data: A Survey. 2011
44. Linking Open Data Cloud Visualization (3)
"Linked Open Data Graph" by Protovis (Source: http://inkdroid.org/lod-graph/)
• The data to be displayed are retrieved using the CKAN API
• The nodes represent Linked Data sets available in the Data Hub "lod-cloud" group
• The size of the nodes is proportional to the data set size
• Edges are connections between data sets
• The colors reflect the CKAN rating, and the intensity of the color reflects the number of received ratings
• The nodes can be clicked to go to the data set's CKAN page
45. LD Reporting
• Visualization techniques are used in the creation of reports included in data monitoring and management solutions
• Provides an overview of the dataset by generating a low-level descriptive analysis: quantitative information about the dataset
• Users may interact with the data via dashboards
• Some systems support this feature over structured data:
  • Google Webmaster Tools (https://www.google.com/webmasters/tools)
  • Information Workbench (http://www.fluidops.com/information-workbench)
  • eCloudManager (http://www.fluidops.com/ecloudmanager)
46. Google Webmaster Tools: Structured Data Dashboard (1)
• Provides webmasters with information about the structured data embedded in their websites (and recognized by Google)
• The dashboard provides three levels:
  i. Site-level view: aggregates the data by classes defined in the vocabulary schema
  ii. Item-type-level view: provides details per page for each type of resource
  iii. Page-level view: shows the attributes of every type of resource on a given web page
47. Google Webmaster Tools: Structured Data Dashboard (2)
Site-level view
Source: http://googlewebmastercentral.blogspot.de/2012/07/introducing-structured-data-dashboard.html
48. Google Webmaster Tools: Structured Data Dashboard (3)
Site-level view and page-level view
Source: http://googlewebmastercentral.blogspot.de/2012/07/introducing-structured-data-dashboard.html
50. Semantic Search Process
Using semantic models for the search process:
• The user submits a query (e.g., keywords or natural language)
• The system performs query analysis: entity extraction / semantic query analysis
• Graph matching evaluates the resulting query against the data graphs
• The results are presented and ranked (result visualization/presentation)
• Optionally, the user analyzes the presentation and refines the query, and the process repeats
The diagram distinguishes the parts of the process addressed by faceted search from those addressed by semantic search.
Image based on: Tran, T., Herzig, D., Ladwig, G. SemSearchPro – Using semantics through the search process
51. Semantic Search: Example (1)
User query (NL): "songs written by members of the beatles"
Entity extraction: song | written by | member (of) | (the) beatles
Query expansion: song → synonyms: track, melody, tune
Entity mapping: candidate: mo:Track
Image source: http://musicontology.com
52. Semantic Search: Example (2)
User query (NL): "songs written by members of the beatles"
Entity extraction: song | written by | member (of) | (the) beatles
Query expansion: written by → synonyms: writer, composer, creator
Entity mapping: candidate: mo:composer (inverse of "written by")
Image source: http://musicontology.com
53. Semantic Search: Example (3)
User query (NL): "songs written by members of the beatles"
Entity extraction: song | written by | member (of) | (the) beatles
Query expansion: member (of)
Entity mapping: candidates: mo:member_of, mo:member (inverse of)
Image source: http://musicontology.com
54. Semantic Search: Example (4)
User query (NL): "songs written by members of the beatles"
Entity extraction: song | written by | member (of) | (the) beatles
Entity mapping for "(the) beatles" — candidates: Beatles (Book), The Beatles (Music Group), Beatle (Animal), Beatle (Automobile)
How to identify the right "Beatle"? Examine the context (contextual analysis)
55. Semantic Search: Example (5)
User query (NL): "songs written by members of the beatles"
Entity extraction: song | written by | member (of) | (the) beatles
Contextual analysis: the query subgraph connects mo:Track via mo:composer to foaf:Agent; mo:MusicArtist and mo:MusicGroup are subclasses (rdfs:subClassOf) of foaf:Agent, and mo:member links a group to its members. This subgraph is part of the query.
Entity mapping: (the) beatles → The Beatles (Music Group) → dbpedia:The_Beatles
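The contextual analysis step can be sketched as a scoring problem: pick the candidate whose types overlap most with the classes implied by the rest of the query. The type sets below are illustrative stand-ins for what a real system would read from the RDF graph around each candidate resource:

```python
# Hypothetical type information for each candidate entity.
CANDIDATES = {
    "Beatles (Book)":            {"types": {"Book"}},
    "The Beatles (Music Group)": {"types": {"MusicGroup",
                                            "MusicArtist", "Agent"}},
    "Beatle (Animal)":           {"types": {"Animal"}},
    "Beatle (Automobile)":       {"types": {"Automobile"}},
}

def disambiguate(candidates, query_context):
    """Return the candidate whose types overlap most with the classes
    implied by the rest of the query (here mo:Track, mo:composer and
    mo:member imply a music group / agent)."""
    def score(name):
        return len(candidates[name]["types"] & query_context)
    return max(candidates, key=score)

# Classes derived from the query subgraph on the slide above.
query_context = {"MusicGroup", "Agent"}
best = disambiguate(CANDIDATES, query_context)
```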
56. Semantic Search: Example (6)
User query (NL): "songs written by members of the beatles"
Entity extraction: song | written by | member (of) | (the) beatles
Query graph: ?y is a mo:Track whose composer ?x is a foaf:Agent who is a member of dbpedia:The_Beatles
Results (answers presented to the user; the results could be ranked):
• (I want to) Come Home
• Angel in Disguise
• Another Day
• …
57. Semantic Search
• Aims at understanding the meaning of the resources specified in the query
• Different approaches to exploit semantics:
  • Query expansion using ontologies: since ontologies represent knowledge about specific domains, they can be used to expand the query by incorporating related ontology terms into the query.
  • Contextual analysis: in LD, this approach may explore the resources specified in the query and their adjacent nodes in the RDF graph. Mainly applied to disambiguate query terms.
  • Reasoning: in some cases, the answer to a specific query is not explicitly contained in the data, but it can be computed by using reasoning methods.
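The query-expansion approach can be sketched in a few lines. The synonym sets are illustrative; a real system would derive them from an ontology or a lexical resource such as WordNet:

```python
# Illustrative synonym sets for terms from the running example.
SYNONYMS = {
    "song": {"track", "melody", "tune"},
    "written by": {"writer", "composer", "creator"},
}

def expand(term):
    """Query expansion: return the term together with its synonyms.
    The expanded set is then matched against ontology terms, e.g.
    'track' maps to mo:Track and 'composer' to mo:composer."""
    return {term} | SYNONYMS.get(term, set())
```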
58. Semantic Search & Linked Data
Semantic search vs. SPARQL query:
• Keyword or NL / concept matching — semantic search: performs entity extraction and matching to formal concepts; SPARQL query: not supported
• Fuzzy concepts/relations/logics — semantic search: allows the application of fuzzy qualifiers as query constraints; SPARQL query: not supported
• Graph patterns — semantic search: uses the context and other semantic information to locate interesting sub-graphs; SPARQL query: applies pattern matching
• Path discovery — semantic search: finds new interesting links that may lead to additional information; SPARQL query: not supported
59. Semantic Search: Google (1)
Google performs semantic search on certain entities and queries!
Input: query in NL
Output: list of answers
60. Semantic Search: Google (2)
Input: question in NL
Output: list of web pages, ranked using the Google PageRank algorithm to display the most relevant pages first
61. Semantic Search: DuckDuckGo (1)
Input: question in NL
Output: list of answers
62. Semantic Search: DuckDuckGo (2)
• Performs disambiguation of the query terms
• The 45 suggestions are grouped by classes according to their corresponding knowledge domain
• This approach is called faceted search
63. Faceted Search: Example
Information Workbench: searching for artists in categories — depictions of artists are shown, with facets for narrowing the results
Source: http://musicbrainz.fluidops.net/resource/mo:MusicArtist?view=pivot
64. Faceted Search
• Facets = properties
• Suitable for browsing multi-dimensional taxonomies based on the search attributes
• Allows the user to explore the data:
  • The user submits a (keyword) query
  • The faceted system dynamically identifies the relevant facets (properties) for the given query and the constraints (values of those properties), and displays the search results
  • The user may "drill down" by selecting specific constraints on the search results
• Information can be accessed and ranked in multiple ways
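The facet-identification and drill-down loop described above can be sketched directly; the result set below is illustrative, but the mechanics (derive facets from the current results, then filter) are exactly the faceted pattern:

```python
from collections import Counter

# Illustrative result set: each entry is a resource with its properties.
RESULTS = [
    {"type": "MusicArtist", "country": "GB", "decade": "1960s"},
    {"type": "MusicArtist", "country": "US", "decade": "1960s"},
    {"type": "MusicGroup",  "country": "GB", "decade": "1970s"},
]

def compute_facets(results):
    """Dynamically derive the facets (properties) and their value
    counts from the current result set."""
    facets = {}
    for entry in results:
        for prop, value in entry.items():
            facets.setdefault(prop, Counter())[value] += 1
    return facets

def drill_down(results, prop, value):
    """Apply a facet constraint, narrowing the result set."""
    return [e for e in results if e.get(prop) == value]
```

After each `drill_down`, `compute_facets` is recomputed on the narrowed set, so the displayed facets and counts always reflect the current results.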
65. Faceted Search (2)
Challenges for supporting faceted search:
• Identifying which facets to surface:
  • In heterogeneous datasets, data entries may have different facets
  • Dynamically identifying the most appropriate facets for each query
  • Ordering the facets depending on their relevance to the query
• Computing previews:
  • Accurately predicting counts without examining all the results
  • Offering facet previews to give users an idea of what to expect
Source: Teevan, J., Dumais, S., Gutt, Z. Challenges for Supporting Faceted Search in Large, Heterogeneous Corpora like the Web
66. Faceted Search: LD Example (1)
FacetedDBLP
• Retrieves information from the DBLP collection
• Shows the result set with different facets: publication years, authors, conferences
• It is implemented on top of the DBLP++ dataset (an enhancement of DBLP including additional keywords and abstracts):
  • DBLP++ is stored in a MySQL database
  • Uses the D2R server to publish the data as RDF triples
67. Faceted Search: LD Example (2)
FacetedDBLP — input: "crowdsourcing"; 485 results, with facets for narrowing them
68. Classification of Search Engines
• Semantic search systems: Google (GKG), Bing, KIM, sig.ma, DuckDuckGo, Hakia, SenseBot, PowerSet, DeepDive, Kosmix, Factibles, Lexxe
• Faceted search systems: LOD cloud cache, /facet, Longwell, mSpace, Exhibit (SIMILE), PoolParty Semantic Search Server
• Information Workbench supports both
69. Searching for Semantic Data
Search for:
• Ontologies
• Vocabularies
• RDF documents
70. Semantic Data Search Engines (1)
Searching for ontologies (keyword search):
• Swoogle: http://swoogle.umbc.edu
• Watson: http://kmi-web05.open.ac.uk/WatsonWUI
71. Semantic Data Search Engines (2)
Searching for vocabularies: LOV portal
• Allows searching for properties, classes or vocabularies in the Linked Open Vocabularies (LOV) catalog
• The LOV search engine implements faceted search on:
  • The knowledge domain
  • The role of the resource matched from the input query
  • The vocabulary containing the resource
• Results are ranked according to a score considering:
  • Relevancy to the query (string)
  • The importance of the matched element labels
  • The number of LOV vocabularies that refer to the element
72. Semantic Data Search Engines (3)
Searching for vocabularies: LOV portal — input: "artist"; 84 results, narrowed via facets
73. Semantic Data Search Engines (4)
Searching for documents:
• Semantic Web Search Engine (SWSE): http://swse.deri.org
• Sindice: http://sindice.com
74. METHODS FOR LINKED DATA ANALYSIS
75. Features of Data Analysis
Statistical analysis
• Allows describing the data via Exploratory Data Analysis (EDA) methods
• Includes statistical inference and prediction
Data aggregation & filtering
• One of the first steps in data analysis is pre-processing, in order to select the appropriate data to study
Machine learning
• Focuses on prediction
• Combines Artificial Intelligence and Statistics
• Includes supervised and unsupervised learning (not covered in this course)
Visualization techniques can be built on top of these as part of data analysis
76. LD Data Aggregation & Filtering
• Data aggregation refers to merging/summarizing several values into a single one
• Filtering retrieves the relevant data properties and selects a particular range of data values
• SPARQL supports both features via SELECT queries, as follows:
Feature       SPARQL capabilities
Aggregation   Aggregate functions (COUNT, SUM, AVG, …) combined with the GROUP BY operator
Filtering     Projection combined with the FILTER and HAVING operators
77. LD Statistical Analysis
• Statistical analysis supports descriptive and predictive
operations
• SPARQL supports some descriptive operations (average,
maximum, minimum) but does not offer more sophisticated
statistical features like:
• Fitting distributions
• Linear regressions
• Analysis of variance
• …
• Some approaches are able to consume data retrieved from
SPARQL endpoints:
– “R for SPARQL” by Willem Robert van Hage & Tomi Kauppinen
– “Performing Statistical Methods on Linked Data” by Zapilko & Mathiak
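The pattern behind such approaches can be sketched as follows: fetch numeric values with a SPARQL SELECT query, then apply a statistical method that SPARQL lacks on the client side (here a least-squares linear regression). The rows below stand in for the bindings a SELECT query over hypothetical year/deforestation data would return.

```python
# Result rows standing in for SPARQL SELECT bindings (?year, ?defor);
# the values are invented for illustration.
rows = [(2002, 1.0), (2003, 1.5), (2004, 2.0), (2005, 2.5)]

xs = [r[0] for r in rows]
ys = [r[1] for r in rows]

# Ordinary least squares: slope = cov(x, y) / var(x)
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
print(slope)  # 0.5: deforestation grows by 0.5 units per year in the toy data
```

This is exactly the division of labor in the tools above: SPARQL does the selection and aggregation, the statistical environment does the modeling.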
78. R – Statistical Computing
• R is a language and environment for statistical computing
• R provides a wide variety of statistical and graphical
techniques
• Linear and nonlinear modeling
• Classical statistical tests
• Time-series analysis
• Classification (Machine Learning)
• Clustering (Machine Learning)
• Extensible with further functionality
• R is available as Free Software (under the terms of the
GNU General Public License)
80. R for SPARQL
• The R for SPARQL package makes it possible to:
• Connect to a SPARQL endpoint over HTTP
• Pose a SELECT query or an UPDATE operation (LOAD, INSERT, DELETE)
• If given a SELECT query, it returns the results as a data frame
• The results can be directly mapped and visualized
• Posing requests:
• If the parameter query is given, the input is assumed to be a SELECT query,
and a GET request is performed against the URL of the endpoint to retrieve
the results
• If the parameter update is given, the input is assumed to be an UPDATE
operation, and a POST request is submitted to the URL of the endpoint;
nothing is returned
Source: http://linkedscience.org/tools/sparql-package-for-r/
81. R for SPARQL: Example (1)
1. Download the R package and load it:
• library(SPARQL)
• library(sp) # used for plotting spatial data
2. Define the endpoint with the triples
• endpoint = "http://spatial.linkedscience.org/sparql"
3. Define the query
• q = "SELECT ?cell ?row ?col ?polygon ?DEFOR_2002
WHERE {
?cell a <http://linkedscience.org/lsv/ns#Item> ;
<http://spatial.linkedscience.org/context/amazon/Lin> ?row ;
<http://spatial.linkedscience.org/context/amazon/Col> ?col ;
<http://observedchange.com/tisc/ns#geometry> ?polygon ;
<http://spatial.linkedscience.org/context/amazon/DEFOR_2002> ?DEFOR_2002 .
}"
82. R for SPARQL: Example (2)
4. Assign the result to an object
• res <- SPARQL(endpoint, q)$results
5. Handle the results
• res$row <- -res$row # flip the y-axis
• coordinates(res) <- ~ col + row
• gridded(res) <- TRUE # turns res into a SpatialPixelsDataFrame
6. Choose the graphical format and plot the results
• spplot(res, "DEFOR_2002", col.regions=rev(heat.colors(17))[-1],
at=(0:16)/100, main="relative deforestation per pixel during 2002")
83. R for SPARQL: Example (3)
[Resulting map: relative deforestation per pixel during 2002]
84. Machine Learning
• Machine Learning techniques make it possible to extract interesting
information from data sources, and can be used to discover
hidden patterns within datasets by generalizing from examples
• Different ML approaches can be applied:
• Clustering: groups similar data into data partitions called clusters
• Association rule learning: discovers relations between variables
• Decision tree learning: analyses observations to build a predictive
model represented as a tree
• Many others …
• Weka is a Data Mining framework commonly used to apply ML
on tabular data:
– www.cs.waikato.ac.nz/ml/weka
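The clustering approach mentioned above can be illustrated with a minimal pure-Python, one-dimensional k-means sketch; the data values, the number of clusters, and the initial centroids are invented for illustration.

```python
# Minimal 1-D k-means sketch: similar values are grouped into partitions
# (clusters) around iteratively refined centroids.
def kmeans_1d(values, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

values = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(centroids)  # roughly [1.0, 10.0]: two well-separated groups
```

Real frameworks such as Weka provide multi-dimensional versions of this and the other algorithms listed above.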
85. Machine Learning on LD
Challenges for applying Machine Learning on LD
• LD heterogeneity introduces noise to the data:
– Same LD resources, different URIs
– Predicates with similar semantics, but different constraints
• The data is not independent and identically distributed (iid):
– It does not consist of only one type of objects
– The entities are related to each other
• LD rarely contains the negative examples needed by ML
algorithms:
– For example, assertions such as owl:differentFrom are scarce
Source: http://www.cip.ifi.lmu.de/~nickel/iswc2012-slides
86. Applications of Machine Learning on LD
• Node ranking:
– Ranking nodes according to their relevance for a query
• Link prediction:
– Infer edges between LD resources
– Predict the new edges that will be added to the RDF graph
• Entity resolution:
– Determine whether two URIs correspond to the same real-world object
• Taxonomy learning:
– Infer taxonomies or concept hierarchies from a given
vocabulary or ontology
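One common entity-resolution heuristic can be sketched as follows: compare the labels attached to two URIs with a string-similarity measure and treat them as the same entity above a threshold. Real systems combine many such signals; the labels and the 0.85 threshold here are illustrative assumptions.

```python
# Label-similarity heuristic for entity resolution, using Python's
# stdlib SequenceMatcher. Threshold and examples are invented.
from difflib import SequenceMatcher

def same_entity(label_a, label_b, threshold=0.85):
    """True if the two labels are similar enough to merge their URIs."""
    ratio = SequenceMatcher(None, label_a.lower(), label_b.lower()).ratio()
    return ratio >= threshold

print(same_entity("Tim Berners-Lee", "Tim Berners Lee"))  # True
print(same_entity("Tim Berners-Lee", "Tim O'Reilly"))     # False
```

In practice this would be one feature among several (shared property values, owl:sameAs links, type compatibility) rather than a decision on its own.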
87. Summary
• Linked Data visualization techniques:
• Visualizations must be chosen according to the type of the data
• A wide variety of tools support the visualization of SPARQL results
• Visualizations can be used in dashboards to support administrative tasks
• Linked Data search:
• Semantic search: exploits the meaning of user queries (natural language or
sets of keywords) to present useful results
• Faceted search: allows browsing multi-dimensional data
• Linked Data analysis:
• Includes data manipulation such as aggregation & filtering
• Applies statistical methods to gain a better understanding of the data
• Machine Learning techniques can be applied for predictive analysis
• Visualization techniques can be built on top of the previous features
88. For exercises, quizzes and further material visit our website:
http://www.euclid-project.eu
Other channels: @euclid_project, euclidproject, eBook, Course
89. Acknowledgements
• Alexander Mikroyannidis
• Alice Carpentier
• Andreas Harth
• Andreas Wagner
• Andriy Nikolov
• Barry Norton
• Daniel M. Herzig
• Elena Simperl
• Günter Ladwig
• Inga Shamkhalov
• Jacek Kopecky
• John Domingue
• Juan Sequeda
• Kalina Bontcheva
• Maria Maleshkova
• Maria-Esther Vidal
• Maribel Acosta
• Michael Meier
• Ning Li
• Paul Mulholland
• Peter Haase
• Richard Power
• Steffen Stadtmüller