SlideShare a Scribd company logo
Resource Classification as the
Basis for a Visualization Pipeline
in LOD Scenarios
Oscar Peña del Rio, Unai Aguilera

and Diego López de Ipiña

DeustoTech, University of Deusto
Motivation
• The Semantic Web is still waiting for massive traction
• The focus is set on ontology development, provenance
and supporting technological stack
• Little is known about the SW outside the research
community
• Potential benefits should be addressed to non-technical
user profiles
2
Data overview
• Need to get the whole picture of a dataset before
working with it
• Usually time/resource constraints and lack of expert
knowledge are present
• Based on Ben Shneiderman’s Overview task (from his
famous Visual Information Seeking Mantra)
• Diverse approaches perform basic statistics to fulfill this
task (counts, averages, min/max, etc.)
3
Natural approaches to data exploration
• Takes some ideas from Tukey’s Exploratory Data Analysis
field (EDA)
• Proposes different approaches to get an overview of the
data
• Techniques lack the rigor of more formal methodologies,
is a more data-driven perspective
• Data discovery is more natural this way, in line with the
follow your nose principles
4
Visualization Pipeline
raw data analysis operators
visual
transformations
recommender
engine
end user
visualizations
rdf, json-ld, …
statistical analysis,
datatype inference…
how to encode data
in visual elements
learned lessons, best
practices & fit models
Web browser accessible
visualization
5
Defending visualization as the means
for a coherent, understandable Semantic
Web beneficial for all actors
Extract resource features
• We focus on the data itself to infer its structure and
relevance within the whole dataset
• The data is directly accessed through SPARQL queries
• Property usage: # unique class instances / # instance objects



• Completeness ratio: # values assigned to property / # instance objects
6
dc:title -> 1 foaf:nick -> 3.4dc-terms:license -> 0.12
foaf:name -> 1 foaf:title -> 0.36
Primitive datatype inference
• Required to understand how each property may be interpreted, the operations it allows,
and how they relate to each other
• We define the following classification categories:
• Integer
• Float
• Boolean
• IRI
• String
• Geographical component
• Datetime component
• Categorical data
7
Results
• 190 properties evaluated (149
unique)

• 5 datasets (approx. 10M triples)

• Diverse topics

• Inference algorithm tested
against agreement between 6
experts 

(>80% agreement, 5 out of 6)
dataset TP TN FP FN Cat Correct
Air quality 17 160 2 10 5 93,65%
Restaurants 17 201 3 17 5 91,6%
Historical sites 14 165 4 13 3 91,33%
MORElab 56 399 15 13 12 94,2%
Teseo 22 162 4 1 3 97,35%
8
Conclusions & future steps
• Common pitfalls addressed
• Missing property datatypes & ranges
• Incorrect typing / usage
• Redundancy
• Most instances typed as plain, literal strings
• Feed all the features to a classifier in order to create Entity Visualization
Templates (work in progress)
• Recommend coherent visual representations for each template
9
Thank you
Oscar Peña del Rio

oscar.pena@deusto.es

More Related Content

What's hot

Text mining
Text miningText mining
Text mining
ThejeswiniChivukula
 
Text mining
Text miningText mining
Text mining
Koshy Geoji
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantinimaxfalc
 
Tovek Presentation 2 by Livio Costantini
Tovek Presentation 2 by Livio CostantiniTovek Presentation 2 by Livio Costantini
Tovek Presentation 2 by Livio Costantinimaxfalc
 
Mining Product Reputations On the Web
Mining Product Reputations On the WebMining Product Reputations On the Web
Mining Product Reputations On the Webfeiwin
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)9866825059
 
Text mining
Text miningText mining
Text mining
Ali A Jalil
 
Konsep Dasar Information Retrieval - Edi faizal
Konsep Dasar Information Retrieval - Edi faizal Konsep Dasar Information Retrieval - Edi faizal
Konsep Dasar Information Retrieval - Edi faizal
EdiFaizal2
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
Kira
 
An introduction to system-oriented evaluation in Information Retrieval
An introduction to system-oriented evaluation in Information RetrievalAn introduction to system-oriented evaluation in Information Retrieval
An introduction to system-oriented evaluation in Information Retrieval
Mounia Lalmas-Roelleke
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
Carsten Eickhoff
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
guest0edcaf
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
Artificial Intelligence Institute at UofSC
 
Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information Retrieval
Harsh Thakkar
 
Vector space model of information retrieval
Vector space model of information retrievalVector space model of information retrieval
Vector space model of information retrievalNanthini Dominique
 
Web Information Extraction for the Database Research Domain
Web Information Extraction for the Database Research DomainWeb Information Extraction for the Database Research Domain
Web Information Extraction for the Database Research Domain
Michael Genkin
 
Tdm information retrieval
Tdm information retrievalTdm information retrieval
Tdm information retrievalKU Leuven
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
Anandh Arumugakan
 
Information retrieval
Information retrievalInformation retrieval
Information retrievalhplap
 

What's hot (20)

Text mining
Text miningText mining
Text mining
 
Text mining
Text miningText mining
Text mining
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantini
 
Tovek Presentation 2 by Livio Costantini
Tovek Presentation 2 by Livio CostantiniTovek Presentation 2 by Livio Costantini
Tovek Presentation 2 by Livio Costantini
 
Mining Product Reputations On the Web
Mining Product Reputations On the WebMining Product Reputations On the Web
Mining Product Reputations On the Web
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
 
Text mining
Text miningText mining
Text mining
 
Konsep Dasar Information Retrieval - Edi faizal
Konsep Dasar Information Retrieval - Edi faizal Konsep Dasar Information Retrieval - Edi faizal
Konsep Dasar Information Retrieval - Edi faizal
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
 
An introduction to system-oriented evaluation in Information Retrieval
An introduction to system-oriented evaluation in Information RetrievalAn introduction to system-oriented evaluation in Information Retrieval
An introduction to system-oriented evaluation in Information Retrieval
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
 
Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information Retrieval
 
Vector space model of information retrieval
Vector space model of information retrievalVector space model of information retrieval
Vector space model of information retrieval
 
Web Information Extraction for the Database Research Domain
Web Information Extraction for the Database Research DomainWeb Information Extraction for the Database Research Domain
Web Information Extraction for the Database Research Domain
 
IR
IRIR
IR
 
Tdm information retrieval
Tdm information retrievalTdm information retrieval
Tdm information retrieval
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
 
Information retrieval
Information retrievalInformation retrieval
Information retrieval
 

Similar to Resource Classification as the Basis for a Visualization Pipeline in LOD Scenarios

RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
S. Diana Hu
 
eSource, DIA EuroMeeting, Lisbon, March 2005
eSource, DIA EuroMeeting, Lisbon, March 2005eSource, DIA EuroMeeting, Lisbon, March 2005
eSource, DIA EuroMeeting, Lisbon, March 2005AsseroLtd
 
Machine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search EngineMachine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search EngineSalford Systems
 
Data cycle health
Data cycle healthData cycle health
Data cycle health
jyotikhadake
 
Fusion 3 Overview Webinar
Fusion 3 Overview Webinar Fusion 3 Overview Webinar
Fusion 3 Overview Webinar
Lucidworks
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
Ilkay Altintas, Ph.D.
 
JOSA TechTalk: Metadata Management
in Big Data
JOSA TechTalk: Metadata Management
in Big DataJOSA TechTalk: Metadata Management
in Big Data
JOSA TechTalk: Metadata Management
in Big Data
Jordan Open Source Association
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
Jeroen Rombouts
 
Cshl minseqe 2013_ouellette
Cshl minseqe 2013_ouelletteCshl minseqe 2013_ouellette
Cshl minseqe 2013_ouellette
Functional Genomics Data Society
 
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
Yongyao Jiang
 
BAS 250 Lecture 1
BAS 250 Lecture 1BAS 250 Lecture 1
BAS 250 Lecture 1
Wake Tech BAS
 
Nicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in ElasticsearchNicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in Elasticsearch
MeetupDataScienceRoma
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Lucidworks
 
Built around answering questions
Built around answering questionsBuilt around answering questions
Built around answering questions
Larry Smarr
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Joaquin Delgado PhD.
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
S. Diana Hu
 
02-Lifecycle.pptx
02-Lifecycle.pptx02-Lifecycle.pptx
02-Lifecycle.pptx
Shree Shree
 
Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13
DataDryad
 
Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...
Institute of Contemporary Sciences
 

Similar to Resource Classification as the Basis for a Visualization Pipeline in LOD Scenarios (20)

RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
eSource, DIA EuroMeeting, Lisbon, March 2005
eSource, DIA EuroMeeting, Lisbon, March 2005eSource, DIA EuroMeeting, Lisbon, March 2005
eSource, DIA EuroMeeting, Lisbon, March 2005
 
Machine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search EngineMachine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search Engine
 
Data cycle health
Data cycle healthData cycle health
Data cycle health
 
Fusion 3 Overview Webinar
Fusion 3 Overview Webinar Fusion 3 Overview Webinar
Fusion 3 Overview Webinar
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
JOSA TechTalk: Metadata Management
in Big Data
JOSA TechTalk: Metadata Management
in Big DataJOSA TechTalk: Metadata Management
in Big Data
JOSA TechTalk: Metadata Management
in Big Data
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
 
Cshl minseqe 2013_ouellette
Cshl minseqe 2013_ouelletteCshl minseqe 2013_ouellette
Cshl minseqe 2013_ouellette
 
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
 
BAS 250 Lecture 1
BAS 250 Lecture 1BAS 250 Lecture 1
BAS 250 Lecture 1
 
Nicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in ElasticsearchNicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in Elasticsearch
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
 
Built around answering questions
Built around answering questionsBuilt around answering questions
Built around answering questions
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
02-Lifecycle.pptx
02-Lifecycle.pptx02-Lifecycle.pptx
02-Lifecycle.pptx
 
Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13
 
Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...
 

Recently uploaded

Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
subedisuryaofficial
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
anitaento25
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
AADYARAJPANDEY1
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
ossaicprecious19
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
muralinath2
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
AlguinaldoKong
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
ssuserbfdca9
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...
Sérgio Sacani
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SELF-EXPLANATORY
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
DiyaBiswas10
 

Recently uploaded (20)

Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 

Resource Classification as the Basis for a Visualization Pipeline in LOD Scenarios

  • 1. Resource Classification as the Basis for a Visualization Pipeline in LOD Scenarios Oscar Peña del Rio, Unai Aguilera and Diego López de Ipiña DeustoTech, University of Deusto
  • 2. Motivation • The Semantic Web is still waiting for massive traction • The focus is set on ontology development, provenance and supporting technological stack • Little is known about the SW outside the research community • Potential benefits should be addressed to non-technical user profiles 2
  • 3. Data overview • Need to get the whole picture of a dataset before working with it • Usually time/resource constraints and lack of expert knowledge are present • Based on Ben Shneiderman’s Overview task (from his famous Visual Information Seeking Mantra) • Diverse approaches perform basic statistics to fulfill this task (counts, averages, min/max, etc.) 3
  • 4. Natural approaches to data exploration • Takes some ideas from Tukey’s Exploratory Data Analysis field (EDA) • Proposes different approaches to get an overview of the data • Techniques lack the rigor of more formal methodologies, is a more data-driven perspective • Data discovery is more natural this way, in line with the follow your nose principles 4
  • 5. Visualization Pipeline raw data analysis operators visual transformations recommender engine end user visualizations rdf, json-ld, … statistical analysis, datatype inference… how to encode data in visual elements learned lessons, best practices & fit models Web browser accessible visualization 5 Defending visualization as the means for a coherent, understandable Semantic Web beneficial for all actors
  • 6. Extract resource features • We focus on the data itself to infer its structure and relevance within the whole dataset • The data is directly accessed through SPARQL queries • Property usage: # unique class instances / # instance objects
 
 • Completeness ratio: # values assigned to property / # instance objects 6 dc:title -> 1 foaf:nick -> 3.4dc-terms:license -> 0.12 foaf:name -> 1 foaf:title -> 0.36
  • 7. Primitive datatype inference • Required to understand how each property may be interpreted, the operations it allows, and how they relate to each other • We define the following classification categories: • Integer • Float • Boolean • IRI • String • Geographical component • Datetime component • Categorical data 7
  • 8. Results • 190 properties evaluated (149 unique) • 5 datasets (approx. 10M triples) • Diverse topics • Inference algorithm tested against agreement between 6 experts 
 (>80% agreement, 5 out of 6) dataset TP TN FP FN Cat Correct Air quality 17 160 2 10 5 93,65% Restaurants 17 201 3 17 5 91,6% Historical sites 14 165 4 13 3 91,33% MORElab 56 399 15 13 12 94,2% Teseo 22 162 4 1 3 97,35% 8
  • 9. Conclusions & future steps • Common pitfalls addressed • Missing property datatypes & ranges • Incorrect typing / usage • Redundancy • Most instances typed as plain, literal strings • Feed all the features to a classifier in order to create Entity Visualization Templates (work in progress) • Recommend coherent visual representations for each template 9
  • 10. Thank you Oscar Peña del Rio oscar.pena@deusto.es