SlideShare a Scribd company logo
1 of 31
1 
A Comparison of Propositionalization Strategies 
for Creating Features from Linked Open Data 
9/29/2014 Petar Ristoski, Heiko Paulheim
Motivation 
9/29/2014 Ristoski, Paulheim 2
Motivation 
• Many existing applications use LOD as background knowledge 
in data mining 
– Explaining data patterns and statistics: unemployment rate, 
inflation, energy savings, etc … 
– Content-based book/movies recommendation system 
– Classifying incident related tweets 
– Gene classification 
– Prediction of car fuel consumption 
9/29/2014 Ristoski, Paulheim 3
Motivation 
Local LOD 
Data 
link combine cleanse transform analyze 
9/29/2014 Ristoski, Paulheim 4
Motivation 
• Standard data mining algorithms require propositional feature 
vector representation 
• Feature space: V={v1,v2,…, vn} 
• Each instance is represented as an n-dimensional feature vector 
(v1,v2,…,vn), where for each 1≤ vi ≤n : 
– vi ∈ {true, false}, or vi ∈ {1,0} 
– vi ∈ ℝ 
– vi ∈ S, where S is a finite set of symbols 
9/29/2014 Ristoski, Paulheim 5
Motivation 
Name Person Music Artist Instrument Genre 
Trent Reznor 1 1 1 0 
Wolfgang A. Mozart 1 1 1 1 
Barack Obama 1 0 0 0 
9/29/2014 Ristoski, Paulheim 6
Related Work 
• LiDDM (Narasimha et al.) 
– a framework tool for Linked Data mining that capture data from LOD 
cloud to extract hidden information 
• SPARQL-ML (Kiefer et al.) 
– extension to SPARQL to support data mining tasks for knowledge 
discovery in the Semantic Web 
• FeGeLOD (Paulheim et al.) 
– Unsupervised generation of data mining features from LOD 
• The resulting features are binary, or numerical aggregates using 
SPARQL COUNT constructs 
• No proper evaluation of the used propositionalization strategy 
9/29/2014 Ristoski, Paulheim 7
PROPOSED STRATEGIES 
9/29/2014 Ristoski, Paulheim 8
Strategies 
• Strategies for features derived from specific relations 
– r rdf:type C 
– r dcterms:subject S 
• Strategies for features derived from generic relations 
9/29/2014 Ristoski, Paulheim 9
Strategies for Features Derived from Specific Relations 
1. Binary feature: 
– vi =1 if C(r) 
– vi =0 if ⅂C(r) 
2. Relative count feature: 
– vi = 
1 
푛 
, where r has relation to n objects 
3. TF-IDF feature: 
– vi = 
1 
푛 
log 
푁 
{푟|퐶 푟 } 
, where N is the total number of resources in the dataset, 
and {푟|퐶 푟 } denotes the number of resources for which the specific 
relation to an object C exists 
9/29/2014 Ristoski, Paulheim 10
Features Derived from Specific Relations: Binary vs TF-IDF 
+10 Music Artists 
dbpedia:Person 
dbpedia:Artist 
dbpedia:MusicArtist 
dbpedia:MilitaryPerson 
dbpedia:Kris_Kristofferson dbpedia:Elvis_Presley 
Name Person Artist Music Artist Military Person 
Elvis Presley 1 1 1 1 
Kris Kristofferson 1 1 1 1 
Artist X 1 1 1 0 
9/29/2014 Ristoski, Paulheim 11
Features Derived from Specific Relations: Binary vs TF-IDF 
+10 Music Artists 
dbpedia:Person 
dbpedia:Artist 
dbpedia:MusicArtist 
dbpedia:MilitaryPerson 
dbpedia:Kris_Kristofferson dbpedia:Elvis_Presley 
Name Person Artist Music Artist Military Person 
Elvis Presley 0 0 0 0.672 
Kris Kristofferson 0 0 0 0.672 
Artist X 0 0 0 0 
9/29/2014 Ristoski, Paulheim 12
Strategies 
• Strategies for features derived from specific relations 
– r rdf:type C → C(r) 
– dcterms:subject 
• Strategies for features derived from relations as such 
– describe how resource r is related to resource r′ 
– outgoing relation: p(r, r′ ) 
– Incoming relation : p(r′ ,r) 
9/29/2014 Ristoski, Paulheim 13
Strategies for Features Derived from Generic Relations 
1. Binary feature: 
– vi = 1 if p(r,r′) 
– vi = 0 if ⅂p(r) 
2. Count feature: 
– vi = n, where r is connected to n resources with relation p 
3. Relative count feature: 
– vi = 
푛푝 
푃 
, where P is the total number of outgoing relations for r, and np is 
the number of relations of type p for r 
4. TF-IDF feature: 
– vi = 
푛푝 
푃 
log 
푁 
{푟|∃푟′:푝 푟,푟′ } 
, where N is the total number of resources in the 
dataset, and {푟|∃푟′: 푝 푟, 푟′ } denotes the number of resources for which 
p(r, r′) exists 
9/29/2014 Ristoski, Paulheim 14
Features Derived from Generic Relations : Binary vs Relative Count 
dbpedia:Chester_Bennington 
dbpedia:Anthony_Kiedis 
dbpedia:Jules_Verne 
dbpedia:instrument 
8 
5 
1 
dbpedia:author 
68 
Name instrument author 
Chester Bennington 1 0 
Anthony Kiedis 1 1 
Jules Verne 0 1 
9/29/2014 Ristoski, Paulheim 15
Features Derived from Generic Relations : Binary vs Relative Count 
dbpedia:Chester_Bennington 
dbpedia:Anthony_Kiedis 
dbpedia:Jules_Verne 
dbpedia:instrument 
8 
5 
1 
dbpedia:author 
68 
Name instrument author 
Chester Bennington 1 0 
Anthony Kiedis 0.833 0.166 
Jules Verne 0 1 
9/29/2014 Ristoski, Paulheim 16
EVALUATION 
9/29/2014 Ristoski, Paulheim 18
Evaluation 
• Comparative evaluation of the propositionalization strategies on 
three data mining tasks 
– Classification 
– Regression 
– Outlier Detection 
• Evaluated on six datasets, on five feature sets 
– types 
– categories 
– incoming relations 
– outgoing relations 
– incoming and outgoing relations 
9/29/2014 Ristoski, Paulheim 19
Evaluation: Classification 
• Datasets: 
Dataset # instances # types # categories # rel in # rel out # rel in & rel out 
Sports Tweets 5,054 7,814 14,025 3,574 5,334 8,908 
Cities 212 721 999 1,304 1,081 2,385 
• Methods: 
– Naïve Bayes 
– k-Nearest Neighbors (k=3) 
– C4.5 decision trees 
• Metrics for performance evaluation 
– Accuracy 
• Results calculated using stratified 10-fold cross validation 
9/29/2014 Ristoski, Paulheim 20
Evaluation: Classification 
Datasets Cities Sports Tweets 
Features Representation NB k-NN C4.5 Avg. NB k-NN C4.5 Avg. 
types 
Binary 55.71 56.17 59.05 56.98 81 82.9 82.95 82.28 
Relative Count 57.1 49.61 55.22 53.98 80.96 81.44 81.88 81.43 
TF-IDF 57.1 48.7 54.7 53.50 82.13 82.47 82.64 82.41 
categories 
Binary 55.74 49.98 56.17 53.96 82.26 76.56 71.98 76.93 
Relative Count 59.52 44.35 58.96 54.28 90.76 84.09 80.86 85.24 
TF-IDF 55.74 49.98 57.08 54.27 89.65 81.98 81.68 84.44 
rel in 
Binary 60.41 58.46 60.35 59.74 83.12 83.63 84.65 83.80 
Count 56.69 31.1 59.37 49.05 83.25 85.11 85.4 84.59 
Relative Count 49.16 38.23 58.55 48.65 69.51 84.63 85.17 79.77 
TF-IDF 34.94 38.2 54.26 42.47 72.66 84.65 84.99 80.77 
rel out 
Binary 47.62 60 56.71 54.78 80.67 82.36 84.41 82.48 
Count 49.96 55.24 58.59 54.60 79.95 83.35 85.01 82.77 
Relative Count 48.07 58.44 56.65 54.39 62.13 84.27 83.51 76.64 
TF-IDF 40.15 54.78 58.51 51.15 69.91 84.4 84.16 79.49 
r in & out 
Binary 59.44 58.57 56.47 58.16 86.17 85.14 86.45 85.92 
Count 56.13 54.26 60.82 57.07 86.03 86.01 87.16 86.40 
Relative Count 57.68 47.14 56.56 53.79 70.01 84.51 87.22 80.58 
TF-IDF 40.17 46.21 58.46 48.28 75.15 84.86 86.19 82.07 
9/29/2014 Ristoski, Paulheim 21
Evaluation: Regression 
• Datasets: 
Dataset # instances # types # categories # rel in # rel out # rel in & rel out 
Auto MPG 391 264 308 227 370 597 
Cities 212 721 999 1,304 1,081 2,385 
• Methods: 
– Linear Regression 
– M5Rules 
– k-Nearest Neighbors (k=3) 
• Metrics for performance evaluation 
– RMSE 
• Results calculated using stratified 10-fold cross validation 
9/29/2014 Ristoski, Paulheim 22
Evaluation: Regression 
Datasets Auto MPG Cities 
Features Representation LR M5 k-NN Avg. LR M5 k-NN Avg. 
types 
Binary 3.952 3.056 3.63 3.546 24.303 18.793 22.164 21.753 
Relative Count 3.843 2.952 3.571 3.455 18.046 19.696 33.569 23.770 
TF-IDF 3.864 2.964 3.571 3.466 17.852 18.773 22.396 19.674 
categories 
Binary 3.698 2.9 3.62 3.409 18.884 22.323 22.677 21.295 
Relative Count 3.747 3 3.57 3.430 18.952 19.98 34.489 24.474 
TF-IDF 3.782 2.9 3.56 3.416 19.02 22.323 23.189 21.511 
rel in 
Binary 3.849 2.9 3.61 3.444 49.866 19.205 18.532 29.201 
Count 3.892 3 4.62 3.824 138.041 19.915 19.27 59.075 
Relative Count 3.976 2.9 3.57 3.488 122.365 22.335 18.877 54.526 
TF-IDF 4.109 2.8 3.57 3.508 122.921 21.947 18.568 54.479 
rel out 
Binary 3.792 3.1 3.6 3.490 20.008 19.364 20.918 20.097 
Count 4.072 3 4.15 3.734 36.317 19.459 23.994 26.590 
Relative Count 4.095 2.9 3.57 3.536 43.22 21.961 21.472 28.884 
TF-IDF 4.135 3 3.57 3.572 28.845 20.852 22.212 23.970 
r in & out 
Binary 3.991 3.1 3.67 3.572 40.803 18.803 18.211 25.939 
Count 3.991 3.1 4.54 3.870 107.259 19.528 18.906 48.564 
Relative Count 3.922 3 3.57 3.493 103.102 22.091 19.608 48.267 
TF-IDF 3.982 3.01 3.57 3.523 115.373 20.623 19.702 51.899 
9/29/2014 Ristoski, Paulheim 23
Evaluation: Outlier Detection 
• Datasets: 
Dataset # instances # types # rel in # rel out # rel in & rel out 
DBpedia-Peel 2,083 39 586 322 908 
DBpedia-DBTropes 4,228 128 912 2,155 3,067 
• Methods: 
– k-NN Global Anomaly Score – GAS (k=25) 
– Local Outlier Factor – LOF (10<k<50) 
– Local Outlier Probability – LoOP (k=25) 
• Metrics for performance evaluation 
– Area under the ROC curve (AUC) 
• Results calculated on partial gold standard of 100 links 
9/29/2014 Ristoski, Paulheim 24
Evaluation: Outlier Detection 
Datasets Dbpedia-Peel Dbpedia-DBTropes 
Features Representation GAS LOF LoOP Avg. GAS LOF LoOP Avg. 
types 
Binary 0.386 0.487 0.554 0.476 0.503 0.627 0.605 0.578 
Relative Count 0.385 0.398 0.595 0.459 0.503 0.385 0.314 0.401 
TF-IDF 0.386 0.504 0.602 0.497 0.503 0.672 0.417 0.531 
r in 
Binary 0.169 0.367 0.289 0.275 0.426 0.52 0.45 0.465 
Count 0.2 0.285 0.29 0.258 0.503 0.59 0.602 0.565 
Relative Count 0.293 0.496 0.452 0.414 0.589 0.555 0.493 0.546 
TF-IDF 0.14 0.354 0.317 0.270 0.509 0.519 0.568 0.532 
r out 
Binary 0.25 0.195 0.207 0.217 0.325 0.438 0.432 0.398 
Count 0.539 0.455 0.391 0.462 0.547 0.578 0.522 0.549 
Relative Count 0.542 0.544 0.391 0.492 0.618 0.601 0.513 0.577 
TF-IDF 0.116 0.396 0.24 0.251 0.322 0.629 0.472 0.474 
r in & out 
Binary 0.324 0.431 0.51 0.422 0.352 0.439 0.396 0.396 
Count 0.527 0.368 0.454 0.450 0.57 0.563 0.527 0.553 
Relative Count 0.603 0.744 0.616 0.654 0.667 0.672 0.657 0.665 
TF-IDF 0.202 0.667 0.484 0.451 0.481 0.462 0.5 0.481 
9/29/2014 Ristoski, Paulheim 25
Conclusion 
• The chosen propositionalization strategy matters 
• No general recommendation for a strategy 
– What is the data mining task? 
– What are the characteristics of the dataset? 
– Which algorithm is going to be used? 
9/29/2014 Ristoski, Paulheim 26
Future Work 
• Conduct further experiments on more feature sets 
– Qualified incoming and outgoing relations 
– Combine features from multiple LOD sources 
• Conduct experiments on more data mining tasks 
– Clustering, Recommendation Systems etc… 
• More sophisticated strategies 
– Combination of statistical and semantic measures 
– Adaptation of weighting strategies used in text mining to overcome 
problems with erroneous data 
• Use the statistical measures for feature selection 
9/29/2014 Ristoski, Paulheim 27
RapidMiner LOD Extension 
• Simple wiring of operators 
– Importing 
– Linking 
– Feature generation 
– Data consolidation 
– Feature selection 
– Visualization 
9/29/2014 Ristoski, Paulheim 28
RapidMiner LOD Extension 
Local LOD 
Data 
link combine cleanse transform analyze 
9/29/2014 Ristoski, Paulheim 29
RapidMiner LOD Extension 
Data 
Enrichment 
Data 
Analysis 
Linking 
Feature 
Selection 
Schema 
Matching & 
Data Fusion 
9/29/2014 Ristoski, Paulheim 30
RapidMiner LOD Extension 
• Simple wiring of operators 
– Importing 
– Linking 
– Feature generation 
– Data fusion 
– Feature selection 
– Visualization 
• Try it out! 
– find “Linked Open Data” on the marketplace 
– Google Group: groups.google.com/forum/#!forum/rmlod 
9/29/2014 Ristoski, Paulheim 31
32 
A Comparison of Propositionalization Strategies 
for Creating Features from Linked Open Data 
9/29/2014 Petar Ristoski, Heiko Paulheim

More Related Content

Similar to A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data

DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesPetar Ristoski
 
Database fundamentals and concepts and theory
Database fundamentals and concepts and theoryDatabase fundamentals and concepts and theory
Database fundamentals and concepts and theoryMozamel Jawad
 
Efficient blocking method for a large scale citation matching
Efficient blocking method for a large scale citation matchingEfficient blocking method for a large scale citation matching
Efficient blocking method for a large scale citation matchingMateusz Fedoryszak
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRoelof Pieters
 
Semantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity CardsSemantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity CardsFaegheh Hasibi
 
Training in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media AnalyticsTraining in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media AnalyticsAjay Ohri
 
Visualising Multi-objective Data: From League Tables to Optimisers, and back
Visualising Multi-objective Data: From League Tables to Optimisers, and backVisualising Multi-objective Data: From League Tables to Optimisers, and back
Visualising Multi-objective Data: From League Tables to Optimisers, and backdjw213
 
Optimizing Set-Similarity Join and Search with Different Prefix Schemes
Optimizing Set-Similarity Join and Search with Different Prefix SchemesOptimizing Set-Similarity Join and Search with Different Prefix Schemes
Optimizing Set-Similarity Join and Search with Different Prefix SchemesHPCC Systems
 
Focused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open DataFocused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open DataThomas Gottron
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsYONG ZHENG
 
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...Giuseppe Rizzo
 
InstructionsFor this assignment, collect data exhibiting a relat.docx
InstructionsFor this assignment, collect data exhibiting a relat.docxInstructionsFor this assignment, collect data exhibiting a relat.docx
InstructionsFor this assignment, collect data exhibiting a relat.docxdirkrplav
 
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Sung Kim
 
What we got from the Predicting Red Hat Business Value competition
What we got from the Predicting Red Hat Business Value competitionWhat we got from the Predicting Red Hat Business Value competition
What we got from the Predicting Red Hat Business Value competitionUmaporn Kerdsaeng
 
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...Cheng Chen
 
Reference Extraction from Wikipedia Infoboxes
Reference Extraction from Wikipedia InfoboxesReference Extraction from Wikipedia Infoboxes
Reference Extraction from Wikipedia InfoboxesWłodzimierz Lewoniewski
 
Slides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data PerspectivesSlides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data PerspectivesParang Saraf
 

Similar to A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data (20)

DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spaces
 
Database fundamentals and concepts and theory
Database fundamentals and concepts and theoryDatabase fundamentals and concepts and theory
Database fundamentals and concepts and theory
 
Efficient blocking method for a large scale citation matching
Efficient blocking method for a large scale citation matchingEfficient blocking method for a large scale citation matching
Efficient blocking method for a large scale citation matching
 
Recommender Systems and Linked Open Data
Recommender Systems and Linked Open DataRecommender Systems and Linked Open Data
Recommender Systems and Linked Open Data
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and Graphs
 
Semantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity CardsSemantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity Cards
 
Training in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media AnalyticsTraining in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media Analytics
 
Visualising Multi-objective Data: From League Tables to Optimisers, and back
Visualising Multi-objective Data: From League Tables to Optimisers, and backVisualising Multi-objective Data: From League Tables to Optimisers, and back
Visualising Multi-objective Data: From League Tables to Optimisers, and back
 
Optimizing Set-Similarity Join and Search with Different Prefix Schemes
Optimizing Set-Similarity Join and Search with Different Prefix SchemesOptimizing Set-Similarity Join and Search with Different Prefix Schemes
Optimizing Set-Similarity Join and Search with Different Prefix Schemes
 
Focused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open DataFocused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open Data
 
Focused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open DataFocused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open Data
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender Systems
 
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
 
InstructionsFor this assignment, collect data exhibiting a relat.docx
InstructionsFor this assignment, collect data exhibiting a relat.docxInstructionsFor this assignment, collect data exhibiting a relat.docx
InstructionsFor this assignment, collect data exhibiting a relat.docx
 
A Survey of Entity Ranking over RDF Graphs
A Survey of Entity Ranking over RDF GraphsA Survey of Entity Ranking over RDF Graphs
A Survey of Entity Ranking over RDF Graphs
 
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
 
What we got from the Predicting Red Hat Business Value competition
What we got from the Predicting Red Hat Business Value competitionWhat we got from the Predicting Red Hat Business Value competition
What we got from the Predicting Red Hat Business Value competition
 
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
 
Reference Extraction from Wikipedia Infoboxes
Reference Extraction from Wikipedia InfoboxesReference Extraction from Wikipedia Infoboxes
Reference Extraction from Wikipedia Infoboxes
 
Slides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data PerspectivesSlides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data Perspectives
 

Recently uploaded

Detectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureDetectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureSérgio Sacani
 
Cell Immobilization Methods and Applications.pptx
Cell Immobilization Methods and Applications.pptxCell Immobilization Methods and Applications.pptx
Cell Immobilization Methods and Applications.pptxCherry
 
Pests of Green Manures_Bionomics_IPM_Dr.UPR.pdf
Pests of Green Manures_Bionomics_IPM_Dr.UPR.pdfPests of Green Manures_Bionomics_IPM_Dr.UPR.pdf
Pests of Green Manures_Bionomics_IPM_Dr.UPR.pdfPirithiRaju
 
Lec 1.b Totipotency and birth of tissue culture.ppt
Lec 1.b Totipotency and birth of tissue culture.pptLec 1.b Totipotency and birth of tissue culture.ppt
Lec 1.b Totipotency and birth of tissue culture.pptasifali1111
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxGOWTHAMIM22
 
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...Sahil Suleman
 
GBSN - Microbiology (Unit 6) Human and Microbial interaction
GBSN - Microbiology (Unit 6) Human and Microbial interactionGBSN - Microbiology (Unit 6) Human and Microbial interaction
GBSN - Microbiology (Unit 6) Human and Microbial interactionAreesha Ahmad
 
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...Sérgio Sacani
 
Microbial bio Synthesis of nanoparticles.pptx
Microbial bio Synthesis of nanoparticles.pptxMicrobial bio Synthesis of nanoparticles.pptx
Microbial bio Synthesis of nanoparticles.pptxCherry
 
mixotrophy in cyanobacteria: a dual nutritional strategy
mixotrophy in cyanobacteria: a dual nutritional strategymixotrophy in cyanobacteria: a dual nutritional strategy
mixotrophy in cyanobacteria: a dual nutritional strategyMansiBishnoi1
 
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...Sérgio Sacani
 
Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!University of Hertfordshire
 
Film Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdfFilm Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdfPharmatech-rx
 
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...Sérgio Sacani
 
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Sérgio Sacani
 
The importance of continents, oceans and plate tectonics for the evolution of...
The importance of continents, oceans and plate tectonics for the evolution of...The importance of continents, oceans and plate tectonics for the evolution of...
The importance of continents, oceans and plate tectonics for the evolution of...Sérgio Sacani
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Sérgio Sacani
 
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfMODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfRevenJadePalma
 
Ostiguy & Panizza & Moffitt (eds.) - Populism in Global Perspective. A Perfor...
Ostiguy & Panizza & Moffitt (eds.) - Populism in Global Perspective. A Perfor...Ostiguy & Panizza & Moffitt (eds.) - Populism in Global Perspective. A Perfor...
Ostiguy & Panizza & Moffitt (eds.) - Populism in Global Perspective. A Perfor...frank0071
 
Erythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C KalyanErythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C Kalyanmuralinath2
 

Recently uploaded (20)

Detectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureDetectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a Technosignature
 
Cell Immobilization Methods and Applications.pptx
Cell Immobilization Methods and Applications.pptxCell Immobilization Methods and Applications.pptx
Cell Immobilization Methods and Applications.pptx
 
Pests of Green Manures_Bionomics_IPM_Dr.UPR.pdf
Pests of Green Manures_Bionomics_IPM_Dr.UPR.pdfPests of Green Manures_Bionomics_IPM_Dr.UPR.pdf
Pests of Green Manures_Bionomics_IPM_Dr.UPR.pdf
 
Lec 1.b Totipotency and birth of tissue culture.ppt
Lec 1.b Totipotency and birth of tissue culture.pptLec 1.b Totipotency and birth of tissue culture.ppt
Lec 1.b Totipotency and birth of tissue culture.ppt
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptx
 
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
 
GBSN - Microbiology (Unit 6) Human and Microbial interaction
GBSN - Microbiology (Unit 6) Human and Microbial interactionGBSN - Microbiology (Unit 6) Human and Microbial interaction
GBSN - Microbiology (Unit 6) Human and Microbial interaction
 
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...
 
Microbial bio Synthesis of nanoparticles.pptx
Microbial bio Synthesis of nanoparticles.pptxMicrobial bio Synthesis of nanoparticles.pptx
Microbial bio Synthesis of nanoparticles.pptx
 
mixotrophy in cyanobacteria: a dual nutritional strategy
mixotrophy in cyanobacteria: a dual nutritional strategymixotrophy in cyanobacteria: a dual nutritional strategy
mixotrophy in cyanobacteria: a dual nutritional strategy
 
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
 
Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!
 
Film Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdfFilm Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdf
 
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...
 
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
 
The importance of continents, oceans and plate tectonics for the evolution of...
The importance of continents, oceans and plate tectonics for the evolution of...The importance of continents, oceans and plate tectonics for the evolution of...
The importance of continents, oceans and plate tectonics for the evolution of...
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
 
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfMODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
 
Ostiguy & Panizza & Moffitt (eds.) - Populism in Global Perspective. A Perfor...
Ostiguy & Panizza & Moffitt (eds.) - Populism in Global Perspective. A Perfor...Ostiguy & Panizza & Moffitt (eds.) - Populism in Global Perspective. A Perfor...
Ostiguy & Panizza & Moffitt (eds.) - Populism in Global Perspective. A Perfor...
 
Erythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C KalyanErythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C Kalyan
 

A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data

  • 1. 1 A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data 9/29/2014 Petar Ristoski, Heiko Paulheim
  • 3. Motivation • Many existing applications use LOD as background knowledge in data mining – Explaining data patterns and statistics: unemployment rate, inflation, energy savings, etc … – Content-based book/movies recommendation system – Classifying incident related tweets – Gene classification – Prediction of car fuel consumption 9/29/2014 Ristoski, Paulheim 3
  • 4. Motivation Local LOD Data link combine cleanse transform analyze 9/29/2014 Ristoski, Paulheim 4
  • 5. Motivation • Standard data mining algorithms require propositional feature vector representation • Feature space: V={v1,v2,…, vn} • Each instance is represented as an n-dimensional feature vector (v1,v2,…,vn), where for each 1≤ vi ≤n : – vi ∈ {true, false}, or vi ∈ {1,0} – vi ∈ ℝ – vi ∈ S, where S is a finite set of symbols 9/29/2014 Ristoski, Paulheim 5
  • 6. Motivation Name Person Music Artist Instrument Genre Trent Reznor 1 1 1 0 Wolfgang A. Mozart 1 1 1 1 Barack Obama 1 0 0 0 9/29/2014 Ristoski, Paulheim 6
  • 7. Related Work • LiDDM (Narasimha et al.) – a framework tool for Linked Data mining that capture data from LOD cloud to extract hidden information • SPARQL-ML (Kiefer et al.) – extension to SPARQL to support data mining tasks for knowledge discovery in the Semantic Web • FeGeLOD (Paulheim et al.) – Unsupervised generation of data mining features from LOD • The resulting features are binary, or numerical aggregates using SPARQL COUNT constructs • No proper evaluation of the used propositionalization strategy 9/29/2014 Ristoski, Paulheim 7
  • 8. PROPOSED STRATEGIES 9/29/2014 Ristoski, Paulheim 8
  • 9. Strategies • Strategies for features derived from specific relations – r rdf:type C – r dcterms:subject S • Strategies for features derived from generic relations 9/29/2014 Ristoski, Paulheim 9
  • 10. Strategies for Features Derived from Specific Relations 1. Binary feature: – vi =1 if C(r) – vi =0 if ⅂C(r) 2. Relative count feature: – vi = 1 푛 , where r has relation to n objects 3. TF-IDF feature: – vi = 1 푛 log 푁 {푟|퐶 푟 } , where N is the total number of resources in the dataset, and {푟|퐶 푟 } denotes the number of resources for which the specific relation to an object C exists 9/29/2014 Ristoski, Paulheim 10
  • 11. Features Derived from Specific Relations: Binary vs TF-IDF +10 Music Artists dbpedia:Person dbpedia:Artist dbpedia:MusicArtist dbpedia:MilitaryPerson dbpedia:Kris_Kristofferson dbpedia:Elvis_Presley Name Person Artist Music Artist Military Person Elvis Presley 1 1 1 1 Kris Kristofferson 1 1 1 1 Artist X 1 1 1 0 9/29/2014 Ristoski, Paulheim 11
  • 12. Features Derived from Specific Relations: Binary vs TF-IDF +10 Music Artists dbpedia:Person dbpedia:Artist dbpedia:MusicArtist dbpedia:MilitaryPerson dbpedia:Kris_Kristofferson dbpedia:Elvis_Presley Name Person Artist Music Artist Military Person Elvis Presley 0 0 0 0.672 Kris Kristofferson 0 0 0 0.672 Artist X 0 0 0 0 9/29/2014 Ristoski, Paulheim 12
  • 13. Strategies • Strategies for features derived from specific relations – r rdf:type C → C(r) – dcterms:subject • Strategies for features derived from relations as such – describe how resource r is related to resource r′ – outgoing relation: p(r, r′ ) – Incoming relation : p(r′ ,r) 9/29/2014 Ristoski, Paulheim 13
  • 14. Strategies for Features Derived from Generic Relations 1. Binary feature: – vi = 1 if p(r,r′) – vi = 0 if ⅂p(r) 2. Count feature: – vi = n, where r is connected to n resources with relation p 3. Relative count feature: – vi = 푛푝 푃 , where P is the total number of outgoing relations for r, and np is the number of relations of type p for r 4. TF-IDF feature: – vi = 푛푝 푃 log 푁 {푟|∃푟′:푝 푟,푟′ } , where N is the total number of resources in the dataset, and {푟|∃푟′: 푝 푟, 푟′ } denotes the number of resources for which p(r, r′) exists 9/29/2014 Ristoski, Paulheim 14
  • 15. Features Derived from Generic Relations : Binary vs Relative Count dbpedia:Chester_Bennington dbpedia:Anthony_Kiedis dbpedia:Jules_Verne dbpedia:instrument 8 5 1 dbpedia:author 68 Name instrument author Chester Bennington 1 0 Anthony Kiedis 1 1 Jules Verne 0 1 9/29/2014 Ristoski, Paulheim 15
  • 16. Features Derived from Generic Relations : Binary vs Relative Count dbpedia:Chester_Bennington dbpedia:Anthony_Kiedis dbpedia:Jules_Verne dbpedia:instrument 8 5 1 dbpedia:author 68 Name instrument author Chester Bennington 1 0 Anthony Kiedis 0.833 0.166 Jules Verne 0 1 9/29/2014 Ristoski, Paulheim 16
  • 18. Evaluation • Comparative evaluation of the propositionalization strategies on three data mining tasks – Classification – Regression – Outlier Detection • Evaluated on six datasets, on five feature sets – types – categories – incoming relations – outgoing relations – incoming and outgoing relations 9/29/2014 Ristoski, Paulheim 19
  • 19. Evaluation: Classification • Datasets: Dataset # instances # types # categories # rel in # rel out # rel in & rel out Sports Tweets 5,054 7,814 14,025 3,574 5,334 8,908 Cities 212 721 999 1,304 1,081 2,385 • Methods: – Naïve Bayes – k-Nearest Neighbors (k=3) – C4.5 decision trees • Metrics for performance evaluation – Accuracy • Results calculated using stratified 10-fold cross validation 9/29/2014 Ristoski, Paulheim 20
  • 20. Evaluation: Classification Datasets Cities Sports Tweets Features Representation NB k-NN C4.5 Avg. NB k-NN C4.5 Avg. types Binary 55.71 56.17 59.05 56.98 81 82.9 82.95 82.28 Relative Count 57.1 49.61 55.22 53.98 80.96 81.44 81.88 81.43 TF-IDF 57.1 48.7 54.7 53.50 82.13 82.47 82.64 82.41 categories Binary 55.74 49.98 56.17 53.96 82.26 76.56 71.98 76.93 Relative Count 59.52 44.35 58.96 54.28 90.76 84.09 80.86 85.24 TF-IDF 55.74 49.98 57.08 54.27 89.65 81.98 81.68 84.44 rel in Binary 60.41 58.46 60.35 59.74 83.12 83.63 84.65 83.80 Count 56.69 31.1 59.37 49.05 83.25 85.11 85.4 84.59 Relative Count 49.16 38.23 58.55 48.65 69.51 84.63 85.17 79.77 TF-IDF 34.94 38.2 54.26 42.47 72.66 84.65 84.99 80.77 rel out Binary 47.62 60 56.71 54.78 80.67 82.36 84.41 82.48 Count 49.96 55.24 58.59 54.60 79.95 83.35 85.01 82.77 Relative Count 48.07 58.44 56.65 54.39 62.13 84.27 83.51 76.64 TF-IDF 40.15 54.78 58.51 51.15 69.91 84.4 84.16 79.49 r in & out Binary 59.44 58.57 56.47 58.16 86.17 85.14 86.45 85.92 Count 56.13 54.26 60.82 57.07 86.03 86.01 87.16 86.40 Relative Count 57.68 47.14 56.56 53.79 70.01 84.51 87.22 80.58 TF-IDF 40.17 46.21 58.46 48.28 75.15 84.86 86.19 82.07 9/29/2014 Ristoski, Paulheim 21
  • 21. Evaluation: Regression • Datasets: Dataset # instances # types # categories # rel in # rel out # rel in & rel out Auto MPG 391 264 308 227 370 597 Cities 212 721 999 1,304 1,081 2,385 • Methods: – Linear Regression – M5Rules – k-Nearest Neighbors (k=3) • Metrics for performance evaluation – RMSE • Results calculated using stratified 10-fold cross validation 9/29/2014 Ristoski, Paulheim 22
  • 22. Evaluation: Regression Datasets Auto MPG Cities Features Representation LR M5 k-NN Avg. LR M5 k-NN Avg. types Binary 3.952 3.056 3.63 3.546 24.303 18.793 22.164 21.753 Relative Count 3.843 2.952 3.571 3.455 18.046 19.696 33.569 23.770 TF-IDF 3.864 2.964 3.571 3.466 17.852 18.773 22.396 19.674 categories Binary 3.698 2.9 3.62 3.409 18.884 22.323 22.677 21.295 Relative Count 3.747 3 3.57 3.430 18.952 19.98 34.489 24.474 TF-IDF 3.782 2.9 3.56 3.416 19.02 22.323 23.189 21.511 rel in Binary 3.849 2.9 3.61 3.444 49.866 19.205 18.532 29.201 Count 3.892 3 4.62 3.824 138.041 19.915 19.27 59.075 Relative Count 3.976 2.9 3.57 3.488 122.365 22.335 18.877 54.526 TF-IDF 4.109 2.8 3.57 3.508 122.921 21.947 18.568 54.479 rel out Binary 3.792 3.1 3.6 3.490 20.008 19.364 20.918 20.097 Count 4.072 3 4.15 3.734 36.317 19.459 23.994 26.590 Relative Count 4.095 2.9 3.57 3.536 43.22 21.961 21.472 28.884 TF-IDF 4.135 3 3.57 3.572 28.845 20.852 22.212 23.970 r in & out Binary 3.991 3.1 3.67 3.572 40.803 18.803 18.211 25.939 Count 3.991 3.1 4.54 3.870 107.259 19.528 18.906 48.564 Relative Count 3.922 3 3.57 3.493 103.102 22.091 19.608 48.267 TF-IDF 3.982 3.01 3.57 3.523 115.373 20.623 19.702 51.899 9/29/2014 Ristoski, Paulheim 23
  • 23. Evaluation: Outlier Detection • Datasets: Dataset # instances # types # rel in # rel out # rel in & rel out DBpedia-Peel 2,083 39 586 322 908 DBpedia-DBTropes 4,228 128 912 2,155 3,067 • Methods: – k-NN Global Anomaly Score – GAS (k=25) – Local Outlier Factor – LOF (10<k<50) – Local Outlier Probability – LoOP (k=25) • Metrics for performance evaluation – Area under the ROC curve (AUC) • Results calculated on partial gold standard of 100 links 9/29/2014 Ristoski, Paulheim 24
  • 24. Evaluation: Outlier Detection Datasets Dbpedia-Peel Dbpedia-DBTropes Features Representation GAS LOF LoOP Avg. GAS LOF LoOP Avg. types Binary 0.386 0.487 0.554 0.476 0.503 0.627 0.605 0.578 Relative Count 0.385 0.398 0.595 0.459 0.503 0.385 0.314 0.401 TF-IDF 0.386 0.504 0.602 0.497 0.503 0.672 0.417 0.531 r in Binary 0.169 0.367 0.289 0.275 0.426 0.52 0.45 0.465 Count 0.2 0.285 0.29 0.258 0.503 0.59 0.602 0.565 Relative Count 0.293 0.496 0.452 0.414 0.589 0.555 0.493 0.546 TF-IDF 0.14 0.354 0.317 0.270 0.509 0.519 0.568 0.532 r out Binary 0.25 0.195 0.207 0.217 0.325 0.438 0.432 0.398 Count 0.539 0.455 0.391 0.462 0.547 0.578 0.522 0.549 Relative Count 0.542 0.544 0.391 0.492 0.618 0.601 0.513 0.577 TF-IDF 0.116 0.396 0.24 0.251 0.322 0.629 0.472 0.474 r in & out Binary 0.324 0.431 0.51 0.422 0.352 0.439 0.396 0.396 Count 0.527 0.368 0.454 0.450 0.57 0.563 0.527 0.553 Relative Count 0.603 0.744 0.616 0.654 0.667 0.672 0.657 0.665 TF-IDF 0.202 0.667 0.484 0.451 0.481 0.462 0.5 0.481 9/29/2014 Ristoski, Paulheim 25
  • 25. Conclusion • The chosen propositionalization strategy matters • No general recommendation for a strategy – What is the data mining task? – What are the characteristics of the dataset? – Which algorithm is going to be used? 9/29/2014 Ristoski, Paulheim 26
  • 26. Future Work • Conduct further experiments on more feature sets – Qualified incoming and outgoing relations – Combine features from multiple LOD sources • Conduct experiments on more data mining tasks – Clustering, Recommendation Systems etc… • More sophisticated strategies – Combination of statistical and semantic measures – Adaptation of weighting strategies used in text mining to overcome problems with erroneous data • Use the statistical measures for feature selection 9/29/2014 Ristoski, Paulheim 27
  • 27. RapidMiner LOD Extension • Simple wiring of operators – Importing – Linking – Feature generation – Data consolidation – Feature selection – Visualization 9/29/2014 Ristoski, Paulheim 28
  • 28. RapidMiner LOD Extension Local LOD Data link combine cleanse transform analyze 9/29/2014 Ristoski, Paulheim 29
  • 29. RapidMiner LOD Extension Data Enrichment Data Analysis Linking Feature Selection Schema Matching & Data Fusion 9/29/2014 Ristoski, Paulheim 30
  • 30. RapidMiner LOD Extension • Simple wiring of operators – Importing – Linking – Feature generation – Data fusion – Feature selection – Visualization • Try it out! – find “Linked Open Data” on the marketplace – Google Group: groups.google.com/forum/#!forum/rmlod 9/29/2014 Ristoski, Paulheim 31
  • 31. 32 A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data 9/29/2014 Petar Ristoski, Heiko Paulheim