Emir Muñoz 
Fujitsu (Ireland) Limited 
National University of Ireland Galway 
LD4IE 2014 @ ISWC, Riva del Garda, Trentino, Italy. Oct 20th, 2014 
http://bit.ly/1xYTR6Z 
(@emir_munoz)
<subject, predicate, object>
Domain(predicate): ??
Range(predicate): ??
Let's run the following SPARQL query over the DBpedia endpoint:
select distinct ?obj where
{?sub <http://dbpedia.org/property/isbn> ?obj}
The endpoint response is a table with the values for the isbn property:
0
71090
6176526
2
2.7073
140043853
1107020697
2940013968264
0978-02-02+02:00
http://dbpedia.org/resource/N/a
"?"@en
"ISBN 0-312-85182-0"@en
"See text"@en
"various"@en
"ISBN 978-0-465-02656-2, ISBN 0-14-017997-6"@en
"ISBN 0-553-07875-5 & ISBN 0-553-56166-9"@en
"The Claiming of Sleeping Beauty: ISBN 0-452-26656-4"@en
"-2.0"^^<http://dbpedia.org/datatype/second>
"TBA"@en
"not available"@en
"[[#Bibliography"@en
And some more ...
So, what is the correct range for dbp:isbn?
LOV statistics (as of July 7th, 2014):
446 vocabularies
10 classes and 20 properties on average
The range of isbn is http://schema.org/Text
…but still, is that what I'm looking for? What is the syntax?
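A declared range such as http://schema.org/Text says little about the shape of the values actually in use. As a rough illustration, the sketch below buckets a few of the dbp:isbn objects listed earlier by RDF term kind. The function name `term_kind` and the bucket labels are assumptions made for this example, and the regular expressions only cover the serialisation shown on the slide.

```python
import re
from collections import Counter

def term_kind(term: str) -> str:
    """Bucket a serialised RDF object by kind (labels are illustrative)."""
    if re.fullmatch(r'".*"@[A-Za-z-]+', term):
        return "lang-tagged literal"
    if re.fullmatch(r'".*"\^\^<[^>]+>', term):
        return "typed literal"
    if term.startswith(("http://", "https://")):
        return "IRI"
    return "bare value"

# A handful of the objects returned for dbp:isbn on the slide.
sample = [
    "2940013968264",
    "0978-02-02+02:00",
    "http://dbpedia.org/resource/N/a",
    '"ISBN 0-312-85182-0"@en',
    '"-2.0"^^<http://dbpedia.org/datatype/second>',
    '"TBA"@en',
]

kinds = Counter(term_kind(t) for t in sample)
for kind, count in kinds.most_common():
    print(kind, count)
```

Even this small sample falls into four different kinds, which is why a single declared range is not enough to answer the syntax question.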
Property: apoapsis
Etymology: apo- + apsis
Noun: apoapsis (plural apoapsides)
(astronomy) The point of a body's elliptical orbit about the system's centre of mass where the distance between the body and the centre of mass is at its maximum.
[http://en.wiktionary.org/wiki/apoapsis]
[Figure labels: Earth, Satellite]
dbr:17049_Miron dbo:apoapsis 4.01288e+11
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/ontology/OntologyDatatypes.scala
Object values of <http://dbpedia.org/property/date> in <subject, predicate, object> triples, with patterns in the style of Lerman et al. (JAIR 2003):
First column, [NUM-NUM-NUM+NUM:NUM] (plain literal):
  1488-07-28+02:00
  1982-05-23+02:00
  2007-04-11+02:00
Second column, [ALPHA<space>NUM] (plain literal + lang):
  "September 2012"@en
  "August 2012"@en
  "July 2009"@en
Third column, [--NUM-NUM+NUM:NUM] (typed literal):
  "--08-26+02:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay>
  "--01-24+02:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay>
  "--06-11+02:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay>
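The three bracketed patterns for the date property can be reproduced with a very small tokeniser. This is only a sketch: the function name `coarse_pattern` and the tokenising regular expression are assumptions for illustration, not the implementation used in the talk.

```python
import re

# Digit runs, ASCII letter runs, whitespace runs, or any single other character.
TOKEN = re.compile(r"\d+|[A-Za-z]+|\s+|.")

def coarse_pattern(value: str) -> str:
    """Map a lexical value to a coarse NUM/ALPHA pattern."""
    parts = []
    for tok in TOKEN.findall(value):
        if tok.isdigit():
            parts.append("NUM")
        elif tok.isalpha():
            parts.append("ALPHA")
        elif tok.isspace():
            parts.append("<space>")
        else:
            parts.append(tok)  # punctuation is kept verbatim
    return "".join(parts)

for lexical in ["1488-07-28+02:00", "September 2012", "--08-26+02:00"]:
    print(lexical, "->", coarse_pattern(lexical))
```

The function works on the lexical form only; the language tag or datatype of a literal would have to be stripped beforehand.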
Values are decomposed into tokens, and each token is represented by a syntactic class (Lerman et al., JAIR 2003), here with more specific categories.
Let … be the set of content patterns.
For the input set: …
That generates the following patterns: …
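To make the idea of more specific categories concrete, here is a hedged sketch that lists the syntactic classes of a single token from most specific to most general. The class names are the ones that appear in the pattern listings of this deck; the size thresholds (up to 2 digits is small, up to 4 is medium) and the ordering of the hierarchy are assumptions chosen so that 14 is a SMALL_NUMBER and 1986 a MEDIUM_NUMBER, as in the vcard:bday patterns. The helper name `token_classes` is hypothetical.

```python
import re
from typing import List

def token_classes(token: str) -> List[str]:
    """Classes of one token, most specific first (thresholds are assumed)."""
    if re.fullmatch(r"\d+", token):
        if len(token) <= 2:
            size = "SMALL_NUMBER"
        elif len(token) <= 4:
            size = "MEDIUM_NUMBER"
        else:
            size = "LARGE/FLOAT_NUMBER"
        return [token, size, "NUMBER"]
    if re.fullmatch(r"\d+\.\d+(?:[eE][+-]?\d+)?", token):
        return [token, "LARGE/FLOAT_NUMBER", "NUMBER"]
    if token.isalpha():
        classes = [token]
        if token.islower():
            classes.append("ALL_LOWERCASE")
        elif token.isupper():
            classes.append("ALL_UPPERCASE")
        return classes + ["ALPHA", "ALPHANUMERIC"]
    if token.isalnum():
        return [token, "ALPHANUMERIC"]
    return [token, "PUNCTUATION"]

for tok in ["26", "2012", "4.01288e+11", "com", "September", "@"]:
    print(tok, token_classes(tok))
```

Keeping the token itself as the most specific class allows patterns that mix literals and classes, such as `ALPHANUMERIC 2012`.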
Dataset: DBpedia version 3.9
2.4 billion RDF triples
53,230 properties
[Figure labels: Split, Method]
19.25% plain literals
18.02% typed literals
62.73% without lang or datatype (xsd:string)
For the apoapsis example, we extracted one pattern:
http://dbpedia.org/ontology/apoapsis             LARGE/FLOAT_NUMBER  1.0
We also found some other related properties:
http://dbpedia.org/ontology/Planet/apoapsis      LARGE/FLOAT_NUMBER  1.0
http://dbpedia.org/ontology/Spacecraft/apoapsis  LARGE/FLOAT_NUMBER  1.0
http://dbpedia.org/property/apoapsis             NUMBER              0.9230769230769231
http://dbpedia.org/property/apoapsis             LARGE/FLOAT_NUMBER  0.75213675
For the date example, we extracted 7 patterns:
http://dbpedia.org/property/date  -- SMALL_NUMBER - SMALL_NUMBER  0.2
http://dbpedia.org/property/date  ALPHANUMERIC MEDIUM_NUMBER      0.166
http://dbpedia.org/property/date  ALPHANUMERIC 2012               0.032
http://dbpedia.org/property/date  ALPHANUMERIC.ALPHANUMERIC       0.012
And more …
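The slides do not spell out how the number attached to each pattern is computed. One plausible reading, taken here purely as an assumption, is the share of a property's values that match the pattern. The sketch below computes such shares over coarse NUM/ALPHA shapes; `shape` and `pattern_scores` are hypothetical helper names, and the values are a few of the dbp:date objects shown earlier.

```python
import re
from collections import Counter

def shape(value: str) -> str:
    """Replace digit runs with NUM and ASCII letter runs with ALPHA."""
    return re.sub(
        r"\d+|[A-Za-z]+",
        lambda m: "NUM" if m.group().isdigit() else "ALPHA",
        value,
    )

def pattern_scores(values):
    """Share of values per shape (assumed scoring, not the talk's formula)."""
    counts = Counter(shape(v) for v in values)
    return {pattern: n / len(values) for pattern, n in counts.most_common()}

values = [
    "1488-07-28+02:00",
    "1982-05-23+02:00",
    "September 2012",
    "August 2012",
    "--08-26+02:00",
]
scores = pattern_scores(values)
for pattern, score in scores.items():
    print(pattern, score)
```

With a hierarchy of classes, one value can match both a general and a specific pattern, so shares for the same property need not sum to 1.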
The user has this value: “2014-10-20”. 
Which property can they use?
dbp:dateCreated, dbp:dateOfProduction, dbp:dateOpened, dbp:dateSigned, dbp:dateOfPremiere, dbp:date, among others. 
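The property search above can be sketched as a reverse lookup: compute the shape of the user's value and return every property that has that shape among its patterns. The small `PATTERN_DB` below is made-up toy data for illustration only (the real pattern database is not reproduced in the slides), and `shape` and `candidate_properties` are hypothetical names.

```python
import re

def shape(value: str) -> str:
    """Replace digit runs with NUM and ASCII letter runs with ALPHA."""
    return re.sub(
        r"\d+|[A-Za-z]+",
        lambda m: "NUM" if m.group().isdigit() else "ALPHA",
        value,
    )

# Toy pattern database (assumption): property -> shapes seen for its values.
PATTERN_DB = {
    "dbp:date": {"NUM-NUM-NUM", "ALPHA NUM", "--NUM-NUM"},
    "dbp:dateCreated": {"NUM-NUM-NUM"},
    "dbp:isbn": {"NUM", "ALPHA NUM-NUM-NUM-NUM"},
}

def candidate_properties(value: str, db=PATTERN_DB):
    """Properties whose known shapes include the shape of the value."""
    wanted = shape(value)
    return sorted(prop for prop, shapes in db.items() if wanted in shapes)

print(candidate_properties("2014-10-20"))
```

A fuller version would rank the candidates by pattern score rather than return them alphabetically.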
What is the property dbp:admCtrOf used for? 
"town of republic significance of Meleuz"@en (http://dbpedia.org/resource/Meleuz) 
"town of oblast significance of Oktyabrsk"@en (http://dbpedia.org/resource/Oktyabrsk) 
"town of republic significance of Sortavala"@en (http://dbpedia.org/resource/Sortavala) 
It is used to declare Administrative Control Of.
Check for atypical values (outliers) 
Close look into the most (in)frequent patterns 
Possible errors during automatic extraction 
For the dbp:date property we can find the following values:
"summer or autumn 380"@en 
"Late November"@en 
"Fall 1040"@en 
680 
"December, 67 BC"@en 
"April-July 1799"@en 
http://dbpedia.org/resource/New_Year's_Day 
http://dbpedia.org/resource/Second_Intermediate_Period_of_Egypt 
"New moon day of Kartika, celebrations begin two days prior and end two days after that date"@en 
Are these legitimate values or extraction errors?
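A simple way to surface atypical values is to flag those whose shape is rare among all values of the property. The sketch below does that with a share threshold. The threshold of 0.2 and the helper names `shape` and `rare_values` are assumptions, and the input mixes a few values from the earlier date listing with two of the unusual values shown above.

```python
import re
from collections import Counter

def shape(value: str) -> str:
    """Replace digit runs with NUM and ASCII letter runs with ALPHA."""
    return re.sub(
        r"\d+|[A-Za-z]+",
        lambda m: "NUM" if m.group().isdigit() else "ALPHA",
        value,
    )

def rare_values(values, min_share=0.2):
    """Values whose shape covers less than min_share of all values."""
    shapes = [shape(v) for v in values]
    counts = Counter(shapes)
    total = len(values)
    return [v for v, s in zip(values, shapes) if counts[s] / total < min_share]

values = [
    "1488-07-28+02:00",
    "1982-05-23+02:00",
    "2007-04-11+02:00",
    "September 2012",
    "August 2012",
    "July 2009",
    "Late November",
    "680",
]
outliers = rare_values(values)
print(outliers)
```

A rare shape is only a hint: the flagged values still need a human or a second check to decide whether they are extraction errors.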
LD4IE Challenge 2014
A vCard, which may be annotated with the hCard microformat:
E-mail: user1@domain.com
Given name: John
Surname: Snow
Birthday: 1986-02-14
We can use our database to extract and validate the email:
vcard:email  mailto : ALPHA PUNCTUATION ALL_LOWERCASE . ALL_LOWERCASE  0.82
vcard:email  mailto : ALPHA PUNCTUATION ALL_LOWERCASE . com  0.69
vcard:email  mailto : ALPHA @ ALPHANUMERIC . ALL_LOWERCASE  0.54
vcard:email  mailto : ALPHA @ ALPHANUMERIC . com  0.46
vcard:email  mailto : ALL_UPPERCASE ****@ ALL_LOWERCASE . ALL_LOWERCASE  0.36
…and also the birthday:
vcard:bday  NUMBER - SMALL_NUMBER - SMALL_NUMBER  0.5
vcard:bday  MEDIUM_NUMBER - SMALL_NUMBER - SMALL_NUMBER  0.5
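Validation against such patterns can be sketched as a token-by-token match, where each pattern element is either a class name or a literal token. The example covers only the two vcard:bday patterns; the digit-length thresholds for SMALL_NUMBER and MEDIUM_NUMBER and the helper names `classes_of` and `matches` are assumptions, and the email patterns are left out because the slides do not define their classes precisely enough to reproduce them.

```python
import re

def classes_of(token: str) -> set:
    """Classes a token belongs to, plus the token itself (thresholds assumed)."""
    classes = {token}
    if token.isdigit():
        classes.add("NUMBER")
        if len(token) <= 2:
            classes.add("SMALL_NUMBER")
        elif len(token) <= 4:
            classes.add("MEDIUM_NUMBER")
        else:
            classes.add("LARGE/FLOAT_NUMBER")
    elif token.isalnum():
        classes.add("ALPHANUMERIC")
    else:
        classes.add("PUNCTUATION")
    return classes

def matches(value: str, pattern: str) -> bool:
    """True if every token of value fits the pattern element at its position."""
    tokens = re.findall(r"[A-Za-z0-9]+|\S", value)
    wanted = pattern.split()
    return len(tokens) == len(wanted) and all(
        w in classes_of(t) for t, w in zip(tokens, wanted)
    )

BDAY_PATTERNS = [
    "NUMBER - SMALL_NUMBER - SMALL_NUMBER",
    "MEDIUM_NUMBER - SMALL_NUMBER - SMALL_NUMBER",
]

print(any(matches("1986-02-14", p) for p in BDAY_PATTERNS))
print(any(matches("14/02/1986", p) for p in BDAY_PATTERNS))
```

The birthday from the vCard fits both patterns, while the same date written with slashes fits neither, so it would be rejected or sent for normalisation.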
Extraction of lexico-syntactic patterns from LD datasets
500,000 content patterns
Different use cases:
  Search for properties
  Validation of values
  Information extraction based on patterns
Future work:
  Study of consistency analysis of knowledge bases
  Extension of patterns to cover other knowledge bases
  Among others
http://emunoz.org 
@emir_munoz 
Emir.Munoz@ie.fujitsu.com
https://github.com/emir-munoz/ld-patterns/

Learning Content Patterns from Linked Data
