DBpedia ♥ Commons 
Gaurav Vaidya - Dimitris Kontokostas - Andrea Di Menna - Jim O'Regan 
2nd DBpedia Meeting Leipzig 03.09.2014
~23M pages like this 
A lot of pages like this 
Many pages like this 
Not very similar to pages like this 
DBpedia Extraction Framework 
✔ “Wiki agnostic”
✔ Pluggable extractors
✔ Out-of-the-box support for common metadata
✗ Tuned for extraction in the main namespace (not File:) 
✗ Many other challenges left
Challenges 
✔ File metadata 
✔ KML files 
✔ Image Galleries 
✔ Image Annotations 
✔ Mappings Wiki 
✔ Bootstrap community mappings 
✔ Template Statistics 
✔ Licensing 
✔ Technical details I'll not go into
Out-of-the-box support 
● Categories (skos) 
● External links 
● Geo-coordinates 
● Raw infobox properties 
● Labels 
● PageIds / Revisions 
● Links (internal / external) 
● Mappings Wiki (with some tweaking / more on that later)
File metadata 
● New Extractor 
● New file class hierarchy
– dbo:File, dbo:Image, dbo:StillImage, dbo:MovingImage and dbo:Sound
Sample Output: 
:Aeropetes.JPG a dbo:StillImage, dbo:Image, dbo:Document, dbo:File, dbo:Work ;
    dcterms:type dbo:StillImage ;
    dbo:fileExtension "jpg" ;
    dcterms:format "image/jpeg" ;
    dbo:fileURL commons-path:Aeropetes.JPG ;
    foaf:depiction commons-path:Aeropetes.JPG ;
    dbo:thumbnail commons-path:Aeropetes.JPG?width=300 .
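
The sample output above could be approximated with a few lines of Python. This is only a sketch, not the extractor's actual code: the function name `describe_file` and the mapping from MIME major type to class are assumptions made for illustration, and only the property and class names come from the slide.

```python
import mimetypes
from pathlib import PurePosixPath

# Hypothetical mapping from MIME major type to the most specific class.
# Only the class names themselves appear on the slide.
CLASS_BY_MAJOR_TYPE = {
    "image": "dbo:StillImage",
    "video": "dbo:MovingImage",
    "audio": "dbo:Sound",
}


def describe_file(filename):
    """Derive extension, MIME type and class for a Commons file name."""
    lowered = filename.lower()
    extension = PurePosixPath(lowered).suffix.lstrip(".")
    # guess_type returns (type, encoding); type is None for unknown extensions.
    mime_type, _ = mimetypes.guess_type(lowered)
    major = mime_type.split("/")[0] if mime_type else None
    return {
        "dbo:fileExtension": extension,
        "dcterms:format": mime_type,
        # Fall back to the generic class when the type is unknown.
        "dcterms:type": CLASS_BY_MAJOR_TYPE.get(major, "dbo:File"),
    }


print(describe_file("Aeropetes.JPG"))
```

A real extractor would need extra rules that this sketch ignores, for example animated GIFs, which are images by MIME type but arguably moving images.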
Image Galleries 
● Attach each gallery item to the page resource
:Colorado dbo:hasGalleryItem :Colorado.JPG, :Denver_Colorado_Art.jpg, :ColoradoCenter1.jpg .
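
As a rough illustration of the gallery step, the sketch below pulls file names out of `<gallery>` blocks in wikitext and emits one `dbo:hasGalleryItem` statement per item. The function names and the regular expression are assumptions for this example. The real extractor works on the framework's parsed page model rather than on raw regexes.

```python
import re

# Matches the body of each <gallery ...>...</gallery> block.
GALLERY_RE = re.compile(r"<gallery[^>]*>(.*?)</gallery>", re.DOTALL | re.IGNORECASE)
# Optional namespace prefix in front of a gallery entry.
NAMESPACE_RE = re.compile(r"^(File|Image):", re.IGNORECASE)


def gallery_items(wikitext):
    """Return the file names listed in all gallery blocks, in order."""
    items = []
    for body in GALLERY_RE.findall(wikitext):
        for line in body.splitlines():
            # Each entry is "File:Name.jpg|optional caption".
            name = line.split("|", 1)[0].strip()
            if not name:
                continue
            name = NAMESPACE_RE.sub("", name).strip()
            # Page titles use underscores in URIs.
            items.append(name.replace(" ", "_"))
    return items


def gallery_triples(page, wikitext):
    """Format one statement per gallery item for the given page."""
    return [f":{page} dbo:hasGalleryItem :{item} ." for item in gallery_items(wikitext)]


sample = """<gallery>
File:Colorado.JPG|State view
Denver Colorado Art.jpg
</gallery>"""
for triple in gallery_triples("Colorado", sample):
    print(triple)
```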
Image Annotations 
● Annotation Gadget
● Boxes with optional description
Image Annotations 
● W3C Media Fragments recommendation
● Embed the box in the URI 
– ?width=15130&height=1886#xywh=pixel:10431,324,1670,1208> . 
● Add descriptions in the new resource 
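
To make the URI scheme concrete, here is a minimal sketch that embeds an annotation box in a URI using the `xywh=pixel:` form from the Media Fragments recommendation. The function name and the `example.org` base URI are placeholders. The `width` and `height` query parameters simply mirror the fragment shown on the slide.

```python
from urllib.parse import urlencode


def annotation_uri(file_uri, image_width, image_height, x, y, w, h):
    """Build a URI that identifies a rectangular region of an image.

    The query string records the image size the box coordinates refer to.
    The fragment carries the box itself as x, y, width, height in pixels.
    """
    query = urlencode({"width": image_width, "height": image_height})
    return f"{file_uri}?{query}#xywh=pixel:{x},{y},{w},{h}"


# Placeholder base URI, with the numbers from the slide.
print(annotation_uri("http://example.org/Sample.jpg", 15130, 1886, 10431, 324, 1670, 1208))
```

Because the box is part of the URI, each annotation becomes its own resource, which is what allows a description to be attached to it.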
Mappings Wiki
Template Statistics 
Licensing 
● Automatically identified and imported ~360 licence templates
● Use the mappings wiki 
● Needed some hacking to make it work 
– e.g. {{Self|GFDL|cc-by-sa-3.0,2.5,2.0,1.0}} 
:Acraea_circeis.JPG dbo:license <http://creativecommons.org/publicdomain/mark/1.0/> .
:Antepipona_deflenda_-_2012-10-17.webm dbo:license <http://creativecommons.org/licenses/by-sa/3.0/> .
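
The `{{Self|...}}` example shows why some hacking was needed: a single argument can stand for several licence versions. The sketch below expands such a template into licence URIs. It is an illustration under stated assumptions, not the project's implementation: the function name, the regular expression, and the GFDL URI are chosen for this example, and named parameters such as `author=` are simply skipped.

```python
import re

# "cc-by-sa-3.0,2.5,2.0,1.0" -> kind "by-sa", versions 3.0, 2.5, 2.0, 1.0
CC_RE = re.compile(r"^cc-([a-z-]+)-(\d+\.\d+)((?:,\d+\.\d+)*)$", re.IGNORECASE)

# Assumed URI for the GFDL; the slide does not say which one is used.
GFDL_URI = "http://www.gnu.org/copyleft/fdl.html"


def expand_self_template(template):
    """Expand a {{Self|...}} template into a list of licence URIs."""
    text = template.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template")
    parts = [part.strip() for part in text[2:-2].split("|")]
    if parts[0].lower() != "self":
        raise ValueError("not a Self template")
    uris = []
    for arg in parts[1:]:
        if "=" in arg:
            continue  # named parameter, not a licence
        match = CC_RE.match(arg)
        if match:
            kind = match.group(1).lower()
            versions = [match.group(2)]
            versions += [v for v in match.group(3).split(",") if v]
            for version in versions:
                uris.append(f"http://creativecommons.org/licenses/{kind}/{version}/")
        elif arg.upper() == "GFDL":
            uris.append(GFDL_URI)
    return uris


for uri in expand_self_template("{{Self|GFDL|cc-by-sa-3.0,2.5,2.0,1.0}}"):
    print(uri)
```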
KML Annotations attached to media 
Attach raw KML data to resource with custom extractor 
Sample Output: 
:Yellowstone_1871b.jpg dbo:hasKMLData """
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<GroundOverlay>
<name>Yorktown, Indiana (1878)</name>
<description>An 1878 map of Yorktown in Tippecanoe County, Indiana. Source: Kingman Brothers&apos; Combination Atlas Map of Tippecanoe County, Indiana, 1878.</description>
<color>99ffffff</color><Icon><href>BIG_LINK_HERE</href>
<viewBoundScale>0.75</viewBoundScale></Icon>
<LatLonBox>
<north>40.26126145890567</north><south>40.25777915632657</south>
<east>-86.77033439383223</east><west>-86.77398493316619</west>
<rotation>-1.123009884936565</rotation></LatLonBox>
</GroundOverlay></kml>"""^^rdfs:XMLLiteral .
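
Since the KML is stored as a raw literal, a consumer has to parse it to get at the overlay geometry. A minimal sketch using Python's standard library follows. The function name is an assumption, and the namespace is the one that appears in the sample above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample KML literal.
KML_NS = {"kml": "http://earth.google.com/kml/2.2"}


def ground_overlay_bounds(kml_text):
    """Return the north/south/east/west bounds of the first LatLonBox."""
    # Parse from bytes so an XML declaration with an encoding is accepted.
    root = ET.fromstring(kml_text.encode("utf-8"))
    box = root.find(".//kml:LatLonBox", KML_NS)
    if box is None:
        raise ValueError("no LatLonBox found")
    return {
        tag: float(box.findtext(f"kml:{tag}", namespaces=KML_NS))
        for tag in ("north", "south", "east", "west")
    }


sample_kml = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<GroundOverlay>
<LatLonBox>
<north>40.5</north><south>40.25</south>
<east>-86.5</east><west>-86.75</west>
</LatLonBox>
</GroundOverlay></kml>"""
print(ground_overlay_bounds(sample_kml))
```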
Remaining TODOs
● Nested templates are commonly used and cannot currently be handled by the mappings wiki
– e.g. media descriptions (although mapped) are missing
{{Information |Description= {{en|Logo of the [[w:en:DBpedia|DBpedia project]]}} {{fr|Logo du projet [[w:fr:DBpedia|DBpedia]]}}
● Annotation descriptions need some tweaking 
– Need to render wikitext 
● Put it under a SPARQL Endpoint 
● Provide Linked Data 
– http://commons.dbpedia.org
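
For the nested-template TODO, one possible direction is to scan the value of `Description=` for its outermost `{{lang|text}}` templates by tracking brace depth. The sketch below is an assumption about how this could be approached, not something the project has implemented. Both function names are hypothetical, and wiki links are left unrendered, which is the separate "render wikitext" TODO.

```python
import re

# "en|some text" -> language code and text
LANG_RE = re.compile(r"^([a-z]{2,3})\|(.*)$", re.DOTALL)


def split_templates(text):
    """Return the bodies of the outermost {{...}} templates in text."""
    bodies, depth, start = [], 0, None
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i + 2
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                bodies.append(text[start:i])
            i += 2
        else:
            i += 1
    return bodies


def language_descriptions(description_value):
    """Map language codes to description text for {{xx|...}} templates."""
    result = {}
    for body in split_templates(description_value):
        match = LANG_RE.match(body.strip())
        if match:
            result[match.group(1)] = match.group(2).strip()
    return result


value = ("{{en|Logo of the [[w:en:DBpedia|DBpedia project]]}} "
         "{{fr|Logo du projet [[w:fr:DBpedia|DBpedia]]}}")
print(language_descriptions(value))
```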
Thank You! 
Special thanks to: 
● Alexandru Todor (importing the License templates) 
● Google Summer of Code for sponsoring this project (Gaurav Vaidya)
Questions? 
Dataset: http://nl.dbpedia.org/downloads/commonswiki 
Dataset samples: https://github.com/gaurav/commons-extraction
