Batch import of large RDF datasets
using RDFIO or the new
rdf2smw tool
Samuel Lampa - @smllmp
PhD Student
in Pharmaceutical Bioinformatics @ pharmb.io
with Assoc. Prof. Ola Spjuth - @ola_spjuth
@ Dept. of Pharm. Biosci. / Uppsala University
Semantic MediaWiki Conference Fall 2016, Frankfurt am Main,
RDF Import? Who wants that?
Research interests
● Large datasets
● Automation
● Scientific workflows
● Machine Learning
● Semantic data
● Reasoning
● Query systems
● Something user friendly
● … and hopefully usable
● “Answer ALL the research questionz”
RDFIO
github.com/rdfio/rdfio
What’s the problem?
● Semantic MediaWiki has great support for
exporting to RDF
● … but no real (proper) RDF import
(as in: plain triples → wiki syntax in articles)
RDFIO What?!
● SMW extension
● Import plain RDF triples
● No need for an ontology
● RDF URIs → Wiki titles
● Retains Original URIs
● Translates back to
Original URIs on export
● Round-trip SMW ↔ RDF
● tinyurl.com/getrdfio
Turning RDF Triples into Wiki Pages
<http://ex.org/Stockholm> <http://ex.org/onto/LocatedIn> <http://ex.org/Sweden>
<http://ex.org/Stockholm> <http://ex.org/onto/Population> "789024"^^xsd:integer
<http://ex.org/Frankfurt> <http://ex.org/onto/LocatedIn> <http://ex.org/Germany>
<http://ex.org/Frankfurt> <http://ex.org/onto/Population> "731095"^^xsd:integer
Stockholm
[[Located In::Sweden]]
[[Population::789024]]
[[Original URI::http://ex.org/Stockholm]]
Frankfurt
[[Located In::Germany]]
[[Population::731095]]
[[Original URI::http://ex.org/Frankfurt]]
Sweden
[[Original URI::http://ex.org/Sweden]]
Germany
[[Original URI::http://ex.org/Germany]]
Property:LocatedIn
[[Has type::Page]]
[[Original URI::http://ex.org/onto/LocatedIn]]
Property:Population
[[Has type::Number]]
[[Original URI::http://ex.org/onto/Population]]
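The mapping on the slides above can be sketched in a few lines of Python. This is only an illustration of the idea, not RDFIO's actual code: the function names, the regular expression and the rule "wiki title = last segment of the URI" are all assumptions made for this example. Note also that the sketch keeps property names as they appear in the URI ("LocatedIn"), while the facts on the slides show "Located In".

```python
import re
from collections import defaultdict

# Simplified pattern for one triple: <subject> <predicate> <object-or-literal>.
# A real N-Triples parser handles far more cases than this.
TRIPLE_RE = re.compile(r'<([^>]+)>\s+<([^>]+)>\s+(<[^>]+>|"[^"]*"(?:\^\^\S+)?)')

SAMPLE = '''
<http://ex.org/Stockholm> <http://ex.org/onto/LocatedIn> <http://ex.org/Sweden>
<http://ex.org/Stockholm> <http://ex.org/onto/Population> "789024"^^xsd:integer
<http://ex.org/Frankfurt> <http://ex.org/onto/LocatedIn> <http://ex.org/Germany>
<http://ex.org/Frankfurt> <http://ex.org/onto/Population> "731095"^^xsd:integer
'''


def local_name(uri):
    """Hypothetical title rule: take the last path or fragment segment."""
    return re.split(r'[/#]', uri.rstrip('/'))[-1]


def triples_to_pages(ntriples):
    """Return {wiki page title: [lines of wiki text]} for the given triples."""
    pages = defaultdict(list)
    uris = {}  # page title -> original URI, written out at the end
    for line in ntriples.strip().splitlines():
        match = TRIPLE_RE.match(line.strip())
        if not match:
            continue  # skip lines this simplified pattern cannot read
        subj, pred, obj = match.groups()
        prop = local_name(pred)
        if obj.startswith('<'):
            # URI object: becomes a wiki page of its own, property type Page
            obj_uri = obj[1:-1]
            value = local_name(obj_uri)
            uris[value] = obj_uri
            prop_type = 'Page'
        else:
            # Literal object: keep the value, guess the type from the datatype
            parts = obj.split('"')
            value = parts[1]
            prop_type = 'Number' if 'integer' in parts[-1] else 'Text'
        title = local_name(subj)
        uris[title] = subj
        pages[title].append(f'[[{prop}::{value}]]')
        prop_page = f'Property:{prop}'
        if prop_page not in pages:
            pages[prop_page].append(f'[[Has type::{prop_type}]]')
            uris[prop_page] = pred
    # Every page keeps its source URI, so an export can translate back
    for title, uri in uris.items():
        pages[title].append(f'[[Original URI::{uri}]]')
    return dict(pages)


if __name__ == '__main__':
    for title, lines in triples_to_pages(SAMPLE).items():
        print(title)
        print('\n'.join(lines))
```

Storing the Original URI on every generated page, including the property pages, is what makes the SMW ↔ RDF round trip possible.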
RDF Import interface
SPARQL Endpoint
SPARQL: Output Original URI
SPARQL: Query by Original URI
RDFIO History Timeline
RDFIO – Current Status
● SMW 2.3 support – with some hacks
(Ali working on the last minor issues)
● See the Vagrant box for a working automated
setup with MW 1.26.4 + SMW 2.3.1:
– github.com/rdfio/rdfio-vagrantbox
● Some known minor issues
New Feature: Commandline Import
Problem:
● Importing 300K triples can take around 24h
● What if you only notice a misconfiguration
after those 24h?
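To put the figure above in perspective, here is the arithmetic as a small Python snippet. It uses only the two numbers from the slide (300K triples, 24h); the variable names are made up for this example.

```python
triples = 300_000
seconds = 24 * 60 * 60  # 24 hours expressed in seconds

throughput = triples / seconds             # triples imported per second
per_triple_ms = seconds / triples * 1000   # milliseconds spent per triple

print(f"{throughput:.2f} triples per second")   # 3.47 triples per second
print(f"{per_triple_ms:.0f} ms per triple")     # 288 ms per triple
```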
Solution:
rdf2smw
(new tool)
The new rdf2smw tool
● Convert RDF → MediaWiki XML (Really fast!)
● Import via MediaWiki XML import (Still slow...)
● But: Can now preview before the XML import!
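The "convert first, import later" idea can be sketched as follows. This is not rdf2smw's implementation (rdf2smw is written in Go); it is a minimal Python illustration with a made-up function name, and it writes only a bare-bones subset of the MediaWiki XML format (`mediawiki` / `page` / `title` / `revision` / `text`). A real dump carries a namespace, a schema version and more metadata.

```python
import xml.etree.ElementTree as ET


def pages_to_mediawiki_xml(pages):
    """Serialize {title: [wiki text lines]} to a minimal MediaWiki-style XML string."""
    root = ET.Element('mediawiki')
    for title, lines in pages.items():
        page = ET.SubElement(root, 'page')
        ET.SubElement(page, 'title').text = title
        revision = ET.SubElement(page, 'revision')
        ET.SubElement(revision, 'text').text = '\n'.join(lines)
    return ET.tostring(root, encoding='unicode')


if __name__ == '__main__':
    # Hypothetical input, matching the Stockholm example from the earlier slides
    example = {
        'Stockholm': [
            '[[Located In::Sweden]]',
            '[[Population::789024]]',
            '[[Original URI::http://ex.org/Stockholm]]',
        ],
    }
    # The resulting XML can be inspected before anything touches the wiki
    print(pages_to_mediawiki_xml(example))
```

Because the output is an ordinary file, a misconfiguration can be spotted by reading it, before starting the slow import step (for example with MediaWiki's importDump.php maintenance script).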
More rdf2smw facts:
● Written in Go for compiled, multi-core performance
● Very pluggable architecture
● Easy to install: Just download and run!
● Get it: github.com/samuell/rdf2smw
rdf2smw: Architecture
rdf2smw performance
[Figure: execution time (s) plotted against number of triples; x-axis from 50,000 to 550,000 triples, y-axis from 0 to 600 s]
Future outlook
● How to make RDFIO more maintainable, for developers
with too little time?
● Drastically simplify?
● Break out well defined sub-modules?
(SPARQL endpoint, RDF Import, etc)
● Integrate with MW REST API instead of a dedicated Special page
(as per Denny’s original idea with SMWWriter)?
● Re-use core SMW functionality more? (Or not?)
● Your ideas?
RDFIO Vagrant box
github.com/rdfio/rdfio-vagrantbox
$ vagrant up
20 min
The new Vagrant box:
Set up MW + SMW + RDFIO in 7 steps
1) Install dependencies
2) $ git clone https://github.com/rdfio/rdfio-vagrantbox.git
3) $ cd rdfio-vagrantbox
4) $ vagrant up
5) Browse to localhost:8080/w/index.php/Special:RDFIOAdmin
6) Log in with username Admin and password changethis
7) Click “Setup”
Done!
Acknowledgements
● Denny Vrandečić (@vrandezo) - Basically had the same idea for an extension already
when the (eventually accepted) GSOC proposal was submitted in 2010, and supported
the project with valuable ideas and through mentoring the GSOC 2010 project.
● Ali King (@ali_king) - Has done great work updating the extension to the latest
standards and versions, and added the new template editing functionality, as part of
an OPW 2014 project.
● Joel Sachs (@xjsachs) - Championed the addition of the template editing functionality,
provided valuable encouragement and mentored Ali King’s FOSS OPW project.
● Egon Willighagen (@egonwillighagen) - Has supported the project with valuable
testing, constructive feedback, encouragement and new ideas.
● Ola Spjuth (@ola_spjuth) - Has provided constructive feedback and encouragement,
as well as financed parts of the further development of the project.
● Google Inc. - Supported the initial development through its
Summer of Code program (GSOC) in 2010.
● GNOME Foundation - Supported further development as part of its
Outreach Program for Women (OPW) in 2014.
