SlideShare a Scribd company logo
1 of 22
Download to read offline
Bringing the Semantic Web
closer to reality
PostgreSQL as RDF Graph Database
Jimmy Angelakos
EDINA, University of Edinburgh
FOSDEM
04-05/02/2017
or how to export your data to
someone who's expecting RDF
Jimmy Angelakos
EDINA, University of Edinburgh
FOSDEM
04-05/02/2017
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
Semantic Web? RDF?
● Resource Description Framework
– Designed to overcome the limitations of HTML
– Make the Web machine readable
– Metadata data model
– Multigraph (Labelled, Directed)
– Triples (Subject – Predicate – Object)
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
● Information addressable via URIs
● <http://example.org/person/Mark_Twain>
<http://example.org/relation/author>
<http://example.org/books/Huckleberry_Finn>
● <http://edina.ac.uk/ns/item/74445709>
<http://purl.org/dc/terms/title>
"The power of words: A model of honesty and
fairness" .
● Namespaces
@prefix dc:
<http://purl.org/dc/elements/1.1/> .
RDF Triples
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
Triplestores
● Offer persistence to our RDF graph
● RDFLib extended by RDFLib-SQLAlchemy
● Use PostgreSQL as storage backend!
● Querying
– SPARQL
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
{ +
"DOI": "10.1007/11757344_1", +
"URL": "http://dx.doi.org/10.1007/11757344_1", +
"type": "book-chapter", +
"score": 1.0, +
"title": [ +
"CrossRef Listing of Deleted DOIs" +
], +
"member": "http://id.crossref.org/member/297", +
"prefix": "http://id.crossref.org/prefix/10.1007", +
"source": "CrossRef", +
"created": { +
"date-time": "2006-10-19T13:32:01Z", +
"timestamp": 1161264721000, +
"date-parts": [ +
[ +
2006, +
10, +
19 +
] +
] +
}, +
"indexed": { +
"date-time": "2015-12-24T00:59:48Z", +
"timestamp": 1450918788746, +
"date-parts": [ +
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
import psycopg2
from rdflib import plugin, Graph, Literal, URIRef
from rdflib.namespace import Namespace
from rdflib.store import Store
import rdflib_sqlalchemy
EDINA = Namespace('http://edina.ac.uk/ns/')
PRISM =
Namespace('http://prismstandard.org/namespaces/basic/2.1/')
rdflib_sqlalchemy.registerplugins()
dburi =
URIRef('postgresql+psycopg2://myuser:mypass@localhost/rdfgraph'
)
ident = EDINA.rdfgraph
store = plugin.get('SQLAlchemy', Store)(identifier=ident)
gdb = Graph(store, ident)
gdb.open(dburi, create=True)
gdb.bind('edina', EDINA)
gdb.bind('prism', PRISM)
item = EDINA['item/' + str(1)]
triples = []
triples += (item, RDF.type, EDINA.Item),
triples += (item, PRISM.doi, Literal('10.1002/crossmark_policy'),
gdb.addN(t + (gdb,) for t in triples)
gdb.serialize(format='turtle')
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
BIG
DATA
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
Super size me!
● Loop over original database contents
● Create triples
● Add them to graph efficiently
● Serialise graph efficiently
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
But but… ?
● Big data without Java?
● Graphs without Java?
– Gremlin? BluePrints? TinkerPop? Jena? Asdasdfaf? XYZZY?
● Why not existing triplestores? “We are the only Graph DB that...”
● Python/Postgres run on desktop hardware
● Simplicity (few LOC and readable)
● Unoptimised potential for improvement→
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
conn = psycopg2.connect(database='mydb', user='myuser')
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
seqcur = conn.cursor()
seqcur.execute("""
CREATE SEQUENCE IF NOT EXISTS item;
""")
conn.commit()
cur = conn.cursor('serverside_cur',
cursor_factory=psycopg2.extras.DictCursor)
cur.itersize = 50000
cur.arraysize = 10000
cur.execute("""
SELECT data
FROM mytable
""")
while True:
recs = cur.fetchmany(10000)
if not recs:
break
for r in recs:
if 'DOI' in r['data'].keys():
seqcur.execute("SELECT nextval('item')")
item = EDINA['item/' + str(seqcur.fetchone()[0])]
triples += (item, RDF.type, EDINA.Item),
triples += (item, PRISM.doi, Literal(r['data']['DOI'])),
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
1st
challenge
● rdflib-sqlalchemy
– No ORM, autocommit (!)
– Creates SQL statements, executes one at a time
– INSERT INTO … VALUES (…);
INSERT INTO … VALUES (…);
INSERT INTO … VALUES (…);
– We want INSERT INTO … VALUES (…),(…),
(…)
– Creates lots of indexes which must be dropped
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
2nd
challenge
● How to restart if interrupted?
● Solved with querying and caching.
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
from rdflib.plugins.sparql import prepareQuery
orgQ = prepareQuery("""
SELECT ?org ?pub
WHERE { ?org a foaf:Organization .
?org rdfs:label ?pub . }
""", initNs = { 'foaf': FOAF, 'rdfs': RDFS })
orgCache = {}
for o in gdb.query(orgQ):
orgCache[o[1].toPython()] = URIRef(o[0].toPython())
if 'publisher' in r['data'].keys():
publisherFound = False
if r['data']['publisher'] in orgCache.keys():
publisherFound = True
triples += (item, DCTERMS.publisher, orgCache[r['data']
['publisher']]),
if not publisherFound:
seqcur.execute("SELECT nextval('org')")
org = EDINA['org/' + str(seqcur.fetchone()[0])]
orgCache[r['data']['publisher']] = org
triples += (org, RDF.type, FOAF.Organization),
triples += (org, RDFS.label, Literal(r['data']
['publisher'])),
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
3rd
challenge
● rdflib-sqlalchemy (yup… guessed it)
– Selects whole graph into memory
● Server side cursor:
res = connection.execution_options(
stream_results=True).execute(q)
● Batching:
while True:
result = res.fetchmany(1000) …
yield …
– Inexplicably caches everything read in RAM!
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
4th
challenge
● Serialise efficiently! Multiprocessing→
– Processes, JoinableQueues
● Turtle: Unsuitable N-Triples→
– UNIX magic
python3 rdf2nt.py |
split -a4 -d -C4G
--additional-suffix=.nt
--filter='gzip > $FILE.gz' -
exported/rdfgraph_
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
Desktop hardware...
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
5th
challenge
● Serialisation outran HDD!
● Waits for:
– JoinableQueues to empty
– sys.stdout.flush()
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
rdfgraph=> dt
List of relations
Schema | Name | Type | Owner
--------+-----------------------------------+-------+----------
public | kb_a8f93b2ff6_asserted_statements | table | myuser
public | kb_a8f93b2ff6_literal_statements | table | myuser
public | kb_a8f93b2ff6_namespace_binds | table | myuser
public | kb_a8f93b2ff6_quoted_statements | table | myuser
public | kb_a8f93b2ff6_type_statements | table | myuser
(5 rows)
rdfgraph=> x
Expanded display is on.
rdfgraph=> select * from kb_a8f93b2ff6_asserted_statements limit
1;
-[ RECORD 1 ]-----------------------------------------------
id | 62516955
subject | http://edina.ac.uk/ns/creation/12993043
predicate | http://www.w3.org/ns/prov#wasAssociatedWith
object | http://edina.ac.uk/ns/agent/12887967
context | http://edina.ac.uk/ns/rdfgraph
termcomb | 0
Time: 0.531 ms
rdfgraph=>
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
More caveats!
● Make sure you are not entering literals in a URI field.
● Also make sure your URIs are valid (amazingly some
DOIs fail when urlencoded)
● rdflib-sqlalchemy unoptimized for FTS
– Your (BTree) indices will be YUGE.
– Don't use large records (e.g. 10+ MB cnt:bytes)
● you need to drop index; insert; create
index
– pg_dump is your friend
– Massive maintenance work memory (Postgres)
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
DROP INDEX public.kb_a8f93b2ff6_uri_index;
DROP INDEX public.kb_a8f93b2ff6_type_mkc_key;
DROP INDEX public.kb_a8f93b2ff6_quoted_spoc_key;
DROP INDEX public.kb_a8f93b2ff6_member_index;
DROP INDEX public.kb_a8f93b2ff6_literal_spoc_key;
DROP INDEX public.kb_a8f93b2ff6_klass_index;
DROP INDEX public.kb_a8f93b2ff6_c_index;
DROP INDEX public.kb_a8f93b2ff6_asserted_spoc_key;
DROP INDEX public."kb_a8f93b2ff6_T_termComb_index";
DROP INDEX public."kb_a8f93b2ff6_Q_termComb_index";
DROP INDEX public."kb_a8f93b2ff6_Q_s_index";
DROP INDEX public."kb_a8f93b2ff6_Q_p_index";
DROP INDEX public."kb_a8f93b2ff6_Q_o_index";
DROP INDEX public."kb_a8f93b2ff6_Q_c_index";
DROP INDEX public."kb_a8f93b2ff6_L_termComb_index";
DROP INDEX public."kb_a8f93b2ff6_L_s_index";
DROP INDEX public."kb_a8f93b2ff6_L_p_index";
DROP INDEX public."kb_a8f93b2ff6_L_c_index";
DROP INDEX public."kb_a8f93b2ff6_A_termComb_index";
DROP INDEX public."kb_a8f93b2ff6_A_s_index";
DROP INDEX public."kb_a8f93b2ff6_A_p_index";
DROP INDEX public."kb_a8f93b2ff6_A_o_index";
DROP INDEX public."kb_a8f93b2ff6_A_c_index";
ALTER TABLE ONLY public.kb_a8f93b2ff6_type_statements
DROP CONSTRAINT kb_a8f93b2ff6_type_statements_pkey;
ALTER TABLE ONLY public.kb_a8f93b2ff6_quoted_statements
DROP CONSTRAINT kb_a8f93b2ff6_quoted_statements_pkey;
ALTER TABLE ONLY public.kb_a8f93b2ff6_namespace_binds
DROP CONSTRAINT kb_a8f93b2ff6_namespace_binds_pkey;
ALTER TABLE ONLY public.kb_a8f93b2ff6_literal_statements
DROP CONSTRAINT kb_a8f93b2ff6_literal_statements_pkey;
ALTER TABLE ONLY public.kb_a8f93b2ff6_asserted_statements
DROP CONSTRAINT kb_a8f93b2ff6_asserted_statements_pkey;
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
Thank you =)
Twitter: @vyruss
EDINA Labs blog – http://labs.edina.ac.uk
Hack – http://github.com/vyruss/rdflib-sqlalchemy
Dataset – Stay tuned

More Related Content

What's hot

JavaのテストGroovyでいいのではないかという話
JavaのテストGroovyでいいのではないかという話JavaのテストGroovyでいいのではないかという話
JavaのテストGroovyでいいのではないかという話disc99_
 
学位論文の書き方メモ (Tips for writing thesis)
学位論文の書き方メモ (Tips for writing thesis)学位論文の書き方メモ (Tips for writing thesis)
学位論文の書き方メモ (Tips for writing thesis)Nobuyuki Umetani
 
優れた問いを見つける(中京大学講演)
優れた問いを見つける(中京大学講演)優れた問いを見つける(中京大学講演)
優れた問いを見つける(中京大学講演)cvpaper. challenge
 
ストリーム処理におけるApache Avroの活用について(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019/09/05)
ストリーム処理におけるApache Avroの活用について(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019/09/05)ストリーム処理におけるApache Avroの活用について(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019/09/05)
ストリーム処理におけるApache Avroの活用について(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019/09/05)NTT DATA Technology & Innovation
 
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開Linked Open Data勉強会2020 前編:LODの基礎・作成・公開
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開KnowledgeGraph
 
オントロジー工学に基づくセマンティック技術(2)ナレッジグラフ入門
オントロジー工学に基づくセマンティック技術(2)ナレッジグラフ入門オントロジー工学に基づくセマンティック技術(2)ナレッジグラフ入門
オントロジー工学に基づくセマンティック技術(2)ナレッジグラフ入門Kouji Kozaki
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentationjexp
 
思考停止しないアーキテクチャ設計 ➖ JJUG CCC 2018 Fall
思考停止しないアーキテクチャ設計 ➖ JJUG CCC 2018 Fall思考停止しないアーキテクチャ設計 ➖ JJUG CCC 2018 Fall
思考停止しないアーキテクチャ設計 ➖ JJUG CCC 2018 FallYoshitaka Kawashima
 
Docker Desktop WSL2 Backendで捗るWindows PCのコンテナ開発環境
Docker Desktop WSL2 Backendで捗るWindows PCのコンテナ開発環境Docker Desktop WSL2 Backendで捗るWindows PCのコンテナ開発環境
Docker Desktop WSL2 Backendで捗るWindows PCのコンテナ開発環境Yuki Ando
 
Linked Open Data 作成支援ツールの紹介
Linked Open Data作成支援ツールの紹介Linked Open Data作成支援ツールの紹介
Linked Open Data 作成支援ツールの紹介uedayou
 
Rdf入門handout
Rdf入門handoutRdf入門handout
Rdf入門handoutSeiji Koide
 
異次元のグラフデータベースNeo4j
異次元のグラフデータベースNeo4j異次元のグラフデータベースNeo4j
異次元のグラフデータベースNeo4j昌桓 李
 
SDL2の紹介
SDL2の紹介SDL2の紹介
SDL2の紹介nyaocat
 
ナレッジグラフ/LOD利用技術の入門(後編)
ナレッジグラフ/LOD利用技術の入門(後編)ナレッジグラフ/LOD利用技術の入門(後編)
ナレッジグラフ/LOD利用技術の入門(後編)KnowledgeGraph
 
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5Takahiro YAMADA
 
Linked Open Data(LOD)の基本的な使い方
Linked Open Data(LOD)の基本的な使い方Linked Open Data(LOD)の基本的な使い方
Linked Open Data(LOD)の基本的な使い方Kouji Kozaki
 
オープンデータなんかでよく使うTurtle形式なんだけどXMLと違ってどう便利なの?
オープンデータなんかでよく使うTurtle形式なんだけどXMLと違ってどう便利なの?オープンデータなんかでよく使うTurtle形式なんだけどXMLと違ってどう便利なの?
オープンデータなんかでよく使うTurtle形式なんだけどXMLと違ってどう便利なの?Tomoya Shimaguchi
 
SQLアンチパターン~スパゲッティクエリ
SQLアンチパターン~スパゲッティクエリSQLアンチパターン~スパゲッティクエリ
SQLアンチパターン~スパゲッティクエリItabashi Masayuki
 

What's hot (20)

JavaのテストGroovyでいいのではないかという話
JavaのテストGroovyでいいのではないかという話JavaのテストGroovyでいいのではないかという話
JavaのテストGroovyでいいのではないかという話
 
学位論文の書き方メモ (Tips for writing thesis)
学位論文の書き方メモ (Tips for writing thesis)学位論文の書き方メモ (Tips for writing thesis)
学位論文の書き方メモ (Tips for writing thesis)
 
優れた問いを見つける(中京大学講演)
優れた問いを見つける(中京大学講演)優れた問いを見つける(中京大学講演)
優れた問いを見つける(中京大学講演)
 
ストリーム処理におけるApache Avroの活用について(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019/09/05)
ストリーム処理におけるApache Avroの活用について(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019/09/05)ストリーム処理におけるApache Avroの活用について(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019/09/05)
ストリーム処理におけるApache Avroの活用について(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019/09/05)
 
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開Linked Open Data勉強会2020 前編:LODの基礎・作成・公開
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開
 
オントロジー工学に基づくセマンティック技術(2)ナレッジグラフ入門
オントロジー工学に基づくセマンティック技術(2)ナレッジグラフ入門オントロジー工学に基づくセマンティック技術(2)ナレッジグラフ入門
オントロジー工学に基づくセマンティック技術(2)ナレッジグラフ入門
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
 
思考停止しないアーキテクチャ設計 ➖ JJUG CCC 2018 Fall
思考停止しないアーキテクチャ設計 ➖ JJUG CCC 2018 Fall思考停止しないアーキテクチャ設計 ➖ JJUG CCC 2018 Fall
思考停止しないアーキテクチャ設計 ➖ JJUG CCC 2018 Fall
 
Docker Desktop WSL2 Backendで捗るWindows PCのコンテナ開発環境
Docker Desktop WSL2 Backendで捗るWindows PCのコンテナ開発環境Docker Desktop WSL2 Backendで捗るWindows PCのコンテナ開発環境
Docker Desktop WSL2 Backendで捗るWindows PCのコンテナ開発環境
 
Linked Open Data 作成支援ツールの紹介
Linked Open Data作成支援ツールの紹介Linked Open Data作成支援ツールの紹介
Linked Open Data 作成支援ツールの紹介
 
Rdf入門handout
Rdf入門handoutRdf入門handout
Rdf入門handout
 
異次元のグラフデータベースNeo4j
異次元のグラフデータベースNeo4j異次元のグラフデータベースNeo4j
異次元のグラフデータベースNeo4j
 
SDL2の紹介
SDL2の紹介SDL2の紹介
SDL2の紹介
 
ナレッジグラフ/LOD利用技術の入門(後編)
ナレッジグラフ/LOD利用技術の入門(後編)ナレッジグラフ/LOD利用技術の入門(後編)
ナレッジグラフ/LOD利用技術の入門(後編)
 
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5
 
Linked Open Data(LOD)の基本的な使い方
Linked Open Data(LOD)の基本的な使い方Linked Open Data(LOD)の基本的な使い方
Linked Open Data(LOD)の基本的な使い方
 
オープンデータなんかでよく使うTurtle形式なんだけどXMLと違ってどう便利なの?
オープンデータなんかでよく使うTurtle形式なんだけどXMLと違ってどう便利なの?オープンデータなんかでよく使うTurtle形式なんだけどXMLと違ってどう便利なの?
オープンデータなんかでよく使うTurtle形式なんだけどXMLと違ってどう便利なの?
 
ヤフー社内でやってるMySQLチューニングセミナー大公開
ヤフー社内でやってるMySQLチューニングセミナー大公開ヤフー社内でやってるMySQLチューニングセミナー大公開
ヤフー社内でやってるMySQLチューニングセミナー大公開
 
PostgreSQL Query Cache - "pqc"
PostgreSQL Query Cache - "pqc"PostgreSQL Query Cache - "pqc"
PostgreSQL Query Cache - "pqc"
 
SQLアンチパターン~スパゲッティクエリ
SQLアンチパターン~スパゲッティクエリSQLアンチパターン~スパゲッティクエリ
SQLアンチパターン~スパゲッティクエリ
 

Similar to Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database

A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...Databricks
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...Holden Karau
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastHolden Karau
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesDatabricks
 
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Turi, Inc.
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2Dimitris Kontokostas
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)javier ramirez
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Djamel Zouaoui
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in SparkDatabricks
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules DamjiA Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules DamjiData Con LA
 
A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsDr. Neil Brittliff
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)wqchen
 
Apache Spark and DataStax Enablement
Apache Spark and DataStax EnablementApache Spark and DataStax Enablement
Apache Spark and DataStax EnablementVincent Poncet
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...Chetan Khatri
 
SparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on HadoopSparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on HadoopDataWorks Summit
 
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop seriesIntroducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop seriesHolden Karau
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25thSneha Challa
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018Holden Karau
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsDatabricks
 

Similar to Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database (20)

A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student Slides
 
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules DamjiA Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
 
A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your Analytics
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)
 
Apache Spark and DataStax Enablement
Apache Spark and DataStax EnablementApache Spark and DataStax Enablement
Apache Spark and DataStax Enablement
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
 
SparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on HadoopSparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on Hadoop
 
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop seriesIntroducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25th
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 

More from Jimmy Angelakos

Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Jimmy Angelakos
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Jimmy Angelakos
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresJimmy Angelakos
 
Changing your huge table's data types in production
Changing your huge table's data types in productionChanging your huge table's data types in production
Changing your huge table's data types in productionJimmy Angelakos
 
The State of (Full) Text Search in PostgreSQL 12
The State of (Full) Text Search in PostgreSQL 12The State of (Full) Text Search in PostgreSQL 12
The State of (Full) Text Search in PostgreSQL 12Jimmy Angelakos
 
Deploying PostgreSQL on Kubernetes
Deploying PostgreSQL on KubernetesDeploying PostgreSQL on Kubernetes
Deploying PostgreSQL on KubernetesJimmy Angelakos
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataJimmy Angelakos
 
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλονEισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλονJimmy Angelakos
 
PostgreSQL: Mέθοδοι για Data Replication
PostgreSQL: Mέθοδοι για Data ReplicationPostgreSQL: Mέθοδοι για Data Replication
PostgreSQL: Mέθοδοι για Data ReplicationJimmy Angelakos
 

More from Jimmy Angelakos (9)

Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with Postgres
 
Changing your huge table's data types in production
Changing your huge table's data types in productionChanging your huge table's data types in production
Changing your huge table's data types in production
 
The State of (Full) Text Search in PostgreSQL 12
The State of (Full) Text Search in PostgreSQL 12The State of (Full) Text Search in PostgreSQL 12
The State of (Full) Text Search in PostgreSQL 12
 
Deploying PostgreSQL on Kubernetes
Deploying PostgreSQL on KubernetesDeploying PostgreSQL on Kubernetes
Deploying PostgreSQL on Kubernetes
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic Data
 
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλονEισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
 
PostgreSQL: Mέθοδοι για Data Replication
PostgreSQL: Mέθοδοι για Data ReplicationPostgreSQL: Mέθοδοι για Data Replication
PostgreSQL: Mέθοδοι για Data Replication
 

Recently uploaded

EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 

Recently uploaded (20)

EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 

Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database

  • 1. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Jimmy Angelakos EDINA, University of Edinburgh FOSDEM 04-05/02/2017
  • 2. or how to export your data to someone who's expecting RDF Jimmy Angelakos EDINA, University of Edinburgh FOSDEM 04-05/02/2017
  • 3. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Semantic Web? RDF? ● Resource Description Framework – Designed to overcome the limitations of HTML – Make the Web machine readable – Metadata data model – Multigraph (Labelled, Directed) – Triples (Subject – Predicate – Object)
  • 4. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database ● Information addressable via URIs ● <http://example.org/person/Mark_Twain> <http://example.org/relation/author> <http://example.org/books/Huckleberry_Finn> ● <http://edina.ac.uk/ns/item/74445709> <http://purl.org/dc/terms/title> "The power of words: A model of honesty and fairness" . ● Namespaces @prefix dc: <http://purl.org/dc/elements/1.1/> . RDF Triples
  • 5. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Triplestores ● Offer persistence to our RDF graph ● RDFLib extended by RDFLib-SQLAlchemy ● Use PostgreSQL as storage backend! ● Querying – SPARQL
  • 6. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database { + "DOI": "10.1007/11757344_1", + "URL": "http://dx.doi.org/10.1007/11757344_1", + "type": "book-chapter", + "score": 1.0, + "title": [ + "CrossRef Listing of Deleted DOIs" + ], + "member": "http://id.crossref.org/member/297", + "prefix": "http://id.crossref.org/prefix/10.1007", + "source": "CrossRef", + "created": { + "date-time": "2006-10-19T13:32:01Z", + "timestamp": 1161264721000, + "date-parts": [ + [ + 2006, + 10, + 19 + ] + ] + }, + "indexed": { + "date-time": "2015-12-24T00:59:48Z", + "timestamp": 1450918788746, + "date-parts": [ +
  • 7. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database import psycopg2 from rdflib import plugin, Graph, Literal, URIRef from rdflib.namespace import Namespace from rdflib.store import Store import rdflib_sqlalchemy EDINA = Namespace('http://edina.ac.uk/ns/') PRISM = Namespace('http://prismstandard.org/namespaces/basic/2.1/') rdflib_sqlalchemy.registerplugins() dburi = URIRef('postgresql+psycopg2://myuser:mypass@localhost/rdfgraph' ) ident = EDINA.rdfgraph store = plugin.get('SQLAlchemy', Store)(identifier=ident) gdb = Graph(store, ident) gdb.open(dburi, create=True) gdb.bind('edina', EDINA) gdb.bind('prism', PRISM) item = EDINA['item/' + str(1)] triples = [] triples += (item, RDF.type, EDINA.Item), triples += (item, PRISM.doi, Literal('10.1002/crossmark_policy'), gdb.addN(t + (gdb,) for t in triples) gdb.serialize(format='turtle')
  • 8. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database BIG DATA
  • 9. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Super size me! ● Loop over original database contents ● Create triples ● Add them to graph efficiently ● Serialise graph efficiently
  • 10. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database But but… ? ● Big data without Java? ● Graphs without Java? – Gremlin? BluePrints? TinkerPop? Jena? Asdasdfaf? XYZZY? ● Why not existing triplestores? “We are the only Graph DB that...” ● Python/Postgres run on desktop hardware ● Simplicity (few LOC and readable) ● Unoptimised potential for improvement→
  • 11. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database conn = psycopg2.connect(database='mydb', user='myuser') cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor) seqcur = conn.cursor() seqcur.execute(""" CREATE SEQUENCE IF NOT EXISTS item; """) conn.commit() cur = conn.cursor('serverside_cur', cursor_factory=psycopg2.extras.DictCursor) cur.itersize = 50000 cur.arraysize = 10000 cur.execute(""" SELECT data FROM mytable """) while True: recs = cur.fetchmany(10000) if not recs: break for r in recs: if 'DOI' in r['data'].keys(): seqcur.execute("SELECT nextval('item')") item = EDINA['item/' + str(seqcur.fetchone()[0])] triples += (item, RDF.type, EDINA.Item), triples += (item, PRISM.doi, Literal(r['data']['DOI'])),
  • 12. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database 1st challenge ● rdflib-sqlalchemy – No ORM, autocommit (!) – Creates SQL statements, executes one at a time – INSERT INTO … VALUES (…); INSERT INTO … VALUES (…); INSERT INTO … VALUES (…); – We want INSERT INTO … VALUES (…),(…), (…) – Creates lots of indexes which must be dropped
  • 13. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database 2nd challenge ● How to restart if interrupted? ● Solved with querying and caching.
  • 14. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database from rdflib.plugins.sparql import prepareQuery orgQ = prepareQuery(""" SELECT ?org ?pub WHERE { ?org a foaf:Organization . ?org rdfs:label ?pub . } """, initNs = { 'foaf': FOAF, 'rdfs': RDFS }) orgCache = {} for o in gdb.query(orgQ): orgCache[o[1].toPython()] = URIRef(o[0].toPython()) if 'publisher' in r['data'].keys(): publisherFound = False if r['data']['publisher'] in orgCache.keys(): publisherFound = True triples += (item, DCTERMS.publisher, orgCache[r['data'] ['publisher']]), if not publisherFound: seqcur.execute("SELECT nextval('org')") org = EDINA['org/' + str(seqcur.fetchone()[0])] orgCache[r['data']['publisher']] = org triples += (org, RDF.type, FOAF.Organization), triples += (org, RDFS.label, Literal(r['data'] ['publisher'])),
  • 15. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database 3rd challenge ● rdflib-sqlalchemy (yup… guessed it) – Selects whole graph into memory ● Server side cursor: res = connection.execution_options( stream_results=True).execute(q) ● Batching: while True: result = res.fetchmany(1000) … yield … – Inexplicably caches everything read in RAM!
  • 16. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database 4th challenge ● Serialise efficiently! Multiprocessing→ – Processes, JoinableQueues ● Turtle: Unsuitable N-Triples→ – UNIX magic python3 rdf2nt.py | split -a4 -d -C4G --additional-suffix=.nt --filter='gzip > $FILE.gz' - exported/rdfgraph_
  • 17. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Desktop hardware...
  • 18. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database 5th challenge ● Serialisation outran HDD! ● Waits for: – JoinableQueues to empty – sys.stdout.flush()
  • 19. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database rdfgraph=> dt List of relations Schema | Name | Type | Owner --------+-----------------------------------+-------+---------- public | kb_a8f93b2ff6_asserted_statements | table | myuser public | kb_a8f93b2ff6_literal_statements | table | myuser public | kb_a8f93b2ff6_namespace_binds | table | myuser public | kb_a8f93b2ff6_quoted_statements | table | myuser public | kb_a8f93b2ff6_type_statements | table | myuser (5 rows) rdfgraph=> x Expanded display is on. rdfgraph=> select * from kb_a8f93b2ff6_asserted_statements limit 1; -[ RECORD 1 ]----------------------------------------------- id | 62516955 subject | http://edina.ac.uk/ns/creation/12993043 predicate | http://www.w3.org/ns/prov#wasAssociatedWith object | http://edina.ac.uk/ns/agent/12887967 context | http://edina.ac.uk/ns/rdfgraph termcomb | 0 Time: 0.531 ms rdfgraph=>
  • 20. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database More caveats! ● Make sure you are not entering literals in a URI field. ● Also make sure your URIs are valid (amazingly some DOIs fail when urlencoded) ● rdflib-sqlalchemy unoptimized for FTS – Your (BTree) indices will be YUGE. – Don't use large records (e.g. 10+ MB cnt:bytes) ● you need to drop index; insert; create index – pg_dump is your friend – Massive maintenance work memory (Postgres)
  • 21. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database DROP INDEX public.kb_a8f93b2ff6_uri_index; DROP INDEX public.kb_a8f93b2ff6_type_mkc_key; DROP INDEX public.kb_a8f93b2ff6_quoted_spoc_key; DROP INDEX public.kb_a8f93b2ff6_member_index; DROP INDEX public.kb_a8f93b2ff6_literal_spoc_key; DROP INDEX public.kb_a8f93b2ff6_klass_index; DROP INDEX public.kb_a8f93b2ff6_c_index; DROP INDEX public.kb_a8f93b2ff6_asserted_spoc_key; DROP INDEX public."kb_a8f93b2ff6_T_termComb_index"; DROP INDEX public."kb_a8f93b2ff6_Q_termComb_index"; DROP INDEX public."kb_a8f93b2ff6_Q_s_index"; DROP INDEX public."kb_a8f93b2ff6_Q_p_index"; DROP INDEX public."kb_a8f93b2ff6_Q_o_index"; DROP INDEX public."kb_a8f93b2ff6_Q_c_index"; DROP INDEX public."kb_a8f93b2ff6_L_termComb_index"; DROP INDEX public."kb_a8f93b2ff6_L_s_index"; DROP INDEX public."kb_a8f93b2ff6_L_p_index"; DROP INDEX public."kb_a8f93b2ff6_L_c_index"; DROP INDEX public."kb_a8f93b2ff6_A_termComb_index"; DROP INDEX public."kb_a8f93b2ff6_A_s_index"; DROP INDEX public."kb_a8f93b2ff6_A_p_index"; DROP INDEX public."kb_a8f93b2ff6_A_o_index"; DROP INDEX public."kb_a8f93b2ff6_A_c_index"; ALTER TABLE ONLY public.kb_a8f93b2ff6_type_statements DROP CONSTRAINT kb_a8f93b2ff6_type_statements_pkey; ALTER TABLE ONLY public.kb_a8f93b2ff6_quoted_statements DROP CONSTRAINT kb_a8f93b2ff6_quoted_statements_pkey; ALTER TABLE ONLY public.kb_a8f93b2ff6_namespace_binds DROP CONSTRAINT kb_a8f93b2ff6_namespace_binds_pkey; ALTER TABLE ONLY public.kb_a8f93b2ff6_literal_statements DROP CONSTRAINT kb_a8f93b2ff6_literal_statements_pkey; ALTER TABLE ONLY public.kb_a8f93b2ff6_asserted_statements DROP CONSTRAINT kb_a8f93b2ff6_asserted_statements_pkey;
  • 22. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Thank you =) Twitter: @vyruss EDINA Labs blog – http://labs.edina.ac.uk Hack – http://github.com/vyruss/rdflib-sqlalchemy Dataset – Stay tuned