SC2 Workshop 2: CELLAR: The Publications Office's Semantic Repository
1. CELLAR: The Publications Office's Semantic Repository
Marc Wilhelm Küster
Publications Office of the EU
European Policy Perspectives on Data-intensive Agriculture & Food
Brussels, 30 September 2016
2. What goes into the CELLAR?
Contractors
…
Reception → Validation → Conversion
IMMC
CELLAR
METS
ELI: European Legislation Identifier
IMMC: Standardized XML transmission envelope
METS: Metadata Encoding and Transmission Standard
3. How are things structured in the CELLAR?
Ontologies / Common Data Model
Instance Data
Control Data
Thesauri / authority tables
…
WORK
<Directive> e.g. 32006L0121
Expression
FR: Directive 2006/121/CE du Parlement européen et du Conseil du 18 décembre 2006 […]
Expression
EN: Directive 2006/121/EC of the European Parliament and of the Council of 18 December 2006 amending Council Directive 67/548/EEC […]
Expression
EL: Οδηγία 2006/121/ΕΚ του Ευρωπαϊκού Κοινοβουλίου και του Συμβουλίου, της 18ης Δεκεμβρίου 2006, για την τροποποίηση της οδηγίας 67/548/ΕΟΚ […]
Manifestation: PDF
Manifestation: xhtml
Manifestation: PDF
Manifestation: xhtml
Manifestation: PDF
Manifestation: xhtml
SUBJECT
002897: rapprochement des législations (approximation of laws)
AGENT
PE: European Parliament
CONSIL: Council
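The Work / Expression / Manifestation layering on this slide can be sketched as a small data structure. This is only an illustration of the FRBR idea, not the CELLAR Common Data Model: the class names, field names, and the `build_directive_example` helper are hypothetical, and the titles are truncated as on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """One concrete format of an expression (PDF or xhtml on the slide)."""
    file_format: str


@dataclass
class Expression:
    """One language version of a work."""
    language: str
    title: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The language-independent document, linked to subjects and agents."""
    identifier: str
    work_type: str
    subjects: List[str] = field(default_factory=list)
    agents: List[str] = field(default_factory=list)
    expressions: List[Expression] = field(default_factory=list)


def build_directive_example() -> Work:
    """Rebuild the slide's example (hypothetical helper, illustrative only)."""
    titles = {
        "FR": "Directive 2006/121/CE du Parlement européen et du Conseil [...]",
        "EN": "Directive 2006/121/EC of the European Parliament and of the Council [...]",
        "EL": "Οδηγία 2006/121/ΕΚ του Ευρωπαϊκού Κοινοβουλίου και του Συμβουλίου [...]",
    }
    # Each language version is available in the same two formats.
    expressions = [
        Expression(lang, title, [Manifestation("PDF"), Manifestation("xhtml")])
        for lang, title in titles.items()
    ]
    return Work(
        identifier="32006L0121",
        work_type="Directive",
        subjects=["002897"],
        agents=["PE", "CONSIL"],
        expressions=expressions,
    )


work = build_directive_example()
for expression in work.expressions:
    formats = ", ".join(m.file_format for m in expression.manifestations)
    print(f"{work.identifier} / {expression.language}: {formats}")
```

One work thus fans out into three expressions and six manifestations, while subject and agent links are attached once, at the work level.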
4. How can you retrieve data from CELLAR?
SPARQL
Direct access / RESTful WS
Notification / RSS
EUR-Lex
OP Portal
Internet
http://publications.europa.eu/webapi/rdf/sparql
http://publications.europa.eu/resource/...
Dublin Core (core metadata)
Linked Open Data (LOD)
Web-friendly ("RESTful") Interface
Resource Description Framework (RDF)
Standard Query Language (SPARQL)
FRBR model
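As a rough sketch of how the SPARQL endpoint named on this slide might be called, the snippet below builds a GET request as defined by the SPARQL Protocol (the query travels in the `query` parameter). The query only lists the properties of one resource, so it assumes nothing about the ontology. The function names are hypothetical, and the example resource URI assumes that `celex` is a valid identifier namespace, which the slides do not state.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given on the slide.
SPARQL_ENDPOINT = "http://publications.europa.eu/webapi/rdf/sparql"


def describe_query(resource_uri: str, limit: int = 10) -> str:
    """Return a query listing up to `limit` property/value pairs of a resource."""
    return f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o }} LIMIT {limit}"


def build_request_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query: str) -> dict:
    """Send the query and parse the JSON result set (needs network access)."""
    request = urllib.request.Request(
        build_request_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Assumed example URI: "celex" as namespace is an illustration, not from the slides.
example_uri = "http://publications.europa.eu/resource/celex/32006L0121"
example_query = describe_query(example_uri, limit=5)
print(build_request_url(example_query))
```

`run_query` is deliberately not called here, so the sketch runs offline; calling it would return the standard SPARQL JSON results structure if the endpoint accepts the request.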
URIs: http://publications.europa.eu/resource/{ps-id}/{obj-id}
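The URI pattern can be filled in with a small helper. This is a minimal sketch: `resource_uri` is a hypothetical function, and using `celex` as the `{ps-id}` value is an assumption for illustration; only the pattern itself and the identifier 32006L0121 come from the slides.

```python
RESOURCE_BASE = "http://publications.europa.eu/resource"  # from the slide


def resource_uri(ps_id: str, obj_id: str) -> str:
    """Fill the {ps-id}/{obj-id} pattern; both parts must be non-empty."""
    if not ps_id or not obj_id:
        raise ValueError("ps_id and obj_id must both be non-empty")
    return f"{RESOURCE_BASE}/{ps_id}/{obj_id}"


# "celex" is an assumed ps-id value, used here only as an example.
print(resource_uri("celex", "32006L0121"))
```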
8. How much is CELLAR used?
• 8 million requests served per day on average, with peaks above 20 million
• > 100,000 SPARQL queries per day
• > 1 million different resources in > 10 million linguistic versions and > 28 million items
• > 230 million persistent identifiers
• > 1,500 million triples in the Oracle RDF store
• Ca. 5,000 resources processed each day (most in 23 languages)
• Sizes:
  • 4 TB Oracle DB (compressed)
  • Content (in Fedora repository) > 17.5 TB
  • 120 million files in Fedora
State: 2016-09
[Charts: Requests from the internet per country (2016-09); Requests per day (2016-09); SPARQL requests per day (2016-09)]
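To put the daily figures into perspective, they can be converted into per-second rates. This is back-of-the-envelope arithmetic that assumes the load is spread evenly over the day, which real traffic is not; the SPARQL figure is a lower bound on the slide, so the derived values are approximate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# Figures from the slide (state: 2016-09).
avg_requests_per_day = 8_000_000
peak_requests_per_day = 20_000_000
sparql_queries_per_day = 100_000  # lower bound

avg_rate = per_second(avg_requests_per_day)
peak_rate = per_second(peak_requests_per_day)
sparql_rate = per_second(sparql_queries_per_day)
sparql_share = sparql_queries_per_day / avg_requests_per_day

print(f"average: {avg_rate:.1f} requests/s")
print(f"peak day: {peak_rate:.1f} requests/s")
print(f"SPARQL: {sparql_rate:.2f} queries/s ({sparql_share:.2%} of average requests)")
```

On these numbers, SPARQL accounts for only a small fraction of the overall request volume.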
9. Attributions for reused images:
Wine CELLAR: https://flic.kr/p/pkG1QS
Photo of OWL: https://flic.kr/p/6AMV1C
http://gephi.github.io/features/
Network: https://en.wikipedia.org/wiki/Network_theory#/media/File:Internet_map_1024.jpg
https://openclipart.org/detail/169750/fileiconpdf
https://openclipart.org/detail/169753/fileiconxml
https://openclipart.org/detail/169751/fileiconhtml