Produce and Consume Linked Data with Drupal!

Digital Enterprise Research Institute www.deri.ie

Stéphane Corlosquet, Renaud Delbru, Tim Clark,
Axel Polleres and Stefan Decker
ISWC 2009

scorlosquet@gmail.com
DERI NUI Galway, MGH
October 27th, 2009
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Loads of Data on the Web in CMS...


Some Motivations...

 Status of the current web
 Data contained in millions of documents
 Disparate platforms and systems
 Wide range of topics (personal blogs, news, etc.)
 Various types of resources (text, pictures, video, etc.)
 Note: Lots of Structured data in Content Management Systems

 Problem
 Not possible to reuse this data outside the CMS (except via RSS)
 Not available in a unified machine-readable format


So, here’s our idea of CMS:

[Figure labels: PROJECT BLOGS, DBLP, REMOTE DRUPAL SITE, SPARQL endpoint, Tim. Query shown in the figure:]

SELECT ?name ?title
WHERE {
  ?person foaf:made ?pub.
  ?person rdfs:label ?name.
  ?pub dc:title ?title.
  FILTER regex(?title, "knowledge", "i")
}

Figure 3.5: Extended example in a typical Linked Data eco-system.

Approach

 Our Goal
 Integrate "any" CMS site to the Web of Data

 A challenging task
 Little incentive for users to annotate their data manually
 Site owners do not have the resources to convert their data to RDF
 Per-site schema: each site is different and its structure cannot be predefined

 Solutions
 Expose the CMS site structure in a unified format AUTOMATICALLY!
 Use Semantic Web standards (RDFa, SPARQL)


Approach

 Implementation in Drupal
 Why? One of the most popular CMS out there
 Modules to take the burden off the site users

 What our modules allow:
 1. Automatic site vocabulary generation
 2. Mapping Content Models to existing ontologies
 3. Data endpoint for SPARQL querying
 4. Lazy loading of external data (data import)


Pre-Existing work

 “Semantic Content Management Systems”

 Ontology-based CMS:
– Semantic community Web portals (2000)
– OntoWebber: Model-Driven Ontology-Based Web Site Management
(2001)

 Our approach is reverse: from existing CMS structure to
ontologies


The Drupal CMS

 Drupal*
 Easy to use
 Large community
 Popular on the Web
 Hundreds of thousands of sites
 Modular design
 Drupal site workflow
 Site administrator: sets up the site and installs the modules they like/need
 Site editors: create the content of the site following the schema defined by the site administrator

* http://drupal.org/


Drupal: Content Construction Kit

 Content Construction Kit (CCK) module
 GUI for extending the internal schema of a Drupal site
 Used on many Drupal sites
 Can build new types of pages, known as content types
 Can create fields for each content type. Fields can be of various types: plain text fields, dates, email addresses, file uploads, references to other pages



 Demo use case: project blogs site*
 Community site
 Various content:
– People
– Organizations
– Projects
– Blogs

[Figure 3.5: Extended example in a typical Linked Data eco-system.]

[...] one for bridging the DBLP SPARQL endpoint to the project blogs website, and a second for bridging the Science Collaboration Framework website. When visiting Tim’s profile page, the relevant publication information will be fetched from both DBLP and SCF websites, and either new nodes will be created on the site or older ones will be updated if necessary.

* http://drupal.deri.ie/projectblogs/

Drupal: Content Construction Kit

 CCK User Interface

The fields form for the Person content type is displayed in Figure 2.11. This form makes it easy to reorder the fields by drag and drop, add new fields, remove existing fields or access the configuration form for a field.

Figure 2.12: Defining constraints on the gender field in Drupal’s CCK.

Figures 2.9, 2.10, 2.11 and 2.12 show the typical look and feel of a Drupal page and administrative interface for the Person content type, without our extensions installed. This content type offers fields such as name, homepage, email, colleagues, blog url, current project, past projects, publications, contributions.

Figure 2.9: User profile page built with Drupal’s CCK.

An example node (page) of the type Person is depicted in Figure 2.9.

What do we add?

1, 2


1. Site Vocabulary

 Automatic site vocabulary in RDFS/OWL from CCK
 Describes the content types and fields
 Content type <=> RDF class
 Field <=> RDF property
 RDFa output on site
 http://siteurl/ns#
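
The content type/field correspondence above can be sketched in a few lines of Python. This is only an illustration, not the RDF CCK module's code: the `CONTENT_TYPES` dictionary, the field names and the use of `rdfs:domain` are assumptions made for the example; only the `http://siteurl/ns#` namespace pattern and the class/property correspondence come from the slide.

```python
# Hypothetical, simplified picture of a CCK schema: content type -> field names.
CONTENT_TYPES = {
    "Person": ["name", "homepage", "email"],
    "Project": ["title", "members"],
}


def site_vocabulary(site_url, content_types):
    """Return a Turtle document with one class per content type
    and one property per field (rdfs:domain is an assumption)."""
    lines = [
        f"@prefix site: <{site_url}/ns#> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for type_name, fields in content_types.items():
        # Content type <=> RDF class
        lines.append(f"site:{type_name} a rdfs:Class .")
        for field in fields:
            # Field <=> RDF property
            lines.append(
                f"site:{field} a rdf:Property ; rdfs:domain site:{type_name} ."
            )
    return "\n".join(lines)


vocabulary = site_vocabulary("http://siteurl", CONTENT_TYPES)
print(vocabulary)
```

Because the vocabulary is derived from the schema rather than written by hand, it can be regenerated whenever a content type or field changes.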


1. Site Vocabulary

 Automatic site vocabulary in RDFS/OWL
 Field constraints
 Example with cardinalities:
– the name of a Person is required
– max. 5 projects per person
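
One way the two cardinality examples above could be written down is as OWL restrictions. The sketch below is an assumption about the shape of the output, not the module's actual serialization; the helper `cardinality_axioms` and the field name `projects` are hypothetical.

```python
def cardinality_axioms(class_name, field, min_card=None, max_card=None):
    """Return Turtle statements constraining `field` on `class_name`
    with OWL cardinality restrictions (illustrative output format)."""
    axioms = []
    bounds = (("owl:minCardinality", min_card), ("owl:maxCardinality", max_card))
    for keyword, value in bounds:
        if value is not None:
            axioms.append(
                f"site:{class_name} rdfs:subClassOf [ a owl:Restriction ; "
                f"owl:onProperty site:{field} ; "
                f'{keyword} "{value}"^^xsd:nonNegativeInteger ] .'
            )
    return axioms


# "The name of a Person is required" and "max. 5 projects per person".
axioms = cardinality_axioms("Person", "name", min_card=1)
axioms += cardinality_axioms("Person", "projects", max_card=5)
for axiom in axioms:
    print(axiom)
```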


2. Mapping Content Models to existing ontologies

 Mapping Content Models to Existing Ontologies
 Import of any vocabulary published online
 External ontology search service
 Local terms are subclasses/subproperties of public terms
 Ensure “safe” vocabulary re-use:
– only subclassing/subproperty avoids “redefinition”
– adding cardinalities might introduce inconsistencies: still possible to avoid in the user interface

Search examples are shown in Figure 3.2. Details on improving the search algorithm can be found in [45].

3.2.3 Mapping process
The terms suggested by both the import service and the ontology search service can be mapped to each content type and its fields. For content types, one chooses among the classes of the imported ontologies and, for fields, among the properties. The local terms are linked to the mapped terms with rdfs:subClassOf and rdfs:subPropertyOf statements in the site vocabulary, e.g. site:Person rdfs:subClassOf foaf:Person; wherever a mapping is defined, extra triples are exposed in the RDFa of the page.

Additionally, we allow inverse reuse of existing properties. E.g., an administrator imports a vocabulary ex: that defines a relation between regions/countries and the goods that this region/country produces via the property ex:produces. The user interface also allows fields to be related to the inverse of imported properties. For instance, the origin field could be related to ex:produces in such an inverse manner, resulting in
site:origin rdfs:subPropertyOf [...]
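
The mapping step described above amounts to emitting one rdfs:subClassOf or rdfs:subPropertyOf statement per mapped term. A minimal sketch follows, assuming mappings are held in plain dictionaries; the `site:Person`/`foaf:Person` pair comes from the slide, while the field mappings and the helper name `mapping_triples` are hypothetical.

```python
# Mappings chosen by the site administrator (local term -> public term).
CLASS_MAPPINGS = {"Person": "foaf:Person"}
# Hypothetical field mappings, for illustration only.
PROPERTY_MAPPINGS = {"name": "foaf:name", "homepage": "foaf:homepage"}


def mapping_triples(class_mappings, property_mappings):
    """Link local terms to public terms by subclassing/subproperty only,
    so the public terms themselves are never redefined."""
    triples = [
        f"site:{local} rdfs:subClassOf {public} ."
        for local, public in class_mappings.items()
    ]
    triples += [
        f"site:{local} rdfs:subPropertyOf {public} ."
        for local, public in property_mappings.items()
    ]
    return triples


triples = mapping_triples(CLASS_MAPPINGS, PROPERTY_MAPPINGS)
for triple in triples:
    print(triple)
```

Restricting the output to these two predicates is what keeps the re-use "safe": nothing is asserted about the public terms beyond the fact that local terms specialize them.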

2. Mapping Content Models to existing ontologies

 RDF mappings page

Figure 3.2: RDF mappings management through the Drupal interface.

2. Mapping Content Models to existing ontologies

 RDF mappings page

Figure 3.2: RDF mappings management through the Drupal interface: RDF class mappings.

What do we add?

1, 2
3


3. Data endpoint for complex querying

 Local RDF data exposed in a SPARQL endpoint
 Enables interoperability across sites
 Built on the PHP ARC2 library
 All RDF data indexed in the endpoint
 Each page stored as a graph and kept up to date

Figure 3.6: A list of SPARQL results (left) and an RDF SPARQL Proxy profile form
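
To make the endpoint idea concrete, here is a hedged sketch of how a client might address such an endpoint using the standard SPARQL protocol `query` parameter. The endpoint URL is a placeholder, the `dc:` namespace is assumed to be Dublin Core Elements 1.1, and no request is actually sent; the query itself is the one shown in Figure 3.5.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint URL (an assumption, not a real site).
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?name ?title
WHERE {
  ?person foaf:made ?pub .
  ?person rdfs:label ?name .
  ?pub dc:title ?title .
  FILTER regex(?title, "knowledge", "i")
}
"""


def build_query_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL (SPARQL protocol)."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```

The same URL shape works against any endpoint that follows the SPARQL protocol, which is what makes queries across sites possible.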


What do we add?

4

1, 2
3


4. Lazy loading of external data

 Lazy loading (caching) of distant RDF resources
 Enables interoperability across sites
 Built on the PHP ARC2 library
 CONSTRUCT query to map distant schema to local schema

Figure 3.6: A list of SPARQL results (left) and an RDF SPARQL Proxy profile form
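
The lazy-loading idea can be sketched as a small cache: a resource is fetched only when it is first requested, through a CONSTRUCT query that rewrites the distant schema into the local one, and is refetched once the cached copy is too old. Everything here is an assumption for illustration (the class `LazyRdfCache`, the `foaf:name` to `site:name` mapping, the expiry policy and the stand-in fetch function); it is not the RDF Proxy module's code.

```python
import time

# CONSTRUCT template mapping a distant property to a local one (illustrative).
CONSTRUCT_TEMPLATE = """
PREFIX site: <http://siteurl/ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <%(uri)s> site:name ?name }
WHERE { <%(uri)s> foaf:name ?name }
"""


class LazyRdfCache:
    """Fetch a distant resource on first access and reuse it until it expires."""

    def __init__(self, fetch, max_age_seconds=3600):
        self._fetch = fetch            # callable: query string -> RDF data
        self._max_age = max_age_seconds
        self._entries = {}             # uri -> (fetch time, data)

    def get(self, uri, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(uri)
        if entry is None or now - entry[0] > self._max_age:
            query = CONSTRUCT_TEMPLATE % {"uri": uri}
            entry = (now, self._fetch(query))
            self._entries[uri] = entry
        return entry[1]


# Stand-in for a remote SPARQL endpoint: it only records the queries it receives.
calls = []


def fake_fetch(query):
    calls.append(query)
    return "triples from fetch #%d" % len(calls)


cache = LazyRdfCache(fake_fetch, max_age_seconds=60)
tim = "http://example.org/person/tim"      # hypothetical resource URI
first = cache.get(tim, now=0)              # first visit: fetched remotely
second = cache.get(tim, now=30)            # still fresh: served from the cache
third = cache.get(tim, now=120)            # expired: fetched again
print(len(calls))                          # 2
```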

4. Lazy loading of external data

 Lazy loading of distant RDF resources



Where is it used?


Science Collaboration Framework

 Web application toolkit based on Drupal
 Enables online scientific collaboration
– publishing, annotating, sharing and discussing any content
– articles, papers, reviews, perspectives, interviews, news, biographies
– profile information on community members
 Targets biomedicine communities, but generic in essence

 Networked sites producing Linked Data


SCF collaborating sites

 Stembook (Stem Cell articles and reviews)
– http://www.stembook.org/


SCF collaborating sites

 Michael J. Fox Foundation (Parkinson's disease)
– http://www.pdonlineresearch.org/



Conclusion


Conclusion

 The structure of CMS sites contains valuable schema information
 Our suggested “workflow”:
 site vocabulary from the local structure (RDF CCK)
 enables out-of-the-box RDF export: expose your Drupal site
to the Web of Data without any additional effort from site
admin or content editors (RDF CCK)
 mapping to existing RDF vocabularies improves integration in
the LOD cloud (evoc)
 SPARQL endpoint
 Lazy loading of RDF resources (RDF Proxy)


Conclusion

 Drupal 6 modules available for download
– http://drupal.org/project/rdfcck
– http://drupal.org/project/evoc
– http://drupal.org/project/sparql_ep
– http://drupal.org/project/rdfproxy
 Online prototype
– http://drupal.deri.ie/projectblogs/


Good news from Drupal 7:

 RDF mapping feature committed to Drupal 7 core
 RDFa output by default (blogs, forums, comments, etc.)
using FOAF, SIOC, DC, SKOS.
 Download development snapshot
– http://ftp.drupal.org/files/projects/drupal-7.x-dev.tar.gz
 Currently more than 200,000* sites on Drupal 6
 waiting to make the switch to Drupal 7
 waiting to massively increase the amount of RDF data
on the Web
 Discussion
 http://groups.drupal.org/semantic-web

* http://drupal.org/project/usage/drupal
