Produce and Consume Linked Data with Drupal!

A large number of Web sites are currently driven by Content Management Systems (CMS), which manage textual and multimedia content but also, inherently, carry valuable information about a site's structure and content model. Exposing this structured information to the Web of Data has so far required considerable expertise in RDF and OWL modelling, as well as additional programming effort. In this paper we tackle one of the most popular CMSs: Drupal. We enable site administrators to export their site content model and data to the Web of Data without requiring extensive knowledge of Semantic Web technologies. Our modules create RDFa annotations and, optionally, a SPARQL endpoint for any Drupal site out of the box. Likewise, we add the means to map the site data to existing ontologies on the Web, with a search interface to find commonly used ontology terms. We also allow a Drupal site administrator to include existing RDF data from remote SPARQL endpoints on the Web in the site. Brought together, these features allow networked RDF Drupal sites that reuse and enrich Linked Data. Finally, we discuss the adoption of our modules and report on a use case in the biomedical field and the current status of its deployment.
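
The export of a site content model described above can be illustrated with a small sketch. This is not the modules' actual code (they are Drupal modules, not Python); it only shows the idea stated in the slides that a content type corresponds to an RDF class and a field to an RDF property, with an optional `rdfs:subClassOf` link to an external term. The names `ContentType` and `site_vocabulary`, the example fields, and the `rdfs:domain` triple are assumptions made for illustration, and `http://siteurl/ns#` is the placeholder namespace used in the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
SITE = "http://siteurl/ns#"  # placeholder site namespace, as on the slides

Triple = Tuple[str, str, str]


@dataclass
class ContentType:
    """Hypothetical stand-in for a CCK content type and its fields."""
    name: str
    fields: List[str]
    mapped_class: Optional[str] = None  # external class chosen by the admin


def site_vocabulary(ct: ContentType) -> List[Triple]:
    """Derive vocabulary triples: content type -> class, field -> property."""
    cls = SITE + ct.name
    triples: List[Triple] = [(cls, RDF + "type", RDFS + "Class")]
    if ct.mapped_class is not None:
        # Local terms are linked to public terms by subclassing only.
        triples.append((cls, RDFS + "subClassOf", ct.mapped_class))
    for field_name in ct.fields:
        prop = SITE + field_name
        triples.append((prop, RDF + "type", RDF + "Property"))
        # Assumption: the field's property is scoped to its content type.
        triples.append((prop, RDFS + "domain", cls))
    return triples


person = ContentType("Person", ["name", "homepage"], FOAF + "Person")
triples = site_vocabulary(person)
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")  # one N-Triples statement per line
```

Keeping the external mapping as a separate `rdfs:subClassOf` triple mirrors the "safe re-use" point in the slides: the public term is referenced but never redefined.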

Published in: Technology, Education

  1. Digital Enterprise Research Institute, www.deri.ie. Produce and Consume Linked Data with Drupal! Stéphane Corlosquet, Renaud Delbru, Tim Clark, Axel Polleres and Stefan Decker. ISWC 2009. scorlosquet@gmail.com. DERI NUI Galway, MGH. October 27th, 2009. Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
  2. Loads of Data on the Web in CMS...
  3. Some Motivations... Status of the current web: data contained in millions of documents; disparate platforms and systems; wide range of topics (personal blogs, news, etc.); various types of resources (text, pictures, video, etc.). Note: lots of structured data in Content Management Systems. Problem: this data cannot be reused outside the CMS (except via RSS), and it is not available in a unified machine-readable format.
  4. So, here's our idea of CMS: [Figure 3.5: Extended example in a typical Linked Data eco-system. Diagram labels: PROJECT BLOGS, DBLP SPARQL endpoint, REMOTE DRUPAL SITE SPARQL endpoint, Tim. Query shown: SELECT ?name ?title WHERE { ?person foaf:made ?pub. ?person rdfs:label ?name. ?pub dc:title ?title. FILTER regex(?title, "knowledge", "i") }]
  5. Approach. Our goal: integrate "any" CMS site into the Web of Data. A challenging task: there is little incentive for users to annotate their data manually; site owners do not have the resources to convert their data to RDF; per-site schema: each site is different and its structure cannot be predefined. Solutions: expose the CMS site structure in a unified format AUTOMATICALLY; use Semantic Web standards (RDFa, SPARQL).
  6. Approach. Implementation in Drupal. Why? It is one of the most popular CMSs out there, and modules take the burden off the site users. What our modules allow: (1) automatic site vocabulary generation; (2) mapping Content Models to existing ontologies; (3) data endpoint for SPARQL querying; (4) lazy loading of external data (data import).
  7. Pre-Existing work. "Semantic Content Management Systems": ontology-based CMS, e.g. Semantic community Web portals (2000) and OntoWebber: Model-Driven Ontology-Based Web Site Management (2001). Our approach is the reverse: from existing CMS structure to ontologies.
  8. The Drupal CMS. Drupal (http://drupal.org/): easy to use; large community; popular on the Web (hundreds of thousands of sites); modular design. Drupal site workflow: the site administrator sets up the site and installs the modules they like or need; site editors create the content of the site following the schema defined by the site administrator.
  9. Drupal: Content Construction Kit. Content Construction Kit (CCK) module: GUI for extending the internal schema of a Drupal site; used on many Drupal sites; can build new types of pages, known as content types; can create fields for each content type. Fields can be of various types: plain text fields, dates, email addresses, file uploads, references to other pages.
  10. Drupal: Content Construction Kit. Demo use case: project blogs site (http://drupal.deri.ie/projectblogs/). Community site with various content: People, Organizations, Projects, Blogs. [Figure 3.5: Extended example in a typical Linked Data eco-system. Accompanying text: "... one for bridging the DBLP SPARQL endpoint to the project blogs website, and a second for bridging the Science Collaboration Framework website. When visiting Tim's profile page, the relevant publication information will be fetched from both DBLP and SCF websites, and either new nodes will be created on the site or older ones will be updated if necessary."]
  11. Drupal: Content Construction Kit. CCK User Interface.
  12. Drupal: Content Construction Kit. CCK User Interface. Screenshot text: "The fields form for the Person content type is displayed on Figure 2.11. This form makes it easy to reorder the fields by a 'drag and drop' technique, add new fields, remove existing fields or access the configuration form for a field." Figure 2.12: Defining constraints on the gender field in Drupal's CCK.
  13. Drupal: Content Construction Kit. CCK User Interface. Screenshot text: "Figures 2.9, 2.10, 2.11 and 2.12 show the typical look and feel of a Drupal page and administrative interface for the Person content type, without our extensions installed. This content type offers fields such as name, homepage, email, colleagues, blog url, current project, past projects, publications, contributions." Figure 2.9: User profile page built with Drupal's CCK. "An example of a node (page) of the type Person is depicted in Figure 2.9 where all ..."
  14. What do we add? 1, 2
  15. 1. Site Vocabulary. Automatic site vocabulary in RDFS/OWL from CCK: describes the content types and fields; content type <=> RDF class; field <=> RDF property. RDFa output on site: http://siteurl/ns#
  16. 1. Site Vocabulary. Automatic site vocabulary in RDFS/OWL: field constraints. Example with cardinalities: the name of a Person is required; max. 5 projects per person.
  17. 2. Mapping Content Models to existing ontologies. Import of any vocabulary published online; external ontology search service; local terms are subclasses/subproperties of public terms, e.g. site:Person rdfs:subClassOf foaf:Person. Ensure "safe" vocabulary re-use: only subclassing/subproperty avoids "redefinition"; adding cardinalities might introduce inconsistencies (still, possible to avoid in the user interface). Background text (partly cut off in the slide): "Search examples are shown in Figure 3.2. Details on improving the ran... search algorithm can be found in [45]. 3.2.3 Mapping process. The terms suggested by both of the import service and the ontology search ... be mapped to each content type and their fields. For mapping content ty... choose among the classes of the imported ontologies and for fields, one ... among the properties. The local terms will be linked with rdfs:subClassOf ... rdfs:subPropertyOf statements ... to the mapped ... site vocabulary; wherever a mapping is defined, extra triples using the m... are exposed in the RDFa of the page. Additionally, we allow inverse reuse of existing properties. E.g., ass... administrator imports a vocabulary ex: that defines a relation between c... regions and goods that this region/country produces via the property ex:prod... user interface also allows to relate fields to the inverse of imported proper... stance, the origin field could be related to ex:produces in such an inve... resulting in site:origin rdfs:subPropertyOf ..."
  18. 2. Mapping Content Models to existing ontologies. RDF mappings page. Figure 3.2: RDF mappings management through the Drupal interface.
  19. 2. Mapping Content Models to existing ontologies. RDF mappings page. Figure 3.2 (caption fragment): RDF mappings management through the Drupal interface: RDF class map-
  20. What do we add? 1, 2, 3
  21. 3. Data endpoint for complex querying. Local RDF data exposed in a SPARQL endpoint: enables interoperability across sites; built on the PHP ARC2 library; all RDF data indexed in the endpoint; each page stored as a graph and kept up to date. Figure 3.6: A list of SPARQL results (left) and an RDF SPARQL Proxy profile form.
  22. 3. Data endpoint for complex querying. Local RDF data exposed in a SPARQL endpoint: enables interoperability across sites; built on the PHP ARC2 library; all RDF data indexed in the endpoint; each page stored as a graph and kept up to date.
  23. What do we add? 1, 2, 3, 4
  24. 4. Lazy loading of external data. Lazy loading (caching) of distant RDF resources: enables interoperability across sites; built on the PHP ARC2 library; CONSTRUCT query to map the distant schema to the local schema. Figure 3.6: A list of SPARQL results (left) and an RDF SPARQL Proxy profile form.
  25. 4. Lazy loading of external data. Lazy loading of distant RDF resources.
  26. Where is it used?
  27. Science Collaboration Framework. Web application toolkit based on Drupal. Enables online scientific collaboration: publishing, annotating, sharing and discussing any content; articles, papers, reviews, perspectives, interviews, news, biographies; profile information on community members. Targets biomedicine communities, but is generic in essence. Networked sites producing Linked Data.
  28. SCF collaborating sites. Stembook (stem cell articles and reviews): http://www.stembook.org/
  29. SCF collaborating sites. Michael J. Fox Foundation (Parkinson's disease): http://www.pdonlineresearch.org/
  30. Conclusion
  31. Conclusion. The structure of CMS sites contains valuable schema information. Our suggested "workflow": site vocabulary from the local structure (RDF CCK), which enables out-of-the-box RDF export: expose your Drupal site to the Web of Data without any additional effort from the site admin or content editors (RDF CCK); mapping to existing RDF vocabularies improves integration in the LOD cloud (evoc); SPARQL endpoint; lazy loading of RDF resources (RDF Proxy).
  32. Conclusion. Drupal 6 modules available for download: http://drupal.org/project/rdfcck, http://drupal.org/project/evoc, http://drupal.org/project/sparql_ep, http://drupal.org/project/rdfproxy. Online prototype: http://drupal.deri.ie/projectblogs/
  33. Good news from Drupal 7: RDF mapping feature committed to Drupal 7 core; RDFa output by default (blogs, forums, comments, etc.) using FOAF, SIOC, DC, SKOS. Development snapshot available for download (http://ftp.drupal.org/files/projects/drupal-7.x-dev.tar.gz). Currently more than 200,000 sites run Drupal 6 (http://drupal.org/project/usage/drupal), waiting to make the switch to Drupal 7 and to massively increase the amount of RDF data on the Web. Discussion: http://groups.drupal.org/semantic-web
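
The lazy-loading slides mention a CONSTRUCT query that maps the distant schema to the local one. A minimal sketch of how such a query could be assembled from a property mapping is given below. It is an illustration only, not the RDF Proxy module's code: the function name `build_construct_query`, the local terms `site:title` and `site:author`, and the choice of plain (non-OPTIONAL) triple patterns are all assumptions. The sketch only builds the query text and does not contact any endpoint.

```python
from typing import Dict, List, Tuple


def build_construct_query(prefixes: Dict[str, str],
                          mapping: List[Tuple[str, str]]) -> str:
    """Build a SPARQL CONSTRUCT query that rewrites remote properties
    into local ones. `mapping` holds (remote_property, local_property)
    pairs written as prefixed names."""
    header = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    template: List[str] = []   # triples emitted in the local schema
    patterns: List[str] = []   # triples matched in the remote schema
    for i, (remote_prop, local_prop) in enumerate(mapping):
        var = f"?v{i}"
        template.append(f"  ?s {local_prop} {var} .")
        patterns.append(f"  ?s {remote_prop} {var} .")
    lines = header + ["CONSTRUCT {"] + template + ["}", "WHERE {"] + patterns + ["}"]
    return "\n".join(lines)


PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "site": "http://siteurl/ns#",  # placeholder site namespace from the slides
}
# Hypothetical mapping: remote Dublin Core terms -> local site fields.
MAPPING = [("dc:title", "site:title"), ("dc:creator", "site:author")]

query = build_construct_query(PREFIXES, MAPPING)
print(query)
```

Because every pattern here is required, a remote resource missing one of the mapped properties yields no triples at all; wrapping each pattern in OPTIONAL would be one way to relax that, at the cost of a more complex query.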
