Semantic Search on Heterogeneous Wiki Systems - wikisym2010


by Fabrizio Orlandi at WikiSym 2010, Gdansk, Poland

Published in: Technology, Education

Transcript of "Semantic Search on Heterogeneous Wiki Systems - wikisym2010"

  1. Semantic Search on Heterogeneous Wiki Systems
     Fabrizio Orlandi, Alexandre Passant (DERI, Galway)
     WikiSym 2010, Gdansk, 8th July 2010
     Digital Enterprise Research Institute, www.deri.ie
     © Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
  2. Interlinking wikis
     Wikis share a large body of common knowledge, spread across many different wiki platforms: TWiki, DokuWiki, MoinMoin.
     Wikis are also widely used in the workplace: Atlassian Confluence, Trac, XWiki.
     All have different, platform-dependent structures, and all are disconnected from one another.
  3. Many isolated communities of users and their data
     Wikis are also disconnected from other social media websites.
     (Source: Pidgin Technologies, www.pidgintech.com)
  4. Interlinking wikis
     We propose a new approach, based on Linked Data principles, to solve these issues and to enable semantic search across heterogeneous wiki systems.
  5. Wiki Models
     Several semantic models have been implemented and used within specific semantic wiki platforms (e.g. Semantic MediaWiki), and there have been efforts to create generic ontology models:
     • WikiOnt ontology (DERI)
     • WIF (Wiki Interchange Format) ontology (Völkel, Oren, 1st Workshop on Semantic Wikis, 2006)
     However, they are all specific to wikis and not open to other social websites.
  6. SIOC: Semantically-Interlinked Online Communities
     • A project developed by DERI to semantically describe the content and structure of community sites.
     • In particular, the SIOC ontology is not specific to wikis and is widely used on the Web.
     • It aims to create new connections between online discussion posts and items, forums, blogs... and wikis.
     • Adopted by more than 50 applications and deployed on over 400 sites, including Drupal 7 and Yahoo! SearchMonkey.
     http://sioc-project.org
  7. Extending the SIOC ontology
     We decided to extend the SIOC ontology so that it can describe wikis and make them interoperable with, and linkable to, other social objects.
     First, we considered the typical and relevant features of wikis in terms of structure and social interactions.
     Modeling these features using SIOC has other advantages:
     • Integration with existing SIOC data, as well as interlinking with other RDF data for advanced querying purposes;
     • Ability to run the same SPARQL query to find items on a particular wiki site, on a weblog, or on a forum.
  8. Relevant wiki features
     • Multi-authoring: multiple users edit the same content collaboratively.
     This feature is modeled with the class sioc:UserAccount (a subclass of foaf:OnlineAccount), which describes a user account in an online community site and is used as the object of sioc:has_creator.
     In this way, a foaf:Person can be linked to several sioc:UserAccount resources belonging to different wiki sites.
  10. Relevant wiki features
     • Categories: sets of articles on related topics, organized hierarchically.
     The SKOS vocabulary provides a solution, as it offers a way to model hierarchical structures between categories as instances of skos:Concept [Miles, Bechhofer, W3C Recommendation, 2009].
     Hence we defined the sioct:Category class as a subclass of skos:Concept.
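
The category hierarchy on this slide can be illustrated with a small Python sketch. It assumes, for simplicity, that each category has at most one broader category (SKOS itself allows several), and the category names and the helper `ancestors` are hypothetical, not part of the SIOC or SKOS vocabularies.

```python
# Hypothetical category hierarchy: each key is a category, each value is the
# category it points to via skos:broader (simplified to a single parent).
BROADER = {
    "cat:Punk_rock": "cat:Rock_music",
    "cat:Rock_music": "cat:Music",
}


def ancestors(category, broader):
    """Follow skos:broader links upward and return the chain of broader categories."""
    chain = []
    seen = {category}
    while category in broader and broader[category] not in seen:
        category = broader[category]
        seen.add(category)  # guards against cycles in malformed data
        chain.append(category)
    return chain


print(ancestors("cat:Punk_rock", BROADER))
```

A search for articles in `cat:Music` could use such a chain to also return articles filed under narrower categories.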
  11. Relevant wiki features
     • Social tagging: an unstructured but dynamic way of organizing content.
     The properties sioc:topic (using URIs) and dc:subject (using keywords) can be used to represent the tags related to a particular wiki page.
     Diagram: http://wiki.../The_Clash is linked by sioc:topic to http://wiki.../punk_rock and by dc:subject to the keyword "Punk rock"; the diagram also shows tag:hasTag.
  12. Relevant wiki features
     • Discussions: pages where people can discuss the subject of an article.
     We added a new sioc:has_discussion property, with domain sioc:Item and an open range (to make this property reusable).
  13. Relevant wiki features
     • Backlinks (or “what links here”): the internal wiki links that point to a given wiki article.
     We modeled this feature using the existing sioc:links_to property (a subproperty of dcterms:references).
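
Since backlinks are simply sioc:links_to statements read in the opposite direction, they can be derived by inverting the link pairs. The following Python sketch shows the idea on hypothetical page identifiers; the function name `backlinks` is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical (source, target) pairs, each standing for one
# "source sioc:links_to target" statement.
LINKS_TO = [
    ("wiki:Joe_Strummer", "wiki:The_Clash"),
    ("wiki:Punk_rock", "wiki:The_Clash"),
    ("wiki:The_Clash", "wiki:Punk_rock"),
]


def backlinks(links):
    """Invert links_to pairs: map each target page to the set of pages linking to it."""
    index = defaultdict(set)
    for source, target in links:
        index[target].add(source)
    return index


print(sorted(backlinks(LINKS_TO)["wiki:The_Clash"]))
```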
  14. Relevant wiki features
     • Page versioning: each page has an associated page history.
     To define an essential and lightweight model, we:
     • Added a sioc:latest_version property;
     • Added two transitive (OWL) properties: sioc:earlier_version and sioc:later_version;
     • Defined sioc:later_version as the inverse property of sioc:earlier_version;
     • Defined sioc:next_version and sioc:previous_version as subproperties of sioc:later_version and sioc:earlier_version, respectively.
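
To make the versioning properties concrete, here is a minimal Python sketch of what a reasoner could infer from them. It assumes only direct sioc:previous_version links are stored; the revision identifiers and both helper functions are hypothetical.

```python
# Hypothetical direct links: revision -> its immediate predecessor
# (one sioc:previous_version statement per entry).
PREVIOUS = {"page?v=3": "page?v=2", "page?v=2": "page?v=1"}


def earlier_versions(revision, previous):
    """Revisions reachable by repeatedly following previous_version.

    This is what the transitivity of sioc:earlier_version entails, given that
    previous_version is a subproperty of earlier_version.
    """
    result = []
    seen = {revision}
    while revision in previous and previous[revision] not in seen:
        revision = previous[revision]
        seen.add(revision)  # guards against cycles in malformed data
        result.append(revision)
    return result


def later_versions(revision, previous):
    """Inverse direction: revisions that have `revision` among their earlier versions."""
    return sorted(r for r in previous if revision in earlier_versions(r, previous))


print(earlier_versions("page?v=3", PREVIOUS))
print(later_versions("page?v=1", PREVIOUS))
```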
  15. SIOC-MediaWiki Exporter
     An exporter that exposes data from a popular wiki platform in RDF, using our proposed model.
     It is a web service, written in PHP, that exports a MediaWiki article in RDF. It is publicly available at: http://ws.sioc-project.org/mediawiki/
  17. Browsing the generated data
     RDF data extracted from a wiki page can be browsed with tools such as The Tabulator.
     To offer a better browsing experience and to ease the crawling of SIOC exports of MediaWiki instances, the web service automatically produces rdfs:seeAlso links between wiki pages, following the Linked Data principles.
     A link to the corresponding DBpedia resource is added automatically (with foaf:primaryTopic) if the article comes from the English Wikipedia.
     An RDF crawler can follow all the seeAlso links found in every document and continue to crawl, so it is possible to crawl an entire wiki site starting from a single URI.
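
The crawling strategy described on this slide amounts to a breadth-first traversal over rdfs:seeAlso links. The Python sketch below illustrates it under stated assumptions: the `SEE_ALSO` dictionary is a hypothetical stand-in for the Web, and `fetch_see_also` stands for whatever a real crawler would do to dereference a URI and extract the seeAlso targets from the returned RDF.

```python
from collections import deque

# Hypothetical stand-in for the Web: document URI -> rdfs:seeAlso targets found in it.
SEE_ALSO = {
    "http://wiki.example/A": ["http://wiki.example/B", "http://wiki.example/C"],
    "http://wiki.example/B": ["http://wiki.example/A"],
    "http://wiki.example/C": ["http://wiki.example/D"],
    "http://wiki.example/D": [],
}


def crawl(start, fetch_see_also):
    """Visit every document reachable from `start` via seeAlso links, each one once."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        visited.append(uri)
        for target in fetch_see_also(uri):
            if target not in seen:  # avoids re-crawling pages that link back
                seen.add(target)
                queue.append(target)
    return visited


print(crawl("http://wiki.example/A", lambda uri: SEE_ALSO.get(uri, [])))
```

Starting from the single URI of page A, the traversal reaches all four documents, even though B links back to A.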
  19. The DokuSIOC plugin
     • A plugin for DokuWiki that exports RDF data using popular lightweight ontologies (originally developed by M. Haschke, a SIOC contributor).
     • We modified and extended this plugin to make it compliant with our proposed model and to export all the required wiki features.
     • It takes information from the metadata stored in the wiki system about pages, users, links, etc. and provides it as raw RDF/XML serialized data (instead of the usual HTML page).
     • Developed in PHP and easy to install in any DokuWiki system.
     • It uses the SIOC PHP API.
  20. The DokuSIOC plugin
  21. Collecting Data
     To evaluate our proposal, we exported and crawled different MediaWiki and DokuWiki instances:
     • 5 wikis were crawled, collecting more than 1 GB of RDF data;
     • More than 3000 wiki articles and 700 users;
     • RDF data loaded in a triple store: Sesame + OWLIM.
     Using the SPARQL endpoint, it is possible to run advanced, cross-site queries on top of the collected data by combining FOAF and SIOC, e.g.:

       PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX foaf: <http://xmlns.com/foaf/0.1/>
       PREFIX sioc: <http://rdfs.org/sioc/ns#>

       SELECT DISTINCT ?content
       WHERE {
         <http://example.org/js#me> foaf:account ?account .
         ?account rdf:type sioc:UserAccount .
         ?content sioc:has_creator ?account .
       }
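
To show what the query on this slide matches, the Python sketch below evaluates its three triple patterns by hand over a few in-memory triples. The sample data, the prefixed names standing in for full URIs, and the function `content_created_by` are all hypothetical; a real deployment would send the SPARQL query to the triple store instead.

```python
# Hypothetical triples (subject, predicate, object) collected from two wikis.
TRIPLES = {
    ("ex:js#me", "foaf:account", "wikiA:user/js"),
    ("ex:js#me", "foaf:account", "wikiB:user/john"),
    ("wikiA:user/js", "rdf:type", "sioc:UserAccount"),
    ("wikiB:user/john", "rdf:type", "sioc:UserAccount"),
    ("wikiA:page/SIOC", "sioc:has_creator", "wikiA:user/js"),
    ("wikiB:page/FOAF", "sioc:has_creator", "wikiB:user/john"),
    ("wikiB:page/RDF", "sioc:has_creator", "wikiB:user/other"),
}


def content_created_by(triples, person):
    """Mirror the three patterns of the query: person -> accounts -> content."""
    # Pattern 1: <person> foaf:account ?account
    accounts = {o for s, p, o in triples if s == person and p == "foaf:account"}
    # Pattern 2: ?account rdf:type sioc:UserAccount
    accounts = {a for a in accounts if (a, "rdf:type", "sioc:UserAccount") in triples}
    # Pattern 3: ?content sioc:has_creator ?account (the set gives DISTINCT)
    return sorted({s for s, p, o in triples if p == "sioc:has_creator" and o in accounts})


print(content_created_by(TRIPLES, "ex:js#me"))
```

Because the person is linked to accounts on both wikis, the result contains pages from both sites, which is the cross-site behaviour the slide describes.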
  23. Building the application
     • The data acquisition module is a PHP script that:
       • queries the triple store;
       • collects and parses the results;
       • translates the data into the correct format (JSON) for the visualization layer.
     • The visualization layer was built with the Exhibit framework from the MIT SIMILE Project:
       • It is a set of JavaScript files configured directly in the HTML code of the page to display;
       • It provides faceted browsing capabilities.
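
The translation step of the data acquisition module can be sketched as follows. The original module is a PHP script whose code is not shown in the slides, so this Python version is only an illustration: it assumes the triple store returns results in the standard SPARQL JSON results format and that the visualization layer expects an Exhibit-style `{"items": [...]}` document with `label` and `type` fields. The function name and sample values are hypothetical.

```python
import json


def bindings_to_exhibit(sparql_json, label_var, item_type="Article"):
    """Flatten SPARQL JSON result bindings into an Exhibit-style items list."""
    items = []
    for row in sparql_json["results"]["bindings"]:
        # Each cell looks like {"type": "uri" | "literal", "value": "..."}.
        item = {var: cell["value"] for var, cell in row.items()}
        item["label"] = item.get(label_var, "")
        item["type"] = item_type
        items.append(item)
    return {"items": items}


# Hypothetical query result with one row.
sample = {
    "head": {"vars": ["wiki", "title"]},
    "results": {
        "bindings": [
            {
                "wiki": {"type": "uri", "value": "http://wikiA.example/"},
                "title": {"type": "literal", "value": "SIOC"},
            }
        ]
    },
}

print(json.dumps(bindings_to_exhibit(sample, "title"), indent=2))
```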
  25. The underlying queries
     • The first part shows the co-authors of the requested user and the articles they have in common.

       PREFIX dc:   <http://purl.org/dc/elements/1.1/>
       PREFIX sioc: <http://rdfs.org/sioc/ns#>

       SELECT DISTINCT ?wiki ?title ?coauthor
       WHERE {
         ?pag1 dc:contributor ?me .
         FILTER regex(?me, "UserX", "i") .
         ?pag1 dc:title ?title ;
               sioc:has_container ?wiki .
         ?pag2 dc:title ?title2 .
         FILTER regex(str(?title), str(?title2)) .
         ?pag2 dc:contributor ?coauthor .
         FILTER ((?coauthor) != (?me)) .
       }

     • The second part shows all the articles contributed by the requested user on different wikis, together with the related categories.

       PREFIX dc:   <http://purl.org/dc/elements/1.1/>
       PREFIX sioc: <http://rdfs.org/sioc/ns#>

       SELECT DISTINCT ?wiki ?title ?category
       WHERE {
         ?pag1 dc:contributor ?me .
         FILTER regex(?me, "UserX", "i") .
         ?pag1 dc:title ?title ;
               sioc:has_container ?wiki ;
               sioc:topic ?category .
       }
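
The intent of the first query can be illustrated with a simplified Python sketch over in-memory records. It deliberately departs from the SPARQL above in two ways that should be read as assumptions: the user name is matched case-insensitively by equality rather than by regular expression, and two contributors count as co-authors only when they contributed to the same page of the same wiki. The sample data and the function `coauthors` are hypothetical.

```python
# Hypothetical (wiki, page title, contributor) records.
CONTRIBUTIONS = [
    ("wikiA", "SIOC", "UserX"),
    ("wikiA", "SIOC", "UserY"),
    ("wikiB", "FOAF", "userx"),
    ("wikiB", "FOAF", "UserZ"),
    ("wikiB", "RDF", "UserY"),
]


def coauthors(contributions, user):
    """Return (wiki, title, coauthor) for every page the user contributed to."""
    user = user.lower()
    # Pages the requested user contributed to, on any wiki.
    mine = {(w, t) for w, t, c in contributions if c.lower() == user}
    # Other contributors to those same pages; the set removes duplicates.
    return sorted(
        {(w, t, c) for w, t, c in contributions if (w, t) in mine and c.lower() != user}
    )


print(coauthors(CONTRIBUTIONS, "UserX"))
```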
  26. Conclusions and Future Work
     • We presented how the SIOC ontology and lightweight semantics can be used and extended to represent the structure of wikis in a unified way;
     • We demonstrated an overall benefit of applying Semantic Web technologies to wikis:
       • enabling end users to access the generated information in a simple and transparent way;
       • showing capabilities that cannot be obtained with traditional Web 2.0 tools;
     • The presented work moves towards a collective knowledge system on the Web that follows Linked Data principles.
     Future work:
     • To provide more details about the content of wiki articles;
     • To add real-time search functionality to the system architecture;
     • To standardise and spread plugins and exporters.
  27. Thank you! Any questions?