Semantic Tagging for the XWiki Platform with Zemanta and DBpedia
 


Tags are a very efficient method of describing information
with metadata. Adding semantic information to the keywords allows
computers to comprehend what the pages are saying and use that knowledge to offer better service to humans when interacting with them. The
tagging extension for the XWiki Platform links the user-defined keywords
with semantic information from the DBpedia knowledge base.

Usage Rights: CC Attribution-NonCommercial-ShareAlike License


Semantic Tagging for the XWiki Platform with Zemanta and DBpedia

Elena-Oana Tăbăranu and Anna-Maria Metzak
Faculty of Computer Science, “Alexandru I. Cuza” University of Iași
{elena.tabaranu,anna.metzak}@info.uaic.ro

Abstract. Tags are a very efficient method of describing information with metadata. Adding semantic information to the keywords allows computers to comprehend what the pages are saying and use that knowledge to offer better service to humans when interacting with them. The tagging extension for the XWiki Platform links the user-defined keywords with semantic information from the DBpedia knowledge base.

Key words: XWiki, Zemanta, DBpedia, knowledge base, Semantic Web, tagging, Common Tag

1 Introduction

A tag is a relevant keyword or term associated with specific content. Labeling by keywords has long been used in scientific publications. Tags made a comeback when web users and developers realized that they are a very efficient method of describing information with metadata.

The goal of this project is to extend a conventional open source Web application with semantic information. The Semantic Tagging XWiki component enriches the tagging mechanism of the XWiki Platform using the content recommendation tool Zemanta¹ and the knowledge base DBpedia². The XWiki semantic tagging mechanism lets the user get suggestions when adding new tags and links each new tag to concepts extracted from the world’s biggest knowledge base, Wikipedia.

¹ Zemanta is a tool which brings in relevant content from around the web as the user is typing. Its API lets you bring these related Images, Articles, Hyperlinks and Tags into your application.
² DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

2 The XWiki Platform

XWiki is an open source platform for developing collaborative web applications using the wiki paradigm. XWiki Products are based on the XWiki Platform, which provides common services and UI to them. XWiki is a second generation wiki that provides all the basic content management and administration features of common wikis, and much more. XWiki takes the wiki approach to a whole new level by providing enhanced features and capabilities. With XWiki, you can build simple applications, extend the platform with custom plugins/components, or even build complex Web applications.

Some of the features offered by the XWiki Platform are:
– Edit pages by using wiki syntax to format text, create tables, create links, display images, etc. Alternatively, use a powerful WYSIWYG editor to edit the content of documents.
– Create, Edit, Show, Print, Delete, Copy, Move and Rename documents.
– Export wiki pages to PDF, RTF, XML or HTML.
– Attach as many files as you want to any page. These files can then be referenced and used in page contents.
– Control who can view, edit or delete documents in a flexible manner. Apply rights to a document, a space or an entire wiki.
– Use XWiki’s programming API directly in your pages (Velocity or Groovy) to perform advanced formatting, layout or anything really.
– Create applications by grouping several pages together. Import and export Applications to/from your wiki.

Examples of applications that non-developers can create quickly and in an organic manner using XWiki:
– A blogging application.
– An RSS feed aggregator.
– Mashups, for example combining Google Maps with Delicious, Flickr, Google Base, Google Calendar, etc.
– Collaborative authoring of documents in real time.
– Form-based applications to enter collections of items.
– A Poll/Survey application.

2.1 The XWiki Platform Core

XWiki Core is a single historic JAR that is split into several distinct modules and that currently implements the following features:
– Model: All the classes representing the wiki model, i.e. the following notions: Document, Space, Wiki, Classes/Objects, Attachments and more.
– XWiki Syntax 1.0 Rendering: This is the old service for rendering XWiki Syntax 1.0, which is kept for backward compatibility so that existing users can keep using XWiki Syntax 1.0. For all other syntaxes there is now a new Rendering Module.
– Localization: Handles translations in various languages. A new Localization module is under development that will replace this old module.
– Notification: Handles event registration and distribution. For example, code can subscribe to receive an event when a new document is created.
– Exports (PDF, RTF, XAR): In the future this will be done by implementing specific Renderers in the new Rendering Module.
– Security: Authentication and Authorization handling.
– User Management

Fig. 1. The XWiki Platform Architecture.

2.2 The XWiki Platform Plugins

The plugins created and maintained by the XWiki development team either live in their own JAR or are still located in the XWiki Core JAR. Besides these, other plugins have been contributed by the community and can be installed. The full list of available plugins is published on the Code Zone³.

2.3 The XWiki Platform Modules

A module offers services in a given domain. Modules are the equivalent of Plugins, but they use the new XWiki component-based architecture.

XWiki’s architecture is based on Component-oriented Development. XWiki has chosen to be independent of all existing Component Managers and instead defines some simple Component interfaces that can then be bound to any existing Component Manager. XWiki is currently implementing its own lightweight Component Manager.

³ Contributions from the XWiki community can be accessed at: http://code.xwiki.org/xwiki/bin/view/Main/.
2.4 The XWiki Platform Applications

The applications created and maintained by the XWiki development team are: Panels, Administration, Blog, Application Manager, Wiki Manager, Scheduler, Statistics, Watch List, Office Importer, WebDAV, Tags, Search. In addition to these, other applications have been contributed by the community and can be installed. The full list of available applications is published on the Code Zone.

2.5 Extending The XWiki Platform

The XWiki Platform can be extended by:
– Writing scripts in wiki pages
– Writing Applications (sets of wiki pages)
– Writing Plugins in Java
– Writing Modules (sets of components) in Java
– Writing new Skins or extending existing ones
– Extending existing Service APIs when they provide extension points.

Fig. 2. Extending the XWiki Platform.

3 Bringing Semantic Tagging to the XWiki Platform with Zemanta and DBpedia

Semantic Tagging is a proposal to extend XWiki’s default tagging mechanism using the Zemanta content recommendation tool and the DBpedia knowledge base:
– tag documents with user-defined tags (default behavior in XWiki for tagging);
– use Zemanta to recommend tags for the wiki page content;
– add concept information for each tag using DBpedia.

The mockups below were produced using Balsamiq Mockups and show the user interface changes for the XWiki Platform when adding and displaying a semantic tag.

3.1 Add a semantic tag

When adding a tag for the content of a wiki page, the user has two options in the “Add Tag” form: the “Suggested tags” tab or the “Wiki Tags” tab. When the user hovers over a suggested tag, a popup with semantic details is displayed: the tag description and the URI link for the DBpedia resource page. Besides the “Suggested tags”, the user can use the “Wiki tags” tab to display the tag cloud of the entire wiki. Also, the default autocomplete feature helps the user find tags already used in the wiki instance.

After a tag is added to the Tags section of a wiki page, it is deactivated in the suggested list. Deactivated tags are marked in grey.

Fig. 3. Mockup for tagging a wiki page in XWiki.
Fig. 4. Tagging a wiki page in XWiki.

Fig. 5. Autocomplete feature for tagging a wiki page in XWiki.
3.2 Display semantic information for a tag

A semantic tag preserves the default XWiki behavior in view mode (add icon, remove icon and a link to the list of documents tagged with it), but it also has semantic information attached.

Fig. 6. Mockup for displaying a wiki page in XWiki.

Fig. 7. Semantic information for a wiki tag.

3.3 Instruments used for suggestions

Digitalization of content started by putting the written word into ASCII form. HTML and the web eventually enabled linking and interleaving with other types of media such as images, sound and video. Flash and JavaScript further enabled interactive widgets such as map views. Lately, content on the web has been moving in the direction of explicitly exposing relations between pieces of data. The general intention behind explicitly exposing relations is to allow computers to comprehend what pages are saying and use that knowledge to offer better service to humans when interacting with them.
While authoring text comes naturally to educated human beings, there are many reasons why creating fully featured web content is still a cumbersome experience. Those reasons fall into two main categories. One issue is efficiently finding the right content to include or connect to, which usually takes a lot of time. The other issue is efficiently telling the computer about the relationships between our content and external content and data, which usually requires skills and detailed knowledge of specifications and standards.

Zemanta is a service that tries to resolve those two issues by providing a semi-automatic process of content enrichment, which makes content more appealing to humans and at the same time places it in correct relations to other content in a way computers can understand.

Fig. 8. Authoring process with Zemanta.

The Zemanta API allows application developers to automatically query the Zemanta engine for contextual information about the text that the user enters. Technically, the API accepts (any) text through a POST request and, upon analysis of that text, returns suggestions.

While some other services only try to find the most overrepresented rare words or proper names in the text, Zemanta goes deeper when processing content. Zemanta offers both tags based on words and phrases that can be found inside the author’s text and tags for topics that could represent the content as a whole but are not explicitly mentioned. It goes even further and tries to find very concrete items and concepts that are related to what is being said but are only connected through a third piece of information. The author can therefore expect topics, names and concepts as tags.
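The POST request described above can be sketched in Java. This is a minimal sketch under stated assumptions, not the component's actual code: the endpoint URL and the form parameter names (`method`, `api_key`, `text`, `format`) are assumptions about the Zemanta REST interface and should be checked against the Zemanta developer documentation, and the class and method names are hypothetical. The sketch only builds the URL-encoded request body; sending it is left to whichever HTTP client the application uses.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class ZemantaRequestSketch {
    /** Assumed REST endpoint; verify against the Zemanta developer documentation. */
    static final String ENDPOINT = "http://api.zemanta.com/services/rest/0.0/";

    /** Builds the form-encoded POST body; all parameter names are assumptions. */
    static String buildBody(String apiKey, String text) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("method", "zemanta.suggest"); // assumed name of the suggestion method
        params.put("api_key", apiKey);           // identifies the calling application
        params.put("text", text);                // the wiki page content to analyse
        params.put("format", "rdfxml");          // assumed value for an RDF/XML response

        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(entry.getKey())).append('=').append(encode(entry.getValue()));
        }
        return body.toString();
    }

    /** URL-encodes a value as UTF-8 form data. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println("POST " + ENDPOINT);
        System.out.println(buildBody("YOUR_API_KEY", "XWiki is a second generation wiki"));
    }
}
```

Keeping the body construction separate from the network call makes the encoding easy to check in isolation, which matters because wiki page text routinely contains characters such as `&` and `=` that would otherwise corrupt the form data.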
The structure of Zemanta’s RDF/XML response was inspired by the Linking Open Data initiative, by other APIs offering semantic responses and, most importantly, by ideas championed by the W3C.

The XWiki Semantic Tagging component uses the Zemanta API to suggest possible keywords for a specific text. The component identifies itself with an API key, a string that uniquely identifies a specific instance of an application that uses the Zemanta web service. There are also limits on the number of requests: default developer accounts allow 1000 posts per day and 1 post per second.

3.4 Instruments used for semantic information

DBpedia extracts factual information from Wikipedia pages, allowing users to find answers to questions where the information is spread across many different Wikipedia articles. DBpedia is served on the Web under the terms of the GNU Free Documentation License. In order to fulfill the requirements of different client applications, it can be accessed through four mechanisms: Linked Data, a SPARQL endpoint, RDF dumps and an index lookup.

Linked Data is a method of publishing RDF data on the Web that relies on HTTP URIs as resource identifiers and on the HTTP protocol to retrieve resource descriptions. DBpedia resource identifiers (such as http://dbpedia.org/resource/Andy_Warhol) are set up to return RDF descriptions when accessed by Semantic Web agents and a simple HTML view of the same information to traditional Web browsers. HTTP content negotiation is used to deliver the appropriate format.

A SPARQL endpoint is available for querying the DBpedia knowledge base. Client applications can send queries over the SPARQL protocol to the endpoint at http://dbpedia.org/sparql.
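As an illustration of the SPARQL access path, the sketch below builds a GET URL for the endpoint named above. The query itself (the English `rdfs:comment` of a resource), the choice of a GET request with the standard `query` parameter of the SPARQL protocol, and the class and method names are assumptions made for this example; the paper does not say that the component queries the endpoint this way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Pattern;

class DbpediaSparqlSketch {
    /** Endpoint given in the text. */
    static final String ENDPOINT = "http://dbpedia.org/sparql";

    /** Characters that must not appear inside a SPARQL IRI reference. */
    private static final Pattern UNSAFE = Pattern.compile("[<>\"\\s]");

    /** Returns a query for the English rdfs:comment of one resource (illustrative query). */
    static String commentQuery(String resourceUri) {
        if (UNSAFE.matcher(resourceUri).find()) {
            throw new IllegalArgumentException("Not a plain URI: " + resourceUri);
        }
        return "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
            + "SELECT ?comment WHERE { <" + resourceUri + "> rdfs:comment ?comment . "
            + "FILTER (lang(?comment) = \"en\") } LIMIT 1";
    }

    /** Wraps a query in a GET URL using the protocol's "query" parameter. */
    static String requestUrl(String sparql) {
        try {
            return ENDPOINT + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String query = commentQuery("http://dbpedia.org/resource/Andy_Warhol");
        System.out.println(requestUrl(query));
    }
}
```

The `LIMIT 1` keeps the result small, which is in the spirit of the limits on result size that public endpoints typically enforce.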
In addition to standard SPARQL, the endpoint supports several extensions of the query language that have proved useful for developing client applications, such as full text search over selected RDF predicates, and aggregate functions, notably COUNT(). To protect the service from overload, limits on query complexity and result size are in place.

The DBpedia knowledge base is sliced by triple predicate into several parts, and N-Triples serializations of these parts are available for download on the DBpedia website. In addition to the knowledge base that is served as Linked Data and via the SPARQL endpoint, the download page also offers infobox datasets that have been extracted from Wikipedia editions in 29 languages other than English.

In order to make it easy for Linked Data publishers to find DBpedia resource URIs to link to, a lookup service proposes DBpedia URIs for a given label. The Web service is based on a Lucene index providing a weighted label lookup, which combines string similarity with a relevance ranking in order to find the most likely matches for a given term. DBpedia lookup is available as a Web service at http://lookup.dbpedia.org/api/search.asmx.

The XWiki Semantic Tagging component links information from the DBpedia index (short description for a tag, URI for the resource page, label) to the user-defined tags in the wiki. This is an extension to the default tagging mechanism of the XWiki platform, which does not link the user-defined tags to a concept.

3.5 Common Tags

The Semantic Tagging component uses the Common Tag RDFa vocabulary to bring semantic markup to the default XWiki tagging mechanism.

Fig. 9. Example of semantic markup using RDFa for a wiki tag.

3.6 Implementation details

The following extensions to the XWiki Platform implement the semantic tagging mechanism:
– an XWiki application (SemTags.Tooltip) for the tag tooltip: contains a JavaScript skin extension and a Stylesheet skin extension;
– an XWiki application (SemTags.CreateTagForm) for the new semantic tagging form: Velocity code to add a tag suggested by Zemanta and linked with information from DBpedia, or just a tag already used in the wiki;
– an XWiki component for the backend tag mechanism: connects to the Zemanta API and queries the DBpedia index;
– resource modifications: JavaScript code to support the new tagging functionality;
– template modifications: updating htmlheader.vm with the DOCTYPE of the XHTML wiki pages to support the new RDFa vocabulary, and updating documentTags.vm with the new display for a keyword.

The XWiki code lifecycle is based on Maven, hence a Maven archetype was used to help create a simple component module that respects the XWiki architecture and its component-specific requirements.

Since the XWiki platform is written in the Java programming language, a Java library was used to query the Zemanta engine, and the API was added as a Maven dependency for the XWiki component.
Maven dependency for the Zemanta API:

    <dependency>
      <groupId>com.zemanta.api</groupId>
      <artifactId>zemapi</artifactId>
      <version>1.0</version>
    </dependency>

The HTTPClient library was used to query the DBpedia lookup web service, and a dependency for it was also added to the component pom.xml.

Maven dependency for the HTTPClient library:

    <dependency>
      <groupId>commons-httpclient</groupId>
      <artifactId>commons-httpclient</artifactId>
      <version>3.1</version>
    </dependency>

Content of the component declaration file components.txt:

    org.xwiki.semtag.component.internal.DefaultSemanticTagger
    org.xwiki.semtag.component.internal.vcinitializer.SemanticTaggerVelocityContextInitializer

The @ComponentRole annotation is used to declare the interface of the component:

    @ComponentRole
    public interface SemanticTagger
    {
        // Tags suggested for the given text
        public ArrayList<SemanticTag> getSuggestions(String text);

        // Attaches the first semantic detail found for the tag
        public void updateFirstSemanticDetail(SemanticTag tag)
            throws SAXException, ParserConfigurationException, RemoteException;

        // Returns the tag with the given name, with its semantic details filled in
        public SemanticTag updateSemanticDetails(String tagName)
            throws ParserConfigurationException, SAXException;
    }

The @Component annotation is used to implement the XWiki component that makes the tagger accessible from a scripting language like Velocity:

    @Component("tagger")
    public class SemanticTaggerVelocityContextInitializer implements VelocityContextInitializer
    {
        /** The key to add to the velocity context */
        public static final String VELOCITY_CONTEXT_KEY = "tagger";

        @Requirement
        private SemanticTagger semanticTagger;

        /**
         * Add the component instance to the velocity context
         * received as parameter.
         */
        public void initialize(VelocityContext context)
        {
            context.put(VELOCITY_CONTEXT_KEY, semanticTagger);
        }
    }

Using the component API from Velocity to display the tag name, description and link to the DBpedia URI:

    #set($suggestedList = $tagger.getSuggestions("$request.text"))
    #foreach($suggestedTag in $suggestedList)
      #set($ok = $tagger.updateFirstSemanticDetail($suggestedTag))
      #set($details = $suggestedTag.getSemanticDetails())
      <li>
        <a class="suggested-tag" href="#">$suggestedTag.name</a>
        <span class="suggested-tag-info" style="display: none">$details.get(0).getDescription()
          <br/><a href="$details.get(0).getUri()">Visit</a>
          <div id="more-at">Powered by <a href="http://www.dbpedia.org">
            <img src="$dbpediaImg" alt="Dbpedia"/></a></div>
        </span>
      </li>
    #end

4 Conclusions

A tag is a relevant keyword or term associated with specific content, and tags provide a very efficient method of describing information with metadata. The tagging extension for the XWiki platform provides semantic details extracted from the world’s biggest knowledge base, improving the understanding of the content for both the user and the computer.

5 Bibliography

1. Common Tag, http://commontag.org/Home
2. Bizer, Ch., Lehmann, J., Kobilarov, G., Auer, S., Becker, Ch., Cyganiak, R., Hellmann, S.: DBpedia: A Crystallization Point for the Web of Data
3. Zemanta Developer Network, http://developer.zemanta.com/
4. Tori, A.: Everything you need to know about Zemanta API besides the specification
5. Writing XWiki Components, http://platform.xwiki.org/xwiki/bin/view/DevGuide/WritingComponents
6. ***, http://platform.xwiki.org/xwiki/bin/view/Main/
7. ***, http://platform.xwiki.org/xwiki/bin/view/DevGuide/Architecture
8. ***, http://platform.xwiki.org/xwiki/bin/view/DevGuide/