Zemanta: A Content Recommendation Engine



Zemanta, a content recommendation engine.

Published in: Technology, Education

Claudiu Mihăilă
Faculty of Computer Science, "Al. I. Cuza" University of Iași,
16, G-ral Berthelot Street, 700483 Iași, Romania

Abstract. This paper reports on Zemanta, a content recommendation engine, with regard to its utility, usability, and relation to DBpedia. It focusses on the multilingual applicability of the service and on semantic disambiguation using DBpedia, and it exemplifies the service's use through both application extensions and the public API.

Key words: Zemanta, social Web, DBpedia, content recommendation

1 Introduction

As the Web keeps expanding, readers can no longer take in and retain the immense amount of information it offers. The need for an automatic content analysis and recommendation system is therefore evident. Such a tool lets one sieve through the data and go directly to the information of interest. Furthermore, the ability of such a system to analyse content at a semantic level improves the accuracy and appropriateness of its results. In what follows, the content recommendation engine Zemanta is analysed with regard to its usability and its relation to DBpedia.

2 Zemanta

Zemanta is a content recommendation engine created for bloggers and other content creators. With Zemanta, authors can enrich their texts with appropriate images and links, and add a list of related articles, tags, and categories to their own work. The content Zemanta suggests is obtained from multiple sources, such as Wikipedia, YouTube, IMDB, Crunchbase, Flickr, ITIS, MusicBrainz, MyBlogLog, MySpace, NCBI, Rotten Tomatoes, Twitter, Facebook, Snooth, and Wikinvest, as well as the blogs of other Zemanta users.
One important aspect of the recommended content is that the suggested images are free for anyone to use, because they were published under various copyleft or free licenses, such as Creative Commons or the GNU General Public License. Related articles and links, whose content may be protected by copyright, are presented only as hyperlinks, so copyright issues are avoided.

Originally released for bloggers, Zemanta is available as a Firefox and Internet Explorer extension and as plug-ins for WordPress, Blogger, TypePad, Ning, MySpace, LiveJournal, Movable Type, Tumblr, Drupal, and Joomla. It can also be used inside web-based email environments, such as Gmail and Yahoo! Mail, and an Outlook add-in is currently under development.

2.1 Mechanism

Zemanta is an authoring application, and its fundamental recommendation process is depicted in Fig. 1. The content management system presents suggestions to authors as they write, and authors select which recommended items to incorporate into their own content. Note, however, that once the text is published, the Zemanta suggestions included in it become static and change only if the author edits them.

Fig. 1. Authoring process with Zemanta

Zemanta uses HTTP for communication between the client and the server, and all queries must be sent with the POST method. The stated reason for avoiding the GET method is that some web servers and proxies impose limits on the length of URLs. The service conforms to the REST constraints and returns responses encoded as JSON or XML.

2.2 Multilingual aspects

At the moment, Zemanta supports only content written in English. Nevertheless, universal words, trademarks, or buzzwords in a text give Zemanta enough input to create appropriate and accurate content recommendations. Furthermore, combining Zemanta with automatic translation Web applications, in a manner similar to the Faviki social bookmarking tool, makes it possible to add content in languages other than English. The process depicted in Fig. 2 shows the orchestration of three web services, Zemanta, Google Language API, and DBpedia, to produce multilingual tags for Faviki [1].

Fig. 2. Faviki process of suggesting semantic tags using Zemanta, Google Language API and DBpedia

2.3 Relation to DBpedia

DBpedia is a freely available, very large source of knowledge, built by extracting structured information from Wikipedia and storing it as RDF triples. The knowledge base currently describes more than 2.9 million things, including persons, places, and species, and holds over 479 million facts. It also contains labels and abstracts describing the concepts in 91 different languages. For each entity, DBpedia defines a globally unique identifier (URI) that can be dereferenced according to the Linked Data principles [2, 3].
Moreover, the knowledge base is interlinked with several other data sources (e.g., MusicBrainz, GeoNames, Eurostat), which extends its coverage of the real world.
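To make the identifier scheme concrete, the following minimal sketch derives a DBpedia resource URI from an English Wikipedia article title. It assumes the common DBpedia convention that a resource lives under http://dbpedia.org/resource/ and reuses the Wikipedia article name with spaces replaced by underscores. The function name dbpediaUri is hypothetical, and the percent-encoding of special characters is a simplification, since encoding conventions have varied between DBpedia releases.

```php
<?php
// Hypothetical helper (not part of any Zemanta or DBpedia library):
// map an English Wikipedia article title to a DBpedia resource URI.
// Assumed convention: same article name, spaces replaced by underscores.
function dbpediaUri(string $wikipediaTitle): string
{
    $name = str_replace(' ', '_', trim($wikipediaTitle));
    // rawurlencode() leaves letters, digits, '-', '_', '.', '~' untouched
    // and percent-encodes everything else (a simplification).
    return 'http://dbpedia.org/resource/' . rawurlencode($name);
}

echo dbpediaUri('Robotic arm'), "\n";
```

Under these assumptions, the title "Robotic arm" from Program 2 maps to http://dbpedia.org/resource/Robotic_arm.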
The DBpedia knowledge base is one of the information sources in Zemanta's recommendation process. It allows Zemanta to disambiguate the entities mentioned in the text and to provide users with more accurate responses. Furthermore, connecting the entities to DBpedia URIs significantly increases the chances that the knowledge is reused and that future automated semantic tasks based on the stated facts, such as search, reasoning, or advertising, are performed correctly. Moreover, thanks to the multilingual features of DBpedia, Zemanta could potentially suggest recommendations in different languages. In fact, Faviki combines the services offered by Zemanta with two other public services, Google Translate and DBpedia, for multilingual support [1]. Although the main language for Zemanta remains English, Faviki translates the input using Google Language API and selects the recommended tags from DBpedia in the corresponding language.

2.4 Using Zemanta

The services provided by Zemanta can be accessed either through the available plug-ins and extensions, or by creating an application that connects to the Zemanta Application Programming Interface (API). Both methods are simple and straightforward, and are concisely described in what follows.

Zemanta plug-ins and extensions. By adding a Zemanta plug-in or extension to a third-party application, users receive content recommendations almost instantaneously as they write. Images, tags, links, or related articles are inserted into the text automatically when selected from the list of recommendations. Furthermore, users can define their own search criteria and save their preferences.

Application Programming Interface. Another way to exploit Zemanta's services is to create a new application, or to extend an existing one, and connect it to the public API.
Nevertheless, the server requires an API key in order to respond to queries. Zemanta offers two types of keys, depending on the intended use. On the one hand, developers can obtain free API keys, which allow up to 1000 queries per day. On the other hand, Zemanta offers support for content management systems (CMS) and platforms, which automatically assign keys to each of their end users. However, no details are provided for the CMS case, and users are instructed to contact Zemanta directly.

An example of Zemanta API usage in PHP with the cURL library is included in Program 1. The request is not complex, and only a few lines of code are required to call Zemanta's suggest method. The selected response format is XML, and a brief illustration of it is provided in Program 2. As shown, the return message contains multiple articles, links, images, and categories, all accompanied by confidence values.

Zemanta offers two other public methods, suggest_markup and preferences. The former is similar to the suggest method presented above, differing only in that it retrieves just the markup and links. The latter retrieves the preferences of a specific user from the server; this information is usually passed on to the suggest method.

3 Conclusions

This paper focusses on Zemanta, a content recommendation tool. The system's alignment with current Web application development principles, its efficiency, and its usability show that Zemanta is a powerful Web service. The large number of information sources and the freely available suggested content strengthen this conclusion. Although the service is available only for English content, combinations with automatic translation Web services have already been developed and are used successfully.
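    <?php
    // Zemanta API endpoint (the URL is missing from the source text).
    $zemantaURL = '';
    $format = 'xml';    // response format
    $text = "The text to be parsed by Zemanta";
    $key = "zemanta_api_key";
    $method = "zemanta.suggest";

    // Request parameters for the suggest method.
    $args = array(
        'method' => $method,
        'api_key' => $key,
        'text' => $text,
        'format' => $format
    );

    // Build the URL-encoded POST body. The loop variable is named $name
    // so that it does not overwrite the API key stored in $key.
    $data = "";
    foreach ($args as $name => $value) {
        $data .= ($data != "") ? "&" : "";
        $data .= urlencode($name) . "=" . urlencode($value);
    }

    // Send the POST request with cURL and return the response as a string.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $zemantaURL);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);
    ?>

Program 1: Zemanta API usage in PHP with cURL.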
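The manual encoding loop in Program 1 can also be expressed with PHP's built-in http_build_query(). The following is a minimal sketch, not Zemanta's own client code: the function name buildZemantaRequestBody and the format check are assumptions made for illustration, while the parameter names (method, api_key, text, format) are those used in Program 1. It only builds the POST body and sends nothing over the network.

```php
<?php
// Hypothetical helper: build the URL-encoded POST body for a call to
// zemanta.suggest. Parameter names follow Program 1; the validation
// rule below is an assumption made for this sketch.
function buildZemantaRequestBody(string $apiKey, string $text, string $format = 'xml'): string
{
    // The paper states that responses can be encoded as JSON or XML.
    if (!in_array($format, ['xml', 'json'], true)) {
        throw new InvalidArgumentException("Unsupported format: $format");
    }
    // http_build_query() URL-encodes every key and value and joins the
    // pairs with '&', replacing the manual loop of Program 1.
    return http_build_query([
        'method'  => 'zemanta.suggest',
        'api_key' => $apiKey,
        'text'    => $text,
        'format'  => $format,
    ], '', '&');
}

echo buildZemantaRequestBody('zemanta_api_key', 'Mars & the Phoenix lander', 'json'), "\n";
```

The resulting string can be passed to CURLOPT_POSTFIELDS exactly as $data is in Program 1.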
    <rsp>
      <status>ok</status>
      <articles>
        <article>
          <url>;page=1</url>
          <confidence>0.048289</confidence>
          <published_datetime>2008-06-26T19:12:59Z</published_datetime>
          <title>Seeds of Life Found in Martian Soil</title>
          <zemified>0</zemified>
        </article>
      </articles>
      <markup>
        <text>The &lt;a class="zem_slink" href=""&gt;Phoenix Mars Lander&lt;/a&gt; has successfully deployed its &lt;a class="zem_slink" href="http: //"&gt;robotic arm&lt;/a&gt; and tested other instruments including a laser designed to detect dust, clouds, and fog. The arm will be used to dig up samples of the &lt;a class="zem_slink" href=""&gt;Martian&lt;/a&gt; surface which will be analyzed as a possible habitat for life.</text>
        <links>
          <link>
            <confidence>0.006165</confidence>
            <anchor>robotic arm</anchor>
            <target>
              <url></url>
              <type>wikipedia</type>
              <title>Robotic arm</title>
            </target>
          </link>
        </links>
      </markup>
      <images>
        <image>
          <description>An artist's rendition of the Phoenix Mars probe during landing. The sophisticated landing system on Phoenix allows the spacecraft to touch down within 10 km (6.2 miles) of the targeted landing area. Thrusters are started when the lander is 570 m (1900 feet) above the surface. The navigation system is capable of detecting and avoiding hazards on the surface of Mars.</description>
          <attribution>Image via &lt;a href=" .jpg"&gt;Wikipedia&lt;/a&gt;</attribution>
          <license>Public domain</license>
          <confidence>0.99</confidence>
          <source_url></source_url>
          <url_l></url_l>
          <url_l_w>5200</url_l_w>
          <url_l_h>4800</url_l_h>
        </image>
      </images>
      <keywords>
        <keyword>
          <confidence>0.506297</confidence>
          <name>Mars</name>
          <scheme>general</scheme>
        </keyword>
      </keywords>
      <categories>
        <category>
          <confidence>0.195886</confidence>
          <categorization>dmoz</categorization>
          <name>Top/Science/Astronomy/Solar_System/Planets/Mars</name>
        </category>
      </categories>
      <signature>&lt;div class="zemanta-pixie"&gt;&lt;a class="zemanta-pixie-a" href="http://reblog." title="Zemified by Zemanta"&gt;&lt;img class="zemanta-pixie-img" src=" a22b-c07ba38b2d9f" alt="Zemanta Pixie" /&gt;&lt;/a&gt;&lt;/div&gt;</signature>
      <rid>40b3d04b-5248-4256-a22b-c07ba38b2d9f</rid>
    </rsp>

Program 2: Brief Zemanta response in XML format.
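A response shaped like Program 2 can be consumed with PHP's SimpleXML extension. The sketch below is illustrative only: the function name extractKeywords is hypothetical, the element names (rsp, status, keywords, keyword, name, confidence) are taken from Program 2, and the second keyword in the sample is made up to show the ordering by confidence.

```php
<?php
// Hypothetical helper: extract keyword names and confidence values from an
// XML response shaped like Program 2, highest confidence first.
function extractKeywords(string $responseXml): array
{
    $rsp = simplexml_load_string($responseXml);
    if ($rsp === false || (string) $rsp->status !== 'ok') {
        throw new RuntimeException('Malformed or unsuccessful response');
    }
    $keywords = [];
    // Guard against responses that carry no keywords at all.
    if (isset($rsp->keywords->keyword)) {
        foreach ($rsp->keywords->keyword as $keyword) {
            $keywords[(string) $keyword->name] = (float) $keyword->confidence;
        }
    }
    arsort($keywords); // descending by confidence, names kept as keys
    return $keywords;
}

// Trimmed-down sample in the shape of Program 2; the "Phoenix" keyword
// and its confidence are invented for illustration.
$sample = <<<XML
<rsp>
  <status>ok</status>
  <keywords>
    <keyword><confidence>0.101</confidence><name>Phoenix</name><scheme>general</scheme></keyword>
    <keyword><confidence>0.506297</confidence><name>Mars</name><scheme>general</scheme></keyword>
  </keywords>
</rsp>
XML;

print_r(extractKeywords($sample));
```

The same pattern applies to the articles, links, images, and categories elements, each of which carries its own confidence value.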
References

1. Miličić, V.: Case study: Semantic tags. (2008)
2. Berners-Lee, T.: Linked data: design issues. (2006)
3. Bizer, C., Cyganiak, R., Heath, T.: How to publish linked data on the web. http://sites.wiwiss.fu- (2007)