Zemanta: A Content Recommendation Engine
Claudiu Mihăilă
Faculty of Computer Science,
“Al. I. Cuza” University of Iași,
16, G-ral Berthelot Street,
700483 Iași, Romania
claudiu.mihaila@info.uaic.ro
Abstract. This paper reports on Zemanta, a content recommendation engine, with regard to its utility,
usability, and relation to DBpedia. It focusses on the multilingual applicability of the service and on
semantic disambiguation using DBpedia, and exemplifies its use through both application extensions
and the public API.
Key words: Zemanta, social Web, DBpedia, content recommendation
1 Introduction
On an ever-expanding Web, it has become impractical for an individual to take in and keep track of
the immense amount of information on offer. Hence the need for an automatic content analysis and
recommendation system. With such a tool, one can sift through the data and directly access the
information of interest. Furthermore, the ability of such a system to analyse content at the semantic
level improves the accuracy and appropriateness of its results.
In what follows, the content recommendation engine Zemanta is analysed with regard to its usability
and relation to DBpedia.
2 Zemanta
Zemanta¹ is a content recommendation engine created for bloggers and other content creators. Using
Zemanta, authors can enrich their texts with appropriate images and links, and add a list of related
articles, tags and categories to their own work.
The content Zemanta suggests is obtained from multiple sources, such as Wikipedia, YouTube,
IMDb, Amazon.com, CrunchBase, Flickr, ITIS, MusicBrainz, MyBlogLog, MySpace, NCBI, Rotten
Tomatoes, Twitter, Facebook, Snooth and Wikinvest, as well as the blogs of other Zemanta users.
One important aspect of the recommended content is that the suggested images are free for anyone to
use, because they were published under copyleft or free licenses such as Creative Commons or the
GNU General Public License. Related articles and links, whose content may be protected by copyright,
are presented only as hyperlinks, so copyright issues are avoided.
Originally released for bloggers, Zemanta is available as a Firefox and Internet Explorer extension
and as a plug-in for WordPress, Blogger, TypePad, Ning, MySpace, LiveJournal, MovableType,
Tumblr, Drupal, and Joomla. Furthermore, Zemanta is available for use inside web-based email
environments, such as Gmail and Yahoo! Mail, and an Outlook add-in is currently under development.
2.1 Mechanism
Zemanta is an authoring application, and its fundamental recommendation process is depicted in Fig.
1. As can be seen, the content management system presents suggestions to the author as they
create the text. The author selects which recommended pieces of information are appropriate
to incorporate into their own content. Note, however, that after publishing, the suggestions from
Zemanta that were included in the text become static and do not change unless the author changes
them.
Fig. 1. Authoring process with Zemanta
Zemanta uses the HTTP protocol for communication between the client and the server, and all queries
must be sent using the POST method. The explanation given for avoiding the GET method is
that some web servers and proxies impose limits on the length of URLs. The service conforms to the REST
constraints and returns responses encoded in JSON or XML.
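The effect of this design choice can be illustrated with a short sketch. Python is used here only for
brevity. The endpoint and the parameter names are those of Program 1, whereas the helper name
build_suggest_request and the placeholder key are assumptions made for illustration. The arguments
are form-encoded into the POST body, so the URL stays short regardless of the length of the text.
The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as given in the footnote of this paper.
ZEMANTA_URL = "http://api.zemanta.com/services/rest/0.0/"


def build_suggest_request(text, api_key, fmt="xml"):
    """Build (but do not send) a POST request for zemanta.suggest.

    Hypothetical helper: only the parameter names come from Program 1.
    """
    args = {
        "method": "zemanta.suggest",
        "api_key": api_key,
        "text": text,
        "format": fmt,
    }
    body = urlencode(args).encode("utf-8")
    # Supplying `data` places the arguments in the body, not in the URL.
    return Request(ZEMANTA_URL, data=body, method="POST")


if __name__ == "__main__":
    long_text = "Phoenix Mars Lander " * 500
    req = build_suggest_request(long_text, "zemanta_api_key")
    # The URL length is constant; only the body grows with the text.
    print(req.get_method(), len(req.full_url), len(req.data))
```

With GET, the same arguments would have to be appended to the URL, whose length would then grow
with the text and could exceed the limits mentioned above.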
2.2 Multilingual aspects
At the moment, Zemanta supports only content written in English. Nevertheless, the use of universal
words, trademarks or buzzwords gives Zemanta sufficient input to create appropriate and accurate
content recommendations. Furthermore, combining it with automatic translation Web applications, in
a manner similar to the Faviki² social bookmarking tool, makes it possible to handle content in
languages other than English. The process depicted in Fig. 2 shows the orchestration of three web
services, Zemanta, Google Language API and DBpedia, in order to produce multilingual tags for
Faviki [1].
2.3 Relation to DBpedia
DBpedia is a freely available, very large source of knowledge, developed by extracting structured
information from Wikipedia and storing it as RDF triples. The knowledge base currently consists of
more than 2.9 million things, including persons, places, species, etc., and over 479 million facts.
Furthermore, it contains labels and abstracts describing the concepts in 91 different languages. For
each entity it contains, DBpedia defines a globally unique identifier (URI) that can be dereferenced
according to the Linked Data principles [2, 3]. Moreover, the fact that the knowledge is connected
with many other data sources (e.g., MusicBrainz, GeoNames, Eurostat) extends its coverage of the
real world.
Fig. 2. Faviki process of suggesting semantic tags using Zemanta, Google Language API and DBpedia
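As an illustration of what dereferencing means in practice, the following Python sketch builds an
HTTP request for a DBpedia entity URI and asks for an RDF representation through the Accept
header, in the spirit of the Linked Data principles [2, 3]. The helper name, the example entity and the
chosen media type are assumptions made for illustration. The request is only constructed, not sent.

```python
from urllib.request import Request

# Namespace under which DBpedia mints its entity URIs.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def build_dereference_request(entity, mime_type="application/rdf+xml"):
    """Build a request that asks for an RDF description of `entity`.

    Hypothetical helper; `entity` is the local name, e.g. "Mars".
    """
    uri = DBPEDIA_RESOURCE + entity
    # Content negotiation: the same URI can yield HTML for browsers
    # and RDF for software agents, depending on the Accept header.
    return Request(uri, headers={"Accept": mime_type})


if __name__ == "__main__":
    req = build_dereference_request("Mars")
    print(req.full_url, req.get_header("Accept"))
```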
¹ http://www.zemanta.com/
² http://www.faviki.com/
The knowledge base of DBpedia is one source of information in Zemanta’s recommendation process. It
allows Zemanta to disambiguate between the entities mentioned in the text and thus to provide more
accurate responses to users. Furthermore, connecting the entities to DBpedia URIs significantly
increases the chances of knowledge reuse and of correct automatic semantic processing later on, such
as search, reasoning, or advertising based on the stated facts.
Moreover, due to the multilingual features of DBpedia, Zemanta could in principle offer
recommendations in different languages. In fact, Faviki combines the services offered by Zemanta
with two other public services, Google Translate³ and DBpedia, for multilingual support [1]. Although
the main language for Zemanta remains English, Faviki translates the input using the Google Language
API and selects from DBpedia the recommended tags in the corresponding language.
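The orchestration described above can be sketched in Python as follows. This is one reading of the
text, not Faviki's actual implementation. The three services are passed in as functions (translate,
suggest and label_for, all hypothetical names) so that the control flow can be shown without relying
on any particular client library, and the data in the demonstration is toy data.

```python
def multilingual_tags(text, lang, translate, suggest, label_for):
    """Return (DBpedia URI, label in `lang`) pairs for `text`.

    All three callables are hypothetical stand-ins:
      translate(text, src, dst) -> translated text (translation service)
      suggest(english_text)     -> list of DBpedia URIs (Zemanta)
      label_for(uri, lang)      -> label or None (DBpedia)
    """
    # Step 1: Zemanta handles English only, so translate when needed.
    english = text if lang == "en" else translate(text, lang, "en")
    tags = []
    # Step 2: obtain entity suggestions for the English text.
    for uri in suggest(english):
        # Step 3: look up the label in the author's language.
        label = label_for(uri, lang)
        if label is not None:
            tags.append((uri, label))
    return tags


if __name__ == "__main__":
    # Toy stand-ins for the three services (illustrative data only).
    mars = "http://dbpedia.org/resource/Mars"
    labels = {(mars, "de"): "Mars (Planet)"}
    print(multilingual_tags(
        "Der Mars ist ein Planet.", "de",
        translate=lambda t, src, dst: "Mars is a planet.",
        suggest=lambda t: [mars],
        label_for=lambda uri, lang: labels.get((uri, lang)),
    ))
```

Entities without a label in the requested language are skipped in this sketch. Falling back to the
English label would be an equally reasonable choice.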
2.4 Using Zemanta
The services provided by Zemanta can be accessed either by using the available plug-ins or
extensions, or by creating an application that connects to the Zemanta Application Programming
Interface⁴ (API). Both methods are straightforward and are concisely described in what follows.
Zemanta plug-ins and extensions. By adding a Zemanta plug-in or extension to a third-party
application, users receive content recommendations almost instantaneously as they create the
text. The images, tags, links or related articles are included automatically in the text when selected
from the list of recommendations. Furthermore, users can define their own search criteria and save
their preferences.
Application Programming Interface. Another way to exploit Zemanta’s services is to create
a new application, or to extend an existing one, that connects to the public API. An API key is
required for the server to respond to queries. Zemanta offers two types of keys,
depending on the intended use. On the one hand, developers can obtain free API keys, which allow
up to 1000 queries per day. On the other hand, Zemanta offers support for content management systems
(CMS) and platforms, which automatically assign keys to each of their end users. However, details of
the CMS case are not published, and users are instructed to contact Zemanta directly.
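A client using a free key may wish to track its own usage so as to stay within the daily allowance.
The following Python sketch is one possible way of doing so. The class name is hypothetical, and
since the text does not state how Zemanta delimits a day, counting per local calendar day is an
assumption.

```python
import datetime


class DailyQuota:
    """Client-side counter for a per-day query allowance.

    Hypothetical helper; the limit of 1000 is the free-key allowance
    stated in the text, the reset rule is an assumption.
    """

    def __init__(self, limit=1000):
        self.limit = limit
        self._day = None
        self._used = 0

    def try_acquire(self, today=None):
        """Return True and count one query if the allowance permits it."""
        today = today or datetime.date.today()
        if today != self._day:
            # A new day has started: reset the counter.
            self._day = today
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


if __name__ == "__main__":
    quota = DailyQuota()
    if quota.try_acquire():
        print("query may be sent")
```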
An example of Zemanta API usage in PHP with the cURL library is included in Program 1. As
can be observed, the request is not complex and only a few lines of code are required to call
Zemanta’s suggest method. The selected response format is XML, and a brief illustration of it is
provided in Program 2. As shown, the response contains multiple articles, links, images and
categories, all accompanied by confidence values.
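The confidence values make it straightforward to filter suggestions on the client side. The Python
sketch below parses an abridged copy of the response in Program 2 and keeps only the keywords
above a threshold. The element names are those of Program 2. The function name, the threshold of
0.5 and the treatment of any status other than ok as an error are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged from Program 2; only the parts used below are kept.
SAMPLE = """<rsp>
  <status>ok</status>
  <keywords>
    <keyword>
      <confidence>0.506297</confidence>
      <name>Mars</name>
      <scheme>general</scheme>
    </keyword>
  </keywords>
  <categories>
    <category>
      <confidence>0.195886</confidence>
      <categorization>dmoz</categorization>
      <name>Top/Science/Astronomy/Solar_System/Planets/Mars</name>
    </category>
  </categories>
</rsp>"""


def confident_keywords(xml_text, threshold=0.5):
    """Return (name, confidence) for keywords at or above `threshold`."""
    root = ET.fromstring(xml_text)
    if root.findtext("status") != "ok":
        raise ValueError("response status is not ok")
    result = []
    for keyword in root.iter("keyword"):
        confidence = float(keyword.findtext("confidence"))
        if confidence >= threshold:
            result.append((keyword.findtext("name"), confidence))
    return result


if __name__ == "__main__":
    print(confident_keywords(SAMPLE))
```

The same pattern applies to the article, link, image and category elements, each of which carries
its own confidence element.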
Zemanta offers two further public methods, suggest_markup and preferences. The former is
similar to the suggest method presented above, differing only in that it returns just the markup and
links. The latter retrieves the preferences of a specific user from the server, information which is
usually passed on to the suggest method.
3 Conclusions
This paper focusses on Zemanta, a content recommendation tool. The system’s alignment with current
Web application development principles, its efficiency and its usability indicate that Zemanta is a
powerful Web service. The large number of information sources and the freely available suggested
content strengthen this conclusion. Although the service is available only for English content,
combinations with automatic translation Web services have already been developed and are used
successfully.
³ http://translate.google.com/
⁴ http://api.zemanta.com/services/rest/0.0/
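<?php
// Endpoint of the Zemanta REST API and the desired response format.
$zemantaURL = 'http://api.zemanta.com/services/rest/0.0/';
$format = 'xml';
$text = "The text to be parsed by Zemanta";
$key = "zemanta_api_key"; // replace with a valid API key
$method = "zemanta.suggest";

// Arguments expected by the suggest method.
$args = array(
    'method' => $method,
    'api_key' => $key,
    'text' => $text,
    'format' => $format
);

// Build the URL-encoded POST body; $name avoids overwriting $key above.
$data = "";
foreach ($args as $name => $value) {
    $data .= ($data != "") ? "&" : "";
    $data .= urlencode($name) . "=" . urlencode($value);
}

// Send the request with cURL and return the response as a string.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $zemantaURL);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
if ($response === false) {
    // curl_exec returns false on failure; report the cURL error.
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);
?>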
Program 1: Zemanta API usage in PHP with cURL.
<rsp>
  <status>ok</status>
  <articles>
    <article>
      <url>http://abcnews.go.com/Technology/story?id=5255072&amp;page=1</url>
      <confidence>0.048289</confidence>
      <published_datetime>2008-06-26T19:12:59Z</published_datetime>
      <title>Seeds of Life Found in Martian Soil</title>
      <zemified>0</zemified>
    </article>
  </articles>
  <markup>
    <text>The <a class="zem_slink" href="http://en.wikipedia.org/wiki/Phoenix_%28spacecraft%29">Phoenix
      Mars Lander</a> has successfully deployed its
      <a class="zem_slink" href="http://en.wikipedia.org/wiki/Robotic_arm">robotic arm</a> and tested
      other instruments including a laser designed to detect dust, clouds, and fog. The arm will be
      used to dig up samples of the
      <a class="zem_slink" href="http://en.wikipedia.org/wiki/Mars">Martian</a> surface which will be
      analyzed as a possible habitat for life.</text>
    <links>
      <link>
        <confidence>0.006165</confidence>
        <anchor>robotic arm</anchor>
        <target>
          <url>http://en.wikipedia.org/wiki/Robotic_arm</url>
          <type>wikipedia</type>
          <title>Robotic arm</title>
        </target>
      </link>
    </links>
  </markup>
  <images>
    <image>
      <description>An artist’s rendition of the Phoenix Mars probe during landing. The sophisticated
        landing system on Phoenix allows the spacecraft to touch down within 10 km (6.2 miles) of the
        targeted landing area. Thrusters are started when the lander is 570 m (1900 feet) above the
        surface. The navigation system is capable of detecting and avoiding hazards on the surface of
        Mars.</description>
      <attribution>Image via
        <a href="http://commons.wikipedia.org/wiki/Image:Phoenix_landing.jpg">Wikipedia</a></attribution>
      <license>Public domain</license>
      <confidence>0.99</confidence>
      <source_url>http://commons.wikipedia.org/wiki/Image:Phoenix_landing.jpg</source_url>
      <url_l>http://upload.wikimedia.org/wikipedia/commons/6/6a/Phoenix_landing.jpg</url_l>
      <url_l_w>5200</url_l_w>
      <url_l_h>4800</url_l_h>
    </image>
  </images>
  <keywords>
    <keyword>
      <confidence>0.506297</confidence>
      <name>Mars</name>
      <scheme>general</scheme>
    </keyword>
  </keywords>
  <categories>
    <category>
      <confidence>0.195886</confidence>
      <categorization>dmoz</categorization>
      <name>Top/Science/Astronomy/Solar_System/Planets/Mars</name>
    </category>
  </categories>
  <signature><div class="zemanta-pixie"><a class="zemanta-pixie-a"
    href="http://reblog.zemanta.com/zemified/40b3d04b-5248-4256-a22b-c07ba38b2d9f/"
    title="Zemified by Zemanta"><img class="zemanta-pixie-img"
    src="http://img.zemanta.com/reblog_e.png?x-id=40b3d04b-5248-4256-a22b-c07ba38b2d9f"
    alt="Zemanta Pixie" /></a></div></signature>
  <rid>40b3d04b-5248-4256-a22b-c07ba38b2d9f</rid>
</rsp>
Program 2: Brief Zemanta response in XML format.
References
1. Miličić, V.: Case study: Semantic tags. http://www.w3.org/2001/sw/sweo/public/UseCases/Faviki/ (2008)
2. Berners-Lee, T.: Linked data - design issues. http://www.w3.org/DesignIssues/LinkedData.html (2006)
3. Bizer, C., Cyganiak, R., Heath, T.: How to publish linked data on the web.
http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkedDataTutorial/ (2007)