Using OpenCalais API in the context of Linked Data
Eldorina Andreea Alergus
Faculty of Computer Science, Distributed Systems
Abstract. In this paper we discuss the OpenCalais API in the context of
Linked Data. With the growth of Linked Datasets, automating tasks such as
discovering and interlinking data becomes increasingly important. In this work we
survey what OpenCalais offers for linking data.
Keywords: OpenCalais, linked data, Web of Data
1 Introduction
The OpenCalais Web Service automatically generates rich semantic metadata for the
submitted content. OpenCalais analyzes the content using methods such as natural language
processing (NLP) and machine learning, and finds the entities within it (Company, Country,
City, Product, Movie, etc.). Moreover, it finds events (person P was hired at
company C) and facts (person P works for company C) within the text. The
metadata returned in the response is an RDF construct that is also stored centrally.
The metadata gives us the possibility of building maps, networks or graphs by
linking documents to people, geographies, places, companies, etc. These maps can be
used to verify that our content contains what we expect, to tag and organize it,
to create structured folksonomies or to improve site navigation. We can share
our maps with anyone else in the content ecosystem.
The Calais ecosystem is exposed via Linked Data endpoints. We use the term
Linked Data to describe a method of exposing, sharing and connecting data on the
Web via dereferenceable URIs. Once data is linked, we can find other related
data. This is the Semantic Web: it is about interlinking data so that a person or a
machine can explore the web of data. The main idea behind Linked Data is
that we can increase the value and usability of data by connecting it with other
data.
Calais is part of the Linked Open Data (LOD) Cloud and links to the following
assets: DBpedia, Wikipedia, Freebase, Reuters.com, GeoNames, Shopping.com,
IMDB and LinkedMDB. To understand what Calais offers, we must first
understand the concept of Linked Data.
2 Linked Data
As mentioned above, Linked Data is the technique of publishing data on the Web and
interlinking data between different sources. Such data is machine-readable, its meaning is
explicitly defined, it is linked to other external datasets and can also be linked from
external datasets. Linked Data is based on RDF (Resource Description Framework)
documents, which are used to make typed statements that link arbitrary things in the
world. To access the web of data, we use Linked Data browsers (Tabulator,
Disco, RDFViz, BrowseRDF, etc.), which enable navigation between different sources
by following RDF links. For instance, while looking at data about a product, a user may be
interested in information about the company that produces it. Following the
RDF link, the user can navigate to information about that company contained in another
data source.
Tim Berners-Lee outlined a set of rules for publishing data on the Web in a way
that makes all published data part of a single global data space:
1. Name things using URIs (Uniform Resource Identifiers).
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information using the
standards (RDF, SPARQL).
4. Data should be interlinked with other data.
These principles provide a basic recipe for publishing and connecting data using
the infrastructure of the Web while adhering to its architecture and standards.
Linked Data relies on two fundamental technologies: URIs and HTTP. URIs
provide a generic means of identifying any entity. Entities identified by URIs
that use the http:// scheme can be looked up by dereferencing the URI. Dereferencing a
URI is the act of retrieving a representation of the resource identified by that URI.
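Dereferencing can be illustrated with a minimal Python sketch using only the standard library. It assumes the server supports HTTP content negotiation and will return RDF/XML when asked for the media type application/rdf+xml; the helper names build_deref_request and dereference are hypothetical and not part of any Linked Data library.

```python
import urllib.request

# Media type requested from the server (assumption: it supports
# content negotiation and can serve RDF/XML).
RDF_XML = "application/rdf+xml"


def build_deref_request(uri: str, accept: str = RDF_XML) -> urllib.request.Request:
    """Build an HTTP GET request asking for an RDF representation of `uri`."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("only HTTP(S) URIs can be dereferenced this way")
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


def dereference(uri: str, timeout: float = 10.0):
    """Dereference `uri` and return (final URL, response body)."""
    req = build_deref_request(uri)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # urlopen follows redirects such as 303 See Other, so geturl()
        # reports the document that actually describes the resource.
        return resp.geturl(), resp.read()
```

With network access, dereference("http://dbpedia.org/resource/Paris") would fetch the description of Paris from DBpedia. Only the request construction is exercised offline.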
To URIs and HTTP we add a third technology necessary for the Web of Data: RDF.
Just as HTML provides the means to structure and link documents on the
Web, RDF provides a graph-based data model to structure and link data that describes
things in the world.
In RDF, data has the form of triples: subject, predicate, object. The subject is a
URI that identifies a resource; the object is either a URI or a literal (for example,
a string). The predicate, also represented by a URI, describes how the subject and
the object are related.
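The triple structure can be modeled directly. The following Python sketch defines hypothetical URIRef, Literal and Triple types (they only mimic the naming of real RDF libraries and are not their API) and serializes a triple as one line of N-Triples, a line-based RDF syntax.

```python
from typing import NamedTuple, Union


class URIRef(NamedTuple):
    """A resource identified by a URI."""
    value: str


class Literal(NamedTuple):
    """A literal value, such as a plain string."""
    value: str


class Triple(NamedTuple):
    subject: URIRef                # the resource being described
    predicate: URIRef              # the relationship, itself a URI
    obj: Union[URIRef, Literal]    # another resource or a literal


OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def _term(node: Union[URIRef, Literal]) -> str:
    if isinstance(node, URIRef):
        return f"<{node.value}>"
    # Escape backslashes, quotes and newlines as N-Triples requires.
    escaped = (node.value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n"))
    return f'"{escaped}"'


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a line of N-Triples."""
    return f"{_term(t.subject)} {_term(t.predicate)} {_term(t.obj)} ."
```

For example, a triple whose subject is http://dbpedia.org/resource/Paris, whose predicate is OWL_SAME_AS and whose object is http://sws.geonames.org/2988507/ is exactly the kind of RDF link between datasets discussed in this paper.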
A linked dataset is a collection of data, published and maintained by a single
provider, available as RDF on the Web, where at least some of the resources in the
dataset are identified by dereferenceable URIs (http://rdfs.org/ns/void/html). The
image below shows the Linked Open Data Cloud, with the available datasets
and the links between them.
By publishing data on the Web according to the Linked Data principles, we add
our data to a global data space, which allows the data to be discovered and used by
various applications. To publish a dataset as Linked Data on the Web, we must follow
three basic steps:
- Assign URIs to the entities described by the dataset and provide for
dereferencing these URIs into RDF representations.
- Set RDF links to other data sources on the Web.
- Provide metadata about the published data so that clients can evaluate the
quality of the published data.
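The three steps can be sketched in Python as a function that emits N-Triples. The namespace http://example.org/data/ and the helper names are hypothetical; owl:sameAs is used for the outgoing links and the VoID vocabulary (void:Dataset) plus dcterms:title for the dataset metadata. Serving these URIs over HTTP, the dereferencing half of step 1, is not shown.

```python
from __future__ import annotations

from urllib.parse import quote

BASE = "http://example.org/data/"  # hypothetical namespace owned by the publisher
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
VOID_DATASET = "http://rdfs.org/ns/void#Dataset"
DCT_TITLE = "http://purl.org/dc/terms/title"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def mint_uri(local_id: str) -> str:
    """Step 1: give each entity an HTTP URI in our own namespace."""
    return BASE + "resource/" + quote(local_id)


def publish(entities: dict[str, str], links: dict[str, str], title: str) -> list[str]:
    """Emit N-Triples for entities (id -> label), outgoing RDF links
    (id -> external URI) and dataset-level metadata."""
    lines = []
    for local_id, label in entities.items():
        s = mint_uri(local_id)
        lines.append(f"<{s}> <{RDFS_LABEL}> {literal(label)} .")
        if local_id in links:
            # Step 2: an RDF link to the same thing in another data source.
            lines.append(f"<{s}> <{OWL_SAME_AS}> <{links[local_id]}> .")
    # Step 3: describe the dataset itself with the VoID vocabulary.
    ds = BASE + "dataset"
    lines.append(f"<{ds}> <{RDF_TYPE}> <{VOID_DATASET}> .")
    lines.append(f"<{ds}> <{DCT_TITLE}> {literal(title)} .")
    return lines
```

Keeping the three steps visible in one function makes it easy to check that a published dataset has URIs, outgoing links and self-description.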
In the following, we discuss how rich semantic metadata can be created for some content.
3 OpenCalais Web Service
As mentioned in the Introduction, the OpenCalais Web Service automatically
generates rich semantic metadata for the submitted content. It uses natural language
processing (NLP), machine learning and other methods to analyze content and returns
the entities it finds, such as cities, countries and people, with dereferenceable
Linked Data style URIs. The events, facts and entity types are defined in the
OpenCalais RDF Schemas (http://s.opencalais.com/1/pred/asf/1/pred/.html).
To get started with OpenCalais, you first need an API key. To get
the key, you must register at http://www.opencalais.com/user/register. The Calais WS
can be called from .NET, Java, PHP, etc. using SOAP or REST. We can also use the Calais
Viewer to see how it works and what the output of a Calais call looks like.
When we call the Calais API, we must provide some input
parameters, which must be HTTP-encoded. The WSDL of the service we invoke is at
http://api.opencalais.com/enlighten/?wsdl. Below we explain what is needed to call
the service via SOAP.
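The requirement that input parameters be HTTP-encoded matters most for the REST variant. The Python sketch below shows how the three inputs could be form-encoded as a POST body. The endpoint URL and the field names licenseID, content and paramsXML are assumptions modeled on the legacy REST interface, which is no longer available; the code only builds the body and sends nothing.

```python
from urllib.parse import urlencode

# Assumed endpoint of the legacy (now retired) REST interface; not called here.
REST_ENDPOINT = "http://api.opencalais.com/enlighten/rest/"


def encode_enlighten_params(license_id: str, content: str, params_xml: str) -> bytes:
    """Form-encode the three enlighten inputs as an
    application/x-www-form-urlencoded POST body.

    The field names are assumptions mirroring the SOAP parameters."""
    fields = {
        "licenseID": license_id,
        "content": content,        # spaces, '<', '&' and non-ASCII get percent-encoded
        "paramsXML": params_xml,
    }
    return urlencode(fields).encode("ascii")
```

Encoding matters because the content and paramsXML routinely contain characters such as &, < and non-ASCII letters (Île-de-France) that would otherwise corrupt the request.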
The enlighten method, which calls the OpenCalais web service via SOAP,
has three parameters:
- licenseId. This is your API key, which you can get from the Calais site.
- paramsXML. These are the input parameters of the service, in XML format.
More information about the input parameters can be found at
- content. This is the content on which the extraction will be performed.
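A paramsXML document can be built with the standard library's ElementTree. This is a minimal sketch assuming the layout of the legacy OpenCalais examples: a c:params root in the namespace http://s.opencalais.com/1/pred/ with a c:processingDirectives element carrying c:contentType and c:outputFormat. The values text/txt and xml/rdf are likewise assumptions taken from those examples.

```python
import xml.etree.ElementTree as ET

# Namespace used by the legacy OpenCalais parameter examples (assumption).
C_NS = "http://s.opencalais.com/1/pred/"


def q(name: str) -> str:
    """Qualify a local name with the Calais namespace ({ns}name form)."""
    return f"{{{C_NS}}}{name}"


def build_params_xml(content_type: str = "text/txt",
                     output_format: str = "xml/rdf") -> str:
    """Build a minimal paramsXML document for the enlighten call."""
    ET.register_namespace("c", C_NS)
    root = ET.Element(q("params"))
    # How the submitted content should be read and which output to return.
    ET.SubElement(root, q("processingDirectives"), {
        q("contentType"): content_type,
        q("outputFormat"): output_format,
    })
    # Left empty in this sketch.
    ET.SubElement(root, q("userDirectives"))
    ET.SubElement(root, q("externalMetadata"))
    return ET.tostring(root, encoding="unicode")
```

Generating the XML rather than concatenating strings guarantees that attribute values are escaped and the document is well-formed.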
To start, we use a simple text as content: The Palace of Versailles, or simply
Versailles, is a royal château in Versailles, the Île-de-France region of France.
When the château was built, Versailles was a country village; today, however, it is a
suburb of Paris, some twenty kilometers southwest of the French capital. The court of
Versailles was the center of political power in France from 1682, when Louis XIV
moved from Paris, until the royal family was forced to return to the capital in October
1789 after the beginning of French Revolution. Versailles is therefore famous not only
as a building, but as a symbol of the system of absolute monarchy of the Ancien Régime.
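We call the service from C# as follows: we add to our project a service reference to
the Calais WSDL, then call the service:
// CalaisReference is the proxy generated from the Calais WSDL service reference
CalaisReference.calaisSoapClient client = new CalaisReference.calaisSoapClient();
// m_Licence: API key; m_Content: text to analyze; m_Params: paramsXML
string response = client.Enlighten(m_Licence, m_Content, m_Params);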
It is preferable to read m_Content and m_Params from files, and the response (an
RDF document) should also be saved to a file.
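The file-based workflow can be sketched in Python. The service call is passed in as a plain function with the hypothetical signature (licence, content, params) -> RDF string, mirroring the argument order of the C# call; any real client, or a stub in tests, can be plugged in.

```python
from pathlib import Path
from typing import Callable

# Hypothetical signature mirroring Enlighten(licence, content, params).
EnlightenFn = Callable[[str, str, str], str]


def enlighten_from_files(enlighten: EnlightenFn, licence: str,
                         content_path: Path, params_path: Path,
                         out_path: Path) -> str:
    """Read the content and paramsXML from files, call the service
    and store the returned RDF in `out_path`."""
    content = content_path.read_text(encoding="utf-8")
    params = params_path.read_text(encoding="utf-8")
    rdf = enlighten(licence, content, params)
    out_path.write_text(rdf, encoding="utf-8")
    return rdf
```

Injecting the call as a function keeps the file handling independent of SOAP or REST and lets the saved RDF be reprocessed later without calling the service again.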
The entities found are: City (Paris, France), Country (France) and Facility (Palace
of Versailles). If we look at the URI
http://d.opencalais.com/er/geo/city/ralg-geo1/797c999a-d455-520d-e5cf-04ca7fb255c1.html,
we can say that the entity (City) has been disambiguated, because the URI contains /er/.
Entities whose URIs contain /em/ are not disambiguated by OpenCalais. If we open the
link in a browser, we see that it was linked to other datasets (OpenCalais is linked to
Freebase, DBpedia, GeoNames and Linked IMDB), such as http://dbpedia.org/resource/Paris
and http://sws.geonames.org/2988507/, and it has also been assigned a Web link.
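The /er/ versus /em/ rule described above can be encoded as a small helper. This Python sketch applies only that path-segment heuristic; the function name calais_entity_status is hypothetical and the check does not query OpenCalais.

```python
from urllib.parse import urlparse


def calais_entity_status(uri: str) -> str:
    """Classify an OpenCalais entity URI by its first path segment:
    /er/ means disambiguated (resolved), /em/ means not disambiguated."""
    first = urlparse(uri).path.lstrip("/").split("/", 1)[0]
    if first == "er":
        return "disambiguated"
    if first == "em":
        return "not disambiguated"
    return "unknown"  # not an OpenCalais entity URI of either kind
```

Applied to the City URI above, the helper returns "disambiguated", while a URI such as http://dbpedia.org/resource/Paris yields "unknown".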
For the detected entities, OpenCalais provides an entity relevance score (shown for
each entity in the screenshots below). The relevance capability detects the
importance of each unique entity and assigns it a relevance score in the range 0-1 (1
being the most relevant and important). We see that France is the most relevant
entity.
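Once the relevance scores have been extracted from the response, ranking entities is a simple sort. The Python sketch below assumes the scores are already available as a mapping from entity name to score; parsing them out of the RDF is not shown, and rank_by_relevance is a hypothetical helper.

```python
from __future__ import annotations


def rank_by_relevance(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort entities by relevance score, highest first.

    Scores are expected in [0, 1]; ties are broken alphabetically so
    the ordering is deterministic."""
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"relevance of {name!r} outside [0, 1]: {score}")
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

The first element of the result is the most relevant entity, which is how a client would pick out France in the example above.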
For a better understanding of how Calais can be used, we take a look at
http://gvlt.appspot.com/opencalais-geo/. In this project, the Calais API is used to
identify geographic references in a text and display them on an OpenLayers map.
Calais is called with JSON output, and all the processing is done on the client side, in the
browser.
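Selecting geographic references from JSON output can be sketched as follows. The layout is an assumption based on the legacy Calais JSON format: a top-level object keyed by URI whose entity entries carry "_typeGroup", "_type" and "name" fields. City and Country are the types seen in the example above; the other type names in GEO_TYPES are also assumptions. The sketch is in Python for consistency with the other examples, although the project described runs in the browser.

```python
from __future__ import annotations

import json

# Assumed OpenCalais type names for geographic entities.
GEO_TYPES = {"City", "Country", "ProvinceOrState", "Region", "Continent"}


def geographic_entities(calais_json: str) -> list[tuple[str, str]]:
    """Return sorted (type, name) pairs for geographic entities found in
    a Calais JSON response (assumed layout, see lead-in)."""
    data = json.loads(calais_json)
    found = set()
    for item in data.values():
        if not isinstance(item, dict):
            continue
        if item.get("_typeGroup") == "entities" and item.get("_type") in GEO_TYPES:
            found.add((item["_type"], item.get("name", "")))
    return sorted(found)
```

The resulting names would then be geocoded, or matched to GeoNames links, before being placed on the map.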
OpenCalais can also help content managers create smart indexes. Instead
of indexing by keywords, you can index by referenced subject. If you have a
collection of unstructured documents, in a website for example, you can use
OpenCalais to help manage and cross-reference them. By using the OpenCalais
API, a website's side navigation bar can suggest other related documents based on the
conceptual subject, instead of the word matching used by most indexes. By taking
the RDF/XML document returned by the OpenCalais HTTP interface and storing it in
an RDF store, you can enable an application to find documents related to anything in
the RDF store (http://www.devx.com/semantic/Article/38517/1763/page/2).
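The idea of indexing by subject rather than by keyword can be illustrated with an in-memory inverted index from entity URIs to documents, a simplified stand-in for the RDF store mentioned above. This Python sketch assumes the entity URIs of each document have already been extracted from the OpenCalais output; build_index and related_documents are hypothetical helpers.

```python
from __future__ import annotations

from collections import defaultdict


def build_index(doc_entities: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert document -> entity URIs into entity URI -> documents."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc, entities in doc_entities.items():
        for entity in entities:
            index[entity].add(doc)
    return dict(index)


def related_documents(doc: str, doc_entities: dict[str, set[str]],
                      index: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank other documents by the number of entities they share with `doc`."""
    shared: dict[str, int] = defaultdict(int)
    for entity in doc_entities[doc]:
        for other in index.get(entity, ()):
            if other != doc:
                shared[other] += 1
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))
```

Because documents are matched on entity URIs, two texts that mention Paris with different wording still end up related, which plain keyword matching would miss.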
Nowadays, the Web means more than just putting data on the Web; it means
interlinking and sharing data as we share documents. The Web is seen as a growing
global graph, built on the assumption that the value and usefulness of data
increase as links are created between data. This is what Linked Data means: using
the Web to create typed links between data from different sources.
Calais is a rapidly growing toolkit of capabilities that allows you to readily
incorporate state-of-the-art semantic functionality within your blog, content
management system, website or application. In this paper we have described how the
Calais WS can be invoked and what its RDF output offers. OpenCalais
represents an important step toward the Semantic Web. With OpenCalais, computers
could do the research for you, combing through and comparing company names,
locations and rumored or real transactions in real time to give you answers in a way that
keyword search simply cannot.