OpenCalais in Linked Data context



Published in: Technology, Education

Using OpenCalais API in the context of Linked Data

Eldorina Andreea Alergus
Faculty of Computer Science, Distributed Systems

Abstract. In this paper we discuss the OpenCalais API in the context of Linked Data. With the growth of linked datasets, automating tasks such as discovering or interlinking data becomes more and more important. We survey what OpenCalais offers for linking data.

Keywords: OpenCalais, linked data, Web of Data

1 Introduction

The OpenCalais Web Service automatically generates rich semantic metadata for submitted content. OpenCalais analyses the content using methods such as natural language processing (NLP) and machine learning, and finds the entities (Company, Country, City, Product, Movie, etc.) within it. Beyond entities, it also finds events (person P was hired at company C) and facts (person P works for company C) in the text. The metadata returned in the response is an RDF construct that is also centrally stored.

The metadata lets us build maps, networks or graphs by linking documents to people, geographies, places, companies, etc. These maps can be used to verify that our content contains what we expect, to tag and organize it, to create structured folksonomies, or to improve site navigation. We can share our maps with anyone else in the content ecosystem.

The Calais ecosystem is exposed via Linked Data endpoints. We use the term Linked Data to describe a method of exposing, sharing and connecting data on the Web via dereferenceable URIs. [15] Given linked data, we can find other related data. This is the Semantic Web: it is about interlinking data so that a person or a machine can explore the web of data. The main idea behind linked data is that we can increase the value and usability of data by connecting it with other related data.
Calais is part of the Linked Open Data (LOD) Cloud, and it links to the following assets: DBpedia, Wikipedia, Freebase, GeoNames, IMDB, LinkedMDB. To understand what Calais offers, we must first understand the concept of Linked Data.

2 Linked Data

As we said above, Linked Data is the technique of publishing data on the Web and interlinking data between different sources. The data is machine-readable, its meaning is explicitly defined, it is linked to other external data sets, and it can in turn be linked from external data sets. Linked Data is based on RDF (Resource Description Framework) documents, which are used to make typed statements that link arbitrary things in the world.

To access the web of data, we use Linked Data browsers (Tabulator, Disco, RDFViz, BrowseRDF, etc.), which enable navigation between different sources by following RDF links. For instance, while looking at data about a product, a user may be interested in information about the company that produces it. By following the RDF link, the user can navigate to information about that company contained in another dataset.

Berners-Lee outlined a set of rules for publishing data on the Web in such a way that all published data becomes part of a single global data space:

1. Name things using URIs (Uniform Resource Identifiers).
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).
4. Interlink the data with other data.

These principles provide a basic recipe for publishing and connecting data using the infrastructure of the Web while adhering to its architecture and standards.

Linked Data relies on two fundamental technologies: URIs and HTTP. URIs provide a generic means of identifying any existing entity. Entities identified by URIs that use the http:// scheme can be looked up by dereferencing the URI.
Dereferencing a URI is the act of retrieving a representation of the resource identified by that URI. [16]
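Dereferencing can be illustrated with a minimal Python sketch. It assumes that the server publishing the URI supports HTTP content negotiation and can return RDF/XML; the URI, the helper names and the choice of media type are illustrative assumptions and are not taken from the Calais documentation.

```python
from urllib.request import Request, urlopen

# Media type for RDF/XML; a server may offer other RDF serializations as well.
RDF_XML = "application/rdf+xml"


def build_dereference_request(uri: str, media_type: str = RDF_XML) -> Request:
    """Build an HTTP GET request that asks for an RDF representation of `uri`."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("only HTTP(S) URIs can be dereferenced this way")
    # The Accept header tells the server which representation we prefer.
    return Request(uri, headers={"Accept": media_type})


def dereference(uri: str, timeout: float = 10.0) -> str:
    """Retrieve the representation of the resource identified by `uri`."""
    with urlopen(build_dereference_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical URI, used only for illustration (no network call is made here).
request = build_dereference_request("http://example.org/resource/Versailles")
```

Calling dereference(...) with a real Linked Data URI would perform the lookup over the network. The request is built in a separate function so that the negotiation step can be inspected without contacting any server.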
To URIs and HTTP we add a third technology that is necessary for the Web of Data: RDF. Just as HTML provides the means to structure and link documents on the Web, RDF provides a graph-based data model to structure and link data that describes things. In RDF, data takes the form of a triple: subject, predicate, object. The subject is a URI that identifies a resource; the object is either another URI or a string literal. The predicate describes how the subject and the object are related, and is also represented by a URI.

A linked dataset is a collection of data, published and maintained by a single provider, available as RDF on the Web, where at least some of the resources in the dataset are identified by dereferenceable URIs.

The image below shows the Linked Open Data Cloud, with the available datasets and the links between them.

By publishing data on the Web according to the Linked Data principles, we add our data to a global data space, which allows the data to be discovered and used by various applications. To publish a data set as Linked Data on the Web, we must follow three basic steps:
- Assign URIs to the entities described by the dataset and provide for dereferencing these URIs into RDF representations.
- Set RDF links to other data sources on the Web.
- Provide metadata about the published data so that clients can assess its quality.

Next, we discuss how we can create rich semantic metadata for some content.

3 OpenCalais Web Service

As we already said in the Introduction, the OpenCalais Web Service automatically generates rich semantic metadata for submitted content. It uses natural language processing (NLP), machine learning and other methods to analyze content and return the entities it finds, such as cities, countries and people, with dereferenceable Linked Data style URIs. The events, facts and entity types are defined in the OpenCalais RDF Schemas.

To get started with OpenCalais, you first need an API key. To get the key, you must register on the Calais site. The Calais WS can be called from .NET, Java, PHP, etc. using SOAP or REST. We can also use the Calais Viewer to see how it works and what the output of a Calais call looks like.

When we make a call to the Calais API, we must provide some input parameters, which must be HTTP-encoded. We will explain what is needed to call the service via SOAP. The Enlighten method, which calls the OpenCalais web service via SOAP, takes three parameters:

- licenseId. This is your API key, which you can get from the Calais site.
- paramsXML. These are the input parameters of the service, in XML format. More information about the input parameters can be found at calls/input-parameters.
- content. This is the content on which the extraction will be performed.

To start, we use a simple text as content:

The Palace of Versailles, or simply Versailles, is a royal château in Versailles, the Île-de-France region of France. When the château was built, Versailles was a country village; today, however, it is a suburb of Paris, some twenty kilometers southwest of the French capital. The court of Versailles was the center of political power in France from 1682, when Louis XIV moved from Paris, until the royal family was forced to return to the capital in October 1789 after the beginning of the French Revolution. Versailles is therefore famous not only as a building, but as a symbol of the system of absolute monarchy of the Ancien Régime.

We call the service from C# as follows: we add a service reference to the Calais WSDL in our project, then call the service:

    // Proxy class generated from the Calais WSDL service reference
    CalaisReference.calaisSoapClient client = new CalaisReference.calaisSoapClient();
    // Arguments: the API key, the text to analyze, the XML input parameters
    string response = client.Enlighten(m_Licence, m_Content, m_Params());

It is better to read m_Content and m_Params from a file, and the response (an RDF document) should also be saved to a file.

The entities found are: City (Paris, France), Country (France) and Facility (Palace of Versailles). If we look at the URI geo1/797c999a-d455-520d-e5cf-04ca7fb255c1.html, we can say that the entity (City) has been disambiguated, because the URI contains /er/. Entities whose URIs contain /em/ have not been disambiguated by OpenCalais. If we open the link in a browser, we see that it was linked to other data sets (OpenCalais is linked to Freebase, DBpedia, GeoNames, Linked IMDB) and that it has also been assigned a Web link.

For the detected entities, OpenCalais provides an entity relevance score (shown for each in the screenshots below). The relevance capability detects the importance of each unique entity and assigns a relevance score in the range 0-1 (1
being the most relevant and important). We see that France is the most relevant (69%).

For a better understanding of how Calais can be used, we take a look at a project in which the Calais API is used to identify geographic references in a text and display them on an OpenLayers map. Calais is used with JSON output, and all the processing is done on the client side, in the browser.

OpenCalais can also help content managers create smart indexes. Instead of indexing by keywords, you can index by referenced subject. If you have a collection of unstructured documents, in a website for example, you can use OpenCalais to help manage them and cross-reference them. Using the OpenCalais API, a website's side navigation bar can suggest related documents based on the conceptual subject, instead of the word matching used by most indexes. By taking the RDF/XML document returned by the OpenCalais HTTP interface and storing it in an RDF store, you can enable an application to find documents related to anything in the RDF store.

4 Conclusions

Nowadays, the Web means more than just putting data online; it means interlinking and sharing data as we share documents. The Web is seen as a growing global graph. It started with the assumption that the value and usefulness of the data
increases by creating links between the data. This is what Linked Data means: using the Web to create typed links between data from different sources.

Calais is a rapidly growing toolkit of capabilities that allows you to readily incorporate state-of-the-art semantic functionality within your blog, content management system, website or application. We have described in this paper how the Calais WS can be invoked and what the RDF output offers us. OpenCalais represents an important step toward the Semantic Web. With OpenCalais, computers could do the research for you, combing through and comparing company names, locations and rumored or real transactions in real time, to give you answers in a way that keyword search simply cannot.

5 References

[1] C. Bizer, R. Cyganiak, T. Heath, How to Publish Linked Data on the Web
[2] T. Heath, An Introduction to Linked Data, 2009
[3] C. Bizer, T. Heath, T. Berners-Lee, Linked Data - The Story So Far
[4] M. Watson, Practical Semantic Web Programming With AllegroGraph, 2009
[5] K. Alexander, R. Cyganiak, M. Hausenblas, J. Zhao, Describing Linked Datasets
[6]
[7]
[8]
[9] extract-entities-facts-and-events-in-4-minutes/
[10]
[11]
[12]
[13] ets
[14]
[15]
[16]