The OpenCalais Web Service & Open API
• A Thomson Reuters initiative to connect all the world’s
• A free service that brings new efficiencies and
productivity to publishers and content curators.
• The fastest, easiest way to categorize your content, and
tag the entities, facts and events therein.
• Progress since February 2008:
• 18,000 developers
• 20+ publishers using OpenCalais
• 50+ cool new apps and services created
• 4+ million documents per day processed
Free Metadata Generation
1. You feed your content into our
2. It categorizes the stories; finds
the people, places, companies,
facts and events, and then
returns that metadata to you
3. Along with the metadata, it
returns links to free data on the
open Web (e.g., Wikipedia, the CIA
World Factbook, IMDb)
4. You use the metadata to
streamline content ops, enhance
your content, create topic hubs
on the fly, improve search, etc.
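Step 4 can be illustrated with a minimal sketch of one use, building topic hubs from returned metadata. The record layout, document ids and entity names below are hypothetical placeholders chosen for illustration; the service's actual response format is not shown here and will differ.

```python
from collections import defaultdict

# Hypothetical, simplified metadata records (an assumption for this sketch,
# not the real response format).
tagged_docs = [
    {"id": "story-1", "entities": [("Company", "Thomson Reuters"), ("City", "New York")]},
    {"id": "story-2", "entities": [("Company", "Thomson Reuters"), ("Person", "Jane Doe")]},
    {"id": "story-3", "entities": [("City", "New York")]},
]

def build_topic_hubs(docs):
    """Map each (entity type, entity name) pair to the ids of documents mentioning it."""
    hubs = defaultdict(list)
    for doc in docs:
        # De-duplicate repeated mentions of the same entity within one document.
        for entity in set(doc["entities"]):
            hubs[entity].append(doc["id"])
    return dict(hubs)

hubs = build_topic_hubs(tagged_docs)
print(hubs[("Company", "Thomson Reuters")])  # ['story-1', 'story-2']
```

Each key in `hubs` could back one topic page, with the listed stories as its contents.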
1. Cut and paste a business news story into the viewer,
and hit submit.
2. View the semantic markup (hover over underlined
items to see relevance, for instance).
3. Expand the extracted entities, facts and events on
the left-hand rail.
4. Click on one of the companies in the list on the left,
to view the OpenCalais / Thomson Reuters asset on
that company in the Linked Data cloud.
5. Click the ‘SameAs’ links at the bottom to find more
data on the Linked Data cloud.
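Following 'SameAs' links by hand, as in step 5, amounts to walking a small graph of equivalent identifiers. The sketch below shows that idea on an in-memory table; the example.org URIs and the `SAME_AS` table are hypothetical placeholders, not real OpenCalais or Linked Data addresses.

```python
from collections import deque

# Hypothetical sameAs links between placeholder URIs (assumption for this
# sketch); real URIs would come from the metadata returned for an entity.
SAME_AS = {
    "http://example.org/calais/company/acme": ["http://example.org/dbpedia/Acme"],
    "http://example.org/dbpedia/Acme": ["http://example.org/freebase/acme"],
    "http://example.org/freebase/acme": [],
}

def equivalent_uris(start, links):
    """Return every URI reachable from `start` via sameAs links, in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        for target in links.get(uri, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen

uris = equivalent_uris("http://example.org/calais/company/acme", SAME_AS)
print(len(uris))  # 3
```

A breadth-first walk is used so that directly linked sources are listed before more distant ones.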
How Metadata Connects You to the Open Web
The Linked Data Cloud – December, 2008
Your Content & The OpenCalais Process
[Diagram: (1) unstructured content goes to Calais, which extracts entities, facts and events, and metadata is returned to the user with keys; (2) the keys are linked to a range of open and partner Linked Data assets; (3) which provide other Linked Data pointers.]
• Aggregate & organize content in new ways.
• Automatically produce topic-based sites.
• Improve search functionality.
• Generate better content recommendations.
• Publish reviews, articles & blog posts for programmatic use on the open Web.
• Content Triage
• Hyper-local news
• Contextual Ad Placement
New Publishers to tap OpenCalais include
• The New Republic: The new TNR.com uses OpenPublish, an
OpenCalais-enabled, Drupal-powered CMS, to increase editorial productivity
& drive reader engagement.
• Al Jazeera English’s new blogging network: uses
OpenCalais for content operations & tagging; features Al Jazeera
correspondents from around the world.
• Slate Magazine’s News Dots Network: visualizes the
most recent topics in the news as a concise network of related topics.
• I *heart* Sea: a hyper-local news aggregation site that collects some
of the best blogs in Seattle, especially those serving the Capitol Hill area.
Media Monitoring and Intelligence Tools
• Meltwater: a rapidly growing SaaS-based provider in the Corporate IR
& PR Services
• Tattler (app): an open source topic monitoring tool for today's Web.
Tattler finds and aggregates content from the Web on topics users ask it to
• Interceder: a social media monitoring tool that makes it easy to track
trending topics and search the latest content from major news Web sites,
blogs, Twitter and YouTube.
• AskJot: a tool for analyzing web pages for keywords, and displaying
them as links to search results from services around the Web.
New Content Experiences / Open Research
• Feedly: a Firefox plug-in that presents user-selected inputs from Google
Reader, Twitter, RSS feeds, etc. in an easy-to-read magazine-style format.
• OpenPublish: a new CMS based on Drupal that integrates
OpenCalais from the ground up. OpenPublish is tailored to the needs of
today's online publishers & media providers.
• DocumentCloud: founded by reporters from The NYT and
ProPublica, and funded by the Knight Foundation, DocumentCloud will offer
public access to news reporters’ original source materials.
• MediaCloud: an open research tool from Harvard’s Berkman Center
that aggregates mainstream media and blogs to enable researchers to
identify how and where news coverage starts, what we’re missing, etc.
Why Thomson Reuters Cares
• Its mission is to connect all the world’s business-
relevant content to provide professionals with ‘intelligent
• The days of surviving
as a ‘walled garden’ of
content are over.
• ‘Crowdsourcing’ Q&A
creates faster, better,