Thomson Reuters Calais Web Service & the Linked Content Economy
Executive Summary: The rise of the Internet has brought dramatic change to the publishing
industry. While newspapers in particular struggle to adapt, advertisers are cutting budgets,
seeking new efficiencies and increasingly using the Web to go straight to the consumer.

Semantic technologies and new open data resources on the Web give both publishers and
advertisers new tools and services that can help them succeed. The Thomson Reuters Calais
Web service, found at OpenCalais.com, is one such service.

Calais identifies and automatically tags the people, places, companies, facts and events in text.
It then forges connections between those entities and relevant data sets, media files, Wikipedia
entries and more on the open Web. Finally, it gives publishers a new way to share that tagged
content with next generation search engines, news aggregators and others in the content
ecosystem.

    “Calais turns static text into ‘Smart Media’ that is enriched with open data and connected
    to a dynamic ‘Linked Content Economy’.”
        -Thomas (Tom) Tague, Calais initiative lead

Armed with this powerful new tool, forward-looking publishers are automating time-consuming
content operations and increasing editorial productivity. They are also enhancing the value of
their content, improving their user experience and preparing to reach more readers in tomorrow’s
media landscape – increasingly called the ‘linked content economy.’

Background: Calais is a strategic initiative at Thomson Reuters to advance the interoperability of
content and support the company’s mission to provide pervasive intelligent information to its
customers. Calais uses Natural Language Processing to give publishers free metatagging
services, developer tools and an open standard for the generation of semantic content.

The latest update to Calais – Calais 4.0 – is a significant advance on the initiative’s goals. The
Calais team originally set out to help developers, bloggers and publishers automatically tag their
content to improve search and navigation, and enable new reader engagement features.

With Calais 4.0, the Calais Web service goes beyond metatagging to help publishers enhance
their content, using open data from sources like Wikipedia, DBpedia, GeoNames, the Internet
Movie Database (IMDB), Shopping.com and more. It also makes it easy for publishers to use
their metadata to share their content with next generation content consumers – such as search
engines, news aggregators, ‘related stories’ services and more – to ultimately reach more readers.

With these added capabilities, Calais helps content creators and content consumers alike
connect to the rapidly emerging ‘Linked Content Economy’ and deliver ‘Smart Media.’

The Linked Content Economy & Smart Media: The Linked Content Economy is an evolving
ecosystem of enriched and connected content that helps publishers engage readers, improve the
user experience, and – ultimately – better convert readership to revenue.

Linked Content goes beyond ‘link journalism’ (linking to related stories, etc.). It uses metadata to
help publishers create “Smart Media” – content that automatically connects the concepts, people,
companies, etc. it contains to a rich array of related data sets and media assets on the Web.

It then uses metadata to help publishers share their “Smart Media” with the rest of the content
ecosystem, including search engines, news aggregators, ‘related stories’ applications and more.

How it Works:

 1. Publishers submit content to the Calais
    Web service using their Calais API key.

 2. Calais tags each person, place, fact and
    event in the content, making it machine-
    readable and interoperable on the Web.

 3. Each piece of content - and each entity or
    event in that content - is assigned a unique
    identifier (a document ID and many URIs)
    that ties back to the Linked Data Cloud.

 4. Publishers use the metadata Calais returns
    (tags, document IDs and URIs) to enhance
    their content and create features like topic
    pages that improve the reader experience.

 5. Publishers can also use their metadata to
    share their content with next generation
    search engines, news aggregators, etc.
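Step 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Entity and TaggedDocument records, the build_topic_index helper and the example.org URIs are all hypothetical stand-ins for whatever metadata the service actually returns, and nothing here calls the Calais Web service.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    type: str  # e.g. "Company" or "Person"
    name: str
    uri: str   # hypothetical identifier, not a real Calais URI


@dataclass(frozen=True)
class TaggedDocument:
    doc_id: str      # hypothetical document ID
    entities: tuple  # Entity records found in the document


def build_topic_index(documents):
    """Map each entity URI to the sorted IDs of the documents that mention it."""
    index = defaultdict(set)
    for doc in documents:
        for entity in doc.entities:
            index[entity.uri].add(doc.doc_id)
    return {uri: sorted(ids) for uri, ids in index.items()}


# Illustrative metadata, as if two stories had already been tagged.
ibm = Entity("Company", "IBM", "http://example.org/entity/ibm")
amex = Entity("Company", "American Express", "http://example.org/entity/amex")

docs = [
    TaggedDocument("doc-1", (ibm,)),
    TaggedDocument("doc-2", (ibm, amex)),
]

topic_index = build_topic_index(docs)
```

A topic page for IBM would then list every document ID stored under the IBM URI, so the page stays current as new stories are tagged.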



Calais’ participation in this ecosystem is as a platform. Calais lays the foundation on which, in
conjunction with Content Management Systems, users can create a next generation publishing
site, service or community.

Calais adopted the Linked Data standard to build a back-end infrastructure and repository,
enabling linkage between concepts and documents. Linked Data is a standard promulgated by
Sir Tim Berners-Lee. Here are some of the open data assets in the Linked Data cloud.
By embracing the Linked Data standard, and by creating a Calais repository of Linked Data
assets on publicly traded companies, Thomson Reuters has built scaffolding that enables Web
sites, social networks and other content-rich applications to navigate between previously separate
silos of data and information. Here’s how it works:

1.) When Calais processes an article, it extracts many named entities. For some classes of
    named entities, such as companies, Calais now also returns an HTTP hyperlink, called a
    Uniform Resource Identifier (URI).
2.) This hyperlink points into the Calais repository, to a machine readable XML page containing
    related content (company description, management team, board of directors, etc.) as well as
    links to related assets in DBpedia, from Thomson Reuters, etc.
3.) This linked data infrastructure forms a web-of-links that applications can navigate and use to
    pull information up for display or integration into the user experience.
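As a rough sketch of what an application might do with such a machine readable page, the Python below parses a small XML document and pulls out the board members and outbound links. The element names (company, board, member, sameAs) and the example.org URI are assumptions made for illustration; the actual layout of pages in the Calais repository is not described here and may differ. The sample is parsed from a string, so no network request is made.

```python
import xml.etree.ElementTree as ET

# Hypothetical page layout, invented for this example.
SAMPLE_XML = """\
<company uri="http://example.org/entity/ibm">
  <name>International Business Machines Corporation</name>
  <board>
    <member>Cathleen Black</member>
    <member>Kenneth Chenault</member>
  </board>
  <sameAs href="http://dbpedia.org/resource/IBM"/>
</company>
"""


def parse_company(xml_text):
    """Extract the fields an application might display or follow."""
    root = ET.fromstring(xml_text)
    return {
        "uri": root.get("uri"),
        "name": root.findtext("name"),
        "board": [m.text for m in root.findall("board/member")],
        "links": [e.get("href") for e in root.findall("sameAs")],
    }


company = parse_company(SAMPLE_XML)
```

Each entry in "links" is itself a URI, which is what lets an application hop from one data set to the next.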

Calais has thus created a lingua-franca to drive content interoperability, and provided a simple
standard for the sharing of rich semantic metadata.

    “Calais provides a transportation layer that enables users to share their semantic
    metadata with downstream consumers like search engines, news aggregators,
    ‘related stories’ applications and more.”
        -Thomas (Tom) Tague, Calais initiative lead

Here’s an example:
A news story breaks on an IBM earnings report. The user wants to find out if IBM has any
affiliation with Warren Buffett of Berkshire Hathaway.


Today such a complex query requires time-consuming research. Search engines can’t hopscotch
through content.
But with Calais:
1. The news application sends the story to Calais.
2. Calais extracts IBM from the news story, ties it to International Business Machines
    Corporation in the Linked Data cloud and returns the URI (i.e. hyperlink) for IBM.
3. The app uses the IBM URI to retrieve the list of Board of Directors members from the
    Reuters.com content in the Calais repository.
4. The app queries the Board members for their other affiliations and finds a member who is
    also on the Board of Coca Cola plus a member who is the CEO of American Express.
5. The app runs a query on shareholders of Coca Cola and finds Berkshire Hathaway.
6. The app runs a query on shareholders of American Express and finds Berkshire Hathaway.

 IBM Corporation
   Board of Directors: Cathleen Black, William Brody, Kenneth Chenault, Michael Eskew

 Cathleen Black
   Other Affiliations: President, Hearst Magazines; Board Member, Coca Cola

 Coca Cola
   Key Stockholders: Berkshire Hathaway

 Kenneth Chenault
   Other Affiliations: CEO, American Express

 American Express
   Key Stockholders: Berkshire Hathaway

 Berkshire Hathaway
   Management Team: Warren Buffett, Charlie Munger
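The IBM example amounts to a path search over linked records. The sketch below hard-codes the relationships from the example in a Python dictionary and uses a breadth-first search to find a chain from IBM to Warren Buffett. The GRAPH structure and the find_path helper are illustrative assumptions; a real application would follow URIs into the repository instead of reading a local dictionary.

```python
from collections import deque

# Relationships taken from the example above, stored as simple adjacency lists.
GRAPH = {
    "IBM": ["Cathleen Black", "William Brody", "Kenneth Chenault", "Michael Eskew"],
    "Cathleen Black": ["Hearst Magazines", "Coca Cola"],
    "Kenneth Chenault": ["American Express"],
    "Coca Cola": ["Berkshire Hathaway"],
    "American Express": ["Berkshire Hathaway"],
    "Berkshire Hathaway": ["Warren Buffett", "Charlie Munger"],
}


def find_path(graph, start, goal):
    """Breadth-first search; returns one shortest chain of links, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = find_path(GRAPH, "IBM", "Warren Buffett")
print(" -> ".join(path))
```

Breadth-first search is used here because it returns a shortest chain, which is usually the affiliation a reader cares about most.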



Semantic extraction is far more powerful than keyword search, which can confuse Paris (Texas),
Paris (France) and Paris (Hilton). Calais can determine that the Paris in a particular article is
Paris, Texas, based on sophisticated disambiguation that leverages a variety of clues in the text.
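To make the idea of contextual clues concrete, here is a deliberately simple toy in Python. It is not the method Calais uses, which is not described here; it only counts how many hand-picked clue words for each candidate appear in the text. The CANDIDATES table and the disambiguate function are assumptions invented for this sketch.

```python
# Hand-picked clue words per candidate meaning (illustrative only).
CANDIDATES = {
    "Paris, Texas": {"texas", "lamar", "county", "dallas"},
    "Paris, France": {"france", "seine", "louvre", "french"},
    "Paris Hilton": {"hilton", "heiress", "celebrity", "hotel"},
}


def disambiguate(text, candidates):
    """Return the candidate with the most clue words in the text, or None."""
    words = {w.strip(".,;:!?()\"'").lower() for w in text.split()}
    scores = {name: len(words & clues) for name, clues in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


story = "The Lamar County fair opens in Paris next week, north of Dallas."
result = disambiguate(story, CANDIDATES)
```

A keyword search sees only the word "Paris" in this story; the surrounding words are what settle the question.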

New Applications: Calais 4.0 and beyond will enable many emergent applications including:
- Publisher sites that dynamically mingle and deliver additional relevant content based on user
   preferences, profiles, history, friends’ selections and breaking topics that are hot now.
- Media Monitoring tools that deliver slices of relevant information, e.g. content from all sites
   and blogs discussing natural disasters occurring near iron mines in Southeast Asia.
- Plug-ins that integrate social networking / community / blogging, and bypass search.
- Semantic ad networks and servers that go beyond keywords to inform ad placement with
   context, e.g. preventing airline ads from appearing next to news of air accidents.
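The ad placement case in the last item can be pictured as a small rule check, sketched below in Python. The rule table, the topic labels such as "AirAccident" and the allowed_ads helper are all hypothetical; they stand in for whatever semantic tags an ad server would actually receive.

```python
# Ad category -> story topics it should never appear beside (illustrative).
BLOCK_RULES = {
    "Airline": {"AirAccident"},
}


def allowed_ads(ads, story_topics, rules=BLOCK_RULES):
    """Keep only the ads whose category is not blocked by any story topic."""
    topics = set(story_topics)
    return [ad for ad in ads if not (rules.get(ad["category"], set()) & topics)]


ads = [
    {"id": "ad-1", "category": "Airline"},
    {"id": "ad-2", "category": "Insurance"},
]

safe = allowed_ads(ads, ["AirAccident", "Weather"])
```

The same check driven by keywords alone would need a list of every phrase that might describe an accident; a semantic tag covers them with one rule.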

Conclusion: Armed with this powerful new tool, publishers are automating content operations,
increasing productivity and cutting costs. They are enhancing the value of their content,
improving their user experience and preparing to lead in the linked content economy.

No one can predict precisely what kinds of creative and potentially game-changing applications
will emerge. With more than nine thousand users in the OpenCalais.com community, Thomson
Reuters expects to see hyper-evolution in many arenas.
