Open Calais For SF And LA Meetups


Here is the deck we shared with the SF and LA Semantic Web Meetups this past week (March, '09). It covers Calais 4.0 and its connection to the Linked Data cloud. Please join us at

Published in: Technology

  1. 1. Calais Thomson Reuters Calais Initiative
  2. 2. Overview • Going to discuss five basic topics – What is Calais? – Why we’re doing it & what our goals are – How it works / What’s under the hood? – A few examples – Where it’s headed
  3. 3. Calais… • Calais extracts smart metadata from unstructured text and links that metadata to the Linked Data cloud.
  4. 4. Calais progress to date • Launched in late January, 2008 • 9,500 developers have joined • 1-3 million content ‘transactions’ per day • Delivered four major update releases • Free (as in free) for commercial or non-commercial use
  5. 5. [Flow diagram] (1) Unstructured text (2) Calais extracts entities, facts and events (3) Metadata returned to the user with keys (4) Keys provide access to the Calais Linked Data cloud (5) Which provides information and other Linked Data pointers (6) To a range of open and partner Linked Data assets, including Thomson Reuters
  6. 6. Quick Demo You can find the Calais Viewer demonstration tool here: (Note that the Calais Viewer is not the Calais service. It is merely a demonstration of how the service works.) – Copy and paste the text of a business news article from AP, Dow Jones or into the viewer, and press submit. The article is sent to the Calais engine which tags the content and returns it, marked-up. – The tags appear on the left hand rail, and you can click on the plus (+) sign to see the tags expand. – Since we are now on Calais 4.0, you can also use the viewer to see the Linked Data assets related to the tags Calais returns. • Click on a company name on the left hand rail to find a Calais summary page featuring a basic description for that company, as well as a number of links. • Follow those links to see the other data entries on that company that are available for public use in the Linked Data Cloud. – For example, here is the Calais summary page for IBM: a07aa7933633.html – And here is the summary page for IBM in DBPedia (the Wikipedia translated into computer language):
  7. 7. Why & What 1. Derive semantic metadata from textual assets 2. Use that semantic metadata to create entry points into the linked data ecosystem 3. Provide a simple mechanism for the sharing of semantic metadata about textual content assets 4. And just why are you doing this…
  8. 8. 1: Semantics from Text: The Text Problem • People consume text • Most of it isn’t semantically enabled • Most of it won’t be semantically enabled • This isn’t about standards – microformats vs. RDFa vs. whatever. • Why: Latency, cost and short shelf life
  9. 9. 1: Semantics from Text: The Text Problem • Target areas where: – The economics don’t support metadata creation – The value of metadata is potentially high – The value of aggregated metadata is potentially extremely high [Chart: Shelf Life vs. Latency, both axes running from Seconds to Years; content types plotted: Great Novels, Scient. Pubs, Legacy News, New Gen News, Tweets]
  10. 10. 2: Getting from Text to the Linked Data Ecosystem
  11. 11. The Linked Data Cloud
  12. 12. 3: Semantic Metadata Transport Layer • I’m a content producer. We’ve loaded the car with rich semantic metadata – I’m sharing it within my four walls – How do I transport it to my consumers? – RSS / Atom, XML, Proprietary data feeds, Content APIs
  13. 13. 4: Why We’re Doing It • Two simple answers: – Hyper-evolution of capabilities – better, faster, stronger – The walled garden content world
  14. 14. How it Works – Under the Hood of Calais
  15. 15. How it Works – Under the Hood of Calais [Architecture diagram; labels: Document Level Metadata, Entity Level, Metadata Management, Reference Data Assets, Linked Data and …, Stat Tools, Disambig. Engine, ClearForest NLP Engine, Rule Base, Lexicons, Calais Web Service, RDF, Output Formatting]
  16. 16. Where From Here? • We’ve seen examples of first generation uses. • Where does this go in the future? • Beyond the document – Social Resume analysis – Museum Content Coalitions – Knowledge Management Applications – Investigative Journalism*
  17. 17. Investigative Journalism [Diagram: FOIA Contract Documents → Calais Web Service → Company:Contract, Company:Affiliation; News → Calais Web Service → Company:Person, FamilyRelation; both feed a “Big Fuzzy Graph”]
  18. 18. What’s in the Pipeline? • 2009 (this is a fuzzy list) – Person disambiguation @ domain level? – Other disambiguation – Continued expansion of URIs (entities & events) – Calais as hub – Exposure of the IDE? – User managed lexicons – Languages – Opt-in SPARQL Endpoint?
  19. 19. • – Gallery – code and application examples – Forums – Documentation • Twitter @opencalais, Facebook Group
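
The "Big Fuzzy Graph" idea on slide 17 can be illustrated with a minimal Python sketch. It assumes relations have already been extracted from two sources (FOIA contract documents and news) and merges them into one graph so that connections spanning both sources become visible. The record shape, the sample names, and the helper functions `build_graph` and `connected` are all hypothetical, chosen for illustration only; none of this is actual Calais output or the Calais API.

```python
from collections import defaultdict

# Hypothetical relation records shaped as (relation_type, subject, object).
# Illustrative stand-ins only; this is not real Calais output.
foia_relations = [
    ("Company:Contract", "Acme Corp", "Contract 001"),
    ("Company:Affiliation", "Acme Corp", "Beta Holdings"),
]
news_relations = [
    ("Company:Person", "Acme Corp", "Jane Doe"),
    ("FamilyRelation", "Jane Doe", "John Doe"),
]


def build_graph(*relation_sets):
    """Merge relation tuples from several sources into one undirected graph."""
    graph = defaultdict(set)
    for relations in relation_sets:
        for relation_type, subject, obj in relations:
            # Store each edge in both directions, labeled with its relation type.
            graph[subject].add((relation_type, obj))
            graph[obj].add((relation_type, subject))
    return graph


def connected(graph, start):
    """Return every node reachable from start (iterative depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for _, neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


graph = build_graph(foia_relations, news_relations)
print(sorted(connected(graph, "Contract 001")))
```

In this toy data, the contract found in a FOIA document and the family relation found in a news story end up in the same connected component only because both sources mention the same company. A real pipeline would also need entity disambiguation to decide when two mentions refer to the same thing.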