Calais Thomson Reuters Calais Initiative: Calais 4.0 ~ January 14, 2009 Thomas (“Tom”) Tague and Krista Thomas
Open Calais Release 4.0

A brief, entry-level overview of version 4.0 of the Calais Web service. Calais 4.0 automatically connects publishers to the exploding ecosystem of Linked Data assets on the Web, and helps them syndicate their metadata to reach downstream readers via search engines, news aggregators, 'related stories' recommendation services and more.

Published in: Education, Technology
  • Transcript of "Open Calais Release 4.0"

    1. Calais Thomson Reuters Calais Initiative: Calais 4.0 ~ January 14, 2009 Thomas (“Tom”) Tague and Krista Thomas
    2. Overview
       • Going to discuss five basic topics
           • What is Calais?
           • Why we’re doing it & what our goals are
           • How it works / What’s under the hood?
           • A few examples
           • Where it’s headed
    3. Calais? What’s Calais?
       • As seen from the U.K. & the Continent
       • As seen from North America
       • As seen by us
    4. Calais? What’s Calais?
       • A semantic metadata generation service that extracts entities, facts and events from unstructured text
       • Creates linkages from extracted entities to linked data ecosystem
       • Provides a transportation layer for rich semantic metadata from producers to consumers
       • Details to follow…
    5. Why We’re Doing It
       • Two simple answers:
           • Hyper-evolution of capabilities – better, faster, stronger
           • The walled garden content world
    6. Our Goals / The Capabilities We Want to Deploy
       • Let’s state them here and then walk through why we have these goals
           • Derive semantic metadata from textual assets
           • Use that semantic metadata to create entry points into the linked data ecosystem
           • Provide a simple mechanism for the sharing of semantic metadata about textual content assets
    7. 1: Semantics from Text: The Text Problem
       • People consume text
       • Most of it isn’t semantically enabled
       • Most of it won’t be semantically enabled
       • This isn’t about standards – microformats vs RDFa vs whatever.
       • Why: Latency, cost and short shelf-life
    8. 1: Semantics from Text: The Text Problem
       • Target areas where:
           • The economics don’t support metadata creation
           • The value of metadata is potentially high
           • The value of aggregated metadata is potentially extremely high
       [Chart: Tweets, Blogs, News, Scient. Pubs and Great Novels placed on two scales, Latency and Shelf Life, each running from Seconds to Years]
    9. 2: Getting from Text to the Linked Data Ecosystem
    10. The Linked Data Cloud
    11. 3: Semantic Metadata Transport Layer
       • I’m a content producer. We’ve loaded the car with rich semantic metadata
           • I’m sharing it within my four walls
           • How do I transport it to my consumers?
           • RSS / Atom, XML, proprietary data feeds, content APIs
    12. How it Works – Under the Hood of Calais
    13. How it Works – Under the Hood of Calais
       [Diagram labels, in source order: Calais Web Service; ClearForest NLP Engine; Rule Base; Lexicons; RDF; Disambig. Engine; Reference Data Assets; Metadata Management; Document Level Metadata; Entity Level Linked Data and …; Output Formatting; Stat Tools]
    14. How You Can Use It – the SemHead version
       • Send unstructured text
           • Get back document categorization, entities, facts and events – with document and entity level URIs
       • Syndicate Metadata
           • Send unstructured text
           • Share / syndicate the document GUID
       • Access Endpoints
           • Use entity level URI
           • Access entity level Linked Data endpoints & TR Content
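
The first pattern on this slide (send text, get back typed entities with URIs) might be consumed roughly as follows. This is only a sketch: the response layout, the field names, the `example.org` URIs and the `group_entities` helper are all assumptions made for illustration, not the actual Calais output format or API.

```python
from collections import defaultdict

# Hypothetical, simplified stand-in for an already-parsed response.
# Field names and URIs are illustrative assumptions, not the real format.
sample_response = {
    "doc_uri": "http://example.org/doc/123",
    "entities": [
        {"type": "Company", "name": "Thomson Reuters", "uri": "http://example.org/entity/1"},
        {"type": "Person", "name": "Tom Tague", "uri": "http://example.org/entity/2"},
        {"type": "Company", "name": "ClearForest", "uri": "http://example.org/entity/3"},
    ],
}


def group_entities(response):
    """Group (name, uri) pairs by entity type, preserving input order."""
    grouped = defaultdict(list)
    for entity in response["entities"]:
        grouped[entity["type"]].append((entity["name"], entity["uri"]))
    return dict(grouped)


if __name__ == "__main__":
    for entity_type, items in sorted(group_entities(sample_response).items()):
        print(entity_type, [name for name, _ in items])
```

Keeping the URI next to each name is the point of the exercise: under this assumption, the URI is what a consumer would later follow to reach the entity level Linked Data endpoint.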
    15. Entities, Facts & Events
       • Anniversary, City, Company, Continent, Country, Currency, EmailAddress, EntertainmentAwardEvent, Facility, FaxNumber, Holiday, IndustryTerm, MarketIndex, MedicalCondition, MedicalTreatment, Movie, MusicAlbum, MusicGroup, NaturalDisaster, NaturalFeature, OperatingSystem, Organization, Person, PhoneNumber, Product, ProgrammingLanguage, ProvinceOrState, PublishedMedium, RadioProgram, RadioStation, Region, SportsEvent, SportsGame, SportsLeague, Technology, TVShow, TVStation, URL
       • Acquisition, Alliance, AnalystEarningsEstimate, AnalystRecommendation, Bankruptcy, BonusShares, BusinessRelation, Buybacks, CompanyAffiliates, CompanyCustomer, CompanyEarningsAnnouncement, CompanyEarningsGuidance, CompanyInvestment, CompanyLegalIssues, CompanyLocation, CompanyMeeting, CompanyReorganization, CompanyTechnology, CompanyTicker, ConferenceCall, CreditRating, EmploymentRelation, FamilyRelation, FDAPhase, IPO, JointVenture, ManagementChange, Merger, MovieRelease, MusicAlbumRelease, PatentFiling, PatentIssuance, PersonAttributes, PersonCommunication, PersonEducation, PersonEmailAddress, PersonPolitical, PersonPoliticalPast, PersonProfessional, PersonProfessionalPast, PersonRelation, PersonTravel, Quotation, SecondaryIssuance, StockSplit
    16. Extending Calais’ Reach
       • More than just a web service – a growing collection of tools and applications to make it valuable in the real world
       [Diagram labels, in source order: Calais; Browser Extensions; Gnosis; Content Management Tools; WordPress; Drupal; UIMA; Development Tools & Libraries; PHP; Ruby; JAVA; .NET; Applications; And more…; TopBraid; RSS Tagger; Powerhouse; LinkedFacts; Wirecatch; FeedShaver]
    17. Calais progress to date
       • Launched in late January, 2008
       • 9,000 developers have joined OpenCalais.com
       • Approx. 1 million content ‘transactions’ per day
       • Delivered four major update releases
       • Lots of interesting apps
           • The Mail & Guardian Online (http://www.mg.co.za/)
           • www.powerhousemuseum.com
           • Gist.whistlehog.com
           • http://www.semanticproxy.com
    18. Example: The Mail & Guardian Online, South African Newspaper
       • Using Calais to metatag new and historical articles, and:
           • Build an index of topics A-Z
           • Pull out automatic related articles or pictures
           • Create news alerts on companies or people
           • Pull up maps for the countries named in articles
           • Predict readers’ interests based on browsing habits
           • Create tag clouds, showing popular subjects, people, etc.
       Using Calais to optimize search and navigation; drive consumer engagement
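
Two of the uses listed on this slide, the A-Z topic index and the tag cloud, reduce to simple bookkeeping once every article carries tags. The sketch below assumes the tags are already available as plain strings per article. The article ids, the tags and both helper names are made up for illustration, and nothing here reflects how the Mail & Guardian actually built its site.

```python
from collections import Counter, defaultdict

# Hypothetical tags per article, standing in for entities returned by a
# tagging service. Article ids and tags are invented for this example.
tagged_articles = {
    "article-1": ["ANC", "Johannesburg", "Nelson Mandela"],
    "article-2": ["Eskom", "Johannesburg"],
    "article-3": ["ANC", "Johannesburg"],
}


def tag_counts(articles):
    """Count how many articles mention each tag (the weights of a tag cloud)."""
    counts = Counter()
    for tags in articles.values():
        counts.update(set(tags))  # count each tag at most once per article
    return counts


def a_to_z_index(articles):
    """Map each initial letter to the sorted list of tags starting with it."""
    index = defaultdict(set)
    for tags in articles.values():
        for tag in tags:
            index[tag[0].upper()].add(tag)
    return {letter: sorted(tags) for letter, tags in sorted(index.items())}


if __name__ == "__main__":
    print(tag_counts(tagged_articles).most_common(2))
    print(a_to_z_index(tagged_articles))
```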
    19. Example: Gist - today’s news filtered by people, places & events
       • GIST uses Calais to prioritize stories, rank newsmakers & reveal trends / reader demand. It automatically aggregates multiple news sources and slots them into topics.
    20. Example: The Powerhouse Museum in Sydney
       • Using Calais to tag historical archives & using tags as search terms
    21. Example: IT Healthcare News
       • Using Calais to surface ambient “related content”
    22. Examples
       • Those are examples of first generation uses. Some of what we’re seeing in the pipeline:
           • Social Resume analysis
           • Investigative Journalism*
           • Museum metadata coalitions
    23. Investigative Journalism
       [Diagram labels, in source order: FOIA Contract Documents; Calais Web Service; Company:Person; FamilyRelation; News; Calais Web Service; Company:Contract; Company:Affiliation; Big Fuzzy Graph]
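
One way to read this slide: relations extracted from two separate document sets are merged into a single graph, and the interesting stories are the paths that only appear after the merge. The sketch below illustrates that idea with a breadth-first search. The names and triples are invented, the split of relation types between the two sources is an assumption, and `build_graph` and `find_path` are hypothetical helpers, not part of Calais.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relation, object) triples. The relation names come
# from the slide; which source yields which relation is an assumption.
foia_relations = [
    ("Acme Corp", "Company:Contract", "Contract 42"),
    ("Acme Corp", "Company:Person", "J. Doe"),
]
news_relations = [
    ("J. Doe", "FamilyRelation", "Official X"),
    ("Acme Corp", "Company:Affiliation", "Beta LLC"),
]


def build_graph(*relation_sets):
    """Merge several triple lists into one undirected adjacency map."""
    graph = defaultdict(set)
    for relations in relation_sets:
        for subject, relation, obj in relations:
            graph[subject].add((relation, obj))
            graph[obj].add((relation, subject))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search: return a shortest chain of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _, neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    merged = build_graph(foia_relations, news_relations)
    print(find_path(merged, "Contract 42", "Official X"))
```

In this toy data neither source alone connects the contract to the official; the link only exists in the merged graph.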
    24. What’s new in Release 4?
       • Release 4 – What’s New?
           • Linked data for approximately 25 entities
           • A start at Thomson Reuters contributed content
           • Metadata hosting and transport
           • Basic French
           • Published RDFS Ontology
           • New entities / relationships
               • Products
               • Competitive intelligence
               • Expanded document level categorization
    25. What’s in the Pipeline?
       • 2009 (this is a fuzzy list)
           • Person disambiguation @ domain level?
           • Other disambiguation
           • Dramatic expansion of endpoints (entities & events)
           • Calais as hub
           • Exposure of the IDE?
           • User managed lexicons
           • Languages
           • Opt-in SPARQL Endpoint?
    26. www.opencalais.com
           • Gallery – code and applications examples
           • Forums
           • Documentation