Calais @ the Palo Alto Semantic Web Meetup

An overview of the Thomson Reuters Calais Initiative given by Tom Tague at the Palo Alto Semantic Web Meetup in San Francisco, CA in August


Transcript

  • 1. Calais PAWS Sep 4, 2008
  • 2. Calais?
  • 3. ClearForest
    • Founded in 1998 by text analytics pioneers
    • A software organization that enables Intelligent Information
    • Enterprise and government customers
    • Led the market in the establishment of unstructured text as a key corporate asset
    • Acquired by Reuters June 2007
    • Offices: Boston, Israel
  • 4. The Text Problem
    • People consume text
    • Most of it isn’t semantically enabled
    • Most of it won’t be semantically enabled
    • Why: Latency, cost and short shelf-life
  • 5. Calais’ Piece of the Puzzle
    • A semantic metadata generation service that extracts entities, facts and events from unstructured text
    • Two new capabilities: topics & relevance
    • Available for commercial or non-commercial use up to 40,000 times per day
    • Diagram: Unstructured Documents (Text / HTML / XML) go into Calais, which produces:
      • Named Entities: People, Companies, Geographies, Albums, Authors, etc.
      • Facts: Position, Alliance, Education, Political Affiliation, etc.
      • Events: Management Change, IPO, Labor Action, Sporting, Entertainment, etc.
  • 6. Reuters Announced the Acquisition of ClearForest
    New York - April 30, 2007
    Reuters, the global information company, has entered into an agreement to acquire all of the outstanding shares of ClearForest Ltd., a privately held provider of Text Analytics solutions, whose tagging platform and analytical products allow clients to derive precise business information from huge amounts of textual content.
    ClearForest has received sufficient shareholder approval to complete the transaction, which is expected to close in approximately 30 days, subject to customary closing conditions. The financial terms were not disclosed.
    Reuters plans to retain and continue to work with the existing management team and their highly skilled workforces in the US and Israel. It also plans to continue to support existing products and customers.
    Reuters believes that search will be a pivotal element to the future of how financial information is sourced and consumed. As part of its drive into this space, Reuters has created a new strategic group and appointed Gerry Campbell, who will oversee the integration of ClearForest and drive this innovation.
    Tags shown on the slide:
      <Topic>M&A</Topic>
      <Acquisition offset="494" length="130">
        <Company_Acquirer>Reuters</Company_Acquirer>
        <Company_Acquired>ClearForest Ltd.</Company_Acquired>
        <Status>Planned</Status>
      </Acquisition>
      <Company>Reuters</Company>
      <Company>ClearForest Ltd.</Company>
      <Product>Text Analytic Solution</Product>
      <Company>ClearForest Ltd.</Company>
      <Company>Reuters</Company>
      <Country>United States</Country>
      <Country>Israel</Country>
      <Company>Reuters</Company>
      <Person>Gerry Campbell</Person>
      <ManagementChange offset="2789" length="92">
        <Person>Gerry Campbell</Person>
        <Company>Reuters</Company>
        <Action>Enters</Action>
      </ManagementChange>
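
To make the slide 6 tags a little more concrete, here is a minimal Python sketch that reads markup of that shape and separates entities from events. It assumes the simplified tags from the slide, not the actual Calais response format. The `<Document>` wrapper, the `EVENT_TAGS` set and the `summarize` function are hypothetical names introduced only for this illustration, and `&` is written as `&amp;` so the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Simplified tags copied from slide 6. The <Document> root is a hypothetical
# wrapper added here so that the fragment parses as a single XML document.
SLIDE_MARKUP = """
<Document>
  <Topic>M&amp;A</Topic>
  <Acquisition offset="494" length="130">
    <Company_Acquirer>Reuters</Company_Acquirer>
    <Company_Acquired>ClearForest Ltd.</Company_Acquired>
    <Status>Planned</Status>
  </Acquisition>
  <Company>Reuters</Company>
  <Company>ClearForest Ltd.</Company>
  <Person>Gerry Campbell</Person>
  <ManagementChange offset="2789" length="92">
    <Person>Gerry Campbell</Person>
    <Company>Reuters</Company>
    <Action>Enters</Action>
  </ManagementChange>
</Document>
"""

# Assumption: tags that carry nested role elements are treated as events.
EVENT_TAGS = {"Acquisition", "ManagementChange"}


def summarize(markup):
    """Split top-level tags into unique entities per type and a list of events."""
    root = ET.fromstring(markup)
    entities = {}
    events = []
    for child in root:
        if child.tag in EVENT_TAGS:
            events.append({
                "type": child.tag,
                "offset": int(child.get("offset")),
                "length": int(child.get("length")),
                "roles": {role.tag: role.text for role in child},
            })
        else:
            # Collect each distinct value once per entity type.
            entities.setdefault(child.tag, set()).add(child.text)
    return entities, events


entities, events = summarize(SLIDE_MARKUP)
print(sorted(entities["Company"]))
for event in events:
    print(event["type"], event["roles"])
```

Only top-level tags are counted as entities, so the `Person` and `Company` elements nested inside an event are kept as that event's roles.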
  • 7. What’s Behind an Event … An Example
    • Source text: Digital Marketing Services, Inc. (DMS), the leading provider of online marketing research and a division of America Online Inc. (AOL), today announced an alliance with Netcentives Inc. (Nasdaq: NCNT)
    • Extracted instances:
      • Company = Digital Marketing Services, Inc.
      • Company = Netcentives Inc.
      • Status = announced
      • DateString = today
      • Date = 2000-01-31
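
The slide 7 example pairs a literal DateString ("today") with a normalized Date (2000-01-31). The sketch below shows one way such a normalization could work, assuming the document's publication date is known. It is not how Calais does it; `AllianceEvent`, `RELATIVE_OFFSETS` and `resolve_date` are hypothetical names, and the publication date is assumed to be the date given on the slide.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class AllianceEvent:
    """Hypothetical container for the instances listed on slide 7."""
    companies: List[str]
    status: str
    date_string: str
    resolved_date: Optional[date]


# Assumption: a tiny lookup of relative expressions, in days from publication.
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def resolve_date(date_string, published):
    """Resolve a relative date expression against the publication date.

    Returns None when the expression is not in the lookup table.
    """
    offset = RELATIVE_OFFSETS.get(date_string.strip().lower())
    if offset is None:
        return None
    return published + timedelta(days=offset)


# Assumed publication date, chosen to match the Date shown on the slide.
published = date(2000, 1, 31)

event = AllianceEvent(
    companies=["Digital Marketing Services, Inc.", "Netcentives Inc."],
    status="announced",
    date_string="today",
    resolved_date=resolve_date("today", published),
)
print(event.resolved_date.isoformat())
```

Keeping both the raw string and the resolved value, as the slide does, lets a consumer see what the text said as well as what it was taken to mean.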
  • 8. Live Example
    • Viewer Demo
    • Gnosis Demo
  • 9. Extending Calais’ Reach
    • More than just a web service – a growing collection of tools and applications to make it valuable in the real world
    • Diagram: Calais at the center, surrounded by:
      • Browser Extensions: Gnosis
      • Content Management Tools: WordPress, Drupal
      • UIMA
      • Development Tools & Libraries: PHP, Ruby, Java, .NET
      • Applications: TopBraid, RSS Tagger, Powerhouse, LinkedFacts, Wirecatch, FeedShaver
      • And more…
  • 10. How Calais is Being Used Today
    • Gist: automatically aggregates multiple news sources and slots them into topics, etc.
  • 11. The Stack
    • Diagram: ClearForest Tags Platform. Labels shown:
      • External Content / Live Feed / Enterprise Content
      • File Based Connector
      • Programmatic API (SOAP Web Service)
      • RDBMS Connector
      • Web Crawlers (Agents)
      • ClearForest Extraction Modules
      • ClearForest Categorizer
      • Tooling: Modeler, Developer, Cat Manager
      • Console
      • Rich XML
  • 12. Detailed Stack
    • Diagram: ClearForest Tags Platform. Labels shown:
      • Files; External Feed; Enterprise System
      • File Based API; Programmatic API (SOAP Web Service); Web Agents; RDBMS based API
      • Document Conversion and Normalization
      • Tags API; Control API; Control DB
      • Categorizer; Semantic Tagging; Language ID; Headline Generation
      • Classifier; Extraction Modules; Language Classifier; Templates
      • Categorization Manager; ClearForest Dvlpr/Modeler; Languages Configuration; Key Concepts Configuration; ClearForest Studio
      • Configuration & Monitoring Console; Farm Manager
      • Rich XML
  • 13. Platform Highlights
    • Single run-time platform for all technologies
    • Modular architecture
    • Additional functional plug-in can be added anywhere
    • Web services interfaces
    • SOA ready
    • Java based
    • Programmatic API to all components
    • Farming support for scalability
    • Best practices/standards (XML, Unicode, Architectural Patterns, Design patterns …)
  • 14. Technology
    • Diagram labels shown:
      • File API; Programmatic API (SOAP Web Service); RDBMS based API; Web; Custom
      • File/Web/DB based API (Document Provider); Listeners; Profile; Web Document Injector (flight plan)
      • Document Conversion: Conversion & Normalization (PDF Conv., XML Conv., Doc Conv.)
      • Language identification
      • Document Tagging (Doc Runner): Categorization, Information extraction, Other (Headline Generation)
      • Tags Pipeline: KB Writer, DB Writer, XML Writer
      • Queues: CPU Bound, IO Bound
      • Control Console; Control API
      • Rich XML; ANS Collection DB
  • 15. The NLP Stack
    • Events & Facts
    • Entities: Candidates, Resolution, Normalization
    • Basic NLP: Noun Groups, Verb Groups, Numbers, Phrases, Abbreviations
    • Metadata Analysis: Title, Date, Body, Paragraph
    • Sentence Marking
    • Morphological Analyzer: Stem, Tense, Aspect, Singular/Plural, Gender, Prefix/Suffix Separation
    • POS Tagging (per word)
    • Tokenization
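
As a rough illustration of how layers like those on slide 15 build on one another, the toy Python sketch below chains three of them: tokenization, sentence marking, and entity candidates. It is a deliberately naive stand-in, not the ClearForest implementation. The function names and the capitalization heuristic are assumptions made for this example, and a real system would add the morphological, POS and resolution layers the slide lists.

```python
import re


def tokenize(text):
    """Tokenization layer: split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def mark_sentences(tokens):
    """Sentence marking layer: close a sentence at '.', '!' or '?'."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def entity_candidates(sentence):
    """Entity candidate layer (toy heuristic): runs of capitalized tokens."""
    candidates, run = [], []
    for tok in sentence:
        if tok[0].isupper():
            run.append(tok)
        else:
            if run:
                candidates.append(" ".join(run))
            run = []
    if run:
        candidates.append(" ".join(run))
    return candidates


text = "Reuters acquired ClearForest in 2007. Gerry Campbell leads the new group."
sentences = mark_sentences(tokenize(text))
candidates = [entity_candidates(s) for s in sentences]
for sentence, found in zip(sentences, candidates):
    print(" ".join(sentence), "->", found)
```

Each function consumes only the output of the layer beneath it, which is the property the stacked diagram suggests. The capitalization heuristic would also pick up ordinary sentence-initial words, which is one reason the later resolution and normalization steps matter.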
  • 16. Calais, Semantics and the Semantic Web
    • Issues, Opportunities
      • Ontologies
        • How do we make this a community effort?
      • Dereferenceable URIs & Endpoints
        • Engineering
        • Population
          • Basic data
          • Links
          • Proprietary data sources
          • Functions? Code?
  • 17. What’s in the Pipeline?
    • 2008
      • The basics of dereferenceable URIs
      • Disambiguation – company & geography
      • Hooks
    • 2009 (this is a fuzzy list)
      • Person disambiguation (social networks?)
      • Other disambiguation
      • Continued population of endpoints
      • Calais as hub
      • Exposure of the IDE
      • User managed lexicons
      • Lots and lots of hooks
  • 18.
    • www.opencalais.com
      • Gallery – code and applications examples
      • Forums
      • Documentation