Text Analytics Past, Present & Future
Text Analytics Past, Present & Future: keynote presentation by Seth Grimes at the TEMIS User Conference, Barcelona, July 9, 2009.

  • 1. Text Analytics Past, Present & Future
    Seth Grimes
  • 2. >> Past, Present & Future
    He who controls the present, controls the past. He who controls the past, controls the future.
    -- derived from George Orwell’s 1984
  • 3. >> The Present: Today’s Market
    I have estimated a $350 million global market in 2008, up 40% from $250 million in 2007.
    Covers software licenses, vendor provided support and professional services.
    Hundreds of millions of dollars in additional value created by:
    Universities and research centers, especially in the life sciences.
    Government, particularly for intelligence & counter-terrorism.
    OEM licensees, for listening platforms, e-discovery, etc.
    Systems integrators and consultants.
  • 4. >> Applications Today
    Broadly grouped --
    Intelligence and counter-terrorism.
    Life sciences.
    Content management, publishing & search.
    Customer & market intelligence.
    Enterprise feedback.
    Law enforcement.
    Risk, fraud, compliance, and investigation.
  • 5. >> On the Demand Side…
    How do current and prospective users see the market?
    I recently published a study report, “Text Analytics 2009: User Perspectives on Solutions and Providers.” Drawing from the findings…
  • 6. >> Primary Applications
    What are your primary applications where text comes into play?
  • 7. >> Primary Applications
    Results found by Fern Halper of Hurwitz & Associates.
  • 8. >> The “Unstructured Data” Challenge
    Sources are highly varied –
    • Web sites, news & journal articles, images, video.
    • Blogs, forum postings, and social media.
    • E-mail, contact-center notes and transcripts; recorded conversation.
    • Surveys, feedback forms, warranty & insurance claims.
    • Office documents, regulatory filings, reports, scientific papers.
    • And every other sort of document imaginable.
  • >> Important Sources
    What textual information are you analyzing or do you plan to analyze?
    Current users responded:
  • 14. >> Finding Business Value
    Why? In customer-experience initiatives, for example, “more unsolicited, unstructured data [implies] increasing use of text analytics.”
    -- Bruce Temkin, Forrester Research
  • 15. >> Information in Text
    Do you need (or expect to need) to extract or analyze:
  • 16. >> Text Analytics Satisfaction
    Please rate your overall experience -- your satisfaction.
    Fern Halper of Hurwitz & Associates found in her 2009 survey, “all of the companies that had deployed text analytics stated that the implementations either met or exceeded their expectations. And, close to 60% stated that text analytics had actually exceeded expectations.”
  • 17. >> Today’s Text Analytics Players
    Data mining and analytics.
    Enterprise- and specialized-application focus.
    Search tools and services.
    Software-tool, OEM suppliers.*
    Text analytics pure-plays, diverse applications.*
    Web services.
    * TEMIS categories.
  • 18. >> Today’s Text Analytics
    Contrast with the 1999 landscape –
    “The nascent field of text data mining (TDM) has the peculiar distinction of having a name and a fair amount of hype but as yet almost no practitioners.”
    -- Prof. Marti A. Hearst,
    “Untangling Text Data Mining,” 1999
    (For our purposes, “text analytics” = “text mining” = “text data mining.”)
  • 19. >> What’s Past Is Prologue
    “Don't look back. Something might be gaining on you.”
    -- Satchel Paige
  • 20. >> Understanding the Challenge
    Marti Hearst in 1999:
    “Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically.”
    “[A] way to view text data mining is as a process of exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known.”
    Challenges: Access, decoding, discovery, application.
  • 21. >> In Business Terms
    Business intelligence (BI) as defined in 1958:
    “In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera... The notion of intelligence is also defined here... as ‘the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.’”
    -- Hans Peter Luhn,
    “A Business Intelligence System,”
    IBM Journal, October 1958
  • 22. Document input and processing
    Information extraction
    Knowledge management
    H.P. Luhn, “A Business Intelligence System,” IBM Journal, October 1958
  • 23. >> Statistical Analysis of Content
    “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance.”
    -- Hans Peter Luhn, “The Automatic Creation of Literature Abstracts,”
    IBM Journal, April 1958
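Luhn's word-frequency idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Luhn's original program: the stop list and the `min_count` threshold are hypothetical stand-ins for his frequency cut-offs, and all function names are illustrative. Roughly following the scheme in his abstracts paper, a sentence's significance comes from clusters of significant words separated by at most four insignificant words, each cluster scored as (significant words in cluster)^2 / (cluster length in words).

```python
import re
from collections import Counter

# Hypothetical stop list; Luhn instead cut off words above a frequency threshold.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
              "for", "on", "that", "it", "as", "by", "with"}


def tokenize(text):
    """Lowercase word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def significant_words(sentences, min_count=2):
    """Non-stop words occurring at least `min_count` times across the text (assumed threshold)."""
    counts = Counter(w for s in sentences for w in tokenize(s) if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_count}


def sentence_score(sentence, significant, max_gap=4):
    """Best cluster score in the sentence: (significant words in cluster)^2 / cluster length."""
    words = tokenize(sentence)
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            # Close enough: extend the current cluster.
            count += 1
        else:
            # Gap too wide: score the finished cluster and start a new one.
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def summarize(text, n=1):
    """Return the n highest-scoring sentences as a crude extract."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    significant = significant_words(sentences)
    return sorted(sentences, key=lambda s: sentence_score(s, significant), reverse=True)[:n]


sample = ("Text analytics turns text into data. Analysts like coffee. "
          "Text analytics tools extract entities from text.")
print(summarize(sample))  # ['Text analytics turns text into data.']
```

As Luhn himself notes on the next slide, nothing here looks at grammar, syntax, or meaning; the score is pure counting.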
  • 24. >> Significance from Semantics
    “This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.”
    -- Hans Peter Luhn, 1958
  • 25. >> Methods
    Technologists developed approaches to taming text:
    Vector-space representations.
    Salton, Wong & Yang, 1975,
    “A Vector Space Model for Automatic Indexing.”
    Clustering & classification algorithms.
    Naive Bayes.
    Support Vector Machine.
    K-nearest neighbor.
    Linguistic methods.
    Machine learning.
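As a rough illustration of two items on this list, the sketch below represents documents as tf-idf weighted term vectors in the spirit of Salton, Wong & Yang, then labels a new document by k-nearest-neighbor voting over cosine similarity. It assumes lowercase whitespace tokenization, a smoothed idf of log(N/df) + 1, and a tiny hypothetical training set; none of these choices come from the talk, and the function names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Whitespace tokenization after lowercasing (an assumption; real systems do more)."""
    return text.lower().split()


def idf_table(docs):
    """Smoothed inverse document frequency, log(N / df) + 1, for each training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def vectorize(tokens, idf):
    """Sparse tf-idf vector stored as a dict; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query, train_docs, labels, k=3):
    """Label `query` by majority vote among its k most cosine-similar training docs."""
    tokenized = [tokenize(d) for d in train_docs]
    idf = idf_table(tokenized)
    train_vecs = [vectorize(doc, idf) for doc in tokenized]
    q = vectorize(tokenize(query), idf)
    ranked = sorted(
        zip((cosine(q, vec) for vec in train_vecs), labels),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set.
TRAIN = [
    "the hotel room was clean and quiet",
    "great pool and friendly hotel staff",
    "the flight was delayed at the airport",
    "protein binding in gene expression",
    "gene mutation linked to protein folding",
    "clinical trial of a new drug compound",
]
LABELS = ["travel", "travel", "travel", "life-sciences", "life-sciences", "life-sciences"]

print(knn_classify("hotel staff cleaned the room", TRAIN, LABELS))  # travel
print(knn_classify("protein expression of a gene", TRAIN, LABELS))  # life-sciences
```

Cosine similarity normalizes away vector length, so a long document and a short one on the same topic can still count as close neighbors; that is one reason the vector-space model pairs naturally with k-NN.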
  • 26. >> Looking Ahead
  • 27. >> Market Trends
    “The Diverse and Exploding Digital Universe,” (IDC, 2008)
    Stronger than ever:
    Life sciences.
    Intelligence & counter-terrorism.
    Continued steep growth:
    Media & publishing.
    • Seek to mine and to classify/process.
    • For users, semantic annotations ease navigation and boost findability.
    Customer experience.
    • Key to quality, satisfaction.
    Market intelligence including competitive intelligence.
    • Aggregates and details are both important.
  • >> Technology Initiatives
    Now and near future.
    Semantic search.
    Guha (IBM), McCool (Stanford), Miller (W3C): “The addition of explicit semantics can improve [navigational and research] search” (2003).
    Question answering.
    Matthew Glotzbach, Google: “Question answering is the future of enterprise search” (2006).
    Sentiment analysis.
    Bing Liu, University of Illinois at Chicago: “The Web has dramatically changed the way that people express their views and opinions.”
  • 29. >> Technology Initiatives 2
    Now and near future.
    Listening platforms.
    Bruce Temkin, Forrester Research: “The future is clearly about analyzing feedback in any form that your customers give it. That’s a trend that won’t go away.”
    Text visualization.
    We’re still coming to terms with the idea of actually extracting and exploiting the information content of rich media.
    Web 3.0 & the Semantic Web.
    Ronen Feldman, Bar-Ilan University and Hebrew University: “Text analytics [is] driving the Semantic Web” (2006).
  • 30. >> Search, from Keywords to Intelligence
    Text analytics enables smarter search that better responds to user goals.
  • 31. >> Question Answering
    Text analytics (information extraction) feeds curated knowledge bases.
  • 32. >> Sentiment Analysis
    Two assertions:
    • Human communications are inherently subjective.
    • Opinion often masquerades as Fact.
  • >> Sentiment Analysis
    “Sentiment analysis is the task of identifying positive and negative opinions, emotions, and evaluations.”
    -- Wilson, Wiebe & Hoffmann, 2005, “Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis”
    “Great hotel, just a few brilliant streets, full of restaurants and shops, from La Rambla. Beautiful hotel restaurant and the pool is UNBELIEVABLE! Single room is very modern and the blackout blind is awesome on mornings that you wish to sleep for a few more minutes. Will definitely be back!”
    “Software that looks fairly simple (I really like the application’s icon), but that turns out to be very clever and sets itself apart from its competitors through the ability to apply themes to it!” (translated from French)
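A minimal lexicon-based polarity scorer shows both what simple counting gets right on reviews like these and where context bites. It is a sketch under explicit assumptions, not the Wilson, Wiebe & Hoffmann method: the toy lexicon, the negator list, and the three-token negation window are all hypothetical, and `polarity` is an illustrative name. The lexicon arbitrarily treats “unbelievable” as positive; in a complaint the same word would be damning, which is exactly the contextual-polarity problem the quoted paper addresses.

```python
import re

# Hypothetical toy lexicon of prior polarities (+1 positive, -1 negative).
LEXICON = {
    "great": 1, "brilliant": 1, "beautiful": 1, "unbelievable": 1,
    "modern": 1, "awesome": 1, "clever": 1,
    "dirty": -1, "noisy": -1, "rude": -1, "awful": -1, "broken": -1,
}
# Hypothetical negators; each one flips the polarity of the next few tokens.
NEGATORS = {"not", "no", "never", "hardly"}


def polarity(text, window=3):
    """Sum prior polarities, flipping any word within `window` tokens after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate_until = -1  # index of the last token still inside a negation scope
    for i, token in enumerate(tokens):
        if token in NEGATORS:
            negate_until = i + window
            continue
        value = LEXICON.get(token, 0)
        if i <= negate_until:
            value = -value
        score += value
    return score


review = ("Great hotel, just a few brilliant streets, full of restaurants and shops, "
          "from La Rambla. Beautiful hotel restaurant and the pool is UNBELIEVABLE!")
print(polarity(review))  # 4
print(polarity("The room was not great and the staff were rude"))  # -2
```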
  • 34. >> Text Visualization
  • 35. >> Web 3.0 & the Semantic Web
    “We have many of the tools in place -- from Web 2.0 technologies… to unstructured data search software and the Semantic Web -- to tame the digital universe. Done right, we can turn information growth into economic growth.”
    -- “The Diverse and Exploding Digital Universe,” (IDC, 2008)
    “The Semantic Web is a web of data, in some ways like a global database.” -- Tim Berners-Lee, 1998
    Web 3.0 = Web 2.0 + the Semantic Web + semantic tools.
  • 36. >> Web 3.0 & the Semantic Web
    Recurring themes:
    Semantically enriched -- context sensitive -- localized.
    Technical concepts:
    Linked Data -- Microformats, RDF, SPARQL, OWL.
    Text analytics enables Web 3.0 and the Semantic Web.
    Automated content categorization and classification.
    Text augmentation: metadata generation, content tagging.
    Information extraction to databases.
    Exploratory analysis and visualization.
  • 37. Text Analytics Past, Present & Future
    Seth Grimes