Text Analytics 2009:
 User Perspectives on
Solutions and Providers

      Seth Grimes

  An Alta Plana research study
         Sponsored by
Text Analytics 2009: User Perspectives

Table of Contents
Executive Summary
Text Analytics Basics
  Discovering Meaning in Text
Software and Solution Market Overview
  Applications and Sources
Demand-Side Perspectives
  Study Context
  About the Survey
Demand-Side Study 2009: Response
  Q1: Length of Experience
  Q2: Application Areas
  Q3: Information Sources
  Q4: Return on Investment
  Q5: Mindshare
  Q6: Spending
  Q8: Satisfaction
  Q9: Overall Experience
  Q12: Like and Dislike
  Q13: Information Types
  Q14: Important Properties & Capabilities
Additional Analysis
  Selected Cross-tabulations
  Interpretive Limitations
About the Study
Solution Profile: Attensity
Solution Profile: Clarabridge
Solution Profile: GATE
Solution Profile: IxReveal
Solution Profile: Nstein
Solution Profile: SAP BusinessObjects
Solution Profile: TEMIS

                  Published May 31, 2009 under the Creative Commons Attribution 3.0 License.


Executive Summary
       The global text-analytics market is growing at a very rapid pace, an estimated 40% in
       2008, creating a $350 million market for software and vendor supplied support and
       services. The total business value generated by text-analytics reliant information
       products, in-house development, service providers, applications such as e-discovery,
       and research surely multiplies this figure eight-fold. The author projects 2009 market
       growth up to 25% despite the economic downturn.
                                                                                Market Factors
       A number of factors have impelled sustained text-analytics market growth. The
       technology – text mining and related visualization and analytical software – continues
       to deliver unmatched capabilities both in early-adopter domains such as intelligence
       and the life sciences and in business sectors that have embraced text analytics more
       recently, in the last 3-5 years. These latter sectors include, notably, media and
       publishing, financial services and insurance, travel and hospitality, and consumer
       products and retail. Business and technical functions such as customer support and
       satisfaction, brand and reputation management, claims processing, human resources,
       media monitoring, risk management and fraud, and search have fueled recent growth.
       No single organization or approach dominates the market. While existing players
       have been very successful, they and new entrants continue to innovate, offering
       cutting-edge capabilities, for instance in sentiment analysis, as well as in newer, as-a-
       service and mash-up ready delivery models and capabilities targeted to market niches.
                                                    Text Analytics 2009: User Perspectives
       Insights into the question, “What do current and prospective text-analytics users really
       think of the technology, solutions, and solution providers?” will help providers craft
       products and services that better serve users. Insights will guide users seeking to
       maximize benefit for their own organizations. Alta Plana conducted a spring 2009
       survey to explore the topic. This report, “Text Analytics 2009: User Perspectives on
       Solutions and Providers,” presents findings drawn from 116 responses, the majority
       from respondents who already use text analytics. The study was supported by seven sponsors but is
       editorially independent, designed and conducted by industry analyst and consultant
       Seth Grimes, a recognized expert in the application of text analytics.
                                                                              Key Study Stats
       The following are key study findings:
               Top business applications of text analytics for respondents are a) Brand /
               product / reputation management (40% of respondents), b) Competitive
               intelligence (37%), c) Voice of the Customer / Customer Experience
               Management (33%), and d) other Research (33%).
               These applications match a focus on on-line sources: a) blogs and other social
               media (47%), b) news articles (44%), and c) on-line forums (35%) as well as
               direct customer feedback in the form of d) e-mail and correspondence (36%)
               and e) customer/market surveys (34%).
               Users with 2 years or more experience prefer tools that support specialized
               dictionaries, taxonomies, or extraction rules and they often like open source.
               Prospective users expect to focus their initial text analytics work on inside-
               the-firewall feedback sources: e-mail, surveys, and contact center materials.
               Prospective users have high ROI hopes. Use of each of six different measures,
               led by increased sales to existing customers, is favored by over 50% of
               respondents who are not current users. Other measures are not far behind.


Text Analytics Basics
           The term text analytics describes software and transformational steps that discover
           business value in “unstructured” text. The aim is to improve automated text processing.
           Most everything people do with electronic documents falls into one of four classes:
               1. Compose, publish, manage, and archive.
               2. Index and search.
               3. Categorize and classify according to metadata & contents.
               4. Summarize and extract information.
           Text analytics enhances the first and second sets of functions and enables the third
           and fourth.
           The remainder of this section will look at the technology, and the section after will look at
           the market and applications.

           Discovering Meaning in Text
           Text analytics encompasses applications of the technology in government, science,
           and industry and for cross-cutting tasks that range from information retrieval to text-
           fueled investigative analyses. Text analytics can be seen as a subspecies of business
           intelligence, and its capabilities will be an essential component of the eventual creation
           of the Semantic Web.
                                                                                 Structure in Text
           Text – news and blog articles, scientific papers, spoken call-center conversations,
           survey responses, product reviews posted to on-line forums, this report – is replete
           with structure. Humans (relatively easily) learn to use this structure – the
           morphology of individual words, the syntax that governs the composition of
           expressions, the grammar behind phrases and sentences, and the larger-scale structure
           of text as organized and presented in Web pages, e-mail, newspapers, books, and
           myriad other forms – to both understand and generate text. We are able to do this
           without conscious thought, coupled with a grasp of context, knowledge, and emotion
           that allows us to understand often-complex interactions.
           Text-analytics software technology – text mining and related visualization and
           analytical tools – enables machine treatment of text that replicates, automates, and
           extends human capabilities.
                                                                Sense-Making through Statistics
           The earliest approaches to automated text analysis applied statistical methods to text.
           Consider Hans Peter Luhn’s 1958 IBM Journal paper, “The Automatic Creation of
           Literature Abstracts”1, which envisaged application of statistics for sense-making and
           summarization. Luhn wrote,
               “Statistical information derived from word frequency and distribution is used
               by the machine to compute a relative measure of significance, first for
               individual words and then for sentences. Sentences scoring highest in
               significance are extracted and printed out to become the auto-abstract.”
           Luhn illustrated his approach, as shown in the figure below, with the kind of
           frequency analysis that is performed today by search-engine optimization (SEO)
           tools and software such as Wordle that generates word and tag clouds. Luhn
           additionally proposed a Keyword-in-Context (KWIC) indexing system that is at the
           root of modern information retrieval methods.
1 -- paper is behind a “paywall.”

            “Statistical information derived from word frequency and distribution is used by the machine
                              to compute a relative measure of significance": H.P. Luhn
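The frequency-based scoring that Luhn describes can be sketched in a few lines of Python. This is a simplified illustration rather than Luhn's exact procedure: the stop-word list, the function names, and the scoring rule (summing whole-text word frequencies per sentence) are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not Luhn's own list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on", "are", "for"}


def words(text):
    """Lower-case a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def auto_abstract(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word significance = frequency across the whole text, ignoring stop words.
    freq = Counter(w for w in words(text) if w not in STOP_WORDS)

    def score(sentence):
        # Sentence significance = sum of the significance of its words.
        return sum(freq[w] for w in words(sentence) if w not in STOP_WORDS)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


sample = ("Text mining finds patterns in text. The weather is nice. "
          "Text analytics turns text into data.")
print(auto_abstract(sample, 1))
```

Because "text" is the most frequent significant word in the sample, the sentences that mention it most outrank the unrelated one.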
                                                                             Vector Space Methods
          Vector-space models became the prevailing approach to representing documents for
          information retrieval, classification, and other tasks.
          The text content of a document is reduced to an
          unordered “bag of words” that becomes a point in a
          high-dimensional vector space that may embed the
          word content of many documents as illustrated in the
          diagram that appears to the right2.
          Approaches such as TF-IDF (term frequency–inverse
          document frequency) weigh the significance of a term
          according to its prevalence in a larger document set.
          We apply additional analytical methods to make text
          tractable, for instance, latent semantic indexing
          utilizing singular value decomposition for term
          reduction / feature selection to create a new, reduced
          concept space. In plain English, such techniques identify
          and retain the most important concepts and consolidate or
          eliminate lesser concepts.
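The bag-of-words reduction and TF-IDF weighting described above can be illustrated with a short Python sketch. TF-IDF has many variants; the one used here (raw term count times the natural log of total documents over documents containing the term) is one common choice and an assumption of this example, as are the function names and sample documents. The singular value decomposition step is not shown.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Reduce a text to unordered word counts."""
    return Counter(text.lower().split())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count in document) * log(N / documents containing the term)
    """
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    doc_freq = Counter(term for bag in bags for term in bag)
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in bag.items()}
        for bag in bags
    ]


docs = ["the dow fell", "the nasdaq gained", "the dow gained"]
weights = tf_idf(docs)
print(weights[0])
```

A term that appears in every document ("the") receives zero weight, while a term found in only one document ("fell") receives the highest weight in that document.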
          Text analytics will typically apply one or more of a
          number of statistical clustering and classification methods
          to documents. These methods include Naive Bayes,
          Support Vector Machines, and k-nearest neighbor
          classification. The diagram to the left illustrates the
          identification of a hyperplane, the red line given a 2-D
          picture, that best separates the dot-/circle-represented documents into distinct sets.
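As a minimal sketch of one of the methods named above, the following Python code classifies a document by k-nearest neighbors, using cosine similarity between bag-of-words vectors. The training documents, labels, and function names are invented for illustration; Naive Bayes and Support Vector Machines are not shown.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a bag-of-words vector (term -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(text, labeled_docs, k=3):
    """Label a text by majority vote among its k most similar labeled documents."""
    query = vectorize(text)
    ranked = sorted(labeled_docs, key=lambda pair: cosine(query, vectorize(pair[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples.
training = [
    ("stocks fell on weak earnings", "finance"),
    ("the index gained as stocks rallied", "finance"),
    ("bond yields fell again", "finance"),
    ("the team won the final match", "sports"),
    ("the striker scored in the match", "sports"),
    ("the coach praised the team", "sports"),
]
print(knn_classify("stocks rallied and the index fell", training))
```

In practice the vectors would usually be weighted (for example with TF-IDF) and stop words removed, so that common words such as "the" do not dominate the similarity scores.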
2 Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975


                                                                         Linguistic Approaches
       Statistical approaches have a hard time making sense of nuanced human language, an
       issue that H.P. Luhn foresaw in 1958. Luhn wrote in his visionary paper, cited above,
               "This rather unsophisticated argument on ‘significance’ [inferred
               from a word’s frequency of use] avoids such linguistic implications as
               grammar and syntax. In general, the method does not even propose to
               differentiate between word forms. Thus the variants differ,
               differentiate, different, differently, difference and differential could
               ordinarily be considered identical notions and regarded as the same
               word. No attention is paid to the logical and semantic relationships
               the author has established. In other words, an inventory is taken and a
               word list compiled in descending order of frequency."
       Consider the following pair of sentences, proposed by Luca Scagliarini of Expert
       System. The two cases produce the same “bag of words” but their meanings, the data
       content of the texts, are very different given the switch of fell and gained.
               The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's
               500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq
               composite gained 6.84, or 0.32 percent, to 2,162.78.
               The Dow gained 46.58, or 0.42 percent, to 11,002.14. The Standard &
               Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq
               composite fell 6.84, or 0.32 percent, to 2,162.78.
       Linguistic approaches will, for instance, analyze the parts of speech of a phrase,
       detecting the subject-verb-object triple that constitutes a factual (or subjective)
       statement as well as additional, modifying elements.
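The point about the two sentences can be checked directly. The Python sketch below confirms that both texts reduce to the same bag of words, then recovers the differing facts with a simple pattern. The regular expression is a stand-in chosen for this example, not a real linguistic parser, and the function name is an assumption.

```python
import re
from collections import Counter

first = ("The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's "
         "500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq "
         "composite gained 6.84, or 0.32 percent, to 2,162.78.")
second = ("The Dow gained 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's "
          "500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq "
          "composite fell 6.84, or 0.32 percent, to 2,162.78.")

# Word order is discarded, so the two bags are indistinguishable.
same_bag = Counter(first.split()) == Counter(second.split())


def facts(text):
    """Extract (subject, verb, amount) triples with a hand-written pattern."""
    return re.findall(r"(Dow|index|composite) (fell|gained) (\d+\.\d+)", text)


print(same_bag)
print(facts(first))
print(facts(second))
```

Keeping the subject and verb together is what distinguishes the two texts; a frequency count alone cannot.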
                                                                 Natural Language Processing
       Part-of-speech (POS) analysis is typically one of a sequence or pipeline of resolving
       steps applied to text. Other, typically applied steps include:
               Tokenization: Identification of distinct elements within a text, usually words,
               expressions, punctuation marks, white space, etc.
               Stemming: Identifying variants of word bases created by conjugation,
               declension, case, and pluralization, e.g., “act” for “acts,” “actor,” and “acted.”
               Lemmatization: Use of stemming and other techniques, including analysis of
               context and parts of speech, to associate multiple words or terms with a
               canonical term. For example, "better" might have "good" as its lemma.
               Entity Recognition: Look-up in lexicons or gazetteers and use of pattern
               matching to discern items such as names of people, companies, products, and
               places and expressions such as e-mail addresses, phone numbers, and dates.
               Tagging: XML mark-up of distinct elements, a.k.a. text annotation.
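Several of the steps listed above can be sketched with the Python standard library alone. Everything here is a toy under stated assumptions: the suffix-stripping stemmer is far cruder than production stemmers, the gazetteer entries and patterns are hypothetical, and the tagger ignores XML escaping and overlapping entities.

```python
import re

# Hypothetical gazetteer: known names mapped to entity types.
GAZETTEER = {"IBM": "company", "Seth Grimes": "person", "Oracle": "company"}

# Hypothetical patterns for entities that follow a regular form.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def tokenize(text):
    """Split text into number, word, and punctuation tokens (white space is dropped)."""
    return re.findall(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]", text)


def stem(word):
    """Toy stemmer: strip one common suffix if at least three letters remain."""
    for suffix in ("ing", "ed", "or", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def find_entities(text):
    """Return (entity_text, entity_type) pairs found by look-up and pattern matching."""
    found = [(name, kind) for name, kind in GAZETTEER.items() if name in text]
    for kind, pattern in PATTERNS.items():
        found.extend((match, kind) for match in pattern.findall(text))
    return found


def tag(text):
    """Wrap each recognized entity in an XML-style element named for its type."""
    for entity, kind in find_entities(text):
        text = text.replace(entity, f"<{kind}>{entity}</{kind}>")
    return text


print(tokenize("The Dow fell 46.58."))
print([stem(w) for w in ("acts", "actor", "acted")])
print(tag("IBM hired Seth Grimes."))
```

Lemmatization is omitted because mapping "better" to "good" requires a dictionary and part-of-speech context rather than suffix rules.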
       Entities are one type of “feature” found in text. Other features of interest include:
               Attributes: A person’s attributes include age, sex, height, and occupation.
               Abstract attributes: Properties such as “expensive” and “comfortable.”
               Concepts: Abstractions of entities, for instance, a category.
               Metadata: In this context, items that describe a document such as the author,
               creation date, and title as well as a topic tag.
               Facts and relationships: These include statements such as “Dow fell 46.58.”
               Subjective data: Covers sentiment, opinions, mood, and other attitudinal data.
       The next section of the report looks at how the technology is applied.


Software and Solution Market Overview
        What we now see as text analytics was actually, in the late 1950s, put forward as the
        foundation for a visionary business intelligence system. This system would focus on
        discovering and communicating relationships (and not just data values) and on
        business-goal alignment. Knowledge-management questions drove this early
        BI conceptualization, with answers to questions such as:
                What is known?
                Who knows what?
                Who needs to know?
        to be derived or discovered via text mining.3
        Such systems are technically very difficult to realize, and BI of course developed in
        other directions. Numerical data, drawn from transactional and operational systems
        and stored in databases, is far easier to analyze than is information locked in text. BI
        and related tools and techniques – spreadsheets, reporting, OLAP, data mining –
        generally do an excellent job of creating business value from this data.
        In the last few years attention has turned back to text sources. Commercial software
        vendors – and open source projects – have responded to the opportunity.

        Applications and Sources
        Applications of text mining in the life sciences and intelligence date to the 1990s, for
        purposes that include pharmaceutical lead generation – mining scientific literature to
        accelerate expensive, time consuming drug-discovery processes – and counter-
        terrorism. A number of factors – the huge and growing volume of on-line content,
        advances in search and information retrieval, cheap computing power, and better
        software – have created a market for application of these same text technologies to a
        much broader variety of business, scientific, and research problems.
                                                                             Application domains
        Market awareness has grown immensely in the last 3-5 years, but up-take and
        experiences have varied by application domain. To study adoption, survey question 2
        asked, “What are your primary applications where text comes into play?” It listed the
        following choices, an attempt to capture the most important application domains:
                Brand/product/reputation management
                Competitive intelligence
                Content management or publishing
                Customer service
                Financial services
                Insurance, risk management, or fraud
                Law enforcement
                Life sciences or clinical medicine
                Product/service design, quality assurance, or warranty claims
                Research (not listed)
                Voice of the Customer / Customer Experience Management
3 “BI at 50 Turns Back to the Future,”


                                                                             Information sources
          In each of the application areas listed above, text analytics enhances existing analyses.
          It enhances both BI and data mining applied to transactional data and non-automated
          review of textual sources, a.k.a. reading. By automating the reading process, text
          analytics allows analysts and researchers to tap material that had not previously been
          systematically mined. It allows them to work far faster than before and to analyze far
          greater volumes of information than ever before. Importantly, text analytics can
          make a huge difference in text analysis and processing costs and enable the creation of
          new information products and services.
          Survey question 3 asked about information sources. These sources may be grouped:
                  On-line and social media: blogs and other social media (Twitter, social-network
                  sites, etc.); news articles; review sites or forums.
                  Enterprise communications and feedback: chat and/or instant messaging text;
                  contact-center notes or transcripts; customer/market surveys; e-mail and
                  correspondence; employee surveys; point-of-service notes or transcripts;
                  SMS/text messages; warranty claims/documentation; Web-site feedback.
                  Operational materials (of course varying by business): crime, legal, or judicial
                  reports or evidentiary materials; insurance claims or underwriting notes;
                  medical records; patent/IP filings; scientific or technical literature.
                                                                              Application modes
          The applications themselves vary widely. They may be classified in several
          (overlapping) groups:
                  Media and publishing systems – the author includes search engines here – use
                  text analytics to generate metadata and enrich and index metadata and
                  content in order to support content distribution and retrieval. Semantic Web
                  applications would fit in this category.
                  Content management systems – and, again, related search tools – use text
                  analytics to enhance the findability of content for business processes that
                  include compliance, e-discovery, and claims processing.
                  Line-of-business systems for functions such as compliance and risk, customer
                  experience management (CEM), customer support and service, human
                  resources and recruiting.
                  Investigative and research systems for functions such as fraud, intelligence
                  and law enforcement, competitive intelligence, and life sciences research.
          This list is representative and not exhaustive. All listed areas are experiencing strong
          growth. In certain cases, text analytics’ contribution is not at all obvious. Google and
          other major search engines top their responses to “map massachusetts” and “34+178”
          and “orcl” with a map, the number 212, and Oracle share data, respectively, enabled by
          their ability to recognize named entities and expressions. This particular application
          of text analytics is shallow but reaches a very, very large audience.
                                                                              Solution providers
          Text-analytics solution providers include a significant cadre of young but mature
          pure-play software vendors, software giants that have built or acquired text
          technologies, robust open-source projects, and a constant stream of start-ups, many of
          which focus on market niches or specialized capabilities such as sentiment analysis.
          The provider-side is vibrant and doing well despite the adverse economic climate due
          to the market’s growing awareness of solution providers’ ability to respond to
          business needs and technical challenges alike.4



Demand-Side Perspectives
       Alta Plana designed a spring 2009 survey, “Text Analytics demand-side perspectives:
       users, prospects, and the market,” to collect raw material for an exploration of key text-
       analytics market-shaping questions:
                What do customers, prospects, and users think of the technology, solutions,
                and vendors?
                What works, and what needs work?
                How can solution providers better serve the market?
                Will companies expand their use of text analytics in the coming year?
                Will spending on text analytics grow, decrease, or remain the same?
       It is clear that current and prospective text-analytics users wish to learn how others
       are using the technology, and solution providers of course need demand-side data to
       improve their products, services, and market positioning, to boost sales and better
       satisfy customers. The Alta Plana study therefore has two goals:
           1.   To raise market awareness and educate current and prospective users.
           2. To collect information of value to sponsors.
       Survey findings, as presented and analyzed in this study report, provide a
       measure of the state of the market, a form of benchmark. They are designed to be of
       use to everyone who is interested in the commercial text-analytics market.

       Study Context
       Text-analytics solutions have been applied to a spectrum of business problems.
       Provider revenues are booming (for most established providers). Academic and
       industrial research continues to expand, and new companies, many of them
       academic spin-offs, have emerged at a steady pace. Demand-side views
       are, anecdotally, quite positive, judging from published user stories and case studies
       and based on inquiries from organizations that are researching solutions.
       The author previously explored market questions in a number of papers and articles.
       These included white papers created for the Text Analytics Summit in 2005, “The
       Developing Text Mining Market,”5 and 2007, “What's Next for Text.”6
                                                              Analyst and Provider Analyses
       The 2007 paper contains a number of telling quotations:
           “Organizations embracing text analytics all report having an epiphany
           moment when they suddenly knew more than before.”
                                          – Philip Russom, the Data Warehousing Institute
           “Growth is largely driven by the wealth of unstructured information found
           on the external web, in corporate intranets, document repositories, call-
           centers, and in customer and employee business communications.”
                                                           – IBM researcher David Ferrucci
       Other analysts and solution providers have had a lot to say about text analytics’
       benefits and growth. The article “Perspectives on Text Analytics in 2009”7 is a
       systematic (albeit informal) survey of industry perspectives that reports provider
       CEO and CTO and thought-leader responses to the query:
              “What do you see as the 3 (or fewer) most important text analytics
              technology, solution or market challenges in 2009?”
          Responses were informative, based on the respondents’ own research and, especially,
          on extensive contact with customers and prospects.
          In the current context, a market challenge articulated by Aaron B. Brown, IBM
          program director for ECM Discovery, is particularly telling. That challenge is for
          text-analytics solution providers to better define business cases. According to Brown,
              “In the current economic situation, organizations are clamping down on new
              projects and more than ever looking for hard ROI savings to justify
              investment. To pass the funding bar, text-analytics solutions, which typically
              fall in the category of new projects undertaken for business optimization, need
              to come with solid business cases that demonstrate hard-dollar operational
              savings based on proven examples. Given the emerging nature of many text-
              analytics solution areas, this will be a challenge to growth in 2009.”
          Business cases don’t rest solely on solution-provider research and assertions, of
          course. Demand-side experiences and perceptions can and should also contribute.
                                                                            Demand-Side Views
          A systematic look at the demand side will provide a good complement to provider-
          side views and to vendor- and analyst-published case studies.
          Alta Plana’s 2008 study report, “Voice of the Customer: Text Analytics for the Responsive
          Enterprise,”8 published by, was our first systematic survey of
          demand-side perspectives, albeit focused on a particular set of business problems.
          VoC analysis is frequently applied to enhance customer support and satisfaction
          initiatives, in support of marketing, product and service quality, brand and reputation
          management, and other enterprise feedback initiatives. The listening concept is
          extended to other voice applications: Voice of the Patient, Voice of the Market, etc.
          Views related in our 2008 study were telling:
              “Text Analytics is exciting technology, opening up new applications
              and approaches to solving information needs and supporting decision
              making for an improved customer experience.”
                                – Michael House, Maritz Research, Division Vice President
              “We've uncovered concepts and relationships in text that would be too
              costly – or even impossible – to detect by any other methods. We can
              now combine multiple data sources to evaluate customer expectations
              and improve customer satisfaction by employing more one-to-one
              customer contact and preemptively resolving customer complaints to
              keep our retention rates high.”
                        – Federico Cesconi, Cablecom, head of customer insight and retention

          About the Survey
          There were 116 responses to the 2009 survey, which ran from April 13 to May 10.
                                                                              Survey invitations
          The author solicited responses via:
                       E-mail to the TextAnalytics, Corpora, datamining2, sla-dkm (Special
                       Libraries Association, Division for Knowledge Management), sla-dite
                       (SLA Information Technology), Asis-l (American Society for
                       Information Science), and GATE lists and the author's personal list.
                       Invitations published in electronic newsletters: Intelligent Enterprise,
                       KDnuggets,, TDWI's BI This Week,
                       Text Analytics Summit, and
                       Notices posted to LinkedIn forums and Facebook groups and on
                       Messages sent by sponsors to their communities.
                                                                             Survey introduction
           The survey started with a definition and brief description as follows:
               Text Analytics is the use of computer software to automate:
                         annotation and information extraction from text – entities, concepts,
                         topics, facts, and attitudes,
                         analysis of annotated/extracted information, and
                         document processing – retrieval, categorization, and classification, and
                         derivation of business insight from textual sources.
               This is a survey of demand-side perceptions of text technologies, solutions, and
               providers. Please respond only if you are a user, prospect, integrator, or
               There are 20 questions. The survey should take you 5-10 minutes to complete.
               For this survey, text mining, text data mining, content analytics, and text
               analytics are all synonymous.
               I'll be preparing a free report with my findings. Thanks for participating!
                                                                                   Survey response
           There is little question that the survey results overweight current text-analytics users
           relative to the broad set of potential business, government, and academic users.
           Current users account for 63% of respondents who answered Q1, “How long have you
           been using Text Analytics?,” and for 61% of respondents who replied to Q7, “Are you
           currently using text analytics?”
           BI market comparison
           We can infer this overweighting, for example, from market-size figures. The author
           estimates a $350 million global market for text-analytics software and vendor-supplied
           support and services. By contrast, in March 2009, research firm IDC published a
           preliminary 2008 BI-market estimate. IDC's sizing “suggests that the business
           intelligence tools software market grew 6.4% in 2008 to reach $7.5 billion.”9 Former
           Forrester analyst Merv Adrian estimated $8.4 billion for 2008. A simple, good-enough
           heuristic says that if the BI market is 20 times the size of the text-analytics market,
           there are likely around 20 times as many BI users as there are text-analytics users.
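
           As a rough check on the heuristic, the market-size ratio can be computed directly
           from the figures cited above. The following is a minimal Python sketch; the variable
           and function names are illustrative assumptions, and the calculation assumes, as the
           heuristic does, that user counts scale with market size.

```python
# Market-size figures cited in the text, in US dollars.
TEXT_ANALYTICS_MARKET = 350_000_000  # the author's estimate
BI_MARKET_ESTIMATES = {
    "IDC": 7_500_000_000,          # preliminary 2008 estimate
    "Merv Adrian": 8_400_000_000,  # estimate for 2008
}


def market_ratio(bi_market: int, text_market: int) -> float:
    """Return how many times larger the BI market is than the text-analytics market."""
    return bi_market / text_market


ratios = {
    source: market_ratio(size, TEXT_ANALYTICS_MARKET)
    for source, size in BI_MARKET_ESTIMATES.items()
}

for source, ratio in ratios.items():
    print(f"{source}: BI market is about {ratio:.1f} times the text-analytics market")
```

           Both ratios come out a little above 20, which is consistent with the round figure
           used in the heuristic.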
           Data mining comparison
           Another contrasting data point is that 55% of respondents to a March 2009 KDnuggets
           poll10 report currently using text analytics on projects. KDnuggets reaches data
           miners, a technically sophisticated audience who are among the most likely of any
           market segment to have embraced text analytics. The rate of text-analytics adoption
           by data miners surely exceeds the rate of adoption by any other user sector.



                     How much did you use text analytics / text mining in 2008?
             Did not use (45)                                                     45%
             Used on < 10% of my projects (17)                    17%
             Used on 10-25% of projects (14)                    14%
             Used on 26-50% of my projects (11)                11%
             Used on over 50% of my projects (14)               14%
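
       The 55% usage figure can be recovered from the response counts in the poll table.
       A minimal Python sketch, assuming the parenthesized numbers are respondent counts
       (the variable names are illustrative):

```python
# Response counts as shown in the KDnuggets poll table.
poll_counts = {
    "Did not use": 45,
    "Used on < 10% of my projects": 17,
    "Used on 10-25% of projects": 14,
    "Used on 26-50% of my projects": 11,
    "Used on over 50% of my projects": 14,
}

total = sum(poll_counts.values())
users = total - poll_counts["Did not use"]
share_using = 100 * users / total

print(f"{users} of {total} respondents used text analytics: {share_using:.0f}%")
```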
       As an aside, 52% of KDnuggets respondents stated that they would use text analytics
       more in 2009 than in 2008, and 42% stated that their use would be about the same as
       in 2008, which strongly suggests growth in the user base.

Demand-Side Study 2009: Response
       The subsections that follow tabulate and chart survey responses, which are presented
       without unnecessary elaboration.

       Q1: Length of Experience

                           How long have you been using Text Analytics?

             Length of experience                     Response %    Cumulative Response
             not using, [...] plans to                    16%
             currently evaluating                         22%
             less than 6 months                            8%               8%
             6 months to less than one year                5%              13%
             one year to less than two years               7%              20%
             two years to less than four years            18%              37%
             four years or more                           25%              63%
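
       The cumulative row for Q1 is a running total over the five categories of
       respondents who are already using text analytics. A minimal Python sketch that
       works from the rounded percentages in the table (the names are illustrative):

```python
from itertools import accumulate

# Rounded response percentages for respondents already using text analytics,
# ordered from shortest to longest experience.
experience_pct = {
    "less than 6 months": 8,
    "6 months to less than one year": 5,
    "one year to less than two years": 7,
    "two years to less than four years": 18,
    "four years or more": 25,
}

# Running total: each entry adds its own share to everything before it.
cumulative = dict(zip(experience_pct, accumulate(experience_pct.values())))

for label, running_total in cumulative.items():
    print(f"{label}: {running_total}%")
```

       Summing the rounded figures gives 38% at the fourth category where the table shows
       37%. The one-point gap is presumably a rounding effect, since the published
       percentages are themselves rounded.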

       Q2: Application Areas

                 What are your primary applications where text
                               comes into play?

       Brand / product / reputation management              40%
       Competitive intelligence                             37%
       Voice of the Customer / Customer Experience …        33%
       Research (not listed)                                33%
       Customer service                                     22%
       Content management or publishing                     19%
       Life sciences or clinical medicine                   18%
       Insurance, risk management, or fraud                 17%
       Financial services                                   15%
       E-discovery                                          15%
       Product/service design, quality assurance, or …      14%
       Other                                                13%
       Compliance                                            8%
       Law enforcement                                       7%


               Q3: Information Sources

                   What textual information are you analyzing or do you
                                     plan to analyze?

       blogs and other social media                                    47%
       news articles                                                   44%
       e-mail and correspondence                                       36%
       on-line forums                                                  35%
       customer/market surveys                                         34%
       scientific or technical literature                              27%
       contact-center notes or transcripts                             25%
       Web-site feedback                                               21%
       review sites or forums                                          21%
       medical records                                                 16%
       employee surveys                                                16%
       insurance claims or underwriting notes                          15%
       chat and/or instant messaging text                              15%
       other                                                           14%
       crime, legal, or judicial reports or evidentiary materials      13%
       point-of-service notes or transcripts                           12%
       patent/IP filings                                               11%
       SMS/text messages                                                8%
       warranty claims/documentation                                    7%


       Q4: Return on Investment
       Question 4 asked, “How do you measure ROI, Return on Investment? Have you
       achieved positive ROI yet?” Results are charted from highest to lowest values of the
       sum of “currently measure” and “plan to measure”:
              How do you measure ROI, Return on Investment?
              (share of respondents who currently measure or plan to measure)

              increased sales to existing customers                         54%
              higher satisfaction ratings
              improved new-customer acquisition                             46%
              higher customer retention/lower churn                         39%
              reduction in required staff/higher staff productivity         38%
              more accurate processing of                                   36%
              faster processing of claims/requests/casework                 36%
              ability to create new information                             34%
              fewer issues reported and/or service complaints               30%
              lower average cost of sales, new & existing customers         30%
              higher search ranking, Web traffic, or ad response            28%

              [Chart legend: Currently Measure; Plan to Measure; Measure or Plan to Measure]

       Q5: Mindshare
       A word cloud, generated at, seemed a good way to present responses to the
       query, “Please enter the names of text-analytics companies you have heard of.”


       Q6: Spending
       Question 6 asked, “How much did your organization spend in 2008, and how much do
       you expect to spend in 2009, on text-analytics solutions?”

       [Figure: two pie charts, “2008 Spending” and “2009 Spending,” each divided into the
       categories: use open source; under $50,000; $50,000 to $99,000; $100,000 to
       $199,999; $200,000 to $499,999; $500,000 or above.]

       Q8: Satisfaction
       Question 8 asked, “Please rate your overall experience – your satisfaction – with text
       analytics.” Results are as shown:

       [Figure: pie chart of satisfaction ratings. Legible legend entries: Completely
       satisfied; Disappointed; Very disappointed. One slice is labeled 53%.]

       Q9: Overall Experience
       Question 9 asked, “Please describe your overall experience – your satisfaction – with
       text analytics.” The following are 32 verbatim responses, lightly edited for spelling
       and grammar and to mask the two products that were named:

       We are highly satisfied. Costs were lower than expected due to high degree of automation.

       Expectations were exceeded. More timely and more fine grained customer insight and market
       intelligence and competitive intelligence than ever before.

       It's been a fun journey, but still struggling with how to get to root cause and how far text
       analytics can get you there vs. need analysts.

       No one solution addresses every use case. Some solutions better address the up-front creation of
       dictionaries than others.

       I would like a more automated system that integrates with our current IS.

       Not really neutral but it's sort of a love hate thing. There's a very high learning curve,
       sometimes it's seductive to measure things that aren't relevant - to run things just because you
       can, not because they tell you anything. But the customers like it - even if they don't understand

       I want to see more applications

       Pretty good on named entity extraction, fairly good on fact extraction, poor on sentiment

       Several possibilities, several applications; Emphasis on efficiency enhancing; solutions;
       Problems in selling accuracy.

       I was satisfied of the effectiveness of the tools - specifically for named-entity recognition.

       Good but still have a ways to go with capabilities

       OK, it is hard to describe satisfaction of using text analytics tools when we all know how
       language is ambiguous and complex - we cannot expect too much from automatic processing yet,
       maybe in the time when neutral networks can be used, but NLP on its own cannot impress us
       yet I think.

       Developing part-of-speech tagging for Arabic text, morphological analyzer, to deal with wide
       range of text domain, formats and genres.

       Frustration with developing custom dictionaries that allow real-time categorization of content.

       Pleased with progress in neural analysis of text content.

       I'm building this all myself using open source tools. I'm extremely satisfied.

       Hard learning curve, but we have it going now.

       We have pretty low expectations for the accuracy of automated classification techniques, and
       those are fulfilled but not exceeded. We use automated categorization in building demos, but
       most of our customers use semi-automated or manual tagging.

       It has been extremely valuable in certain situations. We always look at the text and verbatims
       with our [product] software

       It's great, but most of it is primarily designed for the English language only. As soon as you
       need other languages, you need a lot of different providers (= increased implementation costs)
       or you have to pay a lot of money.

       I have written an entire textbook based upon text analytics and plan to write another.

       92% accuracy, 6.7 fold increase in productivity, cut search time by 50%

       Hundreds of hours of auditors’ time has been saved by a combination of scanning of hard copy
       evidence, electronic evidence collection, and importing into [product], building business rules
       from auditors defined keywords to produce first cut analysis classification.

       Very satisfied - state-of-the-art in text analytics is advancing at a very rapid pace and text-
       analytics based solutions are able to demonstrate business value addition/ROI.

       Feedback from our users with the current tools is that they are not meeting their needs, which
       is why we are looking at other solutions.

       Difficult implementation into our core software, but now works as designed.

       We have presented sentiment analysis on a wide range of documents and used the information
       to be predictive in nature.

       Text analytics allows us to gain new customer and market insights as well as better
       competitive intelligence: higher report frequency, automated reporting, lower cost, finer

       Great hopes.

       Long way to go.

       Too early to tell.

       10 million Voice of Customer can be in real time understood.


       Q12: Like and Dislike
       Question 12 asked, “What do you like or dislike about your solution or software
       provider(s)?” Respondents were allowed to enter up to five points. Twenty-seven
       individuals responded, entering a total of 82 points. One respondent entered “cost” in
       all five slots.
       The following table normalizes, classifies as positive or negative, and groups the
       responses into thematic categories. We take the sum of positive and negative remarks
       in a category as indicating the category's importance, so the chart is sorted in
       descending order of (sum) number of remarks.
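
       The tallying and sorting just described can be sketched in a few lines of Python.
       This is an illustration under stated assumptions, not the author's actual
       procedure: the category names and remarks below are hypothetical placeholders,
       not survey data, and normalization and classification are assumed to have been
       done by hand beforehand.

```python
from collections import Counter

# Hypothetical normalized remarks as (category, polarity) pairs.
# These are illustrative placeholders, not responses from the survey.
remarks = [
    ("cost", "minus"),
    ("cost", "minus"),
    ("cost", "plus"),
    ("support", "plus"),
    ("support", "plus"),
    ("accuracy", "minus"),
]

plus = Counter(category for category, polarity in remarks if polarity == "plus")
minus = Counter(category for category, polarity in remarks if polarity == "minus")

# A category's importance is taken as positive plus negative remarks.
totals = plus + minus
ranked = sorted(totals, key=lambda category: totals[category], reverse=True)

for category in ranked:
    print(f"{category}: plus={plus[category]}, minus={minus[category]}, "
          f"total={totals[category]}")
```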
                  What do you like or dislike about your solution or
                               software provider(s)?

       [Chart: number of positive (“Plus”) and negative (“Minus”) remarks per thematic
       category.]


              Q13: Information Types

                  Do you need (or expect to need) to extract or analyze -

       Other                                                                        15%
       Other entities – phone numbers, e-mail & street
       Metadata such as document author, publication date, title, headers, etc.
       Events, relationships, and/or facts                                          55%
       Concepts, that is, abstract groups of entities                               58%
       Sentiment, opinions, attitudes, emotions                                     60%
       Topics and themes                                                            65%
       Named entities – people, companies, geographic locations, brands,
       ticker symbols, etc.

              Q19: Comments
              There were twelve comments. Several pushing-the-envelope respondent observations
              were particularly interesting:
                   “We were shocked at the lack of appreciation for hosted and/or turnkey
                   solutions from many vendors we evaluated in 2008. The product capabilities
                   of many commercial solutions were poorly conceived, leading us to believe
                   that they didn't really understand the potential of text analytics.”
                   “As a market research supplier, my clients cross a number of industries.
                   Thus, lack of scalability is the major obstacle to adopting text analysis for my
                   “Twitter data requires new text analytic algorithms, because of the presence
                   of ‘@person’ fields, hashtags, and HTML links that have been shortened. As a
                   consequence, "traditional" algorithms don't work. I am developing those
                   algorithms myself, which is yet another reason I use open source tools
              One other comment is interesting and prompts a response.
                   “We are building an information retrieval product and wish to embed out-of-
                   the-box functionality but with the option to plug in other 3rd party analytical
              The response is that several frameworks provide a plug-in architecture for the
              construction of IR and other text-analytics applications. These include:
                       UIMA, the Unstructured Information Management Architecture, an Apache
                       Incubator project that was recently approved as an OASIS standard.
                       GATE, the General Architecture for Text Engineering.
                       Eclipse SMILA, a new SeMantic Information Logistics Architecture project.
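
              To make the plug-in idea concrete, the following is a minimal, generic Python
              sketch of a pipeline that ships with a built-in component and accepts
              additional ones. It does not use or reproduce the APIs of UIMA, GATE, or
              SMILA; the `Pipeline` class, its methods, and both annotator functions are
              hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List

# An annotator takes text and returns a list of annotations (plain strings here).
Annotator = Callable[[str], List[str]]


class Pipeline:
    """Hypothetical plug-in registry: components are registered by name and run in order."""

    def __init__(self) -> None:
        self._annotators: Dict[str, Annotator] = {}

    def register(self, name: str, annotator: Annotator) -> None:
        self._annotators[name] = annotator

    def run(self, text: str) -> Dict[str, List[str]]:
        return {name: annotate(text) for name, annotate in self._annotators.items()}


def capitalized_words(text: str) -> List[str]:
    """Built-in ("out-of-the-box") component: a naive capitalized-word spotter."""
    return [word.strip(".,") for word in text.split() if word[:1].isupper()]


def hashtags(text: str) -> List[str]:
    """Stand-in for a third-party component plugged in later."""
    return [word.strip(".,") for word in text.split() if word.startswith("#")]


pipeline = Pipeline()
pipeline.register("capitalized", capitalized_words)
pipeline.register("hashtags", hashtags)

result = pipeline.run("Acme praised the new release. #launch")
print(result)
```

              The design choice shown is that the host application depends only on the
              annotator signature, so a third-party component can be added without changing
              the pipeline itself.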


                 Q14: Important Properties & Capabilities
                 What is important in a solution?

                                   Important Properties & Capabilities

       ability to use specialized dictionaries, taxonomies, or extraction rules
       broad information extraction capability                                     59%
       deep sentiment/opinion extraction                                           53%
       low cost                                                                    51%
       support for multiple languages                                              39%
       predictive-analytics integration                                            37%
       BI (business intelligence) integration                                      35%
       open source                                                                 24%
       ability to create custom workflows                                          24%
       sector adaptation (e.g., hospitality, insurance, retail, health care,
       communications, financial services)
       media monitoring/analysis interface                                         22%
       hosted or "as a service" option                                             22%
       supports data fusion / unified analytics                                    19%
       interface specialized for your line-of-business                             17%
       vendor's reseller/integrator/OEM relationships with tech or
       service providers
       other                                                                        9%


Additional Analysis
       The survey was designed so that responses to questions would be easy to interpret and
       immediately useful without elaborate cross-tabulation or filtering. The exception was
       cross-tabulation of other variables against length of time using text analytics and
       against whether a respondent is currently using text analytics.

       Selected Cross-tabulations
       The author's interpretation of survey findings generally supports prior notions,
       including the following points:
                 Length of involvement with text analytics correlates with particularity of
                 requirements. Each bar represents the percentage of respondents in a time
                 category who indicated that “ability to…” is important:
                 [Chart: two series, “Ability to use specialized dictionaries, taxonomies,
                 or extraction rules is important” and “Ability to create custom workflows
                 is important,” by time category: less than 6 months; 6 months to less than
                 one year; one year to less than two years; two years to less than four
                 years; four years or more.]

                 Length of involvement with text analytics correlates with preference for open
                 source.

                 [Chart: “Open source is important versus Time using Text,” by time
                 category: less than 6 months; 6 months to less than one year; one year to
                 less than two years; two years to less than four years; four years or more.]

                                                                                     Using / Not
       Other interesting points come out of contrasting respondents who are already using
       text analytics with respondents who are still in planning stages.
       The top responses to “What textual information are you analyzing or do you plan to
       analyze?” for current users are:
                       blogs and other social media (Twitter, social-network sites, etc.)    62%
                       news articles                                                         55%
                       on-line forums                                                        41%
                       e-mail and correspondence                                             38%
                       customer/market surveys                                               35%
       These are on-line and other feedback-rich sources. Their high rate of selection
       suggests that veteran users have found significant benefit in these sources.
       By contrast, only three information-source categories were selected by over 26% of
       respondents who are not yet using text analytics:
                       e-mail and correspondence                                37%
                       customer/market surveys                                  34%
                       contact-center notes or transcripts                      29%
       It's easy to infer that the value of on-line materials (social media, news articles,
       forums), which is evident to current users, has not yet become clear to prospective
       users. That only a minority chose any particular category suggests some combination
       of the following:
               Prospective users are more broadly distributed across application categories.
               Prospective users are cautious about how many different sources they tackle
       The particular top selections suggest that the plurality – the largest portion – of
       prospective users will focus initially on materials they have on hand that involve
       interactions with known stakeholders. Web sources can come later.
       Prospective users are not similarly guarded in their expectations. When responses to
       Question 4 “How do you measure ROI, Return on Investment?” are split out by
       current versus prospective use, six measures are each selected by between 50% and
       55% of prospective-user respondents. They are:
               increased sales to existing customers
               improved new-customer acquisition
               higher satisfaction ratings
               fewer issues reported and/or service complaints
               faster processing of claims/requests/casework
               reduction in required staff/higher staff productivity
       (Of prospective-user respondents, almost a quarter are already using “increased sales
       to existing customers” as an ROI measure, which makes sense. Sales are easily tracked
       and analyzed by current systems, whereas items such as satisfaction ratings are not.)
       “Higher customer retention/lower churn” comes in at just under 50% and three others
       top 38%.
       These prospective users, and the folks who advise them, would do well to manage and
       focus their expectations.

       Interpretive Limitations
       The number of survey respondents was not large enough to support further useful
       cross-tabulation of variables beyond the analyses above.
       In interpreting presented findings, do keep in mind that the survey was not designed
       or conducted scientifically, that is, with the intention or the actuality of creating a
       random sample or a statistically robust characterization of the broad market.
       Findings surely reflect selection bias due to 1) the outlets where the survey was
       advertised and 2) a likelihood that those individuals who are unaware of text analytics
       or the potential for text analytics to help them solve their business problems would
       not respond to the survey. Findings therefore over-represent current text-analytics
       users and also over-represent, to a lesser extent, the business intelligence and data
       warehousing communities.
       Finally, responses to several of the survey questions were not especially illuminating
       or likely to be of much use to report readers. These questions are, in particular,
              Question 10. Who is your provider?
              Question 11. How did you identify and choose your provider?
              Question 15. What BI (business intelligence) software do you use if any?
              Question 16. What social media do/would you look to for text-analytics
              contacts, discussions, or information?
              Question 17. What industry publications do you receive, on paper or
              Question 18. What industry/technical conferences do you attend?


About the Study
       Text Analytics 2009: User Perspectives on Solutions and Providers reports the findings
       of a study conducted by Seth Grimes, president and principal consultant at Alta Plana
       Corporation. Findings were drawn from responses to a spring 2009 survey of current
       and prospective text-analytics users, consultants, and integrators. The survey asked
       respondents to relay their perceptions of text-analytics technology, solutions, and
       vendors. It asked users to describe their organizations' usage of text analytics and
       their experiences.

       The author is grateful for the support of seven sponsors – Attensity, Clarabridge, the
       University of Sheffield (GATE project), IxReveal, Nstein, SAP, and TEMIS – whose
       financial contribution enabled him to conduct the current study and publish study
       findings. The content of the sponsor solution profiles was provided by the sponsors.
       The survey findings and the editorial content of this report do not necessarily
       represent the views of the study sponsors. This report, with the exception of the
       sponsor solution profiles, was not reviewed by the sponsors prior to publication.

       Media Partners
       The author acknowledges assistance received from six media partners in
       disseminating invitations to participate in the survey. Those media partners are
       Intelligent Enterprise, KDnuggets,,, the
       Text Analytics Summit, and The Data Warehousing Institute (TDWI).

       Seth Grimes
       Author Seth Grimes is an information technology analyst and analytics strategy
       consultant. He is contributing editor at Intelligent Enterprise magazine, founding chair
       of the Text Analytics Summit, an instructor for The Data Warehousing Institute (TDWI),
       KDnuggets contributor, and text analytics channel expert at the Business Intelligence
       Seth founded Washington DC-based Alta Plana Corporation in 1997. He consults,
       writes, and speaks on information-systems strategy, data management and analysis
       systems, industry trends, and emerging analytical technologies.
       Seth can be reached at 301-270-0795.

        Sponsor Solution Profiles

Solution Profile: Attensity
Business is built on conversations. These customer, partner, and employee conversations are captured in emails, call
notes, letters, surveys, forums and other social media, and more. Attensity enables you to use these conversations
to drive better relationships with your customers – transforming them into loyal advocates of your business.
Attensity delivers the power of sophisticated data and semantic analytics in an integrated suite of easy-to-use
business applications, allowing business leaders, customer support personnel, and customers to get relevant and
actionable answers fast.
An Integrated Suite of Products to Help You Manage the Customer Conversation: Analyze and Respond
Attensity's ability to extract valuable insight from free-form text anywhere and transform it into actionable insights
offers organizations the opportunity to understand their customers and to manage the entire customer conversation
– analyzing and responding to customer needs. Recognized as best-of-breed by leading analysts for more than a
decade, our applications, powered by the industry’s leading natural language processing technologies, are designed
to automate related business processes, and add the rigor and speed necessary to swiftly identify often subtle
relationships and root causes and to respond timely and accurately to customers. Equally important, our easy-to-use
business applications are not only designed for analysts, but also for business leaders, researchers, brand and
category managers, and customer service representatives, while also used directly by customers to efficiently self-
Attensity Voice of the Customer/Market Voice allows your organization to glean and analyze your customers’ candid
thoughts about your brand and products, rapidly and accurately understanding and analyzing comments in E-Service
records, surveys, and emails, along with the market buzz found in web communities, blogs, product reviews and social
media sites. This delivers the actionable insights - authentic customer sentiments and issues around your brand,
products, services, your competitors and more -- you need to make smarter decisions and deliver better products and
services. Attensity Voice of the Customer/Market Voice features sophisticated integrated reporting and pre-
packaged Voice of the Customer extraction domains for fast-time-to-value, detailed sentiment analysis, and an
extensive partner solutions network to help you extend the value of your applications.
Attensity’s other products include E-Service Suite, Automated Response Management, Research and Discovery, and
Intelligence Analysis. E-Service Suite offers an Agent Service Portal and a Self-Service application that enables your
customers to effectively self-serve while your agents are empowered to extend informed and efficient service
support in real time. Attensity Automated Response Management, a part of the E-Service Suite, optimizes and
automates up to 100% of the handling of all incoming and outbound customer communications, enabling you to
deliver a superior customer experience while achieving significant operational efficiency and productivity gains in your
contact center. Research and Discovery provides your organization with sophisticated information extraction,
advanced classification and enterprise-class search of and access to internal and external data, helping you meet
compliance and litigation demands while controlling costs. Intelligence Analysis allows commercial and government
organizations to “connect the dots” by delivering automatic extraction and analytical processing of “relational events”
from unstructured data –not only who or what, but the “why, when, where and how.”
A Relentless Focus on Customer Success

Companies across the full industrial spectrum and around the globe are discovering how our advanced solutions help
them thrive by helping resolve customer support issues more quickly, enable more accurate research and analysis of
customer feedback, and rapidly address and proactively prevent problems while mitigating risk. Across industries,
companies are optimizing customer interaction processes in the contact center, deepening customer relations
through effective and efficient self-serve support, and growing their competitive edge with Attensity solutions
adapted to their industry specific business needs.
Attensity’s team of vertical experts allows us to provide expert advice and specialized applications for areas such as
aerospace, automotive, consumer packaged goods, contact center outsourcing, financial services and insurance,
government and law enforcement, healthcare, hospitality, manufacturing, media and publishing, retail, technology,
and telecommunications. Attensity has a strong record of customer success across all of our products, including Voice
of the Customer, E-Service, and Research and Discovery. Three of our VoC success stories are presented here.
JetBlue Airways
New York-based JetBlue Airways has created a new airline category based on value, service and style. Known for its
award-winning service and free TV as much as its low fares, JetBlue is now pleased to offer customers the most
legroom throughout coach (based on average fleet-wide seat pitch for U.S. airlines). JetBlue is also America’s first and
only airline to offer its own Customer Bill of Rights, with meaningful compensation for customers inconvenienced by
service disruptions within JetBlue’s control.
JetBlue Airways currently uses Attensity’s Voice of the Customer application in its customer service organization to
uncover customer issues, requirements and overall sentiment about the airline. The company’s pilot project
demonstrated a significant ability to find key information about customer sentiment and tangible data around how to
augment its services. JetBlue uses Attensity VoC to proactively manage and analyze all freeform customer feedback to
improve service, marketing, sales and the products they offer.
“From our Customer Bill of Rights to our customer advisory council, JetBlue is dedicated to bringing humanity back to
air travel,” Bryan Jeppsen, Research Analyst Manager said. “One of the best ways to do that is to listen — truly listen
— to our customers. Our commitment with Attensity enables us to mine subtle but important clues from all forms of
customer communications to continue improving all aspects of our company. We’re eager to learn as much as we
can, and we’re excited to have Attensity’s simple to use yet sophisticated software at our service.”
JetBlue customer service analysts use Attensity VoC daily to cull insights and actions from feedback. “Attensity Voice
of the Customer offers us the unprecedented ability to automatically extract customer sentiments, preferences and
requests we simply wouldn’t find any other way,” according to Jeppsen. “Attensity VOC enables us to intelligently
structure, search and integrate the data into our other business intelligence and decision-making systems.”
Charles Schwab
For this Global 1000 investment services firm, Attensity is a central part of efforts to understand and act on customer
feedback. With hundreds of thousands of interactions per month, the need to understand customer issues, act on
signs of dissatisfaction and churn and drive sales and service interactions can be the difference between success and
failure. With Attensity they are able to capture these interactions through customer service notes, emails, survey
responses and online discussions and analyze them to power customer retention and growth.
Attensity Voice of the Customer enables Schwab to analyze customer feedback to drive proactive programs and
understand emerging issues and opportunities, communicate key issues and opportunities at the client segment level
on a daily basis, and integrate this valuable customer feedback into their SAS analytics platform on their Teradata
data warehouse to expand the customer signature and to deepen customer loyalty analytics.
Attensity has become integral to Schwab customer program planning and churn identification efforts. The firm has
improved satisfaction and been able to mitigate churn via improved direct broker communications with customers
and marketing programs. Customer satisfaction, specifically reasons customers are not happy, is directly monitored
and specific issues are addressed. Issues can include problems with services, communication, collateral, and specific
individual interactions. Attensity also helps the firm dig deep into Net Promoter™ Program results, uncovering
reasons customers give low scores and identify as “detractors.” Attensity contributed to important changes to
account statements. Most importantly, Attensity reduced the time needed to analyze customer satisfaction issues
from almost one year to less than one week!
Whirlpool
As a $13.2B appliance manufacturer and the #1 appliance manufacturer in the world, Whirlpool focuses on great
products and great customer relationships to maintain and grow its global customer base. As a customer-centered
company, Whirlpool needs to understand the root cause of pain points and brand, product, and service related issues.
With the vast amounts of customer service records, emails, survey responses and online community forums, there is
more than enough data to get and use customer insights to improve customer experiences.
When Whirlpool started with Attensity in 2004, the company wanted to be able to leverage the web and over 8.5
million annual customer and repair visit interactions captured in service notes to drive marketing programs, product
development, and quality initiatives. Whirlpool has done just that and more. With over 300 Attensity VoC users
worldwide, Whirlpool listens to and acts on customer data in the service department, its innovation and product
development groups, and in the market every day.
With Attensity VoC, Whirlpool gets early warning of safety and warranty issues and has been able to mitigate
expensive recalls through rapid change out of defective parts. Whirlpool extrapolates an ~80% reduction in the cost
of recalls due to early detection. In addition to Attensity-fueled product quality improvements, Whirlpool better
understands customers’ needs and wants – and the competition and what they are doing to win over customers.

Solution Profile: Clarabridge
Clarabridge was founded with the simple premise of enabling companies to drive business value by
understanding key customer and prospect experiences. Now more than ever, consumer-focused companies
turn to Clarabridge to help retain customers, attract new customers, cut servicing and operational costs,
sell more products to current customers, and develop more relevant products and services. Clarabridge is
the leading provider of text mining software for Customer Experience Management (CEM) due to four key
    Commitment to CEM applications: Clarabridge’s rapid growth is due to a focus on the
    value our customers gain from leveraging our VOC solutions. Our staff, technology,
    customers, and partners are all 100% focused around delivering VOC applications, and our
    entire company is committed to providing business value for our customers.
    Speed-to-Value: No other advanced text mining solution can be deployed as efficiently and
    powerfully as Clarabridge. Whether an implementation is source specific or enterprise-wide,
    no other vendor can compete with the speed with which our customers not only implement but
    recognize value.
    Market Leadership: We believe that being a market leader is more than market statistics
    and sales wins. While Clarabridge dominates these statistics, we believe that being the
    market leader also means being a thought leader, an innovator and a standard-setting force
    in the marketplace. Clarabridge is the first company in our industry to organize a specific
    user group and conference on using text-mining to support VOC.
    The Best Technology: There are many great technologies in the text mining world. Some
    are proven in academia and government think tanks, others within very controlled
    implementations. But no current text mining technology can compete with our ability to deliver
    repeatable and tangible business value within the commercial space.
Enabled by text analytics, CEM provides the opportunity to create innovative offerings from the
start while targeting the precise customer segments and later react to customer feedback on
desired improvements and enhancements.
Text Mining to Support Business Improvements
Clarabridge’s content mining process involves three integrated components:
1. Collect and Connect:
Clarabridge's pre-built source connectors allow easy access to external and internal customer
information, harvesting content from all of your listening posts. Clarabridge’s built-in feedback
module allows the design, deployment and capture of surveys, campaigns, community forums and web

2. Mine and Refine:
Once all textual content is sourced, Clarabridge extracts meaning through its fully integrated and
automated features, so millions of verbatims transform seamlessly into actionable information.
Clarabridge’s deep-parsing Natural Language Processing technology extracts parts of speech and
linguistic relationships. This output is used for downstream entity & fact extraction, sentiment
extraction, categorization, and root cause analysis.
3. Analyze and Discover:
Clarabridge provides two interfaces with a range of functional and analytic tools: Clarabridge
Reporting and Analysis and Clarabridge Navigator. Analysts and business users can identify key
themes and emerging issues, dynamically investigate results, set up alerts and drill into root
causes with the full discovery functionality integrated into the software.
Case Studies: Technology in Action
Today, leading Fortune 1000 companies across all major markets rely on Clarabridge for
the essential customer experience intelligence they require for strategic insight and pre-
emptive action. Supported by the Clarabridge Content Mining Platform, clients are able to
capture the 360-degree view on current customer attitudes and sentiment shifts, rather than to
settle for a limited understanding of their Voice of the Customer.
Use cases reflect Clarabridge’s successful engagements and their outcomes with clients across a
range of major industries.
    AOL uses Clarabridge to manage, capture and analyze over 5 million website feedback forms
    for over 150 products in dozens of languages. Clarabridge automatically processes and
    reports the now quantified insights directly to product teams.
    A major international airline company uses Clarabridge to capture and analyze over 7 million
    surveys per year, allowing them to analyze drivers of loyalty and dissatisfaction for all of their
    customer segments. The airline can better meet the needs of their passengers through
    improved understanding of the drivers of customer satisfaction.
    Gaylord Entertainment used Clarabridge to replace their manual guest satisfaction review
    process with automatic coding, sentiment extraction and reporting. VOC analysis is available
    near real-time based on the needs of Gaylord employees. Driving more business through
    high value event planners and raising customer satisfaction scores, Gaylord has had
    enormous business and customer experience success using Clarabridge.

Vision, Experience, and Strength
Clarabridge’s goal is to help you fully access your customer experience intelligence—and
leverage that information to your advantage. By bridging the gap between your customer’s
experience and your brand’s promise, we provide a unique portal into the human dimension of
your business. With this insight, you gain the strategic edge in serving your customers, controlling
costs and risk, competing resourcefully, and building profitability.
When you work with Clarabridge, you work with the management team that has guided the
company’s growth and innovation from the start. Each has had decades of experience, bolstered
by successful entrepreneurial ventures and strengthened by prior top-level management
experience. Executives, who include a nationally recognized entrepreneur and a multiple patent
holder, are all published authors and frequent speakers at industry conferences.
With a commitment to excellence, partnership model with clients, and fast-paced development
processes, Clarabridge is strong from the ground up. What’s more, our financial backing, board
advisors, reputation, and partnerships are sound, ensuring our software will evolve to meet your
emerging demands.

Solution Profile: GATE
An Open Source Solution for Full Lifecycle Text Analytics
General Architecture for Text Engineering

FREE
Open source, licensed under LGPL allowing unrestricted commercial use, hosted on SourceForge.

100% JAVA
Runs on any platform supporting Java 5 or later. Developed and tested daily on Linux, Windows, and
Mac OS X.

In development since 1996; now at version 5.0; around 20 active developers.

COMPREHENSIVE
Support for manual annotation, performance evaluation, information extraction, [semi-]automatic
semantic annotation, and many other tasks.
Over 50 plug-ins included with the standard distribution, containing over 70 resource types. Many
others available from independent sources.

INTEGRATION
Leveraging the power of other projects such as:
• Information Retrieval: Lucene (Nutch, Solr), Google and Yahoo search APIs, MG4J;
• Machine Learning: Weka, MaxEnt, SVMLight, etc.;
• Ontology Support: Sesame and OWLIM;
• Parsing: RASP, Minipar, and SUPPLE;
• Other: UIMA, Wordnet, Snowball, etc.

Friendly and active community of developers and users offers efficient help. Commercial support
available from Ontotext and Matrixware.

Reference implementation in ISO TC37/SC4 LIRICS project; supports XCES, ACE, TREC etc. formats;
founder member of OASIS/UIMA committee.

EFFICIENT
Optimisations included with the latest version provide a 20 to 40% speed and memory usage
improvement.
Highly efficient finite state text processing engine; many plug-ins with linear execution time.

Assessed as “outstanding” and “internationally leading” by an anonymous EPSRC peer review.
Used at thousands of sites: companies, universities and research laboratories, all over the world.
~35,000 downloads/year.
Rolling funding for more than 15 staff at the University of Sheffield.

DATA MANAGEMENT
Pluggable input filters with out of the box support for XML, HTML, PDF, MS Word, email, plain text, etc.
Common in-memory data model built around stand-off annotation, documents and corpora.
Persistent storage layer with support for XML or Java serialisation. I/O interoperation with many
other systems.

STANDARD ALGORITHMS
Ready made implementations for many typical NLP tasks such as tokenisation, POS tagging, sentence
splitting, named entity recognition, co-reference resolution, machine learning, etc.

USER INTERFACE
Comprehensive tool set for data editing and visualisation, rapid application development, manual
annotation, ontology management.

GATE was first released in 1996, then completely re-designed, re-written, and re-released in 2002. The
system is now one of the most widely-used systems of its type and is a comprehensive infrastructure for
language processing software development.
The new UIMA architecture from IBM/Apache has taken inspiration from GATE and IBM have paid the
University of Sheffield to develop an interoperability layer between the two systems.
Key features of GATE are:
•   Component-based development reduces the systems integration overhead in collaborative research.
•   Automatic performance measurement of Language Engineering (LE) components promotes quantitative
    comparative evaluation.
•   Distinction between low-level tasks such as data storage, data visualisation, discovery, and loading of
    components and the high-level language processing tasks.
•   Clean separation between data structures and algorithms that process human language.
•   Consistent use of standard mechanisms for components to communicate data about language, and use
    of open standards such as Unicode and XML.
•   Insulation from idiosyncratic data formats (GATE performs automatic format conversion and enables
    uniform access to linguistic data).
•   Provision of a baseline set of LE components that can be extended and/or replaced by users as required.
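
The GATE datasheet describes a data model built around stand-off annotation, which keeps data structures separate from the algorithms that process them. The following is a minimal Python sketch of that general idea only; it is not GATE's Java API, and the class and method names (`Annotation`, `Document`, `annotate_phrase`, `covered_text`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed span that points into the text by character offsets."""
    start: int   # offset of the first character (inclusive)
    end: int     # offset just past the last character (exclusive)
    type: str    # e.g. "Person" or "Organization"
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    """Holds the original text untouched; annotations live beside it."""
    text: str
    annotations: list = field(default_factory=list)

    def annotate_phrase(self, phrase, type_, **features):
        # Locate the first occurrence and record its offsets;
        # the text itself is never modified.
        start = self.text.find(phrase)
        if start < 0:
            raise ValueError(f"phrase not found: {phrase!r}")
        ann = Annotation(start, start + len(phrase), type_, features)
        self.annotations.append(ann)
        return ann

    def covered_text(self, ann):
        return self.text[ann.start:ann.end]

    def of_type(self, type_):
        return [a for a in self.annotations if a.type == type_]


doc = Document("Hamish Cunningham works at the University of Sheffield.")
person = doc.annotate_phrase("Hamish Cunningham", "Person")
org = doc.annotate_phrase("University of Sheffield", "Organization",
                          kind="university")

for ann in doc.annotations:
    print(ann.type, ann.start, ann.end, doc.covered_text(ann))
```

Because annotations only reference offsets, several independent components can add overlapping layers (tokens, entities, sentences) over the same text without interfering with each other or with the source document.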
Text Analysis (TA) is a process which takes unseen texts as input and produces fixed-format,
unambiguous data as output. This data may be used directly for display to users, or may be stored in
a database or spreadsheet for later analysis, or may be used for indexing purposes in Information
Retrieval (IR) applications.
TA covers a family of applications including named entity recognition, relation extraction, and
event detection.
GATE has been used for TA applications in domains including bioinformatics, health and safety, and
17th century court reports.
TA systems built on GATE have been evaluated
among the top ones at international competitions (MUC, ACE, Pascal). A system built by the GATE team
came top in two of three categories in the NTCIR 2007 patent classification competition.
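
To make the idea of turning unseen text into fixed-format data concrete, here is a toy Python sketch. It is an assumption-laden illustration rather than how GATE works: it uses a tiny hand-made gazetteer and one regular expression, and the names `GAZETTEER`, `extract_entities`, and `to_csv` are hypothetical. Production systems rely on far richer linguistic processing.

```python
import csv
import io
import re

# Hypothetical gazetteer: surface strings mapped to entity types.
GAZETTEER = {
    "GATE": "Software",
    "University of Sheffield": "Organization",
}
# Four-digit years in the 1900s or 2000s.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_entities(doc_id, text):
    """Return fixed-format rows: (doc_id, type, text, start, end)."""
    rows = []
    for phrase, etype in GAZETTEER.items():
        for m in re.finditer(re.escape(phrase), text):
            rows.append((doc_id, etype, m.group(), m.start(), m.end()))
    for m in YEAR.finditer(text):
        rows.append((doc_id, "Year", m.group(), m.start(), m.end()))
    # Order rows by their position in the text.
    return sorted(rows, key=lambda r: r[3])


def to_csv(rows):
    """Serialise rows so they can be loaded into a spreadsheet or database."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["doc_id", "type", "text", "start", "end"])
    writer.writerows(rows)
    return buf.getvalue()


rows = extract_entities(
    "d1", "GATE was first released in 1996 and re-released in 2002.")
print(to_csv(rows))
```

Each row has the same columns regardless of the input text, which is what lets the output be stored in a database or spreadsheet or fed to an index, as the passage above describes.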
•   GATE Developer: an integrated development environment for language processing components
    bundled with the most widely used Information Extraction system and a comprehensive set of other
•   GATE Embedded: an object library optimised for inclusion in diverse applications giving access to all the
    services used by GATE Developer and more
•   GATE Teamware: a collaborative annotation environment for high volume factory-style semantic
    annotation projects built around a workflow engine and the GATE Cloud backend web services
•   GATE Cloud: a parallel distributed processing engine that combines GATE Embedded with a heavily
    optimized service infrastructure
•   Ontotext KIM: UIs demonstrating our multiparadigm approach to information management, navigation
    and search
•   Ontotext Mimir: (Multi-paradigm Information Management Index and Repository) a massively scaleable
    multiparadigm index built on Ontotext's semantic repository family, GATE's annotation structures
    database plus full-text indexing from MG4J
Sponsored by:,               Contact: Prof. Hamish Cunningham
Research funding: EU, UK Research Councils and JISC

Solution Profile: IxReveal
IxReveal is a leading analytics software company that transcends current search and business
intelligence technologies. Our patented platforms transform large volumes of unstructured and
structured data into actionable intelligence, while enabling automatic and collaborative sharing
of concepts, connections, and findings.
Clients include global corporations, financial institutions, health organizations, and major
government agencies with data-intensive needs in areas such as fraud, security, finance, crime,
and intelligence.

uReveal is aimed at helping analysts in organizations solve business problems and make informed
business decisions by leveraging their investment in collecting data. Organizations have spent millions
of dollars in collecting and storing information like crime incidents, claims, customer calls, emails etc.
With uReveal, they are able to combine the structured and unstructured data to find meaningful trends
and patterns to fight crime and insurance fraud and to reshape the organization to be customer focused.

 Law Enforcement: “uReveal made our analysts ridiculously efficient.”
 - Crime Analysis Manager

uReveal provides the bottom or top-line changing ability to analyze huge volumes of textual data. It
works with various data sources like existing search infrastructures, databases containing textual
information, emails, and content management systems. uReveal’s powerful decision support capabilities
are finally making it possible to find trends and patterns and zero in on critical slices of information
buried deep within the text.

 Insurance: “Level of accuracy of suspicious claims identification increased five-fold and false
 positives decreased.”
 - Insurance Claims Manager

uReveal is a tool that has been developed for analysts, putting them in control by enabling them to focus
their precious time on value-added analysis instead of having to read all the documents returned. It is
designed for small to mid-sized workgroups that work with vast amounts of free-form information as part
of their jobs.
With an intuitive and highly configurable user interface and patent pending analytics (such as
relationship discovery and integrated charting/graphing capabilities), uReveal users are able to create a
personalized environment to get their job done faster.

 Insurance: “This technology not only helps our analysts become very efficient but helps us save on
 legal costs as well.”
 - Workers Comp Fraud Manager

uReveal is the solution for analytical teams that work with unstructured information and provide decisive
insight as part of a mission critical business process. Using uReveal, they can both find and substantiate
business insights and recommendations, pointing back to the unstructured information as validation.

Text Analytics 2009: User Perspectives on Solutions and Providers
Text Analytics 2009: User Perspectives on Solutions and Providers
Text Analytics 2009: User Perspectives on Solutions and Providers
Text Analytics 2009: User Perspectives on Solutions and Providers
Text Analytics 2009: User Perspectives on Solutions and Providers
Text Analytics 2009: User Perspectives on Solutions and Providers
Text Analytics 2009: User Perspectives on Solutions and Providers

More Related Content

What's hot

Digital marketing framework
Digital marketing framework Digital marketing framework
Digital marketing framework
Factors affecting customer loyalty in telecom sector in india
Factors affecting customer loyalty in telecom sector in indiaFactors affecting customer loyalty in telecom sector in india
Factors affecting customer loyalty in telecom sector in india
Nicole Valerio
An Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and SocialAn Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and Social
Seth Grimes
The-State-Of-Internet-Marketing-Research-(2005-2012)--A-Systematic-Review-Usi...Janarthanan B
A Comparative Analysis of Factor Effecting the Buying Judgement of Smart Phone

Text Analytics 2009: User Perspectives on Solutions and Providers

  • 1. Text Analytics 2009: User Perspectives on Solutions and Providers Seth Grimes An Alta Plana research study Sponsored by
  • 2. Text Analytics 2009: User Perspectives
    Table of Contents
    Executive Summary ... 3
    Text Analytics Basics ... 4
      Discovering Meaning in Text ... 4
    Software and Solution Market Overview ... 7
      Applications and Sources ... 7
    Demand-Side Perspectives ... 9
      Study Context ... 9
      About the Survey ... 10
    Demand-Side Study 2009: Response ... 13
      Q1: Length of Experience ... 13
      Q2: Application Areas ... 13
      Q3: Information Sources ... 14
      Q4: Return on Investment ... 15
      Q5: Mindshare ... 15
      Q6: Spending ... 16
      Q8: Satisfaction ... 16
      Q9: Overall Experience ... 16
      Q12: Like and Dislike ... 18
      Q13: Information Types ... 19
      Q14: Important Properties & Capabilities ... 20
    Additional Analysis ... 21
      Selected Cross-tabulations ... 21
      Interpretive Limitations ... 22
    About the Study ... 24
    Solution Profile: Attensity ... 26
    Solution Profile: Clarabridge ... 28
    Solution Profile: GATE ... 30
    Solution Profile: IxReveal ... 32
    Solution Profile: Nstein ... 34
    Solution Profile: SAP BusinessObjects ... 36
    Solution Profile: TEMIS ... 38
    Published May 31, 2009 under the Creative Commons Attribution 3.0 License.
  • 3. Text Analytics 2009: User Perspectives
    Executive Summary
    The global text-analytics market is growing at a very rapid pace, an estimated 40% in 2008, creating a $350 million market for software and vendor-supplied support and services. The total business value generated by text-analytics-reliant information products, in-house development, service providers, applications such as e-discovery, and research surely multiplies this figure eight-fold. The author projects 2009 market growth of up to 25% despite the economic downturn.
    Market Factors
    A number of factors have impelled sustained text-analytics market growth. The technology – text mining and related visualization and analytical software – continues to deliver unmatched capabilities both in early-adopter domains such as intelligence and the life sciences and in business sectors that have embraced text analytics more recently, in the last 3-5 years. These latter sectors include, notably, media and publishing, financial services and insurance, travel and hospitality, and consumer products and retail. Business and technical functions such as customer support and satisfaction, brand and reputation management, claims processing, human resources, media monitoring, risk management and fraud, and search have fueled recent growth.
    No single organization or approach dominates the market. While existing players have been very successful, they and new entrants continue to innovate, offering cutting-edge capabilities, for instance in sentiment analysis, as well as newer as-a-service and mash-up-ready delivery models and capabilities targeted to market niches.
    Text Analytics 2009: User Perspectives
    Insights into the question, “What do current and prospective text-analytics users really think of the technology, solutions, and solution providers?” will help providers craft products and services that better serve users. Insights will guide users seeking to maximize benefit for their own organizations.
    Alta Plana conducted a spring 2009 survey to explore the topic. This report, “Text Analytics 2009: User Perspectives on Solutions and Providers,” presents findings drawn from 116 respondents, the majority of whom already use text analytics. The study was supported by seven sponsors but is editorially independent, designed and conducted by industry analyst and consultant Seth Grimes, a recognized expert in the application of text analytics.
    Key Study Stats
    The following are key study findings:
    Top business applications of text analytics for respondents are a) Brand / product / reputation management (40% of respondents), b) Competitive intelligence (37%), c) Voice of the Customer / Customer Experience Management (33%), and d) other Research (33%).
    These applications match a focus on on-line sources: a) blogs and other social media (47%), b) news articles (44%), and c) on-line forums (35%), as well as direct customer feedback in the form of d) e-mail and correspondence (36%) and e) customer/market surveys (34%).
    Users with 2 years or more experience prefer tools that support specialized dictionaries, taxonomies, or extraction rules, and they often like open source.
    Prospective users expect to focus their initial text analytics work on inside-the-firewall feedback sources: e-mail, surveys, and contact center materials.
    Prospective users have high ROI hopes. Use of each of six different measures, led by increased sales to existing customers, is favored by over 50% of respondents who are not current users. Other measures are not far behind.
  • 4. Text Analytics 2009: User Perspectives
    Text Analytics Basics
    The term text analytics describes software and transformational steps that discover business value in “unstructured” text. The aim is to improve automated text processing. Most everything people do with electronic documents falls into one of four classes:
    1. Compose, publish, manage, and archive.
    2. Index and search.
    3. Categorize and classify according to metadata & contents.
    4. Summarize and extract information.
    Text analytics enhances the first and second sets of functions and enables the third and fourth. The remainder of this section will look at the technology, and the section after will look at the market and applications.
    Discovering Meaning in Text
    Text analytics encompasses applications of the technology in government, science, and industry and for cross-cutting tasks that range from information retrieval to text-fueled investigative analyses. Text analytics can be seen as a subspecies of business intelligence, and its capabilities will be an essential component of the eventual creation of the Semantic Web.
    Structure in Text
    Text – news and blog articles, scientific papers, spoken call-center conversations, survey responses, product reviews posted to on-line forums, this report – is replete with structure. Humans (relatively easily) learn to use this structure – the morphology of individual words, the syntax that governs the composition of expressions, the grammar behind phrases and sentences, and the larger-scale structure of text as organized and presented in Web pages, e-mail, newspapers, books, and myriad other forms – to both understand and generate text. We are able to do this without conscious thought, coupled with a grasp of context, knowledge, and emotion that allows us to understand often-complex interactions.
    Text-analytics software technology – text mining and related visualization and analytical tools – enables machine treatment of text that replicates, automates, and extends human capabilities.
    Sense-Making through Statistics
    The earliest approaches to automated text analysis applied statistical methods to text. Consider Hans Peter Luhn’s 1958 IBM Journal paper, “The Automatic Creation of Literature Abstracts”1, which envisaged application of statistics for sense-making and summarization. Luhn wrote, “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.”
    Luhn illustrated his approach, as shown in the figure below, with the kind of frequency analysis that is performed today by search-engine optimization (SEO) tools and software such as Wordle that generates word and tag clouds.
    1 -- paper is behind a “paywall.”
  • 5. Text Analytics 2009: User Perspectives
    Luhn additionally proposed a Keyword-in-Context (KWIC) indexing system that is at the root of modern information retrieval methods.
    “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance": H.P. Luhn
    Vector Space Methods
    Vector-space models became the prevailing approach to representing documents for information retrieval, classification, and other tasks. The text content of a document is reduced to an unordered “bag of words” that becomes a point in a high-dimensional vector space that may embed the word content of many documents, as illustrated in the diagram that appears to the right2. Approaches such as TF-IDF (term frequency–inverse document frequency) weigh the significance of a term according to its prevalence in a larger document set.
    We apply additional analytical methods to make text tractable, for instance, latent semantic indexing utilizing singular value decomposition for term reduction / feature selection to create a new, reduced concept space. In plain English, such techniques identify and retain the most important concepts and consolidate or eliminate lesser concepts.
    Text analytics will typically apply one or more of a number of statistical clustering and classification methods to documents. These methods include Naive Bayes, Support Vector Machines, and k-nearest neighbors. The diagram to the left illustrates the identification of a hyperplane, shown as a red line in the 2-D picture, that best separates the dot-/circle-represented documents into distinct sets.
    2 Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975
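The TF-IDF weighting mentioned above can be sketched in a few lines. This is a minimal illustration of one common variant (term count divided by document length, multiplied by the natural log of the inverse document frequency), not the formulation of any particular product or of the paper cited in the footnote. The function names and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    bags = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(bags)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())

    weights = []
    for bag in bags:
        length = sum(bag.values())
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical sample documents, for illustration only.
docs = [
    "the dow fell and the nasdaq gained",
    "the dow gained and the nasdaq fell",
    "the survey covered text analytics users",
]
weights = tf_idf(docs)
```

In this sketch a term that occurs in every document, such as "the", receives a weight of zero, while a term confined to one document is weighted most heavily. The first two sample documents contain the same words in a different order, so they receive identical weights: a bag-of-words representation discards word order.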
  • 6. Text Analytics 2009: User Perspectives
    Linguistic Approaches
    Statistical approaches have a hard time making sense of nuanced human language, an issue that H.P. Luhn foresaw in 1958. Luhn wrote in his visionary paper, cited above, "This rather unsophisticated argument on ‘significance’ [inferred from a word’s frequency of use] avoids such linguistic implications as grammar and syntax. In general, the method does not even propose to differentiate between word forms. Thus the variants differ, differentiate, different, differently, difference and differential could ordinarily be considered identical notions and regarded as the same word. No attention is paid to the logical and semantic relationships the author has established. In other words, an inventory is taken and a word list compiled in descending order of frequency."
    Consider the following pair of sentences, proposed by Luca Scagliarini of Expert System. The two cases produce the same “bag of words,” but their meanings, the data content of the texts, are very different given the switch of fell and gained.
    The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq composite gained 6.84, or 0.32 percent, to 2,162.78.
    The Dow gained 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq composite fell 6.84, or 0.32 percent, to 2,162.78.
    Linguistic approaches will, for instance, analyze the parts of speech of a phrase, detecting the subject-verb-object triple that constitutes a factual (or subjective) statement as well as additional, modifying elements.
    Natural Language Processing
    Part-of-speech (POS) analysis is typically one of a sequence or pipeline of resolving steps applied to text. Other typically applied steps include:
    Tokenization: Identification of distinct elements within a text, usually words, expressions, punctuation marks, white space, etc.
    Stemming: Identifying variants of word bases created by conjugation, declension, case, and pluralization, e.g., “act” for “acts,” “actor,” and “acted.”
    Lemmatization: Use of stemming and other techniques, including analysis of context and parts of speech, to associate multiple words or terms with a canonical term. For example, "better" might have "good" as its lemma.
    Entity Recognition: Look-up in lexicons or gazetteers and use of pattern matching to discern items such as names of people, companies, products, and places and expressions such as e-mail addresses, phone numbers, and dates.
    Tagging: XML mark-up of distinct elements, a.k.a. text annotation.
    Entities are one type of “feature” found in text. Other features of interest include:
    Attributes: A person’s attributes include age, sex, height, and occupation.
    Abstract attributes: Properties such as “expensive” and “comfortable.”
    Concepts: Abstractions of entities, for instance, a category.
    Metadata: In this context, items that describe a document such as the author, creation date, and title as well as a topic tag.
    Facts and relationships: These include statements such as “Dow fell 46.58.”
    Subjective data: Covers sentiment, opinions, mood, and other attitudinal data.
    The next section of the report looks at how the technology is applied.
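The tokenization and stemming steps listed above, and the “bag of words” limitation illustrated by the two market sentences, can be sketched as follows. This is a minimal illustration under stated assumptions: the regular expression is one simple way to keep numbers such as 11,002.14 together as single tokens, and `toy_stem` is a hypothetical suffix stripper written only to reproduce the “act” example. Production systems use far more careful tokenizers and stemmers.

```python
import re
from collections import Counter

# A word or number (allowing internal '.', ',' or apostrophe), or one punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:[.,']\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into lowercase word, number, and punctuation tokens."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


def toy_stem(token):
    """Strip a few suffixes; a hypothetical toy, not a real stemming algorithm."""
    for suffix in ("ed", "or", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


text_a = (
    "The Dow fell 46.58, or 0.42 percent, to 11,002.14. "
    "The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, "
    "and the Nasdaq composite gained 6.84, or 0.32 percent, to 2,162.78."
)
text_b = (
    "The Dow gained 46.58, or 0.42 percent, to 11,002.14. "
    "The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, "
    "and the Nasdaq composite fell 6.84, or 0.32 percent, to 2,162.78."
)

tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
bag_a, bag_b = Counter(tokens_a), Counter(tokens_b)

same_bag = bag_a == bag_b            # the unordered bags are identical
same_sequence = tokens_a == tokens_b  # the ordered token sequences are not
stems = [toy_stem(word) for word in ("acts", "actor", "acted")]
```

The two texts yield equal bags but different token sequences, which is why order-aware linguistic analysis is needed to recover which index fell and which gained.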
  • 7. Text Analytics 2009: User Perspectives
    Software and Solution Market Overview
    What we now see as text analytics was actually, in the late 1950s, put forward as the foundation for a visionary business intelligence system. This system would focus on discovering and communicating relationships (and not just data values) and on business-goal alignment. Knowledge-management questions drove this early BI conceptualization, with answers to questions such as “What is known?”, “Who knows what?”, and “Who needs to know?” to be derived or discovered via text mining.3
    Such systems are technically very difficult to realize, and BI of course developed in other directions. Numerical data, drawn from transactional and operational systems and stored in databases, is far easier to analyze than is information locked in text. BI and related tools and techniques – spreadsheets, reporting, OLAP, data mining – generally do an excellent job of creating business value from this data.
    In the last few years attention has turned back to text sources. Commercial software vendors – and open source projects – have responded to the opportunity.
    Applications and Sources
    Applications of text mining in the life sciences and intelligence date to the 1990s, for purposes that include pharmaceutical lead generation – mining scientific literature to accelerate expensive, time-consuming drug-discovery processes – and counter-terrorism. A number of factors – the huge and growing volume of on-line content, advances in search and information retrieval, cheap computing power, and better software – have created a market for application of these same text technologies to a much broader variety of business, scientific, and research problems.
    Application domains
    Market awareness has grown immensely in the last 3-5 years, but up-take and experiences have varied by application domain.
    To study adoption, survey question 2 asked, “What are your primary applications where text comes into play?” It listed the following choices, an attempt to capture the most important application domains:
    Brand/product/reputation management
    Competitive intelligence
    Content management or publishing
    Customer service
    E-discovery
    Financial services
    Compliance
    Insurance, risk management, or fraud
    Law enforcement
    Life sciences or clinical medicine
    Product/service design, quality assurance, or warranty claims
    Research (not listed)
    Voice of the Customer / Customer Experience Management
    3 “BI at 50 Turns Back to the Future,”
  • 8. Text Analytics 2009: User Perspectives
    Information sources
    In each of the application areas listed above, text analytics enhances existing analyses. It enhances both BI and data mining applied to transactional data and non-automated review of textual sources, a.k.a. reading. By automating the reading process, text analytics allows analysts and researchers to tap material that had not previously been systematically mined. It allows them to work far faster than before and to analyze far greater volumes of information than ever before. Importantly, text analytics can make a huge difference in text analysis and processing costs and enable the creation of new information products and services.
    Survey question 3 asked about information sources. These sources may be grouped:
    On-line and social media: blogs and other social media (Twitter, social-network sites, etc.); news articles; review sites or forums.
    Enterprise communications and feedback: chat and/or instant messaging text; contact-center notes or transcripts; customer/market surveys; e-mail and correspondence; employee surveys; point-of-service notes or transcripts; SMS/text messages; warranty claims/documentation; Web-site feedback.
    Operational materials (of course varying by business): crime, legal, or judicial reports or evidentiary materials; insurance claims or underwriting notes; medical records; patent/IP filings; scientific or technical literature.
    Application modes
    The applications themselves vary widely. They may be classified in several (overlapping) groups:
    Media and publishing systems – the author includes search engines here – use text analytics to generate metadata and enrich and index metadata and content in order to support content distribution and retrieval. Semantic Web applications would fit in this category.
    Content management systems – and, again, related search tools – use text analytics to enhance the findability of content for business processes that include compliance, e-discovery, and claims processing.
    Line-of-business systems for functions such as compliance and risk, customer experience management (CEM), customer support and service, and human resources and recruiting.
    Investigative and research systems for functions such as fraud, intelligence and law enforcement, competitive intelligence, and life sciences research.
    This list is representative and not exhaustive. All listed areas are experiencing strong growth.
    In certain cases, text analytics’ contribution is not at all obvious. Google and other major search engines top their responses to “map massachusetts” and “34+178” and “orcl” with a map, the number 212, and Oracle share data, respectively, enabled by their ability to recognize named entities and expressions. This particular application of text analytics is shallow but reaches a very, very large audience.
    Solution providers
    Text-analytics solution providers include a significant cadre of young but mature pure-play software vendors, software giants that have built or acquired text technologies, robust open-source projects, and a constant stream of start-ups, many of which focus on market niches or specialized capabilities such as sentiment analysis. The provider side is vibrant and doing well despite the adverse economic climate, thanks to the market’s growing awareness of solution providers’ ability to respond to business needs and technical challenges alike.4
  • 9. Text Analytics 2009: User Perspectives
    Demand-Side Perspectives
    Alta Plana designed a spring 2009 survey, “Text Analytics demand-side perspectives: users, prospects, and the market,” to collect raw material for an exploration of key text-analytics market-shaping questions:
    What do customers, prospects, and users think of the technology, solutions, and vendors?
    What works, and what needs work?
    How can solution providers better serve the market?
    Will your companies expand their use of text analytics in the coming year? Will spending on text analytics grow, decrease, or remain the same?
    It is clear that current and prospective text-analytics users wish to learn how others are using the technology, and solution providers of course need demand-side data to improve their products, services, and market positioning, to boost sales and better satisfy customers. The Alta Plana study therefore has two goals:
    1. To raise market awareness and educate current and prospective users.
    2. To collect information of value to sponsors.
    Survey findings, as presented and analyzed in this study report, provide a form of measure of the state of the market, a form of benchmark. They are designed to be of use to everyone who is interested in the commercial text-analytics market.
    Study Context
    Text-analytics solutions have been applied to a spectrum of business problems. Provider revenues are booming (for most established providers). Academic and industrial research is only expanding, and there has been a steady pace of emergence of new companies in the field, many of them academic spin-offs. Demand-side views are, anecdotally, quite positive, judging from published user stories and case studies and based on inquiries from organizations that are researching solutions.
    The author previously explored market questions in a number of papers and articles.
    These included white papers created for the Text Analytics Summit in 2005, “The Developing Text Mining Market,”5 and 2007, “What's Next for Text.”6
    Analyst and Provider Analyses
    The 2007 paper contains a number of telling quotations:
    “Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.” – Philip Russom, the Data Warehousing Institute
    “Growth is largely driven by the wealth of unstructured information found on the external web, in corporate intranets, document repositories, call-centers, and in customer and employee business communications.” – IBM researcher David Ferrucci
    Other analysts and solution providers have had a lot to say about text analytics’ benefits and growth. The article “Perspectives on Text Analytics in 2009”7 is a systematic (albeit informal) survey of industry perspectives that reports provider CEO and CTO and thought-leader responses to the query: “What do you see as the 3 (or fewer) most important text analytics technology, solution or market challenges in 2009?”
  • 10. Text Analytics 2009: User Perspectives
    Responses were informative, based on the respondents’ own research and, especially, on extensive contact with customers and prospects. In the current context, a market challenge articulated by Aaron B. Brown, IBM program director for ECM Discovery, is particularly telling. That challenge is for text-analytics solution providers to better define business cases. According to Brown,
    “In the current economic situation, organizations are clamping down on new projects and more than ever looking for hard ROI savings to justify investment. To pass the funding bar, text-analytics solutions, which typically fall in the category of new projects undertaken for business optimization, need to come with solid business cases that demonstrate hard-dollar operational savings based on proven examples. Given the emerging nature of many text-analytics solution areas, this will be a challenge to growth in 2009.”
    Business cases don’t rest solely on solution-provider research and assertions, of course. Demand-side experiences and perceptions can and should also contribute.
    Demand-Side Views
    A systematic look at the demand side will provide a good complement to provider-side views and to vendor- and analyst-published case studies. Alta Plana’s 2008 study report, “Voice of the Customer: Text Analytics for the Responsive Enterprise,”8 published by, was our first systematic survey of demand-side perspectives, albeit focused on a particular set of business problems. VoC analysis is frequently applied to enhance customer support and satisfaction initiatives, in support of marketing, product and service quality, brand and reputation management, and other enterprise feedback initiatives.
    The listening concept is extended to other voice applications: Voice of the Patient, Voice of the Market, etc.
    Views related in our 2008 study were telling:
    “Text Analytics is exciting technology, opening up new applications and approaches to solving information needs and supporting decision making for an improved customer experience.” – Michael House, Maritz Research, Division Vice President
    “We've uncovered concepts and relationships in text that would be too costly – or even impossible – to detect by any other methods. We can now combine multiple data sources to evaluate customer expectations and improve customer satisfaction by employing more one-to-one customer contact and preemptively resolving customer complaints to keep our retention rates high." – Federico Cesconi, Cablecom, head of customer insight and retention
    About the Survey
    There were 116 responses to the 2009 survey, which ran from April 13 to May 10.
    Survey invitations
    The author solicited responses via:
    E-mail to the TextAnalytics, Corpora, datamining2, sla-dkm (Special Libraries Association, Division for Knowledge Management), sla-dite (SLA Information Technology), Asis-l (American Society for Information Science), and GATE lists and the author’s personal list.
  • 11. Text Analytics 2009: User Perspectives
    Invitations published in electronic newsletters: Intelligent Enterprise, KDnuggets,, TDWI’s BI This Week, Text Analytics Summit, and
    Notices posted to LinkedIn forums and Facebook groups and on Twitter.
    Messages sent by sponsors to their communities.
    Survey introduction
    The survey started with a definition and brief description as follows:
    Text Analytics is the use of computer software to automate:
    annotation and information extraction from text – entities, concepts, topics, facts, and attitudes,
    analysis of annotated/extracted information,
    document processing – retrieval, categorization, and classification, and
    derivation of business insight from textual sources.
    This is a survey of demand-side perceptions of text technologies, solutions, and providers. Please respond only if you are a user, prospect, integrator, or consultant. There are 20 questions. The survey should take you 5-10 minutes to complete. For this survey, text mining, text data mining, content analytics, and text analytics are all synonymous. I'll be preparing a free report with my findings. Thanks for participating!
    Survey response
    There is little question that the survey results overweight current text-analytics users – 63% of respondents who answered Q1, “How long have you been using Text Analytics?,” versus 61% of respondents who replied to Q7, “Are you currently using text analytics?” – among the broad set of potential business, government, and academic users.
    BI market comparison
    We can infer this overweighting, for example, from market-size figures. The author estimates a $350 million global market for text-analytics software and vendor-supplied support and services.
    By contrast, in March 2009, research firm IDC published a preliminary 2008 BI-market estimate. IDC’s sizing “suggests that the business intelligence tools software market grew 6.4% in 2008 to reach $7.5 billion.”9 Former Forrester analyst Merv Adrian estimated $8.4 billion for 2008. A simple, good-enough heuristic says that if the BI market is 20 times the size of the text-analytics market, there are likely around 20 times as many BI users as there are text-analytics users.
    Data mining comparison
    Another contrasting data point is that 55% of respondents to a March 2009 KDnuggets poll10 report currently using text analytics on projects. KDnuggets reaches data miners, a technically sophisticated audience who are among the most likely of any market segment to have embraced text analytics. The rate of text-analytics adoption by data miners surely exceeds the rate of adoption by any other user sector.
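The arithmetic behind the BI-market heuristic can be checked with the figures quoted above. This is only a sketch of the ratio calculation: it takes the author's $350 million estimate and the two 2008 BI-market estimates at face value, and it inherits the heuristic's own assumption that spending per user is roughly comparable across the two markets. The variable names are illustrative.

```python
# Market-size figures quoted in the text, in US dollars.
TEXT_ANALYTICS_MARKET = 350e6  # author's estimate
BI_MARKET_IDC = 7.5e9          # IDC preliminary estimate for 2008
BI_MARKET_ADRIAN = 8.4e9       # Merv Adrian's estimate for 2008

# How many times larger is the BI market under each estimate?
ratio_idc = BI_MARKET_IDC / TEXT_ANALYTICS_MARKET
ratio_adrian = BI_MARKET_ADRIAN / TEXT_ANALYTICS_MARKET

print(f"IDC estimate:    BI market is about {ratio_idc:.1f}x text analytics")
print(f"Adrian estimate: BI market is about {ratio_adrian:.1f}x text analytics")
```

Both ratios fall a little above 20, which is consistent with the round multiplier of 20 used in the heuristic.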
  • 12. Text Analytics 2009: User Perspectives
    How much did you use text analytics / text mining in 2008?
    Did not use (45): 45%
    Used on < 10% of my projects (17): 17%
    Used on 10-25% of projects (14): 14%
    Used on 26-50% of my projects (11): 11%
    Used on over 50% of my projects (14): 14%
    As an aside, 52% of KDnuggets respondents stated that they would use text analytics more in 2009 than in 2008, and 42% stated that their use would be about the same as in 2008, which strongly suggests growth in the user base.
  • 13. Text Analytics 2009: User Perspectives
    Demand-Side Study 2009: Response
    The subsections that follow tabulate and chart survey responses, which are presented without unnecessary elaboration.
    Q1: Length of Experience
    How long have you been using Text Analytics?
    not using, no definite plans to use: 16%
    currently evaluating: 22%
    less than 6 months: 8% (cumulative response 8%)
    6 months to less than one year: 5% (cumulative response 13%)
    one year to less than two years: 7% (cumulative response 20%)
    two years to less than four years: 18% (cumulative response 37%)
    four years or more: 25% (cumulative response 63%)
    Q2: Application Areas
    What are your primary applications where text comes into play?
    Brand / product / reputation management: 40%
    Competitive intelligence: 37%
    Voice of the Customer / Customer Experience …: 33%
    Research (not listed): 33%
    Customer service: 22%
    Content management or publishing: 19%
    Life sciences or clinical medicine: 18%
    Insurance, risk management, or fraud: 17%
    Financial services: 15%
    E-discovery: 15%
    Product/service design, quality assurance, or …: 14%
    Other: 13%
    Compliance: 8%
    Law enforcement: 7%
  • 14. Text Analytics 2009: User Perspectives
    Q3: Information Sources
    What textual information are you analyzing or do you plan to analyze?
    blogs and other social media: 47%
    news articles: 44%
    e-mail and correspondence: 36%
    on-line forums: 35%
    customer/market surveys: 34%
    scientific or technical literature: 27%
    contact-center notes or transcripts: 25%
    Web-site feedback: 21%
    review sites or forums: 21%
    medical records: 16%
    employee surveys: 16%
    insurance claims or underwriting notes: 15%
    chat and/or instant messaging text: 15%
    other: 14%
    crime, legal, or judicial reports or evidentiary materials: 13%
    point-of-service notes or transcripts: 12%
    patent/IP filings: 11%
    SMS/text messages: 8%
    warranty claims/documentation: 7%
  • 15. Text Analytics 2009: User Perspectives
    Q4: Return on Investment
    Question 4 asked, “How do you measure ROI, Return on Investment? Have you achieved positive ROI yet?” Results are charted from highest to lowest values of the sum of “currently measure” and “plan to measure”:
    How do you measure ROI, Return on Investment?
    [Bar chart. Legend: Currently Measure; Plan to Measure; Measure or Plan to Measure; Achieved. Values labeled in the chart:]
    increased sales to existing customers: 54%
    higher satisfaction ratings: 51%
    improved new-customer acquisition: 46%
    higher customer retention/lower churn: 39%
    reduction in required staff/higher staff productivity: 38%
    more accurate processing of claims/requests/casework: 36%
    faster processing of claims/requests/casework: 36%
    ability to create new information products: 34%
    fewer issues reported and/or service complaints: 30%
    lower average cost of sales, new & existing customers: 30%
    higher search ranking, Web traffic, or ad response: 28%
    other: 18%
    Q5: Mindshare
    A word cloud, generated at, seemed a good way to present responses to the query, “Please enter the names of text-analytics companies you have heard of.”
  • 16. Text Analytics 2009: User Perspectives
    Q6: Spending
    Question 6 asked, “How much did your organization spend in 2008, and how much do you expect to spend in 2009, on text-analytics solutions?”
    [Pie charts: 2008 Spending and 2009 Spending. Categories: use open source; under $50,000; $50,000 to $99,000; $100,000 to $199,999; $200,000 to $499,999; $500,000 or above. Slice labels as extracted: 13%, 11%, 14%, 20%, 7%, 6%, 7%, 8%, 20%, 38%, 22%, 34%.]
    Q8: Satisfaction
    Question 8 asked, “Please rate your overall experience – your satisfaction – with text analytics.” Results are as shown:
    [Pie chart. Categories: Completely satisfied; Satisfied; Neutral; Disappointed; Very disappointed. Slice labels as extracted: 23%, 2%, 2%, 53%, 21%.]
    Q9: Overall Experience
    Question 9 asked, “Please describe your overall experience – your satisfaction – with text analytics.” The following are 32 verbatim responses, lightly edited for spelling and grammar and to mask the two products that were named:
    We are highly satisfied. Costs were lower than expected due to high degree of automation. Expectations were exceeded. More timely and more fine grained customer insight and market intelligence and competitive intelligence than ever before. It's been a fun journey, but still struggling with how to get to root cause and how far text analytics can get you there vs. need analysts.
  • 17. Text Analytics 2009: User Perspectives
    No one solution addresses every use case. Some solutions better address the up-front creation of dictionaries than others. I would like a more automated system the integrates with our current IS. Not really neutral but it's sort of a love hate thing. There's a very high learning curve, sometimes it's seductive to measure things that aren't relevant - to run things just because you cannot because they tell you anything. But the customers like it - even if they don't understand it. I want to see more applications Pretty good on named entity extraction, fairly good on fact extraction, poor on sentiment analysis. Several possibilities, several applications; Emphasis on efficiency enhancing; solutions; Problems in selling accuracy. I was satisfied of the effectiveness of the tools - specifically for named-entity recognition. Good but still have a ways to go with capabilities OK, it is hard to describe satisfaction of using text analytics tools when we all know how language is ambiguous and complex - we cannot expect too much from automatic processing yet, maybe in the time when neutral networks can be used, but NLP on its own cannot impress us yet I think. Developing part-of-speech tagging for Arabic text, morphological analyzer, to deal with wide range of text domain, formats and genres. Frustration with developing custom dictionaries that allow real-time categorization of content. Pleased with progress in neural analysis of text content. I'm building this all myself using open source tools. I'm extremely satisfied. Hard learning curve, but we have it going now. Excellent. We have pretty low expectations for the accuracy of automated classification techniques, and those are fulfilled but not exceeded. We use automated categorization in building demos, but most of our customers use semi-automated or manual tagging. It has been extremely valuable in certain situations.
We always look at the text and verbatims with our [product] software It's great, but most of it is primarily designed for the English language only. As soon as you need other languages, you need a lot of different providers (= increased implementation costs) or you have to pay a lot of money. I have written an entire textbook based upon text analytics and plan to write another. 92% accuracy, 6.7 fold increase in productivity, cut search time by 50% Hundreds of hours of auditors’ time has been saved by a combination of scanning of hard copy evidence, electronic evidence collection, and importing into [product], building business rules from auditors defined keywords to produce first cut analysis classification. Very satisfied - state-of-the-art in text analytics is advancing at a very rapid pace and text- analytics based solutions are able to demonstrate business value addition/ROI. Feedback from our users with the current tools is that they are not meeting their needs, which is why we are looking at other solutions. Difficult implementation into our core software, but now works as designed. We have presented sentiment analysis on a wide range of documents and used the information to be predictive in nature. Text analytics allows us to gain new customer and market insights as well as better competitive intelligence: higher report frequency, automated reporting, lower cost, finer granularity. Great hopes. Long way to go. Too early to tell. 10 million Voice of Customer can be in real time understood. 17
Q12: Like and Dislike
Question 12 asked, “What do you like or dislike about your solution or software provider(s)?” Respondents were allowed to enter up to five points. Twenty-seven individuals responded, entering a total of 82 points. One respondent entered “cost” in all five slots.

The following table normalizes, classifies as positive or negative, and groups the responses into thematic categories. We take the sum of positive and negative remarks in a category as indicating the category’s importance, so the chart is sorted in descending order of (sum) number of remarks.

[Bar chart: What do you like or dislike about your solution or software provider(s)? Series: Plus, Minus, Sum.]
Q13: Information Types
Do you need (or expect to need) to extract or analyze -

• Other: 15%
• Other entities – phone numbers, e-mail & street addresses: 40%
• Metadata such as document author, publication date, title, headers, etc.: 53%
• Events, relationships, and/or facts: 55%
• Concepts, that is, abstract groups of entities: 58%
• Sentiment, opinions, attitudes, emotions: 60%
• Topics and themes: 65%
• Named entities – people, companies, geographic locations, brands, ticker symbols, etc.: 71%

Q19: Comments
There were twelve comments. Several pushing-the-envelope respondent observations were particularly interesting:

“We were shocked at the lack of appreciation for hosted and/or turnkey solutions from many vendors we evaluated in 2008. The product capabilities of many commercial solutions were poorly conceived, leading us to believe that they didn't really understand the potential of text analytics.”

“As a market research supplier, my clients cross a number of industries. Thus, lack of scalability is the major obstacle to adopting text analysis for my purpose.”

“Twitter data requires new text analytic algorithms, because of the presence of ‘@person’ fields, hashtags, and HTML links that have been shortened. As a consequence, "traditional" algorithms don't work. I am developing those algorithms myself, which is yet another reason I use open source tools exclusively.”

One other comment is interesting and prompts a response.

“We are building an information retrieval product and wish to embed out-of-the-box functionality but with the option to plug in other 3rd party analytical components.”

The response is that several frameworks provide a plug-in architecture for the construction of IR and other text-analytics applications. These include:

• UIMA, the Unstructured Information Management Architecture, an Apache Incubator project that was recently approved as an OASIS standard.
• GATE, the General Architecture for Text Engineering.
• Eclipse SMILA, a new SeMantic Information Logistics Architecture project.
Q14: Important Properties & Capabilities
What is important in a solution?

Important Properties & Capabilities
• ability to use specialized dictionaries, taxonomies, or extraction rules: 62%
• broad information extraction capability: 59%
• deep sentiment/opinion extraction: 53%
• low cost: 51%
• support for multiple languages: 39%
• predictive-analytics integration: 37%
• BI (business intelligence) integration: 35%
• open source: 24%
• ability to create custom workflows: 24%
• sector adaptation (e.g., hospitality, insurance, retail, health care, communications, financial services): 23%
• media monitoring/analysis interface: 22%
• hosted or "as a service" option: 22%
• supports data fusion / unified analytics: 19%
• interface specialized for your line-of-business: 17%
• vendor's reseller/integrator/OEM relationships with tech or service providers: 13%
• other: 9%
Additional Analysis
The survey was designed so that responses to questions would be easy to interpret and immediately useful without elaborate cross-tabulation or filtering. The exception was cross-tabulation of two variables with the others: length of time using text analytics, and whether a respondent is currently using text analytics or not.

Selected Cross-tabulations
The author’s interpretation of survey findings generally supports prior notions, points such as –

Length of involvement with text analytics correlates with particularity of requirements. Each bar represents the percentage of respondents in a time category who indicated that “ability to…” is important:

[Bar chart, 0% to 100%. Series: “Ability to use specialized dictionaries, taxonomies, or extraction rules is important” and “Ability to create custom workflows is important”. Time categories: less than 6 months; 6 months to less than one year; one year to less than two years; two years to less than four years; four years or more.]

Length of involvement with text analytics correlates with preference for open source:

[Bar chart, 0% to 60%: Open source is important versus Time using Text Analytics. Time categories: less than 6 months; 6 months to less than one year; one year to less than two years; two years to less than four years; four years or more.]

Using / Not
Other interesting points come out of contrasting respondents who are already using text analytics with respondents who are still in planning stages.

Sources
The top responses to “What textual information are you analyzing or do you plan to analyze?” for current users are:

• blogs and other social media (twitter, social-network sites, etc.): 62%
• news articles: 55%
• on-line forums: 41%
• e-mail and correspondence: 38%
• customer/market surveys: 35%

These are on-line and other feedback-rich sources. Their high rate of selection suggests that veteran users have found significant benefit in these sources.

By contrast, only three information-type categories were selected by over 26% of respondents who are not yet using text analytics:

• e-mail and correspondence: 37%
• customer/market surveys: 34%
• contact-center notes or transcripts: 29%

It’s easy to infer that the value of on-line materials (social media, news articles, forums), which is evident to current users, has not yet become clear to prospective users. That only a minority chose any particular category suggests some combination of the following:

• Prospective users are more broadly distributed across application categories.
• Prospective users are cautious about how many different sources they tackle initially.

The particular top selections suggest that the plurality – the largest portion – of prospective users will focus initially on materials they have on hand that involve interactions with known stakeholders. Web sources can come later.

Expectations
Prospective users are not similarly guarded in their expectations. When responses to Question 4, “How do you measure ROI, Return on Investment?”, are split out by current versus prospective use, six measures are each selected by between 50% and 55% of prospective-user respondents. They are:

• increased sales to existing customers
• improved new-customer acquisition
• higher satisfaction ratings
• fewer issues reported and/or service complaints
• faster processing of claims/requests/casework
• reduction in required staff/higher staff productivity

(Of prospective-user respondents, almost a quarter are already using “increased sales to existing customers” as an ROI measure, which makes sense. Sales are easily tracked and analyzed by current systems where items such as satisfaction ratings are not.)

“Higher customer retention/lower churn” comes in at just under 50% and three others top 38%. These prospective users, and the folks who advise them, would do well to manage and focus their expectations.

Interpretive Limitations
The number of survey respondents was not large enough to support further useful cross-tabulation of variables beyond the analyses above.

In interpreting presented findings, do keep in mind that the survey was not designed or conducted scientifically, that is, with the intention or the actuality of creating a random sample or a statistically robust characterization of the broad market. Findings surely reflect selection bias due to 1) the outlets where the survey was advertised and 2) a likelihood that those individuals who are unaware of text analytics or the potential for text analytics to help them solve their business problems would not respond to the survey. Findings therefore over-represent current text-analytics users and also over-represent, to a lesser extent, the business intelligence and data warehousing communities.

Finally, responses to several of the survey questions were not especially illuminating or likely to be of much use to report readers. These questions are, in particular:

• Question 10. Who is your provider?
• Question 11. How did you identify and choose your provider?
• Question 15. What BI (business intelligence) software do you use if any?
• Question 16. What social media do/would you look to for text-analytics contacts, discussions, or information?
• Question 17. What industry publications do you receive, on paper or electronically?
• Question 18. What industry/technical conferences do you attend?
About the Study
Text Analytics 2009: User Perspectives on Solutions and Providers reports the findings of a study conducted by Seth Grimes, president and principal consultant at Alta Plana Corporation. Findings were drawn from responses to a spring 2009 survey of current and prospective text-analytics users, consultants, and integrators. The survey asked respondents to relay their perceptions of text-analytics technology, solutions, and vendors. It asked users to describe their organizations’ usage of text analytics and their experiences.

Sponsors
The author is grateful for the support of seven sponsors – Attensity, Clarabridge, the University of Sheffield (GATE project), IxReveal, Nstein, SAP, and TEMIS – whose financial contribution enabled him to conduct the current study and publish study findings.

The content of the sponsor solution profiles was provided by the sponsors. The survey findings and the editorial content of this report do not necessarily represent the views of the study sponsors. This report, with the exception of the sponsor solution profiles, was not reviewed by the sponsors prior to publication.

Media Partners
The author acknowledges assistance received from six media partners in disseminating invitations to participate in the survey. Those media partners are Intelligent Enterprise, KDnuggets,,, the Text Analytics Summit, and The Data Warehousing Institute (TDWI).

Author
Seth Grimes is an information technology analyst and analytics strategy consultant. He is contributing editor at Intelligent Enterprise magazine, founding chair of the Text Analytics Summit, an instructor for The Data Warehousing Institute (TDWI), KDnuggets contributor, and text analytics channel expert at the Business Intelligence Network. Seth founded Washington DC-based Alta Plana Corporation in 1997. He consults, writes, and speaks on information-systems strategy, data management and analysis systems, industry trends, and emerging analytical technologies. Seth can be reached at 301-270-0795.
Sponsor Solution Profiles
Solution Profile: Attensity
Business is built on conversations. These customer, partner, and employee conversations are captured in emails, call notes, letters, surveys, forums and other social media, and more. Attensity enables you to use these conversations to drive better relationships with your customers – transforming them into loyal advocates of your business. Attensity delivers the power of sophisticated data and semantic analytics in an integrated suite of easy-to-use business applications, allowing business leaders, customer support personnel, and customers to get relevant and actionable answers fast.

An Integrated Suite of Products to Help You Manage the Customer Conversation: Analyze and Respond
Attensity's ability to extract valuable insight from free-form text anywhere and transform it into actionable insights offers organizations the opportunity to understand their customers and to manage the entire customer conversation – analyzing and responding to customer needs. Recognized as best-of-breed by leading analysts for more than a decade, our applications, powered by the industry’s leading natural language processing technologies, are designed to automate related business processes and to add the rigor and speed necessary to swiftly identify often subtle relationships and root causes and to respond to customers promptly and accurately. Equally important, our easy-to-use business applications are designed not only for analysts but also for business leaders, researchers, brand and category managers, and customer service representatives, and can also be used directly by customers to self-serve efficiently.
Attensity Voice of the Customer/Market Voice allows your organization to glean and analyze your customers’ candid thoughts about your brand and products, rapidly and accurately understanding and analyzing comments in E-Service records, surveys, and emails, along with the market buzz found in web communities, blogs, product reviews and social media sites. This delivers the actionable insights – authentic customer sentiments and issues around your brand, products, services, your competitors and more – you need to make smarter decisions and deliver better products and services. Attensity Voice of the Customer/Market Voice features sophisticated integrated reporting and pre-packaged Voice of the Customer extraction domains for fast time-to-value, detailed sentiment analysis, and an extensive partner solutions network to help you extend the value of your applications.

Attensity’s other products include E-Service Suite, Automated Response Management, Research and Discovery and Intelligence Analysis.

E-Service Suite offers an Agent Service Portal and a Self-Service application that enables your customers to effectively self-serve while your agents are empowered to extend informed and efficient service support in real time.

Attensity Automated Response Management, a part of the E-Service Suite, optimizes and automates up to 100% of the handling of all incoming and outbound customer communications, enabling you to deliver a superior customer experience while achieving significant operational efficiency and productivity gains in your contact center.

Research and Discovery provides your organization with sophisticated information extraction, advanced classification and enterprise-class search of and access to internal and external data, helping you meet compliance and litigation demands while controlling costs.
Intelligence Analysis allows commercial and government organizations to “connect the dots” by delivering automatic extraction and analytical processing of “relational events” from unstructured data – not only who or what, but the “why, when, where and how.”

A Relentless Focus on Customer Success
Companies across the full industrial spectrum and around the globe are discovering how our advanced solutions help them thrive by resolving customer support issues more quickly, enabling more accurate research and analysis of customer feedback, and rapidly addressing and proactively preventing problems while mitigating risk. Across industries, companies are optimizing customer interaction processes in the contact center, deepening customer relations through effective and efficient self-serve support, and growing their competitive edge with Attensity solutions adapted to their industry-specific business needs. Attensity’s team of vertical experts allows us to provide expert advice and specialized applications for areas such as aerospace, automotive, consumer packaged goods, contact center outsourcing, financial services and insurance, government and law enforcement, healthcare, hospitality, manufacturing, media and publishing, retail, technology, and telecommunications.

Attensity has a strong record of customer success across all of our products, including Voice of the Customer, E-Service, and Research and Discovery. Three of our VoC success stories are presented here.

JetBlue Airways | New York-based JetBlue Airways has created a new airline category based on value, service and style. Known for its award-winning service and free TV as much as its low fares, JetBlue is now pleased to offer customers the most legroom throughout coach (based on average fleet-wide seat pitch for U.S. airlines). JetBlue is also America’s first and only airline to offer its own Customer Bill of Rights, with meaningful compensation for customers inconvenienced by service disruptions within JetBlue’s control.

JetBlue Airways currently uses Attensity’s Voice of the Customer application in its customer service organization to uncover customer issues, requirements and overall sentiment about the airline. The company’s pilot project demonstrated a significant ability to find key information about customer sentiment and tangible data around how to augment its services. JetBlue uses Attensity VoC to proactively manage and analyze all freeform customer feedback to improve service, marketing, sales and the products they offer.

“From our Customer Bill of Rights to our customer advisory council, JetBlue is dedicated to bringing humanity back to air travel,” Bryan Jeppsen, Research Analyst Manager said. “One of the best ways to do that is to listen – truly listen – to our customers. Our commitment with Attensity enables us to mine subtle but important clues from all forms of customer communications to continue improving all aspects of our company. We’re eager to learn as much as we can, and we’re excited to have Attensity’s simple to use yet sophisticated software at our service.”

JetBlue customer service analysts use Attensity VoC daily to cull insights and actions from feedback.
“Attensity Voice of the Customer offers us the unprecedented ability to automatically extract customer sentiments, preferences and requests we simply wouldn’t find any other way,” according to Jeppsen. “Attensity VOC enables us to intelligently structure, search and integrate the data into our other business intelligence and decision-making systems.”

Charles Schwab | For this Global 1000 investment services firm, Attensity is a central part of efforts to understand and act on customer feedback. With hundreds of thousands of interactions per month, the need to understand customer issues, act on signs of dissatisfaction and churn, and drive sales and service interactions can be the difference between success and failure. With Attensity they are able to capture these interactions through customer service notes, emails, survey responses and online discussions and analyze them to power customer retention and growth.

Attensity Voice of the Customer enables Schwab to analyze customer feedback to drive proactive programs and understand emerging issues and opportunities, communicate key issues and opportunities at the client segment level on a daily basis, and integrate this valuable customer feedback into their SAS analytics platform on their Teradata data warehouse to expand the customer signature and to deepen customer loyalty analytics.

Attensity has become integral to Schwab customer program planning and churn identification efforts. The firm has improved satisfaction and been able to mitigate churn via improved direct broker communications with customers and marketing programs. Customer satisfaction, specifically reasons customers are not happy, is directly monitored and specific issues are addressed. Issues can include problems with services, communication, collateral, and specific individual interactions. Attensity also helps the firm dig deep into Net Promoter™ Program results, uncovering reasons customers give low scores and identify as “detractors.” Attensity contributed to important changes to account statements. Most importantly, Attensity reduced the time needed to analyze customer satisfaction issues from almost one year to less than one week!

Whirlpool | As a $13.2B appliance manufacturer and the #1 appliance manufacturer in the world, Whirlpool focuses on great products and great customer relationships to maintain and grow its global customer base. As a customer-centered company, Whirlpool needs to understand the root cause of pain points and of brand, product, and service related issues. With the vast amounts of customer service records, emails, survey responses and online community forums, there is more than enough data to get and use customer insights to improve customer experiences.

When Whirlpool started with Attensity in 2004, the company wanted to be able to leverage the web and over 8.5 million annual customer and repair visit interactions captured in service notes to drive marketing programs, product development, and quality initiatives. Whirlpool has done just that and more. With over 300 Attensity VoC users worldwide, Whirlpool listens to and acts on customer data in the service department, its innovation and product development groups, and in the market every day. With Attensity VoC, Whirlpool gets early warning of safety and warranty issues and has been able to mitigate expensive recalls through rapid change out of defective parts. Whirlpool extrapolates an ~80% reduction in the cost of recalls due to early detection. In addition to Attensity-fueled product quality improvements, Whirlpool better understands customers’ needs and wants – and the competition and what they are doing to win over customers.
Solution Profile: Clarabridge
Clarabridge was founded with the simple premise of enabling companies to drive business value by understanding key customer and prospect experiences. Now more than ever, consumer-focused companies turn to Clarabridge to help retain customers, attract new customers, cut servicing and operational costs, sell more products to current customers, and develop more relevant products and services.

Clarabridge is the leading provider of text mining software for Customer Experience Management (CEM) due to four key strengths:

• Commitment to CEM applications: Clarabridge’s rapid growth is due to a focus on the value our customers gain from leveraging our VOC solutions. Our staff, technology, customers, and partners are all 100% focused around delivering VOC applications, and our entire company is committed to providing business value for our customers.
• Speed-to-Value: No other advanced text mining solution can be deployed as efficiently and powerfully as Clarabridge. Whether an implementation is source specific or enterprise-wide, no other vendor can compete with the speed with which our customers not only implement but recognize value.
• Market Leadership: We believe that being a market leader is more than market statistics and sales wins. While Clarabridge dominates these statistics, we believe that being the market leader also means being a thought leader, an innovator and a standard-setting force in the marketplace. Clarabridge is the first company in our industry to organize a specific user group and conference on using text-mining to support VOC.
• The Best Technology: There are many great technologies in the text mining world. Some are proven in academia and government think tanks, others within very controlled implementations. But no current text mining technology can compete with our ability to deliver repeatable and tangible business value within the commercial space.
Enabled by text analytics, CEM provides the opportunity to create innovative offerings from the start while targeting the precise customer segments and later react to customer feedback on desired improvements and enhancements.

Text Mining to Support Business Improvements
Clarabridge’s content mining process involves three integrated components:

1. Collect and Connect: Clarabridge's pre-built source connectors allow easy access to external and internal customer information, harvesting content from all of your listening posts. Clarabridge’s built-in feedback module allows the design, deployment and capture of surveys, campaigns, community forums and web forms.
2. Mine and Refine: Once all textual content is sourced, Clarabridge extracts meaning through its fully integrated and automated features, so millions of verbatims transform seamlessly into actionable information. Clarabridge’s deep-parsing Natural Language Processing technology extracts parts of speech and linguistic relationships. This output is used for downstream entity & fact extraction, sentiment extraction, categorization, and root cause analysis.
3. Analyze and Discover: Clarabridge provides two interfaces with a range of functional and analytic tools: Clarabridge Reporting and Analysis and Clarabridge Navigator. Analysts and business users can identify key themes and emerging issues, dynamically investigate results, set up alerts and drill into root causes with the full discovery functionality integrated into the software.

Case Studies: Technology in Action
Today, leading Fortune 1000 companies across all major markets rely on Clarabridge for the essential customer experience intelligence they require for strategic insight and pre-emptive action. Supported by the Clarabridge Content Mining Platform, clients are able to capture the 360-degree view on current customer attitudes and sentiment shifts, rather than to settle for a limited understanding of their Voice of the Customer. Use cases reflect Clarabridge’s successful engagements and their outcomes with clients across a range of major industries.

• AOL uses Clarabridge to manage, capture and analyze over 5 million website feedback forms for over 150 products in dozens of languages. Clarabridge automatically processes and reports the now quantified insights directly to product teams.
• A major international airline company uses Clarabridge to capture and analyze over 7 million surveys per year, allowing them to analyze drivers of loyalty and dissatisfaction for all of their customer segments. The airline can better meet the needs of their passengers through improved understanding of the drivers of customer satisfaction.
• Gaylord Entertainment used Clarabridge to replace their manual guest satisfaction review process with automatic coding, sentiment extraction and reporting. VOC analysis is available near real-time based on the needs of Gaylord employees. Driving more business through high value event planners and raising customer satisfaction scores, Gaylord has had enormous business and customer experience success using Clarabridge.

Vision, Experience, and Strength
Clarabridge’s goal is to help you fully access your customer experience intelligence – and leverage that information to your advantage. By bridging the gap between your customer’s experience and your brand’s promise, we provide a unique portal into the human dimension of your business. With this insight, you gain the strategic edge in serving your customers, controlling costs and risk, competing resourcefully, and building profitability.

When you work with Clarabridge, you work with the management team that has guided the company’s growth and innovation from the start. Each has had decades of experience, bolstered by successful entrepreneurial ventures and strengthened by prior top-level management experience. Executives, who include a nationally recognized entrepreneur and a multiple patent holder, are all published authors and frequent speakers at industry conferences.

With a commitment to excellence, partnership model with clients, and fast-paced development processes, Clarabridge is strong from the ground up. What’s more, our financial backing, board advisors, reputation, and partnerships are sound, ensuring our software will evolve to meet your emerging demands.
Solution Profile: GATE
General Architecture for Text Engineering
An Open Source Solution for Full Lifecycle Text Analytics

FREE
Open source, licensed under LGPL allowing unrestricted commercial use, hosted on SourceForge.

100% JAVA
Runs on any platform supporting Java 5 or later. Developed and tested daily on Linux, Windows, and Mac OS X.

MATURE AND ACTIVELY SUPPORTED
In development since 1996; now at version 5.0; around 20 active developers.

COMPREHENSIVE
Support for manual annotation, performance evaluation, information extraction, [semi-]automatic semantic annotation, and many other tasks. Over 50 plug-ins included with the standard distribution, containing over 70 resource types. Many others available from independent sources.

DATA MANAGEMENT
Pluggable input filters with out of the box support for XML, HTML, PDF, MS Word, email, plain text, etc. Common in-memory data model built around stand-off annotation, documents and corpora. Persistent storage layer with support for XML or Java serialisation. I/O interoperation with many other systems.

STANDARD ALGORITHMS
Ready made implementations for many typical NLP tasks such as tokenisation, POS tagging, sentence splitting, named entity recognition, co-reference resolution, machine learning, etc.

USER INTERFACE
Comprehensive tool set for data editing and visualisation, rapid application development, manual annotation, ontology management.

STANDARDS BASED
Reference implementation in ISO TC37/SC4 LIRICS project; supports XCES, ACE, TREC etc. formats; founder member of OASIS/UIMA committee.

EFFICIENT
Optimisations included with the latest version provide a 20 to 40% speed and memory usage improvement. Highly efficient finite state text processing engine; many plug-ins with linear execution time.

POPULAR
Assessed as “outstanding” and “internationally leading” by an anonymous EPSRC peer review. Used at thousands of sites: companies, universities and research laboratories, all over the world. ~35,000 downloads/year. Rolling funding for more than 15 staff at the University of Sheffield.

INTEGRATION
Leveraging the power of other projects such as:
• Information Retrieval: Lucene (Nutch, Solr), Google and Yahoo search APIs, MG4J;
• Machine Learning: Weka, MaxEnt, SVMLight, etc.;
• Ontology Support: Sesame and OWLIM;
• Parsing: RASP, Minipar, and SUPPLE;
• Other: UIMA, Wordnet, Snowball, etc.

COMMUNITY AND SUPPORT
Friendly and active community of developers and users offers efficient help. Commercial support available from Ontotext and Matrixware.
  • 31. OVERVIEW
    GATE was first released in 1996, then completely re-designed, re-written, and re-released in 2002. The system is now one of the most widely-used systems of its type and is a comprehensive infrastructure for language processing software development. The new UIMA architecture from IBM/Apache has taken inspiration from GATE and IBM have paid the University of Sheffield to develop an interoperability layer between the two systems.

    Key features of GATE are:
      • Component-based development reduces the systems integration overhead in collaborative research.
      • Automatic performance measurement of Language Engineering (LE) components promotes quantitative comparative evaluation.
      • Distinction between low-level tasks such as data storage, data visualisation, discovery, and loading of components and the high-level language processing tasks.
      • Clean separation between data structures and algorithms that process human language.
      • Consistent use of standard mechanisms for components to communicate data about language, and use of open standards such as Unicode and XML.
      • Insulation from idiosyncratic data formats (GATE performs automatic format conversion and enables uniform access to linguistic data).
      • Provision of a baseline set of LE components that can be extended and/or replaced by users as required.

    TEXT ANALYSIS
    Text Analysis (TA) is a process which takes unseen texts as input and produces fixed-format, unambiguous data as output. This data may be used directly for display to users, or may be stored in a database or spreadsheet for later analysis, or may be used for indexing purposes in Information Retrieval (IR) applications. TA covers a family of applications including named entity recognition, relation extraction, and event detection. GATE has been used for TA applications in domains including bioinformatics, health and safety, and 17th century court reports. TA systems built on GATE have been evaluated among the top ones at international competitions (MUC, ACE, Pascal). A system built by the GATE team came top in two of three categories in the NTCIR 2007 patent classification competition.

    THE GATE FAMILY
      • GATE Developer: an integrated development environment for language processing components bundled with the most widely used Information Extraction system and a comprehensive set of other plug-ins
      • GATE Embedded: an object library optimised for inclusion in diverse applications giving access to all the services used by GATE Developer and more
      • GATE Teamware: a collaborative annotation environment for high volume factory-style semantic annotation projects built around a workflow engine and the GATE Cloud backend web services
      • GATE Cloud: a parallel distributed processing engine that combines GATE Embedded with a heavily optimized service infrastructure

    FIRST COUSINS: THE ONTOTEXT FAMILY
      • Ontotext KIM: UIs demonstrating our multiparadigm approach to information management, navigation and search
      • Ontotext Mimir: (Multi-paradigm Information Management Index and Repository) a massively scaleable multiparadigm index built on Ontotext's semantic repository family, GATE's annotation structures database plus full-text indexing from MG4J

    Contact: Prof. Hamish Cunningham
    Research funding: EU, UK Research Councils and JISC
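    The text analysis process described in the GATE overview (unseen text in, fixed-format data out, produced by replaceable components) can be sketched as a small pipeline. This is an illustrative Python sketch, not GATE code: the gazetteer contents and the function names tokenise, lookup_entities and analyse are assumptions made for the example, and a real system would use far richer linguistic resources.

```python
import re

# Hypothetical gazetteer: a tiny lookup list standing in for real resources.
GAZETTEER = {
    "Sheffield": "Location",
    "IBM": "Organization",
    "Ontotext": "Organization",
}


def tokenise(text):
    """Component 1: split text into tokens with character offsets."""
    return [
        {"start": m.start(), "end": m.end(), "string": m.group()}
        for m in re.finditer(r"\w+", text)
    ]


def lookup_entities(tokens):
    """Component 2: keep tokens found in the gazetteer and attach a type."""
    return [
        {**tok, "type": GAZETTEER[tok["string"]]}
        for tok in tokens
        if tok["string"] in GAZETTEER
    ]


def analyse(text, components=(tokenise, lookup_entities)):
    """Run the components in order; each consumes the previous one's output."""
    data = text
    for component in components:
        data = component(data)
    return data


text = "IBM funded work at Sheffield."
records = analyse(text)
```

    Each record has the same fixed fields (start, end, string, type), so the output could be written straight to a database table or spreadsheet. Swapping in a different tokeniser or entity recogniser only requires passing a different components tuple, which mirrors the component-based design listed among the key features.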
  • 32. Solution Profile: IxReveal
    IxReveal is a leading analytics software company that transcends current search and business intelligence technologies. Our patented platforms transform large volumes of unstructured and structured data into actionable intelligence, while enabling automatic and collaborative sharing of concepts, connections, and findings. Clients include global corporations, financial institutions, health organizations, and major government agencies with data-intensive needs in areas such as fraud, security, finance, crime, and intelligence.

    uReveal is aimed at helping analysts in organizations solve business problems and make informed business decisions by leveraging their investment in collecting data. Organizations have spent millions of dollars collecting and storing information such as crime incidents, claims, customer calls, and emails. With uReveal, they are able to combine structured and unstructured data to find meaningful trends and patterns, to fight crime and insurance fraud, and to reshape the organization to be customer focused.

    uReveal provides the bottom or top-line changing ability to analyze huge volumes of textual data. It works with various data sources such as existing search infrastructures, databases containing textual information, emails, and content management systems. uReveal’s powerful decision support capabilities are finally making it possible to find trends and patterns and zero in on critical slices of information buried deep within the text.

    uReveal is a tool that has been developed for analysts, putting them in control by enabling them to focus their precious time on value-added analysis instead of having to read all the documents returned. It is designed for small to mid-sized workgroups that work with vast amounts of free-form information as part of their jobs. With an intuitive and highly configurable user interface and patent-pending analytics (such as relationship discovery and integrated charting/graphing capabilities), uReveal users are able to create a personalized environment to get their job done faster.

    uReveal is the solution for analytical teams that work with unstructured information and provide decisive insight as part of a mission critical business process. Using uReveal, they can both find and substantiate business insights and recommendations, pointing back to the unstructured information as validation.

    Customer comments:
      • Law Enforcement: “uReveal made our analysts ridiculously efficient.” - Crime Analysis Manager
      • Insurance: “Level of accuracy of suspicious claims identification increased five-fold and false positives decreased.” - Insurance Claims Manager
      • Insurance: “This technology not only helps our analysts become very efficient but helps us save on legal costs as well.” - Workers Comp Fraud Manager