Text/Content Analytics 2011: User Perspectives on Solutions and Providers

The text analytics market -- plus findings from a 2011 market study -- written by consultant and industry analyst Seth Grimes, Alta Plana Corporation.

  1. Text/Content Analytics 2011: User Perspectives on Solutions and Providers
     Seth Grimes
     An Alta Plana research study
     Published September 9, 2011 under the Creative Commons Attribution 3.0 License.
  2. Table of Contents

     Executive Summary
        Market Size and Growth
        Growth Drivers
        The 2011 Market
        The Study
        Key Study Findings
        About the Study and the Report
     Text and Content Analytics Basics
        From Patterns…
        … To Structure
        Beyond Text
        Metadata
        A Focus on Applications
     Applications and Markets
        Application modes
        Business Domains
        Business Functions
        Technology Domains
        Solution Providers
     Demand-Side Perspectives
        Study Context
        About the Survey
        Market Size and the Larger BI Market
        The Data Mining Community
     Demand-Side Study 2011: Findings
        Q1: Length of Experience
        Q2: Application Areas
        Q3: Information Sources
        Q4: Return on Investment
        Q5: Mindshare
        Q6: Spending
        Q8: Satisfaction
        Q9: Overall Experience
        Q10: Providers
        Q11: Provider Selection
        Q13: Promoter?
        Q14: Information Types
        Q15: Important Properties and Capabilities
        Q16: Languages
        Q17: BI Software Use
        Q18: Guidance
        Q19: Comments
        Additional Analysis
        Interpretive Limitations and Judgments
     About the Study
     Solution Profile: AlchemyAPI
     Solution Profile: Attensity
     Solution Profile: Basis Technology
     Solution Profile: Language Computer Corp.
     Solution Profile: Lexalytics
     Solution Profile: Medallia
     Solution Profile: SAS
     Solution Profile: Sybase
     Solution Profile: Verint Systems Inc.
  3. Executive Summary

     Text and content analytics have become a source of competitive advantage, enabling businesses, government agencies, and researchers to extract unprecedented value from “unstructured” data. Uptake is strong – software, solutions, and services are delivering significant business value to users in a spectrum of industries – yet the market’s potential remains unrealized. These points and more are brought out in Alta Plana’s market study, “Text/Content Analytics 2011: User Perspectives on Solutions and Providers.”

     Market Size and Growth

     Tools and solutions now cover the gamut of business, research, and governmental needs. User adoption continues to grow at a very rapid pace, an estimated 25% in 2010, creating an $835 million market for software tools, business solutions, and vendor-supplied support and services. These tools and solutions generate business value several times that figure, extrapolating from revenue generated by applications and solutions (for instance, social-media analysis, e-discovery, and search), information products created by mining content, professional services, and research.

     The addressable market for text/content analytics is much larger. The technologies are a subset of a larger business intelligence, analytics, and performance management software market, which is dominated by solutions that analyze numerical data originating in enterprise operational systems. Gartner estimated that larger market at $10.5 billion globally in 2010. Yet, given now-broad awareness of the business value that resides in “unstructured” social, online, and enterprise sources, the text/content analytics share of the much larger market will surely grow steeply in coming years. Overall, expect annual text/content-analytics growth averaging up to 25% for the next several years.

     Growth Drivers

     A number of factors contribute to sustained growth, foremost the growth of social platforms, which have become essential life tools for individuals and an important business marketing, communication, research, and commerce channel.

     Social

     Keeping up with Social is a must for every consumer-facing organization, and automated monitoring, measurement, and engagement is the only way to deal with Social’s variety, volume, and velocity. Leading solutions rely on natural-language processing, provided by text/content analytics, to identify and extract facts and sentiment. Expect even lower-end tools to embrace NLP by 2013.

     Publishing, advertising, and information services

     Second, text/content analytics is central to competitive online publishing and advertising and to effective information access (essentially, next-generation search). These are two sides of a single coin. As applied by content producers and publishers, the technologies discover and associate appropriate descriptive and semantic labels with content. The aims are to optimize search findability, to allow content to be stored and retrieved at a fine-grained level (documents as databases), and to enhance the content consumer’s experience interacting with content. As applied by search, content aggregation, online advertising, and information-service providers, the technology fuels situationally appropriate results that respond to the information/service seeker’s context and intent.
  4. Question-answering and information access

     Question-answering systems such as IBM Watson and Wolfram Alpha are examples of next-generation, analytics-enabled information-access engines, which will play a key role in online commerce, customer support, health-service delivery, and other applications starting by early 2013. Similarly, Semantic Web information resources should finally enter the mainstream by 2014. They will very frequently rely on analytics to semanticize and structure content and support on-the-fly information integration.

     Rich media

     Last, content analytics makes sense of rich media. The technology finds and exploits patterns – what’s in a given piece of content and how that content changes over time – in speech and sound, images, and video. There are important content-analytics applications today for contact centers, security, general information access, and even consumer electronics: Witness face detection and tracking in consumer-grade cameras and camcorders. Arguably, we could include analyses of social and enterprise networks, mined from e-mail, messaging, online, and social content, under the content-analytics umbrella.

     The 2011 Market

     As in prior years, no single solution provider dominates the market. Players range from the largest enterprise software vendors to a stream of new entrants, both commercializing research technologies and bringing solutions to new markets. In between, established enterprise content management (ECM), BI and analytics, search, software tools, and business-solution providers – the sponsors of this study among them – continue to innovate and deliver business value.

     The Study

     Alta Plana’s 2011 text/content analytics market study combines a survey-based, quantitative and qualitative examination of usage, perceptions, and plans with observations derived from numerous conversations with solution providers and users. It seeks to answer the question, “What do current and prospective text/content-analytics users really think of the technology, solutions, and solution providers?” Responses will help providers craft products and services that better serve users. Findings will guide users seeking to maximize benefit for their own organizations.

     Alta Plana received 224 valid survey responses between June 6 and July 9, 2011. This document reports findings and, when appropriate, contrasts them with comparable numbers from Alta Plana’s spring-2009 text-analytics market study.[1]

     Key Study Findings

     The following are key 2011 study findings:

     • The big news is not news at all: Social is by far the most popular source fueling text/content analytics initiatives. Four of the top 5 information categories are social/online (as opposed to in-enterprise) sources:
          o blogs and other social media (62%)
          o news articles (41%)
          o on-line forums (35%)
          o reviews (30%)
       as well as direct customer feedback in the form of:
          o customer/market surveys (35%)
          o e-mail and correspondence (29%)
       for an average of 4.5 sources per respondent.

     [1] “Text Analytics 2009: User Perspectives on Solutions and Providers”: http://altaplana.com/TA2009

  5. • All three top capabilities that users look for in a solution, each garnering over 50% response, relate to getting the most information out of sources:
          o Broad information extraction capabilities (63%)
          o Ability to use specialized dictionaries, taxonomies, ontologies, or extraction rules (57%)
          o Deep sentiment/emotion/opinion extraction (57%)
       Low cost dropped from 51% of 2009 responses to 38% in 2011.

     • Top business applications of text/content analytics for respondents are the following:
          o Brand / product / reputation management (39% of respondents)
          o Voice of the Customer / Customer Experience Management (39%)
          o Search, Information Access, or Question Answering (36%)
          o Competitive intelligence (33%)

     • Seventy percent of users are Satisfied or Completely Satisfied with text/content analytics and 24% are Neutral, with only 7% Disappointed or Very Disappointed. Dissatisfaction is greatest, at 25%, with ease of use, with only 36% satisfied. Only 42% are satisfied with availability of professional services/support.

     • Only 49% of users are likely to recommend their most important provider. 28% would recommend against their most important provider.

     About the Study and the Report

     Seth Grimes, an industry analyst and consultant who is a recognized authority on the application of text analytics, designed and conducted the study “Text/Content Analytics 2011: User Perspectives on Solutions and Providers” and wrote this report.

     The author is grateful for the support of the nine study sponsors: Verint, Sybase, SAS, Medallia, Lexalytics, Language Computer Corporation, Basis Technology, Attensity, and AlchemyAPI. Their sponsorships allowed him to conduct an editorially independent study that should promote understanding of the text/content analytics market and of user-indicated implementation and operations best practices.

     The solution profiles that follow the report’s editorial matter were provided by the sponsors and included with only minor editing to regularize their layout. Otherwise, the author is solely responsible for the editorial content of this report, which was not reviewed by the sponsors prior to publication.
  6. Text and Content Analytics Basics

     The term text analytics describes software and transformational processes that uncover business value in “unstructured” text via the application of statistical, linguistic, machine learning, and data analysis and visualization techniques. The aim is to improve automated text processing, whether for search, classification, data and opinion extraction, business intelligence, or other purposes.

     Rough synonyms include text mining, text ETL, and semantic analysis. Terminology choices are typically rooted in history and competitive positioning. Text mining is an extension of data mining, and text ETL an extension of the BI world’s extract-transform-load concept. Semantic analysis seems most often used by Semantic Web aficionados, who sometimes use the broader term Semantic Web technologies, which also covers protocols such as RDF, triple stores, query systems, and the like. These text technologies all perform some form of natural language processing (NLP).

     Content analytics can and should be seen as an extension of capabilities to also cover images, audio and speech, video, and composites, the gamut of information types not generated or held in data fields. (Some organizations use the content analytics label for text analytics on online, social, and enterprise content, typically published information. These organizations most often have a strong focus on enterprise content management (ECM) systems.)

     From Patterns…

     Text, images, speech and other audio, and video are all directly understandable by humans (although not universally: Any given human language – English, Japanese, or Swahili – is spoken by a minority of people, and not everyone recognizes a Beethoven symphony or Nelson Mandela in a photo). Understanding relies on three capabilities:

     1) Ability to recognize small- and large-scale patterns.
     2) Ability to grasp context and, from context, to infer meaning.
     3) Ability to create and apply models.

     Descriptive statistics provides an NLP starting point: The most frequently used words and terms give an indication of the topics a message or document is about. We can create categories and classify text (a form of modeling) based on notions of statistical similarity.

     Next steps take advantage of the linguistic structure of text, detectable by machines as patterns. We have word form (“morphology”) and arrangement (grammar and syntax) as well as higher-level narrative and discourse. Usage may be correct (as judged by editors, grammarians, and linguists) or not, whether the language is spoken, formally written, or texted or tweeted: The most robust technologies deal with text in the wild. We apply assets such as lexicons of “named entities”; part-of-speech resolution that can help identify subject, object, relationship, and attributes; and “word nets” that associate words to help in disambiguation, that is, determination of the contextual sense of terms that may have different meanings in different contexts.

     Yet, in the words of artificial-intelligence pioneer Edward A. Feigenbaum, “Reading from text in general is a hard problem, because it involves all of common sense knowledge. But reading from text in structured domains, I don’t think is as hard.” So some techniques (also) apply knowledge representations such as ontologies to the analysis task.

     All techniques, however, aim to generate machine-processable structure.
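The descriptive-statistics starting point described above (frequent terms as a hint at topic, and statistical similarity as a basis for grouping text) can be illustrated with a minimal Python sketch. It is not drawn from the report: the stopword list, the sample sentences, and the function names `term_counts` and `cosine_similarity` are all assumptions made for illustration, and real systems use far richer tokenization and weighting.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it", "was", "my"}


def term_counts(text: str) -> Counter:
    """Count lowercase word tokens, skipping stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Made-up sample messages, not survey data.
review = term_counts("The battery died. Battery life is poor, and the battery charger failed.")
complaint = term_counts("My battery drains in an hour.")
praise = term_counts("Lovely screen, great camera.")

print(review.most_common(1))                 # the most frequent term hints at the topic
print(cosine_similarity(review, complaint))  # the texts share "battery", so this is above zero
print(cosine_similarity(review, praise))     # no shared terms
```

A classifier built on this idea would assign a new message to whichever category's term vector it most resembles; the linguistic techniques the text goes on to describe refine that crude picture.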
  7. … To Structure

     NLP outputs, as part of a text-analytics system, are typically expressed in the form of document annotations, that is, in-line or external tags that identify and describe features of interest. Outputs may be mapped into machine-manageable data structures, whether as relational database records or in XML, JSON, RDF, or another format.

     Text-extracted data represented in the Semantic Web’s Resource Description Framework (RDF) may form part of a Linked Data system. Text-derived information stored in a relational database may become part of a business intelligence system that jointly analyzes, for instance, DBMS-captured customer transactions and free-text responses to customer-satisfaction surveys. And text-extracted features such as entities, topics, dates, and measurement units may form the basis of advanced semantic search systems.

     Beyond Text

     Beyond-text technologies for information extraction from images, audio, video, and composite media exist but do not match NLP’s sense-making capabilities. Likely most developed is speech-analysis technology that supports indexing and search using phonemes and is capable of detecting emotion in speech via analysis of indicators such as pace, volume, and intonation, with contact-center and other applications, including intelligence. Intelligence, along with consumer and social search, motivates work on image analysis, as do marketing and competitive-intelligence related studies of online and social brand mentions and use. Video analytics extends both speech and image analysis, with an added temporal aspect, for security applications and also potential business uses such as study of customer in-store behavior.

     For beyond-text media, as for text, metadata is of critical importance.

     Metadata

     Metadata describes data properties that may include the provenance, structure, content, and use of data points, datasets, documents, and document collections. Content-linked metadata typically includes author, production and modification dates, title, topic(s), keywords, format, language, encoding (e.g., character set), rights, and so on. The metadata label extends to specialized annotations such as part-of-speech and data type.

     Metadata may be created as part of content production or publication (for instance, the save date captured by a word processor, a geotag associated with a social update, or camera information stored in an image file). It may be appended (for instance, via social tagging) or extracted from content via text/content analysis. Whether stored internally within a data object (for instance, via RDFa, FOAF, or microformat markup embedded in a Web page) or managed externally, in a database or search index, metadata is fuel for a range of applications.

     A Focus on Applications

     We will not devote further space in this report to discussion of text- and content-analysis technology. If you do want to learn more about text-analytics history and technology, do continue with the technology sections of Alta Plana’s 2009 study report, “Text Analytics 2009: User Perspectives on Solutions and Providers,” available online at http://altaplana.com/TextAnalyticsPerspectives2009.pdf.

     As a bridge to survey-derived reporting of user perceptions of the text and content analytics market, solutions, and providers, we will look next at applications.
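The “… To Structure” discussion above describes NLP output as external (stand-off) annotations that can be mapped into a format such as JSON. A minimal Python sketch of that idea follows. It is illustrative only: the two-entry `LEXICON`, the `annotate` function, and the field names `start`, `end`, `text`, and `type` are assumptions made here, not a format defined by the report or by any particular product.

```python
import json
import re

# Hypothetical mini-lexicon of named entities; real systems use far larger resources.
LEXICON = {"Seth Grimes": "PERSON", "Alta Plana": "ORGANIZATION"}


def annotate(text: str) -> list:
    """Return stand-off annotations: character offsets plus an entity type."""
    annotations = []
    for name, label in LEXICON.items():
        for match in re.finditer(re.escape(name), text):
            annotations.append(
                {"start": match.start(), "end": match.end(), "text": match.group(), "type": label}
            )
    return sorted(annotations, key=lambda a: a["start"])


doc = "Seth Grimes wrote the report for Alta Plana."

# The source text is left untouched; the annotations point into it by offset.
print(json.dumps({"text": doc, "annotations": annotate(doc)}, indent=2))
```

Because the annotations are plain records, the same list could as easily be loaded into relational tables or serialized as XML or RDF, which is what makes the joint analysis with transactional data described above possible.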
  8. 8. Text/Content Analytics 2011: User PerspectivesApplications and Markets Business users naturally focus on business benefits, whether of analytics or of any other technology or investment. Who are those users? Text and content analytics solutions have a place a) in any business domain, b) for any business function, and c) within any technology stack, that would benefit from automated text/content handling, that is, wherever text/content volume, velocity, and variety, and business urgency, are sufficient to justify costs. Consider a very telling quotation, however: Philip Russom of the Data Warehousing Institute wrote in a 2007 report, “BI Search and Text Analytics: New Additions to the BI Technology Stack,”2 “Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.” In the analytics world, we see now that it is not enough to know more. You need to understand how to use knowledge gained, the processes and outcomes necessary to turn insights into ROI. Text and content analytics elements – information sources, insights sought, processes, and ROI measures – will vary by industry and application. In this report section, by way of lead-in to survey findings – applications, information sources, and ROI measures are the subject of survey questions 3, 4, and 5 – we look at text/content analytics adaptation for applications in several industries and for a variety of business functions. Application modes Applications are diverse but may be classified in several (overlapping) groups. Our categorization is an update of 2009’s with social and online addition in particular:  Media, knowledgebase, and publishing systems – the author includes search engines here – use text and content analytics to generate metadata and enrich and index metadata and content in order to support content distribution and retrieval. Semantic Web applications would fit in this category, as would emerging information-access engines. 
• Content management systems – and, again, related search tools – use text analytics to enhance the findability of content for business processes that include compliance, e-discovery, and claims processing.
• Line-of-business and supporting systems for functions such as compliance and risk, customer experience management (CEM), customer support and service, marketing and market research, human resources and recruiting… and newer tasks that include social monitoring, measurement, and engagement.
• Investigative and research systems for functions such as fraud, intelligence and law enforcement, competitive intelligence, and science.

[2] http://www.teradata.com/assets/0/206/308/96d9065a-0240-44f1-b93c-17e08ae6eacc.pdf

Where are these applications used?

Business Domains

Consider a sampling of industry domains where text and content analytics are frequently applied:

• In intelligence and counter-terrorism, and in law enforcement, there is broad content variety – languages, format (text, audio, images, and video), sources (news, field reports, communications intercepts, government records, social
postings) – and, at times, great urgency.
• In life sciences, for instance for pharmaceutical drug discovery, source materials have been more uniform (scientific literature, clinical reports) and there is no need for real-time response, yet information volumes are huge and complex, and the potential payoff – years and millions of dollars shaved off lead-generation and clinical trials processes – justifies very significant investments in text mining.
• For financial services and insurance, effective credit, risk, fraud, and legal and regulatory-compliance decision-making involves creation of predictive models via analysis of large volumes of transactional records and often incorporates information mined from text sources such as financial and news reports, e-mail and corporate correspondence, and insurance and warranty claims. Automated methods are essential.
• Market researchers rely on text analytics to hear and understand market voices. Focus groups are (on their way) out: They are costly, slow, and often unreliable. Surveys still have great value – beyond soliciting opinions, they can serve as an engagement tool – but neither they nor focus groups help researchers hear unprompted views, the attitudes that consumers express to their peers but not in more formal research settings. Why text analytics? Social is hot, yet human analysis, whether of surveys or of social postings, can be inconsistent and doesn’t scale. Add in text analytics and you have next-generation market research.
• As content delivery and consumption shift to digital, search and information-dissemination tools that exploit metadata (publisher-produced, analytically generated, or socially tagged) are essential survival tools for media and publishing organizations. Content analytics creates better targeted, richer content and a much friendlier and more powerful experience for content consumers.
• Online and social have fomented an advertising revolution. Targeting is the word, whether based on behaviors (modeled via tracking and clickstream analysis) or on analytically computed matching. Matches may draw from user profiles, context (geography, accessing application, device or machine being used), inferred intent (for instance from search terms), and the semantic signatures of the content where ads are to be delivered.
• Text analytics provides essential capabilities in support of legal-domain e-discovery mandates. Organizations must “produce” materials relevant to lawsuits, a task that would often be impossible without automated text processing, given huge volumes of electronically stored information generated in the course of business. Intellectual property is another legal-domain application. The task is to identify names, terminology, properties, and functions salient to an IP search that seeks to identify, for instance, prior art and possible patent infringement.

Business Functions

Many business tasks are independent of industry. Every organization of any significant size has in-house customer support, marketing, product development, and similar functions (even while definitions of customer, marketing, and product do still, of course, vary by industry). Let’s examine the role text and content analytics play for the following:

• Customer experience management (CEM) is a signal text/content analytics success story. The aim is to transform customer relationship management (CRM), which captures transactions and interactions, into a set of tools and practices that cover the engagement span from customer acquisition to customer service and
support, first and foremost by listening and responding to the voice of the customer across channels. In plain(er) English, CEM marries text- and speech-sourced information – from e-mail, online forums, surveys, contact-center conversations, and other touchpoints… and also from employee input – with transactional and profile information. The hope is to improve customer satisfaction and operate more efficiently and profitably. Simplistic, reductive indicators such as the Net Promoter Score can only point at issues and challenges. They can neither explain them nor suggest actions or remedies – insights that are accessible (at enterprise scale) only via text and content analytics.
• Marketers translate market-research and competitive-intelligence findings into marketing campaigns and advertising and, in cooperation with product developers, into higher quality, more satisfying products and services. It’s all about listening. Steve Rappaport, in his book Listen First!, says we should “Change the research paradigm. Social media listening research should bring about an era of real-time data that anticipates change and can be used to visualize and create a rewarding business future,” as well as “rethink marketing, advertising, and media.” His prescriptions about listening apply across channels and touchpoints, as they do for CEM, with the difference here, for research-related functions, being that we are looking at an aggregate rather than an individualized picture, seeking to hear the voice of the market, again aided by text and content analytics. Our aim is to deliver targeted, compelling advertising via more effective marketing and, of course, superior products and services that better meet customer needs.
• Competitive intelligence, in particular, involves mining customer voices, at both individual and aggregate levels, and also business information, for instance about sales, personnel, alliances, and market conditions that indicate opportunities and threats. The ability to extract domain- and sector-focused information from online and social sources, and to integrate information from disparate sources in order to derive coherent signals, is essential and is delivered by analytically rooted technologies.
• Business intelligence (BI) was first defined, in the late 1950s, in terms of extraction and reuse of knowledge drawn from textual sources.[3] BI took off in a different direction, however, starting in the late 1960s, centering on analysis of numerical data captured in computerized corporate operational and transactional systems. Back to the (1950s) Future: Number crunchers of all stripes recognize the business value of information in text sources. They are seeking, with the help of both major and niche BI and data warehousing vendors, to bring text-sourced information into enterprise BI initiatives. Call this integrated analytics, also incorporating geospatial and machine-generated Big Data to bring businesses a step closer to the sought-after (although mythical) 360° view of the customer (and the market and one’s own business).

[3] Seth Grimes, “BI at 50 Turns Back to the Future,” InformationWeek, November 21, 2008: http://www.informationweek.com/news/software/bi/211900005

Technology Domains

Last, for context, let’s briefly consider technology domains where text and content analytics come into play, semantics and the Semantic Web, and then look at emerging text analytics applications.

Semantic Computing

First, redefining, text/content analytics involves the acquisition, processing, analysis, and
presentation of enterprise, online, and social information derived from text and rich-media sources. The technology is one route to semantics, to generating machine-usable identification of information objects attached to databases, tables, fields, and rows; to corpora, documents, and document content; and to media files, e-mail and text messages.

Text/content analytics provides a descriptive route to semantics, making sense of information in-the-wild, as generated by humans (and machines) online, on social platforms, and in everyday business and personal communications, whether written, spoken, or captured in rich media. The alternative route to semantics is prescriptive, generated or captured in the course of content generation, whether via database export or a plug-in to an authoring application.

The semantics market includes technologies for the creation, management, and use of artifacts such as taxonomies, ontologies, thesauruses, gazetteers, semantic networks, controlled vocabularies, and metadata. These artifacts may be generated manually by subject-matter experts. They may be generated automatically by text analytics. And in many situations, a hybrid system involving manual curation of automatically generated artifacts may be in order.

Semantics applications include digital content management, publishing, research, and librarianship across a broad set of industrial and government applications. The semantics market includes semantic search, whether open-domain, vertical (applied to a particular information domain), or horizontal (applied in a particular business function). It also includes classification and information integration.

Classification, Search, and Integration

Semantic computing finds its primary application in classification, search, and integration.
Classification determines what a data item or object represents, including how it may be used, in relation to other data items and objects in a data space. This is, admittedly, an abstract and not particularly practical definition. Information integration and search are where semantics finds its most compelling applications.

Semantic search is, in essence, “search made smarter, search that seeks to boost accuracy by taming ambiguity via an understanding of context.”[4] Several approaches fit under the semantic search umbrella. They include related searches, search-results enrichment, concept searches, faceted search, and more. The common thread is better matching searcher intent (inferred from search context including past searches and the searcher’s profile) to searched-for information content.

Semantic search is behind many emerging search-based applications, fueled by text and content analytics, for applications such as e-discovery, faceted navigation for online commerce, and search-driven business intelligence. And it is captured semantics, in the form of data identifiers and descriptions, that enables dynamic, adaptive information integration, where join paths are discovered based on business and application needs, not hard-wired as in until-recent computing generations.

[4] Seth Grimes, “Breakthrough Analysis: Two + Nine Types of Semantic Search,” InformationWeek, January 21, 2010: http://www.informationweek.com/news/software/bi/222400100

The Semantic Web

The Semantic Web is, at its root, an information-integration and sharing application, a set of standards and protocols designed to facilitate creation and use of a “Web of data.” Eventually, the Semantic Web market will include tools and services that execute knowledge-reliant business transactions over distributed, semantically infused data spaces.

We are years from that market. The bulk of Semantic Web-focused expenditures are for government-funded research
projects at universities and similar institutions. Outside research contexts, business implementations do not extend significantly beyond a) the use of microformats and RDFa (Resource Description Framework–attributes) to allow Web-published structured data to be indexed by search engines to facilitate information access and b) the use of RDF triples as a convenient format for structuring facts for storage in DBMSes supporting graph-database schemas to facilitate integration and query of data from disparate sources.

At a certain point however, perhaps in 2-4 years, the Semantic Web will reach a tipping point where its business value, and the revenues generated by technology and solutions sales, licensing, and support, will explode.

Value Today

At this time, text/content analytics delivers business value that is greater by far than the value delivered by related semantic and Semantic Web technologies. This is because the vast majority of subject information – text, images, audio, and video (a.k.a. content) – is in “unstructured” form, just a string of bytes (and terms, in the case of text) so far as software systems – Web browsers and office productivity tools, content management systems, search engines – are concerned.

To make content tractable for business ends, for operational or analytical purposes or in order to monetize content as a product, one must first create structure. To maximize content usability, for most social and for many enterprise sources, generated structure will take into account semantic information extracted from source materials. That is, structure shouldn’t be arbitrary, a matter of sticking information into a set of round pigeonholes for square-peg content. This process of the discovery, extraction, and use of semantic information in content is the domain of text/content analytics solutions.
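The Semantic Web discussion in this section mentions RDF triples as a convenient format for structuring facts so that data from disparate sources can be integrated and queried. The following is a minimal sketch of that idea using plain Python tuples, not an RDF library or a graph DBMS. All identifiers (`customer:42`, `post:9001`, the predicate names, and the `match` helper) are hypothetical and chosen only for illustration.

```python
# Facts from two hypothetical sources, each expressed as
# (subject, predicate, object) triples. All names are illustrative.
crm_triples = [
    ("customer:42", "hasName", "Acme Corp"),
    ("customer:42", "locatedIn", "Maryland"),
]
social_triples = [
    ("post:9001", "mentions", "customer:42"),
    ("post:9001", "hasSentiment", "negative"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every non-None pattern element."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Integration is simply the union of both triple sets: because both sources
# share the identifier "customer:42", no schema mapping is needed here.
store = crm_triples + social_triples

# Which posts mention customer 42, and with what sentiment?
posts = [s for (s, _, _) in match(store, predicate="mentions", obj="customer:42")]
sentiments = [
    o
    for post in posts
    for (_, _, o) in match(store, subject=post, predicate="hasSentiment")
]
names = [o for (_, _, o) in match(store, subject="customer:42", predicate="hasName")]

print(names, posts, sentiments)
```

The point of the sketch is the join path: the link between the CRM record and the social posting is found by following a shared identifier at query time, which is the kind of dynamic integration the section describes. A production system would use real RDF tooling and URIs in place of these toy strings.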
Solution Providers

The aggregate characteristics of the text and content analytics solution-provider spectrum are little changed since 2009, although there has been significant turnover in players. We still have, as reported in 2009, “a significant cadre of young pure-play software vendors, software giants that have built or acquired text technologies, robust open-source projects, and a constant stream of start-ups, many of which focus on market niches or specialized capabilities such as sentiment analysis.”

The big change is in delivery mode. The market now favors as-a-service analytics, whether in the form of online applications, cloud-provisioned, or provided via Web application programming interfaces (APIs). This shift makes sense.

• The most in-demand new information sources are online, social, and on-cloud.
• Use of as-a-service, cloud, and via-API applications means low up-front investment, faster time to use, and pay-as-you-go pricing without IT involvement.
• Certain providers offer as-a-service access to both historical and current data at attractive costs given the buy-once, sell-many-times economies they enjoy.
• Modern applications are designed to draw data via APIs, facilitating application inclusion of plug-in text and content analytics capabilities.

There is every expectation that the solution-provider market will continue to evolve to keep pace with user needs and broad-market business and technical trends.
Demand-Side Perspectives

Alta Plana designed a 2011 survey, “Text/Content Analytics demand-side perspectives: users, prospects, and the market,” to collect raw material for an exploration of key text-analytics market-shaping questions:

• What do customers, prospects, and users think of the technology, solutions, and vendors?
• What works, and what needs work?
• How can solution providers better serve the market?
• Will your companies expand their use of text analytics in the coming year? Will spending on text/content analytics grow, decrease, or remain the same?

It is clear that current and prospective text/content-analytics users wish to learn how others are using the technology, and solution providers of course need demand-side data to improve their products, services, and market positioning, to boost sales and better satisfy customers.

The Alta Plana study therefore has two goals:

• To raise market awareness and educate current and prospective users.
• To collect information of value to solution providers, both study sponsors and non-sponsors.

Survey findings, as presented and analyzed in this study report, provide a form of measure of the state of the market, a form of benchmark. They are designed to be of use to everyone who is interested in the commercial text/content-analytics market.

Study Context

The author previously explored market questions in a number of papers and articles. These included white papers created for the Text Analytics Summit in 2005, “The Developing Text Mining Market,”[5] and 2007, “What’s Next for Text.”[6]

A systematic look at the demand side provides a good complement to provider-side views and to vendor- and analyst-published case studies, including the author’s own.
This understanding motivated the 2009 study, “Text Analytics 2009: User Perspectives on Solutions and Providers,” available for free download.[7]

That research was preceded by Alta Plana’s 2008 study report, “Voice of the Customer: Text Analytics for the Responsive Enterprise,”[8] published by BeyeNETWORK.com, a first systematic survey of demand-side perspectives, albeit focused on a particular set of business problems. VoC analysis is frequently applied to enhance customer support and satisfaction initiatives, in support of marketing, product and service quality, brand and reputation management, and other enterprise feedback initiatives.

About the Survey

There were 224 responses to the 2011 survey, which ran from June 6 to July 9, 2011. (Contrast with 116 responses to the 2009 survey, which ran from April 13 to May 10, 2009.)

[5] http://altaplana.com/TheDevelopingTextMiningMarket.pdf
[6] http://altaplana.com/WhatsNextForText.pdf
[7] http://altaplana.com/TA2009
[8] http://altaplana.com/BIN-VOCTextAnalyticsReport.pdf
Survey invitations

The author solicited responses via

• E-mail to the TextAnalytics, SentimentAI, Corpora, Lotico, BioNLP, Information-Knowledge-Content-Management, and ContentStrategy lists and the author’s personal list.
• Invitations published in electronic newsletters: InformationWeek, BeyeNETWORK, CMSWire, KDnuggets, AnalyticBridge, and Text Analytics Summit.
• Notices posted to LinkedIn forums and Facebook groups and on Twitter.
• Messages sent by sponsors to their communities.

Survey introduction

The survey started with a definition and brief description as follows:

Text Analytics / Content Analytics is the use of computer software or services to automate
• annotation and information extraction from text – entities, concepts, topics, facts, and attitudes,
• analysis of annotated/extracted information,
• document processing – retrieval, categorization, and classification, and
• derivation of business insight from textual sources.

This is a survey of demand-side perceptions of text technologies, solutions, and providers. Please respond only if you are a user, prospect, integrator, or consultant. There are 21 questions. The survey should take you 5-10 minutes to complete.

For this survey, text mining, text data mining, content analytics, and text analytics are all synonymous.

I’ll be preparing a free report with my findings. Thanks for participating!

Seth Grimes (grimes@altaplana.com, +1 301-270-0795)

The introduction ended with the text:

Privacy statement: This survey records your IP address, which we will use only in an effort to detect bogus responses. It is your choice whether to provide your name, company, and contact information. That information will not be shared with sponsors without your permission, and if shared with sponsors, it will not be linked to your survey responses.
Survey response

There is little question that the survey results overweight current text-analytics users – 73% of respondents who answered Q1, “How long have you been using Text Analytics?” (n=224) versus 78% of respondents who replied to Q7, “Are you currently using text/content analytics?” (n=206) – among the broad set of potential business, government, and academic users. (The difference in percentage is likely due to a higher rate of survey abandonment among non-users. The figures contrast with 63% and 61% in the 2009 survey.) So call this a Pac Man question, one whose response indicates very significant survey selection bias:

Are you currently using text/content analytics? (n=206)
• Yes: 78.2%
• No: 21.8%

Market Size and the Larger BI Market

We can infer overweighting by comparing market-size figures.

The author estimates an $835 million 2010 global market for text/content-analytics software and vendor-supplied support and services. As the author described in the May 12, 2011 InformationWeek article “Text-Analytics Demand Approaches $1 Billion,”[9] “My $835 million market-size estimate covers software licenses, service subscriptions, and vendor-provided technical support and professional services. Despite strong growth, it remains a small fraction of Gartner’s $10.5 billion 2010 valuation of the broader BI, analytics, and performance-management software market.”[10]

By contrast, the 2009 text-analytics market report cited the author’s figure of $350 million for the global, 2008 text analytics market. (That figure did not account for search-based applications, which were included in the 2010 market-size estimate.)
The 2009 report also cited a 2008 BI-market estimate from research firm IDC: “The business intelligence tools software market grew 6.4% in 2008 to reach $7.5 billion.”[11]

[9] http://www.informationweek.com/news/software/bi/229500096
[10] http://www.gartner.com/it/page.jsp?id=1642714
[11] http://www.idc.com/getdoc.jsp?containerId=217443
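The market-size comparison above rests on simple ratios. As a rough check, the sketch below computes the text-analytics figures as a share of the quoted BI-market figures. The dollar amounts come from the report; the variable names and the `share_percent` helper are illustrative. The two years are not directly comparable, since the 2008 text-analytics figure excludes search-based applications and the two BI figures come from different firms with different market definitions.

```python
# Figures quoted in the report, in millions of US dollars.
TEXT_ANALYTICS_2010 = 835     # author's 2010 text/content-analytics estimate
BI_MARKET_2010 = 10_500       # Gartner: BI, analytics, performance management
TEXT_ANALYTICS_2008 = 350     # author's 2008 estimate (no search-based apps)
BI_TOOLS_2008 = 7_500         # IDC: business intelligence tools software


def share_percent(part, whole):
    """Express part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


share_2010 = share_percent(TEXT_ANALYTICS_2010, BI_MARKET_2010)
share_2008 = share_percent(TEXT_ANALYTICS_2008, BI_TOOLS_2008)

print(f"2010 share of broader BI market: {share_2010}%")
print(f"2008 share of BI tools market:   {share_2008}%")
```

Under either pairing the text-analytics market is a single-digit percentage of the BI market, which is the basis for treating a survey in which roughly three quarters of respondents are current users as overweighted.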
The Data Mining Community

Another contrasting data point is that 65% of respondents to a July 2011 KDnuggets poll[12] (n=121) report using text analytics on projects in the preceding year. Results were tallied nine days into the poll, before it was closed, so final numbers may differ from those reported here. The figure in a similar, March 2009 poll was 55% currently using text analytics/text mining.

KDnuggets: How much did you use text analytics / text mining in the past 12 months?
• Used on over 50% of my projects: 21.5%
• Used on 26-50% of my projects: 9.9%
• Used on 10-25% of projects: 14.9%
• Used on < 10% of my projects: 19.0%
• Did not use: 34.7%

KDnuggets reaches data miners, a technically sophisticated audience who are among the most likely of any market segment to have embraced text analytics. The rate of text-analytics adoption by data miners surely exceeds the rate of adoption by any other user sector.

As an aside, 49% of KDnuggets respondents stated that in comparison to the last 12 months, in the next 12 they would use text analytics more, whether on additional projects or more intensively on a steady project workload. 43% stated their use would remain about the same and only 8% anticipated less use.

[12] http://www.kdnuggets.com/2011/07/poll-text-analytics-use.html
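The 65% usage figure cited for the KDnuggets poll can be recovered from the poll breakdown by summing every category other than “Did not use.” The short sketch below does that. The percentages are the ones reported above, and the dictionary layout and variable names are illustrative only.

```python
# KDnuggets poll breakdown as reported (percent of n=121 respondents).
poll = {
    "Used on over 50% of my projects": 21.5,
    "Used on 26-50% of my projects": 9.9,
    "Used on 10-25% of projects": 14.9,
    "Used on < 10% of my projects": 19.0,
    "Did not use": 34.7,
}

# Share of respondents who used text analytics at all in the past 12 months.
used_share = round(
    sum(pct for label, pct in poll.items() if label != "Did not use"), 1
)
total = round(sum(poll.values()), 1)

print(f"Used text analytics: {used_share}% (all categories sum to {total}%)")
```

The four usage categories sum to 65.3%, which rounds to the 65% quoted in the text, and the five categories together account for all respondents.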
Demand-Side Study 2011: Response

The subsections that follow tabulate and chart survey responses, which are presented without unnecessary elaboration.

Q1: Length of Experience

As in 2009, the 2011 survey opened with a basic question – How long have you been using Text/Content Analytics?

Response: 2009 (n=107) / 2011 (n=224)
• not using, no definite plans to use: 16% / 6%
• currently evaluating: 22% / 21%
• less than 6 months: 8% / 3%
• 6 months to less than one year: 5% / 5%
• one year to less than two years: 7% / 12%
• two years to less than four years: 18% / 20%
• four years or more: 25% / 33%

We see that 2011 responses skew to longer experience than measured in 2009. Survey results were not based on a scientifically designed or measured population sample, however, neither in 2011 nor in 2009. Given how out of proportion survey-measured experience is to that of the broad business population – the addressable market for text/content analytics likely extends far beyond the current user base – the most plausible conclusion one can draw from Q1 responses is that 2011 survey outreach failed to bring in the proportion of new and prospective users reached in 2009.

Nonetheless, Q1 responses will prove illuminating in analyses of subsequent survey questions, in studying how attitudes vary by length of text/content analytics experience.
Q2: Application Areas

What are your primary applications where text comes into play?

Application area: 2011 (n=219) / 2009 (n=103)
• Brand/product/reputation management: 39% / 40%
• Voice of the Customer / Customer Experience Management: 39% / 33%
• Search, information access, or Question Answering: 39% / 36%
• Competitive intelligence: 33% / 37%
• Customer service/CRM: 26% / 22%
• Product/service design, quality assurance, or warranty claims: 15% / 14%
• Life sciences or clinical medicine: 15% / 18%
• E-discovery: 15% / 15%
• Financial services/capital markets: 10% / 15%
• Other: 9% / 13%
• Insurance, risk management, or fraud: 8% / 17%
• Content management or publishing: 8% / 19%
• Law enforcement: 6% / 7%

A single value appears in the chart for each of the following: Research (not listed), 33%; Online commerce including shopping, price intelligence, reviews, 11%; Military/national security/intelligence, 7%.

The 219 respondents in 2011 chose a total of 748 primary applications, an average of 3.4 primary applications per respondent. While there is some category overlap, it is notable that respondents are applying text analytics toward multiple business needs.
Q3: Information Sources

What textual information are you analyzing or do you plan to analyze?

Information source: 2011 (n=215) / 2009 (n=100)
• blogs and other social media: 62% / 47%
• news articles: 41% / 44%
• on-line forums: 35% / 35%
• customer/market surveys: 35% / 34%
• review sites or forums: 30% / 21%
• e-mail and correspondence: 29% / 36%
• scientific or technical literature: 27% / 27%
• contact-center notes or transcripts: 23% / 25%
• Web-site feedback: 22% / 21%
• text messages/SMS/chat: 21% / 8%
• employee surveys: 15% / 16%
• field/intelligence reports: 14% / 14%
• medical records: 10% / 16%
• point-of-service notes or transcripts: 9% / 12%
• patent/IP filings: 9% / 11%
• insurance claims or underwriting notes: 7% / 15%
• warranty claims/documentation: 5% / 7%

A single value appears in the chart for each of the following: speech or other audio, 12%; crime, legal, or judicial reports or evidentiary materials, 13%; photographs or other graphical images, 8%; video or animated images, 6%.
The 215 respondents in 2011 chose a total of 962 textual-information sources, an average of 4.5 sources per respondent. The big news is not news at all: Social sources are by far the most popular and 4 of the top 5 categories are social/online (as opposed to in-enterprise) sources. Despite social’s status, however, it is a source for barely more than 6 out of 10 respondents.
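The per-respondent averages reported for Q2 and Q3 are total selections divided by the number of respondents. The sketch below reproduces them from the counts given in the text. The function and variable names are illustrative.

```python
def selections_per_respondent(total_selections, respondents):
    """Average number of options chosen per respondent, to one decimal place."""
    return round(total_selections / respondents, 1)


# Counts reported in the text for the 2011 survey.
q2_average = selections_per_respondent(748, 219)  # primary applications
q3_average = selections_per_respondent(962, 215)  # textual-information sources

print(f"Q2: {q2_average} applications per respondent")
print(f"Q3: {q3_average} sources per respondent")
```

Because both questions allowed multiple selections, the category percentages in each chart are expected to sum to well over 100%.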
Q4: Return on Investment

Question 4 asked, “How do you measure ROI, Return on Investment? Have you achieved positive ROI yet?” There were 164 respondents. Results are charted from highest to lowest values of the sum of “currently measure” and “plan to measure”:

How do you measure ROI, Return on Investment?

ROI measure: Measure, achieved / Measure, not achieved / Plan to measure
• higher satisfaction ratings: 19% / 18% / 28%
• increased sales to existing customers: 13% / 18% / 29%
• ability to create new information products: 11% / 13% / 27%
• improved new-customer acquisition: 9% / 15% / 25%
• higher customer retention/lower churn: 10% / 12% / 23%
• higher search ranking, Web traffic, or ad response: 10% / 12% / 22%
• reduction in required staff/higher staff productivity: 9% / 9% / 23%
• fewer issues reported and/or service complaints: 9% / 6% / 23%
• lower average cost of sales, new & existing customers: 5% / 7% / 23%
• faster processing of claims/requests/casework: 10% / 6% / 19%
• more accurate processing of claims/requests/casework: 6% / 7% / 20%

Out of 164 respondents, 37.8% (62) report that they have achieved positive ROI according to some measure. Those 62 respondents reported achieving ROI according to a total of 182 measures, that is, 2.94 ROI-achieved measures for each respondent who achieved positive ROI.

Out of 164 respondents, 50 are measuring ROI but have not yet achieved positive ROI according to any measure. The 112 respondents who are measuring ROI (whether achieved or not) track a total of 385 measures among them, 3.44 measures per respondent.

The following are several of the Other responses given:

• Better customer insight, market intelligence, and competitive intelligence.
• Content findability.
• Creation of scientific knowledge.
• Higher employee engagement and better L&D outcomes.
• Improvement in existing processes, turnover time.
• Incremental sales lift.
• Lowered cost of fraud, more accurate predictive analytics.
• Number of action executives can take, estimated dollar savings from risk correction/avoidance.
• Patient outcomes.
• Providing better data to scholars.
• Reduction of Claim Cost.
• Stronger understanding of subconscious emotional zones.
• We don’t know how to measure it properly.

Q5: Mindshare

A word cloud, generated at Wordle.net, seemed a good way to present responses to the query, “Please enter the names of companies that you know provide text/content analytics functionality, separated by commas. List up to the first 8 that come to mind.” There were 129 responses, many offering several companies. A bit of data cleansing was done, to regularize names and remove inappropriate responses.

[Word cloud: 2011 responses]

Contrast with the 2009 word cloud (deliberately rendered smaller than the 2011 cloud, without an attempt to create sizing consistent between the two clouds) based on 48 response records, as follows:

[Word cloud: 2009 responses]

Note that IBM acquired SPSS in mid-2009.
Q6: Spending

Question 6 asked about 2010 spending and 2011 expected spending.

How much did your organization spend in 2010, and how much do you expect to spend in 2011, on text/content analytics software/service solutions?

Spending level: 2010 spent (n=176) / 2011 expected (n=165)
• $1 million or above: 6% / 7%
• $500,000 to under $1 million: 2% / 3%
• $200,000 to $499,999: 4% / 6%
• $100,000 to $199,999: 7% / 7%
• $50,000 to $99,000: 9% / 7%
• under $50,000: 23% / 30%
• use open source: 15% / 19%

Questions asked of only current text/content-analytics users

Questions 8 through 13 were posed exclusively to current text/content analytics users, to the 78.2% of the 206 respondents to Q7 who answered “Yes” to “Are you currently using text/content analytics?”

Q8: Satisfaction

Question 8 asked, “Please rate your overall experience – your satisfaction – with text analytics.” It offered five categories, listed here with response counts:

• Overall experience/satisfaction (n=117, of whom 3 No experience/No opinion).
• Ability to solve business problems (n=114, 12 NE/NO).
• Solution/technology ease of use (n=112, 5 NE/NO).
• Solution/technology performance (n=114, 4 NE/NO).
• Availability of professional services/support (n=112, 13 NE/NO).

Responses, which across categories are somewhat anomalous, are as shown:

[Stacked bar chart: “Please rate your overall experience – your satisfaction – with text/content analytics.” Ratings of Completely satisfied, Satisfied, Neutral, Disappointed, and Very disappointed for each of the five categories.]

Overall, 70% of current-user respondents who had an opinion reported themselves Satisfied/Completely Satisfied even while the breakout-category counts totaled 59%, 36%, 47%, and 42% Satisfied/Completely Satisfied. We can surmise that the numbers who voiced “No experience/No opinion” for the breakout categories tended to have a favorable overall experience.
[Figure: chart, “Experience/satisfaction sentiment polarity,” comparing Positive, Neutral, and Negative shares (axis 0% to 80%) across Overall experience/satisfaction, Ability to solve business problems, Solution/technology ease of use, Solution/technology performance, and Availability of professional services/support.]

Q9: Overall Experience

Question 9 asked, “Please describe your overall experience – your satisfaction – with text analytics.” The following are 49 from among the 63 responses, categorized, lightly edited for spelling and grammar, and with the names of three products masked:

Happy

It works. Excellent. Absolutely essential. Very satisfied, most goals exceeded, big jump in effectiveness and customer satisfaction. Pretty happy given we are in a highly technical, difficult to monitor/track niche. Saving a lot of time for our journalists. We have found having an application with the capabilities to clean and normalize the text and quantitative data, process it to a form to analyze, and run text mining and categorization on an ad hoc or production basis has greatly enhanced my team's capabilities and productivity. We found great value from using a Speech Analytics solution to retain customers and improve the overall customer experience through root-cause analysis. I have been working with text analytics for academic and scientific purposes and I am quite satisfied with results achieved. I work with nurse and social science researchers. They think that a chat with 20 people is research. I tend to analyze hundreds or thousands of free-text comments. I use software to overcome the biases inherent in manual analysis.

It Takes Work

Very powerful tool but requires the organization's ability to take action on the insights. Valuable tool; my clients are content to underutilize it, so what is available more than meets our needs. Since we use open source, the ROI is basically how much time you put into the solution and how many problems it solves. We have been successful so far. Very Satisfied but extremely labor intensive. We provide this as a tool to our clients in our application for publishing press releases. It works fine but could be better but that is up to us to implement it fully. Once you spend man hours to set up the tool, it is extremely consistent on doing what you tell it to do. I know improvements are coming but I'd like more AI from text analytics tools than what is currently offered. Do-It-Yourself is challenging but not impossible. Very cheap to operate. Fairly satisfied – problem is I am sole researcher and data/text clean-up takes too much time given other demands. I've been a user and vendor of text analytics (in fact, in my early <...> days, we helped coin the phrase “text analytics”). Vendors generally overpromise and have difficulty delivering. Both vendors and customers underestimate the amount of resources required to get it right. So, still hard to use for mainstream purposes.

Reservations and complications

Steep learning curve. I am currently satisfied, but I believe we (as analysts) are just beginning to fully unlock the full potential of text analytics. On one hand, I'm amazed and thrilled that this stuff exists at all. But on the other hand, I haven't seen anything that does just what I want it to do. It's opened up opportunities to analyze unstructured data but not at the same level as structured data. Works well at highest level of analysis (e.g. sentiment) but not as well in auto-coding for custom (i.e. project) studies.
Tools are good, but lack transparency, ability to explain how conclusions are reached. There is still a lot of work required to optimize this technology since it can currently provide concepts but does not capture context and it’s a lot of slow painful work to get the software to recognize context in which something is mentioned and accuracy is still not a lot.

Unmet needs

Very promising technology but some difficulties to:
- Implement smoothly text mining component into existing information system.
- Cope with various languages, formats, volumes, etc. of data.
- Measure and demonstrate tangible results in terms of improved information extraction quality.
- Assess ROI (reducing processing time / saving resources for core tasks e.g. analysis).

Powerful but overly difficult, impenetrable - technology vs. solutions. An emerging and enabling technology in our business with broad applicability. Satisfied in our applications with accuracy and precision but hitherto disappointed with export capability to other applications. Still a volatile market for applications beyond VOC/sentiment analysis. Vendors are eager to please but sometimes overstate the capabilities. However, I still have limited experience in solving real business problems with these tools (I am a consultant). I think this field is in its infancy. Lots of issues with data quality. Sentiment analytics often flawed. Hard to scale or automate. The handful of companies and solutions I came across do not seem to marry or integrate structured and unstructured text easily... Algorithms are not quite available as a function or way to improve accuracy. I feel there is so much more work to be done both on the analysis side and also on the business implementation side. While I work heavily in this area, I won't be more satisfied until I see better end-to-end integration and until I see more effective and systematic use of insights. I do everything myself. The lack of good lexical resources and taxonomies is a real problem that drives up the cost (in manpower) of providing a solution. And the complexity of the infrastructure required vs. the apparent simplicity of the problem (in managers' minds) makes it very difficult to adjust expectations.
We use <...> and we have to write our own routines to find the text and content that we are interested in. There are plenty of functions that help us with our goals but obviously there is still much that we need to do to higher recall and accuracy. <...> is the only tool which is both open source and professionally useful. However in spite of 20 years of development, it still has a very poor user interface as well as API interface which hinder productivity and acceptance at a beginner's level.

Skepticism

Jury is still out. It’s still evolving, accuracy of results something to watch for in iterations.
Still learning. Very early days! Promising but still very difficult to see quick results. Everything seems to take ages and it’s been a painful learning curve. Hard to trust the automated results when you've been used to achieving 100% with manual human analysis. Still too new. Field as a whole is underperforming what is possible. Though the concept is very appealing, it is still in its native stages, and a lot more possibilities are left to be explored. IBM Watson is a good step ahead in that direction. Very poor, almost useless.

Looking ahead

On the whole, very satisfied with the range of solutions available and their ease of use. Very much looking forward to watching the technology progress – it's obviously not perfect yet. Unlike structured data, getting value out of text analytics tools requires understanding of text elements – how to utilize occurrence of different parts of speech, how to interpret different types of sentences like requests, commands, opinionated sentences, etc. Domain knowledge and tunable and adaptable systems are a must for success. Non-availability of trained personnel to provide text mining services leads to dissatisfaction of users. Business end users do not like to use the tools themselves because of the complexity. The process or strategy for text mining needs to be established. We're pretty happy with text analytics and see it as a transformational technology. Most of text analytics' problems lie in how it is sold. It is both broad and deep and has a myriad of tools best suited for very different use cases, but customers think "text analytics is text analytics." Really, “text analytics” is a horrible term that needs to be broken up into component parts.

Q10: Providers

Question 10 asked, “Who is your provider? Enter one or more, separated by commas, most important provider first.” There were 77 response records, listing providers (sorted and without counts):

Autonomy, Clarabridge, Colbenson, Content Analyst, Expert System, GATE, IBM, in-house, Lexalytics, LingPipe, Megaputer, MotiveQuest, open source, Open Text, R, Radian6, Rapid-I, Saplo, SAS, Smartlogic, Sysomos, TEMIS, Teradata, TextKernel, Thomson Reuters (including Calais, ClearForest), Verint, Zemanta

Note that the survey asked, “Please respond only if you are a user, prospect, integrator, or consultant.”

Q11: Provider Selection

Question 11 asked, “How did you identify and choose your provider? (If more than one, limit response to your most important provider.)”

Applicability, robust performance, open source. Research. Experience and luck. Very satisfied reference customers with similar applications, most flexible solutions, expertise of consultants, high quality of service, extreme agility, and extremely rapid idea-to-deployment cycles. They contacted us before launch of their first product. Product evaluation in context of business application. Based on business requirements in the framework of a European competitive tender procedure. Advised by a related Web development consultant. We spent about a year evaluating and classifying vendors that in part or whole would fill our needs as expressed in Q9. We decided on using an application with integrated quantitative and qualitative analytic capabilities as the best possibilities. We ended up doing POCs with SAS, SPSS and Megaputer, and ended up choosing the latter. We evaluated multiple providers based on (1) tool flexibility – can we customize? (2) accuracy (3) type of content it can tag (4) sentiment methodology (5) price. Main criteria are cost, multi-language capability, and integration with SAS. Competitive bids. Large existing analytics relationship: Tool was an add-on. Conducted a thorough investigation of leading providers in the space. Quality and reviews. Personal recommendations. Constructed a needs analysis ranking system. Our needs included ease of integration, tools, ability to produce meaningful results at sub-document (short document) level, ease of (or no) training. Networking and academic partnerships.
Proof of Concept – evaluated about a dozen or so vendors – have not selected a TM vendor as yet. Recommended by a trusted source. Based on recommendations <...> and our own search / lab testing, which brought us to <...>. Introduction from my manager. What my client uses. It was an obvious choice since there was no real alternative on the market (i.e. language is limiting the products). We compared a number of providers and decided to go for <...> that have a local presence and are experts on the Swedish language. Free, for research purposes. Trials based on performance. Price/performance tradeoff and applicability to targeted business problem. I work for the company. Use other languages (Perl) as necessary. Tested various services, rated results. I choose the vendor or tools based upon my client application needs. We do not have a primary provider ... we maintain a library of tools and use many of them in the same project. Trying all major ones. Reputation, personal contacts. Established open-source project. Market research, pricing, case studies and product evaluations. It was recommended to us. Working in-house. I worked for one of them and selected the other on their open source commitment. Proof of Concept. Support for Drupal. Cost, applicability to needs.
I don't, that's up to my clients. But my advice to them is to begin with an understanding of the goals, and work backward to identify the provider. Company demo. Recommendation from experts, and tried and tested different ones. Management mandate, client. RFP to replace an existing legacy system. We already used <...> and had everything we needed to do Proof of Concept; waiting for business reason to acquire <...>. Advanced, scalable LSI technology. Work for them. Ability to mine audio, text, and customer surveys.

Q13: Promoter?

Question 13 is new with the 2011 survey; we did not ask it in 2009. It is a basic net-promoter type question, without the “net” part: “How likely are you to recommend your most important provider to others who are looking for a text/content analytics solution?” Of 87 responses, 49% were positive, 23% were neutral, and 28% were negative.

[Figure: chart, “How likely are you to recommend your most important provider?”, with seven response categories: Extremely, Moderately, and Slightly likely to recommend against; Neither likely to recommend nor recommend against; Slightly, Moderately, and Extremely likely to recommend.]

Promoters outweigh detractors by a net of 21 percentage points.
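The net figure is simply the positive share minus the negative share. A minimal sketch of that arithmetic, using only the three percentages stated in the text; the variable names are illustrative. This is a plain difference of shares and should not be read as a standard Net Promoter Score, which is normally computed from a 0 to 10 scale.

```python
# Q13 shares as stated in the text (percent of 87 responses).
positive_pct = 49
neutral_pct = 23
negative_pct = 28

# Sanity check: the three shares should account for all responses.
assert positive_pct + neutral_pct + negative_pct == 100

# Net = positive share minus negative share, in percentage points.
net_points = positive_pct - negative_pct
print(f"Net: {net_points} percentage points")
```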
