Text/Content Analytics 2011: User Perspectives on Solutions and Providers

An Alta Plana market study.


Transcript

  • 1. Text/Content Analytics 2011: User Perspectives on Solutions and Providers
    Seth Grimes
    An Alta Plana research study
    Published September 9, 2011 under the Creative Commons Attribution 3.0 License.
  • 2. Text/Content Analytics 2011: User Perspectives
    Table of Contents
    Executive Summary ... 3
      Market Size and Growth ... 3
      Growth Drivers ... 3
      The 2011 Market ... 4
      The Study ... 4
      Key Study Findings ... 4
      About the Study and the Report ... 5
    Text and Content Analytics Basics ... 6
      From Patterns… ... 6
      … To Structure ... 7
      Beyond Text ... 7
      Metadata ... 7
      A Focus on Applications ... 7
    Applications and Markets ... 8
      Application modes ... 8
      Business Domains ... 8
      Business Functions ... 9
      Technology Domains ... 10
      Solution Providers ... 12
    Demand-Side Perspectives ... 13
      Study Context ... 13
      About the Survey ... 13
      Market Size and the Larger BI Market ... 15
      The Data Mining Community ... 16
    Demand-Side Study 2011: Findings ... 17
      Q1: Length of Experience ... 17
      Q2: Application Areas ... 18
      Q3: Information Sources ... 19
      Q4: Return on Investment ... 21
      Q5: Mindshare ... 22
      Q6: Spending ... 23
      Q8: Satisfaction ... 23
      Q9: Overall Experience ... 25
      Q10: Providers ... 28
      Q11: Provider Selection ... 29
      Q13: Promoter? ... 31
      Q14: Information Types ... 32
      Q15: Important Properties and Capabilities ... 32
      Q16: Languages ... 34
      Q17: BI Software Use ... 35
      Q18: Guidance ... 36
      Q19: Comments ... 39
      Additional Analysis ... 40
      Interpretive Limitations and Judgments ... 42
    About the Study ... 43
    Solution Profile: AlchemyAPI ... 45
    Solution Profile: Attensity ... 47
    Solution Profile: Basis Technology ... 49
    Solution Profile: Language Computer Corp. ... 51
    Solution Profile: Lexalytics ... 53
    Solution Profile: Medallia ... 55
    Solution Profile: SAS ... 57
    Solution Profile: Sybase ... 59
    Solution Profile: Verint Systems Inc. ... 61
  • 3. Text/Content Analytics 2011: User Perspectives
Executive Summary
Text and content analytics have become a source of competitive advantage, enabling businesses, government agencies, and researchers to extract unprecedented value from “unstructured” data. Uptake is strong – software, solutions, and services are delivering significant business value to users in a spectrum of industries – yet much of the market’s potential remains untapped. These points and more are brought out in Alta Plana’s market study, “Text/Content Analytics 2011: User Perspectives on Solutions and Providers.”
Market Size and Growth
Tools and solutions now cover the gamut of business, research, and governmental needs. User adoption continues to grow at a very rapid pace, an estimated 25% in 2010, creating an $835 million market for software tools, business solutions, and vendor-supplied support and services. These tools and solutions generate business value several times that figure, extrapolating from revenue generated by applications and solutions (for instance, social-media analysis, e-discovery, and search), information products created by mining content, professional services, and research. The addressable market for text/content analytics is much larger. The technologies are a subset of a larger business intelligence, analytics, and performance management software market, which is dominated by solutions that analyze numerical data that originates in enterprise operational systems. Gartner estimated that larger market at $10.5 billion globally in 2010. Yet, given now-broad awareness of the business value that resides in “unstructured” social, online, and enterprise sources, text/content analytics’ share of the much larger market will surely grow steeply in coming years. Overall, expect annual text/content-analytics growth averaging up to 25% for the next several years.
Growth Drivers
A number of factors contribute to sustained growth, foremost the growth of social platforms, which have become essential life tools for individuals and an important business marketing, communication, research, and commerce channel.
Social
Keeping up with Social is a must for every consumer-facing organization, and automated monitoring, measurement, and engagement is the only way to deal with Social’s variety, volume, and velocity. Leading solutions rely on natural-language processing, provided by text/content analytics, to identify and extract facts and sentiment. Expect even lower-end tools to embrace NLP by 2013.
Publishing, advertising, and information services
Second, text/content analytics is central to competitive online publishing and advertising and to effective information access (essentially, next-generation search). These are two sides of a single coin. As applied by content producers and publishers, the technologies discover and associate appropriate descriptive and semantic labels with content. The aims are to optimize search findability, to allow content to be stored and retrieved at a fine-grained level (documents as databases), and to enhance the content consumer’s experience interacting with content. As applied by search, content aggregation, online advertising, and information-service providers, the technology fuels situationally appropriate results that respond to the information/service seeker’s context and intent.
  • 4. Text/Content Analytics 2011: User Perspectives
Question-answering and information access
Question-answering systems such as IBM Watson and Wolfram Alpha are examples of next-generation, analytics-enabled information-access engines, which will play a key role in online commerce, customer support, health-service delivery, and other applications starting by early 2013. Similarly, Semantic Web information resources should finally enter the mainstream by 2014. They will very frequently rely on analytics to semanticize and structure content and support on-the-fly information integration.
Rich media
Last, content analytics makes sense of rich media. The technology finds and exploits patterns – what’s in a given piece of content and how the content of content changes over time – in speech and sound, images, and video. There are important content-analytics applications today for contact centers, security, general information access, and even in consumer electronics: Witness face detection and tracking in consumer-grade cameras and camcorders. Arguably, we could include analyses of social and enterprise networks, mined from e-mail, messaging, online, and social content, under the content-analytics umbrella.
The 2011 Market
As in prior years, no single solution provider dominates the market. Players range from the largest enterprise software vendors to a stream of new entrants, both commercializing research technologies and bringing solutions to new markets. In between, established enterprise content management (ECM), BI and analytics, search, software tools, and business-solution providers – the sponsors of this study among them – continue to innovate and deliver business value.
The Study
Alta Plana’s 2011 text/content analytics market study combines a survey-based, quantitative and qualitative examination of usage, perceptions, and plans with observations derived from numerous conversations with solution providers and users. It seeks to answer the question, “What do current and prospective text/content-analytics users really think of the technology, solutions, and solution providers?” Responses will help providers craft products and services that better serve users. Findings will guide users seeking to maximize benefit for their own organizations. Alta Plana received 224 valid survey responses between June 6 and July 9, 2011. This document reports findings and, when appropriate, contrasts them with comparable numbers from Alta Plana’s spring-2009 text-analytics market study.1
Key Study Findings
The following are key 2011 study findings:
 The big news is not news at all: Social is by far the most popular source fueling text/content analytics initiatives. Four of the top 5 information categories are social/online (as opposed to in-enterprise) sources:
   o blogs and other social media (62%)
   o news articles (41%)
   o on-line forums (35%)
   o reviews (30%)
1 “Text Analytics 2009: User Perspectives on Solutions and Providers”: http://altaplana.com/TA2009
  • 5. Text/Content Analytics 2011: User Perspectives
 as well as direct customer feedback in the form of:
   o customer/market surveys (35%)
   o e-mail and correspondence (29%)
 for an average of 4.5 sources per respondent.
 All three top capabilities that users look for in a solution, each garnering over 50% response, relate to getting the most information out of sources:
   o Broad information extraction capabilities (63%)
   o Ability to use specialized dictionaries, taxonomies, ontologies, or extraction rules (57%)
   o Deep sentiment/emotion/opinion extraction (57%)
 Low cost dropped from 51% of 2009 responses to 38% in 2011.
 Top business applications of text/content analytics for respondents are the following:
   o Brand / product / reputation management (39% of respondents)
   o Voice of the Customer / Customer Experience Management (39%)
   o Search, Information Access, or Question Answering (36%)
   o Competitive intelligence (33%)
 Seventy percent of users are Satisfied or Completely Satisfied with text/content analytics and 24% are Neutral, with only 7% Disappointed or Very Disappointed. Dissatisfaction is greatest, at 25%, with ease of use, with only 36% satisfied. Only 42% are satisfied with availability of professional services/support.
 Only 49% of users are likely to recommend their most important provider. 28% would recommend against their most important provider.
About the Study and the Report
Seth Grimes, an industry analyst and consultant who is a recognized authority on the application of text analytics, designed and conducted the study “Text/Content Analytics 2011: User Perspectives on Solutions and Providers” and wrote this report. The author is grateful for the support of the nine study sponsors, Verint, Sybase, SAS, Medallia, Lexalytics, Language Computer Corporation, Basis Technology, Attensity, and AlchemyAPI. Their sponsorships allowed him to conduct an editorially independent study that should promote understanding of the text/content analytics market and of user-indicated implementation and operations best practices. The solution profiles that follow the report’s editorial matter were provided by the sponsors and included with only minor editing to regularize their layout. Otherwise, the author is solely responsible for the editorial content of this report, which was not reviewed by the sponsors prior to publication.
  • 6. Text/Content Analytics 2011: User PerspectivesText and Content Analytics Basics The term text analytics describes software and transformational processes that uncover business value in “unstructured” text via the application of statistical, linguistic, machine learning, and data analysis and visualization techniques. The aim is to improve automated text processing, whether for search, classification, data and opinion extraction, business intelligence, or other purposes. Rough synonyms include text mining, text ETL, and semantic analysis. Terminology choices are typically rooted in history and competitive positioning. Text mining is an extension of data mining and text ETL of the BI world’s extract-transform-load concept. Semantic analysis seems most often used by Semantic Web aficionados, who sometimes use the broader term Semantic Web technologies, which also covers protocols such as RDF, triple stores, query systems, and the like. These text technologies all perform some form of natural language processing (NLP). Content analytics can and should be seen as an extension of capabilities to also cover images, audio and speech, video, and composites, the gamut of information types not generated or held in data fields. (Some organizations use the content analytics label for text analytics on online, social, and enterprise content, typically, published information. These organizations most often have a strong focus on enterprise content management (ECM) systems.) From Patterns… Text, images, speech and other audio, and video are all directly understandable by humans (although not universally: Any given human language – English, Japanese, or Swahili – is spoken by a minority of people, and not everyone recognizes a Beethoven symphony or Nelson Mandela in a photo). Understanding relies on three capabilities: 1) Ability to recognize small- and large-scale patterns. 2) Ability to grasp context and, from context, to infer meaning. 3) Ability to create and apply models. Descriptive statistics provides an NLP starting point: The most frequently used words and terms give an indication of the topics a message or document is about. We can create categories and classify text (a form of modeling) based on notions of statistical similarity. Next steps take advantage of the linguistic structure of text, detectable by machines as patterns. We have word form (“morphology”) and arrangement (grammar and syntax) as well as higher-level narrative and discourse. Usage may be correct (as judged by editors, grammarians, and linguists) or not, whether the language is spoken, formally written, or texted or tweeted: The most robust technologies deal with text in the wild. We apply assets such as lexicons of “named entities”; part-of-speech resolution that can help identify subject, object, relationship, and attributes; and “word nets” that associate words to help in disambiguation, determination of the contextual sense of terms that may have different meanings in different contexts. Yet, in the words of artificial-intelligence pioneer Edward A. Feigenbaum, “Reading from text in general is a hard problem, because it involves all of common sense knowledge. But reading from text in structured domains, I don’t think is as hard.” So some techniques (also) apply knowledge representations such as ontologies to the analysis task. All techniques, however, aim to generate machine-processable structure. 6
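The following is a minimal, illustrative Python sketch of the progression just described, from descriptive statistics toward linguistic structure. It uses the open-source NLTK toolkit, an assumed tooling choice for illustration only and not a solution evaluated in this report; the sample sentence is likewise invented.

```python
# Hypothetical illustration: term frequencies plus part-of-speech tags as a first
# step toward machine-detectable linguistic structure. NLTK is an assumed choice.
import nltk

nltk.download("punkt", quiet=True)                       # tokenizer models
nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger models

text = "The battery life is terrible, but the camera on this phone is excellent."
tokens = nltk.word_tokenize(text)

# Descriptive statistics: the most frequent terms hint at what the text is about.
freq = nltk.FreqDist(t.lower() for t in tokens if t.isalpha())
print(freq.most_common(5))

# Linguistic structure: part-of-speech tags help identify subjects, objects,
# attributes, and candidate opinion words ("terrible", "excellent" as adjectives).
print(nltk.pos_tag(tokens))
```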
  • 7. Text/Content Analytics 2011: User Perspectives … To Structure NLP outputs, as part of a text-analytics system, are typically expressed in the form of document annotations, that is, in-line or external tags that identify and describe features of interest. Outputs may be mapped into machine-manageable data structures whether relational database records or in XML, JSON, RDF, or another format. Text-extracted data represented in the Semantic Web’s Resource Description Framework (RDF) may form part of a Linked Data system. Text-derived information stored in a relational database may become part of a business intelligence system that jointly analyzes, for instance, DBMS-captured customer transactions and free-text responses to customer-satisfaction surveys. And text-extracted features such as entities, topics, dates, and measurement units may form the basis of advanced semantic search systems. Beyond Text Beyond-text technologies for information-extraction from images, audio, video, and composite media exist but do not match NLP’s sense-making capabilities. Likely most developed is speech-analysis technology that supports indexing and search using phonemes and is capable of detecting emotion in speech via analysis of indicators such as pace, volume, and intonation with contact-center and others applications that include intelligence. Intelligence, along with consumer and social search, motivates work on image analysis, as do marketing and competitive-intelligence related studies of online and social brand mentions and use. Video analytics extends both speech and image analysis, with an added temporal aspect, for security applications and also potential business uses such as study of customer in-store behavior. For beyond-text media, as for text, metadata is of critical importance. Metadata Metadata describes data properties that may include the provenance, structure, content, and use of data points, datasets, documents, and document collections. Content-linked metadata typically includes author, production and modification dates, title, topic(s), keywords, format, language, encoding (e.g., character set), rights, and so on. The metadata label extends to specialized annotations such as part-of-speech and data type. Metadata may be created as part of content production or publication (for instance, the save date captured by a word-processor, a geotag associated with a social update, camera information stored in an image file). It may be appended (for instance via social tagging), or extracted from content via text/content analysis. Whether stored internally within a data object (for instance via RDFa, FOAF, or other microformats embedded in a Web page) or managed externally, in a database or search index, metadata is fuel for a range of applications. A Focus on Applications We will not devote further space in this report to discussion of text- and content-analysis technology. If you do want to learn more about text-analytics history and technology, do continue with the technology sections of Alta Plana’s 2009 study report, “Text Analytics 2009: User Perspectives on Solutions and Providers,” available online at http://altaplana.com/TextAnalyticsPerspectives2009.pdf. As a bridge to survey-derived reporting of user perceptions of the text and content analytics market, solutions, and providers, we will look next at applications. 7
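Before turning to applications, here is a small, hypothetical sketch of the “… To Structure” discussion above: NLP output expressed as stand-off annotations and mapped to a machine-manageable record (JSON in this example). The document, entities, metadata, and sentiment score are invented for illustration and are not output from any product covered in this report.

```python
# A sketch of what "generated structure" can look like: extracted features held as
# stand-off annotations alongside content metadata, serialized here as JSON.
import json

document = "Acme Corp's new router drops Wi-Fi constantly. Very disappointed."

record = {
    "doc_id": "survey-00417",   # hypothetical identifier
    "metadata": {"source": "customer/market survey", "language": "en",
                 "received": "2011-06-14"},
    "annotations": [
        {"type": "organization", "text": "Acme Corp", "start": 0, "end": 9},
        {"type": "product",      "text": "router",    "start": 16, "end": 22},
        {"type": "sentiment",    "polarity": "negative", "score": -0.8},
    ],
}

# Such records can be loaded into a relational table, an XML/RDF store, or a
# search index for joint analysis with transactional data.
print(json.dumps(record, indent=2))
```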
  • 8. Text/Content Analytics 2011: User PerspectivesApplications and Markets Business users naturally focus on business benefits, whether of analytics or of any other technology or investment. Who are those users? Text and content analytics solutions have a place a) in any business domain, b) for any business function, and c) within any technology stack, that would benefit from automated text/content handling, that is, wherever text/content volume, velocity, and variety, and business urgency, are sufficient to justify costs. Consider a very telling quotation, however: Philip Russom of the Data Warehousing Institute wrote in a 2007 report, “BI Search and Text Analytics: New Additions to the BI Technology Stack,”2 “Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.” In the analytics world, we see now that it is not enough to know more. You need to understand how to use knowledge gained, the processes and outcomes necessary to turn insights into ROI. Text and content analytics elements – information sources, insights sought, processes, and ROI measures – will vary by industry and application. In this report section, by way of lead-in to survey findings – applications, information sources, and ROI measures are the subject of survey questions 3, 4, and 5 – we look at text/content analytics adaptation for applications in several industries and for a variety of business functions. Application modes Applications are diverse but may be classified in several (overlapping) groups. Our categorization is an update of 2009’s with social and online addition in particular:  Media, knowledgebase, and publishing systems – the author includes search engines here – use text and content analytics to generate metadata and enrich and index metadata and content in order to support content distribution and retrieval. Semantic Web applications would fit in this category, as would emerging information-access engines.  Content management systems – and, again, related search tools – use text analytics to enhance the findability of content for business processes that include compliance, e-discovery, and claims processing.  Line-of-business and supporting systems for functions such as compliance and risk, customer experience management (CEM), customer support and service, marketing and market research, human resources and recruiting… and newer tasks that include social monitoring, measurement, and engagement.  Investigative and research systems for functions such as fraud, intelligence and law enforcement, competitive intelligence, and science. Where are these applications used? Business Domains Consider a sampling of industry domains where text and content analytics are frequently applied:  In intelligence and counter-terrorism, and in law enforcement, there is broad content variety – languages, format (text, audio, images, and video), sources (news, field reports, communications intercepts, government records, social2 http://www.teradata.com/assets/0/206/308/96d9065a-0240-44f1-b93c-17e08ae6eacc.pdf 8
  • 9. Text/Content Analytics 2011: User Perspectives postings) – and, at times, great urgency.  In life sciences, for instance for pharmaceutical drug discovery, source materials have been more uniform (scientific literature, clinical reports) and there is no need for real-time response, yet information volumes are huge and complex and the potential payoff – years and millions of dollars shaved off lead-generation and clinical trials processes – to justify very significant investments in text mining.  For financial services and insurance, effective credit, risk, fraud, and legal and regulatory-compliance decision-making involves creation of predictive models via analysis of large volumes of transactional records and often incorporates information mined from text sources such as financial and news reports, e-mail and corporate correspondence, insurance and warranty claims. Automated methods are essential.  Market researchers rely on text analytics to hear and understand market voices. Focus groups are (on their way) out: They are costly, slow, and often unreliable. Surveys still have great value – beyond soliciting opinions, they can serve as an engagement tool – but neither they nor focus groups help researchers hear unprompted views, the attitudes that consumers express to their peers but not in more formal research settings. Why text analytics? Social is hot, yet human analysis, whether or surveys or of social postings, can be inconsistent and don’t scale. Add in text analytics and you have next-generation market research.  As content delivery and consumption shift to digital, search and information- dissemination tools that exploit metadata (publisher-produced, analytically generated metadata, or socially tagged) are essential survival tools for media and publishing organizations. Content analytics creates better targeted, richer content and a much friendlier and more powerful experience for content consumers.  Online and social have fomented an advertising revolution. Targeting is the word, whether based on behaviors (modeled via tracking and clickstream analysis) or on analytically computed matching. Matches may draw from user profiles, context (geography, accessing application, device or machine being used) and inferred intent (for instance from search terms), and the semantic-signatures of the content where ads are to be delivered.  Text analytics provides essential capabilities in support of legal domain e- discovery mandates. Organizations must “produce” materials relevant to lawsuits, a task that would often be impossible without automated text processing, given huge volumes of electronically stored information generated in the course of business. Intellectual property is another legal-domain application. The task is to identify names, terminology, properties, and functions salient to an IP search that seeks to identify, for instance, prior art and possible patent infringement. Business Functions Many business tasks are independent of industry. Every organization of any significant size has in-house customer support, marketing, product development, and similar functions (even while definitions of customer, marketing, and product do still, of course, vary by industry.) Let’s examine the role text and content analytics play for the following:  Customer experience management (CEM) is a signal text/content analytics success story. 
The aim is to transform customer relationship management (CRM), which captures transactions and interactions, into a set of tools and practices that cover the engagement span from customer acquisition to customer service and 9
  • 10. Text/Content Analytics 2011: User Perspectives support, first and foremost by listening and responding to the voice of the customer across channels. In plain(er) English, CEM marries text- and speech- sourced information – from e-mail, online forums, surveys, contact-center conversations, and other touchpoints… and also from employee input – with transactional and profile information. The hope is to improve customer satisfaction and operate more efficiently and profitably. Simplistic, reductive indicators such as the Net Promoter Score can only point at issues and challenges. They can neither explain them nor suggest actions or remedies – insights that are accessible (at enterprise scale) only via text and content analytics.  Marketers translate market-research and competitive-intelligence findings into marketing campaigns and advertising and, in cooperation with product developers, into higher quality, more satisfying products and services. It’s all about listening. Steve Rappaport, in his book Listen First!, says we should “Change the research paradigm. Social media listening research should bring about an era of real-time data that anticipates change and can be used to visualize and create a rewarding business future,” as well as “rethink marketing, advertising, and media.” His prescriptions about listening apply across channels and touchpoints, as they do for CEM, with the difference here, for research-related functions, being that we are looking at an aggregate rather than an individualized picture, seeking to hear the voice of the market, again aided by text and content analytics. Our aim is to deliver targeted, compelling advertising via more effective marketing and, of course, superior products and services that better meet customer needs.  Competitive intelligence, in particular, involves mining customer voices, at both individual and aggregate levels, and also business information, for instance about sales, personnel, alliances, and market conditions that indicate opportunities and threats. Ability to extract domain- and sector-focused information from online and social sources and to integrate information from disparate sources in order to derive coherent signals is essential, delivered by analytically rooted technologies.  Business intelligence (BI) was first defined, in the late 1950s, in terms of extraction and reuse of knowledge drawn from textual sources.3 BI took off in a different direction, however, starting in the late 1960s, centering on analysis of numerical data captured in computerized corporate operational and transactional systems. Back to the (1950s) Future: Number crunchers of all stripes recognize the business value of information in text sources. They are seeking, with the help of both major and niche BI and data warehousing vendors, to bring text-sources information into enterprise BI initiatives. Call this integrated analytics, also incorporating geospatial and machine-generated Big Data to bring businesses a step closer to the sought-after (although mythical) 360o-view of the customer (and the market and one’s own business). Technology Domains Last, for context, let’s briefly consider technology domains where text and content analytics come into play, semantics and the Semantic Web, and then look at emerging text analytics applications. 
Semantic Computing First, redefining, text/content analytics involves the acquisition, processing, analysis, and3 Seth Grimes, “BI at 50 Turns Back to the Future,” InformationWeek, November 21, 2008: http://www.informationweek.com/news/software/bi/211900005 10
  • 11. Text/Content Analytics 2011: User Perspectives presentation of enterprise, online, and social information derived from text and rich- media sources. The technology is one route to semantics, to generating machine-usable identification of information objects attached to databases, tables, fields, and rows; to corpora, documents, and document content; and to media files, e-mail and text messages. Text/content analytics provides a descriptive route to semantics, making sense of information in-the-wild, as generated by humans (and machines) online, on social platforms, and in everyday business and personal communications, whether written, spoken, or captured in rich media. The alternative route to semantics is prescriptive, generated or captured in the course of content generation, whether via database export or a plug-in to an authoring application. The Semantics market includes technologies for the creation, management, and use of artifacts such as taxonomies, ontologies, thesauruses, gazetteers, semantic networks, controlled vocabularies, and metadata. These artifacts may be generated manually by subject-matter experts. They may be generated automatically by text analytics. And in many situations, a hybrid system involving manual curation of automatically generated artifacts may be in order. Semantics applications include digital content management, publishing, research, and librarianship across a broad set of industrial and government applications. The semantics market includes semantic search, whether open-domain, vertical (applied to a particular information domain), or horizontal (applied in a particular business function). It also includes classification and information integration. Classification, Search, and Integration Semantic computing finds its primary application in classification, search, and integration. Classification determines what a data item or object represents, including how it may be used, in relation to other data items and objects in a data space. This is, admittedly, an abstract and not particularly practical definition. Information integration and search are where semantics finds its most compelling applications. Semantic search is, in essence, “search made smarter, search that seeks to boost accuracy by taming ambiguity via an understanding of context.”4 Several approaches fit under the semantic search umbrella. They include related searches, search-results enrichment, concept searches, faceted search, and more. The common thread is better matching searcher intent (inferred from search context including past searches and the searcher’s profile) to searched-for information content. Semantic search is behind many emerging search-based applications, fueled by text and content analytics, for applications such as e-discovery, faceted navigation for online commerce, and search-driven business intelligence. And it is captured semantics, in the form of data identifiers and descriptions, that enables dynamic, adaptive information integration, where join paths are discovered based on business and application needs, not hard-wired as in until-recent computing generations. The Semantic Web The Semantic Web is, at its root, an information-integration and sharing application, a set of standards and protocols designed to facilitate creation and use of “Web of data.” Eventually, the Semantic Web market will include tools and services that execute knowledge-reliant business transactions over distributed, semantically infused data spaces. We are years from that market. 
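As a concrete, deliberately simplified illustration of the faceted-search idea described above, the toy Python sketch below filters documents by analytics-extracted facets rather than by raw keywords. The documents, facet names, and values are invented; no particular product or search index is implied.

```python
# Toy faceted search over extraction-generated metadata (illustrative only).
docs = [
    {"id": 1, "title": "Router X200 review",   "facets": {"product": "Router X200", "sentiment": "negative"}},
    {"id": 2, "title": "X200 firmware update", "facets": {"product": "Router X200", "sentiment": "neutral"}},
    {"id": 3, "title": "Switch S10 teardown",  "facets": {"product": "Switch S10",  "sentiment": "positive"}},
]

def faceted_search(documents, **wanted):
    """Return documents whose extracted facets match every requested value."""
    return [d for d in documents
            if all(d["facets"].get(k) == v for k, v in wanted.items())]

# Intent such as "complaints about the X200" is matched via metadata, not keywords.
print(faceted_search(docs, product="Router X200", sentiment="negative"))
```

In a real system the facets would come from the extraction pipeline and be served by a search index, but the matching principle is the same.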
The bulk of Semantic Web focused expenditures are for government funded research4 Seth Grimes, “Breakthrough Analysis: Two + Nine Types of Semantic Search,” InformationWeek, January 21, 2010: http://www.informationweek.com/news/software/bi/222400100 11
  • 12. Text/Content Analytics 2011: User Perspectives projects at universities and similar institutions. Outside research contexts, business implementations do not extend significantly beyond a) the use of microformats and RDFa (Resource Description Framework–attributes) to allow Web-published structured data to be indexed by search engines to facilitate information access and b) the use of RDF triples as a convenient format for structuring facts for storage in DBMSes supporting graph- database schemas to facilitate integration and query of data from disparate sources. At a certain point however, perhaps in 2-4 years, the Semantic Web will reach a tipping point where its business value, and the revenues generated by technology and solutions sales, licensing, and support, will explode. Value Today At this time, text/content analytics delivers business value that is greater by far than the value delivered by related semantic and Semantic Web technologies. This is because the vast majority of subject information – text, images, audio, and video (a.k.a. content) – is in “unstructured” form, just a string of bytes (and terms, in the case of text) so far as software systems – Web browsers and office productivity tools, content management systems, search engines – are concerned. To make content tractable for business ends, for operational or analytical purposes or in order to monetize content as a product, one must first create structure. To maximize content usability, for most social and for many enterprise sources, generated structure will take into account semantic information extracted from source materials. That is, structure shouldn’t be arbitrary, a matter of sticking information into a set of round pigeonholes for square-peg content. This process of the discovery, extraction, and use of semantic information in content is the domain of text/content analytics solutions. Solution Providers The aggregate characteristics of the text and content analytics solution-provider spectrum are little changed since 2009 although there has been significant turn-over in players. We still have, as reported in 2009, “a significant cadre of young pure-play software vendors, software giants that have built or acquired text technologies, robust open-source projects, and a constant stream of start-ups, many of which focus on market niches or specialized capabilities such as sentiment analysis.” The big change is in delivery mode. The market now favors as-a-service analytics, whether in the form of online applications, cloud provisioned, or provided via Web application programming interfaces (APIs). This shift makes sense.  The most in-demand new information sources are online, social, and on-cloud.  Use of as-a-service, cloud, and via-API applications means low up-front investment, faster time to use, and pay-as-you-go pricing without IT involvement.  Certain providers offer as-a-service access to both historical and current data at attractive costs given the buy-once, sell-many-times economies they enjoy.  Modern applications are designed to draw data via APIs, facilitating application- inclusion of plug-in text and content analytics capabilities. There is every expectation that the solution-provider market will continue to evolve to keep pace with user needs and broad-market business and technical trends. 12
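Returning to the RDF-triples point made above under The Semantic Web: the sketch below shows how text-extracted facts might be recorded as triples using the open-source rdflib package, an assumed tooling choice used only for illustration. The namespace, resources, and facts are hypothetical.

```python
# Hypothetical sketch: text-extracted facts stored as RDF triples with rdflib.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")   # invented vocabulary for illustration
g = Graph()
g.bind("ex", EX)

review = EX["review/123"]
g.add((review, EX.mentionsProduct, EX["product/router-x200"]))
g.add((review, EX.sentiment, Literal("negative")))
g.add((review, EX.source, Literal("on-line forum")))

# Graph-structured facts from disparate sources can then be merged and queried
# (for instance with SPARQL) rather than integrated via hard-wired join paths.
print(g.serialize(format="turtle"))
```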
  • 13. Text/Content Analytics 2011: User PerspectivesDemand-Side Perspectives Alta Plana designed a 2011 survey, “Text/Content Analytics demand-side perspectives: users, prospects, and the market,” to collect raw material for an exploration of key text- analytics market-shaping questions:  What do customers, prospects, and users think of the technology, solutions, and vendors?  What works, and what needs work?  How can solution providers better serve the market?  Will your companies expand their use of text analytics in the coming year? Will spending on text/content analytics grow, decrease, or remain the same? It is clear that current and prospective text/content-analytics users wish to learn how others are using the technology, and solution providers of course need demand-side data to improve their products, services, and market positioning, to boost sales and better satisfy customers. The Alta Plana study therefore has two goals:  To raise market awareness and educate current and prospective users.  To collect information of value to solution providers, both study sponsors and non-sponsors. Survey findings, as presented and analyzed in this study report, provide a form of measure of the state of the market, a form of benchmark. They are designed to be of use to everyone who is interested in the commercial text/content-analytics market. Study Context The author previously explored market questions in a number of papers and articles. These included white papers created for the Text Analytics Summit in 2005, The Developing Text Mining Market,”5 and 2007, “Whats Next for Text.”6 A systematic look at the demand side provides a good complement to provider-side views and to vendor- and analyst-published case studies, including the author’s own. This understanding motivated the 2009 study, “Text Analytics 2009: User Perspectives on Solutions and Providers,” available for free download.7 That research was preceded by Alta Plana’s 2008 study report, “Voice of the Customer: Text Analytics for the Responsive Enterprise,”8 published by BeyeNETWORK.com, a first systematic survey of demand-side perspectives, albeit focused on a particular set of business problems. VoC analysis is frequently applied to enhance customer support and satisfaction initiatives, in support of marketing, product and service quality, brand and reputation management, and other enterprise feedback initiatives. About the Survey There were 224 responses to the 2011 survey, which ran from June 6 to July 9, 2011. (Contrast with 116 responses to the 2009 survey, which ran from April 13 to May 10, 2009.)5 http://altaplana.com/TheDevelopingTextMiningMarket.pdf6 http://altaplana.com/WhatsNextForText.pdf7 http://altaplana.com/TA20098 http://altaplana.com/BIN-VOCTextAnalyticsReport.pdf 13
  • 14. Text/Content Analytics 2011: User Perspectives
Survey invitations
The author solicited responses via
 E-mail to the TextAnalytics, SentimentAI, Corpora, Lotico, BioNLP, Information-Knowledge-Content-Management, and ContentStrategy lists and the author’s personal list.
 Invitations published in electronic newsletters: InformationWeek, BeyeNETWORK, CMSWire, KDnuggets, AnalyticBridge, and Text Analytics Summit.
 Notices posted to LinkedIn forums and Facebook groups and on Twitter.
 Messages sent by sponsors to their communities.
Survey introduction
The survey started with a definition and brief description as follows:
Text Analytics / Content Analytics is the use of computer software or services to automate
• annotation and information extraction from text – entities, concepts, topics, facts, and attitudes,
• analysis of annotated/extracted information,
• document processing – retrieval, categorization, and classification, and
• derivation of business insight from textual sources.
This is a survey of demand-side perceptions of text technologies, solutions, and providers. Please respond only if you are a user, prospect, integrator, or consultant. There are 21 questions. The survey should take you 5-10 minutes to complete. For this survey, text mining, text data mining, content analytics, and text analytics are all synonymous. I’ll be preparing a free report with my findings. Thanks for participating! Seth Grimes (grimes@altaplana.com, +1 301-270-0795)
The introduction ended with the text:
Privacy statement: This survey records your IP address, which we will use only in an effort to detect bogus responses. It is your choice whether to provide your name, company, and contact information. That information will not be shared with sponsors without your permission, and if shared with sponsors, it will not be linked to your survey responses.
  • 15. Text/Content Analytics 2011: User Perspectives
Survey response
There is little question that the survey results overweight current text-analytics users – 73% of respondents who answered Q1, “How long have you been using Text Analytics?” (n=224) versus 78% of respondents who replied to Q7, “Are you currently using text/content analytics?” (n=206) – among the broad set of potential business, government, and academic users. (The difference in percentage is likely due to a higher rate of survey abandonment among non-users. The figures contrast with 63% and 61% in the 2009 survey.) So call this a Pac Man question, one whose response indicates very significant survey selection bias:
Are you currently using text/content analytics? Yes: 78.2%; No: 21.8% (n=206)
Market Size and the Larger BI Market
We can infer overweighting by comparing market-size figures. The author estimates an $835 million 2010 global market for text/content-analytics software and vendor-supplied support and services. As the author described in the May 12, 2011 InformationWeek article “Text-Analytics Demand Approaches $1 Billion,”9 “My $835 million market-size estimate covers software licenses, service subscriptions, and vendor-provided technical support and professional services. Despite strong growth, it remains a small fraction of Gartner’s $10.5 billion 2010 valuation of the broader BI, analytics, and performance-management software market.”10 By contrast, the 2009 text-analytics market report cited the author’s figure of $350 million for the global, 2008 text analytics market. (That figure did not account for search-based applications, which were included in the 2010 market-size estimate.) The 2009 report also cited a 2008 BI-market estimate from research firm IDC: “The business intelligence tools software market grew 6.4% in 2008 to reach $7.5 billion.”11
9 http://www.informationweek.com/news/software/bi/229500096
10 http://www.gartner.com/it/page.jsp?id=1642714
11 http://www.idc.com/getdoc.jsp?containerId=217443
  • 16. Text/Content Analytics 2011: User Perspectives
The Data Mining Community
Another contrasting data point is that 65% of respondents to a July 2011 KDnuggets poll12 report (n=121) using text analytics on projects in the preceding year. Results were tallied nine days into the poll, before it was closed, so final numbers may differ from those reported here. The figure in a similar, March 2009 poll was 55% currently using text analytics/text mining.
KDnuggets: How much did you use text analytics / text mining in the past 12 months?
  Used on over 50% of my projects: 21.5%
  Used on 26-50% of my projects: 9.9%
  Used on 10-25% of projects: 14.9%
  Used on < 10% of my projects: 19.0%
  Did not use: 34.7%
KDnuggets reaches data miners, a technically sophisticated audience who are among the most likely of any market segment to have embraced text analytics. The rate of text-analytics adoption by data miners surely exceeds the rate of adoption by any other user sector. As an aside, 49% of KDnuggets respondents stated that in comparison to the last 12 months, in the next 12 they would use text analytics more, whether on additional projects or more intensively on a steady project workload. 43% stated their use would remain about the same and only 8% anticipated less use.
12 http://www.kdnuggets.com/2011/07/poll-text-analytics-use.html
  • 17. Text/Content Analytics 2011: User Perspectives
Demand-Side Study 2011: Response
The subsections that follow tabulate and chart survey responses, which are presented without unnecessary elaboration.
Q1: Length of Experience
As in 2009, the 2011 survey opened with a basic question – How long have you been using Text/Content Analytics?
                                          2009 (n=107)   2011 (n=224)
  not using, no definite plans to use          16%             6%
  currently evaluating                         22%            21%
  less than 6 months                            8%             3%
  6 months to less than one year                5%             5%
  one year to less than two years               7%            12%
  two years to less than four years            18%            20%
  four years or more                           25%            33%
We see that 2011 responses skew to longer experience than measured in 2009. Survey results were not based on a scientifically designed or measured population sample, however, neither in 2011 nor in 2009, and given how out of proportion survey-measured experience is to that of the broad business population – the addressable market for text/content analytics likely extends far beyond the current user base – the most plausible conclusion one can draw from Q1 responses is that 2011 survey outreach failed to bring in the proportion of new and prospective users reached in 2009. Nonetheless, Q1 responses will prove illuminating in analyses of subsequent survey questions, in studying how attitudes vary by length of text/content analytics experience.
  • 18. Text/Content Analytics 2011: User Perspectives Q2: Application Areas What are your primary applications where text comes into play? 39% Brand/product/reputation management 40% Voice of the Customer / Customer Experience 39% Management 33% 39% Search, information access, or Question Answering 36% Research (not listed) 33% 33% Competitive intelligence 37% 26% Customer service/CRM 22% Product/service design, quality assurance, or warranty 15% claims 14% 15% Life sciences or clinical medicine 18% 2011 (n=219) 15% E-discovery 2009 (n=103) 15%Online commerce including shopping, price intelligence, 11% reviews 10% Financial services/capital markets 15% 9% Other 13% 8% Insurance, risk management, or fraud 17% 8% Content management or publishing 19% 7% Military/national security/intelligence 6% Law enforcement 7% 0% 5% 10% 15% 20% 25% 30% 35% 40% 45% The 219 respondents in 2011 chose a total of 748 primary applications, an average of 3.4 primary applications per respondent. While there is some category overlap, it is notable that respondents are applying text analytics toward multiple business needs. 18
  • 19. Text/Content Analytics 2011: User Perspectives Q3: Information Sources What textual information are you analyzing or do you plan to analyze? 62% blogs and other social media 47% 41% news articles 44% 35% on-line forums 35% 35% customer/market surveys 34% 30% review sites or forums 21% 29% e-mail and correspondence 36% 27% scientific or technical literature 27% 23% contact-center notes or transcripts 25% 22% Web-site feedback 21% 21% text messages/SMS/chat 8% 15% 2011 (n=215) employee surveys 16% 14% 2009 (n=100) field/intelligence reports 14% speech or other audio 12%crime, legal, or judicial reports or evidentiary materials 13% 10% medical records 16% 9% point-of-service notes or transcripts 12% 9% patent/IP filings 11% 8% photographs or other graphical images 7% insurance claims or underwriting notes 15% 6% video or animated images 5% warranty claims/documentation 7% 0% 10% 20% 30% 40% 50% 60% 70% 19
  • 20. Text/Content Analytics 2011: User Perspectives The 215 respondents in 2011 chose a total of 962 textual-information sources, an average of 4.5 sources per respondent. The big news is not news at all: Social sources are by far the most popular and 4 of the top 5 categories are social/online (as opposed to in- enterprise) sources. Despite social’s status, however, it is a source for barely more than 6 out of 10 respondents. 20
  • 21. Text/Content Analytics 2011: User Perspectives
Q4: Return on Investment
Question 4 asked, “How do you measure ROI, Return on Investment? Have you achieved positive ROI yet?” There were 164 respondents. Results are charted from highest to lowest values of the sum of “currently measure” and “plan to measure”:
How do you measure ROI, Return on Investment?
                                                         Measure:   Measure:       Plan to
                                                         Achieved   Not Achieved   Measure
  higher satisfaction ratings                               19%        18%           28%
  increased sales to existing customers                     13%        18%           29%
  ability to create new information products                11%        13%           27%
  improved new-customer acquisition                          9%        15%           25%
  higher customer retention/lower churn                     10%        12%           23%
  higher search ranking, Web traffic, or ad response        10%        12%           22%
  reduction in required staff/higher staff productivity      9%         9%           23%
  fewer issues reported and/or service complaints            9%         6%           23%
  lower average cost of sales, new & existing customers      5%         7%           23%
  faster processing of claims/requests/casework             10%         6%           19%
  more accurate processing of claims/requests/casework       6%         7%           20%
Out of 164 respondents, 37.8% (62) report that they have achieved positive ROI according to some measure. Those 62 respondents reported achieving ROI according to a total of 182 measures, that is, 2.94 ROI-achieved measures for each respondent who achieved positive ROI. Out of 164 respondents, 50 are measuring ROI but have not yet achieved positive ROI according to any measure. The 112 respondents who are measuring ROI (whether achieved or not) track a total of 385 measures among them, 3.44 measures per respondent. The following are several of the Other responses given:
 Better customer insight, market intelligence, and competitive intelligence.
 Content findability.
 Creation of scientific knowledge.
 Higher employee engagement and better L&D outcomes.
 Improvement in existing processes, turnover time.
  • 22. Text/Content Analytics 2011: User Perspectives
 Incremental sales lift.
 Lowered cost of fraud, more accurate predictive analytics.
 Number of actions executives can take, estimated dollar savings from risk correction/avoidance.
 Patient outcomes.
 Providing better data to scholars.
 Reduction of Claim Cost.
 Stronger understanding of subconscious emotional zones.
 We don't know how to measure it properly.
Q5: Mindshare
A word cloud, generated at Wordle.net, seemed a good way to present responses to the query, “Please enter the names of companies that you know provide text/content analytics functionality, separated by commas. List up to the first 8 that come to mind.” There were 129 responses, many offering several companies. A bit of data cleansing was done, to regularize names and remove inappropriate responses.
[2011 word cloud image]
Contrast with the 2009 word cloud (deliberately rendered smaller than the 2011 cloud, without an attempt to create sizing consistent between the two clouds) based on 48 response records, as follows:
[2009 word cloud image]
Note that IBM acquired SPSS in mid-2009.
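For readers curious about the mechanics, the mind-share tallying behind such a word cloud amounts to splitting the comma-separated responses, regularizing name variants, and counting mentions. The Python sketch below illustrates this with invented responses and an invented alias map; it is not the actual cleansing procedure or data used for the study.

```python
# Illustrative tallying of free-text "name up to 8 providers" responses.
from collections import Counter

responses = [
    "SAS, IBM SPSS, Attensity",            # invented examples, not survey data
    "sas, Clarabridge, ibm",
    "Lexalytics, SAS Text Miner, Attensity",
]
aliases = {"ibm spss": "IBM", "ibm": "IBM", "sas text miner": "SAS", "sas": "SAS"}

counts = Counter()
for response in responses:
    for name in response.split(","):
        name = name.strip()
        counts[aliases.get(name.lower(), name)] += 1

# Frequencies that a rendering tool such as Wordle can size into a cloud.
print(counts.most_common())
```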
  • 23. Text/Content Analytics 2011: User Perspectives
Q6: Spending
Question 6 asked about 2010 spending and 2011 expected spending: “How much did your organization spend in 2010, and how much do you expect to spend in 2011, on text/content analytics software/service solutions?”
Spending level (2010 spent, n=176 / 2011 expected, n=165):
  $1 million or above: 6% / 7%
  $500,000 to under $1 million: 2% / 3%
  $200,000 to $499,999: 4% / 6%
  $100,000 to $199,999: 7% / 7%
  $50,000 to $99,999: 9% / 7%
  under $50,000: 23% / 30%
  use open source: 15% / 19%
Questions asked of only current text/content-analytics users
Questions 8 through 13 were posed exclusively to current text/content analytics users – the 81.2% of the 206 respondents to Q7: Are you currently using text/content analytics?
Q8: Satisfaction
Question 8 asked, “Please rate your overall experience – your satisfaction – with text analytics.” It offered five categories, listed here with response counts:
 Overall experience/satisfaction (n=117, of whom 3 No experience/No opinion).
 Ability to solve business problems (n=114, 12 NE/NO).
 Solution/technology ease of use (n=112, 5 NE/NO).
 Solution/technology performance (n=114, 4 NE/NO).
  • 24. Text/Content Analytics 2011: User Perspectives
 Availability of professional services/support (n=112, 13 NE/NO).
Responses, which across categories are somewhat anomalous, are as shown:
[Chart: Please rate your overall experience – your satisfaction – with text/content analytics. Stacked response shares (Completely satisfied, Satisfied, Neutral, Disappointed, Very disappointed) for the overall rating and for each of the four breakout categories.]
Overall, 70% of current-user respondents who had an opinion reported themselves Satisfied/Completely Satisfied, even while the breakout-category figures totaled 59%, 36%, 47%, and 42% Satisfied/Completely Satisfied. We can surmise that respondents who voiced “No experience/No opinion” for the breakout categories tended to have a favorable overall experience.
  • 25. Text/Content Analytics 2011: User Perspectives
[Radar chart: Experience/satisfaction sentiment polarity – positive, neutral, and negative response shares plotted across the five categories: overall experience/satisfaction, ability to solve business problems, solution/technology ease of use, solution/technology performance, and availability of professional services/support.]
Q9: Overall Experience
Question 9 asked, “Please describe your overall experience – your satisfaction – with text analytics.” The following are 49 from among the 63 responses, categorized, lightly edited for spelling and grammar, and with the names of three products masked:
Happy
It works. Excellent. Absolutely essential.
Very satisfied, most goals exceeded, big jump in effectiveness and customer satisfaction.
Pretty happy, given we are in a highly technical, difficult-to-monitor/track niche.
Saving a lot of time for our journalists.
We have found that having an application with the capabilities to clean and normalize the text and quantitative data, process it to a form to analyze, and run text mining and categorization on an ad hoc or production basis has greatly enhanced my team's capabilities and productivity.
We found great value from using a Speech Analytics solution to retain customers and improve the overall customer experience through root-cause analysis.
I have been working with text analytics for academic and scientific purposes and I am quite satisfied with results achieved.
I work with nurse and social science researchers. They think that a chat with 20 people is research. I tend to analyze hundreds or thousands of free-text comments.
  • 26. Text/Content Analytics 2011: User Perspectives
I use software to overcome the biases inherent in manual analysis.
It Takes Work
Very powerful tool, but it requires the organization's ability to take action on the insights.
Valuable tool; my clients are content to underutilize it, so what is available more than meets our needs.
Since we use open source, the ROI is basically how much time you put into the solution and how many problems it solves. We have been successful so far.
Very satisfied, but extremely labor intensive.
We provide this as a tool to our clients in our application for publishing press releases. It works fine but could be better, but that is up to us to implement it fully.
Once you spend man-hours to set up the tool, it is extremely consistent in doing what you tell it to do. I know improvements are coming, but I'd like more AI from text analytics tools than what is currently offered.
Do-It-Yourself is challenging but not impossible. Very cheap to operate.
Fairly satisfied – problem is I am sole researcher and data/text clean-up takes too much time given other demands.
I've been a user and vendor of text analytics (in fact, in my early <...> days, we helped coin the phrase “text analytics”). Vendors generally overpromise and have difficulty delivering. Both vendors and customers underestimate the amount of resources required to get it right. So, still hard to use for mainstream purposes.
Reservations and complications
Steep learning curve.
I am currently satisfied, but I believe we (as analysts) are just beginning to unlock the full potential of text analytics.
On one hand, I'm amazed and thrilled that this stuff exists at all. But on the other hand, I haven't seen anything that does just what I want it to do.
It's opened up opportunities to analyze unstructured data, but not at the same level as structured data.
Works well at the highest level of analysis (e.g. sentiment) but not as well in auto-coding for custom (i.e. project) studies.
Tools are good, but lack transparency, the ability to explain how conclusions are reached.
There is still a lot of work required to optimize this technology, since it can currently provide concepts but does not capture context; it is a lot of slow, painful work to get the software to recognize the context in which something is mentioned, and
  • 27. Text/Content Analytics 2011: User Perspectives
accuracy is still not great.
Unmet needs
Very promising technology, but some difficulties to:
- Implement the text mining component smoothly into the existing information system.
- Cope with various languages, formats, volumes, etc. of data.
- Measure and demonstrate tangible results in terms of improved information extraction quality.
- Assess ROI (reducing processing time / saving resources for core tasks, e.g. analysis).
Powerful but overly difficult, impenetrable – technology vs. solutions.
An emerging and enabling technology in our business with broad applicability. Satisfied in our applications with accuracy and precision but hitherto disappointed with export capability to other applications.
Still a volatile market for applications beyond VOC/sentiment analysis. Vendors are eager to please but sometimes overstate the capabilities. However, I still have limited experience in solving real business problems with these tools (I am a consultant).
I think this field is in its infancy. Lots of issues with data quality. Sentiment analytics often flawed. Hard to scale or automate.
The handful of companies and solutions I came across do not seem to marry or integrate structured and unstructured text easily... Algorithms are not quite available as a function or way to improve accuracy.
I feel there is so much more work to be done, both on the analysis side and also on the business implementation side. While I work heavily in this area, I won't be more satisfied until I see better end-to-end integration and until I see more effective and systematic use of insights.
I do everything myself. The lack of good lexical resources and taxonomies is a real problem that drives up the cost (in manpower) of providing a solution. And the complexity of the infrastructure required vs. the apparent simplicity of the problem (in managers' minds) makes it very difficult to adjust expectations.
We use <...> and we have to write our own routines to find the text and content that we are interested in. There are plenty of functions that help us with our goals, but obviously there is still much that we need to do to improve recall and accuracy.
<...> is the only tool which is both open source and professionally useful. However, in spite of 20 years of development, it still has a very poor user interface as well as API interface, which hinders productivity and acceptance at a beginner's level.
Skepticism
Jury is still out. It's still evolving; accuracy of results is something to watch for in iterations.
  • 28. Text/Content Analytics 2011: User Perspectives
Still learning. Very early days!
Promising, but still very difficult to see quick results. Everything seems to take ages and it's been a painful learning curve. Hard to trust the automated results when you've been used to achieving 100% with manual human analysis.
Still too new. Field as a whole is underperforming what is possible.
Though the concept is very appealing, it is still in its nascent stages, and a lot more possibilities are left to be explored. IBM Watson is a good step ahead in that direction.
Very poor, almost useless.
Looking ahead
On the whole, very satisfied with the range of solutions available and their ease of use. Very much looking forward to watching the technology progress – it's obviously not perfect yet.
Unlike structured data, getting value out of text analytics tools requires understanding of text elements – how to utilize occurrence of different parts of speech, how to interpret different types of sentences like requests, commands, opinionated sentences, etc. Domain knowledge and tunable and adaptable systems are a must for success. Non-availability of trained personnel to provide text mining services leads to dissatisfaction of users. Business end users do not like to use the tools themselves because of the complexity. The process or strategy for text mining needs to be established.
We're pretty happy with text analytics and see it as a transformational technology.
Most of text analytics' problems lie in how it is sold. It is both broad and deep and has a myriad of tools best suited for very different use cases, but customers think "text analytics is text analytics." Really, “text analytics” is a horrible term that needs to be broken up into component parts.
Q10: Providers
Question 10 asked, “Who is your provider? Enter one or more, separated by commas, most important provider first.” There were 77 response records, listing providers (sorted and without counts):
Autonomy, Clarabridge, Colbenson, Content Analyst, Expert System, GATE, IBM, in-house, Lexalytics, LingPipe, Megaputer, MotiveQuest, open source, Open Text, R, Radian6, Rapid-I, Saplo, SAS, Smartlogic, Sysomos, TEMIS, Teradata, TextKernel, Thomson Reuters (including Calais, ClearForest), Verint, Zemanta
Note that the survey asked, “Please respond only if you are a user, prospect, integrator, or
  • 29. Text/Content Analytics 2011: User Perspectives
consultant.”
Q11: Provider Selection
Question 11 asked, “How did you identify and choose your provider? (If more than one, limit response to your most important provider.)”
Applicability, robust performance, open source.
Research. Experience and luck.
Very satisfied reference customers with similar applications, most flexible solutions, expertise of consultants, high quality of service, extreme agility, and extremely rapid idea-to-deployment cycles.
They contacted us before launch of their first product.
Product evaluation in context of business application.
Based on business requirements in the framework of a European competitive tender procedure.
Advised by a related Web development consultant.
We spent about a year evaluating and classifying vendors that in part or whole would fill our needs as expressed in Q9. We decided that an application with integrated quantitative and qualitative analytic capabilities offered the best possibilities. We ended up doing POCs with SAS, SPSS, and Megaputer, and ended up choosing the latter.
We evaluated multiple providers based on (1) tool flexibility – can we customize? (2) accuracy (3) type of content it can tag (4) sentiment methodology (5) price.
Main criteria are cost, multi-language capability, and integration with SAS.
Competitive bids.
Large existing analytics relationship: tool was an add-on.
Conducted a thorough investigation of leading providers in the space.
Quality and reviews.
Personal recommendations.
Constructed a needs-analysis ranking system. Our needs included ease of integration, tools, ability to produce meaningful results at sub-document (short document) level, ease of (or no) training.
Networking and academic partnerships.
  • 30. Text/Content Analytics 2011: User Perspectives
Proof of Concept – evaluated about a dozen or so vendors – have not selected a TM vendor as yet.
Recommended by a trusted source.
Based on recommendations <...> and our own search / lab testing, which brought us to <...>.
Introduction from my manager.
What my client uses.
It was an obvious choice since there was no real alternative on the market (i.e. language is limiting the products). We compared a number of providers and decided to go for <...>, who have a local presence and are experts on the Swedish language.
Free, for research purposes.
Trials based on performance.
Price/performance tradeoff and applicability to targeted business problem.
I work for the company. Use other languages (Perl) as necessary.
Tested various services, rated results.
I choose the vendor or tools based upon my client's application needs. We do not have a primary provider ... we maintain a library of tools and use many of them in the same project.
Trying all major ones.
Reputation, personal contacts.
Established open-source project.
Market research, pricing, case studies, and product evaluations.
It was recommended to us.
Working in-house.
I worked for one of them and selected the other on their open source commitment.
Proof of Concept.
Support for Drupal.
Cost, applicability to needs.
  • 31. Text/Content Analytics 2011: User Perspectives
I don't; that's up to my clients. But my advice to them is to begin with an understanding of the goals, and work backward to identify the provider.
Company demo.
Recommendation from experts, and tried and tested different ones.
Management mandate, client.
RFP to replace an existing legacy system.
We already used <...> and had everything we needed to do a Proof of Concept; waiting for a business reason to acquire <...>.
Advanced, scalable LSI technology.
Work for them.
Ability to mine audio, text, and customer surveys.
Q13: Promoter?
Question 13 is new with the 2011 survey; we did not ask it in 2009. It is a basic net-promoter-type question, without the “net” part: “How likely are you to recommend your most important provider to others who are looking for a text/content analytics solution?” Of 87 responses, 49% were positive, 23% were neutral, and 28% were negative.
[Pie chart: How likely are you to recommend your most important provider? Response shares across seven options, from “Extremely likely to recommend against” through “Neither likely to recommend nor recommend against” to “Extremely likely to recommend.”]
Promoters outweigh detractors by a net of 21.
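For readers who want the arithmetic behind a net-promoter-style figure, a minimal sketch follows. The response-option labels mirror the question's seven-point scale; the counts are placeholders chosen so the shares reproduce the published 49% / 23% / 28% split, not the survey's raw data.

```python
promoter_options = ["Extremely likely to recommend",
                    "Moderately likely to recommend",
                    "Slightly likely to recommend"]
detractor_options = ["Slightly likely to recommend against",
                     "Moderately likely to recommend against",
                     "Extremely likely to recommend against"]
neutral_option = "Neither likely to recommend nor recommend against"

# Placeholder counts (total 100), not the survey's raw responses.
counts = {
    "Extremely likely to recommend": 12,
    "Moderately likely to recommend": 25,
    "Slightly likely to recommend": 12,
    neutral_option: 23,
    "Slightly likely to recommend against": 9,
    "Moderately likely to recommend against": 10,
    "Extremely likely to recommend against": 9,
}

total = sum(counts.values())
positive = sum(counts[o] for o in promoter_options)
negative = sum(counts[o] for o in detractor_options)
neutral = counts[neutral_option]

# "Net" figure: percentage-point share of promoters minus share of detractors.
net = round(100 * positive / total) - round(100 * negative / total)
print(f"positive {100 * positive / total:.0f}%, neutral {100 * neutral / total:.0f}%, "
      f"negative {100 * negative / total:.0f}%, net {net:+d}")
# -> positive 49%, neutral 23%, negative 28%, net +21
```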
  • 32. Text/Content Analytics 2011: User Perspectives
Q14: Information Types
Information extraction is a key text/content analytics capability, so Question 14 asked, “On the technical front, do you need (or expect to need) to extract or analyze --” with a response count of n=138. Responses are charted alongside 2009 responses. In 2009, n=80.
Do you need (or expect to need) to extract or analyze... (2011, n=138 / 2009, n=80)
  Topics and themes: 83% / 65%
  Sentiment, opinions, attitudes, emotions: 77% / 60%
  Named entities – people, companies, geographic locations, brands, ticker symbols, …: 76% / 71%
  Concepts, that is, abstract groups of entities: 66% / 58%
  Events, relationships, and/or facts: 62% / 55%
  Metadata such as document author, publication date, title, headers, etc.: 52% / 53%
  Other entities – phone numbers, part/product numbers, e-mail & street addresses, etc.: 46% / 40%
  Other: 10% / 15%
There were fifteen Other responses. They included:
 Top service issues to address, most pressing customer problems.
 Type of sentences, context in which text was recorded, type of questions asked in a survey, correlation with numeric measures.
 Motivations, advocacy.
 Textual relations.
 Related texts.
 Similarities and differences between documents.
 Triples and various phrase types (noun, verb, and adjective phrases).
 Activities, interests.
 Multi-lingual items.
 Influence, engagement.
Q15: Important Properties and Capabilities
For Q15, the 2009 response-choice options were retained to create comparability, and
  • 33. Text/Content Analytics 2011: User Perspectives
several choices were added. There were 136 response records for 2011 versus 78 for 2009 to the question, “What is important in a solution?”
[Chart: What is important in a solution? – 2011 (n=136) versus 2009 (n=78) response shares for roughly twenty answer choices. Broad information extraction capability (63%), ability to use specialized dictionaries, taxonomies, ontologies, or extraction rules (57%), and deep sentiment/emotion/opinion extraction (57%) top the 2011 list. The remaining choices include support for multiple languages; ability to create custom workflows or to create or change topics/categories yourself; predictive-analytics integration; low cost; BI (business intelligence) integration; sector adaptation (e.g., hospitality, insurance, retail, health care, communications, financial services); "real time" capabilities; "Big Data" capabilities, e.g., via Hadoop/MapReduce; media monitoring/analysis interface; open source; supports data fusion / unified analytics; hosted or "as a service" option; interface specialized for your line-of-business; frontline voice of the customer (VOC) system integration; export to Semantic Web formats (RDF, OWL, microformats, etc.); vendor reseller/integrator/OEM relationships with tech or service providers; and other.]
The areas of greatest concern relate to core technical capabilities. Putting aside the choices added for 2011, there are two notable differences in the 2009 and 2011 results.
  • 34. Text/Content Analytics 2011: User Perspectives
1. The desire for customizability is markedly higher in 2011 than in 2009:
 Ability to create custom workflows or to create or change topics/categories yourself.
 Sector adaptation.
2. Low cost is substantially less important for 2011 respondents than for 2009 respondents.
Contrast responses in the chart above with the 2011 Top 10 experienced-user responses, selected by 30% or more of users with 2 or more years of experience with text/content analytics (n=66).
What is important in a solution? (responses of n=66 users with 2 or more years text/content analytics experience)
  ability to use specialized dictionaries, taxonomies, ontologies, or extraction rules: 64%
  broad information extraction capability: 62%
  ability to create custom workflows or to create or change topics/categories yourself: 50%
  support for multiple languages: 49%
  deep sentiment/emotion/opinion extraction: 49%
  predictive-analytics integration: 46%
  sector adaptation (e.g., hospitality, insurance, retail, health care, communications, financial services): 39%
  "Big Data" capabilities, e.g., via Hadoop/MapReduce: 36%
  "real time" capabilities: 32%
  BI (business intelligence) integration: 32%
Q16: Languages
Question 16 asked, “What written languages other than English do you currently analyze or seek to analyze? What languages do you see a need for within two years?” There were 63 responses. This question was not asked in the 2009 survey. It offered, as responses, major languages and language groupings. It did not offer a None response; that is, it did not measure survey respondents who are not currently analyzing, and have no foreseeable (two-year) need to analyze, languages other than English.
  • 35. Text/Content Analytics 2011: User Perspectives
What written languages other than English do/will you analyze... (Currently / Within 2 years)
  French: 56% / 6%
  Spanish: 44% / 10%
  German: 33% / 13%
  Chinese: 22% / 24%
  Arabic: 19% / 13%
  Italian: 25% / 5%
  Russian: 13% / 16%
  Scandinavian or Baltic: 21% / 6%
  Japanese: 14% / 13%
  Portuguese: 14% / 11%
  Dutch: 16% / 6%
  Korean: 10% / 10%
  Hindi, Urdu, Bengali, Punjabi, or other South Asian: 5% / 14%
  Other: 10% / 2%
  Other European or Slavic/Cyrillic: 6% / 3%
  Other East Asian: 3% / 6%
  Polish: 8% / 2%
  Greek: 3% / 6%
  Other Arabic script (including Urdu, Pashto, Farsi, Dari): 3% / 3%
  Turkish or Turkic: 2% / 5%
  Bahasa Indonesia or Malay: 2% / 3%
  Other African: 2% / 2%
The response profile would surely have been different if the survey had had greater reach into military, intelligence, and international communities.
Q17: BI Software Use
Question 17 asked, “What BI (business intelligence) or analytics software do you use if any? Please separate entries with commas, listing first the most important to you.” This question was posed because, not infrequently, text analytics users seek to integrate results and analyses within numbers-focused BI tools. There were n=59 response records. There was no None choice offered, so survey respondents who did not respond to this question may or may not be BI software users.
Responses (sorted and without counts) were:
Custom/in-house, EDM, IBM Big Sheets, IBM Cognos, IBM SPSS, Information Builders, Megaputer PolyAnalyst, Microsoft, MicroStrategy, Mint Global, None, Oracle BI, Pentaho, R, Radian6, Rapid-I RapidMiner, SAP Business Objects, SAS, SAS JMP, SpagoBI, Tableau, Teradata, Thomson ONE & Eikon, TIBCO Spotfire
  • 36. Text/Content Analytics 2011: User Perspectives
Q18: Guidance
Question 18 asked, “What guidance do you have for others who are evaluating text/content analytics?” There were 62 responses. After editing for clarity and relevance, 52 are listed here, in four categories and arranged to create a sense of narrative progression.
Evaluating
Define the requirements; learn the options.
Get very detailed information about the technical functionality of the product or service and be sure it will fit your purposes.
Research before buying. Don't buy a name, and make sure you know what you want before doing anything else.
Be clear what you want to accomplish and how you plan to use the information. Validate that the tools will do what you need, but don't over-buy.
Product evaluations, where a wide range of example content from the chosen domain is tested against the analytics solution, are paramount for determining the range and depth of information returned by the solution.
Make a plan with measurable milestones.
Objectively evaluate vendor claims. There's a lot of hype and over-promising that doesn't match real-world performance.
Start by determining whether you need a semantic or a pure text-mining solution. Then aggressively source solutions, looking at all reasonable solutions. Beware hidden charges and capabilities that are available in only some languages.
Try as many as you can. Don't rely on only one off-the-shelf solution. Try and adopt several until you find one you feel comfortable with.
Perform a test of your preferred top 3 solutions, compare usability and pricing, consider vendor expertise in automating text mining processes, consider vendor flexibility and agility, consider customization options and pricing.
It's case-sensitive, that is, there is a wide variety of needs and solutions, and each customer needs its own evaluation and analysis [process and criteria].
I would take the time to conduct a manual test of the tool and compare the results to the computed results. Software companies get way too nerdy about rating sentiments as being positive or negative and forget that it's more about accurate, efficient intelligence gathering for many of us.
Do some reading in literary and linguistic computing.
  • 37. Text/Content Analytics 2011: User Perspectives
Data & Proof of Concept
Don't even consider it unless you have many thousands of abstracts (or equivalent) to read in your specific application, as the cost of development time will vastly exceed the benefit.
Understand the pitfalls and challenges of interpreting the vast array of inputs.
Do a serious POC. Make sure that your vendor is doing the work with you, not simply delivering the output. You'll be surprised how much manual cleanup is done in many POCs.
Be sure you know what your objectives are for analyzing and taking action on textual information. Also, I recommend the POC approach to get a handle on the effort involved (training and ongoing) to conduct text mining and/or automated categorization and whether the application will indeed help you meet your goals.
If you are classifying text, make sure that you have some planned actionability against each category that you classify text into. If not, work towards creating a smaller number of actionable categories. But in spite of that comment, be prepared to take action at a micro level as well as at a macro level.
Evaluate using your own data and report output formats.
Try a truly representative data set with a blind test set and challenge the vendor to deliver impressive results in a proof of concept. Make sure you use the same dataset and same conditions (e.g. same server configuration) [as you will use in production]. Consider the cost of further improvements to your text-analysis pipeline [that will be necessary in] say 1-2 years. After-care can be very expensive with non-open-source solutions. Also, open source allows you to bring maintenance and updates in house, if required.
Insist on performance against public corpora, or create a test environment on your own. Don't rely on gut reactions.
Collect use cases in advance.
Capabilities
Match the tool with the job. Not all text analytics tools solve the same problem. First understand the broad categories of text analytics types, then match it up with the analysis you're seeking to perform. Always make sure you can answer the all-important "so what" question.
Walk before running; problem domains to [address] become clearer the more familiar you become with the tooling.
Don't lock yourself into an expensive, monolithic product/platform.
Linguistic semantics is the key to understanding words in context and makes Linked Data possible.
Use products which allow you to easily modify and extend source code.
  • 38. Text/Content Analytics 2011: User Perspectives
I am a survey researcher, so I need to see the ability to create and adapt dictionaries.
Emphasize multi-language support.
Use a cloud-based service offering.
Look for flexibility to change automatic sentiment analysis.
Seek a unified voice of the customer analytics platform, which tends to have a lower cost of ownership and higher ROI.
You need to evaluate scalability, ingestion speed, entity tagging accuracy.
Be prepared to involve both people and computers. No business analysis, and text analytics in particular, is possible without this integration. If you are already evaluating and categorizing comments with people, do an inter-rater reliability test so you have a better understanding of the level of imperfection in your current system.
Spend lots of time inspiring analysts on why it is important to use this tool and treat it as valuable.
Evaluate post-sales support before buying the technology. Assessment of dictionaries that come with the tool may be a good idea.
Hire a good consultant.
Expectations
Don't have illusions about what this stuff does. There's value in it, but it isn't magic, and it isn't yet what we hope it to be. It is hard work to use text analytics well.
Text analytics does not equal VOC/sentiment analysis. There is not one tool that will solve all "text analytics" problems.
Manage expectations. Don't expect it to be right "out of the box"; also, a combination of more than one solution may be optimal.
Keep your expectations very low. Don't be fooled by shiny technologies. [One] should measure its use in critical commercial and government systems.
Have patience. Don't [jump] to conclusions too fast.
Make your business objectives very clear to management at the outset to align expectations with reality.
Think in the long term and do not skimp on infrastructure, especially metadata management, auditing, and taxonomies. Design your data infrastructure to accommodate CONSTANT change. It is no fun when you get a sample solution ready for prime time that breaks when a core schema or
  • 39. Text/Content Analytics 2011: User Perspectives
platform changes for reasons out of your control.
Measure results against specific business applications – language can be domain specific. Also, generic tools may not serve specific needs, such as analytics used to understand helpdesk responses vs. mining tweets.
Expect to invest a lot of manual labor to tune whatever text analytics you plan on – there are no out-of-the-box solutions. It took us approximately 3 man-years to build an automobile dictionary for the phrases and text we needed for our application – ultimately we had about 200,000 terms/phrases with decent recall and hit rate.
A certain degree of manual intervention is necessary to improve the creation of topics and themes. Expect a great deal of manual intervention to get what you need. Text mining systems are not pluggable coffee machines.
Use text analytics as a service on a continuous basis. One size does not fit all departments of the same organization. Initial investment is needed to customize the system to provide domain knowledge and business requirements. Benefits will follow.
Analyzing the data is a smaller task; implementation and operationalization pose challenges.
Q19: Comments
Lastly, Question 19 solicited comments. The following are 10 of the 17 responses.
Open source solutions are amazingly powerful and flexible…
Be an amateur linguist. If you are not comfortable thinking about the structure of language, hire someone who is.
Business users need to articulate their requirements concretely. "Tell me all that is there in text" may yield a whole lot of crap and not help in any positive way, adding to the opinion that "text mining does not deliver". Providers need to define their strategies towards content mining and put them in place. The tools have achieved a certain degree of accuracy which can provide a whole lot of useful insights to business when analyzed by experts. Text mining tools should be used with proper training and understanding about how to analyze text, the mechanisms below the tool, with full domain knowledge and business requirement understanding.
An ideal system for us would combine real field-based intel with deep Web mining. It doesn't exist at the moment.
Feeling that this niche is going through a big change, something that has not happened in the last 10 years or so. Looking forward to going beyond text, to rich media like picture or video analytics.
It's a rapidly developing field with emerging applications ... stay close to it.
  • 40. Text/Content Analytics 2011: User Perspectives
I'm in the real-time news business. I'm trying to measure the trends in social media traffic around our articles to determine how to leverage it for advertising. For example, we had a major rush on gun purchases here after the shooting on Jan 8th, but comments were surprisingly neutral about guns in our social media. We had turned away advertising dollars from guns (a personal safety ad from a major vendor) fearing it would inflame our readership. If I'd had the data then, I might have known precisely when to allow it.
I am a text-centric scholar. My interest in this is what types of business-oriented analytics lend themselves to scholarly applications. Quite a few do. Pick a project that is small enough to get a good handle on, one that cannot be illustrated without text mining, that would not have been possible [otherwise].
Additional Analysis
The survey was designed so that responses to questions would be immediately useful without elaborate cross-tabulation or filtering. Certain analyses are revealing, however – for instance, whether a respondent is currently using text analytics or not, and bringing in length of time using text analytics, cross-tabulated against other variables.
Satisfaction with text/content analytics generally increases with time using the technologies. Contrast the figures in the following chart, which cross-tabulates Q1 Length of experience (for respondents with 6 months or longer experience) by Q8 Satisfaction, with the overall satisfaction rates across all lengths of experience: 12% Completely satisfied; 58% Satisfied; 24% Neutral; 4% Disappointed; 3% Very Disappointed.
[Chart: Satisfaction by Time Using Text/Content Analytics – stacked satisfaction shares (Completely satisfied through Very disappointed) for four experience bands: 6 months to less than one year (n=9), one year to less than two years (n=20), two years to less than four years (n=24), and four years or more (n=48).]
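For readers who want to reproduce this kind of cross-tabulation from raw response records, a minimal pandas sketch follows. The column names and the handful of rows are assumptions for illustration, not the actual survey file.

```python
import pandas as pd

# Assumed layout: one row per respondent, one column per survey question.
df = pd.DataFrame({
    "experience":   ["6 months to <1 year", "four years or more", "two to <four years",
                     "four years or more", "one to <two years", "four years or more"],
    "satisfaction": ["Neutral", "Satisfied", "Completely satisfied",
                     "Satisfied", "Disappointed", "Satisfied"],
})

# Row-normalized cross-tab: share of each satisfaction level within each experience band.
xtab = pd.crosstab(df["experience"], df["satisfaction"], normalize="index")
print((100 * xtab).round(0))
```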
  • 41. Text/Content Analytics 2011: User Perspectives
Length of experience with text analytics correlates with solution needs. The following chart cross-tabulates Q1 Length of experience (this time for respondents with any experience) by Q15 What is important in a solution. The Top 6 Q15 responses are included. Bars are ordered by the choices of the most-experienced, four-years-or-more users, who also represented a large proportion of the respondents. It is notable that the top choices for both those users and for all users involve capabilities linked to information extraction, classification, and disambiguation.
[Chart: Top Text/Content Analytics "Important in a Solution" by Length of Experience – response counts for the Top 6 Q15 choices, broken out by experience band (less than 6 months; 6 months to less than one year; one year to less than two years; two years to less than four years; four years or more).]
Other interesting points come out of contrasting respondents who are already using text analytics with respondents who are still in planning stages. The suggestion in the following table, which crosses the Top 6 responses to Q3 What textual information are you analyzing or do you plan to analyze? by Q7 Are you currently using text/content analytics? values, is that prospective users look to social and online sources – blogs and social media, news articles, discussion forums, and review sites – at a higher rate than do current users. Note, in particular, the 26-point gap between the top category and the second category for prospective users (Not Yet Using). The gap is smaller, 21 points, for current users. (Because the survey was not based on a scientifically designed sample, we do not evaluate
  • 42. Text/Content Analytics 2011: User Perspectives
the statistical significance of the difference.)
Textual Information Sources by Current/Prospective User
Textual Information Source: Current User (n=158) / Not Yet Using (n=42) / All (weighted)
  Blogs and other social media (twitter, social-network sites, etc.): 60% / 71% / 63%
  News articles: 39% / 45% / 40%
  Customer/market surveys: 37% / 38% / 37%
  On-line [discussion] forums: 34% / 43% / 36%
  Review sites or forums: 30% / 33% / 31%
  E-mail and correspondence: 31% / 26% / 30%
Interpretive Limitations and Judgments
The number of survey respondents was not large enough to support further useful cross-tabulation of variables beyond the analyses above. In interpreting the presented findings, do keep in mind that the survey was not designed or conducted scientifically, that is, with the intention or the actuality of creating a random sample or a statistically robust characterization of the broad market. Findings surely reflect selection bias due to 1) the outlets where the survey was advertised and 2) a likelihood that individuals who are unaware of text/content analytics, or of the potential for the technologies or solutions to help them solve their business problems, would not respond to the survey. Findings therefore over-represent current users.
Finally, responses to several of the survey questions were not especially illuminating or likely to be of much use to report readers. In particular,
 Question 12 asked, “What do you like or dislike about your solution or software provider(s)?” Respondents were allowed to enter up to five points. Twenty-seven individuals responded, entering a total of 82 points that were, unfortunately, all over the map – too difficult to classify and interpret.
 Question 17 asked, “What BI (business intelligence) or analytics software do you use if any?” There were 59 response records, but their value is more in linking use of particular BI software to particular text/content-analytics software and to respondent goals and experiences. The study author judges that breaking out or summing responses by software tool would provide an unrepresentative and unfair picture of experience with certain tools, given the limited reach and response of the study, so responses to this question are not reported.
  • 43. Text/Content Analytics 2011: User Perspectives
About the Study
The Report
Text/Content Analytics 2011: User Perspectives on Solutions and Providers reports the findings of a study conducted by Seth Grimes, president and principal consultant at Alta Plana Corporation. Findings were drawn from responses to a mid-2011 survey of current and prospective text-analytics users, consultants, and integrators. The survey ran from June 6 to July 9, 2011. It asked respondents to relay their perceptions of text and content analytics technology, solutions, and vendors. It asked users to describe their organizations’ usage of text and content analytics and their experiences.
Sponsors
The author is grateful for the support of nine sponsors – AlchemyAPI, Attensity, Basis Technology, Language Computer Corporation, Lexalytics, Medallia, SAS, Sybase, and Verint – whose financial support enabled him to conduct the current study and publish study findings. The content of the sponsor solution profiles was provided by the sponsors. The survey findings and the editorial content of this report do not necessarily represent the views of the study sponsors. This report, with the exception of the sponsor solution profiles, was not reviewed by the sponsors prior to publication.
Media Partners
The author acknowledges assistance received from seven media partners in disseminating invitations to participate in the survey. Those media partners are AnalyticBridge, BeyeNETWORK.com (TechTarget), CMSWire, InformationWeek (TechWeb), KDnuggets, Next Generation Market Research (Anderson Analytics), and the Text Analytics Summit.
Seth Grimes, Author
Seth Grimes is an information technology analyst and analytics strategy consultant. He founded Washington DC-based Alta Plana Corporation in 1997. Seth consults, writes, and speaks on information-systems strategy, data management and analysis systems, industry trends, and emerging analytical technologies. He is the leading industry analyst covering the text analytics and sentiment analysis markets. Seth is contributing editor at InformationWeek; founding chair of the Sentiment Analysis Symposium (@SentimentSymp on Twitter), Smart Content: The Content Analytics Conference (@CAnalytics), and the Text Analytics Summit; an instructor for The Data Warehousing Institute (TDWI); and text analytics channel expert at the Business Intelligence Network. He serves on the board of the Council of Professional Organizations for Federal Statistics (COPAFS). Seth can be reached at grimes@altaplana.com, 301-270-0795. Follow him on Twitter at @sethgrimes.
  • 44. Text/Content Analytics 2011: User Perspectives
Sponsor Solution Profiles
  • 45. Text/Content Analytics 2011: User Perspectives
Solution Profile: AlchemyAPI
AlchemyAPI is a leading provider of semantic tagging and text mining solutions, helping companies worldwide enhance, understand, and better leverage their unstructured information assets. AlchemyAPI empowers organizations with the tools and capabilities necessary to compete in today's highly interconnected global information economy, by providing advanced natural language processing and information extraction capabilities not found in competing solutions.
AlchemyAPI Transforms Text Into Knowledge
 Advanced natural language processing in a unified platform: Named entity extraction, keyword extraction, concept extraction, sentiment analysis, topic categorization, fact/event extraction, language detection, HTML cleaning, and more.
 Analysis of content written in eight languages: English, French, German, Italian, Portuguese, Russian, Spanish, and Swedish.
 Web crawling and content fire-hose integration: Integrated web crawling and support for major content fire-hose data providers.
 High performance, scalable document analysis: AlchemyAPI customers perform more than 1 billion document analysis operations each month.
Unstructured text contains a wealth of actionable data – AlchemyAPI natural language processing is an accurate, scalable, and cost effective way to unlock this information:
Media Monitoring & Discovery: News articles contain reports of events involving people, companies, dates, locations, and actions. AlchemyAPI enables your organization to monitor, discover, and react.
Preference Targeting: Social media is a trove of up-to-the-second opinion from diverse demographic groups. AlchemyAPI can analyze topics, terminology, and consumer tone, providing key insights.
Intelligence Gathering: Document archives can be analyzed, summarized, tagged, linked, and clustered, using AlchemyAPI technology. Previously unknown relationships can be uncovered.
  • 46. Text/Content Analytics 2011: User Perspectives
AlchemyAPI Features Overview
Named Entity Extraction: Identify people, companies, cities, organizations, geographic features, and other typed entities within a document. Features include co-reference resolution, entity-targeted sentiment extraction, relevance ranking, quotations extraction, semantic linking, disambiguation, and taxonomy assignment.
Concept Extraction: Identify the relevant concepts within a document. AlchemyAPI’s concept tagging engine is capable of making logical abstractions ("Barbara Bush + Hillary Clinton + Laura Bush = First Ladies of the United States") and annotating documents at high rates of accuracy.
Keyword Extraction: Identify important terminology and “topic” keywords for any document. AlchemyAPI’s keyword extraction engine includes relevance ranking and fully integrated sentiment functionality to compute targeted sentiment for all keywords.
Sentiment Analysis: Identify the positive and negative expressions within a document. AlchemyAPI provides multiple modes of sentiment analysis, including document-level, sentence-level, keyword-targeted, and entity-targeted sentiment. Support for negation, amplifiers, diminishers, and slang is also included.
Extensive developer support: Software development kits for all major programming languages including C, C++, Java, Ruby, .NET, Python, Perl. Easy-to-integrate web services API for Internet applications.
Get started using AlchemyAPI by registering for an API key at www.alchemyapi.com
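As a rough illustration of how a hosted text-analytics web service of this kind is typically called, here is a short sketch in Python. The endpoint URL, parameter names, and response fields shown are assumptions for illustration rather than verbatim from AlchemyAPI's documentation; consult the vendor's API reference for the actual interface.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and parameter names -- check the vendor's API reference before use.
ENDPOINT = "http://access.alchemyapi.com/calls/text/TextGetRankedNamedEntities"
API_KEY = "YOUR_API_KEY"

def extract_entities(text):
    """POST a block of text and return the parsed JSON response."""
    data = urllib.parse.urlencode({
        "apikey": API_KEY,
        "text": text,
        "outputMode": "json",
    }).encode("utf-8")
    with urllib.request.urlopen(ENDPOINT, data=data) as resp:
        return json.loads(resp.read().decode("utf-8"))

result = extract_entities("JetBlue announced new service from Boston on Monday.")
for entity in result.get("entities", []):
    print(entity.get("type"), entity.get("text"), entity.get("relevance"))
```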
  • 47. Text/Content Analytics 2011: User Perspectives
Solution Profile: Attensity
Customer conversations with and about a company contain critical information that can have a direct impact on the health of the business. Millions of these conversations are taking place in real time online in blogs, on review sites, and across social media networks. Additional customer insights are often stored internally throughout the enterprise in the form of emails, call center notes, surveys, and CRM databases.
Attensity helps the world's leading brands mine these customer conversations, extract the key insights from those conversations, and act on those insights. For over ten years, Attensity has been an innovator and leader in the field of text analytics and natural language processing. With seven patents to its credit, the company's award-winning software is used by the world's leading brands, including AT&T, Charles Schwab, CitiGroup, Geico, JetBlue, RBC, Travelocity, Wells Fargo, Whirlpool, and many more, to incorporate the voice of the customer into their most vital business processes.
Attensity's award-winning suite of customer analytics and engagement solutions is built on a world-class, massively scalable text analytics platform that gives organizations the key capabilities they need to leverage customer conversations as a business asset, including:
LISTEN: Attensity's listening post monitors information from over 75 million social media and online sources in 17 different languages, including the full Twitter Firehose, Facebook, forums, blogs, online news, video sites, review sites, and more. And while online channels are increasingly important, Attensity also recognizes there is a wealth of data that can be mined from your internal channels. Attensity's multi-channel approach lets you combine online conversation data with information found in direct interactions with your company through emails, call center logs, private forums, customer surveys, and more. As a result, you get a holistic view of your customer across all of your conversation sources.
ANALYZE: Attensity Analyze is the next generation in customer analytics applications designed with the business user in mind. It sets new industry standards for accuracy, scalability, and ease of use, making it possible for large enterprise organizations to analyze high volumes of customer conversations across multiple channels—including social media, emails, survey responses, text messages, CRM notes, and more—and quickly extract valuable insights to drive business decisions. Attensity Analyze comes with over 100 out-of-the-box dashboards and reports designed for business users to quickly gain insights into customer conversations, including:
 Customer behavior profiling
 Detailed sentiment analysis
 Root cause of issues
 Hotspotting (emerging issues)
 Net Promoter© analysis
 Predictive analytics
 and more
In addition, Attensity Analyze includes a drag-and-drop reporting wizard and analysis tools that allow business users to quickly and easily build ad hoc reports around the themes that matter to them.
RELATE: Attensity enables organizations to combine the insights gleaned from unstructured customer conversations with the structured data that exists within your organization. Attensity's open platform effectively leverages your existing CRM, business intelligence, data warehouse, and other investments.
For example, you might want to better understand how the content or sentiment expressed in customer emails affects call center time-to-resolution, or how various actions affect your net promoter scores.
  • 48. Text/Content Analytics 2011: User Perspectives
ACT: Attensity is the only text analytics platform that lets you close the loop on customer conversations with our proactive, multi-channel response application. Attensity Respond handles all incoming emails, contact form submissions, comments in customer forums, letters, SMS messages, tweets, and other types of customer communications, and categorizes, routes, and queues them using Attensity's advanced pattern recognition technology. It then allows those messages to be tracked as trouble tickets and generates a suggested response that can be automatically sent to the customer or routed to a contact center agent for servicing by the appropriate person or team.
Quick Start Industry Solutions
Across industries, companies are optimizing the customer experience with Attensity solutions tailored to their specific business needs. Attensity's industry solutions include specific topics, categories, analytics, and reports tailored to the needs of global leaders in retail banking, telecom, insurance, hospitality, high-tech manufacturing, and other key markets that value the customer experience. Following are just two examples of our customer successes:
JetBlue Airways Success Story
New York-based JetBlue Airways has created a new airline category based on value, service, and style. Known for its award-winning service and free TV as much as its low fares, JetBlue is now pleased to offer customers the most legroom throughout coach (based on average fleet-wide seat pitch for U.S. airlines). JetBlue is also America's first and only airline to offer its own Customer Bill of Rights, with meaningful compensation for customers inconvenienced by service disruptions within JetBlue's control.
JetBlue Airways currently uses Attensity in its customer service organization to uncover customer issues, requirements, and overall sentiment about the airline. The company's pilot project demonstrated a significant ability to find key information about customer sentiment and tangible data around how to augment its services. JetBlue uses Attensity to proactively manage and analyze all freeform customer feedback to improve service, marketing, sales, and the products they offer.
“From our Customer Bill of Rights to our customer advisory council, JetBlue is dedicated to bringing humanity back to air travel,” said Bryan Jeppsen, Research Analyst Manager. “One of the best ways to do that is to listen — truly listen — to our customers. Our commitment with Attensity enables us to mine subtle but important clues from all forms of customer communications to continue improving all aspects of our company. We're eager to learn as much as we can, and we're excited to have Attensity's simple to use yet sophisticated software at our service.”
JetBlue customer service analysts use Attensity daily to cull insights and actions from feedback. “Attensity offers us the unprecedented ability to automatically extract customer sentiments, preferences, and requests we simply wouldn't find any other way,” according to Jeppsen. “Attensity VOC enables us to intelligently structure, search, and integrate the data into our other business intelligence and decision-making systems.”
Whirlpool
Whirlpool Corporation is a long-time Attensity customer, and the world's leading manufacturer and marketer of major home appliances, with annual sales of approximately $17 billion, 67,000 employees, and nearly 70 manufacturing and technology research centers around the world.
As a customer-centered company, Whirlpool needed to understand the root cause of pain points and brand, product, and service related issues.
When Whirlpool started with Attensity in 2004, the company wanted to be able to leverage the web and over 8.5 million annual customer and repair visit interactions captured in service notes to drive marketing programs, product development, and quality initiatives. Whirlpool has done just that and more. With over 300 Attensity VoC users worldwide, Whirlpool listens and acts on customer data in the service department, its innovation and product development groups, and in the market every day. With Attensity VoC, Whirlpool gets early warning of safety and warranty issues and has been able to mitigate expensive recalls through rapid change-out of defective parts. In addition to Attensity-fueled product quality improvements, Whirlpool better understands customers' needs and wants – and the competition and what they are doing to win over customers.
Michael Page, Development and Testing Manager for Quality Analytics at Whirlpool Corporation, affirms these types of benefits: “Attensity's products have provided immense value to our business. We've been able to proactively address customer feedback and work toward high levels of customer service and product success.”
www.attensity.com
  • 49. Text/Content Analytics 2011: User Perspectives
Solution Profile: Basis Technology
The Rosette® Linguistics Platform is the world's most widely used component library for multilingual text retrieval and analysis. Rosette provides automatic language identification, text normalization, entity extraction, and entity translation in over thirty languages, all in a single, unified framework.
Text Analytics for the Global Marketplace
Early adopters of text analytics have historically been forced to choose between breadth (i.e. scope of language coverage) and depth (i.e. strength of precision and recall). With Rosette, such compromises are a thing of the past. We enable sophisticated applications incorporating intelligent search, faceted navigation, and identity resolution to be delivered to the global marketplace, regardless of whether the audience speaks English, German, Arabic, or Chinese.
“As the market expands internationally, one last significant market differentiator is the ability to mine text in foreign languages. Both Teragram and Inxight possessed this capability. But with those companies being acquired, Basis Technology is the main pure play in the market with multilingual text analysis capabilities.” — Nick Patience, Research Director, Information Management, The 451 Group
Flexible Building Blocks
Rosette enables your application to ingest unstructured text, process it intelligently, and put it to work. These building blocks can be assembled into flexible solutions that fit your unique requirements, working seamlessly within your existing workflows while enabling new languages, character sets, and data sources.
 Automatically identify the language or languages present in a document for correct analysis, filtering, and retrieval of text;
 Increase the precision and recall of full-text search in multiple languages;
 Extract important concepts from unstructured text, such as names, locations, and dates;
 Translate names from foreign writing systems into English; and
 Match names extracted from documents against lists, regardless of language or writing system.
Advanced text analytics is complemented by ease of integration. Rosette components plug into an application through a single API that supports C, C++, Java, or .NET. Developers can utilize the modules they need within their application and workflow and easily add new capabilities or languages as requirements evolve.
  • 50. Text/Content Analytics 2011: User Perspectives
Search Multilingual Documents with High Accuracy
Rosette enables search-based applications to process multilingual documents by providing essential linguistic services, including tokenization, lemmatization, and decompounding. Each human language presents unique challenges, so Rosette combines multiple technologies. For East Asian languages, Rosette employs morphological analysis to correctly tokenize input text. For European languages, Rosette performs lemmatization to identify the dictionary form of each word. Languages such as Arabic, Persian, and Urdu require complex orthographic analysis to achieve accurate search results.
“Google selected Basis Technology to provide the Asian linguistic technology needed to create the ultimate Chinese, Japanese, and Korean search engine. This marks a key milestone in establishing Google as the preferred search engine for Internet users worldwide.” — Urs Hölzle, Senior Vice President, Google
This same linguistic technology—licensed by search powerhouses Amazon, Bing, Google, and Yahoo!—is also now available to the rapidly-growing community of developers building applications based on Apache Solr and Apache Lucene.
Combine Entity Extraction with Entity Resolution
Rosette enables the classification, management, and analysis of large volumes of unstructured text by identifying entities—names, places, identifiers, and other unique words or phrases—that establish the real meaning in text. Three types of entity detection technology—statistical modeling, regular expressions, and gazetteers—are applied to achieve industry-leading accuracy.
“We're really impressed with the Rosette Name Translator's capabilities and how it has improved our OFAC name checking process. It translated 330,000 Arabic names into Roman script very quickly, consistently, and accurately.” — Peter Wilkinson, VP of Application Development, NCB Capital
After entities are extracted, Rosette matches names of people, places, and organizations against a single, universal index. For example, looking for “Mao Zedong” returns matches written in English (such as “Mao Ze Dong” or “Mao Tse Tung”); in Simplified Chinese (毛泽东); in Traditional Chinese (毛澤東); or even in other languages such as Arabic (ماو تسي تونغ). Unlike legacy solutions driven by lists containing billions of spelling variants, Rosette analyzes the intrinsic structure of each name component, and performs an intelligent comparison using advanced linguistic algorithms. This approach is not limited to a particular list of variants and reduces the likelihood of both “false positives” and “false negatives”.
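To make the idea of comparing name structure (rather than looking names up in a list of precomputed variants) concrete, here is a deliberately simplified sketch. It is not Basis Technology's algorithm: it only normalizes each name, aligns components, and scores character overlap, but it shows why a structural comparison can tolerate spelling variation that an exact-match list cannot.

```python
import unicodedata

def normalize(name):
    """Lowercase, strip accents/diacritics, and split a name into components."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.split()

def component_similarity(a, b):
    """Crude per-component score: shared leading characters over the longer length."""
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    return prefix / max(len(a), len(b))

def name_match_score(name1, name2):
    """Average, over components of name1, of the best-matching component in name2."""
    parts1, parts2 = normalize(name1), normalize(name2)
    if not parts1 or not parts2:
        return 0.0
    best = [max(component_similarity(p1, p2) for p2 in parts2) for p1 in parts1]
    return sum(best) / len(best)

print(name_match_score("Mao Zedong", "mao ze dong"))   # high partial overlap
print(name_match_score("Mao Zedong", "Mao Tse Tung"))  # weaker overlap
```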
  • 51. Text/Content Analytics 2011: User Perspectives

Solution Profile: Language Computer Corp.

Since 1995, Language Computer has been at the forefront of developments in natural language processing technology. Our latest product—Cicero On-Demand—is a Java-based solution for text analytics. Internet startups, large corporations, and government agencies rely on Cicero On-Demand for extracting semantic information from large collections of unstructured text.

"We evaluated 7 extraction tools, and found Language Computer's Cicero to be the best in terms of quality and price. We run Cicero 24-7 to assist in our tagging process to identify topics in millions of documents and have been very satisfied with its speed, accuracy, and ease of use."
— Scott Decker, CTO, PublishThis

We regularly submit our technology to community evaluations and have participated in more than 20 different competitions, including TREC, DUC, ACE, and TAC. We invite you to review our performance.¹

¹ See results of past competitions: (TAC) http://www.nist.gov/tac; (TREC) http://trec.nist.gov; (DUC) http://duc.nist.gov; and (ACE) http://www.itl.nist.gov/iad/mig/tests/ace.

What sets Language Computer apart?

Customization Service: In the 2009 Text Analytics survey, the top requested feature for text analytics was custom lexicons, entities, and relations. Customization lets you define what to extract from unstructured text. Our team of software engineers, linguists, and researchers has successfully provided this service to companies and government agencies for many years.

Broad Out-of-the-Box Coverage: If you need rich semantics, then you'll love Cicero On-Demand's 150 kinds of entities, 80 relations, and 40 topics out of the box. Enjoy domain-specific insights into dozens of vertical domains, including finance, medical, sports, and others.

Scalable and Affordable: With Cicero On-Demand's hosted solution, you purchase only the services you need when you need them—resulting in substantial savings—and the services effortlessly scale with your growing demands.

Expertise: Language Computer's researchers and language specialists continuously contribute to advancements in the natural language processing field.

Question Answering: Over the past 13 years, Language Computer has developed industry-leading question answering technology. Question answering builds on text analytics to allow you to communicate in a natural way with the computer and easily retrieve the knowledge and facts woven within unstructured text.

Text Analytics? Why Can't I Just Ask the Computer a Question? Learn more about the future of language processing: www.languagecomputer.com/whatsnext
  • 52. Text/Content Analytics 2011: User Perspectives

Product Spotlight: Cicero On-Demand

Text Analytics

Entity Extraction: Language Computer's entity extraction technology identifies more than 150 different types of quantitative, temporal, and named entities. Entity extraction is available in English, Chinese, Arabic, and select European languages.

Fact and Relation Extraction: Cicero On-Demand's relation extraction identifies more than 80 specific types of relations that exist between entities in text.

Entity Coreference and Disambiguation: The entity coreference system identifies named and pronominal references, within and across documents, which refer to the same entity, even when different expressions are used. This is enhanced through our entity linking system, which provides links from entities to an external knowledge base.

Topic Detection: Cicero On-Demand identifies the presence of 40 topics in text, including those from the top-level Google AdWords category targeting codes. Topics identified are given a numeric strength rating.

Time and Location Normalization: Language Computer's Pinpoint technology converts mentions of dates, times, and locations into standard formats such as latitude and longitude. This normalization allows you to easily visualize your results.

Support for Many Document Formats: Many input document formats are supported, including XML, HTML, PDF, MS Word (DOC/DOCX), PowerPoint (PPT), plain text, and email.

Try the online demo today: demo.languagecomputer.com/cicero

Platform Options

Server Licensing: Cicero On-Demand is easily deployed "inside the firewall" when security and privacy are essential. All components are implemented in pure Java and accessed either through Java libraries or an easy-to-use REST API.

Hosted Solution: Leave the work of managing the data center to us. With our software-as-a-service (SaaS) solution you pay only for what you use, when you use it.
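Since Cicero On-Demand advertises access through Java libraries or a REST API, a minimal client might look like the Java sketch below. The endpoint path, query parameter, and response handling are placeholder assumptions for illustration, not the product's documented interface.

// Illustrative sketch only: the endpoint URL, parameters, and response format shown
// here are hypothetical placeholders, not Cicero On-Demand's actual REST API.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestExtractionClient {
    public static void main(String[] args) throws Exception {
        String document = "Acme Corp. acquired Widget Inc. in Dallas on March 3, 2011.";

        // POST the raw text to a hypothetical entity/relation extraction endpoint.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.invalid/cicero/extract?types=entities,relations"))
                .header("Content-Type", "text/plain; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(document))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A real integration would parse the returned annotations (JSON or XML, depending
        // on the service) into application objects; here we simply print the raw body.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}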
  • 53. Text/Content Analytics 2011: User Perspectives

Solution Profile: Lexalytics

Lexalytics builds a text analysis engine that's purposed for OEM integration into your application or service. For 9 years, we've been refining and improving our engine: "Salience."

Salience is a multilingual text analysis engine that is currently integrated into systems for business intelligence, (social) media monitoring, reputation management, automated trading, survey analysis, customer satisfaction, and more.

Unlike other text analytics companies, Lexalytics focuses on your application, not ours.

Salience integrates painlessly with your BI/dashboard/analytics or data warehouse system, and works "right out of the box," quickly adding value to your business or service – letting you focus on building your service and not on the technical details of text analytics.

We don't believe in black boxes, so every parameter is available for you to tune and optimize.

Put simply, Salience has the strength to analyze the entire blogosphere, the subtlety to look for slight trends in confidential records, and the flexibility for completely novel problems.

Named Entity Recognition: Sure – which of our 5 algorithms do you want to use? All of them? OK. Are you fine with our model or do you want to train your own? Up to you.

Sentiment Analysis: Would you like that on your document? Sentence? Named entity? Theme? Do you want to use a model or a dictionary approach? Are you happy with our 250k+ entry dictionary, or would you like to have lots of different dictionaries for each of your customers? Fine by us. (A simple sketch of the dictionary approach appears below.)

Named Entity Relationships: Care to configure gene-regulation relationships? Merger & acquisition relationships? Speaker, topic, and opinion? Can do.

Summarization: How summarized do you want that?

Context Extraction: Bi-grams? Noun phrases? Context-scored themes? Yes.

File types: You name it; we've probably processed it.
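The dictionary approach to sentiment mentioned above can be illustrated in a few lines of Java: look up each token's polarity in a sentiment lexicon and average the hits. The tiny lexicon and the averaging rule are assumptions made for the example; they are not how Salience scores sentiment.

// A minimal sketch of dictionary-based sentiment scoring: average the polarities of
// the lexicon words found in the text. The lexicon and scoring rule are illustrative
// assumptions, not Lexalytics' Salience implementation.
import java.util.Map;

public class DictionarySentimentSketch {
    private static final Map<String, Double> LEXICON = Map.of(
            "great", 1.0, "love", 0.8, "good", 0.5,
            "slow", -0.5, "poor", -0.8, "terrible", -1.0);

    // Average polarity of lexicon words found in the text; 0 if none are found.
    static double score(String text) {
        double sum = 0;
        int hits = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            Double polarity = LEXICON.get(token);
            if (polarity != null) {
                sum += polarity;
                hits++;
            }
        }
        return hits == 0 ? 0.0 : sum / hits;
    }

    public static void main(String[] args) {
        System.out.println(score("The service was great, but checkout was slow."));  // mildly positive
    }
}

In practice the interesting engineering is exactly what the profile alludes to: per-customer dictionaries, scoring at the sentence, entity, or theme level, and combining the dictionary with a trained model.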
  • 54. Text/Content Analytics 2011: User Perspectives

4 out of the top 5 social media monitoring companies use Lexalytics software.

We are Endeca's preferred text analytics solution, powering many of their large enterprise customers.

Lexalytics animates several of the top survey analysis companies, including Maritz and Service Management Group (SMG).

We are the oomph behind the Thomson Reuters automated trading service.

We enhance Dow Jones' Factiva information service.

"Lexalytics' Salience Engine scales to handle our trillions of entity/concept relationships. Yes, that's trillions with a 'T'."
— David Seuss, CEO, Northern Light Group LLC

"It took 2 engineers only 4 days to integrate Lexalytics' Salience engine into our content processing pipeline."
— Nick Halstead, CEO and Founder, DataSift
(Lexalytics is processing 9 TB/day for DataSift.)

Lexalytics has won business in many different data-analysis vertical industries. Our customers, in turn, are using our engine to analyze data for almost every vertical industry on Earth. Every day, you come into contact with thousands of products and services belonging to companies improved by our software. And most of those companies are blissfully unaware of the text analytics engine that's doing the work for them; they just know that their reputation management partner, or their content management partner, or their customer satisfaction partner is giving them great information.

We've got more non-English languages.

We've got the world's first commercial integration of the knowledge of Wikipedia© into the speed and power of algorithmic text analysis. We're calling this the "Concept Matrix™." You're going to call it "Awesome."

Based on the Concept Matrix™ and other technology, we're shipping some new features:

  • Conceptual Topics: the ability to define a classification subject with only a few simple concept keywords. Food, for example, could be defined as nothing more than (Food, Restaurant, and Dining). This definition would sweep in content about gourmet pizza or "a cool new bar." (A rough sketch of this idea appears at the end of this profile.)
  • Facets: extracts important subjects and their pertinent attributes.
  • Collections: utilizes metadata across many documents to produce enhanced results.
  • Improved sentiment scoring via the understanding of subjective vs. objective phrasing.

Yes, you need more detail on these. That's why you're going to need to reach out to us for more information. If you want an out-of-the-box text analysis application, we're not the right vendor. However, if you want to make your analytics application/service truly different and great, then you need to come to our website (http://www.lexalytics.com) or call us at 1-800-377-8036 x1.
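The Conceptual Topics feature is easier to grasp with a toy example. The Java sketch below classifies a document against a topic defined by a few seed keywords, using a small hand-written term relatedness table; a real system would derive such relatedness scores from a large corpus such as Wikipedia. The table, threshold, and matching rule are illustrative assumptions, not the Concept Matrix itself.

// A rough sketch of "conceptual topic" classification: a topic is defined by a few
// seed keywords, and a document matches if any of its terms is sufficiently related
// to a seed. The tiny relatedness table is an illustrative stand-in for a
// corpus-derived model; it is not Lexalytics' Concept Matrix.
import java.util.List;
import java.util.Map;

public class ConceptualTopicSketch {
    // relatedness("pizza", "food") etc. would normally come from a learned model.
    private static final Map<String, Map<String, Double>> RELATEDNESS = Map.of(
            "pizza",     Map.of("food", 0.9, "restaurant", 0.8, "dining", 0.6),
            "bar",       Map.of("dining", 0.5, "restaurant", 0.6),
            "quarterly", Map.of("finance", 0.7));

    // True if any document term is related to any seed keyword above the threshold.
    static boolean matchesTopic(List<String> documentTerms, List<String> seeds, double threshold) {
        for (String term : documentTerms) {
            Map<String, Double> related = RELATEDNESS.getOrDefault(term, Map.of());
            for (String seed : seeds) {
                if (related.getOrDefault(seed, 0.0) >= threshold) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> foodSeeds = List.of("food", "restaurant", "dining");
        System.out.println(matchesTopic(List.of("gourmet", "pizza"), foodSeeds, 0.5));      // true
        System.out.println(matchesTopic(List.of("quarterly", "earnings"), foodSeeds, 0.5)); // false
    }
}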
  • 55. Text/Content Analytics 2011: User Perspectives

Solution Profile: Medallia

THE POWER OF TEXT ANALYTICS—RIGHT IN YOUR FEEDBACK MANAGEMENT SYSTEM

Customer comments often yield the best business insights, but for most companies, the data they generate is just too much to read and analyze. To tackle this, companies traditionally employed standalone text analytics solutions that required specialized in-house experts to make any use of them. Unfortunately, with these solutions, companies were unable (or found it horribly difficult) to leverage other important aspects of the feedback—like quantitative scores. So it was no surprise that when we interviewed some of the greatest companies in the world about their text analytics initiatives, we found nothing but frustration and failed expectations.

With that in mind, we set out to do something different. We wanted to create a fully integrated text analytics solution that provided rich results without sacrificing ease of use. More than that, we wanted those insights to be accessible across your organization. It had to be possible. And it is.

Medallia Text Analytics: powerful, accessible insights for your entire organization.

AN EASY-TO-USE, INTEGRATED SOLUTION

One of the difficulties of traditional text analytics initiatives was the need to run reports in multiple disparate applications, then cross-reference findings for further analysis. At best, it was burdensome and time-consuming, and for most, it simply never happened.

Medallia is different.

Text Analytics is available right inside the Medallia system, so its insights are easy to share with people across your organization. There's nothing new to learn, no new software to install, no integrations to worry about. Medallia Text Analytics generates intuitive, easy-to-use reports that users can understand with little to no formal training.

Plus, by integrating text analytics into Medallia's powerful reporting tools, you have the luxury of viewing summary data with our rich homepage modules, or digging deeper and fully researching a topic with our detailed reports.

ANALYZE COMMENTS AND SCORES TOGETHER WITH THE MEDALLIA IMPACT INDEX
  • 56. Text/Content Analytics 2011: User Perspectives

Natural language processing, linguistic connections, machine learning—blah, blah, blah. We can talk that talk (and obviously we do), but what we set out to do was deliver text analytics that really helps you figure out how to take action.

To do that, we've created a simple number, computed by a proprietary algorithm and called the Medallia Impact Index, that allows you to quickly analyze comments and scores together.

The Medallia Impact Index (or MII) is Medallia's proprietary methodology for determining which comment topics have the greatest impact on overall satisfaction and loyalty scores. Every comment affects your averages, and with the MII, you'll know which responses are affecting them the most.

The MII is a statistical indicator of how much a certain comment topic affects your overall customer experience. It is based on both verbatim comments and quantitative scores, such as your overall satisfaction score or likelihood-to-recommend score. A positive MII score shows that people talking about a particular comment topic are more satisfied than average; a topic with a negative MII is one that people are less satisfied with than average. To improve your customer experience, you should focus on the topics with the most negative MII scores. (A simple illustration of this idea appears at the end of this profile.)

By using the Medallia Impact Index in analyzing your feedback, you can know which comment topics have the biggest impact on customer experience. That allows you to target your improvement plan appropriately to have the greatest impact and ROI for your efforts.

And it's available only from Medallia.

DEEP, ACTIONABLE INSIGHTS

By marrying text analytics with quantitative survey scores, our proprietary engine better weighs the importance of comments, yielding accurate results. Simply put: our text analytics engine is better.

Medallia Text Analytics allows you to uncover powerful insights:

  • Instantly identify which locations or employees contribute most to scores (both positively and negatively) on every topic.
  • Discover which customer segments are most affected by key issues.
  • Detect emerging trends, allowing you to take action immediately, before issues become truly troublesome.

ABOUT MEDALLIA

Medallia's solutions enable companies to gather, monitor, and act on feedback from customers, partners, and employees. Customers include global financial services, retail, high-tech, business-to-business, and hotel companies.

For more information about Medallia's Customer Feedback Management services or platform, visit www.medallia.com or call 650-321-3000.
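The intuition behind an impact index of this kind can be shown with a difference-of-means calculation: compare the average score of respondents who mention a topic with the overall average. The Java sketch below captures only that intuition; Medallia's actual MII algorithm is proprietary and is not reproduced here.

// A minimal sketch of a topic impact indicator: the gap between the mean satisfaction
// of respondents who mention a topic and the overall mean. This simple difference of
// means is only an illustration of the concept; it is not Medallia's MII formula.
import java.util.List;

public class TopicImpactSketch {
    record Response(double satisfactionScore, List<String> topics) { }

    // Positive result: respondents mentioning the topic are happier than average;
    // negative result: they are less satisfied than average.
    static double topicImpact(List<Response> responses, String topic) {
        double overallSum = 0, topicSum = 0;
        int topicCount = 0;
        for (Response r : responses) {
            overallSum += r.satisfactionScore();
            if (r.topics().contains(topic)) {
                topicSum += r.satisfactionScore();
                topicCount++;
            }
        }
        double overallMean = overallSum / responses.size();
        return topicCount == 0 ? 0.0 : (topicSum / topicCount) - overallMean;
    }

    public static void main(String[] args) {
        List<Response> responses = List.of(
                new Response(9, List.of("staff")),
                new Response(4, List.of("checkout", "parking")),
                new Response(8, List.of("staff", "checkout")),
                new Response(3, List.of("parking")));
        System.out.println(topicImpact(responses, "staff"));    // positive: +2.5
        System.out.println(topicImpact(responses, "parking"));  // negative: -2.5
    }
}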
  • 57. Text/Content Analytics 2011: User Perspectives

Solution Profile: SAS

Since 1976, SAS has delivered proven solutions that drive innovation and improve performance. SAS provides an extensive suite of business analytics software and services that give organizations a unified way to solve complex business problems, drive sustainable growth through innovation, and anticipate and manage change.

SAS has an impact on everyone, every day. From the roads you travel and the mortgage loans you purchase, to the brand of cereal you eat and the cell phone plan you select, SAS plays a role in your daily life.

To lead your organization in today's challenging economic climate, you need fact-based answers that you and others can believe in. Traditional approaches to decision support have not yielded optimal results. SAS provides the world's best analytics embedded in a framework that supports the entire decision-making process. This helps SAS customers find answers to immediate business issues and implement their visions for the future. By delivering insights gleaned from data about customers, suppliers, operations, performance, and more, SAS Business Analytics empowers you to:

  • Provide the right answers to lead your business forward, stay ahead of competitors, and continuously improve the way you work.
  • Solve complex business problems and seize new opportunities.
  • Enhance business performance and optimize value creation.

The SAS Business Analytics Framework combines the strengths of SAS solutions and technologies with SAS' commitment to innovation, continuous technical support, professional services, training, and partnerships. By bringing together the combined strengths of SAS business solutions and the technology underpinnings of those solutions, the SAS Business Analytics Framework delivers competitive insights that drive evidence-based decisions.

What is SAS® Text Analytics?

Maximize the value buried in unstructured data assets.

SAS Text Analytics is a framework, supporting 27 different languages (as well as dialects), that enables businesses to maximize the value of the information encapsulated within large quantities of text that is generated, acquired, or held in content repositories: documents, Web pages, customer communications, online discussions, blogs, call center notes fields, email, open-ended survey responses, claims forms, and sales returns. It does so by extracting contextually relevant information and by interpreting, mining, and structuring that information to improve findability and reveal concealed patterns, sentiment, and relationships.

SAS® Text Analytics Offerings

SAS® Enterprise Content Categorization

SAS Enterprise Content Categorization is an easy-to-use application that lets organizations automatically create better relevance, enhanced usability, and quality content from search and retrieval activities. By applying natural language processing and advanced linguistic techniques, it correctly parses, analyzes, and extracts content entities, facts, and events, creating the necessary metadata to accurately represent a document's keywords and phrases.

The metadata is applied to incoming documents before other systems index the content. This added layer of intelligence, which can include definitions for identical or duplicate matches, and which is defined by the documents themselves, provides the information that search and content management systems must have in order to pinpoint only the relevant information.
  • 58. Text/Content Analytics 2011: User Perspectives

SAS Enterprise Content Categorization drives faster, more efficient information organization and document retrieval, and drastically reduces the time spent searching for documents and the overhead associated with manual tagging, retrospective indexing, and search and content management system replacement.

SAS® Text Miner

SAS Text Miner provides a rich suite of linguistic and analytical modeling tools for discovering and extracting knowledge from multiple text documents. Across electronic text snippets, document collections, and Web downloads, you can automatically identify patterns as topics and themes, defining explicit associations between terms and phrases. Documents can have unique membership in only one cluster, or they can have membership in multiple clusters (called text topics). Text mining discovers previously unknown patterns in document collections, and once topics or clusters are identified, the text is numerically represented for use in predictive modeling (a simple sketch of one such representation follows the maturity model figure below).

SAS Text Miner supports 27 different languages (as well as dialects). A text import node simplifies inclusion of Web and file crawling. The easy-to-use software automatically identifies the language of each document, provides document cutoff definitions, makes full use of natural language processing, document conversion, and parsing routines, and includes a replacement node for the text miner node, which can be used for even more control over clustering parameters.

SAS® Sentiment Analysis

SAS Sentiment Analysis delivers superior results for understanding opinions because it harnesses the power of statistical techniques and linguistic rules to provide greater accuracy in the assessment of sentiment extracted from source documents. It provides detailed feature and attribute sentiment evaluations through highly configurable semantic rules.

Use the software to monitor and assess consumer sentiment through rich visualizations, reports, and monitoring tools that make it easier to identify trends and potential problems, evaluate competitive standing, and track changes in sentiment over time. With this insight, organizational teams that monitor product capability, marketing and branding effectiveness, and internal operations gain feedback that's useful for marketing and research and development, as well as service and logistics. With these results, you can identify emerging issues before they become larger-scale problems, monitor opinions after making adjustments and changes to process improvements, and gauge the impact of promotions and business communications.

SAS® Ontology Management

SAS Ontology Management enables the management of content and the maintenance of ontologies in enterprise content repositories and databases. It creates semantic terms that are used to organize previously disassociated and isolated text repositories. It creates and maintains consistent and centralized metadata across text collections so that information search-and-retrieval engines systematically identify common concepts. This helps organizations maximize the value of their semantic repositories and collections by linking them together with consistently defined relationships to quickly and accurately address information search-and-retrieval needs.
Text Analytics Maturity Model

[Figure: Text Analytics Maturity Model. As adoption of people, policies, and technology moves from low to high, risk falls and value rises across four stages: Undisciplined ("Think locally, act locally"), with manual, redundant, and error-prone content tagging and retrospective indexing; Reactive ("Think globally, act locally"), with automatic content categorization and taxonomy management from text and unstructured data; Proactive ("Think globally, act collectively"), with discovery of trends and patterns, extraction of meaning (e.g., facts, sentiment), and consolidation, summarization, and aggregation of content; and Governed ("Think globally, act globally"), with structure created for integration with structured data and contextual analysis to inform and automate business processes and increase the fidelity of predictive models.]
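The Text Miner description above notes that once topics or clusters are identified, the text is numerically represented for use in predictive modeling. A common, generic way to do that is a TF-IDF vector per document, sketched below in Java for illustration; this is an assumption for explanatory purposes, not a description of SAS Text Miner's internal representation.

// A minimal TF-IDF sketch: weight each term in each document by its frequency in
// that document and by how rare it is across the collection. Generic technique,
// shown only to illustrate "numeric representation" of text; not SAS Text Miner's method.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TfIdfSketch {
    // Term frequency of each word in one document.
    static Map<String, Integer> termCounts(String doc) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : doc.toLowerCase().split("\\W+")) counts.merge(t, 1, Integer::sum);
        return counts;
    }

    // TF-IDF vector for each document in the collection.
    static List<Map<String, Double>> tfidf(List<String> docs) {
        List<Map<String, Integer>> tf = docs.stream().map(TfIdfSketch::termCounts).toList();
        Map<String, Integer> docFreq = new HashMap<>();
        for (Map<String, Integer> counts : tf)
            for (String term : counts.keySet()) docFreq.merge(term, 1, Integer::sum);

        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> counts : tf) {
            Map<String, Double> vec = new HashMap<>();
            for (var e : counts.entrySet()) {
                double idf = Math.log((double) docs.size() / docFreq.get(e.getKey()));
                vec.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "customer complained about slow checkout",
                "customer praised friendly staff",
                "staff resolved the checkout issue");
        // Terms common to all documents get low weight; distinctive terms get high weight.
        tfidf(docs).forEach(System.out::println);
    }
}

Vectors like these are what clustering and predictive models consume downstream, which is the handoff the profile is describing.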
  • 59. Text/Content Analytics 2011: User Perspectives

Solution Profile: Sybase

Sybase IQ is a highly optimized analytics server specifically designed to deliver superior performance for mission-critical business intelligence, analytics, and data warehousing solutions on any standard hardware and operating system. The first column-based database in the market, Sybase IQ is recognized by industry analysts as a leader in data warehousing and specialty analytic databases, with more customers than other specialty data warehousing vendors combined.

Sybase IQ allows you to answer complex questions involving hundreds of terabytes or petabytes of data with extreme speed and efficiency. Sybase IQ accelerates queries by factors of 10 to 100, the difference between 10 minutes and 16 hours, often with no database tuning and using less hardware. Sybase IQ's compression technology allows enterprises to analyze 10 times more data without requiring additional storage space, and its unique grid architecture enables businesses to support large user communities running a wide range of analytic workloads.

SYBASE IQ AND UNSTRUCTURED DATA

Sybase IQ provides the means to store and retrieve unstructured data objects as part of the same repository as transactional or analytical data. Unstructured data may include images, maps, documents (PostScript files, word processing files, presentations, proprietary formats), audio, video, and XML files. Sybase IQ can manage individual unstructured data objects containing terabytes or even petabytes of data, as needed. By bringing relational and unstructured data together into a single location, Sybase IQ enables an organization to access both types of data using the same application and the same interface.

Figure 1 – Store and analyze structured and unstructured data in one database
  • 60. Text/Content Analytics 2011: User Perspectives

FULL TEXT SEARCH AND ANALYSIS

Making unstructured data objects accessible within the database environment is just the beginning. Honing in on the exact information needed within a large body of text can be a daunting task. That task becomes more manageable with Sybase IQ, which enables businesses to search and analyze text content, including the ability to:

  • Search for both words and phrases
  • Perform Boolean and proximity searches
  • Score results based on relevance

(A sketch of issuing such a query from application code appears at the end of this profile.)

ELIMINATING RISK AND COSTLY WORKAROUNDS

When using Sybase IQ to manage structured and unstructured data together, database administrators find that they have control over all the data used by the application, not just the structured data. This eliminates the need for complex logistical arrangements with system administrators who manage unstructured data in file systems. It also eliminates the risk of data unavailability should the file systems experience failure or downtime. Centralized data access means a more stable and more secure system. The Sybase IQ UDA option brings the full power of an enterprise-class DBMS to the management of unstructured data.

DRAMATIC SPEED IN LOADING, RETRIEVING, SEARCHING, AND QUERYING

Sybase IQ supports parallel and bulk data loads, which improve loading of unstructured data by reducing the I/O load on the disk. This approach to data loading is also much faster than loading data into traditional file systems, which are less efficient at handling large sequential files. Unstructured data can also be extracted out of Sybase IQ with tremendous speed and efficiency. Apply Sybase IQ's unique data compression technology to unstructured data objects, especially text data, and you get much more efficient storage of this data than a file system can support. Plus, the data compression makes retrieval of the unstructured data dramatically faster.

ABOUT SYBASE

Sybase, an SAP® company, is an industry leader in delivering enterprise software to manage, analyze, and mobilize information. We are recognized globally as a performance leader, proven in the most data-intensive industries and across all major systems, networks, and devices. Our information management, analytics, and enterprise mobility solutions power the world's most mission-critical systems in financial services, telecommunications, manufacturing, and government.

For more information: www.sybase.com
Read Sybase blogs: blogs.sybase.com
Follow us on Twitter at @Sybase
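Because the unstructured text lives in the same database as the structured data, an application can retrieve both in one query over a standard interface such as JDBC. In the Java sketch below, the connection URL, table, columns, and CONTAINS-style predicate are placeholder assumptions invented for illustration; the exact driver URL and full-text search syntax depend on the Sybase IQ version and how its text indexes are configured.

// A rough sketch of querying text stored alongside relational data via JDBC.
// The connection URL, table, and CONTAINS-style predicate are hypothetical
// placeholders; consult the database documentation for the actual syntax.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TextSearchQuerySketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sybase:Tds:dbhost:2638";   // hypothetical connection URL
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement stmt = conn.prepareStatement(
                     // Hypothetical query: find feedback rows whose comment text matches a
                     // phrase, returned together with a structured score from the same table.
                     "SELECT doc_id, satisfaction_score " +
                     "FROM customer_feedback " +
                     "WHERE CONTAINS(comment_text, ?)")) {
            stmt.setString(1, "\"late delivery\"");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("doc_id") + " -> " + rs.getDouble("satisfaction_score"));
                }
            }
        }
    }
}

The design point being illustrated is the one the profile emphasizes: one repository, one interface, and one query plan covering both the text predicate and the relational columns.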
  • 61. Text/Content Analytics 2011: User Perspectives

Solution Profile: Verint Systems Inc.

Verint Systems' Voice of the Customer (VoC) solutions include speech analytics, text analytics, and feedback analytics. The company's VoC offering enables organizations to tap into sources of critical business insight across all significant customer touch points, spanning email, web chat, social media, agent notes, recorded contact center conversations, and customer feedback surveys. By collecting, unifying, and analyzing data across all of these channels, organizations are able to access a holistic view of their business from the customer perspective for the first time.

Today's consumers demand ever-increasing flexibility in how they interact with the organizations they do business with. Addressing this requires a VoC program and analysis strategy that manages and unifies all of those information sources and content types, each of which presents unique data mining needs. Historically, however, technical and organizational challenges have prevented many organizations from embracing a holistic strategy for data collection and analysis. They simply can't access and analyze this crucial information fast enough within their systems, resulting in delayed responses to customer needs and market demands. Critical data often remains locked up in information silos and is not readily accessible to functional owners looking to make timely business decisions. As a result, organizations are often ineffective in making changes to increase customer satisfaction and struggle to decrease inefficiencies surrounding the customer experience, risking an increase in customer frustration and adverse impacts on company performance.

Verint's Voice of the Customer Analytics Solutions

Verint's Voice of the Customer Analytics solutions provide a single-vendor solution set that enables today's organizations to collect, analyze, and act on customer insights. The multichannel analytics solution set spans speech, social media, email, web chat, and enterprise feedback. It comprises speech analytics, text analytics, and feedback analytics, as well as the ability to integrate data from web analytics, social media channels, and other customer interaction points.

  • Speech Analytics
  Recorded customer service conversations and survey comments remain a largely under-utilized asset for many organizations. Speech Analytics automatically mines this data to surface critical intelligence around customer needs and behaviors, product or service issues, new opportunities, emerging trends, and associated customer sentiment. This unbiased and unsolicited feedback is essential for developing effective business strategies and delivering high-caliber customer experiences. The solution can pinpoint market perceptions and opportunities, the factors driving high costs and unproductive call volumes, and the strengths and weaknesses of people, products, and processes across the organization.

  • Text Analytics
  Sophisticated Text Analytics mines customer interactions and intelligence derived through feedback across multichannel customer communications, including email messages, web chat sessions, blogs, review sites, social media, and virtually any other text-based channel. Sentiment analysis provides the essential additional dimension to highlight drivers of satisfaction and frustration, helping organizations hone in on business improvement and transformation opportunities.
  This intelligence can be combined with other data gathered in customer service operations environments, from the contact center to branch and remote offices to back-office operations, for a holistic picture of enterprise customer service.

  • Feedback Analytics
  Enterprise and Customer Feedback Analytics, which creates usable and timely insights needed to improve products, loyalty, and advocacy, enables organizations to collect, analyze, and act on direct customer feedback. It includes such feedback analytics capabilities as advanced survey management, customer profile management, and ad-hoc analysis across any customer touch point, along with interactive dashboards that accelerate the discovery and sharing of insight and support ad-hoc analysis to drive timely understanding of the business.

  • 62. Text/Content Analytics 2011: User Perspectives

Verint's VoC Analytics solutions are designed to combine all sources of customer interaction data into a single holistic platform that allows cross-channel analysis, as well as individual customer tracking capabilities. The platform features integrations that span from an ability to view a customer journey across channels, to cross-channel automatic trend analysis and combined root cause analysis, to the ability to forecast customer behavior given a full picture of interactions. These expand on Verint's patented Customer Behavior Indicators™, which help automatically reveal critical customer experience information, along with associated sentiment analysis. This cross-department approach provides a solution that acts as a platform for capturing and analyzing all aspects associated with the voice of the customer. To that end, organizations are increasingly seeking a single provider and solution that centralizes this critical function within the enterprise.

Transforming Insight into Action: VoC Analytics and WFO

Where a VoC initiative is implemented to understand customer activity, behavior, and sentiment, a workforce optimization (WFO) initiative is implemented to manage change, measure performance, and drive operational excellence. Verint's Impact 360® Workforce Optimization™ suite comprises quality monitoring and recording, voice of the customer analytics (including speech analytics, text analytics, and feedback analytics), desktop and process analytics, workforce management, performance management, eLearning, coaching, and more. These capabilities provide a wider, unified framework for customer experience management. For example, text and speech analytics results flow directly to performance management scorecards, allowing analysts and other VoC executives to track and measure specific key performance indicators (KPIs) related to the voice of the customer.

Verint's VoC Analytics solutions are part of the company's Impact 360 Workforce Optimization suite, which is designed to help organizations not just collect and analyze critical intelligence, but enact and manage change at the front line, typically within customer service operations.

Making a Mark in the VoC Industry

The Verint Voice of the Customer Analytics solution set provides the market's most comprehensive view into what's really happening with customers, the drivers of issues and cross-channel patterns, and the ability to predict customers' needs, expectations, and end behaviors. The company's VoC solutions have received recognition in the market, earning an "Innovation Solution" honor from Information Management, the "Technovation Award" from the American Teleservices Association (ATA), TSIA's "Recognized Innovator" designation, and the "Speech Technology Excellence Award" from Customer Interaction Solutions.

For more information, call 1-800-4VERINT or visit www.verint.com.