ANALYTICS TO COMBAT GROWTH IN UNSTRUCTURED TEXT DATA
ANNELIES TJETJEP, BUSINESS SOLUTION MANAGER, ANALYTICS
21ST FEBRUARY 2013
TEXT ANALYTICS: EXPLORATION, CATEGORISATION, SENTIMENT ANALYSIS & INSIGHT
• Analytics in a World of Big Data
• What is Text Analytics?
• The SAS® Text Analytics Suite
• Text Mining in Action
• SAS® Social Media Analytics
• Questions?
ANALYTICS IN A WORLD OF BIG DATA: BREAKDOWN OF DATA USAGE
• We put nearly all of the data that is of real value to good use: 22%
• We probably leverage about half of our valuable data: 53%
• Vast quantities of useful data go untapped: 24%
Source: Economist Intelligence Unit 2011 Report, Sponsored by SAS, 2011
ANALYTICS IN A WORLD OF BIG DATA: BREAKDOWN OF DATA COLLECTION & ANALYSIS
• Structured data (tables, records)
• Semi-structured data (XML and similar standards)
• Complex data (hierarchical or legacy sources)
• Event data (messages, usually in real time)
• Unstructured data (human language, audio, video)
• Social media data (blogs, tweets, social networks)
• Web logs and click streams
• Spatial data (long/lat coordinates, GPS output)
• Machine-generated data (sensors, RFID, devices)
• Scientific data (astronomy, genomes, physics)
• Other
Based on 450 responses from 109 respondents who report practicing Big Data analytics; 4.1 responses per respondent on average.
Source: TDWI Big Data Analytics Report, 4th Quarter 2011, Philip Russom
WHAT IS TEXT ANALYTICS? HONG KONG EFFICIENCY UNIT
The 1823 Call Centre of the Hong Kong government's Efficiency Unit acts as a single point of contact for handling public inquiries and complaints on behalf of many government departments.
1823 operates round-the-clock, including Sundays and public holidays. Each year, it answers about 2.65 million calls and 98,000 e-mails, including inquiries, suggestions and complaints.
1823 HONG KONG EFFICIENCY UNIT
PUBLIC
BUSINESS ISSUE
Develop a Complaints Intelligence System that uncovers the trends, patterns and relationships inherent in the complaints.
RESULTS
"By decoding the messages through statistical and root-cause analyses of complaints data, the government can better understand the voice of the people, and help government departments improve service delivery, make informed decisions and develop smart strategies. This in turn helps boost public satisfaction with the government, and build a quality city."
- Efficiency Unit's Assistant Director, W. F. Yuk
Hong Kong ICT Awards 2009 Grand Award, Best Public Service Application (Transformations)
TRIBUNE COMPANY
American news organization, reaching more than 80% of US households
MEDIA AND PUBLISHING
BUSINESS ISSUE
Needed to quickly and accurately define and categorize online content relevant to readership.
RESULTS
"The news hits so fast that you have to be changing things very quickly. You have to be aware of what you're writing about and the content that you're tagging it to. If an indexing mistake happens, you have to change it very quickly because reputations are at stake."
- Keith DeWeese, Director of Information Semantics Management
• Better ad targeting and increased ad revenue
THE SAS® TEXT ANALYTICS SUITE: BOTH BRAINS OF THE EQUATION
Statistical Analysis
• Singular value decomposition
• Flat and hierarchical clustering
• Word relationship strength profiling
• Dominant word pairs identification
• Algorithmically-based predictive models
Natural Language Processing
• Taxonomic classification
• Entity and concept extraction
• Sentiment identification
• Contextual and pattern recognition
• Linguistically-based classification models
THE SAS® TEXT ANALYTICS SUITE
• Text Mining
• Content Categorization
• Sentiment Analysis
SAS® TEXT MINER: EXPLORING & DISCOVERING INSIGHTS
1. Input text messages: e.g. Twitter data, reports, email, news, forum messages
2. Parse & explore text data: break down text and explore relationships of key concepts such as persons, places, organizations…
3. Discover topics: cluster documents of similar content and describe them with important key words
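
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how SAS® Text Miner works internally: the stop-word list, the greedy cosine-similarity grouping, the 0.3 threshold and the sample messages are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for the example.
STOPWORDS = {"the", "a", "is", "was", "and", "to", "my", "in", "of", "for"}

def parse(text):
    """Step 2: break a message into lowercase word tokens, dropping stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(messages, threshold=0.3):
    """Step 3: greedy single-pass grouping. A message joins the first cluster
    whose accumulated term counts are similar enough, else starts a new one."""
    clusters = []
    for i, msg in enumerate(messages):
        vec = Counter(parse(msg))
        for c in clusters:
            if cosine(vec, c["terms"]) >= threshold:
                c["members"].append(i)
                c["terms"].update(vec)
                break
        else:
            clusters.append({"members": [i], "terms": Counter(vec)})
    return clusters

def describe(c, n=3):
    """Describe a cluster by its n most frequent terms."""
    return [term for term, _ in c["terms"].most_common(n)]

# Step 1: input text messages (made-up sample data).
messages = [
    "bed was comfortable and the shower was great",
    "pool bar and spa were crowded",
    "comfortable bed but noisy shower",
    "crowded pool and slow pool bar",
]
clusters = cluster(messages)
for c in clusters:
    print(c["members"], describe(c))
```

A production text-mining tool would typically weight terms and reduce dimensionality (for example with singular value decomposition) before clustering; the greedy pass here is only the simplest stand-in.
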
SAS® TEXT MINER: DISCOVERING PATTERNS FOR MODELLING
1. Input text messages: e.g. Twitter data, reports, email, news, forum messages
2. Parse text data and discover topics: break down text into structured data, group messages of similar content
3. Predictive modeling with text data: text data input into models may provide reliable info to predict outcome & behavior
Customer data: predict customers that are likely to accept the offer…
TAXONOMIES
Categories and sub-categories: related terms, phrases, linguistic logic
Hotel Brand
• Service: Check-in, Check Out, Staff, Concierge, etc.
• Accommodations: Bed, shower, TV, room art, lighting, technology, etc.
• Amenities: Fitness, pools, spa, parking, etc.
• Food and Bev: Pool bar, restaurant, room service, etc.
• Experience: Nightlife, ambience, relaxation, romantic, etc.
• Gaming: Slots, tables, tournaments, etc.
• Website: Navigation, ease of reservations, etc.
CONTENT CATEGORISATION: SAS® ENTERPRISE CONTENT CATEGORIZATION
1. Input text content: e.g. Twitter data, reports, email, news, forum messages
2. Parse content through categorization taxonomy: match and score messages/documents to relevant categories
3. Output results: e.g. each message/document is now associated with detailed category/subcategories
Categorization Taxonomy: Topic = Organized Crime
Results are indexed or fed into existing systems for search & analysis
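
As a rough illustration of "match and score", the sketch below counts how many taxonomy terms from each category appear in a message. It reuses part of the hotel taxonomy from the TAXONOMIES slide; the function name `categorize`, the simple hit-count score and the sample message are assumptions for the example and do not represent the SAS® Enterprise Content Categorization rule language.

```python
import re

# Part of the hotel taxonomy from the TAXONOMIES slide, lowercased.
TAXONOMY = {
    "Service": ["check-in", "check out", "staff", "concierge"],
    "Accommodations": ["bed", "shower", "tv", "room art", "lighting", "technology"],
    "Amenities": ["fitness", "pools", "spa", "parking"],
    "Food and Bev": ["pool bar", "restaurant", "room service"],
}

def categorize(message, taxonomy=TAXONOMY):
    """Return {category: score}, where score is the number of whole-word
    taxonomy term matches found in the message (hypothetical scoring)."""
    text = message.lower()
    scores = {}
    for category, terms in taxonomy.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(term) + r"\b", text))
            for term in terms
        )
        if hits:
            scores[category] = hits
    return scores

result = categorize("The staff at check-in were friendly but the shower was cold")
print(result)
```

Real taxonomies usually add linguistic logic (stemming, proximity, negation) on top of plain term matching, so a count like this is only a starting point.
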
CONCEPT EXTRACTION: SAS® ENTERPRISE CONTENT CATEGORIZATION
1. Input text content: e.g. Twitter data, reports, email, news, forum messages
2. Parse content through concepts taxonomy: match messages/documents to extract concepts
3. Output results: e.g. each message/document is now associated with a list of extracted concepts
Concept Taxonomy: Concepts
• Locations: kitchen…
• Persons: John…
• Dates: Monday…
• Weapons: knife…
Results are indexed or fed into existing systems for search & analysis
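
A minimal sketch of dictionary-based concept extraction, using only the four example concepts shown on the slide. The function name `extract_concepts` and the sample sentence are hypothetical, and a real concept taxonomy would hold far more terms and rules than this lookup.

```python
import re

# The example concepts from the slide; each list would be much longer in practice.
CONCEPTS = {
    "Locations": ["kitchen"],
    "Persons": ["John"],
    "Dates": ["Monday"],
    "Weapons": ["knife"],
}

def extract_concepts(message, concepts=CONCEPTS):
    """Return {concept: [matched terms]} for every concept found in the message."""
    found = {}
    for concept, terms in concepts.items():
        # Whole-word, case-insensitive match on any term of this concept.
        pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
        matches = re.findall(pattern, message, flags=re.IGNORECASE)
        if matches:
            found[concept] = matches
    return found

extracted = extract_concepts("On Monday John left the knife in the kitchen")
print(extracted)
```
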
SENTIMENT EXTRACTION: SAS® SENTIMENT ANALYSIS
1. Input text content: e.g. Twitter data, reports, email, news, forum messages
2. Parse messages through sentiment taxonomy: match and score messages, and their details, for sentiment polarity
3. Output results: e.g. each message/document and characteristics within the document are now associated with a sentiment polarity score (e.g. message is 80% positive)
4. Sentiment reports: results are easily analyzed against time period and/or product features, drillable to see exact message
Sentiment Taxonomy: each message is labelled "This is positive" or "This is negative"
Results are indexed or fed into existing systems for search & analysis
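
To make the "80% positive" polarity score concrete, here is a small lexicon-based sketch. The word lists, the function name `polarity` and the sample message are invented for illustration; the score is simply the share of sentiment-bearing words that are positive, which is one common simplification and not necessarily how SAS® Sentiment Analysis computes its score.

```python
import re

# Hypothetical mini-lexicons; a real sentiment taxonomy is far richer.
POSITIVE = {"good", "great", "friendly", "clean", "helpful"}
NEGATIVE = {"bad", "slow", "rude", "dirty", "broken"}

def polarity(message):
    """Return the share of sentiment-bearing words that are positive,
    or None when the message contains no lexicon words at all."""
    tokens = re.findall(r"[a-z]+", message.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return pos / total if total else None

score = polarity("Great room, friendly and helpful staff, clean pool, but slow check-in")
print(f"{score:.0%} positive")
```

Scoring each product feature separately (room, staff, check-in) would follow the same pattern applied per sentence or per matched taxonomy term.
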
WHOLE BRAIN PROCESS: INTERACTIONS, A CALL CENTRE EXAMPLE
Caller1234: "i called them with a little issue that i had on my car repair, and the original representative blind transferred me over to the second representative that i spoke to, so when i got to the second rep (John?), he had no idea who i am, what my account was, what were the reasons that i was calling. i had to re-explain myself completely."
Concepts:
• Call reason: car repair
• Unhappy reasons: blind transfer; re-explain
• Other related staff: John
Process elements: Exploration of linkages, Initial taxonomy, Topic categorisation, Classification, Predictive Modelling, Taxonomies, Sentiment
Social media is everywhere: it's not just Facebook and Twitter.
• Your customers are there talking about your brand.
• What are customers saying about you and what impact could that have on your business?
Sources: The Conversation: Brian Solis and Jess3
POWER SHIFT: THE EMPOWERED CONSUMER
COMPANIES | CONSUMERS
SOLUTION FRAMEWORK
• Listen, Engage, & Leverage: iPad & Android apps; SAS Media Portal; SAS Media Workbench; SAS Conversation Center
• Classify & Segment: Customizable Taxonomies; Sentiment Analysis; Influence & Engagement; Text Clusters & Segments
• Mine & Forecast: Text Mining; Natural Language Processing; Data Mining; Correlation & Forecasting
• Collect: Survey Data; Blog Data; Web Data; Online Reviews; Call logs; Media Data
• Clean, Integrate, Organize: Web Crawling; Data Stores; Sample Online Sources
TEXT ANALYTICS: ANALYTICS TO COMBAT GROWTH IN UNSTRUCTURED TEXT DATA
• Data is "BIG" and growing
• Most data is in unstructured or semi-structured format
• Need for smarter ways of mining data: automation & analytics
• Need for whole-brained analysis of textual information
• SAS provides an end-to-end text analytics suite
• Power is now in the hands of the consumer