This document provides an overview of text analytics from the past to the present and future. It discusses how text analytics has evolved from early pioneers using word frequencies to current applications in domains like customer experience management and sentiment analysis. The document also outlines the commercial landscape of text analytics vendors and common decision criteria when selecting a solution, such as support for multiple languages and integration with business intelligence. Finally, the document speculates on the future of text analytics incorporating additional data types like audio, video and images to provide more context and derive deeper insights.
3. Text Analytics: An Industry View
JADT – June 5, 2014
Analytics is the systematic application of algorithmic methods that derive and deliver information, typically expressed quantitatively, whether in the form of indicators, tables, visualizations, or models.
• Systematic means formal & repeatable.
• Algorithmic contrasts with heuristic.
4. Text analytics past: Pioneers…
5. Document input and processing
Knowledge handling is key
Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy) and television network librarian Bunny Watson (Katharine Hepburn) and the "electronic brain" EMERAC.
Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958
6. “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.”
H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
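The frequency-based idea in the Luhn quote can be sketched in a few lines of Python. This is a simplified illustration, not Luhn's exact significance formula: it scores each sentence by the average corpus frequency of its non-stop words. The stop-word list, the naive sentence splitter, and the function names (`split_sentences`, `tokenize`, `auto_abstract`) are all assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not Luhn's original list).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "into",
    "is", "it", "of", "on", "or", "that", "the", "to", "was", "with",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-case a sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def auto_abstract(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)

    # Word significance: how often each non-stop word occurs in the whole text.
    freq = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOP_WORDS
    )

    def score(sentence):
        # Sentence significance: mean frequency of its significant words.
        tokens = [w for w in tokenize(sentence) if w not in STOP_WORDS]
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


if __name__ == "__main__":
    sample = (
        "Text analytics turns text into data. The weather was pleasant. "
        "Analytics on text data reveals patterns in text."
    )
    for sentence in auto_abstract(sample, num_sentences=2):
        print(sentence)
```

Averaging by sentence length keeps long sentences from winning purely on size; Luhn's paper instead scores clusters of significant words within a sentence, which this sketch does not attempt.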
10. Pipelines and patterns
IBM’s MedTAKMI, 1997-
http://www.research.ibm.com/trl/projects/textmining/index_e.htm
11. Exhaustive extraction
An (old) Attensity example – NLP to identify roles and relationships, for a law-enforcement application.
12. Language engineering
GATE: General Architecture for Text Engineering.
http://gate.ac.uk/
13. Text analytics present: Business, technology, applications, and solutions…
14. “Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.”
-- Philip Russom, the Data Warehousing Institute, 2007
http://tdwi.org/articles/2007/05/09-what-works/bi-search-and-text-analytics.aspx
15. Linguistics, statistics, and semantics
Text analytics (typically) involves linguistic modelling, statistical characterization, learned patterns, and semantic understanding of text-derived features –
• Named entities: people, companies, places, etc.
• Pattern-based features: e-mail addresses, phone numbers, etc.
• Concepts: abstractions of entities.
• Facts and relationships.
• Events.
• Concrete and abstract attributes (e.g., “expensive” & “comfortable”) including measure-value pairs.
• Subjectivity in the forms of opinions, sentiments, and emotions: attitudinal data.
– applied to business ends.
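As a rough illustration of the "pattern-based features" item in the list above, the sketch below pulls e-mail addresses and phone numbers out of free text with regular expressions. The two patterns and the function name `extract_pattern_features` are assumptions chosen for brevity; real-world address and phone formats vary far more than these expressions allow for.

```python
import re

# Deliberately simple patterns for illustration only (assumptions, not a
# complete grammar for e-mail addresses or international phone numbers).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_pattern_features(text):
    """Return the e-mail addresses and phone numbers found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


if __name__ == "__main__":
    sample = "Contact hello@example.com or call +44 20 7946 0958 today."
    print(extract_pattern_features(sample))
```

Named entities, concepts, and sentiment generally need linguistic or statistical models; surface patterns like these are the part of the feature list that plain rules handle reasonably well.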
16. Sources
It’s a truism that 80% of enterprise-relevant information originates in “unstructured” form:
• E-mail and messages.
• Web pages, online news & blogs, forum postings, and other social media.
• Contact-center notes and transcripts.
• Surveys, feedback forms, warranty claims.
• Scientific literature, books, legal documents.
• ...
Non-text “unstructured” content?
• Images
• Audio including speech
• Video
Value derives from patterns.
17. Value
What do we do with text, whether online, on-social, or in the enterprise?
1. Post/Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Extract information and Analyze.
18. Semantics, analytics, and IR
Text analytics generates semantics to bridge search, BI, and applications, enabling next-generation information systems.
[Diagram of overlapping areas: Search, BI/Big Data, Applications]
• Text analytics (inner circle)
• Semantic search (search + text)
• Information access (search + analytics)
• Synthesis (text + BI)/(big data)
• Search based applications (search + text + apps)
• NextGen CRM, EFM, MR, marketing, apps…
19. Content, composites, connections 1
20. Content, composites, connections 2
21. Applications
Text analytics has applications in:
• Intelligence & law enforcement.
• Life sciences & clinical medicine.
• Media & publishing including social-media analysis and contextual advertising.
• Competitive intelligence.
• Voice of the Customer: CRM, product management & marketing.
• Public administration & policy.
• Legal, tax & regulatory (LTR) including compliance.
• Recruiting.
22. Opinion, sentiment & emotion
23. Sentiment analysis
A specialization, of relevance to:
• Brand/reputation management.
• Customer experience management (CEM).
• Competitive intelligence.
• Survey analysis (EFM = Enterprise Feedback Management).
• Market research.
• Product design/quality.
• Trend spotting.
24. Data exploration via dashboards and workbenches.
25. Text analytics present: The market…
26. http://altaplana.com/TA2014
27. What are your primary applications where text comes into play?
[Bar chart, axis 0% to 45%]
• Military/national security/intelligence: 5%
• Law enforcement: 6%
• Intellectual property/patent analysis: 8%
• Financial services/capital markets: 9%
• Product/service design, quality assurance, or warranty claims: 10%
• Other: 11%
• Insurance, risk management, or fraud: 13%
• E-discovery: 14%
• Life sciences or clinical medicine: 15%
• Online commerce including shopping, price intelligence,…: 16%
• Content management or publishing: 25%
• Customer/CRM: 27%
• Search, information access, or Question Answering: 29%
• Competitive intelligence: 33%
• Brand/product/reputation management: 38%
• Research (not listed): 38%
• Voice of the Customer / Customer Experience Management: 39%
28. Voice of the Customer
Text analytics is applied to improve customer service and boost satisfaction and loyalty.
Analyze customer interactions and opinions –
• E-mail, contact-center notes, survey responses.
• Forum & blog posting and other social media.
– to –
• Address customer product & service issues.
• Improve quality.
• Manage brand & reputation.
Assessment of qualitative information from text helps users –
• Gain feedback on interactions.
• Assess customer value.
• Understand root causes.
• Mine data for measures such as churn likelihood.
29. The commercial scene
30. Text Analytics: An Industry View
JADT – June 5, 2014
30
Online commerce
Text analytics is applied for marketing, search optimization,
competitive intelligence.
Analyze social media and enterprise feedback to understand
the Voice of the Market:
• Opportunities
• Threats
• Trends
Categorize product and service offerings for on-site search
and faceted navigation and to enrich content delivery.
Annotate pages to enhance Web-search findability, ranking.
Scrape competitor sites for offers and pricing.
Analyze social and news media for competitive information.
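The categorization step for on-site search and faceted navigation can be sketched as follows. This is a minimal illustration and not any vendor's method: the keyword rules in FACET_RULES, the function names, and the sample catalog are all hypothetical, and a production system would more likely use taxonomies or trained classifiers than a hand-written keyword table.

```python
from collections import defaultdict

# Hypothetical keyword-to-facet rules (assumption, for illustration only).
FACET_RULES = {
    "category": {"laptop": "Computers", "notebook": "Computers", "phone": "Phones"},
    "color": {"black": "Black", "silver": "Silver"},
}


def tag_offering(description):
    """Return the facet values whose keywords occur in a product description."""
    tokens = {t.strip(".,!?").lower() for t in description.split()}
    facets = {}
    for facet, keywords in FACET_RULES.items():
        values = sorted({value for key, value in keywords.items() if key in tokens})
        if values:
            facets[facet] = values
    return facets


def build_facet_index(catalog):
    """Map each (facet, value) pair to the set of product ids carrying it."""
    index = defaultdict(set)
    for product_id, description in catalog.items():
        for facet, values in tag_offering(description).items():
            for value in values:
                index[(facet, value)].add(product_id)
    return index


# Hypothetical sample catalog.
catalog = {
    "p1": "Silver laptop, 13 inch",
    "p2": "Black phone with case",
    "p3": "Black notebook",
}
facet_index = build_facet_index(catalog)
```

A faceted-navigation front end could then answer a filter such as color = Black by looking up facet_index[("color", "Black")].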
31. Text Analytics: An Industry View
E-Discovery and compliance
Text analytics is applied for compliance, fraud and risk, and
e-discovery.
Regulatory mandates and corporate practices dictate –
• Monitoring corporate communications
• Managing electronically stored information for production in
event of litigation
Sources include e-mail (!!), news, social media
Risk avoidance and fraud detection are key to effective
decision making
• Text analytics mines critical data from unstructured sources
• Integrated text-transactional analytics provides rich insights
32. Text Analytics: An Industry View
What textual information are you analyzing or do you plan to analyze? (chart series: 2014, 2011, 2009)
• Web-site feedback: 16%
• social media not listed above: 19%
• chat: 20%
• employee surveys: 20%
• contact-center notes or transcripts: 22%
• e-mail and correspondence: 26%
• online reviews: 31%
• scientific or technical literature: 31%
• Facebook postings: 32%
• on-line forums: 36%
• customer/market surveys: 37%
• comments on blogs and articles: 38%
• news articles: 42%
• blogs (long form+micro): 61%
33. Text Analytics: An Industry View
What textual information are you analyzing or do you plan to analyze?
• insurance claims or underwriting notes: 5%
• point-of-service notes or transcripts: 5%
• video or animated images: 5%
• warranty claims/documentation: 5%
• photographs or other graphical images: 7%
• crime, legal, or judicial reports or evidentiary materials: 9%
• field/intelligence reports: 11%
• speech or other audio: 11%
• patent/IP filings: 12%
• other: 12%
• text messages/instant messages/SMS: 12%
• medical records: 13%
• Web-site feedback: 16%
• social media not listed above: 19%
• chat: 20%
• employee surveys: 20%
• contact-center notes or transcripts: 22%
• e-mail and correspondence: 26%
• online reviews: 31%
• scientific or technical literature: 31%
• Facebook postings: 32%
• on-line forums: 36%
• customer/market surveys: 37%
• comments on blogs and articles: 38%
• news articles: 42%
• blogs (long form) including Tumblr: 43%
• Twitter, Sina Weibo, or other microblogs: 46%
34. Text Analytics: An Industry View
Do you currently need (or expect to need) to extract or analyze...
• Events: Current 33%, Expect 21%
• Semantic annotations: Current 31%, Expect 24%
• Other entities – phone numbers, part/product…: Current 34%, Expect 23%
• Metadata such as document author,…: Current 47%, Expect 23%
• Concepts, that is, abstract groups of entities: Current 51%, Expect 28%
• Named entities – people, companies,…: Current 56%, Expect 25%
• Relationships and/or facts: Current 47%, Expect 33%
• Sentiment, opinions, attitudes, emotions,…: Current 54%, Expect 28%
• Topics and themes: Current 66%, Expect 22%
35. Text Analytics: An Industry View
“The share rise in users who selected Arabic…coincided with much of the civil unrest… in Middle Eastern countries.”
http://bits.blogs.nytimes.com/2014/03/09/the-languages-of-twitter-users/
36. Text Analytics: An Industry View
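Non-English language support?
• Arabic: Current 10%, Within 2 years 17%
• Bahasa Indonesia or Malay: Current 1%, Within 2 years 3%
• Chinese: Current 16%, Within 2 years 28%
• Dutch: Current 9%, Within 2 years 7%
• French: Current 36%, Within 2 years 17%
• German: Current 34%, Within 2 years 24%
• Greek: Current 2%, Within 2 years 2%
• Hindi, Urdu, Bengali, Punjabi, or…: Current 2%, Within 2 years 10%
• Italian: Current 18%, Within 2 years 11%
• Japanese: Current 7%, Within 2 years 15%
• Korean: Current 4%, Within 2 years 8%
• Polish: Current 3%, Within 2 years 4%
• Portuguese: Current 13%, Within 2 years 17%
• Russian: Current 8%, Within 2 years 21%
• Scandinavian or Baltic: Current 7%, Within 2 years 3%
• Spanish: Current 38%, Within 2 years 20%
• Turkish or Turkic: Current 3%, Within 2 years 4%
• Other African: Current 2%, Within 2 years 0%
• Other Arabic script (including Urdu,…: Current 3%, Within 2 years 1%
• Other East Asian: Current 2%, Within 2 years 1%
• Other European or Slavic/Cyrillic: Current 5%, Within 2 years 2%
• Other: Current 9%, Within 2 years 0%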
37. Text Analytics: An Industry View
Software & platform options
Text-analytics options may be grouped in general classes.
• Installed text-analysis application, whether desktop or
server or deployed in-database.
• Data mining workbench.
• Hosted.
• Programming tool.
• As-a-service, via an application programming interface
(API).
• Code library or component of a business/vertical
application, for instance for CRM, e-discovery, search.
Text analytics is frequently embedded in search or other
end-user applications.
The slides that follow present leading options in
each category except Hosted…
38. Text Analytics: An Industry View
What is important in a solution? (chart series: 2014 (n=139), 2011 (n=136), 2009 (n=78))
• media monitoring/analysis interface: 22%
• hosted or Web service (on-demand "API") option: 25%
• supports data fusion / unified analytics: 28%
• sector adaptation (e.g., hospitality, insurance, retail, health care,…: 30%
• BI (business intelligence) integration: 32%
• ability to create custom workflows or to create or change…: 33%
• big data capabilities, e.g., via Hadoop/MapReduce: 33%
• predictive-analytics integration: 36%
• open source: 37%
• support for multiple languages: 40%
• sentiment scoring: 41%
• "real time" capabilities: 43%
• low cost: 44%
• deep sentiment/emotion/opinion/intent extraction: 45%
• document classification: 53%
• broad information extraction capability: 53%
• ability to use specialized dictionaries, taxonomies, ontologies, or…: 54%
• ability to generate categories or taxonomies: 64%
39. Text Analytics: An Industry View
User decision criteria
Primary considerations include –
Adaptation or specialization: To a business or cultural domain,
language, information type (e.g., text, speech, images) &
source (e.g., Twitter, e-mail, online news).
By-user customization possibilities: For instance, via custom
taxonomies, rules, lexicons.
Sentiment resolution: Aggregate, message, or feature level.
(What features? Topics, coreferenced entities?)
What sentiment? Valence & what else? Emotion? Intent?
Outputs: E.g., annotated text, models, indicators, dashboards,
exploratory data interfaces.
Usage mode: As-a-service (API), installed, or hosted/cloud.
Capacity: Volume, performance, throughput, latency.
Cost.
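The three sentiment-resolution levels named above (aggregate, message, feature) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the tiny LEXICON, the function names, and the fixed token window around each feature term are hypothetical, and real products use far richer linguistic analysis than word counting.

```python
# Hypothetical toy polarity lexicon (assumption, for illustration only).
LEXICON = {"great": 1, "love": 1, "fast": 1, "slow": -1, "poor": -1, "broken": -1}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,!?").lower() for t in text.split()]


def message_score(text):
    """Message level: sum of lexicon polarities over one message."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))


def feature_scores(text, features, window=1):
    """Feature level: polarity of words within `window` tokens of each feature term."""
    tokens = tokenize(text)
    scores = {}
    for i, tok in enumerate(tokens):
        if tok in features:
            nearby = tokens[max(0, i - window): i + window + 1]
            scores[tok] = scores.get(tok, 0) + sum(LEXICON.get(t, 0) for t in nearby)
    return scores


def aggregate_score(messages):
    """Aggregate level: mean message score across a collection."""
    return sum(message_score(m) for m in messages) / len(messages)


# Hypothetical sample messages.
messages = ["Great screen but slow delivery.", "Love the screen!"]
```

In this sample the first message scores 0 at message level because the positive and negative words cancel, while feature-level resolution separates a positive "screen" from a negative "delivery". That difference is why the resolution level matters when comparing solutions.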
40. Text Analytics: An Industry View
A few French companies
41. Text Analytics: An Industry View
Academic spin-offs
People Pattern
42. Text Analytics: An Industry View
Text analytics future:
Synthesis and sensemaking.
45. Text Analytics: An Industry View
Emotion and outcomes
46. Text Analytics: An Industry View
Beyond Text
Audio including speech.
Images.
Video.
http://www.geekosystem.com/facebook-face-recognition/
http://www.sciencedirect.com/science/article/pii/S0167639312000118
http://flylib.com/books/en/2.495.1.54/1/
47. Text Analytics: An Industry View
The world of big data
Machine data (e.g., logs, sensor outputs, clickstreams).
Actions, interactions, and transactions: geolocation and
time.
Profiles: individual, demographic & behavioral.
Text, audio, images, and video.
Facts and feelings.
48. Text Analytics: An Industry View
(Accessible) data everywhere
49. Text Analytics: An Industry View
http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html
A big data analytics architecture (example)
50. Text Analytics: An Industry View
Sensemaking
“It is convenient to divide the entire information access process into two main components: information retrieval through searching and browsing, and analysis and synthesis of results. This broader process is often referred to in the literature as sensemaking.
Sensemaking refers to an iterative process of formulating a conceptual representation from a large volume of information.”
– Marti Hearst, 2009
http://searchuserinterfaces.com/
51. Text Analytics: An Industry View
http://www.businessweek.com/magazine/content/04_19/b3882029_mz072.htm
En route
52. Text Analytics Past, Present &
Future: An Industry View
Seth Grimes
Alta Plana Corporation
@sethgrimes
June 5, 2014