This document summarizes Seth Grimes' presentation on text analytics at the 2nd LIDER roadmapping workshop in Madrid on May 8, 2014. The presentation covered various applications of text analytics, including customer experience management, online commerce, and e-discovery. It also discussed the types of textual data that can be analyzed, such as e-mails, social media posts, reviews, and surveys, and important capabilities for text analytics solutions, such as information extraction, sentiment analysis, and integration with other systems.
This presentation introduces text analytics, its applications and various tools/algorithms used for this process. Given below are some of the important tools:
- Decision trees
- SVM
- Naive-Bayes
- K-nearest neighbours
- Artificial Neural Networks
- Fuzzy C-Means
- Latent Dirichlet Allocation
When to use the different text analytics tools - Meaning Cloud (MeaningCloud)
Classification, topic extraction, clustering... When to use the different Text Analytics tools?
How to leverage Text Analytics technology for your business
MeaningCloud webinar, February 8th, 2017
More information and recording of the webinar https://www.meaningcloud.com/blog/recorded-webinar-use-different-text-analytics-tools
www.meaningcloud.com
O’Reilly Media 2014-data-science-salary-survey
After an extremely popular salary survey last year, the second annual salary report delves deeper into the data collected from over 800 respondents from a variety of industries, across 53 countries and 41 U.S. states. The report analyzes the tools that successful data analysts and engineers use, and how those tool choices relate to their compensation.
Findings from the survey include:
• Average number of tools and median income for all respondents
• Distribution of responses by a variety of factors, including age, location, industry, position, and cloud computing
• Detailed analysis of tool use, including tool clusters
• Correlation of tool usage and salary
Gain insight from these potentially career-changing findings
Course - Machine Learning Basics with R Persontyle
This course is meant to be a fast-paced, hands-on introduction to Machine Learning using R. The course will be focusing mainly on basics of Machine Learning methods and practical implementation of these methods to solve real-world problems. This course aims to develop basic understanding of supervised learning methods, through the use of the R programming platform. It describes the different types of learning and the two main categories of their applications: Classification and Regression.
For corporate bookings or to organize on-site training, email hello@persontyle.com or call +44 (0)20 3239 3141
www.persontyle.com
Sentiment Analysis on Twitter Dataset using R Language (ijtsrd)
Sentiment analysis involves determining the evaluative nature of a piece of text. A product review can express a positive, negative, or neutral sentiment (polarity). Automatically identifying the sentiment expressed in text has a number of applications, including tracking sentiment in movie reviews and automobile reviews, improving customer relation models, detecting happiness and well-being, and improving automatic dialogue systems. The evaluative intensity of both positive and negative terms changes in a negated context, and the amount of change varies from term to term. To adequately capture the impact of negation on individual terms, the authors propose to empirically estimate the sentiment scores of terms in negated contexts from movie reviews and automobile reviews, and they build two lexicons: one for terms in negated contexts and one for terms in affirmative (non-negated) contexts. By using these Affirmative Context Lexicons and Negated Context Lexicons, they were able to significantly improve the performance of the overall sentiment analysis system on both tasks. The thesis proposes a sentiment analysis system, written in R, that detects the sentiment of a corpus dataset of movie reviews and automobile reviews as well as the sentiment of a term (a word or a phrase) within a message (the term-level task). B. Nagajothi | Dr. R. Jemima Priyadarsini, "Sentiment Analysis on Twitter Dataset using R Language", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-6, October 2019. URL: https://www.ijtsrd.com/papers/ijtsrd28071.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/28071/sentiment-analysis-on-twitter-dataset-using-r-language/b-nagajothi
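The two-lexicon idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the lexicon entries and scores, the negator list, and the rule that negation lasts until the next punctuation mark are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy scores; the paper estimates such scores empirically from reviews.
AFFIRMATIVE_LEXICON = {"good": 1.0, "great": 1.5, "bad": -1.0, "terrible": -1.8}
# A negated term gets its own score rather than a simple sign flip,
# so the size of the shift can differ from term to term.
NEGATED_LEXICON = {"good": -0.8, "great": -0.4, "bad": 0.3, "terrible": 0.2}
NEGATORS = {"not", "no", "never"}      # assumed negation cues
CLAUSE_END = {".", ",", ";", "!", "?"}  # assumed end of negation scope

def score_text(text):
    """Sum term scores, choosing the lexicon by whether the term is negated."""
    tokens = re.findall(r"[a-z]+|[.,;!?]", text.lower())
    negated = False
    total = 0.0
    for token in tokens:
        if token in CLAUSE_END:
            negated = False          # punctuation closes the negated context
        elif token in NEGATORS:
            negated = True           # following terms are in a negated context
        else:
            lexicon = NEGATED_LEXICON if negated else AFFIRMATIVE_LEXICON
            total += lexicon.get(token, 0.0)
    return total

affirmative_score = score_text("The plot was great.")
mixed_score = score_text("The plot was not great, but the acting was good.")
print(affirmative_score, mixed_score)
```

In the second sentence "great" is scored from the negated lexicon and "good" from the affirmative one, because the comma ends the negated context in this sketch.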
Detecting Gender-bias from Energy Modeling Jobscape (yungahhh)
Despite the many strides made by women in the workplace, workplace inequality persists, whether it stems from geographical cultures or results from workplace attrition. Unconscious (or implicit) bias can creep into job listings and can actively or passively discourage certain pools of applicants. The goal of this study is to understand the role of gender bias in the energy modeling job market.
Job listings and their word choices can significantly affect the applicant pool. This study investigates the word choices in job postings specifically for the energy modeling sector within the United States building construction industry, using web-scraped data from an online job site. The study explores the potential usage of gender-biased terminology through frequency analysis and a Natural Language Processing Kit. The study also employs two machine learning methods (one supervised and one unsupervised) to test their application in this context and evaluate their utility.
Slides from Enterprise Search & Analytics Meetup @ Cisco Systems - http://www.meetup.com/Enterprise-Search-and-Analytics-Meetup/events/220742081/
Relevancy and Search Quality Analysis - By Mark David and Avi Rappoport
The Manifold Path to Search Quality
To achieve accurate search results, we must come to an understanding of the three pillars involved.
1. Understand your data
2. Understand your customers’ intent
3. Understand your search engine
The first path passes through Data Analysis and Text Processing.
The second passes through Query Processing, Log Analysis, and Result Presentation.
Everything learned from those explorations feeds into the final path of Relevancy Ranking.
Search quality is focused on end users finding what they want -- technical relevance is sometimes irrelevant! Working with the short head (very frequent queries) yields the highest return on investment for improving the search experience: tuning the results (for example, to emphasize recent documents or de-emphasize archive documents), near-duplicate detection, exposing diverse results in ambiguous situations, using synonyms, and guiding search via best bets and auto-suggest. Long-tail analysis can reveal user intent by detecting patterns, discovering related terms, and identifying the most fruitful results by aggregated behavior. All this feeds back into the regression testing, which provides reliable metrics to evaluate the changes.
By merging these insights, you can improve the quality of the search overall, in a scalable and maintainable fashion.
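As a rough illustration of the short-head idea above, the following Python sketch finds the smallest set of most frequent queries that covers a chosen share of search traffic. The function name, the sample log, and the 50% and 80% coverage thresholds are hypothetical; the talk does not specify how the short head is delimited.

```python
from collections import Counter

def short_head(queries, coverage=0.5):
    """Return the most frequent queries that together reach the coverage share."""
    # Light normalization so trivially different spellings count together.
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    head, covered = [], 0
    for query, count in counts.most_common():
        if covered / total >= coverage:
            break
        head.append((query, count))
        covered += count
    return head

# Hypothetical query log: two frequent queries and a small tail.
log = ["printer driver"] * 5 + ["VPN setup"] * 3 + ["holiday policy", "expense form"]
head_50 = short_head(log, coverage=0.5)
head_80 = short_head(log, coverage=0.8)
print(head_50)
print(head_80)
```

Queries outside the returned list form the long tail, which the talk suggests analyzing in aggregate rather than tuning one by one.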
Deep Recommender Systems - PAPIs.io LATAM 2018 (Gabriel Moreira)
In this talk, I provide an overview of how Deep Learning techniques have recently been applied to Recommender Systems. I also give a brief view of my ongoing Ph.D. research on News Recommender Systems with Deep Learning.
Data Science - Part I - Sustaining Predictive Analytics Capabilities (Derek Kane)
This is the first lecture in a series on data analytics topics, geared to individuals and business professionals who have no understanding of building modern analytics approaches. This lecture provides an overview of the models and techniques we will address throughout the lecture series. We will discuss Business Intelligence topics, predictive analytics, and big data technologies. Finally, we will walk through a simple yet effective example which showcases the potential of predictive analytics in a business context.
By applying user context and uncovering essential information, search engines can deliver a more rewarding experience, resulting in more digital revenue for the organization.
Keynote by Seth Grimes, presented at the Knowledge Extraction from Social Media workshop, November 12, 2012, preceding the International Semantic Web Conference
Avoiding Anonymous Users in Multiple Social Media Networks (SMN) (paperpublications3)
Abstract: The main aim of this project is to secure user login and data sharing across social networks such as Gmail and Facebook, and to identify anonymous users of these networks. If the original user is not active on a network but friends or anonymous users know the login details, they can misuse the user's chats. The problem this project addresses is an unauthorized user using the login, without the original user's knowledge, to chat or to share images, videos, and so on. To counter this, the user first registers their details together with one security question and answer. Because the anonymous user can delete their chats or data, the security question is used to recover the unauthorized user's chat history or sharing details along with their IP address or MAC address. In this way the project provides a means of preventing anonymous users from misusing the original user's login details.
Global revenue in the business intelligence (BI) and analytics software market is forecast to reach $18.3 billion in 2017, an increase of 7.3 percent from 2016, according to the latest forecast from Gartner, Inc. By the end of 2020, the market is forecast to grow to $22.8 billion.
Il ruolo chiave degli Advanced Analytics per la Supply Chain [The key role of Advanced Analytics for the Supply Chain] (ACTOR)
Presentation of the speech held by Raffaele Maccioni (Co-Founder and CEO at ACT Operations Research) and Claudia Beldon (VP - Fashion & Luxury Industry at ACT Operations Research) at the recent "Logistica Efficiente" event titled "L'innovazione nella Supply Chain 2018".
Creating an AI Startup: What You Need to Know (Seth Grimes)
Seth Grimes presented "Creating an AI Startup: What You Need to Know," at a May 20, 2021 Launch Annapolis + Maryland AI (https://www.meetup.com/MarylandAI) program, focusing on opportunity and resources for Maryland tech entrepreneurs.
Efficient Deep Learning in Natural Language Processing Production, with Moshe... (Seth Grimes)
Moshe Wasserblat, Intel AI, presents on Efficient Deep Learning in Natural Language Processing Production to an online NLP meetup audience, August 3, 2020. Visit https://www.meetup.com/NY-NLP for the New York NLP meetup.
From Customer Emotions to Actionable Insights, with Peter Dorrington (Seth Grimes)
From Customer Emotions to Actionable Insights -- A presentation by Peter Dorrington, founder, XMplify Consulting, at the 2020 CX Emotion conference (https://cx-emotion.com), July 22, 2020.
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI (Seth Grimes)
Dan Lee from Dentuit AI presented an Intro to Deep Learning for Medical Image Analysis at the Maryland AI meetup (https://www.meetup.com/Maryland-AI), May 27, 2020. Visit https://www.youtube.com/watch?v=xl8i7CGDQi0 for video.
Emotion AI refers to a set of technologies -- natural language processing, voice tech, facial coding, neuroscience, and behavioral analytics -- applied to interactions to extract, convey, and induce emotion. Emotion AI is a presentation by Seth Grimes at AI for Human Language, March 5, 2020 in Tel Aviv.
Text Analytics for NLPers, a presentation by Seth Grimes, created for the December 2, 2019 Natural Language Processing-New York (NYC-NLP) meetup, https://www.meetup.com/NLP-NY/events/266093296/
Our FinTech Future – AI’s Opportunities and Challenges? (Seth Grimes)
"Our FinTech Future – AI’s Opportunities and Challenges?" is a presentation by Jim Kyung-Soo Liew, Ph.D. to the Artificial Intelligence Maryland (MD-AI) meetup (https://www.meetup.com/Maryland-AI/), November 20, 2019. Dr. Liew is Co-Founder of SoKat.co and Associate Professor at Johns Hopkins Carey Business School.
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto... (Seth Grimes)
Presentation by Nathan Schneider, Assistant Professor of Linguistics and Computer Science at Georgetown University, to the Washington DC Natural Language Processing meetup, October 14, 2019 (https://www.meetup.com/DC-NLP/events/264894589/).
The Ins and Outs of Preposition Semantics: Challenges in Comprehensive Corpu... (Seth Grimes)
Presentation by Nathan Schneider, Georgetown University, to the Washington DC Natural Language Processing meetup, October 14, 2019, https://www.meetup.com/DC-NLP/events/264894589/.
Presentation by Nick Schmidt of BLDS, LLC to the Maryland AI meetup, June 4, 2019 (https://www.meetup.com/Maryland-AI). Nick discusses ideas of fairness and how they apply to machine learning. He explores recent academic work on identifying and mitigating bias, and how his work in lending and employment can be applied to other industries. Nick explains how to measure whether an algorithm is fair and demonstrates techniques that model builders can use to ameliorate bias when it is found.
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
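One of the techniques listed above, computing the rank once for vertices that share the same in-links, can be sketched in Python as follows. This is a minimal illustration and not the STICD implementation: the function name, the damping factor of 0.85, and the tolerance are assumptions, and the sketch assumes a simple graph in which every vertex appears as a key and has at least one out-link (no dangling nodes).

```python
def pagerank_in_identical(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """Pull-based PageRank that computes one value per group of in-identical vertices.

    out_links maps each vertex to the list of vertices it links to.
    """
    vertices = list(out_links)
    n = len(vertices)

    # Build the in-link set of every vertex.
    in_links = {v: set() for v in vertices}
    for u, targets in out_links.items():
        for v in targets:
            in_links[v].add(u)

    # Vertices with exactly the same in-link set always receive the same rank.
    groups = {}
    for v in vertices:
        groups.setdefault(frozenset(in_links[v]), []).append(v)

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        new_rank = {}
        for sources, members in groups.items():
            # One pull per group instead of one per vertex.
            pulled = sum(rank[u] / len(out_links[u]) for u in sources)
            value = (1 - damping) / n + damping * pulled
            for v in members:
                new_rank[v] = value
        change = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if change < tol:
            break
    return rank, list(groups.values())

# Hypothetical graph: "b" and "c" are both linked only from "a".
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
ranks, groups = pagerank_in_identical(graph)
print(ranks, groups)
```

Here "b" and "c" fall into one group, so their rank is computed once per iteration; the saving grows with the number of in-identical vertices in the graph.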
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group (“MCG”) expects demand and the evolution of supply to be shaped by institutional investment rotating out of offices and into work from home (“WFH”), and by the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Text Analytics Applied (LIDER roadmapping presentation)
1. Text Analytics Applied
Seth Grimes
Alta Plana Corporation
@sethgrimes
2nd LIDER roadmapping
workshop – Madrid
May 8, 2014
2. Text Analytics Applied
2nd LIDER workshop
2
“Organizations embracing text analytics all
report having an epiphany moment when
they suddenly knew more than before.”
-- Philip Russom, the Data Warehousing Institute, 2007
http://tdwi.org/articles/2007/05/09-what-works/bi-search-and-text-analytics.aspx
4. Document input and processing
Knowledge handling is key
Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy) and television network librarian Bunny Watson (Katharine Hepburn) and the "electronic brain" EMERAC.
Hans Peter Luhn, “A Business Intelligence System”, IBM Journal, October 1958
5. Statistics and semantics
Text analytics involves statistical characterization and semantic understanding of text-derived features –
• Named entities: people, companies, places, etc.
• Pattern-based entities: e-mail addresses, phone numbers, etc.
• Concepts: abstractions of entities.
• Facts and relationships.
• Events.
• Concrete and abstract attributes (e.g., “expensive” & “comfortable”) including measure-value pairs.
• Subjectivity in the forms of opinions, sentiments, and emotions: attitudinal data.
– applied to business ends.
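Pattern-based entities such as e-mail addresses and phone numbers are the simplest of these features to extract, since regular expressions often suffice. The Python sketch below is illustrative only: both patterns are simplified assumptions that will miss or over-match real-world formats, and the function name and sample sentence are hypothetical.

```python
import re

# Simplified, illustrative patterns; production systems need far more careful rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def extract_pattern_entities(text):
    """Return pattern-based entities found in text, grouped by entity type."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": [match.strip() for match in PHONE_RE.findall(text)],
    }

sample = "Contact jane.doe@example.com or call +1 (555) 010-9999 for details."
entities = extract_pattern_entities(sample)
print(entities)
```

Named entities, concepts, and sentiment generally need statistical or linguistic models rather than fixed patterns, which is why the slide lists them separately.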
6. Sources
It’s a truism that 80% of enterprise-relevant information originates in “unstructured” form:
• E-mail and messages.
• Web pages, online news & blogs, forum postings, and other social media.
• Contact-center notes and transcripts.
• Surveys, feedback forms, warranty claims.
• Scientific literature, books, legal documents.
• ...
Non-text “unstructured” content?
• Images
• Audio including speech
• Video
Value derives from patterns.
7. Value
What do we do with information online, on-social, and in the enterprise?
1. Post/Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Extract and Analyze.
8. Semantics, analytics, and IR
Text analytics generates semantics to bridge search, BI, and applications, enabling next-generation information systems.
Diagram labels: Search; BI/Big Data; Applications
• Search based applications (search + text + apps)
• Information access (search + analytics)
• Synthesis (text + BI)/(big data)
• Text analytics (inner circle)
• Semantic search (search + text)
• NextGen CRM, EFM, MR, marketing, apps…
12. A big data analytics architecture (example)
http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html
13. Applications
Synthesis is cool, but let’s take a step back…
Text analytics has applications in:
• Intelligence & law enforcement.
• Life sciences & clinical medicine.
• Media & publishing including social-media analysis and contextual advertising.
• Competitive intelligence.
• Voice of the Customer: CRM, product management & marketing.
• Public administration & policy.
• Legal, tax & regulatory (LTR) including compliance.
• Recruiting.
14. Sentiment analysis
A specialization, of relevance to:
• Brand/reputation management.
• Customer experience management (CEM).
• Competitive intelligence.
• Survey analysis (EFM).
• Market research.
• Product design/quality.
• Trend spotting.
16. What are your primary applications where text comes into play?
Bar chart (horizontal axis 0% to 45%):
Military/national security/intelligence: 5%
Law enforcement: 6%
Intellectual property/patent analysis: 8%
Financial services/capital markets: 9%
Product/service design, quality assurance, or warranty claims: 10%
Other: 11%
Insurance, risk management, or fraud: 13%
E-discovery: 14%
Life sciences or clinical medicine: 15%
Online commerce including shopping, price intelligence, reviews: 16%
Content management or publishing: 25%
Customer/CRM: 27%
Search, information access, or Question Answering: 29%
Competitive intelligence: 33%
Brand/product/reputation management: 38%
Research (not listed): 38%
Voice of the Customer / Customer Experience Management: 39%
17. Voice of the Customer
Text analytics is applied to improve customer service and boost satisfaction and loyalty.
Analyze customer interactions and opinions –
• E-mail, contact-center notes, survey responses.
• Forum & blog posting and other social media.
– to –
• Address customer product & service issues.
• Improve quality.
• Manage brand & reputation.
Assessment of qualitative information from text helps users –
• Gain feedback on interactions.
• Assess customer value.
• Understand root causes.
• Mine data for measures such as churn likelihood.
18. Online commerce
Text analytics is applied for marketing, search optimization, competitive intelligence.
Analyze social media and enterprise feedback to understand the Voice of the Market:
• Opportunities
• Threats
• Trends
Categorize product and service offerings for on-site search and faceted navigation and to enrich content delivery.
Annotate pages to enhance Web-search findability, ranking.
Scrape competitor sites for offers and pricing.
Analyze social and news media for competitive information.
19. E-Discovery and compliance
Text analytics is applied for compliance, fraud and risk, and e-discovery.
Regulatory mandates and corporate practices dictate –
• Monitoring corporate communications
• Managing electronic stored information for production in event of litigation
Sources include e-mail (!!), news, social media
Risk avoidance and fraud detection are key to effective decision making
• Text analytics mines critical data from unstructured sources
• Integrated text-transactional analytics provides rich insights
20. Text Analytics Applied
What textual information are you analyzing or do you plan to
analyze?
• insurance claims or underwriting notes: 5%
• point-of-service notes or transcripts: 5%
• video or animated images: 5%
• warranty claims/documentation: 5%
• photographs or other graphical images: 7%
• crime, legal, or judicial reports or evidentiary materials: 9%
• field/intelligence reports: 11%
• speech or other audio: 11%
• patent/IP filings: 12%
• other: 12%
• text messages/instant messages/SMS: 12%
• medical records: 13%
• Web-site feedback: 16%
• social media not listed above: 19%
• chat: 20%
• employee surveys: 20%
• contact-center notes or transcripts: 22%
• e-mail and correspondence: 26%
• online reviews: 31%
• scientific or technical literature: 31%
• Facebook postings: 32%
• on-line forums: 36%
• customer/market surveys: 37%
• comments on blogs and articles: 38%
• news articles: 42%
• blogs (long form) including Tumblr: 43%
• Twitter, Sina Weibo, or other microblogs: 46%
21. Text Analytics Applied
What textual information are you analyzing or do you plan to
analyze?
Chart series: 2014, 2011, 2009. Recoverable values:
• Web-site feedback: 16%
• social media not listed above: 19%
• chat: 20%
• employee surveys: 20%
• contact-center notes or transcripts: 22%
• e-mail and correspondence: 26%
• online reviews: 31%
• scientific or technical literature: 31%
• Facebook postings: 32%
• on-line forums: 36%
• customer/market surveys: 37%
• comments on blogs and articles: 38%
• news articles: 42%
• blogs (long form) including Tumblr: 43%
• Twitter, Sina Weibo, or other microblogs: 46%
22. Text Analytics Applied
Do you currently need (or expect to need) to extract or analyze...
• Events: current 33%, expect 21%
• Semantic annotations: current 31%, expect 24%
• Other entities (phone numbers, part/product numbers, e-mail &
street addresses, etc.): current 34%, expect 23%
• Metadata such as document author, publication
date, title, headers, etc.: current 47%, expect 23%
• Concepts, that is, abstract groups of entities: current 51%, expect 28%
• Named entities (people, companies, geographic
locations, brands, ticker symbols, etc.): current 56%, expect 25%
• Relationships and/or facts: current 47%, expect 33%
• Sentiment, opinions, attitudes, emotions, perceptions, intent:
current 54%, expect 28%
• Topics and themes: current 66%, expect 22%
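The "other entities" category in this survey question (phone numbers, e-mail addresses, and the like) is often approached with pattern matching. The following is a minimal sketch using deliberately simplified regular expressions; the patterns and the `extract_entities` name are assumptions for illustration and will miss many real-world formats.

```python
import re
from typing import Dict, List

# Simplified patterns (assumptions): real e-mail and phone formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Return e-mail addresses and phone-like strings found in the text."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }
```

Named entities, relationships, and sentiment generally need linguistic or statistical models rather than patterns like these.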
23. Text Analytics Applied
What is important in a solution?
• export to Semantic Web formats…: 16%
• frontline voice of the customer (VOC) system integration: 18%
• media monitoring/analysis interface: 22%
• hosted or Web service (on-demand "API") option: 25%
• supports data fusion / unified analytics: 28%
• sector adaptation (e.g., hospitality, insurance, retail, health…: 30%
• BI (business intelligence) integration: 32%
• ability to create custom workflows or to create or change…: 33%
• big data capabilities, e.g., via Hadoop/MapReduce: 33%
• predictive-analytics integration: 36%
• open source: 37%
• support for multiple languages: 40%
• sentiment scoring: 41%
• "real time" capabilities: 43%
• low cost: 44%
• deep sentiment/emotion/opinion/intent extraction: 45%
• document classification: 53%
• broad information extraction capability: 53%
• ability to use specialized…: 54%
• ability to generate categories or taxonomies: 64%
24. Text Analytics Applied
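Non-English language support?
• Arabic: current 10%, within 2 years 17%
• Bahasa Indonesia or Malay: current 1%, within 2 years 3%
• Chinese: current 16%, within 2 years 28%
• Dutch: current 9%, within 2 years 7%
• French: current 36%, within 2 years 17%
• German: current 34%, within 2 years 24%
• Greek: current 2%, within 2 years 2%
• Hindi, Urdu, Bengali, Punjabi, or other…: current 2%, within 2 years 10%
• Italian: current 18%, within 2 years 11%
• Japanese: current 7%, within 2 years 15%
• Korean: current 4%, within 2 years 8%
• Polish: current 3%, within 2 years 4%
• Portuguese: current 13%, within 2 years 17%
• Russian: current 8%, within 2 years 21%
• Scandinavian or Baltic: current 7%, within 2 years 3%
• Spanish: current 38%, within 2 years 20%
• Turkish or Turkic: current 3%, within 2 years 4%
• Other African: current 2%, within 2 years 0%
• Other Arabic script (including…: current 3%, within 2 years 1%
• Other East Asian: current 2%, within 2 years 1%
• Other European or Slavic/Cyrillic: current 5%, within 2 years 2%
• Other: current 9%, within 2 years 0%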
25. Text Analytics Applied
Software & platform options
Text-analytics options may be grouped in general classes.
• Installed text-analysis application, whether desktop or
server or deployed in-database.
• Data mining workbench.
• Hosted.
• Programming tool.
• As-a-service, via an application programming interface
(API).
• Code library or component of a business/vertical
application, for instance for CRM, e-discovery, search.
Text analytics is frequently embedded in search or other
end-user applications.
The slides that follow present leading options in
each category except Hosted…
26. Text Analytics Applied
User decision criteria
Primary considerations include –
Adaptation or specialization: To a business or cultural domain,
language, information type (e.g., text, speech, images) &
source (e.g., Twitter, e-mail, online news).
By-user customization possibilities: For instance, via custom
taxonomies, rules, lexicons.
Sentiment resolution: Aggregate, message, or feature level.
(What features? Topics, coreferenced entities?)
What sentiment? Valence & what else? Emotion? Intent?
Outputs: E.g., annotated text, models, indicators, dashboards,
exploratory data interfaces.
Usage mode: As-a-service (API), installed, or hosted/cloud.
Capacity: Volume, performance, throughput, latency.
Cost.
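The three sentiment resolutions named above (aggregate, message, and feature level) can be contrasted with a toy lexicon-based scorer. Everything in this sketch is an assumption for illustration: the lexicon, the feature list, and the clause-splitting rule are hypothetical, and it measures valence only, not emotion or intent.

```python
import re
from statistics import mean
from typing import Dict, List

# Hypothetical toy lexicon (word -> valence) and product features.
LEXICON = {"great": 1, "love": 1, "fast": 1, "slow": -1, "poor": -1, "broken": -1}
FEATURES = ("battery", "screen")


def score(text: str) -> int:
    """Sum the valence of every lexicon word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def message_level(messages: List[str]) -> List[int]:
    """One score per message."""
    return [score(message) for message in messages]


def aggregate_level(messages: List[str]) -> float:
    """One score for the whole collection: the mean message score."""
    return mean(message_level(messages))


def feature_level(messages: List[str]) -> Dict[str, int]:
    """One score per feature, summed over the clauses that mention it."""
    totals = {feature: 0 for feature in FEATURES}
    for message in messages:
        # Split on sentence punctuation and on "but" to isolate clauses.
        for clause in re.split(r"[.;!?]|\bbut\b", message.lower()):
            for feature in FEATURES:
                if feature in clause:
                    totals[feature] += score(clause)
    return totals
```

A message such as "Great screen but the battery is poor." scores zero at message level while showing opposite valence for the two features, which is the kind of difference the resolution criterion is about.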