Text Analytics Past, Present &
Future: An Industry View
Seth Grimes
Alta Plana Corporation
@sethgrimes
June 5, 2014
Text Analytics: An Industry View
JADT – June 5, 2014
Analytics is the systematic application of
algorithmic methods that derive and deliver
information, typically expressed
quantitatively, whether in the form of
indicators, tables, visualizations, or models.
• Systematic means formal & repeatable.
• Algorithmic contrasts with heuristic.
Text analytics past:
Pioneers…
Document input and processing
Knowledge handling is key
Desk Set (1957): Computer engineer
Richard Sumner (Spencer Tracy)
and television network librarian
Bunny Watson (Katharine Hepburn)
and the "electronic brain" EMERAC.
Hans Peter Luhn
“A Business Intelligence System”
IBM Journal, October 1958
“Statistical information derived from word frequency and distribution is
used by the machine to compute a relative measure of significance, first for
individual words and then for sentences. Sentences scoring highest in
significance are extracted and printed out to become the auto-abstract.”
H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
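The idea in Luhn's description can be sketched in a few lines of Python. This is a simplified illustration, not Luhn's exact procedure: it scores each sentence by the average document frequency of its non-common words, and the stop-word list, function names, and sentence splitter are assumptions made for the example.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not the list from Luhn's paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "for", "by", "are", "it", "on"}


def tokenize(text):
    """Return lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def auto_abstract(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word significance: how often each non-stop word occurs in the whole document.
    freq = Counter(w for w in tokenize(text) if w not in STOP_WORDS)

    def score(sentence):
        words = [w for w in tokenize(sentence) if w not in STOP_WORDS]
        # Average significance, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Calling auto_abstract(document_text, n_sentences=3) would return the three sentences whose words are most frequent in the document, which is the "extract and print" step of the auto-abstract in miniature.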
Pipelines and patterns
IBM’s MedTAKMI,
1997-
http://www.research.ibm.com/trl/projects/textmining/index_e.htm
Exhaustive extraction
An (old) Attensity example – NLP to identify roles and
relationships, for a law-enforcement application.
Language engineering
GATE: General Architecture for Text Engineering.
http://gate.ac.uk/
Text analytics present:
Business, technology, applications, and
solutions…
“Organizations embracing text analytics all
report having an epiphany moment when
they suddenly knew more than before.”
-- Philip Russom, the Data Warehousing Institute, 2007
http://tdwi.org/articles/2007/05/09-what-works/bi-search-and-text-analytics.aspx
Linguistics, statistics, and semantics
Text analytics (typically) involves linguistic modelling,
statistical characterization, learned patterns, and
semantic understanding of text-derived features –
Named entities: people, companies, places, etc.
Pattern-based features: e-mail addresses, phone numbers,
etc.
Concepts: abstractions of entities.
Facts and relationships.
Events.
Concrete and abstract attributes (e.g., “expensive” &
“comfortable”) including measure-value pairs.
Subjectivity in the forms of opinions, sentiments, and
emotions: attitudinal data.
– applied to business ends.
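Pattern-based features are the simplest of these to illustrate. The sketch below pulls e-mail addresses and phone numbers out of free text with regular expressions. The two patterns are deliberately loose assumptions (the phone pattern covers only a 3-3-4 digit layout), and the function name is hypothetical.

```python
import re

# Loose illustrative patterns; production systems need locale-aware rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def extract_pattern_features(text):
    """Return the e-mail addresses and phone numbers found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }
```

Named entities, concepts, and sentiment generally need linguistic or statistical models rather than fixed patterns, which is why the slide lists them separately.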
Sources
It’s a truism that 80% of enterprise-relevant information
originates in “unstructured” form:
E-mail and messages.
Web pages, online news & blogs, forum postings, and other
social media.
Contact-center notes and transcripts.
Surveys, feedback forms, warranty claims.
Scientific literature, books, legal documents.
...
Non-text “unstructured” content?
Images
Audio including speech
Video
Value derives from patterns.
Value
What do we do with text, whether online, on-social, or in
the enterprise?
1. Post/Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata &
contents.
4. Extract information and Analyze.
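As a rough illustration of step 2, "Index and Search", the sketch below builds a minimal inverted index and answers all-terms-must-match queries. The function names and the tokenization rule are assumptions for the example; real search engines add ranking, stemming, and much more.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Steps 3 and 4 go further than this: they attach categories and extracted facts to the documents instead of only locating them.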
Semantics, analytics, and IR
Text analytics generates semantics to bridge search, BI, and
applications, enabling next-generation information
systems.
Diagram labels:
• Search, BI/Big Data, Applications.
• Text analytics (inner circle).
• Semantic search (search + text).
• Information access (search + analytics).
• Synthesis (text + BI)/(big data).
• Search based applications (search + text + apps).
• NextGen CRM, EFM, MR, marketing, apps…
Content, composites, connections 1
Content, composites, connections 2
Applications
Text analytics has applications in:
Intelligence & law enforcement.
Life sciences & clinical medicine.
Media & publishing including social-media analysis and
contextual advertising.
Competitive intelligence.
Voice of the Customer: CRM, product management &
marketing.
Public administration & policy.
Legal, tax & regulatory (LTR) including compliance.
Recruiting.
Opinion, sentiment & emotion
Sentiment analysis
A specialization, of relevance to:
Brand/reputation management.
Customer experience management (CEM).
Competitive intelligence.
Survey analysis (EFM = Enterprise Feedback Management).
Market research.
Product design/quality.
Trend spotting.
Data exploration via dashboards and workbenches.
Text analytics present:
The market…
http://altaplana.com/TA2014
What are your primary applications where text comes into play?
Military/national security/intelligence: 5%
Law enforcement: 6%
Intellectual property/patent analysis: 8%
Financial services/capital markets: 9%
Product/service design, quality assurance, or warranty claims: 10%
Other: 11%
Insurance, risk management, or fraud: 13%
E-discovery: 14%
Life sciences or clinical medicine: 15%
Online commerce including shopping, price intelligence,…: 16%
Content management or publishing: 25%
Customer/CRM: 27%
Search, information access, or Question Answering: 29%
Competitive intelligence: 33%
Brand/product/reputation management: 38%
Research (not listed): 38%
Voice of the Customer / Customer Experience Management: 39%
Voice of the Customer
Text analytics is applied to improve customer service and
boost satisfaction and loyalty.
Analyze customer interactions and opinions –
• E-mail, contact-center notes, survey responses.
• Forum & blog postings and other social media.
– to –
• Address customer product & service issues.
• Improve quality.
• Manage brand & reputation.
Assessment of qualitative information from text helps users –
• Gain feedback on interactions.
• Assess customer value.
• Understand root causes.
• Mine data for measures such as churn likelihood.
The commercial scene
Online commerce
Text analytics is applied for marketing, search optimization,
competitive intelligence.
Analyze social media and enterprise feedback to understand
the Voice of the Market:
• Opportunities
• Threats
• Trends
Categorize product and service offerings for on-site search
and faceted navigation and to enrich content delivery.
Annotate pages to enhance Web-search findability, ranking.
Scrape competitor sites for offers and pricing.
Analyze social and news media for competitive information.
E-Discovery and compliance
Text analytics is applied for compliance, fraud and risk, and
e-discovery.
Regulatory mandates and corporate practices dictate –
• Monitoring corporate communications
• Managing electronically stored information for production in
event of litigation
Sources include e-mail (!!), news, social media
Risk avoidance and fraud detection are key to effective
decision making
• Text analytics mines critical data from unstructured sources
• Integrated text-transactional analytics provides rich insights
What textual information are you analyzing or do you plan to analyze?
(Chart comparing 2014, 2011, and 2009 responses; one set of value labels is shown.)
Web-site feedback: 16%
social media not listed above: 19%
chat: 20%
employee surveys: 20%
contact-center notes or transcripts: 22%
e-mail and correspondence: 26%
online reviews: 31%
scientific or technical literature: 31%
Facebook postings: 32%
on-line forums: 36%
customer/market surveys: 37%
comments on blogs and articles: 38%
news articles: 42%
blogs (long form+micro): 61%
What textual information are you analyzing or do you plan to analyze?
insurance claims or underwriting notes: 5%
point-of-service notes or transcripts: 5%
video or animated images: 5%
warranty claims/documentation: 5%
photographs or other graphical images: 7%
crime, legal, or judicial reports or evidentiary materials: 9%
field/intelligence reports: 11%
speech or other audio: 11%
patent/IP filings: 12%
other: 12%
text messages/instant messages/SMS: 12%
medical records: 13%
Web-site feedback: 16%
social media not listed above: 19%
chat: 20%
employee surveys: 20%
contact-center notes or transcripts: 22%
e-mail and correspondence: 26%
online reviews: 31%
scientific or technical literature: 31%
Facebook postings: 32%
on-line forums: 36%
customer/market surveys: 37%
comments on blogs and articles: 38%
news articles: 42%
blogs (long form) including Tumblr: 43%
Twitter, Sina Weibo, or other microblogs: 46%
Do you currently need (or expect to need) to extract or analyze...
Events: Current 33%, Expect 21%
Semantic annotations: Current 31%, Expect 24%
Other entities – phone numbers, part/product…: Current 34%, Expect 23%
Metadata such as document author,…: Current 47%, Expect 23%
Concepts, that is, abstract groups of entities: Current 51%, Expect 28%
Named entities – people, companies,…: Current 56%, Expect 25%
Relationships and/or facts: Current 47%, Expect 33%
Sentiment, opinions, attitudes, emotions,…: Current 54%, Expect 28%
Topics and themes: Current 66%, Expect 22%
“The sharp rise in users
who selected
Arabic…coincided with
much of the civil
unrest… in Middle
Eastern countries.”
http://bits.blogs.nytimes.com/2014/03/09/the-languages-of-twitter-users/
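Non-English language support?
Arabic: Current 10%, Within 2 years 17%
Bahasa Indonesia or Malay: Current 1%, Within 2 years 3%
Chinese: Current 16%, Within 2 years 28%
Dutch: Current 9%, Within 2 years 7%
French: Current 36%, Within 2 years 17%
German: Current 34%, Within 2 years 24%
Greek: Current 2%, Within 2 years 2%
Hindi, Urdu, Bengali, Punjabi, or…: Current 2%, Within 2 years 10%
Italian: Current 18%, Within 2 years 11%
Japanese: Current 7%, Within 2 years 15%
Korean: Current 4%, Within 2 years 8%
Polish: Current 3%, Within 2 years 4%
Portuguese: Current 13%, Within 2 years 17%
Russian: Current 8%, Within 2 years 21%
Scandinavian or Baltic: Current 7%, Within 2 years 3%
Spanish: Current 38%, Within 2 years 20%
Turkish or Turkic: Current 3%, Within 2 years 4%
Other African: Current 2%, Within 2 years 0%
Other Arabic script (including Urdu,…: Current 3%, Within 2 years 1%
Other East Asian: Current 2%, Within 2 years 1%
Other European or Slavic/Cyrillic: Current 5%, Within 2 years 2%
Other: Current 9%, Within 2 years 0%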
Software & platform options
Text-analytics options may be grouped in general classes.
• Installed text-analysis application, whether desktop or
server or deployed in-database.
• Data mining workbench.
• Hosted.
• Programming tool.
• As-a-service, via an application programming interface
(API).
• Code library or component of a business/vertical
application, for instance for CRM, e-discovery, search.
Text analytics is frequently embedded in search or other
end-user applications.
The slides that follow present leading options in
each category except Hosted…
What is important in a solution?
(Chart comparing 2014 (n=139), 2011 (n=136), and 2009 (n=78) responses; one set of value labels is shown.)
media monitoring/analysis interface: 22%
hosted or Web service (on-demand "API") option: 25%
supports data fusion / unified analytics: 28%
sector adaptation (e.g., hospitality, insurance, retail, health care,…: 30%
BI (business intelligence) integration: 32%
ability to create custom workflows or to create or change…: 33%
big data capabilities, e.g., via Hadoop/MapReduce: 33%
predictive-analytics integration: 36%
open source: 37%
support for multiple languages: 40%
sentiment scoring: 41%
"real time" capabilities: 43%
low cost: 44%
deep sentiment/emotion/opinion/intent extraction: 45%
document classification: 53%
broad information extraction capability: 53%
ability to use specialized dictionaries, taxonomies, ontologies, or…: 54%
ability to generate categories or taxonomies: 64%
User decision criteria
Primary considerations include –
Adaptation or specialization: To a business or cultural domain,
language, information type (e.g., text, speech, images) &
source (e.g., Twitter, e-mail, online news).
By-user customization possibilities: For instance, via custom
taxonomies, rules, lexicons.
Sentiment resolution: Aggregate, message, or feature level.
(What features? Topics, coreferenced entities?)
What sentiment? Valence & what else? Emotion? Intent?
Outputs: E.g., annotated text, models, indicators, dashboards,
exploratory data interfaces.
Usage mode: As-a-service (API), installed, or hosted/cloud.
Capacity: Volume, performance, throughput, latency.
Cost.
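The "sentiment resolution" criterion can be made concrete with a toy example. The sketch below scores the same text at message, aggregate, and feature level using a tiny hand-made lexicon. The lexicon, the two-token context window, and all function names are assumptions for illustration only; commercial tools use far richer models.

```python
import re
from statistics import mean

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"great": 1, "comfortable": 1, "love": 1,
           "expensive": -1, "slow": -1, "broken": -1}


def tokens(text):
    """Return lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def message_score(message):
    """Message level: net valence of one message."""
    return sum(LEXICON.get(t, 0) for t in tokens(message))


def aggregate_score(messages):
    """Aggregate level: mean message score over a collection."""
    return mean(message_score(m) for m in messages)


def feature_scores(message, features, window=2):
    """Feature level: valence of lexicon words near each feature term."""
    toks = tokens(message)
    scores = {}
    for i, tok in enumerate(toks):
        if tok in features:
            nearby = toks[max(0, i - window): i + window + 1]
            scores[tok] = scores.get(tok, 0) + sum(LEXICON.get(t, 0) for t in nearby)
    return scores
```

For "The seat is comfortable but the price is expensive." the message-level score nets out to zero, while the feature-level view separates a positive "seat" from a negative "price". That difference is what the resolution question is probing.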
A few French companies
Academic spin-offs
People Pattern
Text analytics future:
Synthesis and sensemaking.
New York Times,
September 8, 1957
Emotion in text
Emotion and outcomes
Beyond Text
Audio including speech.
Images.
Video.
http://www.geekosystem.com/facebook-face-recognition/
http://www.sciencedirect.com/science/article/pii/S0167639312000118
http://flylib.com/books/en/2.495.1.54/1/
The world of big data
Machine data (e.g., logs, sensor outputs, clickstreams).
Actions, interactions, and transactions: geolocation and
time.
Profiles: individual, demographic & behavioral.
Text, audio, images, and video.
Facts and feelings.
(Accessible) data everywhere
http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html
A big data analytics architecture (example)
http://searchuserinterfaces.com/
“It is convenient to divide the entire
information access process into two
main components: information
retrieval through searching and
browsing, and analysis and synthesis
of results. This broader process is
often referred to in the literature as
sensemaking.
Sensemaking refers to an iterative
process of formulating a conceptual
representation from a large
volume of information.”
– Marti Hearst, 2009
Sensemaking
http://www.businessweek.com/magazine/content/04_19/b3882029_mz072.htm
En route