Text analytics involves applying natural language processing techniques like named entity recognition, sentiment analysis, and topic modeling to extract insights from text data sources. It is used for applications like customer experience, market research, and competitive intelligence. The presentation provided an overview of text analytics approaches and tools, highlighting how it is part of business intelligence and data science solutions. Examples of early natural language processing work from the 1950s were also discussed.
3. Analytics is the systematic, repeatable application
of algorithmic methods that derive and deliver
information, typically expressed quantitatively,
whether in the form of indicators, tables,
visualizations, or models.
• Systematic means formal & repeatable.
• Algorithmic contrasts with heuristic.
• Information ≠ knowledge.
Text analytics is a term for software and business
processes that apply natural language processing
(NLP) to extract & communicate business insights
from social, online, and enterprise text sources.
4. Text analytics (typically) involves linguistic
modelling, statistical characterization, learned
patterns, and semantic understanding of text-
derived features –
• Named entities: people, companies, places, etc.
• Pattern-based features: e-mail addresses, phone
numbers, etc.
• Concepts: abstractions of entities.
• Facts and relationships.
• Events.
• Concrete and abstract attributes (e.g., “expensive” &
“comfortable”) including measure-value pairs.
• Subjectivity in the forms of opinions, sentiments, and
emotions: attitudinal & affective data.
– applied to business ends.
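Of the features listed above, the pattern-based ones are the simplest to illustrate. The sketch below is a minimal example that assumes simplified regular expressions for e-mail addresses and North-American-style phone numbers. The names `EMAIL_RE`, `PHONE_RE` and `extract_pattern_features` are hypothetical. Real text analytics tools handle many more formats and use far more robust rules.

```python
import re

# Simplified, illustrative patterns (assumption): real e-mail and phone
# formats are far more varied than these regular expressions allow.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# Hypothetical North-American style: optional parenthesized area code,
# then 3 + 4 digits separated by "-", "." or whitespace.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b")


def extract_pattern_features(text: str) -> dict[str, list[str]]:
    """Return the e-mail addresses and phone numbers found in `text`."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


if __name__ == "__main__":
    sample = "Contact sales@example.com or call (555) 123-4567 / 555.987.6543."
    print(extract_pattern_features(sample))
```

Regular expressions suit this class of feature because e-mail addresses and phone numbers follow surface patterns. Named entities, concepts, and sentiment, by contrast, generally need linguistic or statistical models.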
6. “Statistical information derived from word frequency and
distribution is used by the machine to compute a relative
measure of significance, first for individual words and
then for sentences.”
-- H.P. Luhn, The Automatic Creation of Literature
Abstracts, IBM Journal, 1958.
Early text modeling (1958)
http://wordle.net
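Luhn's quote describes a two-step computation: first find significant words from frequency, then score sentences by how those words cluster. The following is a minimal Luhn-style sketch under stated assumptions. A small stop list stands in for Luhn's high-frequency cutoff. `min_freq=2` is an assumed low-frequency cutoff, and `max_gap=4` is an assumed maximum number of insignificant words allowed inside a cluster. The sentence splitter is naive. All function names are hypothetical. A cluster scores (significant words)² divided by the cluster's span in words, and a sentence keeps its best cluster score.

```python
import re
from collections import Counter

# Small stop list (assumption) standing in for Luhn's high-frequency cutoff.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are",
              "for", "on", "with", "by", "it", "that", "this", "as", "be"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def significant_words(sentences: list[str], min_freq: int = 2) -> set[str]:
    """Non-stop words occurring at least `min_freq` times in the document."""
    counts = Counter(w for s in sentences for w in tokenize(s) if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_freq}


def sentence_score(sentence: str, significant: set[str], max_gap: int = 4) -> float:
    """Best cluster score: (significant words in cluster)^2 / cluster span."""
    positions = [i for i, w in enumerate(tokenize(sentence)) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:  # gap is small: extend the current cluster
            count += 1
        else:                          # gap too wide: close the cluster, start anew
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def summarize(text: str, n: int = 1) -> list[str]:
    """Return the `n` highest-scoring sentences (a crude extractive abstract)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    sig = significant_words(sentences)
    return sorted(sentences, key=lambda s: sentence_score(s, sig), reverse=True)[:n]
```

For example, in "Text analytics extracts insight from text. Analytics of text supports business decisions. The weather was pleasant yesterday." only "text" and "analytics" are significant. The first sentence then scores highest because three significant words fall in a six-word span.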
7. Diagram labels: “Document input and processing”; “Knowledge handling.”
Desk Set (1957): computer engineer Richard Sumner (Spencer Tracy), television network librarian Bunny Watson (Katharine Hepburn), and the "electronic brain" EMERAC.
Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958
8. Same era (~1957), foreshadowing the NLP application
of the Distributional Hypothesis, including embeddings:
• “You shall know a word by the company it keeps.”
-- J.R. Firth
• Keyword in Context (KWIC) Indexing
-- H.P. Luhn
See Manning and Schütze, Foundations of
Statistical Natural Language Processing,
1999
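Both ideas on this slide can be made concrete with a few lines of code. The sketch below is a minimal illustration, not Luhn's or Firth's own procedure. `kwic` prints each occurrence of a keyword with a fixed number of words on either side. `context_vectors` and `cosine` apply Firth's "company it keeps" idea with raw co-occurrence counts in a symmetric window and cosine similarity. That approach is a simple count-based stand-in for the dense, learned vectors that modern embeddings use. The function names, the `width` and `window` parameters, and the toy sentence are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Keyword in Context: each occurrence with `width` words on either side."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def context_vectors(text: str, window: int = 2) -> dict[str, Counter]:
    """For each word, count the words appearing within `window` positions of it."""
    tokens = tokenize(text)
    vectors: defaultdict[str, Counter] = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vectors[tok][tokens[j]] += 1
    return dict(vectors)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    toy = "the cat drinks milk. the dog drinks water. the cat chases the dog."
    print(kwic(toy, "drinks", width=1))
    v = context_vectors(toy, window=1)
    print(cosine(v["cat"], v["dog"]), cosine(v["cat"], v["milk"]))
```

In the toy sentence, "cat" and "dog" keep similar company ("the", "drinks"), so their context vectors are closer to each other than "cat" is to "milk". This is the distributional intuition in miniature.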
15. From a user survey I ran earlier this year…
One respondent’s comment: “Since language technologies are still immature the vendor
landscape is highly fragmented and with no clear market leader. Most of the vendors
provide APIs for development staff requiring specific technical expertise, new skills and
systems to learn. Other simplified text analytic tools are usually narrow domain dedicated
and require plenty of manual work, manually built knowledge bases, and long training.”
16. The How of text analytics:
• Analysis workbenches
• Business applications
• Tools