An Introduction to Text Analytics
in IBM SPSS Modeler
Skylar Ritchie
Shawn Bergman
Objectives
1. To give a broad overview of text analytics…
a. Defining key terms
b. Describing important steps in the process
2. To provide a step-by-step tutorial for how to use IBM SPSS Modeler
to...
a. Read in source text
b. Extract concepts, sentiment, and text link patterns from records
c. Categorize records
d. Visualize the results
Overview of Text Analytics
Objective #1
Text Analytics
“The process of deriving high quality
information from text”
--Marisa Peacock, Social Media Strategist
“A technology and process both, a
mechanism for knowledge discovery applied
to documents, a means of finding value in
text. Solutions…analyze linguistic
structure...discern entities...as well as
relationships, concepts, and even
sentiments. They...automate
classification...of source documents. They
exploit visualization for exploratory analysis.”
--Seth Grimes, Analytics Strategy Consultant
1. Extraction: to discern entities, relationships, concepts, and sentiments
2. Categorization: to automate classification
3. Visualization: to support exploratory analysis
What does text analytics “look” like?
1: Source Text
• File
• Web Feed
2: Dictionaries
• Substitution
• Type
• Exclude
3: Extraction Results
• Concepts
• Types
• Text Link Analysis Patterns
4: Grouping Techniques
• Concept Inclusion
• Concept Root Derivation
• Semantic Network
• Co-occurrence
5: Categorization Results
• Categories
• Descriptors
Sourcing (Step 1) → Extracting (Steps 2-3) → Categorizing (Steps 4-5) → Visualizing
Handout provided
Key Terms: Sourcing (Step 1)
Source text file
Field
Document/record
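The slide names these terms without defining them. As a rough illustration, assuming the usual tabular reading (a field is a column, and a document/record is one row whose text field is analyzed), the sketch below reads a small in-memory CSV. The file contents and the `comment` field name are hypothetical, and Modeler's own source nodes are not involved.

```python
import csv
import io

# Hypothetical source text file: a header row of field names,
# then one line per document/record.
SOURCE_TEXT = """id,comment
1,This is a 5 star university
2,Undergraduates abhor mere lecturers
"""


def read_text_field(source, field):
    """Return the value of one field (column) for every record (row)."""
    reader = csv.DictReader(io.StringIO(source))
    return [row[field] for row in reader]


records = read_text_field(SOURCE_TEXT, "comment")
for number, text in enumerate(records, start=1):
    print(number, text)
```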
Key Terms: Extracting (Steps 2-3)
Types: higher-level concepts (e.g., <Organization>)
Concepts: lead terms under which similar terms are grouped together (e.g., university)
Terms: single words (uni-terms) or word phrases (multi-terms) that are interesting or relevant (e.g., university, college, school, academy, institute, polytechnic, alma mater, graduate school…)
Handout provided
Substitution Dictionary: Terms → Concepts
An editable collection of synonymous terms grouped under a target term, or concept
Target Term: Synonyms
university: university, college, school, academy, institute, polytechnic, alma mater, graduate school…
student: student, scholar, undergraduate, graduate, grad student, postdoctoral fellow, freshman, sophomore, junior, senior…
professor: professor, prof, tenured faculty member, dean, assistant professor, associate professor, lecturer, academic…
Example: the terms university, graduate school, and college are all grouped under the concept university
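To make the terms-to-concepts mapping concrete, here is a toy sketch that replaces each known term with its target term. It is not how Modeler's extraction engine works internally; it assumes simple lower-cased, whole-word matching, and `SUBSTITUTIONS` is a hypothetical excerpt of the table above.

```python
import re

# Hypothetical excerpt of the substitution dictionary above:
# target term (concept) -> synonymous terms.
SUBSTITUTIONS = {
    "university": ["college", "school", "polytechnic", "alma mater",
                   "graduate school"],
    "student": ["scholar", "undergraduate", "graduate", "grad student"],
    "professor": ["prof", "lecturer", "dean", "assistant professor"],
}


def build_lookup(substitutions):
    """Invert the dictionary so that every term points to its target term."""
    lookup = {}
    for target, synonyms in substitutions.items():
        lookup[target] = target
        for synonym in synonyms:
            lookup[synonym] = target
    return lookup


def extract_concepts(text, substitutions):
    """Return the concept for each known term found in the text, in order.

    Longer terms are tried first, so the multi-term "graduate school"
    is preferred over the uni-term "graduate".
    """
    lookup = build_lookup(substitutions)
    terms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")
    return [lookup[m.group(0)] for m in pattern.finditer(text.lower())]


print(extract_concepts("My prof left the graduate school for a polytechnic",
                       SUBSTITUTIONS))
```

Trying longer terms first matters because Python's regex alternation takes the first alternative that matches, not the longest one.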
Type Dictionary: Concepts → Types
An editable collection of
concepts grouped under a
label known as the type
name
Concept Type
5 star <Positive>
a lot better <Positive>
beyond my expectations <Positive>
abhor <Negative>
bizarre <Negative>
can’t stand <Negative>
all about the same <Uncertain>
been with it for too little time <Uncertain>
can’t think of any <Uncertain>
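A type dictionary is essentially a lookup from concept to type name. The sketch below uses a hypothetical excerpt of the table above plus the <Organization> example from the Key Terms slide; the fallback to <Unknown> for unlisted concepts is an assumption, chosen because the TLA examples label such concepts <Unknown>.

```python
# Excerpt of the type dictionary (concept -> type name); the
# "university" entry comes from the Key Terms slide.
TYPE_DICTIONARY = {
    "university": "<Organization>",
    "5 star": "<Positive>",
    "a lot better": "<Positive>",
    "abhor": "<Negative>",
    "can't stand": "<Negative>",
    "all about the same": "<Uncertain>",
}


def assign_types(concepts, types=TYPE_DICTIONARY):
    """Pair each concept with its type; unlisted concepts get <Unknown>."""
    return [(concept, types.get(concept, "<Unknown>")) for concept in concepts]


pairs = assign_types(["university", "5 star", "lecturers"])
print(pairs)
```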
Exclude Dictionary
An editable collection of terms
and types that will be removed
from the final extraction results
Exclude List
any kind of problem
can’t say enough
can’t wait
i was out of
if it ain’t broke, don’t fix it
prefer not to
to work with
went down to
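The effect of the exclude dictionary can be sketched as a final filter over the extraction results. This minimal version handles terms only (not types) and assumes case-insensitive matching; the entries are an excerpt of the list above.

```python
# Excerpt of the exclude list above, stored in lower case.
EXCLUDE = {
    "any kind of problem", "can't say enough", "can't wait",
    "prefer not to", "to work with", "went down to",
}


def apply_exclude_list(extracted, exclude=EXCLUDE):
    """Drop extracted terms that appear in the exclude list."""
    return [term for term in extracted if term.lower() not in exclude]


extracted = ["university", "Can't wait", "5 star", "to work with"]
print(apply_exclude_list(extracted))
```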
Text Link Analysis (TLA)
A pattern-matching technology that is used to extract relationships found between either concepts or types
• Type pattern: <Organization> + <Positive>
  Concept pattern: university + excellent
  Matching sentence: “This is a 5 star university”
• Type pattern: <Unknown> + <Unknown> + <Negative>
  Concept pattern: undergraduates + lecturers + dislike
  Matching sentence: “Undergraduates abhor mere lecturers”
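The link between a type pattern and its concept pattern can be illustrated with a toy matcher. Real TLA relies on linguistic pattern rules, which this sketch does not attempt: it only checks whether a sentence supplies a concept of every type in the pattern and reports those concepts in the pattern's order (the slide's first example lists the organization first even though “5 star” precedes “university” in the sentence). The (concept, type) pairs are copied from the slide rather than computed.

```python
def match_type_pattern(pattern, sentence):
    """Fill each slot of a type pattern with a concept from the sentence.

    `pattern` is a list of type names; `sentence` is a list of
    (concept, type) pairs. Each concept can fill at most one slot.
    Returns the concept pattern, or None if some slot stays empty.
    """
    used = set()
    concept_pattern = []
    for wanted in pattern:
        for index, (concept, type_name) in enumerate(sentence):
            if index not in used and type_name == wanted:
                used.add(index)
                concept_pattern.append(concept)
                break
        else:  # no concept of the wanted type was found
            return None
    return concept_pattern


# Concepts and types for the two example sentences, taken from the slide.
five_star = [("excellent", "<Positive>"), ("university", "<Organization>")]
abhor = [("undergraduates", "<Unknown>"), ("lecturers", "<Unknown>"),
         ("dislike", "<Negative>")]

print(" + ".join(match_type_pattern(["<Organization>", "<Positive>"], five_star)))
print(" + ".join(match_type_pattern(["<Unknown>", "<Unknown>", "<Negative>"], abhor)))
```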
Handout provided
Key Terms: Categorizing (Steps 4-5)
Categorization: the process of assigning records to a category when the text within them matches a descriptor
Category: a higher-level idea that captures the central message of the text
Descriptor: a concept, type, pattern, or category rule that has been used to define a category
Descriptors
• Concepts
• Types
• TLA patterns
• Category rules
Category Rules
Statements that classify records into a category based on a logical expression that combines extracted concepts, types, and patterns with Boolean operators
Operator, meaning, and examples:
+ “And” (order important)
  • <Organization> + <Positive>
  • university + excellent
& “And” (order not important)
  • <Positive> & <Organization>
  • excellent & university
| “Or”
  • <Person> | <Organization>
  • student | university
!() “Not”
  • !(<Person>)
  • !(student)
Matching sentence: This is a 5 star university
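The &, |, and !() operators can be mimicked with small rule-building functions. This sketch does not parse Modeler's rule syntax; rules are written as nested Python calls, a record is assumed to be the set of concepts and types extracted from it, and the order-sensitive + operator is left out.

```python
# A record is represented by the set of concepts and types extracted from it.
def has(item):
    """Rule that matches records containing one concept or type."""
    return lambda record: item in record


def all_of(*rules):
    """'&': every sub-rule must match (order not important)."""
    return lambda record: all(rule(record) for rule in rules)


def any_of(*rules):
    """'|': at least one sub-rule must match."""
    return lambda record: any(rule(record) for rule in rules)


def negate(rule):
    """'!()': the sub-rule must not match."""
    return lambda record: not rule(record)


# "This is a 5 star university", after extraction and typing.
record = {"university", "excellent", "<Organization>", "<Positive>"}

rules = {
    "excellent & university": all_of(has("excellent"), has("university")),
    "<Person> | <Organization>": any_of(has("<Person>"), has("<Organization>")),
    "!(student)": negate(has("student")),
    "!(<Organization>)": negate(has("<Organization>")),
}
for text, rule in rules.items():
    print(f"{text}: {rule(record)}")
```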
Handout provided
Wildcard Operator
The operator *, which acts as a variable and stands in for a missing word or word fragment
• Space after word: graduate * matches “graduate school” and “graduate student”
• Space before word: * graduate matches “university graduate”
• No space after word: graduate* matches “graduates” and “graduated”
• No space before word: *graduate matches “undergraduate”
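One way to reason about these four cases is to translate a wildcard expression into a regular expression. This is a sketch of the matching behavior described above, not Modeler's implementation; it assumes a free-standing * stands for exactly one whole word and an attached * stands for a non-empty word fragment.

```python
import re


def wildcard_to_regex(expression):
    """Compile a wildcard expression such as 'graduate *' or '*graduate'.

    Assumptions: a free-standing '*' stands for one whole word, and a '*'
    attached to a word stands for a non-empty word fragment.
    """
    parts = []
    for token in expression.split():
        if token == "*":
            parts.append(r"\w+")  # a missing whole word
        else:
            # Escape the literal text, then turn each '*' into a fragment.
            parts.append(re.escape(token).replace(r"\*", r"\w+"))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def matches(expression, phrase):
    return wildcard_to_regex(expression).search(phrase) is not None


for expression, phrase in [("graduate *", "graduate student"),
                           ("* graduate", "university graduate"),
                           ("graduate*", "graduated"),
                           ("*graduate", "undergraduate"),
                           ("graduate*", "graduate school")]:
    print(f"{expression!r} vs {phrase!r}: {matches(expression, phrase)}")
```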
Grouping Techniques
The mechanisms underlying the categorization process
Extraction Results (Concepts, Types, Text Link Analysis Patterns) → Grouping Techniques (Concept Inclusion, Concept Root Derivation, Semantic Network, Co-occurrence) → Categorization Results (Categories, Descriptors)
Handout provided
Concept Inclusion
What?
Grouping based on subsets and supersets
How?
1. Breaking concepts into components
2. De-inflecting components
When?
Text that is somewhat technical
Example
Concepts: graduate faculty; faculty committees; tenured faculty members
Components: {graduate, faculty}; {faculty, committees}; {tenured, faculty, members}
De-inflected components: {graduate, faculty}; {faculty, committee}; {tenure, faculty, member}
Descriptor (shared de-inflected component): faculty
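The subset/superset idea can be sketched directly with Python sets. The de-inflection table here is a toy stand-in for real morphological analysis (an assumption), and a concept is taken to include a descriptor when its de-inflected components are a superset of the descriptor's.

```python
# Toy de-inflection table standing in for real morphological analysis.
DEINFLECT = {"committees": "committee", "tenured": "tenure", "members": "member"}


def components(concept):
    """Break a concept into words and de-inflect each one."""
    return {DEINFLECT.get(word, word) for word in concept.lower().split()}


def includes(concept, descriptor):
    """True if the concept's components are a superset of the descriptor's."""
    return components(descriptor) <= components(concept)


concepts = ["graduate faculty", "faculty committees", "tenured faculty members"]
for descriptor in ["faculty", "faculty committee", "tenure faculty"]:
    print(descriptor, "->", [c for c in concepts if includes(c, descriptor)])
```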
Concept Root Derivation
What?
Grouping based on morphological relationships
How?
1. Breaking concepts into components
2. De-inflecting components
3. Removing suffixes to find root
When?
Any text, but few categories
Example
Concepts: studies in psychology; psychological studies; noteworthy psychologist
Components: {studies, psychology}; {psychological, studies}; {noteworthy, psychologist}
De-inflected components: {study, psychology}; {psychological, study}; {noteworthy, psychologist}
Descriptor (shared de-inflected component root): psycholog-
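The three steps above can be sketched with a toy de-inflection table and a short suffix list. Both resources and the stop-word set are assumptions for illustration, not Modeler's linguistic resources; the shared root is found by intersecting the root sets of all concepts.

```python
# Toy resources (assumptions): a de-inflection table, a few suffixes,
# and stop words that are not treated as components.
DEINFLECT = {"studies": "study"}
SUFFIXES = ("ical", "ist", "y")
STOP_WORDS = {"in", "of", "the"}


def root(word):
    """De-inflect a word, then strip the first matching suffix."""
    word = DEINFLECT.get(word, word)
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def roots(concept):
    """Set of component roots for one concept."""
    return {root(w) for w in concept.lower().split() if w not in STOP_WORDS}


concepts = ["studies in psychology", "psychological studies", "noteworthy psychologist"]
shared = set.intersection(*(roots(c) for c in concepts))
print(shared)
```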
Semantic Network
What?
Grouping based on semantic relationships
How?
• Synonyms: “are” relationship
• Hyponyms: “is a” relationship
When?
Text that is not highly technical
Examples
• Category: educators; synonyms: professors, teachers
• Category: social science; hyponym: psychology (psychology is a social science)
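As a minimal sketch, the two relationships can be modeled as lookups in a tiny hand-built network. The network contents come from the slide's examples; a real semantic network would be drawn from a much larger lexical resource, so treat both dictionaries as hypothetical.

```python
# Hypothetical mini semantic network built from the slide's examples.
SYNONYMS = {"educators": {"professors", "teachers"}}   # "are" relationship
HYPERNYMS = {"psychology": "social science"}           # hyponym -> "is a" parent


def semantic_category(concept):
    """Return the category a concept joins, or None if it has no link."""
    for category, synonyms in SYNONYMS.items():
        if concept in synonyms:
            return category
    return HYPERNYMS.get(concept)


def group_semantically(concepts):
    """Group concepts under the categories they are linked to."""
    groups = {}
    for concept in concepts:
        category = semantic_category(concept)
        if category is not None:
            groups.setdefault(category, []).append(concept)
    return groups


print(group_semantically(["professors", "psychology", "teachers", "ASU"]))
```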
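Co-occurrence
What?
Grouping based on concepts that appear together
How?
$$C_{XY} \ge 2 \;\Rightarrow\; \text{Similarity} = \frac{(C_{XY})^2}{C_X \times C_Y}$$
where $C_X$ and $C_Y$ count the records containing concepts X and Y, and $C_{XY}$ counts the records containing both
When?
Any text, but many categories based on possibly distant relationships
Example
• “Students flock to ASU”: students = W, ASU = X
• “ASU focuses on sustainability”: ASU = X, sustainability = Y
• “Sustainability is the way of the future”: sustainability = Y, way of the future = Z
$$C_W = 1,\quad C_X = 2,\quad C_Y = 2,\quad C_Z = 1,\quad C_{WX} = 1,\quad C_{XY} = 1,\quad C_{YZ} = 1$$
$$\frac{(C_{WX})^2}{C_W \times C_X} = \frac{1^2}{1 \times 2} = \frac{1}{2}$$
$$\frac{(C_{XY})^2}{C_X \times C_Y} = \frac{1^2}{2 \times 2} = \frac{1}{4}$$
$$\frac{(C_{YZ})^2}{C_Y \times C_Z} = \frac{1^2}{2 \times 1} = \frac{1}{2}$$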
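To check the arithmetic above, a small sketch can count concepts per record and compute the similarity for every pair. Treating each sentence as one record and each record as a set of concept names follows the example; `min_pair_count` is a hypothetical parameter standing in for the C_XY ≥ 2 condition. Its default of 1 reproduces the worked values; with a threshold of 2, none of the three example pairs would qualify.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def cooccurrence_similarity(records, min_pair_count=1):
    """Similarity (C_XY)^2 / (C_X * C_Y) for every pair of concepts.

    `records` is a list of concept sets. Pairs seen together in fewer
    than `min_pair_count` records are skipped.
    """
    single = Counter()  # C_X: records containing each concept
    pair = Counter()    # C_XY: records containing both concepts
    for concepts in records:
        single.update(concepts)
        pair.update(combinations(sorted(concepts), 2))
    return {
        (x, y): Fraction(c_xy ** 2, single[x] * single[y])
        for (x, y), c_xy in pair.items()
        if c_xy >= min_pair_count
    }


records = [
    {"students", "ASU"},                      # Students flock to ASU
    {"ASU", "sustainability"},                # ASU focuses on sustainability
    {"sustainability", "way of the future"},  # Sustainability is the way of the future
]
for (x, y), value in cooccurrence_similarity(records).items():
    print(f"{x} / {y}: {value}")
```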
Extraction v. Categorization
Ends
• Extraction: To discover what records contain
• Categorization: To classify records based on what they contain
Means
• Extraction: Substitution dictionary, Type dictionary, Exclude dictionary
• Categorization: Concept root derivation, Concept inclusion, Semantic network, Co-occurrence
Output
• Extraction: Concepts, Types, TLA patterns
• Categorization: Categories; Descriptors (Concepts, Types, TLA patterns, Category rules)
Modeler Tutorial
Objective #2
Starting Modeler by…
• Creating a new stream
• Sourcing an Excel file
Creating a New Stream
1. Open IBM SPSS Modeler 17.1
2. Select
3. Click Ok
4. To create another stream, click
Sourcing an Excel File
1. Click the tab
2. Double click the node or click and drag it into the
stream
3. Double click the node within the stream or right
click and click Edit
4. Click on the tab
5. Select the
6. Select the
7. A
8. Select
9. Click Ok
Starting an Interactive Workbench Session with…
• Basic Resources Template
• Opinions Template
• Opinions Text Analysis Package
Handout provided
Starting an Interactive Workbench Session with…
• Basic Resources Template: less information in the substitution, type, and exclude dictionaries; no categories
• Opinions Template: more information in the substitution, type, and exclude dictionaries; no categories
• Opinions Text Analysis Package: more information in the substitution, type, and exclude dictionaries; pre-built categories
Starting an Interactive Workbench Session with the Basic Resources Template
1. Click the tab
2. Double click the node or click and drag it into
the stream
3. Double click the node within the stream or right
click and click Edit
4. Click on the tab
5. Select the
6. Click on the tab
7. Select
8. Click
Interactive Workbench – Categories & Concepts View
Categories Pane
Extraction Results Pane
Data Pane
Interactive Workbench – Resource Editor View
Type Dictionary
Substitution Dictionary
Exclude Dictionary
Starting an Interactive Workbench Session with the Opinions Template
1. Double click the node within the stream or right
click and click Edit
2. Click on the tab
3. Click
4. Select
5. Click Ok
6. Click
Interactive Workbench – Categories & Concepts View
Concept View
Interactive Workbench – Categories & Concepts View
Type View
Interactive Workbench – Resource Editor View
Type Dictionary
Substitution Dictionary
Exclude Dictionary
Starting an Interactive Workbench Session with the Opinions Text Analysis Package
1. Double click the node within the stream or right
click and click Edit
2. Click on the tab
3. Select
4. Click
5. Select
6. Click
7. Click
Interactive Workbench – Categories & Concepts View
Categories Pane
Extraction Results Pane
Data Pane
Interactive Workbench – Resource Editor View
Type Dictionary
Substitution Dictionary
Exclude Dictionary
Templates v. Text Analysis Packages
• Basic Resources Template
  Libraries: Local, Core, Variations, Nonlinguistic Entities
  Pre-built categories: No
• Opinions Template
  Libraries: Local, Core, Variations, Nonlinguistic Entities, Opinions, Budget, Slang, Emoticon
  Pre-built categories: No
• Opinions Text Analysis Package
  Libraries: Local, Core, Variations, Nonlinguistic Entities, Opinions, Budget, Slang, Emoticon
  Pre-built categories: Yes
Handout provided
Using the Opinions Template for…
• Extraction by…
  • Editing the…
    • Substitution Dictionary
    • Type Dictionary
    • Exclude Dictionary
  • Extracting TLA Patterns
• Categorization by…
  • Automatically Building Categories
  • Manually Categorizing…
    • Concepts
    • Types
    • TLA Patterns
  • Manually Creating Category Rules
Interactive Workbench – Categories & Concepts View
Editing the Substitution Dictionary
1. Right click on the concept
2. Select Add to Synonym
3. Click New
4. Create the target term to which you want to assign the
synonym
5. Click Ok
6. Click
Interactive Workbench – Categories & Concepts View and Resource Editor View
Interactive Workbench – Categories & Concepts View
Editing the Type Dictionary
1. Right click on the concept
2. Select Add to Type
3. Click More
4. Select the type to which you want to assign the concept
5. Click Ok
6. Click Ok again
7. Click
Interactive Workbench – Categories & Concepts View and Resource Editor View
Interactive Workbench – Categories & Concepts View
Editing the Exclude Dictionary
1. Right click on the concept
2. Click Exclude from Extraction
3. Click
Interactive Workbench – Categories & Concepts View and Resource Editor View
Extracting TLA Patterns
1. In the Text Link Analysis View, click
2. Select a type pattern to see the concept patterns that
correspond to it
3. Click to see the concepts and type webs
corresponding to these patterns
Interactive Workbench – Text Link Analysis View
Automatically Building Categories
1. In the Categories & Concepts View, click
2. Click Edit:
3. Select
4. Click
5. Click
6. Select
7. Select
8. Select
9. Select
10. Select
11. Click Ok
12. Click
Interactive Workbench – Categories & Concepts View
Category
Subcategory
Descriptor
Visualization Pane:
Category Bar
Interactive Workbench – Categories & Concepts View
Category Web
Interactive Workbench – Categories & Concepts View
Category Web Table
Interactive Workbench – Categories & Concepts View
Manually Categorizing Concepts
1. Select the concept you want to categorize
2. Click
3. Select the category to which you want to assign the
concept:
4. Click Ok
Interactive Workbench – Categories & Concepts View
Interactive Workbench – Categories & Concepts View
Manually Categorizing Types
1. Select the type you want to categorize
2. Click
3. Select the category to which you want to assign the
type, or create a new category:
4. Click Ok
Interactive Workbench – Categories & Concepts View
Interactive Workbench – Text Link Analysis View
Type Patterns Concept Patterns
Manually Categorizing TLA Patterns
1. Select the TLA pattern you want to categorize
2. Click
3. Select the category to which you want to assign the
TLA pattern, or create a new category:
4. Click Ok
Interactive Workbench – Categories & Concepts View
Manually Creating Category Rules
1. Right click on the category for which you want to create
a rule
2. Click Create Category Rule
3. Create your rule by…
1. Dragging concepts or types into the Rule Editor
2. Combining them with Boolean operators
4. Click to see how many records match
5. Click
Interactive Workbench – Categories & Concepts View
Using the Opinions Text Analysis Package for…
• Manually Adjusting Categories
• Generating Model
• Converting Model Categories to Fields
• Deriving...
  • Total Negativity Score
  • Overall Sentiment Score
• Visualizing Model Results
Handout provided
Interactive Workbench – Categories & Concepts View
Manually Adjusting Categories
1. Right click on the category or categories that you want
to adjust
2. Select either Move to Category or Merge Categories or
Edit > Delete
Interactive Workbench – Categories & Concepts View
Interactive Workbench – Categories & Concepts View
Generating Model
1. Once you are satisfied with the categories you have
created, click
2. Drag the newly created modeling node
into your stream
3. Right click on your source node
4. Click Connect
5. Click on your modeling node to connect the
two nodes
Stream
Converting Model Categories to Fields
1. Right click on your modeling node
2. Click Edit
3. Click on the tab
4. Select
5. Change the
6. Click Ok
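The option names for this step appear only in screenshots, so the sketch below shows just the general idea under a stated assumption: each model category becomes one field per record holding 1 if the record falls in that category and 0 otherwise. The category names and assignments are hypothetical, and this is plain Python, not Modeler's model node.

```python
# Hypothetical categories assigned by the model to each record.
ASSIGNED = {
    1: {"Price - negative", "Staff - negative"},
    2: {"Staff - positive"},
    3: set(),
}
CATEGORIES = ["Price - negative", "Staff - negative", "Staff - positive"]


def to_flag_fields(assigned, categories):
    """Build one row per record with a 0/1 field per category."""
    rows = []
    for record_id, cats in assigned.items():
        row = {"record": record_id}
        for category in categories:
            row[category] = 1 if category in cats else 0
        rows.append(row)
    return rows


for row in to_flag_fields(ASSIGNED, CATEGORIES):
    print(row)
```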
Deriving a Total Negativity Score
1. Click on the tab
2. Double click the node or click and drag it into the
stream
3. Double click the node within the stream or right
click and click Edit
4. Give a descriptive name to your
5. Click to create a formula
6. In Expression Builder, click on a category that you want to
be in your formula
7. Click to add it
8. Click on an operator such as
9. Add another category
10. When you are finished, click Ok
11. Repeat the process to create additional formulas
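The operator and categories used in this formula are shown only as screenshots. Assuming the score simply adds the 0/1 flags of the negative categories (as a formula like `A + B` would), the calculation looks like the sketch below; the field and category names are hypothetical.

```python
# Hypothetical flag fields (1 = the record falls in that category).
ROWS = [
    {"record": 1, "Price - negative": 1, "Staff - negative": 1, "Staff - positive": 0},
    {"record": 2, "Price - negative": 0, "Staff - negative": 0, "Staff - positive": 1},
]
NEGATIVE_CATEGORIES = ["Price - negative", "Staff - negative"]


def total_negativity(row, negative_categories=NEGATIVE_CATEGORIES):
    """Add the flags of all negative categories for one record."""
    return sum(row[category] for category in negative_categories)


for row in ROWS:
    row["Total Negativity"] = total_negativity(row)
    print(row["record"], row["Total Negativity"])
```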
Deriving an Overall Sentiment Score
1. Click on the tab
2. Double click the node or click and drag it into
the stream
3. Double click the node within the stream or right
click and click Edit
4. Give a descriptive name to your
5. Select
6. Define field settings:
7. Click Ok
Stream
Visualizing Model Results
1. Click on the tab
2. Double click the node or click and drag it into
the stream
3. Double click the node within the stream or right
click and click Edit
4. Click on the tab
5. Select
6. Select overlay:
7. Select
8. Click
Summary
1. To give a broad overview of text analytics…
a. Defining key terms
b. Describing important steps in the process
2. To provide a step-by-step tutorial for how to use IBM SPSS Modeler
to...
a. Read in source text
b. Extract concepts, sentiment, and text link patterns from records
c. Categorize records
d. Visualize the results
Additional Resources
• User’s Guide:
http://public.dhe.ibm.com/software/analytics/spss/documentation/modeler/17.0/en/ModelerTextAnalytics.pdf
• Introduction to SPSS Text Analytics Webinar:
https://www.youtube.com/watch?v=tK-o4MnRScQ&list=WL&index=2

Dissertation writing Service [www.writekraft.com]WriteKraft Dissertations
 
Dissertation Service from Writekraft [www.writekraft.com]
Dissertation Service from Writekraft [www.writekraft.com]Dissertation Service from Writekraft [www.writekraft.com]
Dissertation Service from Writekraft [www.writekraft.com]WriteKraft Dissertations
 
Dissertation Writing Service From Writekraft [www.writekraft.com]
Dissertation Writing Service From Writekraft [www.writekraft.com]Dissertation Writing Service From Writekraft [www.writekraft.com]
Dissertation Writing Service From Writekraft [www.writekraft.com]WriteKraft Dissertations
 

Similar to Text Analytics Presentation (20)

How to read papers
How to  read papersHow to  read papers
How to read papers
 
MCOM510 WebQuest
MCOM510 WebQuest MCOM510 WebQuest
MCOM510 WebQuest
 
Data analysis – qualitative data presentation 2
Data analysis – qualitative data   presentation 2Data analysis – qualitative data   presentation 2
Data analysis – qualitative data presentation 2
 
Module 1 - SLPManaging Individual BehaviorThe SLP for this c.docx
Module 1 - SLPManaging Individual BehaviorThe SLP for this c.docxModule 1 - SLPManaging Individual BehaviorThe SLP for this c.docx
Module 1 - SLPManaging Individual BehaviorThe SLP for this c.docx
 
课程介绍.pptx
课程介绍.pptx课程介绍.pptx
课程介绍.pptx
 
TSL3133 Topic 11 Qualitative Data Analysis
TSL3133 Topic 11 Qualitative Data AnalysisTSL3133 Topic 11 Qualitative Data Analysis
TSL3133 Topic 11 Qualitative Data Analysis
 
sdv.pptx
sdv.pptxsdv.pptx
sdv.pptx
 
Scoping Level of Effort and Getting the Right Resources for the Job
Scoping Level of Effort and Getting the Right Resources for the JobScoping Level of Effort and Getting the Right Resources for the Job
Scoping Level of Effort and Getting the Right Resources for the Job
 
Seymour PBL 8-25-2015 LINK version
Seymour PBL 8-25-2015 LINK versionSeymour PBL 8-25-2015 LINK version
Seymour PBL 8-25-2015 LINK version
 
Systematic Literature Reviews : Concise Overview
Systematic Literature Reviews : Concise OverviewSystematic Literature Reviews : Concise Overview
Systematic Literature Reviews : Concise Overview
 
Thesis [www.writekraft.com]
Thesis [www.writekraft.com]Thesis [www.writekraft.com]
Thesis [www.writekraft.com]
 
Thesis Writing Service From Writekraft [www.writekraft.com]
Thesis Writing Service From Writekraft [www.writekraft.com]Thesis Writing Service From Writekraft [www.writekraft.com]
Thesis Writing Service From Writekraft [www.writekraft.com]
 
Thesis writing services [www.writekraft.com]
Thesis writing services [www.writekraft.com]Thesis writing services [www.writekraft.com]
Thesis writing services [www.writekraft.com]
 
Thesis Writing Service from Writekraft [www.writekraft.com]
Thesis Writing Service from Writekraft [www.writekraft.com]Thesis Writing Service from Writekraft [www.writekraft.com]
Thesis Writing Service from Writekraft [www.writekraft.com]
 
Thesis [www.writekraft.com]
Thesis [www.writekraft.com]Thesis [www.writekraft.com]
Thesis [www.writekraft.com]
 
Thesis writing service [www.writekraft.com]
Thesis writing service [www.writekraft.com]Thesis writing service [www.writekraft.com]
Thesis writing service [www.writekraft.com]
 
Dissertation Writing Service from Writekraft [www.writekrfat.com]
Dissertation Writing Service from Writekraft [www.writekrfat.com]Dissertation Writing Service from Writekraft [www.writekrfat.com]
Dissertation Writing Service from Writekraft [www.writekrfat.com]
 
Dissertation writing Service [www.writekraft.com]
Dissertation writing Service [www.writekraft.com]Dissertation writing Service [www.writekraft.com]
Dissertation writing Service [www.writekraft.com]
 
Dissertation Service from Writekraft [www.writekraft.com]
Dissertation Service from Writekraft [www.writekraft.com]Dissertation Service from Writekraft [www.writekraft.com]
Dissertation Service from Writekraft [www.writekraft.com]
 
Dissertation Writing Service From Writekraft [www.writekraft.com]
Dissertation Writing Service From Writekraft [www.writekraft.com]Dissertation Writing Service From Writekraft [www.writekraft.com]
Dissertation Writing Service From Writekraft [www.writekraft.com]
 

Text Analytics Presentation

  • 1. An Introduction to Text Analytics in IBM SPSS Modeler (Skylar Ritchie, Shawn Bergman)
  • 2. Objectives
      1. To give a broad overview of text analytics…
         a. Defining key terms
         b. Describing important steps in the process
      2. To provide a step-by-step tutorial for how to use IBM SPSS Modeler to...
         a. Read in source text
         b. Extract concepts, sentiment, and text link patterns from records
         c. Categorize records
         d. Visualize the results
  • 3. Overview of Text Analytics (Objective #1)
  • 4. Text Analytics
      “The process of deriving high quality information from text” --Marisa Peacock, Social Media Strategist
      “A technology and process both, a mechanism for knowledge discovery applied to documents, a means of finding value in text. Solutions…analyze linguistic structure...discern entities...as well as relationships, concepts, and even sentiments. They...automate classification...of source documents. They exploit visualization for exploratory analysis.” --Seth Grimes, Analytics Strategy Consultant
      1. Extraction: to discern entities, relationships, concepts, and sentiments
      2. Categorization: to automate classification
      3. Visualization
  • 5. What does text analytics “look” like?
      Sourcing (Step 1)
         1: Source Text (File, Web Feed)
      Extracting (Steps 2-3)
         2: Dictionaries (Substitution, Type, Exclude)
         3: Extraction Results (Concepts, Types, Text Link Analysis Patterns)
      Categorizing (Steps 4-5)
         4: Grouping Techniques (Concept Inclusion, Concept Root Derivation, Semantic Network, Co-occurrence)
         5: Categorization Results (Categories, Descriptors)
      Visualizing
      Handout provided
  • 7. Key Terms: Source text file; Field; Document/record
  • 9. Key Terms
      Types: higher-level concepts (e.g., <Organization>)
      Concepts: lead terms under which similar terms are grouped together (e.g., university)
      Terms: single words (uni-terms) or word phrases (multi-terms) that are interesting or relevant (e.g., university, college, school, academy, institute, polytechnic, alma mater, graduate school…)
      Handout provided
  • 10. Substitution Dictionary: Terms → Concepts
      An editable collection of synonymous terms grouped under a target term, or concept
      Target term: Synonyms
      university: university, college, school, academy, institute, polytechnic, alma mater, graduate school…
      student: student, scholar, undergraduate, graduate, grad student, postdoctoral fellow, freshman, sophomore, junior, senior…
      professor: professor, prof, tenured faculty member, dean, assistant professor, associate professor, lecturer, academic…
  • 11. Type Dictionary: Concepts → Types
      An editable collection of concepts grouped under a label known as the type name
      Concept → Type
      5 star → <Positive>
      a lot better → <Positive>
      beyond my expectations → <Positive>
      abhor → <Negative>
      bizarre → <Negative>
      can’t stand → <Negative>
      all about the same → <Uncertain>
      been with it for too little time → <Uncertain>
      can’t think of any → <Uncertain>
  • 12. Exclude Dictionary
      An editable collection of terms and types that will be removed from the final extraction results
      Exclude list: any kind of problem; can’t say enough; can’t wait; i was out of; if it ain’t broke, don’t fix it; prefer not to; to work with; went down to
  • 13. Text Link Analysis (TLA)
      A pattern-matching technology that is used to extract relationships found between either concepts or types
      <Organization> + <Positive>, e.g., university + excellent: “This is a 5 star university”
      <Unknown> + <Unknown> + <Negative>, e.g., undergraduates + lecturers + dislike: “Undergraduates abhor mere lecturers”
      Handout provided
  • 15. Key Terms
      Categorization: the process of assigning records to a category when the text within them matches a descriptor
      Category: a higher-level idea that captures the central message of the text
      Descriptor: a concept, type, pattern, or category rule that has been used to define a category
      Descriptors: concepts, types, TLA patterns, category rules
  • 16. Category Rules
      Statements that classify records into a category based on a logical expression using extracted concepts, types, and patterns as well as Boolean operators
      +    “And” (order important): <Organization> + <Positive>; university + excellent
      &    “And” (order not important): <Positive> & <Organization>; excellent & university
      |    “Or”: <Person> | <Organization>; student | university
      !()  “Not”: !(<Person>); !(student)
      Matching sentence: “This is a 5 star university”
      Handout provided
  • 17. Wildcard Operator
      The Boolean operator * that acts as a variable and stands in for a missing word or word fragment
      Space after word: graduate * matches graduate school, graduate student
      Space before word: * graduate matches university graduate
      No space after word: graduate* matches graduates, graduated
      No space before word: *graduate matches undergraduate
  • 18. Grouping Techniques
      The mechanisms underlying the categorization process
      Extraction Results (Concepts, Types, Text Link Analysis Patterns) → Grouping Techniques (Concept Inclusion, Concept Root Derivation, Semantic Network, Co-occurrence) → Categorization Results (Categories, Descriptors)
      Handout provided
  • 19. Concept Inclusion
      What? Grouping based on subsets and supersets
      How? 1. Breaking concepts into components 2. De-inflecting components
      When? Text that is somewhat technical
      Concepts: graduate faculty; faculty committees; tenured faculty members
      Components: {graduate, faculty}; {faculty, committees}; {tenured, faculty, members}
      De-inflected components: {graduate, faculty}; {faculty, committee}; {tenure, faculty, member}
      Descriptor (de-inflected component): faculty
  • 20. Concept Root Derivation
      What? Grouping based on morphological relationships
      How? 1. Breaking concepts into components 2. De-inflecting components 3. Removing suffixes to find root
      When? Any text, but few categories
      Concepts: studies in psychology; psychological studies; noteworthy psychologist
      Components: {studies, psychology}; {psychological, studies}; {noteworthy, psychologist}
      De-inflected components: {study, psychology}; {psychological, study}; {noteworthy, psychologist}
      Descriptor (de-inflected component root): psycholog-
  • 21. Semantic Network
      What? Grouping based on semantic relationships
      How? Synonyms: “are” relationship; Hyponyms: “is a” relationship
      When? Text that is not highly technical
      Category: educators (synonyms: professors, teachers)
      Category: social science (hyponyms: psychology, social science)
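  • 22. Co-occurrence
      What? Grouping based on concepts that appear together
      How? If $C_{XY} \ge 2$, then $\text{Similarity} = \frac{(C_{XY})^2}{C_X \times C_Y}$, where $C_X$ is the number of records containing concept X and $C_{XY}$ is the number of records containing both X and Y
      When? Any text, but many categories based on possibly distant relationships
      Example concepts:
         “Students flock to ASU”: students = W, ASU = X
         “ASU focuses on sustainability”: ASU = X, sustainability = Y
         “Sustainability is the way of the future”: sustainability = Y, way of the future = Z
      Counts: $C_W = 1$, $C_X = 2$, $C_Y = 2$, $C_Z = 1$, $C_{WX} = 1$, $C_{XY} = 1$, $C_{YZ} = 1$
      $\frac{(C_{WX})^2}{C_W \times C_X} = \frac{1^2}{1 \times 2} = \frac{1}{2}$
      $\frac{(C_{XY})^2}{C_X \times C_Y} = \frac{1^2}{2 \times 2} = \frac{1}{4}$
      $\frac{(C_{YZ})^2}{C_Y \times C_Z} = \frac{1^2}{2 \times 1} = \frac{1}{2}$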
  • 23. Extraction v. Categorization
      Extraction
         Ends: to discover what records contain
         Means: substitution dictionary, type dictionary, exclude dictionary
         Output: concepts, types, TLA patterns
      Categorization
         Ends: to classify records based on what they contain
         Means: concept root derivation, concept inclusion, semantic network, co-occurrence
         Output: categories and descriptors (concepts, types, TLA patterns, category rules)
  • 25. Starting Modeler by… creating a new stream; sourcing an Excel file
  • 26. Creating a New Stream
      1. Open IBM SPSS Modeler 17.1
      2. Select
      3. Click Ok
      4. To create another stream, click
  • 29. Sourcing an Excel File
      1. Click the tab
      2. Double-click the node or click and drag it into the stream
      3. Double-click the node within the stream, or right-click it and click Edit
      4. Click on the tab
      5. Select the
      6. Select the
      7. A
      8. Select
      9. Click Ok
  • 30. Starting an Interactive Workbench Session with…
      Basic Resources Template: less information in the substitution, type, and exclude dictionaries; no categories
      Opinions Template: more information in the substitution, type, and exclude dictionaries; no categories
      Opinions Text Analysis Package: more information in the substitution, type, and exclude dictionaries; pre-built categories
      Handout provided
  • 32. Starting an Interactive Workbench Session with the Basic Resources Template
      1. Click the tab
      2. Double-click the node or click and drag it into the stream
      3. Double-click the node within the stream, or right-click it and click Edit
      4. Click on the tab
      5. Select the
      6. Click on the tab
      7. Select
      8. Click
  • 33. Interactive Workbench – Categories & Concepts View: Categories pane; Extraction Results pane; Data pane
  • 34. Interactive Workbench – Resource Editor View: Type dictionary; Substitution dictionary; Exclude dictionary
  • 36. Starting an Interactive Workbench Session with the Opinions Template
      1. Double-click the node within the stream, or right-click it and click Edit
      2. Click on the tab
      3. Click
      4. Select
      5. Click Ok
      6. Click
  • 37. Interactive Workbench – Categories & Concepts View (Concept View)
  • 38. Interactive Workbench – Categories & Concepts View (Type View)
  • 39. Interactive Workbench – Resource Editor View: Type dictionary; Substitution dictionary; Exclude dictionary
  • 41. Starting an Interactive Workbench Session with the Opinions Text Analysis Package
      1. Double-click the node within the stream, or right-click it and click Edit
      2. Click on the tab
      3. Select
      4. Click
      5. Select
      6. Click
      7. Click
  • 42. Interactive Workbench – Categories & Concepts View: Categories pane; Extraction Results pane; Data pane
  • 43. Interactive Workbench – Resource Editor View: Type dictionary; Substitution dictionary; Exclude dictionary
  • 44. Templates v. Text Analysis Packages
      Basic Resources Template: libraries Local, Core, Variations, Nonlinguistic Entities; pre-built categories: No
      Opinions Template: libraries Local, Core, Variations, Nonlinguistic Entities, Opinions, Budget, Slang, Emoticon; pre-built categories: No
      Opinions Text Analysis Package: libraries Local, Core, Variations, Nonlinguistic Entities, Opinions, Budget, Slang, Emoticon; pre-built categories: Yes
      Handout provided
  • 45. Using the Opinions Template for…
      Extraction by… editing the substitution dictionary, type dictionary, and exclude dictionary; extracting TLA patterns
      Categorization by… automatically building categories; manually categorizing concepts, types, and TLA patterns; manually creating category rules
      Handout provided
  • 47. Interactive Workbench – Categories & Concepts View
  • 48. Editing the Substitution Dictionary
      1. Right-click on the concept
      2. Select Add to Synonym
      3. Click New
      4. Create the target term to which you want to assign the synonym
      5. Click Ok
      6. Click
  • 51. Interactive Workbench – Categories & Concepts View
  • 52. Editing the Type Dictionary
      1. Right-click on the concept
      2. Select Add to Type
      3. Click More
      4. Select the type to which you want to assign the concept
      5. Click Ok
      6. Click Ok again
      7. Click
  • 55. Interactive Workbench – Categories & Concepts View
  • 56. Editing the Exclude Dictionary
      1. Right-click on the concept
      2. Click Exclude from Extraction
      3. Click
  • 57. Interactive Workbench: Categories & Concepts View and Resource Editor View
  • 59. Extracting TLA Patterns
      1. In the Text Link Analysis View, click
      2. Select a type pattern to see the concept patterns that correspond to it
      3. Click to see the concept and type webs corresponding to these patterns
  • 60. Interactive Workbench – Text Link Analysis View
  • 62. Automatically Building Categories
      1. In the Categories & Concepts View, click
      2. Click Edit:
      3. Select
      4. Click
      5. Click
      6. Select
      7. Select
      8. Select
      9. Select
      10. Select
      11. Click Ok
      12. Click
  • 63. Interactive Workbench – Categories & Concepts View: category, subcategory, descriptor; Visualization pane: Category Bar
  • 64. Interactive Workbench – Categories & Concepts View: Category Web
  • 65. Interactive Workbench – Categories & Concepts View: Category Web Table
  • 67. Interactive Workbench – Categories & Concepts View
  • 68. Manually Categorizing Concepts
      1. Select the concept you want to categorize
      2. Click
      3. Select the category to which you want to assign the concept:
      4. Click Ok
  • 69. Interactive Workbench – Categories & Concepts View
  • 71. Interactive Workbench – Categories & Concepts View
  • 72. Manually Categorizing Types
      1. Select the type you want to categorize
      2. Click
      3. Select the category to which you want to assign the type, or create a new category:
      4. Click Ok
  • 73. Interactive Workbench – Categories & Concepts View
  • 75. Interactive Workbench – Text Link Analysis View: Type Patterns; Concept Patterns
  • 76. Manually Categorizing TLA Patterns
      1. Select the TLA pattern you want to categorize
      2. Click
      3. Select the category to which you want to assign the pattern, or create a new category:
      4. Click Ok
  • 77. Interactive Workbench – Categories & Concepts View
  • 79. Manually Creating Category Rules
      1. Right-click on the category for which you want to create a rule
      2. Click Create Category Rule
      3. Create your rule by…
         a. Dragging concepts or types into the Rule Editor
         b. Combining them with Boolean operators
      4. Click to see how many records match
      5. Click
  • 80. Interactive Workbench – Categories & Concepts View
  • 81. Using the Opinions Text Analysis Package for…
      Manually adjusting categories
      Generating model
      Converting model categories to fields
      Deriving... total negativity score; overall sentiment score
      Visualizing model results
      Handout provided
  • 84. Interactive Workbench – Categories & Concepts View
  • 85. Manually Adjusting Categories
      1. Right-click on the category or categories that you want to adjust
      2. Select Move to Category, Merge Categories, or Edit > Delete
  • 86. Interactive Workbench – Categories & Concepts View
  • 88. Interactive Workbench – Categories & Concepts View
  • 89. Generating Model
      1. Once you are satisfied with the categories you have created, click
      2. Drag the newly created modeling node into your stream
      3. Right-click on your source node
      4. Click Connect
      5. Click on your modeling node to connect the two nodes
  • 92. Converting Model Categories to Fields
      1. Right-click on your modeling node
      2. Click Edit
      3. Click on the tab
      4. Select
      5. Change the
      6. Click Ok
  • 94. Deriving a Total Negativity Score
      1. Click on the tab
      2. Double-click the node or click and drag it into the stream
      3. Double-click the node within the stream, or right-click it and click Edit
      4. Give a descriptive name to your
      5. Click to create a formula
      6. In Expression Builder, click on a category that you want to be in your formula
      7. Click to add it
      8. Click on an operator such as
      9. Add another category
      10. When you are finished, click Ok
      11. Repeat the process to create additional formulas
  • 96. Deriving an Overall Sentiment Score
      1. Click on the tab
      2. Double-click the node or click and drag it into the stream
      3. Double-click the node within the stream, or right-click it and click Edit
      4. Give a descriptive name to your
      5. Select
      6. Define field settings:
      7. Click Ok
  • 99. Visualizing Model Results
      1. Click on the tab
      2. Double-click the node or click and drag it into the stream
      3. Double-click the node within the stream, or right-click it and click Edit
      4. Click on the tab
      5. Select
      6. Select overlay:
      7. Select
      8. Click
  • 101. Summary
      1. To give a broad overview of text analytics…
         a. Defining key terms
         b. Describing important steps in the process
      2. To provide a step-by-step tutorial for how to use IBM SPSS Modeler to...
         a. Read in source text
         b. Extract concepts, sentiment, and text link patterns from records
         c. Categorize records
         d. Visualize the results
  • 102. Additional Resources
      User’s Guide: http://public.dhe.ibm.com/software/analytics/spss/documentation/modeler/17.0/en/ModelerTextAnalytics.pdf
      Introduction to SPSS Text Analytics Webinar: https://www.youtube.com/watch?v=tK-o4MnRScQ&list=WL&index=2
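The three dictionaries on slides 10 to 12 can be pictured as a lookup pipeline: terms are rewritten as concepts, concepts receive a type, and excluded entries are dropped. The following is a minimal Python sketch, not Modeler's implementation. It assumes the text has already been split into terms (Modeler does its own linguistic parsing), and the small SUBSTITUTION, TYPES, and EXCLUDE tables and the extract() helper are hypothetical stand-ins built from the slide examples.

```python
# Hypothetical miniature dictionaries built from the slide examples
SUBSTITUTION = {  # term -> target term (concept)
    "college": "university",
    "school": "university",
    "academy": "university",
    "prof": "professor",
    "lecturer": "professor",
    "undergraduate": "student",
}
TYPES = {  # concept -> type
    "university": "<Organization>",
    "5 star": "<Positive>",
    "abhor": "<Negative>",
}
EXCLUDE = {"can't wait", "prefer not to"}


def extract(terms):
    """Map raw terms to (concept, type) pairs, dropping excluded entries."""
    results = []
    for term in terms:
        term = term.lower()
        if term in EXCLUDE:
            continue  # exclude dictionary: never reaches the results
        concept = SUBSTITUTION.get(term, term)   # substitution dictionary
        ctype = TYPES.get(concept, "<Unknown>")  # type dictionary, with fallback
        results.append((concept, ctype))
    return results


print(extract(["College", "5 star", "can't wait", "lecturer"]))
```

Here "lecturer" is rewritten to the target term professor, but because the hypothetical type table has no entry for professor, it falls back to <Unknown>, the same type the TLA example on slide 13 shows for lecturers.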
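The category rule operators on slide 16 behave like small Boolean combinators. The sketch below illustrates them under stated assumptions and is not Modeler's rule engine. A record is represented by the ordered concept/type slots of one extracted TLA pattern, an item written in angle brackets matches a type while anything else matches a concept, and the helper names has, seq, all_of, any_of, and not_ are hypothetical.

```python
from typing import Callable, List, Tuple

# One extracted TLA pattern: ordered (concept, type) slots
Record = List[Tuple[str, str]]
Rule = Callable[[Record], bool]


def _matches(item: str, slot: Tuple[str, str]) -> bool:
    """'<Type>' items match a slot's type; plain items match its concept."""
    concept, ctype = slot
    return ctype == item if item.startswith("<") else concept == item


def has(item: str) -> Rule:
    """A single concept or type anywhere in the record."""
    return lambda rec: any(_matches(item, slot) for slot in rec)


def seq(*items: str) -> Rule:
    """'+' operator: every item present, in the order given."""
    def rule(rec: Record) -> bool:
        pos = 0
        for item in items:
            # advance to the next slot that matches this item
            while pos < len(rec) and not _matches(item, rec[pos]):
                pos += 1
            if pos == len(rec):
                return False  # ran out of slots before finding the item
            pos += 1
        return True
    return rule


def all_of(*rules: Rule) -> Rule:
    """'&' operator: all sub-rules hold, order not important."""
    return lambda rec: all(r(rec) for r in rules)


def any_of(*rules: Rule) -> Rule:
    """'|' operator: at least one sub-rule holds."""
    return lambda rec: any(r(rec) for r in rules)


def not_(rule: Rule) -> Rule:
    """'!()' operator: the sub-rule does not hold."""
    return lambda rec: not rule(rec)


# "This is a 5 star university" -> university + excellent (slide 13)
pattern = [("university", "<Organization>"), ("excellent", "<Positive>")]
print(seq("<Organization>", "<Positive>")(pattern))
print(seq("<Positive>", "<Organization>")(pattern))  # order matters for '+'
print(all_of(has("<Positive>"), has("<Organization>"))(pattern))
print(all_of(has("student"), not_(has("professor")))(pattern))
```

Pairing not_ with another rule, as in the last line, reflects the advice in the notes that !() is usually combined with another operator rather than used alone.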
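The wildcard rules on slide 17 map naturally onto regular expressions. This is a sketch using Python's standard re module, and the semantics are an assumption read off the slide's examples: a standalone * stands for exactly one whole word, and a * attached to a word stands for zero or more word characters. Modeler's own matching may differ, and wildcard_to_regex is a hypothetical helper.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a slide-17 style wildcard pattern into a compiled regex.

    Assumed semantics: a standalone '*' stands for exactly one whole word;
    a '*' attached to a word stands for zero or more word characters.
    """
    parts = []
    for token in pattern.split():
        if token == "*":
            parts.append(r"\w+")  # missing whole word
        else:
            # escape the literal text, then turn the escaped '*' into a fragment match
            parts.append(re.escape(token).replace(r"\*", r"\w*"))
    # \b keeps matches on word boundaries; words are separated by one space
    return re.compile(r"\b" + " ".join(parts) + r"\b", re.IGNORECASE)


examples = [
    ("graduate *", "She applied to graduate school"),
    ("* graduate", "a proud university graduate"),
    ("graduate*", "They graduated in May"),
    ("*graduate", "an undergraduate course"),
]
for pat, text in examples:
    match = wildcard_to_regex(pat).search(text)
    print(pat, "->", match.group(0) if match else None)
```

Because a standalone * requires a real word, "graduate *" does not match the bare word "graduate"; whether Modeler treats that case the same way is not stated on the slide.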

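Concept inclusion (slide 19) groups concepts whose de-inflected components form a superset of a descriptor's components. The sketch below reproduces the faculty example. It is only an illustration: the DEINFLECT table is a hypothetical three-entry stand-in for Modeler's linguistic resources, not a real lemmatizer, and components and concept_inclusion are hypothetical helper names.

```python
# Hypothetical de-inflection table standing in for Modeler's linguistic resources
DEINFLECT = {"committees": "committee", "members": "member", "tenured": "tenure"}


def components(concept: str) -> frozenset:
    """Break a concept into words, then de-inflect each word."""
    return frozenset(DEINFLECT.get(word, word) for word in concept.lower().split())


def concept_inclusion(concepts, descriptor):
    """Keep concepts whose de-inflected components include all of the descriptor's."""
    target = components(descriptor)
    return [c for c in concepts if target <= components(c)]


concepts = ["graduate faculty", "faculty committees", "tenured faculty members", "student union"]
print(concept_inclusion(concepts, "faculty"))
print(concept_inclusion(concepts, "faculty committee"))
```

De-inflection is what lets the singular descriptor "faculty committee" capture the plural concept "faculty committees".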
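The co-occurrence similarity on slide 22 is plain arithmetic over record counts, so it can be checked in a few lines. This Python sketch treats each record as a set of already-extracted concepts and reads the slide's $C_{XY} \ge 2$ condition as a minimum-link threshold. That reading, the min_links parameter, and the function name are assumptions for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def cooccurrence_similarity(records, min_links=2):
    """Similarity (C_XY)^2 / (C_X * C_Y) for every concept pair.

    C_X counts records containing X, and C_XY counts records containing both.
    Pairs that co-occur in fewer than `min_links` records are skipped, which
    is how this sketch reads the slide's C_XY >= 2 condition.
    """
    single, pair = Counter(), Counter()
    for concepts in records:
        unique = sorted(set(concepts))  # count each concept once per record
        single.update(unique)
        pair.update(combinations(unique, 2))
    return {
        (x, y): Fraction(c_xy ** 2, single[x] * single[y])
        for (x, y), c_xy in pair.items()
        if c_xy >= min_links
    }


# The three example sentences from slide 22, reduced to their concepts
records = [
    {"students", "asu"},
    {"asu", "sustainability"},
    {"sustainability", "way of the future"},
]
for (x, y), sim in cooccurrence_similarity(records, min_links=1).items():
    print(x, "+", y, "=", sim)
```

With min_links=1 the sketch reproduces the slide's values of 1/2, 1/4, and 1/2. With the default of 2, every pair in this toy example co-occurs only once, so the function returns an empty dict. The slide's worked example therefore illustrates the formula itself rather than pairs that would pass the threshold.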
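Slides 92 to 96 build scores with Derive nodes in Expression Builder, but the exact expressions appear only as screenshots. The Python sketch below assumes each category has been converted into a 1/0 flag field. The field names are hypothetical, and the positive-minus-negative rule for the overall sentiment is an assumption rather than the presenters' formula.

```python
# Hypothetical category flag fields: 1 = record belongs to the category, 0 = not.
NEGATIVE_FIELDS = ["Price: Negative", "Service: Negative", "Quality: Negative"]
POSITIVE_FIELDS = ["Price: Positive", "Service: Positive", "Quality: Positive"]


def total_score(record: dict, fields: list) -> int:
    """Add up flag fields, like chaining categories with '+' in Expression Builder."""
    return sum(record.get(field, 0) for field in fields)


def overall_sentiment(record: dict) -> str:
    """Assumed rule: compare the positive and negative totals."""
    balance = total_score(record, POSITIVE_FIELDS) - total_score(record, NEGATIVE_FIELDS)
    if balance > 0:
        return "Positive"
    if balance < 0:
        return "Negative"
    return "Neutral"


row = {"Price: Negative": 1, "Service: Negative": 1, "Quality: Positive": 1}
print(total_score(row, NEGATIVE_FIELDS), overall_sentiment(row))
```

Missing flags count as 0, so a record that matched no categories comes out as Neutral.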
Editor's Notes

  1. Having read the 225-page User’s Guide cover to cover and watched countless videos on Modeler, I can personally attest that the two most difficult aspects of learning the software are (1) distinguishing between terms that look similar but signify very different ideas, and (2) coming up with an organizational framework for understanding the many things you can do in Modeler. The first half of the presentation is dedicated to the first difficulty, and the second half to the second. The overriding goal of this presentation is for you to feel as though you can explore the software for yourselves. In putting it together, I tried to focus only on the essentials, and even though I have only scratched the surface of what the software can do, we will have to hustle to make it through everything. However, we will post this presentation with all of its examples and videos on the Office of Research Consultation website so that you can use it as a resource and refer back to it when you need it. In the interest of time, I am going to cover the first half of the presentation relatively quickly. If I am moving too quickly, please do not hesitate to ask questions and slow me down; just understand that we may not get to everything and that you may have to watch some of the videos at the end for yourself.
  2. Let’s start with a definition of text analytics. Both of these definitions describe text analytics as a process, and both describe the outcome of this process in similar terms: high quality information, knowledge, and value. The second definition, however, is more descriptive than the first, since it enumerates the principal steps in this process: (1) discern entities, relationships, concepts, and sentiments, which IBM calls extraction; (2) automate classification, which IBM calls categorization; and (3) visualize the results. In my presentation today, I will first describe these steps in greater detail and then show you how to perform them for yourself.
  3. So what does this process look like? On the macro level, it involves four primary steps: reading in source text; extracting linguistic entities, relationships, and sentiment; categorizing records; and visualizing the results. On more of a micro level, the primary steps of extracting and categorizing can be broken down further. Extraction involves passing the source text through a variety of dictionaries (described in greater detail later) in order to identify concepts, types, and Text Link Analysis patterns. Categorization involves taking these extraction results and applying a number of grouping techniques in order to create categories and descriptors that classify records. These diagrams depict text analytics as a linear process; however, as the User’s Guide repeatedly emphasizes, text analytics is an iterative process, so a more accurate depiction would include a feedback loop.
  4. Let’s take a look at the first step in the text analytics process: sourcing. Source text can take the form of either a computer file (such as an Excel file) or a Web feed (such as an RSS feed with various web links). Since the focus of today’s presentation is to demonstrate how to perform extraction, categorization, and visualization, I will use an Excel file as source text. Using a Web feed as a source is a little less straightforward, but if you are interested in that as well, I can make it the topic of a future presentation. Within an Excel file, you have worksheets, whose columns are known as “fields” and whose cells are referred to as either “documents” or “records,” two terms that IBM uses interchangeably. For the sake of simplicity, I will refer to them as records from now on.
  5. Let’s turn now to the second main step in text analytics: extraction Here for the first time we encounter a number of terms that look similar, but signify very different ideas In fact, these ideas are arranged hierarchically wither “terms” at the bottom and “types” at the highest level of abstraction Terms and concepts are always written as lowercase words or word phrases, and types are always enclosed in brackets The general types that come with the Core Library—more on that later—include <Person>, <Product>, <Organization>, and <Location> But types in other more specific libraries can themselves be more specific: Types in the Opinions Library include <Positive>, <Negative>, <Contextual>, and <Uncertain> among others Types in the Employee Satisfaction Library include <CoWorker>, <Management>, <Benefits>, and <WorkLifeBalance> among others
  6. As I mentioned earlier, there are several linguistic dictionaries that are instrumental in the extraction process The first of these is known as a substitution dictionary, and it is responsible for grouping terms under what are called target terms or concepts The computer scans all of the records, and whenever it finds synonymous terms, it essentially rewrites them as the target term It is important to note that this dictionary—and all the others—are editable So if, for example, you want to distinguish between “universities” and “institutes,” you can separate the two terms in your substitution dictionary And if, on the other hand, you want to use two terms synonymously, you can combine them in this dictionary
  7. The second linguistic dictionary is the type dictionary, and as its name implies, it is responsible for grouping concepts under their respective types. Here the computer assigns a higher-level descriptive label to the concepts themselves. Although it is generally good at assigning types when given some context, without context it will often assign the type <Unknown>.
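A similarly minimal sketch of a type dictionary follows. It assumes a plain lookup from concept to type. Unlike Modeler, it ignores context entirely, which is why anything missing from the hypothetical `TYPE_DICTIONARY` falls back to <Unknown>.

```python
# Hypothetical type dictionary: concept -> type. Real libraries are far larger
# and also draw on context, which this simple lookup does not.
TYPE_DICTIONARY = {
    "university": "<Organization>",
    "professor": "<Person>",
}


def assign_types(concepts, type_dictionary=TYPE_DICTIONARY):
    """Label each concept with its type, defaulting to <Unknown>."""
    return {concept: type_dictionary.get(concept, "<Unknown>") for concept in concepts}


print(assign_types(["university", "professor", "strategic direction"]))
# {'university': '<Organization>', 'professor': '<Person>', 'strategic direction': '<Unknown>'}
```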
  8. The third and final linguistic dictionary is the exclude dictionary, and as its name suggests, whatever it contains is excluded from the final extraction. As you peruse this dictionary, you might find a term or phrase that you do want to extract; by deselecting it in this dictionary, you can ensure that it shows up in the extraction results. Conversely, you can also add unwanted terms and phrases to the exclude dictionary.
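The exclude dictionary behaves like a filter over the extraction results. In this sketch the entries in `EXCLUDE` are hypothetical, and “deselecting” an entry is modeled simply as removing it from the set.

```python
# Hypothetical exclude dictionary; entries are illustrative only.
EXCLUDE = {"thing", "stuff", "a lot"}


def apply_exclusions(concepts, exclude=EXCLUDE):
    """Drop any concept listed in the exclude dictionary, keeping order."""
    return [c for c in concepts if c not in exclude]


concepts = ["university", "stuff", "faculty", "a lot"]
print(apply_exclusions(concepts))  # ['university', 'faculty']

# Deselecting an entry (here, "a lot") lets it back into the results.
print(apply_exclusions(concepts, EXCLUDE - {"a lot"}))  # ['university', 'faculty', 'a lot']
```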
  9. Text Link Analysis (TLA) is where text analytics really demonstrates its value. TLA patterns are the fourth and final kind of extraction result. Whereas the other extraction results (terms, concepts, and types) each represent a single linguistic unit, TLA patterns represent the relationships between these units and can express the meaning of an entire sentence, with its subject, verb, and object. As the examples at right indicate, patterns can contain two or more concepts or types. Order is important (indicated by the + operator), but sentiments always come last.
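The notation rules above (two or more elements, order preserved, sentiment last) can be sketched as a small formatting function. This only illustrates the notation and assumes the pattern elements have already been identified; discovering the patterns themselves requires linguistic parsing that the sketch does not attempt. The function name `format_pattern` is hypothetical.

```python
# Sentiment types named in the Opinions Library example above (a subset).
SENTIMENT_TYPES = {"<Positive>", "<Negative>", "<Contextual>", "<Uncertain>"}


def format_pattern(elements):
    """Write already-identified pattern elements in TLA-style notation.

    Non-sentiment elements keep their order; sentiment types go last.
    """
    if len(elements) < 2:
        raise ValueError("a TLA pattern needs two or more concepts or types")
    others = [e for e in elements if e not in SENTIMENT_TYPES]
    sentiments = [e for e in elements if e in SENTIMENT_TYPES]
    return " + ".join(others + sentiments)


print(format_pattern(["<Positive>", "<Organization>"]))  # <Organization> + <Positive>

# Order matters: these two patterns are different.
print(format_pattern(["professor", "student"]) == format_pattern(["student", "professor"]))  # False
```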
  10. Finally, let’s turn to the third main step in text analytics: categorization. Whereas extraction involves bundling the terms, concepts, and types within records, categorization bundles the records themselves on the basis of what they contain. Descriptors determine whether or not a record is assigned to a given category, and they can take the form of concepts, types, TLA patterns, or category rules.
  11. Since we have already covered concepts, types, and TLA patterns, let’s move on to category rules. In one way, category rules are like TLA patterns: they often join concepts or categories to describe a record and determine whether or not it belongs in a category. In other ways, however, category rules are unlike TLA patterns. In the first place, they can use operators such as the ampersand (&, meaning “and”) or the vertical bar (|, meaning “or”), in which case order is not important: (excellent & university) would capture exactly the same records as (university & excellent). In the second place, category rules can indicate the absence of something, whereas TLA patterns only focus on the presence of things. !(student) would capture all of the records that do not contain student, and this might be a considerable number. Usually, you would want to use the not operator in conjunction with another operator, as in student & !(professor).
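To make the operator behavior concrete, here is a small evaluator over the set of concepts found in each record. It is a sketch, not Modeler’s rule engine: rules are written as nested Python tuples standing in for the &, |, and !() syntax, and the sample records are invented.

```python
def matches(rule, concepts):
    """Evaluate a category rule against the set of concepts in one record.

    Rules are nested tuples standing in for Modeler's syntax:
    ("&", a, b) for a & b, ("|", a, b) for a | b, ("!", a) for !(a);
    a plain string is a concept that must be present.
    """
    if isinstance(rule, str):
        return rule in concepts
    op, *args = rule
    if op == "&":
        return all(matches(a, concepts) for a in args)
    if op == "|":
        return any(matches(a, concepts) for a in args)
    if op == "!":
        (arg,) = args
        return not matches(arg, concepts)
    raise ValueError(f"unknown operator: {op!r}")


records = [
    {"excellent", "university", "student"},
    {"student", "professor"},
    {"tuition"},
]

# Order does not matter for &: both rules select the same records.
print([matches(("&", "excellent", "university"), r) for r in records])  # [True, False, False]
print([matches(("&", "university", "excellent"), r) for r in records])  # [True, False, False]

# student & !(professor): students mentioned without professors.
print([matches(("&", "student", ("!", "professor")), r) for r in records])  # [True, False, False]

# !(student) on its own also catches records that say nothing about students.
print([matches(("!", "student"), r) for r in records])  # [False, False, True]
```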
  12. The fifth and final Boolean operator is the wildcard (*), and you can think of it as a variable that represents a missing prefix, suffix, or word that precedes or follows a given word. If there is a space either before or after the wildcard, the wildcard represents a missing word. If, on the other hand, there is no space, the wildcard represents only part of a word. Wildcards can be useful for generalizing category descriptors, but in some instances they can overgeneralize. For example, “graduated” can be either a verb or an adjective, and as an adjective it can describe either an alumnus or a cylinder. Depending on the context, you may want to capture one sense but not the other with your descriptor.
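The spacing rule above can be approximated by translating a descriptor into a regular expression. This is a sketch under assumed semantics: a free-standing * matches exactly one whole word, and a * attached to letters matches the rest of that word. The function names are hypothetical, and Modeler matches against extracted concepts rather than raw text.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a descriptor with * wildcards into a regular expression."""
    parts = []
    for token in pattern.lower().split():
        if token == "*":
            parts.append(r"\w+")  # a whole missing word
        else:
            # Escape the literal letters; an attached * stands for word characters.
            parts.append(r"\w*".join(re.escape(p) for p in token.split("*")))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")


def wildcard_match(pattern, text):
    return wildcard_to_regex(pattern).search(text.lower()) is not None


print(wildcard_match("graduat*", "She graduated in May"))      # True
print(wildcard_match("graduat*", "Use a graduated cylinder"))  # True (overgeneralizes)
print(wildcard_match("* university", "an excellent university"))  # True
print(wildcard_match("* university", "university life"))          # False
```

The second call shows the overgeneralization problem: a suffix wildcard cannot tell the verb sense of “graduated” from the cylinder sense.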
  13. Having covered category rules, the fourth kind of category descriptor, let’s turn to the grouping techniques that generate both the categories and their descriptors. There are four of these: concept inclusion, concept root derivation, semantic networks, and co-occurrence.
  14. Concept inclusion is a grouping technique that involves breaking concepts into their component sets, de-inflecting these components, and then identifying areas of overlap. For example, let’s say you had the multi-term concepts “graduate faculty,” “faculty committees,” and “tenured faculty members.” These concepts would first be broken down into their component sets, and then these sets would be de-inflected (e.g., by converting nouns from plural to singular). In the process at right, I have illustrated de-inflection by underlining the parts of each word that are removed in a subsequent step. In these component sets, the order of the words is not important; the only thing that matters for the concept inclusion technique is whether or not the component sets overlap. Concept inclusion is relatively robust and works well on text that contains technical jargon.
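The three steps just described (split into component sets, de-inflect, find overlap) can be sketched as follows. This follows the slide’s description rather than Modeler’s actual algorithm, and the crude plural stripper in `deinflect` is a stand-in for real linguistic de-inflection.

```python
def deinflect(word):
    """Rough de-inflection that only turns simple plurals into singulars."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # faculties -> faculty
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                # members -> member
    return word


def component_set(concept):
    """Break a multi-term concept into an unordered set of de-inflected words."""
    return frozenset(deinflect(w) for w in concept.split())


def group_by_inclusion(concepts):
    """Group concepts under every component word they share."""
    groups = {}
    for concept in concepts:
        for word in component_set(concept):
            groups.setdefault(word, []).append(concept)
    # Keep only components shared by two or more concepts (the overlap).
    return {word: members for word, members in groups.items() if len(members) > 1}


concepts = ["graduate faculty", "faculty committees", "tenured faculty members"]
print(group_by_inclusion(concepts))
# {'faculty': ['graduate faculty', 'faculty committees', 'tenured faculty members']}
```

Using a `frozenset` for each component set reflects the point above that word order inside a concept does not matter.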
  15. Concept root derivation employs a very similar process but goes one step further, stripping words down to their morphological (structural) roots so that areas of overlap can be identified. As you can see at right, “psychology,” “psychological,” and “psychologist” all share the same root, “psycholog-,” and the concepts can be grouped into categories on the basis of this similarity.
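A toy version of root derivation strips known suffixes and groups words that reduce to the same root. The suffix list and minimum root length are assumptions chosen for this example; a real morphological analyzer, like the one Modeler uses, covers far more forms.

```python
from collections import defaultdict

# Illustrative suffix list, checked in order.
SUFFIXES = ("ists", "ist", "ical", "ic", "ies", "y", "s")


def derive_root(word, min_root_length=4):
    """Strip the first matching suffix, keeping a root of reasonable length."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_root_length:
            return word[: -len(suffix)]
    return word


def group_by_root(concepts):
    """Group single-word concepts that reduce to the same root."""
    groups = defaultdict(list)
    for concept in concepts:
        groups[derive_root(concept)].append(concept)
    return {root: members for root, members in groups.items() if len(members) > 1}


print(group_by_root(["psychology", "psychological", "psychologist", "finance"]))
# {'psycholog': ['psychology', 'psychological', 'psychologist']}
```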
  16. Unlike concept root derivation, which categorizes concepts on the basis of morphological relationships, the semantic network technique categorizes concepts on the basis of semantic relationships, that is, relationships having to do with word meanings. These semantic relationships generally take the form of either synonyms or hyponyms, where the former denotes an “are” relationship and the latter an “is a” relationship. “Professors” and “teachers,” for example, might be considered synonyms, since both are educators. “Psychology,” on the other hand, is a hyponym of “social science,” since psychology is a social science.
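The two relationship kinds can be illustrated with a hand-built toy network. The entries below (including “economics”) are illustrative assumptions, not the semantic resources Modeler actually draws on.

```python
# Toy semantic network; all entries are hypothetical.
SYNONYMS = [{"professor", "teacher"}]
BROADER_TERM = {  # hyponym -> the broader term it "is a" kind of
    "psychology": "social science",
    "economics": "social science",
}


def relation(a, b):
    """Name the semantic link between two concepts, or return None."""
    if any({a, b} <= group for group in SYNONYMS):
        return "synonym"
    if BROADER_TERM.get(a) == b or BROADER_TERM.get(b) == a:
        return "hyponym"
    return None


def group_by_broader_term(concepts):
    """Collect hyponyms under the broader term they share."""
    groups = {}
    for concept in concepts:
        parent = BROADER_TERM.get(concept)
        if parent is not None:
            groups.setdefault(parent, []).append(concept)
    return groups


print(relation("professor", "teacher"))          # synonym
print(relation("psychology", "social science"))  # hyponym
print(group_by_broader_term(["psychology", "economics", "finance"]))
# {'social science': ['psychology', 'economics']}
```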
  17. The fourth and final grouping technique is co-occurrence. Cxy represents the number of records in which two concepts co-occur; Cx, the number in which the first concept occurs; and Cy, the number in which the second occurs. Generally, concepts must co-occur two or more times in order to be categorized together; however, this setting can be adjusted higher or lower. If your setting is high, you will generate fewer categories, but these categories will contain concepts that are more similar to each other. If your setting is low, you will generate more categories, but they will be more heterogeneous. Co-occurrence is a relatively straightforward technique, and if you are interested in how it computes a similarity coefficient for two concepts, several sample calculations are illustrated at right.
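The counts Cx, Cy, and Cxy can be computed directly from records treated as sets of concepts. The sketch assumes the similarity coefficient takes the form Cxy² / (Cx × Cy), which is my reading of IBM’s documentation; check it against the sample calculations for your version. The records and function names are hypothetical.

```python
from itertools import combinations


def similarity(records, x, y):
    """Assumed co-occurrence similarity: Cxy**2 / (Cx * Cy)."""
    c_x = sum(1 for r in records if x in r)
    c_y = sum(1 for r in records if y in r)
    c_xy = sum(1 for r in records if x in r and y in r)
    if c_x == 0 or c_y == 0:
        return 0.0
    return c_xy ** 2 / (c_x * c_y)


def cooccurring_pairs(records, concepts, min_cooccurrence=2):
    """Keep concept pairs that co-occur at least min_cooccurrence times."""
    pairs = []
    for x, y in combinations(sorted(concepts), 2):
        c_xy = sum(1 for r in records if x in r and y in r)
        if c_xy >= min_cooccurrence:
            pairs.append((x, y, round(similarity(records, x, y), 3)))
    return pairs


records = [
    {"faculty", "research"},
    {"faculty", "research", "teaching"},
    {"faculty", "teaching"},
    {"research"},
]
print(cooccurring_pairs(records, {"faculty", "research", "teaching"}))
# [('faculty', 'research', 0.444), ('faculty', 'teaching', 0.667)]
```

Raising `min_cooccurrence` to 3 would drop both pairs here, which illustrates why a higher setting yields fewer, tighter groupings.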
  18. To sum up what we have said so far, extraction differs from categorization both in its purpose and in its means to that end. The purpose of extraction is to discover what records contain, whereas the purpose of categorization is to classify records on the basis of what they contain. The means also differ. Extraction takes place by comparing records against a number of dictionaries. Categorization, on the other hand, involves applying a variety of algorithms to the extraction results to create categories. In this way, concepts, types, and TLA patterns are both output and input: output of the extraction process and input to the categorization process. They are what gets pulled out of records and what the software then turns around and uses to classify those records.
  19. Now that we have parsed out what the terminology means, let’s take a look at the software itself and see how to perform the various tasks associated with sourcing, extracting, categorizing, and visualizing. As I mentioned earlier, one difficulty in learning Modeler is distinguishing between terms that look similar; a second difficulty concerns organizing the many different tasks you can perform in Modeler. To surmount this second difficulty, I have provided a number of charts so that you can keep track of what we have done and what we are doing. If you have the data set, you may find it helpful to follow along on your computer.
  20. A stream is just your workspace, and it lays out visually what data you are using and what processes you are running it through.
  21. The data set that you gave us to analyze is a focus group conversation about the strategic direction of the College of Business. Because you are probably less interested in moderator comments than in those of participants, you may want to filter out the moderator’s remarks in Excel before you start the analysis process.
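The talk recommends doing this filtering in Excel. If the transcript is first exported to rows, the same step could be sketched in Python as below. The column names `speaker` and `comment`, the label "Moderator", and the sample rows are all hypothetical.

```python
# Hypothetical rows exported from the focus-group worksheet.
rows = [
    {"speaker": "Moderator", "comment": "What should the college prioritize?"},
    {"speaker": "Participant 1", "comment": "More internships with local firms."},
    {"speaker": "Participant 2", "comment": "Smaller class sizes."},
]


def drop_moderator(rows, moderator_label="Moderator"):
    """Keep only participant remarks so they alone go on to extraction."""
    return [row for row in rows if row["speaker"] != moderator_label]


participant_comments = [row["comment"] for row in drop_moderator(rows)]
print(participant_comments)
# ['More internships with local firms.', 'Smaller class sizes.']
```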
  22. Templates initiate the extraction phase and pull out concepts and types. There are many different kinds of templates, some of which contain more in their substitution, type, and exclude dictionaries than others. There are also what are called text analysis packages (TAPs), which come not only with a wealth of information in their dictionaries but also with a number of pre-built categories that may interest you when you are conducting your analysis. For example, there is a TAP for employee satisfaction surveys, and its categories include positive and negative sentiment toward coworkers, managers, communication, job security, benefits, and so on. If you are not interested in all of the pre-built categories, you can delete or modify them to suit your preferences.
  23. Now that we have explored the extraction and categorization results with the Opinions Template, let’s move to the Opinions Text Analysis Package. As you’ll remember from the first part of the tutorial, the difference between a template and a text analysis package is that the former does not come with pre-built categories, whereas the latter does.
  24. Because the focus group conversation is not in the proper format (with a question as the field header and each record as one person’s response to that question), we will switch to a slightly different data set that is in the proper format so that we can demonstrate the remaining capabilities. This data set is a questionnaire about a company’s safety program, and the field we will be looking at concerns what employees want the company to stop doing with regard to safety. Because this is an employee opinion questionnaire, we can use the employee opinion text analysis package.