Text Analytics Presentation

An Introduction to Text Analytics
in IBM SPSS Modeler
Skylar Ritchie
Shawn Bergman
1

Objectives
1. To give a broad overview of text analytics…
a. Defining key terms
b. Describing important steps in the process
2. To provide a step-by-step tutorial for how to use IBM SPSS Modeler
to...
a. Read in source text
b. Extract concepts, sentiment, and text link patterns from records
c. Categorize records
d. Visualize the results
2

Overview of Text Analytics
Objective #1
3

Text Analytics
“The process of deriving high quality
information from text”
--Marisa Peacock, Social Media Strategist
“A technology and process both, a
mechanism for knowledge discovery applied
to documents, a means of finding value in
text. Solutions…analyze linguistic
structure...discern entities...as well as
relationships, concepts, and even
sentiments. They...automate
classification...of source documents. They
exploit visualization for exploratory analysis.”
--Seth Grimes, Analytics Strategy Consultant
1. Extraction: to discern entities,
relationships, concepts, and sentiments
2. Categorization: to automate classification
3. Visualization
4

What does text analytics “look” like?
1: Source Text
• File
• Web Feed
2: Dictionaries
• Substitution
• Type
• Exclude
3: Extraction Results
• Concepts
• Types
• Text Link Analysis Patterns
4: Grouping Techniques
• Concept Inclusion
• Concept Root Derivation
• Semantic Network
• Co-occurrence
5: Categorization Results
• Categories
• Descriptors
Phases: Sourcing (Step 1) → Extracting (Steps 2-3) → Categorizing (Steps 4-5) → Visualizing
5
Handout provided

Key Terms
Source text file
Field
Document/record
7

Key Terms
Types: higher-level concepts
Example: <Organization>
Concepts: lead terms under which similar terms are grouped together
Example: university
Terms: single words (uni-terms) or word phrases (multi-terms) that are interesting or relevant
Example: university, college, school, academy, institute, polytechnic, alma mater, graduate school…
9
Handout provided

Substitution Dictionary: Terms → Concepts
An editable collection of synonymous terms grouped under a target term, or concept
Target Term   Synonyms
university    university, college, school, academy, institute, polytechnic, alma mater, graduate school…
student       student, scholar, undergraduate, graduate, grad student, postdoctoral fellow, freshman, sophomore, junior, senior…
professor     professor, prof, tenured faculty member, dean, assistant professor, associate professor, lecturer, academic…
Example: university, graduate school, college → university
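A minimal Python sketch of the idea behind a substitution dictionary, not Modeler's own implementation. The names SUBSTITUTIONS and to_concept are hypothetical, and the synonym lists are shortened from the slide.

```python
# Hypothetical stand-in for a substitution dictionary: target term -> synonyms.
SUBSTITUTIONS = {
    "university": ["university", "college", "school", "academy", "institute",
                   "polytechnic", "alma mater", "graduate school"],
    "student": ["student", "scholar", "undergraduate", "grad student", "freshman"],
    "professor": ["professor", "prof", "lecturer", "academic"],
}

# Invert the dictionary into a term -> concept lookup.
TERM_TO_CONCEPT = {
    synonym: target
    for target, synonyms in SUBSTITUTIONS.items()
    for synonym in synonyms
}


def to_concept(term):
    """Return the target term (concept) for a term, or the term itself if unlisted."""
    normalized = term.strip().lower()
    return TERM_TO_CONCEPT.get(normalized, normalized)


print(to_concept("College"))          # university
print(to_concept("graduate school"))  # university
print(to_concept("library"))          # library (no entry, so unchanged)
```

Terms with no entry pass through unchanged here; that fallback is an assumption made for the sketch.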
10

Type Dictionary: Concepts → Types
An editable collection of concepts grouped under a label known as the type name
Concept                             Type
5 star                              <Positive>
a lot better                        <Positive>
beyond my expectations              <Positive>
abhor                               <Negative>
bizarre                             <Negative>
can’t stand                         <Negative>
all about the same                  <Uncertain>
been with it for too little time    <Uncertain>
can’t think of any                  <Uncertain>
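The type dictionary can be pictured as a lookup from concept to type name. The sketch below is illustrative only: TYPE_DICTIONARY and type_of are hypothetical names, and returning <Unknown> for unlisted concepts is an assumption.

```python
# Hypothetical stand-in for a type dictionary, using the rows from the slide.
TYPE_DICTIONARY = {
    "5 star": "<Positive>",
    "a lot better": "<Positive>",
    "beyond my expectations": "<Positive>",
    "abhor": "<Negative>",
    "bizarre": "<Negative>",
    "can't stand": "<Negative>",
    "all about the same": "<Uncertain>",
    "been with it for too little time": "<Uncertain>",
    "can't think of any": "<Uncertain>",
}


def type_of(concept):
    """Look up the type of a concept; unlisted concepts get <Unknown> (assumption)."""
    # Normalize case and curly apostrophes so "Can’t stand" finds "can't stand".
    key = concept.strip().lower().replace("\u2019", "'")
    return TYPE_DICTIONARY.get(key, "<Unknown>")


print(type_of("5 star"))  # <Positive>
print(type_of("abhor"))   # <Negative>
print(type_of("campus"))  # <Unknown>
```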
11

Exclude Dictionary
An editable collection of terms and types that will be removed from the final extraction results
Exclude List
any kind of problem
can’t say enough
can’t wait
i was out of
if it ain’t broke, don’t fix it
prefer not to
to work with
went down to
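As a rough sketch, the exclude dictionary acts as a filter over extraction results. EXCLUDE_LIST and remove_excluded are hypothetical names, and only part of the slide's list is reproduced.

```python
# Hypothetical stand-in for an exclude dictionary (subset of the slide's list).
EXCLUDE_LIST = {
    "any kind of problem",
    "can't say enough",
    "can't wait",
    "prefer not to",
    "to work with",
}


def remove_excluded(extracted):
    """Drop any extracted item that appears in the exclude list."""
    return [item for item in extracted
            if item.strip().lower() not in EXCLUDE_LIST]


results = ["university", "can't wait", "Prefer not to", "excellent"]
print(remove_excluded(results))  # ['university', 'excellent']
```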
12

Text Link Analysis (TLA)
A pattern-matching technology that is used to extract relationships found between either concepts or types
Example 1: “This is a 5 star university”
• Type pattern: <Organization> + <Positive>
• Concept pattern: university + excellent
Example 2: “Undergraduates abhor mere lecturers”
• Type pattern: <Unknown> + <Unknown> + <Negative>
• Concept pattern: undergraduates + lecturers + dislike
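One way to picture TLA is as slot filling: a pattern lists types in a fixed order, and each slot is filled with a concept of that type found in the sentence. The Python below is a toy sketch under that assumption, not Modeler's matching engine. The mini-dictionaries and the mapping of "5 star" to the concept "excellent" are inferred from the first example and are hypothetical.

```python
# Hypothetical mini-dictionaries for the first example sentence.
TERM_TO_CONCEPT = {"5 star": "excellent", "university": "university"}
CONCEPT_TO_TYPE = {"excellent": "<Positive>", "university": "<Organization>"}


def match_pattern(sentence, slot_types):
    """Fill each type slot with a concept of that type found in the sentence.

    Returns the concept pattern as a string, or None if any slot stays empty.
    """
    text = sentence.lower()
    found = {}
    for term, concept in TERM_TO_CONCEPT.items():
        if term in text:
            # Keep the first concept seen for each type.
            found.setdefault(CONCEPT_TO_TYPE[concept], concept)
    if not all(slot in found for slot in slot_types):
        return None
    return " + ".join(found[slot] for slot in slot_types)


pattern = ["<Organization>", "<Positive>"]
print(match_pattern("This is a 5 star university", pattern))  # university + excellent
print(match_pattern("The campus is large", pattern))          # None
```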
13
Handout provided

Key Terms
Categorization: the process of assigning records to a category when the text within them matches a descriptor
Category: higher-level ideas that capture the central message of the text
Descriptor: concepts, types, patterns, and category rules that have been used to define a category
Descriptors
• Concepts
• Types
• TLA patterns
• Category rules
15

Category Rules
Statements that classify records into a category based on a logical expression using extracted concepts, types, and patterns as well as Boolean operators
Operator   Meaning                       Example
+          “And” (order important)       <Organization> + <Positive>; university + excellent
&          “And” (order not important)   <Positive> & <Organization>; excellent & university
|          “Or”                          <Person> | <Organization>; student | university
!()        “Not”                         !(<Person>); !(student)
Matching Sentence: This is a 5 star university
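The four operators can be sketched as small predicate builders over an extracted record. This is an illustration in Python, not Modeler's rule syntax or engine; Record, has, pattern, and_, or_, and not_ are hypothetical names, and the extraction results for the matching sentence are written out by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    items: set                                  # extracted concepts and types
    patterns: set = field(default_factory=set)  # extracted TLA patterns as ordered tuples


def has(item):
    """True when the record contains the concept or type."""
    return lambda record: item in record.items


def pattern(*slots):
    """The + operator: order matters, so the exact ordered pattern must be present."""
    return lambda record: tuple(slots) in record.patterns


def and_(left, right):
    """The & operator: both sides must hold, in any order."""
    return lambda record: left(record) and right(record)


def or_(left, right):
    """The | operator: either side may hold."""
    return lambda record: left(record) or right(record)


def not_(rule):
    """The !() operator: the inner rule must not hold."""
    return lambda record: not rule(record)


# Hand-written extraction results for "This is a 5 star university".
record = Record(
    items={"university", "excellent", "<Organization>", "<Positive>"},
    patterns={("<Organization>", "<Positive>"), ("university", "excellent")},
)

print(pattern("<Organization>", "<Positive>")(record))            # True
print(pattern("<Positive>", "<Organization>")(record))            # False
print(and_(has("<Positive>"), has("<Organization>"))(record))     # True
print(or_(has("student"), has("university"))(record))             # True
print(not_(has("student"))(record))                               # True
```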
16
Handout provided

Wildcard Operator
The Boolean operator * that acts as a variable and stands in for a missing word or word fragment
Usage                   Example      Matching Phrases
Space after word        graduate *   graduate school; graduate student
Space before word       * graduate   university graduate
No space after word     graduate*    graduates; graduated
No space before word    *graduate    undergraduate
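The four usages can be approximated with regular expressions. The sketch below assumes that * stands for one or more word characters; that reading, and the names wildcard_to_regex and matches, are assumptions rather than documented Modeler behavior.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a regex (assumes * = one or more word characters)."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"\w+".join(parts), re.IGNORECASE)


def matches(pattern, phrase):
    """True when the whole phrase matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(phrase) is not None


print(matches("graduate *", "graduate school"))      # True
print(matches("* graduate", "university graduate"))  # True
print(matches("graduate*", "graduated"))             # True
print(matches("*graduate", "undergraduate"))         # True
print(matches("graduate*", "graduate school"))       # False
```

A space next to the asterisk forces a separate word, while no space lets the asterisk extend the word itself, which mirrors the distinction in the table.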
17

Grouping Techniques
The mechanisms underlying the categorization process
Extraction Results
• Concepts
• Types
• Text Link Analysis Patterns
Grouping Techniques
• Concept Inclusion
• Concept Root Derivation
• Semantic Network
• Co-occurrence
Categorization Results
• Categories
• Descriptors
18
Handout provided

Concept Inclusion
What?
Grouping based on subsets and supersets
How?
1. Breaking concepts into components
2. De-inflecting components
When?
Text that is somewhat technical
Example (Descriptor: faculty)
Concepts                   Components                     De-inflected Components
graduate faculty           {graduate, faculty}            {graduate, faculty}
faculty committees         {faculty, committees}          {faculty, committee}
tenured faculty members    {tenured, faculty, members}    {tenure, faculty, member}
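The two steps can be sketched as a subset test on de-inflected component sets. This is a toy illustration: the DEINFLECT table is a hand-written stand-in for real linguistic de-inflection, and "student housing" is a hypothetical non-matching concept added for contrast.

```python
# Toy de-inflection table (assumption); real de-inflection is linguistic.
DEINFLECT = {"committees": "committee", "members": "member", "tenured": "tenure"}


def components(concept):
    """Break a concept into its de-inflected components."""
    return frozenset(DEINFLECT.get(word, word) for word in concept.lower().split())


def group_by_inclusion(descriptor, concepts):
    """Keep concepts whose components are a superset of the descriptor's components."""
    target = components(descriptor)
    return [concept for concept in concepts if target <= components(concept)]


concepts = ["graduate faculty", "faculty committees",
            "tenured faculty members", "student housing"]
print(group_by_inclusion("faculty", concepts))
# ['graduate faculty', 'faculty committees', 'tenured faculty members']
```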
19

Concept Root Derivation
What?
Grouping based on morphological relationships
How?
1. Breaking concepts into components
2. De-inflecting components
3. Removing suffixes to find root
When?
Any text, but few categories
Example (Descriptor: psycholog-)
Concepts                   Components                     De-inflected Components
studies in psychology      {studies, psychology}          {study, psychology}
psychological studies      {psychological, studies}       {psychological, study}
noteworthy psychologist    {noteworthy, psychologist}     {noteworthy, psychologist}
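A toy Python sketch of the three steps follows. The stopword list, de-inflection table, and suffix list are all hand-picked assumptions that only cover this example, and "campus library" is a hypothetical non-matching concept; a real system would use a proper morphological analyzer.

```python
STOPWORDS = {"in", "of", "the"}      # dropped when breaking into components (assumption)
DEINFLECT = {"studies": "study"}     # toy de-inflection table (assumption)
SUFFIXES = ("ical", "ist", "y")      # toy suffix list, longest checked first


def root(word):
    """Strip the first matching suffix, leaving at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def roots(concept):
    """Components -> de-inflected components -> roots."""
    words = [w for w in concept.lower().split() if w not in STOPWORDS]
    return {root(DEINFLECT.get(w, w)) for w in words}


def group_by_root(descriptor_root, concepts):
    """Keep concepts that contain a component with the given root."""
    return [c for c in concepts if descriptor_root in roots(c)]


concepts = ["studies in psychology", "psychological studies",
            "noteworthy psychologist", "campus library"]
print(group_by_root("psycholog", concepts))
# ['studies in psychology', 'psychological studies', 'noteworthy psychologist']
```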
20

Semantic Network
What?
Grouping based on semantic
relationships
How?
• Synonyms: “are” relationship
• Hyponyms: “is a” relationship
When?
Text that is not highly technical
Category
educators
Synonyms
professors teachers
Category
social science
Hyponyms
psychology social science
21

Co-occurrence
What?
Grouping based on concepts that
appear together
How?
\[ C_{XY} \ge 2 \;\rightarrow\; \mathit{Similarity} = \frac{(C_{XY})^2}{C_X \times C_Y} \]
When?
Any text, but many categories based on possibly distant relationships
Example                                    Concepts
Students flock to ASU                      students = W, ASU = X
ASU focuses on sustainability              ASU = X, sustainability = Y
Sustainability is the way of the future    sustainability = Y, way of the future = Z
\[ C_W = 1,\quad C_X = 2,\quad C_Y = 2,\quad C_Z = 1,\quad C_{WX} = 1,\quad C_{XY} = 1,\quad C_{YZ} = 1 \]
\[ \frac{(C_{WX})^2}{C_W \times C_X} = \frac{1^2}{1 \times 2} = \frac{1}{2} \]
\[ \frac{(C_{XY})^2}{C_X \times C_Y} = \frac{1^2}{2 \times 2} = \frac{1}{4} \]
\[ \frac{(C_{YZ})^2}{C_Y \times C_Z} = \frac{1^2}{2 \times 1} = \frac{1}{2} \]
22
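The co-occurrence similarity above can be computed directly from records once each record is reduced to its set of concepts. The Python sketch below is illustrative; the function name and the min_count parameter are hypothetical. min_count defaults to 1 so that the three-sentence example yields values, whereas the slide's formula applies a threshold of 2.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# One set of extracted concepts per record, taken from the slide's example.
records = [
    {"students", "ASU"},
    {"ASU", "sustainability"},
    {"sustainability", "way of the future"},
]


def cooccurrence_similarity(records, min_count=1):
    """Return (C_XY)^2 / (C_X * C_Y) for every concept pair seen together."""
    single = Counter()  # C_X: number of records containing each concept
    pair = Counter()    # C_XY: number of records containing both concepts
    for concepts in records:
        single.update(concepts)
        pair.update(combinations(sorted(concepts), 2))
    return {
        (a, b): Fraction(count ** 2, single[a] * single[b])
        for (a, b), count in pair.items()
        if count >= min_count
    }


scores = cooccurrence_similarity(records)
print(scores[("ASU", "students")])                        # 1/2
print(scores[("ASU", "sustainability")])                  # 1/4
print(scores[("sustainability", "way of the future")])    # 1/2
```

Calling cooccurrence_similarity(records, min_count=2) returns an empty dictionary for this example, because no pair appears together in two or more records.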

Extraction v. Categorization
          Extraction                           Categorization
Ends      To discover what records contain     To classify records based on what they contain
Means     • Substitution dictionary            • Concept root derivation
          • Type dictionary                    • Concept inclusion
          • Exclude dictionary                 • Semantic network
                                               • Co-occurrence
Output    • Concepts                           • Categories
          • Types                              • Descriptors (concepts, types, TLA patterns, category rules)
          • TLA patterns
23

Modeler Tutorial
Objective #2
24

Starting Modeler by…
• Creating a new stream
• Sourcing an Excel file
25

Creating a New Stream
1. Open IBM SPSS Modeler 17.1
2. Select
3. Click Ok
4. To create another stream, click
26

Sourcing an Excel File
1. Click the tab
2. Double click the node or click and drag it into the
stream
3. Double click the node within the stream or right
click and click Edit
4. Click on the tab
5. Select the
6. Select the
7. A
8. Select
9. Click Ok
29

Starting Interactive Workbench Session with…
Basic Resources Template
• Less information in substitution, type, and exclude dictionaries
• No categories
Opinions Template
• More information in
• No categories
Opinions Text Analysis Package
• More information in
• Pre-built categories
30
Handout provided

Starting an Interactive Workbench Session with the Basic Resources Template
1. Click the tab
2. Double click the node or click and drag it into
the stream
4. Click on the tab
5. Select the
6. Click on the tab
7. Select
8. Click
32

Interactive Workbench – Categories & Concepts View
Categories Pane
Extraction Results Pane
Data Pane
33

Interactive Workbench – Resource Editor View
Type Dictionary
Substitution Dictionary
Exclude Dictionary
34

Starting an Interactive Workbench Session with the Opinions Template
2. Click on the tab
3. Click
4. Select
5. Click Ok
6. Click
36

Concept View
37

Type View
38

Type Dictionary
Exclude Dictionary
39

Starting an Interactive Workbench Session with the Opinions Text Analysis Package
2. Click on the tab
3. Select
4. Click
5. Select
6. Click
7. Click
41

Categories Pane
Extraction Results Pane
Data Pane
42

Type Dictionary
Exclude Dictionary
43

Templates v. Text Analysis Packages
                                  Libraries                                                     Pre-Built Categories
Basic Resources Template          Local, Core, Variations, Nonlinguistic Entities               No
Opinions Template                 Local, Core, Variations, Opinions, Budget, Slang, Emoticon    No
Opinions Text Analysis Package    Local, Core, Variations, Opinions, Budget, Slang, Emoticon    Yes
44
Handout provided

Using the Opinions Template for…
Extraction by…
• Editing the…
  • Substitution Dictionary
  • Type Dictionary
  • Exclude Dictionary
• Extracting TLA Patterns
Categorization by…
• Automatically Building Categories
• Manually Categorizing…
  • Concepts
  • Types
  • TLA Patterns
• Manually Creating Category Rules
45
Handout provided

Editing the Substitution Dictionary
1. Right click on the concept
2. Select Add to Synonym
3. Click New
4. Create the target term to which you want to assign the
synonym
5. Click Ok
6. Click
48

Interactive Workbench
Categories & Concepts View
Resource Editor View
49

Editing the Type Dictionary
2. Select Add to Type
3. Click More
4. Select the type to which you want to assign the concept
5. Click Ok
6. Click Ok again
7. Click
52

Categories & Concepts View
Resource Editor View
53

Editing the Exclude Dictionary
2. Click Exclude from Extraction
3. Click
56

Categories & Concepts View Resource Editor View
57

Extracting TLA Patterns
1. In the Text Link Analysis View, click
2. Select a type pattern to see the concept patterns that
correspond to it
3. Click to see the concepts and type webs
corresponding to these patterns
59

Interactive Workbench – Text Link Analysis View
60

Automatically Building Categories
1. In the Categories & Concepts View, click
2. Click Edit:
3. Select
4. Click
5. Click
6. Select
7. Select
8. Select
9. Select
10. Select
11. Click Ok
12. Click
62

Category
Subcategory
Descriptor
Visualization Pane:
Category Bar
63

Category Web
64

Category Web Table
65

Manually Categorizing Concepts
1. Select the concept you want to categorize
2. Click
3. Select the category to which you want to assign the
concept:
4. Click Ok
68

69

Manually Categorizing Types
1. Select the type you want to categorize
2. Click
3. Select the category to which you want to assign the
concept or create a new category:
4. Click Ok
72

73

Interactive Workbench – Text Link Analysis View
Type Patterns Concept Patterns
75

Manually Categorizing TLA Patterns
1. Select the TLA pattern you want to categorize
2. Click
3. Select the category to which you want to assign the
concept or create a new category:
4. Click Ok
76

77

Manually Creating Category Rules
1. Right click on the category for which you want to create
a rule
2. Click Create Category Rule
3. Create your rule by…
1. Dragging concepts or types into the Rule Editor
2. Combining them with Boolean operators
4. Click to see how many records match
5. Click
79

80

Using the Opinions Text Analysis Package for…
• Manually Adjusting Categories
• Generating Model
• Converting Model Categories to Fields
• Deriving...
  • Total Negativity Score
  • Overall Sentiment Score
• Visualizing Model Results
81
Handout provided

Manually Adjusting Categories
1. Right click on the category or categories that you want
to adjust
2. Select either Move to Category or Merge Categories or
Edit > Delete
85

86

Generating Model
1. Once you are satisfied with the categories you have
created, click
2. Drag the newly created modeling node
into your stream
3. Right click on your source node
4. Click Connect
5. Click on your modeling node to connect the
two nodes
89

Converting Model Categories to Fields
1. Right click on your modeling node
2. Click Edit
3. Click on the tab
4. Select
5. Change the
6. Click Ok
92

Deriving a Total Negativity Score
1. Click on the tab
2. Double click the node or click and drag it into the
stream
4. Give a descriptive name to your
5. Click to create a formula
6. In Expression Builder, click on a category that you want to
be in your formula
7. Click to add it
8. Click on an operator such as
9. Add another category
10. When you are finished, click Ok
11. Repeat the process to create additional formulas
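The formula built in these steps amounts to adding up category fields for each record. The sketch below shows that arithmetic in Python rather than in Modeler's own expression language, and assumes each category has been converted to a true/false field. The category names are hypothetical.

```python
# Hypothetical names of negative-category fields produced by the model.
NEGATIVE_CATEGORIES = ["Neg_Price", "Neg_Service", "Neg_Product"]


def total_negativity(record):
    """Count how many negative categories are flagged for one record."""
    return sum(1 for name in NEGATIVE_CATEGORIES if record.get(name))


record = {"Neg_Price": True, "Neg_Service": False,
          "Neg_Product": True, "Pos_Service": True}
print(total_negativity(record))  # 2
```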
94

Deriving an Overall Sentiment Score
1. Click on the tab
the stream
4. Give a descriptive name to your
5. Select
6. Define field settings:
7. Click Ok
96

Visualizing Model Results
1. Click on the tab
the stream
4. Click on the tab
5. Select
6. Select overlay:
7. Select
8. Click
99

Summary
1. To give a broad overview of text analytics…
a. Defining key terms
b. Describing important steps in the process
2. To provide a step-by-step tutorial for how to use IBM SPSS Modeler
to...
a. Read in source text
b. Extract concepts, sentiment, and text link patterns from records
c. Categorize records
d. Visualize the results
101

Additional Resources
• User's Guide:
http://public.dhe.ibm.com/software/analytics/spss/documentation/modeler/17.0/en/ModelerTextAnalytics.pdf
• Introduction to SPSS Text Analytics Webinar:
https://www.youtube.com/watch?v=tK-o4MnRScQ&list=WL&index=2
102
