From Sentiment to Persuasion Analysis: A Look at Idea Generation Tools
Jason Kessler
Data Scientist, CDK Global
@jasonkessler
www.jasonkessler.com
Outline
• Idea generation tools
– Use large corpora to generate hypotheses about questions like:
– How do you make a persuasive ad?
– How can presidential candidates improve their rhetoric?
– How do ethnicity and gender correlate to language use in online
dating profiles?
– How do movie reviews predict box-office success?
• Technical content:
– Ways of extracting category-associated words and phrases from
corpora
– UX around displaying and providing context for associated words and
phrases
Customer-Written
Product Reviews
Good Ad Content
Naïve Approach: Indicators of Positive Sentiment
"If you ask a Subaru owner what they think of their car, more times than not
they'll tell you they love it,"
-Alan Bethke, director of marketing communications for Subaru of
America (via Adweek)
Positive sentiment.
Engaging language.
Finding Engaging Content
Example Review Appearing on a 3rd Party Automotive Site
Car Reviewed: Chevy Cruze
Rating: 4.4/5 Stars
Text:
…I was very skeptical giving up my truck
and buying an "Economy Car." I'm 6'
215lbs, but my new career has me
driving a personal vehicle to make sales
calls. I am overly impressed with my
Cruze…
# of users who read review: 20
# who went on to visit a Chevy dealer’s website: 15
Review Engagement Rate: 15/20 = 75%
Median Review Engagement Rate: 22%
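The engagement-rate arithmetic on this slide can be sketched in a few lines. The function name and the extra reviews are made up for illustration; only the 20 readers and 15 dealer-site visits come from the slide.

```python
from statistics import median


def engagement_rate(readers: int, dealer_visits: int) -> float:
    """Share of a review's readers who went on to visit a dealer's website."""
    if readers == 0:
        raise ValueError("a review with no readers has no engagement rate")
    return dealer_visits / readers


# The Cruze review from the slide: 20 readers, 15 dealer-site visits.
cruze_rate = engagement_rate(20, 15)

# Hypothetical (readers, visits) pairs for other reviews, invented for this sketch.
other_reviews = [(40, 8), (10, 1), (25, 5)]
rates = [cruze_rate] + [engagement_rate(r, v) for r, v in other_reviews]

print(f"Cruze review: {cruze_rate:.0%}")
print(f"Median over these example reviews: {median(rates):.0%}")
```

Comparing a review's rate against the corpus median is what turns a raw count into a "high engagement" or "low engagement" label.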
Sentiment vs. Persuasiveness: SUV-Specific

Positive Sentiment | High Engagement
Love | Comfortable
Comfortable | Front [Seats]
Features | Acceleration
Solid | Free [Car Wash, Oil Change]
Amazing | Quiet

Negative Sentiment | Low Engagement
Transmission | Money [spend my, save]
Problem | Features
Issue | Dealership
Dealership | Amazing
Times | Build Quality [typically positive]
• We’ll discuss algorithms later in the talk
• Basically, we rank words and phrases by the feature weights a
classifier assigns to them
• Techniques and technologies used
– Unigram and bigram features (bigrams must pass a simple key-phrase
test)
– Ridge classifier
Algorithm for finding word lists
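A minimal sketch of the ranking idea, not the production system: fit an L2-penalized (ridge) linear model on binary unigram and bigram features, then sort terms by learned weight. The toy reviews, the gradient-descent fitting loop, and all function names are assumptions made for illustration, and the key-phrase test for bigrams is skipped.

```python
def ngrams(text):
    """Binary unigram and bigram features for one document."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return set(tokens) | set(bigrams)


def fit_ridge(docs, labels, alpha=0.1, lr=0.1, epochs=500):
    """Least squares on +1/-1 labels with an L2 penalty, fit by gradient descent."""
    feats = [ngrams(d) for d in docs]
    weights = {term: 0.0 for f in feats for term in f}
    n = len(docs)
    for _ in range(epochs):
        residuals = [sum(weights[t] for t in f) - y for f, y in zip(feats, labels)]
        grad = {t: alpha * w for t, w in weights.items()}  # L2 penalty term
        for f, r in zip(feats, residuals):
            for t in f:
                grad[t] += r / n  # squared-error term
        for t in weights:
            weights[t] -= lr * grad[t]
    return weights


# Toy corpus: +1 = high-engagement review, -1 = low-engagement review.
docs = [
    "quiet comfortable ride",
    "comfortable front seats",
    "transmission problem",
    "problem with dealership",
]
labels = [1, 1, -1, -1]

weights = fit_ridge(docs, labels)
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
print("most associated with high engagement:", ranked[0][0])
print("most associated with low engagement:", ranked[-1][0])
```

Terms that recur across documents of one class accumulate the largest weights, which is why the two ends of the ranking give the word lists.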
High Sentiment Terms
Love
Awesome
Fantastic
Handled
Perfect
Engagement Terms
Blind (spot, alert)
Contexts from high
engagement reviews
- “The techno safety features
(blind spot, lane alert, etc.)
are reason for buying car...”
- “Side blind Zone Alert is
truly wonderful…”
- …
Can better science improve messaging?
Engagement Terms
Blind
White (paint, diamond)
Contexts
- “White with cornsilk interior.”
- “My wife fell in love with the
Equinox in White Diamond”
- “The white diamond paint is
to die for”
Engagement Terms
Blind
White
Climate (geography, a/c)
Contexts
- “Love the front wheel drive
in this northern Minn.
Climate”
- “We do live in a cold climate
(Ontario)”
- …climate control…
Just recently, VW has produced very similar
commercials.
Process
• Corpus collection
– Identify documents of interest.
• Label documents with class of interest
– For CDK’s usage:
– Persuasive: high engagement rate.
– Positive: high star rating.
• Find linguistic elements that are associated with class
– Complicated! Will be a major focus of this talk.
• Explain why linguistic elements are associated
– Show representative contexts.
– Human-generated explanation.
– Statistics supporting association.
– Ideation.
Case Study 1:
Language of Politics
NYT: 2012 Political Convention Word Use by Party
Mike Bostock et al., http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html
2012 Political Convention Word Use by Party
Source: Mike Bostock et al., http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html
Corpus has a class size imbalance:
- Democrats: 79k words across 123 speeches
- Republicans: 60k words across 66 speeches
“Number of mentions by spoken words”
- Normalizes imbalance (282 vs. 182)
- More understandable than P(jobs|Democrat) vs.
P(jobs|Republican), which are both extremely low
numbers (0.36% vs. 0.30%)
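A small worked version of the normalization, assuming 282 and 182 are the raw counts of “jobs” for each party (that reading reproduces the 0.36% and 0.30% quoted above). The per-25,000-words scale is an assumption chosen for readability, not a figure taken from this slide.

```python
# Corpus sizes from the slide (in words) and assumed raw counts of "jobs".
dem_words, rep_words = 79_000, 60_000
dem_jobs, rep_jobs = 282, 182

# Conditional probabilities P(jobs | party): correct, but tiny and hard to read.
p_dem = dem_jobs / dem_words
p_rep = rep_jobs / rep_words
print(f"P(jobs|Democrat)   = {p_dem:.2%}")
print(f"P(jobs|Republican) = {p_rep:.2%}")


def mentions_per(count, total_words, per=25_000):
    """Mentions per `per` spoken words; `per` is an assumed display scale."""
    return count * per / total_words


print(f"Democrats:   {mentions_per(dem_jobs, dem_words):.1f} mentions per 25,000 words")
print(f"Republicans: {mentions_per(rep_jobs, rep_words):.1f} mentions per 25,000 words")
```

Both views carry the same information; the rescaled counts are simply easier to compare at a glance.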
• Corpus: Political Convention Speeches
• Class labels: Political Party of Speaker
• Linguistic elements:
– Words and phrases
– Manually chosen
• Explanation:
– Cool bubble diagram
– Selective topic narration
– Click-to-view topic contexts organized by speaker and party
• We’ll get back to this in a minute
Summary: NYT 2012 Conventions
Case Study 2:
Language of Self-
Representation
OKCupid: How do gender and ethnicity affect
self-presentation on online dating profiles?
Christian Rudder: http://blog.okcupid.com/index.php/page/7/
Which words and phrases statistically distinguish ethnic groups and genders?
[Word cloud; visible phrases include: hobos, almond butter, 100 Years of Solitude, Bikram yoga]
Source: http://blog.okcupid.com/index.php/page/7/ (Rudder 2010)
Words and phrases that distinguish white men.
OKCupid: How do ethnicities’ self-presentation differ
on a dating site?
Source: http://blog.okcupid.com/index.php/page/7/ (Rudder 2010)
Words and phrases that distinguish Latino men.
Explanation
The explanation suggests that topic modeling may help identify the latent themes
that are driving the distinctiveness of these words and phrases.
What can we do with this?
• Genre of insurance or investment ads
– Montage of important events in the life of a person.
• With these phrase sets, the ads practically write themselves:
• What if you wanted to target Latino men?
– Grows up boxing
– Meets girlfriend salsa dancing
– Becomes a Marine
– Tells a joke at his wedding
– Etc…
The linguistic elements were found “statistically.”
The exact method is unclear, but Rudder (2014)
describes a novel method to identify statistically
associated terms.
- Let’s look closely at the algorithm and see:
- how it works
- and how it performs on the political convention
data set.
OKCupid: How do ethnicities’ self-presentation differ
on a dating site?
[Figure: scatter plot of terms placed by their ranking with Democrats on one axis and their ranking with Republicans on the other; each axis runs bottom, middle, top. Not drawn to scale. Terms plotted: giraffe, olympics, ann, bipartisan, people, stand, election, auto, wealthy, bin laden, regulatory, pelosi, rancher, grandfather, public, worker, regulation, profit.]
Association between Democrats and “worker” is the Euclidean distance
between the word and the top left corner.
Association between Republicans and “regulation” is the Euclidean distance
between the word and the bottom right corner.
Source: Christian Rudder. Dataclysm. 2014.
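A rough sketch of the corner-distance idea described above, under stated assumptions: each term gets a 0-to-1 rank by frequency within each class, and its association with class A is its distance to the corner where the rank in A is highest and the rank in B is lowest. The tie handling, rank scaling, toy counts, and function names are assumptions for illustration; the book's exact procedure may differ.

```python
import math


def percentile_ranks(counts, vocab):
    """Map each term to a rank in [0, 1] by frequency (1 = most frequent)."""
    values = sorted({counts.get(t, 0) for t in vocab})
    if len(values) == 1:
        return {t: 0.5 for t in vocab}
    position = {v: i / (len(values) - 1) for i, v in enumerate(values)}
    return {t: position[counts.get(t, 0)] for t in vocab}


def corner_distances(counts_a, counts_b):
    """Distance from each term to the corner (top rank in A, bottom rank in B).

    Smaller distance means a stronger association with class A.
    """
    vocab = set(counts_a) | set(counts_b)
    rank_a = percentile_ranks(counts_a, vocab)
    rank_b = percentile_ranks(counts_b, vocab)
    return {t: math.hypot(1 - rank_a[t], rank_b[t]) for t in vocab}


# Toy counts, invented for illustration.
dem_counts = {"worker": 10, "people": 8, "regulation": 1}
rep_counts = {"regulation": 9, "people": 8, "worker": 1}

dem_scores = corner_distances(dem_counts, rep_counts)
dem_terms = sorted(dem_scores, key=dem_scores.get)
print("closest to the Democratic corner:", dem_terms)

rep_scores = corner_distances(rep_counts, dem_counts)
rep_terms = sorted(rep_scores, key=rep_scores.get)
print("closest to the Republican corner:", rep_terms)
```

Swapping the two arguments scores terms against the opposite corner, so one function serves both parties.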
Another look at the 2012 political convention data
Mike Bostock et al., http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html
- The conventions let political parties reach a broad audience,
and both energize their bases and reach undecided voters.
- How well do these terms capture rhetorical differences between
parties?
Applying the Rudder algorithm to the 2012 data reveals a number of
terms associated with a party that weren’t covered in the NYT viz.
These can uncover party talking points.
Another look at the 2012 political convention data
Republican Top Terms | Included in Visualization? | Comment
olympics | no | Gov. Romney was CEO of the Organizing Committee for the 2002 Winter Olympics.
ann | no | Ann Romney
big government | no |
16 [trillion] | no | Size of national debt
oklahoma | no | Speech by Mary Fallin, OK governor, mentioned state numerous times.
elect mitt | yes |
next president | no |
the constitution | no | Mostly referring to allegedly unconstitutional actions by Pres. Obama
mitt 's | yes |
our founding | no | Founding fathers. Talk of restoring values of founding fathers.
jack | no | Republicans just seem to talk about people named Jack more.
8 [percent] | no | 8% unemployment. The term “unemployment” was used in the visualization, but Democrats didn’t mention the percentage.
they just | no | “Just don’t get it” was a refrain of a Repub. speaker.
patient | no | Discussions of US being “patient,” as well as how the ACA affects the doctor-patient relationship
pipeline | no | Keystone pipeline
How well do these terms capture linguistic differences
between parties?
Mike Bostock et al., http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html
Before
After
Now let’s look at the Democrats.
• The auto bailout is Pres. Obama’s 2012 Olympics.
• Government is seen as a collection of programs (Pell grants, Medicare
Vouchers, etc…) to help middle class families, vs. “big government”.
• Attacks on wealthy
• No appeals to fundamental principles (“constitution,” “founding fathers”)
• Women explicitly mentioned, while Repubs. talk about Ms. Romney.
Another look at the 2012 political convention data
Democratic Top Terms | Included in Visualization? | Comment
auto [industry] | yes | Provided in NYT. Pres. Obama was credited with auto industry recovery.
[move] america forward | yes | *Only “forward” was included in visualization.
insurance company | no |
woman 's | yes |
[the] wealthy | no | Never used by Repubs.
pell [grant] | no | Never used by Repubs.
last week | no | Used to talk about RNC that happened the previous week.
grandmother | no | 6:1 ratio of Dem vs. Republican usage. Dovetails with discussion of women.
access | no | Access to gov’t services or health care
millionaire | yes |
platform | no | Repubs never mentioned party platform
voucher | no | Accusing Republicans of turning Medicare into “voucher.”
class family | yes | “Middle class” was included.
register | no | Voter registration. Only used once by Repubs.
• Democrats had an advantage in having their convention last
– They could refute Republican talking points
– The Republicans made Gov. Romney's role in the 2002 Olympics a
major selling point
• It went virtually unmentioned by Democrats
• Republicans may be using numbers to their detriment:
– 8% unemployment
• Often “for 42 months” was added
– $16 trillion debt
– These numbers are tough to interpret without a lot of context
• Romney’s “47%… …are dependent on the government, believe they are
victims” comment may have been the death knell for his presidential bid
• Jeb Bush’s campaign point of “4% GDP growth” has been ineffective
– His polling numbers are at about 4% at the time of this talk
How can this aid in messaging?
Case Study 3:
Movie reviews and
revenue
- Data:
- 1,718 movie reviews from 2005-2009, drawn from 7 different publications
(e.g., Austin Chronicle, NY Times, etc.)
- Various movie metadata like rating and director
- Gross revenue
- Task:
- Predict revenue from text, couched as a regression problem
- Regressor used: Elastic Net
- l1 and l2 penalized linear regression
- 2009 reviews were held-out as test data
- Linguistic elements:
- Ngrams: unigrams, bigrams and trigrams
- Dependency relation triples: <dependent, relation, head>
- Versions of features labeled for each publication (i.e. domain)
- “Ent. Weekly: comedy_for”, “Variety: comedy_for”
- Essentially the same algo as Daume III (2007)
- Performed better than naïve baseline, but worse than metadata
Predicting Box-Office Revenue From Movie Reviews
Joshi et al. Movie Reviews and Revenues: An Experiment in Text Regression. NAACL 2010
Daume III. Frustratingly Easy
Domain Adaptation. ACL 2007.
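The per-publication feature copies can be sketched as below: every feature is kept in a shared form and duplicated under a publication-specific name, so a linear model can learn both a general weight and a per-publication adjustment. The function name and the example feature dictionary are assumptions for illustration; only the “Variety: comedy_for” naming pattern comes from the slide.

```python
def augment_with_domain(features, publication):
    """Return the shared features plus a publication-prefixed copy of each one."""
    augmented = dict(features)  # shared (domain-independent) copies
    for name, value in features.items():
        augmented[f"{publication}: {name}"] = value  # publication-specific copy
    return augmented


# Hypothetical feature counts for one review.
review_features = {"comedy_for": 1, "the franchise": 2}

variety = augment_with_domain(review_features, "Variety")
weekly = augment_with_domain(review_features, "Ent. Weekly")
print(sorted(variety))
print(sorted(weekly))
```

Two reviews from different publications share the unprefixed features but never share prefixed ones, which is what lets the regressor treat the same phrase differently by source.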
Predicting Box-Office Revenue From Movie Reviews
Joshi et al. Movie Reviews and Revenues: An Experiment in Text Regression. NAACL 2010
manually labeled feature
categories
Feature weight
(“Weight ($M)”) in
linear model
indicates how much
features are “worth”
in millions of dollars.
The learned
coefficients.
- 2015 follow-up work:
- Using convolutional neural network in
place of Elastic Net
Bitvai and Cohn: Non-Linear Text Regression with a Deep Convolutional Neural Network. ACL 2015
Predicting Box-Office Revenue From Movie Reviews
Bitvai and Cohn: Non-Linear Text Regression with a Deep Convolutional Neural Network.
- Word association for convolutional neural
network regressor
- Algorithm:
- Compare the prediction of the regressor
with phrase zeroed out in input to original
output.
- Impact is the difference in outputs.
- Impact for “Hong Kong” will involve
running regressor with “Hong Kong”
zeroed out in movie representation, but
unigrams “Hong” and “Kong” are
unaffected.
Impact = predict({…, “Hong Kong”: 1, …}) –
predict({…, “Hong Kong”: 0, …})
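A minimal sketch of the impact computation, assuming the regressor can be called as a function on a feature dictionary. The toy_predict function stands in for the trained network and is entirely made up; it is non-linear only so that impact depends on the other features present.

```python
def phrase_impact(predict, features, phrase):
    """Prediction with the phrase present minus prediction with it zeroed out."""
    ablated = dict(features)
    ablated[phrase] = 0  # only this phrase is removed; its unigrams are untouched
    return predict(features) - predict(ablated)


def toy_predict(features):
    """Hypothetical stand-in regressor (revenue in $M), not the paper's model."""
    hong_kong = features.get("hong kong", 0)
    action = features.get("action", 0)
    festival = features.get("festival", 0)
    return 10.0 + 5.0 * hong_kong * (1 + action) - 2.0 * festival


movie = {"hong kong": 1, "hong": 1, "kong": 1, "action": 1, "festival": 1}

hong_kong_impact = phrase_impact(toy_predict, movie, "hong kong")
festival_impact = phrase_impact(toy_predict, movie, "festival")
print(f"impact of 'hong kong': {hong_kong_impact:+.1f}")
print(f"impact of 'festival':  {festival_impact:+.1f}")
```

Because the model is non-linear, the same phrase can have a different impact in every movie, which is why the slides report minimum, maximum, and average impact.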
Predicting Box-Office Revenue From Movie Reviews
Bitvai and Cohn: Non-Linear Text Regression with a Deep Convolutional Neural Network.
- Explanation
- ‘#’s reflect count of movies in test set
having a review that used phrase
- Min is lowest impact across movies
classified, max is highest, used for
ordering top positive and negative.
- Many “top” terms only appear in one
movie.
- Manually selected phrases are ordered
by the average impact.
- Open questions:
- Does the increase or decrease in the
prediction actually improve the
regressor’s performance?
- Including the average decrease in
MAE among movies with phrase
would address this.
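One way the suggested check could look, as a sketch under assumptions: for movies whose reviews use a phrase, compare mean absolute error (MAE) with the phrase zeroed out against MAE with it present. The stand-in regressor, the toy data, and the function name are invented for illustration.

```python
from statistics import mean


def mae_reduction(predict, movies, phrase):
    """MAE without the phrase minus MAE with it, over movies that use the phrase.

    `movies` is a list of (features, actual_revenue) pairs. A positive result
    means keeping the phrase lowers the error; None means no movie uses it.
    """
    using = [(f, y) for f, y in movies if f.get(phrase, 0)]
    if not using:
        return None
    mae_with = mean(abs(predict(f) - y) for f, y in using)
    mae_without = mean(abs(predict({**f, phrase: 0}) - y) for f, y in using)
    return mae_without - mae_with


def toy_predict(features):
    """Hypothetical stand-in regressor (revenue in $M)."""
    return 10.0 + 5.0 * features.get("sequel", 0)


# Hypothetical (features, actual revenue in $M) pairs.
movies = [({"sequel": 1}, 16.0), ({"sequel": 1}, 12.0), ({"sequel": 0}, 10.0)]

reduction = mae_reduction(toy_predict, movies, "sequel")
print(f"MAE reduction from 'sequel': {reduction:.1f}")
```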
• The corpus used in Joshi et al. 2010 is freely available.
• Can we use the Rudder algorithm to find interesting associated
terms? How does it compare?
– Rudder algorithm requires two or more classes.
– We can partition the dataset into high and low revenue partitions.
• High being movies in the upper third of revenue
• Low in the bottom third
– Find words that are associated with high vs. low (throwing out the
middle third) and vice versa
Univariate approach to predicting revenue from text
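The partitioning step can be sketched as follows. How ties and leftover movies are handled is an assumption (any remainder stays in the discarded middle), and the titles and revenues are made up.

```python
def split_high_low(movies):
    """Return (high, low): the top and bottom thirds by revenue; the middle is dropped."""
    ranked = sorted(movies, key=lambda m: m["revenue"])
    third = len(ranked) // 3
    low = ranked[:third]
    high = ranked[len(ranked) - third:]
    return high, low


# Hypothetical movies with revenue in $M.
movies = [
    {"title": "A", "revenue": 3.0},
    {"title": "B", "revenue": 250.0},
    {"title": "C", "revenue": 40.0},
    {"title": "D", "revenue": 0.5},
    {"title": "E", "revenue": 120.0},
    {"title": "F", "revenue": 15.0},
    {"title": "G", "revenue": 60.0},
]

high, low = split_high_low(movies)
print("high revenue:", [m["title"] for m in high])
print("low revenue: ", [m["title"] for m in low])
```

The two partitions then play the role of the two classes when scoring terms.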
• Observation definition is really important!
– Recall that the same movie may have multiple reviews.
– We can treat an observation as
• a single review
• a single movie
– The response variable remains the same– movie revenue
Univariate approach to predicting revenue-category
from text
Top 5 high revenue terms (Rudder algorithm)
Review-level observations | Movie-level observations
Batman | Computer generated
Borat | Superhero
Rodriguez | The franchise
Wahlberg | Comic book
Comic book | Popcorn
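A small sketch of why the observation definition changes the term lists: at review level, a heavily reviewed movie contributes its title-specific terms once per review, while at movie level each term counts at most once per movie. The reviews below are invented for illustration.

```python
from collections import Counter


def document_frequency(observations):
    """Number of observations in which each term appears at least once."""
    counts = Counter()
    for terms in observations:
        counts.update(set(terms))
    return counts


# Hypothetical (movie, terms used in one review) pairs.
reviews = [
    ("Movie A", ["batman", "comic book"]),
    ("Movie A", ["batman", "superhero"]),
    ("Movie A", ["batman"]),
    ("Movie B", ["superhero", "franchise"]),
]

# Review-level: every review is its own observation.
review_level = document_frequency(terms for _, terms in reviews)

# Movie-level: pool all reviews of a movie into one observation.
by_movie = {}
for title, terms in reviews:
    by_movie.setdefault(title, set()).update(terms)
movie_level = document_frequency(by_movie.values())

print("review-level:", review_level.most_common(2))
print("movie-level: ", movie_level.most_common(2))
```

Here “batman” dominates the review-level counts but ties with single-movie terms at movie level, mirroring how proper names give way to genre terms in the table above.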
Univariate approach to predicting revenue-category
from text
Top 5
Computer generated
Superhero
The franchise
Comic book
Popcorn
Bottom 5
exclusively
[Phone number]
Festival
Tribeca
With English
This approach failed to produce term associations around
content ratings (e.g., PG-13, “strong
language”). Rating is strongly correlated with
revenue.
Let’s look exclusively at PG-13 movies
Only PG-13-rated movies
Selected Top Terms
Franchise
Computer generated
Installment
The first two
The ultimate
Selected Bottom Terms
[Theater specific terms like
phone numbers]
A friend
Her mother
Parent
One day
Siblings
Top terms are very similar. Franchises
and sequels remain indicators of success.
Bottom terms tell us something new!
Movies about friendship or family
dynamics don’t seem to perform well.
Idea generation tools can also be idea
rejection tools.
- For producers looking for a movie to pull in
a lot of revenue, a PG-13 family
melodrama isn’t a great idea.
Lesson: Corpus selection is important in
getting actionable, interpretable results!
Language use and age
Language use over time in Facebook statuses
Best topic for each age
group listed.
LOESS regression line for
prevalence by age group
Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, et al. (2013) Personality, Gender, and Age in the Language of
Social Media: The Open Vocabulary Approach. PLoS ONE 8(9)
Nod to James Pennebaker
Word cloud pros and cons
Alternative to word cloud is
list, ranked by phrase
frequency or phrase
precision.
Pro
• Word clouds force you to
hunt for the most
impactful terms
• You end up examining the
long tail in the process
• Compactly represent
many phrases
Con
• Longer words are more
prominent.
• “Mullet of the Internet”
• Hard to show phrase
annotations.
• Ranking is unclear.
Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, et al. (2013) Personality, Gender, and Age in the Language of
Social Media: The Open Vocabulary Approach. PLoS ONE 8(9)
CDK Global’s
Language
Visualization
Tool
• Suppose you are selling a car to a typical person. How would you
describe the car’s performance?
• Should you say
– This car has 162 ft-lbs of torque.
– OR
– This car makes passing on two lane roads easy.
• Having an idea generation (and rejection) tool makes this very
easy.
Informing dealer talk tracks.
• Corpus and document selection are important
– Documents: movie-level instead of review-level
– Corpus: rating-specific
– Subsets of corpus can be particularly interesting: e.g., PG-13 movies
• Don’t always look at extreme terms
– The Rudder algorithm’s top terms for the convention data lacked many
important issues, like Medicare
• Use a variety of approaches
– Univariate and multivariate approaches can highlight different terms
• More phrase context is better than less
• Phrase lists are most understandable when presented with a
narrative, even if it’s a bit speculative
Recommendations
• Thank you!
• We’re hiring
– talk to me (best) or, if you can’t, go to CDKJobs.com
• Special thanks to Joel Collymore (the concept of “idea generation
tool”), Michael Mabale (thoughts on word clouds), Michael
Eggerling, Ray Littell-Herrick, Peter Huang, Peter Kahn, Iris
Laband, Kyle Lo, Chris Mills, Dengyao Mo, Keith Zackarone
Acknowledgements
Questions? (Yes, we’re hiring!!)
Is this you?
• Data Scientist
• UI/UX Development & Design
• Software Engineer – all levels
• Product Manager
Head to CDKJobs.com
• Find “Jobs by Category”
• Click Technology
• Have your resume ready
• Click “Apply”!
-or- talk to me: @jasonkessler

More Related Content

What's hot

SEO For Photographers
SEO For PhotographersSEO For Photographers
SEO For Photographerslaurgolis
 
Global Travel Network - Design
Global Travel Network  - DesignGlobal Travel Network  - Design
Global Travel Network - DesignSymantec
 
Cincinnati AMA - How to Align Content and Keywords to the Buying Cycle - 2-12-14
Cincinnati AMA - How to Align Content and Keywords to the Buying Cycle - 2-12-14Cincinnati AMA - How to Align Content and Keywords to the Buying Cycle - 2-12-14
Cincinnati AMA - How to Align Content and Keywords to the Buying Cycle - 2-12-14Jordan Godbey
 
Search Engine Marketing: How Insurance Agents Can Take Advantage
Search Engine Marketing: How Insurance Agents Can Take AdvantageSearch Engine Marketing: How Insurance Agents Can Take Advantage
Search Engine Marketing: How Insurance Agents Can Take Advantagemmahan
 
What? How? Why? Building Query Personas to Power Your Content Strategy
What? How? Why? Building Query Personas to Power Your Content StrategyWhat? How? Why? Building Query Personas to Power Your Content Strategy
What? How? Why? Building Query Personas to Power Your Content StrategyGrant Simmons
 
Business Intelligence | Competitive Intelligence | Business Intelligence Tools
Business Intelligence | Competitive Intelligence | Business Intelligence ToolsBusiness Intelligence | Competitive Intelligence | Business Intelligence Tools
Business Intelligence | Competitive Intelligence | Business Intelligence ToolsRoland Frasier
 
Optimization advice-for-watertownbuysellgold-com-just-sell-gold
Optimization advice-for-watertownbuysellgold-com-just-sell-goldOptimization advice-for-watertownbuysellgold-com-just-sell-gold
Optimization advice-for-watertownbuysellgold-com-just-sell-goldBrian Bateman
 

What's hot (8)

SMX East 2015
SMX East 2015SMX East 2015
SMX East 2015
 
SEO For Photographers
SEO For PhotographersSEO For Photographers
SEO For Photographers
 
Global Travel Network - Design
Global Travel Network  - DesignGlobal Travel Network  - Design
Global Travel Network - Design
 
Cincinnati AMA - How to Align Content and Keywords to the Buying Cycle - 2-12-14
Cincinnati AMA - How to Align Content and Keywords to the Buying Cycle - 2-12-14Cincinnati AMA - How to Align Content and Keywords to the Buying Cycle - 2-12-14
Cincinnati AMA - How to Align Content and Keywords to the Buying Cycle - 2-12-14
 
Search Engine Marketing: How Insurance Agents Can Take Advantage
Search Engine Marketing: How Insurance Agents Can Take AdvantageSearch Engine Marketing: How Insurance Agents Can Take Advantage
Search Engine Marketing: How Insurance Agents Can Take Advantage
 
What? How? Why? Building Query Personas to Power Your Content Strategy
What? How? Why? Building Query Personas to Power Your Content StrategyWhat? How? Why? Building Query Personas to Power Your Content Strategy
What? How? Why? Building Query Personas to Power Your Content Strategy
 
Business Intelligence | Competitive Intelligence | Business Intelligence Tools
Business Intelligence | Competitive Intelligence | Business Intelligence ToolsBusiness Intelligence | Competitive Intelligence | Business Intelligence Tools
Business Intelligence | Competitive Intelligence | Business Intelligence Tools
 
Optimization advice-for-watertownbuysellgold-com-just-sell-gold
Optimization advice-for-watertownbuysellgold-com-just-sell-goldOptimization advice-for-watertownbuysellgold-com-just-sell-gold
Optimization advice-for-watertownbuysellgold-com-just-sell-gold
 

Similar to From Sentiment to Persuasion Analysis: A Look at Idea Generation Tools

Nativity Scene Essay. Online assignment writing service.
Nativity Scene Essay. Online assignment writing service.Nativity Scene Essay. Online assignment writing service.
Nativity Scene Essay. Online assignment writing service.Crystal Hall
 
It's Getting Personal: The Rise of Hyper-Targeted User Experiences - Colin Eagan
It's Getting Personal: The Rise of Hyper-Targeted User Experiences - Colin EaganIt's Getting Personal: The Rise of Hyper-Targeted User Experiences - Colin Eagan
It's Getting Personal: The Rise of Hyper-Targeted User Experiences - Colin EaganUXPA International
 
Pt 1 Analyzing Your Website Workshop for Wedding Pros
Pt 1 Analyzing Your Website Workshop for Wedding ProsPt 1 Analyzing Your Website Workshop for Wedding Pros
Pt 1 Analyzing Your Website Workshop for Wedding ProsChristie Osborne
 
Search Engine Optimization (SEO) Training Presentation
Search Engine Optimization (SEO) Training PresentationSearch Engine Optimization (SEO) Training Presentation
Search Engine Optimization (SEO) Training PresentationAaron Bramley
 
Research Paper Reference Citation - Term Paper For
Research Paper Reference Citation - Term Paper ForResearch Paper Reference Citation - Term Paper For
Research Paper Reference Citation - Term Paper ForCarla Bennington
 
Week 2 - Planning A Web Site Audience - 2
Week 2 - Planning A Web Site Audience - 2Week 2 - Planning A Web Site Audience - 2
Week 2 - Planning A Web Site Audience - 2Stark State College
 
Actionable Local SEO Tips - RizeCon - Nathan Hawkes
Actionable Local SEO Tips - RizeCon - Nathan HawkesActionable Local SEO Tips - RizeCon - Nathan Hawkes
Actionable Local SEO Tips - RizeCon - Nathan HawkesArcane Marketing
 
Conductor @ State of Search 2019
Conductor @ State of Search 2019Conductor @ State of Search 2019
Conductor @ State of Search 2019Christine Schrader
 
Why Great Marketers Must Be Great Skeptics
Why Great Marketers Must Be Great SkepticsWhy Great Marketers Must Be Great Skeptics
Why Great Marketers Must Be Great SkepticsRand Fishkin
 
From Intake to Engagement: Old School and New Cool Strategies and Techniques
From Intake to Engagement: Old School and New Cool Strategies and TechniquesFrom Intake to Engagement: Old School and New Cool Strategies and Techniques
From Intake to Engagement: Old School and New Cool Strategies and TechniquesRecruitDC
 
Onetomarket | How to benefit from developments online
Onetomarket | How to benefit from developments onlineOnetomarket | How to benefit from developments online
Onetomarket | How to benefit from developments onlinesearch congress
 
Why Great Marketers Must be Great Skeptics: Rand Fishkin, Moz.com
Why Great Marketers Must be Great Skeptics: Rand Fishkin, Moz.comWhy Great Marketers Must be Great Skeptics: Rand Fishkin, Moz.com
Why Great Marketers Must be Great Skeptics: Rand Fishkin, Moz.comOptimizely
 
Opinion Mining Tutorial (Sentiment Analysis)
Opinion Mining Tutorial (Sentiment Analysis)Opinion Mining Tutorial (Sentiment Analysis)
Opinion Mining Tutorial (Sentiment Analysis)Kavita Ganesan
 
Content, Conversions and Lead Generation Webinar (presented by Andy Crestodin...
Content, Conversions and Lead Generation Webinar (presented by Andy Crestodin...Content, Conversions and Lead Generation Webinar (presented by Andy Crestodin...
Content, Conversions and Lead Generation Webinar (presented by Andy Crestodin...SiteTuners Conversion Rate Optimization
 
Why Inbound Marketing for Online Business - EBriks Infotech
Why Inbound Marketing for Online Business - EBriks InfotechWhy Inbound Marketing for Online Business - EBriks Infotech
Why Inbound Marketing for Online Business - EBriks InfotechEBriks Infotech Pvt. Ltd.
 

Similar to From Sentiment to Persuasion Analysis: A Look at Idea Generation Tools (20)

Nativity Scene Essay. Online assignment writing service.
Nativity Scene Essay. Online assignment writing service.Nativity Scene Essay. Online assignment writing service.
Nativity Scene Essay. Online assignment writing service.
 
It's Getting Personal: The Rise of Hyper-Targeted User Experiences - Colin Eagan
It's Getting Personal: The Rise of Hyper-Targeted User Experiences - Colin EaganIt's Getting Personal: The Rise of Hyper-Targeted User Experiences - Colin Eagan
It's Getting Personal: The Rise of Hyper-Targeted User Experiences - Colin Eagan
 
Pt 1 Analyzing Your Website Workshop for Wedding Pros
Pt 1 Analyzing Your Website Workshop for Wedding ProsPt 1 Analyzing Your Website Workshop for Wedding Pros
Pt 1 Analyzing Your Website Workshop for Wedding Pros
 
Search Engine Optimization (SEO) Training Presentation
Search Engine Optimization (SEO) Training PresentationSearch Engine Optimization (SEO) Training Presentation
Search Engine Optimization (SEO) Training Presentation
 
1640 track1 kessler
1640 track1 kessler1640 track1 kessler
1640 track1 kessler
 
Content, Conversions, and Lead Generation
Content, Conversions, and Lead GenerationContent, Conversions, and Lead Generation
Content, Conversions, and Lead Generation
 
Finding Web Resources
Finding Web ResourcesFinding Web Resources
Finding Web Resources
 
Research Paper Reference Citation - Term Paper For
Research Paper Reference Citation - Term Paper ForResearch Paper Reference Citation - Term Paper For
Research Paper Reference Citation - Term Paper For
 
Week 2 - Planning A Web Site Audience - 2
Week 2 - Planning A Web Site Audience - 2Week 2 - Planning A Web Site Audience - 2
Week 2 - Planning A Web Site Audience - 2
 
The Poetry of SEO
The Poetry of SEOThe Poetry of SEO
The Poetry of SEO
 
Actionable Local SEO Tips - RizeCon - Nathan Hawkes
Actionable Local SEO Tips - RizeCon - Nathan HawkesActionable Local SEO Tips - RizeCon - Nathan Hawkes
Actionable Local SEO Tips - RizeCon - Nathan Hawkes
 
Conductor @ State of Search 2019
Conductor @ State of Search 2019Conductor @ State of Search 2019
Conductor @ State of Search 2019
 
Why Great Marketers Must Be Great Skeptics
Why Great Marketers Must Be Great SkepticsWhy Great Marketers Must Be Great Skeptics
Why Great Marketers Must Be Great Skeptics
 
From Intake to Engagement: Old School and New Cool Strategies and Techniques
From Intake to Engagement: Old School and New Cool Strategies and TechniquesFrom Intake to Engagement: Old School and New Cool Strategies and Techniques
From Intake to Engagement: Old School and New Cool Strategies and Techniques
 
Onetomarket | How to benefit from developments online
Onetomarket | How to benefit from developments onlineOnetomarket | How to benefit from developments online
Onetomarket | How to benefit from developments online
 
Why Great Marketers Must be Great Skeptics: Rand Fishkin, Moz.com
Why Great Marketers Must be Great Skeptics: Rand Fishkin, Moz.comWhy Great Marketers Must be Great Skeptics: Rand Fishkin, Moz.com
Why Great Marketers Must be Great Skeptics: Rand Fishkin, Moz.com
 
Opinion Mining Tutorial (Sentiment Analysis)
Opinion Mining Tutorial (Sentiment Analysis)Opinion Mining Tutorial (Sentiment Analysis)
Opinion Mining Tutorial (Sentiment Analysis)
 
Content, Conversions and Lead Generation Webinar (presented by Andy Crestodin...
Content, Conversions and Lead Generation Webinar (presented by Andy Crestodin...Content, Conversions and Lead Generation Webinar (presented by Andy Crestodin...
Content, Conversions and Lead Generation Webinar (presented by Andy Crestodin...
 
Why Inbound Marketing for Online Business - EBriks Infotech
Why Inbound Marketing for Online Business - EBriks InfotechWhy Inbound Marketing for Online Business - EBriks Infotech
Why Inbound Marketing for Online Business - EBriks Infotech
 
Inspiring CMU
Inspiring CMUInspiring CMU
Inspiring CMU
 

From Sentiment to Persuasion Analysis: A Look at Idea Generation Tools

  • 1. From Sentiment to Persuasion Analysis: A Look at Idea Generation Tools Jason Kessler Data Scientist, CDK Global @jasonkessler www.jasonkessler.com
  • 2. Outline • Idea generation tools – Use large corpora to generate hypotheses about questions like: – How do you make a persuasive ad? – How can presidential candidates improve their rhetoric? – How do ethnicity and gender correlate to language use in online dating profiles? – How do movie reviews predict box-office success? • Technical content: – Ways of extracting category-associated words and phrases from corpora – UX around displaying and providing context to associated words and phrases
  • 4. Naïve Approach: Indicators of Positive Sentiment "If you ask a Subaru owner what they think of their car, more times than not they'll tell you they love it," -Alan Bethke, director of marketing communications for Subaru of America (via Adweek)
  • 9. Finding Engaging Content. Example review appearing on a 3rd-party automotive site. Car reviewed: Chevy Cruze. Rating: 4.4/5 stars. Text: “…I was very skeptical giving up my truck and buying an "Economy Car." I'm 6' 215lbs, but my new career has me driving a personal vehicle to make sales calls. I am overly impressed with my Cruze…” # of users who read review: 20. # who went on to visit a Chevy dealer’s website: 15. Review engagement rate: 15/20 = 75%. Median review engagement rate: 22%.
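The engagement rate on this slide is a simple ratio, and a review counts as engaging when its rate is above the corpus median. A minimal sketch of that arithmetic follows; the function name and the extra (readers, visits) pairs are invented for illustration and are not from the talk.

```python
from statistics import median

def engagement_rate(readers: int, dealer_visits: int) -> float:
    """Share of a review's readers who went on to visit a dealer website."""
    if readers <= 0:
        raise ValueError("readers must be positive")
    return dealer_visits / readers

# The Cruze review from the slide: 20 readers, 15 dealer-site visits.
cruze_rate = engagement_rate(readers=20, dealer_visits=15)

# Hypothetical (readers, visits) pairs for other reviews, invented for this demo.
other_reviews = [(50, 11), (40, 6), (10, 3)]
rates = [engagement_rate(r, v) for r, v in other_reviews] + [cruze_rate]

# Compare the review against the median rate of this toy corpus.
corpus_median = median(rates)
is_high_engagement = cruze_rate > corpus_median
```

In the talk the median is 22% across the real corpus; the toy median here only demonstrates the comparison.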
  • 12. Sentiment vs. Persuasiveness: SUV-Specific
      Positive Sentiment: Love; Comfortable; Features; Solid; Amazing
      High Engagement: Comfortable; Front [Seats]; Acceleration; Free [Car Wash, Oil Change]; Quiet
      Negative Sentiment: Transmission; Problem; Issue; Dealership; Times
      Low Engagement: Money [spend my, save]; Features; Dealership; Amazing; Build Quality [typically positive]
  • 13. • We’ll discuss algorithms later in the talk • Basically, we rank words and phrases based on their classifier-produced feature weights • Techniques and technologies used – Unigram and bigram features (bigrams must pass a simple key-phrase test) – Ridge classifier Algorithm for finding word lists
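As a rough illustration of ranking terms by classifier weight, the sketch below fits an L2-penalized least-squares model on +1/-1 labels (the idea behind a ridge classifier) using plain gradient descent, then sorts the terms by weight. The stopword filter is only a stand-in for the talk's unspecified key-phrase test, and the four reviews are invented; none of this is the production pipeline.

```python
from collections import defaultdict

# Assumed stopword list standing in for the talk's unspecified key-phrase test.
STOPWORDS = {"the", "a", "is", "my", "had"}

def extract_features(text):
    """Binary unigram and bigram features; bigrams must avoid stopwords."""
    tokens = text.lower().split()
    feats = set(tokens)
    for first, second in zip(tokens, tokens[1:]):
        if first not in STOPWORDS and second not in STOPWORDS:
            feats.add(f"{first} {second}")
    return feats

def fit_ridge(docs, labels, alpha=1.0, learning_rate=0.01, epochs=2000):
    """Least squares on +1/-1 targets with an L2 penalty, by gradient descent."""
    rows = [extract_features(doc) for doc in docs]
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, label in zip(rows, labels):
            residual = sum(weights[f] for f in feats) - label
            for f in feats:
                grad[f] += 2 * residual
        for f in grad:
            grad[f] += 2 * alpha * weights[f]
            weights[f] -= learning_rate * grad[f]
    return dict(weights)

# Invented reviews: +1 = high engagement, -1 = low engagement.
docs = [
    "love the blind spot alert",
    "blind spot alert is wonderful",
    "transmission problem again",
    "had a transmission issue",
]
labels = [1, 1, -1, -1]

weights = fit_ridge(docs, labels)
# Most positive weights first: candidate high-engagement terms.
ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
```

Terms shared by several high-engagement reviews (such as "blind spot") rise to the top of `ranked`, while "transmission" sinks to the bottom.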
  • 15. Engagement Terms Blind (spot, alert) Contexts from high engagement reviews - “The techno safety features (blind spot, lane alert, etc.) are reason for buying car...” - “Side blind Zone Alert is truly wonderful…” - …
  • 17. Can better science improve messaging? Engagement Terms Blind White (paint, diamond) Contexts - “White with cornsilk interior.” - “My wife fell in love with the Equinox in White Diamond” - “The white diamond paint is to die for”
  • 18. Can better science improve messaging?
  • 19. Can better science improve messaging? Engagement Terms Blind White Climate (geography, a/c) Contexts - “Love the front wheel drive in this northern Minn. Climate” - “We do live in a cold climate (Ontario)” - …climate control…
  • 21. Just recently, VW has produced very similar commercials.
  • 23. Process
      1. Corpus collection: identify documents of interest.
      2. Label documents with class of interest. For CDK’s usage: persuasive (high engagement rate), positive (high star rating).
      3. Find linguistic elements that are associated with class.
      4. Explain why linguistic elements are associated: show representative contexts, human-generated explanation, statistics supporting association, ideation.
      Complicated! Will be a major focus of this talk.
  • 24. Case Study 1: Language of Politics
  • 25. NYT: 2012 Political Convention Word Use by Party Mike Bostock et al., http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html
  • 26. 2012 Political Convention Word Use by Party Source: http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html
  • 27. Mike Bostock et al., http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html Corpus has a class size imbalance: - Democrats: 79k words across 123 speeches - Republicans: 60k words across 66 speeches “Number of mentions by spoken words” - Normalizes imbalance (282 vs. 182) - More understandable than P(jobs|Democrat) vs. P(jobs|Republican), which are both extremely low numbers (0.36% vs. 0.30%)
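The normalization can be checked with a few lines of arithmetic. The sketch below reads 282 and 182 as raw mention counts of "jobs" (an assumption, though this reading reproduces the slide's 0.36% and 0.30%) and uses the rounded word totals from the slide. The per-25,000-words scale is likewise an assumed, illustrative choice.

```python
def rate_per(count, total_words, per=25_000):
    """Mentions per `per` spoken words; the default scale is an assumption."""
    return count / total_words * per

DEM_WORDS, REP_WORDS = 79_000, 60_000   # rounded totals from the slide
DEM_JOBS, REP_JOBS = 282, 182           # read here as raw counts of "jobs"

# Conditional probabilities are tiny and hard to read at a glance.
p_jobs_dem = DEM_JOBS / DEM_WORDS       # about 0.36%
p_jobs_rep = REP_JOBS / REP_WORDS       # about 0.30%

# Rates on a shared scale are comparable despite the class-size imbalance.
dem_rate = rate_per(DEM_JOBS, DEM_WORDS)
rep_rate = rate_per(REP_JOBS, REP_WORDS)
```

On these numbers the raw counts differ by 100 mentions, but the normalized rates are much closer, which is the point of dividing by words spoken.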
  • 28. • Corpus: Political Convention Speeches • Class labels: Political Party of Speaker • Linguistic elements: – Words and phrases – Manually chosen • Explanation: – Cool bubble diagram – Selective topic narration – Click-to-view topic contexts organized by speaker and party • We’ll get back to this in a minute Summary: NYT 2012 Conventions
  • 29. Case Study 2: Language of Self-Representation
  • 30. OKCupid: How do gender and ethnicity affect self-presentation on online dating profiles? Christian Rudder: http://blog.okcupid.com/index.php/page/7/ Which words and phrases statistically distinguish ethnic groups and genders? Example phrases: hobos, almond butter, 100 Years of Solitude, Bikram yoga
  • 31. Source: http://blog.okcupid.com/index.php/page/7/ (Rudder 2010) Words and phrases that distinguish white men. OKCupid: How do ethnicities’ self-presentation differ on a dating site?
  • 32. Source: http://blog.okcupid.com/index.php/page/7/ (Rudder 2010) Words and phrases that distinguish Latino men. Explanation OKCupid: How do ethnicities’ self-presentation differ on a dating site?
  • 33. Source: http://blog.okcupid.com/index.php/page/7/ Words and phrases that distinguish Latino men. OKCupid: How do ethnicities’ self-presentation differ on a dating site? The explanation suggests that topic modeling may help identify the latent themes driving the distinctiveness of these words and phrases.
  • 35. What can we do with this? • Genre of insurance or investment ads – Montage of important events in the life of a person. • With these phrase sets, the ads practically write themselves: • What if you wanted to target Latino men? – Grows up boxing – Meets girlfriend salsa dancing – Becomes a Marine – Tells a joke at his wedding – Etc…
  • 36. The linguistic elements were found “statistically.” The exact method is unclear, but Rudder (2014) describes a novel method to identify statistically associated terms. - Let’s look closely at the algorithm and see: - how it works - and how it performs on the political convention data set. OKCupid: How do ethnicities’ self-presentation differ on a dating site?
  • 37. [Figure, not drawn to scale: scatter plot of terms. Axes: ranking with Democrats (top, middle, bottom) vs. ranking with Republicans (bottom, middle, top). Plotted terms: giraffe, olympics, ann, bipartisan, people, stand, election, auto, wealthy, bin laden, regulatory, pelosi, rancher, grandfather, public, worker, regulation, profit.] Source: Christian Rudder. Dataclysm. 2014.
  • 38. [Same figure.] Association between Democrats and “worker” is the Euclidean distance between the word and the top left corner. Source: Christian Rudder. Dataclysm. 2014.
  • 39. [Same figure.] Association between Republicans and “regulation” is the Euclidean distance between the word and the bottom right corner. Source: Christian Rudder. Dataclysm. 2014.
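The scoring described on these slides can be sketched as follows: rank every term by frequency within each party, place it on a unit square using those two ranks, and measure its Euclidean distance to the corner that represents "top for this party, bottom for the other". A smaller distance means a stronger association. The percentile-rank helper and the toy counts are assumptions made for illustration; Rudder's exact procedure may differ in details such as tie handling.

```python
import math
from collections import Counter

def percentile_ranks(counts, vocab):
    """Fraction of the other vocabulary terms each term outranks (0 = bottom, 1 = top)."""
    n = len(vocab)
    return {
        term: sum(counts[other] < counts[term] for other in vocab) / max(n - 1, 1)
        for term in vocab
    }

def corner_distances(dem_counts, rep_counts):
    """Distance of each term to the Democratic and Republican corners."""
    vocab = set(dem_counts) | set(rep_counts)
    dem_rank = percentile_ranks(dem_counts, vocab)
    rep_rank = percentile_ranks(rep_counts, vocab)
    # Democratic corner: top of the Democratic ranking, bottom of the Republican one.
    dem_dist = {t: math.hypot(1 - dem_rank[t], rep_rank[t]) for t in vocab}
    # Republican corner: the mirror image.
    rep_dist = {t: math.hypot(dem_rank[t], 1 - rep_rank[t]) for t in vocab}
    return dem_dist, rep_dist

# Invented term counts, chosen only to exercise the function.
dem_counts = Counter({"worker": 30, "auto": 25, "people": 40, "regulation": 2, "olympics": 1})
rep_counts = Counter({"regulation": 28, "olympics": 20, "people": 38, "worker": 3, "auto": 1})

dem_dist, rep_dist = corner_distances(dem_counts, rep_counts)
most_democratic = min(dem_dist, key=dem_dist.get)
most_republican = min(rep_dist, key=rep_dist.get)
```

A term such as "people", frequent in both parties, sits far from both corners even though its raw counts are the highest.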
  • 40. Another look at the 2012 political convention data Mike Bostock et al., http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html - The conventions let political parties reach a broad audience, and both energize their bases and reach undecided voters. - How well do these terms capture rhetorical differences between parties?
  • 41. Another look at the 2012 political convention data. Applying the Rudder algorithm to the 2012 data reveals a number of terms associated with a party that weren’t covered in the NYT viz. These can uncover party talking points.
      Republican top term | Included in visualization? | Comment
      olympics | no | Gov. Romney was CEO of the Organizing Committee for the 2002 Winter Olympics.
      ann | no | Ann Romney
      big government | no |
      16 [trillion] | no | Size of national debt
      oklahoma | no | Speech by Mary Fallin, OK governor, mentioned state numerous times.
      elect mitt | yes |
      next president | no |
      the constitution | no | Mostly referring to allegedly unconstitutional actions by Pres. Obama
      mitt 's | yes |
      our founding | no | Founding fathers. Talk of restoring values of founding fathers.
      jack | no | Republicans just seem to talk about people named Jack more.
      8 [percent] | no | 8% unemployment. The term “unemployment” was used in the visualization, but Democrats didn’t mention the percentage.
      they just | no | “Just don’t get it” was a refrain of a Repub. speaker.
      patient | no | Discussions of US being “patient,” as well as how the ACA affects the doctor-patient relationship
      pipeline | no | Keystone pipeline
  • 42. How well do these terms capture linguistic differences between parties? Mike Bostock et al., http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html Before After
  • 43. Another look at the 2012 political convention data. Now let’s look at the Democrats. • The auto bailout is Pres. Obama’s 2012 Olympics. • Government is seen as a collection of programs (Pell grants, Medicare vouchers, etc.) to help middle class families, vs. “big government”. • Attacks on wealthy • No appeals to fundamental principles (“constitution,” “founding fathers”) • Women explicitly mentioned, while Repubs. talk about Ms. Romney.
      Democratic top term | Included in visualization? | Comment
      auto [industry] | yes | Provided in NYT. Pres. Obama was credited with auto industry recovery.
      [move] america forward | yes | *Only “forward” was included in visualization.
      insurance company | no |
      woman 's | yes |
      [the] wealthy | no | Never used by Repubs.
      pell [grant] | no | Never used by Repubs.
      last week | no | Used to talk about RNC that happened the previous week.
      grandmother | no | 6:1 ratio of Dem vs. Republican usage. Dovetails with discussion of women.
      access | no | Access to gov’t services or health care
      millionaire | yes |
      platform | no | Repubs never mentioned party platform
      voucher | no | Accusing Republicans of turning Medicare into “voucher.”
      class family | yes | “Middle class” was included.
      register | no | Voter registration. Only used once by Repubs.
  • 44. • Democrats had an advantage in having their convention last – They could refute Republican talking points – The Republicans made Gov. Romney's role in the 2002 Olympics a major selling point • It went virtually unmentioned by Democrats • Republicans may be using numbers to their detriment: – 8% unemployment • Often “for 42 months” was added – $16 trillion national debt – These numbers are tough to interpret without a lot of context • Romney’s “47%… …are dependent on the government, believe they are victims” comment may have been the death knell for his presidential bid • Jeb Bush’s campaign point of “4% GDP growth” has been ineffective – His polling numbers are at about 4% at the time of this talk How can this aid in messaging?
  • 45. Case Study 3: Movie reviews and revenue
  • 46. - Data: - 1,718 movie reviews from 2005-2009, drawn from 7 different publications (e.g., Austin Chronicle, NY Times, etc.) - Various movie metadata like rating and director - Gross revenue - Task: - Predict revenue from text, couched as a regression problem - Regressor used: Elastic Net - l1 and l2 penalized linear regression - 2009 reviews were held out as test data - Linguistic elements: - Ngrams: unigrams, bigrams and trigrams - Dependency relation triples: <dependent, relation, head> - Versions of features labeled for each publication (i.e., domain) - “Ent. Weekly: comedy_for”, “Variety: comedy_for” - Essentially the same algo as Daumé III (2007) - Performed better than naïve baseline, but worse than metadata Predicting Box-Office Revenue From Movie Reviews Joshi et al. Movie Reviews and Revenues: An Experiment in Text Regression. NAACL 2010 Daumé III. Frustratingly Easy Domain Adaptation. ACL 2007.
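The per-publication feature copies can be sketched as below, in the spirit of Daumé III's feature augmentation: every feature is kept in a shared form and duplicated with a domain prefix, so a linear model can learn both a general weight and a publication-specific adjustment. The function name and the exact key format are assumptions that mirror the slide's “Variety: comedy_for” notation.

```python
def augment_features(features, publication):
    """Return the shared features plus a publication-prefixed copy of each one."""
    augmented = dict(features)  # shared (domain-independent) copy
    for name, value in features.items():
        augmented[f"{publication}: {name}"] = value  # domain-specific copy
    return augmented

# A review from Variety containing the dependency feature "comedy_for".
variety_review = augment_features({"comedy_for": 1}, "Variety")
# The same feature seen in an Entertainment Weekly review.
ew_review = augment_features({"comedy_for": 1}, "Ent. Weekly")
```

Both reviews share the plain `comedy_for` key, while each also carries its own prefixed copy.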
  • 47. Predicting Box-Office Revenue From Movie Reviews Joshi et al. Movie Reviews and Revenues: An Experiment in Text Regression. NAACL 2010 manually labeled feature categories Feature weight (“Weight ($M)”) in linear model indicates how much features are “worth” in millions of dollars. The learned coefficients.
  • 48. - 2015 follow-up work: - Using convolutional neural network in place of Elastic Net Bitvai and Cohn: Non-Linear Text Regression with a Deep Convolutional Neural Network. ACL 2015
  • 49. Predicting Box-Office Revenue From Movie Reviews Bitvai and Cohn: Non-Linear Text Regression with a Deep Convolutional Neural Network. - Word association for convolutional neural network regressor - Algorithm: - Compare the prediction of the regressor with phrase zeroed out in input to original output. - Impact is the difference in outputs. - Impact for “Hong Kong” will involve running regressor with “Hong Kong” zeroed out in movie representation, but unigrams “Hong” and “Kong” are unaffected. Impact = predict({…, “Hong Kong”: 1, …}) – predict({…, “Hong Kong”: 0, …})
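The impact formula on this slide translates almost directly into code. In the sketch below, `toy_predict` is a hypothetical stand-in for a trained regressor (its weights are invented) and the feature-dictionary representation follows the slide's notation rather than the actual network input.

```python
def phrase_impact(predict, features, phrase):
    """Prediction with the phrase present minus prediction with it zeroed out."""
    ablated = dict(features)
    ablated[phrase] = 0  # only this phrase is removed; its unigrams stay
    return predict(features) - predict(ablated)

def toy_predict(features):
    """Hypothetical non-linear regressor; the weights are invented for the demo."""
    weights = {"hong kong": -2.0, "hong": 0.5, "kong": 0.5, "sequel": 3.0}
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return max(score, 0.0) + 1.0

review = {"hong kong": 1, "hong": 1, "kong": 1, "sequel": 1}
impact = phrase_impact(toy_predict, review, "hong kong")
```

Here the impact is negative: the toy model predicts lower revenue when "Hong Kong" is present.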
  • 50. Predicting Box-Office Revenue From Movie Reviews Bitvai and Cohn: Non-Linear Text Regression with a Deep Convolutional Neural Network. - Explanation - ‘#’s reflect count of movies in test set having a review that used phrase - Min is lowest impact across movies classified, max is highest, used for ordering top positive and negative. - Many “top” terms only appear in one movie. - Manually selected phrases are ordered by the average impact. - Open questions: - Does the increase or decrease in the prediction actually improve the regressor’s performance? - Including the average decrease in MAE among movies with phrase would address this.
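One way to compute the statistic suggested in the open question is sketched below: for each movie whose review uses the phrase, compare the absolute error with the phrase zeroed out against the absolute error with it present, then average the differences. The linear `toy_predict` and the three movies are invented for illustration.

```python
def mean_mae_decrease(predict, movies, phrase):
    """Average of |error without phrase| - |error with phrase| over movies using it."""
    changes = []
    for features, revenue in movies:
        if not features.get(phrase):
            continue  # phrase absent from this movie's review
        ablated = {**features, phrase: 0}
        error_with = abs(predict(features) - revenue)
        error_without = abs(predict(ablated) - revenue)
        changes.append(error_without - error_with)
    return sum(changes) / len(changes) if changes else 0.0

def toy_predict(features):
    """Hypothetical regressor in $M; the weights are invented for the demo."""
    weights = {"sequel": 10.0, "festival": -5.0}
    return 20.0 + sum(weights.get(n, 0.0) * v for n, v in features.items())

# (features, actual revenue in $M), all invented.
movies = [
    ({"sequel": 1}, 32.0),
    ({"sequel": 1, "festival": 1}, 20.0),
    ({"festival": 1}, 10.0),
]
sequel_gain = mean_mae_decrease(toy_predict, movies, "sequel")
```

A positive value means the phrase helps accuracy on average, which is a different question from whether it raises or lowers the prediction.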
  • 51. • The corpus used in Joshi et al. 2010 is freely available. • Can we use the Rudder algorithm to find interesting associated terms? How does it compare? – Rudder algorithm requires two or more classes. – We can partition the dataset into high and low revenue partitions. • High being movies in the upper third of revenue • Low in the bottom third – Find words that are associated with high vs. low (throwing out the middle third) and vice versa Univariate approach to predicting revenue from text
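The partitioning step can be sketched in a few lines: sort movies by revenue, label the bottom third "low" and the top third "high", and discard the middle. The function name and the revenue figures below are invented for illustration.

```python
def split_into_thirds(revenue_by_movie):
    """Return the low-revenue and high-revenue thirds; the middle third is dropped."""
    ranked = sorted(revenue_by_movie, key=revenue_by_movie.get)
    third = len(ranked) // 3
    return {
        "low": set(ranked[:third]),
        "high": set(ranked[len(ranked) - third:]),
    }

# Invented revenues in $M.
revenues = {"A": 5, "B": 300, "C": 40, "D": 120, "E": 1, "F": 75}
classes = split_into_thirds(revenues)
```

The two resulting sets can then serve as the two classes that the Rudder-style ranking requires.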
  • 52. • Observation definition is really important! – Recall that the same movie may have multiple reviews. – We can treat an observation as • a single review • a single movie – The response variable remains the same– movie revenue Univariate approach to predicting revenue-category from text
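To see why the observation definition matters, the sketch below counts how many observations contain a term at the review level and at the movie level. A term that dominates many reviews of one blockbuster looks common per review but rare per movie. The reviews are invented and the helper names are assumptions.

```python
from collections import defaultdict

def movie_level_documents(reviews):
    """Merge all reviews of the same movie into a single observation."""
    merged = defaultdict(list)
    for movie, text in reviews:
        merged[movie].append(text)
    return [" ".join(texts) for texts in merged.values()]

def document_frequency(documents, term):
    """Number of observations whose text contains the term as a token."""
    return sum(term in doc.lower().split() for doc in documents)

# Invented (movie, review text) pairs.
reviews = [
    ("The Dark Knight", "batman returns in a superhero epic"),
    ("The Dark Knight", "batman again"),
    ("The Dark Knight", "ledger and batman"),
    ("Iron Man", "a superhero origin story"),
    ("Spider-Man 3", "the superhero franchise continues"),
]

review_docs = [text for _, text in reviews]
movie_docs = movie_level_documents(reviews)

batman_by_review = document_frequency(review_docs, "batman")
batman_by_movie = document_frequency(movie_docs, "batman")
superhero_by_movie = document_frequency(movie_docs, "superhero")
```

At the review level "batman" and "superhero" look equally common; at the movie level "batman" collapses to a single observation while "superhero" still spans three movies.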
  • 53. Univariate approach to predicting revenue-category from text • Observation definition is really important! – Recall that the same movie may have multiple reviews. – We can treat an observation as • a single review • a single movie – The response variable remains the same: movie revenue
      Top 5 high revenue terms (Rudder algorithm):
      Review-level observations: Batman; Borat; Rodriguez; Wahlberg; Comic book
      Movie-level observations: Computer generated; Superhero; The franchise; Comic book; Popcorn
  • 55. Univariate approach to predicting revenue-category from text
      Top 5: Computer generated; Superhero; The franchise; Comic book; Popcorn
      Bottom 5: exclusively; [Phone number]; Festival; Tribeca; With English
      Failed to produce term associations around content ratings (e.g., PG-13, “strong language”). Rating is strongly correlated with revenue. Let’s look exclusively at PG-13 movies.
  • 56. Only PG-13-rated movies
      Selected top terms: Franchise; Computer generated; Installment; The first two; The ultimate
      Selected bottom terms: [Theater-specific terms like phone numbers]; A friend; Her mother; Parent; One day; Siblings
      - Top terms are very similar. Franchises and sequels remain indicators of success.
      - Bottom terms tell us something new! Movies about friendship or family dynamics don’t seem to perform well.
      - Idea generation tools can also be idea rejection tools: for producers looking for a movie to pull in a lot of revenue, a PG-13 family melodrama isn’t a great idea.
      Lesson: Corpus selection is important in getting actionable, interpretable results!
  • 58. Language use over time in Facebook statuses • Best topic for each age group listed. • LOESS regression line for prevalence by age group. • Source: Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, et al. (2013) Personality, Gender, and Age in the Language of Social Media: The Open Vocabulary Approach. PLoS ONE 8(9) • Nod to James Pennebaker
  • 59. Word cloud pros and cons • Pro – Word clouds force you to hunt for the most impactful terms – You end up examining the long tail in the process – Compactly represent many phrases • Con – Longer words are more prominent. – “Mullet of the Internet” – Hard to show phrase annotations. – Ranking is unclear. • An alternative to a word cloud is a list ranked by phrase frequency or phrase precision. • Source: Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, et al. (2013) Personality, Gender, and Age in the Language of Social Media: The Open Vocabulary Approach. PLoS ONE 8(9)
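The ranked-list alternative can be sketched as follows. Here "phrase precision" is assumed to mean the share of a phrase's occurrences that fall in the target category. That definition, the `COUNTS` table, and the `min_count` cutoff are illustrative assumptions.

```python
# Hypothetical counts: phrase -> (occurrences in target category, occurrences elsewhere).
COUNTS = {
    "comic book": (120, 30),
    "the franchise": (45, 5),
    "popcorn": (200, 300),
    "installment": (9, 0),
}

def ranked(counts, key, min_count=0):
    """Return phrase rows sorted by `key` ('frequency' or 'precision'), highest first."""
    rows = []
    for phrase, (inside, outside) in counts.items():
        total = inside + outside
        if total < min_count:
            continue  # skip phrases too rare to trust
        rows.append({
            "phrase": phrase,
            "frequency": inside,
            "precision": inside / total,
        })
    return sorted(rows, key=lambda row: row[key], reverse=True)

for key in ("frequency", "precision"):
    print(key)
    for row in ranked(COUNTS, key):
        print(f"  {row['phrase']:14s} {row['frequency']:4d} {row['precision']:.2f}")
```

Unlike a word cloud, a list makes the ordering explicit. Ranking by precision alone favors rare phrases, so a minimum count such as `ranked(COUNTS, "precision", min_count=10)` is one way to keep the list stable.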
  • 64. Informing dealer talk tracks • Suppose you are selling a car to a typical person. How would you describe the car’s performance? • Should you say – “This car has 162 ft-lbs of torque.” – or “This car makes passing on two-lane roads easy.” • Having an idea generation (and rejection) tool makes this very easy.
  • 65. Recommendations • Corpus and document selection are important – Documents: movie-level instead of review-level – Corpus: rating-specific – Subsets of the corpus can be particularly interesting: e.g., PG-13 movies • Don’t always look at extreme terms – The Rudder algorithm on the NYT visualization missed many important issues, like Medicare • Use a variety of approaches – Univariate and multivariate approaches can highlight different terms • More phrase context is better than less • Phrase lists are most understandable when presented with a narrative, even if it’s a bit speculative
  • 66. Acknowledgements • Thank you! • We’re hiring: talk to me (best) or, if you can’t, go to CDKJobs.com • Special thanks to Joel Collymore (the concept of “idea generation tool”), Michael Mabale (thoughts on word clouds), Michael Eggerling, Ray Littell-Herrick, Peter Huang, Peter Kahn, Iris Laband, Kyle Lo, Chris Mills, Dengyao Mo, Keith Zackarone
  • 67. Questions? (Yes, we’re hiring!!) • Is this you? – Data Scientist – UI/UX Development & Design – Software Engineer (all levels) – Product Manager • Head to CDKJobs.com or talk to me (@jasonkessler) – Find “Jobs by Category” – Click Technology – Have your resume ready – Click “Apply”!

Editor's Notes

  1. Source: http://www.adweek.com/news/advertising-branding/how-subaru-fell-love-and-never-looked-back-148475 "If you ask a Subaru owner what they think of their car, more times than not they'll tell you they love it," said Alan Bethke, director of marketing communications for Subaru of America. "It was always in front of us, but never utilized in the marketing."
  2. Break up slide
  3. Break up slide
  4. Break up slide
  5. Three and four star reviews may be more credible.
  6. Source: http://www.adweek.com/news/advertising-branding/how-subaru-fell-love-and-never-looked-back-148475
  7. Edit examples to ensure that blind spot alert system
  8. Source: http://www.adweek.com/news/advertising-branding/how-subaru-fell-love-and-never-looked-back-148475
  9. Excuse the poor photoshopping.
  10. Excuse the poor photoshopping.
  11. You’re left to your own devices to try and make sense of the differences.
  12. See http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/ for more info on the model
  13. Theater-specific terms refer to independent theaters.