© Copyright 2013 Attensity All rights reserved
Little Words in Big Data
Jessica Perri
Dir. Linguistic Technology
Attensity Corp.
jperri@attensity.com
Overview
Big Data – Social Media
Natural Language Parsing and Extraction
Sentiment
We have more data available than ever before…
Big Data and Big Growth
• The amount of available data is growing exponentially
• Seeing a change in the discourse landscape
– Dramatic increase in personal narrative (blogs, reviews, Twitter, etc.)
– Shift in authorship and compositional methods (smartphones, tablets, etc.)
• Result: More variation in data than ever before
But… more data does not necessarily mean better data.
Processing Challenges - Where Did the Data Come From?
• Signal/Noise ratio worse than ever
– ETL problems
– Spam, spam, spam
– Marketing materials
– Shills, employees, interns and unsavory types gaming the system
• Domain detection critical for pragmatic assumptions
Processing Challenges - What is the Data Composed Of?
• Text is “degraded”
– Missing/excessive punctuation
– Missing words
– Typographical errors
– Rapid topic shift
• Language is extremely varied and constantly changing
– A million words for a single picture
– Productive phonological rules for emphasis (loooooooooooool, uggghhhhh)
– Novel and coined terms
• Much of the content is not business relevant
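The emphatic lengthening mentioned above (loooooooooooool, uggghhhhh) can be detected and collapsed before lexicon lookup. The following is a minimal sketch, not Attensity's method: the function names and the "three or more repeats" threshold are assumptions chosen for illustration.

```python
import re

# Assumption: a run of three or more identical characters signals
# emphatic lengthening rather than ordinary spelling ("good", "hello").
_ELONGATION = re.compile(r"(.)\1{2,}")


def is_elongated(token: str) -> bool:
    """Return True if the token contains a run of 3+ identical characters."""
    return _ELONGATION.search(token) is not None


def squeeze_elongation(token: str, keep: int = 1) -> str:
    """Collapse each run of 3+ identical characters down to `keep` copies."""
    return _ELONGATION.sub(lambda m: m.group(1) * keep, token)


print(squeeze_elongation("loooooooooooool"))  # lol
print(squeeze_elongation("uggghhhhh"))        # ugh
```

Keeping the `is_elongated` flag alongside the squeezed form preserves the emphasis as a feature instead of discarding it.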
Processing Challenges – Extralinguistic Cues
• People are opinionated
• People are sassy
• People are sarcastic
• People are clever
@jane: Obama won. I’m SO HAPPY to have a #socialist #communist president.
@jane: Poor Romney. I’m so sad that he has to go home to one of his 35 mansions. #not
@jane: It’s so great that Obama won. </sarcasm>
@jane: It’s so great that Obama won. #saidnooneever
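Three of the four example tweets carry an explicit marker (#not, </sarcasm>, #saidnooneever), while the first relies on world knowledge alone. A minimal sketch of marker detection follows; the marker list is taken only from these examples and is an assumption, not a complete inventory.

```python
import re

# Assumption: explicit sarcasm markers limited to those in the example tweets.
SARCASM_MARKERS = re.compile(
    r"#not\b|#saidnooneever\b|</sarcasm>",
    re.IGNORECASE,
)


def has_sarcasm_marker(text: str) -> bool:
    """Return True if the text contains one of the known explicit markers."""
    return SARCASM_MARKERS.search(text) is not None


tweets = [
    "Obama won. I’m SO HAPPY to have a #socialist #communist president.",
    "Poor Romney. I’m so sad that he has to go home to one of his 35 mansions. #not",
    "It’s so great that Obama won. </sarcasm>",
    "It’s so great that Obama won. #saidnooneever",
]
flags = [has_sarcasm_marker(t) for t in tweets]
```

The first tweet is not flagged: its sarcasm depends on how the hashtags are read, which surface patterns like these cannot capture.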
We need to use existing data more intelligently!
What can we do with Big Data?
• “Looking for a needle in a haystack”
• Search for predefined scenarios: Recovery
• Implications for processing: Use a set of targeted patterns over all possible data
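Recovery, as described above, amounts to running a fixed set of targeted patterns over every document. A minimal sketch, assuming regular expressions as the pattern language; the scenario names and patterns are hypothetical and chosen only for illustration.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical predefined scenarios; a real deployment would define its own.
SCENARIOS = {
    "price_complaint": re.compile(
        r"\b(too expensive|overpriced|lower your prices)\b", re.IGNORECASE
    ),
    "password_reset": re.compile(r"\breset (my|the) password\b", re.IGNORECASE),
}


def recover(docs: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (document index, scenario name) for every pattern that matches."""
    for index, doc in enumerate(docs):
        for name, pattern in SCENARIOS.items():
            if pattern.search(doc):
                yield index, name


docs = ["Lower your prices.", "I have never reset my password.", "Nice scarf!"]
hits = list(recover(docs))
```

Note that the second document matches `password_reset` even though the statement is negated; surface patterns alone do not see that distinction.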
What can we do with Big Data?
• “Looking for the shape of the haystack”
• Look for trends and novel events: Discovery
– IDKWILFBIKIWISI
• Implications for processing: Use dynamic patterns over a sample of data (“exhaustive extraction”)
Attensity Exhaustive Extraction – Roles and Relationships
“I bought a beautiful Jimmy Choo scarf for my mom from Nordstrom.”
Attensity Voice – Shades of Meaning
Indefinite Voice depicts the uncertainty of the statement:
I might stay here again.
Intent Voice indicates the plans of a customer:
We will definitely stay here in the future!
Conditional Voice reveals customer’s stipulations:
I would shop more often if I got free shipping.
Negation cancels out the statement:
I have never reset my password.
Recur Voice conveys the recurring manner of the event:
This is the third time I’ve emailed them.
Command Voice detects strong demands from a customer, distinguishing them from requests or statements of fact:
Lower your prices.
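The voice categories above can be roughly approximated with lexical cues. This is a minimal sketch under that assumption, not Attensity's implementation: the cue lists are hypothetical, and Command Voice is left out because spotting an imperative generally needs a syntactic parse rather than a keyword.

```python
import re

# Hypothetical cue lists, one per voice; illustrative only.
VOICE_CUES = [
    ("indefinite", re.compile(r"\b(might|maybe|perhaps)\b", re.IGNORECASE)),
    ("intent", re.compile(r"\b(will|going to|plan to)\b", re.IGNORECASE)),
    ("conditional", re.compile(r"\bwould\b.*\bif\b", re.IGNORECASE)),
    ("negation", re.compile(r"\b(never|not|no)\b|n['’]t\b", re.IGNORECASE)),
    ("recur", re.compile(r"\b(second|third|fourth|fifth) time\b", re.IGNORECASE)),
]


def voices(sentence: str) -> set:
    """Return the set of voice labels whose cue pattern matches the sentence."""
    return {name for name, pattern in VOICE_CUES if pattern.search(sentence)}


print(voices("I would shop more often if I got free shipping."))  # {'conditional'}
print(voices("I have never reset my password."))                  # {'negation'}
```

A sentence may carry several voices at once, which is why the function returns a set rather than a single label.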
Domain Knowledge Models
• Narrow topic definition
– Data variability across domains
– Reconciling ambiguity
• Iterative refreshing
– What is relevant NOW
– Growth in the lexicon because of new products, etc.
• Life cycle
– Predefinition
– Expiration
Sentiment
Sentiment Definitions
• Sentiment Type
– Opinion Mining (typically neg/pos)
– Emotion Detection
• Sentiment Scope
– Document level
– Sentence level
– Entity/aspect level
• A Couple of Sentiment Use Cases
– Marketing
– Newsmakers
Sentiment Detection
• Attensity performs comprehensive language analysis
– Syntactic parse, providing linguistic analysis
– Semantic cues
– Pragmatic intelligence
• Single value for entities
• Sentiment features are weighted and combined to provide the final sentiment value and score for document-level sentiment
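The slide does not say which features are used or how they are weighted. As a minimal sketch of one way to combine weighted features into a document-level value and score, the example below assumes a weighted average of per-feature polarities; the `Feature` type, the polarity range, and the neutral band are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feature:
    name: str
    polarity: float  # assumed range: -1.0 (negative) to 1.0 (positive)
    weight: float    # assumed non-negative importance


def document_sentiment(
    features: List[Feature], neutral_band: float = 0.1
) -> Tuple[str, float]:
    """Combine weighted features into a (value, score) pair for a document."""
    total_weight = sum(f.weight for f in features)
    if total_weight == 0:
        return "neutral", 0.0
    # Weighted average of polarities, so the score stays within -1.0..1.0.
    score = sum(f.polarity * f.weight for f in features) / total_weight
    if score > neutral_band:
        return "positive", score
    if score < -neutral_band:
        return "negative", score
    return "neutral", score


label, score = document_sentiment(
    [Feature("beautiful", 1.0, 2.0), Feature("overpriced", -1.0, 1.0)]
)
```

Here the positive feature carries twice the weight of the negative one, so the document comes out positive with a score of one third.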
Marketing: A single picture is composed of thousands of words
Political Newsmakers: Emotions
• Yahoo Social Media Widget “The Signal”
• Focused around Political Data for the 2012 Election
• Seven Emotions:
– Angry, Confused, Disengaged, Excited, Happy, Sad, and Worried
• Candidate and Issue-centric:
– Fundraising, Religion, Race, etc.
– Economy, Environment, Foreign Affairs, Health Care, Social Issues, etc.
• Segmented by state
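Segmenting the seven emotions by state can be pictured as a simple aggregation over labeled posts. The following is a minimal sketch; the record format of (state, emotion) pairs and the sample records are made up for illustration and do not come from The Signal.

```python
from collections import Counter, defaultdict

EMOTIONS = ("Angry", "Confused", "Disengaged", "Excited", "Happy", "Sad", "Worried")


def emotion_share_by_state(records):
    """Map each state to the share of its posts carrying each emotion label."""
    counts = defaultdict(Counter)
    for state, emotion in records:
        if emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {emotion}")
        counts[state][emotion] += 1
    return {
        state: {e: c[e] / sum(c.values()) for e in EMOTIONS}
        for state, c in counts.items()
    }


# Made-up records, for illustration only.
records = [("OH", "Angry"), ("OH", "Happy"), ("OH", "Happy"), ("CA", "Worried")]
shares = emotion_share_by_state(records)
```

Reporting shares rather than raw counts keeps states with very different posting volumes comparable.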
Questions?
