"Little Words in Big Data", Jessica Perri, Attensity Director Linguistic Technology

Semantic Tech Conference (SemTech) - June 2nd-5th in San Francisco: Technical scope with Jessica Perri, Director Linguistic Technologies


Transcript

  • 1. Little Words in Big Data
      Jessica Perri, Dir. Linguistic Technology, Attensity Corp.
      jperri@attensity.com
      © Copyright 2013 Attensity. All rights reserved.
  • 2. Overview
      • Big Data – Social Media
      • Natural Language Parsing and Extraction
      • Sentiment
  • 3. We have more data available than ever before…
  • 4. Big Data and Big Growth
      • The amount of available data growing exponentially
      • Seeing a change in the discourse landscape
          – Dramatic increase in personal narrative (blogs, reviews, Twitter, etc.)
          – Shift in authorship and compositional methods (smart phones, tablets, etc.)
      • Result: More variation in data than ever before
  • 5. But… more data does not necessarily mean better data.
  • 6. Processing Challenges - Where Did the Data Come From?
      • Signal/Noise ratio worse than ever
          – ETL problems
          – Spam, spam, spam
          – Marketing materials
          – Shills, employees, interns and unsavory types gaming the system
      • Domain detection critical for pragmatic assumptions
  • 7. Processing Challenges - What is the Data Composed Of?
      • Text is “degraded”
          – Missing/excessive punctuation
          – Missing words
          – Typographical errors
          – Rapid topic shift
      • Language is extremely varied, and constantly changing
          – A million words for a single picture
          – Productive, phonological rules for emphasis (loooooooooooool, uggghhhhh)
          – Novel and coined terms
      • Not business relevant
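
The emphasis-by-lengthening pattern on slide 7 (loooooooooooool, uggghhhhh) can be folded back toward a base form before a lexicon lookup. The following is a minimal sketch, assuming a simple rule that collapses any run of three or more identical characters; it is an illustration only, not Attensity's method, and the names `squeeze_elongation` and `was_elongated` are hypothetical.

```python
import re

# Assumption: a run of 3+ identical characters signals emphatic lengthening.
# Runs of exactly two (as in "good") are left alone.
_RUN = re.compile(r"(.)\1{2,}")


def squeeze_elongation(token: str, keep: int = 1) -> str:
    """Collapse each run of 3+ repeated characters down to `keep` copies."""
    return _RUN.sub(lambda m: m.group(1) * keep, token)


def was_elongated(token: str) -> bool:
    """Report whether the token contained an emphatic run at all."""
    return _RUN.search(token) is not None


if __name__ == "__main__":
    for word in ("loooooooooooool", "uggghhhhh", "good"):
        print(word, "->", squeeze_elongation(word), was_elongated(word))
```

Keeping the `was_elongated` flag alongside the normalized token preserves the emphasis as a feature, which may matter for sentiment scoring later on.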
  • 8. Processing Challenges – Extralinguistic Cues
      • People are opinionated
      • People are sassy
      • People are sarcastic
      • People are clever
      @jane: Obama won. I’m SO HAPPY to have a #socialist #communist president.
      @jane: Poor Romney. I’m so sad that he has to go home to one of his 35 mansions. #not
      @jane: It’s so great that Obama won. </sarcasm>
      @jane: It’s so great that Obama won. #saidnooneever
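
Three of the four example tweets on slide 8 carry an explicit sarcasm marker (#not, </sarcasm>, #saidnooneever). A minimal sketch of spotting those markers follows; the marker list contains only the ones shown on the slide, and the function name `find_sarcasm_markers` is hypothetical.

```python
import re

# Assumption: only the explicit markers that appear on the slide are listed.
SARCASM_MARKERS = ("#not", "#saidnooneever", "</sarcasm>")


def find_sarcasm_markers(tweet: str) -> list[str]:
    """Return the explicit sarcasm markers present in a tweet, in list order."""
    lowered = tweet.lower()
    found = []
    for marker in SARCASM_MARKERS:
        # The lookahead keeps "#not" from matching inside "#nothing".
        if re.search(re.escape(marker) + r"(?!\w)", lowered):
            found.append(marker)
    return found


if __name__ == "__main__":
    print(find_sarcasm_markers("It's so great that Obama won. #saidnooneever"))
    print(find_sarcasm_markers("Obama won. I'm SO HAPPY to have a #socialist #communist president."))
```

The first tweet on the slide has no explicit marker, so a lookup like this returns nothing for it. Its sarcasm depends on the clash between "SO HAPPY" and the hashtags, which is the extralinguistic problem the slide points at.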
  • 9. We need to use existing data more intelligently!
  • 10. What can we do with Big Data?
      • “Looking for a needle in a haystack”
      • Search for predefined scenarios: Recovery
      • Implications for processing: Use a set of targeted patterns over all possible data
  • 11. What can we do with Big Data?
      • “Looking for the shape of the haystack”
      • Look for trends and novel events: Discovery
          – IDKWILFBIKIWISI
      • Implications for processing: Use dynamic patterns over a sample of data (“exhaustive extraction”)
  • 12. Attensity Exhaustive Extraction – Roles and Relationships
      “I bought a beautiful Jimmy Choo scarf for my mom from Nordstrom.”
  • 13. Attensity Voice – Shades of Meaning
      • Indefinite Voice depicts the uncertainty of the statement:
          I might stay here again.
      • Intent Voice indicates the plans of a customer:
          We will definitely stay here in the future!
      • Conditional Voice reveals customer’s stipulations:
          I would shop more often if I got free shipping.
      • Negation cancels out the statement:
          I have never reset my password.
      • Recur Voice conveys the recurring manner of the event:
          This is the third time I’ve emailed them.
      • Command Voice detects strong demands from a customer, distinguishing them from requests or statements of fact:
          Lower your prices.
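
The voice categories on slide 13 can be illustrated with a toy cue-word tagger. This is a rough sketch only: the regular expressions below are assumptions derived from the slide's six example sentences, the imperative verb list is invented for illustration, and nothing here reflects how Attensity's parser actually assigns voices. `VOICE_CUES` and `tag_voices` are hypothetical names.

```python
import re

# Assumption: toy surface cues, one pattern per voice named on the slide.
VOICE_CUES = {
    "indefinite": r"\b(might|may|maybe|perhaps)\b",
    "intent": r"\b(will|going to)\b",
    "conditional": r"\bwould\b.*\bif\b",
    "negation": r"\b(never|not|no)\b|n['’]t\b",
    "recur": r"\b(second|third|fourth|fifth|\d+(st|nd|rd|th)) time\b",
    # Assumption: a tiny, invented list of sentence-initial imperative verbs.
    "command": r"^\s*(lower|fix|stop|give|cancel|refund)\b",
}


def tag_voices(sentence: str) -> list[str]:
    """Return every voice whose cue pattern matches, in dictionary order."""
    return [
        voice
        for voice, pattern in VOICE_CUES.items()
        if re.search(pattern, sentence, re.IGNORECASE)
    ]


if __name__ == "__main__":
    for s in (
        "I might stay here again.",
        "I would shop more often if I got free shipping.",
        "Lower your prices.",
    ):
        print(s, "->", tag_voices(s))
```

Surface cues like these break quickly on real text (for example, "will" as a noun), which is one reason a syntactic parse is a better basis for the distinction than keyword matching.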
  • 14. Domain Knowledge Models
      • Narrow topic definition
          – Data variability across domains
          – Reconciling ambiguity
      • Iterative refreshing
          – What is relevant NOW
          – Growth in the lexicon because of new products, etc.
      • Life cycle
          – Predefinition
          – Expiration
  • 15. Sentiment
  • 16. Sentiment Definitions
      • Sentiment Type
          – Opinion Mining (typically neg/pos)
          – Emotion Detection
      • Sentiment Scope
          – Document level
          – Sentence level
          – Entity/aspect level
      • A Couple Sentiment Use Cases
          – Marketing
          – Newsmakers
  • 17. Sentiment Detection
      • Attensity performs comprehensive language analysis
          – Syntactic parse, providing linguistic analysis
          – Semantic cues
          – Pragmatic intelligence
      • Single value for entities
      • Sentiment features are weighted and combined to provide the final sentiment value and score for document level sentiment
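
Slide 17 says that sentiment features are weighted and combined into a final value and score. One simple way to picture that is a weighted average with a neutral band, sketched below. The slide does not give the actual combination formula, so the averaging rule, the `neutral_band` threshold, and the names `SentimentFeature` and `combine_features` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SentimentFeature:
    name: str
    polarity: float  # assumed range: -1.0 (negative) to 1.0 (positive)
    weight: float    # assumed non-negative importance of the feature


def combine_features(features, neutral_band=0.1):
    """Return (label, score), where score is the weighted mean polarity."""
    total_weight = sum(f.weight for f in features)
    if total_weight == 0:
        # No usable evidence: fall back to neutral.
        return "neutral", 0.0
    score = sum(f.polarity * f.weight for f in features) / total_weight
    if score > neutral_band:
        label = "positive"
    elif score < -neutral_band:
        label = "negative"
    else:
        label = "neutral"
    return label, score


if __name__ == "__main__":
    doc = [
        SentimentFeature("adjective: beautiful", 1.0, 3.0),
        SentimentFeature("negated verb", -1.0, 1.0),
    ]
    print(combine_features(doc))
```

A weighted mean keeps the score in the same range as the input polarities, so the document-level number stays comparable across documents of different lengths.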
  • 18. Marketing: A single picture is comprised of thousands of words
  • 19. Political Newsmakers: Emotions
      • Yahoo Social Media Widget “The Signal”
      • Focused around Political Data for the 2012 Election
      • Seven Emotions:
          – Angry, Confused, Disengaged, Excited, Happy, Sad, and Worried
      • Candidate and Issue-centric:
          – Fundraising, Religion, Race, etc.
          – Economy, Environment, Foreign Affairs, Health Care, Social Issues, etc.
      • Segmented by state
  • 20. Questions?