Leveraging Big Data to Derive Actionable People Insights

Talk given in July 2012 at Fudan University http://cscw.fudan.edu.cn/lang/en/news/huahai_yang_visited_cisl

Published in: Technology, Business
Notes
  • G is a must, and E&F are bonus
  • Visiting,112 OtherCategory,574 Jobs,187 Experience,99 Checkout,135 Value,19 Service,17 Layout,6 Quality,18 People,81 Pharmacy,63 Selection,50
  • Leveraging Big Data to Derive Actionable People Insights

    1. IBM Research - Almaden © 2012 IBM Corporation
        Leveraging Big Data to Derive Actionable People Insights
        Huahai Yang
        USER Group, Computer Science
    2. About Us
        • Also known as the IBM Almaden Research Center (ARC)
        • On top of a hill at the southern tip of Silicon Valley
    3. About Us
        • ARC
          – Science & Technology
          – Storage Systems
          – Service Science Research
          – Computer Science
            • Theory
            • Database Management
            • Intelligent Information System
            • Healthcare Information Technology
            • User System and Experience Research (USER)
              – Currently led by Michelle X. Zhou
        • Join us, we are always hiring!
          – Interns, Postdocs, Software Engineers and Research Staff Members
    4. Big Data Opportunities
        • Industries
          – Finance
          – Retail
          – Product Manufacturer
          – Telecommunication
          – Entertainment
          – …
        • Potential uses
          – Customer acquisition/retention
          – Market segmentation
          – Brand management
          – Risk assessment
        • 300+ million tweets daily
        • 1+ million blog posts daily
    5. Our Focus: Insights about People
        • Perceptions, Sentiments, Personalities and other profiles
    6. Outline
        • Consumer users
          – OpinionBlocks
            • Visually summarizing product reviews
        • Business users
          – Brandy
            • Understanding user perception and personality for direct marketing
          – qCrowd
            • Actively engaging individuals on social media
    7. OpinionBlocks Motivation
        • Online product reviews
          – Significant in consumer decision making
          – Large volume, uncurated
          – Limited search tools
          – Users have different priorities
        • Help consumers make better use of reviews
    8. Difficulties with Review Text
        • Wide variation in:
          – length
          – clarity of language
          – to the point vs vague
          – emotional vs subjective
        • Inconsistent with the rating
        • Redundant
    9. Prior Work
        • Chen et al. “Visual Analysis of Conflicting Opinions”, 2006
        • Oelke et al. “Visual Opinion Analysis of Customer Feedback Data”, 2009
        • Both analyze first, then visualize only the analysis results
    10. Can Consumers Trust the Analysis?
        • Sentiment analysis is not a solved problem in NLP
          – Often less than 80% accuracy
          – Aspect-oriented sentiment is even less accurate
        • Low resolution
          – Often polarity only: positive, negative, neutral
        • Not very actionable
        “…it was not clean, but I am not expecting a better performance from any vacuum.”
    11. Our Approach
        • Support an interactive reading experience where users can search for relevant information.
        • Visualize the text itself while highlighting analysis results.
        • Show the categorized text in context so that users can judge fairness of the sentiment analysis.
        • Progressively disclose textual information while continuously providing visual graphical summaries.
    12. Overview First
        • Provide summary of overall opinion
        • Identify important features and key issues in each
        • Interactivity reveals correlations among features
    13. Filter on Demand
        • Polarity of feature
        • Keywords
        • Snippets
    14. Zoom across LODs
    15. Zoom across LODs
    16. Work in Progress
        • Formal user studies
          – Does the system help consumers?
            • Learn better about the product domain
            • Find information faster
            • Make better decisions
        • Redesign of the UI
            • Compare products
            • Scale to a larger number of reviews
    17. Brandy
        • 360° understanding of a business brand: evidence-based brand management from social media
          – Brand associations (e.g., key aspects)
          – Competitive brands
          – Brand evolution
          – User modeling of those who voiced brand perception
            • Demographics, personality traits, locations, brand association and sentiments
        • Active brand management for effective marketing
          – Craft/adjust marketing messages based on brand analysis
            • Associations and customer needs
          – Deliver marketing messages to target customers
            • Individual customers (e.g., customer retention)
            • Customer segments (e.g., customer acquisition)
    18. Project Map
        [Diagram with three column headings: Data, Key Technology, Applications]
        • Lettered components: (A) Profiling Fusion; (B) User profiling from social data; (C) Brand perception from social data; (D) Overlay; (E); (F); (G)
        • Other labels: Social Data; Social Data of Known Customers; Social Data of Unknown Customers; Existing Customers; New Customers; Customer Profiles; Enriched/new Customer Profile; Perceptual Map; Unica (Today)
        • Application labels: Individual-based Direct Marketing (Customer retention); Segment-based Direct Marketing (Customer Acquisition); Brand Management (Marketing Research)
    19. Modeling Personality
        • Research in psychology has shown that word usage in one’s writings, such as blogs and essays, is related to one’s personality
          – Our research has correlated Big 5 facet features with willingness and readiness to respond to questions on social networks
        • Features (number; examples; computation)
          – LIWC (dictionary-based measurement of aspects of word usage)
            • Number: 68
            • Examples: First person, Negation, Feeling, Communication, Leisure, Death
            • Computation: Let g be a LIWC category, Ng the number of occurrences of words in that category in one’s tweets, and N the total number of words in his/her tweets. The score for category g is then Ng/N.
          – Big Five (personality types, OCEAN)
            • Number: 5
            • Examples: Openness
            • Computation: Using correlations with LIWC features as reported by previous researchers (e.g., Yarkoni et al.)
          – Big Five Facets
            • Number: 30
            • Examples: Liberalism, Imagination
            • Computation: Using correlations with LIWC features as reported by previous researchers (e.g., Yarkoni et al.)
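The LIWC score on slide 19 is a plain ratio, Ng/N, so it can be sketched in a few lines. The following is a minimal illustration only: the real LIWC dictionary is a licensed resource with 68 categories, so `TOY_LEXICON`, `tokenize`, and `category_scores` are hypothetical names and the word lists are invented for the example.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon (assumption): stands in for the real LIWC
# dictionary, which is not reproduced here.
TOY_LEXICON = {
    "first_person": {"i", "me", "my", "mine"},
    "negation": {"no", "not", "never"},
    "leisure": {"movie", "game", "party"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_scores(tweets, lexicon):
    """Return Ng/N per category, where N counts all words in the tweets."""
    words = [word for tweet in tweets for word in tokenize(tweet)]
    total = len(words)  # N
    counts = Counter()  # Ng for each category g
    for word in words:
        for category, vocabulary in lexicon.items():
            if word in vocabulary:
                counts[category] += 1
    return {
        category: (counts[category] / total if total else 0.0)
        for category in lexicon
    }


tweets = ["I never miss my game night", "Not my favorite movie"]
scores = category_scores(tweets, TOY_LEXICON)
# 10 words in total: 3 first-person words, 2 negations, 2 leisure words.
print(scores)
```

Normalizing by N keeps scores comparable between users who tweet a lot and users who tweet rarely, which is presumably why the slide divides by the total word count.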
    20. Ongoing: Correlating Brands with Personalities (Industry Solutions Joint Program)
        • Data
          – 3000 Twitter users discussing Walmart, Costco, and Sears
        • Analytics
          – Measured personality traits from tweets
          – “Openness” scores per brand shown to the right.
        [Chart: “Openness” scores for All Brands, Costco, Sears, Walmart]
    21. Brand Perceptions from Twitter Data
        • Twitter Data
          – Collected with Twitter4J via the streaming API
          – Queries: Walmart, Costco, Sears
          – Total ~1M tweets, from May 31 to June 21 (3 weeks)
        • Data Filtering & Enhancements
          – Started with 6 categories from Consumer Reports: “selection, quality, layout, service, checkout, value”
          – Category expansion using synonyms, WordNet and related terms. E.g.,
            • checkout: line, wait, cashier, counter, self checkout, cash, register
            • layout: aisle, passage, lane, shelf, door, gate, lost, spacious, narrow, row
          – Added new categories that pop out of the data: “jobs, people, visiting, experience, …”
            • E.g., “I don’t know why I even try anymore. Walmart always disappoints. #walmartsucks”
            • “Walmart stay crowded!”
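One way to read the category expansion on slide 21 is as keyword lists attached to each category. The sketch below assumes whole-word matching against those lists; the slides do not say how the keywords were actually applied, and `CATEGORY_KEYWORDS` and `match_categories` are hypothetical names. The layout list is a subset of the slide's keywords.

```python
import re

# Keyword lists taken from the slide (layout is a subset). How the authors
# matched them against tweets is an assumption of this sketch.
CATEGORY_KEYWORDS = {
    "checkout": ["line", "wait", "cashier", "counter",
                 "self checkout", "cash", "register"],
    "layout": ["shelf", "lane", "narrow", "spacious", "row"],
}


def match_categories(tweet, category_keywords):
    """Return the categories whose keywords occur as whole words or phrases."""
    text = tweet.lower()
    matched = []
    for category, keywords in category_keywords.items():
        for keyword in keywords:
            # \b keeps "row" from matching inside "crowded".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                matched.append(category)
                break
    return matched


print(match_categories("Long wait at the Walmart cashier again", CATEGORY_KEYWORDS))
print(match_categories("Walmart stay crowded!", CATEGORY_KEYWORDS))
```

The second tweet matches no seed category under this scheme, which is consistent with the slide's point that new categories had to be added from the data.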
    22. Classification - Inputs
        • Training Data
          – Manually categorized ~3500 tweets
            • Walmart (1341), Costco (1420), Sears (1109)
          – Ran WEKA with 3 different models
            • SVM
            • Multinomial
            • KNN
          – Ran ICM with some tuning
            • Naïve based
            • Knowledge based and rules based
          – Cross-validation with 2 folds.
        • Data Expansion
          – Ran an algorithm that finds tweets similar to the 3500 categorized tweets, ending up with ~75,000 tweets.
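Two-fold cross-validation, as mentioned on slide 22, trains on one half of the labeled tweets and tests on the other, then swaps the halves. The sketch below only illustrates the splitting step; WEKA and ICM have their own implementations (which may, for example, stratify by class), and `two_fold_splits` is a hypothetical helper name.

```python
import random


def two_fold_splits(items, seed=0):
    """Shuffle the items and return [(train, test), (train, test)] for 2 folds."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    half = len(shuffled) // 2
    fold_a, fold_b = shuffled[:half], shuffled[half:]
    # Each fold serves once as the test set while the other is used to train.
    return [(fold_b, fold_a), (fold_a, fold_b)]


# Invented (tweet, category) pairs, for illustration only.
labeled = [
    ("long wait at checkout", "checkout"),
    ("they are hiring cashiers", "jobs"),
    ("pharmacy was quick today", "pharmacy"),
    ("great prices on snacks", "value"),
]
for train, test in two_fold_splits(labeled):
    print(len(train), "training tweets,", len(test), "test tweets")
```

With only two folds each model sees half of the data during training, so the scores reported from such a setup are likely to be pessimistic for rare categories.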
    23. Classification - Results
        (P = precision, R = recall)

        Category        | Walmart                                  | Costco
                        | #tweets  WEKA P  WEKA R  ICM P   ICM R   | #tweets  WEKA P  WEKA R  ICM P   ICM R
        Visiting        | 112      0.313   0.357   0.412   0.5     | 244      0.457   0.496   0.86    0.39
        Jobs            | 187      0.907   0.834   1       0.782   | 32       0.773   0.531   1       0.538
        Experience      | 99       0.253   0.222   0.272   0.312   | 114      0.199   0.254   0.177   0.544
        Checkout        | 135      0.927   0.748   0.963   0.791   | 16       0.333   0.188   0.2     0.12
        Service         | 17       0       0       0       0       | 3        0       0       0       0
        Value           | 19       0.5     0.053   0       0       | 53       0.1     0.057   0.125   0.08
        Layout          | 6        0       0       0       0       | 4        0       0       0       0
        Quality         | 18       0.25    0.056   0       0       | 53       0.16    0.075   0.2     0.318
        People          | 81       0.294   0.185   0.357   0.256   | 43       0.053   0.023   0.2     0.3889
        Pharmacy        | 63       0.925   0.778   0.92    0.741   | 95       0.926   0.916   0.905   0.941
        Selection       | 50       0.389   0.14    0.112   0.409   | 132      0.284   0.189   0.407   0.159
        Other Category  | 574      0.598   0.763   0.673   0.111   | 712      0.625   0.704   0.702   0.5938
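For readers less familiar with the two metrics on slide 23: precision is the share of tweets assigned to a category that truly belong to it, and recall is the share of tweets truly in a category that the classifier found. A minimal sketch follows, with invented labels and a hypothetical `precision_recall` helper; returning 0 when a category is never predicted or never present is a convention assumed here, not something the slides state.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Compute (precision, recall) for one category from paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == category and p == category)
    predicted_count = sum(1 for _, p in pairs if p == category)
    actual_count = sum(1 for t, _ in pairs if t == category)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = true_positives / predicted_count if predicted_count else 0.0
    recall = true_positives / actual_count if actual_count else 0.0
    return precision, recall


# Invented labels, for illustration only.
true_labels = ["jobs", "jobs", "checkout", "other", "jobs"]
predicted_labels = ["jobs", "other", "checkout", "jobs", "jobs"]

print(precision_recall(true_labels, predicted_labels, "jobs"))      # 2 of 3 predicted, 2 of 3 actual
print(precision_recall(true_labels, predicted_labels, "checkout"))  # (1.0, 1.0)
print(precision_recall(true_labels, predicted_labels, "layout"))    # (0.0, 0.0)
```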
    24. Classification - Insights
        • Insufficient evidence: some categories are rarely discussed in the tweets, e.g., “layout” and “service”.
        • Complexity of category: even with the same amount of training data, classification for some categories has much lower accuracy.
        [Chart: Precision (0 to 1) against #tweets (0 to 200) for Visiting, Jobs, Experience, Checkout, Service, Value, Layout, Quality, People, Pharmacy, Selection]
    25. Infrastructure - UI
    26. Work in progress
        • Improve classification quality of brand perceptions
          – Unsupervised methods to uncover unknown perceptions
        • Relating brand perceptions with consumer personalities
        • User studies on marketing professionals
          – How well do the tools support marketing tasks?
    27. qCrowd: asking targeted strangers questions
    28. Engagement Continuum
    29. System Architecture
    30. Research Questions
        • Where might this be helpful?
          – Questions about an event that are best answered soon after the event
          – Questions for which there might be a diversity of opinions
          – More?
        • How feasible is this approach?
          – Will people answer questions from strangers?
          – Will use of incentives increase responses?
          – What is the quality of the answers?
    31. Test Scenarios: TSA Tracker & Camera Review
        • Crowdsourcing airport security wait time via Twitter
        • Crowdsourcing product reviews via Twitter
          – Ask follow-up questions if the person responds
    32. Questions Asked
        • TSA Tracker
          – Without incentive
          – With incentive
        • Camera Reviews
    33. Results
    34. Follow-up Questions
    35. Observations
    36. Thank You! Questions?
        User System and Experience Research (USER)
        IBM Research – Almaden
        http://www.almaden.ibm.com/cs/disciplines/user/
        Jilin Chen, Allen Cypher, Eben Haber, Eser Kandogan, Tessa Lau, Jalal Mahmud, Jeffrey Nichols, Barton Smith, Huahai Yang, Michelle X. Zhou, Tara Mathews