
Engaging with Users on Public Social Media


My talk from Carnegie Mellon's HCII Seminar on April 24, 2013.

On some social media platforms, such as Twitter, YouTube, Pinterest, and Tumblr, much of the content that users generate is publicly accessible, and communication between strangers is easy to initiate. The communities that have grown up around these platforms, particularly Twitter, can also be inclusive and supportive of interactions between strangers. The public and open nature of these communities creates an opportunity for a new kind of crowdsourcing system, in which individuals who may be good candidates to complete various tasks are identified based on their published content. We explore the potential of such a system through several information collection tasks, examining the response rate and information quality that it can obtain. We also explore a means of leveraging users' previous social media content to predict their likelihood of response and optimize the system's collection behavior. At IBM Research – Almaden, we are now looking to extend these ideas to additional domains, including proactive and reactive customer support and precision marketing campaigns.




  1. Engaging with Users on Public Social Media
     Jeffrey Nichols, IBM Research – Almaden
  2. IBM Research – Almaden
     • 400+ research employees; 100+ students and post-docs
     • Research in Computer Science, Storage Systems, Science and Technology, Services Science
     • User Focused Systems in CS
  3. The Buzz of the Crowd
     People are generating 1+ billion status updates every day.* Topics covered in status updates are highly diverse:
     • Weather, traffic, and other day-to-day annoyances
     • Experiences with products
     • Reactions to events
     How can we leverage this buzz to do something useful?
     (* 1/2 billion updates every day on Twitter as of October 2012)
  4. Challenge: The Information Iceberg
     • Visible above the surface: information revealed through status updates
     • Below the surface (the GOAL): useful information known to members of the social network
  5. Example #1: Learning more about customer incidents to improve service
     • What happened?
     • Was it something in particular about this store?
     • Could other people have the same experience?
     • How can we make things right?
     This information could be used to improve the customer experience.
  6. Example #2: Tracking crime to improve reporting and better allocate resources
     • Where was it stolen?
     • Was a report filed with the police?
     Over time, this information could suggest how to allocate officers or funds to different areas of the city.
  7. Example #3: Tracking wait times at airport security checkpoints
     • How long did it take to get through security?
     Status updates may indirectly suggest that a person has this information. It could be used by the security agency (TSA) to identify problem spots and allocate officers, and by consumers to plan their air travel.
  8. Uses for Engagement on Social Media
     The ability to actively identify and engage with the right people at the right time on social media can empower an organization to:
     • Collect just-in-time information from users
     • Disseminate important information (broadcast or targeted)
     • Motivate users to perform a task
     • Seize timely business opportunities (e.g., cross- or up-selling)
  10. Examples
  11. Where might this be helpful?
     • Questions that have spatial and/or temporal specificity (e.g., about an event)
     • Questions for which there might be a diversity of opinion
     • More?
  12. Other Advantages
     • Information is easier to extract from responses because the question is known
     • The sample range can be controlled by asking questions of users with a variety of different profiles
     • No waiting needed: questions can be asked in real time
     • Potential answerers can be primed with the question before they have the answer
  13. How feasible is this approach?
     • Will people answer questions from strangers?
     • Will use of an incentive increase responses?
     • What is the quality of the answers?
  14. Concrete Prototype: TSA Tracker
     Crowdsourcing airport security wait times through Twitter (@tsatracker and @tsatracking)
     Step 1. Watch for people tweeting about being in an airport
     Step 2. Ask nicely if they would share their wait time to help others
     Step 3. Collect responses and share relevant data on a web site
     Step 4. Say thank you!
     Key question: Will people respond to questions from strangers?
  15. Questions
     From @tsatracker (includes incentive):
     "If you went through security at <airport code>, can you reply with your wait time? Info will be used to help other travelers"
     From @tsatracking (no incentive):
     "If you went through security at <airport code>, can you reply with your wait time?"
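The two templates above differ only in the incentive sentence. A minimal sketch of filling them in for a given airport (the template strings come from the slide; the function name and structure are our own):

```python
# Question templates from the TSA Tracker study: the incentive variant
# (@tsatracker) appends an explanation of how the data will be used.
BASE = "If you went through security at {code}, can you reply with your wait time?"
INCENTIVE = " Info will be used to help other travelers"

def build_question(airport_code, with_incentive):
    """Return the question text for one airport, with or without the incentive."""
    question = BASE.format(code=airport_code)
    return question + INCENTIVE if with_incentive else question

print(build_question("SFO", with_incentive=True))
```

Both variants stay well under Twitter's 140-character limit of the time, leaving room for the responder's @handle.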
  16. Concrete Prototype: Product Reviews
     Step 1. Identify owners of a product
     Step 2. Ask a focused question about the product
       • How is the image quality?
       • Does it take good low-light pictures?
       • How quickly does it take a picture after pressing the shutter button?
       • How durable is it?
       • What accessories are must-haves?
       • Etc.
     Steps 3-4. Ask more questions if the user responds
     Step 5. Visualize results as a structured product review (future work)
     Key questions:
     • Will people respond to questions in this different domain?
     • Will people respond to follow-up questions at the same rate?
     • Do responses contain useful and accurate information?
  17. Product Review Scenarios
     Samsung Galaxy Tab 10.1
     • A popular consumer electronics product at the time of the study (we didn't want to use the iPad)
     • Compared to reviews from Amazon.com
     L.A.-area food trucks
     • A vibrant scene, and Twitter is a primary means of communication
     • Food trucks are usually identified in tweets by their @handle
     • Compared to reviews from Yelp
  18. Question-Asking Dashboard
     • Keyword-filtered stream
     • User's recent tweets
     • Responses
  19. Quality Evaluation Methods
     Human coding
     • Twitter responses & traditional reviews
     • Relevance of response
     • Information types
     Information entropy
     • Comparison between Twitter/Amazon and Twitter/Yelp
     Mechanical Turk questionnaire
     • Usefulness, objectiveness, trustworthiness, balance, readability
  20. Results…
  21. Suspended!
     • The @tsatracking account (no-incentive condition) was given a 1-week suspension after asking 150 questions
     • It did not violate the Twitter Terms of Use
     • It exceeded a threshold for blocks or messages marked as spam
     • Neither of our other accounts was suspended
  22. Results
     Key question: Will people respond to questions from strangers?
     Answer:
     • 42% response rate
     • 44% of answers received within 30 minutes
     • No significant difference between any conditions (taking the suspension into account)
  23. Follow-up Question Results
     • Significant differences between all 4 questions (H = 50.12, df = 3, p < 0.0001, Kruskal-Wallis) and between just the 3 follow-ups (H = 25.46, df = 2, p < 0.0001, Kruskal-Wallis)
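A Kruskal-Wallis test like the one reported above can be run with SciPy. The response samples below are invented placeholders, not the study's data; only the test procedure is illustrated:

```python
from scipy import stats

# Hypothetical per-question response samples (e.g., response times in minutes).
# These numbers are illustrative only, not from the study.
q1 = [5, 9, 12, 30, 4]
q2 = [22, 35, 40, 18, 60]
q3 = [8, 11, 7, 14, 10]
q4 = [55, 70, 44, 90, 62]

# Kruskal-Wallis H-test: a nonparametric test for whether the samples
# come from the same distribution (df = number of groups - 1).
h, p = stats.kruskal(q1, q2, q3, q4)
print(f"H = {h:.2f}, df = 3, p = {p:.4f}")
```

The test is rank-based, so it suits skewed response-time data where a parametric ANOVA's normality assumption would be doubtful.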
  24. Qualitative Results
     • The @tsatracker account picked up 16 followers
     • Many positive responses ("this will be great for travelers")
     • Only one slightly negative response ("this is creepy"), but that person also gave an answer
  25. Response Quality (Coding)
     Overall breakdown:
                  Response  Relevant  Wrong Answer but  Multi-Message  Avg. Info     Off-topic Info
                  Count     Answer    Useful Info       Response       per Response  per Response
     Tablet       258       71%       19%               3%             1.82          0.48
     Food Truck   111       82%       6%                6%             1.69          0.46
     Irrelevant-response breakdown:
                  # Irrelevant  No          Didn't know    Thinks we're
                  Responses     Experience  or understand  a bot
     Tablet       75            63%         11%            7%
     Food Truck   20            25%         30%            0%
  26. Information Entropy
     The Twitter method is dependent on the questions:
     • Despite trying to base our questions on the contents of Amazon reviews, those reviews still contained more information
     • Our food truck questions went beyond Yelp reviews
     All information (bits)*:
       Tablet: Amazon 4.25, Twitter 3.76    Food Truck: Yelp 3.27, Twitter 4.24
     Information in both sets (bits)*:
       Tablet: Amazon 4.09, Twitter 3.73    Food Truck: Yelp 3.27, Twitter 3.02
     * Calculated using a shrinkage entropy estimator
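The footnote mentions a shrinkage entropy estimator. One common choice is the James-Stein-type estimator of Hausser & Strimmer, which shrinks observed frequencies toward the uniform distribution before computing entropy; the slide does not say exactly which estimator was used, so the sketch below is one plausible reading:

```python
import numpy as np

def shrinkage_entropy_bits(counts, k=None):
    """James-Stein shrinkage entropy estimate in bits.

    counts: observed counts per category (e.g., information-type counts).
    k: number of categories (defaults to len(counts)).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    k = k or len(counts)
    target = 1.0 / k                 # shrink toward the uniform distribution
    p_ml = counts / n                # maximum-likelihood frequencies
    # Optimal shrinkage intensity (clipped to [0, 1]).
    num = 1.0 - np.sum(p_ml ** 2)
    den = (n - 1.0) * np.sum((target - p_ml) ** 2)
    lam = 1.0 if den == 0 else min(1.0, num / den)
    p_shrink = lam * target + (1.0 - lam) * p_ml
    p_shrink = p_shrink[p_shrink > 0]
    return float(-np.sum(p_shrink * np.log2(p_shrink)))

print(shrinkage_entropy_bits([25, 25, 25, 25]))  # uniform over 4 bins → 2.0
```

Shrinkage matters here because the review samples are small: the plain plug-in estimator is biased downward for sparse counts, which would understate the information in whichever source had fewer responses.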
  27. Mechanical Turk Evaluation
     Tablet (Amazon vs. Twitter, Mann-Whitney):
     • Usefulness: 3.19 vs. 2.64 (U = 868.5, p = 0.006)
     • Objectiveness: 2.94 vs. 2.53 (U = 814.5, p = 0.042)
     • Trustworthiness: 2.94 vs. 2.39 (U = 861.0, p = 0.008)
     • Balance: 3.00 vs. 2.11 (U = 936.0, p = 0.001)
     • Readability: 2.92 vs. 2.61 (U = 741.5, p = 0.270)
     Food Truck (Yelp vs. Twitter, Mann-Whitney):
     • Usefulness: 2.86 vs. 2.56 (U = 734.0, p = 0.309)
     • Objectiveness: 2.17 vs. 2.08 (U = 672.0, p = 0.783)
     • Trustworthiness: 2.58 vs. 2.14 (U = 800.5, p = 0.071)
     • Balance: 2.47 vs. 1.72 (U = 921.0, p = 0.002)
     • Readability: 2.89 vs. 2.11 (U = 896.0, p = 0.004)
     Completion times:
     • Tablet: 26.5 minutes for Amazon, 25.8 minutes for Twitter
     • Food Truck: 19.9 minutes for Yelp, 16.8 minutes for Twitter
     Explanation of results:
     • Few concrete examples of experiences in Twitter answers
     • Limited information about Twitter reviewers
  28. Conclusions
     • Response rates seem to be around 40-45%, independent of domain
     • Providing an explanation or incentive does not seem to affect response
     • Answer quality is fairly high, at 70-80%
     • Quality seems to be tied to targeting accuracy
     • Most "bad" answers come from people who didn't know the answer to our question
  29. The Targeting Challenge
     • Finding relevant people
     • Identifying the most likely responders
  30. Filtering for Relevance
     Chen et al., to appear at ICWSM 2013
  31. Problem
  32. Problem
     It's difficult to identify relevant tweets from keywords alone: keyword filters are either underspecified or overspecified. Bridging the gap with regular expressions and rules can take hours or days of authoring by a human expert.
  33. Use the Crowd to Create Intelligent Filters
     1. Collect a sample of relevant tweets (keyword filter)
     2. Collect ground-truth filter results from the crowd on Mechanical Turk
     3. Machine-learn filter models using SPSS Modeler
     4. Use the models to filter tweets in real time
     5. Social media dashboard users can react faster and more accurately
     Each filter requires a few hours and ~$35 to create
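The pipeline above can be sketched end to end. The study trained its models in SPSS Modeler; the sketch below substitutes scikit-learn (TF-IDF features plus logistic regression), with a tiny invented label set standing in for the Mechanical Turk ground truth:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Steps 1-2: keyword-filtered tweets with crowd labels
# (1 = relevant customer-service tweet, 0 = irrelevant keyword match).
# These examples are invented for illustration.
tweets = [
    "delta lost my bag again, worst airline",
    "on hold with @delta for an hour, help",
    "my flight was delayed and nobody told me",
    "watching a documentary about the delta of the nile",
    "delta of this option contract is 0.6",
    "the river delta is beautiful this time of year",
]
labels = [1, 1, 1, 0, 0, 0]

# Step 3: learn the filter model from the labeled sample.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(tweets, labels)

# Step 4: apply the model to the incoming stream in (near) real time.
incoming = ["@delta my connection is delayed two hours",
            "kayaking in the mekong delta"]
pred = model.predict(incoming)
```

The point of the design is that the expensive human judgment happens once, on a labeled sample, rather than continuously on the live stream; the model then generalizes the crowd's notion of relevance to unseen tweets.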
  34. Evaluation
     Scenarios
     • Customer service for Delta Airlines & Hertz Rent-a-Car
     • Relevance filter + opinion filter
     Evaluation questions
     • Quality of the crowd-labeled ground truth
     • Effectiveness of the filter algorithms
     • Usefulness to users in filtering tasks
  36. Evaluation Results
     • Label agreement (pair-wise Cohen's kappa)
     • Performance
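Pair-wise Cohen's kappa measures how much two labelers agree beyond what their label frequencies would produce by chance. A self-contained sketch for two annotators' binary relevance labels (the label sequences are invented):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(a) == len(b)
    n = len(a)
    # p_o: observed raw agreement.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # p_e: agreement expected by chance, from each annotator's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum((ca[k] / n) * (cb[k] / n) for k in set(a) | set(b))
    return (observed - expected) / (1 - expected)

rater1 = [1, 1, 0, 1, 0, 0, 1, 0]
rater2 = [1, 1, 0, 0, 0, 0, 1, 1]
print(round(cohens_kappa(rater1, rater2), 3))  # prints 0.5
```

With more than two annotators, "pair-wise" kappa means computing this statistic for every annotator pair and averaging, which is presumably what the slide's label-agreement numbers report.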
  37. Likelihood of Response
     Mahmud et al., IUI 2013
  38. Baseline: Human Judgment
     How well can humans identify "willingness" and "readiness"? Two surveys on CrowdFlower:
     • Willingness: asked each participant to predict whether a displayed Twitter user would be willing to respond to a given question, assuming that the user has the ability to answer
     • Readiness: asked each participant to predict how soon (e.g., 1 hour, 1 day) the person would respond, assuming that s/he is willing to respond
     100 participants for the first survey and 50 for the second
  39. Willingness Result
     • 29% correct when only the tweets of a user were displayed
     • 38% correct when the complete Twitter profile was displayed
     • Selecting users for question asking is also difficult for the crowd
  40. Readiness Result
     58% correct, compared with the ground truth
     • For example, if a participant predicted that person X would respond within an hour, but the response was not received in time, the prediction was counted as incorrect
  41. Features for Machine Selection
     • Responsiveness (e.g., mean response time to other users' mentions)
     • Profile (e.g., use of particular words in the profile description)
     • Activity (e.g., number of tweets)
     • Readiness (e.g., percentage of tweets occurring at each hour of the day)
     • Personality
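The "activity" and "readiness" features above can be sketched from a user's tweet timestamps. The feature names and input format here are our own; the study's exact feature definitions may differ:

```python
from datetime import datetime

def extract_features(timestamps):
    """Activity and readiness features from one user's tweet timestamps.

    timestamps: list of datetime objects, one per tweet.
    Returns the tweet count plus the fraction of tweets in each hour of the day.
    """
    features = {"num_tweets": len(timestamps)}
    for hour in range(24):
        count = sum(1 for t in timestamps if t.hour == hour)
        features[f"hour_{hour:02d}_frac"] = count / len(timestamps)
    return features

# Toy example: a user who tweets mostly around 9am.
ts = [datetime(2013, 4, 24, 9, 15), datetime(2013, 4, 24, 9, 50),
      datetime(2013, 4, 25, 18, 5), datetime(2013, 4, 26, 9, 30)]
feats = extract_features(ts)
```

The hour-of-day fractions capture when a user is typically active, which is exactly the kind of signal needed to predict whether a question asked right now is likely to be seen in time.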
  42. Deriving Personality from Social Media
     The Big Five traits: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism
     Map word use, frequency, and correlation with the Big Five using psycholinguistic dictionaries (LIWC++) [Tausczik & Pennebaker 2010; Yarkoni 2010]
     Example for "Agreeableness": wonderful (0.28), together (0.26), …, porn (-0.25), cost (-0.23)
  43. Feature Analysis
     Significant features
     • For the TSA-tracker-1 dataset, we found 42 significant features (FDR 2.8%)
     • For the Product dataset, we found 31 significant features (FDR 4.2%)
     • For the TSA-tracker-2 dataset, we found 11 significant features (FDR 11.2%)
     The top-4 discriminative features were found through extensive experiments.
  44. Evaluation
     Evaluating the prediction model:
                    TSA-tracker-1        TSA-tracker-2        Product
                    SVM      Logistic    SVM      Logistic    SVM      Logistic
     Precision      0.62     0.60        0.52     0.51        0.67     0.654
     Recall         0.63     0.61        0.53     0.55        0.71     0.62
     F1             0.625    0.606       0.525    0.53        0.689    0.625
     AUC            0.657    0.599       0.592    0.514       0.716    0.55
     Comparison of average response rates using different approaches:
                            TSA-tracker-1    TSA-tracker-2    Product
     Baseline               42%              33%              31%
     Binary classification  62%              52%              67%
     Top-K selection        61%              54%              67%
     Our algorithm          67%              56%              69%
  45. Live Experiment
     Method
     • Used Twitter's Search API and a set of rules to find 500 users who mentioned an airport and 500 who mentioned the product
     • Randomly asked 100 users for the security wait time
     • Used our algorithm to identify 100 users for questioning from the remaining 400 users
  46. Conclusions
  47. Engagement Continuum: manual → assisted → automatic
     Manual (humans do all the work):
     • Keyword filtering
     • Unstructured engagement
     • Domain-independent analytics
     Assisted (analytics streamline decisions: "press button to engage"):
     • Scenario-based filtering
     • Smart engagement recommendations (e.g., based on location inference)
     • Customizable engagement scenarios
     • Domain-specific analytics
     Automatic (system-driven engagement):
     • Rule-based engagement
     • Exception identification and notification
     • Intelligent transition to human-driven engagement as desired
  48. To wrap up…
     • Interaction on social media enables a variety of applications
     • Collecting information using this approach is feasible and produces quality information
     • Targeting can be improved flexibly through crowd-assisted filtering
     • Likely responders can be identified from their social media content
  49. Thanks!
     For more information, contact: Jeffrey
  50. Samsung Galaxy Tab 10.1
     Questions
     • 2 iterations
     • First-round questions based on CNET and Engadget editor reviews
     • Second round modified based on the top 10 user reviews of tablets on Amazon.com
     Procedure
     • Identified users from the real-time Twitter stream
     • Keywords, then manual human inspection
     • Questions chosen semi-randomly based on the content of the tweet and the answers received so far
     Round #2 questions
  51. Samsung Galaxy Tab 10.1 Example
  52. Los Angeles Food Trucks
     Questions
     • Based on our own intuitions about what information would be interesting
     Procedure
     • Identified users from the real-time Twitter stream
     • @handles for food trucks, then manual human inspection
     • Asked questions for 90 LA food trucks active at the time of the study
     • Most traffic was concentrated around just three (Kogi Taco, Grilled Cheese, and GrillEmAll), and we report results only for those
  53. Los Angeles Food Trucks Example
  54. Example: Real-time Viewer Insight
     • Real-time collection of relevant users
     • Comprehensive user profile built from historical social media
       • Rule-based facts (e.g., lives in Chicago, IL; loves Deception on NBC)
       • Deep traits from psycholinguistic analysis
     • Directed engagement to learn more
       • Collect opinion about a new show
       • Market a new product
       • Etc.
  55. TSA Tracker on Twitter