Who are We Studying: Humans or Bots?

Presentation at the Workshop on "Small Data and Big Data Controversies and Alternatives: Perspectives from The Sage Handbook of Social Media Research Methods" with Anabel Quan-Haase, Luke Sloan, Diane Rasmussen Pennington, et al.
LINK: http://sched.co/7G5N

Published in: Social Media

  1. Who are We Studying: Bots or Humans? Anatoliy Gruzd (gruzd@ryerson.ca, @gruzd), Canada Research Chair in Social Media Data Stewardship; Associate Professor, Ted Rogers School of Management; Director, Social Media Lab, Ryerson University
  2. Research at the Social Media Lab: • Social Media Analytics • Social Media Data Stewardship • Networked Influence • Online Political Engagement • Learning Analytics • Social Media & Health
  3. Outline: • Social Media Analytics • The Rise of the Bots • Case Study: Social Media Use during the 2014 EuroMaidan Revolution in Ukraine • Detecting Bots • Next Steps
  4. Growth of Social Media Data: 1.7B users • 500M users • 300M users
  5. More Ways to Access Social Media Data: • Self-collected/self-reported data • Public APIs • Data resellers
  6. More Tools for Big Data Analytics: • Cloud & distributed computing • Data & information organization • Analytics • Visualization
  7. How to Make Sense of Social Media Data? Data -> Visualizations -> Understanding
  8. How to Make Sense of Social Media Data? Example: Geo-based Analysis
  9. How to Make Sense of Social Media Data? Example: Geo-based + Content Analysis, Tracking Hate Speech on Twitter (Source: http://www.fenuxe.com/tag/geo-coded)
  10. Social Media Data Analytics Challenges: • The Rise of Social Bots • Who are we studying: Humans or Bots?
  11. Social Bot: software designed to act on the Internet with some level of autonomy
  12. Different Types of Bots (Grier et al., 2010): • Free music, games, books, downloads • Jewelry, electronics, vehicles • Contest, gambling, prizes • Finance, loans, realty • Increase Twitter following • Diet • Adult
  13. Sample Twitter Bots
  14. Social Bot Example: Microsoft’s AI Twitter chatbot
  15. Social Bot Example: Microsoft’s AI Twitter chatbot
  16. Platform-reported & Estimated % of Bots: 1.5B users, 2% fake (30,000,000); 300M users, 5% fake (15,000,000); 400M users, 8% fake (32,000,000) … but is that everything? (Source: http://blogs.wsj.com/digits/2015/06/30/fake-accounts-still-plague-instagram-despite-purge-study-finds/)
  17. Why does it matter if there are bots, spammers and fakers in our datasets? Popular topics mentioned in the 14,500 abstracts of journal & conference papers on “social media” or “social networking websites” published since 1999 (Gruzd, 2015)
  18. Why does it matter if there are bots, spammers and fakers in our datasets? How many of these 14,500 papers took into account the presence and influence of bots, spammers or fakers? (Gruzd, 2015)
  19. Case Study: 2014 EuroMaidan Revolution in Ukraine. November 21, 2013: the Ukrainian government suspended the trade & association agreement with the EU. (Photo: "2014-02-21 11-04 Euromaidan in Kiev" by Amakuha, licensed under CC BY-SA 3.0 via Wikimedia.) Gruzd, A., & Tsyganova, K. (2015). Information Wars and Online Activism During the 2013/2014 Crisis in Ukraine: Examining the Social Structures of Pro- and Anti-Maidan Groups. Policy & Internet, 7(2), 121–158. http://doi.org/10.1002/poi3.91
  20. About VKontakte: #1 Social Networking Website in Ukraine (Source: http://en.wikipedia.org)
  21. Example: VK Group User Interface: posts, likes, comments… discussion board, links & media files
  22. Data Collection
      • Groups: the two most popular public Pro-Maidan groups and the two most popular public Anti-Maidan groups
      • Period: February 18 – May 25, 2015
      • Source: VK Public API
          Group   Stance        Nodes     Connections
          PRO1    Pro-Maidan    141,542   338,344
          PRO2    Pro-Maidan     96,402   221,452
          ANTI1   Anti-Maidan    60,506   280,678
          ANTI2   Anti-Maidan    69,029   192,273
      • Data collected: Communities (information about groups and group members); Wall (posts and comments); Likes (“likes” that members and visitors leave on posts); Friends (group members’ friendship relations)
  23. What can we learn from the structures of friendship networks? Anti-EuroMaidan group vs. Pro-EuroMaidan group
  24. 2014 EuroMaidan Revolution Example: VK Group, Pro-EuroMaidan, Subgroup 3 (Marketing, Spam). The percentage of spam accounts is higher among members with friends (15%) than among all group members (5%).
  25. Reported & Estimated % of Bot Accounts: 1.5B users, 2% fake; 300M users, 5% fake; 400M users, 8% fake … but is that everything? (Source: http://blogs.wsj.com/digits/2015/06/30/fake-accounts-still-plague-instagram-despite-purge-study-finds/)
  26. Detecting Bots…
  27. Detecting Bots…
      • Photo: color & edge histograms; Color & Edge Directivity Descriptor (CEDD); image similarity
      • Message: sensitive words; URLs; duplicates; #hashtags; @replies
      • Poster: username; creation date; engagement level
      • Social network: # friends; # following; in/out-degree centrality; clustering
      (Yardi et al., 2009; Grier et al., 2010; Wang, 2010; Jin et al., 2011)
  28. Example: Fake Twitter account @gruzd? Not really!
  29. Detecting Bots… (feature overview as in slide 27)
  30. Detecting Bots… Fakers like to post on Fridays! Frequency of Twitter posts, fake accounts vs. real accounts (Gurajala et al., 2015)
  31. Detecting Bots… (feature overview as in slide 27)
  32. Detecting Bots… (feature overview as in slide 27)
  33. Detecting Bots… Fake accounts tend to be created later in the week. Frequency of creation days for Twitter accounts, fakers vs. real accounts (Gurajala et al., 2015)
  34. Detecting Bots… (feature overview as in slide 27)
  35. Detecting Bots… (feature overview as in slide 27)
  36. Detecting Bots… Using Social Network Analysis. Example: a spam account attempting to take over a conference hashtag
  37. Detecting Bots…
  38. The challenge is... to introduce these emerging techniques to researchers who are increasingly relying on social media data as their go-to data source! (Image © Chris Allen, licensed under Creative Commons)
  39. Calling on computational researchers to: • Develop and share principles, protocols, tools and techniques for handling and cleaning social media data. • Develop stronger partnerships with social science researchers to start discussing how to handle bot-like accounts properly. • …because the nature of bots and their influence on users’ online behavior is not just a computational issue but also a social science issue.
  40. Questions to consider: Remove or Keep? Scenario 1: Marketing-related bots that do not interact with anyone else in the study group but are just there to increase their follower base. Scenario 2: Automated Twitter accounts designed to repost certain news stories.
  41. Questions to consider: Remove or Keep? Scenario 1: Marketing-related bots that do not interact with anyone else in the study group but are just there to increase their follower base. Scenario 2: Automated Twitter accounts designed to repost Trump’s tweets.
  42. 2017 #SMSociety Theme: Social Media for Social Good or Evil (https://socialmediaandsociety.org)
  43. Who are We Studying: Bots or Humans? (closing slide; title and contact details as in slide 1)
  44. References
      • Agichtein, E., Castillo, C., Donato, D., Gionis, A., & Mishne, G. (2008). Finding High-quality Content in Social Media. In Proceedings of the 2008 International Conference on Web Search and Data Mining (pp. 183–194). New York, NY, USA: ACM.
      • Grier, C., Thomas, K., Paxson, V., & Zhang, M. (2010). @spam: The underground on 140 characters or less (p. 27). ACM Press.
      • Gruzd, A., & Roy, J. (2014). Investigating Political Polarization on Twitter: A Canadian Perspective. Policy & Internet, 6(1), 28–45. http://doi.org/10.1002/1944-2866.POI354
      • Gruzd, A., & Tsyganova, K. (2015). Information Wars and Online Activism During the 2013/2014 Crisis in Ukraine: Examining the Social Structures of Pro- and Anti-Maidan Groups. Policy & Internet, 7(2), 121–158. http://doi.org/10.1002/poi3.91
      • Wang, A. H. (2010). Don’t follow me: Spam detection in Twitter. In Proceedings of the 2010 International Conference on Security and Cryptography (SECRYPT) (pp. 1–10). IEEE.
      • Yardi, S., Romero, D., Schoenebeck, G., & Boyd, D. (2009). Detecting spam in a Twitter network. First Monday, 15(1).
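Slide 16 pairs three user bases with platform-reported or estimated fake-account rates, and the absolute counts in parentheses follow from simple multiplication. A minimal Python sketch reproducing that arithmetic; the function name `estimated_fake_accounts` is illustrative, and the entries are keyed by user base only because the transcript does not name the platforms:

```python
# Reproduce the fake-account arithmetic on slide 16.

def estimated_fake_accounts(users: int, fake_pct: int) -> int:
    """Return the number of fake accounts implied by a user base and a whole-number percentage."""
    return users * fake_pct // 100  # integer arithmetic avoids floating-point rounding


# (user base, reported or estimated % of fake accounts), as paired on slide 16
USER_BASES = [
    (1_500_000_000, 2),
    (300_000_000, 5),
    (400_000_000, 8),
]

total = 0
for users, pct in USER_BASES:
    fakes = estimated_fake_accounts(users, pct)
    total += fakes
    print(f"{users:,} users x {pct}% = {fakes:,} fake accounts")
print(f"Total across the three user bases: {total:,}")
```

The three estimates sum to 77,000,000 accounts.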
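Slide 22 reports node and connection counts for the four groups' friendship networks. The transcript does not say exactly how those counts were derived, so the sketch below only illustrates one plausible way to turn per-member friend lists (such as the Friends data gathered through the VK Public API) into an undirected network and count its nodes and connections. The input format, the rule that every ID that appears becomes a node, and the toy IDs are all assumptions:

```python
def build_friendship_network(
    friend_lists: dict[str, list[str]],
) -> tuple[dict[str, set[str]], set[tuple[str, str]]]:
    """Build an undirected friendship network from per-member friend lists.

    friend_lists maps a member ID to the IDs of that member's friends
    (hypothetical input format). Every ID that appears becomes a node.
    Returns (adjacency, edges), where each edge is a sorted (a, b) pair.
    """
    adjacency: dict[str, set[str]] = {}
    edges: set[tuple[str, str]] = set()
    for member, friends in friend_lists.items():
        adjacency.setdefault(member, set())  # members without friends are still nodes
        for friend in friends:
            if friend == member:
                continue  # ignore self-loops
            a, b = sorted((member, friend))
            edges.add((a, b))  # "u1 lists u2" and "u2 lists u1" collapse into one tie
            adjacency[member].add(friend)
            adjacency.setdefault(friend, set()).add(member)
    return adjacency, edges


friend_lists = {
    "u1": ["u2", "u3"],
    "u2": ["u1"],  # the same u1-u2 tie seen from the other side
    "u3": [],
    "u4": [],      # a member with no friends in the data
}
adjacency, edges = build_friendship_network(friend_lists)
print(len(adjacency), "nodes,", len(edges), "connections")  # 4 nodes, 2 connections
```

Deduplicating reciprocal listings matters here: counting each member's friend list naively would report three connections instead of two.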
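Slide 24 compares the share of spam accounts among all members of one Pro-EuroMaidan group with the share among members who have friendship ties. The calculation is a pair of proportions over two subsets. The sketch below shows it on a hypothetical toy group whose counts are chosen only to mirror the 5% and 15% on the slide, and it assumes each member already carries an `is_spam` label (the transcript does not describe how accounts were labelled):

```python
def spam_share(members: list[dict]) -> float:
    """Fraction of the given accounts flagged as spam (0.0 for an empty list)."""
    if not members:
        return 0.0
    return sum(m["is_spam"] for m in members) / len(members)


# Toy group of 100 hypothetical members, built to mirror the 5% / 15% contrast on slide 24.
members = (
    [{"has_friends": True, "is_spam": True}] * 3
    + [{"has_friends": True, "is_spam": False}] * 17
    + [{"has_friends": False, "is_spam": True}] * 2
    + [{"has_friends": False, "is_spam": False}] * 78
)
with_friends = [m for m in members if m["has_friends"]]

print(f"All members: {spam_share(members):.0%}")                # All members: 5%
print(f"Members with friends: {spam_share(with_friends):.0%}")  # Members with friends: 15%
```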
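Slide 27 groups candidate bot-detection signals into photo, message, poster and social-network features. The sketch below computes a handful of message, poster and network-size signals for a single account. The `Account` record, the feature names and the regular expressions are hypothetical choices for illustration, not the feature definitions used in the cited studies; photo features such as CEDD need an image-processing library and are left out:

```python
import re
from collections import Counter
from dataclasses import dataclass, field

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")


@dataclass
class Account:
    """Hypothetical minimal account record; real platform data has many more fields."""
    username: str
    followers: int
    following: int
    posts: list[str] = field(default_factory=list)


def account_features(acct: Account) -> dict[str, float]:
    """Compute a few message, poster and network-size signals for one account."""
    n = len(acct.posts) or 1  # avoid division by zero for accounts with no posts
    normalized = Counter(p.strip().lower() for p in acct.posts)
    duplicates = sum(c - 1 for c in normalized.values())  # copies beyond the first
    return {
        # Message features
        "url_share": sum(bool(URL_RE.search(p)) for p in acct.posts) / n,
        "duplicate_share": duplicates / n,
        "hashtags_per_post": sum(len(HASHTAG_RE.findall(p)) for p in acct.posts) / n,
        # Posts starting with "@" are treated as @replies (a simplification)
        "reply_share": sum(p.lstrip().startswith("@") for p in acct.posts) / n,
        # Poster feature: share of digits in the username
        "username_digit_share": sum(ch.isdigit() for ch in acct.username) / max(len(acct.username), 1),
        # Social network feature: accounts followed relative to followers
        "following_to_followers": acct.following / max(acct.followers, 1),
    }


example = Account(
    username="promo2017",
    followers=10,
    following=2000,
    posts=[
        "Win a prize! http://x.co #win #free",
        "Win a prize! http://x.co #win #free",
        "@alice check this http://x.co",
    ],
)
print(account_features(example))
```

Features like these would typically feed a classifier trained on labelled accounts; on their own they are descriptive signals, not a verdict.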
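Slide 30 reports that fakers like to post on Fridays (Gurajala et al., 2015). Comparing accounts on this signal starts from a day-of-week distribution of their posts. A minimal sketch, assuming post timestamps are already parsed into `datetime` objects in one consistent time zone (the transcript does not say which time zone the study used); the dates in the example are hypothetical:

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_shares(timestamps: list[datetime]) -> dict[str, float]:
    """Return the share of timestamps that fall on each day of the week."""
    counts = Counter(ts.weekday() for ts in timestamps)  # Monday == 0 ... Sunday == 6
    total = len(timestamps) or 1  # avoid division by zero for an empty list
    return {DAY_NAMES[d]: counts[d] / total for d in range(7)}


# Hypothetical post times (October 3, 2016 was a Monday; October 7 and 14 were Fridays)
posts = [
    datetime(2016, 10, 7, 9, 15),
    datetime(2016, 10, 7, 18, 40),
    datetime(2016, 10, 3, 12, 0),
    datetime(2016, 10, 14, 8, 5),
]
shares = weekday_shares(posts)
print(shares["Fri"], shares["Mon"])  # 0.75 0.25
```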
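Slide 33 reports that fake accounts tend to be created later in the week. Given account creation dates for two groups, the sketch below compares their day-of-week distributions with total variation distance (half the sum of absolute differences) and reports the share of accounts created from Thursday onward. Both the distance measure and the Thursday cut-off are illustrative assumptions, not necessarily what Gurajala et al. (2015) used, and the dates are hypothetical:

```python
from collections import Counter
from datetime import date


def weekday_distribution(days: list[date]) -> list[float]:
    """Fraction of dates on each weekday; index 0 = Monday ... 6 = Sunday."""
    counts = Counter(d.weekday() for d in days)
    total = len(days) or 1
    return [counts[i] / total for i in range(7)]


def total_variation(p: list[float], q: list[float]) -> float:
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def late_week_share(dist: list[float], first_day: int = 3) -> float:
    """Share of dates from `first_day` (default 3 = Thursday) through Sunday; the cut-off is an assumption."""
    return sum(dist[first_day:])


# Hypothetical creation dates (October 3, 2016 was a Monday)
fake = [date(2016, 10, 4), date(2016, 10, 6), date(2016, 10, 7), date(2016, 10, 8)]
real = [date(2016, 10, 3), date(2016, 10, 4), date(2016, 10, 5), date(2016, 10, 7)]

p, q = weekday_distribution(fake), weekday_distribution(real)
print(late_week_share(p), late_week_share(q), total_variation(p, q))  # 0.75 0.25 0.5
```

With real data, a distance like this would usually be paired with a significance test, since small samples can produce large distances by chance.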
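Slide 36 shows a spam account trying to take over a conference hashtag, detected with social network analysis. One pattern such an account might produce in a mention network is many outgoing @mentions and no incoming ones. The sketch below flags that pattern from (source, target) mention pairs; the function name `flag_broadcasters` and the `min_targets` threshold are hypothetical, and any rule like this would need validating against labelled accounts:

```python
from collections import defaultdict


def flag_broadcasters(mentions: list[tuple[str, str]], min_targets: int = 3) -> list[str]:
    """Flag accounts that @mention at least `min_targets` distinct users but are never mentioned.

    mentions: (source, target) pairs, e.g. one pair per @mention in tweets using a hashtag.
    min_targets is a hypothetical threshold, not a validated cut-off.
    """
    out_targets: defaultdict[str, set[str]] = defaultdict(set)
    in_sources: defaultdict[str, set[str]] = defaultdict(set)
    for src, dst in mentions:
        if src == dst:
            continue  # ignore self-mentions
        out_targets[src].add(dst)  # distinct out-neighbours (out-degree)
        in_sources[dst].add(src)   # distinct in-neighbours (in-degree)
    return sorted(
        acct
        for acct, targets in out_targets.items()
        if len(targets) >= min_targets and not in_sources.get(acct)
    )


mentions = [
    ("spam1", "alice"), ("spam1", "bob"), ("spam1", "carol"), ("spam1", "dave"),
    ("alice", "bob"), ("bob", "alice"), ("carol", "alice"),
]
print(flag_broadcasters(mentions))  # ['spam1']
```

Legitimate accounts (for example, an organizer announcing speakers) can show the same shape, so a flag like this marks accounts for inspection rather than removal.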
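Slides 40 and 41 leave the remove-or-keep question open. One possible way to operationalize Scenario 1 is to separate bot-flagged accounts that have no ties to anyone else in the study group from those that do, so an analysis can be run both with and without them. The sketch assumes bot labels already exist and treats any pair in a hypothetical `interactions` list as a tie; it does not settle the question the slides raise:

```python
def split_bots_by_interaction(
    bot_ids: set[str], interactions: list[tuple[str, str]]
) -> tuple[set[str], set[str]]:
    """Split bot-flagged accounts into (isolated, interacting) within the study group.

    isolated: bots with no tie to any other account in `interactions`
              (the Scenario 1 pattern on slides 40-41).
    interacting: bots that do have ties, where removal would also change the network.
    """
    connected: set[str] = set()
    for a, b in interactions:
        if a != b:  # a self-tie is not an interaction with someone else
            connected.update((a, b))
    return bot_ids - connected, bot_ids & connected


members = ["alice", "bob", "bot1", "bot2"]
bots = {"bot1", "bot2"}
interactions = [("alice", "bob"), ("bot2", "alice")]

isolated, interacting = split_bots_by_interaction(bots, interactions)
kept_if_removed = [m for m in members if m not in isolated]
print(isolated, interacting)  # {'bot1'} {'bot2'}
print(kept_if_removed)        # ['alice', 'bob', 'bot2']
```

Removing isolated bots leaves the tie structure unchanged but alters membership counts and any per-member averages, which is one reason to report results both ways.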
