Reverse Engineering Twitter Hashtag Algorithm

Twitter today markets itself through neat infographics that explain how its main features, specifically the hashtag, should be used. The phrase used in most infographics is "contributing value to conversation". Since no engineering logic is supplied along with the phrase, there is no way to know what it means in practice. This paper proposes a model for collecting, processing, and visualizing data on how the hashtag algorithm behaves relative to a user's own account. A software implementation is also provided.

Published in: Technology, Business

  1. Contributions
     1. a brand-new method for crawling social networks
     2. a framework that social media practitioners can use to evaluate impact
        ◦ impact = probability that a tweet shows up in hashtag streams
     3. an example analysis based on the above
     The goal is to reverse engineer the hashtag algorithm.
     M. Zhanikeev -- maratishe@gmail.com -- Reverse Engineering Twitter Hashtag Algorithm -- http://bit.do/marat140614
  2. Twitter Hashtags
  3. Hashtag Streams
     Hashtag streams are streams of tweets that show up when people search Twitter.
     • a hashtag is the best way to search
     • note: Twitter is trying to phase out hashtags (and mentions), so search may find tweets even without hashtags
     Hashtags are important because social media practitioners use them to promote events, products, etc.
  4. Twitter Infographics
     • Twitter promotes hashtags by releasing infographics
     • the content is very confusing for social media practitioners
     • it is hard to translate into numbers, concrete actions, etc.
  5. Twitter Infographics (2): Zoom-Ins
  6. Twitter Infographics (3): Cleanup
     [Flowchart with nodes: Decide, New Tag?, Will you promote it?, Will you add value?, Add to hashtag stream, Out; edges labeled YES/NO]
     • with all the garbage cleaned out, the decision algorithm is much clearer
     • it does not clarify what "value" or "promotion" mean in practice
     • since Twitter does not help, we need to reverse engineer the algorithm
  7. Crawling vs Sampling
  8. Crawling: Practice and Problems
     • traditional crawling is done on the command line using wget or curl
     • problem 1: Twitter and others try to avoid being crawled and have created fences (login, cookies, forwarding, JS post-loading, etc.)
     • problem 2: official APIs are very restricted; the Twitter API does not cover search
     • problem 3: it is hard to use other services while crawling, e.g. Twitter + YouTube
  9. Snowball Sampling
     • the new way to look at sampling
     • done in cycles:
       1. sample something
       2. select a wanted subset
       3. sample the subset at a higher depth
       4. repeat
     • snowball sampling is directly applicable to crawling Twitter
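The sampling cycle on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `expand` and `select` callbacks, the `max_depth` limit, and the toy hashtag graph are hypothetical names and data introduced here, not part of Twaater.

```python
from typing import Callable, Dict, Iterable, List


def snowball_sample(
    seeds: Iterable[str],
    expand: Callable[[str], List[str]],
    select: Callable[[str], bool],
    max_depth: int = 3,
) -> Dict[str, int]:
    """Return every sampled item mapped to the depth at which it was first seen."""
    seen: Dict[str, int] = {}
    frontier = list(seeds)
    for depth in range(max_depth + 1):
        next_frontier: List[str] = []
        for item in frontier:
            if item in seen:
                continue
            seen[item] = depth                      # step 1: sample something
            if select(item):                        # step 2: keep the wanted subset
                next_frontier.extend(expand(item))  # step 3: sample it one level deeper
        if not next_frontier:
            break
        frontier = next_frontier                    # step 4: repeat
    return seen


# Hypothetical toy data: which hashtags lead to which other hashtags.
TOY_GRAPH = {"#a": ["#b", "#c"], "#b": ["#d"], "#c": ["#e"], "#d": [], "#e": []}

result = snowball_sample(
    seeds=["#a"],
    expand=lambda tag: TOY_GRAPH.get(tag, []),
    select=lambda tag: tag != "#c",  # pretend "#c" is not worth following
    max_depth=2,
)
```

Here `#c` is sampled but not expanded, so `#e` is never reached; that pruning is what separates snowball sampling from an exhaustive crawl.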
  10. Crawling: Two Approaches
      • approach 1 (traditional): use APIs (HTTP, OAuth, etc.) to get data
      • approach 2 (proposed): attach your robot to a working Twitter web app in the browser
        ◦ interaction is via clicks, just like a human
        ◦ more natural
  11. Implementation
  12. Implementation: Twaater
      • Chrome extension, auto-triggered by loading a Twitter page
      • stores logs in one's own Dropbox drive
  13. Implementation: Twaater
      • https://github.com/maratishe/twaater
      • personalization
        1. change the Dropbox auth tokens to point to one's own drive
        2. log in to Twitter under one's own account and let Twaater pick up from there
      • runs continuously; close the browser to stop it
  14. Example Analysis
  15. Twaater: Metric Space
      • tweet metrics/counts: links, retweets, favorites, tags, tagstatus, mentions
      • plus account metrics/counts: tweets, following, followers
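One way to picture this metric space is as a pair of records merged into a single nine-dimensional vector. The sketch below is an assumption about representation only: the class names, the integer types, and the `metric_vector` helper are hypothetical and do not come from the Twaater source.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class TweetMetrics:
    """Per-tweet counts named on the slide."""
    links: int = 0
    retweets: int = 0
    favorites: int = 0
    tags: int = 0
    tagstatus: int = 0  # assumed numeric; the results slide glosses it as topic popularity
    mentions: int = 0


@dataclass
class AccountMetrics:
    """Per-account counts named on the slide."""
    tweets: int = 0
    following: int = 0
    followers: int = 0


def metric_vector(tweet: TweetMetrics, account: AccountMetrics) -> Dict[str, int]:
    """Merge tweet and account metrics into one flat observation."""
    return {**asdict(tweet), **asdict(account)}


vector = metric_vector(
    TweetMetrics(links=1, favorites=5, tags=2),
    AccountMetrics(tweets=300, following=50, followers=100),
)
```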
  16. Twaater: Tweet Timeline
      • all metrics change over time
      • the timeline of one tweet is very important
      • it aggregates the tweet's status and its position (if any) in hashtag streams
        ◦ for each hashtag contained in the tweet
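A tweet timeline of this kind could be held as a time-ordered list of snapshots, each pairing the metric values with the tweet's position in the stream of every hashtag it contains. The structure below is a hypothetical sketch: the field names and the use of `None` for "not in the stream" are assumptions made here, not the log format Twaater actually writes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    timestamp: float
    metrics: Dict[str, float]             # e.g. {"favorites": 3}
    positions: Dict[str, Optional[int]]   # hashtag -> list position, None if absent


@dataclass
class TweetTimeline:
    tweet_id: str
    snapshots: List[Snapshot] = field(default_factory=list)

    def add(self, snapshot: Snapshot) -> None:
        """Insert a snapshot and keep the timeline ordered by time."""
        self.snapshots.append(snapshot)
        self.snapshots.sort(key=lambda s: s.timestamp)

    def series(self, metric: str) -> List[float]:
        """Time series of one metric (0.0 where it was not recorded)."""
        return [s.metrics.get(metric, 0.0) for s in self.snapshots]

    def position_series(self, hashtag: str) -> List[Optional[int]]:
        """Time series of the tweet's position in one hashtag stream."""
        return [s.positions.get(hashtag) for s in self.snapshots]


timeline = TweetTimeline("tweet-1")
timeline.add(Snapshot(2.0, {"favorites": 3}, {"#x": 4}))
timeline.add(Snapshot(1.0, {"favorites": 1}, {"#x": None}))
favorites_series = timeline.series("favorites")
position_series = timeline.position_series("#x")
```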
  17. Analysis: Rules and CCF
      • lists: time series of metrics versus time series of positions in hashtag streams
        ◦ ccf(metric values, hashtag positions)
        ◦ note that there are "all" and "top" hashtag streams
      • selection: pick the max in a time series, and filter lists by threshold
        ◦ thresholds are different for each metric
        ◦ helps to filter out noise or focus only on large (important) values
      • view showing up in hashtag streams as binary (yes/no) versus analog (list position) values
      • extras (future work): analysis along the timeline, much higher complexity
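The thresholding and correlation rules on this slide can be approximated as follows. Several things here are assumptions rather than the author's method: the slides do not say which lag or estimator `ccf` uses, so the sketch takes the zero-lag correlation coefficient; in analog mode it drops samples where the tweet is absent from the stream; and the sample data are invented for illustration.

```python
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Zero-lag correlation coefficient; returns 0.0 when it is undefined."""
    n = len(x)
    if n < 2:
        return 0.0
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def thresholded_ccf(
    metric: Sequence[float],
    positions: Sequence[Optional[int]],
    threshold: float,
    binary: bool = False,
) -> float:
    """Correlate a metric with stream positions, keeping values >= threshold * max."""
    if not metric:
        return 0.0
    cutoff = threshold * max(metric)
    xs: List[float] = []
    ys: List[float] = []
    for value, position in zip(metric, positions):
        if value < cutoff:
            continue  # filtered out as noise
        if binary:
            xs.append(value)
            ys.append(0.0 if position is None else 1.0)  # in the stream: yes/no
        elif position is not None:
            xs.append(value)
            ys.append(float(position))  # analog: actual list position
    return pearson(xs, ys)


# Invented sample: favorites over time and the tweet's position in one stream.
favorites = [0, 1, 2, 10, 20, 40]
positions = [None, 30, 25, 12, 5, 1]
analog = thresholded_ccf(favorites, positions, threshold=0.2)
binary = thresholded_ccf(favorites, positions, threshold=0.0, binary=True)
```

If position 1 is taken to be the top of the list, a negative analog value means that larger metric values coincide with positions closer to the top.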
  18. Analysis: Results
      [Figure: four panels (all/binary, top/binary, all/actual, top/actual); x-axis: Threshold (% of max), 0 to 0.5; y-axis: ccf, -1.05 to 1.05; series: tags, links, mentions, retweets, favorites, tweets, following, followers, tagstatus]
      • binary: useless
      • analog: filtering out very low values (most of them) helps reveal good correlation
        ◦ for example, favorites contribute to tweets showing up closer to the top of lists
      • account metrics: show no effect
      • among large values, tagstatus (topic popularity) becomes prominent
  19. Future Work
      • Twaater is own-centric, which makes it possible to crowdsource/distribute crawling
        ◦ fits the description of snowball sampling
      • 2nd-order statistics (CCF) did not reveal a simple hashtag algorithm
        ◦ more complicated models have to be tested
      • alternatively, smarter filtering can also help
        ◦ select a subset of important tweets to subject to analysis
  20. That's all, thank you
