Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub

Slides for the paper presented at the IEEE/WIC/ACM International Conference on Web Intelligence (WI ’19), Thessaloniki, Greece.

  1. Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
     Sameera Horawalavithana*, Abhishek Bhattacharjee, Renhao Liu, Nazim Choudhury, Lawrence O. Hall, Adriana Iamnitchi
     University of South Florida
     IEEE/WIC/ACM International Conference on Web Intelligence, Thessaloniki, Greece
  2. Security Vulnerabilities
     ❏ Identified by CVE (Common Vulnerabilities and Exposures) identifiers:
        ❏ Each publicly known security vulnerability is uniquely identified by an ID of the form CVE-YYYY-NNNN (a four-digit year followed by a sequence number of four or more digits)
     ❏ Formally recorded in the National Vulnerability Database (NVD):
        ❏ “U.S. government repository of standards based vulnerability management data represented using the Security Content Automation Protocol (SCAP)”
     ❏ Discussed on social media
     [Figure: CVEs published in NVD over time]
  3. Research Questions
     1) What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub?
     2) Can the software development activities in GitHub be predicted from the discussions on Reddit and Twitter?
  4. Outline
     ❏ Dataset
     ❏ Data analysis
        ❏ CVE mentions in Reddit and Twitter
        ❏ CVE mentions in GitHub events
     ❏ Predicting GitHub activities by using Reddit and Twitter activity signals
     ❏ Summary
  5. Datasets
     ❏ Two social-media platforms: Reddit and Twitter
     ❏ One collaborative software-development platform: GitHub
     ❏ 18 months of records: March 2016 to August 2017
     ❏ Data filtered with the regular expression CVE-\d{4}-\d{4}, which matches CVE identifiers that appear in Reddit posts and comments, in tweets and Twitter replies, and in GitHub event descriptions
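The filtering step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names `extract_cve_ids` and `filter_records` and the sample records are hypothetical, and the pattern is the one stated on the slide (with its backslashes restored). As the comment notes, that pattern captures only the first four digits of sequence numbers longer than four digits.

```python
import re

# Pattern as written on the slide: "CVE-", a 4-digit year, "-", 4 digits.
# Caveat: since 2014 the sequence part may be longer than four digits
# (e.g. CVE-2017-10000); this pattern then captures only its first four digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4}")


def extract_cve_ids(text: str) -> list[str]:
    """Return every CVE identifier matched in one post, comment, tweet, reply,
    or GitHub event description."""
    return CVE_PATTERN.findall(text)


def filter_records(texts: list[str]) -> list[tuple[str, list[str]]]:
    """Keep only the records that mention at least one CVE identifier,
    paired with the identifiers they mention."""
    kept = []
    for text in texts:
        ids = extract_cve_ids(text)
        if ids:
            kept.append((text, ids))
    return kept


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    sample = [
        "Has anyone patched CVE-2015-1805 yet?",
        "Nothing security-related in this post.",
        "Merged fixes for CVE-2016-0001 and CVE-2016-0002",
    ]
    for text, ids in filter_records(sample):
        print(ids)
```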
  6. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub?
     a. How do social media platforms compare in terms of the volume of security vulnerability mentions?
  7. CVE Mentions in Reddit and Twitter (1)
     ❏ 10,257 CVE identifiers are mentioned in our Reddit/Twitter dataset:
        ❏ 95% of the CVE identifiers are mentioned only on Twitter
        ❏ 0.5% are mentioned only on Reddit
        ❏ 4.5% are mentioned on both platforms
     More security vulnerabilities are discussed on Twitter than on Reddit.
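The three percentages on this slide partition the union of CVE identifiers seen on either platform. A minimal sketch of that breakdown with Python sets follows; the function name and the toy identifiers are hypothetical, and it assumes each platform's mentions have already been reduced to a set of IDs.

```python
def platform_breakdown(twitter_ids: set[str], reddit_ids: set[str]) -> dict[str, float]:
    """Percentage of all mentioned CVE identifiers that appear only on Twitter,
    only on Reddit, or on both platforms."""
    all_ids = twitter_ids | reddit_ids
    if not all_ids:
        return {"twitter_only": 0.0, "reddit_only": 0.0, "both": 0.0}
    total = len(all_ids)
    return {
        "twitter_only": 100 * len(twitter_ids - reddit_ids) / total,
        "reddit_only": 100 * len(reddit_ids - twitter_ids) / total,
        "both": 100 * len(twitter_ids & reddit_ids) / total,
    }


# Toy sets with hypothetical IDs. On the paper's data the slide reports
# 95%, 0.5%, and 4.5% of 10,257 identifiers.
print(platform_breakdown({"CVE-A", "CVE-B", "CVE-C"}, {"CVE-C", "CVE-D"}))
# → {'twitter_only': 50.0, 'reddit_only': 25.0, 'both': 25.0}
```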
  8. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub?
     a. How do social media platforms compare in terms of the volume of security vulnerability mentions?
     b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day?
  9. CVE Mentions in Reddit and Twitter (2)
     [Figure panels: Reddit, Twitter]
     ❏ Day 0 represents the NVD public disclosure date
     ❏ The published date of each message (post/tweet) is measured relative to the NVD public disclosure date of the CVE identifier it mentions
     Both platforms show a peak in the mentions of CVE identifiers near their public disclosure.
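The x-axis of these plots is a day offset: message date minus NVD publication date, with day 0 at disclosure. A minimal sketch of computing that offset and counting messages per offset is shown below; the function names, the input shapes, and the example CVE and dates are hypothetical.

```python
from collections import Counter
from datetime import date


def days_relative_to_disclosure(message_day: date, nvd_published: date) -> int:
    """Offset in days of a message from the NVD publication date (day 0).
    Negative values mean the message appeared before public disclosure."""
    return (message_day - nvd_published).days


def offset_histogram(mentions, nvd_dates):
    """mentions: iterable of (cve_id, message_day) pairs;
    nvd_dates: dict mapping cve_id -> NVD publication date.
    Returns a Counter of day offsets, i.e. the data behind a timing plot."""
    counts = Counter()
    for cve_id, message_day in mentions:
        published = nvd_dates.get(cve_id)
        if published is not None:  # skip IDs with no NVD record
            counts[days_relative_to_disclosure(message_day, published)] += 1
    return counts


# Hypothetical mentions of one CVE disclosed on 2017-03-14.
nvd = {"CVE-X": date(2017, 3, 14)}
msgs = [("CVE-X", date(2017, 3, 10)), ("CVE-X", date(2017, 3, 14)), ("CVE-X", date(2017, 3, 15))]
print(sorted(offset_histogram(msgs, nvd).items()))
# → [(-4, 1), (0, 1), (1, 1)]
```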
  10. CVE Mentions in Reddit and Twitter (3)
      [Figure panels: Reddit, Twitter]
      ❏ Timing of social-media messages with respect to Reddit subreddits and Twitter hashtags
      Of the CVE identifiers discussed on Reddit, the majority are discussed before public disclosure.
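One possible way to quantify "discussed before public disclosure" per subreddit or hashtag is sketched below. The operationalization is an assumption, not taken from the slides: a CVE counts as early in a community if at least one message there predates its NVD publication date. The function name, community names, and dates are hypothetical.

```python
from collections import defaultdict
from datetime import date


def share_before_disclosure(mentions, nvd_dates):
    """mentions: iterable of (community, cve_id, message_day), where community is a
    subreddit or a hashtag; nvd_dates: cve_id -> NVD publication date.
    Returns community -> fraction of its CVE IDs mentioned there at least once
    before public disclosure (assumed definition)."""
    seen = defaultdict(set)   # community -> CVE IDs mentioned in it
    early = defaultdict(set)  # community -> CVE IDs mentioned in it before day 0
    for community, cve_id, message_day in mentions:
        published = nvd_dates.get(cve_id)
        if published is None:
            continue
        seen[community].add(cve_id)
        if message_day < published:
            early[community].add(cve_id)
    return {c: len(early[c]) / len(ids) for c, ids in seen.items()}


# Hypothetical data, for illustration only.
nvd = {"CVE-X": date(2017, 1, 3), "CVE-Y": date(2017, 2, 1)}
mentions = [
    ("netsec", "CVE-X", date(2017, 1, 1)),
    ("netsec", "CVE-Y", date(2017, 2, 10)),
    ("#infosec", "CVE-X", date(2017, 1, 4)),
]
print(share_before_disclosure(mentions, nvd))
# → {'netsec': 0.5, '#infosec': 0.0}
```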
  11. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub?
      a. How do social media platforms compare in terms of the volume of security vulnerability mentions?
      b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day?
      c. How does the severity of the security vulnerabilities affect the timing of vulnerability mentions on the two platforms?
  12. CVE Mentions in Reddit and Twitter (4)
      ❏ Timing of social-media messages with respect to the severity of the mentioned security vulnerabilities
      ❏ We identified bot-driven communities using the textual description of each subreddit
      ❏ We used BotHunter to detect Twitter bot users
      Early discussions of high-severity CVE identifiers occur on Reddit.
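The slides do not say which severity scale was used. The sketch below assumes the CVSS v3.x qualitative rating scale (None, Low, Medium, High, Critical); NVD also publishes CVSS v2 ratings with different bands, so treat the thresholds as an assumption. It groups hypothetical per-CVE day offsets by severity and reports the median offset, where a more negative median means earlier discussion. Running it separately on Reddit and Twitter offsets would give the per-platform comparison.

```python
from statistics import median


def cvss_v3_severity(score: float) -> str:
    """Qualitative severity rating of a CVSS v3.x base score (0.0-10.0)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def median_offset_by_severity(offsets, scores):
    """offsets: cve_id -> list of day offsets of its mentions (message day minus
    NVD day); scores: cve_id -> CVSS v3.x base score (assumed available).
    Returns severity -> median offset; more negative = discussed earlier."""
    grouped = {}
    for cve_id, days in offsets.items():
        if cve_id in scores and days:
            grouped.setdefault(cvss_v3_severity(scores[cve_id]), []).extend(days)
    return {severity: median(days) for severity, days in grouped.items()}


# Hypothetical offsets and scores, for illustration only.
print(median_offset_by_severity({"CVE-X": [-3, -1, 2], "CVE-Y": [5]},
                                {"CVE-X": 9.8, "CVE-Y": 5.0}))
# → {'Critical': -1, 'Medium': 5}
```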
  13. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub?
      a. How do social media platforms compare in terms of the volume of security vulnerability mentions?
      b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day?
      c. How does the severity of the security vulnerabilities affect the timing of vulnerability mentions on the two platforms?
      d. How do CVE mentions spread over social-media platforms?
  14. CVE Mentions in Reddit and Twitter (5)
      ❏ Three cascade types:
         ❏ Before (completed): cascades start and end before the public disclosure day of the mentioned CVE
         ❏ Before (not completed): cascades start before the public disclosure day of the mentioned CVE but continue after it
         ❏ After: cascades start after the public disclosure day of the mentioned CVE
      Reddit discussions go viral before the CVE public disclosure; Twitter re-shares emerge after it.
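The three cascade types depend only on the first and last message of a cascade relative to the disclosure day. A minimal sketch of that rule follows; the function name is hypothetical, and treating a message posted on the disclosure day itself as "not before" is an assumption the slides leave open.

```python
from datetime import date


def cascade_type(message_days: list[date], disclosure_day: date) -> str:
    """Classify one cascade (a root post/tweet plus its comments, replies, or
    retweets) by when it starts and ends relative to the public disclosure day
    of the CVE it mentions. Messages on the disclosure day itself are treated as
    not-before (an assumption)."""
    if not message_days:
        raise ValueError("a cascade needs at least one message")
    start, end = min(message_days), max(message_days)
    if end < disclosure_day:
        return "before (completed)"
    if start < disclosure_day:
        return "before (not completed)"
    return "after"


day0 = date(2017, 3, 14)  # hypothetical disclosure day
print(cascade_type([date(2017, 3, 1), date(2017, 3, 5)], day0))    # → before (completed)
print(cascade_type([date(2017, 3, 10), date(2017, 3, 20)], day0))  # → before (not completed)
print(cascade_type([date(2017, 3, 15), date(2017, 3, 16)], day0))  # → after
```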
  15. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub?
      a. How do social media platforms compare in terms of the volume of security vulnerability mentions?
      b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day?
      c. How does the severity of the security vulnerabilities affect the timing of vulnerability mentions on the two platforms?
      d. How do CVE mentions spread over social-media platforms?
      e. What types of sentiments fuel these discussions?
  16. CVE Mentions in Reddit and Twitter (6)
      ● Uncertainty analysis of Reddit comments
         ○ Used a pre-trained machine learning model (Yu et al. [1]) to classify whether each comment is certain or uncertain about the subject of the conversation
      ● Reaction types of Twitter replies
         ○ Used a pre-trained machine learning model (Glenski et al. [2]) to classify each reply as an answer, elaboration, question, appreciation, negative reaction, or agreement
      Reddit comments are more often classified as “certain” than “uncertain”. The majority of Twitter replies are classified as “elaboration”, followed by “answer”, both before and after public disclosure.
      [1] Ning Yu and Graham Horwood. 2018. Veracity Enriched Event Extraction. In 2018 International Workshop on Social Sensing (SocialSens). 3–3.
      [2] Maria Glenski, Tim Weninger, and Svitlana Volkova. 2018. Identifying and Understanding User Reactions to Deceptive and Trusted Social News Sources. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 176–181.
  17. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub?
      a. How do social media platforms compare in terms of the volume of security vulnerability mentions?
      b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day?
      c. How does the severity of the security vulnerabilities affect the timing of vulnerability mentions on the two platforms?
      d. How do CVE mentions spread over social-media platforms?
      e. How does GitHub activity depend on the public disclosure of security vulnerabilities?
  18. CVE Mentions in GitHub Events (1)
      ❏ 10,502 CVE identifiers are mentioned in GitHub events
      ❏ Overlap with the CVE identifiers mentioned on the social-media platforms:
         ❏ 40% with Twitter
         ❏ 3% with Reddit
      The CVE identifiers involved in software development overlap moderately with those mentioned on Twitter.
  19. CVE Mentions in GitHub Events (2)
      ❏ The majority of GitHub events mention only one CVE identifier
      ❏ One CVE identifier (CVE-2015-1805) is mentioned in more than 3,000 GitHub events
         ❏ CVE-2015-1805 was published in NVD around August 2015
         ❏ We noticed an increased volume of related GitHub activity in early 2016
      ❏ What really happened?
  20. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub?
      a. How do social media platforms compare in terms of the volume of security vulnerability mentions?
      b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day?
      c. How does the severity of the security vulnerabilities affect the timing of vulnerability mentions on the two platforms?
      d. How do CVE mentions spread over social-media platforms?
      e. How does GitHub activity depend on the public disclosure of security vulnerabilities?
      f. How does GitHub activity correlate with the number of CVEs for the most vulnerable repositories?
  21. CVE Mentions in GitHub Events (3)
      ❏ We selected the two most vulnerable repositories, ranked by the number of associated CVE identifiers
      ❏ We compare the monthly number of mentioned CVEs with three GitHub time series: Forks, Watches, and Push Events
      ❏ We use Dynamic Time Warping (DTW) to measure the similarity between each GitHub event time series and the CVE time series
      Push Events follow the pattern of CVE mentions most closely.
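For readers unfamiliar with DTW, the following is a minimal textbook implementation, not the authors' code. The absolute-difference local cost and the z-normalization helper are assumptions; the slides do not say whether the series were rescaled before comparison, but CVE counts and push-event counts differ greatly in magnitude, so some rescaling is a common choice.

```python
import math
from statistics import mean, pstdev


def zscore(series: list[float]) -> list[float]:
    """Rescale a series to zero mean and unit variance so that series with very
    different magnitudes (CVE counts vs. push events) can be compared."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(v - mu) / sigma for v in series]


def dtw_distance(x: list[float], y: list[float]) -> float:
    """Classic O(len(x) * len(y)) dynamic time warping distance with an
    absolute-difference local cost. Smaller means more similar."""
    n, m = len(x), len(y)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]  # accumulated cost
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


print(dtw_distance([0, 0, 1], [0, 1]))  # → 0.0 (the repeated 0 is warped onto one point)
```

To reproduce the comparison on this slide, one would compute `dtw_distance(zscore(cve_monthly), zscore(series))` for each of the Forks, Watches, and Push Events series of a repository (hypothetical variable names) and pick the smallest distance.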
  22. RQ2: Can the software development activities in GitHub be predicted from the discussions on Reddit and Twitter?
  23. Predicting GitHub Activities
      A GitHub event is defined as (U, R, E_p, T_h), where:
      ❏ U: user
      ❏ R: repository
      ❏ E_p: type of action (PushEvent, PullRequestEvent, IssuesEvent, ForkEvent, WatchEvent, CommitEvent, ReleaseEvent)
      ❏ T_h: the event timestamp, in hours
      [Figure: timeline of the data split. Reddit and Twitter activity provides the features; GitHub events are the target. Training: January 2017 to May 2017; validation: June and July 2017; testing: August 2017]
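The event tuple and the time-based split can be made concrete with a small data structure. This is a sketch under stated assumptions: the class and function names, the field names, and the rule of truncating timestamps to the hour for T_h are hypothetical choices, while the event types and the month boundaries come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime

# E_p values listed on the slide.
EVENT_TYPES = frozenset({
    "PushEvent", "PullRequestEvent", "IssuesEvent", "ForkEvent",
    "WatchEvent", "CommitEvent", "ReleaseEvent",
})


@dataclass(frozen=True)
class GitHubEvent:
    """One event (U, R, E_p, T_h)."""
    user: str         # U
    repo: str         # R
    event_type: str   # E_p
    hour: datetime    # T_h: timestamp truncated to the hour (assumed)


def make_event(user: str, repo: str, event_type: str, ts: datetime) -> GitHubEvent:
    """Validate the action type and truncate the timestamp to the hour."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    return GitHubEvent(user, repo, event_type, ts.replace(minute=0, second=0, microsecond=0))


def split_of(ts: datetime) -> str:
    """Assign a timestamp to the split on the slide: Jan-May 2017 training,
    Jun-Jul 2017 validation, Aug 2017 testing."""
    if ts.year == 2017:
        if 1 <= ts.month <= 5:
            return "train"
        if ts.month in (6, 7):
            return "validation"
        if ts.month == 8:
            return "test"
    return "unused"


# Hypothetical user and repository names.
e = make_event("user-1", "org/repo", "PushEvent", datetime(2017, 8, 3, 14, 27))
print(e.hour, split_of(e.hour))  # → 2017-08-03 14:00:00 test
```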
  24. Predicting GitHub Activities: Features and Approach
      ❏ Reddit time-series features
         ❏ Daily count of posts
         ❏ Daily count of active authors
         ❏ Daily count of active subreddits
         ❏ Daily count of comments
      ❏ Twitter time-series features
         ❏ Daily count of tweets
         ❏ Daily count of tweeting users
         ❏ Daily count of retweets
         ❏ Daily count of retweeting users
      [Figure: approach diagram. Labels: Reddit/Twitter time-series features; NN; number of GitHub events in a day; likelihood of a user performing an action on a repository in an hour; LSTM; hourly GitHub activities of a user on a repository]
      Renhao Liu, Frederick Mubang, Lawrence Hall*, Sameera Horawalavithana, Adriana Iamnitchi, John Skvoretz. Predicting Longitudinal User Activity at Fine Time Granularity in Online Collaborative Platforms. IEEE International Conference on Systems, Man, and Cybernetics (SMC), Bari, Italy, 2019.
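The Reddit features listed above are simple daily aggregates. A minimal sketch of computing them follows; the input shape, the function name, and the definition of "active" (at least one post or comment that day) are assumptions, and the Twitter features (tweets, tweeting users, retweets, retweeting users) would follow the same pattern.

```python
from collections import defaultdict
from datetime import datetime


def reddit_daily_features(posts, comments):
    """posts, comments: iterables of (timestamp, author, subreddit).
    Returns {day: feature dict}, sorted by day, with the four Reddit features on
    the slide. A user or subreddit is counted as active on a day if it has at
    least one post or comment that day (assumed definition)."""
    days = defaultdict(lambda: {"posts": 0, "comments": 0,
                                "authors": set(), "subreddits": set()})
    for kind, records in (("posts", posts), ("comments", comments)):
        for ts, author, subreddit in records:
            day = days[ts.date()]
            day[kind] += 1
            day["authors"].add(author)
            day["subreddits"].add(subreddit)
    return {
        d: {
            "posts": v["posts"],
            "comments": v["comments"],
            "active_authors": len(v["authors"]),
            "active_subreddits": len(v["subreddits"]),
        }
        for d, v in sorted(days.items())
    }


# Hypothetical records, for illustration only.
posts = [(datetime(2017, 1, 1, 9), "alice", "netsec")]
comments = [(datetime(2017, 1, 1, 10), "bob", "netsec"),
            (datetime(2017, 1, 2, 8), "alice", "sysadmin")]
for day, feats in reddit_daily_features(posts, comments).items():
    print(day, feats)
```

The resulting per-day vectors are what a model such as the NN on this slide would consume as input.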
  25. Predicting GitHub Activities: Results
      [Figure: two reported results. JS-divergence 0.0020 with R² 0.6067; JS-divergence 0.0029 with R² 0.6300]
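The slides report Jensen-Shannon divergence and R² but not which distributions were compared or which log base the evaluation code (provided by Pacific Northwest National Laboratory) used. The sketch below only shows the standard definitions of both metrics, assuming base-2 logarithms for JS divergence so that it lies in [0, 1]; it is not the evaluation code itself.

```python
import math


def js_divergence(p: list[float], q: list[float], base: float = 2.0) -> float:
    """Jensen-Shannon divergence between two discrete distributions given as
    non-negative weights over the same bins (normalized here). Base 2 bounds
    the result to [0, 1]."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same bins")
    sp, sq = sum(p), sum(q)
    p = [v / sp for v in p]
    q = [v / sq for v in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i = 0 contribute 0; b_i > 0 whenever a_i > 0 because b = m.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty series of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant y_true")
    return 1 - ss_res / ss_tot


print(js_divergence([1, 0], [0, 1]))         # → 1.0 (disjoint supports)
print(r_squared([1, 2, 3], [2, 2, 2]))       # → 0.0 (no better than the mean)
```

A JS divergence close to 0, as in the reported 0.0020 and 0.0029, means the predicted and observed distributions are nearly identical; R² measures how much of the variance in the observed series the predictions explain.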
  26. Predicting GitHub Activities: Relevance
      ❏ Why is predicting GitHub activities important?
         ❏ GitHub hosts many exploits and patches related to CVE identifiers
         ❏ Predictions might reflect the software development activities of an attacker who develops an exploit
         ❏ Predictions can be used to estimate the availability of a patch for a security vulnerability
      Reddit/Twitter features are helpful for predicting the number of GitHub events. It is more difficult to predict the identity of the user and the repository in an event.
  27. Summary
      ❏ We characterized a use-case scenario in which diverse online platforms are interconnected, such that the activities on one platform can be predicted from the activities on the others.
      Practical implications of our findings:
      ❏ Advance or calibrate security alert tools using information from multiple social media platforms.
      ❏ Better coordinate software development activities using lessons learned from social-media information.
  28. Acknowledgements
      ❏ Funded by the DARPA SocialSim Program and the Air Force Research Laboratory
      ❏ Data: Leidos, Netanomics
      ❏ Evaluation code provided by Pacific Northwest National Laboratory
  29. Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
      Sameera Horawalavithana* (sameera1@mail.usf.edu)
      Check out our project @SocialSim
  30. Backup
  31. Thank you.
      sameera1@mail.usf.edu
      Check out our project @SocialSim
  32. Related Work
      ❏ Different types of security vulnerability information are available on Twitter (Syed et al., Sauerwein et al.):
         ❏ Descriptions of vulnerabilities (e.g., URLs to security mailing lists, expert blogs, etc.)
         ❏ Demonstrations of exploits (e.g., URLs to YouTube videos)
         ❏ Unofficial proposals of countermeasures (e.g., URLs to security blogs describing unofficial patches)
         ❏ Announcements of patch releases (e.g., URLs to official blog posts by vendors)
      ❏ Automatically discovering security threats from independent platforms
         ❏ E.g., Twitter, the Dark Web (Sapienza et al.), security blogs (Mittal et al.), etc.
