
Sifting through twitter

a quickly thrown-together presentation about data mining twitter for vulnerability references, presented at ruxmon.


  1. sifting through twitter, ruxmon aug 2011
  2. introduction <ul><li>problem #1 </li></ul><ul><ul><li>vulnerability data feeds can lack important information </li></ul></ul><ul><li>problem #2 </li></ul><ul><ul><li>there is far too much noise on twitter from itsec people </li></ul></ul><ul><li>sifting through twitter </li></ul><ul><ul><li>twitter is now an integral part of the security community </li></ul></ul><ul><ul><li>many researchers post interesting data every day </li></ul></ul><ul><ul><li>tonnes of possible people to follow </li></ul></ul><ul><ul><li>lots of juicy information amongst the rough </li></ul></ul><ul><ul><li>not enough time to sift through it all </li></ul></ul>
  3. general twitter observations <ul><li>the bad </li></ul><ul><ul><li>following is too selective </li></ul></ul><ul><ul><li>language barriers </li></ul></ul><ul><ul><li>too much noise </li></ul></ul><ul><ul><li>the UI is limited </li></ul></ul><ul><ul><li>too much CJ'ing </li></ul></ul><ul><ul><li>the word tweet </li></ul></ul><ul><li>the good </li></ul><ul><ul><li>endless amounts of data </li></ul></ul><ul><ul><li>small, concise messages </li></ul></ul><ul><ul><li>can be queried via an API </li></ul></ul><ul><ul><li>interesting relational data </li></ul></ul><ul><ul><li>unicode text </li></ul></ul>
  4. the idea <ul><ul><li>build something to mine twitter for vulnerability references </li></ul></ul><ul><ul><li>keep it as simple as possible </li></ul></ul><ul><ul><li>automate collating data for vulnerability IDs </li></ul></ul><ul><ul><li>find what the community is saying about bugs </li></ul></ul><ul><ul><li>build a user database of itsec people on twitter </li></ul></ul>
  5. entry point <ul><ul><li>use the twitter search API to find vuln. references </li></ul></ul><ul><ul><ul><li>the twitter search API doesn't support regex or hyphens </li></ul></ul></ul><ul><ul><ul><li>but vuln. ref. prefixes are quite distinctive (CVE, VMSA, etc.) </li></ul></ul></ul><ul><ul><ul><li>limited (but sufficient) number of API calls per hour </li></ul></ul></ul><ul><li>instant interesting results </li></ul><ul><ul><li>researchers posting their 2c on bugs, plus exploits, advisories and links to blog analysis </li></ul></ul><ul><ul><li>vendors and security bots posting advisories </li></ul></ul><ul><ul><li>home users and AV messages; sysadmins talking about patches </li></ul></ul><ul><ul><li>lots of foreign language codes, but mining still works fine </li></ul></ul>
  6. presentation <ul><ul><li>lots of relational, time-based data; now to present it! </li></ul></ul><ul><ul><li>decided to use SIMILE Exhibit widgets from MIT </li></ul></ul><ul><ul><li>excellent for browsing, searching and filtering relational data </li></ul></ul><ul><ul><li>supports timelines, timeplots and maps, all easy to use </li></ul></ul>
  7. similar items and language barriers <ul><ul><li>retweets and the like create many near-duplicate items </li></ul></ul><ul><ul><li>added LCS to measure likeness and link similar items </li></ul></ul><ul><ul><li>finding RTs while mining needs built-in logic: people add their 2c to tweets, tinyurls differ for the same Location, people add hashtags </li></ul></ul><ul><ul><li>mapping similar items allows items to be weighted </li></ul></ul><ul><ul><li>interestingly, a lot of foreign-language text is returned </li></ul></ul><ul><ul><li>hooked the UI into google translate to translate inline </li></ul></ul>
  8. extending userinfo <ul><ul><li>the user list grows steadily as people reference vulns </li></ul></ul><ul><ul><li>users generally have interesting metadata, such as a website (blog, company, etc.) and sometimes a location string </li></ul></ul><ul><ul><li>added a call to users/show.json to maintain userinfo </li></ul></ul><ul><ul><li>hooked into the google maps API to try to resolve coords </li></ul></ul>
  9. demo
  10. todo list <ul><li>expanding scope: </li></ul><ul><ul><li>use the user list to naturally branch out the search criteria </li></ul></ul><ul><ul><li>find trending items amongst security users </li></ul></ul><ul><ul><li>skip the manual following process entirely </li></ul></ul><ul><li>misc.: </li></ul><ul><ul><li>linking metadata from a vuln. inventory </li></ul></ul><ul><ul><li>basic data feeds and web services </li></ul></ul><ul><ul><li>performance/scalability improvements </li></ul></ul>
  11. wrapping up <ul><ul><li>this has demonstrated there's potential for this data to be used for vulnerability-related feeds and intel </li></ul></ul><ul><ul><li>good analysis posted by researchers who find or research bugs is invaluable </li></ul></ul><ul><ul><li>some of the geomap data is very interesting (communities, etc.) </li></ul></ul><ul><ul><li>hoping to have time to add the todo items soon </li></ul></ul><ul><ul><li>interested to hear feedback and ideas </li></ul></ul><ul><ul><li>this was all relatively quick and easy to throw together </li></ul></ul><ul><ul><ul><li>makes you wonder what well-resourced folks are doing? </li></ul></ul></ul>
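The "entry point" slide notes that the search API can't take a regex or hyphens, but that vulnerability reference prefixes are distinctive enough to search on. That suggests a two-step approach: search on a bare prefix such as `CVE`, then pull the exact identifiers out of each returned tweet locally. The Python sketch below covers only that local step. The ID patterns are assumptions: CVE and VMSA come from the slide, and the `MS` bulletin format is an illustrative extra. The search request itself is omitted.

```python
import re

# Assumed identifier formats: CVE and VMSA are named on the slide, MS bulletins
# (e.g. MS11-058) are an illustrative addition. CVE sequence numbers can exceed 4 digits.
VULN_ID_PATTERN = re.compile(
    r"\b(CVE-\d{4}-\d{4,}|VMSA-\d{4}-\d{4}|MS\d{2}-\d{3})\b",
    re.IGNORECASE,
)


def extract_vuln_ids(text):
    """Return distinct vulnerability IDs in a tweet, upper-cased, in order of first appearance."""
    seen = []
    for match in VULN_ID_PATTERN.findall(text):
        vuln_id = match.upper()  # tweets mix 'cve-' and 'CVE-'
        if vuln_id not in seen:
            seen.append(vuln_id)
    return seen


tweet = "New PoC for cve-2011-2523 and CVE-2011-2523, see also MS11-058 / VMSA-2011-0009"
print(extract_vuln_ids(tweet))
# ['CVE-2011-2523', 'MS11-058', 'VMSA-2011-0009']
```

Normalising case and de-duplicating per tweet keeps one tweet from being counted twice against the same ID when it repeats the reference.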
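The "presentation" slide hands the collected data to SIMILE Exhibit. Assuming Exhibit's JSON data format works as commonly described (a top-level `items` array whose entries carry a `label`, optionally an `id` and `type`, plus optional `types` and `properties` sections, where a `valueType` of `date` lets timeline views use a property), a converter from mined tweets might look like the sketch below. The input record shape (`id`, `user`, `text`, `created`, `vuln_ids`) and the `vulnid` property name are hypothetical; check the field names against the Exhibit documentation before relying on them.

```python
import json


def to_exhibit(tweets):
    """Convert tweet records into an Exhibit-style JSON document (a sketch).

    Each tweet is assumed to be a dict with hypothetical keys
    'id', 'user', 'text', 'created' (ISO 8601 string) and 'vuln_ids' (list).
    """
    items = []
    for t in tweets:
        items.append({
            "id": "tweet-%s" % t["id"],  # unique item id
            "label": t["text"][:60],     # short label for list views
            "type": "Tweet",
            "user": t["user"],
            "created": t["created"],     # declared as a date below for timelines
            "vulnid": t["vuln_ids"],     # a list becomes a multi-valued property
        })
    return {
        "types": {"Tweet": {"pluralLabel": "Tweets"}},
        "properties": {
            "created": {"valueType": "date"},
            "user": {"valueType": "text"},
        },
        "items": items,
    }


doc = to_exhibit([{
    "id": 1,
    "user": "alice",
    "text": "CVE-2011-2523 vsftpd backdoor analysis",
    "created": "2011-07-04T10:00:00Z",
    "vuln_ids": ["CVE-2011-2523"],
}])
print(json.dumps(doc, indent=2))
```

Keeping `vulnid` as a list means a tweet that mentions several bugs can show up under each of them when browsing by vulnerability ID, which is where the "collating data for vulnerability IDs" idea meets the UI.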
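The "similar items" slide links near-duplicate tweets with LCS and weights items by how many similar copies they have. The slide doesn't say whether LCS means longest common subsequence or substring. The sketch below assumes a character-level subsequence and normalises away the RT noise the slide lists: RT/via markers, differing short URLs and added hashtags. It scores two tweets as `2 * LCS / (len(a) + len(b))` and greedily clusters items above a threshold, so cluster size can serve as a weight. The regexes, the scoring formula and the 0.8 threshold are all illustrative assumptions.

```python
import re

# Hypothetical normalisation rules based on the slide's list of RT noise.
RT_MARKER = re.compile(r"\b(?:RT|via)\s+@\w+:?", re.IGNORECASE)
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")


def normalise(text):
    """Strip RT markers, URLs and hashtags, then lower-case and collapse whitespace."""
    text = RT_MARKER.sub("", text)
    text = URL.sub("", text)  # shortened URLs differ even for the same target
    text = HASHTAG.sub("", text)
    return " ".join(text.lower().split())


def lcs_length(a, b):
    """Longest common subsequence length: O(len(a)*len(b)) time, O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Score in [0, 1]; added commentary ('2c') lowers it but does not zero it."""
    a, b = normalise(a), normalise(b)
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def cluster(tweets, threshold=0.8):
    """Greedily group near-duplicates; each group's first tweet is its representative."""
    groups = []
    for text in tweets:
        for group in groups:
            if similarity(text, group[0]) >= threshold:
                group.append(text)
                break
        else:
            groups.append([text])
    return groups


a = "CVE-2011-2523 vsftpd backdoor analysis http://bit.ly/abc #infosec"
b = "RT @someone: CVE-2011-2523 vsftpd backdoor analysis http://tinyurl.com/xyz"
print(similarity(a, b))  # 1.0
```

Greedy clustering compares each tweet only against group representatives, which keeps the quadratic LCS cost manageable for tweet-sized strings at the expense of occasionally splitting a chain of gradually drifting retweets.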
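The "extending userinfo" slide refreshes user profiles through users/show.json and resolves free-text location strings to coordinates through the google maps API. Both services are rate-limited, so a small cache is a natural companion. The sketch below injects the two network calls as plain functions. `fetch_user` and `geocode` are hypothetical stand-ins, not real client APIs. It re-fetches a profile only once it is older than `max_age` and memoises geocoding per normalised location string.

```python
import time


class UserStore:
    """Cache of per-user metadata with time-based refresh and memoised geocoding."""

    def __init__(self, fetch_user, geocode, max_age=24 * 3600, clock=time.time):
        self.fetch_user = fetch_user  # hypothetical: screen_name -> dict (e.g. 'url', 'location')
        self.geocode = geocode        # hypothetical: location string -> (lat, lng) or None
        self.max_age = max_age        # seconds before a cached profile is fetched again
        self.clock = clock            # injectable so refresh logic can be tested
        self.users = {}               # lower-cased screen_name -> (fetched_at, record)
        self.geo_cache = {}           # normalised location -> (lat, lng) or None

    def get(self, screen_name):
        key = screen_name.lower()     # twitter screen names are case-insensitive
        now = self.clock()
        cached = self.users.get(key)
        if cached is not None and now - cached[0] < self.max_age:
            return cached[1]
        record = dict(self.fetch_user(screen_name))
        record["coords"] = self._resolve(record.get("location") or "")
        self.users[key] = (now, record)
        return record

    def _resolve(self, location):
        key = " ".join(location.lower().split())
        if not key:
            return None               # many profiles have no location string
        if key not in self.geo_cache:  # users often share the same location text
            self.geo_cache[key] = self.geocode(location)
        return self.geo_cache[key]
```

Injecting the fetchers keeps the caching logic testable offline and makes it easy to swap in whichever endpoint is current.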
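The todo slide proposes finding trending items amongst security users. A hedged way to do that is to count, for each vuln ID, how many distinct users from the collected user list mentioned it within a recent time window. This assumes mentions have already been extracted and timestamped. Counting distinct users rather than raw tweets damps a single account or bot repeating itself. The `trending` function, its window size and its ranking rule are illustrative choices, not something the slides specify.

```python
from collections import defaultdict


def trending(mentions, known_users, now, window=6 * 3600, top=5):
    """Rank vuln IDs by the number of distinct known users who mentioned them recently.

    mentions: iterable of (timestamp_seconds, screen_name, vuln_id) tuples.
    known_users: set of lower-cased screen names from the collected user list.
    """
    users_per_id = defaultdict(set)
    for ts, user, vuln_id in mentions:
        user = user.lower()
        if now - window <= ts <= now and user in known_users:
            users_per_id[vuln_id].add(user)  # a set ignores repeat mentions
    # Most distinct users first; ties broken alphabetically for stable output.
    ranked = sorted(users_per_id.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(vuln_id, len(users)) for vuln_id, users in ranked[:top]]


mentions = [
    (100, "alice", "CVE-2011-2523"),
    (200, "Bob", "CVE-2011-2523"),
    (300, "alice", "CVE-2011-2523"),      # repeat by the same user counts once
    (250, "carol", "MS11-058"),
    (50, "dave", "MS11-058"),             # outside the window
    (260, "spambot", "VMSA-2011-0009"),   # not in the known user list
]
print(trending(mentions, {"alice", "bob", "carol", "dave"}, now=300, window=200))
# [('CVE-2011-2523', 2), ('MS11-058', 1)]
```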