Sifting through twitter


A quickly thrown-together presentation about data mining Twitter for vulnerability references, presented at Ruxmon.

Published in: Technology

  1. sifting through twitter (ruxmon, aug 2011)
  2. introduction
     - problem #1
       - vulnerability data feeds can lack important information
     - problem #2
       - there is far too much noise on twitter from itsec people
     - sifting through twitter
       - twitter is now an integral part of the security community
       - many researchers post interesting data every day
       - tonnes of possible people to follow
       - lots of juicy information amongst the rough
       - not enough time to sift through it all
  3. general twitter observations
     - the bad
       - following is too selective
       - language barriers
       - too much noise
       - the UI is limited
       - too much CJ'ing
       - the word tweet
     - the good
       - endless amounts of data
       - small concise messages
       - can be queried via an API
       - interesting relational data
       - unicode text
  4. the idea
     - build something to mine for vulnerability references
     - keep it as simple as possible
     - automate collating data for vulnerability IDs
     - find what the community is saying about bugs
     - build a user database of itsec people on twitter
  5. entry point
     - use the twitter search API to find vuln. references
       - the twitter search API doesn't support regex or hyphens
       - but vuln. ref prefixes are fairly distinctive (CVE, VMSA, etc.)
       - limited (but sufficient) number of API calls per hour
     - instant interesting results
       - researchers posting 2c on bugs, posting exploits, advisories, links to blog analysis, etc.
       - vendors and security bots posting advisories
       - home users posting AV messages, sysadmins posting about patches
       - lots of foreign language codes, but mining still works fine
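
The entry point on slide 5 implies a two-step filter: search on the bare prefix (since the search API handles neither regex nor hyphens), then match full IDs client-side. The following is a minimal sketch of that second step only. The function names and the exact ID patterns are assumptions for illustration; the slides name the CVE and VMSA prefixes but do not show the matching logic.

```python
import re

# Assumed ID shapes (PREFIX-YEAR-NUMBER); the slides only name the prefixes.
VULN_REF = re.compile(r"\b(CVE|VMSA)-(\d{4})-(\d{4,})\b", re.IGNORECASE)


def extract_vuln_refs(text):
    """Return the normalized vulnerability IDs mentioned in one tweet."""
    refs = []
    for prefix, year, number in VULN_REF.findall(text):
        ref = f"{prefix.upper()}-{year}-{number}"
        if ref not in refs:  # keep first-seen order, drop repeats
            refs.append(ref)
    return refs


def filter_tweets(tweets):
    """Map each tweet that contains at least one full ID to its IDs."""
    matched = {}
    for tweet in tweets:
        refs = extract_vuln_refs(tweet)
        if refs:
            matched[tweet] = refs
    return matched
```

Tweets that only mention a prefix in passing (for example "cve" with no number) are dropped at this stage, which is one way to cut the noise a prefix-only search returns.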
  6. presentation
     - lots of relational time-based data - now to present it!
     - decided to use SIMILE Exhibit widgets from MIT
     - excellent for browsing, searching, filtering relational data
     - supports timelines, timeplots, maps; all easy to use
  7. similar items and language barriers
     - retweets and so on create many near-duplicate items
     - added LCS to compare likeness and link similar items
     - finding RTs from mining needs built-in logic: people add 2c to tweets, tinyurls differ for the same location, people add hashtags
     - mapping similar items allows for weighting items
     - interestingly, a lot of foreign text is returned
     - hooked UI into google translate to translate inline
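
The near-duplicate linking on slide 7 can be sketched roughly as follows. This assumes "LCS" means longest common subsequence over characters (it could equally mean longest common substring; the slides do not say), and the normalization rules and the 0-to-1 score are illustrative choices rather than the author's actual logic.

```python
import re


def normalize(text):
    """Strip the parts that vary between retweets of the same item."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # short URLs differ per poster
    text = re.sub(r"\brt\s+@\w+:?", " ", text)  # retweet markers
    text = re.sub(r"[#@]\w+", " ", text)        # hashtags and mentions
    return " ".join(text.split())


def lcs_length(a, b):
    """Length of the longest common subsequence, using one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(tweet_a, tweet_b):
    """Score likeness from 0.0 (nothing shared) to 1.0 (same after cleanup)."""
    a, b = normalize(tweet_a), normalize(tweet_b)
    if not a or not b:
        return 0.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))
```

Pairs scoring above some cutoff could then be linked as similar items, and the number of links per item used as its weight. Any such cutoff would need tuning against real data.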
  8. extending userinfo
     - the userlist grows steadily with people referencing vulns
     - users generally have interesting meta-data, such as website (blog, company, etc.) and sometimes a location string
     - added call to users/show.json to maintain userinfo
     - hooked into google maps API to try to resolve coords
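
As a rough illustration of the userinfo step on slide 8, the sketch below pulls the website and location out of a users/show.json style payload and tries to resolve coordinates. The record layout, the function names, and the shortcut of reading literal "lat,lon" pairs out of the location string are all assumptions; the `geocode` callable is a hypothetical stand-in for the google maps lookup, which is not shown in the slides.

```python
import json
import re

# Some location strings already contain raw coordinates; match "lat,lon".
COORDS = re.compile(r"(-?\d{1,3}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def userinfo_from_payload(payload):
    """Keep only the user fields the slides call out as interesting."""
    user = json.loads(payload)
    return {
        "screen_name": user.get("screen_name"),
        "website": user.get("url"),
        "location": user.get("location"),
    }


def resolve_coords(location, geocode=None):
    """Return (lat, lon), or None when the string cannot be resolved.

    `geocode` is a hypothetical callable wrapping an external geocoder.
    """
    if not location:
        return None
    match = COORDS.search(location)
    if match:
        lat, lon = float(match.group(1)), float(match.group(2))
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            return (lat, lon)
    return geocode(location) if geocode else None
```

Passing the geocoder in as an argument keeps the sketch runnable offline and makes it easy to cache or rate-limit the external lookups.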
  9. demo
  10. todo list
     - expanding scope:
       - utilize user-list to naturally branch out search criteria
       - find trending items amongst security users
       - skips the entire manual following process
     - misc.:
       - linking meta-data from a vuln. inventory
       - basic data-feeds and web-services
       - performance/scalability improvements
  11. wrapping up
     - this has demonstrated there's potential for this data to be used for vulnerability-related feeds and intel
     - researchers who find or research bugs and post good analysis are invaluable
     - some of the geomaps data is very interesting (communities, etc.)
     - hoping to have time to add items on the todo list soon
     - interested to hear feedback and ideas
     - this was all relatively quick & easy to throw together
       - makes you wonder what well-resourced folks are doing?