Sifting through twitter

a quickly thrown-together presentation about data mining twitter for vulnerability references, presented at ruxmon.

Sifting through twitter: Presentation Transcript

  • sifting through twitter ruxmon aug 2011
  • introduction
    • problem #1
      • vulnerability data feeds can lack important information
    • problem #2
      • there is far too much noise on twitter from itsec people
    • sifting through twitter
      • twitter is now an integral part of the security community
      • many researchers post interesting data every day
      • tonnes of possible people to follow 
      • lots of juicy information amongst the rough 
      • not enough time to sift through it all
  • general twitter observations
    • the bad
      • following is too selective
      • language barriers
      • too much noise
      • the UI is limited
      • too much CJ'ing
      • the word tweet
    • the good
      • endless amounts of data
      • small concise messages
      • can be queried via an API
      • interesting relational data
      • unicode text
  • the idea
      • build something to mine for vulnerability references
      • keep it as simple as possible 
      • automate collating data for vulnerability IDs
      • find what the community is saying about bugs
      • build a user database of itsec people on twitter
  • entry point
      • use twitter search API to find vuln. references
        • twitter search api doesn't support regex or hyphens
        • but vuln. ref prefixes are quite unique (CVE, VMSA, etc.)
        • limited (but sufficient) number of API calls per hour
    • instant interesting results
      • researchers posting 2c on bugs, posting exploits, advisories, links to blog analysis, etc.
      • vendors and security bots posting advisories
      • home users posting AV messages, sysadmins posting about patches
      • lots of foreign-language content, but mining still works fine
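
Because the search API takes no regex, one way to read the entry point above is: search on the bare prefix, then validate the full identifier locally. A minimal sketch of that local step follows. The function name and the exact digit layout in the pattern are assumptions for illustration, not taken from the slides; only the CVE and VMSA prefixes come from the talk.

```python
import re

# Full identifier shape, checked locally because the search API has no regex.
# Assumed layout: prefix, 4-digit year, then 4 or more digits, with optional
# hyphen or whitespace separators (search results may drop the hyphens).
VULN_ID = re.compile(r"\b(CVE|VMSA)[-\s]?(\d{4})[-\s]?(\d{4,})\b", re.IGNORECASE)


def extract_vuln_ids(text):
    """Return normalised vulnerability IDs found in one tweet's text."""
    found = []
    for prefix, year, num in VULN_ID.findall(text):
        ident = f"{prefix.upper()}-{year}-{num}"
        if ident not in found:  # keep first-seen order, drop repeats
            found.append(ident)
    return found
```

Normalising every match to one spelling means the same bug written as "cve-2011-1234" or "CVE 2011 1234" collates under a single key.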
  • presentation
      • lots of relational time-based data - now to present it!
      • decided to use SIMILE Exhibit widgets from MIT 
      • excellent for browsing, searching, filtering relational data
      • supports timelines, timeplots, maps – all easy to use
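
As a rough illustration of feeding mined tweets to Exhibit, the sketch below serialises records into a JSON document with a top-level "items" list whose entries carry "label" and "type", which is the shape Exhibit's JSON data files use as far as I recall. The input record keys and the other property names (user, vuln, date, text) are hypothetical, chosen only for this example.

```python
import json


def to_exhibit(tweets):
    """Serialise mined tweet records as an Exhibit-style JSON document."""
    items = []
    for t in tweets:
        items.append({
            "type": "Tweet",          # item type used for faceting
            "label": str(t["id"]),    # each item needs a label
            "user": t["user"],
            "vuln": t["vuln_ids"],    # list value: one item, many vuln IDs
            "date": t["created"],     # time property for timeline views
            "text": t["text"],
        })
    return json.dumps({"items": items}, indent=2)
```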
  • similar items and language barriers
      • retweets and so on create many near duplicate items
      • added LCS to compare likeness and link similar items
      • finding RTs from mining needs built-in logic: people add 2c to tweets, tinyurls differ for the same location, people add hashtags
      • mapping similar items allows for weighting items
      • interestingly, there is a lot of foreign text that's returned
      • hooked UI into google translate to translate inline
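
The likeness check described above could look something like the following sketch. The slides only say "LCS"; this assumes longest common subsequence over characters, and the normalisation rules (dropping URLs, the "RT @user:" prefix and hashtags) are a guess at the "built-in logic" mentioned, not the author's actual code.

```python
import re


def normalise(text):
    """Strip the parts of a tweet that tend to vary between retweets."""
    text = re.sub(r"https?://\S+", " ", text)   # shortened URLs differ per copy
    text = re.sub(r"\bRT\s+@\w+:?", " ", text)  # retweet prefix
    text = re.sub(r"#\w+", " ", text)           # added hashtags
    return " ".join(text.lower().split())


def lcs_length(a, b):
    """Length of the longest common subsequence, using two DP rows."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, 1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Score in [0, 1]; 1.0 means identical after normalisation."""
    a, b = normalise(a), normalise(b)
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))
```

Items scoring above some threshold (the cut-off would need tuning) can then be linked, and the size of each linked group used as a weight.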
  • extending userinfo
      • the userlist grows steadily with people referencing vulns
      • users generally have interesting meta-data, such as website (blog, company, etc) and sometimes a location string
      • added call to users/show.json to maintain userinfo
      • hooked into google maps API to try to resolve coords 
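
Maintaining userinfo from users/show.json might be sketched as below. The function takes an already-parsed response, so no network call is shown; screen_name, url and location are fields of the Twitter user object, while the store layout and the "coords" slot (to be filled by a later geocoding step) are assumptions made for this example.

```python
def update_userinfo(store, payload):
    """Merge one parsed users/show.json response into a local user table."""
    name = payload["screen_name"].lower()
    record = store.setdefault(
        name, {"website": None, "location": None, "coords": None}
    )
    # Only overwrite a field when the profile actually supplies a value.
    for ours, theirs in (("website", "url"), ("location", "location")):
        value = (payload.get(theirs) or "").strip()
        if not value:
            continue
        if ours == "location" and value != record["location"]:
            record["coords"] = None  # location changed: geocode again later
        record[ours] = value
    return record
```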
  • demo
  • todo list
    • expanding scope:
      • utilize user-list to naturally branch-out search criteria
      • find trending items amongst security users
      • skips the entire manual following process
    • misc.:
      • linking meta-data from a vuln. inventory
      • basic data-feeds and web-services
      • performance/scalability improvements
  • wrapping up
      • this has demonstrated there's potential for this data to be used for vulnerability-related feeds and intel
      • researchers who find or research bugs and post good analysis provide invaluable information
      • some of the geomaps data is very interesting (communities, etc)
      • hoping to have time to add items on todo soon
      • interested to hear feedback and ideas 
      • this was all relatively quick & easy to throw together
        • makes you wonder what well-resourced folks are doing?