Sifting through twitter

a quickly thrown together presentation about data mining twitter for vulnerability references which was presented at ruxmon.

Presentation Transcript

  • sifting through twitter ruxmon aug 2011
  • introduction
    • problem #1
      • vulnerability data feeds can lack important information
    • problem #2
      • there is far too much noise on twitter from itsec people
    • sifting through twitter
      • twitter is now an integral part of the security community
      • many researchers post interesting data every day
      • tonnes of possible people to follow 
      • lots of juicy information buried amongst the noise
      • not enough time to sift through it all
  • general twitter observations
    • the bad
      • following is too selective
      • language barriers
      • too much noise
      • the UI is limited
      • too much CJ'ing
      • the word tweet
    • the good
      • endless amounts of data
      • small concise messages
      • can be queried via an API
      • interesting relational data
      • unicode text
  • the idea
      • build something to mine for vulnerability references
      • keep it as simple as possible 
      • automate collating data for vulnerability IDs
      • find what the community is saying about bugs
      • build a user database of itsec people on twitter
  • entry point
      • use twitter search API to find vuln. references
        • twitter search API doesn't support regex or hyphens
        • but vuln. ref prefixes are quite unique (CVE, VMSA, etc.)
        • limited (but sufficient) number of API calls per hour
    • instant interesting results
      • researchers posting 2c on bugs, posting exploits, advisories, links to blog analysis, etc.
      • vendors and security bots posting advisories
      • home users posting AV messages, sysadmins posting about patches
      • lots of foreign-language text, but mining still works fine
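Because the search API cannot match regexes or hyphens, one plausible approach is to search on the bare prefix and then filter the returned tweets locally. The sketch below shows only that local filtering step. The pattern, the function name `extract_vuln_ids` and the ID layout (prefix, four-digit year, number) are assumptions for illustration, not the code behind the talk; only the CVE and VMSA prefixes come from the slides.

```python
import re

# Assumed ID layout: PREFIX-YYYY-NNNN, with the separators optional because
# people write IDs inconsistently. Only CVE and VMSA are named in the slides.
VULN_ID_RE = re.compile(r"\b(CVE|VMSA)[-\s]?(\d{4})[-\s]?(\d{4,})\b", re.IGNORECASE)


def extract_vuln_ids(text):
    """Return normalized vulnerability IDs found in a tweet, first-seen order."""
    found = []
    for prefix, year, number in VULN_ID_RE.findall(text):
        vuln_id = f"{prefix.upper()}-{year}-{number}"
        if vuln_id not in found:  # drop repeats within the same tweet
            found.append(vuln_id)
    return found


if __name__ == "__main__":
    # Made-up tweet text, for illustration only.
    print(extract_vuln_ids("RT @x: PoC for cve-2011-1234, see VMSA-2011-0009"))
```

Normalizing every match to one spelling keeps the later grouping by vulnerability ID simple, whatever spelling the original poster used.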
  • presentation
      • lots of relational time-based data - now to present it!
      • decided to use SIMILE Exhibit widgets from MIT 
      • excellent for browsing, searching and filtering relational data
      • supports timelines, timeplots, maps – all easy to use
  • similar items and language barriers
      • retweets and so on create many near duplicate items
      • added LCS to compare likeness and link similar items
      • finding RTs from mining needs built-in logic: people add 2c to tweets, tinyurls differ for the same location, people add hashtags
      • mapping similar items allows for weighting items
      • interestingly, there is a lot of foreign text that's returned
      • hooked UI into google translate to translate inline
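The near-duplicate handling described above can be sketched as follows. This is a minimal illustration, assuming "LCS" means longest common subsequence and that it is computed over words after stripping the parts that vary between retweets (RT markers, @mentions, shortened URLs, hashtags). The helper names, the normalization rules and the 0.8 threshold are all hypothetical and not taken from the talk.

```python
import re


def normalize(tweet):
    """Strip URLs, RT markers, @mentions and hashtags, then lowercase."""
    text = re.sub(r"https?://\S+", " ", tweet)
    text = re.sub(r"\bRT\b:?", " ", text)
    text = re.sub(r"[@#]\w+:?", " ", text)
    return " ".join(text.lower().split())


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(t1, t2):
    """LCS length over the longer word count, giving a score in [0, 1]."""
    a, b = normalize(t1).split(), normalize(t2).split()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


SIMILAR_THRESHOLD = 0.8  # hypothetical cut-off for linking two items


def is_similar(t1, t2):
    return similarity(t1, t2) >= SIMILAR_THRESHOLD
```

Items linked this way could then be weighted by the size of their group, so a heavily retweeted advisory ranks above a one-off mention.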
  • extending userinfo
      • the userlist grows steadily with people referencing vulns
      • users generally have interesting meta-data, such as website (blog, company, etc) and sometimes a location string
      • added call to users/show.json to maintain userinfo
      • hooked into google maps API to try to resolve coords 
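As a rough illustration of maintaining userinfo, the sketch below pulls the website and location out of a user object of the kind `users/show.json` returned. It assumes the response is a JSON object with `screen_name`, `url` and `location` keys. The function name and the output record layout are hypothetical, and the network call and the Google Maps geocoding step are deliberately left out.

```python
import json


def userinfo_from_payload(payload):
    """Extract the profile fields of interest from a user JSON string."""
    user = json.loads(payload)

    def clean(value):
        # Treat null, empty and whitespace-only strings as missing.
        value = (value or "").strip()
        return value or None

    return {
        "screen_name": user.get("screen_name"),
        "website": clean(user.get("url")),
        "location": clean(user.get("location")),  # free text, may not geocode
    }


if __name__ == "__main__":
    # Invented sample record, for illustration only.
    sample = '{"screen_name": "example_user", "url": null, "location": " Melbourne "}'
    print(userinfo_from_payload(sample))
```

The location is a free-text string typed by the user, so any attempt to resolve it to coordinates has to tolerate failures.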
  • demo volvent.org
  • todo list
    • expanding scope:
      • utilize user-list to naturally branch-out search criteria
      • find trending items amongst security users
      • skips the entire manual following process
    • misc.:
      • linking meta-data from a vuln. inventory
      • basic data-feeds and web-services
      • performance/scalability improvements
  • wrapping up
      • this has demonstrated there's potential for this data to be used for vulnerability related feeds and intel
      • researchers who find or research bugs and post good analysis provide invaluable information
      • some of the geomaps data is very interesting (communities, etc)
      • hoping to have time to add items on todo soon
      • interested to hear feedback and ideas 
      • this was all relatively quick & easy to throw together
        • makes you wonder what well-resourced folks are doing?