(and Other Adventures in Internet-Scale Data Science)
Doing Security Data Science at Scale using Rapid7 Project Sonar data (but also with data you have at home/work).
3. Rapid7
Delivering Security Data & Analytics that revolutionize the practice of cyber security
37% Fortune 1000
5,100+ Customers
800+ Employees
99 Countries
Threat Exposure Management
Incident Detection and Response
Security Advisory Services
4. R7 Data Science + Labs
We have an incredible team of
Red teamers
Data Scientists
Developers
5. WIRED
IBM Watson Brings AI Wonders to Cybersecurity
Why Machine Learning Is Our Last Hope for
Cybersecurity
Fortune
Datanami
MIT builds AI system that can detect 85% of
cyberattacks
Business Insider
Machine learning is fueling a cyber arms race
14. The place for security data science?
filtering: Separate signal from noise
visualization: Finding and visualizing trends
organization: Cluster hosts into groups
26. Sonar Data
443/TCP SSL Certificates
80/TCP HTTP Get/IP vhosts
Reverse DNS
Forward DNS
UDP Probes (uPnP, IPMI, NetBIOS, etc.)
POP, IMAP, SMTP
27. 443/TCP SSL Certs (weekly scans)
~25M SSL Certs, ~ 55GB in < 4 hours
80/TCP HTTP Get Requests (bi-weekly scans)
Reverse DNS (bi-weekly scans)
~60-65M Web servers, ~1.7 TB in < 10 hours
~1.1B Records, ~50 GB < 24 hours
What's out there?
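At these volumes a dataset rarely fits in memory, so a single streaming pass over the compressed file is a natural starting point. The sketch below counts the final DNS label of each hostname in a gzip-compressed reverse DNS dump. It assumes, purely for illustration, one `ip,hostname` row per line; the real Sonar file layout may differ and should be checked before use. The function name `count_suffixes` and the sample records are made up.

```python
import csv
import gzip
import io
from collections import Counter


def count_suffixes(stream, top_n=3):
    """Count the last DNS label of each hostname in a stream of `ip,hostname` rows."""
    counts = Counter()
    for row in csv.reader(stream):
        if len(row) != 2:
            continue  # skip malformed rows rather than abort a long run
        hostname = row[1].strip().rstrip(".").lower()
        if hostname:
            counts[hostname.rsplit(".", 1)[-1]] += 1
    return counts.most_common(top_n)


# Tiny in-memory stand-in for a multi-gigabyte gzip file (made-up records).
sample = (
    "192.0.2.1,host1.example.com\n"
    "192.0.2.2,mail.example.net\n"
    "192.0.2.3,www.example.com\n"
)
compressed = gzip.compress(sample.encode("utf-8"))

# gzip.open decompresses lazily, so only one row is held in memory at a time.
with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8", newline="") as fh:
    top_suffixes = count_suffixes(fh)

print(top_suffixes)
```

For a real file, the `io.BytesIO(...)` argument would be replaced by the path to the downloaded dump; the counting logic stays the same.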
28. Heisenberg
Low-interaction, cloud-based RDP honeypots deployed
across the world
Over 334 days, recorded:
221203 different password attempts
from 5076 distinct IP addresses
across 119 different countries
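Summary counts like these can be derived from raw honeypot logs with a few set and counter operations. The sketch below is illustrative only: the `LoginAttempt` record, its fields, and the sample rows are hypothetical and do not reflect the actual Heisenberg log format.

```python
from collections import Counter
from typing import NamedTuple


class LoginAttempt(NamedTuple):
    """Hypothetical record for one credential attempt against a honeypot."""
    source_ip: str
    country: str
    password: str


def summarize(attempts):
    """Return total and distinct counts for a collection of login attempts."""
    attempts = list(attempts)
    passwords = Counter(a.password for a in attempts)
    return {
        "total_attempts": len(attempts),
        "distinct_passwords": len(passwords),
        "distinct_ips": len({a.source_ip for a in attempts}),
        "distinct_countries": len({a.country for a in attempts}),
        "most_common_password": passwords.most_common(1),
    }


# Made-up rows using documentation IP ranges and placeholder country codes.
attempts = [
    LoginAttempt("192.0.2.10", "AA", "123456"),
    LoginAttempt("192.0.2.10", "AA", "password"),
    LoginAttempt("198.51.100.7", "BB", "123456"),
    LoginAttempt("203.0.113.9", "BB", "admin"),
]
summary = summarize(attempts)
print(summary)
```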
29.
30. Other data
• BGP Archives
• Blacklists: CleanMX, phishtank, malwaredomains
31. Malicious Topology of IPV4
• How to apply Internet-scale structure to phishing
attacks?
• How can this structure help us identify malicious
areas of the Internet?
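One way to attach Internet-scale structure to a phishing IP is to map it to the most specific announced prefix that covers it, and from there to the origin AS. The sketch below does this with a longest-prefix match over a small in-memory table. The `ROUTES` table and the `origin_as` helper are hypothetical; a real table would be built from BGP archive data, and this is not necessarily the method used in the talk.

```python
import ipaddress

# Hypothetical routing table: announced prefix -> origin AS number.
# Prefixes come from documentation ranges, AS numbers from the
# range reserved for documentation.
ROUTES = {
    ipaddress.ip_network("198.51.100.0/24"): 64500,
    ipaddress.ip_network("198.51.100.128/25"): 64501,
    ipaddress.ip_network("203.0.113.0/24"): 64502,
}


def origin_as(ip, routes=ROUTES):
    """Return (prefix, asn) for the most specific prefix covering ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, asn) for net, asn in routes.items() if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the covering prefix with the largest mask length wins.
    return max(matches, key=lambda match: match[0].prefixlen)


print(origin_as("198.51.100.200"))
print(origin_as("192.0.2.1"))
```

The linear scan keeps the example short. With a full routing table, a radix tree or a sorted-prefix lookup would be the more practical choice.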
40. • Very few ASes host a disproportionate amount of
malicious activity
• Smaller subnets and ASes are becoming more
ubiquitous in IPv4
• Malicious ASes are likely large and deeply
fragmented
Recap
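The claim that a few ASes host a disproportionate share of malicious activity can be checked with a simple concentration measure: the fraction of blacklisted IPs that fall in the k most frequent ASes. The sketch below assumes each blacklisted IP has already been mapped to an AS number. The helper name `top_k_share` and the sample values are made up and do not reproduce the talk's results.

```python
from collections import Counter


def top_k_share(asns, k):
    """Fraction of observations that fall in the k most frequent ASes."""
    counts = Counter(asns)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Made-up example: one AS number per blacklisted IP.
observed = [64500] * 6 + [64501] * 2 + [64502, 64503]

print(top_k_share(observed, 1))  # share of the single largest AS
print(top_k_share(observed, 2))  # share of the two largest ASes
```

A share that stays high for small k, relative to the number of ASes observed, is what a concentrated distribution looks like.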
41. With Internet-Scale security data...
• We develop more informed context and bounds on
local malicious activity
• We make effective security ML more of a future
possibility
42. How can I get started?
• sonar.labs.rapid7.com and scans.io
• Blacklists: CleanMX, PhishTank, Malwaredomains
• BGP archives: Route Views project
• Heisenberg data coming soon