Malicious Topologies of IPv4
and Other Adventures in Internet-Scale Data Science
Suchin Gururangan
Bob Rudis



(and Other Adventures in Internet-Scale Data Science)

Doing Security Data Science at Scale using Rapid7 Project Sonar data (but also with data you have at home/work).

Published in: Internet

  1. Malicious Topologies of IPv4 and Other Adventures in Internet-Scale Data Science. Suchin Gururangan, Bob Rudis, Rapid7 Data Science
  2. Quick Bios. Suchin Gururangan (@ssgrn, suchin.co), Data Scientist. Bob Rudis (@hrbrmstr, rud.is), [Master]Chief Data Scientist
  3. Rapid7: Delivering Security Data & Analytics that revolutionize the practice of cyber security. 37% Fortune 1000; 5,100+ Customers; 800+ Employees; 99 Countries. Threat Exposure Management; Incident Detection and Response; Security Advisory Services
  4. R7 Data Science + Labs. We have an incredible team of red teamers, data scientists, and developers
  5. Headlines and sources shown on the slide: WIRED; "IBM Watson Brings AI Wonders to Cybersecurity"; "Why Machine Learning Is Our Last Hope for Cybersecurity"; Fortune; Datanami; "MIT builds AI system that can detect 85% of cyberattacks"; Business Insider; "Machine learning is fueling a cyber arms race"
  6. Machine learning algorithms are only as good as the data they are trained on.
  7. Location Identification; Facial Recognition; Object Recognition
  8. Pixels → Model
  9. URLs, SSL Certs, Webshells: malicious/benign? anomalous?
  10. URLs, SSL Certs, Webshells → Feature Engineering → ? → Model
  11. Most security data is inconsistent, short-lived, adversarial, biased, and lacking ground truth
  12. Let's get better security data.
  13. Data science is not just machine learning.
  14. The place for security data science? Separate signal from noise (filtering); find and visualize trends (visualization); cluster hosts into groups (organization)
  15. Internet-Scale
  16. Internet-Scale is big
  17. Internet-Scale is generalizable
  18. Internet-Scale is structured
  19. Why Internet-Scale Data Science? See trends and develop context for micro-scale attacks. More data!
  20. Internet-Scale Tools
  21. Sonar: Internet-wide surveys across the entire public IPv4 space and a wide variety of services and protocols. Started in Nov 2013 by HD Moore
  22. Sonar Data: 443/TCP SSL Certificates; 80/TCP HTTP GET/IP vhosts; Reverse DNS; Forward DNS; UDP Probes (UPnP, IPMI, NetBIOS, etc.); POP, IMAP, SMTP
  23. What's out there?
      • 443/TCP SSL Certs (weekly scans): ~25M SSL certs, ~55 GB in < 4 hours
      • 80/TCP HTTP GET Requests (bi-weekly scans): ~60-65M web servers, ~1.7 TB in < 10 hours
      • Reverse DNS (bi-weekly scans): ~1.1B records, ~50 GB in < 24 hours
  24. Heisenberg: Low-interaction, cloud-based RDP honeypots deployed across the world. Over 334 days, recorded 221,203 different password attempts from 5,076 distinct IP addresses across 119 different countries
  25. Other data
      • BGP Archives
      • Blacklists: CleanMX, PhishTank, malwaredomains
  26. Malicious Topology of IPv4
      • How to apply Internet-scale structure to phishing attacks?
      • How can this structure help us identify malicious areas of the Internet?
  27. IPv4 Hierarchy: IP → Subnet → AS (e.g. 127.0.0.1 → 127.0.0.0/24 → AS10)
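The IP → Subnet → AS hierarchy on slide 27 can be illustrated with a longest-prefix match. This is a minimal sketch, not the authors' tooling: the `ROUTES` table, its prefixes, and its AS numbers are made up for illustration; in practice they would come from a BGP archive.

```python
import ipaddress

# Hypothetical routing table: announced prefix -> origin AS number.
# These entries are invented for illustration, not real BGP data.
ROUTES = {
    ipaddress.ip_network("20.1.0.0/22"): 64500,
    ipaddress.ip_network("20.1.2.0/24"): 64501,
    ipaddress.ip_network("198.51.100.0/24"): 64502,
}

def lookup(ip):
    """Return (most specific covering prefix, origin AS) for ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the covering network with the largest prefix length.
    best = max(matches, key=lambda net: net.prefixlen)
    return best, ROUTES[best]

print(lookup("20.1.2.7"))   # covered by both prefixes; the /24 is more specific
print(lookup("20.1.1.7"))   # covered only by the /22
```

A linear scan is enough for a handful of prefixes; a full routing table would likely need a radix tree or similar structure.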
  28. source: xkcd
  29. Malicious Autonomous Systems
  30. AS Fragmentation. Fragmentation = 1 - Tree Depth / # Nodes
      [Figure: Subnet Tree, with a Tree Depth annotation; node labels as extracted: 20.1.0/23, 20.1.0/23, 20.1.2/23, 20.1.2/24, 20.1.2/24, 20.1.0/24, 20.1.0/24]
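The fragmentation formula on slide 30 can be sketched as follows. The slide does not say how the subnet tree is built or how depth is counted, so this sketch assumes that a prefix's parent is the most specific other prefix containing it, that depth is the number of nodes on the longest root-to-leaf path, and that # Nodes is the number of distinct prefixes. The function name and the example prefixes are hypothetical.

```python
import ipaddress

def fragmentation(prefixes):
    """Compute 1 - depth / nodes for the containment tree of the given prefixes."""
    nets = sorted({ipaddress.ip_network(p) for p in prefixes},
                  key=lambda n: n.prefixlen)
    if not nets:
        raise ValueError("need at least one prefix")
    depth = {}
    for net in nets:
        # Ancestors have strictly shorter prefixes, so they were processed earlier.
        ancestors = [p for p in depth
                     if p.prefixlen < net.prefixlen and net.subnet_of(p)]
        # Assumption: a root counts as depth 1.
        depth[net] = 1 + max((depth[p] for p in ancestors), default=0)
    return 1 - max(depth.values()) / len(nets)

# Hypothetical AS announcing a /22, both /23 halves, and all four /24s.
example = ["20.1.0.0/22", "20.1.0.0/23", "20.1.2.0/23",
           "20.1.0.0/24", "20.1.1.0/24", "20.1.2.0/24", "20.1.3.0/24"]
print(fragmentation(example))
```

Under these assumptions a single prefix or a strictly nested chain scores 0, and the score approaches 1 as an AS announces many sibling prefixes at a shallow depth.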
  31. Malicious AS Topology
      • Size: 80-95th percentile in IPv4
      • Fragmentation: 10-20% higher
      • Subnet composition: 50-60% XX-small subnets
      [Figure labels: Malicious Topology, Benign Topology]
  32. ARIN fee by subnet category
      Subnet Category   ARIN Fee      Subnet Prefix
      XX-Small          $500.00       < /22
      X-Small           $1,000.00     /22 - /20
      Small             $2,000.00     /20 - /18
      Medium            $4,000.00     /18 - /16
      Large             $8,000.00     /16 - /14
      X-Large           $16,000.00    /14 - /12
      XX-Large          $32,000.00    > /12
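The table on slide 32 can be turned into a small lookup. The slide's ranges share endpoints (/20 appears under both X-Small and Small, for instance), so this sketch assumes each shared endpoint belongs to the smaller category, which keeps /22 in X-Small and /12 in X-Large as the first and last rows imply. The `CATEGORIES` list and `categorize` function are illustrative names, not part of the talk.

```python
import ipaddress

# (category, fee in USD, shortest prefix length that falls in the category),
# ordered from the largest blocks to the smallest, following the slide.
# Assumption: shared range endpoints go to the smaller category.
CATEGORIES = [
    ("XX-Large", 32000, 0),
    ("X-Large", 16000, 12),
    ("Large", 8000, 14),
    ("Medium", 4000, 16),
    ("Small", 2000, 18),
    ("X-Small", 1000, 20),
    ("XX-Small", 500, 23),
]

def categorize(prefix):
    """Return (category, fee) for a CIDR prefix such as '10.0.0.0/24'."""
    plen = ipaddress.ip_network(prefix).prefixlen
    chosen = CATEGORIES[0]
    for entry in CATEGORIES:
        # Keep the last category whose threshold the prefix length reaches.
        if plen >= entry[2]:
            chosen = entry
    return chosen[0], chosen[1]

print(categorize("10.0.0.0/24"))
print(categorize("10.0.0.0/19"))
```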
  33. Recap
      • Very few ASes host a disproportionate amount of malicious activity
      • Smaller subnets and ASes are becoming more ubiquitous in IPv4
      • Malicious ASes are likely large and deeply fragmented
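One way to check the first recap point against your own data is to measure how much blacklisted activity the top few ASes account for. The sketch below assumes each blacklisted IP has already been mapped to an origin AS; the `observed` list is synthetic and the `top_share` helper is a hypothetical name.

```python
from collections import Counter

def top_share(asns, k):
    """Fraction of observations attributed to the k most frequent ASes."""
    counts = Counter(asns)
    if not counts:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / sum(counts.values())

# Synthetic data: one AS number per blacklisted IP (invented for illustration).
observed = [64500] * 70 + [64501] * 20 + [64502 + i for i in range(10)]
print(top_share(observed, 2))
```

A share close to 1 for a small k would be consistent with the slide's claim; the synthetic numbers here say nothing about real Internet data.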
  34. With Internet-Scale security data...
      • We develop more informed context and bounds on local malicious activity
      • We make effective security ML more of a future possibility
  35. How can I get started?
      • sonar.labs.rapid7.com and scans.io
      • Blacklists: CleanMX, PhishTank, Malwaredomains
      • BGP archives - routeviews project
      • Heisenberg data coming soon
