The last five to ten years have seen massive advances in open source Internet-wide mass-scan tooling, on-demand cloud computing, and high-speed Internet connectivity. This has led to a massive influx of different groups mass-scanning all four billion IP addresses in the IPv4 space on a constant basis. Information security researchers, cybersecurity companies, search engines, and criminals scan the Internet for a variety of benign and nefarious reasons (such as the WannaCry ransomware and multiple MongoDB, Elasticsearch, and Memcached ransomware variants). It is increasingly difficult to differentiate between scan/attack traffic targeting your organization specifically and opportunistic mass-scan background-radiation packets.
GreyNoise is a system that records and analyzes the collective omnidirectional background noise of the Internet, performs enrichments and analytics, and makes the data available to researchers for free. Traffic is collected by a large network of geographically and logically diverse "listener" servers distributed across data centers belonging to different cloud providers and ISPs around the world.
In this talk I will candidly discuss the motivations for developing the system; take a technical deep dive into the architecture, data pipeline, and analytics; present observations and analysis of the traffic collected by the system; and cover business impacts for network operators, pitfalls and lessons learned, and the vision for the system moving forward.
Automating Kubernetes Environments with Ansible | Timothy Appnel
Ansible fits naturally into any Kubernetes environment. Both are very active and widely used open source projects with vibrant communities that help make hard things easier. Here, we explore ways how...
Cloud Foundry: The Platform for Forging Cloud Native Applications | Chip Childers
It wasn’t too long ago that artisans, bathed in the glow of molten metal, forged parts that would go on to make up bigger, more powerful machines. Today, we call those artisans developers. Instead of metal, they use bits and bytes in the cloud to forge a modern application architecture that supports public, private and hybrid application deployment. One that enables users and developers to move their applications wherever they need to go. And it’s built on a growing, vibrant ecosystem.
Nowhere is this epic shift in how things are made more visible than in the meteoric adoption of Cloud Foundry. In this talk, Chip Childers, VP of Technology for the Cloud Foundry Foundation, will give attendees an inside look at the industry movements and the technological requirements that are driving Cloud Foundry's rapid adoption. Most importantly, he will walk through how organizations are responding to the challenge of continuous innovation, what's driving modern application architectures, and how the Cloud Foundry platform uses specific constraints in order to fulfill its promise to application owners.
Validation and Verification using Rational DOORS for Aerospace | Hellasserve
This presentation shows the implementation of verification and validation in aerospace using IBM Rational DOORS to demonstrate compliance with requirements standards such as DO-178C and ARP4754.
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka | Guozhang Wang
To manage the ever-increasing volume and velocity of data within your company, you have successfully made the transition from single machines and one-off solutions to large distributed stream infrastructures in your data center, powered by Apache Kafka. But what if one data center is not enough? I will describe building resilient data pipelines with Apache Kafka that span multiple data centers and points of presence, and provide an overview of best practices and common patterns while covering key areas such as architecture guidelines, data replication, and mirroring as well as disaster scenarios and failure handling.
이성민 / Netflix - [Special Session] "If Only You Knew What I Know," as Told by a Senior Engineer
"Every engineer grows through failure, and I was no exception.
Today I would like to share the stories I wish I had heard when I was a junior engineer."
Video: https://youtu.be/MXl_t1vjkyU
Host: https://www.facebook.com/groups/InfraEngineer
Apache Kafka has evolved from an enterprise messaging system into a fully distributed streaming data platform (Kafka Core + Kafka Connect + Kafka Streams) for building streaming data pipelines and streaming data applications.
This talk, which I gave at the Chicago Java Users Group (CJUG) on June 8, 2017, focuses mainly on Kafka Streams, a lightweight open source Java library for building stream processing applications on top of Kafka, using Kafka topics as input/output.
You will learn more about the following:
1. Apache Kafka: a Streaming Data Platform
2. Overview of Kafka Streams: Before Kafka Streams? What is Kafka Streams? Why Kafka Streams? What are Kafka Streams key concepts? Kafka Streams APIs and code examples?
3. Writing, deploying and running your first Kafka Streams application
4. Code and Demo of an end-to-end Kafka-based Streaming Data Application
5. Where to go from here?
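As a conceptual illustration of the input-topic-to-output-topic model in item 3 (plain Python standing in for the actual Kafka Streams Java API, with the topics simulated as iterables), here is a minimal word-count sketch:

```python
from collections import Counter

def word_count(stream):
    """Consume an input stream of text lines and maintain a running count
    per word, emitting (word, count) updates like a changelog of a KTable."""
    counts = Counter()
    for line in stream:
        for word in line.lower().split():
            counts[word] += 1
            yield (word, counts[word])

# "Topics" simulated as plain Python iterables for illustration only.
updates = list(word_count(["hello kafka", "hello streams"]))
print(updates)  # [('hello', 1), ('kafka', 1), ('hello', 2), ('streams', 1)]
```

The real Kafka Streams API expresses the same per-record, incrementally-updated computation declaratively over actual Kafka topics.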
CI/CD with an Idempotent Kafka Producer & Consumer | Kafka Summit London 2022 | Hosted by Confluent
Idempotence is a mathematical property of certain operations: the operation can be applied multiple times without changing the result beyond the initial application.
The main driver behind the idempotency requirement is often the need to handle duplicated messages. As developers and architects, we need to pay close attention to how we deal with our production data during new deployments to ensure we are not losing data, duplicating messages, or introducing malformed data into our system. Furthermore, we need to figure out how to automate the process and add testing guarantees to prevent potential human error.
In this session, you will learn about the idempotent Kafka producer and consumer architecture and how to automate the CI/CD process with open-source tools.
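The duplicate-handling driver above can be sketched as a consumer that remembers which message IDs it has already applied, so a redelivery never repeats the side effect. This is a minimal in-memory illustration (the class, field names, and "balance" side effect are made up for the example; a real system would persist the seen-ID set durably):

```python
class IdempotentConsumer:
    """Apply each message at most once by remembering processed message IDs."""

    def __init__(self):
        self.seen = set()   # in production: a durable store, not memory
        self.balance = 0    # example side effect: an account balance

    def handle(self, msg_id, amount):
        if msg_id in self.seen:   # duplicate delivery: skip the side effect
            return False
        self.seen.add(msg_id)
        self.balance += amount
        return True

c = IdempotentConsumer()
c.handle("m1", 100)
c.handle("m1", 100)  # redelivered duplicate is ignored
c.handle("m2", 50)
print(c.balance)  # 150, not 250
```

Processing the same delivery twice leaves the state unchanged, which is exactly the "no change beyond the initial application" property defined above.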
By Tom Wilkie, delivered at London Microservices User Group on 2/12/15
The rise of microservice-based applications has had many knock-on effects, not least on the complexity of monitoring your application. An order-of-magnitude increase in the number of moving parts and in the application's rate of change requires us to reassess traditional monitoring techniques.
In this talk we will discuss some different approaches to monitoring, visualising and tracing containerised, microservices-based applications. We’ll present different techniques to some of the emergent problems, and try not to rant too much.
Introducing the Confluent Labs Parallel Consumer client | Anthony Stubbes, Confluent | Hosted by Confluent
Consuming messages in parallel is what Apache Kafka® is all about, so you may well wonder, why would we want anything else? It turns out that, in practice, there are a number of situations where Kafka’s partition-level parallelism gets in the way of optimal design.
This session will go over some of these types of situations that can benefit from parallel message processing within a single application instance (aka slow consumers or competing consumers), and then introduce the new Parallel Consumer labs project from Confluent, which can improve functionality and massively improve performance in such situations.
It will cover the following:
- Different ordering modes of the client
- Relative performance improvements
- Usage with other components like Kafka Streams
- An introduction to the internal architecture of the project
- How it can achieve all this in a reassignment-friendly manner
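The key-ordering idea behind such a client can be sketched in a few lines: process different keys concurrently while keeping each key's records in arrival order. This is a conceptual Python sketch of the ordering mode, not the actual Parallel Consumer API (function and parameter names are illustrative):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def process_key_ordered(records, handler, workers=4):
    """Key-level parallelism: records sharing a key run in order within one
    task, while different keys are processed concurrently."""
    by_key = defaultdict(list)
    for key, value in records:        # preserve per-key arrival order
        by_key[key].append(value)

    def run(key):
        # All records for one key are handled serially by a single task.
        return key, [handler(v) for v in by_key[key]]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run, by_key))

out = process_key_ordered([("a", 1), ("b", 10), ("a", 2)],
                          handler=lambda v: v * 2)
print(out)  # {'a': [2, 4], 'b': [20]}
```

This shows why key-level ordering can use far more parallelism than partition-level ordering: the unit of serialization shrinks from a whole partition to a single key.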
Integrating Splunk into your Spring Applications | Damien Dallimore
How much visibility do you really have into your Spring applications? How effectively are you capturing, harnessing, and correlating the logs, metrics, and messages from your Spring applications that can be used to deliver this visibility? What tools and techniques are you providing your Spring developers with to better create and utilize this mass of machine data? In this session I'll answer these questions and show how Splunk can be used not only to provide historical and real-time visibility into your Spring applications, but also as a platform that developers can use to become more "devops effective" and easily create custom big data integrations and standalone solutions. I'll discuss and demonstrate many of Splunk's Java apps, frameworks, and SDKs, and also cover the Spring Integration Adaptors for Splunk.
Spring Cloud Function: Where We Were, Where We Are, and Where We’re Going | VMware Tanzu
SpringOne 2021
Session Title: Spring Cloud Function: Where We Were, Where We Are, and Where We’re Going
Speakers: Marc DiPasquale, Developer Advocate at Solace; Mark Sailes, Specialist Solutions Architect, Serverless at Amazon Web Services; Oleg Zhurakousky, Developer at VMware
Intelligently collecting data at the edge—intro to Apache MiNiFi | DataWorks Summit
Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Speaker
Andy LoPresto, Sr Member of Technical Staff, Hortonworks
This presentation is primarily focused on how to use collectd (http://collectd.org/) to gather data from the Postgres statistics tables. Examples of how to use collectd with Postgres will be shown. There is some hackery involved to make collectd do a little more and collect more meaningful data from Postgres. These small patches will be explored. A small portion of the discussion will be about how to visualize the data.
고승범 (peter.ko) / kakao corp. (Infrastructure Team 2)
---
At Kakao, we operate Kafka, which has rapidly emerged as the solution connecting everything from big data analysis and processing to all of our development platforms, as a company-wide shared service. I would like to share the troubleshooting experience and operational know-how we have gained by running this company-wide Kafka ourselves. In particular, I will cover the caveats of using producers and consumers, something both Kafka newcomers and existing users frequently ask about.
The World of Messaging, Seen Through Spring Integration | Wangeun Lee
[Spring Camp 2015] These are the slides for "The World of Messaging, Seen Through Spring Integration."
The example source repository is linked inside the presentation.
Thank you.
-------------------------------------------------------------------
We are always communicating with someone. Through communication we ask others to do work, and we receive work ourselves. Applications are no different: heterogeneous applications communicate with each other through data, and situations arise where they must distribute work among themselves.
Before such distributed processing can happen, communication must come first. Pioneers distilled their thinking about inter-application communication into the Enterprise Integration Patterns, and Spring created Spring Integration as an abstraction of those patterns.
This talk looks at how Spring Integration lets applications communicate with each other easily and comfortably(?), and aims to help you get started with Spring Integration through examples and case studies.
Using Riak for Events storage and analysis at Booking.com | Damien Krotkine
At Booking.com, we have a constant flow of events coming from various applications and internal subsystems. This critical data needs to be stored for real-time, medium-term, and long-term analysis. Events are schema-less, making it difficult to use standard analysis tools. This presentation will explain how we built a storage and analysis solution based on Riak. The talk will cover: data aggregation and serialization, Riak configuration, solutions for lowering network usage, and finally, how Riak's advanced features are used to perform real-time data crunching on the cluster nodes.
Distributed Sensor Data Contextualization for Threat Intelligence Analysis | Jason Trost
As organizations operationalize diverse network sensors of various types, from passive sensors to DNS sinkholes to honeypots, there are many opportunities to combine this data for increased contextual awareness for network defense and threat intelligence analysis. In this presentation, we discuss our experiences by analyzing data collected from distributed honeypot sensors, p0f, snort/suricata, and botnet sinkholes as well as enrichments from PDNS and malware sandboxing. We talk through how we can answer the following questions in an automated fashion: What is the profile of the attacking system? Is the host scanning/attacking my network an infected workstation, an ephemeral scanning/exploitation box, or a compromised web server? If it is a compromised server, what are some possible vulnerabilities exploited by the attacker? What vulnerabilities (CVEs) has this attacker been seen exploiting in the wild and what tools do they drop? Is this attack part of a distributed campaign or is it limited to my network?
Attackers don't just search for technology vulnerabilities; they take the easiest path and find the human vulnerabilities. Drive-by web attacks, targeted spear phishing, and more are commonplace today, with the goal of delivering custom malware. In a world where attackers deliver custom advanced malware that handily evades signature and blacklisting approaches and does not depend on application software vulnerabilities, how do we know when our environments are compromised? What are the telltale signs that compromise activity has started, and how can we move to arrest a compromise in progress before the attacker moves laterally and reinforces their position? The penetration testing community knows these signs and artifacts of advanced malware presence, and it is up to us to help educate defenders on what to look for.
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production | Codemotion
What's important about a technology is what you can use it to do. I've looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real-world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and biometric databases are some of the examples covered in this presentation.
Network Forensics and Practical Packet Analysis | Priyanka Aash
Why Packet Analysis?
3 Phases - Analysis, Conversion & Collection
How do we do it?
Statistics - Protocol Hierarchy
Statistics - End Points & Conversations
Messaging, interoperability and log aggregation - a new framework | Tomas Doran
In this talk, I will discuss why log files are horrible, and how to log structured log lines and performance metrics from large-scale production applications, as well as how to build reliable, scalable and flexible large-scale software systems in multiple languages.
I will explain why (almost) all log formats are horrible and why JSON is a good solution for logging, and discuss a number of message queuing, middleware and network transport technologies, including STOMP, AMQP and ZeroMQ.
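The argument for JSON logging can be shown in miniature: each log line is a self-describing record rather than free text, so downstream aggregators can parse it without fragile regexes. A minimal sketch (the helper and field names are illustrative, not part of any framework mentioned here):

```python
import json
import time

def log_event(write, **fields):
    """Emit one self-describing JSON log line instead of a free-text message."""
    record = {"ts": time.time(), **fields}
    write(json.dumps(record, sort_keys=True) + "\n")

lines = []
log_event(lines.append, level="info", event="user_login",
          user="alice", duration_ms=42)

parsed = json.loads(lines[0])
print(parsed["event"], parsed["duration_ms"])  # fields come back typed
```

Because the values round-trip with their types intact (`duration_ms` is an integer, not a substring to re-parse), filtering and aggregating downstream becomes trivial.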
The Message::Passing framework will be introduced, along with the logstash.net project, with which the Perl code is interoperable. These are pluggable frameworks in Ruby/Java/JRuby and Perl with pre-written sets of inputs, filters and outputs for many different systems, message formats and transports.
They were initially designed to be aggregators and filters of data for logging. However they are flexible enough to be used as part of your messaging middleware, or even as a replacement for centralised message queuing systems.
You can have your cake and eat it too - an architecture which is flexible, extensible, scalable and distributed. Build discrete, loosely coupled components which just pass messages to each other easily.
Integrate and interoperate with your existing code and code bases easily, consume from or publish to any existing message queue, logging or performance metrics system you have installed.
Simple examples using common input and output classes will be demonstrated using the framework, as will easily adding your own custom filters. A number of common messaging middleware patterns will be shown to be trivial to implement.
Some higher level use-cases will also be explored, demonstrating log indexing in ElasticSearch and how to build a responsive platform API using webhooks.
Interoperability is also an important goal for messaging middleware. The logstash.net project will be highlighted and we'll discuss crossing the single language barrier, allowing us to have full integration between java, ruby and perl components, and to easily write bindings into libraries we want to reuse in any of those languages.
From a Student to an Apache Committer: Practice of Apache IoTDB | jixuan1989
This talk was given by Xiangdong Huang, a PPMC member of the Apache IoTDB (incubating) project, at the Apache Event at Tsinghua University in China.
About the Event:
The open source ecosystem plays an increasingly important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and the industrial Internet. Many companies have gradually increased their participation in the open source community, and developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source software projects and communities to the world.
The invited guests of this lecture are all from ASF community, including the chairman of the Apache Software Foundation, three Apache members, Top 5 Apache code committers (according to Apache annual report), the first Committer in the Hadoop project in China, several Apache project mentors or VPs, and many Apache Committers. They will tell you what the open source culture is, how to join the Apache open source community, and the Apache Way.
Bullet is an open-source, lightweight, pluggable querying system for streaming data, implemented on top of Storm without a persistence layer. It allows you to filter, project, and aggregate data in transit. It includes a UI and a web service. Instead of running queries on a finite set of data that arrived and was persisted, or running a static query defined at the startup of the stream, our queries can be executed against an arbitrary set of data arriving after the query is submitted. In other words, it is a look-forward system.
Bullet is a multi-tenant system that scales independently of the data consumed and the number of simultaneous queries. Bullet is pluggable into any streaming data source. It can be configured to read from systems such as Storm, Kafka, Spark, Flume, etc. Bullet leverages Sketches to perform its aggregate operations such as distinct, count distinct, sum, count, min, max, and average.
An instance of Bullet is currently running at Yahoo against its user engagement data pipeline. We’ll highlight how it is powering internal use-cases such as web page and native app instrumentation validation. Finally, we’ll show a demo of Bullet and go over query performance numbers.
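The filter/project/aggregate shape of a look-forward query can be sketched generically. This illustrative Python sketch mimics the query shape described above (the function, parameters, and record fields are assumptions for the example, not Bullet's actual API):

```python
def look_forward_query(stream, where, select, agg):
    """Filter, project, and aggregate records as they arrive in transit,
    with no persistence layer: the query sees only data that comes after
    it is submitted."""
    projected = ({k: r[k] for k in select} for r in stream if where(r))
    return agg(projected)

# Records "arriving" after the query is submitted, as a lazy iterator.
events = iter([
    {"page": "home", "ms": 120, "bot": False},
    {"page": "home", "ms": 80,  "bot": True},
    {"page": "item", "ms": 200, "bot": False},
])

result = look_forward_query(
    events,
    where=lambda r: not r["bot"],          # filter
    select=["page", "ms"],                 # project
    agg=lambda rs: sum(r["ms"] for r in rs),  # aggregate (e.g. sum)
)
print(result)  # 320
```

Bullet performs aggregations like count distinct with probabilistic Sketches so memory stays bounded; the sum above merely stands in for that step.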
Staying Ahead of Internet Background Exploitation - Microsoft BlueHat Israel ... | Andrew Morris
It’s not just you. The frequency of severe vulnerabilities in internet-facing enterprise software being massively exploited at scale has increased drastically. The amount of time between disclosure and exploitation of these vulnerabilities has been reduced to near-zero, leaving defenders with less time to react and respond. While combating internet-wide opportunistic exploitation is a sprawling and complex problem, there is both an art and a science to staying ahead of large exploitation events such as Log4J.
In this talk we will share insights and challenges from operating a huge, shifting, adaptive, distributed sensor network listening to internet background noise and opportunistic exploitation traffic over the past four years. We will give a blunt state of the universe on mass exploitation. We will share patterns and unexplainable phenomena we’ve experienced across billions of internet scans. And we will make recommendations to defenders for preparing for the next time the cyber hits the fan.
Using GreyNoise to Quantify Response Time of Cloud Provider Abuse Teams | Andrew Morris
Cloud hosting providers, such as Amazon AWS, Google Cloud, DigitalOcean, Microsoft Azure, and many others, have to respond to a regular barrage of abuse complaint reports from all around the world when their customers' virtual private servers are used for malicious activity. This activity can happen knowingly, by the "renter" of the system, or on behalf of an attacker if the server becomes infected. Although by no means the end-all, one way of measuring the trust posture of a cloud hosting provider is by analyzing the amount of time between shared hosts beginning to attack other hosts on the Internet and the activity ceasing, generally by way of forced decommissioning, quarantining, or remediation of the root cause, such as a malware infection. In this talk, we discuss using the data collected by GreyNoise, a large network of passive collector nodes, to measure the time-to-remediation of infected or malicious machines. We will discuss methodology, results, and actionable takeaways for conference attendees who use shared cloud hosting in their businesses.
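The measurement described, time between attack activity starting and ceasing per provider, reduces to simple arithmetic over observed sightings. A hedged sketch, with made-up provider names and timestamps standing in for real GreyNoise observations:

```python
from datetime import datetime
from statistics import median

def time_to_remediation(sightings):
    """Given (provider, first_seen, last_seen) sightings of malicious hosts,
    return the median hours each provider took before activity ceased."""
    per_provider = {}
    for provider, first_seen, last_seen in sightings:
        hours = (last_seen - first_seen).total_seconds() / 3600
        per_provider.setdefault(provider, []).append(hours)
    return {p: median(hs) for p, hs in per_provider.items()}

ttr = time_to_remediation([
    ("cloud-a", datetime(2019, 1, 1, 0), datetime(2019, 1, 1, 12)),
    ("cloud-a", datetime(2019, 1, 2, 0), datetime(2019, 1, 3, 0)),
    ("cloud-b", datetime(2019, 1, 1, 0), datetime(2019, 1, 1, 6)),
])
print(ttr)  # {'cloud-a': 18.0, 'cloud-b': 6.0}
```

The median (rather than the mean) keeps one host that is never cleaned up from dominating a provider's score.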
Identifying and Correlating Internet-wide Scan Traffic to Newsworthy Security...Andrew Morris
In this presentation, we will discuss using GreyNoise, a geographically and logically distributed system of passive Internet scan traffic collector nodes, to identify statistical anomalies in global opportunistic Internet scan traffic and correlate these anomalies with publicly disclosed vulnerabilities, large-scale DDoS attacks, and other newsworthy events. We will discuss establishing (and identifying any deviations away from) a “standard” baseline of Internet scan traffic. We will discuss successes and failures of different methods employed over the past six months. We will explore open questions and future work on automated anomaly detection of Internet scan traffic. Finally, we will provide raw data and a challenge as an exercise to the attendees.
ShmooCon 2015: No Budget Threat Intelligence - Tracking Malware Campaigns on ...Andrew Morris
In this talk, I'll be discussing my experience developing intelligence-gathering capabilities to track several different independent groups of threat actors on a very limited budget (read: virtually no budget whatsoever). I'll discuss discovering the groups using open source intelligence gathering and honeypots, monitoring attacks, collecting and analyzing malware artifacts to figure out what their capabilities are, and reverse engineering their malware to develop the capability to track their targets in real time. Finally, I'll chat about defensive strategies and provide recommendations for enterprise security analysts and other security researchers.
3. About Me
• Andrew Morris
• Background in offensive cyber stuff, security research
• Previously:
  • Endgame R&D
  • Intrepidus (NCC Group)
  • KCG (ManTech)
• Twitter: @Andrew___Morris
4. Lots of people scan the Internet.
I built a system that collects all of the Internet-wide scan traffic.
I analyze the data to find weird stuff.
I make that data available to researchers for free via an API.
6. Background
• Internet-wide mass scanning is easier than ever
• Open source tooling: Masscan, ZMap, UnicornScan, etc.
• Cloud computing
  • Instant servers
  • Large pool of recyclable IP addresses
• High-throughput, faster global Internet connections
7. What is Internet Mass Scanning?
• "Mass scanning" is scanning every single routable IP address on the Internet for something
• The IPv4 address space is 0.0.0.0 – 255.255.255.255
  • Give or take a few reserved blocks
• That's roughly 4.3 billion IP addresses
• Bandwidth-wise, a full sweep is roughly the same as uploading a 240 GB file
8. What does this mean?
• Lots and lots of people scanning the Internet, for lots of different things
• From millions of different IP addresses
• Benign: Shodan, Censys, Sonar, ShadowServer
• Malicious: SSH/Telnet worms (Mirai), IoT worms, Conficker, etc.
• Internet-wide scanning is busier than ever
9. This creates a problem
When you see an IP scanning your network, are they scanning you specifically or the entire Internet?
When you see an IP attacking your network, are they attacking you specifically or the entire Internet?
10. Solution
• Collect all the omnidirectional Internet-wide IPv4 scan/attack traffic
• Subtract those IPs/activity from your SIEM
• All the remaining activity is targeting you
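The subtraction step above can be sketched in a few lines. This is a minimal illustration, not the real GreyNoise pipeline; the function name and IPs are made up for the example.

```python
# Sketch: subtract known mass-scanner IPs from SIEM alert sources,
# leaving only traffic that is plausibly targeting you specifically.

def filter_targeted(alert_ips, noise_ips):
    """Return only alert source IPs NOT seen scanning the whole Internet."""
    return sorted(set(alert_ips) - set(noise_ips))

alerts = ["203.0.113.7", "198.51.100.9", "203.0.113.7", "192.0.2.44"]
noise = {"203.0.113.7", "198.51.100.200"}  # IPs observed by the collector network

print(filter_targeted(alerts, noise))  # the remainder warrants a closer look
```

In practice the "noise" set comes from the collector network's API rather than a hardcoded set, but the logic is the same.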
11. But how?
• Stand up a large number of servers in diverse data centers with no business value
  • No business value means that ANY traffic that hits them is, by definition, opportunistic
• Instrument these servers with extremely aggressive logging and small microservices
• Stream the logs of the scan/attack traffic to a central place
• Analyze the data and convert it into a consumable format
12. Barriers
• It is strategically cheaper to ask a question of the Internet than it is to answer a given question
  • "How many computers are running X version of software?" is easy
  • "How many computers are scanning for X version of software?" is hard
13. Byproducts
• Observe changes in Internet-wide scanning over time
• Opt out of omnidirectional scanning altogether
• Collect information on malware campaigns and botnets
14. History
• Like three honeypots (2014)
• Animus v1 (2015)
  • Bash and glue (ShmooCon 2015, "No Budget Threat Intelligence")
• Related work at a previous company (2015-2016)
• EPIPHANY (2016)
  • THE DATA THAT HONEYPOTS COLLECT IS SHITTY THREAT INTELLIGENCE
  • IT'S LITERALLY THE OPPOSITE OF THREAT INTELLIGENCE
  • IT'S ANTI-THREAT INTELLIGENCE
• Animus goes commercial (2017)
  • Turns out startups are hard
• Grey Noise (2018)
  • I'm not going to stop until I die
• ???
• Become a monk
25. Collection: Data Producers / Services
• Ridiculously aggressive iptables rules
  • Log all packets…
  • …on all ports…
  • …on all protocols
• SSH
• Telnet
• HTTP
• Others
26. Collection: Data Producers / Services (Lessons)
• MISTAKE: Tune your iptables / p0f / sniffers / whatever to ignore garbage and outbound traffic
• LESSON: Things will be spoofed (TCP, UDP, and ICMP)
• LESSON: Bang for your buck: iptables, HTTP, Telnet, SSH, and p0f
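As a rough illustration of what "log all packets" via iptables produces downstream, here is a minimal sketch that parses standard kernel LOG-target lines into structured events. The event shape and field handling are an assumption for illustration; real log lines vary by protocol and kernel version.

```python
import re

# Sketch: turn iptables LOG lines into structured events for the pipeline.
# SRC/DST/PROTO/SPT/DPT are standard fields in the kernel's LOG output.
LOG_RE = re.compile(
    r"SRC=(?P<src>\S+) DST=(?P<dst>\S+) .*?PROTO=(?P<proto>\S+)"
    r"(?: SPT=(?P<spt>\d+) DPT=(?P<dpt>\d+))?"
)

def parse_iptables_line(line):
    """Return a dict of scan-event fields, or None for non-matching lines."""
    m = LOG_RE.search(line)
    if not m:
        return None
    event = m.groupdict()
    if event["dpt"] is not None:  # TCP/UDP carry ports; ICMP does not
        event["spt"] = int(event["spt"])
        event["dpt"] = int(event["dpt"])
    return event

sample = ("kernel: IN=eth0 OUT= SRC=203.0.113.7 DST=198.51.100.5 "
          "LEN=40 TTL=243 ID=54321 PROTO=TCP SPT=43210 DPT=23 WINDOW=65535")
print(parse_iptables_line(sample))
```

Lines that don't match (local chatter, truncated output) simply return None, which is one cheap way to drop garbage before it hits the bus.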
31. Collection: Message Bus (Lessons)
• MISTAKE: Google PubSub
• LESSON: Maintain state
• LESSON: Meta message envelope
  • Time
  • Provider
  • Region
  • Node UUID
• POSSIBLE: ZeroMQ, Kafka
• Streamd
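The "meta message envelope" lesson amounts to wrapping every sensor event in the same metadata before it hits the bus. The four fields below follow the slide; everything else (function names, the payload shape) is illustrative.

```python
import json
import time
import uuid

# Sketch: wrap each raw sensor event in a metadata envelope (time,
# provider, region, node UUID) so consumers can do time-series and
# per-node analysis without guessing where a message came from.

NODE_ID = str(uuid.uuid4())  # assigned once per listener node

def envelope(payload, provider, region, node_id=NODE_ID, now=None):
    return {
        "time": now if now is not None else time.time(),
        "provider": provider,
        "region": region,
        "node": node_id,
        "payload": payload,
    }

msg = envelope({"src": "203.0.113.7", "dpt": 23},
               provider="do", region="nyc3", now=1700000000.0)
print(json.dumps(msg))
```

A stand-in for the real bus client would then publish `json.dumps(msg)`; the point is that the envelope is applied uniformly at the edge, not reconstructed centrally.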
32. Collection: Log Forwarder
•I wrote my own
•Python + Pygtail / iNotify / Watchdog
•Can also use something that’s already been
written
•Logstash
•Elasticsearch Filebeat
•Rsyslog
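The core of a home-grown forwarder like the one above is just "read whatever was appended since last time, remember the offset." A dependency-free sketch (real tools like Pygtail and Filebeat also persist the offset across restarts and handle log rotation):

```python
import os

# Sketch: incremental log tailing, the heart of a log forwarder.
def read_new_lines(path, offset):
    """Return (new_lines, new_offset) for bytes appended after offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode().splitlines(), offset + len(data)

# Demo against a temp file standing in for a sensor's auth log.
path = "demo.log"
with open(path, "w") as f:
    f.write("login attempt root/root\n")
lines, off = read_new_lines(path, 0)          # first pass picks up line 1
with open(path, "a") as f:
    f.write("login attempt admin/admin\n")
more, off = read_new_lines(path, off)         # second pass: only the new line
os.remove(path)
print(lines, more)
```

Each batch of `new_lines` would then be enveloped and shipped to the message bus.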
35. Analysis: Cache / Database
• PostgreSQL
  • N days of data, rotates
  • Fast-ish
  • Robust
• Dumpster (long-term storage)
  • You're going to fuck something up
  • Retro-load is your friend
36. Analysis: Cache / Database (Lessons)
• MISTAKE: Postgres is awesome but too slow for data this big
• MISTAKE: Google BigQuery is the shit, but it gets expensive if you're doing batch queries on a very short timeline
• LESSON: Postgres + Cassandra is the truth
39. Analysis: Enrichments
• We need:
  • ASN
  • rDNS
  • Organization
  • Country
  • City
• MaxMind is expensive
• Neustar is expensive
• ipinfo is CHEAP
• Harvesting it yourself is also CHEAP, but requires a lot of effort
40. Analysis: Enrichments (Lessons)
• MISTAKE: Collecting the data yourself is hard, inconsistent, and involves a lot of work
• LESSON: ARIN has an unauthenticated, non-rate-limited public API for IP ownership
• LESSON: Enrichd
• LESSON: Cache rules everything around me
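"Cache rules everything around me" in practice: since scanner IPs repeat constantly, memoizing enrichment lookups collapses millions of events into a handful of real API calls per window. A minimal sketch, where `fetch_enrichment` is a hypothetical stand-in for an ipinfo/ARIN client:

```python
from functools import lru_cache

# Sketch: cache enrichment lookups so each IP's ASN/geo is fetched once.
CALLS = {"n": 0}

def fetch_enrichment(ip):
    CALLS["n"] += 1  # pretend this is a slow, billable network call
    return {"ip": ip, "asn": "AS64496", "country": "US"}

@lru_cache(maxsize=100_000)
def enrich(ip):
    # lru_cache needs hashable return values, so freeze the dict
    return tuple(sorted(fetch_enrichment(ip).items()))

for ip in ["203.0.113.7", "203.0.113.7", "198.51.100.9"]:
    enrich(ip)
print(CALLS["n"])  # only 2 real lookups for 3 events
```

A real deployment would use a shared cache (Redis, etc.) with a TTL, since enrichment data like ASN ownership does change, just slowly.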
44. Analysis: Analyticsd
• Service to analyze some time window of data
  • E.g. the past 4 days of data
• Catalogue:
  • Actors
    • Shodan
    • Censys
    • Sonar
  • Activity
    • Scanning for SSH
    • Scanning for Telnet
• LESSON: YOU PROBABLY DON'T NEED REAL-TIME ANALYTICS
  • Batch analytics with small time frames
  • This is why Postgres will often do the trick
• LESSON: Only pay attention to activity that has happened on more than one of your nodes
• LESSON: You need to know how many nodes are up and collecting data at any point in time to properly do a time-series analysis
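The "more than one node" lesson can be sketched as a simple batch pass over one window of events: group by source IP, count distinct sensor nodes, drop anything seen by only one. The event shape here is illustrative.

```python
from collections import defaultdict

# Sketch: one batch-analytics pass over a time window of events,
# keeping only source IPs observed by >= min_nodes distinct sensors.
def multi_node_scanners(events, min_nodes=2):
    nodes_by_ip = defaultdict(set)
    for e in events:
        nodes_by_ip[e["src"]].add(e["node"])
    return {ip for ip, nodes in nodes_by_ip.items() if len(nodes) >= min_nodes}

window = [
    {"src": "203.0.113.7", "node": "nyc3", "dpt": 23},
    {"src": "203.0.113.7", "node": "sfo2", "dpt": 23},   # same IP, second node
    {"src": "198.51.100.9", "node": "nyc3", "dpt": 445}, # one node only: dropped
]
print(multi_node_scanners(window))
```

Filtering this way cheaply discards spoofed packets and one-off noise, since a genuine Internet-wide scanner will land on many geographically separate nodes.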
47. Consumption: API
• Web API
  • Tell me about this IP address
  • Tell me about this analytic
• GitHub
  • Search "Grey Noise API"
  • github.com/Grey-Noise-Intelligence
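A "tell me about this IP" call can be sketched with nothing but the standard library. The endpoint path and parameter below follow the v1 query/ip shape the project documented around this time; treat both as assumptions and check the current GreyNoise documentation before relying on them.

```python
from urllib import request

# Sketch: build a "tell me about this IP" query against the web API.
# The base URL and endpoint are assumptions based on the v1-era docs.
API_BASE = "http://api.greynoise.io/v1"

def build_ip_query(ip):
    """Return a ready-to-send POST request asking about one IP."""
    data = ("ip=%s" % ip).encode()
    return request.Request(API_BASE + "/query/ip", data=data, method="POST")

req = build_ip_query("198.51.100.9")
print(req.full_url, req.data)
# resp = json.loads(request.urlopen(req).read())  # actual network call
```

The community bindings on the next slide wrap exactly this kind of call in friendlier interfaces.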
48. Consumption: Bindings
• Bobby Filar: phyler/greynoise
• Tek: PyGreyNoise
• Bob Rudis: R bindings
• Some mystery Go bindings out there
49. Consumption: FRONT END
• Complete 100% credit to Casey Buto (github.com/cbuto)
• Point and click interface
• Hosted version at viz.greynoise.io
• EXPLORE THE DATA
52. OpSec (Operational Security)
• Hard to fingerprint (mostly custom services)
• Encrypt everything
• No names
• Ops domains
• Dockerize
• Shift infrastructure constantly
• Reduce the oracle surface
• IO is hard to opsec
• Minimum number / node thresholds
• Sleep delays
53. Cost
• AWS: 15 regions
  • $4.75 per box
  • Total: $71
• DigitalOcean: 11 regions
  • $5 per box
  • Total: $55
• Google: 36 regions
  • $4.28 per box
  • Total: $154
• Vultr: 15 regions
  • $5 per box (they advertise $2.50, but those are never available)
  • Total: $75
• Linode: 9 regions
  • $5 per box
  • Total: $45
• Grand total: ~$400 per month
54. Cost (notes)
• No ops boxes in here (you need these)
• This is simply not enough for complete coverage, but it'll give you a good start
• You can save money by buying extra IPs, but it complicates engineering
56. Analysis
• What am I collecting?
• Volume Summary
• Data Summary
• Actor Summary
  • Benign
  • Malicious
  • Unknown???
• Malware Summary
• Hall of Shame (malware-iest regions of the Internet)
• WEIRD SHIT
• Misc Lessons
57. What am I collecting?
• Passive
  • iptables – packets on ports
  • p0f – passive OS fingerprinting
  • JA3 – SSL/TLS fingerprinting (stick around!)
• Active
  • HTTP
  • SSH
  • Telnet
• Experimental
  • RDP
  • SIP
  • SMTP
  • NTP
  • TFTP
  • DNS
58. Data Summary
• iptables:
  • I don't have a good way to quantify this yet
• HTTP:
  • Lots of "/", spoofed user agents, search engines, people looking for JBoss/WordPress/Tomcat/phpMyAdmin
• SSH + Telnet:
  • Bots. Default cred attempts. Nothing new here.
• p0f:
  • Lots of OS visibility
59. Volume Summary
• With the aforementioned numbers ($400 worth of servers):
  • 1M – 2M iptables events per day
  • 700K – 1M SSH logins per day
  • 1M – 10M Telnet logins per day
  • 10K – 100K HTTP requests per day
  • 100 – 200 messages per second through your queue
  • ~60K unique IPs per day
  • ~1 GB of raw data per day, msgpacked + compressed
64. Pretenders
• Machines advertising client banners that are false
  • Mismatches between user agent, p0f OS fingerprint, and JA3
  • Is the browser hitting this HTTP server really running Safari on a Linux kernel 3.1 box? Is it?
• Why? Idk
65. Dangling DNS
• When you spin a bunch of IPs up and down, it's not uncommon to inherit an IP address from your cloud provider that still has a domain pointing to it
  • CDN.whatever123.acme.com
• This traffic is dirty; you don't want it
66. "WORM FINDER"
• Sometimes when Grey Noise observes an IP address scanning for a given TCP port, I'll turn around and check to see if that port is open on the source machine
• If the answer is yes, this can be a great indicator of a worm
  • Why else would a computer search for behavior that it also exhibits?
• Average lifespan from start to finish is 4 days
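The worm-finder check itself is tiny: when an IP is seen scanning port N, probe whether port N is also open on the scanner. A minimal sketch with a hypothetical helper name:

```python
import socket

# Sketch: is the port this host scans for also open on the host itself?
# A "yes" is a decent worm indicator, per the slide above.
def scans_own_port(ip, port, timeout=2.0):
    """Return True if `port` accepts connections on the scanning host `ip`."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

A real deployment would rate-limit and randomize these probes so the check itself doesn't become a fingerprintable oracle (see the OpSec slide).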
67. ZMap's hardcoded ID parameter
• ZMap hardcodes all packets it creates with an IP ID of 54321, making it trivial to fingerprint
• Go to github.com/zmap/zmap and search/grep the repository for "54321"
• Shoutout to Oliver Gasser @ Technical University of Munich
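Checking for that fingerprint takes one line once you have the raw IPv4 header: the identification field lives in bytes 4-5. A minimal sketch:

```python
import struct

# Sketch: flag probable ZMap probes by the hardcoded IP ID of 54321.
ZMAP_IP_ID = 54321

def looks_like_zmap(ip_header):
    """ip_header: the first 20 bytes of an IPv4 packet."""
    (ident,) = struct.unpack("!H", ip_header[4:6])  # IP ID, network byte order
    return ident == ZMAP_IP_ID

# Minimal fake IPv4 header with ID=54321 in bytes 4-5 (rest zeroed).
hdr = bytearray(20)
hdr[4:6] = struct.pack("!H", ZMAP_IP_ID)
print(looks_like_zmap(bytes(hdr)))  # True
```

In a live pipeline you'd pull the header bytes from a pcap or raw socket rather than fabricating them, but the field offset is the same.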
68. Still SO MANY WINDOWS WORMS
• LOADS of people blasting SMB traffic at TCP port 445
• More and more RDP worms as well, but these aren't exploiting vulns, just guessing creds
• WinRM is next, in my opinion
69. People do weird stuff through proxies
• Airline price-scraping data (???)
• Also testing stolen credentials
• And probably credit card numbers
• News sites??? This is a huge rabbit hole…
70. Lots of robocalls probably come from popped SIP boxes
• People try to make calls to India and Russia through open VoIP servers
• Like, LOTS of them
• Tens of thousands per day
72. You can neuter/blow up worms by replaying their own traffic back to them
• A box is compromised with a Telnet worm
• The worm carries a built-in wordlist
• The compromised box throws the same wordlist at you
• You replay the wordlist back to the compromised box
• Chances are, depending on the worm, one of those credentials will work
74. What does the future hold?
• Version 1.1 API coming very soon
• Integrate with everything
• Badass machine learning opportunities
• Explore identifying anti-threat intelligence in other areas
  • Intranet traffic
  • DMZ traffic
  • Files on a filesystem
76. Conclusion
• The Internet is a noisy place
• Every packet has a story
• It's possible to collect all of this background noise
• If you want to explore the data, hit the API. If the API doesn't give you what you need, email me or hit me up on Twitter
77. Acknowledgements
• Phil Maddox (twitter.com/foospidy)
• Bobby Filar (twitter.com/filar)
• Rich Seymour (twitter.com/rseymour)
• Casey Buto (github.com/cbuto)
• Bob Rudis (twitter.com/hrbrmstr)
• Tek (twitter.com/tenacioustek)
• Mickey Perre (twitter.com/MickeyPerre)
• Michel Oosterhof (twitter.com/micheloosterhof)