Leveraging DNS to Surface Attacker Activity
March 2017 • Josh Liburdi & Chris McCubbin
Presenters
Chris McCubbin
Sqrrl Director of Data Science
Josh Liburdi
Sqrrl Security Technologist
Agenda
• Leveraging DNS data for investigations
• DNS-based data science techniques
• An example of Tunneling and DGA detection
Leveraging DNS Data for Investigations
What is DNS?
1. Client needs to connect to: https://www.sqrrl.com
2. Client's DNS server doesn't know where sqrrl.com is hosted, forwards query to upstream server
3. Upstream DNS server knows sqrrl.com resolves to 104.196.225.76, returns response
4. Client's DNS server caches response, sends response to client
5. Client connects to https://www.sqrrl.com
[Diagram: two DNS servers and https://sqrrl.com, with steps 1-5 marked]
How do attackers use DNS?
• Attackers target DNS
– DNS spoofing
– DNS reflection
• Attackers utilize DNS
– Tunneling
– Domain Generation Algorithms (DGA)
– Dynamic DNS
Why is DNS data useful?
Threat Detection
Opportunity for attacker to leave traceable
footprints in your network
Incident Investigations
Keep track of attacker access in your
network
DNS Tunneling Overview
• Data encoded inside of DNS queries is sent to an attacker-controlled server
• Used for command and control, data exfiltration
• Bypasses common security controls (firewalls, web proxies)
[Diagram: DNS Tunnel Client, Local DNS Resolver, Intermediate DNS Resolver, DNS Tunnel Server; labels: Local Network, Remote Network, *.tunnel.com]
DNS Tunneling Overview
• Many queries required to transfer moderate amounts of data
– 1 MB transfer would take ~5k domains
• Tunnels produce patterns
– paeqcigq.tunnel.com
– pafich3i.tunnel.com
– gxqwl0eaytioruga5.tunnel.com
• Queried DNS domains tend to be unique
– Assuming no repeats in data, each domain will contain unique labels
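To make the query-count arithmetic concrete, here is a minimal Python sketch of how a tunnel client might pack data into query names. The base32 encoding, the 100-byte chunk size, and the helper name `encode_queries` are illustrative assumptions, not details from the talk. Only the 63-octet label limit and the 253-character name limit come from the DNS specification.

```python
import base64

MAX_LABEL = 63   # DNS limit: octets per label
MAX_NAME = 253   # DNS limit: characters in a full domain name (text form)


def encode_queries(data: bytes, suffix: str = "tunnel.com",
                   bytes_per_query: int = 100) -> list[str]:
    """Split data into chunks and encode each chunk as one query name.

    Hypothetical helper: real tunneling tools differ in encoding and framing.
    """
    queries = []
    for start in range(0, len(data), bytes_per_query):
        chunk = data[start:start + bytes_per_query]
        # Base32 keeps the payload within the case-insensitive DNS alphabet.
        encoded = base64.b32encode(chunk).decode("ascii").rstrip("=").lower()
        labels = [encoded[i:i + MAX_LABEL]
                  for i in range(0, len(encoded), MAX_LABEL)]
        name = ".".join(labels + [suffix])
        if len(name) > MAX_NAME:
            raise ValueError("chunk too large for a single DNS name")
        queries.append(name)
    return queries


payload = bytes(range(256)) * 4          # 1,024 bytes of sample data
names = encode_queries(payload)
print(len(names), "queries, e.g.", names[0])
```

Every chunk of distinct data yields a distinct name, which is why tunnel traffic shows many unique subdomains under one parent domain. The slide's estimate of ~5k domains for 1 MB corresponds to roughly 200 bytes of payload per query.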
DGA Overview
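def generate_domain(year, month, day):
    """Generate a pseudo-random 16-letter domain label from a date seed."""
    domain = ""
    for _ in range(16):
        # Scramble each date component with shifts and XORs.
        year = ((year ^ 8 * year) >> 11) ^ ((year & 0xFFFFFFF0) << 17)
        month = ((month ^ 4 * month) >> 25) ^ 16 * (month & 0xFFFFFFF8)
        day = ((day ^ (day << 13)) >> 19) ^ ((day & 0xFFFFFFFE) << 12)
        # Map the combined state onto a lowercase letter (a-y).
        domain += chr(((year ^ month ^ day) % 25) + 97)
    return domain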
• Method of establishing a connection with a command and control server
• Used to protect / hide infrastructure and evade detection
– Avoids DNS domain blacklisting
• Malware generates DNS domains based on an algorithm and a seed
– Seed may be hardcoded or determined dynamically (e.g., current datetime)
Source: en.wikipedia.org/wiki/Domain_generation_algorithm#Example
DGA Overview
Source: https://johannesbader.ch/2014/12/the-dga-of-newgoz/
• DGAs produce patterns
– Visually appear “off”
– Human would interpret the domain as strange (pmwtrdsv.ru) or nonsensical (turnipboxsea.com)
• Malware may attempt to resolve many unregistered domains
– ci4u0c10b77f5opvn211n5poa3.com
– wiqyhl13dkep615aec27ue2t2t.net
– mkguv3bd2hi317d9l8vdy4i6m.org
– 1xah67i2ayufesns8mh12h1kab.net
– 17m4oq6jngoka7zxtoq1taebe1.com
DGA Overview
Malware      Seed                                  # Domains in wild
Alureon      Thread ID + milliseconds since boot   5/day
Padcrypt     Date                                  24/day or 72/day
ProsLikeFan  Date, hardcoded                       100/day
Qadars       Date                                  200/day
Qakbot       Date                                  5000/day
Sisron       Date                                  4/day
Source: https://johannesbader.ch/
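The table shows that a date seed can yield anywhere from a handful to thousands of domains per day. The sketch below illustrates the idea with a toy generator. The SHA-256 construction, the function name `toy_dga`, and the 12-letter label length are assumptions made for illustration and do not reproduce any of the malware families listed.

```python
import hashlib
from datetime import date


def toy_dga(seed_date: date, count: int, tld: str = ".com") -> list[str]:
    """Derive `count` pseudo-random domains from a date seed (toy example)."""
    domains = []
    for index in range(count):
        material = f"{seed_date.isoformat()}-{index}".encode()
        digest = hashlib.sha256(material).digest()
        # Map the first 12 digest bytes onto lowercase letters a-z.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:12])
        domains.append(label + tld)
    return domains


# The output is deterministic for a given seed, so a defender who has
# recovered the algorithm can precompute the same list for any date.
march_first = toy_dga(date(2017, 3, 1), count=5)
march_second = toy_dga(date(2017, 3, 2), count=5)
print(march_first)
```

Because the list changes with the seed, blocking yesterday's domains does nothing about today's, which is the blacklist-evasion property the overview describes.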
DNS-Based Data Science Techniques
DNS Data Sources
DNS Tunnel Detection
Pipeline: DNS Data → Filter → DNS Data → Collation → Features → Classifier → Risk Outliers
DNS Tunnel Detection
Pipeline stage: DNS Data → Filter → DNS Data → Collation
[Figure: number of DNS requests over time, in 1 hour buckets]
Collation: IP + Destination → Domain Session
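The collation step can be sketched as grouping queries by source IP, destination domain, and hour. This is a minimal illustration, not the presenters' implementation. The record layout, the function names, and the "last two labels" rule for the registered domain are assumptions, and that rule is wrong for suffixes such as co.uk, where a real system would consult the Public Suffix List.

```python
from collections import defaultdict
from datetime import datetime


def registered_domain(qname: str) -> str:
    """Naive assumption: the registered domain is the last two labels."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def collate(records):
    """Group (timestamp, source_ip, qname) records into 1 hour sessions.

    Returns {(source_ip, registered_domain, hour_start): [subdomain, ...]}.
    """
    sessions = defaultdict(list)
    for ts, src_ip, qname in records:
        name = qname.rstrip(".").lower()
        domain = registered_domain(name)
        subdomain = name[: -len(domain)].rstrip(".")
        if not subdomain:
            continue  # no subdomain, so nothing could be encoded in it
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        sessions[(src_ip, domain, hour_start)].append(subdomain)
    return dict(sessions)


records = [
    (datetime(2017, 3, 1, 10, 5), "10.0.0.5", "paeqcigq.tunnel.com"),
    (datetime(2017, 3, 1, 10, 40), "10.0.0.5", "pafich3i.tunnel.com"),
    (datetime(2017, 3, 1, 11, 2), "10.0.0.5", "gxqwl0eaytioruga5.tunnel.com"),
    (datetime(2017, 3, 1, 10, 7), "10.0.0.9", "tunnel.com"),
]
sessions = collate(records)
for key, subdomains in sessions.items():
    print(key, subdomains)
```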
DNS Tunnel Classification Features
Features, computed for each IP + Destination → Domain Session:
• Number of queries
• Number of subdomains
• Average subdomain length
• Average information content of subdomains
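The four tunnel features can be computed from the list of subdomains queried in one session. The sketch below is illustrative only. It interprets "information content" as per-character Shannon entropy, which is an assumption because the slides do not define the measure, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits per character, from the character frequencies within `text`."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def session_features(subdomains: list[str]) -> dict:
    """Compute the four features for one session (expects a non-empty list)."""
    unique = set(subdomains)
    return {
        "num_queries": len(subdomains),
        "num_subdomains": len(unique),
        "avg_subdomain_length": sum(map(len, unique)) / len(unique),
        "avg_information_content": sum(map(shannon_entropy, unique)) / len(unique),
    }


features = session_features(
    ["paeqcigq", "pafich3i", "gxqwl0eaytioruga5", "paeqcigq"]
)
print(features)
```

For a tunnel, the number of unique subdomains approaches the number of queries, and both the average length and the entropy tend to be high.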
DNS Tunnel Classification
Features → Classifier → Risk Outliers
• Number of queries
• Number of subdomains
• Average subdomain length
• Average information content of subdomains
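The speaker notes describe ranking each candidate by how improbable its feature values are among all observed traffic. The sketch below is a simplified stand-in for that idea, not the multivariate Bayesian classifier used in the talk. It assumes the features are independent and uses a smoothed empirical tail probability, and all names and sample values are hypothetical.

```python
import math


def tail_probability(value: float, observed: list[float]) -> float:
    """Empirical P(X >= value) among observed values, smoothed to avoid zero."""
    at_least = sum(1 for x in observed if x >= value)
    return (at_least + 1) / (len(observed) + 1)


def outlier_scores(sessions: dict[str, dict[str, float]]) -> dict[str, float]:
    """Score each session by summing -log tail probabilities over its features.

    Treating features as independent is a simplifying assumption of this sketch.
    """
    feature_names = list(next(iter(sessions.values())))
    columns = {f: [s[f] for s in sessions.values()] for f in feature_names}
    return {
        key: sum(-math.log(tail_probability(s[f], columns[f]))
                 for f in feature_names)
        for key, s in sessions.items()
    }


observed = {
    "a": {"num_queries": 10, "avg_subdomain_length": 5},
    "b": {"num_queries": 12, "avg_subdomain_length": 6},
    "c": {"num_queries": 5000, "avg_subdomain_length": 30},
}
scores = outlier_scores(observed)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Sessions whose values are rare in the upper tail of every feature accumulate the highest score and therefore rank first.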
DNS Tunnel Validation
DNS Data → Filter → DNS Data → Collation
Example tunnel queries:
paeqcigq.tunnel.com
pafich3i.tunnel.com
gxqwl0eaytioruga5.tunnel.com
Collation: IP + Destination → Domain Session
Lessons Learned from testing on Sqrrl DNS data
• There are several potential sources of false positives:
– CDNs
– Anti-virus software
– Internal DNS traffic
– Popular services (Spotify, Slack, …)
• Many of these organize content under long, random-looking subdomain names
• Whitelisting can remove some of these false positives
• A hard cut requiring > K unique subdomains per user per hour helps significantly
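The whitelist and the hard cut can be applied as a filter before classification. This is a minimal sketch under stated assumptions: the threshold value, the whitelist entries, the session layout, and the function name are all hypothetical, and the source IP stands in for the user.

```python
WHITELIST = {"slack-msgs.com", "sqrrl-lab.net"}   # example entries only
K = 100                                            # hypothetical threshold


def candidate_sessions(sessions: dict, whitelist=WHITELIST, k=K) -> dict:
    """Keep non-whitelisted sessions with more than k unique subdomains.

    `sessions` maps (source_ip, registered_domain, hour) to a list of
    queried subdomains.
    """
    return {
        key: subs
        for key, subs in sessions.items()
        if key[1] not in whitelist and len(set(subs)) > k
    }


hourly = {
    ("10.0.0.5", "tunnel.com", 10): [f"x{i}" for i in range(150)],
    ("10.0.0.5", "slack-msgs.com", 10): [f"y{i}" for i in range(500)],
    ("10.0.0.7", "example.org", 10): ["www", "mail"],
}
kept = candidate_sessions(hourly)
print(list(kept))
```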
Sqrrl traffic data feature plots
[Figure: Average Length vs. Number of Subdomains. Labeled points: Phishing; YouTube, Amazon AWS, CDNs, anti-virus, anti-spam; sqrrl-lab.net; slack-msgs.com]
Sqrrl traffic data feature plots
[Figure: two panels of Unique Queries vs. Number of Subdomains. Labeled points: eclampsialemontree.net, slack, sqrrl-lab, anti-virus, Ad servers]
eclampsialemontree.net
• Queries to 284 unique subdomains with names like:
– ykzcpj1j4ovv3nc1mcgg27ji7uzf4o,
– yhgir5h3ts3rppd3j3bph1se4rjqtj,
– Pkbenvnzwo2jl2onldka17rv5uu2kd,
– Kinkascic,
– Kinkascie,
– Kinkascig
• Most queried just once, a few 2-4 times
• Length always a multiple of 3, almost always 30 or 9 characters
• Appears to be a malware site that attempts to inject invisible frames into ads
DNS DGA Detection
Pipeline: DNS Data → Filter → DNS Data → Collation → Features → Classifier → Risk Outliers
DNS DGA Detection
Pipeline stage: DNS Data → Filter → DNS Data → Collation
Collation: IP → Domain Session
[Figure: requests sent over time, with a DNS Session marked]
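One way to form per-IP sessions is to split each source's queries wherever there is a long idle gap. The talk does not say how session boundaries were chosen, so the gap-based rule, the 5 minute threshold, and the function name below are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(records, max_gap=timedelta(minutes=5)):
    """Split each source IP's queries into sessions separated by idle gaps.

    `records` is an iterable of (timestamp, source_ip, qname).
    Returns {source_ip: [[(timestamp, qname), ...], ...]}.
    """
    by_ip = defaultdict(list)
    for ts, src_ip, qname in records:
        by_ip[src_ip].append((ts, qname))

    sessions = {}
    for src_ip, events in by_ip.items():
        events.sort(key=lambda e: e[0])
        current = [events[0]]
        sessions[src_ip] = [current]
        for event in events[1:]:
            if event[0] - current[-1][0] > max_gap:
                current = [event]              # idle gap: start a new session
                sessions[src_ip].append(current)
            else:
                current.append(event)
    return sessions


log = [
    (datetime(2017, 3, 1, 10, 0), "10.0.0.5", "ci4u0c10b77f5opvn211n5poa3.com"),
    (datetime(2017, 3, 1, 10, 30), "10.0.0.5", "mkguv3bd2hi317d9l8vdy4i6m.org"),
    (datetime(2017, 3, 1, 10, 1), "10.0.0.5", "wiqyhl13dkep615aec27ue2t2t.net"),
]
dga_sessions = sessionize(log)
print([len(s) for s in dga_sessions["10.0.0.5"]])
```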
DNS DGA Classification Features
Features, computed for each IP → Domain Session:
• Session duration
• Number of unique NxDomains
• Average information content of subdomains
[Figures: histogram for day of the week; histogram for hour of the day]
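The DGA session features can be sketched as follows. This is an illustration under assumptions: each event is taken to be a (timestamp, qname, rcode) tuple, information content is interpreted as per-character Shannon entropy of the leftmost label, and the function names are hypothetical. The hour and weekday fields correspond to the two histograms.

```python
import math
from collections import Counter
from datetime import datetime


def entropy(text: str) -> float:
    """Per-character Shannon entropy in bits."""
    counts = Counter(text)
    return -sum(n / len(text) * math.log2(n / len(text)) for n in counts.values())


def dga_session_features(events):
    """events: time-ordered list of (timestamp, qname, rcode) for one session."""
    start, end = events[0][0], events[-1][0]
    nx = {qname for _, qname, rcode in events if rcode == "NXDOMAIN"}
    labels = [qname.split(".")[0] for _, qname, _ in events]
    return {
        "duration_seconds": (end - start).total_seconds(),
        "unique_nxdomains": len(nx),
        "avg_information_content": sum(map(entropy, labels)) / len(labels),
        "hour_of_day": start.hour,
        "day_of_week": start.weekday(),   # Monday == 0
    }


session = [
    (datetime(2017, 3, 1, 3, 0, 0), "ci4u0c10b77f5opvn211n5poa3.com", "NXDOMAIN"),
    (datetime(2017, 3, 1, 3, 0, 30), "wiqyhl13dkep615aec27ue2t2t.net", "NXDOMAIN"),
    (datetime(2017, 3, 1, 3, 1, 0), "www.sqrrl.com", "NOERROR"),
]
dga_features = dga_session_features(session)
print(dga_features)
```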
DNS DGA Classification
Features → Classifier → Risk Outliers
• Session duration
• Number of unique NxDomains
• Average information content of subdomains
[Figures: histogram for day of the week; histogram for hour of the day]
DNS DGA Validation
DNS Data → Filter → DNS Data → Collation
Example DGA queries:
ci4u0c10b77f5opvn211n5poa3.com
wiqyhl13dkep615aec27ue2t2t.net
mkguv3bd2hi317d9l8vdy4i6m.org
1xah67i2ayufesns8mh12h1kab.net
17m4oq6jngoka7zxtoq1taebe1.com
Collation: IP → Domain Session
Combined DGA Risk Score
[Figure: Combined Rank Separation. Combined Rank vs. Index; series: Normal, DGA]
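The speaker notes describe combining three classifiers (domain, record, and session) into a final risk score. The notes do not state the combination rule, so the rank-sum below is an assumed stand-in chosen for simplicity, and the function names and sample scores are hypothetical.

```python
def ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank keys by ascending score: the lowest score gets rank 0."""
    ordered = sorted(scores, key=scores.get)
    return {key: position for position, key in enumerate(ordered)}


def combined_rank(*classifier_scores: dict[str, float]) -> dict[str, int]:
    """Sum each candidate's rank across classifiers (assumed combination rule)."""
    per_classifier = [ranks(scores) for scores in classifier_scores]
    return {key: sum(r[key] for r in per_classifier)
            for key in per_classifier[0]}


domain_scores = {"s1": 0.1, "s2": 0.9, "s3": 0.4}
record_scores = {"s1": 0.2, "s2": 0.7, "s3": 0.8}
session_scores = {"s1": 0.3, "s2": 0.95, "s3": 0.5}

combined = combined_rank(domain_scores, record_scores, session_scores)
print(combined)
```

Working with ranks rather than raw scores keeps one classifier's scale from dominating the others, at the cost of discarding how far apart the scores are.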
Example Tunneling and DGA Detection
DNS Tunnel
DGA
Graph Investigation
User & Entity Behavior Analytics
The Heart of Next-Generation Threat Hunting
What's included in this
• What you need to know about advanced behavioral analytics
• How it can automate and revolutionize threat hunting
• How to use it for streamlined threat detection practices
info.sqrrl.com/download-ueba-ebook
Questions


Editor's Notes

  • #6 Phonebook for the Internet: use a DNS domain name to look up an IP address
    – You can’t stop DNS
    – Protocol details: runs primarily on UDP (stateless); queries recursively propagate until an answer is determined
    – Server provides a time-to-live (TTL), which determines how long the answer should be cached
  • #8 Potentially mention threat intelligence: monitor attacker infrastructure from afar
  • #18
    – Number of queries: should be large for tunnels
    – Number of subdomains: should be large and equal to or approaching the number of queries
    – Average subdomain length: should be large for tunnels
    – Average information content of subdomains: should be higher for tunnels
  • #19 Classify outlier-ness using a multivariate Bayesian classifier
    – Assigns a ranking score for each detection candidate triple (source, destination, time)
    – For each classifier feature (number of queries, subdomains, avg. length and info), determine the probability of that feature value among all observed traffic
    – Greater outliers are given higher ranks
    – Final risk score depends on the rank, the expected rate of attacks, and the time span of the analyzed data
  • #20 To test the detector, we use the Sqrrl DNS data
    – We “inject” tunnels, i.e. add them to the logs for regular traffic
    – We can vary subdomain lengths; we have tried from ~10 characters up to the maximum length
    – Typically include ~500 - 10,000 queries in a tunnel injection
    – The system finds all the injected tunnels
    – In the Sqrrl data, we typically have two false positives due to sophosxl AV software on two separate computers, BUT these look very similar to tunneling activity
  • #25 Detection based on classifying sessions (source IP, time interval)
    – Destination is a primary domain
    – Can eliminate all legitimate primary domains before sessionization
    – For each session, compute feature vector
    – Make an assumption that most DGA requests do not exist in DNS (NxDomain)
  • #26 Detection based on classifying triples (source IP, destination, time interval)
    – Destination is a “registered domain”, usually a TLD plus the next level (google.com, guardian.co.uk, mysite.cloudfront.net)
    – Use records of DNS requests for subdomains under each registered domain, e.g. “maps”, “docs”, “mail”, “mymap.maps” might be subdomains of “google.com”
    – For each triple, compute feature vector to quantify properties of the subdomains under that registered domain
    – We can ignore queries for registered domains with no subdomain: no subdomain means there can’t be any encoded message
    – Can reasonably whitelist domains of major sites
  • #27
    – Session duration
    – Number of unique NxDomains: should be large
    – Time of day and day of week: DGAs are not constrained to normal work hours
    – Average information content of subdomains: should be higher for DGA
  • #28 Multi-classifier approach
    – One classifier for each of three focus areas
    – Combine results of classifiers into a final risk score
    – Domain Classifier: how unusual is the given domain name in comparison to other domains seen in normal traffic?
    – Record Classifier: how unusual is the given DNS record?
    – Session Classifier: how unusual is the given DGA session?
  • #29 Bro logs of 90 days of Sqrrl DNS traffic
    – Inject data with real DGA records
    – Domains generated from real DGA reverse-engineered code
    – Model real DGA timing