Accurately detect domain generation algorithm (DGA) activity with the Elastic Stack by deploying a pre-trained supervised machine learning model that enriches Packetbeat data at ingest, then using anomaly detection to improve accuracy and pinpoint malicious hosts. We walk through the configuration and then dive into what each component does, how the model was trained, and how it can detect real-world malware activity.
Using machine learning to detect DGA with >99.9% accuracy
Steve Dodson
Tech Lead, Machine Learning
2. 2
This presentation and the accompanying oral presentation contain forward-looking statements, including statements
concerning plans for future offerings; the expected strength, performance or benefits of our offerings; and our future
operations and expected performance. These forward-looking statements are subject to the safe harbor provisions
under the Private Securities Litigation Reform Act of 1995. Our expectations and beliefs in light of currently
available information regarding these matters may not materialize. Actual outcomes and results may differ materially
from those contemplated by these forward-looking statements due to uncertainties, risks, and changes in
circumstances, including, but not limited to those related to: the impact of the COVID-19 pandemic on our business
and our customers and partners; our ability to continue to deliver and improve our offerings and successfully
develop new offerings, including security-related product offerings and SaaS offerings; customer acceptance and
purchase of our existing offerings and new offerings, including the expansion and adoption of our SaaS offerings;
our ability to realize value from investments in the business, including R&D investments; our ability to maintain and
expand our user and customer base; our international expansion strategy; our ability to successfully execute our
go-to-market strategy and expand in our existing markets and into new markets, and our ability to forecast customer
retention and expansion; and general market, political, economic and business conditions.
Additional risks and uncertainties that could cause actual outcomes and results to differ materially are included in
our filings with the Securities and Exchange Commission (the “SEC”), including our Annual Report on Form 10-K for
the most recent fiscal year, our quarterly report on Form 10-Q for the most recent fiscal quarter, and any
subsequent reports filed with the SEC. SEC filings are available on the Investor Relations section of Elastic’s
website at ir.elastic.co and the SEC’s website at www.sec.gov.
Any features or functions of services or products referenced in this presentation, or in any presentations, press
releases or public statements, which are not currently available or not currently available as a general availability
release, may not be delivered on time or at all. The development, release, and timing of any features or functionality
described for our products remains at our sole discretion. Customers who purchase our products and services
should make the purchase decisions based upon services and product features and functions that are currently
available.
All statements are made only as of the date of the presentation, and Elastic assumes no obligation to, and does not
currently intend to, update any forward-looking statements or statements relating to features or functions of services
or products, except as required by law.
Forward-Looking Statements
3. Overview
• Intro: Domain Generation Algorithms (DGAs)
• Training a supervised model to detect DGA activity
• Deploying a supervised model to detect DGA activity
• Anomaly detection + supervised learning
8. Domain Generation Algorithms (DGAs)
[Diagram: an infected host dynamically generates domain names and looks them up through a DNS resolver; 103yzy.com resolves to 107.183.127.132, the command and control (C2) server]
Dynamically generated domain names:
003zzy.com
103yzy.com
203xzy.com
303wzy.com
403vzy.com
503uzy.com
603tzy.com
703szy.com
803rzy.com
903qzy.com
a03izy.com
b03hzy.com
c03gzy.com
d03fzy.com
e03ezy.com
f03dzy.com
...
9. Domain Generation Algorithms (DGAs)
• Domains only need to be registered when needed
• Blocklists become infeasible
– Domains are typically pseudo-random strings seeded by variables such as:
• Time
• Daily trending Twitter hashtag
• Insignificant digits of foreign exchange rates
• Weather temperature
– Huge number of potential domains (DGArchive has >100 million domain names)
• Detection via Machine Learning
– Unsupervised clustering e.g. NXDOMAIN responses, domain name trigrams
– Supervised modeling e.g. classification via LSTM, CNN or RNN networks
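To make the seeding idea above concrete, here is a minimal toy sketch of a time-seeded generator. It is not taken from any real malware family: the hashing scheme, the alphabet, and the function name `generate_domains` are all assumptions chosen for illustration.

```python
import hashlib
from datetime import date

# Characters allowed in the generated label (an illustrative choice).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def generate_domains(seed_date, count=5, length=10, tld="com"):
    """Return `count` pseudo-random domains derived only from the date.

    Any host running the same code on the same day computes the same
    list, so no domain list ever has to be shipped with the malware.
    """
    domains = []
    for index in range(count):
        seed = f"{seed_date.isoformat()}:{index}".encode("ascii")
        digest = hashlib.sha256(seed).digest()
        label = "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:length])
        domains.append(f"{label}.{tld}")
    return domains


if __name__ == "__main__":
    for domain in generate_domains(date(2020, 7, 6)):
        print(domain)
```

Because the output depends only on the seed, the operator needs to register just one of the day's candidates ahead of time, while a defender would have to block all of them.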
10. DGA Detection Using the Elastic Stack
Supervised + Unsupervised Machine Learning
[Diagram: Packetbeat collects DNS data → ingest node predicts DGA-generated domain names (machine learning) → data node stores the events → ML node identifies anomalous DGA clients (machine learning)]
11. DGA Detection Using the Elastic Stack
Create supervised model to predict probability of DGA generated domain name
[Diagram: Packetbeat collects DNS data; the ingest node predicts DGA-generated domain names]
Packetbeat DNS event before enrichment:
{
  "@timestamp": "2016-04-24T05:27:21.276Z",
  "query": "class IN, type A, e5353.g.akamaiedge.net",
  "type": "dns",
  "client": {
    "ip": "172.31.1.6"
  },
  "dns": {
    "type": "answer",
    "op_code": "QUERY",
    "question": {
      "name": "e5353.g.akamaiedge.net",
      "type": "A",
      "class": "IN",
      "etld_plus_one": "akamaiedge.net",
      "registered_domain": "akamaiedge.net",
      "top_level_domain": "net",
      "subdomain": "e5353.g"
    },
    "response_code": "NOERROR"
  },
  ...
}
The same event after enrichment at ingest, with the ml_is_dga field added:
{
  "@timestamp": "2016-04-24T05:27:21.276Z",
  "query": "class IN, type A, e5353.g.akamaiedge.net",
  "type": "dns",
  "client": {
    "ip": "172.31.1.6"
  },
  "dns": {
    "type": "answer",
    "op_code": "QUERY",
    "question": {
      "name": "e5353.g.akamaiedge.net",
      "type": "A",
      "class": "IN",
      "etld_plus_one": "akamaiedge.net",
      "registered_domain": "akamaiedge.net",
      "top_level_domain": "net",
      "subdomain": "e5353.g"
    },
    "response_code": "NOERROR"
  },
  "ml_is_dga" : {
    "malicious_probability" : 0.000462264881252894,
    "malicious_prediction" : 0
  },
  ...
}
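The ingest-time enrichment above could be wired up with an Elasticsearch ingest pipeline that calls the `inference` processor. The sketch below only builds the pipeline body as a Python dictionary; the model ID `dga_classifier_example` and the function name are hypothetical, and the processors that would extract the n-gram features beforehand and rename the output into `malicious_probability` and `malicious_prediction` are deliberately left out.

```python
import json

# Hypothetical ID of a trained classification model stored in the cluster.
MODEL_ID = "dga_classifier_example"


def build_dga_pipeline(model_id):
    """Build an ingest pipeline body with a single inference processor.

    Assumption: feature fields have already been added to the document
    by earlier processors that are not shown here.
    """
    return {
        "description": "Enrich Packetbeat DNS events with a DGA prediction (sketch)",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    # Results are written under this field, as in the slide.
                    "target_field": "ml_is_dga",
                    "inference_config": {
                        "classification": {"num_top_classes": 1}
                    },
                }
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_dga_pipeline(MODEL_ID), indent=2))
```

A body like this would be sent with `PUT _ingest/pipeline/<pipeline-id>`, and Packetbeat would then reference that pipeline in its Elasticsearch output settings.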
12. DGA Detection Using the Elastic Stack
Step 1 Curate training data
• 437554 benign domains (+ DNS responses) based on the first 437554 domains in https://tranco-list.eu/list/6WKX/1000000
– Tranco attempts to overcome issues with Alexa and other top-n lists
• 437555 malicious domains (+ DNS responses) based on:
– data from https://data.netlab.360.com/feeds/dga/dga.txt retrieved on 2020-07-06
– data generated by scripts based on reverse engineering malware
– 75 different malware families
– proportions of different malware families follow the rates of occurrence of each malware family in threat feeds
Malware family    Number of examples
tinba             93759
banjori           72443
emotet            52496
gameover          36344
necurs            25487
rovnix            24541
ramnit            19422
qakbot            18693
murofet           16791
simda             10972
pykspa2s          10719
ranbyus           7983
virut             6049
urlzone           6014
dyre              4269
cryptolocker      3236
...               ...
13. DGA Detection Using the Elastic Stack
Step 2 Feature engineering
Raw Packetbeat data:
...
"dns": {
  "question": {
    "name": "003zzy.com",
    "registered_domain": "003zzy.com",
    "top_level_domain": "com"
  },
  "response_code": "NXDOMAIN"
}
}
...
Select and extract features (second level domain (sld) == 003zzy):
Feature               Description
0, 0, 3, z, z, y      Unigrams of sld
00, 03, 3z, zz, zy    Bigrams of sld
003, 03z, 3zz, zzy    Trigrams of sld
com                   Top level domain
NXDOMAIN              DNS response code
Elastic ML automatically encodes categorical features:
• one-hot encoding
• target mean encoding
• frequency encoding
[Figure: grid of example encoded numeric feature values, e.g. 0.3876153631, 0.8477736242, 0.175098397, ...]
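The feature table above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the `b0`, `b1`, ... key names follow the bigram fields shown in the training document on the next slide, while the `u` and `t` prefixes, the function names, and the two hand-written encoders are illustrative only. Elastic ML performs its own encoding internally; the encoders here just show what frequency and target mean encoding compute.

```python
from collections import Counter


def ngrams(text, n):
    """Return all contiguous character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def extract_features(registered_domain, response_code):
    """Flatten a registered domain into positional n-gram features."""
    # Split at the first dot: "003zzy.com" -> sld "003zzy", tld "com".
    sld, _, tld = registered_domain.partition(".")
    features = {"tld": tld, "response_code": response_code}
    for prefix, n in (("u", 1), ("b", 2), ("t", 3)):
        for i, gram in enumerate(ngrams(sld, n)):
            features[f"{prefix}{i}"] = gram
    return features


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]


def target_mean_encode(values, targets):
    """Replace each category with the mean target value of its rows."""
    totals, counts = Counter(), Counter()
    for value, target in zip(values, targets):
        totals[value] += target
        counts[value] += 1
    return [totals[v] / counts[v] for v in values]


if __name__ == "__main__":
    print(ngrams("003zzy", 2))  # ['00', '03', '3z', 'zz', 'zy']
    print(extract_features("003zzy.com", "NXDOMAIN"))
```

Positional keys keep every n-gram as its own categorical column, which is what lets a tabular classifier consume a variable-length string.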
14. DGA Detection Using the Elastic Stack
Step 3 Train the model
[Diagram: expanded Packetbeat data → create and run data frame analytics job]
Example expanded Packetbeat document:
{
...
"f": {
"tld": "com",
"b0": "3i",
"b1": "in",
"b2": "n3",
"b3": "3z",
"b4": "zs",
...
},
"dns": {
"question": {
"registered_domain": "3in3zs114mia1dj768i11s67en.com",
"top_level_domain": "com",
"etld_plus_one": "3in3zs114mia1dj768i11s67en.com",
"name": "3in3zs114mia1dj768i11s67en.com",
"type": "A",
"class": "IN"
}
},
"is_malicious": 1,
...
}
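A data frame analytics job for this step might be defined roughly as follows. The sketch only assembles the request body; the index names, the 80% training split, and the `f.*` include pattern are assumptions, while `is_malicious` is the label field shown in the document above.

```python
import json


def build_training_job(source_index, dest_index):
    """Build the body of a classification data frame analytics job."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": {
            "classification": {
                # Label field from the expanded training documents.
                "dependent_variable": "is_malicious",
                # Assumed split; the talk does not state the value used.
                "training_percent": 80,
            }
        },
        # Assumption: train only on the engineered "f.*" features and the
        # label, so the raw domain name itself is not used as a feature.
        "analyzed_fields": {"includes": ["f.*", "is_malicious"]},
    }


if __name__ == "__main__":
    # Hypothetical index names.
    job = build_training_job("dga-training", "dga-training-results")
    print(json.dumps(job, indent=2))
```

The body would be sent with `PUT _ml/data_frame/analytics/<job-id>`, and the job started with `POST _ml/data_frame/analytics/<job-id>/_start`.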
15. DGA Detection Using the Elastic Stack
Step 4 Evaluate and test the model
• Training details
– 875,109 rows
– 185 categorical features, which mapped to 207 numeric features
– Model training took ~10 hours on GCP (c2-standard-8: 8 vCPUs, 32 GB memory)
– Model training required ~3 GB memory
• Model training accuracy
Confusion matrix (counts):
     0         1
0    437394    161
1    160       437394
Confusion matrix (%):
     0        1
0    99.96    0.04
1    0.04     99.96
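The percentages follow directly from the counts. As a small worked check, the sketch below recomputes them, assuming each row of the matrix is one class (the slide does not label which axis is actual and which is predicted).

```python
def accuracy(matrix):
    """Fraction of examples on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def row_percentages(matrix):
    """Normalize each row to percentages, rounded to two decimals."""
    return [[round(100 * cell / sum(row), 2) for cell in row] for row in matrix]


# Counts from the training confusion matrix on the slide.
counts = [[437394, 161], [160, 437394]]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(counts):.4%}")
    print(row_percentages(counts))
```

The four cells sum to 875,109, which matches the number of training rows quoted on the same slide.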
16. DGA Detection Using the Elastic Stack
Step 4 Evaluate and test the model
• Test data
– 997,301 benign domains (Tranco)
– 35,451,973 malicious domains (DGArchive, netlab360 feeds for 20 days)
• Confusion matrix (99.7% accuracy), values in %
     0        1
0    99.34    0.29
1    0.66     99.71
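The headline 99.7% can be cross-checked against the class sizes. The sketch below assumes the diagonal entries (99.34 and 99.71) are the shares of benign and malicious test domains that were classified correctly; the columns of the matrix each sum to 100, which is consistent with that reading, but the slide does not state it.

```python
def overall_accuracy(class_sizes, per_class_correct_rate):
    """Weight each class's correct rate by the size of that class."""
    correct = sum(n * r for n, r in zip(class_sizes, per_class_correct_rate))
    return correct / sum(class_sizes)


# Test-set sizes from the slide: benign first, then malicious.
sizes = [997_301, 35_451_973]
# Assumed per-class correct rates, read off the matrix diagonal.
rates = [0.9934, 0.9971]

if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(sizes, rates):.2%}")
```

Because the malicious class is about 35 times larger than the benign class, the overall figure sits very close to the malicious-class rate.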
20. DGA Detection Using the Elastic Stack
Improving accuracy and operationalising DGA detection
• Combine supervised modeling and unsupervised anomaly detection:
– Supervised model enriches data with probable DGA activity
– Time series anomaly detection can detect clients that have unusual DGA activity compared to the population
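One way to express the "compared to the population" idea is a population-analysis anomaly detection job over the enriched events. The sketch below builds such a job body; the one-hour bucket span, the choice of `high_sum`, and the influencer list are assumptions, and the field names are taken from the enriched Packetbeat event shown earlier in the deck.

```python
import json


def build_population_job(bucket_span="1h"):
    """Build an anomaly detection job body that flags clients whose
    number of predicted-DGA lookups is unusually high relative to
    the other clients in the same time bucket."""
    return {
        "description": "Clients with unusual DGA activity (sketch)",
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                {
                    # Sum of 0/1 predictions = count of DGA-flagged queries.
                    "function": "high_sum",
                    "field_name": "ml_is_dga.malicious_prediction",
                    # over_field_name turns this into population analysis.
                    "over_field_name": "client.ip",
                }
            ],
            "influencers": ["client.ip", "dns.question.registered_domain"],
        },
        "data_description": {"time_field": "@timestamp"},
    }


if __name__ == "__main__":
    print(json.dumps(build_population_job(), indent=2))
```

Aggregating per client over time means a single false positive from the supervised model rarely produces an alert, while a host that emits a sustained burst of DGA-flagged lookups stands out.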