
Deep DGA: Adversarially-Tuned Domain Generation and Detection



Endgame data scientists present the first known Deep Learning architecture to pseudo-randomly generate domain names. They demonstrate that adversarially-crafted domain names targeting a DL model are also adversarial for an independent external classifier.



  1. DeepDGA: Adversarially-Tuned Domain Generation and Detection
     Bobby Filar (@filar), Hyrum Anderson, Jonathan Woodbridge
     AISec 2016, November 4, 2016
  2. Outline
     - Motivation
     - Background
     - DeepDGA Architecture
     - Experiment Setup
     - Results
     - Future Work
  3. Motivations
     - Can we play red team vs. blue team on a known infosec problem (DGAs) using Generative Adversarial Networks (GANs)?
     - Offensive: use GANs to construct a deep-learning DGA designed to bypass an independent classifier.
     - Defensive: can adversarially generated domains augment the training data and harden an independent classifier?
  4. Related Work
     - Recent work on adversarial examples
       • Explaining and Harnessing Adversarial Examples (Goodfellow et al., 2015)
       • Adversarial Perturbations Against Deep Neural Networks for Malware Classification (Grosse, Papernot et al., 2016)
     - Key differences between other domains and infosec:
       • Other domains: make my model robust to the occasional blind-spot example it might encounter in the wild
       • Information security: discover and plug the holes in my model that the adversary is actively trying to find and exploit (red vs. blue)
     [Figure: fast gradient sign method (Goodfellow): "What is the cost of changing x's label to a different y?"]
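The fast gradient sign method named on the slide perturbs a continuous input as x_adv = x + eps * sign(grad_x J(theta, x, y)). A minimal sketch, assuming a toy logistic-regression model (chosen only because its input gradient has a closed form; this is not DeepDGA, which uses a GAN instead of gradient perturbations). The weights, input and `eps` are illustrative values.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def predict(x, w, b):
    """P(y = 1 | x) for a logistic-regression model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm_logistic(x, y, w, b, eps):
    """One FGSM step against logistic regression.

    Loss: J = -[y log p + (1 - y) log(1 - p)], with p = sigmoid(w.x + b).
    Its gradient w.r.t. the input is dJ/dx = (p - y) * w, so the
    adversarial input is x + eps * sign((p - y) * w).
    """
    p = predict(x, w, b)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


w, b = [2.0, -1.0], 0.0          # illustrative model parameters
x, y = [1.0, 0.5], 1             # a correctly classified positive example
x_adv = fgsm_logistic(x, y, w, b, eps=0.5)
print(x_adv, round(predict(x, w, b), 3), round(predict(x_adv, w, b), 3))
# prints: [0.5, 1.0] 0.818 0.5
```

The step moves every coordinate by the same amount in whichever direction increases the loss, which is why the confidence in the true label drops from about 0.82 to 0.5 here.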
  5. Background → Domain Generation Algorithm (DGA)
     - Employed by malware families to bypass common C2 defenses
     - DGAs take a seed input and generate large numbers of pseudo-random domain names
     - A subset of these domains is registered for command-and-control (C2) servers
     - Botnets and malware iterate through the generated domains until they find one that is registered, then connect and establish a C2 channel
     - Asymmetric attack: the defender must know all possible domains in order to blacklist them
     [Figure: DNS lookups for generated domains return NXDomain until one (bjgkre.com) resolves to the C2 server at 212.211.123.01]
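The seed-to-domains loop described above can be sketched as follows. This is a hypothetical toy DGA (SHA-256 of seed, date and counter mapped to letters), not the algorithm of Cryptolocker or any other real family; `toy_dga`, `find_c2` and the seed string are illustrative names, and a set stands in for DNS resolution.

```python
import hashlib
from datetime import date

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_dga(seed, day, count=10, length=12, tld=".com"):
    """Hypothetical seeded DGA (not any real malware family's algorithm).

    Bot and operator share `seed`, so both derive the same daily list.
    """
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        # map each of the first `length` digest bytes to a letter
        name = "".join(ALPHABET[byte % len(ALPHABET)] for byte in digest[:length])
        domains.append(name + tld)
    return domains


def find_c2(domains, registered):
    """Bot side: walk the list until a domain resolves.

    `registered` stands in for DNS; a miss corresponds to an NXDomain answer.
    """
    for domain in domains:
        if domain in registered:
            return domain
    return None


today = date(2016, 11, 4)
candidates = toy_dga("example-seed", today)
registered = {candidates[7]}          # the operator registers just one domain
print(find_c2(candidates, registered) == candidates[7])
```

The asymmetry shows up directly: the operator registers one name per day, while a blacklist-based defender would have to enumerate every (seed, day, counter) combination.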
  6. Domain Generation Algorithm → Cryptolocker Example
  7. Domain Generation Algorithm → Character Distributions
     - DGA character distributions + ML == robust defense?
     - Cryptolocker and ramnit are both nearly uniform over the same range
       • Expected, since their characters are computed from a single seed
     - Suppobox concatenates random words from an English dictionary, so its distribution resembles that of Alexa 1M
       • Much more difficult for prior DGA detection models to classify correctly
     - Our goal is to build a character-based generator that mimics the Alexa domain name distribution
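The slide compares character distributions visually. One way to put a number on "resembles Alexa 1M" is the KL divergence between character frequencies; this is a minimal sketch under that assumption (the slide does not name a metric). The domain lists are small hypothetical stand-ins for Alexa, a uniform-looking DGA, and a Suppobox-style word concatenation.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"


def char_distribution(domains):
    """Relative frequency of each ALPHABET character in the first label of each domain."""
    counts = Counter(
        c for d in domains for c in d.split(".")[0].lower() if c in ALPHABET
    )
    total = sum(counts.values())
    return {c: counts[c] / total for c in ALPHABET}


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in bits; eps keeps the log finite when q lacks a character."""
    return sum(
        p[c] * math.log2((p[c] + eps) / (q[c] + eps)) for c in ALPHABET if p[c] > 0
    )


benign = char_distribution(["google.com", "facebook.com", "amazon.com", "wikipedia.org"])
dga_like = char_distribution(["xkqzvjwy.com", "qzxwvkjy.net"])       # uniform-looking
word_like = char_distribution(["bookface.com", "googlepedia.org"])   # dictionary-style
print(kl_divergence(dga_like, benign) > kl_divergence(word_like, benign))   # True
```

Word-concatenating families score close to the benign distribution on this measure, which matches the slide's point that they are harder for character-statistics detectors.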
  8. Background → Frameworks
     AUTOENCODERS
     - Data compression algorithm
     - Models consist of an encoder, a decoder, and a loss function
       • encoder: transforms the input into a low-dimensional embedding (lossy compression)
       • decoder: reconstructs the original input from the encoder's output (decompression)
     - Goal: minimize the distortion between the reconstructed output and the original input
     - Easy to train; no labels needed (unsupervised)
     GENERATIVE ADVERSARIAL NETWORKS
     - Adversarial game between two models
       • generator: creates synthetic samples from random noise, aiming to match the true data distribution
       • discriminator: receives a sample and must determine whether it is synthetic (from the generator) or a true data sample
     - Goal: reach an equilibrium, akin to a Nash equilibrium, by pitting the models against one another
     - Harder to train; unsupervised
       • Lots of failure modes
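For reference, the two frameworks are commonly written as the objectives below. The GAN value function is the standard one from Goodfellow et al. (2014); the slides do not give DeepDGA's exact losses, so f (encoder), g (decoder), the distortion ℓ, G, D and the noise prior p_z are generic symbols rather than the authors' notation.

```latex
% Autoencoder: encoder f, decoder g, distortion \ell (e.g., per-character cross-entropy)
\min_{f,\,g}\; \mathbb{E}_{x \sim p_{\text{data}}}\big[\,\ell\big(x,\; g(f(x))\big)\big]

% GAN: generator G maps noise z to samples; discriminator D outputs P(sample is real)
\min_{G}\,\max_{D}\; V(D,G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

In the idealized analysis of that formulation, the equilibrium is reached when the generator's distribution equals p_data and D outputs 1/2 everywhere; in practice training can fail to get there, which is the "lots of failure modes" point above.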
  9. DeepDGA Architecture
  10. DeepDGA → Autoencoder → Encoder
      - Encoder architecture taken from [Kim et al., 2015], which proved useful for character-level language modeling
      - An embedding learns a linear mapping of each valid domain character into a 20-dimensional space
      - Convolutional filters capture character combinations (bigrams/trigrams)
      - Max-pooling over time and over filters
        • Yields a fixed-length representation
      - Highway network → LSTM
      Learn the right representation of Alexa domains
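To make the embedding → convolution → max-pool-over-time → highway chain concrete, here is a minimal pure-Python sketch with random (untrained) weights. The 20-dimensional embedding comes from the slide; the filter counts, weight scales and `gate_bias` are hypothetical choices, the highway layer uses the standard formulation y = T(v)·H(v) + (1 − T(v))·v, and the "over filters" pooling and the LSTM are not reproduced. The real encoder was built in Keras with learned weights.

```python
import math
import random

VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-"
DIM = 20                      # embedding size quoted on the slide
rng = random.Random(0)        # random weights stand in for trained ones

# Character embedding: one DIM-dimensional vector per valid character
EMBED = {c: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for c in VOCAB}


def make_filter(width):
    """A convolution filter spanning `width` consecutive character embeddings."""
    return [[rng.gauss(0.0, 0.1) for _ in range(DIM)] for _ in range(width)]


# 4 bigram filters + 4 trigram filters (counts are arbitrary)
FILTERS = [make_filter(2) for _ in range(4)] + [make_filter(3) for _ in range(4)]


def conv_maxpool(domain):
    """Embed, convolve (no padding), and max-pool over time: one feature per filter."""
    if len(domain) < 3:
        raise ValueError("sketch assumes at least 3 characters (widest filter)")
    x = [EMBED[c] for c in domain]
    feats = []
    for f in FILTERS:
        width = len(f)
        acts = [
            math.tanh(sum(f[k][j] * x[t + k][j] for k in range(width) for j in range(DIM)))
            for t in range(len(x) - width + 1)
        ]
        feats.append(max(acts))   # max over time: output size independent of length
    return feats


def highway(v, w_h, w_t, gate_bias=-2.0):
    """Highway layer: y = T(v) * H(v) + (1 - T(v)) * v, ReLU H and sigmoid gate T."""
    n = len(v)
    out = []
    for i in range(n):
        h = max(0.0, sum(w_h[i][j] * v[j] for j in range(n)))
        t = 1.0 / (1.0 + math.exp(-(sum(w_t[i][j] * v[j] for j in range(n)) + gate_bias)))
        out.append(t * h + (1.0 - t) * v[i])
    return out


n_feats = len(FILTERS)
W_H = [[rng.gauss(0.0, 0.3) for _ in range(n_feats)] for _ in range(n_feats)]
W_T = [[rng.gauss(0.0, 0.3) for _ in range(n_feats)] for _ in range(n_feats)]

for d in ["kayak", "clearspending"]:
    print(d, len(highway(conv_maxpool(d), W_H, W_T)))   # same length for both domains
```

The point of max-pooling over time is visible in the output: a 5-character and a 13-character domain both map to an 8-value vector, which is what lets later layers treat every domain with a fixed-size input.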
  11. DeepDGA → Autoencoder → Decoder
      - The decoder is roughly the reverse of the encoder, minus the max-pooling step
      - The domain embedding is repeated over the maximum domain length (time steps)
      - The sequence is passed to an LSTM → highway network → convolutional filters
      - A softmax activation on the final layer produces a multinomial distribution over domain characters at each time step
      - This distribution is sampled to generate a new domain name modeled after the input domain name
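The last two steps on the slide (softmax over characters, then sampling) can be sketched independently of the network. Here the decoder's final-layer logits are supplied by hand; the `END` symbol used to stop early and the `peaked_rows` helper are hypothetical conveniences, not part of the described architecture.

```python
import math
import random

VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-"
END = "$"                      # hypothetical end-of-domain symbol
SYMBOLS = VOCAB + END


def softmax(logits):
    """Numerically stable softmax: a multinomial distribution over SYMBOLS."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_domain(logit_rows, rng):
    """Sample one character per time step until END or the max length is reached.

    `logit_rows` has one row of len(SYMBOLS) logits per time step, i.e. the
    decoder's final-layer output before the softmax.
    """
    chars = []
    for row in logit_rows:
        c = rng.choices(SYMBOLS, weights=softmax(row), k=1)[0]
        if c == END:
            break
        chars.append(c)
    return "".join(chars)


def peaked_rows(text, max_len=12, peak=100.0):
    """Toy decoder output that strongly favors `text` followed by END."""
    target = text + END * (max_len - len(text))
    return [[peak if s == ch else 0.0 for s in SYMBOLS] for ch in target]


rng = random.Random(0)
print(sample_domain(peaked_rows("kayak"), rng))      # prints kayak
flat = [[0.0] * len(SYMBOLS) for _ in range(12)]    # uniform logits
print(sample_domain(flat, rng))                     # some random string, possibly empty
```

A confident decoder reproduces its input (as in the identical autoencoded pairs on slide 19), while flatter distributions yield character substitutions such as firebaseapp → firepareapp.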
  12. DeepDGA → GAN
      - Simply rewire the autoencoder framework as the base of our GAN
        • Accepts a random seed as input
        • Outputs domains that resemble valid domain names
      - Box layer: restricts output to an axis-aligned box defined by the embedding vectors of the training data
        • Parameterizes the manifold coordinates of legitimate domains
        • Used in the generator to ensure it only learns domains on the legitimate (Alexa-like) domain manifold
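The slide states only that the box layer confines outputs to an axis-aligned box spanned by the training embeddings. A minimal sketch, assuming the box is the per-dimension min/max of those embeddings and that a sigmoid rescaling is used to stay differentiable; both choices are assumptions, and hard clipping would be a simpler alternative with zero gradient outside the box.

```python
import math


def fit_box(train_embeddings):
    """Axis-aligned box: per-dimension min and max over training-domain embeddings."""
    lo = [min(col) for col in zip(*train_embeddings)]
    hi = [max(col) for col in zip(*train_embeddings)]
    return lo, hi


def stable_sigmoid(v):
    """Sigmoid that avoids overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def box_layer(z, lo, hi):
    """Squash each unconstrained coordinate z_i into [lo_i, hi_i] (assumed sigmoid scaling)."""
    return [l + (h - l) * stable_sigmoid(v) for v, l, h in zip(z, lo, hi)]


train = [[0.0, 1.0, -3.0], [2.0, -1.0, 0.5], [1.0, 0.0, 4.0]]   # toy embeddings
lo, hi = fit_box(train)
print(lo, hi)                               # [0.0, -1.0, -3.0] [2.0, 1.0, 4.0]
print(box_layer([0.0, 0.0, 0.0], lo, hi))   # box centre: [1.0, 0.0, 0.5]
```

Whatever the generator's raw activations are, its embedding coordinates never leave the region occupied by legitimate domains, which is the intent the slide describes.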
  13. DeepDGA → History Regularization
      - Regularize the discriminator by training it not only on recently generated samples but also on domains sampled from prior adversarial rounds
      - Helps the discriminator "remember" any deficiencies in model coverage AND forces it to learn novel domain embeddings
      - Reduces the likelihood of generator collapse (i.e., generating the same domain in every batch)
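History regularization amounts to mixing stored samples from earlier rounds into each discriminator batch. A minimal sketch, assuming a fixed-capacity buffer with random eviction and a 50/50 split between fresh and historical fakes; `HistoryBuffer`, `discriminator_batch`, the capacity, the mixing ratio and the 0/1 label convention are all hypothetical.

```python
import random


class HistoryBuffer:
    """Generated domains kept from earlier adversarial rounds (hypothetical helper)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.rng = random.Random(seed)

    def add_round(self, domains):
        self.items.extend(domains)
        if len(self.items) > self.capacity:
            # random eviction keeps several past rounds represented
            self.items = self.rng.sample(self.items, self.capacity)

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))


def discriminator_batch(real, fresh_fakes, history, history_frac=0.5):
    """Build (domain, label) pairs: 0 = real (Alexa), 1 = generated.

    Part of the generated half comes from prior rounds, so the discriminator
    keeps being penalized for weaknesses the generator exploited earlier.
    """
    n_hist = int(len(fresh_fakes) * history_frac)
    fakes = fresh_fakes[: len(fresh_fakes) - n_hist] + history.sample(n_hist)
    return [(d, 0) for d in real] + [(d, 1) for d in fakes]


history = HistoryBuffer(capacity=5)
history.add_round(["qwe.com", "rty.com", "uio.com"])   # round 1 samples
history.add_round(["asd.com", "fgh.com", "jkl.com"])   # round 2 samples
batch = discriminator_batch(["google.com", "kayak.com"],
                            ["zxc.com", "vbn.com", "mnb.com", "lkj.com"], history)
print(sum(label for _, label in batch))   # 4 generated samples: 2 fresh + 2 historical
```

After each round, the fresh samples would themselves be passed to `add_round`, so a generator that collapses onto one domain keeps facing a discriminator that has already seen it.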
  14. DeepDGA → Walkthrough
      [Figure: random seed → generator → detector]
      Move 1: Red team trains the generator to randomly create impostors that trick the detector
  15. DeepDGA → Walkthrough
      [Figure: generator → detector]
      Move 2: Blue team trains the detector to distinguish real domains from the generator's impostors
  16. DeepDGA → Walkthrough
      [Figure: random seed → generator → detector]
  17. DeepDGA → Walkthrough
      [Figure: detector (encoder) and generator (decoder)]
  18. DeepDGA → Walkthrough
      [Figure: random seed → generator → detector]
  19. DeepDGA → Generated Domains
      AUTOENCODED DOMAINS (<input domain> → <output domain>)
        clearspending → clearspending
        synodos → synodos
        3walq → 3walq
        kayak → kayak
        sportpitvl → sportpitvl
        7resume → 7resume
        templateism → templateism
        spielefuerdich → spielefueddrch
        firebaseapp → firepareapp
        gilliananderson → gilliadandelson
        tonwebmarketing → torwetmarketing
        thetubestore → thebubestore
        infusion → infunion
        akorpasaji → akorpajaji
        hargonis → harnonis
      GAN-GENERATED DOMAINS
        [Figure]
  20. Experiment Setup & Results
  21. Experiment Setup
      - Datasets
        • Alexa Top 1M
        • DGA family datasets
        • All open source
      - Training time
        • DeepDGA (autoencoder & GAN) implemented in Keras (Python DL library)
        • Autoencoder pretrained for 300 epochs
          · Each epoch: 256K randomly sampled domains
          · Batch size of 128
          · 14 hours on an NVIDIA Titan X GPU
        • Each adversarial round generated 12.8K samples against the detector
          · ~7 min on GPU per round
  22. Experiment Setup → Offensive
      - Red team: DeepDGA vs. external classifier
      - Random forest model (sklearn, Python)
        • Ensemble classifier, more resistant to adversarial attacks due to low variance
      - Handcrafted feature extraction
        • Domain length
        • Entropy of the character distribution
        • Vowel-to-consonant ratio
        • n-grams
      - Model trained on Alexa top 10K vs. DeepDGA
        • Results averaged over 10-fold CV
      Trained explicitly to catch DeepDGA and only DeepDGA
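The four handcrafted feature families on the slide can be sketched as below. Length, entropy and vowel-to-consonant ratio follow their usual definitions; the slide says only "n-grams", so the `benign_bigram_frac` feature (share of a name's bigrams seen in a benign reference set) is a hypothetical stand-in, as are the reference words.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(s):
    """Entropy (bits) of the character distribution of s."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def vowel_consonant_ratio(s):
    """Vowels / consonants over letters only; digits and hyphens are ignored."""
    letters = [c for c in s if c.isalpha()]
    vowels = sum(c in VOWELS for c in letters)
    return vowels / max(1, len(letters) - vowels)


def char_ngrams(s, n):
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def extract_features(domain, benign_bigrams):
    """Feature dict for one domain; `benign_bigrams` is a hypothetical reference
    set of bigrams seen in benign (e.g., Alexa) names."""
    name = domain.split(".")[0].lower()
    bigrams = char_ngrams(name, 2)
    seen = sum(cnt for g, cnt in bigrams.items() if g in benign_bigrams)
    return {
        "length": len(name),
        "entropy": shannon_entropy(name),
        "vowel_consonant_ratio": vowel_consonant_ratio(name),
        "benign_bigram_frac": seen / max(1, sum(bigrams.values())),
    }


reference = set().union(*(char_ngrams(w, 2) for w in ["google", "facebook", "amazon", "wikipedia"]))
print(extract_features("google.com", reference))
print(extract_features("xkqzvjwy.com", reference))
```

Rows like these, converted to numeric vectors, are the kind of input a scikit-learn `RandomForestClassifier` would be fit on; slide 24 explains why adversarially tuned domains erode exactly these features.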
  23. DeepDGA vs. External Classifier
      [Figure: accuracy (%) of the external classifier, trained to catch 11 DGA families equally represented in the training set]
  24. DeepDGA → Character Distributions
      - Earlier we compared the character distributions of DGA families and Alexa 1M
        • Anomalous distributions were easy to identify
      - DeepDGA character distributions before the adversarial rounds also appear anomalous
      - But after the adversarial rounds, they begin to resemble Alexa 1M (still not perfect)
      - This character distribution would confound previously important features
        • Entropy
        • Vowel-to-consonant ratio
        • n-grams
  25. Experiment Setup → Defensive
      - The core of this research was to determine whether adversarial examples could harden an independent classifier
      - Augmented the training dataset with adversarial domains generated by the GAN
      - In theory, the model can be hardened against families not observed in the training set
      - Employed a leave-one-out (LOO) strategy in which an entire DGA family was held out for validation
        • Baseline: model trained on the other 9 families + Alexa Top 10K
        • Hardened: repeated the process with DeepDGA domains added (labeled malicious)
      [Figure: binary classification before/after adversarial hardening, TPR at a fixed 1% FPR]
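The evaluation protocol above combines leave-one-family-out splits with a TPR read off at a fixed 1% FPR. A minimal sketch of both pieces, assuming (hypothetically) that benign rows carry family `None`, that DeepDGA augmentation is simply appended to every training split, and that FPR is measured on a separate set of benign validation scores. The family labels, domains and detector scores are toy values.

```python
def leave_one_family_out(samples, augment=()):
    """Yield (held_out_family, train, test) splits.

    samples: (domain, family) pairs; benign domains use family None (hypothetical
    convention). `augment` (e.g., DeepDGA domains labeled malicious) is added to
    every training split for the "hardened" variant; leave it empty for baseline.
    """
    families = sorted({fam for _, fam in samples if fam is not None})
    for held_out in families:
        train = [s for s in samples if s[1] != held_out] + list(augment)
        test = [s for s in samples if s[1] == held_out]
        yield held_out, train, test


def tpr_at_fpr(benign_scores, malicious_scores, max_fpr=0.01):
    """TPR at the threshold that lets at most max_fpr of benign scores through.

    Higher score = more malicious; a sample is flagged when score > threshold.
    """
    ranked = sorted(benign_scores, reverse=True)
    k = int(max_fpr * len(ranked))            # benign samples allowed above threshold
    threshold = ranked[k] if k < len(ranked) else float("-inf")
    return sum(s > threshold for s in malicious_scores) / len(malicious_scores)


samples = [("google.com", None), ("kayak.com", None),
           ("bjgkre.com", "cryptolocker"), ("qxvkzr.com", "ramnit")]
for fam, train, test in leave_one_family_out(samples, augment=[("exampledga.com", "deepdga")]):
    print(fam, len(train), len(test))         # cryptolocker 4 1, then ramnit 4 1

benign = [i / 100 for i in range(100)]        # toy detector scores on benign validation data
malicious = [0.5, 0.985, 0.99, 1.0]
print(tpr_at_fpr(benign, malicious))          # 0.75
```

Fixing the FPR first and then comparing TPR keeps the baseline and hardened models on equal footing, which matters given the summary's warning that augmentation can push the FP rate up.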
  26. Summary
      - Contributions
        • Present the first known deep learning architecture to pseudo-randomly generate domain names
        • Demonstrate that adversarially-crafted domain names targeting a DL model are also adversarial for an independent external classifier
        • At least experimentally, those same adversarial samples can be used to augment a training set and harden an independent classifier
      - Hard problems
        • GANs are hard! → adversarial game construction
        • Carefully watch the FP rate
          · A dataset overloaded with augmented DGAs can increase the FP rate
          · The model tries to learn that these "realistic" domain names are possibly malicious
      - Future work
        • Network: improving domain name generation (DGA) and detection
        • Strengthen malware classification models
          · Malicious WinAPI sequences
          · Adversarially-tuned static feature vectors
  27. Questions?