
10 - IDNOG04 - Enrico Hugo (Indonesia Honeynet Project) - The Rise of DGA Malwares


  1. 1. THE RISE OF DGA MALWARES ENRICO HUGO, S.KOM., CEH IDNOG 4TH CONFERENCE | 27 JULY 2017 | JAKARTA, INDONESIA
  2. 2. AGENDA • Distributed Denial of Service • Botnet Architectures • Domain Generation Algorithm • DGA Detection Techniques • Reverse Engineering • Zipf’s Law • Maximum Consonant Sequence Length • Hierarchical Clustering
  3. 3. DISTRIBUTED DENIAL OF SERVICE
  4. 4. DISTRIBUTED DENIAL OF SERVICE • DDoS is a prominent current threat, as seen in recent news on cyber attacks • Mirai, for example, employs millions of infected network devices to perform DDoS • These devices form a network of zombies or bots, a so-called “botnet” • A botnet is controlled by a person or a group of people known as the “botmaster(s)” • Botmasters issue commands to the botnet after the bots have successfully established connections to the Command-and-Control (C&C) server(s)
  5. 5. BOTNET ARCHITECTURES
  6. 6. STAR TOPOLOGY
  7. 7. MULTI SERVER C&C TOPOLOGY
  8. 8. HIERARCHICAL TOPOLOGY
  9. 9. RANDOM OR PEER-TO-PEER TOPOLOGY
  10. 10. BOTNET C&C LOOKUP • A bot establishes a connection with its C&C server by first looking up the server's IP address • Regardless of their architecture / topology, most botnets use fluxing • There are two types of fluxing: • IP Flux • Domain Flux
  11. 11. IP FLUX • A single Fully Qualified Domain Name (FQDN) associated with many constantly-changing IP addresses • There are two types of IP Fluxing techniques: • Single Flux • Double Flux
  12. 12. DOMAIN FLUX • Many FQDNs resolve to a single IP address • Most of the time this IP address is the IP address of the proxy, not the actual C&C server • One of the most popular techniques nowadays is the Domain Generation Algorithm (DGA)
  13. 13. DOMAIN GENERATION ALGORITHM
  14. 14. DEFINITION Domain generation algorithms (DGA) are algorithms seen in various families of malware that are used to periodically generate a large number of domain names that can be used as rendezvous points with their command and control servers.
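To make the definition concrete, here is a minimal toy sketch of a date-seeded generator. It is not the algorithm of any malware family named in these slides; the function name `toy_dga`, the MD5-based mapping, and all parameters are hypothetical choices for illustration only.

```python
import hashlib
from datetime import date


def toy_dga(seed_date: date, count: int = 5, length: int = 10, tld: str = ".com") -> list[str]:
    """Derive `count` pseudo-random domain names from a date (toy example)."""
    domains = []
    for i in range(count):
        # Hash the date plus a counter so each index yields a different label.
        digest = hashlib.md5(f"{seed_date.isoformat()}-{i}".encode()).hexdigest()
        # Map each hex digit (0-15) to a letter a-p to get an alphabetic label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga(date(2017, 7, 27)):
        print(name)
```

Because the output depends only on the date, a bot and its botmaster can compute the same candidate list independently, which is what lets the generated names act as rendezvous points.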
  15. 15. CHARACTERISTICS • Many NXDOMAIN responses • Usually random at the second-level (2LD) or third-level (3LD) domain label • A lot of requests from the same IP address • Generated names range from completely unreadable strings (not compliant with Zipf’s Law) to dictionary words (harder to detect).
  16. 16. MALWARES USING DGA • Kraken • Conficker • Gameover Zeus • Pykspa • Cryptolocker • Dyre • Darkshell • Locky • Mad Max • PandaBanker • Pushdo • Ramnit • Srizbi • Torpig • Virut • etc.
  17. 17. DGA DETECTION TECHNIQUES • Reverse Engineering (Generating Regular Expressions for DGA Detection) • Zipf’s Law (Detecting the Existence of DGA within Log Files) • Maximum Consonant Sequence Length (Detecting the DGA within Log Files) • Hierarchical Clustering (Clustering Log Files)
  18. 18. REVERSE ENGINEERING DGA DETECTION TECHNIQUES
  19. 19. DGARCHIVE • Daniel Plohmann, Khaled Yakdan, Michael Klatt, Johannes Bader, and Elmar Gerhards-Padilla published a paper entitled “A Comprehensive Measurement Study of Domain Generating Malware” in which they discussed the many different categories of malware DGAs. • In addition, they created DGArchive, a repository of DGA regexes from 69 malware families obtained by reverse engineering malware samples. • Using the reverse-engineered algorithms, it is possible to generate a list of AGDs for the current day to be used as a blacklist before a DGA attack even starts.
  20. 20. DRAWBACK OF REGEX • The regexes provided by DGArchive can be too generic • For example, the DGA regular expression for Darkshell is [\s\S]{6}\.com, which google.com also matches • Other detection measures are therefore necessary
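The false-positive problem can be reproduced in a few lines. This sketch assumes the Darkshell pattern is `[\s\S]{6}\.com` (any six characters followed by `.com`); the transcript shows it without backslashes, so the exact pattern is an assumption, and `matches_darkshell` is a hypothetical helper name.

```python
import re

# Assumed reconstruction of the pattern quoted on the slide:
# any six characters followed by ".com".
DARKSHELL_PATTERN = re.compile(r"[\s\S]{6}\.com")


def matches_darkshell(domain: str) -> bool:
    """Return True when the whole domain name fits the (assumed) Darkshell regex."""
    return DARKSHELL_PATTERN.fullmatch(domain) is not None


if __name__ == "__main__":
    for name in ["google.com", "amazon.com", "facebook.com"]:
        print(name, matches_darkshell(name))
```

Any legitimate `.com` domain with a six-character label, such as google.com, fits the pattern, so the regex alone cannot separate AGDs from benign names.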
  21. 21. ZIPF’S LAW DGA DETECTION TECHNIQUES
  22. 22. ZIPF’S LAW Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.
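Written as a formula, with $f(r)$ denoting the frequency of the item at rank $r$ (notation introduced here, not taken from the slides), the idealized law reads:

```latex
f(r) \approx \frac{f(1)}{r}, \qquad r = 1, 2, 3, \dots
\quad\Longrightarrow\quad
f(2) \approx \tfrac{1}{2} f(1), \qquad f(3) \approx \tfrac{1}{3} f(1)
```

The steep initial drop implied by $1/r$ is what produces an elbow-shaped curve when frequencies are plotted against rank.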
  23. 23. N-GRAM FREQUENCIES Let’s take facebook.com as an example: • Unigrams = [‘f’, ‘a’, ‘c’, ‘e’, ‘b’, ‘o’, ‘o’, ‘k’, ‘c’, ‘o’, ‘m’] • Bigrams = [‘fa’, ‘ac’, ‘ce’, ‘eb’, ‘bo’, ‘oo’, ‘ok’, ‘co’, ‘om’] • Trigrams = [‘fac’, ‘ace’, ‘ceb’, ‘ebo’, ‘boo’, ‘ook’, ‘com’] The bigram frequency: • fa = 1 • ac = 1 • ce = 1 • eb = 1 • bo = 1 • oo = 1 • ok = 1 • co = 1 • om = 1 The unigram frequency: • f = 1 • a = 1 • c = 2 • e = 1 • b = 1 • o = 3 • k = 1 • m = 1
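The n-gram lists above can be reproduced with a short sketch. It assumes, as the slide's lists imply, that n-grams are taken within each dot-separated label and never span a dot; the function names are illustrative.

```python
from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """Return all contiguous substrings of length n, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def domain_ngrams(domain: str, n: int) -> list[str]:
    """Collect n-grams label by label so that none of them spans a dot."""
    grams = []
    for label in domain.split("."):
        grams.extend(ngrams(label, n))
    return grams


if __name__ == "__main__":
    print(domain_ngrams("facebook.com", 2))
    print(domain_ngrams("facebook.com", 3))
    print(Counter(domain_ngrams("facebook.com", 1)))
```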
  24. 24. BIGRAM FREQUENCY OF LOG FILE Given a DNS log file containing the following list of domain names: • google.com • facebook.co.id • apple.com • youtube.com • klikbca.com • twitter.com • detik.com The sorted bigram frequencies would be: • co = 7 • om = 6 • ik = 2 • le = 2 • oo = 2 • ac = 1 • ca = 1 • it = 1 • ce = 1 • ap = 1 • go = 1 • et = 1 • gl = 1 • er = 1 • pp = 1 • tw = 1 • tt = 1 • tu = 1 • li = 1 • ti = 1 • te = 1 • pl = 1 • be = 1 • de = 1 • yo = 1 • bc = 1 • bo = 1 • wi = 1 • fa = 1 • eb = 1 • kb = 1 • ok = 1 • og = 1 • ut = 1 • kl = 1 • ou = 1 • ub = 1 • id = 1
  25. 25. CONVERTING FREQUENCIES TO FREQUENCY RATIOS • There are 38 distinct bigrams in the given DNS log file • The sum of all 38 bigram frequencies is 52 • The most frequent bigram occurs 7 times, a frequency ratio of 7/52 • The least frequent bigrams occur once, a frequency ratio of 1/52 • Therefore the maximum and minimum bigram frequency ratios are 0.1346 and 0.0192 respectively
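The counts and ratios for the seven-domain log can be checked with the sketch below. It assumes bigrams are counted within each dot-separated label, which is what the figures above imply; `LOG_DOMAINS`, `label_bigrams`, and `bigram_ratios` are illustrative names.

```python
from collections import Counter

# The seven domains from the example DNS log.
LOG_DOMAINS = [
    "google.com", "facebook.co.id", "apple.com", "youtube.com",
    "klikbca.com", "twitter.com", "detik.com",
]


def label_bigrams(domain: str) -> list[str]:
    """Bigrams of each dot-separated label; no bigram spans a dot."""
    return [label[i:i + 2]
            for label in domain.split(".")
            for i in range(len(label) - 1)]


def bigram_ratios(domains: list[str]) -> tuple[Counter, dict[str, float]]:
    """Return raw bigram counts and their ratios, most frequent first."""
    counts = Counter(bg for d in domains for bg in label_bigrams(d))
    total = sum(counts.values())
    ratios = {bg: c / total for bg, c in counts.most_common()}
    return counts, ratios


if __name__ == "__main__":
    counts, ratios = bigram_ratios(LOG_DOMAINS)
    print(len(counts), sum(counts.values()))   # distinct bigrams, total occurrences
    print(round(max(ratios.values()), 4), round(min(ratios.values()), 4))
```

Plotting the sorted ratios against rank gives the kind of distribution curve compared on the following slides.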
  26. 26. ALEXA BIGRAM DISTRIBUTION
  27. 27. CONFICKER BIGRAM DISTRIBUTION
  28. 28. PYKSPA BIGRAM DISTRIBUTION
  29. 29. CONFICKER VS PYKSPA BIGRAM DISTRIBUTION
  30. 30. AGD VS HGD BIGRAM DISTRIBUTION
  31. 31. AGD VS HGD • The graphs show that Algorithmically-Generated Domains (AGD), such as the Conficker and Pykspa worm domains, produce a relatively straight line, while Human-Generated Domains (HGD), like Alexa’s Top 500 sites, produce an elbow-shaped graph. • This observation leads to a formula for calculating the probability that a given log file contains DGA domains, i.e. that a DGA attack is under way. The higher the DGA probability rate, the more likely there is an ongoing DGA attack within the monitored log.
  32. 32. MAXIMUM CONSONANT SEQUENCE LENGTH DGA DETECTION TECHNIQUES
  33. 33. DISCOVERING DGA WITHIN LOG FILES • Further observation of the polluted log file (identified using Zipf’s Law) reveals one of the most prominent DGA characteristics, which allows us to better distinguish AGDs from HGDs: the Maximum Consonant Sequence (MCS) Length. Generally, AGDs have a larger MCS Length than HGDs. • Example: • google.com has a Maximum Consonant Sequence Length of 2, since the longest consonant sequence is “gl” • vofwxlbi.cn, one of the domains generated by the Conficker worm, has a Maximum Consonant Sequence Length of 5, and the longest sequence is “fwxlb”
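A minimal sketch of the MCS Length computation follows. It assumes the vowels are a, e, i, o, u (so "y" counts as a consonant) and that dots, digits, and hyphens end a sequence; the slides do not state these details, and the function name is illustrative.

```python
VOWELS = set("aeiou")  # assumption: "y" is treated as a consonant


def max_consonant_sequence(domain: str) -> tuple[int, str]:
    """Return the length of the longest run of consecutive consonants, and the run itself."""
    best = ""
    current = ""
    for ch in domain.lower():
        if ch.isalpha() and ch not in VOWELS:
            current += ch
            if len(current) > len(best):
                best = current
        else:
            # Vowels, dots, digits, and hyphens all break the run.
            current = ""
    return len(best), best


if __name__ == "__main__":
    print(max_consonant_sequence("google.com"))    # (2, 'gl')
    print(max_consonant_sequence("vofwxlbi.cn"))   # (5, 'fwxlb')
```

A threshold on this value can then flag candidate AGDs in a log, although where to set it is a tuning decision the slides do not specify.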
  34. 34. HIERARCHICAL CLUSTERING DGA DETECTION TECHNIQUES
  35. 35. FEATURES Level 1 • Query Class • Query Type Level 2 • Response Code Level 3 • Query Length • Numeric Chars Level 4 • Query Label Level 5 • Numeric Chars
  36. 36. TREEMAP
  37. 37. RESULTING CLUSTERS
  38. 38. ACCURACY OF DETECTION • Applying the accuracy formula (shown on the original slide) gives a detection accuracy of 0.913, or about 91%
  39. 39. COUNTERMEASURES - SINKHOLING
  40. 40. COUNTERMEASURES – DNS RPZ • Obtain the daily DGA feed from http://data.netlab.360.com/feeds/dga/dga.txt • Parse it using the dnsanalysis library in Python • Export the domains to a text file and load them into a DNS RPZ
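The parse-and-export step could look roughly like the sketch below, written in plain Python instead of the dnsanalysis library, whose API is not shown in the slides. It assumes the feed is tab-separated with the domain in the second column and `#` comment lines; the sample rows and the function name `feed_to_rpz` are hypothetical. Each output line uses the RPZ convention that `CNAME .` makes the resolver answer NXDOMAIN for that name.

```python
from typing import Iterable


def feed_to_rpz(feed_lines: Iterable[str]) -> list[str]:
    """Turn feed rows into RPZ records that force an NXDOMAIN answer."""
    records = []
    seen = set()
    for line in feed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # row does not match the assumed layout
        domain = fields[1].strip().lower().rstrip(".")
        if domain and domain not in seen:
            seen.add(domain)
            # In an RPZ zone, "name CNAME ." means: answer NXDOMAIN for name.
            records.append(f"{domain} CNAME .")
    return records


if __name__ == "__main__":
    # Hypothetical sample rows in the assumed layout: family, domain, start, end.
    sample = [
        "# hypothetical header line",
        "conficker\tvofwxlbi.cn\t2017-07-27 00:00:00\t2017-07-27 23:59:59",
        "conficker\tvofwxlbi.cn\t2017-07-27 00:00:00\t2017-07-27 23:59:59",
    ]
    print("\n".join(feed_to_rpz(sample)))
```

The resulting lines would go below the SOA and NS records of the policy zone file; how the zone is loaded depends on the DNS server in use.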
  41. 41. REFERENCES • Botnet Communication Topologies https://www.damballa.com/downloads/r_pubs/WP_Botnet_Communications_Primer.pdf • A Comprehensive Measurement Study of Domain Generating Malware https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_plohmann.pdf • DGArchive – A deep dive into domain generating malware https://www.botconf.eu/wp-content/uploads/2015/12/OK-P06-Plohmann-DGArchive.pdf • Using DNS RPZ to Block Malicious DNS Requests https://blogs.cisco.com/security/using-dns-rpz-to-block-malicious-dns-requests
