Using Algorithms to Brute Force Algorithms...A Journey Through Time and Namespace

Published in: Technology

  1. 1. Using Algorithms to Brute Force Algorithms … a journey through time and namespace Anthony Kasza Bsides Chicago 2015
  2. 2. Audience Participation: Answer a question, win a prize
  3. 3. Audience Participation: What is an algorithm?
  4. 4. algorithm noun Word used by programmers when they do not want to explain what they did. [12]  
  5. 5. Outline Background Malware Communications and Botnet Architectures Analyzing Domain Generation Algorithms Ramnit Ramnit’s DGA Brute Force Identification of Ramnit DGA Seeds Results Graphs Applications and Improvements
  6. 6. Me Anthony Kasza Security Researcher: OpenDNS @anthonykasza github.com/anthonykasza
  7. 7. Background
  8. 8. Malware Communications Let’s pretend… We all just compromised 10k hosts for our botnet [10]  
  9. 9. Malware Communications Let’s pretend… We all just compromised 10k hosts for our botnet What do we do now? [10]  
  10. 10. Malware Communications Let’s pretend… We all just compromised 10k hosts for our botnet What do we do now? Have our malware phone home [10]  
  11. 11. Malware Communications Let’s pretend… We all just compromised 10k hosts for our botnet What do we do now? Have our malware phone home Botnets are resilient cloud based, often distributed, remote administration systems [10]  
  12. 12. Audience Participation: Name a malware
  13. 13. Malware Communications: IP Open socket Beacon to IP address Easy to set up Easy to take down [Diagram labels: Client Implants, C2 Server]
  14. 14. Malware Communications: P2P Open socket Beacon to super node peer(s) Very resilient Peer consensus issues Complex to set up [9] [Diagram labels: Client Implants, Super nodes]
  15. 15. Malware Communications: DNS Open socket Issue DNS query [Diagram labels: Client Implants, DNS Resolver, C2 Server]
  16. 16. Malware Communications: DNS Open socket Issue DNS query Open socket Beacon to IP address Relatively easy to set up Relatively easy to take down [Diagram labels: Client Implants, DNS Resolver, C2 Server]
  17. 17. Audience Participation: Name a botnet that uses DNS
  18. 18. Malware Communications: DNS Resiliency Tricks Fast Flux – DNS A records change quickly Double Flux – DNS A and NS records change quickly Domain Generation Algorithms (DGA) – C2 domain names are generated dynamically by a deterministic function within the implant at run time. Samples are "strings proof"
  19. 19. How To DGA [Diagram labels: Client, DGA, Date, Seed, Hash/PRNG, String, TLD set, Domain name, Lexicon, query, connect to IP, NXD, A, Start, End]
  20. 20. Example DGA Output vfxlsatformalisticirekb[.]com rd0ee55073a3776810962c124f02a99424[.]ws croialotvvnfliyjmvt[.]ru yxjsibeugmmj[.]in osghqrdmlyhh[.]net easebrainjobmarket[.]com
  21. 21. Malware Communications: DGA - Function that generates domain names - Shared secret between botnet implants and operators - Often incorporates the date Operator registers domain “just in time” before the implant generates it [3] [Diagram labels: Client Implant, Registrar, Operator, DNS Resolver, C2 Server]
  22. 22. Malware Communications: DGA - Function that generates domain names - Shared secret between botnet implants and operators - Often incorporates the date Registrar ensures the domain is inserted into the DNS [3] [Diagram labels: Client Implant, Registrar, Operator, DNS Resolver, C2 Server]
  23. 23. Malware Communications: DGA - Function that generates domain names - Shared secret between botnet implants and operators - Often incorporates the date Implant generates and resolves the domain [3] [Diagram labels: Client Implant, Registrar, Operator, DNS Resolver, C2 Server]
  24. 24. Malware Communications: DGA - Function that generates domain names - Shared secret between botnet implants and operators - Often incorporates the date Implant connects to C2 IPv4 [3] [Diagram labels: Client Implant, Registrar, Operator, DNS Resolver, C2 Server]
  25. 25. Malware Communications: DGA - Function that generates domain names - Shared secret between botnet implants and operators - Often incorporates the date Repeat: Operator is constantly registering domain names [3] [Diagram labels: Client Implant, Registrar, Operator, DNS Resolver, C2 Server]
  26. 26. Audience Participation: Name a malware that uses a DGA
  27. 27. Malware that uses a DGA Banjori DirCrypt Dyre GameoverZeus Hesperbot Matsnu Necurs Pushdo Pykspa Qakbot Ramnit Shiotob Simbda/Shiz Symmi TinyBanker Bedep Emotet Gozi Nymaim Suppobox Urlzone VolatileCedar Cryptolocker Conficker Murofet BankPatch Bobax Ramdo Flashback Kelihos Rovnix Torpig Many more… [5]  
  28. 28. Each DGA is a Special Snowflake Conficker.C – generated 50k names per day Pushdo – DGA as a backup if C2 domain went down Kelihos – DGA as a backup if P2P network went down newGOZ DGA domains… were registered through a few common registrars, were typically registered 1hr before the algo would generate them, and changed NS domains but reused NS IPv4s [4] [11]
  29. 29. DGA Domain Query Periods
        Dyre     ~1 day
        Ramnit   N/A
        Matsnu   ~2 weeks
        Pykspa   ~3 weeks
        Bedep    ~1 week
  30. 30. Generalized DGA pseudo code…
        for i in domain_set_size:
            domain = generate_domain(date, magic)
            resolve domain
            if domain resolves:
                contact domain
                StopIteration

        def generate_domain(date, magic):
            domain = ''
            for i in lexicon_item_count:
                item = random_select(lexicon, magic)
                domain = domain + item
            domain = domain + random_select(tld_set, magic)
            return domain
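The generalized pseudo code on slide 30 can be sketched as runnable Python. This is a minimal illustration, not any real family's algorithm: the SHA-256 hash, the letter lexicon, the three TLDs, and the per-domain counter `index` are all assumptions added here so that each call yields a different, reproducible name.

```python
import hashlib
from datetime import date

LEXICON = "abcdefghijklmnopqrstuvwxyz"  # assumed lexicon: single letters
TLD_SET = [".com", ".net", ".org"]      # assumed TLD set


def generate_domain(day, magic, index, lexicon_item_count=12):
    """Derive one domain deterministically from date, salt, and a counter."""
    material = "{}|{}|{}".format(day.isoformat(), magic, index).encode()
    digest = hashlib.sha256(material).digest()
    # Each digest byte selects one lexicon item.
    label = "".join(LEXICON[b % len(LEXICON)] for b in digest[:lexicon_item_count])
    # The next digest byte selects the TLD.
    tld = TLD_SET[digest[lexicon_item_count] % len(TLD_SET)]
    return label + tld


def generate_domains(day, magic, domain_set_size):
    """Return the full domain set an implant would try on the given day."""
    return [generate_domain(day, magic, i) for i in range(domain_set_size)]


if __name__ == "__main__":
    for name in generate_domains(date(2015, 5, 1), magic=42, domain_set_size=5):
        print(name)
```

Because every input is known to both sides, an operator running the same function for tomorrow's date gets the same list the implants will query.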
  31. 31. Generalized Algorithms Analyses
        Inputs
          Domain set size: How many domains to generate
          Date: Today's date
          Seed: A number used to ignite a PRNG
          Salt: A magic number or campaign ID
          Lexicon: A set of letters, n-grams, or words
          Lexicon Items Count: Number of items to use from lexicon
          TLD set: All possible TLDs
        Functions
          MD*, SHA*, Etc: Some hash
          PRNG: Random numbers
          Bitwise Math: xor, shl/shr, mod, b64, ascii to hex
        Outputs
          Names to contact: These are often regex-able due to properties of the transformation function
  32. 32. An Algorithm Taxonomy from Inputs
        Group  Lexicon  Domain set size  Salt/Seed  Date  Examples
        A      Letters  Yes              Yes        Yes   Necurs, GOZ, Symmi, Tinba, Pykspa
        B      Letters  Yes              Yes        No    Ramnit, DirCrypt, VolatileCedar, Ramdo
        C.i    Letters  Yes              No         Yes   Conficker, Dyre, Cryptolocker, Pushdo, Qakbot
        C.ii   Words    Yes              No         Yes   Matsnu, Rovnix
  33. 33. Enter Ramnit
  34. 34. Audience Participation: Tell me anything about Ramnit
  35. 35. Ramnit Malware Worm/RAT Emerged 2010 “Borrowed” features from Zeus source 2011 Spread via EK, social media, bundled software, etc Uses a DGA [7]  
  36. 36. Ramnit DGA Pseudo Code [1]
        class RandInt:  # LCG PRNG, random uint32
            def __init__(self, seed):
                self.seed = seed

            def rand_int_modulus(self, modulus):
                ix = self.seed
                # Integer division; the result is truncated to 32 bits.
                ix = (16807*(ix % 127773) - 2836*(ix // 127773)) & 0xFFFFFFFF
                self.seed = ix
                return ix % modulus

        def ramnit_domains(seed, domain_set_size):  # seed = ?, domain_set_size = ?
            r = RandInt(seed)
            for i in range(domain_set_size):
                seed_a = r.seed
                domain_length = r.rand_int_modulus(12) + 8  # domain_length = 8 to 19
                seed_b = r.seed
                domain = ''
                for j in range(domain_length):
                    char = chr(ord('a') + r.rand_int_modulus(25))  # lexicon = [a-y]
                    domain += char
                domain += ".com"  # tld_set = [".com"]
                m = seed_a*seed_b
                r.seed = (m + m//(2**32)) % 2**32
                yield domain
  38. 38. Ramnit DGA Pseudo Code [Diagram labels: Client, DGA, Seed (uint32), LCG PRNG, string + ".com", Domain Name, Lexicon [a-y]{8,19}, query, connect to IP, NXD, A]
  39. 39. Ramnit DGA Pseudo Code Unknowns 1. Linear congruential generator’s seed 2. How many times this loop occurs [Diagram labels: Client, DGA, Seed (uint32), LCG PRNG, string + ".com", Domain Name, Lexicon [a-y]{8,19}, query, connect to IP, NXD, A]
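The lexicon and length shown on slides 38 and 39 translate directly into a regular expression, which illustrates the earlier point that DGA outputs are often regex-able. The sketch below is only a shape check; the function name `looks_like_ramnit` is made up for this example, and a match does not prove a name is malicious, since legitimate domains can have the same shape.

```python
import re

# 8 to 19 letters from a-y, followed by ".com" (the shape shown on the slide).
RAMNIT_SHAPE = re.compile(r"[a-y]{8,19}\.com")


def looks_like_ramnit(name):
    """Return True if the name has the shape of a Ramnit DGA domain."""
    normalized = name.lower().rstrip(".")  # tolerate case and a trailing root dot
    return RAMNIT_SHAPE.fullmatch(normalized) is not None


if __name__ == "__main__":
    for candidate in ["aaaaaaaa.com", "zzzzzzzz.com", "abcdefg.com"]:
        print(candidate, looks_like_ramnit(candidate))
```

A filter like this is best used to shrink a query log before the more expensive seed matching, not as a detector on its own.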
  40. 40. Brute Forcing Ramnit DGA Seeds Inputs: domain_set_size, seed, tld_set, lexicon Outputs: names I. Iterate over the seed space (2^32) and identify candidate seeds II. Find and generate the seeds’ associated domain_set_size III. Determine the minimum set of seeds to produce all domains (overlap in LCG output) [2]
  41. 41. Step 1: Identify Candidate Seeds 1. Seed the Ramnit DGA with every value in the 2^32 seed space 2. Generate the first domain from each seed – 27 hours on an AWS c3.8xlarge – 24 processes, each with its own CPU core and a portion of the seed space – Resulting seed and domain tuples sorted and merged 3. Scan OpenDNS query logs and find which domains received at least one query 4. Seeds which generated domains that received queries are candidate seeds
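Step 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the talk: the DGA is a Python 3 port of the pseudo code from [1], `queried_domains` is a hypothetical stand-in for the query logs, and the seed range is kept tiny so the example finishes instantly instead of taking 27 hours.

```python
class RandInt:
    """LCG PRNG from the Ramnit pseudo code (Python 3 port)."""

    def __init__(self, seed):
        self.seed = seed

    def rand_int_modulus(self, modulus):
        ix = self.seed
        ix = (16807 * (ix % 127773) - 2836 * (ix // 127773)) & 0xFFFFFFFF
        self.seed = ix
        return ix % modulus


def ramnit_domains(seed, count):
    """Yield the first `count` domains generated from `seed`."""
    r = RandInt(seed)
    for _ in range(count):
        seed_a = r.seed
        length = r.rand_int_modulus(12) + 8
        seed_b = r.seed
        label = "".join(chr(ord("a") + r.rand_int_modulus(25)) for _ in range(length))
        m = seed_a * seed_b
        r.seed = (m + m // (2 ** 32)) % 2 ** 32
        yield label + ".com"


def first_domain(seed):
    return next(ramnit_domains(seed, 1))


def candidate_seeds(seed_range, queried_domains):
    """Return seeds whose first domain received at least one query."""
    return [s for s in seed_range if first_domain(s) in queried_domains]


if __name__ == "__main__":
    # Hypothetical "query log": pretend two seeds are active in the wild.
    queried_domains = {first_domain(5), first_domain(42)}
    print(candidate_seeds(range(100), queried_domains))
```

In a full run the seed space would be split into chunks, one per process, with the per-chunk results sorted and merged afterwards, as the slide describes.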
  42. 42. Audience Participation: Which are candidate seeds?
  43. 43. Candidate Seeds Example seed1, domain1 seed2, domain1 seed3, domain1 seed4, domain1
  44. 44. Step 2: Find Seeds’ Domain Set Size 1. Observe the domain’s hourly query counts for the previous two weeks* 2. For each candidate seed, generate the next domain 3. Compare the query pattern of the domain from 2 to the seed’s composite query pattern If they are similar: 1. Merge the pattern into the seed’s composite query pattern 2. Increment the seed’s domain set size 3. Goto 1 Otherwise: 1. Exit * A vector with each position representing an hourly count of DNS queries
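The loop in Step 2 might look like the sketch below. The slide does not say how "similar" is measured or how patterns are merged, so cosine similarity, the 0.8 threshold, and merging by element-wise sum are all assumptions made for illustration. `hourly_counts` is a hypothetical mapping from a domain to its vector of hourly query counts.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length count vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def domain_set_size(domains, hourly_counts, threshold=0.8):
    """Count how many consecutive domains match the seed's composite pattern.

    domains: the seed's domains, in generation order.
    hourly_counts: dict mapping domain -> list of hourly query counts.
    """
    if not domains or domains[0] not in hourly_counts:
        return 0
    composite = list(hourly_counts[domains[0]])
    size = 1
    for domain in domains[1:]:
        pattern = hourly_counts.get(domain)
        if pattern is None or cosine_similarity(composite, pattern) < threshold:
            break  # pattern diverges: the set ends at the previous domain
        # Merge the matching pattern into the composite (assumed: element-wise sum).
        composite = [c + p for c, p in zip(composite, pattern)]
        size += 1
    return size


if __name__ == "__main__":
    counts = {
        "d1": [5, 0, 3, 0],
        "d2": [4, 0, 2, 0],
        "d3": [6, 1, 3, 0],
        "d4": [0, 9, 0, 7],  # queried at different hours: not part of the set
    }
    print(domain_set_size(["d1", "d2", "d3", "d4"], counts))  # prints 3
```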
  45. 45. Audience Participation: What is this seed’s domain set size?
  46. 46. Seeds’ Domain Set Size Example seed1, domain1 seed1, domain2 seed1, domain3 seed1, domain4
  47. 47. Step 3: Minimum Seed Set for Domain Coverage 1. For each seed and its associated domain set… 2. Remove all domain sets that are subsets of other domain sets 3. Minimum seed set for domain coverage remains Seeds that remain aren’t necessarily “in the wild” They are seeds that generate all domains “in the wild”
  48. 48. Audience Participation: Which seeds would be eliminated?
  49. 49. Minimum Seed Set Example seed1: domain1, domain2 seed2: domain1, domain2, domain3 seed3: domain3, domain4 seed4: domain1, domain2, domain3, domain4 seed5: domain5
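The example on slide 49 can be worked through with a short sketch of the subset-removal rule from Step 3. It follows the slide's rule literally (drop any domain set contained in another) rather than solving a general set-cover problem; keeping the first of two identical sets is a choice made here, not something the slides specify.

```python
def minimum_seed_set(domain_sets):
    """Drop every seed whose domain set is a subset of another seed's set.

    domain_sets: dict mapping seed -> set of domains it generates.
    """
    kept = {}
    for seed, domains in domain_sets.items():
        redundant = False
        for other, other_domains in domain_sets.items():
            if other == seed:
                continue
            if domains < other_domains:
                redundant = True  # proper subset of another seed's set
            elif domains == other_domains and other in kept:
                redundant = True  # identical set already kept (assumed tie-break)
        if not redundant:
            kept[seed] = domains
    return kept


if __name__ == "__main__":
    example = {
        "seed1": {"domain1", "domain2"},
        "seed2": {"domain1", "domain2", "domain3"},
        "seed3": {"domain3", "domain4"},
        "seed4": {"domain1", "domain2", "domain3", "domain4"},
        "seed5": {"domain5"},
    }
    print(sorted(minimum_seed_set(example)))  # prints ['seed4', 'seed5']
```

Here seed1, seed2, and seed3 are eliminated because seed4 already generates everything they do, while seed5 stays because nothing else covers domain5.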
  50. 50. Brute Forcing Algorithm Weaknesses 1. The first domain from each seed is used to locate candidate seeds 2. No queries on that day means seed is ignored 3. Point in time analysis 4. DGAs collide with legitimate domain names - 1 million monkeys typing in 1 million address bars will eventually browse to 4chan
  51. 51. Results
  52. 52. Results: Seeds, Domains, Clients 29 seeds, 3924 domains -  Seeds confirmed by Symantec’s report I found some seeds not listed in Symantec’s report -  Not a big deal due to overlaps in Ramnit DGA’s LCG seeds I found some domains not listed in Symantec’s report -  Bigger deal if Symantec is serious about takedowns [7]    [8]  
  53. 53. Audience Participation: Was anyone here involved in the Ramnit takedown?
  54. 54. Results: Patterns in Domain Queries by Seed
  55. 55. Results: Patterns in Domain Queries 1. Locate IPv4s that queried each domain 2. Create a graph of seed -> domains -> client IPv4s 3. Count connected components (I found two) [Diagram: seed (S), domain (D), and client (C) nodes]
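Counting connected components in the seed -> domain -> client graph can be done with a plain breadth-first search. The sketch below uses made-up node names and edges purely for illustration; the real graph would be built from the recovered seeds, their domains, and the client IPv4s seen in query logs.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return a list of node sets, one per connected component."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


if __name__ == "__main__":
    # Hypothetical edges: (seed, domain) and (domain, client) pairs.
    edges = [
        ("seed:s1", "domain:d1"), ("domain:d1", "client:c1"),
        ("seed:s2", "domain:d2"), ("domain:d2", "client:c1"),
        ("seed:s3", "domain:d3"), ("domain:d3", "client:c2"),
    ]
    print(len(connected_components(edges)))  # prints 2
```

In this toy graph client c1 queried domains from both s1 and s2, which ties those seeds into one component, while s3 forms its own.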
  56. 56. Results: Patterns in Domain Queries by IPv4 Groups
  57. 57. Applications and Improvements Generalize framework for use with all DGA implementations - Currently working with more than just Ramnit Vigilant monitoring instead of point in time search - Ramdo seeds are able to be updated by the C2 server - even if you RE the algorithm, you don't have the seed unique to each compromised system Combine with other DGA detection techniques - co-occurrences and lexical features [6]
  58. 58. Conclusion Why should you care? -  Many malware families are using DGAs -  This is a new way to identify new badness -  Know the shared secret, find all the C2 domains -  Not all DGAs are created equal -  Some are more difficult to track than others -  malware authors are people too -  3:30, “The Life and Times of an APT Malware Author”
  59. 59. Audience Participation: Are there any questions?
  60. 60. Thanks BsidesChicago OpenDNS Johannes Bader Daniel Plohmann John Bambenek Thomas Mathew Dhia Mahjoub Steve Mckinney
  61. 61. References
        [1] http://johannesbader.ch/2014/12/the-dga-of-ramnit/
        [2] https://labs.opendns.com/2015/02/18/at-high-noon-algorithms-do-battle/
        [3] http://www.cc.gatech.edu/~ynadji3/docs/pubs/pleiades2012.pdf
        [4] http://www.slideshare.net/OpenDNS/shmoocon-2015-presentation
        [5] https://github.com/Andrewaeva/DGA
        [6] http://blogs.technet.com/b/mmpc/archive/2014/04/08/msrt-april-2014-ramdo.aspx
        [7] http://www.symantec.com/connect/blogs/ramnit-cybercrime-group-hit-major-law-enforcement-operation
        [8] http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/w32-ramnit-analysis.pdf
        [9] http://www.malwaretech.com/2013/12/peer-to-peer-botnets-for-beginners.html
        [10] http://en.wikipedia.org/wiki/Botnet
        [11] http://commons.wikimedia.org/wiki/File:Snowflake-black.png
        [12] Somewhere on Twitter
