ANALYZE'15 - Bulk Malware Analysis at Scale

Extracting Malware Configurations at Scale
ANALYZE2015
John Bambenek, Fidelis Cybersecurity

Sharing Restrictions
• All the content on the slides can be considered
TLP:GREEN.
• Anything that I say that’s more restrictive, I will
tell you.
• Slides will eventually be posted to SlideShare.
• Questions to
john.bambenek@fidelissecurity.com

Introduction
• Sr. Threat Researcher with Fidelis Cybersecurity
• Faculty at the University of Illinois at Urbana-
Champaign
• Producer of open-source intelligence feeds
• Run several takedown-oriented groups for
various malware families

Problem Statement
• We are on the losing end of an
arms race
• The adversaries produce more malware than we can possibly analyze.
• We have to operate in the open while they operate in
secret.
• Their core business is exploitation; for us, security is a cost center.
• We operate in a global economy without an effective
means of global law enforcement.

TL;DR
Bad News: We’re Doomed
Good News: Unlimited Job Security

The Problem… Illustrated
VirusTotal statistics, taken 20 Apr 2015, 14:24 PDT

Another way to look at it…
• How long does it take to reverse engineer a
malware sample?
• How long does it take to create a
signature/rule/defense?
• How long does it take to create all the IOCs?
• Now… how long does it take that actor to
change?

Is it really that many?
• Even though hundreds of thousands of unique files are
seen daily, the number of malware families is much
lower.
• The key is to develop tooling that takes a sample and rips out the interesting pieces we need.
• Single-stage malware is easy: the entire configuration is in one place.
• What about multi-stage malware?
• It still has some place it calls out to for the next stage.

The problem of “sufficiency”
• Once we “detect” a threat, work continues until some “defense” is developed.
• Once a threat is “blocked”, the work tends to stop.
• Often there are multiple actor sets that may use a specific piece of malware, but detection can be generic at the tool level.

The missing pieces…
• What about ongoing surveillance?
• What about tracking and identifying all the unique
endpoints used by a specific piece of malware?
• e.g., if you could know every C2 that was ever an njRat server, would that be of interest to you?
• What about the unique attributes (mutex,
campaign ID) that may be used?

Making RE more efficient
• Full RE is the most expensive but also the most thorough.
• Dynamic analysis is good, but the binary may not run correctly.
• Static analysis can be very fast… if you know how to pull the information out.
• The key is to automate so that you do as much static analysis as possible, dynamic analysis for much of the rest, and RE only for the items where there is no other alternative.
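The cost-ordered approach in the last bullet can be sketched as a simple cascade. It tries the cheapest technique first and falls through to more expensive ones only when a cheaper one comes back empty. This is a minimal Python sketch, not the author's pipeline. The analyzer callables are hypothetical stand-ins for a static config decoder and a sandbox run, and anything that falls through every tier is queued for manual RE.

```python
from typing import Callable, Optional

Config = dict[str, str]
# An analyzer takes a sample path and returns a config dict, or None on failure.
Analyzer = Callable[[str], Optional[Config]]


def triage(sample: str, analyzers: list[tuple[str, Analyzer]],
           re_queue: list[str]) -> tuple[str, Optional[Config]]:
    """Run analyzers cheapest-first; return (technique, config).

    If every automated analyzer fails, the sample is appended to re_queue
    for manual reverse engineering and ("manual_re", None) is returned.
    """
    for name, analyze in analyzers:
        try:
            config = analyze(sample)
        except Exception:
            # A crash (e.g. a binary that will not run in the sandbox) is
            # treated like an empty result: fall through to the next tier.
            config = None
        if config:
            return name, config
    re_queue.append(sample)
    return "manual_re", None
```

Passing the tiers as an ordered list keeps the cost policy in one place. Adding a new static decoder is just another entry at the front.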

Why RATs?
• Single-stage malware will almost always have its full configuration in the binary itself.
• Used not just by skiddies but also by advanced attackers.
• A large sample set to work with as a proof of concept.
• Dozens of well-known RAT types to deal with.
• Gotta walk before you can run.

What can you do with RAT configs?

Maybe I’m being a little too harsh
• RAT operators tend to be the black hat farm team.
• It may be “simple”, but the fact that we haven’t eradicated it suggests it’s not so simple.
• Takedowns are an art form in progress; RATs provide lower-stakes targets for developing the tradecraft.
• Lack of enforcement breeds a feeling of invulnerability among cybercriminals.
• Don’t forget, “APT” use RATs too.

Also, there is this magic sauce…
• https://github.com/kevthehermit/RATDecoders
• Python scripts that will statically rip configurations
out of 32 different flavors of RATs.
• Actively developed; you can see it in action at malwareconfig.com
• Disclaimer: I had nothing to do with the
development of these tools; they just fit my need
and Kevin Breen deserves mad props.

The next piece of the puzzle
• In order to determine which decoder to use, you need
to know which RAT it is.
• Yara is used for this step, with rules from:
• https://github.com/kevthehermit/YaraRules
• Yara Exchange
• In-House Rules
• Yara results used as “authoritative” for purposes of
selecting the decoder.
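Once Yara has run, choosing a decoder reduces to a lookup from rule name to decoder. Below is a minimal Python sketch. It assumes the Yara command-line tool's default output, which is one `<rule> <file>` line per match. The `RULE_TO_DECODER` table and its names are hypothetical placeholders for whatever your rule set and decoder collection actually use.

```python
# Hypothetical rule-name -> decoder-name table; substitute the names your
# Yara rules and decoder collection actually use.
RULE_TO_DECODER = {
    "DarkComet": "DarkComet",
    "njRat": "njRat",
}


def select_decoders(yara_output: str) -> dict[str, str]:
    """Map each scanned file to a decoder, treating Yara as authoritative.

    Expects yara's default output: one "<rule> <file>" line per match.
    Rules with no known decoder are ignored; the first known rule wins.
    """
    chosen: dict[str, str] = {}
    for line in yara_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        rule, sample = parts[0], parts[1].strip()
        decoder = RULE_TO_DECODER.get(rule)
        if decoder is not None and sample not in chosen:
            chosen[sample] = decoder
    return chosen
```

Letting the first matching rule win is an arbitrary tie-break. If a sample matches two RAT families, it may be worth logging that as a conflict instead.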

Malware Sources
• VirusTotal
• MSFT VIA Program
• Others whom I haven’t yet had a chance to ask whether they want recognition
• RAT Traps
• In total, upwards of 0.25 TB a day (not all of it RATs)
• In short, every piece of malware I can find.

RAT Traps
• Some RAT operators have particular targets in mind when they are seeking infections…
• Celebrities
• Corporate executives
• Young girls
• Create faux personas that mimic some of these characteristics, each with an available email address, and let nature take its course.
• Or leak them to pastebin if you’re in a hurry.

DESIGNING A SYSTEM TO HANDLE IT ALL

Process
• Intake of Malware
• Normalize into one directory with MD5 as filename
• Process and Unpack Samples
• Scan all samples with Yara
• Use Yara output to run selected samples through the correct decoder
• Normalize output
• Process into CSV feed for daily summary of
configuration info
• Profit
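The normalization step (one directory, with the MD5 as the filename) can be sketched in Python with `hashlib`. This is an illustration under assumptions: `normalize_intake` is a hypothetical helper. It returns a map of MD5 to original path so that source information survives after the binaries themselves are discarded. Identical samples from different feeds collapse onto one file automatically.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize_intake(source_dirs: list[Path], dest: Path) -> dict[str, str]:
    """Copy every file under source_dirs into dest, named by its MD5.

    Duplicates collapse onto one file. Returns {md5: first path seen} so
    provenance can be recorded before the binaries are discarded.
    dest should not sit inside any of source_dirs.
    """
    dest.mkdir(parents=True, exist_ok=True)
    provenance: dict[str, str] = {}
    for src in source_dirs:
        # sorted() materializes the listing first and makes the order stable.
        for path in sorted(src.rglob("*")):
            if not path.is_file():
                continue
            md5 = md5_of(path)
            target = dest / md5
            if not target.exists():
                shutil.copyfile(path, target)
            provenance.setdefault(md5, str(path))
    return provenance
```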

First Bottleneck… Bandwidth
• Running on an EC2 hi1.4xlarge instance, all of this could run in about 90 minutes
• It also costs $1000/mo on-demand
• Oh, and there is no capacity for spot instances
• Running in the corporate datacenter, it took about 9-10 hours, which is still acceptable for current data.
• Insufficient to do this retroactively.
• There was one issue with running it in the corporate datacenter, though…

When datacenter gangsters attack…
• Apparently they get mad when you take up the
whole pipe during business hours…

Next bottleneck… Disk
• All of this is disk I/O intensive:
• Writing to disk
• Processing file magic
• Yara scanning
• Python scripts pulling configurations out of
files.
• SSD or Bust…
• Discard binaries when done processing
• But keep source information

Last bottleneck… time
• Downloading files one at a time (I don’t control
packaging)
• Yara scanning one file at a time
• Lots of wasted CPU cycles sitting idle.
• Solution: GNU parallel
find . -type f -exec basename {} \; | parallel --max-lines 1 -j 160 \
    yara ~/yara/all_trojans.yar 2> /dev/null >> ../yarascan.$prettystamp
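The same fan-out can be written in Python with a thread pool around the `yara` command-line tool. This sketch assumes `yara` is on the PATH, and the default worker count mirrors the `-j 160` above. Threads are enough here because each worker spends its time waiting on a subprocess rather than running Python code. The scanner is passed in as a function so it can be swapped out, for example in testing.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def yara_scan(rules: str, sample: Path) -> str:
    """Run the yara CLI on one file and return its stdout.

    stderr is captured and discarded, like the 2> /dev/null above.
    """
    result = subprocess.run(["yara", rules, str(sample)],
                            capture_output=True, text=True)
    return result.stdout


def scan_all(samples: list[Path], scanner: Callable[[Path], str],
             workers: int = 160) -> str:
    """Scan samples concurrently; results are concatenated in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return "".join(pool.map(scanner, samples))
```

For example, `scan_all(samples, functools.partial(yara_scan, os.path.expanduser("~/yara/all_trojans.yar")))` would roughly reproduce the one-liner, minus the append to a timestamped file.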

Malware Configs
• Every RAT has different configurable items.
• Not every configuration item is necessarily
valuable for intelligence purposes.
• Some items may have default values.
• Free-form text fields provide interesting data that
may be useful for correlation.
• Mutex values can be useful for correlating binaries to the same actor.

Sample DarkComet config
Key: CampaignID Value: Guest16
Key: Domains Value: 06059600929.ddns.net:1234
Key: FTPHost Value:
Key: FTPKeyLogs Value:
Key: FTPPassword Value:
Key: FTPPort Value:
Key: FTPRoot Value:
Key: FTPSize Value:
Key: FTPUserName Value:
Key: FireWallBypass Value: 0
Key: Gencode Value: 3yHVnheK6eDm
Key: Mutex Value: DC_MUTEX-W45NCJ6
Key: OfflineKeylogger Value: 1
Key: Password Value:
Key: Version Value: #KCMDDC51#

Sample njRat config
Key: Campaign ID Value: 1111111111111111111
Key: Domain Value: apolo47.ddns.net
Key: Install Dir Value: UserProfile
Key: Install Flag Value: False
Key: Install Name Value: svchost.exe
Key: Network Separator Value: |'|'|
Key: Port Value: 1177
Key: Registry Value Value: 5d5e3c1b562e3a75dc95740a35744ad0
Key: version Value: 0.6.4
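Both samples use the same `Key: … Value: …` layout, so one small parser can turn decoder output into a dictionary. The Python sketch below assumes output shaped exactly like the samples above; this is not a documented RATDecoders interface. It splits at the first ` Value:`, which keeps keys that themselves contain the word intact, such as njRat's `Registry Value`. Empty fields, such as DarkComet's unused FTP settings, are dropped by default.

```python
def parse_decoder_output(text: str, keep_empty: bool = False) -> dict[str, str]:
    """Parse 'Key: <name> Value: <value>' lines into a dict.

    Lines in any other shape are ignored. Empty values (e.g. unused FTP
    settings) are dropped unless keep_empty is True.
    """
    config: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("Key:") or " Value:" not in line:
            continue
        # Split at the first " Value:" so a key like "Registry Value"
        # (followed by a space, not a colon) stays intact.
        key_part, value = line.split(" Value:", 1)
        key = key_part[len("Key:"):].strip()
        value = value.strip()
        if key and (value or keep_empty):
            config[key] = value
    return config
```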

Processing DNS/IP Info
• The config takes an FQDN or IP in a free-form field.
• This is the only configuration item that gets any processing.
• If RFC 1918 IP, then drop config.
• If FQDN resolves to RFC1918 IP, keep it.
• If it doesn’t resolve, keep it.
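Read literally, these rules drop a config only when its C2 is a literal RFC 1918 address. An FQDN is kept whether it resolves to a private IP, a public IP, or nothing, so the keep/drop decision itself needs no DNS lookup. The Python sketch below uses the standard `ipaddress` module and assumes IPv4 values in the `host:port` form seen in the DarkComet `Domains` field. It checks the three RFC 1918 ranges explicitly, because `ipaddress`'s `is_private` also covers loopback, link-local, and other reserved space.

```python
import ipaddress

# The three private ranges defined by RFC 1918.
RFC1918_NETS = tuple(ipaddress.ip_network(n) for n in
                     ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"))


def split_host_port(value: str) -> tuple[str, str]:
    """Split 'host:port' (as in DarkComet's Domains field); port may be ''.

    Assumes IPv4 or hostnames; bare IPv6 literals are not handled.
    """
    host, sep, port = value.strip().rpartition(":")
    if not sep or not port.isdigit():
        return value.strip(), ""
    return host, port


def is_rfc1918(host: str) -> bool:
    """True only for a literal IP address inside an RFC 1918 range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal, so treat it as an FQDN
    return any(addr in net for net in RFC1918_NETS)


def keep_config(c2_value: str) -> bool:
    """Apply the rules above: drop the config only for an RFC 1918 IP."""
    host, _port = split_host_port(c2_value)
    return not is_rfc1918(host)
```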

Sample Output
0739b6a1bc018a842b87dcb95a73248d3842c5de,150213,Dark Comet Config,Guest16,lolikhebjegehackt.ddns.net,,1604,,,,o1o5GgYr8yBB,DC_MUTEX-4E844NR
0745a4278793542d15bbdbe3e1f9eb8691e8b4fb,150213,Dark Comet Config,Guest16,ayhan313.noip.me,,1604,,,,aWUZabkXJRte,DC_MUTEX-TX61KQS
07540d2b4d8bd83e9ba43b2e5d9a2578677cba20,150213,Dark Comet Config,FUDDDDD,bilalsidd43.no-ip.biz,204.95.99.66,1604,,,,qZYsyVu0kMpS,DC_MUTEX-8VK1Q5N
07560860bc1d58822db871492ea1aa56f120191a,150213,Dark Comet Config,Victim,cutedna.no-ip.biz,,1604,,,,sfAEjh4m1lQ7,DC_MUTEX-F2T2XKC
07998ff3d00d232b6f35db69ee5a549da11e96d1,150213,Dark Comet Config,test1,,192.116.50.238,90,,,,4A2xbJmSqvuc,DC_MUTEX-F54S21D
07ac914bdb5b4cda59715df8421ec1adfaa79cc7,150213,Dark Comet Config,Guest16,alkozor.ddns.net,31.132.106.94,1604,1.ekspert60.z8.ru,######60,######2012,zwd8tEC0F0tA,DC_MUTEX-W3VUKQN
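Building a feed like this with Python's `csv` module, rather than joining strings with commas, keeps free-form fields such as campaign IDs from shifting the columns when they contain a comma or quote. In this sketch, the `COLUMNS` list is a hypothetical layout loosely modeled on the rows above, not the actual feed format.

```python
import csv
import io

# Hypothetical column order, loosely modeled on the sample rows above.
COLUMNS = ["sha1", "date", "description", "campaign_id", "domain", "ip",
           "port", "gencode", "mutex"]


def to_csv_rows(records: list[dict[str, str]]) -> str:
    """Render normalized config records as CSV text (no header row).

    Missing fields become empty strings, unknown keys are ignored, and
    values containing commas or quotes are quoted by the csv module.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="",
                            extrasaction="ignore", lineterminator="\n")
    for record in records:
        writer.writerow(record)
    return buf.getvalue()
```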

Pump it all into a database… profit
• CSV is all fine and good, but not great for
historical searching…
• Main table with Hash, C2 info, description,
source and date.
• Also pumped into CIF
• RAT-specific table with Hash and RAT-specific config info.
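Here is a minimal sketch of the two-table layout using SQLite from Python's standard library. The table and column names are assumptions. The slide describes a RAT-specific table; this sketch substitutes a single key/value table, so a new RAT family needs no schema change. One table per family with typed columns is the other reasonable design. The index on (key, value) is what makes pivots like "every sample sharing this mutex" cheap.

```python
import sqlite3

# Hypothetical schema: one row per sample, plus key/value rows for the
# RAT-specific settings (a swapped-in design; see the lead-in).
SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    hash        TEXT PRIMARY KEY,
    c2          TEXT,
    description TEXT,
    source      TEXT,
    seen_date   TEXT
);
CREATE TABLE IF NOT EXISTS rat_config (
    hash      TEXT NOT NULL REFERENCES samples(hash),
    cfg_key   TEXT NOT NULL,
    cfg_value TEXT,
    PRIMARY KEY (hash, cfg_key)
);
CREATE INDEX IF NOT EXISTS idx_samples_c2 ON samples(c2);
CREATE INDEX IF NOT EXISTS idx_config_kv ON rat_config(cfg_key, cfg_value);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn: sqlite3.Connection, sample: dict[str, str],
          config: dict[str, str]) -> None:
    """Insert one sample and its RAT-specific settings in a transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO samples VALUES (?, ?, ?, ?, ?)",
            (sample["hash"], sample.get("c2"), sample.get("description"),
             sample.get("source"), sample.get("seen_date")))
        conn.executemany(
            "INSERT OR REPLACE INTO rat_config VALUES (?, ?, ?)",
            [(sample["hash"], k, v) for k, v in config.items()])


def hashes_sharing(conn: sqlite3.Connection, key: str, value: str) -> list[str]:
    """Every sample hash whose config has key == value (e.g. a mutex)."""
    rows = conn.execute(
        "SELECT hash FROM rat_config WHERE cfg_key = ? AND cfg_value = ? "
        "ORDER BY hash", (key, value))
    return [row[0] for row in rows]
```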

Artifact Mining
• Often (but not always) the operators of a given piece of
malware are distinct and separate from the author of the
malware.
• Correlating related pieces of code may not be worthwhile.
• Cryptolocker example
• At least for RATs, the interesting artifacts are the
configuration, not the code.
• Malware actors may change tools but may continue to
use some of the configuration elements.

Why in the world would you ever do this?
1524 Guest16
145 Guest16_min
50 Anonymous
43
29 Hacked
28 Victim
28 HF
27 TestGuest
27 Test1
26 Guest162
25 Slave
23 B--L--A--Y
22 Guest1
20 Test
17 Guest
17 1
16 DOS
15 Eb0la
14 Kurban
13
12 HACKIADO MUAHAHAHAHA
11 test
11 Bot
10 VoltandoAHackear
10 Hack
10 AVA

More examples
2652 HacKed
119
109
72
50 Hacked
37 hacked
18
14 google
13 Victim
11 isLam
10 victim
10 system
9 test
9
8 xXxVICTIMxXx
8 vitima
8 4kurdistan.no-ip.biz
…
7 HacKed By Amr Nasr
6 HacKed By Mohamed Ashraf
5 HacKed_by_Hammouda-Hacker
4 Ahmed Najar
4 ahMed-haKerS
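Tallies like the two above can be produced with a shell pipeline (`cut | sort | uniq -c | sort -rn`) or with a `collections.Counter` over the CSV feed. In this Python sketch, the column index is an assumption: in the sample output rows earlier, the campaign ID appears to be the fourth field (index 3). Blank values are counted too, which would explain the counts with no label in the lists above.

```python
import csv
import io
from collections import Counter


def top_values(csv_text: str, column: int, n: int = 20) -> list[tuple[str, int]]:
    """Count each value in one CSV column, most common first.

    Blank values are counted too; rows too short to have the column are skipped.
    """
    rows = csv.reader(io.StringIO(csv_text))
    counts = Counter(row[column].strip() for row in rows if len(row) > column)
    return counts.most_common(n)
```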

RAT Creed
This is my RAT. There are many like it, but this one is mine.
My RAT is my best friend. It is my life. I must master it as I
must master my life.
My RAT, without me, is useless. Without my RAT, I am
useless. I must fire my RAT true. I must shoot straighter
than my enemy who is trying to kill me. I must shoot him
before he shoots me. I will...

Top Global ASNs for RAT C2s
294 36947 DZ ALGTEL-AS,DZ
131 8452 EG TE-AS TE-AS,EG
115 42708 SE PORTLANE Portlane Networks AB,SE
113 36903 MA MT-MPLS,MA
98 50710 IQ EARTHLINK-AS EarthLink Ltd. Communications&Internet Services,IQ
69 9121 TR TTNET Turk Telekomunikasyon Anonim Sirketi,TR
69 25019 SA SAUDINETSTC-AS Saudi Telecom Company JSC,SA
52 NA NA
39 47869 SE NETROUTING-AS Netrouting,NL
35 37705 TN TOPNET,TN
31 24863 EG LINKdotNET-AS,EG
30 45595 PK PKTELECOM-AS-PK Pakistan Telecom Company Limited,PK
25 7738 BR Telemar Norte Leste S.A.,BR
25 3215 FR AS3215 Orange S.A.,FR
25 2609 TN TN-BB-AS Tunisia BackBone AS,TN
24 8376 JO Jordan Data Communications Company LLC,JO
23 4565 MEGAPATH2-US - MegaPath Networks Inc.,US
22 8075 US MICROSOFT-CORP-MSN-AS-BLOCK - Microsoft Corporation,US

Top Countries for RAT C2s
294 DZ
261 US
225 RU
186 EG
168 SE
152 IQ
145 MA
114 BR
103 SA
99 TR
99 TN
89 FR
81 UA

Top US Cities for RAT C2s
22 Redmond, Washington
12 Dallas, Texas
7 Phoenix, Arizona
6 Providence, Utah
6 New York, New York
6 Los Angeles, California
3 Wilmington, Delaware
3 San Antonio, Texas
3 Philadelphia, Pennsylvania
3 Houston, Texas
2 Willoughby, Ohio

Eventually fully-retroactive
• All that malware in VirusTotal? You can still use that.
• Think of the intelligence possibilities of having a “master”
database of RAT configurations for “all time”…
• If nothing else, Amazon’s stock price will go up from the
AWS fees
• Why?
• Because often we don’t know what is important until after the fact, and being able to go back and find the information readily available can shorten response time.

What to do with this data?
• Giving it to LE for action is the obvious option
• Giving it to CERTs so they can take action
• Or you can burn all the RATs #OpTrollHackforums
• Creating alerts on this data is probably ok.
• Taking automated blocking action based on this data is
probably not.

#OpSoapbox
• This is a wealth of very useful information… but it
is just information.
• Intelligence is the process of thinking critically
about the information you have…
• What is it telling you
• What are all the possible conclusions
• Where can the adversary deceive you
• What harm could be caused if you acted on it

Don’t be that guy
Adapted from Brandon Levene* (I think)

Counterintelligence
• DNS resolution is under the control of the
adversary.
• The adversary has motive to deceive.
• The adversary has motive to cause harm.
• DGA feeds anecdote
• Shameless plug:
http://osint.bambenekconsulting.com/feeds

What’s the worst that can happen…
• If I were evil and knew you were taking automated
blocking action based on something I controlled resolution
for, here is what I would use for IPs:
198.41.0.4
192.228.79.201
192.33.4.12
199.7.91.13
192.203.230.10
192.5.5.241
192.112.36.4
128.63.2.53
192.36.148.17
192.58.128.30
193.0.14.129
199.7.83.42
202.12.27.33
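The addresses above are the IPv4 addresses of the 13 DNS root servers (a through m) as they stood in 2015. An attacker who can point a C2 domain at them could trick an automated system into blocking DNS itself. One mitigation is to send candidates that hit a safelist, or that are not globally routable, to human review instead of a block list. It is sketched here in Python as an illustration, not as anything from the talk.

```python
import ipaddress

# IPv4 addresses of the 13 DNS root servers, as listed above (2015).
# Some have since been renumbered, so a production safelist should be
# built from a current source rather than hard-coded.
ROOT_SERVERS = frozenset(ipaddress.ip_address(a) for a in (
    "198.41.0.4", "192.228.79.201", "192.33.4.12", "199.7.91.13",
    "192.203.230.10", "192.5.5.241", "192.112.36.4", "128.63.2.53",
    "192.36.148.17", "192.58.128.30", "193.0.14.129", "199.7.83.42",
    "202.12.27.33",
))


def split_for_blocking(candidates: list[str]) -> tuple[list[str], list[str]]:
    """Split candidate C2 IPs into (auto_block, needs_review).

    Anything unparseable, safelisted, or not globally routable is sent
    to a human instead of straight to a block list.
    """
    auto_block: list[str] = []
    needs_review: list[str] = []
    for raw in candidates:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            needs_review.append(raw)
            continue
        if addr in ROOT_SERVERS or not addr.is_global:
            needs_review.append(raw)
        else:
            auto_block.append(raw)
    return auto_block, needs_review
```

A safelist only covers the attacks you thought of. The slide's advice still holds: alerting on this data is reasonable, while fully automated blocking is not.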

Analyzing data at scale
• How can you possibly analyze thousands of
configurations to determine confidence in each
individual record?
• You can’t.
• Ultimately, you need something to correlate it with.
• Wiretap if LE
• Correlation with other malicious activity at the same IP

But the data changes…
• If the adversary uses DNS, they can change information at will.
• Long-term goal is to feed “live” data into another
application that handles surveillance called PSS –
Permanent Surveillance System.
• Maybe I’ll open-source it, don’t know yet.
• Beyond that, there are some interesting fields to pivot off
of to correlate campaigns
• Campaign ID
• Mutex
• Registry Keys

Long-Term
• Identifying a threat point-in-time has value.
• Surveilling a threat as it moves and changes proactively reduces the window of opportunity for an adversary.
• RATs are just the start
• They are relatively easy
• Still useful to improve the tradecraft
• And they are still used by adversaries

QUESTIONS?
THANK YOU
John.bambenek@fidelissecurity.com / 217 493 0760
@bambenek
