Why are Command and Control (C&C) communications so significant to detecting advanced threats and how should you go about detecting them? We’ll discuss the various pitfalls of the traditional methods of detecting C&C and specifically those currently based on machine learning. Machine Learning must be structured, designed and delivered in exactly the right way to deliver impact for detection of advanced threats. The session will introduce our approach, which has significantly improved both detection rates and efficiency. We’ll discuss several test cases and the lessons we’ve learned over time.
Learning Outcomes:
Learn why Command and Control monitoring is the key to detecting advanced threats
Uncover pitfalls of the current approaches to C&C detection
Understand Machine Learning and its role in detecting malicious activity
Understand the potential dangers of the wrong machine learning approach
Learn about the impact a new supervised learning approach can have – in both theory and practice
InfoSecurity Europe 2017 - On The Hunt for Advanced Attacks? C&C Channels are a Good Place to Start
1. Moshe Zioni ( @dalmoz_ )
Security Research Manager, VERINT
On the Hunt for Advanced Attacks? C&C Channels are a Good Place to Start
2. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
• Why focus on C&C?
• C&C - Landscape
• Trends in C&C implementations
• Traditional Approaches
• Our approach
• Limitations
• Proof-of-Concept results
• Takeaways
• Q&A
On The Agenda
3. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
• Moshe Zioni ( @dalmoz_ )
• Leading a terrific group of talented researchers at
• Researching and developing cutting-edge, next generation
detection engines for malicious activity on very big enterprises and
ISPs.
• Credit & Kudos goes to the Research team, especially to Eddie,
Maria, Meir, Oren and Vadim, and to the Analysis team.
WHOAMI – credits & kudos
4. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
• Always present (almost)
• Network interception is practical, in contrast to other detection methods/layers
• While malware tends to be polymorphic, its communication protocol does not
• An old problem –
• Current detection schemes are not so promising at detecting the ‘new’.
• Traditional tactics rely heavily on somewhat naïve comparison.
Why focus on C&C channels?
5. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
C&C landscape
Distribution of Protocols: DNS 9%, HTTP 62%, ICMP 5%, NATIVE 14%, P2P 10%
Name               Method
Dridex             P2P, HTTP
Nano Locker        ICMP
Poison Ivy         HTTP
FLAME              HTTP
CITADEL            HTTP
Bergard            HTTP
Vawtrak URLZONE    HTTP
BlackMoon          HTTP
Wekby              DNS
ZeUS (GOZ)         HTTP (P2P)
DORKBOT            HTTP
SIMDA              NATIVE + HTTP
REGIN              NATIVE (TCP + UDP) + ICMP + HTTP
SOUNDFIX-11        HTTP
JAKUcalc           HTTP / NATIVE TCP / DNS
TrickBot           HTTP
GOZNYM             P2P
6. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Trends in C&C implementations
Rapid, fast-to-respond evolution
Encryption of transmissions and payload
Encapsulation of transmissions
Steganography of messages
P2P – Forget about SPOF
7. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Traditional Approaches
Signature based detection
• Blacklists / known patterns
• Constantly needs upkeep and maintenance
• Low False Positive rate
• Forever relies on intelligence and analysis
• Not suitable at all for finding ‘unknown’ schemes
• High False Negative rate
Anomaly based detection
• Markov models
• ARMA
• Baseline comparison
• Assumes that normal traffic differs statistically from malicious traffic, which might reveal novel schemes
• This assumption fails many times under current trends.
• High False Positive rate
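To make the baseline-comparison trade-off concrete, here is a minimal sketch of a threshold detector. It is not the detector discussed in the talk; the traffic numbers, the `is_anomalous` helper and the 3-sigma rule are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical bytes-per-minute history for one host (made-up numbers).
BASELINE = [100, 110, 90, 105, 95]

def is_anomalous(observed, history=BASELINE, k=3.0):
    """Flag a value that lies more than k standard deviations from the baseline mean."""
    mu = mean(history)
    sigma = pstdev(history)
    return abs(observed - mu) > k * sigma

beacon = 104      # assumed low-and-slow C&C check-in, sized like normal traffic
backup_job = 500  # assumed legitimate but unusual burst

print(is_anomalous(beacon))      # the beacon hides inside the baseline
print(is_anomalous(backup_job))  # the benign burst raises an alert
```

In this toy setup the C&C beacon goes unnoticed while the harmless burst is flagged, which is the failure mode the slide attributes to anomaly based detection.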
8. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Our approach
Choosing an alternate path
9. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
What Do We Need?
We need something robust, that can “think” of many possibilities.
Rely on what we do know and induce further.
Fast (polynomial) results.
MACHINE LEARNING - For The Win!
10. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Enter Machine Learning
Machine Learning is the science of
providing a computer with the ability to
“learn” by example and teach itself to
find patterns.
There are many methods of ML –
each one has its pros and cons.
The model ‘learns’ from
known, classified data, and
extrapolates to achieve results
that are nontrivial even for a human.
Evolved from Pattern
Recognition and
Artificial Intelligence
studies.
11. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Supervised Learning
Rely on labelled training data.
Collection is key for an optimized model and for reducing error levels.
The data sample set should comprise encompassing, diverse and relevant data.
We used Decision Tree / Random Forest based supervised learning.
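As a rough illustration of the random forest idea (many small trees trained on resampled labelled data, combined by majority vote), here is a pure-Python sketch. It is a simplification and not the model used in the research: each "tree" is a depth-1 stump, the resampling is done per class so every stump sees both labels, and the two numeric features and all training values are invented.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows.
    Assumes larger feature values indicate the malicious class (label 1)."""
    best_t, best_err = None, None
    for t in sorted({r[feature] for r in rows}):
        err = sum((r[feature] >= t) != bool(y) for r, y in zip(rows, labels))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return feature, best_t

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    groups = {}
    for r, y in zip(rows, labels):
        groups.setdefault(y, []).append(r)
    forest = []
    for _ in range(n_trees):
        sample_rows, sample_labels = [], []
        # Bootstrap resample within each class (with replacement).
        for y, members in groups.items():
            picks = [rng.choice(members) for _ in members]
            sample_rows += picks
            sample_labels += [y] * len(picks)
        # Each stump looks at one randomly chosen feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest

def predict(forest, row):
    """Majority vote over all stumps: 1 = malicious, 0 = benign."""
    votes = Counter(int(row[f] >= t) for f, t in forest)
    return votes.most_common(1)[0][0]

# Made-up labelled sessions: two numeric features per session.
rows = [[1, 2], [2, 1], [3, 3], [10, 12], [11, 10], [12, 11]]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict(forest, [2, 2]), predict(forest, [14, 13]))
```

A production model would use full decision trees and far more data; the point here is only the combination of resampling, random feature choice and voting.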
12. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Feature
Extraction
SUIT TIE CHARM SMILE BAD-TEETH CAT CLASS
16. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Session
• Time differences
• # of bytes
• How many requests got an answer?
• How much time it took to get an answer?
TCP
• 5-Tuple information
• Protocol
• IP Payload
• Handshake data
• Flags
• Flow count
Feature selection in TCP/HTTP
Protocol specific - HTTP:
• What is the length of the host name?
• Body length
• # of unique URI calls within the session
• # of “user agent” strings used & values
• How many file types were downloaded?
• What is the average status code?
• What is the avg. length of the URI?
• Number of parameters
SSL/TLS
• Certificate metadata
• Negotiated cipher-suite
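A few of the HTTP features listed above can be computed per session along these lines. This is a sketch under assumptions: the record layout (`host`, `uri`, `user_agent`, `status`), the function name and the sample session are hypothetical and not taken from the talk.

```python
from statistics import mean
from urllib.parse import urlsplit, parse_qsl

def http_session_features(requests):
    """requests: list of dicts with hypothetical keys host, uri, user_agent, status."""
    uris = [r["uri"] for r in requests]
    return {
        "host_len": len(requests[0]["host"]),                     # length of the host name
        "unique_uris": len(set(uris)),                            # unique URI calls in the session
        "user_agents": len({r["user_agent"] for r in requests}),  # distinct user-agent strings
        "avg_status": mean(r["status"] for r in requests),        # average status code
        "avg_uri_len": mean(len(u) for u in uris),                # average URI length
        "num_params": sum(len(parse_qsl(urlsplit(u).query)) for u in uris),
    }

# Made-up two-request session for illustration.
session = [
    {"host": "example.com", "uri": "/a.php?x=1&y=2", "user_agent": "UA-1", "status": 200},
    {"host": "example.com", "uri": "/b.php", "user_agent": "UA-1", "status": 404},
]
features = http_session_features(session)
print(features)
```

Each session then becomes one fixed-length numeric row, which is the form a supervised model needs for training.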
17. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Appropriate data collection and feature engineering are crucial for a
proper, effective model.
Machine learning results are hard to interpret: most of the time,
the question ‘How did the machine decide that this is malicious
traffic?!’ has no straightforward answer.
Do not succumb to overfitting (e.g. params/samples >> 1).
But, first
18. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
In-the-Wild POC
Sample Results
19. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
POST /some/uri.php HTTP/1.1
layer=cXJjb3JtYUJxamNwaW5jcWdwcSxhbW8=&dimm=Pl
dRR1A8YG12bGd2XW9ja25ncD4tV1FHUDwIPkxDT0c8IG9ja25ncGB
teiA+LUxDT0c8CD5RV0BIPHFyY28iYG12bGd2ImtsImNhdmttbD4tU
VdASDwIPlFATUZbPAhWamtxIm9ncXFjZWcidWNxInFnbHYiZHBtbyJj
ImFtb3JwbW9rcWdmIm9jYWprbGcsCD4tU UBNRls8&err=1
(Source: Akamai)
Spamtorte – old version comm.
20. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Old version body contents:
layer=cXJjb3JtYUJxamNwaW5jcWdwcSxhbW8=&dimm=Pl
dRR1A8YG12bGd2XW9ja25ncD4tV1FHUDwIPkxDT0c8IG9ja25ncGBteiA+LUxDT0c
8CD5RV0BIPHFyY28iYG12bGd2ImtsImNhdmttbD4tUVdASDwIPlFATUZbPAhWamtxIm9ncXFjZWcidWNxInFnbHYiZHBtbyJjI
mFtb3JwbW9rcWdmIm9jYWprbGcsCD4tU UBNRls8&err=1
(Source: Akamai)
New version POST Request body contents: (keeping the first letter and randomizing, 2-5 chars each)
ljj=Y24sZXBnZ2xnNTs1OyxjZUJlb2NrbixhbW8hY2hY24sZXdjcGZCbmdjdGt2dixhb
W8hY24sZXdY2tuLGFtbyFjbixld2tjcGZCam12b2NrbixkcCFjbixld2tjcGZCbmNybXF2
ZyxsZ3YhY24sZXdrYG10a2FqQmVvY2tuLGFtbw3%3D&dhgxbg=PldRR1A8Zm1sY2
5mcW1sNDQ6Pi1XUUdrcWo9Ij5gcDwiPmBwPCJwZ3JueyJvZyJrZCJ7bXcidW13bm
YibmtpZyJ2bSJxZ2cib3sicmptdm1xLCJRZ2cie21&ejv=o
Spamtorte - version comparison
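The comparison above also shows why an exact-match signature breaks across versions. The sketch below uses only the parameter names visible in the two bodies (values omitted); the `exact_match` and `shape` helpers are hypothetical, and the "shape" is a hand-picked coarse description, not the feature set of the actual model.

```python
OLD_NAMES = ["layer", "dimm", "err"]   # parameter names in the old body
NEW_NAMES = ["ljj", "dhgxbg", "ejv"]   # parameter names in the new body

def exact_match(names, signature=OLD_NAMES):
    """Signature-style check: the names must equal the known pattern."""
    return list(names) == signature

def shape(names):
    """Coarse description: parameter count and the first letter of each name."""
    return len(names), "".join(n[0] for n in names)

print(exact_match(NEW_NAMES))                # signature no longer matches
print(shape(OLD_NAMES) == shape(NEW_NAMES))  # coarse structure is unchanged
```

Randomizing the names defeats the literal pattern, while the structure of the request stays the same, which is the kind of regularity a model trained on session-level features can still pick up.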
21. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Spamtorte – Malware Upgrades
Filename MD5 Size
OLD (32bit version) 1faf27f6b8e8a9cadb611f668a01cf73 47,509
OLD (64bit version) cb0477445fef9c5f1a5b6689bbfb941e 52,515
NEW (32bit version) c547177e6f8b2cb8be26185073d64edc 87,875
NEW (64bit version) d04c492a5b78516a7a36cc2e1e8bf521 95,063
22. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Spamtorte - what made the machine spot it??
Relevant samples were from several sources, found to be “similar” to:
• CryptoWall
• TeslaCrypt
23. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
SpamTorte v2: http://cyber.verint.com/spamtorte-version-2/
Getting a hold of the details:
Extra! http://cyber.verint.com/nymaim-malware-variant/
24. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Key takeaways
Traditional schemes are not relevant for the goal of APT detection
Machine Learning is key for uncovering unknown malicious traffic
Collection is gold and should be considered the most crucial part
of the operation; neglecting it may lead to very error-prone models
C&C comms. are rapidly becoming encrypted (exp. Features)
25. LOOKING FOR APTS? C&C IS A GOOD PLACE TO START
Thank You
Visit us at booth #G160!
Editor's Notes
HTTP/S definitions
UDP/TCP
ICMP
Social networks?
Dridex is p2p or http?
Steganography?
Autoregressive moving average
TRAINING DATA
Easy example -
Small set – cannot really extrapolate from a small bunch
Collection is key!
Bias is dangerous – diversity, robust, clean
Per domain features – sessions instead of http specific features
Another option is to add certificate to the circle together with other ssl/tls features
The payload itself (body) is different – Json -
Samples from samples – add some pcaps or at least names of families of which we derived this conclusion
Similarity breakdown (columns: malware_prediction, count of alerts):
1000052 - [~300] - CryptoWall
1000053 - [~100] - TeslaCrypt