Hunting with Data Science

Increasing the Signal-to-Noise Ratio

www.austintaylor.io
@HuntOperator
June 23, 2017

Threat Hunting with Data Science

After anomalous network traffic has been identified, there can still be an abundance of results for an analyst to process. This presentation is for data scientists and network security professionals who want to increase the signal-to-noise ratio.

Flare is a network analytic framework designed for data scientists, security researchers, and network professionals. Written in Python, flare is designed for rapid prototyping and development of behavioral analytics. Flare comes with a collection of pre-built utility functions useful for performing feature extraction.

Using flare, we'll walk through identifying Domain Generation Algorithms (DGA) commonly used in malware and how to reduce the dataset to a manageable amount for security professionals to process.
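To give a feel for what reducing a DNS dataset can look like, here is a minimal sketch. It is not flare's classifier: the slides describe a Random Forest trained on n-grams from labelled data, while this example swaps in a much simpler Shannon-entropy heuristic plus an allowlist check, using only the Python standard library. The function names, the thresholds, and the naive "last two labels" domain split are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(label):
    """Shannon entropy (bits per character) of a string."""
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def registered_domain(domain):
    """Naive split: keep the last two dot-separated labels.

    Assumption for illustration only; real suffixes such as co.uk need a
    public-suffix-aware extractor.
    """
    parts = domain.lower().rstrip(".").split(".")
    return ".".join(parts[-2:])


def filter_suspects(domains, allowlist, min_entropy=3.0, min_length=10):
    """Return domains that are not allowlisted and look random.

    min_entropy and min_length are hypothetical thresholds, not values
    taken from flare.
    """
    suspects = []
    for domain in domains:
        reg = registered_domain(domain)
        if reg in allowlist:
            continue  # popular domain: drop it from the analyst's queue
        label = reg.split(".")[0]
        if len(label) >= min_length and shannon_entropy(label) >= min_entropy:
            suspects.append(domain)
    return suspects


# Hypothetical stand-in for a top-sites list (Alexa, Umbrella, Majestic).
popular = {"google.com", "stackoverflow.com"}
queries = ["vtlfccmfxlkgifuf.com", "google.com", "stackoverflow.com"]
suspects = filter_suspects(queries, popular)
```

Entropy alone is noisy: the label "stackoverflow" scores above the threshold used here, which is why the allowlist check does much of the work of shrinking the result set.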

We'll also explore flare's beaconing detection which can be used with the output from popular Intrusion Detection System (IDS) frameworks.
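As a rough illustration of interval-based beacon detection, the sketch below groups flow records by (src_ip, dest_ip, dest_port) and checks whether most gaps between connections cluster around a single interval. The parameter names min_occur, min_percent, and window are borrowed from the flare config shown in the slides, but how flare applies them internally is not shown there, so the logic here is an assumption. The flow tuple layout, the port, and the sample timestamps are hypothetical; the two IP addresses come from Scenario 1.

```python
from collections import Counter, defaultdict


def find_beacons(flows, min_occur=5, min_percent=30, window=3):
    """Flag src/dest/port groups whose connection gaps look periodic.

    flows: iterable of (timestamp_seconds, src_ip, dest_ip, dest_port).
    window: jitter tolerance in seconds around the most common gap.
    """
    by_key = defaultdict(list)
    for ts, src, dst, port in flows:
        by_key[(src, dst, port)].append(ts)

    results = []
    for (src, dst, port), stamps in by_key.items():
        stamps.sort()
        deltas = [b - a for a, b in zip(stamps, stamps[1:])]
        if len(deltas) < min_occur:
            continue  # too few connections to judge periodicity
        interval, _ = Counter(deltas).most_common(1)[0]
        near = sum(1 for d in deltas if abs(d - interval) <= window)
        percent = 100 * near / len(deltas)
        if percent >= min_percent:
            results.append({
                "src_ip": src,
                "dest_ip": dst,
                "dest_port": port,
                "interval": interval,
                "occurrences": near,
                "percent": percent,
            })
    return results


# Hypothetical traffic: a 60-second beacon with a little jitter, plus noise.
beacon_times = [0, 60, 120, 178, 240, 300, 362, 420]
noise_times = [5, 17, 200, 260, 390, 700, 1000]
flows = [(t, "192.168.0.53", "160.153.76.129", 443) for t in beacon_times]
flows += [(t, "192.168.0.53", "203.0.113.10", 80) for t in noise_times]

beacons = find_beacons(flows)
```

With this sample data only the 160.153.76.129 group is reported, because every gap falls within 3 seconds of 60. Legitimate periodic services such as update checks would be flagged by the same logic, so results still need validation by an analyst.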

More information on flare can be found at https://github.com/austin-taylor/flare

www.austintaylor.io

Published in: Data & Analytics

Threat Hunting with Data Science

  1. 1. Hunting with Data Science Increasing the Signal-to-Noise Ratio www.austintaylor.io @HuntOperator June 23, 2017
  2. 2. Who Am I? @HuntOperator
  3. 3. Austin Taylor Security Researcher @ IronNet Cybersecurity Cyber Warfare Operator @ USAF (MDANG) www.austintaylor.io @HuntOperator
  4. 4. Semantics @HuntOperator
  5. 5. Threat Hunting Cyber threat hunting is "the process of proactively and iteratively searching through networks to detect and isolate advanced threats that evade existing security solutions." @HuntOperator
  6. 6. Data Science Data science is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining. @HuntOperator
  7. 7. Threat Hunting Data Science —————————— + Hunting with Data Science @HuntOperator
  8. 8. Data Science Hunting Funnel [figure: funnel over Network Traffic; labels: Produced Naturally, Machine Learning, Domain Knowledge, Potential Bad, Normal, Anomalous, Interesting, Bad; percentages: .001%, 3-5%, 10%, 100%] @HuntOperator
  9. 9. Cyber Kill Chain @HuntOperator
  10. 10. Beaconing DGA @HuntOperator You are here Cyber Kill Chain
  11. 11. Beaconing @HuntOperator
  12. 12. Beaconing •Post-Infection •Early network-related indication of infection •Used by malware to “phone home” to command and control server @HuntOperator
  13. 13. Detection Challenges •Hardset Intervals •Varying window sizes •Legit Services •Windows update •Virus definition updates @HuntOperator
  14. 14. Beaconing Detection @HuntOperator
  15. 15. Beaconing: Detection @HuntOperator https://github.com/austin-taylor/flare • Free Open Source Software • Designed for data scientists, security researchers • Written in Python • Used for rapid prototyping and development of behavioral analytics • Intended to make identifying malicious behavior in networks as simple as possible.
  16. 16. Beaconing: Detection @HuntOperator https://github.com/austin-taylor/flare
      [beacon]
      es_host=localhost         # IP address of ES host, which we forwarded to localhost
      es_index=logstash-flow-*  # ES index
      es_port=9200              # Elasticsearch port (we forwarded earlier)
      es_timeout=480            # Timeout limit for Elasticsearch retrieval
      min_occur=50              # Minimum of 50 network occurrences to appear in traffic
      min_interval=30           # Minimum interval of 30 seconds per beacon
      min_percent=30            # Beacons must represent 30% of network traffic per dyad
      window=3                  # Accounts for jitter. For example, if 60-second beacons
                                # occurred at 58 seconds or 62 seconds, a window of 3 would
                                # factor in that traffic.
      threads=8                 # Use 8 threads to process (should be configured)
      period=24                 # Retrieve all flows for the last 24 hours
      kibana_version=5          # Your Kibana version. Currently works with 4 and 5
      verbose=True              # Display output while running script
  17. 17. Beaconing: Data Science •Identify Beaconing •Time •IP address •Ports •Protocol @HuntOperator
  18. 18. Simple: src_ip, dest_ip, dest_port -> hash More Complex: Discrete Fourier Transform (DFT)/ Fast Fourier transform (FFT) Beaconing: Data Science @HuntOperator
  19. 19. Scenario 1 • A piece of malware has infected a computer (192.168.0.53) on your network and is trying to reach back to its Command and Control (C2) server (160.153.76.129) at periodic intervals @HuntOperator
  20. 20. HUNT! @HuntOperator
  21. 21. Beaconing: Hunt flare_beacon -c configs/selks4.ini -html beacons.html 108 events to process @HuntOperator
  22. 22. Beaconing: Hunt flare_beacon -c configs/selks4.ini -html --group --whois --focus_outbound beacons_filtered.html 31 events to process @HuntOperator
  23. 23. Beaconing: Hunt flare_beacon -c configs/selks4.ini -html --group --whois --focus_outbound beacons_filtered.html What was applied? --group: Groups the results, making anomalies visually easier to identify. --whois: Enriches IP addresses with WHOIS information through ASN lookups. --focus_outbound: Filters out multicast, private, and broadcast addresses from destination IPs. @HuntOperator
  24. 24. Beaconing: Hunt • bytes_toserver: Total bytes sent from the source IP address to the server • dest_degree: Number of source IP addresses that communicate with the same destination • occurrences: Number of network occurrences between dyads identified as beaconing • percent: Percentage of traffic between dyads considered beaconing • interval: Interval between beacons, in seconds @HuntOperator
  25. 25. Beaconing: Hunt Validate Results @HuntOperator
  26. 26. Beaconing: Hunt Drilling in @HuntOperator
  27. 27. Beaconing: Hunt @HuntOperator
  28. 28. Beaconing: Hunt @HuntOperator
  29. 29. Beaconing: Hunt @HuntOperator
  30. 30. Beaconing: Hunt @HuntOperator
  31. 31. Beaconing: Hunt @HuntOperator
  32. 32. Domain Generation Algorithms (DGA) @HuntOperator
  33. 33. Domain Generation Algorithms (DGA) •Deterministic value •Generate large number of domain names •Easy to burn •Cheap to register •Used as a rendezvous point by attacker @HuntOperator vtlfccmfxlkgifuf.com Why DGA?
  34. 34. DGA Example @HuntOperator
  35. 35. Source: Aditya K. Sood, Sherali Zeadally, "A Taxonomy of Domain-Generation Algorithms", IEEE Security & Privacy, vol. 14, no. , pp. 46-53, July-Aug. 2016, doi:10.1109/MSP.2016.76 @HuntOperator We want to detect this
  36. 36. @HuntOperator
  37. 37. Scenario 2 • A piece of malware has infected a computer on your network and is making requests to domains generated by a DGA in an attempt to communicate with a Command and Control server
  38. 38. HUNT! @HuntOperator
  39. 39. DNS Records Record Count: 15408 @HuntOperator
  40. 40. Import Flare Tools • DGA Classifier • Random Forest Classifier • N-Grams • Uses labelled data • Alexa - Top 1M most visited websites • Must pay for service now. • Umbrella/Majestic are free alternatives • Domain TLD Extract - Extracts the Top Level Domain to be checked against Alexa • Also calculate degree from here @HuntOperator
  41. 41. Filter Results Still too many results… @HuntOperator
  42. 42. Filter Results @HuntOperator Still too many results… And yet…
  43. 43. Filter Results Down to 78 And finally… @HuntOperator
  44. 44. Filter Results Apply Alexa Check… and… 57 Results! @HuntOperator
  45. 45. Pass to Analyst • Identify Process Generating Traffic • Isolate infected host • Begin endpoint investigation… @HuntOperator
  46. 46. Thank you! www.austintaylor.io @HuntOperator Questions?
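
Slide 18 names a Discrete Fourier Transform (DFT) / Fast Fourier Transform (FFT) as the more complex way to identify beaconing. The sketch below shows the general idea with a naive DFT from the Python standard library: bin connection timestamps into one-second slots, compute the magnitude at each frequency, and report the period of the lowest strong frequency. This is an illustration under stated assumptions, not flare's implementation; the function name, the one-second binning, and the tolerance parameter are hypothetical, and a real pipeline would likely use an FFT library for speed.

```python
import cmath


def dominant_period(timestamps, duration, tolerance=0.9):
    """Estimate the dominant repeat period (seconds) of a set of events.

    timestamps: event times in seconds, measured from the start of capture.
    duration:   length of the capture in whole seconds.
    tolerance:  fraction of the peak magnitude a frequency must reach to
                count as strong (hypothetical tuning knob).
    """
    # Count events per one-second bin, keeping only non-empty bins.
    bins = {}
    for ts in timestamps:
        if 0 <= ts < duration:
            slot = int(ts)
            bins[slot] = bins.get(slot, 0) + 1
    if not bins:
        return None

    # Naive DFT magnitude for each frequency index k (k = 0 is skipped).
    magnitudes = {}
    for k in range(1, duration // 2 + 1):
        coeff = sum(
            count * cmath.exp(-2j * cmath.pi * k * slot / duration)
            for slot, count in bins.items()
        )
        magnitudes[k] = abs(coeff)

    peak = max(magnitudes.values())
    if peak == 0:
        return None

    # A periodic spike train also peaks at harmonics, so take the lowest
    # frequency that is close to the peak rather than the raw maximum.
    for k in sorted(magnitudes):
        if magnitudes[k] >= tolerance * peak:
            return duration / k
    return None


# Hypothetical capture: one connection every 60 seconds for 10 minutes.
period = dominant_period([i * 60 for i in range(10)], duration=600)
```

Choosing the lowest strong frequency instead of the single largest magnitude avoids reporting a harmonic (for example 30 seconds) when floating-point noise makes it marginally larger than the fundamental.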
