1. The Artificial Reality of
Cyber Defense
Fabio Palozza
Technical Director, EMEA
RADWARE
Riga, October 2018 – DSS ITSEC
https://blog.radware.com/author/fabiop/
2. 2
Minimizing False Positives & False Negatives
Too many events vs. not enough events
Image Source: Effect Size FAQs by Paul Ellis
Why minimize False Negatives? S3r1ously !?!?
Why minimize False Positives? How many incidents can your SOC investigate? Can you give the right incidents the amount of time they deserve?
3. 3
Detection Sensitivity in Positive Security Models
(chart: probability vs. detection sensitivity, showing the False Negative and False Positive curves between "Allow all" and "Deny all"; Negative Security Model marked)
4. 4
Anomaly Detection – Game On!
• Security threats growing faster than security teams and budgets, huge talent shortage
• Paradox: proliferation of data from dozens of security products makes it harder to detect and investigate threats
• Need for automation
• Rule-based event correlation provides reduction from millions to thousands
• A good SOC can investigate maybe a couple of hundred incidents a day
• How to leverage previous work from the SOC to improve future detection by automation?
• Need for automation that improves itself over time based on new data and user or researcher feedback
6. 6
MACHINE LEARNING: algorithms whose performance improves as they are exposed to more data over time
DEEP LEARNING: multilayered neural networks that learn from vast amounts of data
ARTIFICIAL INTELLIGENCE: a system that can sense, reason, act, and adapt
7. 7
Detection Algorithms & Machine Learning
(chart: COMPLEXITY vs. ABILITY TO MITIGATE AUTOMATICALLY / TIME TO MITIGATE; deterministic, transparent, "data provides baselines" at one end; "too complex to code", generalization, opaque at the other; Degree of Attack (DoA) marked)
9. 9
Challenges of Deep Learning
Reproducibility, Transparency, Learning in Adversarial Contexts, Learning in Changing Environments, Training Data
10. 10
Poisoning Attack
March 2016 – Microsoft unveiled Tay
An innocent chatbot (twitterbot)
An experiment in conversational understanding
It took less than 24 hours before the community corrupted an innocent AI chatbot
https://i.kym-cdn.com/photos/images/original/001/096/674/ef9.jpg
12. 12
Adversarial Attack Example
Camouflage graffiti and art stickers cause a neural network to
misclassify stop signs as speed limit 45 signs or yield signs
Source: https://thenewstack.io/camouflaged-graffiti-road-signs-can-fool-machine-learning-models/
13. 13
Breaking CAPTCHA
• 2012: Support Vector Machines (SVM) to break reCAPTCHA
• 82% accuracy
• Cruz, Uceda, Reyes
• 2016: Breaking simple-captcha using Deep Learning
• 92% accuracy
• How to break a captcha system using Torch
• 2016: I’m not Human - breaking the Google reCAPTCHA
• 98% accuracy
• Black Hat ASIA 2016 – Sivakorn, Polakis, Keromytis
14. 14
SNAP_R – Automated Spear-Phishing on Twitter
• Man vs Machine – 2 hour bake off
• SNAP_R
• 819 tweets
• 6.85 simulated spear-phishing tweets/minute
• 275 victims
• Forbes staff writer Thomas Fox-Brewster
• 200 tweets
• 1.67 copy/pasted tweets/minute
• 49 victims
https://www.blackhat.com/docs/us-16/materials/us-16-Seymour-Tully-Weaponizing-Data-Science-For-Social-Engineering-Automated-E2E-Spear-Phishing-On-Twitter.pdf
15. 15
DeepHack – DEF CON 25
• Open-source hacking AI: https://github.com/BishopFox/deephack
• Bot learns how to break into web applications
• Using a neural network + trial-and-error
• Learns to exploit multiple kinds of vulnerabilities without prior knowledge of the applications
• Opening the door for hacking artificial intelligence systems in the future
• Only the beginning
• AI-based hacking tools are emerging as a class of technology that pentesters
have yet to fully explore.
• “We guarantee that you’ll be either writing machine learning hacking tools
next year, or desperately attempting to defend against them.”
Video: DEF CON 25 (2017) - Weaponizing Machine Learning - Petro, Morris - Stream - 30July2017
17. 18
(diagram: Radware Attack Mitigation System protecting Your Protected Network)
ERT SUS (Subscription), ERT Active Attackers Feed, Cloud Malware Protection, DefensePro
Blocking Known Attackers, Blocking Known Attacks, Blocking Unknown Attacks, Blocking APT & 0-day Infections
Axes: COMPLEXITY vs. ABILITY TO MITIGATE AUTOMATICALLY / TIME TO MITIGATE, from "traditional" machine learning algorithms to Big Data, Deep Learning
18. 19
Moving away from the Edge
• Centralized protections based on Big Data and Deep Learning models are able to:
• Find and detect anomalies
• Figure out complex relations that humans have a hard time finding in huge sets of event data
• The output of these systems can be leveraged for near real-time mitigation through Threat Intelligence feeds
• For automated blacklisting
• API integrations with protection devices for autonomously adapting security policies
• Important in this cloud service is the community or crowd-sourcing aspect
• It enables larger amounts of diverse 'good' training data
• While each member can leverage the intelligence for protection (threats detected in any of the members …)
19. 20
Radware ERT Active Attackers Feed
Staying Ahead of Emerging Threats & Attackers
PREEMPTIVE PROTECTION against known DDoS attackers: preemptively blocking attackers before they enter your network
ACTIVE ATTACKERS blocked in real time: blocks IPs actively involved in DNS & IoT botnet DDoS attacks in the last 24 hours
DATA CORRELATION across multiple Radware sources: cloud DDoS intelligence, global deception network & real-life attack data
20. 21
Radware's ERT Active Attackers Feed – How It Works
#1.a Robust DDoS attack data collected from Radware's Cloud DDoS Scrubbing Centers
#1.b Continuous correlation with active attackers identified from the Radware Detection Network
#1.c Botnet Intelligence Algorithm: attackers identified from Radware automatic botnet detection/ERT
#2 Feed created (ERT Threat Research Center, ERT Active DDoS Feed)
#3 Feed sent to DefensePro, ready to block attackers
21. 22
Attacker Feed, real customer use cases
• Activation of the feed during a POC for refreshing an old DP1016
• More than 4,200 distinct IPs hit
• Matched the IP blacklist from the customer's SOC at 98.5% (197/200)
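The reported match rate is simply the overlap between the two IP sets. As a quick check (the IP values below are placeholders; only the 197-of-200 ratio comes from the slide):

```python
# Hypothetical sample: 200 blacklisted IPs from the customer's SOC,
# of which 197 also appeared in the feed's hit list.
soc_sample = {f"198.51.100.{i}" for i in range(200)}
feed_hits = {f"198.51.100.{i}" for i in range(197)}

# Fraction of the SOC blacklist also flagged by the feed
match_rate = 100 * len(soc_sample & feed_hits) / len(soc_sample)
print(f"{match_rate:.1f}%")  # 98.5%
```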
22. 23
Cloud-based Malware Attack Detection & Mitigation Service
(diagram: Visibility, Ongoing Detection, Audit & Report, Early Prevention; C&C List Subscription, Infection Attempts Reporting, New Malware Detection, Simulated Malware Attacks)
23. 24
Radware Traffic Analysis Detects Anomalies
IDENTIFIED BY RADWARE AS ZERO-DAY MALWARE
1. Similarity to Malicious: communication patterns were similar to known malicious behaviors
2. Age of Domain: suspicious traffic directed to young domains less than one year old
3. Periodicity: infected hosts communicated in predictable intervals of ~10 minutes
4. Site Richness: suspicious traffic was communicating to websites with few HTML objects
5. Spoofed Host Detection: header data did not match the IP address of the destination host
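The periodicity indicator above can be sketched as a check on the regularity of inter-request intervals; the jitter threshold and timestamps below are illustrative assumptions, not the product's actual logic:

```python
import statistics

def looks_periodic(timestamps, max_jitter=0.1):
    """Flag a host as a beaconing candidate when its inter-request
    intervals are nearly constant (low deviation relative to the mean)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 3:
        return False
    mean = statistics.mean(gaps)
    return mean > 0 and statistics.pstdev(gaps) / mean < max_jitter

# Infected host: ~10-minute (600 s) check-ins with small jitter
beacon = [0, 601, 1199, 1802, 2400]
# Human browsing: irregular gaps
human = [0, 45, 600, 610, 2400]
looks_periodic(beacon)  # True
looks_periodic(human)   # False
```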
25. 26
Looking ahead…
• “Traditional” Machine Learning systems have been defending our networks
for some time already
• Attackers are maturing and attacks are getting more complex
• Detecting and stopping future attacks will require innovation
• This innovation could be based on Deep Learning
• Deep Learning Systems have their challenges to perform autonomously
• The theory behind today’s Neural Networks originates from the 60s
• Will we overcome these challenges with incremental advancements ?
• Or will we need another breakthrough in AI ?
• To achieve the ultimate goal of a fully autonomous cyber defense
Attackers are maturing and attacks are getting more complex, especially on the cyberwar side, where government-sponsored attacks have research investments that approach military proportions.
To detect and stop attacks, innovation is required.
Anomaly detection based on traditional correlation rules may result in too many false positives, far too many events to inspect manually, and correlation rules that must be updated continually.
A cautionary example is Tay, the Microsoft AI Twitter bot that was supposed to emulate a teenage girl but was turned into a sex-crazed, Nazi-loving Trump supporter within 24 hours by a group of users from 4chan.
http://searchsecurity.techtarget.com/tip/Evaluating-and-tuning-an-intrusion-detection-system
The false negative and false positive curves are inter-related. Of course, to minimize false negatives you can simply "deny all" … It is possible to lower the risk of false negatives by increasing the sensitivity, basically with signature-based services for known vulnerabilities. Either way, a balance is needed: what is the optimum between the two risks?
Security threats are growing faster than security teams and budgets can keep up, and there’s already a huge talent shortage. The proliferation of data from the dozens of security products that a typical large organization deploys is paradoxically making it harder, not easier, for teams to detect and investigate threats.
Thousands of potential clues about hacking activity are overlooked or thrown away each day. At large companies, it's not uncommon for IT systems to generate tens of thousands of security alerts a day. Security teams can usually filter these down to a few thousand they think are worth investigating, but in a day's work, they're lucky if they can review a few hundred of them. Conversely, many investigations are hampered by gaps in the available information, simply because the cost of storing all the relevant data is increasing far faster than a typical organization's budget.
As a result, it’s pretty common for hackers to go undetected for months, or for it to take a team months to fully understand what’s going on once they’ve detected an issue. All this adds up to more data breaches, more damage, and higher security costs.
Source: https://towardsdatascience.com/cousins-of-artificial-intelligence-dda4edc27b55
There is an urgent need for better and more automation when it comes to anomaly detection.
Deep Learning is the innovative technology that should bring us back in the game. While machine learning and deep learning neural networks are not new (the foundations of current neural networks have been around for decades), the reduction in the cost of storage and compute resources helped in the rebirth of the technology. The most important enabling factor, however, is the availability of huge amounts of data, crowd-sourced by the hyper-cloud giants who pushed the research forward.
Deep Learning should be considered a black box; it can produce good or bad results. The two most important aspects are (1) massive amounts of good data and (2) sizing the deep neural network according to the problem, finding the right balance between fitting the data and generalization (the over- and under-fitting problem known from regression).
It requires massive amounts of good data. Bad or poisoned data will lead to false negatives. The model needs to be trained in clean environments; synthetic data would create a lot of correlation and consequently adverse effects.
Traditional machine learning systems evolve with the technology and are problem-specific. They are deterministic. Machine learning algorithms can be decomposed and problems broken into smaller domains so that every component feeds a machine learning baseline. Examples are the Radware algorithm for degree-of-attack, based on input from TCP, UDP, DNS, and many other machine learning components (behavioral detection methods, such as rate-invariant SYN attack detection), as well as 'good user' vs. 'bad user' and bot vs. human classification on AppWall/WAF.
The code, the program, describes the expected behavior. Data is used for baselining and ultimately for detecting anomalies.
With deep learning, the model is generic. In theory, the same deep learning network can be used to baseline and detect anomalies in TCP as well as UDP or any new protocol, but the model would need to be re-trained (recalibrated), so a lot of good data is needed to train it. Traditional machine learning models are more task-specific, but they can work with little to no data and do it with fewer false positives because they are more deterministic.
With NN/deep learning, the data describes the expected behavior. The same model can be applied to different applications (face recognition, spam detection, …). It is like programming with data (which needs a lot of good data). An example at Radware: Cloud Malware Protection.
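As a sketch of the deterministic, baseline-driven style of detection described above, here is a hypothetical rate-baseline detector; the smoothing factor, threshold multiplier, and floor are illustrative assumptions, not Radware's actual algorithm:

```python
class RateBaseline:
    """Hypothetical deterministic detector: learn a traffic-rate baseline
    from clean samples, then flag samples that deviate far from it."""

    def __init__(self, alpha=0.2, k=3.0):
        self.alpha = alpha      # smoothing factor for the baseline
        self.k = k              # deviation multiplier for alerting
        self.mean = None
        self.var = 0.0

    def observe(self, rate):
        if self.mean is None:   # first sample seeds the baseline
            self.mean = rate
            return False
        dev = rate - self.mean
        # threshold: k standard deviations, floored at 10% of the baseline
        alert = abs(dev) > self.k * max(self.var ** 0.5, 0.1 * self.mean)
        if not alert:           # only learn from traffic that looks normal
            self.var = (1 - self.alpha) * self.var + self.alpha * dev * dev
            self.mean += self.alpha * dev
        return alert

syn_rate = RateBaseline()
alerts = [syn_rate.observe(r) for r in [100, 105, 98, 102, 5000]]
# only the final spike trips the detector
```

The point of the sketch: the code fully describes the expected behavior, and the data only tunes the baseline, which is what makes this family of detectors transparent and deterministic compared with a deep model.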
Deep learning systems are not good at handling changing and dynamic environments. As the network grows, the system may need to be resized to prevent under-fitting. As protocols change and device types are added to or removed from the environment, models need to be re-trained to stay effective.
Studies in adversarial machine learning are ongoing, with the goal of finding better ways to learn in the presence of adversaries and creating models that are more resistant to noise and wrongly labeled data. The only practical defense against poisoning is to ensure that the amount of good data vastly exceeds the amount of bad data.
Adversarial examples are hard to defend against because it is hard to construct a theoretical model of the adversarial example crafting process. Adversarial examples are solutions to an optimization problem that is non-linear and non-convex for many ML models, including neural networks. Because we don’t have good theoretical tools for describing the solutions to these complicated optimization problems, it is very hard to make any kind of theoretical argument that a defense will rule out a set of adversarial examples.
From another point of view, adversarial examples are hard to defend against because they require machine learning models to produce good outputs for every possible input. Most of the time, machine learning models work very well, but only on a very small subset of all the possible inputs they might encounter.
Because of the massive amount of possible inputs, it is very hard to design a defense that is truly adaptive.
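A tiny illustration of the optimization view above, using a linear classifier as a stand-in for a neural network (the weights and inputs are made up): nudging each input feature slightly in the direction of the gradient's sign, the gradient-sign idea behind many adversarial attacks, flips the decision.

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])   # "trained" weights of a linear classifier
x = np.array([0.3, -0.4, 0.2])   # an input the model classifies as positive

def score(x):
    """Decision score: positive class when > 0."""
    return float(w @ x)

# For a linear score the gradient w.r.t. x is simply w, so stepping
# each feature by -eps * sign(w) maximally reduces the score.
eps = 0.6
x_adv = x - eps * np.sign(w)

score(x)      # 1.2  -> classified positive
score(x_adv)  # -0.9 -> flipped to negative by a small per-feature change
```

With a deep network the gradient is no longer constant, which is exactly why the defender faces a non-linear, non-convex problem while the attacker only needs one input that crosses the boundary.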
References:
http://www.cleverhans.io/security/privacy/ml/2017/02/15/why-attacking-machine-learning-is-easier-than-defending-it.html
http://blog.ycombinator.com/how-adversarial-attacks-work/
In the area of bot detection, with the problem of distinguishing between good humans, bad humans, good bots, and bad bots, CAPTCHA has long been an annoying but effective way to differentiate humans from bots and scripts. That held right up until 2016, when researchers designed deep learning systems that can solve Google reCAPTCHA with 98% accuracy, better than most humans can solve these things.
Breaking reCAPTCHA with SVM - https://dl.acm.org/citation.cfm?id=2367894
Breaking Simple-Captcha: https://deepmlblog.wordpress.com/2016/01/03/how-to-break-a-captcha-system/
Big opportunities for hackers lie in automating social engineering and turning spear-phishing into massive, automated campaigns: systems that automatically scrub the internet for personal data and learn from all that information to produce the ultimate message to trick a person into opening an attachment or clicking a malicious link. See for example SNAP_R.
New innovative technologies for automating cyber defense also mean hackers will find ways to leverage and abuse them for attacks. There has always been an imbalance between the success rates of attack and defense: the defense has to continuously plug all holes and vulnerabilities, while the offense only has to find a single vulnerability to succeed.
No CISO ever got decorated for stopping hundreds of attack attempts, but immediately gets blamed if a single attempt gets through the defenses. The same goes for applications of AI.
For defense there is zero tolerance for error, while the offense can work with an AI that spits out faulty results most of the time but by luck generates a single good output that results in a breach. See for example DeepHack.
Be proactive: preemptive blocking of active attackers.
Radware uses its ongoing attack knowledge to fine-tune the threat intelligence and catch the IPs that have been involved in actual DDoS attacks in the past 24 hours.
The idea is to stop attackers during reconnaissance and attack preparation (looking up open ports, deciding what to attack).
If you are difficult to crack in the reconnaissance stage, it is less likely that you will become a target: the difficulty is not worth it.
Diversion upon attack is avoided.
Data Correlation.
Since we are taking drastic measures and blocking IPs, we want to verify and correlate the data with all available sources, especially known IoT devices.
IoT devices, once identified, are less likely to change, and in most cases should not be accessing the network anyway.
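A minimal sketch of the rolling 24-hour blocklist behavior described in these notes; the class and method names are hypothetical, not Radware's API:

```python
import time

class ActiveAttackerList:
    """Hypothetical rolling blocklist: an IP observed attacking is
    blocked, and entries expire 24 hours after the last observation."""
    TTL = 24 * 3600  # 24-hour window, per the feed description

    def __init__(self):
        self._seen = {}  # ip -> last time it was observed attacking

    def report(self, ip, now=None):
        self._seen[ip] = now if now is not None else time.time()

    def is_blocked(self, ip, now=None):
        now = now if now is not None else time.time()
        last = self._seen.get(ip)
        return last is not None and now - last < self.TTL

feed = ActiveAttackerList()
feed.report("203.0.113.7", now=0)
feed.is_blocked("203.0.113.7", now=3600)       # True: attacked 1 hour ago
feed.is_blocked("203.0.113.7", now=25 * 3600)  # False: entry has expired
```

The expiry is what keeps such a feed "active": an IP stays blocked only while it keeps showing up in real attacks, which limits the collateral damage of drastic blocking.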
The question remains whether incremental advancements in deep learning, combined with adversarial studies, will ultimately lead to the next generation of fully automated cyber defensive solutions, or whether we need another breakthrough in machine learning and neural networks to achieve the ultimate goal of fully autonomous cyber defense.