Solving the Bigger Data Question in Cybersecurity
Nowadays, even the smallest company can generate huge sets of data. Fortunately for them, technology has
kept pace and, with the dawn of Big Data, we are now able to store and analyze vast amounts of digital
information (read our previous article on The Cybersecurity Hydra and its Big Data nemesis here). What we must
remember here is that, whereas this may appear to be a “Big Answer”, there is an even Bigger Question at stake.
Big Data is not about exploring and finding new sources of information, but rather it is about collecting and
unveiling what is already there, using newly found methods – much like a modern-day archaeologist. The
purpose: to extract Small Data in the form of valuable insights based on the interpretation of these very data
relics. Now, while all this sounds great in theory, we cannot help but ask ourselves: how do enterprises manage
to transfer oodles of data, within and between networks, in a secure manner? From where we stand, cybersecurity
experts are having a tough time monitoring it all and, as such, stealthy attacks go easily unnoticed. What do IT
execs do in this case? More often than not, they just hire more personnel. But what is one more person
spending his or her time reviewing false positives? We are not so sure about that approach. As threats become
increasingly sophisticated, the organizational environment continues to evolve and the cybersecurity talent gap
looms ever larger, so employing more staff may prove to be not only costly, but inefficient as well.
May the best robot win… or not
Having moved on from an “if/then” paradigm in the development of modern security solutions, machine learning
(ML) makes algorithm-based judgment calls that enable a system to act as referee in ‘similar to’
situations. It is much like switching between programming paradigms – from functional to
imperative, for instance. A functional approach involves composing the problem as a set of functions to be
executed, carefully defining the input to each function (the value returned is therefore entirely dependent on
the input). With an imperative approach (referred to as algorithmic programming) to problem solving, a
developer defines a sequence of steps/instructions that happen in order to accomplish the goal.
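The two paradigms can be sketched in a few lines of Python (a toy example of ours, not from the article): summing prices imperatively as a sequence of steps, versus functionally as a composed reduction whose result depends entirely on its input.

```python
from functools import reduce

# Imperative: a sequence of steps that mutate an accumulator.
def total_imperative(prices):
    total = 0.0
    for p in prices:
        total += p
    return total

# Functional: compose pure functions; the returned value depends
# entirely on the input, with no intermediate mutable state.
def total_functional(prices):
    return reduce(lambda acc, p: acc + p, prices, 0.0)

# Both paradigms reach the same goal by different routes.
assert total_imperative([1.0, 2.0, 3.0]) == total_functional([1.0, 2.0, 3.0])
```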
By definition a subset of Artificial Intelligence, machine learning can be supervised, unsupervised or
semi-supervised. As the names imply, each ML type involves a certain degree of involvement on the part
of the operator and demands a specific set of algorithms. Many voices say that, given how scarce experienced
professionals in cybersecurity are becoming, the goal should be to replace them altogether with a sort of
supreme Artificial Intelligence, capable of being omniscient and of rooting out all security threats – your typical
Man versus Machine dystopian scenario, where the All Powerful AI wins. Translating this from fiction to fact: the
world is waiting for that perfect unsupervised machine learning system, a system capable of knowing what we
want to know before we even know it. And that’s where we tend to disagree.
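To make the taxonomy concrete, here is a minimal, purely illustrative Python sketch built on invented “threat scores” (the numbers, names and thresholds are our assumptions, not part of the article): a supervised learner needs labeled samples to pick its decision boundary, while an unsupervised one must flag outliers with no labels at all.

```python
import statistics

# Supervised: labeled examples let us learn a decision boundary.
def fit_threshold(scores, labels):
    """Midpoint between the mean benign and mean malicious score."""
    benign = [s for s, y in zip(scores, labels) if y == 0]
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    return (statistics.mean(benign) + statistics.mean(malicious)) / 2

# Unsupervised: no labels at all -- flag whatever sits far from the crowd.
def zscore_outliers(scores, cutoff=2.0):
    mu = statistics.mean(scores)
    sigma = statistics.stdev(scores)
    return [s for s in scores if abs(s - mu) / sigma > cutoff]
```

A semi-supervised system would sit in between, bootstrapping from a small labeled set and refining itself on unlabeled traffic.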
Even as more and more robots and AIs become better than humans at some jobs (find out “what are the 10
jobs robots already do better than you” here), cybersecurity is not your average occupation. While machine
learning is awesome (there’s really no other word for it) and companies such as Facebook and Netflix have hit
the jackpot with it, the issue is not the same when it comes to IT security. We neither want to be able to tag
our photos better nor to receive more movie suggestions. In cybersecurity, we need to be able to detect
unknown threats despite weak signals and to reduce this detection time to almost real-time – all aspects in
which unsupervised machine learning does not excel. Leaving all decisions up to an ML-powered system will, therefore, not be enough.
Machine Learning: the Jarvis to your Iron Man
If neither the machine nor the man can fight alone against cyber-threats, why not combine forces? The goal
shouldn’t be to replace humans with AI, nor to leave it all to the AI. If we were to look for inspiration elsewhere –
let’s say the Marvel universe – the best superheroes are those whose powers have been enhanced by some
not-so-realistic gadget. While machine learning is far from perfect, it has the potential to be a true
side-kick for the expert analyst – the real-life (realistic) equivalent of JARVIS, Tony Stark’s artificially intelligent
computer. JARVIS (Just A Rather Very Intelligent System), just like ML, warns of potential dangers and dismisses
them once the call is made by its user, improving its distinction between normal and malicious behaviors over
time. Integrated in the Iron Man armor and Stark’s home defenses, it is the perfect metaphor for the
human/AI symbiosis we should aspire to.
So where do you start? Well, first, for a more dramatic effect, put your Iron Man suit on. Then, try pinpointing
the issue. Do you just need to detect compromised users? Or do you suspect you’ve been or you will be
attacked? Either way, a specific use case needs to be developed. From there on, the data required to solve the
problem needs to be identified. If you’re after advanced persistent threats, then look for information regarding
the existing security and network infrastructure. Be sure to combine multiple sources (not necessarily more, just
diverse) to get a 360° view of your user activity. If your machine learning analytics are multi-dimensional, you
should be able to catch malware early in the kill-chain, spotting anomalies such as privilege escalation, lateral
movement, data exfiltration, etc.
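As a rough Python illustration of combining diverse sources into one user-centric view (the log formats, field names and thresholds below are all hypothetical): an off-hours login alone or heavy outbound traffic alone may be benign, but seen together across dimensions they start to look like exfiltration.

```python
from collections import defaultdict

# Two hypothetical, diverse log sources covering the same users.
auth_log = [("alice", 9), ("bob", 3), ("alice", 10), ("bob", 2)]  # (user, login hour)
net_log = [("alice", 1_200), ("bob", 750_000_000)]                # (user, bytes out)

# Merge them into a single per-user activity profile.
profile = defaultdict(dict)
for user, hour in auth_log:
    profile[user].setdefault("login_hours", []).append(hour)
for user, bytes_out in net_log:
    profile[user]["bytes_out"] = bytes_out

# Multi-dimensional check: off-hours logins AND unusually heavy
# egress together suggest possible data exfiltration.
def suspicious(p):
    off_hours = any(h < 6 for h in p.get("login_hours", []))
    heavy_egress = p.get("bytes_out", 0) > 100_000_000
    return off_hours and heavy_egress

flagged = [user for user, p in profile.items() if suspicious(p)]
```

Neither source on its own would have flagged anything; the signal only appears once the views are combined.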
Finally, be patient. Since the core task of machine learning is to replicate and predict, it takes time. The system
needs to gather enough data and feed it to its behavior analysis engines in order to achieve an accurate
classification between normal and abnormal behaviors. Starting with a training set – a sample of good code and
one of bad code – ML filters them with the help of statistical algorithms and, through multiple iterations, slowly
learns to distinguish between the two. We say “slowly”, but it’s actually incredibly fast compared to past
technologies: known threats are identified almost instantly with the help of existing knowledge bases, while in
the case of unknown threats it’s a matter of days (1 week with Reveelium, read our article here). But remember
– there are some behaviors that we do not yet know and, as such, we cannot teach them to the system. Also,
while malware can be predicted this way with a high degree of probability, it is still the human in the Iron Man
suit that has the final say in the matter.
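The iterative learning described above can be caricatured with a one-feature perceptron in Python (the training data is invented, and real behavior-analysis engines are far more elaborate): each pass over the labeled samples nudges the weights a little, until the model separates “good” from “bad”.

```python
# Toy training set: feature = fraction of suspicious API calls in a
# sample, label 1 = malicious, 0 = benign (all values hypothetical).
train = [(0.05, 0), (0.10, 0), (0.80, 1), (0.90, 1), (0.15, 0), (0.70, 1)]

# A minimal perceptron: repeated passes slowly adjust the weight and
# bias, but only when the current model misclassifies a sample.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(100):              # multiple iterations over the data
    for x, y in train:
        pred = 1 if w * x + b > 0 else 0
        w += lr * (y - pred) * x  # no update when pred == y
        b += lr * (y - pred)

def classify(x):
    return 1 if w * x + b > 0 else 0
```

Even this toy needs several passes before it stops making mistakes, which is the point: a real system must first gather enough data before its classifications become trustworthy.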
Link:
https://www.reveelium.com/en/big-question-in-cybersecurity/