1
©2021 Check Point Software Technologies Ltd.
[Protected] Distribution or modification is subject to approval ​
Peter Elmer | Security Expert, EMEA | Office of the CTO
May 2021
The value of Machine Learning
in Cyber Security
DATA DRIVEN SECURITY
2
• Need for Data Driven Security
• Methods used
• Value of Machine Learning powered by human experience
• Effectiveness of Data Driven Security
Today we look at …
3
Collaboration
Intelligence
Experience
Key Ingredients For Success
Check Point Software Technologies
Founded in 1993, about 5,400 employees
Securing more than 100,000 customers
27 Years
4
“Important decision
points are taken by
machines with logic
created from data.”
Check Point, Data Scientists Team
October 2020
5
Predicting Results Using Machine Learning
Humans deciding on features and labels
oval round
smooth surface undulating surface
sweet sour
‘for pie’ ‘for wine’
Data remains
Data destroyed
Human experience is key when
assigning characteristics (features)
for predicting a result (label)
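The idea of humans choosing features and labels can be sketched in a few lines. This is a minimal illustration, not the method used by the presenter: the numeric encoding of the three features, the four training samples and the nearest-neighbour rule are all assumptions, and the second label is read here as ‘for wine’.

```python
# Hypothetical encoding of the three features named on the slide:
# shape (0 = oval, 1 = round), surface (0 = smooth, 1 = undulating),
# taste (0 = sweet, 1 = sour). The samples below are made up.
TRAINING = [
    ((0, 0, 0), "for pie"),
    ((0, 0, 0), "for pie"),
    ((1, 1, 1), "for wine"),
    ((1, 1, 1), "for wine"),
]


def hamming(a, b):
    """Count the features on which two samples disagree."""
    return sum(x != y for x, y in zip(a, b))


def predict(features, training=TRAINING):
    """Return the label of the closest training sample (1-nearest neighbour)."""
    nearest = min(training, key=lambda item: hamming(item[0], features))
    return nearest[1]


# A round, undulating but sweet sample is closer to the second group.
print(predict((1, 1, 0)))  # prints: for wine
```

The prediction depends entirely on which features a human decided to record, which is the point the slide makes.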
6
Predicting?
7
Logic Created From Data
Computer Logic
Data
Program
Deterministic result
Humans decide on the best logic to achieve a result before ‘feeding’ the machine
Context Assumptions Conceptions
Machine Learning Algorithm
Data
Result
Characteristics of data (features) and historic results (labels) are presented to the machine
Program / Model
Logic
Program / Model
Logic
New Data Probabilistic result
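The contrast between the two approaches can be sketched as follows. Everything here is hypothetical: the size-based rule, the "packed"/"unpacked" feature and the sample history are invented for illustration, and the "model" is nothing more than label frequencies per feature value.

```python
from collections import Counter, defaultdict


def rule_based(file_size_kb):
    """Computer logic: a human wrote the rule, so the result is deterministic."""
    return "malicious" if file_size_kb < 10 else "benign"  # hypothetical rule


def train(samples):
    """Machine learning: derive the logic (label counts per feature) from data."""
    counts = defaultdict(Counter)
    for feature, label in samples:
        counts[feature][label] += 1
    return counts


def predict_proba(model, feature, label):
    """Probabilistic result: share of historic samples with this label."""
    seen = model.get(feature, Counter())
    total = sum(seen.values())
    # Assumption: with no history for a feature value, return 0.5 ("don't know").
    return seen[label] / total if total else 0.5


# Made-up history of (feature, label) pairs.
HISTORY = [("packed", "malicious")] * 3 + [("packed", "benign")] + [("unpacked", "benign")] * 4

model = train(HISTORY)
print(rule_based(5))                                 # prints: malicious
print(predict_proba(model, "packed", "malicious"))   # prints: 0.75
```

The rule always gives the same verdict for the same input, while the learned model returns a confidence that shifts as more labelled data is added.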
8
Probabilistic results?
9
Probabilistic
Deterministic
10
Vectorising
11
Feeding more data
into the machine
increases accuracy
12
Limited resources
Increasing
attack surface
13
Attacking Is Easier Than Defending
Surface
• Intent
• Idea
• Plan
• Design Logic
• Source Code
• Compile
• Stream of bits
Process
Effort for defending
Effort for
defending
14
Understanding
Intent
Optimizing
Resources
15
8 : 1
Applying Machine Learning requires
eight times fewer resources than preparing the data
16
Mathematical
Representation
Abstraction
17
• An image of 224x224 RGB
is transformed by filters
becoming a number
• Convolutional filters
capture 3x3 pixels to
capture notion of ...
• right/left
• up/down
• center
• Accuracy of 92.7% (top-5, ImageNet)
Changing Representation
Turning an image into a number – VGG16 Convolutional Network
Source: Neurohive – VGG16 Convolutional Network for Classification and Detection:
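How a 3x3 filter captures a notion such as right/left or up/down can be shown with a toy example. This is only a sketch: in a network like VGG16 the filter weights are learned during training, whereas the two kernels and the 4x4 image below are hand-picked assumptions. As in most CNN implementations, the operation is a cross-correlation (the kernel is not flipped).

```python
def convolve3x3(image, kernel):
    """Slide a 3x3 kernel over a 2-D image (no padding, stride 1)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        row = []
        for c in range(cols - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Toy image: dark left half, bright right half (a vertical edge).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernels, for illustration only.
LEFT_RIGHT = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]   # responds to left/right change
UP_DOWN = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]      # responds to up/down change

print(convolve3x3(IMAGE, LEFT_RIGHT))  # prints: [[3, 3], [3, 3]]
print(convolve3x3(IMAGE, UP_DOWN))     # prints: [[0, 0], [0, 0]]
```

The left/right filter reacts strongly to the vertical edge while the up/down filter stays silent, so each filter turns a patch of pixels into a number that describes one property.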
18
• Training a VGG16 with
photos from Cityscapes
• Enhancing realism of
animation
• Eliminating artefacts
Changing Representation
Turning an image into a number – VGG16 Convolutional Network
Source: Intel - Enhancing Photorealism Enhancement, May 2021
19
Vectorising Elements – Example: Human Language
Describing meaning / intent to achieve an abstraction level
King
Queen
Man
Woman
Masculinity Femininity
Vectorising words allows ‘word algebra’ - Algebra allows Machine Learning
swimming
swam
walking
walked
Verb tense
Vectors represent the abstraction level
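‘Word algebra’ can be tried out with toy vectors. The two-dimensional values below are invented so that one axis loosely stands for royalty and the other for gender; real word embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Hypothetical 2-D vectors: axis 0 ~ royalty, axis 1 ~ gender.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "prince": (0.7, 0.9),
    "princess": (0.7, -0.9),
}


def cosine(a, b):
    """Cosine of the angle between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman"))  # prints: queen
```

Because words are now vectors, subtraction and addition have a meaning, and that is what allows machine learning algorithms to operate on them.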
20
Vectorising Elements – Natural Language Processing (NLP)
Describing meaning / intent to achieve an abstraction level
“NLP is a subfield of computer science and artificial intelligence
concerned with interactions between computers and human (natural) languages.
It is used to apply machine learning algorithms to text and speech.”
Source: towards data science
21
Vectorising Elements – Why is NLP useful?
Describing meaning / intent to achieve an abstraction level
Pineapples
We know ‘Pineapples are spikey and yellow’
are
spikey
and
yellow
Input Projection Output
‘Give me the missing word’
Pineapples
are
spikey
and
yellow
Input Projection Output
‘Give me the context’
Reference: Tomas Mikolov et al. : Distributed Representations of Words and Phrases and their Compositionality
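The two questions on this slide correspond to two ways of building training examples, commonly called CBOW (‘give me the missing word’) and skip-gram (‘give me the context’) in the word2vec literature. The sketch below only generates the training pairs; the window size of 2 is an assumption and no network is trained.

```python
def cbow_pairs(tokens, window=2):
    """(context words, target word): 'give me the missing word'."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """(centre word, one context word): 'give me the context'."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


SENTENCE = "pineapples are spikey and yellow".split()

print(cbow_pairs(SENTENCE)[2])
# prints: (['pineapples', 'are', 'and', 'yellow'], 'spikey')
print([p for p in skipgram_pairs(SENTENCE) if p[0] == "spikey"])
# prints: [('spikey', 'pineapples'), ('spikey', 'are'), ('spikey', 'and'), ('spikey', 'yellow')]
```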
22
Understanding
what is making
something different
How can we apply this
to Cyber Security?
23
Vectorising Elements – Cyber Security
Applying NLP when Sandboxing executables
Observing API calls performed against the operating system
API calls are language and can be vectorised
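Treating API calls as words can be sketched with a simple count vector per sandbox trace. The traces below are made up (the function names are ordinary Windows API calls, the sequences are not taken from any real sample), and a plain bag-of-words count is an assumption; the source does not say which vectorisation is used.

```python
from collections import Counter


def build_vocabulary(traces):
    """Sorted list of every distinct API call seen in any trace."""
    return sorted({call for trace in traces for call in trace})


def vectorise(trace, vocabulary):
    """Count how often each vocabulary entry occurs in one trace."""
    counts = Counter(trace)
    return [counts[call] for call in vocabulary]


# Hypothetical traces recorded while running two executables in a sandbox.
TRACES = [
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"],
]

vocabulary = build_vocabulary(TRACES)
print(vocabulary)
print(vectorise(TRACES[0], vocabulary))  # prints: [1, 1, 0, 1, 0, 0]
```

Once each trace is a vector of equal length, the same algorithms used for text can compare and classify executables by their behaviour.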
24
Vectorising Elements – Cyber Security
Applying TF-IDF when disassembling OPCODES
Borrowing TF-IDF algorithm from word document analysis
Source: http://filotechnologia.blogspot.com/2014/01/a-simple-java-class-for-tfidf-scoring.html
“TF-IDF is an information retrieval and information extraction subtask which
aims to express the importance of a word to a document which is part of a
collection of documents which we usually name a corpus. ”
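Applied to disassembled code, each sample plays the role of a document and each opcode the role of a word. The sketch below uses one common, unsmoothed variant (term frequency = count / length, inverse document frequency = ln(N / df)); the opcode lists are invented and the exact formula used in production is not stated in the source.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical opcode sequences from three disassembled samples.
SAMPLES = [
    ["push", "mov", "call", "ret"],
    ["push", "mov", "xor", "xor"],
    ["push", "mov", "jmp", "ret"],
]

scores = tf_idf(SAMPLES)
print(scores[1]["push"])          # prints: 0.0 (appears in every sample)
print(round(scores[1]["xor"], 3)) # prints: 0.549 (frequent here, rare elsewhere)
```

Opcodes that occur everywhere score zero, while opcodes that are unusually frequent in one sample stand out, which is what makes the score useful as a fingerprint.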
25
Vectorising Elements – Cyber Security
Decoded machine language
Machine code has sequence – sequence has meaning
26
• An executable file is fed into a neural network
• Each ‘filter’ performs a mathematical operation
on a sliding patch
Changing Representation
Turning an executable file into vectors – VGG16 Convolutional Network
Source: Check Point, Data Scientists Team, October 2020
Original Convolved
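The sliding-patch idea carries over from images to files. As a simplified sketch, the bytes of a file can be treated as a one-dimensional signal with a small filter sliding along it; the 1-D layout, the kernel values and the sample bytes are assumptions made for brevity, and the actual network architecture is not described in the source.

```python
def slide_filter(data: bytes, kernel):
    """Apply a 1-D filter to every window of len(kernel) consecutive bytes."""
    width = len(kernel)
    return [
        sum(data[i + k] * kernel[k] for k in range(width))
        for i in range(len(data) - width + 1)
    ]


# Made-up bytes: a run of zero padding followed by a run of 0xFF.
DATA = bytes([0, 0, 0, 255, 255, 255])
KERNEL = (-1, 0, 1)  # hand-picked; responds where the byte values change

print(slide_filter(DATA, KERNEL))  # prints: [0, 255, 255, 0]
```

The output is non-zero only where the content changes, so the filter has turned raw bytes into a vector that highlights structure.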
27
Machine Learning In Cyber Security
Preventing Unknown Attacks
[Diagram, EXE files] Understanding Entropy & Structure · Disassembling · URL Verification · Finding Similarities · File/Registry · Classification using provided Meta Data · Verdict, Meta Data
[Diagram, PDF / PPT / DOC / XLS files] PDF Analyzer · URL Verification · Macro Analyzer · Classification using provided Meta Data · Verdict, Meta Data
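One of the features named for executables is entropy. A minimal sketch of Shannon entropy over the bytes of a file is shown below; values close to 8 bits per byte often hint at compressed or encrypted content, although how the score feeds into the classification is not described in the source.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


print(shannon_entropy(b"\x00" * 64))      # prints: 0.0
print(shannon_entropy(bytes(range(256)))) # prints: 8.0
```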
28
Machine Learning In Cyber Security
Preventing Unknown Attacks
On July 20th 2020 a sample was labeled malicious by our machine learning logic
29
Machine Learning In Cyber Security
Preventing Unknown Attacks
On July 24th 2020 only 45 out of 73 engines on VirusTotal labeled it malicious
Four days later!
30
Machine Learning In Cyber Security
Sharing experience
Source: https://research.checkpoint.com/category/how-to-guides/
31
Machine Learning In Cyber Security
‘Malware DNA’ based clustering applying TF-IDF
Two-dimensional representation of
the 300,000-dimensional space
representing the ‘world of malware’
in Check Point Threat Intelligence
Colors representing labels of
malware families
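Clustering in such a high-dimensional space needs a measure of how close two samples are. One common choice for TF-IDF vectors is cosine similarity over sparse dictionaries, sketched below; the source does not say which distance, clustering or projection method is used, and the three vectors are invented.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical TF-IDF weights for three samples.
SAMPLE_A = {"xor": 0.5, "jmp": 0.1}
SAMPLE_B = {"xor": 0.4, "jmp": 0.2}
SAMPLE_C = {"call": 0.3}

print(cosine_similarity(SAMPLE_A, SAMPLE_B) > cosine_similarity(SAMPLE_A, SAMPLE_C))  # prints: True
```

Samples with similar weights end up close together and would share a cluster, while samples with no terms in common have a similarity of zero.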
32
Itay Cohen (Check Point) and Omri Ben Bassat (Intezer) mapped out an ecosystem
Results:
• Classification into 60 families
and 200 modules
• 22,000 connections between
analyzed samples
• Different Actors don’t share code
Access the interactive map
• Published as open source
Download the detector tool
• Defend and contribute
Map based on Fruchterman-Reingold algorithm
Read the full report:
Machine Learning In Cyber Security
‘Malware DNA’ applied to uncover an APT Ecosystem
33
Machine Learning In Cyber Security
Sharing experience
Understand how vulnerable on-premises and
cloud environments are
Source: https://research.checkpoint.com/2021/deep-into-the-sunburst-attack/
Understanding the SolarWinds Orion Platform Security Advisory
16 December 2020, video, https://community.checkpoint.com/
34
Machine Learning In Cyber Security
The need for defense
BBC article about Colonial Pipeline attack, May 2021
Source: https://www.bbc.com/news/business-57050690
Source: Check Point, Research Blog, May 2021
Update 17th May 2021: DarkSide is offline - https://krebsonsecurity.com/
35
Understanding
the DNA of a
malware allows
attributing ‘family’
characteristics
36
Knowing the ‘family’
…allows applying
tools for defense
…allows saving
resources
37
What’s next?
38
Machine Learning – General Purpose
Comparing NLP-Trained Models
Over 300 apps are using GPT-3
https://openai.com/blog/gpt-3-apps/
GPT-3 API access is controlled
https://openai.com/blog/openai-api/
28th May 2020
14 Apps using GPT-3
39
Machine Learning Empowers Threat Prevention
Every input for Threat Intelligence becomes a Label
More than 27 years of experience …
• Having access to data
• Knowing the labels
• Selecting the right features
• Creating ML algorithms
• ML empowers Threat Prevention
[Figure: data samples with their labels (‘This is …’), described by Feature1: form and Feature2: colour, passed on to the next module]
40
Machine Learning Empowers Threat Prevention
The infinity cycle of learning
Incumbent
New DATA
Labeling
Training
Stand by
evaluation
Decision point
Federated Learning
Using encrypted customer data
Supervised by human expertise
Measuring
Unseen data
Adjusting weights
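The decision point between the incumbent model and a newly trained stand-by model can be sketched as a simple comparison on unseen data. The two threshold "models", the samples and the promotion rule below are all hypothetical; the source does not describe the actual metric or promotion criteria.

```python
def accuracy(model, samples):
    """Share of (features, label) samples the model labels correctly."""
    hits = sum(1 for features, label in samples if model(features) == label)
    return hits / len(samples)


def decision_point(incumbent, standby, unseen, margin=0.0):
    """Promote the stand-by model only if it beats the incumbent on unseen data."""
    if accuracy(standby, unseen) > accuracy(incumbent, unseen) + margin:
        return standby
    return incumbent


# Made-up unseen data: a single numeric feature and its true label.
UNSEEN = [(1, "malicious"), (5, "benign"), (3, "malicious"), (8, "benign")]


def incumbent(x):
    return "malicious" if x < 2 else "benign"


def standby(x):
    return "malicious" if x < 4 else "benign"


print(accuracy(incumbent, UNSEEN))  # prints: 0.75
print(accuracy(standby, UNSEEN))    # prints: 1.0
print(decision_point(incumbent, standby, UNSEEN) is standby)  # prints: True
```

Repeating this comparison each time new labelled data arrives gives the cycle its shape: train, evaluate in stand-by, decide, and start again.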
41
The infinity cycle of learning is powered by us
42
Peter Elmer | Security Expert, EMEA | Office of the CTO
pelmer@checkpoint.com, May 2021
THANK YOU

stackconf 2021 | Data Driven Security
