Detection index learning based on cyber threat intelligence and its application by Tsuyoshi Taniguchi

Copyright 2017 FUJITSU SYSTEM INTEGRATION LABORATORIES LIMITED
Indicator learning based on cyber threat
intelligence and its application Overview
〜 Searching treasures from a vast amount of threat
information 〜
CODE BLUE Day0 - Special Track
Counter Cyber Crime Track
(November 8, 2017)
FUJITSU SYSTEM INTEGRATION LABORATORIES LTD.
Tsuyoshi TANIGUCHI

Treasures buried in a vast amount of threat
information

Cyber Threat Intelligence
Cyber Threat Intelligence: CTI
A report that is created to share
knowledge on a particular threat

The traditional CTI: Shared by text
For a cyberattack called ○○, the involvement
of an attacker named △△ is strongly
suspected. As the method of attack, malware
called □□ connecting to C&C server with IP
xx.xx.xx.xx has been observed.

Next CTI: Readable by machines
<tag threat-name> ○○ </threat-name>
<tag attacker> △△ </attacker>
<tag attack-method> □□ </attack-method>
<tag ip> xx.xx.xx.xx </ip>

STIX (Structured Threat Information eXpression) Format
• One of the CTI standards
• Consists of 8 information groups
IPA's outline of STIX https://www.ipa.go.jp/security/vuln/STIX.html
• Intent of cyber attack activities
• Indicators to detect attacks
• Events observed in attacks
• Behaviors and methods of cyber attackers
• Incidents
• People/organizations involved in cyber attacks
• Vulnerabilities of targeted software, systems, and configurations
• Countermeasures against threats

Issues to work on
Analysts have too many CTIs to analyze
Sharing of CTI is encouraged through AIS (Automated Indicator Sharing)
A vast amount of CTI could turn into garbage

Motivation
To help analysts,
find special CTIs (treasures) that
describe attackers
from a vast amount of CTIs
(garbage)

Image of searching treasures from CTIs
[Figure: real-time type CTI sources, analysis type CTI sources, and other sources feed a CTI platform, from which treasures (special CTIs) are picked out]

Indicators
• Indicators: elements of CTIs used to detect attacks
• Types of indicators
  • IP address ← Target
  • Domain ← Target
  • Host
  • E-mail
  • URL
  • Hash: MD5, SHA1, SHA256, PEHASH, IMPHASH
  • …
• IP xxx.xxx.xxx.xxx
• IP yyy.yyy.yyy.yyy
• IP zzz.zzz.zzz.zzz
Unidentified (New)
Continued use
Reuse

Most indicators (attack infrastructure) are used just once
More than 80%: used just once
My research focuses on this part

Hypothesis of my research
Indicators on CTI show the attackers' footprints
Classify the indicators into the following 3 categories
Disposable (used just once)
Long life
Reuse
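
The three categories can be sketched as a small rule-based classifier. The slides do not give exact definitions or thresholds, so everything below is an assumption made for illustration: an indicator seen on a single date is treated as disposable, one that comes back after a quiet gap longer than `max_gap_days` as reuse, and anything else as long life. The function name `classify_indicator` and the 28-day default are hypothetical.

```python
from datetime import date


def classify_indicator(sightings, max_gap_days=28):
    """Classify one indicator from the dates of the CTIs that mention it.

    Assumed rules (not taken from the slides):
      - a single sighting date        -> "disposable"
      - a gap longer than max_gap_days -> "reuse"
      - otherwise                      -> "long life"
    """
    days = sorted(set(sightings))
    if not days:
        raise ValueError("at least one sighting is required")
    if len(days) == 1:
        return "disposable"
    # Gaps (in days) between consecutive sightings.
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    if max(gaps) > max_gap_days:
        return "reuse"
    return "long life"


once = [date(2017, 2, 1)]
weekly = [date(2017, 2, 1), date(2017, 2, 8), date(2017, 2, 15)]
came_back = [date(2017, 2, 1), date(2017, 6, 1)]
labels = [classify_indicator(s) for s in (once, weekly, came_back)]
```

A real classifier would likely tune the gap threshold per malware family, since the talk stresses that CTI data is heavily biased.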

Image of how to use the result of indicator learning
• Real-time type CTI sources → black list (detection list): a vast amount of (unidentified) real-time indicators; most of them vanish soon, but they still need to be dealt with
• Analysis type CTI sources → CTI platform → Indicator DB: special IPs and domains; extra handling, more analysis

Prior notice for indicator learning based on CTI
It's not a talk about deep learning or
clustering

1. Treasures buried in a vast amount of threat
information
CTI
STIX
Garbage
Treasures

Contents of the treasures

Real example 1 (1/2): Spam mails
Hi xxxxxx,
Congratulations!
You have access to your free
trading cash!
The money is sitting and waiting
in your account now.
Access Here Now
Thanks again
Dennis Mcclain
http://sectorservices[.]com[.]br/components/com_tz_portfolio/views/gallery/tmpl/
187.17.111[.]105
DNS

Real example 1 (2/2): Usage of indicator learning
Indicator DB
187.17.111[.]105

Real example 2 (1/2): Kelihos botnet
Life-span of botnet indicators (IP addresses) of the Kelihos botnet in 2015 (timeline from Jan to Dec)
• 11 (of 39,937) lived for more than 46 weeks
• 97.5% vanished within 4 weeks
xx.xx.xx.41: 4/13 - 4/14
xx.xx.xx.42: 3/16
xx.xx.xx.46: 3/28 - 6/19
xx.xx.xx.47: 3/8 - 3/13
xx.xx.xx.48: 5/21 - 5/22
xx.xx.xx.51: 5/1 - 6/14

Treasures are buried

Real example 3: Estimation of attack trends
Long life type → Downloader
Disposable type → Botnet, DGA, etc.

Real example 4 (1/2): Monitoring IP addresses that could potentially be used for malicious activities
[Figure: timeline over 2014, 2015, 2016, and at present for GameOverZeus, Sality, CryptoWall, Tinba, DGA]

Real example 4 (2/2): Verification using passive DNS services
• PassiveTotal by RiskIQ
Learning period based on CTI
LOCKY spam, June 2016
4 (3rd) → 19 (4th) → 209 (5th)
398 (20th) → 573 (21st) → 584 (22nd)

2. Contents of the treasures
Long-life indicators
Attack trends
Proactive defenses

The way of searching treasures

CTI (indicators on CTIs) is a collection of biased data
• The trouble with learning CTI indicators: a mass of bias
  • In machine learning, statistical information of learning data is to be applied for future...
• Unbalanced number of CTIs depending on specific malware (campaign)
  • Ex. WannaCry, Petya, Bad Rabbit
• Bias in the quality of indicators
  • Most indicators are new (unidentified) or related to only a part of a vast amount of CTIs
• Bias (difference) in the quality of attacks
  • Botnet (distribution, indiscriminate type) or APT (targeted)

Indicator learning
Simply applying standard algorithms is not enough
Majority: used just once
Booms: botnets etc. use and then dispose of a lot of indicators
Classification/Identification: most indicators can identify malware
Searching treasures: comes down to the problem of revealing rare patterns (treasures)
Treasures cannot be found by blindly searching all the CTIs

Structure of indicator learning
CTI data source 1, CTI data source 2, CTI data source 3 → Preprocessing → Subgroup 1, Subgroup 2, ⋯, Subgroup i → Indicator learning → Indicator DB

Preprocessing
Basically assume the STIX format
and use an XML parser
<stix:STIX_Package …>
<stix:STIX_Header>
…
</stix:STIX_Header>
<stix:Observables…>
…
<cybox:Title> IP addresses </cybox:Title>
…
<AddressObj:Address_Value> xxx.xxx.xxx.xxx </AddressObj:Address_Value>
…
<cybox:Title>Cerber IP addresses </cybox:Title>
…
<AddressObj:Address_Value> yyy.yyy.yyy.yyy </AddressObj:Address_Value>
…
</stix:Observables>
<stix:STIX_TTPs>
…
<ttp:Title> … </ttp:Title>
…
</stix:STIX_TTPs>
<stix:Campaigns>
…
<campaign:Title> Campaign1 </campaign:Title>
…
</stix:Campaigns>
…
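
The preprocessing step above could look roughly like the following sketch, which pulls each observable's title and its address values out of a STIX 1.x style package with Python's standard `xml.etree.ElementTree`. Elements are matched by local name so the code does not depend on exact namespace URIs. The sample document, the namespace URIs in it, the nesting under `cybox:Observable`, and the helper names are assumptions for illustration; the addresses are documentation-range placeholders, not real indicators.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed STIX 1.x style package (not a real CTI).
SAMPLE = """<stix:STIX_Package
    xmlns:stix="http://stix.mitre.org/stix-1"
    xmlns:cybox="http://cybox.mitre.org/cybox-2"
    xmlns:AddressObj="http://cybox.mitre.org/objects#AddressObject-2">
  <stix:Observables>
    <cybox:Observable>
      <cybox:Title>IP addresses</cybox:Title>
      <cybox:Object><cybox:Properties>
        <AddressObj:Address_Value>192.0.2.1</AddressObj:Address_Value>
      </cybox:Properties></cybox:Object>
    </cybox:Observable>
    <cybox:Observable>
      <cybox:Title>Cerber IP addresses</cybox:Title>
      <cybox:Object><cybox:Properties>
        <AddressObj:Address_Value>192.0.2.10</AddressObj:Address_Value>
      </cybox:Properties></cybox:Object>
    </cybox:Observable>
  </stix:Observables>
</stix:STIX_Package>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_addresses(xml_text):
    """Return (observable title, address value) pairs in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for obs in root.iter():
        if local_name(obs.tag) != "Observable":
            continue
        title, values = "", []
        for node in obs.iter():
            name = local_name(node.tag)
            if name == "Title" and node.text:
                title = node.text.strip()
            elif name == "Address_Value" and node.text:
                values.append(node.text.strip())
        pairs.extend((title, value) for value in values)
    return pairs


pairs = extract_addresses(SAMPLE)
```

Nested observables (for example observable compositions) would be visited more than once by this simple walk, so a production parser would need to handle that case explicitly.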

Sub-grouping CTIs
CTI data source 1, CTI data source 2, CTI data source 3 → Preprocessing → subgroups
Each subgroup keeps a timeline of CTIs, and each CTI lists its indicators (IP 1-1, IP 1-2, Domain 1-1, ⋯; IP 2-1, IP 2-2, Domain 2-1, ⋯; IP i-1, IP i-2, Domain i-1, ⋯)
• Subgroup1 - GOZ
• Subgroup2 - Upatre
• Subgroup3 - Kelihos
• Subgroup4 - Pony
• GameOverZeus, Upatre, Kelihos, Pony, Locky, Domain Generation Algorithm, Dridex, DyreTrojan, Cryptowall, Sality, Tinba, Torrent, KOL, Madness, APT28, APT10, Fallout, Lazarus, WannaCry, Petya
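
As a rough sketch of the sub-grouping step, the snippet below groups preprocessed records by malware or campaign name and orders each subgroup's CTIs on a timeline. The record layout `(report_date, subgroup, indicator)`, the function name `build_subgroups`, and the sample values are assumptions for illustration; the addresses and domain are documentation placeholders.

```python
from collections import defaultdict
from datetime import date


def build_subgroups(records):
    """Group (report_date, subgroup, indicator) records into timelines.

    Returns {subgroup: [(report_date, [indicators...]), ...]} with the
    entries of each subgroup sorted by date.
    """
    by_group = defaultdict(lambda: defaultdict(set))
    for report_date, subgroup, indicator in records:
        by_group[subgroup][report_date].add(indicator)
    return {
        subgroup: sorted((day, sorted(inds)) for day, inds in by_date.items())
        for subgroup, by_date in by_group.items()
    }


records = [
    (date(2017, 2, 8), "Kelihos", "192.0.2.1"),
    (date(2017, 2, 1), "Kelihos", "192.0.2.2"),
    (date(2017, 2, 1), "Kelihos", "192.0.2.1"),
    (date(2017, 2, 1), "Pony", "bad.example.com"),
]
subgroups = build_subgroups(records)
```

Keeping one timeline per subgroup is what later allows life-spans to be measured within a subgroup and indicators to be compared across subgroups.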

Learning life-span of indicators
How long should an indicator from CTIs be kept?
CTIs related to a specific malware:
• CTI at 2/1: IP 1, IP 2
• CTI at 2/8: IP 1, IP 3
• CTI at 2/15: IP 1, IP 4
• CTI at 2/22: IP 1
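
A minimal sketch of measuring life-spans, using the weekly example on this slide read as: IP 1 appears in all four CTIs, while IP 2, IP 3, and IP 4 each appear once. The year (2017), the function name `life_spans`, and the choice to measure the span as last sighting minus first sighting are assumptions, not details given in the talk.

```python
from datetime import date


def life_spans(ctis):
    """Map each indicator to (first_seen, last_seen, span_in_days).

    `ctis` is an iterable of (report_date, indicators) pairs that all
    belong to the same malware subgroup.
    """
    seen = {}
    for report_date, indicators in ctis:
        for indicator in indicators:
            first, last = seen.get(indicator, (report_date, report_date))
            seen[indicator] = (min(first, report_date), max(last, report_date))
    return {
        indicator: (first, last, (last - first).days)
        for indicator, (first, last) in seen.items()
    }


YEAR = 2017  # hypothetical; the slide only gives month/day
ctis = [
    (date(YEAR, 2, 1), ["IP 1", "IP 2"]),
    (date(YEAR, 2, 8), ["IP 1", "IP 3"]),
    (date(YEAR, 2, 15), ["IP 1", "IP 4"]),
    (date(YEAR, 2, 22), ["IP 1"]),
]
spans = life_spans(ctis)
```

Under these assumptions IP 1 has a span of 21 days and the other three have a span of 0 days, which mirrors the split between long-life and disposable indicators.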


Weighting indicators
• Compare IP addresses and domains between multiple subgroups
  • Contrast Set Mining [Bay et al. 2001]
  • Emerging Patterns [Dong and Li 1999]
[Figure: itemset A (IP, domain) appears in DB 1 but has no appearance in DB 2 (DBs: malware, campaign), so itemset A makes it possible to identify DB 1]
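
To make the comparison concrete, here is a simplified sketch in the spirit of emerging patterns: the growth rate of an indicator is its support in a target subgroup divided by its support in another subgroup, and it is infinite when the indicator never appears in the other subgroup. This handles single indicators only, not general itemsets, and it is not presented as the speaker's implementation. The function names and sample data are hypothetical.

```python
import math


def support(indicator, reports):
    """Fraction of reports (sets of indicators) that contain the indicator."""
    if not reports:
        return 0.0
    return sum(indicator in report for report in reports) / len(reports)


def growth_rate(indicator, target_reports, other_reports):
    """Support in the target subgroup relative to the other subgroup.

    Returns 0.0 if the indicator appears in neither subgroup and
    math.inf if it appears only in the target subgroup.
    """
    s_target = support(indicator, target_reports)
    s_other = support(indicator, other_reports)
    if s_target == 0 and s_other == 0:
        return 0.0
    if s_other == 0:
        return math.inf
    return s_target / s_other


# Hypothetical subgroups; addresses are documentation placeholders.
db1 = [{"192.0.2.1", "198.51.100.7"}, {"192.0.2.1"}]
db2 = [{"198.51.100.7"}, {"203.0.113.5"}]

only_in_db1 = growth_rate("192.0.2.1", db1, db2)
in_both = growth_rate("198.51.100.7", db1, db2)
```

An indicator with an infinite growth rate corresponds to the figure's case where itemset A appears in DB 1 and not at all in DB 2, so it can be used to identify that subgroup.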

IP addresses shared by multiple malware families
• More than 99%: single subgroup
• Less than 1%: multiple subgroups (456 / 58,048 = 0.79%)
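
The share of indicators that appear in more than one subgroup can be computed with a short sketch like the one below. The function name `shared_indicator_ratio` and the sample subgroups are hypothetical; the last line only re-checks the arithmetic of the figure reported on the slide.

```python
def shared_indicator_ratio(subgroups):
    """Return (shared indicators, share of all indicators they represent).

    `subgroups` maps a subgroup name to the set of its indicators; an
    indicator is "shared" if it appears in more than one subgroup.
    """
    membership = {}
    for name, indicators in subgroups.items():
        for indicator in indicators:
            membership.setdefault(indicator, set()).add(name)
    shared = sorted(ind for ind, names in membership.items() if len(names) > 1)
    ratio = len(shared) / len(membership) if membership else 0.0
    return shared, ratio


# Hypothetical data; addresses are documentation placeholders.
sample = {
    "GOZ": {"192.0.2.1", "192.0.2.2"},
    "Upatre": {"192.0.2.1", "192.0.2.3"},
}
shared, ratio = shared_indicator_ratio(sample)

# Arithmetic check of the slide's figure: 456 of 58,048 IP addresses.
slide_share = f"{456 / 58048:.2%}"
```

The shared indicators are the rare cases, which is why they are candidates for the extra analysis the talk describes.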


Conclusion
1. Treasure is buried in CTIs
2. Talented guides are needed
to search for treasures
