"I Know What You Did Before":
General Framework for Correlation
Analysis of Cyber Threat Incidents
Daegeon Kim1, JiYoung Woo2, Huy Kang Kim1
1School of Information Security, Korea University, Republic of Korea
2Department of Big Data Engineering, Soonchunghyang University, Republic of Korea
INDEX
1. Motivation
2. Objective
3. Event Relation Tree (ERT)
4. Event Transition Graph (ETG)
5. Experiments
6. Future Research
Motivation
• CTI Sharing Expedites
☞ CTI sharing is being promoted nationally & internationally!
Motivation
• CTI Expression Frameworks
by The MITRE Corporation
by The MITRE Corporation
by The MITRE Corporation
by MANDIANT Corporation
• CTI Exchange Frameworks (Platforms)
by The MITRE Corporation
by Computer Incident Response Center Luxembourg (CIRCL)
by The CSIRT Gadgets Foundation
MANTIS by Siemens
Motivation
• CTI Analysis Methods
iDefense IntelGraph (by Verisign) Web Intelligence Engine (by Recorded Future)
Motivation
☞ But, we need a FRAMEWORK for CTI correlation analysis
• Little research has been conducted on
analyzing CTI despite the following advantages:
1) Interoperability of data (independent of machine,
vendor, and organization)
2) Compact expression of heterogeneous sources of
threat information
3) Possibility of performing long-term and
nationwide threat analysis
Motivation
Objective
• All types of data should be handled so that the
analysis is integrated.
• Temporal variations of incident events should be
reflected so that CTI analysts can infer the
attacker’s intention.
• The framework should be general so that the
analysis can be improved further.
Event Relation Tree
• Event Relation Tree (ERT) is a tree-like graph
expressing relations of events.
[ ERT Example ]
Events (nodes): EID: 1, EID: 2, EID: 3, EID: 4, EID: 5, EID: 6, EID: 7
Relations (edges) connect related events.
Each node stores:
• data (CTI)
– event name / ID (EID)
– timestamp
– IP / URL
– YARA matching rules
– accounts
– other string info (e.g., mutex, boundary…)
• parent node
• children nodes
• event relations (e.g., related EID, type:data)
– EID:1, account:abc123
– EID:1, IP:1.1.1.1
– EID:6, account:abc123
– EID:7, mutex:rainbow1
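The node layout above can be sketched as a small Python data class. This is only an illustration under assumptions: the class name `ERTNode`, its field names, and the timestamp given to EID 6 are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class ERTNode:
    """One event in an Event Relation Tree (all names here are illustrative)."""
    eid: int
    timestamp: str
    # CTI attached to the event, keyed by data type (e.g. "account", "IP", "mutex").
    cti: dict[str, set[str]] = field(default_factory=dict)
    parent: Optional["ERTNode"] = field(default=None, repr=False)
    children: list["ERTNode"] = field(default_factory=list)
    # Relations to other events, stored as (related EID, data type, shared value).
    relations: list[tuple[int, str, str]] = field(default_factory=list)

    def add_child(self, child: "ERTNode") -> None:
        """Attach `child` below this node and set its parent pointer."""
        child.parent = self
        self.children.append(child)


# Hypothetical events that share the account "abc123".
root = ERTNode(eid=1, timestamp="2015-05-01",
               cti={"account": {"abc123"}, "IP": {"1.1.1.1"}})
leaf = ERTNode(eid=6, timestamp="2015-06-01", cti={"account": {"abc123"}})
root.add_child(leaf)
# The relation is recorded on both nodes, as in the slide's example entries.
root.relations.append((6, "account", "abc123"))
leaf.relations.append((1, "account", "abc123"))
```

Storing the relation on both nodes mirrors the "related EID, type:data" entries listed in the example.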
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
EID: 1
May-1-2015
The initial event for the correlation analysis is added to the ERT as the root node.
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
EID: 1
May-1-2015
The database storing the CTI of incident events is provided as input.
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
EID: 1
May-1-2015
EID 1 is already in ERT, so it is ignored.
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
EID: 1
May-1-2015
No relation between EID 1 and EID 2 is recorded in the ERT yet, so the two events are checked for a relation.
Since no relation is found, the algorithm moves on to the next iteration.
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
EID: 1
May-1-2015
EID:3, account:abc
EID: 3
Jan-3-2016
EID:1, account:abc
Since EID 1 has a relation with EID 3, which is not yet in the ERT, EID 3 and the relation are stored in the ERT.
In particular, the relation is added to the nodes of EID 1 and EID 3 separately.
Because a new node has been added to the ERT, a recursive call of the ERTCONST function begins from it (DFS).
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
EID: 1
May-1-2015
EID:3, account:abc
EID: 3
Jan-3-2016
EID:1, account:abc
Since EID 1 already has a relation with EID 3, it is ignored.
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
EID: 1
May-1-2015
EID:3, account:abc
EID: 3
Jan-3-2016
EID:1, account:abc
No relation between EID 2 and EID 3 is recorded in the ERT yet, so the two events are checked for a relation.
Since no relation is found, the algorithm moves on to the next iteration.
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
EID: 1
May-1-2015
EID:3, account:abc
EID: 3
Jan-3-2016
EID:1, account:abc
EID 3 is already in ERT, so it is ignored.
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
EID: 1
May-1-2015
EID:3, account:abc
EID: 3
Jan-3-2016
EID:1, account:abc
No relation between EID 3 and EID 4 is recorded in the ERT yet, so the two events are checked for a relation.
Since no relation is found, the algorithm moves on to the next iteration.
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
EID: 1
May-1-2015
EID:3, account:abc
EID: 3
Jan-3-2016
EID:1, account:abc
EID:5, mutex:hello
EID: 5
Mar-16-2015
EID:3, mutex:hello
Since EID 3 has a relation with EID 5, which is not yet in the ERT, EID 5 and the relation are stored in the ERT.
Because a new node has been added to the ERT, the recursive call of the ERTCONST function restarts from it (DFS).
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
If EID 5 has a relation to EID 1, both of which are already in the ERT, only the relation is added to the two nodes.
EID: 1
May-1-2015
EID:3, account:abc
EID:5, IP:1.1.1.1
EID: 3
Jan-3-2016
EID:1, account:abc
EID:5, mutex:hello
EID: 5
Mar-16-2015
EID:3, mutex:hello
EID:1, IP:1.1.1.1
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
EID: 1
May-1-2015
EID:3, account:abc
EID:5, IP:1.1.1.1
EID:4, string:wakeup
EID: 3
Jan-3-2016
EID:1, account:abc
EID:5, mutex:hello
EID: 5
Mar-16-2015
EID:3, mutex:hello
EID:1, IP:1.1.1.1
Suppose no more relations are found under the node of EID 3 as the parent.
Then “node” returns to EID 1 and “new_event” points to EID 4.
Since a relation exists between EID 1 and EID 4, which is not yet in the ERT, EID 4 and the relation are added to the ERT.
At this point, the left and the right branches of EID 1 show different characteristics.
EID: 4
Dec-11-2015
EID:1, string:wakeup
Event Relation Tree
• ERT Construction (example)
EID: 1
EID: 2
EID: 3
EID:4
EID: 5
EID: 6
EID: 7
Event DB
node
new_event
EID: 1
May-1-2015
EID:3, account:abc
EID:5, IP:1.1.1.1
EID:4, string:wakeup
EID: 3
Jan-3-2016
EID:1, account:abc
EID:5, mutex:hello
EID: 5
Mar-16-2015
EID:3, mutex:hello
EID:1, IP:1.1.1.1
EID: 4
Dec-11-2015
EID:1, string:wakeup
EID:7, boundary:alphabeta
EID: 7
Apr-30-2015
EID:4, boundary:alphabeta
Suppose EID 4 has no relation with any event up to EID 6, and a relation is then found with EID 7.
EID 7 and the relation are added to the ERT.
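The walkthrough above can be condensed into a short Python sketch. It is a reading of the slides under assumptions, not the authors' ERTCONST code: the function names, the dictionary layout, and the CTI values given to EID 2 and EID 6 (which have no relations in the example) are hypothetical, and relations are found by exact string matching only.

```python
def find_relation(a, b):
    """Return the (type, value) pairs that two events' CTI share exactly."""
    return sorted((t, v) for t in a.keys() & b.keys() for v in a[t] & b[t])


def ert_const(node, event_db, tree, relations):
    """Depth-first ERT construction from `node`, an EID already in `tree`."""
    for new_event in event_db:
        if new_event == node:
            continue  # the event itself is ignored
        if new_event in relations[node]:
            continue  # this relation is already recorded
        shared = find_relation(event_db[node], event_db[new_event])
        if not shared:
            continue  # no relation found: next iteration
        # Record the relation on both nodes.
        relations[node][new_event] = shared
        relations.setdefault(new_event, {})[node] = shared
        if new_event not in tree:
            # New node: attach it below `node` and recurse from it (DFS).
            tree[new_event] = []
            tree[node].append(new_event)
            ert_const(new_event, event_db, tree, relations)


def build_ert(root, event_db):
    tree = {root: []}       # EID -> list of child EIDs
    relations = {root: {}}  # EID -> {related EID: [(type, value), ...]}
    ert_const(root, event_db, tree, relations)
    return tree, relations


# Event DB following the slide example; EID 2 and EID 6 values are made up.
event_db = {
    1: {"account": {"abc"}, "IP": {"1.1.1.1"}, "string": {"wakeup"}},
    2: {"IP": {"9.9.9.9"}},
    3: {"account": {"abc"}, "mutex": {"hello"}},
    4: {"string": {"wakeup"}, "boundary": {"alphabeta"}},
    5: {"mutex": {"hello"}, "IP": {"1.1.1.1"}},
    6: {"mutex": {"other"}},
    7: {"boundary": {"alphabeta"}},
}
tree, relations = build_ert(1, event_db)
```

With this input the sketch produces the same shape as the example: EID 1 has children EID 3 and EID 4, EID 5 hangs below EID 3, EID 7 hangs below EID 4, and the EID 1 / EID 5 relation on IP:1.1.1.1 is stored without adding a new node.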
Event Transition Graph
• The Event Transition Graph (ETG) is constructed
from the ERT by sorting events with respect to event time.
• To preserve the branching characteristics of the ERT,
a graph structure is adopted for the ETG.
EID: 1
May-1-2015
EID: 3
Jan-3-2016
EID: 5
Mar-16-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
[ ERT ]
EID: 1
May-1-2015
EID: 3
Jan-3-2016
EID: 5
Mar-16-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
[ ETG ]
EID: 3
Jan-3-2016
transformation
(descending order sorting)
①
②
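The idea of sorting by event time while keeping the branches can be illustrated with a deliberately simplified Python sketch. It is not the authors' ETG procedure, which the slides show only as figures: here each root-to-leaf branch of the ERT is sorted newest-first on its own and consecutive events are linked, which is an assumption made for illustration. The function names and the dictionary layout are hypothetical.

```python
from datetime import date


def branches(tree, root):
    """Yield every root-to-leaf path of the ERT as a list of EIDs."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        children = tree.get(path[-1], [])
        if not children:
            yield path
        for child in children:
            stack.append(path + [child])


def etg_edges(tree, root, when):
    """Sort each branch in descending time order and link consecutive events."""
    edges = set()
    for path in branches(tree, root):
        ordered = sorted(path, key=lambda eid: when[eid], reverse=True)
        edges.update(zip(ordered, ordered[1:]))
    return edges


# ERT and timestamps from the construction example.
tree = {1: [3, 4], 3: [5], 4: [7], 5: [], 7: []}
when = {
    1: date(2015, 5, 1),
    3: date(2016, 1, 3),
    5: date(2015, 3, 16),
    4: date(2015, 12, 11),
    7: date(2015, 4, 30),
}
edges = etg_edges(tree, 1, when)
# Events with no incoming edge are the descending start points of the graph.
roots = {src for src, _ in edges} - {dst for _, dst in edges}
```

In this simplification EID 1 is shared by both sorted branches, so the result is a graph with two start points rather than a tree.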
Event Transition Graph
• ETG Construction (example)
EID: 1
May-1-2015
EID: 3
Jan-3-2016
EID: 5
Mar-16-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
node
iter
curr
dir: up / down
att
①
② ② ①
④
③
❶
❷
❸
❹
Event Transition Graph
• ETG Construction (example)
EID: 1
May-1-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
node
iter
curr
dir: up / down
att
EID: 3
Jan-3-2016
EID: 5
Mar-16-2015
Event Transition Graph
• ETG Construction (example)
EID: 1
May-1-2015
EID: 3
Jan-3-2016
EID: 5
Mar-16-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
node
iter
curr
dir: up / down
att
EID: 3
Jan-3-2016
EID: 5
Mar-16-2015
EID: 3
Jan-3-2016
Event Transition Graph
• ETG Construction (example)
EID: 1
May-1-2015
EID: 5
Mar-16-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
node
iter
curr
dir: up / down
att
①
②
④
③
①
❷
❹
❶
❸
EID: 3
Jan-3-2016
Event Transition Graph
• ETG Construction (example)
EID: 1
May-1-2015
EID: 5
Mar-16-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
node
iter
curr
dir: up / down
att
EID: 5
Mar-16-2015
EID: 3
Jan-3-2016
Event Transition Graph
• ETG Construction (example)
EID: 1
May-1-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
node
iter
curr
dir: up / down
att
EID: 5
Mar-16-2015
EID: 3
Jan-3-2016
Event Transition Graph
• ETG Construction (example)
EID: 1
May-1-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
node
iter
curr
dir: up / down
att
EID: 5
Mar-16-2015
①
②
④
③
①
❶
❷
❸
❹
EID: 3
Jan-3-2016
Event Transition Graph
• ETG Construction (example)
EID: 1
May-1-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
node
iter
curr
dir: up / down
att
EID: 5
Mar-16-2015
EID: 3
Jan-3-2016
Event Transition Graph
• ETG Construction (example)
EID: 1
May-1-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
node
iter
curr
dir: up / down
att
EID: 5
Mar-16-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
If the ancestor of EID 1, EID 3, had any relation to
the flipped branch, it might be inserted between EID 1 and EID 3.
EID: 3
Jan-3-2016
Event Transition Graph
• ETG Construction (example)
EID: 1
May-1-2015
node
iter
curr
dir: up / down
att
EID: 5
Mar-16-2015
EID: 4
Dec-11-2015
EID: 7
Apr-30-2015
Experiments
• System Architecture
Experiments
• Dataset Generation
– Duration: 2011-2015
– Collection
• incident events in the field: malware, spear-phishing email…
• from malware-sharing sites, e.g., VirusTotal
– Data Types: URL, IP, accounts, PDB path, mutex, boundary,
file mapping, and other keywords.
– Size: around 18,000 records of 820 events
[ Sample Data ]
Experiments
• Preprocessing
– email accounts: domains are ignored
• helloworld0123@gmail.com vs. helloworld0123@hotmail.com
– directory: each directory in a path is parsed
• C:\Users\BestHacker_John\Campaign...
• D:\BestHacker_John\MyProject...
– IP classes: A, B, C, or D class
• B class: 1.2.3.4 → 1.2.X.X / C class: 1.2.3.4 → 1.2.3.X
– other heuristics and expert knowledge could be applied.
• e.g., (full name) Bart Simpson → B. Simpson
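The preprocessing rules above could be sketched in Python as follows. The function names are hypothetical, the example paths are shortened stand-ins for the elided ones on the slide, and treating "D class" as the full, unmasked address is an assumption, since the slide gives examples only for B and C.

```python
from pathlib import PureWindowsPath


def normalize_account(email):
    """Drop the domain so the same account name matches across providers."""
    return email.split("@", 1)[0]


def directory_tokens(path):
    """Split a Windows path into its directory names, without the drive."""
    parsed = PureWindowsPath(path)
    return [part for part in parsed.parts if part != parsed.anchor]


def mask_ip(ip, ip_class):
    """Keep the leading octets for the class and mask the rest with X."""
    keep = {"A": 1, "B": 2, "C": 3, "D": 4}[ip_class]  # "D" = full address (assumed)
    octets = ip.split(".")
    return ".".join(octets[:keep] + ["X"] * (4 - keep))


same_account = (normalize_account("helloworld0123@gmail.com")
                == normalize_account("helloworld0123@hotmail.com"))
# Directory names shared by two paths on different drives.
shared_dirs = (set(directory_tokens(r"C:\Users\BestHacker_John\Campaign"))
               & set(directory_tokens(r"D:\BestHacker_John\MyProject")))
masked_b = mask_ip("1.2.3.4", "B")
masked_c = mask_ip("1.2.3.4", "C")
```

After this normalization, a plain string comparison is enough to link the two accounts and the two paths, the latter through the shared directory name.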
Experiments
• Case Study 1
– At the beginning of this campaign, compromised
websites were used as C2 servers.
– At some point, however, the group switched to public cloud
services so that the malware's network traffic
would look normal.
Experiments
1
2
A
B
a
b
c
[ ERT ]
boundary used in C2 server
mutex in malware
cloud service used as C2 server
: the initial event
Experiments
1 2
A
B
a
b
c
3
[ ETG ] the oldest event in Ⓑ
the first appeared event in Ⓐ
: descending start points (graph roots)
boundary used in C2 server
mutex in malware
cloud service used as C2 server
Experiments
• Case Study 2
– The attacker group in the second case study is known as the
Lazarus Group, which attacked Sony Pictures Entertainment
(SPE) in 2014.
Novetta, “Operation Blockbuster: Unraveling the Long Thread of
the Sony Attack”, 2015.
– Several types of malware were distributed through
compromised websites and spear-phishing emails, while
their functionality kept changing.
Experiments
[ ETG (ERT: left-bottom) ]
boundary
phishing scam
Experiments
• Timeline Comparison of Analysis to Novetta’s Report*
* Novetta, “Operation Blockbuster: Unraveling the Long Thread of the Sony Attack,” 2015.
Future Research
Currently, only direct string-matching methods are applied to the CTI of each event.
→ Probabilistic or heuristic approaches need to be added,
e.g., calculating a similarity score (probability) between events, or
finding a relation between a full name and its initials.
EID: 1 EID: 2 EID: 3
a
b c d
Do the relations “EID: 1 - EID: 2” and “EID: 2 - EID: 3” have the same similarity level?
C:\Program Files
Can we really consider EID: 1 and EID: 2 to have a meaningful relation?
EID: 1 EID: 2
EID: 1 EID: 2
Bart Simpson
bartsimpson.com
B. Simpson
bsimpson.net
Isn’t it possible to say that EID: 1 and EID: 2 have a relation?
b
cb
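One possible way to make the first question concrete is a set-overlap score. The sketch below uses Jaccard similarity over each event's (type, value) CTI items. This is a swapped-in illustration, not a method proposed in the slides, and the event contents are hypothetical.

```python
def similarity(a, b):
    """Jaccard similarity between two events' sets of (type, value) CTI items."""
    items_a = {(t, v) for t, values in a.items() for v in values}
    items_b = {(t, v) for t, values in b.items() for v in values}
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


# Hypothetical events: EID 1 and EID 2 share one item, EID 2 and EID 3 share three.
eid1 = {"mutex": {"a"}}
eid2 = {"mutex": {"a"}, "account": {"b", "c"}, "boundary": {"d"}}
eid3 = {"account": {"b", "c"}, "boundary": {"d"}}

score_12 = similarity(eid1, eid2)
score_23 = similarity(eid2, eid3)
```

Under this score the two relations are no longer treated as equal: the pair sharing three items scores higher than the pair sharing one. A real scheme would probably also need to down-weight common values such as C:\Program Files.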
Future Research
May 2015 Aug. 2015 Dec. 2015
a
c d
Feb. 2015
EID: 2 EID: 3 EID: 4
ERT
ETG
a
d
Feb. 2015
EID: 2 EID: 4
Dec. 2015
EID: 3
May 2015
EID: 1
Aug. 2015
EID: 1
It seems more reasonable to maintain the event sequence “EID: 3 - EID: 2 - EID: 4”;
EID: 1 may not have any relation to EID: 4.
b c
a
d
Feb. 2015
EID: 2
EID: 4
Dec. 2015
EID: 3 May 2015
EID: 1
Aug. 2015
Currently, the ETG preserves the chain of branches if the events lie on an ancestor-sibling relation in the ERT.
→ A probabilistic approach also needs to be added,
e.g., calculating a relation score (probability) of events.
Q & A
Daegeon Kim
dgkim0803@korea.ac.kr

More Related Content

Similar to “I Know What You Did Before”: General Framework for Correlation Analysis of Cyber Threat Incidents

Comparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonComparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonIRJET Journal
 
The FLuID Meta Model: Incrementally Compute Schema-level Indices for the Web...
The FLuID Meta Model: Incrementally Compute  Schema-level Indices for the Web...The FLuID Meta Model: Incrementally Compute  Schema-level Indices for the Web...
The FLuID Meta Model: Incrementally Compute Schema-level Indices for the Web...Till Blume
 
IRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET Journal
 
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)IRJET Journal
 
SplunkLive! Milano 2016 - customer presentation - Unicredit
SplunkLive! Milano 2016 -  customer presentation - UnicreditSplunkLive! Milano 2016 -  customer presentation - Unicredit
SplunkLive! Milano 2016 - customer presentation - UnicreditSplunk
 
RGiampaoli.DynamicIntegrations
RGiampaoli.DynamicIntegrationsRGiampaoli.DynamicIntegrations
RGiampaoli.DynamicIntegrationsRicardo Giampaoli
 
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault ModelingKent Graziano
 
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On CloudSecure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On CloudIRJET Journal
 
A Novel Efficient Remote Data Possession Checking Protocol in Cloud Storage
A Novel Efficient Remote Data Possession Checking Protocol in Cloud StorageA Novel Efficient Remote Data Possession Checking Protocol in Cloud Storage
A Novel Efficient Remote Data Possession Checking Protocol in Cloud Storageijtsrd
 
Scalable frequent itemset mining using heterogeneous computing par apriori a...
Scalable frequent itemset mining using heterogeneous computing  par apriori a...Scalable frequent itemset mining using heterogeneous computing  par apriori a...
Scalable frequent itemset mining using heterogeneous computing par apriori a...ijdpsjournal
 
Optimizing Spark-based data pipelines - are you up for it?
Optimizing Spark-based data pipelines - are you up for it?Optimizing Spark-based data pipelines - are you up for it?
Optimizing Spark-based data pipelines - are you up for it?Itai Yaffe
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESijwscjournal
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESijwscjournal
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESijwscjournal
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageJulien Le Dem
 
CV - Luthfi Mohamad Latief
CV - Luthfi Mohamad LatiefCV - Luthfi Mohamad Latief
CV - Luthfi Mohamad Latieffahriyah
 
E2D3 introduction
E2D3 introductionE2D3 introduction
E2D3 introductionE2D3
 
Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level...
Towards Flexible Indices for  Distributed Graph Data: The Formal Schema-level...Towards Flexible Indices for  Distributed Graph Data: The Formal Schema-level...
Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level...Till Blume
 
Optimizing the design of your data warehouse 09222010
Optimizing the design of your data warehouse 09222010Optimizing the design of your data warehouse 09222010
Optimizing the design of your data warehouse 09222010ERwin Modeling
 

Similar to “I Know What You Did Before”: General Framework for Correlation Analysis of Cyber Threat Incidents (20)

Cf33497503
Cf33497503Cf33497503
Cf33497503
 
Comparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonComparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & Python
 
The FLuID Meta Model: Incrementally Compute Schema-level Indices for the Web...
The FLuID Meta Model: Incrementally Compute  Schema-level Indices for the Web...The FLuID Meta Model: Incrementally Compute  Schema-level Indices for the Web...
The FLuID Meta Model: Incrementally Compute Schema-level Indices for the Web...
 
IRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLP
 
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)
 
SplunkLive! Milano 2016 - customer presentation - Unicredit
SplunkLive! Milano 2016 -  customer presentation - UnicreditSplunkLive! Milano 2016 -  customer presentation - Unicredit
SplunkLive! Milano 2016 - customer presentation - Unicredit
 
RGiampaoli.DynamicIntegrations
RGiampaoli.DynamicIntegrationsRGiampaoli.DynamicIntegrations
RGiampaoli.DynamicIntegrations
 
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
 
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On CloudSecure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
 
A Novel Efficient Remote Data Possession Checking Protocol in Cloud Storage
A Novel Efficient Remote Data Possession Checking Protocol in Cloud StorageA Novel Efficient Remote Data Possession Checking Protocol in Cloud Storage
A Novel Efficient Remote Data Possession Checking Protocol in Cloud Storage
 
Scalable frequent itemset mining using heterogeneous computing par apriori a...
Scalable frequent itemset mining using heterogeneous computing  par apriori a...Scalable frequent itemset mining using heterogeneous computing  par apriori a...
Scalable frequent itemset mining using heterogeneous computing par apriori a...
 
Optimizing Spark-based data pipelines - are you up for it?
Optimizing Spark-based data pipelines - are you up for it?Optimizing Spark-based data pipelines - are you up for it?
Optimizing Spark-based data pipelines - are you up for it?
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
 
CV - Luthfi Mohamad Latief
CV - Luthfi Mohamad LatiefCV - Luthfi Mohamad Latief
CV - Luthfi Mohamad Latief
 
E2D3 introduction
E2D3 introductionE2D3 introduction
E2D3 introduction
 
Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level...
Towards Flexible Indices for  Distributed Graph Data: The Formal Schema-level...Towards Flexible Indices for  Distributed Graph Data: The Formal Schema-level...
Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level...
 
Optimizing the design of your data warehouse 09222010
Optimizing the design of your data warehouse 09222010Optimizing the design of your data warehouse 09222010
Optimizing the design of your data warehouse 09222010
 

Recently uploaded

Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 

Recently uploaded (20)

Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 

“I Know What You Did Before”: General Framework for Correlation Analysis of Cyber Threat Incidents

  • 1. "I Know What You Did Before": General Framework for Correlation Analysis of Cyber Threat Incidents Daegeon Kim1, JiYoung Woo2, Huy Kang Kim1 1School of Information Security, Korea University, Republic of Korea 2DepartmentofBigDataEngineering,SoonchunghyangUniversity,RepublicofKorea
  • 2. INDEX 1. Motivation 2. Objective 3. Event Relation Tree (ERT) 4. Event Transition Graph (ETG) 5. Experiments 6. Future Research
  • 3. Motivation • CTI Sharing Expedites ☞ CTI sharing is being promoted nationally & internationally!
  • 4. Motivation • CTI Expression Frameworks by The MITRE Corporation by The MITRE Corporation by The MITRE Corporation by MANDIANT Corporation
  • 5. Motivation • CTI Exchange Frameworks (Platforms) by The MITRE Corporation by Computer Incident Response Center Luxembourg (CIRCL) by The CSIRT Gadgets Foundation MANTIS by Siemens
  • 6. Motivation • CTI Analysis Methods iDefense IntelGraph (by Verisign) Web Intelligence Engine (by Recorded Future) ☞ But, we need a FRAMEWORK for CTI correlation analysis
  • 7. Motivation • Little research has been conducted on analyzing CTI despite the following advantages: 1) Interoperability of data (machine, vendor, organization independent) 2) Compact expression of heterogeneous sources of threat information 3) Possibility of performing long-term and nationwide threat analysis
  • 8. Objective • All types of data should be handled for integrated analysis. • Temporal variations of incident events should be reflected so that a CTI analyst can infer the attacker’s intention. • Propose a general framework so that the analysis can be further improved.
  • 9. Event Relation Tree • Event Relation Tree (ERT) is a tree-like graph expressing relations of events. EID: 1 EID: 2 EID: 3 EID: 4 EID: 5 EID: 6 Event Relations event name / ID (EID) timestamp IP / URL yara matching rules accounts other string info. (e.g. mutex, boundary…) data (CTI) parent node children nodes event relations (e.g., related EID, type:data) EID:1, account:abc123 EID:1, IP:1.1.1.1 EID:6, account:abc123 EID:7, mutex:rainbow1 EID: 7 [ ERT Example ]
  • 10. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB EID: 1 May-1-2015 The initial event to analyze correlation is added in ERT as the root node.
  • 11. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB EID: 1 May-1-2015 The database storing the CTI of incident events is provided as an input.
  • 12. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event EID: 1 May-1-2015 EID 1 is already in ERT, so it is ignored.
  • 13. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event EID: 1 May-1-2015 There is no relation between EID 1 and EID 2 in ERT, so the existence of the relation is checked. Since no relation is found, jump to next iteration.
  • 14. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event EID: 1 May-1-2015 EID:3, account:abc EID: 3 Jan-3-2016 EID:1, account:abc Since EID 1 has a relation with EID 3, which is not in ERT, EID 3 and the relation are stored in ERT. In particular, the relation is added to the nodes of EID 1 and EID 3 separately. Because a new node is added to ERT, the recursive call of the ERTCONST function begins from it. (DFS)
  • 15. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event EID: 1 May-1-2015 EID:3, account:abc EID: 3 Jan-3-2016 EID:1, account:abc Since EID 1 already has a relation with EID 3, it is ignored.
  • 16. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event EID: 1 May-1-2015 EID:3, account:abc EID: 3 Jan-3-2016 EID:1, account:abc There is no relation between EID 2 and EID 3 in ERT, so the existence of the relation is checked. Since no relation is found, jump to next iteration.
  • 17. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event EID: 1 May-1-2015 EID:3, account:abc EID: 3 Jan-3-2016 EID:1, account:abc EID 3 is already in ERT, so it is ignored.
  • 18. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event EID: 1 May-1-2015 EID:3, account:abc EID: 3 Jan-3-2016 EID:1, account:abc There is no relation between EID 3 and EID 4 in ERT, so the existence of the relation is checked. Since no relation is found, jump to next iteration.
  • 19. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event EID: 1 May-1-2015 EID:3, account:abc EID: 3 Jan-3-2016 EID:1, account:abc EID:5, mutex:hello EID: 5 Mar-16-2015 EID:3, mutex:hello Since EID 3 has a relation with EID 5, which is not in ERT, EID 5 and the relation are stored in ERT. Because a new node is added to ERT, the recursive call of the ERTCONST function restarts from it. (DFS)
  • 20. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event If EID 5 has a relation to EID 1, both of which are already in ERT, only the relation is added to the two nodes. EID: 1 May-1-2015 EID:3, account:abc EID:5, IP:1.1.1.1 EID: 3 Jan-3-2016 EID:1, account:abc EID:5, mutex:hello EID: 5 Mar-16-2015 EID:3, mutex:hello EID:1, IP:1.1.1.1
  • 21. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event EID: 1 May-1-2015 EID:3, account:abc EID:5, IP:1.1.1.1 EID:4, string:wakeup EID: 3 Jan-3-2016 EID:1, account:abc EID:5, mutex:hello EID: 5 Mar-16-2015 EID:3, mutex:hello EID:1, IP:1.1.1.1 Let’s suppose no more relations are found under the node of EID 3 as the parent. Then “node” is back to EID 1 and “new_event” indicates EID 4. Since a relation exists between EID 1 and EID 4, which is not in ERT, EID 4 and the relation are added to ERT. At this point, the left and the right branches of EID 1 show different characteristics. EID: 4 Dec-11-2015 EID:1, string:wakeup
  • 22. Event Relation Tree • ERT Construction (example) EID: 1 EID: 2 EID: 3 EID:4 EID: 5 EID: 6 EID: 7 Event DB node new_event EID: 1 May-1-2015 EID:3, account:abc EID:5, IP:1.1.1.1 EID:4, string:wakeup EID: 3 Jan-3-2016 EID:1, account:abc EID:5, mutex:hello EID: 5 Mar-16-2015 EID:3, mutex:hello EID:1, IP:1.1.1.1 EID: 4 Dec-11-2015 EID:1, string:wakeup EID:7, boundary:alphabeta EID: 7 Apr-30-2015 EID:4, boundary:alphabeta Let’s suppose EID 4 has no other relations up to EID 6 and a relation is found with EID 7. EID 7 and the relation are added to ERT.
  • 23. Event Transition Graph • Event Transition Graph (ETG) is constructed from ERT by sorting with respect to event time. • To preserve the branching characteristic in ERT, the graph structure is adapted for ETG. EID: 1 May-1-2015 EID: 3 Jan-3-2016 EID: 5 Mar-16-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 [ ERT ] EID: 1 May-1-2015 EID: 3 Jan-3-2016 EID: 5 Mar-16-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 [ ETG ] EID: 3 Jan-3-2016 transformation (descending order sorting) ① ②
  • 24. Event Transition Graph • ETG Construction (example) EID: 1 May-1-2015 EID: 3 Jan-3-2016 EID: 5 Mar-16-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 node iter curr dir: up / down att ① ② ② ① ④ ③ ❶ ❷ ❸ ❹
  • 25. Event Transition Graph • ETG Construction (example) EID: 1 May-1-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 node iter curr dir: up / down att EID: 3 Jan-3-2016 EID: 5 Mar-16-2015
  • 26. Event Transition Graph • ETG Construction (example) EID: 1 May-1-2015 EID: 3 Jan-3-2016 EID: 5 Mar-16-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 node iter curr dir: up / down att EID: 3 Jan-3-2016 EID: 5 Mar-16-2015
  • 27. EID: 3 Jan-3-2016 Event Transition Graph • ETG Construction (example) EID: 1 May-1-2015 EID: 5 Mar-16-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 node iter curr dir: up / down att ① ② ④ ③ ① ❷ ❹ ❶ ❸
  • 28. EID: 3 Jan-3-2016 Event Transition Graph • ETG Construction (example) EID: 1 May-1-2015 EID: 5 Mar-16-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 node iter curr dir: up / down att EID: 5 Mar-16-2015
  • 29. EID: 3 Jan-3-2016 Event Transition Graph • ETG Construction (example) EID: 1 May-1-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 node iter curr dir: up / down att EID: 5 Mar-16-2015
  • 30. EID: 3 Jan-3-2016 Event Transition Graph • ETG Construction (example) EID: 1 May-1-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 node iter curr dir: up / down att EID: 5 Mar-16-2015 ① ② ④ ③ ① ❶ ❷ ❸ ❹
  • 31. EID: 3 Jan-3-2016 Event Transition Graph • ETG Construction (example) EID: 1 May-1-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 node iter curr dir: up / down att EID: 5 Mar-16-2015
  • 32. EID: 3 Jan-3-2016 Event Transition Graph • ETG Construction (example) EID: 1 May-1-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 node iter curr dir: up / down att EID: 5 Mar-16-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015 If the ancestor of EID 1, EID 3, had any relation to the flipped branch, it might be inserted between EID 1 and EID 3.
  • 33. EID: 3 Jan-3-2016 Event Transition Graph • ETG Construction (example) EID: 1 May-1-2015 node iter curr dir: up / down att EID: 5 Mar-16-2015 EID: 4 Dec-11-2015 EID: 7 Apr-30-2015
  • 35. Experiments • Dataset Generation – Duration: 2011-2015 – Collection • incident events in the field: malware, spear-phishing email… • from malware sharing sites, e.g., VirusTotal – Data Types: URL, IP, accounts, PDB path, mutex, boundary, file mapping, and other keywords. – Size: around 18,000 records of 820 events [ Sample Data ]
  • 36. Experiments • Preprocessing – email accounts: domains are ignored • helloworld0123@gmail.com vs. helloworld0123@hotmail.com – directory: each directory is parsed • C:\Users\BestHacker_John\Campaign... • D:\BestHacker_John\MyProject... – IP classes: A, B, C, or D class • B class: 1.2.3.4 → 1.2.X.X / C class: 1.2.3.4 → 1.2.3.X – other heuristics and expert knowledge could be applied. • e.g., (full name) Bart Simpson → B. Simpson
  • 37. Experiments • Case Study 1 – At the beginning of this campaign, compromised websites were used as C2 servers. – But, at some point, the group used public cloud services to disguise the malware's network traffic as normal.
  • 38. Experiments 1 2 A B a b c [ ERT ] boundary used in C2 server mutex in malware cloud service used as C2 server : the initial event
  • 39. Experiments 1 2 A B a b c 3 [ ETG ] the oldest event in Ⓑ the first appeared event in Ⓐ : descending start points (graph roots) boundary used in C2 server mutex in malware cloud service used as C2 server
  • 40. Experiments • Case Study 2 – The attacker group of the second case study is known as the Lazarus Group, which attacked Sony Pictures Entertainment (SPE) in 2014. Novetta, “Operation Blockbuster: Unraveling the Long Thread of the Sony Attack”, 2015. – Several types of malware were distributed through compromised websites and spear-phishing emails while their functionality kept changing.
  • 41. Experiments [ ETG (ERT: left-bottom) ] boundary phishing scam
  • 42. Experiments • Timeline Comparison of Analysis to Novetta’s Report* * Novetta, “Operation Blockbuster: Unraveling the Long Thread of the Sony Attack,” 2015.
  • 43. Future Research Currently, only direct string matching is applied to the CTI of each event. → Probabilistic or heuristic approaches need to be added, e.g., calculating a similarity score (probability) of events, or finding a relation between a full name and its initials. EID: 1 EID: 2 EID: 3 a b c d Do the relations “EID: 1 - EID: 2” and “EID: 2 - EID: 3” have the same similarity level? C:\Program Files Can we think EID: 1 and EID: 2 really have a meaningful relation? EID: 1 EID: 2 EID: 1 EID: 2 Bart Simpson bartsimpson.com B. Simpson bsimpson.net Isn’t it possible to say EID: 1 and EID: 2 have a relation?
  • 44. Future Research b c May 2015 Aug. 2015 Dec. 2015 a c d Feb. 2015 EID: 2 EID: 3 EID: 4 ERT ETG a d Feb. 2015 EID: 2 EID: 4 Dec. 2015 EID: 3 May 2015 EID: 1 Aug. 2015 EID: 1 It seems more reasonable to maintain the event sequence “EID: 3 - EID: 2 - EID: 4”; EID: 1 may not have any relation to EID: 4. b c a d Feb. 2015 EID: 2 EID: 4 Dec. 2015 EID: 3 May 2015 EID: 1 Aug. 2015 Currently, ETG preserves the chain of branches if the events lie on the ancestor-sibling relation in ERT. → Also, a probabilistic approach needs to be added, i.e., calculating a relation score (probability) of events.
  • 45. Q & A Daegeon Kim dgkim0803@korea.ac.kr
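
The ERT construction walked through in slides 10 to 22 can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm as published: the names `Node`, `find_relation`, `build_ert`, and `ert_const` are hypothetical stand-ins for the ERTCONST and FindRelation routines mentioned in the slides, each event's CTI is assumed to be a set of `(type, value)` pairs, and the values for EID 2 and EID 6 are made up so that they match nothing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    eid: int
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)
    # Event relations stored as (related EID, data type, data value).
    relations: set = field(default_factory=set)


def find_relation(cti_a, cti_b):
    # Direct string matching only: return the shared (type, value) pairs.
    return cti_a & cti_b


def build_ert(event_db, root_eid):
    nodes = {root_eid: Node(root_eid)}

    def ert_const(node):
        for eid, cti in event_db.items():
            if eid == node.eid:
                continue  # the event itself is ignored
            if any(rel[0] == eid for rel in node.relations):
                continue  # this relation is already stored
            shared = find_relation(event_db[node.eid], cti)
            if not shared:
                continue  # no relation found, next iteration
            is_new = eid not in nodes
            if is_new:
                nodes[eid] = Node(eid, parent=node)
                node.children.append(nodes[eid])
            # The relation is stored on both nodes separately.
            for kind, value in shared:
                node.relations.add((eid, kind, value))
                nodes[eid].relations.add((node.eid, kind, value))
            if is_new:
                ert_const(nodes[eid])  # recurse from the new node (DFS)

    ert_const(nodes[root_eid])
    return nodes


# Example data modeled on the slides; EID 2 and EID 6 values are invented.
EVENT_DB = {
    1: {("account", "abc"), ("IP", "1.1.1.1"), ("string", "wakeup")},
    2: {("IP", "9.9.9.9")},
    3: {("account", "abc"), ("mutex", "hello")},
    4: {("string", "wakeup"), ("boundary", "alphabeta")},
    5: {("mutex", "hello"), ("IP", "1.1.1.1")},
    6: {("mutex", "unrelated")},
    7: {("boundary", "alphabeta")},
}

ert = build_ert(EVENT_DB, root_eid=1)
for eid, node in ert.items():
    print(eid, [child.eid for child in node.children])
```

With this example data the sketch yields the tree from slide 22: EID 1 has children EID 3 and EID 4, EID 3 has child EID 5, EID 4 has child EID 7, and the EID 5 to EID 1 relation (IP:1.1.1.1) is recorded on both nodes without adding a new node.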

Editor's Notes

  1. Hello, everybody. I am pleased to introduce my research paper entitled “I Know What You Did Before”: General Framework for Correlation Analysis of Cyber Threat Incidents. Since the prepared presentation material is large, I will go through it as quickly as possible to leave time for Q&A at the end.
  2. The index is as follows. After talking about the objective of our research, I will introduce two novel concepts, the Event Relation Tree and the Event Transition Graph, which are the keys to the correlation analysis. Finally, I will conclude with the experimental results and future research concerns.
  3. Unlike physical crime, where the evidence collected from the crime scene reveals most of the facts of the crime, cyber crime is hard to analyze with evidence collected only from a victim machine. For this reason, and because of the mutual interest in fighting cyber crime, CTI sharing is being promoted nationally as well as internationally.
  4. Also, large contributions are being made to systematic CTI expression and sharing. These are some of the well-known frameworks for CTI expression.
  5. And these are some of the well-known CTI exchange frameworks or platforms.
  6. Beyond applying CTI to network defense systems or using it to understand individual incidents, we need to further retrieve relations and trends among the CTI that express cyber incidents. Some methods exist to assist CTI analysis by graphically expressing incidents simply based on common attributes they share. For example, in the left image, generated by iDefense IntelGraph by Verisign, the two events are linked by a similar malware family. With such a simple connectivity-based CTI representation, it is hard to analyze and capture the underlying features of CTI. Therefore, we need a framework for CTI analysis rather than only methods to represent it.
  7. Despite the many advantages of CTI analysis, little research is being conducted on it. The advantages are: interoperability, which means formatted CTI can be used independently of machine, vendor, and organization, just like the XML format; compact expression, which means any heterogeneous source data and threat information can be simply expressed as formatted CTI; and, thanks to these advantages and CTI sharing, the possibility of conducting long-term and nationwide threat analysis.
  8. Before developing the CTI correlation analysis framework, we set three objectives to be achieved by the framework. First, the framework should handle all types of data, such as IPs, URLs, malware, hashes, and any string values. Second, since an APT group’s intention or attack vector may keep changing, the framework should capture those temporal variations. Finally, the proposed framework should be general, independent of the specific techniques in it, since the current techniques might not be mature and could be further improved.
  9. To achieve the objectives for CTI analysis, we introduce a tree-like graph structure named the Event Relation Tree (ERT), which represents and contains the information of each event as well as the relations between events. The event information consists of the event name and ID, the timestamp of the event, and the CTI of the event. The relations of a node are expressed by the links to the parent node and the children nodes, and by the event relations, which are pairs of a related event ID and the CTI connecting the two events.
  10. The algorithm on the right is the ERT construction procedure, and I will show how it works using an example. Let’s suppose there are seven events in the event database. If the initial event for the relational analysis is EID 1, then that event is added to the ERT as the root node.
  11. The database storing the CTI of events is provided as an input.
  12. From now on, the recursive and iterative ERT construction process begins. In this ‘for loop’, the relations between the root event and the other events in the DB are examined by the ERTConst function. In the ERTConst function, the ‘node’ argument indicates a node in the ERT, and the ‘new_event’ argument indicates the event in the event database whose relation to ‘node’ is examined. Since EID 1 is already in the ERT, it is ignored at this stage.
  13. Next, EID 2 in the DB is compared to the root node of the ERT. No relation between EID 1 and EID 2 is stored in the ERT, so the existence of a relation is checked by the FindRelation function. Since no relation is found, we jump to the next iteration.
  14. Next, EID 3 in the DB is compared to the root node of the ERT. Since a new relation is found between EID 1 and EID 3, which is not yet in the ERT, the node of EID 3 and the relation are stored in the ERT. In particular, the relation is added to the nodes of EID 1 and EID 3 separately. Because a new node has been added to the ERT, the recursive call of the ERTConst function begins from it. This means the ERT is constructed in a depth-first-search-like manner.
  15. Next, because the ERTConst function is called recursively from EID 3, EID 3 is passed as the ‘node’ argument and EID 1 as ‘new_event’. Since EID 1 already has a relation with EID 3, it is ignored.
  16. Next, EID 2 in the DB is compared to EID 3 in the ERT. No relation between EID 2 and EID 3 is stored in the ERT, so the existence of a relation is checked. Since no relation is found, we jump to the next iteration.
  17. Here, EID 3 is already in the ERT, so it is ignored.
  18. This time, EID 4 in the DB is compared to EID 3 in the ERT. No relation between EID 4 and EID 3 is stored in the ERT, so the existence of a relation is checked. Since no relation is found, we jump to the next iteration.
  19. Now, a relation is found between EID 3 in the ERT and EID 5 in the database, which is new to the ERT. The node of EID 5 is added as a child of the node of EID 3, together with its relation to EID 3. The relation is added to the node of EID 3 as well. Because the new node of EID 5 has been added to the ERT, the recursive call of the ERTConst function restarts from that node.
  20. If EID 5 has a relation to EID 1, both of which are already in the ERT, only the relation is added to the two nodes.
  21. Let’s suppose no more relations are found under the node of EID 3 as the parent. Then “node” goes back to EID 1 and “new_event” indicates EID 4. Since a relation exists between EID 1 and EID 4, which is not in the ERT, the node of EID 4 is added to the ERT. At this point, the left and right branches of EID 1 show different characteristics: the left branch of the root node has a chain of relations that is not shared by the nodes in the right branch, and vice versa.
  22. Finally, let’s suppose EID 4 in the ERT has no other relations up to EID 6, and then a new relation is found to EID 7. The node of EID 7 is newly added to the ERT, and the relation is attached to the node of EID 4. This is the final ERT created by the correlation analysis starting from EID 1.
  23. From the previously constructed ERT, we define a new graph-like structure called the Event Transition Graph (ETG) to capture the temporal transition of cyber incidents with the same attribution. To preserve the branching characteristic of the ERT, a graph structure is adopted for the ETG. Let’s see the example. The left tree is the ERT we created before. To sort the right branch in descending order, the node of EID 4 should move above the root node, as in the right graph. Next, to sort the left branch of the ERT, the node of EID 3 could move above the node of EID 4, as shown by circled number 1. However, to preserve the branching characteristic of the original ERT, the node of EID 3 is attached to the node of EID 1 as a new parent, so that the relation chain EID 4-EID 1-EID 7 and the other relation chain EID 3-EID 1-EID 5 both remain.
  24. The transformation algorithm from ERT to ETG is shown on the right side; now let’s see how it works on the ERT we generated. When the algorithm begins, it takes two parameters. One is ‘node’, the pivotal node from which sorting starts, and the other is the sorting direction, ‘dir’. Sorting starts from the root node in the downstream direction. Next, it checks whether the sorting process has reached the end. Since the current direction is downstream, the children of the root are selected for iteration. For each node selected for the iteration, the ‘curr’ node, it checks whether the node should be relocated for sorting, and the node to which it should be newly attached, the ‘att’ node, is returned as the result.
  25. Once the ‘att’ node is selected, the parent-children relations below the ‘curr’ node are flipped, since the direction is downstream. If the direction were upstream, the relations above the ‘curr’ node would be flipped.
  26. Finally, the ‘curr’ node is attached as the parent of the ‘att’ node, and the sorting process restarts from the ‘curr’ node in the reverse direction, upstream, since the branch was flipped.
  27. Now, ‘node’ is the node of EID 3, and it is not at the end of the sorting process, since the node of EID 5 exists as its parent. The parent node is set as ‘iter’ as well as the ‘curr’ node. Since EID 5 happened before EID 1, the node of EID 1 is selected as the ‘att’ node.
  28. The node of EID 5 is reattached as a child of the node of EID 1, and the ETGTransform method is called recursively from the node of EID 5.
  29. Since the direction is downstream and the node of EID 5 is a leaf node, the current sorting recursion terminates.
  30. It is time to sort the right branch of the node of EID 1. As with the left branch, the ‘curr’ node, which is the node of EID 4 in this iteration, should be moved to become the parent of the node of EID 1.
  31. Before relocation, the right branch is flipped.
  32. The flipped branch is newly added as another ancestor branch of the node of EID 1.
  33. This is the final ETG after the node of EID 7 is relocated to the end.
  34. We evaluated the usability of ERT and ETG with the following process. The detected and collected cyber incidents (or events) are analyzed and investigated by human analysts, and the results are stored in MISP, which can store CTI in a structured format. For malware, we applied our own YARA rules and added the matching results to the database. After the preprocessing step, we construct the ERT and ETG for the initial event to find correlations. The final analysis is made by human analysts using the ERT and the ETG.
  35. The dataset was generated from cyber incidents that actually happened in the field from 2011 to 2015. We also gathered additional malware of interest from a malware repository. The stored data types include URLs, IPs, accounts, PDB paths, and so on. The size is about 18,000 records across 820 events. The image below is sample data showing how the stored CTI is composed.
  36. The dataset can be preprocessed in many ways. These are some examples of preprocessing one can apply. For example, e-mail domains can be ignored so that only account IDs are considered. Also, a full directory path may be meaningful, but a match on the full path is not frequently observed. Therefore, the directory names obtained by parsing the path can be used. One could consider only a specific class (range) of an IP address, and any other heuristics and expert knowledge could be applied in the preprocessing step.
  37. Now I am going to show the results of two case studies. This is the brief background of the events in the first case study. At the beginning of the campaign, compromised websites were used as C2 servers. But at some point, the campaign group started to use public cloud services to disguise the network traffic generated by the malware as normal.
  38. This is the ERT constructed from the initial event, ID 18, the red-circled node, which is also the root node of the ERT. Part of the data is masked with asterisks for security reasons. After following the sibling of the root, the children branch in two directions. The events in the right branch used compromised web servers, and the events in the left branch used cloud services for C2. During the transition between the two types, some malware shared a common mutex string.
  39. This is the ETG converted from the previous ERT. The circled number 3, connected to the root nodes, is the first event to appear in the left event group. One thing to notice is that events #5 and #4, which actually belong to the right group, are branched to the left. This is because they have a relation, the mutex, to events in the left group. This is one of the current limitations to be improved. I will explain the limitations in detail at the end of the presentation.
  40. The attacker group of the second case study is known as the Lazarus Group, which attacked Sony Pictures Entertainment (SPE) in 2014. This group distributed several types of malware through compromised websites and spear-phishing emails while continually changing their functionality.
  41. The bottom-left image is the ERT and the main image is the ETG. I will briefly explain the analysis result using the ETG constructed from the initial event 82. - The C2 commands of the bot-type malware in the event groups evolved from A to D. - There are fairly substantial factors indicating that those events were caused by an identical group: some C2 commands are shared between A and D and between B and C, and the boundary string (a) is shared by C and D. - The incidents in D, in which only a downloader was found in Sept. 2015, seem to download bots similar to 2-5, which had spread from Jun. to Jul. 2015. - In Dec. 2014, bots had been spread by spear-phishing email (6-8). - The attacker group behind the events of E appeared to be different from that of the other events in this ETG. But event 9 and many events in E shared the same C class of IP address used for attacks, which implies the two attacker groups might be related.
  42. This is the comparison of our analysis to Novetta’s report. Types “A” to “D” are grouped by the ETG, and their temporal activity is shown as the red lines. The blue lines are the campaigns of the attack group and their durations. The types categorized by our analysis are similar to those of Novetta in that the initial points at which the malware of each type was distributed are close to each other. Moreover, our analysis indicates that the attack group was frequently active at least until last year, using similar types of malware.
  43. The functions in the proposed general correlation analysis framework have room for improvement. Let’s think about three cases. Do the relation “EID: 1 - EID: 2”, sharing one data element, and the relation “EID: 2 - EID: 3”, sharing three data elements, have the same similarity level? How about the second case? Can we think EID: 1 and EID: 2 really have a meaningful relation when they share a very common directory path in the OS? In the third case, isn’t it possible to say EID: 1 and EID: 2 have a relation? The FindRelation function in the ERT construction algorithm cannot find reasonable relations between events in those cases, since currently only direct string matching is applied to the CTI of each event. To improve it, probabilistic or heuristic approaches need to be adopted in the FindRelation function.
  44. To explain the required improvements to the ETG transformation algorithm, suppose the ETG is transformed from the ERT above. Using the current GetNodeToAttach function in the ETG transformation algorithm, the ERT would be transformed into the first ETG, since the ETG preserves the chain of branches if the events lie on an ancestor-sibling relation in the ERT. However, it seems more reasonable to maintain only the event sequence “EID: 3 - EID: 2 - EID: 4”, because EID: 1 may not have any relation to EID: 4. To improve this, again, probabilistic approaches need to be added to the GetNodeToAttach function.
  45. This is the end of my presentation. If you have any questions, please ask me here or by the following e-mail. Thank you.
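
The preprocessing heuristics mentioned in note 36 (ignoring e-mail domains, parsing directory paths, and masking IP addresses by class) could be sketched as below. The function names `normalize_account`, `mask_ip`, and `path_components` are hypothetical and are not taken from the talk; the sample values follow slide 36, with the elided parts of the paths left out. This is one plausible reading of the heuristics rather than the authors' implementation.

```python
import re


def normalize_account(email):
    # Ignore the mail domain so that only the account ID is compared.
    return email.split("@", 1)[0]


def mask_ip(ip, keep_octets):
    # Keep the leading octets and mask the rest, e.g. keep_octets=2
    # gives the "B class" form and keep_octets=3 the "C class" form.
    octets = ip.split(".")
    return ".".join(octets[:keep_octets] + ["X"] * (4 - keep_octets))


def path_components(path):
    # Split a Windows-style path into directory names, dropping the drive.
    parts = [p for p in re.split(r"[\\/]+", path) if p]
    return [p for p in parts if not re.fullmatch(r"[A-Za-z]:", p)]


same_account = (
    normalize_account("helloworld0123@gmail.com")
    == normalize_account("helloworld0123@hotmail.com")
)
shared_dirs = set(path_components(r"C:\Users\BestHacker_John\Campaign")) & set(
    path_components(r"D:\BestHacker_John\MyProject")
)
print(same_account, shared_dirs, mask_ip("1.2.3.4", 2), mask_ip("1.2.3.4", 3))
```

Comparing directory names as a set is a design choice made here for brevity; it would also match very common directories, which is the weakness raised in note 43, so a real pipeline would likely filter or weight such names.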