“I Know What You Did Before”: General Framework for Correlation Analysis of Cyber Threat Incidents
Daegeon Kim1, JiYoung Woo2, Huy Kang Kim1
1School of Information Security, Korea University, Republic of Korea
2Department of Big Data Engineering, Soonchunghyang University, Republic of Korea
4. Motivation
• CTI Expression Frameworks
by The MITRE Corporation
by The MITRE Corporation
by The MITRE Corporation
by MANDIANT Corporation
5. • CTI Exchange Frameworks (Platforms)
by The MITRE Corporation
by Computer Incident Response Center Luxembourg (CIRCL)
by The CSIRT Gadgets Foundation
MANTIS by Siemens
Motivation
6. • CTI Analysis Methods
iDefense IntelGraph (by Verisign)
Web Intelligence Engine (by Recorded Future)
Motivation
☞ But, we need a FRAMEWORK for CTI correlation analysis
7. • Little research has been conducted on analyzing CTI despite the following advantages:
1) Interoperability of data (machine-, vendor-, and organization-independent)
2) Compact expression of heterogeneous sources of threat information
3) Possibility of performing long-term and nationwide threat analysis
Motivation
8. Objective
• All types of data should be handled for integrated analysis.
• Temporal variations of incident events should be reflected so that CTI analysts can infer the attacker’s intention.
• Propose a general framework so that the analysis can be improved further.
9. Event Relation Tree
• Event Relation Tree (ERT) is a tree-like graph
expressing relations of events.
Example tree: nodes EID 1 to EID 7
Node structure:
– Event: event name / ID (EID), timestamp, data (CTI): IP / URL, yara matching rules, accounts, other string info (e.g., mutex, boundary…)
– Relations: parent node, children nodes, event relations (e.g., related EID, type:data)
Example relations: EID:1, account:abc123 / EID:1, IP:1.1.1.1 / EID:6, account:abc123 / EID:7, mutex:rainbow1
[ ERT Example ]
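The node layout above can be sketched as a small Python data structure. This is a minimal illustration under our own assumptions, not the authors' implementation: the names `EventNode` and `relate` are hypothetical, CTI is kept as a dictionary from data type to values, and timestamps are plain ISO date strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EventNode:
    """Hypothetical ERT node: the event itself plus its tree links and relations."""
    eid: int                      # event ID (EID)
    name: str                     # event name
    timestamp: str                # event time as an ISO date string (assumption)
    cti: Dict[str, List[str]]     # CTI by data type, e.g. {"mutex": ["hello"]}
    # Tree links are excluded from repr/eq to avoid parent <-> child recursion.
    parent: Optional["EventNode"] = field(default=None, repr=False, compare=False)
    children: List["EventNode"] = field(default_factory=list, repr=False, compare=False)
    # Event relations: (related EID, "type:data"), e.g. (3, "account:abc").
    relations: List[Tuple[int, str]] = field(default_factory=list)

    def add_child(self, child: "EventNode") -> None:
        child.parent = self
        self.children.append(child)


def relate(a: EventNode, b: EventNode, cti_type: str, value: str) -> None:
    """Record one relation on both nodes, mirroring the EID 1 / EID 3 example."""
    a.relations.append((b.eid, f"{cti_type}:{value}"))
    b.relations.append((a.eid, f"{cti_type}:{value}"))


# Usage: EID 1 (root) and EID 3 linked by the shared account "abc".
root = EventNode(1, "event 1", "2015-05-01", {"account": ["abc"]})
eid3 = EventNode(3, "event 3", "2016-01-03", {"account": ["abc"]})
root.add_child(eid3)
relate(root, eid3, "account", "abc")
```

Storing the relation on both nodes lets either event list its related events without walking the tree.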
10. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7; the ERT holds only the root, EID 1 (May-1-2015) ]
The initial event to analyze correlation is added in ERT as the root node.
11. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7; the ERT holds only the root, EID 1 (May-1-2015) ]
The database storing the CTI of incident events is provided as an input.
12. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7 and the ERT root EID 1 (May-1-2015), with the ERTConst arguments node and new_event marked ]
EID 1 is already in ERT, so it is ignored.
13. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7 and the ERT root EID 1 (May-1-2015), with the ERTConst arguments node and new_event marked ]
There is no relation between EID 1 and EID 2 in the ERT, so the existence of a relation is checked.
Since no relation is found, the loop moves on to the next iteration.
14. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7; ERT: EID 1 (May-1-2015) → EID 3 (Jan-3-2016), with the relation recorded on both nodes (EID:3, account:abc on EID 1; EID:1, account:abc on EID 3); arguments node and new_event marked ]
Since EID 1 has a relation with EID 3, which is not yet in the ERT, EID 3 and the relation are stored in the ERT.
In particular, the relation is added to both the EID 1 node and the EID 3 node.
Because a new node has been added to the ERT, ERTConst is called recursively from it (DFS).
15. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7; ERT: EID 1 (May-1-2015) → EID 3 (Jan-3-2016), with the relation recorded on both nodes (EID:3, account:abc on EID 1; EID:1, account:abc on EID 3); arguments node and new_event marked ]
Since EID 1 already has a relation with EID 3, it is ignored.
16. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7; ERT: EID 1 (May-1-2015) → EID 3 (Jan-3-2016), with the relation recorded on both nodes (EID:3, account:abc on EID 1; EID:1, account:abc on EID 3); arguments node and new_event marked ]
There is no relation between EID 2 and EID 3 in the ERT, so the existence of a relation is checked.
Since no relation is found, the loop moves on to the next iteration.
17. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7; ERT: EID 1 (May-1-2015) → EID 3 (Jan-3-2016), with the relation recorded on both nodes (EID:3, account:abc on EID 1; EID:1, account:abc on EID 3); arguments node and new_event marked ]
EID 3 is already in ERT, so it is ignored.
18. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7; ERT: EID 1 (May-1-2015) → EID 3 (Jan-3-2016), with the relation recorded on both nodes (EID:3, account:abc on EID 1; EID:1, account:abc on EID 3); arguments node and new_event marked ]
There is no relation between EID 3 and EID 4 in the ERT, so the existence of a relation is checked.
Since no relation is found, the loop moves on to the next iteration.
19. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7; ERT: EID 1 (May-1-2015) → EID 3 (Jan-3-2016) → EID 5 (Mar-16-2015), with the relations account:abc (between EID 1 and EID 3) and mutex:hello (between EID 3 and EID 5) recorded on both nodes; arguments node and new_event marked ]
Since EID 3 has a relation with EID 5, which is not yet in the ERT, EID 5 and the relation are stored in the ERT.
Because a new node has been added to the ERT, ERTConst is called recursively again, starting from it (DFS).
21. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7; ERT: EID 1 (May-1-2015) with children EID 3 (Jan-3-2016, account:abc) and EID 4 (Dec-11-2015, string:wakeup); EID 5 (Mar-16-2015) under EID 3 (mutex:hello), also related to EID 1 (IP:1.1.1.1) ]
Suppose no more relations are found below the EID 3 node.
Then “node” returns to EID 1, and “new_event” is EID 4.
Since a relation exists between EID 1 and EID 4, which is not yet in the ERT, EID 4 and the relation are added to the ERT.
At this point, the left and right branches of EID 1 show different characteristics.
22. Event Relation Tree
• ERT Construction (example)
[ Figure: Event DB with EID 1 to EID 7; ERT: EID 1 (May-1-2015) with children EID 3 (Jan-3-2016, account:abc) and EID 4 (Dec-11-2015, string:wakeup); EID 5 (Mar-16-2015) under EID 3 (mutex:hello), also related to EID 1 (IP:1.1.1.1); EID 7 (Apr-30-2015) under EID 4 (boundary:alphabeta) ]
Suppose EID 4 has no relations with the remaining events up to EID 6, and a relation is found with EID 7.
EID 7 and the relation are added to the ERT.
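The walkthrough above can be condensed into a short recursive sketch. It is our own reconstruction under stated assumptions, not the authors' published algorithm: `find_relation` does the exact string matching that the slides attribute to FindRelation, `ert_const` plays the role of ERTConst, each relation is stored once per event pair (so it is visible from both nodes), and the CTI values for EID 2 and EID 6 are invented so that they match nothing. All function and variable names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set

CTI = Dict[str, Set[str]]  # data type -> values, e.g. {"mutex": {"hello"}}
Relations = Dict[FrozenSet[int], List[str]]

# Event DB mirroring the walkthrough. EID 2 and EID 6 carry made-up values
# chosen so that they relate to nothing.
EVENT_DB: Dict[int, CTI] = {
    1: {"account": {"abc"}, "ip": {"1.1.1.1"}, "string": {"wakeup"}},
    2: {"mutex": {"unrelated-2"}},
    3: {"account": {"abc"}, "mutex": {"hello"}},
    4: {"string": {"wakeup"}, "boundary": {"alphabeta"}},
    5: {"mutex": {"hello"}, "ip": {"1.1.1.1"}},
    6: {"account": {"xyz"}},
    7: {"boundary": {"alphabeta"}},
}


def find_relation(a: CTI, b: CTI) -> List[str]:
    """Exact string matching: every 'type:value' item the two events share."""
    return [f"{t}:{v}" for t in sorted(a.keys() & b.keys()) for v in sorted(a[t] & b[t])]


def ert_const(node: int, db: Dict[int, CTI],
              parent: Dict[int, Optional[int]], relations: Relations) -> None:
    """Compare `node` with every DB event; recurse into each newly added node (DFS)."""
    for new_event in sorted(db):
        if new_event == node:
            continue  # the node itself is ignored
        pair = frozenset((node, new_event))
        if pair in relations:
            continue  # relation already stored in the ERT
        shared = find_relation(db[node], db[new_event])
        if not shared:
            continue  # no relation: next iteration
        relations[pair] = shared  # one entry, visible from both nodes
        if new_event not in parent:
            parent[new_event] = node  # new event becomes a child of `node`
            ert_const(new_event, db, parent, relations)


def build_ert(root: int, db: Dict[int, CTI]):
    parent: Dict[int, Optional[int]] = {root: None}
    relations: Relations = {}
    ert_const(root, db, parent, relations)
    return parent, relations


parent, relations = build_ert(1, EVENT_DB)
print(parent)  # {1: None, 3: 1, 5: 3, 4: 1, 7: 4}
```

Iterating over the database in EID order reproduces the tree from the slides: EID 3 and EID 4 under EID 1, EID 5 under EID 3, and EID 7 under EID 4. The extra relation between EID 5 and EID 1 (IP:1.1.1.1) is stored without adding a node, because both events are already in the tree.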
23. Event Transition Graph
• Event Transition Graph (ETG) is constructed
from ERT by sorting with respect to event time.
• To preserve the branching characteristic of the ERT, a graph structure is adopted for the ETG.
[ Figure: the ERT (left) is transformed into the ETG (right) by descending-order sorting on event time (steps ① and ②); events: EID 1 May-1-2015, EID 3 Jan-3-2016, EID 5 Mar-16-2015, EID 4 Dec-11-2015, EID 7 Apr-30-2015 ]
24. Event Transition Graph
• ETG Construction (example)
[ Figure: the example ERT (EID 1 May-1-2015, EID 3 Jan-3-2016, EID 5 Mar-16-2015, EID 4 Dec-11-2015, EID 7 Apr-30-2015) with the ETGTransform variables node, iter, curr, dir (up / down), and att; algorithm steps ① to ④ and ❶ to ❹ ]
25. Event Transition Graph
• ETG Construction (example)
[ Figure: ETG construction step on the example events, with the variables node, iter, curr, dir (up / down), and att ]
26. Event Transition Graph
• ETG Construction (example)
[ Figure: ETG construction step on the example events, with the variables node, iter, curr, dir (up / down), and att ]
27. Event Transition Graph
• ETG Construction (example)
[ Figure: ETG construction step on the example events, with the variables node, iter, curr, dir (up / down), and att; steps ① to ④ and ❶ to ❹ ]
28. Event Transition Graph
• ETG Construction (example)
[ Figure: ETG construction step on the example events, with the variables node, iter, curr, dir (up / down), and att ]
29. Event Transition Graph
• ETG Construction (example)
[ Figure: ETG construction step on the example events, with the variables node, iter, curr, dir (up / down), and att ]
30. Event Transition Graph
• ETG Construction (example)
[ Figure: ETG construction step on the example events, with the variables node, iter, curr, dir (up / down), and att; steps ① to ④ and ❶ to ❹ ]
31. Event Transition Graph
• ETG Construction (example)
[ Figure: ETG construction step on the example events, with the variables node, iter, curr, dir (up / down), and att ]
32. Event Transition Graph
• ETG Construction (example)
[ Figure: ETG construction step showing the flipped branch EID 4 (Dec-11-2015) and EID 7 (Apr-30-2015), with the variables node, iter, curr, dir (up / down), and att ]
If the ancestor of EID 1, EID 3, had any relation to
the flipped branch, it might be inserted between EID 1 and EID 3.
33. Event Transition Graph
• ETG Construction (example)
[ Figure: the resulting ETG over EID 1 (May-1-2015), EID 3 (Jan-3-2016), EID 5 (Mar-16-2015), EID 4 (Dec-11-2015), and EID 7 (Apr-30-2015) ]
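The slides step through ETGTransform with its node, iter, curr, dir, and att variables; that procedure is not reproduced here. The sketch below is a deliberately simpler substitute written under our own assumptions: each root-to-leaf chain of the ERT is sorted by event time with the newest event first (our reading of “descending order”), and consecutive events in a chain become ETG edges. The names `root_to_leaf_chains`, `etg_edges`, and `etg_roots` are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Example ERT from the walkthrough as child -> parent, plus event dates.
ERT_PARENT: Dict[int, Optional[int]] = {1: None, 3: 1, 5: 3, 4: 1, 7: 4}
EVENT_DATE: Dict[int, date] = {
    1: date(2015, 5, 1),
    3: date(2016, 1, 3),
    5: date(2015, 3, 16),
    4: date(2015, 12, 11),
    7: date(2015, 4, 30),
}


def root_to_leaf_chains(parent: Dict[int, Optional[int]]) -> List[List[int]]:
    """List every path from the ERT root down to a leaf."""
    children: Dict[int, List[int]] = {n: [] for n in parent}
    roots = [n for n, p in parent.items() if p is None]
    for node, p in parent.items():
        if p is not None:
            children[p].append(node)
    chains: List[List[int]] = []

    def walk(node: int, path: List[int]) -> None:
        path = path + [node]
        if not children[node]:
            chains.append(path)
        for child in children[node]:
            walk(child, path)

    for r in roots:
        walk(r, [])
    return chains


def etg_edges(parent: Dict[int, Optional[int]],
              when: Dict[int, date]) -> Set[Tuple[int, int]]:
    """Sort each chain newest-first and link consecutive events (newer -> older)."""
    edges: Set[Tuple[int, int]] = set()
    for chain in root_to_leaf_chains(parent):
        ordered = sorted(chain, key=lambda eid: when[eid], reverse=True)
        edges.update(zip(ordered, ordered[1:]))
    return edges


def etg_roots(edges: Set[Tuple[int, int]], nodes: Iterable[int]) -> Set[int]:
    """Graph roots: events with no incoming edge (the newest start points)."""
    targets = {dst for _, dst in edges}
    return {n for n in nodes if n not in targets}


edges = etg_edges(ERT_PARENT, EVENT_DATE)
print(sorted(edges))                          # [(1, 5), (1, 7), (3, 1), (4, 1)]
print(sorted(etg_roots(edges, ERT_PARENT)))   # [3, 4]
```

For the example ERT this yields the two chains described in the talk, EID 4 - EID 1 - EID 7 and EID 3 - EID 1 - EID 5, so EID 1 ends up with two parents and two children; this is why a graph rather than a tree is needed. Because each chain is sorted independently, the sketch ignores cross-branch relations such as the case on slide 32, where an ancestor relates to the flipped branch.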
35. Experiments
• Dataset Generation
– Duration: 2011-2015
– Collection
• incident events in the field: malware, spear-phishing email…
• from malware-sharing sites, e.g., VirusTotal
– Data types: URL, IP, accounts, PDB path, mutex, boundary, file mapping, and other keywords.
– Size: around 18,000 records of 820 events
[ Sample Data ]
36. Experiments
• Preprocessing
– email accounts: domains are ignored
• helloworld0123@gmail.com vs. helloworld0123@hotmail.com
– directories: each directory name in a path is parsed
• C:\Users\BestHacker_John\Campaign...
• D:\BestHacker_John\MyProject...
– IP classes: A, B, C, or D class
• B class: 1.2.3.4 → 1.2.X.X / C class: 1.2.3.4 → 1.2.3.X
– other heuristics and expert knowledge could be applied.
• e.g., (full name) Bart Simpson and B. Simpson
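The preprocessing rules on this slide are simple string normalizations. The following sketch shows one way they could be written with the Python standard library. The function names are hypothetical, the path endings after “...” are illustrative stand-ins, and only class A, B, and C prefixes are handled.

```python
import ipaddress
from pathlib import PureWindowsPath
from typing import List


def account_id(email: str) -> str:
    """Ignore the mail domain: keep only the part before '@'."""
    return email.split("@", 1)[0]


def directory_names(path: str) -> List[str]:
    """Parse a Windows path into its directory names, dropping the drive/root."""
    p = PureWindowsPath(path)
    return list(p.parts[1:]) if p.anchor else list(p.parts)


def ip_prefix(ip: str, ip_class: str) -> str:
    """Mask an IPv4 address to its class A, B, or C network part."""
    keep = {"A": 1, "B": 2, "C": 3}[ip_class]
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(octets[:keep] + ["X"] * (4 - keep))


# Usage with the slide's examples (path endings are illustrative).
same_account = account_id("helloworld0123@gmail.com") == account_id("helloworld0123@hotmail.com")
shared_dirs = set(directory_names(r"C:\Users\BestHacker_John\Campaign")) & set(
    directory_names(r"D:\BestHacker_John\MyProject")
)
print(same_account, shared_dirs)                              # True {'BestHacker_John'}
print(ip_prefix("1.2.3.4", "B"), ip_prefix("1.2.3.4", "C"))   # 1.2.X.X 1.2.3.X
```

Using `PureWindowsPath` means the Windows-style paths parse correctly even when the analysis runs on another operating system.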
37. Experiments
• Case Study 1
– At the beginning of this campaign, compromised websites were used as C2 servers.
– But at some point, the group used public cloud services to make the malware’s network traffic appear normal.
39. Experiments
[ ETG for case study 1 with event groups Ⓐ and Ⓑ; annotations: the first event to appear in Ⓐ, the oldest event in Ⓑ, descending start points (graph roots), boundary used in C2 server, mutex in malware, cloud service used as C2 server ]
40. Experiments
• Case Study 2
– The attacker group of the second case study is known as the Lazarus Group, which attacked Sony Pictures Entertainment (SPE) in 2014.
Novetta, “Operation Blockbuster: Unraveling the Long Thread of
the Sony Attack”, 2015.
– Several types of malware were distributed through compromised websites and spear-phishing emails while their functionality kept changing.
42. Experiments
• Timeline Comparison of Analysis to Novetta’s Report*
* Novetta, “Operation Blockbuster: Unraveling the Long Thread of the Sony Attack,” 2015.
43. Future Research
Currently, only direct string matching is applied to the CTI of each event.
Probabilistic or heuristic approaches need to be added,
e.g., calculating a similarity score (probability) between events, or
finding a relation between a full name and its initials.
[ Case 1: EID: 1 and EID: 2 share element a; EID: 2 and EID: 3 share elements b, c, and d ]
Do the relations “EID: 1 - EID: 2” and “EID: 2 - EID: 3” have the same similarity level?
[ Case 2: EID: 1 and EID: 2 share C:\Program Files ]
Can we say that EID: 1 and EID: 2 really have a meaningful relation?
[ Case 3: EID: 1 and EID: 2 with Bart Simpson / bartsimpson.com and B. Simpson / bsimpson.net ]
Isn’t it possible to say that EID: 1 and EID: 2 have a relation?
44. Future Research
[ Figure: an ERT over events EID 1 to EID 4 linked by relations a to d (event dates from Feb. 2015 to Dec. 2015), and two candidate ETGs derived from it ]
It seems more reasonable to maintain the event sequence “EID: 3 - EID: 2 - EID: 4”;
EID: 1 may not have any relation to EID: 4.
Currently, the ETG preserves the chain of branches if the events lie in an ancestor-sibling relation in the ERT.
Also, a probabilistic approach needs to be added,
e.g., calculating a relation score (probability) between events.
Hello, everybody.
I am pleased to introduce my research paper entitled “I Know What You Did Before”: General Framework for Correlation Analysis of Cyber Threat Incidents.
Since there is a lot of material, I will go through it as quickly as possible to leave time for Q&A at the end.
The outline is as follows.
After presenting the objectives of our research, I will introduce two novel concepts, the Event Relation Tree and the Event Transition Graph, which are the keys to the correlation analysis.
Finally, I will conclude with the experimental results and future research directions.
Unlike physical crime, where the evidence collected at the crime scene reveals most of the facts, cybercrime is hard to analyze using only the evidence collected from a victim machine.
For this reason, and because of the shared interest in fighting cybercrime, CTI sharing is being promoted both nationally and internationally.
Substantial effort is also being devoted to systematic CTI expression and sharing.
These are some of the well-known frameworks for CTI expression.
And these are some of the well-known CTI exchange frameworks, or platforms.
Beyond applying CTI to network defense systems or using it to understand individual incidents, we need to extract the relations and trends among the CTI describing cyber incidents.
Some existing methods assist CTI analysis by visualizing incidents as graphs based simply on the attributes they share.
For example, in the left image, generated by iDefense IntelGraph from Verisign, two events are linked by a similar malware family.
With such simple connectivity-based representations, it is hard to analyze and capture the underlying features of CTI.
Therefore, we need a framework for CTI analysis, not only methods to represent it.
Despite the many advantages of CTI analysis, little research has been conducted on it.
Such advantages are:
Interoperability, which means that formatted CTI can be used independently of machine, vendor, and organization, just like the XML format.
Compact expression, which means that heterogeneous source data and threat information can be expressed simply as formatted CTI.
Thanks to these advantages and to CTI sharing, one can conduct long-term and nationwide threat analysis.
Before developing the CTI correlation analysis framework, we set three objectives that the framework should achieve.
First, the framework should handle all types of data, such as IPs, URLs, malware, hashes, and arbitrary string values.
Second, since an APT group’s intention or attack vector can keep changing, the framework should capture those temporal variations.
Finally, the framework should be general, independent of the specific techniques inside it, since those techniques may not yet be mature and can be improved further.
To achieve these objectives, we introduce a tree-like graph structure named the Event Relation Tree (ERT), which contains both the information of each event and the relations between events.
The event information consists of the event name and ID, the timestamp of the event, and the CTI of the event.
The relations of a node are expressed by links to its parent and child nodes, and by event relations, each a pair of the related event ID and the CTI connecting the two events.
The algorithm on the right is the ERT construction procedure; I will show how it works using an example.
Suppose there are seven events in the event database.
If EID 1 is the initial event for the relation analysis, it is added to the ERT as the root node.
The database storing the CTI of the events is provided as an input.
From here, the recursive and iterative ERT construction process begins.
In this for loop, the ERTConst function examines the relations between the root event and the other events in the DB.
In the ERTConst function, the ‘node’ argument is a node in the ERT, and the ‘new_event’ argument is the event in the database whose relation to ‘node’ is examined.
Since the node of EID 1 is already in the ERT, it is ignored at this stage.
Next, EID 2 in the DB is compared to the root node of the ERT.
No relation between EID 1 and EID 2 is stored in the ERT, so the FindRelation function checks whether one exists.
Since no relation is found, the loop moves on to the next iteration.
Next, EID 3 in the DB is compared to the root node of the ERT.
Since a new relation is found between EID 1 and EID 3, which is not yet in the ERT, the node of EID 3 and the relation are stored in the ERT.
In particular, the relation is added to both the EID 1 node and the EID 3 node.
Because a new node has been added to the ERT, ERTConst is called recursively from it.
This means the ERT is constructed in a depth-first search (DFS) manner.
Next, because ERTConst is called recursively from EID 3, EID 3 is passed as the ‘node’ argument and EID 1 as ‘new_event’.
Since EID 1 already has a relation with EID 3, it is ignored.
Next, EID 2 in the DB is compared to EID 3 in the ERT.
No relation between EID 2 and EID 3 is stored in the ERT, so its existence is checked. Since no relation is found, the loop moves on to the next iteration.
Here, EID 3 is already in the ERT, so it is ignored.
Next, EID 4 in the DB is compared to EID 3 in the ERT.
No relation between EID 4 and EID 3 is stored in the ERT, so its existence is checked. Since no relation is found, the loop moves on to the next iteration.
Now, a relation is found between EID 3 in the ERT and EID 5 in the database, which is new to the ERT.
The node of EID 5 is added as a child of the EID 3 node, together with its relation to EID 3. The relation is added to the EID 3 node as well.
Because the new node of EID 5 has been added to the ERT, ERTConst is called recursively again, starting from that node.
If EID 5 has a relation to EID 1, both of which are already in the ERT, only the relation is added to the two nodes.
Suppose no more relations are found below the EID 3 node.
Then “node” returns to EID 1, and “new_event” is EID 4.
Since a relation exists between EID 1 and EID 4, which is not yet in the ERT, the node of EID 4 is added to the ERT.
At this point, the left and right branches of EID 1 show different characteristics: the left branch of the root has a chain of relations that is not shared by the nodes in the right branch, and vice versa.
Finally, suppose EID 4 in the ERT has no relations with the events up to EID 6, and then a new relation is found with EID 7.
The node of EID 7 is newly added to the ERT and the relation is attached to the node of EID 4.
This is the final ERT produced by the correlation analysis starting from EID 1.
From the previously constructed ERT, we derive a new graph-like structure called the Event Transition Graph (ETG) to capture the temporal transitions of cyber incidents with the same attribution.
To preserve the branching characteristic of the ERT, a graph structure is adopted for the ETG. Let’s see the example.
The left tree is the ERT we created before. To sort the right branch in descending order, the node of EID 4 should move up above the root node, as in the right graph.
Next, to sort the left branch of the ERT, the node of EID 3 could move up to the node of EID 4, as indicated by circled number 1.
However, to preserve the branching characteristic of the original ERT, the node of EID 3 is attached to the node of EID 1 as a new parent, so that both the relation chain EID 4 - EID 1 - EID 7 and the relation chain EID 3 - EID 1 - EID 5 are preserved.
The transformation algorithm from ERT to ETG is shown on the right; now let’s see how it works on the ERT we generated.
The algorithm takes two parameters: ‘node’, the pivot node from which sorting starts, and ‘dir’, the sorting direction.
Sorting starts from the root node in the downstream direction.
Next, the algorithm checks whether the sorting process has reached the end.
Since the current direction is downstream, the child nodes of the root are selected for iteration.
For each selected node, called the ‘curr’ node, the algorithm checks whether it should be relocated, and returns the node to be newly attached, called the ‘att’ node.
Once the ‘att’ node is selected, the parent-child relations below the ‘curr’ node are flipped, since the direction is downstream. If the direction were upstream, the relations above the ‘curr’ node would be flipped.
Finally, the ‘curr’ node is attached as the parent of the ‘att’ node, and the sorting process restarts from the ‘curr’ node in the reverse (upstream) direction, since the branch was flipped.
Now ‘node’ is the node of EID 3, and it is not at the end of the sorting process, since the node of EID 5 exists as its parent.
The parent node is set as both ‘iter’ and the ‘curr’ node.
Since EID 5 occurred before EID 1, the node of EID 1 is selected as the ‘att’ node.
The node of EID 5 is reattached as a child of the node of EID 1, and the ETGTransform method is called recursively from the node of EID 5.
Since the direction is downstream and the node of EID 5 is a leaf, the current sorting recursion terminates.
Next, the right branch of the EID 1 node is sorted.
As with the left branch of the EID 1 node, the ‘curr’ node, which is the node of EID 4 in this iteration, should be moved to become the parent of the EID 1 node.
Before relocation, the right branch is flipped.
The flipped branch is added as another ancestor branch of the EID 1 node.
This is the final ETG, after the EID 7 node is relocated at the end.
We evaluated the usability of the ERT and ETG with the following process.
Detected and collected cyber incidents (events) are analyzed and investigated by human analysts, and the results are stored in MISP, which can store CTI in a structured format.
For malware, we applied our own YARA rules and added the matching results to the database.
After the preprocessing step, we construct the ERT and ETG from the initial event to find correlations.
The final analysis is performed by human analysts using the ERT and the ETG.
The dataset was built from cyber incidents that actually occurred in the field from 2011 to 2015. We also gathered additional malware of interest from a malware repository.
The stored data types include URLs, IPs, accounts, PDB paths, and so on. The dataset contains about 18,000 records from 820 events.
The image below shows sample data illustrating how the stored CTI is composed.
It is possible to preprocess the dataset in many ways.
These are some examples of preprocessing one can apply.
For example, e-mail domains could be ignored to consider only account IDs.
Also, a full directory path may be meaningful, but exact matches on the full path are rarely observed. Therefore, the directory names obtained by parsing the path can be used instead.
One could also consider only a specific range of an IP address, and any other heuristics and expert knowledge can be applied in the preprocessing step.
Next, I will present the results of two case studies.
This is a brief background of the events in the first case study.
At the beginning of the campaign, compromised websites were used as C2 servers.
But at some point, the group started using public cloud services to make the malware’s network traffic appear normal.
This is the ERT constructed from initial event ID 18, the red-circled node, which is also the root node of the ERT. Part of the data is masked with asterisks for security reasons.
Following the descendants of the root, the tree branches in two directions.
The events in the right branch used compromised web servers for C2, and the events in the left branch used a cloud service.
During the transition between the two types, some malware shared a common mutex string.
This is the ETG converted from the previous ERT.
The circled number 3, connected to the root nodes, is the first event to appear in the left event group.
Note that events #5 and #4, which actually belong to the right group, are branched to the left.
This is because they are related, through the mutex, to events in the left group. This is one of the current limitations to be improved. I will explain the limitations in detail at the end of the presentation.
The attacker group in the second case study is known as the Lazarus Group, which attacked Sony Pictures Entertainment (SPE) in 2014.
This group distributed several types of malware through compromised websites and spear-phishing emails while continually changing their functionality.
The bottom-left image is the ERT and the main image is the ETG. I will briefly explain the analysis results using the ETG constructed from initial event 82.
- The C2 commands of the bot-type malware evolved across event groups A to D.
- There are fairly substantial indications that those events were caused by the same group:
some C2 commands are shared between A and D and between B and C;
the boundary string (a) is shared by C and D.
- In the incidents in D from Sept. 2015, where only a downloader was found, the downloader seems to fetch bots similar to 2-5, which had spread from Jun. to Jul. 2015.
- In Dec. 2014, bots were spread by spear-phishing email (6-8).
- The attacker group behind the events in E appears to be different from that of the other events in this ETG. However, event 9 and many events in E shared the same class C IP range used for attacks, which implies that the two attacker groups might be related.
This is a comparison of our analysis with Novetta’s report.
Types “A” to “D” are grouped by the ETG, and their temporal activity is shown as red lines.
The blue lines show the attack group’s campaigns and their durations.
The types identified by our analysis are similar to Novetta’s, in that the points at which each type of malware was first distributed are close to each other.
Moreover, our analysis indicates that the attack group had been active frequently, at least until last year, using similar types of malware.
The functions in the proposed general correlation analysis framework have room for improvement.
Let’s consider three cases.
Do the relation “EID: 1 - EID: 2”, sharing one data element, and the relation “EID: 2 - EID: 3”, sharing three data elements, have the same similarity level?
In the second case, can we really say that EID: 1 and EID: 2 have a meaningful relation when they share a very common OS directory path?
In the third case, isn’t it possible to say that EID: 1 and EID: 2 have a relation?
In these cases, the FindRelation function in the ERT construction algorithm cannot find reasonable relations between events, since currently only direct string matching is applied to the CTI of each event.
To improve this, probabilistic or heuristic approaches need to be incorporated into the FindRelation function.
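As one illustration of a non-exact FindRelation, a relation could be scored instead of simply detected. The sketch below uses Jaccard similarity over each event's “type:value” items and ignores values on a stop-list of very common items; both choices, the stop-list contents, the example values, and the name `relation_score` are our own assumptions rather than part of the proposed framework.

```python
from typing import Iterable, Set

# Hypothetical stop-list of values too common to indicate a real relation.
COMMON_VALUES: Set[str] = {r"path:C:\Program Files"}


def relation_score(a: Iterable[str], b: Iterable[str],
                   common: Set[str] = COMMON_VALUES) -> float:
    """Jaccard similarity of two events' 'type:value' items, minus common values."""
    sa, sb = set(a) - common, set(b) - common
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Case 1 (illustrative values): EID 1 - EID 2 share one element, EID 2 - EID 3 share three.
eid1 = {"ip:1.1.1.1", "mutex:m1", "account:x"}
eid2 = {"ip:1.1.1.1", "mutex:m2", "account:y"}
eid3 = {"ip:1.1.1.1", "mutex:m2", "account:y", "url:z"}
print(relation_score(eid1, eid2), relation_score(eid2, eid3))  # 0.2 0.75

# Case 2: only a very common OS directory is shared.
print(relation_score({r"path:C:\Program Files", "mutex:a"},
                     {r"path:C:\Program Files", "mutex:b"}))  # 0.0
```

In this example the score separates the first two cases: one shared element scores lower than three, and a shared C:\Program Files path alone scores zero. The third case (a full name versus initials) would need an additional matching heuristic that this sketch does not attempt.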
To explain the improvements required in the ETG transformation algorithm, consider the ETG transformation of the ERT above.
With the current GetNodeToAttach function in the ETG transformation algorithm, the ERT would be transformed into the first ETG, since the ETG preserves the chain of branches if the events lie in an ancestor-sibling relation in the ERT.
However, it seems more reasonable to maintain only the event sequence “EID: 3 - EID: 2 - EID: 4”, because EID: 1 may not have any relation to EID: 4.
To improve this, again, probabilistic approaches need to be incorporated into the GetNodeToAttach function.
This is the end of my presentation. If you have any questions, please ask me here or via the following e-mail.
Thank you.