CLUE
CLUSTERING FOR MINING WEB URLS
Andrea Morichetta
Enrico Bocchi
Hassan Metwalley
Marco Mellia
name.surname@polito.it
ITC28
Würzburg, September 15th, 2016
CONTEXT
Part 1
SCENARIO
Internet evolution and the need for monitoring.
3,256,931,615
Users in the world
December 2nd 2015 [http://www.internetlivestats.com]
1,930,257,214
Subscriptions to “mobile networks”
December 2013
[Source: ITU]
Network Monitoring
To obtain quality and security
THE WEB TODAY
Increasing complexity of the Internet.
Ads
Tracking
Malware
THE WEB AND MALICIOUS TRAFFIC
HTTP traffic monitoring to track anomalous and potentially malicious behaviors.
Malware
Zero-day
Compromised machines talk to the C&C.
C&C Server
Firewall
Compromised Host
mlw.com/abc
Firewall blocks malicious requests using static rules.
THE WEB AND MALICIOUS TRAFFIC
HTTP traffic monitoring to track anomalous and potentially malicious behaviors.
Malware
Zero-day
C&C Server
Firewall
Compromised host
mlw.com/abc
malw.com/abd
Algorithmically generated URLs starting from seeds (e.g. current date or Twitter trends)
They elude static, blacklist-based controls by changing the URLs' paths and hostnames
HTTP traffic monitoring
Group algorithmically generated URLs.
Control and monitor possible, unchecked malicious behaviors.
Or, more generally, better understand the traffic on the Web.
EXAMPLE: TIDSERV
Malware TidServ analysis.
swltcho81.com/NZf4A07d7r7yE1C1dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ3ZTEzMTIy
MDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2dsZS5pdCZxPXV
pbmZlIG5ZGVzaw==38c
rammyjuke.com/kaI1wWRd8Y5yfbU9dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ3ZTEzMT
IyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2dsZS5pdCZxP
WZvcnVtIGFybWF0YSBkZWxsZSB0ZW5lYnJl37g
Profit-making purpose
It spreads with users' complicity
URLs characterized by pseudo-randomness
Trojan Rootkit
How to automatically detect this behavior?
Which services adopt these techniques?
METHODOLOGY
Part 2
CLUE in a nutshell
CLUE: CLustering for URL Exploration
Big data approach for HTTP mining
• HTTP traffic analysis -> How to find similar URLs?
• How similar are two strings?
• How to group similar URLs?
• Clustering algorithms -> Which algorithm? Which parameters?
• How to suggest relevant clusters?
• Highlight relevant clusters to further mine
Pipeline: Log -> Distance calculation -> DBSCAN clustering -> Results
SCENARIO
Traffic collected from a network with more than 20,000 connected hosts.
[Figure: monitoring setup. Labels in the figure: Internal Clients, Edge Router, External Servers, HTTP requests, IDS, Labels]
Pipeline: Log -> URLs extraction -> Distance calculation -> DBSCAN calculation -> Results
INPUT
The aggregated HTTP log carries one entry for each HTTP request. Millions of requests are collected per day.
Timestamp Hostname Path
1339130937 www.emilbanca.it /emilbanca/img2009/angolo_blu_dx.gif
1339130938 fpdownload.adobe.com /pub/swz/crossdomain.xml
1339130941 8lnxpg8vwyuzhbol.com /rkF4Tx3x8N4YR8C5dj0xLjkmaWQ9NmQzNDZkY2FkYzU4Yzk0ODBkZDliODNkNjYxYzIzMmNjZTZhZDY4ZCZhaWQ9MzA0Mjgmc2lk
PTQmb3M9NS4xIDAwMDAgU1AwLjAma3c9WlcxcGJHSmhibU5oRFFwbGJXbHNZbUZ1WTJFTkNtVnlibVZ6ZEc4cmMzUnliM3A2Y
VN0bVpYSnlZWEpoRFFvPSZ1cmw9YUhSMGNEb3ZMM2QzZHk1cGJtSmhibXN1YVhRdlpuVnVZM1JwYjI0dmJHOW5hVzR2YVc1a1
pYZ3Vhbk53UDJ4aGJtYzlhWFFtWVdKcFBUQTNNRGN5Sm1OemN6MHdOekEzTWc9PSZyZWY9ZDNkM0xtVnRhV3hpWVc1allTNX
BkQzl3YjNKMFlXd3ZjR0ZuWlQ5ZmNHRm5aV2xrUFRJMk9ERXNNU1pmWkdGa1BYQnZjblJoYkNaZmMyTm9aVzFoUFZCUFVsUkJU
QT0935A
1339130945 83.133.121.147 /c/kaw0hOOD6x5Jpso2440a89f7bdeb9da9f4b5af9160e66aa908c
1339130946 delivery.jemacpv.com /network/c/adclick.php
1339130946 delivery.jemacpv.com /network/c/adclick.php
1339130947 www.peaktube.com /video_play
1339130948 www.peaktube.com /redirect.php
1339130949 cdn1.static.videobash.com /css/ie8-style-new.css
1339130949 www.videobash.com /video_play
1339130949 cdn1.static.videobash.com /css/style_new.css
1339130954 www.emilbanca.it /emilbanca/img2009/labanca_con.jpg
1339130965 www.emilbanca.it /emilbanca/img2009/labanca_con.jpg
1339130980 img3.iol.it /s/sport/med/balotelli-quotma-quale-derby-di-mercato-quot.jpg
1339130980 rta.criteo.com /dis/rtt.js
1339130980 img1.iol.it /img107/share/pubblicita/07/76/2012/4/nome.jpg
1339130980 img3.iol.it /img107/coldx/appl/01/1047/2011/3/3.gif
1339130980 img1.iol.it /img107/coldx/appl/03/3043/2012/3/03_110x107.jpg
1339130980 img3.iol.it /s/lavoro/116/autogrill-licenzia-in-massa.jpg
1339130980 www.libero.it /
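As a rough illustration of the "URLs extraction" step, the sketch below turns log entries shaped like the rows above (timestamp, hostname, path) into a list of distinct URLs. The function name `extract_urls` and the whitespace-separated three-field layout are assumptions made for this example; the slides do not show the actual parser.

```python
from typing import Iterable, List


def extract_urls(log_lines: Iterable[str]) -> List[str]:
    """Return the distinct URLs (hostname + path) in first-seen order."""
    seen = set()
    urls = []
    for line in log_lines:
        fields = line.split()
        # Assumed layout: timestamp, hostname, path. Anything else
        # (blank lines, fragments of a wrapped path) is skipped.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        _, hostname, path = fields
        url = hostname + path
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample_log = [
    "1339130946 delivery.jemacpv.com /network/c/adclick.php",
    "1339130946 delivery.jemacpv.com /network/c/adclick.php",
    "1339130947 www.peaktube.com /video_play",
    "1339130980 www.libero.it /",
]
urls = extract_urls(sample_log)
print(urls)
```

Keeping only distinct URLs mirrors the later "distinct URL elements" count and avoids computing distances for repeated requests.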
How to express URL similarity?
SIMILARITY
Comparing elements without good a priori knowledge of them.
Edit Distance
Class of distance functions in which, given two strings s and t, the distance is the cost of the best sequence of edit operations that converts s to t.
LEVENSHTEIN DISTANCE
Simple Levenshtein distance: assigns a unit cost to all edit operations.
JARO DISTANCE
The Jaro algorithm is a measure that evaluates the number and order of the characters in common.
URL DISTANCE
Modified Levenshtein: unit weight for insertions and deletions, double weight for replacements.
DISTANCE EVALUATION IN PRACTICE
Comparing the behavior of distance measures on TidServ elements.
LEVENSHTEIN
DISTANCE
a.
swltcho81.com/NZf4A07d7r7yE1C1dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ
3ZTEzMTIyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdv
b2dsZS5pdCZxPXVpbmZlIG5ZGVzaw==38c
b.
iau71nag001.com/NZf4A07d7r7yE1C1dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OG
Q3ZTEzMTIyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmd
vb2dsZS5pdCZxPXVpbmZlIG15ZGVzaw==38c
c.
rammyjuke.com/kaI1wWRd8Y5yfbU9dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ
3ZTEzMTIyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdv
b2dsZS5pdCZxPWZvcnVtIGFybWF0YSBkZWxsZSB0ZW5lYnJl37g
d.
iau71nag001.com/Kvb13nWd6P4XrFs3dmVyPTQuMiZiaWQ9MDU0NWQwZDQwY2MyODU4YWNj
YzFlZjJkM2FiZDA5N2RiYmRlYmVkZiZhaWQ9NTAwMTgmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdv
b2dsZS5pdCZxPWZhY2Vib29r27c
e.
zhakazth.cn/qkF3Vrye5c4qHoo4dmVyPTUuMCZzPTAmYmlkPTA1MzMyNGU1MzQzMDY5NTZiYW
YxNGViYTQ5YWY4ZGZhM2I2OWEwYTQmYWlkPTMwNDIxJnNpZD0zJmVuZz13d3cuZ29vZ2xlLml
0JnE9dHJvbWJhdGErdnVhaWVyK2NvbiticmFzaWxpYW5hK2luK3NwaWFnZ2lhJng4Nj02NA==16h
a-b: 11   a-c: 56   a-d: 97   a-e: 182
DISTANCE EVALUATION IN PRACTICE
Comparing the behavior of distance measures on TidServ elements.
Pair  LEVENSHTEIN DISTANCE  JARO DISTANCE  URL DISTANCE
a-b   11                    0.13           0.05
a-c   56                    0.28           0.26
a-d   97                    0.33           0.47
a-e   182                   0.38           0.7
Pros:
• enhances differences
• normalized
URL DISTANCE
Measure to calculate string similarity.
FORMULA
Based on Levenshtein distance: unit weight for insertions and deletions, double weight for replacements, plus normalization.
$$D_{URL}(string_1, string_2) = \frac{\mathrm{Levenshtein\ distance}_{mod}(string_1, string_2)}{|string_1| + |string_2|}$$
EXAMPLE
$url_1$ = 'google.com' (10 characters); $url_2$ = '1goggle.com' (11 characters)
$$\mathrm{Levenshtein\ distance}_{mod}(url_1, url_2) = 1\ \text{add (weight 1)} + 1\ \text{replacement (weight 2)} = 3$$
$$D_{URL}(url_1, url_2) = \frac{3}{10 + 11} = 0.143$$
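The URL distance can be sketched in a few lines of Python. This is a minimal illustration of the definition on the slide (insert/delete weight 1, replacement weight 2, normalized by the sum of the two lengths), not the authors' code; the names `modified_levenshtein` and `url_distance` are hypothetical.

```python
def modified_levenshtein(s: str, t: str) -> int:
    """Edit distance with cost 1 for insert/delete and cost 2 for replace."""
    prev = list(range(len(t) + 1))  # "" -> t[:j] takes j insertions
    for i, cs in enumerate(s, start=1):
        curr = [i]  # s[:i] -> "" takes i deletions
        for j, ct in enumerate(t, start=1):
            replace_cost = 0 if cs == ct else 2
            curr.append(min(
                prev[j] + 1,                 # delete cs
                curr[j - 1] + 1,             # insert ct
                prev[j - 1] + replace_cost,  # keep or replace
            ))
        prev = curr
    return prev[-1]


def url_distance(s: str, t: str) -> float:
    """Modified Levenshtein distance normalized by len(s) + len(t)."""
    total = len(s) + len(t)
    if total == 0:
        return 0.0  # assumption: two empty strings are identical
    return modified_levenshtein(s, t) / total


d = url_distance("google.com", "1goggle.com")
print(round(d, 3))  # 0.143, matching the worked example
```

With these weights the value always falls between 0 (identical strings) and 1 (no characters in common), which is what makes a single Epsilon threshold usable across URLs of very different lengths.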
How to use this metric to group similar URLs?
DBSCAN
Clustering algorithm used for grouping URLs together.
Based on the idea of density, intended as the number of points in a specific area; compared to other families of algorithms, it provides partial solutions (not every point must belong to a cluster).
Features
It allows the presence of outliers: it prevents non-coherent elements from being added to a cluster.
No need to define the number of clusters a priori
No need to define centroids
Does not require points in a Euclidean space
Can handle clusters of different shapes, not only globular ones
Parameters
Epsilon: radius of the considered area
Min points: minimum number of points inside the area
[Figure: Example of clustering with DBSCAN]
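To make the two parameters concrete, here is a minimal pure-Python sketch of DBSCAN that works directly on a precomputed distance matrix, so no Euclidean coordinates are needed. It follows the textbook algorithm and is not taken from CLUE; the function name `dbscan`, the `NOISE` marker and the toy data are assumptions for illustration.

```python
from typing import List, Sequence

NOISE = -1


def dbscan(dist: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # A point counts as its own neighbor (distance 0 <= eps).
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: expand
                queue.extend(reachable)
    return labels


# Toy data: points on a line, distance = absolute difference.
points = [0, 1, 2, 10, 11, 12, 30]
dist = [[abs(a - b) for b in points] for a in points]
labels = dbscan(dist, eps=1, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point at 30 is left as an outlier instead of being forced into a cluster, which is the property the slide highlights.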
SCHEMA
Final schema. Developed in Python.
Log files -> Extract HTTP Object URLs -> URLs List
URLs List -> Compute Distance Matrix (URL distance between every pair of elements) -> Distance Matrix
Distance Matrix -> Load Distance Matrix -> Compute DBSCAN -> Clusters, Statistics
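The "Compute Distance Matrix" stage can be sketched as below. The helper `distance_matrix` is hypothetical, and `length_gap` is only a stand-in distance so the example stays self-contained; in the pipeline described here the URL distance would be passed instead.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def distance_matrix(items: Sequence[str],
                    dist_fn: Callable[[str, str], float]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances; the diagonal stays 0."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    # Each unordered pair is evaluated once and mirrored.
    for i, j in combinations(range(n), 2):
        d = dist_fn(items[i], items[j])
        matrix[i][j] = d
        matrix[j][i] = d
    return matrix


def length_gap(a: str, b: str) -> float:
    """Stand-in distance for illustration only (not the URL distance)."""
    return float(abs(len(a) - len(b)))


m = distance_matrix(["a.com/x", "a.com/xy", "b.org/"], length_gap)
print(m)  # [[0.0, 1.0, 1.0], [1.0, 0.0, 2.0], [1.0, 2.0, 0.0]]
```

The number of pairs grows as n(n-1)/2: for the 78,421 distinct URLs of the test set that is roughly 3 billion distance computations, which is one reason scalability is listed as future work.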
RESULTS
Part 3
ANALYSIS
Analysis of DBSCAN clustering on the 34-host test set.
Analysis of the HTTP traffic generated by:
14 hosts infected by TidServ
20 randomly selected hosts
DISTINCT URL ELEMENTS
About          Count
TidServ        228
Other malware  33
Benign         78160
Total          78421
Is it possible to separate all the 228 malicious URLs from the data?
And which parameters should be used?
CLUSTERING
Results for the 34-host test set (14 hosts infected by TidServ).
NUMBER OF OUTLIERS
Performance
The number of outliers decreases as Epsilon grows.
[Figure: number of outliers; annotations: Lots of outliers, Few outliers]
CLUSTERING
Results for the 34-host test set (14 hosts infected by TidServ).
NUMBER OF CLUSTERS
Performance
The relation with the number of clusters is more complicated.
The number of clusters increases for Epsilon = 0.2 and 0.225, because many elements previously considered noise form new clusters.
[Figure: number of clusters; annotations: Lots of very small clusters, Few giant clusters]
Which Epsilon allows us to isolate the 228 malicious URLs?
Note: from more than 78,000 URLs to 300 clusters
CLUSTERING
Results for the 34-host test set (14 hosts infected by TidServ).
CLUSTERING RESULTS FOR TIDSERV - OUTLIERS
Performance
The number of outliers decreases until it reaches 0 for Epsilon = 0.4.
All TidServ URLs are clustered
CLUSTERING
Results for the 34-host test set (14 hosts infected by TidServ).
CLUSTERING RESULTS FOR TIDSERV
Performance
Steady, consistent growth in the number of known elements included, and ability to aggregate additional elements not reported by the IDS.
[Figure: clustered TidServ URLs; annotations: Nr. of IDS-flagged URLs (228), Few giant clusters]
Why are more than 228 URLs actually clustered?
CLUSTERING
Results for the 34-host test set (14 hosts infected by TidServ).
CLUSTERING RESULTS FOR TIDSERV
Cluster ID  TidServ - IDS Count  All elements Count
A           5                    5
B           18                   32
C           5                    6
D           75                   79
E           118                  192
F           6                    6
G           1                    37
Total       228                  357
Do those clusters actually contain similar URLs?
TIDSERV ANALYSIS
Cluster G - Compare Elements
• gnu4oke0r.com/4VY00y9P7Z5xiPs9dmVyPTQuMCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2dsZS5pdCZxPWxvdWlzIGNydWlzZXM=16h
• lkckclcklii1i.com/TAR3vUsX844qz1c5Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==27g
• lkckclckl1i1i.com/TAR3vUsX844qz1c5Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==27g
• lkckclcklii1i.com/ZvP1nw3P6z6XLSs7Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==26g
• lkckclckl1i1i.com/ZvP1nw3P6z6XLSs7Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==26g
• lkckclcklii1i.com/yVv4l79D5E7yT8u9Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==18x
• lkckclckl1i1i.com/yVv4l79D5E7yT8u9Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==18x
• lkckclcklii1i.com/3Zh2DpoP583XBvc2Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==05Z
• lkckclckl1i1i.com/3Zh2DpoP583XBvc2Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==05Z
• lkckclcklii1i.com/ZaW4pfQP6P4Q7EO9Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==06c
• lkckclckl1i1i.com/ZaW4pfQP6P4Q7EO9Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==06c
• lkckclcklii1i.com/SVn4kZCE8Y6MEes8Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==38A
• lkckclckl1i1i.com/SVn4kZCE8Y6MEes8Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTAmcmQ9MA==38A
TidServ OK (better than the IDS!)
But what about the other 300+ clusters?
SILHOUETTE
Silhouette value distribution for some representative clustering results.
CALCULATIONS
Performance
Consider clusters with more than 20 elements
Most clusters have silhouette > 0
TidServ's clusters are not those with the highest silhouette (between 0.4 and 0.7)
Clusters with silhouette > 0 are associated with algorithmically generated URLs
This behavior is evident for silhouette > 0.7
[Figure: distribution of S(C), from sparse clusters to cohesive clusters]
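For reference, a per-cluster silhouette S(C) can be computed from the same precomputed distance matrix used for clustering. The sketch below uses the standard definition, s(i) = (b - a) / max(a, b), averaged over the members of each cluster. Skipping outliers (label -1) and scoring degenerate cases as 0 are assumptions of this example, since the slides do not say how CLUE handles them; the name `cluster_silhouettes` is hypothetical.

```python
from typing import Dict, List, Sequence


def cluster_silhouettes(dist: Sequence[Sequence[float]],
                        labels: Sequence[int]) -> Dict[int, float]:
    """Mean silhouette per cluster; points labelled -1 are ignored."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(idx)

    def mean_dist(i: int, members: List[int]) -> float:
        return sum(dist[i][j] for j in members) / len(members)

    result = {}
    for lab, members in clusters.items():
        scores = []
        for i in members:
            others = [j for j in members if j != i]
            if not others or len(clusters) < 2:
                scores.append(0.0)  # singleton or single cluster: score 0
                continue
            a = mean_dist(i, others)  # cohesion: own cluster
            b = min(mean_dist(i, m)   # separation: nearest other cluster
                    for other, m in clusters.items() if other != lab)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
        result[lab] = sum(scores) / len(scores)
    return result


points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
scores = cluster_silhouettes(dist, [0, 0, 0, 1, 1, 1])
print({lab: round(s, 2) for lab, s in scores.items()})
```

Values close to 1 indicate tight, well-separated clusters, and values near 0 or below indicate sparse or overlapping ones.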
SILHOUETTE
Examples of groupings (Eps = 0.4, MinPts = 4).
Clusters sorted by silhouette coefficient
S(C) = silhouette coefficient for the cluster
S(C)  Main hostname (unique number)       Elements  Activity
0.92  skygo_streaming-i.akamaihd.net (1)  551       Streaming
0.91  ad.doubleclick.net (1)              99        Advertising
0.87  cookex.amp.yahoo.com (1)            61        Malware
0.85  static.simply.com (1)               25        File Hosting
0.81  d24w6bsrhbeh9d.cloudfront.net (1)   63        File Hosting
0.81  mfdclk001.org (1)                   27        Malware
0.78  adserver.webads.it (1)              35        Advertising
0.77  .com (3)                            37        TidServ
0.75  pixel.quantserve.com (1)            57        Advertising
0.72  watson.microsoft.com (1)            29        Windows Debug
0.7   coadvertise.cubecdn.net (1)         36        Advertising
0.69  atdmt.com (2)                       768       Tracking
0.65  su.ff.avast.com (1)                 82        Avast Update
0.64  log.dmtry.com (1)                   24        Advertising
0.61  clickpixelabn.com (1)               32        Malware
Take away
Results on other clusters
Clusters contain very similar URLs
Easy to identify specific services
• Streaming
• Ads
• Malware
• Tracking
• Software update
Most are automatically generated URLs
Helps the monitoring/security analyst to understand network traffic
Malware
(H1) mfdclk001.org
(P1) Y2xrPTEuMjEmYmlkPTUwMGFhNzVjLWY1ZTUtNDhhOC05ZjlkLWY2ODQ3NGYzOGQwZCZhaWQ9MTAwMTAmc2lkPTAmcmQ9MTcuMTEuMjAxMQ==
- (H1) /UVw07ael7p4qVcS6 (P1) 26c
- (H1) /wZl1ELDd7N5quws3 (P1) 26A
- (H1) /SZY35xzx6x4q5Ks1 (P1) 26x
- (H1) /7V10LUdl7Z5mxcS2 (P1) 25A
- (H1) /LZK2BDxe5k4mQiO2 (P1) 17g
- (H1) /WZb3fvgl643q33U7 (P1) 05c
Advertising
(H1) ad.doubleclick.net
(P1) /0_AcquisitionRtr_Apr12_AmericanExpress.html/5854707559307a644238674141515767
(P2) 0;click0=http://oase00821.247realmedia.com/5c/msn.it/Female/L-13/
(P3) /GroupM-IT/AmericanExpress_Acquisition_Apr12_Rtr/
(P4) /adj/N4199.456584.XAXIS.COM1/B6490067
(P5) ;sz=
- (H1) / (P4) (P5) 300x25 (P2) 1403202186/Right (P3) 300x25 (P1)
- (H1) / (P4) .2 (P5) 728x9 (P2) 523922702/Top (P3) 728x9 (P1)
- (H1) / (P4) .2 (P5) 728x9 (P2) 717876294/Top (P3) 728x9 (P1)
- (H1) / (P4) .2 (P5) 728x9 (P2) 309206097/Top (P3) 728x9 (P1)
- (H1) / (P4) .2 (P5) 728x9 (P2) 2064492282/Top (P3) 728x9 (P1)
- (H1) / (P4) (P5) 300x25 (P2) 1934004172/Right (P3) 300x25 (P1)
CONCLUSIONS
Part 4
CONCLUSIONS & FUTURE WORK
Benefits of the system and possible next steps.
CLUE automatically provides aggregated views of URLs
• Simplifies the network/security administrator's tasks
Use of passively monitored network traffic
• Transparent for the user
Completely unsupervised methodology
In future:
• Further analyze clusters to extract common, interesting behaviors
• Allow greater system scalability
• Iterative approach
• Use CLUE to identify other interesting patterns (e.g. look at the User Agent)

Top Rated Call Girls In chittoor ๐Ÿ“ฑ {7001035870} VIP Escorts chittoor
ย 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
ย 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
ย 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
ย 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
ย 
Call Girls In Bangalore โ˜Ž 7737669865 ๐Ÿฅต Book Your One night Stand
Call Girls In Bangalore โ˜Ž 7737669865 ๐Ÿฅต Book Your One night StandCall Girls In Bangalore โ˜Ž 7737669865 ๐Ÿฅต Book Your One night Stand
Call Girls In Bangalore โ˜Ž 7737669865 ๐Ÿฅต Book Your One night Stand
ย 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ย 

CLUE ITC28.pptx

  • 1. CLUE CLUSTERING FOR MINING WEB URLS Andrea Morichetta Enrico Bocchi Hassan Metwalley Marco Mellia name.surname@polito.it ITC28 Würzburg, September 15th, 2016
  • 3. SCENARIO Internet evolution and the need for monitoring. 3,256,931,615 users in the world (December 2nd, 2015) [http://www.internetlivestats.com]. 1,930,257,214 subscriptions to “mobile networks” (December 2013) [Source: ITU]. Network monitoring: to obtain quality and security.
  • 4. THE WEB TODAY The Internet's increasing complexity.
  • 5. THE WEB TODAY The Internet's increasing complexity: ads, tracking, malware.
  • 6. THE WEB AND MALICIOUS TRAFFIC HTTP traffic monitoring to track anomalous and potentially malicious behaviors. Malware, zero-day. Compromised machines talk to the C&C server (e.g. mlw.com/abc). The firewall blocks malicious requests using static rules.
  • 7. THE WEB AND MALICIOUS TRAFFIC HTTP traffic monitoring to track anomalous and potentially malicious behaviors. Malware, zero-day. The compromised host contacts the C&C server through the firewall (e.g. mlw.com/abc, malw.com/abd). URLs are algorithmically generated starting from seeds (e.g. current date or Twitter trends). They elude static controls based on blacklists by changing the URLs' paths and hostnames.
  • 8. THE WEB AND MALICIOUS TRAFFIC HTTP traffic monitoring to track anomalous and potentially malicious behaviors. Malware, zero-day. The compromised host contacts the C&C server through the firewall (e.g. mlw.com/abc, malw.com/abd). URLs are algorithmically generated starting from seeds (e.g. current date or Twitter trends). They elude static controls based on blacklists by changing the URLs' paths and hostnames. HTTP traffic monitoring: group algorithmically generated URLs; control and monitor possible, not yet checked, malicious behaviors; or, more generally, better understand the traffic on the Web.
  • 9. EXAMPLE: TIDSERV Malware TidServ analysis. Trojan, rootkit. Profit-making purpose. It spreads with users' complicity. URLs characterized by pseudo-randomness.
  • 10. EXAMPLE: TIDSERV Malware TidServ analysis. swltcho81.com/NZf4A07d7r7yE1C1dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ3ZTEzMTIy MDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2dsZS5pdCZxPXV pbmZlIG5ZGVzaw==38c rammyjuke.com/kaI1wWRd8Y5yfbU9dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ3ZTEzMT IyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2dsZS5pdCZxP WZvcnVtIGFybWF0YSBkZWxsZSB0ZW5lYnJl37g Profit-making purpose It spreads with users complicity URLs characterized by pseudo-randomness Trojan Rootkit 10
  • 11. EXAMPLE: TIDSERV Malware TidServ analysis. swltcho81.com/NZf4A07d7r7yE1C1dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ3ZTEzMTIy MDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2dsZS5pdCZxPXV pbmZlIG5ZGVzaw==38c rammyjuke.com/kaI1wWRd8Y5yfbU9dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ3ZTEzMT IyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2dsZS5pdCZxP WZvcnVtIGFybWF0YSBkZWxsZSB0ZW5lYnJl37g Profit-making purpose It spreads with users complicity URLs characterized by pseudo-randomness Trojan Rootkit How to automatically detect this behavior? Which are services adopting these techniques? 11
  • 13. CLUE in a nutshell. CLUE: CLustering for URL Exploration, a big data approach for HTTP mining. • HTTP traffic analysis -> How to find similar URLs? • How similar are two strings? • How to group similar URLs? • Clustering algorithms -> Which algorithm? Which parameters? • How to suggest relevant clusters? • Highlight relevant clusters to further mine. Pipeline: Log -> Distance calculation -> DBSCAN clustering -> Results
  • 14. SCENARIO Traffic collected from a network with more than 20,000 connected hosts. Network diagram: internal clients, edge router, external servers; HTTP requests; IDS labels. Pipeline: Log -> URLs extraction -> Distance calculation -> DBSCAN calculation -> Results
  • 15. INPUT Aggregated HTTP log carry one entry for each HTTP request. Millions of requests are collected per day. Timestamp Hostname Path 1339130937 www.emilbanca.it /emilbanca/img2009/angolo_blu_dx.gif 1339130938 fpdownload.adobe.com /pub/swz/crossdomain.xml 1339130941 8lnxpg8vwyuzhbol.com /rkF4Tx3x8N4YR8C5dj0xLjkmaWQ9NmQzNDZkY2FkYzU4Yzk0ODBkZDliODNkNjYxYzIzMmNjZTZhZDY4ZCZhaWQ9MzA0Mjgmc2lk PTQmb3M9NS4xIDAwMDAgU1AwLjAma3c9WlcxcGJHSmhibU5oRFFwbGJXbHNZbUZ1WTJFTkNtVnlibVZ6ZEc4cmMzUnliM3A2Y VN0bVpYSnlZWEpoRFFvPSZ1cmw9YUhSMGNEb3ZMM2QzZHk1cGJtSmhibXN1YVhRdlpuVnVZM1JwYjI0dmJHOW5hVzR2YVc1a1 pYZ3Vhbk53UDJ4aGJtYzlhWFFtWVdKcFBUQTNNRGN5Sm1OemN6MHdOekEzTWc9PSZyZWY9ZDNkM0xtVnRhV3hpWVc1allTNX BkQzl3YjNKMFlXd3ZjR0ZuWlQ5ZmNHRm5aV2xrUFRJMk9ERXNNU1pmWkdGa1BYQnZjblJoYkNaZmMyTm9aVzFoUFZCUFVsUkJU QT0935A 1339130945 83.133.121.147 /c/kaw0hOOD6x5Jpso2440a89f7bdeb9da9f4b5af9160e66aa908c 1339130946 delivery.jemacpv.com /network/c/adclick.php 1339130946 delivery.jemacpv.com /network/c/adclick.php 1339130947 www.peaktube.com /video_play 1339130948 www.peaktube.com /redirect.php 1339130949 cdn1.static.videobash.com /css/ie8-style-new.css 1339130949 www.videobash.com /video_play 1339130949 cdn1.static.videobash.com /css/style_new.css 1339130954 www.emilbanca.it /emilbanca/img2009/labanca_con.jpg 1339130965 www.emilbanca.it /emilbanca/img2009/labanca_con.jpg 1339130980 img3.iol.it /s/sport/med/balotelli-quotma-quale-derby-di-mercato-quot.jpg 1339130980 rta.criteo.com /dis/rtt.js 1339130980 img1.iol.it /img107/share/pubblicita/07/76/2012/4/nome.jpg 1339130980 img3.iol.it /img107/coldx/appl/01/1047/2011/3/3.gif 1339130980 img1.iol.it /img107/coldx/appl/03/3043/2012/3/03_110x107.jpg 1339130980 img3.iol.it /s/lavoro/116/autogrill-licenzia-in-massa.jpg 1339130980 www.libero.it / DBSCAN calculation Results Distance calculation Log URLs extraction How to express URLs similarity? 15
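The URL extraction step on this log could be sketched as follows. This is a minimal illustration, assuming each entry is a single whitespace-separated line of the form `timestamp hostname path` (the three columns shown on the slide); the function names `parse_log_line` and `extract_urls` are hypothetical and not taken from the CLUE code.

```python
def parse_log_line(line):
    """Split one 'timestamp hostname path' entry into its three fields."""
    # Assumed format: three whitespace-separated columns, as on the slide.
    timestamp, hostname, path = line.split(None, 2)
    return int(timestamp), hostname, path


def extract_urls(lines):
    """Return the distinct hostname+path strings, in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, hostname, path = parse_log_line(line)
        url = hostname + path
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample_log = [
    "1339130938 fpdownload.adobe.com /pub/swz/crossdomain.xml",
    "1339130946 delivery.jemacpv.com /network/c/adclick.php",
    "1339130946 delivery.jemacpv.com /network/c/adclick.php",
    "1339130980 www.libero.it /",
]
urls = extract_urls(sample_log)
```

Repeated requests collapse into one entry, so the later distance computation works on distinct URLs only.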
  • 16. SIMILARITY Comparison between elements with no good a priori understanding. Edit distance: class of distance functions in which, given two strings s and t, the distance is the cost of the best sequence of edit operations that converts s to t. LEVENSHTEIN DISTANCE: simple Levenshtein distance, assigns a unit cost to all edit operations. URL DISTANCE: modified Levenshtein, unit weight for additions and removals, double weight for replacements. JARO DISTANCE: the Jaro algorithm is a measure that evaluates the number and order of features in common.
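As an illustration of the edit-distance definition above, the simple Levenshtein distance (unit cost for every insertion, deletion and replacement) can be computed with the classic dynamic-programming recurrence. This is a generic textbook sketch, not the code used for the slides; the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(s, t):
    """Minimum number of unit-cost edits that turn string s into string t."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # replace cs with ct, free on a match
            ))
        prev = curr
    return prev[-1]


distance = levenshtein("google.com", "1goggle.com")
```

For these two strings the result is 2: one insertion (`1`) and one replacement (`o` by `g`).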
  • 17. DISTANCE EVALUATION IN PRACTICE Comparing distance measures behavior with TidServ elements. LEVENSHTEIN DISTANCE a. swltcho81.com/NZf4A07d7r7yE1C1dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ 3ZTEzMTIyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdv b2dsZS5pdCZxPXVpbmZlIG5ZGVzaw==38c b. iau71nag001.com/NZf4A07d7r7yE1C1dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OG Q3ZTEzMTIyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmd vb2dsZS5pdCZxPXVpbmZlIG15ZGVzaw==38c c. rammyjuke.com/kaI1wWRd8Y5yfbU9dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ 3ZTEzMTIyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdv b2dsZS5pdCZxPWZvcnVtIGFybWF0YSBkZWxsZSB0ZW5lYnJl37g d. iau71nag001.com/Kvb13nWd6P4XrFs3dmVyPTQuMiZiaWQ9MDU0NWQwZDQwY2MyODU4YWNj YzFlZjJkM2FiZDA5N2RiYmRlYmVkZiZhaWQ9NTAwMTgmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdv b2dsZS5pdCZxPWZhY2Vib29r27c e. zhakazth.cn/qkF3Vrye5c4qHoo4dmVyPTUuMCZzPTAmYmlkPTA1MzMyNGU1MzQzMDY5NTZiYW YxNGViYTQ5YWY4ZGZhM2I2OWEwYTQmYWlkPTMwNDIxJnNpZD0zJmVuZz13d3cuZ29vZ2xlLml 0JnE9dHJvbWJhdGErdnVhaWVyK2NvbiticmFzaWxpYW5hK2luK3NwaWFnZ2lhJng4Nj02NA==16h a-b: 11 a-c: 56 a-d: 97 a-e: 182 17
  • 18. DISTANCE EVALUATION IN PRACTICE Comparing the behavior of the distance measures on TidServ elements. LEVENSHTEIN DISTANCE: a-b: 11, a-c: 56, a-d: 97, a-e: 182. JARO DISTANCE: a-b: 0.13, a-c: 0.28, a-d: 0.33, a-e: 0.38. URL DISTANCE: a-b: 0.05, a-c: 0.26, a-d: 0.47, a-e: 0.7. Pros of the URL distance: • enhances differences • normalized
  • 19. URL DISTANCE Measure to calculate string similarity. Based on the Levenshtein distance: unit weight for additions and removals, double weight for replacements, plus normalization. FORMULA: $D_{URL}(string_1, string_2) = \frac{\mathrm{Levenshtein}_{mod}(string_1, string_2)}{|string_1| + |string_2|}$. EXAMPLE: $url_1$ = 'google.com' (10 characters); $url_2$ = '1goggle.com' (11 characters); $\mathrm{Levenshtein}_{mod}(url_1, url_2) = 1 \text{ addition (weight 1)} + 1 \text{ replacement (weight 2)} = 3$; $D_{URL}(url_1, url_2) = \frac{3}{10 + 11} = 0.143$. How to use this metric to group similar URLs?
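The URL distance can be sketched in a few lines, following only what the slide states: weight 1 for additions and removals, weight 2 for replacements, and normalization by the sum of the two string lengths. The function names `modified_levenshtein` and `url_distance` are illustrative, and the handling of two empty strings is an assumption made here to avoid a division by zero.

```python
def modified_levenshtein(s, t, indel_cost=1, replace_cost=2):
    """Weighted edit distance: additions/removals cost 1, replacements cost 2."""
    prev = [j * indel_cost for j in range(len(t) + 1)]
    for i, cs in enumerate(s, 1):
        curr = [i * indel_cost]
        for j, ct in enumerate(t, 1):
            replace = prev[j - 1] + (0 if cs == ct else replace_cost)
            curr.append(min(
                prev[j] + indel_cost,      # remove cs
                curr[j - 1] + indel_cost,  # add ct
                replace,                   # replace cs with ct (free on a match)
            ))
        prev = curr
    return prev[-1]


def url_distance(s, t):
    """Modified Levenshtein distance divided by len(s) + len(t)."""
    total = len(s) + len(t)
    if total == 0:
        return 0.0  # assumption: two empty strings are at distance 0
    return modified_levenshtein(s, t) / total


d = url_distance("google.com", "1goggle.com")
print(round(d, 3))  # 0.143, matching the example on the slide
```

With these weights the value always lies between 0 (identical strings) and 1 (no characters in common), which is what makes a single Epsilon threshold meaningful across URLs of very different lengths.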
  • 20. DBSCAN Clustering algorithm used for grouping URLs together. Based on the idea of density, intended as the number of points in a specific area; compared to other families of algorithms, it provides partial solutions. Features: • It allows the presence of outliers, which prevents non-coherent elements from being added to a cluster. • No need to define the number of clusters a priori. • No need to define centroids. • Does not require points in a Euclidean space. • Can handle clusters of different shapes, not only globular ones. Parameters: • Epsilon, the radius of the considered area. • Min points, the minimum number of points inside the area. [Figure: example of clustering with DBSCAN]
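Because DBSCAN only needs pairwise distances, it can run directly on a precomputed distance matrix. The sketch below is a plain-Python version of the textbook algorithm, written for illustration under that assumption; it is not the implementation behind the slides, and the names `dbscan`, `eps` and `min_pts` simply mirror the Epsilon and Min points parameters.

```python
NOISE = -1


def dbscan(dist, eps, min_pts):
    """Cluster points given a square distance matrix.

    Returns one label per point: 0, 1, ... for clusters, NOISE for outliers.
    """
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # The point itself counts as part of its own neighbourhood.
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Toy data: two dense groups on a line plus one isolated point.
points = [0, 1, 2, 50, 51, 52, 200]
matrix = [[abs(a - b) for b in points] for a in points]
labels = dbscan(matrix, eps=1, min_pts=2)
```

The isolated point stays labelled as noise instead of being forced into a cluster, which is the outlier behavior listed among the features above.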
  • 21. SCHEMA Final schema. Developed in Python. Log files -> Extract HTTP object URLs -> URLs list -> Compute distance matrix (URL distance between every pair of elements) -> Distance matrix -> Load distance matrix -> Compute DBSCAN -> Clusters -> Statistics
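The "compute distance matrix" stage of this schema can be sketched as a loop over every pair of URLs. The helper below is hypothetical: it takes the distance as a parameter, and `length_gap` is only a stand-in so that the example runs on its own. In the pipeline described on the slides the URL distance would be passed instead.

```python
def distance_matrix(items, distance):
    """Symmetric matrix of distance(items[i], items[j]) for every pair."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair is computed once
            d = distance(items[i], items[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


def length_gap(a, b):
    """Stand-in distance used only for this example (not the URL distance)."""
    return abs(len(a) - len(b)) / max(len(a), len(b), 1)


urls = ["a.com/x", "a.com/xy", "b.org/"]
matrix = distance_matrix(urls, length_gap)
```

The number of pairs grows quadratically: n distinct URLs require n(n-1)/2 distance computations, so 78,421 URLs would mean roughly 3.07 billion pairs if the full matrix were built.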
  • 23. DISTINCT URL ELEMENTS ANALYSIS Analysis of the HTTP traffic generated by 14 hosts infected by TidServ plus 20 randomly selected hosts; DBSCAN clustering is analyzed on this 34-host test set. Distinct URLs: TidServ 228; other malware 33; benign 78,160; total 78,421. Is it possible to separate all the 228 malicious URLs from the rest of the data? And which parameters should be used?
  • 24. CLUSTERING Results for the 34-host test set. NUMBER OF OUTLIERS Performance: the number of outliers decreases as Epsilon grows (lots of outliers for small Epsilon, few outliers for large Epsilon).
  • 25. CLUSTERING Results for the 34-host test set. NUMBER OF CLUSTERS Performance: the relation with the number of clusters is more complicated. The number of clusters increases for Epsilon = 0.2 and 0.225, because many elements previously considered noise form new clusters. Chart annotations: lots of very small clusters; few giant clusters. Which Epsilon allows us to isolate the 228 malicious URLs? Note: from more than 78,000 URLs to 300 clusters.
  • 26. CLUSTERING Results for the 34-host test set. CLUSTERING RESULTS FOR TIDSERV - OUTLIERS Performance: the number of TidServ outliers decreases until it reaches 0 for Epsilon = 0.4, where all TidServ URLs are clustered.
  • 27. CLUSTERING Results for the 34-host test set. CLUSTERING RESULTS FOR TIDSERV Performance: constant and coherent growth of the number of known elements included, and ability to aggregate additional, not-reported elements. Chart annotations: number of IDS-flagged URLs (228); few giant clusters. Why are more than 228 URLs actually clustered?
  • 28. CLUSTERING Results for the 34-host test set. CLUSTERING RESULTS FOR TIDSERV Performance: constant and coherent growth of the number of known elements included, and ability to aggregate additional, not-reported elements.
    Cluster ID | TidServ (IDS) count | All elements count
    A | 5 | 5
    B | 18 | 32
    C | 5 | 6
    D | 75 | 79
    E | 118 | 192
    F | 6 | 6
    G | 1 | 37
    Total | 228 | 357
    Do those clusters actually contain similar URLs?
  • 29. TIDSERV ANALYSIS Cluster G โ€“ Compare Elements โ€ข gnu4oke0r.com/4VY00y9P7Z5xiPs9dmVyPTQuMCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPT AmcmQ9MCZlbmc9d3d3Lmdvb2dsZS5pdCZxPWxvdWlzIGNydWlzZXM=16h โ€ข lkckclcklii1i.com/TAR3vUsX844qz1c5Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTA mcmQ9MA==27g โ€ข lkckclckl1i1i.com/TAR3vUsX844qz1c5Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTA mcmQ9MA==27g โ€ข lkckclcklii1i.com/ZvP1nw3P6z6XLSs7Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTA mcmQ9MA==26g โ€ข lkckclckl1i1i.com/ZvP1nw3P6z6XLSs7Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPT AmcmQ9MA==26g โ€ข lkckclcklii1i.com/yVv4l79D5E7yT8u9Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTA mcmQ9MA==18x โ€ข lkckclckl1i1i.com/yVv4l79D5E7yT8u9Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTA mcmQ9MA==18x โ€ข lkckclcklii1i.com/3Zh2DpoP583XBvc2Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTA mcmQ9MA==05Z โ€ข lkckclckl1i1i.com/3Zh2DpoP583XBvc2Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPTA mcmQ9MA==05Z โ€ข lkckclcklii1i.com/ZaW4pfQP6P4Q7EO9Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPT AmcmQ9MA==06c โ€ข lkckclckl1i1i.com/ZaW4pfQP6P4Q7EO9Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPT AmcmQ9MA==06c โ€ข lkckclcklii1i.com/SVn4kZCE8Y6MEes8Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPT AmcmQ9MA==38A โ€ข lkckclckl1i1i.com/SVn4kZCE8Y6MEes8Y2xrPTIuNCZiaWQ9NWJjNWFiMjE1YjRmN2I4ZjM3OTRmODNkZjhmNWY0ZjFmODZkYjE1YyZhaWQ9MzAwMDEmc2lkPT AmcmQ9MA==38A Tidserv OK (better than IDS!!!) 
But what about the 300++ clusters? 29
  • 30. SILHOUETTE Silhouette values distribution for some representative clustering results. CALCULATIONS Performance: only clusters with more than 20 elements are considered. Most clusters have silhouette > 0. TidServ's clusters are not those with the highest silhouette (between 0.4 and 0.7). Clusters with silhouette > 0 are associated with algorithmically generated URLs; this behavior is evident for silhouette > 0.7. Chart annotations along S(C): cohesive clusters, sparse clusters.
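A per-cluster silhouette S(C) can be derived from the same distance matrix used for clustering. The sketch below uses the standard silhouette definition and makes several assumptions: S(C) is taken as the average of the per-point silhouettes in the cluster, noise points are ignored, and a point gets 0 when its cluster has a single member or when no other cluster exists. The slides do not spell out these details, and the function name `cluster_silhouettes` is hypothetical.

```python
def cluster_silhouettes(dist, labels, noise=-1):
    """Average silhouette per cluster, from a distance matrix and labels."""
    clusters = {}
    for i, label in enumerate(labels):
        if label != noise:  # assumption: outliers are left out
            clusters.setdefault(label, []).append(i)

    def mean_dist(i, members):
        return sum(dist[i][j] for j in members) / len(members)

    scores = {}
    for label, members in clusters.items():
        rivals = [m for other, m in clusters.items() if other != label]
        point_scores = []
        for i in members:
            same = [j for j in members if j != i]
            if not same or not rivals:
                point_scores.append(0.0)  # convention for undefined cases
                continue
            a = mean_dist(i, same)                      # cohesion
            b = min(mean_dist(i, m) for m in rivals)    # separation
            top = max(a, b)
            point_scores.append((b - a) / top if top > 0 else 0.0)
        scores[label] = sum(point_scores) / len(point_scores)
    return scores


# Toy data: two tight, well-separated groups on a line.
points = [0, 1, 10, 11]
matrix = [[abs(p - q) for q in points] for p in points]
scores = cluster_silhouettes(matrix, [0, 0, 1, 1])
```

Values close to 1 indicate a tight cluster that is far from the others; values near 0 or below indicate overlap with a neighbouring cluster.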
  • 31. SILHOUETTE Examples of groupings (Eps = 0.4, MinPts = 4). Clusters sorted by silhouette coefficient; S(C) = silhouette coefficient for the cluster.
    S(C) | Main hostname (unique number) | Elements | Activity
    0.92 | skygo_streaming-i.akamaihd.net (1) | 551 | Streaming
    0.91 | ad.doubleclick.net (1) | 99 | Advertising
    0.87 | cookex.amp.yahoo.com (1) | 61 | Malware
    0.85 | static.simply.com (1) | 25 | File Hosting
    0.81 | d24w6bsrhbeh9d.cloudfront.net (1) | 63 | File Hosting
    0.81 | mfdclk001.org (1) | 27 | Malware
    0.78 | adserver.webads.it (1) | 35 | Advertising
    0.77 | .com (3) | 37 | TidServ
    0.75 | pixel.quantserve.com (1) | 57 | Advertising
    0.72 | watson.microsoft.com (1) | 29 | Windows Debug
    0.7 | coadvertise.cubecdn.net (1) | 36 | Advertising
    0.69 | atdmt.com (2) | 768 | Tracking
    0.65 | su.ff.avast.com (1) | 82 | Avast Update
    0.64 | log.dmtry.com (1) | 24 | Advertising
    0.61 | clickpixelabn.com (1) | 32 | Malware
    Take away (results on other clusters): clusters contain very similar URLs. It is easy to identify specific services: • Streaming • Ads • Malware • Tracking • Software update. Most are automatically generated URLs. This helps the monitoring/security analyst to understand network traffic.
  • 32. Take away Results on other clusters Clusters contains very similar URLs Easy to identify specific services โ€ข Streaming โ€ข ADS โ€ข Malware โ€ข Tracking โ€ข Software update Most are automatically generated URLs Helps the monitoring/security analyst to understand network traffic Clusters sorted by their size SILHOUETTE Examples of groupings (Eps = 0.4, MinPts = 4) 19/20 DBSCAN calculation Results Distance calculation URLs extraction Log Clusters sorted by silhouette coefficient S(C) Main hostname (unique number) Elements Activity 0.92 skygo_streaming-i.akamaihd.net (1) 551 Streaming 0.91 ad.doubleclick.net (1) 99 Advertising 0.87 cookex.amp.yahoo.com (1) 61 Malware 0.85 static.simply.com (1) 25 File Hosting 0.81 d24w6bsrhbeh9d.cloudfront.net (1) 63 File Hosting 0.81 mfdclk001.org (1) 27 Malware 0.78 adserver.webads.it (1) 35 Advertising 0.77 .com (3) 37 TidServ 0.75 pixel.quantserve.com (1) 57 Advertising 0.72 watson.microsoft.com (1) 29 Windows Debug 0.7 coadvertise.cubecdn.net (1) 36 Advertising 0.69 atdmt.com (2) 768 Tracking 0.65 su.ff.avast.com (1) 82 Avast Update 0.64 log.dmtry.com (1) 24 Advertising 0.61 clickpixelabn.com (1) 32 Malware Malware (H1) mfdclk001.org (P1) Y2xrPTEuMjEmYmlkPTUwMGFhNzVjLWY1ZTUtNDhhOC05ZjlkLWY2ODQ 3NGYzOGQwZCZhaWQ9MTAwMTAmc2lkPTAmcmQ9MTcuMTEuMjAxMQ == - (H1) /UVw07ael7p4qVcS6 (P1) 26c - (H1) /wZl1ELDd7N5quws3 (P1) 26A - (H1) /SZY35xzx6x4q5Ks1 (P1) 26x - (H1) /7V10LUdl7Z5mxcS2 (P1) 25A - (H1) /LZK2BDxe5k4mQiO2 (P1) 17g - (H1) /WZb3fvgl643q33U7 (P1) 05c Advertising (H1) ad.doubleclick.net (P1) /0_AcquisitionRtr_Apr12_AmericanExpress.html/5854707559307a64423867414151 5767 (P2) 0;click0=http://oase00821.247realmedia.com/5c/msn.it/Female/L-13/ (P3) /GroupM-IT/AmericanExpress_Acquisition_Apr12_Rtr/ (P4) /adj/N4199.456584.XAXIS.COM1/B6490067 (P5) ;sz= - (H1) / (P4) (P5) 300x25 (P2) 1403202186/Right (P3) 300x25 (P1) - (H1) / (P4) .2 (P5) 728x9 (P2) 523922702/Top (P3) 728x9 (P1) - (H1) / (P4) .2 (P5) 728x9 (P2) 717876294/Top (P3) 
728x9 (P1) - (H1) / (P4) .2 (P5) 728x9 (P2) 309206097/Top (P3) 728x9 (P1) - (H1) / (P4) .2 (P5) 728x9 (P2) 2064492282/Top (P3) 728x9 (P1) - (H1) / (P4)(P5) 300x25 (P2) 1934004172/Right (P3)300x25 (P1) 32
  • 34. CONCLUSIONS & FUTURE WORK Benefits of the system and possible next steps. CLUE: • Automatically provides aggregated views of URLs: simplifies the network/security administrator's tasks. • Uses passively monitored network traffic: transparent for the user. • Completely unsupervised methodology. In future: • Further analyze clusters to extract common, interesting behaviors. • Allow greater system scalability. • Iterative approach. • Use CLUE to identify other interesting patterns (e.g. look at the User Agent).

Editor's Notes

  1. Impact and size of the Internet
  2. Impact and size of the Internet
  3. Impact and size of the Internet
  4. Impact and size of the Internet
  5. Say that Epsilon will be the distance computed by Ratio
  6. Explain the chart's COLORS
  7. Explain the chart's COLORS
  8. Explain the chart's COLORS
  9. Explain the chart's COLORS
  10. Explain the chart's COLORS