LENTA
Longitudinal Exploration for Network Traffic Analysis
Andrea Morichetta, Marco Mellia
The Web today
A proliferation of services that rely on HTTP
Every day, hundreds of thousands of unique URLs must be analyzed by the network analyst:
- For traffic analysis
- For performance tuning
- For security
- …
Malware
DGA technique
State of the art: firewalls block malicious traffic using static rules.
Countermeasure: DGA (Domain Generation Algorithm). Malware generates pseudo-random domains from common seeds (e.g., the current date or Twitter trends), evading static controls based on blacklists.
[Figure: Blacklist (swltcho81.com, www.hjaoopoa.top, textspeier.de, …); rammyjuke.com; C&C Server]
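To make the countermeasure concrete, here is a toy, hypothetical DGA sketch: the malware and its C&C operator can run the same function on the same date and obtain the same domain list, so a blacklist compiled on earlier days does not contain those domains. The `toy_dga` name, the SHA-256 hashing scheme and the TLD pool are illustrative assumptions, not the algorithm of any real malware family.

```python
import hashlib
from datetime import date
from typing import List

# Illustrative TLD pool (assumption, chosen to echo the domains on the slide).
TLDS = [".com", ".de", ".top"]


def toy_dga(seed_day: date, count: int = 5, length: int = 10) -> List[str]:
    """Hypothetical toy DGA: derive `count` pseudo-random domains from a shared seed (a date)."""
    domains = []
    for i in range(count):
        # Hash the seed together with a counter so that every domain is different.
        raw = hashlib.sha256(f"{seed_day.isoformat()}-{i}".encode()).digest()
        # Map the first `length` bytes to lowercase letters to build the label.
        label = "".join(chr(ord("a") + b % 26) for b in raw[:length])
        # Use the last byte of the digest to pick a TLD.
        domains.append(label + TLDS[raw[-1] % len(TLDS)])
    return domains


if __name__ == "__main__":
    # Same seed, same list: on the infected host and on the C&C side alike.
    print(toy_dga(date(2018, 9, 4)))
```

Changing the seed (tomorrow's date) yields a fresh list, which is why static blacklists lag behind.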
Malware
DGA technique
swltcho81.com/NZf4A07d7r7yE1C1dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ3ZTEzMTIyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2dsZS5pdCZxPXVpbmZlIG5ZGVzaw==38c
rammyjuke.com/kaI1wWRd8Y5yfbU9dmVyPTQuMCZiaWQ9YjZjYWVhNjE0NjhhMmQ4ZTc0OGQ3ZTEzMTIyMDZiMDQ4NWY2MjJhYSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2dsZS5pdCZxPWZvcnVtIGFybWF0YSBkZWxsZSB0ZW5lYnJl37g
A closer look at the paths shows that their structure is similar, so it is still possible to match them.
[Figure: Blacklist (swltcho81.com, www.hjaoopoa.top, textspeier.de, …); rammyjuke.com; C&C Server]
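As a quick illustration of this point, Python's standard `difflib.SequenceMatcher` finds the long block shared by the two paths even though their random prefixes and domains differ. This is only a stand-in for the edit-distance comparison LENTA uses; the path strings are copied from the slide with the host names removed, and `similarity` and `longest_shared_block` are hypothetical helper names.

```python
from difflib import SequenceMatcher

# Request paths from the slide, without the host names.
PATH_A = ("NZf4A07d7r7yE1C1dmVyPTQuMCZiaWQ9YjZjYW"
          "VhNjE0NjhhMmQ4ZTc0OGQ3ZTEzMTIyMDZiMDQ4NWY2MjJhY"
          "SZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2ds"
          "ZS5pdCZxPXVpbmZlIG5ZGVzaw==38c")
PATH_B = ("kaI1wWRd8Y5yfbU9dmVyPTQuMCZiaWQ9YjZjY"
          "WVhNjE0NjhhMmQ4ZTc0OGQ3ZTEzMTIyMDZiMDQ4NWY2MjJh"
          "YSZhaWQ9NDAxOTcmc2lkPTAmcmQ9MCZlbmc9d3d3Lmdvb2d"
          "sZS5pdCZxPWZvcnVtIGFybWF0YSBkZWxsZSB0ZW5lYnJl37g")


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; autojunk=False keeps frequent characters in long strings."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def longest_shared_block(a: str, b: str) -> str:
    """Longest contiguous substring common to a and b."""
    m = SequenceMatcher(None, a, b, autojunk=False)
    match = m.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


if __name__ == "__main__":
    print("hosts:", similarity("swltcho81.com", "rammyjuke.com"))
    print("paths:", similarity(PATH_A, PATH_B))
    print("shared block:", longest_shared_block(PATH_A, PATH_B))
```

The host names share little beyond ".com", while the paths share a block of well over a hundred characters, which is the regularity a URL-level comparison can exploit.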
Idea
Ease the analysis by clustering network traffic
Implement a self-learning methodology to automatically associate traffic with previously observed services and to identify new traffic generated by possibly suspicious applications.
LENTA
overview
Day 1, Day 2, Day 3, Day 4
Clusters: $C(1)$, $C(2)$, $C(3)$, $C(4)$
Clusters' sampling is performed to facilitate computation and storage: $\tilde{C}(1)$, $\tilde{C}(2)$, $\tilde{C}(3)$, $\tilde{C}(4)$
System Knowledge: $\tilde{K}(1)$, $\tilde{K}(2)$, $\tilde{K}(3)$, $\tilde{K}(4)$
Traffic Collection
[Figure: HTTP requests between internal clients and external servers, observed at the edge router]
URLs observation
Number of unique URLs found in a network over a week of observation
Clustering
URLs are compared using a string distance based on the edit distance, i.e., the number of edits necessary to make one string equal to the other.
New: a recursive version of DBSCAN clustering to
- Reduce data complexity
- Improve clustering accuracy
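The edit distance referred to above is commonly the Levenshtein distance (insertions, deletions and substitutions). Below is a minimal dynamic-programming sketch; `edit_distance` and `normalized_distance` are hypothetical names, and dividing by the longer length is an assumption added so that URL distances fall in [0, 1]. The slides do not state which normalization, if any, LENTA applies.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn s into t."""
    if len(s) < len(t):
        s, t = t, s  # the distance is symmetric; keep the DP row short
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalized_distance(s: str, t: str) -> float:
    """Edit distance scaled to [0, 1] by the longer length (assumed normalization)."""
    longest = max(len(s), len(t))
    return edit_distance(s, t) / longest if longest else 0.0


if __name__ == "__main__":
    print(edit_distance("example.com/a?id=1", "example.com/a?id=22"))  # → 2
```

The quadratic cost per pair is what makes the full pairwise distance matrix expensive on hundreds of thousands of URLs.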
CLUE[1] - Big data approach for HTTP mining
HTTP traffic analysis: Log → URLs extraction → Distance calculation → DBSCAN calculation → Results
How to find similar URLs? How to group similar URLs? Which clustering algorithm? Which parameters?
[1] Morichetta, A., Bocchi, E., Metwalley, H., & Mellia, M. (2016, September). CLUE: Clustering for mining web URLs. In 2016 28th International Teletraffic Congress (ITC 28) (Vol. 1, pp. 286-294). IEEE.
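The DBSCAN step of this pipeline can be sketched with the textbook DBSCAN algorithm running on a precomputed distance matrix (for example, one filled with a normalized edit distance). This is a minimal single-machine sketch for illustration, not CLUE's implementation; `dbscan_precomputed` is a hypothetical name, and the neighborhood includes the point itself, as in the original DBSCAN definition.

```python
from collections import deque
from typing import List, Optional

NOISE = -1


def dbscan_precomputed(dist: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN on a precomputed distance matrix.
    Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(dist)
    labels: List[Optional[int]] = [None] * n  # None = not visited yet
    # eps-neighborhood of every point (the point itself is included).
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(neighbors[j])
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [0, 1, 2, 10, 11, 12, 50]
    dist = [[abs(a - b) for b in pts] for a in pts]
    print(dbscan_precomputed(dist, eps=1.5, min_pts=2))
```

Working on a precomputed matrix separates the expensive distance calculation from the clustering itself, matching the order of the pipeline stages above.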
I-DBSCAN
Inputs: the distance matrix and MinPts
- Find the ϵ that allows clustering a certain percentage of the whole dataset (threshold defined a priori)
- Compute DBSCAN and extract the clusters (and noise…)
- Compute the silhouette index of each cluster [Figure: example clusters with silhouette 0.3, 0.5, 0.9]
- Iterate the process over the clusters whose silhouette is below a threshold
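The slide does not say how the ϵ that clusters a given percentage of the dataset is found. One plausible reading, sketched here purely as an assumption, uses the k-distance, i.e., the distance from each point to its MinPts-th closest point (itself included). A point is a DBSCAN core point exactly when its k-distance is at most ϵ, so choosing ϵ as a quantile of the k-distances makes at least that fraction of the points core points, and therefore clustered. `k_distance` and `eps_for_fraction` are hypothetical names.

```python
import math
from typing import List


def k_distance(dist: List[List[float]], min_pts: int) -> List[float]:
    """Distance from each point to its min_pts-th closest point, counting itself
    (dist[i][i] == 0), so point i is core exactly when this value <= eps."""
    return [sorted(row)[min_pts - 1] for row in dist]


def eps_for_fraction(dist: List[List[float]], min_pts: int, fraction: float) -> float:
    """Smallest eps that makes at least `fraction` (0 < fraction <= 1) of the points core points."""
    kd = sorted(k_distance(dist, min_pts))
    idx = max(0, math.ceil(fraction * len(kd)) - 1)
    return kd[idx]


if __name__ == "__main__":
    pts = [0, 1, 2, 10, 11, 12, 50]
    dist = [[abs(a - b) for b in pts] for a in pts]
    print(eps_for_fraction(dist, min_pts=2, fraction=0.8))
```

Border points reachable from core points are also clustered, so the clustered fraction can only be at or above the requested one.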
I-DBSCAN
Iterative DBSCAN over clustering results
Recursively re-clustering badly formed clusters yields more clusters and higher cluster quality.
The closer the silhouette is to one, the better formed the cluster.
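The per-cluster silhouette used to decide which clusters to re-cluster can be sketched from the same distance matrix with the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)). This is an illustrative sketch: averaging s(i) over the members of a cluster, ignoring noise, and the 0.0 conventions for singleton or single-cluster cases are assumptions; `silhouette_per_cluster` and `clusters_to_refine` are hypothetical names.

```python
from statistics import mean
from typing import Dict, List

NOISE = -1


def silhouette_per_cluster(dist: List[List[float]], labels: List[int]) -> Dict[int, float]:
    """Mean silhouette of each cluster, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where
    a(i) = mean distance to the other members of i's cluster (cohesion) and
    b(i) = smallest mean distance to the members of another cluster (separation).
    Noise points are ignored."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        # Silhouette is undefined with a single cluster; 0.0 is an assumed convention.
        return {lab: 0.0 for lab in clusters}
    scores = {}
    for lab, members in clusters.items():
        values = []
        for i in members:
            if len(members) == 1:
                values.append(0.0)  # usual convention for singleton clusters
                continue
            a = mean(dist[i][j] for j in members if j != i)
            b = min(mean(dist[i][j] for j in other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            denom = max(a, b)
            values.append((b - a) / denom if denom > 0 else 0.0)
        scores[lab] = mean(values)
    return scores


def clusters_to_refine(scores: Dict[int, float], threshold: float) -> List[int]:
    """Labels of the clusters whose silhouette is below the a-priori threshold."""
    return sorted(lab for lab, s in scores.items() if s < threshold)
```

In an I-DBSCAN-style loop, the members of each cluster returned by `clusters_to_refine` would be clustered again (with an ϵ recomputed on that subset) until every cluster passes the threshold or cannot be split further.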
Sampling
Sampling eases the comparison between clusters, reduces computational complexity, and keeps traffic digests with a smaller storage footprint.
The medoid is appropriate for spherical and homogeneous clusters. To obtain a sample that better reflects the population of the cluster, we implemented percentile sampling.
Percentile sampling
Choosing representatives for clusters
For each element in the cluster, we compute its mean intra-cluster distance, i.e., the mean of the pairwise distances between it and every other element in the cluster.
We then sort the elements by their mean intra-cluster distance.
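The ordering step can be sketched in a few lines; `order_by_mean_intra_distance` is a hypothetical name and the distance matrix is assumed to cover the members of a single cluster. Since every element is averaged over the same number of peers, the first element of the ordering is also the medoid (the element with the smallest total distance).

```python
from typing import List, Sequence, Tuple


def order_by_mean_intra_distance(elements: Sequence[str],
                                 dist: List[List[float]]) -> List[Tuple[str, float]]:
    """Pair each element with its mean distance to every other element of the
    cluster and sort ascending: central elements first, border elements last."""
    n = len(elements)
    scored = []
    for i, element in enumerate(elements):
        others = [dist[i][j] for j in range(n) if j != i]
        scored.append((element, sum(others) / len(others) if others else 0.0))
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    pts = [0, 1, 4, 5, 11]
    dist = [[abs(x - y) for y in pts] for x in pts]
    ordered = order_by_mean_intra_distance(["a", "b", "c", "d", "e"], dist)
    print("medoid:", ordered[0][0], "order:", [e for e, _ in ordered])
```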
Percentile sampling
Choosing representatives for clusters
From this distribution we extract m percentiles and pick the corresponding elements.
The idea is to obtain a set of cluster subsamples (representatives) that includes both elements in the central area of the cluster and elements at its border, dividing the cluster into equal-sized parts.
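Given the elements already sorted by mean intra-cluster distance, picking m evenly spaced percentiles can be sketched as follows. Including both endpoints (percentiles 0 and 100) so that the most central and the most peripheral element are always kept, and the nearest-rank rounding, are assumptions; `percentile_representatives` is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def percentile_representatives(ordered: Sequence[T], m: int) -> List[T]:
    """Pick m representatives at evenly spaced percentiles of a cluster that is
    already ordered from most central to most peripheral element."""
    n = len(ordered)
    if m >= n:
        return list(ordered)  # small cluster: keep every element
    if m == 1:
        return [ordered[0]]  # assumption: a single representative is the most central one
    # Percentiles 0, 100/(m-1), ..., 100 mapped to the nearest index.
    # The spacing (n-1)/(m-1) exceeds 1 here, so the indices never repeat.
    idx = [round(k * (n - 1) / (m - 1)) for k in range(m)]
    return [ordered[i] for i in idx]


if __name__ == "__main__":
    print(percentile_representatives(list(range(11)), 3))
```

A larger m keeps a more faithful picture of the cluster shape at the cost of more comparisons later, which is the trade-off discussed next.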
Percentile sampling
Choosing the size m
The number m of subsamples is a trade-off between precision and complexity.
We tested it using two sets of clustering results from one day of traffic. The first set builds the System Knowledge and contains half of the clusters, selected from C; the second set contains all clusters.
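System knowledge enhancement
Using the string distance, new clusters are compared with the ones in the System Knowledge and added to it if the distance to the closest old cluster is at least a threshold $\tau$:
$$\tilde{K}(t) = \tilde{K}(t-1) \cup \left\{ \tilde{C}_i(t) \in \tilde{C}(t) \;:\; d\big(\tilde{C}_i(t), \tilde{K}(t-1)\big) \ge \tau \right\}$$
where $d\big(\tilde{C}_i(t), \tilde{K}(t-1)\big)$ is the distance between the new cluster and the closest cluster in $\tilde{K}(t-1)$.
When a new cluster is associated with an old one, random replacement updates the System Knowledge by replacing some of the "old" representatives.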
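One day of the update rule above can be sketched as follows. Several details are assumptions not given in the slides: the cluster-to-cluster distance is taken as average linkage over representative pairs, all new clusters are compared against a snapshot of K(t-1), and random replacement overwrites a fixed number `n_replace` of randomly chosen old representatives. `cluster_distance`, `update_knowledge` and `n_replace` are hypothetical names.

```python
import random
from typing import Callable, List, Optional

Cluster = List[str]  # a cluster is stored as its list of representatives (e.g. URLs)


def cluster_distance(a: Cluster, b: Cluster, d: Callable[[str, str], float]) -> float:
    """Assumption: average linkage, i.e. the mean distance over all representative pairs."""
    return sum(d(x, y) for x in a for y in b) / (len(a) * len(b))


def update_knowledge(knowledge: List[Cluster], new_clusters: List[Cluster],
                     d: Callable[[str, str], float], tau: float,
                     n_replace: int = 1,
                     rng: Optional[random.Random] = None) -> List[Cluster]:
    """One day of K(t) = K(t-1) union {new clusters at distance >= tau from K(t-1)}.
    Updates `knowledge` in place and returns it. Clusters must be non-empty."""
    if rng is None:
        rng = random.Random()
    previous = [list(c) for c in knowledge]  # snapshot of K(t-1) used for all comparisons
    for new in new_clusters:
        dists = [cluster_distance(new, old, d) for old in previous]
        if not dists or min(dists) >= tau:
            knowledge.append(list(new))  # far from everything known: add it
            continue
        old = knowledge[dists.index(min(dists))]  # closest known cluster
        # Random replacement: overwrite n_replace randomly chosen representatives
        # of the matched cluster with randomly chosen members of the new one.
        for slot in rng.sample(range(len(old)), min(n_replace, len(old))):
            old[slot] = rng.choice(new)
    return knowledge
```

Because matched clusters only refresh their representatives, the System Knowledge grows only when genuinely new traffic appears, which keeps its footprint bounded over long observation periods.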
In vitro experiment
LENTA reaction to anomalous traffic
Starting from an initial group of almost 33,000 unique URLs, we artificially create new groups by progressively injecting URLs belonging to different applications into the previous data set.
The figure shows that LENTA identifies multiple clusters at each stage.
Results after a week
Future steps
- Focus on HTTPS traffic to obtain a complete view of network activity
- Extend big data approaches to all stages of the system, to scale the analysis
- Apply LENTA to different lexical features, e.g., hostnames in DNS queries or user agents in HTTP requests
• Add a final slide with «questions»
Questions?
Distance matrix
Computing pairwise distances
Computing the pairwise distances between points is the most complex and time-consuming step in clustering algorithms.
We implemented a parallelized computation of distances on Spark, reducing the elapsed time compared with a centralized approach.
[Figure: elapsed time in seconds (10^1 to 10^4, log scale) versus dataset size (10^3 to 10^5), centralized vs. Spark]
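What makes this step parallelizable is that the n(n-1)/2 pairs are independent of one another. The sketch below expresses that decomposition with Python's standard `concurrent.futures` as a self-contained stand-in for Spark, not LENTA's actual Spark job; `distance_matrix` and `_upper_row` are hypothetical names. With pure-Python distance functions, threads are limited by the GIL, so a `ProcessPoolExecutor` or a cluster framework would be needed for real speed-ups. In PySpark, a similar decomposition can be written with `RDD.cartesian` followed by a filter on index pairs; the slides do not say how the work is partitioned.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def _upper_row(i: int, items: Sequence[str], d: Callable[[str, str], float]) -> List[float]:
    """Distances from item i to items i+1 .. n-1 (d is assumed symmetric)."""
    return [d(items[i], items[j]) for j in range(i + 1, len(items))]


def distance_matrix(items: Sequence[str], d: Callable[[str, str], float],
                    workers: int = 4) -> List[List[float]]:
    """Full symmetric matrix built from n(n-1)/2 independent pair computations,
    distributed row by row over a worker pool."""
    n = len(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so rows[i] belongs to item i.
        rows = list(pool.map(lambda i: _upper_row(i, items, d), range(n)))
    dist = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        for offset, value in enumerate(row):
            j = i + 1 + offset
            dist[i][j] = dist[j][i] = value  # mirror into the lower triangle
    return dist
```

Computing only the upper triangle halves the work; rows near the top are longer than rows near the bottom, so a real deployment would balance partitions by pair count rather than by row.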
HTTP vs HTTPS over time
[Figure: share (%) of HTTP, TLS, QUIC, SPDY, HTTP/2 and FB-ZERO traffic from 2013/04 to 2017/10, with annotations A to F]
slides_itc30_2018_Morichetta_v2.pdf
