Network Measurement and Monitoring - Assignment 1, Group 3, "Classification"

Created by Patrick Herbeuval and Valentin Thirion
  1. Networking measurements and monitoring, 1st assignment: Oral Presentation, "Classification"
     Patrick Herbeuval (University of Liège, 1st Master in Computer Science, p.herbeuval@student.ulg.ac.be)
     Valentin Thirion (University of Liège, 1st Master in Computer Science, valentin.thirion@student.ulg.ac.be)
     Teacher: B. Donnet (benoit.donnet@ulg.ac.be)
  2. Plan
     I. Introduction
     Four papers:
       II. Early Application Identification
       III. Multilevel classifier: BLINC
       IV. Statistical: The ADSL Case
       V. Application specific: Skype
     VI. Comparative
     VII. Conclusion
  3. I - Introduction
     The Internet is used more and more today
     We want to keep the network comfortable enough
     The quality of service demanded by consumers increases as fast as applications consume more bandwidth
     ISPs, companies and universities want to ban P2P
     Port-based classifiers were good years ago, but are quite inefficient now
  4. Why classify?
     Classification is a key issue for today's network administrators and companies, for the following reasons:
     • Improve the network infrastructure
     • Ban undesired traffic
     • Protect the network against potential attacks
     • Global knowledge of trends
  5. How to classify?
     Deep Packet Inspection (DPI): very precise technique, but with many drawbacks:
       Huge computation power needed
       Inefficient if packets are encrypted
       Continuous need for database updates
     Statistical analysis
     Social
  6. II - Early Application Identification
     Goal: determine the application with the first few packets
     Advantage: knowing the kind of traffic at the beginning gives the ability to block or redirect it
     DPI consumes too many resources, and flows need to have ended before they can be analysed
     Statistical: uses the mean sizes, durations, … values that are not available for the first few packets
  7. Clustering the flows
     Techniques used: K-Means, Gaussian Mixture Model, spectral
     Values used:
       Size of the first few packets
       Duration of the first few packets (negotiation phase)
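The clustering idea above can be sketched with a minimal pure-Python K-Means (Lloyd's algorithm) over vectors of signed first-packet sizes, where the sign encodes direction. The flows and numbers below are made up for illustration; this is not the authors' implementation.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each flow vector to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, clusters

# Toy flows: positive size = client->server packet, negative = server->client.
web_like = [(120, -1460, -1460, 80), (130, -1400, -1460, 90)]
mail_like = [(30, -60, 25, -50), (35, -55, 20, -45)]
centers, clusters = kmeans(web_like + mail_like, k=2)
```

With well-separated behaviours like these, the two application types end up in distinct clusters after a few iterations.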
  8. Data set
     4 packet traces:
       3 from a university network
       1 from an enterprise network
     Keep only TCP packets, and discard those whose flow began before the trace capture
     Features analysed: need for an efficient metric
       Size and direction of the first 4 packets
       We can observe that the range of these values is very similar across traces (see graph on the next slide)
  9. Size & Direction [graph: sizes and directions of the first packets across traces]
  10. Classification, 2 phases
      Training phase: offline, at management sites
        Apply clustering techniques to samples of TCP connections for all target applications
        Creation of a spatial representation based on the sizes of the first P packets (vector of P dimensions, or HMM)
        Then find applications that have the same behaviour
        Best results: 40 clusters and the first 4 packets
        Creation of two sets:
          One with the description of each cluster
          One with the applications present in each cluster
  11. Classification, 2 phases
      Classification phase: online, at management hosts
        Extract the 5-tuple and analyse the size of packets in both directions
        With these sizes, use the assignment module (associates a connection with a cluster)
        With the clusters, the labelling module selects the application associated with the connection
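The two online modules can be sketched as follows: the assignment module picks the nearest cluster by Euclidean distance, and the labelling module returns the application most frequently seen in that cluster during training. The cluster centers and application counts below are hypothetical, not values from the paper.

```python
import math

# Hypothetical cluster descriptions produced by the offline training phase:
# a center vector of the first-4-packet signed sizes, plus the applications
# observed in that cluster (illustrative names and counts).
clusters = [
    {"center": (120, -1400, -1460, 85), "apps": {"http": 950, "https": 50}},
    {"center": (30, -55, 25, -50),      "apps": {"smtp": 800, "pop3": 200}},
]

def assign(conn):
    """Assignment module: nearest cluster by Euclidean distance."""
    return min(clusters, key=lambda c: math.dist(conn, c["center"]))

def label(conn):
    """Labelling module: most frequent application in the chosen cluster."""
    cluster = assign(conn)
    return max(cluster["apps"], key=cluster["apps"].get)
```

For example, a connection whose first four packets look web-like gets labelled "http" here, while a short-packet exchange falls into the mail cluster.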
  12. Evaluation & Conclusion
      Evaluation:
        Assignment accuracy: above 95% for all heuristics
        Labelling accuracy: between 85% and 98%
        The size of the first few packets is a good metric
        Quality of clustering is richer with HMM, but comparable with Euclidean GMM
        Clustering with TCP ports classifies over 98% of known applications
        Limitation: needs the first 4 packets in the correct order
      Heuristic (Wikipedia): where an exhaustive search is impractical (NP-complete, for instance), heuristic methods are used to speed up the process of finding a satisfactory solution.
  13. III - The BLINC Classifier
      Stands for BLINd Classification
      Avoids reading the whole content of the packet
        Privacy, performance, encrypted packets
      3 levels of classification:
        Social level
        Functional level
        Application level
  14. The Social level
      Finding host communities
        Client-server, P2P, …
      Analyse these communities:
        Perfect match: likely malicious
        Partial overlap: P2P sources, websites, gaming, …
        Partial overlap within the same subnet: farms
  15. The Social level (2) [diagram]
  16. The Functional level
      Find whether a host offers a service, uses one, or both
      Mostly depends on the port range used by the host
      Works better when a host is connected to many servers
      Typical schemes:
        HTTP server: 1-2 ports
        P2P: many ports (up to 1 per host)
        Mail server: depends on the services available
  17. The Application level
      Uses the connection's 4-tuple (+ possibly other characteristics)
      Creates a model for every application type
      Models are represented by small graphs called "graphlets"
  18. BLINC: Results
      Uses 2 metrics to evaluate the classifier:
        Completeness (% of classified traffic)
        Accuracy (% of correctly classified traffic)
      Some parameters can be used to tune the classifier
        Changing a threshold can improve the results for one of the metrics, but significantly degrade the other one
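The two metrics can be made concrete with a toy example. The flow records and the None-means-unclassified convention are assumptions for this sketch, not BLINC's actual data model.

```python
# Completeness: fraction of flows the classifier labelled at all.
# Accuracy: fraction of the *classified* flows whose label is correct.
def completeness(flows):
    return sum(1 for f in flows if f["predicted"] is not None) / len(flows)

def accuracy(flows):
    classified = [f for f in flows if f["predicted"] is not None]
    return sum(1 for f in classified if f["predicted"] == f["actual"]) / len(classified)

flows = [
    {"actual": "p2p",  "predicted": "p2p"},   # correct
    {"actual": "web",  "predicted": "web"},   # correct
    {"actual": "mail", "predicted": "web"},   # wrong
    {"actual": "dns",  "predicted": None},    # left unclassified
]
```

Here completeness is 3/4 while accuracy is 2/3, which illustrates the tension on the slide: forcing a label on the last flow would raise completeness but could lower accuracy.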
  19. Global results
      GN: Genome campus (~1,000 users), UN: university network (~20,000 users)
  20. Tuning
      Td: minimal number of destination IPs needed to classify a flow as P2P
  21. Results (2)
      Good detection rate without reading any byte of the payload
        Non-payload flows are classified as well; encryption is not a problem
        Low resource consumption
      Good detection of unknown flows
      Difficult to distinguish applications of the same type (e.g. all VoIP protocols grouped as one)
      Doesn't work if the headers are encrypted
      Hard to identify multiple sources behind NATs
      Results come from the edge of the network; the classifier may behave differently in the backbone
  22. BLINC: conclusion
      BLINC has a good detection rate without costing a lot of processing and without being intrusive
      It can detect attacks and unknown protocols
      It can be improved in some situations
  23. IV - The ADSL Case
      Test a statistical classifier on different sites, after it has been trained on others
      Dataset:
        4 packet traces collected at 3 different ADSL POPs from Orange
          2 traces at the same time, different locations
          2 traces at the same location, 17 days apart
        Reference used: ODP tool (provided by Orange)
  24. Classification methodology
      3 algorithms used to classify the traces:
        Naïve Bayes Kernel Estimation
        Bayesian Network
        C4.5 Decision Tree
      Traces analysed with two feature sets:
        SET_A: packet-level information
        SET_B: flow-level statistics
      3 filters:
        S/S: flows with a 3-way handshake
        S/S+4D: same as S/S + at least 4 data packets
        S/S+F/R: same as S/S + FIN or RST flag at the end
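The three flow filters can be sketched on a simplified flow representation where each packet is a (tcp_flags, payload_len) pair. The flag constants are standard TCP; the flow model itself is an assumption made for the sketch, not the paper's tooling.

```python
# Standard TCP flag bits.
SYN, ACK, FIN, RST = 0x02, 0x10, 0x01, 0x04

def filter_ss(pkts):
    """S/S: the capture contains the 3-way handshake (SYN, SYN-ACK, ACK)."""
    if len(pkts) < 3:
        return False
    flags = [p[0] for p in pkts[:3]]
    return flags[0] == SYN and flags[1] == SYN | ACK and bool(flags[2] & ACK)

def filter_ss_4d(pkts):
    """S/S+4D: S/S plus at least 4 packets carrying data."""
    return filter_ss(pkts) and sum(1 for _, ln in pkts if ln > 0) >= 4

def filter_ss_fr(pkts):
    """S/S+F/R: S/S plus a FIN or RST flag on the last packet."""
    return filter_ss(pkts) and bool(pkts[-1][0] & (FIN | RST))
```

Each stricter filter keeps only flows that were fully observed, which is what makes the per-flow statistics of SET_A/SET_B reliable.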
  25. Classification, 2 cases
      Static case: classification on each site independently
        Ideal number of packets: 4
        Accuracy: about 90%
        Very good classification of WEB and EDONKEY flows
      Cross-site case:
        SET_A: EDONKEY results are immune; spatial similarity seems more important than temporal similarity
        Classifier very sensitive to the context in which it is trained
        MAIL is often mistaken for FTP due to similar packet sizes
        Using the port number increases the quality of the results
  26. Classification, 2 cases (continued)
      SET_B: some degradations
        Focus on a single feature: port number
          Results are the opposite of the static case
          Prediction of traffic using non-legacy ports is inefficient
          Due to the heavy hitters (typically P2P)
      Global results: the C4.5 algorithm is the best in terms of overall accuracy for almost all cases (static + cross-site)
        Degradation: C4.5 is comparable with the other algorithms (≤ 17%)
      Data overfitting problem
  27. Unknown class + Conclusion
      Looking at the flows marked as unknown:
        3-way handshake
        Apply the classifiers and get a confidence level; this value is then compared to the one returned by C4.5
        Useful to detect malicious traffic and P2P
        Should be integrated into existing DPI tools
      Conclusion:
        Statistical tools are very useful to identify unknown traffic
        Good performance if used on the same site as the training
        Can detect applications among protocols
        Really suffers from data overfitting (same behaviour from different apps)
        Great thing about this analysis: it used commercial traffic, so very differentiated
  28. V - The Skype case
      We want to detect Skype traffic
      It is already possible to detect VoIP traffic with other classifiers, but how do we distinguish Skype?
      Skype is a closed and encrypted protocol, which has to be analysed before starting the classification
  29. Skype model
      Using a controlled environment, detection of Skype traffic characteristics
      2 kinds of connections: E2E and E2O
        E2E: End-to-End, Skype to Skype
        E2O: End-to-Out, Skype to the telephone network
      Skype works over TCP and UDP
      Skype can carry text, voice, video and files
        Everything is multiplexed into one packet
        In this case, only voice traffic is treated
  30. Skype SoM
      TCP packets are entirely encrypted; they cannot be analysed
      UDP has a small unencrypted overhead, called the Start of Message (SoM)
        E2E: id and message type (signaling or data)
        E2O: unique connection identifier
      Skype also always uses the same UDP port number (12340)
  31. Classifiers
      Chi-Square Classifier (CSC)
        Based on the randomness of bits in packets
        Doesn't work on TCP, since encrypted packets seem to be completely random
      Naive Bayes Classifier (NBC)
        Real-time voice protocol classifier
        Based on the message size (depending on the audio codec) and on the average inter-packet gap
        Used on a short window of samples to cope with variability in packet size
      Payload-Based Classifier (PBC)
        Used in the controlled environment to check that CSC and NBC work well
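The idea behind the CSC can be sketched with a chi-square statistic that measures how close the bit content of observed payloads is to uniform randomness. The grouping into 4-bit blocks and the decision threshold below are illustrative choices, not the paper's exact parameters.

```python
def chi_square_uniform(payloads, block_bits=4):
    """Chi-square statistic of 4-bit block frequencies vs. a uniform model.
    Low values mean the payload bits look random (e.g. encrypted Skype);
    high values mean structured, non-random content."""
    buckets = 1 << block_bits
    counts = [0] * buckets
    for payload in payloads:
        for byte in payload:
            counts[byte >> 4] += 1     # high nibble of each byte
            counts[byte & 0x0F] += 1   # low nibble of each byte
    total = sum(counts)
    expected = total / buckets
    return sum((c - expected) ** 2 / expected for c in counts)
```

A perfectly uniform byte stream scores 0, while a constant payload scores very high, so thresholding this statistic separates "random-looking" traffic from structured protocols.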
  32. Experiments
      NBC detects all kinds of VoIP traffic
      CSC detects all kinds of Skype traffic
        Using both of them should detect Skype voice traffic
  33. Results
      [Tables 3-5 from the paper: per-classifier results (PBC, NBC, CSC, NBC ∧ CSC) for E2E and E2O flows, over UDP on the CAMPUS and ISP datasets and over TCP on both datasets]
      Very low false positive rate
      Bigger false negative rate
  34. Skype: Conclusion
      Skype is hard to classify due to its encrypted protocol, which makes its analysis hard to do
      But with this classifier, we get good results on UDP:
        The false positive rate is almost zero, which is good if the ISP wants to prioritize Skype traffic
        The false negative rate is bigger, but not really a problem as long as the ISP doesn't want to block Skype
  35. VI - Comparative
      All these classifiers have good results, but each of them has its strengths and weaknesses
      ADSL needs specific training, but has the best detection rate
      BLINC and Early are less precise but more flexible
        They are also faster and good at detecting attacks
      BLINC detects unknown protocols but cannot discern individual apps
      Early needs the first 4 packets in order; ADSL needs the 3-way handshake
      Skype is more specific and cannot be compared directly
        Good false positive rate but a higher false negative rate
  36. VII - Conclusion
      We now have solutions that can replace DPI
      Each classifier is good in its domain:
        Important networks: early application detection (detect attacks soon)
        ADSL and commercial: statistical (user trends, adapt the infrastructure)
        University or academic: BLINC (statistics, trends)
        Everywhere we want to improve on it: the Skype classifier
      Remarks:
        The traces and classifiers are quite old (4 to 6 years)
        What about mobile usage? Multimedia over 3G/4G networks?
  37. References
      T. Karagiannis, K. Papagiannaki, M. Faloutsos. BLINC: Multilevel Traffic Classification in the Dark. In Proc. ACM SIGCOMM, August 2005.
      L. Bernaille, R. Teixeira, K. Salamatian. Early Application Identification. In Proc. ACM CoNEXT, December 2006.
      M. Pietrzyk, J.-L. Costeux, G. Urvoy-Keller, T. En-Najjary. Challenging Statistical Classification for Operational Usage: the ADSL Case. In Proc. ACM/USENIX Internet Measurement Conference (IMC), November 2009.
      D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli. Revealing Skype Traffic: When Randomness Plays with You. In Proc. ACM SIGCOMM, August 2007.
      Thanks for your attention. Any questions?
