Guiomar corral memoriatesi_2009_07_13.pdf.txt


  1. 1. New Challenges in Detection and Management of Security Vulnerabilities in Data Networks
      Doctoral Thesis
      Doctoral candidate: Guiomar Corral Torruella
      Thesis supervisor: Dr. Elisabet Golobardes i Ribé
      Doctoral programme: Tecnologies de la Informació i les Comunicacions i la seva gestió (Information and Communication Technologies and their Management)
      Grup de Recerca en Sistemes Intel·ligents
      Escola Tècnica Superior d’Enginyeria en Electrònica i Informàtica
      Enginyeria i Arquitectura La Salle – Universitat Ramon Llull
      Quatre Camins, 2 – 08022 Barcelona
      http://www.salle.url.edu/GRSI
      10 July 2009
  3. 3. Abstract
      As networks become an integral part of corporations and of everyone’s lives, advanced network security technologies are being developed to protect data and preserve privacy. Network security testing is necessary to identify and report vulnerabilities, and also to verify that enterprise security requirements are met. Security analysis is necessary to recognize malicious data, unauthorized traffic, detected vulnerabilities and intrusion data patterns, and also to draw conclusions from the information gathered in the security test.
      Where, then, is the problem? There is no open-source standard for security testing; there is no integral framework that follows an open-source methodology for security testing; the information gathered after a security test comprises large data sets; there is no exact and objective pattern of behavior among network devices or, furthermore, among data networks; and, finally, there are too many potential vulnerabilities. The challenge of this domain lies in the great volume of data, which are complex and can yield inconsistent diagnostics. It is also an unsupervised domain where no machine learning techniques have been applied before. Thus a complete characterization of the domain is needed.
      Consensus is the main contribution of this thesis. Consensus is an integrated framework that includes a computer-aided system developed to help security experts during network testing and analysis. The system automates mechanisms related to a security assessment in order to minimize the time needed to perform an OSSTMM security test. This framework can be used in wired and wireless networks. Network security can be evaluated from inside or from outside the system. It gathers data from different network devices: not only computers but also routers, firewalls and Intrusion Detection Systems (IDS). Consensus manages a large amount of data to be processed by security analysts after an exhaustive test. General information, port scanning data, operating system fingerprinting, vulnerability scanning data, routing and filtering rules, IDS response, response to malicious code, weak password reporting, and response to denial-of-service attacks can be stored for each tested device. These data are gathered by the automated testing tools that have been included in Consensus.
      The large amount of data for every device and the differing number and type of attributes make it difficult to find traffic patterns manually. The automated testing tools can obtain different results, or incomplete or inconsistent information. The data obtained from a security test can therefore be uncertain, approximate, complex and partially true. In this environment arises the second main contribution of this thesis: Analia, the data analysis module of Consensus. Whereas Consensus gathers security data, Analia includes Artificial Intelligence to help analysts after a vulnerability assessment. Unsupervised learning has been analyzed in order to be adapted to this domain. Analia finds resemblances among tested devices, and clustering helps analysts draw conclusions. Afterwards, the best results are selected by applying cluster validity indices. Explanations of the clustering results are then included to give security analysts a more comprehensive response.
      The combination of machine learning techniques in the network security domain provides benefits and improvements when performing security assessments with the Consensus framework and processing its results with Analia.
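      The interplay of clustering and cluster validity indices described in the abstract can be illustrated with a minimal sketch. It is not the implementation used in Analia: the device names, port sets and function names are hypothetical, K-means is given a simple deterministic farthest-point initialization, and the Dunn index (one of the validity indices listed in the thesis's related work) stands in for whichever index Analia actually applies.

```python
import math

# Hypothetical scan results: device name -> set of open TCP ports.
DEVICES = {
    "web-1": {22, 80, 443},
    "web-2": {22, 80, 443},
    "web-3": {80, 443},
    "win-1": {135, 139, 445},
    "win-2": {135, 139, 445, 3389},
    "win-3": {135, 445, 3389},
}
PORTS = sorted(set().union(*DEVICES.values()))


def to_vector(open_ports):
    """Encode a device as a 0/1 vector with one attribute per known port."""
    return [1.0 if port in open_ports else 0.0 for port in PORTS]


POINTS = [to_vector(ports) for ports in DEVICES.values()]


def kmeans(points, k, iterations=20):
    """Lloyd's K-means with farthest-point seeding; returns one label per point."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    pairs = [(i, j) for i in range(len(points)) for j in range(i + 1, len(points))]
    inter = [math.dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j]]
    intra = [math.dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]]
    diameter = max(intra, default=0.0)
    if not inter or diameter == 0.0:
        return 0.0  # assumption of this sketch: score degenerate partitions as 0
    return min(inter) / diameter


def best_partition(points, candidate_ks):
    """Return (score, k) for the candidate K with the highest Dunn index."""
    return max((dunn_index(points, kmeans(points, k)), k) for k in candidate_ks)
```

      On this toy data, `kmeans(POINTS, 2)` separates the three devices exposing web ports from the three exposing Windows service ports, and `best_partition(POINTS, [2, 3, 4])` selects K = 2.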
  5. 5. Resumen (Spanish)
      As networks become an integral element of corporations, network security technologies are developed to protect data and preserve privacy. Security testing of a network makes it possible to identify vulnerabilities and to ensure the security requirements of any company. Security analysis makes it possible to recognize malicious information, unauthorized traffic, device or network vulnerabilities and intrusion patterns, and to draw conclusions from the information gathered in the test.
      Where, then, is the problem? There is neither an open-source standard nor an integral framework that follows an open-source methodology for security tests; the information gathered after a test includes a great deal of data; there is no exact and objective pattern of behavior of network devices or of networks; and, finally, the number of potential vulnerabilities is very large. The challenge of this domain lies in having a large volume of complex data in which inconsistent diagnostics may appear. Moreover, it is an unsupervised domain where machine learning techniques have not been applied before. A complete characterization of the domain is therefore necessary.
      Consensus is the main contribution of this thesis: an integrated framework that includes an automated system to improve the execution of tests on a network and the analysis of the information collected. The system automates the mechanisms associated with a security test and minimizes the duration of the test, following the OSSTMM methodology. It can be used in wired and wireless networks. Security can be evaluated from a perspective internal or external to the network itself. Data are collected from computers, routers, firewalls and intrusion detectors. Consensus will manage the data to be processed by security analysts. General and specific information about their services, operating system, vulnerability detection, routing and filtering rules, the response of intrusion detectors, password weakness, and the response to malicious code or to denial-of-service attacks are examples of the data to be stored for each device. These data are collected by the testing tools included in Consensus.
      The large amount of data for each device and the differing number and type of attributes that characterize them complicate the manual extraction of a behavior pattern. Automated testing tools can obtain different results for the same device, and the information gathered can be incomplete or inconsistent. In this setting arises the second main contribution of this thesis: Analia, the analysis module of Consensus. Whereas Consensus is responsible for collecting data on the security of devices, Analia includes Artificial Intelligence techniques to help analysts after a security test. Several unsupervised learning methods have been analyzed in order to be adapted to this domain. Analia finds similarities among the analyzed devices, and grouping these devices helps analysts draw conclusions. The best groupings are selected by applying validity indices. The system then generates explanations of each grouping to give security analysts a more detailed response.
      The combination of machine learning techniques in the network security domain provides benefits and improvements in carrying out security tests through the use of the integrated framework Consensus and its results analysis system Analia.
  7. 7. Resum (Catalan)
      As networks become an integral element of corporations, network security technologies are developed to protect data and preserve privacy. Security testing of a network makes it possible to identify vulnerabilities and to ensure the security requirements of any company. Security analysis makes it possible to recognize malicious information, unauthorized traffic, device or network vulnerabilities and intrusion patterns, and to draw conclusions from the information gathered in the test.
      Where, then, is the problem? There is neither an open-source standard nor an integral framework that follows an open-source methodology for security tests; the information gathered after a test includes a great deal of data; there is no exact and objective pattern of behavior of network devices or of networks; and, finally, the number of potential vulnerabilities is very large. The challenge of this domain lies in having a large volume of complex data in which inconsistent diagnostics may appear. Moreover, it is an unsupervised domain where machine learning techniques have not been applied before. A complete characterization of the domain is therefore necessary.
      Consensus is the main contribution of this thesis: an integrated framework that includes an automated system to improve the execution of tests on a network and the analysis of the information collected. The system automates the mechanisms associated with a security test and minimizes the duration of the test, following the OSSTMM methodology. It can be used in wired and wireless networks. Security can be evaluated from a perspective internal or external to the network itself. Data are collected from computers, routers, firewalls and intrusion detectors. Consensus will manage the data to be processed by security analysts. General and specific information about their services, operating system, vulnerability detection, routing and filtering rules, the response of intrusion detectors, password weakness, and the response to malicious code or to denial-of-service attacks are examples of the data to be stored for each device. These data are collected by the testing tools included in Consensus.
      The large amount of data for each device and the differing number and type of attributes that characterize them complicate the manual extraction of a behavior pattern. Automated testing tools can obtain different results for the same device, and the information gathered can be incomplete or inconsistent. In this setting arises the second main contribution of this thesis: Analia, the analysis module of Consensus. Whereas Consensus is responsible for collecting data on the security of devices, Analia includes Artificial Intelligence techniques to help analysts after a security test. Several unsupervised learning methods have been analyzed in order to be adapted to this domain. Analia finds similarities among the analyzed devices, and grouping these devices helps analysts draw conclusions. The best groupings are selected by applying validity indices. The system then generates explanations of each grouping to give security analysts a more detailed response.
      The combination of machine learning techniques in the network security domain provides benefits and improvements in carrying out security tests through the use of the integrated framework Consensus and its results analysis system Analia.
  9. 9. Acknowledgments
      I would like to express my gratitude to all those who made it possible for me to complete this thesis. All this work has been possible with the support of the Grup de Recerca en Sistemes Intel·ligents of La Salle (URL) and, especially, the members who helped me with the different publications: Albert Fornells, Álvaro Garcia, Albert Orriols and David Vernet. I would also like to thank Eva Armengol, from the IIIA (Artificial Intelligence Research Institute), for her collaboration in this thesis. The discussions and cooperation with all of them have contributed substantially to this work.
      I am thankful to my supervisor, Dr. Elisabet Golobardes, whose encouragement, guidance and support from the initial to the final stage enabled me to develop an understanding of the subject, bringing together two initially separate worlds: telematics and artificial intelligence. I am grateful to the Telematics team as well: Agustín, Àlex, Josep Maria and, especially, Jaume.
      It is a pleasure to thank those who made Consensus and Analia possible from the very beginning: Gaspar, Jordi, Maite, Xavi, Enrique, Albert, Angel, Anna, Elisabet, Angel, Hector, Omar, Esther, Alex, Isard and Marc. I am also indebted to Pete Herzog for all his help, support, interest and valuable hints, especially with the OSSTMM. In addition, I want to thank Eva for her assistance in refining the written English.
      I would also like to thank the Ministerio de Educación y Ciencia, which has supported the MID-CBR (TIN2006-15140-C03-03) and VDS-Analia (CIT-390000-2005-27) projects, and the Ministerio de Industria, Turismo y Comercio, which has supported the Consensus-VDS project (FIT-360000-2004-81). I want to thank Enginyeria La Salle for its support along this long road.
      I cannot end without thanking my family, on whose constant encouragement and unconditional support I have relied throughout the time spent on this thesis. Many thanks, Gustau; many thanks, parents, for being there from the beginning. And many thanks, Biel, the youngest of the family, who nevertheless gave me the final push to present this thesis.
  11. 11. Contents
      I Framework and Related Work
        1 Introduction
          1.1 Introduction
          1.2 Framework
          1.3 Motivation
          1.4 Main goals
          1.5 The thesis
          1.6 Summary
        2 Related work
          2.1 Introduction
          2.2 Network vulnerability assessments
            2.2.1 Security vulnerabilities
            2.2.2 Security policies, standards procedures and methodologies
            2.2.3 Security assessments
            2.2.4 Network security testing tools
          2.3 Analysis of network vulnerability assessments
            2.3.1 Artificial Intelligence and Machine Learning
            2.3.2 Clustering
              2.3.2.1 Clustering process
              2.3.2.2 Clustering classification
              2.3.2.3 K-means
              2.3.2.4 X-means
              2.3.2.5 SOM
              2.3.2.6 Autoclass
              2.3.2.7 Evolutionary multiobjective clustering
            2.3.3 Explanation of clustering results
            2.3.4 Validation of clustering results
              2.3.4.1 C-index
              2.3.4.2 Davies-Bouldin index
              2.3.4.3 Dunn index
              2.3.4.4 Silhouette index
      II Contributions
        3 Consensus: Improving testing in security assessments
          3.1 Introduction
          3.2 Consensus analysis and considerations
            3.2.1 Discussion of security policies, standards and methodologies
            3.2.2 Discussion of security assessments
            3.2.3 Discussion of security tools
          3.3 Consensus architecture
            3.3.1 Global architecture of Consensus
            3.3.2 Base System Module
            3.3.3 Management Module
            3.3.4 Database Module
            3.3.5 Analysis Module
            3.3.6 Testing Modules
            3.3.7 The communications protocol
          3.4 Wired vulnerability assessment
            3.4.1 Wired security requirements
            3.4.2 Wired testing modules in Consensus
          3.5 Wireless vulnerability assessment
            3.5.1 Wireless security requirements
            3.5.2 Wireless testing module in Consensus
          3.6 Router and firewall vulnerability assessment
            3.6.1 Router and firewall security requirements
            3.6.2 Router and firewall tests in Consensus
          3.7 Chapter summary
        4 Experimentation with Consensus
          4.1 Introduction
          4.2 Experimentation on laboratory testbed
          4.3 Experimentation on La Salle Network
          4.4 Experimentation on an enterprise network
          4.5 Chapter summary
        5 Analia: Improving analysis in security assessments
          5.1 Introduction
          5.2 Artificial Intelligence and clustering
            5.2.1 Discussion of Artificial Intelligence and Machine Learning
            5.2.2 Discussion of clustering approaches
            5.2.3 Discussion of cluster validation approaches
          5.3 Analia architecture
            5.3.1 Global architecture of Analia
            5.3.2 AI Module
          5.4 Pattern representation in Analia
            5.4.1 Pattern representation of port scanning
            5.4.2 Pattern representation of operating system fingerprinting
            5.4.3 Pattern representation of vulnerability information
            5.4.4 Pattern representation for clustering in Analia
            5.4.5 Pattern implementation for clustering in Analia
          5.5 Clustering in Analia
            5.5.1 Clustering design in Analia
              5.5.1.1 Input block of Analia Clustering module
              5.5.1.2 Execution block of Analia Clustering module
              5.5.1.3 Output block of Analia Clustering module
            5.5.2 Clustering implementation in Analia
          5.6 Cluster validation in Analia
            5.6.1 Intracohesion factor
            5.6.2 Intercohesion factor
            5.6.3 Prediction of the number of clusters in Analia
            5.6.4 Implementation of cluster validation in Analia
          5.7 Explanation of results in Analia
            5.7.1 The Anti-unification in Analia
            5.7.2 The Detailed Anti-unification algorithm in Analia
          5.8 Chapter summary
        6 Experimentation with Analia
          6.1 Introduction
          6.2 Initial experimentation with Partition Methods
          6.3 Experimentation with Autoclass
          6.4 Experimentation with SOM
          6.5 Experimentation with cluster validation
          6.6 Experimentation with weighted attributes
          6.7 Experimentation with multiobjective clustering
          6.8 Experimentation with explanations of clustering results
          6.9 Chapter summary
      III Conclusions and Future Work
        7 Conclusions and future work
          7.1 Framework of the thesis
          7.2 Conclusions: contributions and work
            7.2.1 Automation of security assessments in data networks
            7.2.2 AI applied to the analysis of security in data networks
          7.3 Summary of publications
          7.4 Future work
      IV Bibliography
        A Notation
  18. 18. List of Figures
      1.1 Framework of this thesis with Consensus and Analia.
      2.1 Types of system and network security assessments as based on time and cost.
      2.2 2D-map architecture.
      2.3 Hexagonal and rectangular topologies.
      3.1 Global architecture of Consensus.
      3.2 Diagram of the Base System Module.
      3.3 Block diagram of the Management Module.
      3.4 Block diagram of the Database Module.
      3.5 Block diagram of a testing module.
      3.6 Message exchange.
      3.7 Flow diagram of the Wired Test block.
      3.8 Flow diagram of the Wireless Test block.
      3.9 Flow diagram of the Router/Firewall Test block.
      4.1 Experimentation testbed of the Networking Laboratory of La Salle.
      4.2 Management console of Consensus: Test configuration.
      4.3 Management console of Consensus: Summary of Test results.
      4.4 Management console of Consensus: Wireless test configuration.
      4.5 Management console of Consensus: Summary of Wireless Testing results.
      4.6 ACE: Traffic flow between Consensus management and a probe sending a results file.
      5.1 System architecture of Consensus with Analia.
      5.2 Analia phases.
      5.3 AI Module of Analia.
      5.4 Example of a vulnerability detection in a Linux web server.
      5.5 Example of a vulnerability detection in a Windows web server.
      5.6 Knowledge representation using ports and port ranges.
      5.7 Knowledge representation using ports (boolean) and port ranges (numerical).
      5.8 Knowledge representation using ports and port ranges, all numerical attributes.
      5.9 Knowledge representation using ports and port ranges, all numerical attributes including filtered ports.
      5.10 Example of data stored in Consensus regarding the OS fingerprinting of a device.
      5.11 Knowledge representation using operating system fingerprinting data.
      5.12 Knowledge representation using risk factor and severity level data.
      5.13 Knowledge representation using all detected ports and operating systems.
      5.14 Knowledge representation using a single port range and all OS.
      5.15 Knowledge representation using the same number of attributes for ports and OS.
      5.16 Knowledge representation using all detected ports, OS and vulnerability information.
      5.17 Knowledge representation using a single port range, OS and vulnerability information.
      5.18 Knowledge representation using port ranges, OS and vulnerability information.
      5.19 View manager to generate a pattern representation of open ports.
      5.20 State diagram of the Clustering module of Analia.
      5.21 Example of a configuration file of a K-means execution in Analia.
      5.22 Example of an ARFF input data file in Analia.
      5.23 Fragment example of an output results file in Analia.
      5.24 Configuration web interface of the clustering approaches in Analia.
      5.25 Example of a clustering result in Analia.
      5.26 Example of the web interface with a list of clustering executions in Analia.
      5.27 Block diagram of the Indexus module.
      5.28 Example of the cluster validity process in Analia.
      5.29 Example of the cluster validity process with Cohesion factors in Analia.
      5.30 Example of a cluster matrix with similarity values between tested devices and its Intracohesion cluster value in Analia.
      5.31 Example of a matrix with distance values between clusters of a partition and its Intercohesion value in Analia.
      5.32 Explanation of the elements of Table 5.4 applying DAU algorithm, when the percentage of occurrence of attributes is selected to more than 70%.
      6.1 Cluster results with K-means for different values of K.
      6.2 Display of cluster results with K-means for K = 4, 5.
      6.3 Display of cluster results with K-means for K = 5 in Analia.
      6.4 Distribution of clusters by K-means and by X-means.
      6.5 Cluster distributions by automatic Autoclass for Views 0, 1 and 2.
      6.6 Distribution of clusters by K-means, X-means and SOM for View 0, where K is the number of clusters.
      6.7 Intracohesion and Intercohesion evolution for K-means and SOM.
      6.8 Intracohesion and Intercohesion evolution for K-means, Autoclass and automatic Autoclass.
      6.9 Validity index evolution for K-means.
      6.10 Validity index evolution for X-means.
      6.11 Representation of the analysis of the different weighted views.
      6.12 CAOS clustering output on Consensus with the deviation and the connectivity.
      6.13 Clustering distribution of devices for SOM executions.
  20. 20. List of Tables
      3.1 Comparison of security standards.
      3.2 Comparison of security methodologies.
      3.3 Sections susceptible of automation of the OSSTMM methodology.
      3.4 Comparison of security tools.
      3.5 Tools for testing in Consensus wired probes.
      3.6 Consensus probes description.
      3.7 Wireless tools for testing in Consensus Wireless Module.
      3.8 Tools for firewall and router testing in Consensus.
      4.1 Time period of a manual and automatic security test.
      5.1 Example of a cluster matrix that includes Similarity values between tested devices.
      5.2 Example of a comparison matrix of distances between clusters of a partition.
      5.3 Description of four elements from Analia dataset and their explanation D obtained from AU algorithm.
      5.4 Description of four elements from Analia dataset and their explanation Di obtained from DAU algorithm.
      6.1 Summary of the mean of Intracohesion and Intercohesion values for K-means and SOM algorithms using different number of clusters (C). It also includes the difference between both factors. The best results are marked in bold.
      6.2 Student’s test result of SOM compared to K-means.
      6.3 Summary of the Cohesion factors for K-means, X-means and CAOS clustering for different number of clusters.
      6.4 Explanation of two clusters obtained from AU algorithm.
      6.5 Explanation of two clusters obtained from DAU algorithm.
  22. 22. List of Algorithms
2.1 K-means algorithm
2.2 X-means algorithm
2.3 Training algorithm of a 2D-Kohonen map
5.1 View Manager algorithm
5.2 Algorithm for the selection of the best number of clusters in Analia
5.3 AU algorithm adapted to Analia domain
5.4 Detailed Anti-Unification (DAU) algorithm
  24. 24. Part I
Framework and Related Work
  25. 25. Chapter 1
Introduction
The significant need for security systems that integrate malware and vulnerability detection in large-scale operational communication networks has become the leitmotif of this thesis. One of the main goals has been to automate the processes related to a security assessment, following a well-known and established open-source methodology for security testing. The second main goal has been to bring the security field and artificial intelligence closer together in order to benefit from the best of both worlds. The analysis of the data gathered in a security test has been improved by applying artificial intelligence techniques. This chapter introduces the main concepts of this thesis, describes the framework, and details the motivation and objectives. It concludes with a formal overview of the organization of this document.
1.1 Introduction
The growing demand of the network market for functional intelligent systems in the security area is currently motivating important research and development work. Intelligent systems, which can provide human-like expertise such as domain knowledge and reasoning under uncertainty, are important in tackling practical computing problems. This thesis aims to contribute to the network security field by incorporating the benefits of intelligent systems, so that the combination gains from the best of both worlds. The result will be a new framework for security assessments that improves the security testing phase and the analysis of testing results. The complete framework has been named Consensus and the intelligent system module has been called Analia.
The integration of different learning techniques, which overcomes their individual limitations and achieves synergetic effects through their combination, will contribute to a more complete intelligent system for security testing purposes. The experimentation will be presented to demonstrate how this novel framework has been used to solve real-world problems. This thesis addresses new challenges in the detection and management of security vulnerabilities in data networks. It researches a new assessment framework for testing the security of communication networks. First of all, the different processes that have to be carried out in a security test according to a particular methodology are investigated. The most appropriate processes are selected for automation in order to improve the testing phase of a security assessment. This main contribution materializes in the framework Consensus. It provides a new model to perform security tests by automating the processes and incorporating open-source security tools into the system. The management of the information collected after a security test is also investigated in this thesis. Machine learning techniques are analyzed and the most suitable ones are selected to be integrated in the thesis framework. The combination of these machine learning approaches takes the best of each and overcomes their individual deficiencies to produce the second main contribution of this thesis: Analia, the analysis module of Consensus with Artificial Intelligence (AI) capabilities.
  26. 26. This chapter presents a comprehensive overview of the thesis. The framework that embraces this thesis is detailed and the lines of research are related to the different projects in which we have taken part. The fundamentals that motivated this research and the reasons to contribute a new security assessment framework to the research community are detailed. The main goals of this thesis are also described in this chapter. Finally, the organization of this thesis is illustrated.
1.2 Framework
This thesis has been submitted to the PhD program Tecnologies de la Informació i les Comunicacions i la seva gestió (Doctorate in Information and Communication Technologies and its Management) of the Universitat Ramon Llull (URL). This thesis is part of the research of the Grup de Recerca en Sistemes Intel·ligents (Research Group in Intelligent Systems), GRSI, of Enginyeria La Salle of the URL. The research of the GRSI is focused on Machine Learning, especially on the field of Knowledge Discovery from Databases, also known as Data Mining. The research group aims at extracting interesting patterns from moderate and large data sets. In this framework, GRSI works on different stages of the data mining process: pre-processing, characterization of data sets, analysis for a better understanding and improvement of machine learning techniques, methodologies to evaluate learners, and post-processing. Over the last few years, the research has mainly focused on learning methods inspired by natural principles and analogy. The group is known for its expertise in Evolutionary Computation, Case-Based Reasoning and Soft Computing, and Data Complexity. The group seeks to apply its research to real-world domains which may benefit society in the near future. Some of its recent applications are support for the diagnosis of breast cancer and the computer-aided analysis of attacks in computer networks.
Currently, the group is also working on other challenging medical and industry applications. The group has been accredited by the Generalitat de Catalunya as a Consolidated Research Group (2002 SGR-00155, 2005 SGR 302). One of the application domains that GRSI investigates is security in data networks. This thesis forms part of this domain and presents an organized approach where machine learning and intelligent systems become valid alternatives to solve existing problems in the data network domain. All these problems have arisen in the different projects in which the work and the contributions of this thesis have been included. I joined the GRSI in 1998, more than ten years ago. My collaboration with the GRSI started with the project Design and Implementation of a New Generation Telematics Platform to Support Open and Distance Learning (Implementación y estudio de herramientas de inteligencia artificial aplicadas a plataformas de enseñanza abierta a distancia y a las redes ATM que las soportan). This study was partially supported by the Spanish Education Ministry grant CICYT-TEL-98-0408. In this project the cooperation between the artificial intelligence and data network fields was present from the beginning, proving that artificial intelligence techniques can provide useful solutions to network problems. The research work in the security domain started in the project Consensus - Vulnerability Detection System, partially funded by the Ministerio de Industria, Turismo y Comercio under grant number FIT-360000-2004-81. This work continued with the project VDS-ANALIA: Sistema de análisis de detección de vulnerabilidades mediante inteligencia artificial (VDS-ANALIA: Vulnerability Detection Analysis System by means of Artificial Intelligence), funded by the Ministerio de Educación y Ciencia under grant CIT-390000-2005-27.
The line of research was also integrated in the CICYT project called MID-CBR: A Unified Framework for the Development of Case-Based Reasoning (CBR) Systems (Un Marco Integrador para el Desarrollo de Sistemas de Razonamiento Basado en Casos). This project has been supported by the Spanish Ministry of Education and Science under grant TIN2006-15140-C03-03.
  27. 27. A brief description of these projects is given below:
• IA-ATM (CICYT-TEL-98-0408): This project had two main goals for the GRSI: the application of AI to open and distance learning environments, and also to ATM networks. The first goal was to predict the evolution of the students in order to design specific plans to avoid future failures. The second goal was to predict congestion in an ATM network so that measures could be taken in advance to keep the network from becoming congested [32]. Partners: Universitat de Girona and Universitat Oberta de Catalunya.
• Consensus (FIT-360000-2004-81): The main goal of this project was to develop a distributed tool for security professionals. The tool follows the Open Source Security Testing Methodology Manual (OSSTMM) to carry out vulnerability tests in an autonomous and distributed way. Partner: ISECOM, which created the OSSTMM.
• Analia (CIT-390000-2005-27): The objective of this project was to apply Artificial Intelligence techniques to improve the analysis of the information obtained while carrying out security vulnerability tests in data networks. Partner: ISECOM.
• MID-CBR (TIN2006-15140-C03-03): The main objectives of this project related to this thesis are the study of new ways to use soft computing techniques for CBR, techniques for case retrieval in knowledge-intensive CBR systems, and the empirical evaluation of the developed techniques by means of CBR prototypes implemented for several experimental domains. In this thesis the experimental domain was data network security. Partners: Universidad Complutense de Madrid and the Artificial Intelligence Research Institute. Entities of promotion and observation: ISECOM and the Fundació Clínic per a la Recerca Biomèdica.
In parallel, I have collaborated on other research projects related to the networking area but without the Artificial Intelligence component.
These projects focus mainly on security aspects of different data network technologies. All of them have also been carried out at La Salle, led by the Telematics group. A summary of these projects is given below:
• ATM-VP: The project Adaptative Reconfiguration of ATM virtual networks based on Virtual Paths was carried out together with the corporation DIMAT in 1998. The main goal of this project was to design and implement an automatic management console for ATM virtual networks in case of congestion [92].
• CSI-RHET: The project Quality of Service (QoS) and Traffic Engineering in Heterogeneous Network addressed the design, implementation and evaluation of a heterogeneous multiservice broadband telecommunications network based on IPv6 with QoS. Partners: Universidad de Cartagena, Instituto Tecnológico de Aragón, Universidad de Alcalá de Henares.
• OPERA 2: The main goal of the project Open PLC European Research Alliance Phase 2 was to develop a Power Line Communications (PLC) technology that can become an alternative for a broadband access integrated network. The URL goal was to perform a security analysis of the PLC technology to evaluate the OPERA1 specification and contribute a new proposal to the standardization process (2005-2008).
• ICARO: It stands for Integrity, Confidentiality, Authenticity and secuRity in pOwerline networks. This project researches advanced security mechanisms adapted to PLC in terms of data integrity, confidentiality and authenticity. It has been carried out together with the company DS2 from 2008 until 2009. The conclusions extracted from this project are being used in the ITU-T standardization for PLC. This project has been partially funded by the Ministerio de Industria, Turismo y Comercio in the Avanza framework.
  28. 28. 1.3 Motivation
The security of communication and data networks was not considered a priority a few years ago, perhaps because nobody was able to predict such an impressive growth in the use of data networks and their importance worldwide. The design of protocols, devices and networks focused more on their operational function than on providing systems that fulfilled security requirements. However, this trend has changed radically. Nowadays security is a very prolific line of research with many efforts dedicated to it, as can be seen in the European Framework Programme (FP). As an example, the VII FP 2007-2013 has planned to invest 1,400 million euros in the security theme over 7 years, 4% of the total budget¹. One of the cross-cutting activities included in the research themes is security systems integration, interconnectivity and interoperability, and the work of this thesis fits the topics proposed there. The need to improve security testing mechanisms in data networks in order to enrich the threat and vulnerability detection process is a pressing issue. Hackers tend to stay a step ahead of security systems in order to break barriers and enter public networks and systems. Intelligent systems are therefore indispensable for discovering one's own security flaws before possible attacks exploit them. One of the main contributions of this thesis formulates a solution to this problem. This thesis presents a new system that automates the processes related to a security test following a widely disseminated open-source testing methodology, the OSSTMM (Open Source Security Testing Methodology Manual) [67]. This system has been named Consensus and includes a wide variety of open-source security testing tools, as the OSSTMM recommends.
It automates the testing process for wired and wireless networks, considering not only computers but also network devices like routers, firewalls or intrusion detection systems. The testing process is not the only phase that can be improved. The analysis of the data gathered in a security test is another line of research to which many efforts are being dedicated. This is because processing the data to analyze the security level of a network is difficult, especially given the high volume of data to be managed. Another reason for this complexity is the nature of the information collected, as it usually contains data from multiple sources with imprecise, approximate and uncertain knowledge. Thus, the next contribution presented in this thesis proposes new alternatives for processing this data by bringing the machine learning and network security fields closer together. Following the automation of a security test and its subsequent analysis, artificial intelligence has been included as a means to support the processing and analysis of data after a successful security test. Consequently, the application of machine learning techniques will help security experts to process the large amount of information and draw conclusions from data gathered automatically. This contribution is embodied in Analia, the analysis module of Consensus that incorporates the analyzed machine learning approaches. This module also comprises validity methods to confirm the quality of machine learning results. Moreover, it includes a new validation method adapted ad hoc to this security domain, which will be used to verify the usefulness of the system's response to the security analyst. With the contributions detailed in this thesis, we show the effectiveness of the machine learning approach by building a framework for our model that detects vulnerabilities and analyzes the results of a security audit.
These processes are automated to improve the work of security auditors and reduce the manual tasks carried out when performing a security test over a network.
¹ European Commission CORDIS FP7 - http://cordis.europa.eu/fp7/home_en.html
  29. 29. 1.4 Main goals
The research lines followed in this thesis have been defined mainly by the objectives of the research projects described above, guided by the GRSI group in the telematics domain. Network security is a critical aspect of any organization, company or university. It is increasingly necessary to have mechanisms in data networks that keep risk to a minimum and ensure a certain level of security, with the purpose of protecting not only the devices connected to the network and its communications but also the information stored. To achieve a proper level of security it is essential to know the status of all components of a network, detect potential threats and vulnerabilities, analyze the results and correct any problems discovered in the devices. This is why the first objective of this research is the study and analysis of the techniques and methodologies used to detect vulnerabilities and perform security assessments in a network. This thorough analysis leads to the definition of the second objective of this thesis: the proposal, design and implementation of Consensus, a new solution to detect vulnerabilities in a network. This new system carries out security tests over data networks in an automated way to detect security vulnerabilities and potential holes. The system stores the test results in a centralized database. This line of research is linked with the project Consensus - Vulnerability Detection System explained in Section 1.2. The volume of information obtained while performing a security test is substantial, because a network is formed by a multitude of devices and many parameters and data about each of them should be gathered. In addition, it is essential to conduct security tests on a network periodically in order to maintain its level of security.
Two consecutive assessments of the same network can reveal new devices, setting changes, new vulnerabilities and other important aspects regarding network security. Consequently, the security analyst faces a constant challenge to study the information obtained from different tests and draw appropriate conclusions. In this context, another objective of this research work is the analysis and application of machine learning techniques in the field of network security, to provide mechanisms that improve the analysis of the information obtained on the status of a network. Within this objective, validating the results of data processing from a security and telematics point of view when applying machine learning techniques has generated a new goal and, hence, new contributions have been presented in this area. The retrieval of the data processed by means of machine learning techniques has become a topic of study as well. Comprehensible conclusions should be given to security analysts so that the proposed system becomes a real aid in their daily routine. These objectives have been materialized in Analia, the analysis module of Consensus with machine learning capabilities to extract conclusions from security data. All these lines of work are related to the projects VDS-Analia and MID-CBR, explained in Section 1.2. Accordingly, the main goals of this thesis are summarized in the following list:
• Study and analysis of the techniques and methodologies used to detect vulnerabilities and perform security assessments in a network.
• Design and implementation of the security testing contributions in a new framework called Consensus, a new solution to perform security assessments automatically in a network.
• Study and analysis of machine learning approaches in the field of network security to improve data analysis after a security assessment of a communications network.
• Study and analysis of validation techniques for the results of the application of machine learning approaches.
  30. 30. • Study and analysis of the retrieval of the data processed by means of machine learning techniques in a format understandable for the security analyst.
• Design and implementation of the contributions related to data processing with machine learning in Analia, an analysis module integrated in Consensus.
• Validation of the contributions and of the framework formed by Consensus and Analia with appropriate testing scenarios.
The integrating framework of this thesis is depicted in Figure 1.1. This figure shows the two main blocks of contribution: the Consensus system and the Analia module. The red dotted line separates both blocks and includes the security profile associated with each block. The security tester is the profile that will benefit from Consensus features, whereas the security analyst is the profile that will take advantage of Analia facilities. These two profiles are related to the phases of a methodological security assessment [97]. On the one hand, Consensus will provide the resources to verify system security settings and identify system vulnerabilities of wireless and wired networks. Consensus will make it possible to view the system from the external attacker's perspective by testing the network from the outside with the Internet Testing Sensor. It will also test the system from a malicious insider's perspective, from the inside, with the Intranet Testing Sensor, as can be seen in Figure 1.1. The special characteristics of the DeMilitarized Zone (DMZ), where the public servers are located, will be handled with the DMZ Testing Sensor. All these sensors and their integration in Consensus will be described exhaustively in the next chapters. On the other hand, Analia is the analysis module of Consensus with Artificial Intelligence capabilities to manage the data stored in Consensus after a security assessment. Analia will therefore supply the mechanisms to carry out a structured analysis of the data collected from a security assessment and a reporting proposal.
As a consequence, this module will include the machine learning techniques that will provide processed results to security analysts.
Figure 1.1: Framework of this thesis with Consensus and Analia (blocks labeled CONSENSUS and ANALIA).
The study, analysis, resolution and implementation of these objectives will be described in the following chapters. The various contributions for each of the research lines of this thesis will also be detailed.
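Section 1.4 notes that two consecutive assessments of the same network can reveal new devices, setting changes and new vulnerabilities. The following is only a minimal sketch of such a comparison, not the Consensus implementation: the `DeviceRecord` structure, its fields and the `compare_assessments` function are hypothetical names introduced here for illustration, and the thesis does not specify how Consensus stores or compares results.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeviceRecord:
    """Hypothetical summary of one tested device (not the Consensus schema)."""
    address: str
    open_ports: frozenset = field(default_factory=frozenset)
    vulnerabilities: frozenset = field(default_factory=frozenset)


def compare_assessments(previous, current):
    """Report what changed between two assessments of the same network.

    Both arguments map a device address to its DeviceRecord.
    """
    changes = {
        "new_devices": sorted(set(current) - set(previous)),
        "missing_devices": sorted(set(previous) - set(current)),
        "port_changes": {},
        "new_vulnerabilities": {},
    }
    # Devices seen in both assessments are compared field by field.
    for address in sorted(set(previous) & set(current)):
        old, new = previous[address], current[address]
        if old.open_ports != new.open_ports:
            changes["port_changes"][address] = {
                "opened": sorted(new.open_ports - old.open_ports),
                "closed": sorted(old.open_ports - new.open_ports),
            }
        added = new.vulnerabilities - old.vulnerabilities
        if added:
            changes["new_vulnerabilities"][address] = sorted(added)
    return changes
```

Keeping each record as sets makes the comparison a matter of set differences, which is one simple way to give the analyst only what changed rather than the full data of both tests.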
  31. 31. 1.5 The thesis
This thesis summarizes the work and contributions developed in the data network and machine learning fields. This document is divided into three parts: (1) framework and related work; (2) contributions; (3) conclusions and future work. Chapters 1 and 2 constitute the framework and related work part. Chapters 3, 4, 5 and 6 form the contributions part. Finally, Chapter 7 summarizes the conclusions and future work. A brief description of the chapters included in this thesis is given below:
• Chapter 1 introduces the framework of this thesis and details the motivation and goals that have led us to move forward in this research area.
• Chapter 2 provides background on relevant issues related to the research lines of this thesis. A broad overview of the field of network vulnerability assessments is given and the most representative security policies, standards and methodologies are detailed. Different approaches that have been taken in other systems for automating security assessments are surveyed as well. This chapter also reviews the artificial intelligence techniques most suitable for analyzing the information gathered in network vulnerability assessments. Unsupervised learning approaches are studied to select the most appropriate for the system. Validity techniques are analyzed and techniques to explain unsupervised learning results are addressed.
• Chapter 3 focuses on the first phase of a security assessment: the testing phase. The main goals of this phase are enumerated. The methodologies and procedures related to this phase are analyzed and several limitations are identified. The contributions to solve these limitations are presented. The framework Consensus is described and its components are depicted, with the aim of explaining the contributions related to the different improvements in more detail.
This chapter summarizes the Consensus features and argues that a new and worthwhile framework for security experts has emerged.
• Chapter 4 details the experiments carried out with the framework Consensus to evaluate its performance in real data networks. The testing scenarios and experimentation results are detailed. These evaluations show how Consensus can be used to improve the daily work of security testers.
• Chapter 5 focuses on the second phase of a security assessment: the analysis phase. A discussion of the most suitable artificial intelligence techniques for this security domain and this analysis phase is included. The main steps of the clustering approach have been defined, and contributions for each step have been designed and integrated in Analia, the analysis module of Consensus with artificial intelligence capabilities. This chapter explains the different contributions to the process of unsupervised learning and how they have been used to improve the analysis of security testing results.
• Chapter 6 summarizes the experiments run with Analia to process data from the Consensus framework. These experiments show how Analia can be used to extract conclusions from the information of security vulnerability assessments. Experimentation results are described and corroborate how Analia can be used to improve the daily work of security analysts.
• Chapter 7 presents the conclusions of the research and summarizes the contributions of this thesis. This chapter also outlines the lines for future research.
  32. 32. 1.6 Summary
This thesis has been developed within the research group GRSI of La Salle (URL) and has followed the guidelines of the PhD program Tecnologies de la Informació i les Comunicacions i la seva gestió of the same university. This thesis could not have succeeded without the support of the GRSI group, La Salle and the URL. The global context of this thesis is the telematics field. In this broad scope, the security of data networks has become the core area and, more specifically, the security assessment of data networks. In this particular area, the important benefits of machine learning, and unsupervised learning in particular, have been considered, and the integration of both worlds has resulted in the final contribution presented here. The final objective of this thesis is to define, design and implement a framework for the improvement of security assessments in data networks. This framework has automated the testing phase of a security assessment and has included machine learning techniques for the analysis of the security assessment results. The result of this framework is a new platform called Consensus, a distributed security testing system that establishes a guideline to deploy a complete network security test, promotes open-source tools, provides a flexible implementation and adapts to user needs. This system contains Analia, the module that includes the artificial intelligence techniques used to analyze testing results. The framework composed of Consensus and Analia has been evaluated with a set of experiments to corroborate the success and usefulness of the whole system. These contributions have been endorsed by several publications in the fields of telematics and machine learning. All this work will be described at length in the next chapters.
  33. 33. Chapter 2
Related work
The significant growth of networks and the Internet has led to increased security risks. Security vulnerabilities have become a main concern as they are constantly discovered and exploited. This chapter surveys related work on vulnerability assessment and the application of artificial intelligence techniques in this environment. The different topics covered by the thesis are detailed and the state of the art of these topics is presented.
2.1 Introduction
Communication data networks have become a key element in the infrastructure of every corporation. Data networks are used not only to connect devices but also to transport sensitive information, run critical business applications and even store mission-critical data. This crucial dependence on networks has generated a demand for mechanisms that assure a certain level of security. More connectivity leads to more security risks and hence more potential vulnerabilities that could be deliberately exploited. The term vulnerability can be defined as a condition of a missing or ineffectively administered safeguard or control that allows a threat to occur with a greater impact or frequency, or both [97]. A vulnerability can also be defined as a bug, misconfiguration or special set of circumstances that could be exploited, directly by an attacker or indirectly through automated attacks [113]. Related to this term, vulnerability assessment is the process of identifying and quantifying vulnerabilities in a system. In network security assessment, this system can be a computer, the communications infrastructure or a whole data network. Moreover, a vulnerability assessment must be an important part of any security audit. A security audit is a systematic, measurable technical assessment of how the organization's security policy is employed at a specific site. In fact, security audits provide a measurable way to examine how secure a site really is.
Information is gathered through personal interviews, vulnerability scans, examination of operating system settings and network baselines. A security audit follows a three-phase methodology. Phase one serves as a planning phase and defines the scope of the effort, the network architecture, and the existing security solutions. Phase two encompasses a number of different testing tools and activities used to verify system security settings and identify system vulnerabilities, in order to view the system from two separate perspectives: that of the external attacker and that of a malicious insider. Finally, phase three defines a structured analysis of the collected data and a reporting approach [97]. However, due to the large size of corporate networks and the different aspects to take into account, time and cost can limit the depth of a security audit. These limitations justify the automation of the maximum number of processes
  34. 34. involved in this task. This thesis focuses on analyzing and automating several processes related to phase two and phase three to help security auditors; phase one depends on each corporation and must be performed by auditors. Regarding phase two, automated vulnerability assessment tools can perform the scanning task on a large number of devices in a fraction of the time taken by a manual approach [83]. Nevertheless, it is also necessary to automate the processes related to the analysis of test results (phase three), as a thorough network security test generates extensive quantities of data that need to be handled. This chapter describes the related work covered by the thesis. It is divided into two main blocks: a survey of network security assessments and a study of the analysis phase of a security assessment. The former considers the definition of a security vulnerability and reviews the most widely used security policies, standards and methodologies. It describes types of security assessments. Next, it provides a study of the testing tools that have become widespread and are, hence, more suitable to be included in the contribution framework. Regarding the second block, an analysis of the most suitable machine learning techniques for this scenario has been carried out. Different unsupervised learning approaches have been studied. Several alternatives to explain clustering results have been detailed. Finally, a study of the techniques to validate clustering results closes this related work.
2.2 Network vulnerability assessments
2.2.1 Security vulnerabilities
Security vulnerabilities have become a main concern as they are constantly discovered and exploited in computer systems and network devices. Several tools exist to analyze them from the point of view of vulnerability definition, modeling and characterization. A service offered by SANS¹ is called @RISK: The Consensus Security Alert.
This system reports the new security vulnerabilities discovered during the past week and the actions that other organizations are taking to protect themselves. The NVD2 is a security vulnerability database that integrates publicly available U.S. Government vulnerability resources. The NVD is also synchronized with the CVE3 vulnerability list. CVE provides common names for publicly known information security vulnerabilities and exposures. OVAL4 also promotes open and publicly available security content. OVAL defines an XML-based language for representing system configuration information and transferring results to other OVAL-compatible products. It uses the CVE list as the basis for most of the OVAL definitions.

However, the CVE list does not give a measurement that reflects the criticality of vulnerabilities. There is a public initiative for scoring and quantifying the impact of software vulnerabilities, named the Common Vulnerability Scoring System (CVSS). Its score represents the actual risk a given vulnerability poses, helping analysts prioritize remediation efforts [19].

Different research projects use the CVE database as a source to identify vulnerabilities. M2D2 is a model for IDS alert correlation that models vulnerabilities from CVE [90]. ALBA is an agent-aided intrusion detection tool that incorporates the CVE dictionary into its ontology to represent vulnerability information [81]. The CVE database has been included in systems that support the development of secure software [50, 68]. Other systems use CVE information to predict the vulnerability discovery process for a program [71]. The Open Web Application Security Project (OWASP5) presented VulnXML, an open standard format for web application security vulnerabilities only. Unfortunately, there is not much research on automated vulnerability management [112].

1 The SANS Institute. http://www.sans.org
2 National Vulnerability Database (NVD). http://nvd.nist.org
3 Common Vulnerabilities and Exposures (CVE). http://http://www.cve.mitre.org
4 Open Vulnerability and Assessment Language (OVAL). http://oval.mitre.org
5 OWASP Project. http://www.owasp.org
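As a rough illustration of how a CVSS score can help analysts prioritize remediation, the sketch below orders findings by base score. It assumes CVSS version 2 base scores and the qualitative ranges the NVD uses for that version (Low below 4.0, Medium from 4.0 to 6.9, High from 7.0). The `Finding` type, the host names and the identifiers are hypothetical placeholders, not real CVE entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    host: str         # hypothetical device name
    vuln_id: str      # placeholder identifier, not a real CVE entry
    cvss_base: float  # CVSS base score in the range 0.0-10.0


def severity(score: float) -> str:
    """Map a base score to a qualitative rating (assumed CVSS v2 NVD ranges)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"


def prioritize(findings):
    """Return findings ordered from highest to lowest base score.

    Ties are broken by host and identifier so the order is deterministic.
    """
    return sorted(findings, key=lambda f: (-f.cvss_base, f.host, f.vuln_id))


# Invented sample data, for illustration only.
sample = [
    Finding("router-a", "EX-0001", 3.5),
    Finding("server-b", "EX-0002", 9.0),
    Finding("server-b", "EX-0003", 5.0),
]
ordered = prioritize(sample)
```

Sorting on the score alone is the simplest possible policy; a real assessment would probably also weigh the role of the affected device.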
2.2.2 Security policies, standards, procedures and methodologies

Every device connected to a network is exposed to many different threats. While information system and network security professionals struggle to keep up with constantly evolving threats, hackers use technology that conventional security tools and services cannot cope with. Nevertheless, it is important to consider not only deliberate actions but also unintentional human activities, problems with systems or external problems that could lead to negative results such as the exposure, modification or even deletion of information, or the disruption of services.

There is a serious lack of knowledge about the solutions to put in place in order to improve security in a corporation. This leads to an insufficient security level in many corporate networks. For example, a study performed by ASIMELEC in 2003 on 52 Spanish companies showed that security standards like ISO17799-1 had been adopted in only 53% of the corporations [44]. This study also maintains that the main cause of these poor results is the companies' lack of awareness of this environment and its consequences. Moreover, most companies consider that a high percentage of the security recommendations are not applicable in their context, although the main reason for this perceived lack of applicability is their inexperience [44]. The international register of ISMS (Information Security Management System) certificates shows only 35 Spanish companies certified with ISO/IEC 27001:2005 in May 2009. By comparison, the UK has 400 certified companies, Germany has 119 and the United States has 91 [66].

The most important security standards are the following:

• BS7799: standard published by the British Standards Institution (BSI). The standard BS 7799 (BS 7799-1) was published in 1995.
It is a code of practice for information security management and it is the basis for obtaining a BSI accreditation that assures that the information stored in a corporate network is secure and the security of that corporation is reliable. A second part, BS 7799-2, was added in 1998. Part 1 of the standard was proposed as an ISO standard and published with minor amendments as ISO/IEC 17799 in 2000. BS 7799-2:2002 was officially launched in 2002. It was re-published in June 2005 as ISO/IEC 27001:2005, as a result of the regular ISO standards update cycle.
  - ISO/IEC 17799: this standard establishes guidelines and general principles for initiating, implementing, maintaining and improving information security in an organization. It contains best practices of control objectives and controls in different areas like security policy, asset management, human resources security, physical and environmental security, communications and operations management, access control, information security incident management and so on. Most of the existing standards are based on this ISO standard and incorporate part of its recommendations.
  - ISO-27001:2005: this standard establishes the bases of an ISMS. It applies, as a framework, the ten operational areas defined in ISO/IEC 17799:2005. It covers all types of organizations to ensure different aspects like compliance with laws and regulations, definition of new information security management processes, and also identification and clarification of existing information security management processes. This standard follows the four-step model PDCA (Plan-Do-Check-Act) in order to establish, operate, monitor and constantly improve the effectiveness of a documented ISMS.
• UNE-ISO/IEC 17799 and UNE 71502: UNE is the Spanish authority in charge of regulating network and information security. UNE-ISO/IEC 17799 is a code of best practice for information security management and it is based on the international standard ISO/IEC
17799. It divides the security specification into ten blocks ranging from perimeter security to the classification and control of assets. The standard UNE 71502:2004, based on BS 7799-2:2002, was approved in 2004. AENOR carries out certification according to the Spanish standard UNE 71502:2004. This certification establishes the requirements to implement, document and evaluate an ISMS, within the risks identified by organizations according to the UNE-ISO/IEC 17799:2002 standard, which develops a code of good practice for information security management.
• ISO/IEC TR 13335: this standard provides guidelines for the management of information technology (IT) security. It defines general concepts about security and communication models. It also provides techniques for the management of IT security, designed to assist the implementation of IT security. Different concepts and models for information and communications technology security management are included too.
• ISO/IEC 15408: the main goal of this standard is to provide evaluation criteria for IT security in order to compare different models. Functional and assurance requirements can be evaluated when applying this standard.

Different methodologies for security management and system development have arisen in recent years. These methodologies are mainly focused on design, implementation and monitoring aspects. Complex systems and networks need adequate methods to formalize and validate their management and development. The main security methodologies are the following:

• OCTAVE (Operationally Critical Threat, Asset, and Vulnerability Evaluation): it is a risk-based strategic assessment and planning technique for security [94]. This methodology focuses not only on technology but also on organizational risk and strategic, practice-related issues, balancing operational risk and security practices.
It defines a set of self-directed activities for organizations to identify and manage their information security risks. It is an open methodology that contains three phases. Phase one determines what is important to the organization and what is currently being done to protect those assets. Security requirements and possible threats to each critical asset are also identified. Phase two evaluates the information infrastructure considering the network, architecture, operating system and applications. Finally, phase three develops a protection strategy and mitigation plans to address the risks to the critical assets.
• MEHARI (Méthode Harmonisée d'Analyse des Risques): it is a method for risk analysis and risk management developed by CLUSIF (Club de la Sécurité des Systèmes d'Information Français). It consolidates the MARION and MELISA methods. It is compatible with other security standards like ISO/IEC 17799 and ISO 13335. The software Risicare is the main tool used to implement this methodology.
• ISM3 (Information Security Management Maturity Model): ISM3 is a framework for ISMS. ISM3 aims to define levels of security that are appropriate to the business mission and yield a high return on investment. It can be applied to ensure that business objectives are specifically tailored to security design, implementation, operations, management, procurement, and assurance processes.
• Ebios (Expression des Besoins et d'Identification des Objectifs de Sécurité): it is used to assess and treat risks relating to information systems security. It can be used to communicate this information within the organization and to partners. Thus, it assists in the risk management process. It was designed by DCSSI (Central Information Systems Security
Division - Direction Centrale de la Sécurité des Systèmes d'Information), a department of the French government.
• OSSTMM (Open Source Security Testing Methodology Manual): it is an open source methodology proposed by ISECOM [67]. It is a peer-reviewed methodology for performing security tests and metrics. It is divided into five sections which test information and data controls, personnel security awareness levels, fraud and social engineering control levels, computer and telecommunication networks including wireless devices, physical security access controls and security processes. This is the first open source methodology in this environment. It is considered an operational rather than a corporate methodology: it does not explain how security tests should be performed. It is more focused on which items need to be tested, what to do before, during, and after a security test, and how to measure the results.

2.2.3 Security assessments

Network security is a main concern in any corporation. Constantly evolving threats force security professionals to carry out permanent and exhaustive controls to protect networks and detect weaknesses. Anomaly detection techniques have been widely applied to detect intrusions and vulnerabilities since the publication of the seminal report by Anderson in 1980 [5]. A correct design of a security system requires an understanding of the threats and attacks and of how these may manifest themselves in audit data [5]. Consequently, testing is a basic tool to assure that enterprise security requirements are met. The security level can also be evaluated by analyzing test results and discovering how attackers can penetrate systems and networks [34, 113].

Security assessments pursue three main goals. The first goal is to discover design and implementation flaws, as well as to identify the operations that may violate the security policies implemented in an organization.
Secondly, it has to ensure that the security policy accurately reflects the organization's needs. Finally, it has to evaluate the consistency between the systems documentation and how the systems are implemented. In fact, a complete test should include system, communication, physical, personal, operational and administrative security [113].

Different security assessments can be performed. They differ in the goal to analyze and in the cost and time spent to obtain results (see Figure 2.1, [67]). The most common security tests are the following [67]:

• Vulnerability scanning: it consists in discovering known security holes, flaws, software or techniques that take advantage of any vulnerability in remote systems. Most vulnerability scanners rely on a local database that contains all the information required to check a system for security holes in services and ports, anomalies in packet construction, and potential paths to exploitable programs or scripts. Then the scanner tries to exploit each vulnerability that is discovered. When vulnerabilities are discovered, countermeasures should be applied in the system so as to solve the problem.
• Security scanning: this test not only performs a vulnerability scan but also includes manual false positive verification and network weakness identification.
• Penetration testing: the commonly accepted definition of penetration testing is the illegitimate acquisition of legitimate authority [49, 107, 99]. The goal of a penetration test is to gain privileged access to the network or any device. Therefore, it is based on the ability to command network facilities to operate differently from what their owners
expected them to do, to gain full or at least substantial control of a host in a way normally reserved for trusted operations staff, or to acquire the management interface of an application [49].
• Risk Assessment: it is the process of determining the level of risk in a particular course of action. Risk is understood as the possibility that some factor could damage any element of an organization. The result of this test is a report that shows assets, vulnerabilities, likelihood of damage, estimates of the costs of recovery, summaries of possible defensive measures and their costs, and estimated probable savings from better protection. It is necessary to take into consideration the presence of the network, business justifications and also industry-specific justifications [110].
• Security Audit: it usually refers to a hands-on, privileged security inspection of all the systems within a network, taking into account the analysis of operating systems and applications. This test is much more exhaustive than the previous ones.
• Ethical Hacking: it describes the process of attacking a network on behalf of its owners, seeking vulnerabilities that a malicious hacker could exploit. Ethical hackers report problems instead of taking advantage of them. These hackers work with clients in order to help them secure their systems. Thus, this test is not intrusive, as its goal is to detect failures but not to damage any network or system [107].
• Posture Assessment or Security Testing: it is the process of determining whether an information system is capable of protecting its data and maintaining its original functionality [113].

[Figure 2.1. Axis labels: cost, time. Plotted assessment types: Vulnerability Scanning, Security Scanning, Penetration Testing, Ethical Hacking, Risk Assessment, Posture Assessment & Security Testing, Security Auditing.]

Figure 2.1: Types of system and network security assessments as based on time and cost.
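The database lookup that vulnerability scanners rely on can be sketched as follows. This is only a minimal illustration of the matching step, assuming that services have already been identified on each port; it does not attempt to exploit anything. The database contents, service names and issue identifiers are all invented for the example.

```python
# Hypothetical local database: (service name, version) -> known issue identifiers.
VULN_DB = {
    ("example-ftpd", "1.0"): ["EX-0001", "EX-0002"],
    ("example-httpd", "2.1"): ["EX-0003"],
}


def check_host(open_services, vuln_db=VULN_DB):
    """Return {port: [issue ids]} for services that have database entries.

    open_services maps a port number to the (service, version) pair that
    was identified on that port. Ports whose service is not in the
    database are left out of the result.
    """
    hits = {}
    for port, fingerprint in sorted(open_services.items()):
        issues = vuln_db.get(fingerprint, [])
        if issues:
            hits[port] = list(issues)  # copy so callers cannot alter the database
    return hits


# Invented scan result for one host.
scan_result = {
    21: ("example-ftpd", "1.0"),
    22: ("example-sshd", "5.0"),
    80: ("example-httpd", "2.1"),
}
host_hits = check_host(scan_result)
```

An exact match on name and version is the simplest rule; it is used here only because it keeps the example short.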
2.2.4 Network security testing tools

Different vulnerability assessment tools exist in the market. Some of them are free scanners, like Nessus6, X-Scan7 and SARA8. Commercial tools include GFI LANguard9, Retina10, ISS Internet Scanner11 and SAINT12. The latest version of Nessus is now closed source, but it is still free without the latest plugins. The most popular free web vulnerability scanners are Nikto13, Paros proxy14 and Webscarab15, whereas WebInspect16, Watchfire AppScan17 and N-Stealth18 are commercial web vulnerability scanners.

Cost is not the only concern with commercial solutions: their proprietary methodologies also hide the internal testing process and vulnerability assessment. These types of tools do not usually implement an open source methodology for network testing. On the other hand, free or open source tools are useful when performing a test due to their low cost, flexibility and ease of customization. However, it is necessary to use several simple tools in order to perform a thorough security test, as every tool covers only part of the test. Thus, distinct tools are needed for system service identification, vulnerability assessment, application-specific assessment, access control testing or other phases of a network security test.

The first goal of a network vulnerability assessment is to test everything possible [97]. Different improvement proposals to achieve this objective have been found. VASE (Vulnerability Assessment Support Engine) improves safety in the vulnerability testing process but only integrates Nessus as vulnerability assessment tool [54]. VAS is another system that uses Nessus as the only scanning engine for vulnerability assessments [117]. Lee et al. [75] present a system that automates port scanning by using only Nmap19.
SAT is a tool to test and detect weaknesses and security holes in a computer system, but it does not detect vulnerabilities on other network devices [89]. Ferret is another host vulnerability checking tool that does not consider network vulnerabilities [104]. Another proposal is MulVAL [95], a system that uses OVAL as scanning tool. However, an OVAL scanner runs on each machine, so the tool must be installed on every host that needs to be analyzed. Hence, hosts must be modified before the scan and vulnerability results could be influenced by this alteration. If the network to be tested has many devices, the preparatory work needed to perform the test increases dramatically. Moreover, most of the reported systems only take into account servers or computer systems and do not consider other network devices like firewalls or routers. Furthermore, wireless networks need special tests as their properties differ from those of wired networks. This thesis will provide a solution to automate the testing phase using complementary tools in a structured way and following an open source methodology to perform vulnerability assessments.

Protecting a network is about more than just securing servers and computer systems. The network should also be protected against attacks that target firewalls and routers. Routers provide services that are essential to the correct operation of the networks. Compromise of a router can lead to various security problems on the network served by that router, or even on other networks with which that router communicates.

6 Nessus. http://www.nessus.org
7 X-Scan. http://www.xfocus.net/projects/X-Scan
8 SARA. http://www-arc.com/sara
9 LANGuard. http://www.gfi.com/lannetscan
10 Retina. http://www.eeye.com
11 ISS. http://www.iss.net
12 SAINT. http://www.saintcorporation.com/products/vulnerability scan/saint
13 Nikto. http://www.cirt.net/code/nikto.shtml
14 Paros proxy. http://www.parosproxy.org
15 OWASP WebScarab Project. http://www.owasp.org
16 SPI Dynamics' WebInspect. http://www.spidynamics.com
17 Watchfire AppScan. http://www.watchfire.com/products/appscan
18 N-Stealth. http://www.nstalker.com
19 Nmap. http://www.insecure.org

Different auditing procedures for a network secured with a
firewall are described in [79, 12]. Nessus and Nmap, respectively, are the tools proposed to audit the firewall manually. A case study for security audit in [78] manually analyzes the network infrastructure and its possible vulnerabilities. PBit is a tool developed to test packet filter rules only for Linux firewalls [39]. A functional approach is presented in [70], where firewall vulnerabilities are classified based on the firewall functional units in order to run the corresponding penetration tests for different firewall products. Another firewall testing technique, based on selecting the packets instead of random testing, is shown in [41]. All these solutions are focused only on testing firewalls and routers; they are not integrated in a system capable of testing all devices within a network.

The increasing popularity of wireless networks has opened organizations up to new security threats. Wireless LANs (WLANs) present unique security challenges because they are vulnerable to specialized attacks. Some attacks may exploit technology weaknesses, whereas others may focus on configuration weaknesses. Compromising only one node or introducing a malicious node may affect the viability of the entire network [34]. Typical tools for monitoring wired networks examine only network or higher abstraction layers based on the assumption that lower layers are protected by the physical security of wires. However, this assumption cannot be extrapolated to wireless networks because of the broadcast nature of such networks [77]. Currently, there exist wireless vulnerability scanners, like Aircrack20, Kismet21, Stumbler22, Wellenreiter23 or Airsnort24. However, different tools are needed to obtain all the information required for a reliable test. Therefore, there is a need for a vulnerability detection system for WLANs that automates the processes.
As wireless networks are usually part of a corporate network, this automated system should be integrated in a system capable of detecting vulnerabilities in the entire network. A global solution for this problem will be presented in the following sections.

Another line of work regarding vulnerability analysis is related to modeling network attacks, since attacks may exploit a known or unknown vulnerability. Approaches include model checking techniques [102], attack-graph generation based on model checking [106], graph-based search algorithms [98, 3], reasoning rules using Datalog clauses [95], a framework for automatically extracting attack manifestations from log data [74], stochastic formal grammar [73] and finite-state machine modeling [22]. However, this line is out of the scope of this thesis, which is more focused on the analysis of vulnerability reports after a network test has been performed.

2.3 Analysis of network vulnerability assessments

The last phase of a vulnerability assessment is the analysis of the assessment results to obtain a concise report that summarizes the conclusions obtained from the audit. Regarding result analysis, systems like VASE and VAS only show Nessus results [54, 117]. MulVAL models the interaction of software bugs with system and network configurations using reasoning rules that characterize general attack methodologies. MulVAL results show violation and attack traces [95]. All these tools show vulnerability assessment results, but the analysis and processing of these data are performed manually by security analysts. However, considerable quantities of data are compiled after performing a network security test, and therefore manual classification becomes arduous work [34, 42, 81]. This is why the analysis phase needs to be improved in order to help security analysts deal with test results. The improvements presented in this thesis introduce AI techniques in the analysis phase to handle stored data.
This is why the related work covered next concerns AI and networking.

20 Aircrack. http://www.michiganwireless.org/tools/aircrack
21 http://www.kismetwireless.net
22 http://www.stumbler.net
23 http://www.wellenreiter.net
24 www.airsnort.shmoo.com
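Because every tool covers only part of a test, results from several tools have to be brought together per device before any analysis can start. The following is a minimal sketch of that grouping step, assuming each tool's output has already been parsed into `(tool, host, finding)` tuples. The tool names come from the text above, but the hosts, the findings and the record format are invented for illustration.

```python
from collections import defaultdict


def merge_reports(records):
    """Group findings by host and remember which tools reported each one.

    records: iterable of (tool, host, finding) tuples.
    Returns {host: {finding: sorted list of tool names}}.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for tool, host, finding in records:
        merged[host][finding].add(tool)
    # Convert to plain dicts and sorted lists so the result is easy to compare.
    return {
        host: {finding: sorted(tools) for finding, tools in findings.items()}
        for host, findings in merged.items()
    }


# Invented parsed output from three tools.
records = [
    ("Nmap", "10.0.0.1", "port 21 open"),
    ("Nessus", "10.0.0.1", "port 21 open"),
    ("Nessus", "10.0.0.1", "weak ftp configuration"),
    ("Kismet", "ap-lobby", "unencrypted wireless network"),
]
merged = merge_reports(records)
```

Keeping the list of reporting tools next to each finding makes it visible when two tools agree, which may help when checking for false positives.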
2.3.1 Artificial Intelligence and Machine Learning

The main goal of Machine Learning is to build mechanisms through which intelligent systems improve their performance over time. This paradigm refers to systems capable of the autonomous acquisition and integration of knowledge. This capacity to learn from experience, analytical observation, and other means offers increased efficiency and effectiveness. Depending on the implemented strategy, Machine Learning techniques can be classified into supervised learning and unsupervised learning. The former estimates unknown dependencies from known input-output samples. In supervised learning, classes are predefined, so an expert is required to pre-classify the samples. In unsupervised learning, on the other hand, classes are initially unknown and need to be extracted from the input data. Unsupervised learning is useful when the system does not provide data with labeled classes or any other information beyond the raw data.

2.3.2 Clustering

The analysis of information provided by network security tests requires improved methods to identify behavior patterns and to recognize malicious data or unauthorized changes in a network [36]. Unsupervised learning can be applied in this security test environment, where the domain and the different classes have not been defined and no previous knowledge of network behavior and data results is required. In this sense, unsupervised learning can be used for clustering groups of tested devices with similar vulnerabilities and finding features inherent to the problem.

2.3.2.1 Clustering process

The main goal of clustering is to group elements with similar attribute values into the same class. Elements from the same cluster should have a high similarity, whereas elements from different clusters should have a low similarity. In the clustering task there is no goal attribute, as classes are initially unknown and need to be discovered from the data.
This means that no information related to common features in a class can be used, as classes have not been previously defined. Clustering activity usually involves the following steps:

1. Pattern representation.
2. Definition of a pattern proximity measure appropriate to the data domain.
3. Clustering.
4. Data abstraction, if needed.
5. Cluster validation, if needed.

Pattern representation refers to the number of classes, the number of available patterns, and the number, type, and scale of the features available to the clustering algorithm. A feature selection step should be carried out beforehand, in which the most effective subset of the original features is selected. Secondly, pattern proximity is usually measured by a distance function defined on pairs of patterns. A broad variety of distance measures can be used. A simple distance measure like Euclidean distance can often be used to reflect dissimilarity between two patterns, whereas other similarity measures can be used to characterize the conceptual similarity between patterns. There also exists a wide variety of techniques for cluster formation. For this reason, the most adequate techniques for the data domain should also be selected. Data abstraction is the process of extracting a simple and compact representation of a data set, usually in terms of cluster prototypes or representative patterns such as the centroid or the director vector. Finally, cluster validation is the assessment of a clustering procedure's output to determine whether the output is meaningful.
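Three of these steps can be illustrated in a few lines: pattern representation, the Euclidean proximity measure, and data abstraction through a centroid. The sketch assumes each tested device is described by the set of features observed on it and encodes that set as a 0/1 vector. The encoding and the feature names are hypothetical choices made for the example, not the representation used later in the thesis.

```python
import math


def to_vector(observed, feature_names):
    """Pattern representation: 1.0 if a feature was observed, else 0.0."""
    return [1.0 if name in observed else 0.0 for name in feature_names]


def euclidean(a, b):
    """Pattern proximity: Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Data abstraction: component-wise mean of a non-empty group of vectors."""
    if not vectors:
        raise ValueError("cannot compute the centroid of an empty group")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical features and two invented devices.
features = ["ftp open", "ssh open", "telnet open"]
device_a = to_vector({"ftp open", "ssh open"}, features)
device_b = to_vector({"ssh open", "telnet open"}, features)
distance = euclidean(device_a, device_b)
prototype = centroid([device_a, device_b])
```

With a 0/1 encoding, the squared Euclidean distance simply counts the features on which two devices disagree, which is one reason this simple measure is often adequate.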
2.3.2.2 Clustering classification

There exist a large number of clustering methods. The choice of a clustering algorithm depends on the type of available data and on the particular purpose [57]. In general, clustering methods can be classified into the following approaches.

There are two large families of clustering techniques: partition methods and hierarchical methods. The former cluster training data into K clusters, where K < M and M is the number of objects in the data set. The latter work by grouping data objects into a tree of clusters.

Clustering methods can also be classified considering the strategy used to build the clusters. Center-based methods minimize an objective function, usually based on distances between centroids, which indicates how good a solution is. One of the most representative partition and center-based methods is the K-means algorithm [60]. Its main drawback is that the K parameter must be configured in advance. In this sense, the X-means [96] algorithm automatically calculates the best number of clusters for each execution. Search-based methods also minimize an objective function, but tend to search for the global optimum instead of a local optimum. MOCK (Multiobjective Clustering with Automatic k-Determination) [58] is an example of a search-based clustering approach. It optimizes a partitioning with respect to multiple, complementary clustering objectives and uses a multiobjective evolutionary algorithm to perform the optimization.

Density-based methods discover clusters with an arbitrary shape. They typically regard clusters as dense regions of objects in the data space that are separated by regions of low density, representing noise. The most popular algorithms in this category are the following: DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [43], OPTICS (Ordering Points To Identify the Clustering Structure) [6] and DENCLUE (DENsity-based CLUstEring) [62].
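To make the center-based idea concrete, here is a minimal sketch of the usual K-means iteration: assign each object to its nearest centroid, then recompute each centroid as the mean of its members. It is an illustration under simplifying assumptions and not the implementation evaluated in the thesis. In particular, the initial centroids are passed in by the caller, which also fixes K, so the result is deterministic. `math.dist` requires Python 3.8 or later.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means on numeric vectors.

    K is fixed by the number of initial centroids. Returns a pair
    (assignment, centroids), where assignment[i] is the index of the
    cluster that points[i] belongs to.
    """
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Invented two-dimensional data with two obvious groups.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(data, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The need to supply the initial centroids mirrors the drawback noted above: the number of clusters has to be decided before the algorithm runs.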
On the other hand, grid-based methods divide the space into a finite number of cells that form a grid structure in which all clustering operations are performed. As an advantage, their processing time is constant, independent of the number of data objects. Notable algorithms in this group include CLIQUE (CLustering In QUEst) [2], STING (STatistical INformation Grid) [114], and WaveCluster [105], an algorithm that clusters using the wavelet transformation. Model-based methods use mathematical and probabilistic models; an example is the Autoclass algorithm [21]. Concerning artificial neural networks (NN) [13], the Kohonen Map or Self-Organizing Map (SOM) [72] is a widely used unsupervised learning model. Adaptive Resonance Theory (ART) is another NN technique. There are many variations of ART available. The simplest ART network classifies an input vector into a category depending on the stored pattern it most closely resembles [16].

Different clustering techniques have been applied in the network security domain. K-means has been used to find natural groupings of similar alarm records [15, 69] and also to detect network intrusions [76, 120, 42]. Clustering has been used for network traffic classification as well [80]. SOM has already been applied as a clustering method to study computer attacks [36], to detect network intrusions [37] and viruses [118], to analyze TCP traffic patterns [119], and also to detect anomalous traffic [101]. Evolutionary multiobjective clustering has been successfully applied to important real-world problems such as intrusion detection [4], formation of cluster-based sensing networks in wireless sensor networks [116] and creation of security profiles [55]. Soft Computing techniques can improve the discovery of implicit and previously unknown knowledge by using better feature extraction in a high-dimensional space to cluster data [47].
An example of a Soft Computing technique is SOM [72]. However, a drawback of data clustering methods is that their results are hard to interpret: they do not explain why some elements are in the same cluster and other elements have been grouped separately. Hence further interpretation is needed to provide an accurate explanation of the results. The following sections briefly describe the clustering techniques that may offer the best options for processing data from security assessments.
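One very simple way to attach an explanation to a cluster is to list the features that all of its members share. The sketch below is offered only as an illustration of the interpretation problem and is not one of the explanation techniques surveyed in this chapter. It assumes each element is described by a set of observed features and that a clustering step has already produced one cluster index per element; all names and data are invented.

```python
def describe_clusters(assignment, feature_sets):
    """Return {cluster id: sorted list of features present in every member}."""
    if len(assignment) != len(feature_sets):
        raise ValueError("need exactly one cluster id per element")
    shared = {}
    for cluster_id, features in zip(assignment, feature_sets):
        if cluster_id in shared:
            shared[cluster_id] &= set(features)  # keep only common features
        else:
            shared[cluster_id] = set(features)
    return {cid: sorted(feats) for cid, feats in shared.items()}


# Invented devices and an invented clustering result.
devices = [
    {"ftp open", "ssh open"},
    {"ssh open"},
    {"telnet open", "snmp open"},
]
summary = describe_clusters([0, 0, 1], devices)
```

An empty list for a cluster would signal that its members have no feature in common, which is itself a hint that the grouping deserves a closer look.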
