Applying Soft Computing Techniques to Corporate Mobile Security Systems
Master's Degree in Computer and Network Engineering (Máster en Ingeniería de Computadores y Redes)
Paloma de las Cuevas Delgado
Supervised by Drs.:
Antonio Miguel Mora García
Juan Julián Merelo Guervós
Index
1. Research context.
2. Underlying problem and objectives.
3. Data description and preprocessing.
4. Experimental setup.
5. Experiments and results.
6. Conclusions and scientific contributions.
7. Future work.
Research Context
● The Bring Your Own Device (BYOD) problem
● The MUSES server
Underlying Problem
● Enterprise security applied to employees' connections to the Internet (URL requests).
● How is security enforced?
○ Proxy
○ Blacklists
○ Whitelists
○ Firewalls
○ Elaboration of corporate security policies
Black and white lists are lists of URLs that are denied (black) or permitted (white).
● The aim of this research is to go a step beyond these lists.
Objectives
● Objective → to obtain a tool that automatically makes an allow or deny decision for URLs that are not included in the black/white lists.
○ This decision would be based on the decisions made for similar URL requests (those with similar features).
○ The tool should consider other parameters of the request in addition to the URL string.
Workflow
1. Data mining process
a. Parsing
b. Preprocessing
2. Labelling process (requests labelled as ALLOW or DENY)
3. Machine learning
4. Study of classification accuracies
Working Scenario
● Employees requesting access to URLs during the workday (records from an actual Spanish company with around 100 employees).
● We had access to a log file in CSV format with 100k entries (patterns) recorded over two hours (8 to 10 am).
● We were also provided with a set of rules (the security policies specified as if-then clauses).
Data description
● An entry (unlabelled):
  http_reply_code: 200
  http_method: GET
  duration_miliseconds: 1114
  content_type: application/octet-stream
  server_or_cache_address: X.X.X.X
  time: 08:30:08
  squid_hierarchy: DEFAULT_PARENT
  bytes: 106961
  url: http://www.one.example.com
  client_address: X.X.X.X
● A policy and its rule:
  Policy: “Video streams cannot be played”
  Rule:
  rule "policy-1 MP4"
      // rule attributes (e.g. salience) would go here
      when
          squid:Squid(dif_MCT == "video", bytes > 1000000,
                      content_type matches ".*application.*",
                      url matches ".*p2p.*")
      then
          PolicyDecisionPoint.deny();
  end
● An entry
○ has 7 categorical fields and 3 numerical fields.
● A rule
○ has a set of conditions and a decision (ALLOW/DENY).
○ Each condition has three parts:
■ Data type (e.g. bytes)
■ Relationship (e.g. <)
■ Value (e.g. 1000000)
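To make the entry/rule structure concrete, here is a minimal Python sketch (not the MUSES implementation) of a condition as a (data type, relationship, value) triple checked against one entry. It assumes an entry is a dict keyed by field name, that the three numerical fields are `duration_miliseconds`, `bytes` and `time` (the slides do not name them), and that `matches` behaves like a whole-string regular-expression match; `Condition`, `RELATIONS` and `holds` are hypothetical names.

```python
import operator
import re
from dataclasses import dataclass

# ASSUMPTION: the three numerical fields; the slides only say there are three.
NUMERICAL_FIELDS = {"duration_miliseconds", "bytes", "time"}

# Hypothetical set of relationship symbols mapped to comparison functions.
RELATIONS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    # Drools-style "matches": the regular expression must match the whole field value
    "matches": lambda field_value, pattern: re.fullmatch(pattern, field_value) is not None,
}


@dataclass(frozen=True)
class Condition:
    field: str      # data type, e.g. "bytes"
    relation: str   # relationship, e.g. ">"
    value: object   # value, e.g. 1000000

    def holds(self, entry: dict) -> bool:
        """Return True if the entry satisfies this condition."""
        if self.field not in entry:
            return False
        field_value = entry[self.field]
        if self.field in NUMERICAL_FIELDS:
            field_value = float(field_value)  # log values arrive as strings
        return RELATIONS[self.relation](field_value, self.value)
```

For the example entry above, `Condition("bytes", ">", 1000000)` does not hold (106961 bytes), while `Condition("content_type", "matches", ".*application.*")` does.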
Tools used during this research
● Drools and Squid syntax for the rules, CSV format for the log data.
● Weka, which provides a large, state-of-the-art set of classifiers.
● Two implementations:
○ Perl → faster in the parsing process, slower in the labelling process and when using Weka.
○ Java → native integration with Weka, better for automation, and it will be embedded in an actual Java project (MUSES).
After the parsing process
● A hash with the entries:
○ Keys → entry fields
○ Values → field values
● A hash with the set of rules:
○ Keys → condition fields and decision
○ Values → the name of the data type, its desired value, the relationship between them, and ALLOW or DENY
# 'xxx' stands for each parsed value
my %logdata = (
    entry => {
        http_reply_code         => 'xxx',
        http_method             => 'xxx',
        duration_miliseconds    => 'xxx',
        content_type            => 'xxx',
        server_or_cache_address => 'xxx',
        time                    => 'xxx',
        squid_hierarchy         => 'xxx',
        bytes                   => 'xxx',
        url                     => 'xxx',
        client_address          => 'xxx',
    },
);
my %rules = (
    rule => {
        field    => 'xxx',
        relation => 'xxx',
        value    => 'xxx',
        decision => 'allow',    # or 'deny'
    },
);
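The parsing step can be sketched in Python as follows, producing one dict per log line as a counterpart of the `%logdata` hash. It assumes the CSV has no header row and that its columns follow the order of the example entry shown earlier; `parse_log`, `FIELDS` and `example_rule` are illustrative names only.

```python
import csv
import io

# ASSUMPTION: column order follows the example entry in the data description.
FIELDS = [
    "http_reply_code", "http_method", "duration_miliseconds", "content_type",
    "server_or_cache_address", "time", "squid_hierarchy", "bytes", "url",
    "client_address",
]


def parse_log(text: str) -> list[dict]:
    """Parse CSV log text (no header row assumed) into one dict per entry."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    return [dict(row) for row in reader]


# A rule mirroring the %rules hash: (field, relation, value) conditions plus a decision.
example_rule = {
    "conditions": [("bytes", ">", 1000000), ("url", "matches", ".*p2p.*")],
    "decision": "deny",
}
```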
Labelling Process
● The two hashes are compared during the labelling process.
● The conditions of each rule are checked against each entry.
● If an entry meets all the conditions of a rule, it is labelled with that rule's decision.
● A key-value pair holding the decision is added to the hash of that entry.
● Conflict resolution is needed when an entry meets both:
○ the conditions of a rule that allows the request, and
○ the conditions of a rule that denies the request.
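A minimal Python sketch of the labelling loop described above. The rule format (a list of `(field, relation, value)` conditions plus a decision) mirrors the `%rules` hash, but the function names are hypothetical and only a few comparison operators are supported. The slides do not say how conflicts are resolved; this sketch assumes, purely for illustration, that DENY wins.

```python
import operator

OPS = {"==": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


def matches_rule(entry: dict, rule: dict) -> bool:
    """An entry matches a rule only if it meets ALL of the rule's conditions."""
    return all(
        field in entry and OPS[rel](entry[field], value)
        for field, rel, value in rule["conditions"]
    )


def label(entries: list[dict], rules: list[dict]) -> list[dict]:
    """Return copies of the covered entries with an added 'label' key (ALLOW/DENY).

    Entries not covered by any rule are dropped, as in the labelled data set.
    ASSUMPTION: an entry matching both an ALLOW and a DENY rule is labelled DENY.
    """
    labelled = []
    for entry in entries:
        decisions = {rule["decision"] for rule in rules if matches_rule(entry, rule)}
        if not decisions:
            continue  # not covered by the rules
        labelled.append({**entry, "label": "DENY" if "DENY" in decisions else "ALLOW"})
    return labelled
```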
Data Summary
● After labelling, the CSV file contains the 57502 entries (patterns) that could be labelled (the others were not covered by the rules):
○ 38972 with an ALLOW label.
○ 18530 with a DENY label.
This is roughly a 2:1 ratio.
● Data balancing techniques might be needed:
○ Undersampling: random removal of patterns from the majority class.
○ Oversampling: duplication of each pattern in the minority class.
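The two balancing techniques can be illustrated with a short Python sketch. The function names and the fixed random seed are assumptions, and each pattern is assumed to be a dict with a `label` key.

```python
import random
from collections import Counter


def undersample(patterns, seed=0):
    """Randomly remove patterns from the majority class until both classes have the same size."""
    rng = random.Random(seed)
    minority_size = min(Counter(p["label"] for p in patterns).values())
    by_class = {}
    for p in patterns:
        by_class.setdefault(p["label"], []).append(p)
    balanced = []
    for group in by_class.values():
        # The minority class is kept whole; larger classes are randomly reduced.
        balanced.extend(rng.sample(group, minority_size))
    rng.shuffle(balanced)
    return balanced


def oversample(patterns):
    """Duplicate each pattern of the minority class once."""
    counts = Counter(p["label"] for p in patterns)
    minority = min(counts, key=counts.get)
    return patterns + [p for p in patterns if p["label"] == minority]
```

With the counts above (38972 ALLOW, 18530 DENY), `undersample` keeps 18530 patterns per class, while `oversample` yields 38972 ALLOW and 37060 DENY patterns, close to a 1:1 ratio.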
Experimental Setup
● First, the classifiers are tested with a 10-fold cross-validation process.
○ The five classifiers with the highest accuracy are chosen for the following experiments.
○ The Naïve Bayes classifier is also taken as a reference.
● Second, the initial (labelled) log file is divided into training and test files.
● These training and test files are created with different ratios, taking the entries either randomly or sequentially.
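Weka performs cross-validation itself; the Python sketch below only illustrates what a "random" versus "sequential" training/test division means and how 10 folds partition the patterns. `split` and `k_folds` are hypothetical helpers, not part of Weka.

```python
import random


def split(patterns, train_ratio, randomly=True, seed=0):
    """Divide patterns into (train, test) with the given ratio, e.g. 0.8 for 80%/20%.

    randomly=True shuffles a copy first; randomly=False keeps the original log
    order, so the first part of the log is used for training and the rest for testing.
    """
    items = list(patterns)
    if randomly:
        random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


def k_folds(n, k=10):
    """Yield (train_indices, test_indices) for k-fold cross-validation over n patterns."""
    # Fold sizes differ by at most one when n is not divisible by k.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

Every pattern appears in exactly one test fold, so each one is tested once by a model that never saw it during training.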
Flow Diagram
1) Initial labelling process.
Experiments with unbalanced and balanced data. From those, divisions are made:
● 80% training 20% testing
● 90% training 10% testing
Randomly and sequentially.
2) Removal of duplicated requests.
Experiments with unbalanced data. From those, divisions are made:
● 80% training 20% testing
● 90% training 10% testing
● 60% training 40% testing
Randomly and sequentially.
3) Enhancing the creation of training and test files.
Experiments with unbalanced data. From those, divisions are made, with patterns taken randomly:
● 80% training 20% testing
● 90% training 10% testing
● 60% training 40% testing
4) Filtering the features of the URL.
Experiments with unbalanced and balanced data. From those, divisions are made, with patterns taken randomly:
● 80% training 20% testing
● 90% training 10% testing
● 60% training 40% testing
First set of experiments
1) Initial labelling process.
● First, the classifiers are tested with a 10-fold cross-validation process over the balanced data.
● Then, Naïve Bayes and the top five classifiers are tested with training and test divisions, so that no pattern is used for both training and testing.
[Table: divisions made over unbalanced data]
[Table: divisions made over balanced data (undersampling)]
[Table: divisions made over balanced data (oversampling)]
Second set of experiments
2) Removal of duplicated requests.
● We studied the field squid_hierarchy and saw that it had two possible values: DIRECT or DEFAULT_PARENT.
● Connections are first made to the Squid proxy and then, if appropriate, the request continues to another server.
○ As a result, some entries were repeated, which may affect the results.
[Table: divisions made over unbalanced data]
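Removing the repeated requests could look like the Python sketch below. Which fields identify a repeated request is not stated in the slides; the sketch assumes, hypothetically, that two log lines are the same request when they differ only in `squid_hierarchy`, and keeps the first occurrence.

```python
def remove_duplicates(entries, ignore=("squid_hierarchy",)):
    """Keep the first occurrence of each request, comparing all fields except those in `ignore`.

    ASSUMPTION: by default, two log lines that differ only in squid_hierarchy
    (e.g. DIRECT vs DEFAULT_PARENT) are treated as the same request.
    """
    seen = set()
    unique = []
    for entry in entries:
        # Order-independent, hashable key built from the compared fields.
        key = tuple(sorted((k, v) for k, v in entry.items() if k not in ignore))
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```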
Third set of experiments
3) Enhancing the creation of training and test files.
● Repeated URL core domains could lead to misleading results when they appear in both the training and test files.
● During the division process, we ensured that requests with the same URL core domain went to the same file (either training or test).
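A group-aware division of this kind can be sketched in Python as follows. It assumes each entry already carries a hypothetical `core_domain` key, and it assigns whole domains to the training file until the requested ratio is reached, so the final ratio is only approximate.

```python
import random


def split_by_core_domain(entries, train_ratio, seed=0):
    """Split entries so that all requests sharing a URL core domain land in the same file.

    Whole domains are assigned to the training set until it holds at least
    train_ratio of the entries; the remaining domains go to the test set.
    """
    groups = {}
    for entry in entries:
        groups.setdefault(entry["core_domain"], []).append(entry)
    domains = sorted(groups)
    random.Random(seed).shuffle(domains)
    target = train_ratio * len(entries)
    train, test = [], []
    for domain in domains:
        # Fill the training set first; once it reaches the target, the rest is test data.
        (train if len(train) < target else test).extend(groups[domain])
    return train, test
```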
Created Rules During Classification 
● In the experiments that represented the URL only by its core domain, the rules focused too heavily on that feature.
PART decision list 
------------------ 
url = dropbox: deny (2999.0) 
url = ubuntu: allow (2165.0) 
url = facebook: deny (1808.0) 
url = valli: allow (1679.0) 
● Other kinds of rules were also found, but they always depended on the URL core domain.
url = grooveshark AND 
http_method = POST: allow (733.0) 
url = googleapis AND 
content_type = text/javascript AND 
client_address = 192.168.4.4: allow (155.0/2.0) 
url = abc AND 
content_type_MCT = image AND 
time <= 31532000: allow (256.0) 
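A PART decision list is applied in order: a pattern receives the decision of the first rule whose conditions it meets. In Weka's output, the numbers in parentheses give the training patterns covered by each rule and, after the slash, those it misclassifies. The Python sketch below encodes a few of the rules above as data; the `default` argument is a hypothetical stand-in for the default rule that closes the full list, which is not shown here.

```python
import operator

OPS = {"=": operator.eq, "<=": operator.le, ">": operator.gt}

# A few of the rules listed above, in the order shown; each rule is (conditions, decision).
DECISION_LIST = [
    ([("url", "=", "dropbox")], "deny"),
    ([("url", "=", "ubuntu")], "allow"),
    ([("url", "=", "grooveshark"), ("http_method", "=", "POST")], "allow"),
    ([("url", "=", "abc"), ("content_type_MCT", "=", "image"), ("time", "<=", 31532000)], "allow"),
]


def classify(pattern, decision_list=DECISION_LIST, default="allow"):
    """Return the decision of the first rule whose conditions all hold for the pattern."""
    for conditions, decision in decision_list:
        if all(f in pattern and OPS[op](pattern[f], v) for f, op, v in conditions):
            return decision
    return default  # hypothetical stand-in for the list's final default rule
```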
Fourth set of experiments 
4) Filtering the features of the URL. 
● The rules created by the classifiers are too focused on the URL core domain feature.
● We repeated the experiments with the original file, this time including only the top level domain (TLD) of the URL as a feature, instead of the core domain.
[Table: divisions made over unbalanced data]
[Table: divisions made over balanced data]
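The two URL features used so far, core domain and TLD, can be extracted roughly as in this Python sketch. It is a simplification (an assumption, not the authors' parser): the TLD is taken as the last label of the host name and the core domain as the label before it, so suffixes such as `.co.uk` are not handled.

```python
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Split a URL's host name into a 'core domain' and a top level domain (TLD).

    ASSUMPTION: TLD = last dot-separated label, core domain = the label before it.
    URLs without a scheme (e.g. "www.dropbox.com") yield an empty host name.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    tld = labels[-1] if len(labels) >= 2 else ""
    core_domain = labels[-2] if len(labels) >= 2 else host
    return {"core_domain": core_domain, "tld": tld}
```

In the fourth set of experiments only the `tld` value would be kept as a feature.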
Created Rules During Classification 
● After using the URL top level domain as a classification feature instead of the URL core domain, the rules classify mainly by server address.
PART decision list 
------------------ 
server_or_cache_address = 173.194.34.248: allow (238.0/1.0) 
server_or_cache_address = 8.27.153.126: allow (235.0/2.0) 
server_or_cache_address = 91.121.155.13: deny (235.0) 
server_or_cache_address = 66.220.152.19: deny (201.0) 
● The URL TLD appears, but the rules no longer always depend on this feature.
server_or_cache_address = 90.84.53.48 AND 
client_address = 10.159.39.199 AND 
tld = es AND 
time <= 31533000: allow (138.0/1.0) 
content_type = application/octet-stream AND 
tld = com AND 
server_or_cache_address = 192.168.4.4 AND 
client_address = 10.159.86.22: allow (210.0) 
server_or_cache_address = 90.84.53.19 AND 
tld = com: deny (33.0/1.0) 
Conclusions
● In most cases, the Random Forest classifier yields the best results.
● Losing information from the log of URL requests lowers the results. This happens when:
○ Undersampling the data (because patterns are randomly removed).
○ Keeping the sequence of the requests of the initial log file while dividing it into training and test files.
● For future experiments, it should be ensured that the same URL lexical features (like the core domain) do not appear in both the training and test files at the same time.
○ Otherwise, the results are distorted.
● As the rules obtained show, it is possible to develop a tool that automatically makes an allow or deny decision for URLs, and that decision would depend on other features of a URL request, not only on the URL.
Scientific Contributions 
● MUSES: A corporate user-centric system which applies 
computational intelligence methods, at ACM SAC conference, 
Gyeongju, Korea, March 2014. 
● Enforcing Corporate Security Policies via Computational 
Intelligence Techniques, at SecDef Workshop at GECCO, 
Vancouver, July 2014. 
● Going a Step Beyond the Black and White Lists for URL Accesses 
in the Enterprise by means of Categorical Classifiers, at ECTA, 
Rome, Italy, October 2014. 
Future Work
● Run experiments with bigger data sets (e.g. a whole workday).
● Include more lexical features of a URL in the experiments (e.g. number of subdomains, number of arguments, or the path).
● Consider sessions when classifying.
○ A session is defined as the set of requests made from a certain client during a certain time.
● Finally, implement a system and test it with real data, in real time.
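The proposed future features could be computed along the lines of this Python sketch. The feature definitions, the `sessions` helper and its 30-minute inactivity gap are assumptions chosen for illustration; the slide only defines a session as the requests from one client within a certain time, and `time` is assumed here to be numeric seconds.

```python
from urllib.parse import parse_qsl, urlsplit


def lexical_features(url: str) -> dict:
    """Extra lexical URL features proposed as future work (definitions are assumptions)."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    return {
        # Labels in front of "core domain + TLD", e.g. 1 for www.dropbox.com
        "num_subdomains": max(len(labels) - 2, 0),
        "num_arguments": len(parse_qsl(parts.query, keep_blank_values=True)),
        "path": parts.path,
    }


def sessions(entries, gap_seconds=1800):
    """Group requests into sessions: same client_address, no pause longer than gap_seconds.

    Each entry needs 'client_address' and a numeric 'time' in seconds (assumed encoding).
    """
    result = []
    open_session = {}  # client_address -> the session currently open for that client
    for entry in sorted(entries, key=lambda e: e["time"]):
        client = entry["client_address"]
        session = open_session.get(client)
        if session is None or entry["time"] - session[-1]["time"] > gap_seconds:
            session = []
            result.append(session)
            open_session[client] = session
        session.append(entry)
    return result
```

An inactivity gap is only one possible reading of "a certain time"; a fixed time window per client would be an equally valid alternative.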
Thank you for your attention 
Questions? 
paloma@geneura.ugr.es 
Twitter @unintendedbear
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

Applying soft computing techniques to corporate mobile security systems

  • 1. Applying Soft Computing Techniques to Corporate Mobile Security Systems. Master's in Computer and Network Engineering (Máster en Ingeniería de Computadores y Redes). Paloma de las Cuevas Delgado. Supervised by Drs. Antonio Miguel Mora García and Juan Julián Merelo Guervós.
  • 2. Index: 1. Research context. 2. Underlying problem and objectives. 3. Data description and preprocessing. 4. Experimental setup. 5. Experiments and results. 6. Conclusions and scientific contributions. 7. Future Work.
  • 5. Research Context ● The Bring Your Own Device (BYOD) problem
  • 6. Research Context ● The MUSES server
  • 8. Underlying Problem ● Enterprise security applied to employees' connections to the Internet (URL requests). ● How is security enforced? ○ Proxies ○ Blacklists and whitelists: lists of URLs that are permitted (white) or not (black) ○ Firewalls ○ Definition of corporate security policies ● The aim of this research is to go a step beyond these lists.
  • 9. Objectives ● Objective: a tool that automatically decides whether to allow or deny requests to URLs that are not included in the blacklists or whitelists. ○ The decision would be based on the decisions made for similar URL accesses (those with similar features). ○ The tool should consider other parameters of the request in addition to the URL string.
  • 10. Workflow 1. Data mining process: a. Parsing b. Preprocessing 2. Labelling process (requests labelled as ALLOW or DENY) 3. Machine learning 4. Study of the classification accuracies
  • 12. Working Scenario ● Employees request access to URLs during the workday (records from an actual Spanish company with around 100 employees). ● We had access to a log file of 100k entries (patterns) covering two hours (8 to 10 am), in CSV format. ● We were also provided with a set of rules that specify the security policies as if-then clauses.
  • 13. Data description
    ● An entry (unlabelled):
        http_reply_code          200
        http_method              GET
        duration_miliseconds     1114
        content_type             application/octet-stream
        server_or_cache_address  X.X.X.X
        time                     08:30:08
        squid_hierarchy          DEFAULT_PARENT
        bytes                    106961
        url                      http://www.one.example.com
        client_address           X.X.X.X
    ● A policy and a rule: "Video streams cannot be played"
        rule "policy-1 MP4"
            // rule attributes go here
            when
                squid:Squid(dif_MCT == "video", bytes > 1000000,
                            content_type matches ".*application.*", url matches ".*p2p.*")
            then
                PolicyDecisionPoint.deny();
        end
  • 14. Data description ● An entry ○ Has 7 categorical fields and 3 numerical fields. ● A rule ○ Has a set of conditions and a decision (ALLOW/DENY). ○ Each condition has three parts: ■ Data type (e.g. bytes) ■ Relationship (e.g. <) ■ Value (e.g. 1000000)
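The rule structure described above can be modelled roughly as follows. Each condition is a data type, a relationship and a value, and each rule pairs a set of conditions with an ALLOW/DENY decision. This is a minimal Java sketch, not the MUSES code. The class names `Condition` and `PolicyRule`, the set of supported relationships, and the use of `String.matches` for the Drools-style `matches` operator are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a rule: each condition is a (data type, relationship,
// value) triple; a rule combines its conditions with AND and carries a decision.
class Condition {
    final String field;    // data type, e.g. "bytes"
    final String relation; // relationship, e.g. "<", ">", "==", "matches"
    final String value;    // expected value, e.g. "1000000"

    Condition(String field, String relation, String value) {
        this.field = field;
        this.relation = relation;
        this.value = value;
    }

    // True if the entry's value for this field satisfies the condition.
    boolean holds(Map<String, String> entry) {
        String actual = entry.get(field);
        if (actual == null) {
            return false; // a missing field never satisfies a condition
        }
        switch (relation) {
            case "==":      return actual.equals(value);
            case "!=":      return !actual.equals(value);
            case "matches": return actual.matches(value); // Java regular expression
            case "<":       return Double.parseDouble(actual) < Double.parseDouble(value);
            case "<=":      return Double.parseDouble(actual) <= Double.parseDouble(value);
            case ">":       return Double.parseDouble(actual) > Double.parseDouble(value);
            case ">=":      return Double.parseDouble(actual) >= Double.parseDouble(value);
            default:
                throw new IllegalArgumentException("Unknown relation: " + relation);
        }
    }
}

class PolicyRule {
    final String name;
    final List<Condition> conditions;
    final String decision; // "ALLOW" or "DENY"

    PolicyRule(String name, List<Condition> conditions, String decision) {
        this.name = name;
        this.conditions = conditions;
        this.decision = decision;
    }

    // A rule fires only when every one of its conditions holds.
    boolean fires(Map<String, String> entry) {
        for (Condition c : conditions) {
            if (!c.holds(entry)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> entry = new HashMap<>();
        entry.put("bytes", "106961");
        entry.put("url", "http://www.one.example.com");
        // Hypothetical rule: deny large downloads from P2P-looking URLs.
        PolicyRule bigP2p = new PolicyRule("big-p2p-download", List.of(
                new Condition("bytes", ">", "1000000"),
                new Condition("url", "matches", ".*p2p.*")), "DENY");
        System.out.println(bigP2p.fires(entry)); // false: 106961 bytes is below the threshold
    }
}
```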
  • 15. Tools used during this research ● Drools and Squid syntax for the rules; CSV format for the log data. ● Weka, which offers a broad, state-of-the-art set of classifiers. ● Two implementations: ○ Perl: faster in the parsing process, but slower in the labelling process and when using Weka. ○ Java: native integration with Weka, better suited to automation, and it will be embedded in an actual Java project (MUSES).
  • 16. After the parsing process
    ● A hash with the entries ○ Keys: entry fields ○ Values: field values
    ● A hash with the set of rules ○ Keys: condition fields and decision ○ Values: name of the data type, its desired value, the relationship between them, and allow or deny
        my %logdata = (
            entry => {
                http_reply_code         => 'xxx',
                http_method             => 'xxx',
                duration_miliseconds    => 'xxx',
                content_type            => 'xxx',
                server_or_cache_address => 'xxx',
                time                    => 'xxx',
                squid_hierarchy         => 'xxx',
                bytes                   => 'xxx',
                url                     => 'xxx',
                client_address          => 'xxx',
            },
        );
        my %rules = (
            rule => {
                field    => 'xxx',
                relation => 'xxx',
                value    => 'xxx',
                decision => 'xxx',    # either 'allow' or 'deny'
            },
        );
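In the Java implementation, the Perl `%logdata` entry hash maps naturally onto a `Map<String, String>`. The sketch below parses one CSV line into such a map. It assumes the columns appear in the order shown on the Data description slide and that values contain no quoted commas. The class name `LogParser` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LogParser {
    // Assumed column order, taken from the example entry on the Data description slide.
    static final String[] FIELDS = {
        "http_reply_code", "http_method", "duration_miliseconds", "content_type",
        "server_or_cache_address", "time", "squid_hierarchy", "bytes", "url", "client_address"
    };

    // Parses one CSV line into a field -> value map (one entry of %logdata).
    // Assumes plain comma separation with no quoted commas inside values.
    static Map<String, String> parseEntry(String line) {
        String[] values = line.split(",", -1); // -1 keeps trailing empty fields
        if (values.length != FIELDS.length) {
            throw new IllegalArgumentException(
                    "Expected " + FIELDS.length + " fields, got " + values.length);
        }
        Map<String, String> entry = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            entry.put(FIELDS[i], values[i].trim());
        }
        return entry;
    }

    public static void main(String[] args) {
        String line = "200,GET,1114,application/octet-stream,X.X.X.X,08:30:08,"
                + "DEFAULT_PARENT,106961,http://www.one.example.com,X.X.X.X";
        System.out.println(parseEntry(line).get("bytes")); // 106961
    }
}
```

A `LinkedHashMap` keeps the fields in log order, which makes printed entries easier to compare with the original CSV.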
  • 17. Labelling Process ● The two hashes are compared during the labelling process. ● The conditions of each rule are checked against each entry. ● If an entry meets all the conditions of a rule, it is labelled with that rule's decision. ● The decision is added to the entry's hash as a new key-value pair. ● Conflict resolution is needed when an entry: ○ meets the conditions of a rule that allows the request, and ○ meets the conditions of a rule that denies the request.
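The labelling loop can be sketched in Java as follows. Rules are reduced to a predicate plus a decision, and the key name `label` is hypothetical. The slides identify the conflict case but do not say how it is resolved. This sketch therefore only reports `CONFLICT` and does not pick a side. Entries that no rule covers stay unlabelled, consistent with the data summary, where uncovered patterns are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class Labeller {
    // Minimal rule: a predicate over an entry plus its decision ("ALLOW" or "DENY").
    static class Rule {
        final Predicate<Map<String, String>> when;
        final String decision;

        Rule(Predicate<Map<String, String>> when, String decision) {
            this.when = when;
            this.decision = decision;
        }
    }

    // Checks every rule against the entry. If all fired rules agree, the decision
    // is stored in the entry under the (hypothetical) key "label" and returned.
    // If both ALLOW and DENY rules fire, "CONFLICT" is returned and the entry is
    // left unlabelled; if no rule fires, null is returned.
    static String label(Map<String, String> entry, List<Rule> rules) {
        Set<String> decisions = new HashSet<>();
        for (Rule r : rules) {
            if (r.when.test(entry)) {
                decisions.add(r.decision);
            }
        }
        if (decisions.isEmpty()) {
            return null;
        }
        if (decisions.size() > 1) {
            return "CONFLICT";
        }
        String decision = decisions.iterator().next();
        entry.put("label", decision);
        return decision;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(e -> e.get("url").contains("dropbox"), "DENY"),
                new Rule(e -> "POST".equals(e.get("http_method")), "ALLOW"));
        Map<String, String> entry = new HashMap<>();
        entry.put("url", "http://www.dropbox.com");
        entry.put("http_method", "GET");
        System.out.println(label(entry, rules)); // DENY
    }
}
```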
  • 19. Data Summary ● After labelling, the CSV file contains every pattern that could be labelled (the rest were not covered by any rule): 57502 entries/patterns in total. ○ 38972 with an ALLOW label. ○ 18530 with a DENY label. ○ This is roughly a 2:1 ratio. ● Data balancing techniques may therefore be needed: ○ Undersampling: random removal of patterns from the majority class. ○ Oversampling: duplication of each pattern in the minority class.
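Both balancing techniques are simple list operations. The following minimal Java sketch assumes the patterns of each class are already in separate lists. The class name `Balancer` and the fixed random seed are illustrative choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class Balancer {
    // Undersampling: keep a random subset of the majority class that is as large
    // as the minority class, then add all minority patterns.
    static <T> List<T> undersample(List<T> majority, List<T> minority, Random rng) {
        List<T> shuffled = new ArrayList<>(majority);
        Collections.shuffle(shuffled, rng);
        int keep = Math.min(minority.size(), shuffled.size());
        List<T> result = new ArrayList<>(shuffled.subList(0, keep));
        result.addAll(minority);
        return result;
    }

    // Oversampling as described on the slide: every minority pattern appears twice.
    static <T> List<T> oversample(List<T> majority, List<T> minority) {
        List<T> result = new ArrayList<>(majority);
        result.addAll(minority);
        result.addAll(minority);
        return result;
    }

    public static void main(String[] args) {
        List<String> allow = Collections.nCopies(38972, "ALLOW");
        List<String> deny = Collections.nCopies(18530, "DENY");
        System.out.println(undersample(allow, deny, new Random(1)).size()); // 37060
        System.out.println(oversample(allow, deny).size());                 // 76032
    }
}
```

With the counts above, undersampling leaves 18530 patterns of each class (37060 in total). Duplicating every DENY pattern gives 38972 ALLOW against 37060 DENY, which is close to 1:1.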
  • 20. Experimental Setup ● First, the classifiers are tested with 10-fold cross-validation. ○ The five classifiers with the highest accuracy are chosen for the following experiments. ○ The Naïve Bayes classifier is also included as a reference. ● Second, the initial (labelled) log file is divided into training and test files. ● These training and test files are created with different ratios, taking the entries either randomly or sequentially.
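Weka runs cross-validation internally, for example through `Evaluation.crossValidateModel`. The following is only a dependency-free Java sketch of what a 10-fold split does. It shuffles the patterns and deals them into ten folds. Each fold is then used once as the test set while the other nine train the classifier. The class `KFold` is hypothetical, and this sketch does not stratify by class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class KFold {
    // Shuffles the pattern indices 0..n-1 and deals them round-robin into k folds.
    static List<List<Integer>> folds(int n, int k, Random rng) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            idx.add(i);
        }
        Collections.shuffle(idx, rng);
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(idx.get(i));
        }
        return folds;
    }

    // Training indices for one round: every fold except the held-out test fold.
    static List<Integer> trainingIndices(List<List<Integer>> folds, int testFold) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(57502, 10, new Random(1));
        for (int f = 0; f < folds.size(); f++) {
            int train = trainingIndices(folds, f).size();
            System.out.println("fold " + f + ": test=" + folds.get(f).size() + " train=" + train);
        }
    }
}
```

For the 57502 labelled patterns, two folds hold 5751 patterns and the other eight hold 5750, so every pattern is tested exactly once.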
  • 22. Flow Diagram
    1) Initial labelling process. Experiments with unbalanced and balanced data. Divisions made randomly and sequentially: ● 80% training / 20% testing ● 90% training / 10% testing
    2) Removal of duplicated requests. Experiments with unbalanced data. Divisions made randomly and sequentially: ● 80% training / 20% testing ● 90% training / 10% testing ● 60% training / 40% testing
    3) Enhancing the creation of training and test files. Experiments with unbalanced data. Divisions made with patterns taken randomly: ● 80% training / 20% testing ● 90% training / 10% testing ● 60% training / 40% testing
    4) Filtering the features of the URL. Experiments with unbalanced and balanced data. Divisions made with patterns taken randomly: ● 80% training / 20% testing ● 90% training / 10% testing ● 60% training / 40% testing
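The two ways of building the divisions in the flow diagram differ only in whether the log is shuffled before it is cut. The following is a minimal Java sketch. The class name `TrainTestSplit` and the rounding of the cut point are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class TrainTestSplit {
    // Holds the two parts of a division.
    static class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    // Sequential division: the first trainRatio of the log, in its original order,
    // becomes training data and the rest becomes test data.
    static <T> Split<T> sequential(List<T> log, double trainRatio) {
        int cut = (int) Math.round(log.size() * trainRatio);
        return new Split<>(new ArrayList<>(log.subList(0, cut)),
                           new ArrayList<>(log.subList(cut, log.size())));
    }

    // Random division: shuffle a copy of the log first, then cut it the same way.
    static <T> Split<T> random(List<T> log, double trainRatio, Random rng) {
        List<T> shuffled = new ArrayList<>(log);
        Collections.shuffle(shuffled, rng);
        return sequential(shuffled, trainRatio);
    }

    public static void main(String[] args) {
        List<Integer> log = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            log.add(i);
        }
        Split<Integer> seq = sequential(log, 0.8);
        System.out.println(seq.train + " | " + seq.test); // [0, 1, 2, 3, 4, 5, 6, 7] | [8, 9]
        Split<Integer> rnd = random(log, 0.6, new Random(1));
        System.out.println(rnd.train.size() + " / " + rnd.test.size()); // 6 / 4
    }
}
```

In the sequential variant, the test file is always the final stretch of the log. The conclusions associate this setting with lower results.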
  • 23. First set of experiments: 1) Initial labelling process. ● First, the classifiers are tested with 10-fold cross-validation over the balanced data.
  • 24. First set of experiments: 1) Initial labelling process. ● Naïve Bayes and the top five classifiers are then tested with training and test divisions, so that no pattern is used for both training and testing.
  • 25. First set of experiments: 1) Initial labelling process. Divisions made over unbalanced data
  • 26. First set of experiments: 1) Initial labelling process. Divisions made over balanced data (undersampling)
  • 27. First set of experiments: 1) Initial labelling process. Divisions made over balanced data (oversampling)
  • 28. Second set of experiments: 2) Removal of duplicated requests.
    ● We studied the field squid_hierarchy and found that it had two possible values: DIRECT or DEFAULT_PARENT.
    ● Connections are made first to the Squid proxy and then, if appropriate, the request continues to another server.
      ○ As a result, some entries were repeated, which may affect the results.
    ● Example entry:
        http_reply_code          200
        http_method              GET
        duration_miliseconds     1114
        content_type             application/octet-stream
        server_or_cache_address  X.X.X.X
        time                     08:30:08
        squid_hierarchy          DEFAULT_PARENT
        bytes                    106961
        url                      http://www.one.example.com
        client_address           X.X.X.X
  • 29. Second set of experiments: 2) Removal of duplicated requests. Divisions made over unbalanced data
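Removing the repeated requests amounts to keeping one entry per request key. The slides do not say which fields identify a request, so the key used below (`client_address`, `url`, `time`, `http_method`) is an assumption. `squid_hierarchy` is left out because, presumably, it is the field that distinguishes the two log lines of the same request. The class name `Deduplicator` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Deduplicator {
    // Keeps the first occurrence of each request. Two entries count as the same
    // request when they agree on every field listed in keyFields (assumed key).
    static List<Map<String, String>> dedupe(List<Map<String, String>> entries,
                                            List<String> keyFields) {
        Set<List<String>> seen = new HashSet<>();
        List<Map<String, String>> unique = new ArrayList<>();
        for (Map<String, String> e : entries) {
            List<String> key = new ArrayList<>();
            for (String f : keyFields) {
                key.add(e.get(f));
            }
            if (seen.add(key)) { // add() returns false if the key was already present
                unique.add(e);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Map<String, String> first = Map.of("client_address", "X.X.X.X",
                "url", "http://www.one.example.com", "time", "08:30:08",
                "http_method", "GET", "squid_hierarchy", "DIRECT");
        Map<String, String> second = Map.of("client_address", "X.X.X.X",
                "url", "http://www.one.example.com", "time", "08:30:08",
                "http_method", "GET", "squid_hierarchy", "DEFAULT_PARENT");
        List<String> key = List.of("client_address", "url", "time", "http_method");
        System.out.println(dedupe(List.of(first, second), key).size()); // 1
    }
}
```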
  • 30. Third set of experiments: 3) Enhancing the creation of training and test files. ● Repeated URL core domains could lead to misleading results. ● During the division process, we ensured that all requests with the same URL core domain went to the same file (either training or testing).
  • 31. Third set of experiments: 3) Enhancing the creation of training and test files.
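Keeping every request with the same URL core domain on one side of the division is a grouped split. The Java sketch below adds whole groups, in random order, to the training file until it reaches the requested share of patterns. The remaining groups go to the test file. Because groups cannot be broken up, the final ratio is only approximate. The grouping function is supplied by the caller; for example, it could extract the core domain from the `url` field. The class name `GroupSplit` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

class GroupSplit {
    // Returns [train, test]. Whole groups (e.g. all requests sharing a core domain)
    // go to one side only, so no group appears in both files.
    static <T> List<List<T>> split(List<T> items, Function<T, String> groupOf,
                                   double trainRatio, Random rng) {
        Map<String, List<T>> groups = new LinkedHashMap<>();
        for (T item : items) {
            groups.computeIfAbsent(groupOf.apply(item), g -> new ArrayList<>()).add(item);
        }
        List<String> keys = new ArrayList<>(groups.keySet());
        Collections.shuffle(keys, rng);
        List<T> train = new ArrayList<>();
        List<T> test = new ArrayList<>();
        double target = items.size() * trainRatio;
        for (String k : keys) {
            if (train.size() < target) {
                train.addAll(groups.get(k)); // training file still below its target share
            } else {
                test.addAll(groups.get(k));
            }
        }
        return List.of(train, test);
    }

    public static void main(String[] args) {
        List<String> domains = List.of("dropbox", "dropbox", "ubuntu", "facebook", "facebook",
                "facebook", "valli", "ubuntu", "abc", "abc");
        List<List<String>> parts = split(domains, d -> d, 0.8, new Random(1));
        System.out.println("train=" + parts.get(0) + " test=" + parts.get(1));
    }
}
```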
  • 32. Created Rules During Classification ● In the experiments that included only the URL core domain as a classification feature, the rules relied too heavily on that feature.
        PART decision list
        ------------------
        url = dropbox: deny (2999.0)
        url = ubuntu: allow (2165.0)
        url = facebook: deny (1808.0)
        url = valli: allow (1679.0)
  • 33. Created Rules During Classification ● Other kinds of rules were also found, but they always depended on the URL core domain.
        url = grooveshark AND http_method = POST: allow (733.0)
        url = googleapis AND content_type = text/javascript AND client_address = 192.168.4.4: allow (155.0/2.0)
        url = abc AND content_type_MCT = image AND time <= 31532000: allow (256.0)
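PART outputs an ordered decision list. An entry receives the decision of the first rule whose conditions it meets. The numbers in parentheses are the training instances covered by each rule; the number after a slash is how many of those it misclassifies. The Java sketch below applies such a list using a few of the rules shown above. It only handles equality tests, so numeric conditions such as `time <= 31532000` are left out. The default decision passed in is an assumption, since the slides do not show the list's final default rule.

```java
import java.util.List;
import java.util.Map;

class DecisionList {
    // One rule of the list: a conjunction of field == value tests and a decision.
    static class Rule {
        final Map<String, String> conditions;
        final String decision;

        Rule(Map<String, String> conditions, String decision) {
            this.conditions = conditions;
            this.decision = decision;
        }

        boolean matches(Map<String, String> entry) {
            for (Map.Entry<String, String> c : conditions.entrySet()) {
                if (!c.getValue().equals(entry.get(c.getKey()))) {
                    return false;
                }
            }
            return true;
        }
    }

    // Rules are tried in order and the first match decides; if none matches,
    // the default decision is returned.
    static String classify(List<Rule> rules, String defaultDecision, Map<String, String> entry) {
        for (Rule r : rules) {
            if (r.matches(entry)) {
                return r.decision;
            }
        }
        return defaultDecision;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(Map.of("url", "dropbox"), "deny"),
                new Rule(Map.of("url", "ubuntu"), "allow"),
                new Rule(Map.of("url", "grooveshark", "http_method", "POST"), "allow"));
        System.out.println(classify(rules, "deny",
                Map.of("url", "grooveshark", "http_method", "POST"))); // allow
    }
}
```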
  • 34. Fourth set of experiments: 4) Filtering the features of the URL. ● The rules created by the classifiers relied too heavily on the URL core domain feature. ● We therefore repeated the experiments with the original file, but included only the top-level domain (TLD) of the URL as a feature, instead of the core domain.
  • 35. Fourth set of experiments: 4) Filtering the features of the URL. Divisions made over unbalanced data
  • 36. Fourth set of experiments: 4) Filtering the features of the URL. Divisions made over balanced data
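Both URL features used in these experiments can be derived from the host name. In the Java sketch below, the TLD is the last label of the host. The core domain is the label just before it, which matches values such as `dropbox` or `ubuntu` in the PART rules. This naive rule is an assumption and mishandles multi-label suffixes such as `co.uk`. The class name `UrlFeatures` is hypothetical.

```java
import java.net.URI;

class UrlFeatures {
    // Lower-cased host name of the URL, e.g. "www.one.example.com".
    static String host(String url) {
        String h = URI.create(url).getHost();
        if (h == null) {
            throw new IllegalArgumentException("No host in " + url);
        }
        return h.toLowerCase();
    }

    // Top-level domain: the last label of the host ("com").
    static String tld(String url) {
        String[] labels = host(url).split("\\.");
        return labels[labels.length - 1];
    }

    // Core domain: the label just before the TLD ("example").
    // Naive: ignores multi-label public suffixes such as "co.uk".
    static String coreDomain(String url) {
        String[] labels = host(url).split("\\.");
        return labels.length >= 2 ? labels[labels.length - 2] : labels[0];
    }

    public static void main(String[] args) {
        String url = "http://www.one.example.com";
        System.out.println(coreDomain(url) + " / " + tld(url)); // example / com
    }
}
```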
  • 37. Created Rules During Classification ● After replacing the URL core domain with the URL top-level domain as a classification feature, the rules classify mainly by server address.
        PART decision list
        ------------------
        server_or_cache_address = 173.194.34.248: allow (238.0/1.0)
        server_or_cache_address = 8.27.153.126: allow (235.0/2.0)
        server_or_cache_address = 91.121.155.13: deny (235.0)
        server_or_cache_address = 66.220.152.19: deny (201.0)
  • 38. Created Rules During Classification ● The URL TLD appears, but the rules are no longer always dependent on this feature.
        server_or_cache_address = 90.84.53.48 AND client_address = 10.159.39.199 AND tld = es AND time <= 31533000: allow (138.0/1.0)
        content_type = application/octet-stream AND tld = com AND server_or_cache_address = 192.168.4.4 AND client_address = 10.159.86.22: allow (210.0)
        server_or_cache_address = 90.84.53.19 AND tld = com: deny (33.0/1.0)
  • 40. Conclusions ● In most cases, the Random Forest classifier yields the best results. ● Losing information from the log of URL requests lowers the results. This happens when: ○ Undersampling the data (because patterns are randomly removed). ○ Keeping the original sequence of requests from the log file when dividing it into training and test files.
  • 41. Conclusions ● In future experiments, it should be ensured that the same URL lexical features (such as the core domain) do not appear in both the training and the test files at the same time. ○ Otherwise, the results are distorted. ● As the rules obtained show, it is possible to develop a tool that automatically allows or denies URL requests, with a decision that depends on other features of the request and not only on the URL.
  • 42. Scientific Contributions ● MUSES: A corporate user-centric system which applies computational intelligence methods, at the ACM SAC conference, Gyeongju, Korea, March 2014. ● Enforcing Corporate Security Policies via Computational Intelligence Techniques, at the SecDef Workshop at GECCO, Vancouver, July 2014. ● Going a Step Beyond the Black and White Lists for URL Accesses in the Enterprise by means of Categorical Classifiers, at ECTA, Rome, Italy, October 2014.
  • 44. Future Work ● Run experiments with bigger data sets (e.g. a whole workday). ● Include more lexical features of the URL in the experiments (e.g. number of subdomains, number of arguments, or the path). ● Consider sessions when classifying. ○ A session is defined as the set of requests made from a certain client during a certain time. ● Finally, implement the system and test it with real data, in real time.
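The proposed lexical features and sessions could be computed along the following lines. This Java sketch is purely illustrative and rests on several assumptions. Subdomains are counted as the host labels beyond the core domain and TLD. Arguments are counted as `&`-separated query parameters. A session is read as a fixed time window per client, with the window length chosen by the caller. The class name `FutureFeatures` is also hypothetical.

```java
import java.net.URI;

class FutureFeatures {
    // Host labels beyond core domain + TLD: "www.one.example.com" -> 2.
    static int subdomainCount(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return 0;
        }
        return Math.max(0, host.split("\\.").length - 2);
    }

    // Number of '&'-separated parameters in the query string (0 if there is none).
    static int argumentCount(String url) {
        String q = URI.create(url).getRawQuery();
        return (q == null || q.isEmpty()) ? 0 : q.split("&").length;
    }

    // Path component of the URL, "" when absent.
    static String path(String url) {
        String p = URI.create(url).getRawPath();
        return p == null ? "" : p;
    }

    // Fixed-window session key: client address plus the index of the time window
    // that contains the request time (given as hh:mm:ss, as in the log).
    static String sessionKey(String clientAddress, String hhmmss, int windowSeconds) {
        String[] t = hhmmss.split(":");
        int seconds = Integer.parseInt(t[0]) * 3600
                + Integer.parseInt(t[1]) * 60
                + Integer.parseInt(t[2]);
        return clientAddress + "#" + (seconds / windowSeconds);
    }

    public static void main(String[] args) {
        String url = "http://www.one.example.com/search?q=x&page=2";
        System.out.println(subdomainCount(url) + " " + argumentCount(url) + " " + path(url)); // 2 2 /search
        System.out.println(sessionKey("X.X.X.X", "08:30:08", 900)); // X.X.X.X#34
    }
}
```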
  • 45. Thank you for your attention Questions? paloma@geneura.ugr.es Twitter @unintendedbear