Analysis of the “KDD Cup-1999” Datasets


Rafsanjani, Muhammod

This presentation focuses on the “KDD Cup-1999” data sets.

Data & Analytics

Analysis of the
“KDD Cup - 1999”
Data Sets
Rafsanjani Muhammod
011-141-144

Overview :
● What is the “KDD Cup-1999” data set?
● Data redundancy
● Types of attack
● Data partitioning
● Imbalanced data set
● Results
● Conclusion
● References

What is the “KDD Cup-1999” data set?
KDD Cup 1999 : the “Computer Network Intrusion Detection” problem.
[ intrusion = access by unauthorized user(s) ]
Records : 4,898,431 ( around 5 million ) in the “train data set” &
311,027 in the “test data set”.
Features : 41 ( plus a class label, which takes 23 distinct values ).
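
A minimal sketch of reading records in this layout, assuming a comma-separated file with 41 feature fields followed by one label field. The function name `read_kdd` and the two sample rows are hypothetical; the sample values are made up and only stand in for the real file.

```python
import csv
import io

N_FEATURES = 41  # feature count stated on the slide


def read_kdd(stream):
    """Yield (features, label) pairs from a KDD Cup 1999 style CSV stream."""
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        if len(row) != N_FEATURES + 1:
            raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(row)}")
        # Labels in the raw files end with a period, e.g. "normal."
        yield row[:N_FEATURES], row[-1].rstrip(".")


# Tiny synthetic stand-in for the real file (assumption: values are made up).
sample = io.StringIO(
    ",".join(["0"] * N_FEATURES) + ",normal.\n"
    + ",".join(["1"] * N_FEATURES) + ",smurf.\n"
)
records = list(read_kdd(sample))
labels = {label for _, label in records}
```

On the real training file, `labels` would be expected to hold the 23 class values mentioned above.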

Types of attack :
● Denial of Service Attack (DoS)
● User to Root Attack (U2R)
● Remote to Local Attack (R2L)
● Probing Attack
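
The four categories can be attached to individual records with a lookup table. The sketch below uses the grouping commonly applied to the training-set labels; the table itself is not on the slide, so treat it as an assumption, and `categorize` is a hypothetical helper name.

```python
# Assumed grouping of the 22 training-set attack labels into the four categories.
ATTACK_CATEGORY = {
    # Denial of Service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # User to Root
    "buffer_overflow": "U2R", "loadmodule": "U2R",
    "perl": "U2R", "rootkit": "U2R",
    # Remote to Local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L",
    "multihop": "R2L", "phf": "R2L", "spy": "R2L",
    "warezclient": "R2L", "warezmaster": "R2L",
    # Probing
    "ipsweep": "Probe", "nmap": "Probe",
    "portsweep": "Probe", "satan": "Probe",
}


def categorize(label):
    """Map a raw class label (with or without trailing period) to a category."""
    label = label.rstrip(".")
    if label == "normal":
        return "normal"
    # Labels missing from the table (e.g. test-only attacks) are flagged.
    return ATTACK_CATEGORY.get(label, "unknown")
```

For example, `categorize("smurf.")` gives `"DoS"` and `categorize("normal.")` gives `"normal"`.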

Data partitioning :
[ Portnoy et al. ] partitioned the data set into ten subsets.
Each subset holds around 490,000 records ( ½ million ), i.e. about 10% of the data.
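
A rough sketch of this kind of partitioning, assuming the records are split in file order into ten contiguous subsets (ten is what a subset size of about 490,000 out of 4,898,431 records implies). The helper name `partition` is hypothetical and this is not necessarily the exact procedure Portnoy et al. used.

```python
def partition(records, n_subsets=10):
    """Split a sequence into n contiguous, near-equal subsets, keeping order."""
    base, extra = divmod(len(records), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        # The first `extra` subsets take one additional record each.
        size = base + (1 if i < extra else 0)
        subsets.append(records[start:start + size])
        start += size
    return subsets


# Record indices stand in for the real training records.
sizes = [len(s) for s in partition(range(4_898_431))]
```

Because the split keeps file order, any long run of one attack type in the file ends up concentrated in a few subsets.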

Imbalanced data set
[ K. Leung et al. ]
Subsets 4, 5, 6 & 7 consist entirely of “smurf” records & subset 8 is almost entirely “neptune”.
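
One way to spot such single-class subsets is to look at the most frequent label in each subset and its share. The sketch below runs on small made-up label lists, not the real data; `dominant_label` is a hypothetical helper.

```python
from collections import Counter


def dominant_label(labels):
    """Return (most common label, its share of the subset)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Synthetic subsets (assumption: illustrative only).
subsets = [
    ["normal", "normal", "smurf", "neptune"],
    ["smurf"] * 4,
    ["neptune"] * 3 + ["normal"],
]
summary = [dominant_label(s) for s in subsets]
```

A share of 1.0 marks a subset that holds a single class, which is the situation reported for the “smurf” subsets.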

[ K. Leung et al. ] observed :
1. Around 78% of the “train data” records are duplicates &
2. Around 75% of the “test data” records are duplicates.
[ Portnoy et al. ] observed :
The distribution of this data set is very uneven, which makes cross-validation
difficult.
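
The duplicate percentages above are a ratio of repeated records to all records. A minimal sketch of that measurement, assuming a record counts as a duplicate when all of its fields (features and label) match an earlier record; the rows shown are made up and `duplicate_rate` is a hypothetical helper.

```python
def duplicate_rate(rows):
    """Fraction of rows that repeat an already-seen row."""
    rows = [tuple(r) for r in rows]  # tuples are hashable, lists are not
    if not rows:
        return 0.0
    return 1 - len(set(rows)) / len(rows)


# Synthetic rows: three identical "smurf" records and one "normal" record.
rows = [["0", "icmp", "smurf"]] * 3 + [["0", "udp", "normal"]]
rate = duplicate_rate(rows)
```

Here two of the four rows are repeats of the first, so `rate` is 0.5.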

Code :
Comparison bar plot ( in R ) : https://goo.gl/KqZsMM
Sample code ( in Python ) : https://goo.gl/O4FjRT
Sample code ( in Java ) : https://goo.gl/0ZSOJY

Conclusion :
[ Tavallaee et al. ] claim that the data set has some problems
( such as data redundancy, misleadingly high accuracy rates, high class imbalance, etc. ).
So, they proposed a new data set named “NSL-KDD”.
Though “NSL-KDD” still suffers from some of the problems discussed by McHugh
and may not be a perfect representative of existing real networks, it remains a
useful benchmark given the lack of public data sets for network-based IDSs.

References :
1. [ Tavallaee et al. ] “A Detailed Analysis of the KDD CUP 99 Data Set”.
2. [ J. McHugh ] “Testing intrusion detection systems: a critique of the 1998
and 1999 DARPA intrusion detection system evaluations as performed by
Lincoln Laboratory”.
3. [ K. Leung et al. ] “Unsupervised anomaly detection in network intrusion
detection using clusters”.
4. Dr. Dewan Md. Farid, lectures ( CSE 6011 & CSI 415 ).
5. UC Irvine Machine Learning Repository.
6. WEKA Team ( performance evaluation ).
7. Python packages : “pandas”, “scikit-learn”.
8. R packages : “ggplot2”.


