Robust Anomaly Detection via Sequential Ensemble Learning
SIU 2020
Selim Firat Yilmaz
Suleyman Serdar Kozat
October 5-7, 2020
Bilkent University, Electrical and Electronics Engineering
Anomaly Detection
• Aims to find patterns that do not conform to expected behavior.
• Anomaly: a sample that deviates from the expected distribution of the data [1].
• Application areas:
  • Detection of cyber attacks [2]
  • Detection of abnormal activity in videos [3]
  • Detection of credit card fraud [4]
• Unsupervised anomaly detection
[1] Varun Chandola, Arindam Banerjee, and Vipin Kumar. “Anomaly detection: A survey”. In: ACM computing surveys (CSUR) 41.3 (2009), p. 15.
[2] Ahmad Javaid et al. “A deep learning approach for network intrusion detection system”. In: Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS). ICST (Institute for Computer Sciences, Social-Informatics and …. 2016, pp. 21–26.
[3] Waqas Sultani, Chen Chen, and Mubarak Shah. “Real-world anomaly detection in surveillance videos”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 6479–6488.
[4] John Akhilomen. “Data mining application for cyber credit-card fraud detection system”. In: Industrial Conference on Data Mining. Springer. 2013, pp. 218–228.
Literature
• Isolation Forest (IF) [5]
  • Performs well even when the training data contains both anomalous and normal samples.
• k Nearest Neighbor (k-NN, k-th NN) [6]
  • Performs well when the training data contains only normal samples.
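[Figure: k-th nearest neighbor scoring example. The score of the target is SCORE(target) = max(d1, d2, d3), the largest of the distances d1, d2, d3 to its neighbors; a distance d0 is also marked. Legend: Isolated, Anomaly, Normal, Target.]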
[5] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. “Isolation forest”. In: 2008 Eighth IEEE International Conference on Data Mining. IEEE. 2008, pp. 413–422.
[6] Sridhar Ramaswamy, Rajeev Rastogi, and Kyuseok Shim. “Efficient algorithms for mining outliers from large data sets”. In: ACM Sigmod Record. Vol. 29. 2. ACM. 2000, pp. 427–438.
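The k-th nearest neighbor scoring rule from the illustration, where the score of a target is the largest of the distances to its k nearest training points, can be sketched as follows. This is a minimal pure-Python sketch: the function name `kth_nn_score` and the toy points are illustrative assumptions and are not taken from the authors' code.

```python
import math

def kth_nn_score(target, train_points, k=3):
    """Anomaly score = distance from `target` to its k-th nearest training point.

    This equals max(d1, ..., dk) taken over the k smallest distances.
    """
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    distances = sorted(math.dist(target, p) for p in train_points)
    return distances[k - 1]

# Toy data (hypothetical): four normal points on the unit square.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

near = kth_nn_score((0.5, 0.5), train, k=3)  # target inside the cluster
far = kth_nn_score((5.0, 5.0), train, k=3)   # target far from the cluster
```

A target far from the training points receives a larger score than one inside the cluster, which is why the rule works well when the training set contains only normal samples.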
Proposed Model
• A k-nearest-neighbor model supervised by an Isolation Forest
• Requires no data labeling.
• Robust to anomalies in the training data.
• Source code: https://github.com/selimfirat/siu-sead
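Proposed Model
[Figure: pipeline of the proposed model. The Isolation Forest (E1) is trained first and used to isolate anomalies in the training data; the k Nearest Neighbor model (E2) is then trained and produces the anomaly score at prediction time. Legend: Isolated, Anomaly, Normal, Target.]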
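One possible reading of this two-stage pipeline is sketched below with scikit-learn. It is a sketch under assumptions, not the authors' implementation (see the repository linked above for that): the class name `IsolationForestGuidedKNN` and the parameter `discard_fraction` are hypothetical, and the slides do not state how many samples the Isolation Forest separates out, so `discard_fraction` is not claimed to be the λ parameter of the deck.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors

class IsolationForestGuidedKNN:
    """Sketch: an Isolation Forest filters the training set, k-NN scores new points."""

    def __init__(self, discard_fraction=0.1, k=5, random_state=0):
        self.discard_fraction = discard_fraction  # hypothetical parameter (assumption)
        self.k = k
        self.random_state = random_state

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Stage 1 (E1): rank the unlabeled training samples with an Isolation Forest.
        forest = IsolationForest(random_state=self.random_state).fit(X)
        anomaly_score = -forest.score_samples(X)  # higher = more anomalous
        # Separate out the most anomalous fraction; keep at least k samples.
        n_keep = int(round(len(X) * (1.0 - self.discard_fraction)))
        n_keep = min(len(X), max(self.k, n_keep))
        self.kept_indices_ = np.argsort(anomaly_score)[:n_keep]
        # Stage 2 (E2): fit k-NN only on the samples kept as presumably normal.
        self.knn_ = NearestNeighbors(n_neighbors=self.k).fit(X[self.kept_indices_])
        return self

    def score_samples(self, X):
        # Anomaly score = distance to the k-th nearest retained training sample.
        distances, _ = self.knn_.kneighbors(np.asarray(X, dtype=float))
        return distances[:, -1]

# Toy data (hypothetical): 200 normal points plus 4 anomalies mixed into training.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 2))
anomalies = rng.uniform(6.0, 10.0, size=(4, 2))
model = IsolationForestGuidedKNN(discard_fraction=0.1, k=5).fit(np.vstack([normal, anomalies]))
scores = model.score_samples(np.array([[0.0, 0.0], [8.0, 8.0]]))
```

No labels are used at any point: the first stage only supplies a ranking, and the second stage sees whatever the first stage did not separate out.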
Experimental Protocol and Datasets
• 5-fold cross-validation with 3 different random seeds = 15 experiments
• Average precision metric [7].
Table 1: Dataset statistics [8]
Dataset      Samples  Dimensions  Anomaly percentage
satellite    5803     36          1.22%
satimage-2   6435     36          31.63%
pendigits    6870     16          2.27%
musk         3062     166         3.17%
[7] Xiaodan Xu, Huawen Liu, and Minghai Yao. “Recent Progress of Anomaly Detection”. In: Complexity 2019 (2019), pp. 1–11. doi: 10.1155/2019/2686378.
[8] http://odds.cs.stonybrook.edu
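The protocol above (5 folds times 3 seeds, scored with average precision) can be sketched in pure Python as follows. The helper names `average_precision` and `cv_splits`, the seed values, and the toy inputs are illustrative assumptions; the slides do not say whether the folds were stratified, and ties between scores are simply broken by sort order here.

```python
import random

def average_precision(labels, scores):
    """Mean of precision-at-rank over the ranks where a true anomaly appears.

    `labels` use 1 for anomalies and 0 for normal samples; higher score = more anomalous.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("average precision is undefined without positive labels")
    return precision_sum / hits

def cv_splits(n_samples, n_folds=5, seeds=(0, 1, 2)):
    """Yield (seed, train_indices, test_indices) for every fold of every seed."""
    for seed in seeds:
        indices = list(range(n_samples))
        random.Random(seed).shuffle(indices)
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for i, test in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield seed, train, test

runs = list(cv_splits(n_samples=20))  # 3 seeds x 5 folds
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In a full experiment, each of the 15 runs would fit a model on the training indices, score the test indices, and the per-run average precision values would then be averaged.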
Experiments
Table 2: Average precision of models trained on normal samples only
Dataset      OC-SVM  k-NN    IF     Proposed model
satellite    0.701   0.803   0.750  0.794
satimage-2   0.953   0.955   0.907  0.941
pendigits    0.362   0.945   0.380  0.308
musk         0.884   0.6421  0.345  0.918
Table 3: Average precision of models trained on mixed (normal and anomalous) samples
Dataset      OC-SVM  k-NN    IF     Proposed model
satellite    0.639   0.549   0.655  0.661
satimage-2   0.966   0.373   0.917  0.968
pendigits    0.249   0.086   0.294  0.311
musk         1.000   0.258   0.947  1.000
OC-SVM: one-class support vector machine; k-NN: k nearest neighbor; IF: Isolation Forest.
Parameter and Anomaly Ratio Analysis
[Figure 1: Average precision of the proposed model as a function of the λ parameter. x-axis: λ parameter (0.0 to 1.0); y-axis: average precision (0.2 to 1.0); one curve per dataset: musk, pendigits, satellite, satimage-2.]
[Figure 2: Average precision of the models as a function of the anomaly ratio in the training set of the satellite data. x-axis: anomaly ratio (0.0 to 1.0); y-axis: average precision (0.0 to 1.0); one curve per model: IF, k-NN, proposed model, OC-SVM.]
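An anomaly-ratio sweep like the one in Figure 2 needs training sets with a controlled share of anomalies. The sketch below shows one way to build such a set; it is an assumption about the setup, since the slides do not describe how the ratio was varied, and the function name `mix_training_set` and the id ranges are hypothetical.

```python
import random

def mix_training_set(normal_ids, anomaly_ids, anomaly_ratio, size, seed=0):
    """Draw `size` training ids so that about `anomaly_ratio` of them are anomalies."""
    if not 0.0 <= anomaly_ratio <= 1.0:
        raise ValueError("anomaly_ratio must lie in [0, 1]")
    n_anomalies = round(anomaly_ratio * size)
    n_normals = size - n_anomalies
    if n_anomalies > len(anomaly_ids) or n_normals > len(normal_ids):
        raise ValueError("not enough samples to reach the requested ratio")
    rng = random.Random(seed)
    # Sample without replacement from each group, then shuffle the combined ids.
    chosen = rng.sample(list(anomaly_ids), n_anomalies)
    chosen += rng.sample(list(normal_ids), n_normals)
    rng.shuffle(chosen)
    return chosen

normal_ids = range(0, 100)     # hypothetical ids of normal samples
anomaly_ids = range(100, 140)  # hypothetical ids of anomalous samples
train_ids = mix_training_set(normal_ids, anomaly_ids, anomaly_ratio=0.2, size=50)
```

Repeating this for a grid of ratios and evaluating each model on a fixed test split would give one point per ratio on a curve of this kind.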
New Deep Learning Model
• Unsupervised Anomaly Detection via Deep Metric Learning with End-to-End Optimization
• https://arxiv.org/abs/2005.05865
• selimfirat.github.io
Thanks
Thank you for listening.
selimfirat.github.io
References
John Akhilomen. “Data mining application for cyber credit-card fraud detection system”. In: Industrial Conference on Data Mining. Springer. 2013, pp. 218–228.
Varun Chandola, Arindam Banerjee, and Vipin Kumar. “Anomaly detection: A survey”. In: ACM computing surveys (CSUR) 41.3 (2009), p. 15.
Ahmad Javaid et al. “A deep learning approach for network intrusion detection system”. In: Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS). ICST (Institute for Computer Sciences, Social-Informatics and … 2016, pp. 21–26.
Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. “Isolation forest”. In: 2008 Eighth IEEE International Conference on Data Mining. IEEE. 2008, pp. 413–422.
Sridhar Ramaswamy, Rajeev Rastogi, and Kyuseok Shim. “Efficient algorithms for mining outliers from large data sets”. In: ACM Sigmod Record. Vol. 29. 2. ACM. 2000, pp. 427–438.
Waqas Sultani, Chen Chen, and Mubarak Shah. “Real-world anomaly detection in surveillance videos”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 6479–6488.
Xiaodan Xu, Huawen Liu, and Minghai Yao. “Recent Progress of Anomaly Detection”. In: Complexity 2019 (2019), pp. 1–11. doi: 10.1155/2019/2686378.
