SlideShare a Scribd company logo
Introduction of Isolation Forest
Anomaly Detection Series 

異常偵測
[ PoIMS .MoLAlX 1eSeCSIoM

- 著重在1ASA點的分佈偵測

[ 0oMSexSTAl .MoLAlX 1eSeCSIoM

- 根據 前後文 決定該點在整h序列g是否
可疑

[ 0olleCSIve .MoLAlX 1eSeCSIoM

- 一h序列g哪l子序列(RTB-RerIeR 相較
於整體序列是可疑的
.MoLAlX 1eSeCSIoM 分成a種大類^v需求d同
可pt用d同的方式處理
孤立森林 PoIMS .MoLAlX 1eSeCSIoM 7eSHoD
[ 模型y設: 異常x稀少e在FeASTreb與一般
的資料d一樣^因此t得異常x容易透過切分
的方式被分別出來
[ t用STBRALOlIMG , /AGGIMG方式建立模型^
因此可p避免7ARkIMG , SwALOIMG問題
[ 6IMeAr TILe
[ 7ARkIMG : 異常x太多、密度太高導致難p切
分
[ SwALOIMG:異常x跟正常x太接近難p切分
理論n紹 直覺:異常點在超平面切分後很容易就被區隔出來
理論n紹 5RolASIoM Tree 的建構方法
1. y設資料集 =,{x1,x2… xM} 特徵維度為D
2. 選取特徵P 並選取切分點 O 

(隨機在特徵P的最大最小範圍內取x ^

將=根據該切分點切分為左右節點
3. 若pca個條q其gi一到達則停止:
(I 樹達到限定高度
(II 節點僅剩一筆資料集
(III 節點g得所有數據都有一樣的x
理論n紹 異常分數的計算
D. M. Rocke and D. L. Woodruff. Identification of outliers in multivariate data. Journal of the American Statistical As- sociation, 91(435):1047–1061, 1996.
[ C(M o表 M 個 DASA進}k{搜尋樹的平
均搜尋路徑
[ 用ps為 /ARe 6IMe 與期望層數s比較
[ 數x越接近1o表越可疑
[ 數x越接近 0 o表越d可疑
理論n紹 5RolASIoM Tree 的建構方法
{姓名:小明 , 身高:120 , 體重: )0}
{姓名:小美 , 身高:1(0 , 體重: 40}
{姓名:小華 , 身高:1)5 , 體重: (0}
{姓名:小泰 , 身高:1)3 , 體重: )0}
{姓名:小f , 身高:1(0 , 體重: ((}
{姓名:小國 , 身高:180 , 體重: 50}
身高- 1)0
理論n紹 5RolASIoM Tree 的建構方法
{姓名:小明 , 身高:120 , 體重: )0}
{姓名:小美 , 身高:1(0 , 體重: 40}
{姓名:小華 , 身高:1)5 , 體重: (0}
{姓名:小泰 , 身高:1)3 , 體重: )0}
{姓名:小f , 身高:1(0 , 體重: ((}
{姓名:小國 , 身高:180 , 體重: 50}
身高- 1)0
理論n紹
每次分割都抽取其g一個2eASTre, 並挑選w選節點g最大最小i間的x
直到每個節點都達到終止條q。
{姓名:小明 , 身高:120 , 體重: )0}
{姓名:小美 , 身高:1(0 , 體重: 40}
{姓名:小華 , 身高:1)5 , 體重: (0}
{姓名:小泰 , 身高:1)3 , 體重: )0}
{姓名:小f , 身高:1(0 , 體重: ((}
{姓名:小國 , 身高:180 , 體重: 50}
身高- 1)0
身高-150
體重-(0
理論n紹
{姓名:小明 , 身高:120 , 體重: )0}
{姓名:小美 , 身高:1(0 , 體重: 40}
{姓名:小華 , 身高:1)5 , 體重: (0}
{姓名:小泰 , 身高:1)3 , 體重: )0}
{姓名:小f , 身高:1(0 , 體重: ((}
{姓名:小國 , 身高:180 , 體重: 50}
身高- 1)0
身高-150
體重-(0
體重-50
身高-1)(
每次分割都抽取其g一個2eASTre, 並挑選w選節點g最大最小i間的x
直到每個節點都達到終止條q。
理論n紹 建立多個ITree ^組成I2oreRS
{姓名:小明 , 身高:120 , 體重: )0}
身高-130
{姓名:小美 , 身高:1(0 , 體重: 40}
{姓名:小華 , 身高:1)5 , 體重: (0}
{姓名:小泰 , 身高:1)3 , 體重: )0}
{姓名:小f , 身高:1(0 , 體重: ((}
{姓名:小國 , 身高:180 , 體重: 50}
理論n紹 最高層數4層的u子
{姓名:小明 , 身高:120 , 體重: )0}
身高-130
{姓名:小美 , 身高:1(0 , 體重: 40}
{姓名:小華 , 身高:1)5 , 體重: (0}
{姓名:小泰 , 身高:1)3 , 體重: )0}
{姓名:小f , 身高:1(0 , 體重: ((}
{姓名:小國 , 身高:180 , 體重: 50}
體重-(8
理論n紹 建立多個ITree ^組成I2oreRS
{姓名:小明 , 身高:120 , 體重: )0}
身高-130
{姓名:小美 , 身高:1(0 , 體重: 40}
{姓名:小華 , 身高:1)5 , 體重: (0}
{姓名:小泰 , 身高:1)3 , 體重: )0}
{姓名:小f , 身高:1(0 , 體重: ((}
{姓名:小國 , 身高:180 , 體重: 50}
體重-(8
體重-(5
理論n紹 建立多個ITree ^組成I2oreRS
{姓名:小明 , 身高:120 , 體重: )0}
身高-130
{姓名:小美 , 身高:1(0 , 體重: 40}
{姓名:小華 , 身高:1)5 , 體重: (0}
{姓名:小泰 , 身高:1)3 , 體重: )0}
{姓名:小f , 身高:1(0 , 體重: ((}
{姓名:小國 , 身高:180 , 體重: 50}
體重-(8
體重-(5
身高- 1(2
理論n紹
身
體
體
身
身
身
體
體
身
{
姓名:小明 ,
身高:120 ,
體重: )0
}
[ 小明的數據在第一棵樹被分到第k層
[ 小明的數據在第k棵樹被分到第一層
[ 平均層數為 1.5
[ C(( , 24(( ] 1 ] (2(( ] 1 /( ,2.)2
[ .MoLAlX SCore , 2^-(1.5/2.)2 ,
0.(82
異常x的計算方式
ProR AMD 0oMR
[ 線性時間、d需計算距離
[ SCAlABle 可p平行運算
[ 避免SwALOIMG ,7ARkIMG
[ 數x變數才能用
[ 0TrRe oF 1ILeMRIoM
[ 僅對3loBAl 8TSlIer敏感

>5LOrovIMG I2oreRS wISH :elASIve 7ARR]
[ 數據量過大反而會效果d好
Pros Cons
.OOlICASIoM
[ .MoLAlX o表的是某lx跟現有的資料
集d一樣 

(大量的提領到底是潛在的商機還是潛在的
犯罪?
[ .MoLAlX SCore 要怎樣跟業務單r的目
的找到m集?

(經過業務單r的經驗訪談轉換爲具備解釋
性及業務意義的變數

More Related Content

Featured

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
Rajiv Jayarajah, MAppComm, ACC
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Christy Abraham Joy
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
GetSmarter
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
Project for Public Spaces & National Center for Biking and Walking
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
Erica Santiago
 

Featured (20)

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

Isolation Forest (孤立森林)

  • 1. Introduction of Isolation Forest Anomaly Detection Series 

  • 2. 異常偵測 [ PoIMS .MoLAlX 1eSeCSIoM
 - 著重在1ASA點的分佈偵測
 [ 0oMSexSTAl .MoLAlX 1eSeCSIoM
 - 根據 前後文 決定該點在整h序列g是否 可疑
 [ 0olleCSIve .MoLAlX 1eSeCSIoM
 - 一h序列g哪l子序列(RTB-RerIeR 相較 於整體序列是可疑的 .MoLAlX 1eSeCSIoM 分成a種大類^v需求d同 可pt用d同的方式處理
  • 3. 孤立森林 PoIMS .MoLAlX 1eSeCSIoM 7eSHoD [ 模型y設: 異常x稀少e在FeASTreb與一般 的資料d一樣^因此t得異常x容易透過切分 的方式被分別出來 [ t用STBRALOlIMG , /AGGIMG方式建立模型^ 因此可p避免7ARkIMG , SwALOIMG問題 [ 6IMeAr TILe [ 7ARkIMG : 異常x太多、密度太高導致難p切 分 [ SwALOIMG:異常x跟正常x太接近難p切分
  • 5. 理論n紹 5RolASIoM Tree 的建構方法 1. y設資料集 =,{x1,x2… xM} 特徵維度為D 2. 選取特徵P 並選取切分點 O 
 (隨機在特徵P的最大最小範圍內取x ^
 將=根據該切分點切分為左右節點 3. 若pca個條q其gi一到達則停止: (I 樹達到限定高度 (II 節點僅剩一筆資料集 (III 節點g得所有數據都有一樣的x
  • 6. 理論n紹 異常分數的計算 D. M. Rocke and D. L. Woodruff. Identification of outliers in multivariate data. Journal of the American Statistical As- sociation, 91(435):1047–1061, 1996. [ C(M o表 M 個 DASA進}k{搜尋樹的平 均搜尋路徑 [ 用ps為 /ARe 6IMe 與期望層數s比較 [ 數x越接近1o表越可疑 [ 數x越接近 0 o表越d可疑
  • 7. 理論n紹 5RolASIoM Tree 的建構方法 {姓名:小明 , 身高:120 , 體重: )0} {姓名:小美 , 身高:1(0 , 體重: 40} {姓名:小華 , 身高:1)5 , 體重: (0} {姓名:小泰 , 身高:1)3 , 體重: )0} {姓名:小f , 身高:1(0 , 體重: ((} {姓名:小國 , 身高:180 , 體重: 50} 身高- 1)0
  • 8. 理論n紹 5RolASIoM Tree 的建構方法 {姓名:小明 , 身高:120 , 體重: )0} {姓名:小美 , 身高:1(0 , 體重: 40} {姓名:小華 , 身高:1)5 , 體重: (0} {姓名:小泰 , 身高:1)3 , 體重: )0} {姓名:小f , 身高:1(0 , 體重: ((} {姓名:小國 , 身高:180 , 體重: 50} 身高- 1)0
  • 9. 理論n紹 每次分割都抽取其g一個2eASTre, 並挑選w選節點g最大最小i間的x 直到每個節點都達到終止條q。 {姓名:小明 , 身高:120 , 體重: )0} {姓名:小美 , 身高:1(0 , 體重: 40} {姓名:小華 , 身高:1)5 , 體重: (0} {姓名:小泰 , 身高:1)3 , 體重: )0} {姓名:小f , 身高:1(0 , 體重: ((} {姓名:小國 , 身高:180 , 體重: 50} 身高- 1)0 身高-150 體重-(0
  • 10. 理論n紹 {姓名:小明 , 身高:120 , 體重: )0} {姓名:小美 , 身高:1(0 , 體重: 40} {姓名:小華 , 身高:1)5 , 體重: (0} {姓名:小泰 , 身高:1)3 , 體重: )0} {姓名:小f , 身高:1(0 , 體重: ((} {姓名:小國 , 身高:180 , 體重: 50} 身高- 1)0 身高-150 體重-(0 體重-50 身高-1)( 每次分割都抽取其g一個2eASTre, 並挑選w選節點g最大最小i間的x 直到每個節點都達到終止條q。
  • 11. 理論n紹 建立多個ITree ^組成I2oreRS {姓名:小明 , 身高:120 , 體重: )0} 身高-130 {姓名:小美 , 身高:1(0 , 體重: 40} {姓名:小華 , 身高:1)5 , 體重: (0} {姓名:小泰 , 身高:1)3 , 體重: )0} {姓名:小f , 身高:1(0 , 體重: ((} {姓名:小國 , 身高:180 , 體重: 50}
  • 12. 理論n紹 最高層數4層的u子 {姓名:小明 , 身高:120 , 體重: )0} 身高-130 {姓名:小美 , 身高:1(0 , 體重: 40} {姓名:小華 , 身高:1)5 , 體重: (0} {姓名:小泰 , 身高:1)3 , 體重: )0} {姓名:小f , 身高:1(0 , 體重: ((} {姓名:小國 , 身高:180 , 體重: 50} 體重-(8
  • 13. 理論n紹 建立多個ITree ^組成I2oreRS {姓名:小明 , 身高:120 , 體重: )0} 身高-130 {姓名:小美 , 身高:1(0 , 體重: 40} {姓名:小華 , 身高:1)5 , 體重: (0} {姓名:小泰 , 身高:1)3 , 體重: )0} {姓名:小f , 身高:1(0 , 體重: ((} {姓名:小國 , 身高:180 , 體重: 50} 體重-(8 體重-(5
  • 14. 理論n紹 建立多個ITree ^組成I2oreRS {姓名:小明 , 身高:120 , 體重: )0} 身高-130 {姓名:小美 , 身高:1(0 , 體重: 40} {姓名:小華 , 身高:1)5 , 體重: (0} {姓名:小泰 , 身高:1)3 , 體重: )0} {姓名:小f , 身高:1(0 , 體重: ((} {姓名:小國 , 身高:180 , 體重: 50} 體重-(8 體重-(5 身高- 1(2
  • 15. 理論n紹 身 體 體 身 身 身 體 體 身 { 姓名:小明 , 身高:120 , 體重: )0 } [ 小明的數據在第一棵樹被分到第k層 [ 小明的數據在第k棵樹被分到第一層 [ 平均層數為 1.5 [ C(( , 24(( ] 1 ] (2(( ] 1 /( ,2.)2 [ .MoLAlX SCore , 2^-(1.5/2.)2 , 0.(82 異常x的計算方式
  • 16. ProR AMD 0oMR [ 線性時間、d需計算距離 [ SCAlABle 可p平行運算 [ 避免SwALOIMG ,7ARkIMG [ 數x變數才能用 [ 0TrRe oF 1ILeMRIoM [ 僅對3loBAl 8TSlIer敏感
 >5LOrovIMG I2oreRS wISH :elASIve 7ARR] [ 數據量過大反而會效果d好 Pros Cons
  • 17. .OOlICASIoM [ .MoLAlX o表的是某lx跟現有的資料 集d一樣 
 (大量的提領到底是潛在的商機還是潛在的 犯罪? [ .MoLAlX SCore 要怎樣跟業務單r的目 的找到m集?
 (經過業務單r的經驗訪談轉換爲具備解釋 性及業務意義的變數