NTU DBME5028 Week 5: Introduction to Machine Learning
Learning Through Hands-on Practice
Wei-Hsiang Yu
Data Scientist, aetherAI
Fall 2021
Recap – Core idea of machine learning
“Field of study that gives computers the ability to learn without being explicitly programmed.”
- Arthur Lee Samuel, 1959
General workflow of machine learning process
Step 1: Define the problem
Step 2: Collect & clean the data
Step 3: Choose & build a model
Step 4: Evaluate the key metrics
Step 5: Make a good-looking presentation
General workflow of machine learning process
Step 1: Define the problem
● Common problems in medical imaging: classification, detection, segmentation
General workflow of machine learning process
Step 1: Define the problem
● There are many other types of problems that are outside the scope of this course
General workflow of machine learning process
Step 2: Collect & clean the data
● Where to find data
○ Kaggle
○ Grand-Challenges
○ Papers (https://paperswithcode.com/datasets)
○ Collect it yourself!
General workflow of machine learning process
Step 2: Collect & clean the data
● The course demo uses data from one of the sources above
General workflow of machine learning process
Step 2: Collect & clean the data
1. Convert the input format to suit the model you plan to use
2. Normalize the inputs
3. Train–test split
4. Change the output format (it depends on the task)
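A minimal sketch of steps 2 and 3 in scikit-learn, assuming the bundled breast-cancer dataset as a stand-in for the course data (the demo dataset is not named here). One detail worth showing: the scaler's mean and standard deviation are learned from the training split only and then reused on the test split, so no test information leaks into preprocessing.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled breast-cancer data stands in for the course data.
X, y = load_breast_cancer(return_X_y=True)

# Train-test split; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Learn the scaling parameters on the training split only...
scaler = StandardScaler().fit(X_train)
# ...and apply the same transform to both splits.
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# After scaling, each training feature has mean ~0 and standard deviation ~1.
print(np.round(X_train_s.mean(axis=0)[:3], 6), np.round(X_train_s.std(axis=0)[:3], 6))
```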
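Data Normalization (re-scale)
● By all means, always rescale your inputs
[Figure: loss contours over the weights w1 and w2, two panels]
A correction (ΔW) to w2 has a larger effect on the loss
● Scale affects the training process
○ Weights of features on different scales need different learning rates when they are updated
■ Without an adaptive learning rate, training does not go well
○ When all features are on the same scale, the loss contours are closer to circular
➔ the negative gradient points toward the center (the minimum)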
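A minimal sketch of why scale matters, on hypothetical synthetic data: two features on scales of about 1 and about 1000, fitted by plain batch gradient descent on mean squared error with one fixed learning rate. The step size 1/L (L = largest eigenvalue of the Hessian) is an assumption chosen so that both runs are stable; it is not taken from the slides.

```python
import numpy as np

def gd_final_mse(X, y, steps=200):
    """Batch gradient descent on MSE with ONE fixed learning rate; returns the final MSE.

    The learning rate is 1 / L, where L is the largest eigenvalue of the
    Hessian (2/n) X^T X, a standard stable step size for a quadratic loss.
    """
    n = X.shape[0]
    hessian = 2.0 / n * X.T @ X
    lr = 1.0 / np.linalg.eigvalsh(hessian).max()
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2.0 / n * X.T @ (X @ w - y)
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(0.0, 1.0, n)     # hypothetical feature on a scale of ~1
x2 = rng.normal(0.0, 1000.0, n)  # hypothetical feature on a scale of ~1000
X_raw = np.column_stack([x1, x2])
y = 3.0 * x1 + 0.002 * x2 + rng.normal(0.0, 0.1, n)

# Rescale each feature by its standard deviation so both share the same scale.
X_scaled = X_raw / X_raw.std(axis=0)

mse_raw = gd_final_mse(X_raw, y)
mse_scaled = gd_final_mse(X_scaled, y)
print(f"raw features: MSE={mse_raw:.3f}   rescaled features: MSE={mse_scaled:.3f}")
```

With raw features, the step size has to be tiny to stay stable along the large-scale direction, so the weight of `x1` barely moves in 200 steps; after rescaling, the same loop converges to roughly the noise level.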
General workflow of machine learning process
Step 1: Define the problem
● Clearly, today's task is a classification problem
General workflow of machine learning process
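Step 3: Choose & build a model
● There are many classification models (Logistic Regression, SVM, Decision Tree, XGBoost, …); start with a basic one to get the whole pipeline working
● Basic elements
○ Model
○ Prediction
○ Evaluation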
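A minimal baseline sketch of the three basic elements (model, prediction, evaluation) in scikit-learn. It assumes the bundled breast-cancer dataset as a stand-in for the course data and picks logistic regression as the "basic" model. Neither choice comes from the slides.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Model: rescaling + logistic regression, fitted on the training split only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Prediction: probability of the positive class (label 1).
y_prob = model.predict_proba(X_test)[:, 1]

# Evaluation: AUROC on the held-out test split.
test_auc = roc_auc_score(y_test, y_prob)
print(f"Test AUROC: {test_auc:.3f}")
```

Wrapping the scaler and the classifier in one pipeline means the scaler is always fitted on whatever data the model is fitted on, which keeps the later steps (cross-validation, tuning) leak-free.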
General workflow of machine learning process
Step 4: Evaluate the key metrics
● An important metric for classification tasks
○ Area Under the Receiver Operating Characteristic curve (AUROC, or AUC): the model's ability to separate the target distribution from the noise distribution
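One way to read "ability to separate the target distribution from the noise distribution": AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A small sketch checking this against scikit-learn. `auc_by_pairs` is a hypothetical helper written only for illustration; it builds every positive/negative pair, so it suits small sets only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_by_pairs(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return correct / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc_by_pairs(y_true, y_score))   # 0.75
print(roc_auc_score(y_true, y_score))  # 0.75
```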
Issue: overfitting?
[Figure: results on the training set vs. the validation set]
Issue: overfitting?
● Overfitting usually also shows up in the evaluation metrics
[Figure: loss / error vs. training iterations, with curves for training error and testing error]
Issue: overfitting?
[Figure: training set vs. validation set]
Overfitting?
Issue: overfitting?
[Figure: training set vs. validation set]
Overfitting? Not a problem in this case.
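A small sketch of how overfitting shows up as a gap between training and validation metrics. It assumes the bundled breast-cancer dataset and a decision tree purely for illustration: a tree grown until every leaf is pure can memorize the training split, while a depth-limited tree cannot.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

results = {}
for depth in [None, 3]:  # None = keep splitting until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_val, y_val))
    train_acc, val_acc = results[depth]
    print(f"max_depth={depth}: train acc={train_acc:.3f}, "
          f"val acc={val_acc:.3f}, gap={train_acc - val_acc:.3f}")
```

A near-perfect training score with a clearly lower validation score is the pattern to watch for; whether the gap matters depends on whether the validation score is still good enough for the task.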
Issue – overfitting: What if we run into it later?
General idea for overfitting
- Find ways to screw up your model! (i.e., make it harder for the model to simply memorize the training data)
Common ways to handle overfitting
● Train / Test Split
● Early stopping
● Regularization
● Data augmentation
● Maybe imbalanced data?
● Modify the loss function
● …
Issue – overfitting: What if we run into it later?
(Not all of these are covered today; the following slides discuss early stopping and regularization.)
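Issue – overfitting: Early stopping
● In some more extreme cases, stopping training early can be beneficial
○ The loss keeps pushing the model's weights to change → the model gets worse on unseen data
○ It can basically only keep a model from getting worse; it will not rescue a model that never trained properly in the first place
[Figure: training and validation loss / error curves]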
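A minimal sketch of the usual "patience" rule behind early stopping: stop once the validation loss has not improved for `patience` consecutive epochs, and keep the weights from the best epoch. `early_stopping` is a hypothetical helper; a real training loop would receive the validation losses one epoch at a time rather than as a precomputed list. Keras's `EarlyStopping` callback and XGBoost's `early_stopping_rounds` option apply the same idea.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:  # counts as an improvement
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch  # stop here; restore weights from best_epoch
    return len(val_losses) - 1, best_epoch  # never triggered: trained to the end

# Hypothetical validation curve: improves, then starts to rise (overfitting).
val_losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.58, 0.60, 0.65]
print(early_stopping(val_losses, patience=3))  # (6, 3)
```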
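Issue – overfitting: Regularization
● Limit the size of the weights so that small changes in the inputs do not cause drastic changes in the output values
Smaller $w_i$ ➔ a change $\Delta x_i$ has a smaller effect $\Delta\hat{y}$ on $\hat{y}$
➔ less sensitive to changes in the input ➔ better generalization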
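For the simplest case, a linear model (an assumption made only to keep the algebra short), the argument can be written out explicitly; the last inequality is Cauchy-Schwarz, so a smaller weight norm directly caps how much the output can move:

```latex
\hat{y} = \sum_{i=1}^{N} w_i x_i + b
\quad\Longrightarrow\quad
\Delta\hat{y} = \sum_{i=1}^{N} w_i \,\Delta x_i ,
\qquad
|\Delta\hat{y}| \le \lVert \mathbf{w} \rVert_2 \, \lVert \Delta\mathbf{x} \rVert_2
```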
Issue – overfitting: Regularization
● The L2 penalty is used more often than the L1 penalty
$$\text{Cost} = \text{Loss} + \alpha \cdot \text{Reg}$$
$$L_1 = \sum_{i=1}^{N} |W_i| \quad \text{(Lasso)}$$
$$L_2 = \sum_{i=1}^{N} |W_i|^2 \quad \text{(Ridge)}$$
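A small sketch of the practical difference between the two penalties, on hypothetical data where only 3 of 10 features matter: the L1 penalty (Lasso) tends to drive some weights exactly to zero, while the L2 penalty (Ridge) shrinks weights without zeroing them. In scikit-learn, `alpha` plays the role of α above, but `Lasso` and `Ridge` scale the data-fit term differently, so equal `alpha` values are not directly comparable.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Hypothetical data: 10 features, only 3 of which actually influence y.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

print("Lasso weights exactly 0:", int(np.sum(lasso.coef_ == 0)))
print("Ridge weights exactly 0:", int(np.sum(ridge.coef_ == 0)))
```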
Issue – overfitting: Regularization
[Screenshots: setting regularization in PyTorch, scikit-learn, XGBoost, and CatBoost]
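A sketch of the scikit-learn case only, since the screenshots' exact settings are not recoverable. In `LogisticRegression` the knob is `C`, the inverse of the regularization strength (default penalty is L2), so a smaller `C` should give smaller weights. For reference, analogous knobs elsewhere include the `weight_decay` argument of PyTorch optimizers such as `torch.optim.SGD`, `reg_alpha` (L1) and `reg_lambda` (L2) in XGBoost, and `l2_leaf_reg` in CatBoost; the slides may show different options. The dataset is again an assumed stand-in.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

weight_norms = {}
for C in [100.0, 1.0, 0.01]:  # C is the INVERSE of the regularization strength
    clf = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=5000))
    clf.fit(X, y)
    coef = clf.named_steps["logisticregression"].coef_
    weight_norms[C] = float(np.linalg.norm(coef))
    print(f"C={C:>6}: ||w||_2 = {weight_norms[C]:.3f}")
```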
Issue - evaluation metrics: How to convince readers that model A is better than model B
● A common problem in many CS papers: the performance is only slightly better than before. Was it luck, or does the method really work?
https://arxiv.org/pdf/1608.06993.pdf
https://arxiv.org/pdf/2105.11293.pdf
https://arxiv.org/pdf/1911.06667.pdf
● In many medical journals
https://pubmed.ncbi.nlm.nih.gov/30312179/
https://pubmed.ncbi.nlm.nih.gov/32140566/
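● Estimating the confidence interval and its “significance” (NOTE: please never use the word “significant” loosely)
○ Each experiment yields one result (e.g., Acc, AUC, Recall, mAP, …)
■ After running N experiments, compute the interval with statistical methods
○ Analytical (formula-based) solution
○ Simulation-based solution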
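A minimal sketch of the formula-based solution: with N independent runs of the same experiment (here, hypothetical AUCs from 5 random seeds), a t-based 95% confidence interval for the mean result. This assumes the run-to-run results are roughly normal and independent; with very few runs the interval is wide.

```python
import numpy as np
from scipy import stats

# Hypothetical AUCs from N = 5 repeated runs (e.g. different random seeds).
aucs = np.array([0.91, 0.93, 0.90, 0.92, 0.94])

n = len(aucs)
mean = aucs.mean()
sem = aucs.std(ddof=1) / np.sqrt(n)    # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem
print(f"mean AUC = {mean:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```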
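Estimation of confidence interval: Basic statistics recap
Central Limit Theorem
If simple random samples of size n are drawn from a population with mean μ and standard deviation σ, then when n is large enough, the sampling distribution of the sample mean is approximately normal, with mean μ and standard deviation σ/√n.
Population distribution, sample distribution, and sampling distribution
Sampling distribution (of the mean)
If one value is drawn at random from a distribution, what is the probability that it falls between a and b? (For a normal distribution, ~68% of values fall within 1 standard deviation of the mean and ~95% within 2 standard deviations.)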
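A quick simulation of the Central Limit Theorem, assuming an exponential population (μ = 1, σ = 1, clearly skewed) purely for illustration: the sample means of size n = 50 have a spread close to σ/√n, and roughly 68% / 95% of them land within 1 / 2 standard errors of μ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_repeats = 50, 20_000

# Population: exponential with scale 1, so mu = 1 and sigma = 1 (skewed, not normal).
samples = rng.exponential(scale=1.0, size=(n_repeats, n))
sample_means = samples.mean(axis=1)  # one sample mean per simulated experiment

se_theory = 1.0 / np.sqrt(n)  # sigma / sqrt(n)
print(f"std of sample means: {sample_means.std():.4f} (theory: {se_theory:.4f})")

# Share of sample means within 1 and 2 standard errors of mu = 1.
within_1 = np.mean(np.abs(sample_means - 1.0) < 1 * se_theory)
within_2 = np.mean(np.abs(sample_means - 1.0) < 2 * se_theory)
print(f"within 1 SE: {within_1:.3f}, within 2 SE: {within_2:.3f}")
```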
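Estimation of confidence interval
● Hypothesis testing and interval estimation
Example:
In an experiment, two groups of mice received injections (one group the drug, the other saline), and a physiological index was then measured:
Group A: 86, 72, 74, 85, 76, 79, 82, 83, 83, 79, 82
Group B: 81, 77, 63, 75, 69, 86, 81, 60
Does the drug affect this physiological index?
● Null hypothesis (H0): μA = μB
● t-test:
● Confidence estimation:
○ Reject H0 if the intervals do not overlap (a conservative rule: overlapping intervals do not by themselves show that there is no difference)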
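A sketch of both approaches on the mouse data above, using SciPy. It runs Student's t-test (equal variances assumed) and Welch's t-test (no equal-variance assumption), then builds a t-based 95% interval for each group's mean and checks whether they overlap. Which test the slide intends is not recoverable, so both are shown.

```python
import numpy as np
from scipy import stats

group_a = np.array([86, 72, 74, 85, 76, 79, 82, 83, 83, 79, 82], dtype=float)
group_b = np.array([81, 77, 63, 75, 69, 86, 81, 60], dtype=float)

# Two-sample t-tests of H0: mu_A = mu_B.
student = stats.ttest_ind(group_a, group_b)                 # assumes equal variances
welch = stats.ttest_ind(group_a, group_b, equal_var=False)  # does not assume equal variances
print(f"Student t = {student.statistic:.3f}, p = {student.pvalue:.3f}")
print(f"Welch   t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")

def mean_ci(x, level=0.95):
    """t-based confidence interval for the mean of one group."""
    half = stats.t.ppf((1 + level) / 2, df=len(x) - 1) * x.std(ddof=1) / np.sqrt(len(x))
    return x.mean() - half, x.mean() + half

ci_a, ci_b = mean_ci(group_a), mean_ci(group_b)
overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
print(f"95% CI A: [{ci_a[0]:.2f}, {ci_a[1]:.2f}], "
      f"B: [{ci_b[0]:.2f}, {ci_b[1]:.2f}], overlap: {overlap}")
```

With these numbers, neither test rejects H0 at the 5% level, and the two 95% intervals overlap, so both approaches agree here.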
Estimation of confidence interval
● Different metrics may require different statistical tests (e.g., McNemar's test for paired accuracies, DeLong's test for comparing correlated AUCs)
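Estimation of confidence interval
● The ultimate trick: bootstrapping
- Uses resampling to approximate the sampling distribution of the statistic (no closed-form formula required)
● Steps
1. From the existing samples, draw N samples with replacement (N = the original sample size)
2. Compute the target statistic on this resample
3. Repeat steps 1 and 2 M times to obtain the sampling distribution of the statistic
4. Sort that distribution
5. The values at the 2.5th and 97.5th percentiles give the 95% CI (the 50th percentile is the median)
6. Done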
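A minimal sketch of the percentile bootstrap steps above, applied to AUROC on hypothetical scores. `bootstrap_ci` is an illustrative helper, not from the course code. Resamples that happen to contain only one class are skipped because AUC is undefined for them. SciPy also ships a ready-made `scipy.stats.bootstrap` for general statistics.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_ci(y_true, y_score, metric=roc_auc_score, n_boot=1000, level=0.95, seed=0):
    """Percentile-bootstrap CI for metric(y_true, y_score); returns (low, high, median)."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    boot_stats = []
    for _ in range(n_boot):                    # step 3: repeat M = n_boot times
        idx = rng.integers(0, n, size=n)       # step 1: draw n cases with replacement
        if np.unique(y_true[idx]).size < 2:
            continue                           # AUC is undefined with only one class
        boot_stats.append(metric(y_true[idx], y_score[idx]))  # step 2
    boot_stats = np.sort(boot_stats)           # step 4 (np.percentile would not need it)
    tail = (1 - level) / 2 * 100
    low, high = np.percentile(boot_stats, [tail, 100 - tail])  # step 5
    return low, high, np.median(boot_stats)

# Hypothetical labels and model scores.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)
scores = y + rng.normal(0.0, 1.0, size=200)
low, high, median = bootstrap_ci(y, scores)
print(f"AUC = {roc_auc_score(y, scores):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```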
General workflow of machine learning process
Steps 1 to 5 (as above)
Are there other ways to give the model a chance to perform better?
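Hyper-parameter tuning
● Once you are sure the model works, give it a try when you want to squeeze out the last bit of model performance
○ For example…
■ You are competing in a challenge
■ You have a meeting with your boss shortly and have run out of ideas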
Hyper-parameter tuning
● Many packages can help you tune hyper-parameters
● Standard search methods
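A sketch of the two standard search methods as implemented in scikit-learn, grid search and random search, tuning the regularization strength `C` of a logistic regression by cross-validated AUROC. The dataset, model, and search ranges are assumptions for illustration; the packages shown on the slide are not recoverable. The test split is touched only once, after tuning.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))

# Grid search: try every listed value with 5-fold cross-validation.
grid = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1, 10, 100]},
                    scoring="roc_auc", cv=5)
grid.fit(X_train, y_train)

# Random search: sample 20 values of C from a log-uniform distribution.
rand = RandomizedSearchCV(pipe, {"logisticregression__C": loguniform(1e-3, 1e3)},
                          n_iter=20, scoring="roc_auc", cv=5, random_state=0)
rand.fit(X_train, y_train)

test_auc = grid.score(X_test, y_test)  # refitted best model, scored once on held-out data
print("grid best:", grid.best_params_, "random best:", rand.best_params_)
print(f"test AUROC (grid-search model): {test_auc:.3f}")
```

Random search is often preferred when there are several hyper-parameters, because it does not spend most of its budget on combinations of unimportant ones.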
Not covered today
● Tree-based methods
○ Decision Tree, Random Forest, GBM, XGBoost
● Some references for further study
○ Decision Tree
○ Bagging: learn from bootstrap samples (the trees are independent)
■ Random Forest
○ Boosting: additive learning (later trees correct the errors of earlier trees)
■ GBM
○ Combined
■ XGBoost
■ LightGBM & CatBoost
● You can play around with the sample code
○ https://github.com/Kaminyou/110-1-NTU-DBME5028/tree/main/week5-machine_learning

More Related Content

Similar to NTU DBME5028 Week5 Introduction to Machine Learning

Statistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxStatistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxnagarajan740445
 
Module 5: Decision Trees
Module 5: Decision TreesModule 5: Decision Trees
Module 5: Decision TreesSara Hooker
 
Data Science With Python
Data Science With PythonData Science With Python
Data Science With PythonMosky Liu
 
Fixfindprodissues
FixfindprodissuesFixfindprodissues
FixfindprodissuesDave Stokes
 
Fixfindprodissues
FixfindprodissuesFixfindprodissues
FixfindprodissuesDave Stokes
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningKoundinya Desiraju
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandrySri Ambati
 
Traditional Testing vs MaTeLo Model-Based Testing Tool v2.06
Traditional Testing vs MaTeLo Model-Based Testing Tool v2.06Traditional Testing vs MaTeLo Model-Based Testing Tool v2.06
Traditional Testing vs MaTeLo Model-Based Testing Tool v2.06Fabrice Trollet
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9Roger Barga
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
 
Machine Learning Presentation
Machine Learning PresentationMachine Learning Presentation
Machine Learning PresentationSk Samiul Islam
 

Similar to NTU DBME5028 Week5 Introduction to Machine Learning (20)

Statistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxStatistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptx
 
Module 5: Decision Trees
Module 5: Decision TreesModule 5: Decision Trees
Module 5: Decision Trees
 
Statistical learning intro
Statistical learning introStatistical learning intro
Statistical learning intro
 
Data Science With Python
Data Science With PythonData Science With Python
Data Science With Python
 
Fixfindprodissues
FixfindprodissuesFixfindprodissues
Fixfindprodissues
 
Fixfindprodissues
FixfindprodissuesFixfindprodissues
Fixfindprodissues
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark Landry
 
ML MODULE 5.pdf
ML MODULE 5.pdfML MODULE 5.pdf
ML MODULE 5.pdf
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
Traditional Testing vs MaTeLo Model-Based Testing Tool v2.06
Traditional Testing vs MaTeLo Model-Based Testing Tool v2.06Traditional Testing vs MaTeLo Model-Based Testing Tool v2.06
Traditional Testing vs MaTeLo Model-Based Testing Tool v2.06
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
Machine Learning Presentation
Machine Learning PresentationMachine Learning Presentation
Machine Learning Presentation
 
920 plenary elder
920 plenary elder920 plenary elder
920 plenary elder
 
910 plenary Elder
910 plenary Elder910 plenary Elder
910 plenary Elder
 

More from Sean Yu

AI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision MedicineAI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision MedicineSean Yu
 
Weakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud ComputingWeakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud ComputingSean Yu
 
NTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer LearningNTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer LearningSean Yu
 
Practical aspects of medical image ai for hospital (IRB course)
Practical aspects of medical image ai for hospital (IRB course)Practical aspects of medical image ai for hospital (IRB course)
Practical aspects of medical image ai for hospital (IRB course)Sean Yu
 
Baisc Deep Learning HandsOn
Baisc Deep Learning HandsOnBaisc Deep Learning HandsOn
Baisc Deep Learning HandsOnSean Yu
 
[Taiwan AI Academy] Machine learning and deep learning application examples
[Taiwan AI Academy] Machine learning and deep learning application examples[Taiwan AI Academy] Machine learning and deep learning application examples
[Taiwan AI Academy] Machine learning and deep learning application examplesSean Yu
 
[Python - Deep Learning] Data generator
[Python - Deep Learning] Data generator[Python - Deep Learning] Data generator
[Python - Deep Learning] Data generatorSean Yu
 
日常生活中的機器學習與 AI 應用 - 院區公開演講
日常生活中的機器學習與 AI 應用 - 院區公開演講日常生活中的機器學習與 AI 應用 - 院區公開演講
日常生活中的機器學習與 AI 應用 - 院區公開演講Sean Yu
 
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅台灣人工智慧年會 - 兩位跨領域者的深度學習之旅
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅Sean Yu
 
R 語言教學: 探索性資料分析與文字探勘初探
R 語言教學: 探索性資料分析與文字探勘初探R 語言教學: 探索性資料分析與文字探勘初探
R 語言教學: 探索性資料分析與文字探勘初探Sean Yu
 

More from Sean Yu (10)

AI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision MedicineAI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision Medicine
 
Weakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud ComputingWeakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud Computing
 
NTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer LearningNTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer Learning
 
Practical aspects of medical image ai for hospital (IRB course)
Practical aspects of medical image ai for hospital (IRB course)Practical aspects of medical image ai for hospital (IRB course)
Practical aspects of medical image ai for hospital (IRB course)
 
Baisc Deep Learning HandsOn
Baisc Deep Learning HandsOnBaisc Deep Learning HandsOn
Baisc Deep Learning HandsOn
 
[Taiwan AI Academy] Machine learning and deep learning application examples
[Taiwan AI Academy] Machine learning and deep learning application examples[Taiwan AI Academy] Machine learning and deep learning application examples
[Taiwan AI Academy] Machine learning and deep learning application examples
 
[Python - Deep Learning] Data generator
[Python - Deep Learning] Data generator[Python - Deep Learning] Data generator
[Python - Deep Learning] Data generator
 
日常生活中的機器學習與 AI 應用 - 院區公開演講
日常生活中的機器學習與 AI 應用 - 院區公開演講日常生活中的機器學習與 AI 應用 - 院區公開演講
日常生活中的機器學習與 AI 應用 - 院區公開演講
 
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅台灣人工智慧年會 - 兩位跨領域者的深度學習之旅
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅
 
R 語言教學: 探索性資料分析與文字探勘初探
R 語言教學: 探索性資料分析與文字探勘初探R 語言教學: 探索性資料分析與文字探勘初探
R 語言教學: 探索性資料分析與文字探勘初探
 

Recently uploaded

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 

NTU DBME5028 Week5 Introduction to Machine Learning

  • 1. Introduction to Machine Learning Learn from Hands-on Wei-Hsiang, Yu Data Scientist, aetherAI 2021. Fall
  • 2. Recap – Core idea of machine learning Field of study that gives computers the ability to learn without being explicitly programmed. - Arthur Lee Samuel, 1959 『 』 2
  • 3. General workflow of machine learning process Step1: 定義問題 Step2: 蒐集 & 清理資料 Step3: 選擇 & 建立模型 Step4: 評估關鍵指標 Step5: 做一份好看的簡報 3
  • 4. General workflow of machine learning process ● 醫學 ”影像” 常見問題:分類、偵測、分割 Step1: 定義問題 Step2: 蒐集 & 清理資料 Step3: 選擇 & 建立模型 Step4: 評估關鍵指標 Step5: 做一份好看的簡報 4
  • 5. General workflow of machine learning process ● 還有很多其他類型的問題不在討論範圍內 Step1: 定義問題 Step2: 蒐集 & 清理資料 Step3: 選擇 & 建立模型 Step4: 評估關鍵指標 Step5: 做一份好看的簡報 5
  • 6. General workflow of machine learning process Step1: 定義問題 Step2: 蒐集 & 清理資料 Step3: 選擇 & 建立模型 Step4: 評估關鍵指標 Step5: 做一份好看的簡報 ● 哪裡有資料 ○ Kaggle ○ Grand-Challenges ○ Papers (https://paperswithcode.com/datasets ) ○ 自己蒐集! 6
  • 7. General workflow of machine learning process Step1: 定義問題 Step2: 蒐集 & 清理資料 Step3: 選擇 & 建立模型 Step4: 評估關鍵指標 Step5: 做一份好看的簡報 ● 哪裡有資料 ○ Kaggle ○ Grand-Challenges ○ Papers ○ 自己蒐集! For course demo 7
  • 8. General workflow of machine learning process Step2: 蒐集 & 清理資料 1. 根據你預計進行的模型轉換輸入格式 2. 正規化輸入 3. Train – Test Split 4. 改變輸出格式 (It depends) 8
  • 9. Data Normalization (re-scale) ● 最好、保證、千萬要做 rescale w1 w2 w1 w2 9 W2 的修正(ΔW)對於 loss 的影響比較大 ● 影響訓練的過程 ○ 不同 scale 的 weights 修正時會需要不 同的 learning rates ■ 不用 adaptive learning rate 是做 不好的 ○ 在同個 scale 下,loss 的等高線會較接 近圓形 ➔ gradient 的方向會指向圓心 (最低點)
  • 10. General workflow of machine learning process Step1: 定義問題 Step2: 蒐集 & 清理資料 Step3: 選擇 & 建立模型 Step4: 評估關鍵指標 Step5: 做一份好看的簡報 ● 顯然我們今天是個分類問題 10
  • 11. General workflow of machine learning process Step3: 選擇 & 建立模型 ● 分類模型有很多種:Logistic Regression, SVM, Decision Tree, XGBoost, …, 先挑個基本款把流程建起來 ● 基本元素 ○ 模型 ○ 預測 ○ 評估 11 Step4: 評估關鍵指標
  • 12. General workflow of machine learning process ● Important metric for classification task ○ Area Under Receiver-Operating-Curve (AUROC, ROC):Ability for your model to pull the target distribution from noise distribution. 12 Step4: 評估關鍵指標
  • 14. Issue: overfitting? ● 通常也會反應在 evaluation metrices 上 14 Training Iterations Loss / Error Training Error Testing Error
  • 15. Issue: overfitting? 15 training set validation set Overfitting?
  • 16. Issue: overfitting? 16 training set validation set Overfitting? – Not a problem in this case
  • 17. Issue – overfitting : 假設之後遇到怎麼辦 General idea for overfitting - Find ways to screw up you model! Common ways to handle overfitting ● Train / Test Split ● EarlyStopping ● Regularization ● Data augmentation ● Maybe imbalance data? ● Modify loss function ● … 17
  • 18. Issue – overfitting : 假設之後遇到怎麼辦 General idea for overfitting - Find ways to screw up you model! Common ways to handle overfitting ● Train / Test Split ● EarlyStopping ● Regularization ● Data augmentation ● Maybe imbalance data? ● Modify loss function ● … 18 Not cover today
  • 19. Issue – overfitting : Earlystopping ● 在一些比較極端的例子當中, 早點停下來可能比較有利 ○ loss 會持續讓模型的權重改變 → 搞爛模型 ○ 基本上只能防止把模型搞爛,不會把一個本來就跑不起來的模型救回來 19 Wx Loss / Error Validation Train
  • 20. Issue – overfitting : Regularization ● 限制 weights 的大小 – 使得 output values 不會因為 inputs 的微小變化造成劇烈的改變 20 wi 較小 ➔ Δxi 對 ̂ y 造成的影響(Δ̂ y)較小 ➔ 對 input 變化比較不敏感 ➔ Generalization 好
  • 21. Issue – overfitting : Regularization ● 限制 weights 的大小 – 使得 output values 不會因為 inputs 的微小變化造成劇烈的改變 ● L2-norm 比 L1-norm 更常使用 21 Cost = Loss + α * Reg 𝐿1 = ෍ 𝑖=1 𝑁 |𝑊𝑖| 𝐿2 = ෍ 𝑖=1 𝑁 |𝑊𝑖|2 - (Lasso) - (Ridge)
  • 22. Issue – overfitting : Regularization 22 In pyTorch In scikit-learn In XGBoost In CatBoost
  • 23. Issue - evaluation metrics : How to convince reader A model is better than B ● 在大多 CS 的論文中常有的問題:Performance 比之前好一點點?到底是運氣好還是真的有效 23 https://arxiv.org/pdf/1608.06993.pdf
  • 24. Issue - evaluation metrics : How to convince reader A model is better than B 24 https://arxiv.org/pdf/2105.11293.pdf https://arxiv.org/pdf/1911.06667.pdf
  • 25. Issue - evaluation metrics : How to convince reader A model is better than B ● In many medical journals 25 https://pubmed.ncbi.nlm.nih.gov/30312179/ https://pubmed.ncbi.nlm.nih.gov/32140566/
  • 26. Issue - evaluation metrics : How to convince reader A model is better than B 26 ● Estimation of confidence interval and its “significance” (NOTE: “significant” 這個詞請千萬不要亂用) ○ 每一次實驗都會有一筆結果 (ex. Acc, AUC, Recall, mAP, …) ■ 在跑 N 次實驗後,使用統計方法計算 ○ 公式解 ○ 模擬解
  • 27. Estimation of confidence interval : Basic statistics recap 27 中央極限定理 (Central Limit Theorem) 由一具有平均數 μ,標準差 σ 的母體中抽取樣本大小為 n 的簡 單隨機樣本,當樣本大小 n 夠大時,樣本平均數的抽樣分配會 近似於常態分配。 Population distribution, Sample distribution, and Sampling distribution Sampling distribution (of the mean) 從一個分布中隨機抽樣一筆資料, 該數會有多少機率落在 a – b 之 間 (~68% 落在 1 個標準差內; ~95% 落在 2 個標準差內)
  • 28. Estimation of confidence interval ● Hypothesis testing and Interval Estimation Example : 某實驗中,兩組白老鼠注射後 (一組有打藥;另一組 打食鹽水),某測量的生理指標如下 GroupA: 86,72,74,85,76,79,82,83,83,79,82 GroupB: 81,77,63,75,69,86,81,60 問該藥是否對某生理指標有影響? ● Null hypothesis (H0): μA = μB ● t-test: ● Confidence estimation: ○ Reject H0 if intervals have no overlaps 28
  • 29. Estimation of confidence interval ● Different metrics may require different testing 29
  • 30. Estimation of confidence interval ● Different metrics may require different testing 30
  • 31. Estimation of confidence interval ● 終極招數 Bootstrapping - 統計值的抽樣分布逼近常態 ● 操作步驟 1. 從現有樣本群中,以抽後放回方法抽取 N 個樣本 2. 從這個樣本分布中,計算目標統計值 3. 重複上述 1, 2 M 次,得到統計值的抽樣 分布 4. 將該抽樣分布排序 5. 2.5 與 97.5 分位 (2.5%, 97.5%) 的數值 即為 95% CI (50% 為平均數) 6. 收工 31
  • 32. General workflow of machine learning process Step1: 定義問題 Step2: 蒐集 & 清理資料 Step3: 選擇 & 建立模型 Step4: 評估關鍵指標 Step5: 做一份好看的簡報 32 還有沒有其他方法可以讓 模型有機會表現更好?
  • 33. Hyper-parameter tuning ● 當你確定模型可行,為了擠出最後一丁點 model performance 的時候,不妨試看看 ○ 比方說 … ■ 打比賽 ■ 等一下要跟老闆 meeting, 但已 經沒梗了 33
  • 34. Hyper-parameter tuning ● 當你確定模型可行,為了擠出最後一 丁點 model performance 的時候,不妨試 看看 ○ 比方說 … ■ 打比賽 ■ 等一下要跟老闆 meeting, 但已 經沒梗了 34
  • 35. Hyper-parameter tuning ● 當你確定模型可行,為了擠出最後一 丁點 model performance 的時候,不妨試 看看 ○ 比方說 … ■ 打比賽 ■ 等一下要跟老闆 meeting, 但已 經沒梗了 35
  • 36. Hyper-parameter tuning ● Many packages can help you tune hyper-parameters 36 ● Standard searching methods
  • 37. Today NOT Going To Cover ● Tree-based methods ○ Decision Tree, Random Forest, GBM, XGBoost ● Some reference for you to study ○ Decision Tree ○ Bagging:Learn from bootstrap with samples (Trees are independent) ■ Random Forest ○ Boosting:Additive learning (Use later trees to cover errors from previous trees) ■ GBM ○ Combined ■ XGBoost ■ LightGBM & CatBoost ● You can play around with sample codes ○ https://github.com/Kaminyou/110-1-NTU-DBME5028/tree/main/week5-machine_learning 37