Informatics Engineering Study Program
Faculty of Engineering – Universitas Surabaya
Dimensionality Reduction:
Principal Component Analysis
Week 12
1604C055 - Machine Learning
Dimensionality reduction
• Dimensionality reduction is the process of transforming data from a
high-dimensional space into a low-dimensional space such that
the new data retains meaningful properties of the original data.
• High-dimensional data in machine learning leads to:
– High computational demands
– Low generalization performance
– Poor error estimates
• Some techniques:
– Principal component analysis (PCA)
– Linear discriminant analysis (LDA)
– Deep Learning: Autoencoders
Principal component analysis (PCA)
• PCA is a statistical technique used to reduce the number of dimensions
(variables/features) of data while retaining as much as possible of the
information contained in the original data.
• PCA is categorized as unsupervised learning
• PCA works by transforming the original variables into new variables,
called principal components
• Principal components:
– Uncorrelated variables
– Ordered such that the first few principal components retain most of the
variation in the original variables
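The two properties listed above can be checked numerically. The following is a minimal sketch, assuming randomly generated correlated data (the data and variable names are illustrative and not from the slides): after projecting onto the eigenvectors of the covariance matrix, the new variables have a diagonal covariance matrix, and their variances are sorted from largest to smallest.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated data: 500 samples, 3 features mixed by a random matrix.
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))

Xc = X - X.mean(axis=0)                    # center each feature
cov = np.cov(Xc, rowvar=False)             # 3x3 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]          # reorder so the largest variance comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs                           # data expressed as principal components
cov_Z = np.cov(Z, rowvar=False)            # covariance of the new variables

print(np.round(cov_Z, 6))                  # off-diagonal entries are ~0 (uncorrelated)
print(np.round(np.diag(cov_Z), 6))         # diagonal entries decrease (ordered by variance)
```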
Principal component analysis (PCA)
[Figure: principal component axes PC1 and PC2]
Principal component analysis (PCA)
• Transformation from 2D to 1D:
– Green: without PCA
– Blue: with PCA
• Transformation without PCA
leaves the new data points close to
each other.
• Transformation with PCA
increases the distance between the
data points.
PC
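The effect described above can be illustrated with numbers. This is a sketch under two assumptions: "without PCA" is taken to mean keeping one original axis and dropping the other, and the ten points are the ones used in the covariance example of this deck. The spread (variance) of the 1D data is larger along the first principal component than along either original axis.

```python
import numpy as np

# Ten 2D points, taken from the covariance example in this deck.
X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)
Xc = X - X.mean(axis=0)

# "Without PCA" (assumed meaning): keep one original axis and drop the other.
var_x = Xc[:, 0].var(ddof=1)
var_y = Xc[:, 1].var(ddof=1)

# "With PCA": project onto the direction of largest variance.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, -1]                 # eigh sorts ascending, so the last column is PC1
var_pc1 = (Xc @ pc1).var(ddof=1)

print(f"variance along x:   {var_x:.3f}")
print(f"variance along y:   {var_y:.3f}")
print(f"variance along PC1: {var_pc1:.3f}")
```

For these points the variances come out at roughly 7.8 (x), 10.8 (y) and 15.8 (PC1), so the projected points are most spread out along PC1.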
Reduce data from 2D to 1D
Andrew
Reduce data from 3D to 2D
PCA Algorithm
Covariance
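The formula on this slide did not survive extraction. As a reference, a standard definition of the sample covariance of two variables x and y, and of the resulting 2×2 covariance matrix, is sketched below. The symbols and the n − 1 denominator are assumptions; the original slide may have used different notation.

```latex
\operatorname{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
C = \begin{pmatrix}
\operatorname{cov}(x, x) & \operatorname{cov}(x, y) \\
\operatorname{cov}(y, x) & \operatorname{cov}(y, y)
\end{pmatrix}
```

Here cov(x, x) is the variance of x, and C is symmetric because cov(x, y) = cov(y, x).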
Covariance: example
No.   x    y
1     4    3
2     1    9
3     4    7
4     8    2
5     9    3
6     7   -2
7     5    4
8     3    4
9     3    2
10    9   -1
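The covariance of the two columns in this table can be computed directly. The sketch below assumes the columns are named x and y and uses the sample (n − 1) denominator, which is also the default of `np.cov`.

```python
import numpy as np

# The ten value pairs from the example table (column names x and y are assumed).
x = np.array([4, 1, 4, 8, 9, 7, 5, 3, 3, 9], dtype=float)
y = np.array([3, 9, 7, 2, 3, -2, 4, 4, 2, -1], dtype=float)

n = len(x)
dx, dy = x - x.mean(), y - y.mean()      # deviations from the means (5.3 and 3.1)

# Sample covariance with the n - 1 denominator (an assumption, see the lead-in).
cov_xy = (dx * dy).sum() / (n - 1)

# np.cov treats each row as one variable and also divides by n - 1 by default.
cov_matrix = np.cov(np.stack([x, y]))

print(round(cov_xy, 4))
print(np.round(cov_matrix, 4))
```

The negative covariance (about −6.37) reflects that y tends to decrease as x increases.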
Eigenvalue and eigenvector
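The definition on this slide was lost in extraction. A standard statement, given here as a reference with assumed notation (A is a square matrix, v a non-zero vector, λ a scalar), is:

```latex
A\mathbf{v} = \lambda\mathbf{v}, \qquad \mathbf{v} \neq \mathbf{0}
```

The eigenvalues λ are the solutions of the characteristic equation

```latex
\det(A - \lambda I) = 0
```

and each eigenvector v is then obtained by solving (A − λI)v = 0 for the corresponding λ.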
Eigenvalue and eigenvector: example
Eigenvalue and eigenvector of
covariance matrix
Eigenvalue and eigenvector of
covariance matrix: example
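The worked numbers for this example were lost, so the following sketch recomputes them for the ten points of the covariance example. It assumes the sample (n − 1) covariance and uses `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix.

```python
import numpy as np

# Ten 2D points from the covariance example.
X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)

cov = np.cov(X, rowvar=False)              # 2x2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues, eigenvectors as columns

# Sort so that the first column is the direction of largest variance (PC1).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(np.round(eigvals, 4))                # variances along PC1 and PC2
print(np.round(eigvecs, 4))                # column i is the unit eigenvector of eigvals[i]
```

The eigenvalues come out at roughly 15.82 and 2.74. The sign of each eigenvector is arbitrary, so a library may return either direction.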
Transform to PC coordinate system
Transform to PC coordinate system:
example
No.   x    y    PC1
1     4    3    -0.72945009
2     1    9    -7.29463721
3     4    7    -3.86347132
4     8    2     2.53959559
5     9    3     2.37747538
6     7   -2     5.05223172
7     5    4    -0.8915703
8     3    4    -2.13434049
9     3    2    -0.56732988
10    9   -1     5.5114966
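The last column of this table can be reproduced by centering the data and projecting it onto the first eigenvector of the covariance matrix. This is a sketch, assuming that is how the slide values were obtained; the sign of the eigenvector is fixed by hand so that the signs match the table.

```python
import numpy as np

# Ten 2D points from the example table.
X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)
Xc = X - X.mean(axis=0)                        # subtract the column means

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, np.argmax(eigvals)]           # eigenvector with the largest eigenvalue

# An eigenvector's sign is arbitrary; choose the one with a positive first
# component, which gives the same signs as the table.
if pc1[0] < 0:
    pc1 = -pc1

scores = Xc @ pc1                              # 1D coordinates along PC1
print(np.round(scores, 4))
```

The result should agree with the table to about three or four decimal places; a small residual difference would be consistent with rounded intermediate values on the slides.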
Choosing the number of PCs
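The criterion shown on this slide was not recoverable. One common approach, sketched here as an assumption rather than as the slide's method, is to keep the smallest number of components whose cumulative share of the total variance reaches a chosen threshold. The 80% threshold below is a hypothetical value.

```python
import numpy as np

# Ten 2D points from the covariance example.
X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)

# Eigenvalues of the covariance matrix, largest first.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

ratio = eigvals / eigvals.sum()        # share of total variance per component
cumulative = np.cumsum(ratio)          # running total of those shares

threshold = 0.80                       # hypothetical target, chosen for illustration
k = int(np.searchsorted(cumulative, threshold) + 1)

print(np.round(ratio, 4))
print(f"components needed for {threshold:.0%} of the variance: {k}")
```

For these points the first component alone carries about 85% of the variance, so one component already meets the 80% threshold.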
Scree plot
Find the "elbow" of the graph, the point where
the eigenvalues seem to level off.
Components to the left of this
point should be retained as
significant.
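To make the idea of an elbow concrete, the sketch below prints a text version of a scree plot. The data is hypothetical: six features generated from two underlying factors plus small noise, so the eigenvalues are expected to drop sharply after the second component.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical data: 6 observed features driven by 2 latent factors plus small noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

# Eigenvalues of the covariance matrix, largest first.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

# Text scree plot: one bar per component, scaled to the largest eigenvalue.
for i, val in enumerate(eigvals, start=1):
    bar = "#" * int(round(40 * val / eigvals[0]))
    print(f"PC{i}: {val:8.4f} {bar}")
```

In the printed bars, the components after the second have almost no length, which is where the curve levels off.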
PCA in Python with numpy
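The code on these slides was not preserved. The following is a minimal sketch of the steps covered in this deck (centering, covariance matrix, eigendecomposition, projection) using only numpy. The function name `pca_numpy` and its return values are illustrative choices, not the original code.

```python
import numpy as np

def pca_numpy(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)                          # 1. center the data
    cov = np.cov(Xc, rowvar=False)                   # 2. covariance matrix (n - 1 denominator)
    eigvals, eigvecs = np.linalg.eigh(cov)           # 3. eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1][:n_components] # 4. keep the largest ones
    components = eigvecs[:, order]
    return Xc @ components, components, eigvals[order]   # 5. project

# Ten 2D points from the covariance example, reduced from 2D to 1D.
X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)
scores, components, variances = pca_numpy(X, n_components=1)

print(np.round(scores.ravel(), 4))   # 1D coordinates (overall sign may be flipped)
print(np.round(variances, 4))        # variance captured by the kept component
```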
PCA in Python with
sklearn.decomposition.PCA
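The code on these slides was not preserved either. As a sketch of typical usage, the example below fits `sklearn.decomposition.PCA` on the ten points from the covariance example and prints the fitted attributes. The choice of `n_components=2` is only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Ten 2D points from the covariance example.
X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)

pca = PCA(n_components=2)      # keep both components to inspect their variances
Z = pca.fit_transform(X)       # PCA centers the data internally before projecting

print(np.round(pca.components_, 4))                 # one principal axis per row
print(np.round(pca.explained_variance_, 4))         # eigenvalues of the covariance matrix
print(np.round(pca.explained_variance_ratio_, 4))   # share of total variance per component
```

Plotting `explained_variance_` against the component index gives the scree plot described earlier.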
Assignment
• Download the dataset here:
https://drive.google.com/drive/folders/1fXfv0VECkys55fnlqxPEuiL3C-3KyheV?usp=sharing
• This is an MNIST digit dataset that contains images of handwritten digits
(ranging from 0 to 4). The distribution of digit labels:
– digits 0-3: 100 images for each digit
– digit 4: 200 images
• Code on the next slide is provided to read the dataset; its final
output is a matrix “original_data” (one row per image read,
600 images in total, and one column per image feature, i.e. one per
pixel: 784 pixels = 28 pixels × 28 pixels).
Assignment
• Perform PCA to reduce the dimensionality of the dataset from 784 dimensions to
whatever number of dimensions gives the best result. Save the result to the
matrix “reduced_data”.
• Choose the classification algorithm that you think will give
the best result in predicting the digit label.
• Perform classification on both “original_data” and “reduced_data”
using that same classification algorithm, and compare the
results.
Assignment
• You may apply any data pre-processing techniques to the
dataset before training the model so that the best model is
obtained.
• Before feeding the data to the classifier, split the dataset into training and testing
data. Use StratifiedShuffleSplit from scikit-learn with n_splits=1 and
a 70%:30% training:testing ratio.
• Evaluate the model using accuracy and the weighted F1 score.
• State your conclusion.
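The splitting and evaluation steps above can be sketched as follows. Everything dataset-specific here is an assumption: scikit-learn's bundled 8×8 digits (restricted to labels 0 to 4) stands in for the assignment dataset, k-nearest neighbours is only a placeholder classifier, and 10 components is an arbitrary illustrative value. In this sketch PCA is fitted on the training split only, which is one possible design choice.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): scikit-learn's 8x8 digits, restricted to labels 0-4.
digits = load_digits()
mask = digits.target <= 4
original_data, labels = digits.data[mask], digits.target[mask]

# One stratified 70%:30% split, as required by the assignment.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(original_data, labels))
X_train, X_test = original_data[train_idx], original_data[test_idx]
y_train, y_test = labels[train_idx], labels[test_idx]

def evaluate(train_features, test_features):
    """Train the placeholder classifier and return (accuracy, weighted F1)."""
    clf = KNeighborsClassifier(n_neighbors=3)
    clf.fit(train_features, y_train)
    pred = clf.predict(test_features)
    return accuracy_score(y_test, pred), f1_score(y_test, pred, average="weighted")

# Fit PCA on the training split only, then apply it to both splits.
pca = PCA(n_components=10).fit(X_train)    # 10 components: arbitrary illustration
results = {
    "original_data": evaluate(X_train, X_test),
    "reduced_data": evaluate(pca.transform(X_train), pca.transform(X_test)),
}
for name, (acc, f1) in results.items():
    print(f"{name}: accuracy={acc:.3f}, weighted F1={f1:.3f}")
```

Using the same split indices and the same classifier for both feature sets keeps the comparison between the original and reduced data fair.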
