Week 12 Dimensionality Reduction Part 1
1. Informatics Engineering Study Program
Faculty of Engineering – Universitas Surabaya
Dimensionality Reduction:
Principal Component Analysis
Week 12
1604C055 - Machine Learning
2. Dimensionality reduction
• Dimensionality reduction is the process of transforming data from a
high-dimensional space into a low-dimensional space such that the new
data still retains meaningful properties of the original data.
• High-dimensional data in machine learning leads to:
– High computational demands
– Low generalization performance
– Poor error estimates
• Some techniques:
– Principal component analysis (PCA)
– Linear discriminant analysis (LDA)
– Deep Learning: Autoencoders
3. Principal component analysis (PCA)
• PCA is a statistical technique used to reduce the dimensionality of
data/variables/features without losing the intrinsic information
contained in the original data.
• PCA is categorized as unsupervised learning.
• PCA works by transforming the original variables into new variables,
called principal components.
• Principal components:
– Uncorrelated variables
– Ordered such that the first few principal components retain the most
variation in the original variables
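These two properties can be checked directly on a small synthetic dataset; the data below is made up purely for illustration, and scikit-learn's PCA is one common implementation:

```python
# Sketch: verifying the two properties of principal components
# on synthetic correlated 2-D data (assumed, for illustration only)
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 samples of correlated 2-D data
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.5], [0.5, 1.0]])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)

# Property 1: principal components are uncorrelated
# (off-diagonal of their covariance matrix is ~0)
cov = np.cov(Z, rowvar=False)
print(abs(cov[0, 1]) < 1e-8)  # True

# Property 2: components are ordered by retained variance
print(pca.explained_variance_[0] >= pca.explained_variance_[1])  # True
```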
5. Principal component analysis (PCA)
• Transformation from 2D to 1D:
– Green: without PCA
– Blue: with PCA
• Transformation without PCA maps the new data points close to
each other.
• Transformation with PCA keeps the data points farther apart,
preserving more of the original variation.
24. Scree plot
Find the "elbow" of the graph: the point where the eigenvalues seem
to level off. Components to the left of this point should be retained
as significant.
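As a sketch of how the scree-plot values are obtained (the 10-D data below is synthetic, chosen so that the variance is concentrated in the first three directions):

```python
# Sketch: computing scree-plot values (PCA eigenvalues) on synthetic data
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# 10-D data whose spread is concentrated in the first 3 directions
X = rng.normal(size=(300, 10)) * np.array(
    [5, 4, 3, 0.5, 0.5, 0.4, 0.3, 0.3, 0.2, 0.1]
)

pca = PCA().fit(X)
eigenvalues = pca.explained_variance_

# A scree plot is eigenvalue vs. component index; the curve levels off
# after the informative components (the "elbow").
for i, ev in enumerate(eigenvalues, start=1):
    print(f"PC{i}: {ev:.2f}")

# To draw the actual plot:
# import matplotlib.pyplot as plt
# plt.plot(range(1, 11), eigenvalues, marker="o")
# plt.xlabel("Component"); plt.ylabel("Eigenvalue"); plt.show()
```

Here the elbow falls after the third component, so PC1–PC3 would be retained.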
32. Assignment
• Download dataset here:
https://drive.google.com/drive/folders/1fXfv0VECkys55fnlqxPEuiL3C
-3KyheV?usp=sharing
• This is a digit MNIST dataset containing images of handwritten digits
(ranging from 0 to 4). The distribution of digit labels:
– digits 0–3: 100 images each
– digit 4: 200 images
• Code on the next slide is provided to read the dataset; its final
output is a matrix “original_data” (rows correspond to the images
being read, 600 in total, and columns to the image features, i.e.
the image pixels: 784 pixels = 28 pixels × 28 pixels).
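For orientation only, a minimal sketch of the expected matrix layout; the pixels here are random stand-ins, since the actual loading code is the one provided on the next slide:

```python
# Sketch of the expected "original_data" shape only: 600 images of
# 28x28 pixels flattened to 784-element rows (random stand-in pixels)
import numpy as np

n_images, height, width = 600, 28, 28
rng = np.random.default_rng(2)
images = rng.integers(0, 256, size=(n_images, height, width))

# Flatten each 28x28 image into one row of 784 features
original_data = images.reshape(n_images, height * width)
print(original_data.shape)  # (600, 784)
```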
33.
34. Assignment
• Perform PCA to reduce the dimensionality of the dataset from 784
dimensions to whatever number of dimensions gives the optimal result.
Save the result to a matrix “reduced_data”.
• Choose the classification algorithm that you think will give the best
result in predicting the digit label.
• Perform classification on both “original_data” and “reduced_data”
using the same classification algorithm chosen before, and compare
the results.
35. Assignment
• You may apply any data pre-processing techniques to the dataset
before training, so that the best model is obtained.
• Before feeding the data to the classifier, split the dataset into
training and testing sets. Use StratifiedShuffleSplit from
scikit-learn with n_splits=1 and a 70%:30% training:testing ratio.
• Evaluate the model using accuracy and F1 Score (weighted).
• State your conclusion.
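A sketch of the required workflow, using scikit-learn's built-in digits dataset as a stand-in for the shared MNIST subset; SVC is used here as just one possible classifier choice, not a prescribed one:

```python
# Sketch: stratified split, PCA, and comparison of original vs. reduced
# data with the same classifier (stand-in dataset; SVC is an assumption)
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 64-D stand-in for the 784-D data

# 70%:30% stratified split with a single split, as required
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(sss.split(X, y))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Fit PCA on the training data only, then transform both sets
pca = PCA(n_components=20).fit(X_train)

for name, (Xtr, Xte) in {
    "original": (X_train, X_test),
    "reduced": (pca.transform(X_train), pca.transform(X_test)),
}.items():
    clf = SVC().fit(Xtr, y_train)
    y_pred = clf.predict(Xte)
    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average="weighted")
    print(f"{name}: accuracy={acc:.3f}, weighted F1={f1:.3f}")
```

The same structure applies to the assignment data: swap in `original_data` and its labels, and tune `n_components` (e.g. via the scree plot) instead of the fixed 20 used here.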