PCA is an unsupervised learning technique that reduces the dimensionality of large data sets by transforming the data into a new set of variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. PCA is commonly used for dimensionality reduction, data compression, and visualization. This document discusses PCA algorithms and applications of PCA in domains such as face recognition, image compression, and noise filtering.
Neural Networks: Principal Component Analysis (PCA)
1. CHAPTER 8
UNSUPERVISED LEARNING:
PRINCIPAL-COMPONENTS ANALYSIS (PCA)
CSC445: Neural Networks
Prof. Dr. Mostafa Gadal-Haqq M. Mostafa
Computer Science Department
Faculty of Computer & Information Sciences
AIN SHAMS UNIVERSITY
Credits: Some slides are taken from presentations on PCA by:
1. Barnabás Póczos, University of Alberta
2. Jieping Ye, http://www.public.asu.edu/~jye02
2. Outline
Introduction
Tasks of Unsupervised Learning
What is Data Reduction?
Why Do We Need to Reduce Data Dimensionality?
Clustering and Data Reduction
The PCA Computation
Computer Experiment
3. Unsupervised Learning
In unsupervised learning, the requirement is to discover significant patterns, or features, of the input data through the use of unlabeled examples. That is, the network operates according to the rule: "Learn from examples without a teacher."
4. What is feature reduction?
Feature reduction refers to the mapping of the original high-dimensional data onto a lower-dimensional space.
The criterion for feature reduction differs with the problem setting:
Unsupervised setting: minimize the information loss.
Supervised setting: maximize the class discrimination.
Given a set of data points of $p$ variables $\{x_1, x_2, \ldots, x_n\}$, compute the linear transformation (projection)
$$G \in \mathbb{R}^{p \times d}: \; x \in \mathbb{R}^p \mapsto y = G^T x \in \mathbb{R}^d \qquad (d < p).$$
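As a quick illustration of this mapping, here is a minimal numpy sketch; the projection matrix G below is just an arbitrary matrix with orthonormal columns, not yet the variance-maximizing choice that PCA makes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 100))       # p = 4 variables, n = 100 points (one per column)

# Any p x d matrix G with orthonormal columns defines a valid projection;
# PCA (later slides) picks the G that minimizes information loss.
G, _ = np.linalg.qr(rng.normal(size=(4, 2)))    # p x d = 4 x 2

Y = G.T @ X                         # y = G^T x for every point: d x n projected data
print(Y.shape)                      # (2, 100)
```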
6. Why feature reduction?
Most machine learning and data mining techniques may not be effective for high-dimensional data.
Curse of dimensionality: query accuracy and efficiency degrade rapidly as the dimension increases.
The intrinsic dimension may be small. For example, the number of genes responsible for a certain type of disease may be small.
7. Why feature reduction?
Visualization: projection of high-dimensional data onto 2D or 3D.
Data compression: efficient storage and retrieval.
Noise removal: positive effect on query accuracy.
8. What is Principal Component Analysis?
Principal component analysis (PCA):
reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set, that retains most of the sample's information;
is useful for the compression and classification of data.
By information we mean the variation present in the sample, as given by the correlations between the original variables.
The new variables, called principal components (PCs), are uncorrelated and are ordered by the fraction of the total information each retains.
9. Principal components (PCs)
[Figure: 2D data in X space with the first two PCs drawn as axes z1 and z2.]
The 1st PC is a minimum-distance fit to a line in X space.
The 2nd PC is a minimum-distance fit to a line in the plane perpendicular to the 1st PC.
PCs are a series of linear least-squares fits to a sample, each orthogonal to all the previous ones.
10. Algebraic definition of PCs
Given a sample of n observations on a vector of p variables $x_1, x_2, \ldots, x_n \in \mathbb{R}^p$, define the first principal component of the sample by the linear transformation
$$z_1 = a_1^T x_j = \sum_{i=1}^{p} a_{i1} x_{ij}, \qquad j = 1, 2, \ldots, n,$$
where $x_j = (x_{1j}, x_{2j}, \ldots, x_{pj})$ and the vector $a_1 = (a_{11}, a_{21}, \ldots, a_{p1})$ is chosen such that $\mathrm{var}[z_1]$ is maximum.
11. Algebraic derivation of the PCA
To find $a_1$, first note that
$$\mathrm{var}[z_1] = E[z_1^2] - E[z_1]^2 = \frac{1}{n} \sum_{i=1}^{n} \left( a_1^T x_i - a_1^T \bar{x} \right)^2 = \frac{1}{n} \sum_{i=1}^{n} a_1^T (x_i - \bar{x})(x_i - \bar{x})^T a_1 = a_1^T S a_1,$$
where
$$S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$$
is the covariance matrix and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the mean.
In the following, we assume the data is centered: $\bar{x} = 0$.
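As a quick numerical sanity check of the identity $\mathrm{var}[z_1] = a_1^T S a_1$, here is a minimal numpy sketch on random data (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 500))            # p = 3 variables, n = 500 observations
X = X - X.mean(axis=1, keepdims=True)    # center the data so x_bar = 0

S = (X @ X.T) / X.shape[1]               # covariance matrix S = (1/n) sum x_i x_i^T
a = rng.normal(size=3)
a /= np.linalg.norm(a)                   # a unit-length projection direction

z = a @ X                                # projections z_i = a^T x_i
print(np.var(z), a @ S @ a)              # the two quantities agree
```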
12. Algebraic derivation of PCs
Assume the centered data, $\bar{x} = 0$, arranged in the matrix
$$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{p \times n}.$$
Form the matrix
$$S = \frac{1}{n} X X^T,$$
then obtain the eigenvectors of S by computing the SVD of X:
$$X = U \Sigma V^T.$$
13. Principal Component Analysis
PCA: the orthogonal projection of the data onto a lower-dimensional linear space that
maximizes the variance of the projected data (purple line), and
minimizes the mean squared distance between each data point and its projection (sum of blue lines).
14. Principal Components Analysis
Idea: given data points in a d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible.
E.g., find the best planar approximation to 3D data.
E.g., find the best 12-D approximation to $10^4$-D data.
In particular, choose the projection that minimizes the squared error in reconstructing the original data.
15. The Principal Components
Vectors originating from the center of mass:
Principal component #1 points in the direction of the largest variance.
Each subsequent principal component is orthogonal to the previous ones, and points in the direction of the largest variance of the residual subspace.
19. PCA algorithm I (sequential)
Given the centered data {x1, …, xm}, compute the principal vectors:
1st PCA vector:
$$w_1 = \arg\max_{\|w\|=1} \frac{1}{m} \sum_{i=1}^{m} (w^T x_i)^2$$
We maximize the variance of the projection of x.
kth PCA vector:
$$w_k = \arg\max_{\|w\|=1} \frac{1}{m} \sum_{i=1}^{m} \left[ w^T \left( x_i - \sum_{j=1}^{k-1} w_j w_j^T x_i \right) \right]^2$$
We maximize the variance of the projection in the residual subspace, i.e., the part of x not already captured by w1, …, wk−1.
[Figure: x decomposed into its components w1(w1ᵀx) and w2(w2ᵀx) along w1 and w2.]
PCA reconstruction: $x' = w_1(w_1^T x) + w_2(w_2^T x)$.
20. PCA algorithm II (sample covariance matrix)
Given data {x1, …, xm}, compute the covariance matrix
$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^T, \qquad \text{where } \bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i.$$
PCA basis vectors = the eigenvectors of $\Sigma$.
Larger eigenvalue ⇒ more important eigenvector.
21. PCA algorithm II
PCA algorithm(X, k): top k eigenvalues/eigenvectors
% X = N × m data matrix,
% … each data point xi = column vector, i = 1..m
• $\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i$
• X ← subtract mean $\bar{x}$ from each column vector xi in X
• $\Sigma = X X^T$ … covariance matrix of X
• { λi, ui }i=1..N = eigenvectors/eigenvalues of $\Sigma$, with λ1 ≥ λ2 ≥ … ≥ λN
• Return { λi, ui }i=1..k % top k principal components
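A minimal numpy sketch of this procedure (function and variable names are my own; `numpy.linalg.eigh` is used since Σ is symmetric):

```python
import numpy as np

def pca(X, k):
    """Top-k PCA of an N x m data matrix X (one data point per column).
    Returns the k largest eigenvalues of the covariance matrix and the
    corresponding eigenvectors (as columns)."""
    x_bar = X.mean(axis=1, keepdims=True)      # mean data point
    Xc = X - x_bar                             # subtract mean from each column
    Sigma = (Xc @ Xc.T) / X.shape[1]           # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # re-sort descending, keep top k
    return eigvals[order], eigvecs[:, order]

# Example: 200 five-dimensional points, keep the top 2 components.
rng = np.random.default_rng(2)
X = rng.normal(size=(5, 200))
vals, vecs = pca(X, 2)
print(vals.shape, vecs.shape)                  # (2,), (5, 2)
```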
22. PCA algorithm III (SVD of the data matrix)
Singular value decomposition of the centered data matrix X:
$$X_{\text{features} \times \text{samples}} = U S V^T$$
[Figure: X factored into U, S, and VT, with the leading singular values marked "significant" and the trailing ones marked "noise".]
23. PCA algorithm III
Columns of U:
the principal vectors, { u(1), …, u(k) };
orthogonal and of unit norm, so UTU = I;
the data can be reconstructed using linear combinations of { u(1), …, u(k) }.
Matrix S:
diagonal;
shows the importance of each eigenvector.
Columns of VT:
the coefficients for reconstructing the samples.
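A minimal numpy sketch of this SVD route, under the same conventions (one sample per column; names are illustrative). The eigenvalues of the covariance matrix relate to the singular values by $\lambda_i = s_i^2 / m$:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 200))                  # features x samples
Xc = X - X.mean(axis=1, keepdims=True)         # center the data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
U_k = U[:, :k]                                 # principal vectors u(1), ..., u(k)
eigvals = s[:k] ** 2 / X.shape[1]              # eigenvalues of the covariance matrix

# Reconstruct the samples from k components (coefficients come from S and VT).
X_hat = U_k @ (U_k.T @ Xc)
print(np.linalg.norm(Xc - X_hat) / np.linalg.norm(Xc))   # relative error
```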
25. Challenge: Facial Recognition
We want to identify a specific person based on a facial image,
robust to glasses, lighting, …
We can't just use the given 256 × 256 pixels directly.
26. Applying PCA: Eigenfaces
Example data set: images of faces.
The famous Eigenface approach [Turk & Pentland], [Sirovich & Kirby].
Each face x is 256 × 256 luminance values, i.e., $x \in \mathbb{R}^{256 \times 256}$, viewed as a 64K-dimensional vector.
Form the centered data matrix X = [ x1, …, xm ], with one 64K-dimensional column of real values per face.
Compute $\Sigma = X X^T$.
Problem: $\Sigma$ is 64K × 64K … HUGE!!!
Method A: build a PCA subspace for each person and check which subspace reconstructs the test image best.
Method B: build one PCA basis for the whole data set and then classify based on the weights (see the sketch below).
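A minimal sketch of Method B (a hypothetical helper, assuming numpy; for the real 64K-dimensional case one would use the workaround on slide 28):

```python
import numpy as np

def eigenface_classify(X_train, labels, x_test, k=20):
    """Method B sketch: one PCA basis for the whole training set, then
    nearest-neighbor classification in the k-dimensional weight space.
    X_train: D x m matrix of vectorized faces (one column per face)."""
    mean_face = X_train.mean(axis=1, keepdims=True)
    Xc = X_train - mean_face
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    U_k = U[:, :k]                                  # the k leading eigenfaces

    W_train = U_k.T @ Xc                            # k x m training weights
    w_test = U_k.T @ (x_test.reshape(-1, 1) - mean_face)

    dists = np.linalg.norm(W_train - w_test, axis=0)
    return labels[np.argmin(dists)]                 # label of the nearest face
```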
27. Computational Complexity
Suppose m instances, each of size N.
Eigenfaces: m = 500 faces, each of size N = 64K.
Given the N × N covariance matrix $\Sigma$, we can compute:
all N eigenvectors/eigenvalues in $O(N^3)$;
the first k eigenvectors/eigenvalues in $O(kN^2)$.
But if N = 64K, this is EXPENSIVE!
28. A Clever Workaround
Note that m ≪ 64K.
Use the m × m matrix L = XᵀX instead of Σ = XXᵀ.
If v is an eigenvector of L, then Xv is an eigenvector of Σ.
Proof:
$$L v = \lambda v \;\Rightarrow\; X^T X v = \lambda v \;\Rightarrow\; X (X^T X v) = X (\lambda v) = \lambda X v \;\Rightarrow\; (X X^T)(X v) = \lambda (X v) \;\Rightarrow\; \Sigma (X v) = \lambda (X v).$$
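A minimal numpy sketch of this trick (the shapes are illustrative stand-ins for 64K pixels and 500 faces):

```python
import numpy as np

rng = np.random.default_rng(4)
D, m = 4096, 50                       # stand-ins for N = 64K pixels and m = 500 faces
X = rng.normal(size=(D, m))
X = X - X.mean(axis=1, keepdims=True)

L = X.T @ X                           # small m x m matrix instead of D x D
eigvals, V = np.linalg.eigh(L)        # eigenvectors v of L (ascending eigenvalues)

U = X @ V                             # columns Xv are eigenvectors of Sigma = XX^T
U /= np.linalg.norm(U, axis=0)        # rescale each Xv to unit length

# Check the claim for the top eigenvector: Sigma (Xv) = lambda (Xv).
u = U[:, -1]
print(np.allclose((X @ X.T) @ u, eigvals[-1] * u))   # True
```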
31. Shortcomings
Requires carefully controlled data:
all faces centered in the frame;
same size;
some sensitivity to angle.
Alternative: "learn" one set of PCA vectors for each angle, and use the one with the lowest error.
The method is completely knowledge-free (sometimes this is good!):
it doesn't know that faces are wrapped around 3D objects (heads);
it makes no effort to preserve class distinctions.
39. Original Image
Divide the original 372 × 492 image into patches:
each patch is an instance that contains 12 × 12 pixels on a grid.
View each patch as a 144-D vector.
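A minimal sketch of this patch pipeline (assuming numpy; a random array stands in for the actual image):

```python
import numpy as np

rng = np.random.default_rng(5)
image = rng.random((372, 492))                  # stand-in for the original image

# Cut the image into non-overlapping 12 x 12 patches; each patch is one instance.
P = 12
patches = [image[r:r + P, c:c + P].reshape(-1)  # each patch becomes a 144-D vector
           for r in range(0, image.shape[0], P)
           for c in range(0, image.shape[1], P)]
X = np.stack(patches, axis=1)                   # 144 x 1271 data matrix (31 * 41 patches)

# Compress by keeping d principal components of the patch population.
mean = X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
d = 8
X_hat = U[:, :d] @ (U[:, :d].T @ (X - mean)) + mean   # reconstructed patches
print(X.shape, X_hat.shape)                     # (144, 1271) twice
```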
60. PCA Conclusions
PCA:
finds an orthonormal basis for the data;
sorts dimensions in order of "importance";
lets you discard the low-significance dimensions.
Uses:
get a compact description;
ignore noise;
improve classification (hopefully).
Not magic:
doesn't know class labels;
can only capture linear variations;
one of many tricks to reduce dimensionality!
61. Applications of PCA
Eigenfaces for recognition. Turk and Pentland, 1991.
Principal Component Analysis for clustering gene expression data. Yeung and Ruzzo, 2001.
Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum. Lilien, 2003.
62. PCA for image compression
[Figure: the original image alongside its reconstructions using d = 1, 2, 4, 8, 16, 32, 64, and 100 principal components.]