2. Prcomp
Let's start with prcomp(), which is the simpler of the two functions to use.
#for this example we will use the built-in R dataset USArrests
>require(graphics)
>prc<-prcomp(USArrests, scale. = TRUE)
#scale. = TRUE standardizes every variable to mean 0 and standard
#deviation 1, so that variables measured on larger scales (such as
#Assault) do not dominate the principal components
>USArrests
>head(scale(USArrests))
#We can observe that scale() has brought all the variables onto a common
#scale, which makes them much easier to compare
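As a quick check of what standardization does, the sketch below (plain base R; the names `z`, `col_means`, `col_sds` and `manual` are just illustrative) verifies that every column of `scale(USArrests)` has mean 0 and standard deviation 1, and reproduces one column by hand.

```r
# Standardize each column: subtract its mean, divide by its standard deviation
z <- scale(USArrests)

# Column means are zero up to floating-point error
col_means <- colMeans(z)
print(round(col_means, 10))

# Column standard deviations are all 1
col_sds <- apply(z, 2, sd)
print(col_sds)

# The same standardization computed by hand for one column (Assault)
manual <- (USArrests$Assault - mean(USArrests$Assault)) / sd(USArrests$Assault)
print(all.equal(manual, unname(z[, "Assault"])))
```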
Rupak Roy
3. Principal Component Analysis
>summary(prc)
#We will observe that it reports the "Importance of components", where
#the 4 variables of our dataset give 4 principal components: PC1, PC2, PC3 and PC4
The Proportion of Variance row tells us the share of the total variance that each
principal component explains. PC1 explains the most variance, followed by PC2,
while PC3 and PC4 explain far less than PC1 and PC2.
The Cumulative Proportion row shows how much of the variance in the dataset the
components explain together: PC1 explains about 62% of all the variance and PC2
about 25%, so together they explain about 87%, well over three quarters of the
variance in the entire dataset.
So we can work with just the first 2 components, PC1 and PC2, and ignore the
others, which carry very little information about the dataset.
To see this visually, we can plot the principal components with plot(), which
for a prcomp object draws a bar chart of the variance of each component
>plot(prc)
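The Proportion of Variance and Cumulative Proportion rows can be reproduced directly from `prc$sdev`, which makes clear where they come from. The sketch below also keeps the scores on the first two components as a reduced dataset; the names `component_var`, `pve`, `cum_pve` and `reduced` are illustrative choices, not part of R.

```r
prc <- prcomp(USArrests, scale. = TRUE)

# Variance of each component = squared standard deviation
component_var <- prc$sdev^2

# Proportion of Variance: each component's share of the total variance
pve <- component_var / sum(component_var)

# Cumulative Proportion: running total of those shares
cum_pve <- cumsum(pve)

print(round(pve, 4))      # compare with "Proportion of Variance" in summary(prc)
print(round(cum_pve, 4))  # compare with "Cumulative Proportion" in summary(prc)

# Keep only the scores on PC1 and PC2: a 50 x 2 reduced dataset
reduced <- prc$x[, 1:2]
print(dim(reduced))
print(head(reduced))
```

Working with `reduced` instead of the original four columns is what "keeping PC1 and PC2" means in practice.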
4. Principal Component Analysis
Another way to decide how many components to keep is to look at the eigenvalue of each component
>names(prc)
[1] "sdev"     "rotation" "center"   "scale"    "x"
#The eigenvalue of each component is its variance, so we get it by squaring
#the standard deviation 'sdev'
>prc$sdev^2
[1] 2.4802416 0.9897652 0.3565632 0.1734301
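Components with an eigenvalue of 1 or above are considered important (a common
rule of thumb known as the Kaiser criterion), and again the first 2 components
turn out to be the important ones.
Strictly, PC2's eigenvalue of about 0.99 falls just short of 1, but it is close
enough to 1 that we keep it as well.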
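Because `prcomp()` with `scale. = TRUE` works on standardized data, these eigenvalues are the eigenvalues of the correlation matrix of USArrests. A minimal base-R sketch using `cor()` and `eigen()` confirms this and applies the "eigenvalue of 1 or above" rule; the names `ev_prcomp`, `ev_cor` and `keep` are illustrative.

```r
prc <- prcomp(USArrests, scale. = TRUE)

# Eigenvalues from prcomp: squared standard deviations of the components
ev_prcomp <- prc$sdev^2

# Eigenvalues computed directly from the correlation matrix
ev_cor <- eigen(cor(USArrests), symmetric = TRUE)$values

print(all.equal(ev_prcomp, ev_cor))  # the two agree

# The eigenvalues add up to the number of variables (4),
# since each standardized variable has variance 1
print(sum(ev_prcomp))

# Rule of thumb: keep components whose eigenvalue is at least 1
keep <- which(ev_prcomp >= 1)
print(keep)

# Rounded to one decimal place, PC2's eigenvalue reads as 1.0
print(round(ev_prcomp, 1))
```

Applied strictly, the rule keeps only PC1; the rounded output shows why PC2 is treated as a borderline case worth keeping.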
5. Principal Component Analysis
>biplot(prc)
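This is an interesting plot: it shows how the features are correlated with
each other.
In the plot, the arrows for Murder, Assault and Rape lie close to each other,
as they are all crime measures. UrbanPop points in a different direction, but it
is still somewhat positively correlated with these crimes, suggesting that
Murder, Assault and Rape tend to be somewhat higher in more urbanized states.
So states with a large urban population, such as California and Colorado, tend
to have more crime than states such as West Virginia and South Dakota, which lie
away from the UrbanPop and crime arrows.
In the same way, the Rape arrow can be used to compare states such as Michigan,
Texas and Missouri with Georgia, Alaska and Maryland. Because a biplot is only a
two-dimensional approximation of the data, such readings should be confirmed
against the actual values.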
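The correlations that the biplot shows only approximately can be read off exactly from the correlation matrix, and the arrow directions come from the loadings in `prc$rotation`. A short base-R sketch; the names `r`, `crime`, `min_crime_cor` and `max_urban_cor` are illustrative.

```r
prc <- prcomp(USArrests, scale. = TRUE)

# Pairwise correlations between the four variables
r <- cor(USArrests)
print(round(r, 2))

# Loadings of each variable on PC1 and PC2 (these give the arrow directions)
print(round(prc$rotation[, 1:2], 3))

# Weakest correlation among the crime variables vs. strongest
# correlation between UrbanPop and any crime variable
crime <- c("Murder", "Assault", "Rape")
crime_block <- r[crime, crime]
min_crime_cor <- min(crime_block[upper.tri(crime_block)])
max_urban_cor <- max(r["UrbanPop", crime])
print(c(min_crime_cor = min_crime_cor, max_urban_cor = max_urban_cor))
```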
6. Principal Component Analysis
We can confirm this by sorting the dataset in descending order of UrbanPop
>USArrests[order(USArrests$UrbanPop, decreasing = TRUE),]
#here California is at the top of the urban population ranking, as seen in our plot.
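The same sorting approach can check the biplot reading for Rape, or for any other variable. The helper `top_states()` below is a hypothetical convenience function written for this example, not part of R; `rape_rank` ranks states from highest (1) to lowest Rape rate.

```r
# Hypothetical helper: the n states with the highest values of one variable
top_states <- function(data, variable, n = 5) {
  ordered <- data[order(data[[variable]], decreasing = TRUE), , drop = FALSE]
  head(ordered[, variable, drop = FALSE], n)
}

print(top_states(USArrests, "UrbanPop"))
print(top_states(USArrests, "Rape"))

# Full ranking by Rape (1 = highest), to compare specific states
rape_rank <- rank(-USArrests$Rape, ties.method = "min")
names(rape_rank) <- rownames(USArrests)
print(rape_rank[c("Michigan", "Texas", "Missouri", "Georgia", "Alaska", "Maryland")])
```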
7. Principal Component Analysis
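#We can get essentially the same results using the second function, princomp()
>prin<-princomp(USArrests, cor = TRUE)
#cor = TRUE makes princomp() work on the correlation matrix, which standardizes
#the variables just like scale. = TRUE in prcomp(); princomp() has no scale argument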
>plot(prin)
>biplot(prin)
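#again we will see the same results, although the plots may appear mirrored
#because the signs of the components can differ between the two functions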
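To check that the two functions agree, the sketch below compares their component standard deviations and loadings. The loadings are compared in absolute value because the sign of each component is arbitrary, and the two functions need not choose the same signs.

```r
prc  <- prcomp(USArrests, scale. = TRUE)
prin <- princomp(USArrests, cor = TRUE)

# Component standard deviations agree (princomp names them Comp.1, Comp.2, ...)
print(all.equal(prc$sdev, unname(prin$sdev)))

# Loadings agree up to the sign of each column
print(all.equal(abs(prc$rotation), abs(unclass(prin$loadings)),
                check.attributes = FALSE))

# summary() reports the same proportions of variance as summary(prc)
print(summary(prin))
```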