Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It's often used to make data easy to explore and visualize.
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
2. Data reduction technique developed by H. Hotelling
• Main aims
• Lower the number of dimensions
• Orthogonality of the new (transformed) dimensions (principal components)
6. [Figure: scatter plot of the original data on the original axes X1 and X2]
Shift the original axes to the center of the data (the mean)
7. [Figure: scatter plot with axes x1 and x2]
Rotate the original axes
• Rotate X1 (axis 1) by an angle such that the variability of the data along that axis is maximum
• Rotate X2 (axis 2) so that it is perpendicular to the first axis and the variability of the data along it is the second largest
11. Red dots: projections of the original data points onto the rotating line
The spread of the red dots is greatest when the rotating line aligns with the pink mark (line)
12. Projection of points onto a line: the line is chosen so that the projected points have the greatest variability.
Projection of points onto a plane: the plane is chosen so that the spread of the projected points on that plane is the greatest.
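The rotating-line idea above can be tried numerically. The following is a minimal sketch, assuming NumPy and a made-up two-variable sample (the data, the seed, and the helper name `projected_variance` are illustrative and not taken from the slides). It sweeps a line through 180 degrees, measures the spread of the projected points at each angle, and compares the best angle with the leading eigenvector of the sample covariance matrix.

```python
import numpy as np

# Hypothetical 2-D data: a correlated Gaussian sample (not from the slides).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[2.0, -1.0],
                            cov=[[3.0, 1.5], [1.5, 1.0]],
                            size=500)

# Shift the origin to the mean of the data.
Xc = X - X.mean(axis=0)

def projected_variance(Xc, theta):
    """Sample variance of the centered points projected onto the line at angle theta."""
    u = np.array([np.cos(theta), np.sin(theta)])  # unit vector along the line
    return np.var(Xc @ u, ddof=1)

# Rotate the line through 180 degrees and record the spread of the projections.
angles = np.linspace(0.0, np.pi, 1801)
spreads = np.array([projected_variance(Xc, t) for t in angles])
best_angle = angles[np.argmax(spreads)]
best_dir = np.array([np.cos(best_angle), np.sin(best_angle)])

# Compare with the leading eigenvector of the sample covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
leading = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order

print("best angle (degrees):", np.degrees(best_angle))
print("alignment |cos| with leading eigenvector:", abs(best_dir @ leading))
```

The alignment should come out very close to 1, because the projected variance along a unit vector u equals uᵀSu, which is maximized by the leading eigenvector of the covariance matrix S.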
15. Principal Components
* The first principal component is the direction of greatest variability (variance) in the data
* The second is the next orthogonal (uncorrelated) direction of greatest variability
— So first remove all the variability along the first
component, and then find the next direction of
greatest variability and so on…
16. Principal Components Analysis (PCA)
Principle
• Linear projection method to reduce the number of parameters
• Transform a set of correlated variables into a new set of uncorrelated variables
• Map the data into a space of lower dimensionality
• Form of unsupervised learning
Properties
• It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables
• The new axes are orthogonal and represent the directions of maximum variability
17. Computing the components
• First center the data points
• Project the data points (vectors) onto an axis such that the variability of the projected data points along that axis is greatest
• It turns out that the variances of x along the transformed axes are the eigenvalues of cov(x), and the directions of the new axes are the eigenvectors of cov(x)
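The three steps above can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a made-up data matrix with observations in rows; the function name `pca_eig`, the mixing matrix `A`, and the seed are hypothetical and not from the slides.

```python
import numpy as np

def pca_eig(X, n_components):
    """PCA by eigen decomposition of the sample covariance matrix.

    X has one observation per row and one variable per column.
    Returns all eigenvalues (descending), the retained directions,
    and the scores (data projected onto those directions).
    """
    Xc = X - X.mean(axis=0)                    # step 1: center the data
    S = np.cov(Xc, rowvar=False)               # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)       # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :n_components]              # new axes (principal directions)
    scores = Xc @ W                            # step 2: project onto the new axes
    return eigvals, W, scores

# Hypothetical data: 200 observations of 3 correlated variables.
rng = np.random.default_rng(1)
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.5, 0.2]])
X = rng.standard_normal((200, 3)) @ A.T

eigvals, W, scores = pca_eig(X, n_components=2)
explained = eigvals / eigvals.sum()            # share of total variance per component

print("eigenvalues:", eigvals)
print("variance of scores:", np.var(scores, axis=0, ddof=1))
print("explained variance ratio:", explained)
```

The variances of the scores should match the two largest eigenvalues, and the scores should be uncorrelated, which is the property stated in the last bullet.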
19. Bartlett's test of sphericity
• H0: R = I
• H1: R ≠ I
where R is the correlation matrix of the variables and I is the identity matrix.
In other words
H0: the scatter plot is roughly a sphere centered at the origin
H1: the scatter plot is not a sphere
• If the scatter plot is a sphere, PCA is of no use
• If the scatter plot is not a sphere (it is an ellipse/ellipsoid), go for PCA
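As an illustration of the test above, the usual statistic is χ² = −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom, where n is the number of observations and p the number of variables. The sketch below assumes NumPy and two made-up samples (one with independent variables, one with a shared factor); the function name `bartlett_sphericity` and the data are illustrative, not from the slides.

```python
import numpy as np

def bartlett_sphericity(X):
    """Bartlett's sphericity statistic and its degrees of freedom.

    X has one observation per row and one variable per column.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)            # p x p correlation matrix
    _sign, logdet = np.linalg.slogdet(R)        # ln|R|, computed stably
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof

# Hypothetical samples: 300 observations of 4 variables each.
rng = np.random.default_rng(2)
n = 300
independent = rng.standard_normal((n, 4))               # R should be close to I
shared = rng.standard_normal((n, 1))
correlated = shared + 0.5 * rng.standard_normal((n, 4)) # common factor, R far from I

stat_ind, dof = bartlett_sphericity(independent)
stat_cor, _ = bartlett_sphericity(correlated)

print("degrees of freedom:", dof)
print("statistic, independent variables:", stat_ind)
print("statistic, correlated variables:", stat_cor)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom argues against H0, so PCA is worthwhile. If SciPy is available, the p-value can be obtained with `scipy.stats.chi2.sf(statistic, dof)`.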
24. Results
• The principal component analysis of milk production in the state of Tamil Nadu revealed that milk production had a positive relationship with the indigenous cattle population, she-buffalo population, number of veterinary institutions, gross cropped area, area under paddy, area under groundnut, native purebred cattle population, graded and indigenous buffalo population, agricultural labour population, crossbred cattle population, number of financial institutions and graded buffalo population.
• This suggests that effecting a shift in herd structure in favour of crossbred cows and graded buffaloes can augment the milk production potential.
26. Introduction
• A cluster is a number of things of the same kind growing or joined together
• A group of homogeneous things
The principle:
• Objects in the same group are similar to each other
• Objects in different groups are as dissimilar as possible
42. How many clusters to retain?
At what stage should the algorithm be stopped?
Scree plot
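One way to read a scree plot for clustering is to look at the distance at which each merge happens and stop just before the distances jump. The sketch below is only an illustration under stated assumptions: it uses NumPy, a naive single-linkage agglomerative procedure (the slides shown here do not say which linkage was used), and made-up data with three well-separated groups. The function name `merge_distances` is hypothetical.

```python
import numpy as np

def merge_distances(X):
    """Naive single-linkage agglomerative clustering.

    Returns the distance at which each successive merge happens
    (n - 1 values for n points). Single linkage is an assumption here.
    """
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                 # a cluster never merges with itself
    active = list(range(n))                     # indices of clusters still alive
    heights = []
    while len(active) > 1:
        sub = D[np.ix_(active, active)]
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        i, j = active[a], active[b]
        heights.append(sub[a, b])
        # Merge cluster j into cluster i: single linkage keeps the smaller distance.
        D[i, :] = np.minimum(D[i, :], D[j, :])
        D[:, i] = D[i, :]
        D[i, i] = np.inf
        active.remove(j)
    return np.array(heights)

# Hypothetical data: three tight groups of 30 points each, far apart.
rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [20.0, 0.0], [10.0, 17.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])

heights = merge_distances(X)
n = len(X)
clusters_left = np.arange(n - 1, 0, -1)         # clusters remaining after each merge

# Stop just before the merge where the distance jumps the most.
jump = np.argmax(np.diff(heights))
suggested = n - (jump + 1)

print("last five merge distances:", heights[-5:])
print("suggested number of clusters:", suggested)
```

Plotting `heights` against `clusters_left` gives the scree plot itself; the elbow is where the curve rises sharply, which is what the largest-jump rule picks out in this simple setting.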
44. • The cluster analysis was carried out on the area, production, and productivity of the agricultural and horticultural crops predominantly grown in the districts of Rajasthan
• The clusters were calculated independently for two periods, 1980-1995 and 1996-2014.
45. • Crop cluster based on area during 1980-1995
• Crop cluster based on area during 1996-2014
• Crop cluster based on production during 1980-1995
• Crop cluster based on production during 1996-2014
• Crop cluster based on productivity during 1980-1995
• Crop cluster based on productivity during 1996-2014
49. Conclusions
• When the crop clusters based on area were compared between the two periods, it was evident that gram and cotton had shifted in the second period of the study.
• When the crop clusters based on production were compared between the two periods, it was observed that gram, mustard & rapeseed and cotton production had shifted over the period.
• This means that these crops formed a cluster in the first period but not in the second. Wheat and bajra, by contrast, were the crops that formed clusters, or had similar production, across all the districts of Rajasthan from the first period to the second.
50. • The present study also concluded that horticultural crops had similar productivity across all the districts of Rajasthan during both periods.
• This means that coriander, garlic and pea productivity was included over the years, in the second period of the study. Only wheat and bajra had similar productivity across all the districts of Rajasthan from the first period to the second.