CLUSTER ANALYSIS

PREPARED BY SABA KHAN
PRESENTED TO IMTIAZ ARIF
ID 4640
What is Cluster Analysis?
     It is a descriptive analysis technique that groups
     objects (respondents, products, firms, variables,
     etc.) so that each object is similar to the other
     objects in its cluster and different from the objects
     in all other clusters.

What is Cluster Analysis?
 Cluster: a collection of data objects
   Similar to one another within the same cluster
   Dissimilar to the objects in other clusters


 Cluster analysis
   Finding similarities between data according to the
   characteristics found in the data and grouping
   similar data objects into clusters
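The slides do not say how similarity between objects is measured. A common choice for numeric characteristics is Euclidean distance, where a smaller distance means two objects are more similar. The Python sketch below is only an illustration under that assumption: it builds the pairwise distance matrix for four hypothetical objects, and the names euclidean and distance_matrix are made up for this example.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two objects described by numeric characteristics."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(objects):
    """Pairwise distances between all objects; small values mean similar objects."""
    return [[euclidean(a, b) for b in objects] for a in objects]

# Hypothetical objects, each described by two numeric characteristics.
objects = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
matrix = distance_matrix(objects)
for row in matrix:
    print(" ".join(f"{value:6.2f}" for value in row))
```

Objects 0 and 1 are close to each other and far from objects 2 and 3, which is the kind of pattern a clustering procedure groups on.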
When to Use Cluster Analysis?
     The essence of all clustering approaches is the classification of
        data according to the “natural” groupings in the data themselves.
     Put simply, use cluster analysis when your objective is one of
        the following:
          Taxonomy development (segmentation)
          Data simplification
          Relationship identification
     Applications
     Marketers use it to segment markets, social networking sites
        use it to form new groups based on user data, and Flickr’s map
        of photos and other map sites use clustering to reduce the
        number of markers shown on a map.
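The slides do not describe how map sites reduce markers, so the following is only a rough illustration of the idea, not Flickr's implementation. A simple grid-based approach divides the map into square cells and replaces all points in the same cell with one marker at their average position. The sketch assumes plain latitude/longitude degrees as grid coordinates and a hypothetical cell_size parameter; real map libraries often work in screen pixels at each zoom level instead.

```python
from collections import defaultdict

def grid_markers(points, cell_size):
    """Replace the (lat, lon) points in each grid cell with one marker.

    Returns a list of ((mean_lat, mean_lon), count) tuples, one per non-empty cell.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        # Floor division maps every point to the integer index of its cell.
        key = (int(lat // cell_size), int(lon // cell_size))
        cells[key].append((lat, lon))
    markers = []
    for members in cells.values():
        # Place the marker at the average position of the points in the cell.
        mean_lat = sum(lat for lat, _ in members) / len(members)
        mean_lon = sum(lon for _, lon in members) / len(members)
        markers.append(((mean_lat, mean_lon), len(members)))
    return markers

# Hypothetical photo locations: three near Paris, two near London.
photos = [(48.85, 2.35), (48.86, 2.34), (48.84, 2.36), (51.50, -0.12), (51.51, -0.13)]
for position, count in grid_markers(photos, cell_size=1.0):
    print(position, count)
```

Five photo locations collapse into two markers; a smaller cell_size keeps more detail at the cost of more markers.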
Examples of Clustering Applications

 • Marketing: Helping marketers discover distinct groups in their
customer bases and then use this knowledge to develop
targeted marketing programs.
 • Land use: Identifying areas of similar land use in an
earth observation database.
 • Insurance: Identifying groups of motor insurance policy
holders with a high average claim cost.
 • City planning: Identifying groups of houses according to
their house type, value, and geographical location.
 • Earthquake studies: Observed earthquake epicenters
  should cluster along continental faults.
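The slides do not prescribe a method for the marketing example. One widely used partitioning method is k-means, which is a different technique from the hierarchical procedure used later in SPSS. The sketch below segments six hypothetical customers, described by annual spend (in thousands) and visits per month, into k = 2 groups; the data, the seed, and the function names are illustrative assumptions.

```python
import math
import random

def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]

def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means: alternate assignment and centroid updates until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical customers: (annual spend in thousands, store visits per month).
customers = [(2, 1), (3, 2), (2.5, 1.5), (20, 10), (22, 12), (21, 11)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

Unlike the hierarchical method, k-means needs the number of segments chosen in advance.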
Assumptions for Cluster Analysis
     Sample size: the sample must be large enough to represent
        the population and its underlying structure, particularly
        small groups within the population.
     Outliers: outliers can severely distort the representativeness
        of the results if they appear as structure (clusters) that is
        inconsistent with the research objectives.
     Representativeness of the sample: the sample must be
        representative of the population the research question addresses.
     Impact of multicollinearity: input variables should be
        examined for substantial multicollinearity and, if it is present,
          the variables should be reduced to equal numbers in each set of
          correlated measures.
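Two of these assumptions can be screened numerically before clustering. The sketch below flags potential outliers with z-scores and flags highly correlated variable pairs with the Pearson correlation. The cut-offs (|z| > 2.5 and |r| > 0.9), the variable names, and the ten responses are illustrative assumptions, not values from the slides; anything flagged still needs to be judged against the research objectives.

```python
import math
from itertools import combinations

def zscores(values):
    """Standardize values using the population standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [0.0] * len(values) if sd == 0 else [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a constant variable as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def screen(data, z_limit=2.5, r_limit=0.9):
    """data maps variable name -> one value per object (respondent)."""
    outliers = {}
    for name, values in data.items():
        flagged = [i for i, z in enumerate(zscores(values)) if abs(z) > z_limit]
        if flagged:
            outliers[name] = flagged
    correlated = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) > r_limit:
            correlated.append((a, b, round(r, 3)))
    return outliers, correlated

# Hypothetical ratings from ten respondents.
data = {
    "quality": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "value":   [2, 4, 5, 8, 10, 12, 15, 16, 18, 20],
    "price":   [3, 4, 3, 5, 4, 3, 4, 5, 4, 30],
}
outliers, correlated = screen(data)
print("possible outliers:", outliers)
print("highly correlated pairs:", correlated)
```

Here the last respondent's price rating (index 9) is flagged, and quality and value are nearly redundant, so under the guideline above one would balance the number of variables in each correlated set.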


How to Define Clusters
[Figure: Cluster A and Cluster B, with items labelled 1, 2 and 3]
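One simple way to check whether a proposed split matches the definition of a cluster (similar within, dissimilar between) is to compare the average distance between objects inside each cluster with the average distance between objects in different clusters. The sketch assumes Euclidean distance, and the coordinates are hypothetical stand-ins for Cluster A and Cluster B.

```python
import math
from itertools import combinations

def mean_within(cluster):
    """Average Euclidean distance over all pairs of objects inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def mean_between(first, second):
    """Average Euclidean distance over all pairs with one object from each cluster."""
    return sum(math.dist(a, b) for a in first for b in second) / (len(first) * len(second))

# Hypothetical objects assigned to Cluster A and Cluster B.
cluster_a = [(1.0, 1.0), (1.5, 1.2), (1.2, 0.8)]
cluster_b = [(6.0, 6.0), (6.5, 5.8), (5.8, 6.3)]

within_a = mean_within(cluster_a)
within_b = mean_within(cluster_b)
between = mean_between(cluster_a, cluster_b)
print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

A well-defined pair of clusters shows small within-cluster distances relative to the between-cluster distance.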
We will now go to SPSS for the analysis.

      Open judges.sav.
      Analyze > Classify > Hierarchical Cluster
      Select all variables.
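For readers without SPSS, the idea behind a hierarchical (agglomerative) cluster analysis can be sketched in Python: start with every object as its own cluster and repeatedly merge the two closest clusters until one remains. This sketch assumes average linkage on Euclidean distance and uses five hypothetical objects rather than judges.sav, so it is not expected to reproduce SPSS output; SPSS offers several linkage methods and distance measures.

```python
import math
from itertools import combinations

def average_linkage(c1, c2, points):
    """Mean Euclidean distance over every pair of objects drawn from the two clusters."""
    return sum(math.dist(points[i], points[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

def agglomerate(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(i,) for i in range(len(points))]  # each object starts as its own cluster
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        history.append((a, b, average_linkage(a, b, points)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return history

# Five hypothetical objects (not the judges.sav data).
objects = [(1.0, 1.0), (1.2, 1.1), (5.0, 5.0), (5.3, 5.4), (9.0, 1.0)]
for a, b, distance in agglomerate(objects):
    print(f"merge {a} with {b} at distance {distance:.3f}")
```

The merge history plays a role similar to the agglomeration schedule SPSS prints; one common heuristic is to look for a large jump in merge distance when choosing how many clusters to keep.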



