This document discusses various data analysis techniques including cluster analysis, multidimensional scaling, perceptual mapping, and discriminant analysis. It provides details on cluster analysis methods and processes. Cluster analysis involves grouping similar observations into clusters so that observations within a cluster are more similar to each other than observations in other clusters. The document discusses different clustering algorithms and applications. It also provides an example of using cluster analysis to segment customers of an auto insurance company based on preferences.
2. Analysis of data is the process of evaluating data using analytical and logical reasoning
to examine each component of the data provided.
This form of analysis is just one of the many steps that must be completed when
conducting a research experiment.
Data from various sources is gathered, reviewed, and then analyzed to form some sort
of finding or conclusion.
There are a variety of specific data analysis methods, some of which include data mining,
text analytics, business intelligence, and data visualization.
It is a process of inspecting, cleaning, transforming, and modeling data with the goal of
discovering useful information, suggesting conclusions, and supporting decision making.
Data analysis has multiple facets and approaches, encompassing diverse techniques
under a variety of names, in different business, science, and social science domains.
3. Grouping similar customers and products is a fundamental marketing activity. It is used,
prominently, in market segmentation. As companies cannot connect with all their customers,
they have to divide markets into groups of consumers, customers, or clients (called segments)
with similar needs and wants. Firms can then target each of these segments by positioning
themselves in a unique segment (such as Ferrari in the high-end sports car market).
A) Meaning:
Cluster analysis embraces a variety of techniques, the main objective of which is to group
observations or variables into homogeneous and distinct clusters. A simple numerical
example will help explain these objectives
4. B) Example:
The daily expenditures on food (X1) and clothing (X2) of persons are shown in following Table.
The numbers are fictitious and not at all realistic, but the example will help us explain the
essential features of cluster analysis as simply as possible. The data in the table are plotted in
the next figure.
Person    X1    X2
A          2     4
B          8     2
C          9     3
D          1     5
E        8.5     1

[Figure: scatter plot of persons A-E (a, b, c, d, e) on the X1-X2 plane, with axis ticks at 5 and 10]
5. B) Example:
Inspection of the figure suggests that the observations form two clusters. The first consists of
persons A and D, and the second of B, C and E. It can be noted that the observations in each
cluster are similar to one another with respect to expenditures on food and clothing, and that
the two clusters are quite distinct from each other.
These conclusions concerning the number of clusters and their membership were reached
through a visual inspection of the figure. This inspection was possible because only two
variables were involved in grouping the observations.
7. C) Examples of Clustering Applications:
1) Marketing:
Help marketers discover distinct groups in their customer bases, and then use this
knowledge to develop targeted marketing programs.
2) Land use:
Identification of areas of similar land use in an earth observation database.
3) Insurance:
Identifying groups of motor insurance policy holders with a high average claim cost.
4) City-planning:
Identifying groups of houses according to their house type, value, and geographical
location.
5) Earthquake studies:
Observed earthquake epicenters should be clustered along continental faults.
8. D) Types of Data required for Clustering in Data Mining:
9. D) Types of Data required for Clustering in Data Mining:
1) Scalability:
The clustering method should be applicable to huge databases, and its running time should
grow no more than linearly as the data size increases.
2) Versatility:
Clustering objects could be of different types – numerical data, Boolean data or
categorical data. Ideally a clustering method should be suitable for all different types of
data objects.
3) Ability to Discover Clusters with Different Shapes:
This is an important requirement for spatial data clustering. Many clustering algorithms
can only discover clusters with spherical shapes.
4) Minimal Input Parameter:
The method should require a minimum amount of domain knowledge for correct
clustering. However, most current clustering algorithms have several key parameters and
they are thus not practical for use in real world applications.
10. D) Types of Data required for Clustering in Data Mining:
5) High Dimensionality:
The clustering algorithm should not only be able to handle low- dimensional data but also
the high dimensional space.
6) Ability to Deal with Noisy Data:
Databases contain noisy, missing or erroneous data. Some algorithms are sensitive to such
data and may lead to poor quality clusters.
7) Interpretability:
The clustering results should be interpretable, comprehensible and usable.
12. E) Clustering Methods:
1) Hierarchical Methods:
Hierarchical clustering procedures are characterized by the tree-like structure established
in the course of the analysis. Most hierarchical techniques fall into a category called
agglomerative clustering. In this category, clusters are consecutively formed from objects.
Initially, this type of procedure starts with each object representing an individual cluster.
2) Centroid-based Clustering:
In centroid-based clustering, clusters are represented by a central vector, which may not
necessarily be a member of the data set. When the number of clusters is fixed to k, k-
means clustering gives a formal definition as an optimization problem: find the cluster
centers and assign the objects to the nearest cluster center, such that the squared
distances from the cluster centers are minimized.
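Stated as a formula (a standard formulation added here for clarity, not taken verbatim from the slides), k-means with centers mu_1, ..., mu_k and clusters C_1, ..., C_k minimizes the within-cluster sum of squared distances:

\min_{C_1,\dots,C_k,\;\mu_1,\dots,\mu_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2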
13. E) Clustering Methods:
3) Distribution-based Clustering:
The clustering model most closely related to statistics is based on distribution models.
Clusters can then easily be defined as objects belonging most likely to the same
distribution. A nice property of this approach is that this closely resembles the way
artificial data sets are generated: by sampling random objects from a distribution.
4) Density-based Clustering:
In density-based clustering, clusters are defined as areas of higher density than the
remainder of the data set. Objects in these sparse areas - that are required to separate
clusters - are usually considered to be noise and border points.
The most popular density-based clustering method is DBSCAN. In contrast to many newer
methods, it features a well-defined cluster model called "density-reachability". Similar to
linkage-based clustering, it is based on connecting points within certain distance
thresholds.
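As a minimal sketch of the density-based idea, scikit-learn's DBSCAN can be applied to a toy data set; the points and the eps/min_samples settings below are illustrative assumptions, not values from the slides:

# DBSCAN sketch: clusters are dense regions; points in sparse areas get the noise label -1.
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative 2-D data: two dense groups plus two isolated points.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
              [8.0, 8.2], [8.1, 7.9], [7.9, 8.0],
              [4.5, 0.5], [0.5, 6.0]])

# eps = neighborhood radius, min_samples = points needed to form a dense core (assumed values).
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(labels)   # e.g. [0 0 0 1 1 1 -1 -1]: two clusters and two noise points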
14. E) Clustering Methods:
5) Partitioning-based Clustering:
Partitioning methods relocate instances by moving them from one cluster to another,
starting from an initial partitioning. Such methods typically require that the number of
clusters will be pre-set by the user;
The following subsections present various types of partitioning methods.
a) Error Minimization Algorithms:
These algorithms, which tend to work well with isolated and compact clusters, are the
most intuitive and frequently used methods. The basic idea is to find a clustering
structure that minimizes a certain error criterion which measures the "distance" of each
instance to its representative value.
b) Graph-Theoretic Clustering:
Graph theoretic methods are methods that produce clusters via graphs. The edges of
the graph connect the instances represented as nodes. A well-known graph-theoretic
algorithm is based on the minimal spanning tree (MST). Inconsistent edges are edges
whose weight is significantly larger than the average of nearby edge lengths. Another
graph-theoretic approach constructs graphs based on limited neighborhood.
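A small sketch of the MST idea above using SciPy; it reuses the earlier food/clothing example, and cutting only the single heaviest tree edge is a simplifying stand-in for detecting "inconsistent" edges:

# MST-based clustering sketch: build a minimum spanning tree, cut the heaviest edge,
# and read the clusters off as the remaining connected components.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

X = np.array([[2, 4], [8, 2], [9, 3], [1, 5], [8.5, 1]])   # persons A-E from the earlier example
dist = squareform(pdist(X))                                 # pairwise Euclidean distances
mst = minimum_spanning_tree(dist).toarray()                 # MST as a weighted adjacency matrix

mst[mst == mst.max()] = 0                                   # cut the heaviest ("inconsistent") edge
n_clusters, labels = connected_components(mst, directed=False)
print(n_clusters, labels)   # e.g. 2 clusters: {A, D} and {B, C, E}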
15. E) Clustering Methods:
6) Model-based Clustering Methods:
These methods attempt to optimize the fit between the given data and some
mathematical models. Unlike conventional clustering, which identifies groups of objects,
model-based clustering methods also find characteristic descriptions for each group,
where each group represents a concept or class. The most frequently used induction
methods are decision trees and neural networks.
a) Decision Trees:
In decision trees, the data is represented by a hierarchical tree, where each leaf refers
to a concept and contains a probabilistic description of that concept. Several algorithms
produce classification trees for representing the unlabelled data.
b) Neural Networks:
This type of algorithm represents each cluster by a neuron or “prototype”. The input
data is also represented by neurons, which are connected to the prototype neurons.
Each such connection has a weight, which is learned adaptively during learning.
7) Constraint-Based Method:
In this method the clustering is performed by incorporation of user or application oriented
constraints. The constraint refers to the user expectation or the properties of desired
clustering results.
17. F) Process of Clustering Analysis:
1) Decide on the Clustering Variables:
At the beginning of the clustering process, we have to select appropriate variables for
clustering. Even though this choice is of utmost importance, it is rarely treated as such
and, instead, a mixture of intuition and data availability guide most analyses in marketing
practice. However, faulty assumptions may lead to improper market segments and,
consequently, to deficient marketing strategies. Thus, great care should be taken when
selecting the clustering variables.
2) Decide on the Clustering Procedure:
By choosing a specific clustering procedure, we determine how clusters are to be formed.
This always involves optimizing some kind of criterion, such as minimizing the within-
cluster variance (i.e., the clustering variables’ overall variance of objects in a specific
cluster), or maximizing the distance between the objects or clusters. The procedure could
also address the question of how to determine the similarity between objects in a newly
formed cluster and the remaining objects in the dataset.
3) Decide on the number of clusters:
An important question we haven’t yet addressed is how to decide on the number of
clusters to retain from the data. Unfortunately, hierarchical methods provide only very
limited guidance for making this decision.
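One widely used heuristic for this decision, added here as an illustration (it is not named in the slides), is the "elbow" plot: run k-means for several values of k and look for the point where the within-cluster sum of squares stops dropping sharply. A sketch with made-up data, assuming scikit-learn is available:

# Elbow heuristic sketch (illustrative data generated from three centers).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(center, 0.5, size=(50, 2)) for center in ([0, 0], [5, 5], [0, 5])])

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))   # inertia_ = within-cluster sum of squared distances

The value of k after which the inertia flattens out (here around k = 3, by construction of the toy data) is a candidate for the number of clusters to retain.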
18. F) Process of Clustering Analysis:
4) Validate the Cluster Solution:
Assessing the solution’s reliability is closely related to the above, as reliability refers to the
degree to which the solution is stable over time. If segments quickly change their
composition, or their members change their behavior, targeting strategies are likely not to succeed.
a) Substantial:
The segments are large and profitable enough to serve.
b) Accessible:
The segments can be effectively reached and served, which requires them to be
characterized by means of observable variables.
c) Differentiable:
The segments can be distinguished conceptually and respond differently to different
marketing-mix elements and programs.
d) Actionable:
Effective programs can be formulated to attract and serve the segments.
e) Stable:
Only segments that are stable over time can provide the necessary grounds for a
successful marketing strategy.
19. F) Process of Clustering Analysis:
4) Validate the Cluster Solution:
f) Parsimonious:
To be managerially meaningful, only a small set of substantial clusters should be
identified.
g) Familiar:
To ensure management acceptance, the segments' composition should be
comprehensible.
h) Relevant:
Segments should be relevant in respect of the company’s competencies and objectives.
i) Compactness:
Segments exhibit a high degree of within-segment homogeneity and between-segment
heterogeneity.
j) Compatibility:
Segmentation results meet other managerial functions’ requirements.
5) Interpretation of Data:
The final step of any cluster analysis is the interpretation of the clusters. Interpreting
clusters always involves examining the cluster centroids, which are the clustering
variables’ average values of all objects in a certain cluster.
20. G) Amalgamation or Linkage Rules:
1) Single Linkage (nearest neighbor):
As described above, in this method the distance between two clusters is determined by
the distance of the two closest objects (nearest neighbors) in the different clusters. This
rule will, in a sense, string objects together to form clusters, and the resulting clusters
tend to represent long "chains."
2) Complete Linkage (furthest neighbor):
In this method, the distances between clusters are determined by the greatest distance
between any two objects in the different clusters (i.e., by the "furthest neighbors"). This
method usually performs quite well in cases when the objects actually form naturally
distinct "clumps." If the clusters tend to be somehow elongated or of a "chain" type
nature, then this method is inappropriate.
3) Un-weighted pair-group Average:
In this method, the distance between two clusters is calculated as the average distance
between all pairs of objects in the two different clusters. This method is also very efficient
when the objects form natural distinct "clumps," however, it performs equally well with
elongated, "chain" type clusters.
21. G) Amalgamation or Linkage Rules:
4) Weighted pair-group Average:
This method is identical to the un-weighted pair-group average method, except that in the
computations, the size of the respective clusters (i.e., the number of objects contained in
them) is used as a weight. Thus, this method (rather than the previous method) should be
used when the cluster sizes are suspected to be greatly uneven. Note that in their book,
Sneath and Sokal (1973) introduced the abbreviation WPGMA to refer to this method
as weighted pair-group method using arithmetic averages.
5) Un-weighted pair-group Centroid:
The centroid of a cluster is the average point in the multidimensional space defined by the
dimensions. In a sense, it is the center of gravity for the respective cluster. In this method,
the distance between two clusters is determined as the difference between centroids.
Sneath and Sokal (1973) use the abbreviation UPGMC to refer to this method as un-
weighted pair-group method using the centroid average.
6) Weighted pair-group Centroid (median):
This method is identical to the previous one, except that weighting is introduced into the
computations to take into consideration differences in cluster sizes (i.e., the number of
objects contained in them).
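For reference, SciPy's hierarchical clustering exposes these rules under the method names 'single', 'complete', 'average' (UPGMA), 'weighted' (WPGMA), 'centroid' (UPGMC) and 'median' (WPGMC). A hedged sketch on the earlier five-person example:

# Agglomerative clustering sketch: swap the method string to compare linkage rules.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[2, 4], [8, 2], [9, 3], [1, 5], [8.5, 1]])   # persons A-E from the earlier example

Z = linkage(X, method='average')                 # try 'single', 'complete', 'centroid', 'median', ...
labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into two clusters
print(labels)   # e.g. [1 2 2 1 2]: {A, D} versus {B, C, E}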
22. H) Psychographic Segmentation:
Consumers are not all alike. This provides a challenge for the development and marketing of
profitable products and services. Not every offering will be right for every customer, nor will
every customer be equally responsive to marketing efforts. Segmentation is a way of
organizing customers into groups with similar traits, product preferences, or expectations.
Once segments are identified, marketing messages and in many cases even products can be
customized for each segment. The better the segment(s) chosen for targeting by a particular
organization, the more successful the organization is assumed to be in the marketplace. Since
its introduction in the late 1950s, market segmentation has become a central concept of
marketing practice.
Segments are constructed on the basis of customers:
a) Demographic characteristics,
b) Psychographics,
c) Desired benefits from products/services,
d) Past-purchase and product-use behaviors.
23. I) Example on Psychographics Segment:
Consider Geico, which plans to customize its auto
insurance offerings and needs to understand what
its customers view as important from their
insurance provider. Geico can ask its customers to
rate how important the following two attributes
are to them when considering the type of auto
insurance they would use:
a) Savings on premium
b) Existence of a neighborhood agent.
Figure shows what the analysis in this example
might look like:
[Figure: Segmentation of Geico customers mapped on two dimensions (premium saving: very important vs. not important; agent: very important vs. not important), showing Segment A (49%), Segment B (36%), and Segment C (15%).]
24. J) Interpretation of Example:
1) Cluster analysis to interpret data:
Cluster analysis is a class of statistical techniques that can be applied to data that exhibits
natural groupings. Cluster analysis makes no distinction between dependent and
independent variables. The entire set of interdependent relationships is examined. Cluster
analysis sorts through the raw data on customers and groups them into clusters. A cluster
is a group of relatively homogeneous customers. Customers who belong to the same
cluster are similar to each other. They are also dissimilar to customers outside the cluster,
particularly customers in other clusters. The primary input for cluster analysis is a measure
of similarity between customers, such as
a) correlation coefficients,
b) distance measures,
c) association coefficients.
25. J) Interpretation of Example:
2) Distance Measures:
The main input into any cluster analysis procedure is a measure of distance between
individuals who are being clustered. Distance between two individuals is obtained through
a measure called “Euclidean distance.” If two individuals, Joe and Sam, are being clustered
on the basis of n variables, then the Euclidean distance between Joe and Sam is
represented as:
Euclidean distance = \sqrt{(x_{Joe,1} - x_{Sam,1})^2 + \dots + (x_{Joe,n} - x_{Sam,n})^2}
Where,
x_{Joe,1} = Joe's value on variable 1,
x_{Sam,1} = Sam's value on variable 1.
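A small numerical sketch of this distance (the two respondents' ratings below are made-up values):

# Euclidean distance between two respondents described by the same n variables.
import numpy as np

joe = np.array([7, 3, 5])   # Joe's values on variables 1..n (illustrative)
sam = np.array([6, 1, 5])   # Sam's values on the same variables (illustrative)

distance = np.sqrt(np.sum((joe - sam) ** 2))
print(distance)   # sqrt(1 + 4 + 0) = sqrt(5) ≈ 2.24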
26. J) Interpretation of Example:
3) K-Means Clustering Algorithm:
K-means clustering belongs to the non-hierarchical class of clustering algorithms. It is one
of the more popular algorithms used for clustering in practice because of its simplicity and
speed. It is considered to be more robust to different types of variables, is more
appropriate for large datasets that are common in marketing, and is less sensitive to some
customers who are outliers (in other words, extremely different from others).
For K-means clustering, the user has to specify the number of clusters required before the
clustering algorithm is started. The basic algorithm for K-means clustering is as follows:
a) Choose the number of clusters, ‘k’.
b) Generate k random points as cluster centroids.
c) Assign each point to the nearest cluster centroid.
d) Recompute the new cluster centroids.
Repeat the two previous steps until some convergence criterion is met. Usually the
convergence criterion is that the assignment of customers to clusters has not changed
over multiple iterations.
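A minimal NumPy sketch of steps a)-d) above (illustrative only; in practice a library routine such as scikit-learn's KMeans would normally be used):

# Bare-bones k-means following the listed steps; not robust to empty clusters.
import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # b) k random points as centroids
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)                       # c) assign each point to nearest centroid
        if np.array_equal(new_labels, labels):                  # convergence: assignments unchanged
            break
        labels = new_labels
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])  # d) recompute centroids
    return labels, centroids

X = np.array([[2, 4], [8, 2], [9, 3], [1, 5], [8.5, 1]])
print(kmeans(X, k=2)[0])   # e.g. [0 1 1 0 1]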
27. J) Interpretation of Example:
4) Profiling Clusters:
Once clusters are identified, the description of the clusters in terms of the variables used
for clustering—or using additional data such as demographics helps in customizing
marketing strategy for each segment. This process of describing the clusters is termed
“profiling." Figure 1 is an example of such a process. A good deal of cluster-analysis
software also provides information on which cluster a customer belongs to. This
information can be used to calculate the means of the profiling variables for each cluster.
5) Conclusion:
Given a segmentation basis, the K-means clustering algorithm would identify clusters
and the customers that belong to each cluster. The management, however, has to
carefully select the variables to use for segmentation. Criteria frequently used for
evaluating the effectiveness of a segmentation scheme include: identifiability,
sustainability, accessibility, and actionability. Identifiability refers to the extent that
managers can recognize segments in the marketplace.
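Returning to the profiling step (part 4 above), a hedged sketch of cluster profiling with pandas; the column names and values are illustrative assumptions, not data from the example:

# Profiling sketch: average each profiling variable within each cluster.
import pandas as pd

customers = pd.DataFrame({
    "cluster":       [0, 0, 1, 1, 1],           # cluster labels from a prior cluster analysis
    "age":           [34, 41, 22, 25, 27],      # assumed demographic variable
    "premium_spend": [820, 900, 450, 390, 510], # assumed spending variable
})

profile = customers.groupby("cluster").mean()   # cluster-wise means of the profiling variables
print(profile)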
28. Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual
cases of a dataset. It refers to a set of related ordination techniques used in information
visualization, in particular to display the information contained in a distance matrix.
A) Meaning:
Multidimensional scaling (MDS) is a series of techniques that helps the analyst to identify
key dimensions underlying respondents’ evaluations of objects. It is often used in
Marketing to identify key dimensions underlying customer evaluations of products,
services or companies.
Once the data is in hand, multidimensional scaling can help determine:
a) what dimensions respondents use when evaluating objects
b) how many dimensions they may use in a particular situation
c) the relative importance of each dimension, and
d) how the objects are related perceptually
30. B) Types of Multidimensional Scaling:
1) Classical multidimensional scaling:
It is also known as Principal Coordinates analysis, Torgerson Scaling or Torgerson–Gower
scaling.
2) Metric multidimensional scaling:
It is a superset of classical MDS that generalizes the optimization procedure to a variety of
loss functions and input matrices of known distances with weights and so on.
3) Non-metric multidimensional scaling:
In contrast to metric MDS, non-metric MDS finds both a non-
parametric monotonic relationship between the dissimilarities in the item-item matrix
and the Euclidean distances between items, and the location of each item in the low-
dimensional space. The relationship is typically found using isotonic regression.
4) Generalized multidimensional scaling:
It is an extension of metric multidimensional scaling, in which the target space is an
arbitrary smooth non-Euclidean space. In cases where the dissimilarities are distances on
a surface and the target space is another surface, GMDS allows finding the minimum-
distortion embedding of one surface into another.
31. C) Process in Multidimensional Scaling:
[Flowchart: the MDS process]
1) Formulating the Problem
2) Obtaining Input Data
3) Running the MDS Statistical Program
4) Decide Number of Dimensions
5) Mapping the Results and Defining the Dimensions
6) Test the Results for Reliability and Validity
7) Report the Results Comprehensively
32. C) Process in Multidimensional Scaling:
1) Formulating the Problem:
What variables do you want to compare? How many variables do you want to compare?
More than 20 is often considered cumbersome. Fewer than 8 (4 pairs) will not give valid
results. What purpose is the study to be used for?
2) Obtaining Input Data:
Respondents are asked a series of questions. For each product pair, they are asked to rate
similarity (usually on a 7-point Likert scale from very similar to very dissimilar).
3) Running the MDS Statistical Program:
Software for running the procedure is available in many statistical packages. Often there
is a choice between Metric MDS (which deals with interval or ratio level data) and Non-
metric MDS (which deals with ordinal data).
4) Decide Number of Dimensions:
The researcher must decide on the number of dimensions they want the computer to
create. The more dimensions, the better the statistical fit, but the more difficult it is to
interpret the results.
33. C) Process in Multidimensional Scaling:
5) Mapping the Results and Defining the Dimensions:
The statistical program (or a related module) will map the results. The map will plot each
product (usually in two-dimensional space). The proximity of products to each other
indicates either how similar they are or how preferred they are, depending on which
approach was used. How the dimensions of the embedding actually correspond to
dimensions of system behavior, however, is not necessarily obvious.
6) Test the Results for Reliability and Validity:
Compute R-squared to determine what proportion of variance of the scaled data can be
accounted for by the MDS procedure. An R-square of 0.6 is considered the minimum
acceptable level. An R-square of 0.8 is considered good for metric scaling and .9 is
considered good for non-metric scaling.
7) Report the Results Comprehensively:
Along with the mapping, at least the distance measure (e.g., Sørensen index, Jaccard index)
and reliability (e.g., stress value) should be given. It is also very advisable to give the
algorithm (e.g., Kruskal, Mather), which is often defined by the program used (sometimes
replacing the algorithm report), if you have given a start configuration or had a random
choice, the number of runs, the assessment of dimensionality, the Monte Carlo
method results, the number of iterations, the assessment of stability, and the proportional
variance of each axis (r-square).
34. D) Scenario Example on Multidimensional Scaling :
We are interested in understanding consumers’ perceptions of six candy bars on the
market. Instead of trying to gather information about consumers’ evaluation of the candy
bars on a number of attributes, the researcher will instead gather only perceptions of
overall similarities or dissimilarities. The data are typically gathered by having respondents
give simple global responses to statements such as these:
a) Rate the similarity of products A and B on a 10-point scale
b) Product A is more similar to B than to C
c) I like product A better than product C
Candy Bar    A     B     C     D     E     F
A            -     2    13     4     3     8
B                  -    12     6     5     7
C                        -     9    10    11
D                              -     1    14
E                                    -    15
F                                          -
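As a hedged sketch of how such a matrix could be analyzed, the ranks can be treated as a precomputed dissimilarity matrix and passed to a non-metric MDS routine; scikit-learn's MDS is used here and the settings are illustrative:

# Non-metric MDS sketch on the candy-bar rank matrix (rank 1 = most similar pair).
import numpy as np
from sklearn.manifold import MDS

D = np.array([                      # symmetric dissimilarity matrix built from the ranks above
    [ 0,  2, 13,  4,  3,  8],
    [ 2,  0, 12,  6,  5,  7],
    [13, 12,  0,  9, 10, 11],
    [ 4,  6,  9,  0,  1, 14],
    [ 3,  5, 10,  1,  0, 15],
    [ 8,  7, 11, 14, 15,  0],
])

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)   # 2-D perceptual-map coordinates for candy bars A-F
print(coords)
print(mds.stress_)              # lower stress = better fit (see the reliability step above)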
35. E) Steps of Multidimensional scaling to solve such problem:
Step 1: Objectives of Multidimensional Scaling
Step 2: Research Design of MDS
Step 3: Assumptions of Multidimensional Scaling Analysis
Step 4: Deriving the MDS Solution and Assessing Overall Fit
Step 5: Interpreting the MDS Results
Step 6: Validating the MDS Results
36. E) Steps of Multidimensional scaling to solve such problem:
Step 1: Objectives of Multidimensional Scaling:
Perceptual mapping, and multidimensional scaling in particular, is most appropriate for
achieving two objectives:
a) As an exploratory technique to identify unrecognized dimensions affecting behavior.
b) As a means of obtaining comparative evaluations of objects when the specific bases of
comparison are unknown or indefinable.
The strength of perceptual mapping is its ability to infer dimensions without the need for
defined attributes. In a simple analogy, it is like providing the dependent variable
(similarity among objects) and figuring out what the independent variables (perceptual
dimension) must be.
1) Identification of all Relevant Objects to be Evaluated:
2) Similarity versus Preference Data:
3) Similarity versus Preference Data :
37. E) Steps of Multidimensional scaling to solve such problem:
Step 2: Research Design of MDS:
Perceptual mapping techniques can be classified by the nature of the responses obtained
from the individual concerning the object.
1) Objects: Their Number and Selection:
An implicit assumption in perceptual mapping is that there are common characteristics,
either objective or perceived, that the respondent could use for evaluations. Therefore it
is vital that the objects be comparable.
2) Collection of Similarity or Preference Data:
The primary distinction among multidimensional scaling programs is the type of data
(qualitative or quantitative) used to represent similarity and preferences.
3) Similarities Data:
When collecting similarities data, the researcher is trying to determine which items are
the most similar to each other and which are the most dissimilar.
4) Preference Data:
Preference implies that stimuli should be judged in terms of dominance relationships –
that is, stimuli are ordered in terms of the preference for some property.
38. E) Steps of Multidimensional scaling to solve such problem:
Step 2: Research Design of MDS:
5) Similarity Data:
The starting point for data collection was in obtaining the perceptions of the respondents
concerning the similarity /dissimilarity of HATCO and nine competing firms in the market.
Similarity judgments were made with the comparison-of-paired-objects approach. The 45
pairs of items were presented to the respondents, who indicated how similar each was on
a nine-point scale, with one being "Not at all similar" and nine being “Very Similar.”
6) Attribute Ratings:
In addition to the similarity judgments, ratings of each firm for eight attributes (product
quality, delivery speed, etc.) were obtained by two methods. In the first method, each
firm was rated on a six-point scale for each attribute. In the second method, each
respondent was asked to pick the firm best characterized by each attribute.
7) Preference Evaluations:
The final data assessed the preferences of each respondent for the ten firms in three
different buying situations: a straight re-buy, a modified re-buy and a new-buy situation.
In each situation, the respondents ranked the firms in order of preference for that
particular type of purchase.
39. E) Steps of Multidimensional scaling to solve such problem:
Step 3: Assumptions of Multidimensional Scaling Analysis:
Multidimensional scaling, while having no restraining assumptions on the methodology,
type of data, or form of the relationships among the variables, does require that the
researcher accept several tenets about perception, including the following:
1) Each respondent will not perceive a stimulus to have the same dimensionality (although it
is thought that most people judge in terms of a limited number of characteristics or
dimensions).
2) Respondents need not attach the same level of importance to a dimension, even if all
respondents perceive this dimension.
3) Judgments of a stimulus in terms of either dimensions or levels of importance need not
remain stable over time. People may not maintain the same perceptions for long periods
of time.
40. E) Steps of Multidimensional scaling to solve such problem:
Step 4: Deriving the MDS Solution and Assessing Overall Fit:
The determination of how many dimensions are actually represented in the data is generally
reached through one of three approaches: subjective evaluation, scree plots of the stress
measures, or an overall index of fit.
a) Incorporating Preferences into MDS:
Up to this point, we have concentrated on developing perceptual maps based on
similarity judgments. However, perceptual maps can also be derived from preferences. A
critical assumption is the homogeneity of perception across individuals for the set of
objects. This allows all differences to be attributed to preferences, not perceptual
differences.
41. E) Steps of Multidimensional scaling to solve such problem:
Step 5: Interpreting the MDS Results:
Once the perceptual map is obtained, the two approaches – compositional and
decompositional – again diverge in their interpretation of the results. For compositional methods,
the perceptual map must be validated against other measures of perception, because the
positions are totally defined by the attributes specified by the researcher. For decomposition
methods, the most important issue is the description of the perceptual dimensions and their
correspondence to attributes.
a) Identifying the Dimensions:
Multidimensional scaling techniques have no built-in procedure for labeling the
dimensions.
b) Subjective Procedures:
Interpretation must always include some element of researcher or respondent judgment,
and in many cases this proves adequate for the questions at hand.
c) Objective Procedures:
As a complement to the subjective procedures, a number of more formalized methods have
been developed.
42. E) Steps of Multidimensional scaling to solve such problem:
Step 6: Validating the MDS Results:
The most direct approach towards validation is a split-sample or multi-sample comparison, in
which either the original sample is divided or a new sample is collected. Most often the
comparison between results is done visually or with a simple correlation of coordinates.
a) Correspondence Analysis:
Correspondence Analysis is an interdependence technique that has become increasingly
popular for dimension reduction and perceptual mapping. It is a compositional technique
because the perceptual map is based on the association between objects and a set of
descriptive characteristics or attributes specified by the researcher. Its most direct
application is portraying the “correspondence” of categories of variables, which is then
used as the basis for developing perceptual maps.
43. A) Meaning:
Perceptual mapping has been used to satisfy marketing and advertising information needs
related to product positioning, competitive market structure, consumer preferences and
brand perceptions. Perceptual maps satisfy these types of information needs by analyzing and
then translating consumers' numeric ratings, brand similarity data and brand preference data
into a visual representation of how those consumers view the set of brands and products.
B) Definitions:
1) Kardes, Cronley, & Cline:
“Perceptual maps measure the way products are positioned in the minds of consumers
and show these perceptions on a graph whose axes are formed by product attributes.”
2) (Ferrell & Hartline, 2008):
“A perceptual map represents customer perceptions and preferences spatially by means
of a visual display”
44. C) Approaches to Perceptual Mapping:
There are two approaches to perceptual mapping.
1) Attribute based perceptual mapping:
Attribute based approaches require a respondent to evaluate a set of brands on a large
number of specific attributes, typically those attributes felt to influence how consumers
perceive, evaluate and distinguish among brands and products. Attribute based
perceptual maps can be created through the use of one of three mathematical
techniques: factor analysis, discriminant analysis and correspondence analysis (a brief
sketch of the attribute-based approach follows this list). These approaches to attribute
based perceptual mapping are discussed in the next section.
2) Non-attribute based perceptual mapping:
Non-attribute based approaches require a respondent to rate brands in terms of
similarities or preferences rather than attributes. A discussion of non-attribute based
perceptual mapping is presented later.
While attribute and non-attribute based approaches to perceptual mapping differ in
terms of the types of data collected, both approaches share the fundamental assumption
of perceptual maps that consumers use broad dimensions to evaluate brands and
products.
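The sketch below illustrates the attribute-based approach with hypothetical brand-by-attribute ratings, using principal component analysis as a simple stand-in for the factor-analytic technique mentioned above; the brand names and ratings are assumptions for illustration only.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical mean ratings (brands x attributes) on a 1-10 scale.
brands = ["Brand A", "Brand B", "Brand C", "Brand D", "Brand E"]
ratings = np.array([[8, 7, 3, 4],
                    [7, 8, 4, 3],
                    [3, 4, 8, 7],
                    [4, 2, 7, 8],
                    [5, 5, 5, 5]], dtype=float)

X = StandardScaler().fit_transform(ratings)        # put attributes on a common scale
coords = PCA(n_components=2).fit_transform(X)      # two perceptual dimensions
for name, (d1, d2) in zip(brands, coords):
    print(f"{name}: dim1 = {d1:+.2f}, dim2 = {d2:+.2f}")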
45. D) Information Required for Perceptual Mapping:
1) The Number of Dimensions Consumers use to Distinguish between Brands or Products:
This information reveals the complexity of the product category from the consumer's
perspective. Highly complex categories are those where consumers use a large number
of dimensions to evaluate brands and products; less complex categories are typically
those where fewer dimensions are used.
2) The Nature and Characteristics of these Dimensions:
This information reveals the specific attributes or dimensions that consumers use to
distinguish among products.
3) The Location of Actual Brands, as well as the Ideal Brand on these Dimensions:
This information reveals consumers' evaluations of the advertiser's product versus other
products and versus the ideal product on dimensions of importance. Further, it makes
explicit, from the consumers' perspective, a brand's most direct competitors and provides
a basis for determining the extent to which future advertising should reinforce or seek to
change the brand's current positioning.
47. A) Methods under Discriminant Analysis:
1) Multiple Discriminant Analysis:
MDA is also termed Discriminant Factor Analysis and Canonical Discriminant Analysis. It
adopts a similar perspective to PCA: the rows of the data matrix to be examined constitute
points in a multidimensional space, as also do the group mean vectors. Discriminating
axes are determined in this space, in such a way that optimal separation of the predefined
groups is attained.
2) Linear Discriminant Analysis:
It is the 2-group case of MDA. It optimally separates two groups, using the Mahalanobis
metric or generalized distance. It also gives the same linear separating decision surface as
Bayesian maximum likelihood discrimination in the case of equal class covariance
matrices.
3) K-NNs Discriminant Analysis:
Non-parametric (distribution-free) methods dispense with the need for assumptions
regarding the probability density function. They have become very popular especially in
the image processing area. The K-NNs method assigns an object of unknown affiliation to
the group to which the majority of its K nearest neighbors belongs.
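As an illustrative sketch of the linear discriminant and K-NN approaches just described, both can be fitted and compared as follows; a standard example dataset stands in for real data here.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)          # linear (Mahalanobis-based) rule
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)   # majority vote of 5 nearest neighbours

print("LDA hold-out accuracy:", round(lda.score(X_te, y_te), 3))
print("5-NN hold-out accuracy:", round(knn.score(X_te, y_te), 3))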
48. B) Discriminant Function:
Discriminant analysis is used to analyze relationships between a non-metric dependent
variable and metric or dichotomous independent variables. Discriminant analysis attempts to
use the independent variables to distinguish among the groups or categories of the
dependent variable. The usefulness of a discriminant model is based upon its accuracy rate,
or ability to predict the known group memberships in the categories of the dependent
variable.
Each function yields a discriminant score that indicates how well it predicts group
placement. Three quantities are commonly reported (a brief sketch follows this list):
1) Structure Correlation Coefficients:
The correlation between each predictor and the discriminant score of each function.
2) Standardized Coefficients:
Each predictor's unique contribution to each function (hence a partial correlation); they
indicate the relative importance of each predictor in predicting group assignment from
each function.
3) Functions at Group Centroids:
Mean discriminant scores for each grouping variable are given for each function. The
farther apart the means are, the less error there will be in classification.
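A brief sketch of these quantities, assuming a fitted linear discriminant model on a standard example dataset, is given below; the structure correlations and group centroids are computed directly from the discriminant scores.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
scores = lda.transform(X)                          # discriminant scores per observation

# Functions at group centroids: mean score of each group on each function.
centroids = np.array([scores[y == g].mean(axis=0) for g in np.unique(y)])
print("Group centroids:\n", centroids.round(3))

# Structure correlations: correlation of each predictor with each function's scores.
structure = np.array([[np.corrcoef(X[:, j], scores[:, f])[0, 1]
                       for f in range(scores.shape[1])]
                      for j in range(X.shape[1])])
print("Structure correlations:\n", structure.round(3))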
49. C) Goals of Discriminant Analysis:
There are two main goals for discriminant analysis:
1) Discrimination:
To construct a classifier to distinguish a set of observations from a known population.
2) Classification:
To distribute unlabeled observations into labeled groups with the classifier. The emphasis
is on deriving a classifier that can be used to sort new observations into the labeled
classes.
D) When to Use Discriminant Analysis:
1) Data should be from distinct groups.
2) DA is used to interpret group differences.
3) DA is used to classify new objects.
50. E) Assumptions in Discriminant analysis:
The discriminant model has the following assumptions:
1) Multivariate Normality:
Data values are from a normal distribution. We can use a normality test to verify this.
However, please note that normal assumptions are usually not "fatal". The resultant
significance tests may still be reliable.
2) Equality of variance-covariance within Group:
The covariance matrix within each group should be equal. Equality Test of Covariance
Matrices can be used to verify it. When in doubt, try re-running the analysis using the
Quadratic method, adding more observations, or excluding one or two groups.
3) Low Multicollinearity of the Variables:
When high multicollinearity among two or more variables is present, the discriminant
function coefficients will not reliably predict group membership. We can use the pooled
within-groups correlation matrix to detect multicollinearity. If there are correlation
coefficients larger than 0.8, exclude some variables or use Principal Component Analysis
first.
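A small sketch of this multicollinearity check, assuming a numeric predictor matrix and group labels from a standard example dataset, might look like the following; it approximates the pooled within-groups correlations by correlating the group-mean-centred predictors and flags pairs with |r| > 0.8.

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Centre each group on its own means, then correlate the pooled residuals.
X_centred = np.vstack([X[y == g] - X[y == g].mean(axis=0) for g in np.unique(y)])
corr = np.corrcoef(X_centred, rowvar=False)

flagged = [(i, j, round(corr[i, j], 2))
           for i in range(corr.shape[0])
           for j in range(i + 1, corr.shape[1])
           if abs(corr[i, j]) > 0.8]
print("Predictor pairs with |r| > 0.8:", flagged)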
51. F) Steps/ Process in Discriminant analysis:
Preparing Analysis Data → Verifying Assumptions → Selecting Discriminant Methods → Interpreting and Verifying the Results
52. F) Steps/ Process in Discriminant analysis:
1) Preparing Analysis Data:
a) Enough Sample Size:
As a rule, the sample size of the smallest group should exceed the number of predictor
variables. Ideally there should be at least 20 observations for each variable; at a minimum
there should be at least 5 observations per variable, and such a small sample, while it may
work, is not encouraged.
b) Independent Random Sample (no outliers):
Discriminant analysis requires that the observations are independent of one another,
i.e., no repeated measures or matched pairs data. In addition, discriminant analysis is
highly sensitive to the inclusion of outliers.
c) Selecting Proper Variables:
Suppressor variables should be excluded. We can judge by observing the Univariate
ANOVA table.
d) Dividing the Sample:
The Classification Summary of Training Data evaluates observations via discriminant
functions derived from the same data. The error rate is usually larger when evaluated on
the test data, which is not used for discriminant function estimation.
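For illustration, the split can be sketched as below (a standard example dataset stands in for the user's data); the error rate on the hold-out portion is typically somewhat higher than on the training portion.

from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
print("Training error rate:", round(1 - lda.score(X_tr, y_tr), 3))
print("Hold-out error rate:", round(1 - lda.score(X_te, y_te), 3))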
53. F) Steps/ Process in Discriminant analysis:
2) Verifying Assumptions:
The normality test, Equality Test of Covariance Matrices, and pooled within-groups
correlation matrix can be used to verify the assumptions. Please see Assumptions for
more information.
3) Selecting Discriminant Methods:
a) Linear or Quadratic:
Quadratic Discriminant Analysis (QDA) is like Linear Discriminant Analysis (LDA) except
that LDA assumes identical within-group covariance matrices, whereas QDA allows them
to differ. If the equality test of covariance matrices fails, QDA should be selected. However,
though QDA is more flexible about the covariance matrices than LDA, it has more
parameters to estimate.
b) Identifiable prior probabilities:
Discriminant analysis assumes that prior probabilities of group membership are
identifiable. If group population sizes are unequal, prior probabilities may differ. If one
finds that N for each group in the descriptive statistics table is different,
use Proportional to group size for the Prior Probabilities option.
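A hedged sketch of both choices, using a standard example dataset, is shown below: LDA and QDA are compared by cross-validation, and prior probabilities proportional to group size are supplied explicitly.

import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
priors = np.bincount(y) / len(y)                   # priors proportional to group size

lda = LinearDiscriminantAnalysis(priors=priors)
qda = QuadraticDiscriminantAnalysis(priors=priors)
print("LDA 5-fold CV accuracy:", round(cross_val_score(lda, X, y, cv=5).mean(), 3))
print("QDA 5-fold CV accuracy:", round(cross_val_score(qda, X, y, cv=5).mean(), 3))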
54. G) Two Group Discriminant Analyses:
In the two-group case, discriminant function analysis can also be thought of as (and is
analogous to) multiple regression (see Multiple Regression; the two-group discriminant
analysis is also called Fisher linear discriminant analysis after Fisher, 1936;
computationally all of these approaches are analogous). If we code the two groups in the
analysis as 1 and 2, and use that variable as the dependent variable in a multiple
regression analysis, then we would get results that are analogous to those we would
obtain via Discriminant Analysis. In general, in the two-group case we fit a linear equation
of the type:
Group = a + b1*x1 + b2*x2 + ... + bm*xm
where a is a constant and b1 through bm are regression coefficients. The interpretation of
the results of a two-group problem is straightforward and closely follows the logic of
multiple regression: the variables with the largest (standardized) regression coefficients
are the ones that contribute most to the prediction of group membership.
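A minimal sketch of this idea, assuming a standard two-group example dataset and only its first four (standardized) predictors, is shown below; the standardized regression coefficients indicate which variables contribute most to separating the two groups.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data[:, :4])   # first four predictors, standardized
group = data.target + 1                                # the two groups coded 1 and 2, as above

reg = LinearRegression().fit(X, group)
for name, b in zip(data.feature_names[:4], reg.coef_):
    print(f"{name}: standardized coefficient = {b:+.3f}")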
55. H) Coefficient of Variation:
The coefficient of variation (CV) is defined as the ratio of the standard deviation (σ) to
the mean (μ). It shows the extent of variability in relation to the mean of the population.
The coefficient of variation should be computed only for data measured on a ratio scale, as
these are measurements that can only take non-negative values. The coefficient of variation
may not have any meaning for data on an interval scale.[1] For example, most temperature
scales are interval scales (e.g., Celsius, Fahrenheit etc.) that can take both positive and
negative values, whereas the Kelvin scale has an absolute null value (i.e., 0K is the absence of
heat), and negative values are nonsensical. Hence, the Kelvin scale is a ratio scale. While the
standard deviation (SD) can be derived on both the Kelvin and the Celsius scale (with both
leading to the same SDs), the CV is only relevant as a measure of relative variability for the
Kelvin scale. The CV is thus a statistical measure of the dispersion of data points in a data
series around the mean. It is calculated as:
CV = σ / μ  (Coefficient of Variation = Standard Deviation / Expected Return)
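A small numeric illustration with hypothetical return data:

import numpy as np

returns = np.array([0.08, 0.12, 0.10, 0.06, 0.14])     # hypothetical yearly returns
cv = returns.std(ddof=1) / returns.mean()              # CV = sigma / mu
print(f"Mean: {returns.mean():.3f}  SD: {returns.std(ddof=1):.3f}  CV: {cv:.3f}")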