Primary data were collected from online shoppers in different wards of Trivandrum Corporation (Kerala), and several statistical tests were applied to them.
Predictive tools such as logistic regression, linear discriminant analysis and multinomial regression were also used to predict customer type.
A Case Study on Status of Online Shopping in Trivandrum District
A CASE STUDY ON STATUS OF ONLINE SHOPPING
IN TRIVANDRUM DISTRICT
PROJECT
SUBMITTED TO THE UNIVERSITY OF KERALA
IN PARTIAL FULFILMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
MASTER OF SCIENCE
IN
STATISTICS
BY
HALAKE KUMAR SURESH
Reg No : STA 150504
NEETHU M G
Reg No : STA 150506
DEPARTMENT OF STATISTICS
UNIVERSITY OF KERALA
KARIAVATTOM
THIRUVANANTHAPURAM
2015 - 2017
Dr. C. SATHEESH KUMAR
Professor and Head
Department of Statistics
University of Kerala
Kariavattom, Thiruvananthapuram
June, 2017
CERTIFICATE
I hereby certify that this report, “A CASE STUDY ON STATUS OF ONLINE SHOPPING
IN TRIVANDRUM DISTRICT”, is a bonafide report of the project work carried
out by Mr. Halake Kumar Suresh and Ms. Neethu M. G., fourth-semester M.Sc.
Statistics students in the Department of Statistics, University of Kerala,
during 2015 – 2017 under my supervision and guidance, in partial
fulfilment of the requirements for the M.Sc. Degree in Statistics of the
University of Kerala.
Dr. C. Satheesh Kumar
ACKNOWLEDGEMENT
The success of any undertaking needs encouragement and co-operation
from different quarters. Words are inadequate to express our profound and
deep sense of gratitude to those who helped us in bringing out this project
successfully.
We owe an inestimable debt to Dr. C. Satheesh Kumar, Professor
& Head, Department of Statistics, University of Kerala, and our project guide,
for his constant encouragement, invaluable guidance and the
suggestions rendered during the course of the project work.
We are very grateful to all our teachers, the librarian, research
scholars and M.Phil students of the Department of Statistics, University of Kerala,
for their kind help throughout the work.
We also take this opportunity to thank our family members and
friends for their love and encouragement throughout the study.
Above all, we thank Almighty God, without whose blessings we
would never have been able to complete this work.
Mr. Halake Kumar Suresh
Ms. Neethu. M. G
Kariavattom
June, 2017
CONTENTS
CHAPTER
1. INTRODUCTION
1.1 Introduction
1.2 Terminology
1.3 Objective
1.4 Limitations
1.5 Summary of the study
2. MATERIALS AND METHODS
2.1 Introduction
2.2 Sampling Techniques
2.3 Description of the Questionnaire
2.4 Statistical Tools
3. STUDY ON AWARENESS OF CUSTOMERS
3.1 Introduction
3.2 Normality test
3.3 Awareness of males and females
3.4 Awareness and residency
3.5 Awareness and different education levels
3.6 Awareness and different education streams
3.7 Awareness and different monthly income groups
3.8 Awareness and customers’ interest
3.9 Conclusions
4. STUDY ON SATISFACTION OF CUSTOMERS
4.1 Introduction
4.2 Normality test
4.3 Satisfaction of males and females
4.4 Satisfaction and residency
4.5 Satisfaction and different education levels
4.6 Satisfaction and different monthly income groups
4.7 Satisfaction and customers’ interest
4.8 Conclusion
5. LOGISTIC REGRESSION AND LINEAR DISCRIMINANT ANALYSIS
5.1 Introduction
5.2 Binary logistic regression
5.3 Multinomial logistic regression
5.4 Linear discriminant analysis of output
5.5 Linear discriminant analysis of customer
5.6 Conclusions
Reference
Questionnaire
Chapter 1
INTRODUCTION
1.1 Introduction
1.1.1 E-commerce in India: state of play
Retail e-commerce sales in India are expected to increase to approximately $45.17
billion by 2021, and this correlates with the yearly rise in the use of the internet on mobile
devices in the country. (A similar trend of rising mobile purchases is also found across
Asia and Western countries as people become more reliant on mobile technology.)
Western e-commerce brands have also taken an interest in India’s maturing
commercial scene. The Canadian e-commerce start-up Shopify announced the release
of Shopify.in in 2013 to much fanfare and publicity, a clear example of how Western
companies are seeing potential in the thriving economy of India. Shopify was quick to
capitalise on the growing e-commerce entrepreneur and SME market, a demographic
that is contributing to India’s economic landscape with refreshingly diverse voices.
China, the other big e-commerce success story of recent years, has also expressed
an interest in entering the buoyant Indian market. The Chinese e-tail giant Alibaba
is keen to get a slice of India’s market, currently dominated by the domestic firms Flipkart and
Snapdeal, and plans were announced to open an office in Mumbai in late 2016.
There are a lot of exciting developments in the pipeline for India’s e-commerce
industry. India’s online population is increasing at a rapid pace, as is the number of people
accessing the internet on mobile devices.
1.1.2 Online Consumer Buying Behavior
Everybody in the world is a consumer. Each of us buys, sells or consumes
goods and services in life. Consumer behavior is very complex and is determined to a
large extent by social and psychological factors. Consumer behavior can be defined as
those acts of individuals directly involved in obtaining, using and disposing of economic
goods and services.
The relevance and importance of understanding consumer behavior is rooted in
modern marketing. No two consumers have the same needs; each buys only those
products and services which satisfy his or her own wants and desires. To survive
in the market, a firm has to be constantly innovating and understanding the latest
consumer needs and tastes. This understanding is extremely useful in exploiting marketing
opportunities and in meeting the challenges that the Indian market offers.
Online consumer behavior parallels that of offline consumer behavior with some
obvious differences. The stages of the consumer decision process are basically the
same whether the consumer is online or offline. But the general model of consumer
behavior needs modification to take into account new factors. In the online model, web
site features along with consumer skills, product characteristics, attitudes towards online
purchasing and perceptions about control over the Web environment play a vital role.
There are parallels in the analogue world, where it is well known that consumer
behavior can be influenced by store design, and that understanding the precise
movements of consumers through a physical store can enhance sales if goods and
promotions are arranged along the most likely consumer tracks.
Consumer skills refer to the knowledge that the consumer has about how to conduct
online transactions. Product characteristics refer to the fact that some products can be
easily described, packaged and shipped over the Internet whereas others cannot.
Combined with traditional factors such as brand, advertising and firm capabilities, these
factors lead to specific attitudes about online shopping.
Consumer behavior regarding the use of the internet for shopping varies. Some
consumers either lack access or resist using this new channel of distribution, primarily
due to privacy and security concerns. Other shoppers choose to browse the Web so as
to gather information and then visit the stores to negotiate the purchase face to face
with the retailer. A few shoppers visit retail stores first and then buy from an e-retailer.
Still others do all the shopping online: gathering information, negotiating, purchasing and
either arranging for delivery or picking up the merchandise in the store.
Three key ways any Indian e-commerce store can get ahead of the game are:
investing in a brand story, adopting a multichannel strategy, and getting to grips with
social media content.
1.2 Terminology
Online Shopping
The action or activity of buying goods or services over the internet. Online shopping is a form
of electronic commerce which allows consumers to directly buy goods or services from a seller
over the Internet using a web browser.
Customer
A customer is an individual or business that purchases the goods or services produced by a
business. Attracting customers is the primary goal of most public-facing businesses, because it
is the customer who creates demand for goods and services.
We classified customers into groups such as ‘more efficient’ and ‘less efficient’, or ‘ideal’,
‘ordinary’ and ‘low profile’, based on how long the customer has been shopping online, the average
amount spent in a single purchase, the frequency of internet usage, the wish to shop online in future, etc.
Satisfaction towards online shopping
It is the extent or degree to which a customer is fulfilled by the experience of shopping through
the internet.
In our study, the satisfaction score is the sum of the scores assigned to some of the sub-questions of
tables T1 and T2.
Qns 1, 2, 3, 4, 5, 6, 9, 11, 12, 13, 15, 16, 17 and 20 are scored as below:
Strongly disagree (score 1), Disagree (score 2), Indifferent (score 3), Agree (score 4) and
Strongly agree (score 5).
For Qns 8, 10, 14 and 18 the scoring is in reverse order.
Awareness towards online shopping
There are some psychological, personal and social factors which measure the level of
awareness of a customer towards online shopping.
In our study, the awareness score is the sum of the scores assigned to sub-questions 1 to 10
and Qn 12 of table T3. The scoring is as below.
Never important (score 1), Not important (score 2), Indifferent (score 3), Important (score
4), Very important (score 5).
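The two scoring rules described in this section can be sketched in Python as follows. This is only an illustration of the rules as stated (the study itself used R); the function names, the dictionary layout of a response sheet and the example respondent are assumptions made for the sketch.

```python
# Item numbers and labels follow the Terminology section.
FORWARD_ITEMS = [1, 2, 3, 4, 5, 6, 9, 11, 12, 13, 15, 16, 17, 20]
REVERSE_ITEMS = [8, 10, 14, 18]
AWARENESS_ITEMS = list(range(1, 11)) + [12]

AGREEMENT = {"Strongly disagree": 1, "Disagree": 2, "Indifferent": 3,
             "Agree": 4, "Strongly agree": 5}
IMPORTANCE = {"Never important": 1, "Not important": 2, "Indifferent": 3,
              "Important": 4, "Very important": 5}


def satisfaction_score(responses):
    """responses: dict mapping a T1/T2 item number to an agreement label."""
    forward = sum(AGREEMENT[responses[q]] for q in FORWARD_ITEMS)
    # Reverse order: 1 <-> 5 and 2 <-> 4, while 3 stays 3.
    reverse = sum(6 - AGREEMENT[responses[q]] for q in REVERSE_ITEMS)
    return forward + reverse


def awareness_score(responses):
    """responses: dict mapping a T3 item number to an importance label."""
    return sum(IMPORTANCE[responses[q]] for q in AWARENESS_ITEMS)


# Hypothetical respondent who ticks "Agree" / "Important" everywhere.
sat = {q: "Agree" for q in range(1, 21)}
aware = {q: "Important" for q in range(1, 14)}
print(satisfaction_score(sat))  # 14 * 4 + 4 * 2 = 64
print(awareness_score(aware))   # 11 * 4 = 44
```

Under these rules the satisfaction score ranges from 18 to 90 and the awareness score from 11 to 55.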
1.3 Main Objectives of the Study
To check whether the awareness of customers towards online shopping changes with
different vital factors like gender, residency, stream of education, education level
and income.
To check whether satisfaction with online shopping varies with gender, residency,
stream of education, education level and income.
To classify customers into different groups by means of logistic regression,
multinomial regression and discriminant analysis models.
1.4 Limitations of the study
A few weak points of the study are listed below.
Results may not be comparable for different geographical areas or at different
times, as the survey was conducted in Trivandrum district in the months of April-May.
The survey relates only to online shoppers; purely offline shoppers are not
considered.
The sample size is 150, which is not very large.
1.5 Summary of the study
In chapter 1 we have given a brief introduction to e-commerce and explained the terms
used in the study. We have also stated the main objectives of the study and discussed
its limitations.
Chapter 2 deals with the sampling techniques used for the collection of data and the description of
the questionnaire: which questions are included, how they are arranged, and which
new variables are constructed from the respondents’ choices.
It then gives the theoretical background of the statistical tools used throughout the study.
We included parametric and non-parametric tools according to their
applicability.
Chapter 3 focuses on the analysis of customers’ awareness towards online
shopping. We compared the awareness level across different vital factors: gender,
residency, education level, education stream and monthly family
income.
Chapter 4 is about the satisfaction of customers towards e-shopping. Here too we
applied different statistical tools to see whether there is a significant difference in
satisfaction score across vital factors such as gender, residency, education level,
education stream and monthly family income.
Chapter 5 applies different regression models and discriminant analysis to
distinguish the type of customer. Thus we obtained several reliable and reasonably precise
classification models.
We used RStudio for the analysis of the data.
Chapter 2
MATERIALS AND METHODS
2.1 Introduction
In chapter 1, we introduced the terminology associated with the topic and
described the objectives of the study. In this chapter we describe the sampling
techniques used for the survey, the questionnaire and the analytical tools employed. The
details are given in the following sections.
2.2 Sampling Techniques
The overall procedure for this study involved the administration of the questionnaire to a
sample of 150 online shoppers from various parts of the city.
Multi-stage sampling
Multi-stage sampling (also known as multi-stage cluster sampling) is a more complex
form of cluster sampling which contains two or more stages in sample selection. In
simple terms, in multi-stage sampling large clusters of population are divided into
smaller clusters in several stages in order to make primary data collection more
manageable.
Of the 100 wards in Trivandrum Corporation, most neighbouring wards are
homogeneous in socio-economic conditions. So all wards are classified into 10
internally homogeneous blocks (the first-stage sampling units), and one ward (the
second-stage sampling unit) is taken at random from each block as representative of that block.
The selected wards are Kazhakuttom, Chellamangala, Kowdiar, Kachani, Poojappura,
Pappanamcode, Mulloor, Thampanoor, Kadakampally and Akkulam. Then, by
simple random sampling, a sample of size 150 is selected from these 10
wards.
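The selection procedure can be sketched in Python as below. This is an illustration under stated assumptions, not the procedure actually run for the survey: the ward numbering, the contiguous blocks of ten and the list of 200 shoppers per ward are all hypothetical stand-ins, since the report does not give the block composition or the sampling frame.

```python
import random

rng = random.Random(2017)  # fixed seed so the sketch is reproducible

# Assumption: wards are numbered 1..100 and each block is a run of ten
# consecutive numbers. The study grouped neighbouring, similar wards instead.
blocks = [list(range(start, start + 10)) for start in range(1, 101, 10)]

# One ward is drawn at random from each of the 10 homogeneous blocks.
selected_wards = [rng.choice(block) for block in blocks]

# Hypothetical sampling frame: 200 online shoppers listed in each chosen ward.
frame = [(ward, person) for ward in selected_wards for person in range(200)]

# Simple random sample of 150 shoppers, drawn without replacement.
sample = rng.sample(frame, 150)

print(len(selected_wards), len(sample))
```

Drawing from the pooled frame of the ten wards is one reading of the last step; sampling a fixed number per ward would be an alternative design.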
Every effort was made to obtain a sample representing the whole population. It
comprises customers of all age groups, different education levels such as SSLC, Graduation
and Master, different streams of education such as Arts, Science and Professional, and different
occupations. Most of the questionnaires were completed by face-to-face interview to
avoid any personal bias from respondents.
2.3 Description of the Questionnaire
Each questionnaire has two parts, part A and part B, and each respondent was asked to
answer all the questions. All questions are arranged in a proper sequence so as to get
data that are as reliable as possible.
Part A has 7 questions, which cover the personal data of respondents such as age,
gender, residency, occupation, education level and monthly family income.
Part B contains the survey-related questions. Questions 1 and 2 are about
internet usage. Qns 3 to 7 are about how long the customer has been purchasing
online, spending habits and time spent on the site. Qns 8 and 9 are about payment. Qn 10
asks for a ranking of some elements of online shopping compared to store shopping. Qn 11 lists
a variety of products, among which the customer is asked to tick his or her shopping choices. Qn 12
asks for a ranking of some popular e-retailers and of the filter options available on an
e-retailer’s site. Qns 13 to 16 are about how comfortable the customer is with the shopping
site.
Tables T1 and T2 together have 20 factors, each with 5 options (a, b, c, d and e),
which are the levels of agreement from ‘Strongly disagree’ to ‘Strongly agree’;
the informant is supposed to choose one of them. These 20 questions assess the
customer’s satisfaction towards online shopping.
Table T3 has 13 factors, which are awareness factors towards online purchase.
Here too there are 5 options (a, b, c, d and e), which are levels of importance from ‘Never
important’ to ‘Very important’, and one of them is to be chosen.
We have assigned scores to all sub-questions of tables T1, T2 and T3, as described
in Terminology.
Finally, Qns 17 to 25 ask about the influence of social media on online
purchase, technical problems suffered in various steps such as payment, cancellation or return of
ordered commodities, awareness of consumer rights and whether the customer is
willing to purchase online in future. (The questionnaire is attached at the end of the document.)
2.4 Statistical Tools
In accordance with the main objectives mentioned in previous chapter, we are utilising
some statistical tools to test different hypotheses.
Kruskal Wallis H Test
The Kruskal-Wallis H test is often called “analysis of variance by ranks”. This
non-parametric test is especially useful when the k samples do not come from normal
populations, so it is a non-parametric alternative to one-way ANOVA. The null hypothesis
tested here is
H0 : The k independent samples come from the same population.
Assumptions
1. Dependent variable should be measured at the ordinal or continuous level
2. Independent variable should consist of two or more categorical, independent
groups.
3. Should have independence of observations.
Steps involved
Step 1: Rank all of the scores in ascending order, ignoring which group they belong to.
Step 2: Find T_i, the total of the ranks for each group, by adding together all of the ranks
for each group in turn.
Step 3: Find the value of test statistic H.
$H = \frac{12}{N(N+1)} \left[ \sum_{i=1}^{k} \frac{T_i^2}{n_i} \right] - 3(N+1)$
where
N → the total number of observations
n_i → the number of subjects in the i-th group
k → the number of groups (≥ 3)
T_i^2 → the square of the total of the ranks for scores in the i-th group
Step 4: The distribution of H is approximately $\chi^2$ with k-1 d.f.
Test: reject H0 at the α level of significance when $H > \chi^2_{\alpha}(k-1)$
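The four steps can be sketched in Python as follows. This is a minimal illustration of the H formula as written above, on made-up data; the function names are assumptions, and unlike R's kruskal.test it applies no correction for ties beyond giving tied values their average rank.

```python
def average_ranks(values):
    """Ranks from 1 to len(values); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic for a list of groups (each a list of scores)."""
    pooled = [x for g in groups for x in g]        # Step 1: pool and rank
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        t_i = sum(ranks[start:start + len(g)])     # Step 2: rank total T_i
        total += t_i ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)  # Step 3


# Made-up example with k = 3 groups of 3 observations each.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
print(round(h, 4))
```

For this toy data the rank totals are 6, 15 and 24, which gives H of about 7.2; that exceeds the 5% chi-square critical value for 2 d.f. (about 5.99), so H0 would be rejected.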
Mann-Whitney U test
This is one of the most powerful non-parametric tests and is an alternative to the two-sample
t-test. The null hypothesis tested here is
H0 : The two independent random samples come from the same population.
Assumptions
Suppose two samples drawn from two independent populations X and Y.
1. X and Y are continuous distributions (or discrete distributions well-approximating
continuous distributions)
2. X and Y have the same shape. The only possible difference is their position (i.e.
the value of the median)
3. the number of elements in each sample is not less than 5
4. the samples are independent
5. scale of measurement should be ordinal, interval or ratio
How it works
To make it simple, the U-test works as follows. Both samples (having sizes N and M)
are combined into one array, which is sorted in ascending order. We keep information
about which sample each element came from. After sorting, each element is replaced
by its rank (its index in the array, from 1 to N+M). Then the ranks of the first sample’s
elements are summed to give $R_1$, and the U-value is calculated as
$U = R_1 - \frac{N(N+1)}{2}$
Under H0 the mean of U equals NM/2. If U is close to this value, the medians of X and Y are
close to each other. If we know the quantiles of the distribution of U, we can get the
significance level corresponding to the observed value of U.
Normal approximation
Although U has a discrete distribution, if N and M are large it can be approximated by the
normal distribution with mean NM/2 and standard deviation $\sigma = \sqrt{\frac{NM(N+M+1)}{12}}$.
Thus
$Z = \frac{U - \frac{NM}{2}}{\sqrt{\frac{NM(N+M+1)}{12}}}$
can be used as the test statistic, which has approximately an $N(0,1)$ distribution.
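A small Python sketch of the U statistic and its normal approximation is given below. It counts, over all pairs, how often a first-sample value exceeds a second-sample value (ties count one half), which is equivalent to taking the rank sum of the first sample and subtracting N(N+1)/2. The function names are assumptions, and no continuity or tie correction is applied, so the results differ slightly from R's wilcox.test.

```python
import math


def mann_whitney_u(x, y):
    """U for sample x: pairs with x_i > y_j, counting ties as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def normal_approximation(u, n, m):
    """Z statistic and two-sided p-value from the N(0, 1) approximation."""
    mean = n * m / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    z = (u - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Made-up samples in which every x lies below every y, so U = 0.
x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]
u = mann_whitney_u(x, y)
z, p = normal_approximation(u, len(x), len(y))
print(u, round(z, 3), round(p, 4))

# Check against Chapter 3: W = 2769.5 with group sizes 49 and 101
# (the sizes implied by the degrees of freedom of the F test for gender).
z_gender, p_gender = normal_approximation(2769.5, 49, 101)
print(round(p_gender, 2))
```

The second call gives a p-value of roughly 0.24, close to the 0.2369 that R reports for gender with its continuity and tie corrections.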
Kolmogorov-Smirnov two sample test
This non-parametric test is used to test the null hypothesis
H0: The two data samples come from the same distribution. Note that we are not specifying
what that common distribution is.
i.e. H0: $F_m(x) = G_n(x)$ for all x.
The test statistic $D_{m,n}$ is defined as below:
$D_{m,n} = \sup_x |\hat{F}_m(x) - \hat{G}_n(x)|$
where $\hat{F}_m(x)$ and $\hat{G}_n(x)$ are the empirical distribution functions of the two samples. The test
rejects H0 at the α level of significance if $D_{m,n} > D_{m,n}(\alpha)$, i.e. H0 is not rejected if $\hat{F}_m$ and $\hat{G}_n$ are close
for each x.
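The statistic can be computed directly from its definition, as in the Python sketch below. It evaluates both empirical distribution functions at every observed value, which is where the supremum must occur; the function name and the data are made up for illustration, and critical values are not computed here.

```python
from bisect import bisect_right


def ks_two_sample_statistic(x, y):
    """D = sup |F_hat(v) - G_hat(v)| over all observed values v."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        f_hat = bisect_right(xs, v) / m   # share of x values <= v
        g_hat = bisect_right(ys, v) / n   # share of y values <= v
        d = max(d, abs(f_hat - g_hat))
    return d


# Made-up samples: the second is the first shifted upwards by 2.
x = [1, 2, 3, 4]
y = [3, 4, 5, 6]
d = ks_two_sample_statistic(x, y)
print(d)  # 0.5
```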
Student’s two-sample t-test
This test is used to test the null hypothesis
H0: The two population means do not differ, i.e. $\mu_1 = \mu_2$
Assumptions
1. Each sample is randomly selected from the corresponding population.
2. The populations from which the samples are drawn are normal.
3. The variances of the two populations do not differ significantly.
So before applying the t-test, we should carry out a test for equality of variances.
How it works
Suppose we have two independent random samples of sizes $n_1$ and $n_2$ from normal
populations $N(\mu_i, \sigma_i^2)$, i = 1, 2. Let $X_{i1}, X_{i2}, \dots, X_{in_i}$ be the sample from the i-th population.
We first have to test H01: $\sigma_1^2 = \sigma_2^2$
Suppose $s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$ is the i-th sample variance, i = 1, 2 and j = 1, 2, ..., $n_i$.
Then the F statistic is, under H01,
$F = \frac{s_1^2}{s_2^2} \sim F(n_1 - 1, n_2 - 1)$
The test does not reject H01 at the 5% level of significance if $F < F_{0.05}(n_1 - 1, n_2 - 1)$, and only then
do we perform Student’s t-test for testing H0.
The test statistic is given below:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \sim t_{n_1 + n_2 - 2}$
If $|t| > t_{0.05}(n_1 + n_2 - 2)$ we reject H0 at the 5% level of significance.
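The two statistics can be sketched in Python with the standard library as follows. This is an illustration of the formulas above on made-up data; the function names are assumptions, and the critical values would still have to be read from F and t tables (or from R).

```python
import math
from statistics import mean, variance


def variance_ratio(x, y):
    """F = s1^2 / s2^2, using sample variances with n - 1 divisors."""
    return variance(x) / variance(y)


def pooled_t(x, y):
    """Student's two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Made-up samples of size 5 each.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
f_stat = variance_ratio(x, y)
t_stat = pooled_t(x, y)
print(round(f_stat, 2), round(t_stat, 4), len(x) + len(y) - 2)
```

Here the sample variances are 2.5 and 10, the pooled variance is 6.25 and t is about -1.90 on 8 degrees of freedom.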
Chapter 3
STUDY ON AWARENESS OF CUSTOMERS
3.1 Introduction
The facility of online purchasing has allowed customers to identify the different types of
products available in the global market. Due to rapid globalization, all types of products are
available on the internet. Consumer durables, books, audio and video
cassettes, and services such as air tickets can all be purchased online.
In this era of fast-moving lifestyles, customers are busier than they were a few years
back. It is precisely for this reason that customers are also purchasing products and
services through online shopping. The marketplace is fast turning into an e-marketplace, so
customer awareness is very important for getting the maximum benefit and being least affected by
issues such as online fraud.
In our study we asked each informant to rate some factors according to their preference;
these factors measure each respondent’s awareness level towards online shopping.
The questions are put separately in table T3 of the questionnaire, which is attached in the appendix.
The awareness of an e-shopper mostly depends on factors such as the reputation of the e-seller,
guarantee and warranty, privacy and security, advertisements, and the impact of reviews and ratings.
Finally the awareness scores are obtained by summing these scores in a particular way
(explained in Terminology).
3.2 Test for Normality of awareness score
Thus the data deviate somewhat from normality, so it is preferable to
base conclusions on the non-parametric tools, even though both parametric and non-parametric tests are applied.
3.3 Awareness towards online shopping with Gender
In this section our interest is to test whether the awareness score varies with gender.
H0 : The mean awareness score of customers is identical for males and females.
H1 : The mean awareness score of customers is not identical for males and females.
Let us first check, by the F test, whether the variances of the awareness score are significantly
different for males and females. R yields the following output.
F test to compare two variances
data: awarescore[Gender == "m"] and awarescore[Gender == "f"]
F = 1.2198, num df = 100, denom df = 48, p-value = 0.4482
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7304366 1.9516958
sample estimates:
ratio of variances
1.219822
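Thus the F test shows that the awareness score variances do not differ significantly between the male and
female populations, so we can use the two-sample t-test for testing H0 against H1. (The output below is
R’s default Welch form of the test, which does not assume equal variances.)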
Welch Two Sample t-test
data: awarescore[Gender == "m"] and awarescore[Gender == "f"]
t = -1.2207, df = 104.12, p-value = 0.2249
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.627605 0.625180
sample estimates:
mean of x mean of y
45.03960 46.04082
As the p-value is not significant, we do not reject H0 at the 5% level of significance.
Thus the mean awareness scores for males and females are not significantly different.
We also applied the Mann-Whitney U test; the R output is as below.
Wilcoxon rank sum test with continuity correction
data: awarescore by Gender
W = 2769.5, p-value = 0.2369
alternative hypothesis: true location shift is not equal to 0
As the p-value is not significant, we do not reject H0. Here one can claim that the median awareness
scores for males and females are not significantly different at the 5% level of significance.
For the two-sample Kolmogorov-Smirnov test we state the null and alternative hypotheses
as below.
H0 : Awareness scores for males and females have the same unknown distribution.
H1 : Awareness scores for males and females have different distributions.
R yields the following output.
Two-sample Kolmogorov-Smirnov test
data: awarescore[Gender == "m"] and awarescore[Gender == "f"]
D = 0.15619, p-value = 0.3967
alternative hypothesis: two-sided
Here we have no evidence against H0, so we do not reject H0 at the 5% level of significance.
3.4 Awareness towards online shopping with Residency
In this section we test whether the awareness score varies with the residency
of the customer.
H0 : The mean awareness score of customers is identical in the rural/town and urban populations.
H1 : The mean awareness score varies with residency.
First we check, by the F test, whether the variances of the awareness score are significantly
different for customers belonging to rural/town and urban areas. R yields the
following output.
F test to compare two variances
data: awarescore[Residency == "Urban"] and awarescore[Residency == "Rural/Town"]
F = 1.2906, num df = 53, denom df = 95, p-value = 0.2789
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.8122947 2.1191771
Sample estimates:
ratio of variances
1.29058
Thus the F test shows that the awareness score variances do not differ significantly between the two
residency groups, so we can use the two-sample t-test for testing H0 against H1.
Welch Two Sample t-test
data: awarescore[residency == "b"] and awarescore[residency == "a"]
t = 0.81259, df = 98.798, p-value = 0.4184
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.009676 2.410139
sample estimates:
mean of x mean of y
45.81481 45.11458
As the p-value is not significant, we do not reject H0 at the 5% level of significance.
The mean awareness scores of the rural/town and urban populations are not significantly different.
For the Mann-Whitney U test, the R output is as below.
Wilcoxon rank sum test with continuity correction
data: awarescore by residency
W = 2379, p-value = 0.4044
alternative hypothesis: true location shift is not equal to 0
As the p-value is not significant, we do not reject H0: the median awareness scores for rural/town
and urban customers are not significantly different at the 5% level of significance.
For the two-sample Kolmogorov-Smirnov test we state the null and alternative hypotheses
as below.
H0 : Awareness scores for rural/town and urban customers have the same
unknown distribution.
H1 : Awareness scores for the two groups come from different distributions.
R yields the following output.
Two-sample Kolmogorov-Smirnov test
data: awarescore[residency == "b"] and awarescore[residency == "a"]
D = 0.16782, p-value = 0.2847
alternative hypothesis: two-sided
Here too we have no evidence against H0, so we do not reject H0 at the 5% level
of significance.
3.5 Awareness with Education level
Now we check whether the awareness score varies with the education level of the
customer: +2/Diploma, Graduation, Master and higher degrees.
H0 : The mean awareness score is the same for customers at every education level.
H1 : The mean awareness score varies with the level of education.
R gives the following output for ANOVA.
aov(formula = awarescore ~ edu)
Df Sum Sq Mean Sq F value Pr(>F)
edu 3 30 9.952 0.412 0.744
Residuals 146 3523 24.130
Thus we do not reject H0, since there is no evidence against it.
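As a quick arithmetic check of the ANOVA table, the F value is the ratio of the two mean squares, and the degrees of freedom add up to one less than the sample size. The Python lines below only re-derive those figures from the numbers printed in the table; the variable names are chosen for this sketch.

```python
# Values copied from the ANOVA table for education level.
ms_between, ms_within = 9.952, 24.130   # "Mean Sq" column
df_between, df_within = 3, 146          # "Df" column

f_value = ms_between / ms_within
n_obs = df_between + df_within + 1      # total d.f. is N - 1

print(round(f_value, 3))  # 0.412, as in the "F value" column
print(n_obs)              # 150, the sample size of the study
```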
Since the scores are sums of ranked responses, the conditions for ANOVA may be violated. So the
Kruskal-Wallis H test, a non-parametric alternative to one-way analysis of variance, is also used,
as below.
Kruskal-Wallis rank sum test
data: awarescore by Edu
Kruskal-Wallis chi-squared = 1.6646, df = 3, p-value = 0.6448
The null hypothesis is not rejected by the Kruskal-Wallis H test either.
Thus awareness towards online shopping is almost the same for customers with
different education levels.
3.6 Awareness with Stream of education
In this section we test whether the awareness score varies with
educational stream: Arts, Science and Professional.
H0 : The mean awareness score of customers is identical in all streams.
H1 : The mean awareness score varies with stream.
First we carry out ANOVA. R gives the following output.
aov(formula = awarescore ~ stream)
Df Sum Sq Mean Sq F value Pr(>F)
stream 2 38 19.09 0.798 0.452
Residuals 147 3515 23.91
The p-value is not significant, so there is no evidence against the null hypothesis and we
do not reject H0 at the 5% level of significance.
As the scores are obtained by summing ordered values, the conditions for
ANOVA may be violated. So the Kruskal-Wallis H test, a non-parametric alternative to one-way
analysis of variance, is also used, as below.
Kruskal-Wallis rank sum test
data: awarescore by factor(stream)
Kruskal-Wallis chi-squared = 1.5512, df = 2, p-value = 0.4604
As the p-value > 0.05, we do not reject the null hypothesis; the awareness score does not vary
significantly across streams.
3.7 Awareness with Income groups
Here we test whether the awareness score varies across monthly family
income groups: below 20000, 20000 to 50000, 50000 to 75000, 75000 to 1
Lakh and above 1 Lakh.
H0 : The mean awareness score of customers is identical in all income categories.
H1 : The mean awareness score varies across income groups.
First we use ANOVA. R gives the following output.
aov(formula = awarescore ~ income)
Df Sum Sq Mean Sq F value Pr(>F)
income 4 16 4.082 0.167 0.955
Residuals 145 3537 24.390
Thus we do not reject H0, since there is no evidence for its rejection.
Since the score data may deviate from the
assumptions of ANOVA, we also apply the usual non-parametric alternative to one-way analysis
of variance, the Kruskal-Wallis H test, to the data. R displays the following output.
Kruskal-Wallis rank sum test
data: awarescore by Income
Kruskal-Wallis chi-squared = 0.70578, df = 4, p-value = 0.9506
The Kruskal-Wallis H test also gives no reason to reject H0 in favour of H1.
So we conclude that customers in all income groups have almost equal awareness.
3.8 Awareness with two groups: ‘interested’ and ‘not interested’
We now look at the awareness level of customers who wish to continue e-shopping
in future and of those who do not. The hypotheses are
H0 : The awareness score does not differ between the two groups of customers (willing and not willing
to continue online shopping in future).
H1 : The awareness score differs between the two groups of customers.
Since the ‘not interested’ group is small (size 7), we use the Mann-Whitney U test. R
yields the following output.
Wilcoxon rank sum test with continuity correction
data: awarescore[Entry_out_1_0$continue == 1] and awarescore[Entry_out_1_0$continue == 0]
W = 575, p-value = 0.5087
alternative hypothesis: true location shift is not equal to 0
Thus the test gives no reason to reject H0, i.e. the two groups of customers do not differ significantly
in their awareness of online shopping.
We cannot use the two-sample t-test, because the variation in the two groups is different,
as can be seen in the boxplot of the two groups.
Even though the medians do not differ significantly, the boxplot suggests that customers not
willing to continue tend to have lower awareness.
3.9 Conclusions
From the analysis carried out throughout the chapter, we arrive at the following
conclusions.
The awareness score does not vary significantly between males and females. Thus
customers have almost the same awareness score irrespective of gender.
The awareness score does not vary between rural/town and urban customers.
That is, e-shoppers from either kind of residency have almost similar awareness
of online shopping.
There is no significant difference in awareness score between customers with different
education qualifications. Thus customers have almost similar awareness of
online shopping, irrespective of education level. (In our study, the respondents
hold at least a +2/diploma qualification.)
There is no significant difference in awareness score between customers from different
educational streams (arts, science and professional). Thus customers from all
streams exhibit almost similar awareness of online shopping.
There is no significant difference in awareness score between
customers of different family income. Thus, irrespective of family income,
e-customers are almost equally aware of online shopping.
There is no significant difference in awareness level between customers who are
willing to continue e-shopping in future and those who are not.
If we consider the variation in awareness score within these two groups, however, the
customers who are not willing to continue appear to have lower
awareness. We look at the satisfaction level of the same two groups in the next chapter.
Chapter 4
Study on satisfaction of customers towards online shopping
4.1 Introduction
With the rapid global growth in electronic commerce (e-commerce), businesses are
attempting to gain a competitive advantage by using e-commerce to interact with
customers. Growing numbers of consumers shop online to purchase goods and
services, gather product information or even browse for enjoyment. Online shopping
environments are therefore playing an increasing role in the overall relationship
between marketers and their consumers (Koo et al. 2008). That is, consumer purchases
are mainly based on the cyberspace appearance of the product, such as pictures, images,
quality information and video clips, not on actual experience of it.
If a customer wants to purchase something online, plenty of online providers are
available, and multiple brands are available for a single product. Consumer
satisfaction is therefore very important for keeping a business healthy.
In the previous chapter we examined how awareness of e-shopping changes with
factors such as gender, residency, income and education. Here we examine
satisfaction against the same factors.
4.2 Normality test of satisfaction score
From the graph we conclude that the satisfaction scores are approximately normally distributed.
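A graphical check of this kind can be complemented by a formal test. The sketch below is an illustration only: the source does not say which graph or test was used, and `sscore_sim` is a hypothetical simulated vector standing in for the 150 satisfaction scores.

```r
# Hypothetical satisfaction scores; the survey data are not reproduced here.
set.seed(2)
sscore_sim <- rnorm(150, mean = 63, sd = 7.6)

# Shapiro-Wilk test: H0 is that the scores come from a normal distribution.
sw <- shapiro.test(sscore_sim)
print(sw)

# A normal Q-Q plot gives the graphical counterpart (uncomment to draw it):
# qqnorm(sscore_sim); qqline(sscore_sim)
```

A large p-value means the test finds no evidence against normality; it does not prove that the scores are normal.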
4.3 Satisfaction towards online shopping with Gender
In this section our interest is to test whether the satisfaction score varies with gender.
H0 : Mean satisfaction scores of male and female customers are identical.
H1 : Mean satisfaction scores of male and female customers are not identical.
Let us first check, by the F test, whether the variances of satisfaction score differ significantly between male and female customers. R yields the following output.
F test to compare two variances
data: sscore[Gender == "f"] and sscore[Gender == "m"]
F = 1.2678, num df = 48, denom df = 100, p-value = 0.3204
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7923922 2.1172386
sample estimates:
ratio of variances
1.267814
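Thus the F test shows that the variances of satisfaction score do not differ significantly between the male and female populations, so we can use the two sample t-test for testing H0 against H1. (The output below is from R's default Welch version of the test, which does not assume equal variances.)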
Welch Two Sample t-test
data: sscore[Gender == "f"] and sscore[Gender == "m"]
t = -3.8227, df = 85.738, p-value = 0.0002495
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-7.629189 -2.408799
sample estimates:
mean of x mean of y
59.69388 64.71287
As the p-value is less than 0.05, we reject H0 at the 5% level of significance.
Thus the mean satisfaction scores of male and female customers are significantly different; the sample mean is higher for males (64.71) than for females (59.69).
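The sequence of an F test followed by a t-test can be sketched as below. This is a hedged illustration on simulated data: `score_f` and `score_m` are hypothetical vectors with the same group sizes as the output above (49 and 101). It also shows why the reported degrees of freedom (85.738) indicate the Welch test: a pooled-variance test on these group sizes has exactly 49 + 101 - 2 = 148 degrees of freedom.

```r
# Hypothetical scores with the group sizes seen in the R output (49 f, 101 m).
set.seed(3)
score_f <- rnorm(49,  mean = 60, sd = 8.5)
score_m <- rnorm(101, mean = 65, sd = 7.5)

f_res    <- var.test(score_f, score_m)                  # F test for equal variances
t_pooled <- t.test(score_f, score_m, var.equal = TRUE)  # classical pooled-variance t-test
t_welch  <- t.test(score_f, score_m)                    # Welch t-test (R's default)

# Degrees of freedom of the two t-tests
c(pooled_df = unname(t_pooled$parameter), welch_df = unname(t_welch$parameter))
```

When the F test does not reject equality of variances, `var.equal = TRUE` gives the classical test; the Welch version remains valid either way, at the cost of slightly fewer degrees of freedom.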
We also applied the Mann-Whitney U test; the R output is given below.
Wilcoxon rank sum test with continuity correction
data: sscore[Gender == "f"] and sscore[Gender == "m"]
W = 1555, p-value = 0.0002268
alternative hypothesis: true location shift is not equal to 0
As the p-value is less than 0.05, we reject H0. One can claim that the median satisfaction scores of
male and female customers are significantly different at the 5% level of significance.
For the two sample Kolmogorov-Smirnov test we state the null and alternative hypotheses as below.
H0 : Satisfaction scores for male and female customers have the same unknown distribution
H1 : Satisfaction scores for male and female customers have different distributions
R yields the following output.
Two-sample Kolmogorov-Smirnov test
data: sscore[Gender == "f"] and sscore[Gender == "m"]
D = 0.28188, p-value = 0.01057
alternative hypothesis: two-sided
Here also we reject the null hypothesis.
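The statistic D reported by the Kolmogorov-Smirnov test is the largest vertical gap between the two empirical distribution functions. The sketch below illustrates this on simulated data; `score_f` and `score_m` are hypothetical vectors, not the survey scores.

```r
# Hypothetical scores with the group sizes seen in the R output (49 f, 101 m).
set.seed(4)
score_f <- rnorm(49,  mean = 60, sd = 8.5)
score_m <- rnorm(101, mean = 65, sd = 7.5)

ks <- ks.test(score_f, score_m)   # two-sided two-sample test
print(ks)

# D recomputed by hand: maximum absolute difference between the two
# empirical CDFs, evaluated at every observed score.
pooled   <- sort(c(score_f, score_m))
D_manual <- max(abs(ecdf(score_f)(pooled) - ecdf(score_m)(pooled)))
D_manual
```

Because D looks at the whole distribution, the test can reject H0 for differences in spread or shape as well as in location.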
4.4 Satisfaction towards online shopping and Residency
In this section we test whether the satisfaction score varies with the residency
of the customer.
H0 : Mean satisfaction scores of customers are identical in the Rural/Town and Urban populations.
H1 : Mean satisfaction score varies with residency.
First we check, by the F test, whether the variances of satisfaction score differ significantly
between customers belonging to Rural/Town and Urban areas. R yields the following output.
F test to compare two variances
data: sscore[residency == "b"] and sscore[residency == "a"]
F = 1.2127, num df = 53, denom df = 95, p-value = 0.4109
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7632736 1.9912871
sample estimates:
ratio of variances
1.212695
Thus the F test shows that the variances of satisfaction score do not differ significantly between the two residential
groups, so we can use the two sample t-test for testing H0 against H1.
Welch Two Sample t-test
data: sscore[residency == "b"] and sscore[residency == "a"]
t = 0.56677, df = 101.4, p-value = 0.5721
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.883585 3.390529
sample estimates:
mean of x mean of y
63.55556 62.80208
As the p-value is greater than 0.05, we do not reject H0 at the 5% level of significance.
Thus the mean satisfaction scores of the Rural/Town and Urban populations are not significantly different.
For the Mann-Whitney U test, the R output is as below.
Wilcoxon rank sum test with continuity correction
data: sscore[residency == "b"] and sscore[residency == "a"]
W = 2674, p-value = 0.7494
alternative hypothesis: true location shift is not equal to 0
As the p-value is greater than 0.05, we do not reject H0: the median satisfaction scores of rural/town
and urban customers are not significantly different at the 5% level of significance.
For the two sample Kolmogorov-Smirnov test we state the null and alternative hypotheses as below.
H0 : Satisfaction scores for rural/town and urban residential customers have the same
unknown distribution
H1 : Satisfaction scores come from different distributions
R yields the following output.
Two-sample Kolmogorov-Smirnov test
data: sscore[residency == "b"] and sscore[residency == "a"]
D = 0.096065, p-value = 0.9073
alternative hypothesis: two-sided
Here also we have no evidence against H0, so we do not reject H0 at the 5% significance
level.
4.5 Satisfaction with Education level
Now we check whether the satisfaction score varies with the education level of the
customer: +2/Diploma, Graduation, Master and Higher degrees.
H0 : Mean satisfaction score is the same for customers at every education level.
H1 : Mean satisfaction score varies with level of education.
R gives the following output for ANOVA.
aov(formula = sscore ~ Edu)
Df Sum Sq Mean Sq F value Pr(>F)
Edu 3 181 60.21 1.046 0.374
Residuals 146 8402 57.54
Thus we do not reject H0, since there is no evidence against it.
The conditions for ANOVA may sometimes be violated for score data of this kind. So a
non-parametric alternative to one way analysis of variance, the Kruskal Wallis H test, is
also used, as below.
Kruskal-Wallis rank sum test
data: sscore by Edu
Kruskal-Wallis chi-squared = 3.4159, df = 3, p-value = 0.3318
The Kruskal Wallis H test also gives no reason to reject the null hypothesis.
Thus satisfaction with online shopping is almost the same for customers with
different education levels.
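The two tests of this section can be run with calls of the following form. This is a sketch on simulated data only: the factor `edu`, its level labels, the group sizes and the vector `score` are all hypothetical, chosen so that there are four groups and 150 respondents as in the output above.

```r
# Hypothetical data: four education groups, 150 respondents in total.
set.seed(5)
edu   <- factor(rep(c("plus2", "graduate", "master", "higher"),
                    times = c(20, 60, 50, 20)))
score <- rnorm(150, mean = 63, sd = 7.6)

fit <- aov(score ~ edu)           # one-way ANOVA
summary(fit)                      # Df column: 3 between groups, 146 residual

kw <- kruskal.test(score ~ edu)   # rank-based alternative
print(kw)
```

With k groups and n observations, the ANOVA F statistic has k - 1 and n - k degrees of freedom, and the Kruskal Wallis statistic is referred to a chi-squared distribution with k - 1 degrees of freedom.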
4.6 Satisfaction with Income groups
Here we test whether the satisfaction score varies across monthly family
income groups: below 20000, 20000 to 50000, 50000 to 75000, 75000 to 1
Lakh and above 1 Lakh.
H0 : Mean satisfaction scores of customers are identical in all income categories.
H1 : Mean satisfaction score varies with income group.
First we use ANOVA. R gives the following output.
aov(formula = sscore ~ income)
Df Sum Sq Mean Sq F value Pr(>F)
Income 4 265 66.25 1.155 0.333
Residuals 145 8317 57.36
Thus we do not reject H0, since there is no evidence against it.
Since the score data may deviate from the assumptions of ANOVA, we also apply the
usual non-parametric alternative to one way analysis of variance, the Kruskal Wallis H
test, to the data. R displays the following output.
Kruskal-Wallis rank sum test
data: sscore by Income
Kruskal-Wallis chi-squared = 3.2958, df = 4, p-value = 0.5096
The Kruskal Wallis H test also gives no reason to reject H0 in favour of H1.
So we conclude that customers in all income groups have almost equal satisfaction with
online shopping.
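The p-values in sections 4.5 and 4.6 can be recovered from the reported test statistics and degrees of freedom. The short check below copies those numbers from the R output above; only the variable names are new.

```r
# Upper-tail probabilities of the reported statistics.
p_aov_edu    <- pf(1.046, df1 = 3, df2 = 146, lower.tail = FALSE)  # ANOVA, education
p_aov_income <- pf(1.155, df1 = 4, df2 = 145, lower.tail = FALSE)  # ANOVA, income
p_kw_edu     <- pchisq(3.4159, df = 3, lower.tail = FALSE)         # Kruskal Wallis, education
p_kw_income  <- pchisq(3.2958, df = 4, lower.tail = FALSE)         # Kruskal Wallis, income

round(c(aov_edu = p_aov_edu, aov_income = p_aov_income,
        kw_edu = p_kw_edu, kw_income = p_kw_income), 4)
```

All four values exceed 0.05, which is why neither test rejects H0 for education or income.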
4.7 Satisfaction with two groups - ‘interested’ and ‘not-interested’
In the previous chapter we found that awareness of e-shopping is almost the same for the two
groups of customers, those willing and those not willing to continue it in future. Now we focus
on how satisfied the two groups are with e-shopping. The hypotheses are
H0 : Satisfaction score does not differ between the two groups of customers (willing and not willing
to continue online shopping in future)
H1 : Satisfaction score differs between the two groups of customers.
Since the ‘not interested’ group is small (n = 7), we use the Mann-Whitney U test. R yields the
following output.
Wilcoxon rank sum test with continuity correction
data: sscore[Entry_out_1_0$continue == 1] and sscore[Entry_out_1_0$continue == 0]
W = 810.5, p-value = 0.005761
alternative hypothesis: true location shift is not equal to 0
Thus the test gives evidence against H0, so we reject H0 in favour of H1.
The mean satisfaction score of the ‘not-interested’ customer group is less than that of customers who are interested in e-shopping.
4.8 Conclusions
Throughout the chapter we carried out several tests, and the following are the
conclusions we obtained.
Satisfaction score varies significantly between males and females. The sample
mean satisfaction score is higher for male customers (64.71) than for female customers (59.69).
Satisfaction score does not vary significantly between rural/town and urban customers.
That is, e-shoppers from either type of residency have almost similar satisfaction
with online shopping.
There is no significant difference in satisfaction score among customers with different
educational qualifications. Thus customers have almost similar satisfaction with
online shopping irrespective of education level. (In our study, all respondents
hold at least a +2/diploma qualification.)
There is no significant difference in satisfaction score among customers of different
family income groups. Thus, irrespective of family income, e-customers are
almost equally satisfied with online shopping.
There is a significant difference in satisfaction level between customers who
are willing to continue e-shopping in future and those who are not.
Customers who are not interested in shopping online in the near future have a lower mean
satisfaction score than interested customers.
One can claim that even when customers are aware, their satisfaction is important for any
e-store that wants to keep its market healthy.
Chapter 5
CLASSIFICATION BY MEANS OF REGRESSION AND DISCRIMINANT
ANALYSIS
5.1 Introduction
Classification is a powerful tool in machine learning that assigns a categorical
response to a class on the basis of information provided by several explanatory variables.
It would be valuable for any e-store to understand the type of a customer
from his or her past records.
In our study we used binary logistic regression to separate customers into two classes,
‘efficient customer’ and ‘less efficient customer’. The response variable is first
constructed from several variables described in Terminology.
Multinomial logistic regression is then used to group customers into three
classes: ‘ideal’, ‘ordinary’ and ‘low profile’. These are also based on several
variables, as discussed in Terminology.
In both cases, we fitted the model with all the variables from which the response was
constructed. We then simplified the model by removing non-significant
explanatory variables, provided the simpler model did not lose much precision.
Finally we used linear discriminant analysis as a counterpart to the above two regression models,
so that the models can be compared with one another in terms of precision.
The confusion matrix and misclassification rate of each model indicate how reliable the model is.
Some regression graphs are also presented to show whether a variable is
significant for a particular model.
5.2 Binary Logistic Regression of Outcome Y on explanatory variables U, V, W and X
Y : Outcome, a new binary variable taking the value 1 (efficient customer) if the customer is a
frequent internet user, spends not less than 500 in a single purchase, has been shopping online
for more than 2 years and is willing to purchase in future; otherwise 0 (not efficient).
U : Number of years the customer has been purchasing online.
Ordinal, with ordered levels 1, 2, 3 & 4
V : Average spending in a single purchase. Factor with levels a, b, c & d
W : Frequency of internet usage. Ordinal, with levels 1, 2, 3 & 4
X : Wish to continue online shopping in future, 1 if yes and 0 otherwise.
If p is P(Y=1), then the model becomes
$\operatorname{logit}(p) = \alpha + \beta u + \delta_i v + \sigma w + \lambda x, \quad i = b, c, d$
Here level ‘a’ of V is the reference level, so it has no coefficient of its own.
The link function is
$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right)$
R yields the following summary of the model.
glm(formula = out ~ u + w + x, family = "binomial", data = Dat)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.7494 -0.1687 -0.0105 -0.0101 0.6665
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -39.42289 2655.79412 -0.015 0.988
u 5.63671 1.15708 4.871 1.11e-06 ***
w 0.05184 1.33980 0.039 0.969
x 23.85253 2655.78879 0.009 0.993
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 150.121 on 149 degrees of freedom
Residual deviance: 30.496 on 146 degrees of freedom
AIC: 38.496
Number of Fisher Scoring iterations: 18
This full model shows that some of the variables have no significant effect on Y.
So we simplify the model by backward elimination, and
the simplest model obtained is given below.
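Backward elimination can be automated as in the sketch below. It is an illustration under stated assumptions, not the procedure of this study: the data frame `Dat_sim` is simulated, and `step()` drops terms by AIC, whereas the source does not say which criterion was used.

```r
# Hypothetical data in which only u drives the outcome.
set.seed(6)
n <- 150
u <- sample(1:4, n, replace = TRUE)      # years of online purchasing (ordinal code)
w <- sample(1:4, n, replace = TRUE)      # frequency of internet usage (ordinal code)
x <- rbinom(n, 1, 0.95)                  # willing to continue (1 = yes)
out <- rbinom(n, 1, plogis(-6 + 2 * u))  # simulated outcome
Dat_sim <- data.frame(out, u, w, x)

full    <- glm(out ~ u + w + x, family = binomial, data = Dat_sim)
reduced <- step(full, direction = "backward", trace = 0)  # AIC-based elimination

formula(reduced)
c(AIC_full = AIC(full), AIC_reduced = AIC(reduced))
```

In the full model above, the very large standard errors of the intercept and x (about 2656) suggest that x almost perfectly separates part of the data, which is a further reason to prefer the reduced model.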
Logit(p) = α + βu
glm(formula = out ~ u, family = "binomial", data = Dat)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.1870 -0.2318 -0.2318 -0.0267 0.8854
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -12.280 2.263 -5.426 5.75e-08 ***
u 4.338 0.817 5.310 1.10e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 150.121 on 149 degrees of freedom
Residual deviance: 45.176 on 148 degrees of freedom
AIC: 49.176
Number of Fisher Scoring iterations: 7
Thus the model is
$\operatorname{logit}(p) = -12.28 + 4.338u$
$\hat{p} = \frac{e^{y}}{1 + e^{y}}$, where $y = -12.28 + 4.338u$
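The fitted probabilities for the four levels of u follow directly from this equation. The sketch below uses the two estimated coefficients reported above; the 0.5 cut-off for classification is an assumption, since the source does not state the threshold used.

```r
alpha <- -12.28   # estimated intercept
beta  <- 4.338    # estimated coefficient of u
u     <- 1:4      # ordered levels of u

y     <- alpha + beta * u
p_hat <- plogis(y)            # same as exp(y) / (1 + exp(y))
round(p_hat, 4)

# Predicted class with an assumed cut-off of 0.5
as.integer(p_hat > 0.5)       # 0 0 1 1
```

Under this cut-off a customer is classified as ‘efficient’ exactly when u is 3 or 4.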
Confusion Matrix
Predicted
actual 0 1
0 115 5
1 0 30
Misclassification error is 3.333333 %
The following graphs show the significance of the different variables for the outcome.
5.3 Multinomial Logistic Regression of customer type Y on several explanatory variables given below
Y→New categorical variable for the type of customer, with 3 categories: ‘ideal’, ‘ordinary’
and ‘lowprofile’.
This variable is constructed from different variables, such as how satisfied the customer is with the
current practice of online shopping, how confident he is in the digital payment system,
and so on.
P→factor, past years since purchasing online
Q→factor, average spending in single purchase
R→factor, spent in last purchase
S→ordinal, whether the customer using price comparison sites
T→ordinal, for preference to read review/ratings
U→ordinal, for preference to share own review/ratings
V→ordinal, whether feeling safe/secure in online shopping
W→ordinal, agreement level at hesitation for online payment
X→ordinal, agreement level for overall satisfaction
Z→numeric, (yes=1 or no=0 to continue shopping online in future)
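The model thus becomes
$\log\left(\frac{p_2}{p_1}\right) = \alpha + \beta_i p + \beta_j q + \beta_k r + \delta_1 s + \delta_2 t + \delta_3 u + \delta_4 v + \delta_5 w + \delta_6 x + \sigma z$
$\log\left(\frac{p_3}{p_1}\right) = \alpha' + \beta'_i p + \beta'_j q + \beta'_k r + \delta'_1 s + \delta'_2 t + \delta'_3 u + \delta'_4 v + \delta'_5 w + \delta'_6 x + \sigma' z$
where i, j, k = b, c, d
$p_1$ = P(customer is ‘ideal’)
$p_2$ = P(customer is ‘lowprofile’)
$p_3$ = P(customer is ‘ordinary’)
Here $p_1 + p_2 + p_3 = 1$, and R has taken ‘ideal’ as the reference level, so $p_1$ appears in the
denominator.
The full model is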
(Intercept) pb pc pd qb qc qd rb
lowprofile 120.6 -70.6 -47.20 -114.60 -67.41 -10.180 -178.90 20.13
ordinary 128.5 -70.8 -44.99 13.51 -65.28 -5.829 -96.74 20.58
rc rd s t u v w x z
lowprofile 56.16 82.16 13.66 99.82 15.30 -31.26 26.19 -65.09 67.41
ordinary 55.61 79.86 12.13 101.30 16.71 -28.33 20.44 -63.01 65.00
Residual Deviance: 35.16687 AIC: 103.1669
Wald Test for significance of coefficients
To assess the significance of the different coefficients, the Wald test gives p-values that are
calculated from the estimates and their standard errors.
(Intercept) pb pc pd qb qc qd rb rc
lowprofile 0 1.045e-09 0 NaN 0 0.1069 0.000e+00 0 4.570e-09
ordinary 0 9.291e-10 0 0 0 0.3561 3.672e-07 0 6.354e-09
rd s t u v w x z
lowprofile 0 0.3235 0 0.5426 0.4987 0.3958 0.04770 6.058e-09
ordinary 0 0.3809 0 0.5063 0.5397 0.5075 0.05522 2.061e-08
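A Wald p-value is obtained by dividing an estimate by its standard error and referring the ratio to the standard normal distribution. The standard errors of the multinomial model are not listed in this report, so the sketch below checks the calculation on two coefficients of the binary model in section 5.2, whose estimates and standard errors are given; the helper name `wald_p` is hypothetical.

```r
# Two-sided Wald p-value from an estimate and its standard error.
wald_p <- function(est, se) {
  z <- est / se
  2 * pnorm(-abs(z))
}

wald_p(5.63671, 1.15708)       # coefficient of u, reported as 1.11e-06
wald_p(23.85253, 2655.78879)   # coefficient of x, reported as 0.993

# For a model fitted with nnet::multinom (an assumption; the source does not
# name the function), the same calculation could be applied to
# summary(fit)$coefficients and summary(fit)$standard.errors.
```

A p-value of exactly 0 alongside very large coefficients, or NaN, as in the table above, usually signals numerically unstable standard errors rather than overwhelming evidence, so such entries should be read with care.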
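Now, removing some non-significant variables such as S and U, and introducing an interaction
between V and W (feeling safe/secure and hesitation at payment), we get a simpler and more
precise model:
$\log\left(\frac{p_2}{p_1}\right) = \alpha + \beta_i p + \beta_j q + \beta_k r + \delta_1 t + \delta_2 v + \delta_3 w + \delta_4 x + \sigma z + \lambda\,(v \times w) \qquad (1)$
$\log\left(\frac{p_3}{p_1}\right) = \alpha' + \beta'_i p + \beta'_j q + \beta'_k r + \delta'_1 t + \delta'_2 v + \delta'_3 w + \delta'_4 x + \sigma' z + \lambda'\,(v \times w) \qquad (2)$
where i, j, k = b, c, d
If the right-hand side of equation (1) is $y_1$ and that of (2) is $y_2$, then the estimated probabilities are
$\hat{p}_1 = \frac{1}{1 + e^{y_1} + e^{y_2}}$
$\hat{p}_2 = \frac{e^{y_1}}{1 + e^{y_1} + e^{y_2}}$
$\hat{p}_3 = \frac{e^{y_2}}{1 + e^{y_1} + e^{y_2}}$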
(Intercept) pb pc pd qb qc qd rb
lowprofile -203.4 -58.44 -34.43 -164.70 -69.17 -10.44 -237.0 12.19
ordinary 373.2 -58.77 -33.76 57.36 -40.89 23.12 -75.2 12.38
rc rd t v w x z v:w
lowprofile -30.23 76.98 41.94 98.60 170.00 -108.60 39.51 -37.01
ordinary 11.83 70.84 52.22 -64.85 -52.53 -54.38 35.25 18.02
Confusion matrix
Predicted
actual ideal lowprofile ordinary
ideal 4 0 0
lowprofile 0 32 1
ordinary 0 1 112
Misclassification percent is 1.33, which is very low.
In terms of probability, the first four predictions of this model are
ideal lowprofile ordinary
1 0 0.4208 0.5792
2 0 1.0000 0.0000
3 0 0.0000 1.0000
4 0 0.0000 1.0000
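Class probabilities of this kind follow from the two linear predictors, y1 = log(p2/p1) for ‘lowprofile’ and y2 = log(p3/p1) for ‘ordinary’, with ‘ideal’ as the reference class. The function below is a minimal sketch of that conversion; the name `multinom_probs` is hypothetical, and the shift by the maximum is added only to avoid overflow, which matters here because the fitted coefficients run into the hundreds.

```r
# Convert the two log-odds (relative to 'ideal') into three probabilities.
multinom_probs <- function(y1, y2) {
  eta <- c(ideal = 0, lowprofile = y1, ordinary = y2)
  w   <- exp(eta - max(eta))   # subtracting the maximum keeps exp() finite
  w / sum(w)
}

multinom_probs(0, 0)       # equal log-odds: each class has probability 1/3
multinom_probs(1000, 0)    # very large y1: 'lowprofile' gets probability 1
```

Predictions that are exactly 0 or 1, as in rows 2 to 4 above, arise when one linear predictor dominates the others by a wide margin.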
5.4 Linear Discriminant Analysis (LDA) of outcome on U, V and X
Y→Outcome, a new binary variable taking the value 1 (efficient customer) if the customer
is a frequent internet user, spends not less than 500 in a single purchase, has been shopping online
for more than 2 years and is willing to purchase in future; otherwise 0 (not efficient).
U→Number of years the customer has been purchasing online.
Ordinal, with ordered levels 1, 2, 3 & 4
V→Average spending in a single purchase. Factor with levels a, b, c & d
X→Wish to continue online shopping in future, 1 if yes and 0 otherwise.
R gives the following output for the discriminant analysis.
lda(out ~ u + v + x, data = Dat)
Prior probabilities of groups:
0 1
0.8 0.2
Group means:
u v x
0 1.691667 2.150000 0.9416667
1 3.400000 2.333333 1.0000000
Coefficients of linear discriminants:
LD1
u 1.71759540
v -0.04364488
x 0.98842355
Confusion matrix
predict
actual 0 1
0 116 4
1 0 30
Misclassification percent is 2.67 (4 of 150).
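A discriminant analysis of this form can be fitted with `lda()` from the MASS package. The sketch below uses simulated data: the data frame `sim` and its values are hypothetical, constructed only to match the group sizes (120 and 30) and the seven ‘not willing’ respondents seen in the output above.

```r
library(MASS)   # provides lda()

set.seed(7)
out <- factor(rep(c(0, 1), times = c(120, 30)))   # 120 'not efficient', 30 'efficient'
u   <- c(sample(1:3, 120, replace = TRUE),        # hypothetical ordinal codes
         sample(3:4, 30,  replace = TRUE))
v   <- sample(1:4, 150, replace = TRUE)           # spending level coded 1 to 4
x   <- rep(1, 150); x[1:7] <- 0                   # 7 respondents not willing to continue
sim <- data.frame(out, u, v, x)

fit  <- lda(out ~ u + v + x, data = sim)
pred <- predict(fit)$class                        # resubstitution predictions
cm   <- table(actual = sim$out, predicted = pred)

fit$prior                                         # class proportions: 0.8 and 0.2
cm
100 * (1 - sum(diag(cm)) / sum(cm))               # misclassification percent
```

By default `lda()` takes the prior probabilities from the class proportions in the data, which is why the output above reports 0.8 and 0.2.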
5.5 Linear Discriminant Analysis (LDA) of Y (customer type) on several ordinal responses
Here the explanatory variables are ordinal or categorical, except Z, which is numeric.
Y→categorical, customer type (dependent variable)
P→categorical, past years since purchasing online (past.f)
Q→ordinal, average spending in single purchase
T→categorical, for preference to read review/ratings (reviewr.f)
V→ordinal, whether feeling safe/secure in online shopping
W→ordinal, agreement level at hesitation for online payment
X→ordinal, agreement level for overall satisfaction
Z→numeric, (yes/no to continue shopping online in future)
R yields the following discriminant output.
lda(y ~ past.f + q + reviewr.f + v * w + x + z, data = Dat2)
Prior probabilities of groups:
ideal lowprofile ordinary
0.02666667 0.22000000 0.75333333
Group means:
past.fb past.fc past.fd q reviewr.fb reviewr.fc v
ideal 1.0000 0.0000 0.0000 2.5 0.00000 0.00000 4.0000
lowprofile 0.6060 0.2121 0.0000 2.06 0.18181 0.03030 2.2424
ordinary 0.4159 0.1238 0.1238 2.21 0.3185 0.053097 2.9026
w x z v:w
ideal 2.2500 3.2500 1.00000 9.000000
lowprofile 4.0909 1.7272 0.96969 9.151515
ordinary 2.6106 2.1504 0.94690 7.769912
Coefficients of linear discriminants:
LD1 LD2
past.fb 0.03967307 1.1541507
past.fc -0.07931444 0.8777495
past.fd 1.54621131 -0.0129210
q 0.39109904 0.2327005
reviewr.fb 0.85215592 -0.5408363
reviewr.fc 0.39822460 -0.8159128
v -0.70579709 1.7595617
w -2.26878996 1.3843756
x 0.48751834 0.9226888
z 0.58574159 1.0723062
v:w 0.45171501 -0.4799412
Proportion of trace:
LD1 LD2
0.8776 0.1224
Confusion matrix
predicted
actual ideal lowprofile ordinary
ideal 4 0 0
lowprofile 0 30 3
ordinary 1 5 107
Misclassification error is 6%.
Here is a neat display of the classification done by the above model.
5.6 Conclusions
The binary logistic regression model reveals that past purchasing behavior plays a significant role in
whether a consumer is ‘efficient’ or ‘less efficient’. The misclassification error is about 3.33%,
which is reasonably low.
The multicategory logistic regression model works well with several significant explanatory
variables, such as past purchase behavior, amount spent in a single purchase and willingness to
continue purchasing online.
Its misclassification percent is only about 1.33, so the model can be regarded as more reliable.
The discriminant analysis model on the same variables as the binary logistic model has a
misclassification percent of 2.67, slightly better than the binary logistic model.
The discriminant analysis model on some of the variables of the multicategory logistic regression
has a misclassification error of about 6%. The grouping is shown by a graph produced in R.
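The four misclassification rates compared above can be recomputed from the confusion matrices of sections 5.2 to 5.5. The counts below are copied from those matrices; the helper `misclass_pct` and the object names are hypothetical. Each matrix sums to 150, the full sample, so these are rates on the same data used to fit the models and may be optimistic for new customers.

```r
# Percentage of off-diagonal (misclassified) cases in a confusion matrix.
misclass_pct <- function(cm) 100 * (1 - sum(diag(cm)) / sum(cm))

cm_logit    <- matrix(c(115, 5,
                          0, 30), nrow = 2, byrow = TRUE)      # section 5.2
cm_multinom <- matrix(c(4,  0,   0,
                        0, 32,   1,
                        0,  1, 112), nrow = 3, byrow = TRUE)   # section 5.3
cm_lda2     <- matrix(c(116, 4,
                          0, 30), nrow = 2, byrow = TRUE)      # section 5.4
cm_lda3     <- matrix(c(4,  0,   0,
                        0, 30,   3,
                        1,  5, 107), nrow = 3, byrow = TRUE)   # section 5.5

rates <- sapply(list(logit = cm_logit, multinom = cm_multinom,
                     lda2 = cm_lda2, lda3 = cm_lda3), misclass_pct)
round(rates, 2)   # 3.33 1.33 2.67 6.00
```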
Department Of Statistics
University of Kerala
CASE STUDY ON STATUS OF ONLINE SHOPPING IN TRIVANDRUM DISTRICT
Questionnaire
A) Personal Data
1. Name (optional) :
2. Gender : Female □ Male □
3. Age (years) :
4. Residency : a) Rural/Town □ b) Urban □
5. Education qualification : a) SSLC □ b) Plus 2/Diploma □
c) Graduation □ d) Master □ e) Professional □
(Please specify the stream for above qualifications)
6. Occupation : a) Student □ b) Teacher/Researcher □
c) IT professional □ d) Engineer/Industrial □
e) Business/Management □ f) Civil service □ g) Other □ (specify)_____
7. Monthly family income (Rs) : a) <20,000 □ b) 20,000-50,000 □
c) 50,000-75,000 □ d) 75,000-1 Lakh □ e) above 1 Lakh □
B) Survey Related Questions
1) Where do you access internet primarily?
a) Mobile □ b) PC □ c) Tablet/iPad □
d) Office/workplace □ e) Others □
2) How often would you use internet in a week (except study & work)?
a) Daily □ b) Once in 2-3 days □ c) Once in a week □ d) Less frequently □
3) When did you last purchase online?
a) Within last week □ b) Within last month □
c) Before a month □ d) Before 3-4 months □ e) Before 6 months □
4) For how many years have you been shopping online?
a) 1 year □ b) 2-3 years □ c) 4-5 years □ d) more than 5 years □
5) On average, how much would you spend in a single purchase? (in Rs)
a) Less than 500 □ b) 500-2000 □
c) 2000-5000 □ d) more than 5000 □
6) Approximately how much did you spend on your last purchase? (in Rs)
a) Less than 500 □ b) 500-2000 □
c) 2000-5000 □ d) more than 5000 □
7) Approximately how much time did you spend on your last purchase?
a) few minutes □ b) 15 to 30 min □
c) 30-60 min □ d) more than 1 hr □
8) Do you have -
Credit/Debit card Yes □ No □
E-banking facility Yes □ No □
9) Which payment method did you use for last purchase?
a) Credit/Debit card □ b) Net banking/Digital Wallet □
c) Cash on delivery □ d) Others (specify) □_____
10) Rank the elements that prompt you to purchase online.
(most preferred has rank 1 & least preferred has rank 6)
Element : Ranking (by preference)
a. Convenient & relaxed way
b. Door to door service
c. Specific product information
d. Low price
e. Variety of products
f. Time saving
11) What products do you usually purchase online?
Product : Choice (Yes / No)
a. Clothing & Accessories □ □
b. Books & Stationery □ □
c. Mobile & Computer/Accessories □ □
d. Electronic & Digital Accessories □ □
e. Home, Kitchen & Pets □ □
f. Toys & Baby Products □ □
g. Sports, Fitness & Outdoor □ □
h. Beauty, Health & Cosmetics/Jewellery □ □
12) Rank the following according to your preference.
Factor : Ranking
a) Brand
b) Popularity (rating/reviews)
c) Price (low-high/high-low)
d) Discount/Offer/Coupon
e) Fresh arrivals
f) Others
Site : Ranking
a) Amazon
b) Snapdeal
c) Flipkart
d) Myntra
e) Ebay
f) Others (specify)
13) Would you prefer using price comparison sites?
a) Almost always □ b) sometimes □ c) rarely/never □
14) Do you prefer to read review/ratings of product by other purchasers?
a) Almost always □ b) sometimes □ c) rarely/never □
15) Do you express your opinion in the “Product review/rating” section?
a) Almost always □ b) sometimes □ c) rarely/never □
16) I am willing to pay more if
a) website offers free delivery □ b) faster/fastest delivery options □
c) tax free shopping □ d) item not available offline □
T1
Factors (each rated on the scale: Strongly disagree ⃝ Disagree ⃝ Indifferent ⃝ Agree ⃝ Strongly agree ⃝)
1. I can buy the products anytime, 24 hours a day, while shopping online
2. It is easy to choose and make comparison with other products
3. The website design/layout helps me in searching and selecting the right product
4. Sometimes I can find products online which I may not find in stores
5. I feel that it takes less time in evaluating and selecting a product while shopping online
6. I feel safe and secure while shopping online
7. I like to shop online from a trustworthy website
8. There has been asking unnecessary information in online shopping.
9. I believe online shopping will eventually supersede traditional shopping
10. A long time is required for the delivery of products and service
T2
Factors (each rated on the scale: Strongly disagree ⃝ Disagree ⃝ Indifferent ⃝ Agree ⃝ Strongly agree ⃝)
11. More choices are available in online site
12. The description of products shown on the site are very accurate
13. Online shopping is as secure as traditional shopping
14. At the time of payment, I hesitate to give my credit/debit card number
15. Internet reduces the monetary costs of traditional shopping to a great extent (parking, travel, etc)
16. I am satisfied with the service quality of online retailers
17. When I get a product up to expectation, I prefer same site next time
18. Delivery/shipping charge of product is relatively high
19. Product gets delivered before the delivery timeline mentioned
20. I am overall satisfied with the experience of shopping online.
T3
Factor (each rated on the scale: Never Important □ Not Important □ Indifferent □ Important □ Very Important □)
i. Reputation of the company
ii. Guarantees and Warrantees
iii. Privacy
iv. Description of goods in the site
v. Risk of payment data
vi. Seasonal/Festival offers
vii. Waiting to receive the product
viii. Tracking the product I ordered
ix. The advertisement of related product to the purchasing product (shoe : socks)
x. Impact of reviews/ratings
xi. Not being able to touch products.
xii. Return policy of online store
xiii. Situation of out of stock items
17) Do social networking advertisements influence you on online purchase?
Yes □ No □
18) Have you suffered from technical problems during purchase?
Yes □ No □ If yes, please mention______
19) Have you suffered from transaction problems during payment?
Yes □ No □
If yes, please mention ______
20) Have you ever had to cancel your order? (before it got dispatched)
Yes □ No □
21) Have you returned the delivered object over last 2-3 purchases?
Yes □ No □
If yes, what would be the reason?
a) Absence of receiver □
b) Change in product quality/size/colour □
c) Damaged product □
d) Lost interest in that product □
e) Other □
22) Have you ever faced cancellation of an order by the company without your consent?
Yes □ No □
23) Do you know customer can sell used/fresh commodities through some
shopping sites?
Yes □ No □
24) Are you aware of your consumer rights when shopping online?
Yes □ No □
25) Do you intend to continue purchasing products from the internet in the
near future?
Yes □ No □
-Thank You