A CASE STUDY ON STATUS OF ONLINE SHOPPING
IN TRIVANDRUM DISTRICT
PROJECT
SUBMITTED TO THE UNIVERSITY OF KERALA
IN PARTIAL FULFILMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
MASTER OF SCIENCE
IN
STATISTICS
BY
HALAKE KUMAR SURESH
Reg No : STA 150504
NEETHU M G
Reg No : STA 150506
DEPARTMENT OF STATISTICS
UNIVERSITY OF KERALA
KARIAVATTOM
THIRUVANANTHAPURAM
2015 - 2017
Dr. C. SATHEESH KUMAR
Professor and Head
Department of Statistics
University of Kerala
Kariavattom, Thiruvananthapuram
June, 2017
CERTIFICATE
I hereby certify that this report, “A CASE STUDY ON STATUS OF ONLINE SHOPPING
IN TRIVANDRUM DISTRICT”, is a bona fide report of the project work carried
out by Mr. Halake Kumar Suresh and Ms. Neethu M. G., fourth-semester M.Sc.
Statistics students in the Department of Statistics, University of Kerala,
during 2015-2017 under my supervision and guidance, in partial
fulfilment of the requirements for the M.Sc. Degree in Statistics of the
University of Kerala.
Dr. C. Satheesh Kumar
ACKNOWLEDGEMENT
The success of any endeavour needs encouragement and co-operation
from different quarters. Words are inadequate to express our profound
gratitude to those who helped us bring out this project successfully.
We owe an inestimable debt to Dr. C. Satheesh Kumar, Professor
& Head, Department of Statistics, University of Kerala, and our project guide,
for his constant encouragement, invaluable guidance and the
suggestions rendered during the course of the project work.
We are very grateful to all our teachers, the librarian, research
scholars and M.Phil. students of the Department of Statistics, University of Kerala,
for their kind help throughout the work.
We also take this opportunity to thank our family members and
friends for their love and encouragement throughout the study.
Above all, we thank Almighty God, without whose blessings we
would never have been able to complete this work.
Mr. Halake Kumar Suresh
Ms. Neethu. M. G
Kariavattom
June, 2017
CONTENTS
1. INTRODUCTION
1.1 Introduction
1.2 Terminology
1.3 Objective
1.4 Limitations
1.5 Summary of the study
2. MATERIALS AND METHODS
2.1 Introduction
2.2 Sampling Techniques
2.3 Description of the Questionnaire
2.4 Statistical Tools
3. STUDY ON AWARENESS OF CUSTOMERS
3.1 Introduction
3.2 Normality test
3.3 Awareness of males and females
3.4 Awareness and residency
3.5 Awareness and different education levels
3.6 Awareness and different education streams
3.7 Awareness and different monthly income groups
3.8 Awareness and customers' interest
3.9 Conclusions
4. STUDY ON SATISFACTION OF CUSTOMERS
4.1 Introduction
4.2 Normality test
4.3 Satisfaction of males and females
4.4 Satisfaction and residency
4.5 Satisfaction and different education levels
4.6 Satisfaction and different monthly income groups
4.7 Satisfaction and customers' interest
4.8 Conclusion
5. LOGISTIC REGRESSION AND LINEAR DISCRIMINANT ANALYSIS
5.1 Introduction
5.2 Binary logistic regression
5.3 Multinomial logistic regression
5.4 Linear discriminant analysis of output
5.5 Linear discriminant analysis of customer
5.6 Conclusions
Reference
Questionnaire
Chapter 1
INTRODUCTION
1.1 Introduction
1.1.1 E-commerce in India: state of play
Retail e-commerce sales in India are expected to increase to approximately $45.17
billion by 2021, and this correlates with the yearly rise in the use of the internet on mobile
devices in the country. (A similar trend of rising mobile purchases is also found across
Asia and Western countries as people become more reliant on mobile technology.)
Western e-commerce brands have also taken an interest in India’s maturing
commercial scene. The Canadian e-commerce start-up Shopify announced the release
of Shopify.in in 2013 to much fanfare and publicity, a clear example of how Western
companies see potential in the thriving economy of India. Shopify was quick to
capitalise on the growing e-commerce entrepreneur and SME market, a demographic
that is contributing refreshingly diverse voices to India’s economic landscape.
China, the other big e-commerce success story of recent years, has also expressed
an interest in entering the buoyant Indian market. The Chinese e-tail giant Alibaba
is keen to get a slice of India’s market, currently dominated by the native Flipkart and
Snapdeal, and announced plans to open an office in Mumbai in late 2016.
There are many exciting developments in the pipeline for India’s e-commerce
industry.
India’s online population is increasing at a rapid pace, as is the number of people
accessing the internet on mobile devices.
1.1.2 Online Consumer Buying Behavior
Everybody in the world is a consumer. Each of us buys, sells or consumes
goods and services in life. Consumer behavior is very complex and is determined to a
large extent by social and psychological factors. Consumer behavior can be defined as
those acts of individuals directly involved in obtaining, using and disposing of economic
goods and services.
The relevance and importance of understanding consumer behavior is rooted in
modern marketing. No two consumers have the same needs, and each
buys only those products and services which satisfy their own wants and desires. To survive
in the market, a firm has to be constantly innovating and understanding the latest
consumer needs and tastes. This understanding is extremely useful in exploiting marketing
opportunities and in meeting the challenges that the Indian market offers.
Online consumer behavior parallels offline consumer behavior, with some
obvious differences. The stages of the consumer decision process are basically the
same whether the consumer is online or offline, but the general model of consumer
behavior needs modification to take new factors into account. In the online model, web
site features, along with consumer skills, product characteristics, attitudes towards online
purchasing and perceptions about control over the Web environment, play a vital role.
There are parallels in the analogue world, where it is well known that consumer
behavior can be influenced by store design, and that understanding the precise
movements of consumers through a physical store can enhance sales if goods and
promotions are arranged along the most likely consumer tracks.
Consumer skills refer to the knowledge that the consumer has about how to conduct
online transactions. Product characteristics refer to the fact that some products can be
easily described, packaged and shipped over the Internet whereas others cannot.
Combined with traditional factors such as brand, advertising and firm capabilities, these
factors lead to specific attitudes about online shopping.
Consumer behavior regarding the use of the internet for shopping varies. Some
consumers either lack access or resist using this new channel of distribution, primarily
due to privacy and security concerns. Other shoppers browse the Web to
gather information and then visit the stores to negotiate the purchase face to face
with the retailer. A few shoppers visit retail stores first and then buy from an e-retailer.
Still others do all their shopping online: gathering information, negotiating, purchasing and
either arranging for delivery or picking up the merchandise in the store.
Three key ways any Indian e-commerce store can get ahead of the game are
investing in a brand story, adopting a multichannel strategy, and getting to grips with
social media content.
1.2 Terminology
Online Shopping
Online shopping is the activity of buying goods or services over the internet. It is a form
of electronic commerce which allows consumers to buy goods or services directly from a seller
over the internet using a web browser.
Customer
A customer is an individual or business that purchases the goods or services produced by a
business. Attracting customers is the primary goal of most public-facing businesses, because it
is the customer who creates demand for goods and services.
We classified customers into groups such as ‘more efficient’ and ‘less efficient’, or ‘ideal’,
‘ordinary’ and ‘low profile’, based on how long the customer has been shopping online, the average amount spent in a
single purchase, frequency of internet usage, willingness to shop online in future, etc.
Satisfaction towards online shopping
It is the extent or degree to which a customer is fulfilled by the experience of shopping through
the internet.
In our study, the satisfaction score is the sum of the scores assigned to some of the sub-questions of tables
T1 and T2.
Qns 1, 2, 3, 4, 5, 6, 9, 11, 12, 13, 15, 16, 17 and 20 are scored as follows:
Strongly disagree (score 1), Disagree (score 2), Indifferent (score 3), Agree (score 4) and
Strongly agree (score 5).
For Qns 8, 10, 14 and 18 the scoring is in reverse order.
Awareness towards online shopping
There are some psychological, personal and social factors which measure the level of
awareness of a customer towards online shopping.
In our study, the awareness score is the sum of the scores assigned to sub-questions 1 to 10
and Qn 12 of table T3. The scoring is as follows:
Never important (score 1), Not important (score 2), Indifferent (score 3), Important (score
4), Very important (score 5).
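The scoring rules above can be sketched in a few lines of code. This is only an illustration in Python (the study itself used R); the function names and the dictionary layout of a respondent's answers are assumptions, and "reverse order" is taken to mean that level 1 scores 5, level 2 scores 4, and so on.

```python
# Item numbers are taken from the Terminology section; options a-e map to 1-5.
FORWARD_ITEMS = [1, 2, 3, 4, 5, 6, 9, 11, 12, 13, 15, 16, 17, 20]
REVERSE_ITEMS = [8, 10, 14, 18]               # assumed to score 5, 4, 3, 2, 1
AWARENESS_ITEMS = list(range(1, 11)) + [12]   # T3 sub-questions 1-10 and 12


def satisfaction_score(answers):
    """answers: dict mapping T1/T2 item number -> chosen level in 1..5."""
    forward = sum(answers[q] for q in FORWARD_ITEMS)
    reverse = sum(6 - answers[q] for q in REVERSE_ITEMS)  # flips 1<->5, 2<->4
    return forward + reverse


def awareness_score(answers):
    """answers: dict mapping T3 item number -> importance level in 1..5."""
    return sum(answers[q] for q in AWARENESS_ITEMS)


# Hypothetical respondent who ticks "Agree" (4) for every T1/T2 item
# and "Important" (4) for every T3 item.
t12_answers = {q: 4 for q in range(1, 21)}
t3_answers = {q: 4 for q in range(1, 14)}

print(satisfaction_score(t12_answers))  # 14*4 + 4*(6-4) = 64
print(awareness_score(t3_answers))      # 11*4 = 44
```

Under these rules the awareness score ranges from 11 to 55 and the satisfaction score from 18 to 90.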
1.3 Main Objectives of the Study
• To check whether the awareness of customers towards online shopping changes with
factors such as gender, residency, stream of education, education level
and income.
• To check whether satisfaction with online shopping varies with gender, residency,
stream of education, education level and income.
• To classify customers into different groups by means of logistic regression,
multinomial regression and discriminant analysis models.
1.4 Limitations of the study
A few limitations of the study are listed below.
• Results may not be comparable across geographical areas or over
time, as the survey was conducted in Trivandrum district in the months of April-May.
• The survey covers only online shoppers; those who shop only offline are not
considered.
• The sample size is 150, which is not very large.
1.5 Summary of the study
In chapter 1 we give a brief introduction to e-commerce, explain the terms
used in the study and state its main objectives. The limitations
of the study are also discussed in that chapter.
Chapter 2 deals with the sampling techniques used for data collection and with the description of
the questionnaire: which questions are included, how they are arranged, and which
new variables are constructed from the respondents' choices.
It then outlines the theory of the statistical tools used throughout the study.
We included parametric and non-parametric tools according to their
applicability.
Chapter 3 focuses on the analysis of customers' awareness of online
shopping. We compared the awareness level across factors such as gender,
residency, education level, education stream and monthly family
income.
Chapter 4 is about customers' satisfaction with e-shopping. Here too we
applied different statistical tools to see whether the satisfaction score differs
significantly across factors such as gender, residency, education level,
education stream and monthly family income.
Chapter 5 applies different regression models and discriminant analysis to
distinguish the type of customer. This gave us several reliable and reasonably precise
classification models.
We used RStudio for the analysis of the data.
Chapter 2
MATERIALS AND METHODS
2.1 Introduction
In chapter 1, we introduced the terminology associated with the topic and
described the objectives of the study. In this chapter we describe the sampling
techniques used for the survey, the questionnaire and the analytical tools employed. The
details are given in the following sections.
2.2 Sampling Techniques
The overall procedure for this study involved administering the questionnaire to a
sample of 150 online shoppers from various parts of the city.
Multi-stage sampling
Multi-stage sampling (also known as multi-stage cluster sampling) is a more complex
form of cluster sampling which involves two or more stages of sample selection. In
simple terms, large clusters of the population are divided into
smaller clusters in several stages in order to make primary data collection more
manageable.
Of the 100 wards in Trivandrum Corporation, most neighbouring wards are
homogeneous in socio-economic conditions. So all wards were classified into 10
internally homogeneous blocks (first-stage sampling units), and one ward (second-stage
sampling unit) was selected at random from each block as representative of that block.
The selected wards are Kazhakuttom, Chellamangala, Kowdiar, Kachani, Poojappura,
Pappanamcode, Mulloor, Thampanoor, Kadakampally and Akkulam. Then, by
simple random sampling, a sample of size 150 was selected from these 10
wards.
Every effort was made to obtain a sample representing the whole population. It
comprises customers of all age groups, different education levels such as SSLC, Graduation and
Master, different streams of education such as Arts, Science and Professional, and different
occupations. Most of the questionnaires were completed by face-to-face interview to
avoid any personal bias from respondents.
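The selection procedure described above can be sketched as follows. This is an illustration in Python rather than the procedure actually run for the survey: the ward numbering, the contiguous grouping of wards into blocks, the seed and the frame of 60 shoppers per ward are all assumptions made so that the example is runnable.

```python
import random

rng = random.Random(2017)  # fixed seed so the sketch is reproducible

# 100 wards, labelled 1..100 (hypothetical labels).
wards = list(range(1, 101))

# First stage: group the wards into 10 blocks. Contiguous blocks of 10 are an
# assumption; the study formed blocks from socio-economic similarity.
blocks = [wards[i:i + 10] for i in range(0, 100, 10)]

# Second stage: draw one ward at random from each block.
selected_wards = [rng.choice(block) for block in blocks]

# Final stage: simple random sample of 150 shoppers from the selected wards.
# The frame of 60 shoppers per ward is made up for illustration.
frame = [(ward, shopper) for ward in selected_wards for shopper in range(60)]
sample = rng.sample(frame, 150)  # sampling without replacement

print(selected_wards)
print(len(sample))
```

Because the final draw is made from the pooled frame, the number of respondents per ward is itself random, which matches a simple random sample taken across the ten selected wards.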
2.3 Description of the Questionnaire
Each questionnaire has two parts, part A and part B, and each respondent was asked to
answer all the questions. The questions are arranged in a logical sequence so as to get
data that are as reliable as possible.
Part A has 7 questions, which cover the personal data of respondents such as age,
gender, residency, occupation, education level and monthly family income.
Part B contains the survey-related questions. Questions 1 and 2 are about
internet usage. Qns 3 to 7 are about how long the customer has been purchasing
online, spending habits and time spent on the site. Qns 8 and 9 are about payment. Qn 10
asks for a ranking of some elements of online shopping compared to store shopping. Qn 11 lists a
variety of products, among which the customer is asked to tick his or her shopping choices. Qn 12
asks for a ranking of some popular e-retailers and of the filter options available on
e-retailers' sites. Qns 13 to 16 are about how friendly the customer finds the shopping
site.
Tables T1 and T2 together have 20 factors, each with 5 options (a, b, c, d and e)
representing the level of agreement from ‘Strongly disagree’ to ‘Strongly agree’;
the respondent is asked to choose one of them. These 20 questions measure the
customer’s satisfaction towards online shopping.
Table T3 has 13 factors, which are awareness factors for online purchase.
Here too there are 5 options (a, b, c, d and e), representing levels of importance from ‘Never
important’ to ‘Very important’, and the respondent is asked to choose one.
We assigned scores to all sub-questions of tables T1, T2 and T3, as described
in the Terminology section.
Finally, Qns 17 to 25 ask about the influence of social media on online
purchases, technical problems faced at various steps such as payment, cancellation or return of
ordered commodities, awareness of consumer rights and whether the customer is
willing to purchase online in future. (The questionnaire is attached at the end of the document.)
2.4 Statistical Tools
In accordance with the main objectives mentioned in the previous chapter, we use
the following statistical tools to test different hypotheses.
Kruskal Wallis H Test
The Kruskal-Wallis H test is often called “analysis of variance by ranks”. It is a
non-parametric alternative to one-way ANOVA and is especially useful when the k samples
do not come from normal populations. The null hypothesis tested here is
H0 : the k independent samples come from the same population.
Assumptions
1. The dependent variable should be measured at the ordinal or continuous level.
2. The independent variable should consist of two or more categorical, independent
groups.
3. The observations should be independent.
Steps involved
Step 1: Rank all of the scores in ascending order, ignoring which group they belong to.
Step 2: Find $T_i$, the total of the ranks for each group, by adding together all of the ranks
in that group.
Step 3: Find the value of the test statistic H,
$$H = \frac{12}{N(N+1)} \left[ \sum_{i=1}^{k} \frac{T_i^2}{n_i} \right] - 3(N+1)$$
where
N → the total number of observations
$n_i$ → number of subjects in the ith group
k → number of groups (≥ 3)
$T_i^2$ → square of the total of the ranks for scores in the ith group
Step 4: Under H0, the distribution of H is approximately $\chi^2$ with k − 1 d.f.
Test: reject H0 at level of significance α when $H > \chi^2_{\alpha}(k-1)$.
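The steps can be followed literally in code. The sketch below is an illustration in Python (not the R routine used in the study) on made-up data; tied values are given their mean rank, and no tie correction is applied to H, so with ties the result can differ slightly from the value printed by R's `kruskal.test`, which does apply one.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(T_i^2 / n_i) - 3 (N + 1), no tie correction."""
    pooled = [x for g in groups for x in g]   # Step 1: pool all scores
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        t_i = sum(ranks[start:start + len(g)])  # Step 2: rank total of group i
        total += t_i ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)  # Step 3


# Toy data (not from the survey): three groups of three scores.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 4))  # 7.2
```

For the toy data the rank totals are 6, 15 and 24 with N = 9, so H = (12/90)(12 + 75 + 192) − 30 = 7.2, which would then be compared with the chi-square critical value on k − 1 = 2 d.f.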
Mann-Whitney U test
This is one of the most powerful non-parametric tests and is an alternative to the
two-sample t-test. The null hypothesis tested here is
H0 : the two independent random samples come from the same population.
Assumptions
Suppose two samples are drawn from two independent populations X and Y.
1. X and Y have continuous distributions (or discrete distributions that approximate
continuous distributions well).
2. X and Y have the same shape. The only possible difference is their position (i.e.
the value of the median).
3. The number of elements in each sample is not less than 5.
4. The samples are independent.
5. The scale of measurement should be ordinal, interval or ratio.
How it works
Put simply, the U-test works as follows. Both samples (of sizes N and M)
are combined into one array, which is sorted in ascending order. We keep track
of which sample each element came from. After sorting, each element is replaced
by its rank (its index in the array, from 1 to N+M). Then the ranks of the first sample's
elements are summed and the U-value is calculated. In one standard form,
$$U = R_1 - \frac{N(N+1)}{2}$$
where $R_1$ is the sum of the ranks of the first sample (of size N).
The mean of U equals NM/2. If U is close to this value, the medians of X and Y are
close to each other. If we know the quantiles of the distribution of U, we can get the significance level
corresponding to the observed value of U.
Normal approximation
Although U has a discrete distribution, if N and M are large it can be approximated by the
normal distribution with mean NM/2 and standard deviation
$$\sigma = \sqrt{\frac{NM(N+M+1)}{12}}$$
Thus,
$$Z = \frac{U - \frac{NM}{2}}{\sqrt{\frac{NM(N+M+1)}{12}}}$$
can be used as the test statistic, which has approximately the $N(0,1)$ distribution.
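A minimal sketch of the rank-sum calculation and its normal approximation is given below, in Python on made-up data. It assumes the common convention U = R1 − N(N+1)/2, where R1 is the rank sum of the first sample; this is the quantity that R's `wilcox.test` reports as W. The function names are illustrative, and no continuity or tie correction is applied to Z.

```python
import math


def rank_sum_first(x, y):
    """Sum of the ranks of x within the pooled, ascending-sorted sample."""
    pooled = sorted(list(x) + list(y))

    def mean_rank(v):
        first = pooled.index(v) + 1          # 1-based rank of first occurrence
        last = first + pooled.count(v) - 1   # rank of last tied occurrence
        return (first + last) / 2            # tied values share the mean rank

    return sum(mean_rank(v) for v in x)


def mann_whitney(x, y):
    """Return (U, Z) with U = R1 - N(N+1)/2 and Z its normal approximation."""
    n, m = len(x), len(y)
    u = rank_sum_first(x, y) - n * (n + 1) / 2
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)
    return u, (u - mean_u) / sd_u


# Toy data (not from the survey): every x lies below every y.
u, z = mann_whitney([1, 2, 3], [4, 5, 6])
print(u, round(z, 3))  # 0.0 -1.964
```

Here the first sample occupies ranks 1 to 3, so R1 = 6 and U = 6 − 6 = 0, the smallest possible value; the mean of U is 4.5 and its standard deviation is the square root of 5.25.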
Kolmogorov-Smirnov two sample test
This non-parametric test is used to test the null hypothesis
H0: the two data samples come from the same distribution. Note that we do not specify
what that common distribution is.
i.e. H0: $F(x) = G(x)$ for all x, where F and G are the distribution functions of the two populations.
The test statistic $D_{m,n}$ is defined as
$$D_{m,n} = \sup_x \left| \hat{F}_m(x) - \hat{G}_n(x) \right|$$
where $\hat{F}_m(x)$ and $\hat{G}_n(x)$ are the empirical distribution functions of the two samples, of sizes m and n. The test
rejects H0 at level of significance α if $D_{m,n} > D_{m,n}(\alpha)$, i.e. H0 is not rejected if $\hat{F}_m$ and $\hat{G}_n$ are close
for every x.
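The statistic can be computed directly from its definition. The sketch below is an illustration in Python on made-up data, with an assumed function name; it relies on the fact that both empirical distribution functions are step functions that change only at observed values, so the supremum is attained at one of the data points.

```python
def ks_two_sample_d(x, y):
    """D = sup over v of |F_m(v) - G_n(v)| for the two empirical CDFs."""
    m, n = len(x), len(y)
    d = 0.0
    # Both ECDFs only jump at observed values, so checking those is enough.
    for v in sorted(set(x) | set(y)):
        f = sum(1 for a in x if a <= v) / m   # ECDF of the first sample at v
        g = sum(1 for b in y if b <= v) / n   # ECDF of the second sample at v
        d = max(d, abs(f - g))
    return d


# Toy data (not from the survey).
print(ks_two_sample_d([1, 2, 3], [4, 5, 6]))            # 1.0, no overlap
print(round(ks_two_sample_d([1, 3, 5], [2, 4, 6]), 4))  # 0.3333, interleaved
```

Completely separated samples give the maximum possible value D = 1, while interleaved samples keep the two step functions within one jump of each other.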
Student's two-sample t-test
This test is used to test the null hypothesis
H0: the two population means do not differ, i.e. $\mu_1 = \mu_2$.
Assumptions
1. Each sample is randomly selected from the corresponding population.
2. The populations from which the samples are drawn are normal.
3. The variances of the two populations do not differ significantly.
So before applying the t-test, we should carry out a test for equality of variances.
How it works
Suppose we have two independent random samples of sizes $n_1$ and $n_2$ from normal
populations $N(\mu_i, \sigma_i^2)$, i = 1, 2. Let $X_{i1}, X_{i2}, \ldots, X_{in_i}$ be the sample from the ith population.
We first have to test H01: $\sigma_1^2 = \sigma_2^2$.
Let
$$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$
be the ith sample variance, i = 1, 2 and j = 1, 2, ..., $n_i$.
Then, under H01, the F statistic is
$$F = \frac{s_1^2}{s_2^2} \sim F(n_1 - 1, n_2 - 1)$$
We do not reject H01 at the 5% level of significance if $F < F_{0.05}(n_1 - 1, n_2 - 1)$, and only then
perform Student's t-test for testing H0.
The test statistic is
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \sim t_{n_1 + n_2 - 2}$$
If $|t| > t_{0.05}(n_1 + n_2 - 2)$ we reject H0 at the 5% level of significance.
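The two statistics above can be sketched as follows, in Python with only the standard library and on made-up data. The function names are assumptions; `statistics.variance` uses the n − 1 divisor, which matches the sample variance defined above. Critical values are not computed here and would be read from F and t tables.

```python
import math
from statistics import mean, variance  # variance() divides by n - 1


def variance_ratio(x, y):
    """F = s1^2 / s2^2 with (n1 - 1, n2 - 1) degrees of freedom."""
    return variance(x) / variance(y), (len(x) - 1, len(y) - 1)


def pooled_t(x, y):
    """Equal-variance (pooled) two-sample t statistic and its d.f."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2


# Toy data (not from the survey).
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
f_stat, f_df = variance_ratio(x, y)
t_stat, t_df = pooled_t(x, y)
print(f_stat, f_df)            # 0.25 (4, 4)
print(round(t_stat, 4), t_df)  # -1.8974 8
```

For the toy data the sample variances are 2.5 and 10, the pooled variance is 6.25, and the standard error is the square root of 6.25 × 0.4 = 2.5, giving t = −3 / 1.5811.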
Chapter 3
STUDY ON AWARENESS OF CUSTOMERS
3.1 Introduction
Online purchasing allows customers to see the different types of
products available in the global market. Due to rapid globalization, all types of products are
available on the internet. Consumer durables, books, audio and video
cassettes, and services such as air tickets can all be purchased online.
In this era of fast-moving lifestyles, customers are busier than they were a few years
ago. It is precisely for this reason that customers are also purchasing products and
services through online shopping. The marketplace is fast turning into an e-marketplace, so
customer awareness is very important for getting the maximum benefit and for being least affected by
issues such as online fraud.
In our study we asked respondents to rate some factors according to their preference, and
these ratings measure each respondent’s awareness level towards online shopping.
The questions are put separately in table T3 of the questionnaire, which is attached in the appendix.
The awareness of an e-shopper depends mostly on factors such as the reputation of the e-seller,
guarantee and warranty, privacy and security, advertisements, and the impact of reviews and ratings.
Finally, the awareness scores are obtained by summing these scores in a particular way
(explained in Terminology).
3.2 Test for Normality of awareness score
Thus, the pattern of the data deviates somewhat from normality, so it is preferable to
base conclusions on the non-parametric tools, even though both kinds of tools are used.
3.3 Awareness towards online shopping with Gender
In this section our interest is to test whether the awareness score varies with gender.
H0 : The mean awareness scores of male and female customers are identical.
H1 : The mean awareness scores of male and female customers are not identical.
Let us first check, by the F test, whether the variances of the awareness score differ significantly
between males and females. R yields the following output.
F test to compare two variances
data: awarescore[Gender == "m"] and awarescore[Gender == "f"]
F = 1.2198, num df = 100, denom df = 48, p-value = 0.4482
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7304366 1.9516958
sample estimates:
ratio of variances
1.219822
Thus the F test shows that the variances of the awareness score do not differ significantly between the male and
female populations. So we can use the two-sample t-test for testing H0 against H1.
Welch Two Sample t-test
data: awarescore[Gender == "m"] and awarescore[Gender == "f"]
t = -1.2207, df = 104.12, p-value = 0.2249
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.627605 0.625180
sample estimates:
mean of x mean of y
45.03960 46.04082
As the p-value is not significant, we do not reject H0 at the 5% level of significance.
Thus, the mean awareness scores of males and females are not significantly different.
We also applied the Mann-Whitney U test; the R output is given below.
Wilcoxon rank sum test with continuity correction
data: awarescore by Gender
W = 2769.5, p-value = 0.2369
alternative hypothesis: true location shift is not equal to 0
As the p-value is not significant, we do not reject H0. Here one can claim that the median awareness
scores of males and females are not significantly different at the 5% level of significance.
For the two-sample Kolmogorov-Smirnov test, we state the null and alternative hypotheses
as below.
H0 : Awareness scores for male and female have same unknown distributions
H1 : Awareness scores have different distributions
R yields following output.
Two-sample Kolmogorov-Smirnov test
data: awarescore[Gender == "m"] and awarescore[Gender == "f"]
D = 0.15619, p-value = 0.3967
alternative hypothesis: two-sided
Here we have no evidence against H0, so we do not reject H0 at the 5% level of significance.
3.4 Awareness towards online shopping with Residency
In this section we test whether the awareness score varies with the residency
of the customer.
H0 : The mean awareness scores of customers in the Rural/Town and Urban populations are identical.
H1 : The mean awareness score varies with residency.
First of all we check, by the F test, whether the variances of the awareness score differ significantly
between customers belonging to Rural/Town and Urban areas. R yields the
following output.
F test to compare two variances
data: awarescore[Residency == "Urban"] and awarescore[Residency == "Rural/Town"]
F = 1.2906, num df = 53, denom df = 95, p-value = 0.2789
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.8122947 2.1191771
Sample estimates:
ratio of variances
1.29058
Thus the F test shows that the variances of the awareness score do not differ significantly between the two residency
groups. So we can use the two-sample t-test for testing H0 against H1.
Welch Two Sample t-test
data: awarescore[residency == "b"] and awarescore[residency == "a"]
t = 0.81259, df = 98.798, p-value = 0.4184
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.009676 2.410139
sample estimates:
mean of x mean of y
45.81481 45.11458
As the p-value is not significant, we do not reject H0 at the 5% level of significance.
The mean awareness scores of the Rural/Town and Urban populations are not significantly different.
For the Mann-Whitney U test, the R output is given below.
Wilcoxon rank sum test with continuity correction
data: awarescore by residency
W = 2379, p-value = 0.4044
alternative hypothesis: true location shift is not equal to 0
As the p-value is not significant, we do not reject H0: the median awareness scores of Rural/Town
and Urban customers are not significantly different at the 5% level of significance.
For the two-sample Kolmogorov-Smirnov test, we state the null and alternative hypotheses
as below.
H0 : Awareness scores of Rural/Town and Urban customers have the same
unknown distribution
H1 : Awareness scores come from different distributions
R yields following output.
Two-sample Kolmogorov-Smirnov test
data: awarescore[residency == "b"] and awarescore[residency == "a"]
D = 0.16782, p-value = 0.2847
alternative hypothesis: two-sided
Here also we have no evidence against H0, so we do not reject H0 at the 5% level
of significance.
3.5 Awareness with Education level
Now we check whether the awareness score varies with the education level of the
customer: +2/Diploma, Graduation, Master and higher degrees.
H0 : The mean awareness score is the same for customers of every education level.
H1 : The mean awareness score varies with the level of education.
R gives the following output for ANOVA.
aov(formula = awarescore ~ edu)
Df Sum Sq Mean Sq F value Pr(>F)
edu 3 30 9.952 0.412 0.744
Residuals 146 3523 24.130
Thus we do not reject H0, since there is no evidence against it.
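The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and the F value is the ratio of the two mean squares. The short check below, written in Python purely as an illustration, uses only the figures printed in the table above (the variable names are assumptions).

```python
# Values copied from the R ANOVA table for education level.
df_between, df_within = 3, 146
mean_sq_between, mean_sq_within = 9.952, 24.130
sum_sq_within = 3523

# F is the between-group mean square over the residual mean square.
f_value = mean_sq_between / mean_sq_within
print(round(f_value, 3))                    # 0.412

# The residual mean square is the residual sum of squares over its d.f.
print(round(sum_sq_within / df_within, 2))  # 24.13
```

An F value well below 1 means the variation between education groups is smaller than the variation within them, which is consistent with the large p-value of 0.744.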
The conditions for ANOVA may sometimes be violated for ranked data. So the
Kruskal-Wallis H test, a non-parametric alternative to one-way analysis of variance, is also used,
as below.
Kruskal-Wallis rank sum test
data: awarescore by Edu
Kruskal-Wallis chi-squared = 1.6646, df = 3, p-value = 0.6448
Based on the Kruskal-Wallis H test also, we do not reject the null hypothesis.
Thus, awareness towards online shopping is almost the same for customers of
different education levels.
3.6 Awareness with Stream of education
In this section we test whether the awareness score varies with
educational stream: Arts, Science and Professional.
H0 : The mean awareness score of customers is identical in all streams.
H1 : The mean awareness score varies with stream.
First we carry out ANOVA. R gives the following output.
aov(formula = awarescore ~ stream)
Df Sum Sq Mean Sq F value Pr(>F)
stream 2 38 19.09 0.798 0.452
Residuals 147 3515 23.91
As the p-value is not significant, there is no evidence against the null hypothesis, so we
do not reject H0 at the 5% level of significance.
As the scores are obtained by summing ordered values, the conditions for
ANOVA may sometimes be violated. So the Kruskal-Wallis H test, a non-parametric alternative
to one-way analysis of variance, is also used, as below.
Kruskal-Wallis rank sum test
data: awarescore by factor(stream)
Kruskal-Wallis chi-squared = 1.5512, df = 2, p-value = 0.4604
As the p-value > 0.05, we do not reject the null hypothesis that the awareness score does not vary
significantly across streams.
3.7 Awareness with Income groups
Here we test whether the awareness score varies across monthly family
income groups: below 20000, 20000 to 50000, 50000 to 75000, 75000 to 1
Lakh and above 1 Lakh.
H0 : The mean awareness score of customers is identical in all income categories.
H1 : The mean awareness score varies across income groups.
First we use ANOVA. R gives the following output.
aov(formula = awarescore ~ income)
Df Sum Sq Mean Sq F value Pr(>F)
income 4 16 4.082 0.167 0.955
Residuals 145 3537 24.390
Thus we do not reject H0, since there is no evidence against it.
Since the score data may sometimes deviate from the
assumptions of ANOVA, we also apply the usual non-parametric alternative to one-way analysis
of variance, the Kruskal-Wallis H test, to the data. R displays the following output.
Kruskal-Wallis rank sum test
data: awarescore by Income
Kruskal-Wallis chi-squared = 0.70578, df = 4, p-value = 0.9506
The Kruskal-Wallis H test also gives no reason to reject H0 in favour of H1.
So we conclude that customers in all income groups have almost equal awareness.
3.8 Awareness with two groups: ‘interested’ and ‘not interested’
We now consider the awareness level of customers who wish to continue e-shopping
in future and of those who do not. The hypotheses are
H0 : The awareness score does not differ between the two groups of customers (willing and not willing
to continue online shopping in future).
H1 : The awareness score differs between the two groups of customers.
Since the ‘not interested’ group is small (size 7), we use the Mann-Whitney U test. R
yields the following output.
Wilcoxon rank sum test with continuity correction
data: awarescore[Entry_out_1_0$continue == 1] and awarescore[Entry_out_1_0$continue == 0]
W = 575, p-value = 0.5087
alternative hypothesis: true location shift is not equal to 0
Thus the test gives no reason to reject H0, i.e. the two groups of customers are equally aware of
online shopping.
We cannot use the two-sample t-test, because the variation in the two groups is different,
as can be seen in the boxplot.
Even though the medians do not differ significantly, the boxplot suggests that customers who are not
willing to continue tend to have lower awareness.
3.9 Conclusions
From the analysis carried out throughout the chapter, we arrive at the following
conclusions.
• The awareness score does not vary significantly between males and females. Thus
customers have almost the same awareness score irrespective of gender.
• The awareness score does not vary between Rural/Town and Urban customers.
That is, e-shoppers of either residency have almost similar awareness
of online shopping.
• There is no significant difference in awareness score between customers of different
education qualifications. Thus customers have almost similar awareness of
online shopping irrespective of education level. (In our study, all respondents
hold at least a +2/diploma qualification.)
• There is no significant difference in awareness score between customers from different
educational streams such as arts, science and professional. Thus customers of all
streams exhibit almost similar awareness of online shopping.
• There is no significant difference in awareness scores between
customers of different family income. Thus, irrespective of family income,
e-customers are almost equally aware of online shopping.
• There is no significant difference in awareness level between customers who are
willing to continue e-shopping in future and those who are not.
If we consider the variation in the awareness scores of these two groups, the
customers who are not willing to continue appear to have lower
awareness. We examine the satisfaction level of the same two groups in the next chapter.
Chapter 4
Study on satisfaction of customers towards online shopping
4.1 Introduction
With the rapid global growth in electronic commerce (e-commerce), businesses are
attempting to gain a competitive advantage by using e-commerce to interact with
customers. Growing numbers of consumers shop online to purchase goods and
services, gather product information or even browse for enjoyment. Online shopping
environments are therefore playing an increasing role in the overall relationship
between marketers and their consumers (Koo et al. 2008). That is, consumer purchases
are mainly based on the online presentation of the product, such as pictures, images, quality
information and video clips, not on actual experience of the product.
If a customer wants to purchase something online, plenty of online providers are
available, and multiple brands are available for a single product. Consumer satisfaction
is therefore very important for keeping a business healthy.
In the previous chapter we examined how awareness of e-shopping changes with
factors such as gender, residency, income and education. Here we study satisfaction
against the same factors.
4.2 Normality test of satisfaction score
From the graph we conclude that the satisfaction scores are approximately normally distributed.
4.3 Satisfaction towards online shopping with Gender
In this section our interest is to test whether the satisfaction score varies with gender.
H0 : Mean satisfaction scores of male and female customers are identical.
H1 : Mean satisfaction scores of male and female customers are not identical.
Let us first check, by an F test, whether the variances of the satisfaction score differ significantly between male and female customers. R yields the following output.
F test to compare two variances
data: sscore[Gender == "f"] and sscore[Gender == "m"]
F = 1.2678, num df = 48, denom df = 100, p-value = 0.3204
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7923922 2.1172386
sample estimates:
ratio of variances
1.267814
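Thus the F test shows that the variances of the satisfaction score do not differ significantly between the male and female populations. So we can use the two-sample t-test for testing H0 against H1. (R reports the Welch version of the test, which does not assume equal variances.)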
Welch Two Sample t-test
data: sscore[Gender == "f"] and sscore[Gender == "m"]
t = -3.8227, df = 85.738, p-value = 0.0002495
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-7.629189 -2.408799
sample estimates:
mean of x mean of y
59.69388 64.71287
As the p-value is below 0.05, we reject H0 at the 5% level of significance.
Thus the mean satisfaction scores of male and female customers are significantly different; the sample mean is higher for males (64.71) than for females (59.69).
We also applied the Mann-Whitney U test; the R output is given below.
Wilcoxon rank sum test with continuity correction
data: sscore[Gender == "f"] and sscore[Gender == "m"]
W = 1555, p-value = 0.0002268
alternative hypothesis: true location shift is not equal to 0
As the p-value is below 0.05, we reject H0. One can claim that the median satisfaction scores of
male and female customers differ significantly at the 5% level of significance.
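The p-value above can be roughly cross-checked by hand. The following is a minimal sketch in Python (standard library only), assuming the usual normal approximation to the rank-sum statistic with a continuity correction and no tie correction; the function name is illustrative and the group sizes 49 and 101 are read off the degrees of freedom of the F test. Because ties are ignored, it should only land near the value R prints (0.0002268), not reproduce it.

```python
import math

def mann_whitney_normal_p(w, n1, n2):
    """Two-sided p-value for a rank-sum statistic W via the normal approximation.

    Uses a continuity correction but no tie correction, so it only
    approximates what R reports when the data contain ties.
    """
    mean_w = n1 * n2 / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    deviation = w - mean_w
    # Continuity correction: shrink the deviation from the mean by 0.5.
    correction = 0.5 if deviation > 0 else (-0.5 if deviation < 0 else 0.0)
    z = (deviation - correction) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# W from the output above; group sizes are 48 + 1 and 100 + 1.
z, p = mann_whitney_normal_p(w=1555, n1=49, n2=101)
print(f"z = {z:.3f}, approximate p = {p:.6f}")
```

The negative z agrees with the direction of the t-test: the scores in the first group (females) tend to be lower.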
For the two-sample Kolmogorov-Smirnov test we state the null and alternative hypotheses as follows.
H0 : Satisfaction scores of male and female customers have the same (unknown) distribution
H1 : Satisfaction scores of male and female customers have different distributions
R yields the following output.
Two-sample Kolmogorov-Smirnov test
data: sscore[Gender == "f"] and sscore[Gender == "m"]
D = 0.28188, p-value = 0.01057
alternative hypothesis: two-sided
Here also we reject the null hypothesis.
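For readers who want to see where a p-value of this size comes from, here is a hedged sketch in Python of the standard asymptotic (Kolmogorov series) approximation for the two-sample statistic D. The function name and the number of series terms are illustrative choices, and the group sizes 49 and 101 are again taken from the F-test degrees of freedom. With D = 0.28188 it gives a value close to the 0.01057 shown above.

```python
import math

def ks_two_sample_asymptotic_p(d, n1, n2, terms=100):
    """Asymptotic two-sided p-value for the two-sample Kolmogorov-Smirnov statistic D."""
    lam = d * math.sqrt(n1 * n2 / (n1 + n2))
    # Kolmogorov series: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lam^2)
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    # Clamp to [0, 1] to guard against rounding for very small lam.
    return min(1.0, max(0.0, 2.0 * total))

p_gender = ks_two_sample_asymptotic_p(d=0.28188, n1=49, n2=101)
print(f"approximate p = {p_gender:.5f}")
```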
4.4 Satisfaction with online shopping and Residency
In this section we test whether the satisfaction score varies with the residency
of the customer.
H0 : Mean satisfaction scores of customers are identical in the Rural/Town and Urban populations.
H1 : Mean satisfaction score varies with residency.
First we check, by an F test, whether the variances of the satisfaction score differ significantly
between customers from Rural/Town and Urban areas. R yields the following output.
F test to compare two variances
data: sscore[residency == "b"] and sscore[residency == "a"]
F = 1.2127, num df = 53, denom df = 95, p-value = 0.4109
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7632736 1.9912871
sample estimates:
ratio of variances
1.212695
Thus the F test shows that the variances of the satisfaction score do not differ significantly between the two residency groups, so we can use the two-sample t-test for testing H0 against H1.
Welch Two Sample t-test
data: sscore[residency == "b"] and sscore[residency == "a"]
t = 0.56677, df = 101.4, p-value = 0.5721
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.883585 3.390529
sample estimates:
mean of x mean of y
63.55556 62.80208
As the p-value is not significant, we do not reject H0 at the 5% level of significance.
Thus the mean satisfaction scores of the Rural/Town and Urban populations are not significantly different.
For the Mann-Whitney U test the R output is given below.
Wilcoxon rank sum test with continuity correction
data: sscore[residency == "b"] and sscore[residency == "a"]
W = 2674, p-value = 0.7494
alternative hypothesis: true location shift is not equal to 0
As the p-value is not significant, we do not reject H0: the median satisfaction scores of rural/town
and urban customers are not significantly different at the 5% level of significance.
For the two-sample Kolmogorov-Smirnov test we state the null and alternative hypotheses as follows.
H0 : Satisfaction scores of rural/town and urban customers have the same
(unknown) distribution
H1 : Satisfaction scores of rural/town and urban customers come from different distributions
R yields the following output.
Two-sample Kolmogorov-Smirnov test
data: sscore[residency == "b"] and sscore[residency == "a"]
D = 0.096065, p-value = 0.9073
alternative hypothesis: two-sided
Here also we have no evidence against H0, so we do not reject H0 at the 5% significance
level.
4.5 Satisfaction with Education level
Now we check whether the satisfaction score varies with the education level of the
customer: +2/Diploma, Graduation, Master and higher degrees.
H0 : Mean satisfaction score is the same for customers of every education level.
H1 : Mean satisfaction score varies with level of education.
R gives the following output for ANOVA.
aov(formula = sscore ~ Edu)
             Df Sum Sq Mean Sq F value Pr(>F)
Edu           3    181   60.21   1.046  0.374
Residuals   146   8402   57.54
Thus we do not reject H0, since there is no evidence against it.
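As a small worked check of how the F value in the ANOVA table is formed, the sketch below (Python, with illustrative function names) divides the between-group mean square by the residual mean square, using the figures printed for Edu. Recomputing the mean squares from the rounded Sum Sq column gives slightly different values, which is only a rounding effect of the printed table.

```python
def mean_square(sum_sq, df):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_sq / df

def f_statistic(ms_between, ms_within):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    return ms_between / ms_within

# Mean squares exactly as printed in the Edu table.
f_edu = f_statistic(ms_between=60.21, ms_within=57.54)
print(f"F = {f_edu:.3f}")

# Mean squares recomputed from the rounded Sum Sq column.
ms_edu = mean_square(181, 3)
ms_res = mean_square(8402, 146)
print(f"Mean Sq (Edu) = {ms_edu:.2f}, Mean Sq (Residuals) = {ms_res:.2f}")
```

An F value close to 1 means the variation between education groups is about as large as the variation within them, which is why H0 is not rejected.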
Since the assumptions of ANOVA may be violated for score data of this kind, we also apply
its non-parametric alternative, the Kruskal-Wallis H test,
as below.
Kruskal-Wallis rank sum test
data: sscore by Edu
Kruskal-Wallis chi-squared = 3.4159, df = 3, p-value = 0.3318
The Kruskal-Wallis H test also gives no evidence against the null hypothesis.
Thus satisfaction with online shopping is almost the same for customers of
different education levels.
4.6 Satisfaction with Income groups
Here we test whether the satisfaction score varies across monthly family
income groups: below 20000, 20000 to 50000, 50000 to 75000, 75000 to 1
Lakh and above 1 Lakh.
H0 : Mean satisfaction scores of customers are identical in all income categories.
H1 : Mean satisfaction score varies across income groups.
First we use ANOVA. R gives the following output.
aov(formula = sscore ~ income)
             Df Sum Sq Mean Sq F value Pr(>F)
Income        4    265   66.25   1.155  0.333
Residuals   145   8317   57.36
Thus we do not reject H0, since there is no evidence against it.
Since the score data may deviate from the
assumptions of ANOVA, we also apply the usual non-parametric alternative to one-way analysis
of variance, the Kruskal-Wallis H test. R displays the following output.
Kruskal-Wallis rank sum test
data: sscore by Income
Kruskal-Wallis chi-squared = 3.2958, df = 4, p-value = 0.5096
The Kruskal-Wallis H test also gives no evidence against H0.
So we conclude that customers in all income groups have almost equal satisfaction with
online shopping.
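The Kruskal-Wallis p-value is the upper tail of a chi-squared distribution. For an even number of degrees of freedom that tail has a simple closed form, so the income result (3.2958 on 4 df) can be checked without statistical software. This is a minimal Python sketch with an illustrative function name; it deliberately does not cover odd degrees of freedom such as the 3 df of the education test.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-squared variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    # exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    total = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * total

p_income = chi2_upper_tail_even_df(3.2958, df=4)
print(f"p = {p_income:.4f}")
```

The result agrees with the p-value of 0.5096 in the R output above.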
4.7 Satisfaction with two groups- ‘interested’ and ‘not-interested’
In the previous chapter we found that awareness of e-shopping is almost the same for the two
groups of customers, those willing and those not willing to continue it in future. Now we focus
on how satisfied the two groups are with e-shopping. The hypotheses are
H0 : Satisfaction score does not differ between the two groups of customers (willing and not willing
to continue online shopping in future)
H1 : Satisfaction score differs between the two groups of customers.
Since the ‘not interested’ group is small (n = 7), we use the Mann-Whitney U test; R
yields the following output.
Wilcoxon rank sum test with continuity correction
data: sscore[Entry_out_1_0$continue == 1] and sscore[Entry_out_1_0$continue == 0]
W = 810.5, p-value = 0.005761
alternative hypothesis: true location shift is not equal to 0
Thus the test gives evidence against H0, so we reject H0 in favour of H1.
The mean satisfaction score of the ‘not-interested’ group is lower than that of
customers who are interested in e-shopping.
4.8 Conclusions
Throughout the chapter we carried out several tests; the following are the
conclusions we obtained.
 Satisfaction score varies significantly between males and females. The sample
means in Section 4.3 (59.69 for females, 64.71 for males) indicate that male
customers are more satisfied than female customers.
 Satisfaction score does not vary significantly between rural/town and urban customers.
That is, e-shoppers from either type of residency have similar satisfaction
with online shopping.
 There is no significant difference in satisfaction score between customers with different
education qualifications. Thus customers have similar satisfaction with
online shopping irrespective of education level. (In our study, all respondents
hold at least a +2/diploma qualification.)
 There is no significant difference in satisfaction score between
customers with different family incomes. Thus, irrespective of family income,
e-customers are almost equally satisfied with online shopping.
 There is a significant difference in satisfaction level between customers who
are willing to continue e-shopping in future and those who are not.
Customers who are not interested in shopping online in the near future have a lower mean
satisfaction score than interested customers.
One can claim that even when customers are aware, their satisfaction remains important for
e-stores that want to keep the market healthy.
Chapter 5
CLASSIFICATION BY MEANS OF REGRESSION AND DISCRIMINANT
ANALYSIS
5.1 Introduction
Classification is a powerful tool in machine learning: it predicts a categorical
response from the information provided by several explanatory variables. It
is valuable for any e-retailer to understand the type of a customer
from his or her past records.
In our study we used binary logistic regression to separate customers into two classes,
‘efficient customer’ and ‘less efficient customer’. The response variable is first
constructed from several variables described in Terminology.
We then used multinomial logistic regression to group customers into three
classes: ‘ideal’, ‘ordinary’ and ‘low profile’. These classes are also based on several
variables, which are discussed in Terminology.
In both cases, we first fitted the model with all the variables from which the response was
constructed. We then simplified the model by removing non-significant
explanatory variables, provided that the simpler model did not lose much
precision.
Finally we applied linear discriminant analysis alongside the above two regression models,
so that the models can be compared with one another in terms of precision.
The confusion matrix and misclassification rate of each model tell how reliable the model is.
Some regression graphs are also presented to show whether a variable is
significant for a particular model.
5.2 Binary Logistic Regression of Outcome Y on explanatory variables
U, V, W and X
Y : Outcome, a new binary variable taking the value 1 (efficient customer) if the customer is
a frequent internet user, spends not less than 500 in a single purchase, has been shopping online
for more than 2 years and is willing to purchase in future, and 0 otherwise (not efficient).
U : Number of years the customer has been purchasing online.
Ordinal, with ordered levels 1, 2, 3 & 4
V : Average spending in a single purchase. Factor with levels a, b, c & d
W : Frequency of internet usage. Ordinal, with levels 1, 2, 3 & 4
X : Wish to continue online shopping in future, 1 if yes and 0 otherwise.
If p = P(Y=1), then the model becomes
\[ \operatorname{logit}(p) = \alpha + \beta u + \delta_i v + \sigma w + \lambda x, \qquad i = b, c, d \]
Here, level ‘a’ of V is redundant (it acts as the reference level).
The link function is
\[ \operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) \]
R yields the following summary of the model.
glm(formula = out ~ u + w + x, family = "binomial", data = Dat)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.7494 -0.1687 -0.0105 -0.0101 0.6665
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -39.42289 2655.79412 -0.015 0.988
u 5.63671 1.15708 4.871 1.11e-06 ***
w 0.05184 1.33980 0.039 0.969
x 23.85253 2655.78879 0.009 0.993
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 150.121 on 149 degrees of freedom
Residual deviance: 30.496 on 146 degrees of freedom
AIC: 38.496
Number of Fisher Scoring iterations: 18
This full model shows that some of the variables have no significant effect on Y.
So, to simplify the model, we use backward elimination, and
the simplest model obtained is given below.
\[ \operatorname{logit}(p) = \alpha + \beta u \]
glm(formula = out ~ u, family = "binomial", data = Dat)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.1870 -0.2318 -0.2318 -0.0267 0.8854
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -12.280 2.263 -5.426 5.75e-08 ***
u 4.338 0.817 5.310 1.10e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 150.121 on 149 degrees of freedom
Residual deviance: 45.176 on 148 degrees of freedom
AIC: 49.176
Number of Fisher Scoring iterations: 7
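The AIC values in the two summaries can be reproduced from the residual deviances. The sketch below is Python with an illustrative function name, and it assumes the standard relation AIC = deviance + 2k, which holds for ungrouped 0/1 responses because the residual deviance then equals minus twice the log-likelihood. The parameter counts (4 and 2) are read off the coefficient tables.

```python
def aic_from_deviance(residual_deviance, n_params):
    """AIC = deviance + 2k, valid when the deviance equals -2 * log-likelihood."""
    return residual_deviance + 2 * n_params

aic_full = aic_from_deviance(30.496, n_params=4)    # intercept, u, w, x
aic_simple = aic_from_deviance(45.176, n_params=2)  # intercept, u

print(f"AIC full model   : {aic_full:.3f}")
print(f"AIC simple model : {aic_simple:.3f}")
print(f"increase in deviance after simplification: {45.176 - 30.496:.2f} on 2 df")
```

The simpler model has the higher AIC, so the simplification gains interpretability at some cost in fit.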
Thus the fitted model is
\[ \operatorname{logit}(\hat{p}) = -12.28 + 4.338\,u \]
so that
\[ \hat{p} = \frac{e^{y}}{1 + e^{y}}, \qquad y = -12.28 + 4.338\,u \]
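To make the fitted model concrete, the following Python sketch evaluates the estimated probability at each of the four levels of u. The function name is illustrative, and the 0.5 cut-off used to turn a probability into a predicted class is an assumption, since the report does not state which threshold was used.

```python
import math

def fitted_probability(u, intercept=-12.28, slope=4.338):
    """Inverse logit of the fitted linear predictor y = intercept + slope * u."""
    y = intercept + slope * u
    return 1.0 / (1.0 + math.exp(-y))

for u in (1, 2, 3, 4):
    p_hat = fitted_probability(u)
    label = 1 if p_hat >= 0.5 else 0  # 0.5 cut-off is an assumption
    print(f"u = {u}: p_hat = {p_hat:.4f}, predicted class = {label}")
```

Under that assumed cut-off the model predicts ‘efficient’ only for customers at levels 3 and 4 of u, that is, for those with the longest history of purchasing online.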
Confusion Matrix
Predicted
actual 0 1
0 115 5
1 0 30
Misclassification error is 3.333333 %
The following graphs reflect the significance of the different variables for the outcome.
5.3 Multinomial Logistic Regression of customer type Y on several explanatory variables given below.
Y→New categorical variable for the type of customer, with 3 categories: ‘ideal’, ‘ordinary’
and ‘lowprofile’.
This variable is constructed from several variables, such as how satisfied the customer is with
the current practice of online shopping, how confident he or she is in the digital payment system,
and so on.
P→factor, past years since purchasing online
Q→factor, average spending in a single purchase
R→factor, amount spent in the last purchase
S→ordinal, whether the customer uses price comparison sites
T→ordinal, preference for reading reviews/ratings
U→ordinal, preference for sharing own reviews/ratings
V→ordinal, whether the customer feels safe/secure in online shopping
W→ordinal, agreement level on hesitation about online payment
X→ordinal, agreement level on overall satisfaction
Z→numeric, (yes=1 or no=0 to continue shopping online in future)
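The model thus becomes
\[ \log\left(\frac{p_2}{p_1}\right) = \alpha + \beta_i p + \beta_j q + \beta_k r + \delta_1 s + \delta_2 t + \delta_3 u + \delta_4 v + \delta_5 w + \delta_6 x + \sigma z \]
\[ \log\left(\frac{p_3}{p_1}\right) = \alpha' + \beta'_i p + \beta'_j q + \beta'_k r + \delta'_1 s + \delta'_2 t + \delta'_3 u + \delta'_4 v + \delta'_5 w + \delta'_6 x + \sigma' z \]
where i, j, k = b, c, d
p_1 = P(customer is ‘ideal’)
p_2 = P(customer is ‘lowprofile’)
p_3 = P(customer is ‘ordinary’)
Here p_1 + p_2 + p_3 = 1, and R has taken ‘ideal’ as the reference level, so p_1 appears in the
denominator.
The estimated coefficients of the full model are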
(Intercept) pb pc pd qb qc qd rb
lowprofile 120.6 -70.6 -47.20 -114.60 -67.41 -10.180 -178.90 20.13
ordinary 128.5 -70.8 -44.99 13.51 -65.28 -5.829 -96.74 20.58
rc rd s t u v w x z
lowprofile 56.16 82.16 13.66 99.82 15.30 -31.26 26.19 -65.09 67.41
ordinary 55.61 79.86 12.13 101.30 16.71 -28.33 20.44 -63.01 65.00
Residual Deviance: 35.16687 AIC: 103.1669
Wald Test for significance of coefficients
To assess the significance of the individual coefficients, the Wald test gives the following p-values,
which are calculated from the corresponding standard errors.
(Intercept) pb pc pd qb qc qd rb rc
lowprofile 0 1.045e-09 0 NaN 0 0.1069 0.000e+00 0 4.570e-09
ordinary 0 9.291e-10 0 0 0 0.3561 3.672e-07 0 6.354e-09
rd s t u v w x z
lowprofile 0 0.3235 0 0.5426 0.4987 0.3958 0.04770 6.058e-09
ordinary 0 0.3809 0 0.5063 0.5397 0.5075 0.05522 2.061e-08
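A Wald p-value is obtained by dividing an estimate by its standard error and referring the ratio to the standard normal distribution. Since the standard errors of the multinomial model are not printed here, the Python sketch below illustrates the calculation with the coefficient of u from the full binary model of Section 5.2 (estimate 5.63671, standard error 1.15708); the function name is illustrative.

```python
import math

def wald_test(estimate, std_error):
    """Wald z statistic and two-sided normal p-value for a single coefficient."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Estimate and standard error of u from the full binary model in Section 5.2.
z_u, p_u = wald_test(5.63671, 1.15708)
print(f"z = {z_u:.3f}, p = {p_u:.3g}")
```

The result agrees with the z value (4.871) and p-value (1.11e-06) that R prints for u in that summary. P-values of exactly 0 or NaN in the table above usually signal very large estimates or unstable standard errors, so they are best read with caution.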
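Now, by removing some non-significant variables such as S and U, and introducing an interaction
between V and W (feeling safe/secure and hesitation at payment), we get a simpler and more
precise model:
\[ \log\left(\frac{p_2}{p_1}\right) = \alpha + \beta_i p + \beta_j q + \beta_k r + \delta_1 t + \delta_2 v + \delta_3 w + \delta_4 x + \sigma z + \lambda (v \times w) \qquad (1) \]
\[ \log\left(\frac{p_3}{p_1}\right) = \alpha' + \beta'_i p + \beta'_j q + \beta'_k r + \delta'_1 t + \delta'_2 v + \delta'_3 w + \delta'_4 x + \sigma' z + \lambda' (v \times w) \qquad (2) \]
where i, j, k = b, c, d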
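If the right-hand side of equation (1) is y_1 and that of equation (2) is y_2, then the estimated probabilities are
\[ \hat{p}_1 = \frac{1}{1 + e^{y_1} + e^{y_2}}, \qquad \hat{p}_2 = \frac{e^{y_1}}{1 + e^{y_1} + e^{y_2}}, \qquad \hat{p}_3 = \frac{e^{y_2}}{1 + e^{y_1} + e^{y_2}} \]
The estimated coefficients of the simplified model are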
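The step from the two log-odds to the three class probabilities can be sketched as follows in Python. The function name and the input values are hypothetical and chosen only for illustration; they are not fitted values from this study. Because the estimated coefficients are large in magnitude, the sketch subtracts the largest exponent before exponentiating so that the calculation cannot overflow.

```python
import math

def multinomial_probabilities(y1, y2):
    """Class probabilities from the two log-odds against the reference class.

    y1 = log(p2/p1) and y2 = log(p3/p1), with 'ideal' (p1) as the reference.
    Returns (p1, p2, p3).
    """
    # Subtract the largest exponent first so exp() cannot overflow.
    m = max(0.0, y1, y2)
    e0 = math.exp(-m)
    e1 = math.exp(y1 - m)
    e2 = math.exp(y2 - m)
    denom = e0 + e1 + e2
    return e0 / denom, e1 / denom, e2 / denom

# Hypothetical linear predictors, for illustration only.
p1, p2, p3 = multinomial_probabilities(y1=1.0, y2=2.0)
print(f"ideal = {p1:.4f}, lowprofile = {p2:.4f}, ordinary = {p3:.4f}")
```

The three probabilities always sum to 1, and the customer is assigned to the class with the largest probability.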
(Intercept) pb pc pd qb qc qd rb
lowprofile -203.4 -58.44 -34.43 -164.70 -69.17 -10.44 -237.0 12.19
ordinary 373.2 -58.77 -33.76 57.36 -40.89 23.12 -75.2 12.38
rc rd t v w x z v:w
lowprofile -30.23 76.98 41.94 98.60 170.00 -108.60 39.51 -37.01
ordinary 11.83 70.84 52.22 -64.85 -52.53 -54.38 35.25 18.02
Confusion matrix
              Predicted
actual        ideal  lowprofile  ordinary
ideal             4           0         0
lowprofile        0          32         1
ordinary          0           1       112
The misclassification percentage is 1.33, which is very low.
In terms of probability, the first four predictions of this model are
ideal lowprofile ordinary
1 0 0.4208 0.5792
2 0 1.0000 0.0000
3 0 0.0000 1.0000
4 0 0.0000 1.0000
5.4 Linear Discriminant Analysis (LDA) of outcome on U, V and X
Y→Outcome, a new binary variable taking the value 1 (efficient customer) if the customer
is a frequent internet user, spends not less than 500 in a single purchase, has been shopping online
for more than 2 years and is willing to purchase in future, and 0 otherwise (not efficient).
U→Number of years the customer has been purchasing online.
Ordinal, with ordered levels 1, 2, 3 & 4
V→Average spending in a single purchase. Factor with levels a, b, c & d
X→Wish to continue online shopping in future, 1 if yes and 0 otherwise.
R gives the following output for the discriminant analysis.
lda(out ~ u + v + x, data = Dat)
Prior probabilities of groups:
0 1
0.8 0.2
Group means:
u v x
0 1.691667 2.150000 0.9416667
1 3.400000 2.333333 1.0000000
Coefficients of linear discriminants:
LD1
u 1.71759540
v -0.04364488
x 0.98842355
Confusion matrix
predict
actual 0 1
0 116 4
1 0 30
The misclassification percentage is 2.67 (4 of 150 customers).
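To see how the discriminant coefficients separate the two groups, the Python sketch below projects each group mean onto LD1 using the coefficients printed above. The function name is illustrative, and the scores are not centred the way R's predicted scores are; the difference between the two groups does not depend on that centring.

```python
def ld1_score(u, v, x, coef=(1.71759540, -0.04364488, 0.98842355)):
    """Projection of (u, v, x) onto the first linear discriminant (not centred)."""
    return coef[0] * u + coef[1] * v + coef[2] * x

# Group means of u, v and x taken from the lda output above.
score_group0 = ld1_score(1.691667, 2.150000, 0.9416667)
score_group1 = ld1_score(3.400000, 2.333333, 1.0000000)
separation = score_group1 - score_group0

print(f"group 0: {score_group0:.3f}, group 1: {score_group1:.3f}")
print(f"separation along LD1: {separation:.3f}")
```

Almost all of the separation comes from u, which matches the logistic regression result that the number of years of online purchasing is the dominant variable.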
5.5 Linear Discriminant Analysis (LDA) of Y (customer type) on several
ordinal responses
Here the explanatory variables are ordinal or categorical, except Z, which is numeric.
Y→categorical, customer type (dependent variable)
P→categorical, past years since purchasing online (past.f)
Q→ordinal, average spending in a single purchase
T→categorical, preference for reading reviews/ratings (reviewr.f)
V→ordinal, whether the customer feels safe/secure in online shopping
W→ordinal, agreement level on hesitation about online payment
X→ordinal, agreement level on overall satisfaction
Z→numeric, (yes/no to continue shopping online in future)
R yields the following discriminant output.
lda(y ~ past.f + q + reviewr.f + v * w + x + z, data = Dat2)
Prior probabilities of groups:
ideal lowprofile ordinary
0.02666667 0.22000000 0.75333333
Group means:
past.fb past.fc past.fd q reviewr.fb reviewr.fc v
ideal 1.0000 0.0000 0.0000 2.5 0.00000 0.00000 4.0000
lowprofile 0.6060 0.2121 0.0000 2.06 0.18181 0.03030 2.2424
ordinary 0.4159 0.1238 0.1238 2.21 0.3185 0.053097 2.9026
w x z v:w
ideal 2.2500 3.2500 1.00000 9.000000
lowprofile 4.0909 1.7272 0.96969 9.151515
ordinary 2.6106 2.1504 0.94690 7.769912
Coefficients of linear discriminants:
LD1 LD2
past.fb 0.03967307 1.1541507
past.fc -0.07931444 0.8777495
past.fd 1.54621131 -0.0129210
q 0.39109904 0.2327005
reviewr.fb 0.85215592 -0.5408363
reviewr.fc 0.39822460 -0.8159128
v -0.70579709 1.7595617
w -2.26878996 1.3843756
x 0.48751834 0.9226888
z 0.58574159 1.0723062
v:w 0.45171501 -0.4799412
Proportion of trace:
LD1 LD2
0.8776 0.1224
Confusion matrix
predicted
actual ideal lowprofile ordinary
ideal 4 0 0
lowprofile 0 30 3
ordinary 1 5 107
The misclassification error is 6%.
The following graph displays the classification produced by the above model.
5.6 Conclusions
 The binary logistic regression model reveals that past purchasing behaviour plays a significant
role in whether a consumer is ‘efficient’ or ‘less efficient’. The misclassification error is
about 3.33%, which is low.
 The multicategory logistic regression model works well with several significant explanatory
variables, such as past purchase behaviour, amount spent in a single purchase and willingness
to continue purchasing online.
The misclassification percentage is only about 1.33, so the model can be considered more reliable.
 The discriminant analysis model for the same variables as the binary logistic model has a
misclassification percentage of 2.67, slightly lower than that of the binary logistic model.
 The discriminant analysis model for some of the variables of the multicategory logistic regression
has a misclassification error of about 6%. The grouping is shown in a graph produced with R.
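The four misclassification figures quoted in this chapter can be recomputed directly from the confusion matrices. The following is a small Python sketch with illustrative names; the matrices are copied from Sections 5.2 to 5.5, with rows as actual classes and columns as predicted classes. Note that these appear to be rates for the same 150 respondents used to fit the models, so they may be optimistic for new customers.

```python
def misclassification_percent(confusion):
    """Share of off-diagonal counts in a square confusion matrix, in percent."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * (total - correct) / total

# Rows are actual classes, columns are predicted classes.
models = {
    "binary logistic": [[115, 5], [0, 30]],
    "multinomial logistic": [[4, 0, 0], [0, 32, 1], [0, 1, 112]],
    "LDA, binary outcome": [[116, 4], [0, 30]],
    "LDA, customer type": [[4, 0, 0], [0, 30, 3], [1, 5, 107]],
}

rates = {name: misclassification_percent(m) for name, m in models.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} %")
```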
Department Of Statistics
University of Kerala
CASE STUDY ON STATUS OF ONLINE SHOPPING IN TRIVANDRUM DISTRICT
Questionnaire
A) Personal Data
1. Name (optional) :
2. Gender : Female □ Male □
3. Age (years) :
4. Residency : a) Rural/Town □ b) Urban □
5. Education qualification : a) SSLC □ b) Plus 2/Diploma □
c) Graduation □ d) Master □ e) Professional □
(Please specify the stream for above qualifications)
6. Occupation : a) Student □ b) Teacher/Researcher □
c) IT professional □ d) Engineer/Industrial □
e) Business/Management □ f) Civil service □ g) Other □ (specify)_____
7. Monthly family Income (Rs) : a) <20,000 □ b) 20,000-50,000 □
c) 50,000-75,000 □ d) 75,000-1 Lakh □ e) above 1 Lakh □
B) Survey Related Questions
1) Where do you access internet primarily?
a) Mobile □ b) PC □ c) Tablet/iPad □
d) Office/workplace □ e) Others □
2) How often would you use internet in a week (except study & work)?
a) Daily □ b) Once in 2-3 days □ c) Once in week □ d) Less frequently □
3) When did you purchase online lastly?
a) Within last week □ b) Within last month □
c) Before month □ d) Before 3-4 months □ e) Before 6 months □
4) Since how many years you have been shopping through online?
a) 1 year □ b) 2-3 years □ c) 4-5 years □ d) more than 5 years □
5) On an average how much would you spend in single purchase? (in Rs)
a) Less than 500 □ b) 500-2000 □
c) 2000-5000 □ d) more than 5000 □
6) Approximately how much you had spent on a last purchase? (in Rs)
a) Less than 500 □ b) 500-2000 □
c) 2000-5000 □ d) more than 5000 □
7) Approximately how much time you had spent on a last purchase?
a) few minutes □ b) 15 to 30 min □
c) 30-60 min □ d) more than 1 hr □
8) Do you have -
Credit/Debit card Yes □ No □
E-banking facility Yes □ No □
9) Which payment method did you use for last purchase?
a) Credit/Debit card □ b) Net banking/Digital Wallet □
c) Cash on delivery □ d) Others (specify) □_____
10) Rank the elements those promote you to purchase online.
(most preferred has rank 1 & least one has rank 6)
Element                              Ranking (by preference)
a. Convenient & Relaxed way
b. Door to door service
c. Specific Product information
d. Low price
e. Variety of products
f. Time save
11) What Products you usually purchase online?
Product                                    Choice
                                           Yes  No
a. Clothing & Accessories                  □    □
b. Books & Stationary                      □    □
c. Mobile & Computer/Accessories           □    □
d. Electronic & Digital Accessories        □    □
e. Home, Kitchen & Pets                    □    □
f. Toys & Baby Products                    □    □
g. Sports, Fitness & outdoor               □    □
h. Beauty, health & Cosmetics/Jewellery    □    □
12) Rank the following according to your preference.
Factor                            Ranking
a) Brand
b) Popularity (rating/reviews)
c) Price (low-high/high-low)
d) Discount/Offer/Coupon
e) Fresh arrivals
f) Others
Site                              Ranking
a) Amazon
b) Snapdeal
c) Flipkart
d) Myntra
e) Ebay
f) Others (specify)
13) Would you prefer using price comparison sites?
a) Almost always □ b) sometimes □ c) rarely/never □
14) Do you prefer to read review/ratings of product by other purchasers?
a) Almost always □ b) sometimes □ c) rarely/never □
15) Do you express your opinion in the “Product review/rating” section?
a) Almost always □ b) sometimes □ c) rarely/never □
16) I am willing to pay more if
a) website offer free delivery □ b) faster/fastest delivery options □
c) tax free shopping □ d) item not available offline □
T1
Factors (each rated as: Strongly disagree / Disagree / Indifferent / Agree / Strongly agree)
1. I can buy the products anytime 24 hours a day while shopping online ○ ○ ○ ○ ○
2. It is easy to choose and make comparison with other products ○ ○ ○ ○ ○
3. The website design/layout helps me in searching and selecting the right product ○ ○ ○ ○ ○
4. Sometimes I can find products online which I may not find in stores ○ ○ ○ ○ ○
5. I feel that it takes less time in evaluating and selecting a product while shopping online ○ ○ ○ ○ ○
6. I feel safe and secure while shopping online ○ ○ ○ ○ ○
7. I like to shop online from a trustworthy website ○ ○ ○ ○ ○
8. There has been asking unnecessary information in online shopping. ○ ○ ○ ○ ○
9. I believe online shopping will eventually supersede traditional shopping ○ ○ ○ ○ ○
10. A long time is required for the delivery of products and service ○ ○ ○ ○ ○
T2
Factors (each rated as: Strongly disagree / Disagree / Indifferent / Agree / Strongly agree)
11. More choices are available in online site ○ ○ ○ ○ ○
12. The description of products shown on the site are very accurate ○ ○ ○ ○ ○
13. Online shopping is as secure as traditional shopping ○ ○ ○ ○ ○
14. At the time of payment, I hesitate to give my credit/debit card number ○ ○ ○ ○ ○
15. Internet reduces the monetary costs of traditional shopping to a great extent (parking, travel, etc) ○ ○ ○ ○ ○
16. I am satisfied with the service quality of online retailers ○ ○ ○ ○ ○
17. When I get a product up to expectation, I prefer same site next time ○ ○ ○ ○ ○
18. Delivery/shipping charge of product is relatively high ○ ○ ○ ○ ○
19. Product gets delivered before the delivery timeline mentioned ○ ○ ○ ○ ○
20. I am overall satisfied with the experience of shopping online. ○ ○ ○ ○ ○
T3
Factor (each rated as: Never Important / Not Important / Indifferent / Important / Very Important)
i. Reputation of the company □ □ □ □ □
ii. Guarantees and Warrantees □ □ □ □ □
iii. Privacy □ □ □ □ □
iv. Description of goods in the site □ □ □ □ □
v. Risk of payment data □ □ □ □ □
vi. Seasonal/Festival offers □ □ □ □ □
vii. Waiting to receive the product □ □ □ □ □
viii. Tracking the product I ordered □ □ □ □ □
ix. The advertisement of related product to the purchasing product (shoe:socks) □ □ □ □ □
x. Impact of reviews/ratings □ □ □ □ □
xi. Not being able to touch products. □ □ □ □ □
xii. Return policy of online store □ □ □ □ □
xiii. Situation of out of stock items □ □ □ □ □
17) Do social networking advertisements influence you on online purchase?
Yes □ No □
18) Have you suffered from technical problems during purchase?
Yes □ No □ If yes, please mention______
19) Have you suffered from transaction problems during payment?
Yes □ No □
If yes, please mention ______
20) Have you ever had to cancel your order? (before it got dispatched)
Yes □ No □
21) Have you returned the delivered object over last 2-3 purchase?
Yes □ No □
If yes, what would be the reason?
a) Absent of receiver □
b) Change in product quality/size/colour □
c) Damaged product □
d) Lost interest in that product □
e) Other □
22) Have you ever faced cancel of order by company without your consent?
Yes □ No □
23) Do you know customer can sell used/fresh commodities through some
shopping sites?
Yes □ No □
24) Are you aware of your consumer rights when shopping online?
Yes □ No □
25) Do you intend to continue purchasing products from the internet in the
near future?
Yes □ No □
-Thank You

More Related Content

What's hot (20)

Law of demand
Law of demandLaw of demand
Law of demand
 
Consumption function and investment function chapter 2
Consumption function and investment function chapter 2Consumption function and investment function chapter 2
Consumption function and investment function chapter 2
 
e payment system ppt
e payment system ppte payment system ppt
e payment system ppt
 
Risks involved in E-payment
Risks involved in E-payment Risks involved in E-payment
Risks involved in E-payment
 
E business and accounting
E business and accountingE business and accounting
E business and accounting
 
Consumption And Investment Function
Consumption And Investment FunctionConsumption And Investment Function
Consumption And Investment Function
 
Balance of payments
Balance of paymentsBalance of payments
Balance of payments
 
Presentation on keynesian theory
Presentation on keynesian theoryPresentation on keynesian theory
Presentation on keynesian theory
 
Perfect Competitive Market
Perfect Competitive Market Perfect Competitive Market
Perfect Competitive Market
 
Growth of e commerce industry
Growth of e commerce industryGrowth of e commerce industry
Growth of e commerce industry
 
E transaction
E transactionE transaction
E transaction
 
money functions
money functionsmoney functions
money functions
 
E payment
E paymentE payment
E payment
 
Balance of payment
Balance of paymentBalance of payment
Balance of payment
 
E wallet
E walletE wallet
E wallet
 
Importance of the study of elasticity of demand
Importance of the study of elasticity of demandImportance of the study of elasticity of demand
Importance of the study of elasticity of demand
 
Basics of Accounting Mechanics-Processing Accounting Information
Basics of Accounting Mechanics-Processing Accounting InformationBasics of Accounting Mechanics-Processing Accounting Information
Basics of Accounting Mechanics-Processing Accounting Information
 
The investment function
The investment functionThe investment function
The investment function
 
Cost and revenue analysis
Cost and revenue analysisCost and revenue analysis
Cost and revenue analysis
 
E Payment Methods
E Payment MethodsE Payment Methods
E Payment Methods
 

A Case Study on Status of Online Shopping in Trivandrum District

  • 1. A CASE STUDY ON STATUS OF ONLINE SHOPPING IN TRIVANDRUM DISTRICT. PROJECT SUBMITTED TO THE UNIVERSITY OF KERALA IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN STATISTICS BY HALAKE KUMAR SURESH (Reg No : STA 150504) AND NEETHU M G (Reg No : STA 150506). DEPARTMENT OF STATISTICS, UNIVERSITY OF KERALA, KARIAVATTOM, THIRUVANANTHAPURAM. 2015 - 2017
  • 2. Dr. C. SATHEESH KUMAR, Professor and Head, Department of Statistics, University of Kerala, Kariavattom, Thiruvananthapuram. June, 2017. CERTIFICATE: I hereby certify that this “A CASE STUDY ON STATUS OF ONLINE SHOPPING IN TRIVANDRUM DISTRICT” is a bona fide report of the project work carried out by Mr. Halake Kumar Suresh and Ms. Neethu M. G., fourth-semester M.Sc. Statistics students in the Department of Statistics, University of Kerala, during 2015 – 2017 under my supervision and guidance, in partial fulfilment of the requirements for the M.Sc. Degree in Statistics of the University of Kerala. Dr. C. Satheesh Kumar
  • 3. ACKNOWLEDGEMENT: The success of anything needs encouragement and co-operation from different quarters. Words are inadequate to express our profound and deep sense of gratitude to those who helped us in bringing out this project successfully. We owe an inestimable debt to Dr. C. Satheesh Kumar, Professor & Head, Department of Statistics, University of Kerala, and our project guide, for his constant encouragement, invaluable guidance and the suggestions rendered during the course of the project work. We are very grateful to all our teachers, the librarian, research scholars and M.Phil. students of the Department of Statistics, University of Kerala, for their kind help throughout the work. We also take this opportunity to thank our family members and friends for their love and encouragement throughout the study. Above all, we thank Almighty God, without whose blessings we would never have been able to complete this work. Mr. Halake Kumar Suresh, Ms. Neethu M. G. Kariavattom, June, 2017
  • 4. CONTENTS
    1. INTRODUCTION (pp. 6-10)
       1.1 Introduction (p. 6)
       1.2 Terminology (p. 8)
       1.3 Objective (p. 9)
       1.4 Limitations (p. 9)
       1.5 Summary of the study (p. 9)
    2. MATERIALS AND METHODS (pp. 11-16)
       2.1 Introduction (p. 11)
       2.2 Sampling Techniques (p. 11)
       2.3 Description of the Questionnaire (p. 12)
       2.4 Statistical Tools (p. 12)
    3. STUDY ON AWARENESS OF CUSTOMERS (pp. 17-26)
       3.1 Introduction (p. 17)
       3.2 Normality test (p. 17)
       3.3 Awareness of males and females (p. 18)
       3.4 Awareness and residency (p. 20)
       3.5 Awareness and different education levels (p. 22)
       3.6 Awareness and different education streams (p. 23)
       3.7 Awareness and different monthly income groups (p. 24)
  • 5. CONTENTS (continued)
       3.8 Awareness and customers interest (p. 25)
       3.9 Conclusions (p. 26)
    4. STUDY ON SATISFACTION OF CUSTOMERS (pp. 27-36)
       4.1 Introduction (p. 27)
       4.2 Normality test (p. 27)
       4.3 Awareness of males and females (p. 28)
       4.4 Awareness and residency (p. 30)
       4.5 Awareness and different education levels (p. 32)
       4.6 Awareness and different monthly income groups (p. 33)
       4.7 Awareness and customers interest (p. 34)
       4.8 Conclusion (p. 35)
    5. LOGISTIC REGRESSION AND LINEAR DISCRIMINANT ANALYSIS (pp. 36-47)
       5.1 Introduction (p. 36)
       5.2 Binary logistic regression (p. 37)
       5.3 Multinomial logistic regression (p. 41)
       5.4 Linear discriminant analysis of output (p. 44)
       5.5 Linear discriminant analysis of customer (p. 45)
       5.6 Conclusions (p. 47)
    Reference (p. 48)
    Questionnaire (pp. 49-56)
  • 6. Chapter 1 INTRODUCTION
    1.1 Introduction
    1.1.1 E-commerce in India: state of play
    Retail e-commerce sales in India are expected to increase to approximately $45.17 billion by 2021, and this correlates with the yearly rise in the use of the internet on mobile devices in the country. (A similar trend of rising mobile purchases is also found across Asia and Western countries as people become more reliant on mobile technology.) Western e-commerce brands have also taken an interest in India’s maturing commercial scene. The Canadian e-commerce start-up Shopify announced the release of Shopify.in in 2013 to much fanfare and publicity, a clear example of how Western companies see potential in the thriving economy of India. Shopify was quick to capitalise on the growing e-commerce entrepreneur and SME market, a demographic that is contributing refreshingly diverse voices to India’s economic landscape.
    China, the other big e-commerce success story of recent years, has also expressed an interest in entering the buoyant Indian market. The Chinese e-tail giant Alibaba is keen to get a slice of India’s market, currently dominated by the native Flipkart and Snapdeal, and announced plans to open an office in Mumbai in late 2016. There are many exciting developments in the pipeline for India’s e-commerce industry. India’s online population is increasing at a rapid pace, as is the number of people accessing the internet on mobile devices.
    1.1.2 Online Consumer Buying Behavior
    Everyone in the world is a consumer. Each of us buys, sells or consumes goods and services in life. Consumer behavior is very complex and is determined to a large extent by social and psychological factors. Consumer behavior can be defined as those acts of individuals directly involved in obtaining, using and disposing of economic goods and services.
  • 7. The relevance and importance of understanding consumer behavior is rooted in modern marketing. No two consumers have the same needs, so each buys only those products and services which satisfy their own wants and desires. To survive in the market, a firm has to innovate constantly and understand the latest consumer needs and tastes. This understanding is extremely useful in exploiting marketing opportunities and in meeting the challenges that the Indian market offers.
    Online consumer behavior parallels offline consumer behavior, with some obvious differences. The stages of the consumer decision process are basically the same whether the consumer is online or offline, but the general model of consumer behavior needs modification to take new factors into account. In the online model, web site features, along with consumer skills, product characteristics, attitudes towards online purchasing and perceptions about control over the Web environment, play a vital role. There are parallels in the analogue world, where it is well known that consumer behavior can be influenced by store design, and that understanding the precise movements of consumers through a physical store can enhance sales if goods and promotions are arranged along the most likely consumer tracks. Consumer skills refer to the knowledge that the consumer has about how to conduct online transactions. Product characteristics refer to the fact that some products can be easily described, packaged and shipped over the Internet whereas others cannot. Combined with traditional factors such as brand, advertising and firm capabilities, these factors lead to specific attitudes about online shopping.
    Consumer behavior regarding the use of the internet for shopping varies. Some consumers either lack access or resist using this new channel of distribution, primarily due to privacy and security concerns. Other shoppers browse the Web to gather information and then visit stores to negotiate the purchase face to face with the retailer. A few shoppers visit retail stores first and then buy from an e-retailer. Still others do all their shopping online: gathering information, negotiating, purchasing and either arranging for delivery or picking up the merchandise in the store.
    Three key ways any Indian e-commerce store can get ahead of the game are: investing in a brand story, adopting a multichannel strategy, and getting to grips with social media content.
  • 8. 1.2 Terminology
    Online Shopping: The action or activity of buying goods or services over the internet. Online shopping is a form of electronic commerce which allows consumers to buy goods or services directly from a seller over the Internet using a web browser.
    Customer: A customer is an individual or business that purchases the goods or services produced by a business. Attracting customers is the primary goal of most public-facing businesses, because it is the customer who creates demand for goods and services. We classified customers into groups such as ‘more efficient’ and ‘less efficient’, or ‘ideal’, ‘ordinary’ and ‘low profile’, based on how long they have been shopping online, the average amount spent in a single purchase, the frequency of internet usage, the wish to shop online in future, etc.
    Satisfaction towards online shopping: The extent or degree to which a customer is fulfilled by the experience of shopping through the internet. In our study, the satisfaction score is the sum of the scores assigned to some of the sub-questions of tables T1 and T2. Qns 1, 2, 3, 4, 5, 6, 9, 11, 12, 13, 15, 16, 17 and 20 are scored as follows: Strongly disagree (score 1), Disagree (score 2), Indifferent (score 3), Agree (score 4) and Strongly agree (score 5). For Qns 8, 10, 14 and 18 the scoring is in reverse order.
    Awareness towards online shopping: There are some psychological, personal and social factors which measure the level of awareness of a customer towards online shopping. In our study, the awareness score is the sum of the scores assigned to sub-questions 1 to 10 and Qn 12 of table T3. The scoring is as follows: Never important (score 1), Not important (score 2), Indifferent (score 3), Important (score 4), Very important (score 5).
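The scoring rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' own code (the report states that the analysis was done in R): the function names and the dictionary layout of a response are assumptions made here, and reverse scoring is taken to mean replacing a code c by 6 − c on the 1 to 5 scale.

```python
# Question numbers as listed in the terminology section (tables T1 and T2).
FORWARD_ITEMS = [1, 2, 3, 4, 5, 6, 9, 11, 12, 13, 15, 16, 17, 20]
REVERSE_ITEMS = [8, 10, 14, 18]
# Awareness uses sub-questions 1 to 10 and 12 of table T3.
AWARENESS_ITEMS = list(range(1, 11)) + [12]


def satisfaction_score(responses):
    """Sum the satisfaction items.

    `responses` (hypothetical layout) maps a question number to a code
    from 1 (Strongly disagree) to 5 (Strongly agree).
    """
    forward = sum(responses[q] for q in FORWARD_ITEMS)
    # Assumption: "reverse order" means 1<->5 and 2<->4, i.e. 6 - code.
    reverse = sum(6 - responses[q] for q in REVERSE_ITEMS)
    return forward + reverse


def awareness_score(responses):
    """Sum the awareness items, coded 1 (Never important) to 5 (Very important)."""
    return sum(responses[q] for q in AWARENESS_ITEMS)


# A respondent who answers "Agree" (4) to every T1/T2 item.
all_agree = {q: 4 for q in range(1, 21)}
print(satisfaction_score(all_agree))
```

Under these assumptions the satisfaction score ranges from 18 to 90 (18 scored items) and the awareness score from 11 to 55 (11 scored items).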
  • 9. 1.3 Main Objectives of the Study
    - To check whether the awareness of customers towards online shopping changes with different vital factors like gender, residency, stream of education, education level and income.
    - To check whether satisfaction with online shopping varies with gender, residency, stream of education, education level and income.
    - To classify customers into different groups by means of logistic regression, multinomial regression and discriminant analysis models.
    1.4 Limitations of the study
    A few limitations of the study are listed below.
    - Results may not be comparable across geographical areas or time periods, as the survey was conducted in Trivandrum district in the months of April-May.
    - The survey covers only online shoppers; purely offline shoppers are not considered.
    - The sample size is 150, which is not very large.
    1.5 Summary of the study
    In chapter 1 we give a brief introduction to e-commerce, explain the terms used in the study, state the main objectives and discuss the limitations of the study.
    Chapter 2 deals with the sampling techniques used for data collection and with the description of the questionnaire: which questions are included, how they are arranged and which new variables are constructed from the respondents' choices. It then gives the theoretical background of the statistical tools used throughout the study. We included parametric and non-parametric tools according to their applicability.
    Chapter 3 focuses on the analysis of the awareness of customers towards online shopping. We compared the awareness level across different vital factors like gender, residency, education level, education stream and monthly family income.
  • 10. Chapter 4 is about the satisfaction of customers towards e-shopping. Here too we applied different statistical tools to see whether the satisfaction score differs significantly across vital factors like gender, residency, education level, education stream and monthly family income.
    Chapter 5 applies different regression models and discriminant analysis to distinguish the type of customer, which gave us several reliable and precise classification models.
    We used RStudio for the analysis of the data.
  • 11. Chapter 2 MATERIALS AND METHODS
    2.1 Introduction
    In chapter 1, we introduced the terminology associated with the topic and described the objectives of the study. In this chapter we describe the sampling techniques used for the survey, the questionnaire and the analytical tools employed. The details are given in the following sections.
    2.2 Sampling Techniques
    The overall procedure for this study involved administering the questionnaire to a sample of 150 online shoppers from various parts of the city.
    Multi-stage sampling: Multi-stage sampling (also known as multi-stage cluster sampling) is a more complex form of cluster sampling which involves two or more stages of sample selection. In simple terms, large clusters of the population are divided into smaller clusters in several stages in order to make primary data collection more manageable.
    Of the 100 wards in Trivandrum Corporation, most neighbouring wards are homogeneous in their socio-economic conditions. So all wards were classified into 10 internally homogeneous blocks (the first-stage sampling units), and one ward (the second-stage sampling unit) was taken at random from each block as a representative of that block. The selected wards are Kazhakuttom, Chellamangala, Kowdiar, Kachani, Poojappura, Pappanamcode, Mulloor, Thampanoor, Kadakampally and Akkulam. Then, by simple random sampling, a sample of size 150 was selected from the 10 selected wards.
    Every effort was made to obtain a sample representing the whole population. It comprises customers of all age groups, different education levels like SSLC, Graduation and Master, different streams of education like Arts, Science and Professional, and different occupations. Most of the questionnaires were completed by face-to-face interview to avoid any personal bias from respondents.
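The two-stage selection described above can be sketched as follows. This is a minimal illustration using only Python's standard library, not the authors' procedure in code: the ward identifiers, the grouping of wards into blocks of ten consecutive ids, and the sampling frame of 40 shoppers per ward are all hypothetical placeholders, since the report does not list the block composition or the frame.

```python
import random


def two_stage_sample(blocks, frame, n, seed=None):
    """Pick one ward per block at random, then draw a simple random
    sample of n people (without replacement) from the pooled wards."""
    rng = random.Random(seed)
    # Stage 1 and 2: one randomly chosen ward represents each block.
    chosen_wards = [rng.choice(wards) for wards in blocks]
    # Final stage: simple random sampling from the selected wards.
    pooled = [person for ward in chosen_wards for person in frame[ward]]
    return chosen_wards, rng.sample(pooled, n)


# Hypothetical setup: 100 wards, grouped into 10 blocks of 10 wards each.
wards = list(range(1, 101))
blocks = [wards[i:i + 10] for i in range(0, 100, 10)]
# Hypothetical sampling frame: 40 online shoppers listed per ward.
frame = {w: [f"ward{w}-shopper{j}" for j in range(40)] for w in wards}

chosen_wards, sample = two_stage_sample(blocks, frame, n=150, seed=1)
print(len(chosen_wards), len(sample))
```

Fixing `seed` makes the draw reproducible. In the actual survey the blocks were formed by socio-economic similarity of neighbouring wards rather than by consecutive ids.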
  • 12. 2.3 Description of the Questionnaire
    Each questionnaire has two parts, part A and part B, and each respondent was asked to answer all the questions. All questions are arranged in a proper sequence so as to obtain data that are as reliable as possible.
    Part A has 7 questions, which cover the personal data of respondents such as age, gender, residency, occupation, education level and monthly family income.
    Part B contains the survey-related questions. Questions 1 and 2 are about internet usage. Qns 3 to 7 ask how long the customer has been purchasing online, about spending behaviour and about time spent on the site. Qns 8 and 9 are about payment. Qn 10 asks for a ranking of some elements of online shopping compared to store shopping. Qn 11 lists a variety of products, among which the customer is asked to tick his or her shopping choices. Qn 12 asks for a ranking of some popular e-retailers and of the filter options available on an e-retailer's site. Qns 13 to 16 are about how familiar the customer is with the shopping site.
    Tables T1 and T2 together have 20 factors, each with 5 options (a, b, c, d and e) which are the levels of agreement from ‘Strongly disagree’ to ‘Strongly agree’; the respondent is supposed to choose one of them. These 20 questions measure the customer’s satisfaction towards online shopping. Table T3 has 13 factors, which are awareness factors towards online purchase. Again there are 5 options (a, b, c, d and e), which are levels of importance from ‘Never important’ to ‘Very important’, and one is to be chosen. All sub-questions of tables T1, T2 and T3 are scored as described in the terminology section.
    Finally, Qns 17 to 25 ask about the influence of social media on online purchases, technical problems faced in various steps such as payment, cancellation or return of ordered commodities, awareness of consumer rights and whether the customer is willing to purchase online in future. (The questionnaire is attached at the end of the document.)
    2.4 Statistical Tools
    In accordance with the main objectives mentioned in the previous chapter, we use several statistical tools to test different hypotheses.
  • 13. Kruskal Wallis H Test
    The Kruskal Wallis H test is often called “Analysis of Variance by Ranks”. It is the non-parametric alternative to one-way ANOVA and is especially useful when the k samples do not come from normal populations. The null hypothesis tested here is H0: the k independent samples come from the same population.
    Assumptions
    1. The dependent variable should be measured at the ordinal or continuous level.
    2. The independent variable should consist of two or more categorical, independent groups.
    3. The observations should be independent.
    Steps involved
    Step 1: Rank all of the scores in ascending order, ignoring which group they belong to.
    Step 2: Find $T_i$, the total of the ranks for each group, by adding together all of the ranks in each group in turn.
    Step 3: Find the value of the test statistic H:
    $$H = \frac{12}{N(N+1)} \left[ \sum_{i=1}^{k} \frac{T_i^2}{n_i} \right] - 3(N+1)$$
    where $N$ is the total number of observations, $n_i$ is the number of subjects in the $i$th group, $k$ is the number of groups ($k \geq 3$) and $T_i^2$ is the square of the total of the ranks for the scores in the $i$th group.
    Step 4: The distribution of H is approximately $\chi^2$ with $k-1$ d.f.
    Test: reject H0 at the $\alpha$ level of significance when $H > \chi^2_{\alpha}(k-1)$.
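The four steps above can be followed literally in a short Python sketch. It is an illustration rather than the authors' implementation (their analysis was done in R); the function names are chosen here, tied scores are given the average of their ranks, and no tie correction is applied to H, whereas library routines usually do apply one.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(T_i^2 / n_i) - 3 (N + 1), without tie correction."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)          # Step 1: rank the pooled scores
    total = 0.0
    start = 0
    for g in groups:
        t_i = sum(ranks[start:start + len(g)])   # Step 2: rank total of group i
        total += t_i ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)  # Step 3


# Toy data: three groups of three scores with no overlap between groups.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3))
```

For the toy data the rank totals are 6, 15 and 24, so H = (12/90)(12 + 75 + 192) − 30 = 7.2. With k − 1 = 2 d.f. the 5% critical value of $\chi^2$ is 5.991, so H0 would be rejected at that level (Step 4).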
  • 14. 14 Mann-Whitney U test
    This is one of the most powerful non-parametric tests and is an alternative to the two-sample t-test. The null hypothesis tested here is H0 : the two independent random samples come from the same population.
    Assumptions
    Suppose two samples are drawn from two independent populations X and Y.
    1. X and Y have continuous distributions (or discrete distributions that approximate continuous distributions well).
    2. X and Y have the same shape. The only possible difference is their position (i.e. the value of the median).
    3. The number of elements in each sample is not less than 5.
    4. The samples are independent.
    5. The scale of measurement should be ordinal, interval or ratio.
    How it works
    To put it simply, the U test works as follows. Both samples (of sizes N and M) are combined into one array, which is sorted in ascending order. We keep track of which sample each element came from. After sorting, each element is replaced by its rank (its index in the array, from 1 to N+M). Then the ranks of the elements of the first sample are summed and the U value is calculated from this rank sum. The mean of U equals NM/2. If U is close to this value, the medians of X and Y are close to each other. If we know the quantiles of the distribution of U, we can get the significance level corresponding to the observed value of U.
    Normal approximation
    Although U has a discrete distribution, if N and M are large it can be approximated by the normal distribution with mean NM/2 and standard deviation
    $$\sigma = \sqrt{\frac{NM(N+M+1)}{12}}$$
    Thus,
    $$Z = \frac{U - \frac{NM}{2}}{\sqrt{\frac{NM(N+M+1)}{12}}}$$
    can be used as the test statistic, which has approximately the $N(0,1)$ distribution.
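The ranking procedure and the normal approximation described above can be sketched as follows. This is a hedged illustration, not the computation used in the study: it assumes one common form of the statistic, U = R − N(N+1)/2 with R the rank sum of the first sample (this form has mean NM/2, as stated above), uses average ranks for ties, and applies no continuity or tie correction. The function name `mann_whitney_u` and the two samples are made up.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, Z) for the first sample, assuming U = R - N(N+1)/2."""
    n, m = len(x), len(y)
    # Combine both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    pos = 0
    while pos < len(pooled):
        end = pos
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        shared = (pos + end) / 2 + 1  # average rank shared by tied values
        for k in range(pos, end + 1):
            if pooled[k][1] == 0:
                rank_sum_x += shared
        pos = end + 1
    u = rank_sum_x - n * (n + 1) / 2
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)
    return u, (u - mean_u) / sd_u


# Hypothetical samples (illustration only, not the survey data).
x = [12, 15, 11, 18, 14]
y = [20, 22, 17, 25, 21]
u, z = mann_whitney_u(x, y)
print(u, round(z, 4))
```

Here the first sample holds ranks 1, 2, 3, 4 and 6, so R = 16 and U = 16 − 15 = 1, far below the mean NM/2 = 12.5; the resulting Z is about −2.40.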
  • 15. 15 Kolmogorov-Smirnov two sample test
    This non-parametric test is used to test the null hypothesis H0 : the two data samples come from the same distribution. Note that we do not specify what that common distribution is, i.e.
    H0 : $F_m(x) = G_n(x)$ for all x.
    The test statistic $D_{m,n}$ is defined as
    $$D_{m,n} = \sup_x \left| \hat{F}_m(x) - \hat{G}_n(x) \right|$$
    where $\hat{F}_m(x)$ and $\hat{G}_n(x)$ are the empirical distribution functions of the two samples.
    Test: reject H0 at level of significance α if $D_{m,n} > D_{m,n}(\alpha)$, i.e. accept H0 if $\hat{F}_m$ and $\hat{G}_n$ are close for each x.
    Student's two sample t-test
    This test is used to test the null hypothesis H0 : the two population means are equal, i.e. $\mu_1 = \mu_2$.
    Assumptions
    1. Each sample is randomly selected from the corresponding population.
    2. The populations from which the samples are drawn are normal.
    3. The variances of the two populations do not differ significantly. So before applying the t-test, we should carry out a test for equality of variances.
    How it works
    Suppose we have two independent random samples of sizes $n_1$ and $n_2$ from normal populations $N(\mu_i, \sigma_i^2)$, i = 1, 2. Let $X_{i1}, X_{i2}, \ldots, X_{in_i}$ be the sample from the ith population. We first have to test H01 : $\sigma_1^2 = \sigma_2^2$. Suppose
    $$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$
    is the ith sample variance, i = 1, 2 and j = 1, 2, ..., $n_i$. Then the F statistic is,
  • 16. 16 Under H01,
    $$F = \frac{s_1^2}{s_2^2} \sim F(n_1 - 1,\, n_2 - 1)$$
    Test: accept H01 at the 5% level of significance if $F < F_{0.05}(n_1 - 1, n_2 - 1)$, and only then perform Student's t-test for testing H0. The test statistic is given below.
    $$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim t_{n_1+n_2-2}$$
    If $|t| > t_{0.05}(n_1 + n_2 - 2)$, we reject H0 at the 5% level of significance.
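The variance-ratio statistic and the pooled t statistic defined above can be computed directly from two samples. The following is a minimal sketch using only the Python standard library; the function name `variance_ratio_and_pooled_t` and the data are hypothetical, and comparing the statistics with F and t critical values is left to tables or a statistics package.

```python
import math
from statistics import mean, variance


def variance_ratio_and_pooled_t(x1, x2):
    """Return (F, t, df) for the variance-ratio test and the pooled two-sample t-test."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)  # sample variances with divisor n_i - 1
    f_ratio = s1_sq / s2_sq
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return f_ratio, t, n1 + n2 - 2


# Hypothetical samples (illustration only, not the survey data).
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.0, 3.0, 5.0, 7.0, 9.0]
f_ratio, t, df = variance_ratio_and_pooled_t(x1, x2)
print(f_ratio, round(t, 4), df)
```

Both made-up samples have variance 10, so F = 1 and the pooled variance is also 10; the means differ by 1, giving t = 1/√(10 × 0.4) = 0.5 on 8 d.f. Note that this sketch follows the pooled formula above, whereas the R outputs in the later chapters are headed "Welch Two Sample t-test", a variant that does not pool the variances.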
  • 17. 17 Chapter 3 STUDY ON AWARENESS OF CUSTOMERS
    3.1 Introduction
    Online purchasing allows customers to identify the different types of products available in the global market. Due to rapid globalization, all types of products are available on the internet. Goods and services, consumer durables, books, audio and video cassettes and services like air tickets can also be purchased online. In this era of fast-moving lifestyles, customers are busier than they were a few years back, and it is precisely for this reason that they also purchase products and services through online shopping. The marketplace is fast turning into an e-marketplace. Customer awareness is therefore very important for getting the maximum benefit and for being least affected by issues such as online fraud.
    In our study we asked the informants to rank some factors according to their preference; these factors measure each respondent's level of awareness towards online shopping. The questions are given separately in table T3 of the questionnaire, which is attached in the appendix. The awareness of an e-shopper mostly depends on factors such as the reputation of the e-seller, guarantee and warranty, privacy and security, advertisements, the impact of reviews and ratings, etc. Finally, the awareness scores are obtained by summing up these scores in a particular way (explained in the Terminology).
    3.2 Test for Normality of awareness score
  • 18. 18 [Figure: normality check of the awareness score]
  • 19. 19 Thus, the pattern of the data deviates somewhat from normality, so it is preferable to base the conclusions on the non-parametric tools, even though both kinds of test are used.
    3.3 Awareness towards online shopping with Gender
    In this section our interest is to test whether the awareness score varies with gender.
    H0 : The mean awareness score of customers is identical for males and females.
    H1 : The mean awareness score of customers is not identical for males and females.
    Let us first check, by the F test, whether the variances of the awareness score are significantly different for males and females. R yields the following output.
        F test to compare two variances
        data: awarescore[Gender == "m"] and awarescore[Gender == "f"]
        F = 1.2198, num df = 100, denom df = 48, p-value = 0.4482
        alternative hypothesis: true ratio of variances is not equal to 1
        95 percent confidence interval: 0.7304366 1.9516958
        sample estimates: ratio of variances 1.219822
    Thus the F test shows that the variances of the awareness score are not significantly different in the male and female populations. So we can use the two-sample t-test for testing H0 against H1. R reports the Welch form of the test, which does not assume equal variances.
        Welch Two Sample t-test
        data: awarescore[Gender == "m"] and awarescore[Gender == "f"]
        t = -1.2207, df = 104.12, p-value = 0.2249
        alternative hypothesis: true difference in means is not equal to 0
        95 percent confidence interval: -2.627605 0.625180
        sample estimates:
        mean of x mean of y
         45.03960  46.04082
    As the p-value is not significant, we accept H0 at the 5% level of significance. Thus, the mean awareness scores for males and females are not significantly different.
  • 20. 20 We also applied the Mann-Whitney U test, and the R output is as below.
        Wilcoxon rank sum test with continuity correction
        data: awarescore by Gender
        W = 2769.5, p-value = 0.2369
        alternative hypothesis: true location shift is not equal to 0
    As the p-value is not significant, we accept H0. Here one can claim that the median awareness scores for males and females are not significantly different at the 5% level of significance.
    For the two-sample Kolmogorov-Smirnov test we state the null and alternative hypotheses as below.
    H0 : Awareness scores for males and females have the same unknown distribution.
    H1 : Awareness scores have different distributions.
    R yields the following output.
        Two-sample Kolmogorov-Smirnov test
        data: awarescore[Gender == "m"] and awarescore[Gender == "f"]
        D = 0.15619, p-value = 0.3967
        alternative hypothesis: two-sided
    Here we do not have evidence against H0, so we accept H0 at the 5% level of significance.
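The statistic D in the Kolmogorov-Smirnov output above is the largest vertical gap between the two empirical distribution functions. A minimal Python sketch of that calculation is given below; the function name `ks_two_sample_d` and the two samples are hypothetical, and the p-value (which R computes separately) is not reproduced.

```python
from bisect import bisect_right


def ks_two_sample_d(x, y):
    """D = largest |F_hat(t) - G_hat(t)| over all observed points t."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    d = 0.0
    # Both step functions only change at observed values, so checking those suffices.
    for point in sorted(set(xs) | set(ys)):
        f_hat = bisect_right(xs, point) / m  # share of the first sample <= point
        g_hat = bisect_right(ys, point) / n  # share of the second sample <= point
        d = max(d, abs(f_hat - g_hat))
    return d


# Hypothetical samples (illustration only, not the survey data).
x = [1, 2, 3, 4]
y = [3, 4, 5, 6]
d = ks_two_sample_d(x, y)
print(d)
```

For these made-up samples the gap is largest at the points 2, 3 and 4, where the two empirical distribution functions differ by 0.5, so D = 0.5.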
  • 21. 21 3.4 Awareness towards online shopping with Residency
    In this section we test whether the awareness score varies with the residency of the customer.
    H0 : The mean awareness score of customers is identical in the Rural/Town and Urban populations.
    H1 : The mean awareness score varies with residency.
    First of all we check, by the F test, whether the variances of the awareness score are significantly different for customers belonging to Rural/Town and Urban areas. R yields the following output.
        F test to compare two variances
        data: awarescore[Residency == "Urban"] and awarescore[Residency == " Rural/Town"]
        F = 1.2906, num df = 53, denom df = 95, p-value = 0.2789
        alternative hypothesis: true ratio of variances is not equal to 1
        95 percent confidence interval: 0.8122947 2.1191771
        sample estimates: ratio of variances 1.29058
    Thus the F test shows that the variances of the awareness score are not significantly different for the two residential groups. So we can use the two-sample t-test for testing H0 against H1. R reports the Welch form of the test.
        Welch Two Sample t-test
        data: awarescore[residency == "b"] and awarescore[residency == "a"]
        t = 0.81259, df = 98.798, p-value = 0.4184
        alternative hypothesis: true difference in means is not equal to 0
        95 percent confidence interval: -1.009676 2.410139
        sample estimates:
        mean of x mean of y
         45.81481  45.11458
    As the p-value is not significant, we accept H0 at the 5% level of significance. The mean awareness scores of the Rural/Town and Urban populations are not significantly different.
  • 22. 22 For the Mann-Whitney U test, the R output is as below.
        Wilcoxon rank sum test with continuity correction
        data: awarescore by residency
        W = 2379, p-value = 0.4044
        alternative hypothesis: true location shift is not equal to 0
    As the p-value is not significant, we accept H0: the median awareness scores for rural/town and urban customers are not significantly different at the 5% level of significance.
    For the two-sample Kolmogorov-Smirnov test we state the null and alternative hypotheses as
    H0 : Awareness scores for rural/town and urban residential customers have the same unknown distribution.
    H1 : Awareness scores come from different distributions.
    R yields the following output.
        Two-sample Kolmogorov-Smirnov test
        data: awarescore[residency == "b"] and awarescore[residency == "a"]
        D = 0.16782, p-value = 0.2847
        alternative hypothesis: two-sided
    Here also we do not have evidence against H0, so we accept H0 at the 5% level of significance.
  • 23. 23 3.5 Awareness with Education level
    Now we check whether the awareness score varies with the education level of the customer: +2/Diploma, Graduation, Master and Higher degrees.
    H0 : The mean awareness score is the same for customers of any education level.
    H1 : The mean awareness score varies with the level of education.
    R gives the following output for ANOVA.
        aov(formula = awarescore ~ edu)
                     Df Sum Sq Mean Sq F value Pr(>F)
        edu           3     30   9.952   0.412  0.744
        Residuals   146   3523  24.130
    Thus we accept H0, since there is no evidence against it.
    Sometimes the conditions for ANOVA may be violated for ranked data. So a non-parametric alternative to one-way analysis of variance, the Kruskal Wallis H test, is used as below.
        Kruskal-Wallis rank sum test
        data: awarescore by Edu
        Kruskal-Wallis chi-squared = 1.6646, df = 3, p-value = 0.6448
    We accept the null hypothesis based on the Kruskal Wallis H test also. Thus, the awareness towards online shopping is almost the same for customers with different education levels.
  • 24. 24 3.6 Awareness with Stream of education
    In this section we test whether the awareness score varies with the educational streams Arts, Science and Professional.
    H0 : The mean awareness score of customers is identical in all streams.
    H1 : The mean awareness score varies with stream.
    First we carry out ANOVA. R gives the following output.
        aov(formula = awarescore ~ stream)
                     Df Sum Sq Mean Sq F value Pr(>F)
        stream        2     38   19.09   0.798  0.452
        Residuals   147   3515   23.91
    The p-value is not significant, so there is no evidence against the null hypothesis and we accept H0 at the 5% level of significance.
    As the scores are obtained by summation of ordered values, the conditions for ANOVA may sometimes be violated. So a non-parametric alternative to one-way analysis of variance, the Kruskal Wallis H test, is used as below.
        Kruskal-Wallis rank sum test
        data: awarescore by factor(stream)
        Kruskal-Wallis chi-squared = 1.5512, df = 2, p-value = 0.4604
    As the p-value > 0.05, we accept the null hypothesis that the awareness score does not vary significantly across streams.
  • 25. 25 3.7 Awareness with Income groups
    Here we test whether the awareness score varies across monthly family income groups: below 20000, 20000 to 50000, 50000 to 75000, 75000 to 1 Lakh and above 1 Lakh.
    H0 : The mean awareness score of customers is identical in all income categories.
    H1 : The mean awareness score varies with income group.
    First we use ANOVA. R gives the following output.
        aov(formula = awarescore ~ income)
                     Df Sum Sq Mean Sq F value Pr(>F)
        income        4     16   4.082   0.167  0.955
        Residuals   145   3537  24.390
    Thus we accept H0, since there is no evidence for its rejection.
    Since the score data may sometimes deviate from the assumptions of ANOVA, we also apply the usual non-parametric alternative to one-way analysis of variance, viz. the Kruskal Wallis H test, and R displays the following output.
        Kruskal-Wallis rank sum test
        data: awarescore by Income
        Kruskal-Wallis chi-squared = 0.70578, df = 4, p-value = 0.9506
    The Kruskal Wallis H test also supports accepting H0 against H1. So we conclude that customers of all income groups have almost equal awareness.
  • 26. 26 3.8 Awareness with two groups, ‘interested’ and ‘not-interested’
    We now focus on the awareness level of customers who wish to continue e-shopping in future and of those who do not. The hypotheses are
    H0 : The awareness score does not differ between the two groups of customers (willing and not willing to continue online shopping in future).
    H1 : The awareness score differs between the two groups of customers.
    Since the ‘not interested’ group is small (size 7), we use the Mann Whitney U test, and R yields the following output.
        Wilcoxon rank sum test with continuity correction
        data: awarescore[Entry_out_1_0$continue == 1] and awarescore[Entry_out_1_0$continue == 0]
        W = 575, p-value = 0.5087
        alternative hypothesis: true location shift is not equal to 0
    Thus the test supports accepting H0, i.e. customers of both groups are equally aware about online shopping.
    We cannot use the two-sample t-test, because the variation differs between the two groups, as can be seen in the following boxplot. Even though the medians do not differ significantly, the boxplot suggests that the customers who are not willing to continue tend to do less well in terms of awareness.
  • 27. 27 3.9 Conclusions
    From the analysis carried out throughout the chapter, we arrive at the following conclusions.
      • The awareness score does not vary significantly between males and females. Thus customers have almost the same awareness score irrespective of gender.
      • The awareness score does not vary between rural/town and urban residential customers. That is, e-shoppers belonging to either residency have almost similar awareness about online shopping.
      • There is no significant difference in awareness score for customers with different educational qualifications. Thus customers have almost similar awareness about online shopping, irrespective of education level. (For our study, the respondents are at minimum +2/diploma holders.)
      • There is no significant difference in awareness score for customers from different educational streams like arts, science and professional. Thus customers of all streams exhibit almost similar awareness about online shopping.
      • There is no significant difference in awareness score for customers of different family income. Thus, irrespective of family income, e-customers are almost equally aware about online shopping.
      • There is no significant difference in awareness level between the customers who are willing to continue e-shopping and those who are not willing to do so in future. If we consider the variation in awareness score of these two groups, however, the customers who are not willing to continue do less well in terms of awareness. We look at the satisfaction level of the same two groups in the next chapter.
  • 28. 28 Chapter 4 Study on satisfaction of customers towards online shopping
    4.1 Introduction
    With the rapid global growth in electronic commerce (e-commerce), businesses are attempting to gain a competitive advantage by using e-commerce to interact with customers. Growing numbers of consumers shop online to purchase goods and services, gather product information or even browse for enjoyment. Online shopping environments are therefore playing an increasing role in the overall relationship between marketers and their consumers (Koo et al. 2008). That is, consumer purchases are mainly based on the cyberspace appearance of the product, such as pictures, images, quality information and video clips, not on actual experience. If a customer wants to purchase something online, there are plenty of online providers available, and multiple brands are also available for a single product. So, for a business to stay healthy, consumer satisfaction is very important.
    In the previous chapter we tried to find out how awareness about e-shopping changes with different factors like gender, residency, income, education, etc. Here we examine satisfaction against the same factors.
    4.2 Normality test of satisfaction score
    From the graph we conclude that the data are consistent with normality.
  • 29. 29 4.3 Satisfaction towards online shopping with Gender
    In this section our interest is to test whether the satisfaction score varies with gender.
    H0 : The mean satisfaction score of customers is identical for males and females.
    H1 : The mean satisfaction score of customers is not identical for males and females.
    Let us first check, by the F test, whether the variances of the satisfaction score are significantly different for males and females. R yields the following output.
        F test to compare two variances
        data: sscore[Gender == "f"] and sscore[Gender == "m"]
        F = 1.2678, num df = 48, denom df = 100, p-value = 0.3204
        alternative hypothesis: true ratio of variances is not equal to 1
        95 percent confidence interval: 0.7923922 2.1172386
        sample estimates: ratio of variances 1.267814
    Thus the F test shows that the variances of the satisfaction score are not significantly different in the male and female populations. So we can use the two-sample t-test for testing H0 against H1. R reports the Welch form of the test.
        Welch Two Sample t-test
        data: sscore[Gender == "f"] and sscore[Gender == "m"]
        t = -3.8227, df = 85.738, p-value = 0.0002495
        alternative hypothesis: true difference in means is not equal to 0
        95 percent confidence interval: -7.629189 -2.408799
        sample estimates:
        mean of x mean of y
         59.69388  64.71287
    As the p-value is significant, we reject H0 at the 5% level of significance. Thus, the mean satisfaction scores for males and females are significantly different.
  • 30. 30 We also applied the Mann-Whitney U test, and the R output is as below.
        Wilcoxon rank sum test with continuity correction
        data: sscore[Gender == "f"] and sscore[Gender == "m"]
        W = 1555, p-value = 0.0002268
        alternative hypothesis: true location shift is not equal to 0
    As the p-value is significant, we reject H0. Here one can claim that the median satisfaction scores for males and females are significantly different at the 5% level of significance.
    For the two-sample Kolmogorov-Smirnov test we state the null and alternative hypotheses as below.
    H0 : Satisfaction scores for males and females have the same unknown distribution.
    H1 : Satisfaction scores have different distributions.
    R yields the following output.
        Two-sample Kolmogorov-Smirnov test
        data: sscore[Gender == "f"] and sscore[Gender == "m"]
        D = 0.28188, p-value = 0.01057
        alternative hypothesis: two-sided
    Here also we reject the null hypothesis.
  • 31. 31 4.4 Satisfaction towards online shopping with Residency
    In this section we test whether the satisfaction score varies with the residency of the customer.
    H0 : The mean satisfaction score of customers is identical in the Rural/Town and Urban populations.
    H1 : The mean satisfaction score varies with residency.
    First of all we check, by the F test, whether the variances of the satisfaction score are significantly different for customers belonging to Rural/Town and Urban areas. R yields the following output.
        F test to compare two variances
        data: sscore[residency == "b"] and sscore[residency == "a"]
        F = 1.2127, num df = 53, denom df = 95, p-value = 0.4109
        alternative hypothesis: true ratio of variances is not equal to 1
        95 percent confidence interval: 0.7632736 1.9912871
        sample estimates: ratio of variances 1.212695
    Thus the F test shows that the variances of the satisfaction score are not significantly different for the two residential groups, so we can use the two-sample t-test for testing H0 against H1. R reports the Welch form of the test.
        Welch Two Sample t-test
        data: sscore[residency == "b"] and sscore[residency == "a"]
        t = 0.56677, df = 101.4, p-value = 0.5721
        alternative hypothesis: true difference in means is not equal to 0
        95 percent confidence interval: -1.883585 3.390529
        sample estimates:
        mean of x mean of y
         63.55556  62.80208
    As the p-value is not significant, we accept H0 at the 5% level of significance. Thus, the mean satisfaction scores of the Rural/Town and Urban populations are not significantly different.
  • 32. 32 For the Mann-Whitney U test, the R output is as below.
        Wilcoxon rank sum test with continuity correction
        data: sscore[residency == "b"] and sscore[residency == "a"]
        W = 2674, p-value = 0.7494
        alternative hypothesis: true location shift is not equal to 0
    As the p-value is not significant, we accept H0: the median satisfaction scores for rural/town and urban customers are not significantly different at the 5% level of significance.
    For the two-sample Kolmogorov-Smirnov test we state the null and alternative hypotheses as
    H0 : Satisfaction scores for rural/town and urban residential customers have the same unknown distribution.
    H1 : Satisfaction scores come from different distributions.
    R yields the following output.
        Two-sample Kolmogorov-Smirnov test
        data: sscore[residency == "b"] and sscore[residency == "a"]
        D = 0.096065, p-value = 0.9073
        alternative hypothesis: two-sided
    Here also we do not have evidence against H0, so we accept H0 at the 5% level of significance.
  • 33. 33 4.5 Satisfaction with Education level
    Now we check whether the satisfaction score varies with the education level of the customer: +2/Diploma, Graduation, Master and Higher degrees.
    H0 : The mean satisfaction score is the same for customers of any education level.
    H1 : The mean satisfaction score varies with the level of education.
    R gives the following output for ANOVA.
        aov(formula = sscore ~ Edu)
                     Df Sum Sq Mean Sq F value Pr(>F)
        Edu           3    181   60.21   1.046  0.374
        Residuals   146   8402   57.54
    Thus we accept H0, since there is no evidence against it.
    Sometimes the conditions for ANOVA may be violated for ranked data. So a non-parametric alternative to one-way analysis of variance, the Kruskal Wallis H test, is used as below.
        Kruskal-Wallis rank sum test
        data: sscore by Edu
        Kruskal-Wallis chi-squared = 3.4159, df = 3, p-value = 0.3318
    We accept the null hypothesis based on the Kruskal Wallis H test also. Thus, the satisfaction towards online shopping is almost the same for customers with different education levels.
  • 34. 34 4.6 Satisfaction with Income groups
    Here we test whether the satisfaction score varies across monthly family income groups: below 20000, 20000 to 50000, 50000 to 75000, 75000 to 1 Lakh and above 1 Lakh.
    H0 : The mean satisfaction score of customers is identical in all income categories.
    H1 : The mean satisfaction score varies with income group.
    First we use ANOVA. R gives the following output.
        aov(formula = sscore ~ income)
                     Df Sum Sq Mean Sq F value Pr(>F)
        Income        4    265   66.25   1.155  0.333
        Residuals   145   8317   57.36
    Thus we accept H0, since there is no evidence for its rejection.
    Since the score data may sometimes deviate from the assumptions of ANOVA, we also apply the usual non-parametric alternative to one-way analysis of variance, viz. the Kruskal Wallis H test, and R displays the following output.
        Kruskal-Wallis rank sum test
        data: sscore by Income
        Kruskal-Wallis chi-squared = 3.2958, df = 4, p-value = 0.5096
    The Kruskal Wallis H test also supports accepting H0 against H1. So we conclude that customers of all income groups have almost equal satisfaction towards online shopping.
  • 35. 35 4.7 Satisfaction with two groups, ‘interested’ and ‘not-interested’
    In the previous chapter we found that awareness about e-shopping is almost the same for the two groups of customers, those willing and those not willing to continue it in future. Now we focus on how satisfied the two groups are with e-shopping. The hypotheses are
    H0 : The satisfaction score does not differ between the two groups of customers (willing and not willing to continue online shopping in future).
    H1 : The satisfaction score differs between the two groups of customers.
    Since the ‘not interested’ group is small (size 7), we use the Mann Whitney U test, and R yields the following output.
        Wilcoxon rank sum test with continuity correction
        data: sscore[Entry_out_1_0$continue == 1] and sscore[Entry_out_1_0$continue == 0]
        W = 810.5, p-value = 0.005761
        alternative hypothesis: true location shift is not equal to 0
    Thus the test gives evidence against H0, so we reject H0 and accept H1. The satisfaction score of the ‘not-interested’ customer group is lower than that of the customers who are interested in e-shopping.
  • 36. 36 4.8 Conclusions
    Throughout the chapter we carried out several tests, and the following are the conclusions we obtained.
      • The satisfaction score varies significantly between males and females. The sample means in Section 4.3 (64.71 for males against 59.69 for females) indicate that male customers are more satisfied than female customers.
      • The satisfaction score does not vary between rural/town and urban residential customers. That is, e-shoppers belonging to either residency have almost similar satisfaction with online shopping.
      • There is no significant difference in satisfaction score for customers with different educational qualifications. Thus customers have almost similar satisfaction with online shopping, irrespective of education level. (For our study, the respondents are at minimum +2/diploma holders.)
      • There is no significant difference in satisfaction score for customers of different family income. Thus, irrespective of family income, e-customers are almost equally satisfied with online shopping.
      • There is a significant difference in satisfaction level between the customers who are willing to continue e-shopping and those who are not willing to do so in future. The customers who are not interested in shopping online in the near future have a lower satisfaction score than the interested customers. One can claim that even when customers are aware, their satisfaction is important for any e-store to keep its market healthy.
  • 37. 37 Chapter 5 CLASSIFICATION BY MEANS OF REGRESSION AND DISCRIMINANT ANALYSIS
    5.1 Introduction
    Classification is a powerful tool in machine learning which predicts a categorical response from the information provided by several explanatory variables. It is important for any e-retailer to understand the type of a customer based on his or her past records.
    In our study we used binary logistic regression to separate customers into two classes, ‘efficient customer’ and ‘less efficient customer’. The response variable is first constructed from several variables, as described in the Terminology. We then used multinomial logistic regression to group customers into three classes, ‘ideal’, ‘ordinary’ and ‘low profile’. These classes are also based on several variables and are discussed in the Terminology.
    In both cases we first fitted the model with all the variables from which the response was constructed. We then simplified the model by removing non-significant explanatory variables, in such a way that the simpler model finally obtained does not lose much precision.
    Finally, we used linear discriminant analysis alongside the two regression models above, so that the models can be compared with one another in terms of precision. The confusion matrix and the misclassification rate of each model tell how reliable the model is. Some regression graphs are also presented to show whether a variable is significant for a particular model.
    5.2 Binary Logistic Regression of Outcome Y on explanatory variables U, V, W and X
  • 38. 38 Y : Outcome, a new binary variable taking the value 1 (efficient customer) if the customer is a frequent internet user, spends not less than 500 in a single purchase, has been shopping online for more than 2 years and is willing to purchase in future, and 0 (not efficient) otherwise.
    U : Number of years the customer has been purchasing online. Ordinal, with ordered levels 1, 2, 3 & 4
    V : Average spending in a single purchase. Factor with levels a, b, c & d
    W : Frequency of internet usage. Ordinal, with levels 1, 2, 3 & 4
    X : Wish to continue online shopping in future, 1 if yes and 0 otherwise.
    If p is P(Y=1), then the model becomes
    $$\mathrm{Logit}(p) = \alpha + \beta u + \delta_i v + \sigma w + \lambda x, \qquad i = b, c, d$$
    Here, level ‘a’ of V is redundant (it is the reference level). The link function is
    $$\mathrm{Logit}(p) = \log\left(\frac{p}{1-p}\right)$$
    R yields the following summary for the fit on u, w and x.
        glm(formula = out ~ u + w + x, family = "binomial", data = Dat)
        Deviance Residuals:
            Min      1Q  Median      3Q     Max
        -3.7494 -0.1687 -0.0105 -0.0101  0.6665
        Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
        (Intercept) -39.42289 2655.79412  -0.015    0.988
        u             5.63671    1.15708   4.871 1.11e-06 ***
        w             0.05184    1.33980   0.039    0.969
        x            23.85253 2655.78879   0.009    0.993
        ---
        Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
        (Dispersion parameter for binomial family taken to be 1)
        Null deviance: 150.121 on 149 degrees of freedom
        Residual deviance: 30.496 on 146 degrees of freedom
        AIC: 38.496
        Number of Fisher Scoring iterations: 18
    This full model shows that some of the variables do not have a significant effect on Y. So we simplify the model by backward elimination, and the simplest model obtained is given below.
  • 39. 39 Logit(p) = α + βu
        glm(formula = out ~ u, family = "binomial", data = Dat)
        Deviance Residuals:
            Min      1Q  Median      3Q     Max
        -3.1870 -0.2318 -0.2318 -0.0267  0.8854
        Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
        (Intercept)  -12.280      2.263  -5.426 5.75e-08 ***
        u              4.338      0.817   5.310 1.10e-07 ***
        ---
        Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
        (Dispersion parameter for binomial family taken to be 1)
        Null deviance: 150.121 on 149 degrees of freedom
        Residual deviance: 45.176 on 148 degrees of freedom
        AIC: 49.176
        Number of Fisher Scoring iterations: 7
    Thus the model is Logit(p) = -12.28 + 4.338u, and
    $$\hat{p} = \frac{e^{y}}{1 + e^{y}}, \qquad y = -12.28 + 4.338u$$
    Confusion Matrix
                  Predicted
        actual      0     1
             0    115     5
             1      0    30
    Misclassification error is 3.333333 %
    Following are some graphs reflecting the significance of different variables on the outcome.
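  • 40. 40 [Figure: graphs of the outcome against the explanatory variables]
  • 41. 41 [Figure: graphs of the outcome against the explanatory variables, continued]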
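For the simplified model of Section 5.2, Logit(p) = -12.28 + 4.338u, the fitted probability of being an ‘efficient customer’ can be evaluated at each level of u, and the misclassification error can be recomputed from the confusion matrix. The sketch below uses the rounded estimates reported in the R summary; the names `ALPHA`, `BETA` and `prob_efficient` are made up for the illustration, and the 0.5 cut-off mentioned afterwards is an assumption, since the report does not state which cut-off was used.

```python
import math

# Rounded estimates from the R summary of the simplified model out ~ u.
ALPHA, BETA = -12.28, 4.338


def prob_efficient(u):
    """Fitted P(Y = 1) from the inverse logit: p = e^y / (1 + e^y)."""
    y = ALPHA + BETA * u
    return math.exp(y) / (1 + math.exp(y))


# u is ordinal with levels 1, 2, 3 and 4.
probs = {u: prob_efficient(u) for u in (1, 2, 3, 4)}

# Confusion matrix from the report: rows = actual (0, 1), columns = predicted (0, 1).
confusion = [[115, 5], [0, 30]]
total = sum(sum(row) for row in confusion)
wrong = total - sum(confusion[i][i] for i in range(len(confusion)))
error_percent = 100 * wrong / total

for u, p in probs.items():
    print(u, round(p, 4))
print(round(error_percent, 4))
```

The linear predictor is negative at u = 2 and positive at u = 3, so the fitted probability crosses 0.5 between those two levels; under an assumed 0.5 cut-off, customers with u of 3 or 4 would be classified as efficient. The error rate recomputed from the matrix is 5 out of 150, i.e. 3.33 %, which agrees with the reported figure.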
  • 42. 42 5.3 Multinomial Logistic Regression of customer type Y on several explanatory variables given below.
    Y → New categorical variable for the type of customer, having 3 categories: ‘ideal’, ‘ordinary’ and ‘lowprofile’. This variable is constructed from different variables, such as how satisfied the customer is with the current practice of online shopping, how confident he or she is in the digital payment system, and so on.
    P → factor, past years since purchasing online
    Q → factor, average spending in a single purchase
    R → factor, amount spent in the last purchase
    S → ordinal, whether the customer uses price comparison sites
    T → ordinal, preference for reading reviews/ratings
    U → ordinal, preference for sharing own reviews/ratings
    V → ordinal, whether the customer feels safe/secure in online shopping
    W → ordinal, agreement level for hesitation about online payment
    X → ordinal, agreement level for overall satisfaction
    Z → numeric (yes = 1 or no = 0 to continue shopping online in future)
    The model thus becomes
    $$\log\left(\frac{p_2}{p_1}\right) = \alpha + \beta_i p + \beta_j q + \beta_k r + \delta_1 s + \delta_2 t + \delta_3 u + \delta_4 v + \delta_5 w + \delta_6 x + \sigma z$$
    $$\log\left(\frac{p_3}{p_1}\right) = \alpha' + \beta'_i p + \beta'_j q + \beta'_k r + \delta'_1 s + \delta'_2 t + \delta'_3 u + \delta'_4 v + \delta'_5 w + \delta'_6 x + \sigma' z$$
    where i, j, k = b, c, d and
    p1 = P(customer is ‘ideal’)
    p2 = P(customer is ‘lowprofile’)
    p3 = P(customer is ‘ordinary’)
    Here p1 + p2 + p3 = 1, and R has taken ‘ideal’ as the reference level, so the term p1 is in the denominator.
    The full model is,
                   (Intercept)     pb      pc       pd      qb       qc       qd     rb
        lowprofile       120.6  -70.6  -47.20  -114.60  -67.41  -10.180  -178.90  20.13
        ordinary         128.5  -70.8  -44.99    13.51  -65.28   -5.829   -96.74  20.58
                       rc     rd      s       t      u       v      w       x      z
        lowprofile  56.16  82.16  13.66   99.82  15.30  -31.26  26.19  -65.09  67.41
        ordinary    55.61  79.86  12.13  101.30  16.71  -28.33  20.44  -63.01  65.00
        Residual Deviance: 35.16687
        AIC: 103.1669
  • 43. 43 Wald Test for significance of coefficients
    To assess the significance of the different coefficients, the Wald test gives the following p-values, which are calculated using the corresponding standard errors.
                   (Intercept)         pb  pc   pd  qb      qc         qd  rb         rc
        lowprofile           0  1.045e-09   0  NaN   0  0.1069  0.000e+00   0  4.570e-09
        ordinary             0  9.291e-10   0    0   0  0.3561  3.672e-07   0  6.354e-09
                    rd       s  t       u       v       w        x          z
        lowprofile   0  0.3235  0  0.5426  0.4987  0.3958  0.04770  6.058e-09
        ordinary     0  0.3809  0  0.5063  0.5397  0.5075  0.05522  2.061e-08
    Now, by removing some non-significant variables like S and U, and by introducing an interaction between V and W (i.e. feeling safe/secure and hesitation at payment), we get a simpler and more precise model,
    $$\log\left(\frac{p_2}{p_1}\right) = \alpha + \beta_i p + \beta_j q + \beta_k r + \delta_1 t + \delta_2 v + \delta_3 w + \delta_4 x + \sigma z + \lambda\,(v \times w) \qquad (1)$$
    $$\log\left(\frac{p_3}{p_1}\right) = \alpha' + \beta'_i p + \beta'_j q + \beta'_k r + \delta'_1 t + \delta'_2 v + \delta'_3 w + \delta'_4 x + \sigma' z + \lambda'\,(v \times w) \qquad (2)$$
    where i, j, k = b, c, d. If the right-hand side of equation (1) is $y_1$ and that of (2) is $y_2$, then the estimated probabilities are
    $$\hat{p}_1 = \frac{1}{1 + e^{y_1} + e^{y_2}}, \qquad \hat{p}_2 = \frac{e^{y_1}}{1 + e^{y_1} + e^{y_2}}, \qquad \hat{p}_3 = \frac{e^{y_2}}{1 + e^{y_1} + e^{y_2}}$$
                   (Intercept)      pb      pc       pd      qb      qc      qd     rb
        lowprofile      -203.4  -58.44  -34.43  -164.70  -69.17  -10.44  -237.0  12.19
        ordinary         373.2  -58.77  -33.76    57.36  -40.89   23.12   -75.2  12.38
                        rc     rd      t       v       w        x      z     v:w
        lowprofile  -30.23  76.98  41.94   98.60  170.00  -108.60  39.51  -37.01
        ordinary     11.83  70.84  52.22  -64.85  -52.53   -54.38  35.25   18.02
    Confusion matrix
                              Predicted
        actual       ideal  lowprofile  ordinary
        ideal            4           0         0
        lowprofile       0          32         1
        ordinary         0           1       112
    The misclassification percent is 1.33, which is very low.
  • 44. In terms of probability, the first four predictions of this model are:
       ideal  lowprofile  ordinary
    1      0      0.4208    0.5792
    2      0      1.0000    0.0000
    3      0      0.0000    1.0000
    4      0      0.0000    1.0000
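Predicted probabilities like those above follow from the two log-odds of the model, y1 = log(p2/p1) for 'lowprofile' and y2 = log(p3/p1) for 'ordinary', with 'ideal' as the reference category. The sketch below is a minimal illustration of that conversion; the function name and the input values are hypothetical. Because the fitted coefficients are large, the maximum predictor is subtracted before exponentiating so that the exponential does not overflow.

```python
import math


def category_probabilities(y1: float, y2: float) -> dict:
    """Convert baseline-category log-odds to the three class probabilities.

    y1 = log(p_lowprofile / p_ideal), y2 = log(p_ordinary / p_ideal).
    """
    # The reference category 'ideal' has linear predictor 0.
    scores = {"ideal": 0.0, "lowprofile": y1, "ordinary": y2}
    # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
    largest = max(scores.values())
    weights = {name: math.exp(value - largest) for name, value in scores.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical linear predictors, not taken from the report.
probs = category_probabilities(y1=2.0, y2=3.0)
print({name: round(value, 4) for name, value in probs.items()})
```

When both log-odds are large and positive, the probability of 'ideal' is close to zero, which matches the pattern in the four rows shown above.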
  • 45. 5.4 Linear Discriminant Analysis (LDA) of outcome on U, V and X
    Y → outcome, a new binary variable with value 1 (efficient customer) if the customer is a frequent internet user, spends not less than Rs 500 in a single purchase, has been shopping online for more than 2 years and is willing to purchase in future; otherwise 0 (not efficient).
    U → number of years the customer has been purchasing online; ordinal, with ordered levels 1, 2, 3 and 4
    V → average spending in a single purchase; factor with levels a, b, c and d
    X → wish to continue online shopping in future; 1 if yes, otherwise 0
    R gives the following output for the discriminant analysis:
    lda(out ~ u + v + x, data = Dat)
    Prior probabilities of groups:
      0   1
    0.8 0.2
    Group means:
             u        v         x
    0 1.691667 2.150000 0.9416667
    1 3.400000 2.333333 1.0000000
    Coefficients of linear discriminants:
              LD1
    u  1.71759540
    v -0.04364488
    x  0.98842355
    Confusion matrix
             Predicted
    Actual     0    1
    0        116    4
    1          0   30
    The misclassification rate is 2.67% (4 of 150 cases).
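To see which variable drives the discrimination, the group means can be projected onto LD1 using the coefficients printed in the output. This is only a sketch: it assumes the coefficients apply to u, v and x exactly as they are numerically coded in the output, and it ignores the additive centering constant, which cancels when the two group scores are subtracted. The helper name is hypothetical.

```python
# Values copied from the discriminant output on this slide.
coefficients = {"u": 1.71759540, "v": -0.04364488, "x": 0.98842355}
group_means = {
    0: {"u": 1.691667, "v": 2.150000, "x": 0.9416667},
    1: {"u": 3.400000, "v": 2.333333, "x": 1.0000000},
}


def ld1_score(values: dict) -> float:
    """Uncentered LD1 score: sum of coefficient times variable value."""
    return sum(coefficients[name] * values[name] for name in coefficients)


# Contribution of each variable to the gap between the two group means.
contributions = {
    name: coefficients[name] * (group_means[1][name] - group_means[0][name])
    for name in coefficients
}
separation = ld1_score(group_means[1]) - ld1_score(group_means[0])

print({name: round(value, 3) for name, value in contributions.items()})
print(round(separation, 2))
```

Under these assumptions almost all of the gap between the groups comes from u (years of online purchasing), while v contributes very little.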
  • 46. 5.5 Linear Discriminant Analysis (LDA) of Y (customer type) on several ordinal responses
    Here the explanatory variables are ordinal or categorical, except Z, which is numeric.
    Y → categorical, customer type (dependent variable)
    P → categorical, years since the customer started purchasing online (past.f)
    Q → ordinal, average spending in a single purchase
    T → categorical, preference for reading reviews/ratings (reviewr.f)
    V → ordinal, whether the customer feels safe/secure in online shopping
    W → ordinal, level of agreement with hesitation at online payment
    X → ordinal, level of agreement with overall satisfaction
    Z → numeric, intention to continue shopping online in future (yes/no)
    R yields the following discriminant output:
    lda(y ~ past.f + q + reviewr.f + v * w + x + z, data = Dat2)
    Prior probabilities of groups:
         ideal  lowprofile    ordinary
    0.02666667  0.22000000  0.75333333
    Group means:
                past.fb  past.fc  past.fd     q  reviewr.fb  reviewr.fc       v
    ideal        1.0000   0.0000   0.0000   2.5     0.00000     0.00000  4.0000
    lowprofile   0.6060   0.2121   0.0000  2.06     0.18181     0.03030  2.2424
    ordinary     0.4159   0.1238   0.1238  2.21      0.3185    0.053097  2.9026
                     w       x        z       v:w
    ideal       2.2500  3.2500  1.00000  9.000000
    lowprofile  4.0909  1.7272  0.96969  9.151515
    ordinary    2.6106  2.1504  0.94690  7.769912
    Coefficients of linear discriminants:
                        LD1         LD2
    past.fb      0.03967307   1.1541507
    past.fc     -0.07931444   0.8777495
    past.fd      1.54621131  -0.0129210
    q            0.39109904   0.2327005
    reviewr.fb   0.85215592  -0.5408363
    reviewr.fc   0.39822460  -0.8159128
    v           -0.70579709   1.7595617
    w           -2.26878996   1.3843756
    x            0.48751834   0.9226888
    z            0.58574159   1.0723062
    v:w          0.45171501  -0.4799412
    Proportion of trace:
       LD1     LD2
    0.8776  0.1224
  • 47. Confusion matrix
                      Predicted
    Actual       ideal  lowprofile  ordinary
    ideal            4           0         0
    lowprofile       0          30         3
    ordinary         1           5       107
    The misclassification error is 6% (9 of 150 cases). Here is a neat display of the classification done by the above model.
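The misclassification percentages quoted in this chapter can be recomputed directly from a confusion matrix. The sketch below does this for the three-group discriminant model above; the function names are hypothetical, rows are actual classes and columns are predicted classes.

```python
def misclassification_percent(matrix: list) -> float:
    """Percentage of cases that fall off the diagonal of a confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * (total - correct) / total


def per_class_accuracy(matrix: list, labels: list) -> dict:
    """Share of each actual class (row) that was predicted correctly."""
    return {
        label: matrix[i][i] / sum(matrix[i])
        for i, label in enumerate(labels)
    }


labels = ["ideal", "lowprofile", "ordinary"]
# Rows: actual class; columns: predicted class (from the slide above).
lda_matrix = [
    [4, 0, 0],
    [0, 30, 3],
    [1, 5, 107],
]

print(misclassification_percent(lda_matrix))
print({k: round(v, 3) for k, v in per_class_accuracy(lda_matrix, labels).items()})
```

The same function applied to the multinomial logistic confusion matrix gives about 1.33%, and applied to the binary discriminant model gives about 2.67%.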
  • 48. 5.6 Conclusions
    - The binary logistic regression model reveals that past purchasing behaviour plays a significant role in whether a consumer is 'efficient' or 'less efficient'. The misclassification error is about 3.33%, which is reasonably low.
    - The multicategory logistic regression model works well with several significant explanatory variables, such as past purchase behaviour, amount spent in a single purchase and intention to continue purchasing online. The misclassification rate is only about 1.33%, so the model can be regarded as more reliable.
    - The discriminant analysis model using the same variables as the binary logistic model has a misclassification rate of 2.67%, which is lower than that of the binary logistic model, so it classifies slightly better.
    - The discriminant analysis model using some of the variables of the multicategory logistic regression has a misclassification error of about 6%. The grouping is shown by a graph produced in R.
  • 50. Department of Statistics, University of Kerala
    CASE STUDY ON STATUS OF ONLINE SHOPPING IN TRIVANDRUM DISTRICT
    Questionnaire
    A) Personal Data
    1. Name (optional):
    2. Gender: Female □ Male □
    3. Age (years):
    4. Residency: a) Rural/Town □ b) Urban □
    5. Education qualification: a) SSLC □ b) Plus 2/Diploma □ c) Graduation □ d) Master □ e) Professional □ (Please specify the stream for above qualifications)
    6. Occupation: a) Student □ b) Teacher/Researcher □ c) IT professional □ d) Engineer/Industrial □ e) Business/Management □ f) Civil service □ g) Other □ (specify) _____
    7. Monthly family income (Rs): a) <20,000 □ b) 20,000-50,000 □ c) 50,000-75,000 □ d) 75,000-1 Lakh □ e) above 1 Lakh □
    B) Survey Related Questions
    1) Where do you access internet primarily? a) Mobile □ b) PC □ c) Tablet/iPad □ d) Office/workplace □ e) Others □
  • 51. 2) How often would you use internet in a week (except study & work)? a) Daily □ b) Once in 2-3 days □ c) Once in week □ d) Less frequently □
    3) When did you purchase online lastly? a) Within last week □ b) Within last month □ c) Before month □ d) Before 3-4 months □ e) Before 6 months □
    4) Since how many years you have been shopping through online? a) 1 year □ b) 2-3 years □ c) 4-5 years □ d) more than 5 years □
    5) On an average how much would you spend in single purchase? (in Rs) a) Less than 500 □ b) 500-2000 □ c) 2000-5000 □ d) more than 5000 □
    6) Approximately how much you had spent on a last purchase? (in Rs) a) Less than 500 □ b) 500-2000 □ c) 2000-5000 □ d) more than 5000 □
    7) Approximately how much time you had spent on a last purchase? a) few minutes □ b) 15 to 30 min □ c) 30-60 min □ d) more than 1 hr □
    8) Do you have
       - Credit/Debit card: Yes □ No □
       - E-banking facility: Yes □ No □
    9) Which payment method did you use for last purchase? a) Credit/Debit card □ b) Net banking/Digital Wallet □ c) Cash on delivery □ d) Others (specify) □ _____
  • 52. 10) Rank the elements those promote you to purchase online. (most preferred has rank 1 & least one has rank 6)
    Element                               Ranking (by preference)
    a. Convenient & Relaxed way
    b. Door to door service
    c. Specific Product information
    d. Low price
    e. Variety of products
    f. Time save
    11) What products you usually purchase online?
    Product                                      Yes  No
    a. Clothing & Accessories                     □    □
    b. Books & Stationary                         □    □
    c. Mobile & Computer/Accessories              □    □
    d. Electronic & Digital Accessories           □    □
    e. Home, Kitchen & Pets                       □    □
    f. Toys & Baby Products                       □    □
    g. Sports, Fitness & Outdoor                  □    □
    h. Beauty, Health & Cosmetics/Jewellery       □    □
  • 53. 12) Rank the following according to your preference.
    Factor                                Ranking
    a) Brand
    b) Popularity (rating/reviews)
    c) Price (low-high/high-low)
    d) Discount/Offer/Coupon
    e) Fresh arrivals
    f) Others
    Site                                  Ranking
    a) Amazon
    b) Snapdeal
    c) Flipkart
    d) Myntra
    e) Ebay
    f) Others (specify)
    13) Would you prefer using price comparison sites? a) Almost always □ b) sometimes □ c) rarely/never □
    14) Do you prefer to read review/ratings of product by other purchasers? a) Almost always □ b) sometimes □ c) rarely/never □
    15) Do you express your opinion in the "Product review/rating" section? a) Almost always □ b) sometimes □ c) rarely/never □
    16) I am willing to pay more if a) website offer free delivery □ b) faster/fastest delivery options □ c) tax free shopping □ d) item not available offline □
  • 54. T1
    Factors (each rated on the scale: Strongly disagree / Disagree / Indifferent / Agree / Strongly agree)
    1. I can buy the products anytime 24 hours a day while shopping online  ○ ○ ○ ○ ○
    2. It is easy to choose and make comparison with other products  ○ ○ ○ ○ ○
    3. The website design/layout helps me in searching and selecting the right product  ○ ○ ○ ○ ○
    4. Sometimes I can find products online which I may not find in stores  ○ ○ ○ ○ ○
    5. I feel that it takes less time in evaluating and selecting a product while shopping online  ○ ○ ○ ○ ○
    6. I feel safe and secure while shopping online  ○ ○ ○ ○ ○
    7. I like to shop online from a trustworthy website  ○ ○ ○ ○ ○
    8. There has been asking unnecessary information in online shopping.  ○ ○ ○ ○ ○
    9. I believe online shopping will eventually supersede traditional shopping  ○ ○ ○ ○ ○
    10. A long time is required for the delivery of products and service  ○ ○ ○ ○ ○
  • 55. T2
    Factors (each rated on the scale: Strongly disagree / Disagree / Indifferent / Agree / Strongly agree)
    11. More choices are available in online site  ○ ○ ○ ○ ○
    12. The description of products shown on the site are very accurate  ○ ○ ○ ○ ○
    13. Online shopping is as secure as traditional shopping  ○ ○ ○ ○ ○
    14. At the time of payment, I hesitate to give my credit/debit card number  ○ ○ ○ ○ ○
    15. Internet reduces the monetary costs of traditional shopping to a great extent (parking, travel, etc.)  ○ ○ ○ ○ ○
    16. I am satisfied with the service quality of online retailers  ○ ○ ○ ○ ○
    17. When I get a product up to expectation, I prefer same site next time  ○ ○ ○ ○ ○
    18. Delivery/shipping charge of product is relatively high  ○ ○ ○ ○ ○
    19. Product gets delivered before the delivery timeline mentioned  ○ ○ ○ ○ ○
    20. I am overall satisfied with the experience of shopping online.  ○ ○ ○ ○ ○
  • 56. T3
    Factor (each rated on the scale: Never Important / Not Important / Indifferent / Important / Very Important)
    i. Reputation of the company  □ □ □ □ □
    ii. Guarantees and Warranties  □ □ □ □ □
    iii. Privacy  □ □ □ □ □
    iv. Description of goods in the site  □ □ □ □ □
    v. Risk of payment data  □ □ □ □ □
    vi. Seasonal/Festival offers  □ □ □ □ □
    vii. Waiting to receive the product  □ □ □ □ □
    viii. Tracking the product I ordered  □ □ □ □ □
    ix. The advertisement of related product to the purchasing product (shoe: socks)  □ □ □ □ □
    x. Impact of reviews/ratings  □ □ □ □ □
    xi. Not being able to touch products.  □ □ □ □ □
    xii. Return policy of online store  □ □ □ □ □
    xiii. Situation of out of stock items  □ □ □ □ □
    17) Do social networking advertisements influence you on online purchase? Yes □ No □
    18) Have you suffered from technical problems during purchase? Yes □ No □ If yes, please mention ______
  • 57. 19) Have you suffered from transaction problems during payment? Yes □ No □ If yes, please mention ______
    20) Have you ever had to cancel your order? (before it got dispatched) Yes □ No □
    21) Have you returned the delivered object over last 2-3 purchase? Yes □ No □
        If yes, what would be the reason? a) Absent of receiver □ b) Change in product quality/size/colour □ c) Damaged product □ d) Lost interest in that product □ e) Other □
    22) Have you ever faced cancel of order by company without your consent? Yes □ No □
    23) Do you know customer can sell used/fresh commodities through some shopping sites? Yes □ No □
    24) Are you aware of your consumer rights when shopping online? Yes □ No □
    25) Do you intend to continue purchasing products from the internet in the near future? Yes □ No □
    Thank You