Portrait of an Online Shopper:
Understanding and Predicting Consumer Behavior
F. Kooti, K. Lerman, L. M. Aiello, M. Grbovic, and N. Djuric
http://www.kamishima.net/

WSDM2016 2016/03/19
Figure 1: Distribution of number of purchases made by users. Panels: (a) PDF and (b) CDF, both plotted against # of purchases (log-scaled x-axis).
Figure 2: Distribution of total money spent by users. Panels: (a) PDF and (b) CDF, both plotted against total money spent ($) (log-scaled x-axis).
handbags), we developed a categorization module based on the purchased item names. Specifically, an item name was used as an input to a classifier that predicts the item category. We used a three-level-deep, 1,733-node Pricegrabber taxonomy to categorize the items. The details of categorization are beyond the scope of this paper.
We limited our study to a random subset of Yahoo Mail users in the US. Our data set contains information on 20.1M users, who collectively made 121M purchases from several top retailers between February and September 2014, amounting to a total spending of 5.1B dollars. For each user we extracted age, gender, and zip code information from the Yahoo user database. We excluded users who made more than 1,000 purchases (amounting to less than 0.01% of the sample), as these accounts are likely to belong to stores and not individual users. The analysis of the data set was performed in aggregate and on anonymized data.
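The exclusion rule for store-like accounts can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(user_id, item_name, price)` and the function name are assumptions, and only the 1,000-purchase threshold comes from the text.

```python
from collections import Counter

MAX_PURCHASES = 1000  # threshold stated in the text


def filter_store_like_accounts(purchases, max_purchases=MAX_PURCHASES):
    """Drop every purchase made by a user with more than `max_purchases` purchases.

    `purchases` is a list of (user_id, item_name, price) tuples; this layout
    is a hypothetical one chosen for illustration.
    """
    counts = Counter(user_id for user_id, _, _ in purchases)
    return [p for p in purchases if counts[p[0]] <= max_purchases]


# Toy data with a tiny threshold so that the effect is visible.
toy = [
    ("u1", "dvd", 10.0),
    ("u2", "cable", 5.0),
    ("u2", "book", 8.0),
    ("u2", "case", 3.0),
]
kept = filter_store_like_accounts(toy, max_purchases=2)
```

With the toy threshold of 2, only the purchase by `u1` survives, because `u2` made three purchases.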
Figure 3: Number of times different items have been purchased. Axes: # of times being purchased (x, log scale) vs. PDF (y, log scale).
Figure 4: Number of item purchases as a function of price. Axes: item price (x, log scale) vs. # of times purchased (y, log scale).
DVD, the most popular item in the data set, has been purchased more than 200,000 times, whereas the vast majority of the items have been purchased fewer than 10 times. Table 1 lists the top 5 most frequently purchased items. Intuitively, the set of items that users have spent the most money on is a different set, because a single purchase of an expensive item accounts for the same amount of money as several purchases of cheaper items (Table 2). In fact, the number of times an item is purchased negatively correlates with the price of that item (Figure 4). This is in line with previous survey-based studies [7] that found that the vast majority of items purchased online are worth at most a few tens of dollars.
3. PURCHASE PATTERN ANALYSIS
In this section we present a quantitative analysis of factors affecting
online purchases. We examine the role of demographic, temporal,
and social factors that include gender and age, daily and weekly
Table 1
Rank Produc
1 Frozen
2 Cards A
3 Google
4 HDMI c
5 Pamper
Table 2: Top 5 products with the most money spent on them
Rank | Product name | Money spent on product
1 | Play Station 4 | $7.0M
2 | Frozen (DVD) | $3.8M
3 | Kindle | $3.2M
4 | Samsung Galaxy Tab | $2.7M
5 | Cards Against Humanity | $2.5M
Table 3: Top product categories purchased by women and men
Rank | Top categories | Distinctive women | Distinctive men
1 | Android | Books | Games
2 | Accessories | Dresses | Flash memory
3 | Books | Diapering | Light bulbs
4 | Vitamins | Wallets | Accessories
5 | Shirts | Bracelets | Batteries
With respect to age, spending ability increases as people get older, peaking for the population between ages 30 and 50 and then declining afterwards. The same pattern holds for the number of purchases made, average item price, and total money spent (Figures 5(b), 5(c), and 5(d), respectively). Different generations also purchase different types of products online (Table 5). Younger shoppers (18-22 years old) purchase more phone accessories and games, whereas older shoppers (60-70 years old) are much more interested in buying TV shows. Also, blood sugar medicine is purchased more by the older users, which is expected.
Differences exist across genders as well. Table 3 shows the top five categories of purchased products for male and female customers. Even though the ranking of the top products is the same, each product accounts for a different fraction of all purchases within the same gender. To find the most distinctive categories, we compare, for each category, its fraction of all the items bought by each gender, and consider the categories that have the largest differences. Books, dresses, and diapering are the categories that were more disproportionately bought by women, whereas games, flash memory sticks, and accessories (e.g., headphones) are the categories purchased more by men. The largest differences range from only 0.5% to 1%, but are statistically significant. This result is aligned with previous research on offline shopping that found men keener on buying electronics and entertainment products, and women more inclined to buy clothes [11, 22]. We repeated the same gender analysis at a product level (Table 4). Consistently, there is a high overlap between the most purchased products.
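The "distinctive categories" comparison can be sketched as a difference of purchase shares between two groups. This is a minimal sketch assuming each group is given as a flat list of category labels, one per purchased item; the function names and toy data are hypothetical, and the significance testing mentioned in the text is not covered.

```python
from collections import Counter


def category_shares(categories):
    """Fraction of a group's purchases that falls into each category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def distinctive_categories(group_a, group_b, top_k=3):
    """Categories whose share in group A most exceeds their share in group B."""
    share_a = category_shares(group_a)
    share_b = category_shares(group_b)
    diffs = {
        cat: share_a.get(cat, 0.0) - share_b.get(cat, 0.0)
        for cat in set(share_a) | set(share_b)
    }
    # Largest positive difference first; ties broken alphabetically.
    ranked = sorted(diffs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cat, diff) for cat, diff in ranked[:top_k] if diff > 0]


# Toy data (made up for illustration only).
women = ["Books", "Books", "Dresses", "Games"]
men = ["Games", "Games", "Books", "Flash memory"]
distinctive_women = [c for c, _ in distinctive_categories(women, men, top_k=2)]
distinctive_men = [c for c, _ in distinctive_categories(men, women, top_k=2)]
```

Comparing shares rather than raw counts keeps the result independent of how many purchases each group made in total.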
Figure 5: Demographic analysis broken down by age: (a) percentage of online shoppers; (b) number of items purchased; (c) average price of products purchased; and (d) total spent by men and women. Axes: age (x) vs. (a) % of online shoppers, (b) avg # of items purchased, (c) median of product price, (d) median total $ spent; separate curves for men and women.

Table 4: Top products purchased by women and men
Rank | Top women | Top men | Distinctive women | Distinctive men
1 | Frozen | Frozen | Frozen | Chromecast
2 | iPhone screen protector | Game of Thrones | iPhone screen protector | Game of Thrones
3 | Cards Against Humanity | Chromecast | iPhone screen protector | Titanfall Xbox One
4 | iPhone screen protector (another brand) | Cards Against Humanity | iPhone case | Playstation 4
5 | Game of Thrones | iPhone screen protector | iPhone case | Godfather collection

Table 5: Differences in the products purchased by younger (18-22 yo) and older (60-70 yo) users
Rank | Top younger users | Top older users | Distinctive younger users | Distinctive older users
1 | iPhone screen protector | Frozen | iPhone screen protector | Frozen
2 | iPhone screen protector | Game of Thrones | iPhone screen protector (another brand) | Game of Thrones
3 | Cards Against Humanity | Chromecast | Cards Against Humanity | Downton Abbey
4 | iPhone case | Downton Abbey | iPhone case | Blood sugar medicine
5 | Frozen | Hunger Games | iPhone case | TurboTax Package

Figure 6: Effect of income on purchasing behavior: (a) number of purchases; (b) average item price; (c) total money spent. Axes: income (x) vs. (a) average # of purchases, (b) average item price, (c) average total spent.

In the following, we measure the impact of economic factors on online shopping behavior. We use US census data to retrieve the median income associated with each zip code, so the income inferred for a user is an aggregate estimate. Nevertheless, given the large size of this data set, this coarse approach was enough to observe clear trends. The number of purchases, average product price, and total money spent are all positively correlated with income (Figures 6(a), 6(b), and 6(c), respectively). While users living in high-income zip codes do not buy substantially more expensive products, they do make more purchases, spending more money in total than users from lower-income zip codes. Although the factors that lead to lower-income households spending less online are multiple and complex, part of this effect can be explained by the reluctance of people who are concerned about their financial safety to trust and make full use of online shopping, as pointed out by previous studies [23].
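The zip-code-based income estimate can be sketched as a lookup followed by binning. This is an illustrative sketch only: the three dictionary layouts, the function name, and the 50k bin width are assumptions, not details given in the text.

```python
from collections import defaultdict


def average_purchases_by_income(user_zip, purchase_counts, zip_income, bin_width=50_000):
    """Average number of purchases per income bin.

    user_zip:        {user_id: zip_code}
    purchase_counts: {user_id: number of purchases}
    zip_income:      {zip_code: median income}, e.g. taken from census data
    All three layouts are hypothetical and chosen for this sketch.
    """
    totals = defaultdict(int)
    users = defaultdict(int)
    for user_id, zip_code in user_zip.items():
        income = zip_income.get(zip_code)
        if income is None:  # no income estimate for this zip code
            continue
        bin_start = int(income // bin_width) * bin_width
        totals[bin_start] += purchase_counts.get(user_id, 0)
        users[bin_start] += 1
    return {b: totals[b] / users[b] for b in sorted(users)}


# Toy data (made up for illustration only).
user_zip = {"u1": "10001", "u2": "10001", "u3": "94301", "u4": "00000"}
purchase_counts = {"u1": 2, "u2": 4, "u3": 9, "u4": 1}
zip_income = {"10001": 40_000, "94301": 120_000}
by_income = average_purchases_by_income(user_zip, purchase_counts, zip_income)
```

Every user in the same zip code receives the same income value, which is why the estimate is coarse at the individual level but still usable in aggregate.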
3.2 Temporal Factors
Our data set spans a period of eight months, giving us the opportunity to investigate the temporal dynamics of purchasing behavior and the factors affecting it. Besides daily and weekly cycles and periodic purchasing, we observed temporal variations that we associate with financial depletion: users wait longer to buy more expensive items, waiting for their budget to recover from previous purchases.
3.2.1 Daily and Weekly Cycles
Figure 7 shows the daily number of purchases over a period of two months. The figure indicates a clear weekly shopping pattern, with more purchases taking place in the first days of the week and fewer purchases on the weekends. On average there are 32.6% more purchases on Mondays than on Sundays. There also exist strong diurnal patterns in shopping behavior (Figure 8). Interestingly, most of the purchases occur during working hours (i.e., in the morning and early afternoon). Note that for this analysis we inferred the time zone from the user’s zip code, which might be different from the shipping zip code for a purchase.

Figure 7: Number of purchases in a day and average weekly. Axes: date (x, Jun15 to Aug15) vs. number of purchases (y); series: daily and weekly average.

Figure 8: Number of purchases in each hour of the day. Axes: time of the day (x, hours) vs. number of purchases (y).
Researchers have also reported monthly effects, where people spend more money at the beginning of the month, when they receive their paychecks, than at the end of the month [14]. To test the first-of-the-month phenomenon, we compared spending on the first Monday of the month with spending on the last Monday of the month. We considered the first and the last Mondays rather than the first and the last days, because the strong weekly patterns would result in an unfair comparison if the first and the last day of the month are not the same day of the week. Our data does not support the earlier findings: there are months in which the last Monday of the month includes more activity than the first Monday of the month.
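The first-versus-last-Monday comparison can be sketched with the standard `calendar` module. This is a minimal sketch assuming purchases are available as a list of `datetime.date` values, one per purchase; the function name and the toy data are hypothetical, and purchase counts stand in for spending.

```python
import calendar
from collections import Counter
from datetime import date


def first_vs_last_monday(purchase_dates):
    """Per month, count purchases on the first and on the last Monday.

    Returns {(year, month): (first_monday_count, last_monday_count)}.
    """
    per_day = Counter(purchase_dates)
    months = sorted({(d.year, d.month) for d in purchase_dates})
    result = {}
    for year, month in months:
        # monthcalendar() returns weeks starting on Monday by default;
        # days that fall outside the month are reported as 0.
        mondays = [
            week[calendar.MONDAY]
            for week in calendar.monthcalendar(year, month)
            if week[calendar.MONDAY] != 0
        ]
        first = date(year, month, mondays[0])
        last = date(year, month, mondays[-1])
        result[(year, month)] = (per_day[first], per_day[last])
    return result


# Toy data: in June 2014 the first Monday is the 2nd and the last is the 30th.
toy_dates = [date(2014, 6, 2)] * 3 + [date(2014, 6, 30)] * 5 + [date(2014, 6, 10)]
monday_counts = first_vs_last_monday(toy_dates)
```

Comparing the same weekday at both ends of the month keeps the weekly cycle from dominating the result.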
3.2.2 Recurring Purchases
Some products are purchased periodically, such as printer cartridges, water filters, or toilet paper. Finding these items and their typical cycle would help in predicting purchasing behavior. We do this by counting the number of times each item has been purchased [...] purchases per each item. Table 6 shows the top five such products, along with the median number of days between purchases. Out of the top 20 products, only four are neither toilet paper nor tissue (Frozen, Amazon gift card, chocolate chip cookie dough, and single-serve coffees). In the top 20 list, the only unexpected item is the Frozen DVD, which probably made the list due to users buying additional copies as gifts or due to purchases by small stores that were not eliminated by our removal criterion of a maximum of 1,000 purchases. Interestingly, the number of days between purchases for most of the top 20 items is close to 1 or 2 months, which might be due to automatic purchasing that users can set up.

Table 6: Top 5 items with the most number of repurchases
Rank | Item name | Median purchase delay (days)
1 | Pampers 448 count | 42
2 | Bath tissue | 62
3 | Pampers 162 count | 30
4 | Pampers 152 count | 31
5 | Frozen | 12
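A median repurchase delay of the kind reported in Table 6 could be computed along the following lines. This is a sketch under assumptions: a repurchase is taken to mean the same user buying the same item again, the record layout `(user_id, item_name, purchase_date)` is hypothetical, and the exact counting rule used in the source is not recoverable from the text.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_repurchase_delay(purchases):
    """Per item: number of repurchases and median days between them.

    `purchases` is a list of (user_id, item_name, purchase_date) tuples.
    Delays are measured between consecutive purchases of the same item
    by the same user (an assumption made for this sketch).
    """
    dates = defaultdict(list)
    for user_id, item, day in purchases:
        dates[(user_id, item)].append(day)

    delays = defaultdict(list)
    for (_, item), days in dates.items():
        days.sort()
        for prev, cur in zip(days, days[1:]):
            delays[item].append((cur - prev).days)

    return {item: (len(d), median(d)) for item, d in delays.items()}


# Toy data (made up for illustration only).
toy = [
    ("u1", "diapers", date(2014, 3, 1)),
    ("u1", "diapers", date(2014, 3, 31)),
    ("u1", "diapers", date(2014, 5, 2)),
    ("u2", "diapers", date(2014, 4, 1)),
    ("u2", "diapers", date(2014, 5, 11)),
    ("u1", "dvd", date(2014, 3, 5)),
]
repurchase_stats = median_repurchase_delay(toy)
```

Items bought only once by every user never produce a delay and therefore do not appear in the result.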
3.2.3 Finite Budget Effects
Finally, we study the dynamics of individual purchasing behavior. Figure 9 shows the distribution of the number of days between purchases. The distribution is heavy-tailed, indicating bursty dynamics. The most likely time between purchases is one day, and there are local maxima around multiples of 7 days, consistent with the weekly cycles we observed.

An individual’s purchasing decisions are not independent, but constrained by their finances. Budgetary constraints introduce temporal dependencies between purchases made by the user: after buying a product, the user has to wait some time to accumulate money to make another purchase. Previous work in economics studied models of budget allocation of households across different types of goods to maximize a utility function [13] and analyzed conditions under which people are willing to break their budget cap [8]. However, we are not aware of any study aimed at supporting the hypothesis that the time of purchase is partly driven by an underlying cyclic process of budget depletion and replenishment.

Figure 9: Distribution of number of days between purchases. Axes: days from last purchase (x, log scale) vs. PDF (y, log scale).

Figure 10: Relationship between purchase price and time to next purchase (0.95 confidence intervals are shown but are too small to be observed). Axes: days from last purchase (x) vs. average normalized price (y).

Figure 11: Relationship between purchase price and time to next purchase with 0.95 confidence interval, for (a) users with 5 purchases, (b) users with 9-11 purchases, and (c) users with 28-32 purchases. Axes: days from last purchase (x) vs. average normalized price (y); series: normal and shuffled.
To test this hypothesis, we examined the relationship between purchase price and the time period since the last purchase. Since different users have different spending power, we considered the normalized change in the price given the number of days from the last purchase. In other words, we computed how users divide their personal spending across different purchases, given the time delay between purchases. We then averaged the normalized values over all users and report the change for each time delay. Figure 10 shows that as the time delay gets longer, users spend a higher fraction of their budgets, which supports our hypothesis. To test that our analysis does not have any bias in the way the users are grouped, we perform a shuffle test by randomly swapping the prices of products purchased by users. This destroys the correlation between the time delay and product price. We then do the same analysis with the shuffled data and expect to see a flat line. However, the same increase also exists in the shuffled data, indicating a bias in the methodology. This is due to the heterogeneity of the underlying population: we are mixing users with different numbers of purchases. Users making more purchases have lower normalized prices and also shorter time delays, and are systematically overrepresented on the left side of the plot, even in the shuffled data.
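The normalization and the shuffle test can be sketched as follows. This is a minimal sketch, not the authors' code: the input layout `{user_id: [(days_since_previous_purchase, price), ...]}` and both function names are assumptions, and the shuffle is assumed to permute prices within each user, which keeps every user's total spending unchanged.

```python
import random
from collections import defaultdict


def avg_normalized_price_by_delay(users):
    """Average fraction of a user's total spending, grouped by time delay.

    users: {user_id: [(days_since_previous_purchase, price), ...]}
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for purchases in users.values():
        total = sum(price for _, price in purchases)
        if total <= 0:
            continue
        for delay, price in purchases:
            sums[delay] += price / total  # price normalized by the user's total
            counts[delay] += 1
    return {d: sums[d] / counts[d] for d in sorted(counts)}


def shuffle_prices(users, seed=0):
    """Permute prices within each user while keeping the delays fixed."""
    rng = random.Random(seed)
    shuffled = {}
    for user_id, purchases in users.items():
        prices = [price for _, price in purchases]
        rng.shuffle(prices)
        shuffled[user_id] = [(delay, p) for (delay, _), p in zip(purchases, prices)]
    return shuffled


# Toy data (made up for illustration only).
toy_users = {
    "a": [(1, 10.0), (30, 30.0)],
    "b": [(1, 5.0), (1, 5.0), (7, 10.0)],
}
observed = avg_normalized_price_by_delay(toy_users)
baseline = avg_normalized_price_by_delay(shuffle_prices(toy_users))
```

Comparing `observed` against `baseline` shows how much of a trend survives once the link between delay and price is broken; any trend left in the baseline comes from mixing users with different numbers of purchases.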
To partially account for heterogeneity, we grouped users by the number of purchases (i.e., those who made exactly 5 purchases, those with 9-11 purchases, and those with 28-32 purchases). Even within each group there is variation, as the total spending differs significantly across users, which we address by normalizing the product price by the total amount of money spent by the user, as explained above. If our hypothesis is correct, there should be a refractory period after a purchase, with users waiting longer to make a larger
contagions in general, and “word-of-mouth” marketing in particular, although empirical evidence suggests that influence has a limited effect on shopping behavior [27]. Alternatively, users could have bought the same product as their friends because people tend to be similar to their friends and therefore have similar needs. The tendency of socially connected individuals to be similar is called homophily, and it is a strong organizing principle of social networks. Studies have found that people tend to interact with others who belong to a similar socio-economic class [16, 29] or share similar interests [25, 4]. Finally, a user’s and their friends’ behavior may be spuriously correlated because both depend on some other external factor, such as geographic proximity. In reality, all these effects are interconnected [9, 3] and are difficult to disentangle from observational data [35]. For example, homophily often results in selective exposure that may amplify social influence and word-of-mouth effects.
We investigate whether social correlations exist, although we do not resolve the source of the correlation. Specifically, we study whether users who are connected to each other via email interactions tend to purchase similar products, in contrast to users who are not connected. To measure the similarity of purchases between two users, we first describe the purchases made by each user with a vector of products, each entry containing the frequency of purchase. This approach results in large and sparse vectors due to the large number of unique products in our data set. To address this challenge, we use vectors of product categories instead of product names. There are three levels of product categories, and we perform our experiments at all levels.
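A category-frequency vector and a comparison between two users can be sketched as below, using cosine similarity as the comparison measure. The sparse `Counter` representation, the function names, and the toy purchases are assumptions made for illustration.

```python
import math
from collections import Counter


def category_vector(categories):
    """Sparse vector: category -> number of purchases in that category."""
    return Counter(categories)


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * vec_b.get(cat, 0) for cat, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no purchases is treated as dissimilar to everyone
    return dot / (norm_a * norm_b)


# Toy data (made up for illustration only).
alice = category_vector(["Books", "Books", "Games"])
bob = category_vector(["Books", "Games", "Games"])
carol = category_vector(["Vitamins"])
sim_alice_bob = cosine_similarity(alice, bob)
sim_alice_carol = cosine_similarity(alice, carol)
```

Storing only the categories a user actually bought from keeps the vectors small even when the taxonomy has many nodes.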
We compare the similarity of category vectors of pairs of users who are directly connected in the email network (104K pairs of users) with the same number of pairs of randomly chosen users (who are not directly connected). We use cosine similarity to measure the similarity of two vectors. Using top-level categories to describe
9
10
11
12

More Related Content

Similar to WSDM2016勉強会 資料

Economics mc conell brue ch 21 study questions
Economics mc conell brue ch 21 study questionsEconomics mc conell brue ch 21 study questions
Economics mc conell brue ch 21 study questions
Haroon Ahmed
 
Your worker role supports
Your worker role supportsYour worker role supports
Your worker role supports
Bayraa Taminaa
 
PowerRuck Final Paper UPDATED (1) (1)
PowerRuck Final Paper UPDATED (1) (1)PowerRuck Final Paper UPDATED (1) (1)
PowerRuck Final Paper UPDATED (1) (1)
Mitchell H Miller
 
Goal 8 supply and demand
Goal 8 supply and demandGoal 8 supply and demand
Goal 8 supply and demand
jenniferdavis22
 
Goal 8 demand and changes
Goal 8   demand and changesGoal 8   demand and changes
Goal 8 demand and changes
Mrs. Sharbs
 

WSDM2016 Study Group Materials

  • 1. Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior. F. Kooti, K. Lerman, L. M. Aiello, M. Grbovic, and N. Djuric. http://www.kamishima.net/ WSDM206 2016/03/19
  • 2.
  • 3.
  • 4. Figure 1: Distribution of number of purchases made by users: (a) PDF; (b) CDF (x-axis: # of purchases).
Figure 2: Distribution of total money spent by users: (a) PDF; (b) CDF (x-axis: total money spent ($)).
[...] handbags), we developed a categorization module based on the purchased item names. Specifically, an item name was used as an input to a classifier that predicts the item category. We used a three-level-deep, 1,733-node Pricegrabber taxonomy to categorize the items. The details of categorization are beyond the scope of this paper.
We limited our study to a random subset of Yahoo Mail users in the US. Our data set contains information on 20.1M users, who collectively made 121M purchases from several top retailers between February and September 2014, amounting to total spending of 5.1B dollars. For each user we extracted age, gender, and zip code information from the Yahoo user database. We excluded users who made more than 1,000 purchases (amounting to less than 0.01% of the sample), as these accounts are likely to belong to stores and not individual users. The analysis of the data set was performed in aggregate and on anonymized data.
Figure 3: Number of times different items have been purchased (x-axis: # of times being purchased; y-axis: PDF).
Figure 4: Number of item purchases as a function of price (x-axis: item price; y-axis: # of times purchased).
[...] DVD, the most popular item in the data set, has been purchased more than 200,000 times, whereas the vast majority of the items have been purchased fewer than 10 times. Table 1 lists the top 5 most frequently purchased items. Intuitively, the set of items that users have spent the most money on is a different set, because a single purchase of an expensive item would account for the same amount of money as several purchases of cheaper items (Table 2). In fact, the number of times an item is purchased negatively correlates with the price of that item (Figure 4). This is in line with previous survey-based studies [7] that found the vast majority of items purchased online are worth at most a few tens of dollars.
3. PURCHASE PATTERN ANALYSIS
In this section we present a quantitative analysis of factors affecting online purchases. We examine the role of demographic, temporal, and social factors that include gender and age, daily and weekly [...]
Table 1 (top 5 most frequently purchased items; product names are cut off on the slide): 1. Frozen; 2. Cards A...; 3. Google...; 4. HDMI c...; 5. Pamper...
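The exclusion rule quoted on this slide (dropping accounts with more than 1,000 purchases) could be applied along the following lines. This is a minimal sketch, not the authors' code: the function name `drop_bulk_buyers` and the `(user, item)` tuple layout are assumptions made for illustration.

```python
from collections import Counter


def drop_bulk_buyers(purchases, max_purchases=1000):
    """Keep only purchases made by users with at most `max_purchases` records.

    `purchases` is a list of (user, item) tuples (hypothetical layout).
    """
    counts = Counter(user for user, _item in purchases)
    return [(u, i) for u, i in purchases if counts[u] <= max_purchases]


# Toy data with a tiny threshold so the effect is visible.
sample = [("u1", "dvd"), ("u1", "cable"), ("u2", "dvd"), ("u1", "book")]
filtered = drop_bulk_buyers(sample, max_purchases=2)
print(filtered)
```

With the toy threshold of 2, user `u1` (three purchases) is removed and only the purchase by `u2` remains.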
  • 5. Table 2: Top 5 products with the most money spent on them
Rank | Product name | Money spent on product
1 | Play Station 4 | $7.0M
2 | Frozen (DVD) | $3.8M
3 | Kindle | $3.2M
4 | Samsung Galaxy Tab | $2.7M
5 | Cards Against Humanity | $2.5M
Table 3: Top product categories purchased by women and men
Rank | Top categories | Distinctive women | Distinctive men
1 | Android | Books | Games
2 | Accessories | Dresses | Flash memory
3 | Books | Diapering | Light bulbs
4 | Vitamins | Wallets | Accessories
5 | Shirts | Bracelets | Batteries
With respect to age, spending ability increases as people get older, peaking for the population between ages 30 and 50 and then declining afterwards. The same pattern holds for number of purchases made, average item price, and total money spent (Figures 5(b), 5(c), and 5(d)). Different generations also purchase different types of products online (Table 5). Younger shoppers (18-22 years old) purchase more phone accessories and games, whereas older shoppers (60-70 years old) are much more interested in buying TV shows. Also, blood sugar medicine is purchased more by the older users, which is expected.
Differences exist across genders as well. Table 3 shows the top five categories of purchased products for male and female customers. Even though the ranking of the top products is the same, each product accounts for a different fraction of all purchases within the same gender. To find the most distinctive categories, we compare the fraction of all the items bought by both genders, and consider the categories that have the largest differences. Books, dresses, and diapering are the categories that were more disproportionately bought by women, whereas games, flash memory sticks, and accessories (e.g., headphones) are the categories purchased more by men. The largest differences range from only 0.5% to 1%, but are statistically significant. This result is aligned with previous research on offline shopping that found men more keen on buying electronics and entertainment products, while women were more inclined to buy clothes [11, 22]. We repeated the same gender analysis at a product level (Table 4). Consistently, there is a high overlap between most purchased products.
Figure 5: Demographic analysis broken down by age: (a) Percentage of online shoppers; (b) number of items purchased; (c) average price of products purchased; and (d) total spent by men and women.
Table 4: Top products purchased by women and men
Rank | Top women | Top men | Distinctive women | Distinctive men
1 | Frozen | Frozen | Frozen | Chromecast
2 | iPhone screen protector | Game of Thrones | iPhone screen protector | Game of Thrones
3 | Cards Against Humanity | Chromecast | iPhone screen protector | Titanfall Xbox One
4 | iPhone screen protector (another brand) | Cards Against Humanity | iPhone case | Playstation 4
5 | Game of Thrones | iPhone screen protector | iPhone case | Godfather collection
Table 5: Differences in the products purchased by younger (18-22 yo) and older (60-70 yo) users
Rank | Top younger users | Top older users | Distinctive younger users | Distinctive older users
1 | iPhone screen protector | Frozen | iPhone screen protector | Frozen
2 | iPhone screen protector | Game of Thrones | iPhone screen protector (another brand) | Game of Thrones
3 | Cards Against Humanity | Chromecast | Cards Against Humanity | Downton Abbey
4 | iPhone case | Downton Abbey | iPhone case | Blood sugar medicine
5 | Frozen | Hunger Games | iPhone case | TurboTax Package
Figure 6: Effect of income on purchasing behavior: (a) Number of purchases; (b) Average item price; (c) Total money spent (x-axis: income).
In the following, we measure the impact of economic factors on online shopping behavior. We use the US census data to retrieve median income associated with each zip code, making an inferred income for a user an aggregated estimate. Nevertheless, given the large size of this data set this coarse approach was enough to observe clear trends. The number of purchases, average product price, and total money spent are all positively correlated with income (Figures 6(a), 6(b), and 6(c) respectively). While users living in high-income zip codes do not buy substantially more expensive products, they do make more purchases, spending more money in total than users from lower-income zip codes. Although the factors that lead to lower-income households spending less online are multiple and complex, part of this effect can be explained by the reluctance of people who are concerned with their financial safety to trust and make full use of online shopping, as pointed out by previous studies [23].
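The "most distinctive categories" comparison described for Table 3 (compare each category's share of purchases within each gender and rank by the difference) can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the function names and the toy category lists are hypothetical, and no significance test is included.

```python
from collections import Counter


def category_shares(categories):
    """Fraction of a group's purchases that falls into each category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def distinctive_categories(group_a, group_b, top_n=2):
    """Rank categories by (share in group A) minus (share in group B).

    Returns the categories most over-represented in A and in B.
    """
    shares_a = category_shares(group_a)
    shares_b = category_shares(group_b)
    cats = set(shares_a) | set(shares_b)
    diffs = {c: shares_a.get(c, 0.0) - shares_b.get(c, 0.0) for c in cats}
    ranked = sorted(diffs, key=lambda c: diffs[c], reverse=True)
    return ranked[:top_n], ranked[::-1][:top_n]


# Toy purchase logs: one category label per purchased item (made-up data).
women = ["Books"] * 5 + ["Dresses"] * 3 + ["Games"] * 2
men = ["Books"] * 4 + ["Games"] * 5 + ["Batteries"] * 1
more_women, more_men = distinctive_categories(women, men)
print(more_women, more_men)
```

Note that "Books" is the top category for women and the second most common for men in the toy data, yet it is only mildly distinctive, which mirrors the slide's point that a shared ranking can hide different purchase fractions.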
  • 6. 3.2 Temporal Factors
Our data set spans a period of eight months, giving us the opportunity to investigate temporal dynamics of purchasing behavior and factors affecting it. Besides daily and weekly cycles and periodic purchasing, we observed temporal variations that we associate with financial depletion: users wait longer to buy more expensive items, waiting for the budget to recover from the previous purchases.
3.2.1 Daily and Weekly Cycles
Figure 7 shows the daily number of purchases over a period of two months. The figure indicates a clear weekly shopping pattern, with more purchases taking place in the first days of the week and fewer purchases on the weekends. On average there are 32.6% more purchases on Mondays than Sundays. There also exist strong diurnal patterns in shopping behavior (Figure 8). Interestingly, most of the purchases occur during working hours (i.e., in the morning and early afternoon).
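A figure such as "32.6% more purchases on Mondays than Sundays" is a relative difference between weekday counts. The sketch below shows one way to compute it; it pools all Mondays and all Sundays rather than averaging week by week, which is a simplifying assumption, and the function name and toy dates are illustrative only.

```python
from collections import Counter
from datetime import date


def monday_vs_sunday_pct(purchase_days):
    """Percent by which the Monday purchase count exceeds the Sunday count."""
    by_weekday = Counter(d.weekday() for d in purchase_days)  # Monday=0 ... Sunday=6
    mondays, sundays = by_weekday[0], by_weekday[6]
    if sundays == 0:
        raise ValueError("no Sunday purchases to compare against")
    return 100.0 * (mondays - sundays) / sundays


# Toy data: one date per purchase (made-up values).
toy_days = (
    [date(2014, 6, 2)] * 3 + [date(2014, 6, 9)] * 3    # two Mondays
    + [date(2014, 6, 1)] * 2 + [date(2014, 6, 8)] * 2  # two Sundays
    + [date(2014, 6, 4)]                                # one Wednesday
)
pct = monday_vs_sunday_pct(toy_days)
print(pct)
```

Here six Monday purchases against four Sunday purchases give a relative difference of 50%.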
  • 7. 7 8 × 10+5 1.2 × 10+6 1.6 × 10+6 2 × 10+6 Jun15 Jul15 Aug15 Date Numberofpurchases Daily Week avg Figure 7: Number of purchases in a day and average weekly 2 × 10+6 4 × 10+6 6 × 10+6 8 × 10+6 0 5 10 15 20 Time of the day Numberofpurchases Figure 8: Number of purchases in each hour of the day Table 6: Top 5 items with the most number of repurchases Rank Item name Median purchase delay 1 Pampers 448 count 42 2 Bath tissue 62 3 Pampers 162 count 30 4 Pampers 152 count 31 5 Frozen 12 ferred the time zone from the user’s zip code, which might be dif- ferent from a shipping zip code for a purchase. Researchers have also reported monthly effects, where people spend more money at the beginning of the month when they re- ceive their paychecks, compared to the end of the month [14]. To test the first-of-the-month phenomenon, we compared spending in 10−5 10−3 10−1 100 101 102 Days from last purchase PDF Figure 9: Distribution of number of days between purchases 0.10 0.15 0.20 0.25 0.30 0 50 100 150 Days from last purchase Averagenormalizedprice Figure 10: Relationship between purchase price and time to next purchase (0.95 confidence interval are shown yet too small to be observed) purchases per each item. Table 6 shows the top five such products, along with the median number of days between purchases. Out of the top 20 products only four are neither toilet paper nor tissue (Frozen, Amazon gift card, chocolate chip cookie dough, and sin- gle serve coffees). In the top 20 list, the only unexpected item is the Frozen DVD, which probably made the list due to users buying additional copies as gifts or due to purchases by small stores that were not eliminated by our removal criterion of maximum 1,000 purchases. 
Interestingly, the number of days between purchases for most of the top 20 items is close to 1 or 2 months, which might be 8 × 10+5 1.2 × 10+6 Jun15 Jul15 Aug15 Date Number Figure 7: Number of purchases in a day and average weekly 2 × 10+6 4 × 10+6 6 × 10+6 8 × 10+6 0 5 10 15 20 Time of the day Numberofpurchases Figure 8: Number of purchases in each hour of the day Table 6: Top 5 items with the most number of repurchases Rank Item name Median purchase delay 1 Pampers 448 count 42 2 Bath tissue 62 3 Pampers 162 count 30 4 Pampers 152 count 31 5 Frozen 12 ferred the time zone from the user’s zip code, which might be dif- ferent from a shipping zip code for a purchase. Researchers have also reported monthly effects, where people spend more money at the beginning of the month when they re- ceive their paychecks, compared to the end of the month [14]. To test the first-of-the-month phenomenon, we compared spending in the first Monday of the month with the last Monday of the month. We considered the first and the last Mondays and not the first and the last days, because the strong weekly patterns would result in an unfair comparison if the first and the last day of the month are not the same day of the week. Our data does not support the ear- lier findings and there are months in which the last Monday of the month includes more activity compared to the first Monday of the month. 3.2.2 Recurring Purchases Some products are purchased periodically, such as printer car- tridges, water filters, or toilet paper. Finding these items and their typical cycle would help predicting purchasing behavior. 
We do this by counting the number of times each item has been purchased 10−5 10−3 100 101 102 Days from last purchase P Figure 9: Distribution of number of days between purchases 0.10 0.15 0.20 0.25 0.30 0 50 100 150 Days from last purchase Averagenormalizedprice Figure 10: Relationship between purchase price and time to next purchase (0.95 confidence interval are shown yet too small to be observed) purchases per each item. Table 6 shows the top five such products, along with the median number of days between purchases. Out of the top 20 products only four are neither toilet paper nor tissue (Frozen, Amazon gift card, chocolate chip cookie dough, and sin- gle serve coffees). In the top 20 list, the only unexpected item is the Frozen DVD, which probably made the list due to users buying additional copies as gifts or due to purchases by small stores that were not eliminated by our removal criterion of maximum 1,000 purchases. Interestingly, the number of days between purchases for most of the top 20 items is close to 1 or 2 months, which might be due to automatic purchasing that users can set up. 3.2.3 Finite Budget Effects Finally, we study the dynamics of individual purchasing behav- ior. Figure 9 shows the distribution of number of days between purchases. The distribution is heavy-tailed, indicating bursty dy- namics. The most likely time between purchases is one day and there are local maxima around multiples of 7 days, consistent with weekly cycles we observed. An individual’s purchasing decisions are not independent, but constrained by their finances. Budgetary constraints introduce tem- poral dependencies between purchases made by the user: After buying a product, the user has to wait some time to accumulate money to make another purchase. 
Previous work in economics studied models of budget allocation of households across different types of goods to maximize a utility function [13] and analyzed conditions under which people are willing to break their budget cap [8]. However, we are not aware of any study that aimed to support the hypothesis that the time of purchase is partly driven by an underlying cyclic process of budget depletion and replenishment.

[Figure 11: Relationship between purchase price and time to next purchase with 0.95 confidence interval. Panels: (a) Users with 5 purchases, (b) Users with 9-11 purchases, (c) Users with 28-32 purchases. x-axis: Days from last purchase; y-axis: Average normalized price; series: Normal, Shuffled]

To test this hypothesis, we examined the relationship between purchase price and the time period since the last purchase. Since different users have different spending power, we considered the normalized change in the price given the number of days from the last purchase. In other words, we computed how users divide their personal spending across different purchases, given the time delay between purchases. We then averaged the normalized values for all users, and report the change for each time delay. Figure 10 shows that as the time delay gets longer, users spend a higher fraction of their budgets, which supports our hypothesis.

To test that our analysis does not have any bias in the way the users are grouped, we perform a shuffle test by randomly swapping the prices of products purchased by users. This destroys the correlation between the time delay and product price. We then do the same analysis with the shuffled data and expect to see a flat line. However, the same increase also exists in the shuffled data, indicating a bias in the methodology.
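The normalized-price analysis and the shuffle test can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the data layout, the function names, the choice to pair each purchase's price with the days since the previous purchase, the pooling of all purchases when averaging, and the choice to permute prices within each user are assumptions.

```python
import random
from collections import defaultdict


def normalized_price_by_delay(users):
    """Average fraction of a user's total spending, per delay in days.

    `users` maps user_id -> list of (day_index, price) sorted by day
    (a hypothetical layout used only for this sketch).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for purchases in users.values():
        total = sum(price for _, price in purchases)
        if total <= 0:
            continue
        for (prev_day, _), (day, price) in zip(purchases, purchases[1:]):
            delay = day - prev_day
            # Normalize by the user's total spending to compare across users.
            sums[delay] += price / total
            counts[delay] += 1
    return {delay: sums[delay] / counts[delay] for delay in sorted(sums)}


def shuffle_prices(users, rng):
    """Permute prices within each user while keeping purchase days fixed.

    Assumption: shuffling happens within a user, which breaks any link
    between delay and price but preserves each user's total spending.
    """
    shuffled = {}
    for user_id, purchases in users.items():
        prices = [price for _, price in purchases]
        rng.shuffle(prices)
        shuffled[user_id] = [(day, p) for (day, _), p in zip(purchases, prices)]
    return shuffled


# Toy data: larger purchases follow longer delays.
toy_users = {
    "a": [(0, 10.0), (1, 10.0), (11, 30.0)],
    "b": [(0, 5.0), (2, 5.0), (12, 40.0)],
}
curve = normalized_price_by_delay(toy_users)
shuffled_curve = normalized_price_by_delay(
    shuffle_prices(toy_users, random.Random(0))
)
```

Comparing `curve` with `shuffled_curve` is the point of the test: any trend that survives shuffling cannot come from a real dependence of price on delay.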
This is due to the heterogeneity of the underlying population: we are mixing users with different numbers of purchases. Users making more purchases have lower normalized prices and also shorter time delays, and are systematically overrepresented in the left side of the plot, even in the shuffled data.

To partially account for heterogeneity, we grouped users by the number of purchases (i.e., those who made exactly 5 purchases, those with 9-11 purchases, and those with 28-32 purchases). Even within each group there is variation, as the total spending differs significantly across users; we address this by normalizing the product price by the total amount of money spent by the user, as explained above. If our hypothesis is correct, there should be a refractory period after a purchase, with users waiting longer to make a larger [...]

[...] contagions in general, and "word-of-mouth" marketing in particular, although empirical evidence suggests that influence has a limited effect on shopping behavior [27]. Alternatively, users could have bought the same product as their friends because people tend to be similar to their friends and therefore have similar needs. The tendency of socially connected individuals to be similar is called homophily, and it is a strong organizing principle of social networks. Studies have found that people tend to interact with others who belong to a similar socio-economic class [16, 29] or share similar interests [25, 4]. Finally, a user's and their friend's behavior may be spuriously correlated because both depend on some other external factor, such as geographic proximity. In reality, all these effects are interconnected [9, 3] and are difficult to disentangle from observational data [35]. For example, homophily often results in selective exposure that may amplify social influence and word-of-mouth effects.

We investigate whether social correlations exist, although we do not resolve the source of the correlation.
Specifically, we study whether users who are connected to each other via email interactions tend to purchase similar products in contrast to users who are not connected. To measure similarity of purchases between two users, we first describe the purchases made by each user with a vector of products, each entry containing the frequency of purchase. This approach results in large and sparse vectors due to the large number of unique products in our data set. To address this challenge, we use vectors of product categories, instead of product names. There are three levels of product categories, and we perform our experiments at all levels. We compare similarity of category vectors of pairs of users who are directly connected in the email network (104K pairs of users) with the same number of pairs of randomly chosen users (who are not directly connected). We use cosine similarity to measure similarity of two vectors. Using top-level categories to describe