Market Sales Data Analysis: Transformation for Marketing Research & Analytics
- 1. COPYRIGHT©2016 eBizprise Inc. & eBizprise Technology (TJ) Ltd.
Transformation for Marketing Research & Analytics
useful transformation, Box-Cox transform, power transform
Will Kuan 官大鈞
- 2.
Today's hero is a part of feature engineering:
【feature transformation】
So, what is feature transformation?
- 3.
Feature Transformation
Make-up for features: the data is still what it is, we only change what it looks like.
What:
The essence of the data never changes. We just observe the data in another way or from another perspective.
Why:
A good feature stays good no matter what we do to it, and a bad one stays bad. What we need to focus on is uncovering hidden, valuable variables (features). Personally, I therefore prefer to call this work feature discovery.
- 4.
How large must a correlation be to count as significant in marketing?
In the physical sciences we might need r = 0.9, 0.95 or even 0.999 before we can call a correlation strong. In marketing, however, the data is surrounded by so much noise that r is normally very small, but that does not mean the relationship is unimportant or insignificant. Thus we usually use Cohen's rule of thumb from the social sciences.
Cohen's rule of thumb
r      Effect size (Cohen)
.10    Small
.30    Moderate
.50    Large
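Cohen's thresholds can be turned into a small lookup. The following is a minimal Python sketch, not code from the deck: the function name `cohen_label` and the "negligible" label for values below .10 are assumptions added for illustration, and the sample values of r are made up.

```python
def cohen_label(r: float) -> str:
    """Label the size of a correlation using Cohen's rule of thumb."""
    size = abs(r)  # the sign gives the direction, not the strength
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label; the table stops at .10


# Illustrative values only, not results from the deck's data set
for r in (-0.24, 0.43, 0.05):
    print(r, cohen_label(r))
```

The absolute value is used because a correlation of -0.30 is as strong as one of +0.30.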
- 5.
Most importantly, Cohen's rule assumes normally distributed (Gaussian) variables.
How should we read Cohen's categories? A large correlation is one that anyone would easily notice. A small correlation is one that we can only detect through careful testing.
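- 6.
Useful Transformation
Variable (attribute, feature): common transformation
Unit sales, company revenue, household income, price: $\log(x)$
Distance: $1/x$, $1/x^2$, $\log(x)$
Market share, choice preference: $\dfrac{e^x}{1 + e^x}$
Right-skewed distribution: $\sqrt{x}$, $\log(x)$ (use with caution: $\log(x)$ is undefined for $x \le 0$)
Left-skewed distribution: $x^2$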
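The transformations in this table are all one-liners. Here is a minimal Python sketch of them (the deck itself works in R); the function names are hypothetical, and the guard in `log_transform` reflects the caution about log(x) for x <= 0.

```python
import math


def log_transform(x):
    """log(x): sales, revenue, income, price, right-skewed variables."""
    if x <= 0:
        raise ValueError("log(x) is undefined for x <= 0")
    return math.log(x)


def reciprocal(x):
    """1/x: one candidate for distance."""
    return 1.0 / x


def reciprocal_square(x):
    """1/x^2: another candidate for distance."""
    return 1.0 / x ** 2


def logistic(x):
    """e^x / (1 + e^x): market share or choice preference."""
    return math.exp(x) / (1.0 + math.exp(x))


def square_root(x):
    """sqrt(x): right-skewed variables."""
    return math.sqrt(x)


def square(x):
    """x^2: left-skewed variables."""
    return x ** 2


# Hypothetical sample values, chosen only to show each transform in action
print(log_transform(100.0), reciprocal(4.0), logistic(0.0), square_root(9.0))
```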
- 7.
Demo
The demo uses a retail customer transaction data set, 【cust.df】, with 1000 observations and 12 variables.
This example is from 【R for Marketing Research and Analytics】 by C. Chapman & E. M. Feit.
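- 8.
Demo
1) Compute the correlation between the distance from a customer's home to the nearest store and the customer's in-store spending.
outcome > a negative correlation of moderate size
2) Take the reciprocal of the distance and compute the correlation again; the result is different.
outcome > a much stronger correlation than before
3) Take the reciprocal of the square root of the distance and compute the correlation again; the result is different once more.
outcome > an even stronger correlation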
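The three steps above can be sketched in Python. This is only an illustration under loud assumptions: the cust.df data is not available here, so `distance` and `spend` are a hypothetical, noise-free stand-in in which spending is built to fall with 1/sqrt(distance). The `pearson` helper is hand-written so that the block needs nothing beyond the standard library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical stand-in for cust.df (NOT the real data): distances of 1..50
# miles, with spending constructed to follow 1/sqrt(distance) exactly.
distance = [float(d) for d in range(1, 51)]
spend = [20.0 + 80.0 / math.sqrt(d) for d in distance]

r_raw = pearson(distance, spend)                                   # step 1
r_inv = pearson([1.0 / d for d in distance], spend)                # step 2
r_inv_sqrt = pearson([1.0 / math.sqrt(d) for d in distance], spend)  # step 3

print(r_raw, r_inv, r_inv_sqrt)
```

On this stand-in data the raw correlation is negative, and the correlation grows in size with each transformation, which mirrors the pattern of the three outcomes. The actual values on cust.df will differ.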
- 9.
Demo
But how do we explain this?
Perhaps we can say that there is an inverse square root relationship between the two. More concretely, customers living 1 mile from the nearest store spend more in the store than customers living 5 miles away. However, customers living 20 miles and 30 miles away spend more or less the same in store.
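A quick bit of arithmetic shows why a 1/sqrt(distance) scale behaves this way. The sketch below is Python, with a hypothetical helper name `inv_sqrt`; it only evaluates the transformation at the four distances mentioned above and says nothing about actual spending amounts.

```python
import math


def inv_sqrt(miles):
    """Value of the transformed feature 1/sqrt(distance)."""
    return 1.0 / math.sqrt(miles)


# Gap between customers at 1 mile and 5 miles on the transformed scale
near_gap = inv_sqrt(1) - inv_sqrt(5)
# Gap between customers at 20 miles and 30 miles on the transformed scale
far_gap = inv_sqrt(20) - inv_sqrt(30)

print(round(near_gap, 3), round(far_gap, 3))
```

The first gap is more than ten times the second, so a 4-mile difference close to the store matters far more than a 10-mile difference far from it.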
- 10.
Demo
Let's check with scatterplots.
[Figure: scatterplots of spending in store (y-axis) against distance to store and against 1/distance to store (x-axes)]
- 11.
Before the next page: a brief conclusion
It is important to bring variables close to a normal distribution before calculating correlations or drawing scatterplots. A good transformation helps us see the relationships among variables more clearly.
- 12.
Box-Cox Transformation
An automated formula:
$$
y_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\log(y_i), & \lambda = 0
\end{cases}
$$
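The Box-Cox formula translates directly into code. Here is a minimal Python sketch for a single positive value; the function name `box_cox` is hypothetical, and the check for y <= 0 is an added safeguard, since the transformation is only defined for positive values.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of one positive value y with parameter lam (lambda)."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)           # the lambda = 0 branch
    return (y ** lam - 1.0) / lam    # the lambda != 0 branch


print(box_cox(4.0, 0.5))   # (sqrt(4) - 1) / 0.5
print(box_cox(4.0, 0.0))   # log(4)
```

As lambda approaches 0, the power branch approaches log(y), which is why the two branches fit together as one family of transformations.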
- 13.
Demo
1) Use an automatic power transformation, instead of manual trial and error, to find an appropriate transformation for the feature distance.to.store.
2) Use coef( ) to get lambda, then use bcPower( ) to run the Box-Cox transformation.
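To show what "finding lambda automatically" can mean, here is a Python sketch that picks lambda by maximizing the Box-Cox profile log-likelihood over a grid. This is an assumption-laden illustration, not the deck's R code: the grid search, the function names `profile_loglik` and `estimate_lambda`, and the deterministic log-normal-like sample `ys` are all made up for the example and are not claimed to match how the R functions work internally.

```python
import math
from statistics import NormalDist


def box_cox(y, lam):
    """Box-Cox transform of one positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def profile_loglik(ys, lam):
    """Box-Cox profile log-likelihood of lam, up to an additive constant."""
    n = len(ys)
    zs = [box_cox(y, lam) for y in ys]
    mean = sum(zs) / n
    var = sum((z - mean) ** 2 for z in zs) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(y) for y in ys)


def estimate_lambda(ys, lo=-2.0, hi=2.0, steps=400):
    """Return the grid value of lambda with the highest profile log-likelihood."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda lam: profile_loglik(ys, lam))


# Hypothetical sample (NOT cust.df): exp() of evenly spaced normal quantiles,
# i.e. a right-skewed, log-normal-like variable for which log is a good fit.
n = 200
ys = [math.exp(NormalDist().inv_cdf((i + 0.5) / n)) for i in range(n)]

lam_hat = estimate_lambda(ys)
transformed = [box_cox(y, lam_hat) for y in ys]
print(lam_hat)
```

For this sample the estimate should land at or very near 0, which corresponds to the log transformation. The two-step shape (estimate lambda, then apply the transform) follows the coef( ) and bcPower( ) steps described above.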
- 15.
Demo
Recall
1) Calculate the correlation again after the transformation; we will generally get a stronger correlation.
outcome > We can say that distance to store and spending in store are strongly negatively correlated.
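The before-and-after comparison can be sketched in Python as follows. Everything here is an assumption for illustration: the data is a hypothetical noise-free stand-in for cust.df, and lambda is fixed at 0 (the log branch of Box-Cox) for both variables instead of being estimated.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def box_cox(y, lam):
    """Box-Cox transform of one positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


# Hypothetical stand-in data (NOT cust.df): spending falls as a power of distance
distance = [float(d) for d in range(1, 51)]
spend = [100.0 / math.sqrt(d) for d in distance]

lam = 0.0  # assumed value for illustration, not an estimate from real data
r_before = pearson(distance, spend)
r_after = pearson([box_cox(d, lam) for d in distance],
                  [box_cox(s, lam) for s in spend])

print(r_before, r_after)
```

On this stand-in data the correlation is negative both times and is noticeably stronger after the transformation.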
- 16.
Box-Cox Transformation
tip
Before running a correlation test or drawing a scatterplot, it is highly recommended to apply a Box-Cox transformation to all variables that are not normally distributed. This increases the chance of discovering strong correlations among variables, and the results should be easy to understand.
- 17.
Reference
【R for Marketing Research and Analytics】 [English]
Authors: Chapman, Christopher N.; McDonnell Feit, Elea
【R语言市场研究分析】 [Simplified Chinese]
Authors: Chris Chapman (克里斯·查普曼); Elea McDonnell Feit (埃里亚·麦克唐奈·费特)
Translator: 林荟