COPYRIGHT©2016 eBizprise Inc. & eBizprise Technology (TJ) Ltd.
Transformation for Marketing Research & Analytics
useful transformation, Box-Cox transform, power transform
Will Kuan 官大鈞
Today’s hero is one part of feature engineering:
【feature transformation】
So, what is feature transformation?
Feature Transformation
make-up for features
(it is still what it is; we only change what it looks like)
What :
The essence of the data never changes. We only change the way, or the
perspective, from which we observe the data.
Why :
A good feature stays good no matter what we do to it, and a bad one
stays bad. What we need to focus on is uncovering hidden, valuable
variables (features). For that reason I personally prefer to call this
work feature discovery.
How large must a correlation be to count as meaningful in marketing?
In the physical sciences we might need r = 0.9, 0.95 or even 0.999 before
we call a correlation strong. Marketing data, however, carry so much
noise that r is normally much smaller, yet a small r does not mean the
relationship is unimportant. For that reason we usually apply Cohen’s
rule of thumb from the social sciences.
Cohen’s rule of thumb
|r|     Effect size (Cohen)
0.10    Small
0.30    Moderate
0.50    Large
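As a minimal sketch, Cohen’s rule of thumb can be written as a small lookup. The function name `cohen_effect_size`, the reading of the three values as lower bounds, the "negligible" label for anything below 0.10, and the sample r values are all assumptions made for illustration; the slide only gives the three thresholds.

```python
def cohen_effect_size(r: float) -> str:
    """Label a correlation coefficient using Cohen's rule of thumb."""
    size = abs(r)  # the sign gives the direction, not the strength
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "small"
    # Assumption: the slide does not name this band.
    return "negligible"


# Hypothetical correlation values, chosen only to exercise each band.
for r in (-0.24, 0.43, 0.50, 0.05):
    print(f"r = {r:+.2f} -> {cohen_effect_size(r)}")
```

Note that the label depends only on the magnitude of r, so a correlation of -0.24 and one of +0.24 fall in the same band.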
Most importantly, Cohen’s rule assumes that the variables are normally
distributed (Gaussian distribution).
How should we read Cohen’s levels? A large correlation is one that
anyone would notice without effort. A small correlation is one that
we can only detect by measuring and testing carefully.
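Useful Transformation
Variable (attribute, feature)                            Common transformation
Unit sales, company revenue, household income, price     $\log(x)$
Distance                                                 $1/x$, $1/x^2$, $\log(x)$
Market share, choice preference                          $e^x / (1 + e^x)$
Right-skewed distribution                                $\sqrt{x}$, $\log(x)$ (use $\log$ with care when $x \le 0$)
Left-skewed distribution                                 $x^2$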
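The transformations listed above can be sketched as plain functions. This is an illustration only: the names `safe_log`, `logistic` and `transforms` are hypothetical, and the input value 4.0 is arbitrary. The explicit check in `safe_log` reflects the caveat about taking the log of values at or below zero.

```python
import math


def safe_log(x: float) -> float:
    """log(x) for sales, income or price; refuses x <= 0."""
    if x <= 0:
        raise ValueError(f"log is undefined for x <= 0, got {x}")
    return math.log(x)


def logistic(x: float) -> float:
    """e^x / (1 + e^x): maps any real value to a share between 0 and 1."""
    return math.exp(x) / (1.0 + math.exp(x))


# One entry per transformation in the table.
transforms = {
    "log(x)": safe_log,
    "1/x": lambda x: 1.0 / x,
    "1/x^2": lambda x: 1.0 / x ** 2,
    "sqrt(x)": math.sqrt,
    "x^2": lambda x: x ** 2,
    "e^x/(1+e^x)": logistic,
}

for name, f in transforms.items():
    print(f"{name:>12} at x = 4.0 -> {f(4.0):.4f}")
```

Keeping the transformations in one dictionary makes it easy to try each of them against the same variable and compare the resulting correlations.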
Demo
The demo uses a retail customer transaction data set,
【cust.df】, with 1000 observations and 12 variables.
The example is from 【R for Marketing Research and Analytics】
by C. Chapman & E. M. Feit.
Demo
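1) Compute the correlation between the distance from a customer’s home to the nearest store and the customer’s in-store spending.
outcome > a negative correlation of modest size
2) Take the reciprocal of the distance and compute the correlation again; the result is different.
outcome > a much stronger correlation than before
3) Take the reciprocal of the square root of the distance and compute the correlation again; the result changes once more.
outcome > an even stronger correlation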
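The three steps can be sketched in a few lines. The data below are toy values, not cust.df: spending is constructed to be exactly proportional to 1/sqrt(distance), so the ordering of the three correlations is true by design and only illustrates the mechanics. The helper `pearson` and the variable names are assumptions for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data (an assumption, not the book's data set).
distance = [1.0, 4.0, 9.0, 16.0, 25.0]
spend = [100.0 / math.sqrt(d) for d in distance]

# 1) raw distance: negative, because spending falls as distance grows
r_raw = pearson(distance, spend)
# 2) reciprocal of distance: the sign flips and the magnitude rises
r_inv = pearson([1.0 / d for d in distance], spend)
# 3) reciprocal of the square root: strongest of the three here
r_inv_sqrt = pearson([1.0 / math.sqrt(d) for d in distance], spend)

print(f"distance         : {r_raw:+.3f}")
print(f"1/distance       : {r_inv:+.3f}")
print(f"1/sqrt(distance) : {r_inv_sqrt:+.3f}")
```

On real data the gaps between the three values are smaller and noisier, which is why the comparison is worth running rather than assuming.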
Demo
But how do we explain this result?
Perhaps we can say that there is an “inverse square-root” relationship
between the two variables. Put more directly, customers living 1 mile
from the nearest store spend considerably more in the store than
customers living 5 miles away, whereas customers living 20 miles away
spend only slightly more than customers living 30 miles away.
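A quick arithmetic check makes the interpretation concrete. The sketch assumes spending is exactly proportional to 1/sqrt(distance), the transformation used in step 3 of the demo; the function name `relative_spend` is hypothetical.

```python
import math


def relative_spend(d_near: float, d_far: float) -> float:
    """Ratio of spending (near customer / far customer) if spend ~ 1/sqrt(distance)."""
    return math.sqrt(d_far / d_near)


print(f"1 mile vs 5 miles  : {relative_spend(1, 5):.2f}x")    # 2.24x
print(f"20 miles vs 30 miles: {relative_spend(20, 30):.2f}x")  # 1.22x
```

Under this assumption the gap between 1 and 5 miles is more than a factor of two, while the gap between 20 and 30 miles is roughly 22 percent.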
Demo
Let’s check with scatterplots.
[Figure: scatterplots of spending in store against distance to store and against 1/√(distance to store)]
Before the next page
brief conclusion
It is important to bring the data close to a normal distribution before
computing a correlation or drawing a scatterplot. A well-chosen
transformation helps us see the relationships among variables more clearly.
Box-Cox Transformation
the formula behind the automatic transformation
$$
y_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\log(y_i), & \lambda = 0
\end{cases}
$$
Demo
1) Instead of trying formulas by hand, use a power transformation (powerTransform( ) in the R car package)
to find an appropriate formula for the feature distance.to.store automatically.
2) Use coef( ) to get lambda, then use bcPower( ) to run the Box-Cox transformation.
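To show what "finding lambda automatically" means, here is a rough sketch that picks lambda by maximizing the Box-Cox profile log-likelihood over a grid. It is not the R car implementation: the grid search, the function names `profile_loglik` and `estimate_lambda`, and the toy data are all assumptions. The toy distances are built as exp(z) for symmetric z, so the log transform (lambda = 0) should come out as the best choice.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of one positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def profile_loglik(ys, lam):
    """Box-Cox profile log-likelihood of lam, up to an additive constant."""
    n = len(ys)
    t = [box_cox(y, lam) for y in ys]
    mean = sum(t) / n
    var = sum((v - mean) ** 2 for v in t) / n  # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(y) for y in ys)


def estimate_lambda(ys, lo=-2.0, hi=2.0, steps=400):
    """Return the grid point in [lo, hi] with the highest profile log-likelihood."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: profile_loglik(ys, lam))


# Toy right-skewed "distance" values (an assumption, not cust.df).
z = [-1.5, -1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0, 1.5]
distance = [math.exp(v) for v in z]

lam = estimate_lambda(distance)                    # plays the role of coef( )
transformed = [box_cox(d, lam) for d in distance]  # plays the role of bcPower( )
print(f"estimated lambda = {lam:.2f}")
```

A finer grid or a numerical optimizer would give a more precise lambda; the grid is used here only because it keeps the sketch short and easy to follow.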
Demo
let’s check before & after
Demo
Recall
1) Compute the correlation again after the transformation; in general we get a stronger correlation.
outcome > We can say that distance to store and spending in
store are strongly negatively correlated.
Box-Cox Transformation
tip
Before running a correlation test or drawing a scatterplot, it is
highly recommended to apply a Box-Cox transform to every variable that
is not normally distributed. This improves the chance of discovering
strong correlations among variables, and the resulting relationships
tend to be easier to understand.
Reference
【R for Marketing Research and Analytics】[English]
Authors: Christopher N. Chapman, Elea McDonnell Feit
【R语言市场研究分析】[Simplified Chinese]
Authors: (US) Chris Chapman, (US) Elea McDonnell Feit
Translator: 林荟
