Automated Retail Analytics: Omni-Channel and at Scale
William Komp
We are SmarterHQ
SmarterHQ is the leading multi-channel behavioral marketing platform, empowering B2C marketers to personalize individual customer interactions in real time. We work with some of the world's largest brands, such as Bloomingdale's, Santander Bank, CarRentals.com, and Finish Line, to drive phenomenal business results. We've been recognized by Forbes as technology that is pushing B2C companies into a new era of personalization, and Forrester's Total Economic Impact study found that we deliver a 667% ROI.
So let's build our models!
Easy enough: choose our favorite algorithm (in our case logistic regression, since we are aiming for eventual near-real-time scoring).
Model building and input data filtering use standard deviation, correlation, and Lasso-LARS.
We use Python libraries (scikit-learn and pySQL) to automate gathering the data and delivering it to the server for model building!
This was all developed and perfected before January 2015 (in a scant six months at SmarterHQ).
Recently, this was expanded to include affinity analysis for interaction-term building and product recommendations.
So what is the problem? What have I not told you?
StoreFront Data Sources
• OMS (retail)
• Products
• Digital sources
StoreFront Building Blocks
Built on AWS
• EC2, Kinesis, Simple Queue Service (SQS), Lambda, S3, Glacier, Redshift
Data Gathering
Digital sources:
• Tag a website, mobile app, etc. to collect product views, customer IDs, email addresses, products carted, products purchased, and loyalty IDs
• Data streams to Redshift in as little as 5 minutes.
• Incremental batches run on Redshift every ~5 minutes, so data latency is as little as 10 minutes.
OMS:
• Daily feeds agreed upon with the client: customer IDs, loyalty IDs, products, order totals, email addresses, refunds, cancellations, shipping info
• Processed once a day
Product:
• Product IDs, client-based marketing categories
StoreFront Infrastructure Design
Properties:
• Modular in design
• Highly parallel
• Concurrent writing
• Daemonized processes
• Python apps supporting the infrastructure
A typical day for every customer:
• Web load (240x/day, i.e., every 6 minutes): web streaming through SQS, Kinesis, Lambda, S3, Redshift
• OMS (1x/day): ETL from the client through Informatica, S3, Redshift
• Product feeds (1x/day): ETL from the client through Informatica, S3, Redshift
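The web path above is only named on the slide. As a hedged illustration of the Lambda step, here is a minimal Python sketch that assumes the Lambda is triggered by Kinesis and stages tag events as newline-delimited JSON for a later Redshift `COPY`. The handler body, the event fields, and the staging format are assumptions, not SmarterHQ's code; the S3 write is left as a comment so the sketch runs offline.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode the Kinesis records in a Lambda event into newline-delimited JSON.

    Hypothetical sketch: the real StoreFront Lambda logic is not described in the talk.
    """
    lines = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's payload base64-encoded under record["kinesis"]["data"].
        payload = base64.b64decode(record["kinesis"]["data"])
        # Re-serialize compactly, one JSON object per line, a layout Redshift COPY ... FORMAT AS JSON 'auto' can load.
        lines.append(json.dumps(json.loads(payload), separators=(",", ":")))
    body = "\n".join(lines)
    # A real handler would now write `body` to S3 (for example with boto3's put_object)
    # so that a Redshift COPY can pick it up; that side effect is omitted here.
    return {"record_count": len(lines), "body": body}


# Usage with a fake event holding one tagged product view (field names are made up).
tag_event = {"customer_id": "C9", "event": "product_view", "product_id": "X"}
fake_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(json.dumps(tag_event).encode()).decode()}}
    ]
}
result = lambda_handler(fake_event, None)
```

Keeping the handler free of I/O like this makes the decode step easy to test; the S3 upload and the Redshift load stay separate, which matches the modular, daemonized design described above.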
StoreFront Data Sources (revisited)
• Digital sources: every 5 min
• OMS (retail): 1/day
• Products: 1/day
Entities!
• Everyone has their own definition of what a customer is! How do we represent that customer in the data we have? If I ask for all of the purchase information for customer X, how can I get it reliably and quickly?
• Entities are data-driven constructs that represent a customer, location, marketing campaign, etc.
• Defined by exact matching (we really want to move to fuzzy matching!) on:
email addresses, loyalty IDs, order IDs, customer names, other customer IDs
• Require at least 2 pieces to match (except for web-only, email-based entities!)
Entity Mechanics
Build entities using graph theory:
The set of all possible data elements to be linked is the vertex set.
Use the data to build connections (edges) between vertices!
The set of all such connections is the edge set.
Use a graph traversal algorithm (breadth-first search or depth-first search) to build out the connected graphs; each connected component is an entity.
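A minimal sketch of the traversal step, assuming vertices are record IDs (for example OMS customer rows and digital profiles) and that the edges have already been produced by the matching rules. The function name `connected_entities` and the record IDs are hypothetical. Breadth-first search touches each vertex and edge once, so grouping is linear in the size of the graph.

```python
from collections import deque


def connected_entities(vertices, edges):
    """Return entities as the connected components of an undirected graph, found with BFS."""
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    entities = []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search from an unvisited vertex collects one whole component.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        entities.append(sorted(component))
    return entities


# Two OMS rows linked through one web profile form a single entity; web_8 stays atomic.
vertices = ["oms_1", "oms_2", "web_7", "web_8"]
edges = [("oms_1", "web_7"), ("web_7", "oms_2")]
entities = connected_entities(vertices, edges)
```

Swapping `popleft()` for `pop()` turns this into depth-first search; the components found are the same either way.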
OMS:
1. Personal identifier fields (name, email address, customer IDs, order IDs)
2. Parse the email field (use a regular expression to filter out emails that are improperly formatted under the RFC 5322 standard) and extract the email user ID
3. Algorithm: exact match on at least 2 fields (common names and email user names make single-point matches unreliable)
Could expand to single-point matches by using a frequency analysis to restrict them to less common names or email addresses
Digital:
Personal identifier fields (email address, order ID, loyalty IDs)
1. Exact match on at least two of order ID, email address, or loyalty ID to the corresponding OMS entity
2. Next, build digital email-based entities (1-point matches)
Entities with both OMS (retail) and digital vertices are cross-channel entities!
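The OMS steps can be sketched in Python as follows, with loudly stated assumptions: the regular expression is a deliberately simplified local@domain check, not the full RFC 5322 grammar; the field names (`name`, `email_user`, `customer_id`, `order_id`) are hypothetical; and the pairwise loop is quadratic, so at the scale described later this matching would be done with joins in the database rather than in Python. The output is an edge list that a BFS or DFS connected-components step can consume.

```python
import re
from itertools import combinations

# Simplified shape check only (local part @ dotted domain); NOT the full RFC 5322 grammar.
EMAIL_RE = re.compile(r"^([a-z0-9.!#$%&'*+/=?^_`{|}~-]+)@([a-z0-9-]+(?:\.[a-z0-9-]+)+)$")

# Fields used for exact matching; the email user ID stands in for the raw email address.
MATCH_FIELDS = ("name", "email_user", "customer_id", "order_id")


def email_user_id(email):
    """Return the part before '@' for a well-formed email, else None."""
    match = EMAIL_RE.match(email.strip().lower()) if email else None
    return match.group(1) if match else None


def normalize(record):
    """Lower-case identifiers, drop malformed emails, and add the email user ID."""
    rec = {k: (v.strip().lower() if isinstance(v, str) else v) for k, v in record.items() if v}
    if "email" in rec:
        user = email_user_id(rec["email"])
        if user is None:
            del rec["email"]
        else:
            rec["email_user"] = user
    return rec


def match_edges(records, min_matches=2):
    """Link two records when at least `min_matches` identifier fields match exactly."""
    norm = {rid: normalize(r) for rid, r in records.items()}
    edges = []
    for a, b in combinations(norm, 2):
        shared = sum(1 for f in MATCH_FIELDS if f in norm[a] and norm[a][f] == norm[b].get(f))
        if shared >= min_matches:
            edges.append((a, b))
    return edges


records = {
    "oms_1": {"name": "Sarah Hall", "email": "sarahhall@gmail.com", "order_id": "INV1215"},
    "oms_2": {"name": "sarah hall", "email": "SarahHall@gmail.com ", "customer_id": "C9"},
    "oms_3": {"name": "Sarah Hall", "email": "not-an-email"},
}
edges = match_edges(records)  # name + email user ID link oms_1 and oms_2; oms_3 matches on name only
```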
StoreFront Predictive Processes
• Asset Quality / Visit Quality / Engagement
• Product Recommendations
• Recency, Frequency, Monetary, Latency (RFML)
• Predictive Models
Asset Quality/Visit Quality
Measures the expected value of the products viewed online, based on their history.
Suppose an entity “Sarah” views 3 products: X, Y, and Z.
Asset Quality (AQ) = #purchases × price / #views
Sarah's AQ today:
Product | Price | # Views | # Purchases | Asset Quality
X | $5.00 | 220 | 23 | $0.52
Y | $10.00 | 342 | 45 | $1.32
Z | $15.00 | 122 | 5 | $0.61
Visit Quality (VQ) is the sum of the Asset Qualities for a visit, e.g. $0.52 + $1.32 + $0.61 = $2.45
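The AQ and VQ definitions are simple enough to check directly. This minimal Python sketch reproduces the table; the function name and the tuple layout `(price, views, purchases)` are illustrative assumptions.

```python
def asset_quality(price, views, purchases):
    """Expected value per view of a product: purchases * price / views."""
    return purchases * price / views


# (price, # views, # purchases) for the products Sarah viewed in this visit.
products = {"X": (5.00, 220, 23), "Y": (10.00, 342, 45), "Z": (15.00, 122, 5)}

aq = {p: asset_quality(*stats) for p, stats in products.items()}
vq = sum(aq.values())  # Visit Quality: sum of AQ over the products viewed in the visit

for p, value in aq.items():
    print(f"{p}: ${value:.2f}")  # X: $0.52, Y: $1.32, Z: $0.61
print(f"VQ: ${vq:.2f}")  # VQ: $2.45
```

Note that VQ is summed from unrounded AQ values; here the rounded and unrounded sums agree at $2.45.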
Engagement
A week-long Engagement with a 50% decay rate (each day, the previous Engagement is halved and that day's Visit Quality is added):
Day | Visit Quality | Engagement
1 | $10.98 | $10.98
2 | $0 | $5.49
3 | $0 | $2.75
4 | $0 | $1.37
5 | $3.46 | $4.15
6 | $0 | $2.07
7 | $2.45 | $3.49
[Chart: daily VQ and Engagement in dollars, days 1 to 7]
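The Engagement column is consistent with a simple recurrence: each day's Engagement is the previous day's Engagement times the decay factor (0.5 for a 50% decay rate), plus that day's Visit Quality. A minimal sketch, assuming daily granularity and a starting Engagement of zero; the function name is hypothetical.

```python
def engagement(visit_qualities, decay=0.5):
    """Exponentially decayed running sum of daily Visit Quality."""
    scores = []
    current = 0.0
    for vq in visit_qualities:
        # Yesterday's engagement decays, then today's visit quality is added.
        current = current * decay + vq
        scores.append(current)
    return scores


daily_vq = [10.98, 0, 0, 0, 3.46, 0, 2.45]  # days 1 to 7 from the table
scores = engagement(daily_vq)
print([f"${e:.2f}" for e in scores])
```

Day 5 shows the mechanics: $1.3725 × 0.5 + $3.46 ≈ $4.15, so a new visit lifts Engagement while older activity fades geometrically.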
Product Recommendations
Association rules with monthly customer sessions:
• N1: count the number of times products appear in pairs (over a month for a customer)
• N2: count the number of times each product (antecedent or consequent) appears over a month for a customer
• N3: count the number of customers in a month
Compute:
• Antecedent support = N2A / N3
• Consequent support = N2C / N3
• Rule confidence = N1 / N2A
• Lift = (N1 / N2A) / (N2C / N3)
All of this is computed in-database, daily, over the most recent month!
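The talk computes these counts inside the database. To make the three counts concrete, here is an in-memory Python sketch, assuming each customer's month of activity has already been collapsed into a set of product IDs (a hypothetical input shape). Pairs are stored as sorted tuples, so the rule A -> C reads its N1 from the same key as C -> A.

```python
from collections import Counter
from itertools import combinations


def rule_counts(monthly_baskets):
    """Return (N1, N2, N3) for one month of customer baskets.

    monthly_baskets maps a customer ID to the products seen for that customer in the month.
    """
    n1 = Counter()  # N1: customers whose month contains each unordered product pair
    n2 = Counter()  # N2: customers whose month contains each single product
    for products in monthly_baskets.values():
        items = sorted(set(products))
        n2.update(items)
        n1.update(combinations(items, 2))
    n3 = len(monthly_baskets)  # N3: number of customers in the month
    return n1, n2, n3


def pair_count(n1, antecedent, consequent):
    """Look up N1 for a rule regardless of the order of its two products."""
    return n1[tuple(sorted((antecedent, consequent)))]


baskets = {
    "c1": {"tee", "shoes"},
    "c2": {"tee"},
    "c3": {"shoes", "socks"},
    "c4": {"shoes", "tee"},
}
n1, n2, n3 = rule_counts(baskets)
```

With these toy counts, the rule tee -> shoes has confidence 2/3 and lift (2/3) / (3/4) ≈ 0.89, i.e. no uplift over the baseline rate.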
Recommendation Example
Antecedent: Men's Air Jordan City Collection NYC T-Shirt, N2A = 384
Consequent: Men's Air Jordan Retro 10 NYC Basketball Shoes, N2C = 9,770
Rule occurrence: N1 = 114
Transaction count: N3 = 780,005
Antecedent support (N2A / N3) = 384 / 780,005 = 0.00049
Consequent support (N2C / N3) = 9,770 / 780,005 = 0.0125
Rule confidence (N1 / N2A) = 114 / 384 = 0.297
Lift = (N1 / N2A) / (N2C / N3) = rule confidence / consequent support = 23.7
Customers who buy the Jordan City Collection NYC T-Shirt are 23.7x more likely than average to purchase the Air Jordans.
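The example's arithmetic can be reproduced from the four counts alone. A minimal Python sketch (the function name is hypothetical); it shows that the consequent support is about 0.0125 and the lift about 23.7.

```python
def rule_metrics(n1, n2a, n2c, n3):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    antecedent_support = n2a / n3
    consequent_support = n2c / n3
    confidence = n1 / n2a
    lift = confidence / consequent_support  # same as (n1 / n2a) / (n2c / n3)
    return {
        "antecedent_support": antecedent_support,
        "consequent_support": consequent_support,
        "confidence": confidence,
        "lift": lift,
    }


m = rule_metrics(n1=114, n2a=384, n2c=9_770, n3=780_005)
print({name: round(value, 5) for name, value in m.items()})
```

Lift compares the rule's confidence to the consequent's baseline rate, so a lift near 1 means the antecedent tells us nothing, while 23.7 is a very strong signal.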
RFML
Recency: the number of days since the shopper's last visit or purchase.
Frequency: the number of visits or purchases within the time period of interest.
Monetary: the shopper's total dollar spend within the time period of interest.
Latency: the average number of days between visits or purchases within the time period of interest.
Recency and Latency are computed once a day.
Frequency and Monetary are computed on demand.
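The four definitions translate directly into code. A minimal sketch for a single shopper, assuming a 365-day period of interest and events given as `(date, amount)` pairs (amount 0 for a visit without spend); the function name, period, and dates are assumptions, while the amounts are Sarah's transactions from the personalization example later in the talk.

```python
from datetime import date


def rfml(events, as_of, period_days=365):
    """Return (recency, frequency, monetary, latency) for one shopper."""
    if not events:
        return None, 0, 0.0, None
    all_dates = sorted(d for d, _ in events)
    recency = (as_of - all_dates[-1]).days  # days since the most recent visit or purchase

    # Frequency, Monetary and Latency only look at events inside the period of interest.
    window = sorted((d, amt) for d, amt in events if 0 <= (as_of - d).days < period_days)
    frequency = len(window)
    monetary = sum(amt for _, amt in window)
    gaps = [(later[0] - earlier[0]).days for earlier, later in zip(window, window[1:])]
    latency = sum(gaps) / len(gaps) if gaps else None  # mean days between consecutive events
    return recency, frequency, monetary, latency


sarah = [(date(2015, 3, 1), 103.98), (date(2015, 3, 11), 50.45), (date(2015, 3, 31), 123.87)]
recency, frequency, monetary, latency = rfml(sarah, as_of=date(2015, 4, 5))
```

Recency and Latency depend on the scoring date, which fits recomputing them daily; Frequency and Monetary depend on the chosen period, which fits computing them on demand.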
Predictive Models
GOAL: Predict days to next purchase and days to next visit for windows of <= 1, 3, 7, and 15 days and for the intervals 15-30, 31-60, and 61-90 days
216 input fields (Engagement, average order value, average session value, session count, asset count, and many more, plus interactions)
Build models on 6M records at the entity level
Model building process:
1. Pull 6M records from Redshift with the Python pyETL library
2. Variable reduction (variance, correlation, and Lasso-LARS)
3. Build models (in parallel!)
4. Model tests (ROC AUC, regression coefficients)
5. Upload the model and results to SQL
6. Models are ready to deploy
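A hedged sketch of the variable-reduction and model-test steps, using scikit-learn and NumPy on synthetic data. The thresholds (variance 1e-3, absolute correlation 0.95), the use of `LassoLarsCV` as a linear screen on a 0/1 target, and all function names are assumptions; the talk does not give its settings. One such model would be fit per target window (for example "purchase within 7 days"), and those independent fits are what can run in parallel.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LassoLarsCV, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def reduce_variables(X, y, min_variance=1e-3, max_abs_corr=0.95):
    """Return the column indices of X that survive variance, correlation and Lasso-LARS filters."""
    # 1. Variance filter: drop (near-)constant columns.
    keep = np.flatnonzero(VarianceThreshold(threshold=min_variance).fit(X).get_support())
    # 2. Correlation filter: keep a column only if it is not highly correlated with one already kept.
    corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
    chosen = []
    for j in range(len(keep)):
        if all(corr[j, k] < max_abs_corr for k in chosen):
            chosen.append(j)
    keep = keep[chosen]
    # 3. Lasso-LARS: keep columns whose coefficient is non-zero at the cross-validated alpha.
    lasso = LassoLarsCV(cv=5).fit(X[:, keep], y)
    return keep[np.flatnonzero(lasso.coef_)]


def build_model(X, y):
    """Reduce variables on a training split, fit logistic regression, report test ROC AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    cols = reduce_variables(X_tr, y_tr)
    model = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te[:, cols])[:, 1])
    return cols, model, auc


# Synthetic data: 3 informative columns, 5 noise columns, 1 constant, 1 near-duplicate of column 0.
rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=(n, 3))
noise = rng.normal(size=(n, 5))
constant = np.ones((n, 1))
near_duplicate = signal[:, [0]] + rng.normal(scale=0.01, size=(n, 1))
X = np.hstack([signal, noise, constant, near_duplicate])
p = 1 / (1 + np.exp(-(signal @ np.array([1.5, -1.0, 0.8]))))
y = (rng.random(n) < p).astype(int)

cols, model, auc = build_model(X, y)
print("kept columns:", cols.tolist(), "test AUC:", round(auc, 3))
```

Running the cheap filters first keeps the Lasso-LARS path small, which matters when starting from 216 fields plus interactions.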
Model scoring is handled directly in SQL, using a SQL process.
We can score hundreds of millions of records in minutes!
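Because a logistic regression score is just a sigmoid of a linear combination, a fitted model can be pushed down into the database as a SQL expression. A minimal sketch, assuming a feature table keyed by entity with one column per model input; the table name, column names, and coefficients are hypothetical, and the generated query assumes an `EXP` function (Redshift provides one). A Python twin of the formula is included for spot-checking scores.

```python
import math


def logistic_scoring_sql(intercept, coefficients, table, key="entity_id"):
    """Build a SELECT that applies a fitted logistic regression inside the database.

    Column names are inserted verbatim, so they must come from a trusted model
    specification, never from user input.
    """
    terms = [repr(float(intercept))]
    terms += [f"{repr(float(w))} * {col}" for col, w in coefficients.items()]
    linear = " + ".join(terms)
    return f"SELECT {key},\n       1.0 / (1.0 + EXP(-({linear}))) AS score\nFROM {table};"


def logistic_score(intercept, coefficients, row):
    """The same formula in Python, for spot-checking scores computed in the database."""
    z = intercept + sum(w * row[col] for col, w in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


coefs = {"engagement": 0.42, "avg_order_value": 0.003, "session_count": 0.15}
sql = logistic_scoring_sql(-2.1, coefs, "entity_features")
print(sql)
```

Scoring this way keeps the data where it lives: the warehouse evaluates one arithmetic expression per row, with no export step.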
Example Big A$$ Client
Athletic retailer, 2 years of data, $1.6B in sales/year
Typical daily adds: 50,000 transactions; a typical batch brings in about 20,000 records every 6 minutes!
Database size: 866 GB (compressed), which equates to 2.5 TB (uncompressed)
Total daily run time: 3 hours (rebuilds from scratch); batch run time: 5 minutes!
Vertex set: 253,449,334
Entity set: 203,531,275
There are 50 million non-atomic equivalence classes (entities built from more than one vertex)!
These amount to $850M, or ~53% of sales (these customers are the known repeat customers).
These are the customers we can target, since we have richer information about their repeated browsing.
This is StoreFront Personalization
Channels: Website, Mobile App, In-Store, Call Center, 3rd Party
SARAH
Annual Spend: $4,500
Transactional History
• Online: INV 1215 $103.98
• Store: INV 4672 $50.45
• Store: INV 8500 $123.87 [etc.]
Email Addresses
• Transactional: sarahhall@gmail.com
• Account: shall@home.com
• Promotional: sarahh@yahoo.com
Category Affinity: Kid's, Women's, Running
Brand Affinity: Nike
Sales Channel | Category, Brand, Product | Cross-Channel
Email, Website, Mobile, Display, Social
Here's what it delivers:
Brands personalizing interactions in real time and in email
