Multi touch attribution

Engaging with Caserta to
ADVANCE YOUR BUSINESS
September 26th, 2017
November 15th, 2017
Maxwell Goldbas, Director of Caserta Innovation Labs
Multi-Touch
Attribution Modeling
with Spark

• Who am I?
• Raised on the Upper West Side
• Data Engineer
• Director, Caserta Innovation Labs
• Topics today
• Multi-touch attribution
• Data science with Spark
Introduction
2

• Caserta recently did a cloud migration
• Large media client
• Client could not join us today
• Client was not familiar with Spark
• Hesitant to change to open source code
• We want to demonstrate its power
Background
3

• Client: Which consumer touch points
drive engagement in rewards program?
• Snail Mail
• Texts
• Member Events
• Email
• Site Activity
• Caserta: Get client excited about our
Infrastructure
• Identity Resolution
• Unified Data Source
Objectives
4

• Databricks
• User Access to Data Lake
• Several Spark Clusters
• Graph Dataframes
• AWS
• Data Lake in S3
• Redshift
• EC2 for Clusters
• Caserta
• Airflow
• Docker
• RabbitMQ
Infrastructure
5

• Get data in useable format
• Required knowledge:
• Number of touch points that happened in between
each conversion
• Impact each touch point had on final conversion
• Pull all engagements
• Pull distinct conversions by individual key, event type,
date
• A conversion is an engagement with the rewards program
• Multiple conversions by the same person on the same day
should not create noise
• 15 billion rows of event data
Preparation
6
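
The "pull distinct conversions" step can be sketched in plain Python as below. This is only an illustration, not the code used on the project: the function name and the sample rows are hypothetical, and on the real 15 billion rows the same idea would more likely be expressed in Spark, for example by selecting the three key columns and calling `dropDuplicates`.

```python
from datetime import datetime


def distinct_conversions(events):
    """Keep one conversion per (individual key, event type, calendar date)."""
    seen = set()
    kept = []
    for individual_key, event_type, timestamp in events:
        # Truncate the timestamp to a date so repeats on one day collapse.
        key = (individual_key, event_type, timestamp.date())
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


# Hypothetical sample rows: (individual key, event type, timestamp).
sample = [
    (1, "Conversion", datetime(2017, 9, 1, 9, 30)),
    (1, "Conversion", datetime(2017, 9, 1, 17, 5)),  # same person, same day
    (1, "Conversion", datetime(2017, 9, 2, 8, 0)),
    (2, "Conversion", datetime(2017, 9, 1, 12, 0)),
]
conversions = distinct_conversions(sample)
```

With these sample rows the two conversions by individual 1 on September 1st collapse into one, leaving three distinct conversions.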

Process
7
Events → Paths → Models
• Order events by individual
• Flag each conversion event
• Flag each new individual
• Start a new path after each conversion flag and at each new-individual flag
• Group touch points into paths
• Build models from paths

Conversion Paths – Event Data
8
Individual Key  Activity Type Key  Conversion  New User  Conversion Path
1               Email              0           0         0
1               Text               0           0         0
1               Conversion         1           0         0
1               Email              0           0         1
2               Text               0           1         2
2               Conversion         1           0         2
2               Text               0           0         3
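
One way to read the Conversion Path column is as a running counter that moves forward on the row after a conversion and on the first row of a new individual. The sketch below reproduces the columns of the table under that reading. The function name is hypothetical, and treating the very first row of the data set as "not a new user" is an assumption taken from the table, not a documented rule. In Spark, the same logic would typically be a lag and a cumulative sum over a window.

```python
def assign_paths(events):
    """events: (individual key, activity) tuples, already ordered by individual.

    Returns rows of (individual, activity, conversion, new_user, path).
    """
    rows = []
    path = 0
    prev_individual = None
    prev_converted = False
    for individual, activity in events:
        conversion = 1 if activity == "Conversion" else 0
        # Assumption: the first row of the whole data set is not flagged.
        changed = prev_individual is not None and individual != prev_individual
        new_user = 1 if changed else 0
        # A new path starts after a conversion or when the individual changes.
        if new_user or prev_converted:
            path += 1
        rows.append((individual, activity, conversion, new_user, path))
        prev_individual = individual
        prev_converted = bool(conversion)
    return rows


# The seven rows of the event table.
events = [
    (1, "Email"), (1, "Text"), (1, "Conversion"), (1, "Email"),
    (2, "Text"), (2, "Conversion"), (2, "Text"),
]
flagged = assign_paths(events)
paths = [row[4] for row in flagged]  # [0, 0, 0, 1, 2, 2, 3]
```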

Conversion Paths – Path Data
9
Individual Key  Conversion Path  Total Emails  Total Texts
1               0                2             1
1               1                1             0
2               2                0             1
2               3                0             1

Conversion Paths – Conversion Flag
10
Individual Key  Conversion Path  Total Emails  Total Texts  Converted?
1               0                2             1            1
1               1                1             0            0
2               2                0             1            1
2               3                0             1            0

Conversion Paths – Conversion Data
11
Features                    Label
Total Emails  Total Texts   Converted?
2             1             1
1             0             0
0             1             1
0             1             0
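
Going from flagged events to this feature and label layout amounts to counting touch points per path and recording whether the path contains a conversion. A minimal sketch follows. The function name and the sample rows are hypothetical, and the rows were chosen so that the result matches the counts shown in the tables above.

```python
def paths_to_training_rows(flagged_events, channels=("Email", "Text")):
    """flagged_events: (individual key, activity, path) tuples.

    Returns (features, labels): per-path touch point counts in channel
    order, and 1/0 for whether the path ended in a conversion.
    """
    paths = {}
    for _individual, activity, path in flagged_events:
        entry = paths.setdefault(
            path, {"counts": dict.fromkeys(channels, 0), "converted": 0}
        )
        if activity == "Conversion":
            entry["converted"] = 1
        elif activity in entry["counts"]:
            entry["counts"][activity] += 1
    ordered = [paths[p] for p in sorted(paths)]
    features = [[entry["counts"][c] for c in channels] for entry in ordered]
    labels = [entry["converted"] for entry in ordered]
    return features, labels


# Hypothetical flagged events: (individual key, activity, path).
flagged_events = [
    (1, "Email", 0), (1, "Email", 0), (1, "Text", 0), (1, "Conversion", 0),
    (1, "Email", 1),
    (2, "Text", 2), (2, "Conversion", 2),
    (2, "Text", 3),
]
features, labels = paths_to_training_rows(flagged_events)
# features -> [[2, 1], [1, 0], [0, 1], [0, 1]]
# labels   -> [1, 0, 1, 0]
```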

• Darling child of data science
• Flexible, easy to use, accurate
• Prediction for whether or not a certain
number of events will lead to a
conversion
• Each conversion should have the
number of touch points that led to it
• Results:
• Email and Web Traffic are king
First Model: Logistic Regression
12
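
To make the first model concrete, here is a small logistic regression fitted by gradient descent on touch point counts. It is a stand-in written in plain Python so that it runs anywhere; the talk does not show its code, and in Spark one would more likely use `pyspark.ml.classification.LogisticRegression`. The learning rate, epoch count, and four-row data set are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted probability that a path with counts x converts."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def log_loss(weights, bias, features, labels):
    """Mean negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in zip(features, labels):
        p = predict(weights, bias, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def fit_logistic(features, labels, learning_rate=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss."""
    n = len(labels)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Illustrative data: [total emails, total texts] per path, and Converted?.
features = [[2, 1], [1, 0], [0, 1], [0, 1]]
labels = [1, 0, 1, 0]

initial_loss = log_loss([0.0, 0.0], 0.0, features, labels)
weights, bias = fit_logistic(features, labels)
final_loss = log_loss(weights, bias, features, labels)
probabilities = [predict(weights, bias, x) for x in features]
```

The fitted weights give a rough per-channel signal: a larger positive weight means more touches on that channel are associated with a higher predicted chance of conversion. Four rows are far too few to conclude anything; the point is only the shape of the computation.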

• Does not take time between engagements
and conversions into account
• 1000 ads over a year is not 10 times greater
than 100 ads in a week
• Survival analysis to the rescue
• Offset the total number of ads by the
duration over which they were seen
• Highest Survival Rate – Web Traffic
• The steeper the curve, the more powerful
the ad
First Model is Wrong: Survival Analysis
13
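
A survival curve here tracks the share of paths that have not yet converted as time passes. The sketch below uses a standard life-table (discrete-time) estimate: in each period, the hazard is the number of conversions divided by the number of paths still at risk, and survival is the running product of (1 - hazard). The function name and sample data are hypothetical, and this is not necessarily the exact estimator used in the talk.

```python
def survival_curve(durations, converted):
    """Life-table estimate of S(t), the probability of no conversion by period t.

    durations: period in which each path ended.
    converted: 1 if the path ended in a conversion, 0 if observation simply
               stopped (a censored path).
    """
    curve = {}
    surviving = 1.0
    for t in sorted(set(durations)):
        # Paths still being observed at the start of period t.
        at_risk = sum(1 for d in durations if d >= t)
        # Conversions that happened in period t.
        events = sum(1 for d, c in zip(durations, converted) if d == t and c == 1)
        hazard = events / at_risk
        surviving *= 1.0 - hazard
        curve[t] = surviving
    return curve


# Hypothetical paths: durations in periods, and whether each one converted.
durations = [1, 2, 2, 3]
converted = [1, 1, 0, 1]
curve = survival_curve(durations, converted)
```

For this sample the curve falls from about 0.75 after period 1 to about 0.5 after period 2, and reaches 0 after period 3.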

Survival Analysis
14
Emails Duration
(days)
Survival
Probability
Emails (adj.)
13 6 .94 12.2
13 30 .91 11.8
21 53 .82 17.2
40 214 .61 24.4
52 345 .31 16.1
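
The adjusted column in this table is consistent with multiplying the raw email count by the survival probability for that duration (for example, 13 × 0.94 ≈ 12.2). That relationship is inferred from the numbers rather than stated on the slide. A short check, with a hypothetical function name:

```python
def adjust_touch_points(count, survival_probability):
    """Scale a raw touch point count by the survival probability."""
    return round(count * survival_probability, 1)


# Rows from the table: (emails, duration in days, survival probability).
rows = [
    (13, 6, 0.94),
    (13, 30, 0.91),
    (21, 53, 0.82),
    (40, 214, 0.61),
    (52, 345, 0.31),
]
adjusted = [adjust_touch_points(emails, prob) for emails, _days, prob in rows]
# adjusted -> [12.2, 11.8, 17.2, 24.4, 16.1]
```

Note how 52 emails spread over 345 days end up counting for less than 21 emails over 53 days, which is the point of the adjustment.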

• Reduce touch points in a long conversion path
• Web traffic activity was affected the most
• More messages means easier to forget
• Less impact
• Multiply the number of events by the survival
probability for the duration over which those
events occurred
• Results:
• Email and Events are king
Second Model: Discrete-Time Survival Model Based on
Conversion Paths
15

• Survival Analysis is currently univariate
• A multivariate model could demonstrate
covariance
• Did not have social media data
• Use deep learning
• Account for correlation across channels
• Add parameter for heavy web users,
balance between offline and online focus
Further Analysis
16

• Parallelism is good
• Use Redshift and Spark
• Watch your bottlenecks
• Actions like show and count can cost precious
time
• Bottlenecks can be mitigated by using fewer,
bigger instances
• Survival Analysis gave us a good amount of
data
• Duration of time before someone would
convert based on a channel
• Caching helped for frequently accessed data
Notes
17

Thank You
• Maxwell Goldbas, Director, Caserta Innovation Labs:
max@caserta.com

References
• http://gseacademic.harvard.edu/alda/Handouts/ALDA%20Chapter%2011.pdf
• https://www.jmp.com/support/help/13-2/Survival_Analysis.shtml
