A deep dive into how digital analytics stacks need to evolve with the businesses they serve, and how self-describing data and event data modeling are the key elements that enable Snowplow data pipelines to evolve elegantly over time.
How to evolve your analytics stack with your business using Snowplow (Giuseppe Gaviani)
Presented at Snowplow London Meetup, 8 February 2017
Christophe Bogaert, Data Scientist at Snowplow, talked about how businesses are constantly evolving, why that means their analytics stacks need to evolve with them, and how Snowplow supports that evolution. With Snowplow, you can flexibly define the events and entities that represent your business. Finally, he talked about event data modeling and how to handle the evolution of your data pipeline.
A talk about the journey we have been on at Snowplow in thinking about event data, starting with our focus on web and then mobile analytics, and exploring our current and future technical and analytical approaches.
How Gousto is moving to just-in-time personalization with Snowplow (Giuseppe Gaviani)
Presented at Snowplow London Meetup, 8 February 2017
Dejan Petelin, Head of Data Science at Gousto, gave a presentation about their data journey, explaining how data reflects the customer’s voice and the importance of joining up all data sources. The goal is to delight and retain customers – critical for a subscription business like Gousto’s. Gousto is using Snowplow as a unified log to scale up its data capabilities, listen to its customers and provide them with a more personalized experience. Finally, Gousto is moving to the real-time pipeline to enable just-in-time personalization.
Our cofounder Alex Dean gave an introduction to Snowplow and then talked about our roadmap for 2017. Alex touched on several topics including support for more clouds, support for more storage targets, tailoring Snowplow to your industry, more intelligent event sources, moving our batch pipeline to Spark, mega-scale Snowplow and real-time support for Sauna, our decisioning and response system. Presented on 5 April 2017.
Why use big data tools to do web analytics? And how to do it using Snowplow a... (yalisassoon)
There are a number of mature web analytics products that have been on the market for ~20 years, while big data tools have only really taken off in the last 5 years. So why use big data tools to mine web analytics data?
In this presentation, I explore the limitations of traditional approaches to web analytics, and explain how big data tools can be used to address those limitations and drive more value from the underlying data. I explain how a combination of Snowplow and Qubole can be used to do this in practice.
Technical introduction to Snowplow, given at Big Data Beers on 25th September 2014. Explored how we use a variety of "big data" technologies including Hadoop, Kinesis and Redshift.
Snowplow Analytics: from NoSQL to SQL and back again (Alexander Dean)
A talk I gave to London NoSQL about Snowplow's journey from using NoSQL (via Amazon S3 and Hive), to columnar storage (via Amazon Redshift and PostgreSQL), and most recently to a mixed model of NoSQL and SQL, including S3, Redshift and Elasticsearch.
Use cases and examples using Apache Spark, presented at the Hadoop User Group (UK) November 2014 Hadoop Meetup
http://www.meetup.com/hadoop-users-group-uk/events/217791892/
Real-time user profiling based on Spark Streaming and HBase by Arkadiusz Jach... (Big Data Spain)
Agora owns dozens of themed, classified, entertainment and social services: news and sports portals, forums, advertising services, blogs and many other thematic websites. All sites together generate over 400 page views per second (under normal conditions) and considerably more events (likes, focus, click and scrolling events). This raises one question: how do you build user profiles in real time in such a dynamic and changing environment?
Session presented at Big Data Spain 2015 Conference
15th Oct 2015
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by: http://www.paradigmatecnologico.com
Abstract: http://www.bigdataspain.org/program/thu/slot-16.html
On large-scale websites, users leave thousands of traces every second. Businesses need to process and interpret these traces in real time to be able to react to the behavior of their users.
In this talk, Andreas will show a real world example of the power of a modern open-source stack.
He will walk you through the design of a real-time clickstream analysis PaaS solution based on Apache Spark, Kafka, Parquet and HDFS, explain our decision making and present our lessons learned.
Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni... (Amazon Web Services)
In this session we will demonstrate how non-experts in machine learning can easily analyze their data with QuickSight and build scalable, production-ready predictive models with Amazon Machine Learning. After the session you will have a good understanding of how to frame business problems in terms of data and predictive models, and you will be able to apply analytics and machine learning concepts as a competitive advantage.
AWS re:Invent 2016: Earth on AWS—Next-Generation Open Data Platforms (STG203) (Amazon Web Services)
Making earth observation data available by using Amazon S3 is accelerating scientific discovery and enabling the creation of new products. Attend and learn how the scale and performance of Amazon S3 lets earth scientists, researchers, startups, and GIS professionals gather and analyze planetary-scale data without worrying about limitations of bandwidth, storage, memory, or processing power. Learn how AWS is being used to combine satellite imagery, social data, and telemetry data to produce new products and services. Learn also how Amazon S3 provides much more than storage, and how an open geospatial data lake on Amazon S3 can be used as the basis for planetary-scale applications built with Amazon EMR, Amazon API Gateway, and AWS Lambda. As part of this talk, AWS customer Digital Globe demonstrates how they use open data stored in S3 to distribute high-resolution satellite imagery to their customers around the world.
Introduction to Amazon Kinesis Firehose - AWS August Webinar Series (Amazon Web Services)
Streaming data applications can deliver compelling, near real-time user experiences, but building the back-end infrastructure to collect and process streaming data is difficult. Amazon Kinesis Firehose makes it easy for you to load streaming data into AWS without having to build custom stream processing applications. In this webinar, we will introduce Amazon Kinesis Firehose and discuss how to ingest streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. We will also highlight key use cases based on real-world examples from IoT, AdTech, E-Commerce, and Gaming.
Join us to:
- Get an introduction to streaming data and an overview of Amazon Kinesis Firehose
- Learn about common streaming data use cases from IoT, Ad Tech, E-Commerce, and Gaming
- Understand how to use Amazon Kinesis Firehose to load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
Who should attend: developers, data analysts, data engineers, architects
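For a concrete flavour of the ingestion step, here is a minimal Python sketch using boto3; the delivery stream name, region and event shape are made up for illustration, and the stream itself would already need to be configured with an S3, Redshift or Elasticsearch destination.

import json

import boto3

# Sketch only: assumes a Firehose delivery stream named "example-stream"
# already exists and points at S3 / Redshift / Elasticsearch.
firehose = boto3.client("firehose", region_name="eu-west-1")

event = {"event_type": "page_view", "user_id": "u1"}

# Firehose buffers records and delivers them to the destination in batches.
firehose.put_record(
    DeliveryStreamName="example-stream",
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)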
Analysing data analytics use cases to understand big data platform (dataeaze systems)
Get the big picture of data platform architecture by understanding its purpose and the problems it solves.
These slides take a top-down approach, starting with the basic purpose of a data platform, i.e. to serve analytics use cases. They categorise those use cases and analyse what each expects from the data platform.
Google BigQuery for Everyday Developer (Márton Kodok)
IV. IT&C Innovation Conference - October 2016 - Sovata, Romania
A. Every scientist who needs big data analytics to save millions of lives should have that power
Legacy systems don’t provide the power.
B. The simple fact is that you are brilliant but your brilliant ideas require complex analytics.
Traditional solutions are not applicable.
The Plan: have oversight over developments as they happen.
Goal: Store everything accessible by SQL immediately.
What is BigQuery?
Analytics-as-a-Service - Data Warehouse in the Cloud
Fully-Managed by Google (US or EU zone)
Scales into Petabytes
Ridiculously fast
Decent pricing (queries $5/TB, storage: $20/TB) *October 2016 pricing
100,000 rows/sec Streaming API
Open Interfaces (Web UI, BQ command line tool, REST, ODBC)
Familiar DB Structure (table, views, record, nested, JSON)
Convenience of SQL + Javascript UDF (User Defined Functions)
Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
Client libraries available in YFL (your favorite languages)
Our benefits
no provisioning/deploy
no running out of resources
no more focus on large scale execution plan
no need to re-implement tricky concepts
(time windows / join streams)
pay only for the columns used in your queries
run raw ad-hoc queries (either by analysts/sales or Devs)
no more throwing away, expiring, or aggregating old data.
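To make the "run raw ad-hoc queries" point concrete, here is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset and table names are placeholders invented for the example.

from google.cloud import bigquery  # pip install google-cloud-bigquery

# Sketch only: assumes credentials are configured and the table exists.
client = bigquery.Client()

query = """
    SELECT location, AVG(temperature) AS avg_temp
    FROM `my-project.sensors.temperature_measure`
    GROUP BY location
"""

# result() waits for the query to finish, then yields rows; you pay only
# for the columns the query actually scans.
for row in client.query(query).result():
    print(row.location, row.avg_temp)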
Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine ... (Amazon Web Services)
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Speaker:
Paul Armstrong, Solutions Architect, Amazon Web Services
Design for Scale - Building Real Time, High Performing Marketing Technology p... (Amazon Web Services)
DynamoDB presented by David Pearson from AWS
Bizo Business Audience Marketing success story on AWS by Alex Boisvert, Director of Engineering, Bizo
In today's world, consumer habits change fast and marketing decisions need to be made within seconds, not days. Delivering engaging advertising experiences requires real time, high performing architectures that provide digital advertisers the ability to measure and improve the performance of their campaigns and tie them more closely to corporate goals. The insights gleaned from the massive amounts of data collected can then be used to dynamically adjust media spend and creative execution for optimal performance. The AWS Cloud enables you to deliver marketing content and advertisements with the levels of availability, performance, and personalization that your customers expect. Plus, AWS lowers your costs. Join us to learn about how big data and low latency / high performing architectures are changing the game for digital advertising.
Lambda-B-Gone: In-memory Case Study for Faster, Smarter and Simpler Answers (VoltDB)
Dennis Duckworth presented at the In-Memory Computing Summit 2016, walking through a case study of how MaxCDN replaced the complex Lambda Architecture with VoltDB for a faster, simpler and smarter platform.
Snowplow at the heart of Busuu's data & analytics infrastructure (Giuseppe Gaviani)
Presented at Snowplow London Meetup, 8 February 2017
Bruce Pannaman, data scientist at Busuu, talked about why they are using Snowplow to validate and enrich data, enable one source of truth across different data sources, cope with peaks and troughs in the data stream, and easily integrate with third party systems such as Intercom, a customer messaging platform. One of Busuu’s future projects is to load multiple A/B tests into the apps and monitor their results in real time.
An outline of the differing role of KPIs at startups vs mature businesses, drawing out the implications for the approach and methodology to their development.
Span Conference: Why your company needs a unified log (Alexander Dean)
Apache Kafka and Amazon Kinesis are more than just message queues — they can serve as a unified log which you can put at the heart of your business, effectively creating a "digital nervous system" which your company's applications and processes can be re-structured around.
In this talk, Alex will provide an introduction to unified log technology, highlight some killer use cases and also show how Kinesis is being used "in anger" at Snowplow. Alex's talk will draw on his experiences working with event streams over the last two and a half years at Snowplow; it’s also heavily influenced by Jay Kreps’ unified log monograph, and by Alex's recent work penning Unified Log Processing, a Manning book. Alex's talk will show how event streams inside a unified log are an incredibly powerful primitive for building rich event-centric applications, unbundling local transactional silos and creating a single version of truth for a company.
Alex's talk will conclude with a live demo of Amazon Kinesis in action processing Snowplow events.
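As a flavour of what writing to a unified log looks like, here is a hedged boto3 sketch that puts a single event onto a Kinesis stream; the stream name, event shape and partition key are illustrative, not Snowplow's actual implementation.

import json

import boto3

# Sketch only: assumes a Kinesis stream named "unified-log" exists.
kinesis = boto3.client("kinesis", region_name="eu-west-1")

event = {"event_type": "declare_war", "player_id": "p7"}

# The partition key keeps one player's events ordered within a shard.
kinesis.put_record(
    StreamName="unified-log",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["player_id"],
)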
Implementing improved and consistent arbitrary event tracking company-wide us... (yalisassoon)
Talk on the role Snowplow plays as part of the larger project to make data accessible to product marketing and other data-driven teams at StumbleUpon. Touches on technical and organizational challenges
2016 09 MeasureCamp - event data modeling (yalisassoon)
Presentation by Christophe Bogaert to Measurecamp London September 2016. Christophe discussed what makes consuming and analysing event-streams difficult, and outlined a number of techniques for overcoming those obstacles.
Yali presentation for Snowplow Amsterdam meetup number 2 (yalisassoon)
Digital analytics is a very exciting place to work because digital event data is becoming more interesting as more of our lives are intermediated by digital platforms.
In this presentation I explain how at Snowplow we are working to make it easier to build insight from, and act on, digital event data.
Snowplow is at the core of everything we do (yalisassoon)
Presentation authored by Simon Rumble covering the journey that Bauer Media Australia have gone through implementing Snowplow, and the central role Snowplow now plays in their data strategy / products.
On the importance of evolving your data pipeline with your business, and how Snowplow enables that through self-describing data and the ability to recompute your data models on the entire event data set.
Programmatic Advertising: How To Join In On the Fun (Hanapin Marketing)
Carrie Albright, Associate Director of Services at Hanapin Marketing, discusses how to get into Programmatic Advertising, vendors you can hire, the definitions of different programmatic strategies, and steps to building your strategic plan.
Originally presented at Hero Conf London in October 2016.
[WSO2Con Asia 2018] Patterns for Building Streaming Apps (WSO2)
This slide deck explains how to enable digital transformation through streaming analytics and how easily streaming applications can be implemented.
Learn more: https://wso2.com/library/conference/2018/08/wso2con-asia-2018-patterns-for-building-streaming-apps/
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
In this tutorial we will
* Plug in the WSO2 Analytics Platform to some common business use cases
* Showcase the numerous capabilities of the platform
* Demonstrate how to collect data, analyze, predict and communicate effectively
* Demonstrate how it can analyze integration, security and IoT scenarios
Stick around till the end and you will walk away with the necessary skills to create a winning data strategy for your organization to stay ahead of its competition.
To view a recording of this webinar, please use the URL below:
http://wso2.com/library/webinars/2016/06/analytics-in-your-enterprise/
Big data spans many fields and brings together technologies like distributed systems, machine learning, statistics and Internet of Things (IoT). It has now become a multi-billion dollar industry with use cases ranging from targeted advertising and fraud detection to product recommendations and market surveys.
Some use cases, such as urban planning, can be slower (done in batch mode), while others, such as the stock market, need results in milliseconds (done in a streaming fashion). Different technologies are used for each case: MapReduce for batch analytics, complex event processing for real-time analytics and machine learning for predictive analytics. Furthermore, the type of analysis ranges from basic statistics to complicated prediction models.
This webinar will discuss the big data landscape including
Concepts, use cases and technologies
Capabilities and applications of the WSO2 analytics platform
WSO2 Data Analytics Server
WSO2 Complex Event Processor
WSO2 Machine Learner
New feature overview of Cubes 1.0 – lightweight Python OLAP and pluggable data warehouse. Video: https://www.youtube.com/watch?v=-FDTK80zsXc Github sources: https://github.com/databrewery/cubes
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra... (Noriaki Tatsumi)
In this talk, you’ll learn about techniques used to build a scalable GraphQL based data gateway with the capability to dynamically on-board various new data sources. They include runtime schema evolution and resolver wiring, abstract resolvers, auto GraphQL schema generation from other schema types, and construction of appropriate cache key-values.
Snowplow: open source game analytics powered by AWS (Giuseppe Gaviani)
This is a presentation by Alex Dean and Yali Sassoon at Snowplow about open source game analytics powered by AWS. It was presented at the Games Developer Conference (GDC) in San Francisco, February 2017
Real-time big data analytics based on product recommendations case study (deep.bi)
We started as an ad network. The challenge was to recommend the best product (out of millions) to the right person at a given moment (thousands of users within a second). We have delivered 5 billion ad views over the last 24 months. To put that in context: serving 1 ad per second, it would take 160 years to deliver 5 billion ads.
So we needed a solution. SQL databases did not work. Popular NoSQL databases did not work. Standard data warehouse approaches (pre-aggregations, creating schemas) did not work either.
Rethinking all the problems posed by the huge data streams flowing in every second, we built a complete solution based on open-source technologies and fresh, smart ideas from our engineering team. It is called deep.bi, and we now make it available to other companies.
deep.bi lets high-growth companies solve fast data problems by providing scalable, flexible and real-time data collection, enrichment and analytics.
It was built using:
- Node.js - API
- Kafka - collecting and distributing data
- Spark Streaming - ETL, data enrichments
- Druid - real-time analytics
- Cassandra - user events store
- Hadoop + Parquet + Spark - raw data store + ad-hoc queries
(MBL305) You Have Data from the Devices, Now What?: Getting the Value of the IoT (Amazon Web Services)
We are collecting tons of sensor data from billions of devices. How do you get value from your IoT data sources? In this session, we will explore different strategies for collecting and ingesting data, understanding its frequency, and leveraging the potential of the cloud to analyze and predict trends and behavior to get the most out of your deployed devices.
WSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform (WSO2)
In today’s connected world, organizations have access to an enormous amount of data but use only a very small subset of it. This data can give you hindsight, oversight, insight and foresight about your enterprise and the world it communicates with. It can be leveraged to gain a considerable competitive advantage in the market.
The WSO2 Data Analytics platform lets you collect data, explore it through batch, real-time, interactive and predictive processing technologies and communicate your results. In this talk, we will discuss the WSO2 Data Analytics platform and how it brings together all analytics technologies into a single platform and user experience.
Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce (Deep.BI)
Deep.bi helps ecommerce teams improve their performance by providing current and detailed insights.
It brings operational excellence and performance for:
- Category Managers / Merchandisers
- Marketers
- Customer service
- UX / Design Team
- Tech / IT
- Executives / Managers
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform (WSO2)
With Hadoop, we can easily process data from disk, but this consumes a lot of time. The value of certain insights, such as traffic alerts or heart attack alerts, degrades with time, and handling this time-sensitive data requires real-time technologies that can produce output within milliseconds. Moreover, some use cases need advanced analytics like machine learning.
In this talk, we will discuss the WSO2 Data Analytics platform, which brings together all of these technologies into one platform. It lets you collect data through a single sensor API, process it using batch, real-time or predictive technologies, and communicate your results, all within a single platform and user experience.
Presenter:
Srinath Perera
Vice President – Research,
WSO2
During this session we will cover the best practices for implementing a product catalog with MongoDB. We will cover how to model an item properly when it can have thousands of variations and thousands of properties of interest. You'll learn how to index properly and allow for faceted search with milliseconds response latency and how to implement per-store, per-sku pricing while still keeping a sane number of documents. We will also cover operational considerations, like how to bring the data closer to users to cut down the network latency.
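One widely used way to model items with thousands of variant properties is the attribute pattern; the Python sketch below (pymongo, with invented collection and field names) shows how a single compound index can then serve faceted queries with low latency. It illustrates the general technique, not necessarily the exact modeling shown in the session.

from pymongo import ASCENDING, MongoClient  # pip install pymongo

# Sketch only: database, collection and field names are placeholders.
client = MongoClient()
items = client["catalog"]["items"]

items.insert_one({
    "sku": "sku-123",
    "name": "T-shirt",
    # Attribute pattern: variant properties live in one key/value array...
    "attributes": [
        {"k": "color", "v": "red"},
        {"k": "size", "v": "M"},
    ],
})

# ...so one compound index covers faceted search on any attribute.
items.create_index([("attributes.k", ASCENDING), ("attributes.v", ASCENDING)])

# Faceted query: all red items.
for doc in items.find({"attributes": {"$elemMatch": {"k": "color", "v": "red"}}}):
    print(doc["sku"])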
Similar to Snowplow: evolve your analytics stack with your business
Presentation given by Christophe Bogaert at the inaugural Snowplow Meetup New York in March 2016. Christophe described the event data modeling process at a high level before diving into specific tools and techniques for developing performant models.
2. Our businesses are constantly evolving…
• Our digital products (apps and platforms) are constantly developing
• The questions we ask of our data are constantly changing
• It is critical that our analytics stack can evolve with our business
3. Self-describing data + event data modeling = an analytics stack that evolves with your business
How Snowplow users evolve their analytics stacks with their business
6. As a Snowplow user, you can define your own events and entities
Events:
• Build castle, form alliance, declare war (a game)
• View product, buy product, deliver product (a retailer)
Entities (contexts):
• Player, game, level, currency (a game)
• Product, customer, basket, delivery van (a retailer)
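To illustrate with the retail example (hypothetical schema URIs, in the self-describing JSON format shown on the next slide), a "view product" event might travel with product and customer entities attached as contexts:

# Hypothetical schema URIs; the shapes follow Snowplow's
# self-describing JSON format.
view_product_event = {
    "schema": "iglu:com.acme/view_product/jsonschema/1-0-0",
    "data": {"product_id": "sku-123"},
}

# Entities (contexts) ride along with the event as an array.
contexts = [
    {
        "schema": "iglu:com.acme/product/jsonschema/1-0-0",
        "data": {"product_id": "sku-123", "category": "books", "price": 9.99},
    },
    {
        "schema": "iglu:com.acme/customer/jsonschema/1-0-0",
        "data": {"customer_id": "c-42", "segment": "new"},
    },
]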
8. Then send data into Snowplow as self-describing JSONs
[Pipeline diagram: 1. Validation → 2. Dimension widening → 3. Data modeling]
Event (with a schema reference pointing at its schema):
{
  "schema": "iglu:com.israel365/temperature_measure/jsonschema/1-0-0",
  "data": {
    "timestamp": "2016-11-16 19:53:21",
    "location": "Berlin",
    "temperature": 3,
    "units": "Centigrade"
  }
}
Schema:
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for a temperature measure event",
  "self": {
    "vendor": "com.israel365",
    "name": "temperature_measure",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "timestamp": { "type": "string" },
    "location": { "type": "string" },
    …
  },
  …
}
9. The schemas can then be used in a number of ways
• Validate the data (important for data quality)
• Load the data into tidy tables in your data warehouse
• Make it easy / safe to write downstream data processing applications (e.g. for real-time users)
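As an illustration of the validation step, here is a minimal sketch using the Python jsonschema package against a trimmed-down version of the temperature_measure schema above; the "required" list is an assumption added for the example.

from jsonschema import ValidationError, validate  # pip install jsonschema

# Trimmed version of the schema shown earlier; "required" is an assumption.
schema = {
    "type": "object",
    "properties": {
        "timestamp": {"type": "string"},
        "location": {"type": "string"},
        "temperature": {"type": "number"},
    },
    "required": ["timestamp", "location"],
}

good = {"timestamp": "2016-11-16 19:53:21", "location": "Berlin", "temperature": 3}
bad = {"timestamp": "2016-11-16 19:53:21", "temperature": "three"}

for event in (good, bad):
    try:
        validate(instance=event, schema=schema)
        print("valid:", event)
    except ValidationError as err:
        # In a pipeline, invalid events would be routed to a bad-rows sink
        # rather than silently loaded into the warehouse.
        print("invalid:", err.message)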
11. What is event data modeling?
[Pipeline diagram: 1. Validation → 2. Dimension widening → 3. Data modeling]
Event data modeling is the process of using business logic to aggregate over event-level data to produce 'modeled' data that is simpler for querying.
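A minimal sketch of that idea in pandas, with invented sample data and a 30-minute session rule (illustrative business logic, not Snowplow's actual model): aggregate event-level rows into a session-level table that is simpler to query.

import pandas as pd

# Tiny illustrative event-level sample: one row per event.
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2"],
    "event": ["page_view", "add_to_basket", "page_view", "page_view"],
    "ts": pd.to_datetime([
        "2017-02-08 10:00:00", "2017-02-08 10:05:00",
        "2017-02-08 12:00:00", "2017-02-08 10:01:00",
    ]),
}).sort_values(["user_id", "ts"])

# Business logic: a new session starts after 30 minutes of inactivity.
gap = events.groupby("user_id")["ts"].diff()
events["session_id"] = (gap.isna() | (gap > pd.Timedelta(minutes=30))).cumsum()

# The modeled table: one row per session instead of one row per event.
sessions = events.groupby(["user_id", "session_id"]).agg(
    started=("ts", "min"),
    ended=("ts", "max"),
    n_events=("event", "size"),
).reset_index()
print(sessions)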
13. In general, event data modeling is performed on the complete event stream
• Late-arriving events can change the way you understand earlier-arriving events
• If we change our data models, this gives us the flexibility to recompute historical data based on the new model
15. How do we handle pipeline evolution?
Push factors: what is being tracked will change over time.
Pull factors: what questions are being asked of the data will change over time.
Businesses are not static, so event pipelines should not be either.
[Architecture diagram: event sources (web, apps, servers, comms channels, push, smart car / home, …) feed the collection and processing stages, which load a data warehouse (for data exploration, predictive modeling and real-time dashboards) and power real-time, data-driven applications (RT bidder, voucher, personalization, …).]
16. Push example: a new source of event data
• If data is self-describing, it is easy to add additional sources
• Self-describing data is good for managing bad data and pipeline evolution
"I'm an email send event and I have information about the recipient (email address, customer ID) and the email (id, tags, variation)"
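Sketched as a self-describing JSON (shown here as a Python dict), with a hypothetical vendor and schema URI, that email send event might look like this:

# Hypothetical schema URI; the fields follow the slide's description of
# an email send event (recipient details plus email details).
email_send_event = {
    "schema": "iglu:com.acme/email_send/jsonschema/1-0-0",
    "data": {
        "recipient_email": "jane@example.com",
        "customer_id": "c-42",
        "email_id": "welcome-01",
        "tags": ["onboarding"],
        "variation": "B",
    },
}

Because the event carries a reference to its own schema, the pipeline can validate it and load it into its own table without changing how any other event type is processed.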
18. Answering the question: 3 possibilities
1. Existing data model supports the answer: it is possible to answer the question with the existing modeled data.
2. Need to update the data model: the data collected already supports the answer, but additional computation (additional logic) is required in the data modeling step.
3. Need to update the data model and data collection: we need to extend event tracking and update the data models to incorporate the additional data (and potentially additional logic).
19. Self-describing data and the ability to recompute data models are essential to enable pipeline evolution
Self-describing data enables:
• Updating existing events and entities in a backward-compatible way, e.g. adding optional new fields
• Updating existing events and entities in a backwards-incompatible way, e.g. changing field types, removing fields, adding compulsory fields
• Adding new event and entity types
Recomputing data models on the entire data set enables:
• Adding new columns to existing derived tables, e.g. a new audience segmentation
• Changing the way existing derived tables are generated, e.g. changing sessionization logic
• Creating new derived tables
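As a concrete illustration of the backward-compatible case, the sketch below adds an optional field to the temperature_measure schema from earlier. Snowplow versions schemas as MODEL-REVISION-ADDITION (SchemaVer), so an optional addition bumps 1-0-0 to 1-0-1; the new field itself is invented for the example.

# Backward-compatible evolution: an optional field is added and the
# ADDITION component of the version is bumped.
schema_1_0_1 = {
    "self": {
        "vendor": "com.israel365",
        "name": "temperature_measure",
        "format": "jsonschema",
        "version": "1-0-1",  # was 1-0-0
    },
    "type": "object",
    "properties": {
        "timestamp": {"type": "string"},
        "location": {"type": "string"},
        "temperature": {"type": "number"},
        "humidity": {"type": "number"},  # new, optional, hypothetical
    },
}

# Old events (without "humidity") still validate against 1-0-1, so
# historical and new data can sit side by side and data models can be
# recomputed over the entire event set.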