SlideShare a Scribd company logo
Building Event Collection
SDKs and Data Models
OSA Con ‘22
Paul Boocock
@paul_boocock
2022 www.snowplow.io
Table of
contents
What is Snowplow?
A Quick Introduction
Tracking SDKs
How we build them and our decisions
Data Models
Considerations for modeling the raw data
_01
_02
_03
2022 www.snowplow.io
What is Snowplow?
A quick intro
_01
www.snowplow.io
Snowplow:
we build
tech to
enable
companies
to Create
Data
www.snowplow.io
Trackers
_02
www.snowplow.io
But first, a
small detour…
www.snowplow.io
{
"$schema":
"http://iglucentral.com/schemas/com.snowplowana
lytics.self-desc/schema/jsonschema/1-0-0#",
"description": "JSON schema for
a button click event",
"self": {
"vendor":
"com.acme",
"name": "click",
"format":
"jsonschema",
"version": "1-0-
0"
},
"type": "object",
"properties": {
"button": {
"type":
["string", "null"],
"maxLength":
255
}
},
"additionalProperties": false
}
www.snowplow.io
The data created looks a bit like this
user_id
page_url
timestamp
device
city
mkt_campaign
Default
content_id
content_title
author
date_created
button_click
Custom Custom
Entity
Event
But can crucially be evolved over time
www.snowplow.io
{
"$schema":
"http://iglucentral.com/schemas/com.snowplowana
lytics.self-desc/schema/jsonschema/1-0-0#",
"description": "JSON schema for
a button click event",
"self": {
"vendor":
"com.acme",
"name": "click",
"format":
"jsonschema",
"version": "1-0-
0"
},
"type": "object",
"properties": {
"button": {
"type":
["string", "null"],
"maxLength":
255
}
},
"additionalProperties": false
}
{
"$schema":
"http://iglucentral.com/schemas/com.snowplowana
lytics.self-desc/schema/jsonschema/1-0-0#",
"description": "JSON schema for
a button click event",
"self": {
"vendor":
"com.acme",
"name": "click",
"format":
"jsonschema",
"version": "1-0-
1"
},
"type": "object",
"properties": {
"button": {
"type":
["string", "null"],
"maxLength":
255
},
"index": {
"type":
["integer", "null"]
}
},
"additionalProperties": false
}
www.snowplow.io
The real benefit of that approach is how the
data then looks
Default
fields (1/130)
Custom Event data
event_name unstruct_event_com_acme_click_1
click {
"button": "open_article"
}
Default fields (1/130) Custom Event data
event_name unstruct_event_com_acme_click_1
click {
"button": "open_article"
}
click {
"button": "open_article",
"index": "2"
}
Tracking SDKs
Lorem ipsum
_02.1
www.snowplow.io
Snowplow Tracking SDKs
An Introduction
2022 www.snowplow.io
www.snowplow.io
Multi-language Support
1.
Quickly developed beyond a
web only platform
Web Tracker is unique as
different challenges on the
web vs other platforms
So, what do we do?
2.
Tempting to auto
generate
- Works well for
some trackers
(server)
- Less well for
others (client)
https://quicktype.io/?
www.snowplow.io
Building SDKs
Hand craft SDKs Autogenerate SDKs Can we do both?
www.snowplow.io
Server vs Client SDKs
How do they differ?
2022 www.snowplow.io
www.snowplow.io
Configuration
Challenges
2022 www.snowplow.io
www.snowplow.io
Sensible defaults
Whilst the SDKs are incredibly
configurable, opting for sensible
defaults keeps users happy
Where possible we now try to be more
opinionated and don’t offer
configuration
Works well until someone asks us if
they can configure it on Github!
www.snowplow.io
Data Models
_03
www.snowplow.io
Data modeling is the
process of using
business logic to
aggregate or
otherwise transform
raw data.
www.snowplow.io
1
Process clickstream data from raw events to
create aggregate tables of views, sessions and
users, reducing the volume of data massively
while adding quality
2
Deal with user stitching across sessions and
devices
3
Compatible with schema evolution, so
when you add new fields to your
contexts they immediately show up in
your data models
users
sessions
views
raw events
3%
6%
20%
Rows relative to raw
events
www.snowplow.io
Modeling is not an afterthought
This end-to-end approach results in
Clean, structured
data always available
in your warehouse
Consistent business logic
defined once means no
qualms about what metrics
mean
More time spent building
out ML/AI models instead of
toiling with data problems
www.snowplow.io
Raw
Data
What does the data look like
Structure data to match your product
video_play
view
video_pause
heartbeat
click
content
id: ‘abc123’
type: ‘video’
date updated: 2019-06...
title: ‘Birthday…’
creator: ‘Nicholas Witchell’
theme: ‘royals’
native: FALSE
personalities: [‘Kate’,
‘Wills’,
‘George’]
Events Entity
search content content content
Going Incremental
_03.2
www.snowplow.io
Consolidation
Similarities between queries:
● Common levels of aggregation
● Repeated logic, like joins
Consolidate ad-hoc queries into a set of
derived tables.
These generalised tables can be used to
solve a variety of use cases.
raw events
page views
sessions
users
20%
6%
3%
Rows relative to raw
events
Value of a page or screen view
PAGE VIEW
HEARTBEAT
ENGAGEMENT
time_engaged: 25s
scroll_depth(x): 100%
scroll_depth(y): 37%
clicks:
1
shares:
0
time_engaged: 15s
scroll_depth(x): 100%
scroll_depth(y): 20%
clicks:
2
shares:
0
time_engaged: 35s
scroll_depth(x): 100%
scroll_depth(y): 86%
clicks:
0
shares:
1
One Step Further - Incremental Models
As the business continues to grow:
● The size of the events table grows.
● Behavioural data is more business critical
and needs to be processed more
regularly.
Meaning:
● Running the derived tables in a drop and
compute manner is becoming
increasingly costly and taking
considerable time.
● Processing events in an incremental
manner becomes a necessity.
#page_views.sql
{{ config(
materialized='table'
)
}}
select
page_view_id,
...
row_number() over (
partition by session_id
order by derived_tstamp
) AS page_view_in_session_index
from {{ ref('events') }}
where event_name = 'page_view'
#page_views.sql
{{ config(
materialized='incremental',
unique_key='page_view_id'
)
}}
with sessions_with_new_events as (
select distinct
session_id
from {{ ref('events') }}
where event_name = 'page_view'
{% if is_incremental() %}
and derived_tstamp > (
select max(derived_tstamp) from
{{this}}
)
{% endif %}
)
select
e.page_view_id,
...
row_number() over (
partition by e.session_id
order by e.derived_tstamp
) AS page_view_in_session_index
from {{ ref('events') }} e
inner join sessions_with_new_events s
on e.session_id = s.session_id
where e.event_name = 'page_view'
Drop & Recompute
Incremental
Page views - Incremental
www.snowplow.io
Process the least data possible
#page_views.sql
{{ config(
materialized='incremental',
unique_key='page_view_id'
)
}}
with sessions_with_new_events as (
select distinct
session_id
from {{ ref('events') }}
where event_name = 'page_view'
{% if is_incremental() %}
and derived_tstamp > (
select max(derived_tstamp) from
{{this}}
)
{% endif %}
)
Reduce the amount of data by:
● Ensure you filter on the
partition/sort keys of the source
www.snowplow.io
Process the least data possible
#page_views.sql
{{ config(
materialized='incremental',
unique_key='page_view_id'
)
}}
with sessions_with_new_events as (
select distinct
session_id
from {{ ref('events') }}
where event_name = 'page_view'
{% if is_incremental() %}
and collector_tstamp > (
select max(collector_tstamp)
from {{this}}
)
{% endif %}
)
Reduce the amount of data by:
● Ensure you filter on the
partition/sort keys of the source
www.snowplow.io
Process the least data possible
#page_views.sql
{{ config(
materialized='incremental',
unique_key='page_view_id'
)
}}
with sessions_with_new_events as (
select ...
)
select
e.page_view_id,
...
row_number() over (
partition by e.session_id
order by e.derived_tstamp
) AS page_view_in_session_index
from {{ ref('events') }} e
inner join sessions_with_new_events s
on e.session_id = s.session_id
where e.event_name = 'page_view'
Reduce the amount of data by:
● Ensure you filter on the
partition/sort keys of the source
● Restrict table scans on all source
tables if possible
www.snowplow.io
Process the least data possible
#page_views.sql
{{ config(
materialized='incremental',
unique_key='page_view_id'
)
}}
with sessions_with_new_events as (
select ...
)
select
e.page_view_id,
...
from {{ ref('events') }} e
inner join sessions_with_new_events s
on e.session_id = s.session_id
where e.event_name = 'page_view'
-- limit table scans
{% if is_incremental() %}
and collector_tstamp > (
select
dateadd(
day,
-3,
max(collector_tstamp))
from {{this}}
)
{% endif %}
Reduce the amount of data by:
● Ensure you filter on the
partition/sort keys of the source
● Restrict table scans on all source
tables if possible
www.snowplow.io
Process the least data possible
#page_views.sql
{{ config(
materialized='incremental',
unique_key='page_view_id'
)
}}
with sessions_with_new_events as (
select distinct
session_id
from {{ ref('events') }}
where event_name = 'page_view'
{% if is_incremental() %}
and derived_tstamp > (
select max(derived_tstamp) from
{{this}}
)
{% endif %}
)
Incremental
Reduce the amount of data by:
● Ensure you filter on the
partition/sort keys of the source
● Restrict table scans on all source
tables if possible
● Understanding your warehouse
Wrap up Considered the Snowplow Tracking SDKs
- Why we hand craft them
- Differences in Server vs Client SDKs
How do we model billions of rows of atomic data?
- Incrementally!
- Aggregation brings many benefits for analysis
Configuration is painful for everyone
- Easy to get carried away with configurable trackers
but then Data Models need to support it too
Schema’d events make it easy to make type safe objects
for us with the Tracking SDKs
- Great for tracking engineers
Understand your full pipeline to extract the most
- How the data is tracked and processed impacts
how you can model the data

More Related Content

Similar to OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock - Snowplow.pdf

A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wycieków
Konrad Kokosa
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdf
sash236
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
Wim Godden
 
Salesforce Lightning Data Services- Hands on Training
Salesforce Lightning Data Services- Hands on TrainingSalesforce Lightning Data Services- Hands on Training
Salesforce Lightning Data Services- Hands on Training
Sunil kumar
 
Mixpanel
MixpanelMixpanel
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
Bulletproof Jobs: Patterns For Large-Scale Spark ProcessingBulletproof Jobs: Patterns For Large-Scale Spark Processing
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
Spark Summit
 
Agile Database Development with JSON
Agile Database Development with JSONAgile Database Development with JSON
Agile Database Development with JSON
Chris Saxon
 
Snowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your businessSnowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your business
yalisassoon
 
Backbone.js
Backbone.jsBackbone.js
Backbone.js
Knoldus Inc.
 
[WSO2Con EU 2018] Patterns for Building Streaming Apps
[WSO2Con EU 2018] Patterns for Building Streaming Apps[WSO2Con EU 2018] Patterns for Building Streaming Apps
[WSO2Con EU 2018] Patterns for Building Streaming Apps
WSO2
 
CQRS and Event Sourcing with MongoDB and PHP
CQRS and Event Sourcing with MongoDB and PHPCQRS and Event Sourcing with MongoDB and PHP
CQRS and Event Sourcing with MongoDB and PHP
Davide Bellettini
 
MySQL under the siege
MySQL under the siegeMySQL under the siege
MySQL under the siege
Source Ministry
 
Protractor framework – how to make stable e2e tests for Angular applications
Protractor framework – how to make stable e2e tests for Angular applicationsProtractor framework – how to make stable e2e tests for Angular applications
Protractor framework – how to make stable e2e tests for Angular applications
Ludmila Nesvitiy
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
Puneet Behl
 
Streaming Analytics for Financial Enterprises
Streaming Analytics for Financial EnterprisesStreaming Analytics for Financial Enterprises
Streaming Analytics for Financial Enterprises
Databricks
 
PyCon SG x Jublia - Building a simple-to-use Database Management tool
PyCon SG x Jublia - Building a simple-to-use Database Management toolPyCon SG x Jublia - Building a simple-to-use Database Management tool
PyCon SG x Jublia - Building a simple-to-use Database Management tool
Crea Very
 
When you need more data in less time...
When you need more data in less time...When you need more data in less time...
When you need more data in less time...
Bálint Horváth
 
Database Development Replication Security Maintenance Report
Database Development Replication Security Maintenance ReportDatabase Development Replication Security Maintenance Report
Database Development Replication Security Maintenance Reportnyin27
 
Snowplow - Evolve your analytics stack with your business
Snowplow - Evolve your analytics stack with your businessSnowplow - Evolve your analytics stack with your business
Snowplow - Evolve your analytics stack with your business
Giuseppe Gaviani
 

Similar to OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock - Snowplow.pdf (20)

A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wycieków
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdf
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
 
Salesforce Lightning Data Services- Hands on Training
Salesforce Lightning Data Services- Hands on TrainingSalesforce Lightning Data Services- Hands on Training
Salesforce Lightning Data Services- Hands on Training
 
Know your SQL Server - DMVs
Know your SQL Server - DMVsKnow your SQL Server - DMVs
Know your SQL Server - DMVs
 
Mixpanel
MixpanelMixpanel
Mixpanel
 
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
Bulletproof Jobs: Patterns For Large-Scale Spark ProcessingBulletproof Jobs: Patterns For Large-Scale Spark Processing
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
 
Agile Database Development with JSON
Agile Database Development with JSONAgile Database Development with JSON
Agile Database Development with JSON
 
Snowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your businessSnowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your business
 
Backbone.js
Backbone.jsBackbone.js
Backbone.js
 
[WSO2Con EU 2018] Patterns for Building Streaming Apps
[WSO2Con EU 2018] Patterns for Building Streaming Apps[WSO2Con EU 2018] Patterns for Building Streaming Apps
[WSO2Con EU 2018] Patterns for Building Streaming Apps
 
CQRS and Event Sourcing with MongoDB and PHP
CQRS and Event Sourcing with MongoDB and PHPCQRS and Event Sourcing with MongoDB and PHP
CQRS and Event Sourcing with MongoDB and PHP
 
MySQL under the siege
MySQL under the siegeMySQL under the siege
MySQL under the siege
 
Protractor framework – how to make stable e2e tests for Angular applications
Protractor framework – how to make stable e2e tests for Angular applicationsProtractor framework – how to make stable e2e tests for Angular applications
Protractor framework – how to make stable e2e tests for Angular applications
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
 
Streaming Analytics for Financial Enterprises
Streaming Analytics for Financial EnterprisesStreaming Analytics for Financial Enterprises
Streaming Analytics for Financial Enterprises
 
PyCon SG x Jublia - Building a simple-to-use Database Management tool
PyCon SG x Jublia - Building a simple-to-use Database Management toolPyCon SG x Jublia - Building a simple-to-use Database Management tool
PyCon SG x Jublia - Building a simple-to-use Database Management tool
 
When you need more data in less time...
When you need more data in less time...When you need more data in less time...
When you need more data in less time...
 
Database Development Replication Security Maintenance Report
Database Development Replication Security Maintenance ReportDatabase Development Replication Security Maintenance Report
Database Development Replication Security Maintenance Report
 
Snowplow - Evolve your analytics stack with your business
Snowplow - Evolve your analytics stack with your businessSnowplow - Evolve your analytics stack with your business
Snowplow - Evolve your analytics stack with your business
 

More from Altinity Ltd

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Altinity Ltd
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Altinity Ltd
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Altinity Ltd
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
Altinity Ltd
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Altinity Ltd
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Altinity Ltd
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Altinity Ltd
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Altinity Ltd
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
Altinity Ltd
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
Altinity Ltd
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Altinity Ltd
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Ltd
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
Altinity Ltd
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
Altinity Ltd
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
Altinity Ltd
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
Altinity Ltd
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
Altinity Ltd
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
Altinity Ltd
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
Altinity Ltd
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
Altinity Ltd
 

More from Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 

Recently uploaded

原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 

Recently uploaded (20)

原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 

OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock - Snowplow.pdf

  • 1. Building Event Collection SDKs and Data Models OSA Con ‘22 Paul Boocock @paul_boocock 2022 www.snowplow.io
  • 2. Table of contents What is Snowplow? A Quick Introduction Tracking SDKs How we build them and our decisions Data Models Considerations for modeling the raw data _01 _02 _03 2022 www.snowplow.io
  • 3. What is Snowplow? A quick intro _01 www.snowplow.io
  • 6. But first, a small detour… www.snowplow.io { "$schema": "http://iglucentral.com/schemas/com.snowplowana lytics.self-desc/schema/jsonschema/1-0-0#", "description": "JSON schema for a button click event", "self": { "vendor": "com.acme", "name": "click", "format": "jsonschema", "version": "1-0- 0" }, "type": "object", "properties": { "button": { "type": ["string", "null"], "maxLength": 255 } }, "additionalProperties": false }
  • 7. www.snowplow.io The data created looks a bit like this user_id page_url timestamp device city mkt_campaign Default content_id content_title author date_created button_click Custom Custom Entity Event
  • 8. But can crucially be evolved over time www.snowplow.io { "$schema": "http://iglucentral.com/schemas/com.snowplowana lytics.self-desc/schema/jsonschema/1-0-0#", "description": "JSON schema for a button click event", "self": { "vendor": "com.acme", "name": "click", "format": "jsonschema", "version": "1-0- 0" }, "type": "object", "properties": { "button": { "type": ["string", "null"], "maxLength": 255 } }, "additionalProperties": false } { "$schema": "http://iglucentral.com/schemas/com.snowplowana lytics.self-desc/schema/jsonschema/1-0-0#", "description": "JSON schema for a button click event", "self": { "vendor": "com.acme", "name": "click", "format": "jsonschema", "version": "1-0- 1" }, "type": "object", "properties": { "button": { "type": ["string", "null"], "maxLength": 255 }, "index": { "type": ["integer", "null"] } }, "additionalProperties": false }
  • 9. www.snowplow.io The real benefit of that approach is how the data then looks Default fields (1/130) Custom Event data event_name unstruct_event_com_acme_click_1 click { "button": "open_article" } Default fields (1/130) Custom Event data event_name unstruct_event_com_acme_click_1 click { "button": "open_article" } click { "button": "open_article", "index": "2" }
  • 11. Snowplow Tracking SDKs An Introduction 2022 www.snowplow.io
  • 13. Multi-language Support 1. Quickly developed beyond a web only platform Web Tracker is unique as different challenges on the web vs other platforms So, what do we do? 2. Tempting to auto generate - Works well for some trackers (server) - Less well for others (client) https://quicktype.io/? www.snowplow.io
  • 14. Building SDKs Hand craft SDKs Autogenerate SDKs Can we do both? www.snowplow.io
  • 15. Server vs Client SDKs How do they differ? 2022 www.snowplow.io
  • 19. Sensible defaults Whilst the SDKs are incredibly configurable, opting for sensible defaults keeps users happy Where possible we now try to be more opinionated and don’t offer configuration Works well until someone asks us if they can configure it on Github! www.snowplow.io
  • 21. Data modeling is the process of using business logic to aggregate or otherwise transform raw data. www.snowplow.io
  • 22. 1 Process clickstream data from raw events to create aggregate tables of views, sessions and users, reducing the volume of data massively while adding quality 2 Deal with user stitching across sessions and devices 3 Compatible with schema evolution, so when you add new fields to your contexts they immediately show up in your data models users sessions views raw events 3% 6% 20% Rows relative to raw events www.snowplow.io Modeling is not an afterthought
  • 23. This end-to-end approach results in Clean, structured data always available in your warehouse Consistent business logic defined once means no qualms about what metrics mean More time spent building out ML/AI models instead of toiling with data problems www.snowplow.io
  • 25. What does the data look like
  • 26. Structure data to match your product video_play view video_pause heartbeat click content id: ‘abc123’ type: ‘video’ date updated: 2019-06... title: ‘Birthday…’ creator: ‘Nicholas Witchell’ theme: ‘royals’ native: FALSE personalities: [‘Kate’, ‘Wills’, ‘George’] Events Entity search content content content
  • 28. Consolidation Similarities between queries: ● Common levels of aggregation ● Repeated logic, like joins Consolidate ad-hoc queries into a set of derived tables. These generalised tables can be used to solve a variety of use cases. raw events page views sessions users 20% 6% 3% Rows relative to raw events
  • 29. Value of a page or screen view PAGE VIEW HEARTBEAT ENGAGEMENT time_engaged: 25s scroll_depth(x): 100% scroll_depth(y): 37% clicks: 1 shares: 0 time_engaged: 15s scroll_depth(x): 100% scroll_depth(y): 20% clicks: 2 shares: 0 time_engaged: 35s scroll_depth(x): 100% scroll_depth(y): 86% clicks: 0 shares: 1
  • 30. One Step Further - Incremental Models As the business continues to grow: ● The size of the events table grows. ● Behavioural data is more business critical and needs to be processed more regularly. Meaning: ● Running the derived tables in a drop and compute manner is becoming increasingly costly and taking considerable time. ● Processing events in an incremental manner becomes a necessity.
  • 31. #page_views.sql {{ config( materialized='table' ) }} select page_view_id, ... row_number() over ( partition by session_id order by derived_tstamp ) AS page_view_in_session_index from {{ ref('events') }} where event_name = 'page_view' #page_views.sql {{ config( materialized='incremental', unique_key='page_view_id' ) }} with sessions_with_new_events as ( select distinct session_id from {{ ref('events') }} where event_name = 'page_view' {% if is_incremental() %} and derived_tstamp > ( select max(derived_tstamp) from {{this}} ) {% endif %} ) select e.page_view_id, ... row_number() over ( partition by e.session_id order by e.derived_tstamp ) AS page_view_in_session_index from {{ ref('events') }} e inner join sessions_with_new_events s on e.session_id = s.session_id where e.event_name = 'page_view' Drop & Recompute Incremental Page views - Incremental
  • 32. www.snowplow.io Process the least data possible #page_views.sql {{ config( materialized='incremental', unique_key='page_view_id' ) }} with sessions_with_new_events as ( select distinct session_id from {{ ref('events') }} where event_name = 'page_view' {% if is_incremental() %} and derived_tstamp > ( select max(derived_tstamp) from {{this}} ) {% endif %} ) Reduce the amount of data by: ● Ensure you filter on the partition/sort keys of the source
  • 33. www.snowplow.io Process the least data possible #page_views.sql {{ config( materialized='incremental', unique_key='page_view_id' ) }} with sessions_with_new_events as ( select distinct session_id from {{ ref('events') }} where event_name = 'page_view' {% if is_incremental() %} and collector_tstamp > ( select max(collector_tstamp) from {{this}} ) {% endif %} ) Reduce the amount of data by: ● Ensure you filter on the partition/sort keys of the source
  • 34. www.snowplow.io Process the least data possible #page_views.sql {{ config( materialized='incremental', unique_key='page_view_id' ) }} with sessions_with_new_events as ( select ... ) select e.page_view_id, ... row_number() over ( partition by e.session_id order by e.derived_tstamp ) AS page_view_in_session_index from {{ ref('events') }} e inner join sessions_with_new_events s on e.session_id = s.session_id where e.event_name = 'page_view' Reduce the amount of data by: ● Ensure you filter on the partition/sort keys of the source ● Restrict table scans on all source tables if possible
  • 35. www.snowplow.io Process the least data possible #page_views.sql {{ config( materialized='incremental', unique_key='page_view_id' ) }} with sessions_with_new_events as ( select ... ) select e.page_view_id, ... from {{ ref('events') }} e inner join sessions_with_new_events s on e.session_id = s.session_id where e.event_name = 'page_view' -- limit table scans {% if is_incremental() %} and collector_tstamp > ( select dateadd( day, -3, max(collector_tstamp)) from {{this}} ) {% endif %} Reduce the amount of data by: ● Ensure you filter on the partition/sort keys of the source ● Restrict table scans on all source tables if possible
  • 36. www.snowplow.io Process the least data possible #page_views.sql {{ config( materialized='incremental', unique_key='page_view_id' ) }} with sessions_with_new_events as ( select distinct session_id from {{ ref('events') }} where event_name = 'page_view' {% if is_incremental() %} and derived_tstamp > ( select max(derived_tstamp) from {{this}} ) {% endif %} ) Incremental Reduce the amount of data by: ● Ensure you filter on the partition/sort keys of the source ● Restrict table scans on all source tables if possible ● Understanding your warehouse
  • 37. Wrap up Considered the Snowplow Tracking SDKs - Why we hand craft them - Differences in Server vs Client SDKs How do we model billions of rows of atomic data? - Incrementally! - Aggregation brings many benefits for analysis Configuration is painful for everyone - Easy to get carried away with configurable trackers but then Data Models need to support it too Schema’d events make it easy to make type safe objects for us with the Tracking SDKs - Great for tracking engineers Understand your full pipeline to extract the most - How the data is tracked and processed impacts how you can model the data