Snowplow Meetup Amsterdam #3
Introduction to Snowplow, new users and what we are working on
1. SNOWPLOW: EMPOWERING SMART PEOPLE TO DIFFERENTIATE WITH DATA SINCE 2012
What is Snowplow?
Snowplow is an event analytics platform like Google Analytics, Adobe Analytics and Mixpanel, but we
take a radically new approach
We give you control of your data so you can:
• Ask and answer any question of the data
• Build data-driven applications, in real-time
• Evolve your analytics stack as your business and understanding grow
We set smart people free to do more with their data and differentiate from
their competition
A radically new approach to digital analytics
To differentiate your business with data,
you need to use data differently
Standard solutions
• Standard reports and dashboards, based on standard data models
• One size fits all
• Your access to your data is mediated by the vendor
Snowplow: a fresh approach
• You are in control of your analytics stack
• You decide what data to track
• You decide what questions to ask of that data
• You decide what techniques and technologies to use to analyze the data
• You own all your data in your own data warehouse
Transform your business with data
We empower analysts, engineers, product managers, marketers, operations and
other specialists to do transformative things with data
Ownership
You own your data in your data warehouse and your unified log
Control
You decide what data to collect, what questions to ask of the data, what techniques and tools to use to analyze it and how to act on the insight generated
Freedom
Do what you want with your data. The only limit is your imagination
Single customer view
Take control of your data: you own your data in your AWS data warehouse
Track events from everywhere: your own applications and marketing channels (web,
mobile, PC, smart home, smart car, Internet of Things, email, call center, etc.)
Stitch data together from each channel and platform to form a single customer view
Flexible: you decide the data structure, enrichment and modeling logic that suits your
business. Easy to change how you collect data as your business, and the questions you
ask of the data, evolve
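As an illustration of the stitching step above, here is a minimal sketch in Python. The field names domain_userid and user_id follow Snowplow conventions, but the events and the stitching logic shown are a simplified hypothetical, not Snowplow's actual data model:

```python
# Minimal identity-stitching sketch: whenever an anonymous cookie ID
# (domain_userid) and a known user ID co-occur on an event, record the
# mapping, then use it to assign every event to a single customer.

def build_id_map(events):
    """Cookie ID -> user ID, from events where both are present."""
    id_map = {}
    for e in events:
        if e.get("domain_userid") and e.get("user_id"):
            id_map[e["domain_userid"]] = e["user_id"]
    return id_map

def stitch(events):
    """Attach a stitched_user_id to every event."""
    id_map = build_id_map(events)
    for e in events:
        e["stitched_user_id"] = (
            e.get("user_id") or id_map.get(e.get("domain_userid"))
        )
    return events

events = [
    {"domain_userid": "c1", "user_id": None},     # anonymous visit
    {"domain_userid": "c1", "user_id": "alice"},  # login event
    {"domain_userid": "c2", "user_id": None},     # never identified
]
stitched = stitch(events)
```

The anonymous visit from cookie "c1" is retroactively attributed to "alice" because the same cookie later appears on a logged-in event; cookie "c2" stays unresolved.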
Act on your data in real-time
All your data available in seconds. Act on it in real-time
Your data where you need it, including Amazon Redshift, Kinesis, Elasticsearch
and S3
Rich data: define your entities and events. Enrich them with first and third party
data
Built for analysis: highly structured data, easy to query. Clean separation
between events and business logic applied
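Custom events and entities in Snowplow are expressed as self-describing JSONs: the payload carries an Iglu URI pointing at the schema that describes it, so the pipeline can validate the event and load it into the right structure. A minimal sketch (the com.acme vendor and ad_click event are hypothetical examples):

```python
import json

# A self-describing JSON: "schema" is an Iglu URI naming the JSON
# Schema for this event; "data" is the payload it validates against.
event = {
    "schema": "iglu:com.acme/ad_click/jsonschema/1-0-0",
    "data": {
        "clickId": "abc-123",
        "targetUrl": "https://example.com/offer",
    },
}

serialized = json.dumps(event)
```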
Trust your data
Trust your data: auditable data pipeline. Systematic data validation and
error reporting
Scalable: track and process billions of events per day
Low cost: it runs on AWS. Much cheaper than the commercial alternatives
Data experts: tap into our data analytics and engineering teams, our
partners and our data-sophisticated community to do more with our
technology
2. SNOWPLOW IS GROWING WITH EXCITING NEW USERS AND USE CASES
Snowplow is widely used by companies across
sectors to take control of their data
3. WE HAVE A PACKED ROADMAP FOR 2017
3.1 PIPELINE PORTABILITY
Support for more clouds
Adopting Snowplow shouldn’t mean being tied to one cloud. Take Snowplow’s
core capabilities – and your data – with you when migrating
In late 2016 we released a Snowplow real-time on-premises beta with Kafka
Focus this year is getting real-time Snowplow running on Google Cloud Platform
– using Cloud Pub/Sub and Google Cloud Dataflow
Also actively brainstorming a port to Azure with Microsoft
Support for more storage targets
There is no single “hero” database for Snowplow – different databases
meet different analytical and operational needs
We currently support Redshift and Elasticsearch, plus limited support for Postgres
Want to add Avro/Parquet on S3, Amazon Athena, BigQuery, Snowflake
Also doing a lot of work implementing analytics-on-write with DynamoDB,
and want to restart R&D on graph databases
3.2 TAILORING SNOWPLOW TO YOUR INDUSTRY
Tailoring Snowplow to your industry
Snowplow users in specific industries (publishing, retail, gaming) want
Snowplow to do more out of the box
1. Schema bundles for specific industries – based on our experiences with
customers
2. Event data models for these industries, using these event schemas
3. BI templates for these industries in specific tools e.g. Looker, Redash,
Superset
More intelligent event sources
Want to build a better “day zero experience” for analysts and developers
Working on more developer-friendly trackers – easier to get started,
fewer intimidating configuration options
Adding more auto-tracking of events, especially in web/mobile SDKs
Webhooks – currently support about 12 external platforms. We are
exploring schema inference so we can add another 100 sources!
3.3 RE-ARCHITECTING SNOWPLOW FOR THE FUTURE
Moving our batch pipeline to Spark
Have been using Hadoop (Cascading/Scalding) for 5 years – finally porting
the core batch jobs to Spark
Community-driven effort (Phil Kallos @ Hired, David White @ Nordstrom,
Gabor Ratky @ Secret Sauce)
Code complete, going into QA soon
Expect positive performance impact, plus a key building block for our real-time loading of Redshift
Mega-scale Snowplow
Exploring a “mega-scale” version of Snowplow for 100m+ events a day, with
event archives of 50+ billion events
At that scale, you start running into problems with S3, Redshift, EMR
Ongoing R&D on a set of practices/patterns to make mega-scale Snowplow
less painful
Techniques include file compression and compaction, job flow
parallelization, and “reactive batch” approaches
3.4 DECISIONING AND RESPONSE
Sauna lets you turn your customer insights into
actions in different marketing channels
After a batch beta, now working on real-time
support for Sauna
Express individual commands as self-describing JSONs in a Kinesis (or
Kafka or SQS or Cloud Pub/Sub) stream/topic
Sauna will read these commands and execute them using the appropriate
responder
At launch we will support sending Slack and HipChat messages, and firing
incidents to PagerDuty
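The command pattern above can be sketched as follows. Stream plumbing (Kinesis/Kafka/SQS/Cloud Pub/Sub) is replaced by a plain string record here, and the schema URI and responder are hypothetical stand-ins, not Sauna's actual command schemas:

```python
import json

# Sketch of Sauna-style decisioning: commands arrive on a stream as
# self-describing JSONs, and a dispatcher routes each one to the
# responder registered for its schema.

def slack_responder(data):
    # Hypothetical responder: would post to Slack in a real system.
    return f"slack -> #{data['channel']}: {data['text']}"

RESPONDERS = {
    "iglu:com.acme.sauna/send_slack_message/jsonschema/1-0-0": slack_responder,
}

def dispatch(raw_record):
    """Parse one stream record and execute the matching responder."""
    command = json.loads(raw_record)
    responder = RESPONDERS[command["schema"]]
    return responder(command["data"])

record = json.dumps({
    "schema": "iglu:com.acme.sauna/send_slack_message/jsonschema/1-0-0",
    "data": {"channel": "growth", "text": "Churn risk cohort updated"},
})
result = dispatch(record)
```

Because the schema URI selects the responder, adding a new channel (HipChat, PagerDuty) is just another entry in the registry.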
QUESTIONS?
