
Snowplow: evolve your analytics stack with your business


A deep dive into how digital analytics stacks need to evolve with businesses, and how self-describing data and event data modeling are the key elements that enable Snowplow data pipelines to evolve elegantly over time.



  1. Snowplow: evolve your analytics stack with your business (Snowplow Meetup San Francisco, Feb 2017)
  2. Our businesses are constantly evolving… • Our digital products (apps and platforms) are constantly developing • The questions we ask of our data are constantly changing • It is critical that our analytics stack can evolve with our business
  3. How Snowplow users evolve their analytics stacks with their business: self-describing data + event data modeling = an analytics stack that evolves with your business
  4. Self-describing data: overview
  5. Event data varies widely by company
  6. As a Snowplow user, you can define your own events and entities. Events: build castle, form alliance, declare war; view product, buy product, deliver product. Entities (contexts): player, game, level, currency; product, customer, basket, delivery van.
  7. You then define a schema for each event and entity, and upload the schema to Iglu. For example, a fighter context:

     {
       "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
       "description": "Schema for a fighter context",
       "self": {
         "vendor": "com.ufc",
         "name": "fighter_context",
         "format": "jsonschema",
         "version": "1-0-1"
       },
       "type": "object",
       "properties": {
         "FirstName": { "type": "string" },
         "LastName": { "type": "string" },
         "Nickname": { "type": "string" },
         "FacebookProfile": { "type": "string" },
         "TwitterName": { "type": "string" },
         "GooglePlusProfile": { "type": "string" },
         "HeightFormat": { "type": "string" },
         "HeightCm": { "type": ["integer", "null"] },
         "Weight": { "type": ["integer", "null"] },
         "WeightKg": { "type": ["integer", "null"] },
         "Record": { "type": "string", "pattern": "^[0-9]+-[0-9]+-[0-9]+$" },
         "Striking": { "type": ["number", "null"], "maxdecimal": 15 },
         "Takedowns": { "type": ["number", "null"], "maxdecimal": 15 },
         "Submissions": { "type": ["number", "null"], "maxdecimal": 15 },
         "LastFightUrl": { "type": "string" },
         "LastFightEventText": { "type": "string" },
         "NextFightUrl": { "type": "string" },
         "NextFightEventText": { "type": "string" },
         "LastFightDate": { "type": "string", "format": "timestamp" }
       },
       "additionalProperties": false
     }
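     Once a schema exists, any component can check events against it. A minimal sketch of what that validation step looks like, using the open-source Python jsonschema library (an assumption for illustration, not Snowplow's own validator) and an abbreviated version of the fighter schema above:

     # Minimal validation sketch using the open-source "jsonschema" library;
     # Snowplow runs its own validation step, this just shows the idea.
     from jsonschema import ValidationError, validate

     fighter_schema = {
         "type": "object",
         "properties": {
             "FirstName": {"type": "string"},
             "Record": {"type": "string", "pattern": "^[0-9]+-[0-9]+-[0-9]+$"},
             "WeightKg": {"type": ["integer", "null"]},
         },
         "additionalProperties": False,
     }

     good = {"FirstName": "Jane", "Record": "10-2-0", "WeightKg": 70}
     bad = {"FirstName": "Jane", "Record": "ten wins"}  # pattern mismatch

     for event in (good, bad):
         try:
             validate(instance=event, schema=fighter_schema)
             print("valid:", event)
         except ValidationError as err:
             print("rejected:", err.message)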
  8. Then send data into Snowplow as self-describing JSONs, which the pipeline runs through 1. validation, 2. dimension widening and 3. data modeling. Each event pairs a schema reference with its data:

     {
       "schema": "iglu:com.israel365/temperature_measure/jsonschema/1-0-0",
       "data": {
         "timestamp": "2016-11-16 19:53:21",
         "location": "Berlin",
         "temperature": 3,
         "units": "Centigrade"
       }
     }

     The schema reference resolves to the matching schema in Iglu:

     {
       "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
       "description": "Schema for a temperature measurement event",
       "self": {
         "vendor": "com.israel365",
         "name": "temperature_measure",
         "format": "jsonschema",
         "version": "1-0-0"
       },
       "type": "object",
       "properties": {
         "timestamp": { "type": "string" },
         "location": { "type": "string" },
         …
       },
       …
     }
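     The schema reference is machine-readable, which is what lets the pipeline find the right schema for every event. A hypothetical helper showing how such a reference splits into its parts:

     # Hypothetical helper: splitting an Iglu schema reference into the
     # vendor/name/format/version parts used to look the schema up.
     from typing import NamedTuple

     class SchemaKey(NamedTuple):
         vendor: str
         name: str
         format: str
         version: str

     def parse_iglu_uri(uri: str) -> SchemaKey:
         if not uri.startswith("iglu:"):
             raise ValueError("not an Iglu schema reference: " + uri)
         vendor, name, fmt, version = uri[len("iglu:"):].split("/")
         return SchemaKey(vendor, name, fmt, version)

     key = parse_iglu_uri("iglu:com.israel365/temperature_measure/jsonschema/1-0-0")
     # SchemaKey(vendor='com.israel365', name='temperature_measure',
     #           format='jsonschema', version='1-0-0')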
  9. The schemas can then be used in a number of ways: • Validate the data (important for data quality) • Load the data into tidy tables in your data warehouse • Make it easy and safe to write downstream data processing applications (e.g. for real-time users)
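     As an illustration of the "tidy tables" point, a sketch (not Snowplow's actual loader logic) of how a schema's self block can name a warehouse table and its properties can become typed columns:

     # Illustrative sketch only; Snowplow's real loaders are more involved.
     temperature_schema = {
         "self": {"vendor": "com.israel365", "name": "temperature_measure",
                  "version": "1-0-0"},
         "properties": {
             "timestamp": {"type": "string"},
             "location": {"type": "string"},
             "temperature": {"type": ["integer", "null"]},
         },
     }

     def table_ddl(schema):
         self_block = schema["self"]
         table = "{}_{}_{}".format(
             self_block["vendor"].replace(".", "_"),
             self_block["name"],
             self_block["version"].split("-")[0],  # major version only
         )
         type_map = {"string": "VARCHAR", "integer": "BIGINT",
                     "number": "DOUBLE PRECISION"}
         cols = []
         for name, spec in schema["properties"].items():
             json_type = (spec["type"] if isinstance(spec["type"], str)
                          else spec["type"][0])
             cols.append("  {} {}".format(name, type_map.get(json_type, "VARCHAR")))
         return "CREATE TABLE {} (\n{}\n);".format(table, ",\n".join(cols))

     print(table_ddl(temperature_schema))
     # CREATE TABLE com_israel365_temperature_measure_1 (
     #   timestamp VARCHAR,
     #   location VARCHAR,
     #   temperature BIGINT
     # );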
  10. Event data modeling: overview
  11. What is event data modeling? It is the third step of the pipeline, after 1. validation and 2. dimension widening: the process of using business logic to aggregate over event-level data to produce 'modeled' data that is simpler to query.
  12. Unmodeled data (event 1 … event n) is immutable, unopinionated, hard to consume, and not contentious. Modeled data (users, sessions, funnels, …) is mutable and opinionated, easy to consume, and may be contentious. See the sessionization sketch below.
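     A minimal sketch of what such modeling can look like, rolling event-level rows up into an opinionated sessions table with pandas; the column names and the 30-minute inactivity timeout are assumptions for illustration:

     # Sketch: aggregate immutable event-level data into a "sessions" table.
     import pandas as pd

     def sessionize(events: pd.DataFrame, timeout_minutes: int = 30) -> pd.DataFrame:
         ev = events.sort_values(["user_id", "event_ts"]).copy()
         # A gap longer than the timeout starts a new session for that user.
         new_session = (
             ev.groupby("user_id")["event_ts"].diff()
             > pd.Timedelta(minutes=timeout_minutes)
         )
         ev["session_id"] = new_session.groupby(ev["user_id"]).cumsum()
         return (
             ev.groupby(["user_id", "session_id"])
               .agg(session_start=("event_ts", "min"),
                    session_end=("event_ts", "max"),
                    n_events=("event_ts", "size"))
               .reset_index()
         )

     all_events = pd.DataFrame({
         "user_id": ["a", "a", "a", "b"],
         "event_ts": pd.to_datetime([
             "2017-02-01 10:00", "2017-02-01 10:05",
             "2017-02-01 12:00", "2017-02-01 10:00",
         ]),
     })
     sessions = sessionize(all_events)  # user "a" gets two sessions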
  13. In general, event data modeling is performed on the complete event stream: • Late-arriving events can change the way you understand earlier-arriving events • If we change our data models, running over the complete stream gives us the flexibility to recompute historical data based on the new model (see the sketch below)
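     Continuing the hypothetical sessionize() example above (it reuses sessionize, all_events and the pandas import from that sketch), recomputation is just a re-run over the full history:

     # Changing the model means recomputing all of history under the new logic:
     sessions_v2 = sessionize(all_events, timeout_minutes=15)
     # Late-arriving events are handled the same way: append and rebuild,
     # restating any earlier sessions they change.
     late_events = pd.DataFrame({
         "user_id": ["a"],
         "event_ts": pd.to_datetime(["2017-02-01 10:07"]),
     })
     sessions = sessionize(pd.concat([all_events, late_events]))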
  14. The evolving event data pipeline
  15. How do we handle pipeline evolution? Businesses are not static, so event pipelines should not be either. Push factors: what is being tracked will change over time. Pull factors: what questions are being asked of the data will change over time. [Diagram: event sources (web, apps, servers, comms channels, smart car / home, …) push into collection and processing, which feed a data warehouse (data exploration, predictive modeling, real-time dashboards) and real-time, data-driven applications (RT bidder, vouchers, personalization, …).]
  16. Push example: a new source of event data • If data is self-describing, it is easy to add additional sources • Self-describing data is good for managing bad data and pipeline evolution. "I'm an email send event and I have information about the recipient (email address, customer ID) and the email (id, tags, variation)" (see the sketch below)
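     A hypothetical self-describing JSON for that new email source; the vendor, schema name, and field names are illustrative assumptions:

     # Hypothetical event for the new source; vendor/name/fields are assumptions.
     email_send_event = {
         "schema": "iglu:com.acme/email_send/jsonschema/1-0-0",
         "data": {
             "recipient_email": "jane@example.com",
             "customer_id": "c-12345",
             "email_id": "welcome-series-2",
             "tags": ["onboarding", "b2c"],
             "variation": "B",
         },
     }
     # Because the event names its own schema, the existing pipeline can
     # validate and load it without changes to collection or processing code.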
  17. Pull example: a new business question. [Diagram: question → answer → insight, prompting the next question.]
  18. Answering the question: 3 possibilities. 1. The existing data model supports the answer: the question can be answered with already-modeled data. 2. The data model needs updating: the data already collected supports the answer, but additional computation (logic) is required in the data modeling step. 3. Both the data model and data collection need updating: event tracking needs to be extended, and the data models updated to incorporate the additional data (and potentially additional logic).
  19. Self-describing data and the ability to recompute data models are essential to enable pipeline evolution. Self-describing data lets you: • update existing events and entities in a backwards-compatible way (e.g. add optional new fields) • update existing events and entities in a backwards-incompatible way (e.g. change field types, remove fields, add compulsory fields) • add new event and entity types. Recomputing data models on the entire data set lets you: • add new columns to existing derived tables (e.g. a new audience segmentation) • change the way existing derived tables are generated (e.g. change sessionization logic) • create new derived tables. A versioning sketch follows below.
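     A sketch of what that schema evolution looks like under Snowplow's SchemaVer convention (MODEL-REVISION-ADDITION version numbers); the humidity field is an illustrative assumption:

     # Backwards-compatible change: add an optional field and bump the
     # ADDITION number, 1-0-0 -> 1-0-1. Old events still validate.
     temperature_measure_1_0_1_properties = {
         "timestamp": {"type": "string"},
         "location": {"type": "string"},
         "temperature": {"type": "integer"},
         "units": {"type": "string"},
         "humidity": {"type": ["integer", "null"]},  # new optional field
     }
     # Backwards-incompatible change (e.g. make "humidity" compulsory, or
     # change a field's type): bump the MODEL number instead, 1-0-1 -> 2-0-0,
     # so the new data lands in a new derived table alongside the old one.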
  20. Questions?