Adding Data Schemas to Snowplow
Big Data Budapest Meetup - 5 June 2014

Agenda today
1.  Introduction to Snowplow
2.  Evolution of Snowplow
3.  The answer: schema all the things!
4.  Snowplow roadmap
5.  Questions

Introduction to Snowplow

Snowplow is an open-source web and event analytics platform, first version released in early 2012
•  Co-founders Alex Dean and Yali Sassoon met at OpenX, the open-source ad technology business, in 2008
•  After leaving OpenX, Alex and Yali set up Keplar, a niche digital product and analytics consultancy
•  We released Snowplow as a skunkworks prototype at start of 2012: github.com/snowplow/snowplow
•  We started working full time on Snowplow in summer 2013

We wanted to take a fresh approach to web analytics
•  Your own web event data -> in your own data warehouse
•  Your own event data model
•  Slice / dice and mine the data in highly bespoke ways to answer your specific business questions
•  Plug in the broadest possible set of analysis tools to drive value from your data
[Diagram labels: Data pipeline; Data warehouse; Analyse your data in any analysis tool]

By spring 2013 we had arrived at a relatively stable batch-based processing architecture
[Diagram: Snowplow Hadoop data pipeline, with these components]
•  Website / webapp with JavaScript event tracker
•  CloudFront-based event collector or Clojure-based event collector
•  Amazon S3
•  Scalding-based enrichment on Hadoop
•  Amazon Redshift / PostgreSQL

Evolution of Snowplow

Snowplow is evolving from a web analytics platform into a general event analytics platform
[Diagram labels: Collect event data from any connected device; Data warehouse]

Web analysts work with a small number of event types – outside of web, the number of possible event types is… infinite
Web events:
•  Page view  •  Order  •  Add to basket  •  Page activity
All events:
•  Game saved  •  Machine broke  •  Car started
•  Spellcheck run  •  Screenshot taken  •  Fridge empty
•  App crashed  •  Disk full  •  SMS sent
•  Screen viewed  •  Tweet drafted  •  Player died
•  Taxi arrived  •  Phonecall ended  •  Cluster started
•  Till opened  •  Product returned
∞

There are two historic approaches to dealing with the explosion of possible event types
•  Web analytics vendors: Custom Variables
•  Mobile and app analytics vendors: Schema-less JSONs

Custom variables are very restrictive
1.  Take a standard web event, like a page view:
    Page View
2.  and add custom variables until it becomes something totally different:
    Page View + vehicle=taxi23 + status=arrived = a “taxi arrived” event, kind of!

Schema-less JSONs are better, but they have a different set of problems
Issues with the event name:
•  Separate from the event properties
•  Not versioned
•  Not unique – HBO video played versus Brightcove video played
Lots of unanswered questions about the properties:
•  Is length required, and is it always a number?
•  Is id required, and is it always a string?
•  What other optional properties are allowed for a video play?
Other issues:
•  What if the developer accidentally starts sending “len” instead of “length”? The data will end up split across two separate fields
•  Why does the analyst need to keep an implicit schema in their head to analyze video played events?

The answer: schema all the things!

When a developer or analyst defines a new event in JSON, let’s ask them to create a JSON Schema for that event
[Slide shows a JSON Schema with these callouts]
•  Yes length should always be a number
•  No other fields allowed
•  Additional optional field we might not know about otherwise

But we need to let our event definitions evolve, so let’s add versioning – we’re calling this SchemaVer
MODEL-REVISION-ADDITION
•  Start versioning at 1-0-0 – so 1-0-0, 1-0-1, 1-0-2, 1-1-0 etc
•  Try to stick to backwards-compatible ADDITION upgrades as much as possible

Where are our schemas going to live? We need a schema repository/registry
[Diagram: a schema repo sits alongside the pipeline stages below]
•  Raw events in JSON format
•  Enrichment Manager
•  Enriched events in Thrift or Avro format
•  Shredder
•  Enriched events in TSV ready for loading into db
Numbered uses of the schema repo in the diagram:
1.  Test instrumentation
2.  Validate events
3.  Define structure
4.  Drive shredding
5.  Define structure

We need to namespace our schemas properly to prevent clashes and confusion in our schema repository
iglu:com.channel2.vod/video_played/jsonschema/1-0-0
•  com.channel2.vod: the vendor of this event
•  video_played: event name
•  jsonschema: schema format
•  1-0-0: schema version
We are calling our schema methodology “Iglu”

Bringing it all together, let’s now make the event JSONs self-describing, with a schema header and data body

And for good measure, let’s add our schema information into the JSON Schema itself
  
Snowplow roadmap

Self-describing JSON Schemas are coming in the next release of Snowplow

We are also starting to define third-party events for Snowplow integration, starting with Zendesk customer support events

Questions?
http://snowplowanalytics.com
https://github.com/snowplow/snowplow
@snowplowdata
To chat – @alexcrdean on Twitter or alex@snowplowanalytics.com

More Related Content

What's hot

SQL Server Reporting Services (SSRS) 101
 SQL Server Reporting Services (SSRS) 101 SQL Server Reporting Services (SSRS) 101
SQL Server Reporting Services (SSRS) 101
Sparkhound Inc.
 
Enterprise Security Guided Tour
Enterprise Security Guided TourEnterprise Security Guided Tour
Enterprise Security Guided Tour
Splunk
 
SAP Cloud Platform – Data & Storage - Overview
SAP Cloud Platform – Data & Storage - OverviewSAP Cloud Platform – Data & Storage - Overview
SAP Cloud Platform – Data & Storage - Overview
SAP Cloud Platform
 
Snowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneSnowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for Everyone
Angel Abundez
 
Ctrie Data Structure
Ctrie Data StructureCtrie Data Structure
Ctrie Data Structure
Aleksandar Prokopec
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data Applications
EUCLID project
 
SAP BW Introduction.
SAP BW Introduction.SAP BW Introduction.
Splunk Tutorial for Beginners - What is Splunk | Edureka
Splunk Tutorial for Beginners - What is Splunk | EdurekaSplunk Tutorial for Beginners - What is Splunk | Edureka
Splunk Tutorial for Beginners - What is Splunk | Edureka
Edureka!
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
WinWire Technologies Inc
 
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
BI Brainz
 
Secrets of the DSpace Submission Form
Secrets of the DSpace Submission FormSecrets of the DSpace Submission Form
Secrets of the DSpace Submission Form
Bram Luyten
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache Spark
Databricks
 
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMicrosoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Mark Kromer
 
Getting Started with Splunk Enterprise - Demo
Getting Started with Splunk Enterprise - DemoGetting Started with Splunk Enterprise - Demo
Getting Started with Splunk Enterprise - Demo
Splunk
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Mark Kromer
 
DSpace standard Data model and DSpace-CRIS
DSpace standard Data model and DSpace-CRISDSpace standard Data model and DSpace-CRIS
DSpace standard Data model and DSpace-CRIS
Andrea Bollini
 
Introduction to Dublin Core Metadata
Introduction to Dublin Core MetadataIntroduction to Dublin Core Metadata
Introduction to Dublin Core Metadata
Hannes Ebner
 
DSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamDSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstream
4Science
 
Getting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseGetting Started with Splunk Enterprise
Getting Started with Splunk Enterprise
Splunk
 
dlux - Splunk Technical Overview
dlux - Splunk Technical Overviewdlux - Splunk Technical Overview
dlux - Splunk Technical Overview
David Lutz
 

What's hot (20)

SQL Server Reporting Services (SSRS) 101
 SQL Server Reporting Services (SSRS) 101 SQL Server Reporting Services (SSRS) 101
SQL Server Reporting Services (SSRS) 101
 
Enterprise Security Guided Tour
Enterprise Security Guided TourEnterprise Security Guided Tour
Enterprise Security Guided Tour
 
SAP Cloud Platform – Data & Storage - Overview
SAP Cloud Platform – Data & Storage - OverviewSAP Cloud Platform – Data & Storage - Overview
SAP Cloud Platform – Data & Storage - Overview
 
Snowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneSnowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for Everyone
 
Ctrie Data Structure
Ctrie Data StructureCtrie Data Structure
Ctrie Data Structure
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data Applications
 
SAP BW Introduction.
SAP BW Introduction.SAP BW Introduction.
SAP BW Introduction.
 
Splunk Tutorial for Beginners - What is Splunk | Edureka
Splunk Tutorial for Beginners - What is Splunk | EdurekaSplunk Tutorial for Beginners - What is Splunk | Edureka
Splunk Tutorial for Beginners - What is Splunk | Edureka
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
 
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
 
Secrets of the DSpace Submission Form
Secrets of the DSpace Submission FormSecrets of the DSpace Submission Form
Secrets of the DSpace Submission Form
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache Spark
 
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMicrosoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
 
Getting Started with Splunk Enterprise - Demo
Getting Started with Splunk Enterprise - DemoGetting Started with Splunk Enterprise - Demo
Getting Started with Splunk Enterprise - Demo
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview Slides
 
DSpace standard Data model and DSpace-CRIS
DSpace standard Data model and DSpace-CRISDSpace standard Data model and DSpace-CRIS
DSpace standard Data model and DSpace-CRIS
 
Introduction to Dublin Core Metadata
Introduction to Dublin Core MetadataIntroduction to Dublin Core Metadata
Introduction to Dublin Core Metadata
 
DSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamDSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstream
 
Getting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseGetting Started with Splunk Enterprise
Getting Started with Splunk Enterprise
 
dlux - Splunk Technical Overview
dlux - Splunk Technical Overviewdlux - Splunk Technical Overview
dlux - Splunk Technical Overview
 

Viewers also liked

Snowplow at Sigfig
Snowplow at SigfigSnowplow at Sigfig
Snowplow at Sigfig
yalisassoon
 
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
Alexander Dean
 
Implementing improved and consistent arbitrary event tracking company-wide us...
Implementing improved and consistent arbitrary event tracking company-wide us...Implementing improved and consistent arbitrary event tracking company-wide us...
Implementing improved and consistent arbitrary event tracking company-wide us...
yalisassoon
 
Snowplow the evolving data pipeline
Snowplow   the evolving data pipelineSnowplow   the evolving data pipeline
Snowplow the evolving data pipeline
yalisassoon
 
Using Snowplow for A/B testing and user journey analysis at CustomMade
Using Snowplow for A/B testing and user journey analysis at CustomMadeUsing Snowplow for A/B testing and user journey analysis at CustomMade
Using Snowplow for A/B testing and user journey analysis at CustomMade
yalisassoon
 
Yali presentation for snowplow amsterdam meetup number 2
Yali presentation for snowplow amsterdam meetup number 2Yali presentation for snowplow amsterdam meetup number 2
Yali presentation for snowplow amsterdam meetup number 2
yalisassoon
 
Snowplow: where we came from and where we are going - March 2016
Snowplow: where we came from and where we are going - March 2016Snowplow: where we came from and where we are going - March 2016
Snowplow: where we came from and where we are going - March 2016
yalisassoon
 
Snowplow: putting digital analysts at the heart of digital analytics - the fo...
Snowplow: putting digital analysts at the heart of digital analytics - the fo...Snowplow: putting digital analysts at the heart of digital analytics - the fo...
Snowplow: putting digital analysts at the heart of digital analytics - the fo...
yalisassoon
 
Modeling event data
Modeling event dataModeling event data
Modeling event data
yalisassoon
 
Lean Product Analytics by Dan Olsen
Lean Product Analytics by Dan OlsenLean Product Analytics by Dan Olsen
Lean Product Analytics by Dan Olsen
Dan Olsen
 
Understanding event data
Understanding event dataUnderstanding event data
Understanding event data
yalisassoon
 
A KPI framework for startups
A KPI framework for startupsA KPI framework for startups
A KPI framework for startups
yalisassoon
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
Owen O'Malley
 

Viewers also liked (13)

Snowplow at Sigfig
Snowplow at SigfigSnowplow at Sigfig
Snowplow at Sigfig
 
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
 
Implementing improved and consistent arbitrary event tracking company-wide us...
Implementing improved and consistent arbitrary event tracking company-wide us...Implementing improved and consistent arbitrary event tracking company-wide us...
Implementing improved and consistent arbitrary event tracking company-wide us...
 
Snowplow the evolving data pipeline
Snowplow   the evolving data pipelineSnowplow   the evolving data pipeline
Snowplow the evolving data pipeline
 
Using Snowplow for A/B testing and user journey analysis at CustomMade
Using Snowplow for A/B testing and user journey analysis at CustomMadeUsing Snowplow for A/B testing and user journey analysis at CustomMade
Using Snowplow for A/B testing and user journey analysis at CustomMade
 
Yali presentation for snowplow amsterdam meetup number 2
Yali presentation for snowplow amsterdam meetup number 2Yali presentation for snowplow amsterdam meetup number 2
Yali presentation for snowplow amsterdam meetup number 2
 
Snowplow: where we came from and where we are going - March 2016
Snowplow: where we came from and where we are going - March 2016Snowplow: where we came from and where we are going - March 2016
Snowplow: where we came from and where we are going - March 2016
 
Snowplow: putting digital analysts at the heart of digital analytics - the fo...
Snowplow: putting digital analysts at the heart of digital analytics - the fo...Snowplow: putting digital analysts at the heart of digital analytics - the fo...
Snowplow: putting digital analysts at the heart of digital analytics - the fo...
 
Modeling event data
Modeling event dataModeling event data
Modeling event data
 
Lean Product Analytics by Dan Olsen
Lean Product Analytics by Dan OlsenLean Product Analytics by Dan Olsen
Lean Product Analytics by Dan Olsen
 
Understanding event data
Understanding event dataUnderstanding event data
Understanding event data
 
A KPI framework for startups
A KPI framework for startupsA KPI framework for startups
A KPI framework for startups
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 

Similar to Big data meetup budapest adding data schemas to snowplow

Big Data Beers - Introducing Snowplow
Big Data Beers - Introducing SnowplowBig Data Beers - Introducing Snowplow
Big Data Beers - Introducing Snowplow
Alexander Dean
 
Scala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaScala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in Scala
Alexander Dean
 
[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data
NAVER D2
 
SpringOne 2016 in a nutshell
SpringOne 2016 in a nutshellSpringOne 2016 in a nutshell
SpringOne 2016 in a nutshell
Jeroen Resoort
 
Snowplow presentation for Amsterdam Meetup #3
Snowplow presentation for Amsterdam Meetup #3Snowplow presentation for Amsterdam Meetup #3
Snowplow presentation for Amsterdam Meetup #3
Snowplow Analytics
 
ECS19 Elio Struyf - Setting Up Your SPFx CI/CD pipelines on Azure DevOps
ECS19 Elio Struyf - Setting Up Your SPFx CI/CD pipelines on Azure DevOpsECS19 Elio Struyf - Setting Up Your SPFx CI/CD pipelines on Azure DevOps
ECS19 Elio Struyf - Setting Up Your SPFx CI/CD pipelines on Azure DevOps
European Collaboration Summit
 
Integrating Splunk into your Spring Applications
Integrating Splunk into your Spring ApplicationsIntegrating Splunk into your Spring Applications
Integrating Splunk into your Spring Applications
Damien Dallimore
 
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2
 
Snowplow: open source game analytics powered by AWS
Snowplow: open source game analytics powered by AWSSnowplow: open source game analytics powered by AWS
Snowplow: open source game analytics powered by AWS
Giuseppe Gaviani
 
Make Cross-platform Mobile Apps Quickly - SIGGRAPH 2014
Make Cross-platform Mobile Apps Quickly - SIGGRAPH 2014Make Cross-platform Mobile Apps Quickly - SIGGRAPH 2014
Make Cross-platform Mobile Apps Quickly - SIGGRAPH 2014
Gil Irizarry
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
Amazon Web Services
 
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
AD113  Speed Up Your Applications w/ Nginx and PageSpeedAD113  Speed Up Your Applications w/ Nginx and PageSpeed
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
edm00se
 
Lambda Architectures in Practice
Lambda Architectures in PracticeLambda Architectures in Practice
Lambda Architectures in Practice
C4Media
 
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer ToolsDevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
Amazon Web Services
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
Lynn Langit
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
C4Media
 
German introduction to sp framework
German   introduction to sp frameworkGerman   introduction to sp framework
German introduction to sp framework
Bob German
 
How Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.comHow Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.com
Salesforce Engineering
 
Develop modern apps using Spring ecosystem at time of BigData
Develop modern apps using Spring ecosystem at time of BigData Develop modern apps using Spring ecosystem at time of BigData
Develop modern apps using Spring ecosystem at time of BigData
Oleg Tsal-Tsalko
 
Dev ops on aws deep dive on continuous delivery - Toronto
Dev ops on aws deep dive on continuous delivery - TorontoDev ops on aws deep dive on continuous delivery - Toronto
Dev ops on aws deep dive on continuous delivery - Toronto
Amazon Web Services
 

Similar to Big data meetup budapest adding data schemas to snowplow (20)

Big Data Beers - Introducing Snowplow
Big Data Beers - Introducing SnowplowBig Data Beers - Introducing Snowplow
Big Data Beers - Introducing Snowplow
 
Scala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaScala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in Scala
 
[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data
 
SpringOne 2016 in a nutshell
SpringOne 2016 in a nutshellSpringOne 2016 in a nutshell
SpringOne 2016 in a nutshell
 
Snowplow presentation for Amsterdam Meetup #3
Snowplow presentation for Amsterdam Meetup #3Snowplow presentation for Amsterdam Meetup #3
Snowplow presentation for Amsterdam Meetup #3
 
ECS19 Elio Struyf - Setting Up Your SPFx CI/CD pipelines on Azure DevOps
ECS19 Elio Struyf - Setting Up Your SPFx CI/CD pipelines on Azure DevOpsECS19 Elio Struyf - Setting Up Your SPFx CI/CD pipelines on Azure DevOps
ECS19 Elio Struyf - Setting Up Your SPFx CI/CD pipelines on Azure DevOps
 
Integrating Splunk into your Spring Applications
Integrating Splunk into your Spring ApplicationsIntegrating Splunk into your Spring Applications
Integrating Splunk into your Spring Applications
 
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
 
Snowplow: open source game analytics powered by AWS
Snowplow: open source game analytics powered by AWSSnowplow: open source game analytics powered by AWS
Snowplow: open source game analytics powered by AWS
 
Make Cross-platform Mobile Apps Quickly - SIGGRAPH 2014
Make Cross-platform Mobile Apps Quickly - SIGGRAPH 2014Make Cross-platform Mobile Apps Quickly - SIGGRAPH 2014
Make Cross-platform Mobile Apps Quickly - SIGGRAPH 2014
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
AD113  Speed Up Your Applications w/ Nginx and PageSpeedAD113  Speed Up Your Applications w/ Nginx and PageSpeed
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
 
Lambda Architectures in Practice
Lambda Architectures in PracticeLambda Architectures in Practice
Lambda Architectures in Practice
 
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer ToolsDevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 
German introduction to sp framework
German   introduction to sp frameworkGerman   introduction to sp framework
German introduction to sp framework
 
How Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.comHow Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.com
 
Develop modern apps using Spring ecosystem at time of BigData
Develop modern apps using Spring ecosystem at time of BigData Develop modern apps using Spring ecosystem at time of BigData
Develop modern apps using Spring ecosystem at time of BigData
 
Dev ops on aws deep dive on continuous delivery - Toronto
Dev ops on aws deep dive on continuous delivery - TorontoDev ops on aws deep dive on continuous delivery - Toronto
Dev ops on aws deep dive on continuous delivery - Toronto
 

More from yalisassoon

Snowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your businessSnowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your business
yalisassoon
 
2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling
yalisassoon
 
Capturing online customer data to create better insights and targeted actions...
Capturing online customer data to create better insights and targeted actions...Capturing online customer data to create better insights and targeted actions...
Capturing online customer data to create better insights and targeted actions...
yalisassoon
 
Snowplow at DA Hub emerging technology showcase
Snowplow at DA Hub emerging technology showcaseSnowplow at DA Hub emerging technology showcase
Snowplow at DA Hub emerging technology showcase
yalisassoon
 
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
yalisassoon
 
The analytics journey at Viewbix - how they came to use Snowplow and the setu...
The analytics journey at Viewbix - how they came to use Snowplow and the setu...The analytics journey at Viewbix - how they came to use Snowplow and the setu...
The analytics journey at Viewbix - how they came to use Snowplow and the setu...
yalisassoon
 
Snowplow Analytics and Looker at Oyster.com
Snowplow Analytics and Looker at Oyster.comSnowplow Analytics and Looker at Oyster.com
Snowplow Analytics and Looker at Oyster.com
yalisassoon
 


Big data meetup budapest adding data schemas to snowplow

  • 1. Adding Data Schemas to Snowplow. Big Data Budapest Meetup - 5 June 2014
  • 2. Agenda today
      1. Introduction to Snowplow
      2. Evolution of Snowplow
      3. The answer: schema all the things!
      4. Snowplow roadmap
      5. Questions
  • 4. Snowplow is an open-source web and event analytics platform, first version released in early 2012
      • Co-founders Alex Dean and Yali Sassoon met at OpenX, the open-source ad technology business, in 2008
      • After leaving OpenX, Alex and Yali set up Keplar, a niche digital product and analytics consultancy
      • We released Snowplow as a skunkworks prototype at start of 2012: github.com/snowplow/snowplow
      • We started working full time on Snowplow in summer 2013
  • 5. We wanted to take a fresh approach to web analytics
      • Your own web event data -> in your own data warehouse
      • Your own event data model
      • Slice / dice and mine the data in highly bespoke ways to answer your specific business questions
      • Plug in the broadest possible set of analysis tools to drive value from your data
      Diagram labels: Data pipeline; Data warehouse; Analyse your data in any analysis tool
  • 6. By spring 2013 we had arrived at a relatively stable batch-based processing architecture
      Diagram labels: Website / webapp; JavaScript event tracker; Snowplow Hadoop data pipeline; CloudFront-based event collector or Clojure-based event collector; Amazon S3; Scalding-based enrichment on Hadoop; Amazon Redshift / PostgreSQL
  • 8. Snowplow is evolving from a web analytics platform into a general event analytics platform
      Diagram labels: Collect event data from any connected device; Data warehouse
  • 9. Web analysts work with a small number of event types; outside of web, the number of possible event types is… infinite
      Diagram labels: Web events; All events (∞)
      Example events: Page view, Order, Add to basket, Page activity, Game saved, Machine broke, Car started, Spellcheck run, Screenshot taken, Fridge empty, App crashed, Disk full, SMS sent, Screen viewed, Tweet drafted, Player died, Taxi arrived, Phonecall ended, Cluster started, Till opened, Product returned
  • 10. There are two historic approaches to dealing with the explosion of possible event types
      • Web analytics vendors: Custom Variables
      • Mobile and app analytics vendors: Schema-less JSONs
  • 11. Custom variables are very restrictive
      1. Take a standard web event, like a page view: Page View
      2. and add custom variables until it becomes something totally different: Page View + vehicle=taxi23 + status=arrived = a “taxi arrived” event, kind of!
  • 12. Schema-less JSONs are better, but they have a different set of problems
      Issues with the event name:
      • Separate from the event properties
      • Not versioned
      • Not unique: HBO video played versus Brightcove video played
      Lots of unanswered questions about the properties:
      • Is length required, and is it always a number?
      • Is id required, and is it always a string?
      • What other optional properties are allowed for a video play?
      Other issues:
      • What if the developer accidentally starts sending “len” instead of “length”? The data will end up split across two separate fields
      • Why does the analyst need to keep an implicit schema in their head to analyze video played events?
  • 13. The answer: schema all the things!
  • 14. When a developer or analyst defines a new event in JSON, let’s ask them to create a JSON Schema for that event
      Annotations on the example schema:
      • Yes, length should always be a number
      • Additional optional field we might not know about otherwise
      • No other fields allowed
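As an illustration of slide 14, here is a minimal Python sketch of what such a JSON Schema could look like for the "video played" event from slide 12, together with a tiny checker. The slide's own schema was an image and is not reproduced in this transcript, so the property names, the `hd` optional field, the `required` list and the `validate` helper are all assumptions made for this example. The checker covers only `required`, `type` and `additionalProperties`; it is not a full JSON Schema validator.

```python
# Illustrative JSON Schema for a hypothetical "video played" event.
# The field names and the "required" list are assumptions for this sketch.
VIDEO_PLAYED_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "length": {"type": "number"},
        "hd": {"type": "boolean"},  # hypothetical optional field
    },
    "required": ["id", "length"],
    "additionalProperties": False,  # "no other fields allowed"
}

# Map the JSON Schema type names used above to Python types.
_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def validate(event, schema):
    """Return a list of problems found in `event`.

    Checks only `required`, per-property `type` and `additionalProperties`.
    """
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in event:
            errors.append(f"missing required property: {name}")
    for name, value in event.items():
        if name not in properties:
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unexpected property: {name}")
            continue
        type_name = properties[name]["type"]
        matches = isinstance(value, _PY_TYPES[type_name])
        # bool is a subclass of int in Python, so exclude it for "number".
        if type_name == "number" and isinstance(value, bool):
            matches = False
        if not matches:
            errors.append(f"wrong type for {name}: expected {type_name}")
    return errors


good = validate({"id": "vid-1", "length": 213}, VIDEO_PLAYED_SCHEMA)
typo = validate({"id": "vid-1", "len": 213}, VIDEO_PLAYED_SCHEMA)
print(good)
print(typo)
```

With a schema like this, the "len" versus "length" mistake from slide 12 is reported when the event is validated, instead of silently splitting the data across two fields.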
  • 15. But we need to let our event definitions evolve, so let’s add versioning. We’re calling this SchemaVer: MODEL-REVISION-ADDITION
      • Start versioning at 1-0-0, so 1-0-0, 1-0-1, 1-0-2, 1-1-0 etc
      • Try to stick to backwards-compatible ADDITION upgrades as much as possible
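The MODEL-REVISION-ADDITION scheme on slide 15 can be sketched in a few lines of Python. The `SchemaVer` class and its `parse` and `bump` methods are hypothetical names for this example, not part of any Snowplow library. Resetting the lower components on a REVISION bump follows the slide's own sequence (1-0-2 then 1-1-0); doing the same on a MODEL bump is an assumption.

```python
from typing import NamedTuple


class SchemaVer(NamedTuple):
    """A MODEL-REVISION-ADDITION version; tuples compare component-wise."""

    model: int
    revision: int
    addition: int

    @classmethod
    def parse(cls, text):
        parts = text.split("-")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"not a SchemaVer string: {text!r}")
        model, revision, addition = (int(p) for p in parts)
        if model < 1:
            raise ValueError("versioning starts at 1-0-0")
        return cls(model, revision, addition)

    def bump(self, kind):
        """Return the next version for the given kind of change."""
        if kind == "ADDITION":
            return SchemaVer(self.model, self.revision, self.addition + 1)
        if kind == "REVISION":
            return SchemaVer(self.model, self.revision + 1, 0)
        if kind == "MODEL":  # assumption: lower components reset to zero
            return SchemaVer(self.model + 1, 0, 0)
        raise ValueError(f"unknown change kind: {kind!r}")

    def __str__(self):
        return f"{self.model}-{self.revision}-{self.addition}"


v = SchemaVer.parse("1-0-2")
print(v.bump("ADDITION"))
print(v.bump("REVISION"))
```

Parsing into integers rather than comparing strings matters once a component reaches two digits: 1-0-10 should sort after 1-0-2.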
  • 16. Where are our schemas going to live? We need a schema repository/registry
      Diagram labels: Schema repo {}; Raw events in JSON format; Enrichment Manager; Enriched events in Thrift or Avro format; Shredder; Enriched events in TSV ready for loading into db
      Uses of the schema repo marked on the diagram:
      1. Test instrumentation
      2. Validate events
      3. Define structure
      4. Drive shredding
      5. Define structure
  • 17. We need to namespace our schemas properly to prevent clashes and confusion in our schema repository
      iglu:com.channel2.vod/video_played/jsonschema/1-0-0
      • The vendor of this event: com.channel2.vod
      • Event name: video_played
      • Schema format: jsonschema
      • Schema version: 1-0-0
      We are calling our schema methodology “Iglu”
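The four parts of the schema URI on slide 17 can be pulled apart with a short Python sketch. The `SchemaKey` tuple, the `parse_iglu_uri` function and the exact character classes in the regular expression are assumptions for this example; only the overall `iglu:vendor/name/format/version` shape comes from the slide.

```python
import re
from typing import NamedTuple

# Assumed pattern: iglu:<vendor>/<name>/<format>/<MODEL-REVISION-ADDITION>
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[a-zA-Z0-9_.-]+)"
    r"/(?P<name>[a-zA-Z0-9_-]+)"
    r"/(?P<format>[a-zA-Z0-9_-]+)"
    r"/(?P<version>\d+-\d+-\d+)$"
)


class SchemaKey(NamedTuple):
    vendor: str
    name: str
    format: str
    version: str


def parse_iglu_uri(uri):
    """Split a schema URI into vendor, event name, schema format and version."""
    match = IGLU_URI.match(uri)
    if match is None:
        raise ValueError(f"not a valid schema URI: {uri!r}")
    return SchemaKey(**match.groupdict())


key = parse_iglu_uri("iglu:com.channel2.vod/video_played/jsonschema/1-0-0")
print(key.vendor, key.name, key.format, key.version)
```

Because the vendor is part of the key, an HBO "video played" and a Brightcove "video played" resolve to different schemas even though the event name is the same.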
  • 18. Bringing it all together, let’s now make the event JSONs self-describing, with a schema header and data body
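A self-describing event as described on slide 18 might look like the Python sketch below. The slide's example was an image and is not in this transcript, so the key names `schema` and `data`, the helper names `wrap` and `unwrap`, and the sample property values are assumptions for illustration.

```python
import json


def wrap(schema_uri, data):
    """Build a self-describing event: a schema header plus a data body."""
    return {"schema": schema_uri, "data": data}


def unwrap(payload):
    """Parse a JSON string and return (schema_uri, data), rejecting other shapes."""
    doc = json.loads(payload)
    if not isinstance(doc, dict) or set(doc) != {"schema", "data"}:
        raise ValueError("expected an object with exactly 'schema' and 'data'")
    return doc["schema"], doc["data"]


event = wrap(
    "iglu:com.channel2.vod/video_played/jsonschema/1-0-0",
    {"id": "vid-1", "length": 213},  # hypothetical property values
)
payload = json.dumps(event)
schema_uri, data = unwrap(payload)
print(schema_uri)
print(data)
```

The point of the header is that a consumer can look up the right schema from the event alone, without the analyst keeping an implicit schema in their head.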
  • 19. And for good measure, let’s add our schema information into the JSON Schema itself
  • 21. Self-describing JSON Schemas are coming in the next release of Snowplow
  • 22. We are also starting to define third-party events for Snowplow integration, starting with Zendesk customer support events
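One way to picture slide 19 is a JSON Schema that carries its own vendor, name, format and version, from which the schema URI can be rebuilt. The sketch below assumes this information sits under a `self` key with those four fields; the slide's example was an image and is not in this transcript, so treat the layout, the `schema_uri_of` helper and the property list as assumptions.

```python
# Hypothetical self-describing JSON Schema: the "self" block is an assumed
# layout that repeats the four parts of the schema URI inside the schema.
VIDEO_PLAYED_SCHEMA = {
    "self": {
        "vendor": "com.channel2.vod",
        "name": "video_played",
        "format": "jsonschema",
        "version": "1-0-0",
    },
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "length": {"type": "number"},
    },
    "additionalProperties": False,
}


def schema_uri_of(schema):
    """Rebuild the iglu: URI from the schema's own "self" block."""
    info = schema["self"]
    return f"iglu:{info['vendor']}/{info['name']}/{info['format']}/{info['version']}"


print(schema_uri_of(VIDEO_PLAYED_SCHEMA))
```

Keeping the identity inside the schema means a schema file copied out of the repository still says which vendor, event and version it describes.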
  • 23. Questions?
      http://snowplowanalytics.com
      https://github.com/snowplow/snowplow
      @snowplowdata
      To chat: @alexcrdean on Twitter or alex@snowplowanalytics.com