(Re-)introducing SnowPlow


A deck describing what we believe is wrong with web analytics in 2012, and how we have architected SnowPlow to address those problems in a fresh way.

Published in: Technology

  1. 1. Introducing SnowPlow: A new approach to web analytics
  2. 2. A lot is wrong with web analytics today…
     Narrow focused
       • Focus on marketing-related analytics (visits, click-throughs, conversions)
       • Focus on ecommerce sites (limited number of goals, limited set of clearly defined workflows, e.g. sign up to email, purchase product)
       • No analytics for SaaS-based businesses, drivers of customer value, product analytics
     Inflexible
       • Hard to perform analyses on users / customers that span multiple visits
       • Hard to examine the ways users actually engage on sites (esp. for SaaS / web apps), aggregate customer journeys
       • Hard to map and segment users based on their behaviour and customer journeys
       • Limited tools to pick out the root cause of differences in customer journey
     Too high level AND too low level
       • Too high level: impractical or impossible to zoom in on individual customers and events
       • Too low level: hard to see the wood for the trees in a sea of data / pre-defined views
     Siloed
       • Hard to integrate with other sources of customer data including CRM, email marketing, social marketing, customer service, financial systems, ad serving systems
       • Typically separated from other business intelligence systems, with each system used to answer different types of business questions
  3. 3. …with bad consequences for businesses
     Cannot answer important business questions
       • Questions related to the customer base
         – Who are our most valuable customers?
         – How can I spot them in advance?
         – What are the “sliding doors” moments in a customer’s journey that impact their future value?
         – How does our customer base break down, by behaviour?
         – How well do I serve each segment?
         – How well do I monetize each segment?
         – Where are the best opportunities for growing the value of my customer base?
       • Product development questions
         – How successful has each product iteration been at driving user engagement?
         – Does our product work better for some customer segments than others? If so, why?
         – Does our product work better at some parts of the customer journey than others? Where?
         – Where should we focus product development efforts?
     Hard to export web analytics data to answer questions in other systems
       • Two reasons to export our data:
         – So that we can answer business questions using this data in another (more appropriate) system
         – So that we can use this data in other value-generating ways, e.g. drive product / content recommendation, service personalisation
       • Sometimes impossible
         – Impossible to export granular data out of Google Analytics
       • Otherwise expensive
         – Enterprise web analytics products charge for export based on data volumes, making export expensive for large data sets
       • Hard to house exported data
         – Web analytics systems generate large volumes of data, which can be costly to warehouse and query
  4. 4. SnowPlow takes a radically new approach to web analytics…
     Traditional approach
       1. What reports do we want to deliver?
       2. What data do we collect to support those reports?
     SnowPlow approach
       1. What is all the available data that we could ever want?
       2. What tools will empower our analysts to answer any possible biz Q?
  5. 5. …one that starts from the principle of having all the data
     Capture all data
       • All data is captured via easy-to-implement JavaScript tags
       • Light-weight event tracking makes it easy to capture any type of online behaviour
       • No limits on the number, type or categories of events or variables that can be assigned
       • Data is stored in Amazon S3 for scalability
       • Data can be enriched from other 1st and 3rd party sources. (Data can be exported and imported)
     Complete data ownership
       • Data capture is via 1st party cookies
       • JavaScript tracking and ETL source code is open source
       • All data is stored in SnowPlow users’ own Amazon S3 accounts
     Powerful analytics toolset
       • Latest big data and cloud computing technologies for data storage and querying
       • Data is queried using Facebook-developed Apache Hive via Elastic MapReduce, making it easy to run queries against enormous data sets
       • Possible to run any big data analytics toolset (e.g. Mahout, Cascalog, Microstrategy) on SnowPlow data
  6. 6. To date, SnowPlow users can query data using Apache Hive, which is great for analysts but bad for business users
     Hive is a data warehousing platform
       • Built on top of Hadoop: scalable
       • Developed at Facebook, but now widely used at e.g. Netflix, OpenX, The Globe and Mail
       • Enables analysts to query data using SQL
     SnowPlow data is stored in a single Hive table
       • Each line of data represents one event (e.g. page view, add-to-basket, video play, ad view etc)
       • Each line of data includes a user_id and visit_id
     Pros
       • Easy for anyone with SQL knowledge to run queries
       • Straightforward to aggregate data
       • Straightforward to ingest new data sources to enrich the web analytics data (e.g. CRM data, media catalogues)
       • Interactive UI allows for ad hoc query development sessions
       • Straightforward to export aggregated data sets into other tools
       • Possible to schedule jobs to populate e.g. KPI dashboard
     Cons
       • Command-line interface not suitable for many business people
       • No in-built data visualisation capability. (Have to export data to a separate application)
       • KPI dashboards can be driven from Hive analysis, but always require the integration of another application
  7. 7. Our priority now is to develop the toolset to answer business questions using all this analytics data
     SnowPlow web analytics data feeds three areas:
     KPIs and standard reports
       • Enable analysts to easily create and distribute KPI dashboards and reports including on customer lifetime value and cohort analysis
       • Reports will vary in scope e.g. for management team, marketing teams, product development team etc.
     Ad hoc analytics
       • Enable analysts with more limited SQL and programming knowledge to query data e.g. pivot tables, data visualisation tools
       • Statistical and machine learning tools to perform e.g. behavioural segmentations of customer base, predict likely customer lifetime value
     Operational systems e.g. recommendation engines, marketing
       • Use SnowPlow data in live systems e.g. in-store product recommendation…
       • …or to send personalised marketing to customers to drive up customer satisfaction
     Some of the analytics tools we develop will be offered as cloud-based solutions, for a monthly subscription
  8. 8. Whilst many of the tools are not yet developed, we recommend installing SnowPlow today
       1. Start warehousing your web analytics data using SnowPlow today
       2. Start using the already available (free, open source) tools, particularly Apache Hive, to drive insight from your user data today
       3. Have a large data set ready for when our more business-friendly analytics tools become available
     Download SnowPlow from GitHub: github.com/snowplow/snowplow
     Contact Keplar LLP for support and consultancy: www.keplarllp.com
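
Slide 6 describes the data model: one table, one row per event, with a user_id and visit_id on every row, queried in SQL. The sketch below illustrates why that layout makes cross-visit questions easy to ask. It is not SnowPlow's actual schema or a Hive session: it uses Python's built-in sqlite3 module as a local stand-in for Hive, and the table name `events`, the columns `event_type` and `revenue`, and the sample rows are all hypothetical. Only `user_id` and `visit_id` come from the deck.

```python
import sqlite3

# Hypothetical sample rows: (user_id, visit_id, event_type, revenue).
# Only user_id and visit_id are named in the deck; the rest is illustrative.
EVENTS = [
    ("u1", 1, "page_view", 0.0),
    ("u1", 1, "add_to_basket", 0.0),
    ("u1", 2, "page_view", 0.0),
    ("u1", 2, "purchase", 30.0),
    ("u2", 1, "page_view", 0.0),
    ("u2", 1, "video_play", 0.0),
]


def load_events(rows):
    """Create an in-memory table with one row per event and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "user_id TEXT, visit_id INTEGER, event_type TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return conn


def per_user_summary(conn):
    """Aggregate across all of a user's visits in a single GROUP BY."""
    query = """
        SELECT user_id,
               COUNT(DISTINCT visit_id) AS visits,
               COUNT(*)                 AS events,
               SUM(revenue)             AS revenue
        FROM events
        GROUP BY user_id
        ORDER BY user_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = load_events(EVENTS)
    for row in per_user_summary(connection):
        print(row)
```

Because every event row carries both identifiers, an analysis that spans multiple visits (one of the gaps listed on slide 2) needs no joins, only a grouping on user_id. A similar SELECT with GROUP BY could plausibly be run in Hive, though the exact table and column names would depend on the real SnowPlow table definition.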