How Snowplow and data scientists are transforming the web analytics industry (and creating a new event analytics industry)


Contrasting approaches to using data to answer business questions from web analysts and data scientists - and how that is changing the web analytics industry

Published in: Business, Technology

Transcript

  • 1. Web analytics is dead! Long live event analytics
      • How data scientists and big data tech are killing one industry and creating another
      • What role Snowplow plays
  • 2. Web analytics is a big industry
      • Spend in just the US on web analytics software (Adobe Sitecatalyst, Webtrends, Google Analytics Premium etc.) estimated at $500m and growing 17–20% p.a. in 2011*
      • Likely that that amount is spent again on consulting services related to the use of web analytics data
      • Whole industry of web consultants, e.g.:
          • Semphonic in the USA (bought by Ernst and Young)
          • Logan Tod in the UK (bought by PwC)
          • Big 4 accounting firms only buy businesses they can sell into (tens of) thousands of companies
      • Whole ecosystem around web analytics
          • “Digital analytics professionals” – it is a career path (retailers, media agencies)
          • Events, books, organisations geared towards web analysts
      *Source: Quora http://www.quora.com/Web-Analytics-what-is-the-size-of-the-web-analytics-market
  • 3. Web analytics is an old industry, predating the recent wave in big data technology
      • Web analytics
          • 1990: Web is born
          • 1993: Log file based web analytics
          • 1996 / 1997: JavaScript tagging
          • …
      • Big data
          • 2004: Google publishes MapReduce paper
          • 2006: Hadoop project split out of Nutch
          • 2008: Facebook develops Hive
          • 2010: Google publishes Dremel paper
          • 2011: Twitter open sources Storm
  • 4. Two problems with web analytics that stem from the fact that web analytics came of age in the 1990s
      • The web was static, hyperlinked documents
          • The entities and events that web analytics programmes understand are limited
          • Page views, link clicks, transactions, goals, sessions, visitors
          • Result: hard to model the rich interactions in today’s interactive webapps
      • Tech to handle massive data sets was prohibitively expensive
          • Web analytics programmes aggregate raw data to reduce data volumes
          • This requires specifying in advance how data can be analysed, so that the data can be ‘precut’
          • Result: web analytics reporting is very inflexible
  • 5. In particular, web analytics’ insistence on aggregating data is anathema to data scientists
      • Data scientist approach: “Give me the data and I’ll figure out how to answer the question”
      • Web analytics approach: “You can’t get your answer from one of our pre-canned reports? Have a go with our ‘advanced report-builder’”
      • What if I want to:
          • Build a model?
          • Understand underlying causality?
          • Use the data in my web application?
          • Dynamically optimize spend / content?
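The contrast drawn on slides 4 and 5 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the event records, their field names, and the helper `users_who_did` are hypothetical and are not taken from the talk or from any particular analytics product. The sketch shows a report whose shape is fixed in advance next to a question that can only be answered from the event-level data.

```python
from collections import Counter

# Hypothetical event-level records, in the order they occurred.
# Field names are illustrative only.
events = [
    {"user": "u1", "type": "page_view", "page": "/home"},
    {"user": "u1", "type": "add_to_basket", "page": "/product/1"},
    {"user": "u2", "type": "page_view", "page": "/home"},
    {"user": "u2", "type": "page_view", "page": "/product/1"},
    {"user": "u1", "type": "transaction", "page": "/checkout"},
]

# A "precut" report: page views per page. Once only this aggregate is
# stored, questions about individual users can no longer be asked.
page_view_report = Counter(
    e["page"] for e in events if e["type"] == "page_view"
)


def users_who_did(events, first, then):
    """Return the users who performed `first` and later performed `then`.

    This needs the ordered, per-user events; it cannot be derived from
    the aggregated page view counts above.
    """
    seen_first = set()
    matched = set()
    for e in events:
        if e["type"] == first:
            seen_first.add(e["user"])
        elif e["type"] == then and e["user"] in seen_first:
            matched.add(e["user"])
    return matched
```

With the raw events available, a new question such as `users_who_did(events, "add_to_basket", "transaction")` is a few lines of code rather than a request for a new report type.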
  • 6. We built Snowplow to address the two weaknesses in the web analytics approach
      • Describe web events in much richer grammar and vocabulary
      • Liberate your data
          • Where you store your data has a big impact on what types of analyses you can quickly run on it
  • 7. Snowplow is an event data collection and warehousing platform
      • Snowplow data pipeline:
          • Data comes from: website / webapp, mobile apps, other applications (e.g. on games consoles, connected TVs, desktops, connected devices)
          • Collect
          • Transform and enrich
          • Data lands in: Amazon S3, Amazon Redshift / PostgreSQL, other (Neo4J, Big Query…)
      • Snowplow delivers your complete, granular event data in your own data warehouse(s), so you can plug in any tool to analyse it
  • 8. Snowplow is composed of a set of loosely coupled subsystems, architected to be robust and scalable
      • 1. Trackers: generate event data
          • Examples: Javascript tracker; Ruby / Lua / No-JS / Arduino tracker
      • 2. Collectors: receive data from trackers and put it in a queue
          • Examples: Cloudfront collector; Clojure collector for Amazon EB
      • 3. Enrich: clean and enrich raw data
          • Built on Scalding / Cascading / Hadoop and powered by Amazon EMR
      • 4. Storage: store data ready for analysis
          • Examples: Amazon Redshift; PostgreSQL; Amazon S3
      • 5. Analytics
      • A to D: standardised data protocols between the subsystems
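The idea of loosely coupled stages that talk to each other only through an agreed data format can be sketched as a toy pipeline. This is not Snowplow's code, API, or protocol: the function and class names, the JSON line format, and the enrichment rule are all assumptions invented for this illustration.

```python
import json
from collections import deque


def track(event_type, **properties):
    """Tracker: generate an event as a plain dict (hypothetical format)."""
    return {"type": event_type, "properties": properties}


class Collector:
    """Collector: receive events from trackers and put them in a queue."""

    def __init__(self):
        self.queue = deque()

    def receive(self, event):
        # Events are queued as serialised lines, so the next stage
        # depends only on the line format, not on the tracker.
        self.queue.append(json.dumps(event))


def enrich(raw_line):
    """Enrich: parse a raw line, clean it, and add a derived field."""
    event = json.loads(raw_line)
    event["type"] = event["type"].strip().lower()
    url = event["properties"].get("url", "")
    # Illustrative enrichment rule, not a real Snowplow enrichment.
    event["is_product_page"] = url.startswith("/product/")
    return event


class Storage:
    """Storage: hold enriched events ready for analysis."""

    def __init__(self):
        self.rows = []

    def load(self, event):
        self.rows.append(event)


def run_pipeline(collector, storage):
    """Drain the collector queue through enrichment into storage."""
    while collector.queue:
        storage.load(enrich(collector.queue.popleft()))
```

Because each stage only reads the output format of the previous one, any stage in this sketch could be swapped (a different tracker, a different store) without touching the others, which is the property the slide describes.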
  • 9. Snowplow is open source and cloud-based
      • Open source but easy to deploy via integration with Amazon Web Services (cloud infrastructure)
      • Our technology is free!
          • Collecting massive quantities of digital event data should be easy and cheap…
          • … so that we can focus time and effort on using the data productively
      • We charge for Professional Services on top of our platform
          • More value in how you use the data than in collecting / storing it
          • Lots of scope to build applications on top of our platform going forwards
  • 10. Our users…
  • 11. …use our tech to solve some of their most intractable problems
      • What is the impact of different ad campaigns and creative on the way users subsequently behave? What is the return on that ad spend?
      • How do visitors use social channels (Facebook / Twitter) to interact around video content? How can we predict which content will “go viral”?
      • How do updates to our product change the “stickiness” of our service? ARPU? Does that vary by customer segment?
  • 12. We believe that event data is one of the most exciting data sources to work with, today
  • 13. We are only at the beginning of figuring out how to use this data…
      • How do we represent different types of event sequence?
      • What makes journeys similar and what makes them different? How can we cluster them?
      • How can we “spot” those events that are predictive of future events? Of consumer value? Of consumer interest?
      • How can we unpick the effects of marketing / digital products and users’ predispositions on the way sequences of events unfold?
      • How best should we model different users at different points on different types of journeys?
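As one naive starting point for the first two questions on slide 13, a journey can be represented as the ordered list of a user's event types, and two journeys compared by the consecutive event pairs they share. This is only an illustrative sketch, not an approach proposed in the talk: the tuple layout `(user, timestamp, event_type)` and the function names are assumptions, and Jaccard similarity on event pairs is a simple stand-in for whatever measure a real analysis would use.

```python
from collections import defaultdict


def journeys(events):
    """Group (user, timestamp, event_type) tuples into per-user sequences,
    ordered by timestamp."""
    by_user = defaultdict(list)
    for user, _ts, event_type in sorted(events, key=lambda e: (e[0], e[1])):
        by_user[user].append(event_type)
    return dict(by_user)


def bigrams(sequence):
    """Return the set of consecutive event pairs in a sequence."""
    return set(zip(sequence, sequence[1:]))


def similarity(a, b):
    """Jaccard similarity of the consecutive event pairs of two journeys."""
    pairs_a, pairs_b = bigrams(a), bigrams(b)
    if not pairs_a and not pairs_b:
        return 1.0  # two journeys with no transitions are treated as identical
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)
```

A pairwise similarity like this could feed a standard clustering method, although it ignores timing, repetition counts, and longer-range structure, which are exactly the open questions the slide raises.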
  • 14. We hope people like you will use our tech to do amazing things with the data!
      • Questions?
      • More information
          • Snowplow repo: https://github.com/snowplow/snowplow
          • Twitter: @SnowPlowData
          • Website: http://snowplowanalytics.com
          • My LinkedIn:
          • My Twitter:
