Implementing improved and consistent arbitrary event tracking company-wide using Snowplow

Talk on the role Snowplow plays in the larger project to make data accessible to product marketing and other data-driven teams at StumbleUpon. Touches on technical and organizational challenges.

Published in: Technology

  1. 1. Implementing improved and consistent arbitrary event tracking company-wide using Snowplow • Nora Paymer, Sr. Business & Consumer Insights Analyst, StumbleUpon • 10/6/2015, SF Snowplow MeetUp
  2. 2. About me • Hi, I’m Nora • BS & MA in Cognitive Neuroscience – Ask me about sign/speech bilingualism or optical illusions in the brain! • Previous Roles: – UC Berkeley: Institutional Analytics – CBS Interactive: Inventory Analytics – SquareTrade: Marketing/Consumer Insights Analytics • StumbleUpon: Business & Product Analytics
  3. 3. About StumbleUpon • What is StumbleUpon? – Recommendation Engine for the Internet – Ad Platform for native advertisement – Social engagement platform • Still #4 in Referral Traffic* (behind Facebook, Twitter, and Pinterest; ahead of Reddit) • Still alive and kicking! *Shareaholic, Q4 2014 (most recent data available)
  4. 4. My Role • Data Science Team & Finance/Sales Analytics Team, but no dedicated Product or Business Analytics • When I was hired, I was asked to: – Help Product team be a data-driven culture – Make data more available company-wide • Better & easier to change dashboards • Ability for non-data people to access data – Help clean up Data Pipelines • With support from amazing Data Engineering Team
  5. 5. Problems 1. Data siloed all over the place 2. Data inaccessible to most people
  6. 6. Data sources: Protobuf messages, MySQL, HBase/Hive, MixPanel, FireBase, Adjust, App Annie, Desk.com, Sales Force, StrongView • Other data all over the place • No way to integrate with user/stumble/activity data • Only accessible by a couple people each • Only place to access most real site data • Dashboards all made with R/Shiny • Queries done at terminal, only by Data Science/Analytics Team • Hive/MapReduce is slow for real-time data querying!
  7. 7. Solutions 1. Copy product data to quicker/more universal data solution 2. Implement BI tool (Looker)
  8. 8. Data sources: Protobuf messages, MySQL, HBase/Hive, MixPanel, FireBase, Adjust, App Annie, Desk.com, Sales Force, StrongView • Send data to RedShift for faster querying • Connect RedShift to Looker: – Dashboards – GUI Query Builder
  9. 9. Problems 1. Data siloed all over the place 2. Data inaccessible to most people 3. Difficult for teams to add new events – Only “official” solution was protobuf messages, which was slow and needed to go through Engineering/Data Science/Me just to record a button click – Teams started using MixPanel, which is expensive and limited
  10. 10. Solutions 1. Copy product data to quicker/more universal data solution 2. Implement BI tool (Looker) 3. Replace MixPanel with Snowplow for arbitrary Event Reporting – Sends data to RedShift for easy integration with other data – Easy for teams to add new events
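A minimal sketch of what "easy integration with other data" can look like once tracked events and product data sit in the same SQL store. It uses Python's built-in `sqlite3` as a stand-in for RedShift; the table names (`users`, `tracked_events`), columns, and rows are all hypothetical and are not taken from the talk.

```python
import sqlite3

# In-memory SQLite database standing in for the shared warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical product table and hypothetical tracked-events table.
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, signup_platform TEXT);
    CREATE TABLE tracked_events (user_id INTEGER, event_name TEXT);
    INSERT INTO users VALUES (1, 'iOS'), (2, 'site');
    INSERT INTO tracked_events VALUES (1, 'thumbup'), (1, 'thumbup'), (2, 'thumbup');
""")

# Because both tables live in one place, a single join answers
# "how many tracked events per signup platform?"
rows = conn.execute("""
    SELECT u.signup_platform, COUNT(*) AS n_events
    FROM tracked_events AS e
    JOIN users AS u ON u.user_id = e.user_id
    GROUP BY u.signup_platform
    ORDER BY u.signup_platform
""").fetchall()

print(rows)
```

The point of the sketch is only the join: with event data siloed in a separate tool, the same question would require exporting and matching records by hand.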
  11. 11. Data Sources: Protobuf messages, MySQL, HBase/Hive, MixPanel, FireBase, Adjust, App Annie, Desk.com, Sales Force, StrongView; plus RedShift, Looker, Snowplow
  12. 12. Problems 1. Data siloed all over the place 2. Data inaccessible to most people 3. Difficult for teams to add new events 4. So many teams! So much integration! – Mobile (iOS & Android), Site (back end & front end), Ads, Marketing (including install referral info & email marketing & other), Firefox & Chrome toolbars, etc. etc.
  13. 13. How we did it Intended Plan: 1. Site implements default page tracker 2. Site implements 2-3 events to make sure flow is working properly – Structured Events 3. Assess if everything is working 4. Mobile implements 2-3 events per platform 5. Then roll out everywhere
  14. 14. How we did it What Actually Happened: 1. Site implemented default page tracker 2. Site implemented ~100 events – Structured Events 3. Mobile replaced all MixPanel events with Snowplow – Structured Events – Some trouble with implementation/integration with Android – Used a wiki page created by a site engineer, which had confusing language and did some things weirdly 4. Testing??
  15. 15. Uh-Oh • Structured Events not really the right thing: – Didn’t have userid implemented properly originally – More fields were going to be needed
      Snowplow Term | Our Use
      Category | Event Name (e.g. thumbup)
      Action | Event Type (e.g. click vs view)
      Label | Platform (site, iOS…)
      Property | Version #
      Value | When event had a value associated with it
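The mapping on this slide can be sketched in a few lines of Python. This is an illustration only, assuming a plain dataclass rather than any Snowplow tracker library; `StructuredEvent`, `build_structured_event`, and the sample version string are hypothetical names and values. It shows why the five slots ran out quickly: once name, type, platform, and version are stored, nothing is left for a user id or any further field.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class StructuredEvent:
    """The five slots a structured event offers (hypothetical container)."""
    category: str
    action: str
    label: Optional[str] = None
    property_: Optional[str] = None
    value: Optional[float] = None


def build_structured_event(event_name, event_type, platform, version, value=None):
    """Map the team's concepts onto the five slots, as on the slide (hypothetical helper)."""
    return StructuredEvent(
        category=event_name,   # Category -> event name, e.g. "thumbup"
        action=event_type,     # Action   -> event type, e.g. "click" vs "view"
        label=platform,        # Label    -> platform, e.g. "site", "iOS"
        property_=version,     # Property -> version number
        value=value,           # Value    -> only when the event carries a value
    )


event = build_structured_event("thumbup", "click", "iOS", "1.0")
slot_names = [f.name for f in fields(StructuredEvent)]
print(event)
print(slot_names)  # every slot is already spoken for
```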
  16. 16. So? Switch to Unstructured Events! Easy, right? • OK great, come up with a new framework for Unstructured Events! – Some required fields across all events – Some optional fields that we know will be widely used from day 1 – Nature of unstructured events is that more fields could be added later
      Field | Req’d? | Description
      event_name | y | Event name
      platform | y | site, iOS, Android, etc.
      device_version | y | Version number (standard field)
      event_category | n | e.g. click, view; useful for filtering
      event_group | n | For defining a group of events, for filtering
      value | n | For events with a value
      referrer | n | Referral source (when applicable)
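One way to keep every team on the same framework is a small shared check run before an event is sent. The sketch below is an assumption about how that could be done, not something described in the talk: `validate_event` and its `extra_allowed` parameter are hypothetical, and only the field names come from the slide.

```python
# Field names from the slide; everything else here is a hypothetical sketch.
REQUIRED_FIELDS = {"event_name", "platform", "device_version"}
OPTIONAL_FIELDS = {"event_category", "event_group", "value", "referrer"}


def validate_event(payload, extra_allowed=frozenset()):
    """Return a list of problems; an empty list means the payload fits the framework.

    extra_allowed holds fields a team has agreed to add later, reflecting the
    point that unstructured events can grow new fields over time.
    """
    problems = []
    present = set(payload)
    for name in sorted(REQUIRED_FIELDS - present):
        problems.append(f"missing required field: {name}")
    known = REQUIRED_FIELDS | OPTIONAL_FIELDS | set(extra_allowed)
    for name in sorted(present - known):
        problems.append(f"unknown field: {name}")
    return problems


ok = validate_event({"event_name": "thumbup", "platform": "iOS", "device_version": "1.0"})
bad = validate_event({"event_name": "thumbup", "colour": "red"})
print(ok)
print(bad)
```

A check like this makes the "required across all events" rule enforceable without asking each team to read the same wiki page the same way.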
  17. 17. Sounds good so far! • Teams that had already implemented Structured did not want to implement Unstructured – They had already spent Eng time on this, why spend more? • Everyone is always on a tight timeline – Had trouble seeing the value in the format of their events matching the format of teams they didn’t work with. • Result? Arguments and top-down mandates
  18. 18. What should we have done differently? 1. Program management across all teams – Didn’t have anyone officially in charge 2. Implement in phases: do test events & a test project before going full live 3. Excellent Documentation 4. Get buy-in from everyone from day one 5. Think through dream/far-fetched use cases: what will you need for that? 6. Use Snowplow team for advice!
  19. 19. So now what? • Still working on it • Connecting all existing data pipelines to RedShift, sometimes via Snowplow • Better utilizing Snowplow when back end tracking is too cumbersome – Referral Tracking: both reg and landing page – Better understanding of engagement and Time on Site (for non-stumble pages especially) – Understanding user flow through the site – Etc. etc. etc, hopefully!
  20. 20. Protobuf messages, MySQL, HBase/Hive, MixPanel, FireBase, Adjust, App Annie, Desk.com, Sales Force, StrongView; plus RedShift, Looker, Snowplow • New Data!
  21. 21. Thank You! Questions, etc?
