Talk on the role Snowplow plays as part of the larger project to make data accessible to product marketing and other data-driven teams at StumbleUpon. Touches on technical and organizational challenges.
Implementing improved and consistent arbitrary event tracking company-wide using Snowplow
1. Implementing improved and consistent arbitrary event tracking company-wide using Snowplow
Nora Paymer
Sr. Business & Consumer Insights Analyst,
StumbleUpon
10/6/2015
SF Snowplow MeetUp
2. About me
• Hi, I’m Nora
• BS & MA in Cognitive Neuroscience
– Ask me about sign/speech bilingualism or
optical illusions in the brain!
• Previous Roles:
– UC Berkeley: Institutional Analytics
– CBS Interactive: Inventory Analytics
– SquareTrade: Marketing/Consumer Insights
Analytics
• StumbleUpon: Business & Product Analytics
3. About StumbleUpon
• What is StumbleUpon?
– Recommendation Engine for the Internet
– Ad Platform for native advertisement
– Social engagement platform
• Still #4 in Referral Traffic* (behind
Facebook, Twitter, and Pinterest; ahead of
Reddit)
• Still alive and kicking!
*Shareaholic, Q4 2014 (most recent data available)
4. My Role
• Data Science Team & Finance/Sales
Analytics Team, but no dedicated Product or
Business Analytics
• When I was hired, I was asked to:
– Help Product team be a data-driven culture
– Make data more available company-wide
• Better & easier to change dashboards
• Ability for non-data people to access data
– Help clean up Data Pipelines
• With support from amazing Data Engineering Team
6. Data sources
• Other data all over the place
• No way to integrate with user/stumble/activity data
• Only accessible by a couple people each
• Only place to access most real site data
• Dashboards all made with R/Shiny
• Queries done at terminal, only by Data Science/Analytics Team
• Hive/MapReduce is slow for real-time data querying!
[Diagram of data sources: Protobuf messages, MySQL, HBase/Hive, MixPanel, FireBase, Adjust, App Annie, Desk.com, SalesForce, StrongView]
9. Problems
1. Data siloed all over the place
2. Data inaccessible to most people
3. Difficult for teams to add new events
– Only “official” solution was protobuf messages,
which was slow and needed to go through
Engineering/Data Science/Me just to record a
button click
– Teams started using MixPanel, which is
expensive and limited
10. Solutions
1. Copy product data to quicker/more
universal data solution
2. Implement BI tool (Looker)
3. Replace MixPanel with Snowplow for
arbitrary Event Reporting
– Sends data to RedShift for easy integration
with other data
– Easy for teams to add new events
12. Problems
1. Data siloed all over the place
2. Data inaccessible to most people
3. Difficult for teams to add new events
4. So many teams! So much integration!
– Mobile (iOS & Android), Site (back end & front
end), Ads, Marketing (including install referral
info & email marketing & other), Firefox &
Chrome toolbars, etc. etc.
13. How we did it
Intended Plan:
1. Site implements default page tracker
2. Site implements 2-3 events to make sure
flow is working properly
– Structured Events
3. Assess if everything is working
4. Mobile implements 2-3 events per platform
5. Then roll out everywhere
14. How we did it
What Actually Happened:
1. Site implemented default page tracker
2. Site implemented ~100 events
– Structured Events
3. Mobile replaced all MixPanel events with
Snowplow
– Structured Events
– Some trouble with implementation/integration with
Android
– Used a wiki page created by a site engineer; it had
confusing language and did some things weirdly
4. Testing??
15. Uh-Oh
• Structured Events not really the right thing:
• Didn’t have userid implemented properly
originally
• More fields were going to be needed
Snowplow Term | Our Use
Category      | Event Name (e.g. thumbup)
Action        | Event Type (e.g. click vs view)
Label         | Platform (site, iOS…)
Property      | Version #
Value         | When event had a value associated with it
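To make the mapping in the table concrete, here is a minimal Python sketch of how our fields were packed into the five structured-event slots. It does not call any Snowplow tracker; `StructuredEvent`, `make_structured_event`, and the version string "4.2.0" are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StructuredEvent:
    """The five slots of a Snowplow structured event."""
    category: str
    action: str
    label: Optional[str] = None
    property_: Optional[str] = None  # trailing underscore avoids shadowing the builtin
    value: Optional[float] = None


def make_structured_event(event_name, event_type, platform, version, value=None):
    """Hypothetical helper: pack our fields into the slots as in the table."""
    return StructuredEvent(
        category=event_name,  # Category <- event name, e.g. "thumbup"
        action=event_type,    # Action   <- event type, e.g. "click" or "view"
        label=platform,       # Label    <- platform, e.g. "site" or "iOS"
        property_=version,    # Property <- version number
        value=value,          # Value    <- only when the event carries one
    )


# Example values are placeholders, not real StumbleUpon data.
event = make_structured_event("thumbup", "click", "iOS", "4.2.0")
```

With all five slots already spoken for, any further field has nowhere to go, which is the limitation this slide describes.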
16. So? Switch to Unstructured Events! Easy, right?
• OK great, come up with a new framework for
Unstructured Events!
– Some required fields across all events
– Some optional fields that we know will be widely
used from day 1
– Nature of unstructured events is that more fields
could be added later
Field          | Req’d? | Description
event_name     | y      | Event name
platform       | y      | site, iOS, Android, etc.
device_version | y      | Version number (standard field)
event_category | n      | e.g. click, view (useful for filtering)
event_group    | n      | For defining a group of events, for filtering
value          | n      | For events with a value
referrer       | n      | Referral source (when applicable)
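The framework in the table can be sketched as a small validator that checks the required fields and wraps the result in the schema-plus-data envelope that Snowplow uses for self-describing JSON. This is an illustration under assumptions, not the code we shipped: the function name `build_unstructured_event`, the schema URI, and the example values are all hypothetical, and no Snowplow tracker is called.

```python
import json

# Field names come from the table; everything else here is a placeholder.
REQUIRED_FIELDS = ("event_name", "platform", "device_version")
OPTIONAL_FIELDS = ("event_category", "event_group", "value", "referrer")
HYPOTHETICAL_SCHEMA = "iglu:com.example/generic_event/jsonschema/1-0-0"


def build_unstructured_event(**fields):
    """Validate fields against the framework and return a schema/data envelope."""
    missing = [name for name in REQUIRED_FIELDS if fields.get(name) in (None, "")]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    # Reject unknown names so every team sends the same shape; adding a
    # field later means extending OPTIONAL_FIELDS in one shared place.
    unknown = sorted(set(fields) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS))
    if unknown:
        raise ValueError("unknown fields: " + ", ".join(unknown))

    # Drop optional fields that were passed explicitly as None.
    data = {key: val for key, val in fields.items() if val is not None}
    return {"schema": HYPOTHETICAL_SCHEMA, "data": data}


payload = build_unstructured_event(
    event_name="thumbup",
    platform="iOS",
    device_version="4.2.0",  # placeholder version string
    event_category="click",
)
print(json.dumps(payload, indent=2))
```

Keeping the required and optional names in one shared definition is one way to get the cross-team consistency the framework is after.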
17. Sounds good so far!
• Teams that had already implemented
Structured Events did not want to implement
Unstructured Events
– They had already spent Eng time on this, why
spend more?
• Everyone is always on a tight timeline
– Had trouble seeing the value in the format of
their events matching the format of teams they
didn’t work with.
• Result? Arguments and top-down mandates
18. What should we have done differently?
1. Program management across all teams
– Didn’t have anyone officially in charge
2. Implement in phases: do test events & a
test project before going full live
3. Excellent Documentation
4. Get buy-in from everyone from day one
5. Think through dream/far-fetched use
cases: what will you need for that?
6. Use Snowplow team for advice!
19. So now what?
• Still working on it
• Connecting all existing data pipelines to
RedShift, sometimes via Snowplow
• Better utilizing Snowplow when back end
tracking is too cumbersome
– Referral Tracking: both reg and landing page
– Better understanding of engagement and Time
on Site (for non-stumble pages especially)
– Understanding user flow through the site
– Etc. etc. etc, hopefully!