Snowplow Analytics and Looker at Oyster.com

Presentation by Ben Hoyt and Devon Pohl on their journey with Snowplow at Oyster.com, presented at the first Snowplow Meetup New York in March 2016

Published in: Data & Analytics

  1. 1. SNOWPLOW AND LOOKER AT OYSTER.COM SNOWPLOW MEETUP NYC – MARCH 30, 2016 BEN HOYT, DEVON POHL
  2. 2. WHAT IS OYSTER.COM? • “The Hotel Tell-All” • Authentic hotel reviews and photos • We visit every hotel in person • 1000 hotels per month • 7M high-res photos • 100k 360° panoramas
  3. 3. (SOME OF) OUR TECH STACK • Python to run our backend: web, scripting, photo processing, ETL • PostgreSQL for all content data (e.g. hotels, metadata for 12M images) • Amazon S3 for image storage, EC2 spot instances for photo processing • Amazon Redshift for analytics and reporting data • Looker for reporting and visualizations • Snowplow for analytics tracking and analytics ETL
  4. 4. GOOGLE ANALYTICS V. SNOWPLOW Google Analytics • Good for web, but little control and flexibility • Hard to get data out of (your data!) • Crazy pricing model ($0 for free tier, or $150,000/y for premium) • Can only do web analytics, not other business reporting Snowplow • Free and open source, with great support and paid tiers • Puts data into a standard, easily-queryable database (Redshift) • Focuses on tracking and analytics ETL and does that part well
  5. 5. WHY & HOW WE SWITCHED (1 YEAR AGO) • We were considering Looker for reporting and visualization • Looker rep: “majority of our customers use Snowplow to collect their data” • We dug into Snowplow and liked what we saw • Initially the design felt a bit overkill, but it’s definitely built to scale • We implemented the tracking and pipeline, and haven’t looked back
  6. 6. OUR CONTEXT SCHEMA • We use one “custom fields” schema to rule them all • Simple, one table, one SQL join gives us all our custom fields
     {
       "self": {"name": "custom_fields", "vendor": "com.oyster", "version": "1-0-9"},
       "properties": {
         "page_type": {"type": "string"},
         "page_subtype": {"type": "string"},
         "template_type": {"type": "string", "enum": ["desktop", "mobile"]},
         "hotel_id": {"$ref": "#/definitions/positiveInteger32"},
         "account_id": {"$ref": "#/definitions/positiveInteger32"},
         "ab_cell": {"type": "integer", "minimum": 1, "maximum": 20},
         "checkin_date": {"type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"},
         ...
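
The "one SQL join" point on slide 6 can be sketched roughly as follows. This is an illustration only, not Oyster's actual query: Python's built-in sqlite3 stands in for Redshift, and the table names (`events`, `com_oyster_custom_fields_1`), the `root_id` join key (following Snowplow's usual convention for context tables in Redshift), and the sample rows are all assumptions.

```python
import sqlite3

# In-memory SQLite stands in for Redshift; all table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        event_id TEXT PRIMARY KEY,
        domain_userid TEXT,
        page_urlpath TEXT
    );
    CREATE TABLE com_oyster_custom_fields_1 (
        root_id TEXT,           -- assumed to reference events.event_id
        page_type TEXT,
        template_type TEXT,
        hotel_id INTEGER,
        ab_cell INTEGER
    );
    INSERT INTO events VALUES
        ('e1', 'u1', '/hotels/example-a/'),
        ('e2', 'u2', '/hotels/example-b/');
    INSERT INTO com_oyster_custom_fields_1 VALUES
        ('e1', 'hotel', 'desktop', 101, 3),
        ('e2', 'hotel', 'mobile', 202, 7);
""")

# A single join attaches every custom field to its event row.
rows = conn.execute("""
    SELECT e.event_id, e.page_urlpath,
           cf.page_type, cf.template_type, cf.hotel_id, cf.ab_cell
    FROM events AS e
    LEFT JOIN com_oyster_custom_fields_1 AS cf
           ON cf.root_id = e.event_id
    ORDER BY e.event_id
""").fetchall()

for row in rows:
    print(row)
```

Because every custom field lives in the one context table, new fields only add columns to that table; the join itself does not change.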
  7. 7. OUR DATASET • A large, though not massive, dataset • Redshift cluster: 6 dc1.large SSD nodes, ~1TB storage • 640 million rows in our events table • We add 1.5 million event rows per day • We copy (a subset of) our PostgreSQL content database into Redshift nightly • Enables business reporting and advanced content-based queries
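
A rough back-of-the-envelope check on the slide 7 figures: if the current daily volume had held steady (an assumption; traffic has likely varied), the events table would hold a little over a year of data, which is in line with the switch to Snowplow about a year earlier.

```python
# Figures from the slide; the constant daily rate is an assumption.
TOTAL_EVENT_ROWS = 640_000_000
ROWS_ADDED_PER_DAY = 1_500_000

days_of_data = TOTAL_EVENT_ROWS / ROWS_ADDED_PER_DAY
months_of_data = days_of_data / 30.44  # average days per calendar month

print(f"{days_of_data:.0f} days, about {months_of_data:.0f} months of tracking")
```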
  8. 8. PAGE TRACKING EXAMPLE
  9. 9. ANALYTICS AND LOOKER (DEVON POHL)
  10. 10. REPORTING • Snowplow and content data are merged to provide insights into: • Product: A/B testing, funnel mapping • Marketing: SEO monitoring, ad campaigns • Operations: workflow optimization, ROI modeling • Business trends: traffic, revenue
  11. 11. VISIT TABLE • Event data is large and granular, and often hard to digest • The most valuable pre-processing we do is building the visit table • Built incrementally by a Python ETL run on Redshift • This is key to most of our reporting infrastructure • Combines events and custom fields data • This visit table: • Is granular to the user and user session ID • Includes counts of a variety of event types • Includes all information associated with the first event of a visit • A/B testing cells • Referral information • Etc.
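
The visit-table roll-up on slide 11 might look something like the sketch below. It is not Oyster's ETL: it runs in plain Python rather than on Redshift, it rebuilds everything rather than building incrementally, and the field names (`user_id`, `session_id`, `tstamp`, `event_type`, `referrer`, `ab_cell`) and sample events are hypothetical.

```python
from collections import Counter


def build_visits(events):
    """Roll event rows up into one row per (user_id, session_id)."""
    visits = {}
    # Sort by timestamp so the first event seen for a visit is its earliest.
    for ev in sorted(events, key=lambda e: e["tstamp"]):
        key = (ev["user_id"], ev["session_id"])
        if key not in visits:
            # Visit-level attributes are taken from the first event of the visit.
            visits[key] = {
                "user_id": ev["user_id"],
                "session_id": ev["session_id"],
                "first_tstamp": ev["tstamp"],
                "referrer": ev["referrer"],
                "ab_cell": ev["ab_cell"],
                "event_counts": Counter(),
            }
        visits[key]["event_counts"][ev["event_type"]] += 1
    return list(visits.values())


# Hypothetical events, deliberately out of order.
events = [
    {"user_id": "u1", "session_id": 1, "tstamp": "2016-03-01 10:01:00",
     "event_type": "click", "referrer": None, "ab_cell": 3},
    {"user_id": "u1", "session_id": 1, "tstamp": "2016-03-01 10:00:00",
     "event_type": "page_view", "referrer": "google.com", "ab_cell": 3},
    {"user_id": "u2", "session_id": 1, "tstamp": "2016-03-01 10:00:30",
     "event_type": "page_view", "referrer": "newsletter", "ab_cell": 7},
    {"user_id": "u1", "session_id": 1, "tstamp": "2016-03-01 10:02:00",
     "event_type": "page_view", "referrer": None, "ab_cell": 3},
]

visits = build_visits(events)
for visit in visits:
    print(visit)
```

Reports can then query one compact row per visit instead of scanning the full events table.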
  12. 12. LOOKER • Looker is our core data exploration and reporting tool • Web-based YAML + visualization wrapper on Redshift • Enables self-serve reporting and exploration for non-technical business owners • Also used for other pre-processing via persistent derived tables (PDTs) • PDTs are temporary tables, each defined by a query, that Looker builds and manages • Good for small-to-medium size pre-processing • Applications include de-duping and revenue attribution
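
Since a PDT is essentially a query whose result is stored as a table, the de-duping use case on slide 12 can be sketched as below. This is not Looker syntax or Oyster's query: sqlite3 stands in for Redshift, Looker's own build and rebuild management is not shown, and the table names, columns, and the rule of keeping the earliest copy of each `event_id` are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for Redshift; names and sample rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id TEXT, collector_tstamp TEXT);
    INSERT INTO events VALUES
        ('e1', '2016-03-01 10:00:00'),
        ('e1', '2016-03-01 10:00:05'),  -- duplicate delivery of e1
        ('e2', '2016-03-01 10:01:00');

    -- The stored result of this query plays the role of the derived table:
    -- one row per event_id, keeping the earliest timestamp.
    CREATE TABLE events_deduped AS
        SELECT event_id, MIN(collector_tstamp) AS collector_tstamp
        FROM events
        GROUP BY event_id;
""")

deduped = conn.execute(
    "SELECT event_id, collector_tstamp FROM events_deduped ORDER BY event_id"
).fetchall()
print(deduped)
```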
  13. 13. DASHBOARDS / SAVED REPORTS
  14. 14. EXPLORATION
  15. 15. OYSTER.COM The Hotel Tell-All