Funnel Analysis in Hadoop at Etsy

As an ecommerce site with more than 800,000 different sellers, Etsy is particularly interested in understanding how shoppers find the items they seek. Part of this understanding involves attributing successful purchases to specific features on the site. This attribution model not only allows us to compare and refine Etsy’s features but also provides valuable signals for A/B testing, search quality, and recommenders. However, the path to a successful handmade purchase often involves multiple features over the course of several visits.

This talk will discuss the challenges of funnel analysis at Etsy and the corresponding deficiencies of several widely used web analytics tools. We’ll then dive into our event sequence matching tool, which we’ve successfully applied to hundreds of millions of visits in a single Hadoop job and which is widely used across our big data stack. Finally, we’ll take a look at some of our applications of the tool and compare it to related work.

Presentation Transcript

  • Funnel Analysis in Hadoop at Etsy
    Steve Mardenfeld, Wil Stuckey, Matt Walker
    Tuesday, March 12, 13
  • Handmade Marketplace
  • Data Driven Development
      • Use data to make informed decisions
      • Use data to evaluate the efficacy of our products
  • Funnel Analysis
  • Registration Funnel
      50,000   45,000   15,000   10,000
        100%      90%      33%      60%
        100%      90%      29%      20%
  • Funnels ++
      • Funnels are more than just an optimization tool
      • Use them to understand different pathways throughout our site
      • Partition and compare these pathways by attributes
      • A/B tests, categories, queries, cohorts
      • Attribution model
  • Attribution Models
      • Tie conversions and successes to specific products and actions
      • Use to gain understanding of our users’ interaction with Etsy
      • Help us to measure gains in A/B testing
      • Easily compare different varieties of the same product
      • Attribution techniques for internal and external attribution
  • Attributions Illustrated
      • last product interaction - browse
      • co-occurrence - search and browse
      • direct funnel attribution - search
  • Indirect Attribution Illustrated
      • direct funnel attribution - shop home
      • indirect funnel attribution - search
  • Aggregate Results
      Direct      Search   Clicked Listing   Purchased
      Events      50,000            25,000       5,000

      Indirect    Search   Clicked Listing     Shop   Clicked Listing   Purchased
      Events      50,000            25,000   10,000             5,000       3,000

      Combined    Search   Clicked Listing   Purchased
      Events      50,000            25,000       8,000
  • Segmenting Within Funnels
      Old Algorithm   Search   Clicked Listing   Purchased
      Counts          50,000            20,000      15,000

      New Hotness     Search   Clicked Listing   Purchased
      Counts         100,000            60,000      15,000
  • Segmenting Within Funnels
      Old Algorithm   Search   Clicked Listing   Purchased
      Step              100%               50%         40%
      Total             100%               50%         20%

      New Hotness     Search   Clicked Listing   Purchased
      Step              100%               60%         25%
      Total             100%               60%         15%
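
The "Step" and "Total" rows can be derived from the raw counts. The sketch below assumes "Step" means conversion from the previous step and "Total" means conversion from the first step; the function name `funnel_rates` is hypothetical, and the input is the "New Hotness" count row from the slide.

```python
def funnel_rates(counts):
    """Return (step, total) conversion rates for a list of per-step counts."""
    # Step rate: each count relative to the step before it (first step is 100%).
    step = [1.0] + [cur / prev for prev, cur in zip(counts, counts[1:])]
    # Total rate: each count relative to the first step.
    total = [count / counts[0] for count in counts]
    return step, total


# "New Hotness" counts: Search, Clicked Listing, Purchased
step, total = funnel_rates([100_000, 60_000, 15_000])
step_pct = [f"{rate:.0%}" for rate in step]
total_pct = [f"{rate:.0%}" for rate in total]
print(step_pct)   # ['100%', '60%', '25%']
print(total_pct)  # ['100%', '60%', '15%']
```
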
  • Segmenting Across Funnels
                         *   Clicked Listing   Purchased
      Search          100%               50%         40%
      Browse          100%               40%         30%
      Home            100%               60%         36%
      Activity Feed   100%               62%         28%
      Taste Test      100%               47%         31%
      Search Ads      100%               45%         38%
  • Democratized Funnels
    How do we make this awesomeness available for everyone?
  • Awesome infrastructure but...
      • Must be an engineer to write your own queries
      • Engineering resources become the bottleneck
      • Hard to scale as the company grows
  • So we want to:
      • Allow the data engineers to focus on higher priority things.
      • Allow people to answer their own questions.
  • How do we do this?
  • The Path of Funnel Tools
      • Google spreadsheets
      • Internal tools
          • Funnel Cake
          • Feature Funnel
  • Funnel Cake
  • Version 1
  • Real Time
  • Let’s do it in PHP!
  • Why?
  • It’s the “Etsy Way”
      • Already have existing infrastructure
      • Operationally stable
      • The same tools everyone else is using means a better adoption rate across Engineering
  • Event stream; code runs on every page view; simple matching system; (ab)used memcached as temporary storage; rolled up to the DB every minute (near real time)
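
The version 1 design described above (count on every page view, hold counts in a cache, roll up to a database once a minute) might look roughly like the toy sketch below. Everything here is an assumption made for illustration: the class and method names are hypothetical, two in-process `Counter` objects stand in for memcached and the database, and "simple matching" is reduced to checking whether the event name is one of the funnel's steps. The original was written in PHP and its real matching rules are not given in the slides.

```python
from collections import Counter


class RealTimeFunnelCounter:
    """Toy stand-in: count funnel steps per page view, flush periodically."""

    def __init__(self, funnel_steps):
        self.funnel_steps = set(funnel_steps)  # event names that belong to the funnel
        self.cache = Counter()                 # plays the role of memcached counters
        self.db = Counter()                    # plays the role of the rollup table

    def on_page_view(self, event_name):
        # Assumed "simple matching": count the event if it is a funnel step.
        if event_name in self.funnel_steps:
            self.cache[event_name] += 1

    def roll_up(self):
        # Would run once a minute: add cached counts to the DB, then reset the cache.
        self.db.update(self.cache)
        self.cache.clear()


counter = RealTimeFunnelCounter(["search", "listing", "cart"])
for name in ["home", "search", "listing", "search"]:
    counter.on_page_view(name)
counter.roll_up()
print(dict(counter.db))  # {'search': 2, 'listing': 1}
```

Because counts only exist from the moment a funnel is registered, a design like this cannot backfill.
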
  • What happened?
  • Broader adoption! Over 75 funnels were created
  • Shiny Visualized Results
  • Funnels had to be set up ahead of time (no backfills); reconciliation is hard; limited to events in our web clickstream (iOS/Android would be excluded); scaling and operational issues; difficult to maintain multiple stacks
  • Turns out that we don’t make product decisions in real time
    http://mcfunley.com/whom-the-gods-would-destroy-they-first-give-real-time-analytics
  • Version 2
  • Getting back on the elephant
      • Able to carry over the user interface from v1
      • Standardized event sessionization
      • Operationally supported infrastructure
      • Nightly batch process
  • Feature Funnel
  • Attribution for ALL
  • Built in metrics
      • Click through rate - Direct Attribution
      • Listing impressions - Direct Attribution
      • Purchase rate - Direct Attribution
      • Purchase from shop rate - Indirect Attribution
      • Favorite from shop rate - Indirect Attribution
      • More...
  • Segmentation
      • Primarily used for A/B test analysis
      • Arbitrary segmentation
  • How do we get it?
      select (clicks * 100.0 / visits) as "CTR"
        from feature_funnel
       where event_type = 'search'
         and ab_test = 'sitewide'
         and epoch_s = 1361318400
         and group_name = 'ALL_GROUPINGS';
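
To try a query of this shape locally, the sketch below builds a one-row `feature_funnel` table in an in-memory SQLite database. The column types and the visit and click counts are assumptions for illustration only; the slides do not show the real schema or which database served the table. The expression multiplies by 100.0 before dividing so that SQLite does not truncate with integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the column names come from the slide's query.
conn.execute(
    """create table feature_funnel (
           event_type text, ab_test text, epoch_s integer,
           group_name text, visits integer, clicks integer)"""
)
# Illustrative row: 25,000 clicks out of 50,000 visits.
conn.execute(
    "insert into feature_funnel values (?, ?, ?, ?, ?, ?)",
    ("search", "sitewide", 1361318400, "ALL_GROUPINGS", 50_000, 25_000),
)
row = conn.execute(
    """select (clicks * 100.0 / visits) as "CTR"
         from feature_funnel
        where event_type = ?
          and ab_test = ?
          and epoch_s = ?
          and group_name = ?""",
    ("search", "sitewide", 1361318400, "ALL_GROUPINGS"),
).fetchone()
ctr = row[0]
print(ctr)  # 50.0
```
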
  • The Future?
  • Features
      • Eliminate the need for an engineer to write the queries
      • Robust segmentation
      • Not be limited to visit sessions
      • Run ad hoc queries
  • Build your own Funnels (show the builder UI here)
  • Mechanics of Funnel Analysis
  • Input
      • Event sequences
      • Typically visits to website
      • Sessionization scheme left to developer
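
Since the sessionization scheme is left to the developer, here is one common choice as a minimal sketch: split each user's events into visits whenever the gap between consecutive events exceeds a timeout. The 30-minute timeout, the `(user_id, timestamp, event_type)` record shape, and the function name are all assumptions, not something the talk specifies.

```python
from itertools import groupby
from operator import itemgetter

GAP_SECONDS = 30 * 60  # assumed inactivity timeout; the talk leaves this open


def sessionize(events):
    """Split (user_id, timestamp, event_type) records into per-user visits."""
    visits = []
    ordered = sorted(events, key=itemgetter(0, 1))  # by user, then by time
    for _, user_events in groupby(ordered, key=itemgetter(0)):
        current, last_ts = [], None
        for event in user_events:
            # Start a new visit when the user has been idle longer than the gap.
            if last_ts is not None and event[1] - last_ts > GAP_SECONDS:
                visits.append(current)
                current = []
            current.append(event)
            last_ts = event[1]
        visits.append(current)
    return visits


events = [
    ("u1", 0, "search"),
    ("u1", 60, "listing"),
    ("u1", 4000, "search"),  # more than 30 minutes after the previous event
    ("u2", 10, "home"),
]
visits = sessionize(events)
print([[event[2] for event in visit] for visit in visits])
# [['search', 'listing'], ['search'], ['home']]
```
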
  • Funnel
      • Search
      • Listing
      • Cart
  • Query
      • Only discussed event types
      • Funnel steps require additional constraints:
          • Listing referred by search
          • Added that listing to cart
  • Search
      query        “dinosaur”
      listing_ids  119469855, 90583707, ...
      loc          “http://www.etsy.com/search/handmade/patterns?q=dinosaur&order=most_relevant&view_type=gallery&ship_to=ZZ”
      ...          ...
  • Listing
      listing_id   119469855
      ref          “http://www.etsy.com/search/handmade/patterns?q=dinosaur&order=most_relevant&view_type=gallery&ship_to=ZZ”
      shop_id      7415158
      ...          ...
      Constraint: listing.ref == search.loc
  • Cart
      added_listing_id  119469855
      cart_listing_ids  119469855, ...
      cart_type         guest
      ...               ...
      Constraint: cart.added_listing_id == listing.listing_id
  • Query
      • Search
      • Listing & Referred
      • Cart & AddedListing
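
The two constraints can be written as small functions over the example events. This is a sketch that assumes events are plain dictionaries; the `type` field and the function names are hypothetical, and the id lists contain only the values visible on the slides (the elided entries are left out).

```python
SEARCH_URL = (
    "http://www.etsy.com/search/handmade/patterns"
    "?q=dinosaur&order=most_relevant&view_type=gallery&ship_to=ZZ"
)

search = {
    "type": "search",
    "query": "dinosaur",
    "listing_ids": [119469855, 90583707],  # only the ids shown on the slide
    "loc": SEARCH_URL,
}
listing = {
    "type": "listing",
    "listing_id": 119469855,
    "ref": SEARCH_URL,
    "shop_id": 7415158,
}
cart = {
    "type": "cart",
    "added_listing_id": 119469855,
    "cart_listing_ids": [119469855],
    "cart_type": "guest",
}


def referred(listing_event, search_event):
    """Listing referred by search: listing.ref == search.loc."""
    return listing_event["ref"] == search_event["loc"]


def added_listing(cart_event, listing_event):
    """Added that listing to cart: cart.added_listing_id == listing.listing_id."""
    return cart_event["added_listing_id"] == listing_event["listing_id"]


print(referred(listing, search), added_listing(cart, listing))  # True True
```
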
  • Pattern Matching
      • Apply query to event sequence
      • Select out tuples of matching events
  • What About Incomplete Funnels?
      [figure: match tuples with null in place of unmatched steps]
  • Funnel Analysis
      • Replace events with 1
      • Replace nulls with 0
      • Sum
  • 1   1      1
    1   1      null
    1   null   null
  • 1   1   1
    1   1   0
    1   0   0
  • 3   2   1
  • min(3, 1)   min(2, 1)   min(1, 1)
  • 1   1   1
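
The arithmetic on the preceding slides can be sketched as follows. This assumes the three match tuples come from a single visit, so the column sums are capped at 1 to give one indicator per funnel step per visit, and that funnel counts are then the sum of those indicators across visits. The function names and the second, search-only visit are hypothetical.

```python
def visit_indicators(matches):
    """Turn one visit's match tuples into a 0/1 indicator per funnel step."""
    # Replace events with 1 and nulls (None) with 0.
    ones = [[0 if event is None else 1 for event in match] for match in matches]
    # Sum each step's column, then cap at 1 so a visit counts once per step.
    sums = [sum(column) for column in zip(*ones)]
    return [min(total, 1) for total in sums]


def funnel_counts(visits):
    """Sum the per-visit indicators across all visits."""
    return [sum(column) for column in zip(*map(visit_indicators, visits))]


visit_a = [
    ("search", "listing", "cart"),
    ("search", "listing", None),
    ("search", None, None),
]
visit_b = [("search", None, None)]  # hypothetical visit that stops at search

print(visit_indicators(visit_a))          # [1, 1, 1]
print(funnel_counts([visit_a, visit_b]))  # [2, 1, 1]
```
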
  • What Do We Keep?
      Search
        query        “dinosaur”
        listing_ids  119469855, 90583707, ...
        loc          “http://www.etsy.com/search/handmade/patterns?q=dinosaur&order=most_relevant&view_type=gallery&ship_to=ZZ”
        ...          ...
      Listing
        listing_id   119469855
        ref          “http://www.etsy.com/search/handmade/patterns?q=dinosaur&order=most_relevant&view_type=gallery&ship_to=ZZ”
        shop_id      7415158
        ...          ...
      Cart
        added_listing_id  119469855
        cart_listing_ids  119469855, ...
        cart_type         guest
        ...               ...
  • Segmenting Funnels
      • Search → query
      • Listing → listing_id
      • Cart → cart_type
  • Segmented Funnel Analysis
      • Extract segmenting properties
      • Compute indicators as before
      • Group on segmenting properties and sum
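
A minimal sketch of the three steps above, assuming each visit is a list of match tuples (with `None` for unmatched steps) and that the segmenting property is the search query. The function names, the "robot" query, and listing id 1 are made up for the example.

```python
from collections import defaultdict


def segmented_funnel(visits, segment_of, n_steps=3):
    """Per-segment funnel counts: indicators per visit, grouped and summed."""
    totals = defaultdict(lambda: [0] * n_steps)
    for matches in visits:
        # Per-visit 0/1 indicators, kept separately for each segment key.
        per_segment = defaultdict(lambda: [0] * n_steps)
        for match in matches:
            key = segment_of(match)  # extract the segmenting property
            for i, event in enumerate(match):
                if event is not None:
                    per_segment[key][i] = 1
        # Group on the segmenting property and sum across visits.
        for key, indicators in per_segment.items():
            totals[key] = [a + b for a, b in zip(totals[key], indicators)]
    return dict(totals)


def by_query(match):
    return match[0]["query"]  # the search event is always the first slot


visits = [
    [
        ({"query": "dinosaur"}, {"listing_id": 119469855}, {"cart_type": "guest"}),
        ({"query": "dinosaur"}, {"listing_id": 90583707}, None),
    ],
    [({"query": "dinosaur"}, None, None)],
    [({"query": "robot"}, {"listing_id": 1}, None)],  # made-up segment
]
result = segmented_funnel(visits, by_query)
print(result)  # {'dinosaur': [2, 1, 1], 'robot': [1, 1, 0]}
```
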
  • MapReduce
      • Work is done map-side
      • Common first step in our jobs
      • Expensive computation limited to first round mappers
  • Event Sequence Pattern Matching
  • Components
      • Predicate: matches/rejects events
      • Query: tuple of predicates
      • Match: tuple of events
  • Match Predicates
      • Select an event based on:
          • Full event sequence
          • Prior matched events
          • Current candidate
  • Match Predicate DSL
      • Combine predicates with logical operators
      • val Query = Seq(Search, Listing & Referred, Cart & AddedListing)
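
The slide's DSL is Scala and its implementation is not shown. As a rough Python analogue, the sketch below models a predicate as a function of the full sequence, the prior matched events, and the current candidate, and overloads `&`, `|`, and `~` to combine predicates. The `Predicate` class, the `of_type` helper, the dictionary event shape, and the shortened URL are all assumptions.

```python
class Predicate:
    """Wraps test(sequence, prior, candidate) -> bool; supports &, | and ~."""

    def __init__(self, test):
        self.test = test

    def __call__(self, sequence, prior, candidate):
        return self.test(sequence, prior, candidate)

    def __and__(self, other):
        return Predicate(lambda s, p, c: self(s, p, c) and other(s, p, c))

    def __or__(self, other):
        return Predicate(lambda s, p, c: self(s, p, c) or other(s, p, c))

    def __invert__(self):
        return Predicate(lambda s, p, c: not self(s, p, c))


def of_type(name):
    """Predicate that looks only at the candidate's (hypothetical) type field."""
    return Predicate(lambda s, p, c: c["type"] == name)


Search, Listing, Cart = of_type("search"), of_type("listing"), of_type("cart")
# These two compare the candidate with the most recent prior matched event.
Referred = Predicate(lambda s, p, c: c["ref"] == p[-1]["loc"])
AddedListing = Predicate(lambda s, p, c: c["added_listing_id"] == p[-1]["listing_id"])

query = [Search, Listing & Referred, Cart & AddedListing]

search_event = {"type": "search", "loc": "/search?q=dinosaur"}  # shortened URL
listing_event = {"type": "listing", "listing_id": 119469855, "ref": "/search?q=dinosaur"}
cart_event = {"type": "cart", "added_listing_id": 119469855}
sequence = [search_event, listing_event, cart_event]

print(query[1](sequence, (search_event,), listing_event))            # True
print(query[2](sequence, (search_event, listing_event), cart_event))  # True
```

Because `and` short-circuits, `Referred` and `AddedListing` are only evaluated once the type check has passed, so they can index into the candidate safely.
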
  • Semantics
      • Fixed number of events in match
      • Arbitrary number of matches per sequence
      • Collect and extend all partial matches
  • Search, Listing, Shop, Listing, Cart, Home, Search, Listing, Search
  • Match tree under the Query Root after each event is consumed; before each step the next event is tested (?) against the root and every non-terminal node
      Event     Search                   Listing & Referred   Cart & AddedListing
      (start)
      Search    Search
      Listing   Search                   Listing
      Shop      Search                   Listing
      Listing   Search                   Listing
      Cart      Search                   Listing              Cart
      Home      Search                   Listing              Cart
      Search    Search, Search           Listing              Cart
      Listing   Search, Search           Listing, Listing     Cart
      Search    Search, Search, Search   Listing, Listing     Cart
  • Search   Listing   Cart
    Search   Listing   null
    Search   null      null
  • Match Tree
      • Purely functional data structure
      • Holds matched events and indices
      • Match prefixes shared
  • Match Tree Algorithm
      • Fold over sequence accumulating tree
      • May extend any non-terminal node
      • Each level in tree corresponds to predicate
  • Practicality
      • Explodes, but:
          • Queries are constant length
          • Sequences are bounded (visits)
          • Predicates constrain growth
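
The match tree algorithm can be sketched as a fold over the event sequence. This is an illustration under stated assumptions, not the talk's implementation: nodes are immutable tuples with parent pointers (one simple way to share match prefixes), predicates are plain functions of `(sequence, prior, candidate)`, events are dictionaries with a hypothetical `type` field, and the URLs and second listing ids are made up. The input mirrors the walkthrough sequence Search, Listing, Shop, Listing, Cart, Home, Search, Listing, Search.

```python
from functools import reduce
from typing import NamedTuple, Optional


class Node(NamedTuple):
    index: int                # position of the matched event in the sequence
    event: dict               # the matched event itself
    parent: Optional["Node"]  # shared prefix; None means a child of the root
    depth: int                # 1 for the first predicate's level


def path(node):
    """Events from the root down to node, i.e. the partial match so far."""
    events = []
    while node is not None:
        events.append(node.event)
        node = node.parent
    return tuple(reversed(events))


def find_matches(sequence, query):
    def step(nodes, indexed_event):
        index, event = indexed_event
        added = []
        # Test the event against the root (None) and every non-terminal node.
        for node in (None, *nodes):
            depth = 0 if node is None else node.depth
            if depth < len(query) and query[depth](sequence, path(node), event):
                added.append(Node(index, event, node, depth + 1))
        return nodes + tuple(added)  # build a new tuple; old state is untouched

    nodes = reduce(step, enumerate(sequence), ())
    parent_ids = {id(n.parent) for n in nodes if n.parent is not None}
    leaves = [n for n in nodes if id(n) not in parent_ids]
    # Pad incomplete matches with None so every match has one slot per predicate.
    return [path(n) + (None,) * (len(query) - n.depth) for n in leaves]


def is_search(seq, prior, c):
    return c["type"] == "search"


def referred_listing(seq, prior, c):
    return c["type"] == "listing" and c["ref"] == prior[-1]["loc"]


def added_to_cart(seq, prior, c):
    return c["type"] == "cart" and c["added_listing_id"] == prior[-1]["listing_id"]


sequence = [
    {"type": "search", "loc": "/search?q=dinosaur"},
    {"type": "listing", "listing_id": 119469855, "ref": "/search?q=dinosaur"},
    {"type": "shop", "loc": "/shop/7415158"},
    {"type": "listing", "listing_id": 90583707, "ref": "/shop/7415158"},
    {"type": "cart", "added_listing_id": 119469855},
    {"type": "home"},
    {"type": "search", "loc": "/search?q=robot"},
    {"type": "listing", "listing_id": 1, "ref": "/search?q=robot"},
    {"type": "search", "loc": "/search?q=cat"},
]
matches = find_matches(sequence, [is_search, referred_listing, added_to_cart])
types = [tuple(e["type"] if e else None for e in match) for match in matches]
print(types)
# [('search', 'listing', 'cart'), ('search', 'listing', None), ('search', None, None)]
```

Only leaf paths are reported here, which yields the three matches shown in the walkthrough; reporting every node's path instead would also surface each shared prefix as its own partial match.
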
  • Summary
      • Why funnels are interesting
      • What we’ve built with them at Etsy
      • Our approach to funnel analysis
  • Questions?
      • Steve Mardenfeld
      • Wil Stuckey @quiiver
      • Matt Walker @data_daddy