The Impact of Big Data On Marketing Analytics (UpStream Software)


Presenter: Tess Nesbitt, Senior Statistician, UpStream Software
Presentation Date: February 26, 2013

This presentation describes how UpStream uses Hadoop and Revolution R Enterprise to build and score the predictive analytics models behind its revenue attribution application.

Speaker notes:
  • Tess Nesbitt, Statistician and Senior Consultant at UpStream / Business Researchers
  • We are a team of number crunchers with backgrounds in economics, math, statistics, physics, astrophysics, business... the whole gamut of scientific and technical disciplines. We started as BRI, a consulting company, but have developed another arm of the company called UpStream, which has been going for about 2 years and where we focus primarily on big data problems for marketing. (revolving bullet 2)
  • We hear the word "multi-channel" used a lot in retail, but it is a pretty ambiguous word. We have two definitions of channel: those on the left-hand side are where you spend marketing budget, and those on the right-hand side are where purchases are made. We separate these two out so we can see the crossings (how much is email driving store sales? how much is direct mail driving online sales?).
  • This is an observational data problem. We read in a lot of data: every impression served, every click to the website, every email delivered and clicked on, every catalog, every postcard, and all the order data from every channel as well; we look at the entire gamut of marketing, all the ways you reach customers. We tie this data together and later model it, borrowing techniques from biostatistics and medical research (the outcome, instead of "die," is "buy"). Once we understand what drives conversion, we can use that to split up each order among the channels that drove it, and once you understand what drives sales, you can decide what marketing to buy next. So what we are doing is assigning credit for sales to the various types of marketing you are conducting. Targeting: once we figure out what drives sales, we want to figure out how to redirect budget. Strategic Allocation: use this information to make better decisions about how and when to market to customers. Incremental Response: see how receptive people are to various types of marketing (e.g., reallocate catalogs to the customers who are most moved by certain treatments).
  • We want to understand the co-occurrence of marketing phenomena. Most survival analysis techniques were developed for small data, but we apply them to huge data. The outcome is time-dependent, and the majority of our inputs are time-dependent covariates. There are also competing risks, which the survival framework is designed to handle: you are exposing people to a cocktail of drugs, and we want to know whether it was the aspirin that killed you.
  • Assume we have already built a model: what can we do with it? The recency table is in days and sales are in dollars. 1) Retrospectively: after 2 months the email is well below the fold and you aren't clicking on it (its effect has decayed to nearly zero), so the catalog gets the credit. In the second case, the email gets more credit. We take into account both the amplitude of each effect and its timing.
  • This distribution is what we are up against, what we are trying to model, and it is highly nonlinear. Part of our methodology is to put terms in the model that control for a distribution like this, so that we control for it while overlaying the marketing treatments.
  • We treat UpStream as a scoring system, and the same scoring system makes the data for modeling. In Hadoop we do all the ETL: we handle lots of data and files and create behavioral variables (time between purchases, number of purchases, promotional schedule, etc.) and overlay data (demographic data). We push the data out in a cleansed form for survival modeling. We use RevoR for exploratory work and modeling; when the models are finished, they are pushed back to Hadoop for scoring, including scoring for prediction (lift charts, using the model for selection, etc.). We create 5 billion scores per day per retailer.
  • Retail was double counting their sales (over 100%); savvy marketers want to know the incremental effect. These percentages might not be the same if we looked only at web sales or only at retail; in this example we have combined all order channels. This is 1 year of data, and it is retrospective. Could we use this information going forward?
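
The notes describe a time-dependent outcome, inputs that are mostly time-dependent covariates (how long ago each catalog or email arrived), and behavioral variables such as number of purchases. One common way to get such covariates into a hazard model is a discrete-time "person-period" layout: one row per customer per day, with an outcome of whether a purchase happened that day. A logistic regression (such as the rxLogit mentioned on the "Why Revolution R?" slide) fit on rows like these is a discrete-time survival model. This is a minimal sketch under that assumption, not UpStream's pipeline, which the notes say ran in Hadoop and fed GAM survival models; `PersonDay` and `person_period` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersonDay:
    """One customer-day row of a discrete-time survival data set (hypothetical layout)."""
    day: int                           # day index within the observation window
    days_since_email: Optional[int]    # time-dependent covariate; None before the first email
    days_since_catalog: Optional[int]  # time-dependent covariate; None before the first catalog
    purchases_so_far: int              # behavioral variable: purchase days before today
    purchased: int                     # outcome: 1 if the customer bought on this day


def _days_since(last: Optional[int], day: int) -> Optional[int]:
    return None if last is None else day - last


def person_period(n_days: int, emails: List[int], catalogs: List[int],
                  purchases: List[int]) -> List[PersonDay]:
    """Expand one customer's history into one row per day.

    Touches are recorded after the day's row is written, so a touch on day d
    first shows up as a covariate on day d + 1 and never "sees" the same-day
    outcome. Several purchases on one day count as a single event.
    """
    email_days, catalog_days, purchase_days = set(emails), set(catalogs), set(purchases)
    last_email: Optional[int] = None
    last_catalog: Optional[int] = None
    purchases_so_far = 0
    rows: List[PersonDay] = []
    for day in range(n_days):
        purchased = int(day in purchase_days)
        rows.append(PersonDay(day, _days_since(last_email, day),
                              _days_since(last_catalog, day),
                              purchases_so_far, purchased))
        # Update the running state only after today's row is recorded.
        if day in email_days:
            last_email = day
        if day in catalog_days:
            last_catalog = day
        purchases_so_far += purchased
    return rows


# Hypothetical customer: catalog on day 0, email on day 1, purchases on days 3 and 5.
rows = person_period(6, emails=[1], catalogs=[0], purchases=[3, 5])
for row in rows:
    print(row)
```

Recency enters as a plain number of days here; the notes' point about a highly nonlinear distribution is why a smooth (GAM-style) term on recency, rather than a single linear coefficient, is a natural next step.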

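The last note says retail was double counting their sales, with credit adding up to over 100%. One common way that happens is channel-by-channel reporting, where each channel claims the full value of every order it touched. The sketch below reproduces that effect on made-up orders and contrasts it with a naive equal-split rule whose credit sums to actual sales. The equal split is swapped in purely for illustration; it is not the model-based allocation the deck describes, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def touch_credit(orders: List[dict]) -> Dict[str, float]:
    """Channel-by-channel reporting: every channel claims the full value of each order it touched."""
    claimed: Dict[str, float] = defaultdict(float)
    for order in orders:
        for channel in set(order["touches"]):
            claimed[channel] += order["value"]
    return dict(claimed)


def equal_split_credit(orders: List[dict]) -> Dict[str, float]:
    """Naive fractional rule: split each order evenly over the channels that touched it.

    Orders with no marketing touch go to a "baseline" bucket (habit, store proximity, ...).
    """
    credit: Dict[str, float] = defaultdict(float)
    for order in orders:
        channels = sorted(set(order["touches"])) or ["baseline"]
        share = order["value"] / len(channels)
        for channel in channels:
            credit[channel] += share
    return dict(credit)


# Made-up orders for illustration only.
orders = [
    {"value": 100.0, "touches": ["email", "catalog"]},
    {"value": 50.0, "touches": ["catalog"]},
    {"value": 50.0, "touches": []},
]
total = sum(order["value"] for order in orders)

print(sum(touch_credit(orders).values()) / total)  # 1.25, i.e. channels claim 125% of sales
print(equal_split_credit(orders))  # sums to the 200.0 actually sold
```

Any rule that splits each order, rather than letting every channel claim it whole, keeps total credit equal to total sales; the deck's approach decides the split with a fitted model instead of an even share.
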
    1. The Impact of Big Data on Marketing Analytics, February 2013
    2. Who We Are
       Company overview: an experienced team with a proven history of solving difficult analytics problems for Fortune 500 companies.
       Cloud-based software to manage marketing's big data problems: customer-level revenue attribution and multi-channel optimization, triggered marketing, and planning and reporting.
       Locations: San Francisco, Seattle, and Hyderabad.
    3. Marketing Analytics Goals
       • Identify the most profitable channels for every customer and the most profitable customers for every channel.
       • Target the right customers at the right time with the right message.
       • Understand what the spend in each marketing channel contributes to sales ("Advanced Revenue Attribution").
    4. Challenges with Multi-Channel Retail
       Multi-channel marketers are unsure where to spend their next dollar.
       • Messy data with many marketing and order channels, disparate databases, and various execution platforms.
       • They don't understand how spending on marketing affects conversion.
       • There is no easy way to identify the most profitable channels for every customer.
    5. How Do You Approach the Problem?
       Enable retailers to conduct customer-level analysis on big data to understand what motivates individuals to buy.
       • Assemble and standardize all of a marketer's data into a Hadoop cluster.
       • Apply the rigor of a medical researcher with patented methodology.
       • Identify and attribute the revenue drivers.
       • Know whom to reach.
    6. Advanced Revenue Attribution
       What is it?
       Data-driven, time-to-event statistical modeling used to establish an objective and accurate revenue distribution, all done at the individual user level.
       What are common attribution buckets?
       • A "big data" platform that handles and connects all of a company's online and offline data (sales, web analytics logs, catalog and email send data, display and search advertising logs, etc.).
       • Augment marketing campaign data with supplementary information to correctly distribute variance across all contributing factors (e.g., Customer Driven (store location, seasonal factors) and Special Cased (branded search, economic conditions)).
       How is it different?
       • Modeling is done at the customer level, which facilitates micro- and macro-level analyses in tandem for the most comprehensive insights a marketer can extract, and empowers marketers to customize their strategies at the same granular level.
       • The focus on modeling time effectively enables targeting specific customers with specific treatments at specific times.
    7. Attribution Using Time-Dependent Models
       [Timeline figure, January to June: the catalog, email, search, and affiliate treatments that Customers 1, 2, and 3 received before each made a $100 purchase]
       Recency of treatments (days) and resulting sales allocation ($):

       Customer  Sales | Recency of treatments (days)      | Sales allocation ($)
                       | Catalog  Email  Search  Affiliate | Catalog  Email  Search  Affiliate
       #1        $100  |      20     40       0          0 |   99.98   0.02       -          -
       #2        $100  |      20     15       0          0 |   81.84  18.16       -          -
       #3        $100  |      72     60      10         30 |   40.64   0.01   47.03      12.32
    8. Exploratory Work
    9. Transformations (Catalog vs. Email)
       [Figure with two panels: Catalog and Email]
    10. Architecture: Hadoop–Revolution Integration
       • Hadoop (UpStream Data Format, UDF): ETL; N marketing channels; behavioral variables; promotional data per customer; overlay data; custom variables.
       • Revolution R (current state: Revo v6): functions to read Hadoop output; xdf creation; exploratory data analysis; GAM survival models; PMML.
       • Scoring: scoring for inference; scoring for prediction; 5 billion scores per day.
    11. Why Revolution R?
       We used to prep data and build models with SAS / WPS. Current hardware: Linux CentOS 6.
       We switched to Revolution R for the following reasons:
       • Cost-effective.
       • Comprehensive and easy-to-use statistical packages (especially familiar for people coming from academia).
       • Scale and performance (a 4x increase with RevoScaleR): rxLogit on 36MM rows and 30 variables (full input data is 68MB) runs in under 4 minutes, and descriptive and modeling functions operate on compressed xdf files to preserve disk space.
       • Beautiful graphics with a high degree of user control.
       • An open-source environment that enables the best and brightest in both academia and industry to contribute R packages every day; unlimited growth potential.
       • Ongoing Revo support: an extremely receptive team to work with.
    12. Case Study: Top Multi-Channel Retailer (Attribution)
       [Stacked bar chart "Attribution" (0% to 180%), Before vs. After, with segments for Direct Load, Other, Search, Display Remarketing, Customer Driven/Trade Area, Catalog, and Email]
       Impact:
       • Presented results that were contrary to the company's expectations; the client validated the results internally.
       • Within 3 months, the client reallocated $5MM of marketing budget to another channel, with more changes to follow.
       Insights:
       • Marketing is responsible for ~50% of overall sales (offline and online); the other half is accounted for by the customer's buying habits and store trade area.
       • Ecommerce is significantly more influenced by marketing than the retail or call-center channels.
       • Direct Load: UpStream credits the marketing activities that drove the user's "navigation" to the website.
    13. Case Study: Top Multi-Channel Retailer (Optimization)
       Impact:
       • Already field tested head-to-head against an industry-leading model: +14% lift in response rate and +$270K in new revenue in a single campaign.
       • Reallocated marketing circulation: identified the best prospects not to mail, since they were likely to purchase without receiving a catalog.
       • Scored 22MM households with 9 models, all in the cloud.
    14. Summary
       The world is changing:
       • The way customers purchase services is changing.
       • Managing marketing budgets in the multi-channel world is challenging.
       • Understanding attribution is critical to successfully deploying your marketing budget.
       To be successful, your attribution solution should:
       • Cover all of your data, both online and offline.
       • Be statistically relevant; guesswork doesn't count.
       • Be scalable and flexible; make sure you have the right technology platform and tools.
    15. Appendix
    16. Example Findings
       • Google keywords often perform worse than you think, in many cases 20 to 40% worse.
       • Display advertising performs better than you think. Certain types of display, such as retargeting, perform better than you think and can have a strong influence, especially at retail stores, which most attribution tools fail to pick up.
       • Customer loyalty has the most impact at the retail store. Retail sales are often due to habit and loyalty, but the same trend doesn't hold online.
       • Retail sales are influenced by the presence of a store near home. The inverse also holds: web purchases are not typically driven by having a store nearby.
       • Seasonality is much stronger for Internet sales than for retail or call center; the impact of seasonal purchasing online is almost double that of retail.
       • Customer tenure shows significant differences: newer customers are more sensitive to marketing, seasonal factors, and store area than established customers.
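
The "Attribution Using Time-Dependent Models" slide and its speaker note describe crediting each customer's $100 purchase by both the amplitude and the timing of each treatment, with an old email's effect decayed to nearly zero. The sketch below shows that idea under loudly stated assumptions: each channel's effect decays exponentially with a hypothetical half-life, all amplitudes are set equal, and sales are split in proportion to the decayed effects. It stands in for the fitted time-to-event model and is not expected to reproduce the slide's table in general. `AMPLITUDE`, `HALF_LIFE_DAYS`, `decayed_effect`, and `allocate` are hypothetical names, and `None` (rather than the table's 0) marks a channel with no touch so it cannot be confused with a same-day touch.

```python
from typing import Dict, Optional

# Hypothetical parameters, NOT estimates from the deck: a fitted model would supply
# each channel's strength (amplitude) and how fast its effect fades (half-life in days).
AMPLITUDE = {"catalog": 1.0, "email": 1.0, "search": 1.0, "affiliate": 1.0}
HALF_LIFE_DAYS = {"catalog": 30.0, "email": 3.0, "search": 7.0, "affiliate": 7.0}


def decayed_effect(channel: str, days: Optional[int]) -> float:
    """Effect of the most recent touch in `channel`, halving every HALF_LIFE_DAYS days."""
    if days is None:  # channel never touched this customer
        return 0.0
    return AMPLITUDE[channel] * 0.5 ** (days / HALF_LIFE_DAYS[channel])


def allocate(sales: float, recency: Dict[str, Optional[int]]) -> Dict[str, float]:
    """Split one customer's sales across channels in proportion to their decayed effects."""
    effects = {channel: decayed_effect(channel, days) for channel, days in recency.items()}
    total = sum(effects.values())
    if total == 0.0:
        raise ValueError("no marketing touch to credit")
    return {channel: sales * effect / total for channel, effect in effects.items()}


# Customer #1's recencies from the table; None marks a channel with no touch.
customer_1 = {"catalog": 20, "email": 40, "search": None, "affiliate": None}
print({channel: round(amount, 2) for channel, amount in allocate(100.0, customer_1).items()})
```

With a short email half-life, a 40-day-old email earns almost nothing while a 15-day-old one earns noticeably more, which is the qualitative contrast between Customers #1 and #2 that the speaker note describes; the exact dollar split depends entirely on the assumed parameters.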