A Year Of Data Science at Metail
Matt McDonnell - Data Scientist
Presentation given at Cambridge Data Insights Meetup on 3rd November 2016 on use of Data Science at Metail in 2016.

Published in: Data & Analytics

  1. A Year Of Data Science at Metail
     Matt McDonnell - Data Scientist
  2. Business Context
     Startup: “A group of people operating in an environment of uncertainty striving for a repeatable and scalable business model”
  3. A scalable startup needs a Customer Factory
     Figure adapted from ‘Scaling Lean’ by Ash Maurya https://leanstack.com/scaling-lean-book/
  4. A look behind the curtain – what’s the data?
     See Metail in action: http://metail.myshopify.com?utm_source=DataInsightsNov2016
     (Scary UTM code is there so I don’t have to spend the next week digging into ‘Who are these mysterious visitors?’)
     Live Demo Starts Here!
     Sheepish explanation of why it’s not working starts here
  5. The road to Data Science
     • Understand the data
     • Learn the tools
     • Build the analytics for business intelligence
     • More sophisticated data analysis for deeper understanding
     • Apply machine learning techniques
     • Develop models for prediction and decision making
  6. My experience prior to Metail
     Careers
     • Physics Postdoc: Oxford, Griffith
     • Technical Consultant: MathWorks
     • Quant Developer: Fidelity Worldwide Investment
     • Quant Analyst: Fidelity Worldwide Investment
     Tools used: (plus some Java, C#, Excel and VBA when I had to)
     Understanding the data and tools
  7. My experience since joining Metail
     Lots of event stream data
     Many AWS components
     Outputs:
     - Business Intelligence
     - Bespoke Analysis
     - Productionised Science
  8. Tools to learn
     Tools we used a year ago
     • R for analysis and science
       • dplyr, tidyr, ggplot
     • Looker for some of the analysis
     Tools we use now
     • Python
       • pandas, SQLAlchemy, boto3, seaborn
     • Still some R
       • dplyr, tidyr, ggplot
     • Looker for most of the day-to-day analysis
     • Swagger
     • AWS stack
  9. Data Analytics
     Business intelligence
     • How well is the customer factory working? (KPIs)
     • What about if we do this? (A/B Tests)
     • How’s our retention? (Cohort analysis)
     • How efficiently are we digitising garments? (Process monitoring)
     • How are we growing?
     To answer this we need … LOTS AND LOTS OF SQL! (yay.)
     Most of it embedded in Looker LookML (basically YAML) (yay - again.)
  10. Data Analytics
      Raw Events, Engagement States, Analytics Model
      (Looker demo goes here if time allows)
  11. Data Science
      Exploring Digitised Garments
  12. Event Data
      {
        "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
        "data": {
          "schema": "",
          "data": {
            "name": "GarmentCoverage",
            "data": {
              "page": {
                "garments": 24,
                "garmentsWithCtas": 14,
                "scrollPosY": 201,
                "load": { "isInitiator": false, "elapsedTimeMs": 1424 }
              },
              "batch": {
                "garments": 12,
                "garmentsWithCtas": 7,
                "ctas": [
                  { "sku": "32536", "x": 0.2721021611002, "y": 1.6311844077961 },
                  { "sku": "32544", "x": 0.51768172888016, "y": 1.6311844077961 },
                  { "sku": "32545", "x": 0.51768172888016, "y": 1.0134932533733 },
                  { "sku": "32548", "x": 0.51768172888016, "y": 0.39580209895052 },
                  { "sku": "53282", "x": 0.76326129666012, "y": 0.39580209895052 },
                  { "sku": "53337", "x": 0.026522593320236, "y": 1.0134932533733 },
                  { "sku": "134499", "x": 0.2721021611002, "y": 0.39580209895052 }
                ]
              }
            }
          }
        }
      }
      GarmentCoverage event: "scrollPosY": 201, "garmentsWithCtas": 7, { "sku": "32544", "x": 0.51768172888016, "y": 1.6311844077961 }
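As a rough illustration of working with the GarmentCoverage event above, the following Python sketch pulls the call-to-action positions out of the nested payload. The function name `extract_ctas` and the trimmed sample event are assumptions made for this example, not Metail's code; only the field names and nesting come from the slide.

```python
import json

# Sample event trimmed from the slide: same nesting, only two of the "ctas" kept.
RAW_EVENT = """
{
  "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
  "data": {
    "schema": "",
    "data": {
      "name": "GarmentCoverage",
      "data": {
        "page": {"garments": 24, "garmentsWithCtas": 14, "scrollPosY": 201},
        "batch": {
          "ctas": [
            {"sku": "32536", "x": 0.2721021611002, "y": 1.6311844077961},
            {"sku": "32544", "x": 0.51768172888016, "y": 1.6311844077961}
          ]
        }
      }
    }
  }
}
"""


def extract_ctas(event):
    """Return (sku, x, y) tuples from a GarmentCoverage event, or [] for other events."""
    inner = event.get("data", {}).get("data", {})
    if inner.get("name") != "GarmentCoverage":
        return []
    ctas = inner.get("data", {}).get("batch", {}).get("ctas", [])
    return [(c["sku"], c["x"], c["y"]) for c in ctas]


event = json.loads(RAW_EVENT)
positions = extract_ctas(event)
print(positions)
```

Returning an empty list for other event names keeps the helper safe to map over a mixed event stream.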
  13. Spread of digitised garments
      • Look at positions of all digitised garments for a given category.
      • ‘page’ is in units of #scrolls (based on browser height on the user’s device)
      • Digitised garments on /women-dress and /women-tops-tees are more spread out than garments on /women-jeans
  14. Views by garment position
      • Aggregate visitors who see garment ‘X’ in a given category on a given date.
      • Scale these visitor counts by the maximum #visitors for a garment on that date in that category.
      • In the /women-dress category:
        • Digitised garments are spread between 0 and 120 page scrolls with median ~40
        • Long “tail” of digitised garments which get far fewer visits.
        • The average digitised garment typically gets about 20% as many visitors as the most popular garment in that category (on a given day).

      Date        url_path      sku     Users  Page  scaled_count
      2016-01-01  /women-dress  101742  699    5.0   0.743617
      2016-01-01  /women-dress  101743  700    4.0   0.744681
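The scaling step described on this slide can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the original analysis code: the function name `scale_counts` is made up, and the third row is a hypothetical garment with 940 users, chosen so that the two real rows reproduce the `scaled_count` values shown in the table.

```python
from collections import defaultdict


def scale_counts(rows):
    """Scale each garment's user count by the per-(date, url_path) maximum.

    rows: iterable of (date, url_path, sku, users) tuples.
    Returns a list of (date, url_path, sku, users, scaled_count) tuples.
    """
    rows = list(rows)
    max_users = defaultdict(int)
    for date, url_path, _sku, users in rows:
        key = (date, url_path)
        max_users[key] = max(max_users[key], users)

    scaled = []
    for date, url_path, sku, users in rows:
        top = max_users[(date, url_path)]
        # Guard against a group where no garment had any visitors.
        ratio = users / top if top else 0.0
        scaled.append((date, url_path, sku, users, ratio))
    return scaled


rows = [
    ("2016-01-01", "/women-dress", "101742", 699),
    ("2016-01-01", "/women-dress", "101743", 700),
    # Hypothetical most-popular garment, not taken from the slide.
    ("2016-01-01", "/women-dress", "HYPOTHETICAL", 940),
]
scaled = scale_counts(rows)
for row in scaled:
    print(row)
```

Dividing by the group maximum puts every garment on a 0 to 1 scale, so categories and dates with very different traffic can be compared directly.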
  15. Views by category
      • Look at positions of all digitised garments for a given category.
      • ‘page’ is in units of #scrolls (based on browser height on the user’s device)
      • Digitised garments on /women-dress and /women-tops-tees are more spread out than digitised garments on /women-jeans. Could also be that there are more digitised garments in /women-tops-tees.
      • There are some “hotspots” of digitised garment positions e.g. ~page 100 for /women-tops-tees. Unfortunately, they are quite far down the category page and visitor counts are typically around 10-20% of the values for the most popular garments (closest to the top of the category page)
      Panels: /women-tops-tees, /women-jeans, /women-dress
  16. Views as time series
      • Digitised garments on /women-dress over time
      • The “hotspot” moves further down the page: most discernibly in the last 2 weeks.
  17. Data Science
      Exploring User Body Shapes
  18. BMI Quantiles
      • BMI: 17.6, Height: 160cm, Weight: 45kg
      • BMI: 19.9, Height: 157cm, Weight: 49kg
      • BMI: 22.2, Height: 153cm, Weight: 52kg
      • BMI: 25.8, Height: 146cm, Weight: 55kg
      • BMI: 29.7, Height: 155cm, Weight: 71kg
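For reference, BMI is weight in kilograms divided by the square of height in metres. The short Python sketch below (the function name `bmi` is illustrative) recomputes it for the first four height/weight pairs on the slide, which match the slide's values after rounding to one decimal place. The slide's heights and weights are presumably rounded themselves, so a recomputed value can differ slightly in the last digit.

```python
def bmi(height_cm, weight_kg):
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# First four (height_cm, weight_kg) pairs from the slide.
examples = [(160, 45), (157, 49), (153, 52), (146, 55)]
rounded = [round(bmi(h, w), 1) for h, w in examples]
print(rounded)
```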
  19. Our Shape Segmentation
      Spoon, Triangle, Bottom Hourglass, Rectangle, Hourglass, Top Hourglass, Inverted Triangle
  20. Users Segmented by Shape
      Adapting the shape segmentation rules of the Lee et al. (2007) paper used by FFIT
      Axes: Hips – Waist (cm), Bust – Waist (cm)
  21. Shape Distribution and Popular Garments
  22. Engagement by Shape
      % of users trying on at least two garments on personalised MeModel
      1SD
  23. Data Science
      Learning User Behaviour
  24. Understanding Users
      Event stream summary over a month
      Visits by day of month
      Diagram labels: All users; Machine Learning Techniques; Distinct types of users
  25. Data Driven User Segmentation
      Distinct types of users
      Use Machine Learning techniques to characterise which features define users in each cluster
  26. Identify clusters: engaged and converted users
      Cluster Labels into Redshift / Looker
      Labels: Acquisition Rate, RPV, Seen Size Advice Rate
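The slides do not say which clustering algorithm was used. Purely as an illustration of the idea of grouping users by behaviour, here is a minimal k-means sketch in standard-library Python. Everything in it is an assumption: the two features (visits and try-ons per user), the toy data, the fixed starting centroids, and the choice of k-means itself.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of the members, keeping the old centroid if a cluster is empty.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical per-user features: (visits, try-ons). Not real Metail data.
users = [(1, 0), (2, 1), (1, 1), (8, 9), (9, 8), (10, 10)]
labels, centres = kmeans(users, centroids=[(1, 0), (10, 10)])
print(labels, centres)
```

In a pipeline like the one on the slide, the resulting labels would then be written back alongside each user so they can be queried from Redshift and Looker.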
  27. A first look at the clusters
      Labels: Acquisition, Retention Reuse, Retention Revisit, Deep Funnel, Revenue, Revenue
      Cluster sizes: 674 users, 595 users, 541 users, 721 users, 312 users
      Try-ons (any model)
  28. Future plans: more MODELLING!
      Some possibilities:
      • Use engagement clustering to create labels for supervised learning
      • Engagement prediction using trained machine learning
      • Apply Probabilistic Graphical Modelling techniques
        • (I quite like Daphne Koller’s Coursera course and book https://www.coursera.org/learn/probabilistic-graphical-models/home/welcome )
      • More Bayesian reasoning
      • … (any suggestions?)
      Time permitting, SAMIAM (http://reasoning.cs.ucla.edu/samiam/) demo goes here
  29. Bayesian inference – what are the variables?
      (Disclaimer: this is me playing around with SAMIAM for 15 minutes and not an actual model)
  30. Bayesian inference – how are things related?
      (Disclaimer: this is me playing around with SAMIAM for 15 minutes and not an actual model)
  31. Bayesian inference – what can we infer?
      (Disclaimer: this is me playing around with SAMIAM for 15 minutes and not an actual model)
  32. That’s all folks!
      Questions?
