An Agile Approach to Machine Learning

Machine learning has become an important tool in the modern software toolbox, and high-performing organizations are increasingly coming to rely on data science and machine learning as a core part of their business. eBay introduced machine learning to its commerce search ranking and drove double-digit increases in revenue. Stitch Fix built a multibillion-dollar clothing retail business in the US by combining the best of machines with the best of humans. And WeWork is bringing machine-learned approaches to the physical office environment all around the world. In all cases, algorithmic techniques started simple and slowly became more sophisticated over time. This talk will use these examples to derive an agile approach to machine learning, and will explore that approach across several different dimensions. We will set the stage by outlining the kinds of problems that are most amenable to machine-learned approaches as well as describing some important prerequisites, including investments in data quality, a robust data pipeline, and experimental discipline. Next, we will choose the right (algorithmic) tool for the right job, and suggest how to incrementally evolve the algorithmic approaches we bring to bear. Most fancy cutting-edge recommender systems in the real world, for example, started out with simple rules-based techniques or basic regression. Finally, we will integrate machine learning into the broader product development process, and see how it can help us to accelerate business results.

  1. An Agile Approach to Machine Learning. Randy Shoup, VP Engineering
  2. Background @randyshoup
  3. 1. The Problem
  4. What problem are you trying to solve?
  5. Agree on what you are optimizing
  6. Problem: Evaluating Success
     Overall Evaluation Criterion (OEC)
     • aka “Optimization Function” or “One Metric That Matters”
     • Discussing and agreeing on this metric is itself valuable
     • Only very few metrics, preferably one
     Aligned to Business Value
     • E.g., Actions vs. click rate
     • E.g., Long-term customer value vs. short-term revenue
     • “Pirate metrics” (AARRR): Acquisition, Activation, Retention, Revenue, Referral
     Valid and Measurable
     • Validated by data science, not solely chosen by product / business
     • Look for predictive leading indicators
     • Avoid lagging indicators and vanity metrics
  7. “A problem well-stated is a problem half-solved.” -- Charles Kettering, head of research at GM
  8. Problem: Problem Difficulty https://xkcd.com/1425/
  9. 2. The Data
  10. Data: Characterizing Your Data
      Big but Shallow
      • Many events, only predictive in aggregate
      • E.g., web search queries, ecommerce clickstream, Netflix viewing metrics
      Small but Deep
      • Few events, each of which is significant
      • E.g., ecommerce purchases, WeWork event attendance
  11. Better data beats a smarter algorithm
  12. Data: Better Data
      Clean Data
      • Missing data, partial data
      • Improperly or inconsistently formatted
      Aggregated Data
      • Consolidated into a single (logical) location so it can be processed or analyzed
      • Joined together (“enriched”) with other data sources
      Labeled Data
      • Tagged by humans with one or more labels
      • Required to train supervised models
      • Complicated and expensive at scale
  13. Data: Better Data
      More Data
      • More potentially useful attributes
      • More data sources
      • Longer retention
      Timely Data
      • Data pipeline to automate collection and aggregation
      • Move from large batch to mini-batch to streaming data
  14. “Data preparation accounts for about 80% of the work of data scientists.” -- CrowdFlower survey, 2016 https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#2d58f4ab6f63
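
The "Clean Data" problems listed on slide 12 (missing or partial data, inconsistent formatting) can be illustrated with a minimal sketch. The event fields, the two date formats, and the helper names `parse_date`, `clean_event`, and `clean_all` are assumptions made for this example; they are not taken from the talk.

```python
from datetime import datetime

# Hypothetical raw purchase events showing the problems named on the slide:
# missing fields and inconsistently formatted values.
RAW_EVENTS = [
    {"user_id": "42", "amount": "19.99", "date": "2019-03-01"},
    {"user_id": " 42 ", "amount": "$5.00", "date": "03/02/2019"},
    {"user_id": "", "amount": "7.50", "date": "2019-03-03"},   # missing user
    {"user_id": "7", "amount": None, "date": "2019-03-04"},    # missing amount
]

# Assumed set of date formats seen in the (hypothetical) sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format; return None if none of them matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except (ValueError, TypeError):
            continue
    return None


def clean_event(raw):
    """Return a normalized event, or None if required data is missing."""
    user_id = (raw.get("user_id") or "").strip()
    amount_text = (raw.get("amount") or "").strip().lstrip("$")
    day = parse_date(raw.get("date"))
    if not user_id or not amount_text or day is None:
        return None
    try:
        amount = float(amount_text)
    except ValueError:
        return None
    return {"user_id": user_id, "amount": amount, "date": day}


def clean_all(raw_events):
    """Split events into cleaned records and a count of rejected ones."""
    cleaned, rejected = [], 0
    for raw in raw_events:
        event = clean_event(raw)
        if event is None:
            rejected += 1
        else:
            cleaned.append(event)
    return cleaned, rejected
```

Counting rejected records instead of silently dropping them keeps data quality itself measurable, which fits the emphasis on a reliable, accurate data pipeline.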
  15. 3. The Algorithms
  16. Algorithms: Algorithmic Evolution
      Rules and Heuristics
      • Encode expert knowledge
      • Simple set of imperative if-then-else statements
      • Brittle and primitive
      • Surprisingly effective
      Simple Algorithms
      • Regression
      • Decision trees / forests
      • Collaborative filtering
      • May be all you need
      Advanced Techniques
      • Iterative Optimization / Dynamic Programming
      • Neural nets
      • Deep learning
      • Only when absolutely required
  17. Algorithms: Algorithmic Evolution
      Portfolio / Ensemble Approaches
      • Many real-world problems are best solved through a combination of several algorithms
      • E.g., Netflix Prize
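
The first two stages of the algorithmic evolution above can be sketched as follows: a rules-based scorer, then a basic regression that replaces it behind the same interface. This is an illustration only. The event data, the tag-overlap heuristic, the attendance figures, and all function names are hypothetical and do not describe any system mentioned in the talk.

```python
def rule_based_score(member_interests, event_tags):
    """Stage 1 heuristic: count the tags an event shares with a member."""
    return len(set(member_interests) & set(event_tags))


def fit_linear(xs, ys):
    """Stage 2: ordinary least squares for y = a + b * x with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def make_regression_score(intercept, slope):
    """Wrap a fitted line so it has the same signature as the rule."""
    def score(member_interests, event_tags):
        overlap = rule_based_score(member_interests, event_tags)
        return intercept + slope * overlap
    return score


def recommend(member_interests, events, score_fn, top_n=2):
    """Rank events with whichever scoring function is currently in use."""
    ranked = sorted(
        events,
        key=lambda event: score_fn(member_interests, event["tags"]),
        reverse=True,
    )
    return [event["name"] for event in ranked[:top_n]]


# Hypothetical catalog and member profile.
EVENTS = [
    {"name": "Python meetup", "tags": ["python", "data"]},
    {"name": "Yoga", "tags": ["wellness"]},
    {"name": "ML reading group", "tags": ["data", "ml", "python"]},
]
INTERESTS = ["python", "ml", "data"]

# Hypothetical history: tag overlap vs. observed attendance rate.
intercept, slope = fit_linear([0, 1, 2, 3], [0.1, 0.3, 0.5, 0.7])
regression_score = make_regression_score(intercept, slope)

rules_picks = recommend(INTERESTS, EVENTS, rule_based_score)
model_picks = recommend(INTERESTS, EVENTS, regression_score)
```

Because `recommend` only depends on the scoring function's signature, the heuristic can ship first and be swapped for a learned model later without touching the calling code.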
  18. Algorithms: Online Model Execution (diagram labels: Model Execution, Deploy Model, Collect Data, Train Model, Usage)
  19. Algorithms: Offline Model Building (diagram labels: Model Execution, Model Building, Try New Model)
  20. Algorithms: Scaling Algorithm Development
      Interpretability / Explainability
      • Many common algorithms are highly accurate, but difficult to interpret
      • Model can make a decision, but we cannot “explain” its decision
      • Particularly important in context of system bias
      • (+) Decision trees / forests, linear regression
      • (-) Neural nets, Deep Learning
      DevOps for Data Science
      • Enable data scientists to be self-sufficient in experimenting, building, training, and deploying
      • End-to-end responsibility for models in production
      • Write models, deploy models, monitor model performance
      Algorithm Platform
      • Platform-as-a-service for data scientists
      • Programming model that matches the workflow of a data scientist
      • Abstract away infrastructure and other details
  21. Algorithms: Algorithm Platform-as-a-Service
      • Data scientists spin up their own resources
      • Both ad-hoc execution and repeatable pipelines
      • Data science-friendly programming model exposes ETL and Matrix transforms
      • Abstracts away storage (S3), computation (Docker and ECS), and the model building pipeline (Spark)
  22. 4. The Experiments
  23. “It doesn’t matter how beautiful your theory is. It doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.” -- Richard Feynman
  24. Experimental Discipline: Designing and Running
      1. State Your Hypothesis
      • What metrics do you expect to move, and why
      • Understand your baseline
      2. Design a Real A|B Test
      • Sample size based on effect size
      • Separate control and treatment groups, test for bias
      • Split traffic between control and treatment
      3. Obsessively Log and Measure
      • Understand customer and system behavior
      • Understand why this experiment worked or did not
  25. Experimental Discipline: Designing and Running
      4. Listen to the Data
      • Data trumps hope and intuition
      • Develop insights for the next experiment
      5. Rinse and Repeat
      • This is a journey, not a single step
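
Step 2 above calls for a sample size based on effect size. One common way to estimate it for a conversion-rate metric is the normal-approximation formula for comparing two proportions, sketched below. The talk does not say which method was used, so the formula choice, the default significance level and power, and the function name `sample_size_per_group` are assumptions.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate users needed in EACH group for a two-sided test.

    baseline_rate: conversion rate of the control (the baseline from step 1).
    expected_rate: conversion rate the treatment is hypothesized to reach.
    alpha: acceptable false-positive rate (assumed default 0.05).
    power: probability of detecting a real effect (assumed default 0.8).
    """
    if baseline_rate == expected_rate:
        raise ValueError("effect size must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical scenario: 10% baseline, hoping to detect a lift to 12%.
users_for_large_lift = sample_size_per_group(0.10, 0.12)
# A smaller hoped-for lift (10% to 11%) needs far more traffic.
users_for_small_lift = sample_size_per_group(0.10, 0.11)
```

The required sample grows with the inverse square of the effect size, which is why the baseline and the expected movement need to be stated before the test starts.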
  26. Experimental Discipline: Listen to the Data
      • 1/3 of ideas were positive and statistically significant
      • 1/3 of ideas were flat: no statistically significant difference
      • 1/3 of ideas were negative and statistically significant
      https://exp-platform.com/experiments-at-microsoft/
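
Sorting an experiment into the three outcomes above (positive, flat, negative) requires a significance test. A minimal sketch using a pooled two-proportion z-test follows. The test choice, the 0.05 threshold, the function name, and the traffic numbers are assumptions for illustration, not the method behind the cited figures.

```python
from math import sqrt
from statistics import NormalDist


def classify_experiment(control_conversions, control_users,
                        treatment_conversions, treatment_users, alpha=0.05):
    """Label an A|B result as 'positive', 'flat', or 'negative'."""
    rate_control = control_conversions / control_users
    rate_treatment = treatment_conversions / treatment_users
    pooled = ((control_conversions + treatment_conversions)
              / (control_users + treatment_users))
    std_error = sqrt(pooled * (1 - pooled)
                     * (1 / control_users + 1 / treatment_users))
    if std_error == 0:
        return "flat"  # no variation at all, nothing to distinguish
    z = (rate_treatment - rate_control) / std_error
    # Two-sided p-value under the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    if p_value >= alpha:
        return "flat"
    return "positive" if z > 0 else "negative"


# Hypothetical experiments with 10,000 users per group.
lift = classify_experiment(1000, 10000, 1100, 10000)
noise = classify_experiment(1000, 10000, 1010, 10000)
drop = classify_experiment(1100, 10000, 1000, 10000)
```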
  27. “Being wrong isn’t a bad thing, like they teach you in school. It is an opportunity to learn something.” -- Richard Feynman
  28. Experimental Discipline: Continuous Delivery
      Repeatable Deployment Pipeline
      • Low-risk, push-button deployment
      • Rapid release cadence
      • Rapid rollback and recovery
      Smaller Units of Work
      • Faster to repair
      • Easier to understand
      • Simpler to diagnose
      Enables Experimentation
      • Changes can be rolled out and rolled back
      • Learnings can be applied in the next experiment
  29. Experimental Discipline: Feature Flags
      Enable / Disable feature via configuration
      • Flag controls whether feature is “on” for a particular set of users
      • Independently discovered at eBay, Yahoo, Google
      • Decouple feature delivery from code delivery
      Makes Speed Safe
      • Develop / test / verify in production
      • Rapid on or off for any reason
      Enables Experimentation
      • Overall experiment controlled by feature flag
      • Control vs. treatment
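
The feature-flag idea on slide 29 can be sketched as a configuration entry plus a deterministic hash that assigns each user to control or treatment. The hashing scheme, the `FLAGS` table, and the function names are hypothetical; the talk does not describe how eBay, Yahoo, or Google implemented their flags.

```python
import hashlib

# Hypothetical configuration: flag name -> percent of users in treatment.
# Changing a number here turns a feature up, down, or off without a deploy.
FLAGS = {"new_ranking": 10}


def bucket(user_id, experiment, buckets=100):
    """Map a user to a stable bucket in [0, buckets) for one experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % buckets


def variant(user_id, experiment, treatment_percent):
    """Return 'treatment' or 'control'; the same user always gets the same answer."""
    if bucket(user_id, experiment) < treatment_percent:
        return "treatment"
    return "control"


def is_enabled(flag, user_id):
    """Feature is 'on' only for users in the flag's treatment group."""
    percent = FLAGS.get(flag, 0)  # unknown flags default to off
    return variant(user_id, flag, percent) == "treatment"
```

Including the experiment name in the hash key keeps assignments independent across experiments, so a user in treatment for one flag is not systematically in treatment for another.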
  30. Machine-Learned Ranking
      ● Ranking function for search results
        ○ Small number of hand-tuned factors → Thousands of factors
      ● Incremental Experimentation
        ○ Predictive models: query->view, view->purchase, etc.
        ○ Hundreds of parallel A | B tests
        ○ Full year of steady, incremental improvements
      → 2% increase in eBay revenue (~$120M / year)
  31. Site Speed
      ● Reduce user-experienced latency for search results
      ● Iterative Process
        ○ Implement a potential improvement
        ○ Release to the site in an A | B test
        ○ Monitor metrics: time to first byte, time to click, click rate, purchase rate
      → 2% increase in eBay revenue (~$120M / year)
  32. The most dangerous animal is the “HiPPO” (Highest Paid Person’s Opinion)
  33. Putting it All Together
  34. WeWork Member Experience: Event Recommendations (diagram labels: Member Knowledge Graph, Skills and Interests, Event Feedback, Event Recommender, Predictive Model)
  35. WeWork Member Experience: Event Recipes (diagram labels: Event Recommender, Predictive Model)
  36. WeWork Revenue Optimization: Occupancy Predictor
      • Get the predicted opening occupancy based on the recommended 1-Click price
      • Adjust the price to see how occupancy will change
  37. WeWork Revenue Optimization: Revenue Simulation
  38. WeWork Revenue Optimization: Office Attributes Based Pricing
      • Corner office (premium)
      • Offices with high quality views (premium)
      • Calculate and recommend premiums and discounts for key office attributes
  39. WeWork Revenue Optimization: Inventory Management
      • Fully optimize inventory usage by leveraging demand and profitability predictions
      • Example: Recommend alternative usage for unoccupied spaces
  40. WeWork Applied Science: Automated Layout
      • Automatically lay out desk configuration given space constraints
  41. Takeaways
  42. Takeaways
      1. Drive from Business Needs
      • Identify and frame a clear business problem
      • … that matters to customers or the business
      • Define clear metric(s) for success
      2. Start Small
      • Single problem
      • Solve problem end-to-end
      • Show business results
      3. Data Matters
      • Data collection and storage
      • Data cleanliness and preparation
      • Reliable, accurate, timely data pipeline
      • Better data beats a better model (!)
  43. Takeaways
      4. A | B Testing Discipline
      • Start with a Hypothesis
      • Design an Experiment
      • Separate Control and Experiment group(s)
      • Measure business metric for A vs. B
      • Learn and Decide
      5. Iteratively Refine Model
      • Simple model / No model
      • Rules and Heuristics
      • Gradually increase sophistication with more data and more experience
      6. Iteratively Expand Applications
      • Find broader applicability across the business
      • Apply to more and more problems
      • Move “upstream” in the development process
  44. Takeaways
      7. Data-Driven Culture
      • Make decisions with data instead of guesswork and intuition
      • Avoid HiPPO decision-making
      • Can be threatening to designers, product managers, decision-makers
      8. Machine Learning is not Magic
      • Set of tools in our toolbox
      • Sometimes valuable and useful
      • Not a panacea
      • Not a substitute for thinking
  45. Questions? @randyshoup (New York, San Francisco, Tel Aviv, Shanghai, Singapore, Seattle, Palo Alto)