The Other 99% of a Data Science Project

Slides from my talk at Open Data Science Conference 2016.
Algorithms and models are an important (and cool) part of data science. This talk is about all the other steps it takes to deploy a data science project that makes a product slightly smarter: the things you hear from practitioners but that are not covered well enough in books.

  1. THE OTHER 99% OF A DATA SCIENCE PROJECT. Open Data Science Conference, Santa Clara, November 4-6, 2016. Eugene Mandel, @eugmandel
  2. ABOUT ME ∎ @eugmandel ∎ lead of data science at Directly ∎ formerly: □ data science team at Jawbone □ co-founder of Qualaroo, jaxtr
  3. DATA SCIENCE NEEDS PRODUCT MANAGEMENT: the success of a data science project has as much to do with product management as with data science
  4. 2 KINDS OF DATA SCIENCE: A. BUILD, B. ANALYZE
  5. PAY FOR PARKING WITH YOUR PHONE
  6. DON’T YOU KNOW ME?!
  7. SMART PRODUCTS ∎ “don’t you know me?!” -> “you get me!” ∎ get smarter with every interaction ∎ reduce search space
  8. SMART PRODUCTS: BUT NOT THAT SMART...
  9. SMART PRODUCTS GO PROBABILISTIC
  10. THE OTHER 99% (slide label: algorithms)
  11. PARKING APP, ON DEMAND CUSTOMER SUPPORT
  12. LOOKING FOR OPPORTUNITIES
  13. PROBLEM: choose support tickets that expert users can resolve
  14. LOOKING FOR OPPORTUNITIES
  15. CHOOSE RESOLVABLE TICKETS WITH MACHINE LEARNING
  16. GETTING THE DATA
  17. GETTING ALLIES
  18. GETTING THE DATA
  19. CLEAN YOUR DATA: automated bug reports, surveys, bounced emails, internal tickets, email metadata, email threads, ...
  20. GUYS CLEAN A DATASET, GET RICH
  21. FEATURE ENGINEERING
  22. TRAINING - COLD START PROBLEM (slide labels: all tickets, tickets seen by expert)
  23. TRAINING - GET LABELS: “Is there a cat in this picture?” “Is this support ticket resolvable?”
  24. TRAINING - GET LABELS ∎ label manually ∎ derive labels from user behavior ∎ derive labels from external sources ∎ mix
  25. “My favorite data science algorithm is division.” Monica Rogati, former VP of Data at Jawbone and LinkedIn data scientist
  26. MODEL: Tokenization, Bag of words (BOW), Tf–idf, Random Forest Classifier
  27. DEVELOPMENT
  28. PLAYING WELL WITH ENGINEERING ∎ gaining trust ∎ development process
  29. POINTS OF INTEGRATION: online or offline?
  30. DEVELOPMENT: integration - broad APIs
  31. “NAPKIN ARCHITECTURE”
  32. IS IT WORKING? Evaluating data products. Image source: https://themouseandthewindmill.wordpress.com
  33. EVALUATION METRICS: accuracy, precision/recall, driven by business
  34. IS IT WORKING? QA’ing data products. Image source: https://themouseandthewindmill.wordpress.com
  35. PLAYING WELL WITH DEVOPS
  36. BRIDGING TECH STACKS
  37. IN PRODUCTION
  38. THE KNOBS: HOW TO CONTROL THE PRODUCT ∎ on/off switch per customer ∎ prediction threshold ∎ exclusions
  39. NAMING THINGS: “... SMART ...” “... AI ...” “... MACHINE LEARNING ...” “... INTELLIGENT ...”
  40. UPDATING THE MODEL ∎ input data changes ∎ user behavior changes ∎ dataset grows
  41. NEGATIVE SAMPLING: send a small % of predicted negatives as if they were positive (slide label: predicted positive)
  42. NEGATIVE LABELING: send a small % of predicted negatives for manual labeling (slide label: predicted positive)
  43. LABELING - HOW TO PHRASE THE QUESTION? ∎ “Would you be able to resolve this ticket successfully?” ∎ “Would an expert user be able to resolve this ticket successfully?” ∎ “Would an expert user be able to resolve this ticket successfully without getting a negative rating?”
  44. MESSAGING ∎ customers ∎ sales ∎ account managers ∎ marketing ∎ execs
  45. CUSTOMER ENGAGEMENT PLAYBOOK
  46. DATA ETHICS
  47. INTERPRETABILITY. Image source: https://en.wikipedia.org/wiki/File:Blue_Poles_(Jackson_Pollock_painting).jpg
  48. THANKS! Eugene Mandel, @eugmandel
  49. CREDITS ∎ Presentation template by SlidesCarnival ∎ Images: □ http://jedismedicine.blogspot.com/ □ Jawbone □ Directly □ Wikipedia □ https://themouseandthewindmill.wordpress.com □ http://www.imdb.com/
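
Note on slide 26 (MODEL). The slide names tokenization, bag of words and tf-idf as the steps that feed a Random Forest classifier, but the deck shows no code. The sketch below illustrates only the first three steps with the Python standard library. The function names, the regex tokenizer and the `log(N / df)` weighting are assumptions chosen for illustration (tf-idf has several common variants), and the classifier step is left out entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return (vocabulary, one tf-idf vector per document).

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    # Bag of words: token counts per document, word order discarded.
    bags = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*bags))
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = {term: sum(1 for bag in bags if term in bag) for term in vocabulary}
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append([
            (bag[term] / total) * math.log(n_docs / doc_freq[term]) if total else 0.0
            for term in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical ticket texts, made up for this example.
tickets = [
    "How do I reset my password?",
    "My password reset email bounced",
    "Refund for a double parking charge",
]
vocabulary, vectors = tfidf_vectors(tickets)
```

Each row of `vectors` could then serve as the feature vector for one ticket. A term that appears in every document gets weight zero under this variant, which is the behavior that pushes down uninformative words.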
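
Note on slide 33 (EVALUATION METRICS). The slide lists accuracy and precision/recall and says the choice is driven by the business. As a reminder of how the three differ, here is a minimal sketch that computes them from true and predicted labels. The function name and the convention of returning 0.0 when a denominator is zero are assumptions of this example.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision and recall for binary labels (truthy = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 2 resolvable tickets out of 10, and a model that never says "yes".
always_no = evaluate([1, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0] * 10)
```

In this made-up case accuracy is 0.8 while precision and recall are both 0.0, which is one reason a business goal such as "do not send experts tickets they cannot resolve" may be better served by precision than by accuracy.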
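
Note on slide 38 (THE KNOBS). The slide names three controls: an on/off switch per customer, a prediction threshold, and exclusions. One way these could fit together is sketched below. Everything here is hypothetical: the `Knobs` class, its field names, the default threshold, and the choice to model exclusions as keywords are assumptions, since the deck does not say how exclusions were defined.

```python
from dataclasses import dataclass, field


@dataclass
class Knobs:
    """Hypothetical runtime controls for a prediction-driven feature."""
    enabled_customers: set = field(default_factory=set)  # on/off switch per customer
    threshold: float = 0.8                                # minimum score to act on
    excluded_keywords: tuple = ()                         # assumed form of "exclusions"


def should_route_to_expert(customer_id, ticket_text, score, knobs):
    """Apply the knobs in order: customer switch, exclusions, then threshold."""
    if customer_id not in knobs.enabled_customers:
        return False
    text = ticket_text.lower()
    if any(keyword in text for keyword in knobs.excluded_keywords):
        return False
    return score >= knobs.threshold


# Hypothetical configuration and call.
knobs = Knobs(enabled_customers={"acme"}, threshold=0.7, excluded_keywords=("refund",))
decision = should_route_to_expert("acme", "How do I pair my device?", 0.9, knobs)
```

Keeping the knobs outside the model means the product can be turned off or made more conservative for one customer without retraining anything.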
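
Note on slide 41 (NEGATIVE SAMPLING). The slide describes sending a small percentage of predicted negatives as if they were positive. A plausible reading is that outcomes are otherwise observed only for tickets the model sends to experts, so the model never sees its own false negatives. The sketch below shows that routing rule. The function name, the return values, and the default threshold and sampling rate are assumptions, not values from the talk.

```python
import random


def route(score, threshold=0.8, sample_rate=0.05, rng=random):
    """Decide where a ticket goes and record why.

    Returns (destination, reason). A small random share of predicted
    negatives is sent to experts anyway so their outcomes can be observed.
    """
    if score >= threshold:
        return "expert", "predicted_positive"
    if rng.random() < sample_rate:
        return "expert", "sampled_negative"
    return "default", "predicted_negative"


# Hypothetical usage with a seeded generator so runs are repeatable.
rng = random.Random(0)
decisions = [route(0.3, rng=rng) for _ in range(1000)]
sampled = sum(1 for _, reason in decisions if reason == "sampled_negative")
```

Recording the reason alongside the destination keeps sampled tickets distinguishable from genuine positives when the outcomes are later used as training labels.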
