Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Bridging the gap from data science to service

263 views

Published on

Recent years have brought an explosion of algorithms, models and software libraries for doing data science that allow unprecedented possibilities for solving problems. But providing a data science service as a consultant or a company involves more than just tools. In this talk, I will share the most useful lessons that I learned while working at a company providing these services.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Bridging the gap from data science to service

  1. 1. BRIDGING THE GAP FROM DATA SCIENCE TO SERVICE 32ND PYDATA LONDON MEETUP Daniel F Moisset - dmoisset@ /machinalis.com @dmoisset
  2. 2. ABOUT ME Hi! I'm Daniel Moisset! I work at Machinalis A special thanks to Marcos Spontón who also works there and inspired most of this talk.
  3. 3. WARNING: THIS IS NOT A TECH TALK! In other words: THIS TALK IS NOT ABOUT ALGORITHMS, MODELS, TOOLS, OR USE CASES In event di ferent words: THIS TALK IS ABOUT PEOPLE
  4. 4. SO, RAISIN BREAD By je freyw (Mmm...raisin bread) [ ],CC BY 2.0 via Wikimedia Commons
  5. 5. Machine Learning development is like the raisins in a raisin bread... you need the bread first. But, it's just a few tiny raisins but without it you would just have plain bread — I don't really know who, but I love the analogy
  6. 6. WHO WANTS RAISIN BREAD Di ferent organizations use your services: 1. Large companies with a live product and data, but without enough expertise/manpower in DS: «we'd like to add some raisins to our bread» 2. Small start-up, with maybe just a prototype, that want to get to production-ready scalable MVP: «We want some bread». And «it should have raisins now/at some point in the future»
  7. 7. IS THAT WHAT THEY ACTUALLY NEED? “All the cool kids are doing it” is not good enough reason. — Seen on the internet Raisin cookies that look like chocolate chip cookies are the main reason I have trust issues
  8. 8. PART I: COMMUNICATING WITH THE CUSTOMER
  9. 9. IT'S NOT JUST SOFTWARE DEVELOPMENT! It also has a heavy R&D component Higher uncertainty Results are probabilistic
  10. 10. THERE'S A PAPER ABOUT IT ≠ A PRODUCT The distance may not be something coverable today.
  11. 11. MODELS ARE AN ASSET Investing time on it is not a “necessary evil” What's produced on a modelling phase is a critical component A model emerges from the client data and constraints, so it is unique to the client and an advantage over competitors.
  12. 12. MACHINE LEARNING ≠ CLAIRVOYANCE Garbage in, Garbage out The solution may not be clear; you may be unsure of what problem is more important; but your business goal should be clear. Data Science will not make it clear for you.
  13. 13. AGREEING ON METRICS Explain what are you measuring and why Explain what are the baselines and how much you think you can improve Connect these to the business goals.
  14. 14. A PICTURE IS WORTH A THOUSAND WORDS Visualize your proposal. Be minimalistic. Use o f the shelf tools for a proposal.
  15. 15. PART II: PROVIDING THE SERVICE
  16. 16. THE SERVICE IS THE END, DATA SCIENCE IS THE MEANS Do not fall in love with the challenge
  17. 17. JUST OUT OF THE BOX MAY BE ENOUGH You should always be asking yourself: 1. Have I already covered the expectations? 2. Will an improved result here actually improve value?
  18. 18. MEASURE TWICE, CUT ONCE Get a look at the object of analysis before starting work. Has it desirable qualities? 1. Manageable size? 2. It's in an accessible representation? 3. Does it have a reasonable distribution? 4. ...
  19. 19. INVOLVE THE PO Validate your assumptions with a person familiar with your domain 1. Are there contradictions between your assumptions and their knowledge? 2. Are there contradictions between the data you already have and their knowledge? Keep learning about the business side, encourage your business counterpart to learn to talk with Data Scientists.
  20. 20. PART OF YOUR SERVICE IS NOT DS Make sure you use the right tools and people in each area
  21. 21. PART III: WORKING AS A TEAM
  22. 22. SHARE INFORMATION Basic descriptive statistics should be shared with all involved, even the non DS. People in a team must be aware of what's important and what's not.
  23. 23. SHARE UNCERTAINTY There are a lot of tradeo fs to make regarding milestones and deadlines. People can plan better (and have contingency plans) if they know what parts of the project have higher risks.
  24. 24. IT'S OK TO BUILD FLIMSY CODE, AS LONG AS IT'S NOT SOFTWARE code: programming text that runs on a computer so tware: programming text that is part of a deliverable. There are di ferences: code does not necessarily need tests. code does not necessarily need to follow other processes. sometimes the outputs of your code are deliverable and may have to be treated specially.
  25. 25. THE DISCUSSION IS JUST BEGINNING I'D LOVE TO HEAR ABOUT WHAT YOU'VE LEARNED ELSEWHERE
  26. 26. THANKS! ANY QUESTIONS? You can find me at twitter (@dmoisset) or by email (dmoisset@machinalis.com)

×