[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick

  1. The Ethos of Process in ML: Getting Comfortable with Uncomfortable (Again)
  2. process noun A series of actions or steps taken in order to achieve a particular end
  3. Image adapted from The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction
  4. Image credit: Interpretable Machine Learning Book, https://christophm.github.io/interpretable-ml-book/terminology.html
  5. Image Credit: Jimmy Whitaker, Pachyderm, https://www.pachyderm.com/blog/completing-machine-learning-loop/
  6. Image adapted from The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction
  7. Tech is easy, process is hard
  8. Tech is easy, process is hard What are the things we may overlook in our approach to process?
  9. Get comfortable with uncomfortable
  10. Doubt is an uncomfortable condition, but certainty is a ridiculous one VOLTAIRE
  11. Models are iterative, so should be the approach to process
  12. Stop trying to justify Data Science $$$
  13. Image Credit: giphy.com
  14. Calculating ROI in ML is hard
  15. Calculating ROI in ML is hard:
      ● Too many cost estimation unknowns
      ● Complex if product already exists
      ● Justifying the experiments that never get deployed
      ● Investment across a huge range of the data suite, from infrastructure to tools, through to governance and insight
      ● Gains (added value) can come in many forms, not necessarily easy to standardise
  16. The attribution problem: these calculations are complicated because the value isn’t all in one number; it’s often spread across multiple departments, teams, business units and programmes of work. Measuring ROI for a data science project can end up as a project in itself, which is often difficult to justify.
  17. Image Credit: Andrew Ng, Introduction to Machine Learning in Production, Coursera. Tell compelling ML stories mapped to business metrics
  18. Make maintenance an even bigger part of your Business As Usual
  19. Steps may get done, but a model will never be. Image Credit: Business Insider, https://www.businessinsider.com/how-netflix-has-looked-over-the-years-2016-4
  20. Steps may get done, but a model will never be. Be confident and make regular time to:
      ● Do qualitative and quantitative user research on your models and outputs
      ● Kill off/deprecate projects that are no longer in use
      ● Do the same for projects with high maintenance resource/cost and low user consumption
      ● Bring projects from MVP to mature
      ● Review all in-production projects, assessing ethics, governance, bias, data divergence and edge cases, through both automated testing and manual reliability evaluation
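      A minimal sketch of what one such automated check might look like, here a data-divergence test on a single numeric feature (Python with scipy is assumed; the feature samples, threshold and wording are illustrative, not from the talk):

         import numpy as np
         from scipy.stats import ks_2samp

         def feature_has_drifted(training_sample, live_sample, alpha=0.01):
             # Two-sample Kolmogorov-Smirnov test: flag drift when the two
             # distributions are unlikely to be the same.
             statistic, p_value = ks_2samp(training_sample, live_sample)
             return p_value < alpha

         rng = np.random.default_rng(0)
         reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # what the model was trained on
         current = rng.normal(loc=0.4, scale=1.0, size=5_000)    # what it sees in production now
         if feature_has_drifted(reference, current):
             print("Feature distribution has diverged; schedule a model review.")

      Checks like this only cover the automated half; the manual reliability evaluation on the slide still needs a human in the loop.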
  21. Address your own needs
  22. Image Credit: https://tristramshepard.wordpress.com/2019/01/10/mr-glibblys-square-world/square-peg-round-hole-gif-gif/
  23. Ensure you have the right people
  24. Data Scientist ML Engineer MLOps Engineer Data/Analytics Engineer ML Scientist
  25. The “Data Translator” The “Data Visualiser” The “Data Explorer” The “Data Modeler” The “Data Product Owner” Data Scientist ML Engineer MLOps Engineer Data/Analytics Engineer ML Scientist The “Data Storyteller”
  26. The “Data Translator” The “Data Visualiser” The “Data Explorer” The “Data Modeler” The “Data Product Owner” Data Scientist ML Engineer MLOps Engineer Data/Analytics Engineer ML Scientist The “Data Storyteller”
  27. Delivery Manager Product Manager Change Manager Business Analyst User Research Data Scientist ML Engineer MLOps Engineer Data/Analytics Engineer ML Scientist User Design & Experience Governance & Strategy Solution Architect
  28. The choice you made is less important than having made the choice
  29. Ethics and governance are not steps in the process, but are fundamental to it
  30. Get ok with people not getting ML
  31. Image adapted from Andy Scherpenberg, What Companies Think AI Is
  32. But… Organisational alignment is still vital Image Credit: geek-and-poke.com
  33. process noun A series of actions or steps taken in order to achieve a particular end
  34. process noun A series of actions or steps taken in order to achieve a particular end
  35. process noun A series of actions or steps taken in order to achieve a particular end
  36. process noun A series of actions or steps taken in order to achieve a particular end outcome
  37. We focus so much on building good outcomes in Machine Learning. If we focus more on continuing to build good process, our outcomes are more likely to be successful.
  38. Thank you! Please connect with me: @LK_Fitzpatrick Leanne Kim Fitzpatrick

Editor's Notes

  1. When we think of process in Data Science we generally think of the CRISP-DM framework
  2. And this was roughly born out of the software development lifecycle in systems/software engineering
  3. The software development process has developed practically, with automation and tooling to aid the SDLC. This practice is known as DevOps: shorter dev cycles, speedier deployment, increased deployment dependability. Key to this was:
     ● Version control: managing code in versions, tracking history, rolling back to a previous version if needed
     ● CI/CD: automated testing on code changes, removing manual overhead
     ● Agile software development: short release cycles, incorporating feedback, emphasising team collaboration
     ● Continuous monitoring: ensuring visibility into the performance and health of the application, with alerts for undesired conditions
     ● Infrastructure as code: automating dependable deployments
     We start with the code repository, adding unit tests to ensure that the addition of a new feature doesn’t break functionality. Code is then integration tested: different software pieces are tested together to ensure the full system functions as expected. The system is then deployed and monitored, collecting information on user experience, resource utilization, errors, alerts and notifications that inform future development.
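     To make the CI/CD point concrete, a minimal sketch of the kind of unit test that would run automatically on every code change (pytest is assumed; the feature function and its expected behaviour are illustrative, not from the talk):

        def normalise_title(raw: str) -> str:
            # Example feature-engineering function under test.
            return raw.strip().lower()

        def test_normalise_title_strips_and_lowercases():
            assert normalise_title("  Senior Data Scientist ") == "senior data scientist"

        def test_normalise_title_handles_empty_string():
            assert normalise_title("") == ""

     In a DevOps setup, tests like these gate every change to the repository, so a new feature cannot quietly break existing behaviour.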
  4. These practices and tools work amazingly for code, but not necessarily for machine learning and data. In ML, logic is no longer coded but learned, and the quality of data is as important as the quality of code. Data is quite different from code:
     ● Data is bigger in size and quantity
     ● Data grows and changes frequently
     ● Data has many types and forms (video, audio, text, tabular, etc.)
     ● Data has privacy challenges beyond managing secrets (GDPR, CCPA, HIPAA, etc.)
     ● Data can exist in various stages (raw, processed, filtered, feature-form, model artifacts, metadata, etc.)
     We also have a separate stage in the development process, model training, where code and data are combined to create a model or artefact.
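     A minimal sketch of treating training as its own stage whose output records exactly which data and which code produced it (the function names, the SHA-256 fingerprint, the pickle packaging and the git-commit string are illustrative assumptions, not from the talk):

        import hashlib
        import pickle

        def fingerprint(path):
            # Hash the raw bytes of a data file so the exact data version is traceable.
            with open(path, "rb") as f:
                return hashlib.sha256(f.read()).hexdigest()

        def train_and_package(train_fn, data_path, code_version, out_path="model_artifact.pkl"):
            # The training stage: code (train_fn, at code_version) and data (data_path)
            # are combined to produce a model, packaged with metadata about both.
            model = train_fn(data_path)
            artifact = {
                "model": model,
                "metadata": {
                    "data_sha256": fingerprint(data_path),  # which data produced it
                    "code_version": code_version,           # e.g. a git commit hash
                },
            }
            with open(out_path, "wb") as f:
                pickle.dump(artifact, f)
            return artifact["metadata"]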
  5. The key thing to see here is that every single one of these mistakes was caused by data bugs, not code problems. If the data is bad, it doesn’t matter how good your code is. Just as we test for bugs in our code, we also need to test for bugs in our data, and our testing and monitoring system has to reflect that.
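     A minimal sketch of what a "data bug" test might look like alongside ordinary code tests (pandas is assumed; the column names, ranges and file path are illustrative, not from the talk):

        import pandas as pd

        def validate_batch(df: pd.DataFrame) -> list:
            # Return a list of data-bug descriptions; an empty list means the batch passes.
            problems = []
            if df.empty:
                problems.append("batch is empty")
            if df["user_id"].isna().any():
                problems.append("missing user_id values")
            if df["age"].lt(0).any() or df["age"].gt(120).any():
                problems.append("age outside plausible range")
            return problems

        # Hypothetical usage: fail the pipeline run before training or scoring on bad data.
        batch = pd.read_parquet("incoming_batch.parquet")  # illustrative path
        issues = validate_batch(batch)
        assert not issues, f"Data bugs found: {issues}"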
  6. And this results in an update to our Data Science process to the CRISP-ML process framework, which accounts for the research and business understanding phase as well as the iterative development and deployment stages.
  7. Despite a proliferation of tools to help us with this process, it’s going to take a while for the stack we need to fully come together. The tools themselves are rapidly evolving and changing and even our categories are shifting. This makes it incredibly complex to navigate the tools you need to do machine learning as we’re developing them at the same time as figuring out if we actually need them. In reality it may never converge to a future neat end to end stack, and if it does we’re probably still many years away from it.
  8. It’s timely to remind ourselves that in reality Machine Learning (and Data Science) is still an incredibly nascent field, and therefore it’s ok that we’re still figuring things out, much like the story of Goldilocks and the three bears.
  9. We should be continually scrutinizing, analysing and evaluating the way we do data science and ML.
  10. The idea of proving out £ ROI is very appealing as it gives everyone a “safe” feeling around data science or machine learning projects that are perceived to be risky. Plus, who doesn’t want to say their team is rolling in cash?!
  11. However, ROI in ML is hard.
  12. However, ROI in ML is hard. Forms: Efficiencies & optimisations, new opportunities,
  13. Maximum likelihood estimation for our NLP algorithm, mapped through the stages to impact on users and the difference for the business. Focus should be on influence through stories and the narrative around the impact that Machine Learning can have on the business. If you end up in a position where the business is now getting you to justify the existence of ML internally, then something has gone wrong somewhere else, and no amount of $ signs is going to make a difference at that stage. I have nothing specifically against using guiding-light $ values to justify the value of a team, but don’t let your business take them too seriously as it can spiral downhill quickly!
  14. We’re comfortable with products evolving, but who here can think of that model they built back in 2017 that’s still churning out some prediction around users’ next actions and hasn’t been updated since? And now there are hundreds of new potential user actions that have never been addressed. Generally we’re quite poor at thinking of our services as products; they need TLC.
  15. We can all think of that one person who desperately still needs you to run that TB-consuming regression model that fed in all the revenue points; no, instead he can get an Excel report from finance!
  16. In reality I think
  17. The choice you made is less important than having made the choice
  18. Good decisions don’t always have a good outcome - and that’s ok! Making a choice is paramount to moving forward. Likely the FT wouldn’t have made the choice on the tool stack we have now if we were to pick it today, but it’s what was best at the time and we continue to evolve as our needs require it.
  19. Going to have to do a heck of a lot more advocacy and championing
  20. Going to have to do a heck of a lot more advocacy and championing
  21. And the other reality is that businesses don’t care so much about all the intricacies of these data & code processes. In fact a lot of organisations still question what Machine Learning actually is.