
Lifecycle of a Data Science Project


Know the 'Lifecycle of a Data Science Project'. Gain insights from the webinar led by Mathangi Sri, Data Science Lead, Phonepe

Published in: Education
  1. Data Science Life Cycle
  2. Types of Organizations: Product
     • Stable business context
     • Implementation constraints
     • Data science solution to align with a product roadmap
     • Test and control
     • Staged roll-outs
     • Opportunity to create IP
  3. Types of Organizations: Service
     • Problem/requirement understanding
     • Solution to be signed off by stakeholders
     • Constant engagement
     • Fewer constraints on implementation
     • Data is not standardized
     • Data validation is a must and needs sign-off
     • Limited IP creation
  4. The People in a Data Science Project
     • Lead Data Scientist
     • Data Scientists
     • Engagement Managers
     • Account Manager
     • Sales
     • Platform Owners
     • Engineering team
     • Design team
  5. Process Before Data (about 60% of the time is spent on this phase)
     • Steps: problem requirement → consulting on opportunities → solutioning → identifying the success metric and signing off on expected outputs → reporting inputs → implementation decisions → roll-out discussions → randomization understanding → design
     • People involved: Lead Data Scientist, Client Managers, Client, Project Manager, Product or Platform Manager
  6. Process Continued: After Data (about 40% of the time is spent on this phase)
     • Steps: data understanding → data validation → models → governance and model approvals → model go-live → measurement → roll-out → optimization → design
     • People involved: Lead Data Scientist, Data Scientist, Model Governance Board, Client Managers, Client & Client Sponsors, Project Manager, Product or Platform Manager
  7. Problem/Requirement
     • Ranges from very vague to very structured:
     • E.g. "Can you build me a recommendation engine?"
     • E.g. "What would you do to increase sales on our website?"
     • E.g. "Let's build a prediction model with responder as the dependent variable."
  8. Consulting Layer
     • Ask questions to see whether the stated problem is the real pain point or something else lies behind it
     • E.g. when someone asks for a recommendation engine, what they usually want is better engagement in the app
     • The industry is full of buzzwords; getting to the real solution means understanding the true problem
  9. Consulting Layer
     • What type of implementation does the customer want: real-time (e.g. in an app), a notification, or an insight?
     • Is there a product roadmap the solution needs to align with?
     • What kind of modelling toolkit (Python/Scala/R, etc.) can run in the deployment environment?
  10. Solution Blueprint
     • Make a solution blueprint
     • Walk through it with stakeholders: Product, Engineering, client teams, etc.
     • Make sure there are no gaps
     • IP can possibly be filed at this stage
  11. Identifying the Success Metric
     • Define a control group
     • Define a benchmark
     • Make sure the benchmark does not change with time and is truly neutral
     • Arrive at the formula for incremental revenue, incremental sales, etc.
     • Sign off on this with stakeholders
     • Attribution is a key factor
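The incremental-revenue formula the slide asks for can be sketched as below. The function and argument names are illustrative assumptions; the actual metric is whatever the stakeholders sign off on:

```python
def incremental_revenue(test_revenue, n_test, control_revenue, n_control):
    """Incremental revenue of the test group over a neutral control
    benchmark, scaled to the size of the test population."""
    test_rate = test_revenue / n_test           # revenue per test customer
    control_rate = control_revenue / n_control  # revenue per control customer
    return (test_rate - control_rate) * n_test

# Example: equal-sized groups of 10,000 customers each
lift = incremental_revenue(120_000.0, 10_000, 100_000.0, 10_000)
print(lift)  # 20000.0
```

Normalizing by group size before differencing is what makes an unequal test/control split comparable.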
  12. Attribution Problems
     • E.g. the number of chats to human agents may go down when a chatbot is launched, but that does not by itself mean chats were handled well. What if the chatbot simply failed to route unanswered questions to human agents?
     • Conversely, the presence of the chatbot on the web page may draw more customers to try it, and so increase the total number of chats
  13. Reports
     • What should the measurement report contain?
     • What are the metrics?
     • How do you track things like conversion?
     • Does hanging up on an IVR mean the issue was resolved?
     • What should the reporting frequency be?
     • Should the report tally with any other existing system?
  14. Implementation Decisions
     • The platform and its support
     • External data may need to be pumped into the platform
     • Does the platform receive real-time data and have the ability to run real-time models?
     • What real-time support does the platform have?
     • What level of complexity does the platform allow?
     • How should the data science model be delivered and handed over?
     • Who is going to support the models? Who takes ownership of delivery and maintenance?
     • What is the role of the Design/UX team?
  15. Roadmap for Rollouts
     • What is the roll-out roadmap?
     • What should the gates be? E.g. e-commerce roll-outs are often staged as a percentage of the website; in some organizations they are market-based; certain customer groups or segments may form the first roll-out
  16. Randomization Decisions
     • How is randomness ascertained?
     • Browser session id vs. visitor session id for e-commerce
     • Customer-id-based random groups
     • Callers randomized on caller id
     • What is the system that ensures randomness?
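One common way to make the randomization stable and auditable is to hash the chosen id (customer id, caller id, or session id) into a bucket. This is a sketch under that assumption, not something prescribed by the slides:

```python
import hashlib

def assign_group(unit_id: str, test_fraction: float = 0.5) -> str:
    """Deterministically map an id (customer, caller, or session) to a
    test or control group; the same id always lands in the same group."""
    digest = hashlib.sha256(unit_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "test" if bucket < test_fraction else "control"

# The same customer id gets the same assignment on every call or visit.
print(assign_group("CUST-000123"))
```

Hashing (rather than a coin flip at serving time) also means the system needs no lookup table to remember who is in which group.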
  17. Data
     • Understand the data
     • Understand the data distributions, especially of the dependent variable
     • Run sanity checks to ensure the distributions are in line with the problem statement
     • Make sure the data used at this stage of the modelling process is the same data that is available in real time, i.e. during model execution
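A minimal sanity check on the dependent variable's distribution might look like this; the data and the acceptable range are made-up assumptions for illustration:

```python
import random

# Hypothetical binary responder labels; in practice these come from the
# actual modelling dataset.
random.seed(0)
responses = [1 if random.random() < 0.03 else 0 for _ in range(100_000)]

response_rate = sum(responses) / len(responses)
print(f"response rate: {response_rate:.4f}")

# Sanity check: a responder model usually has a low but non-zero base rate;
# a rate of exactly 0 or above 50% would suggest a label or join problem.
assert 0.0 < response_rate < 0.5, "dependent-variable distribution looks off"
```

The same pattern extends to checking feature ranges and null rates before modelling starts.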
  18. Modelling Process
     Data validation → understanding data → data cleaning → pre-processing → feature engineering
  19. Modelling Process (contd.)
     Train, test, and out-of-time validation → algorithm tuning → model iterations → final model → governance approval → model handover
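The train / test / out-of-time split named above can be sketched as follows. The records, dates, and cutoff are made-up assumptions; the point is that a later time window is held out in addition to the usual train/test split, so the model is validated on unseen future data:

```python
# Hypothetical (date, feature, label) records, one per month of 2023.
records = [(f"2023-{m:02d}-01", m * 0.1, m % 2) for m in range(1, 13)]

OOT_CUTOFF = "2023-10-01"  # assumed cutoff: the last quarter is out-of-time
in_time = [r for r in records if r[0] < OOT_CUTOFF]
out_of_time = [r for r in records if r[0] >= OOT_CUTOFF]

# Plain 2:1 chronological split of the in-time data for illustration;
# a real pipeline would shuffle or stratify the train/test split.
split = int(len(in_time) * 2 / 3)
train, test = in_time[:split], in_time[split:]

print(len(train), len(test), len(out_of_time))  # 6 3 3
```

Good out-of-time performance is what guards against the model having merely memorized a particular period.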
  20. Model Go-Live and Optimization
     • An iterative process
     • Validate reports and measure model performance at a fixed frequency
     • Fine-tune the model: add more data
     • Capture more features: instrument the product or use additional data sources
     • Optimize until the model stabilizes and the revenue or target is met
  21. Optimization: Design
     • There can also be design optimizations at this stage, e.g. the content in a widget or the number of stages in a checkout flow
     • The model's execution performance can also be optimized
     • Offers can be optimized at this stage
     • Workflows in bots can be optimized as well
  22. Sign-Off on Model Performance by Stakeholders
     • Stakeholders sign off on and buy into the lift generated
     • Any attribution conflicts are resolved
     • If seasonality or other effects show up in model performance, they get resolved at this stage; in some cases the test or pilot period is extended
  23. Full Roll-Out
     • The model is rolled out to the maximum possible extent
     • Revenues need to be realized on an ongoing basis
     • Additional opportunities can be sought
  24. Patents and IP
     • A lot of IP gets generated during a modelling process
     • IP: novelty, context to the business, increased defensibility
     • Patent review committee
     • IDF → provisional → queries → grant (can take more than 3 years from filing)
     • An expensive process
     • Any idea is a great idea; always discuss with the patent lawyer
     • A simple variable could be a competitive differentiator
     • Algorithms are not patented; methods and processes are
  25. Case Study
     The NPS of telecom giant TelX has been dipping. DecX is a product-and-services organization engaging with TelX to improve NPS. As the chief scientist of DecX: What would you do? Where do you see the NPS going down? What can DecX do to bring up the NPS of TelX?