Data Science Life Cycle
Types Of Organizations
• Product
- Stable business context
- Implementation constraints
- Data science solution must align with a product roadmap
- Test and control
- Staged rollouts
- Opportunity to create IP
Types Of Organizations
• Service
- Problem and requirement understanding
- Solution to be signed off by stakeholders
- Constant engagement
- Fewer constraints on implementation
- Data is not standardized
- Data validation is a must and has to be signed off
- Limited IP creation
The People In A Data Science Project
• Lead Data Scientist
• Data Scientists
• Engagement Managers
• Account Manager
• Sales
• Platform Owners
• Engineering team
• Design Team
Process Before Data
60% of project time is spent on this phase.
[Flow diagram] Problem/Requirement → Consulting on opportunities → Solutioning → Identifying success metric and signing off on expected outputs → Reporting inputs → Implementation decisions → Rollout discussions → Randomization understanding
People involved: Lead Data Scientist, Client Managers, Client, Project Manager, Product or Platform Manager, Design
Process Continued After Data
40% of project time is spent on this phase.
[Flow diagram] Data understanding → Data validation → Models → Optimization → Governance and model approvals → Model go live → Measurement → Rollout
People involved: Lead Data Scientist, Data Scientist, Model Governance Board, Client Managers, Client & Client Sponsors, Project Manager, Product or Platform Manager, Design
Problem/Requirement
• Ranges from very vague to very structured
Eg: Can you build me a recommendation engine?
Eg: What would you do to increase sales on our website?
Eg: Let's build a prediction model with responder as the dependent variable
Consulting layer
• Ask questions to see whether the stated problem is the real pain point or whether there is something else
• For example, when someone asks for a recommendation engine, all they may really want is better engagement in the app
• The industry is full of buzzwords; getting to the real solution means understanding the true problem
Consulting layer
• What type of implementation does the customer want: real time (like an app or a notification), or an insight?
• Is there a product roadmap that needs to be aligned to?
• What kind of modelling toolkit (Python/Scala/R etc.) can run in the deployment environment?
Solution Blueprint
• Make a solution blueprint
• Walk through it with stakeholders: Product, Engineering, client teams etc.
• Make sure there are no gaps
• IP can possibly be filed at this stage
Identifying success metric
• Define a control group
• Define a benchmark
• Make sure the benchmark does not change with time and is truly neutral
• Arrive at the formula for incremental revenues or incremental sales etc. (a worked sketch follows this list)
• Sign off on this with stakeholders
• Attribution is a key factor
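A minimal sketch of one common form such a formula takes: scale the control group's per-user revenue up to the test group's size and take the difference. The function name, the per-user normalization, and the numbers below are illustrative assumptions, not a prescribed standard.

```python
# Minimal sketch of an incremental-lift calculation against a control group.
# All names and numbers are illustrative assumptions.

def incremental_revenue(test_revenue, test_users, control_revenue, control_users):
    """Incremental revenue = test revenue minus the baseline the test group
    would have produced at the control group's per-user rate."""
    control_rev_per_user = control_revenue / control_users
    expected_baseline = control_rev_per_user * test_users
    return test_revenue - expected_baseline

# Example: 90,000 test users vs. 10,000 control users.
lift = incremental_revenue(test_revenue=1_250_000, test_users=90_000,
                           control_revenue=125_000, control_users=10_000)
print(f"Incremental revenue: {lift:,.0f}")  # 1,250,000 - 12.5 * 90,000 = 125,000
```

This is exactly why the benchmark must stay neutral: if the control group drifts over time, the baseline (and hence the claimed lift) drifts with it.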
Attribution Problems
• For example: the number of chats to an agent may go down when a chatbot is launched, but that does not mean chat was handled well. What if the chatbot had not routed failed questions to human agents?
• Another possibility: because the chatbot is on the web page, more customers may come and try it, which may increase the number of chats
Reports
• What should the measurement report contain?
• What are the metrics?
• How do you track things like conversion?
• Does hanging up on an IVR mean the issue was resolved?
• What should the reporting frequency be?
• Should the report tally with any other existing system?
Implementation Decisions
• The platform and its support
• External data may need to be pumped into the platform
• Does the platform receive real-time data and have the ability to run real-time models?
• What real-time support does the platform have?
• What level of complexity does the platform support?
• How should the data science model be delivered?
• Who is going to support the models?
• The design team/UX and their role
• How should the model be handed over? Who takes ownership of delivery and maintenance?
Roadmap for Rollouts
• What is the rollout roadmap?
• What should the gates be? (an illustrative gating config follows this list)
For example: e-commerce rollouts are often a percentage of website traffic. In some organizations it could be market-based rollouts. Certain customer groups or segments could also be part of the rollout.
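Staged gates are often captured as a simple configuration that maps each stage to an exposure percentage and an exit criterion. The stage names, percentages, and criteria below are invented for illustration; a real roadmap would come out of the rollout discussions above.

```python
from typing import Optional

# Illustrative staged-rollout gates; stages and thresholds are assumptions.
ROLLOUT_GATES = [
    {"stage": "pilot",    "traffic_pct": 5,   "exit_criterion": "no critical defects for 2 weeks"},
    {"stage": "partial",  "traffic_pct": 25,  "exit_criterion": "lift vs. control is statistically significant"},
    {"stage": "majority", "traffic_pct": 75,  "exit_criterion": "metrics stable across markets/segments"},
    {"stage": "full",     "traffic_pct": 100, "exit_criterion": "ongoing measurement only"},
]

def next_gate(current_stage: str) -> Optional[dict]:
    """Return the gate that follows the current stage, or None at full rollout."""
    names = [g["stage"] for g in ROLLOUT_GATES]
    i = names.index(current_stage)
    return ROLLOUT_GATES[i + 1] if i + 1 < len(ROLLOUT_GATES) else None

print(next_gate("pilot"))  # -> the 25% 'partial' gate and its exit criterion
```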
Randomization Decisions
• How is randomness ascertained? (a hashing sketch follows this list)
- Browser session ID vs. visitor session ID for e-commerce
- Customer-ID-based random groups
- Callers randomized on caller ID
- What is the system that ensures randomness?
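One common way to ensure stable randomness is to hash the chosen identifier (customer ID, caller ID, or session ID) into a bucket, so the same unit always lands in the same group. A minimal sketch, assuming a 90/10 test/control split; the salt and split are illustrative.

```python
import hashlib

def assign_group(unit_id: str, salt: str = "experiment-1", test_pct: int = 90) -> str:
    """Deterministically map an ID to 'test' or 'control'.
    Hashing (rather than random()) keeps the assignment stable across
    sessions and makes the split auditable."""
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # uniform bucket in [0, 100)
    return "test" if bucket < test_pct else "control"

print(assign_group("customer-12345"))  # the same ID always gets the same group
```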
Data
• Understand the data
• Understand the data distributions, especially of the dependent variable
• Run sanity checks to ensure distributions are in line with the problem statement (a sketch follows this list)
• Make sure the data used in the modelling process at this stage is what will actually be available in real time or during model execution
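A minimal sanity-check sketch with pandas, assuming a binary dependent variable named `responder`; the column names, the tiny inline frame, and the leaky-column example are assumptions for illustration.

```python
import pandas as pd

# Tiny illustrative frame; in practice this is the modelling extract.
df = pd.DataFrame({
    "responder":       [0, 0, 1, 0, 0, 1, 0, 0],
    "visits_30d":      [3, 1, 9, 2, 4, 12, 1, 2],
    "purchase_amount": [0, 0, 120, 0, 0, 340, 0, 0],  # only known AFTER the outcome
})

# 1. Dependent-variable distribution: a responder rate far from what the
#    problem statement implies is a red flag.
print(df["responder"].value_counts(normalize=True))

# 2. Basic input sanity: missingness and ranges/outliers.
print(df.isna().mean())
print(df.describe())

# 3. Availability check: drop fields that will not exist at model execution
#    time, otherwise the model trains on data it can never see in production.
POST_OUTCOME_COLS = ["purchase_amount"]  # assumed example of a leaky field
df = df.drop(columns=[c for c in POST_OUTCOME_COLS if c in df.columns])
```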
Modelling Process
[Flow diagram] Data validation → Understanding data → Data cleaning → Pre-processing → Feature engineering
Modelling Process (contd.)
[Flow diagram] Train, test and out-of-time validation → Algorithm tuning → Model iterations → Final model → Governance approval → Model handover
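As a hedged sketch of the "train, test and out-of-time validation" step above: an out-of-time (OOT) holdout reserves the most recent period entirely, so the final model is scored on data from after the training window. The column names, cutoff date, and inline data are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative frame; in practice this is the modelling dataset.
df = pd.DataFrame({
    "event_date": pd.date_range("2023-01-01", periods=400, freq="D"),
    "responder":  [0, 0, 0, 1] * 100,
})

# OOT holdout: everything on/after the cutoff is never seen during training.
OOT_CUTOFF = pd.Timestamp("2023-12-01")
in_time = df[df["event_date"] < OOT_CUTOFF]
oot     = df[df["event_date"] >= OOT_CUTOFF]

# Random train/test split within the in-time window.
train, test = train_test_split(in_time, test_size=0.3, random_state=42,
                               stratify=in_time["responder"])

print(len(train), len(test), len(oot))
# Fit on train, tune on test, then report final numbers on the OOT slice;
# a large train-vs-OOT gap usually signals drift or leakage.
```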
Model Go Live And Optimization
• Iterative process
• Validate reports and measure model performance at a definite frequency (a monitoring sketch follows this list)
• Fine-tune the model: add more data
• Capture more features: instrument or use additional data
• Optimize until the model stabilizes and the revenue or other target is met
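A minimal monitoring sketch for measuring performance at a definite frequency, assuming scoring logs with actual outcomes joined back in; the column names, weekly cadence, and inline data are assumptions.

```python
import pandas as pd

# Illustrative scoring log joined with actual outcomes.
logs = pd.DataFrame({
    "scored_at": pd.date_range("2024-01-01", periods=56, freq="D"),
    "score":     [0.22, 0.31, 0.18, 0.40] * 14,  # model's predicted probability
    "responder": [0, 0, 0, 1] * 14,              # actual outcome
})

# Weekly comparison of predicted vs. actual responder rate; the calibration
# gap is one simple drift signal.
weekly = logs.groupby(logs["scored_at"].dt.to_period("W")).agg(
    predicted_rate=("score", "mean"),
    actual_rate=("responder", "mean"),
    volume=("score", "size"),
)
weekly["calibration_gap"] = weekly["predicted_rate"] - weekly["actual_rate"]
print(weekly)  # a widening gap is a cue to retrain, add data, or add features
```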
Optimization - Design
• There could also be design optimizations during this stage, e.g. the content in a widget or the number of stages in a checkout
• Performance of the model in terms of execution could also be optimized
• Offers could also be optimized at this stage
• Workflows in bots could be optimized as well
Sign Off Model Performance By Stakeholders
• Stakeholders to sign off on and buy into the lift generated
• Resolution of any attribution conflict
• If seasonality or other effects show up in model performance, they get resolved at this stage; in some cases the test or pilot period is extended
Full Rollout
• The model is rolled out to the maximum possible extent
• Revenues need to be realized on an ongoing basis
• Additional opportunities can be sought
Patents And IP
• Lots of IP gets generated during a modelling process
• IP needs novelty and business context, and it increases defensibility
• Patent review committee
• IDF → Provisional → Queries → Grant (could take >3 years from filing)
• Expensive process
• Any idea is a great idea: always discuss it with the patent lawyer
• A simple variable could be a competitive differentiator
• Algorithms themselves are not patented; methods and processes are
Case Study
The NPS of telco giant TelX has been dipping. DecX is a product and services org that is engaging with TelX to improve NPS.
As the chief scientist of DecX, what would you do? Where do you see the NPS going down? What can DecX do to bring up the NPS of TelX?