Building a large-scale data product doesn't rely only on sophisticated modeling. It also requires an agile methodology, an iterative research & development process, a versatile big data stack, and a value-oriented mindset. I'll discuss how we at Dsquares build large-scale AI products that leverage clients' data from different industries to deliver business-critical value to the end customer. I'll cover the process of product discovery, R&D tasks for unsolved problems, and mapping business requirements into big data technical requirements.
3. Contents
1. What is Data Product Management?
2. Managing a Data Science Research Process
3. Creating a Healthy Innovation Process
4. Data Product Considerations
5. Data in the cloud vs. in-house
4. The process of building intelligent products that leverage Data & AI
What is Data Product Management?
10. Challenges with Managing a Data Science Research Process
1 - Research-Based: most of the time, data projects attempt to solve new, industry-specific problems.
2 - Open-Ended Experimentation: most projects require multiple experiments with no promise of success.
3 - Iterative Process: partial and full experiment re-iteration will always be there.
4 - Scoping & Estimation: scope and effort estimates for data science tasks are usually tough to get right.
12. When Building a Data Product
- Follow hypothesis-driven development: validate your assumptions about the persona, the JTBD (job to be done), and the solution. Follow the process, and iterate.
- Test a lot, and quickly: engage in user testing from the early phases.
- Set up the necessary metrics: success criteria, definition of done, early stopping, etc.
- Get rigorously innovative.
13. Considerations for Data Projects
- AI Life Cycle: How can we safely roll out our model? How often do we retrain it? How can we avoid model surprises?
- Data Quality & Pipeline: How do we consistently validate the data and our assumptions? How can we avoid data surprises?
- Data Governance & Privacy: How do we protect against any Personally Identifiable Information (PII) leak?
14. 1. Rolling out an AI model
Roll-out strategy matters whenever a new AI model replaces a manual process, a legacy tech process, or an older model. Common deployment strategies:
- Shadow deployment
- Canary deployment
- Blue/green deployment
- A/B deployment
- Big-bang deployment
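Of the strategies above, shadow deployment is often the safest first step: the candidate model receives a copy of live traffic, but only the incumbent model's output is ever served. A minimal sketch of the idea (the `predict_with_shadow` helper and the toy model callables are illustrative assumptions, not a specific framework's API):

```python
def predict_with_shadow(request, live_model, shadow_model, log):
    """Serve the live model's prediction while silently running the
    shadow (candidate) model on the same input for offline comparison."""
    live_pred = live_model(request)  # this result is returned to users
    try:
        shadow_pred = shadow_model(request)  # logged only, never served
        log.append({"input": request, "live": live_pred, "shadow": shadow_pred})
    except Exception as exc:
        # A failing shadow model must never affect the live path.
        log.append({"input": request, "error": str(exc)})
    return live_pred

audit_log = []
# Toy models: the live model doubles, the candidate adds one.
served = predict_with_shadow(3, live_model=lambda x: x * 2,
                             shadow_model=lambda x: x * 2 + 1, log=audit_log)
```

Comparing the logged live and shadow predictions offline is what tells you whether the candidate is safe to promote, without exposing users to model surprises.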
15. 2. Monitoring the AI model
Continuous assessment of model performance is required to catch:
- Data drift: the distribution of the inputs changes
- Concept drift: the relationship between inputs and the target changes
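As one illustration of what a data-drift check can look like, here is a sketch using a two-sample Kolmogorov-Smirnov test from SciPy on a single numeric feature. The `feature_drift` helper, the 0.05 threshold, and the synthetic data are illustrative assumptions, not a prescribed method:

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drift(reference, current, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test on one numeric feature:
    a small p-value means the production distribution differs
    significantly from the training-time reference."""
    statistic, p_value = ks_2samp(reference, current)
    return {"statistic": float(statistic), "p_value": float(p_value),
            "drift": bool(p_value < alpha)}

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)  # feature as seen at training time
shifted = rng.normal(0.5, 1.0, 5000)    # production data with a mean shift
```

In practice such a test would run per feature on each monitoring window; concept drift additionally requires comparing model predictions against (possibly delayed) ground-truth labels.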
16. 3. Monitoring the Data Pipeline
Continuous assessment of the data pipeline is necessary: set up a data observability framework (e.g., checks on volume, schema, and quality).
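A data observability framework can start very small. Below is a minimal sketch of per-batch checks covering volume, schema, and null rates; the `check_batch` helper and its thresholds are illustrative assumptions, not a specific tool's API:

```python
def check_batch(rows, expected_columns, max_null_rate=0.05, min_rows=1):
    """Minimal data-observability checks for one pipeline batch:
    volume, schema, and per-column null rate.
    Returns a list of issues (empty list == healthy batch)."""
    issues = []
    if len(rows) < min_rows:
        issues.append(f"volume: got {len(rows)} rows, expected >= {min_rows}")
        return issues
    for col in expected_columns:
        missing = [r for r in rows if col not in r]
        if missing:
            issues.append(f"schema: column '{col}' missing in "
                          f"{len(missing)}/{len(rows)} rows")
            continue
        null_rate = sum(r[col] is None for r in rows) / len(rows)
        if null_rate > max_null_rate:
            issues.append(f"quality: '{col}' null rate "
                          f"{null_rate:.0%} > {max_null_rate:.0%}")
    return issues
```

Running such checks on every batch, and alerting on a non-empty issue list, is one simple way to avoid the "data surprises" mentioned earlier.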
17. 4. Data Governance & Privacy
- Follow the country's regulations for data privacy (e.g., GDPR).
- Protect privacy by design: anonymize Personally Identifiable Information, and use advanced techniques such as differential privacy.
- Follow guidelines for restricting & documenting data access, and for documenting data lineage, pipelines & transformations.
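One common building block for privacy by design is pseudonymization: replacing each PII value with a keyed hash so records remain joinable across tables without exposing the raw value. A minimal sketch (the `pseudonymize` helper and the demo key are illustrative assumptions; note that under GDPR pseudonymized data is still personal data, so this complements rather than replaces governance controls):

```python
import hashlib
import hmac

def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a PII value (e.g. an email) with a keyed HMAC-SHA256 digest.
    The same input always maps to the same token, so joins still work,
    but the original value cannot be recovered without the key."""
    return hmac.new(secret_key, value.lower().encode("utf-8"),
                    hashlib.sha256).hexdigest()

record = {"email": "Jane.Doe@example.com", "spend": 120.0}
# Keep the analytical fields; replace the identifier with a stable token.
safe = {**record, "email": pseudonymize(record["email"], b"demo-key")}
```

Using a keyed HMAC rather than a plain hash matters: without the secret key, an attacker cannot rebuild the mapping by hashing a dictionary of known emails.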
18. AI on Cloud vs. On-Premises
Depending on the use case, we might need to develop our product either in the cloud or on-premises.
22. RESOURCES
- Alex Cowan (Sep 2022). Hypothesis-Driven Development: A Guide to Smarter Product Management. Available at: https://www.alexandercowan.com/hdd-book-ref/
- Evidently AI (April 2024). Machine Learning in Production Guide: Concept Drift Chapter. Available at: https://www.evidentlyai.com/ml-in-production/concept-drift
- Yashaswi Nayak (July 2022). ML Model Deployment Strategies: An Illustrated Guide to Deployment Strategies for ML Engineers. Available at: https://towardsdatascience.com/ml-model-deployment-strategies-72044b3c1410
- Regulation (EU) 2016/679 (General Data Protection Regulation). Available at: https://gdpr-info.eu/
- Belle, Vaishak, and Ioannis Papantonis (May 26, 2021). "Principles and Practice of Explainable Machine Learning." Frontiers. Available at: https://www.frontiersin.org/articles/10.3389/fdata.2021.688969/full
- Marktab. "What Is the Team Data Science Process?" Azure Architecture Center, Microsoft Learn. Available at: https://learn.microsoft.com/en-us/azure/architecture/data-science-process/overview