9. Ask a question
What’s my Company & Product’s Goal/Vision?
How can we make our products ‘smarter’?
What data do we have or can get?
Will this be useful to end user?
Business/Domain Expert
DB and Data Expert
11. Explore distributions
• Are they telling a story?
Handle missing data
• Can the missing data be ignored?
• Does it need to be imputed?
Look for outliers
• Do we want to identify/filter/manage them?
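The two checks above (imputing missing values, then looking for outliers) can be sketched as follows. This is a minimal illustration, not the presenter's method: it assumes median imputation and the common 1.5 × IQR rule, and the sample values and function names (`impute_median`, `iqr_outliers`) are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical sample with two gaps and one extreme value.
raw = [5, 7, None, 6, 8, 7, None, 9, 6, 400]
filled = impute_median(raw)      # gaps become the median of the observed values
flagged = iqr_outliers(filled)   # candidates to identify, filter, or manage
print(filled)
print(flagged)
```

The median is used here because, unlike the mean, it is barely moved by the extreme value. Whether flagged points should be dropped, capped, or kept remains a domain decision.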
12. How removing outliers helped uncover the correlation
[Figure: two charts relating % Dropout to No of Questions. One has an axis running from 0 to 400; the other uses the bins 1-3, 4-6, 7-10, 11-14, 15-18, 19-22, 23-26, 27-30. In both, the other axis runs from 0 to 18.]
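A rough sketch of the idea on this slide: drop extreme question counts, group the rest into bins, and compute the dropout rate per bin. The bin edges are the ones that survive from the slide's axis labels. Everything else is an assumption made for illustration: the records are made up, the names `bin_label` and `dropout_by_bin` are hypothetical, and treating any count above 30 as an outlier is a guess, since the slide does not say how outliers were removed.

```python
# Bin edges taken from the axis labels on the slide.
BINS = [(1, 3), (4, 6), (7, 10), (11, 14), (15, 18), (19, 22), (23, 26), (27, 30)]

def bin_label(n_questions):
    """Return the 'lo-hi' label for a question count, or None if no bin contains it."""
    for lo, hi in BINS:
        if lo <= n_questions <= hi:
            return f"{lo}-{hi}"
    return None

def dropout_by_bin(records):
    """Map each bin label to its % dropout, skipping records outside every bin."""
    totals, drops = {}, {}
    for n_questions, dropped in records:
        label = bin_label(n_questions)
        if label is None:  # assumption: out-of-range counts are treated as outliers
            continue
        totals[label] = totals.get(label, 0) + 1
        drops[label] = drops.get(label, 0) + int(dropped)
    return {label: 100.0 * drops[label] / totals[label] for label in totals}

# Made-up records: (number of questions, did the respondent drop out?)
records = [
    (2, False), (3, False), (5, False), (5, True),
    (12, True), (13, False), (28, True), (29, True),
    (350, False),  # extreme count, excluded by the binning step
]
rates = dropout_by_bin(records)
for label, pct in rates.items():
    print(label, pct)
```

With the extreme record left in on a raw 0 to 400 axis, the remaining points are squeezed into a narrow band and any trend is hard to see. Binning the filtered counts spreads them out again.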
14. Bias (underfitting), Optimal, Variance (overfitting)
Solution for bias (underfitting)
• Add more features
• Use a more complex model
Solution for variance (overfitting)
• Fewer features
• More data to reduce variance
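One common way to tell the two cases apart is to compare training error with validation error. The sketch below encodes that rule of thumb and returns the matching remedy from the slide. The function name `diagnose` and both thresholds (`target_error`, `gap_tolerance`) are hypothetical values chosen for illustration, not figures from the presentation.

```python
def diagnose(train_error, validation_error, target_error=0.10, gap_tolerance=0.05):
    """Classify a fit from its error rates (all values are fractions in [0, 1]).

    High training error suggests bias (underfitting).
    Low training error with a much higher validation error suggests
    variance (overfitting).
    """
    if train_error > target_error:
        return "high bias (underfitting): add more features or use a more complex model"
    if validation_error - train_error > gap_tolerance:
        return "high variance (overfitting): use fewer features or collect more data"
    return "near optimal"

print(diagnose(0.30, 0.32))  # poor on both sets
print(diagnose(0.02, 0.25))  # good on training data, poor on validation data
print(diagnose(0.05, 0.07))  # good on both sets
```

Sensible thresholds depend on the problem, so they are best set from a baseline or a business target rather than taken from this sketch.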
15. Overall accuracy is not good enough
             People Resigned        People Not Resigned
High Risk    30                     100 (Type 1 Error)
Low Risk     70 (Type 2 Error)      800
Overall Accuracy – 83%
Sensitivity (Recall) – 30%
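The two figures on this slide follow directly from the four counts in the table. The short calculation below reproduces them, treating "resigned" as the positive class. Precision is not shown on the slide; it is added here only as a further number derived from the same counts.

```python
# Counts from the slide, with "resigned" as the positive class.
tp = 30    # high risk, resigned
fp = 100   # high risk, did not resign (Type 1 error)
fn = 70    # low risk, resigned (Type 2 error)
tn = 800   # low risk, did not resign

total = tp + fp + fn + tn
accuracy = (tp + tn) / total   # share of all predictions that were right
recall = tp / (tp + fn)        # share of actual resignations that were flagged
precision = tp / (tp + fp)     # share of high-risk flags that were right (not on the slide)

print(f"accuracy  = {accuracy:.0%}")
print(f"recall    = {recall:.0%}")
print(f"precision = {precision:.1%}")
```

Accuracy looks high because 900 of the 1,000 people did not resign, so a model that flags almost nobody still scores well. Recall shows that 70 of the 100 actual resignations were missed.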
18. UI and Server side Development
ML
Data Lake
Import/Export
Data Access
Feature Engineering
Benchmark Generation
Missing Data Imputation
Data Binning
Visualizations