Implementing Machine Learning Incrementally

Copyright © 2019 Talentica Software (I) Pvt Ltd. All rights reserved.
Dr. Ravindra Guntur
Head of Data Science & Senior Member of the ACM
Talentica Software

Typical Startup Journey

Data First or Algorithm First?
• What is the driver in an AI/ML-based product: the data or the algorithm?
• What do you think a startup will have in its early and Series-A stages: data or an algorithm?
• Should the business model choose the algorithm, or should the data choose the algorithm?
• The business model influences the choice of the algorithm, and the algorithm demands the appropriate data.

[Figure: diagram labeled Architecture, Business Driver, Data, Algorithms, and Time]

Algorithms for Incremental ML
• Fingerprinting
• Stacking
• Overfitting
• One-class classifiers
• Open Set Learning


How Do These Algorithms Work?
• Fingerprinting
  • Represents data of a particular type as a number, a hash, or a vector
• Stacking
  • Helps chain different decision makers
• Overfitting
  • Much frowned upon, but works well when used with fingerprinting and stacking
• Open-set learning
  • Latest class of algorithms
  • Identifies unknown unknowns
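
As a minimal sketch of the fingerprinting idea above, the snippet below represents a piece of text either as a hash or as a vector. The tokenizer, the function names, and the small vocabulary are assumptions made for illustration; the slides do not specify how fingerprints are computed.

```python
import hashlib
import re
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_fingerprint(text: str) -> str:
    """Fingerprint as a hash: equal token sequences give equal digests."""
    joined = " ".join(normalize(text))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def vector_fingerprint(text: str, vocabulary: List[str]) -> List[int]:
    """Fingerprint as a vector: token counts over a fixed vocabulary."""
    counts = Counter(normalize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary, chosen only for this example.
VOCAB = ["how", "many", "count", "orders", "customers"]

same = hash_fingerprint("How many orders?") == hash_fingerprint("how MANY orders")
vector = vector_fingerprint("How many orders are there?", VOCAB)
print(same)    # True: case and punctuation are normalized away
print(vector)  # [1, 1, 0, 1, 0]
```

A hash only matches exact (normalized) repeats, while a vector allows a nearness measure; which one fits depends on how strictly "same type of data" needs to be interpreted.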

Two Case Studies
Case-1: QA system
• Input: natural-language question
• Output: answer generated based on a DB query
Case-2: Single cell classifier
• Input: low-resolution image of a cell in a liquid trough
• Output: single-cell, multi-cell, or impurity classification

Case-1: QA System
[Figure: QA pipeline with stages labeled Text Interpretation, Machine Translation, SQL, DB, and NLG; inputs are divided into a Closed Set (Known Knowns) and an Open Set (Unknown Unknowns)]

Constraints on the System
• The database schema determines the types of questions that can be answered
• There can be many SQL queries of varying complexity
• The supervised set has only a small number of such variations to start with
• For example, count queries of the form
  • SELECT COUNT(<target>) FROM <Table> WHERE <Column> = <Key>
• Based on the class of SQL queries
  • Train a sequence-to-sequence (seq2seq) model or a CRF generator with appropriate English-language questions
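
The count-query template above can be sketched as follows, assuming the slots (target, table, column, key) have already been extracted from the question by some model. The `orders` table, the `SCHEMA` whitelist, and the function names are hypothetical and exist only to make the example runnable with an in-memory SQLite database.

```python
import sqlite3

# Hypothetical schema whitelist. SQL identifiers cannot be bound as
# parameters, so they are validated against the known schema instead.
SCHEMA = {"orders": {"id", "city", "status"}}


def build_count_query(target: str, table: str, column: str) -> str:
    """Fill the COUNT template; the key stays a bound parameter (?)."""
    if table not in SCHEMA:
        raise ValueError(f"unknown table: {table}")
    columns = SCHEMA[table]
    if column not in columns or (target != "*" and target not in columns):
        raise ValueError("unknown column")
    return f"SELECT COUNT({target}) FROM {table} WHERE {column} = ?"


def run_count(conn: sqlite3.Connection, target: str, table: str,
              column: str, key: str) -> int:
    """Execute the filled template and return the single count value."""
    sql = build_count_query(target, table, column)
    return conn.execute(sql, (key,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, city TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Pune", "open"), (2, "Pune", "closed"), (3, "Delhi", "open")],
)

# "How many orders are from Pune?" mapped to the template's slots.
pune_orders = run_count(conn, "id", "orders", "city", "Pune")
print(pune_orders)  # 2
```

Because the schema fixes which tables and columns exist, the same whitelist also illustrates the first constraint: questions about anything outside the schema cannot be answered.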

Strategy: Incremental training
[Flowchart: Input → Fingerprint Detected? If YES, the ML Model produces the Response. If NO, a Default response is returned. When a Retrain is triggered, the ML Model is retrained, followed by Update Fingerprint and Deploy New Model.]
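
One way to read the incremental-training flow is sketched below. It is an illustration under stated assumptions, not the presenter's implementation: the fingerprint is a hypothetical "first two words" rule, and a plain dictionary stands in for the trained ML model.

```python
from typing import Dict, List, Set

DEFAULT_RESPONSE = "Sorry, I cannot answer that yet."


def fingerprint(question: str) -> str:
    """Hypothetical fingerprint: the first two lowercase words."""
    return " ".join(question.lower().split()[:2])


class IncrementalSystem:
    """A single model guarded by a set of known fingerprints."""

    def __init__(self) -> None:
        self.known: Set[str] = set()     # fingerprints the model covers
        self.model: Dict[str, str] = {}  # stand-in for a trained ML model
        self.pending: List[str] = []     # unrecognized inputs to label later

    def respond(self, question: str) -> str:
        fp = fingerprint(question)
        if fp in self.known:             # fingerprint detected: use the model
            return self.model[fp]
        self.pending.append(question)    # not detected: default response
        return DEFAULT_RESPONSE

    def retrain(self, labelled: Dict[str, str]) -> None:
        """Retrain on newly labelled questions, then update and deploy."""
        new_model = dict(self.model)
        for question, label in labelled.items():
            new_model[fingerprint(question)] = label
        self.model = new_model           # deploy new model
        self.known = set(new_model)      # update fingerprint
        self.pending = [q for q in self.pending
                        if fingerprint(q) not in self.known]


system = IncrementalSystem()
before = system.respond("How many orders are open?")
system.retrain({"How many orders are open?": "COUNT query"})
after = system.respond("How many customers are in Pune?")
print(before)  # default response
print(after)   # COUNT query
```

The point of the gate is that the model is only asked about inputs it was trained for; everything else is collected as future training data.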

Strategy: Stacking
[Flowchart: Input passes through a chain of stages. At each stage, if the Fingerprint is Detected (YES), that stage's ML Model produces the Response; if not (NO), the input falls through to the next stage. If no stage detects its fingerprint, a Default response is returned.]
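
The stacking flow can be sketched as a chain of (detector, model) pairs tried in order. The detectors and models here are hypothetical lambdas used as stand-ins; in practice each stage would pair a real fingerprint check with its own small trained model.

```python
from typing import Callable, List, Tuple

# A stage pairs a fingerprint detector with the model that handles its inputs.
Stage = Tuple[Callable[[str], bool], Callable[[str], str]]

DEFAULT_RESPONSE = "Sorry, I cannot answer that yet."


def respond(question: str, stages: List[Stage]) -> str:
    """Try each stage in order; the first detector that fires answers."""
    for detects, model in stages:
        if detects(question):
            return model(question)
    return DEFAULT_RESPONSE


# First release: one simple model for count questions.
stages: List[Stage] = [
    (lambda q: q.lower().startswith("how many"), lambda q: "COUNT query"),
]
early = respond("What is the average order value?", stages)

# Later release: a second model is stacked without touching the first.
stages.append(
    (lambda q: q.lower().startswith("what is the average"),
     lambda q: "AVG query")
)
later = respond("What is the average order value?", stages)
count = respond("How many orders are open?", stages)
print(early)  # default response
print(later)  # AVG query
print(count)  # COUNT query
```

Adding a stage leaves earlier models unchanged, at the cost of walking the chain for inputs that match late or not at all.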

Incremental Training vs Stacking
Incremental Training
• Single model
• Model grows in complexity
• Prediction error of the model changes when new classes are added
• Result generated in one computational cycle
Stacking
• Multiple models
• Each model is simple
• Prediction error of each model remains the same even after new classes are added
• Multiple computational cycles

Case-2: Single Cell Recognition in a Single Cell Printer
[Figure: recognition pipeline with stages labeled Image Transformation, Feature Augmentation, Detection (Single/non-Single), Printer, and Sequencing Application; inputs are divided into a Closed Set (Known Knowns) and an Open Set (Unknown Unknowns)]

Constraints on the System
• Small number of examples for the single-cell and non-single-cell classes
• Imbalanced data
• Low-resolution images
• 2 to 3 variations in the single-cell and non-single-cell classes

Strategy: Open Set Recognition (OSR models)
Ref: Towards Open Set Deep Networks, CVPR 2016

Strategy: Open Set Deep Network
• Mean Activation Vector (MAV)
• Distance between sample and MAV fits different Weibull distributions
• OpenMax Layer
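
The role of the Mean Activation Vector can be illustrated with a deliberately simplified sketch. It is not OpenMax: instead of fitting Weibull distributions to the sample-to-MAV distances, it rejects a sample when its distance to the nearest MAV exceeds a margin times the largest distance seen in training. The class name, the `margin` parameter, and the two-dimensional activations are assumptions made for illustration.

```python
import math
from typing import Dict, List


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean of a class's activation vectors (its MAV)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two activation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class MavRejector:
    """Nearest-MAV classifier with a distance-based 'unknown' option."""

    def __init__(self, margin: float = 1.5) -> None:
        self.margin = margin             # assumed slack on training distances
        self.mavs: Dict[str, List[float]] = {}
        self.thresholds: Dict[str, float] = {}

    def fit(self, activations: Dict[str, List[List[float]]]) -> None:
        for label, vectors in activations.items():
            mav = mean_vector(vectors)
            self.mavs[label] = mav
            # Simplification: a fixed threshold replaces the Weibull fit.
            farthest = max(distance(v, mav) for v in vectors)
            self.thresholds[label] = self.margin * farthest

    def predict(self, activation: List[float]) -> str:
        label = min(self.mavs, key=lambda c: distance(activation, self.mavs[c]))
        if distance(activation, self.mavs[label]) > self.thresholds[label]:
            return "unknown"             # open set: far from every known class
        return label


# Made-up activations for the two known classes.
rejector = MavRejector()
rejector.fit({
    "single": [[1.0, 0.0], [0.8, 0.2], [1.2, -0.2]],
    "non-single": [[0.0, 1.0], [0.2, 0.8], [-0.2, 1.2]],
})
print(rejector.predict([0.9, 0.1]))  # single
print(rejector.predict([4.0, 4.0]))  # unknown
```

In the referenced approach, the per-class Weibull fit turns the distance into a probability of being an outlier, which the OpenMax layer then uses to redistribute score mass to an explicit unknown class; the fixed threshold here only mimics the accept-or-reject outcome.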

Summary
• Proprietary data is a natural differentiator for a product
• There is a class of algorithms that supports incremental improvement in a product's quality using small proprietary data sets
• Many new algorithms have been proposed in the last 3 years, because large labeled datasets for specific and complex conditions are difficult to obtain
• The choice of algorithm depends on the business case, the data (including proprietary data), the architecture, and the speed of delivery

CONTACT
SRIRIZ, BANER-PASHAN LINK RD, PASHAN, PUNE, MAHARASHTRA 411021
E: ravindra@talentica.com
www.talentica.com
