Copyright © 2019 Talentica Software (I) Pvt Ltd. All rights reserved.
Implementing Machine Learning Incrementally
Dr. Ravindra Guntur
Head of Data Science & Senior Member of the ACM
Talentica Software
Typical Startup Journey
Data First or Algorithm First
• What is the driver in an AI/ML-based product?
• Is it the data or is it the algorithm?
• What do you think a startup will have at the early and Series-A stages?
• Data or an algorithm?
• Should the business model choose the algorithm, or should the data choose the algorithm?
• The business model influences the choice of the algorithm, and the algorithm demands the appropriate data.
[Figure labels: Architecture, Business Driver, Data, Algorithms, Time]
Algorithms for Incremental ML
• Fingerprinting
• Stacking
• Overfitting
• One-class classifiers
• Open Set Learning
How Do These Algorithms Work?
• Fingerprinting
  • Represents data of a particular type as a number, a hash, or a vector
• Stacking
  • Helps chain different decision makers
• Overfitting
  • Much frowned upon, but works well when combined with fingerprinting and stacking
• Open-set learning
  • Latest class of algorithms
  • Identifies unknown unknowns
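A minimal sketch of the two fingerprint styles mentioned above (a hash and a vector), applied to text. The function names, the token normalization, and the example vocabulary are assumptions made for illustration; the slides do not say how fingerprints were actually computed.

```python
import hashlib
import re
from collections import Counter
from typing import List


def tokens(text: str) -> List[str]:
    # Lowercase and keep only alphanumeric runs, so case, spacing and
    # punctuation do not change the fingerprint.
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_fingerprint(text: str) -> str:
    # Hash-style fingerprint: a short hex digest of the normalized text.
    normalized = " ".join(tokens(text))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def vector_fingerprint(text: str, vocabulary: List[str]) -> List[int]:
    # Vector-style fingerprint: word counts over a fixed vocabulary
    # (the vocabulary is a hypothetical choice for this example).
    counts = Counter(tokens(text))
    return [counts[word] for word in vocabulary]


same = hash_fingerprint("How many orders?") == hash_fingerprint("how   many ORDERS")
vec = vector_fingerprint("How many orders in Pune", ["how", "many", "count", "pune"])
```

A hash fingerprint only supports exact matching after normalization, while a vector fingerprint allows distance-based comparison between inputs.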
Two Case-Studies
Case-1: QA system
• Input: natural language question
• Output: answer generated based on a DB query
Case-2: Single cell classifier
• Input: low-resolution image of a cell in a liquid trough
• Output: single-cell, multi-cell, or impurity classification
Case-1: QA System
[Figure: pipeline with stages Text Interpretation, Machine Translation, SQL, DB, NLG; regions labeled Closed Set (Known Knowns) and Open Set (Unknown Unknowns)]
Constraints on the System
• The database schema determines the types of questions that can be answered
• There can be many SQL queries of varying complexity
• The supervised set has a small number of such variations to start with
• For example, count queries of the form
  • SELECT COUNT(<target>) FROM <Table> WHERE <Column> = <Key>
• Based on the class of SQL queries
  • Train a sequence-to-sequence model or a CRF generator with appropriate English-language questions
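To make the count-query form concrete, here is a small sketch that fills the template from a schema and runs it on an in-memory SQLite database. The `orders` table, its columns, and the helper name `build_count_query` are hypothetical; the slides do not describe the real schema or how slots were filled (in the talk, a trained model would produce them).

```python
import sqlite3
from typing import Dict, Set

# The count-query form, with the <Key> slot passed as a bound parameter.
COUNT_TEMPLATE = 'SELECT COUNT("{target}") FROM "{table}" WHERE "{column}" = ?'


def build_count_query(schema: Dict[str, Set[str]], table: str,
                      target: str, column: str) -> str:
    # The schema determines which questions can be answered, so reject
    # any table or column that it does not contain.
    if table not in schema or not {target, column} <= schema[table]:
        raise ValueError("question refers to something outside the schema")
    return COUNT_TEMPLATE.format(target=target, table=table, column=column)


# Hypothetical toy database for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, city TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "Pune"), (2, "Pune"), (3, "Mumbai")])
SCHEMA = {"orders": {"id", "city"}}

# "How many orders are from Pune?" mapped by hand to the template slots.
sql = build_count_query(SCHEMA, "orders", "id", "city")
count = conn.execute(sql, ("Pune",)).fetchone()[0]
```

Checking slot values against the schema before formatting keeps free text out of the SQL string; only the key is user-supplied, and it is bound as a parameter.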
Strategy: Incremental training
[Flowchart nodes: Input; Fingerprint (Detected? YES/NO); ML Model; Response; Default response; Retrain; ML Model; Update Fingerprint; Deploy New Model]
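One possible reading of the incremental-training flow, sketched in code: a known fingerprint routes the input to the single deployed model, anything else gets the default response, and retraining produces one new model, updates the fingerprint set, and deploys it. The fingerprint function (first two words of the question), the class name, and the lookup table standing in for an ML model are all assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

DEFAULT_RESPONSE = "Sorry, I cannot answer that yet."


def fingerprint(question: str) -> str:
    # Hypothetical fingerprint: the first two lowercase words of the question.
    return " ".join(question.lower().split()[:2])


class IncrementalSystem:
    def __init__(self) -> None:
        self.training_set: Dict[str, str] = {}  # fingerprint -> query class
        self.model: Dict[str, str] = {}         # stand-in for the deployed ML model
        self.known: Set[str] = set()            # fingerprints the model covers
        self.unanswered: List[str] = []         # inputs that got the default response

    def respond(self, question: str) -> str:
        fp = fingerprint(question)
        if fp in self.known:                    # fingerprint detected: YES
            return self.model[fp]
        self.unanswered.append(question)        # fingerprint detected: NO
        return DEFAULT_RESPONSE

    def retrain(self, labelled: List[Tuple[str, str]]) -> None:
        for question, label in labelled:
            self.training_set[fingerprint(question)] = label
        new_model = dict(self.training_set)     # stand-in for refitting ONE model on all data
        self.known = set(new_model)             # update fingerprint
        self.model = new_model                  # deploy new model
        self.unanswered.clear()


system = IncrementalSystem()
system.retrain([("How many orders are from Pune?", "COUNT")])
first = system.respond("List all cities")       # unknown class: default response
system.retrain([("List all cities", "SELECT")])
second = system.respond("list all customers")   # now covered by the retrained model
```

The point of the sketch is the control flow: there is always exactly one model, and it is replaced as a whole each time a new class of questions is added.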
Strategy: Stacking
[Flowchart nodes: Input; first stage: Fingerprint (Detected? YES/NO), ML Model, Response; second stage: Fingerprint (Detected? YES/NO), ML Model, Response; Default response]
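A sketch of how the stacking flow might look in code, assuming each stage pairs a fingerprint set with its own small model and that an input falls through the stages in order until one recognizes it. The `Stage` type, the fingerprint function, and the lambda "models" are hypothetical stand-ins, not the presenter's implementation.

```python
from typing import Callable, List, NamedTuple, Set

DEFAULT_RESPONSE = "Sorry, I cannot answer that yet."


def fingerprint(question: str) -> str:
    # Hypothetical fingerprint: the first two lowercase words of the question.
    return " ".join(question.lower().split()[:2])


class Stage(NamedTuple):
    fingerprints: Set[str]          # inputs this stage claims
    model: Callable[[str], str]     # simple model dedicated to those inputs


def respond(question: str, stages: List[Stage]) -> str:
    fp = fingerprint(question)
    for stage in stages:            # one computational cycle per stage tried
        if fp in stage.fingerprints:
            return stage.model(question)
    return DEFAULT_RESPONSE         # no stage detected the fingerprint


count_stage = Stage({"how many"}, lambda q: "COUNT")
stages = [count_stage]
before = respond("List all cities", stages)

# Supporting a new class means appending a stage; earlier stages are untouched.
stages.append(Stage({"list all"}, lambda q: "SELECT"))
after = respond("List all cities", stages)
```

Because existing stages are never retrained, their behavior on inputs they already handle does not change when a stage is added, at the cost of checking several stages per input.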
Incremental Training vs Stacking
Incremental Training
• Single model
• Model grows in complexity
• Prediction error of the model changes when new classes are added
• Result generated in one computational cycle
Stacking
• Multiple models
• Each model is simple
• Prediction error of each model remains the same even after new classes are added
• Multiple computational cycles
Case-2: Single Cell Recognition in a Single Cell Printer
[Figure: pipeline with stages Image Transformation, Feature Augmentation, Detection (Single/non-Single), Printer, Sequencing Application; regions labeled Closed Set (Known Knowns) and Open Set (Unknown Unknowns)]
Constraints on the System
• Small number of examples for single cell and non-single cell
• Imbalanced data
• Low resolution images
• 2 to 3 variations in single cell and non-single cell
Strategy: Open Set Recognition (OSR models)
Ref: Towards Open Set Deep Networks, CVPR 2016
Strategy: Open Set Deep Network
[Figure: network with an OpenMax layer. Annotations: Mean Activation Vector (MAV); the distance between a sample and the MAV fits different Weibull distributions]
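A simplified, OpenMax-style sketch of the idea on this slide: compute a mean activation vector per class, measure how far a sample's activation lies from each MAV, turn that distance into an outlier probability with a Weibull CDF, and shift the discounted activation mass to an extra "unknown" class. The Weibull shape and scale values are hypothetical placeholders; in practice they would be fitted to the largest training distances of each class, and that fitting step is omitted here. This is not the reference implementation from the cited paper.

```python
import math
from typing import List, Sequence, Tuple


def mean_activation_vector(activations: Sequence[Sequence[float]]) -> List[float]:
    # Average the activation vectors of one class's training samples.
    n = len(activations)
    return [sum(column) / n for column in zip(*activations)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weibull_cdf(distance: float, shape: float, scale: float) -> float:
    # Probability that a distance this large is an outlier for the class.
    return 1.0 - math.exp(-((distance / scale) ** shape))


def openmax_probabilities(activation: Sequence[float],
                          mavs: Sequence[Sequence[float]],
                          weibulls: Sequence[Tuple[float, float]]) -> List[float]:
    # Returns K + 1 probabilities; the last entry is the "unknown" class.
    k = len(activation)
    ranked = sorted(range(k), key=lambda c: activation[c], reverse=True)
    revised = [0.0] * k
    unknown = 0.0
    for rank, c in enumerate(ranked):
        shape, scale = weibulls[c]
        outlier = weibull_cdf(euclidean(activation, mavs[c]), shape, scale)
        rank_weight = (k - rank) / k           # top-ranked classes are discounted most
        keep = 1.0 - rank_weight * outlier
        revised[c] = activation[c] * keep
        unknown += activation[c] * (1.0 - keep)
    scores = revised + [unknown]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


mavs = [mean_activation_vector([[9.0, 1.0], [11.0, 1.0]]),
        mean_activation_vector([[1.0, 9.0], [1.0, 11.0]])]
weibulls = [(2.0, 3.0), (2.0, 3.0)]            # hypothetical (shape, scale) per class
near = openmax_probabilities([10.0, 1.0], mavs, weibulls)  # sits on the class-0 MAV
far = openmax_probabilities([6.0, 6.0], mavs, weibulls)    # far from both MAVs
```

In this toy setup the sample that coincides with a MAV keeps its class, while the sample far from both MAVs has most of its probability moved to the unknown class.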
Summary
• Proprietary data is a natural differentiator for the product
• There exists a class of algorithms that supports incremental improvement in a product’s quality using small proprietary data sets
• Many new algorithms have been proposed in the last 3 years because large labeled datasets for specific and complex conditions are difficult to get
• The choice of algorithms depends on the business case, the data (including proprietary data), the architecture, and the speed of delivery
CONTACT
SRIRIZ,
Baner-Pashan Link Rd,
Pashan, Pune,
Maharashtra 411021
E: ravindra@talentica.com
www.talentica.com

More Related Content

What's hot

State street edmc swaps pilot
State street edmc swaps pilotState street edmc swaps pilot
State street edmc swaps pilot
Marty Loughlin
 
Introduction To Data Science With Python
Introduction To Data Science With PythonIntroduction To Data Science With Python
Introduction To Data Science With Python
Spotle.ai
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real Problems
DataWorks Summit
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
Yuriy Guts
 
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
Dr. Haxel Consult
 
Edmc use cases 2018 nyc
Edmc use cases 2018   nycEdmc use cases 2018   nyc
Edmc use cases 2018 nyc
Marty Loughlin
 
Best Python Libraries For Data Science & Machine Learning | Edureka
Best Python Libraries For Data Science & Machine Learning | EdurekaBest Python Libraries For Data Science & Machine Learning | Edureka
Best Python Libraries For Data Science & Machine Learning | Edureka
Edureka!
 
Introduction to Auto ML
Introduction to Auto MLIntroduction to Auto ML
Introduction to Auto ML
Dmitry Petukhov
 
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning SolutionsKamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Greg Makowski
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoML
Ning Jiang
 
Machine learning using spark Online Training
Machine learning using spark Online TrainingMachine learning using spark Online Training
Machine learning using spark Online Training
Learntek1
 
Streaming analytics
Streaming analyticsStreaming analytics
Streaming analytics
Gerard McNamee
 
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Fwdays
 
AI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for SuccessAI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for Success
Databricks
 
Thinking About Guideline for Data Interoperability - Design concept and workf...
Thinking About Guideline for Data Interoperability - Design concept and workf...Thinking About Guideline for Data Interoperability - Design concept and workf...
Thinking About Guideline for Data Interoperability - Design concept and workf...
Open Cyber University of Korea
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
Alok Singh
 
InfoEducatie - What is Solution Architecture?
InfoEducatie - What is Solution Architecture?InfoEducatie - What is Solution Architecture?
InfoEducatie - What is Solution Architecture?
Bogdan Bocse
 
AI-SDV 2021 - Holger Keibel; Daniele Puccinelli - Leveraging pre-trained lang...
AI-SDV 2021 - Holger Keibel; Daniele Puccinelli - Leveraging pre-trained lang...AI-SDV 2021 - Holger Keibel; Daniele Puccinelli - Leveraging pre-trained lang...
AI-SDV 2021 - Holger Keibel; Daniele Puccinelli - Leveraging pre-trained lang...
Dr. Haxel Consult
 
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Edureka!
 
AI-SDV 2021: Stefan Geissler - AI support for creating and maintaining vocabu...
AI-SDV 2021: Stefan Geissler - AI support for creating and maintaining vocabu...AI-SDV 2021: Stefan Geissler - AI support for creating and maintaining vocabu...
AI-SDV 2021: Stefan Geissler - AI support for creating and maintaining vocabu...
Dr. Haxel Consult
 

What's hot (20)

State street edmc swaps pilot
State street edmc swaps pilotState street edmc swaps pilot
State street edmc swaps pilot
 
Introduction To Data Science With Python
Introduction To Data Science With PythonIntroduction To Data Science With Python
Introduction To Data Science With Python
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real Problems
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
 
Edmc use cases 2018 nyc
Edmc use cases 2018   nycEdmc use cases 2018   nyc
Edmc use cases 2018 nyc
 
Best Python Libraries For Data Science & Machine Learning | Edureka
Best Python Libraries For Data Science & Machine Learning | EdurekaBest Python Libraries For Data Science & Machine Learning | Edureka
Best Python Libraries For Data Science & Machine Learning | Edureka
 
Introduction to Auto ML
Introduction to Auto MLIntroduction to Auto ML
Introduction to Auto ML
 
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning SolutionsKamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoML
 
Machine learning using spark Online Training
Machine learning using spark Online TrainingMachine learning using spark Online Training
Machine learning using spark Online Training
 
Streaming analytics
Streaming analyticsStreaming analytics
Streaming analytics
 
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
 
AI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for SuccessAI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for Success
 
Thinking About Guideline for Data Interoperability - Design concept and workf...
Thinking About Guideline for Data Interoperability - Design concept and workf...Thinking About Guideline for Data Interoperability - Design concept and workf...
Thinking About Guideline for Data Interoperability - Design concept and workf...
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
InfoEducatie - What is Solution Architecture?
InfoEducatie - What is Solution Architecture?InfoEducatie - What is Solution Architecture?
InfoEducatie - What is Solution Architecture?
 
AI-SDV 2021 - Holger Keibel; Daniele Puccinelli - Leveraging pre-trained lang...
AI-SDV 2021 - Holger Keibel; Daniele Puccinelli - Leveraging pre-trained lang...AI-SDV 2021 - Holger Keibel; Daniele Puccinelli - Leveraging pre-trained lang...
AI-SDV 2021 - Holger Keibel; Daniele Puccinelli - Leveraging pre-trained lang...
 
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
 
AI-SDV 2021: Stefan Geissler - AI support for creating and maintaining vocabu...
AI-SDV 2021: Stefan Geissler - AI support for creating and maintaining vocabu...AI-SDV 2021: Stefan Geissler - AI support for creating and maintaining vocabu...
AI-SDV 2021: Stefan Geissler - AI support for creating and maintaining vocabu...
 

Similar to Implementing Machine Learning Incrementally

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Yogesh Sharma
 
The Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldThe Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous World
Maria Colgan
 
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaData meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow India
Sandesh Rao
 
Get ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_extGet ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_ext
Oracle Developers
 
DBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through MigrationDBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through Migration
Tammy Bednar
 
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughtonReal-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Synerzip
 
Data Science in Enterprise
Data Science in EnterpriseData Science in Enterprise
Data Science in Enterprise
Josh Yeh
 
Using Machine Learning to Debug complex Oracle RAC Issues
Using Machine Learning  to Debug complex Oracle RAC IssuesUsing Machine Learning  to Debug complex Oracle RAC Issues
Using Machine Learning to Debug complex Oracle RAC Issues
Anil Nair
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
Dataconomy Media
 
DutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive SectorDutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive Sector
BigML, Inc
 
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUGIntroducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Sandesh Rao
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
Nikhil Garg
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
SnapLogic
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
MLconf
 
The 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedThe 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: Exposed
Cloudera, Inc.
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019
DataKitchen
 
Machine Learning Everywhere
Machine Learning EverywhereMachine Learning Everywhere
Machine Learning Everywhere
DataWorks Summit
 
Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018 Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Open Data Group
 
Mass Scale Networking
Mass Scale NetworkingMass Scale Networking
Mass Scale Networking
Steve Iatrou
 
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
Amazon Web Services Korea
 

Similar to Implementing Machine Learning Incrementally (20)

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
The Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldThe Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous World
 
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaData meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow India
 
Get ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_extGet ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_ext
 
DBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through MigrationDBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through Migration
 
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughtonReal-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
 
Data Science in Enterprise
Data Science in EnterpriseData Science in Enterprise
Data Science in Enterprise
 
Using Machine Learning to Debug complex Oracle RAC Issues
Using Machine Learning  to Debug complex Oracle RAC IssuesUsing Machine Learning  to Debug complex Oracle RAC Issues
Using Machine Learning to Debug complex Oracle RAC Issues
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
 
DutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive SectorDutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive Sector
 
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUGIntroducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
 
The 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedThe 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: Exposed
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019
 
Machine Learning Everywhere
Machine Learning EverywhereMachine Learning Everywhere
Machine Learning Everywhere
 
Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018 Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018
 
Mass Scale Networking
Mass Scale NetworkingMass Scale Networking
Mass Scale Networking
 
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
 

Recently uploaded

How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
lzdvtmy8
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Building a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdfBuilding a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdf
cjimenez2581
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
oaxefes
 

Recently uploaded (20)

How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Building a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdfBuilding a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdf
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
 

Implementing Machine Learning Incrementally

  • 1. Copyright © 2019 Talentica Software (I) Pvt Ltd. All rights reserved. Implementing Machine Learning Incrementally Dr. Ravindra Guntur Head of Data Science & Senior Member of the ACM Talentica Software
  • 2. Copyright © 2019 Talentica Software (I) Pvt Ltd. All rights reserved. Typical Startup Journey 2
  • 3. Copyright © 2019 Talentica Software (I) Pvt Ltd. All rights reserved. Data First or Algorithm First • What is the driver in an AI/ML based product • Is it data or is it the algorithm • What do you think a startup will have in early and Series-A stage • Data or Algorithm • Should the business model choose the algorithm or should the data choose the algorithm • Business model influences the choice of the algorithm and the algorithm demands the appropriate data. 3
  • 4. Copyright © 2019 Talentica Software (I) Pvt Ltd. All rights reserved.4 Architecture Business Driver Data Algorithms Time
  • 5. Algorithms for Incremental ML
      • Fingerprinting
      • Stacking
      • Overfitting
      • One-class classifiers
      • Open-set learning
  • 6. How Do These Algorithms Work?
  • 7. How Do These Algorithms Work
      • Fingerprinting
          • Represents data of a particular type as a number, a hash, or a vector
      • Stacking
          • Helps chain different decision makers
      • Overfitting
          • Much frowned upon; works well with fingerprinting and stacking
      • Open-set learning
          • Latest class of algorithms
          • Identifies unknown unknowns
  • 8. Two Case-Studies
      • Case-1: QA system
          • Input: natural language question
          • Output: answer generated based on a DB query
      • Case-2: Single cell classifier
          • Low-resolution image of a cell in a liquid trough
          • Output: single-cell, multi-cell, or impurity classification
  • 9. Case-1: QA System [Diagram labels: Text Interpretation, Machine Translation, SQL DB, NLG; Open Set (Unknown Unknowns), Closed Set (Known Knowns)]
  • 10. Constraints on the System
      • The database schema determines the types of questions that can be answered
      • There can be many SQL queries of varying complexity
      • The supervised set has a small number of such variations to start with
          • For example, count queries of the form: SELECT COUNT(<target>) FROM <Table> WHERE <Column> = <Key>
      • Based on the class of SQL queries, train a sequence-to-sequence model or a CRF generator with appropriate English-language questions
  • 11. Strategy: Incremental training [Flowchart labels: Input, Fingerprint, Detected (YES/NO), ML Model, Response, Default response, Retrain ML Model, Update Fingerprint, Deploy New Model]
  • 12. Strategy: Stacking [Flowchart labels: Input; two stages, each with Fingerprint, Detected (YES/NO), ML Model, Response; Default response]
  • 13. Incremental Training vs Stacking
      • Incremental Training
          • Single model
          • Model grows in complexity
          • Prediction error of the model changes when new classes are added
          • Result generated in one computational cycle
      • Stacking
          • Multiple models
          • Each model is simple
          • Prediction error of each model remains the same even after new classes are added
          • Multiple computational cycles
  • 14. Case-2: Single Cell Recognition in a Single Cell Printer [Diagram labels: Image Transformation, Feature Augmentation, Detection, Single/non-Single, Printer, Sequencing Application; Open Set (Unknown Unknowns), Closed Set (Known Knowns)]
  • 15. Constraints on the System
      • Small number of examples for single cell and non-single cell
      • Imbalanced data
      • Low-resolution images
      • 2 to 3 variations in single cell and non-single cell
  • 16. Strategy: Open Set Recognition (OSR models). Ref: Towards Open Set Deep Networks, CVPR 2016
  • 17. Strategy: Open Set Deep Network [Diagram labels: Mean Activation Vector (MAV); distance between sample and MAV fits different Weibull distributions; OpenMax layer]
  • 18. Summary
      • Proprietary data brings about a natural differentiator in the product
      • There exists a class of algorithms that support incremental improvement in a product’s quality using small proprietary data sets
      • Many new algorithms have been proposed in the last 3 years, as large labeled datasets for specific and complex conditions are difficult to get
      • The choice of algorithms depends on the business case, data, architecture, speed of delivery, and proprietary data
  • 19. CONTACT: SRIRIZ, BANER-PASHAN LINK RD, PASHAN, PUNE, MAHARASHTRA 411021. E: RAVINDRA@TALENTICA.COM. www.talentica.com
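
The stacking strategy for the QA system (slides 10 to 12) can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the slides propose a sequence-to-sequence model or a CRF generator, whereas this sketch swaps in regular expressions for both the fingerprint and the model so that it stays self-contained. The names `Stage`, `answer`, `COUNT_RE`, `MAX_RE`, and the example questions and table names are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One layer of the stack: a fingerprint gate plus the model it guards."""
    name: str
    fingerprint: Callable[[str], bool]  # True when the input looks familiar
    model: Callable[[str], str]         # only called when the fingerprint matches


DEFAULT_RESPONSE = "Sorry, I cannot answer that yet."


def answer(question: str, stages: List[Stage]) -> str:
    """Try each stage in order; fall through to the default response."""
    for stage in stages:
        if stage.fingerprint(question):
            return stage.model(question)
    return DEFAULT_RESPONSE


# Assumption: regexes stand in for learned fingerprints and models.
COUNT_RE = re.compile(
    r"^how many (\w+) (?:are )?in (\w+) where (\w+) is (\w+)\??$", re.IGNORECASE
)
MAX_RE = re.compile(r"^what is the highest (\w+) in (\w+)\??$", re.IGNORECASE)


def matches(pattern: re.Pattern) -> Callable[[str], bool]:
    """Build a fingerprint function from a compiled pattern."""
    return lambda q: pattern.match(q.strip()) is not None


def count_model(question: str) -> str:
    # Fills the count-query template from slide 10. Captures are restricted
    # to word characters, so no quoting issues arise; the key is quoted as a
    # string literal here, which the slide's template leaves unspecified.
    target, table, column, key = COUNT_RE.match(question.strip()).groups()
    return f"SELECT COUNT({target}) FROM {table} WHERE {column} = '{key}'"


def max_model(question: str) -> str:
    target, table = MAX_RE.match(question.strip()).groups()
    return f"SELECT MAX({target}) FROM {table}"


# Start with a single stage that only understands count questions.
stages = [Stage("count", matches(COUNT_RE), count_model)]

q_count = "How many orders in sales where region is west?"
q_max = "What is the highest price in products?"

before = answer(q_max, stages)  # no fingerprint matches yet

# Later, a new query class is supported by appending a stage; the existing
# count stage is not retrained or modified.
stages.append(Stage("max", matches(MAX_RE), max_model))
after = answer(q_max, stages)
```

Because each stage is gated by its own fingerprint, adding the second stage leaves the behaviour of the first unchanged, which is the property slide 13 attributes to stacking. The cost is that an input may pass through several fingerprint checks before it is answered.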

Editor's Notes

  1. There are 4 major drivers when we want to build ML-based solutions for startups. First, what drives the business and how will the startup make money? This is communicated to the data scientist/ML expert, who should internalize and empathize with it. Based on this, we identify a “class of algorithms”; this is where working knowledge in multiple domains comes in handy. From among that class of algorithms, we choose those appropriate for the delivery timelines; here again, working knowledge in multiple domains comes in handy or acts as a guide. Based on these factors, we define the data requirements; data comes with its own constraints in terms of quality and number of samples. Based on all these factors and a projection into the future, the architecture for the incremental evolution of ML is determined.
  2. Advantages and disadvantages
  3. Advantages and disadvantages
  4. An example with three classes
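
A three-class example of the open-set idea from slide 17 might look like the sketch below. It is a deliberate simplification and not the OpenMax layer itself: the slide describes fitting Weibull distributions to the distances between samples and each class's Mean Activation Vector (MAV), while this sketch replaces the Weibull fit with an empirical distance quantile per class. The class `MavRejector`, the 3-dimensional "activation vectors", and all numbers are made up for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class MavRejector:
    """Nearest-MAV classifier that answers 'unknown' for far-away samples.

    Assumption: a per-class distance quantile stands in for the Weibull
    model that slide 17 mentions.
    """

    def __init__(self, quantile: float = 0.95) -> None:
        self.quantile = quantile
        self.mavs: Dict[str, List[float]] = {}
        self.thresholds: Dict[str, float] = {}

    def fit(self, activations_by_class: Dict[str, List[Vector]]) -> "MavRejector":
        for label, vectors in activations_by_class.items():
            mav = mean_vector(vectors)
            dists = sorted(math.dist(v, mav) for v in vectors)
            # Index of the chosen quantile, clamped to the last element.
            idx = min(len(dists) - 1, int(self.quantile * len(dists)))
            self.mavs[label] = mav
            self.thresholds[label] = dists[idx]
        return self

    def predict(self, activation: Vector) -> str:
        # Closest class by distance to its MAV.
        label = min(self.mavs, key=lambda c: math.dist(activation, self.mavs[c]))
        # Reject when the sample is farther than that class's threshold.
        if math.dist(activation, self.mavs[label]) > self.thresholds[label]:
            return "unknown"
        return label


# Made-up activation vectors for the three classes of Case-2.
training = {
    "single":   [(5.0, 0.1, 0.0), (4.8, 0.2, 0.1), (5.2, 0.0, 0.1)],
    "multi":    [(0.1, 5.0, 0.2), (0.0, 4.9, 0.0), (0.2, 5.1, 0.1)],
    "impurity": [(0.0, 0.1, 5.0), (0.2, 0.0, 4.8), (0.1, 0.2, 5.2)],
}

rejector = MavRejector().fit(training)
known = rejector.predict((5.1, 0.1, 0.05))   # close to the "single" MAV
unseen = rejector.predict((2.5, 2.5, 2.5))   # far from every MAV
```

A plain closed-set classifier would be forced to assign the second sample to one of the three classes; the distance check is what lets the system report an unknown unknown instead.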