SlideShare a Scribd company logo
1 of 8
Machine Learning
Methods and Analysis
Smita Agrawal
Data Scientist – Infosys
Popular AI Methodologies
What is Machine Learning?
 Machine learning is simply a set of computer programs that can teach themselves to grow and adopt when exposed to new data
 Machine learning is broadly categorised into Supervised, Unsupervised and Reinforcement Learning techniques
Supervised
Learning
Unsupervised
Learning
Reinforcement
Learning
Machine
Learning
Classification
Churn Prediction
Fraud Detection
Image Classification
Regression
Market Mix Modelling
ARPU Forecasting
Life/Age Expectancy
Advertising Popularity Prediction
Dimensionality Reduction
Meaningful Compression
Feature Selection
Clustering
Customer Profiling
Targeted Marketing
Recommender Systems
Real-time Decisions
Robot Navigations
Self-Learning Models
The crux of
Machine
Learning lies in
“History Repeats
itself!”
How to be Machine Learning enabled?
A Superficial View of steps
1. Data Curation
Data curation involves gathering data
across relevant attributes such that they
can distinguish and thereby help us learn
more about why something is happening
2. Processing Data
3. Resampling
Resampling data so that data is free of
any biased or over-powered
characteristics of data are selected such
that all the characteristics are balanced
Leveraging measures like Mean,
Median or Mode to process
curated data so that any outliers
or anomalies can be addressed
4. Variable Selection
Not all the variables in the curated data
really distinguish the characteristics.
Variable selection is a critical step which
enables in retaining these variables
5. Predictive Model
A suitable Machine Learning can now
be applied to the data obtained from
step 4 to store the patterns observed
6. Generate Predictions
This is the most exciting step as it is the
stage where we can predict with certain
confidence whether the data points
belong to a characteristic
Data is an oil to our Machine Learning’s engine. However, only oil will not ignite the engine!
Data Oil with the right levers will ignite the engine and enable us in achieving our imperfectly perfect predictions!
 Emergence of big data has created tremendous opportunities for businesses to gain real-time insights
 Make more informed decisions by leveraging data from the exploding number of digital systems
 However, as often is the case with disruptive technologies, the innovations behind big data have created a critical
problem – one that we call Data Drift
 Data drift creates serious challenges to fully harness the insights available from big data
Data drift is defined as:
The unpredictable, unannounced and unending mutation of data characteristics caused by the operation,
maintenance and modernization of the systems that produce the data
Some Severe Impacts -
 Erodes data fidelity
 Operational reliability
 Ultimately the productivity of your data scientists and engineers
 It increases your costs
 Delays time to analysis
 Decreases the productivity and agility of your data engineers
 Leads to poor decision-making by data scientists and the line of business
Source: Streamsets
Data Drift
Fail Safe Mechanism - Prepping Validation for Automated Predictions
1. Ensure variable data types match in validation data with that in the historical data
2. Match all the unique values in categorical variables with the historical data to check for any new values that have come in
the daily data
3. Measure the mean and most frequent value for numerical variables in historical and validation data
4. Any new values that are observed in categorical variables in the validation data will be handled as part of fail safe
mechanism
5. There would be no predictions generated for these records as your predictive model does not know that new value as there
were no instances of this value in the history when the model was built and causes a fatal error leading to no predictions
6. Generate a report for all the excluded records by variables to further analyse the trend drifts and enhance the model
7. Ongoing Data Quality Check: Check whether there is a behavioural shift of more than ± 5% in the modal value across
variables to ensure that none of the data curation stage was broken which could have a consequent impact on the
predictions
8. The variation should be compared to the historical data’s modal values
9. This should be followed as a real-time(daily) practice even while automatically updating the model
Some Common Data Drift Measures
Population Stability Index
Kolmogorov - Smirnov Statistic
Kullback - Leibler Divergence
Histogram Intersection
01
02
03
04
Smita Agrawal
Data Scientist – Infosys
Thank you!!!

More Related Content

What's hot

Smart Data Slides: Machine Learning - Case Studies
Smart Data Slides: Machine Learning - Case StudiesSmart Data Slides: Machine Learning - Case Studies
Smart Data Slides: Machine Learning - Case StudiesDATAVERSITY
 
Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Se...
Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Se...Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Se...
Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Se...Dataconomy Media
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Simplilearn
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGene Leybzon
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleImpetus Technologies
 
Cloud Computing Security
Cloud Computing SecurityCloud Computing Security
Cloud Computing SecurityNinh Nguyen
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachinePulse
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Fedarated learning
Fedarated learningFedarated learning
Fedarated learningVaishakhKP1
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data ScienceSpotle.ai
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Krishnaram Kenthapadi
 
Business Intelligence & Predictive Analytic by Prof. Lili Saghafi
Business Intelligence & Predictive Analytic by Prof. Lili SaghafiBusiness Intelligence & Predictive Analytic by Prof. Lili Saghafi
Business Intelligence & Predictive Analytic by Prof. Lili SaghafiProfessor Lili Saghafi
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 
Machine Learning Unit 2 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 2 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 2 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 2 Semester 3 MSc IT Part 2 Mumbai UniversityMadhav Mishra
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Hayim Makabee
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientistryanorban
 
Random forest and decision tree
Random forest and decision treeRandom forest and decision tree
Random forest and decision treeAAKANKSHA JAIN
 

What's hot (20)

Smart Data Slides: Machine Learning - Case Studies
Smart Data Slides: Machine Learning - Case StudiesSmart Data Slides: Machine Learning - Case Studies
Smart Data Slides: Machine Learning - Case Studies
 
Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Se...
Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Se...Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Se...
Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Se...
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First Session
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scale
 
Cloud Computing Security
Cloud Computing SecurityCloud Computing Security
Cloud Computing Security
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Fedarated learning
Fedarated learningFedarated learning
Fedarated learning
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)
 
Business Intelligence & Predictive Analytic by Prof. Lili Saghafi
Business Intelligence & Predictive Analytic by Prof. Lili SaghafiBusiness Intelligence & Predictive Analytic by Prof. Lili Saghafi
Business Intelligence & Predictive Analytic by Prof. Lili Saghafi
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine Learning Unit 2 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 2 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 2 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 2 Semester 3 MSc IT Part 2 Mumbai University
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
Cloud Mashup
Cloud MashupCloud Mashup
Cloud Mashup
 
Random forest and decision tree
Random forest and decision treeRandom forest and decision tree
Random forest and decision tree
 

Similar to Data drift and machine learning

Machine Learning in Autonomous Data Warehouse
 Machine Learning in Autonomous Data Warehouse Machine Learning in Autonomous Data Warehouse
Machine Learning in Autonomous Data WarehouseSandesh Rao
 
Big Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation SlidesBig Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation SlidesSlideTeam
 
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...Value Amplify Consulting
 
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...EMC
 
Unlock the power of MLOps.pdf
Unlock the power of MLOps.pdfUnlock the power of MLOps.pdf
Unlock the power of MLOps.pdfAnastasiaSteele10
 
Unlock the power of MLOps.pdf
Unlock the power of MLOps.pdfUnlock the power of MLOps.pdf
Unlock the power of MLOps.pdfJamieDornan2
 
Unlock the power of MLOps.pdf
Unlock the power of MLOps.pdfUnlock the power of MLOps.pdf
Unlock the power of MLOps.pdfStephenAmell4
 
Introduction to Machine Learning and Data Science using Autonomous Database ...
Introduction to Machine Learning and Data Science using Autonomous Database  ...Introduction to Machine Learning and Data Science using Autonomous Database  ...
Introduction to Machine Learning and Data Science using Autonomous Database ...Sandesh Rao
 
How to generate Synthetic Data for an effective App Testing strategy.pdf
How to generate Synthetic Data for an effective App Testing strategy.pdfHow to generate Synthetic Data for an effective App Testing strategy.pdf
How to generate Synthetic Data for an effective App Testing strategy.pdfpCloudy
 
What is Data Observability.pdf
What is Data Observability.pdfWhat is Data Observability.pdf
What is Data Observability.pdf4dalert
 
Introduction to Machine Learning and Data Science using the Autonomous databa...
Introduction to Machine Learning and Data Science using the Autonomous databa...Introduction to Machine Learning and Data Science using the Autonomous databa...
Introduction to Machine Learning and Data Science using the Autonomous databa...Sandesh Rao
 
Enterprise Test Data Generation.pptx
Enterprise Test Data Generation.pptxEnterprise Test Data Generation.pptx
Enterprise Test Data Generation.pptxGenRocket Inc
 
Machine Learning for Business - Eight Best Practices for Getting Started
Machine Learning for Business - Eight Best Practices for Getting StartedMachine Learning for Business - Eight Best Practices for Getting Started
Machine Learning for Business - Eight Best Practices for Getting StartedBhupesh Chaurasia
 
Data Analytics Introduction.pptx
Data Analytics Introduction.pptxData Analytics Introduction.pptx
Data Analytics Introduction.pptxamitparashar42
 
Data Analytics Introduction.pptx
Data Analytics Introduction.pptxData Analytics Introduction.pptx
Data Analytics Introduction.pptxamitparashar42
 
MetaSuite and_hp_quality_center_enterprise
MetaSuite and_hp_quality_center_enterpriseMetaSuite and_hp_quality_center_enterprise
MetaSuite and_hp_quality_center_enterpriseMinerva SoftCare GmbH
 
Unlock the power of MLOps.pdf
Unlock the power of MLOps.pdfUnlock the power of MLOps.pdf
Unlock the power of MLOps.pdfStephenAmell4
 
Building Information System
Building Information SystemBuilding Information System
Building Information SystemRabia Jabeen
 

Similar to Data drift and machine learning (20)

Machine Learning in Autonomous Data Warehouse
 Machine Learning in Autonomous Data Warehouse Machine Learning in Autonomous Data Warehouse
Machine Learning in Autonomous Data Warehouse
 
Big Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation SlidesBig Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation Slides
 
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
 
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...
 
Unlock the power of MLOps.pdf
Unlock the power of MLOps.pdfUnlock the power of MLOps.pdf
Unlock the power of MLOps.pdf
 
Unlock the power of MLOps.pdf
Unlock the power of MLOps.pdfUnlock the power of MLOps.pdf
Unlock the power of MLOps.pdf
 
Unlock the power of MLOps.pdf
Unlock the power of MLOps.pdfUnlock the power of MLOps.pdf
Unlock the power of MLOps.pdf
 
Introduction to Machine Learning and Data Science using Autonomous Database ...
Introduction to Machine Learning and Data Science using Autonomous Database  ...Introduction to Machine Learning and Data Science using Autonomous Database  ...
Introduction to Machine Learning and Data Science using Autonomous Database ...
 
How to generate Synthetic Data for an effective App Testing strategy.pdf
How to generate Synthetic Data for an effective App Testing strategy.pdfHow to generate Synthetic Data for an effective App Testing strategy.pdf
How to generate Synthetic Data for an effective App Testing strategy.pdf
 
What is Data Observability.pdf
What is Data Observability.pdfWhat is Data Observability.pdf
What is Data Observability.pdf
 
Introduction to Machine Learning and Data Science using the Autonomous databa...
Introduction to Machine Learning and Data Science using the Autonomous databa...Introduction to Machine Learning and Data Science using the Autonomous databa...
Introduction to Machine Learning and Data Science using the Autonomous databa...
 
Enterprise Test Data Generation.pptx
Enterprise Test Data Generation.pptxEnterprise Test Data Generation.pptx
Enterprise Test Data Generation.pptx
 
Machine Learning for Business - Eight Best Practices for Getting Started
Machine Learning for Business - Eight Best Practices for Getting StartedMachine Learning for Business - Eight Best Practices for Getting Started
Machine Learning for Business - Eight Best Practices for Getting Started
 
Data Analytics Introduction.pptx
Data Analytics Introduction.pptxData Analytics Introduction.pptx
Data Analytics Introduction.pptx
 
Data Analytics Introduction.pptx
Data Analytics Introduction.pptxData Analytics Introduction.pptx
Data Analytics Introduction.pptx
 
Data analytics
Data analyticsData analytics
Data analytics
 
MetaSuite and_hp_quality_center_enterprise
MetaSuite and_hp_quality_center_enterpriseMetaSuite and_hp_quality_center_enterprise
MetaSuite and_hp_quality_center_enterprise
 
Unlock the power of MLOps.pdf
Unlock the power of MLOps.pdfUnlock the power of MLOps.pdf
Unlock the power of MLOps.pdf
 
Bi
BiBi
Bi
 
Building Information System
Building Information SystemBuilding Information System
Building Information System
 

Recently uploaded

Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxnoordubaliya2003
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 

Recently uploaded (20)

Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 

Data drift and machine learning

  • 1. Machine Learning Methods and Analysis Smita Agrawal Data Scientist – Infosys
  • 3. What is Machine Learning?  Machine learning is simply a set of computer programs that can teach themselves to grow and adopt when exposed to new data  Machine learning is broadly categorised into Supervised, Unsupervised and Reinforcement Learning techniques Supervised Learning Unsupervised Learning Reinforcement Learning Machine Learning Classification Churn Prediction Fraud Detection Image Classification Regression Market Mix Modelling ARPU Forecasting Life/Age Expectancy Advertising Popularity Prediction Dimensionality Reduction Meaningful Compression Feature Selection Clustering Customer Profiling Targeted Marketing Recommender Systems Real-time Decisions Robot Navigations Self-Learning Models The crux of Machine Learning lies in “History Repeats itself!”
  • 4. How to be Machine Learning enabled? A Superficial View of steps 1. Data Curation Data curation involves gathering data across relevant attributes such that they can distinguish and thereby help us learn more about why something is happening 2. Processing Data 3. Resampling Resampling data so that data is free of any biased or over-powered characteristics of data are selected such that all the characteristics are balanced Leveraging measures like Mean, Median or Mode to process curated data so that any outliers or anomalies can be addressed 4. Variable Selection Not all the variables in the curated data really distinguish the characteristics. Variable selection is a critical step which enables in retaining these variables 5. Predictive Model A suitable Machine Learning can now be applied to the data obtained from step 4 to store the patterns observed 6. Generate Predictions This is the most exciting step as it is the stage where we can predict with certain confidence whether the data points belong to a characteristic Data is an oil to our Machine Learning’s engine. However, only oil will not ignite the engine! Data Oil with the right levers will ignite the engine and enable us in achieving our imperfectly perfect predictions!
  • 5.  Emergence of big data has created tremendous opportunities for businesses to gain real-time insights  Make more informed decisions by leveraging data from the exploding number of digital systems  However, as often is the case with disruptive technologies, the innovations behind big data have created a critical problem – one that we call Data Drift  Data drift creates serious challenges to fully harness the insights available from big data Data drift is defined as: The unpredictable, unannounced and unending mutation of data characteristics caused by the operation, maintenance and modernization of the systems that produce the data Some Severe Impacts -  Erodes data fidelity  Operational reliability  Ultimately the productivity of your data scientists and engineers  It increases your costs  Delays time to analysis  Decreases the productivity and agility of your data engineers  Leads to poor decision-making by data scientists and the line of business Source: Streamsets Data Drift
  • 6. Fail Safe Mechanism - Prepping Validation for Automated Predictions 1. Ensure variable data types match in validation data with that in the historical data 2. Match all the unique values in categorical variables with the historical data to check for any new values that have come in the daily data 3. Measure the mean and most frequent value for numerical variables in historical and validation data 4. Any new values that are observed in categorical variables in the validation data will be handled as part of fail safe mechanism 5. There would be no predictions generated for these records as your predictive model does not know that new value as there were no instances of this value in the history when the model was built and causes a fatal error leading to no predictions 6. Generate a report for all the excluded records by variables to further analyse the trend drifts and enhance the model 7. Ongoing Data Quality Check: Check whether there is a behavioural shift of more than ± 5% in the modal value across variables to ensure that none of the data curation stage was broken which could have a consequent impact on the predictions 8. The variation should be compared to the historical data’s modal values 9. This should be followed as a real-time(daily) practice even while automatically updating the model
  • 7. Some Common Data Drift Measures Population Stability Index Kolmogorov - Smirnov Statistic Kullback - Leibler Divergence Histogram Intersection 01 02 03 04
  • 8. Smita Agrawal Data Scientist – Infosys Thank you!!!