17. Large Scale Machine Learning
-- Machine learning algorithms generally perform better when they are fed a
large amount of data, so they have to be efficient enough to work with such
large datasets.
Prediction accuracy grows as the amount of training data increases, for
each type of algorithm used:
With a training set of 100 million examples, a single step of gradient
descent has to compute a sum over 100 million terms.
Sanity check: train on just a small subset of the examples first and check
whether increasing the number of examples would benefit the model.
Draw the learning curves:
If this type of curve occurs (low training error, much higher
cross-validation error), it is a high variance problem, and increasing
the number of examples is likely to help.
If this type of curve occurs (training and cross-validation error both
high and close together), it is a high bias problem, and increasing the
number of examples is not very likely to benefit the model.
A high bias problem mainly requires adding extra features; only after
that are more examples likely to help.
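The sanity check above can be sketched in a few lines. This is only an illustration under stated assumptions: the model is a one-feature straight line fitted in closed form, and the helper names `fit_line`, `cost` and `learning_curve` are hypothetical, not taken from the notes.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = t0 + t1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:                      # all x equal: fall back to a constant
        return mean_y, 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    t1 = cov / var_x
    return mean_y - t1 * mean_x, t1


def cost(theta, xs, ys):
    """Half mean squared error of the fitted line on (xs, ys)."""
    t0, t1 = theta
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def learning_curve(train, cv, sizes):
    """Return one (m, J_train, J_cv) row per training-subset size m."""
    cv_x, cv_y = zip(*cv)
    rows = []
    for m in sizes:
        xs, ys = zip(*train[:m])        # train on the first m examples only
        theta = fit_line(xs, ys)
        rows.append((m, cost(theta, xs, ys), cost(theta, cv_x, cv_y)))
    return rows
```

Plotting J_train and J_cv against m gives the learning curves: a large gap between them suggests high variance, while two high, close curves suggest high bias.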
STOCHASTIC GRADIENT DESCENT: makes each update less expensive than in
plain gradient descent
What gradient descent does:
What happens on the cost contours:
➢Because every step uses the whole batch of training examples, it is called Batch Gradient Descent.
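As a rough sketch of why a batch step is expensive, here is one update for linear regression in plain Python. The linear hypothesis, the squared-error cost and the names `predict` and `batch_gradient_step` are assumptions for illustration; the point is only that the gradient loop visits all m examples before the parameters change once.

```python
def predict(theta, x):
    """Linear hypothesis h(x) = theta . x, where x[0] is the bias term 1.0."""
    return sum(t * xi for t, xi in zip(theta, x))


def batch_gradient_step(theta, X, y, alpha):
    """One batch update: the gradient is a sum over ALL m examples."""
    m = len(X)
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):          # m terms per single step
        err = predict(theta, x_i) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += err * x_ij
    return [t - alpha * g / m for t, g in zip(theta, grad)]
```

With m = 100 million, the inner loop runs 100 million times for every single call.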
What stochastic gradient descent does:
➢Each time the inner for loop runs, it fits the parameters a
little better to that particular example.
Batch gradient descent converges to the optimum along a
reasonably straight path.
SGD does not follow a straight path: individual steps can point in
seemingly random directions, but overall it moves into a region near the
optimum.
It never settles exactly at the optimum and keeps wandering around it,
but that is fine in practice, since parameters in the region near the
optimum give a good enough hypothesis.
The outer loop may be run 1 to 10 times, depending on the size of m.
If m is very large, running the outer loop just once can give a
reasonably good hypothesis, whereas in batch gradient descent one pass
over all the examples moves the hypothesis only a single step closer
to the optimum.
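The inner and outer loops described above might look like the sketch below. It assumes linear regression with squared error, and it shuffles the examples before each pass, which is common practice but is an assumption here; the function name `sgd` and its defaults are hypothetical.

```python
import random


def sgd(X, y, alpha=0.01, epochs=5, seed=0):
    """Stochastic gradient descent for linear regression (x[0] is the bias 1.0)."""
    rng = random.Random(seed)
    theta = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):             # outer loop: typically 1 to 10 passes
        rng.shuffle(order)              # assumed: visit examples in random order
        for i in order:                 # inner loop: update after EVERY example
            err = sum(t * xi for t, xi in zip(theta, X[i])) - y[i]
            theta = [t - alpha * err * xi for t, xi in zip(theta, X[i])]
    return theta
```

Each inner iteration touches one example only, so the parameters already improve long before a full pass over the data has finished.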
MINI BATCH GRADIENT DESCENT:
Using b examples at a time lets the gradient computation be vectorized,
so the b examples can be processed in parallel.
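A minimal mini-batch sketch, again assuming linear regression with squared error. It is written in plain Python so that it runs without extra libraries; in a real implementation the loop over the b examples of a batch is the part a vectorized linear algebra library would compute in one call. The name `minibatch_gd` and its defaults are hypothetical.

```python
def minibatch_gd(X, y, b=10, alpha=0.1, epochs=50):
    """Mini-batch gradient descent: one update per block of b examples."""
    theta = [0.0] * len(X[0])
    m = len(X)
    for _ in range(epochs):
        for start in range(0, m, b):
            batch_X = X[start:start + b]
            batch_y = y[start:start + b]
            grad = [0.0] * len(theta)
            # This loop over the batch is what vectorization replaces.
            for x_i, y_i in zip(batch_X, batch_y):
                err = sum(t * xi for t, xi in zip(theta, x_i)) - y_i
                for j, x_ij in enumerate(x_i):
                    grad[j] += err * x_ij
            theta = [t - alpha * g / len(batch_X) for t, g in zip(theta, grad)]
    return theta
```

Setting b = 1 recovers stochastic gradient descent (without shuffling), and b = m recovers batch gradient descent.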
SGD CONVERGENCE:
If the plot looks like this:
Red : with smaller α;
Blue : with larger α
The curve is noisy because the cost is averaged over only 1000 examples.
With a smaller learning rate, the algorithm initially learns more slowly
but may eventually give a slightly better estimate: SGD does not converge
exactly, its parameters oscillate around the optimum, and a smaller
learning rate makes those oscillations smaller.
If we average over 5000 examples, the curve becomes smoother because each
plotted point averages more examples; the drawback is that we only get
one data point for every 5000 examples.
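The plotted curve can be produced by recording the cost on each example just before the parameters are updated with it, then averaging over a window. The sketch below assumes linear regression with squared error; `sgd_with_cost_trace` and the `window` parameter are hypothetical names.

```python
def sgd_with_cost_trace(X, y, alpha=0.05, window=1000):
    """Single SGD pass that returns (theta, averaged costs per window)."""
    theta = [0.0] * len(X[0])
    trace, running = [], 0.0
    for i, (x_i, y_i) in enumerate(zip(X, y), start=1):
        err = sum(t * xi for t, xi in zip(theta, x_i)) - y_i
        running += 0.5 * err * err      # cost on this example BEFORE the update
        theta = [t - alpha * err * xi for t, xi in zip(theta, x_i)]
        if i % window == 0:             # one plotted point per `window` examples
            trace.append(running / window)
            running = 0.0
    return theta, trace
```

A larger `window` gives a smoother `trace` but fewer points, which is the trade-off between 1000 and 5000 examples described above.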
If the plot looks like this:
Blue – the number of examples to average over is too low
Red – increasing the number of examples to average over gives a smoother
curve and shows that the cost is decreasing
Pink – if the curve looks like this even after increasing the number of
examples, something is wrong and the algorithm is not learning. We
have to change some parameters of the algorithm (for example the learning rate).
If the plot looks like this (cost increasing): the algorithm is diverging, so use a smaller α.
How to make SGD converge:
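The notes do not spell the method out here. One common option, shown purely as an assumption, is to shrink the learning rate as updates accumulate, for example α = const1 / (iteration + const2), so that the oscillations around the optimum get smaller over time. The constants and the names `decayed_alpha` and `sgd_decaying` are hypothetical and would need tuning.

```python
def decayed_alpha(iteration, const1=1.0, const2=10.0):
    """Learning rate that decreases with the number of updates so far."""
    return const1 / (iteration + const2)


def sgd_decaying(X, y, passes=20, const1=1.0, const2=10.0):
    """SGD for linear regression using the decaying learning rate above."""
    theta = [0.0] * len(X[0])
    iteration = 0
    for _ in range(passes):
        for x_i, y_i in zip(X, y):
            alpha = decayed_alpha(iteration, const1, const2)
            err = sum(t * xi for t, xi in zip(theta, x_i)) - y_i
            theta = [t - alpha * err * xi for t, xi in zip(theta, x_i)]
            iteration += 1
    return theta
```

The price is two extra constants to tune, which is why a small constant α is often kept instead.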
ONLINE LEARNING: for a continuous stream of input data, as on
websites that track users' actions
➢It can adapt to changing user preferences, even if the entire
pool of users changes
➢After updating the parameters with the current example, we can
just throw that example away.
≫ if data keeps arriving continuously, it is good to use this algorithm
≫ if the data is small, it is better to save it as a fixed training set and train a standard (batch) algorithm such as logistic regression on it.
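A minimal sketch of the update-then-discard idea, assuming a logistic regression model updated with one SGD step per incoming example. The class name `OnlineLogisticRegression` and its methods are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class OnlineLogisticRegression:
    """Logistic regression that learns from one (x, y) pair at a time."""

    def __init__(self, n_features, alpha=0.1):
        self.theta = [0.0] * n_features
        self.alpha = alpha

    def predict_proba(self, x):
        """Estimated probability that y = 1 for feature vector x."""
        return sigmoid(sum(t * xi for t, xi in zip(self.theta, x)))

    def update(self, x, y):
        """One SGD step on a single example; the example is not stored."""
        err = self.predict_proba(x) - y
        self.theta = [t - self.alpha * err * xi for t, xi in zip(self.theta, x)]
```

Because nothing is kept except `theta`, the model keeps following the stream, and recent examples gradually override older behaviour.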
CLICK THROUGH RATE PREDICTION PROBLEM – online learning
Let's say we have 100 phones in the shop and we want to return the
top 10 results to the user:
Each time the user is shown results, we collect an (x, y) pair per
result (y = 1 if the link was clicked, 0 otherwise), learn from it, and
show users the products they are most likely to click on.
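The ranking and feedback steps could be sketched as follows. Everything here is illustrative: `predict_click` stands for whatever model estimates the click probability of a phone for the current user, `features` stands for whatever builds the feature vector x, and both function names are hypothetical.

```python
import heapq


def top_k_by_click_probability(candidates, predict_click, k=10):
    """Return the k candidates with the highest predicted click probability."""
    return heapq.nlargest(k, candidates, key=predict_click)


def feedback_pairs(shown, clicked, features):
    """Turn one page of results into (x, y) pairs: y = 1 if the item was clicked."""
    return [(features(item), 1 if item in clicked else 0) for item in shown]
```

Each pair returned by `feedback_pairs` can then be fed to an online update and discarded.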
MAP REDUCE:
When an ML problem is too big for a single machine, we can use
map-reduce:
➢We split the training set and send each part to a different
computer
➢Each computer processes its part and sends the result to a
centralized controller
➢The centralized controller combines the results and produces the
model
When to use map-reduce:
Ask yourself: can the algorithm be expressed as a sum of functions
over the training set?
▪ If yes: use map-reduce
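Batch gradient descent fits this pattern because its gradient is a sum over examples. The sketch below simulates the machines with an ordinary loop on one computer; the names `partial_gradient` and `mapreduce_gradient_step` are hypothetical, and no real map-reduce framework is used.

```python
def partial_gradient(theta, X_part, y_part):
    """Map step: un-normalized gradient sum over one shard of the data."""
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X_part, y_part):
        err = sum(t * xi for t, xi in zip(theta, x_i)) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += err * x_ij
    return grad


def mapreduce_gradient_step(theta, shards, alpha):
    """Reduce step: add the shard sums, then take one batch update."""
    # In a real deployment each call below would run on a different machine.
    partials = [partial_gradient(theta, X_p, y_p) for X_p, y_p in shards]
    m = sum(len(X_p) for X_p, _ in shards)
    total = [sum(column) for column in zip(*partials)]
    return [t - alpha * g / m for t, g in zip(theta, total)]
```

The result is the same as a single-machine batch step (up to floating-point rounding), since only the order in which the terms are summed changes.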
DATA PARALLELISM:
We can also use map-reduce on a single machine with multiple cores:
➢This way we don't have to worry about network latency
Some linear algebra libraries parallelize across cores automatically, provided the computation is vectorized.
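The same split-and-sum structure on one machine could be sketched with a worker pool. This uses Python's standard `concurrent.futures.ThreadPoolExecutor` only to show the shape of the computation; pure-Python threads are usually limited by the interpreter lock, so a real speed-up would more likely come from worker processes or a vectorized library. The names `squared_error_sum` and `parallel_cost` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_error_sum(theta, chunk):
    """Sum of squared errors over one chunk of (x, y) pairs."""
    return sum(
        (sum(t * xi for t, xi in zip(theta, x)) - y) ** 2 for x, y in chunk
    )


def parallel_cost(theta, data, workers=4):
    """Half mean squared error, with one chunk of data per worker.

    Assumes `data` is a non-empty list of (x, y) pairs.
    """
    size = -(-len(data) // workers)     # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(lambda c: squared_error_sum(theta, c), chunks)
        return sum(partial_sums) / (2 * len(data))
```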
