Welcome to
Explore ML!
Day 2
Linear Regression
Fitting Linear Models
Premise
What are we trying to achieve?
We are trying to solve or predict something based on what we already know.
This is a regression problem, that is, we want to predict a real-valued output.
What exactly is “linear
regression”?
Given the existing training data, we try to find a “best fit” line.
For now, “best fit” means a line that seems to match the data.
Best fit line
This is an example of
___________
This is an example of
Supervised Learning
Recall your high school
math classes.
y = mx + c
Model parameters : m (the slope) and c (the intercept)
Model Representation
Tweaking the value of the parameters
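As a rough sketch of what this model looks like in code (the function name and the example values of m and c below are purely illustrative):

```python
# A line is fully described by two parameters: the slope m and the intercept c.
def predict(x, m, c):
    """Return the model's prediction y = m*x + c for an input x."""
    return m * x + c

# Tweaking the parameters changes the prediction we get for the same input.
print(predict(3.0, m=2.0, c=1.0))   # 7.0
print(predict(3.0, m=0.5, c=4.0))   # 5.5
```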
Loss function
Formalizing the notion of best fit line
How exactly do you say one line fits better than another?
Let’s look at what exactly loss and the loss function are.
Loss function
H(xᵢ) - yᵢ
Loss function
Oops, looks like the errors became bigger
Calculating the loss function
Add all the differences between predicted values and our data points
Calculating the loss function
But this difference is positive
And this difference is negative
Calculating the loss function
The square of the difference is positive in both cases, though :)
The math
Calculating the loss function
In fact, this idea applies to all machine learning models.
The loss is the average of the squared differences: J = (1/n) Σᵢ (H(xᵢ) - yᵢ)². The aim is to find the parameters for which J is minimum.
This function is the reason why models can learn things: it is what lets the model descend the gradient of errors toward a minimum, using some mathematical calculation to minimize the error between the actual value and the predicted value.
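A minimal sketch of this loss in code, assuming the mean-squared-error form above (the names mse_loss, predictions and targets are just for illustration):

```python
import numpy as np

def mse_loss(predictions, targets):
    """Average of the squared differences between predicted and actual values."""
    errors = predictions - targets     # H(x_i) - y_i for every data point
    return np.mean(errors ** 2)        # squaring keeps every term positive

# A line that fits poorly gives a larger loss than one that fits well.
y_true = np.array([1.0, 2.0, 3.0])
print(mse_loss(np.array([1.1, 2.0, 2.9]), y_true))   # small loss
print(mse_loss(np.array([3.0, 0.0, 5.0]), y_true))   # much larger loss
```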
Optimization Algorithm
Gradient Descent
Gradient Descent : Intuition
Gradient Descent : Algorithm
Gradient Descent : Algorithm
Gradient Descent : Math
Gradient Descent : Learning Rate
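A minimal sketch of batch gradient descent for the line y = mx + c, assuming the mean-squared-error loss from earlier (the learning rate, step count and variable names are illustrative choices):

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, steps=1000):
    """Fit y ~ m*x + c by repeatedly stepping down the gradient of the MSE loss."""
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        error = (m * x + c) - y                  # H(x_i) - y_i
        grad_m = (2.0 / n) * np.sum(error * x)   # d(loss)/dm
        grad_c = (2.0 / n) * np.sum(error)       # d(loss)/dc
        m -= lr * grad_m                         # the learning rate lr controls the step size
        c -= lr * grad_c
    return m, c

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
print(gradient_descent(x, y))   # approaches (2.0, 1.0)
```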
Feature Scaling
Most of the real-life datasets you will be dealing with have many features spanning a wide range of values.
If you were asked to predict the price of a house, you would be provided with a dataset with multiple features, like the number of bedrooms, the square-foot area of the house, etc.
There’s a problem though.
The range of data in each feature will vary wildly. For example, the number of bedrooms can vary from, say, 1 to 5, while the square-foot area can range from 500 to 3000.
How is this a problem?
How do you solve this?
Feature Scaling
Feature Scaling is a data preprocessing step used to normalize the features in a dataset so that all the features lie in a similar range.
It is one of the most critical steps during the pre-processing of data before creating a machine learning model.
If a feature’s variance is orders of magnitude larger than the variance of the other features, that feature might dominate the others in the dataset, which is not something we want happening in our model.
Why?
Two important scaling techniques:
1. Normalisation
2. Standardisation
Normalisation
Normalization scales the range of values in a feature to between 0 and 1.
This is referred to as Min-Max Scaling.
Min-Max Scaling : x' = (x - min) / (max - min)
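A minimal sketch of min-max scaling, both by hand and with scikit-learn's MinMaxScaler (the feature values below are made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

area = np.array([[500.0], [1200.0], [3000.0]])   # square-foot areas

# By hand: x' = (x - min) / (max - min), which maps every value into [0, 1].
scaled_by_hand = (area - area.min()) / (area.max() - area.min())

# With scikit-learn:
scaled_by_sklearn = MinMaxScaler().fit_transform(area)

print(scaled_by_hand.ravel())      # [0.   0.28 1.  ]
print(scaled_by_sklearn.ravel())   # same values
```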
Standardisation
Standardisation is a scaling technique where the values are centered around the mean with a unit standard deviation.
Standardisation is required when the features of the input data set have large differences between their ranges, or simply when they are measured in different units, e.g. kWh, metres, miles and so on.
The z-score is one of the most popular methods to standardise data: subtract the mean and divide by the standard deviation for each value of each feature, i.e. z = (x - mean) / standard deviation.
Standardisation assumes that your observations fit a Gaussian distribution (bell curve) with a well-behaved mean and standard deviation.
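A minimal sketch of z-score standardisation, by hand and with scikit-learn's StandardScaler (the feature values below are made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

bedrooms = np.array([[1.0], [2.0], [3.0], [5.0]])

# By hand: z = (x - mean) / standard deviation.
z_by_hand = (bedrooms - bedrooms.mean()) / bedrooms.std()

# With scikit-learn (which also uses the population standard deviation, like np.std above).
z_by_sklearn = StandardScaler().fit_transform(bedrooms)

print(z_by_hand.ravel())
print(z_by_sklearn.ravel())   # identical: centered around 0 with unit variance
```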
In conclusion,
Min-max normalization: guarantees all features will have exactly the same scale, but does not handle outliers well.
Z-score standardisation: handles outliers better, but does not produce data with exactly the same scale.
Time to apply what you’ve learnt!
___________
Before we get started
Go to kaggle.com and register for a
new account.
Before we get started
Now go to bit.ly/gdsc-linear-reg-kaggle and click the ‘Copy and Edit’ button (top-right corner of the page).
Time to code!
Time to eat!
Logistic Regression
Learning to say “Yes” or “No”
Need for Logistic Regression
Why can’t we just use Linear Regression and fit a line?
Inaccurate Predictions
Out of Range Problem
For classification, y = 0 or y = 1.
In Linear Regression, h(x) can be > 1 or < 0.
But for Logistic Regression, 0 ≤ h(x) ≤ 1 must hold true.
Hypothesis Representation
hθ(x) = θᵀx for linear regression.
But here we want 0 ≤ hθ(x) ≤ 1.
Sigmoid Function
hθ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(-z)) is the sigmoid function.
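A minimal sketch of the sigmoid and the resulting hypothesis (function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) -- always between 0 and 1."""
    return sigmoid(np.dot(theta, x))

print(sigmoid(0.0))     # 0.5
print(sigmoid(10.0))    # close to 1
print(sigmoid(-10.0))   # close to 0
```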
Interpretation of hypothesis
hθ(x) = the probability that y = 1, given input x.
For example, in a cancer detection problem:
y = 1 signifies that a person has tested +ve for cancer
y = 0 signifies that a person has tested -ve for cancer
What does hθ(x) = 0.7 mean for an example input x?
Decision Boundary
Predict y = 1 if hθ(x) ≥ 0.5 and y = 0 if hθ(x) < 0.5.
Hence, for y = 1:
⇒ hθ(x) ≥ 0.5
⇒ θᵀx ≥ 0
How does the model know when to
predict y =1 or y=0 ?
Say we find that θ₁ = -3, θ₂ = 1, θ₃ = 1.
Hence, on substitution:
Predict y = 1 if -3 + x₁ + x₂ > 0, else predict y = 0.
θᵀx = 0 gives the decision boundary,
i.e. -3 + x₁ + x₂ = 0 is the decision boundary.
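Using the parameter values above, a small sketch of how the prediction rule plays out (the helper name predict and the sample inputs are illustrative):

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])   # θ₁ (intercept), θ₂, θ₃ from the slide

def predict(x1, x2):
    """Predict 1 if θᵀx > 0, i.e. if -3 + x1 + x2 > 0, else predict 0."""
    z = np.dot(theta, np.array([1.0, x1, x2]))   # the leading 1 multiplies the intercept
    return 1 if z > 0 else 0

print(predict(1.0, 1.0))   # -3 + 2 < 0  ->  0
print(predict(2.0, 2.0))   # -3 + 4 > 0  ->  1
```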
Loss Function
Recall that in linear regression we used this formula for calculating the loss of our model.
It turns out that, although the same method gives a metric for the loss of the model, with the sigmoid hypothesis it is non-convex and has a lot of local minima.
Loss function
Engineering a better loss function
Let’s consider the graphs of -log(x) and -log(1 - x).
Let’s consider the case of a data point whose y = 1.
If our model predicts a 0, i.e. H(x) = 0 (the wrong answer), we get a really high loss.
But if our model predicts a 1, i.e. H(x) = 1 (the right answer), we get a low loss.
y = -log(x)
Now let’s consider the case of a data point whose y = 0.
If our model predicts a 1, i.e. H(x) = 1 (the wrong answer), we get a really high loss.
But if our model predicts a 0, i.e. H(x) = 0 (the right answer), we get a low loss.
y = -log(1 - x)
Loss Function
Cool math trick! The two cases can be combined into a single expression:
Loss = -[ y·log(H(x)) + (1 - y)·log(1 - H(x)) ]
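A minimal sketch of this combined loss (often called binary cross-entropy or log loss); h is the predicted probability, y the true label, and the small epsilon is only there to keep log(0) out of the computation:

```python
import numpy as np

def log_loss(h, y, eps=1e-12):
    """-[ y*log(h) + (1 - y)*log(1 - h) ], averaged over all examples."""
    h = np.clip(h, eps, 1 - eps)   # keep log() well defined
    return np.mean(-(y * np.log(h) + (1 - y) * np.log(1 - h)))

y = np.array([1.0, 0.0])
print(log_loss(np.array([0.99, 0.01]), y))   # confident and right -> tiny loss
print(log_loss(np.array([0.01, 0.99]), y))   # confident and wrong -> huge loss
```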
Time to code again!
___________
Head over to bit.ly/gdsc-logistic-reg-kaggle and
click on ‘Copy and Edit’
Don’t forget to sign in!
K-Means Clustering
Finding Clusters in Data
K-means Clustering : Theory
K-Means Clustering is an Unsupervised Machine Learning algorithm. The algorithm identifies similarities and differences in the data and divides the data into several groups called clusters. K is the number of clusters, and we can choose its value according to the dataset.
K-means Clustering : Algorithm
Step 1 : Choose the number of clusters (K value) according to the dataset.
K = 2 here.
K-means Clustering : Algorithm
Step 2 : Select K points at random as the initial centroids.
Step 3 : Assign each data point to the closest centroid. That forms K clusters.
K-means Clustering : Algorithm
K-means Clustering : Algorithm
Euclidean Distance : If (x₁, y₁) and (x₂, y₂) are two points, then the distance between them is given by
d = √((x₂ - x₁)² + (y₂ - y₁)²)
Step 4 : Compute and place the new centroid of each cluster
K-means Clustering : Algorithm
Step 5 : Reassign each data point to the new closest centroid. This step repeats until no reassignments take place.
K-means Clustering : Algorithm
K-means Clustering : Algorithm
Step 6 : Model is ready
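A minimal from-scratch sketch of Steps 1-6 (random initial centroids, assign, recompute, repeat until nothing moves); the sample points and names are illustrative:

```python
import numpy as np

def k_means(points, k=2, iterations=100, seed=0):
    """Steps 1-2: choose K and pick K random points as centroids, then alternate assign / recompute."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Step 3: assign each point to its closest centroid (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: move each centroid to the mean of the points assigned to it.
        new_centroids = np.array([points[labels == i].mean(axis=0) for i in range(k)])
        # Step 5: stop when no centroid moves any more.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

points = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
centroids, labels = k_means(points, k=2)
print(labels)   # two clear clusters, e.g. [0 0 1 1] (label numbering may vary)
```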
K-means Clustering : Choosing the correct number of clusters
K-means Clustering : Elbow Method
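A minimal sketch of the elbow method using scikit-learn's KMeans: fit the model for several values of K and watch where the within-cluster sum of squares (inertia) stops dropping sharply. The toy data below is randomly generated with three blobs, so the elbow should appear around K = 3:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    print(k, round(model.inertia_, 1))   # inertia falls sharply up to K = 3, then flattens
```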
Quick Recap!
Machine Learning
Roadmap!
We want to know how we did!
Please fill out the feedback form given below:
https://bit.ly/gdsc-ml-feedback
Registered participants who’ve filled the form will
be eligible for certificates.
We want to know how we did!
We request all of you to check your inbox for an email from the GDSC Event Platform. You will get it soon.
Registered participants who’ve filled the form will
be eligible for certificates.
RESOURCES!
bit.ly/gdsc-explore-ml
Thank You!