Introduction to Machine Learning for Complete Beginners
pythonforengineers.com
Steps to machine learning
[Diagram boxes: Gather Data, Clean Data, Visualise Data, Prepare input for ML, Machine Learning Algorithm, ML model, Test model on real data]
Gathering Data

Depending on the use case, this might be the
hardest part!

Data may have to be scraped from websites, or
manually collected (by doing surveys, or taking
measurements in a lab).

Data may be spread over hundreds of files, in a
haphazard format
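
As a rough illustration of pulling scattered files together, the sketch below reads every CSV file directly inside one folder into a single list. The function name `load_all_rows` and the extra `source_file` field are assumptions made for this example, and real data will often need more handling than this.

```python
import csv
from pathlib import Path


def load_all_rows(folder):
    """Read every .csv file directly inside `folder` into one list of dict rows."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Hypothetical extra field: remember which file the row came from.
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

Keeping the file name with each row makes it easier to trace odd values back to their source later.
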
Clean the data

Even when you gather the data, it may not be
easily usable

Missing fields, or data in different units (inches
vs centimetres)

I have seen the same file have dates in 3
different formats: dd-mm-yy, mm-dd-yy and yy-mm-dd

The data has to be made consistent and clear
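
A minimal sketch of making such data consistent is shown below. It assumes you already know which layout each date uses (dd-mm-yy and mm-dd-yy cannot be told apart reliably from the text alone). The names `normalise_date` and `to_centimetres` are made up for this example.

```python
from datetime import datetime

# The three date layouts mentioned above, written as strptime patterns.
DATE_FORMATS = {
    "dd-mm-yy": "%d-%m-%y",
    "mm-dd-yy": "%m-%d-%y",
    "yy-mm-dd": "%y-%m-%d",
}

CM_PER_INCH = 2.54


def normalise_date(text, layout):
    """Convert a date string in a known layout to ISO form (YYYY-MM-DD)."""
    return datetime.strptime(text, DATE_FORMATS[layout]).date().isoformat()


def to_centimetres(value, unit):
    """Convert a length to centimetres; `unit` must be 'in' or 'cm'."""
    if unit == "in":
        return value * CM_PER_INCH
    if unit == "cm":
        return value
    raise ValueError(f"unknown unit: {unit}")


print(normalise_date("25-12-20", "dd-mm-yy"))
print(normalise_date("12-25-20", "mm-dd-yy"))
print(normalise_date("20-12-25", "yy-mm-dd"))
```

All three calls describe the same day, so after cleaning they should print the same value.
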
Visualise Data

You do NOT need machine learning algorithms!

Sometimes, just visualising the data will show
you insights

Made up example:
Why did account cancellation jump in January?
What did we change in the service in that time?
[Chart: Cancellations of Accounts. X axis: November, December, January, Feb. Y axis: Num Cancel, 0 to 10]
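
To show how little is needed to spot a jump like this, here is a small sketch that draws a text bar chart and finds the month with the largest rise. The numbers are invented for this example and are not read from the chart; `text_bar_chart` and `biggest_jump` are hypothetical helper names.

```python
# Made-up numbers for illustration only; they are not taken from the chart.
cancellations = {"November": 2, "December": 3, "January": 9, "Feb": 4}


def text_bar_chart(counts):
    """Return one line per label, with a bar of '#' characters per count."""
    width = max(len(label) for label in counts)
    return [f"{label.ljust(width)} {'#' * n} {n}" for label, n in counts.items()]


def biggest_jump(counts):
    """Return the label whose value rose most compared with the previous one."""
    labels = list(counts)
    rises = {
        labels[i]: counts[labels[i]] - counts[labels[i - 1]]
        for i in range(1, len(labels))
    }
    return max(rises, key=rises.get)


print("\n".join(text_bar_chart(cancellations)))
print("Biggest jump:", biggest_jump(cancellations))
```

No machine learning is involved here: the plot and one subtraction per month are enough to raise the question of what changed.
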
Preparing for machine learning
We need to choose which inputs we will use for
our learning, and what the expected output is
[Diagram: Inputs, Expected output -> Machine Learning Algorithm -> model]
Example

Titanic dataset contains: Name, age, address
etc.

Are all these fields useful?

What are the inputs?

What is the expected output?
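
One way to think about these questions in code is sketched below. The records, the field names and the choice of `age` as the input and `survived` as the expected output are all assumptions made for illustration; deciding which fields really matter is the point of the exercise.

```python
# Hypothetical passenger records; field names and values are made up.
passengers = [
    {"name": "Passenger A", "age": 29, "address": "Town X", "survived": 1},
    {"name": "Passenger B", "age": 54, "address": "Town Y", "survived": 0},
]

INPUT_FIELDS = ["age"]     # assumption: the fields we judge useful
OUTPUT_FIELD = "survived"  # assumption: the value we want to predict


def split_inputs_and_output(records, input_fields, output_field):
    """Separate each record into a list of input values and one expected output."""
    inputs = [[record[field] for field in input_fields] for record in records]
    expected = [record[output_field] for record in records]
    return inputs, expected


inputs, expected = split_inputs_and_output(passengers, INPUT_FIELDS, OUTPUT_FIELD)
print(inputs)
print(expected)
```

Fields left out of `INPUT_FIELDS` (here the name and address) are simply never shown to the algorithm.
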
Problems we will face

Overfitting
The algorithm does an excellent job of prediction,
but only on the data it was trained on

The algorithm has only learnt how to predict
with our exact data

Like Astrologers!!
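
An extreme case of overfitting can be sketched with a "model" that only memorises what it has seen. This is a deliberately bad toy, not a real learning algorithm, and the class name, the data and the rule behind the data are all invented for the demonstration.

```python
class MemorisingModel:
    """A deliberately bad 'model': it stores every example it has seen."""

    def __init__(self, default=0):
        self.seen = {}
        self.default = default

    def train(self, inputs, outputs):
        for x, y in zip(inputs, outputs):
            self.seen[tuple(x)] = y

    def predict(self, x):
        # Exact matches are recalled; anything new gets the default guess.
        return self.seen.get(tuple(x), self.default)


def accuracy(model, inputs, outputs):
    """Fraction of examples for which the prediction equals the expected output."""
    correct = sum(model.predict(x) == y for x, y in zip(inputs, outputs))
    return correct / len(inputs)


# Made-up rule for the demo: the output is 1 when the two inputs sum to more than 10.
train_x = [[1, 2], [9, 5], [4, 4], [7, 8]]
train_y = [0, 1, 0, 1]
new_x = [[6, 6], [10, 3], [2, 3], [8, 9]]
new_y = [1, 1, 0, 1]

model = MemorisingModel()
model.train(train_x, train_y)
print(accuracy(model, train_x, train_y))  # perfect on data it has seen
print(accuracy(model, new_x, new_y))      # much worse on new data
```

The model looks perfect if you only check it against the examples it was given, which is exactly the trap.
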
Solutions

The data is divided into a training section and a
test section

Only the training set is used to train the
algorithm

The test set is then used to check if the model
works for unseen data (as we know what the
expected output is for the test data)

Problem: The amount of data the algorithm has
to learn from is reduced

Engineering is about compromises
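
The split described above can be sketched in a few lines. The function name, the 25% test fraction and the fixed seed are assumptions chosen for the example; libraries offer ready-made versions, but the idea is the same.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of `rows` and split it into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(20))  # stand-in for 20 data records
training, test = train_test_split(rows)
print(len(training), len(test))
```

A larger test fraction gives a more trustworthy check but leaves less data to train on, which is the compromise mentioned above.
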
Your assignment

Look at the dataset

Which fields will you be choosing?

Editor's Notes

  • #2 Mention I won't explain what machine learning is, since if you are here, you already know. Most courses focus only on implementing the machine learning alg