2. Topics Covered
1.1 Introduction to Machine Learning
Artificial Intelligence
Machine Learning
Application of Machine Learning
1.2 Types of Machine Learning
1.3 Supervised Machine Learning
1.3.1 Classification
1.4 Unsupervised Machine Learning and its Application
1.4.1 Difference between Supervised and Unsupervised Machine
Learning
1.5 Semi-Supervised Machine Learning
1.6 Reinforcement Machine Learning and its Application
1.7 Hypothesis Space and Inductive Bias
1.8 Underfitting and Overfitting
1.9 Evaluation and Sampling Methods
1.9.1 Regression Metrics
1.9.2 Classification Metrics
1.10 Training and Test Dataset and Need of
Cross Validation
1.11 Linear Regression
1.11.1 Linear Models
1.12 Decision Trees
1.12.1 The Decision Tree Learning Algorithm
1.12.2 Entropy
1.12.3 Information Gain
1.12.4 Impurity Measures
Exercise
3. Introduction to Machine Learning
Machine learning is a branch of Artificial Intelligence (AI) and Computer Science which focuses on the use of
data and algorithms to imitate the way that humans learn, gradually improving its accuracy.
Machine Learning is an umbrella term used to describe a variety of different tools and techniques which
allow a machine or a computer program to learn and improve over time.
ML tools and techniques draw on Statistical Reasoning, Data Mining, Mathematics and Programming.
Examples of ML tools: Apache Mahout, AWS Machine Learning, BigML, Colab, Google Cloud AutoML,
IBM Watson Studio, Microsoft Azure Machine Learning, OpenNN, PyTorch, Scikit-learn,
Shogun, TensorFlow, Vertex AI, Weka, XGBoost
https://builtin.com/machine-learning/machine-learning-tools
4. Introduction to Machine Learning
Machine learning gives machines/computers the ability to learn the way humans do, i.e. without being told explicitly what to do.
Machine learning gives computers the ability to learn without being explicitly programmed.
Arthur Samuel
Machine learning refers to teaching devices to learn from the information in a dataset without manual human
interference.
Machine Learning (ML) is a subset of artificial intelligence (AI) that uses statistics, trial and error, and huge
amounts of data to learn a specific task without ever having to be explicitly programmed to do that task.
It involves identifying patterns in data, and then optimizing those findings through both trial and error and
feedback.
5. Well Posed Learning Problem
A well-posed learning problem is a task in which the Input, Output, and Learning objective are clearly defined, and there exists a
unique solution to the problem.
A well-posed learning problem has three properties:
1. Existence: The problem must have at least one solution. There must be a possible relationship between the input and output data.
2. Uniqueness: The problem must have a unique solution. There must be only one correct relationship between the input and output
data.
3. Stability: The solution to the problem must be stable with respect to small changes in the input data. The output produced by the
machine learning algorithm should not change significantly when the input data is slightly modified.
A well-posed learning problem is essential for the development of effective and reliable machine learning algorithms. Without a
well-posed problem, the algorithm may produce incorrect or unstable results, making it difficult to use in practical applications.
It is therefore important to carefully define the input, output, and learning objective when formulating a machine learning
problem.
6. Well Posed Learning Problem
A learning problem can be defined as a task in which an agent (such as a machine learning
algorithm or a human) must learn to perform a specific task or make predictions based on a set of
inputs or data.
Three features that can be identified in a learning problem are:
Input data: This refers to the set of data or information that the agent uses to learn and make
predictions. The input data can be structured or unstructured, and may come from a variety of sources
such as text, images, audio, or sensor data.
Output or prediction: This refers to the task that the agent is trying to learn or the prediction that it is
trying to make based on the input data. The output can be a single value, a set of values, or a
probability distribution over possible outcomes.
7. Well Posed Learning Problem
Evaluation metric / Performance measure: This refers to the measure or metric that is used to evaluate
the performance of the agent on the learning task.
The evaluation metric may vary depending on the specific learning problem and may include metrics such as
Accuracy, Precision, Recall, F1 Score, or Mean Squared Error.
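The metrics named above can be computed by hand on a small example. The following is a minimal sketch in plain Python; the function names and the sample labels and predictions are made up for illustration and do not come from any particular library.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mean_squared_error(y_true, y_pred):
    """Average of squared differences, used for regression tasks."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and predictions for eight examples.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(labels, preds)
print(metrics)

# Hypothetical regression targets and predictions.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
print(mse)
```

Which metric to use depends on the task: the first four apply to classification, while mean squared error applies to regression.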
Definition:-
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Tom Mitchell
8. Well Posed Learning Problem
Spam Email Classification.
Input: The input is an email, which can be represented as a collection of words or phrases, or even
more complex structures like the email header information.
Output: The output is a binary label indicating whether the email is spam (1) or not spam (0).
Learning Objective: The learning objective is to train a model that can accurately classify emails as
spam or not spam. This is typically achieved by minimizing a loss function, such as the cross-entropy loss
for binary classification problems.
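As a rough illustration of the loss mentioned above, the sketch below computes binary cross-entropy for a few emails. The predicted probabilities are invented for the example, and the small `eps` clamp is an assumption added to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(y_true, p_spam, eps=1e-12):
    """Mean binary cross-entropy.

    y_true: 1 for spam, 0 for not spam.
    p_spam: the model's predicted probability that each email is spam.
    """
    total = 0.0
    for y, p in zip(y_true, p_spam):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # hypothetical well-trained model
unsure = [0.6, 0.4, 0.5, 0.5]     # hypothetical poorly-trained model

loss_confident = binary_cross_entropy(y_true, confident)
loss_unsure = binary_cross_entropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss is lower when the predicted probabilities sit closer to the true labels, which is why minimizing it pushes the model toward correct and confident classifications.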
9. Examples of well-posed learning problems:
1. Image classification: Given a set of labeled images,
Task:- To learn a model that can correctly classify new images into their respective classes.
Input:- The image data.
Output:- The class label.
Learning objective:- To minimize the classification error.
Performance Measure:- Percentage of images correctly classified.
Training Experience:- A database of images with given classifications.
2. Sentiment analysis: Given a set of text documents,
Task:- To learn a model that can predict the sentiment of new documents (e.g., positive, negative, or neutral).
Input:- The text data.
Output:- The sentiment label.
Learning objective:- To minimize the prediction error.
Performance Measure:- Percentage of new documents whose sentiment is predicted correctly.
Training Experience:- A database of documents with given sentiments.
10. Examples of well-posed learning problems:
3. Fraud detection: Given a set of transaction data,
Task:- To learn a model that can identify fraudulent transactions.
Input:- The transaction data.
Output:- A binary label (fraudulent or not).
Learning objective:- To minimize the false positive and false negative rates.
Performance Measure:- The false positive rate and the false negative rate.
4. Regression: Given a set of input features and corresponding target values,
Task:- To learn a model that can predict the target value for new input data.
Input:- The feature data.
Output:- The target value.
Learning objective:- To minimize the prediction error (e.g., mean squared error).
Performance Measure:- The prediction error on new data (e.g., mean squared error).
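For the fraud detection example, the false positive and false negative rates can be computed from the predictions as in the sketch below. The transaction labels are hypothetical (1 = fraudulent, 0 = legitimate), and the helper name `error_rates` is made up for this illustration.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # FPR: share of legitimate transactions wrongly flagged as fraud.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    # FNR: share of fraudulent transactions that were missed.
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


# Ten hypothetical transactions, two of them fraudulent.
actual = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]

fpr, fnr = error_rates(actual, predicted)
accuracy = sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)
print(fpr, fnr, accuracy)
```

In this made-up data the accuracy is 0.8 even though half of the fraudulent transactions are missed, which suggests why the two error rates are tracked separately when fraud is rare.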
13. History of Machine Learning
Year 1950 : Alan Turing proposed the Turing Test.
Year 1957 : Perceptron - The first ever Neural Network
Year 1966 : Joseph Weizenbaum at MIT completed ELIZA, a Natural Language Processing program that acted as a therapist.
Year 1967 : The advent of Nearest Neighbor algorithm, very prominently used in Search and Approximation
Year 1970 : Backpropagation takes shape. Backpropagation is a set of algorithms used extensively in Deep Learning.
Year 1980 : Kunihiko Fukushima proposed the Neocognitron, a multilayered Neural Network.
Year 1981 : Explanation Based Learning
Year 1989 : Reinforcement Learning is finally realized. Q-Learning algorithm.
Year 2009 : ImageNet
Year 2010s : Google Brain (2011) and Facebook's DeepFace (2014)
Year 2022 : ChatGPT Chat Generative Pre-trained Transformer
https://www.zeolearn.com/magazine/what-is-machine-learning
14. Artificial Intelligence vs. Machine Learning vs. Deep Learning vs. Neural
Networks
Machine learning, deep learning, and neural networks are all sub-fields of Artificial Intelligence.
Neural networks are a sub-field of machine learning, and deep learning is a sub-field of neural networks.
"Deep" machine learning can use labeled datasets (supervised learning), and it can determine the distinguishing
features automatically. This eliminates some of the human intervention required and enables the use of larger data sets.
"Non-deep" machine learning is more dependent on human intervention to learn. Human experts determine
the set of features used to distinguish between data inputs, which usually requires more structured data.
Neural networks, or artificial neural networks (ANNs), consist of layers of nodes: an input layer,
one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an
associated weight and threshold.
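The node-layer structure described above can be sketched in a few lines. This is a minimal illustration, not a trained network: the weights and thresholds are hand-picked assumptions chosen so that the network computes XOR, and the step activation (output 1 when the weighted sum exceeds the threshold) is one simple choice among many.

```python
def neuron(inputs, weights, threshold):
    """A single artificial neuron: fires (1) if the weighted sum exceeds its threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


def layer(inputs, weight_rows, thresholds):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


def tiny_network(x1, x2):
    """Input layer (2 values) -> hidden layer (2 neurons) -> output layer (1 neuron)."""
    # Hidden neuron 1 acts like OR, hidden neuron 2 acts like AND (hand-picked values).
    hidden = layer([x1, x2], weight_rows=[[1, 1], [1, 1]], thresholds=[0.5, 1.5])
    # Output fires when OR is on and AND is off, which is XOR.
    return neuron(hidden, weights=[1, -1], threshold=0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, tiny_network(a, b))
```

In a real network the weights and thresholds are learned from data rather than set by hand; the hidden layer is what lets the network represent a function like XOR that a single neuron cannot.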
Deep learning and neural networks are accelerating progress in areas such as computer vision, natural language
processing, and speech recognition.
15. Artificial Intelligence vs. Machine Learning vs. Deep Learning vs.
Neural Networks
AI refers to the software and processes that are designed to mimic the way humans think and process
information. It includes computer vision, natural language processing, robotics, autonomous vehicle operating
systems, and machine learning.
With the help of artificial intelligence, devices are able to learn and identify information in order to solve
problems and offer key insights into various domains.
16. Artificial Intelligence vs. Machine Learning vs. Deep
Learning vs. Neural Networks
AI enables machines to understand data and make decisions based on patterns hidden
in data without any human intervention.
Machines adjust their knowledge based on new inputs.
Examples: self-driving cars, and assistants such as Alexa and Cortana that converse with us in
natural human language.
Machine Learning:- A subset of AI.
Machine learning algorithms can process large volumes of information and output a
prediction within moments. Deep learning is one family of such algorithms.
It uses statistical models to explore, analyze and find patterns in large amounts of
data.
It lets machines perform tasks without being explicitly programmed, allowing them to learn from
experience and improve over time without human intervention.
https://learnerjoy.com/artificial-intelligence-vs-machine-learning-vs-deep-learning-vs-data-science/
17. Artificial Intelligence vs. Machine Learning vs. Deep
Learning vs. Neural Networks
Approaches:- 1. Supervised learning, 2. Unsupervised learning and 3.
Reinforcement learning.
1. Supervised learning:- Requires a human to supply past labeled data
to the machine, which then outputs a prediction for a new sample.
2. Unsupervised learning:- Takes unlabeled data as input, groups the
data based on similarity, and outputs clusters of similar samples for a
human to analyze further. The correct output is not known in advance.
Algorithms: K-means, Hierarchical Clustering, PCA, Neural Networks.
3. Reinforcement learning:- Uses a reward, or trial-and-error, system
to learn over time: good actions are rewarded and bad actions are
penalized. It is distinct from semi-supervised learning, which combines a
small amount of labeled data with a large amount of unlabeled data.
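The difference between the supervised and unsupervised approaches above can be sketched on the same six numbers. This is a toy illustration in plain Python: the data, the labels "small" and "large", and the starting centers are all made up, nearest-neighbor stands in for a supervised learner, and a one-dimensional K-means stands in for an unsupervised one.

```python
def nearest_neighbor_predict(train_points, train_labels, x):
    """Supervised: predict the label of the closest labeled training point."""
    best = min(range(len(train_points)), key=lambda i: abs(train_points[i] - x))
    return train_labels[best]


def k_means_1d(points, centers, iterations=10):
    """Unsupervised: group unlabeled points around the given starting centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]

# Supervised: labels are given, and the model predicts a label for a new sample.
labels = ["small", "small", "small", "large", "large", "large"]
prediction = nearest_neighbor_predict(points, labels, 7.0)
print(prediction)

# Unsupervised: no labels, only groups that a human would interpret afterwards.
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```

The supervised call returns a named label because labels were supplied; the unsupervised call returns only unnamed groups, leaving their meaning to be interpreted afterwards.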
18. Artificial Intelligence vs. Machine Learning vs. Deep
Learning vs. Neural Networks
Deep Learning - Deep learning is a subset of machine learning.
The main idea behind deep learning is for machines to learn in a way loosely modeled on the
human brain.
The human brain is made up of a multitude of neurons that allow us to operate the way
we do.
Inspired by the collection of connected neurons in the human brain, scientists created multi-
layer networks that machines can use to learn from experience and make predictions.
Techniques
Artificial Neural Networks (ANN):- I/P in the form of Numbers
Convolutional Neural Networks (CNN):- I/P in the form of Images
Recurrent Neural Networks (RNN):- I/P in the form of Time Series Data
Two popular frameworks used in Deep learning are
•PyTorch by Facebook
•TensorFlow by Google
19. Artificial Intelligence vs. Machine Learning vs. Deep
Learning vs. Neural Networks
Data Science
Data science involves performing exploratory analysis to better understand
the data.
It plays a huge role when building ML models. If you have a huge
amount of data, you will get more insights from data and accurate
results that can be applied to business use cases.
Statistical tools –Linear algebra
20. Machine Learning Applications
Image Recognition: Used to identify objects, persons and
places in digital images. Example: automatic friend-tagging
suggestions (DeepFace).
Traffic prediction: Google Maps combines the real-time location of
vehicles, from the Google Maps app and sensors, with the average time
taken on past days at the same time of day.
Product recommendations: Used by various e-commerce and
entertainment companies, such as Amazon and Netflix, to
recommend products to the user.
21. Machine Learning Applications
Self-driving cars: Machine learning is used to train car models to detect people and
objects while driving.
Email Spam and Malware Filtering: Incoming email is filtered automatically into normal and spam. Algorithms
used include Multi-Layer Perceptron, Decision Tree, and Naïve Bayes classifier.
Virtual Personal Assistant: Google assistant, Alexa, Cortana, Siri.
Medical Sector
Banking and Stock Market, Search Engine , Chat Bot,
Speech Recognition: Search by voice, also known as "speech to text" or computer speech recognition.
22. Machine learning Life cycle
Machine learning life cycle is a cyclic process to build an
efficient machine learning project.
Gathering Data
Data preparation
Data Wrangling
Analyse Data
Train the model
Test the model
Deployment
23. 1. Gathering Data: Identify and obtain the data relevant to the problem.
This step includes the below tasks:
•Identify various data sources
•Collect data
•Integrate the data obtained from different sources
2. Data preparation: Data preparation is a step where we put our data into a suitable place and
prepare it to use in our machine learning training. This step can be further divided into two processes:
•Data exploration: Understand the nature of data, understand the characteristics, format, and quality
of data.
•Data pre-processing:
24. 3. Data Wrangling: Data wrangling is the process of cleaning raw data and converting it into a
usable format. It involves cleaning the data, selecting the variables to use, and
transforming the data into a proper format to make it more suitable for analysis in the next step.
Cleaning the data is required to address quality issues.
Collected data may have various issues, including:
•Missing Values
•Duplicate data
•Invalid data
•Noise
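A minimal sketch of how some of the issues listed above might be handled is shown below. The rows, the field names, and the rule that a valid age lies between 0 and 120 are all assumptions made for the example; noise is not addressed here, and dropping rows is only one option (missing values can also be filled in).

```python
raw_rows = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},   # missing value
    {"id": 1, "age": 34, "city": "Pune"},      # duplicate of the first row
    {"id": 3, "age": -5, "city": "Mumbai"},    # invalid value
    {"id": 4, "age": 29, "city": "Chennai"},
]


def clean(rows, min_age=0, max_age=120):
    """Drop duplicate rows, rows with missing values, and rows with invalid ages."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["id"], row["age"], row["city"])
        if key in seen:
            continue  # duplicate data
        seen.add(key)
        if any(value is None for value in row.values()):
            continue  # missing values
        if not (min_age <= row["age"] <= max_age):
            continue  # invalid data
        cleaned.append(row)
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Only the rows with id 1 and id 4 survive in this example; in practice the choice between dropping and repairing a row depends on how much data can be spared.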
25. 4. Data Analysis
The cleaned and prepared data is passed on to the analysis step. This step involves:
• Selection of analytical techniques
• Building models
• Review the result
Here we select a machine learning technique such as Classification, Regression, Cluster
analysis, or Association, then build the model using the prepared data and evaluate it.
26. 5. Train Model
We train the model to improve its performance and get a better outcome for the problem.
Datasets are used to train the model with various machine learning algorithms. Training is
required so that the model can learn the various patterns, rules, and features.
6. Test Model: Once the model has been trained on a given dataset, it is tested on data it has not seen.
7. Deployment
27. What is a dataset?
A dataset is a collection of data arranged in some order. A dataset can contain anything from a series or an array
to a database table.
Types of data in datasets
• Numerical data: Such as house price, temperature, etc.
• Categorical data: Such as Yes/No, True/False, Blue/Green, etc.
• Ordinal data: Similar to categorical data, but the categories have a meaningful order and can be compared.
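The distinction between categorical and ordinal data matters when values are converted to numbers for a model. The sketch below shows one common convention for each: one-hot encoding for unordered categories and integer ranks for ordered ones. The helper names and the Small/Medium/Large scale are hypothetical examples.

```python
def one_hot(value, categories):
    """Categorical data: one position per category, no order implied."""
    return [1 if value == c else 0 for c in categories]


def ordinal_encode(value, ordered_levels):
    """Ordinal data: the position in the ordered list carries meaning."""
    return ordered_levels.index(value)


colors = ["Blue", "Green"]            # categorical: no natural order
sizes = ["Small", "Medium", "Large"]  # ordinal: Small < Medium < Large

green_encoded = one_hot("Green", colors)
medium_rank = ordinal_encode("Medium", sizes)
large_rank = ordinal_encode("Large", sizes)
print(green_encoded, medium_rank, large_rank)
```

Comparing the ranks is meaningful for the ordinal values (Large ranks above Medium), whereas the one-hot vectors deliberately give no ordering between Blue and Green.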
Types of datasets
Image Datasets, Text Datasets, Time Series Datasets, Tabular Datasets
28. Data Pre-processing:
Pre-processing procedures include data cleaning to remove inconsistencies or errors,
normalization to scale data within a specific range, feature scaling to ensure features have
comparable ranges, and handling missing values through imputation or removal.
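Two of the pre-processing steps described above can be sketched briefly: filling missing values and scaling a feature to a fixed range. This is an illustrative sketch in plain Python; mean imputation and min-max scaling to [0, 1] are assumed choices among several possible ones, and the temperature values are made up.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Scale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


temperatures = [10.0, None, 30.0, 20.0]  # hypothetical feature with one gap

filled = impute_mean(temperatures)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```

Applying the same scaling to every feature keeps features with large raw values from dominating those with small ones.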
Datasets are divided into two parts:
•Training dataset:
•Test Dataset
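The split into a training dataset and a test dataset can be sketched as follows. The 80/20 ratio, the fixed seed, and the function name are assumptions for illustration; the point is that rows are shuffled and the two parts do not overlap.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))  # stand-in for ten data rows
train, test = train_test_split(data, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

The model is fitted only on the training part, and the held-out test part is used to check how well it performs on data it has not seen.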
29. Popular sources for Machine Learning
datasets
1. Kaggle Datasets
2. UCI Machine Learning Repository
3. Datasets via AWS
4. Google's Dataset Search Engine
5. Microsoft Datasets
6. Scikit-learn datasets