Machine Learning (CS-5137)
Lecture #01
Fundamentals of Machine Learning
Dr Syed Zaffar Qasim
Assistant Professor (CIS)
MS (AI)
Spring Semester 2025
Background
 In 1959, a paper was published in the IBM Journal of
Research and Development titled Some Studies in
Machine Learning Using the Game of Checkers.
 Authored by IBM’s Arthur Samuel, the paper investigated
the use of machine learning in the game of checkers
o to verify that a computer can be programmed so that
o it will learn to play the game of checkers better
o than the person who wrote the program.
Background
 The Samuel Checkers-Playing Program is known to be
the first computer program that could learn,
o developed in 1959 by Arthur Samuel,
o one of the fathers of machine learning.
 Following Samuel, Ryszard S. Michalski, also
deemed a father of machine learning,
o came out with a system for recognizing
handwritten alphanumeric characters,
o working along with Jacek Karpinski in 1962-1970.
 The subject has since evolved, paving the way for
various applications that benefit businesses and
society.
Definition of ML
 ML is the design and study of software that gives
computers the ability to learn patterns from data without
being explicitly programmed.
 ML in fact uses past experience to make future
decisions.
 The fundamental goal of machine learning is to
o generalize, or to induce unknown rules
o from examples of the rule's application.
[Diagram: Past Experience → Statistical Model → Future Decisions]
Definition of ML
 Throughout the 1950s and 1960s,
Samuel developed programs that played checkers.
 While the rules of checkers are simple, complex
strategies are required to defeat skilled opponents.
 Samuel never explicitly programmed these strategies;
o rather, through the experience of playing thousands
of games,
o the program learned complex behaviors
that allowed it to beat many human opponents.
Definition of ML
 A key feature of ML is that it detects patterns in
empirical data, all without direct programming
commands.
 This doesn’t mean that machines need no upfront
programming.
 On the contrary, machine learning is heavily
dependent on computer programming.
 Instead, Samuel observed that machines require not a
direct input command to perform a set task, but rather
input data.
Input Command vs Input Data
 An example of an input command is to sort a list of
numbers in ascending order using a predefined
algorithm that specifies the steps for sorting.
[Fig 1: sorting a list of numbers via an input command]
 Input data, however, is different.
o Data is fed to the machine,
o an algorithm is selected,
o hyperparameter settings are configured and
adjusted, and
o the machine is instructed to conduct its analysis.
Input Command vs Input Data
 The machine then deciphers patterns found in the data
through the process of trial and error to produce a
decision model.
 The decision model can then be used to predict future
values.
 The input command approach consists of only two steps:
o Command > Action
 Machine learning entails a three-step process:
o Data > Model > Action
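The contrast can be made concrete in a minimal Python sketch (the toy study data below is invented for illustration): a sort executes fixed, programmer-specified steps, while the learner is given only data and an algorithm and derives its own decision model.

```python
from sklearn.tree import DecisionTreeClassifier

# Command > Action: the steps are fully specified in advance.
print(sorted([7, 2, 9, 4]))          # -> [2, 4, 7, 9]

# Data > Model > Action: only data and an algorithm are supplied;
# the machine derives the decision rules itself.
X = [[1], [2], [8], [9]]             # e.g. hours of study (hypothetical)
y = [0, 0, 1, 1]                     # 0 = fail, 1 = pass
model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[7]]))          # action: predict for new input -> [1]
```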
Definition of ML
 The canonical example of machine learning is spam
filtering.
o Here the empirical data comprises thousands of sample
emails labeled as either spam or ham.
o The model consists of statistical-based rules.
o The parameters of the model include keywords from a
negative list, such as dear friend, free, invoice, PayPal,
casino, bankruptcy, and winner.
o The model is then trained and tested to learn to classify
new messages.
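A hedged sketch of such a filter in scikit-learn, restricting the features to the negative-list keywords above; the four sample emails and their labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["dear friend you are a winner winner claim your free casino prize",
          "free invoice overdue pay via paypal to avoid bankruptcy",
          "meeting moved to 3pm see agenda",
          "minutes of the project review attached"]
labels = [1, 1, 0, 0]                      # 1 = spam, 0 = ham

# Count only the negative-list keywords named in the text.
negative_list = ["dear friend", "free", "invoice", "paypal",
                 "casino", "bankruptcy", "winner"]
vectorizer = CountVectorizer(vocabulary=negative_list, ngram_range=(1, 2))
X = vectorizer.fit_transform(emails)

model = MultinomialNB().fit(X, labels)     # train the statistical rules
test = vectorizer.transform(["you are a winner, free entry inside"])
print(model.predict(test))                 # -> [1], classified as spam
```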
Learning from experience
 The three categories of machine learning are:
1. Supervised Learning,
2. Unsupervised Learning, and
3. Reinforcement Learning.
1. Supervised learning
 It works by feeding the machine sample data consisting of
o input variables or features (denoted as “X”) and
o the correct output value (denoted as “Y”),
as in the table and the sketch that follow.
Education     Salary  Age     Status
Postgraduate  High    Middle  Senior
Graduate      Low     Young   Middle
Graduate      Medium  Old     Senior
Intermediate  Medium  Young   Junior
Postgraduate  Medium  Middle  Middle
Intermediate  Medium  Old     Middle
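A small scikit-learn sketch of this, using a hypothetical numeric encoding of the table (Education: 0 = Intermediate, 1 = Graduate, 2 = Postgraduate; Salary: 0 = Low, 1 = Medium, 2 = High; Age: 0 = Young, 1 = Middle, 2 = Old):

```python
from sklearn.tree import DecisionTreeClassifier

# X holds the features (Education, Salary, Age); y holds the labels (Status).
X = [[2, 2, 1],   # Postgraduate, High,   Middle
     [1, 0, 0],   # Graduate,     Low,    Young
     [1, 1, 2],   # Graduate,     Medium, Old
     [0, 1, 0],   # Intermediate, Medium, Young
     [2, 1, 1],   # Postgraduate, Medium, Middle
     [0, 1, 2]]   # Intermediate, Medium, Old
y = ["Senior", "Middle", "Senior", "Junior", "Middle", "Middle"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)
print(model.predict([[2, 2, 2]]))   # status of a new, unseen profile
```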
Learning from experience
 The fact that the output associated with feature values
is known qualifies the dataset as labeled.
 The algorithm then deciphers patterns of relationship
between input and output in the data and creates a
model that can apply the same underlying rules to
new data.
Education     Salary  Age     Status
Postgraduate  High    Middle  Senior
Graduate      Low     Young   Middle
Graduate      Medium  Old     Senior
Intermediate  Medium  Young   Junior
Postgraduate  Medium  Middle  Middle
Intermediate  Medium  Old     Middle
Learning from experience
 There are many names for the output of a machine
learning program.
 We will refer to the output as the response variable.
 Other names for response variables include
dependent variables, measured variables, and labels.
 Similarly, the input variables have several names.
 We can refer to the input variables as features, and
the phenomena they measure as explanatory
variables or controlled variables.
 Response variables and explanatory variables may
take real or discrete values.
Learning from experience
 To predict the market price of a used car,
o a supervised algorithm can formulate predictions by
analyzing the relationship between
o car attributes (such as the year of make, brand, and
mileage) and
o the selling prices of other cars in historical sales data.
 Given that the supervised algorithm knows the final price of
other cars sold, it can then work backward to determine the
relationship between the characteristics of the car and its
value.
[Fig 2: predicting a used car’s price from its attributes]
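A hedged sketch of this idea with scikit-learn's LinearRegression; the historical sales records below are invented.

```python
from sklearn.linear_model import LinearRegression

# Features: year of make, mileage (thousands of km); target: selling price.
X = [[2012, 130], [2015, 90], [2018, 40], [2019, 35], [2020, 20]]
y = [5000, 8500, 14000, 16500, 19000]      # hypothetical prices

model = LinearRegression().fit(X, y)       # works backward from known prices
print(model.predict([[2017, 60]]))         # estimated price of a new listing
```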
Learning from experience
 After the machine deciphers the data pattern, it creates a model:
o an algorithmic equation for giving an outcome with new data
o based on the rules derived from the training data.
 The model is then given test data to check its accuracy.
 After the model has passed both the training and test data stages,
it is ready to be applied in the real world.
 Examples of supervised learning algorithms include
o regression analysis,
o decision trees,
o k-nearest neighbors,
o neural networks, and
o support vector machines.
Learning from experience
2. Unsupervised learning
 In unsupervised learning there is no labeled data; the
aim is to discover hidden patterns and regularities in
the input.
 The k-means clustering algorithm is a popular example of
unsupervised learning.
 Based on density estimation, clustering aims to find
clusters or groupings of inputs that possess similar
features, as shown in Fig 3.
[Fig 3: clusters of inputs with similar features]
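A minimal k-means sketch in scikit-learn on invented two-dimensional points:

```python
from sklearn.cluster import KMeans

# Two visually separate groupings of points (invented data).
X = [[1.0, 1.0], [1.5, 1.8], [1.2, 0.9],
     [8.0, 8.0], [8.5, 8.2], [7.8, 8.9]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # which cluster each point was assigned to
print(kmeans.cluster_centers_)   # the discovered group centres
```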
Learning from experience
2. Unsupervised learning
 In the case of a company with data on its past customers,
o the customer data contains demographic
information (age, salary, education, etc.)
o as well as past transactions with the company.
 In such a case, a clustering model allocates customers
similar in their attributes to the same group,
o providing the company with natural groupings of its
customers;
o this is called customer segmentation.
Learning from experience
2. Unsupervised learning
 Once such groups are found, the company may devise
strategies,
o for example, services and products specific to
different groups;
o this is known as customer relationship management.
 Such a grouping also allows identifying those who are
outliers,
o namely, those who are different from other
customers,
o which may imply a niche in the market that can
be further exploited by the company.
Learning from experience
3. Reinforcement learning
 Supervised learning and unsupervised learning can be
thought of as occupying opposite ends of a spectrum.
 Semi-supervised learning problems make use of both
supervised and unsupervised data;
o these problems are located on the spectrum between
supervised and unsupervised learning.
 An example of semi-supervised machine learning is
reinforcement learning, in which,
o unlike supervised and unsupervised learning,
o a program continuously improves its model
o by leveraging feedback from previous iterations.
Learning from experience
3. Reinforcement learning (cont’d)
 However, the feedback may be associated not with a
single action but with a sequence of actions.
 Here a single action is not important; what is important
is the policy – the sequence of correct actions that reaches
the goal.
 There is no such thing as the best action in any intermediate
state; an action is good if it is part of a good policy.
 In such a case, the ML program should be able to assess
the goodness of policies and learn from past good action
sequences to be able to generate a policy.
Learning from experience
3. Reinforcement learning (cont’d)
 A good example is game playing, where a single move by
itself is not that important; it is the sequence of right
moves that is good.
 Game playing is an important research area in AI.
 Reason: games are easy to describe and, at the same time,
quite difficult to play well.
o A game like chess has a small number of rules but is very
complex, with a large number of possible moves at each state.
 Once we have good algorithms that can learn to play games
well, we can also apply them to applications with
more evident economic utility.
Learning from experience
3. Reinforcement learning (cont’d)
 A robot navigating in an environment in search of a
goal location is another application area of
reinforcement learning.
 At any time, the robot can move in one of a number of
directions.
 After a number of trial runs, it should learn the correct
sequence of actions
o to reach the goal state from an initial state,
o doing this as quickly as possible and
o without hitting any of the obstacles.
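A toy tabular Q-learning sketch of such a robot (everything here, the corridor, rewards, and hyperparameters, is a made-up illustration): the robot occupies one of five cells, the goal is cell 4, and reward arrives only at the goal, so the value of each move is learned through whole action sequences rather than single actions.

```python
import random

n_states = 5                       # cells 0..4; the goal is cell 4
actions = [-1, +1]                 # move left / move right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(200):         # trial runs
    s = 0                          # initial state
    while s != 4:
        # epsilon-greedy: mostly exploit, occasionally explore
        a = random.randrange(2) if random.random() < eps \
            else max(range(2), key=lambda i: Q[s][i])
        s2 = min(max(s + actions[a], 0), n_states - 1)
        r = 1.0 if s2 == 4 else 0.0       # feedback only at the goal
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy: action index 1 (move right) in states 0-3.
print([max(range(2), key=lambda i: Q[s][i]) for s in range(n_states)])
```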
Training data and Test data
 In supervised machine learning, labeled data is split into
training data and test data.
 The observations (or records) in the training set comprise
the experience that the algorithm uses to learn.
 Each observation consists of
o an observed response variable and
o one or more observed explanatory variables.
 The test set is a similar collection of observations that is
used to evaluate the performance of the model using
some performance metric.
 It is important that no observations from the training set
be included in the test set.
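In scikit-learn the split is typically made with train_test_split; a brief sketch on placeholder data:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]            # placeholder observations
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]      # placeholder labels

# Hold out 25% as the test set; no record appears in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(len(X_train), len(X_test))        # -> 7 3
```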
Training data and Test data
 If the test set does contain examples from the training
set, it will be difficult to assess whether
o the algorithm has learned to generalize from the
training set or
o has simply memorized it.
 Memorizing the training set leads to over-fitting.
 A program that memorizes its observations may not
perform its task well, as it could memorize relations
and structures that are noise or coincidence.
 In addition to training and test data, a third set of
records, called the validation or hold-out set, is
sometimes required.
Training data and Test data
 The validation set is used to tune variables called
hyperparameters, which control how the model is learned.
 It is common to allocate 50% or more of the data to the
training set, 25% to the test set, and the remainder to the
validation set.
 Many supervised training sets are prepared manually, or
by semi-automated processes.
 Creating a large collection of supervised data can be costly
in some domains.
 Fortunately, several datasets are bundled with scikit-learn,
allowing developers to focus on experimenting with
models instead.
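A brief sketch combining the two points above: the iris dataset bundled with scikit-learn, split 50/25/25 into training, test, and validation sets by calling train_test_split twice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)        # a dataset bundled with scikit-learn

# First carve off 50% of the data for the training set.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.5, random_state=0)
# Split the remainder evenly: ~25% test, ~25% validation.
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)
print(len(X_train), len(X_test), len(X_val))   # -> 75 37 38
```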
Training data and Test data
 When the training data is scarce, a practice called cross-
validation can be used to train and validate an algorithm
on the same data.
 Here the training data is partitioned into folds.
 The algorithm is trained using all but one of the partitions,
and tested on the remaining partition.
 The partitions are then rotated several times so that the
algorithm is trained and evaluated on all of the data.
[Fig 4: rotating the folds in cross-validation]
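In scikit-learn, this rotation over folds is automated by cross_val_score; a short sketch on the bundled iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# cv=5: partition into 5 folds; train on 4, evaluate on the held-out fold,
# and rotate so every fold serves once as the evaluation partition.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores)          # one accuracy score per fold
print(scores.mean())   # the averaged estimate of performance
```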
Machine Learning Tasks
 Two of the most common supervised machine
learning tasks are classification and regression.
 In classification tasks the program must learn to
predict discrete values for the response
variables from one or more explanatory
variables.
 That is, the program must predict the most
probable class, or label, for new observations.
 Applications of classification include
o predicting whether a stock's price will rise or fall,
o or deciding if a news article belongs to the
politics or leisure section.
Machine Learning Tasks
 In regression problems, the program must predict
the value of a continuous response variable.
 Examples of regression problems include predicting
the sales for a new product, or the salary for a job
based on its description.
 Similar to classification, regression problems require
supervised learning.
 A common unsupervised learning task is to discover
groups of related observations, called clusters,
within the training data.
Machine Learning Tasks
 Clustering is often used to explore a dataset.
 For example, given a collection of movie reviews, a
clustering algorithm might discover sets of
positive and negative reviews.
 The system will not be able to label the clusters as
"positive" or "negative"; without supervision, it will
only know that the grouped observations are similar to
each other by some measure.
 A common application of clustering is discovering
segments of customers within a market for a
product.
Machine Learning Tasks
 Dimensionality reduction is another common
unsupervised learning task.
 Some problems may contain thousands or even
millions of explanatory variables, which can be
computationally costly to work with.
 Additionally, the program's ability to generalize may
be reduced if
o some of the explanatory variables capture noise or
o are irrelevant to the underlying relationship.
 Dimensionality reduction is the process of
discovering the explanatory variables that
account for the greatest changes in the response
variable.
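One common dimensionality reduction technique (a sketch, not the only method) is principal component analysis (PCA), which keeps the directions of greatest variance in the explanatory variables; a minimal example on the bundled iris data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)          # 150 observations, 4 features

pca = PCA(n_components=2)                  # keep the 2 strongest directions
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)      # (150, 4) -> (150, 2)
print(pca.explained_variance_ratio_)       # variance captured per component
```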
Machine Learning Tasks
 Dimensionality reduction can also be used to
visualize data.
 It is easy to visualize a regression problem such as
predicting the price of a home from its size;
o the size of the home can be plotted on the
graph's x axis, and the price of the home can be
plotted on the y axis.
 Similarly, it is easy to visualize the housing price
regression problem when a second explanatory
variable is added.
Noise
 Noise is any unwanted anomaly in the data that makes
the class more difficult to learn.
 There are several sources of noise:
1. There may be imprecision in recording the input
attributes.
o The data collection instruments may be faulty.
o Human or computer errors may occur at data entry.
o Errors in data transmission can also occur.
2. There may be errors in labeling the data points. This is
sometimes called teacher noise.
3. There may be additional attributes, which were omitted,
that affect the label of an instance.
o Such attributes may be hidden or latent.
Noise
 As can be seen in Fig 5, there is noise (the negative
example inside the rectangle) very near the positive examples.
 One possibility is to keep the prediction model simple (a
rectangle) and allow some error.
 Another possibility is a complex model; with one, we can
make a perfect fit to the data and attain zero error (see the
wiggly shape in the figure).
[Fig 5: a simple rectangle vs a complex wiggly fit to noisy data]
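The trade-off in Fig 5 can be imitated with decision trees on invented one-dimensional data, where one mislabeled point plays the role of noise: a depth-limited tree stays simple and tolerates some training error, while an unconstrained tree fits the noise exactly.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 1, 0, 1, 1, 1]        # the lone 0 at x=5 acts as noise

simple = DecisionTreeClassifier(max_depth=1).fit(X, y)   # the "rectangle"
complex_ = DecisionTreeClassifier().fit(X, y)            # the "wiggly shape"

print(simple.score(X, y))    # < 1.0: the simple model allows some error
print(complex_.score(X, y))  # 1.0: zero training error, likely over-fitted
```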