Course Title: Introduction to Machine Learning, Chapter One: Introduction

Chapter One
Introduction to Machine Learning
Shumet Tadesse Nigatu
Department of Computer Science
College of Informatics
University of Gondar
February 2024
Shumet Tadesse (Computer Science) Machine Learning-Introduction February 2024

What is Machine Learning?
Definition
Automated discovery of patterns in data by a computer.
This is learning
because the computer is given an initial pattern-recognition model and some
data, and figures out how to make the model better.
This is machine
because the computer learns automatically, without human intervention
(other than the selection of the initial model and data).
Machine learning (ML) is a subset of artificial intelligence (AI) that
focuses on developing algorithms and models that enable
computers to learn and make predictions or decisions without
being explicitly programmed for a specific task.

AI vs. ML vs. DL
Machine Learning is an application of Artificial Intelligence that
enables systems to learn from vast volumes of data and solve specific
problems.
It uses computer algorithms that improve their efficiency
automatically through experience.
Machine Learning is the science of making computers learn and act
like humans by feeding data and information without being explicitly
programmed.

Check your progress
1 What is the primary goal of machine learning?
A To replace human decision-making entirely
B To enable computers to learn from data and improve performance on a
specific task over time
C To create algorithms that require explicit programming for every
scenario
D To develop systems that operate without any external input
2 Which of the following statements is true about Artificial Intelligence
(AI)?
A AI refers specifically to machines that can learn from data.
B AI is synonymous with machine learning.
C AI is a broader concept that includes machines performing tasks that
typically require human intelligence.
D AI is only concerned with rule-based systems.

Motivation: Why Machine Learning?
Our capacity to generate and collect data has increased rapidly over
the last several decades. Huge amounts of data are now at our
fingertips because of:
− The computerization of business and scientific transactions.
− Advances in data collection tools, spanning from scanned texts and
image platforms to satellite remote sensors
− The widespread use of the World Wide Web (WWW) as a global
information system
Although all of these have made it easier to create, collect, and store
all types of data, the result is a data explosion (i.e., information
overload).
Data explosion is the problem of an enterprise holding huge amounts of
data, generated by automated data collection tools, in databases, data
warehouses, and other information repositories.
As the volume of data increases, the proportion of it that people can
understand decreases substantially; the larger the data gets, the
harder it becomes to analyze.

Motivation: Why Machine Learning?
The true value is not in storing the data, but rather in our ability to
extract useful reports and to find interesting trends & correlations to
support decisions and policies made by businesses
− We are drowning in data, but starving for knowledge!
− Too much data & too little knowledge!
To bridge the gap between the large volume of data to be analyzed and
the useful information and knowledge needed for decision making,
computerized methods are becoming vital.
Faced with such enormous volumes of data, human analysts with no
special tools can no longer make sense of it.
− In essence, the motivation behind embracing machine learning lies in its
capacity to harness the power of data, drive automation, and bring
about transformative changes that span industries, making it a pivotal
force in shaping the present and future of technology.
If we know how to reveal valuable knowledge hidden in raw data, data
might be one of our most valuable assets.

Why is machine learning important?
Data in many domains is huge
− Thousands to billions of data samples
− Hundreds to millions of attributes
− Impossible for human analysts to see patterns across so much data
Patterns in many domains are subtle, weak, buried in noise, or involve
complex interactions of attributes
− Often very difficult for human analysts to find
In some domains discovery and use of patterns must happen in real
time, e.g. in streaming data
− Human analysts could never keep up
Question: What problem does machine learning aim to address
in handling large and complex datasets?
A Data storage issues
B Data visualization challenges
C Information overload and the difficulty of manual analysis
D Lack of data availability

History and relationships to other fields
Machine Learning has gone through many phases of development
since the inception of computers.
Notable events in the history of machine learning

History and relationships to other fields
Machine learning is closely related to many fields, including
artificial intelligence, data mining, statistics, and data science.
In fact, machine learning is a multi-disciplinary field that is linked
in some way to all of these fields.

Check your progress
1 In which decade did the term “machine learning” first emerge?
A 1950s
B 1960s
C 1970s
D 1980s
2 What is the significance of the “Turing Test” in the history of
machine learning?
A It marked the invention of the first computer.
B It introduced the concept of artificial neural networks.
C It proposed a test for determining a machine’s ability to exhibit
intelligent behavior.
D It defined the first machine learning algorithm.
3 Which of the following fields of study plays a relatively lesser role in
machine learning?
A Mathematics
B Statistics
C Computer Science
D Linguistics

Essential math and statistics for machine
learning
Mathematics and statistics are fundamental concepts in machine
learning
− Mathematics is essentially a numerical method of expressing ideas
− Statistics is a more abstract form of communicating ideas

Linear Algebra- Basic Concepts
Machine Learning experts cannot live without Linear Algebra:
− ML makes heavy use of Scalars
− ML makes heavy use of Vectors
− ML makes heavy use of Matrices
− ML makes heavy use of Tensors
Scalars
A scalar is a real number and an element of a field used to define a
vector space
− In the context of linear algebra, scalars are considered 0-dimensional
tensors.
− In machine learning, scalars can represent constants such as learning
rates in optimization algorithms.
Vectors
For a positive integer n, a vector is an n-tuple: an ordered array of n
numbers, called elements or scalars
Two types: row vector, column vector

Linear Algebra - Basic Concepts
Matrices
A matrix is a group of vectors that all have the same dimension
Tensors
A tensor is a multidimensional array at the most fundamental level
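
The four objects above can be sketched in plain Python, with nested lists standing in for arrays. This is only an illustration: the values and the helper `shape` are assumptions made for this example, not part of the course material.

```python
# Nested lists stand in for arrays; all values are illustrative.
learning_rate = 0.01                     # scalar: a single number (0-D)
row_vector = [1.0, 2.0, 3.0]             # vector: ordered list of n numbers (1-D)
column_vector = [[1.0], [2.0], [3.0]]    # the same numbers laid out as a 3 x 1 column
matrix = [[1, 2, 3],
          [4, 5, 6]]                     # matrix: 2 rows x 3 columns (2-D)
tensor = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]              # tensor: a 2 x 2 x 2 array (3-D)


def shape(x):
    """Return the size along each dimension of a rectangular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


for name, value in [("scalar", learning_rate),
                    ("row vector", row_vector),
                    ("column vector", column_vector),
                    ("matrix", matrix),
                    ("tensor", tensor)]:
    print(name, shape(value))
```

The number of entries in each shape is the number of dimensions: zero for a scalar, one for a vector, two for a matrix, three or more for a higher-order tensor.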

Check your progress
1 What is a scalar in the context of linear algebra in machine learning?
A A one-dimensional array of numbers
B A single numerical value
C A matrix with only one row
D A matrix with only one column
2 How is a matrix represented in linear algebra for machine learning?
A One-dimensional array of numbers
B A single numerical value
C A two-dimensional array of numbers
D A tensor with arbitrary dimensions

Linear Algebra - Operations
Dot product
The dot product takes two vectors of the same length and returns
a single number (scalar)
It is an algebraic operation that takes two equal-length sequences of
numbers (usually coordinate vectors), and returns a single number.
− Example: compute the dot product of a = [1, 2, 3] and b = [4, -5, 6]
The dot product grows with the magnitudes of the two vectors and with
how closely they point in the same direction
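
A minimal sketch of the dot product in plain Python, applied to the vectors a and b from the example above. The function name `dot` is chosen for this illustration.

```python
def dot(u, v):
    """Dot product of two equal-length sequences of numbers."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Multiply corresponding elements, then add up the products.
    return sum(ui * vi for ui, vi in zip(u, v))


a = [1, 2, 3]
b = [4, -5, 6]
print(dot(a, b))  # 1*4 + 2*(-5) + 3*6 = 12
```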
Exercise: Find the dot product (·) of vectors X and Y.
$$X = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}, \qquad Y = \begin{bmatrix} -4 \\ 0 \\ 2 \end{bmatrix}$$

Cosine Similarity
Cosine similarity is the dot product of two normalized vectors
The denominator is the product of the lengths (norms) of the two vectors
It equals the cosine of the angle between the two vectors
Formula:
$$\operatorname{Cosine}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
− Dot Product:
$$A \cdot B = \sum_{i=1}^{n} A_i \cdot B_i$$
− Norm (Magnitude) of a Vector:
$$\|A\| = \sqrt{\sum_{i=1}^{n} A_i^2}$$
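
The cosine similarity formula can be sketched directly from its two ingredients, the dot product and the norm. The function names and the test vectors below are assumptions made for this illustration; they are not the exercise vectors.

```python
import math


def dot(u, v):
    """Sum of the products of corresponding elements."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    """Euclidean length: square root of the sum of squared elements."""
    return math.sqrt(dot(u, u))


def cosine_similarity(u, v):
    """Dot product divided by the product of the two norms."""
    denominator = norm(u) * norm(v)
    if denominator == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(u, v) / denominator


print(cosine_similarity([1, 0], [0, 1]))            # perpendicular vectors: 0.0
print(cosine_similarity([1, 0], [-1, 0]))           # opposite directions: -1.0
print(round(cosine_similarity([1, 2, 3], [2, 4, 6]), 6))  # same direction: 1.0
```

The result depends only on direction: scaling either vector by a positive constant leaves the value unchanged.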

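1 Exercise 1: Consider two vectors:
$$P = \begin{bmatrix} 2 & -3 & 1 \end{bmatrix}, \qquad Q = \begin{bmatrix} 4 & 1 & -2 \end{bmatrix}$$
Calculate the cosine similarity between vectors P and Q.
2 Exercise 2: Consider two vectors:
$$U = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \qquad V = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}$$
Calculate the cosine similarity between vectors U and V.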

Element-wise product
The element-wise (Hadamard) product takes two vectors of the same
length and produces a vector of the same length in which each element
is the product of the corresponding elements of the two source vectors
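
A minimal sketch of the element-wise product in plain Python. The function name and the two vectors are chosen for this illustration.

```python
def elementwise_product(u, v):
    """Multiply corresponding elements; the result has the same length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [ui * vi for ui, vi in zip(u, v)]


a = [1, 2, 3]
b = [4, -5, 6]
c = elementwise_product(a, b)
print(c)       # [4, -10, 18]
print(sum(c))  # 12: adding up the element-wise products gives the dot product
```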

Exercise Question: Consider two vectors:
$$A = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}, \qquad B = \begin{bmatrix} 4 \\ 2 \\ -1 \end{bmatrix}$$
Calculate the element-wise product of vectors A and B.
Solution Steps:
1 Present the given vectors A and B.
2 Recall that the element-wise product of two vectors A and B is
calculated by multiplying corresponding elements:
$$C = A \odot B = \begin{bmatrix} A_1 \cdot B_1 \\ A_2 \cdot B_2 \\ A_3 \cdot B_3 \end{bmatrix}$$
3 Substitute the values and perform the calculations.

Matrix multiplication
Matrix multiplication plays an important role in machine learning
Matrix multiplication is a binary operation that produces a matrix
from two matrices.
For matrix multiplication, the number of columns in the first matrix
must be equal to the number of rows in the second matrix.
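
The shape rule and the row-by-column computation can be sketched in plain Python with lists of rows. The function name `matmul` and the matrices M and N are assumptions made for this illustration.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    n_inner = len(B)        # rows of B
    n_cols = len(B[0])      # columns of B
    # Columns of A must equal rows of B, otherwise the product is undefined.
    if any(len(row) != n_inner for row in A):
        raise ValueError("columns of A must equal rows of B")
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n_inner))
             for j in range(n_cols)]
            for i in range(len(A))]


M = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
N = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 x 2
print(matmul(M, N))      # 2 x 2 result: [[58, 64], [139, 154]]
```

Swapping the order gives a 3 x 3 result instead, which shows that matrix multiplication is not commutative in general.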

Exercise Question: Consider the following matrices:
$$A = \begin{bmatrix} 3 & -1 \\ 2 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 5 & 1 \\ -2 & 0 \end{bmatrix}$$

Calculate the matrix product C = A · B.
Solution Steps:
1 Present the given matrices A and B.
2 Recall the formula for matrix multiplication:
$$C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj}$$
where $C_{ij}$ is the element in the i-th row and j-th column of matrix C.
3 Perform the calculations for each element in matrix C.

Matrix Transpose
The transpose of a matrix is found by interchanging its rows and
columns.
The transpose of a matrix is denoted by a superscript “T” on the
given matrix.
Let A be an m × n matrix, and let $B = A^T$. Then B is an n × m
matrix, and $B_{ij} = A_{ji}$.
Example:
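
A small illustration of the transpose in plain Python; the matrix values and the function name are assumptions made for this sketch.

```python
def transpose(A):
    """Turn the rows of A into columns (A is a list of equal-length rows)."""
    return [list(column) for column in zip(*A)]


A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = transpose(A)         # 3 x 2
print(B)                 # [[1, 4], [2, 5], [3, 6]]

# Check the definition B[i][j] == A[j][i] for every entry.
print(all(B[i][j] == A[j][i] for i in range(3) for j in range(2)))  # True
```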

Exercise Question:
Consider the following matrix:
$$D = \begin{bmatrix} 1 & 3 & -2 \\ 0 & 5 & 1 \\ -4 & 2 & 7 \end{bmatrix}$$
Calculate the transpose of matrix D, denoted as $D^T$.
Solution Steps:
1 Present the given matrix D.
2 Recall the definition of the transpose of a matrix:
$$(D^T)_{ij} = D_{ji}$$
where $(D^T)_{ij}$ is the element in the i-th row and j-th column of the
transpose matrix $D^T$.
3 Perform the calculations to find each element in the transpose matrix.

Statistics and Probability
Statistics are tools to get answers to questions about data:
− What is Common?
− What is Expected?
− What is Normal?
− What is the Probability?
Descriptive Statistics
Summarizes (describes) observations from a set of data
Descriptive statistics are broken down into different measures:
− Tendency (Measures of the Center)
The Mean (the average value)
The Median (the mid point value)
The Mode (the most common value)
Outliers (values “outside” the other values)
− Spread (Measures of Variability)
Min and Max
Standard Deviation
Variance
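
These measures can be computed with Python's standard `statistics` module. The dataset below is made up for illustration. Note the assumption about which variance is meant: `pvariance` and `pstdev` divide by n (population), while `variance` and `stdev` divide by n − 1 (sample).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only

# Measures of the center
print(statistics.mean(data))      # average value: 5
print(statistics.median(data))    # midpoint of the sorted data: 4.5
print(statistics.mode(data))      # most common value: 4

# Measures of spread
print(min(data), max(data))       # 2 9
print(statistics.pvariance(data)) # population variance: 4
print(statistics.pstdev(data))    # population standard deviation: 2.0
print(statistics.variance(data))  # sample variance: 32/7, about 4.57
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data.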

Check your progress
1 What is the correct interpretation of the median in a dataset?
A The most frequently occurring value
B The average of all values
C The middle value when the data is sorted in ascending order
D The difference between the highest and lowest values
2 Which of the following statements about variance and standard
deviation is true?
A Variance is the square of the standard deviation
B Standard deviation is the square root of the variance
C Variance and standard deviation are the same
D Variance and standard deviation measure completely different things
Exercise
1 Consider the following dataset: 12,15,18,20,25,25,30,32,35,40
Calculate the mean, median and mode of the dataset.
2 Given a dataset: 5,8,12,15,18
Calculate the variance and standard deviation of the dataset.

Probability is the bedrock of ML; it tells how likely an event is to
occur. The value of a probability always lies between 0 and 1.
It is a core concept and a primary prerequisite for understanding
ML models and their applications.
When all outcomes are equally likely, the probability of an event is
the number of outcomes in which the event occurs divided by the total
number of possible outcomes.
Types of Probability
To understand probability better, it can be further categorized into
different types as follows:
Empirical Probability: calculated as the number of times the event
occurs divided by the total number of trials observed.
Joint Probability: the probability of two random events occurring
simultaneously. When A and B are independent, P(A ∩ B) = P(A) · P(B).

Conditional Probability: the probability of event A given that event B
has occurred.
The probability of an event A conditioned on an event B is denoted
and defined as:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Similarly, $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$.
We can write the joint probability of A and B as
P(A ∩ B) = P(A) · P(B|A), which means: “The chance of both
things happening is the chance that the first one happens, multiplied
by the chance that the second one happens given that the first has
happened.”
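
The definitions above can be checked on a small example. The sketch below assumes a fair six-sided die with event A = "the roll is even" and event B = "the roll is greater than 3"; these events and the helper `prob` are chosen for illustration only.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                 # fair die: equally likely outcomes
A = {x for x in sample_space if x % 2 == 0}       # even roll: {2, 4, 6}
B = {x for x in sample_space if x > 3}            # roll above 3: {4, 5, 6}


def prob(event):
    """Favourable outcomes divided by all possible outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(A & B)            # joint probability P(A ∩ B)
p_a_given_b = p_a_and_b / p_b      # P(A | B) = P(A ∩ B) / P(B)
p_b_given_a = p_a_and_b / p_a      # P(B | A) = P(A ∩ B) / P(A)

print(p_a_and_b)                   # 1/3
print(p_a_given_b, p_b_given_a)    # 2/3 2/3
print(p_a * p_b_given_a == p_a_and_b)  # True: P(A ∩ B) = P(A) · P(B | A)
print(p_a * p_b == p_a_and_b)          # False: A and B are not independent
```

The last line shows why P(A ∩ B) = P(A) · P(B) cannot be used in general: it holds only when knowing B tells us nothing about A.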

Statistics and Probability - Exercise
Suppose you have the following dataset:
Day   Weather (Sunny)   Wind (Windy)
1     Yes               Yes
2     No                No
3     Yes               Yes
4     No                No
5     No                Yes
6     Yes               No
7     No                Yes
8     No                No
9     Yes               Yes
10    No                No
Compute the following:
Basic Probability: Probability of a day being sunny and
Probability of a day being windy
Conditional Probability:
− Given that it’s sunny, the probability that it’s also windy
− Given that it’s not windy, the probability that it’s sunny

Statistics and Probability - Distributions
Machine learning algorithms rely on probability distributions to model
real-world data and make predictions.
The distribution of the training data is important for understanding
how to vectorize the data for modeling
Types of Probability Distributions
Discrete Probability Distribution:
− A probability distribution is discrete if it involves countable outcomes.
− Example: Binomial Distribution - The binomial distribution models
the number of successes in a fixed number of independent and
identically distributed Bernoulli trials.
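
As a sketch, the binomial probability of exactly k successes in n trials, each with success probability p, is C(n, k) · p^k · (1 − p)^(n − k). The function name and the coin-flip parameters below are assumptions made for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)


n, p = 10, 0.5                       # e.g. 10 flips of a fair coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(5, n, p))         # exactly 5 heads: 0.24609375
print(sum(pmf))                      # probabilities over k = 0..10 sum to 1
```

Because the outcomes k = 0, 1, ..., n are countable, each one has its own probability, which is what makes the distribution discrete.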

Types of Probability Distributions
Continuous Probability Distribution
− A probability distribution is continuous if it involves uncountably
infinite outcomes
− Example: Normal Distribution- The normal distribution is
characterized by its bell-shaped curve and is widely used in statistical
modeling
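
Python's standard library includes `statistics.NormalDist`, which can be used to explore the bell curve. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1) purely for illustration.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)

# The density is highest at the mean and symmetric around it.
print(round(standard.pdf(0.0), 4))                        # 0.3989
print(standard.pdf(1.0) == standard.pdf(-1.0))            # True

# For a continuous distribution, probabilities belong to intervals,
# computed as differences of the cumulative distribution function.
within_one_sigma = standard.cdf(1.0) - standard.cdf(-1.0)
print(round(within_one_sigma, 4))                         # 0.6827
```

Unlike the discrete case, the probability of any single exact value is zero; only ranges of values carry probability.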

Check your progress
1 What is the probability of rolling an even number on a fair six-sided
die?
A 1/2
B 1/3
C 1/6
D 1/4
2 If P(A) = 0.6 and P(B|A) = 0.3, what is P(A ∩ B)?
A 0.18
B 0.12
C 0.3
D 0.6
3 In a normal distribution, what percentage of data falls within one
standard deviation from the mean?
A 68%
B 95%
C 99%
D 50%

Applications of machine learning
Machine learning is a dynamic and evolving field with a wide range of
applications that continue to shape industries and improve efficiency.
We use machine learning in our daily lives, often without knowing it,
through products such as Google Maps, Google Assistant, and Alexa.
Some of the most trending real-world applications of Machine
Learning are depicted in the following figure

Check your progress
Match the correct concept from column B for the application areas listed
in Column A.
Column A
1 Marketing and Advertising
2 Automotive
3 Personalized User Experiences
4 Healthcare
5 Finance
Column B
A Diagnosing diseases based on medical images and patient data.
B Analyzing financial data for credit scoring and risk assessment.
C Developing autonomous navigation systems for vehicles.
D Targeting advertisements based on user preferences and behavior.
E Analyzing user behavior to provide personalized content and
recommendations.

Types of machine learning techniques
Machine learning can be broadly categorized into three main types
based on the learning styles and approaches:
1 Supervised Learning: In supervised learning, the algorithm is trained
on a labeled dataset, where each input is associated with a
corresponding output.
Use Case: Classification and Regression. For example, predicting
whether an email is spam (classification) or predicting house prices
based on features (regression).
2 Unsupervised Learning: Unsupervised learning involves training the
algorithm on an unlabeled dataset, and the system tries to learn the
patterns and structure from the data without explicit guidance.
Use Case: Clustering and Association. For example, grouping
customers based on purchasing behavior (clustering) or identifying
frequent itemsets in a transaction database (association).
3 Reinforcement Learning: Reinforcement learning involves training an
agent to make sequences of decisions in an environment, with the goal
of maximizing a cumulative reward signal.
Use Case: Game playing, robotics, and autonomous systems. For
example, training a robot to perform specific tasks or optimizing
strategies in games.

Exercises
1 What is the primary characteristic of supervised learning?
A It involves learning from labeled data.
B It does not require any training.
C It focuses on discovering hidden patterns.
D It operates without any external guidance.
2 Which of the following is an example of an unsupervised learning
task?
A Image classification
B Spam email detection
C Customer segmentation
D Predicting stock prices
3 What is the key characteristic of reinforcement learning?
A Learning from labeled data
B Learning without any guidance
C Learning from trial and error
D Learning from pre-defined rules
