2. What is Machine Learning?
□ Machine learning is a branch of artificial intelligence (AI) and computer science which
focuses on the use of data and algorithms to imitate the way that humans learn, gradually
improving its accuracy.
□ Machine learning is a subfield of AI, which is broadly defined as the capability of a
machine to imitate intelligent human behavior.
□ Artificial intelligence systems are used to perform complex tasks in a way that is similar to
how humans solve problems.
3. Techniques of Machine Learning
□ Machine Learning techniques are divided mainly into the
following categories:
• Supervised Learning.
• Unsupervised Learning.
• Reinforcement Learning.
4. Supervised Learning
□ Supervised learning is applicable when a machine has sample data,
i.e., input as well as output data with correct labels.
The supervised learning technique helps us predict future events with the help of past
experience and labeled examples. Initially, it analyses the known training dataset,
and later it introduces an inferred function that makes predictions about output
values.
Further, it also detects errors during this entire learning process and corrects
those errors through algorithms.
□ Unsupervised Learning
In unsupervised learning, a machine is trained with input samples only; the
corresponding outputs (labels) are not known. Because the training information is neither
classified nor labeled, a machine may not always provide output as accurate as in
supervised learning.
It helps in exploring the data and can draw inferences from datasets to describe
hidden structures in unlabeled data.
5. Reinforcement Learning
Data scientists typically use reinforcement learning to teach a machine to complete a
multi-step process for which there are clearly defined rules.
(By contrast, semi-supervised learning works by feeding a small amount of labeled
training data to an algorithm. From this, the algorithm learns the dimensions of the
dataset, which it can then apply to new, unlabeled data. The performance of algorithms
typically improves when they train on labeled datasets, but labeling data can be
time-consuming and expensive.)
In reinforcement learning, data scientists program an algorithm to complete a task and
give it positive or negative cues as it works out how to complete the task. But for the
most part, the algorithm decides on its own what steps to take along the way.
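The cue-driven loop described above can be sketched with tabular Q-learning, one standard reinforcement learning algorithm. The tiny 5-state corridor environment, its reward, and all hyperparameters below are illustrative assumptions, not from the text:

```python
import numpy as np

# Q-learning on a tiny 5-state corridor: the agent starts at state 0 and
# earns a reward of +1 only when it reaches the goal state 4.
# Environment, reward, and hyperparameters are illustrative choices.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left, move right

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, len(ACTIONS)))
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(200):
    s = 0
    while s != GOAL:
        # Positive/negative "cues" arrive only as rewards; otherwise the
        # agent decides its own steps (epsilon-greedy exploration).
        a = rng.integers(len(ACTIONS)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if s_next == GOAL else 0.0
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

# After training, the greedy policy should walk right toward the goal.
policy = Q.argmax(axis=1)
print(policy[:GOAL])
```

The agent is never told the step sequence; the reward signal alone shapes the learned policy.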
7. Supervised Learning Algorithms
1. Linear Regression
□ Linear regression is one of the most popular and simple machine learning algorithms, used
for predictive analysis. Here, predictive analysis means predicting a future value, and linear
regression makes predictions for continuous numbers such as salary, age, etc.
□ It models the linear relationship between the dependent and independent variables, showing
how the dependent variable (y) changes according to the independent variable (x).
□ It tries to fit the best line between the dependent and independent variables, and this best-fit
line is known as the regression line.
□ The equation for the regression line is: y = a0 + a1x + ε
□ Here, y = dependent variable, x = independent variable, a0 = intercept of the line, a1 = slope
coefficient, and ε = random error term.
□ Linear regression is further divided into two types:
• Simple Linear Regression: In simple linear regression, a single independent variable is used to
predict the value of the dependent variable.
• Multiple Linear Regression: In multiple linear regression, more than one independent variable
is used to predict the value of the dependent variable.
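As a sketch of the idea, the regression line can be fitted by ordinary least squares. The synthetic data below (true slope 3, intercept 7, small noise) is an illustrative assumption:

```python
import numpy as np

# Simple linear regression fitted by ordinary least squares.
# Synthetic data: y = 3x + 7 plus Gaussian noise, chosen only to
# illustrate recovering the slope and intercept of the regression line.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 7.0 + rng.normal(0, 0.5, size=50)

# Design matrix with a column of ones so the intercept a0 is estimated too.
X = np.column_stack([np.ones_like(x), x])
a0, a1 = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"intercept a0 = {a0:.2f}, slope a1 = {a1:.2f}")  # close to 7 and 3
predicted = a0 + a1 * 5.0  # prediction for a new input x = 5
```

Because the model is linear in its coefficients, the best-fit line has a closed-form least-squares solution; no iterative training is required.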
8. 2. Logistic Regression
□ Logistic regression is a supervised learning algorithm used to predict
categorical variables or discrete values. It can be used for classification
problems in machine learning, and the output of the logistic regression algorithm
can be Yes or No, 0 or 1, Red or Blue, etc.
□ Logistic regression is similar to linear regression except in how it is used:
linear regression is used to solve regression problems and predict
continuous values, whereas logistic regression is used to solve classification
problems and predict discrete values.
□ Instead of fitting a best-fit line, it forms an S-shaped curve that lies between 0
and 1. The S-shaped curve is known as the logistic function, and it is used with the
concept of a threshold: any value above the threshold tends to 1, and any value below
the threshold tends to 0.
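A minimal sketch of the S-curve and threshold idea, training logistic regression with plain gradient descent on the log-loss. The toy 1-D dataset and hyperparameters are illustrative assumptions:

```python
import numpy as np

# Logistic regression on a toy 1-D classification problem, trained with
# plain gradient descent. Data and hyperparameters are illustrative.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])

def sigmoid(z):
    # The S-shaped logistic function: squashes any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

w, b = 0.0, 0.0
for _ in range(2000):
    p = sigmoid(w * x + b)         # predicted probabilities
    grad_w = np.mean((p - y) * x)  # gradient of the log-loss w.r.t. w
    grad_b = np.mean(p - y)
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

# Threshold at 0.5: probabilities above it map to class 1, below to class 0.
preds = (sigmoid(w * x + b) >= 0.5).astype(int)
accuracy = (preds == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Note that the model outputs a probability; the threshold converts it into the discrete Yes/No decision the slide describes.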
9. Unsupervised Learning Algorithms
1. K-Means Clustering (a clustering algorithm)
□ The K-Means clustering algorithm computes centroids and repeats until the optimal centroids
are found. It is also known as the flat clustering algorithm.
□ The number of clusters the method finds in the data is denoted by the letter 'k' in k-means.
□ In this method, data points are assigned to clusters in such a way that the sum of the squared
distances between the data points and their centroids is as small as possible.
□ It is suggested to normalize the data when dealing with clustering algorithms that employ
distance-based measures to identify the similarity between data points.
□ Because of the iterative nature of k-means and the random initialization of centroids, k-means may
become stuck in a local optimum and fail to converge to the global optimum. As a result, it is
advised to employ several distinct centroid initializations.
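The loop above can be sketched with Lloyd's algorithm, including the random restarts the text advises. The two synthetic blobs, the value of k, and all other parameters are illustrative assumptions:

```python
import numpy as np

# A minimal k-means (Lloyd's algorithm) sketch on two well-separated
# synthetic blobs, with several random restarts as recommended above.
# Data, k, and hyperparameters are illustrative choices.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])
k = 2

def lloyd(X, k, rng, n_iter=100):
    # Random initialization: pick k distinct data points as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged (possibly only to a local optimum)
        centroids = new_centroids
    inertia = dists.min(axis=1).sum()  # sum of squared distances
    return centroids, inertia

# Distinct initializations guard against bad local optima: keep the best run.
best_centroids, _ = min((lloyd(X, k, rng) for _ in range(5)), key=lambda r: r[1])
print(np.sort(best_centroids[:, 0]).round(1))
```

Keeping the restart with the lowest inertia (sum of squared distances) is exactly the remedy for local optima that the slide suggests.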
10. Market Basket Analysis
Using the Apriori Algorithm
□ Apriori Principle: if an itemset is frequent, then all of its subsets must also be frequent.
□ The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent
itemsets in a dataset for Boolean association rules. The algorithm is named Apriori because it
uses prior knowledge of frequent-itemset properties. It applies an iterative, level-wise search
in which frequent k-itemsets are used to find (k+1)-itemsets.
□ To improve the efficiency of the level-wise generation of frequent itemsets, an important
property called the Apriori property is used, which reduces the search space.
□ Apriori Property:
All non-empty subsets of a frequent itemset must also be frequent. The key concept of the
Apriori algorithm is the anti-monotonicity of the support measure.
11. Market Basket Analysis
□ Def: Market Basket Analysis (Association Analysis) is a mathematical modeling technique
based upon the theory that if you buy a certain group of items, you are likely to buy another
group of items.
□ It is used to analyze customer purchasing behavior, and it helps increase sales and
maintain inventory by focusing on point-of-sale transaction data.
□ The Apriori algorithm trains on transaction data and identifies product baskets and product
association rules.
□ It is the most established algorithm for frequent-itemset mining.
□ The basic principle of Apriori is: "Any subset of a frequent itemset must be frequent."
□ We use these frequent itemsets to generate association rules.
12. Finding Associations
We study customer buying habits by finding associations and correlations between the different
items that customers place in their "shopping baskets".
Customer 1: Milk, Eggs, Sugar, Bread.
Customer 2: Milk, Eggs, Cereal, Bread.
Customer 3: Eggs, Sugar.
13. For Example: consider
the following dataset; we will find frequent itemsets and
generate association rules for them.
Minimum support count = 2
Minimum confidence = 60%
Step-1: K=1
(I) Create a table containing the support count of each item
present in the dataset, called C1 (the candidate set).
14. (II) Compare the candidate-set items' support counts with the minimum
support count (here min_support = 2; if the support count of a candidate-set
item is less than min_support, remove that item).
□ This gives us itemset L1.
15. □ Table-1
□ Step-2: K=2
(I) Generate candidate set C2 using L1 (this is
called the join step). The condition for joining Lk-1 and
Lk-1 is that they should have (K-2) elements in common.
□ (II) Check whether all subsets of each itemset are frequent or
not, and if not frequent, remove that itemset.
(Example: the subsets of {I1, I2} are {I1} and {I2},
which are frequent. Check this for each itemset.)
□ (III) Now find the support count of these itemsets by
searching in the dataset (Table-1).
□ (IV) Compare the candidate-set (C2, Table-2) support
counts with the minimum support count (here
min_support = 2; if the support count of a candidate-set
item is less than min_support, remove that item).
This gives us itemset L2.
16. Step-3: Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 and Lk-1 is
that they should have (K-2) elements in common. So here, for L2, the first element should match.
The itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5}, {I2, I3, I5}.
□ Check whether all subsets of these itemsets are frequent or not, and if not, remove that
itemset. (Here the subsets of {I1, I2, I3} are {I1, I2}, {I2, I3}, {I1, I3}, which are all frequent.
For {I2, I3, I4}, the subset {I3, I4} is not frequent, so remove it. Similarly, check every itemset.)
□ Find the support count of the remaining itemsets by searching in the dataset.
□ (II) Compare the candidate-set (C3) support counts with the minimum support count (here
min_support = 2; if the support count of a candidate-set item is less than min_support, remove
that item). This gives us itemset L3.
17. Step-4:
□ Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 and
Lk-1 (K=4) is that they should have (K-2) elements in common. So here, for L3, the first
2 elements (items) should match.
□ Check whether all subsets of these itemsets are frequent or not. (Here the itemset
formed by joining L3 is {I1, I2, I3, I5}, whose subsets include {I1, I3, I5}, which is not
frequent.) So there is no itemset in C4.
□ We stop here because no further frequent itemsets are found.
□ Thus, we have discovered all the frequent itemsets. Now the generation of strong
association rules comes into the picture. For that, we need to calculate the confidence of
each rule.
□ Confidence:
□ A confidence of 60% means that 60% of the customers who purchased milk and bread also
bought butter.
□ Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
18. So here, by taking any frequent itemset as an example, we will show the rule generation.
□ Itemset {I1, I2, I3} // from L3
So the rules can be:
[I1^I2] => [I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4 × 100 = 50%
[I1^I3] => [I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4 × 100 = 50%
[I2^I3] => [I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4 × 100 = 50%
[I1] => [I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6 × 100 ≈ 33%
[I2] => [I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7 × 100 ≈ 28%
[I3] => [I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6 × 100 ≈ 33%
So if the minimum confidence is 50%, the first 3 rules can be considered strong
association rules.
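The level-wise search and rule generation above can be sketched end to end. The slides reference tables that are not reproduced here, so the 9-transaction dataset below is an assumption (the classic example from Han & Kamber), chosen because it reproduces every support count used in the walkthrough (e.g. sup(I1)=6, sup(I1^I2)=4, sup(I1^I2^I3)=2):

```python
from itertools import combinations

# Apriori frequent-itemset mining and rule-confidence calculation.
# The transaction table is an assumed dataset consistent with the
# support counts quoted in the worked example above.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUPPORT = 2

def support(itemset):
    # Support count = number of transactions containing the itemset.
    return sum(itemset <= t for t in transactions)

# Level-wise search: L1 from single items, then join Lk-1 with itself.
items = sorted({i for t in transactions for i in t})
L = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
frequent = list(L)
k = 2
while L:
    # Join step: unions of (k-1)-itemsets sharing k-2 items, then prune by
    # the Apriori property (every (k-1)-subset must itself be frequent).
    candidates = {a | b for a in L for b in L if len(a | b) == k}
    candidates = {c for c in candidates
                  if all(frozenset(s) in L for s in combinations(c, k - 1))}
    L = [c for c in candidates if support(c) >= MIN_SUPPORT]
    frequent += L
    k += 1

# Rule generation from {I1, I2, I3}: confidence(A -> B) = sup(A ∪ B) / sup(A).
abc = frozenset({"I1", "I2", "I3"})
conf = 100 * support(abc) / support(frozenset({"I1", "I2"}))
print(f"[I1^I2] => [I3] confidence = {conf:.0f}%")  # 50%, as in the slides
```

The loop terminates exactly as in Step-4: the only C4 candidate {I1, I2, I3, I5} is pruned because its subset {I1, I3, I5} is not frequent.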