Market Basket Analysis using Apriori Algorithm
-Machine learning
G.SRIHARI
L21IT133
What is Machine Learning?
□ Machine learning is a branch of artificial intelligence (AI) and computer science that focuses on
using data and algorithms to imitate the way humans learn, gradually improving in accuracy.
□ Machine learning is a subfield of AI, which is broadly defined as the capability of a
machine to imitate intelligent human behavior.
□ Artificial intelligence systems are used to perform complex tasks in a way that is similar to
how humans solve problems.
Techniques of Machine Learning
□ Machine Learning techniques are divided mainly into the
following categories:
• Supervised Learning.
• Unsupervised Learning.
• Reinforcement Learning.
□ Supervised Learning
Supervised learning applies when a machine has sample data,
i.e., inputs together with correctly labeled outputs.
It helps us predict future events from past experience and labeled examples.
It first analyses the known training dataset and then produces an inferred function
that makes predictions about output values.
During training it also compares its predictions with the correct labels and
adjusts the model to reduce the errors.
□ Unsupervised Learning
In unsupervised learning, a machine is trained with input samples only;
no labels or expected outputs are provided. Because the training information is neither
classified nor labeled, there is no known correct output to check the results against,
as there is in supervised learning.
It helps in exploring the data and can draw inferences from datasets to describe
hidden structures in unlabeled data.
Reinforcement learning
Data scientists typically use reinforcement learning to teach a machine to complete a
multi-step process for which there are clearly defined rules.
Data scientists program an algorithm to complete a task and give it positive or negative
cues as it works out how to complete it. But for the most part, the algorithm decides
on its own what steps to take along the way.
Semi-supervised learning
Semi-supervised learning works by data scientists feeding a small amount of labeled
training data to an algorithm. From this, the algorithm learns the dimensions of the data
set, which it can then apply to new, unlabeled data.
The performance of algorithms typically improves when they train on labeled data sets,
but labeling data can be time-consuming and expensive.
Machine Learning Algorithms
Supervised Learning Algorithms
1. Linear Regression
□ Linear regression is one of the most popular and simplest machine learning algorithms used
for predictive analysis. It makes predictions for continuous numeric values such as salary or age.
□ It models a linear relationship between the dependent and independent variables, showing
how the dependent variable (y) changes as the independent variable (x) changes.
□ It fits the best straight line through the data points; this best-fit line is
known as the regression line.
□ The equation for the regression line is: y = a0 + a1*x + ε
□ Here, y = dependent variable, x = independent variable, a0 = intercept of the line,
a1 = slope (regression coefficient) and ε = random error.
□ Linear regression is further divided into two types:
• Simple Linear Regression: a single independent variable is used to
predict the value of the dependent variable.
• Multiple Linear Regression: more than one independent variable
is used to predict the value of the dependent variable.
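As a minimal sketch of simple linear regression, the least-squares intercept and slope can be computed directly from the data. The data points and the names `fit_line`, `a0` (intercept) and `a1` (slope) are assumptions made for this illustration, not part of the slides.

```python
# Illustrative data (assumed): years of experience vs. salary in thousands.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [30.0, 35.0, 40.0, 45.0, 50.0]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

a0, a1 = fit_line(xs, ys)
predicted = a0 + a1 * 6.0  # prediction for a new input x = 6
```

Because the assumed points lie exactly on a line, the fit recovers that line; with noisy data the same formulas give the line that minimizes the sum of squared errors.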
2. Logistic Regression
□ Logistic regression is a supervised learning algorithm used to predict
categorical variables or discrete values. It is used for classification
problems in machine learning, and its output is one of a fixed set of classes,
such as Yes or No, 0 or 1, Red or Blue, etc.
□ Logistic regression resembles linear regression but is used differently:
linear regression solves regression problems and predicts
continuous values, whereas logistic regression solves classification
problems and predicts discrete values.
□ Instead of fitting a best-fit line, it fits an S-shaped curve that lies between 0
and 1. This curve is the logistic (sigmoid) function, and its output is read as a
probability. A threshold turns the probability into a class: values above the
threshold are assigned to class 1, and values below it to class 0.
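The S-shaped curve and the threshold rule can be sketched in a few lines. The function names and the default threshold of 0.5 are assumptions chosen for illustration; the slides do not fix a threshold value.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Return class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

scores = (-4.0, 0.0, 4.0)
probabilities = [sigmoid(z) for z in scores]
labels = [classify(z) for z in scores]
```

In a fitted model, `z` would be a linear combination of the input features; here it is passed in directly to keep the sketch short.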
2. Unsupervised Learning
A Clustering Algorithm
K-Means Clustering-
□ The K-Means clustering algorithm computes centroids and repeats until the centroids stop changing.
It is also known as a flat clustering algorithm.
□ The number of clusters the method looks for in the data is denoted by the letter ‘k’ in k-means
and is chosen before the algorithm runs.
□ In this method, data points are assigned to clusters in such a way that the sum of the squared
distances between the data points and their cluster centroid is as small as possible.
□ It is advisable to normalize the data, because clustering algorithms employ
distance-based measurements to identify the similarity between data points.
□ Because of the iterative nature of k-means and the random initialization of centroids, k-means may
become stuck in a local optimum and fail to converge to the global optimum. As a result, it is
advised to run it with several different centroid initializations.
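The assign-then-update loop can be sketched on one-dimensional data. This is a simplified illustration: the function name `kmeans_1d`, the sample points and the fixed starting centroids are all assumptions, and a real run would use random initializations and multi-dimensional distances.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd's algorithm on 1-D points, starting from the given centroids."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 2.0])
```

Starting both centroids inside the left group still separates the two groups here, but on less clean data a poor start can end in a local optimum, which is why several initializations are usually tried.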
Market Basket Analysis
Using Apriori Algorithm
□ Apriori Principle - If an itemset is frequent, then all of its subsets must also be frequent.
□ The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in
a dataset for Boolean association rules. The algorithm is named Apriori because it uses prior
knowledge of frequent itemset properties. It applies an iterative, level-wise search
in which frequent k-itemsets are used to find (k+1)-itemsets.
□ To improve the efficiency of level-wise generation of frequent itemsets, an important property
called the Apriori property is used to reduce the search space.
□ Apriori Property -
All non-empty subsets of a frequent itemset must be frequent. The key concept behind the Apriori algorithm
is the anti-monotonicity of the support measure: an itemset's support can never exceed the support of any of its subsets.
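The anti-monotone behaviour of support can be checked on a small example. The sketch below uses the three customer baskets from the Finding Associations slide; the lower-cased item names and the helper `support_count` are choices made for this illustration.

```python
from itertools import combinations

# The three customer baskets from the Finding Associations slide.
transactions = [
    {"milk", "eggs", "sugar", "bread"},
    {"milk", "eggs", "cereal", "bread"},
    {"eggs", "sugar"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

itemset = ("bread", "eggs", "milk")
full_support = support_count(itemset, transactions)

# Support of every non-empty proper subset of the itemset.
subset_supports = {
    subset: support_count(subset, transactions)
    for r in range(1, len(itemset))
    for subset in combinations(itemset, r)
}

# Anti-monotonicity: no subset is less frequent than the full itemset.
holds = all(s >= full_support for s in subset_supports.values())
```

This is what allows pruning: if any subset falls below the minimum support, the larger itemset cannot reach it either and never needs to be counted.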
Market Basket Analysis-
□ Def: Market Basket Analysis (Association Analysis) is a mathematical modeling technique
based upon the theory that if you buy a certain group of items, you are likely to buy another
group of items.
□ It is used to analyze customer purchasing behavior and helps to increase sales and
maintain inventory by focusing on point-of-sale transaction data.
□ The Apriori algorithm identifies frequently purchased product baskets and product association rules.
□ It is one of the most established algorithms for frequent itemset mining.
□ The basic principle of Apriori is “Any subset of a frequent itemset must be frequent”.
□ We use these frequent itemsets to generate association rules.
Finding Associations-
Market basket analysis studies customer buying habits by finding associations and correlations
between the different items that customers place in their “shopping basket”.
Customer1- Milk, Eggs, Sugar, Bread.
Customer2- Milk, Eggs, Cereal, Bread.
Customer3- Eggs, Sugar.
For Example- Consider
the following dataset; we will find its frequent itemsets and
generate association rules from them.
Minimum support count is 2.
Minimum confidence is 60%.
Step-1: K=1
I) Create a table containing the support count of each item
present in the dataset. This table is called C1 (the candidate set).
II) Compare each candidate item's support count with the minimum
support count (here min_support=2). If the support_count of a candidate
item is less than min_support, remove that item.
□ This gives us itemset L1.
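Step 1 can be sketched as follows, applied to the three customer baskets from the Finding Associations slide with the same minimum support count of 2. The variable names `c1` and `l1` mirror the slide's C1 and L1; everything else is an assumption for illustration.

```python
from collections import Counter

# The three customer baskets from the Finding Associations slide.
transactions = [
    {"milk", "eggs", "sugar", "bread"},
    {"milk", "eggs", "cereal", "bread"},
    {"eggs", "sugar"},
]
MIN_SUPPORT = 2

# C1: support count of every individual item.
c1 = Counter(item for basket in transactions for item in basket)

# L1: keep only the items whose count reaches the minimum support.
l1 = {item: count for item, count in c1.items() if count >= MIN_SUPPORT}
```

On these baskets, cereal appears only once and is dropped, so it can be ignored in all later levels.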
□ Table-1
□ Step-2: K=2
I) Generate candidate set C2 using L1 (this is
called the join step). The condition for joining Lk-1 with
Lk-1 is that the two itemsets have (K-2) elements in
common.
□ II) Check whether all subsets of each itemset are frequent,
and remove any itemset that has an infrequent subset.
(Example: the subsets of {I1, I2} are {I1} and {I2};
both are frequent. Check this for each itemset.)
□ III) Find the support count of these itemsets by
scanning the dataset (Table 1).
□ IV) Compare the support count of each candidate in C2 (Table 2)
with the minimum support count (here
min_support=2). If the support_count of a candidate
itemset is less than min_support, remove that
itemset. This gives us itemset L2.
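For K=2 the join condition requires zero shared elements, so every pair of frequent items is a candidate. A minimal sketch on the three customer baskets from the Finding Associations slide (the names `c2`, `c2_counts` and `l2` are chosen to mirror the slide's notation):

```python
from itertools import combinations

# The three customer baskets from the Finding Associations slide.
transactions = [
    {"milk", "eggs", "sugar", "bread"},
    {"milk", "eggs", "cereal", "bread"},
    {"eggs", "sugar"},
]
MIN_SUPPORT = 2
l1 = ["bread", "eggs", "milk", "sugar"]  # frequent single items, sorted

# Join step for K=2: every pair of frequent items is a candidate.
c2 = list(combinations(l1, 2))

# Count each candidate's support by scanning the transactions.
c2_counts = {pair: sum(1 for t in transactions if set(pair) <= t) for pair in c2}

# Keep the pairs that reach the minimum support count.
l2 = sorted(pair for pair, count in c2_counts.items() if count >= MIN_SUPPORT)
```

The subset check is trivially satisfied at this level, because every single item in a candidate pair already comes from L1.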
Step-3: K=3
□ I) Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that
the two itemsets have (K-2) elements in common. So here, for L2, the first element should match.
The itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5}, {I2, I3, I5}.
□ II) Check whether all subsets of these itemsets are frequent, and remove any itemset that has an
infrequent subset. (Here the 2-item subsets of {I1, I2, I3} are {I1, I2}, {I2, I3}, {I1, I3}, which are all frequent. For {I2, I3,
I4}, the subset {I3, I4} is not frequent, so remove it. Check every itemset in the same way.)
□ III) Find the support count of the remaining itemsets by scanning the dataset.
□ IV) Compare the support count of each candidate in C3 with the minimum support count (here min_support=2).
If the support_count of a candidate itemset is less than min_support, remove that itemset.
This gives us itemset L3.
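The join and prune steps can be sketched as one helper that works for any level. The function name `generate_candidates` is hypothetical, and the input is the L2 obtained from the three customer baskets of the Finding Associations slide, with itemsets stored as sorted tuples.

```python
from itertools import combinations

def generate_candidates(prev_frequent):
    """Build size-k candidates from sorted frequent (k-1)-itemsets (join, then prune)."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Join: the two itemsets must agree on all but their last item.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Prune: every (k-1)-subset of the candidate must itself be frequent.
            if all(s in prev_set for s in combinations(candidate, len(a))):
                candidates.append(candidate)
    return candidates

l2 = [("bread", "eggs"), ("bread", "milk"), ("eggs", "milk"), ("eggs", "sugar")]
c3 = generate_candidates(l2)
```

Here the join produces two triples, and the prune step discards the one containing milk and sugar together, since that pair is not in L2.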
Step-4:
□ Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with
Lk-1 (K=4) is that the two itemsets have (K-2) elements in common. So here, for L3, the first
2 elements (items) should match.
□ Check whether all subsets of these itemsets are frequent. (Here the itemset formed by
joining L3 is {I1, I2, I3, I5}; its subsets include {I1, I3, I5}, which is not frequent.)
So C4 contains no itemset.
□ We stop here because no further frequent itemsets are found.
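Putting the levels together, the whole search is a loop that ends as soon as a level yields no frequent itemsets. The following is a compact sketch under that reading, not a tuned implementation; the function name `apriori` and the use of the three customer baskets from the Finding Associations slide are assumptions for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    def count(itemset):
        return sum(1 for t in transactions if set(itemset) <= t)

    items = sorted({item for t in transactions for item in t})
    counted = {(i,): count((i,)) for i in items}
    level = {s: n for s, n in counted.items() if n >= min_support}
    frequent = {}
    while level:  # stop as soon as a level yields no frequent itemsets
        frequent.update(level)
        prev = sorted(level)
        # Join: combine itemsets that agree on all but their last item.
        candidates = [
            a + (b[-1],)
            for i, a in enumerate(prev)
            for b in prev[i + 1:]
            if a[:-1] == b[:-1]
        ]
        # Prune candidates that have an infrequent subset, then count the rest.
        candidates = [
            c for c in candidates
            if all(s in level for s in combinations(c, len(c) - 1))
        ]
        counted = {c: count(c) for c in candidates}
        level = {c: n for c, n in counted.items() if n >= min_support}
    return frequent

transactions = [
    {"milk", "eggs", "sugar", "bread"},
    {"milk", "eggs", "cereal", "bread"},
    {"eggs", "sugar"},
]
frequent = apriori(transactions, min_support=2)
```

On these baskets the loop stops after the third level, because a single frequent triple cannot be joined with anything to form a 4-item candidate.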
□ Thus, we have discovered all the frequent itemsets. Now the generation of strong
association rules comes into the picture. For that we need to calculate the confidence of
each rule.
□ Confidence-
□ A confidence of 60% means that 60% of the customers who purchased milk and bread also
bought butter.
□ Confidence(A->B) = Support_count(A ∪ B) / Support_count(A)
So here, taking one frequent itemset as an example, we show the rule generation.
□ Itemset {I1, I2, I3} //from L3
So the rules can be
[I1^I2]=>[I3] //confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100 = 50%
[I1^I3]=>[I2] //confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100 = 50%
[I2^I3]=>[I1] //confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100 = 50%
[I1]=>[I2^I3] //confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100 = 33%
[I2]=>[I1^I3] //confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100 = 29%
[I3]=>[I1^I2] //confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100 = 33%
So if the minimum confidence is 50%, the first 3 rules can be considered strong
association rules. At the 60% minimum confidence stated at the start of the example,
none of these six rules would qualify.
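Rule generation from one frequent itemset can be sketched as follows: every non-empty proper subset becomes an antecedent, and the rule is kept if its confidence reaches the threshold. The helper names are hypothetical, and the data are the three customer baskets from the Finding Associations slide rather than the I1 to I5 example.

```python
from itertools import combinations

# The three customer baskets from the Finding Associations slide.
transactions = [
    {"milk", "eggs", "sugar", "bread"},
    {"milk", "eggs", "cereal", "bread"},
    {"eggs", "sugar"},
]

def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rules_from(itemset, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting the threshold."""
    rules = []
    total = support_count(itemset)
    for r in range(1, len(itemset)):
        for antecedent in combinations(itemset, r):
            consequent = tuple(i for i in itemset if i not in antecedent)
            # Confidence(A -> B) = Support_count(A U B) / Support_count(A)
            confidence = total / support_count(antecedent)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules

strong = rules_from(("bread", "eggs", "milk"), min_confidence=0.6)
stricter = rules_from(("bread", "eggs", "milk"), min_confidence=0.7)
```

Raising the threshold from 60% to 70% drops only the rule whose antecedent is eggs alone, since eggs also appear in a basket without bread or milk.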
Thank you

machine learning

  • 1. Market Basket Analysis using Apriori Algorithm -Machine learning G.SRIHARI L21IT133
  • 2. What is Machine Learning? □ Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy. □ Machine learning is a subfield of AI, which is probably defined as the capability of a machine to imitate intelligent human behavior. □ Artificial intelligence systems are used to perform complex tasks in a way that is similar to how humans solve problems.
  • 3. Techniques of Machine Learning □ Machine Learning techniques are divided mainly into the following categories: • Supervised Learning. • Unsupervised Learning. • Reinforcement Learning.
  • 4. □ Supervised Learning Supervised learning is applicable when a machine has sample data, i.e., input as well as output data with correct labels. Supervised learning technique helps us to predict future events with the help of past experience and labeled examples. Initially, it analyses the known training dataset, and later it introduces an inferred function that makes predictions about output values. Further, it also predicts errors during this entire learning process and also corrects those errors through algorithms. □ Unsupervised Learning In unsupervised learning, a machine is trained with some input samples or labels only, while output is not known. The training information is neither classified nor labeled; hence, a machine may not always provide correct output compared to supervised learning. It helps in exploring the data and can draw inferences from datasets to describe hidden structures from unlabeled data.
  • 5. Reinforcement learning Data scientists typically use reinforcement learning to teach a machine to complete a multi-step process for which there are clearly defined rules. Semi-supervised learning works by data scientists feeding a small amount of labeled training data to an algorithm. From this, the algorithm learns the dimensions of the data set, which it can then apply to new, unlabeled data. The performance of algorithms typically improves when they train on labeled data sets. But labeling data can be time consuming and expensive. Data scientists program an algorithm to complete a task and give it positive or negative cues as it works out how to complete a task. But for the most part, the algorithm decides on its own what steps to take along the way.
  • 7. Supervised Learning Algorithms 1. Linear Regression □ Linear regression is one of the most popular and simple machine learning algorithms that is used for predictive analysis. Here, predictive analysis defines prediction of something, and linear regression makes predictions for continuous numbers such as salary, age, etc. □ It shows the linear relationship between the dependent and independent variables, and shows how the dependent variable(y) changes according to the independent variable (x). □ It tries to best fit a line between the dependent and independent variables, and this best fit line is knowns as the regression line. □ The equation for the regression line is: y= a0+ a*x+ b □ Here, y= dependent variable, x= independent variable and a0 = Intercept of line. □ Linear regression is further divided into two types: • Simple Linear Regression: In simple linear regression, a single independent variable is used to predict the value of the dependent variable. • Multiple Linear Regression: In multiple linear regression, more than one independent variables are used to predict the value of the dependent variable.
  • 8. 2. Logistic Regression □ Logistic regression is the supervised learning algorithm, which is used to predict the categorical variables or discrete values. It can be used for the classification problems in machine learning, and the output of the logistic regression algorithm can be either Yes or NO, 0 or 1, Red or Blue, etc. □ Logistic regression is similar to the linear regression except how they are used, such as Linear regression is used to solve the regression problem and predict continuous values, whereas Logistic regression is used to solve the Classification problem and used to predict the discrete values. □ Instead of fitting the best fit line, it forms an S-shaped curve that lies between 0 and 1. The S-shaped curve is also known as a logistic function that uses the concept of the threshold. Any value above the threshold will tend to 1, and below the threshold will tend to 0
  • 9. 2.Unsupervised Learning A clustering Algorithm K-Means Clustering- □ K-Means Clustering algorithm computes centroids and repeats until the optimal centroid is found. It is also known as the flat clustering algorithm. □ The number of clusters found from data by the method is denoted by the letter ‘k’ in k-means. □ In this method, data points are assigned to clusters in such a way that the sum of the squared distances between the data points and the centroid is as small as possible. □ It is suggested to normalize the data while dealing with clustering algorithms employ distance-based measurement to identify the similarity between data points. □ Because of the iterative nature of k-Means and the random initialization of centroids, k-means may become stuck in a local optimum and fail to converge to the global optimum. As a result, it is advised to employ distinct centroids initializations.
  • 10. Market Basket Analysis Using Apriori Algorithm □ Apriori Principle-If an itemset is frequent, then all of its subsets must also be frequent. □ Apriori algorithm is given by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for boolean association rule. Name of the algorithm is Apriori because it uses prior knowledge of frequent itemset properties. We apply an iterative approach or level-wise search where k-frequent itemsets are used to find k+1 itemsets. □ To improve the efficiency of level-wise generation of frequent itemsets, an important property is used called Apriori property which helps by reducing the search space. □ Apriori Property – All non-empty subset of frequent itemset must be frequent. The key concept of Apriori algorithm is its anti-monotonicity of support measure.
  • 11. Market Basket Analysis- □ Def: Market Basket Analysis (Association Analysis) is a mathematical modeling technique based upon the theory that if you buy a certain group of items, you are likely to buy another group of items. □ It is used to analyze the customer purchasing behavior and helps in increasing the sales and maintain inventory by focusing on the point of sale transaction data. □ The Apriori Algorithm trains and identifies product baskets and product association rules. □ It is the most established algorithm for finding frequent item sets mining. □ The basic princpile of Apriori is “Any subset of a frequent itemset must be frequent”. □ We use these frequent itemsets to generate association rules.
  • 12. Finding Associations
    Market basket analysis studies customer buying habits by finding associations and correlations between the different items that customers place in their “shopping basket”.
    Customer1: Milk, Eggs, Sugar, Bread.
    Customer2: Milk, Eggs, Cereal, Bread.
    Customer3: Eggs, Sugar.
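One simple way to look for associations in these three baskets is to count how often each pair of items is bought together. The sketch below uses the baskets listed on the slide. The variable names and the threshold of two baskets are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# The three shopping baskets from the slide.
baskets = {
    "Customer1": {"Milk", "Eggs", "Sugar", "Bread"},
    "Customer2": {"Milk", "Eggs", "Cereal", "Bread"},
    "Customer3": {"Eggs", "Sugar"},
}

pair_counts = Counter()
for items in baskets.values():
    # Sort so that each pair has one canonical (alphabetical) form.
    pair_counts.update(combinations(sorted(items), 2))

# Pairs that appear together in at least two baskets.
frequent_pairs = sorted(p for p, n in pair_counts.items() if n >= 2)
for pair in frequent_pairs:
    print(pair, pair_counts[pair])
```

With this data, four pairs appear in two baskets each: Bread and Eggs, Bread and Milk, Eggs and Milk, and Eggs and Sugar.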
  • 13. For Example: consider the following dataset, for which we will find the frequent itemsets and generate association rules. The minimum support count is 2 and the minimum confidence is 60%.
    Step-1: K=1
    (I) Create a table containing the support count of each item present in the dataset. This table is called C1 (the candidate set).
  • 14. (II) Compare each candidate item’s support count with the minimum support count (here min_support = 2); remove any item whose support_count is less than min_support.
    □ This gives us itemset L1.
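Step 1 can be sketched as follows. The slide's dataset table is not reproduced in this transcript, so the nine transactions below are an assumption: they were chosen to agree with the support counts quoted later in the example (6 for I1, 7 for I2, 6 for I3) and should not be read as the author's exact table.

```python
from collections import Counter

# Assumed transactions (hypothetical; the original table is not available here).
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support = 2

# C1: support count of every individual item.
c1 = Counter(item for t in transactions for item in t)

# L1: keep only the items that meet the minimum support count.
l1 = {item: count for item, count in c1.items() if count >= min_support}

for item in sorted(l1):
    print(item, l1[item])
```

With these assumed transactions every item reaches the minimum support count, so L1 contains the same items as C1.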
  • 15. □ Table-1
    □ Step-2: K=2
    (I) Generate candidate set C2 using L1 (this is called the join step). Two itemsets in Lk-1 can be joined only if they have (K-2) elements in common.
    □ (II) Check whether all subsets of each itemset are frequent, and remove the itemset if any subset is not. (For example, the subsets of {I1, I2} are {I1} and {I2}, which are both frequent. Check this for each itemset.)
    □ (III) Find the support count of these itemsets by searching the dataset (Table-1).
    □ (IV) Compare each candidate’s (C2, Table-2) support count with the minimum support count (here min_support = 2); remove any itemset whose support_count is less than min_support. This gives us itemset L2.
  • 16. Step-3:
    □ (I) Generate candidate set C3 using L2 (join step). Two itemsets in Lk-1 can be joined only if they have (K-2) elements in common, so here, for L2, the first element should match. The itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5}, {I2, I3, I5}.
    □ (II) Check whether all subsets of these itemsets are frequent, and remove any itemset that has an infrequent subset. (Here the subsets of {I1, I2, I3} are {I1, I2}, {I2, I3}, {I1, I3}, which are all frequent. For {I2, I3, I4}, the subset {I3, I4} is not frequent, so remove it. Check every itemset in the same way.)
    □ (III) Find the support count of the remaining itemsets by searching the dataset.
    □ (IV) Compare each candidate’s (C3) support count with the minimum support count (here min_support = 2); remove any itemset whose support_count is less than min_support. This gives us itemset L3.
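The join and prune steps described above can be sketched as one small function. The function name is hypothetical, and the contents of L2 are an assumption inferred from the join result listed on the slide, since the L2 table itself is not part of this transcript.

```python
from itertools import combinations


def generate_candidates(prev_frequent):
    """Join and prune: build candidate k-itemsets from frequent (k-1)-itemsets.

    Returns (joined, pruned): all joined candidates, and those that survive
    the subset check.
    """
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    prev_set = set(prev)
    joined, pruned = [], []
    for a, b in combinations(prev, 2):
        # Join step: the two itemsets must agree on all but their last item.
        if a[:-1] == b[:-1]:
            candidate = a + (b[-1],)
            joined.append(candidate)
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(sub in prev_set for sub in combinations(candidate, len(a))):
                pruned.append(candidate)
    return joined, pruned


# Assumed L2, inferred from the six joined itemsets listed on the slide.
l2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
joined, c3 = generate_candidates(l2)
print(len(joined), c3)
```

Under that assumption the join produces the six itemsets named on the slide, and pruning leaves only {I1, I2, I3} and {I1, I2, I5}, because each of the other four contains a pair that is not in L2.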
  • 17. Step-4:
    □ Generate candidate set C4 using L3 (join step). Two itemsets in Lk-1 (K=4) can be joined only if they have (K-2) elements in common, so here, for L3, the first 2 elements (items) should match.
    □ Check whether all subsets of these itemsets are frequent. (Here the itemset formed by joining L3 is {I1, I2, I3, I5}; its subsets include {I1, I3, I5}, which is not frequent.) So there is no itemset in C4.
    □ We stop here because no further frequent itemsets are found.
    □ Thus, we have discovered all the frequent itemsets. Next comes the generation of strong association rules, for which we need to calculate the confidence of each rule.
    □ Confidence: a confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter.
    □ Confidence(A->B) = Support_count(A ∪ B) / Support_count(A)
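Putting the four steps together, the whole level-wise search can be sketched as a single loop that stops once a level yields no frequent itemsets. This is a minimal illustration rather than an optimized implementation. The function name is hypothetical, and the transactions are an assumed dataset consistent with the support counts quoted in the example, not the slide's original table.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset as sorted tuple: support count} for all frequent itemsets."""
    def support(itemset):
        return sum(1 for t in transactions if set(itemset) <= t)

    items = sorted({item for t in transactions for item in t})
    # Level 1: frequent single items.
    level = [(i,) for i in items if support((i,)) >= min_support]
    frequent = {}
    while level:
        frequent.update({s: support(s) for s in level})
        prev = set(level)
        candidates = []
        for a, b in combinations(level, 2):
            if a[:-1] == b[:-1]:  # join step
                c = a + (b[-1],)
                # Prune step: all (k-1)-subsets must be frequent.
                if all(sub in prev for sub in combinations(c, len(a))):
                    candidates.append(c)
        # Keep the candidates that meet the minimum support count.
        level = sorted(c for c in candidates if support(c) >= min_support)
    return frequent


# Assumed transactions (hypothetical reconstruction of the example dataset).
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
frequent = apriori(transactions, min_support=2)
largest = max(len(s) for s in frequent)
print(largest, [s for s in frequent if len(s) == largest])
```

On this assumed data the loop ends after level 3, matching the walkthrough: the only 4-item candidate, {I1, I2, I3, I5}, is pruned because {I1, I3, I5} is not frequent.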
  • 18. Here we show rule generation, taking one frequent itemset as an example.
    □ Itemset {I1, I2, I3} // from L3
    So the rules can be:
    [I1^I2]=>[I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100 = 50%
    [I1^I3]=>[I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100 = 50%
    [I2^I3]=>[I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100 = 50%
    [I1]=>[I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100 ≈ 33%
    [I2]=>[I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100 ≈ 29%
    [I3]=>[I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100 ≈ 33%
    So if the minimum confidence is 50%, the first 3 rules can be considered strong association rules. Under the 60% minimum confidence stated at the start of the example, none of these six rules would qualify.
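The confidence calculations above can be reproduced with a short script. It uses only the support counts quoted on the slide. The function and variable names are assumptions made for illustration.

```python
from itertools import combinations

# Support counts quoted on the slide for {I1, I2, I3} and its subsets.
support = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I3"}): 6,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I3"}): 4,
    frozenset({"I2", "I3"}): 4, frozenset({"I1", "I2", "I3"}): 2,
}


def rules(itemset, support, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules meeting the threshold."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            a = frozenset(antecedent)
            # Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
            confidence = support[itemset] / support[a]
            if confidence >= min_confidence:
                yield sorted(a), sorted(itemset - a), confidence


strong = list(rules({"I1", "I2", "I3"}, support, min_confidence=0.5))
for antecedent, consequent, confidence in strong:
    print(antecedent, "=>", consequent, f"{confidence:.0%}")
```

At a threshold of 0.5 this yields the three rules with two-item antecedents. Raising `min_confidence` to 0.6 yields no rules for this itemset.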