Unsupervised Learning
Orozco Hsu
2023-11-21
About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data/ ML/ AIOT/ AI Columnist
Tutorial Content
• What is unsupervised learning
• Getting started with unsupervised learning in Orange3 (K-means and Association Rules)
• Homework
Supervised learning vs. Unsupervised learning
• Supervised learning: Discover patterns in the data that relate data
attributes with a target (class) attribute.
• These patterns are then utilized to predict the values of the target attribute in
future data instances.
• Unsupervised learning: The data have no target attribute.
• We want to explore the data to find some intrinsic structures in them.
• Classic unsupervised learning algorithms
• Clustering algorithms (Inductive/ Transductive learning)
• Association rules (also called Market Basket Analysis)
K-means
K-means (data observation: should we preprocess the data?)
Transformation
Transform data before K-means
• Many statistical tests make the assumption that datasets are normally
distributed.
• However, this is often NOT the case in practice.
• Transformations:
• Log Transformation: Transform the response variable from y to log(y).
• Square Root Transformation: Transform the response variable from y to y^(1/2).
• Cube Root Transformation: Transform the response variable from y to y^(1/3).
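The three transformations above can be tried on a small sample to see how each one changes skewness. The sketch below is only an illustration: the sample `y` and the helper `skewness` are hypothetical and not part of the slides, and it assumes all values are positive so that log(y) is defined.

```python
import math

def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample (all values positive).
y = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]

transformed = {
    "raw": y,
    "log": [math.log(v) for v in y],
    "square root": [math.sqrt(v) for v in y],
    "cube root": [v ** (1 / 3) for v in y],
}
skews = {name: skewness(vals) for name, vals in transformed.items()}

for name, s in skews.items():
    print(f"{name:12s} skewness = {s:.2f}")
```

On this sample every transformation yields a smaller skewness than the raw data, and the log transformation reduces it the most. Other data sets may behave differently.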
Log Transformation
Square Root Transformation
Cube Root Transformation
Quiz 1:
• Why should we transform data?
• Answer 1: To avoid overfitting. OK, but what is overfitting?
Re-Scaling
Standardize Data
• Standardization (Z-scores) rescales a
dataset to have a mean of 0 and a
standard deviation of 1.
• We typically standardize data when we’d
like to know how many standard
deviations each value in a dataset lies
from the mean.
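As a minimal sketch of standardization, the snippet below computes z-scores with the Python standard library. The `spending` values are hypothetical, and the population standard deviation (`statistics.pstdev`) is an assumption; some tools use the sample standard deviation instead.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / std for v in values]

# Hypothetical monthly spending figures.
spending = [120, 150, 180, 210, 240]
z_scores = standardize(spending)
print([round(z, 2) for z in z_scores])
```

Each z-score says how many standard deviations the value lies above (positive) or below (negative) the mean.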
Normalize Data
• Normalization rescales a dataset so that
each value falls between 0 and 1.
• Typically we normalize data when
performing some type of analysis in
which we have multiple variables that
are measured on different scales and we
want each of the variables to have the
same range.
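Min-max normalization can be sketched the same way. The variables `age` and `income` are hypothetical examples of two features measured on very different scales, and the helper assumes each column has at least two distinct values (otherwise the denominator is zero).

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi != lo

# Hypothetical features on different scales.
age = [18, 25, 40, 60]
income = [20000, 35000, 80000, 120000]

age_scaled = min_max_normalize(age)
income_scaled = min_max_normalize(income)
print(age_scaled)
print(income_scaled)
```

After rescaling, both features share the range [0, 1], so neither dominates a distance calculation simply because of its units.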
Quiz 2:
• When conducting K-means, how should categorical variables be
handled?
• When conducting K-means on numerical variables with severely
skewed distributions, how should we handle them?
• If we have segmented several groups using the re-scaled data, how do we
process new data and assign it to a group? (Using K-means)
• Answer 1: Union all data and rebuild the model.
• Answer 2: ??
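One commonly used option for the last question, which may or may not be the answer the slide has in mind, is to keep the scaling parameters and cluster centroids from the original run and assign each new record to its nearest centroid. The sketch below assumes this approach; the data, the `labels` list (standing in for the output of an earlier K-means run), and all helper names are hypothetical.

```python
import math

def fit_scaler(rows):
    """Learn per-column mean and standard deviation from the training rows only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    """Apply the stored training parameters to a single row."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def nearest_centroid(point, centroids):
    """Index of the centroid with the smallest Euclidean distance."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical training data: [age, monthly spending].
train = [[20, 200], [22, 220], [25, 260], [60, 900], [62, 950], [65, 1000]]
labels = [0, 0, 0, 1, 1, 1]  # hypothetical labels from an earlier K-means run

means, stds = fit_scaler(train)
scaled = [scale(r, means, stds) for r in train]

# Centroid of each cluster in the re-scaled space.
centroids = []
for k in sorted(set(labels)):
    members = [r for r, label in zip(scaled, labels) if label == k]
    centroids.append([sum(c) / len(c) for c in zip(*members)])

# A new record is scaled with the OLD parameters, then assigned.
new_customer = [58, 870]
cluster = nearest_centroid(scale(new_customer, means, stds), centroids)
print(cluster)
```

The key design choice is that the new record is never used to recompute the mean, the standard deviation, or the centroids, so existing group assignments stay stable.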
Examples of Cluster Analysis
Example of Cluster Analysis
• Retail Marketing
• The company can then send personalized advertisements or sales letters to
each household based on how likely they are to respond to specific types
of advertisements.
Example of Cluster Analysis
• Streaming Services
• Using these metrics, a streaming service can perform cluster analysis
to identify high usage and low usage users so that they can know who
they should spend most of their advertising dollars on.
Example of Cluster Analysis
• Sports Science
• They can then feed these variables into a clustering algorithm to
identify players that are similar to each other so that they can have
these players practice with each other and perform specific drills
based on their strengths and weaknesses.
Example of Cluster Analysis
• Email Marketing
• Using these metrics, a business can perform
cluster analysis to identify consumers who use
email in similar ways and tailor the types of
emails and frequency of emails they send to
different clusters of customers.
https://email.uplers.com/blog/email-segmentation-recipe-great-email-marketing/
Example of Cluster Analysis
• Health Insurance
• An actuary can then feed these variables into a clustering algorithm
to identify households that are similar. The health insurance company
can then set monthly premiums based on how often they expect
households in specific clusters to use their insurance.
Association Rules
Association Rules
• In a transaction database with a large amount of data, look
for correlations between items.
• The classic story of Walmart diapers and beer.
• Selling these two seemingly unrelated products together can actually
increase sales.
In general, these correlations cannot be found by direct observation; they are found with algorithms.
Association Rules
• There are two steps.
• First, obtain the frequent item sets!
• A frequent item set is a collection of items that often appear together.
• This step uses the Apriori algorithm.
• Second, generate association rules from the frequent item sets!
• Frequent item sets may contain strong correlations.
• Rules must meet thresholds such as minimum support or minimum confidence.
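The first step can be sketched in a few lines of plain Python. This is a simplified illustration of the Apriori idea (grow item sets level by level and discard any candidate that has an infrequent subset), not the implementation used by Orange3; the function name `apriori` and the four transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(items): support} for every item set meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical transaction database.
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]
frequent_itemsets = apriori(transactions, min_support=0.5)
for itemset, sup in sorted(frequent_itemsets.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a minimum support of 50%, item D is dropped at the first level, so no candidate containing D is ever counted. That pruning is what keeps the search manageable on large databases.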
Association Rules
• From the sales database, we find that the items {B, C, E} are often
bought together. Such a set is called a frequent item set.
• How likely {B, E} are to be purchased together is called the
strength of association.
• To measure how strong an association is, we estimate support and
confidence.
Association Rules
• Support
• If there are 200 transactions in total and the item Sausage appears
in 50 of them, then its support is 50/200 = 1/4, that is, the
support of Sausage is 25%.
• Confidence (range: [0, 1])
• Indicates the conditional probability of two items appearing in the
same transaction. Simply put, Confidence(A -> B) is the probability
that item B appears given that item A has already appeared.
Confidence(A -> B) = P(B|A) = P(A ∩ B) / P(A)
• P(B|A): the probability that B occurs
given that A occurs
• P(A ∩ B), also written P(A, B) or P(AB): the
probability that both events occur together
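Both measures are simple counting exercises. The sketch below uses the usual definition in which the confidence of a rule is the support of both sides together divided by the support of the antecedent; the four transactions and the helper names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Hypothetical transactions.
transactions = [
    {"sausage", "bread"},
    {"sausage", "bread", "beer"},
    {"bread", "milk"},
    {"milk"},
]

print(support({"sausage"}, transactions))
print(confidence({"sausage"}, {"bread"}, transactions))
print(confidence({"bread"}, {"sausage"}, transactions))
```

The last two lines differ because confidence depends on direction: every basket with sausage also contains bread, but not every basket with bread contains sausage.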
Association Rules
• Minimum support and minimum confidence:
• Generally, we set the minimum support to 50%, which means that an item
set such as {A, B} must appear in at least 50% of all transactions before
it is considered a frequent item set.
If the minimum support or confidence is set too low, too many
association rules will appear in the results.
If it is set too high, there will be too few association rules,
which makes it hard to base decisions on
the association results.
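The effect of the threshold can be checked by brute force on a tiny example. The transactions and the helper `count_frequent` are hypothetical, and enumerating every combination is only practical for a handful of items.

```python
from itertools import combinations

# Hypothetical transaction database.
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]
items = sorted(set().union(*transactions))

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def count_frequent(min_support):
    """Brute-force count of non-empty item sets meeting min_support."""
    return sum(
        1
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
        if support(combo) >= min_support
    )

for threshold in (0.25, 0.5, 0.75):
    print(threshold, count_frequent(threshold))
```

Raising the threshold from 25% to 75% shrinks the number of frequent item sets on this data from 19 to 4, which is the trade-off described above.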
Association Rules
• Outputs:
• Many rules are generated; we sort them by support or
confidence to find the ones that interest us.
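A minimal sketch of the second step, generating rules from frequent item sets and then sorting them, is given here. The `frequent` list is a hypothetical result of the first step at 50% minimum support, and `min_confidence = 0.7` is an arbitrary illustrative threshold.

```python
from itertools import combinations

# Hypothetical transaction database.
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

# Hypothetical frequent item sets with at least two items (min support 50%).
frequent = [{"A", "C"}, {"B", "C"}, {"B", "E"}, {"C", "E"}, {"B", "C", "E"}]
min_confidence = 0.7

rules = []
for itemset in frequent:
    # Try every non-empty proper subset as the antecedent.
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = frozenset(itemset) - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= min_confidence:
                rules.append((sorted(antecedent), sorted(consequent),
                              support(itemset), conf))

# Sort by confidence first, then by support, both descending.
rules.sort(key=lambda r: (r[3], r[2]), reverse=True)
for antecedent, consequent, sup, conf in rules:
    print(antecedent, "->", consequent, f"support={sup:.2f} confidence={conf:.2f}")
```

Sorting on the pair (confidence, support) is one possible ordering; sorting on support alone, or on another metric, works the same way by changing the key.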
Example of Association Rules
Association Rules (Item Bundle Sales)
• Mixed Bundling
Association Rules (Item Bundle Sales)
• Cross industry bundling
• Gund Teddy Bear and Amazon.com Gift Cards (Bundle)
Workflows
Have a look at the Food dataset
Metric: Description
• Support: how often a rule is applicable to a given data set (rule/data)
• Confidence: how frequently items in Y appear in transactions that contain X, in other words how
frequently the rule is true (support for a rule/support of antecedent)
• Coverage: how often the antecedent is found in the data set (support of antecedent/data)
• Strength: (support of consequent/support of antecedent)
• Lift: how frequently a rule is true per consequent item (data * confidence/support of
consequent)
• Leverage: the difference between the two items appearing together in a transaction and the two items
appearing independently ((support * data - antecedent support * consequent
support)/data^2)
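The formulas in the metrics table can be checked on a small example. In this sketch "support" inside the formulas is read as a raw transaction count and "data" as the total number of transactions, which is an interpretation of the table rather than something the slides state; the transactions and the function `rule_metrics` are hypothetical.

```python
def rule_metrics(antecedent, consequent, transactions):
    """Compute the table's metrics for the rule antecedent -> consequent."""
    n = len(transactions)                                 # "data"
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= t for t in transactions)               # antecedent count
    n_c = sum(c <= t for t in transactions)               # consequent count
    n_rule = sum((a | c) <= t for t in transactions)      # both together
    conf = n_rule / n_a
    return {
        "support": n_rule / n,
        "confidence": conf,
        "coverage": n_a / n,
        "strength": n_c / n_a,
        "lift": n * conf / n_c,
        "leverage": (n_rule * n - n_a * n_c) / n ** 2,
    }

# Hypothetical transactions.
transactions = [
    {"wine", "nuts"},
    {"wine", "nuts", "aspirin"},
    {"nuts"},
    {"wine"},
    {"pancakes"},
]
metrics = rule_metrics({"nuts"}, {"wine"}, transactions)
for name, value in metrics.items():
    print(f"{name:10s} {value:.3f}")
```

A lift above 1 and a leverage above 0 both indicate that the two sides appear together more often than they would if they were independent.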
Association Rules analysis
A logical step would be to place Wine closer to the (Nuts, Aspirin, Pancakes) section
The rule holds when read from the antecedent on the left to the consequent on the right, but NOT in reverse!
Association Rules analysis
• If we are running a promotion for Wine, which products should we
emphasize?
Homework
• Convert the file (20231121_hw.csv) into a format compatible
with Orange3 Association Rules.
• Which interesting association rules
did you discover?
The first row lists all item names. For each
transaction, mark every purchased item as 1
and every other item as ? (not 0).
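The target layout described above (item names in the first row, 1 for purchased, ? otherwise) can be produced with the standard `csv` module. The layout of 20231121_hw.csv is not shown in the slides, so this sketch starts from a hypothetical list of baskets instead of reading that file.

```python
import csv
import io

# Hypothetical baskets; the real homework file has to be parsed into this shape first.
baskets = [
    ["wine", "nuts"],
    ["aspirin"],
    ["wine", "pancakes", "nuts"],
]

# Header row: every distinct item name, in a fixed order.
items = sorted({item for basket in baskets for item in basket})

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(items)
for basket in baskets:
    # 1 marks a purchased item; ? marks a missing value rather than 0.
    writer.writerow(["1" if item in basket else "?" for item in items])

csv_text = buffer.getvalue()
print(csv_text)
```

To write a file instead of a string, the same `csv.writer` calls can be pointed at a file opened with `open(path, "w", newline="")`.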

  • 2. About me • Education • NCU (MIS)、NCCU (CS) • Work Experience • Telecom big data Innovation • AI projects • Retail marketing technology • User Group • TW Spark User Group • TW Hadoop User Group • Taiwan Data Engineer Association Director • Research • Big Data/ ML/ AIOT/ AI Columnist 2
  • 3. Tutorial Content 3 Getting started unsupervised learning with Orange3 (K-means and Associated Rules) Home works What is the unsupervised learning
  • 4. Supervised learning vs. Unsupervised learning • Supervised learning: Discover patterns in the data that relate data attributes with a target (class) attribute. • These patterns are then utilized to predict the values of the target attribute in future data instances. • Unsupervised learning: The data have no target attribute. • We want to explore the data to find some intrinsic structures in them. • Classic unsupervised learning algorithm • Clustering algorithms (Inductive/ Transductive learning) • Association rules (also called Market Basket Analysis) 4
  • 5.
  • 8. K-means (Data observation: Shall we PREPROC data?) 8
  • 10. Transform data before K-means • Many statistical tests make the assumption that datasets are normally distributed. • However, this is often NOT the case in practice. • Transformations: • Log Transformation: Transform the response variable from y to log(y). • Square Root Transformation: Transform the response variable from y to y1/2. • Cube Root Transformation: Transform the response variable from y to y1/3.
  • 14. Quiz 1: • Why should we transform data? • Answer 1 : To avoid overfitting. Ok, but what is the overfitting?
  • 16. Standardize Data • Standardization (Z-scores) rescales a dataset to have a mean of 0 and a standard deviation of 1. • We typically standardize data when we’d like to know how many standard deviations each value in a dataset lies from the mean.
  • 17. Normalize Data • Normalization rescales a dataset so that each value falls between 0 and 1. • Typically we normalize data when performing some type of analysis in which we have multiple variables that are measured on different scales and we want each of the variables to have the same range.
  • 18. Quiz 2: • When conducting K-means, how should categorical variable be handled? • When conducting K-means on numerical variables with severe skewness distribution, how to handle with it? • If we have segmented several groups by the re-scaled data, how to proceed new data and group assignment? (Using K-means) • Answer 1: Union all data, and rebuild model again. • Answer 2: ??
  • 20. Example of Cluster Analysis • Retail Marketing • The company can then send personalized advertisements or sales letters to each household based on how likely they are to respond to specific types of advertisements.
  • 21. Example of Cluster Analysis • Streaming Services • Using these metrics, a streaming service can perform cluster analysis to identify high usage and low usage users so that they can know who they should spend most of their advertising dollars on.
  • 22. Example of Cluster Analysis • Sports Science • They can then feed these variables into a clustering algorithm to identify players that are similar to each other so that they can have these players practice with each other and perform specific drills based on their strengths and weaknesses.
  • 23. Example of Cluster Analysis • Email Marketing • Using these metrics, a business can perform cluster analysis to identify consumers who use email in similar ways and tailor the types of emails and frequency of emails they send to different clusters of customers. https://email.uplers.com/blog/email-segmentation-recipe-great-email-marketing/
  • 24. Example of Cluster Analysis • Health Insurance • An actuary can then feed these variables into a clustering algorithm to identify households that are similar. The health insurance company can then set monthly premiums based on how often they expect households in specific clusters to use their insurance.
  • 26. Association Rules • In a transaction database with a large amount of data, look for correlations between items. • The classic story of Walmart diapers and beer. • Selling these two seemingly unrelated products together can actually increase sales. • In general, the correlations can’t be obtained through direct observation, but through algorithms.
  • 27. Association Rules • Two steps, as below. • First, obtain the frequent item sets! • A collection of items that often appear together. • Using the Apriori algorithm. • Second, generate association rules from the frequent item sets! • There may be strong correlations based on frequent item sets. • Rules must meet thresholds such as minimum support or minimum confidence.
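The first step, finding frequent item sets, can be sketched as a simplified level-wise search in the spirit of Apriori. This is an illustration under stated assumptions, not the implementation used by Orange3: the function name `frequent_itemsets` and the four transactions are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {frozenset: support fraction}."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while candidates:
        # Keep only candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        result.update(level)
        # Join step: build (k+1)-item candidates from frequent k-item sets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return result

# Hypothetical transactions (an assumption, not data from the slides).
transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = frequent_itemsets(transactions, min_support=0.5)

for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The prune step is what keeps the search tractable: an item set can only be frequent if all of its subsets are frequent, so most large candidates are discarded without counting them.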
  • 28. Association Rules • From the sales database, we found that items {B, C, E} are highly correlated. This is called a frequent item set. • How likely {B, E} are to be purchased together is called the strength of association. • To measure how strong the association is, we estimate support and confidence.
  • 29. Association Rules • Support • If the transaction data has 200 records in total and the item Sausage appears in 50 of them, then its support is 50/200 = 1/4; that is, the support of sausage is 25%. • Confidence • Range: [0, 1]. • Indicates the conditional probability of two items appearing together. Simply put, it is the probability of item B appearing given that item A has already appeared. • Confidence(A -> B) = P(B|A) = P(A ∩ B) / P(A) • P(B|A): The probability that B will occur given that A occurs • P(A ∩ B) or P(A, B) or P(AB): The probability that the two events occur together
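The two measures can be computed directly from transactions. The sketch below assumes the common convention confidence(A -> B) = support(A and B together) / support(A). The data is hypothetical: 200 transactions, 50 of which contain sausage, and the bread and milk counts are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence(A -> B) = support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical data: 200 transactions, sausage appears in 30 + 20 = 50 of them.
transactions = (
    [{"sausage", "bread"}] * 30
    + [{"sausage"}] * 20
    + [{"bread"}] * 50
    + [{"milk"}] * 100
)

sausage_support = support(transactions, {"sausage"})
sausage_to_bread = confidence(transactions, {"sausage"}, {"bread"})
bread_to_sausage = confidence(transactions, {"bread"}, {"sausage"})

print(sausage_support, sausage_to_bread, bread_to_sausage)
```

Confidence is not symmetric: in this made-up data the rule sausage -> bread is stronger than bread -> sausage, because bread appears in many transactions without sausage.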
  • 30. Association Rules • Min support and min confidence: • Generally, we set the minimum support to 50%, which means that the purchased product set {A, B} must appear in at least 50% of all transactions before it is considered a frequent item set. • If the support/confidence threshold is set too low, too many association rules will appear in the results. If it is too high, there will be too few association rules, which makes it hard to base decisions on the results.
  • 31. Association Rules • Outputs: • Many rules are generated; we sort them by support or confidence to find the ones we are interested in.
  • 33. Association Rules (Item bundle sales) • Mixed Bundling
  • 34. Association Rules (Item bundle sales) • Cross-industry bundling • Gund Teddy Bear and Amazon.com Gift Cards (Bundle)
  • 38. Have a look at the Food dataset
  • 39. Metrics and descriptions • Support: how often a rule is applicable to a given data set (rule/data) • Confidence: how frequently items in Y appear in transactions with X, or in other words how frequently the rule is true (support for a rule/support of antecedent) • Coverage: how often the antecedent item is found in the data set (support of antecedent/data) • Strength: (support of consequent/support of antecedent) • Lift: how frequently a rule is true per consequent item (data * confidence/support of consequent) • Leverage: the difference between two items appearing in a transaction and the two items appearing independently (support*data - antecedent support * consequent support/data^2)
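The metric definitions above can be turned into a small calculation. This sketch assumes that the "support" terms inside the formulas are transaction counts and that "data" is the total number of transactions; it also reads the leverage formula as (support * data - antecedent support * consequent support) / data^2. The function name `rule_metrics` and the counts in the example are hypothetical.

```python
def rule_metrics(n, rule_count, antecedent_count, consequent_count):
    """Compute rule metrics from counts.

    n: total number of transactions ("data")
    rule_count: transactions containing antecedent and consequent together
    antecedent_count / consequent_count: transactions containing each side
    """
    confidence = rule_count / antecedent_count
    return {
        "support": rule_count / n,
        "confidence": confidence,
        "coverage": antecedent_count / n,
        "strength": consequent_count / antecedent_count,
        "lift": n * confidence / consequent_count,
        "leverage": (rule_count * n - antecedent_count * consequent_count) / n ** 2,
    }

# Hypothetical counts: 200 transactions, antecedent in 50, consequent in 80,
# both together in 30.
metrics = rule_metrics(n=200, rule_count=30, antecedent_count=50, consequent_count=80)

for name, value in metrics.items():
    print(name, round(value, 4))
```

A lift above 1 and a leverage above 0 both indicate that the two sides occur together more often than they would if they were independent.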
  • 42. Association Rules analysis • A logical step would be to place Wine closer to the (Nuts, Aspirin, Pancakes) section. • The condition holds when reading from the antecedent on the left toward the consequent on the right, but NOT in reverse!
  • 43. Association Rules analysis • If we are running a promotion for Wine, which products should we emphasize?
  • 44. Home works • Modify the file format (20231121_hw.csv) into a format compatible with Orange 3 Association Rules. • The first row contains all item names; go over every purchased item and mark its value as 1; otherwise mark it as ? (not 0). • Have you discovered any interesting association rules? Please identify them.
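One possible way to produce the 1/? layout described above is sketched below. The actual layout of 20231121_hw.csv is not shown in the slides, so the input format here is an assumption: one transaction per line with comma-separated item names. The helper name `baskets_to_orange_rows` and the sample items are hypothetical.

```python
import csv
import io

def baskets_to_orange_rows(baskets):
    """Return a header row of item names plus one row per basket.

    Purchased items are marked "1"; everything else is marked "?" (not 0).
    """
    items = sorted({item for basket in baskets for item in basket})
    rows = [items]
    for basket in baskets:
        rows.append(["1" if item in basket else "?" for item in items])
    return rows

# Assumed input layout: one transaction per line, items separated by commas.
raw = "milk,bread\nbread,eggs,milk\neggs\n"
baskets = [set(row) for row in csv.reader(io.StringIO(raw)) if row]

rows = baskets_to_orange_rows(baskets)

# Write to an in-memory buffer; a real run would write to a .csv file instead.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
```

If the homework file is laid out differently, only the parsing of `raw` into `baskets` needs to change; the conversion to the 1/? matrix stays the same.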