Unsupervised Learning
Orozco Hsu
2023-11-21
About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data/ ML/ AIOT/ AI Columnist
Tutorial Content
• What is unsupervised learning
• Getting started with unsupervised learning in Orange3 (K-means and Association Rules)
• Homework
Supervised learning vs. Unsupervised learning
• Supervised learning: Discover patterns in the data that relate data
attributes with a target (class) attribute.
• These patterns are then utilized to predict the values of the target attribute in
future data instances.
• Unsupervised learning: The data have no target attribute.
• We want to explore the data to find some intrinsic structures in them.
• Classic unsupervised learning algorithms
• Clustering algorithms (Inductive/ Transductive learning)
• Association rules (also called Market Basket Analysis)
K-means
K-means (Data observation: should we preprocess the data?)
Transformation
Transform data before K-means
• Many statistical tests make the assumption that datasets are normally
distributed.
• However, this is often NOT the case in practice.
• Transformations:
• Log Transformation: Transform the response variable from y to log(y).
• Square Root Transformation: Transform the response variable from y to y^(1/2).
• Cube Root Transformation: Transform the response variable from y to y^(1/3).
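As a rough illustration of the three transformations above, the sketch below applies each one to a small made-up right-skewed sample and compares the skewness before and after. The sample values and the `skewness` helper are assumptions for illustration and are not part of the tutorial.

```python
import math

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n

# Hypothetical right-skewed sample (values must be positive for the log).
y = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]

transforms = {
    "raw": lambda v: v,
    "log": math.log,
    "sqrt": math.sqrt,
    "cube_root": lambda v: v ** (1 / 3),
}

# Skewness of the sample under each transformation.
skews = {name: skewness([f(v) for v in y]) for name, f in transforms.items()}
for name, s in skews.items():
    print(f"{name:10s} skewness = {s:.2f}")
```

On this sample every transformation pulls the long right tail in, with the log doing so most strongly; a different data set may behave differently.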
Log Transformation
Square Root Transformation
Cube Root Transformation
Quiz 1:
• Why should we transform data?
• Answer 1: To avoid overfitting. OK, but what is overfitting?
Re-Scaling
Standardize Data
• Standardization (Z-scores) rescales a
dataset to have a mean of 0 and a
standard deviation of 1.
• We typically standardize data when we’d
like to know how many standard
deviations each value in a dataset lies
from the mean.
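A minimal sketch of standardization, assuming the population standard deviation is used (some tools use the sample standard deviation instead). The spending figures and the `standardize` helper are made up for illustration.

```python
import statistics

def standardize(values):
    """Rescale values to z-scores (mean 0, population standard deviation 1)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical monthly spending figures for five customers.
spending = [120.0, 80.0, 100.0, 140.0, 60.0]
z_scores = standardize(spending)
print(z_scores)
```

Each z-score reads directly as "how many standard deviations from the mean": the customer who spends exactly the mean gets 0.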
Normalize Data
• Normalization rescales a dataset so that
each value falls between 0 and 1.
• Typically we normalize data when
performing some type of analysis in
which we have multiple variables that
are measured on different scales and we
want each of the variables to have the
same range.
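Min-max normalization can be sketched as follows. The two variables and the `normalize` helper are hypothetical; the point is only that both end up on the same 0 to 1 range, so neither dominates a distance calculation.

```python
def normalize(values):
    """Min-max rescale so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]

# Two hypothetical variables measured on very different scales.
age = [20, 35, 50, 65]
income = [20000, 50000, 80000, 140000]

age_scaled = normalize(age)
income_scaled = normalize(income)
print(age_scaled)
print(income_scaled)
```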
Quiz 2:
• When running K-means, how should categorical variables be handled?
• When running K-means on numerical variables with severely skewed distributions, how should the skew be handled?
• If we have segmented the data into several groups using the re-scaled data, how do we process new data and assign it to a group? (Using K-means)
• Answer 1: Union all the data and rebuild the model.
• Answer 2: ??
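One possible approach to the last question, offered as an assumption rather than as the tutorial's own answer: keep the scaling parameters and the cluster centroids from the original run, re-scale each new record with those same parameters, and assign it to the nearest centroid. All numbers below are made up.

```python
import math

# Hypothetical values saved from the original clustering run.
train_mean = [40.0, 50000.0]   # per-feature means used for the z-scores
train_std = [10.0, 20000.0]    # per-feature standard deviations
centroids = [[-1.0, -1.0], [0.0, 0.5], [1.5, 1.0]]  # in re-scaled space

def assign_cluster(record):
    """Re-scale a raw record with the saved parameters, then pick the nearest centroid."""
    scaled = [(x - m) / s for x, m, s in zip(record, train_mean, train_std)]
    distances = [math.dist(scaled, c) for c in centroids]
    return distances.index(min(distances))

new_customer = [55.0, 72000.0]  # raw age and income
cluster_id = assign_cluster(new_customer)
print(cluster_id)
```

The important detail is that the new record is never re-scaled with its own statistics; reusing the original mean and standard deviation keeps it in the same space as the centroids.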
Examples of Cluster Analysis
Example of Cluster Analysis
• Retail Marketing
• The company can then send personalized advertisements or sales letters to
each household based on how likely they are to respond to specific types
of advertisements.
Example of Cluster Analysis
• Streaming Services
• Using these metrics, a streaming service can perform cluster analysis
to identify high usage and low usage users so that they can know who
they should spend most of their advertising dollars on.
Example of Cluster Analysis
• Sports Science
• They can then feed these variables into a clustering algorithm to
identify players that are similar to each other so that they can have
these players practice with each other and perform specific drills
based on their strengths and weaknesses.
Example of Cluster Analysis
• Email Marketing
• Using these metrics, a business can perform
cluster analysis to identify consumers who use
email in similar ways and tailor the types of
emails and frequency of emails they send to
different clusters of customers.
https://email.uplers.com/blog/email-segmentation-recipe-great-email-marketing/
Example of Cluster Analysis
• Health Insurance
• An actuary can then feed these variables into a clustering algorithm
to identify households that are similar. The health insurance company
can then set monthly premiums based on how often they expect
households in specific clusters to use their insurance.
Association Rules
Association Rules
• In a transaction database with a large amount of data, look for correlations between items.
• The classic story of Walmart diapers and beer.
• Selling these two unrelated products together can actually increase
sales.
In general, these correlations cannot be found by direct observation; they have to be found by algorithms.
Association Rules
• There are two steps.
• First, obtain the frequent item sets!
• A frequent item set is a collection of items that often appear together.
• This step uses the Apriori algorithm.
• Second, generate association rules from the frequent item sets!
• There may be strong correlations within the frequent item sets.
• Rules must meet thresholds such as minimum support or minimum confidence.
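The two steps above can be sketched in plain Python. This is a small Apriori-style illustration under assumed data, not the implementation Orange3 uses; the transactions, thresholds, and function names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise (Apriori-style) search for item sets meeting min_support."""
    n = len(transactions)
    support = {}
    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    while candidates:
        level = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                level[c] = s
        support.update(level)
        # Join frequent k-item sets into (k+1)-item candidates, keeping only those
        # whose every k-item subset is frequent (the Apriori pruning step).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in level for sub in combinations(union, len(a))
            ):
                candidates.add(union)
    return support

def association_rules(support, min_confidence):
    """Step 2: split each frequent item set into antecedent -> consequent rules."""
    rules = []
    for itemset, s in support.items():
        for k in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, k)):
                confidence = s / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, s, confidence))
    return rules

# Hypothetical transactions over the items A to E.
transactions = [
    frozenset("ACD"), frozenset("BCE"), frozenset("ABCE"), frozenset("BE"),
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.8)
for antecedent, consequent, s, c in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"support={s:.2f} confidence={c:.2f}")
```

With these made-up transactions, {B, C, E} comes out as a frequent item set and B -> E as one of the rules that pass the confidence threshold.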
Association Rules
• From a sales database, we find that the items {B, C, E} often appear together. Such a set is called a frequent item set.
• How likely {B, E} are to be purchased together is called the strength of association.
• To measure how strong an association is, we estimate support and confidence.
Association Rules
• Support
• If the transaction data has 200 records in total and the item Sausage appears in 50 of them, then its support is 50/200 = 1/4; that is, the support of Sausage is 25%.
• Confidence (range: [0, 1])
• Indicates the conditional probability of two items appearing together. Simply put, Confidence(A -> B) is the probability of item B appearing given that item A has already appeared.
Confidence(A -> B) = P(B|A) = P(A ∩ B) / P(A)
• P(B|A): the probability that B occurs given that A occurs
• P(A ∩ B), also written P(A, B) or P(AB): the probability that both events occur together
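A small worked calculation may help here. It uses the 50/200 Sausage figure from the slide and, as an assumption, the convention from the metrics table later in the deck that confidence is the support of the rule divided by the support of the antecedent. The Bread counts are invented for illustration.

```python
from fractions import Fraction

total_transactions = 200   # from the slide
sausage_count = 50         # from the slide's 50/200 calculation
# Hypothetical counts, not from the slide:
bread_count = 80           # transactions containing Bread
both_count = 40            # transactions containing both Sausage and Bread

support_sausage = Fraction(sausage_count, total_transactions)
support_both = Fraction(both_count, total_transactions)

# Confidence(Sausage -> Bread) = P(Sausage ∩ Bread) / P(Sausage)
confidence_sausage_to_bread = support_both / support_sausage
# The reverse rule divides by the other item's support, so it can differ.
confidence_bread_to_sausage = support_both / Fraction(bread_count, total_transactions)

print(float(support_sausage))              # 0.25
print(float(confidence_sausage_to_bread))  # 0.8
print(float(confidence_bread_to_sausage))  # 0.5
```

The two confidence values differ because a rule is directional: swapping antecedent and consequent changes the denominator.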
Association Rules
• Minimum support and minimum confidence:
• Generally, we set the minimum support to 50%, which means that the purchased item set {A, B} must appear in at least 50% of all transactions before it is considered a frequent item set.
If the minimum support or confidence is set too low, too many association rules will appear in the results.
If it is set too high, there will be too few association rules, which does not help us make decisions based on the results.
Association Rules
• Outputs:
• Many rules are generated; we sort them by support or confidence to find the ones we are interested in.
Example of Association Rules
Association Rules (Item Bundle Sales)
• Mixed bundling
Association Rules (Item Bundle Sales)
• Cross-industry bundling
• Gund Teddy Bear and Amazon.com Gift Cards (Bundle)
Workflows
Have a look at the Food dataset
Metric | Description
Support | How often a rule is applicable to a given data set (rule / data)
Confidence | How frequently items in Y appear in transactions that contain X; in other words, how frequently the rule is true (support for the rule / support of the antecedent)
Coverage | How often the antecedent is found in the data set (support of the antecedent / data)
Strength | Support of the consequent / support of the antecedent
Lift | How frequently a rule is true per consequent item (data * confidence / support of the consequent)
Leverage | The difference between how often the two items appear together in a transaction and how often they would appear together if independent ((support * data - antecedent support * consequent support) / data^2)
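The formulas in the table can be checked with a short sketch. It assumes, as the formulas suggest, that "support" inside the parentheses means a raw transaction count and "data" means the total number of transactions. The counts and the `rule_metrics` helper are hypothetical.

```python
def rule_metrics(n, count_x, count_y, count_xy):
    """Metrics for a rule X -> Y, computed from raw transaction counts.

    n: total transactions; count_x / count_y: transactions containing the
    antecedent / consequent; count_xy: transactions containing both.
    """
    confidence = count_xy / count_x
    return {
        "support": count_xy / n,
        "confidence": confidence,
        "coverage": count_x / n,
        "strength": count_y / count_x,
        "lift": n * confidence / count_y,
        "leverage": (count_xy * n - count_x * count_y) / n ** 2,
    }

# Hypothetical counts for one rule.
metrics = rule_metrics(n=200, count_x=50, count_y=80, count_xy=40)
for name, value in metrics.items():
    print(f"{name:10s} {value:.3f}")
```

A lift above 1 and a leverage above 0 both indicate that the antecedent and consequent appear together more often than independence would predict.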
Association Rules analysis
A logical step would be to place Wine closer to the (Nuts, Aspirin, Pancakes) section
The rule holds when read from the antecedent on the left to the consequent on the right, but NOT in reverse!
Association Rules analysis
• If we are running a promotion for Wine, which products should we
emphasize?
Homework
• Convert the file (20231121_hw.csv) into a format compatible with Orange 3 Association Rules.
• Which interesting association rules did you discover?
The first row contains all item names. For each transaction, mark every purchased item as 1; otherwise mark it as ? (not 0).
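The target layout described above can be sketched as follows. The input layout is an assumption (one transaction per row, items separated by commas), since the actual structure of 20231121_hw.csv is not shown in the slides; the item names are borrowed from the Wine example purely for illustration.

```python
import csv
import io

# Assumed input layout (hypothetical): one transaction per row, items comma-separated.
raw = "Wine,Nuts\nAspirin\nWine,Pancakes,Nuts\n"

baskets = [set(row) for row in csv.reader(io.StringIO(raw)) if row]
items = sorted(set().union(*baskets))

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(items)  # first row: all item names
for basket in baskets:
    # 1 for a purchased item, ? (missing value) otherwise, never 0.
    writer.writerow(["1" if item in basket else "?" for item in items])

converted = out.getvalue()
print(converted)
```

For a real file, the `io.StringIO` objects would be replaced by files opened with `open(..., newline="")`.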

2023 Supervised_Learning_Association_Rules

  • 2. About me • Education • NCU (MIS)、NCCU (CS) • Work Experience • Telecom big data Innovation • AI projects • Retail marketing technology • User Group • TW Spark User Group • TW Hadoop User Group • Taiwan Data Engineer Association Director • Research • Big Data/ ML/ AIOT/ AI Columnist 2
  • 3. Tutorial Content 3 Getting started unsupervised learning with Orange3 (K-means and Associated Rules) Home works What is the unsupervised learning
  • 4. Supervised learning vs. Unsupervised learning • Supervised learning: Discover patterns in the data that relate data attributes with a target (class) attribute. • These patterns are then utilized to predict the values of the target attribute in future data instances. • Unsupervised learning: The data have no target attribute. • We want to explore the data to find some intrinsic structures in them. • Classic unsupervised learning algorithm • Clustering algorithms (Inductive/ Transductive learning) • Association rules (also called Market Basket Analysis) 4
  • 5.
  • 8. K-means (Data observation: Shall we PREPROC data?) 8
  • 10. Transform data before K-means • Many statistical tests make the assumption that datasets are normally distributed. • However, this is often NOT the case in practice. • Transformations: • Log Transformation: Transform the response variable from y to log(y). • Square Root Transformation: Transform the response variable from y to y1/2. • Cube Root Transformation: Transform the response variable from y to y1/3.
  • 14. Quiz 1: • Why should we transform data? • Answer 1 : To avoid overfitting. Ok, but what is the overfitting?
  • 16. Standardize Data • Standardization (Z-scores) rescales a dataset to have a mean of 0 and a standard deviation of 1. • We typically standardize data when we’d like to know how many standard deviations each value in a dataset lies from the mean.
  • 17. Normalize Data • Normalization rescales a dataset so that each value falls between 0 and 1. • Typically we normalize data when performing some type of analysis in which we have multiple variables that are measured on different scales and we want each of the variables to have the same range.
  • 18. Quiz 2: • When conducting K-means, how should categorical variable be handled? • When conducting K-means on numerical variables with severe skewness distribution, how to handle with it? • If we have segmented several groups by the re-scaled data, how to proceed new data and group assignment? (Using K-means) • Answer 1: Union all data, and rebuild model again. • Answer 2: ??
  • 20. Example of Cluster Analysis • Retail Marketing • The company can then send personalized advertisements or sales letters to each household based on how likely they are to respond to specific types of advertisements.
  • 21. Example of Cluster Analysis • Streaming Services • Using these metrics, a streaming service can perform cluster analysis to identify high usage and low usage users so that they can know who they should spend most of their advertising dollars on.
  • 22. Example of Cluster Analysis • Sports Science • They can then feed these variables into a clustering algorithm to identify players that are similar to each other so that they can have these players practice with each other and perform specific drills based on their strengths and weaknesses.
  • 23. Example of Cluster Analysis • Email Marketing • Using these metrics, a business can perform cluster analysis to identify consumers who use email in similar ways and tailor the types of emails and frequency of emails they send to different clusters of customers. https://email.uplers.com/blog/email-segmentation-recipe-great-email-marketing/
  • 24. Example of Cluster Analysis • Health Insurance • An actuary can then feed these variables into a clustering algorithm to identify households that are similar. The health insurance company can then set monthly premiums based on how often they expect households in specific clusters to use their insurance.
  • 26. Association Rules • In a transaction database with a large amount of data, look for correlations between items. • The classic story of Walmart's diapers and beer. • Selling these two seemingly unrelated products together can actually increase sales. In general, such correlations cannot be found through direct observation, only through algorithms.
  • 27. Association Rules • There are two steps. • First, obtain the frequent item sets! • A frequent item set is a collection of items that often appear together. • Use the Apriori algorithm. • Second, generate association rules from the frequent item sets! • There may be strong correlations within frequent item sets. • Rules must meet thresholds such as minimum support or minimum confidence.
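The two steps can be sketched in plain Python as follows. This is a simplified Apriori-style illustration, not the implementation used by Orange3; the function names and the four example transactions are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for item sets with support >= min_support."""
    n = len(transactions)
    candidates = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            supp = sum(1 for t in transactions if c <= t) / n
            if supp >= min_support:
                level[c] = supp
        frequent.update(level)
        k += 1
        # Join: combine frequent (k-1)-item sets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: split each frequent item set into antecedent -> consequent."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, supp, conf))
    return rules


# Hypothetical transactions; item names are placeholders.
transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
frequent = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=1.0)
for antecedent, consequent, supp, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

With a minimum support of 50%, the item set {B, C, E} survives as frequent, while any set containing D is discarded at the first level.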
  • 28. Association Rules • From the sales database, we find that the items {B, C, E} are often purchased together. Such a set is called a frequent item set. • How likely {B, E} are to be purchased together is called the strength of association. • To measure how strong an association is, we estimate support and confidence.
  • 29. Association Rules • Support • If the transaction data has 200 records in total and the item Sausage appears in 50 of them, then its support is 50/200 = 1/4, that is, the support of Sausage is 25%. • Confidence • Range: [0, 1]. • The conditional probability that one item appears given that another has appeared. For the rule A -> B, it is the probability that B appears when A has already appeared: Confidence(A -> B) = P(B | A) = P(A ∩ B) / P(A). • P(B | A): the probability that B occurs given that A occurs. • P(A ∩ B), also written P(A, B) or P(AB): the probability that both events occur together.
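A small worked example of both measures, taking Sausage to appear in 50 of 200 transactions (the 50/200 case above). The Bread counts are hypothetical and only illustrate that confidence depends on the direction of the rule; confidence is taken as the count of transactions containing both items divided by the count containing the antecedent.

```python
def support(count_itemset, n_transactions):
    """Share of all transactions that contain the item set."""
    return count_itemset / n_transactions


def confidence(count_both, count_antecedent):
    """Share of transactions with the antecedent that also contain the consequent."""
    return count_both / count_antecedent


n_transactions = 200
sausage = 50   # transactions containing Sausage
bread = 80     # hypothetical: transactions containing Bread
both = 40      # hypothetical: transactions containing both items

print(support(sausage, n_transactions))   # support of Sausage
print(confidence(both, sausage))          # Sausage -> Bread
print(confidence(both, bread))            # Bread -> Sausage
```

The two confidence values differ (0.8 versus 0.5) because each rule divides by the count of its own antecedent.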
  • 30. Association Rules • Min support and min confidence: • For example, if we set the minimum support to 50%, the purchased product set {A, B} must appear in at least 50% of all transactions before it is considered a frequent item set. • If the minimum support or confidence is set too low, too many association rules appear in the results. If it is set too high, there are too few rules, which makes it hard to base decisions on the results.
  • 31. Association Rules • Outputs: • Many rules are generated; we sort them by support or confidence to find the ones we are interested in.
  • 33. Association Rules (Item Bundle Sales) • Mixed Bundling
  • 34. Association Rules (Item Bundle Sales) • Cross-industry bundling • Gund Teddy Bear and Amazon.com Gift Cards (Bundle)
  • 38. Have a look at the Food dataset
  • 39. Metrics and descriptions (for a rule X -> Y; "data" is the number of transactions and the supports inside the formulas are transaction counts) • Support: how often a rule is applicable to a given data set (rule / data). • Confidence: how frequently items in Y appear in transactions with X, in other words how frequently the rule is true (support of the rule / support of the antecedent). • Coverage: how often the antecedent is found in the data set (support of the antecedent / data). • Strength: (support of the consequent / support of the antecedent). • Lift: how frequently a rule is true per consequent item (data * confidence / support of the consequent). • Leverage: the difference between the two items appearing together in a transaction and the two items appearing independently ((support * data - antecedent support * consequent support) / data^2).
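The formulas in this table can be checked with a short sketch that computes every metric from raw transaction counts. The `rule_metrics` function and the counts passed to it are hypothetical, and the leverage formula is read as (support * data - antecedent support * consequent support) / data^2.

```python
def rule_metrics(n, ant_count, cons_count, both_count):
    """Metrics for a rule X -> Y from counts.

    n: number of transactions, ant_count: transactions containing X,
    cons_count: transactions containing Y, both_count: transactions containing X and Y.
    """
    confidence = both_count / ant_count
    return {
        "support": both_count / n,
        "confidence": confidence,
        "coverage": ant_count / n,
        "strength": cons_count / ant_count,
        "lift": n * confidence / cons_count,
        "leverage": (both_count * n - ant_count * cons_count) / n ** 2,
    }


# Hypothetical counts: 200 transactions, X in 50, Y in 80, both in 40.
metrics = rule_metrics(n=200, ant_count=50, cons_count=80, both_count=40)
for name, value in metrics.items():
    print(f"{name:>10}: {value:.3f}")
```

A lift above 1 and a leverage above 0 both indicate that X and Y appear together more often than they would if they were independent.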
  • 42. Association Rules analysis • A logical step would be to place Wine closer to the (Nuts, Aspirin, Pancakes) section. • The rule holds when read from the Antecedent on the left to the Consequent on the right, but NOT in reverse!
  • 43. Association Rules analysis • If we are running a promotion for Wine, which products should we emphasize?
  • 44. Homework • Convert the file (20231121_hw.csv) to a format compatible with Orange 3 Association Rules. • Which interesting association rules did you discover? The first row contains all item names. For each transaction, mark every purchased item as 1 and every other item as ? (not 0).
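The target layout described above (item names in the first row, 1 for purchased items, ? otherwise) can be sketched like this. The layout of 20231121_hw.csv is not shown in the slides, so the input here is a hypothetical list of baskets, and `baskets_to_table` is an illustrative helper rather than part of Orange3.

```python
import csv
import io


def baskets_to_table(baskets):
    """Build rows: a header of item names, then one row per basket with '1' or '?'."""
    items = sorted({item for basket in baskets for item in basket})
    rows = [items]
    for basket in baskets:
        rows.append(["1" if item in basket else "?" for item in items])
    return rows


# Hypothetical baskets; the real homework file may be structured differently.
baskets = [{"Bread", "Milk"}, {"Milk", "Wine"}, {"Bread"}]
rows = baskets_to_table(baskets)

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Using ? instead of 0 marks the item as missing rather than as a value, which matches the instruction above not to use 0.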