Clustering and Association Rules 
Case 4 
NOVEMBER 24, 2014 
GROUP 7 
Sushmita Dey 
Nikolaos Minas 
Allan Kuo 
Prof Shaonan Tian
Clustering 
• Clustering is a popular method. 
• It groups a set of points together in a cluster. Objects that 
differ from each other are placed in different clusters. 
Distance is used as the metric to separate objects into 
clusters.
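To make the role of distance concrete, here is a minimal Python sketch of the Euclidean distance, one common choice of metric. The slides do not say which metric the group used, so the metric, the function name, and the three sample points are all assumptions made for illustration.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical measurements (not taken from the slides).
a = (5.0, 3.5)
b = (5.0, 3.0)
c = (7.0, 3.0)

# a lies much closer to b than to c, so a and b are the natural pair
# to place in the same cluster.
print(euclidean(a, b))
print(euclidean(a, c))
```

A clustering method only needs such pairwise distances to decide which objects belong together.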
Clustering 
• Objects within the same cluster are closer 
to each other than to objects in 
different clusters. 
• We used from the iris data 
set to apply
K-Means Clustering 
• We use the kmeans() function (base R, stats 
package) together with the “fpc” package. 
• We started with the number of clusters 
equal to and the result was 
of pure cluster, 
of slightly less pure 
cluster and the mixture of 
and
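The slide does not show the call itself, so the following is a minimal Python sketch of the k-means idea (Lloyd's algorithm) rather than the R code the group ran. The toy points, the fixed starting centers, and the function signature are assumptions made for illustration; R's kmeans() picks its starting centers at random by default, which this sketch avoids to stay deterministic.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:  # nothing moved, so the result is stable
            break
        centers = new_centers
    return labels, centers

# Hypothetical 2-D points forming two obvious groups (not the iris data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)
print(centers)
```

Comparing the returned labels with known class labels is one way to judge how "pure" each cluster is.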
K-Means Clustering 
• Figure 1: cluster plot (axes dc 1, dc 2), points labelled with clusters 1 to 3 
• Figure 2: cluster plot (axes dc 1, dc 2), points labelled with clusters 1 to 4
Hierarchical Clustering with 
hclust() 
• We used the hclust() function (base R, stats 
package) together with the “fpc” package 
• We used Ward’s minimum variance 
method to create clusters 
• We started with and 
went up to
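As a rough illustration of Ward's criterion, the Python sketch below starts with every point in its own cluster and repeatedly merges the pair whose merge adds the least within-cluster variance. It is a naive, quadratic-time stand-in for what hclust() does, not the group's code; the sample points, the function names, and the choice of stopping at k clusters are assumptions.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca = [sum(col) / len(a) for col in zip(*a)]  # centroid of a
    cb = [sum(col) / len(b) for col in zip(*b)]  # centroid of b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(points, k):
    """Merge the cheapest pair of clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # every point starts on its own
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical points in three tight pairs (not the iris data).
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.1, 5.3), (9.0, 1.0), (9.2, 1.1)]
for k in (3, 2):
    print(k, ward_clusters(points, k))
```

Running the same merge sequence to different values of k mirrors the idea of cutting one hierarchy at several cluster counts.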
Hierarchical Clustering 
• Figure 5: Centroid Plot with 3 Clusters (axes dc 1, dc 2) 
• Figure 6: Centroid Plot with 4 Clusters (axes dc 1, dc 2)
Association Rules 
• Association rule learning is a popular 
unsupervised method. 
• Association rules are used in retail stores to 
find which items are purchased together.
Association Rules 
• Association rules are mostly suited to 
find between items in 
a large set of transactional data 
• A typical rule may be represented as: 
• {peanut butter, jelly}-> { } 
• If peanut butter and jelly are 
purchased then
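A rule of the form {A, B} -> {C} is usually judged by its support and confidence. The Python sketch below computes both for one rule; the baskets and item names (tea, sugar, biscuits) are hypothetical and chosen so as not to guess at the right-hand side left blank on the slide.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the share that also contain rhs."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)

# Hypothetical baskets, one set of items per transaction.
baskets = [
    {"tea", "sugar", "biscuits"},
    {"tea", "sugar"},
    {"tea", "sugar", "biscuits"},
    {"biscuits"},
    {"sugar"},
]

lhs, rhs = {"tea", "sugar"}, {"biscuits"}
print("support:   ", support(lhs | rhs, baskets))
print("confidence:", confidence(lhs, rhs, baskets))
```

Support says how often the whole rule applies; confidence says how reliable the right-hand side is once the left-hand side is in the basket.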
Apriori Algorithm 
• The Apriori algorithm is used to learn 
association rules in a large 
transactional dataset. 
• The Apriori algorithm employs a simple a 
priori belief as a heuristic: all 
subsets of a frequent itemset 
must also be frequent. 
• We used the arules package from R to 
analyze the Groceries dataset.
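The pruning heuristic can be sketched in a few lines of Python: candidates of size k are built only from frequent itemsets of size k-1, and a candidate is dropped as soon as one of its subsets is not frequent. This is a teaching sketch, not the arules implementation; the baskets, the threshold, and the function name are assumptions.

```python
from itertools import combinations

def apriori_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Hypothetical baskets (not the Groceries data).
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter", "eggs"},
]
frequent = apriori_itemsets(baskets, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Because "eggs" fails the threshold on its own, no larger itemset containing it is ever counted, which is where the savings on large transactional data come from.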
Groceries Data Sets
Data Exploration 
• We install and load the package using the 
commands install.packages("arules") 
and library(arules). 
• We use R functions to explore the Groceries 
dataset. 
• We use the dim() function to find the 
dimensions of the Groceries dataset. 
• We use the inspect() function from the 
“arules” package to view the first 10 
transactions in the dataset.
Data Exploration 
• We use the output of the summary() 
function on the dataset to find the most 
frequently purchased item ( ), the average 
number of items per transaction ( ), and the 
number of items in the largest transaction (32). 
• We use the itemFrequencyPlot() function to 
create plots from the dataset for visual 
exploration. 
• We plotted item frequency for all the items 
and for items with 10% support.
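The quantities read off the summary and the frequency plot are simple counts. The Python sketch below reproduces them on made-up data: per-item frequency, a 10% support cut-off, the average basket size, and the largest basket. The baskets and names are hypothetical, and the text bars only stand in for a real plot.

```python
from collections import Counter

def item_frequencies(transactions):
    """Share of transactions that contain each item (single-item support)."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: n / len(transactions) for item, n in counts.items()}

# Hypothetical baskets (20 in total), not the Groceries data.
baskets = (
    [["milk", "bread"]] * 8
    + [["milk"]] * 6
    + [["eggs", "bread"]] * 5
    + [["saffron"]] * 1
)

freq = item_frequencies(baskets)
frequent = {item: f for item, f in freq.items() if f >= 0.10}  # 10% support cut-off
largest = max(len(set(t)) for t in baskets)                    # items in the largest basket
average = sum(len(set(t)) for t in baskets) / len(baskets)     # mean items per basket

# Text stand-in for a bar chart: one '#' per 5 percentage points of support.
for item, f in sorted(frequent.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:<6} {'#' * round(f * 20)} {f:.0%}")
print("largest basket:", largest, "average basket:", average)
```

Filtering by support before plotting keeps rare items, such as "saffron" here, from cluttering the chart.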
Item frequency plot (all items)
Item frequency plot (items with 
10% support)
Association Rules 
• We use the Apriori algorithm from the 
arules package to generate a set of 
association rules. 
• We generated rules using 
support = and confidence = 
by trying out different values 
of support and confidence.
Association Rules 
• We use the summary() function on the rule set 
to find the rule length distribution, 
with rules containing one item. 
• We found that generated rule sets 
have quality metric of lift as 
• We use the inspect() and 
sort() functions to generate 
sorted by .
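To show how support and confidence thresholds and a lift ranking fit together, here is a brute-force Python sketch. It enumerates every itemset instead of using Apriori pruning, so it only suits tiny data; the baskets and the thresholds (0.4 and 0.6) are placeholders and not the values the group settled on.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence):
    """Brute-force rules of the form lhs -> single item, sorted by lift."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({i for t in transactions for i in t})
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            s = support(whole)
            if s < min_support:
                continue
            for rhs_item in combo:
                lhs = whole - {rhs_item}
                conf = s / support(lhs)
                if conf >= min_confidence:
                    # Lift compares confidence with the rhs item's baseline rate.
                    lift = conf / support(frozenset([rhs_item]))
                    rules.append((sorted(lhs), rhs_item, s, conf, lift))
    # Highest lift first.
    return sorted(rules, key=lambda r: r[4], reverse=True)

# Hypothetical baskets (not the Groceries data).
baskets = [
    {"tea", "sugar", "biscuits"},
    {"tea", "sugar"},
    {"tea", "sugar", "biscuits"},
    {"biscuits"},
    {"sugar"},
]
rules = mine_rules(baskets, min_support=0.4, min_confidence=0.6)
for lhs, rhs, s, conf, lift in rules[:3]:
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the left-hand side makes the right-hand item more likely than its baseline rate, which is why sorting by lift surfaces the most interesting rules first.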
