Clustering, Association Rules, and Iris Data Analysis
1. Clustering and Association Rules
Case 4
NOVEMBER 24, 2014
GROUP 7
Sushmita Dey
Nikolaos Minas
Allan Kuo
Prof Shaonan Tian
2. Clustering
• Clustering is a popular unsupervised learning
method.
• It groups a set of similar points
together in a cluster. Objects that differ
from each other are placed in
different clusters. Distance is used
as the metric to separate objects into
clusters.
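To make the role of distance concrete, here is a minimal sketch using R's built-in iris data. It assumes Euclidean distance on the four numeric measurement columns; the slides do not say which metric or columns were used, so both are illustrative choices.

```r
# Assumption: use the four numeric measurements and ignore the Species label.
measurements <- iris[, 1:4]

# Euclidean distances between the first five flowers (all setosa).
d_small <- dist(measurements[1:5, ], method = "euclidean")
print(round(d_small, 2))

# Compare one within-species pair with one between-species pair.
d_all <- as.matrix(dist(measurements))
within_pair  <- d_all[1, 2]    # row 1 and row 2 are both setosa
between_pair <- d_all[1, 101]  # row 1 is setosa, row 101 is virginica
cat("within:", within_pair, " between:", between_pair, "\n")
```

A smaller distance for the within-species pair is what lets a distance-based method put those two flowers in the same cluster.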
3. Clustering
• Objects within the same cluster are closer
to each other than to objects in
different clusters.
• We used data from the iris data
set to apply clustering.
4. K-Means Clustering
• We use the kmeans() function, which ships with R's
built-in “stats” package, together with the “fpc” package.
• We started with the number of clusters
equal to and the result was
of pure cluster,
of slightly less pure
cluster and the mixture of
and
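The k-means step might look roughly like the sketch below. The value k = 3, the seed, and nstart = 25 are hypothetical choices made for illustration, not values taken from the slides, and the code again assumes the four numeric iris columns are the clustering input.

```r
set.seed(42)                      # k-means starts from random centers; fix the seed
measurements <- iris[, 1:4]       # assumption: cluster on the numeric columns only
k <- 3                            # hypothetical number of clusters

fit <- kmeans(measurements, centers = k, nstart = 25)

# Cross-tabulate cluster membership against the known species.
# A "pure" cluster has all of its members in a single species column.
purity_table <- table(cluster = fit$cluster, species = iris$Species)
print(purity_table)
```

Reading the table row by row shows which clusters are pure and which mix two species.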
6. Hierarchical Clustering with hclust()
• We used the hclust() function, which ships with R's
built-in “stats” package, together with the “fpc” package
• We used Ward's minimum variance
method to create clusters
• We started with and
went up to
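A minimal sketch of hierarchical clustering with Ward's method follows. It assumes Euclidean distances on the four numeric iris columns and uses method = "ward.D2", which is how current versions of R expose Ward's criterion for unsquared distances; the slides may have used an older method name. The range k = 2:4 is a hypothetical stand-in for the elided start and end values.

```r
measurements <- iris[, 1:4]              # assumption: numeric columns only
d <- dist(measurements)                  # Euclidean distance matrix

# Agglomerative clustering with Ward's minimum variance criterion.
tree <- hclust(d, method = "ward.D2")

# Cut the tree at several cluster counts (hypothetical range).
cuts <- cutree(tree, k = 2:4)            # one column per value of k

# Compare the 3-cluster solution with the known species.
print(table(cluster = cuts[, "3"], species = iris$Species))

# plot(tree, labels = FALSE)             # uncomment to draw the dendrogram
```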
8. Association Rules
• Association rule mining is a popular
unsupervised learning method.
• Association rules are used in
market basket analysis in retail stores to
find which items are
frequently purchased together.
9. Association Rules
• Association rules are best suited to
finding relationships between items in
large sets of transactional data
• A typical rule may be represented as:
• {peanut butter, jelly}-> { }
• If peanut butter and jelly are
purchased then
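The support, confidence, and lift of such a rule can be worked out by hand on a small example. The sketch below uses invented toy transactions, and "bread" is a hypothetical consequent chosen only for illustration; it is not taken from the slide.

```r
# Toy transactions, invented for illustration only.
transactions <- list(
  c("peanut butter", "jelly", "bread"),
  c("peanut butter", "jelly"),
  c("peanut butter", "bread"),
  c("milk", "bread"),
  c("peanut butter", "jelly", "bread", "milk")
)

# Fraction of transactions that contain every item in `items`.
support <- function(items, txns) {
  mean(vapply(txns, function(txn) all(items %in% txn), logical(1)))
}

lhs <- c("peanut butter", "jelly")
rhs <- "bread"                                          # hypothetical consequent

supp_rule <- support(c(lhs, rhs), transactions)         # share with lhs and rhs
conf_rule <- supp_rule / support(lhs, transactions)     # share of lhs baskets with rhs
lift_rule <- conf_rule / support(rhs, transactions)     # confidence relative to rhs base rate

cat("support:", supp_rule, " confidence:", conf_rule, " lift:", lift_rule, "\n")
```

A lift above 1 would suggest the items appear together more often than expected by chance; in this toy data the lift is below 1.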
10. Apriori Algorithm
• The Apriori algorithm is used to learn
association rules in a large
transactional dataset.
• The Apriori algorithm employs a simple a
priori belief as a heuristic: all
subsets of a frequent itemset
must also be frequent.
• We used the arules package for R to
analyze the Groceries dataset.
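The pruning idea behind the heuristic can be sketched in base R for the first two levels of the search. This is a simplified illustration on invented toy transactions with a hypothetical minimum support of 0.5, not the arules implementation.

```r
# Toy transactions and threshold, invented for illustration only.
transactions <- list(
  c("peanut butter", "jelly", "bread"),
  c("peanut butter", "jelly"),
  c("peanut butter", "bread"),
  c("milk", "bread"),
  c("peanut butter", "jelly", "bread", "milk")
)
min_support <- 0.5

support <- function(items, txns) {
  mean(vapply(txns, function(txn) all(items %in% txn), logical(1)))
}

# Level 1: keep only the single items that meet the threshold.
items <- sort(unique(unlist(transactions)))
item_support <- vapply(items, function(i) support(i, transactions), numeric(1))
frequent_items <- items[item_support >= min_support]

# Level 2: a pair can only be frequent if both of its items are frequent,
# so candidate pairs are built from frequent_items alone.
candidate_pairs <- combn(frequent_items, 2, simplify = FALSE)
pair_support <- vapply(candidate_pairs,
                       function(p) support(p, transactions), numeric(1))
frequent_pairs <- candidate_pairs[pair_support >= min_support]

print(frequent_items)
print(frequent_pairs)
```

Because "milk" falls below the threshold at level 1, no pair containing it is ever counted, which is how the search space shrinks.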
12. Data Exploration
• We install and load the package using the
commands install.packages("arules")
and library(arules).
• We use R functions to explore the Groceries
dataset.
• We use the dim() function to find the
dimensions of the Groceries dataset.
• We use the inspect() function from the
“arules” package to view the first 10
transactions in the dataset.
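Put together, the exploration steps above might look like this. The sketch assumes the Groceries data is loaded with data(Groceries), which is how the arules package ships it.

```r
# install.packages("arules")   # run once if the package is not yet installed
library(arules)

data(Groceries)                # transactions object bundled with arules

dim(Groceries)                 # number of transactions, number of distinct items
inspect(Groceries[1:10])       # print the first 10 transactions
```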
13. Data Exploration
• We use output from the summary()
function on the dataset to find the most
frequently purchased item (
), the average number of items per
transaction ( ), and the number of items in the
largest transaction (32).
• We use the itemFrequencyPlot()
function to create plots from the dataset for visual
exploration.
• We plotted item frequencies for all the items
and for items with support
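A sketch of the summary and plotting calls is shown below. The topN = 20 and support = 0.1 arguments are hypothetical values picked for illustration; the support threshold used in the slides is not given.

```r
library(arules)
data(Groceries)

# Overall statistics: most frequent items, transaction size distribution.
summary(Groceries)

# Relative frequency (support) of each item, highest first.
item_freq <- itemFrequency(Groceries)
head(sort(item_freq, decreasing = TRUE), 5)

# Size of the largest transaction.
max(size(Groceries))

# Visual exploration (hypothetical settings).
itemFrequencyPlot(Groceries, topN = 20)       # the 20 most frequent items
itemFrequencyPlot(Groceries, support = 0.1)   # only items with support >= 0.1
```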
16. Association Rules
• We use the Apriori algorithm from the
arules package to generate a set of
association rules.
• We generated rules using
support = and confidence =
by trying out different values
of support and confidence.
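Rule generation with apriori() could be sketched as follows. The thresholds support = 0.006 and confidence = 0.25, and the minlen = 2 setting, are hypothetical values for illustration only; the values the group settled on are not recoverable from the slides.

```r
library(arules)
data(Groceries)

# Hypothetical thresholds; lower values yield more rules, higher values fewer.
rules <- apriori(
  Groceries,
  parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
  control   = list(verbose = FALSE)    # suppress the progress log
)

rules            # prints how many rules were generated
```

Trying several support and confidence pairs and checking the rule count each time is a simple way to find a manageable rule set.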
17. Association Rules
• We use the summary() function on the rule set
to find the rule length distribution,
with rules containing one item.
• We found that the generated rule set
has a lift quality metric of
• We use the inspect() and
sort() functions to generate
sorted by .
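The evaluation steps can be sketched as below. Sorting by "lift" and showing the top five rules are assumptions made for illustration, since the sort key and the number of rules shown are elided in the slides; the apriori() thresholds are likewise hypothetical.

```r
library(arules)
data(Groceries)

# Hypothetical thresholds, repeated here so the example is self-contained.
rules <- apriori(
  Groceries,
  parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
  control   = list(verbose = FALSE)
)

# Rule length distribution plus summary statistics for support, confidence, lift.
summary(rules)

# Sort by a quality measure (assumed here to be lift) and show the strongest rules.
rules_by_lift <- sort(rules, by = "lift")
inspect(head(rules_by_lift, 5))
```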