3. Differing Forms of Machine Learning
                   Supervised        Unsupervised
Categorical Data   Classification    Association Analysis
Continuous Data    Regression        Clustering & Dimension Reduction
8. Step 0: Select Variables
Revenue ($)
Number of Downloads (#)
Frequency of Purchase (#)
Activities Registered (#)
Number of Items in Basket (#)
Percent of Emails Clicked (%)
Cost of Acquisition ($)
Number of Pages (#)
Time Since Last Purchase
19. Sorting Customers into Groups
Web Analytics Wednesday
Michael Levin
mlevin@otterbein.edu
@MichaelALevin
Editor's Notes
Title slide with playing cards
How can you sort these cards?
Color.
Number vs. non-number.
Suit.
High value. Low value.
Blackjack
Runs by number
Runs by order
Whole deck of cards
Each card
Sorting the cards gets us to clustering: group similar items together and keep dissimilar items apart. A good solution is parsimonious and actionable.
Supervised – we need a teacher or analyst to make decisions about the model. Train the model on data with known outcomes.
Regression – predict the value of a house or the price a customer is willing to pay.
Classification – is this email spam or not?
I am simplifying here.
Unsupervised – the algorithm makes the decisions.
Association – people who buy X also buy Y. Basket analysis
Clustering – grouping items, observations, people based on variables.
Other unsupervised forms besides clustering include preference or perceptual maps and factor analysis.
Photos
Row of different mobile devices
Row of different tablets
Row of different laptops
Photos
Laptop and Mobile devices (2)
4 laptops & 4 mobile (8)
8 laptops, 8 mobile, 8 tablets
Photos
Email
Responsive web design
Offers
Four types of clustering. Most statistics-oriented packages include hierarchical and non-hierarchical methods.
Hierarchical – use when you do not know how many groups there are.
K-Means – you need to specify the number of groups to start. Performs better with large datasets.
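To make the hierarchical idea concrete, here is a minimal sketch in plain Python. It assumes single linkage (clusters are as close as their two closest members), which the slides do not specify; the function names and the five sample points are made up for illustration. Nothing asks for a number of groups up front: the merge history is inspected afterwards, and a large jump in merge distance suggests where to stop.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b, points):
    """Distance between the two closest members of clusters a and b (assumed linkage)."""
    return min(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(i,) for i in range(len(points))]  # every point starts alone
    history = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1], points),
        )
        merge_distance = single_linkage(a, b, points)
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        # Record: clusters remaining, distance of this merge, current grouping
        history.append((len(clusters), merge_distance, list(clusters)))
    return history


# Illustrative points only (not data from the talk)
sample = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5)]
for remaining, merge_distance, grouping in agglomerate(sample):
    print(remaining, round(merge_distance, 2), grouping)
```

Reading the printed history from top to bottom shows small merge distances while similar points are joined, then a much larger one when dissimilar groups are forced together.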
Select the variables. Consider what is important to your customer profile.
Specify the desired number of clusters K: Let us choose K=2 for these 5 data points in 2-D space.
Randomly assign each data point to a cluster: Let's assign three points to cluster 1, shown in green, and two points to cluster 2, shown in grey.
Compute cluster centroids: The centroid of the data points in the green cluster is shown as a green cross, and the centroid of those in the grey cluster as a grey cross.
Re-assign each point to the closest cluster centroid: Note that the data point at the bottom is currently assigned to the green cluster even though it is closer to the centroid of the grey cluster, so we re-assign that data point to the grey cluster.
Re-compute cluster centroids: Now re-compute the centroids for both clusters. Repeat the re-assign and re-compute steps until no point changes cluster.
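The steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation any particular package uses: the function names, the fixed random seed, the empty-cluster fallback, and the five sample points are all invented for the example.

```python
import random


def centroid(members):
    """Mean position of a non-empty list of 2-D points."""
    n = len(members)
    return (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n)


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Randomly assign each point to a cluster (every cluster gets at least one point)
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    for _ in range(max_iter):
        # Compute the centroid of each cluster
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: if a cluster empties out, restart it at a random point
            centroids.append(centroid(members) if members else rng.choice(points))
        # Re-assign each point to the closest centroid
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # nothing moved, so the solution is stable
            break
        labels = new_labels
    # Centroids returned are the ones used in the last assignment step
    return labels, centroids


# Illustrative points only (not data from the talk)
sample = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5)]
labels, centroids = k_means(sample, k=2)
print(labels)
print(centroids)
```

Which cluster ends up numbered 0 or 1 depends on the random start, so only the grouping itself is meaningful, not the label values.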
Photos
Parsimony
As I add more clusters, I get more homogeneous groups.
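One common way to put a number on homogeneity is the within-cluster sum of squares: the total squared distance from each point to its own cluster centroid. The sketch below assumes that measure (the talk does not name one) and uses hand-assigned, purely illustrative labels for one-, two-, and three-cluster solutions on made-up points.

```python
def within_ss(points, labels):
    """Total squared distance from each point to its own cluster centroid."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in members)
    return total


# Illustrative points and hand-assigned labels (hypothetical, not from the talk)
sample = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5)]
solutions = {
    1: [0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1],
    3: [0, 1, 0, 2, 2],
}
for k, labels in solutions.items():
    print(k, round(within_ss(sample, labels), 2))
```

The value keeps falling as clusters are added, which is why parsimony matters: the useful solution is usually the smallest number of clusters after which the improvement becomes minor.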
When I have decided on a solution, I can link the cluster results to regression or other supervised learning by converting the cluster memberships to dummy variables.
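Converting cluster memberships into dummy (0/1) indicators might look like the sketch below. The function name, the column naming, and the choice to leave out one reference cluster (so the columns are not redundant alongside a regression intercept) are assumptions for illustration.

```python
def to_dummies(labels, reference=None):
    """Turn cluster labels into 0/1 columns, omitting one reference cluster."""
    levels = sorted(set(labels))
    if reference is None:
        reference = levels[0]  # assumption: the lowest label is the baseline
    kept = [lv for lv in levels if lv != reference]
    names = [f"cluster_{lv}" for lv in kept]
    rows = [[1 if lab == lv else 0 for lv in kept] for lab in labels]
    return names, rows


# Hypothetical cluster labels for five customers
names, rows = to_dummies([0, 1, 0, 2, 2])
print(names)
for row in rows:
    print(row)
```

Each row can then sit next to the other predictors in a regression; the coefficient on a dummy column is read as the difference between that cluster and the reference cluster.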