RFM: A Precursor to Data Mining
RFM stands for Recency, Frequency and Monetary Value. It has been used by
direct marketers for over 40 years as a segmentation tool to increase marketing
ROI. The basic premise of RFM is that customers who have purchased more
recently, more frequently and have spent more with your company are your best
prospects for future direct marketing campaigns. Like data mining/response
modeling, the goal of RFM is to increase marketing ROI by communicating (via
direct mail, call center, etc.) only with customers that are likely to respond. Done
well, you increase your ROI as you attain almost the same number of sales by
contacting only a fraction of your customer base.
RFM, BI, data mining and optimization represent a common progression away
from mass marketing for many organizations as their marketing efforts become
more analytically based and targeted.
As depicted above, the adoption of each technique is a function of many factors.
Consequently, a technique like RFM can still be a new and promising approach
to many companies today. It is simple to understand, contributes to ROI, is
inexpensive, and can be utilized as a reliable stepping stone to more advanced
techniques like data mining.
RFM in Action
RFM was initially utilized by marketers in the B-to-C space – specifically in
industries like Cataloging, Insurance, Retail Banking, Telecommunications and
others. There are a number of scoring approaches that can be used with RFM.
We’ll take a look at three:
RFM – Basic Ranking
RFM – Within Parent Cell Ranking
RFM – Weighted Ranking
Each approach has experienced proponents that argue one over the other. The
point is to start somewhere and experiment to find the one that works best for
your company and your customer base. Let’s look at a few examples.
RFM – Basic Ranking
This approach involves scoring customers based on each RFM factor
separately. It begins with sorting your customers based on Recency, i.e.,
the number of days or months since their last purchase. Once sorted in
ascending order (most recent purchasers at the top), the customers are
then split into quintiles, or five equal groups. The customers in the top
quintile represent the 20% of your customers that most recently purchased
and are assigned a Recency score of 5.
This process is then undertaken for Frequency and Monetary as well.
Each customer is in one of the five cells for R, F, and M (see below).
Experience tells us that the best prospects for an upcoming campaign are
those customers that are in Quintile 5 for each factor – those customers
that have purchased most recently, most frequently and have spent the
most money. In fact, a common approach to creating an aggregated
score is to concatenate the individual RFM scores together resulting in
125 cells (5x5x5).
A customer’s score can range from 555, the highest, to 111, the lowest.
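The basic ranking steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names `quintile_scores` and `rfm_cells`, the sample figures, and the handling of ties (broken by input position) are all hypothetical choices made for the example.

```python
def quintile_scores(values, higher_is_better=True, bins=5):
    """Score each value from 1 (worst group) to `bins` (best group).

    Values are ranked and cut into `bins` groups of (near-)equal size.
    Assumption: ties are broken by input position.
    """
    n = len(values)
    # Indices ordered from worst to best.
    order = sorted(range(n), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * bins // n + 1
    return scores


def rfm_cells(days_since_purchase, purchase_counts, total_spent):
    """Score R, F and M separately, then concatenate them into a cell label."""
    # Fewer days since the last purchase is better, so invert the ordering.
    r = quintile_scores(days_since_purchase, higher_is_better=False)
    f = quintile_scores(purchase_counts)
    m = quintile_scores(total_spent)
    return [f"{ri}{fi}{mi}" for ri, fi, mi in zip(r, f, m)]


# Hypothetical data for ten customers.
days = [5, 20, 40, 60, 90, 120, 150, 200, 300, 400]
purchases = [12, 1, 9, 8, 6, 5, 4, 3, 2, 10]
spent = [900, 800, 700, 600, 500, 400, 300, 200, 100, 50]

for customer_id, cell in enumerate(rfm_cells(days, purchases, spent)):
    print(customer_id, cell)
```

Because each factor is scored on its own, a customer can land in a mixed cell such as 515: a recent, high-spending buyer who has purchased only once.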
RFM – Within Parent Cell Ranking
This approach is advocated by Arthur Middleton Hughes [1] – one of the
biggest proponents of RFM analysis. It begins like the one above, i.e., all
customers are initially grouped into 5 cells based on Recency. The next
step takes customers in a given Recency cell – say cell number 5 – and
then ranks those customers based on Frequency. Then customers in the
55 (RF) cell are ranked by Monetary value. As the illustration below shows,
this method requires quite a number of sorts on the database.
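The nested sorting described above can be sketched as follows. This is an illustrative reading of the within-parent-cell idea, not Hughes's own code: the helper names, the tuple layout of each customer record and the tie handling are assumptions.

```python
def split_into_bins(indices, key, bins=5):
    """Sort `indices` from worst to best by `key` and score them 1..bins
    in (near-)equal groups. Returns {index: score}."""
    ordered = sorted(indices, key=key)
    n = len(ordered)
    return {idx: rank * bins // n + 1 for rank, idx in enumerate(ordered)}


def nested_rfm(customers, bins=5):
    """customers: list of (days_since_last_purchase, n_purchases, total_spent).

    Recency is ranked over the whole file, Frequency within each Recency
    cell, and Monetary within each Recency/Frequency cell.
    """
    all_idx = range(len(customers))

    # Negate days so that the most recent purchasers sort last (best).
    r = split_into_bins(all_idx, key=lambda i: -customers[i][0], bins=bins)

    f = {}
    for r_score in range(1, bins + 1):
        cell = [i for i in all_idx if r[i] == r_score]
        if cell:
            f.update(split_into_bins(cell, key=lambda i: customers[i][1], bins=bins))

    m = {}
    for r_score in range(1, bins + 1):
        for f_score in range(1, bins + 1):
            cell = [i for i in all_idx if r[i] == r_score and f[i] == f_score]
            if cell:
                m.update(split_into_bins(cell, key=lambda i: customers[i][2], bins=bins))

    return [(r[i], f[i], m[i]) for i in all_idx]


# Hypothetical file of eight customers, scored with 2 bins per factor
# so the nesting is easy to follow by hand.
sample = [(10, 5, 100), (20, 1, 500), (30, 9, 50), (40, 2, 900),
          (50, 7, 10), (60, 3, 20), (70, 8, 30), (80, 4, 40)]
print(nested_rfm(sample, bins=2))
```

With five bins per factor this performs 1 + 5 + 25 = 31 sorts, which is the database cost the text refers to. In exchange, every cell ends up with roughly the same number of customers.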
RFM – Weighted Ranking
Weightings used by RFM practitioners vary. For example, some advocate
adding the RFM scores together – thus giving equal weight to each factor.
Consequently, scores can range from 15 (5+5+5) to 3 (1+1+1). Another
weighting arrangement often used is 3xR + 2xF + 1xM. In this case,
scores can range from 30 (15+10+5) to 6 (3+2+1).
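As a small sketch of the arithmetic, the weighted score and its attainable range can be computed as shown here. The function names are hypothetical, and the range helper assumes non-negative weights and quintile scores of 1 to 5.

```python
def weighted_score(r, f, m, weights=(3, 2, 1)):
    """Combine quintile scores using (R, F, M) weights."""
    w_r, w_f, w_m = weights
    return w_r * r + w_f * f + w_m * m


def score_range(weights, low=1, high=5):
    """Lowest and highest attainable weighted score (non-negative weights)."""
    return sum(weights) * low, sum(weights) * high


print(score_range((1, 1, 1)))   # equal weighting
print(score_range((3, 2, 1)))   # 3xR + 2xF + 1xM
print(weighted_score(5, 3, 1))  # a recent, mid-frequency, low-spend customer
```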
So which to use? In reality, there are many other permutations of
approaches that are being used today. Best-practice marketing analytics
requires a fine mix of mathematical and statistical science, creativity and
experimentation. Bottom line, test multiple scoring methods to see which
works best for your unique customer base. The graphical analysis below
is a great first step in determining a weighting scheme that is appropriate
for your company.
[1] Arthur Middleton Hughes, Vice President, The Database Marketing Institute
So far we have assumed R is more important than F, which is more
important than M. This is a great start, but in reality some businesses find
that a different order works best given the unique nature of their business
and customer base. The graphs below represent an analysis of a recent
campaign for a hypothetical company. When looking at actual response
across each RFM factor, the graphs suggest that this company may be
better off developing a scoring scheme based on some weighting of MRF,
attributing the highest weight to Monetary since it is associated with the
highest response rate.
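One rough way to produce the numbers behind such graphs is to tabulate the response rate per score for each factor and then compare the factors. The sketch below orders factors by the response rate of their top group. That ordering rule is an assumption made for illustration, as are the function names and the invented data.

```python
def response_rate_by_score(scores, responded, bins=5):
    """Share of customers who responded, for each score 1..bins."""
    mailed = {s: 0 for s in range(1, bins + 1)}
    buyers = {s: 0 for s in range(1, bins + 1)}
    for score, did_respond in zip(scores, responded):
        mailed[score] += 1
        buyers[score] += 1 if did_respond else 0
    return {s: buyers[s] / mailed[s] for s in mailed if mailed[s] > 0}


def order_factors(factor_scores, responded, bins=5):
    """Order factor names by the response rate of their top-scoring group
    (an assumed heuristic; other comparisons are equally defensible)."""
    top_rate = {
        name: response_rate_by_score(scores, responded, bins).get(bins, 0.0)
        for name, scores in factor_scores.items()
    }
    return sorted(top_rate, key=top_rate.get, reverse=True)


# Hypothetical campaign results for ten customers.
responded = [True, True, True, False, False, False, False, False, False, False]
factor_scores = {
    "R": [5, 4, 4, 5, 3, 3, 2, 2, 1, 1],
    "F": [4, 4, 3, 5, 5, 3, 2, 2, 1, 1],
    "M": [5, 5, 4, 4, 3, 3, 2, 2, 1, 1],
}
print(order_factors(factor_scores, responded))
```

On this invented data the ordering comes out as M, R, F, mirroring the MRF weighting discussed above. A real file would need far more customers per group before the rates could be trusted.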
OK, so now you have scores – how do you decide which customers
should be contacted based on those scores?
Establishing a Score Threshold
After a test or production campaign, you will find that some of the cells
were profitable while some were not. Let’s turn to a case study to see
how you can establish a threshold that will help maximize your profitability.
This study comes from Professor Charlotte Mason of the Kenan-Flagler
Business School and utilizes a real-life marketing study performed by The
BookBinders Book Club [2].
BookBinders is a specialty book seller that utilizes multiple marketing
channels. BookBinders traditionally did mass marketing and wanted to
test the power of RFM. To do so, they initially did a random mailing to
50,000 customers. The customers were mailed an offer to purchase The
Art History of Florence. Response data was captured and a “post-RFM”
analysis was completed. This “post analysis” was done by freezing the
files of the 50,000 test customers prior to the actual test offer. Thus, the
test campaign itself did not distort the analysis by coding many of the
50,000 test subjects (the actual buyers) as the most recent
purchasers. The results firmly support the use of RFM as a highly
effective segmentation approach.
[2] Recency, Frequency and Monetary (RFM) Analysis, Professor Charlotte Mason, Kenan-Flagler
Business School, University of North Carolina, 2003.
Purchased    Last        Total #      Dollars
The Book?    Purchase    Purchases    Spent
Yes           8.61       5.22         234.30
No           12.73       3.76         205.74
Customers that purchased the book were, on average, more recent purchasers, more
frequent purchasers and had spent more with BookBinders.
The response rates by decile for Recency paint an even more compelling
picture (see graph below).
The response rate for the top decile (18%) was twice the response rate
associated with the 5th decile (9%).
Results from this test were then used by BookBinders to identify which of
their remaining customers should receive the same mailing. BookBinders
used a breakeven response rate calculation to determine the appropriate
RFM cells to mail.
The following cost information was used as input:
Cost per Mail-piece $0.50
Selling Price $18.00
BookBinders Book Cost $9.00
Shipping Costs $3.00
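Breakeven is achieved when the cost of mailing an offer is equal to the
expected net profit from it, i.e., the response rate times the net profit
from a sale. In this case:
Net profit from a single sale = $18.00 - $9.00 - $3.00 = $6.00
Breakeven = (cost to mail the offer/net profit from a single sale)
= $0.50/$6.00 = 8.3% = Breakeven Response rate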
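The breakeven calculation can be written as a small helper. This sketch uses the cost figures listed above. The function and parameter names are hypothetical.

```python
def breakeven_response_rate(cost_per_piece, selling_price,
                            product_cost, shipping_cost):
    """Response rate at which mailing cost equals net profit from sales."""
    net_profit_per_sale = selling_price - product_cost - shipping_cost
    if net_profit_per_sale <= 0:
        raise ValueError("No breakeven exists: each sale loses money.")
    return cost_per_piece / net_profit_per_sale


rate = breakeven_response_rate(cost_per_piece=0.50, selling_price=18.00,
                               product_cost=9.00, shipping_cost=3.00)
print(f"Breakeven response rate: {rate:.1%}")
```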
So, according to the test offer, profit can be obtained by mailing to cells
that exhibited a response rate of greater than 8.3% -- or cells with an RFM
score greater than 425. BookBinders compared the profitability of RFM
versus their old mass marketing approach in the table below.
RFM dramatically improved profitability by capturing 71% of buyers
(3,214/4,522) while mailing only 45% of their customers (22,731/50,000).
And the return on marketing expenditures using RFM was more than eight
times (69.7% versus 8.5%) that of a mass mailing.
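The return figures quoted above can be reproduced from the mailing and buyer counts together with the cost inputs from the case study ($0.50 per mail-piece, $6.00 net profit per sale). The sketch below is only a check of that arithmetic. The function name and the dictionary keys are hypothetical.

```python
def campaign_summary(pieces_mailed, buyers,
                     cost_per_piece=0.50, net_profit_per_sale=6.00):
    """Mailing cost, profit after mailing cost, and return on mailing spend."""
    mailing_cost = pieces_mailed * cost_per_piece
    profit = buyers * net_profit_per_sale - mailing_cost
    return {
        "mailing_cost": mailing_cost,
        "profit": profit,
        "return_pct": 100 * profit / mailing_cost,
    }


mass = campaign_summary(pieces_mailed=50_000, buyers=4_522)
rfm = campaign_summary(pieces_mailed=22_731, buyers=3_214)

for label, result in (("Mass mailing", mass), ("RFM mailing", rfm)):
    print(f"{label}: cost ${result['mailing_cost']:,.2f}, "
          f"profit ${result['profit']:,.2f}, "
          f"return {result['return_pct']:.1f}%")
```

The RFM mailing earns more profit in absolute terms as well, because the cells left out responded below breakeven and would have cost more to mail than they returned.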
Number of Cells and Cell Size Considerations
As previously mentioned, RFM was initially utilized by companies that
operated in the B-to-C marketplace and generally possessed a very large
number of customers. The idea of generating 125 cells using quintiles for
R, F and M has been a very good practice as an initial modeling effort.
But what if you are a B-to-B marketer with relatively fewer customers? Or,
what if you are a B-to-C marketer with an extremely large file with millions
of customers? The answer is to use the same approach that is used in
data mining -- be flexible and experiment.
Establishing a minimum test cell size is a good place to start. Arthur
Hughes recommends the following formula:
Test Cell Size = 4 / Breakeven Response Rate.
The Breakeven Response Rate was addressed above in the BookBinders
case study. The number “4” is a constant that Hughes has found works
successfully based on many studies he has performed. BookBinders’
Breakeven Response Rate was 8.3%. Using the above formula, you
would need a minimum of 48 customers in each cell (4/0.083).
BookBinders actually had 400 customers per cell (50,000/125), so they had more than
adequate comfort in the significance of their test. In reality, BookBinders
could have created as many as 1,041 cells (50,000/48) if they were comfortable using
the minimum of 48 per cell. As an example, they could have used deciles
as opposed to quintiles and established 1,000 cells (10 x 10 x 10). The
more cells, the finer the analysis, but of course the law of diminishing
returns will apply.
The same cell size considerations apply to small files. If your
Breakeven Response Rate is 3%, your minimum cell size would be 133
customers (4/0.03). Therefore, if you have 12,000 customers you could
have about 90 cells (12,000/133). As such, a 5 x 5 x 4 (100 cells) or a 5 x
4 x 4 (80 cells) approach may be appropriate.
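The cell size arithmetic in this section can be sketched as follows. The function names are hypothetical, and the results are rounded down to match the figures quoted in the text.

```python
import math


def min_cell_size(breakeven_rate, constant=4):
    """Hughes's rule of thumb: test cell size = 4 / breakeven response rate."""
    return math.floor(constant / breakeven_rate)


def max_cells(n_customers, cell_size):
    """Largest number of cells that still gives each cell `cell_size` customers."""
    return n_customers // cell_size


# BookBinders: 8.3% breakeven rate, 50,000 test customers.
size = min_cell_size(0.083)
print(size, max_cells(50_000, size))

# Smaller file: 3% breakeven rate, 12,000 customers.
size = min_cell_size(0.03)
print(size, max_cells(12_000, size))
```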
RFM, BI and data mining are all part of an evolutionary path that is common to
many marketing organizations. While RFM has been practiced for over 40 years,
it still holds great value for many organizations. Its merits include:
Simplicity – easy to understand and implement
Relatively low cost
Data requirements are relatively low in terms of the
variables required and the number of records
Once utilized, it sets up a broader foundation (from an infrastructure
and business case perspective) to undertake more sophisticated data
mining
RFM’s challenges include:
Contact fatigue can be a problem for the higher scoring customers. A
high level cross-campaign communication strategy can help prevent
this.
Your lowest scoring customers may never hear from you. Again, a
cross-campaign communications plan should ensure that all of your
customers are communicated with periodically to ensure low scoring
customers are given the opportunity to meet their potential. Also, data
mining and the prediction of customer lifetime value can help address
this.
RFM includes only three variables. Data mining typically finds RFM-based
variables to be quite important in response models. But there
are additional variables that data mining typically uses (e.g., detailed
transaction, demographic and firmographic data) that help produce
improved results. Moreover, data mining techniques can also increase
response rates via the development of richer segment/cell profiles that
can be used to vary offer content and incentives.
As stated before, successful marketing efforts require analytics and
experimentation. RFM has proven itself as an effective approach to predicting
response and improving profitability. It can be an important stage in your
company’s evolution in marketing analytics.
About the Author
Jim has worked for leading companies in the Marketing Automation space (BI,
data mining, campaign management and eMarketing) for over 12 years. He has
directed SPSS’ pre-sales engineers in North America and has played the role of
Product Marketing Manager for Unica’s Model (data mining) application. Mr.
Stafford has developed response models and customer segmentation strategies
for many industries including: catalogers, financial services, retailers, and
hospitality.