1. Twentieth Americas Conference on Information Systems, Puerto Rico, 2015 1
Improving Enterprise Master Data Quality –
What Is The ROI?
Emergent Research Forum Paper
Stephan E. Zoder, MIPP
Informatica Corp.
szoder@informatica.com
Abstract
Better master data promises fewer process errors, rework and lost opportunities in marketing; order,
supply chain, billing and service management; risk and compliance; human resources as well as IT
operations. This study is a comprehensive evaluation of the financial impact and functional use case
distribution driven by improved, enterprise-wide master data quality. It also uncovers correlations
between organizational characteristics and projected outcomes. It is a practitioner's analysis of 53 client
engagements, aka Business Value Assessments, which evaluate how Master Data Management (MDM)
and Data Quality (DQ) tools are projected to enhance financial outcomes by fixing core data elements
used across two or more software applications, business processes or functional business areas.
Introduction
A high degree of master data quality is required to reduce transactional errors and rework and to enable accurate
reporting for improved decision making and, ultimately, better financial results. How can one get a
directionally correct, quick yet thorough analysis to assess whether supporting corporate information technology
(IT) investments should be made to effectively and sensibly support line-of-business strategy? (Kunisch et
al., 2014)
According to analysts, the digital universe will grow by a factor of 300. (Gantz & Reinsel, 2012) Moreover,
today, just 3% of available data is properly tagged and ready for analytics and even less, 0.5%, is actually
used in some sort of analytics. (Burn-Murdoch, 2012) Unbelievably, data quality and its value continue to
have a credibility problem. (Redman, 2013) Significant productivity loss from inefficiently managing the
quality of rapidly growing data undermines the process improvements already made. (Redman, 2013) These facts
apply not only to transactional data but also to the underlying master data, which often feeds analytical fact and
dimension tables.
Moreover, with the accelerating adoption and acceptance of cloud applications, line-of-business IT groups
added data quality tools to scrub their silos’ specific repositories. However, corporate functions like
finance continued to struggle with conflicting versions of corporate truth in their data. Naturally,
corporate IT departments started cracking down on the ungoverned, ad hoc purchases of these tools.
In addition, business leadership demanded that corporate IT provide financial justifications for data quality
projects. Surprisingly though, only 35% of organizations reported they needed a clear business case, 29%
indicated that senior-level buy-in is a must and only 7% had the ability to build such a case. (Rowe, 2012)
Moreover, only 60% of IT leaders then charge one of the enterprise architects with formulating a “financial
viability study” to ascertain savings. (Hayler, 2010) However, these “studies” are squarely about IT
talking to IT in “geek language”. (Cap Gemini and Informatica, 2011)
2. Strategic Information Systems Evaluation
2 Twenty First Americas Conference on Information Systems, Puerto Rico, 2015
The Hypothesis
The study seeks to (dis)prove assumptions around the financial value of improved data quality and potentially
uncover correlations between financial impact and organizational characteristics. Here, better data
quality includes improvements in master data duplication, standardization, linkage, terminology,
historical auditability, governance workflow, security and access. Thus, this Informatica-sponsored study
tests four hypotheses:
1. Unlike prior approaches ignoring the business impact, this study’s “phenomenon”-based method
and outcomes are reliable, acceptable to and valued enough by clients to result in a quicker
software evaluation phase, indicated by a shorter software evaluation cycle. (Goetz, 2014;
Madsbjerg and Rasmussen, 2014)
2. The study’s findings are significant and in line with other, smaller studies of similar nature to
warrant more attention by the business. (Nucleus Research, 2010)
3. The 5-year ROI derived from improved data quality management capabilities is the same as or
higher than that of other common business applications like CRM, at 50-300%. (Nucleus
Research, 2002 & 2010; Picarille, 2002)
4. Unlike other similar studies, this investigation will show correlations between financial outcomes
and client characteristics in at least one area. (Nucleus Research, 2010)
Research Method Overview
Unlike prior studies listed in this paper, this research assesses the financial impact of improved enterprise
master data quality across a much wider and functionally deeper sample set. (Karel, 2008; Cap Gemini
and Informatica, 2007; Harte Hanks Trillium Software, 2007; Barton and Court, 2012) The research aggregates
the results of 53 Business Value Assessment (BVA) engagements, centered on MDM and DQ tool capabilities and
conducted over the course of 18 months. Each Delphi-method workshop produced multiple individual
benefits, each with its own calculation logic (Figure 1), which fed a financial model (5-year, discounted and
phased comprehensive project cost vs benefit). Both the benefit calculations and the model were validated and
approved by client leadership to determine total annual benefit and ROI projections associated with
comprehensive, end-to-end data quality solutions.
Figure 1: Generic example of individual benefit use case calculation
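The paper does not disclose the exact logic of the financial model, so the following is only a minimal sketch of how a 5-year, discounted and phased cost-vs-benefit model of this kind could be computed. The function name, the end-of-year discounting convention, the definition of ROI as NPV over discounted cost, the payback rule and all input figures are illustrative assumptions, not values or formulas taken from the engagements.

```python
from typing import List, Optional, Tuple


def five_year_model(
    full_annual_benefit: float,
    benefit_phasing: List[float],
    annual_costs: List[float],
    discount_rate: float,
) -> Tuple[float, float, Optional[int]]:
    """Return (NPV, ROI in percent, payback in months) for a phased project.

    benefit_phasing[i] is the share of the full annual benefit realized in
    year i+1; annual_costs[i] is the total project cost booked in year i+1.
    """
    if len(benefit_phasing) != len(annual_costs):
        raise ValueError("phasing and cost schedules must cover the same years")

    pv_benefits = 0.0
    pv_costs = 0.0
    for year, (share, cost) in enumerate(zip(benefit_phasing, annual_costs), start=1):
        factor = (1.0 + discount_rate) ** year  # assumption: end-of-year discounting
        pv_benefits += full_annual_benefit * share / factor
        pv_costs += cost / factor
    if pv_costs <= 0.0:
        raise ValueError("discounted cost must be positive to compute an ROI")

    npv = pv_benefits - pv_costs
    roi_pct = 100.0 * npv / pv_costs  # assumption: ROI = NPV / discounted cost

    # Assumption: each year's cost is paid at the start of the year, benefits
    # accrue evenly per month, and payback is the first month in which the
    # cumulative undiscounted cash position is no longer negative.
    cumulative = 0.0
    month = 0
    payback: Optional[int] = None
    for share, cost in zip(benefit_phasing, annual_costs):
        cumulative -= cost
        monthly_benefit = full_annual_benefit * share / 12.0
        for _ in range(12):
            month += 1
            cumulative += monthly_benefit
            if payback is None and cumulative >= 0.0:
                payback = month
    return npv, roi_pct, payback


# Hypothetical inputs in $000: benefits ramp up over the first two years.
npv, roi_pct, payback_months = five_year_model(
    full_annual_benefit=12_000.0,
    benefit_phasing=[0.5, 0.8, 1.0, 1.0, 1.0],
    annual_costs=[3_000.0, 500.0, 500.0, 500.0, 500.0],
    discount_rate=0.10,
)
```

Running a conservative and a likely set of benefit inputs through the same function would yield the two ROI figures per engagement that the results section compares.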
A number of aspects influence the reliability of the individual benefit calculations and their interpretation,
thus creating a degree of statistical variability. Examples are: consultant and interviewee expertise,
engagement scope, benchmark data fit, sample size and make-up, available model cost inputs (required
hard/software license, maintenance and implementation cost; IT operation and change management cost;
training, etc.), engagement selection criteria, dimension segmentation, etc. (Figure 2)
Figure 2: Sample population by geographical region, employee count and annual
revenue
Figure 3: Sample population by sector
Figure 4: Sample population by supply chain and distribution model
Results
Overall, the positive cross-industry financial impact is significant under both the conservative and likely
scenarios (the optimistic scenario was omitted), no matter from what angle the analysis was undertaken. (Figure 5)
However, the variances are sizeable within each dimension given the custom nature of the methodology.
Variances also become larger with increasing Return-on-Investment (ROI), yet the relationship between likely and
conservative ROI shows a tighter fit under the conservative approach. (Figure 6) Only one
organization had a negative (conservative) ROI, as shown in Table 1.
Table 1: Cross-industry results

Over 5 Years                        Mean       Min     Max        Median
ROI - conservative                  1279%      -15%    5611%      944%
ROI - likely                        2579%      75%     15306%     1424%
Payback - likely, in months         10         2       35         9
NPV - likely, in $000               $61,369    $704    $303,073   $32,762
Annual benefit - likely, in $000    $24,643    $740    $85,762    $12,104

While the figures seem high, when viewed in the context of the 2,500% ROI expected from an average e-mail marketing
campaign, this study’s findings about the value of fixing data for enterprise-wide use case
scenarios appear very feasible. (Direct Marketing Association, 2014) Moreover, given publicly available
data from other software vendors, it appears that this study’s findings are directionally correct, as
envisioned in hypotheses two and three. (Oracle, 2010)
Figure 5: Mean (blue) and median (red) annual benefit amounts by use case category (aka
value driver)
The 22 value drivers contributing to the results span IT, marketing and sales, service and supply
chain operations, finance and human resources. They are an attempt to standardize often fairly
sector-specific variants, e.g. upsell vs higher market-basket revenue in retail, for cross-sector analysis.
They are (starting from the left in Figure 5):
1. Infrastructure cost reduction from hardware, software and IT operations based on deduplication
2. Cost reduction from improved IT and business productivity from less time spent fixing master
data records
3. Improved profit (revenue) from better prospect identification and conversion to valid leads due to
“cleaner” linkage of demographic and interaction data to individual master entities.
4. Reduced cost from decreasing 3rd party data services contract volume by leveraging clean, internal
attributes more often
5. Reduction in (e-)mailing cost from bounces/returned items and handling expense from cleaner
contact and preference record
6. Improved profit (revenue) from higher lead-to-customer conversion by linking transactions to
each record
7. Reduced customer acquisition and maintenance cost via improved segmentation, preferences,
past interaction history.
8. Improved profit from customer cross-selling by enhancing information like parts availability with
hierarchical data structures, e.g. product, thereby avoiding excess discounting.
9. Improved profit (revenue) from reduction in time to cover open sales territories by clearly sizing
market potential
10. Improved profit (revenue) via increased client retention from timely and full capture of all
transactions and interactions to provide superior offers and service
11. Cost reduction via improved service center productivity by reducing required application
complexity and information conflicts
12. Reduction in maintenance operations budget via more efficient use of resources deployed based
on more accurate predictive models using correct and historical asset location, configuration and
condition
13. Improved profit (revenue) from decreased asset downtime by linking accurate supply and parts
information
14. Reduction of commercial penalties by reducing reporting errors
15. Reduction in supplier spend and inventory carrying cost via correct product information and
relationships (alternates) associated with supplier data to improve discounting
16. Accelerated profit (revenue) and productivity from more streamlined new product introduction
across channels
17. Reduction in logistics cost by decreasing instances of expediting from better part information and
handling cost
18. Improved profit (revenue) from collections via fewer billing errors resulting from incorrect
payment terms and codes (e.g. risk score, account status)
19. Lower tax/insurance premiums and regulatory fines via an accurate, historical representation of
asset location, configuration and performance
20. Reduced opportunity cost from improved budget (CAPEX) accuracy via a true picture of assets
21. Reduced compliance expense from internal and 3rd party audit expenses
22. Reduced human resource budget from quicker employee onboarding and reduced turnover by
providing overall better information to employees increasing job satisfaction and reducing ramp-
up time
Figure 6: Correlation between conservative and likely ROIs per engagement
An unexpected result of this study was that the overwhelmingly positive results did not shorten an organization's
overall software evaluation cycle (here 12-15 months for MDM). The reduction was only
13 days, which effectively rejects hypothesis one, as the blended median total length for these types
of technologies is about 14 months.
Nevertheless, industry-specific results were highly encouraging. (Table 2) It has to be noted that some
verticals had underperforming results (e.g. healthcare) due to narrow engagement scopes. Others, like
insurance, likely underperformed because of the mid-sized nature of this population. Conversely, the significantly
above cross-industry average results in banking are due to a single large investment bank’s projections.
As a result, the study confirms hypothesis four, as business models that combine asset intensity with high
transaction margin and frequency appear to reap larger benefits. This would explain the healthcare,
retail and insurance sectors’ lower than cross-sector average performance.
Table 2: Mean industry-specific results
(Note: Life Sciences had no conservative findings due to client preference)
Avg Metrics by Industry

Dimension                Likely Benefit in $000   Likely ROI   Conservative ROI   ROI Delta   Likely Payback in mos
Banking                  $27,683                  5576%        2979%              47%         8.0
Insurance                $5,202                   1217%        833%               32%         7.0
Consumer Products        $27,583                  1424%        988%               31%         10.2
Communications & Media   $17,255                  1424%        1000%              30%         8.0
Healthcare               $4,462                   960%         557%               42%         12.3
Technology               $14,645                  1566%        877%               44%         7.8
Services                 $19,943                  1877%        1383%              26%         8.1
Retail                   $18,185                  3213%        2023%              37%         9.0
Petrochemical            $68,078                  8704%        2296%              74%         6.0
Industrial Mfg           $35,521                  3440%        2239%              35%         7.8
Travel & Transport       $8,518                   772%         450%               42%         15.0
Public Sector            $24,724                  1979%        930%               53%         6.0
Utilities                $47,724                  2154%        944%               56%         14.0
Life Sciences            $8,427                   1253%        0%                 100%        10.0
Aerospace & Defense      $41,690                  3127%        1685%              46%         11.0
x-Sector Mean            $24,643                  2579%        1279%              46%         9
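The paper does not define the "ROI Delta" column. The values in Table 2 are consistent with reading it as the share of the likely ROI that is lost under conservative assumptions, and with the cross-sector row being the mean of the industry deltas rather than a delta of the cross-sector ROIs. The sketch below illustrates that reading; it is inferred from the table, and the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple


def roi_delta_pct(likely_roi: float, conservative_roi: float) -> float:
    """Relative gap between likely and conservative ROI, in percent.

    Assumed definition: (likely - conservative) / likely. The paper does not
    state the formula; this one is inferred from the published values.
    """
    return 100.0 * (likely_roi - conservative_roi) / likely_roi


# (likely ROI %, conservative ROI %) per industry, copied from Table 2.
table_2: Dict[str, Tuple[float, float]] = {
    "Banking": (5576, 2979),
    "Insurance": (1217, 833),
    "Consumer Products": (1424, 988),
    "Communications & Media": (1424, 1000),
    "Healthcare": (960, 557),
    "Technology": (1566, 877),
    "Services": (1877, 1383),
    "Retail": (3213, 2023),
    "Petrochemical": (8704, 2296),
    "Industrial Mfg": (3440, 2239),
    "Travel & Transport": (772, 450),
    "Public Sector": (1979, 930),
    "Utilities": (2154, 944),
    "Life Sciences": (1253, 0),  # no conservative findings (client preference)
    "Aerospace & Defense": (3127, 1685),
}

deltas = {name: roi_delta_pct(likely, cons) for name, (likely, cons) in table_2.items()}

# Mean of the per-industry deltas versus a delta of the cross-sector mean ROIs.
mean_of_deltas = mean(deltas.values())
delta_of_means = roi_delta_pct(2579, 1279)
```

Under this reading, the Life Sciences row with no conservative figure contributes a delta of 100% and therefore pulls the cross-sector mean of deltas upward.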
To put the above results and the analysis in Figure 6 into perspective, the author chose to contrast mean with
median figures for select industries only, given the statistical constraints of the sample size. (Appendix A)
A cross-sector median-to-mean relationship of about 2x for ROI and 3x for annual benefit is
observable.
Conclusion
From a business model point of view, some interesting patterns emerged to further confirm hypothesis
four. As expected, the more complex the supply chain model (SCM), the more value can be gained from
fixing data quality. In turn, simpler supply chains benefit from simpler project requirements, which reduce
the overall payback period. (Appendix B)
Similarly, the author suspected that more complex distribution chains via channel partners could also
gain higher value as each value-add process step leverages prior data improvements. This appears to
confirm hypothesis four once again. A low annual benefit in category “direct business to consumer“
(Direct B2C) is likely due to generally lower operating margins found in consumer retail. (Appendix C)
When viewed through the lens of size, as expected, the larger the annual revenue of an organization, the
more leveraged its assets and, hence, the more positive the return on its investment in data quality becomes. Ignoring
outliers, a clear upward trend is apparent. (Appendix D)
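One way to make the "upward trend" claim checkable is a rank correlation between revenue band and mean likely annual benefit. The paper reports no such statistic; Spearman's rho is our choice for illustration, the helper names are hypothetical, and treating the $6B-$10B band as the outlier is our reading of the Appendix D figures. The sketch assumes no tied values, which holds for this data.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation using the no-ties shortcut formula."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Mean likely annual benefit in $000 per revenue band, copied from Appendix D.
bands = ["<$100M", "$100M-$500M", "$500M-$1B", "$1B-$3B", "$3B-$6B", "$6B-$10B", ">$10B"]
benefit = [1525, 8212, 12847, 19869, 25049, 5587, 31027]
band_order = list(range(1, len(bands) + 1))

rho_all = spearman_rho(band_order, benefit)

# Same calculation with the $6B-$10B band left out.
keep = [i for i, band in enumerate(bands) if band != "$6B-$10B"]
rho_without_outlier = spearman_rho(
    [band_order[i] for i in keep], [benefit[i] for i in keep]
)
```

With only seven bands, each an average over a handful of engagements, such a coefficient is descriptive at best and says nothing about statistical significance.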
On the other hand, looking at workforce size via employee count does not reveal a clear trend. The
author attributes this to the fact that data quality is often a back-office function, which does not automatically
scale with workforce growth. (Appendix E) Nevertheless, similar to prior perspectives, the general ROI
figures do not vary significantly.
IT department staff often view value from the perspective of data size; thus, the author chose to assess
the results from the perspective of what the workshop interviewees shared in terms of the estimated
number of true surviving (post-consolidation) records, in most instances customers, suppliers, locations,
accounts and products. The author expected that more cleaned-up records would result in a fairly
steady increase in annual benefit, some marginal decrease in ROI and an increase in payback period due
to increasingly complex project requirements, e.g. new business models acquired through M&A.
(Appendix F) However, the variability of the data leads to the anecdotal conclusion that not record volume
but the number of repositories drives complexity and thus cost and, ultimately, savings potential.
Given prior deltas between median and mean results, the author also decided to assess the mean-to-median
delta phenomenon on one particular benefit use case (IT & business productivity increase) by
presenting it as full-time equivalents (FTEs) for three blended rates by number of master data records.
(Appendix G) These rates should correspond to organizations operating in developing as well as industrialized
nations. Here we find an unusually high number of FTEs for organizations with fewer than 100,000
master data records, which are typically smaller companies by revenue, suggesting that almost all
employees dedicate a sizeable portion of their workday to data fixes. As the record count increases, we see
a similar pattern as in prior evaluations, in addition to a wider variance between the two statistical
measures. While drastic decreases and increases between neighboring record-count categories appear to
be due to segment size selection, two further interpretations become apparent if we look at the two tables
holistically. First, as record count increases, productivity savings (prior loss) increase. Second, one
category (750,000-1 million) is clearly the one with the highest potential savings from freed-up human
capacity.
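The FTE figures in Appendix G appear to be a monetized annual productivity benefit divided by a blended annual rate: for the category below 100,000 records, 186 FTEs at $40k, 124 at $60k and 93 at $80k all imply the same underlying amount. The sketch below back-calculates that amount and re-derives the three figures. This reading is inferred from the table rather than stated in the paper, and the function name is hypothetical.

```python
from typing import Dict


def fte_equivalents(annual_benefit_usd: float, blended_rate_usd: float) -> int:
    """Express an annual productivity benefit as full-time equivalents."""
    if blended_rate_usd <= 0:
        raise ValueError("blended rate must be positive")
    return round(annual_benefit_usd / blended_rate_usd)


# Back-calculated from the mean table in Appendix G (<100k records, $40k rate).
implied_benefit_usd = 186 * 40_000

ftes_by_rate: Dict[int, int] = {
    rate: fte_equivalents(implied_benefit_usd, rate)
    for rate in (40_000, 60_000, 80_000)
}
```

Because the same benefit amount sits behind all three columns, the choice of blended rate rescales the FTE count but does not change the ranking of record-count categories.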
Upon closer inspection, each industry also appears to have a unique mix of its largest mean value drivers.
A few examples are captured in the stack charts in Appendix H.
In summary, three (2-4) of the four hypotheses were directionally confirmed. While the author does not
expect additional engagements to change the findings materially, additional steps could aid in achieving
further results clarity. Amongst the potential refinements are a more defined value driver categorization
and more comprehensive model cost inputs. The integration of post-implementation results a year after
a project’s go-live could also further enhance the current pre-sales projection findings. However, this
analysis will have to be viewed under the pre-sales vs post-sale implementation scope lens.
REFERENCES
Barton, D. and Court, D., 2012, “Making advanced analytics work for you”, Harvard Business Review,
October 2012, p64.
Burn-Murdoch, J., 2012, “Study: less than 1% of the world's data is analysed, over 80% is unprotected”,
December 19, 2012.
Cap Gemini and Informatica, 2011, “Building the business case for master data management- Strategies
to quantify and articulate the business value of MDM”, CapGemini.com, May 2011, p5.
Gantz, J. and Reinsel, D., 2012, “The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and
Biggest Growth in the Far East”, December 2012, p1.
Goetz, M., 2014, “The Seven Deadly Sins of Data Management Investment and Planning”, Forrester
Research, January 14, 2014.
Harte Hanks Trillium Software, 2007, “Building a tangible ROI for data quality”,
progressivemediagroup.com, 2007.
Hayler, A., 2010, “Building a robust business case for high quality master data”, Information Difference,
February 2010, p3.
Karel, R., 2008, “The ROI of master data management”, Forrester Research, October 29, 2008, p7ff.
Kunisch, S. et al., 2014, “Why corporate functions stumble”, Harvard Business Review, December 2014,
p112.
Madsbjerg, C. and Rasmussen, M., 2014, “An Anthropologist walks into a bar…”, Harvard Business
Review, March 2014, p81.
Nucleus Research, 2010, “ROI Case Study: Oracle MDM – Anonymous Utility Company”, oracle.com,
June 2010, p1.
Nucleus Research, 2002, “Assessing the real ROI from Siebel”, oracle.com, 2002, p3.
Oracle, 2010, “ROI Case Study – Oracle MDM Anonymous Utility Company”, oracle.com, June 2010.
Picarille, L., 2002, “Nucleus Research – Siebel Spat”, destinationCRM.com, September 25, 2002.
Redman, T., 2013, “Data’s credibility problem”, HBR.org, December 2013.
Rowe, N., 2012, “The state of master data management 2012”, Aberdeen Group, May 2012, p4.
Direct Marketing Association, 2014, “Email marketers optimistic about 2014 budget as ROI hits
2,500%”, dma.org, Feb 4, 2014.
Appendix A

Median vs Mean Metrics by Industry (Annual Benefit in $ Thousands, ROI in %)

Dimension              Likely Median ROI   Likely Mean ROI   Likely Median Annual Benefit   Likely Mean Annual Benefit
Banking                2816%               5576%             $20,140                        $27,683
Insurance              714%                1217%             $5,714                         $5,202
Consumer Products      1438%               1424%             $6,403                         $27,583
Healthcare             551%                960%              $5,185                         $4,462
Technology             973%                1566%             $12,104                        $14,645
Services               892%                1877%             $11,327                        $19,943
Retail                 973%                3213%             $12,104                        $18,185
x-Sector Mean/Median   1424%               2579%             $12,104                        $24,643

Appendix B

Avg Metrics by Supply Model

Dimension           Likely Benefit in $000   Likely ROI   Conservative ROI   ROI Delta   Likely Payback in mos
Complex SCM         $29,372                  2259%        1187%              47%         9.7
Simple SCM          $18,904                  2213%        1270%              43%         7.8
Distribution only   $14,918                  2181%        1198%              45%         11.2
Service only        $22,506                  2552%        1440%              44%         8.1
Appendix C
(Note: B=Business, C=Consumer, Channel=using reseller)

Avg Metrics by Distribution Model

Dimension     Likely Benefit in $000   Likely ROI   Conservative ROI   ROI Delta   Likely Payback in mos
B2B direct    $19,605                  2097%        1210%              42%         10.3
B2C direct    $10,876                  1110%        622%               44%         10.3
B2C channel   $43,597                  1207%        661%               45%         8.7
B2B channel   $30,173                  3561%        1673%              53%         7.7
B&C direct    $25,627                  3163%        1532%              52%         9.6
B&C channel   $22,274                  1863%        1368%              27%         11.4

Appendix D
(Note: M=million, B=Billion)

Avg Metrics by Revenue

Dimension     Likely Benefit in $000   Likely ROI   Conservative ROI   ROI Delta   Likely Payback in mos
<$100M        $1,525                   582%         251%               57%         12.0
$100M-$500M   $8,212                   1997%        1269%              36%         13.6
$500M-$1B     $12,847                  1901%        1265%              33%         8.0
$1B-$3B       $19,869                  1457%        939%               36%         9.9
$3B-$6B       $25,049                  1843%        961%               48%         8.8
$6B-$10B      $5,587                   584%         377%               35%         16.0
>$10B         $31,027                  3147%        1556%              51%         8.6
Appendix E

Avg Metrics by Employee Count

Dimension        Likely Benefit in $000   Likely ROI   Conservative ROI   ROI Delta   Likely Payback in mos
<10,000          $16,151                  1753%        1055%              40%         11.0
10,000-25,000    $21,466                  1763%        899%               49%         9.6
25,000-50,000    $36,682                  3159%        1743%              45%         8.2
50,000-75,000    $12,267                  1735%        1041%              40%         8.5
75,000-100,000   na                       na           na                 na          na
>100,000         $27,420                  3636%        1659%              54%         9.6

Appendix F
(Note: k=thousand, M=million)

Avg Metrics by Record Count

Dimension   Likely Benefit in $000   Likely ROI   Conservative ROI   ROI Delta   Likely Payback in mos
<100k       $17,597                  1707%        1001%              41%         10.0
100k-300k   $10,548                  1096%        669%               39%         14.7
300k-500k   $5,095                   3935%        2689%              32%         6.0
500k-1M     $25,075                  2278%        1307%              43%         9.9
1M-2M       $29,500                  3245%        1449%              55%         7.7
2M-3M       $17,675                  984%         759%               23%         12.3
3M-5M       $21,385                  2844%        1382%              51%         12.2
5M-10M      $36,686                  2661%        1788%              33%         8.4
>10M        $15,616                  1818%        867%               52%         8.2
Appendix G
(Note: k=thousand, M=million, FTE=Full Time Equivalence)

Average IT & Business FTEs involved in Data Quality by Record Count

Dimension   Blended FTE @ $40k   Blended FTE @ $60k   Blended FTE @ $80k
<100k       186                  124                  93
100k-300k   11                   7                    5
300k-500k   16                   10                   8
500k-750k   94                   62                   47
750k-1M     158                  106                  79
1M-3M       86                   57                   43
3M-5M       95                   64                   48
5M-10M      216                  144                  108
>10M        91                   60                   45

Median IT & Business FTEs involved in Data Quality by Record Count

Dimension   Blended FTE @ $40k   Blended FTE @ $60k   Blended FTE @ $80k
<100k       41                   28                   21
100k-300k   11                   8                    6
300k-500k   16                   10                   8
500k-750k   48                   32                   24
750k-1M     228                  152                  114
1M-3M       13                   8                    6
3M-5M       43                   29                   22
5M-10M      77                   51                   39
>10M        88                   59                   44

Appendix H