IJQRM (2014) Statistical Comparison of Final Scores In QFD

QUALITY PAPER
Statistical comparison of final
weight scores in quality function
deployment (QFD) studies
Zafar Iqbal and Nigel P. Grigg
School of Engineering and Advanced Technology, Massey University,
Palmerston North, New Zealand
K. Govindaraju
Institute of Fundamental Sciences, Massey University, Palmerston North,
New Zealand, and
Nicola Campbell-Allen
School of Engineering and Advanced Technology, Massey University,
Palmerston North, New Zealand
Abstract
Purpose – Quality function deployment (QFD) is a methodology to translate the “voice of the
customer” into engineering/technical specifications (HOWs) to be followed in the design of products or
services. For the method to be effective, QFD practitioners need to be able to accurately differentiate
between the final weights (FWs) that have been assigned to HOWs in the house of quality matrix.
The paper aims to introduce a statistical testing procedure to determine whether the FWs of HOWs are
significantly different and investigate the robustness of different rating scales used in QFD practice in
contributing to these differences.
Design/methodology/approach – Using a range of published QFD examples, the paper uses a
parametric bootstrap testing procedure to test the significance of the differences between the FWs by
generating simulated random samples based on a theoretical probability model. The paper then
determines the significance or otherwise of the differences between: the two most extreme FWs and all
pairs of FWs. Finally, the paper checks the robustness of different attribute rating scales (linear vs
non-linear) in the context of these testing procedures.
Findings – The paper demonstrates that not all of the differences that exist between the FWs of
HOW attributes are in fact significant. In the absence of such a procedure, there is no reliable
analytical basis for QFD practitioners to determine whether FWs are significantly different, and they
may wrongly prioritise one engineering attribute over another.
Originality/value – This is the first article to test the significance of the differences between FWs of
HOWs and to determine the robustness of the different strength scales used in the relationship matrix.
Keywords Quality function deployment, House of quality, Parametric bootstrapping,
Relationship matrix
Paper type Research paper
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/0265-671X.htm
Received 9 December 2012
Revised 4 June 2013
Accepted 5 June 2013
International Journal of Quality & Reliability Management
Vol. 31 No. 2, 2014
pp. 184-204
© Emerald Group Publishing Limited
0265-671X
DOI 10.1108/IJQRM-06-2013-0092
1. Introduction
Quality function deployment (QFD) is a methodology used to translate the “voice of
the customer” (VOC) into engineering and technical specifications to be followed in the
design of products or services. Akao (1990) has reported that when appropriately
applied, QFD has been effective in substantially reducing product development lead
times. The main goal in implementing QFD is to improve the quality of the product or
service based on customer-defined requirements and expectations. Although QFD is a
popular and widely used technique, as Enriquez et al. (2004, cited in Garver, 2012) point
out, ongoing research still seeks to examine the assumptions and methods used within
QFD with a view to continuously improving the methodology. In particular, there is a need to
determine importance scores for the customer accurately, because with inaccurate
data “the entire House of quality (HOQ) is built upon a weak foundation” (Garver, 2012).
Figure 1 shows a typical “HOQ”, as used within QFD. This structured methodology
is intended to effectively deploy the VOC. It consists of distinct “rooms” (denoted by
rectangles), topped by a “roof” (denoted by the triangle at the top). Engineers and other
product/service development practitioners collect data from customers relating to their
requirements and desires (WHATs). These are weighted for importance, and assigned
a customer priority rating. They are then translated into engineering factors and
requirements (HOWs). The triangular elements shown are used to record the strengths
of intercorrelations between the WHATs or the HOWs. The relationship matrix
records the strengths of the correlations between WHATs and HOWs. Data on
competitor performance is further integrated, and a vector of final weights (FWs) for
engineering priorities (HOWs) can be calculated (the bottom element of the HOQ).
Figure 1. A typical HOQ
Statistical
comparison of
FW scores

For the method to be effective, therefore, the differences observed between the FW
scores should be meaningful and statistically significant. Otherwise, the FW scores
will not provide a valid and reliable basis for the determination of engineering
priorities in the design of the product or service.
The first aim of the research which is presented in this paper was to determine
whether the resulting FWs in a number of QFD examples are in fact (statistically)
significantly different from each other, as measured against the background level of
common cause (random) variation that exists within the relationship matrix from which
they have been derived. Using a range of empirical examples taken from literature, we
use a parametric bootstrap testing procedure to test the statistical significance of the
differences between the FWs via two testing procedures: first, we test the statistical
significance of the differences between only the highest and lowest ranked FWs; second,
we test the significance of the differences between all pairs of FW ratings.
The relationship matrix plays a key role in determining the final HOW weights, but
QFD practice employs a wide range of rating scales. The second aim of our research
was therefore to investigate the robustness of relationship scales by applying different
linear and non-linear changes to the originally reported rating scales. Our findings in
relation to these aims, as reported in this paper, have implications for practitioners,
academics and others involved in QFD research, in determining the degree of
importance to place on FWs.
2. QFD and its factors
In developing a HOQ, the customer, competitor and engineering data that populate the
matrices and vectors are of an inherently qualitative nature, and are operationalised
into numerical values through rating scales that transform linguistic criteria into
numeric data. A wide variety of practice is observable in the application of these
linguistic-numeric scales. In the rating of customer priority, competitor position, etc.
there is not only potential variability in determining which value on a given scale most
closely aligns with the perceived “reality”; but there is also wide variation in the scales
that are applied by practitioners. In the following section we explicate the commonly
used linguistic-numeric scales and outline their use in QFD.
2.1 Customer priority rating scale
Once a QFD developer has converted the VOC into specific requirements (WHATs),
customers are asked to assign priority ratings to those WHATs. The resulting customer
priority ratings are used, together with the relationship matrix, to derive the FWs of HOWs
(the engineering/technical criteria required to achieve the WHATs). Table I summarises
several different priority rating scales for the importance of WHATs, as reported in the literature.
Table I. Customer priority rating scales
Authors                                        Customer priority rating scale
Bouchereau and Rowlands (2000)                 1-3
Dikmen et al. (2005)                           1-9
Tanik (2010)                                   1-10
Majid and David (1994) and Utne (2009)         1-5
Olewnik and Lewis (2008), Masui et al. (2003)  1, 3, 9
Park and Kim (1998)                            Proportions of 1
2.2 Relationship matrix
In the HOQ, the relationship matrix denotes the strength of relationship between
WHATs and HOWs. In the literature, three-point or five-point linguistic-numeric scales are
mostly used for different strengths of relationships. For example, for “weak”,
“medium” and “strong” relationships, Tan et al. (1998) used 1, 3, 5, respectively; Jeong
and Oh (1998) used 1, 3, 10; and Bouchereau and Rowlands (2000), Dikmen et al. (2005),
Ghiya et al. (1999), Majid and David (1994) and Zhang (1999) used 1, 3 and 9. We also see
five-point scales: 1, 3, 5, 7, 9 reported by Chan and Wu (1998), and 1, 2, 3, 4, 5 by Crowe
and Cheng (1996), to represent “very weak”, “weak”, “medium”, “strong” and “very
strong” relationships. Across these and other scales we have observed, the scales are
generally based on a median value of 3. For our study these scales play an
important role in testing the FWs, as described in Section 3.4.
2.3 Competitor’s data and improvement ratio
Although not all practitioners use competitors’ data, it is
considered good practice to look at the competitors in the market and make this
assessment part of a robust QFD process (Jeong and Oh, 1998). Table II shows the
most widely used qualitative scales for rating a company’s position in the market from the customer’s
point of view. Competitors’ data not only contribute to the FWs of HOWs, but also help
to determine the current position in the market and to set future goals. The improvement
ratio, also shown in Table II, may substantially change the ranking of FWs. The
empirical examples that we use in our study do not include improvement ratios,
but if a QFD process includes both competitors’ data and an improvement ratio, these can
also contribute to the FWs along with the customer priority rating.
2.4 HOWs final weights
The different parts of the HOQ are used to calculate the FWs of technical descriptors
(HOWs). In the literature, the following two methods are commonly used to find the FWs of
HOWs.
Method 1:
\[ FW_j = \sum_{i=1}^{r} R_{ij} \times P_i, \qquad i = 1, \ldots, r; \; j = 1, \ldots, c \tag{1} \]
where: R is the r × c relationship matrix; and P is the customer priority rating (Franceschini
and Rossetto, 2002; Thakkar et al., 2006; Tan et al., 1998).
Table II. Competitor’s rating scale, company goals and improvement ratio
Author(s)                                                      Low – high
Tanik (2010), Hochman and O’Connell (1993), Dikmen et al.
(2005), Chin et al. (2001), Bouchereau and Rowlands (2000),
Hoyle and Chen (2007)                                          1-5
Utne (2009)                                                    1-4
Jeong and Oh (1998)                                            1-7
Goal: the next highest level chosen, as compared to the current level of the company
Improvement ratio: goal/company current level
Method 2:
\[ FW_j = \sum_{i=1}^{r} R_{ij} \times P_i \times I_i, \qquad i = 1, \ldots, r; \; j = 1, \ldots, c \tag{2} \]
where: R is the relationship matrix; P is the customer priority rating; and I is the
improvement ratio (Jeong and Oh, 1998; Bouchereau and Rowlands, 2000; Hoyle and
Chen, 2007).
Using these methods, FW ratings are obtained that address the customers’ needs, in
order to design or improve products and services. The FWs then must be prioritised to
determine which technical aspect to tackle in which order. The following approaches for
prioritising the FWs have been discussed in the literature: analytic hierarchy process (AHP),
“fuzzy QFD”, “statistically extended QFD” and “dynamic QFD” (Mehrjerdi, 2010). Most
practitioners use customer priority ratings and the relationship matrix to find the FWs
of HOWs. Some also make use of competitor’s data in the determination. The final
HOWs weights give the importance of each technical aspect to be resolved. Usually, the
weights are ranked in descending order, with the number 1 ranked weight being the
most important HOW to resolve, followed by the number 2 ranked weight and so on.
Table III shows the customer priority rating (“customer weight”), relationship matrix
and the FWs of HOWs in a published example from Tan et al. (1998).
Table IV shows the FWs from Table III sorted into descending order, with H1 as the
most important (with priority weight 51) down to H4 as the least (with priority weight 9).
We now test the statistical significance of these FWs in relation to the common cause
variation underscoring each FW value. That is, we will determine the extent to which the
differences between FWs indicate special cause variation, and therefore are statistically
significant.
Table III. Empirical data for QFD
Rows W1-W8 are the voice of customer (WHATs); Weight is the customer weight; columns H1-H12 are the technical aspects (HOWs).
WHATs         Weight   H1   H2   H3   H4   H5   H6   H7   H8   H9  H10  H11  H12
W1                 6    5    0    0    0    1    0    0    0    0    0    0    0
W2                 3    0    1    5    3    0    0    0    0    0    0    0    0
W3                 1    0    0    1    0    5    0    0    1    0    0    0    0
W4                 2    0    5    0    0    0    0    0    0    0    0    0    0
W5                 4    0    0    0    0    1    0    0    0    5    0    0    0
W6                 8    0    0    0    0    0    0    3    3    0    5    0    0
W7                 5    0    0    0    0    0    5    3    0    0    0    0    0
W8                 7    3    3    0    0    0    0    0    0    0    0    3    5
Final weights          51   34   16    9   15   25   39   25   20   40   21   35
Source: From Tan et al. (1998)
Table IV. Final weights of HOWs in Table III, sorted into descending order
No.               1    2    3    4    5    6    7    8    9   10   11   12
HOWs             H1  H10   H7  H12   H2   H6   H8  H11   H9   H3   H5   H4
Final weights    51   40   39   35   34   25   25   21   20   16   15    9
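As a minimal sketch of Methods 1 and 2, the following Python fragment recomputes the final weights of the Tan et al. (1998) example from the customer weights and relationship matrix in Table III. The function name `final_weights` and its optional `improvement` argument are illustrative assumptions rather than part of the original study; calling it without improvement ratios corresponds to Method 1.

```python
# Illustrative sketch only: names are assumptions; the data are those of Table III.

def final_weights(relationship, priorities, improvement=None):
    """FW_j = sum_i R_ij * P_i (Method 1), multiplied by I_i if given (Method 2)."""
    if improvement is None:
        improvement = [1] * len(priorities)  # Method 1: no improvement ratio
    n_hows = len(relationship[0])
    weights = [0] * n_hows
    for row, p, ratio in zip(relationship, priorities, improvement):
        for j, r in enumerate(row):
            weights[j] += r * p * ratio
    return weights

# Customer weights for W1..W8 and relationship matrix (rows W1..W8, columns H1..H12)
PRIORITIES = [6, 3, 1, 2, 4, 8, 5, 7]
RELATIONSHIP = [
    [5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 5, 0, 0, 1, 0, 0, 0, 0],
    [0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 5, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 3, 3, 0, 5, 0, 0],
    [0, 0, 0, 0, 0, 5, 3, 0, 0, 0, 0, 0],
    [3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 5],
]

fw = final_weights(RELATIONSHIP, PRIORITIES)
# Rank HOWs from highest to lowest final weight (ties keep column order)
ranking = [f"H{j + 1}" for j in sorted(range(len(fw)), key=lambda j: -fw[j])]
print(fw)
print(ranking)
```

Run as written, the computed weights agree with the final-weights row of Table III and the ordering agrees with Table IV.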
Significance testing of hypotheses is a vital aspect of statistical
inference. In our testing of FWs, if the difference between two FWs is found to be
insignificant, this implies that although the FW values differ from each other,
the variation between these weights is not significantly different from the common cause
(random) variation within the relationship matrix data that contributed to the FW
values. If, alternatively, testing reveals significant differences between FWs, then the
variation between FWs is attributable to some special cause and we can infer that one
weight does indeed have priority over another. As various
engineering factors are required to develop or improve a product or service, knowing whether or
not two factors are genuinely different from each other in the presence of given data
will be beneficial for engineers and practitioners. This can save time and cost, and
improve the quality of decision making when using QFD. In the next section we
therefore investigate a statistical procedure to test the significance of the differences between
the FWs of HOWs.
3. Methodology: testing of FW differences using a parametric bootstrap
method
3.1 Monte Carlo testing
Monte Carlo methods were first applied by scientists working on the development of nuclear
weapons at Los Alamos in the 1940s, and they now have applications in
various disciplines, including mathematics, statistics, physics, engineering and chemistry
(Kalos and Whitlock, 2009). The approach simulates random numbers based on some
probability distribution, and the random numbers are then used as a data set for
statistical inference. The major use of Monte Carlo simulation is to estimate
functions of probability distributions using expectation (James, 2009). Monte Carlo
methods can also be used for significance testing, whereby the significance of a given
statistic is assessed by comparing it with a sample of test-statistics obtained by
simulating random samples based on a theoretical model. Monte Carlo methods also
support the use of the bootstrap method in fields such as ecology, environmental science and genetics,
where the focus is on estimation of percentile confidence limits (Manly, 2007).
3.2 Permutation (randomization) test
The permutation test, introduced by Fisher (1971), can be applied to test whether two
random samples have come from the same population (Kenett and Zacks, 1998).
It determines whether the observed test-statistic genuinely signifies a
difference between the groups (significant result), or whether the data are consistent with having come
from just one group (non-significant result). Under this test, the distribution of the
test-statistic under the null hypothesis is obtained by computing it for all possible
rearrangements of the observed data points between the groups. This yields the
range of values the test-statistic can take if the null hypothesis holds true, of which the
test-statistic from the original data is one realisation. If the test-statistic
from the original data is extreme in relation to the generated distribution, the null
hypothesis can be rejected. In permutation testing, the main emphasis is therefore on
the data rather than upon underlying assumptions about populations: that is, random
sampling, normality, constant variance and independence (Manly, 2007).
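The permutation logic described above can be sketched as follows. This is an illustrative example under our own assumptions: the statistic is taken to be the absolute difference in group means, every split of the pooled data is enumerated exactly, and the two small samples are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations

def permutation_p_value(sample_a, sample_b):
    """Exact two-sided permutation p-value for the absolute difference in means."""
    pooled = list(sample_a) + list(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    total = sum(pooled)
    observed = abs(sum(sample_a) / n_a - sum(sample_b) / n_b)
    extreme = 0
    splits = 0
    # Every way of assigning n_a of the pooled values to group A
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            extreme += 1
        splits += 1
    return extreme / splits

# Hypothetical data, for illustration only
group_a = [1, 2, 3]
group_b = [10, 11, 12]
print(permutation_p_value(group_a, group_b))
```

With only 20 possible splits in this toy case, the smallest attainable p-value is limited by the sample sizes; for larger samples a random subset of permutations is normally used in place of full enumeration.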
3.3 Non-parametric bootstrapping
Bootstrapping helps to draw statistical inferences based on the data given, without
complex assumptions and theory (Kenett and Zacks, 1998). This technique was first
considered in a systematic manner by Efron (1979). In non-parametric bootstrapping,
resampling is conducted with replacement; drawing each of the n observed values with
probability 1/n helps to model the unknown population. In permutation testing
sampling is done without replacement, whereas in the non-parametric bootstrap sampling is
done with replacement. The major use of the non-parametric bootstrap is to find confidence
limits for population parameters, but it has also been used in tests of significance (Manly,
2007).
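A minimal sketch of non-parametric bootstrap percentile limits is given below. The function name, the default of 2,000 resamples, the choice of the mean as statistic and the sample values are all assumptions made for illustration; they are not taken from the studies cited.

```python
import random
from statistics import mean

def bootstrap_percentile_ci(data, statistic=mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence limits from resampling `data` with replacement."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n values, every observation having probability 1/n
    replicates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = replicates[int(n_boot * alpha / 2)]
    upper = replicates[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lower, upper

# Hypothetical sample, for illustration only
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 3.9]
low, high = bootstrap_percentile_ci(sample)
print(low, high)
```

Replacing `rng.randrange(n)` draws by a shuffle of the pooled data would turn the same loop into the without-replacement scheme used in permutation testing.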
3.4 The parametric bootstrap
Finally, instead of using the hypothesised value of the parameter, another approach in
computational inference is to use an estimate of the parameter derived from the sample.
In this case, samples can be simulated from some fitted model to obtain a sample of
test-statistics (James, 2009). In the case of QFD, we know the FWs for HOWs are derived
from data which is of a qualitative nature, but we know neither the parent population
nor any assumptions that can be made about it. So we cannot apply traditional parametric
hypothesis tests (such as the z-test, t-test or F-test). From the previous discussion, we have
illustrated that most relationship matrices use a scale of the form: 1, 3, 9; 1, 3, 10; 1, 2, 3, 5; or
1, 3, 5. These have a measure of central tendency (median) approximately equal to 3.
These can be adequately represented by a Poisson distribution
with mean λ = 3. In the following illustrations, we therefore use a Poisson distribution
with λ = 3 as the parametric bootstrap distribution to test the significance of FWs of HOWs,
as we consider it the most representative choice in our case.
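Under this null model, and assuming additionally that the simulated entries R_ij are independent Poisson(λ) draws while the customer priorities P_i are held fixed (the independence assumption is ours, made for illustration), the first two moments of a simulated final weight follow directly from Method 1:

```latex
\[
\mathrm{E}[FW_j] = \lambda \sum_{i=1}^{r} P_i , \qquad
\mathrm{Var}(FW_j) = \lambda \sum_{i=1}^{r} P_i^{2} ,
\]
\[
\mathrm{E}[FW_j - FW_k] = 0 , \qquad
\mathrm{Var}(FW_j - FW_k) = 2 \lambda \sum_{i=1}^{r} P_i^{2} \quad (j \neq k) .
\]
```

On these assumptions, the spread of the simulated differences between two final weights depends only on λ and on the customer priority ratings.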
4. Results
4.1 Determining the significance of differences between extreme FW ratings
Table V shows an example of a HOQ relationship matrix data showing customer
weights, relationship matrix and the FWs of HOWs, from Masui et al. (2003).
Table VI shows the FWs ranked in ascending order. This more clearly
demonstrates the magnitude of the difference between the lowest and highest FW
ratings (respectively, H12 and H2).
Table V. Customers priority weights, relationship matrix and FWs
Rows W1-W22 are the voice of customer (WHATs); Weight is the customer weight; columns H1-H18 are the voice of engineers (HOWs).
WHATs  Weight  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18
W1          9   9   9   0   0   0   0   0   0   0   0   9   0   9   0   0   0   0   0
W2          3   9   0   0   0   0   0   0   0   0   0   9   0   9   0   0   0   0   0
W3          3   1   9   3   0   0   0   0   1   3   9   0   0   9   0   0   0   0   0
W4          1   0   0   3   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0
W5          9   0   1   9   9   9   0   0   1   0   3   0   0   1   0   0   0   0   0
W6          3   1   1   0   0   0   3   3   0   9   9   1   0   1   0   0   0   0   0
W7          1   0   0   0   3   9   0   0   0   0   0   0   0   0   0   0   0   0   0
W8          1   0   0   0   9   9   1   3   0   0   0   0   9   0   0   0   0   0   0
W9          1   0   0   0   9   9   0   0   0   0   0   3   0   0   0   0   0   0   0
W10         3   0   0   0   0   0   9   0   0   0   0   9   0   0   0   0   0   0   0
W11         9   9   9   0   0   0   0   0   0   0   0   9   0   0   0   0   0   0   0
W12         9   0   0   0   0   0   0   0   1   9   9   0   0   0   0   0   0   0   0
W13         1   0   0   0   0   0   0   0   9   0   9   0   0   0   0   0   0   0   0
W14         3   0   0   0   0   0   9   9   0   0   0   3   0   0   0   0   0   0   0
W15         1   0   0   0   0   0   0   0   9   0   0   3   0   0   0   0   0   0   0
W16         3   0   0   0   0   0   9   9   0   9   0   9   0   0   0   0   0   0   0
W17         3   0   0   0   0   0   0   3   0   0   0   3   0   0   0   0   0   0   0
W18         1   0   0   0   0   0   0   3   0   0   0   0   0   0   9   3   1   0   0
W19         3   0   0   0   0   0   0   0   0   0   0   0   0   0   3   9   9   9   9
W20         9   9   9   0   0   0   0   0   0   3   0   0   0   9   0   0   0   0   0
W21         1   0   0   0   9   9   0   0   0   0   0   3   0   0   9   9   9   0   9
W22         3   0   0   0   1   1   0   0   0   0   0   0   3   0   0   0   0   0   9
Final weights 276 282  93 115 120  91  78  39 171 171 273  18 229  27  39  37  27  72
Source: From Masui et al. (2003)
In the first instance we tested the significance of the difference between these
extreme FWs. The test-statistic is the absolute value (modulus) of H2-H12, denoted
abs (H2-H12), under the null hypothesis that the technical aspects (HOWs) H2 and H12
are of the same importance.
We generated 10,000 samples, each of size 22 × 18 (the size of the relationship matrix),
using a Poisson distribution with λ = 3 as the generator, and determined the HOWs
FWs for all 10,000 samples in the same way as for the original relationship matrix. We
then developed a histogram and density plot of the 10,000 resulting abs (H2-H12)
values, and found the probability value (p-value) associated with our observed
test-statistic of abs (H2-H12). In this procedure, if the probability of our observed
test-statistic is less than 5 percent on the theoretical sampling distribution, then the
difference can be considered significant, indicating that there is a significant difference
between these two FW rating values, and that they can be used as a reliable basis for
prioritising action. If the probability of our observed test-statistic is greater than
5 percent on the theoretical distribution, then there is no statistical evidence that the
FW ratings are different.
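One possible reading of this procedure is sketched below in Python. It is not the authors' programme: the function names, the seed, the reduced number of simulations (1,000 rather than 10,000, to keep the run time short) and the use of two fixed simulated columns as stand-ins for the two HOWs are all assumptions, so the resulting p-value need not reproduce the published one.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) value by Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def final_weights(matrix, priorities):
    """Method 1: FW_j = sum_i R_ij * P_i."""
    return [sum(row[j] * p for row, p in zip(matrix, priorities))
            for j in range(len(matrix[0]))]

def extreme_pair_p_value(priorities, n_hows, observed_diff,
                         n_sim=1000, lam=3.0, seed=1):
    """Share of simulated abs(FW_a - FW_b) values exceeding observed_diff."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        # One simulated relationship matrix of the same size as the original
        sim = [[poisson(rng, lam) for _ in range(n_hows)] for _ in priorities]
        fw = final_weights(sim, priorities)
        # Assumption: columns are exchangeable under the null model, so any
        # fixed pair of columns can represent the two HOWs being compared.
        if abs(fw[0] - fw[1]) > observed_diff:  # strict; use >= to count ties
            exceed += 1
    return exceed / n_sim

# Customer weights W1..W22 from Table V; observed abs(H2 - H12) = 282 - 18
PRIORITIES = [9, 3, 3, 1, 9, 3, 1, 1, 1, 3, 9, 9, 1, 3, 1, 3, 3, 1, 3, 9, 1, 3]
p_value = extreme_pair_p_value(PRIORITIES, n_hows=18, observed_diff=282 - 18)
print(p_value)
```

A p-value below 0.05 from such a run would be read, as in the text, as evidence that the two FWs differ by more than common cause variation.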
Figures 2 and 3 show the histogram and density plots for our example. The p-value
was 0.006, which shows a highly significant difference, implying that H2 and H12 are
in fact different. H2 has a significantly higher weight than H12, and it is of more
importance to prioritise this technical aspect to effectively meet the VOC.
Table VI. Ranking of HOWs FWs in ascending order
No.               1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
HOWs ranking    H12  H14  H17  H16   H8  H15  H18   H7   H6   H3   H4   H5   H9  H10  H13  H11   H1   H2
Final weights    18   27   27   37   39   39   72   78   91   93  115  120  171  171  229  273  276  282
Figure 2. Histogram for empirical distribution of abs (H2-H12) with probability line
Figure 3. Density plot of empirical distribution of abs (H2-H12) with probability line, with p-value = 0.006
We now present four further empirical examples from the literature, with associated
density plots and p-values for the FWs. Tables VII-X show the HOW ranking and FWs
(ranked into descending order), and Figures 4-7 show the associated density plots for
each example, with a line representing the observed difference between the highest and lowest
FW rating. The p-value is reported below each density plot.
In these examples, Tables VII, IX and X show FWs where there is a
significant difference between the highest and lowest FWs. In these cases, it is
appropriate to prioritise the top ranked weight over the lowest ranked weight.
Table VIII shows an instance where there is no significant difference between the
highest and lowest ranked weights (respectively H1 = 51 and H4 = 9). In this case, H1
and H4 are values within the range of common cause variation within the HOQ matrix,
and it would be inappropriate to prioritise H1 over H4 for subsequent action.
Table VII. Ranked FWs
No.               1    2    3    4    5    6    7
HOWs ranking     H2   H6   H1   H4   H3   H5   H7
Final weights   129  107  103   99   72   69   41
Source: From Majid and David (1994)
Table VIII. Ranked FWs
No.               1    2    3    4    5    6    7    8    9   10   11   12
HOWs ranking     H1  H10   H7  H12   H2   H6   H8  H11   H9   H3   H5   H4
Final weights    51   40   39   35   34   25   25   21   20   16   15    9
Source: From Tan et al. (1998)
Table IX. Ranked FWs
No.               1    2    3    4    5    6    7    8    9   10
HOWs ranking     H9   H2   H1   H6   H5  H10   H3   H4   H8   H7
Final weights   705  559  494  488  478  452  438  346  268  157
Source: From Jeong and Oh (1998)
Table X. Ranked FWs
No.               1    2    3    4    5
HOWs ranking     H3   H5   H4   H2   H1
Final weights   630  630  270  210  105
Source: From Wang et al. (1998)
Figure 4. Density plot with probability line, with p-value = 0.0002
Figure 5. Density plot of empirical distribution of abs (H1-H4) with probability line, with p-value = 0.630
Figure 6. Density plot with probability line, and p-value = 0.000
4.2 Determining the significance of differences between all FW ratings
We next extended this analysis to consider the significance of differences between all
the FWs, by taking differences of all possible pairs of FWs of HOWs. Following the
same procedure used to test the significance of any two, a general programme was written
using the statistical software “R”, which checked the significance of the difference of all
pairs one by one and generated a p-value (p-values less than 0.05 indicate significant
differences). For illustration purposes we consider the FWs of HOWs shown in
Table V (Masui et al., 2003).
The null hypothesis (H0) is that all of the FWs are of the same importance (meaning
that the variation between FWs is due to common cause). This was tested against the
alternative hypothesis (HA) that at least one of them is significantly different from the
others (that is, the variation between FWs is due to special cause) using, as test-statistic,
abs (Hi-Hj) where i = 1, 2, ..., 17 and j = i + 1, ..., 18. We again generated 10,000 samples, each of
size 22 × 18 (the size of the relationship matrix), using a Poisson distribution with λ = 3, and
found the final rating for HOWs associated with all samples. We then found abs (Hi-Hj)
for all samples, and the probability (proportion) of each original abs (Hi-Hj) from
the resulting empirical distribution of 10,000 abs (Hi-Hj) values. We observed whether the
p-value was less than 0.05, representing a significant difference. For the above
example, the following table of p-values resulted (Table XI). The highlighted area
shows that the difference is significant.
Table XI reveals that H2 is the most significantly different from the others, and H12 the
least significantly different. Between any two HOW factors, in order to reliably
determine the priority to resolve, we can therefore examine the associated p-value. If the
p-value is less than 0.05 we can prioritise the HOW factor with the higher FW. Such a
small p-value shows that a given FW varies significantly from the others due to special
cause, and should be addressed first for resolution.
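The all-pairs comparison can be sketched in the same spirit. The original programme was written in R and is not reproduced in the paper, so the Python fragment below is only an assumed reconstruction: names, seed and the reduced simulation count are ours. The strict inequality is also an inference on our part, suggested by the fact that tied FWs in Table XI (for example H14 and H17) have p-values just below 1 rather than exactly 1.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) value by Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulated_final_weights(rng, priorities, n_hows, lam):
    """Final weights (Method 1) of one simulated Poisson relationship matrix."""
    return [sum(p * poisson(rng, lam) for p in priorities) for _ in range(n_hows)]

def pairwise_p_values(labels, observed_fw, priorities,
                      n_sim=1000, lam=3.0, seed=1):
    """p-value for every pair (a, b): share of simulations in which
    abs(FW_a - FW_b) exceeds the observed absolute difference."""
    rng = random.Random(seed)
    n = len(observed_fw)
    sims = [simulated_final_weights(rng, priorities, n, lam) for _ in range(n_sim)]
    p_values = {}
    for a in range(n - 1):
        for b in range(a + 1, n):
            observed = abs(observed_fw[a] - observed_fw[b])
            exceed = sum(1 for fw in sims if abs(fw[a] - fw[b]) > observed)
            p_values[(labels[a], labels[b])] = exceed / n_sim
    return p_values

# HOWs and FWs in ascending order (Table VI); customer weights W1..W22 (Table V)
LABELS = ["H12", "H14", "H17", "H16", "H8", "H15", "H18", "H7", "H6",
          "H3", "H4", "H5", "H9", "H10", "H13", "H11", "H1", "H2"]
OBSERVED = [18, 27, 27, 37, 39, 39, 72, 78, 91, 93, 115, 120, 171, 171, 229, 273, 276, 282]
PRIORITIES = [9, 3, 3, 1, 9, 3, 1, 1, 1, 3, 9, 9, 1, 3, 1, 3, 3, 1, 3, 9, 1, 3]

table = pairwise_p_values(LABELS, OBSERVED, PRIORITIES)
significant = sorted(pair for pair, p in table.items() if p < 0.05)
print(len(table), len(significant))
```

Because the same simulated samples are reused for every pair, the 153 comparisons are not independent tests; the sketch follows the text in applying the 0.05 threshold to each pair separately.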
Figure 7. Density plot with probability line, and p-value = 0.090
Table XI. Table of p-values for all comparisons (abs (Hi-Hj))
HOWs   FW    H12    H14    H17    H16     H8    H15    H18     H7     H6     H3     H4     H5     H9    H10    H13    H11     H1     H2
FWs           18     27     27     37     39     39     72     78     91     93    115    120    171    171    229    273    276    282
H12    18     NA  0.859  0.865  0.723  0.698  0.698  0.319  0.274  0.189  0.166  0.080  0.062  0.004  0.005  0.000  0.000  0.000  0.000
H14    27     NA     NA  0.992  0.851  0.816  0.808  0.403  0.344  0.240  0.211  0.098  0.086  0.008  0.006  0.000  0.000  0.000  0.000
H17    27     NA     NA     NA  0.848  0.818  0.827  0.409  0.352  0.243  0.226  0.103  0.085  0.008  0.008  0.000  0.000  0.000  0.000
H16    37     NA     NA     NA     NA  0.968  0.964  0.514  0.448  0.317  0.283  0.152  0.123  0.014  0.014  0.000  0.000  0.000  0.000
H8     39     NA     NA     NA     NA     NA  0.993  0.529  0.471  0.334  0.308  0.163  0.132  0.014  0.014  0.000  0.000  0.000  0.000
H15    39     NA     NA     NA     NA     NA     NA  0.532  0.457  0.331  0.321  0.166  0.131  0.017  0.014  0.000  0.000  0.000  0.000
H18    72     NA     NA     NA     NA     NA     NA     NA  0.903  0.724  0.684  0.425  0.365  0.069  0.074  0.003  0.000  0.000  0.000
H7     78     NA     NA     NA     NA     NA     NA     NA     NA  0.804  0.784  0.486  0.437  0.090  0.089  0.005  0.000  0.000  0.000
H6     91     NA     NA     NA     NA     NA     NA     NA     NA     NA  0.962  0.662  0.586  0.139  0.139  0.010  0.001  0.001  0.000
H3     93     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA  0.680  0.600  0.156  0.139  0.011  0.001  0.001  0.000
H4    115     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA  0.919  0.296  0.296  0.037  0.002  0.002  0.003
H5    120     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA  0.342  0.340  0.047  0.006  0.004  0.003
H9    171     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA  0.993  0.286  0.063  0.056  0.042
H10   171     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA  0.285  0.060  0.052  0.040
H13   229     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA  0.416  0.384  0.323
H11   273     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA  0.945  0.863
H1    276     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA  0.900
H2    282     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
4.3 Scale robustness checking
As a final stage in this analysis, we analysed the robustness of the scales used in the
relationship matrix, that is, the extent to which the scale adopted affects the magnitude
of the differences between the FWs. As we have demonstrated, practitioners use
different linguistic-numeric scales. In this part of the analysis, we investigated whether a
linear or non-linear change in the scale affected the overall ranking of FWs, and whether
the significance of FWs also remained the same under these conditions.
Beginning with the linear conversion: the relationship matrix
shows the strength of the relationship between the voice of customers, WHATs (Wi), and the voice
of engineers, HOWs (Hj). From Masui et al. (2003) we know that the strength scale
0, 1, 3, 9 was used in the relationship matrix to find the FWs shown in Table VI. We made
two linear changes, from 0, 1, 3, 9 to 0, 2, 4, 10 and from 0, 1, 3, 9 to 0, 3, 5, 11, and obtained
the following two new HOWs FWs rankings in ascending order (Tables XII and XIII).
In Tables XII and XIII, when we made a linear change to the original scale, we observed
that the FWs changed, but their ranking remained almost the same. Further, the
statistical significance of the final HOWs weights did not change substantially (compare
Tables AI and AII in Appendix 1). Moving on to the non-linear conversion, we next
made two non-linear changes, from 0, 1, 3, 9 to 0, 2, 4, 6 and from 0, 1, 3, 9 to 0, 1, 5, 7, and
obtained the following two new HOWs ranked FWs (Tables XIV and XV).
In this case, we observed that the non-linear conversion of the scales changed the
FWs, but the ranking again remained virtually unchanged, and the p-values behaved similarly
(refer to Appendix 2).
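A scale change of this kind can be scripted as a simple relabelling of the relationship matrix followed by a recalculation of the FWs. The sketch below is illustrative only: to keep the listing short it uses the smaller Tan et al. (1998) matrix from Table III rather than the Masui et al. (2003) data, and the mapping `SHIFT` (adding 1 to each non-zero strength, in the manner of the 0, 1, 3, 9 to 0, 2, 4, 10 change) is a hypothetical choice, as are all function names.

```python
def final_weights(matrix, priorities):
    """Method 1: FW_j = sum_i R_ij * P_i."""
    return [sum(row[j] * p for row, p in zip(matrix, priorities))
            for j in range(len(matrix[0]))]

def rescale(matrix, mapping):
    """Replace every relationship strength by its value on the new scale."""
    return [[mapping[value] for value in row] for row in matrix]

def ranking(fw):
    """HOW labels ordered from lowest to highest final weight."""
    return [f"H{j + 1}" for j in sorted(range(len(fw)), key=lambda j: (fw[j], j))]

# Table III data: customer weights W1..W8, relationship scale 0, 1, 3, 5
PRIORITIES = [6, 3, 1, 2, 4, 8, 5, 7]
RELATIONSHIP = [
    [5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 5, 0, 0, 1, 0, 0, 0, 0],
    [0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 5, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 3, 3, 0, 5, 0, 0],
    [0, 0, 0, 0, 0, 5, 3, 0, 0, 0, 0, 0],
    [3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 5],
]
SHIFT = {0: 0, 1: 2, 3: 4, 5: 6}  # hypothetical scale change, for illustration

original = final_weights(RELATIONSHIP, PRIORITIES)
shifted = final_weights(rescale(RELATIONSHIP, SHIFT), PRIORITIES)
before, after = ranking(original), ranking(shifted)
# HOWs whose rank position differs between the two scales
moved = [h for h in before if before.index(h) != after.index(h)]
print(before)
print(after)
print(moved)
```

Feeding the rescaled matrix into a bootstrap test would extend the same check from the ranking to the p-values, as is done in the appendices.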
Table XII. HOWs FWs arranged in ascending order for scale 0, 2, 4, 10
Final weights    22   30   32   42   44   62   80   92  104  106  132  136  196  198  266  312  312  324
Table XIII. HOWs FWs arranged in ascending order for scale 0, 3, 5, 11
Final weights    26   33   37   47   49   85   88  106  117  119  149  152  221  225  303  348  351  366
Table XIV. HOWs FWs arranged in ascending order for scale 0, 2, 4, 6
Final weights    18   18   24   26   28   48   54   68   68   70   84   84  132  138  170  192  204  204
Table XV. HOWs FWs arranged in ascending order for scale 0, 1, 5, 7
Final weights    21   22   29   29   33   35   56   79   82   83   93   94  157  165  181  216  222  237
5. Conclusions
In relation to the first aim of the research, in this paper we have demonstrated that not all
of the differences between the FWs of HOW attributes may be significant. Indeed, for
one of our literature-derived examples (Tan et al., 1998) we have demonstrated that in the
context of common cause variation, even the most extreme HOW FWs are not
significantly different from each other. This finding implies that the engineering
attributes necessary to maximise customer satisfaction may, in the course of a QFD
analysis, be prioritised inappropriately, and action may be taken in respect of one HOW
requirement in preference to another, where there is in fact no statistical difference
between their ratings. A practical implication of this is that organisations may engage in
costly or time consuming activity resulting from the prioritisation of an engineering
attribute, where an attribute requiring less effort or cost may be an equal priority.
For many QFD situations, an application of Pareto’s 80/20 principle will provide a
pragmatic signpost of the most important engineering factors to prioritise, i.e. the one or
two which have very much higher FWs than the rest (for example, the literature example
from Jeong and Oh (1998), shown earlier in Table IX, shows two extreme FWs that are
clearly and distinctly different from each other). Such a rule of thumb would work
effectively in such cases. However, such a decision making criterion lacks statistical
validity, and will break down where FW differences are less clearly demarcated. For the
example given by Tan et al. (1998) shown earlier in Table VIII, there are no clearly
distinct FWs. In the absence of a formal and rigorous procedure for determining
significance, the practitioner has no real means of determining whether two ratings are
different as compared with the common cause variation present in the relationship
matrix. For QFD to be maximally effective, and in order to overcome this issue, we
advocate the use of a parametric bootstrap testing procedure for FWs to help
practitioners make more reliable and valid choices when deciding which HOWs
to prioritise and which to treat as practically equivalent. We recommend that this
approach be adopted by engineers and QFD practitioners to enable them to prioritise
more effectively when operating QFD. Although this would be a cumbersome analytical
practice if done by hand, software can easily be developed to facilitate the testing procedure.
In relation to our second aim, we have further demonstrated that these findings hold
true regardless of the choice of rating scale that is applied. That is, differences between
FWs that are significant will generally remain so regardless of the scale that is applied.
This finding means that the choice of QFD rating scale is not critical, which gives
practitioners relative freedom to continue utilising whichever rating scale has been
found to best suit their normal QFD procedures and practices.
References
Akao, Y. (1990), Quality Function Deployment: Integrating Customer Requirement into Product
Design, Productivity Press, Cambridge, MA.
Bouchereau, V. and Rowlands, H. (2000), “Methods and techniques to help quality function
deployment (QFD)”, Benchmarking: An International Journal, Vol. 7 No. 1, pp. 8-20.
Chan, L.K. and Wu, M.L. (1998), “Prioritizing the technical measures in quality function
deployment”, Quality Engineering, Vol. 10 No. 3, pp. 467-479.
Chin, K.S., Pun, K.F., Leung, W. and Lau, H. (2001), “A quality function deployment approach for
improving technical library and information services: a case study”, Library Management,
Vol. 22 Nos 4/5, pp. 195-204.
Crowe, T.J. and Cheng, C.C. (1996), “Using quality function deployment in manufacturing
strategic planning”, International Journal of Operations & Production Management, Vol. 16
No. 4, pp. 35-48.
Dikmen, I., Talat Birgonul, M. and Kiziltas, S. (2005), “Strategic use of quality function
deployment (QFD) in the construction industry”, Building and Environment, Vol. 40 No. 2,
pp. 245-255.
Efron, B. (1979), “Bootstrap methods: another look at the jackknife”, Annals of Statistics, Vol. 7
No. 1, pp. 1-26.
Enriquez, F.T., Osuna, A.J. and Bosch, V.G. (2004), “Prioritising customer needs at spectator
events: obtaining accuracy at a difficult QFD arena”, The International Journal of Quality
& Reliability Management, Vol. 21 No. 9, pp. 984-990.
Fisher, R.A. (1971), The Design of Experiments, Oliver and Boyd, London.
Franceschini, F. and Rossetto, S. (2002), “QFD: an interactive algorithm for the prioritization of
product’s technical design characteristics”, Integrated Manufacturing Systems, Vol. 13
No. 1, pp. 69-75.
Garver, M.S. (2012), “Improving the house of quality with maximum difference scaling”,
International Journal of Quality & Reliability Management, Vol. 29 No. 5, pp. 576-594.
Ghiya, K.K., Bahill, A.T. and Chapman, W.L. (1999), “QFD: validating robustness”, Quality
Engineering, Vol. 11 No. 4, pp. 593-611.
Hochman, S.D. and O’Connell, P.A. (1993), “Quality function deployment: using the customer to
outperform the competition on environmental design”, Proceedings of 1993 IEEE International
Symposium on Electronics and the Environment, Arlington, VA, 10-12 May, pp. 165-172.
Hoyle, C. and Chen, W. (2007), “Next generation QFD: decision-based product attribute function
deployment”, paper presented at International Conference on Engineering Design,
ICED’07, Cite Des Sciences Et De L’Industrie, Paris, France, 28-31 August.
James, E.G. (2009), Computational Statistics, Springer, New York, NY.
Jeong, M. and Oh, H. (1998), “Quality function deployment: an extended framework for service
quality and customer satisfaction in the hospitality industry”, International Journal of
Hospitality Management, Vol. 17 No. 4, pp. 375-390.
Kalos, M.H. and Whitlock, P.A. (2009), Monte Carlo Methods, Wiley, Hoboken, NJ.
Kenett, R. and Zacks, S. (1998), Modern Industrial Statistics, Design and Control of Quality and
Reliability, Brooks/Coles Publishing Company, Pacific Grove, CA.
Majid, J. and David, R. (1994), “Total quality management applied to engineering education”,
Quality Assurance in Education, Vol. 2 No. 1, pp. 32-40.
Manly, B.F. (2007), Randomization, Bootstrap and Monte Carlo Methods in Biology, Chapman
& Hall/CRC, New York, NY.
Masui, K., Sakao, T., Kobayashi, M. and Inaba, A. (2003), “Applying quality function deployment
to environmentally conscious design”, International Journal of Quality & Reliability
Management, Vol. 20 No. 1, pp. 90-106.
Mehrjerdi, Y.Z. (2010), “Quality function deployment and its extensions”, International Journal of
Quality & Reliability Management, Vol. 27 No. 6, pp. 616-640.
Olewnik, A. and Lewis, K. (2008), “Limitations of the house of quality to provide quantitative design
information”, International Journal of Quality & Reliability Management, Vol. 25 No. 2,
pp. 125-146.
Park, T. and Kim, K.J. (1998), “Determination of an optimal set of design requirements using
house of quality”, Journal of Operations Management, Vol. 16 No. 5, pp. 569-581.
Tan, K., Xie, M. and Chia, E. (1998), “Quality function deployment and its use in designing
information technology systems”, International Journal of Quality & Reliability
Management, Vol. 15 No. 6, pp. 634-645.
Tanik, M. (2010), “Improving ‘order handling’ process by using QFD and FMEA methodologies:
a case study”, International Journal of Quality & Reliability Management, Vol. 27 No. 4,
pp. 404-423.
Thakkar, J., Deshmukh, S. and Shastree, A. (2006), “Total quality management (TQM) in
self-financed technical institutions: a quality function deployment (QFD) and force field
analysis approach”, Quality Assurance in Education, Vol. 14 No. 1, pp. 54-74.
Utne, I.B. (2009), “Improving the environmental performance of the fishing fleet by use of quality
function deployment (QFD)”, Journal of Cleaner Production, Vol. 17 No. 8, pp. 724-731.
Wang, H., Xie, M. and Goh, T. (1998), “A comparative study of the prioritization matrix method
and the analytic hierarchy process technique in quality function deployment”,
Total Quality Management, Vol. 9 No. 6, pp. 421-430.
Zhang, Y. (1999), “Green QFD-II: a life cycle approach for environmentally conscious
manufacturing by integrating LCA and LCC into QFD matrices”, International Journal of
Production Research, Vol. 37 No. 5, pp. 1075-1091.
Further reading
Garver, M.S. (2009), “A maximum difference scaling application for customer satisfaction
researchers”, International Journal of Market Research, Vol. 51 No. 4, pp. 481-500.
About the authors
Zafar Iqbal is an Assistant Professor of statistics at The Islamia University of Bahawalpur,
Pakistan, and a doctoral research student based in the School of Engineering and Advanced
Technology at Massey University, New Zealand.
Nigel P. Grigg is an Associate Professor (quality systems) in the School of Engineering and
Advanced Technology at Massey University, New Zealand. He leads Massey University’s
postgraduate teaching and research-based programmes in the quality systems area.
K. Govindaraju is a Senior Lecturer in statistics in the Institute of Fundamental Sciences at
Massey University, New Zealand.
Nicola Campbell-Allen is a Lecturer in quality management in the School of Engineering and
Advanced Technology, Massey University, New Zealand.
Appendix 1
HOWs  FWs  22     30     32     42     44     62     80     92     104    106    132    136    196    198    266    312    312    324
H12   22   NA     0.879  0.844  0.703  0.675  0.455  0.280  0.200  0.128  0.125  0.046  0.035  0.002  0.002  0.000  0.000  0.000  0.000
H17   30   NA     NA     0.963  0.829  0.783  0.551  0.347  0.248  0.178  0.161  0.063  0.053  0.003  0.001  0.000  0.000  0.000  0.000
H14   32   NA     NA     NA     0.849  0.819  0.590  0.378  0.271  0.184  0.167  0.067  0.053  0.001  0.002  0.000  0.000  0.000  0.000
H16   42   NA     NA     NA     NA     0.965  0.710  0.487  0.367  0.255  0.249  0.105  0.080  0.006  0.004  0.000  0.000  0.000  0.000
H15   44   NA     NA     NA     NA     NA     0.740  0.495  0.367  0.260  0.250  0.106  0.087  0.004  0.004  0.000  0.000  0.000  0.000
H8    62   NA     NA     NA     NA     NA     NA     0.744  0.576  0.442  0.423  0.191  0.169  0.013  0.011  0.001  0.000  0.000  0.000
Table AI. p-value table for relationship strength scale 0, 2, 4, 10
Appendix 2
HOWs  FWs  26     33     37     47     49     85     88     106    117    119    149    152    221    225    303    348    351    366
H12   26   NA     0.892  0.832  0.691  0.656  0.268  0.243  0.142  0.094  0.087  0.021  0.019  0.000  0.000  0.000  0.000  0.000  0.000
H17   33   NA     NA     0.932  0.790  0.768  0.332  0.307  0.175  0.127  0.118  0.033  0.029  0.000  0.001  0.000  0.000  0.000  0.000
H14   37   NA     NA     NA     0.846  0.815  0.376  0.336  0.201  0.138  0.127  0.043  0.031  0.002  0.001  0.000  0.000  0.000  0.000
H16   47   NA     NA     NA     NA     0.964  0.481  0.445  0.276  0.198  0.184  0.059  0.046  0.002  0.001  0.000  0.000  0.000  0.000
H15   49   NA     NA     NA     NA     NA     0.500  0.473  0.292  0.209  0.209  0.065  0.056  0.002  0.002  0.000  0.000  0.000  0.000
H8    85   NA     NA     NA     NA     NA     NA     0.945  0.699  0.544  0.529  0.227  0.208  0.015  0.010  0.000  0.000  0.000  0.000
Table AII. p-value table for scale 0, 3, 5, 11
HOWs  FWs  18     18     24     26     28     48     54     68     68     70     84     84     132    138    170    192    204    204
H17   18   NA     0.992  0.907  0.870  0.848  0.575  0.490  0.350  0.350  0.334  0.221  0.215  0.036  0.027  0.005  0.002  0.001  0.001
H12   18   NA     NA     0.906  0.879  0.855  0.581  0.501  0.355  0.349  0.333  0.223  0.229  0.035  0.029  0.007  0.002  0.001  0.001
H14   24   NA     NA     NA     0.961  0.936  0.658  0.570  0.420  0.417  0.394  0.265  0.262  0.047  0.038  0.009  0.002  0.001  0.001
H16   26   NA     NA     NA     NA     0.962  0.689  0.601  0.439  0.432  0.423  0.276  0.284  0.050  0.040  0.008  0.002  0.001  0.001
H15   28   NA     NA     NA     NA     NA     0.708  0.620  0.461  0.459  0.437  0.303  0.302  0.057  0.046  0.010  0.003  0.001  0.001
H18   48   NA     NA     NA     NA     NA     NA     0.908  0.703  0.707  0.678  0.507  0.509  0.121  0.100  0.026  0.010  0.004  0.004
Table AIII. p-value table for scale 0, 2, 4, 6
HOWs  FWs  21     22     29     29     33     35     56     79     82     83     93     94     157    165    181    216    222    237
H17   21   NA     0.977  0.877  0.876  0.825  0.796  0.524  0.277  0.259  0.251  0.185  0.173  0.011  0.011  0.003  0.000  0.000  0.000
H12   22   NA     NA     0.894  0.895  0.833  0.810  0.519  0.298  0.275  0.265  0.195  0.187  0.016  0.010  0.005  0.000  0.001  0.000
H14   29   NA     NA     NA     0.994  0.936  0.903  0.614  0.350  0.318  0.324  0.240  0.230  0.016  0.013  0.005  0.001  0.001  0.000
H16   29   NA     NA     NA     NA     0.937  0.908  0.618  0.357  0.328  0.320  0.228  0.230  0.018  0.014  0.005  0.001  0.001  0.000
H15   33   NA     NA     NA     NA     NA     0.963  0.677  0.401  0.370  0.358  0.271  0.260  0.020  0.017  0.007  0.001  0.000  0.000
H8    35   NA     NA     NA     NA     NA     NA     0.695  0.415  0.380  0.373  0.279  0.269  0.022  0.018  0.006  0.001  0.000  0.000
Table AIV. p-value table for scale 0, 1, 5, 7