4. Agenda
• Putnam’s study: results and influence
• Putnam’s approach
• Intermezzo – A statistical pitfall
• Critical evaluation:
o dataset is very limited
o model and assumptions are unclear
o analysis is incorrect
• Other studies provide no corroboration
• Simulation study demonstrates incorrectness
6. Putnam’s study: results and influence
Claims:
• Generic empirical equations that describe size – effort –
duration relationships.
• Method will produce accurate estimates.
• Only a few quick reference tables and a pocket calculator
needed.
• Trade-off law: K ~ 1 / T^4.
7. Putnam’s study is very influential
Influence:
• incorporated in estimation software
• many references
• sometimes cited as authoritative
8. Putnam’s approach
1) Gather data on effort (K), duration (T) and size (S).
2) Define difficulty: D = K / T^2
3) Define productivity: P = S / K
4) Find relationship between D and P.
Result: P ~ D^(-0.67)
5) Perform basic algebraic manipulations to find
relationships between S, T, and K.
Result: S = C ∙ K^(1/3) ∙ T^(4/3),
and therefore: K ~ S^3 / T^4.
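The algebra of step 5 can be written out as below. This is only a sketch: it treats the fitted relationship as if it were exact, rounds the exponent -0.67 to -2/3, and uses C for the proportionality constant, as the slide does.

```latex
\begin{align*}
P = \frac{S}{K} &\sim D^{-2/3} = \left(\frac{K}{T^{2}}\right)^{-2/3} = K^{-2/3}\,T^{4/3} \\
S &= C \cdot K^{1/3} \cdot T^{4/3} \\
S^{3} &= C^{3} \cdot K \cdot T^{4} \\
K &= \frac{S^{3}}{C^{3}\,T^{4}} \;\sim\; \frac{1}{T^{4}} \quad \text{(for fixed } S\text{)}
\end{align*}
```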
11. Intermezzo – A statistical pitfall
• Two researchers examine relationship between S and K.
• Both assume linear relationship.
• Researcher 1 writes K = aS + b
• Researcher 2 writes S = a’K + b’
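[Figure: two scatter plots of the same data with the axes swapped, one of K against S and one of S against K.]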
13. Intermezzo – The results are quite different
• Researcher 1 writes K = aS + b and finds
(1) K = 1.01 S − 0.02.
• Researcher 2 writes S = a’K + b’ and finds
(2) S = 0.50 K + 3.0.
• Researcher 2 then derives
(3) K = 2.02 S – 6.2.
15. Intermezzo – A statistical pitfall
• Researcher 1 writes K = aS + b and finds
(1) K-hat = 1.01 S − 0.02.
• Researcher 2 writes S = a’K + b’ and finds
(2) S-hat = 0.50 K + 3.0.
• Researcher 2 then derives
(3) K = 2.02 S-hat − 6.2.
(A hat marks an estimated value.)
17. Critical evaluation (1) – dataset is very limited
• only 13 projects
• all US Military
• 4 are left out => 9 projects remaining
18. Critical evaluation (2) – model is unclear
[Figure: diagram relating size, duration and effort.]
Putnam does not make clear and consistent
choices regarding model structure.
Only one parameter to capture effort-duration
interaction
24. Critical evaluation (4): Difficulty – Productivity relationship
Putnam’s reasoning:
P ~ D^(-2/3)  =>  S / K ~ (K / T^2)^(-2/3)  =>  S ~ K^(1/3) ∙ T^(4/3)
More precisely notated:
P-hat ~ D^(-2/3)  =>  ??  S-hat ~ K-hat ∙ K^(-2/3) ∙ T^(4/3)
25. Other studies − Corroboration by Putnam et al.
Putnam & Putnam, “A data verification of the software fourth
power trade-off law,” (Proc. of the Int. Soc. of Parametric Analysts – 6th Annu.
Conf., vol. III(I), pp. 443–471, 1984.)
Putnam & Myers, “Measures for excellence – Reliable
software on time, within budget”, (Englewood Cliffs: Yourdon, 1992.)
Confirmed that K ~ 1 / T^4, but…
Found (Dunsmore et al., 1986) and admitted (Putnam & Myers, n.d.) to be based on circular reasoning.
26. Other studies – No corroboration from Jeffery
Jeffery (1987):
• 47 MIS in 4 large organisations
• Find P as a function of K and T.
Result:
• P ~ K^(-0.47) ∙ T^(-0.05)
• essentially no productivity – duration relationship
• comparison with Putnam’s P ~ K^(-0.67) ∙ T^(1.33)
• no confirmation
• strictly speaking: no refutation either
27. Other studies – No corroboration from Barry, Mukhopadhyay, and Slaughter
Barry, Mukhopadhyay, and Slaughter (2002):
Ansatz: ln K = … + β1 T
Result: β1 = 0.000677 ± 0.000103, p = .031.
So – larger duration predicts larger effort.
28. Other studies – Team size affects effort, so…?
Putnam & Myers (n.d.): larger team size predicts larger
effort:
Teams of 5 or less have better productivity than teams of 20
or more.
Supported by other studies. Example (Rodríguez et al.):
PDR ~ (average team size)^0.57
But…
• translation to effort-duration trade-off unclear
• interpretation in terms of causation dubious
30. Simulation (1)
Goal: check whether the analysis issues really lead to
incorrect results.
Method:
• generate simulated data with known structure
• analyze simulated data, following Putnam’s approach
• check whether results are consistent with assumptions
31. Simulation (2)
Model assumptions:
• Size, effort, and duration are unrelated random numbers.
• Log-normal distributions.
• 1000 projects.
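A minimal sketch of such a simulation, using only the Python standard library. The seed and the standard deviations of ln S, ln K and ln T are assumptions made for illustration; the ratio var(ln K) = 8 var(ln T) is chosen on purpose so that the expected slope is close to Putnam's -2/3. The helper name `ols_slope` is hypothetical.

```python
import math
import random

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(42)            # arbitrary fixed seed
N = 1000                           # number of simulated projects
SIGMA_S = 1.0                      # assumed SD of ln S
SIGMA_T = 0.5                      # assumed SD of ln T
SIGMA_K = math.sqrt(8) * SIGMA_T   # assumed SD of ln K, so var(ln K) = 8 var(ln T)

# Independent normal draws for the logarithms, i.e. unrelated log-normal S, K and T.
s = [rng.gauss(0.0, SIGMA_S) for _ in range(N)]
k = [rng.gauss(0.0, SIGMA_K) for _ in range(N)]
t = [rng.gauss(0.0, SIGMA_T) for _ in range(N)]

ln_D = [ki - 2.0 * ti for ki, ti in zip(k, t)]   # difficulty D = K / T^2
ln_P = [si - ki for si, ki in zip(s, k)]         # productivity P = S / K

B = ols_slope(ln_D, ln_P)      # fitted exponent in P ~ D^B
u = 2.0 * B / (1.0 + B)        # exponent in K ~ T^u after rearranging as if exact

print(f"P ~ D^{B:.2f}, K ~ T^{u:.1f}")
```

The fitted slope is negative although S, K and T were drawn independently, because K appears in the definitions of both D and P.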
34. Simulation (4) – result
Fit yields:
ln P = −0.67 ln D + constant
After transformation:
P ~ D^(-0.67 ± 0.02)
After some manipulations (same as Putnam’s):
K ~ 1 / T^(4.1 ± 0.4)
Yet, no relationship actually exists!
35. Simulation (5) – coincidence?
For convenience, write s = ln S, k = ln K, and t = ln T.
Difficulty and productivity:
• ln D = k – 2t
• ln P = s – k
Derive the slope of P against D (σ_k and σ_t are the standard deviations of k and t; s, k and t are independent):
B(ln P | ln D) = cov(ln D, ln P) / var(ln D) = −σ_k^2 / (σ_k^2 + 4 σ_t^2)
Follow Putnam closely, finding K ~ T^u, with
u = −(1/2) ∙ σ_k^2 / σ_t^2,
which yields u = −4 if σ_k / σ_t = √8.
37. Conclusions
Claims:
• Generic equations that describe size – effort – duration relationships. => Limited dataset, no corroboration
• Method will produce accurate estimates. => Not addressed
• Trade-off law: K ~ 1 / T^4. => Faulty analysis, no corroboration
38. Conclusion
• Putnam’s original study was wrong
• No corroboration
=> No credibility for Putnam’s result
39. The bad news
• Handling statistical relationships as if exact.
• Interpreting statistical relationships as causal relationships without sufficient support.
Both issues are rather common in the estimation / metrics literature.
Going to discuss this paper. Many know something about the results; few have read the paper.
This is what it looks like.
Most famous conclusions: equations describing relationship between size, effort and duration of software development projects.
Claims: “empirical”, “general”, “will produce accurate estimates”, easily done.
Effectively: estimation problem solved!
[Claims] Effectively: claimed comprehensive solution for software estimation.
Trade-off law: K ~ 1/T^4.
[Influence] In software. References soon after publication as well as recently.
Makes it worthwhile to examine study & results critically.
Focus on effort/duration relationship.
Estimation of duration and staffing variations over the course of a project are also in the Putnam paper, but will not be discussed here.
Does not imply my agreement or approval
gather data; in Putnam’s case: US military. Size is LoC.
define difficulty D=K/T^2. Variable name “difficulty” justified by observation: small D => easier systems; large D => hard systems
productivity = S/K
find relationship by doing a double-logarithmic plot of prod against diff
manipulate (insert definitions of D and P) to get “software equation” S=…
manipulate to find trade-off law K=…
Crucial relationship is that between productivity and difficulty.
Many think sw eq derived from Rayleigh-Norden curves, but it is not. Derived from empirical data in this way.
Before critically examining Putnam’s approach: an intermezzo to demonstrate a common statistical pitfall.
Assume 2 researchers examine relationship between size and effort. Assume linear relationship for simplicity, to bring out the issue more clearly.
Res.1 asks: how much effort does it take to build a system of size S.
Res.2 asks: what size can be produced given effort K.
Write eqs K= and S=, and draw the corresponding data plots. THEY USE THE SAME DATA, but note the orientation of the axes.
I demonstrate with FAKE data, but REAL analysis.
Simulated data and the fit made by researcher 2.
Researcher 1 derives K= directly from data.
Researcher 2 derives S=, and then manipulates the result to get an expression for K.
Results are quite different.
(FAKE but same data, REAL analysis!)
Same data (flipped plot)
Solid line = fit by Res.1
Dashed line = relationship derived by Res.2
Note that solid line is (by definition) the best fit for predicting effort.
So the correct (=best) fits for predicting effort and for predicting size are not the same.
Quite different. Res. 2 finds stronger dependency. Surprise?
Rewrite results, making estimations explicit. The hat is in the wrong place.
No.
Fitting S=f(K) minimises vertical distances.
Fitting K=f(S) minimises horizontal distances.
Does not yield the same relationship in a different notation, but a different relationship.
General tendency: noise makes the fitted line flatter, and the inverted slope steeper.
In other words: manipulations make relationships stronger than they really are.
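The pitfall can be reproduced with a short sketch like the one below. The data are made up for illustration (a shared driver plus independent noise on S and K; the seed, ranges and noise levels are assumptions, not the numbers on the slides), and `ols` is a hypothetical helper doing a plain least-squares fit.

```python
import random

def ols(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(1)  # arbitrary fixed seed
# Fake data: size S and effort K share a common driver and carry independent noise.
base = [rng.uniform(2.0, 10.0) for _ in range(200)]
S = [b + rng.gauss(0.0, 1.5) for b in base]
K = [b + rng.gauss(0.0, 1.5) for b in base]

a1, b1 = ols(S, K)            # Researcher 1: K = a1 * S + b1
a2, b2 = ols(K, S)            # Researcher 2: S = a2 * K + b2
a3, b3 = 1.0 / a2, -b2 / a2   # Researcher 2 after inverting: K = a3 * S + b3

print(f"direct fit:   K = {a1:.2f} S + {b1:.2f}")
print(f"inverted fit: K = {a3:.2f} S + {b3:.2f}")
```

The product of the two fitted slopes equals the squared correlation, so for noisy, positively correlated data the inverted slope a3 is always steeper than the direct slope a1.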
“General” solution is based on only 13 projects from one organisation.
4 are “different” (standards, application type) and are not analysed.
Putnam’s equation essentially based on 9 projects from one organisation, which seems insufficient support for accurate and generic equations.
Putnam switches between predicting productivity from difficulty, size from effort and duration, effort from size and duration. No clear and consistently applied choice of model structure.
Duration may affect effort, as claimed by Putnam. But effort may also affect duration: if more work needs to be done, it will probably take longer to do it.
Putnam does not distinguish the effects, does not distinguish between prediction and causation, and has only one parameter to account for the strength of two effects.
Putnam’s analyses contain incorrect derivations. Let’s discuss one in detail: derivation of trade-off law from software equation.
This is Putnam’s software equation. Raise to 3rd power.
Switch lhs and rhs. Divide by T^4 and C^3. Find trade-off law.
Now, take a look at what really happens.
Software equation is not true exactly. It’s only a statistical relationship. It can, at best, be interpreted as estimating S given K and T.
Denote “estimated value” by hat. Obtain a more precise notation of the software equation. Follow the same steps.
Hat is in the wrong place.
Not a legal manoeuvre.
So, even if one accepts the software equation (which I do not), the trade-off law does not follow as a correct estimation of effort.
But can we even accept the sw eq as a starting point? The same issue is relevant here.
Left: Putnam’s derivation.
Right: With the hats inserted. Because K-hat depends on S, I see nowhere to go.
Question: Is all the criticism just formalistic mathematics, without practical consequences? No, it’s not. Come back to that in a few minutes.
P&P and P&M did claim corroboration of K ~ 1/T^4, using dataset of hundreds of projects.
However, Dunsmore et al. showed this was based on circular reasoning.
Was admitted by Putnam & Myers. So does not need to be covered further here.
Jeffery examined P as a function of both K and T. (Instead of D=K/T^2.) 47 MIS, 4 large organisations.
Found result quite different from Putnam’s; essentially no relationship between P and T.
No confirmation.
(Note that Putnam could never have found Jeffery’s result, because the ratio of the exponents of K and T is fixed.)
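The fixed ratio can be made explicit. Write Putnam's fit as P ~ D^B, where B is a symbol introduced here for the fitted exponent, and substitute D = K / T^2:

```latex
\[
P \sim D^{B} = \left(\frac{K}{T^{2}}\right)^{B} = K^{B}\,T^{-2B}
\]
```

Whatever value the fit returns for B, the exponent of T is −2 times the exponent of K. Jeffery's exponents (−0.47 and −0.05) do not have that ratio, so they lie outside what a one-parameter fit of P against D can produce.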
Strictly speaking: no refutation: neither Putnam nor Jeffery specifies accuracy of results.
Barry, Mukhopadhyay & Slaughter analysed 18 projects, enhancements to mainframe applications.
Found that larger duration predicts larger effort.
Makes sense: if more work needs to be done, it will probably take longer to do it.
Putnam & Myers have put forward an argument based on effects of team size. (Is in the same paper as the admission of circularity.)
Reasoning: trying to speed up a project causes large effort increases. Evidence: larger team size predicts larger effort. Larger teams have worse productivity than small teams.
Relationship corroborated by several other studies: …
HOWEVER: (1) no straightforward way to derive the trade-off (only that there is one, not the -4)
(2) interpretation dubious.
… alternative interpretation …
reality is probably a mixture, so we do not know how strong the effect really is
I had promised to get back to one important question: Is all the criticism just formalistic mathematics, or does it have consequences? Did a simulation to check this. Will Putnam’s approach really yield incorrect results?
… (create a situation in which we know what the correct answer is) …
Simplest of all: just unrelated random numbers. No relationships. No trade-off, so we shouldn’t find one if Putnam’s approach is correct.
Created fake data, assuming log-normal distributions.
This is what the data look like, if we plot P against D, as did Putnam. May come as a surprise that there is any relationship at all, starting from random numbers.
But have a look at the definitions. Effort is in def of D and of P, causing a relationship between them.
Follow in Putnam’s footsteps. Same fit, same result. Transform into power law. Reproduce Putnam’s algebraic manipulations, and find same result (within error bounds). We find a relationship that is not really there.
Coincidence? No, analysis can be done analytically. …
And of course, I chose the parameters to reproduce Putnam’s results. Could have derived any negative value I wanted from unrelated random numbers, just by selecting appropriate values for the SD’s.
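A small sketch of how the choice of SD's fixes the outcome, assuming independent ln S, ln K and ln T. The function names are hypothetical, and the listed ratios are illustrative values only.

```python
def fitted_slope(sigma_k, sigma_t):
    """Expected slope of ln P against ln D for independent ln S, ln K, ln T."""
    return -sigma_k ** 2 / (sigma_k ** 2 + 4.0 * sigma_t ** 2)

def trade_off_exponent(sigma_k, sigma_t):
    """Exponent u in K ~ T^u, obtained by rearranging P ~ D^B as if it were exact."""
    b = fitted_slope(sigma_k, sigma_t)
    return 2.0 * b / (1.0 + b)

# sigma_k / sigma_t ratios, chosen for illustration; sqrt(8) is the Putnam-like case.
for ratio in (1.0, 2.0, 8.0 ** 0.5, 4.0):
    print(f"sigma_k/sigma_t = {ratio:.2f}: u = {trade_off_exponent(ratio, 1.0):.2f}")
```

A ratio of sqrt(8) gives an exponent of about −4; other ratios give other negative exponents from the same unrelated random numbers.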
Find same result. (Or different one, depending on choices.)
Generic equations: very small dataset, one organisation, no corroboration.
Accurate estimates: not addressed by Putnam. (Only a claim in the abstract.)
Trade-off: incorrect analysis, no corroboration.
Despite elaborate search: did not find a single study corroborating Putnam’s results, except the circular study.
Inevitable conclusion: no credibility for either the sw eq or the trade-off law. They should no longer be used.
End of story? Unfortunately – no. Some of the issues discussed here are quite common in the estimation and metrics literature. Especially handling of statistical relationships as if exact, and their interpretation as causal relationships without further ado.
E.g., there are at least 4 studies interpreting the relationship between team size and effort (or productivity) as evidence that larger teams cause more effort.
What to do? (1) Take care in handling statistical relationships. Remember the basics. (2) Read with care. (3) Use simulations to test an approach. Some problems can be found doing the simplest of simulations, using unrelated random numbers. Simulations can be especially powerful if some (realistic) causal effects that are not part of the analysis are put into the data.