A Method for Predicting Future Trainer Costs via
Analysis of Historical Data
Zachary Forrest
13312 Thomasville Circle #54 D, Tampa, FL, 33617; zachary9@mail.usf.edu; (813) 438-3297; Naval Air
Warfare Center Training Systems Division, Cost Department, Code 4.2; University of South Florida;
M.A. in Mathematics
Abstract
This paper discusses the motivations behind abstracting training systems’ cost data into statistical
models and the difficulties present in such efforts. Current methods of generating trainer costs
require a significant investment in terms of both time and effort. Special attention is given to data
taken from a particular database; and all efforts described within this paper - although discussed in
abstract terms - were applied to the problem of generating initial work towards a parametric evalua-
tion method for trainer costs. The paper covers several analytical methods that were evaluated; and
introduces one method of analyzing cost data for use in future prediction, with the intention of
providing an approach which is grounded in empirical data. Finally, the paper identifies some
difficulties present in the method; provides some comments for the purposes of improving this and
future models; and discusses some of the goals which are being set for predictive analyses of training systems.
The views expressed herein are those of the author and do not necessarily reflect the official position
of the Department of Defense or its components.
Motivations
The NAVAIR Cost Department provides cost and scheduling support for training systems utilized in
the training of warfighters; and as such, estimation of trainer costs plays a vital role in all tasks handled
within the department. Such estimates require the use of historical contract costs for the purposes
of determining viable statistical models - empirical data naturally drives all cost generation methods.
Due, however, to considerations of cost threshold, contract type, and program risk, the data available
for such uses is exceptionally limited. Within the Training Systems Division, the primary source of
data for this analysis is the manually generated Trainer Estimating Resource Network (TERN), which
contains 245 data points describing device costs, system sub-costs, and device
information pertinent to the contract - among other information. From this data emerge several
questions: (1) is it possible to reliably generate future estimates of trainer cost data under such
limiting conditions?; (2) in what manner would such estimates be generated?; and (3) in the event
that such a tool does indeed exist, can it be abstracted to a form which could potentially apply to
other cost data? As this paper will endeavor to show, the answer to all of these questions is (within
certain limitations), “yes.”
The possibility of constructing statistical models for estimating training system costs is a desirable
one. Such a tool would provide a quantitative, mathematical means of representing contract costs
and trends in contracts; and it would even provide an empirical framework on which to ground
skepticism with regard to contractor bids and to aid decision-making pertaining to those bids.
All of these are true and useful benefits of a good statistical model; however, the intrinsic value of
such a tool extends even further.
Currently, many (if not all) cost predictions generated by the Training Systems Division for trainers
entail a time-consuming process involving careful scrutiny of all information relevant to the specific
trainer; and while there is certainly no dearth of historical information at the CLIN -level on purchases
of trainers, very little of this is in a format which is ideal for use in cost estimation. Although
experienced cost analysts may develop a sense for which costs are likely and unlikely, it remains true
that, for the majority of contract costs, an ad hoc approach1 for determining accurate estimates is
necessary. Similarly, while some suggestions have been made regarding comparisons of sub-costs to
base costs of trainers, proposed estimation factors often lack testing or support from empirical data.
Such circumstances create sub-optimal conditions for providing accurate training
system cost estimates for our nation’s warfighters. If we cannot produce accurate predictions swiftly and
effectively (in a repeatable manner), we shall necessarily incur increased costs both in terms of money
and man-hours spent pursuing an estimate; and if unchecked, such costs could potentially inhibit our
financial capability to acquire and maintain training systems.
If a good statistical model for training system costs is a necessary tool, several questions are of
immediate concern in the effort to build such a model. Specifically, these questions are: (1) what do
we mean by a “good statistical model”?; and (2) what are key details to look for in a good model?
The first question seems to have a fairly obvious (if somewhat vague) answer: a good statistical model
is any model which accurately and reliably produces predictions regarding the quantitative details of a
given subject matter and, moreover, is simpler and more time-efficient to apply to the subject
matter than an ad hoc analysis. The second question, however, requires a little more thought. Clearly,
an important criterion is the ability to apply a proposed methodology across any recent cost data with
impunity; that is, without fear that such a technique may succeed with regard to certain cost data
and yet fail with regard to other data. And since we wish to predict future events in addition to
describing past events, we must also restrict our considerations to statistical techniques that provide
such a capability. What other criteria, then, are important for our model?
Another important point to be considered here is that there are many different variants of training
systems - even for each platform. Moreover, training systems from one platform may not be comparable
to the same variety of training systems for a different platform. (e.g. A flight simulator built for an
F/A-18C platform is almost certainly distinct from a flight simulator built for an MH-60R.) Whatever
method we adopt, it must be capable of separating (or partitioning) data so that similar data remains
categorized together apart from non-similar data. Thought should also be given to the notion of
unusual costs. Certainly such costs do exist (e.g. in first units, where certain non-recurring costs are
commonly found) and our method must be capable of recognizing these outliers; recording the extent
of their deviation from typical data points; and making use of the unusual data to further predictive
capabilities. The approach should be applicable to new data sets - in the sense of either entirely
new data sets or old data sets with new data included - without undue difficulty. Finally, to be of
true benefit, our method of choice must be capable of summary in some easily read format so that
cost analysts and decision-makers alike may make swift use of results. With these criteria firmly in
mind, we are now ready to turn our attention to questions of detail.
Initial Attempts
In developing the details of the final analytical method, primary attention was initially given to finding
a uniform approach to partitioning the TERN database. (We will write C to denote TERN cost data.)
Table 1: Breakdown of C
Full-task Trainers: 134
Part-task Trainers: 105
Desktop Trainers: 6
Total Number of Devices: 245
1 More commonly referred to as a Bottoms Up or Technical Assessment approach.
As mentioned above, it is not necessarily true that any two arbitrary devices (even if classified
with identical device types) may be considered together meaningfully in an analysis; and so initial
statistical tests were run on multiple partitionings of C for the purpose of determining in what manner
similarity could be guaranteed amongst data - i.e. to ensure data homogeneity. These tests included
computing summary statistics - means, standard deviations, and correlation coefficients - taken over
various partitionings of subsets of C; and these partitions were subsequently
tested again under the approach that we shall presently discuss. From these minor statistical tests
- and, indeed, even through use of our proposed methodology - a rather striking fact was quickly
deduced: namely, that few partitionings of C would support general predictive analysis due to the
wide variation in the data present in C.
Few similarities were seen when devices were partitioned by platform, contract year, contractor,
or even device type - for example.2 In all cases explored, it became quite apparent that there was not
significant similarity between members of partitions in the sense that the difference between the cost
of members - and the difference between members and the mean of those members - was sufficiently
large to guarantee large standard deviations. As a result, the predictive tools utilized within the scope
of this project - which will be discussed in the following section - were incapable of generating useful
cost estimates. After some experimentation, it became clear that the only method of partitioning in
which any meaningful similarity could be observed was in dividing the data between new training
systems and upgrades of existing training systems.
Further experimentation and observation of trends suggested that a second-level partition of devices
- this time by device type - was necessary for continued analysis; and subsequent, similarly executed
work suggested further such continuations of partitioning. The resultant partition called for devices
to be partitioned in the following manner: first by device type; second by platform; third by whether
the product was new or an upgrade of a previous product; fourth (for upgrade products) by whether
an upgrade was a modification or a “tech refresh”; and fifth, products were divided into full-task,
part-task, and desktop training devices.
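For concreteness, the following is a minimal sketch (in Python with pandas, not part of the original analysis, which was carried out in Excel) of how such a hierarchical partition might be formed. The column names device_type, platform, new_or_upgrade, upgrade_kind, task_level, and cost are hypothetical stand-ins for the corresponding TERN fields, which are not specified in the paper.

```python
import pandas as pd

# Hypothetical stand-ins for the TERN fields described in the text.
PARTITION_KEYS = ["device_type", "platform", "new_or_upgrade",
                  "upgrade_kind", "task_level"]


def partition_cost_data(df: pd.DataFrame) -> dict:
    """Split the cost data set C into subsets A_1, ..., A_r, one per
    combination of the five partition keys described above."""
    # dropna=False keeps groups where a key is blank, e.g. upgrade_kind
    # is not applicable to new (non-upgrade) devices.
    grouped = df.groupby(PARTITION_KEYS, dropna=False)
    return {key: group["cost"].to_numpy() for key, group in grouped}
```

Because the keys are grouped simultaneously, the order in which the levels are listed does not change the resulting subsets.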
At this point in the analysis, thought was finally turned to the question of describing the data
present in C in a fashion amenable to prediction. As with the determination of the method by which C
was to be partitioned, multiple approaches were considered and discarded; and an approach’s efficacy
was judged on whether the data it produced could be used by a cost analyst. From these proceedings,
the CERPA analysis method was created.
The CERPA Analysis Method
Terminology and Definitions
In order to discuss the Cost Estimating Resource for Predictive Analysis (CERPA) method,
it is first necessary to consider technical details regarding notation and some definitions. If A is a
subset of C (written A ⊆ C), then the sample mean and sample standard deviation of cost data in A
are written as the symbols ¯xA and sA respectively. (Note that for our purposes, we will never consider
the situation A = C.) By a prediction interval for a subset A, we refer to all cost data x so that
|x − ¯xA| ≤ λ with λ defined as
$$\lambda := t_{n,\alpha/2} \cdot s_A \cdot \sqrt{1 + \tfrac{1}{n}}, \qquad (1)$$
where t_{n,α/2} is a Student-t value defined for n - the number of elements in A, which we assume is
at least 3 - and α := 0.20.3 Note that, given A, a prediction interval generated on A is constructed
to predict individual point-data of subsequent samples drawn from the same population of data; and,
from the given choice of α, there is an 80% chance that any new data to be included in A will
fall between the values ¯xA − λ and ¯xA + λ.

2 More experimentation with a less limited data set is required.
3 It is important to stress that (1) forms predictions for a future point of observation and does not predict future measures
of central tendency; in this way, it differs from tools like confidence intervals, which are commonly used in hypothesis
testing.

Finally, the following is presented in order to formalize a
definition for “unusual” data in C:
Definition: Let A ⊆ C with A := {x1, x2, . . . , xn}. Supposing that y is a point of A (written
y ∈ A), we say that y is a cost outlier for A provided that either y < ¯xA − sA or y > ¯xA + sA. If
y1, y2, . . . , ym are the cost outliers of A then, writing B := A ∼ {y1, . . . , ym} (the subset of A which
contains no cost outliers), we define the modification ˆyj of yj (j = 1, . . . , m) to be

$$\hat{y}_j := \begin{cases} y_j + \bigl(|y_j - \bar{x}_B| - s_B\bigr) & \text{if } y_j < \bar{x}_A - s_A,\\ y_j - \bigl(|y_j - \bar{x}_B| - s_B\bigr) & \text{if } y_j > \bar{x}_A + s_A, \end{cases} \qquad (2)$$

and write ˆA to mean the set A with each yj replaced by ˆyj.
Before proceeding, it is crucial that we understand the meaning of this definition and the value in
(2). The points singled out as being unusual in the above definition are those which fail to fall within
the “middle” 68% of a normal distribution with mean ¯xA and standard deviation sA; the definition is
thus a direct appeal to the Empirical Rule of normal distributions. The value ˆyj can be thought of as a
“horizontal translation of yj to the nearest extremal value of the distribution of non-outlier points.”
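As an illustration only (the author's implementation, described later, is an Excel workbook), here is a minimal Python sketch of the quantities just defined for a single subset A: the modified set ˆA from (2) and the prediction interval from (1). It assumes the square-root form of (1), and it follows the text in using n as the Student-t parameter; a textbook prediction interval would instead use n − 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

ALPHA = 0.20  # the text's choice of alpha, giving an 80% prediction interval


def modified_set(a):
    """Return A-hat per equation (2): each cost outlier (a point more than one
    sample standard deviation from the mean of A) is translated toward the
    non-outlier subset B - up by |y - xbar_B| - s_B for low outliers, down by
    the same amount for high outliers. Assumes A has at least 3 points."""
    a = np.asarray(a, dtype=float)
    xbar_a, s_a = a.mean(), a.std(ddof=1)
    low = a < xbar_a - s_a
    high = a > xbar_a + s_a
    b = a[~(low | high)]                      # B: the non-outlier points
    xbar_b, s_b = b.mean(), b.std(ddof=1)
    a_hat = a.copy()
    a_hat[low] = a[low] + (np.abs(a[low] - xbar_b) - s_b)
    a_hat[high] = a[high] - (np.abs(a[high] - xbar_b) - s_b)
    return a_hat


def prediction_interval(a, alpha=ALPHA):
    """80% prediction interval (xbar_A - lambda, xbar_A + lambda) per (1)."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    xbar_a, s_a = a.mean(), a.std(ddof=1)
    lam = stats.t.ppf(1 - alpha / 2, df=n) * s_a * np.sqrt(1 + 1 / n)
    return xbar_a - lam, xbar_a + lam
```

Applied to a partition Ai, the interval would be computed on the modified set, i.e. prediction_interval(modified_set(a_i)), in keeping with the workflow described in the following section.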
At this juncture, it may be prudent to briefly discuss some assumptions and decisions made re-
garding the definition above - and to clarify what is important to the CERPA methodology. From
empirical observations made on the set C, it is convenient to call a point unusual when it falls more
than one standard deviation from the mean - although, without knowledge of C, readers may find
this choice somewhat arbitrary. It
may be that with a greater amount of data, it will be more convenient to define some other regions of
a normal curve as containing unusual values; however, within the context of the set C, this particular
choice of definition is both reasonable and natural. It also may seem presumptuous to assume that
we may apply properties of a normal distribution to A when A may not be normally distributed; but
the implicit claim to be understood is not that A is normally distributed: rather, that the population
from which A is drawn is normally distributed. (Indeed, such an assumption is valid if for no reason
other than the truth of the Central Limit Theorem of statistics.) In order to make use of the
cost outliers yj, it is necessary to replace each such value with a value more typical of the distribution
implied by B; and in order to maintain the relationships between “low” and “high” cost outliers,
the modification defined by the value ˆyj above has been chosen as being best capable of fulfilling
all necessary considerations - as opposed to mapping each yj to some randomly generated value, for
example.
Finally, it should be noted that the partitioning discussed in previous sections - while chosen for use
in the analysis discussed herein - is not an essential requirement of the CERPA methodology.
Rather, it is simply the best empirically-backed manner in which to guarantee data homogeneity; and
on some other set of cost data (or other data), it is prudent to invoke the CERPA method only after
discerning a partitioning best suited to the set in question. We are now ready to discuss the CERPA
approach.
Details of CERPA
Although CERPA is a methodology, it was also implemented as a Microsoft Excel workbook which
performs all necessary calculations. Thus, in our discussion of the approach, we will appeal to the
layout of CERPA as an Excel workbook for reasons of simplicity and clarity.
Taking the cost data set C, we populate a “Normed Non-Aggregated” (NNA) worksheet using the
partitioning discussed above, which splits C into the subsets A1, A2, . . . , Ar; we then calculate,
for each index i = 1, 2, . . . , r, the outliers of Ai and their modified values. These modified values
are carried into a “Breakdown” worksheet, in which prediction intervals are calculated and displayed.
Finally, the information represented in “Breakdown” is used to populate a “Cost Estimation” sheet,
in which summary-level values are displayed in a format which is meant to maximize the ease with
which the data can be interpreted. “Cost Estimation” also displays values referred to
as Outlier Adjustment Values (OAVs), which are given as a means of handling unusual data within
each partition-set Ai. Referring to the definition in the previous section, this value is defined as
max_{j≤m} |yj − ¯xB|; and OAVs are used to modify the upper and lower values of the prediction intervals
displayed in “Cost Estimation” for unusual products (e.g., first products). As an example, suppose
that CERPA generates a predicted minimum of 1.2 million and a predicted maximum of 2.6 million
for a certain partition, with an associated OAV of 0.7 million. Then the modified predicted minimum
and maximum are 0.5 million and 3.3 million, respectively.
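The OAV computation and the adjustment step amount to the following sketch (again illustrative rather than the workbook itself); the final line mirrors the worked example above.

```python
import numpy as np


def outlier_adjustment_value(a):
    """OAV = max over the cost outliers y_j of |y_j - xbar_B|; 0.0 if A has none."""
    a = np.asarray(a, dtype=float)
    xbar_a, s_a = a.mean(), a.std(ddof=1)
    outlier = (a < xbar_a - s_a) | (a > xbar_a + s_a)
    if not outlier.any():
        return 0.0
    xbar_b = a[~outlier].mean()          # mean of B, the non-outlier subset
    return float(np.max(np.abs(a[outlier] - xbar_b)))


def adjust_for_unusual_product(lower, upper, oav):
    """Widen a partition's prediction interval by its OAV (e.g. for first products)."""
    return lower - oav, upper + oav


# Mirrors the worked example (values in $M): (1.2, 2.6) with OAV 0.7 -> roughly (0.5, 3.3).
print(adjust_for_unusual_product(1.2, 2.6, 0.7))
```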
Generating values in this fashion, the CERPA methodology is able to produce prediction intervals
for each relevant partition of C in which both lower and upper values are strictly larger than zero.
These prediction intervals have been given to cost analysts; and it is hoped that, after testing, CERPA
will aid in the creation of a mathematical tool-box which can be used for estimating the costs of
future training systems.
Issues and Potential Improvements
Despite the positive tone of the comments above, it is necessary to point out some flaws in the CERPA
approach as it currently stands. CERPA is, after all, a first step in a new direction; and it is almost
inevitable that it would suffer some defects. First, and most seriously, there was an insufficient
amount of data available either for strengthening the methodology or for performing statistical tests
(e.g. hypothesis tests) which might provide more insight into cost analysis efforts and the CERPA
itself; and furthermore, the lack of data-points limits which partitions may be considered under the
methodology. (For additional comments, consider the previous sections of this paper.) In the case of
partitions containing precisely 2 data points, the maximum and minimum cost data were substituted
for the CERPA approach; and singleton partitions were ignored completely. Another flaw is that
CERPA is, by construction, capable of only a broad analysis of costs; and is
fundamentally incapable (in its current iteration) of answering questions regarding information which
pertains to the definition of trainers at the subsystem-level of specificity.
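A sketch of the small-partition fallback just described, reusing prediction_interval and modified_set from the earlier sketch; the three-point minimum follows the assumption stated alongside equation (1).

```python
def interval_for_partition(costs):
    """Return a (lower, upper) cost range for one partition, or None.

    n >= 3: CERPA prediction interval on the modified set A-hat;
    n == 2: substitute the observed minimum and maximum, per the text;
    n == 1: singleton partitions are ignored.
    """
    costs = list(costs)
    if len(costs) >= 3:
        return prediction_interval(modified_set(costs))
    if len(costs) == 2:
        return min(costs), max(costs)
    return None
```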
Yet another point to be considered concerns the intricacy present in the approach to modifying
cost outlier values. It should be noted that the proposed calculations involve three separate means
and standard deviation values; and that the values used to modify unusual points specifically exclude
those unusual points. It may seem more logical and reasonable to make use of fewer such calculations;
to make use, perhaps, of merely two such calculation pairs, and to utilize the mean and standard
deviation taken over the entire partition Ai (i = 1, . . . , r) to modify outlier values. But while this
approach may be intuitively appealing for its simplicity, and while it is certainly the place of this
paper to propose (and indeed encourage) the exploration of such changes to the CERPA methodology,
it must be noted that, within the context of C, this change failed to produce meaningful data.
Until such time as greater quantities of contract data containing greater amounts of detail are
readily available for use in testing and expanding the CERPA approach, it is likely that propositions
of this nature will meet similar difficulties.
However, the above are not fatal flaws within the approach. In fact, all of the critiques mentioned
can be seen as originating from the same essential problem: a lack of information (in terms of both
quantity and depth) in contracts paired with a lack of contract data to analyze. It should be recalled
that efforts expended upon the CERPA method were meant to determine if a predictive method could
be generated from the limited amount of data available within C; and in view of this, the efforts
described here are to be considered a success and a step forward. From the analyses performed,
it was shown to be possible to generate predictive values and predictive intervals directly from the
data in C. With the knowledge that such results do exist and are attainable, it is possible to either
refine CERPA or develop a more appropriate analytical tool as new data points are made available to
TERN. Although such tasks are time-consuming and tedious to perform, such an effort will produce
invaluable assets for the purposes of cost analysis; and moreover, such a tool may even harbor the
capacity to produce further, more powerful mathematical tools for assessing training system cost
data. Therefore, it is highly recommended that thought be given to the task of working with and
improving CERPA.
References
[1] Larson, Ron, and Betsy Farber. Elementary Statistics: Picturing the World 4th Edition. Upper
Saddle River: Prentice Hall, 2008. Print.
[2] Ramachandran, K. M., and Chris P. Tsokos. Mathematical Statistics with Applications. Burlington:
Academic Press, 2009. Print.
[3] Turner, Bryan. Trainer Estimating Resource Network (TERN) Master. 2012. Microsoft Excel file.
Information from the coursebook associated with the Defense Acquisition University course In-
termediate Cost Analysis (BCF 204) was also used in the development of material pertinent to this
paper.