SlideShare a Scribd company logo
1 of 4
Download to read offline
SuΓ‘rez DΓ­az Ronald et al Int. Journal of Engineering Research and Application
ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.469-472

RESEARCH ARTICLE

www.ijera.com

OPEN ACCESS

Statistical Methodology for Quality Comparison between
Algorithms
SuΓ‘rez DΓ­az Ronald1, Maldonado Ascanio Erik2, Escorcia Caballero Juan3
1

(Professor and Researcher, Department of Industrial Engineering, Universidad SimΓ³n BolΓ­var, Colombia)
(Professor and Researcher, Department of Industrial Engineering, Universidad AutΓ³noma del Caribe,
Colombia)
3
(Professor and Researcher, Department of Industrial Engineering, Universidad AutΓ³noma del Caribe,
Colombia)
2

ABSTRACT
Performance on the quality of the responses obtained by two algorithms, it is not something to be taken lightly.
Most of the tests used, are based on the calculation of average, however, this statistic can be affected by extreme
results. In addition, a mean difference may be statistically insignificant in practical terms can be decisive. In this
paper we propose a statistical test that solved the mentioned problems. The proposed methodology in this
research also determines the sample size and the probability that there is significant difference when really there
is not (Type II error), factors that generally are not considered in traditional tests.Keywords-Comparison of
Algorithms,
Number
of
instances,
Optimization,
Statistical
test,
Type
II
error.

I.

INTRODUCTION

It is common in the literature, when comparing the
performance of algorithms is to use a mean test,
mainly a paired t-student test, however, this method
may present some drawbacks from the statistical
viewpoint, since by using a paired t-student test
without verifying the normality assumption in the
samples can lead to significant biases in the
conclusions. Further that in some cases is not taken
into account the percentage of times that exceeds
another algorithm when solving a specific problem.
Theseinconvenientcan be explained by the fact that the
paired t-student test resulted in no significant
difference between the results obtained by both
algorithms for a particular problem, but such a
conclusion could be affected by the presence of
outliers (for example, one or more very large
differences in absolute value). Similarly, the presence
of some extremely large differences can give false
evidence that an algorithm outperforms another, when
in fact this may not be so. In optimization problems,
the statistical differences considered not significant
may prove to be dangerous, because if we are talking
for example, of thousands of dollars, two meanslike
985,000 and 965,000 may be statistically equal, but in
monetary terms, 20 million dollars is extremely
significant.

II.

LITERATURE REVIEW

Generally, when comparing two algorithms,
one of the most commonly used statistical measures is
the mean, eventually leading to the use of a t-student
test (Shilane et al, 2008). However, for the case that
compares 3 or more algorithms, it is more convenient
to make a modification in the calculation of the test
www.ijera.com

statistical (Kenward & Roger, 1997). Other multiple
comparison methods are treated by Steel & Torrie
(1980), Hochberg & Tamhane (1987), Shaffer (1995),
Sokal & Rohlf (1995), Hsu (1996) and Clever &
Scarisbrick, (2001). Besides the use of the average for
the comparison of algorithms, other measures of
interest are the variance and/or the entropy (Rogers &
Hsu, 2001; Piepho, 2004).
In regard to performance measurements used,
the most common is the quality measurement
algorithm. Authors such as Barr (1995), Eiben (2002),
Bartz & Beielstein (2004), Birattari (2005) and Hughes
(2006) present a list of the different performance
measures used in the literature, however, within the
measures, no importance is given to the percentage of
times that exceeds another algorithm. Peer et al (2003)
conducted a study, which shows several cases in which
appropriate statistical methods used to analyze results.
Furthermore, some authors report the use of
various methods to compare different types of
problems. Dietterich (1997) reviews five statistical test
for comparing learning algorithms Brazdil et al (2000)
compared ranking methods for the selection of
classification algorithms, Bouckaert (2003) employs
several calibration test for the selection of learning
algorithms and Shilane et al (2008) propose a
statistical methodology for genetic algorithms
performance comparison. In the methodologies cited
above the average ranking, ranking of successive radii
rate, among others.
In this paper, we illustrate a statistical
methodology for the comparison of results between
algorithms, taking into account the value of the
differences, although not statistically significant,
without being affected by this alleged meeting or
469 | P a g e
SuΓ‘rez DΓ­az Ronald et al Int. Journal of Engineering Research and Application
ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.469-472
outliers. A procedure similar to the one at, is employed
in the design of sampling plans for quality control
(Montgomery, 2009), but the focus of this
investigation is different, and the literature review
conducted, indicates that there has been used for
comparison of algorithms, or estimate the quality of a
particular algorithm.

III.

www.ijera.com

Case 2: when trying to prove 𝑝 ≀ 𝑝0 , for this case, we
have:
𝐻0 : 𝑝 = 𝑝0
𝐻1 : 𝑝 > 𝑝0
(3)

PROPOSED METHODOLOGY
The p value corresponding to this case is:

Consider the following set of symbols required to build
the theoretical framework of the proposed
methodology. The developed method seeks to solve a
minimization problem.
n: Numberofinstancesoftheproblemtosolve
fA : Objectivefunctionvalue bythealgorithmA
fB : ObjectivefunctionvaluebythealgorithmB
fB βˆ’ fA
Ξ³A/B : Relative efficiency of Aover B =
fA
1, ifΞ³A/B β‰₯ Ξ³0
x: Bernoulli random variable =
0, inothercase
n

X = Total number of successes =

xi

𝑝 π‘£π‘Žπ‘™π‘’π‘’ = 𝑃 𝑋 β‰₯ 𝑋0 =

𝑛
𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘–
𝑖 0

𝑛
𝑖=𝑋0

(4)

The rejection criteria is identical to the previous case.
Case 3: when trying to prove 𝑝 = 𝑝0 , for this case, we
have:
𝐻0 : 𝑝 = 𝑝0
𝐻1 : 𝑝 β‰  𝑝0
(5)
For this case, if there is no evidence to support the null
hypothesis, we observe or be a very small value of X,
or a very large value. Thus, there are two sub-cases:

i=1

p: P(x = 1)
Ξ±: Type I error
Ξ²: Type II error
Thus, instead of inquiring whether an average
algorithm outperforms other, we propose to determine
if 100p% (strictly equal, at least, or maximum) of the
time, an algorithm outperforms other with a minimum
relative efficiency 𝛾0 . Since X is the sum of Bernoulli
random variables, turns out to be a binomial random
variable, whereby this distribution should be used to
verify the statistical validity of the assertion with
respect to p. Following we describe each of the three
cases that can occur.
Case 1: when trying to prove 𝑝 β‰₯ 𝑝0 , The approach of
the hypothesis would be the following:
𝐻0 : 𝑝 = 𝑝0
𝐻1 : 𝑝 < 𝑝0
(1)
At first glance, it seems a test of proportions, leading
to use the normal distribution (assuming p fits this
distribution) to reach a conclusion. However, the idea
is not to rely on this assumption and using the
Binomial distribution would be really appropriate. If
the null hypothesis is true, this should be reflected in
the value of X in the sample. A high p value observed
should lead to a large value of X, while a small value
of p should be reflected in a small value of X. Thus, if
the null hypothesis is true, finding a value greater than
or equal to X should be very unlikely. Then, the p
value of the test proposed in (1) is calculated as:
𝑛
𝑋0
𝑝 π‘£π‘Žπ‘™π‘’π‘’ = 𝑃 𝑋 ≀ 𝑋0 = 𝑖=0
𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘–
(2)
𝑖 0
Then, H0 must be rejected if 𝑝 π‘£π‘Žπ‘™π‘’π‘’ ≀ 𝛼.

www.ijera.com

1) 𝑋 ≀ 𝑛/2. since, at the beginning, is unknown in
what sense is the deviation of x, we have:
𝑛
𝑋0
𝑝 π‘£π‘Žπ‘™π‘’π‘’ = 𝑃 𝑋 ≀ 𝑋0 = 2 𝑖=0
𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘– (6)
𝑖 0
2) 𝑋 β‰₯ 𝑛/2. we have:

𝑛

𝑝 π‘£π‘Žπ‘™π‘’π‘’ = 2𝑃 𝑋 β‰₯ 𝑋0 = 2
𝑖=𝑋0

𝑛
𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘–
𝑖 0

(7)
Example: Suppose two algorithms were used to solve
10 instances of the same minimization problem. The
results obtained were as follows:
Table 1. Example for the comparison of algorithms
Instance
1
2
3
4
5
1875
875
1335
542
852
𝑓𝐴
1897
924
842
624
892
𝑓𝐡
Instance
6
7
8
9
10
1854
185
2583
3541
365
𝑓𝐴
1896
120
2612
3663
369
𝑓𝐡
We want to determine if the algorithm A exceeds at
least 80% of the time the algorithm B, with a 95%
confidence level. If used a paired t-student test and if
we verify the equality of means (a different
hypothesis), we would have:
𝑑 = 6.8; 𝑠 𝑑 =
144,11; 𝑇 = 0.472and 𝑑0.025 ,9 = 1.83. Since
𝑇<
𝑑0.025 ,9 , we cannot reject the null hypothesis, so that
the algorithms in average generate the same result.
However, in practical problems, what really matters is
to see which algorithm gets the best answer in most
cases.We can propose the following hypothesis:

470 | P a g e
SuΓ‘rez DΓ­az Ronald et al Int. Journal of Engineering Research and Application
ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.469-472
𝐻0 : 𝑝 = 0.8
𝐻1 : 𝑝 < 0.8

𝑍𝛼

8

𝑝 π‘£π‘Žπ‘™π‘’π‘’ = 𝑃 𝑋 ≀ 8 =
𝑖=0

10
0.8 𝑖 0.210βˆ’π‘–
𝑖

= 0.62 > 0.05
Therefore, we cannot reject the null hypothesis, so that
with a 95% confidence, it is concluded that the
algorithm A exceeds the algorithm B at least 80% of
the time.
3.1 Type II error and Number of instances
Generally, researchers arbitrarily choose the number of
instances that apply to algorithms to compare, which
can carry with it a potentially dangerous type II error.
Suppose that the test proposed in (1), the true value of
𝑝 is at least 𝑝′ 0 , where 𝑝′ 0 < 𝑝. Let π‘Ÿ be the critical
value of the test:
π‘Ÿ
𝑖=0

𝑃 𝑋≀ π‘Ÿ = 𝛼=

𝑛
𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘–
𝑖 0

(8)

Since in a hypothesis test, 𝛽 is the probability of not
rejecting the null hypothesis when it is false, we have:
𝑛
𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘–
𝑖 0
(9)
(8) and (9) representa system of two non-linear
equations with two unknown variables, π‘Ÿ and𝑛. Since
the system is extremely difficult to solve, two phases
will be used to solve it:
𝛽 = 𝑃(𝑋 > π‘Ÿ/𝑝 β‰₯ 𝑝′ 0 ) =

𝑛
𝑖=π‘Ÿ+1

Phase 1: Normal approximation
Assuming an initial approximation of the binomial
distribution to the normal distribution, we have the
following parameterization:
𝑍=

π‘₯βˆ’π‘›π‘

(10)

𝑛𝑝 (1βˆ’π‘)

Thus, using jointly (8), (9) and (10), we have:
𝑍𝛼 =
𝑍1βˆ’π›½ =

π‘Ÿβˆ’π‘›π‘ 0

(12)

𝑛𝑝′0 (1βˆ’π‘β€²0 )

Isolating π‘Ÿ from (11) and (12):
𝑍 𝛼 𝑛𝑝0 (1 βˆ’ 𝑝0 ) + 𝑛𝑝0 = 𝑍1βˆ’π›½
𝑛𝑝′0
Associating terms, we have:
𝑛

𝑍𝛼

𝑝0 1 βˆ’ 𝑝0 βˆ’ 𝑍1βˆ’π›½
= 𝑛(𝑝′ 0 βˆ’ 𝑝0 )

www.ijera.com

𝑛𝑝′0 (1 βˆ’ 𝑝′0 ) +
(13)
𝑝′ 0 1 βˆ’ 𝑝′ 0

(𝑝′ 0 βˆ’ 𝑝0 )
2

𝑍𝛼

𝑛′ =

𝑛𝑝 β€² 0 1βˆ’π‘ β€² 0

𝑝 0 1βˆ’π‘ 0 βˆ’π‘1βˆ’π›½

(14)

(𝑝 β€² 0 βˆ’π‘ 0 )2

Since (14) can calculate an approximation only, we
should look for the exact number of instances to run.
This is done in phase two.
Phase 2: Calculation for the number of instances
Then, we propose an algorithm to find the appropriate
value of the number of instances that ensures desired
significance and statistical power in the analysis.
Step 1: make n = nβ€²
Step 2: Calculate a and b such that:
π‘₯

π‘Ž = {min π‘₯ /
π‘₯

𝑏 = {min π‘₯ /
𝑖=0

𝑖=0

𝑛
𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘– ≀ 𝛼}
𝑖 0

𝑛
𝑝′0 𝑖 (1 βˆ’ 𝑝0 )β€² π‘›βˆ’π‘– β‰₯ 1 βˆ’ 𝛽}
𝑖

𝐼𝑓 π‘Ž = 𝑏 𝑒𝑛𝑑, 𝑒𝑙𝑠𝑒, π‘šπ‘Žπ‘˜π‘’ 𝑛
= 𝑛 + 1 π‘Žπ‘›π‘‘ π‘Ÿπ‘’π‘π‘’π‘Žπ‘‘ 𝑠𝑑𝑒𝑝
Phase 1 avoids unnecessary scans, as it establishes a
minimum limit to the number of instances to be
executed.
3.2 Performance Measure of an Algorithm
Besides being useful for proper comparison of
algorithms in optimization problems, the test can be
modified to measure how well an algorithm performs
compared to the optimal solution of the problem (if the
latter is known). Under the assumption that this is a
minimization problem, let 𝑓 𝑂𝑃𝑇 be the minimum of a
particular instance. Thus, we define the following:
𝑓 𝐴 βˆ’ 𝑓 𝑂𝑃𝑇
𝛾: π‘Ÿπ‘’π‘™π‘Žπ‘‘π‘–π‘£π‘’ π‘‘π‘’π‘£π‘–π‘Žπ‘‘π‘–π‘œπ‘› =
𝑓 𝑂𝑃𝑇
Redefining x:
1, 𝑖𝑓 𝛾 ≀ 𝛾0
π‘₯=
0, 𝑖𝑛 π‘œπ‘‘β„Žπ‘’π‘Ÿ π‘π‘Žπ‘ π‘’
𝑛

𝑋=
(11)

𝑛𝑝 0 (1βˆ’π‘ 0 )
π‘Ÿβˆ’π‘›π‘β€²0

𝑝′ 0 1 βˆ’ 𝑝′ 0

𝑝0 1 βˆ’ 𝑝0 βˆ’ 𝑍1βˆ’π›½

𝑛=

From the information, we have 𝛾0 = 0, from which it
follows𝑋0 = 8. Thus:

www.ijera.com

π‘₯𝑖
𝑖=0

Thus, the methodology allows us to study, with some
reliability, the percentage of times that the algorithm is
at a maximum distance of the optimal solution of a
problem.
Must be clarified an extremely important point, in
regard to the scope of the conclusion of the test. When
working with instances of very different sizes, it is
advisable to group them by type (small, medium and
large) and make an Anova Test first to determine
whether the size of the instance is a factor that affects
the efficiency of the response obtained by the
algorithm. If so, the methodology proposed here
should be applied only to relatively homogeneous size
instances, so that the conclusions are only valid for
instances of that range. If on the contrary, the variance
471 | P a g e
SuΓ‘rez DΓ­az Ronald et al Int. Journal of Engineering Research and Application
ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.469-472
analysis revealed that the quality of the algorithm is
independent of the size of the instance, then, the test
results can be generalized.
IV.
CONCLUSION
In this paper, we present a statistical methodology for
the comparison of algorithms used to solve
optimization problems. The methodology does not
depend on compliance with any statistical assumption
for the accuracy of their results, and allows us to
establish with any degree of reliability, the percentage
of times that an algorithm outperforms another. This
statistic test, under minor modifications, can be
extended to measure the performance of an algorithm
compared to the optimal response to the problem. The
scope of the conclusions depends on whether the
quality of the response of the algorithm is affected or
not by the size of instances resolved.

[11]

[12]

[13]
[14]

[15]

REFERENCES
[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]
[10]

Barr, R., Golden, R., Kelly, J., Rescende M.,
Stewart, W. Designing and Reporting on
Computational Experiments with Heuristic
Methods, Journal of Heuristics, 1995.
Bartz, T., Beielstein, T. Design and Analysis
of
Optimization
Algorithms
Using
Computational
Statistics.
AppliedNumericalAnalysis& Computational
Mathematics, 2004.
Birattari, M., Dorigo, M. How to assess and
report the performance of a stochastic
algorithm on a benchmark problem: Mean or
best result on a number of runs? IRIDIA,
Universite Libre de Bruxelles, 2005.
Bouckaert, R. Choosing between two learning
algorithms based of calibrate test. Twentieth
International
Conferenceon
Machine
Learning. Washington DC, 2003.
Brazdil, P, Soarez, C. A Comparison of
Ranking
Methods
for
Classification
Algorithm Selection. Faculty of Economics,
University of Porto, Portugal.
Clever, A.G., Scarisbrick, D.H., 2001.
Practical Statistics and Experimental Design
for Plant and Crop Science. Wiley, New
York.
Dietterich, T. Approximate Statistical Test for
Comparing
Supervised
Classification
Learning
Algorithms.
Corvallis,
OregonStateUniversity, 1993.
Eiben, A. E., Jelasity, M. A critical note on
experimental research methodology in EC.
Proceedings of the 2002 Congress on
Evolutionary Computation (CEC '02), 2002.
Hochberg,Y., Tamhane, A.C., 1987. Multiple
Comparison Procedures.Wiley, New York.
Hsu, J.C., 1996. Multiple Comparisons:
Theory and Methods. Chapman& Hall,
London.

www.ijera.com

[16]

[17]
[18]

[19]
[20]

www.ijera.com

Hughes, E. J. Assessing Robustness of
Optimization Performance for Problems with
Expensive Evaluation Functions. IEEE
CongressonEvolutionaryComputation (CEC
2006), 2006.
Kenward, M.G., Roger, J.H., 1997. Small
sample inference for fixed effects from
restricted maximum likelihood. Biometrics 53
(3), 983–997.
Montgomey, D., 2009. Statistical Quality
Control. Wiley, Arizona.
Peer, E. S., Engelbrecht, A. P., Van den
Bergh, F. CIRG@UP OptiBench: A
statistically
sound
framework
for
benchmarking
optimisation
algorithms.
CongressonEvolutionaryComputation, 2003.
Piepho, H.-P., 2004. An algorithm for a letterbased
representation
of
all-pairwise
comparisons. J. Comput. Graph. Statist. 13,
456–466.
Rogers, J.A., Hsu, J.C., 2001. Multiple
Comparisons of Biodiversity. Biometrical J.
43 (5), 617–625.
Shaffer, J.P., 1995. Multiple hypothesis
testing. Ann. Rev. Psych. 46, 561–584.
Shilane, D., Martikainen, J., Dudoit, S.,
Ovaska, S. A General Framework for
Statistical Performance Comparison of
Evolutionary
Computation
Algorithms.
InformationSciences 178 (2008), 2870-2879.
Sokal, R.R., Rohlf, F.J., 1995. Biometry.
Freeman, New York.
Steel, R.G.D., Torrie, J.H., 1980. Principles
and Procedures of Statistics. McGraw-Hill,
New York.

472 | P a g e

More Related Content

What's hot

Les5e ppt 07
Les5e ppt 07Les5e ppt 07
Les5e ppt 07Subas Nandy
Β 
Statistical tests
Statistical testsStatistical tests
Statistical testsmartyynyyte
Β 
Practice test ch 10 correlation reg ch 11 gof ch12 anova
Practice test ch 10 correlation reg ch 11 gof ch12 anovaPractice test ch 10 correlation reg ch 11 gof ch12 anova
Practice test ch 10 correlation reg ch 11 gof ch12 anovaLong Beach City College
Β 
Test hypothesis
Test hypothesisTest hypothesis
Test hypothesisHomework Guru
Β 
Presentation on Hypothesis Test by Ashik Amin Prem
Presentation on Hypothesis Test by Ashik Amin PremPresentation on Hypothesis Test by Ashik Amin Prem
Presentation on Hypothesis Test by Ashik Amin PremAshikAminPrem
Β 
Two Means, Two Dependent Samples, Matched Pairs
Two Means, Two Dependent Samples, Matched PairsTwo Means, Two Dependent Samples, Matched Pairs
Two Means, Two Dependent Samples, Matched PairsLong Beach City College
Β 
High-Dimensional Methods: Examples for Inference on Structural Effects
High-Dimensional Methods: Examples for Inference on Structural EffectsHigh-Dimensional Methods: Examples for Inference on Structural Effects
High-Dimensional Methods: Examples for Inference on Structural EffectsNBER
Β 
Statr session 17 and 18 (ASTR)
Statr session 17 and 18 (ASTR)Statr session 17 and 18 (ASTR)
Statr session 17 and 18 (ASTR)Ruru Chowdhury
Β 
Practice Test 4A Hypothesis Testing Solution
Practice Test 4A Hypothesis Testing SolutionPractice Test 4A Hypothesis Testing Solution
Practice Test 4A Hypothesis Testing SolutionLong Beach City College
Β 
Statistical inference: Estimation
Statistical inference: EstimationStatistical inference: Estimation
Statistical inference: EstimationParag Shah
Β 
Statistical Analysis with R- III
Statistical Analysis with R- IIIStatistical Analysis with R- III
Statistical Analysis with R- IIIAkhila Prabhakaran
Β 
Student’s t test
Student’s  t testStudent’s  t test
Student’s t testLorena Villela
Β 
Hypothesis Test Selection Guide
Hypothesis Test Selection GuideHypothesis Test Selection Guide
Hypothesis Test Selection GuideLeanleaders.org
Β 
An Introduction to Mis-Specification (M-S) Testing
An Introduction to Mis-Specification (M-S) TestingAn Introduction to Mis-Specification (M-S) Testing
An Introduction to Mis-Specification (M-S) Testingjemille6
Β 

What's hot (20)

Basics of Hypothesis Testing
Basics of Hypothesis TestingBasics of Hypothesis Testing
Basics of Hypothesis Testing
Β 
Contingency Tables
Contingency TablesContingency Tables
Contingency Tables
Β 
Les5e ppt 07
Les5e ppt 07Les5e ppt 07
Les5e ppt 07
Β 
Statistical tests
Statistical testsStatistical tests
Statistical tests
Β 
Goodness of Fit Notation
Goodness of Fit NotationGoodness of Fit Notation
Goodness of Fit Notation
Β 
Chi square test
Chi square test Chi square test
Chi square test
Β 
Practice test ch 10 correlation reg ch 11 gof ch12 anova
Practice test ch 10 correlation reg ch 11 gof ch12 anovaPractice test ch 10 correlation reg ch 11 gof ch12 anova
Practice test ch 10 correlation reg ch 11 gof ch12 anova
Β 
Test hypothesis
Test hypothesisTest hypothesis
Test hypothesis
Β 
Mc namer test of correlation
Mc namer test of correlationMc namer test of correlation
Mc namer test of correlation
Β 
Presentation on Hypothesis Test by Ashik Amin Prem
Presentation on Hypothesis Test by Ashik Amin PremPresentation on Hypothesis Test by Ashik Amin Prem
Presentation on Hypothesis Test by Ashik Amin Prem
Β 
Two Means, Two Dependent Samples, Matched Pairs
Two Means, Two Dependent Samples, Matched PairsTwo Means, Two Dependent Samples, Matched Pairs
Two Means, Two Dependent Samples, Matched Pairs
Β 
High-Dimensional Methods: Examples for Inference on Structural Effects
High-Dimensional Methods: Examples for Inference on Structural EffectsHigh-Dimensional Methods: Examples for Inference on Structural Effects
High-Dimensional Methods: Examples for Inference on Structural Effects
Β 
Statr session 17 and 18 (ASTR)
Statr session 17 and 18 (ASTR)Statr session 17 and 18 (ASTR)
Statr session 17 and 18 (ASTR)
Β 
Practice Test 4A Hypothesis Testing Solution
Practice Test 4A Hypothesis Testing SolutionPractice Test 4A Hypothesis Testing Solution
Practice Test 4A Hypothesis Testing Solution
Β 
Mc Nemar
Mc NemarMc Nemar
Mc Nemar
Β 
Statistical inference: Estimation
Statistical inference: EstimationStatistical inference: Estimation
Statistical inference: Estimation
Β 
Statistical Analysis with R- III
Statistical Analysis with R- IIIStatistical Analysis with R- III
Statistical Analysis with R- III
Β 
Student’s t test
Student’s  t testStudent’s  t test
Student’s t test
Β 
Hypothesis Test Selection Guide
Hypothesis Test Selection GuideHypothesis Test Selection Guide
Hypothesis Test Selection Guide
Β 
An Introduction to Mis-Specification (M-S) Testing
An Introduction to Mis-Specification (M-S) TestingAn Introduction to Mis-Specification (M-S) Testing
An Introduction to Mis-Specification (M-S) Testing
Β 

Viewers also liked

Viewers also liked (20)

Bl36378381
Bl36378381Bl36378381
Bl36378381
Β 
Cf36490495
Cf36490495Cf36490495
Cf36490495
Β 
Bz36459463
Bz36459463Bz36459463
Bz36459463
Β 
Ci36507510
Ci36507510Ci36507510
Ci36507510
Β 
Bp36398403
Bp36398403Bp36398403
Bp36398403
Β 
Cp36547553
Cp36547553Cp36547553
Cp36547553
Β 
Ck36515520
Ck36515520Ck36515520
Ck36515520
Β 
Bu36433437
Bu36433437Bu36433437
Bu36433437
Β 
Bn36386389
Bn36386389Bn36386389
Bn36386389
Β 
Bt36430432
Bt36430432Bt36430432
Bt36430432
Β 
Bk36372377
Bk36372377Bk36372377
Bk36372377
Β 
Cq36554559
Cq36554559Cq36554559
Cq36554559
Β 
By36454458
By36454458By36454458
By36454458
Β 
Bw36444448
Bw36444448Bw36444448
Bw36444448
Β 
Bi36358362
Bi36358362Bi36358362
Bi36358362
Β 
Bf36342346
Bf36342346Bf36342346
Bf36342346
Β 
Bo36390397
Bo36390397Bo36390397
Bo36390397
Β 
Co36544546
Co36544546Co36544546
Co36544546
Β 
Bv36438443
Bv36438443Bv36438443
Bv36438443
Β 
Cj36511514
Cj36511514Cj36511514
Cj36511514
Β 

Similar to Cb36469472

06StatisticalInference.pptx
06StatisticalInference.pptx06StatisticalInference.pptx
06StatisticalInference.pptxvijaykumar838577
Β 
Probability distribution Function & Decision Trees in machine learning
Probability distribution Function  & Decision Trees in machine learningProbability distribution Function  & Decision Trees in machine learning
Probability distribution Function & Decision Trees in machine learningSadia Zafar
Β 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testingDenni Domingo
Β 
TEST #1Perform the following two-tailed hypothesis test, using a.docx
TEST #1Perform the following two-tailed hypothesis test, using a.docxTEST #1Perform the following two-tailed hypothesis test, using a.docx
TEST #1Perform the following two-tailed hypothesis test, using a.docxmattinsonjanel
Β 
Non Parametric Test by Vikramjit Singh
Non Parametric Test by  Vikramjit SinghNon Parametric Test by  Vikramjit Singh
Non Parametric Test by Vikramjit SinghVikramjit Singh
Β 
Nber Lecture Final
Nber Lecture FinalNber Lecture Final
Nber Lecture FinalNBER
Β 
statistics-for-analytical-chemistry (1).ppt
statistics-for-analytical-chemistry (1).pptstatistics-for-analytical-chemistry (1).ppt
statistics-for-analytical-chemistry (1).pptHalilIbrahimUlusoy
Β 
Statistics practice for finalBe sure to review the following.docx
Statistics practice for finalBe sure to review the following.docxStatistics practice for finalBe sure to review the following.docx
Statistics practice for finalBe sure to review the following.docxdessiechisomjj4
Β 
Model validation strategies ftc 2018
Model validation strategies ftc 2018Model validation strategies ftc 2018
Model validation strategies ftc 2018Philip Ramsey
Β 
chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptShayanChowdary
Β 
Chapter 3.pptx
Chapter 3.pptxChapter 3.pptx
Chapter 3.pptxmahamoh6
Β 
Hypothesis Testing techniques in social research.ppt
Hypothesis Testing techniques in social research.pptHypothesis Testing techniques in social research.ppt
Hypothesis Testing techniques in social research.pptSolomonkiplimo
Β 
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docxWeek 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docxcockekeshia
Β 
A NEW STUDY OF TRAPEZOIDAL, SIMPSON’S1/3 AND SIMPSON’S 3/8 RULES OF NUMERICAL...
A NEW STUDY OF TRAPEZOIDAL, SIMPSON’S1/3 AND SIMPSON’S 3/8 RULES OF NUMERICAL...A NEW STUDY OF TRAPEZOIDAL, SIMPSON’S1/3 AND SIMPSON’S 3/8 RULES OF NUMERICAL...
A NEW STUDY OF TRAPEZOIDAL, SIMPSON’S1/3 AND SIMPSON’S 3/8 RULES OF NUMERICAL...mathsjournal
Β 
sigir2018tutorial
sigir2018tutorialsigir2018tutorial
sigir2018tutorialTetsuya Sakai
Β 
Bel ventutorial hetero
Bel ventutorial heteroBel ventutorial hetero
Bel ventutorial heteroEdda Kang
Β 
Statistics
StatisticsStatistics
StatisticsBob Smullen
Β 
Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Dr Athar Khan
Β 

Similar to Cb36469472 (20)

06StatisticalInference.pptx
06StatisticalInference.pptx06StatisticalInference.pptx
06StatisticalInference.pptx
Β 
Probability distribution Function & Decision Trees in machine learning
Probability distribution Function  & Decision Trees in machine learningProbability distribution Function  & Decision Trees in machine learning
Probability distribution Function & Decision Trees in machine learning
Β 
SNM-1.pdf
SNM-1.pdfSNM-1.pdf
SNM-1.pdf
Β 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
Β 
TEST #1Perform the following two-tailed hypothesis test, using a.docx
TEST #1Perform the following two-tailed hypothesis test, using a.docxTEST #1Perform the following two-tailed hypothesis test, using a.docx
TEST #1Perform the following two-tailed hypothesis test, using a.docx
Β 
Non Parametric Test by Vikramjit Singh
Non Parametric Test by  Vikramjit SinghNon Parametric Test by  Vikramjit Singh
Non Parametric Test by Vikramjit Singh
Β 
Nber Lecture Final
Nber Lecture FinalNber Lecture Final
Nber Lecture Final
Β 
statistics-for-analytical-chemistry (1).ppt
statistics-for-analytical-chemistry (1).pptstatistics-for-analytical-chemistry (1).ppt
statistics-for-analytical-chemistry (1).ppt
Β 
Statistics practice for finalBe sure to review the following.docx
Statistics practice for finalBe sure to review the following.docxStatistics practice for finalBe sure to review the following.docx
Statistics practice for finalBe sure to review the following.docx
Β 
1624.pptx
1624.pptx1624.pptx
1624.pptx
Β 
Model validation strategies ftc 2018
Model validation strategies ftc 2018Model validation strategies ftc 2018
Model validation strategies ftc 2018
Β 
chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.ppt
Β 
Chapter 3.pptx
Chapter 3.pptxChapter 3.pptx
Chapter 3.pptx
Β 
Hypothesis Testing techniques in social research.ppt
Hypothesis Testing techniques in social research.pptHypothesis Testing techniques in social research.ppt
Hypothesis Testing techniques in social research.ppt
Β 
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docxWeek 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Β 
A NEW STUDY OF TRAPEZOIDAL, SIMPSON’S1/3 AND SIMPSON’S 3/8 RULES OF NUMERICAL...
A NEW STUDY OF TRAPEZOIDAL, SIMPSON’S1/3 AND SIMPSON’S 3/8 RULES OF NUMERICAL...A NEW STUDY OF TRAPEZOIDAL, SIMPSON’S1/3 AND SIMPSON’S 3/8 RULES OF NUMERICAL...
A NEW STUDY OF TRAPEZOIDAL, SIMPSON’S1/3 AND SIMPSON’S 3/8 RULES OF NUMERICAL...
Β 
sigir2018tutorial
sigir2018tutorialsigir2018tutorial
sigir2018tutorial
Β 
Bel ventutorial hetero
Bel ventutorial heteroBel ventutorial hetero
Bel ventutorial hetero
Β 
Statistics
StatisticsStatistics
Statistics
Β 
Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Binary OR Binomial logistic regression
Binary OR Binomial logistic regression
Β 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
Β 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
Β 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
Β 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
Β 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
Β 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
Β 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
Β 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
Β 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
Β 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
Β 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
Β 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
Β 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
Β 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
Β 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
Β 
WhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
Β 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
Β 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
Β 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
Β 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
Β 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Β 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Β 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
Β 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
Β 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Β 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
Β 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Β 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
Β 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
Β 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Β 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
Β 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
Β 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
Β 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
Β 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
Β 
WhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure service
Β 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Β 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
Β 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Β 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Β 

Cb36469472

  • 1. SuΓ‘rez DΓ­az Ronald et al Int. Journal of Engineering Research and Application ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.469-472 RESEARCH ARTICLE www.ijera.com OPEN ACCESS Statistical Methodology for Quality Comparison between Algorithms SuΓ‘rez DΓ­az Ronald1, Maldonado Ascanio Erik2, Escorcia Caballero Juan3 1 (Professor and Researcher, Department of Industrial Engineering, Universidad SimΓ³n BolΓ­var, Colombia) (Professor and Researcher, Department of Industrial Engineering, Universidad AutΓ³noma del Caribe, Colombia) 3 (Professor and Researcher, Department of Industrial Engineering, Universidad AutΓ³noma del Caribe, Colombia) 2 ABSTRACT Performance on the quality of the responses obtained by two algorithms, it is not something to be taken lightly. Most of the tests used, are based on the calculation of average, however, this statistic can be affected by extreme results. In addition, a mean difference may be statistically insignificant in practical terms can be decisive. In this paper we propose a statistical test that solved the mentioned problems. The proposed methodology in this research also determines the sample size and the probability that there is significant difference when really there is not (Type II error), factors that generally are not considered in traditional tests.Keywords-Comparison of Algorithms, Number of instances, Optimization, Statistical test, Type II error. I. INTRODUCTION It is common in the literature, when comparing the performance of algorithms is to use a mean test, mainly a paired t-student test, however, this method may present some drawbacks from the statistical viewpoint, since by using a paired t-student test without verifying the normality assumption in the samples can lead to significant biases in the conclusions. Further that in some cases is not taken into account the percentage of times that exceeds another algorithm when solving a specific problem. Theseinconvenientcan be explained by the fact that the paired t-student test resulted in no significant difference between the results obtained by both algorithms for a particular problem, but such a conclusion could be affected by the presence of outliers (for example, one or more very large differences in absolute value). Similarly, the presence of some extremely large differences can give false evidence that an algorithm outperforms another, when in fact this may not be so. In optimization problems, the statistical differences considered not significant may prove to be dangerous, because if we are talking for example, of thousands of dollars, two meanslike 985,000 and 965,000 may be statistically equal, but in monetary terms, 20 million dollars is extremely significant. II. LITERATURE REVIEW Generally, when comparing two algorithms, one of the most commonly used statistical measures is the mean, eventually leading to the use of a t-student test (Shilane et al, 2008). However, for the case that compares 3 or more algorithms, it is more convenient to make a modification in the calculation of the test www.ijera.com statistical (Kenward & Roger, 1997). Other multiple comparison methods are treated by Steel & Torrie (1980), Hochberg & Tamhane (1987), Shaffer (1995), Sokal & Rohlf (1995), Hsu (1996) and Clever & Scarisbrick, (2001). Besides the use of the average for the comparison of algorithms, other measures of interest are the variance and/or the entropy (Rogers & Hsu, 2001; Piepho, 2004). In regard to performance measurements used, the most common is the quality measurement algorithm. Authors such as Barr (1995), Eiben (2002), Bartz & Beielstein (2004), Birattari (2005) and Hughes (2006) present a list of the different performance measures used in the literature, however, within the measures, no importance is given to the percentage of times that exceeds another algorithm. Peer et al (2003) conducted a study, which shows several cases in which appropriate statistical methods used to analyze results. Furthermore, some authors report the use of various methods to compare different types of problems. Dietterich (1997) reviews five statistical test for comparing learning algorithms Brazdil et al (2000) compared ranking methods for the selection of classification algorithms, Bouckaert (2003) employs several calibration test for the selection of learning algorithms and Shilane et al (2008) propose a statistical methodology for genetic algorithms performance comparison. In the methodologies cited above the average ranking, ranking of successive radii rate, among others. In this paper, we illustrate a statistical methodology for the comparison of results between algorithms, taking into account the value of the differences, although not statistically significant, without being affected by this alleged meeting or 469 | P a g e
  • 2. SuΓ‘rez DΓ­az Ronald et al Int. Journal of Engineering Research and Application ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.469-472 outliers. A procedure similar to the one at, is employed in the design of sampling plans for quality control (Montgomery, 2009), but the focus of this investigation is different, and the literature review conducted, indicates that there has been used for comparison of algorithms, or estimate the quality of a particular algorithm. III. www.ijera.com Case 2: when trying to prove 𝑝 ≀ 𝑝0 , for this case, we have: 𝐻0 : 𝑝 = 𝑝0 𝐻1 : 𝑝 > 𝑝0 (3) PROPOSED METHODOLOGY The p value corresponding to this case is: Consider the following set of symbols required to build the theoretical framework of the proposed methodology. The developed method seeks to solve a minimization problem. n: Numberofinstancesoftheproblemtosolve fA : Objectivefunctionvalue bythealgorithmA fB : ObjectivefunctionvaluebythealgorithmB fB βˆ’ fA Ξ³A/B : Relative efficiency of Aover B = fA 1, ifΞ³A/B β‰₯ Ξ³0 x: Bernoulli random variable = 0, inothercase n X = Total number of successes = xi 𝑝 π‘£π‘Žπ‘™π‘’π‘’ = 𝑃 𝑋 β‰₯ 𝑋0 = 𝑛 𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘– 𝑖 0 𝑛 𝑖=𝑋0 (4) The rejection criteria is identical to the previous case. Case 3: when trying to prove 𝑝 = 𝑝0 , for this case, we have: 𝐻0 : 𝑝 = 𝑝0 𝐻1 : 𝑝 β‰  𝑝0 (5) For this case, if there is no evidence to support the null hypothesis, we observe or be a very small value of X, or a very large value. Thus, there are two sub-cases: i=1 p: P(x = 1) Ξ±: Type I error Ξ²: Type II error Thus, instead of inquiring whether an average algorithm outperforms other, we propose to determine if 100p% (strictly equal, at least, or maximum) of the time, an algorithm outperforms other with a minimum relative efficiency 𝛾0 . Since X is the sum of Bernoulli random variables, turns out to be a binomial random variable, whereby this distribution should be used to verify the statistical validity of the assertion with respect to p. Following we describe each of the three cases that can occur. Case 1: when trying to prove 𝑝 β‰₯ 𝑝0 , The approach of the hypothesis would be the following: 𝐻0 : 𝑝 = 𝑝0 𝐻1 : 𝑝 < 𝑝0 (1) At first glance, it seems a test of proportions, leading to use the normal distribution (assuming p fits this distribution) to reach a conclusion. However, the idea is not to rely on this assumption and using the Binomial distribution would be really appropriate. If the null hypothesis is true, this should be reflected in the value of X in the sample. A high p value observed should lead to a large value of X, while a small value of p should be reflected in a small value of X. Thus, if the null hypothesis is true, finding a value greater than or equal to X should be very unlikely. Then, the p value of the test proposed in (1) is calculated as: 𝑛 𝑋0 𝑝 π‘£π‘Žπ‘™π‘’π‘’ = 𝑃 𝑋 ≀ 𝑋0 = 𝑖=0 𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘– (2) 𝑖 0 Then, H0 must be rejected if 𝑝 π‘£π‘Žπ‘™π‘’π‘’ ≀ 𝛼. www.ijera.com 1) 𝑋 ≀ 𝑛/2. since, at the beginning, is unknown in what sense is the deviation of x, we have: 𝑛 𝑋0 𝑝 π‘£π‘Žπ‘™π‘’π‘’ = 𝑃 𝑋 ≀ 𝑋0 = 2 𝑖=0 𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘– (6) 𝑖 0 2) 𝑋 β‰₯ 𝑛/2. we have: 𝑛 𝑝 π‘£π‘Žπ‘™π‘’π‘’ = 2𝑃 𝑋 β‰₯ 𝑋0 = 2 𝑖=𝑋0 𝑛 𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘– 𝑖 0 (7) Example: Suppose two algorithms were used to solve 10 instances of the same minimization problem. The results obtained were as follows: Table 1. Example for the comparison of algorithms Instance 1 2 3 4 5 1875 875 1335 542 852 𝑓𝐴 1897 924 842 624 892 𝑓𝐡 Instance 6 7 8 9 10 1854 185 2583 3541 365 𝑓𝐴 1896 120 2612 3663 369 𝑓𝐡 We want to determine if the algorithm A exceeds at least 80% of the time the algorithm B, with a 95% confidence level. If used a paired t-student test and if we verify the equality of means (a different hypothesis), we would have: 𝑑 = 6.8; 𝑠 𝑑 = 144,11; 𝑇 = 0.472and 𝑑0.025 ,9 = 1.83. Since 𝑇< 𝑑0.025 ,9 , we cannot reject the null hypothesis, so that the algorithms in average generate the same result. However, in practical problems, what really matters is to see which algorithm gets the best answer in most cases.We can propose the following hypothesis: 470 | P a g e
  • 3. SuΓ‘rez DΓ­az Ronald et al Int. Journal of Engineering Research and Application ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.469-472 𝐻0 : 𝑝 = 0.8 𝐻1 : 𝑝 < 0.8 𝑍𝛼 8 𝑝 π‘£π‘Žπ‘™π‘’π‘’ = 𝑃 𝑋 ≀ 8 = 𝑖=0 10 0.8 𝑖 0.210βˆ’π‘– 𝑖 = 0.62 > 0.05 Therefore, we cannot reject the null hypothesis, so that with a 95% confidence, it is concluded that the algorithm A exceeds the algorithm B at least 80% of the time. 3.1 Type II error and Number of instances Generally, researchers arbitrarily choose the number of instances that apply to algorithms to compare, which can carry with it a potentially dangerous type II error. Suppose that the test proposed in (1), the true value of 𝑝 is at least 𝑝′ 0 , where 𝑝′ 0 < 𝑝. Let π‘Ÿ be the critical value of the test: π‘Ÿ 𝑖=0 𝑃 𝑋≀ π‘Ÿ = 𝛼= 𝑛 𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘– 𝑖 0 (8) Since in a hypothesis test, 𝛽 is the probability of not rejecting the null hypothesis when it is false, we have: 𝑛 𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘– 𝑖 0 (9) (8) and (9) representa system of two non-linear equations with two unknown variables, π‘Ÿ and𝑛. Since the system is extremely difficult to solve, two phases will be used to solve it: 𝛽 = 𝑃(𝑋 > π‘Ÿ/𝑝 β‰₯ 𝑝′ 0 ) = 𝑛 𝑖=π‘Ÿ+1 Phase 1: Normal approximation Assuming an initial approximation of the binomial distribution to the normal distribution, we have the following parameterization: 𝑍= π‘₯βˆ’π‘›π‘ (10) 𝑛𝑝 (1βˆ’π‘) Thus, using jointly (8), (9) and (10), we have: 𝑍𝛼 = 𝑍1βˆ’π›½ = π‘Ÿβˆ’π‘›π‘ 0 (12) 𝑛𝑝′0 (1βˆ’π‘β€²0 ) Isolating π‘Ÿ from (11) and (12): 𝑍 𝛼 𝑛𝑝0 (1 βˆ’ 𝑝0 ) + 𝑛𝑝0 = 𝑍1βˆ’π›½ 𝑛𝑝′0 Associating terms, we have: 𝑛 𝑍𝛼 𝑝0 1 βˆ’ 𝑝0 βˆ’ 𝑍1βˆ’π›½ = 𝑛(𝑝′ 0 βˆ’ 𝑝0 ) www.ijera.com 𝑛𝑝′0 (1 βˆ’ 𝑝′0 ) + (13) 𝑝′ 0 1 βˆ’ 𝑝′ 0 (𝑝′ 0 βˆ’ 𝑝0 ) 2 𝑍𝛼 𝑛′ = 𝑛𝑝 β€² 0 1βˆ’π‘ β€² 0 𝑝 0 1βˆ’π‘ 0 βˆ’π‘1βˆ’π›½ (14) (𝑝 β€² 0 βˆ’π‘ 0 )2 Since (14) can calculate an approximation only, we should look for the exact number of instances to run. This is done in phase two. Phase 2: Calculation for the number of instances Then, we propose an algorithm to find the appropriate value of the number of instances that ensures desired significance and statistical power in the analysis. Step 1: make n = nβ€² Step 2: Calculate a and b such that: π‘₯ π‘Ž = {min π‘₯ / π‘₯ 𝑏 = {min π‘₯ / 𝑖=0 𝑖=0 𝑛 𝑝 𝑖 (1 βˆ’ 𝑝0 ) π‘›βˆ’π‘– ≀ 𝛼} 𝑖 0 𝑛 𝑝′0 𝑖 (1 βˆ’ 𝑝0 )β€² π‘›βˆ’π‘– β‰₯ 1 βˆ’ 𝛽} 𝑖 𝐼𝑓 π‘Ž = 𝑏 𝑒𝑛𝑑, 𝑒𝑙𝑠𝑒, π‘šπ‘Žπ‘˜π‘’ 𝑛 = 𝑛 + 1 π‘Žπ‘›π‘‘ π‘Ÿπ‘’π‘π‘’π‘Žπ‘‘ 𝑠𝑑𝑒𝑝 Phase 1 avoids unnecessary scans, as it establishes a minimum limit to the number of instances to be executed. 3.2 Performance Measure of an Algorithm Besides being useful for proper comparison of algorithms in optimization problems, the test can be modified to measure how well an algorithm performs compared to the optimal solution of the problem (if the latter is known). Under the assumption that this is a minimization problem, let 𝑓 𝑂𝑃𝑇 be the minimum of a particular instance. Thus, we define the following: 𝑓 𝐴 βˆ’ 𝑓 𝑂𝑃𝑇 𝛾: π‘Ÿπ‘’π‘™π‘Žπ‘‘π‘–π‘£π‘’ π‘‘π‘’π‘£π‘–π‘Žπ‘‘π‘–π‘œπ‘› = 𝑓 𝑂𝑃𝑇 Redefining x: 1, 𝑖𝑓 𝛾 ≀ 𝛾0 π‘₯= 0, 𝑖𝑛 π‘œπ‘‘β„Žπ‘’π‘Ÿ π‘π‘Žπ‘ π‘’ 𝑛 𝑋= (11) 𝑛𝑝 0 (1βˆ’π‘ 0 ) π‘Ÿβˆ’π‘›π‘β€²0 𝑝′ 0 1 βˆ’ 𝑝′ 0 𝑝0 1 βˆ’ 𝑝0 βˆ’ 𝑍1βˆ’π›½ 𝑛= From the information, we have 𝛾0 = 0, from which it follows𝑋0 = 8. Thus: www.ijera.com π‘₯𝑖 𝑖=0 Thus, the methodology allows us to study, with some reliability, the percentage of times that the algorithm is at a maximum distance of the optimal solution of a problem. Must be clarified an extremely important point, in regard to the scope of the conclusion of the test. When working with instances of very different sizes, it is advisable to group them by type (small, medium and large) and make an Anova Test first to determine whether the size of the instance is a factor that affects the efficiency of the response obtained by the algorithm. If so, the methodology proposed here should be applied only to relatively homogeneous size instances, so that the conclusions are only valid for instances of that range. If on the contrary, the variance 471 | P a g e
  • 4. SuΓ‘rez DΓ­az Ronald et al Int. Journal of Engineering Research and Application ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.469-472 analysis revealed that the quality of the algorithm is independent of the size of the instance, then, the test results can be generalized. IV. CONCLUSION In this paper, we present a statistical methodology for the comparison of algorithms used to solve optimization problems. The methodology does not depend on compliance with any statistical assumption for the accuracy of their results, and allows us to establish with any degree of reliability, the percentage of times that an algorithm outperforms another. This statistic test, under minor modifications, can be extended to measure the performance of an algorithm compared to the optimal response to the problem. The scope of the conclusions depends on whether the quality of the response of the algorithm is affected or not by the size of instances resolved. [11] [12] [13] [14] [15] REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] Barr, R., Golden, R., Kelly, J., Rescende M., Stewart, W. Designing and Reporting on Computational Experiments with Heuristic Methods, Journal of Heuristics, 1995. Bartz, T., Beielstein, T. Design and Analysis of Optimization Algorithms Using Computational Statistics. AppliedNumericalAnalysis& Computational Mathematics, 2004. Birattari, M., Dorigo, M. How to assess and report the performance of a stochastic algorithm on a benchmark problem: Mean or best result on a number of runs? IRIDIA, Universite Libre de Bruxelles, 2005. Bouckaert, R. Choosing between two learning algorithms based of calibrate test. Twentieth International Conferenceon Machine Learning. Washington DC, 2003. Brazdil, P, Soarez, C. A Comparison of Ranking Methods for Classification Algorithm Selection. Faculty of Economics, University of Porto, Portugal. Clever, A.G., Scarisbrick, D.H., 2001. Practical Statistics and Experimental Design for Plant and Crop Science. Wiley, New York. Dietterich, T. Approximate Statistical Test for Comparing Supervised Classification Learning Algorithms. Corvallis, OregonStateUniversity, 1993. Eiben, A. E., Jelasity, M. A critical note on experimental research methodology in EC. Proceedings of the 2002 Congress on Evolutionary Computation (CEC '02), 2002. Hochberg,Y., Tamhane, A.C., 1987. Multiple Comparison Procedures.Wiley, New York. Hsu, J.C., 1996. Multiple Comparisons: Theory and Methods. Chapman& Hall, London. www.ijera.com [16] [17] [18] [19] [20] www.ijera.com Hughes, E. J. Assessing Robustness of Optimization Performance for Problems with Expensive Evaluation Functions. IEEE CongressonEvolutionaryComputation (CEC 2006), 2006. Kenward, M.G., Roger, J.H., 1997. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53 (3), 983–997. Montgomey, D., 2009. Statistical Quality Control. Wiley, Arizona. Peer, E. S., Engelbrecht, A. P., Van den Bergh, F. CIRG@UP OptiBench: A statistically sound framework for benchmarking optimisation algorithms. CongressonEvolutionaryComputation, 2003. Piepho, H.-P., 2004. An algorithm for a letterbased representation of all-pairwise comparisons. J. Comput. Graph. Statist. 13, 456–466. Rogers, J.A., Hsu, J.C., 2001. Multiple Comparisons of Biodiversity. Biometrical J. 43 (5), 617–625. Shaffer, J.P., 1995. Multiple hypothesis testing. Ann. Rev. Psych. 46, 561–584. Shilane, D., Martikainen, J., Dudoit, S., Ovaska, S. A General Framework for Statistical Performance Comparison of Evolutionary Computation Algorithms. InformationSciences 178 (2008), 2870-2879. Sokal, R.R., Rohlf, F.J., 1995. Biometry. Freeman, New York. Steel, R.G.D., Torrie, J.H., 1980. Principles and Procedures of Statistics. McGraw-Hill, New York. 472 | P a g e