The document discusses whether fractional norms and quasinorms can help to overcome the curse of dimensionality. It compares Minkowski (quasi)norms l_p for different values of p using relative contrast, coefficient of variation, and three measures of KNN classification accuracy. The results show that fractional quasinorms with small p have higher relative contrast and coefficient of variation, but this does not translate into better KNN classification performance. In fact, values of p around 0.5, 1, and 2 generally perform best, while extremely small or large values of p perform worse. The conclusion is therefore that fractional quasinorms do not overcome the curse of dimensionality in classification problems.
Do Fractional Norms Help Overcome the Curse of Dimensionality
1. Do Fractional Norms and Quasinorms
Help to Overcome
the Curse of Dimensionality?
Alexander N. Gorban
with Jeza Allohibi and Evgeny M. Mirkes
University of Leicester, UK
and Lobachevsky State University, Russia
2. Curse of dimensionality (Bellman, 1957)
Blessing of dimensionality (Kainen, 1997)
For a random sample in high-dimensional space
• Concentration of distances: Distances between almost all
pairs of points are almost equal;
• Quasiorthogonality: Vectors of the sample are almost
orthogonal (after centralization);
• Stochastic separation: Almost every point is linearly
separable from the set of all other points
With high probability,
for a wide class of distributions
and even for exponentially large samples
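A minimal numerical sketch of the first two effects (my own illustration, not from the talk): draw i.i.d. points from the cube [0,1]^n and watch the relative spread of pairwise distances shrink and the cosines between centred vectors concentrate around zero as n grows.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

for n in (2, 10, 100, 1000):
    X = rng.random((500, n))                    # 500 i.i.d. points in the cube [0,1]^n
    d = pdist(X)                                # all pairwise Euclidean distances
    spread = (d.max() - d.min()) / d.min()      # relative spread of distances
    Xc = X - X.mean(axis=0)                     # centre the sample
    cos = 1.0 - pdist(Xc, metric="cosine")      # pairwise cosines of centred vectors
    print(f"n={n:5d}  relative spread={spread:10.3f}  mean |cos|={np.abs(cos).mean():.3f}")
```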
3. Essentially high-dimensional
distributions
• Stochastic separation theorems and other
concentration results do not need hypotheses about
independence and uniform distribution of data.
• They do not need any other hypothesis about special
distributions like the Gaussian one.
• The main condition used instead of these
simplifications is that sets of small volume should
not have high probability (precise specifications of
what ‘small’ and ‘high’ mean here can be found in
the publications).
• In particular, general log-concave distributions can
be used instead of uniform or Gaussian ones; this is
just one example.
7. Fractional norms can compensate for
the curse of dimensionality???
(C.C. Aggarwal et al., 2001)
We select three measures to compare l_p for different p
(D_p is the set of l_p distances between points in a sample):
1. Relative contrast: RC_p = (max D_p − min D_p) / min D_p
2. Coefficient of variation: CV_p = √(var D_p) / mean D_p
3. Accuracy of KNN classification
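The sketch below (an assumed setup, not the authors' code) computes RC_p and CV_p for the l_p distances between points drawn uniformly from a cube, for the same values of p that appear later in the talk.

```python
import numpy as np

def lp_distances(X, p):
    """All pairwise l_p 'distances' between rows of X (a quasimetric for p < 1)."""
    diff = np.abs(X[:, None, :] - X[None, :, :])
    d = diff.max(axis=-1) if np.isinf(p) else (diff ** p).sum(axis=-1) ** (1.0 / p)
    return d[np.triu_indices(len(X), k=1)]

rng = np.random.default_rng(1)
X = rng.random((100, 30))                       # 100 points in the cube [0,1]^30
for p in (0.01, 0.1, 0.5, 1, 2, 4, 10, np.inf):
    D = lp_distances(X, p)
    rc = (D.max() - D.min()) / D.min()          # relative contrast RC_p
    cv = D.std() / D.mean()                     # coefficient of variation CV_p
    print(f"p={p:>5}  RC_p={rc:8.3f}  CV_p={cv:6.3f}")
```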
8. Relative contrast
Comparison of relative contrast for the Euclidean and
Manhattan metrics: for datasets of reasonable size,
P[RC_2 < RC_1] is close to 1 (equidistribution in the cube [0,1]^n)
Dim     P[RC_2 < RC_1] for number of points
        [Aggarwal] 10     10        20        100
1       0                 0         0         0
2       0.850             0.850     0.960     >0.999
3       0.887             0.930     0.996     >0.999
4       0.913             0.973     0.996     >0.999
10      0.956             0.994     >0.999    >0.999
15      0.961             >0.999    >0.999    >0.999
20      0.971             >0.999    >0.999    >0.999
100     0.982             >0.999    >0.999    >0.999
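A rough Monte Carlo check of this table (my own sketch; the sample size and number of trials are arbitrary): estimate P[RC_2 < RC_1] for samples drawn uniformly from [0,1]^n.

```python
import numpy as np
from scipy.spatial.distance import pdist

def relative_contrast(X, p):
    d = pdist(X, metric="minkowski", p=p)       # pairwise l_p distances (p >= 1 here)
    return (d.max() - d.min()) / d.min()

rng = np.random.default_rng(2)
n_points, trials = 20, 1000
for dim in (1, 2, 3, 4, 10, 15, 20, 100):
    wins = 0
    for _ in range(trials):
        X = rng.random((n_points, dim))         # equidistribution in the cube [0,1]^dim
        if relative_contrast(X, 2) < relative_contrast(X, 1):
            wins += 1
    print(f"dim={dim:4d}  P[RC_2 < RC_1] ≈ {wins / trials:.3f}")
```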
9. Relative contrast and
coefficient of variation
For almost all sufficiently rich datasets, the following
inequalities hold:
RC_p < RC_q, CV_p < CV_q, ∀ p > q
(equidistribution in the cube [0,1]^n)
10. Main questions:
A) What does “data dimension” mean?
B) Does a greater value of relative
contrast or coefficient of variation
mean a better quality of the
classifier?
11. Dimension definitions in use
• Number of attributes (#Attr)
• Number of informative principal components
according to the Kaiser rule (PCA-K)
• Number of informative principal components
according to the Broken stick rule (PCA-B)
• Number of informative principal components
according to the condition number rule (PCA-CN)
• Dimension according to the separability property
• Fractal dimension
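As an illustration, here is a sketch of two of the PCA-based estimates listed above, the Kaiser rule and the broken-stick rule, for a data matrix X with samples in rows. The function names are mine, and the thresholding details may differ from the implementation used in the study.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the covariance matrix of X (rows = samples), in descending order."""
    return np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

def kaiser_dimension(X):
    """Number of principal components whose eigenvalue exceeds the mean eigenvalue (Kaiser rule)."""
    lam = pca_eigenvalues(X)
    return int(np.sum(lam > lam.mean()))

def broken_stick_dimension(X):
    """Number of leading components whose explained-variance fraction exceeds the
    broken-stick expectation for the corresponding piece."""
    lam = pca_eigenvalues(X)
    frac = lam / lam.sum()
    n = len(lam)
    stick = np.array([sum(1.0 / j for j in range(k + 1, n + 1)) / n for k in range(n)])
    dim = 0
    while dim < n and frac[dim] > stick[dim]:
        dim += 1
    return dim

# Hypothetical data: 20 attributes generated from roughly 5 latent dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 20)) + 0.1 * rng.normal(size=(200, 20))
print(kaiser_dimension(X), broken_stick_dimension(X))
```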
13. Comparison of accuracies for l_p
We select three measures of classification
accuracy:
1. Total Number of Neighbours of the Same Class
(TNNSC)
2. Accuracy (fraction of correctly recognised cases
among all cases)
3. Sensitivity plus specificity (true positive rate +
true negative rate)
For TNNSC and accuracy, proportion estimation
was used to assess the significance of differences
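A minimal sketch of the kind of experiment being compared (not the study's pipeline): leave-one-out KNN accuracy for different exponents p of the l_p (quasi)norm, using scikit-learn's wine dataset only as a stand-in for the benchmark databases used in the study.

```python
import numpy as np
from sklearn.datasets import load_wine

def lp_dist_matrix(X, p):
    """Matrix of pairwise l_p 'distances' between rows of X."""
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return diff.max(-1) if np.isinf(p) else (diff ** p).sum(-1) ** (1.0 / p)

def loo_knn_accuracy(X, y, p, k=5):
    """Leave-one-out accuracy of k-nearest-neighbour voting with the l_p distance."""
    D = lp_dist_matrix(X, p)
    np.fill_diagonal(D, np.inf)                 # a point is not its own neighbour
    correct = 0
    for i in range(len(X)):
        nn = np.argsort(D[i])[:k]               # indices of the k nearest neighbours
        correct += np.bincount(y[nn]).argmax() == y[i]
    return correct / len(X)

X, y = load_wine(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardise attributes
for p in (0.01, 0.1, 0.5, 1, 2, 4, 10, np.inf):
    print(f"p={p:>5}  LOO accuracy = {loo_knn_accuracy(X, y, p):.3f}")
```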
14. Comparison of several algorithms
To compare the performance of several algorithms
simultaneously, we applied the Friedman test (null
hypothesis: “all algorithms have the same performance”).
If the Friedman test identifies a performance
inequality among the tested algorithms, then the post
hoc Nemenyi test identifies pairs of algorithms
with statistically significantly different performance.
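A sketch of this testing scheme under an assumed data layout (one score per dataset and per value of p): SciPy provides the Friedman test, and a Nemenyi post hoc test is available, for example, in the scikit-posthocs package.

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# Placeholder scores: rows = benchmark datasets, columns = tested values of p.
# In the real study each entry would be, e.g., the KNN accuracy for that dataset and p.
rng = np.random.default_rng(3)
scores = rng.random((30, 8))

stat, p_value = friedmanchisquare(*scores.T)    # null hypothesis: all columns perform the same
print(f"Friedman test: p-value = {p_value:.4g}")

if p_value < 0.05:
    # Pairwise post hoc comparisons; entry (i, j) is the Nemenyi p-value for columns i and j
    pairwise = sp.posthoc_nemenyi_friedman(scores)
    print(pairwise.round(3))
```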
15. Results
Green is the best, Yellow is the second best, Red is the worst
p for l_p:                                    0.01   0.1   0.5     1     2     4    10     ∞
TNNSC
  The best                                       1     5    10    13     4     6     1     3
  The worst                                     23     4     2     2     3     3     4     7
  Insignificantly different from the best       19    26    32    31    30    29    26    26
  Insignificantly different from the worst      36    24    22    21    22    22    26    26
Accuracy
  The best                                       2     7    15     8     8     3     3     6
  The worst                                     18     6     3     4     5     9     8     8
  Insignificantly different from the best       30    31    34    33    33    32    31    32
  Insignificantly different from the worst      36    33    31    31    31    32    33    32
Sensitivity plus specificity
  The best                                       5     8    13     6     9     3     4     5
  The worst                                     15     4     2     3     3     7     8    13
16. Results
Friedman test shows p-values of less than 0.0001 for
all tests.
Preprocessing         Indicator   Set of insignificantly different l_p
                                  (X per value of p among 0.01, 0.1, 0.5, 1, 2, 4, 10, ∞)
No preprocessing      TNNSC       X X X X X
                      Accuracy    X X X X
                      Se+Sp       X X X X
Standardisation       TNNSC       X X X
                      Accuracy    X X X
                      Se+Sp       X X X X
Standard dispersion   TNNSC       X X X X
                      Accuracy    X X X X
                      Se+Sp       X X X X
17. Conclusion
• For almost all sufficiently rich datasets, relative contrast
and coefficient of variation are smaller for greater degrees
p of the Minkowski metrics or quasimetrics l_p (fractional
quasimetrics with small p have greater relative contrast
and coefficient of variation).
• Greater values of relative contrast and coefficient of
variation do not mean better quality of KNN
classification.
• Differences in KNN performance for p = 0.5, 1, 2 are
statistically insignificant in all tests. Extremely small or
extremely large values of p give worse performance.
• Fractional quasinorms do not help to overcome the
curse of dimensionality in classification problems.
19. Some references 1
• C. C. Aggarwal, A. Hinneburg, and D. A. Keim, On the surprising
behavior of distance metrics in high dimensional space, in
International conference on database theory. Springer, 2001, pp.
420–434.
• P. C. Kainen, Utilizing geometric anomalies of high dimension:
When complexity makes computation easier, in Computer
Intensive Methods in Control and Signal Processing. Springer,
1997, pp. 283–294.
• P. Lévy, Problèmes concrets d’analyse fonctionnelle. Paris, France:
Gauthier-Villars, 1951.
• P. Kainen, V. Kůrková. Quasiorthogonal dimension of Euclidean
spaces. Appl. Math. Lett. 6 (1993), 7–10.
• A.N. Gorban, I.Y. Tyukin, D.V. Prokhorov, K.I. Sofeikov,
Approximation with random bases: Pro et Contra, Information
Sciences 364-365, (2016), 129-145.
20. Some references 2
• A.N. Gorban, I.Y. Tyukin. Stochastic Separation Theorems, Neural
Networks, 94, October 2017, 255-259.
• D. Donoho, J. Tanner. Observed universality of phase transitions in
high-dimensional geometry, with implications for modern data
analysis and signal processing, Philosophical Transactions of The
Royal Society A 367(1906), 20090152 (2009).
• A.N. Gorban, I.Y. Tyukin. Blessing of dimensionality: mathematical
foundations of the statistical physics of data. Philosophical
Transactions of The Royal Society A 376(2118), 20170237 (2018).
• A.N. Gorban, A. Golubkov, B. Grechuk, E.M. Mirkes, I.Y. Tyukin,
Correction of AI systems by linear discriminants: Probabilistic
foundations, Information Sciences 466 (2018), 303-322.
• A.N. Gorban, V.A. Makarov, I.Y. Tyukin, The unreasonable
effectiveness of small neural ensembles in high-dimensional
brain, Physics of Life Reviews, 2019,
https://doi.org/10.1016/j.plrev.2018.09.005