Do Fractional Norms and Quasinorms
Help to Overcome
the Curse of Dimensionality?
Alexander N. Gorban
with Jeza Allohibi and Evgeny M. Mirkes
University of Leicester, UK
and Lobachevsky State University, Russia
Curse of dimensionality (Bellman, 1957)
Blessing of dimensionality (Kainen, 1997)
For a random sample in high-dimensional space
• Concentration of distances: Distances between almost all
pairs of points are almost equal;
• Quasiorthogonality: Vectors of the sample are almost
orthogonal (after centering);
• Stochastic separation: Almost every point is linearly
separable from the set of all other points
With high probability,
for a wide class of distributions
and even for exponentially large samples
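The concentration of distances can be checked numerically. The sketch below is only an illustration under a narrow assumption: a uniform sample in the unit cube, whereas the slide refers to a much wider class of distributions. The function name `distance_spread`, the sample size and the seed are hypothetical choices, not taken from the talk. It reports the ratio of the standard deviation to the mean of all pairwise Euclidean distances, which should shrink as the dimension grows.

```python
import math
import random
import statistics


def distance_spread(dim, n_points=100, seed=0):
    """Std/mean of pairwise Euclidean distances for a uniform sample in [0,1]^dim.

    A small value means that almost all pairs are almost equidistant.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return statistics.pstdev(dists) / statistics.mean(dists)


# The spread is expected to decrease as the dimension increases.
for dim in (2, 20, 200):
    print(dim, round(distance_spread(dim), 3))
```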
Essentially high-dimensional
distributions
• Stochastic separation theorems and other
concentration results do not need hypotheses about
independence and uniform distribution of data.
• They do not need any other hypothesis about special
distributions like the Gaussian one.
• The main condition used instead of these
simplifications is: sets of small volume should not have
a high probability (the precise meaning of ‘small’
and ‘large’ here is specified in the publications).
• In particular, general log-concave distributions can be
used instead of uniform or Gaussian distributions,
and this is just one example.
Measure concentration
Almost all vectors are almost orthogonal (equidistribution in a cube $[-1,1]^n$)
Measure concentration
Almost all points are almost equidistant (equidistribution in a cube $[0,1]^n$)
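Quasiorthogonality in the cube $[-1,1]^n$ can be illustrated the same way. This is a minimal sketch, assuming independent uniform coordinates (the cube is already centred at the origin, so no extra centering is applied). The helper names `cosine` and `mean_abs_cosine` and the sample size are hypothetical. The mean absolute cosine between sample vectors should approach zero as the dimension grows.

```python
import math
import random


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(dim, n_points=60, seed=0):
    """Average |cos| over all pairs of a uniform sample in [-1,1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    values = [
        abs(cosine(pts[i], pts[j]))
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return sum(values) / len(values)


for dim in (5, 50, 500):
    print(dim, round(mean_abs_cosine(dim), 3))
```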
Minkowski distance
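The value
$\|x\|_p = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p}$
is the Minkowski ($l_p$) norm for $p \ge 1$, which induces the
Minkowski distance $\|x - y\|_p$, and a quasinorm for $0 < p < 1$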
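The definition translates directly into code. The function below is a sketch; the name `lp_distance` and the handling of $p = \infty$ as the maximum coordinate difference are choices made here for illustration. The usage lines show why the case $0 < p < 1$ gives only a quasinorm: the triangle inequality can fail.

```python
import math


def lp_distance(x, y, p):
    """l_p distance between x and y: a metric for p >= 1, a quasimetric for 0 < p < 1."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)  # limit case p = infinity (Chebyshev distance)
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Triangle inequality check for p = 0.5 on three corners of the unit square.
a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
direct = lp_distance(a, c, 0.5)
detour = lp_distance(a, b, 0.5) + lp_distance(b, c, 0.5)
print(direct > detour)  # the direct "distance" exceeds the detour
```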
Can fractional norms compensate for the
curse of dimensionality?
(C.C. Aggarwal, 2001)
We select three measures to compare $l_p$ for different $p$
($D_p$ is the set of $l_p$ distances between points in a sample):
1. Relative contrast: $RC_p = \frac{\max D_p - \min D_p}{\min D_p}$
2. Coefficient of variation: $CV_p = \frac{\sqrt{\operatorname{var}(D_p)}}{\operatorname{mean}(D_p)}$
3. Accuracy of KNN classification
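The first two measures can be computed from the set of pairwise distances. The sketch below takes $D_p$ to be all distances between distinct pairs of sample points and uses the population standard deviation; both are assumptions made here, since the slide does not fix these details. The function names are hypothetical.

```python
import statistics


def pairwise_lp(points, p):
    """All l_p distances between distinct pairs of points (the set D_p)."""
    dists = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total = sum(abs(a - b) ** p for a, b in zip(points[i], points[j]))
            dists.append(total ** (1.0 / p))
    return dists


def relative_contrast(dists):
    """RC = (max D - min D) / min D; requires all distances to be positive."""
    return (max(dists) - min(dists)) / min(dists)


def coefficient_of_variation(dists):
    """CV = standard deviation of D divided by mean of D."""
    return statistics.pstdev(dists) / statistics.mean(dists)


sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
d1 = pairwise_lp(sample, 1)
print(relative_contrast(d1), coefficient_of_variation(d1))
```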
Relative contrast
Comparison of relative contrast for the Euclidean and
Manhattan metrics: for any dataset of reasonable size
$P[RC_2 < RC_1] = 1$ (equidistribution in a cube $[0,1]^n$)

$P[RC_2 < RC_1]$ by dimension (Dim) and number of points:

Dim   10 [Aggarwal]   10       20       100
1     0               0        0        0
2     0.850           0.850    0.960    >0.999
3     0.887           0.930    0.996    >0.999
4     0.913           0.973    0.996    >0.999
10    0.956           0.994    >0.999   >0.999
15    0.961           >0.999   >0.999   >0.999
20    0.971           >0.999   >0.999   >0.999
100   0.982           >0.999   >0.999   >0.999
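A probability of this kind can be estimated by simulation. The following is a rough Monte Carlo sketch, assuming points drawn uniformly from the unit cube; the function names, the number of trials and the seed are hypothetical, and the estimates will fluctuate with these choices, so they are not expected to reproduce the table digit for digit.

```python
import math
import random


def relative_contrast(points, p):
    """RC_p for a sample, with p = 1 (Manhattan) or p = 2 (Euclidean)."""
    dists = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if p == 1:
                dists.append(sum(abs(a - b) for a, b in zip(points[i], points[j])))
            else:
                dists.append(math.dist(points[i], points[j]))
    return (max(dists) - min(dists)) / min(dists)


def estimate_probability(dim, n_points, trials=200, seed=0):
    """Fraction of random samples in [0,1]^dim for which RC_2 < RC_1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
        if relative_contrast(pts, 2) < relative_contrast(pts, 1):
            hits += 1
    return hits / trials


for dim in (2, 10):
    print(dim, estimate_probability(dim, n_points=20))
```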
Relative contrast and
coefficient of variation
For almost all relatively rich datasets, the following
inequalities hold:
$RC_p < RC_q$, $CV_p < CV_q$ for all $p > q$
(equidistribution in a cube $[0,1]^n$)
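The ordering can be inspected on a single random sample. The sketch below assumes a uniform sample in the unit cube and a hypothetical choice of 100 points in dimension 10; `uniform_sample` and `contrast_profile` are illustrative names. It lists $RC_p$ and $CV_p$ for several values of $p$, so one can check that both tend to decrease as $p$ grows.

```python
import random
import statistics


def uniform_sample(n_points, dim, seed=0):
    """Random points with independent uniform coordinates in [0, 1]."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n_points)]


def contrast_profile(points, ps):
    """Return a list of (p, RC_p, CV_p) computed over all pairs of points."""
    profile = []
    for p in ps:
        dists = [
            sum(abs(a - b) ** p for a, b in zip(points[i], points[j])) ** (1.0 / p)
            for i in range(len(points))
            for j in range(i + 1, len(points))
        ]
        rc = (max(dists) - min(dists)) / min(dists)
        cv = statistics.pstdev(dists) / statistics.mean(dists)
        profile.append((p, rc, cv))
    return profile


for p, rc, cv in contrast_profile(uniform_sample(100, 10), (0.5, 1, 2, 4)):
    print(p, round(rc, 3), round(cv, 3))
```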
Main questions:
A) What does “data dimension” mean?
B) Does a greater value of relative
contrast or coefficient of variation
mean a better quality of the
classifier?
Dimension definitions in use
• Number of attributes (#Attr)
• Number of informative principal components
according to the Kaiser rule (PCA-K)
• Number of informative principal components
according to the broken stick rule (PCA-B)
• Number of informative principal components
according to the condition number rule (PCA-CN)
• Dimension according to the separability property (SepD)
• Fractal dimension (FracD)
Name                                        #Attr  PCA-K  PCA-B  PCA-CN  SepD   FracD
EEG Eye State                               14     4      4      5       2.1    1.2
Climate Model Simulation Crashes            18     10     0      18      16.8   21.7
Diabetic Retinopathy Debrecen               19     5      3      8       4.3    2.3
SPECT Heart                                 22     7      3      12      4.9    11.5
Breast Cancer                               30     6      3      5       4.3    3.5
Ionosphere                                  34     8      4      9       3.9    3.5
QSAR biodegradation                         41     11     6      15      5.4    3.1
SPECTF Heart                                44     10     3      6       5.6    7
MiniBooNE particle identification           50     4      1      1       0.5    2.7
First-order theorem proving                 51     13     7      9       3.4    2.04
Connectionist Bench (Sonar)                 60     13     6      11      6.1    5.5
Quality Assessment of Digital Colposcopies  62     11     6      9       5.6    4.7
LFW (faces)                                 128    51     55     57      13.8   19.3
Musk 1                                      166    23     9      7       4.1    4.4
Musk 2                                      166    25     13     6       4.1    7.8
Madelon                                     500    224    0      362     436.3  13.5
Gisette                                     5000   1465   133    25      10.2   2.04
Different dimension estimates for the databases
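The three PCA-based rules can be sketched as functions of the eigenvalue spectrum. The code assumes the eigenvalues of the correlation (or covariance) matrix are already available. The Kaiser rule is taken as "eigenvalue above the mean eigenvalue" and the broken stick rule in its usual form; the condition number threshold `max_condition=10.0` is an assumed default, since the slides do not state the value used. All function names are hypothetical.

```python
def kaiser_rule(eigenvalues):
    """Count components whose eigenvalue exceeds the mean eigenvalue."""
    mean = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for ev in eigenvalues if ev > mean)


def broken_stick_rule(eigenvalues):
    """Count leading components whose variance share beats the broken stick share."""
    n = len(eigenvalues)
    total = sum(eigenvalues)
    count = 0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        expected_share = sum(1.0 / j for j in range(k, n + 1)) / n
        if ev / total > expected_share:
            count += 1
        else:
            break  # stop at the first component that fails the comparison
    return count


def condition_number_rule(eigenvalues, max_condition=10.0):
    """Count components with eigenvalue >= largest / max_condition (assumed threshold)."""
    largest = max(eigenvalues)
    return sum(1 for ev in eigenvalues if ev >= largest / max_condition)


spectrum = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2]  # made-up eigenvalues for illustration
print(kaiser_rule(spectrum), broken_stick_rule(spectrum), condition_number_rule(spectrum))
```

A nearly flat spectrum gives a broken stick dimension of 0, because even the first component fails the comparison; zeros of this kind appear in the PCA-B column above.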
Comparison of accuracies for $l_p$
We select several measures of classification accuracy:
1. Total Number of Neighbours of the Same Class
(TNNSC)
2. Accuracy (fraction of correctly recognised cases
among all cases)
3. Sensitivity plus specificity (true positive rate +
true negative rate)
For TNNSC and accuracy, proportion estimation
was used to identify the significance of differences
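The three indicators can be computed together in a leave-one-out KNN run. This is a sketch under several assumptions not stated on the slide: binary labels coded 0/1, an odd `k` with majority voting, leave-one-out evaluation, and both classes present in the data. The function name `evaluate_knn` and the toy data are hypothetical.

```python
def lp_distance(x, y, p):
    """l_p distance (quasimetric when 0 < p < 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def evaluate_knn(points, labels, k, p):
    """Leave-one-out KNN with the l_p distance; labels must be 0 or 1."""
    n = len(points)
    tnnsc = 0
    tp = tn = fp = fn = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: lp_distance(points[i], points[j], p))
        neighbours = others[:k]
        # TNNSC: neighbours that share the class of the query point
        tnnsc += sum(1 for j in neighbours if labels[j] == labels[i])
        positive_votes = sum(labels[j] for j in neighbours)
        predicted = 1 if 2 * positive_votes > k else 0
        if labels[i] == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "TNNSC": tnnsc,
        "accuracy": (tp + tn) / n,
        "Se+Sp": sensitivity + specificity,
    }


# Two well separated toy clusters, one per class.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
lab = [0, 0, 0, 0, 1, 1, 1, 1]
for p in (0.5, 1, 2):
    print(p, evaluate_knn(pts, lab, k=3, p=p))
```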
Comparison of several algorithms
To compare the performance of several algorithms
simultaneously, we applied the Friedman test (the null
hypothesis is “all algorithms have the same performance”).
If the Friedman test identifies a difference in the
performance of the tested algorithms, then the post hoc
Nemenyi test identifies pairs of algorithms
with statistically significantly different performance.
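The two-stage procedure can be sketched as follows. The code computes the Friedman chi-square statistic from average ranks (without a tie correction, which is a simplification made here) and the Nemenyi critical difference for mean ranks. The critical value `q_alpha` is deliberately left as an argument: it has to be taken from a table of the studentized range statistic for the chosen significance level and number of algorithms. Function names and the score table are hypothetical.

```python
import math


def average_ranks(scores):
    """Ranks within one dataset: rank 1 = highest score, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for t in range(pos, end + 1):
            ranks[order[t]] = shared
        pos = end + 1
    return ranks


def friedman_statistic(score_table):
    """score_table[i][j] = score of algorithm j on dataset i. Returns (chi2, mean ranks)."""
    n = len(score_table)
    k = len(score_table[0])
    rank_sums = [0.0] * k
    for row in score_table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4.0
    )
    return chi2, mean_ranks


def nemenyi_critical_difference(q_alpha, k, n):
    """Two algorithms differ significantly if their mean ranks differ by more than this."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))


table = [[0.90, 0.80, 0.70], [0.95, 0.85, 0.60], [0.80, 0.70, 0.65], [0.99, 0.90, 0.50]]
chi2, ranks = friedman_statistic(table)
print(chi2, ranks)
```

The statistic is compared with a chi-square distribution with k - 1 degrees of freedom to obtain the p-value.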
Results
Green is the best, yellow is the second best, red is the worst

Indicator                                   p for l_p
                                            0.01  0.1   0.5   1     2     4     10    ∞
TNNSC
  The best                                  1     5     10    13    4     6     1     3
  The worst                                 23    4     2     2     3     3     4     7
  Insignificantly different from the best   19    26    32    31    30    29    26    26
  Insignificantly different from the worst  36    24    22    21    22    22    26    26
Accuracy
  The best                                  2     7     15    8     8     3     3     6
  The worst                                 18    6     3     4     5     9     8     8
  Insignificantly different from the best   30    31    34    33    33    32    31    32
  Insignificantly different from the worst  36    33    31    31    31    32    33    32
Sensitivity plus specificity
  The best                                  5     8     13    6     9     3     4     5
  The worst                                 15    4     2     3     3     7     8     13
Results
Friedman test shows p-values of less than 0.0001 for
all tests.
Preprocessing Indicator
Set of insignificantly different
0.01 0.1 0.5 1 2 4 10 ∞
No preprocessing
TNNSC X X X X X
Accuracy X X X X
Se+Sp X X X X
Standardisation
TNNSC X X X
Accuracy X X X
Se+Sp X X X X
Standard dispersion
TNNSC X X X X
Accuracy X X X X
Se+Sp X X X X
Conclusion
• For almost all rich enough datasets, relative contrast
and coefficient of variation are smaller for greater degrees
$p$ of the Minkowski metrics or quasimetrics $l_p$ (fractional
quasimetrics with small $p$ have greater relative contrast
and coefficient of variation).
• Greater values of relative contrast and coefficient of
variation do not mean better quality of KNN
classification.
• The differences in KNN performance for $p$ = 0.5, 1, and 2
are statistically insignificant in all tests. Extremely small
or large values of $p$ correspond to worse performance.
• Fractional quasinorms do not help to overcome the
curse of dimensionality in classification problems.
Conclusion
Fractional quasinorms do not help to overcome the
curse of dimensionality in classification problems
Some references 1
• C. C. Aggarwal, A. Hinneburg, and D. A. Keim, On the surprising
behavior of distance metrics in high dimensional space, in
International conference on database theory. Springer, 2001, pp.
420–434.
• P. C. Kainen, Utilizing geometric anomalies of high dimension:
When complexity makes computation easier, in Computer
Intensive Methods in Control and Signal Processing. Springer,
1997, pp. 283–294.
• P. Lévy, Problèmes concrets d’analyse fonctionnelle. Paris, France:
Gauthier-Villars, 1951.
• P. Kainen, V. Kůrková, Quasiorthogonal dimension of Euclidean
spaces, Appl. Math. Lett. 6 (1993), 7–10.
• A.N. Gorban, I.Y. Tyukin, D.V. Prokhorov, K.I. Sofeikov,
Approximation with random bases: Pro et Contra, Information
Sciences 364-365, (2016), 129-145.
Some references 2
• A.N. Gorban, I.Y. Tyukin. Stochastic Separation Theorems, Neural
Networks, 94, October 2017, 255-259.
• D. Donoho, J. Tanner. Observed universality of phase transitions in
high-dimensional geometry, with implications for modern data
analysis and signal processing, Philosophical Transactions of The
Royal Society A 367(1906), 20090152 (2009).
• A.N. Gorban, I.Y. Tyukin. Blessing of dimensionality: mathematical
foundations of the statistical physics of data. Philosophical
Transactions of The Royal Society A 376(2118), 20170237 (2018).
• A.N. Gorban, A. Golubkov, B. Grechuk, E.M. Mirkes, I.Y. Tyukin,
Correction of AI systems by linear discriminants: Probabilistic
foundations, Information Sciences 466 (2018), 303-322.
• A.N. Gorban, V.A. Makarov, I.Y. Tyukin, The unreasonable
effectiveness of small neural ensembles in high-dimensional
brain, Physics of Life Reviews, 2019,
https://doi.org/10.1016/j.plrev.2018.09.005

More Related Content

What's hot

Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2Awad Albalwi
 
Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2Awad Albalwi
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersionSanoj Fernando
 
Business Statistics Chapter 6
Business Statistics Chapter 6Business Statistics Chapter 6
Business Statistics Chapter 6Lux PP
 
A proposed nth – order jackknife ridge estimator for linear regression designs
A proposed nth – order jackknife ridge estimator for linear regression designsA proposed nth – order jackknife ridge estimator for linear regression designs
A proposed nth – order jackknife ridge estimator for linear regression designsAlexander Decker
 
Approximate ANCOVA
Approximate ANCOVAApproximate ANCOVA
Approximate ANCOVAStephen Senn
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendencykreshajay
 
06 ch ken black solution
06 ch ken black solution06 ch ken black solution
06 ch ken black solutionKrunal Shah
 
Confidence level and sample size
Confidence level and sample sizeConfidence level and sample size
Confidence level and sample sizeThomson Leopoldo
 
Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance Long Beach City College
 
Estimation of mean and its function using asymmetric loss function
Estimation of mean and its function using asymmetric loss function Estimation of mean and its function using asymmetric loss function
Estimation of mean and its function using asymmetric loss function ijscmcj
 
Estimation of mean and its function using asymmetric loss function
Estimation of mean and its function using asymmetric loss functionEstimation of mean and its function using asymmetric loss function
Estimation of mean and its function using asymmetric loss functionijscmcj
 
Real Applications of Normal Distributions
Real Applications of Normal Distributions Real Applications of Normal Distributions
Real Applications of Normal Distributions Long Beach City College
 
Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersionForensic Pathology
 

What's hot (20)

DETECTION THEORY CHAPTER 6
DETECTION THEORY CHAPTER 6DETECTION THEORY CHAPTER 6
DETECTION THEORY CHAPTER 6
 
Chapter9
Chapter9Chapter9
Chapter9
 
Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2
 
Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
 
Business Statistics Chapter 6
Business Statistics Chapter 6Business Statistics Chapter 6
Business Statistics Chapter 6
 
A proposed nth – order jackknife ridge estimator for linear regression designs
A proposed nth – order jackknife ridge estimator for linear regression designsA proposed nth – order jackknife ridge estimator for linear regression designs
A proposed nth – order jackknife ridge estimator for linear regression designs
 
Dispersion
DispersionDispersion
Dispersion
 
Approximate ANCOVA
Approximate ANCOVAApproximate ANCOVA
Approximate ANCOVA
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
06 ch ken black solution
06 ch ken black solution06 ch ken black solution
06 ch ken black solution
 
Confidence level and sample size
Confidence level and sample sizeConfidence level and sample size
Confidence level and sample size
 
Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance
 
Introduction to the t-test
Introduction to the t-testIntroduction to the t-test
Introduction to the t-test
 
Estimation of mean and its function using asymmetric loss function
Estimation of mean and its function using asymmetric loss function Estimation of mean and its function using asymmetric loss function
Estimation of mean and its function using asymmetric loss function
 
Estimation of mean and its function using asymmetric loss function
Estimation of mean and its function using asymmetric loss functionEstimation of mean and its function using asymmetric loss function
Estimation of mean and its function using asymmetric loss function
 
Estimating a Population Mean
Estimating a Population MeanEstimating a Population Mean
Estimating a Population Mean
 
Ds vs Is discuss 3.1
Ds vs Is discuss 3.1Ds vs Is discuss 3.1
Ds vs Is discuss 3.1
 
Real Applications of Normal Distributions
Real Applications of Normal Distributions Real Applications of Normal Distributions
Real Applications of Normal Distributions
 
Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersion
 

Similar to Does Fractional Norms Help Overcome Curse of Dimensionality

L1 statistics
L1 statisticsL1 statistics
L1 statisticsdapdai
 
Lect5_GSEA_Classify (1).ppt
Lect5_GSEA_Classify (1).pptLect5_GSEA_Classify (1).ppt
Lect5_GSEA_Classify (1).pptSaiGanesh836443
 
Measure of dispersion by Neeraj Bhandari ( Surkhet.Nepal )
Measure of dispersion by Neeraj Bhandari ( Surkhet.Nepal )Measure of dispersion by Neeraj Bhandari ( Surkhet.Nepal )
Measure of dispersion by Neeraj Bhandari ( Surkhet.Nepal )Neeraj Bhandari
 
Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsBabasab Patil
 
Network meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencyNetwork meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencycheweb1
 
CHI SQUARE DISTRIBUTIONdjfnbefklwfwpfioaekf.pptx
CHI SQUARE DISTRIBUTIONdjfnbefklwfwpfioaekf.pptxCHI SQUARE DISTRIBUTIONdjfnbefklwfwpfioaekf.pptx
CHI SQUARE DISTRIBUTIONdjfnbefklwfwpfioaekf.pptxrathorebhagwan07
 
a brief introduction to epistasis detection
a brief introduction to epistasis detectiona brief introduction to epistasis detection
a brief introduction to epistasis detectionHyun-hwan Jeong
 
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSISFUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSISIrene Pochinok
 
clustering tendency
clustering tendencyclustering tendency
clustering tendencyAmir Shokri
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
Uncertainity
UncertainityUncertainity
UncertainityVIGNESH C
 

Similar to Does Fractional Norms Help Overcome Curse of Dimensionality (20)

Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
 
Errors2
Errors2Errors2
Errors2
 
L1 statistics
L1 statisticsL1 statistics
L1 statistics
 
Lect5_GSEA_Classify (1).ppt
Lect5_GSEA_Classify (1).pptLect5_GSEA_Classify (1).ppt
Lect5_GSEA_Classify (1).ppt
 
Measure of dispersion by Neeraj Bhandari ( Surkhet.Nepal )
Measure of dispersion by Neeraj Bhandari ( Surkhet.Nepal )Measure of dispersion by Neeraj Bhandari ( Surkhet.Nepal )
Measure of dispersion by Neeraj Bhandari ( Surkhet.Nepal )
 
Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec doms
 
Network meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencyNetwork meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistency
 
P1121133727
P1121133727P1121133727
P1121133727
 
CHI SQUARE DISTRIBUTIONdjfnbefklwfwpfioaekf.pptx
CHI SQUARE DISTRIBUTIONdjfnbefklwfwpfioaekf.pptxCHI SQUARE DISTRIBUTIONdjfnbefklwfwpfioaekf.pptx
CHI SQUARE DISTRIBUTIONdjfnbefklwfwpfioaekf.pptx
 
Unit3
Unit3Unit3
Unit3
 
9618821.ppt
9618821.ppt9618821.ppt
9618821.ppt
 
9618821.pdf
9618821.pdf9618821.pdf
9618821.pdf
 
a brief introduction to epistasis detection
a brief introduction to epistasis detectiona brief introduction to epistasis detection
a brief introduction to epistasis detection
 
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSISFUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
 
2012 predictive clusters
2012 predictive clusters2012 predictive clusters
2012 predictive clusters
 
clustering tendency
clustering tendencyclustering tendency
clustering tendency
 
Respose surface methods
Respose surface methodsRespose surface methods
Respose surface methods
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
Statistics 3, 4
Statistics 3, 4Statistics 3, 4
Statistics 3, 4
 
Uncertainity
UncertainityUncertainity
Uncertainity
 

More from Alexander Gorban

Errors of Artificial Intelligence, their Correction and Simplicity Revolution...
Errors of Artificial Intelligence, their Correction and Simplicity Revolution...Errors of Artificial Intelligence, their Correction and Simplicity Revolution...
Errors of Artificial Intelligence, their Correction and Simplicity Revolution...Alexander Gorban
 
Adaptation free energy: The third generation of models of physiological ada...
Adaptation free energy: The third generation of models of physiological ada...Adaptation free energy: The third generation of models of physiological ada...
Adaptation free energy: The third generation of models of physiological ada...Alexander Gorban
 
Looking for tomorrows mainstream
Looking for tomorrows mainstreamLooking for tomorrows mainstream
Looking for tomorrows mainstreamAlexander Gorban
 
Evolution of adaptation mechanisms
Evolution of adaptation mechanismsEvolution of adaptation mechanisms
Evolution of adaptation mechanismsAlexander Gorban
 
FEHRMAN MIRKES GORBAN IFCS FIN
FEHRMAN MIRKES  GORBAN IFCS FINFEHRMAN MIRKES  GORBAN IFCS FIN
FEHRMAN MIRKES GORBAN IFCS FINAlexander Gorban
 
New universal Lyapunov functions for nonlinear kinetics
New universal Lyapunov functions for nonlinear kineticsNew universal Lyapunov functions for nonlinear kinetics
New universal Lyapunov functions for nonlinear kineticsAlexander Gorban
 

More from Alexander Gorban (7)

Errors of Artificial Intelligence, their Correction and Simplicity Revolution...
Errors of Artificial Intelligence, their Correction and Simplicity Revolution...Errors of Artificial Intelligence, their Correction and Simplicity Revolution...
Errors of Artificial Intelligence, their Correction and Simplicity Revolution...
 
Adaptation free energy: The third generation of models of physiological ada...
Adaptation free energy: The third generation of models of physiological ada...Adaptation free energy: The third generation of models of physiological ada...
Adaptation free energy: The third generation of models of physiological ada...
 
Looking for tomorrows mainstream
Looking for tomorrows mainstreamLooking for tomorrows mainstream
Looking for tomorrows mainstream
 
mmnp2015103Gorban
mmnp2015103Gorbanmmnp2015103Gorban
mmnp2015103Gorban
 
Evolution of adaptation mechanisms
Evolution of adaptation mechanismsEvolution of adaptation mechanisms
Evolution of adaptation mechanisms
 
FEHRMAN MIRKES GORBAN IFCS FIN
FEHRMAN MIRKES  GORBAN IFCS FINFEHRMAN MIRKES  GORBAN IFCS FIN
FEHRMAN MIRKES GORBAN IFCS FIN
 
New universal Lyapunov functions for nonlinear kinetics
New universal Lyapunov functions for nonlinear kineticsNew universal Lyapunov functions for nonlinear kinetics
New universal Lyapunov functions for nonlinear kinetics
 

Recently uploaded

Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 

Recently uploaded (20)

Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 

Does Fractional Norms Help Overcome Curse of Dimensionality

  • 1. Do Fractional Norms and Quasinorms Help to Overcome the Curse of Dimensionality? Alexander N. Gorban with Jeza Allohibi and Evgeny M. Mirkes University of Leicester, UK and Lobachevsky State University, Russia
  • 2. Curse of dimensionality (Bellman, 1957). Blessing of dimensionality (Kainen, 1997). For a random sample in high-dimensional space:
      • Concentration of distances: distances between almost all pairs of points are almost equal;
      • Quasiorthogonality: vectors of the sample are almost orthogonal (after centering);
      • Stochastic separation: almost every point is linearly separable from the set of all other points.
    These properties hold with high probability, for a wide class of distributions, and even for exponentially large samples.
  • 3. Essentially high-dimensional distributions
      • Stochastic separation theorems and other concentration results do not need hypotheses about independence and uniform distribution of data.
      • They do not need any other hypothesis about special distributions like the Gaussian one.
      • The main condition used instead of these simplifications is: sets of small volume should not have a high probability (further specifications of what ‘small’ and ‘large’ mean here can be found in publications).
      • In particular, instead of uniform or Gaussian distributions, general log-concave distributions can be used, and this is just an example.
  • 4. Measure concentration. Almost all vectors are almost orthogonal (uniform distribution in the cube $[-1,1]^n$).
  • 5. Measure concentration. Almost all points are almost equidistant (uniform distribution in the cube $[0,1]^n$).
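The two concentration effects on slides 4 and 5 can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names `cosine_spread` and `distance_spread`, the number of sampled pairs, and the seed are illustrative choices and are not taken from the slides.

```python
import numpy as np

def cosine_spread(n, n_pairs=2000, seed=0):
    """Standard deviation of the cosine between independent random vectors
    drawn uniformly from the cube [-1, 1]^n."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n_pairs, n))
    y = rng.uniform(-1.0, 1.0, size=(n_pairs, n))
    cos = (x * y).sum(axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    )
    return float(cos.std())

def distance_spread(n, n_pairs=2000, seed=0):
    """Relative spread (std / mean) of the Euclidean distance between
    independent random points drawn uniformly from the cube [0, 1]^n."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_pairs, n))
    y = rng.random((n_pairs, n))
    d = np.linalg.norm(x - y, axis=1)
    return float(d.std() / d.mean())

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, cosine_spread(n), distance_spread(n))
```

Both spreads should shrink as the dimension grows: cosines cluster around zero (quasiorthogonality) and distances cluster around their mean (concentration of distances).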
  • 6. Minkowski distance. The value $\|x\|_p = \left(\sum_{i=1}^{d} |x_i|^p\right)^{1/p}$ is the Minkowski norm for $p \ge 1$ and a quasinorm for $0 < p < 1$.
  • 7. Can fractional norms compensate for the curse of dimensionality? (C. C. Aggarwal, 2001). We select three measures to compare $l_p$ for different $p$ ($D_p$ is the set of $l_p$ distances between points in a sample):
      1. Relative contrast: $RC_p = \dfrac{\max D_p - \min D_p}{\min D_p}$
      2. Coefficient of variation: $CV_p = \dfrac{\sqrt{\operatorname{var}(D_p)}}{\operatorname{mean}(D_p)}$
      3. Accuracy of KNN classification
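The first two measures can be computed directly from the pairwise distances. The sketch below assumes NumPy; the helper names (`lp_pairwise`, `relative_contrast`, `coefficient_of_variation`) are illustrative, and the coefficient of variation is taken here as the population standard deviation divided by the mean, which is an assumption about the exact estimator.

```python
import numpy as np

def lp_pairwise(X, p):
    """All pairwise l_p distances (p > 0) between the rows of X, as a 1-D array."""
    diff = np.abs(X[:, None, :] - X[None, :, :])
    D = (diff ** p).sum(axis=-1) ** (1.0 / p)
    i, j = np.triu_indices(len(X), k=1)  # each unordered pair once
    return D[i, j]

def relative_contrast(d):
    """RC = (max D - min D) / min D."""
    return (d.max() - d.min()) / d.min()

def coefficient_of_variation(d):
    """CV = std(D) / mean(D), using the population standard deviation."""
    return d.std() / d.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((100, 20))  # 100 points, uniform in the cube [0, 1]^20
    for p in (0.5, 1, 2, 4):
        d = lp_pairwise(X, p)
        print(p, relative_contrast(d), coefficient_of_variation(d))
```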
  • 8. Relative contrast. Comparison of relative contrast for the Euclidean and Manhattan metrics: for any dataset of reasonable size, $P[RC_2 < RC_1] = 1$ (uniform distribution in the cube $[0,1]^n$).

      $P[RC_2 < RC_1]$ by number of points

      Dim    10 [Aggarwal]   10        20        100
      1      0               0         0         0
      2      0.850           0.850     0.960     >0.999
      3      0.887           0.930     0.996     >0.999
      4      0.913           0.973     0.996     >0.999
      10     0.956           0.994     >0.999    >0.999
      15     0.961           >0.999    >0.999    >0.999
      20     0.971           >0.999    >0.999    >0.999
      100    0.982           >0.999    >0.999    >0.999

  • 9. Relative contrast and coefficient of variation. For almost all relatively rich datasets, the following inequalities hold: $RC_p < RC_q$ and $CV_p < CV_q$ for all $p > q$ (uniform distribution in the cube $[0,1]^n$).
  • 10. Main questions: A) What does “data dimension” mean? B) Does a greater value of relative contrast or coefficient of variation mean a better classifier?
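A probability of this kind can be estimated by simulation. The following is a minimal Monte Carlo sketch, assuming NumPy and a uniform distribution in the unit cube; the function names, the number of trials, and the seed are hypothetical and the estimates will not reproduce the table exactly.

```python
import numpy as np

def relative_contrast(X, p):
    """Relative contrast of the l_p distances between the rows of X."""
    diff = np.abs(X[:, None, :] - X[None, :, :])
    D = (diff ** p).sum(axis=-1) ** (1.0 / p)
    d = D[np.triu_indices(len(X), k=1)]  # each unordered pair once
    return (d.max() - d.min()) / d.min()

def prob_rc2_less_rc1(dim, n_points, n_trials=1000, seed=0):
    """Fraction of random samples for which RC_2 < RC_1."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        X = rng.random((n_points, dim))  # uniform in the cube [0, 1]^dim
        if relative_contrast(X, 2) < relative_contrast(X, 1):
            hits += 1
    return hits / n_trials

if __name__ == "__main__":
    for dim in (2, 4, 10, 20):
        print(dim, prob_rc2_less_rc1(dim, n_points=10))
```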
  • 11. Dimension definitions in use
      • Number of attributes (#Attr)
      • Number of informative principal components according to the Kaiser rule (PCA-K)
      • Number of informative principal components according to the broken stick rule (PCA-B)
      • Number of informative principal components according to the condition number rule (PCA-CN)
      • Dimension according to the separability property (SepD)
      • Fractal dimension (FracD)
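The three PCA-based definitions can be sketched as follows, assuming NumPy and the commonly used formulations of each rule: Kaiser keeps eigenvalues above the mean eigenvalue, broken stick keeps leading components whose variance share exceeds the broken-stick expectation, and the condition number rule keeps eigenvalues above the largest one divided by a threshold. The function names and the threshold value `c=10` are assumptions and are not stated on the slide.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the covariance matrix of X, in descending order."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    eig = np.linalg.eigvalsh(cov)[::-1]
    return np.clip(eig, 0.0, None)  # remove tiny negative rounding errors

def kaiser_dim(eig):
    """Kaiser rule: count eigenvalues above the mean eigenvalue."""
    return int(np.sum(eig > eig.mean()))

def broken_stick_dim(eig):
    """Broken stick rule: count leading components whose variance share
    exceeds b_i = (1/d) * sum_{j=i}^{d} 1/j, stopping at the first failure."""
    d = len(eig)
    share = eig / eig.sum()
    stick = [np.sum(1.0 / np.arange(i, d + 1)) / d for i in range(1, d + 1)]
    k = 0
    while k < d and share[k] > stick[k]:
        k += 1
    return k

def condition_number_dim(eig, c=10.0):
    """Condition number rule: count eigenvalues above max(eig) / c
    (c = 10 is an assumed threshold)."""
    return int(np.sum(eig > eig[0] / c))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.5, 0.1])
    eig = pca_eigenvalues(X)
    print(kaiser_dim(eig), broken_stick_dim(eig), condition_number_dim(eig))
```

Because the broken stick rule stops at the first component that fails the comparison, it can return 0 when even the first component does not exceed its expected share.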
  • 12. Different dimensions for databases

      Name                                         #Attr   PCA-K   PCA-B   PCA-CN   SepD    FracD
      EEG Eye State                                14      4       4       5        2.1     1.2
      Climate Model Simulation Crashes             18      10      0       18       16.8    21.7
      Diabetic Retinopathy Debrecen                19      5       3       8        4.3     2.3
      SPECT Heart                                  22      7       3       12       4.9     11.5
      Breast Cancer                                30      6       3       5        4.3     3.5
      Ionosphere                                   34      8       4       9        3.9     3.5
      QSAR biodegradation                          41      11      6       15       5.4     3.1
      SPECTF Heart                                 44      10      3       6        5.6     7
      MiniBooNE particle identification            50      4       1       1        0.5     2.7
      First-order theorem proving                  51      13      7       9        3.4     2.04
      Connectionist Bench (Sonar)                  60      13      6       11       6.1     5.5
      Quality Assessment of Digital Colposcopies   62      11      6       9        5.6     4.7
      LFW (faces)                                  128     51      55      57       13.8    19.3
      Musk 1                                       166     23      9       7        4.1     4.4
      Musk 2                                       166     25      13      6        4.1     7.8
      Madelon                                      500     224     0       362      436.3   13.5
      Gisette                                      5,000   1465    133     25       10.2    2.04
  • 13. Comparison of accuracies for $l_p$. We select several measures of classification accuracy:
      1. Total Number of Neighbours of the Same Class (TNNSC)
      2. Accuracy (fraction of correctly recognised cases among all cases)
      3. Sensitivity plus specificity (true positive rate + true negative rate)
    For TNNSC and accuracy, the proportion estimation was used to identify the significance of differences.
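The three indicators can be illustrated with a small leave-one-out KNN sketch for a binary problem. It assumes NumPy, labels in {0, 1}, and majority voting; the function names and the default `k=3` are hypothetical and are not the settings used in the study.

```python
import numpy as np

def lp_distance_matrix(X, p):
    """Full matrix of l_p distances between the rows of X (p > 0 or np.inf)."""
    diff = np.abs(X[:, None, :] - X[None, :, :])
    if np.isinf(p):
        return diff.max(axis=-1)
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

def knn_indicators(X, y, p, k=3):
    """Return (TNNSC, accuracy, sensitivity + specificity) for leave-one-out
    KNN with the l_p distance. Use an odd k to avoid voting ties."""
    D = lp_distance_matrix(X, p)
    np.fill_diagonal(D, np.inf)           # a point is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]     # indices of the k nearest neighbours
    tnnsc = int((y[nn] == y[:, None]).sum())
    pred = (2 * y[nn].sum(axis=1) > k).astype(int)  # majority vote
    accuracy = float((pred == y).mean())
    sensitivity = np.sum((pred == 1) & (y == 1)) / np.sum(y == 1)
    specificity = np.sum((pred == 0) & (y == 0)) / np.sum(y == 0)
    return tnnsc, accuracy, float(sensitivity + specificity)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(1, 1, (50, 10))])
    y = np.array([0] * 50 + [1] * 50)
    for p in (0.01, 0.1, 0.5, 1, 2, 4, 10, np.inf):
        print(p, knn_indicators(X, y, p))
```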
  • 14. Comparison of several algorithms. To compare the performance of several algorithms simultaneously, we applied the Friedman test (the null hypothesis is “all algorithms have the same performance”). If the Friedman test identifies a difference in performance among the tested algorithms, the post hoc Nemenyi test identifies pairs of algorithms with statistically significantly different performance.
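This two-stage procedure can be sketched as follows, assuming SciPy's `friedmanchisquare` and `rankdata`. The Nemenyi step is written here in its critical-difference form, where two algorithms differ significantly if their mean ranks differ by more than $q_\alpha \sqrt{k(k+1)/(6N)}$. The function name is hypothetical, and the critical value `q_alpha` must be supplied by the caller from a table (2.343 is the value commonly tabulated for three algorithms at the 0.05 level).

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_with_nemenyi(scores, q_alpha):
    """scores: array of shape (n_datasets, n_algorithms), higher is better.
    Returns the Friedman p-value, mean ranks, the Nemenyi critical
    difference, and a boolean matrix of significantly different pairs."""
    n, k = scores.shape
    _, p_value = friedmanchisquare(*scores.T)   # one sample per algorithm
    ranks = rankdata(-scores, axis=1)           # rank 1 = best on each dataset
    mean_ranks = ranks.mean(axis=0)
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))
    differ = np.abs(mean_ranks[:, None] - mean_ranks[None, :]) > cd
    return p_value, mean_ranks, cd, differ

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.random((10, 1))
    scores = base + np.array([0.3, 0.2, 0.1]) + 0.01 * rng.random((10, 3))
    print(friedman_with_nemenyi(scores, q_alpha=2.343))
```

The pairwise comparison is only meaningful when the Friedman p-value is below the chosen significance level.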
  • 15. Results. Green is the best, yellow is the second best, red is the worst.

      Indicator \ $p$ for $l_p$                    0.01   0.1   0.5   1     2     4     10    ∞
      TNNSC
        The best                                   1      5     10    13    4     6     1     3
        The worst                                  23     4     2     2     3     3     4     7
        Insignificantly different from the best    19     26    32    31    30    29    26    26
        Insignificantly different from the worst   36     24    22    21    22    22    26    26
      Accuracy
        The best                                   2      7     15    8     8     3     3     6
        The worst                                  18     6     3     4     5     9     8     8
        Insignificantly different from the best    30     31    34    33    33    32    31    32
        Insignificantly different from the worst   36     33    31    31    31    32    33    32
      Sensitivity plus specificity
        The best                                   5      8     13    6     9     3     4     5
        The worst                                  15     4     2     3     3     7     8     13
  • 16. Results. The Friedman test shows p-values of less than 0.0001 for all tests. Sets of insignificantly different values of $p$ (among 0.01, 0.1, 0.5, 1, 2, 4, 10, ∞) are marked with X for each preprocessing and indicator:
      • No preprocessing: TNNSC X X X X X; Accuracy X X X X; Se+Sp X X X X
      • Standardisation: TNNSC X X X; Accuracy X X X; Se+Sp X X X X
      • Standard dispersion: TNNSC X X X X; Accuracy X X X X; Se+Sp X X X X
  • 17. Conclusion
      • For almost all sufficiently rich datasets, relative contrast and coefficient of variation decrease as the degree $p$ of the Minkowski metric or quasimetric $l_p$ increases (fractional quasimetrics with small $p$ have greater relative contrast and coefficient of variation).
      • Greater values of relative contrast and coefficient of variation do not mean better quality of KNN classification.
      • The differences in KNN performance for $p = 0.5, 1, 2$ are statistically insignificant in all tests. Extremely small or large values of $p$ correspond to worse performance.
      • Fractional quasinorms do not help to overcome the curse of dimensionality in classification problems.
  • 18. Conclusion. Fractional quasinorms do not help to overcome the curse of dimensionality in classification problems.
  • 19. Some references 1
      • C. C. Aggarwal, A. Hinneburg, D. A. Keim, On the surprising behavior of distance metrics in high dimensional space, in International Conference on Database Theory. Springer, 2001, pp. 420–434.
      • P. C. Kainen, Utilizing geometric anomalies of high dimension: When complexity makes computation easier, in Computer Intensive Methods in Control and Signal Processing. Springer, 1997, pp. 283–294.
      • P. Lévy, Problèmes concrets d’analyse fonctionnelle. Paris, France: Gauthier-Villars, 1951.
      • P. Kainen, V. Kůrková, Quasiorthogonal dimension of Euclidean spaces, Appl. Math. Lett. 6 (1993), 7–10.
      • A.N. Gorban, I.Y. Tyukin, D.V. Prokhorov, K.I. Sofeikov, Approximation with random bases: Pro et Contra, Information Sciences 364-365 (2016), 129-145.
  • 20. Some references 2
      • A.N. Gorban, I.Y. Tyukin, Stochastic Separation Theorems, Neural Networks 94, October 2017, 255-259.
      • D. Donoho, J. Tanner, Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing, Philosophical Transactions of The Royal Society A 367(1906), 20090152 (2009).
      • A.N. Gorban, I.Y. Tyukin, Blessing of dimensionality: mathematical foundations of the statistical physics of data, Philosophical Transactions of The Royal Society A 376(2118), 20170237 (2018).
      • A.N. Gorban, A. Golubkov, B. Grechuk, E.M. Mirkes, I.Y. Tyukin, Correction of AI systems by linear discriminants: Probabilistic foundations, Information Sciences 466 (2018), 303-322.
      • A.N. Gorban, V.A. Makarov, I.Y. Tyukin, The unreasonable effectiveness of small neural ensembles in high-dimensional brain, Physics of Life Reviews, 2019, https://doi.org/10.1016/j.plrev.2018.09.005