A PRACTICAL, POWERFUL, ROBUST AND
INTERPRETABLE FAMILY OF
CORRELATION COEFFICIENTS
by
Savas Papadopoulos1
Bank of Greece
Department of Financial Stability
sapapa@bankofgreece.gr
23/05/2022
Keywords: dependence test; permutation tests; Pearson, Spearman & Kendall coefficients;
computational statistics
The views expressed are those of the author and do not necessarily reflect those of Bank of Greece
1 Copyright © 2022 Savas Papadopoulos, www.protectmywork.com. All rights reserved.
CONTENTS
Ø ABSTRACT
Ø A FAMILY OF CORRELATION COEFFICIENTS
Ø MADE-UP EXAMPLE
Ø APPLICATION TO GDP PER CAPITA
Ø SIMULATION
Ø CONCLUSIONS
ABSTRACT
If we held a competition for the most valuable statistical quantity in
exploratory data analysis, the winner would most likely be the correlation
coefficient, by a wide margin over the runner-up. In practice, however,
most data sets are non-normal, contain outliers, and cannot be transformed
to normality. We therefore search for correlation coefficients that are
robust to nonnormality and/or outliers, that can be used in any
application, and that detect distorted or hidden correlations missed by
the most popular correlation coefficients. We introduce a family of
correlation coefficients with the Pearson and Spearman coefficients as
special cases. In the problems described above, other family members yield
lower p-values than the standard coefficients. The proposed family of
coefficients, together with their cut-off points and p-values computed by
permutation tests, can be applied by any scientist analyzing data. We share
simulations, code, and real data by email or on the internet.
INTRODUCTION
Ø The existing literature recommends the Pearson (P) correlation for
normal data and the Spearman (S) correlation for nonnormal data.
Ø We propose alternative coefficients that perform better than the P & S
coefficients in applications.
Ø Data-analysis software typically computes three classic correlation
coefficients: Pearson's, Spearman's, and Kendall's.
Ø It is striking that, although these three coefficients were developed
in the late 19th and early 20th centuries, and despite the rapid
development of computers, they still dominate in practice.
THE CORRELATION COEFFICIENT FAMILY
Define the Minkowski distance:
$$D_p(x_i, y_i) = \left( \frac{1}{n} \cdot \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
In this study we mainly use p=1 (Manhattan distance).
Compute the standardized values of order p as
$$x_{p,i}^{(s)} = \frac{x_i - \bar{x}}{D_p(x_i, \bar{x})}$$
Proposed 1, Value Correlation for positive & negative relationships
$$r_{p,V} = \begin{cases} r_{p,V+} = 1 - \dfrac{1}{L^2} \cdot D_p^2\left(x_{p,i}^{(s)},\ y_{p,i}^{(s)}\right), & \text{if } r_{p,V+} \ge -r_{p,V-} \\[2ex] r_{p,V-} = \dfrac{1}{L^2} \cdot D_p^2\left(x_{p,i}^{(s)},\ -y_{p,i}^{(s)}\right) - 1, & \text{if } r_{p,V+} < -r_{p,V-} \end{cases}$$
$$D_p\left(x_{p,i}^{(s)},\ y_{p,i}^{(s)}\right) \xrightarrow{\ p\ } L, \text{ as } n \to \infty \text{ (convergence in probability)}$$
THE CORRELATION COEFFICIENT FAMILY WITH RANKINGS
Proposed 2, Rank-Value Correlation
(Standardized Rankings $R_p^{(s)}(x_i)$ & $R_p^{(s)}(y_i)$) for positive & negative relationships
$$r_{p,RV} = \begin{cases} r_{p,RV+} = 1 - \dfrac{1}{L^2} \cdot D_p^2\left(R_p^{(s)}(x_i),\ R_p^{(s)}(y_i)\right), & \text{if } r_{p,RV+} \ge -r_{p,RV-} \\[2ex] r_{p,RV-} = \dfrac{1}{L^2} \cdot D_p^2\left(R_p^{(s)}(x_i),\ -R_p^{(s)}(y_i)\right) - 1, & \text{if } r_{p,RV+} < -r_{p,RV-} \end{cases}$$
$$D_p\left(R_p^{(s)}(x_i),\ -R_p^{(s)}(y_i)\right) \xrightarrow{\ p\ } L, \text{ as } n \to \infty \text{ (convergence in probability)}$$
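Pearson Correlation Coefficient, $r_{Pe}$, as a Special Case (p=2):
$$r_{Pe} = r_{Pe}(x_i, y_i) = \begin{cases} r_{Pe+} = 1 - \dfrac{D_2^2\left(x_{2,i}^{(s)},\ y_{2,i}^{(s)}\right)}{2}, & \text{if } r_{Pe+} \ge -r_{Pe-} \\[2ex] r_{Pe-} = \dfrac{D_2^2\left(x_{2,i}^{(s)},\ -y_{2,i}^{(s)}\right)}{2} - 1, & \text{if } r_{Pe+} < -r_{Pe-} \end{cases}$$
$$D_2\left(x_{2,i}^{(s)},\ y_{2,i}^{(s)}\right) \xrightarrow{\ p\ } \sqrt{2}, \text{ as } n \to \infty \text{ and for independent } x \text{ and } y$$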
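The special case above can be checked numerically. The following is a minimal sketch in plain Python, assuming the mean-normalized Minkowski distance and the order-p standardization defined earlier in the deck; the function names (`minkowski_distance`, `standardize`, `family_corr`, `pearson`) and the sample data are illustrative and not taken from the slides.

```python
import math

def minkowski_distance(u, v, p):
    """Mean-normalized Minkowski distance D_p(u, v)."""
    n = len(u)
    return (sum(abs(a - b) ** p for a, b in zip(u, v)) / n) ** (1.0 / p)

def standardize(u, p):
    """Center at the mean and scale by D_p(u, mean)."""
    mean = sum(u) / len(u)
    scale = minkowski_distance(u, [mean] * len(u), p)
    return [(a - mean) / scale for a in u]

def family_corr(x, y, p, limit):
    """r_{p,V}: choose the branch r+ or r- as in the case definition."""
    xs, ys = standardize(x, p), standardize(y, p)
    r_plus = 1 - minkowski_distance(xs, ys, p) ** 2 / limit ** 2
    r_minus = minkowski_distance(xs, [-b for b in ys], p) ** 2 / limit ** 2 - 1
    return r_plus if r_plus >= -r_minus else r_minus

def pearson(x, y):
    """Textbook Pearson coefficient, used only for comparison."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (not from the slides)
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 12.0]
r_family = family_corr(x, y, p=2, limit=math.sqrt(2))
r_pearson = pearson(x, y)
print(r_family, r_pearson)
```

With p=2 and L=√2 both branches reduce algebraically to the Pearson coefficient, so the two printed values should agree up to floating-point rounding.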
Spearman2 Correlation Coefficient, $r_S$, as a Special Case (p=2):
$$r_S = r_{Pe}[R(x_i), R(y_i)] = 1 - \frac{6}{n^2 - 1} \cdot D_2^2[R(x_i), R(y_i)]$$
$$\frac{D_2[R(x_i), R(y_i)]}{n} \xrightarrow{\ p\ } \frac{1}{\sqrt{6}}, \text{ as } n \to \infty \text{ and for independent } x \text{ and } y$$
2 https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
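As a small worked check of the Spearman identity, the sketch below computes $1 - 6 \cdot D_2^2 / (n^2 - 1)$ directly from ranks. It assumes no ties; the helper names and the data are illustrative.

```python
def ranks(values):
    """Ranks 1..n, assuming there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def spearman_from_distance(x, y):
    """Spearman coefficient written via the squared distance D_2^2 of the ranks."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2_squared = sum((a - b) ** 2 for a, b in zip(rx, ry)) / n
    return 1 - 6.0 / (n ** 2 - 1) * d2_squared

# Illustrative data (not from the slides)
x = [3.1, 0.4, 2.2, 5.9, 4.0]
y = [2.0, 1.0, 4.5, 6.0, 3.3]
rho = spearman_from_distance(x, y)
print(rho)
```

For these five points the rank differences are 1, 0, -2, 0, 1, so the classic formula $1 - 6 \sum d_i^2 / (n(n^2-1))$ gives 1 - 36/120 = 0.7, which is the value the distance form should reproduce.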
Kendall3 Correlation Coefficient, $r_K$:
$$r_K = \frac{2}{n \cdot (n-1)} \cdot \sum_{i=2}^{n} \sum_{j=1}^{i-1} \operatorname{sgn}(x_i - x_j) \cdot \operatorname{sgn}(y_i - y_j)$$
(sgn ≡ the sign function)
Spearman & Kendall coefficients are special cases of a general rank correlation coefficient4
3 https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient
4 https://en.wikipedia.org/wiki/Rank_correlation#General_correlation_coefficient
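The double sum in the Kendall formula translates directly into two nested loops. This is a minimal sketch with illustrative names and data; it does not handle ties in any special way.

```python
def sgn(v):
    """Sign function: -1, 0, or +1."""
    return (v > 0) - (v < 0)

def kendall(x, y):
    """Kendall coefficient as the normalized double sum over pairs j < i."""
    n = len(x)
    total = 0
    for i in range(1, n):
        for j in range(i):
            total += sgn(x[i] - x[j]) * sgn(y[i] - y[j])
    return 2.0 * total / (n * (n - 1))

# Illustrative data (not from the slides)
x = [3.1, 0.4, 2.2, 5.9, 4.0]
y = [2.0, 1.0, 4.5, 6.0, 3.3]
tau = kendall(x, y)
print(tau)
```

Of the 10 pairs in this example, 8 are concordant and 2 are discordant, so the sum is 6 and the coefficient is 2·6/20 = 0.6.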
PROPERTIES FOR $r_{p,V}$ AND $r_{p,RV}$
Ø $-1 \le r_{p,V} \le 1$, $-1 \le r_{p,RV} \le 1$
Ø An exact value of +1 or -1 indicates a perfect positive or
negative relationship.
Ø A correlation value close to 0 indicates no relationship.
Ø The closer the coefficient is to +1 or -1, the stronger the
bivariate association.
Ø We square the distance $D_p\left(x_{p,i}^{(s)},\ y_{p,i}^{(s)}\right)$ so that $r_{p,V}$ has the
same units as Pearson's correlation coefficient.
THE PROPOSED CORRELATION COEFFICIENT
FOR DATA NOT REJECTED AS NORMAL
Compute the standardized values (s) of order p=1 as
$$z_{1,i}^{(s)} = \frac{z_i - \bar{z}}{\overline{|z_i - \bar{z}|}}, \quad \text{bar} \equiv \text{arithmetic mean}$$
$$r_{1,V} = \begin{cases} r_{1,V+}, & \text{if } r_{1,V+} \ge -r_{1,V-} \\ r_{1,V-}, & \text{if } r_{1,V+} < -r_{1,V-} \end{cases}$$
For positive correlation:
$$r_{1,V+} = 1 - \frac{1}{2} \cdot \left( \overline{\left| x_{1,i}^{(s)} - y_{1,i}^{(s)} \right|} \right)^2$$
For negative correlation:
$$r_{1,V-} = \frac{1}{2} \cdot \left( \overline{\left| x_{1,i}^{(s)} + y_{1,i}^{(s)} \right|} \right)^2 - 1$$
$$\overline{\left| x_{1,i}^{(s)} - y_{1,i}^{(s)} \right|} \xrightarrow{\ p\ } \sqrt{2}, \text{ as } n \to \infty \text{ and for independent } x \text{ and } y \text{ (numerical finding)}$$
We use this version in the applications and simulations.
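The limiting value √2 is reported as a numerical finding, and a reader can repeat that kind of check with a short simulation. The sketch below is one possible way to do it, not the author's procedure: it draws independent standard-normal pairs with Python's `random` module, applies the order-1 standardization, and averages the absolute differences. The sample size, the seed, and the function names are assumptions.

```python
import math
import random

def mean_abs_standardize(values):
    """Center at the mean and scale by the mean absolute deviation (order p=1)."""
    m = sum(values) / len(values)
    scale = sum(abs(v - m) for v in values) / len(values)
    return [(v - m) / scale for v in values]

def limiting_distance(n, seed=0):
    """Mean |x_s - y_s| for n independent standard-normal pairs."""
    rng = random.Random(seed)
    x = mean_abs_standardize([rng.gauss(0.0, 1.0) for _ in range(n)])
    y = mean_abs_standardize([rng.gauss(0.0, 1.0) for _ in range(n)])
    return sum(abs(a - b) for a, b in zip(x, y)) / n

estimate = limiting_distance(20000)
print(estimate, math.sqrt(2))
```

For normal data the estimate should land close to √2 ≈ 1.414; for other distributions the limit is generally different, which is why the check is restricted to the normal case here.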
THE PROPOSED CORRELATION COEFFICIENT
FOR NONNORMAL DATA
Compute Rankings and their standardized values of order 1 as
$$R_s(z_i) = \frac{R(z_i) - \overline{R(z)}}{\overline{\left| R(z_i) - \overline{R(z)} \right|}}, \quad \text{bar} \equiv \text{arithmetic mean}$$
$$r_{1,RV} = \begin{cases} r_{1,RV+}, & \text{if } r_{1,RV+} \ge -r_{1,RV-} \\ r_{1,RV-}, & \text{if } r_{1,RV+} < -r_{1,RV-} \end{cases}$$
For positive correlation:
$$r_{1,RV+} = 1 - \left( \frac{1}{L} \cdot \overline{\left| R_s(x_i) - R_s(y_i) \right|} \right)^2$$
For negative correlation:
$$r_{1,RV-} = \left( \frac{1}{L} \cdot \overline{\left| R_s(x_i) + R_s(y_i) \right|} \right)^2 - 1$$
$$\overline{\left| R_s(x_i) - R_s(y_i) \right|} \xrightarrow{\ p\ } L = 1.344, \text{ as } n \to \infty \text{ and for independent } x \text{ and } y \text{ (numerical finding)}$$
We use this version in the applications and simulations.
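The rank version can be sketched in a few lines of plain Python. This is an illustrative reading of the formulas above, assuming ranks without ties and using the limiting constant quoted in the deck's own Python code; the function names and the toy data are not from the slides.

```python
L_RANK = 1.3439871655521578  # limiting distance quoted in the deck's Python code

def standardized_ranks(values):
    """Ranks 0..n-1 (no tie handling), centered and scaled by mean absolute deviation."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = float(rank)
    m = sum(ranks) / len(ranks)
    scale = sum(abs(r - m) for r in ranks) / len(ranks)
    return [(r - m) / scale for r in ranks]

def rank_value_corr(x, y, limit=L_RANK):
    """r_{1,RV}: the positive branch if r+ >= -r-, otherwise the negative branch."""
    rx, ry = standardized_ranks(x), standardized_ranks(y)
    n = len(rx)
    r_plus = 1 - (sum(abs(a - b) for a, b in zip(rx, ry)) / n / limit) ** 2
    r_minus = (sum(abs(a + b) for a, b in zip(rx, ry)) / n / limit) ** 2 - 1
    return r_plus if r_plus >= -r_minus else r_minus

# Toy data: a perfectly increasing and a perfectly decreasing relationship
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
y_monotone = [v ** 3 for v in x]
y_reversed = [-v for v in x]
print(rank_value_corr(x, y_monotone), rank_value_corr(x, y_reversed))
```

Because only ranks enter the computation, any strictly increasing transformation of y leaves the coefficient unchanged, which is what makes this version attractive for nonnormal data.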
THE CUT-OFF POINTS AND P-VALUES FOR $r_{p,V}$ AND
$r_{p,RV}$ ARE COMPUTED BY PERMUTATION TESTS

Cut-Off Points for $r_{1,RV}$ (two-sided α=0.05 or one-sided α=0.025)

| n | c | n | c | n | c | n | c | n | c | n | c |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 0.938 | 12 | 0.616 | 19 | 0.518 | 30 | 0.422 | 80 | 0.276 | 500 | 0.123 |
| 6 | 0.891 | 13 | 0.594 | 20 | 0.511 | 35 | 0.395 | 90 | 0.259 | 1000 | 0.091 |
| 7 | 0.784 | 14 | 0.592 | 21 | 0.491 | 40 | 0.372 | 100 | 0.248 | 2000 | 0.069 |
| 8 | 0.754 | 15 | 0.576 | 22 | 0.486 | 45 | 0.355 | 150 | 0.208 | 5000 | 0.050 |
| 9 | 0.729 | 16 | 0.559 | 23 | 0.479 | 50 | 0.337 | 200 | 0.184 | $10^4$ | 0.039 |
| 10 | 0.713 | 17 | 0.538 | 24 | 0.477 | 60 | 0.311 | 300 | 0.151 | $10^5$ | 0.023 |
| 11 | 0.646 | 18 | 0.535 | 25 | 0.460 | 70 | 0.293 | 400 | 0.135 | $10^6$ | 0.019 |

Example: In a case with nonnormal data and n=37, we observe $r_{1,RV} = 0.41$.
Then we can reject $H_0: \rho = 0$ and accept $H_1: \rho > 0$ with α=0.025 since
$|r_{1,RV}| > c_{1,RV} = 0.395$, the tabulated cut-off for n=35 (conservative for n=37).
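The lookup used in the example can be written as a small helper. The sketch below copies the tabulated cut-offs and, for a sample size that is not tabulated, falls back to the nearest smaller tabulated n. That fallback rule is an assumption made here (it is conservative because the cut-offs decrease with n); the names `CUTOFFS_R1_RV`, `conservative_cutoff`, and `is_significant` are illustrative.

```python
import bisect

# (n, cut-off) pairs copied from the cut-off table for r_{1,RV}
CUTOFFS_R1_RV = [
    (5, 0.938), (6, 0.891), (7, 0.784), (8, 0.754), (9, 0.729), (10, 0.713),
    (11, 0.646), (12, 0.616), (13, 0.594), (14, 0.592), (15, 0.576),
    (16, 0.559), (17, 0.538), (18, 0.535), (19, 0.518), (20, 0.511),
    (21, 0.491), (22, 0.486), (23, 0.479), (24, 0.477), (25, 0.460),
    (30, 0.422), (35, 0.395), (40, 0.372), (45, 0.355), (50, 0.337),
    (60, 0.311), (70, 0.293), (80, 0.276), (90, 0.259), (100, 0.248),
    (150, 0.208), (200, 0.184), (300, 0.151), (400, 0.135), (500, 0.123),
    (1000, 0.091), (2000, 0.069), (5000, 0.050), (10**4, 0.039),
    (10**5, 0.023), (10**6, 0.019),
]
_SIZES = [n for n, _ in CUTOFFS_R1_RV]

def conservative_cutoff(n):
    """Cut-off of the largest tabulated sample size that does not exceed n."""
    if n < _SIZES[0]:
        raise ValueError("no tabulated cut-off for n < 5")
    idx = bisect.bisect_right(_SIZES, n) - 1
    return CUTOFFS_R1_RV[idx][1]

def is_significant(r, n):
    """True if |r| exceeds the conservative cut-off."""
    return abs(r) > conservative_cutoff(n)

print(conservative_cutoff(37), is_significant(0.41, 37))
```

For exact p-values at an untabulated n, running the permutation test itself remains the more direct route.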
PERMUTATION TESTS
Permutation tests5 are used for hypothesis testing of correlation
coefficients between two variables, x and y. First, the correlation
coefficient is recalculated many times, each time after shuffling the
observations of the variable y while keeping the order of the
observations of the variable x fixed. Then, p-values are derived from
the distribution of the computed correlation coefficients. Permutation
tests6 have the following advantages over other standard statistical tests:
• They approximate p-values very satisfactorily.
• They do not assume any particular distribution (distribution-free).
• They are suitable for small samples.
• They are applicable to non-random samples, e.g., time-series data.
5 https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
6 Berry, K. J., Johnston, J. E., & Mielke, P. W., Jr. (2018). The measurement of association: a permutation statistical
approach. Springer International Publishing.
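PYTHON CODE FOR THE COEFFICIENTS & PERMUTATION TESTS
import random
import numpy as np
# ECDF is assumed to come from statsmodels; the slides do not show the import
from statsmodels.distributions.empirical_distribution import ECDF

def arithm_corr(x, y, rank):
    # x, y: NumPy arrays; rank == 0 gives the value version, otherwise the rank version
    nper = 1000  # number of permutations
    if rank == 0:
        plimit = np.sqrt(2)  # limiting distance for the value version
    else:
        plimit = 1.3439871655521578  # limiting distance for the rank version
        x = x.argsort(0).argsort(0)  # ranks 0..n-1 (ties are not averaged)
        y = y.argsort(0).argsort(0)
    # Standardize: center at the mean, scale by the mean absolute deviation
    xm = np.mean(x); ym = np.mean(y)
    dmx = np.mean(abs(x - xm)); dmy = np.mean(abs(y - ym))
    x = (x - xm) / dmx; y = (y - ym) / dmy
    # Positive and negative branches of the coefficient
    rp = 1 - (np.mean(abs(x - y)) / plimit) ** 2
    rm = (np.mean(abs(x + y)) / plimit) ** 2 - 1
    r = rp if rp >= -rm else rm

    # Permutation: shuffle y, keep x fixed, recompute the coefficient
    dr = np.zeros(nper)
    yh = y.tolist()
    for ii in range(nper):
        yr = np.array(random.sample(yh, len(yh)))
        rp2 = 1 - (np.mean(abs(x - yr)) / plimit) ** 2
        rm2 = (np.mean(abs(x + yr)) / plimit) ** 2 - 1
        dr[ii] = rp2 if rp2 >= -rm2 else rm2
    # Build the empirical CDF once, after all permutations are done
    ecdfr = ECDF(dr)
    # Two-sided p-value: mass at or below -|r| plus mass above |r|
    pv = ecdfr(-abs(r)) + 1 - ecdfr(abs(r))
    return r, pv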
AN INTERPRETATION OF THE CORRELATION COEFFICIENT
$$r_{1,V} = \begin{cases} r_{1,V+} = 1 - \left( \dfrac{\overline{\left| x_{1,i}^{(s)} - y_{1,i}^{(s)} \right|}}{\sqrt{2}} \right)^2, & \text{if } r_{1,V+} \ge -r_{1,V-} \\[2ex] r_{1,V-} = \left( \dfrac{\overline{\left| x_{1,i}^{(s)} + y_{1,i}^{(s)} \right|}}{\sqrt{2}} \right)^2 - 1, & \text{if } r_{1,V+} < -r_{1,V-} \end{cases}$$
Ø The correlation coefficient can be interpreted as the percentage change between
the squared distance of the standardized values and the squared limiting distance
for independent x and y.
Ø For example, a value of r=0.5 implies a 50% reduction of the squared observed
distance relative to the squared distance under independence.
Ø In the literature, it is known that Pearson's correlation can be viewed as a
rescaled variance of the difference between standardized scores7
7 Rodgers; Nicewander (1988). "Thirteen ways to look at the correlation coefficient". The American
Statistician. 42 (1): 59-66.
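The r=0.5 example can be spelled out in one line of arithmetic. Here $\bar{d}$ is shorthand (introduced only for this note) for the mean absolute difference of the standardized values, and the limiting squared distance is $(\sqrt{2})^2 = 2$:

```latex
r_{1,V+} = 1 - \frac{\bar{d}^{\,2}}{2} = 0.5
\;\Longrightarrow\; \bar{d}^{\,2} = 0.5 \cdot 2 = 1
\;\Longrightarrow\; \bar{d} = 1
\quad \text{(compared with } \sqrt{2} \approx 1.414 \text{ under independence)}
```

So the observed squared distance (1) is half of the squared distance expected for independent data (2).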
A CLASSIFICATION MODEL
We consider the test: $H_0: \rho = 0$ against $H_1: \rho \ne 0$
Binary Observed Variable
$$y = [\rho \ne 0] = \begin{cases} 1, & \text{Correlation exists} \\ 0, & \text{No Correlation} \end{cases}$$
(Iverson bracket, 1 if condition is true, 0 otherwise)
First, we test (Henze-Zirkler MN Test):
$H_0: (x, y) \sim MN$, MN ≡ Multivariate Normal Distribution
$H_1: (x, y) \nsim MN$, $\nsim MN$ ≡ does NOT follow a MN Distribution
Predicted Binary Variable
$$\hat{y} = \begin{cases} [r_{p,V} > c_{p,V}], & H_0 \text{ for MN cannot be rejected} \\ [r_{p,RV} > c_{p,RV}], & H_0 \text{ for MN is rejected} \end{cases}$$
$r_{p,V}$ & $r_{p,RV}$ ≡ proposed correlation coefficients for normal and nonnormal data
$c_{p,V}$ & $c_{p,RV}$ ≡ critical values, computed by permutation tests
A MADE-UP EXAMPLE
The Pearson correlation coefficient is not robust in the presence of outliers8.
Thus, we use the Spearman and Kendall coefficients & the proposed $r_{1,RV}$ coefficient.
The data show a perfect monotonic relationship with two outliers. Only the proposed
correlation, $r_{1,RV}$, recognizes the monotonic relationship (p-value < 0.01).

Correlation Coefficients & p-values, n=11

| Coefficient | Correlation Coefficient | p-value |
|---|---|---|
| Proposed Rank, $r_{1,RV}$ | 0.754 | 0.000 |
| Kendall | 0.309 | 0.218 |
| Spearman | 0.091 | 0.790 |

8 https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
AN APPLICATION TO GDP PER CAPITA
• Publicly available data9 from the World Bank
• N=61 countries with GDP per capita > $10,000 in 2020 and full
annual data for 1981-2020
• T=40 for the period 1981-2020. We analyze growth rates (%).
• $(61^2 - 61)/2 = 1830$ pairs (x,y), i.e., correlation cases
• 1187 not rejected as normal
• 643 rejected as normal (nonnormal)
• No causality. Lurking variables: global or continental economy
• Compare the economic growth of a country with its correlated
countries by regression residuals.
9 https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
APP TO GDP: $H_0$ FOR MN CANNOT BE REJECTED
Ø For bivariate data not rejected as Normal, we compare
Pearson with the proposed $r_{1,V}$.
Ø 1 = Reject $H_0: \rho = 0$, 0 = otherwise
Ø In 1045 = 690 + 355 cases (88%), the two coefficients agree
[(1,1), (0,0)], and we assume that the agreed outcome is the true one.
This assumption may not be entirely accurate, but it does
not affect the conclusions for correlation comparisons.

| Pearson | Proposed_Value, $r_{1,V}$ | Frequencies |
|---|---|---|
| 1 | 1 | 690 |
| 1 | 0 | 44 |
| 0 | 1 | 98 |
| 0 | 0 | 355 |
| Total | | 1187 |
APP TO GDP FOR DATA NOT REJECTED AS NORMAL

Ø In 98 cases $r_{1,V}$ gives the right signal, which is not recognized
by the Pearson coefficient.
Ø In 44 cases only the Pearson coefficient indicates a significant
correlation; it is correct in 18 of these cases (41%) and
incorrect in 26 (59%).
Ø This happens because, although the data can be
assumed normal, there are outliers and/or influential
points.
Ø In cases of coefficient disagreement, we examine each
case to determine whether there is a correlation.
A CASE FOR DATA NOT REJECTED AS NORMAL
Correlation Coefficients & p-values for Euro Area vs Qatar.

| Coefficient | 1981-2020, n=40: Correl. | p-value | 1981-2019 excluding outliers 1986 & 2000, n=38: Correl. | p-value |
|---|---|---|---|---|
| Proposed, $r_{1,V}$ | 0.346 | 0.022 | 0.482 | 0.000 |
| Pearson | 0.143 | 0.379 | 0.478 | 0.002 |

Only the proposed correlation, $r_{1,V}$, recognizes the relationship in the full sample (p-value < 0.05).
There is a significant positive relationship after removing 2 outliers.
A CASE FOR NONNORMAL DATA
Correlation Coefficients & p-values for UK vs Trinidad and Tobago

| Coefficient | 1981-2020, n=40: Correl. | p-value | 1981-2019 excluding outliers 1986, 1987, 1988, 2008 & 2009, n=35: Correl. | p-value |
|---|---|---|---|---|
| Proposed, $r_{1,RV}$ | 0.452 | 0.009 | 0.613 | 0.002 |
| Kendall | 0.169 | 0.124 | 0.348 | 0.003 |
| Spearman | 0.229 | 0.155 | 0.494 | 0.003 |

Only the proposed correlation, $r_{1,RV}$, recognizes the relationship in the full sample (p-value < 0.05).
There is a significant positive relationship after removing 5 outliers.
AN APPLICATION TO GDP FOR NONNORMAL DATA
Ø For NONNORMAL data, we compare the Spearman and
Kendall coefficients with the proposed $r_{1,RV}$.
Ø 1 = Reject $H_0: \rho = 0$, 0 = otherwise
Ø In 593 = 297 + 296 of the 643 cases (92.2%), the 3 coefficients agree
[(1,1,1), (0,0,0)], and we assume that the agreed outcome is the true
one. This assumption may not be entirely accurate, but
it does not affect the conclusions for correlation
comparisons.
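THE APPLICATION TO GDP FOR NONNORMAL DATA
1 = Reject $H_0: \rho = 0$, 0 = otherwise

| Spearman | Kendall | Proposed_Rank, $r_{1,RV}$ | Frequencies |
|---|---|---|---|
| 1 | 1 | 1 | 297 |
| 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 12 |
| 1 | 0 | 0 | 4 |
| 0 | 1 | 0 | 1 |
| 0 | 0 | 1 | 32 |
| 0 | 0 | 0 | 296 |
| Total | | | 643 |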
THE APPLICATION TO GDP FOR NONNORMAL DATA
Ø In 32 cases $r_{1,RV}$ gives 1, which is not recognized by the others
(87.5% correct and 12.5% incorrect).
Ø In 12 cases $r_{1,RV}$ & Kendall indicate a significant
correlation (1), in contrast to Spearman (0)
(83% correct and 17% incorrect).
Ø Only in 4 cases does the Spearman coefficient give 1 that is not
recognized by the others (50% correct and 50% incorrect).
Ø In cases of coefficient disagreement, we examine each
case to determine whether there is a correlation.
THE CONFUSION TABLE10
The confusion table reports the true and false positives and negatives (TP,
FP, TN, and FN). TPR (true positive rate, sensitivity, or recall) and TNR
(true negative rate or specificity) are the percentages of positives (1's) and
negatives (0's), respectively, that are correctly classified (%CC1 &
%CC0, respectively). Also, PPV (positive predictive value or precision) and
NPV (negative predictive value) are the proportions of positive and
negative signals that are correctly predicted (%CP1 & %CP0, respectively).
The proportion of all cases correctly classified (%CC) is given by the accuracy
measure (ACC). The F-measure is the harmonic mean (HM) of TPR and
PPV. The probability of a Type I error is 1 - TNR; it is the probability of
the incorrect rejection of a true null hypothesis (a "false positive", FP).
The probability of a Type II error is 1 - TPR; it is the probability of failing
to reject a false null hypothesis (a "false negative", FN).
10 https://en.wikipedia.org/wiki/Confusion_matrix
THE CONFUSION TABLE11
(notation)

$n = n_{1,\cdot} + n_{0,\cdot}$

| | Predicted 1: $n_{\cdot,1} = n_{1,1} + n_{0,1}$ | Predicted 0: $n_{\cdot,0} = n_{1,0} + n_{0,0}$ | |
|---|---|---|---|
| Observed 1: $n_{1,\cdot} = n_{1,1} + n_{1,0}$ | $TP = n_{1,1}$ | $FN = n_{1,0}$ | %CC1: $TPR = n_{1,1}/n_{1,\cdot}$ |
| Observed 0: $n_{0,\cdot} = n_{0,1} + n_{0,0}$ | $FP = n_{0,1}$ | $TN = n_{0,0}$ | %CC0: $TNR = n_{0,0}/n_{0,\cdot}$ |
| | %CP1: $PPV = n_{1,1}/n_{\cdot,1}$ | %CP0: $NPV = n_{0,0}/n_{\cdot,0}$ | $ACC = (n_{1,1} + n_{0,0})/n$; $F = HM(TPR, PPV)$; EWHM |

HM -> Harmonic Mean, AM -> Arithmetic Mean
11 https://en.wikipedia.org/wiki/Confusion_matrix
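The notation above maps directly onto a few ratios. The sketch below computes them from the four counts; the function name `confusion_metrics` is illustrative, and the counts used in the example are the ones this deck reports for the NN_RV row of its performance table (TP=335, FP=6, FN=4, TN=298).

```python
def confusion_metrics(tp, fp, fn, tn):
    """Rates derived from a 2x2 confusion table (all returned as fractions)."""
    n = tp + fp + fn + tn
    tpr = tp / (tp + fn)   # %CC1: share of observed 1's predicted as 1
    tnr = tn / (tn + fp)   # %CC0: share of observed 0's predicted as 0
    ppv = tp / (tp + fp)   # %CP1: share of predicted 1's that are correct
    npv = tn / (tn + fn)   # %CP0: share of predicted 0's that are correct
    acc = (tp + tn) / n    # overall share correctly classified
    f = 2 * tpr * ppv / (tpr + ppv)  # harmonic mean of TPR and PPV
    return {"TPR": tpr, "TNR": tnr, "PPV": ppv, "NPV": npv, "ACC": acc, "F": f}

m = confusion_metrics(tp=335, fp=6, fn=4, tn=298)
for name, value in m.items():
    print(name, round(100 * value, 2))
```

Rounded to two decimals in percent, these values can be compared against the NN_RV row of the performance table.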
A NEW PERFORMANCE MEASURE
ERROR WEIGHTED HARMONIC MEAN (EWHM)12
$$EWHM(TPR, TNR, PPV, NPV) = \frac{4 - TPR - TNR - PPV - NPV}{\dfrac{1 - TPR}{TPR} + \dfrac{1 - TNR}{TNR} + \dfrac{1 - PPV}{PPV} + \dfrac{1 - NPV}{NPV}}$$
$$= \frac{1 - AM(TPR, TNR, PPV, NPV)}{1 - HM(TPR, TNR, PPV, NPV)} \cdot HM(TPR, TNR, PPV, NPV)$$
$$= \frac{1 - AM(TPR, TNR, PPV, NPV)}{AM\left(\dfrac{1}{TPR}, \dfrac{1}{TNR}, \dfrac{1}{PPV}, \dfrac{1}{NPV}\right) - 1}$$
The higher the variance of TPR, PPV, TNR, & NPV, the smaller the EWHM.
HM -> Harmonic Mean, AM -> Arithmetic Mean
12 Papadopoulos, S., Stavroulias, P., & Sager, T. (2019). Systemic early warning systems for EU14 based on the 2008 crisis:
proposed estimation and model assessment for classification forecasting. Journal of Banking Regulation, 20(3), 226-244.
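The equivalence of the forms of the EWHM can be verified numerically. The sketch below implements the first and the last form; the function names are illustrative, the rates are derived from the NN_RV counts reported in this deck (TP=335, FP=6, FN=4, TN=298), and returning 1.0 when every rate equals 1 is a convention assumed here because the ratio is then 0/0.

```python
def ewhm(rates):
    """EWHM as total error divided by the sum of error-to-rate ratios."""
    num = sum(1 - v for v in rates)
    den = sum((1 - v) / v for v in rates)
    if den == 0:
        return 1.0  # assumed convention: all rates equal 1 (0/0 otherwise)
    return num / den

def ewhm_from_means(rates):
    """Same quantity written with arithmetic means: (1 - AM) / (AM(1/rate) - 1)."""
    am = sum(rates) / len(rates)
    am_inverse = sum(1 / v for v in rates) / len(rates)
    return (1 - am) / (am_inverse - 1)

tp, fp, fn, tn = 335, 6, 4, 298
rates = [tp / (tp + fn), tn / (tn + fp), tp / (tp + fp), tn / (tn + fn)]  # TPR, TNR, PPV, NPV
value = ewhm(rates)
print(round(100 * value, 2), round(100 * ewhm_from_means(rates), 2))
```

Each error 1 - rate is weighted by 1/rate in the denominator, so a single low rate pulls the EWHM down more strongly than it would pull down a plain average.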
NOTATION FOR PERFORMANCE MEASURES
FOR THE GDP APP
Here "Normal" means "not rejected as Normal".

| Label | Meaning |
|---|---|
| All_PS | All observations with Pearson for Normal & Spearman for Nonnormal |
| All_VR | All observations with $r_{1,V}$ for Normal & $r_{1,RV}$ for Nonnormal |
| N_P | Only Normal cases with Pearson |
| N_V | Only Normal cases with $r_{1,V}$ |
| NN_S | Only Nonnormal cases with Spearman |
| NN_K | Only Nonnormal cases with Kendall |
| NN_RV | Only Nonnormal cases with $r_{1,RV}$ |
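PERFORMANCE MEASURES FOR THE APPLICATION

| | TP | FP | FN | TN | TPR | PPV | TNR | NPV | ACC | Fb | T1+T2 | EWHM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All_PS | 1007 | 29 | 138 | 656 | 87.95 | 97.20 | 95.77 | 82.62 | 90.87 | 92.34 | 16.29 | 86.74 |
| All_VR | 1123 | 6 | 22 | 679 | 98.08 | 99.47 | 99.12 | 96.86 | 98.47 | 98.77 | 2.80 | 97.73 |
| N_P | 708 | 26 | 98 | 355 | 87.84 | 96.46 | 93.18 | 78.37 | 89.55 | 91.95 | 18.98 | 84.20 |
| N_V | 788 | 0 | 18 | 381 | 97.77 | 100.0 | 100.0 | 95.49 | 98.48 | 98.87 | 2.23 | 96.23 |
| NN_S | 299 | 3 | 40 | 301 | 88.20 | 99.01 | 99.01 | 88.27 | 93.31 | 93.29 | 12.79 | 88.99 |
| NN_K | 308 | 3 | 31 | 301 | 90.86 | 99.04 | 99.01 | 90.66 | 94.71 | 94.77 | 10.13 | 91.49 |
| NN_RV | 335 | 6 | 4 | 298 | 98.82 | 98.24 | 98.03 | 98.68 | 98.44 | 98.53 | 3.15 | 98.37 |

Rates are in percent. EWHM = Error Weighted Harmonic Mean, ACC = Accuracy, Fb = F-measure, T1+T2 = Type I + Type II error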
PERFORMANCE-MEASURE DISCUSSION
Ø The overall measures ACC, Fb & EWHM give much higher
values when we use the proposed correlation coefficients $r_{1,V}$
& $r_{1,RV}$ than with the classic coefficients Pearson,
Spearman & Kendall, both separately for normal & nonnormal data
and for all data together.
Ø While ACC, Fb & T1+T2 indicate $r_{1,V}$ (N_V) as the best
coefficient, our EWHM measure shows $r_{1,RV}$ (NN_RV) as the
best. The higher the variance of TPR, PPV, TNR, & NPV, the
smaller the EWHM.
SIMULATION DESIGN
Ø 10,000 simulations
Ø Python (NumPy library)
Ø Two schemes of n correlated bivariate data, $x_i$ and $y_i$, with Pearson's coefficient
$r = r_{Pe}$
Ø Scheme 1 contains all the data correlated as follows:
• $x_i, e_i \sim N(0, 1)$, $i = 1, 2, \ldots, n$, independent, and
• $y_i = r \cdot x_i + \sqrt{1 - r^2} \cdot e_i$
Ø Scheme 2 retains
• 90% of the observations as in Scheme 1 and (**), and
• the remaining 10% NONNORMAL from uniform distribution, U(a,b),
• NONNORMAL within the area between two circles with radii, q, 3 and 3.5.
• The random circle coordinates are given by:
• $x_i = q_i \cdot \cos(w_i)$, $y_i = q_i \cdot \sin(w_i)$,
• $q_i \sim U(3, 3.5)$ and $w_i \sim U(0, 2 \cdot \pi)$
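A short derivation shows why the Scheme 1 construction produces data whose population correlation is r. It uses only the stated assumptions that $x_i$ and $e_i$ are independent with mean 0 and variance 1:

```latex
\operatorname{Var}(y_i) = r^2 \operatorname{Var}(x_i) + (1 - r^2)\operatorname{Var}(e_i) = r^2 + (1 - r^2) = 1
\\
\operatorname{Cov}(x_i, y_i) = r \operatorname{Var}(x_i) + \sqrt{1 - r^2}\,\operatorname{Cov}(x_i, e_i) = r
\\
\operatorname{Corr}(x_i, y_i) = \frac{\operatorname{Cov}(x_i, y_i)}{\sqrt{\operatorname{Var}(x_i)\operatorname{Var}(y_i)}} = r
```

The same argument holds for any unit-variance distribution of $x_i$ and $e_i$, not only the normal one.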
SIMULATION RESULTS
Simulation Results for Scheme 1, Normal Distribution

| Correlation Coefficients | n=20, r=0.70: α | β | n=50, r=0.50: α | β | n=100, r=0.35: α | β | Overall Average α + β |
|---|---|---|---|---|---|---|---|
| Pearson | 0.053 | 0.045 | 0.052 | 0.032 | 0.061 | 0.043 | 0.095 |
| Spearman | 0.058 | 0.091 | 0.052 | 0.053 | 0.058 | 0.065 | 0.126 |
| Kendall | 0.052 | 0.093 | 0.048 | 0.056 | 0.061 | 0.066 | 0.125 |
| Proposed $r_{1,V}$ | 0.055 | 0.056 | 0.058 | 0.048 | 0.081 | 0.085 | 0.128 |
| Proposed $r_{1,RV}$ | 0.051 | 0.116 | 0.053 | 0.074 | 0.058 | 0.100 | 0.151 |

α = P(Type I error), β = P(Type II error)

Simulation Results for Scheme 2, Non-normal Distribution

| Correlation Coefficients | n=20, r=0.70: α | β | n=50, r=0.50: α | β | n=100, r=0.35: α | β | Overall Average α + β |
|---|---|---|---|---|---|---|---|
| Pearson | 0.070 | 0.467 | 0.063 | 0.328 | 0.062 | 0.435 | 0.475 |
| Spearman | 0.059 | 0.319 | 0.046 | 0.228 | 0.053 | 0.267 | 0.324 |
| Kendall | 0.062 | 0.270 | 0.048 | 0.204 | 0.056 | 0.241 | 0.294 |
| Proposed $r_{1,V}$ | 0.091 | 0.204 | 0.082 | 0.156 | 0.108 | 0.176 | 0.272 |
| Proposed $r_{1,RV}$ | 0.062 | 0.242 | 0.055 | 0.176 | 0.083 | 0.193 | 0.270 |
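The "Overall" column appears to be the mean of the three α values plus the mean of the three β values. This reading is inferred from the reported numbers rather than stated in the deck, and the sketch below checks it for the Pearson rows of both schemes; the function name `overall_error` is illustrative.

```python
def overall_error(alphas, betas):
    """Mean Type I error plus mean Type II error across the simulated settings."""
    return sum(alphas) / len(alphas) + sum(betas) / len(betas)

# Pearson rows: (n=20, r=0.70), (n=50, r=0.50), (n=100, r=0.35)
pearson_scheme1 = overall_error([0.053, 0.052, 0.061], [0.045, 0.032, 0.043])
pearson_scheme2 = overall_error([0.070, 0.063, 0.062], [0.467, 0.328, 0.435])
print(round(pearson_scheme1, 3), round(pearson_scheme2, 3))
```

The same calculation can be repeated for the other rows to compare the coefficients on total error.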
SIMULATION CONCLUSIONS
Ø For nonnormal data, the proposed correlation coefficients
$r_{1,RV}$ & $r_{1,V}$ have higher power (1 - β) and smaller total error
(α + β) than the classic coefficients.
Ø For normal data, the inverse order holds, but in practice we
get bivariate data NOT REJECTED AS NORMAL, which may have
a few outliers or nonnormalities.
Ø In addition, the linear relationships in real data are
hypothetical, while in simulations they are real.
Ø The Pearson coefficient performs best in simulations for
normal data but not in the application.
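SIMULATION: PYTHON CODE FOR DATA GENERATION
import numpy as np

def data_corr(n, r, per, radius, rdif, distribution):
    # n: sample size; r: target Pearson correlation
    # per: percentage of observations placed on the ring
    # radius, rdif: ring radii are drawn from U(radius, radius + rdif)
    # distribution: 0 = normal, 1 = uniform with unit variance
    n1 = int(np.rint((1 - per / 100) * n))  # correlated observations
    n2 = int(np.rint(per / 100 * n))        # ring observations
    radiusr = np.random.uniform(radius, radius + rdif, size=(n2, 1))
    if distribution == 0:
        x0 = np.random.normal(size=(n1, 1))
        e = np.random.normal(size=(n1, 1))
    elif distribution == 1:
        aa = np.sqrt(3)  # U(-sqrt(3), sqrt(3)) has variance 1
        x0 = np.random.uniform(-aa, aa, size=(n1, 1))
        e = np.random.uniform(-aa, aa, size=(n1, 1))
    # Correlated part, computed for both distributions
    y0 = r * x0 + np.sqrt(1 - r**2) * e
    # Ring part: random angles and radii around the medians
    tt = np.random.rand(n2, 1) * 2 * np.pi
    outx = np.median(x0) + radiusr * np.cos(tt)
    outy = np.median(y0) + radiusr * np.sin(tt)
    y = np.append(y0, outy)
    x = np.append(x0, outx)
    return x, y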
CONCLUSIONS
Ø The proposed $r_{1,V}$ & $r_{1,RV}$ coefficients are more powerful than
the standard coefficients Pearson, Spearman, & Kendall
Ø Could be applied by all scientists analyzing data
Ø Provide substantive interpretation
Ø Robust to Nonnormality & Outliers
Ø Cut-off points for Proposed-Rank coefficient are given
Ø The Kendall coef. performs better than the Spearman coef.
OUR RECOMMENDATION: Use $r_{1,RV}$ for nonnormal data & $r_{1,V}$
when multivariate normality (MN) cannot be rejected.

Β 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
Β 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
Β 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
Β 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Β 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
Β 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
Β 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
Β 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Β 
Lucknow πŸ’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow πŸ’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow πŸ’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow πŸ’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Β 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
Β 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Β 
Call Girls in Mayapuri Delhi πŸ’―Call Us πŸ”9953322196πŸ” πŸ’―Escort.
Call Girls in Mayapuri Delhi πŸ’―Call Us πŸ”9953322196πŸ” πŸ’―Escort.Call Girls in Mayapuri Delhi πŸ’―Call Us πŸ”9953322196πŸ” πŸ’―Escort.
Call Girls in Mayapuri Delhi πŸ’―Call Us πŸ”9953322196πŸ” πŸ’―Escort.
Β 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Β 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
Β 

A PRACTICAL POWERFUL ROBUST AND INTERPRETABLE FAMILY OF CORRELATION COEFFICIENTS Savas Papadopoulos.pdf

  • 1. 1 A PRACTICAL, POWERFUL, ROBUST AND INTERPRETABLE FAMILY OF CORRELATION COEFFICIENTS by Savas Papadopoulos [1] Bank of Greece Department of Financial Stability sapapa@bankofgreece.gr 23/05/2022 Keywords: dependence test; permutation tests; Pearson, Spearman & Kendall coefficients; computational statistics The views expressed are those of the author and do not necessarily reflect those of the Bank of Greece. [1] Copyright © 2022 Savas Papadopoulos, www.protectmywork.com. All rights reserved.
  • 2. 2 CONTENTS Ø ABSTRACT Ø A FAMILY OF CORRELATION COEFFICIENTS Ø MADE-UP EXAMPLE Ø APPLICATION TO GDP PER CAPITA Ø SIMULATION Ø CONCLUSIONS
  • 3. 3 ABSTRACT If we conducted a competition for which statistical quantity would be the most valuable in exploratory data analysis, the winner would most likely be the correlation coefficient with a significant difference from its first competitor. In addition, most data applications contain non-normal data with outliers without being able to be converted to normal data. Therefore, we search for robust correlation coefficients to nonnormality and/or outliers that could be applied to all applications and detect influenced or hidden correlations not recognized by the most popular correlation coefficients. We introduce a correlation-coefficient family with the Pearson and Spearman coefficients as specific cases. Other family members provide desirable lower p-values than those derived by the standard coefficients in the earlier problems. The proposed family of coefficients, their cut-off points, and p-values, computed by permutation tests, could be applied by all scientists analyzing data. We share simulations, code, and real data by email or the internet.
  • 4. 4 INTRODUCTION Ø The existing literature recommends the Pearson (P) correlation for normal data and the Spearman (S) correlation for nonnormal data. Ø We propose alternative coefficients that perform better than the P & S coefficients in applications. Ø Data-analysis software typically computes three classic correlation coefficients: Pearson's, Spearman's, and Kendall's. Ø It is striking that, although these three coefficients were developed in the late 19th and early 20th centuries, and despite the rapid development of computers, they still dominate in practice.
  • 5. 5 THE CORRELATION COEFFICIENT FAMILY
    Define the Minkowski distance:
    $$D_p(x_i, y_i) = \left( \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
    In this study we mainly use p=1 (Manhattan distance).
    Compute the standardized values of order p as
    $$x_{p,i}^{(s)} = \frac{x_i - \bar{x}}{D_p(x_i, \bar{x})}$$
    Proposed 1, Value Correlation for positive & negative relationships:
    $$r_{p,V} = \begin{cases} r_{p,V+} = 1 - \frac{1}{L^2} \cdot D_p^2\left(x_{p,i}^{(s)}, y_{p,i}^{(s)}\right), & \text{if } r_{p,V+} \ge -r_{p,V-} \\ r_{p,V-} = \frac{1}{L^2} \cdot D_p^2\left(x_{p,i}^{(s)}, -y_{p,i}^{(s)}\right) - 1, & \text{if } r_{p,V+} < -r_{p,V-} \end{cases}$$
    $$D_p\left(x_{p,i}^{(s)}, y_{p,i}^{(s)}\right) \xrightarrow{p} L, \text{ as } n \to \infty \text{ (convergence in probability)}$$
  • 6. 6 THE CORRELATION COEFFICIENT FAMILY WITH RANKINGS
    Proposed 2, Rank-Value Correlation (standardized rankings $R_p^{(s)}(x_i)$ & $R_p^{(s)}(y_i)$) for positive & negative relationships:
    $$r_{p,RV} = \begin{cases} r_{p,RV+} = 1 - \frac{1}{L^2} \cdot D_p^2\left[R_p^{(s)}(x_i), R_p^{(s)}(y_i)\right], & \text{if } r_{p,RV+} \ge -r_{p,RV-} \\ r_{p,RV-} = \frac{1}{L^2} \cdot D_p^2\left[R_p^{(s)}(x_i), -R_p^{(s)}(y_i)\right] - 1, & \text{if } r_{p,RV+} < -r_{p,RV-} \end{cases}$$
    $$D_p\left[R_p^{(s)}(x_i), -R_p^{(s)}(y_i)\right] \xrightarrow{p} L, \text{ as } n \to \infty \text{ (convergence in probability)}$$
  • 7. 7 Pearson Correlation Coefficient, $r_{Pe}$, as a Special Case (p=2):
    $$r_{Pe} = r_{Pe}(x_i, y_i) = \begin{cases} r_{Pe+} = 1 - \frac{D_2^2\left(x_{2,i}^{(s)}, y_{2,i}^{(s)}\right)}{2}, & \text{if } r_{Pe+} \ge -r_{Pe-} \\ r_{Pe-} = \frac{D_2^2\left(x_{2,i}^{(s)}, -y_{2,i}^{(s)}\right)}{2} - 1, & \text{if } r_{Pe+} < -r_{Pe-} \end{cases}$$
    $$D_2\left(x_{2,i}^{(s)}, y_{2,i}^{(s)}\right) \xrightarrow{p} \sqrt{2}, \text{ as } n \to \infty \text{ for independent } x \text{ and } y$$
    Spearman [2] Correlation Coefficient, $r_S$, as a Special Case (p=2):
    $$r_S = r_{Pe}[R(x_i), R(y_i)] = 1 - \frac{6}{n^2 - 1} \cdot D_2^2[R(x_i), R(y_i)]$$
    $$\frac{D_2[R(x_i), R(y_i)]}{n} \xrightarrow{p} \frac{1}{\sqrt{6}}, \text{ as } n \to \infty \text{ for independent } x \text{ and } y$$
    [2] https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
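    The special cases above can be checked with a short derivation. The sketch below assumes the order-2 standardization uses the population standard deviation (divisor n), so that the standardized values have mean 0 and mean square 1, and it assumes untied ranks with $d_i = R(x_i) - R(y_i)$; the symbol $d_i$ is introduced here only for illustration.

```latex
% Pearson: expand the squared order-2 distance of the standardized values.
\begin{align*}
D_2^2\left(x_{2,i}^{(s)}, y_{2,i}^{(s)}\right)
  &= \frac{1}{n}\sum_{i=1}^{n}\left(x_{2,i}^{(s)} - y_{2,i}^{(s)}\right)^2
   = 1 + 1 - \frac{2}{n}\sum_{i=1}^{n} x_{2,i}^{(s)}\, y_{2,i}^{(s)}
   = 2 - 2\, r_{Pe}, \\
D_2^2\left(x_{2,i}^{(s)}, -y_{2,i}^{(s)}\right) &= 2 + 2\, r_{Pe}.
\end{align*}
% Hence both branches return the same value:
\[
1 - \tfrac{1}{2} D_2^2\left(x_{2,i}^{(s)}, y_{2,i}^{(s)}\right)
  = \tfrac{1}{2} D_2^2\left(x_{2,i}^{(s)}, -y_{2,i}^{(s)}\right) - 1
  = r_{Pe}.
\]
% Spearman: the classical formula rewritten with D_2^2 = (1/n) \sum d_i^2.
\[
r_S = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)}
    = 1 - \frac{6}{n^2 - 1}\cdot D_2^2[R(x_i), R(y_i)].
\]
```

    Under independence $r_{Pe}$ tends to 0, so $D_2^2$ tends to 2, which matches the limiting distance $\sqrt{2}$ stated on the slide.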
  • 8. 8 Kendall [3] Correlation Coefficient, $r_K$:
    $$r_K = \frac{2}{n \cdot (n - 1)} \cdot \sum_{i=2}^{n} \sum_{j=1}^{i-1} \operatorname{sgn}(x_i - x_j) \cdot \operatorname{sgn}(y_i - y_j)$$
    (sgn ≡ the sign function)
    Spearman & Kendall coefficients are special cases of a general rank correlation coefficient [4].
    [3] https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient
    [4] https://en.wikipedia.org/wiki/Rank_correlation#General_correlation_coefficient
  • 9. 9 PROPERTIES FOR $r_{p,V}$ AND $r_{p,RV}$
    Ø $-1 \le r_{p,V} \le 1$, $-1 \le r_{p,RV} \le 1$
    Ø An exact value of +1 or -1 indicates a perfect positive or negative relationship.
    Ø A correlation value close to 0 indicates no relationship.
    Ø The closer the coefficient is to +1 or -1, the stronger the bivariate association.
    Ø We square the distance $D_p\left(x_{p,i}^{(s)}, y_{p,i}^{(s)}\right)$ so that $r_{p,V}$ has the same units as Pearson's correlation coefficient.
  • 10. 10 THE PROPOSED CORRELATION COEFFICIENT FOR DATA NOT REJECTED AS NORMAL
    Compute the standardized values (s) of order p=1 as
    $$z_{1,i}^{(s)} = \frac{z_i - \bar{z}}{\overline{|z_j - \bar{z}|}}, \quad \text{bar} \equiv \text{arithmetic mean}$$
    $$r_{1,V} = \begin{cases} r_{1,V+}, & \text{if } r_{1,V+} \ge -r_{1,V-} \\ r_{1,V-}, & \text{if } r_{1,V+} < -r_{1,V-} \end{cases}$$
    For positive correlation:
    $$r_{1,V+} = 1 - \frac{1}{2} \cdot \left( \overline{\left| x_{1,i}^{(s)} - y_{1,i}^{(s)} \right|} \right)^2$$
    For negative correlation:
    $$r_{1,V-} = \frac{1}{2} \cdot \left( \overline{\left| x_{1,i}^{(s)} + y_{1,i}^{(s)} \right|} \right)^2 - 1$$
    $$\overline{\left| x_{1,i}^{(s)} - y_{1,i}^{(s)} \right|} \xrightarrow{p} \sqrt{2}, \text{ as } n \to \infty \text{ for independent } x \text{ and } y \text{ (numerical finding)}$$
    We use this version in the applications and simulations.
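    The order-1 coefficient for data not rejected as normal can be sketched in a few lines. This is an illustration of the formulas on this slide, not the author's code; the function names `mad_standardize` and `r_1v` are hypothetical, and the standard library is used so the sketch stays self-contained.

```python
def mad_standardize(z):
    """Center by the mean and scale by the mean absolute deviation (order p=1)."""
    n = len(z)
    mean = sum(z) / n
    mad = sum(abs(v - mean) for v in z) / n
    return [(v - mean) / mad for v in z]


def r_1v(x, y):
    """Sketch of r_{1,V}: squared Manhattan distance scaled by L^2 = 2."""
    xs, ys = mad_standardize(x), mad_standardize(y)
    n = len(xs)
    # Mean absolute difference (positive branch) and mean absolute sum (negative branch).
    d_diff = sum(abs(a - b) for a, b in zip(xs, ys)) / n
    d_sum = sum(abs(a + b) for a, b in zip(xs, ys)) / n
    r_pos = 1 - 0.5 * d_diff ** 2
    r_neg = 0.5 * d_sum ** 2 - 1
    # Selection rule from the slide: keep r_pos when r_pos >= -r_neg.
    return r_pos if r_pos >= -r_neg else r_neg


x = [1.0, 2.0, 4.0, 7.0, 11.0]
print(r_1v(x, x))                    # identical series: perfect positive relationship
print(r_1v(x, [-v for v in x]))      # mirrored series: perfect negative relationship
```

    Because both series are centered and scaled first, any increasing affine transformation of `y` leaves the value unchanged.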
  • 11. 11 THE PROPOSED CORRELATION COEFFICIENT FOR NONNORMAL DATA
    Compute rankings and their standardized values of order 1 as
    $$R_s(z_i) = \frac{R(z_i) - \overline{R(z)}}{\overline{\left| R(z_j) - \overline{R(z)} \right|}}, \quad \text{bar} \equiv \text{arithmetic mean}$$
    $$r_{1,RV} = \begin{cases} r_{1,RV+}, & \text{if } r_{1,RV+} \ge -r_{1,RV-} \\ r_{1,RV-}, & \text{if } r_{1,RV+} < -r_{1,RV-} \end{cases}$$
    For positive correlation:
    $$r_{1,RV+} = 1 - \left( \frac{1}{L} \cdot \overline{\left| R_s(x_i) - R_s(y_i) \right|} \right)^2$$
    For negative correlation:
    $$r_{1,RV-} = \left( \frac{1}{L} \cdot \overline{\left| R_s(x_i) + R_s(y_i) \right|} \right)^2 - 1$$
    $$\overline{\left| R_s(x_i) - R_s(y_i) \right|} \xrightarrow{p} L = 1.344, \text{ as } n \to \infty \text{ for independent } x \text{ and } y \text{ (numerical finding)}$$
    We use this version in the applications and simulations.
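    A matching sketch for the rank-based coefficient follows. It uses the limiting value L = 1.344 quoted on this slide; averaging tied ranks is an assumption (the slide does not say how ties are handled), and the names `average_ranks` and `r_1rv` are hypothetical.

```python
def average_ranks(z):
    """Ranks starting at 1; tied values share their average rank (assumed convention)."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    ranks = [0.0] * len(z)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and z[order[j + 1]] == z[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def standardized_ranks(z):
    """Center the ranks and scale by their mean absolute deviation (order p=1)."""
    r = average_ranks(z)
    mean = sum(r) / len(r)
    mad = sum(abs(v - mean) for v in r) / len(r)
    return [(v - mean) / mad for v in r]


def r_1rv(x, y, limit=1.344):
    """Sketch of r_{1,RV}; `limit` is the slide's numerical value of L."""
    rx, ry = standardized_ranks(x), standardized_ranks(y)
    n = len(rx)
    d_diff = sum(abs(a - b) for a, b in zip(rx, ry)) / n
    d_sum = sum(abs(a + b) for a, b in zip(rx, ry)) / n
    r_pos = 1 - (d_diff / limit) ** 2
    r_neg = (d_sum / limit) ** 2 - 1
    return r_pos if r_pos >= -r_neg else r_neg


x = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6]
print(r_1rv(x, [v ** 3 for v in x]))   # monotone increasing transform of x
print(r_1rv(x, [-v for v in x]))       # monotone decreasing transform of x
```

    Since only ranks enter the formula, any strictly monotone transformation of either series leaves the result unchanged.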
  • 12. 12 THE CUT-OFF POINTS AND P-VALUES FOR $r_{p,V}$ AND $r_{p,RV}$ ARE COMPUTED BY PERMUTATION TESTS
    Cut-off points c for $r_{1,RV}$ (two-sided α=0.05 or one-sided α=0.025)

       n      c  |    n      c  |    n      c  |    n      c  |    n      c  |     n      c
       5  0.938  |   12  0.616  |   19  0.518  |   30  0.422  |   80  0.276  |   500  0.123
       6  0.891  |   13  0.594  |   20  0.511  |   35  0.395  |   90  0.259  |  1000  0.091
       7  0.784  |   14  0.592  |   21  0.491  |   40  0.372  |  100  0.248  |  2000  0.069
       8  0.754  |   15  0.576  |   22  0.486  |   45  0.355  |  150  0.208  |  5000  0.050
       9  0.729  |   16  0.559  |   23  0.479  |   50  0.337  |  200  0.184  |  10^4  0.039
      10  0.713  |   17  0.538  |   24  0.477  |   60  0.311  |  300  0.151  |  10^5  0.023
      11  0.646  |   18  0.535  |   25  0.460  |   70  0.293  |  400  0.135  |  10^6  0.019

    Example: In a case with nonnormal data and n=37, we observe $r_{1,RV} = 0.41$. Then we can reject $H_0: \rho = 0$ and accept $H_1: \rho > 0$ with α=0.025, since $|r_{1,RV}| > c_{1,RV} = 0.395$. Here 0.395 is the tabulated cut-off for n=35, the nearest smaller sample size; because the cut-offs decrease with n, this choice is conservative for n=37.
  • 13. 13 PERMUTATION TESTS
    Permutation tests [5] have been used for hypothesis testing of correlation coefficients between two variables, x and y. First, we calculate the correlation coefficient repeatedly, each time shuffling the observations of the variable y while keeping the order of the observations of the variable x fixed. Then we derive p-values from the distribution of the computed correlation coefficients.
    Permutation tests [6] have the following merits compared with other standard statistical tests. They:
    • approximate p-values very satisfactorily;
    • do not assume any particular distribution (distribution-free);
    • are suitable for small samples;
    • are applicable to non-random samples, e.g., time-series data.
    [5] https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
    [6] Berry, K. J., Johnston, J. E., & Mielke, P. W., Jr. (2018). The measurement of association: a permutation statistical approach. Springer International Publishing.
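    The shuffling procedure described on this slide can be sketched as follows. The two-sided p-value with the +1 correction is a common convention and an assumption here, as are the names `pearson` and `permutation_pvalue`; Pearson's coefficient is used only as a stand-in statistic, and any correlation function of two sequences can be passed as `stat`.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation, used here as the example statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, stat=pearson, n_perm=2000, seed=0):
    """Two-sided permutation p-value: shuffle y, keep the order of x fixed."""
    rng = random.Random(seed)
    observed = stat(x, y)
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(stat(x, y_shuffled)) >= abs(observed):
            hits += 1
    # The +1 terms count the observed ordering as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Illustrative data with a true correlation of 0.8 (synthetic, not from the slides).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in x]
r_obs, p_value = permutation_pvalue(x, y)
print(round(r_obs, 3), round(p_value, 4))
```

    A cut-off point for a given n can be approximated in the same way, by taking the appropriate quantile of the absolute permuted statistics instead of counting exceedances.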
  • 16. 16 AN INTERPRETATION OF THE CORRELATION COEFFICIENT
    $$r_{1,V} = \begin{cases} r_{1,V+} = 1 - \left( \frac{\overline{\left| x_{1,i}^{(s)} - y_{1,i}^{(s)} \right|}}{\sqrt{2}} \right)^2, & \text{if } r_{1,V+} \ge -r_{1,V-} \\ r_{1,V-} = \left( \frac{\overline{\left| x_{1,i}^{(s)} + y_{1,i}^{(s)} \right|}}{\sqrt{2}} \right)^2 - 1, & \text{if } r_{1,V+} < -r_{1,V-} \end{cases}$$
    Ø The correlation coefficient can be interpreted as the percentage change between the squared distance of the standardized values and the squared limiting distance for independent x and y.
    Ø For example, a value of r=0.5 implies that the squared observed distance is 50% smaller than the squared distance under independence.
    Ø In the literature, it is known that Pearson's correlation can be viewed as a rescaled variance of the difference between standardized scores [7].
    [7] Rodgers; Nicewander (1988). "Thirteen ways to look at the correlation coefficient" (PDF). The American Statistician. 42 (1): 59-66.
  • 17. 17 A CLASSIFICATION MODEL
    We consider the test $H_0: \rho = 0$ against $H_1: \rho \neq 0$.
    Binary observed variable (Iverson bracket: 1 if the condition is true, 0 otherwise):
    $$y = [\rho \neq 0] = \begin{cases} 1, & \text{correlation exists} \\ 0, & \text{no correlation} \end{cases}$$
    First, we test (Henze-Zirkler MN test):
    $$H_0: (x, y) \sim MN \quad \text{against} \quad H_1: (x, y) \nsim MN$$
    where MN ≡ multivariate normal distribution and ≁ MN ≡ does NOT follow an MN distribution.
    Predicted binary variable:
    $$\hat{y} = \begin{cases} [r_{1,V} > c_{1,V}], & H_0 \text{ for MN cannot be rejected} \\ [r_{1,RV} > c_{1,RV}], & H_0 \text{ for MN is rejected} \end{cases}$$
    $r_{p,V}$ & $r_{p,RV}$ ≡ proposed correlation coefficients for normal and nonnormal data. The critical values $c_{p,V}$ & $c_{p,RV}$ are computed by permutation tests.
  • 19. 19 AN APPLICATION TO GDP PER CAPITA
    • Publicly available data from the World Bank [9]
    • N=61 countries with GDP per capita > $10,000 in 2020 and full annual data for 1981-2020
    • T=40 for the period 1981-2020. We analyze growth rates (%).
    • $(61^2 - 61)/2 = 1830$ pairs (x,y), i.e., correlation cases
    • 1187 not rejected as normal
    • 643 rejected as normal (nonnormal)
    • No causality is claimed. Lurking variables: the global or continental economy.
    • Compare the economic growth of a country with its correlated countries by regression residuals.
    [9] https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
  • 20. 20 APP TO GDP: $H_0$ FOR MN CANNOT BE REJECTED
    Ø For bivariate data not rejected as normal, we compare Pearson with the proposed $r_{1,V}$.
    Ø 1 = reject $H_0: \rho = 0$, 0 = otherwise
    Ø In 1045 = 690 + 355 cases (88%), the two coefficients agree [(1,1), (0,0)], and we assume that this is the true outcome. This hypothesis may not be entirely accurate, but it does not affect the conclusions for correlation comparisons.

      Pearson   Proposed_Value, r_{1,V}   Frequencies
         1                 1                  690
         1                 0                   44
         0                 1                   98
         0                 0                  355
      Total                                  1187
  • 21. 21 APP TO GDP FOR DATA NOT REJECTED AS NORMAL
    Ø In 98 cases, $r_{1,V}$ gives the right signal, which the Pearson coefficient does not recognize.
    Ø In 44 cases, only the Pearson coefficient indicates a significant correlation; this signal is correct in 18 cases (41%) and incorrect in 26 cases (59%).
    Ø This happens because, although the data can be assumed normal, there are outliers and/or influential points.
    Ø Where the coefficients disagree, we examine each case individually to decide whether a correlation exists.
  • 22. 22 A CASE FOR DATA NOT REJECTED AS NORMAL
    Correlation coefficients & p-values for Euro Area vs Qatar.

                             1981-2020, n=40       1981-2019 excluding outliers 1986 & 2000, n=38
                             Correl.   p-value     Correl.   p-value
      Proposed, r_{1,V}      0.346     0.022       0.482     0.000
      Pearson                0.143     0.379       0.478     0.002

    On the full sample, only the proposed correlation, $r_{1,V}$, recognizes the relationship (p-value < 0.05). After removing the 2 outliers, both coefficients show a significant positive relationship.
  • 23. 23 A CASE FOR NONNORMAL DATA
    Correlation coefficients & p-values for UK vs Trinidad and Tobago.

                             1981-2020, n=40       1981-2019 excluding outliers 1986, 1987, 1988, 2008 & 2009, n=35
                             Correl.   p-value     Correl.   p-value
      Proposed, r_{1,RV}     0.452     0.009       0.613     0.002
      Kendall                0.169     0.124       0.348     0.003
      Spearman               0.229     0.155       0.494     0.003

    On the full sample, only the proposed correlation, $r_{1,RV}$, recognizes the relationship (p-value < 0.05). After removing the 5 outliers, all three coefficients show a significant positive relationship.
  • 24. 24 AN APPLICATION TO GDP FOR NONNORMAL DATA
    Ø For NONNORMAL data, we compare the Spearman and Kendall coefficients with the proposed $r_{1,RV}$.
    Ø 1 = reject $H_0: \rho = 0$, 0 = otherwise
    Ø In 593 = 297 + 296 of the 643 cases (92.2%), the 3 coefficients agree [(1,1,1), (0,0,0)], and we assume that this is the true outcome. This hypothesis may not be entirely accurate, but it does not affect the conclusions for correlation comparisons.
  • 25. 25 THE APPLICATION TO GDP FOR NONNORMAL DATA
    1 = reject $H_0: \rho = 0$, 0 = otherwise

      Spearman   Kendall   Proposed_Rank, r_{1,RV}   Frequencies
         1          1                1                   297
         1          1                0                     1
         1          0                1                     0
         0          1                1                    12
         1          0                0                     4
         0          1                0                     1
         0          0                1                    32
         0          0                0                   296
      Total                                              643
  • 26. 26 THE APPLICATION TO GDP FOR NONNORMAL DATA
    Ø In 32 cases, only $r_{1,RV}$ gives 1, which the other coefficients do not recognize (87.5% correct and 12.5% incorrect).
    Ø In 12 cases, $r_{1,RV}$ & Kendall indicate a significant correlation (1), in contrast to Spearman (0) (83% correct and 17% incorrect).
    Ø In only 4 cases does the Spearman coefficient give 1 when the others do not (50% correct and 50% incorrect).
    Ø Where the coefficients disagree, we examine each case individually to decide whether a correlation exists.
  • 27. 27 THE CONFUSION TABLE [10]
    The confusion table reports the true and false positives and negatives (TP, FP, TN, and FN). TPR (true positive rate, sensitivity, or recall) and TNR (true negative rate or specificity) are the percentages of positives (1's) and negatives (0's), respectively, that are correctly classified (%CC1 & %CC0). PPV (positive predictive value or precision) and NPV (negative predictive value) are the proportions of positive and negative signals that are correctly predicted (%CP1 & %CP0). The proportion of all cases correctly classified (%CC) is given by the accuracy measure (ACC). The F-measure is the harmonic mean (HM) of TPR and PPV. The probability of a Type I error is 1 − TNR; it is the probability of incorrectly rejecting a true null hypothesis (a "false positive", FP). The probability of a Type II error is 1 − TPR; it is the probability of failing to reject a false null hypothesis (a "false negative", FN).
    [10] https://en.wikipedia.org/wiki/Confusion_matrix
  • 28. 28 THE CONFUSION TABLE [11] (notation)

      n = n_{1,·} + n_{0,·}                    Predicted 1                    Predicted 0
                                               n_{·,1} = n_{1,1} + n_{0,1}    n_{·,0} = n_{1,0} + n_{0,0}
      Observed 1  n_{1,·} = n_{1,1} + n_{1,0}  TP = n_{1,1}                   FN = n_{1,0}                   %CC1: TPR = n_{1,1}/n_{1,·}
      Observed 0  n_{0,·} = n_{0,1} + n_{0,0}  FP = n_{0,1}                   TN = n_{0,0}                   %CC0: TNR = n_{0,0}/n_{0,·}
                                               %CP1: PPV = n_{1,1}/n_{·,1}    %CP0: NPV = n_{0,0}/n_{·,0}

      ACC = (n_{1,1} + n_{0,0})/n
      F = HM(TPR, PPV)
      EWHM
    HM -> Harmonic Mean, AM -> Arithmetic Mean
    [11] https://en.wikipedia.org/wiki/Confusion_matrix
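    The notation above translates directly into code. The sketch below computes the measures from the four cell counts; the function name `confusion_measures` and the dictionary keys are hypothetical, and the counts in the usage example are the All_PS row reported in the performance table of this deck.

```python
def confusion_measures(tp, fp, fn, tn):
    """Rates from a 2x2 confusion table (observed in rows, predicted in columns)."""
    n = tp + fp + fn + tn
    tpr = tp / (tp + fn)          # share of observed 1's classified as 1
    tnr = tn / (tn + fp)          # share of observed 0's classified as 0
    ppv = tp / (tp + fp)          # share of predicted 1's that are correct
    npv = tn / (tn + fn)          # share of predicted 0's that are correct
    return {
        "TPR": tpr,
        "TNR": tnr,
        "PPV": ppv,
        "NPV": npv,
        "ACC": (tp + tn) / n,
        "F": 2 * tpr * ppv / (tpr + ppv),   # harmonic mean of TPR and PPV
        "T1+T2": (1 - tnr) + (1 - tpr),     # Type I rate plus Type II rate
    }


# Counts of the All_PS row (Pearson for normal, Spearman for nonnormal cases).
measures = confusion_measures(tp=1007, fp=29, fn=138, tn=656)
for name, value in measures.items():
    print(f"{name}: {100 * value:.2f}")
```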
  • 29. 29 A NEW PERFORMANCE MEASURE: ERROR WEIGHTED HARMONIC MEAN (EWHM) [12]
    $$EWHM(TPR, TNR, PPV, NPV) = \frac{4 - TPR - TNR - PPV - NPV}{\frac{1 - TPR}{TPR} + \frac{1 - TNR}{TNR} + \frac{1 - PPV}{PPV} + \frac{1 - NPV}{NPV}}$$
    $$= \frac{1 - AM(TPR, TNR, PPV, NPV)}{1 - HM(TPR, TNR, PPV, NPV)} \cdot HM(TPR, TNR, PPV, NPV)$$
    $$= \frac{1 - AM(TPR, TNR, PPV, NPV)}{AM\left(\frac{1}{TPR}, \frac{1}{TNR}, \frac{1}{PPV}, \frac{1}{NPV}\right) - 1}$$
    The higher the variance of TPR, PPV, TNR, & NPV, the smaller the EWHM.
    HM -> Harmonic Mean, AM -> Arithmetic Mean
    [12] Papadopoulos, S., Stavroulias, P., & Sager, T. (2019). Systemic early warning systems for EU14 based on the 2008 crisis: proposed estimation and model assessment for classification forecasting. Journal of Banking Regulation, 20(3), 226-244.
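    A small sketch of the first form of the EWHM formula follows. The function name `ewhm` is hypothetical, all four rates are assumed to be strictly positive, and returning 1.0 when every rate equals 1 (where the formula is 0/0) is an assumption made only for this illustration.

```python
def ewhm(tpr, tnr, ppv, npv):
    """Error weighted harmonic mean of four rates, each in (0, 1]."""
    rates = (tpr, tnr, ppv, npv)
    total_error = sum(1 - v for v in rates)          # 4 - TPR - TNR - PPV - NPV
    error_odds = sum((1 - v) / v for v in rates)     # denominator of the formula
    if error_odds == 0:
        return 1.0                                   # all rates perfect (assumed limit)
    return total_error / error_odds


# Rates implied by the All_PS counts TP=1007, FP=29, FN=138, TN=656.
tpr, tnr = 1007 / 1145, 656 / 685
ppv, npv = 1007 / 1036, 656 / 794
print(round(100 * ewhm(tpr, tnr, ppv, npv), 2))

# Equal rates return that rate; spreading them out lowers the EWHM.
print(ewhm(0.9, 0.9, 0.9, 0.9), ewhm(0.99, 0.81, 0.9, 0.9))
```

    The last line illustrates the variance remark on the slide: both inputs have the same arithmetic mean, yet the more dispersed set yields the smaller EWHM.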
  • 30. 30 NOTATION FOR PERFORMANCE MEASURES FOR THE GDP APP
    By "Normal" we mean "not rejected as Normal".
      All_PS   All observations, with Pearson for Normal & Spearman for Nonnormal
      All_VR   All observations, with r_{1,V} for Normal & r_{1,RV} for Nonnormal
      N_P      Only Normal cases, with Pearson
      N_V      Only Normal cases, with r_{1,V}
      NN_S     Only Nonnormal cases, with Spearman
      NN_K     Only Nonnormal cases, with Kendall
      NN_RV    Only Nonnormal cases, with r_{1,RV}
  • 31. 31 PERFORMANCE MEASURES FOR THE APPLICATION

                 TP    FP    FN    TN     TPR     PPV     TNR     NPV     ACC      Fb   T1+T2    EWHM
      All_PS   1007    29   138   656   87.95   97.20   95.77   82.62   90.87   92.34   16.29   86.74
      All_VR   1123     6    22   679   98.08   99.47   99.12   96.86   98.47   98.77    2.80   97.73
      N_P       708    26    98   355   87.84   96.46   93.18   78.37   89.55   91.95   18.98   84.20
      N_V       788     0    18   381   97.77   100.0   100.0   95.49   98.48   98.87    2.23   96.23
      NN_S      299     3    40   301   88.20   99.01   99.01   88.27   93.31   93.29   12.79   88.99
      NN_K      308     3    31   301   90.86   99.04   99.01   90.66   94.71   94.77   10.13   91.49
      NN_RV     335     6     4   298   98.82   98.24   98.03   98.68   98.44   98.53    3.15   98.37

    EWHM = Error Weighted Harmonic Mean, ACC = Accuracy, Fb = F-measure, T1+T2 = sum of the Type I and Type II error rates. All rates are in %.
  • 32. 32 PERFORMANCE-MEASURE DISCUSSION
    Ø The overall measures ACC, Fb & EWHM give much higher values when we use the proposed correlation coefficients $r_{1,V}$ & $r_{1,RV}$ than when we use the classic coefficients Pearson, Spearman & Kendall, both separately for normal & nonnormal data and for all data together.
    Ø While ACC, Fb & T1+T2 indicate $r_{1,V}$ (N_V) as the best coefficient, our EWHM measure shows $r_{1,RV}$ (NN_RV) as the best. The higher the variance of TPR, PPV, TNR, & NPV, the smaller the EWHM.
  • 33. 33 SIMULATION DESIGN
    Ø 10,000 simulations
    Ø Python (NumPy library)
    Ø Two schemes of n correlated bivariate data, $x_i$ and $y_i$, with Pearson's coefficient $r = r_{Pe}$
    Ø Scheme 1 contains all the data correlated as follows:
      • $x_i, e_i \sim N(0, 1)$, $i = 1, 2, \dots, n$, independent, and
      • $y_i = r \cdot x_i + \sqrt{1 - r^2} \cdot e_i$
    Ø Scheme 2 retains
      • 90% of the observations as in Scheme 1 and (**), and
      • the remaining 10% NONNORMAL, drawn from uniform distributions, U(a,b), within the area between two circles with radii, q, of 3 and 3.5.
      • The random circle coordinates are given by $x_i = q_i \cdot \cos(w_i)$, $y_i = q_i \cdot \sin(w_i)$, with $q_i \sim U(3, 3.5)$ and $w_i \sim U(0, 2 \cdot \pi)$.
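    The two data-generating schemes can be sketched as below. The slides state that NumPy was used; this illustration relies on the standard library instead so that it runs without dependencies. The function names `scheme1` and `scheme2` are hypothetical, and rounding 10% of n to the nearest integer and appending the contaminated points at the end are assumptions.

```python
import math
import random


def scheme1(n, r, rng):
    """n bivariate normal pairs with population Pearson correlation r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(r * x + math.sqrt(1.0 - r * r) * e)
    return xs, ys


def scheme2(n, r, rng, share=0.10):
    """90% as in Scheme 1; the rest uniform in the ring with radii 3 to 3.5."""
    n_ring = round(share * n)
    xs, ys = scheme1(n - n_ring, r, rng)
    for _ in range(n_ring):
        q = rng.uniform(3.0, 3.5)             # random radius
        w = rng.uniform(0.0, 2.0 * math.pi)   # random angle
        xs.append(q * math.cos(w))
        ys.append(q * math.sin(w))
    return xs, ys


rng = random.Random(2022)
x1, y1 = scheme1(20, 0.70, rng)
x2, y2 = scheme2(20, 0.70, rng)
print(len(x1), len(x2))
print([round(math.hypot(a, b), 3) for a, b in zip(x2[-2:], y2[-2:])])  # ring radii
```

    Repeating either scheme many times, once with the chosen r and once with r = 0, and applying a test to each sample gives empirical estimates of β and α respectively.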
  • 34. 34 SIMULATION RESULTS
    Simulation Results for Scheme 1, Normal Distribution

      Correlation           n=20, r=0.70     n=50, r=0.50     n=100, r=0.35    Overall Average
      Coefficients            α       β        α       β        α       β          α + β
      Pearson               0.053   0.045    0.052   0.032    0.061   0.043        0.095
      Spearman              0.058   0.091    0.052   0.053    0.058   0.065        0.126
      Kendall               0.052   0.093    0.048   0.056    0.061   0.066        0.125
      Proposed r_{1,V}      0.055   0.056    0.058   0.048    0.081   0.085        0.128
      Proposed r_{1,RV}     0.051   0.116    0.053   0.074    0.058   0.100        0.151

    α = P(Type I error), β = P(Type II error)

    Simulation Results for Scheme 2, Non-normal Distribution

      Correlation           n=20, r=0.70     n=50, r=0.50     n=100, r=0.35    Overall Average
      Coefficients            α       β        α       β        α       β          α + β
      Pearson               0.070   0.467    0.063   0.328    0.062   0.435        0.475
      Spearman              0.059   0.319    0.046   0.228    0.053   0.267        0.324
      Kendall               0.062   0.270    0.048   0.204    0.056   0.241        0.294
      Proposed r_{1,V}      0.091   0.204    0.082   0.156    0.108   0.176        0.272
      Proposed r_{1,RV}     0.062   0.242    0.055   0.176    0.083   0.193        0.270
  • 35. 35 SIMULATION CONCLUSIONS
    Ø For nonnormal data, the proposed correlation coefficients $r_{1,RV}$ & $r_{1,V}$ have higher power (1 − β) and smaller total error (α + β) than the classic coefficients.
    Ø For normal data, the inverse order holds, but in practice we get bivariate data NOT REJECTED AS NORMAL, which may have a few outliers or nonnormalities.
    Ø In addition, the linear relationships in real data are hypothetical, while in simulations they are real.
    Ø The Pearson coefficient performs best in simulations for normal data but not in the application.
  • 37. 37 CONCLUSIONS
    Ø The proposed $r_{1,V}$ & $r_{1,RV}$ coefficients are more powerful than the standard Pearson, Spearman, & Kendall coefficients
    Ø Could be applied by all scientists analyzing data
    Ø Provide substantive interpretation
    Ø Robust to nonnormality & outliers
    Ø Cut-off points for the Proposed-Rank coefficient are given
    Ø The Kendall coefficient performs better than the Spearman coefficient
    OUR RECOMMENDATION: Use $r_{1,RV}$ for nonnormal data & $r_{1,V}$ when multivariate normality (MN) cannot be rejected.