UtilitasMathematica
ISSN 0315-3681 Volume 120, 2023
Some robust methods for estimating random linear regression model
Amani Emad1, Ahmed Shaker Mohammed Tahir2
1 Mustansiriyah University, College of Management and Economics, Department of Statistics, Baghdad, Iraq, amaniemad93@uomustansiriyah.edu.iq
2 Mustansiriyah University, College of Management and Economics, Department of Statistics, Baghdad, Iraq, ahmutwali@uomustansiriyah.edu.iq
Abstract
The regression model with a random explanatory variable is one of the models widely used to represent the regression relationship between variables in various economic and life phenomena. It is called the random linear regression model. In this model, the different values of the explanatory variable occur with a certain probability, instead of being fixed in repeated samples, so it contradicts one of the basic assumptions of the linear regression model. This causes the least squares estimators to lose some or all of their optimal properties, depending on the nature of the relationship between the random explanatory variable and the random error terms.
As an alternative to the ordinary least squares method, the modified maximum likelihood method, previously used by many researchers, was used to estimate the coefficients of the random regression model. In addition, two methods were employed, the M method and the robust empirical likelihood method, which have not previously been used in estimating the coefficients of the random regression model, but have been used in estimating linear regression models that suffer from certain standard problems or whose sample values contain outliers or extreme values. All three are robust estimation methods.
For the purpose of comparison among the aforementioned estimation methods, simulation experiments were conducted, which showed the superiority of the modified maximum likelihood method despite the close agreement among the results of the three methods. We also carried out a practical application, estimating the regression relationship between packed cell volume (PCV) as a response variable and random blood sugar (RBS) as a random explanatory variable, based on data from a random sample of 30 patients with heart disease. The results of the practical application were consistent with the results of the simulation experiments.
Keywords: random explanatory variable; random linear regression; modified maximum
likelihood; M method; robust empirical likelihood.
Introduction
One of the basic assumptions of the linear regression model is that the explanatory variables are constant in repeated samples. However, this assumption may be practically unfulfilled, as is the case in most economic and life phenomena, where the explanatory variables are random, in the sense that their values are not fixed, so the different values of the explanatory
variables occur with a certain probability. In such cases the values of the explanatory variable are
determined along with the values of the response variable as a result of some probability
mechanism, rather than their values being controlled by the experimenters.
In the case of adopting the method of least squares to estimate the coefficients of the regression
model with random explanatory variables, the estimates of that method lose some or all of their
optimal properties, and that depends on the type of relationship between the explanatory
variables and the random error terms. In this case, efficient alternatives to the method of least squares should be sought, ones that are not greatly affected by deviations from the assumptions of the linear regression model. Among these alternatives are the robust methods, including the modified maximum likelihood method (MMLE), the M method, and the robust empirical likelihood method (REL).
Kerridge was the first to deal with the regression model with random explanatory variables, in 1967 [1], assuming that the values of the explanatory variables corresponding to the values of the response variable were drawn from a population with a multivariate normal distribution, independent of the random error terms, and he analyzed the error resulting from the estimation process. He explained that it is useful to study the regression model with random explanatory variables, despite Ehrenberg's suggestion in 1963 that studying this model is useless and limited, especially when the number of explanatory variables is large. Lai and Wei, 1981, [2], studied the properties of least squares estimators for random regression models under special assumptions on the random explanatory variable and the random error terms.
Tiku and Vaughan, 1997, [3], used the modified maximum likelihood method in estimating the coefficients of the bivariate regression model with a random explanatory variable. They showed that the maximum likelihood estimators of this model are intractable and difficult to obtain iteratively, whereas the modified maximum likelihood estimators are explicit functions of the sample observations, and therefore easy to calculate and asymptotically efficient; for small samples they are almost fully efficient. Vaughan and Tiku, 2000, [4], also used the modified maximum likelihood method in estimating the regression model with a random explanatory variable, assuming that this variable follows the extreme value distribution and that the conditional distribution of the response variable is normal, and they showed that these estimators are highly efficient. They also derived hypothesis tests for the model coefficients.
Islam and Tiku, 2004, [5], derived modified maximum likelihood estimators and M estimators for the coefficients of the multiple linear regression model under the assumption that the random errors do not follow the normal distribution; they concluded that these estimators are efficient and robust, and that the ordinary least squares estimators are less efficient.
Sazak et al., 2006, [6], used modified maximum likelihood estimators for the coefficients of the random linear regression model, which are characterized by efficiency and robustness, assuming that both the random explanatory variable and the random errors follow a generalized logistic distribution. The same result was reached by Islam and Tiku, 2010, [7], who adopted the same robust method to estimate the multiple linear regression model when both the explanatory variable and the random errors do not follow the normal distribution.
Bharali and Hazarika, 2019, [8], provided a historical review of the work related to regression with a stochastic explanatory variable. They discussed the properties of ordinary least
squares estimators, the use of the modified maximum likelihood method, and the case of non-normal distribution of the explanatory variables and random error terms.
Deshpande and Bhat, 2019, [9], used the kernel regression estimator, one of the non-parametric estimation methods, to estimate the random linear regression model. They suggested the use of the adaptive Nadaraya-Watson estimator, obtained its asymptotic distribution, and showed through relative efficiency that this estimator outperforms other estimators based on different kernel functions.
Lateef and Ahmed, 2021, [10], used the modified maximum likelihood method to estimate the parameters of the multiple regression model when the random errors are non-normally distributed; through a simulation study they compared the performance of the modified maximum likelihood estimators with that of the ordinary least squares estimators.
In this research, the modified maximum likelihood method was used to estimate the coefficients of the linear regression model with a random explanatory variable, in addition to two robust methods, the M method and the robust empirical likelihood method. These two methods have been used to estimate the coefficients of regression models in the presence of outliers or when the model suffers from other problems; however, they have not previously been used in estimating the regression model with a random explanatory variable, so they are employed here to estimate the coefficients of this model.
The research aims to compare the aforementioned robust estimation methods by means of the mean squared error criterion, using simulation experiments together with an application to real data.
1. Linear regression model with a random explanatory variable:
Assuming the simple linear regression model shown in the following equation:

$Y_i = \beta_0 + \beta_1 X_i + u_i$  (1)
For a random sample of size n, $Y_i$ is the response variable, a random variable with independent values assumed to be normally distributed with mean $E(Y_i)$ and variance $\sigma^2$; $X_i$ represents the explanatory variable, which is assumed to be constant in repeated samples; and $u_i$ denotes the random errors of the model, whose values are assumed to be independent and normally distributed with mean 0 and fixed variance $\sigma^2$. In many practical applications, the explanatory variable is not fixed but rather a random variable. In this case, the regression model of equation (1) is called the random regression model [7]. The random explanatory variable can follow a normal distribution or any other probability distribution.
Assuming that the probability distribution of the explanatory variable X is the extreme value distribution, its probability density function is [4]:

$f(x) = \frac{1}{\theta}\exp\left[-\left(\frac{x-\alpha}{\theta}\right) - \exp\left(-\frac{x-\alpha}{\theta}\right)\right], \quad -\infty < x < \infty$  (2)
Here $\alpha$ is the location parameter and $\theta$ is the scale parameter; when $\alpha = 0$ and $\theta = 1$ we obtain the standard extreme value distribution. The conditional probability distribution of the response variable Y is the normal distribution, with the following conditional density function:
𝑓(π‘Œ|𝑋 = π‘₯) = [2πœ‹πœŽ2(1 βˆ’ 𝜌2)]βˆ’
1
2 𝑒π‘₯𝑝 [βˆ’
1
2𝜎2((1 βˆ’ 𝜌2)
(𝑦 βˆ’ 𝛽0
βˆ’ 𝜌
𝜎
πœƒ
(π‘₯ βˆ’ 𝛼))
2
] βˆ’ ∞ < 𝑦 < ∞
(3)
where $\beta_0 \in R$, $\beta_1 = \rho\frac{\sigma}{\theta}$, $\sigma > 0$, $-1 < \rho < 1$, and $u = y - \beta_0 - \rho\frac{\sigma}{\theta}(x-\alpha)$ represents the random error term, which is normally distributed with conditional mean zero ($E(u|X=x) = 0$) and fixed conditional variance ($Var(u|X=x) = \sigma^2(1-\rho^2)$), [6]. Note that:

$E(Y|X=x) = \beta_0 + \rho\frac{\sigma}{\theta}(x-\alpha)$  (4)
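As an illustration, the data-generating mechanism of equations (2)-(4) can be simulated directly. The sketch below uses hypothetical parameter values chosen only for illustration (the paper's own simulations use R; Python is used here), and relies on the fact that NumPy's Gumbel generator has exactly the density of equation (2):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, chosen only for illustration.
alpha, theta = 0.0, 1.0        # location and scale of the extreme value law
beta0, sigma, rho = 2.0, 1.5, 0.6
beta1 = rho * sigma / theta    # the slope implied by equation (3)

n = 1000
# X is random: drawn from the extreme value (Gumbel) density of equation (2).
x = rng.gumbel(loc=alpha, scale=theta, size=n)
# Given X = x, Y is normal with mean beta0 + beta1*(x - alpha) and
# variance sigma^2 * (1 - rho^2), as in equations (3) and (4).
u = rng.normal(0.0, sigma * np.sqrt(1 - rho**2), size=n)
y = beta0 + beta1 * (x - alpha) + u
```

Such samples, with x random rather than fixed, are exactly the setting in which the estimators of the next section are compared.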
2. The robust estimation methods of the coefficients of the random linear regression model:
This section presents the robust estimation methods under study, which were adopted in estimating the simple linear regression model with a random explanatory variable.
3.1: Modified Maximum Likelihood Method (MMLE):
Estimating the coefficients of the regression model with a random explanatory variable in equation (1) by the maximum likelihood method, based on the probability functions (2) and (3), is difficult and unrewarding: the equations resulting from differentiating the likelihood function have no explicit solutions, and solving them by iterative methods is fraught with difficulties, since the iterations may converge slowly, converge to false values, or encounter multiple roots. Therefore, the modified maximum likelihood method proposed by Tiku and Vaughan, [3, 4], is used.
The joint probability density function of the response variable Y and the explanatory variable X is:

$f(y,x) = f(x)f(y|x) = \frac{1}{\theta}\cdot\frac{1}{\sigma}\left(1-\rho^2\right)^{-\frac{1}{2}} \exp\left[-\left(\frac{x-\alpha}{\theta}\right) - \exp\left(-\frac{x-\alpha}{\theta}\right)\right] \exp\left[-\frac{u^2}{2\sigma^2(1-\rho^2)}\right]$  (5)
Therefore, the likelihood function takes the following form:

$L = \theta^{-n}\sigma^{-n}\left(1-\rho^2\right)^{-\frac{n}{2}} \exp\left[-\sum_{i=1}^{n}\left(\frac{x_i-\alpha}{\theta} + \exp\left(-\frac{x_i-\alpha}{\theta}\right)\right) - \frac{1}{2\sigma^2(1-\rho^2)}\sum_{i=1}^{n}u_i^2\right]$  (6)
The logarithm of the likelihood function, writing $z_i = \frac{x_i-\alpha}{\theta}$, is:

$\ln L = -n\ln\theta - n\ln\sigma - \frac{n}{2}\ln(1-\rho^2) - \sum_{i=1}^{n}\left(z_i + \exp(-z_i)\right) - \frac{1}{2\sigma^2(1-\rho^2)}\sum_{i=1}^{n}u_i^2$  (7)
By partially differentiating equation (7) with respect to the coefficients ($\alpha, \theta, \beta_0, \beta_1, \sigma, \rho$) and setting the derivatives equal to zero, we obtain the following maximum likelihood equations, whose solution gives the estimates of those coefficients:
πœ• ln 𝐿
πœ•Ξ±
=
𝑛
πœƒ
βˆ’
1
πœƒ
βˆ‘ exp(βˆ’π‘§π‘–)
𝑛
𝑖=1 βˆ’
𝜌
πœƒπœŽ(1βˆ’πœŒ2)
βˆ‘ 𝑒𝑖
𝑛
𝑖=1 = 0 (8)
πœ• ln 𝐿
πœ•ΞΈ
= βˆ’
𝑛
πœƒ
βˆ’
1
πœƒ
βˆ‘ 𝑧𝑖 exp(βˆ’π‘§π‘–) +
1
πœƒ
𝑛
𝑖=1 βˆ‘ 𝑧𝑖 βˆ’
𝜌
πœƒπœŽ(1βˆ’πœŒ2)
βˆ‘ 𝑧𝑖𝑒𝑖
𝑛
𝑖=1
𝑛
𝑖=1 = 0
(9)
πœ• ln 𝐿
πœ•π›½0
=
1
𝜎2((1βˆ’πœŒ2)
βˆ‘ 𝑒𝑖
𝑛
𝑖=1 = 0
(10)
πœ• ln 𝐿
πœ•πœŽ
= βˆ’
𝑛
𝜎
+
1
𝜎3((1βˆ’πœŒ2)
βˆ‘ 𝑒𝑖
2
𝑛
𝑖=1 +
𝜌
𝜎2(1βˆ’πœŒ2)
βˆ‘ 𝑧𝑖𝑒𝑖
𝑛
𝑖=1 = 0
(11)
πœ• ln 𝐿
πœ•Ο
=
π‘›πœŒ
(1βˆ’πœŒ2)
βˆ’
𝜌
𝜎2(1βˆ’πœŒ2)2
βˆ‘ 𝑒𝑖
2
𝑛
𝑖=1 +
1
𝜎(1βˆ’πœŒ2)
βˆ‘ 𝑧𝑖𝑒𝑖
𝑛
𝑖=1 = 0
(12)
πœ• ln 𝐿
πœ•π›½1
=
πœƒ
𝜎2(1βˆ’πœŒ2)
βˆ‘ 𝑧𝑖𝑒𝑖
𝑛
𝑖=1 = 0
(13)
Equations (8) and (9) contain intractable terms, so the system of equations (8)-(13) has no explicit solutions and must be solved by iterative methods; the maximum likelihood estimates are therefore difficult to obtain. To address this problem, we convert the explanatory variable into order statistics $x_{(i)}$, $1 \le i \le n$, by arranging its values in ascending order:

$x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$  (14)
Based on the ordered values of the explanatory variable, $y_{[i]}$ denotes the value of the response variable corresponding to the ordered value $x_{(i)}$; the standardized values $z_i$ and the random errors $u_i$ are rewritten as:

$z_{(i)} = \frac{x_{(i)}-\alpha}{\theta}$  (15)

$u_{[i]} = y_{[i]} - \beta_0 - \rho\frac{\sigma}{\theta}\left(x_{(i)} - \alpha\right)$  (16)
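The ordering step of equations (14)-(16) amounts to sorting the x values and carrying each response value along with its partner, i.e. treating $y_{[i]}$ as the concomitant of $x_{(i)}$. A minimal sketch with toy data:

```python
import numpy as np

# Toy data: the ordering mirrors equations (14)-(16); x is sorted ascending
# and each y keeps its partner's position (its "concomitant" y_[i]).
x = np.array([3.1, 0.5, 2.2, 1.7])
y = np.array([10.0, 4.0, 8.0, 6.0])

order = np.argsort(x)     # indices that sort x ascending
x_ord = x[order]          # x_(1) <= x_(2) <= ... <= x_(n)
y_conc = y[order]         # y_[i], the response paired with x_(i)
```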
Since a sum is not affected by the order of its terms, the likelihood equations still imply $\sum_{i=1}^{n}u_{[i]} = 0$ and $\sum_{i=1}^{n}z_{(i)}u_{[i]} = 0$. Based on the above, equations (8)-(13) are rewritten as follows:
πœ• ln πΏβˆ—
πœ•Ξ±
≑
𝑛
πœƒ
βˆ’
1
πœƒ
βˆ‘ exp(βˆ’π‘§(𝑖))
𝑛
𝑖=1 = 0
(17)
πœ• ln πΏβˆ—
πœ•ΞΈ
≑ βˆ’
𝑛
πœƒ
+
1
πœƒ
βˆ‘ 𝑧(𝑖)
𝑛
𝑖=1 βˆ’
1
πœƒ
βˆ‘ 𝑧(𝑖) exp(βˆ’π‘§(𝑖))
𝑛
𝑖=1 = 0
(18)
πœ• ln πΏβˆ—
πœ•π›½0
≑
1
𝜎2((1βˆ’πœŒ2)
βˆ‘ 𝑒[𝑖]
𝑛
𝑖=1 = 0
(19)
πœ• ln πΏβˆ—
πœ•πœŽ
≑ βˆ’
𝑛
𝜎
+
1
𝜎3(1βˆ’πœŒ2)
βˆ‘ 𝑒[𝑖]
2
𝑛
𝑖=1 = 0
(20)
πœ• ln πΏβˆ—
πœ•Ο
≑
π‘›πœŒ
(1βˆ’πœŒ2)
βˆ’
𝜌
𝜎2(1βˆ’πœŒ2)2
βˆ‘ 𝑒[𝑖]
2
𝑛
𝑖=1 = 0
(21)
πœ• ln πΏβˆ—
πœ•π›½1
≑
πœƒ
𝜎2(1βˆ’πœŒ2)
βˆ‘ 𝑧(𝑖)𝑒[𝑖]
𝑛
𝑖=1 = 0
(22)
Equations (17)-(22) are the modified maximum likelihood equations. To linearize $\exp(-z_{(i)})$, the first two terms of its Taylor series around $t_{(i)} = E[z_{(i)}]$ are used:

$\exp\left(-z_{(i)}\right) \cong \exp\left(-t_{(i)}\right) + \left[z_{(i)} - t_{(i)}\right]\left\{\frac{d}{dz}e^{-z}\right\}_{z=t_{(i)}}$  (23)
which can be rewritten as:

$\exp\left(-z_{(i)}\right) \cong a_i - b_i z_{(i)}, \quad 1 \le i \le n$  (24)

where:

$a_i = e^{-t_{(i)}}\left(1 + t_{(i)}\right)$  (25)

$b_i = -\left\{\frac{d}{dz}e^{-z}\right\}_{z=t_{(i)}} = e^{-t_{(i)}}$  (26)
Note that $b_i \ge 0$ for every $i = 1, 2, \dots, n$. Substituting equation (24) into equations (17) and (18) gives:
πœ• ln πΏβˆ—
πœ•Ξ±
≑
𝑛
πœƒ
βˆ’
1
πœƒ
βˆ‘ (π‘Žπ‘– + 𝑏𝑖𝑧(𝑖))
𝑛
𝑖=1 = 0
(27)
πœ• ln πΏβˆ—
πœ•ΞΈ
≑ βˆ’
𝑛
πœƒ
+
1
πœƒ
βˆ‘ 𝑧(𝑖) βˆ’
1
πœƒ
𝑛
𝑖=1 βˆ‘ 𝑧(𝑖)(π‘Žπ‘– + 𝑏𝑖𝑧(𝑖))
𝑛
𝑖=1 = 0
(28)
By solving equations (27), (28), and (19)-(22) we obtain the so-called modified maximum likelihood estimates. For large samples these estimates are unbiased with variances attaining the minimum variance bounds (MVB), and thus fully efficient; for small samples they are almost fully efficient. They are explicit functions of the sample observations and easy to compute, [4].
Substituting for $z_{(i)}$ in equation (27), we obtain:

$\frac{n}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n}\left(a_i - \frac{b_i\left(x_{(i)} - \alpha\right)}{\theta}\right) = 0$  (29)
Performing some algebra then gives the estimate of $\alpha$:

$\hat{\alpha} = \frac{\sum_{i=1}^{n}b_i x_{(i)} + \hat{\theta}\sum_{i=1}^{n}(1-a_i)}{\sum_{i=1}^{n}b_i}$  (30)
which can be rewritten as:

$\hat{\alpha} = K + D\hat{\theta}$  (31)

where $K = \frac{\sum_{i=1}^{n}b_i x_{(i)}}{\sum_{i=1}^{n}b_i}$ and $D = \frac{\sum_{i=1}^{n}(1-a_i)}{\sum_{i=1}^{n}b_i}$.
The estimate of the parameter $\theta$ is obtained from equation (28) after substituting the value of $z_{(i)}$ and performing some algebra:

$\sum_{i=1}^{n}b_i\left(x_{(i)} - \hat{\alpha}\right)^2 + \hat{\theta}\sum_{i=1}^{n}(1-a_i)\left(x_{(i)} - \hat{\alpha}\right) - n\hat{\theta}^2 = 0$  (32)
Equation (32) can be rewritten, after replacing $\hat{\alpha}$ by its expression in equation (31), as:

$n\hat{\theta}^2 - \hat{\theta}\sum_{i=1}^{n}\left(x_{(i)} - K\right)(1-a_i) - \sum_{i=1}^{n}b_i\left(x_{(i)} - K\right)^2 = 0$  (33)
Equation (33) can be written in a simpler form as:

$A\hat{\theta}^2 - B\hat{\theta} - C = 0$  (34)

where $A = n$, $B = \sum_{i=1}^{n}\left(x_{(i)} - K\right)(1-a_i)$, and $C = \sum_{i=1}^{n}b_i\left(x_{(i)} - K\right)^2$.
The solution to equation (34) represents the estimation of the parameter ΞΈ, i.e.:
πœƒ
Μ‚ =
𝐡+√𝐡2+4π‘ŽπΆ
2π‘Ž (35)
In order to obtain an estimate of the parameter $\beta_0$, we substitute the value of $u_{[i]}$ into equation (19) and perform some calculations:

$\frac{1}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}\left(y_{[i]} - \hat{\beta}_0 - \rho\frac{\sigma}{\theta}\left(x_{(i)} - \alpha\right)\right) = 0$

$\hat{\beta}_0 = \bar{y} - \hat{\rho}\frac{\hat{\sigma}}{\hat{\theta}}\left(\bar{x} - \hat{\alpha}\right)$  (36)
To find an estimate of $\sigma$, we rewrite $u_{[i]}$ by substituting equation (36) for $\hat{\beta}_0$ in equation (16):

$u_{[i]} = \left(y_{[i]} - \bar{y}\right) - \hat{\rho}\frac{\hat{\sigma}}{\hat{\theta}}\left(x_{(i)} - \bar{x}\right)$  (37)
Squaring both sides and summing over the sample, we get:

$\sum_{i=1}^{n}u_{[i]}^2 = \sum_{i=1}^{n}\left[\left(y_{[i]} - \bar{y}\right) - \hat{\rho}\frac{\hat{\sigma}}{\hat{\theta}}\left(x_{(i)} - \bar{x}\right)\right]^2$  (38)
By substituting equation (38) into equation (20) and performing some algebra, we get the following equation:

$\sum_{i=1}^{n}\left(y_{[i]} - \bar{y}\right)^2 - 2\hat{\rho}\frac{\hat{\sigma}}{\hat{\theta}}\sum_{i=1}^{n}\left(y_{[i]} - \bar{y}\right)\left(x_{(i)} - \bar{x}\right) + \frac{\hat{\rho}^2\hat{\sigma}^2}{\hat{\theta}^2}\sum_{i=1}^{n}\left(x_{(i)} - \bar{x}\right)^2 - n\hat{\sigma}^2 + n\hat{\sigma}^2\hat{\rho}^2 = 0$  (39)
The last equation can be rewritten in terms of the sample variances and covariance of $y_{[i]}$ and $x_{(i)}$, substituting $\hat{\rho} = \hat{\theta}s_{xy}/(\hat{\sigma}s_x^2)$:

$ns_y^2 - 2ns_{xy}\,\frac{s_{xy}\hat{\theta}}{s_x^2\hat{\sigma}}\cdot\frac{\hat{\sigma}}{\hat{\theta}} + \frac{s_{xy}^2\hat{\theta}^2}{s_x^4\hat{\sigma}^2}\cdot\frac{\hat{\sigma}^2}{\hat{\theta}^2}\,ns_x^2 - n\hat{\sigma}^2 + n\hat{\sigma}^2\,\frac{s_{xy}^2\hat{\theta}^2}{s_x^4\hat{\sigma}^2} = 0$  (40)
The solution of the last equation gives the estimate of $\sigma$:

$\hat{\sigma} = \left[s_y^2 + \frac{s_{xy}^2}{s_x^2}\left(\frac{\hat{\theta}^2}{s_x^2} - 1\right)\right]^{\frac{1}{2}}$  (41)
To get an estimate of $\beta_1$, we substitute the values of $u_{[i]}$ and $z_{(i)}$ into equation (22) and simplify:

$\sum_{i=1}^{n}\left[\left(x_{(i)} - \alpha\right)\left(y_{[i]} - \hat{\beta}_0\right) - \left(x_{(i)} - \alpha\right)^2\hat{\beta}_1\right] = 0$  (42)
Solving this equation gives the estimate of $\beta_1$:

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}\left(x_{(i)} - \hat{\alpha}\right)\left(y_{[i]} - \hat{\beta}_0\right)}{\sum_{i=1}^{n}\left(x_{(i)} - \hat{\alpha}\right)^2}$  (43)
The estimate of the correlation parameter $\rho$ is obtained from equation (21) after substituting the sum of squared random errors $u_{[i]}$; rewriting it in terms of the sample variances and covariance of $y_{[i]}$ and $x_{(i)}$ and performing some algebra gives:

$\left[\left(\frac{\hat{\sigma}}{\hat{\theta}}\right)^2 S_x^2 + \hat{\sigma}^2\right]\hat{\rho}^2 - 2\left[\frac{\hat{\sigma}}{\hat{\theta}}S_{xy}\right]\hat{\rho} - \left[\frac{S_{xy}^2}{S_x^2}\left(\frac{\hat{\theta}^2}{S_x^2} - 1\right)\right] = 0$  (44)
which can be rewritten as:

$a\hat{\rho}^2 - b\hat{\rho} - c = 0$  (45)

where $a = \left(\frac{\hat{\sigma}}{\hat{\theta}}\right)^2 S_x^2 + \hat{\sigma}^2$, $b = 2\frac{\hat{\sigma}}{\hat{\theta}}S_{xy}$, and $c = \frac{S_{xy}^2}{S_x^2}\left(\frac{\hat{\theta}^2}{S_x^2} - 1\right)$.
Solving equation (45), we get the estimate of the parameter $\rho$:

$\hat{\rho} = \frac{\hat{\theta}S_{xy}}{\hat{\sigma}S_x^2}$  (46)
The modified maximum likelihood estimates were derived assuming that the probability distribution of the explanatory variable X is the extreme value distribution, but the derivation can be repeated for any other probability distribution suggested by the data under study.
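The closed-form estimators (30)-(46) can be assembled into a short routine. The sketch below is illustrative, not the authors' code: it assumes the quantile approximation $t_{(i)} \approx -\ln(-\ln(i/(n+1)))$ for the expected order statistics and uses biased sample moments; the equation numbers are noted in comments:

```python
import numpy as np

def mml_estimates(x, y):
    """Modified maximum likelihood estimates for the random linear
    regression model (illustrative sketch of equations (24)-(46))."""
    n = len(x)
    order = np.argsort(x)
    xo, yo = x[order], y[order]            # x_(i) and concomitants y_[i]

    i = np.arange(1, n + 1)
    t = -np.log(-np.log(i / (n + 1)))      # approximate E[z_(i)] (assumed)
    a = np.exp(-t) * (1 + t)               # equation (25)
    b = np.exp(-t)                         # equation (26), b_i > 0

    K = np.sum(b * xo) / np.sum(b)         # constants of equation (31)
    D = np.sum(1 - a) / np.sum(b)

    B = np.sum((xo - K) * (1 - a))         # equation (34)
    C = np.sum(b * (xo - K) ** 2)
    theta = (B + np.sqrt(B**2 + 4 * n * C)) / (2 * n)   # equation (35)
    alpha = K + D * theta                  # equation (31)

    sx2, sy2 = np.var(xo), np.var(yo)      # sample variances
    sxy = np.mean((xo - xo.mean()) * (yo - yo.mean()))  # sample covariance
    sigma = np.sqrt(sy2 + (sxy**2 / sx2) * (theta**2 / sx2 - 1))  # eq. (41)
    rho = theta * sxy / (sigma * sx2)      # equation (46)
    beta1 = rho * sigma / theta            # slope; algebraically s_xy / s_x^2
    beta0 = yo.mean() - beta1 * (xo.mean() - alpha)     # equation (36)
    return beta0, beta1, alpha, theta, sigma, rho
```

Every quantity is an explicit function of the observations, which is the practical advantage of the modified likelihood equations over iterative maximum likelihood.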
3.2: M-estimation Method:
This method is one of the robust methods commonly used in estimating regression model coefficients when the model suffers from basic problems, in particular when the data contain outliers or extreme values. In this research it is used to estimate the coefficients of the regression model with a random explanatory variable; as far as we know, it has not previously been used in estimating such models.
The M method is an extension of the maximum likelihood method, and the symbol M indicates that these estimates are of maximum likelihood type. The idea of the method is to minimize a function of the random errors rather than the sum of their squares, so the M estimator $\hat{\beta}_M$ is obtained by minimizing the following objective function, [11, 12]:

$\hat{\beta}_M = \arg\min_{\beta}\sum_{i=1}^{n}\rho(u_i)$  (47)
where $\rho(u_i)$ is the objective function, a function of the random errors of the regression model. Replacing the random errors $u_i$ with the standardized errors, the objective function (47) can be rewritten as:

$\hat{\beta}_M = \arg\min_{\beta}\sum_{i=1}^{n}\rho\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right)$  (48)
The function $\rho$ of the standardized errors takes several forms, such as Huber's formula and Tukey's bisquare formula, shown in Table (1). Differentiating the objective function in equation (48) and setting the derivative equal to zero gives:

$\sum_{i=1}^{n}X_{ij}\,\psi\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\hat{\sigma}}\right) = 0, \quad j = 0, 1, \dots, k$  (49)
where πœ“ represents the first derivative of the function ρ, 𝑋𝑖𝑗 is the observation i of the
explanatory variable j and 𝑋𝑖0 = 1 to include the fixed term in the model, equation (49) is solved
depending on the method of Iteratively reweighted least squares (IRLS) and using the weights
shown in the following equation, [ 12].
𝑀𝑖 = 𝑀(𝑒𝑖) =
πœ“(
π‘Œπ‘–βˆ’π›½0βˆ’π›½1𝑋𝑖
𝜎
Μ‚
)
(
π‘Œπ‘–βˆ’π‘Œπ‘–βˆ’π›½0βˆ’π›½1𝑋𝑖
𝜎
Μ‚
)
(50)
The weights $w_i$ for the two forms of the function $\rho$ are shown in Table (1). The estimate of the scale parameter $\sigma$ is obtained from the following formula, [11]:

$\hat{\sigma} = \frac{\mathrm{Median}\left|u_i - \mathrm{Median}(u)\right|}{0.6745}$  (51)
Using the weights $w_i$, equation (49) can be rewritten as:

$\sum_{i=1}^{n}X_{ij}\,w_i\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\hat{\sigma}}\right) = 0, \quad j = 0, 1, \dots, k$  (52)
With the IRLS method, equation (52) is solved starting from initial values of the regression coefficients $\hat{\beta}$ and of the scale estimate $\hat{\sigma}$. In matrix form, equation (52) becomes the following system of weighted least squares equations:

$X'WX\hat{\beta} = X'WY$  (53)
where W is a diagonal matrix of size (n × n) whose diagonal elements are the weights $w_i$. Solving equation (53) gives the estimate of the regression coefficients by the M method, [11]:

$\hat{\beta}_M = (X'WX)^{-1}X'WY$  (54)
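The IRLS scheme of equations (49)-(54), with Huber's $\psi$ and the scale estimate (51), can be sketched as follows. This is an illustrative Python implementation, not the authors' code; the function names are hypothetical:

```python
import numpy as np

def huber_psi(e, a=1.345):
    # psi = derivative of Huber's rho: identity inside [-a, a], clipped outside.
    return np.clip(e, -a, a)

def m_estimate(x, y, n_iter=50, a=1.345):
    """IRLS solution of equations (49)-(54) with Huber weights (sketch)."""
    Xd = np.column_stack([np.ones(len(y)), x])        # constant term, X_i0 = 1
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]      # OLS starting values
    for _ in range(n_iter):
        r = y - Xd @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # equation (51)
        e = r / s
        w = np.where(e == 0, 1.0, huber_psi(e, a) / e)     # equation (50)
        W = np.diag(w)                                     # (n x n) weight matrix
        beta = np.linalg.solve(Xd.T @ W @ Xd, Xd.T @ W @ y)  # equation (54)
    return beta
```

Because the weights shrink toward $a/|e|$ for large standardized residuals, gross outliers in y pull the fit far less than under ordinary least squares.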
Table (1): Huber and Tukey's bisquare formulas for the function $\rho$

Huber (tuning parameter $a = 1.345$):
Objective function: $\rho(u) = \frac{1}{2}u^2$ if $|u| \le a$; $\rho(u) = a|u| - \frac{1}{2}a^2$ if $|u| > a$
Weight function: $w(u) = 1$ if $|u| \le a$; $w(u) = a/|u|$ if $|u| > a$

Tukey's bisquare (tuning parameter $a = 4.685$):
Objective function: $\rho(u) = \frac{a^2}{6}\left[1 - \left(1 - \left(\frac{u}{a}\right)^2\right)^3\right]$ if $|u| \le a$; $\rho(u) = \frac{a^2}{6}$ if $|u| > a$
Weight function: $w(u) = \left(1 - \left(\frac{u}{a}\right)^2\right)^2$ if $|u| \le a$; $w(u) = 0$ if $|u| > a$
3.3: Robust Empirical Likelihood Method (REL ):
The empirical likelihood method is a nonparametric estimation method proposed by Owen, 1988, [13], as an alternative to the maximum likelihood method when no assumptions are made about the probability distribution of the random error term of the regression model. The method assigns a probability weight $p_i$ to each sample observation ($i = 1, 2, \dots, n$) and estimates the regression coefficients by maximizing the empirical likelihood function, defined as the product of those probability weights, subject to constraints on the model coefficients to be estimated. These constraints are similar to the normal equations of the least squares method, or to the likelihood equations under the normality hypothesis, which makes the method very sensitive to departures from normality and to the presence of outliers or extreme values. Therefore, Ozdemir and Arslan, 2018, [14], suggested replacing them with robust constraints borrowed from the objective function of the robust M estimation method, in effect reconciling the empirical likelihood method with the robust M method.
The empirical likelihood method is presented first, as an introduction to the robust empirical likelihood method.
For the linear regression model shown in equation (1), let $p_i$ ($i = 1, 2, \dots, n$) denote the probability weight of each observation of the sample; these weights are unknown, differ across observations, need to be estimated, and satisfy $p_i \ge 0$. The empirical likelihood method estimates the vector of model coefficients $\beta$ and the population variance $\sigma^2$ by maximizing the empirical likelihood function subject to some constraints, as shown below:

$L_{EL}(\beta, \sigma^2) = \prod_{i=1}^{n}p_i$  (55)
under the following constraints:

$\sum_{i=1}^{n}p_i = 1$  (56)

$\sum_{i=1}^{n}p_i\left(Y_i - \beta_0 - \beta_1 X_i\right)X_i = 0$  (57)

$\sum_{i=1}^{n}p_i\left[\left(Y_i - \beta_0 - \beta_1 X_i\right)^2 - \sigma^2\right] = 0$  (58)
Taking the logarithm of both sides of equation (55) gives the log empirical likelihood function:

$\log L(\beta, \sigma^2) = \log\prod_{i=1}^{n}p_i = \sum_{i=1}^{n}\log(p_i)$  (59)
Estimates of the probability weights $p^T = [p_1\ p_2\ \dots\ p_n]$, the coefficient vector $\beta^T = [\beta_0\ \beta_1]$, and the variance $\sigma^2$ are obtained by maximizing the log empirical likelihood function subject to the three constraints (56)-(58). This maximization problem can be solved with the Lagrange multiplier method, which requires forming the Lagrange function:
𝐿 (𝑝, 𝛽, πœ†0, πœ†1, πœ†2) = βˆ‘ πΏπ‘œπ‘”
𝑛
𝑖=1 (𝑝𝑖) βˆ’ πœ†0(βˆ‘ 𝑝𝑖 βˆ’ 1) βˆ’
𝑛
𝑖=1 π‘›πœ†1 βˆ‘ 𝑝𝑖 (π‘Œπ‘– βˆ’
𝑛
𝑖=1
𝑋𝑖
𝑇
𝛽) 𝑋𝑖 βˆ’ π‘›πœ†2 βˆ‘ 𝑝𝑖 [(π‘Œπ‘– βˆ’ 𝑋𝑖
𝑇
𝛽)2
βˆ’ 𝜎2
]
𝑛
𝑖=1
(60)
Where: πœ†0 , πœ†1, πœ†2 ∈ 𝑅 , which represented the Lagrange multiplier. The estimation process
requires first estimating the vector of probability weights by taking the derivative of the
Lagrange function (60) for each 𝑝𝑖 and equating it to zero, so that we can get an estimate of those
weights according to the following formula:
$p_i = \frac{1}{\lambda_0 + n\lambda_1\left(Y_i - X_i^T\beta\right)X_i + n\lambda_2\left[\left(Y_i - X_i^T\beta\right)^2 - \sigma^2\right]}, \quad i = 1, 2, \dots, n$  (61)
Summing both sides of equation (61) over all values of i shows that $\lambda_0 = n$, so the equation can be rewritten after substituting for $\lambda_0$ as:
$p_i = \frac{1}{n\left(1 + \lambda_1\left(Y_i - X_i^T\beta\right)X_i + \lambda_2\left[\left(Y_i - X_i^T\beta\right)^2 - \sigma^2\right]\right)}, \quad i = 1, 2, \dots, n$  (62)
Substituting $p_i$ from equation (62) into the log empirical likelihood function gives the following objective function in terms of $\beta, \lambda_1, \lambda_2, \sigma^2$ only:
𝐿 (𝛽, πœ†1, πœ†2 , 𝜎2
) = βˆ’ βˆ‘ πΏπ‘œπ‘” (
𝑛
𝑖=1 1 + πœ†1 (π‘Œπ‘– βˆ’ 𝑋𝑖
𝑇
𝛽) 𝑋𝑖 + πœ†2 [(π‘Œπ‘– βˆ’ 𝑋𝑖
𝑇
𝛽)2
βˆ’ 𝜎2
] βˆ’
π‘›πΏπ‘œπ‘”π‘›
(63)
For given values of the coefficient vector $\beta$ and of $\sigma^2$, the Lagrange multipliers $\lambda_1$ and $\lambda_2$ are obtained from the following minimization problem:
πœ†
Μ‚ (𝛽, 𝜎2
) = π‘Žπ‘Ÿπ‘” min
πœ†
[βˆ’ βˆ‘ πΏπ‘œπ‘” (
𝑛
𝑖=1 πœ†0 + πœ†1(π‘Œπ‘– βˆ’ 𝑋𝑖
𝑇
𝛽)𝑋𝑖 + πœ†2[(π‘Œπ‘– βˆ’ 𝑋𝑖
𝑇
𝛽)2
βˆ’ 𝜎2
] βˆ’
π‘›πΏπ‘œπ‘”π‘›]
(64)
Where: πœ† = [πœ†1 πœ†2] . Noting that the problem of minimization (64) has no explicit solutions,
which requires numerical methods to find solutions to this problem, by substituting those
solutions in 𝐿 (𝛽, πœ†1, πœ†2 , 𝜎2
) we get the logarithm of the empirical likelihood function
𝐿 (πœ†
Μ‚ (𝛽, 𝜎2
) , 𝛽 , 𝜎2
), this function is in terms of the vector of the coefficients 𝛽 and the variance
𝜎2
, and the estimation of the empirical maximum likelihood for those parameters is obtained by
solving the following maximization problem:
$(\hat{\beta}, \hat{\sigma}^2) = \arg\max_{(\beta, \sigma^2)} L\left(\hat{\lambda}(\beta, \sigma^2), \beta, \sigma^2\right)$  (65)
The last maximization problem is solved using numerical methods because there are no explicit
solutions for it.
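The inner problem (64) is smooth and convex in $\lambda$, so a few Newton steps usually suffice. The sketch below (illustrative variable names, not the authors' code) evaluates the constraint terms of (57)-(58) at the OLS and moment estimates, where the sample moments are exactly zero; the minimizing $\lambda$ should therefore be approximately 0 and the weights of equation (62) reduce to $p_i = 1/n$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Constraint terms evaluated at the OLS / moment estimates, so that the
# sample moments are exactly zero and the inner solution should be lambda ~ 0.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta
sigma2 = np.mean(u**2)

G = np.column_stack([u * x, u**2 - sigma2])   # terms of (57) and (58)

# Newton iterations for the convex inner problem (64):
#   minimize over lambda:  -sum(log(1 + G @ lambda))
lam = np.array([0.01, 0.01])
for _ in range(25):
    t = 1.0 + G @ lam
    grad = -G.T @ (1.0 / t)                   # gradient of the objective
    hess = (G / t[:, None]**2).T @ G          # sum of g_i g_i' / t_i^2
    lam = lam - np.linalg.solve(hess, grad)

p = 1.0 / (n * (1.0 + G @ lam))               # weights of equation (62)
```

At a candidate $(\beta, \sigma^2)$ away from these moment estimates, the same iteration yields a nonzero $\hat{\lambda}$ and the profile value needed in (65).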
As noted above, the robust empirical likelihood method results from replacing the constraints (57)-(58) with robust constraints borrowed from the objective function of the robust M estimation method indicated in equation (49). This modification was adopted to make the empirical likelihood method more effective against the influence of outliers on the estimation of the model coefficients, and here it is employed to estimate the coefficients of the linear regression model when the explanatory variable is random. The empirical likelihood function shown in equation (55) is maximized after replacing the second and third constraints, equations (57) and (58), with the following robust constraints [14]:
βˆ‘ 𝑝𝑖 πœ‘ ( π‘Œπ‘– βˆ’ 𝛽0 βˆ’ 𝛽1𝑋𝑖
𝑛
𝑖=1 )𝑋𝑖 = 0
(66)
βˆ‘ 𝑝𝑖 πœ‘[(π‘Œπ‘– βˆ’ 𝛽0 βˆ’ 𝛽1𝑋𝑖)2
βˆ’ 𝜎2
] = 0
𝑛
𝑖=1
(67)
where πœ‘ is a function of random errors, which is non-increasing in the case of Huber's function,
and decreasing in the case of Tukey's function. The robust estimate for each of the model
coefficients and the variance is obtained by solving the following maximization problem using the Lagrange function:
𝐿 (𝑝, 𝛽, πœ†0, πœ†1, πœ†2) = βˆ‘ πΏπ‘œπ‘”
𝑛
𝑖=1 (𝑝𝑖) βˆ’ πœ†0(βˆ‘ 𝑝𝑖 βˆ’ 1
𝑛
𝑖=1 ) βˆ’ π‘›πœ†1 (βˆ‘ 𝑝𝑖 πœ‘(π‘Œπ‘– βˆ’
𝑛
𝑖=1
𝑋𝑖
𝑇
𝛽) 𝑋𝑖 βˆ’ π‘›πœ†2 (βˆ‘ 𝑝𝑖 [πœ‘(π‘Œπ‘– βˆ’ 𝑋𝑖
𝑇
𝛽)2
βˆ’ 𝜎2
])
𝑛
𝑖=1
(68)
Following the same steps as in the empirical likelihood method, the estimation process first requires estimating the vector of probability weights by differentiating the Lagrange function (68) with respect to each $p_i$ and equating the derivative to zero, which gives:
$p_i = \frac{1}{\lambda_0 + n\lambda_1\,\varphi\left(Y_i - X_i^T\beta\right)X_i + n\lambda_2\,\varphi\left[\left(Y_i - X_i^T\beta\right)^2 - \sigma^2\right]}, \quad i = 1, 2, \dots, n$  (69)
Summing both sides of equation (69) over all values of i shows that $\lambda_0 = n$, so the equation can be rewritten after substituting for $\lambda_0$ as:
$p_i = \frac{1}{n\left(1 + \lambda_1\,\varphi\left(Y_i - X_i^T\beta\right)X_i + \lambda_2\,\varphi\left[\left(Y_i - X_i^T\beta\right)^2 - \sigma^2\right]\right)}, \quad i = 1, 2, \dots, n$  (70)
Substituting $p_i$ from equation (70) into the log empirical likelihood function gives the following objective function in terms of $\beta, \lambda_1, \lambda_2, \sigma^2$ only:
𝐿 (𝛽, πœ†1, πœ†2 , 𝜎2
) = βˆ’ βˆ‘ πΏπ‘œπ‘” (
𝑛
𝑖=1 1 + πœ†1 πœ‘ (π‘Œπ‘– βˆ’ 𝑋𝑖
𝑇
𝛽) 𝑋𝑖 + πœ†2 [πœ‘(π‘Œπ‘– βˆ’ 𝑋𝑖
𝑇
𝛽)2
βˆ’ 𝜎2
] βˆ’
π‘›πΏπ‘œπ‘”π‘›
(71)
For given values of the coefficient vector $\beta$ and of $\sigma^2$, the Lagrange multipliers $\lambda_1$ and $\lambda_2$ are obtained from the following minimization problem:
𝝀
Μ‚(𝛽, 𝜎2) = π‘Žπ‘Ÿπ‘” min
πœ†
[βˆ’ βˆ‘ πΏπ‘œπ‘” (
𝑛
𝑖=1 πœ†0 + πœ†1(πœ‘(π‘Œπ‘– βˆ’ 𝑋𝑖
𝑇
𝛽)𝑋𝑖) + πœ†2[πœ‘(π‘Œπ‘– βˆ’ 𝑋𝑖
𝑇
𝛽)2
βˆ’
𝜎2] βˆ’ π‘›πΏπ‘œπ‘”π‘›]
(72)
The minimization problem (72) has no explicit solution, so numerical methods are required to solve it. Substituting the solution into L(Ξ², λ₁, Ξ»β‚‚, σ²) gives the logarithm of the profiled empirical likelihood function L(Ξ»Μ‚(Ξ², σ²), Ξ², σ²), which is a function of the coefficient vector Ξ² and the variance σ² only, and the robust empirical maximum likelihood estimate of those coefficients is obtained by solving the following maximization problem:
(Ξ²Μ‚, σ̂²) = arg max_{(Ξ², σ²)} L(Ξ»Μ‚(Ξ², σ²), Ξ², σ²)    (73)
The last maximization problem is solved using numerical methods because there are no explicit
solutions for it.
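The inner step of (72) is a convex problem in Ξ» for fixed (Ξ², σ²), so a damped Newton iteration is one workable numerical scheme. The following Python sketch is an illustration under assumed choices (Huber Ο† with cutoff 1.345, synthetic Gumbel-distributed explanatory data), not the authors' program:

```python
import numpy as np

def huber_psi(u, c=1.345):
    return np.clip(u, -c, c)

def solve_lambda(y, x, beta0, beta1, sigma2, iters=50):
    """Inner minimization of (72): damped Newton for (lam1, lam2)
    minimizing -sum(log(1 + lam' g_i)) at fixed (beta, sigma2)."""
    e = y - beta0 - beta1 * x
    g = np.column_stack([huber_psi(e) * x, huber_psi(e) ** 2 - sigma2])
    lam = np.zeros(2)
    for _ in range(iters):
        w = 1.0 + g @ lam
        grad = -(g / w[:, None]).sum(axis=0)
        if np.linalg.norm(grad) < 1e-8:
            break
        hess = (g[:, :, None] * g[:, None, :] / (w ** 2)[:, None, None]).sum(axis=0)
        step = np.linalg.solve(hess + 1e-8 * np.eye(2), -grad)
        t = 1.0
        for _ in range(60):            # backtrack to keep all 1 + lam'g_i > 0
            if np.all(1.0 + g @ (lam + t * step) > 1e-8):
                break
            t *= 0.5
        lam = lam + t * step
    p = 1.0 / (len(y) * (1.0 + g @ lam))   # the weights of equation (70)
    return lam, p

# Synthetic data (all values assumed): x ~ Gumbel, errors ~ normal.
rng = np.random.default_rng(1)
x = rng.gumbel(157.8, 41.8, 200)
e = rng.normal(0.0, 4.0, 200)
y = 28.1 + 0.06 * x + e
lam, p = solve_lambda(y, x, 28.1, 0.06, np.mean(huber_psi(e) ** 2))
```

At the solution the weights are positive, sum to one and satisfy the robust constraints (66) and (67); profiling this inner solve over (Ξ², σ²) and maximizing, as in (73), gives the final estimates.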
4. Simulation experiments:
Simulation experiments will be conducted to compare the robust estimation methods under study and to identify the best one, based on the mean squared error (MSE) comparison criterion. The R 4.2.1 statistical programming language was used to write the simulation program, which includes the following basic stages for estimating the regression model:
4.1: Determining the default values:
The default values of the parameters were determined based on the real data adopted in the practical application. The random explanatory variable, the patient's random blood sugar (RBS), follows the extreme value distribution, and the real data were used to set the default values of this distribution's two parameters, as shown in Table (1). Likewise, based on the real data for the response variable, packed cell volume (PCV), and the aforementioned explanatory variable, the default values of the two parameters of the simple linear regression model of formula (1) were determined, also shown in Table (1). The values (0.3, 0.5, 0.7) were chosen as default values for the correlation coefficient ρ between the random explanatory variable and the random error terms. Three different sample sizes were used (30, 70, 150), and each experiment was repeated 1000 times.
Table (1): The default values for the two parameters of the extreme value distribution and for the
parameters of the simple linear random regression model.
ΞΈ 40 41 42 43 44
Ξ± 135 145 155 160 175
𝛽0 16 17 18 19 20
𝛽1 0.04 0.05 0.06 0.07 0.08
4.2: Data generation:
Based on the default values of the two parameters of the extreme value distribution, the values of the random explanatory variable following this distribution were generated according to formula (2). The values of the random error terms were generated assuming that they follow a normal distribution with mean zero and variance σ²(1 βˆ’ ρ²), where the values of σ² were determined from the default values of the parameters ΞΈ and β₁ and the correlation coefficient ρ. The generation was carried out for the three sample sizes mentioned previously, and, from the values of the random explanatory variable and the random error terms, the values of the response variable were generated from the simple linear regression model of formula (1).
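The generation step can be sketched as follows. This Python sketch is illustrative only: reading (Ξ±, ΞΈ) as the Gumbel location and scale, and the σ² value used below, are assumptions, since formula (2) and the derivation of σ² appear elsewhere in the paper:

```python
import numpy as np

def generate_sample(n, alpha, theta, beta0, beta1, rho, sigma2, seed=0):
    """Generate one simulated sample: X from the extreme value (Gumbel)
    distribution with assumed location alpha and scale theta, errors
    u ~ N(0, sigma2 * (1 - rho**2)), and Y = beta0 + beta1 * X + u."""
    rng = np.random.default_rng(seed)
    x = rng.gumbel(loc=alpha, scale=theta, size=n)
    u = rng.normal(loc=0.0, scale=np.sqrt(sigma2 * (1.0 - rho ** 2)), size=n)
    y = beta0 + beta1 * x + u
    return x, y

# One replication with the default values of column E of Table (1);
# sigma2 = 9.0 is a placeholder, not the paper's derived value.
x, y = generate_sample(n=30, alpha=135, theta=40, beta0=16, beta1=0.04,
                       rho=0.3, sigma2=9.0, seed=42)
```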
4.3: Estimation and comparison:
Using the three estimation methods under study and the generated data, the simple linear regression model with a random explanatory variable following the extreme value distribution was estimated according to formula (1), and the average root mean squared error was then calculated for the estimated model under each estimation method according to the following formula:

ARMSE = (1/r) βˆ‘_{k=1}^{r} √( (1/n) βˆ‘_{i=1}^{n} (Ε·_i βˆ’ Y_i)Β² )    (74)
Where r represents the number of iterations for each experiment and is equal to (1000).
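The criterion in (74) can be computed as in the following sketch; it is written in Python (the paper's program was written in R) and assumes the r replications are stored row-wise:

```python
import numpy as np

def armse(y_true_reps, y_hat_reps):
    """Average root mean squared error over r replications, as in (74):
    the RMSE of each replication is computed first, then averaged."""
    y_true_reps = np.asarray(y_true_reps, dtype=float)
    y_hat_reps = np.asarray(y_hat_reps, dtype=float)
    rmse_per_rep = np.sqrt(np.mean((y_hat_reps - y_true_reps) ** 2, axis=1))
    return float(rmse_per_rep.mean())

# Hypothetical check: two replications of n = 3 observations each,
# with per-replication RMSEs of 0 and 1, so ARMSE = 0.5.
y_true = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
y_hat = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
print(armse(y_true, y_hat))  # prints 0.5
```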
According to the above comparison criterion, the estimation results of the three methods are compared for each combination of the default values and the three sample sizes; the values of the comparison criterion are summarized in Table (2). These results show that the best estimation method is the MMLE, followed by the ME method, although the values of the comparison criterion are close for the three estimation methods.
Table (2): Average root mean square error (ARMSE) values for the random regression model estimation results.
Estimation method  ρ  n  A*  B*  C*  D*  E*
MMLE 0.3 30 10.70234 6.42548 7.81122 9.32264 5.06044
70 10.65632 6.25740 7.78046 9.23494 5.01819
150 10.54366 6.23644 7.67183 8.94120 4.68794
0.5 30 5.83150 3.45439 4.23047 5.12812 2.67360
70 5.79143 3.41886 4.20564 5.08658 2.65669
150 5.63819 3.41322 4.08129 4.91907 2.64957
0.7 30 3.43634 2.02950 2.51128 3.02261 1.61564
70 3.39836 2.02566 2.49299 2.98417 1.57930
150 3.29939 1.97072 2.48488 2.93176 1.50329
ME 0.3 30 10.70665 6.42715 7.81689 9.32983 5.06424
70 10.66579 6.26244 7.78304 9.23813 5.01946
150 10.57076 6.25282 7.68618 8.95571 4.69933
0.5 30 5.83400 3.45580 4.23198 5.13209 2.67408
70 5.64809 3.42608 4.20894 5.08826 2.66265
150 5.79602 3.41581 4.08827 4.92857 2.65129
0.7 30 3.43770 2.03111 2.51231 3.02370 1.61605
70 3.40131 2.02629 2.49510 2.98635 1.57994
150 3.30746 1.97426 2.48925 2.93706 1.50462
RELE 0.3 30 10.70659 6.42717 7.81678 9.32996 5.06412
70 10.66617 6.26243 7.78306 9.23817 5.01984
150 10.57200 6.25408 7.68638 8.95669 4.69955
0.5 30 5.83400 3.45583 4.23207 5.13238 2.67407
70 5.79609 3.42629 4.20912 5.08832 2.66192
150 5.64793 3.41592 4.08902 4.92877 2.65151
0.7 30 3.43772 2.03118 2.51232 3.02368 1.61606
70 3.40154 2.02629 2.49517 2.98642 1.57997
150 3.30882 1.97469 2.48968 2.93759 1.50462
* A: ΞΈ=44, Ξ±=175, Ξ²β‚€=20, β₁=0.08   B: ΞΈ=41, Ξ±=145, Ξ²β‚€=17, β₁=0.05   C: ΞΈ=42, Ξ±=155, Ξ²β‚€=18, β₁=0.06
D: ΞΈ=43, Ξ±=165, Ξ²β‚€=19, β₁=0.07   E: ΞΈ=40, Ξ±=135, Ξ²β‚€=16, β₁=0.04
5. Practical application:
The data adopted in the practical application relate to people with heart disease: a random sample of 30 patients was chosen from those hospitalized in Ghazi Hariri Hospital for Specialized Surgery. For each patient, the packed cell volume (PCV) and random blood sugar (RBS) were recorded, as shown in Table (3). These data were used to estimate the simple linear relationship between PCV as a response variable and RBS as an explanatory variable; a simple linear regression model with a random explanatory variable was built, and the regression relationship was described by the following model:

PCV_i = Ξ²β‚€ + β₁RBS_i + u_i    (75)
Table (3): Packed cell volume (PCV) and random blood sugar (RBS) of the sample observations.
No. PCV RBS No. PCV RBS No. PCV RBS
1 36 170 11 26 149 21 31 191
2 23 136 12 27 348 22 30 182
3 39 233 13 46 387 23 30 184
4 22 96 14 32 202 24 27 124
5 21 145 15 31 200 25 29 218
6 25 188 16 29 199 26 27 143
7 21 136 17 26 115 27 34 150
8 25 141 18 33 158 28 29 153
9 22 147 19 31 152 29 34 161
10 41 310 20 29 193 30 32 209
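The M estimation method can be sketched as iteratively reweighted least squares (IRLS) with Huber weights. The following Python sketch is a generic illustration on synthetic data (the cutoff c = 1.345, the MAD scale estimate and all data values are assumptions), not the exact procedure used in the paper:

```python
import numpy as np

def huber_m_fit(x, y, c=1.345, iters=50):
    """Simple-regression M estimation via iteratively reweighted least
    squares (IRLS) with Huber weights and a MAD scale estimate."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS starting values
    for _ in range(iters):
        r = y - X @ beta
        s = max(1.4826 * np.median(np.abs(r - np.median(r))), 1e-12)
        u = r / s
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u)) # Huber weights
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < 1e-10:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Synthetic RBS-like data with one gross outlier (all values assumed).
rng = np.random.default_rng(3)
x = rng.gumbel(157.8, 41.8, 60)
y = 28.1 + 0.06 * x + rng.normal(0.0, 2.0, 60)
y[0] += 40.0                                             # contaminate one point
b0_hat, b1_hat = huber_m_fit(x, y)
```

Because the outlying observation receives a weight below one, the fitted slope stays close to the value used to generate the clean data.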
The random regression model of formula (75) was built on the assumption that the response variable PCV is distributed according to the normal distribution and that the explanatory variable RBS is distributed according to the extreme value distribution, while the random error terms independently follow the normal distribution with mean zero and constant variance σ²(1 βˆ’ ρ²). To verify these assumptions, the chi-square goodness-of-fit test was used for each of the response and explanatory variables. For the data of the response variable (PCV), it was tested whether they follow a normal distribution, i.e. the following hypothesis was tested:
𝐻0: 𝑃𝐢𝑉~π‘π‘œπ‘Ÿπ‘šπ‘Žπ‘™ 𝑑𝑖𝑠𝑑.
𝐻1: 𝑃𝐢𝑉 ≁ π‘π‘œπ‘Ÿπ‘šπ‘Žπ‘™ 𝑑𝑖𝑠𝑑.
The test results are shown in Table (5). The P-value associated with this test is greater than the 0.05 significance level, so the null hypothesis is not rejected, meaning that the data of the PCV variable are normally distributed; Figure (1) shows the histogram and the normal distribution curve for the data of this variable. As for the data of the explanatory variable RBS, it was tested whether they follow the extreme value distribution, according to the following hypothesis:
H₀: RBS ~ Extreme Value dist.
H₁: RBS ≁ Extreme Value dist.
The results of this test are also shown in Table (5): the P-value, which is greater than the 0.05 significance level, indicates acceptance of the null hypothesis, meaning that the data of this variable follow the extreme value distribution. Figure (2) shows the histogram and the fitted distribution curve for the data of this variable.
Table (5): The chi-square goodness-of-fit tests for the response variable PCV and the explanatory variable RBS.

            PCV                        RBS
Statistic   1.46667                    6.26667
P-Value     0.916884                   0.281129
Parameters  ΞΌ = 29.6; Οƒ = 5.75384     Ξ± = 157.785; ΞΈ = 41.8356
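The goodness-of-fit computation can be sketched with equal-probability bins under the fitted normal. The binning below is an assumption (the binning behind Table (5) is not reported), so it does not reproduce the tabulated statistic, but on the PCV data of Table (3) it leads to the same conclusion:

```python
import math
import numpy as np

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi2_gof_normal(data, k=6):
    """Chi-square goodness-of-fit statistic for normality, using k
    equal-probability bins under the fitted normal distribution."""
    data = np.asarray(data, dtype=float)
    mu, sd = data.mean(), data.std()      # ML estimates, as in Table (5)
    probs = np.array([normal_cdf((v - mu) / sd) for v in data])
    observed = np.histogram(probs, bins=np.linspace(0.0, 1.0, k + 1))[0]
    expected = len(data) / k
    return float(((observed - expected) ** 2 / expected).sum())

pcv = [36, 23, 39, 22, 21, 25, 21, 25, 22, 41, 26, 27, 46, 32, 31,
       29, 26, 33, 31, 29, 31, 30, 30, 27, 29, 27, 34, 29, 34, 32]
stat = chi2_gof_normal(pcv)
# With k = 6 bins and two estimated parameters, df = k - 1 - 2 = 3;
# the 0.05 critical value is 7.815, and the statistic here falls below it,
# so normality is not rejected, as in the paper.
```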
Figure (1): Histogram and normal distribution curve for the response variable PCV.
Figure (2): Histogram and the extreme value distribution curve for the explanatory variable RBS.
To estimate the coefficients of the random regression model shown in formula (75), the three estimation methods under study are used. Although the simulation experiments revealed the preference of the modified maximum likelihood method, there is great convergence between the RMSE comparison-criterion values of the models estimated by the three methods.
Table (6) summarizes the estimation results. From these results, we find that the three methods produced significant estimated models, based on the F statistic at the 0.05 significance level. They also agree that there is a positive effect of the random explanatory variable RBS on the response variable PCV, and the size of this effect is similar across the three estimated models. As for the goodness-of-fit and estimation-accuracy criteria, RΒ² and RMSE indicated the preference of the modified maximum likelihood method, followed by the robust empirical likelihood method. Based on the foregoing, and relying on the estimates of the modified maximum likelihood method as the best method, the estimated random regression model is as follows:

PCV̂_i = 28.0954 + 0.05963 RBS_i    (76)
Table (7) shows the real values of the response variable PCV and its estimated values. Figure (3)
shows the graph of the real and estimated regression curve.
Table (6): The estimation results for the three estimation methods MMLE, ME, RELE.

Estimation method   Ξ²Μ‚β‚€        Ξ²Μ‚β‚        F statistic   RΒ²       RMSE
MMLE                28.0954    0.05963    21.71965*     43.68%   4.469
ME                  16.0487    0.07463    19.316*       40.82%   4.581
RELE                16.09      0.07443    19.369*       40.89%   4.578
* Significant at the 0.05 level.
Table (7): The real and estimated values of the response variable PCV.

No.  Real value  Estimated value    No.  Real value  Estimated value    No.  Real value  Estimated value
1 36 30.494 11 26 27.155 21 31 29.839
2 23 25.485 12 27 27.573 22 30 26.738
3 39 28.050 13 46 27.751 23 30 27.036
4 22 27.692 14 32 28.228 24 27 27.394
5 21 30.137 15 31 31.091 25 29 37.113
6 25 30.017 16 29 28.765 26 27 27.513
7 21 29.481 17 26 26.738 27 34 39.379
8 25 29.600 18 33 32.522 28 29 41.705
9 22 26.022 19 31 24.353 29 34 30.673
10 41 31.627 20 29 27.274 30 32 30.554
Figure (3): The graph of the real and estimated regression curve.
Conclusions:
The estimation of the simple linear regression model with a random explanatory variable, which contradicts the basic assumptions of this model, was discussed. Three robust estimation methods were used. The first is the modified maximum likelihood method, which has been applied by other researchers in estimating the coefficients of this model. The second and third methods are the M estimation method and the robust empirical likelihood method, which we employed in estimating the coefficients of the regression model under research; as far as we know, they had not been used previously for this model, but rather for estimating regression models that suffer from certain standard problems, or when the sample observations contain outliers or extreme values.
The results of the simulation experiments revealed the superiority of the modified maximum likelihood method based on the RMSE comparison criterion. These experiments also revealed the effectiveness of the M and robust empirical likelihood methods, in view of the great convergence between the estimation results of these two methods and the results of the modified maximum likelihood method. The results of the practical application agreed with the results of the simulation experiments.
Based on the foregoing, the two robust estimation methods, M and robust empirical likelihood, can be used in estimating the random regression model, as they are easier to apply than the modified maximum likelihood method and do not rely on assumptions about the probability distributions of the random error terms and the random explanatory variable.
Acknowledgement: We would like to thank Mustansiriyah University, Baghdad, Iraq (www.uomustansiriyah.edu.iq) for its support in the present work.
References
1. Kerridge, D. (1967). Errors of prediction in multiple regression with stochastic regressor
variables. Technometrics, 9(2), 309-311.
2. Lai, T. L., & Wei, C. Z. (1982). Least squares estimates in stochastic regression models with
applications to identification and control of dynamic systems. The Annals of Statistics,
10(1), 154-166.
3. Tiku, M. L., & Vaughan, D. C. (1997). Logistic and nonlogistic density functions in binary
regression with nonstochastic covariates. Biometrical Journal, 39(8), 883-898.
4. Vaughan, D. C., & Tiku, M. L. (2000). Estimation and hypothesis testing for a nonnormal
bivariate distribution with applications. Mathematical and Computer Modelling, 32(1-2), 53-
67.
5. Islam, M. Q., & Tiku, M. L. (2005). Multiple linear regression model under nonnormality.
Communications in Statistics-Theory and Methods, 33(10), 2443-2467.
6. Sazak, H. S., Tiku, M. L., & Islam, M. Q. (2006). Regression analysis with a stochastic design variable. International Statistical Review/Revue Internationale de Statistique, 77-88.
7. Islam, M. Q., & Tiku, M. L. (2010). Multiple linear regression model with stochastic design
variables. Journal of Applied Statistics, 37(6), 923-943.
8. Bharali, S., & Hazarika, J. (2018). Regression models with stochastic regressors: An expository note. Int. J. Agricult. Stat. Sci., Vol. 15, No. 2, pp. 873-880, 2019.
9. Deshpande, B., & Bhat, S. V. (2019). Random Design Kernel Regression Estimator.
International Journal of Agricultural and Statistical Sciences, 15(1), 11-17.
10. Lateef, Zainab Kareem & Ahmed, Ahmed Dheyab (2021). Use of modified maximum likelihood method to estimate parameters of the multiple linear regression model. Int. J. Agricult. Stat. Sci., Vol. 17, Supplement 1, pp. 1255-1263.
11. Anderson, C., & Schumacker, R. E. (2003). A comparison of five robust regression methods with ordinary least squares regression: Relative efficiency, bias, and test of the null hypothesis. Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 2(2), 79-103.
12. Susanti, Y., Pratiwi, H., Sulistijowati, S., & Liana, T. (2014). M estimation, S estimation, and MM estimation in robust regression. International Journal of Pure and Applied Mathematics, 91(3), 349-360.
13. Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237-249.
14. Γ–zdemir, Ş., & Arslan, O. (2020). Combining empirical likelihood and robust estimation methods for linear regression models. Communications in Statistics-Simulation and Computation, 51(3), 941-954.
Paris 2024 Olympic Geographies - an activity
Β 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
Β 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
Β 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
Β 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
Β 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
Β 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
Β 

Estimating random linear regression with robust methods

UtilitasMathematica
ISSN 0315-3681 Volume 120, 2023

Some robust methods for estimating random linear regression model

Amani Emad1, Ahmed Shaker Mohammed Tahir2
1 Mustansiriyah University, College of Management and Economics, Department of Statistics, Baghdad, Iraq, amaniemad93@uomustansiriyah.edu.iq
2 Mustansiriyah University, College of Management and Economics, Department of Statistics, Baghdad, Iraq, ahmutwali@uomustansiriyah.edu.iq

Abstract
The regression model with a random explanatory variable is widely used to represent the regression relationship between variables in economic and life phenomena; it is called the random linear regression model. In this model the different values of the explanatory variable occur with a certain probability, instead of being fixed in repeated samples, which contradicts one of the basic assumptions of the linear regression model. As a result, the least squares estimators lose some or all of their optimal properties, depending on the nature of the relationship between the random explanatory variable and the random error terms.
As an alternative to the ordinary least squares method, the modified maximum likelihood method, which has been used by many researchers, was applied to estimate the coefficients of the random regression model. Two further methods were also employed, the M method and the robust empirical likelihood method, which have not previously been used for estimating the coefficients of the random regression model, but have been used for linear regression models that suffer from some standard problems, or whose samples contain outliers or extreme values. All three are robust estimation methods.
For the purpose of comparison between the aforementioned estimation methods, simulation experiments were conducted, which revealed the superiority of the modified maximum likelihood method despite the close agreement between the estimation results of the three methods. We also carried out a practical application, estimating the regression relationship between packed cell volume (PCV) as a response variable and random blood sugar (RBS) as a random explanatory variable, based on data from a random sample of 30 heart disease patients. The results of the practical application were consistent with those of the simulation experiments.

Keywords: random explanatory variable; random linear regression; modified maximum likelihood; M method; robust empirical likelihood.

Introduction
One of the basic assumptions of the linear regression model is that the explanatory variables are constant in repeated samples. However, this assumption is often unfulfilled in practice, as in most economic and life phenomena the explanatory variables are random, in the sense that their values are not fixed, so the different values of the explanatory
variables occur with a certain probability. In such cases the values of the explanatory variable are determined along with the values of the response variable by some probability mechanism, rather than being controlled by the experimenter. If the method of least squares is adopted to estimate the coefficients of a regression model with random explanatory variables, its estimates lose some or all of their optimal properties, depending on the type of relationship between the explanatory variables and the random error terms. In this case, alternative methods should be sought that are efficient and not greatly affected by this deviation from the assumptions of the linear regression model. Among these are the robust methods, including the modified maximum likelihood (MMLE) method, the M method, and the robust empirical likelihood (REL) method.

Kerridge [1] was the first to deal with the regression model with random explanatory variables, in 1967, assuming that the values of the explanatory variables corresponding to the values of the response variable were drawn from a population with a multivariate normal distribution, independent of the random error terms, and he analyzed the resulting estimation error. He showed that it is useful to study the regression model with random explanatory variables, despite Ehrenberg's suggestion in 1963 that studying this model is useless and of limited value, especially when the number of explanatory variables is large.

Lai and Wei, 1981, [2] studied the properties of the least squares estimators for random regression models under special assumptions on the random explanatory variable and the random error terms.
Tiku and Vaughan, 1997, [3] used the modified maximum likelihood method to estimate the coefficients of the bivariate regression model with a random explanatory variable. They showed that the maximum likelihood estimators for this model are intractable and difficult to obtain iteratively, whereas the modified maximum likelihood estimators are explicit functions of the sample observations and therefore easy to calculate; they are asymptotically efficient and, for small samples, almost fully efficient.

Vaughan and Tiku, 2000, [4] also used the modified maximum likelihood method to estimate the regression model with a random explanatory variable, assuming that this variable follows the extreme value distribution and that the conditional distribution of the response variable is normal, and showed that the resulting estimates are highly efficient. They also derived hypothesis tests for the model coefficients.

Islam and Tiku, 2004, [5] derived modified maximum likelihood estimators and M estimators for the coefficients of the multiple linear regression model under the assumption that the random errors do not follow the normal distribution, concluding that these estimators are efficient and robust, and that the ordinary least squares estimators are less efficient.

Sazak et al., 2006, [6] used modified maximum likelihood estimators, which are characterized by efficiency and robustness, for the coefficients of the random linear regression model, assuming that both the random explanatory variable and the random errors follow a generalized logistic distribution. The same result was reached by Islam and Tiku, 2010, [7], who adopted the same robust method to estimate the multiple linear regression model when neither the explanatory variable nor the random errors follow the normal distribution.
Bharali and Hazarika, 2019, [8] provided a historical review of the work related to regression with a stochastic explanatory variable, referring to the properties of the ordinary least
squares estimators, the use of the modified maximum likelihood method, and the case of non-normal distributions of the explanatory variables and the random error terms.

Deshpande and V. Bhat, 2019, [9] used a kernel regression estimator, one of the nonparametric estimation methods, to estimate the random linear regression model. They suggested using an adaptive Nadaraya-Watson estimator, obtained its asymptotic distribution, and showed through relative efficiency comparisons that this estimator is more efficient than estimators based on other kernel functions.

Lateef and Ahmed, 2021, [10] used the modified maximum likelihood method to estimate the parameters of the multiple regression model when the random errors are non-normally distributed; through a simulation study they demonstrated the performance of the modified maximum likelihood estimators compared with the ordinary least squares estimators.

In this research, the modified maximum likelihood method is used to estimate the coefficients of the linear regression model with a random explanatory variable, together with two further robust methods, the M method and the robust empirical likelihood method. These two methods have been used to estimate regression coefficients in the presence of outliers or when the model suffers from certain problems, but they have not previously been applied to the regression model with a random explanatory variable, so they are employed here for that purpose. The research aims to compare the aforementioned robust estimation methods by the mean square error criterion, using simulation experiments together with an application to real data.

1.
Linear regression model with a random explanatory variable:
Assume the simple linear regression model

Y_i = β0 + β1 X_i + u_i   (1)

For a random sample of size n, Y_i is the response variable, a random variable with independent values assumed to be normally distributed with mean E(Y_i) and variance σ²; X_i is the explanatory variable, assumed constant in repeated samples; and u_i is the random error of the model, with values assumed independent and normally distributed with mean 0 and fixed variance σ². In many practical applications the explanatory variable is not fixed but is itself a random variable; in this case the regression model of equation (1) is called the random regression model [7]. The random explanatory variable may follow a normal distribution or any other probability distribution. Assume that the probability distribution of the explanatory variable X is the extreme value distribution with the following probability density function [4]:

f(x) = (1/θ) exp[−(x − α)/θ − exp(−(x − α)/θ)],  −∞ < x < ∞   (2)

where α is the location parameter and θ is the scale parameter; when α = 0 and θ = 1 we obtain the standard extreme value distribution. The conditional probability distribution of the response variable Y is the normal distribution, with the following conditional probability function:
f(y|x) = [2πσ²(1 − ρ²)]^(−1/2) exp[−(1/(2σ²(1 − ρ²))) (y − β0 − ρ(σ/θ)(x − α))²],  −∞ < y < ∞   (3)

where β0 ∈ R, β1 = ρσ/θ, σ > 0, −1 < ρ < 1, and u = y − β0 − ρ(σ/θ)(x − α) represents the random error term, which is normally distributed with conditional mean zero, E(u|X = x) = 0, and fixed conditional variance Var(u|X = x) = σ²(1 − ρ²) [6]. Note that

E(Y|X = x) = β0 + ρ(σ/θ)(x − α)   (4)

2. The robust estimation methods of the coefficients of the random linear regression model:
This section presents the robust estimation methods under study, which were adopted in estimating the simple linear regression model with a random explanatory variable.

2.1: Modified Maximum Likelihood Method (MMLE):
Estimating the coefficients of the regression model (1) by the maximum likelihood method, based on the probability functions (2) and (3), is difficult and impractical: the equations resulting from differentiating the likelihood function have no explicit solutions, and solving them by iterative methods is fraught with difficulties, since the iterations may converge slowly, converge to false values, or encounter multiple roots. Therefore the modified maximum likelihood method proposed by Tiku and Vaughan [3,4] is used.
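To make the data-generating process of equations (2)–(4) concrete before turning to estimation, the short sketch below (our own illustration with assumed parameter values, not the authors' code) draws X by inverting the extreme value CDF and then draws Y from the conditional normal distribution (3); the population regression slope is β1 = ρσ/θ.

```python
import math
import random

def simulate_random_regressor(n, beta0=1.0, rho=0.6, sigma=1.0,
                              alpha=0.0, theta=1.0, seed=1):
    """Draw (x, y) pairs from the joint model of equations (2)-(4)."""
    rng = random.Random(seed)
    cond_sd = sigma * math.sqrt(1.0 - rho ** 2)       # sd of u given X = x
    xs, ys = [], []
    for _ in range(n):
        u = rng.random()
        x = alpha - theta * math.log(-math.log(u))            # inverse CDF of (2)
        mean_y = beta0 + rho * (sigma / theta) * (x - alpha)  # E(Y|X=x), equation (4)
        xs.append(x)
        ys.append(rng.gauss(mean_y, cond_sd))
    return xs, ys
```

With these assumed values, the least squares slope computed from a large simulated sample should be close to β1 = ρσ/θ = 0.6, even though X itself is random.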
The joint probability density function of the response variable Y and the explanatory variable X is

f(y, x) = f(x) f(y|x) = (1/θ)(1/σ)[2π(1 − ρ²)]^(−1/2) exp[−(x − α)/θ − exp(−(x − α)/θ)] exp[−u²/(2σ²(1 − ρ²))]   (5)

so the likelihood function takes the form

L = θ^(−n) σ^(−n) (2π)^(−n/2) (1 − ρ²)^(−n/2) exp[−Σ((x_i − α)/θ + exp(−(x_i − α)/θ)) − (1/(2σ²(1 − ρ²))) Σ u_i²]   (6)

With z_i = (x_i − α)/θ, the logarithm of the likelihood function is

ln L = −(n/2) ln(2π) − n ln θ − n ln σ − (n/2) ln(1 − ρ²) − Σ(z_i + exp(−z_i)) − (1/(2σ²(1 − ρ²))) Σ u_i²   (7)

where all sums run over i = 1, …, n. Partially differentiating equation (7) with respect to the parameters (α, θ, β0, σ, ρ, β1) and setting the derivatives equal to zero gives the following likelihood equations, whose solution yields the maximum likelihood estimates:

∂lnL/∂α = n/θ − (1/θ) Σ exp(−z_i) − (ρ/(θσ(1 − ρ²))) Σ u_i = 0   (8)
∂lnL/∂θ = −n/θ + (1/θ) Σ z_i − (1/θ) Σ z_i exp(−z_i) − (ρ/(θσ(1 − ρ²))) Σ z_i u_i = 0   (9)

∂lnL/∂β0 = (1/(σ²(1 − ρ²))) Σ u_i = 0   (10)

∂lnL/∂σ = −n/σ + (1/(σ³(1 − ρ²))) Σ u_i² + (ρ/(σ²(1 − ρ²))) Σ z_i u_i = 0   (11)

∂lnL/∂ρ = nρ/(1 − ρ²) − (ρ/(σ²(1 − ρ²)²)) Σ u_i² + (1/(σ(1 − ρ²))) Σ z_i u_i = 0   (12)

∂lnL/∂β1 = (θ/(σ²(1 − ρ²))) Σ z_i u_i = 0   (13)

Equations (8) and (9) contain intractable terms, so the system (8–13) has no explicit solutions and must be solved by iterative methods; the maximum likelihood estimates are therefore difficult to obtain. To address this problem, the values of the explanatory variable are converted into order statistics x_(i), 1 ≤ i ≤ n, by arranging them in ascending order:

x_(1) ≤ x_(2) ≤ … ≤ x_(n)   (14)

Let y_[i] denote the value of the response variable corresponding to the ordered value x_(i); the standardized values z_i and the random errors u_i are rewritten accordingly as

z_(i) = (x_(i) − α)/θ   (15)

u_[i] = y_[i] − β0 − ρ(σ/θ)(x_(i) − α)   (16)

Since sums are unaffected by the ordering of their terms, equations (10) and (13) imply that Σ u_[i] = 0 and Σ z_(i) u_[i] = 0. Accordingly, equations (8–13) are rewritten as follows:

∂lnL*/∂α ≡ n/θ − (1/θ) Σ exp(−z_(i)) = 0   (17)

∂lnL*/∂θ ≡ −n/θ + (1/θ) Σ z_(i) − (1/θ) Σ z_(i) exp(−z_(i)) = 0   (18)

∂lnL*/∂β0 ≡ (1/(σ²(1 − ρ²))) Σ u_[i] = 0   (19)

∂lnL*/∂σ ≡ −n/σ + (1/(σ³(1 − ρ²))) Σ u_[i]² = 0   (20)
∂lnL*/∂ρ ≡ nρ/(1 − ρ²) − (ρ/(σ²(1 − ρ²)²)) Σ u_[i]² = 0   (21)

∂lnL*/∂β1 ≡ (θ/(σ²(1 − ρ²))) Σ z_(i) u_[i] = 0   (22)

Equations (17–22) represent the modified maximum likelihood equations. To linearize exp(−z_(i)), the first two terms of its Taylor expansion about t_(i) = E[z_(i)] can be used:

exp(−z_(i)) ≅ exp(−t_(i)) + [z_(i) − t_(i)] {d/dz e^(−z)}_{z=t_(i)}   (23)

which can be rewritten as

exp(−z_(i)) ≅ a_i − b_i z_(i),  1 ≤ i ≤ n   (24)

where

a_i = e^(−t_(i)) (1 + t_(i))   (25)

b_i = −{d/dz e^(−z)}_{z=t_(i)} = e^(−t_(i))   (26)

noting that b_i ≥ 0 for every i = 1, 2, …, n. Substituting equation (24) into equations (17) and (18) gives

∂lnL*/∂α ≡ n/θ − (1/θ) Σ (a_i − b_i z_(i)) = 0   (27)

∂lnL*/∂θ ≡ −n/θ + (1/θ) Σ z_(i) − (1/θ) Σ z_(i)(a_i − b_i z_(i)) = 0   (28)

Solving equations (27), (28) and (19–22) yields the so-called modified maximum likelihood estimates. For large samples these estimates are unbiased, their variance attains the minimum variance bound (MVB), and they are fully efficient; for small samples they are almost fully efficient. They are explicit functions of the sample observations and are easy to compute [4].
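The linearization coefficients of equations (25)–(26) are straightforward to compute once the expected order statistics t_(i) are available; a common working approximation (our assumption here, not stated in the paper) replaces t_(i) by the standard extreme value quantile at p = i/(n+1). A minimal sketch:

```python
import math

def taylor_coeffs(n):
    """a_i and b_i of equations (25)-(26), with t_(i) approximated by
    the standard extreme value quantile at p = i/(n+1)."""
    t = [-math.log(-math.log(i / (n + 1.0))) for i in range(1, n + 1)]
    a = [math.exp(-ti) * (1.0 + ti) for ti in t]   # equation (25)
    b = [math.exp(-ti) for ti in t]                # equation (26)
    return t, a, b
```

At z = t_(i) the linearization a_i − b_i z reproduces exp(−z) exactly, which is a convenient sanity check on the coefficients.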
From equation (27), after substituting z_(i) = (x_(i) − α)/θ, we get the estimating equation for α:

n/θ − (1/θ) Σ [(θ a_i − b_i (x_(i) − α))/θ] = 0   (29)

and after some algebra we obtain the estimate of α:

α̂ = [Σ b_i x_(i) + θ̂ Σ (1 − a_i)] / Σ b_i   (30)

which can be rewritten as

α̂ = K + D θ̂   (31)

where K = Σ b_i x_(i) / Σ b_i and D = Σ (1 − a_i) / Σ b_i.

The estimate of θ is obtained from equation (28) after substituting the value of z_(i) and performing some algebra, as follows:
Σ b_i (x_(i) − α̂)² + θ̂ Σ (1 − a_i)(x_(i) − α̂) − n θ̂² = 0   (32)

Replacing α̂ by its expression K + Dθ̂ from equation (31), and noting that the terms involving D cancel because Σ b_i (x_(i) − K) = 0, equation (32) becomes

n θ̂² − θ̂ Σ (x_(i) − K)(1 − a_i) − Σ b_i (x_(i) − K)² = 0   (33)

which can be written more simply as

A θ̂² − B θ̂ − C = 0   (34)

where A = n, B = Σ (x_(i) − K)(1 − a_i), C = Σ b_i (x_(i) − K)². The positive root of equation (34) is the estimate of θ:

θ̂ = [B + √(B² + 4AC)] / (2A)   (35)

To obtain an estimate of β0, we substitute the value of u_[i] into equation (19):

(1/(σ²(1 − ρ²))) Σ [y_[i] − β̂0 − ρ̂ (σ̂/θ̂)(x_(i) − α̂)] = 0

so that

β̂0 = ȳ − ρ̂ (σ̂/θ̂)(x̄ − α̂)   (36)

To find an estimate of σ, we rewrite u_[i] by substituting β̂0 from equation (36) into equation (16):

u_[i] = (y_[i] − ȳ) − ρ̂ (σ̂/θ̂)(x_(i) − x̄)   (37)

Squaring both sides and summing over the sample gives

Σ u_[i]² = Σ [(y_[i] − ȳ) − ρ̂ (σ̂/θ̂)(x_(i) − x̄)]²   (38)

Substituting equation (38) into equation (20) and simplifying, we obtain

Σ (y_[i] − ȳ)² − 2 ρ̂ (σ̂/θ̂) Σ (y_[i] − ȳ)(x_(i) − x̄) + ρ̂² (σ̂/θ̂)² Σ (x_(i) − x̄)² − n σ̂² + n σ̂² ρ̂² = 0   (39)

Rewriting the last equation in terms of the sample variances and covariance of y_[i] and x_(i), and using ρ̂ σ̂/θ̂ = s_xy/s_x² and ρ̂² = θ̂² s_xy²/(σ̂² s_x⁴) from equation (46), gives

n s_y² − 2n (s_xy²/s_x²) + n (s_xy²/s_x²) − n σ̂² + n (θ̂² s_xy²/s_x⁴) = 0   (40)

Solving the last equation for σ̂ gives

σ̂ = [s_y² + (s_xy²/s_x²)(θ̂²/s_x² − 1)]^(1/2)   (41)

To obtain an estimate of β1, we substitute the values of u_[i] and z_(i) into equation (22) and simplify:

Σ (x_(i) − α̂)[(y_[i] − β̂0) − β̂1 (x_(i) − α̂)] = 0   (42)

Solving this equation gives the estimate of β1:
β̂1 = Σ (x_(i) − α̂)(y_[i] − β̂0) / Σ (x_(i) − α̂)²   (43)

The estimate of the correlation parameter ρ is obtained from equation (21) after substituting the sum of squared random errors u_[i] and rewriting it in terms of the sample variances and covariance of y_[i] and x_(i):

[(σ̂/θ̂)² S_x² + σ̂²] ρ̂² − 2 (σ̂/θ̂) S_xy ρ̂ − (S_xy²/S_x²)(θ̂²/S_x² − 1) = 0   (44)

which can be rewritten as

a ρ̂² + b ρ̂ + c = 0   (45)

where a = (σ̂/θ̂)² S_x² + σ̂², b = −2 (σ̂/θ̂) S_xy, c = −(S_xy²/S_x²)(θ̂²/S_x² − 1). The admissible root of equation (45) gives the estimate of ρ:

ρ̂ = θ̂ S_xy / (σ̂ S_x²)   (46)

The modified maximum likelihood estimates were derived assuming that the probability distribution of the explanatory variable X is the extreme value distribution, but the derivation can be repeated for any other probability distribution imposed by the data under study.

2.2: M-estimation Method:
This method is one of the robust methods commonly used for estimating regression coefficients when the model suffers from basic problems, in particular when the data contain outliers or extreme values. In this research the method is used to estimate the coefficients of a regression model with one or more random explanatory variables; as far as we know, it has not previously been used for such models. The method is an extension of the maximum likelihood method and is denoted by the symbol M to indicate that its estimates are of the maximum likelihood type.
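Before turning to the details of the M method, the modified maximum likelihood computation of the previous subsection can be collected into a single closed-form sketch. The code below is our own illustration, not the authors' code: it approximates t_(i) by extreme value quantiles at i/(n+1) and takes β̂1 = ρ̂σ̂/θ̂, combining equations (30), (35), (36), (41) and (46).

```python
import math

def mmle_fit(xs, ys):
    """Closed-form MMLE estimates (alpha, theta, beta0, beta1, sigma, rho)
    for the random linear regression model, following equations (30)-(46)."""
    n = len(xs)
    pairs = sorted(zip(xs, ys))                   # order x, carry concomitant y
    x = [p[0] for p in pairs]
    y = [p[1] for p in pairs]
    t = [-math.log(-math.log(i / (n + 1.0))) for i in range(1, n + 1)]
    a = [math.exp(-ti) * (1.0 + ti) for ti in t]  # equation (25)
    b = [math.exp(-ti) for ti in t]               # equation (26)
    sb = sum(b)
    K = sum(bi * xi for bi, xi in zip(b, x)) / sb
    D = sum(1.0 - ai for ai in a) / sb
    # theta-hat from the quadratic A*th^2 - B*th - C = 0, equation (35)
    A = n
    B = sum((xi - K) * (1.0 - ai) for xi, ai in zip(x, a))
    C = sum(bi * (xi - K) ** 2 for bi, xi in zip(b, x))
    theta = (B + math.sqrt(B * B + 4.0 * A * C)) / (2.0 * A)
    alpha = K + D * theta                          # equation (31)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    syy = sum((yi - my) ** 2 for yi in y) / n
    sigma = math.sqrt(syy + (sxy ** 2 / sxx) * (theta ** 2 / sxx - 1.0))  # (41)
    rho = theta * sxy / (sigma * sxx)              # equation (46)
    beta1 = rho * sigma / theta                    # beta1 = rho*sigma/theta
    beta0 = my - beta1 * (mx - alpha)              # equation (36)
    return alpha, theta, beta0, beta1, sigma, rho
```

Note that ρ̂ as computed here always lies strictly inside (−1, 1), because σ̂² in equation (41) exceeds θ̂² s_xy²/s_x⁴ by the nonnegative quantity s_y² − s_xy²/s_x².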
The idea of this method is to minimize a function of the random errors instead of minimizing the random errors themselves; that is, the M estimator β̂_M is obtained by minimizing the following objective function [11,12]:

β̂_M = argmin_β Σ ρ(u_i)   (47)

where ρ(u_i) is the objective function, a function of the random errors of the regression model. Replacing the random errors u_i by the standardized errors, the objective function (47) can be rewritten as

β̂_M = argmin_β Σ ρ((Y_i − β0 − β1 X_i)/σ)   (48)

The function ρ of the standardized errors takes several forms, such as Huber's formula and Tukey's bisquare formula, shown in Table (1). Differentiating the objective function in equation (48) and setting the derivative equal to zero, we get

Σ X_ij ψ((Y_i − β0 − β1 X_i)/σ̂) = 0,  j = 0, 1, …, k   (49)
where ψ is the first derivative of the function ρ, X_ij is observation i of explanatory variable j, and X_i0 = 1 so that the intercept is included in the model. Equation (49) is solved by the method of iteratively reweighted least squares (IRLS) using the weights shown in the following equation [12]:

w_i = w(u_i) = ψ((Y_i − β0 − β1 X_i)/σ̂) / ((Y_i − β0 − β1 X_i)/σ̂)   (50)

The weights w_i are shown in Table (1) for each of the two forms of the function ρ. The estimate of the scale parameter σ is obtained from the following formula [11]:

σ̂ = Median|u_i − Median(u)| / 0.6745   (51)

Using the weights w_i, equation (49) can be rewritten as

Σ X_ij w_i ((Y_i − β0 − β1 X_i)/σ̂) = 0,  j = 0, 1, …, k   (52)

With the IRLS method, equation (52) is solved by assuming initial values for the regression coefficients β̂ and for the scale estimate σ̂. In matrix form, equation (52) yields the following system of weighted least squares equations:

X′WXβ̂ = X′WY   (53)

where W is an (n × n) diagonal matrix whose diagonal elements are the weights w_i. Solving equation (53) gives the M estimate of the regression coefficients, as shown in the following formula [11]:
β̂_M = (X′WX)^(−1) X′WY   (54)

Table (1): Huber and Tukey's bisquare formulas for the function ρ

Huber:
  Objective function: ρ(u) = u²/2 if |u| ≤ a; a|u| − a²/2 if |u| > a
  Weight function: w(u) = 1 if |u| ≤ a; a/|u| if |u| > a
  Tuning constant: a = 1.345

Tukey's bisquare:
  Objective function: ρ(u) = (a²/6)[1 − (1 − (u/a)²)³] if |u| ≤ a; a²/6 if |u| > a
  Weight function: w(u) = (1 − (u/a)²)² if |u| ≤ a; 0 if |u| > a
  Tuning constant: a = 4.685

2.3: Robust Empirical Likelihood Method (REL):
The empirical likelihood method is a nonparametric estimation method proposed by Owen, 1988, [13] as an alternative to the maximum likelihood method when no assumptions are made about the probability distribution of the random error term of the regression model. The method assumes probability weights p_i for the sample observations (i = 1, 2, …, n) and aims to estimate the regression coefficients by maximizing the empirical likelihood function, defined as the product of those probability weights, subject to some
  • 10. UtilitasMathematica ISSN 0315-3681 Volume 120, 2023 481 restrictions related to the model coefficients that are to be estimated, , these constraints are similar to the natural equations of the method of least squares or to the equations of the likelihood under the hypothesis of normal distribution, which leads to this method being very sensitive to the non-fulfillment of the hypothesis of a normal distribution or the presence of outliers or extreme values, therefore, Ozdemir and Arslan, 2018 , [14] suggested the use of robust constraints borrowed from the objective function of the robust estimation method M, in the sense of reconciling the empirical likelihood method with the robust M method. The empirical likelihood method will first be presented as an introduction to presenting the robust empirical likelihood method . For the linear regression model shown in equation (1) and assuming 𝑝𝑖 for each (i = 1,2,…,n), represented the probability weights for each observation of the sample, which is unknown and has different values for each observation and needs to be estimated, whereas, 𝑝𝑖 β‰₯ 0, the empirical likelihood method used to estimate the vector of model coefficients 𝛽 and the population variance 𝜎2 requires the maximization of the empirical likelihood function under the availability of some constraints on the maximization process, as shown below: 𝐿𝐸𝐿(𝛽, 𝜎2) = ∏ 𝑝𝑖 𝑛 𝑖=1 m (55) Under the following constraints: βˆ‘ 𝑝𝑖 = 1 𝑛 𝑖=1 (56) βˆ‘ 𝑝𝑖 ( π‘Œπ‘– βˆ’ 𝛽0 βˆ’ 𝛽1𝑋𝑖 𝑛 𝑖=1 )𝑋𝑖 = 0 (57) βˆ‘ 𝑝𝑖 [(π‘Œπ‘– βˆ’ 𝛽0 βˆ’ 𝛽1𝑋𝑖)2 βˆ’ 𝜎2 ] = 0 𝑛 𝑖=1 (58) By taking the logarithm of both sides of equation (55), we get the logarithm of the empirical likelihood function as in the following equation: πΏπ‘œπ‘”πΏ(𝛽, 𝜎2) = πΏπ‘œπ‘” ∏ 𝑝𝑖 𝑛 𝑖=1 = βˆ‘ πΏπ‘œπ‘” 𝑛 𝑖=1 (𝑝𝑖) (59) Estimating the probability weights 𝑝𝑇 = [𝑝1 𝑝2 … 𝑝𝑛] and the coefficients vector 𝛽𝑇 = [𝛽0 𝛽1] and the variance 𝜎2 , can be obtained by maximizing the logarithm of the empirical likelihood 
function subject to the three constraints (56–58). This maximization problem can be solved by the method of Lagrange multipliers, which requires forming the Lagrange function

L(p, β, λ0, λ1, λ2) = Σ Log(p_i) − λ0 (Σ p_i − 1) − n λ1 Σ p_i (Y_i − X_i^T β) X_i − n λ2 Σ p_i [(Y_i − X_i^T β)² − σ²]   (60)

where λ0, λ1, λ2 ∈ R are the Lagrange multipliers. The estimation process first requires estimating the vector of probability weights, by differentiating the Lagrange function (60) with respect to each p_i and equating the derivative to zero, which gives the following formula for the weights:
  • 11. UtilitasMathematica ISSN 0315-3681 Volume 120, 2023 482 𝑝𝑖 = 1 πœ†0+π‘›πœ†1 (π‘Œπ‘–βˆ’π‘‹π‘– 𝑇 𝛽)𝑋𝑖+ π‘›πœ†2[(π‘Œπ‘–βˆ’π‘‹π‘– 𝑇𝛽)2βˆ’πœŽ2] π‘“π‘œπ‘Ÿ 𝑖 = 1,2, … , 𝑛 (61) By taking the sum of both sides of equation (61) for all values of i, we get that πœ†0 = 𝑛 , so we can rewrite this equation after substituting for πœ†0 as follows: 𝑝𝑖 = 1 𝑛( 1+πœ†1(π‘Œπ‘–βˆ’π‘‹π‘– 𝑇 𝛽)𝑋𝑖+ πœ†2[(π‘Œπ‘–βˆ’π‘‹π‘– 𝑇𝛽)2βˆ’πœŽ2] π‘“π‘œπ‘Ÿ 𝑖 = 1,2, … , 𝑛 (62) By substituting for 𝑝𝑖 according to the formula (62) in the empirical likelihood function (60), we get the following objective function in terms of 𝛽, πœ†1, πœ†2, 𝜎2 only: 𝐿 (𝛽, πœ†1, πœ†2 , 𝜎2 ) = βˆ’ βˆ‘ πΏπ‘œπ‘” ( 𝑛 𝑖=1 1 + πœ†1 (π‘Œπ‘– βˆ’ 𝑋𝑖 𝑇 𝛽) 𝑋𝑖 + πœ†2 [(π‘Œπ‘– βˆ’ 𝑋𝑖 𝑇 𝛽)2 βˆ’ 𝜎2 ] βˆ’ π‘›πΏπ‘œπ‘”π‘› (63) The values of the Lagrange multiplier πœ†1 and πœ†2 for each of the regression model coefficients vector 𝛽 and 𝜎2 can be obtained through the following maximization problem: πœ† Μ‚ (𝛽, 𝜎2 ) = π‘Žπ‘Ÿπ‘” min πœ† [βˆ’ βˆ‘ πΏπ‘œπ‘” ( 𝑛 𝑖=1 πœ†0 + πœ†1(π‘Œπ‘– βˆ’ 𝑋𝑖 𝑇 𝛽)𝑋𝑖 + πœ†2[(π‘Œπ‘– βˆ’ 𝑋𝑖 𝑇 𝛽)2 βˆ’ 𝜎2 ] βˆ’ π‘›πΏπ‘œπ‘”π‘›] (64) Where: πœ† = [πœ†1 πœ†2] . Noting that the problem of minimization (64) has no explicit solutions, which requires numerical methods to find solutions to this problem, by substituting those solutions in 𝐿 (𝛽, πœ†1, πœ†2 , 𝜎2 ) we get the logarithm of the empirical likelihood function 𝐿 (πœ† Μ‚ (𝛽, 𝜎2 ) , 𝛽 , 𝜎2 ), this function is in terms of the vector of the coefficients 𝛽 and the variance 𝜎2 , and the estimation of the empirical maximum likelihood for those parameters is obtained by solving the following maximization problem: (𝛽 Μ‚, 𝜎 Μ‚2 ) = π‘Žπ‘Ÿπ‘” max (𝛽,𝜎2) 𝐿 (πœ† Μ‚(𝛽, 𝜎2) , 𝛽 , 𝜎2 ) (65) The last maximization problem is solved using numerical methods because there are no explicit solutions for it. 
We showed previously that the robust empirical likelihood method results from replacing the constraints (56)–(58) with robust constraints borrowed from the objective function of the robust M estimation method given in formula (49). This modification was adopted to increase the effectiveness of the empirical likelihood method in limiting the effect of outliers on the estimation of the model's coefficients, and here we employ it to estimate the coefficients of the linear regression model when the explanatory variable is random. The empirical likelihood function shown in relation (55) is maximized after replacing the second and third constraints, relations (57) and (58), with the following robust constraints [14]:

\sum_{i=1}^{n} p_i \, \varphi(Y_i - \beta_0 - \beta_1 X_i) X_i = 0   (66)

\sum_{i=1}^{n} p_i \, \varphi\left[ (Y_i - \beta_0 - \beta_1 X_i)^2 - \sigma^2 \right] = 0   (67)

where φ is a function of the random errors, monotone in the case of Huber's function and redescending in the case of Tukey's function. The robust estimate for each of the model
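The two φ functions named above can be sketched as follows (a minimal illustration; the tuning constants c = 1.345 for Huber and c = 4.685 for Tukey are common defaults, assumed here rather than taken from the paper):

```python
def huber_psi(u, c=1.345):
    """Huber's psi: linear near zero, clipped at +/- c (monotone, non-decreasing)."""
    return max(-c, min(c, u))

def tukey_psi(u, c=4.685):
    """Tukey's biweight psi: redescending -- returns to zero for |u| > c."""
    if abs(u) > c:
        return 0.0
    t = 1.0 - (u / c) ** 2
    return u * t * t

print(huber_psi(0.5))    # 0.5   (inside the linear zone)
print(huber_psi(3.0))    # 1.345 (clipped: large residuals are bounded)
print(tukey_psi(10.0))   # 0.0   (outlier fully downweighted)
```

This is why the robust constraints (66)–(67) dampen the influence of outlying observations: the residual enters through φ, which is bounded (Huber) or vanishes entirely (Tukey) for extreme values.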
coefficients and the variance is obtained by solving the corresponding maximization problem using the Lagrange function:

L(p, \beta, \lambda_0, \lambda_1, \lambda_2) = \sum_{i=1}^{n} \log(p_i) - \lambda_0 \left( \sum_{i=1}^{n} p_i - 1 \right) - n\lambda_1 \sum_{i=1}^{n} p_i \, \varphi(Y_i - X_i^T \beta) X_i - n\lambda_2 \sum_{i=1}^{n} p_i \, \varphi\left[ (Y_i - X_i^T \beta)^2 - \sigma^2 \right]   (68)

Following the same steps as in the empirical likelihood method, the estimation process first requires estimating the vector of probability weights by taking the derivative of the Lagrange function (68) with respect to each p_i and setting it equal to zero, which gives:

p_i = \frac{1}{\lambda_0 + n\lambda_1 \varphi(Y_i - X_i^T \beta) X_i + n\lambda_2 \varphi\left[ (Y_i - X_i^T \beta)^2 - \sigma^2 \right]}, \quad i = 1, 2, \ldots, n   (69)

Summing both sides of equation (69) over all values of i again gives λ0 = n, so after substituting for λ0:

p_i = \frac{1}{n \left( 1 + \lambda_1 \varphi(Y_i - X_i^T \beta) X_i + \lambda_2 \varphi\left[ (Y_i - X_i^T \beta)^2 - \sigma^2 \right] \right)}, \quad i = 1, 2, \ldots, n   (70)

Substituting (70) for p_i in the empirical likelihood function (68) gives the following objective function in terms of β, λ1, λ2, σ² only:

L(\beta, \lambda_1, \lambda_2, \sigma^2) = - \sum_{i=1}^{n} \log \left( 1 + \lambda_1 \varphi(Y_i - X_i^T \beta) X_i + \lambda_2 \varphi\left[ (Y_i - X_i^T \beta)^2 - \sigma^2 \right] \right) - n \log n   (71)

For a given coefficient vector β and variance σ², the values of the Lagrange multipliers λ1 and λ2 are obtained through the following minimization problem:

\hat{\lambda}(\beta, \sigma^2) = \arg\min_{\lambda} \left[ - \sum_{i=1}^{n} \log \left( 1 + \lambda_1 \varphi(Y_i - X_i^T \beta) X_i + \lambda_2 \varphi\left[ (Y_i - X_i^T \beta)^2 - \sigma^2 \right] \right) - n \log n \right]   (72)

The minimization problem (72) has no explicit solution, which requires numerical methods to find solutions to this
problem. Substituting these solutions into L(β, λ1, λ2, σ²) gives the logarithm of the empirical likelihood function L(λ̂(β, σ²), β, σ²), a function of the coefficient vector β and the variance σ² only, and the robust empirical maximum likelihood estimates of those parameters are obtained by solving the following maximization problem:

(\hat{\beta}, \hat{\sigma}^2) = \arg\max_{(\beta, \sigma^2)} L(\hat{\lambda}(\beta, \sigma^2), \beta, \sigma^2)   (73)

This last maximization problem is likewise solved with numerical methods because it has no explicit solution.

4. Simulation experiments:
Simulation experiments were conducted to compare the robust estimation methods under study and identify the best of them, using the mean squared error (MSE) as the comparison criterion. The R 4.2.1 statistical programming language was used to write the simulation program, which comprises four basic stages for estimating the regression model, as follows:
4.1: Determining the default values:
The default values of the parameters were determined based on the real data adopted in the practical application. The real data of the random explanatory variable, the patient's random blood sugar (RBS), which follows the extreme value distribution, were used to determine the default values of the two parameters of that distribution, as shown in Table (1). Likewise, based on the real data of the response variable, packed cell volume (PCV), and the aforementioned explanatory variable, the default values of the two parameters of the simple linear regression model in formula (1) were determined, also shown in Table (1). The values (0.3, 0.5, 0.7) were chosen as default values for the correlation coefficient ρ between the random explanatory variable and the random error terms. Three different sample sizes were adopted (30, 70, 150), and each experiment was repeated 1000 times.

Table (1): The default values for the two parameters of the extreme value distribution and for the parameters of the simple linear random regression model.
θ:   40    41    42    43    44
α:   135   145   155   160   175
β0:  16    17    18    19    20
β1:  0.04  0.05  0.06  0.07  0.08

4.2: Data generation:
Based on the default values of the two parameters of the extreme value distribution, the values of the random explanatory variable that follows this distribution were generated according to formula (2). The values of the random error terms were generated assuming that they follow a normal distribution with mean zero and variance σ²(1 − ρ²).
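The generation step can be sketched as follows (our own illustrative code, not the paper's R program; one default combination from Table (1) is used, and the value of σ² is an assumption here, since the paper derives it from θ, β1 and ρ). NumPy's Gumbel sampler parameterizes the extreme value distribution by a location (α) and a scale (θ), matching the fitted values reported later in Table (5):

```python
import numpy as np

rng = np.random.default_rng(1)

n, rho = 30, 0.5                 # sample size and correlation (Table 1 values)
alpha, theta = 155.0, 42.0       # extreme value location and scale (combination C)
beta0, beta1 = 18.0, 0.06        # regression coefficients (combination C)
sigma2 = 4.0                     # error variance (assumed for illustration)

# X ~ extreme value (Gumbel) with location alpha and scale theta, formula (2)
x = rng.gumbel(loc=alpha, scale=theta, size=n)

# errors ~ N(0, sigma^2 (1 - rho^2))
e = rng.normal(loc=0.0, scale=np.sqrt(sigma2 * (1.0 - rho ** 2)), size=n)

# response from the simple linear model, formula (1)
y = beta0 + beta1 * x + e
print(y.shape)  # (30,)
```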
The values of σ² were determined based on the default values of the parameters θ and β1 and the correlation coefficient ρ. The generation process was carried out for the three sample sizes mentioned previously; then, based on the values of the random explanatory variable and the random error terms, the values of the response variable were generated from the simple linear regression model according to formula (1).

4.3: Estimation and comparison:
Using the three estimation methods under study and the generated data, the simple linear regression model with a random explanatory variable following the extreme value distribution was estimated according to formula (1), and the average root mean squared error (ARMSE) was then calculated for the estimated model under each estimation method according to the following formula:

ARMSE = \frac{1}{r} \sum_{k=1}^{r} \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{Y}_i - Y_i \right]^2 }   (74)

where r represents the number of repetitions of each experiment and equals 1000. According to this comparison criterion, the three estimation methods were compared for each combination of the default values and the three sample sizes; the values of the comparison criterion are summarized in Table (2). These results make clear that the best estimation method is the MMLE, followed by the ME method, although the values of the comparison criterion for the three estimation methods are close.
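Formula (74) can be computed as in the following sketch (illustrative toy replications, not the paper's simulation output):

```python
import math

def armse(y_true_runs, y_hat_runs):
    """Average root mean squared error over r replications, formula (74)."""
    r = len(y_true_runs)
    total = 0.0
    for y, yhat in zip(y_true_runs, y_hat_runs):
        n = len(y)
        sse = sum((yh - yi) ** 2 for yh, yi in zip(yhat, y))
        total += math.sqrt(sse / n)  # RMSE of one replication
    return total / r                 # averaged over the r replications

# Two toy replications of n = 3 observations each: the first fits perfectly
# (RMSE 0), the second is off by 1 everywhere (RMSE 1), so ARMSE = 0.5.
y_true = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
y_hat  = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
print(armse(y_true, y_hat))  # 0.5
```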
Table (2): Average root mean squared error values for the random regression model estimation results.
Method  ρ    n     A*        B*       C*       D*       E*
MMLE    0.3  30    10.70234  6.42548  7.81122  9.32264  5.06044
             70    10.65632  6.25740  7.78046  9.23494  5.01819
             150   10.54366  6.23644  7.67183  8.94120  4.68794
        0.5  30    5.83150   3.45439  4.23047  5.12812  2.67360
             70    5.79143   3.41886  4.20564  5.08658  2.65669
             150   5.63819   3.41322  4.08129  4.91907  2.64957
        0.7  30    3.43634   2.02950  2.51128  3.02261  1.61564
             70    3.39836   2.02566  2.49299  2.98417  1.57930
             150   3.29939   1.97072  2.48488  2.93176  1.50329
ME      0.3  30    10.70665  6.42715  7.81689  9.32983  5.06424
             70    10.66579  6.26244  7.78304  9.23813  5.01946
             150   10.57076  6.25282  7.68618  8.95571  4.69933
        0.5  30    5.83400   3.45580  4.23198  5.13209  2.67408
             70    5.64809   3.42608  4.20894  5.08826  2.66265
             150   5.79602   3.41581  4.08827  4.92857  2.65129
        0.7  30    3.43770   2.03111  2.51231  3.02370  1.61605
             70    3.40131   2.02629  2.49510  2.98635  1.57994
             150   3.30746   1.97426  2.48925  2.93706  1.50462
RELE    0.3  30    10.70659  6.42717  7.81678  9.32996  5.06412
             70    10.66617  6.26243  7.78306  9.23817  5.01984
             150   10.57200  6.25408  7.68638  8.95669  4.69955
        0.5  30    5.83400   3.45583  4.23207  5.13238  2.67407
             70    5.79609   3.42629  4.20912  5.08832  2.66192
             150   5.64793   3.41592  4.08902  4.92877  2.65151
        0.7  30    3.43772   2.03118  2.51232  3.02368  1.61606
             70    3.40154   2.02629  2.49517  2.98642  1.57997
             150   3.30882   1.97469  2.48968  2.93759  1.50462

* A: θ=44, α=175, β0=20, β1=0.08   B: θ=41, α=145, β0=17, β1=0.05   C: θ=42, α=155, β0=18, β1=0.06   D: θ=43, α=165, β0=19, β1=0.07   E: θ=40, α=135, β0=16, β1=0.04

5. Practical application:
The data adopted in the practical application relate to people with heart disease: a random sample of 30 patients was chosen from those hospitalized in Ghazi Hariri Hospital for Specialized Surgery.
For each patient, the packed cell volume (PCV) and random blood sugar (RBS) were recorded, as shown in Table (3). These data were used to estimate the simple linear relationship between PCV as a response variable and RBS as an explanatory variable; a
simple linear regression model with a random explanatory variable was built, and the regression relationship was described by the following model:

PCV_i = \beta_0 + \beta_1 RBS_i + u_i   (75)

Table (3): Packed cell volume (PCV) and random blood sugar (RBS) of the sample observations.
No.  PCV  RBS   No.  PCV  RBS   No.  PCV  RBS
1    36   170   11   26   149   21   31   191
2    23   136   12   27   348   22   30   182
3    39   233   13   46   387   23   30   184
4    22   96    14   32   202   24   27   124
5    21   145   15   31   200   25   29   218
6    25   188   16   29   199   26   27   143
7    21   136   17   26   115   27   34   150
8    25   141   18   33   158   28   29   153
9    22   147   19   31   152   29   34   161
10   41   310   20   29   193   30   32   209

The random regression model in formula (75) was built on the assumption that the response variable PCV is normally distributed, the explanatory variable follows the extreme value distribution, and the random error terms independently follow a normal distribution with mean zero and constant variance σ²(1 − ρ²). To verify these assumptions, the chi-square goodness-of-fit test was applied to each of the response and explanatory variables. For the data of the response variable PCV, it was tested whether it follows a normal distribution, i.e. the following hypothesis was tested:

H0: PCV ~ Normal dist.
H1: PCV ≁ Normal dist.

The test results are shown in Table (5). The P-value associated with this test is greater than the significance level of 0.05, so the null hypothesis is not rejected, meaning that the data of the PCV variable are normally distributed. Figure (1) shows the histogram and the normal distribution curve of the data of this variable.
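A chi-square goodness-of-fit check of this kind can be sketched as follows (our own re-computation on the Table (3) PCV data, with our own choice of five equal-probability bins, so the statistic need not match the value reported in Table (5), which depends on the binning used):

```python
import math

# PCV observations from Table (3)
pcv = [36, 23, 39, 22, 21, 25, 21, 25, 22, 41, 26, 27, 46, 32, 31,
       29, 26, 33, 31, 29, 31, 30, 30, 27, 29, 27, 34, 29, 34, 32]

n = len(pcv)
mean = sum(pcv) / n
sd = math.sqrt(sum((v - mean) ** 2 for v in pcv) / n)  # population sd, as in Table (5)

# Interior bin edges at the standard normal quantiles for probabilities 0.2..0.8
z = [-0.8416, -0.2533, 0.2533, 0.8416]
edges = [-math.inf] + [mean + sd * q for q in z] + [math.inf]

observed = [sum(lo < v <= hi for v in pcv) for lo, hi in zip(edges, edges[1:])]
expected = n / 5  # 6 observations per equal-probability bin

stat = sum((o - expected) ** 2 / expected for o in observed)
print(mean, round(sd, 5))         # 29.6 5.75384 (matching the fitted parameters)
print(observed, round(stat, 4))   # [5, 7, 9, 5, 4] 2.6667
```

The statistic would then be compared against the chi-square critical value for the appropriate degrees of freedom (bins minus one minus the number of estimated parameters).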
As for the data of the explanatory variable RBS, it was tested whether it follows the extreme value distribution, according to the following hypothesis:

H0: RBS ~ Extreme Value dist.
H1: RBS ≁ Extreme Value dist.

The results of this test are also shown in Table (5): the P-value of this test is greater than the significance level of 0.05, indicating acceptance of the null hypothesis, meaning that the data of this variable follow the extreme value distribution. Figure (2) shows the histogram and distribution curve of the data of this variable.

Table (5): The chi-square goodness-of-fit tests for the response variable PCV and the explanatory variable RBS.
            PCV         RBS
Statistic   1.46667     6.26667
P-Value     0.916884    0.281129
Parameters  μ = 29.6; σ = 5.75384    α = 157.785; θ = 41.8356

Figure (1): Histogram and normal distribution curve for the response variable PCV.
Figure (2): Histogram and extreme value distribution curve for the explanatory variable RBS.

To estimate the coefficients of the random regression model shown in formula (75), the three estimation methods under study were used: although the simulation experiments revealed the preference for the modified maximum likelihood method, there is great convergence between the values of the RMSE comparison criterion for the models estimated by these methods. Table (6) summarizes the estimation results. From these results, we find that the three methods produced significant estimated models, according to the F statistic at a significance level of 0.05. They also agree that there is a positive effect of the random explanatory variable RBS on the response variable PCV, and the size of this effect is similar across the three estimated models. As for the goodness-of-fit and estimation accuracy criteria, R² and RMSE indicated the preference for the modified maximum likelihood method, followed by the robust empirical likelihood method. Based on the foregoing, and adopting the estimates of the modified maximum likelihood method as the best method, the estimated random regression model is:

\hat{PCV}_i = 28.0954 + 0.05963 \, RBS_i   (76)

Table (7) shows the real values of the response variable PCV and its estimated values. Figure (3) shows the graph of the real and estimated regression curve.
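Fitted PCV values are obtained by plugging each RBS observation into the estimated model; a minimal sketch (our own illustrative code, using the MMLE coefficients reported in Table (6)):

```python
beta0_hat, beta1_hat = 28.0954, 0.05963  # MMLE estimates from Table (6)

def predict_pcv(rbs):
    """Fitted PCV from the estimated random regression model, formula (76)."""
    return beta0_hat + beta1_hat * rbs

# Fitted value for the first patient's RBS reading from Table (3)
print(round(predict_pcv(170), 4))  # 38.2325
```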
Table (6): The estimation results for the three estimation methods MMLE, ME, RELE.
Method  β0       β1       F statistic  R2      RMSE
MMLE    28.0954  0.05963  21.71965*    43.68%  4.469
ME      16.0487  0.07463  19.316*      40.82%  4.581
RELE    16.09    0.07443  19.369*      40.89%  4.578

Table (7): The real and estimated values of the response variable PCV.
No.  Real  Estimated   No.  Real  Estimated   No.  Real  Estimated
1    36    30.494      11   26    27.155      21   31    29.839
2    23    25.485      12   27    27.573      22   30    26.738
3    39    28.050      13   46    27.751      23   30    27.036
4    22    27.692      14   32    28.228      24   27    27.394
5    21    30.137      15   31    31.091      25   29    37.113
6    25    30.017      16   29    28.765      26   27    27.513
7    21    29.481      17   26    26.738      27   34    39.379
8    25    29.600      18   33    32.522      28   29    41.705
9    22    26.022      19   31    24.353      29   34    30.673
10   41    31.627      20   29    27.274      30   32    30.554

Figure (3): The graph of the real and estimated regression curves (legend: real values, estimated values).

Conclusions:
The estimation of the simple linear regression model with a random explanatory variable, which contradicts the basic assumptions of this model, was discussed. Three robust estimation methods were used: the first is the modified maximum likelihood method, which has been applied by researchers in estimating the coefficients of this model; the second and third methods are the M estimation method
and the robust empirical likelihood method, which we employed in estimating the coefficients of the regression model under study. As far as we know, they were not used previously in estimating this model, but were rather used in estimating regression models that suffer from certain standard problems, or when the sample observations contain outliers or extreme values. The results of the simulation experiments revealed the superiority of the modified maximum likelihood method based on the RMSE comparison criterion. These experiments also revealed the effectiveness of the M and robust empirical likelihood methods, in view of the great convergence between the estimation results of these two methods and those of the modified maximum likelihood method. The results of the practical application matched the results of the simulation experiments. Based on the foregoing, the robust M or robust empirical likelihood estimation methods can be used in estimating the random regression model, as they are easier to apply than the modified maximum likelihood method and do not rely on assumptions about the probability distribution of the random error terms and the random explanatory variable.

Acknowledgement:
We would like to thank Mustansiriyah University, Baghdad, Iraq (www.uomustansiriyah.edu.iq) for its support of the present work.

References
1. Kerridge, D. (1967). Errors of prediction in multiple regression with stochastic regressor variables. Technometrics, 9(2), 309-311.
2. Lai, T. L., & Wei, C. Z. (1982). Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1), 154-166.
3. Tiku, M. L., & Vaughan, D. C. (1997). Logistic and nonlogistic density functions in binary regression with nonstochastic covariates.
Biometrical Journal, 39(8), 883-898.
4. Vaughan, D. C., & Tiku, M. L. (2000). Estimation and hypothesis testing for a nonnormal bivariate distribution with applications. Mathematical and Computer Modelling, 32(1-2), 53-67.
5. Islam, M. Q., & Tiku, M. L. (2005). Multiple linear regression model under nonnormality. Communications in Statistics-Theory and Methods, 33(10), 2443-2467.
6. Sazak, H. S., Tiku, M. L., & Islam, M. Q. (2006). Regression analysis with a stochastic design variable. International Statistical Review/Revue Internationale de Statistique, 77-88.
7. Islam, M. Q., & Tiku, M. L. (2010). Multiple linear regression model with stochastic design variables. Journal of Applied Statistics, 37(6), 923-943.
8. Bharali, S., & Hazarika, J. (2019). Regression models with stochastic regressors: An expository note. International Journal of Agricultural and Statistical Sciences, 15(2), 873-880.
9. Deshpande, B., & Bhat, S. V. (2019). Random design kernel regression estimator. International Journal of Agricultural and Statistical Sciences, 15(1), 11-17.
10. Lateef, Z. K., & Ahmed, A. D. (2021). Use of modified maximum likelihood method to estimate parameters of the multiple linear regression model. International Journal of Agricultural and Statistical Sciences, 17(Supplement 1), 1255-1263.
11. Anderson, C., & Schumacker, R. E. (2003). A comparison of five robust regression methods with ordinary least squares regression: Relative efficiency, bias, and test of the null hypothesis. Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 2(2), 79-103.
12. Susanti, Y., Pratiwi, H., Sulistijowati, S., & Liana, T. (2014). M estimation, S estimation, and MM estimation in robust regression. International Journal of Pure and Applied Mathematics, 91(3), 349-360.
13. Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237-249.
14. Özdemir, Ş., & Arslan, O. (2020). Combining empirical likelihood and robust estimation methods for linear regression models. Communications in Statistics-Simulation and Computation, 51(3), 941-954.