UtilitasMathematica
ISSN 0315-3681, Volume 120, 2023
Some robust methods for estimating random linear regression model
Amani Emad1, Ahmed Shaker Mohammed Tahir2
1 Mustansiriyah University, College of Management and Economics, Department of Statistics, Baghdad, Iraq, amaniemad93@uomustansiriyah.edu.iq
2 Mustansiriyah University, College of Management and Economics, Department of Statistics, Baghdad, Iraq, ahmutwali@uomustansiriyah.edu.iq
Abstract
The regression model with a random explanatory variable is one of the widely used models in
representing the regression relationship between variables in various economic or life
phenomena. This model is called the random linear regression model. In this model, the different
values of the explanatory variable occur with a certain probability, instead of being fixed in
repeated samples, so it contradicts one of the basic assumptions of the linear regression model,
which leads to the least squares estimators losing some or all of their optimal properties, depending on the nature of the relationship between the random explanatory variable and the random error terms.
As an alternative to the ordinary least squares method, the modified maximum likelihood method, previously used by many researchers, was applied to estimate the coefficients of the random regression model. In addition, two methods were employed, the M method and the robust empirical likelihood method, which had not previously been used to estimate the coefficients of the random regression model, but had been used to estimate linear regression models suffering from certain standard problems, or whose sample values contain outliers or extreme values. All three are robust estimation methods.
For the purpose of comparing the aforementioned estimation methods, simulation experiments were conducted, which showed the superiority of the modified maximum likelihood method despite the close agreement among the results of the three methods. We also carried out a practical application to estimate the regression relationship between packed cell volume (PCV), as a response variable, and random blood sugar (RBS), as a random explanatory variable, based on data from a random sample of 30 patients with heart disease. The results of the practical application were consistent with those of the simulation experiments.
Keywords: random explanatory variable; random linear regression; modified maximum
likelihood; M method; robust empirical likelihood.
Introduction
One of the basic assumptions of the linear regression model is that the explanatory variables are fixed in repeated samples. However, this assumption may not hold in practice, as is the case in most economic and life phenomena, where the explanatory variables are random, in the sense that their values are not fixed, so the different values of the explanatory variables occur with a certain probability. In such cases the values of the explanatory variable are determined along with the values of the response variable as the outcome of some probability mechanism, rather than being controlled by the experimenters.
In the case of adopting the method of least squares to estimate the coefficients of the regression
model with random explanatory variables, the estimates of that method lose some or all of their
optimal properties, and that depends on the type of relationship between the explanatory
variables and the random error terms. In this case, alternatives to the method of least squares should be sought that are efficient and not greatly affected by deviations from the assumptions of the linear regression model. Among these are the robust methods, including the modified maximum likelihood method (MMLE), the M method, and the robust empirical likelihood method (REL).
Kerridge [1] was the first to deal with the regression model with random explanatory variables, in 1967, assuming that the values of the explanatory variables corresponding to the values of the response variable were drawn from a population with a multivariate normal distribution, independent of the random error terms, and he analyzed the resulting estimation error. He explained that it is useful to study the regression model with random explanatory variables, despite Ehrenberg's suggestion in 1963 that studying this model is useless and limited, especially when the number of explanatory variables is large. Lai and Wei, 1981, [2], studied the properties of least squares estimators for random regression models under special assumptions on the random explanatory variable and the random error terms.
Tiku and Vaughan, 1997, [3], used the modified maximum likelihood method to estimate the coefficients of the bivariate regression model with a random explanatory variable. They showed that the maximum likelihood estimators of this model are intractable and difficult to obtain iteratively, whereas the modified maximum likelihood estimators are explicit functions of the sample observations, and therefore easy to calculate and approximately efficient; for small samples they are almost fully efficient. Vaughan and Tiku, 2000, [4], also used the modified maximum likelihood method to estimate the regression model with a random explanatory variable, assuming that this variable follows the extreme value distribution and that the conditional distribution of the response variable is normal, and showed that these estimators are highly efficient. They also derived hypothesis tests for the model coefficients.
Islam and Tiku, 2004, [5], derived modified maximum likelihood estimators and M estimators for the coefficients of the multiple linear regression model under the assumption that the random errors do not follow the normal distribution; they concluded that these estimators are efficient and robust, and that the ordinary least squares estimators are less efficient.
Sazak et al., 2006, [6], used modified maximum likelihood estimators, which are characterized by efficiency and robustness, for the coefficients of the random linear regression model, assuming that both the random explanatory variable and the random errors follow a generalized logistic distribution. The same result was reached by Islam and Tiku, 2010, [7], who adopted the same robust method to estimate the multiple linear regression model when both the explanatory variables and the random errors do not follow the normal distribution.
Bharali and Hazarika, 2019, [8], provided a historical review of work related to regression with a stochastic explanatory variable; they discussed the properties of the ordinary least squares estimators, the use of the modified maximum likelihood method, and the case of non-normal distributions of the explanatory variables and random error terms.
Deshpande and Bhat, 2019, [9], used the kernel regression estimator, one of the nonparametric estimation methods, to estimate the random linear regression model. They suggested the use of the adaptive Nadaraya-Watson estimator, obtained its asymptotic distribution, and through relative efficiency showed the efficiency of this estimator compared with other estimators based on different kernel functions.
Lateef and Ahmed, 2021, [10], used the modified maximum likelihood method to estimate the parameters of the multiple regression model when the random errors are non-normally distributed; through a simulation study they compared the performance of the modified maximum likelihood estimators with that of the ordinary least squares estimators.
In this research, the modified maximum likelihood method is used to estimate the coefficients of the linear regression model with a random explanatory variable, together with two robust methods, the M method and the robust empirical likelihood method. These two methods have been used to estimate the coefficients of regression models in the presence of outliers, or when the model suffers from certain problems, but they have not previously been used to estimate the regression model with a random explanatory variable, so they are employed here to estimate the coefficients of this model.
The research aims to compare the aforementioned robust estimation methods according to the mean squared error criterion, using simulation experiments together with an application to real data.
2. Linear regression model with a random explanatory variable:
Assume the simple linear regression model given by the following equation:

$$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (1)$$

For a random sample of size n, $Y_i$ is a random variable with independent values representing the response variable, assumed to be normally distributed with mean $E(Y_i)$ and variance $\sigma^2$; $X_i$ represents the explanatory variable, assumed to be fixed in repeated samples; and $u_i$ denotes the random errors of the model, whose values are assumed to be independent and normally distributed with mean 0 and fixed variance $\sigma^2$. In many practical applications the explanatory variable is not fixed but rather a random variable; in this case the regression model of equation (1) is called the random regression model [7]. The random explanatory variable may follow a normal distribution or any other probability distribution.
Assume that the probability distribution of the explanatory variable X is the extreme value distribution, with the following probability density function [4]:

$$f(x) = \frac{1}{\theta}\exp\left[-\left(\frac{x-\alpha}{\theta}\right) - \exp\left(-\frac{x-\alpha}{\theta}\right)\right], \qquad -\infty < x < \infty \qquad (2)$$

where $\alpha$ represents the location parameter and $\theta$ the scale parameter; when $\alpha = 0$ and $\theta = 1$ we obtain the standard extreme value distribution. The conditional probability distribution of the response variable Y is the normal distribution, with the following conditional density function:

$$f(y \mid X = x) = \left[2\pi\sigma^2(1-\rho^2)\right]^{-1/2}\exp\left[-\frac{1}{2\sigma^2(1-\rho^2)}\left(y - \beta_0 - \rho\frac{\sigma}{\theta}(x-\alpha)\right)^2\right], \qquad -\infty < y < \infty \qquad (3)$$

where $\beta_0 \in \mathbb{R}$, $\beta_1 = \rho\sigma/\theta$, $\sigma > 0$, $-1 < \rho < 1$, and $u = y - \beta_0 - \rho\frac{\sigma}{\theta}(x-\alpha)$ represents the random error term, which is normally distributed with conditional mean zero, $E(u \mid X = x) = 0$, and fixed conditional variance $\mathrm{Var}(u \mid X) = \sigma^2(1-\rho^2)$ [6]. Note that:

$$E(Y \mid X = x) = \beta_0 + \rho\frac{\sigma}{\theta}(x-\alpha) \qquad (4)$$
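To make this setup concrete, the density in equation (2) can be sketched numerically. The snippet below is an illustrative Python sketch (not part of the paper; the function names are our own): it evaluates f(x), draws samples by inverting the CDF F(x) = exp(-exp(-(x-α)/θ)), and checks that the density integrates to one.

```python
import math
import random

def gumbel_pdf(x, alpha=0.0, theta=1.0):
    # Density of equation (2): f(x) = (1/θ) exp[-(x-α)/θ - exp(-(x-α)/θ)]
    z = (x - alpha) / theta
    return math.exp(-z - math.exp(-z)) / theta

def gumbel_sample(n, alpha=0.0, theta=1.0, rng=random):
    # Inverse-CDF sampling: F(x) = exp(-exp(-(x-α)/θ)), so x = α - θ ln(-ln U)
    return [alpha - theta * math.log(-math.log(rng.random())) for _ in range(n)]

# The standard density peaks at x = α with f(α) = e^{-1}/θ and integrates to one;
# a crude Riemann sum over a wide grid checks the normalization.
total = sum(gumbel_pdf(-10.0 + 0.01 * k) * 0.01 for k in range(4001))
```

The sample mean of draws from this distribution is approximately α + 0.5772 θ (Euler-Mascheroni constant times the scale), which is a quick sanity check for generated data.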
3. The robust estimation methods for the coefficients of the random linear regression model:
This section presents the robust estimation methods under study, which are adopted in estimating the simple linear regression model with a random explanatory variable.
3.1: Modified Maximum Likelihood Method (MMLE):
Estimating the coefficients of the regression model with a random explanatory variable in equation (1) by the maximum likelihood method, based on the probability functions (2) and (3), is difficult and impractical: the equations resulting from differentiating the likelihood function have no explicit solutions, and solving them by iterative methods is fraught with difficulties, since the iterations may fail to converge quickly, may converge to false values, and may encounter multiple roots. Therefore, the modified maximum likelihood method proposed by Tiku and Vaughan [3,4] is used.
The joint probability density function of the response variable Y and the explanatory variable X is:
$$f(y,x) = f(x)\,f(y \mid x) = \frac{1}{\theta}\cdot\frac{1}{\sigma\sqrt{2\pi}}\,(1-\rho^2)^{-1/2}\exp\left[-\left(\frac{x-\alpha}{\theta}\right) - \exp\left(-\frac{x-\alpha}{\theta}\right)\right]\exp\left[-\frac{u^2}{2\sigma^2(1-\rho^2)}\right] \qquad (5)$$
Therefore, the likelihood function takes the following form:

$$L \propto \theta^{-n}\,\sigma^{-n}\,(1-\rho^2)^{-n/2}\exp\left[-\sum_{i=1}^{n}\left(\frac{x_i-\alpha}{\theta} + \exp\left(-\frac{x_i-\alpha}{\theta}\right)\right) - \frac{1}{2\sigma^2(1-\rho^2)}\sum_{i=1}^{n}u_i^2\right] \qquad (6)$$
The logarithm of the likelihood function, writing $z_i = (x_i - \alpha)/\theta$, is then:

$$\ln L = -n\ln\theta - n\ln\sigma - \frac{n}{2}\ln(1-\rho^2) - \sum_{i=1}^{n}\left(z_i + \exp(-z_i)\right) - \frac{1}{2\sigma^2(1-\rho^2)}\sum_{i=1}^{n}u_i^2 + \text{const.} \qquad (7)$$
Taking partial derivatives of equation (7) with respect to the parameters $(\theta, \alpha, \beta_0, \beta_1, \sigma, \rho)$ and setting them equal to zero, we obtain the following maximum likelihood equations, whose solution gives the estimates of those parameters:

$$\frac{\partial \ln L}{\partial \alpha} = \frac{n}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n}\exp(-z_i) - \frac{\rho}{\sigma\theta(1-\rho^2)}\sum_{i=1}^{n}u_i = 0 \qquad (8)$$
$$\frac{\partial \ln L}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta}\sum_{i=1}^{n}z_i - \frac{1}{\theta}\sum_{i=1}^{n}z_i\exp(-z_i) - \frac{\rho}{\sigma\theta(1-\rho^2)}\sum_{i=1}^{n}z_iu_i = 0 \qquad (9)$$
$$\frac{\partial \ln L}{\partial \beta_0} = \frac{1}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}u_i = 0 \qquad (10)$$
$$\frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3(1-\rho^2)}\sum_{i=1}^{n}u_i^2 + \frac{\rho}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}z_iu_i = 0 \qquad (11)$$
$$\frac{\partial \ln L}{\partial \rho} = \frac{n\rho}{1-\rho^2} - \frac{\rho}{\sigma^2(1-\rho^2)^2}\sum_{i=1}^{n}u_i^2 + \frac{1}{\sigma(1-\rho^2)}\sum_{i=1}^{n}z_iu_i = 0 \qquad (12)$$
$$\frac{\partial \ln L}{\partial \beta_1} = \frac{\theta}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}z_iu_i = 0 \qquad (13)$$
Equations (8) and (9) contain intractable terms, so the system of equations (8)-(13) has no explicit solutions and must be solved by iterative methods; the maximum likelihood estimates are therefore difficult to obtain. To address this problem, we convert the explanatory variable into order statistics $x_{(i)}$, $1 \le i \le n$, by arranging its values in ascending order:

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} \qquad (14)$$

Based on the ordered values of the explanatory variable, $y_{[i]}$ denotes the value of the response variable corresponding to the ordered value $x_{(i)}$, and the standardized values $z_i$ and the random errors $u_i$ are rewritten as follows:

$$z_{(i)} = \frac{x_{(i)}-\alpha}{\theta} \qquad (15)$$

$$u_{[i]} = y_{[i]} - \beta_0 - \rho\frac{\sigma}{\theta}\left(x_{(i)} - \alpha\right) \qquad (16)$$
Since the sums involving the errors are not affected by the ordering of the values, $\sum_{i=1}^{n}u_{[i]} = 0$ and $\sum_{i=1}^{n}z_{(i)}u_{[i]} = 0$. Based on the above, equations (8)-(13) are rewritten as follows:
$$\frac{\partial \ln L^*}{\partial \alpha} \equiv \frac{n}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n}\exp(-z_{(i)}) = 0 \qquad (17)$$
$$\frac{\partial \ln L^*}{\partial \theta} \equiv -\frac{n}{\theta} + \frac{1}{\theta}\sum_{i=1}^{n}z_{(i)} - \frac{1}{\theta}\sum_{i=1}^{n}z_{(i)}\exp(-z_{(i)}) = 0 \qquad (18)$$
$$\frac{\partial \ln L^*}{\partial \beta_0} \equiv \frac{1}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}u_{[i]} = 0 \qquad (19)$$
$$\frac{\partial \ln L^*}{\partial \sigma} \equiv -\frac{n}{\sigma} + \frac{1}{\sigma^3(1-\rho^2)}\sum_{i=1}^{n}u_{[i]}^2 = 0 \qquad (20)$$
$$\frac{\partial \ln L^*}{\partial \rho} \equiv \frac{n\rho}{1-\rho^2} - \frac{\rho}{\sigma^2(1-\rho^2)^2}\sum_{i=1}^{n}u_{[i]}^2 = 0 \qquad (21)$$
$$\frac{\partial \ln L^*}{\partial \beta_1} \equiv \frac{\theta}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}z_{(i)}u_{[i]} = 0 \qquad (22)$$
Equations (17)-(22) are the modified maximum likelihood equations. To linearize $\exp(-z_{(i)})$, the first two terms of its Taylor series around $t_{(i)} = E[z_{(i)}]$ can be used:

$$\exp(-z_{(i)}) \approx \exp(-t_{(i)}) + \left(z_{(i)} - t_{(i)}\right)\left\{\frac{d}{dz}e^{-z}\right\}_{z=t_{(i)}} \qquad (23)$$
which can be rewritten as follows:

$$\exp(-z_{(i)}) \approx a_i - b_i z_{(i)}, \qquad 1 \le i \le n \qquad (24)$$

where:

$$a_i = e^{-t_{(i)}}\left(1 + t_{(i)}\right) \qquad (25)$$

$$b_i = -\left\{\frac{d}{dz}e^{-z}\right\}_{z=t_{(i)}} = e^{-t_{(i)}} \qquad (26)$$
Note that $b_i \ge 0$ for every $i = 1,2,\ldots,n$. Substituting equation (24) into equations (17) and (18) gives:

$$\frac{\partial \ln L^*}{\partial \alpha} \equiv \frac{n}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n}\left(a_i - b_i z_{(i)}\right) = 0 \qquad (27)$$

$$\frac{\partial \ln L^*}{\partial \theta} \equiv -\frac{n}{\theta} + \frac{1}{\theta}\sum_{i=1}^{n}z_{(i)} - \frac{1}{\theta}\sum_{i=1}^{n}z_{(i)}\left(a_i - b_i z_{(i)}\right) = 0 \qquad (28)$$
By solving equations (27), (28) and (19)-(22), we obtain the so-called modified maximum likelihood estimates. For large samples these estimates are unbiased with variances attaining the minimum variance bounds (MVB), so they are fully efficient; for small samples they are nearly fully efficient. They are explicit functions of the sample observations and easy to compute [4].
From equation (27), after substituting for $z_{(i)}$, we get the estimating equation for the parameter $\alpha$:

$$\frac{n}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n}\left(a_i - b_i\,\frac{x_{(i)}-\alpha}{\theta}\right) = 0 \qquad (29)$$

and after some algebra we obtain the estimate of $\alpha$:

$$\hat{\alpha} = \frac{\sum_{i=1}^{n}b_i x_{(i)} + \sum_{i=1}^{n}(1-a_i)\,\hat{\theta}}{\sum_{i=1}^{n}b_i} \qquad (30)$$
which can be rewritten as follows:

$$\hat{\alpha} = K + D\hat{\theta} \qquad (31)$$

where $K = \dfrac{\sum_{i=1}^{n}b_i x_{(i)}}{\sum_{i=1}^{n}b_i}$ and $D = \dfrac{\sum_{i=1}^{n}(1-a_i)}{\sum_{i=1}^{n}b_i}$.
We get the estimate of the parameter $\theta$ from equation (28), after substituting the value of $z_{(i)}$ and performing some algebra, as follows:

$$n\hat{\theta}^2 - \hat{\theta}\sum_{i=1}^{n}(1-a_i)\left(x_{(i)} - \hat{\alpha}\right) - \sum_{i=1}^{n}b_i\left(x_{(i)} - \hat{\alpha}\right)^2 = 0 \qquad (32)$$
Equation (32) can be rewritten, after replacing $\hat{\alpha}$ by its expression in equation (31), as follows:

$$n\hat{\theta}^2 - \hat{\theta}\sum_{i=1}^{n}(1-a_i)\left(x_{(i)} - K\right) - \sum_{i=1}^{n}b_i\left(x_{(i)} - K\right)^2 = 0 \qquad (33)$$
Equation (33) can be written in a simpler form as follows:

$$A\hat{\theta}^2 - B\hat{\theta} - C = 0 \qquad (34)$$

where $A = n$, $B = \sum_{i=1}^{n}(1-a_i)\left(x_{(i)} - K\right)$, and $C = \sum_{i=1}^{n}b_i\left(x_{(i)} - K\right)^2$. The positive root of equation (34) is the estimate of the parameter $\theta$:

$$\hat{\theta} = \frac{B + \sqrt{B^2 + 4nC}}{2n} \qquad (35)$$
In order to obtain an estimate of the parameter $\beta_0$, we substitute the value of $u_{[i]}$ into equation (19) and perform some calculations:

$$\frac{1}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}\left(y_{[i]} - \hat{\beta}_0 - \rho\frac{\sigma}{\theta}\left(x_{(i)} - \alpha\right)\right) = 0$$

$$\hat{\beta}_0 = \bar{y} - \hat{\rho}\,\frac{\hat{\sigma}}{\hat{\theta}}\left(\bar{x} - \hat{\alpha}\right) \qquad (36)$$
To find an estimate of $\sigma$, we rewrite $u_{[i]}$ by substituting into equation (16) the value of $\hat{\beta}_0$ from equation (36):

$$u_{[i]} = \left(y_{[i]} - \bar{y}\right) - \hat{\rho}\,\frac{\hat{\sigma}}{\hat{\theta}}\left(x_{(i)} - \bar{x}\right) \qquad (37)$$

Squaring both sides and summing over the sample, we get:

$$\sum_{i=1}^{n}u_{[i]}^2 = \sum_{i=1}^{n}\left[\left(y_{[i]} - \bar{y}\right) - \hat{\rho}\,\frac{\hat{\sigma}}{\hat{\theta}}\left(x_{(i)} - \bar{x}\right)\right]^2 \qquad (38)$$
By substituting equation (38) into equation (20) and performing some algebra, we get the following equation:

$$\sum_{i=1}^{n}\left(y_{[i]}-\bar{y}\right)^2 - 2\hat{\rho}\,\frac{\hat{\sigma}}{\hat{\theta}}\sum_{i=1}^{n}\left(y_{[i]}-\bar{y}\right)\left(x_{(i)}-\bar{x}\right) + \hat{\rho}^2\frac{\hat{\sigma}^2}{\hat{\theta}^2}\sum_{i=1}^{n}\left(x_{(i)}-\bar{x}\right)^2 - n\hat{\sigma}^2 + n\hat{\sigma}^2\hat{\rho}^2 = 0 \qquad (39)$$
The last equation can be rewritten in terms of the sample variances and covariance of $y_{[i]}$ and $x_{(i)}$, substituting $\hat{\rho} = \hat{\theta}s_{xy}/(\hat{\sigma}s_x^2)$ from equation (46):

$$ns_y^2 - 2n\,s_{xy}\,\frac{s_{xy}\hat{\theta}}{s_x^2\hat{\sigma}}\cdot\frac{\hat{\sigma}}{\hat{\theta}} + \frac{s_{xy}^2\hat{\theta}^2}{s_x^4\hat{\sigma}^2}\cdot\frac{\hat{\sigma}^2}{\hat{\theta}^2}\,ns_x^2 - n\hat{\sigma}^2 + n\hat{\sigma}^2\,\frac{s_{xy}^2\hat{\theta}^2}{s_x^4\hat{\sigma}^2} = 0 \qquad (40)$$
The solution of the last equation gives the estimate of $\sigma$, as shown below:

$$\hat{\sigma} = \left[s_y^2 + \frac{s_{xy}^2}{s_x^2}\left(\frac{\hat{\theta}^2}{s_x^2} - 1\right)\right]^{1/2} \qquad (41)$$
To get an estimate of $\beta_1$, we substitute the values of $u_{[i]}$ and $z_{(i)}$ into equation (22) and perform some algebra to get the following equation:

$$\sum_{i=1}^{n}\left(x_{(i)}-\alpha\right)\left(y_{[i]}-\hat{\beta}_0\right) - \hat{\beta}_1\sum_{i=1}^{n}\left(x_{(i)}-\alpha\right)^2 = 0 \qquad (42)$$

Solving this equation gives the estimate of $\beta_1$:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}\left(x_{(i)}-\hat{\alpha}\right)\left(y_{[i]}-\hat{\beta}_0\right)}{\sum_{i=1}^{n}\left(x_{(i)}-\hat{\alpha}\right)^2} \qquad (43)$$
The estimate of the correlation parameter $\rho$ is obtained from equation (21) after substituting the sum of squared random errors $u_{[i]}$ from equation (38), rewriting in terms of the sample variances and covariance of $y_{[i]}$ and $x_{(i)}$, and performing some algebra to get the following formula:

$$\left[\frac{\hat{\sigma}^2}{\hat{\theta}^2}s_x^2 + \hat{\sigma}^2\right]\hat{\rho}^2 - 2\left[\frac{\hat{\sigma}}{\hat{\theta}}\,s_{xy}\right]\hat{\rho} - \left[\frac{s_{xy}^2}{s_x^2}\left(\frac{\hat{\theta}^2}{s_x^2}-1\right)\right] = 0 \qquad (44)$$

which can be rewritten as follows:

$$q\hat{\rho}^2 - w\hat{\rho} - v = 0 \qquad (45)$$

where $q = \dfrac{\hat{\sigma}^2}{\hat{\theta}^2}s_x^2 + \hat{\sigma}^2$, $w = 2\dfrac{\hat{\sigma}}{\hat{\theta}}s_{xy}$, and $v = \dfrac{s_{xy}^2}{s_x^2}\left(\dfrac{\hat{\theta}^2}{s_x^2}-1\right)$.
Solving equation (45), we get the estimate of the parameter $\rho$ as follows:

$$\hat{\rho} = \frac{\hat{\theta}\,s_{xy}}{\hat{\sigma}\,s_x^2} \qquad (46)$$
The modified maximum likelihood estimates were derived assuming that the probability distribution of the explanatory variable X is the extreme value distribution, but the same approach applies under any other probability distribution imposed by the data under study.
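The estimation chain of equations (30)-(46) is fully explicit, so it can be transcribed directly. The sketch below is an illustrative Python version (the paper's own computations were done in R); the expected order statistics $t_{(i)} = E[z_{(i)}]$ are approximated here by standard extreme value quantiles at i/(n+1), which is an assumption of this sketch rather than a formula taken from the paper.

```python
import math
import random

def mmle_gumbel_regression(x, y):
    """Modified maximum likelihood estimates of Section 3.1 (illustrative sketch)."""
    n = len(x)
    pairs = sorted(zip(x, y))                  # order x ascending, carry concomitant y
    xs = [p[0] for p in pairs]                 # x_(i)
    ys = [p[1] for p in pairs]                 # y_[i]
    # t_(i) = E[z_(i)] approximated by the standard Gumbel quantile at i/(n+1)
    t = [-math.log(-math.log(i / (n + 1.0))) for i in range(1, n + 1)]
    a = [math.exp(-ti) * (1.0 + ti) for ti in t]        # eq. (25)
    b = [math.exp(-ti) for ti in t]                     # eq. (26)
    sb = sum(b)
    K = sum(bi * xi for bi, xi in zip(b, xs)) / sb      # eq. (31)
    D = sum(1.0 - ai for ai in a) / sb
    B = sum((xi - K) * (1.0 - ai) for xi, ai in zip(xs, a))
    C = sum(bi * (xi - K) ** 2 for bi, xi in zip(b, xs))
    theta = (B + math.sqrt(B * B + 4.0 * n * C)) / (2.0 * n)    # eq. (35)
    alpha = K + D * theta                                        # eq. (31)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sx2 = sum((xi - xbar) ** 2 for xi in xs) / n
    sy2 = sum((yi - ybar) ** 2 for yi in ys) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / n
    sigma = math.sqrt(sy2 + (sxy ** 2 / sx2) * (theta ** 2 / sx2 - 1.0))  # eq. (41)
    rho = theta * sxy / (sigma * sx2)                                     # eq. (46)
    beta0 = ybar - rho * (sigma / theta) * (xbar - alpha)                 # eq. (36)
    beta1 = (sum((xi - alpha) * (yi - beta0) for xi, yi in zip(xs, ys))
             / sum((xi - alpha) ** 2 for xi in xs))                       # eq. (43)
    return {"alpha": alpha, "theta": theta, "sigma": sigma,
            "rho": rho, "beta0": beta0, "beta1": beta1}

# Demo on simulated data: X ~ extreme value(α=135, θ=40), y linear in x plus noise.
random.seed(7)
x_sim = [135.0 - 40.0 * math.log(-math.log(random.random())) for _ in range(2000)]
y_sim = [16.0 + 0.05 * xi + random.gauss(0.0, 2.0) for xi in x_sim]
est = mmle_gumbel_regression(x_sim, y_sim)
```

Note that in this parameterization $\beta_0$ is the value of $E(Y \mid X = x)$ at $x = \alpha$ (see equation (4)), so for data generated as $y = c + \beta_1 x + u$ the quantity estimated by $\hat{\beta}_0$ is $c + \beta_1\alpha$.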
3.2: M-estimation Method:
This method is one of the robust methods commonly used to estimate regression model coefficients when the model suffers from basic problems, in particular when the data contain outliers or extreme values. In this research the method is used to estimate the coefficients of the regression model with a random explanatory variable; as far as we know, it has not previously been used to estimate such models.
The method is an extension of the maximum likelihood method and is denoted by the symbol M to indicate that its estimates are of maximum likelihood type. Its idea is to minimize a function of the random errors rather than the random errors themselves; that is, the M estimator $\hat{\beta}_M$ is obtained by minimizing the following objective function [11,12]:

$$\hat{\beta}_M = \arg\min_{\beta}\sum_{i=1}^{n}\rho(u_i) \qquad (47)$$
where $\rho(u_i)$ is the objective function, a function of the random errors of the regression model. Replacing the random errors $u_i$ by the standardized errors, the objective function (47) can be rewritten as follows:

$$\hat{\beta}_M = \arg\min_{\beta}\sum_{i=1}^{n}\rho\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right) \qquad (48)$$
The function $\rho$ of the standardized errors takes several forms, such as Huber's formula and Tukey's bisquare formula, shown in Table (1). Differentiating the objective function in equation (48) and setting the derivative equal to zero, we get:

$$\sum_{i=1}^{n}x_{ij}\,\psi\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\hat{\sigma}}\right) = 0, \qquad j = 0,1,\ldots,k \qquad (49)$$
where $\psi$ is the first derivative of the function $\rho$, $x_{ij}$ is observation i of explanatory variable j, and $x_{i0} = 1$ to include the intercept in the model. Equation (49) is solved by the method of iteratively reweighted least squares (IRLS), using the weights shown in the following equation [12]:

$$w_i = w(u_i) = \frac{\psi\left(\dfrac{Y_i - \beta_0 - \beta_1 X_i}{\hat{\sigma}}\right)}{\left(\dfrac{Y_i - \beta_0 - \beta_1 X_i}{\hat{\sigma}}\right)} \qquad (50)$$
The weights $w_i$ are shown in Table (1) for each of the two forms of the function $\rho$. The estimate of the scale parameter $\sigma$ is obtained from the following formula [11]:

$$\hat{\sigma} = \frac{\mathrm{Median}\left|u_i - \mathrm{Median}(u)\right|}{0.6745} \qquad (51)$$
Using the weights $w_i$, equation (49) can be rewritten as follows:

$$\sum_{i=1}^{n}x_{ij}\,w_i\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\hat{\sigma}}\right) = 0, \qquad j = 0,1,\ldots,k \qquad (52)$$
Using the IRLS method, equation (52) is solved by assuming initial values for the regression coefficients and for the scale estimate $\hat{\sigma}$. Equation (52) can be written in matrix form to obtain the following weighted least squares system:

$$X'WX\hat{\beta} = X'WY \qquad (53)$$

where W is an (n × n) diagonal matrix whose diagonal elements are the weights $w_i$. Solving equation (53) gives the estimate of the regression coefficients by the M method [11]:

$$\hat{\beta}_M = (X'WX)^{-1}X'WY \qquad (54)$$
Table (1): Huber and Tukey's bisquare forms of the function $\rho$

Huber:
  Objective function:  $\rho(u) = \frac{1}{2}u^2$ if $|u| \le a$;  $a|u| - \frac{1}{2}a^2$ if $|u| > a$
  Weight function:     $w(u) = 1$ if $|u| \le a$;  $a/|u|$ if $|u| > a$
  Tuning parameter:    $a = 1.345$

Tukey's bisquare:
  Objective function:  $\rho(u) = \frac{a^2}{6}\left[1 - \left(1 - (u/a)^2\right)^3\right]$ if $|u| \le a$;  $\frac{a^2}{6}$ if $|u| > a$
  Weight function:     $w(u) = \left(1 - (u/a)^2\right)^2$ if $|u| \le a$;  $0$ if $|u| > a$
  Tuning parameter:    $a = 4.685$
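The IRLS scheme of equations (49)-(54), with the Huber weights of Table (1), can be sketched as follows for a single explanatory variable. This is an illustrative pure-Python version, not the authors' code; the helper names and the choice of ordinary least squares as the starting value are ours.

```python
import math

def huber_irls(x, y, c=1.345, tol=1e-8, max_iter=100):
    """IRLS for the simple-regression M estimator with Huber weights (sketch)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # ordinary least squares as the starting point
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    for _ in range(max_iter):
        r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        med = sorted(r)[n // 2]                                   # simple median
        s = sorted(abs(ri - med) for ri in r)[n // 2] / 0.6745    # scale, eq. (51)
        # Huber weights from Table (1): 1 if |u| <= c, c/|u| otherwise
        w = [1.0 if abs(ri / s) <= c else c / abs(ri / s) for ri in r]
        # weighted least squares step, eqs. (53)-(54), written out for 2 coefficients
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = sw * swxx - swx * swx
        nb0 = (swxx * swy - swx * swxy) / det
        nb1 = (sw * swxy - swx * swy) / det
        done = abs(nb0 - b0) + abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if done:
            break
    return b0, b1

# Demo: a clean line y = 2 + 3x with mild noise and one gross outlier.
x_d = [float(i) for i in range(20)]
y_d = [2.0 + 3.0 * xi + 0.1 * math.sin(i) for i, xi in enumerate(x_d)]
y_d[10] += 100.0
b0, b1 = huber_irls(x_d, y_d)
```

With one gross outlier added to an otherwise clean line, the loop converges to a weight near zero for the outlier and recovers the underlying coefficients, which is the behavior the M method is designed for.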
3.3: Robust Empirical Likelihood Method (REL):
The empirical likelihood method is a nonparametric estimation method proposed by Owen, 1988, [13], as an alternative to the maximum likelihood method when no assumptions are made about the probability distribution of the random error term of the regression model. The method assigns a probability weight $p_i$ to each sample observation (i = 1,2,...,n) and estimates the regression coefficients by maximizing the empirical likelihood function, defined as the product of those probability weights, subject to constraints on the coefficients to be estimated. These constraints are similar to the normal equations of the least squares method, or to the likelihood equations under the normality hypothesis, which makes the method very sensitive to departures from normality and to the presence of outliers or extreme values. Therefore, Ozdemir and Arslan, 2018, [14], suggested using robust constraints borrowed from the objective function of the robust M method, in effect reconciling the empirical likelihood method with the M method.
The empirical likelihood method is presented first, as an introduction to the robust empirical likelihood method.
For the linear regression model in equation (1), let $p_i$, i = 1,2,...,n, denote the probability weight of each sample observation, which is unknown, differs across observations, and must be estimated, with $p_i \ge 0$. The empirical likelihood method estimates the coefficient vector $\beta$ and the population variance $\sigma^2$ by maximizing the empirical likelihood function subject to constraints, as shown below:
$$L_{EL}(\beta,\sigma^2) = \prod_{i=1}^{n}p_i \qquad (55)$$
subject to the following constraints:

$$\sum_{i=1}^{n}p_i = 1 \qquad (56)$$

$$\sum_{i=1}^{n}p_i\left(Y_i - \beta_0 - \beta_1 X_i\right)X_i = 0 \qquad (57)$$

$$\sum_{i=1}^{n}p_i\left[\left(Y_i - \beta_0 - \beta_1 X_i\right)^2 - \sigma^2\right] = 0 \qquad (58)$$
Taking the logarithm of both sides of equation (55) gives the log empirical likelihood function:

$$\mathrm{Ln}\,L_{EL}(\beta,\sigma^2) = \mathrm{Ln}\prod_{i=1}^{n}p_i = \sum_{i=1}^{n}\mathrm{Ln}(p_i) \qquad (59)$$
The estimates of the probability weights $p^T = [p_1\;\; p_2\;\ldots\; p_n]$, the coefficient vector $\beta^T = [\beta_0\;\; \beta_1]$, and the variance $\sigma^2$ are obtained by maximizing the log empirical likelihood function subject to the three constraints (56)-(58). This maximization problem can be solved by the method of Lagrange multipliers, with the Lagrange function:

$$L(p,\beta,\lambda_0,\lambda_1,\lambda_2) = \sum_{i=1}^{n}\mathrm{Ln}(p_i) - \lambda_0\left(\sum_{i=1}^{n}p_i - 1\right) - n\lambda_1\sum_{i=1}^{n}p_i\left(Y_i - x_i^T\beta\right)X_i - n\lambda_2\sum_{i=1}^{n}p_i\left[\left(Y_i - x_i^T\beta\right)^2 - \sigma^2\right] \qquad (60)$$
where $\lambda_0, \lambda_1, \lambda_2 \in \mathbb{R}$ are the Lagrange multipliers. The estimation process first requires estimating the vector of probability weights by differentiating the Lagrange function (60) with respect to each $p_i$ and setting the derivative equal to zero, which gives:

$$p_i = \frac{1}{\lambda_0 + n\lambda_1\left(Y_i - x_i^T\beta\right)X_i + n\lambda_2\left[\left(Y_i - x_i^T\beta\right)^2 - \sigma^2\right]}, \qquad i = 1,2,\ldots,n \qquad (61)$$
Summing both sides of equation (61) over all values of i gives $\lambda_0 = n$, so the equation can be rewritten after substituting for $\lambda_0$ as follows:

$$p_i = \frac{1}{n\left(1 + \lambda_1\left(Y_i - x_i^T\beta\right)X_i + \lambda_2\left[\left(Y_i - x_i^T\beta\right)^2 - \sigma^2\right]\right)}, \qquad i = 1,2,\ldots,n \qquad (62)$$
Substituting $p_i$ from equation (62) into the log empirical likelihood function gives the following objective function in terms of $\beta, \lambda_1, \lambda_2, \sigma^2$ only:

$$L(\beta,\lambda_1,\lambda_2,\sigma^2) = -\sum_{i=1}^{n}\mathrm{Ln}\left(1 + \lambda_1\left(Y_i - x_i^T\beta\right)X_i + \lambda_2\left[\left(Y_i - x_i^T\beta\right)^2 - \sigma^2\right]\right) - n\,\mathrm{Ln}\,n \qquad (63)$$
The values of the Lagrange multipliers $\lambda_1$ and $\lambda_2$, for given values of the coefficient vector $\beta$ and of $\sigma^2$, are obtained from the following minimization problem:

$$\hat{\lambda}(\beta,\sigma^2) = \arg\min_{\lambda}\left[-\sum_{i=1}^{n}\mathrm{Ln}\left(1 + \lambda_1\left(Y_i - x_i^T\beta\right)X_i + \lambda_2\left[\left(Y_i - x_i^T\beta\right)^2 - \sigma^2\right]\right) - n\,\mathrm{Ln}\,n\right] \qquad (64)$$
where $\lambda = [\lambda_1\;\; \lambda_2]$. The minimization problem (64) has no explicit solution and must be solved numerically; substituting its solution into $L(\beta,\lambda_1,\lambda_2,\sigma^2)$ gives the profile log empirical likelihood $L(\hat{\lambda}(\beta,\sigma^2),\beta,\sigma^2)$, a function of the coefficient vector $\beta$ and the variance $\sigma^2$. The empirical likelihood estimates of these parameters are obtained by solving the following maximization problem:

$$(\hat{\beta},\hat{\sigma}^2) = \arg\max_{(\beta,\sigma^2)}L\left(\hat{\lambda}(\beta,\sigma^2),\beta,\sigma^2\right) \qquad (65)$$

This maximization problem is likewise solved numerically, since it has no explicit solution.
As noted above, the robust empirical likelihood method results from replacing the constraints (56)-(58) with robust constraints borrowed from the objective function of the robust M method in equation (49); this modification was adopted to make the empirical likelihood method more effective against the influence of outliers on the coefficient estimates. Here we employ this method to estimate the coefficients of the linear regression model when the explanatory variable is random. The empirical likelihood function (55) is maximized after replacing the second and third constraints, (57) and (58), with the following robust constraints [14]:

$$\sum_{i=1}^{n}p_i\,\psi\left(Y_i - \beta_0 - \beta_1 X_i\right)X_i = 0 \qquad (66)$$

$$\sum_{i=1}^{n}p_i\,\psi\left[\left(Y_i - \beta_0 - \beta_1 X_i\right)^2 - \sigma^2\right] = 0 \qquad (67)$$

where $\psi$ is a function of the random errors, nondecreasing in the case of Huber's function and redescending in the case of Tukey's function. The robust estimates of the model coefficients and the variance are obtained by solving the corresponding maximization problem with the Lagrange function:
$$L(p,\beta,\lambda_0,\lambda_1,\lambda_2) = \sum_{i=1}^{n}\mathrm{Ln}(p_i) - \lambda_0\left(\sum_{i=1}^{n}p_i - 1\right) - n\lambda_1\sum_{i=1}^{n}p_i\,\psi\left(Y_i - x_i^T\beta\right)X_i - n\lambda_2\sum_{i=1}^{n}p_i\,\psi\left[\left(Y_i - x_i^T\beta\right)^2 - \sigma^2\right] \qquad (68)$$
Following the same steps as in the empirical likelihood method, the estimation process first requires estimating the vector of probability weights by differentiating the Lagrange function (68) with respect to each $p_i$ and setting the derivative equal to zero, which gives:

$$p_i = \frac{1}{\lambda_0 + n\lambda_1\,\psi\left(Y_i - x_i^T\beta\right)X_i + n\lambda_2\,\psi\left[\left(Y_i - x_i^T\beta\right)^2 - \sigma^2\right]}, \qquad i = 1,2,\ldots,n \qquad (69)$$
Summing both sides of equation (69) over all values of i gives $\lambda_0 = n$, so:

$$p_i = \frac{1}{n\left(1 + \lambda_1\,\psi\left(Y_i - x_i^T\beta\right)X_i + \lambda_2\,\psi\left[\left(Y_i - x_i^T\beta\right)^2 - \sigma^2\right]\right)}, \qquad i = 1,2,\ldots,n \qquad (70)$$
Substituting $p_i$ from equation (70) into the log empirical likelihood function gives the following objective function in terms of $\beta, \lambda_1, \lambda_2, \sigma^2$ only:

$$L(\beta,\lambda_1,\lambda_2,\sigma^2) = -\sum_{i=1}^{n}\mathrm{Ln}\left(1 + \lambda_1\,\psi\left(Y_i - x_i^T\beta\right)X_i + \lambda_2\,\psi\left[\left(Y_i - x_i^T\beta\right)^2 - \sigma^2\right]\right) - n\,\mathrm{Ln}\,n \qquad (71)$$
The values of the Lagrange multipliers $\lambda_1$ and $\lambda_2$, for given $\beta$ and $\sigma^2$, are obtained from the following minimization problem:

$$\hat{\lambda}(\beta,\sigma^2) = \arg\min_{\lambda}\left[-\sum_{i=1}^{n}\mathrm{Ln}\left(1 + \lambda_1\,\psi\left(Y_i - x_i^T\beta\right)X_i + \lambda_2\,\psi\left[\left(Y_i - x_i^T\beta\right)^2 - \sigma^2\right]\right) - n\,\mathrm{Ln}\,n\right] \qquad (72)$$
The minimization problem (72) has no explicit solution and must be solved numerically. Substituting its solution into $L(\beta,\lambda_1,\lambda_2,\sigma^2)$ gives the profile log empirical likelihood $L(\hat{\lambda}(\beta,\sigma^2),\beta,\sigma^2)$, a function of the coefficient vector $\beta$ and the variance $\sigma^2$; the robust empirical likelihood estimates of these parameters are obtained by solving the following maximization problem:

$$(\hat{\beta},\hat{\sigma}^2) = \arg\max_{(\beta,\sigma^2)}L\left(\hat{\lambda}(\beta,\sigma^2),\beta,\sigma^2\right) \qquad (73)$$

This maximization problem is likewise solved numerically, since it has no explicit solution.
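Since the inner problem (64)/(72) is a smooth convex minimization in λ, it is typically solved by Newton-type iterations. The sketch below is our own minimal illustration for generic two-dimensional constraint vectors $g_i$ (it is not the authors' code): it finds λ by damped Newton steps with backtracking and then forms the weights $p_i = 1/(n(1 + \lambda^T g_i))$.

```python
import math
import random

def el_weights(G, max_iter=200):
    """Solve the inner problem of eqs. (62)-(64): minimize -sum(log(1 + lam'g_i))
    over lam, then set p_i = 1/(n(1 + lam'g_i)).  G is a list of 2-vectors."""
    n = len(G)
    lam = [0.0, 0.0]

    def denom(l):
        return [1.0 + l[0] * g[0] + l[1] * g[1] for g in G]

    def obj(l):
        d = denom(l)
        if min(d) <= 1e-10:
            return float("inf")        # outside the domain of the logarithm
        return -sum(math.log(di) for di in d)

    for _ in range(max_iter):
        d = denom(lam)
        grad = [-sum(g[k] / di for g, di in zip(G, d)) for k in range(2)]
        h00 = sum(g[0] * g[0] / di ** 2 for g, di in zip(G, d))
        h01 = sum(g[0] * g[1] / di ** 2 for g, di in zip(G, d))
        h11 = sum(g[1] * g[1] / di ** 2 for g, di in zip(G, d))
        det = h00 * h11 - h01 * h01
        step = [(h11 * grad[0] - h01 * grad[1]) / det,
                (h00 * grad[1] - h01 * grad[0]) / det]
        t, f0 = 1.0, obj(lam)
        # backtrack until the objective decreases and the iterate stays feasible
        while t > 1e-12 and not obj([lam[0] - t * step[0], lam[1] - t * step[1]]) < f0:
            t *= 0.5
        new = [lam[0] - t * step[0], lam[1] - t * step[1]]
        if abs(new[0] - lam[0]) + abs(new[1] - lam[1]) < 1e-12:
            lam = new
            break
        lam = new
    d = denom(lam)
    return [1.0 / (n * di) for di in d], lam

# Demo with generic constraint vectors whose mean is slightly off zero.
random.seed(3)
G = [[random.gauss(0.3, 1.0), random.gauss(-0.2, 1.0)] for _ in range(60)]
p, lam = el_weights(G)
```

At the solution the weights sum to one and satisfy $\sum_i p_i g_i \approx 0$, which is exactly the role played by the constraint pairs (57)-(58) and (66)-(67) for the corresponding choice of $g_i$.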
4. Simulation experiments:
Simulation experiments were conducted to compare the robust estimation methods under study and to determine the best of them according to the mean squared error (MSE) criterion. The R 4.2.1 statistical programming language was used to write the simulation program, which comprises four basic stages of estimating the regression model, as follows:
4.1: Determining the default values:
The default values of the parameters were determined from the real data used in the practical application. The real data of the random explanatory variable, the patient's random blood sugar (RBS), which follows the extreme value distribution, were used to determine the default values of that distribution's two parameters, shown in Table (1); and from the real data of the response variable, packed cell volume (PCV), together with the aforementioned explanatory variable, the default values of the two parameters of the simple linear regression model of equation (1) were determined, also shown in Table (1). The values (0.3, 0.5, 0.7) were chosen as default values for the correlation coefficient ρ between the random explanatory variable and the random error terms. Three sample sizes were adopted (30, 70, 150), and each experiment was repeated 1000 times.
Table (1): The default values for the two parameters of the extreme value distribution and for the
parameters of the simple linear random regression model.
θ 40 41 42 43 44
α 135 145 155 160 175
β0 16 17 18 19 20
β1 0.04 0.05 0.06 0.07 0.08
4.2: Data generation:
Based on the default values of the two parameters of the extreme value distribution, the values of the random explanatory variable that follows this distribution were generated according to formula (2). The values of the random error terms were generated assuming that they follow a normal distribution with mean zero and variance σ²(1 − ρ²), where the values of σ² were determined from the default values of the parameters θ and β1 and the correlation coefficient ρ. The generation process was carried out for the three sample sizes mentioned previously. Based on the values of the random explanatory variable and the random error terms, the values of the response variable were generated from the simple linear regression model according to formula (1).
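As an illustration of this generation step, the following Python sketch (the paper's program was written in R) draws the regressor from a Gumbel-type extreme value distribution and builds the response from model (1). Treating α as the location and θ as the scale of that distribution, and taking σ as a given input, are assumptions made here for illustration:

```python
import numpy as np

def generate_sample(n, alpha, theta, b0, b1, rho, sigma, rng):
    # regressor from the extreme value (Gumbel) distribution, formula (2)
    x = rng.gumbel(loc=alpha, scale=theta, size=n)
    # error terms: normal with mean 0 and variance sigma^2 * (1 - rho^2)
    u = rng.normal(0.0, sigma * np.sqrt(1.0 - rho ** 2), size=n)
    # response from the simple linear model (1)
    y = b0 + b1 * x + u
    return x, y
```

For example, `generate_sample(30, 157.785, 41.8356, 18, 0.06, 0.5, 5.0, np.random.default_rng(0))` produces one simulated sample of size 30 at one combination of default values.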
4.3: Estimation and comparison:
Using the three estimation methods under research and the generated data, the simple linear regression model with a random explanatory variable that follows the extreme value distribution was estimated according to the model shown in formula (1). The average root mean squared error (ARMSE) was then calculated for the estimated model and for each estimation method according to the following formula:

ARMSE = (1/r) Σ_{j=1}^{r} √[ (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)² ]        (74)

where r is the number of replications of each experiment, equal to 1000.
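Formula (74) can be written compactly as a small Python helper (illustrative only; the paper's program used R) that takes the r × n arrays of true and fitted response values from the replications:

```python
import numpy as np

def armse(y_true, y_hat):
    # formula (74): mean over r replications of each replication's RMSE
    y_true = np.asarray(y_true, dtype=float)  # shape (r, n)
    y_hat = np.asarray(y_hat, dtype=float)    # shape (r, n)
    rmse_per_rep = np.sqrt(np.mean((y_hat - y_true) ** 2, axis=1))
    return rmse_per_rep.mean()
```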
According to the above comparison criterion, the three estimation methods were compared. For each combination of the default values and the three sample sizes, the values of the comparison criterion are summarized in Table (2). These results make clear that the best estimation method is the MMLE, followed by the ME method, although the values of the comparison criterion are close for all three estimation methods.
Table (2): Average root mean squared error (ARMSE) values for the random regression model estimation results.
Estimation method ρ n A* B* C* D* E*
MMLE 0.3 30 10.70234 6.42548 7.81122 9.32264 5.06044
70 10.65632 6.25740 7.78046 9.23494 5.01819
150 10.54366 6.23644 7.67183 8.94120 4.68794
0.5 30 5.83150 3.45439 4.23047 5.12812 2.67360
70 5.79143 3.41886 4.20564 5.08658 2.65669
150 5.63819 3.41322 4.08129 4.91907 2.64957
0.7 30 3.43634 2.02950 2.51128 3.02261 1.61564
70 3.39836 2.02566 2.49299 2.98417 1.57930
150 3.29939 1.97072 2.48488 2.93176 1.50329
ME 0.3 30 10.70665 6.42715 7.81689 9.32983 5.06424
70 10.66579 6.26244 7.78304 9.23813 5.01946
150 10.57076 6.25282 7.68618 8.95571 4.69933
0.5 30 5.83400 3.45580 4.23198 5.13209 2.67408
70 5.64809 3.42608 4.20894 5.08826 2.66265
150 5.79602 3.41581 4.08827 4.92857 2.65129
0.7 30 3.43770 2.03111 2.51231 3.02370 1.61605
70 3.40131 2.02629 2.49510 2.98635 1.57994
150 3.30746 1.97426 2.48925 2.93706 1.50462
RELE 0.3 30 10.70659 6.42717 7.81678 9.32996 5.06412
70 10.66617 6.26243 7.78306 9.23817 5.01984
150 10.57200 6.25408 7.68638 8.95669 4.69955
0.5 30 5.83400 3.45583 4.23207 5.13238 2.67407
70 5.79609 3.42629 4.20912 5.08832 2.66192
150 5.64793 3.41592 4.08902 4.92877 2.65151
0.7 30 3.43772 2.03118 2.51232 3.02368 1.61606
70 3.40154 2.02629 2.49517 2.98642 1.57997
150 3.30882 1.97469 2.48968 2.93759 1.50462
* A: θ=44, α=175, β0=20, β1=0.08   B: θ=41, α=145, β0=17, β1=0.05   C: θ=42, α=155, β0=18, β1=0.06
D: θ=43, α=165, β0=19, β1=0.07   E: θ=40, α=135, β0=16, β1=0.04
5. Practical application:
The data adopted in the practical application relate to people with heart disease. A random sample of 30 patients was chosen from those hospitalized in Ghazi Hariri Hospital for Specialized Surgery. For each patient, packed cell volume (PCV) and random blood sugar (RBS) were recorded, as shown in Table (3). These data were used to estimate the simple linear relationship between PCV as a response variable and RBS as an explanatory variable; a
simple linear regression model with a random explanatory variable was built, and the regression relationship was described by the following model:

PCV_i = β0 + β1 RBS_i + u_i        (75)
Table (3): Packed cell volume (PCV) and random blood sugar (RBS) of the sample observations.
No. PCV RBS No. PCV RBS No. PCV RBS
1 36 170 11 26 149 21 31 191
2 23 136 12 27 348 22 30 182
3 39 233 13 46 387 23 30 184
4 22 96 14 32 202 24 27 124
5 21 145 15 31 200 25 29 218
6 25 188 16 29 199 26 27 143
7 21 136 17 26 115 27 34 150
8 25 141 18 33 158 28 29 153
9 22 147 19 31 152 29 34 161
10 41 310 20 29 193 30 32 209
The random regression model according to formula (75) was built on the assumption that the response variable PCV follows the normal distribution and that the explanatory variable follows the extreme value distribution, while the random error terms independently follow the normal distribution with mean zero and constant variance σ²(1 − ρ²). To verify these assumptions, the chi-square goodness-of-fit test was applied to each of the response and explanatory variables. For the response variable (PCV), it was tested whether it has a normal distribution versus not having such a distribution, i.e. the following hypothesis:
H0: PCV ~ Normal dist.
H1: PCV ≁ Normal dist.
The test results are shown in Table (5). These results indicate that the P-value associated with this test is greater than the 0.05 significance level, so the null hypothesis is not rejected, meaning that the data of the PCV variable are normally distributed. Figure (1) shows the histogram and the normal distribution curve of the data of this variable. As for the data of the explanatory variable RBS, it was tested whether it follows the extreme value distribution, according to the following hypothesis:
H0: RBS ~ Extreme value dist.
H1: RBS ≁ Extreme value dist.
The results of this test are shown in Table (5). The P-value of this test is greater than the 0.05 significance level, indicating acceptance of the null hypothesis, meaning that the data of this variable follow the extreme value distribution. Figure (2) shows the histogram and the distribution curve of the data of this variable.
Table (5): The chi-square goodness of fit for the response variable PCV and explanatory variable
RBS.
PCV RBS
Statistic 1.46667 6.26667
P-Value 0.916884 0.281129
Parameters μ = 29.6; σ = 5.75384 α = 157.785; θ = 41.8356
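The goodness-of-fit computation can be sketched in Python for the PCV data of Table (3). The choice of six equal-probability classes is an assumption made here, so the statistic will differ somewhat from the values in Table (5):

```python
import numpy as np
from scipy import stats

def chi_square_gof(data, frozen_dist, n_fitted=2, k=6):
    # bin edges giving k equal-probability classes under the fitted distribution
    edges = frozen_dist.ppf(np.linspace(0.0, 1.0, k + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    observed, _ = np.histogram(data, bins=edges)
    expected = np.full(k, len(data) / k)
    stat = np.sum((observed - expected) ** 2 / expected)
    dof = k - 1 - n_fitted  # one degree of freedom lost per fitted parameter
    return stat, stats.chi2.sf(stat, dof)

# PCV data from Table (3), tested against a fitted normal distribution
pcv = np.array([36, 23, 39, 22, 21, 25, 21, 25, 22, 41, 26, 27, 46, 32, 31,
                29, 26, 33, 31, 29, 31, 30, 30, 27, 29, 27, 34, 29, 34, 32])
loc, scale = stats.norm.fit(pcv)  # ML fit: mean 29.6, sd about 5.75
stat, pval = chi_square_gof(pcv, stats.norm(loc, scale))
```

The same helper applies to RBS with `stats.gumbel_r.fit` in place of the normal fit.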
Figure (1): Histogram and normal distribution curve for the response variable PCV.
Figure (2): Histogram and the extreme value distribution curve for the explanatory variable RBS.
To estimate the coefficients of the random regression model shown in formula (75), the three estimation methods under research were relied upon. Although the simulation experiments revealed the preference for the modified maximum likelihood method, there is great convergence between the RMSE comparison-criterion values of the models estimated by those methods.
Table (6) summarizes the results of the estimation. From these results, we find that the three methods produced significant estimated models, based on the F statistic at the 0.05 significance level. They also agree that there is a positive effect of the random explanatory variable RBS on the response variable PCV, and the size of this effect is similar across the three estimated models. As for the goodness-of-fit and estimation-accuracy criteria, R² and RMSE indicated the preference for the modified maximum likelihood method, followed by the robust empirical likelihood method. Based on the foregoing, and adopting the estimates of the modified maximum likelihood method as the best method, the estimated random regression model is as follows:
PCV̂_i = 28.0954 + 0.05963 RBS_i        (76)
Table (7) shows the real values of the response variable PCV and its estimated values. Figure (3)
shows the graph of the real and estimated regression curve.
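As an illustrative cross-check of the positive RBS effect reported above, the following Python sketch fits a Huber-loss robust regression to the Table (3) data. This is a generic M-type stand-in for illustration, not the paper's MMLE procedure, and the loss scale `f_scale=5.0` is an assumption:

```python
import numpy as np
from scipy.optimize import least_squares

# data from Table (3)
rbs = np.array([170, 136, 233, 96, 145, 188, 136, 141, 147, 310, 149, 348,
                387, 202, 200, 199, 115, 158, 152, 193, 191, 182, 184, 124,
                218, 143, 150, 153, 161, 209], dtype=float)
pcv = np.array([36, 23, 39, 22, 21, 25, 21, 25, 22, 41, 26, 27, 46, 32, 31,
                29, 26, 33, 31, 29, 31, 30, 30, 27, 29, 27, 34, 29, 34, 32],
               dtype=float)

# robust fit of PCV = b0 + b1 * RBS with a Huber loss on the residuals
res = least_squares(lambda b: pcv - (b[0] + b[1] * rbs),
                    x0=[pcv.mean(), 0.0], loss="huber", f_scale=5.0)
b0_hat, b1_hat = res.x
```

With these data the estimated slope is positive and of the same order as the slope in (76).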
and the robust empirical likelihood method, which we employed in estimating the coefficients of the regression model under research. As far as we know, they were not previously used in estimating this model; rather, they were used in estimating regression models that suffer from certain standard problems, or when the sample observations contain outliers or extreme values.
The results of the simulation experiments revealed the superiority of the modified maximum likelihood method based on the RMSE comparison criterion. These experiments also revealed the effectiveness of the M and robust empirical likelihood methods, in view of the great convergence between their estimation results and those of the modified maximum likelihood method. The results of the practical application matched the results of the simulation experiments.
Based on the foregoing, the two robust estimation methods, M and the robust empirical likelihood, can be used in estimating the random regression model, as they are easier to apply than the modified maximum likelihood method, and they are not based on assumptions about the probability distribution of either the random error terms or the random explanatory variable.
Acknowledgement: We would like to thank Mustansiriyah University, Baghdad, Iraq (www.uomustansiriyah.edu.iq) for its support in the present work.