MSTHESIS_Fuzzy

An Assessment of Fuzzy Linear Regression
By
Eric Szegedi
B.S., Liberty University, 1992
Advisor: Andrzej S. Kosinski, Ph. D.
A thesis submitted to the Faculty of the Rollins School of Public Health
Of Emory University in partial fulfillment
Of the requirements for the degree of
Master of Public Health
Department of Biostatistics
1996

An Assessment of Fuzzy Linear Regression
By
Eric Szegedi
Advisor: Andrzej S. Kosinski, Ph. D.
Approved for the Department
Andrzej S. Kosinski
Adviser
Michael Lynn
Committee Member
Accepted:
Vicki Stover Hertzeg
Director, Division of Biostatistics
10 May 1996
Date

In presenting this thesis as a partial fulfillment of the requirements for an advanced
degree from Emory University, I agree that the Library of the University shall make it
available for inspection and circulation in accordance with its regulations governing
materials of this type. I agree that permission to copy from, or to publish, this thesis may
be granted by the professor under whose direction it was written, or, in his absence, by
the Dean of the Rollins School of Public Health when such copying or publication is
solely for scholarly purposes and does not involve potential financial gain. It is
understood that any copying from, or publication of, this thesis which involves potential
financial gain will not be allowed without written permission.
Eric Szegedi

NOTICE TO BORROWERS
Unpublished theses deposited in the Emory University Library must be used only in
accordance with the stipulations prescribed by the author in the preceding statement.
The author of this thesis is:
NAME: Eric Szegedi
ADDRESS: szegman@yahoo.com
The director of this thesis is:
NAME: Andrzej S. Kosinski, Ph.D.
ADDRESS: The Rollins School of Public Health at Emory University, Division of
Biostatistics, 1518 Clifton Rd. NE, Atlanta, GA 30322, USA
Users of this thesis not regularly enrolled as students at Emory University are required to
attest acceptance of the preceding stipulations by signing below. Libraries borrowing this
thesis for the use of their patrons are required to see that each user record here the
information requested.
Name of user Address Date Type of use: (Examination or
copying)

ABSTRACT
The purpose of this thesis is to investigate the usefulness of fuzzy linear regression as
developed by Tanaka, Uejima, and Asai, and then refined by Savic and Pedrycz, 1991. A
comparison of fuzzy linear regression to least-squares linear regression is used to make
the assessment. The comparison is undertaken in two ways. One way is to compare the
spreads, $\vec{c}$, of the fuzzy linear regression estimated coefficients with the standard errors
of least-squares linear regression estimated coefficients. The other way is to compare the
residuals of each type of linear regression. The goal of these two comparisons is for
statisticians to gain a better understanding of fuzzy linear regression as a method for data
analysis. The conclusion of this thesis is that further work needs to be conducted in order
to obtain an interpretable meaning from the fuzzy linear regression parameters. At this
point a clearer understanding cannot be given.
(Note: Thesis is on file at http://www.sph.emory.edu/bios/news/library/szegedi.html)

Contents
1 INTRODUCTION
2 THEORY OF FUZZY LINEAR REGRESSION
3 COMPARISON OF PERFORMANCE
3.1 COMPARISON OF FLR SPREAD AND LSLR STANDARD ERROR
3.2 COMPARISON OF RESIDUALS
4 CONCLUSION AND DISCUSSION
A APPENDIX
B APPENDIX
C APPENDIX
REFERENCES

List of Tables
Table 1: Example of Membership Values for Several
Table 2: Estimated $\gamma_j$'s and $c_j$'s for Restenosis Data Set
Table 3: LSLR Standard Error for Restenosis Data Set
Table 4: Comparison of FLR and LSLR for $\hat{H} = 0.50$
List of Figures
Figure 1: Comparison of FLR residuals with LSLR residuals for Restenosis Data Set

Eric Szegedi: An Assessment of Fuzzy Linear Regression 8
1 Introduction
In analyzing medical data, the various indicators for a disease or outcome do not always have
an exact definition, nor does the relationship among them. Measurements also may not be, or
cannot be, precise, and a more precise meaning may be given to a variable using fuzzy
sets. For example, hypertension is defined as being greater than 90 mmHg diastolic blood
pressure for a “normal” person. A more accurate definition may be given using fuzzy sets
which will give a slight possibility of hypertension to someone with 70 mmHg diastolic
blood pressure and a high possibility of hypertension to someone with 105 mmHg
diastolic blood pressure. Another example is the relationship between a person’s LDL
cholesterol level and their level of heart disease. Someone with a high LDL cholesterol
level will have a high possibility of heart disease, whereas someone with a low LDL
cholesterol level will have a slight possibility of heart disease.
Fuzzy sets are a way to deal with problems where the source of imprecision is not
random error but “the absence of sharply defined criteria of class membership” [12]. If X
is a space of points or objects, with an element of X referred to in general
as x, then a membership function $\mu_S(x)$ that maps each point in X to a real number in the
interval [0,1] defines a fuzzy set S in X. The value of $\mu_S(x)$ at $x \in X$ is the grade of
membership of x in S, and the valuation set is the set of
possible membership values. A set is considered fuzzy as long as the valuation set
contains values between 0 and 1. In classical set theory the valuation set comprises only
two values, membership or non-membership, $\mu_S(x) = 1$ or $\mu_S(x) = 0$ respectively. In
fuzzy set theory this valuation set describes the graded membership of an element x in a
set S by use of the function $\mu_S(x)$ mapping x to values between 0 and 1. An element may
belong partially to a set S, and the higher the value of the membership function $\mu_S(x)$, that is,
the closer $\mu_S(x)$ is to 1, the more an element x belongs to the set S. The set S does not
have a clearly defined membership, since S cannot be said to contain certain elements or
not. Consider an element $x_1$ whose membership value $\mu_S(x_1)$ is higher than that of an
element $x_2$ with $\mu_S(x_2) = 0.15$. The element $x_1$ is said to characterize the
set S more than $x_2$, since $x_1$ has a higher membership value than $x_2$.
The various indicators for a disease or outcome, mentioned earlier, may actually
have a fuzzy definition, and their relationship may be described by a fuzzy function, such
as the fuzzy linear regression described in this thesis. The distribution of the data for
these various indicators may be characterized by a valuation set, or possibility
distribution as a valuation set is sometimes called in the literature [5]. The valuation set
consists of the values of the membership function and is based on the concept of fuzzy
logic. Fuzzy logic was created nearly simultaneously by Zadeh [12] and Klaua [2] in
1965 and has been mostly applied to engineering problems and control systems. Where
classical logic allows only for a value of true or false, denoted 1 and 0 respectively, fuzzy
logic allows for a gradation of values within the interval [0,1], so that a value of partially
true or partially false may be assigned. Fuzzy set theory is based on this concept of fuzzy
logic.
For example, if one were to ask a respondent in a survey how many times they
use cocaine and they only respond with the word “several”, a valuation set for the word
“several” would have to be used. The valuation set may look like the table below [8]. In
this case the valuation set, which comprises the membership values in Table 1, was

created by asking 23 students to “rate the degree of possibility that various integers could
be the number someone has in mind when they say several” [8].
integer membership value
0 0.00
1 0.00
2 0.00
3 0.18
4 0.57
5 0.81
6 0.97
7 0.84
8 0.72
9 0.26
10 0.02
11 0.00
Table 1: Example of Membership Values for Several
The numbers shown in Table 1 are the mean ratings given by the students. The value of
the membership function for x=3 would be $\mu_S(3) = 0.18$. The table shows that the word
“several” refers mainly to the integers between 5 and 8 because of their high membership
values. The most possible value of the word “several” would be 6, since 6 is the number
in the fuzzy set S with the highest membership value.
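As a small illustration of the valuation set in Table 1, the membership values can be stored as a simple mapping from integer to membership value. This is only a sketch: the names `several`, `membership`, and `most_possible`, and the 0.7 cut-off used to list the "high" integers, are assumptions made here for illustration and do not come from the survey in [8].

```python
# Membership values for the word "several", copied from Table 1.
several = {0: 0.00, 1: 0.00, 2: 0.00, 3: 0.18, 4: 0.57, 5: 0.81,
           6: 0.97, 7: 0.84, 8: 0.72, 9: 0.26, 10: 0.02, 11: 0.00}


def membership(x):
    """Return the membership value of integer x; integers not in the table get 0."""
    return several.get(x, 0.0)


# The most possible value is the integer with the highest membership value.
most_possible = max(several, key=several.get)

# Integers whose membership is at least 0.7 (hypothetical cut-off).
high = [x for x in sorted(several) if several[x] >= 0.7]
```

With the values in Table 1, `most_possible` is 6 and `high` contains the integers 5 through 8, matching the reading of the table given above.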
In this thesis, the question of the usefulness of fuzzy linear regression (FLR) in
statistics is assessed by comparing FLR to least-squares linear regression (LSLR). The
comparison between FLR and LSLR is done in two ways. One such way is to compare
the spreads of FLR estimated coefficients with the standard errors of LSLR estimated
coefficients. The other way is to compare the residuals of FLR with the residuals of
LSLR. The goal is to have a better understanding of the FLR parameters [$(\vec{\gamma}, \vec{c})$, defined
in Section 2], so that statisticians can use FLR as an alternative to LSLR.

2 Theory of Fuzzy Linear Regression
The purpose of fuzzy linear regression is to determine the estimated fuzzy coefficients
that have the minimum membership function for the observed fuzzy set $Y_i$ $(i = 1, \dots, n)$ in
the predicted fuzzy set $Y_i^*$. The fuzzy set $Y_i^*$ used in this thesis is defined by the linear
model $Y_i^* = \Gamma_0 + \Gamma_1 x_{i1} + \Gamma_2 x_{i2} + \dots + \Gamma_p x_{ip}$ [10], where the $x_{ij}$ $(j = 1, \dots, p)$ are covariates
given as non-fuzzy input data and $\Gamma_j$ $(j = 0, \dots, p)$ are fuzzy sets. A design matrix for $x_{ij}$ is
$$X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix}$$
where $x_i^T$ is the ith row of X. The observed fuzzy set $Y_i$ is characterized by center and
spread $(y_i, e_i)$, such that the observed fuzzy set $Y_i$ contains real numbers in the interval
$[y_i - e_i,\ y_i + e_i]$. The center $y_i$ and the spread $e_i$ are given as input from the observed data
set. In this thesis $e_i = 0$ for all i. The predicted fuzzy set $Y_i^*$ is characterized by $(y_i^*, e_i^*)$,
where $y_i^* = x_i^T \vec{\gamma}$ and $e_i^* = x_i^T \vec{c}$, such that the predicted fuzzy set $Y_i^*$ contains real numbers
in the interval $[y_i^* - e_i^*,\ y_i^* + e_i^*]$. Here $\vec{\gamma}$ and $\vec{c}$ denote the vectors of center values
($\vec{\gamma} = [\gamma_0, \dots, \gamma_p]^T$) and spreads ($\vec{c} = [c_0, \dots, c_p]^T$) for all the fuzzy sets $\Gamma_j$ $(j = 0, \dots, p)$.
In this thesis the membership function that determines the membership value will
be of the form $L(\cdot) = \max(0,\ 1 - |\cdot|)$, such that the membership function of $Y_i^*$ is
$$\mu_{Y_i^*}(Y_i) = L\left(\frac{y_i - y_i^*}{e_i^* - e_i}\right) = \begin{cases} 1 - \dfrac{|y_i - y_i^*|}{e_i^* - e_i} & \text{if } e_i^* - e_i > 0 \text{ and } |y_i - y_i^*| \le e_i^* - e_i \\ 0 & \text{if } e_i^* - e_i = 0 \text{ and } y_i \ne y_i^*, \text{ or } e_i^* - e_i < |y_i - y_i^*| \\ 1 & \text{if } e_i^* - e_i = 0 \text{ and } y_i = y_i^* \\ 0 & \text{if } e_i^* - e_i < 0 \end{cases}$$
where $\mu_{Y_i^*}(Y_i)$ is the membership function of $Y_i$ defining the fuzzy set $Y_i^*$. The center $y_i^*$
is the most possible value of the set $Y_i^*$ because $y_i^*$ has the highest membership value
for $\mu_{Y_i^*}(Y_i)$. The spread $e_i^*$ determines how fuzzy or precise the set $Y_i^*$ will be. Also, the
wider the spread, the closer to 1 (the largest membership value) the membership function
becomes.
Let the minimum $\mu_{Y_i^*}(Y_i)$ over all i be H, such that H is “the largest membership
value such that all $y_i$ values having membership of at least [H] inside the fuzzy
[observed] set $Y_i$ have at least [H] membership values inside the fuzzy [estimated] set $Y_i^*$” [6],
that is, $H = \min\{\mu_{Y_i^*}(Y_i),\ i = 1, \dots, n\}$. The value H is called the degree of fit for a model
and is a value in the interval [0,1]. A value of H=0.7 implies a higher degree of
membership of $Y_i$ in $Y_i^*$ than a value of H=0.2.
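The membership function and the degree of fit can be sketched in a few lines of Python. This is a minimal sketch under the definitions above, with each fuzzy set given as a (center, spread) pair; the function names `membership_in_predicted` and `degree_of_fit` and the toy numbers in the usage lines are hypothetical and are not taken from the thesis program.

```python
def membership_in_predicted(y, e, y_star, e_star):
    """Membership of the observed set (y, e) in the predicted set (y_star, e_star).

    Uses the triangular shape L(z) = max(0, 1 - |z|) with
    z = (y - y_star) / (e_star - e).
    """
    width = e_star - e
    dist = abs(y - y_star)
    if width < 0:
        return 0.0                      # predicted set narrower than observed set
    if width == 0:
        return 1.0 if dist == 0 else 0.0
    return max(0.0, 1.0 - dist / width)


def degree_of_fit(observed, predicted):
    """H = minimum membership over all observations (lists of (center, spread))."""
    return min(membership_in_predicted(y, e, ys, es)
               for (y, e), (ys, es) in zip(observed, predicted))


# Hypothetical toy values: two crisp observations (e = 0) and their predictions.
observed = [(10.0, 0.0), (5.0, 0.0)]
predicted = [(12.0, 4.0), (5.0, 2.0)]
H = degree_of_fit(observed, predicted)   # min(1 - 2/4, 1 - 0/2)
```

In this toy case the first observation lies halfway between the center and the edge of its predicted interval, so it alone determines H.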
The fuzzy sets $\Gamma_j$ $(j = 0, \dots, p)$ are defined as
$$\mu_{\Gamma_j}(a_j) = \begin{cases} 1 - \dfrac{|a_j - \gamma_j|}{c_j} & \text{if } |a_j - \gamma_j| \le c_j \text{ and } c_j > 0 \\ 0 & \text{if } c_j = 0 \text{ and } a_j \ne \gamma_j, \text{ or } c_j < |a_j - \gamma_j| \\ 1 & \text{if } c_j = 0 \text{ and } a_j = \gamma_j \end{cases}$$
where $\mu_{\Gamma_j}(a_j)$ is the membership function of $a_j$ defining the fuzzy set $\Gamma_j$. The center of
the fuzzy set $\Gamma_j$ is $\gamma_j$ and is the most possible value of the set $\Gamma_j$ because $\gamma_j$ has the
highest membership value for $\mu_{\Gamma_j}(a_j)$. The spread around the center $\gamma_j$ of the fuzzy set $\Gamma_j$ is
$c_j$ and is the precision of the fuzzy set $\Gamma_j$. A fuzzy set $\Gamma_j$ with $c_j = 0$ is referred to
as a crisp set.
The determination of the FLR residuals will be accomplished by using the
original formulation of the minimization problem by Tanaka et al. [10]. The minimization
problem is formulated as solving the following linear programming problem for $\vec{c}$ and $\vec{\gamma}$:
$$\min_{\vec{\gamma},\ \vec{c}\ \in R^{p+1}} \sum_{i=1}^{n} e_i^*$$
subject to
$$y_i^* + (1 - \hat{H})\, e_i^* \ge y_i + (1 - \hat{H})\, e_i$$
and
$$y_i^* - (1 - \hat{H})\, e_i^* \le y_i - (1 - \hat{H})\, e_i$$
where $c_j \ge 0$, $e_i^* = x_i^T \vec{c}$, and $\hat{H}$ is the estimated degree of fit. The minimization problem
above has 2n constraints.
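The linear program above can be sketched with a general-purpose solver. The following is a minimal sketch, not the thesis program: it assumes NumPy and SciPy are available, uses `scipy.optimize.linprog` in place of the IMSL routine, takes the spread as $e_i^* = |x_i|^T \vec{c}$ (absolute values of the covariates, as the Fortran code in Appendix A does), and runs on hypothetical toy data. The function name `tanaka_flr` is an assumption made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def tanaka_flr(X, y, H, e=None):
    """Solve the Tanaka-style LP for centers gamma and spreads c (0 <= H < 1)."""
    X = np.asarray(X, dtype=float)        # design matrix, first column of ones
    y = np.asarray(y, dtype=float)
    n, k = X.shape                        # k = p + 1 coefficients
    e = np.zeros(n) if e is None else np.asarray(e, dtype=float)
    absX = np.abs(X)
    w = 1.0 - H
    # Objective: sum_i e_i^* = (sum_i |x_i|)^T c; the centers carry zero cost.
    cost = np.concatenate([np.zeros(k), absX.sum(axis=0)])
    # y_i^* + w e_i^* >= y_i + w e_i   ->  -X g - w |X| c <= -(y + w e)
    # y_i^* - w e_i^* <= y_i - w e_i   ->   X g - w |X| c <=   y - w e
    A_ub = np.vstack([np.hstack([-X, -w * absX]),
                      np.hstack([X, -w * absX])])
    b_ub = np.concatenate([-(y + w * e), y - w * e])
    # Centers are free; spreads are non-negative.
    bounds = [(None, None)] * k + [(0, None)] * k
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k], res.x[k:]


# Hypothetical toy data: an intercept column plus one covariate.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 3.5, 4.0, 6.0])
gamma, c = tanaka_flr(X, y, H=0.5)
```

The stacked matrix has 2n rows, one pair per observation, which corresponds to the 2n constraints noted above. Any solution returned satisfies $|y_i - y_i^*| \le (1 - \hat{H})\, e_i^*$ for every observation, so each observation has membership of at least $\hat{H}$ in its predicted set.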
To compare the spreads of FLR with the standard errors of LSLR, this thesis will use
the minimization problem as developed by Tanaka et al. [10] and then refined by Savic
and Pedrycz [7], along with the search method by Moskowitz and Kim [4], to provide the
proper $\vec{\gamma}$ and $\vec{c}$. The reason for the difference between the two analyses is that in the
refinement by Savic and Pedrycz the centers of FLR are the same as the parameter
estimates in LSLR; therefore, the residuals of FLR and LSLR could not differ, by
definition.

The fuzzy regression minimization problem for the comparison of spreads is
conducted in two steps. The first step in the refinement of the problem by Savic and
Pedrycz is to obtain $\vec{\gamma} = (X^T X)^{-1} X^T Y$, which is the least-squares estimator. The second
step is to solve the following linear programming problem for $\vec{c}$:
$$\min_{\vec{c}\ \in R^{p+1}} \sum_{i=1}^{n} e_i^*$$
subject to
$$y_i^* + (1 - \hat{H})\, e_i^* \ge y_i + (1 - \hat{H})\, e_i$$
and
$$y_i^* - (1 - \hat{H})\, e_i^* \le y_i - (1 - \hat{H})\, e_i$$
where $c_j \ge 0$ and $e_i^* = x_i^T \vec{c}$. The minimization problem above also has 2n constraints. This
procedure as refined by Savic and Pedrycz is uniquely defined for $\vec{\gamma}$ when X is a full
rank matrix. The higher the value of $\hat{H}$, the greater $\vec{c}$ becomes [4]. The best
degree of fit is determined by finding the estimated fuzzy sets $\Gamma_j$, characterized by $(\gamma_j, c_j)$, of
$Y_i^*$ which are solutions to the minimization problem.
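The dependence of the spreads on $\hat{H}$ in the second step can be worked out directly. The derivation below is a sketch under two assumptions: $e_i = 0$ for all i (as in this thesis) and $0 \le \hat{H} < 1$. The symbols $r_i$ and $\vec{c}\,'$ are introduced here for illustration only.

```latex
\begin{aligned}
&\text{With } e_i = 0 \text{ and } \vec{\gamma} \text{ fixed, the two constraints for observation } i \text{ combine to}\\
&\qquad (1-\hat{H})\, x_i^T \vec{c} \;\ge\; |y_i - y_i^*| = r_i, \qquad c_j \ge 0 .\\
&\text{Substituting } \vec{c}\,' = (1-\hat{H})\,\vec{c} \text{ gives a problem that no longer involves } \hat{H}:\\
&\qquad \min_{\vec{c}\,'} \; \sum_{i=1}^{n} x_i^T \vec{c}\,' \quad \text{subject to } x_i^T \vec{c}\,' \ge r_i,\quad c_j' \ge 0 ,\\
&\text{so an optimal spread vector at degree of fit } \hat{H} \text{ is}\\
&\qquad \vec{c}(\hat{H}) = \frac{\vec{c}(0)}{1-\hat{H}} .
\end{aligned}
```

Under these assumptions the spreads grow without bound as $\hat{H}$ approaches 1, which agrees with the statement above that $\vec{c}$ increases with $\hat{H}$ [4]. The nonzero spreads reported in Table 2 appear to follow this pattern: for $c_2$, $0.453 \times (1 - 0.019)$ and $0.889 \times (1 - 0.500)$ are both approximately 0.444.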
Moskowitz and Kim note that “the selection of a proper value of [$\hat{H}$] is important
in fuzzy regression, because it determines the range of the possibility distributions
[valuation sets] of the fuzzy parameters.” Moskowitz and Kim suggest two methods for
determining $\hat{H}$, an analytical method and a search method. I will use the search method
in this thesis since the search method is more advantageous when the amount of spread is
uncertain. The search method is an extension to the second minimization problem above
and allows for the incorporation of the researcher’s beliefs regarding the spread of the
valuation set in selecting an $\hat{H}$ value, instead of just guessing an $\hat{H}$ value. The search
method is conducted by way of the following algorithm [4]:

1. Initialization
• set the interval of uncertainty ($H_{min} = 0$, $H_{max} = 1$)
• set H to the initial guess $H^*$
• set $\hat{c}_j$ (chosen spread of selected jth parameter) and set level of tolerance $\delta$
(greatest amount of difference wanted between $\hat{c}_j$ and $c_j$)
• obtain $\vec{\gamma} = (X^T X)^{-1} X^T Y$
• choose the type of membership function $\mu_{Y_i^*}(Y_i)$
2. Fuzzy Regression
• determine fuzzy fitted sets with a degree of fit H and membership function $\mu_{Y_i^*}(Y_i)$
• calculate for a selected jth parameter the difference
$\Delta_j = \hat{c}_j$ (chosen) $- c_j$ (from fuzzy regression)
3. Termination or Update
• if $|\Delta_j| < \delta$ then set $\hat{H} = H$, $\Gamma_j = (\gamma_j, c_j)$ for all j and stop
• if $\Delta_j \ge \delta$ then set $H_{min} = H$
• if $\Delta_j \le -\delta$ then set $H_{max} = H$
• set $H = H_{min} + (H_{max} - H_{min})/2$ and then go to step 2
This algorithm will provide the proper level $\hat{H}$ and the optimal $c_j$’s under the
membership function $\mu_{Y_i^*}(Y_i)$. The degree of fit $\hat{H}$ will be the membership value that can
be obtained while maintaining the spread of the jth parameter at a specified level [4].
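The search steps above amount to a bisection on H. The sketch below is an illustration only, assuming that the fitted spread of the selected parameter increases with H. The fuzzy regression of step 2 is passed in as a callable `spread_at`, and the stand-in `toy_spread` used in the usage lines is a hypothetical function, not a fit to the Restenosis data. The names `search_degree_of_fit`, `c_hat`, `tol`, `h_init`, and `max_iter` are likewise assumptions.

```python
def search_degree_of_fit(spread_at, c_hat, tol, h_init=0.5, max_iter=100):
    """Bisection search for H such that |c_hat - spread_at(H)| < tol.

    spread_at(H) must return the fitted spread c_j of the selected
    parameter at degree of fit H and is assumed to increase with H.
    """
    h_min, h_max = 0.0, 1.0               # interval of uncertainty
    h = h_init                            # initial guess H*
    for _ in range(max_iter):
        delta = c_hat - spread_at(h)      # chosen spread minus fitted spread
        if abs(delta) < tol:
            return h                      # this H is the estimate H-hat
        if delta >= tol:
            h_min = h                     # fitted spread too small: raise H
        else:
            h_max = h                     # fitted spread too large: lower H
        h = h_min + (h_max - h_min) / 2
    raise RuntimeError("no H found within max_iter iterations")


def toy_spread(h):
    """Hypothetical stand-in for step 2: a spread that grows as 0.5 / (1 - H)."""
    return 0.5 / (1.0 - h)


h_hat = search_degree_of_fit(toy_spread, c_hat=2.0, tol=1e-6)
```

With this stand-in, the first pass at H = 0.5 gives a spread of 1.0, which is too small, so the interval becomes [0.5, 1] and the next pass at H = 0.75 gives a spread of 2.0 and stops.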

Least-squares linear regression has the equation
$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon$, where $\beta_0, \beta_1, \dots, \beta_p$ are the p+1 regression coefficients
that need to be estimated, $X_1, X_2, \dots, X_p$ are the p independent variables, and $\varepsilon$ is the
random error. The random error $\varepsilon$ has a mean of 0 and a variance of $\sigma^2$. Least-squares
linear regression chooses as the best-fitting model that model which minimizes the sum
of squares of the distances between the observed responses and those predicted by the
fitted model. The idea is that the better the fit, the smaller the deviations of the observed
values from the predicted values. The least-squares solution then consists of those values
$\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_p$ for which the sum $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ is a minimum.
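For reference, the least-squares solution can be written in closed form. The short derivation below uses the design matrix X and the rows $x_i^T$ defined earlier in this section; the function name $S$ is introduced here only for illustration. It arrives at the same estimator that serves as the center vector $\vec{\gamma}$ in the refinement by Savic and Pedrycz.

```latex
\begin{aligned}
S(\vec{\beta}) &= \sum_{i=1}^{n} \left(Y_i - x_i^T \vec{\beta}\right)^2
               = (Y - X\vec{\beta})^T (Y - X\vec{\beta}),\\
\frac{\partial S}{\partial \vec{\beta}} &= -2\, X^T (Y - X\vec{\beta}) = 0
  \;\Longrightarrow\; X^T X\, \hat{\vec{\beta}} = X^T Y,\\
\hat{\vec{\beta}} &= (X^T X)^{-1} X^T Y \quad \text{when } X \text{ has full column rank.}
\end{aligned}
```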
Fuzzy linear regression allows some of the strict assumptions of least-squares
linear regression to be relaxed [10]. A comparison of the two relevant assumptions of
least-squares linear regression (LSLR) with fuzzy linear regression (FLR) is as follows
(LSLR assumptions are taken from Kleinbaum et al. [3]):
• LSLR requires the linearity assumption, which states that the mean value of Y
for each specific combination of $X_1, X_2, \dots, X_p$ is a linear function of
$X_1, X_2, \dots, X_p$ (i.e. $\mu_{Y|X_1,\dots,X_p} = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$). FLR is not strictly linear
because its coefficients are the fuzzy sets $\Gamma_j$, which are defined by triangular
membership functions in this thesis.
• LSLR requires a normality assumption, which states that for any fixed
combination of $X_1, X_2, \dots, X_p$, the variable Y is normally distributed. In LSLR,
deviations between the observed and the estimated values are assumed to be
due to random errors. The variable Y obtains its normal distribution from
these normally distributed random errors. In FLR, the normality
assumption does not apply, since the deviations between the observed and the
estimated values are assumed to depend on the vagueness or lack of precision
of the parameters.
The fuzzy linear regression estimates of the coefficients $\Gamma_j$ (where j = 0, …, p) in the
equation $Y^* = \Gamma_0 + \Gamma_1 x_1 + \dots + \Gamma_p x_p$ are determined using a Fortran program written by
Redden [6], and modified for the specific application in this thesis. The program uses the
IMSL subroutine DDLPRS [1] to determine the spreads $c_j$ and/or the centers $\vec{\gamma}$
of the fuzzy parameters $\Gamma_j$. The program is in Appendix A and limited documentation
for DDLPRS is in Appendix B. The Fortran program needs the data set, the least-squares
estimates of the parameters which will be used for $\vec{\gamma}$, the centers of the fuzzy coefficients
$\Gamma_j$, and the estimated spread, $c_j$, of one of the parameters as input when comparing the
spreads of FLR with the standard errors of LSLR. When comparing residuals, the Fortran
program only needs the data set as input. The solutions for LSLR and the validity of
LSLR’s assumptions are found by using SAS.

3 Comparison of Performance
3.1 Comparison of FLR Spread and LSLR Standard Error
The data set will be referred to as the Restenosis data set and comes from a Lovastatin
restenosis trial [11]. The Restenosis data set consists of 404 observations with missing
values for 94 of those observations. There are ten covariates and the dependent variable.
The covariates are the angina pectoris grade III or IV (yes or no), the diameter of stenosis
before angioplasty, the index site, the status of diabetes mellitus (yes or no), the diameter
of stenosis after angioplasty, the presence of systemic hypertension (yes or no), the
presence of intimal tear pre-PTCA (yes or no), the presence of intimal tear post-PTCA
(yes or no), the determination of an eccentric or concentric index site, and the age of the
individual. The dependent variable is the restudy of the diameter of the stenosis. In the
comparisons made between FLR and LSLR for the Restenosis data set, only the 310
observations with no missing values were used. The Restenosis data set met the LSLR
assumptions of linearity and normality, as can be seen in Appendix C.
The comparison of the FLR spreads with the LSLR standard errors is undertaken
in order to have an interpretable meaning for the FLR spreads. What is the meaning of an
observed set $Y_i$ falling within the predicted fuzzy set $Y_i^*$? The solutions for FLR are given
below in Table 2, and the standard errors for LSLR are in Table 3. The $\gamma_j$’s in Table 2
are LSLR parameters. Variables 1 through 10 in Table 2 and Table 3 are the covariates,
and variable 0 is the Y-intercept.
j    $\gamma_j$    $c_j$ ($\hat{H}$=0.019)   $c_j$ ($\hat{H}$=0.188)   $c_j$ ($\hat{H}$=0.500)   $c_j$ ($\hat{H}$=0.705)
0    23.696   0.00    0.00    0.00    0.00
1     1.624   0.00    0.00    0.00    0.00
2     0.264   0.453   0.547   0.889   1.505
3    -0.175   0.00    0.00    0.00    0.00
4    -4.358   0.00    0.00    0.00    0.00
5     0.477   0.343   0.415   0.674   1.141
6    -2.518   0.00    0.00    0.00    0.00
7     0.725   0.00    0.00    0.00    0.00
8    -5.676   0.00    0.00    0.00    0.00
9    -0.585   0.00    0.00    0.00    0.00
10    0.078   0.338   0.408   0.663   1.123
Table 2: Estimated $\gamma_j$'s and $c_j$'s for Restenosis Data Set
The low values of $\hat{H}$=0.019 and $\hat{H}$=0.188 mean that the FLR estimates do not fit the
data well. The values of $\hat{H}$=0.500 and $\hat{H}$=0.705 mean that the FLR estimates fit the data
fairly well.
j    $s.e._j$    1.96 $s.e._j$
0    15.452   30.286
1     2.239    4.388
2     0.109    0.213
3     0.184    0.361
4     3.666    7.186
5     0.093    0.183
6     2.259    4.428
7    10.074   19.745
8     2.984    5.848
9     2.263    4.434
10    0.113    0.221
Table 3: LSLR Standard Error for Restenosis Data Set

Only 1.9% of FLR prediction intervals for $\hat{H}$=0.019 are narrower than the LSLR 95%
level prediction intervals. For $\hat{H}$=0.188, $\hat{H}$=0.500, and $\hat{H}$=0.705, the FLR prediction
intervals are much wider than the LSLR 95% level prediction intervals. However, the
connection between FLR spreads and LSLR standard error is unclear. The question is
why the FLR prediction intervals for one $\hat{H}$ value, as opposed to prediction intervals for
other $\hat{H}$ values, have a higher percentage of prediction intervals that are narrower than
LSLR prediction intervals. There seems to be no relation between the FLR spreads and
the LSLR standard errors with regard to why some FLR prediction intervals are
narrower than the LSLR prediction intervals and other FLR prediction intervals are not
narrower. The crisp FLR spreads of 8 of the parameters do not allow for a good
comparison. Tanaka and Ishibuchi [9] have proposed a method with interactive fuzzy
parameters and quadratic membership functions to deal with crisp FLR spreads. This
method by Tanaka and Ishibuchi is not dealt with, though, in this thesis.
3.2 Comparison of Residuals
When comparing the residuals of the Restenosis data for LSLR and FLR, only 9 of the 10
covariates are used. The covariate for the presence of intimal tear post-PTCA was
removed for ease of computation. This comparison is undertaken in order to determine
whether FLR or LSLR has the better estimated coefficients with regards to the observed
data.
LSLR does much better than FLR since the sum of squared residuals for LSLR is
much smaller than the sum of squared residuals for FLR. The sum of squared residuals

for LSLR is 113,682.71, and the sum of squared residuals for FLR is 145,997.95, for a
difference of 32,315.24.
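The comparison above rests on a single quantity per method, the sum of squared residuals. As a minimal sketch, it can be computed as follows; the function name `sum_squared_residuals` is an assumption made for illustration, and only the two totals quoted above are taken from the thesis.

```python
def sum_squared_residuals(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Totals reported above for the Restenosis data set.
ssr_lslr = 113682.71
ssr_flr = 145997.95

difference = round(ssr_flr - ssr_lslr, 2)   # absolute gap between the two fits
ratio = ssr_flr / ssr_lslr                  # FLR total relative to LSLR total
```

On these figures the FLR total is roughly 28% larger than the LSLR total. Since LSLR minimizes exactly this quantity, no other choice of centers can do better on this criterion.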
The results of FLR for $\hat{H}$=0.50 are in Table 4. The first line of Table 4 is for the
Y-intercept, and the other lines are for the 9 covariates. For other $\hat{H}$ values, the $\gamma_j$’s are
the same except for $\hat{H}$=0.80, where $\gamma_4$ = -1.632, a minor difference. With these other $\hat{H}$
values, only the $c_j$’s change. These $c_j$’s are not relevant to the comparison of the
residuals between FLR and LSLR, since only the $\gamma_j$’s are used in the comparison.
j    $\gamma_j$    $c_j$    LSLR $param_j$    $s.e._j$
0     8.181   0.000   23.696   15.452
1    -4.943   7.468    1.624    2.239
2     0.546   1.246    0.264    0.109
3    -1.601   0.000   -0.175    0.184
4    -7.997   0.000   -4.358    3.666
5     0.418   0.000    0.477    0.093
6     1.365   0.000   -2.518    2.259
7    -2.154   0.000   -5.676    2.984
8    10.628   0.000   -0.585    2.262
9    -0.030   0.000    0.078    0.113
Table 4: Comparison of FLR and LSLR for $\hat{H}$ = 0.50
The graphs comparing the residuals to their respective observations and to their
respective predicted values are in Figure 1. FLR residuals seem to have greater variability
around 0 than LSLR residuals. The greater variability of the FLR residuals implies that
FLR predicted values are not as close to the observed values as are LSLR predicted
values.
Figure 1: Comparison of FLR residuals with LSLR residuals for Restenosis Data Set
(to view contact author at szegman@yahoo.com)

4 Conclusion and Discussion
An attempt to give meaning to the FLR parameters has been presented by comparing the
spreads of FLR estimated coefficients with the standard errors of LSLR estimated
coefficients and by comparing the residuals of FLR with the residuals of LSLR. At this
point no real conclusions can be drawn. LSLR seems to be better than FLR at describing
the Restenosis data set. Also, an interpretation that has meaning to statisticians can be
given to the results from LSLR. The same still cannot be said of FLR. Further work needs
to be conducted in order for statisticians to obtain an interpretable meaning from the FLR
parameters so that FLR can be used as a viable method for data analysis. For instance,
FLR might be usable for small data sets when LSLR cannot meet its
assumptions.
Other considerations that need to be made when trying to give meaning to FLR
parameters include, first, a generalization of the model used in this thesis. This generalized
model is considered robust in the presence of outliers, and it is a
model where the bounds of the interval are fuzzy. “The dependent data y are no longer
inside or outside the interval but belong to the interval to certain degrees (membership)”
[5]. This generalized model with fuzzy intervals maximizes $\bar{\lambda} = \frac{1}{n} \sum_{i=1}^{n} \lambda_i$ such that
$$(1 - \bar{\lambda})\, s_0 - \sum_{i=1}^{n} e_i^* \ge -d_0 \quad \text{(objective function)}$$
$$(1 - \lambda_i)\, s_i + y_i^* + e_i^* \ge y_i \quad \text{(upper limit)}$$
$$(1 - \lambda_i)\, s_i - y_i^* + e_i^* \ge -y_i \quad \text{(lower limit)}$$
$$-\lambda_i \ge -1, \quad \lambda_i \ge 0, \quad \gamma_j \in R, \quad x_{i0} = 1$$
where $y_i^* = \sum_{j=0}^{p} \gamma_j x_{ij}$ and $e_i^* = \sum_{j=0}^{p} c_j x_{ij}$ as in Section 2,
$d_0$ is the desired value of the objective function, $s_i$ is the width of the tolerance
interval of the observed $y_i$, and $\lambda_i$ represents the membership value to which the solution
belongs to the set “good solution” ($\lambda_i$ restricted to [0,1]). A weak requirement to
minimize the spread (a high value of $s_0$ and low values of $s_i$) leads to a wide interval,
whereas a strong requirement to minimize the spread (a low value of $s_0$ and high values of $s_i$)
leads to a narrow interval. Peters suggests $d_0 = 0$ to obtain a model as crisp as possible [5].
This generalized model needs to be investigated to see whether, and if so by how much, the
fuzzy linear regression prediction intervals are better or narrower than least-squares regression
prediction intervals for outliers.
A second consideration that can also possibly enhance the results of fuzzy linear
regression would be to vary the membership function used. Other membership functions
include the following: the uniform membership function
$$L(z) = \begin{cases} 1 & \text{if } -1 \le z \le 1 \\ 0 & \text{otherwise} \end{cases}$$
the asymmetric membership function $L(z) = \max(0,\ 1 - |z|^p)$, $p > 0$, or the summed exponential
function
$$L(z) = \begin{cases} a - e^{b|z|} - c\, e^{d|z|} & \text{if } L(z) \text{ is concave},\ -1 \le z \le 1 \\ a + e^{-b|z|} + c\, e^{d|z|} & \text{if } L(z) \text{ is convex},\ -1 \le z \le 1 \end{cases}$$
where L represents the membership function of a standardized parameter z. The
standardized parameter z is defined as the distance between the observation $Y_i$ and the
center value of the corresponding fuzzy estimate $Y_i^*$ divided by the difference in spread
of $Y_i^*$ and $Y_i$ [4].
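Two of the alternative shapes can be compared with the triangular function used in this thesis in a few lines of Python. This is a sketch only: the function names `triangular`, `uniform`, and `power` are assumptions, and the summed exponential form is omitted because its constants a, b, c, and d are not specified here.

```python
def triangular(z):
    """Membership shape used in this thesis: L(z) = max(0, 1 - |z|)."""
    return max(0.0, 1.0 - abs(z))


def uniform(z):
    """Uniform shape: full membership inside [-1, 1], none outside."""
    return 1.0 if -1.0 <= z <= 1.0 else 0.0


def power(z, p):
    """Shape L(z) = max(0, 1 - |z|**p) for p > 0; p = 1 gives the triangular shape."""
    if p <= 0:
        raise ValueError("p must be positive")
    return max(0.0, 1.0 - abs(z) ** p)


# Membership of the same standardized distance z = 0.5 under each shape.
z = 0.5
values = {
    "triangular": triangular(z),
    "uniform": uniform(z),
    "power (p=2)": power(z, 2),
}
```

For the same standardized distance, a larger p keeps the membership value closer to 1 near the center, so the choice of shape changes the degree of fit obtained for a given spread.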

A Appendix
C     FUZZY LINEAR REGRESSION SOLVED AS A LINEAR PROGRAM
C     DATA COLUMNS: X(I,1) = OBSERVED Y, X(I,2..NVARS+1) = DESIGN ROW
C     (ANY INTERCEPT COLUMN MUST BE IN THE DATA FILE),
C     X(I,NVARS+2) = OBSERVED SPREAD E
      PARAMETER (NDATA = 310)
      PARAMETER (NVARS = 10)
      PARAMETER (N = 2 * NVARS)
      PARAMETER (M1 = 2 * NDATA)
      PARAMETER (ZZ1 = 1)
      PARAMETER (ZZ2 = -1)
      INTEGER IRTYPE(M1), I, J, JUMP, K
      DOUBLE PRECISION X(NDATA, NVARS + 2), DUMMY(NDATA)
      DOUBLE PRECISION A(M1, N), CF(N), S, SUM(NVARS)
      DOUBLE PRECISION PSOL(N), DSOL(M1), H, UB, LB, DUMMY2(NDATA)
      DOUBLE PRECISION XLB(N), XUB(N), BL(M1), BU(M1)
      COMMON / WORKSP / RWKSP
      REAL RWKSP(15844578)
C     IMSL SUBROUTINE FOR WORKSPACE ALLOCATION
      CALL IWKIN(15844578)
      OPEN (UNIT=1, FILE='rest.fprogb.dat', STATUS='OLD')
      OPEN (UNIT=2, FILE='s2.dat', STATUS='OLD')
      OPEN (UNIT=3, FILE='s3.dat', STATUS='UNKNOWN')
C     READS IN DATA POINTS
      DO 10 I = 1, NDATA
   10 READ(1,*) (X(I,J), J = 1, NVARS + 2)
      H = 0.4
      DO 15 WHILE (H .LE. 1.0)
C     CLEARS THE CONSTRAINT MATRIX AND THE LOWER LIMIT VECTOR
      DO 16 I = 1, 2*NDATA
      BL(I) = 0
      DO 17 J = 1, 2*NVARS
      A(I,J) = 0
   17 CONTINUE
   16 CONTINUE
C     CALCULATES A(M1,N) MATRIX AND BL(M1) VECTOR
C     A MATRIX CONTAINS COEFFICIENTS OF M1 CONSTRAINTS
C     BL VECTOR CONTAINS THE LOWER LIMIT CONSTRAINTS
C     COLUMNS 1..NVARS HOLD THE CENTERS, NVARS+1..N THE SPREADS
      ZZ3 = 1 - H
      JUMP = 1
      DO 30 I = 1, NDATA
      DO 40 J = 1, NVARS
      A(JUMP,J) = ZZ2*X(I,J+1)
      A(JUMP+1,J) = ZZ1*X(I,J+1)
      A(JUMP,J+NVARS) = ZZ3*ABS(X(I,J+1))
      A(JUMP+1,J+NVARS) = ZZ3*ABS(X(I,J+1))
      BL(JUMP) = ZZ2 * (X(I,1)+ZZ3*X(I,NVARS+2))
      BL(JUMP+1) = X(I,1)+ZZ3*X(I,NVARS+2)
   40 CONTINUE
      JUMP = JUMP + 2
   30 CONTINUE
C     CALCULATES THE COEFFICIENTS FOR THE OBJECTIVE FUNCTION
C     SUM IS RESET FOR EACH H SO THAT IT DOES NOT ACCUMULATE
      DO 50 J = 2, NVARS + 1
      SUM(J-1) = 0
      DO 60 I = 1, NDATA
      SUM(J-1) = SUM(J-1) + ABS(X(I,J))
   60 CONTINUE
   50 CONTINUE
C     ASSIGNS COEFFICIENTS TO OBJECTIVE FUNCTION
C     1.0E30 / -1.0E30 MEAN NO LOWER / UPPER BOUND (SEE APPENDIX B)
      DO 70 J = 1, NVARS
      CF(J) = 0
      CF(J+NVARS) = SUM(J)
      XLB(J) = 1.0E30
      XUB(J) = -1.0E30
      XLB(J+NVARS) = 0
      XUB(J+NVARS) = -1.0E30
   70 CONTINUE
C     ASSIGNS TYPE OF CONSTRAINT TO VECTOR IRTYPE(M1)
C     (i.e. 2 INDICATES .GE. CONSTRAINT)
      DO 80 I = 1, M1
      IRTYPE(I) = 2
   80 CONTINUE
C     IMSL LINEAR PROGRAMMING SUBROUTINE
      CALL DDLPRS(M1,N,A,M1,BL,BU,CF,IRTYPE,XLB,XUB,S,PSOL,DSOL)
C     PRINTS SOLUTION TO LINEAR PROGRAMMING PROBLEM
      WRITE(3, 90)
   90 FORMAT(4X, 'GAMMA', 9X, 'C')
      DO 100 J = 1, NVARS
      WRITE(3, 110) PSOL(J), PSOL(J+NVARS)
  110 FORMAT(1X, F10.3, 3X, F7.3)
  100 CONTINUE
      WRITE(3, 120) S
  120 FORMAT(' OBJECTIVE FN = ', F11.3)
      WRITE(3, *) 'H VALUE', H
C     CALCULATES CONFIDENCE INTERVAL
      WRITE(3,*) 'OBSERVATION ', ' UPPER BOUND', ' LOWER BOUND'
      DO 130 I = 1, NDATA
      DUMMY(I) = 0
      DUMMY2(I) = 0
      DO 140 J = 1, NVARS
C     CALCULATES [(GAMMA)^T]*X(i)
      DUMMY(I) = PSOL(J)*X(I,J+1)+DUMMY(I)
C     CALCULATES [C^T]*ABS(Xi), THE SAME TERMS AS THE CONSTRAINT ROWS
      DUMMY2(I) = PSOL(J+NVARS)*ABS(X(I,J+1))+DUMMY2(I)
  140 CONTINUE
      IF (DUMMY2(I) .GE. 0.0) THEN
      UB = DUMMY(I)+DUMMY2(I)
      LB = DUMMY(I)-DUMMY2(I)
      END IF
      IF (DUMMY2(I) .LT. 0.0) THEN
      UB = DUMMY(I)-DUMMY2(I)
      LB = DUMMY(I)+DUMMY2(I)
      END IF
      WRITE(3,150) I, UB, LB
  150 FORMAT(I7, F19.3, F13.3)
  130 CONTINUE
C     MOVES ON TO THE NEXT H VALUE
      H = H + 0.2
   15 CONTINUE
      END

B Appendix
DDLPRS: DOUBLE PRECISION
Purpose: solve a linear programming problem via the revised simplex algorithm
Usage: Call DDLPRS(M, NVAR, A, LDA, BL, BU, C, IRTYPE, XLB, XUB, OBJ,
XSOL, DSOL)
Arguments:
M – Number of constraints.
NVAR – Number of variables (p+1).
A – Matrix of dimension M by NVAR containing the coefficients of the M
constraints.
LDA – Leading dimension of A exactly as specified in the dimension statement of the
calling program. LDA must be at least M.
BL – Vector of length M containing the lower limit of the general constraints. If there
is no lower limit on the I-th constraint, then BL(I) is not referenced.
BU – Vector of length M containing the upper limit of the general constraints. If there
is no upper limit on the I-th constraint, then BU(I) is not referenced. If there is
no range constraint, BL and BU can share the same storage location.
C – Vector of length NVAR containing the coefficients of the objective function.
IRTYPE – Vector of length M indicating the type of constraints exclusive of simple
bounds, where IRTYPE(I) = 0, 1, 2, 3 indicate .EQ., .LE., .GE., and range constraints
respectively.

XLB – Vector of length NVAR containing the lower bound on the variables. If there
is no lower bound on a variable, then 1.0E30 should be set as the lower bound.
XUB – Vector of length NVAR containing the upper bound on the variables. If there
is no upper bound on a variable, then -1.0E30 should be set as the upper bound.
OBJ – Value of the objective function (Output).
XSOL – Vector of length NVAR containing the primal solution (Output).
DSOL – Vector of length M containing the dual solution (Output).
DDLPRS is based on Richard Hanson’s routine LPMGB. It uses a revised simplex
method to solve linear programming problems. In this thesis the problem is of the form:
$$\min_{\vec{c}\ \in R^{p+1}} \vec{c}^{\,T} \sum_{i=1}^{n} x_i$$
subject to
$$A \vec{c} \ge \vec{b}, \qquad c_j \ge 0$$
where A is the coefficient matrix consisting of p+1 columns and 2n rows such that
$$A = (1 - H) \begin{bmatrix} X \\ X \end{bmatrix}$$
X is a design matrix such that
$$X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix}$$
$x_i = [1, x_{i1}, \dots, x_{ip}]^T$, $\vec{b}$ (BL) is the vector of the lower bounds on the constraints consisting of
2n rows such that
$$\vec{b} = \begin{bmatrix} Y_1 + (1 - H) e_1 - \vec{\gamma}^{\,T} x_1 \\ -Y_1 + (1 - H) e_1 + \vec{\gamma}^{\,T} x_1 \\ \vdots \\ Y_n + (1 - H) e_n - \vec{\gamma}^{\,T} x_n \\ -Y_n + (1 - H) e_n + \vec{\gamma}^{\,T} x_n \end{bmatrix}$$
and $\vec{c}$ is a vector of p+1 rows.

C Appendix
Predicted Value of Y vs. Residual of Y for Restenosis Data Set
Normal Probability Plot for Restenosis Data Set
To view contact author at szegman@yahoo.com

References
[1] IMSL, Inc., Houston, Texas. IMSL Math/Library, version 1.0 edition, 1989.
[2] D. Klaua. Über einen Ansatz zur mehrwertigen Mengenlehre. Monatsber. Deut. Akad.
Wiss. Berlin, 7:859-876, 1965.
[3] David G. Kleinbaum, Lawrence L. Kupper, and Keith E. Muller. Applied Regression
Analysis and Other Multivariable Methods, 2 ed. PWS-Kent Publishing Company,
Boston, 1988.
[4] Herbert Moskowitz and Kwangjae Kim. On assessing the h value in fuzzy linear
regression. Fuzzy Sets and Systems, 58:303-327, 1993.
[5] Georg Peters. Fuzzy linear regression with fuzzy intervals. Fuzzy Sets and Systems,
63:45-55, 1994.
[6] David T. Redden and William H. Woodall. Properties of certain fuzzy linear
regression methods. Fuzzy Sets and Systems, 64:361-375, 1994.
[7] Dragan A. Savic and Witold Pedrycz. Evaluation of fuzzy linear regression models.
Fuzzy Sets and Systems, 39:51-63, 1991.

[8] Michael Smithson. Fuzzy Set Analysis for Behavioral and Social Sciences, Springer-
Verlag New York, Inc., NY 1987.
[9] Hideo Tanaka and H. Ishibuchi. Identification of possibilistic linear systems by
quadratic membership functions of fuzzy parameters. Fuzzy Sets and Systems,
41:145-160, 1991.
[10] Hideo Tanaka, Satoru Uejima, and Kiyoji Asai. Linear regression analysis with
fuzzy model. IEEE Transactions on Systems, Man, and Cybernetics, 12(6):903-907,
1982.
[11] William S. Weintraub, Andrzej S. Kosinski, Charles L. Brown III, and Spencer B.
King III. Can restenosis after coronary angioplasty be predicted from clinical
variables? Journal of the American College of Cardiology, 21:6-14, 1993.
[12] Lotfi A. Zadeh. Fuzzy sets. Information and Control, 8:338-353, 1965.
