ABSTRACT: This paper critically examined a broad view of Structural Equation Modeling (SEM) with a view to pointing out directions for how researchers can apply this model in future research, with specific focus on several traditional multivariate procedures such as factor analysis, discriminant analysis, and path analysis. The study employed a descriptive survey and historical research design. Data were computed via descriptive statistics, correlation coefficients, and reliability analysis. The study concluded that novice researchers must take care with the assumptions and concepts of Structural Equation Modeling while building a model to test a proposed hypothesis. SEM is an evolving research technique that is expanding into new fields and providing new insights to researchers for conducting longitudinal investigations.
Data Science - Part IV - Regression Analysis & ANOVA, by Derek Kane
This lecture provides an overview of linear regression analysis, interaction terms, ANOVA, optimization, and log-level and log-log transformations. The first practical example centers on the Boston housing market, while the second dives into business applications of regression analysis for a supermarket retailer.
Between Black and White Population - 1. Comparing annual percent (.docx), by jasoninnes20
Between Black and White Population
1. Comparing annual percent of Medicare enrollees having at least one ambulatory visit between B and W
2. Comparing average annual percent of diabetic Medicare enrollees age 65-75 having hemoglobin A1c between B and W
3. Comparing average annual percent of diabetic Medicare enrollees age 65-75 having eye examination between B and W
4. Comparing average annual percent of diabetic Medicare enrollees age 65-75 having
Students will develop an analysis report, in five main sections, including introduction, research method (research questions/objective, data set, research method, and analysis), results, conclusion and health policy recommendations. This is a 5-6 page individual project report.
Here are the main steps for this assignment.
Step 1: Students are required to submit the topic using the topic selection discussion forum by the end of week 1 and wait for instructor approval.
Step 2: Develop the research question and
Step 3: Run the analysis using Excel (RStudio for BONUS points) and report the findings following the assignment instructions.
The Report Structure:
Start with the
1. Cover page (1 page, including running head).
Please look at the example http://www.apastyle.org/manual/related/sample-experiment-paper-1.pdf (you can download the file from the class) and http://www.umuc.edu/library/libhow/apa_tutorial.cfm to learn more about the APA style.
In the title page include:
· Title, this is the approved topic by your instructor.
· Student name
· Class name
· Instructor name
· Date
2. Introduction
Introduce the problem or topic being investigated. Include relevant background information, for example:
· Indicate why this is an issue or topic worth researching;
· Highlight how others have researched this topic or issue (whether quantitatively or qualitatively); and
· Specify how others have operationalized this concept and measured these phenomena.
Note: Introduction should not be more than one or two paragraphs.
Literature Review
There is no need for a literature review in this assignment
3. Research Question or Research Hypothesis
What is the Research Question or Research Hypothesis?
***Just in time information: Here are a few points for Research Question or Research Hypothesis
There are basically two kinds of research questions: testable and non-testable. Neither is better than the other, and both have a place in applied research.
Examples of non-testable questions are:
How do managers feel about the reorganization?
What do residents feel are the most important problems facing the community?
Respondents' answers to these questions could be summarized in descriptive tables and the results might be extremely valuable to administrators and planners. Business and social science researchers often ask non-testable research questions. The shortcoming with these types of questions is that they do not provide objective cut-off points for decision-makers.
In order to overcome this problem, researchers often seek to answer o ...
Assessing Discriminatory Performance of a Binary Logistic Regression Model, by sajjalp
The evaluation of a fitted binary logistic regression model is very important in assessing the appropriateness of the model for specific purposes. The study proposes to assess the discriminatory performance of a binary logistic regression model in correctly classifying cases and non-cases. The discriminatory performance of the binary logistic regression model is measured using two approaches. The first approach uses the fitted binary logistic regression model to correctly predict which subjects are cases and non-cases, with the help of the parameters sensitivity and specificity. The alternative approach is based on the receiver operating characteristic (ROC) curve for the fitted binary logistic regression model, determining the area under the curve (AUC) as a measure of discriminatory performance. The value of sensitivity is observed to be greater than the value of 1-specificity, which signifies suitable discrimination at the chosen cut point. The area under the curve indicates that there is evidence of reasonable discrimination by the fitted model.
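The two approaches sketched in this abstract can be illustrated with a few lines of code. The following is my own minimal sketch (not code from the study), computing sensitivity and specificity at a cut point and the AUC via the rank-based Mann-Whitney formulation, on tiny hypothetical data:

```python
# Illustrative sketch: sensitivity, specificity, and AUC for a binary
# classifier's predicted probabilities (hypothetical data, not the study's).

def sens_spec(y_true, p_hat, cut=0.5):
    """Sensitivity and specificity at a given probability cut point."""
    tp = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p >= cut)
    fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p < cut)
    tn = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p < cut)
    fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= cut)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, p_hat):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen case scores higher than a randomly chosen non-case."""
    cases = [p for y, p in zip(y_true, p_hat) if y == 1]
    controls = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))

y = [1, 1, 1, 0, 0, 0]               # 1 = case, 0 = non-case
p = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]   # fitted probabilities
se, sp = sens_spec(y, p)             # sensitivity and specificity at 0.5
a = auc(y, p)                        # area under the ROC curve
```

Discrimination at the cut point is "suitable" in the abstract's sense when sensitivity exceeds 1-specificity, which is easy to check directly from these two quantities.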
USE OF PLS COMPONENTS TO IMPROVE CLASSIFICATION ON BUSINESS DECISION MAKING, by IJDKP
This paper presents a methodology that eliminates multicollinearity of the predictor variables in supervised classification by transforming the predictor variables into orthogonal components obtained from the application of Partial Least Squares (PLS) Logistic Regression. PLS logistic regression was developed by Bastien, Esposito-Vinzi, and Tenenhaus [1]. We apply the techniques of supervised classification to data based on the original variables and to data based on the PLS components. The error rates are calculated and the results compared. The implementation of the classification methodology rests upon computer programs written in the R language to make possible the calculation of PLS components and classification error rates. The impact of this research will be disseminated, based on evidence that the Partial Least Squares Logistic Regression methodology is fundamental when working in supervised classification with data having many predictor variables.
Utilitas Mathematica Journal is a broad-scope journal that publishes original research and review articles on all aspects of both pure and applied mathematics and statistics, including Number Theory, Operations Research, and Mathematical Biology, and is committed to strengthening the profession.
https://utilitasmathematica.com/index
Our journal's move to a fully open-access format is a significant step toward advancing the principles of open science and equitable access to knowledge. However, this transition also brings challenges, such as ensuring sustainable funding models and maintaining rigorous peer-review standards.
https://utilitasmathematica.com/index.php/Index
Our Journal has recently transitioned to becoming a fully open-access journal. This transition marks a significant shift in the landscape of academic publishing, aiming to provide numerous benefits to both authors and readers. Open access has the potential to democratize knowledge, enhance research impact, and foster greater collaboration within the academic community.
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING, by IJDKP
This article advances our understanding of regression-based data mining by comparing the utility of Least Absolute Value (LAV) and Least Squares (LS) regression methods. Using demographic variables from U.S. state-wide data, we fit regression models to dependent variables of varying distributions using both LS and LAV. Forecasts generated from the resulting equations are used to compare the performance of the regression methods under different dependent-variable distribution conditions. Initial findings indicate that LAV procedures forecast better in data mining applications when the dependent variable is non-normal. Our results differ from those found in prior research using simulated data.
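The LS/LAV contrast in this abstract can be illustrated in the simplest possible setting. For an intercept-only ("location") model, the LS fit is the sample mean and the LAV fit is the sample median, which makes the robustness difference visible at a glance; this sketch and its data are mine, not the article's:

```python
# Illustrative sketch: for an intercept-only model, least squares (LS)
# minimizes squared error (solution: mean), while least absolute value
# (LAV) minimizes absolute error (solution: median). A single gross
# outlier drags the LS fit far more than the LAV fit.
from statistics import mean, median

y = [10, 11, 9, 10, 12, 100]   # hypothetical data with one outlier (100)

ls_fit = mean(y)     # pulled strongly toward the outlier
lav_fit = median(y)  # barely affected by the outlier
```

This is the same mechanism that favors LAV when the dependent variable is non-normal (heavy-tailed): squared error magnifies extreme residuals, absolute error does not.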
Welcome to this comprehensive presentation on regression analysis, a fundamental technique in predictive modeling. In this slide deck, we will embark on a journey through the intricate world of regression, exploring its essence, types, applications, systematic process, underlying assumptions, diagnostic tools, and real-world significance.
Regression analysis is a powerful statistical tool that enables us to understand and quantify the relationships between variables. By examining the interplay between a dependent variable and one or more independent variables, regression unveils patterns and trends that can drive informed decision-making. Whether you're working in finance, marketing, healthcare, or any other field, regression empowers analysts to extract valuable information from their data and make accurate predictions.
During our exploration, we will delve into various types of regression models. Simple Linear Regression establishes a linear relationship between two variables, serving as a foundation for understanding more complex models. Multiple Linear Regression expands this concept by incorporating multiple predictors, allowing us to account for multiple factors influencing the dependent variable. Polynomial Regression goes beyond linear relationships, capturing non-linear associations between variables. Logistic Regression, on the other hand, is specifically designed for predicting categorical outcomes, making it an invaluable tool for classification problems.
Throughout the presentation, we will showcase real-world applications of regression analysis. Witness how regression aids in predicting stock prices, forecasting sales, estimating housing prices, analyzing customer behavior, predicting disease outcomes, optimizing resource allocation, and much more. These examples illustrate the remarkable impact of regression across industries, demonstrating its relevance and effectiveness in solving complex problems and driving data-driven decision-making.
In conclusion, regression analysis is a powerful tool that unlocks a world of possibilities. By unraveling complex relationships, making accurate predictions, and extracting valuable insights from data, regression empowers analysts to drive evidence-based decision-making and stay ahead in a rapidly evolving world. Join us as we delve into the world of regression and discover its potential to transform the way you approach data analysis and modeling. Let's embark on this journey together and harness the power of regression analysis!
www.ajms.com
RESEARCH ARTICLE
Regression Diagnostic Checking for Survey Data on Mothers’ Weights and Ages as
Factors Responsible for Babies’ Weight at Birth
Habiba Danjuma
Department of Statistics, Federal Polytechnic Bali, PMB 05 Bali, Taraba State, Nigeria
Received: 22-07-2022; Revised: 25-08-2022; Accepted: 25-10-2022
ABSTRACT
When a regression model is considered for an application, we can usually not be certain in advance
that the model is appropriate for that application; any one, or several, of the features of the model, such
as linearity of the regression function or normality of the error terms, may not be appropriate for the
particular data at hand. Hence, diagnostic checking techniques on the regression model are essential.
This study therefore concerns model diagnostic checking techniques with application to linear regression
analysis. In this study, methods useful for diagnosing violations of the basic regression assumptions are
presented and tested using secondary data on babies’ weight at birth, which serves as the dependent
variable, with mothers’ weights and ages as the independent variables. All the assumptions tested under
the objectives (normality of the residuals, collinearity between the independent variables, outliers/leverage,
and linearity of the model) are met, and none deviated from the assumptions of the multiple linear
regression fitted on the data.
Key words: Diagnostic checking, Regression assumptions, Collinearity, Outliers, Normality, Linearity
Address for correspondence:
Habiba Danjuma
E-mail: yumarhabiba@gmail.com
INTRODUCTION
Regression analysis is a statistical technique
for investigating and modeling the relationship
between two or more variables. More specifically,
it is an attempt to explain movements in one variable
by reference to movements in one or more other
variables. To make this definition more concrete,
we can denote the variable whose movements the
regression seeks to explain by y and the variables
used to explain those variations by x1, x2, …, xk.
Hence, in this setup, it would be said that variations
in the k variables (the x’s) cause changes in some
variable y.[1]
Applications of regression analysis exist in almost
every field of human endeavor. In econometrics,
the dependent variable might be a family’s
consumption expenditure and the independent
variables might be the family’s income, number
of children in the family, and other factors that
would affect the family’s consumption patterns.
In political science, the dependent variable
might be a state’s level of welfare spending and
the independent variables measures of public
opinion and institutional variables that would
cause the state to have higher or lower levels of
welfare spending. In sociology, the dependent
variable might be a measure of the social status
of various occupations and the independent
variables characteristics of the occupations (pay,
qualifications, etc.). In psychology, the dependent
variable might be individual’s racial tolerance as
measured on a standard scale and with indicators
of social background as independent variables.
In education, the dependent variable might be
a student’s score on an achievement test and
the independent variables characteristics of the
student’s family, quality of teachers, or school.
One major way to judge whether a variable adds
to the explanatory power of a model is by looking
at the impact its inclusion has on the value of
R-Squared. If the value of R-Squared increases
significantly when a variable is added to the
model, then the extra information provided by this
variable increases the model’s ability to explain
the variation in the response variable (Imam).[2]
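The role of R-Squared described above can be made concrete with a small sketch. The data and variable names below are hypothetical (not from the paper); the idea is simply to fit the model with and without a candidate variable and compare the two R-Squared values:

```python
# Illustrative sketch: how adding a variable changes R-squared.
# Hypothetical simulated data; not the paper's survey data.
import numpy as np

def r_squared(predictors, y):
    """R-squared of an OLS fit with intercept; predictors is a list of arrays."""
    A = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 2 + 3 * x1 + 1.5 * x2 + rng.normal(scale=0.5, size=50)

r2_one = r_squared([x1], y)       # model with x1 only
r2_two = r_squared([x1, x2], y)   # adding x2: R-squared rises noticeably
```

Note that R-Squared can never decrease when a variable is added, so what matters for judging explanatory power is the size of the increase, not its mere existence.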
ISSN 2581-3463
AJMS/Oct-Dec-2022/Vol 6/Issue 4
Diagnostic techniques were gradually developed
to find problems in model-fitting and to assess
the quality and reliability of regression estimates.
These concerns turned into an important area
in regression theory intended to explore the
characteristics of a fitted regression model for a
given data set. Discussion of diagnostics for linear
regression models is often indispensable chapters
or sections in most of the statistical textbooks on
linear models. One of the most influential books on
the topic was Regression Diagnostics: Identifying
Influential Data and Sources of Collinearity by
Belsley et al.[3]
The survey literature has not given much attention to
diagnostics for linear regression models. Deville and
Särndal[4] and Potter[5] discuss some possibilities
for locating or trimming extreme survey weights
when the goal is to estimate population totals
and other simple descriptive statistics. Hulliger[6]
and Moreno-Rebollo et al.[7] address the effect
of outliers on the Horvitz-Thompson estimator
of a population total. Smith (1987) demonstrates
diagnostics based on case deletion and a form of
the influence function. Chambers and Skinner,[8]
Gwet and Rivest,[9] Welsh and Ronchetti,[10] and
Duchesne (1999) conduct research on outlier-
robust estimation techniques for totals. Elliott[11]
and Korn and Graubard (1995)[12] are two of the
few references which introduce techniques for the
evaluation of the quality of regression on complex
survey data.
When a regression model is considered for an
application, we can usually not be certain in
advance that the model is appropriate for that
application; any one, or several, of the features
of the model, such as linearity of the regression
function or normality of the error terms, may not
be appropriate for the particular data at hand.[13]
Hence, it is important to examine the suitability
of the model for the data before inferences based
on that model are undertaken. This is achieved
using model diagnostic checking techniques on
the regression model. In this section, we discuss
some simple graphic methods for studying the
appropriateness of a model, as well as some
remedial measures that can be helpful when the
data are not in accordance with the conditions of
the regression model. Therefore, the aim of this
study is to examine departures from the assumptions
of linear regression model with normal errors
through model diagnostic checking techniques.
We shall consider following six important types
of departures from linear regression model with
normal errors: non-linearity, non-constant of
variances of error term, multicollinearity, outlier
in observations, and non-normality of error term.
METHODOLOGY
This section presents the full procedure for
diagnostic testing of deviations of the regression
model from the assumptions considered in this
paper. The data used are secondary and comprise
one dependent/response variable and two
independent/predictor variables. In
regression analysis, tests based on the square of
the residuals may be used to detect non-linearity.
This study considers the following five important
types of departures from the linear regression model:
the regression function is not linear, the error terms
do not have constant variance, the error terms are
not independent, the model fits all but one or a few
outlier observations, and the error terms are not
normally distributed. The procedures to assess these
departures are as follows:
Linear Regression Models
We consider a basic linear model where there is
only one predictor variable and the regression
function is linear. Extension to models with more
than one predictor variable is straightforward. The
model can be stated as follows:

Y_i = β_0 + β_1 X_i + ε_i (1)

where Y_i is the value of the response variable
in the ith trial; β_0 and β_1 are parameters; X_i is
a known constant, namely, the value of the
predictor variable in the ith trial; ε_i is a random
error term with mean zero and variance σ²; and
ε_i and ε_j are uncorrelated, so that their covariance
is zero.
Regression model (1) is said to be simple, linear
in the parameters, and linear in the predictor
variable. It is "simple" in that there is only one
predictor variable; "linear in the parameters"
because no parameter appears as an exponent
or is multiplied or divided by another parameter;
and "linear in the predictor variable" because this
variable appears only to the first power. A model
that is linear in the parameters and in the predictor
variable is also called a first-order model.
Danjuma: Regression diagnostic checking for survey data on mothers' weights and ages. AJMS/Oct-Dec-2022/Vol 6/Issue 4
Diagnostics and Remedial Measures
When a regression model is considered for an
application, we can usually not be certain in
advance that the model is appropriate for that
application, any one, or several, of the features
of the model, such as linearity of the regression
function or normality of the error terms, may not be
appropriate for the particular data at hand. Hence, it
is important to examine the aptness of the model for
the data before inferences based on that model are
undertaken. In this section, we discuss some simple
graphic methods for studying the appropriateness
of a model, as well as some remedial measures that
can be helpful when the data are not in accordance
with the conditions of the regression model.
Non-linearity of Regression Model
Whether a linear regression function is appropriate for
the data being analyzed can be studied from a residual
plot against the predictor variable or, equivalently,
from a residual plot against the fitted values.
Figure 1a shows a prototype situation of the residual
plot against X when a linear regression model is
appropriate. The residuals then fall within a horizontal
band centered around 0, displaying no systematic
tendencies to be positive or negative. Figure 1b
shows a prototype situation of a departure from the
linear regression model that indicates the need for a
curvilinear regression function. Here, the residuals tend
to vary in a systematic fashion between being positive
and negative. Figure 2a displays a prototype
situation of a departure from the linear regression
model that indicates the need for a nonlinear relationship,
while Figure 2b shows a linear relationship.
Non-constancy of Error Variance
Plots of residuals against the predictor variable or
against the fitted values are not only helpful to study
whether a linear regression function is appropriate
but also to examine whether the variance of the error
terms is constant. The prototype plot in Figure 1a
exemplifies residual plots when error term variance
is constant. Figure 2a shows a prototype picture of
residual plot when the error variance increases with X.
Presence of Outliers
Outliers are extreme observations. Residual
outliers can be identified from residual plots
against X or Ŷ. Outliers can create great difficulty.
When we encounter one, our first suspicion is that
the observation resulted from a mistake or other
extraneous effect. On the other hand, outliers
may convey significant information, as when an
outlier occurs due to an interaction with another
predictor omitted from the model. A safe rule
frequently suggested is to discard an outlier only
if there is direct evidence that it represents an error
in recording, a miscalculation, a malfunctioning
of equipment, or a similar type of circumstance.

Figure 1: (a and b) Prototype situation of the residual plot
against X when a linear regression model is appropriate
Non-independence of Error Terms
Whenever data are obtained in a time sequence or
some other type of sequence, such as for adjacent
geographical areas, it is a good idea to prepare a
sequence plot of the residuals. The purpose of
plotting the residuals against time or some other
type of sequence is to see if there is any correlation
between error terms that are near each other in
the sequence. A prototype residual plot showing a
time related trend effect is presented in Figure 2b,
which portrays a linear time related trend effect.
When the error terms are independent, we expect
the residuals in a sequence plot to fluctuate in a
more or less random pattern around the base line 0.
Non-normality of Error Terms
Small departures from normality do not create any
serious problems. Major departures, on the other
hand, should be of concern. The normality of the
error terms can be studied informally by examining
the residuals in a variety of graphic ways.
Figure 2: (a and b) A prototype situation of a departure
from the linear regression model
A comparison of frequencies, when the number of
cases is reasonably large, compares the actual
frequencies of the residuals against the expected
frequencies under normality. For example, one
can determine whether about 90% of the
residuals fall between ±1.645√MSE. Normal
probability plot: still another possibility is to
prepare a normal probability plot of the residuals.
Here, each residual is plotted against its expected
value under normality. A plot that is nearly linear
suggests agreement with normality, whereas a plot
that departs substantially from linearity suggests
that the error distribution is not normal.
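The frequency comparison above can be sketched directly: under normal errors, roughly 90% of residuals should lie within ±1.645√MSE. The residuals below are simulated for illustration, and the choice of n = 200 cases and p = 2 predictors is a hypothetical assumption:

```python
import numpy as np

# Rough normality check: fraction of residuals within ±1.645*sqrt(MSE).
# Simulated residuals; n and p are hypothetical.
rng = np.random.default_rng(1)
n, p = 200, 2
resid = rng.normal(0, 1.5, size=n)

mse = np.sum(resid ** 2) / (n - p - 1)             # error mean square
inside = np.mean(np.abs(resid) <= 1.645 * np.sqrt(mse))
print(f"fraction within ±1.645*sqrt(MSE): {inside:.2f}")
```

A fraction far from 0.90 would suggest that the error distribution departs from normality.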
Omission of Important Predictor Variables
Residuals should also be plotted against variables
omitted from the model that might have important
effects on the response. The purpose of this
additional analysis is to determine whether there
are any key variables that could provide important
additional descriptive and predictive power to
the model. The residuals are plotted against the
additional predictor variable to see whether or not
the residuals tend to vary systematically with the
level of the additional predictor variable.
Overview of Tests
Graphical analysis of residuals is inherently
subjective. Nevertheless, subjective analysis
of a variety of interrelated residuals plots will
frequently reveal difficulties with the model more
clearly than particular formal tests.
TESTS FOR RANDOMNESS
A run test is frequently used to test for lack of
randomness in the residuals arranged in time order.
Another test, specially designed for lack of randomness
in least squares residuals, is the Durbin-Watson test.
Durbin–Watson test
The Durbin–Watson test assumes the first-order
autoregressive error model. The test consists of
determining whether or not the autocorrelation
coefficient (ρ, say) is zero. The usual test
alternatives considered are:

H_0: ρ = 0
H_a: ρ > 0

The Durbin–Watson test statistic D is obtained by
using ordinary least squares to fit the regression
function, calculating the ordinary residuals
e_t = Y_t − Ŷ_t, and then calculating the statistic:

D = [Σ_{t=2}^{n} (e_t − e_{t−1})²] / [Σ_{t=1}^{n} e_t²]

Exact critical values are difficult to obtain,
but Durbin and Watson have obtained lower and
upper bounds d_L and d_U such that a value of D
outside these bounds leads to a definite decision.
The decision rule for testing between the
alternatives is:

if D > d_U, conclude H_0
if D < d_L, conclude H_a
if d_L ≤ D ≤ d_U, the test is inconclusive.

Small values of D lead to the conclusion that ρ > 0.
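The statistic D is easy to compute directly from its definition. The residuals below are simulated to be roughly independent, so D should come out near 2; values near 0 would instead signal positive autocorrelation:

```python
import numpy as np

# Durbin-Watson statistic from its definition:
# D = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2.
# Simulated (independent) residuals for illustration.
rng = np.random.default_rng(3)
e = rng.normal(0, 1, size=100)

D = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(f"D = {D:.3f}")   # a value near 2 is consistent with no autocorrelation
```

In practice D would be computed from the ordinary residuals of the fitted regression and compared with the tabulated bounds d_L and d_U.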
Correlation Test for Normality
In addition to visually assessing the appropriate
linearity of the points plotted in a normal
probability plot, a formal test for normality of
the error terms can be conducted by calculating
the coefficient of correlation between residuals ei
and their expected values under normality. A high
value of the correlation coefficient is indicative of
normality.
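This correlation can be computed by pairing the ordered residuals with approximate expected normal order statistics. The sketch below uses Blom's plotting positions for the expected values, which is an assumption (the paper does not specify a plotting-position formula), and simulated residuals:

```python
import numpy as np
from statistics import NormalDist

# Correlation test for normality: correlate ordered residuals with their
# approximate expected values under normality (Blom's positions assumed).
rng = np.random.default_rng(2)
resid = rng.normal(0.0, 1.0, size=50)   # simulated residuals

n = resid.size
ordered = np.sort(resid)
probs = (np.arange(1, n + 1) - 0.375) / (n + 0.25)   # Blom plotting positions
expected = np.array([NormalDist().inv_cdf(p) for p in probs])

r = np.corrcoef(ordered, expected)[0, 1]
print(f"normality correlation: {r:.3f}")   # near 1 suggests normal errors
```

The observed correlation would then be compared with tabulated critical values for this test; a value well below them indicates non-normal errors.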
TESTS FOR CONSTANCY OF ERROR
VARIANCE
Modified Levene Test
The test is based on the variability of the residuals.
Let e_i1 denote the ith residual for group 1 and e_i2
the ith residual for group 2. Furthermore,
we use n_1 and n_2 to denote the sample sizes of
the two groups, where n_1 + n_2 = n.
Further, we shall use ẽ_1 and ẽ_2 to denote the
medians of the residuals in the two groups. The
modified Levene test uses the absolute deviations
of the residuals around their group median, denoted
by d_i1 and d_i2:

d_i1 = |e_i1 − ẽ_1|,  d_i2 = |e_i2 − ẽ_2|
With this notation, the two-sample t-test statistic
becomes:

t_L* = (d̄_1 − d̄_2) / (s √(1/n_1 + 1/n_2))

where d̄_1 and d̄_2 are the sample means of the d_i1
and d_i2, respectively, and the pooled variance s²
is:

s² = [Σ(d_i1 − d̄_1)² + Σ(d_i2 − d̄_2)²] / (n − 2)

If the error terms have constant variance and n_1
and n_2 are not too small, t_L* follows approximately
the t distribution with n − 2 degrees of freedom. Large
absolute values of t_L* indicate that the error terms
do not have constant variance.
TESTS FOR OUTLYING OBSERVATIONS
(i) Elements of the Hat Matrix: The hat matrix is
defined as H = X(X′X)⁻¹X′, where X is the matrix of
explanatory variables. Larger diagonal values indicate
that the corresponding data points are outliers in x-space.

(ii) WSSDi: WSSDi is an important statistic
for locating points that are remote in x-space.
WSSDi measures the weighted sum of
squared distances of the ith point from the
center of the data. In general, if the WSSDi
values progress smoothly from small to
large, there are probably no extremely remote
points. However, if there is a sudden jump in
the magnitude of WSSDi, this often indicates
that one or more extreme points are present.

(iii) Cook's Di: Cook's Di is designed to measure
the shift in ŷ when the ith observation is not used
in the estimation of the parameters. Di follows
approximately the F(p, n − p − 1) distribution. The
lower 10% point of this distribution is taken as a
reasonable cutoff (more conservative users
suggest the 50% point). The cutoff for Di can
also be taken as 4/n.

(iv) DFFITSi: DFFITS is used to measure the difference
in the ith component of (ŷ − ŷ(i)). It is suggested
that |DFFITSi| ≥ 2√((p + 1)/n) may be used to
flag influential observations.

(v) DFBETASj(i): Cook's Di reveals the impact of the
ith observation on the entire vector of the
estimated regression coefficients. Influential
observations for an individual regression coefficient
are identified by DFBETASj(i), j = 1, 2, …, p + 1, where each
DFBETASj(i) is the standardized change in bj
when the ith observation is deleted.

(vi) COVRATIOi: The impact of the ith observation
on the variance-covariance matrix of the estimated
regression coefficients is measured by the
ratio of the determinants of the two variance-covariance
matrices. Thus, COVRATIO
reflects the impact of the ith observation on the
precision of the estimates of the regression
coefficients. Values near 1 indicate that the ith
observation has little effect on the precision of
the estimates. A value of COVRATIO > 1
indicates that the deletion of the ith observation
decreases the precision of the estimates; a ratio
< 1 indicates that the deletion of the observation
increases the precision of the estimates.
Influential points are indicated by
|COVRATIOi − 1| ≥ 3(p + 1)/n.
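Several of these statistics can be computed directly from their definitions. The sketch below computes the hat diagonal, Cook's Di, and DFFITSi on simulated data with one deliberately remote x value; for simplicity the internally studentized residual is used throughout, which is an approximation (DFFITS is usually defined with the externally studentized residual):

```python
import numpy as np

# Leverage, Cook's D_i and DFFITS_i from their definitions, on simulated
# data with one deliberately remote point. Thresholds follow the rules
# of thumb above.
rng = np.random.default_rng(5)
n, p = 40, 1
x = rng.normal(0, 1, size=n)
x[0] = 6.0                                 # one remote x value (high leverage)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=n)

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix H = X(X'X)^{-1}X'
h = np.diag(H)                             # leverage values

beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
mse = np.sum(e ** 2) / (n - p - 1)

r = e / np.sqrt(mse * (1 - h))             # internally studentized residuals
cooks = r ** 2 * h / ((p + 1) * (1 - h))   # Cook's D_i
dffits = r * np.sqrt(h / (1 - h))          # DFFITS_i (approximation, see above)

print("max leverage:", round(h.max(), 3),
      "cutoff 2(p+1)/n =", 2 * (p + 1) / n)
print("points with Cook's D > 4/n:", int(np.sum(cooks > 4 / n)))
```

The remote point shows up clearly: its leverage exceeds the 2(p + 1)/n cutoff while the trace of H still equals the number of parameters, p + 1.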
TEST OF LINEARITY IN THE DATA
Whether a linear regression function is
appropriate for the data being analyzed can be
studied from a residual plot against the predictor
variable or, equivalently, from a residual plot
against the fitted values.
Figure 3 shows a residual plot against the predictor
variable. Ideally, the residual plot will show
no fitted pattern; that is, the red line should be
approximately horizontal at zero. The presence of
a pattern may indicate a problem with some aspect
of the linear model. In Figure 3, there
is no pattern in the residual plot and the red line
is approximately horizontal. This suggests that
we can assume a linear relationship between the
predictors and the outcome variable.
Test of Normality of Residuals
Sometimes the error distribution is "skewed" by the
presence of a few large outliers. Since parameter
estimation is based on the minimization of squared
error, a few extreme observations can exert a
disproportionate influence on parameter estimates.
The best test for normally distributed errors is a
normal probability plot or normal quantile plot
of the residuals. These are plots of the fractiles of the
error distribution versus the fractiles of a normal
distribution having the same mean and variance.
If the distribution is normal, the points on such a
plot should fall close to the diagonal reference
line. A bow-shaped pattern of deviations from the
diagonal indicates that the residuals have excessive
skewness (i.e., they are not symmetrically
distributed, with too many large errors in one
direction). An S-shaped pattern of deviations
indicates that the residuals have excessive kurtosis,
that is, there are either too many or too few large
errors in both directions. Sometimes the problem
is revealed to be that there are a few data points
on one or both ends that deviate significantly from
the reference line ("outliers"), in which case they
should get close attention. The QQ plot of residuals
is used here to visually check the normality
assumption.
Figure 4 displays the normal probability plot of
residuals, which should approximately follow a
straight line. In Figure 4, all the points fall
approximately along this reference line, so we can
assume normality.
Shapiro–Wilk Normality Test
The Shapiro–Wilk normality test is used in this study.
The calculation of confidence intervals and various
significance tests for coefficients are all based on
the assumption of normally distributed errors. If
the error distribution is significantly non-normal,
confidence intervals may be too wide or too narrow.
The normality analysis in Table 1 shows that
the residuals are normal because the p-value is greater
than the 5% level of significance; we therefore accept H0.
Detection of Outliers and High Leverage Points
The presence of outliers may affect the
interpretation of the model, because it increases
the RSE. Outliers can be identified by examining
the standardized residual (or studentized residual),
which is the residual divided by its estimated
standard error. Standardized residuals can be
interpreted as the number of standard errors away
from the regression line. Observations whose
standardized residuals are greater than 3 in absolute
value are possible outliers. A data point has high
leverage if it has extreme predictor x values. This can be
detected by examining the leverage statistic or
the hat-value. A value of this statistic above 2(p +
1)/n indicates an observation with high leverage,[1]
where p is the number of predictors and n is the
number of observations.

Figure 3: Checking Linearity in the Data
Figure 4: Normality of Residual Plot
Figure 5: Outlier Checking
Figure 6: Leverage Checking

The plot in Figure 5 on outlier checking
highlights the three most extreme points (#40, #26
and #16); however, there are no outliers that exceed
3 standard deviations.
In addition, in Figure 6 on leverage checking,
there are high leverage points in the data. That is,
some data points have a leverage statistic higher
than 2(p + 1)/n = 6/80 = 0.08.
Testing the Homoscedasticity Assumption
This assumption was checked by examining the
scale-location plot, also known as the spread-
location plot.
Figure 7 shows whether the residuals are spread equally
along the ranges of the predictors. Ideally, one
sees a horizontal line with equally spread points.
In Figure 7 this is the case, so we can assume
homogeneity of variance.
Non-Constant Error Variance (NCV) Test
A further test was carried out using the non-constant
variance (NCV) test to determine whether the
assumption of homoscedasticity holds in the data.
Table 2 shows the output of the test.
The analysis in Table 2 tests the
homoscedasticity assumption. It shows that there is
homoscedasticity because the p-value is greater than
the 5% level of significance; we therefore accept H0.
Testing the Multicollinearity Assumption
The variance inflation factor (VIF) for multicollinearity
was tested using the Farrar-Glauber test for
multicollinearity. This is to ascertain whether
there is collinearity between the two independent
variables (mothers' weights and mothers' ages) or
not. The results are given as follows.

Figure 7: Homoscedasticity Assumption Checking

Table 1: Normality test
Test statistic (W): 0.97052; P-value (p): 0.06126; Null hypothesis (H0): there is normality of residuals; Decision: accept H0 (residuals are normal)

Table 2: Homoscedasticity assumption test using NCV statistic
Test statistic (X²): 0.0196848; P-value (p): 0.8884; Null hypothesis (H0): there is homoscedasticity; Decision: accept H0 (homoscedasticity exists)

Table 3: Collinearity test
VIF: 1.008809 (threshold 3); no collinearity is detected by the test
Farrar test: 2.05 (threshold 3); no collinearity is detected by the test

Table 4: Durbin–Watson test
Correlation value: 0.0473677; Test statistic (D-W): 1.882538; P-value (p): 0.634; Null hypothesis (H0): there is no correlation; Decision: accept H0 (no correlation)

Tables 3 and 4 show the collinearity and
Durbin–Watson tests. The result in Table 3 indicates
that there is no collinearity between the two
independent variables, since the values of both tests
are less than 3. Furthermore, in Table 4, there is no
correlation between the two independent variables,
as its p-value is higher than the 5% level of significance.
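The VIF reported in Table 3 follows from its definition, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the other predictor(s). The sketch below illustrates this on simulated stand-ins for the two predictors (the paper's actual survey data are not reproduced, and the distributions chosen here are assumptions):

```python
import numpy as np

# VIF from its definition: VIF_j = 1 / (1 - R_j^2), where R_j^2 is from
# regressing predictor j on the other predictor. Simulated stand-ins for
# mothers' weights and ages (independent by construction).
rng = np.random.default_rng(6)
n = 80
weight = rng.normal(60, 8, size=n)   # hypothetical predictor 1
age = rng.normal(28, 5, size=n)      # hypothetical predictor 2

def vif(xj, other):
    """VIF of predictor xj given the other predictor."""
    X = np.column_stack([np.ones_like(other), other])
    beta = np.linalg.lstsq(X, xj, rcond=None)[0]
    resid = xj - X @ beta
    r2 = 1 - np.sum(resid ** 2) / np.sum((xj - xj.mean()) ** 2)
    return 1.0 / (1.0 - r2)

v = vif(weight, age)
print(f"VIF(weight): {v:.3f}")   # a value near 1 indicates no collinearity
```

Since the two simulated predictors are independent, the VIF lands near 1, mirroring the value of about 1.01 reported in Table 3.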
CONCLUSION
The findings from the above preliminary
investigation and diagnostics of the assumptions and
deviations from them give indications that the
weight of a child at birth depends on the mother's
weight or age at the time of birth. All the
assumptions tested from the objectives (normality
of residuals, collinearity between the independent
variables, outliers/leverage, and linearity of the model)
are met, and none deviated from the assumptions
of the multiple linear regression fitted on the data.
REFERENCES
1. Oyeyemi GM, Bukoye A, Akeyede I. Comparison of
outlier detection procedures in multiple linear
regressions. Am J Math Stat 2015;5:37-41.
2. Imam A. Investigation of parameter behaviors in
stationarity of autoregressive and moving average models
through simulations. Asian J Math Sci 2020;4:30-7.
3. Belsley DA, Kuh E, Welsch R. Regression Diagnostics:
Identifying Influential Data and Sources of Collinearity.
New York: John Wiley; 1980.
4. Deville JC, Särndal CE. Calibration estimators in
survey sampling. J Am Stat Assoc 1992;87:376-82.
5. Potter FJ. A study of procedures to identify and trim
extreme sampling weights. In: Proceedings Section on
Survey Research Methods Association. New Jersey:
American Statistical Association; 1990. p. 225-30.
6. Hulliger B. Outlier robust Horvitz-Thompson
estimators. Surv Methodol 1995;21:79-87.
7. Moreno-Rebollo JL, Muñoz-Reyes A, Muñoz-Pichardo J.
Influence diagnostics in survey sampling:
Conditional bias. Biometrika 1999;86:923-8.
8. Chambers RL, Skinner CJ. Analysis of Survey Data.
New York: John Wiley; 2003.
9. Gwet J, Rivest L. Outlier resistant alternatives to the
ratio estimator. J Am Stat Assoc 1992;87:1174-82.
10. Welsh AH, Ronchetti E. Bias-calibrated estimation
from sample surveys containing outliers. J R Statist Soc
Ser B Methodol 1998;60:413-28.
11. Elliott MR. Bayesian weight trimming for generalized
linear regression models. Surv Methodol 2017;43:23-34.
12. Korn EL, Graubard BI. Examples of differing weighted
and un-weighted estimates from a sample survey.
Am Stat 1995;49:291-5.
13. Warha AA, Yusuf AM, Akeyede I. A comparative
analysis on some estimators of parameters of linear
regression models in presence of multicollinearity.
Asian J Probab Stat 2018;2:1-8.