a template for publishing with the most amazing paper ever never ever yes and this is the really most amazing paper you haev e ver seen and you know why Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...v wsef sdf equirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...v wsef sdf equirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day Trace Elements : Requirement ...Minerals are essential for normal growth and maintenance of the body. Major elements : Requirement >100 mg /day T
A simulation study is carried put in order to compare the perfomance of the Lincoln -Petersen and Chapman estimators for single capture-recapture or dual system estimation when the sizes of the samples or record systems are not fixed by researcher. Performance is explored through both bias and variability. Unless both record probabilities and population size are very small, the Chapman estimator performs better than the Lincoln-Petersen estimator. This is due to the lower varaibility of the Chapman estimator and because it is nearly unbiased for a set population size ande record probabilities wider than the set for which the Lincon-Petersen estimator is nearly unbiased. Thus, for those kin of studies where te record probabulity is high for at least one record system, such as census correction studies, it should be preferred the Chapman estimator
GENERATION OF SYNTHETIC POPULATION USING MARKOV CHAIN MONTE CARLO SIMULATION ...IJCI JOURNAL
Activity based travel demand models are widely used in transportation planning to predict future demand of
transportation. Disaggregate level data for the entire population is required as input to these models, which
included household level and person level attributes for the entire study area. These data are usually
collected by the population census, but are rarely available due to confidentiality reasons. Hence as a
viable alternative, population synthesis techniques are used to supplement the microdata. An attempt has
been made in this study to generate synthetic population using Markov Chain Monte Carlo Simulation
method and to compare this with conventional method. Thiruvananthapuram Corporation in Kerala was
selected as the study area and sample data were collected by household survey. The algorithm for
population synthesis was coded in C++. The methodology was validated using 16 percentage of the
collected data. Prediction accuracy of the method was compared with conventional method and was found
better.
A STEPWISE TOOL FOR SPATIAL DELINEATION OF MARGINAL AREASRise Spatial
cite as: Duque, J. C.; Royuela, V.; Norenña, M. (2010) A stepwise tool for spatial delimitation of poverty areas. Medellín (Colombia) as a case study. In Seminar of Spatial Econometrics in honor of Doctor J.H.P. Paelinck, Oviedo, Spain. 22 October.
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MININGIJDKP
This article advances our understanding of regression-based data mining by comparing the utility of Least
Absolute Value (LAV) and Least Squares (LS) regression methods. Using demographic variables from
U.S. state-wide data, we fit variable regression models to dependent variables of varying distributions
using both LS and LAV. Forecasts generated from the resulting equations are used to compare the
performance of the regression methods under different dependent variable distribution conditions. Initial
findings indicate LAV procedures better forecast in data mining applications when the dependent variable
is non-normal. Our results differ from those found in prior research using simulated data.
This document describes a study that uses the Mahalanobis Taguchi System (MTS) to create a heat vulnerability index for New York City neighborhoods. The MTS is a statistical method that combines Mahalanobis distances and Taguchi orthogonal arrays to identify important variables. The study collects geographical, socioeconomic, and tree cover data for NYC census tracts. MTS is applied to calculate Mahalanobis distances between "normal" and "outside" tract groups. Taguchi arrays are used to test variables. The results are inconclusive due to limited data. More medical data is needed to better differentiate tract groups and identify significant variables for a heat vulnerability index.
The epidemiology of schistosomiasis in the later stages of a control program ...Alim A-H Yacoub Lovers
Southgate BA, Yacoub A. The epidemiology of schistosomiasis in the later stages of a control program based on chemotherapy: the Basrah study. 3. Antibody distributions and the use of age catalytic models and log-probit analysis in seroepidemiology. Transactions of the Royal Society of Tropical Medicine and Hygiene. 1987 Jan 1;81(3):468-75.
This study analyzes the climate change vulnerability of small holder farmers in Genderighe kebele of Dire Dawa
Administration. The purpose of the study was to generate a baseline data on the overall vulnerability of HHs in the
kebele and rank/prioritize villages based on their respective level of vulnerability. In so-doing, data was collected from
a total of 69 households in the 8 villages of Genderighe kebele, employing a proportionate random sampling
technique. According to Temesgen et al., (2008), vulnerability to climate change in Ethiopia is highly related to
poverty. Hence, the Head Count Ratio (HCR) method of poverty analysis was used in measuring the absolute level of
HHs’ vulnerability to climate change in the study area. Results from this analysis revealed that 48%of the sample
households in the kebele live below the poverty line implying that about 574 households were estimated to be poor
and/or vulnerable to climate change. The HCR indices were also estimated for each of the 8 villages in the study
kebele and were used as relative measure of vulnerability based on which villages were ranked. Thus, out of the 8
villages in Genderighe kebele, Halele, Eyidu arba and Kiyero villages have shown an HCR value of 0.78, 0.75 and
0.75 respectively. On the other hand, Haloli, Goro guda and Dieyno villages have shown a relatively smaller value of
HCR (0.50, 0.43 and 0.33 respectively), while Wechali and Gend rieghie proper villages had the smallest value, that
was 0.25 and 0.23 respectively. However, since these rankings are based on the relative value of HCR indices, which
relied only on HHs’ level of consumption, the research team has employed a more inclusive method of vulnerability
ranking, by applying a Multi-Criteria Analysis (MCA) that enables controlling for other socioeconomic and biophysical
factors. Hence, applying a simeltaneous analysis of different vulnerability indicators classified into five classes
(availability of technology, wealth, Infrastructure, Irrigation potential and frequency of drought and flood), the MCA
has constructed vulnerability indices for each village, based on which villages were prioritized once again. Comparing
the outcomes from the two ranking methods employed, it appears that some villages keep the same positions or
successive positions, which shows almost the same output in both methods. Thus, investing in the development of
the relatively underdeveloped villages, irrigation for villages with high potential and production of drought-tolerant
varieties of crops and species of livestock can all reduce the vulnerability of small farmers to climate change in the
study areas.
This document summarizes a study that used small area estimation to analyze poverty levels across sub-districts in Demak District, Indonesia. Regression analysis was used to model the relationship between poverty (dependent variable) and the percentage of farm households, access to water taps, and population density (independent variables). Small area estimation with a non-parametric kernel approach was then applied to estimate poverty levels for each sub-district using the model and additional data from statistics surveys. The results of this poverty mapping showed that population density was the dominant factor influencing poverty levels in some sub-districts of Demak.
A simulation study is carried put in order to compare the perfomance of the Lincoln -Petersen and Chapman estimators for single capture-recapture or dual system estimation when the sizes of the samples or record systems are not fixed by researcher. Performance is explored through both bias and variability. Unless both record probabilities and population size are very small, the Chapman estimator performs better than the Lincoln-Petersen estimator. This is due to the lower varaibility of the Chapman estimator and because it is nearly unbiased for a set population size ande record probabilities wider than the set for which the Lincon-Petersen estimator is nearly unbiased. Thus, for those kin of studies where te record probabulity is high for at least one record system, such as census correction studies, it should be preferred the Chapman estimator
GENERATION OF SYNTHETIC POPULATION USING MARKOV CHAIN MONTE CARLO SIMULATION ...IJCI JOURNAL
Activity based travel demand models are widely used in transportation planning to predict future demand of
transportation. Disaggregate level data for the entire population is required as input to these models, which
included household level and person level attributes for the entire study area. These data are usually
collected by the population census, but are rarely available due to confidentiality reasons. Hence as a
viable alternative, population synthesis techniques are used to supplement the microdata. An attempt has
been made in this study to generate synthetic population using Markov Chain Monte Carlo Simulation
method and to compare this with conventional method. Thiruvananthapuram Corporation in Kerala was
selected as the study area and sample data were collected by household survey. The algorithm for
population synthesis was coded in C++. The methodology was validated using 16 percentage of the
collected data. Prediction accuracy of the method was compared with conventional method and was found
better.
A STEPWISE TOOL FOR SPATIAL DELINEATION OF MARGINAL AREASRise Spatial
cite as: Duque, J. C.; Royuela, V.; Norenña, M. (2010) A stepwise tool for spatial delimitation of poverty areas. Medellín (Colombia) as a case study. In Seminar of Spatial Econometrics in honor of Doctor J.H.P. Paelinck, Oviedo, Spain. 22 October.
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MININGIJDKP
This article advances our understanding of regression-based data mining by comparing the utility of Least
Absolute Value (LAV) and Least Squares (LS) regression methods. Using demographic variables from
U.S. state-wide data, we fit variable regression models to dependent variables of varying distributions
using both LS and LAV. Forecasts generated from the resulting equations are used to compare the
performance of the regression methods under different dependent variable distribution conditions. Initial
findings indicate LAV procedures better forecast in data mining applications when the dependent variable
is non-normal. Our results differ from those found in prior research using simulated data.
This document describes a study that uses the Mahalanobis Taguchi System (MTS) to create a heat vulnerability index for New York City neighborhoods. The MTS is a statistical method that combines Mahalanobis distances and Taguchi orthogonal arrays to identify important variables. The study collects geographical, socioeconomic, and tree cover data for NYC census tracts. MTS is applied to calculate Mahalanobis distances between "normal" and "outside" tract groups. Taguchi arrays are used to test variables. The results are inconclusive due to limited data. More medical data is needed to better differentiate tract groups and identify significant variables for a heat vulnerability index.
The epidemiology of schistosomiasis in the later stages of a control program ...Alim A-H Yacoub Lovers
Southgate BA, Yacoub A. The epidemiology of schistosomiasis in the later stages of a control program based on chemotherapy: the Basrah study. 3. Antibody distributions and the use of age catalytic models and log-probit analysis in seroepidemiology. Transactions of the Royal Society of Tropical Medicine and Hygiene. 1987 Jan 1;81(3):468-75.
This study analyzes the climate change vulnerability of small holder farmers in Genderighe kebele of Dire Dawa
Administration. The purpose of the study was to generate a baseline data on the overall vulnerability of HHs in the
kebele and rank/prioritize villages based on their respective level of vulnerability. In so-doing, data was collected from
a total of 69 households in the 8 villages of Genderighe kebele, employing a proportionate random sampling
technique. According to Temesgen et al., (2008), vulnerability to climate change in Ethiopia is highly related to
poverty. Hence, the Head Count Ratio (HCR) method of poverty analysis was used in measuring the absolute level of
HHs’ vulnerability to climate change in the study area. Results from this analysis revealed that 48%of the sample
households in the kebele live below the poverty line implying that about 574 households were estimated to be poor
and/or vulnerable to climate change. The HCR indices were also estimated for each of the 8 villages in the study
kebele and were used as relative measure of vulnerability based on which villages were ranked. Thus, out of the 8
villages in Genderighe kebele, Halele, Eyidu arba and Kiyero villages have shown an HCR value of 0.78, 0.75 and
0.75 respectively. On the other hand, Haloli, Goro guda and Dieyno villages have shown a relatively smaller value of
HCR (0.50, 0.43 and 0.33 respectively), while Wechali and Gend rieghie proper villages had the smallest value, that
was 0.25 and 0.23 respectively. However, since these rankings are based on the relative value of HCR indices, which
relied only on HHs’ level of consumption, the research team has employed a more inclusive method of vulnerability
ranking, by applying a Multi-Criteria Analysis (MCA) that enables controlling for other socioeconomic and biophysical
factors. Hence, applying a simeltaneous analysis of different vulnerability indicators classified into five classes
(availability of technology, wealth, Infrastructure, Irrigation potential and frequency of drought and flood), the MCA
has constructed vulnerability indices for each village, based on which villages were prioritized once again. Comparing
the outcomes from the two ranking methods employed, it appears that some villages keep the same positions or
successive positions, which shows almost the same output in both methods. Thus, investing in the development of
the relatively underdeveloped villages, irrigation for villages with high potential and production of drought-tolerant
varieties of crops and species of livestock can all reduce the vulnerability of small farmers to climate change in the
study areas.
This document summarizes a study that used small area estimation to analyze poverty levels across sub-districts in Demak District, Indonesia. Regression analysis was used to model the relationship between poverty (dependent variable) and the percentage of farm households, access to water taps, and population density (independent variables). Small area estimation with a non-parametric kernel approach was then applied to estimate poverty levels for each sub-district using the model and additional data from statistics surveys. The results of this poverty mapping showed that population density was the dominant factor influencing poverty levels in some sub-districts of Demak.
This document explores the Modifiable Areal Unit Problem (MAUP) through analyzing 2001 UK Census data at different geographic scales and zoning systems in London. The MAUP causes differences in results based on the scale of aggregated data (scale effect) and how the spatial units are defined (zoning effect). The analysis found changes in unemployment and religious population percentages between ward and district scales, as well as differences in correlations between variables when boundaries were modified. This demonstrates how the MAUP can influence spatial analysis results based on how the data is organized geographically.
Application of Semiparametric Non-Linear Model on Panel Data with Very Small ...IOSRJM
-This research work investigated the behaviour of a new semiparametric non-linear (SPNL) model on
a set of panel data with very small time point (T = 1). The SPNL model incorporates the relationship between
individual independent variable and unobserved heterogeneity variable. Five different estimation techniques
namely; Least Square (LS), Generalized Method of Moments (GMM), Continuously Updating (CU), Empirical
Likelihood (EL) and Exponential Tilting (ET) Estimators were employed for the estimation; for the purpose of
modelling the metrical response variable non-linearly on a set of independent variables. The performances of
these estimators on the SPNL model were examined for different parameters in the model using the Least
Square Error (LSE), Mean Absolute Error (MAE) and Median Absolute Error (MedAE) criteria at the lowest
time point (T = 1). The results showed that the ET estimator which provided the least errors of estimation is
relatively more efficient for the proposed model than any of the other estimators considered. It is therefore
recommended that the ET estimator should be employed to estimate the SPNL model for panel data with very
small time point.
The ppt gives an idea about basic concept of Estimation. point and interval. Properties of good estimate is also covered. Confidence interval for single means, difference between two means, proportion and difference of two proportion for different sample sizes are included along with case studies.
The effectiveness of various analytical formulas for
estimating R2 Shrinkage in multiple regression analysis was
investigated. Two categories of formulas were identified estimators
of the squared population multiple correlation coefficient (
2
)
and those of the squared population cross-validity coefficient
(
2 c
). The authors compeered the effectiveness of the analytical
formulas for determining R2 shrinkage, with squared population
multiple correlation coefficient and number of predictors after
finding all combination among variables, maximum correlation
was selected to computed all two categories of formulas. The
results indicated that Among the 6 analytical formulas designed to
estimate the population
2
, the performance of the (Olkin & part
formula-1 for six variable then followed by Burket formula &
Lord formula-2 among the 9 analytical formulas were found to be
most stable and satisfactory.
Linear Algebra Project Urban Population Dynamics This project is.pdfairflyluggage
Linear Algebra Project : Urban Population Dynamics
This project is about population modeling and how linear algebra tools may be used to study it.
Background
Population modeling is useful from dierent perpectives :
1. planners at the city, state, and national level who look at human populations and need
forecasts of
populations in order to do planning for future needs. These future needs include housing,
schools, care
for elderly, jobs, and utilities such as electricity, water and transportation.
2. businesses do population planning so as to predict how the portions of the population that use
their
product will be changing.
3. ecologists use population models to study ecological systems, especially those where
endangered species
are involved so as to try to nd measures that will restore the population.
4. medical researchers treat microorganisms and viruses as populations and seek to understand
the dy-
namics of their populations ; especially why some thrive in certain environments but don\'t in
others.
In human situations, it is normal to take the intervals of 10 years as the census is taken every 10
years.
Thus the age groups would be 0-9, 10-19, 20-29 etc, so 8 or 9 age categories would probably be
appropriate.
The survival fractions would then show the fraction of \"newborns\" (0-9) who survive to age
10, the fraction
of 10 to 19 years old who survive to 20 etc. This type of data is compiled, for example, by
actuaries working
for insurance companies for life and medical insurance purposes.
The basic equations we begin with are
x(k + 1) = Ax(k) k = 0; 1; 2; ::: and x(0) given (0.1)
with solution found iteratively to be
x(k) = Akx(0) (0.2)
Your project
Suppose we are studying the population dynamics of Los Angeles for the purpose of making
planning proposal.
As above, we take the unit of time to be 10 years, and take 7 age groups : 0-9, 10-19,..., 50-59,
60+. Suppose
further that the population distribution as of 1990 is
(3:1; 2:8; 2:0; 2:5; 2:0; 1:8; 2:9)(105)
1
and that the Leslie matrix, A, for this model appears as
A :=
2
666666664
:2 1:2 1:1 :9 :1 0 0
:7 0 0 0 0 0 0
0 :82 0 0 0 0 0
0 0 :97 0 0 0 0
0 0 0 :97 0 0 0
0 0 0 0 :90 0 0
0 0 0 0 0 :87 0
3
777777775
Part One :
Interpret carefully each of the nonzero terms in the matrix. In addition, indicate what factors you
think
might change those numbers (they might be social, economical, political or environmental).
Part Two :
Predict :
what the population distribution will look like in 2000, 2010, 2020, and 2030 ?
what the total population will be in each of these years ?
by what fraction the total population changed each year ?
Additionally, what does your software tell you the largest positive eigenvalue of A is ?
Part Three :
Decide if you believe the population is going to zero, becoming stable, or is unstable in the long
run. Be
sure and describe in your write up how you arrived at your conclusion. If you have decided it is
unstable,
simulate it long enough that the column matrices for.
1. Estimation involves using sample statistics to estimate population parameters. There are two types of estimation - point estimation and interval estimation.
2. Point estimation provides a single value for the population parameter while interval estimation provides a range of values within which the population parameter is estimated to fall.
3. Good estimators are unbiased, consistent, sufficient, and efficient. The margin of error used in interval estimation depends on the standard error of the estimator.
Project #4 Urban Population Dynamics This project will acquaint y.pdfanandinternational01
Project #4: Urban Population Dynamics This project will acquaint you with population
modeling and how linear algebra tools may be used to study it. Background Kolman, pages
305-307. Population modeling is useful from many different perspectives: planners at the city,
state, and national level who look at human populations and need forecasts of populations in
order to do planning for future needs. These future needs include housing, schools, care for the
elderly, jobs, and utilities such as electricity,water and transportation. businesses do population
planning so as to predict how the portions of the population that use their product will be
changing. Ecologists use population models to study ecological systems, especially those where
endangered species are involved so as to try to find measures that will restore the population.
medical researchers treat microorganisms and viruses as populations and seek to understand the
dynamics of their populations; especially why some thrive in certain environments but don\'t in
others. In human situations, it is normal to take intervals of 10 years as the census is taken every
10 years. Thus the age groups would be 0-9,10-19,11-20 etc , so 8 or 9 age categories would
probably be appropriate. The survival fractions would then show the fraction of \"newborns\" (0-
9) who survive to age 10, the fraction of 10 to 19 year olds who survive to 20 etc. This type of
data is compiled, for example, by actuaries working for insurance companies for life and medical
insurance purposes. The basic equations we begin with are (1) x(k+1) = Ax(k) k=0,1,2,. . . and
x(0) given with solution found iteratively to be (2) x(k) = Akx(0) (see Kolman for details of the
structure of A, which is 7 x 7 in this case). Your Project Suppose we are studying the
population dynamics of Los Angeles for the purpose of making a planning proposal to the city
which will form the basis for predicting school, transportation, housing, water, and electrical
needs for the years from 2000 on. As above, we take the unit of time to be 10 years, and take 7
age groups: 0-9,10-19,...,50-59,60+. Suppose further that the population distribution as of 1990
(the last census) is (3.1, 2.8, 2.0, 2.5, 2.0, 1.8, 2.9) (x105 ) and that the Leslie matrix,A, for this
model appears as Part One: Interpret carefully each of the nonzero terms in the matrix. In
addition, indicate what factors you think might change those numbers (they might be social,
economical, political or environmental). Part Two: Predict: what the population distribution
will look like in 2000, 2010, 2020 and 2030 what the total population will be in each of those
years by what fraction the total population changed each year Additionally, what does your
software tell you the largest, positive eigenvalue of A is? Part Three: Decide if you believe the
population is going to zero, becoming stable, or is unstable in the long run. Be sure and describe
in your write up how you arrived at your conclusion. If.
large data set is not available for some disease such as Brain Tumor. This and part2 presentation shows how to find "Actionable solution from a difficult cancer dataset
This document discusses a machine learning project to predict household poverty levels. The goals are to build classification models to predict whether a household is poor or non-poor based on their attributes. The project will use a dataset containing household characteristics like home materials and assets. Linear regression algorithms will be trained and evaluated on the dataset. The expected outcome is identifying the features that influence poverty levels and accurately classifying households.
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Criminal Justice Statistics: Lab 4
CRJS-3020-01 | Points: 30
Assignment Objectives
To assess your knowledge of concepts covered in the class text and lectures, as well as your practical knowledge using SPSS. Use the Intercity Youth Homicide Dataset, Codebook, and User Guide to answer the below questions.
Course Materials Covered
Correlation, Bivariate Regression
1. (7 points). You want to determine whether there is a significant relationship between the robbery rates for young adults (RA_ATR) and the density of establishments that sell alcohol (ALCDEN1) in a jurisdiction and you would like to know the strength of that relationship.
a. Visualize each variable as a graph/chart using best practices. Insert the graph/chart into this document. Also, visualize the bivariate relationship between the two variables.
b. State your hypotheses.
c. If appropriate, check whether your dependent and/or independent variables are normally distributed and explain why they are, or are not, normally distributed.
d. Run the appropriate test, and report the correct results (do not copy and paste results).
e. Finally, interpret your results. What is your explanation for the results you found?
2. (7 points). You want to determine whether there is a significant relationship between the number of juvenile homicides (JUVHOM1I) and the number of gang homicides (GANGHOM1) in a jurisdiction and you would like to know the strength of that relationship.
a. Visualize each variable as a graph/chart using best practices. Insert the graph/chart into this document. Also, visualize the bivariate relationship between the two variables.
b. State your hypotheses.
c. If appropriate, check whether your dependent and/or independent variables are normally distributed and explain why they are, or are not, normally distributed.
d. Run the appropriate test, and report the correct results (do not copy and paste results).
e. Finally, interpret your results. What is your explanation for the results you found?
3. (7 points). For the year 2006, you want to determine the percent of owner occupied households (OWNOCC) has a significant impact on the juvenile robbery rates (RA_JTR) in a jurisdiction and how changes in the percent of owner occupied household changes the expected juvenile robbery rates.
a. Visualize each variable as a graph/chart using best practices. Insert the graph/chart into this document. Also, visualize the bivariate relationship between the two variables.
b. State your hypotheses.
c. If appropriate, check whether your dependent and/or independent variables are normally distributed and explain why they are, or are not, normally distributed.
d. Run the appropriate test, and report the correct results (do not copy and paste results).
e. Finally, interpret your results. What is your explanation for the results you found?
4. (7 points). You want to determine the percent of unemployed in a community (UNEMP) has a significant impact on the number of homicides (HOM) and how ...
The objective of this study was to develop and evaluate a tripartite sequential classification sampling plans to monitor pest mite Tetranychus urticae (Koch, 1836) through time.Three tripartite plans using Wald's Sequential Probability Ratio Test based on tally 0 and 5 binomial counts were developedfor use at different times. For each Wald’s plan, three hypotheses were tested and three probabilities (Pdeci; i=1; 2; 3) for making decision were simulated.Tripartite sequential classification sampling plans with tally 0 binomial counts was compared to dichotomous plans repeated every Δt and after 2Δt.Performance of monitoring protocols were studied by monitoring height T. urticae populations with logistic growth. The results showed that tripartite classification reduced from 30 to 40% of the expected bouts than dichotomous sampling plans after Δt and 2Δt days and reduce sampling cots. Monitoring protocol B reduced the probability of intervening and produced more sample bouts and more samples, which resulted in lower expected and 95th percentiles for density at intervention and expected loss compared to protocol A.The use of tripartite classification plan requiredadjustment of the cd2 and cd1 values to accomplish an efficient integrated pest management during growth season.
Comparison of statistical methods commonly used in predictive modelingSalford Systems
This document compares four statistical methods commonly used in predictive modelling: Logistic Multiple Regression (LMR), Principal Component Regression (PCR), Classification and Regression Tree analysis (CART), and Multivariate Adaptive Regression Splines (MARS). It applies these methods to two ecological data sets to test their accuracy, reliability, ease of use, and implementation in a geographic information system (GIS). The results show that independent data is needed to validate models, and that MARS and CART achieved the best prediction success, although CART models became too complex for cartographic purposes with a large number of data points.
This document discusses modelling individual consumer behaviour using an agent-based model. It summarizes work to generate a synthetic population of individuals for the Leeds area of the UK that incorporates census and commercial data. Different population synthesis techniques were compared to create the population. Future work involves using the synthetic population to create households and incorporate individual characteristics and behaviours into agents to model shopping and travel.
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...IJMIT JOURNAL
This document summarizes research on predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms. It reviews relevant techniques for handling class imbalance, including using the AUC evaluation metric, resampling methods like undersampling and oversampling, tuning the positive ratio, cross-validation, regularization for logistic regression, decision trees, and ensemble methods. The study aims to develop an optimal risk prediction model by jointly applying these techniques, with results showing that boosting on decision trees using oversampled data achieves the best performance.
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...IJMIT JOURNAL
We aim at developing and improving the imbalanced business risk modeling via jointly using proper
evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques.
Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison
based on 10-fold cross validation. Two undersampling strategies including random undersampling (RUS)
and cluster centroid undersampling (CCUS), as well as two oversampling methods including random
oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly
interpretable classifiers, including logistic regression without regularization (LR), L1-regularized LR
(L1LR), and decision tree (DT) are implemented. Two ensembling techniques, including Bagging and
Boosting, are applied on the DT classifier for further model improvement. The results show that, Boosting
on DT by using the oversampled data containing 50% positives via SMOTE is the optimal model and it can
achieve AUC, recall, and F1 score valued 0.8633, 0.9260, and 0.8907, respectively.
The document compares the predictive performance of classification trees and logistic regression models in determining whether individuals are insured or not based on demographic and socioeconomic characteristics. It first describes the data and techniques used, including classification trees, cost complexity pruning, logistic regression, bagging, and random forests. It then describes the research method of using different portions of the data for model training and testing. The results show that logistic regression performs as well as classification trees on nonlinear data and better on linear data. Both methods select income as an important predictor, while logistic regression favors dummy variables and classification trees favor continuous variables. Random forests have the highest predictive accuracy overall.
Estimation of parameters of linear econometric model and the power of test in...Alexander Decker
1) The document discusses using Monte Carlo simulations to estimate parameters of a linear econometric model and test the power of statistical tests in the presence of heteroscedasticity.
2) Two forms of heteroscedasticity (h(x)=X1/2 and h(x)=X) were introduced into the model. Sample sizes of 20, 50, and 100 were used with 50 replications each.
3) The bias, variance, and root mean square error of ordinary least squares (OLS) and generalized least squares (GLS) estimators were examined across sample sizes and heteroscedasticity forms. Both estimators were found to be unbiased but neither was consistently more efficient.
This document explores the Modifiable Areal Unit Problem (MAUP) through analyzing 2001 UK Census data at different geographic scales and zoning systems in London. The MAUP causes differences in results based on the scale of aggregated data (scale effect) and how the spatial units are defined (zoning effect). The analysis found changes in unemployment and religious population percentages between ward and district scales, as well as differences in correlations between variables when boundaries were modified. This demonstrates how the MAUP can influence spatial analysis results based on how the data is organized geographically.
Application of Semiparametric Non-Linear Model on Panel Data with Very Small ...IOSRJM
-This research work investigated the behaviour of a new semiparametric non-linear (SPNL) model on
a set of panel data with very small time point (T = 1). The SPNL model incorporates the relationship between
individual independent variable and unobserved heterogeneity variable. Five different estimation techniques
namely; Least Square (LS), Generalized Method of Moments (GMM), Continuously Updating (CU), Empirical
Likelihood (EL) and Exponential Tilting (ET) Estimators were employed for the estimation; for the purpose of
modelling the metrical response variable non-linearly on a set of independent variables. The performances of
these estimators on the SPNL model were examined for different parameters in the model using the Least
Square Error (LSE), Mean Absolute Error (MAE) and Median Absolute Error (MedAE) criteria at the lowest
time point (T = 1). The results showed that the ET estimator which provided the least errors of estimation is
relatively more efficient for the proposed model than any of the other estimators considered. It is therefore
recommended that the ET estimator should be employed to estimate the SPNL model for panel data with very
small time point.
The ppt gives an idea about basic concept of Estimation. point and interval. Properties of good estimate is also covered. Confidence interval for single means, difference between two means, proportion and difference of two proportion for different sample sizes are included along with case studies.
The effectiveness of various analytical formulas for
estimating R2 Shrinkage in multiple regression analysis was
investigated. Two categories of formulas were identified estimators
of the squared population multiple correlation coefficient (
2
)
and those of the squared population cross-validity coefficient
(
2 c
). The authors compeered the effectiveness of the analytical
formulas for determining R2 shrinkage, with squared population
multiple correlation coefficient and number of predictors after
finding all combination among variables, maximum correlation
was selected to computed all two categories of formulas. The
results indicated that Among the 6 analytical formulas designed to
estimate the population
2
, the performance of the (Olkin & part
formula-1 for six variable then followed by Burket formula &
Lord formula-2 among the 9 analytical formulas were found to be
most stable and satisfactory.
Linear Algebra Project Urban Population Dynamics This project is.pdfairflyluggage
Linear Algebra Project : Urban Population Dynamics
This project is about population modeling and how linear algebra tools may be used to study it.
Background
Population modeling is useful from dierent perpectives :
1. planners at the city, state, and national level who look at human populations and need
forecasts of
populations in order to do planning for future needs. These future needs include housing,
schools, care
for elderly, jobs, and utilities such as electricity, water and transportation.
2. businesses do population planning so as to predict how the portions of the population that use
their
product will be changing.
3. ecologists use population models to study ecological systems, especially those where
endangered species
are involved so as to try to nd measures that will restore the population.
4. medical researchers treat microorganisms and viruses as populations and seek to understand
the dy-
namics of their populations ; especially why some thrive in certain environments but don\'t in
others.
In human situations, it is normal to take the intervals of 10 years as the census is taken every 10
years.
Thus the age groups would be 0-9, 10-19, 20-29 etc, so 8 or 9 age categories would probably be
appropriate.
The survival fractions would then show the fraction of \"newborns\" (0-9) who survive to age
10, the fraction
of 10 to 19 years old who survive to 20 etc. This type of data is compiled, for example, by
actuaries working
for insurance companies for life and medical insurance purposes.
The basic equations we begin with are
x(k + 1) = Ax(k) k = 0; 1; 2; ::: and x(0) given (0.1)
with solution found iteratively to be
x(k) = Akx(0) (0.2)
Your project
Suppose we are studying the population dynamics of Los Angeles for the purpose of making
planning proposal.
As above, we take the unit of time to be 10 years, and take 7 age groups : 0-9, 10-19,..., 50-59,
60+. Suppose
further that the population distribution as of 1990 is
(3:1; 2:8; 2:0; 2:5; 2:0; 1:8; 2:9)(105)
1
and that the Leslie matrix, A, for this model appears as
A :=
2
666666664
:2 1:2 1:1 :9 :1 0 0
:7 0 0 0 0 0 0
0 :82 0 0 0 0 0
0 0 :97 0 0 0 0
0 0 0 :97 0 0 0
0 0 0 0 :90 0 0
0 0 0 0 0 :87 0
3
777777775
Part One :
Interpret carefully each of the nonzero terms in the matrix. In addition, indicate what factors you
think
might change those numbers (they might be social, economical, political or environmental).
Part Two :
Predict :
what the population distribution will look like in 2000, 2010, 2020, and 2030 ?
what the total population will be in each of these years ?
by what fraction the total population changed each year ?
Additionally, what does your software tell you the largest positive eigenvalue of A is ?
Part Three :
Decide if you believe the population is going to zero, becoming stable, or is unstable in the long
run. Be
sure and describe in your write up how you arrived at your conclusion. If you have decided it is
unstable,
simulate it long enough that the column matrices for.
1. Estimation involves using sample statistics to estimate population parameters. There are two types of estimation - point estimation and interval estimation.
2. Point estimation provides a single value for the population parameter while interval estimation provides a range of values within which the population parameter is estimated to fall.
3. Good estimators are unbiased, consistent, sufficient, and efficient. The margin of error used in interval estimation depends on the standard error of the estimator.
Project #4 Urban Population Dynamics This project will acquaint y.pdfanandinternational01
Project #4: Urban Population Dynamics This project will acquaint you with population
modeling and how linear algebra tools may be used to study it. Background Kolman, pages
305-307. Population modeling is useful from many different perspectives: planners at the city,
state, and national level who look at human populations and need forecasts of populations in
order to do planning for future needs. These future needs include housing, schools, care for the
elderly, jobs, and utilities such as electricity,water and transportation. businesses do population
planning so as to predict how the portions of the population that use their product will be
changing. Ecologists use population models to study ecological systems, especially those where
endangered species are involved so as to try to find measures that will restore the population.
medical researchers treat microorganisms and viruses as populations and seek to understand the
dynamics of their populations; especially why some thrive in certain environments but don\'t in
others. In human situations, it is normal to take intervals of 10 years as the census is taken every
10 years. Thus the age groups would be 0-9,10-19,11-20 etc , so 8 or 9 age categories would
probably be appropriate. The survival fractions would then show the fraction of \"newborns\" (0-
9) who survive to age 10, the fraction of 10 to 19 year olds who survive to 20 etc. This type of
data is compiled, for example, by actuaries working for insurance companies for life and medical
insurance purposes. The basic equations we begin with are (1) x(k+1) = Ax(k) k=0,1,2,. . . and
x(0) given with solution found iteratively to be (2) x(k) = Akx(0) (see Kolman for details of the
structure of A, which is 7 x 7 in this case). Your Project Suppose we are studying the
population dynamics of Los Angeles for the purpose of making a planning proposal to the city
which will form the basis for predicting school, transportation, housing, water, and electrical
needs for the years from 2000 on. As above, we take the unit of time to be 10 years, and take 7
age groups: 0-9,10-19,...,50-59,60+. Suppose further that the population distribution as of 1990
(the last census) is (3.1, 2.8, 2.0, 2.5, 2.0, 1.8, 2.9) (x105 ) and that the Leslie matrix,A, for this
model appears as Part One: Interpret carefully each of the nonzero terms in the matrix. In
addition, indicate what factors you think might change those numbers (they might be social,
economical, political or environmental). Part Two: Predict: what the population distribution
will look like in 2000, 2010, 2020 and 2030 what the total population will be in each of those
years by what fraction the total population changed each year Additionally, what does your
software tell you the largest, positive eigenvalue of A is? Part Three: Decide if you believe the
population is going to zero, becoming stable, or is unstable in the long run. Be sure and describe
in your write up how you arrived at your conclusion. If.
large data set is not available for some disease such as Brain Tumor. This and part2 presentation shows how to find "Actionable solution from a difficult cancer dataset
This document discusses a machine learning project to predict household poverty levels. The goals are to build classification models to predict whether a household is poor or non-poor based on their attributes. The project will use a dataset containing household characteristics like home materials and assets. Linear regression algorithms will be trained and evaluated on the dataset. The expected outcome is identifying the features that influence poverty levels and accurately classifying households.
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Criminal Justice Statistics: Lab 4
CRJS-3020-01 | Points: 30
Assignment Objectives
To assess your knowledge of concepts covered in the class text and lectures, as well as your practical knowledge using SPSS. Use the Intercity Youth Homicide Dataset, Codebook, and User Guide to answer the below questions.
Course Materials Covered
Correlation, Bivariate Regression
1. (7 points). You want to determine whether there is a significant relationship between the robbery rates for young adults (RA_ATR) and the density of establishments that sell alcohol (ALCDEN1) in a jurisdiction and you would like to know the strength of that relationship.
a. Visualize each variable as a graph/chart using best practices. Insert the graph/chart into this document. Also, visualize the bivariate relationship between the two variables.
b. State your hypotheses.
c. If appropriate, check whether your dependent and/or independent variables are normally distributed and explain why they are, or are not, normally distributed.
d. Run the appropriate test, and report the correct results (do not copy and paste results).
e. Finally, interpret your results. What is your explanation for the results you found?
2. (7 points). You want to determine whether there is a significant relationship between the number of juvenile homicides (JUVHOM1I) and the number of gang homicides (GANGHOM1) in a jurisdiction and you would like to know the strength of that relationship.
a. Visualize each variable as a graph/chart using best practices. Insert the graph/chart into this document. Also, visualize the bivariate relationship between the two variables.
b. State your hypotheses.
c. If appropriate, check whether your dependent and/or independent variables are normally distributed and explain why they are, or are not, normally distributed.
d. Run the appropriate test, and report the correct results (do not copy and paste results).
e. Finally, interpret your results. What is your explanation for the results you found?
3. (7 points). For the year 2006, you want to determine the percent of owner occupied households (OWNOCC) has a significant impact on the juvenile robbery rates (RA_JTR) in a jurisdiction and how changes in the percent of owner occupied household changes the expected juvenile robbery rates.
a. Visualize each variable as a graph/chart using best practices. Insert the graph/chart into this document. Also, visualize the bivariate relationship between the two variables.
b. State your hypotheses.
c. If appropriate, check whether your dependent and/or independent variables are normally distributed and explain why they are, or are not, normally distributed.
d. Run the appropriate test, and report the correct results (do not copy and paste results).
e. Finally, interpret your results. What is your explanation for the results you found?
4. (7 points). You want to determine the percent of unemployed in a community (UNEMP) has a significant impact on the number of homicides (HOM) and how ...
The objective of this study was to develop and evaluate a tripartite sequential classification sampling plans to monitor pest mite Tetranychus urticae (Koch, 1836) through time.Three tripartite plans using Wald's Sequential Probability Ratio Test based on tally 0 and 5 binomial counts were developedfor use at different times. For each Wald’s plan, three hypotheses were tested and three probabilities (Pdeci; i=1; 2; 3) for making decision were simulated.Tripartite sequential classification sampling plans with tally 0 binomial counts was compared to dichotomous plans repeated every Δt and after 2Δt.Performance of monitoring protocols were studied by monitoring height T. urticae populations with logistic growth. The results showed that tripartite classification reduced from 30 to 40% of the expected bouts than dichotomous sampling plans after Δt and 2Δt days and reduce sampling cots. Monitoring protocol B reduced the probability of intervening and produced more sample bouts and more samples, which resulted in lower expected and 95th percentiles for density at intervention and expected loss compared to protocol A.The use of tripartite classification plan requiredadjustment of the cd2 and cd1 values to accomplish an efficient integrated pest management during growth season.
Comparison of statistical methods commonly used in predictive modelingSalford Systems
This document compares four statistical methods commonly used in predictive modelling: Logistic Multiple Regression (LMR), Principal Component Regression (PCR), Classification and Regression Tree analysis (CART), and Multivariate Adaptive Regression Splines (MARS). It applies these methods to two ecological data sets to test their accuracy, reliability, ease of use, and implementation in a geographic information system (GIS). The results show that independent data is needed to validate models, and that MARS and CART achieved the best prediction success, although CART models became too complex for cartographic purposes with a large number of data points.
This document discusses modelling individual consumer behaviour using an agent-based model. It summarizes work to generate a synthetic population of individuals for the Leeds area of the UK that incorporates census and commercial data. Different population synthesis techniques were compared to create the population. Future work involves using the synthetic population to create households and incorporate individual characteristics and behaviours into agents to model shopping and travel.
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...IJMIT JOURNAL
This document summarizes research on predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms. It reviews relevant techniques for handling class imbalance, including using the AUC evaluation metric, resampling methods like undersampling and oversampling, tuning the positive ratio, cross-validation, regularization for logistic regression, decision trees, and ensemble methods. The study aims to develop an optimal risk prediction model by jointly applying these techniques, with results showing that boosting on decision trees using oversampled data achieves the best performance.
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...IJMIT JOURNAL
We aim at developing and improving the imbalanced business risk modeling via jointly using proper
evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques.
Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison
based on 10-fold cross validation. Two undersampling strategies including random undersampling (RUS)
and cluster centroid undersampling (CCUS), as well as two oversampling methods including random
oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly
interpretable classifiers, including logistic regression without regularization (LR), L1-regularized LR
(L1LR), and decision tree (DT) are implemented. Two ensembling techniques, including Bagging and
Boosting, are applied on the DT classifier for further model improvement. The results show that, Boosting
on DT by using the oversampled data containing 50% positives via SMOTE is the optimal model and it can
achieve AUC, recall, and F1 score valued 0.8633, 0.9260, and 0.8907, respectively.
The document compares the predictive performance of classification trees and logistic regression models in determining whether individuals are insured or not based on demographic and socioeconomic characteristics. It first describes the data and techniques used, including classification trees, cost complexity pruning, logistic regression, bagging, and random forests. It then describes the research method of using different portions of the data for model training and testing. The results show that logistic regression performs as well as classification trees on nonlinear data and better on linear data. Both methods select income as an important predictor, while logistic regression favors dummy variables and classification trees favor continuous variables. Random forests have the highest predictive accuracy overall.
Estimation of parameters of linear econometric model and the power of test in...Alexander Decker
1) The document discusses using Monte Carlo simulations to estimate parameters of a linear econometric model and test the power of statistical tests in the presence of heteroscedasticity.
2) Two forms of heteroscedasticity (h(x)=X1/2 and h(x)=X) were introduced into the model. Sample sizes of 20, 50, and 100 were used with 50 replications each.
3) The bias, variance, and root mean square error of ordinary least squares (OLS) and generalized least squares (GLS) estimators were examined across sample sizes and heteroscedasticity forms. Both estimators were found to be unbiased but neither was consistently more efficient.
This document announces the winners of the 2024 Youth Poster Contest organized by MATFORCE. It lists the grand prize and age category winners for grades K-6, 7-12, and individual age groups from 5 years old to 18 years old.
❼❷⓿❺❻❷❽❷❼❽ Dpboss Kalyan Satta Matka Guessing Matka Result Main Bazar chart Final Matka Satta Matta Matka 143 Kalyan Chart Satta fix Jodi Kalyan Final ank Matka Boss Satta 143 Matka 420 Golden Matka Final Satta Kalyan Penal Chart Dpboss 143 Guessing Kalyan Night Chart
KALYAN MATKA | MATKA RESULT | KALYAN MATKA TIPS | SATTA MATKA | MATKA.COM | MATKA PANA JODI TODAY | BATTA SATKA | MATKA PATTI JODI NUMBER | MATKA RESULTS | MATKA CHART | MATKA JODI | SATTA COM | FULL RATE GAME | MATKA GAME | MATKA WAPKA | ALL MATKA RESULT LIVE ONLINE | MATKA RESULT | KALYAN MATKA RESULT | DPBOSS MATKA 143 | MAIN MATKA
The cherry: beauty, softness, its heart-shaped plastic has inspired artists since Antiquity. Cherries and strawberries were considered the fruits of paradise and thus represented the souls of men.
❼❷⓿❺❻❷❽❷❼❽ Dpboss Matka ! Fix Satta Matka ! Matka Result ! Matka Guessing ! Final Matka ! Matka Result ! Dpboss Matka ! Matka Guessing ! Satta Matta Matka 143 ! Kalyan Matka ! Satta Matka Fast Result ! Kalyan Matka Guessing ! Dpboss Matka Guessing ! Satta 143 ! Kalyan Chart ! Kalyan final ! Satta guessing ! Matka tips ! Matka 143 ! India Matka ! Matka 420 ! matka Mumbai ! Satta chart ! Indian Satta ! Satta King ! Satta 143 ! Satta batta ! Satta मटका ! Satta chart ! Matka 143 ! Matka Satta ! India Matka ! Indian Satta Matka ! Final ank
This tutorial offers a step-by-step guide on how to effectively use Pinterest. It covers the basics such as account creation and navigation, as well as advanced techniques including creating eye-catching pins and optimizing your profile. The tutorial also explores collaboration and networking on the platform. With visual illustrations and clear instructions, this tutorial will equip you with the skills to navigate Pinterest confidently and achieve your goals.
Fashionista Chic Couture Maze & Coloring Adventures is a coloring and activity book filled with many maze games and coloring activities designed to delight and engage young fashion enthusiasts. Each page offers a unique blend of fashion-themed mazes and stylish illustrations to color, inspiring creativity and problem-solving skills in children.
1. 1
GJESM MANUSCRIPT TEMPLATE
o ORIGINAL RESEARCH PAPER
o CASE STUDY
o SHORT COMMUNICATION
Forecast generation model of municipal solid waste using
multiple linear regression
ABSTRACT: The objective of this study was to develop a forecast model to
determine the rate of generation of municipal solid wastein the municipalities of the
Cuenca del Cañón del Sumidero, Chiapas,Mexico.Multiplelinearregression was used
with social and demographic explanatory variables.The compiled databaseconsisted
of 9 variables with 118 specific data per variable, which were analyzed using a
multicollinearity test to select the most important ones. Initially,different regression
models were generated, but only 2 of them were considered useful,becausethey used
few predictors that were statistically significant. The most important variables to
predict the rate of waste generation in the study area were the population of each
municipality,themigration and the population density.Although other variables,such
as daily per capita incomeand averageschoolingarevery important,they do not seem
to have an effect on the response variable in this study. The model with the highest
parsimony resulted in an adjusted coefficient of 0.975, an average absolute
percentage error of 7.70, an average absolute deviation of 0.16 and an average root
squareerror of 0.19, showinga high influenceon the phenomenon studied and a good
predictivecapacity.
KEYWORDS: Explanatory variables; Forecast model; Multiple linear
regression;Statisticalanalysis; Waste generation.
NUMBER OF REFERENCES
34
NUMBER OF FIGURES
5
NUMBER OF TABLES
6
RUNNING TITLE: Forecastgenerationmodel of municipal solidwaste
INTRODUCTION
Because of its high management cost, the amount of Municipal Solid Waste (MSW) generated in
population settlements is a significant factor for the provision of public services. According to
Intharathirat et al. (2015); Keser et al. (2012); Khan et al. (2016), the amount of MSW and its
compositionvary dependingon social,environmental anddemographicfactors. Severalresearchers
have developed models to predict the amount of MSW generated (Mahmood et al., 2018;
Kannangara et al., 2018; Pan et al., 2019; Soni et al., 2019), while othersanalyze the variablesthat
influencetheirgenerationandcomposition(Chhay etal., 2018; Grazhdani, 2016; Liu and Wu, 2010;
Liu et al., 2019; Rybová et al., 2018). Unfortunately,due to the social, economic and geographical
2. 2
heterogeneityof the differentregionsof the world,it is difficulttomake inferencesorprojections
withthe proposedmodels,and therefore, the modelsandtheirvariableshavetobe adaptedtothe
conditionsof otherregions,sometimeswithlittle success. KumarandSamandder(2017) and Shan
(2010) reportsthatsome of the difficultiesforthe adaptationof thesemodels are relatedtolimited
or inaccessible information in other countries (databases). In addition, some variables are
theoretically valid, but difficult to measure. In other cases, the variables used do not provide
informationleadingto the explanationof the phenomenon,buthave tobe used,because the model
incorporatesthem. Mexico,thistopichasalsobeenaddressed,particularlyinthe centerandnorth
of the country (Buenrostro et al., 2001; Márquez et al., 2008; Ojedaet al., 2008; Rodríguez,2004).
However, it is evident that the models proposed are not applicable to the entire national context.
According to the OECD (2015), there are notable differencesbetween the central, northern and
especially southern regions of Mexico; these include disparities in income, education, access to
services,dispersionof localitiesandotherfactors,whichcause that the consumptionpatterns,and
therefore the amountof MSW,vary greatly. Thisstudy presentsamodel toforecastthe generation
rate of MSW in the municipalities of the Cuenca del Cañón del Sumidero (CCS), Chiapas State,
Mexico. The model considersthe informationof the most relevantandeasilyaccessible socialand
demographic variables for the studyarea, which correspond to statistical data for the years 2010-
2015. This model will allow the decision makers of the municipalities of the CCS to determine the
quantities of MSWgenerated,operate properlythe waste managementsystems,and evenacquire
infrastructure. This study has been carried out in municipalities of the Cuenca del Cañón del
Sumidero,ChiapasState,Mexicoduring2010 - 2015.
MATERIALS AND METHODS
Description of the study area and context
The CCS islocatedinthe State of Chiapas, inthe southeastof Mexico,betweenthe coordinates15°
56' 55" and 16° 57' 26" NorthLatitude, and 92° 30' 44" and 93° 44° 35" westlongitude (Fig.1).The
CCS has 24 municipalitiesand2,847 localities; 2,816 localitiesare rural while 31 are urban. 83% of
the populationof the studyarealivesinurbanareas (INEGI,2010). The degree of dispersionishigh,
especiallyinthe rural localitiesfarthestfromthe municipal seat.
3. 3
Fig. 1: Geographic location of the study area in Cuenca del Cañón de Sumidero in Mexico
Developmentof the model
This study uses a multiple linear regression (MLR) model to obtain the generation rates of MSW.
Because of theirversatilityandwell-foundedtheory, MLRmodelshave beenwidelyusedin various
scientificfields. Theirmaindisadvantage isthe preparationof the database (Piresetal.,2008). The
hypothesistouse theMLRinthis study isbasedonthe effectof the explanatoryvariables(socialand
demographic variables) on the response variable (generation rate of MSW). The linear function is
shownin Eq. 1.
𝑌 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2+.. .+𝛽𝑘𝑋𝑘 + 𝜀 (1)
Where, Y is the response variable, Xi (1, 2, 3 ... k) are the explanatory variables, βi (1, 2, 3 ... k) are
the regressioncoefficientsand 𝜀 isthe residual error.
Accordingto Agirre (2006), the MLR is basedontwoassumptions:i) the explanatoryvariablesmust
be independent, i.e., free of multicollinearity and ii) the dependent variable must be normally
distributed, with zero mean and constant variance. In order to determine the regression
coefficients,the least squares method, which is based on minimizing the sum of squared errors
(SSE),usingEq. 2.
𝑆𝑆𝐸 = ∑ (𝑌𝑖 − 𝑌𝑖
̂)
2
𝑛
𝑖=1 (2)
4. 4
Where 𝑌𝑖 isthe value of eachobservationand 𝑌𝑖
̂ isthe predictedvalue.Theoretically,low SSEvalues
reflectabetterfitof the regressionmodel (KumarandSamandder,2017).Inorderto determine the
bestregressionmodel(mostparsimony),the statisticalsignificance of theexplanatoryvariablesand
the general modelwere analyzed.The analysisof the explanatoryvariableswasperformedwiththe
t-test, while the degree of adjustment and usefulness of the proposed model was performed by
evaluatingthe F-testandthe value of 𝑅2
𝑎𝑑𝑗 usingEqs. 3 and 4, respectively.
𝐹 =
(𝑆𝑆𝑌𝑌−𝑆𝑆𝐸)/𝑘
𝑆𝑆𝐸/[𝑛−(𝑘+1)]
(3)
𝑅2
𝑎𝑑𝑗 = 1 − [
(𝑛−1)
𝑛−(𝑘+1)
](1 − 𝑅2) (4)
Where 𝑆𝑆𝑌𝑌 = ∑(𝑌𝑖 − 𝑌
̅)2 representsthe sumofthe squaresof the differenceof the observeddata
(𝑌𝑖) and the average of the data(𝑌
̅);k isthe numberof explanatoryvariablesincludedinthe model;
n isthe sample size;and 𝑅2 isthe coefficientof determination.The value of 𝑅2 wasnotconsidered
tomeasure the explanatorypowerof theregressionmodel,becauseitsvalue increaseswhenadding
more explanatoryvariables,anditcanbe a deceptive measure (Changetal.,2007).
Datacollection
Accordingto Beigl etal. (2008) and Kolekaretal. (2016), the methodsof data collectiondependon
the scale of the study.In investigationscarriedoutathouseholdorlocalitylevels,the acquisitionof
information isusuallycarriedoutthroughsurveysor interviews;while atdistrictor countryscales,
the information comes from a database registered by government agencies. This study was made
at districtscale and therefore the studyarea includesseveral municipalities.MSWgenerationwas
obtained from SEMANH (2013), the studies by Alvarado et al. (2009) and Araiza et al. (2015). The
social anddemographicinformation(explanatoryvariables) wasobtainedfrom CONAPO(2017) and
INEGI (2010). The compiled informationallowedthe elaborationof a database of 9 variables,with
118 specific data per variable, coming from all the municipalities of the state of Chiapas (Table 1).
The inferencesof the proposed model were made on the municipalities of the CCS. This database
was analyzedwiththe MINITABsoftware version16.
Exploratory analysisof variables
An exploratoryanalysisof the 9 variablesusedto check the normalityof the data was carriedout.
The test used was Kolmogorov-Smirnov, with a level of significance of ∝= 0.05. This test showed
that the variables YGen, XPop, XPd, XPbam, XHgs, XCes, XDpi, did not follow a normal distribution,because
Table 1: Description of variables
No. Name of the variable Symbol Type of variable Measure
1 MSW generation YGen Dependent Tons/day
2 Population XPop Independent Inhabitants
3 Population density XPd Independent Inhabitants/km2
4 Population born in another municipality XPbam Independent Inhabitants
5 Average schooling XAs Independent Years of study
6 Household with goods and services XHgs Independent Percent (%)
7 Commercial establishments and services XCes Independent
Number of
establishments
8 Daily per capita income XDpi Independent Mexican pesos/day
9 Marginalization index XMi Independent Percent (%)
5. 5
theirp-value wassmallerthanthe ∝ value considered.Inordertoadjusttheirvalues,the variables
were transformedwithnatural logarithms.The variables XAs andXMi were nottransformedbecause
theirdata followedanormal distribution (Table 2).
Multicollinearity analysisand variablescreening
An analysis of the explanatory variables was made prior to the selection of the best MLR model.
Through a multicollinearity test, some of the variables initially considered were eliminated.
Especially, the variance inflationfactor(VIF) and the Pearsoncorrelationcoefficient(r) were used.
SimilartoKeseretal. (2012),the r coefficientwasusedtodetectthe bivariate association,while the
VIFwas used todetectthe multivariate correlation.Eqs. 5and 6 describe the testsused.
𝑉𝐼𝐹𝑘 =
1
(1−𝑅𝑘
2)
(5)
𝑟 = √1 −
𝑆𝑆𝐸
𝑆𝑆𝑌𝑌
(6)
VIF value is calculated using the 𝑅2 of the regression equation; the explanatory variables denoted
by k are analyzedasdependentvariables,whilethe othersare usedasindependentvariables;thus,
VIFis calculatedforeachexplanatoryvariable k.
The cut-off value of VIFusedinthis study was4. Accordingto Ghineaet al. (2016), whenVIF<1, the
explanatory variables are not correlated; when 1 < VIF < 5, the explanatory variables are slightly
correlated; and when VIF > 5 or 10, the explanatory variables are highly correlated. The value of r
indicatesthe relationshipbetweentwovariables(positiveornegative);itsvalue rangesbetween -1
and1. There are noclearlydefined cut-off inthe literature.Arriaza(2006) indicatesthatwithvalues
of r greater than 0.3, there may be signs of correlation, with values greater than 0.8, there are
serious problems of multicollinearity. As in Grazhdani (2016), in this study it was considered that a
value of r ≥ 0.6 (positive ornegative),indicatescorrelationbetweenthe explanatoryvariables. The
elimination of explanatory variables was performed in an iterative procedure, i.e., the VIF values
were initially determined for the 8 variables; subsequently, the variable with the highest VIF was
eliminatedandthe next iterationwith7variableswasperformed.Thiseliminationprocedureended
whena VIFcut-off value of 4 wasfound.Finally,othereliminationswere made basedonthe values
of r. Subsequently, 3 explanatory variables were used in the search stage for a better model (of
greater parsimony). The first variable selected was XPop, i.e., the "total population" of each
Table 2: Normality test and transformation of variables
Original variable Transformed variable
Kolmogorov-Smirnov Kolmogorov-Smirnov
Statistical p-value Statistical p-value
YGen 0.338 <0.010 ln-YGen 0.050 >0.150
XPop 0.281 <0.010 ln-XPop 0.066 >0.150
XPd 0.271 <0.010 ln-XPd 0.039 >0.150
XPbam 0.387 <0.010 ln-XPbam 0.065 >0.150
XAs 0.053 >0.150 XAs --- ---
XHgs 0.183 <0.010 ln-XHgs 0.054 >0.150
XCes 0.349 <0.010 ln-XCes 0.056 >0.150
XDpi 0.117 <0.010 ln-XDpi 0.056 >0.150
XMi 0.054 >0.150 XMi --- ---
6. 6
municipality,underthe hypothesisthat the largerthe population,the greaterthe consumption and
thus the greater the amount of MSW generated. The second explanatory variable used was XPd
"population density", under the premise that dispersion patterns or agglomeration of inhabitants
per unit area influences MSW generation. The third variable used was XPbam "population born in
another municipality", which can be seen as migration, i.e., people who move to other places to
seekbetterlivingconditions.The processof mobilizationof people causeschangesinconsumption
patternsof a newplace of settlement. Othermodelsthatdo not follow the principle of parsimony
were also created (more than 3 explanatory variables), but they should not be used to forecast
waste generation rates, since they have very low accuracy values and some of their explanatory
variablesare notsignificant.
Accuracy of the modeland validation
In order to determine the accuracy of the best model found, 3 widely used measures were
employed: the Mean Absolute Percentage Error (MAPE), the Mean Absolute Deviation(MAD) and
the Root Mean Square Error (RMSE) (Eqs. 7, 8 and9, respectively). A value of these measuresclose
to zeroindicates ahighprecisionof the model (Azadi andkarimí2016).
𝑀𝐴𝑃𝐸 =
1
𝑛
∑ |
𝐴𝑡−𝐹𝑡
𝐴𝑡
|
𝑛
𝑡=1 ∗ 100 (7)
𝑀𝐴𝐷 =
1
𝑛
∑ |𝐴𝑡 − 𝐹𝑡|
𝑛
𝑡=1 (8)
𝑅𝑀𝑆𝐸 = √
1
𝑛
∑ (𝐴𝑡 − 𝐹𝑡)2
𝑛
𝑡=1 (9)
Inthese equations,𝐴𝑡 isthe observedvalue, 𝐹𝑡 isthe predictedvalue and n isthesample size.MAPE
is expressedin terms of percentage of error, MAD expresses the precision in the units of the data
analyzed,and RMSE indicateshowconcentratedthe data are aroundthe line of bestfit. Inorderto
performthe external validation of the model,the technique called 𝑅2𝑗𝑎𝑐𝑘𝑘𝑛𝑖𝑓𝑒 usingEq. 10. This
equationiscalculatedbysystematicallyeliminatingeachobservationfromthe dataset, estimating
the regressionequationanddeterminingtowhatextentthemodelisabletopredictthe observation
that wasremoved.
𝑅2𝑗𝑎𝑐𝑘𝑘𝑛𝑖𝑓𝑒 = 1 −
∑(𝑦𝑖−𝑌
̂(𝑖))2
∑(𝑦𝑖−𝑌
̅)2
(10)
The 𝑅2𝑗𝑎𝑐𝑘𝑘𝑛𝑖𝑓𝑒 coefficient varies between 0 and 100%, larger values suggesting models with
greaterpredictive capacity; 𝑌
̂(𝑖)denotesthe predictedvaluefori-thobservationobtainedwhenthe
regression model fits the data with 𝑦𝑖 omitted (or removed) from the sample; and𝑌
̅ is the simple
average of the observeddata.
Verification of model assumptions
The validity of the MLR models is subject to the behavior of the residual errors "𝜀" (difference
between observed and predicted values of the dependent variable), particularly their normal
distribution, their independence and homoscedasticity (Kumar and Samandder, 2017). The
verification of normality was carried out through the Kolmogorov-Smirnov test, with a level of
significanceof ∝ = 0.05. In ordertoverifythe independence of residues,theDurbin-Watsontest(d)
was applied, looking for values close to 2, because "d" varies between 0 and 4 (Mendenhall and
7. 7
Sincich, 2012). The homoscedasticity assumption was evaluated with the plot of residuals vs
predicted,bothstandardized,lookingfora residue behaviorthatdoesnotfitanyknownpattern.
RESULTS AND DISCUSSION
Statistical analysisof variables
The initial exploratoryanalysiswasperformedonthe response variableYGen,which hasthe behavior
shown in Fig. 2. It is observed that some municipalities, which appear to be outliers, show a very
highrate of MSW generation.
Fig. 2: MSW generation rates in the state of Chiapas
These atypical valueswerenoteliminatedfromthe analysisbecause theyare noterrors,butrather
data that come from the mostimportantmunicipalitiesof Chiapas,suchas"TuxtlaGutiérrez,
Comitán, San Cristóbal de las Casas and Tapachula". These municipalities are regional heads,
therefore, the number of inhabitants, their patterns of consumption and MSW generation, differ
significantlyfromthe restof the studiedarea.The normalitytestof the responsevariableandof the
8 explanatory variables is shown in Fig. 3. The non-normality of the variables YGen, XPop, XPd, XPbam,
XHgs, XCes and XDpi, can be seen. For this reason, these variables were transformed using natural
logarithms (Fig.3a,3b, 3c, 3d, 3f, 3g and 3h).
8. 8
Fig. 3: Behavior of the variables analyzed with respect to normality
Forecastmodel
The coefficients of the MLR model were determined using the Minitab software. Only the
explanatoryvariablesthatfulfilledthe multicollinearitycriterionwere used.Initially,2theoretically
validmodelswere determined;the firstone isshowninEq. 11.
9. 9
𝑙𝑛𝑌𝐺𝑒𝑛 = −8.91 + 1.10 𝑙𝑛𝑋𝑃𝑜𝑝 + 0.0259 𝑙𝑛𝑋𝑃𝑑
+ 0.0688 𝑙𝑛𝑋𝑃𝑏𝑎𝑚
(11)
Thisfirstmodel consistsof 3variables, XPop,XPd,XPbam (all transformed).The F-testassociatedwitha
variance analysis indicated that the model is statistically valid because p-value < 0.05. This model
can thus also be used for forecast purposes. However, it is important to be careful because the
explanatory variable XPd is not statistically significant since the null hypothesis that the coefficient
of the variable isequal to zero(H0: βi = 0) is met.Therefore,the explanatoryvariable isnotrelated
to the dependentvariable, i.e.,itshouldnotbe interpreted.
The secondmodel ispresentedinEq. 12, whichconsistsof 2 explanatoryvariables “XPop andXPbam”.
Similarto the firstmodel,here alsothe p-value andthe F-testindicate thatit is a statisticallyvalid
model that can be used for forecasting purposes. Particularly this model is the one of greater
parsimony,because itusesonly2variables.
𝑙𝑛𝑌𝐺𝑒𝑛 = −8.86 + 1.11 𝑙𝑛𝑋𝑃𝑜𝑝 + 0.0658 𝑙𝑛𝑋𝑃𝑏𝑎𝑚
(12)
All the informationassociatedwiththe analysisof variance ispresentedin Table 3.
Table 3: Analysis of variance of the proposed models
Model Source
Degree of
freedom (df)
Sum of
Squares
Mean
square
F Sig.
1
Regression 3 150.591 50.197 1,449.33 0.000
Residual 107 3.706 0.035
Total 110 154.297
2
Regression 2 150.549 75.274 2,169.16 0.000
Residual 108 3.748 0.035
Total 110 154.297
The verificationof assumptionsof the proposedmodels,especially model 2,is presentedin Fig. 4.
The probability-probability plot (p-p plot) (Fig. 4a) shows the values of the residuals with a linear
patternindicatingnormality;additionally,the Kolmogorov-Smirnovvalueanditsassociatedp-value
confirmit(p-value>0.15). The resultof the Durbin-Watsonindependencetestgave avalueof 1.979
for model 2, which indicates that the residuals are not correlated. The homoscedasticity test
presentedinFig.4bshowsabehaviorofthe residualsthatdoesnotfitanyknownpattern;therefore,
thissituationisadequate.
10. 10
Fig. 4: Verification of model assumptions: (a) Normality of residuals, (b) Independence of residuals
On the other hand, the 𝑅2 value of the equations in both modelswas 0.976, which indicates that
97.6% of the generation rate of MSW YGen (transformed) can be explained by the explanatory
variables used. It is important to note the gradual decrease of 𝑅2
𝑎𝑑𝑗 (0.975) with respect to 𝑅2,
which is due to the adjustment by the introduction of 2 and 3 variables in models 1 and 2,
respectively. The highvalue of 𝑅2 and𝑅2
𝑎𝑑𝑗 in these modelsisdue to the initial transformationof
the explanatoryvariables,aswell asthe response variable.Additionally,the datacollectioncarried
out in this study influencedthese valuesbecause theycome froma censusdatabase,andnot from
an informationsurveythroughinterviews. The internal validationof model 2 throughMAPE, MAD
and RMSE, showedthe valuesof 7.70, 0.16 and 0.19, respectively,whichindicatesahighprecision
11. 11
since the values of these tests are close to 0 (zero). The external validation by𝑅2𝑗𝑎𝑐𝑘𝑘𝑛𝑖𝑓𝑒
presentedavalue of 97.44%. Therefore,model 2alsohasa highforecastingcapacity.
Non-significantvariables
The analysisof the 8 explanatoryvariablesusingthe VIFtestproducedthe initial eliminationof the
variables XAs, 𝑙𝑛-XHgs, 𝑙𝑛-XCes and 𝑙𝑛-XDpi, since their value was higher than the cut-off of 4. The
variablesXAs and𝑙𝑛-XDpi have beenusedmainlyinstudiesathouseholdorlocalitylevels(Khanetal,
2016; Ojeda et al., 2008), but in this paperthey were usedat districtlevel,andthe effectof these
variables seems not to be important (low correlation with the response variable 𝑙𝑛-YGen). The
variables 𝑙𝑛-XHgs and𝑙𝑛-XCes were eliminatedbecause theyare highlycorrelatedwithXMi,since the
latteris a multidimensional indicatorthatmeasuresdeprivationina population,throughvariables
similar to those eliminated. Finally, through the r test, only XMi was eliminated, since it was highly
correlatedwith 𝑙𝑛-XPbam,witha coefficientof -0.695, i.e.,much higherthanthe cut-off value of 0.6
(positive or negative); additionally, this variable was less correlated with the 𝑙𝑛-YGen response
variable (Table 4).
Table 4: Correlation matrix of variables
Pearson's correlation ln-YGen ln-XPop ln- XPd ln- XPbam
ln-XPop 0.985 --- --- ---
ln- XPd 0.161 0.169 --- ---
ln- XPbam 0.638 0.573 -0.059 ---
XMi -0.355 -0.271 -0.097 -0.695
Significantvariables
The transformedvariables XPop,XPd andXPbam were usedinthe searchforthe best model, since their
VIFand r valueswere belowthe cut-off values. XPop hasbeenusedinthe studies of Azadi andkarimí
(2016) and Abdoli etal. (2011), as the mostimportantexplanatoryvariable.Inthis study,Pearson's
correlation r-value was 0.985, which indicates that it is also the variable most related to the
generationof waste,particularlyinapositive way, i.e., toalargerpopulation correspondsagreater
quantity of MSW. The variable XPd has been used in few publications. Bel and Mur (2009) use this
variable alsoto obtainthe costs associatedwithwaste management. Inthisstudy, r-value of 0.161
was obtained, which indicates a poor correlation with the response variable. The analysis of the
forecastmodel 1indicatedthatthisvariable isnotstatisticallysignificant,anditsuse mustbe taken
with caution. The XPbam variable is positively related to the response variable. Its Pearson's
correlation coefficient was 0.638. This variable is important in the study area, since it can be
concluded that people who move from one municipality to another have different consumption
patternsthatmodifythe amountsof MSW. Otherexplanatoryvariablesmentionedin Kolekaretal.
(2016), forinstance age,employmentstatus,level of urbanizationandenvironmentalvariablessuch
as precipitation or temperature, were not used in this study since it is difficult to find a database
withinformationonthese variables.
Othergenerated models
Eqs. 13, 14 and 15 showothermodelsgeneratedwiththe variablesinitiallyraised(models3,4 and
5 respectively). All these models are statistically significant and are also useful for forecasting
purposes,butincorporate explanatoryvariablesthatare not significant;therefore,theirresultsare
not accurate (Table 5).Additionally,theyhave low parsimonybecause theyincorporate more than
2 or 3 explanatory variables. Table 6 shows the statistical behavior of the predictors. The p-value
12. 12
andthe VIFmust be analyzedbecausetheyindicate multicollinearitybetweenthe variablesandalso
theirpossible interpretationwithinthe generatedmodel.
𝑙𝑛𝑌𝐺𝑒𝑛 = −8.48 + 1.13 𝑙𝑛𝑋𝑃𝑜𝑝 − 0.0019 𝑙𝑛𝑋𝑃𝑑
+ 0.0315 𝑙𝑛𝑋𝑃𝑏𝑎𝑚
− 0.0111𝑋𝑀𝑖 (13)
𝑙𝑛𝑌𝐺𝑒𝑛 = −8.42 + 1.13 𝑙𝑛𝑋𝑃𝑜𝑝 − 0.0008 𝑙𝑛𝑋𝑃𝑑
+ 0.0324 𝑙𝑛𝑋𝑃𝑏𝑎𝑚
− 0.0070𝑋𝐴𝑠 − 0.0119𝑋𝑀𝑖
(14)
𝑙𝑛𝑌𝐺𝑒𝑛 = −8.22 + 0.995 𝑙𝑛𝑋𝑃𝑜𝑝 + 0.0039 𝑙𝑛𝑋𝑃𝑑
+ 0.0261 𝑙𝑛𝑋𝑃𝑏𝑎𝑚
− 0.0006𝑋𝐴𝑠 +
0.133 𝑙𝑛𝑋𝐻𝑔𝑠
− 0.00525𝑋𝑀𝑖 (15)
Table 5: Analysis of variance of other generated models
Model Source
Degree of
freedom (df)
Sum of
Squares
Mean
square
F Sig. 𝑅2
𝑎𝑑𝑗
3
Regression 4 150.900 37.725 1,177.44 0.000 0.977
Residual 106 3.396 0.032
Total 110 154.297
4
Regression 5 150.902 30.180 933.42 0.000 0.977
Residual 105 3.395 0.032
Total 110 154.297
5
Regression 6 151.397 25.233 905.01 0.000 0.980
Residual 104 2.900 0.028
Total 110 154.297
Table 6: Statistical behavior of predictors in models 3, 4 and 5
Model Predictor Coefficient p-value VIF Comments
3
Constant -8.48 0.000 -------
There is no multicollinearity
among predictors,but some
of them are not statistically
significant(p-value>0.05).
𝑙𝑛𝑋𝑃𝑜𝑝 1.13 0.000 1.947
𝑙𝑛𝑋𝑃𝑑 -0.0019 0.937 1.300
𝑙𝑛𝑋𝑃𝑏𝑎𝑚 0.0315 0.053 3.556
𝑋𝑀𝑖 -0.0111 0.002 2.405
4
Constant -8.42 0.000 -------
There is multicollinearity
between predictors (VIF> 5)
and some of them are not
statistically significant (p-
value> 0.05).
𝑙𝑛𝑋𝑃𝑜𝑝 1.13 0.000 1.948
𝑙𝑛𝑋𝑃𝑑 -0.0008 0.975 1.374
𝑙𝑛𝑋𝑃𝑏𝑎𝑚 0.0324 0.055 3.789
𝑋𝐴𝑠 -0.0070 0.844 5.373
𝑋𝑀𝑖 -0.0119 0.026 5.177
5
Constant -8.22 0.000 -------
𝑙𝑛𝑋𝑃𝑜𝑝 0.995 0.000 5.634
𝑙𝑛𝑋𝑃𝑑 0.0039 0.869 1.377
𝑙𝑛𝑋𝑃𝑏𝑎𝑚 0.0261 0.096 3.824
𝑋𝐴𝑠 -0.0006 0.986 5.384
𝑙𝑛𝑋𝐻𝑔𝑠 0.133 0.000 6.180
𝑋𝑀𝑖 -0.0525 0.309 5.710
Inferencesaboutthemunicipalitiesof thestudy area
Basedon model 2 and itsstatistical analysis,inferenceswere made toforecastthe generationrate
of MSW in the municipalitiesof the CCS. The forecastwas made withthe most currentdata of the
13. 13
variables XPop andXPbam,correspondingtothe year2015. Fig.5 showsMSW generation forecastand
its comparisonwiththe original database. Inmost of the municipalitiesof the CCS, the generation
rate of MSW presentedagradual increase withrespecttopopulationgrowth(variable XPop),except
in Arriaga, Chiapilla,Osumacinta, Suchiapa,Teopisca, Tonalá, VenustianoCarranza and Villaflores,
due tothe fact thatthe populationof thesemunicipalitiesdidnotincreaseinthe 2010-2015 period.
Fig. 5: Forecast of MSW generation rates in the municipalities of the study area
Currently,the studyareagenerates1,600tons of MSW/day,of which74% comesfromthe regional
heads such as Berriozábal, Ocozocoautla de Espinosa, San Cristóbal de las Casas, Tuxtla Gutiérrez
and Villaflores.
CONCLUSION
In this study, a forecast model was developed to determine the generation of MSW in the
municipalitiesof the CCS,ChiapasState,Mexico.A MLRwas usedtoobtainthe forecastmodel with
social and demographicexplanatoryvariables. Twoforecastmodelswere presentedandanalyzed,
withvariablesthat metthe multicollinearitytest.The mostimportantvariablestopredictthe rate
of MSW generationinthe studyareawerethe populationof eachmunicipality(XPop),the population
born in another municipality (XPbam) and the population density (XPd). XPop is the most influential
explanatory variable of waste generation, particularly it is related in a positive way. XPbam is less
relatedtowaste generation. XPd isthe variablethatleastinfluenceswaste generationprediction;in
addition, it can present problems of correlation with other explanatory variables. Although other
variables,suchasdailypercapitaincome(XDpi)andaverage schooling(XAs),are veryimportant,they
donot seemtohave aneffectonthe responsevariableinthisstudy.The userof thisforecastmodel
shoulduse model 2, since it is the one withthe highestparsimony(itusesfewervariables); 𝑅2
𝑎𝑑𝑗,
MAPE, MAD and RMSE values indicated high influence on the explained phenomenon and high
forecastingcapacity.Additionally,itisimportanttomentionthatwhenusingthe modelsproposed
for forecastingpurposes,itisnecessarytomake a transformationinthe explanatoryandresponse
variables(use inverseof natural logarithm).The inferencesmade onthe municipalitiesof the study
area showed that, except in some municipalities,the MSW generation rate usually presenteda
gradual increase withrespecttopopulationgrowthand withrespect to the numberof inhabitants
that were born in anotherentity(migration). Finally,this study canbe a solidbasisforcomparison
14. 14
for future researchinthe area of study.It is possible touse differentmathematical modelssuchas
artificial neural network,principal componentanalysis,time-seriesanalysis,etc.,andcompare the
response variable orthe predictors.
ACKNOWLEDGEMENT
This study was supported by the Project Support Program for Research and Technological
Innovation [Project UNAM DGAPA-PAPIIT IN105516]. We also appreciate the support of Mexican
National Council forScience andTechnology(CONACYT) tocarry outthis work.
CONFLICT OF INTEREST
The authordeclaresthatthereisno conflictofinterestsregardingthepublicationof thismanuscript.
In addition,the ethical issues,includingplagiarism, informedconsent,misconduct,datafabrication
and/or falsification,double publicationand/or submission, and redundancy have beencompletely
observedbythe authors.
ABBREVIATIONS
∝ Level of significance
𝐴𝑡 Observedvalue
βi (1,2,3 ...k) Regressioncoefficients
CCS Cuencadel Cañóndel Sumidero
d Durbin-Watsontest
Eq. Equation
𝐹 Fishertest
𝐹𝑡 Predictedvalue
H0 Null hypothesis
k Numberof explanatoryvariables includedinthe model
ln-YGen Natural logarithmof MSW generation
ln-XPop Natural logarithmof population
ln-XPd Natural logarithmof populationdensity
ln-XPbam Natural logarithmof populationborninanothermunicipality
ln-XHgs Natural logarithmof householdwithgoodsandservices
ln-XCes Natural logarithmof commercial establishmentsandservices
ln-XDpi Natural logarithmof dailypercapitaincome
MAD Mean Absolute Deviation
MAPE Mean absolute percentage error
MLR Multiple LinearRegression
MSW Municipal SolidWaste
n Sample size
p-p plot Probability-probabilityplot
p-value Probabilityvalue
r Pearsoncorrelationcoefficient
r-value Pearsoncorrelationcoefficient
𝑅2 Coefficientof determination
𝑅2
𝑎𝑑𝑗 Adjusted coefficientof determination
𝑅2𝑗𝑎𝑐𝑘𝑘𝑛𝑖𝑓𝑒 Jackknife coefficientof determination
RMSE Root Mean Square Error
SSE Sumof SquaredErrors
15. 15
𝑆𝑆𝑌𝑌 Sumof the squaresof the difference of (𝑌𝑖) andthe (𝑌
̅)
VIF Variance InflationFactor
Xi (1,2,3 ... k) Explanatoryvariables
XAs Average schooling
XCes Commercial establishmentsandservices
XDpi Dailypercapita income
XHgs Householdwithgoodsandservices
XMi Marginalizationindex
XPbam Populationborninanothermunicipality
XPd Populationdensity
XPop Population
𝑌
̅ Average of observeddata
YGen MSW generation
𝑌𝑖 Value of eachindividualobservation
𝑌𝑖
̂ Predictedvalue
REFERENCES
Notice:(All referenceshyperlinks must be taken and included at under each references from the
link below automatically)
https://woodward.library.ubc.ca/research-help/journal-abbreviations/
Abdoli,M.;Falahnezhad,M.;Behboudian,S.,(2011).Multivariate econometricapproachforsolid
waste generation modeling:acase studyof Mashhad,Iran. Environ.Eng.Sci.,28(9): 627-633
(7 pages).
https://www.liebertpub.com/doi/abs/10.1089/ees.2010.0234
Agirre,E.;Ibarra, G.; Madariaga, I.,(2006). Regressionandmultilayerperceptron-basedmodelsto
forecasthourlyO3 and NO2 levelsinthe Bilbaoarea. Environ. Modell.Software.21(4):430–
446 (17 pages).
https://dl.acm.org/citation.cfm?id=1707340
Alvarado,H.;Nájera,H.; González,F.;Palacios,R.,(2009). Studyof generationandcharacterization
of householdsolidwaste inthe municipal seatof Chiapa de Corzo,Chiapas,Mexico.
LacandoniaJ.,3: 85-92 (8 pages).
https://cuid.unicach.mx/revistas/index.php/lacandonia/article/view/156
Araiza, J.;López,C.; Ramírez,N.,(2015). Municipal solidwaste management:case studyinLas
Margaritas, Chiapas.AIDISJ.Eng. Environ.Sci.:Res.Develop.Pract.,8(3):299-311 (13
pages).
http://www.revistas.unam.mx/index.php/aidis/article/view/53489
Arriaza,M., (2006). Practical guide todata analysis,juntade Andalucía,ministryof innovation,
science andbusiness,instituteof agricultural researchandtrainingandfishing,Spain.
16. 16
https://www.juntadeandalucia.es/agriculturaypesca/ifapa/servifapa/contenidoAlf?id=1c141ba8-
08cc-42fc-9df7-10a632eb3194
Azadi,S.;Karimí,A., (2016). Verifyingthe performance of artificial neural networkandmultiple
linearregressioninpredictingthe meanseasonal municipalsolidwastegenerationrate:a
case studyof Fars province,Iran.Waste Manage.,48: 14-23 (10 pages).
https://www.ncbi.nlm.nih.gov/pubmed/26482809
Beigl,P.;Lebersorger,S.;Salhofer,S.,(2008).Modellingmunicipal solidwaste generation:a
review.Waste Manage.,28(1):200-214 (15 pages).
https://www.ncbi.nlm.nih.gov/pubmed/17336051
Bel,G.; Mur, M., (2009). Intermunicipalcooperation,privatizationandwaste managementcosts:
evidence fromrural municipalities.Waste Manage.,29(10):2772–2778 (7 pages).
https://www.ncbi.nlm.nih.gov/pubmed/19556117
Buenrostro,O.;Bocco, G.; Vence,J.,(2001). Forecastinggenerationof urbansolidwaste in
developingcountries - acase studyinMexico.J. AirWaste Manage.,51(1): 86-93 (8 pages).
https://www.ncbi.nlm.nih.gov/pubmed/11218430
Chang,Y.; Lin,C.; Chyan,J.; Chen,I.;Chang, J. (2007). Multiple regressionmodelsforthe lower
heatingvalue of municipal solidwaste inTaiwan. J.Environ.Manage.,85(4):891–899 (9
pages).
https://www.ncbi.nlm.nih.gov/pubmed/17234326
Chhay,L.; Reyad,M.;Suy,R.; Islam,M.; MianM., (2018). Municipal solidwaste generationinChina:
influencing factor analysis and multi-model forecasting. J. Mater. Cycles Waste Manage.,
20(3): 1761–1770 (10 pages).
https://link.springer.com/article/10.1007/s10163-018-0743-4
CONAGUA,(2009). National WaterCommission.Integratedmanagementplanforthe Cuencadel
Cañóndel Sumidero.Chiapas,México.
CONAPO, (2017). National PopulationCouncil.Municipal Marginalization Index.
http://www.conapo.gob.mx/es/CONAPO/Datos_Abiertos_del_Indice_de_Marginacion
Accessed10 June 2017.
Ghinea,C.;Niculina,E.;Comanita,E.;Gavrilescu,M.;Campean,T.;Curteanu,S.;Gavrilescu,M.
(2016). Forecastingmunicipalsolidwaste generationusingprognostictoolsandregression
analysis. J.Environ.Manage.,182: 80-93 (14 pages).
https://www.ncbi.nlm.nih.gov/pubmed/27454099
Grazhdani,D.,(2016). Assessingthe variablesaffectingonthe rate of solidwaste generationand
recycling:anempirical analysisinPrespaPark.Waste Manage.,48: 3-13 (11 pages).
https://www.ncbi.nlm.nih.gov/pubmed/26482808
INEGI, (2010). Populationandhousingcensus2010: interactive dataquery.NationalInstitute of
StatisticandGeography.
17. 17
https://www.inegi.org.mx/programas/ccpv/2010/default.html
Intharathirat,R.;Salam,P.; Kumar,S.;Untong, A.,(2015). Forecastingof municipal solidwaste
quantityina developingcountryusingmultivariate greymodel.Waste Manage.,39: 3–14
(12 pages).
https://www.ncbi.nlm.nih.gov/pubmed/25704925
Kannangara, M.; Dua, R.; Ahmadi, L.; Bensebaa, F., (2018). Modeling and prediction of regional
municipal solidwaste generationanddiversioninCanadausingmachinelearningapproaches.
Waste Manage., 74: 3–15 (13 pages).
https://www.ncbi.nlm.nih.gov/pubmed/29221873
Khan,D.; Kumar,A.; Samadder,S.,(2016). Impact of socioeconomicstatusonmunicipal solid
waste generationrate.Waste Manage.,49: 15-25 (11 pages).
https://www.ncbi.nlm.nih.gov/pubmed/26831564
Keser,S.;Duzgun,S.;Aksoy,A.,(2012). Applicationof spatial andnon-spatial dataanalysisin
determinationof the factorsthatimpact municipal solidwaste generationratesinTurkey.
Waste Manage., 32(3): 359–371 (13 pages).
https://www.ncbi.nlm.nih.gov/pubmed/22104614
Kolekar, K.;Hazra,T.; Chakrabarty,S.,(2016). A review onpredictionof municipal solidwaste
generationmodels. ProcediaEnvironSci.,35:238 – 244 (7 pages).
https://www.sciencedirect.com/science/article/pii/S1878029616301761
Kumar,A.; Samandder,S.,(2017). Anempirical model forpredictionof householdsolidwaste
generationrate – a case studyof Dhanbad,India.Waste Manage.,68: 3-15 (13 pages).
https://www.ncbi.nlm.nih.gov/pubmed/28757221
Liu,C.; Wu, X.,(2010). Factors influencingmunicipal solidwaste generationinChina:amultiple
statistical analysisstudy.Waste Manage.Res.,29(4):371-378 (8 pages).
https://www.ncbi.nlm.nih.gov/pubmed/20699292
Liu, J.; Li, Q.; Gu, W.; Wang, C., (2019). The Impact of consumption patterns on the generation of
municipal solid waste in China: evidences from provincial data. Int. J. Environ. Res. Public
Health,16(10): 1-19 (19 pages).
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6573004/
Mahmood, S.; Sharif, F.; Rahman, A.U.; Khan, A.U., (2018). Analysis and forecasting of municipal
solidwaste in NankanaCity usinggeo-spatial techniques.Environ.Monit.Assess.,190(5): 1-
14 (14 pages).
https://www.ncbi.nlm.nih.gov/pubmed/29644486
Márquez,M.; Ojeda,S.; Hidalgo,H.,(2008). Identificationof behaviorpatternsinhouseholdsolid
waste generationinMexicali’sCity:studycase.Resour.Conserv.Recycl.,52(11):1299–1306
(8 pages).
https://www.sciencedirect.com/science/article/pii/S0921344908001146
18. 18
Mendenhall,W.;Sincich,T.,(2012). A secondcourse in statisticsregressionanalysis,Prentice Hall,
UnitedStatesof America.
https://www.amazon.com/Second-Course-Statistics-Regression-Analysis/dp/0321691695
OECD, (2015). Measuringwell-beinginMexicanStates.OrganizationforEconomicCooperation
and Development.OECDPublishing,Paris,France.
http://www.oecd.org/regional/measuring-well-being-in-mexican-states-9789264246072-en.htm
Ojeda,S.;Lozano, G.; Morelos,R.; Armijo,C.,(2008). Mathematical modelingtopredictresidential
solidwaste generation.Waste Manage.,28(1):S7–S13 (7 pages).
https://www.ncbi.nlm.nih.gov/pubmed/18583125
Pan, A.;Yu, L.; Yang, Q, (2019). Characteristicsand forecastingof municipal solidwaste generation
inChina.Sustainability,11(5):1-11 (11 pages).
https://www.mdpi.com/2071-1050/11/5/1433
Pires,J.;Martins,F.;Sousa,S.;Alvim,M.;Pereira,M. (2008). Selectionandvalidationof parameters
inmultiple linearandprincipal componentregressions. Environ.Modell.Softw.,23(1):50-55
(6 pages).
https://dl.acm.org/citation.cfm?id=1290535.1290558
Rodríguez,M. (2004). Designof a mathematical model of municipal solidwaste generationin
NicolásRomero,Mexico.Master'sThesis,National PolytechnicInstitute,Mexico.
https://tesis.ipn.mx/handle/123456789/1617
Rybová,K.;Slavik,J.;Burcin,B.; Soukopová,J.;Kučera,T.;Černíková,A.,(2018). Socio-
demographicdeterminantsof municipal waste generation:case studyof the CzechRepublic.
J. Mater. CyclesWaste Manage.,20(3): 1884–1891 (8 pages).
https://link.springer.com/article/10.1007/s10163-018-0734-5
SEMAHN, Secretariat of the Environment and Natural History (2013). State program for the
preventionandintegral managementof municipal solidwaste andwaste special inthe state
of Chiapas.
https://www.gob.mx/cms/uploads/attachment/file/187451/Chiapas.pdf
Shan,C., (2010). Projectingmunicipal solidwaste:the case of Hong KongSAR.Resour.Conserv.
Recycl.,54(11): 759–768 (10 pages).
https://www.sciencedirect.com/science/article/pii/S0921344909002730
Soni, U.; Roy, A.; Verma, A.; Jain, V., (2019). Forecasting municipal solid waste generation using
artificial intelligence models—acase studyinIndia.SN Appl.Sci.,1(2):1-11 (11 pages).
https://link.springer.com/article/10.1007/s42452-018-0157-x
19. 19
GRAPHICAL ABSTRACT
A Graphical Abstract mustbe producedaccordingto the mainfindingsof the manuscriptresearch
storysummarizesvisuallyand pictorially. Providingsome selectedmanuscriptgraphs,tables,
images,diagramortextcannot be assumedas Graphical Abstract!
In orderto learnmore,refertothe Graphical AbstractGuidelineproducedbyElsevierPublisheras
the belowlinkorlookingatthe GJESM Publishedarticles:
https://www.elsevier.com/authors/journal-authors/graphical-abstract
HIGHLIGHTS
The provided HIGHLIGHTS must be written scientifically accordingto your research output and NOVELTY not
as your research aims and methodology into 3 to 4 HIGHLIGHTS statements:
The predictivemodel developed was constructed usingmultiplelinear regression,with explanatory
social and demographic variables thatmet the multicollinearity test;
The results of this work show that the variables most influential on waste generation are directly
related to the number of inhabitants;such as population,population density and population born in
another entity;
The suggested predictive model has a high parsimony, in addition, the adjusted coefficient of
determination and the accuracy coefficients indicated high influence on the explained phenomenon
and a high forecastingcapacity.