Est3 tutorial3mejorado

Statistics in NCSS
The simple linear regression model

This tutorial illustrates how to fit the simple linear regression model with the NCSS package.

The data consist of a Mathematics achievement score for a random sample of 10 first-year university students, together with each student's Calculus score. The goal is to draw inferences from a model in which the Mathematics score (x) explains the Calculus score (y), and to obtain Calculus predictions for Mathematics values of 50 and 60. The data are entered directly into NCSS as follows:

Matematicas   Calculo
39            65
43            78
21            52
64            82
57            92
47            89
28            73
75            98
34            56
52            75

Note that each response is paired with its observed explanatory value. Recall that column names are assigned by right-clicking and choosing Variable Info, then Variable Name. Do not type accents or the letter ñ. Also guard against typographical errors; to copy a cell's value, drag the small square at the lower right corner of the cell.

1. Open the Linear Regression window. On the menu, select Analysis, then Regression/Correlation, then Linear Regression. The procedure window will appear. On the menu, select File, then New Template. This resets the procedure to its default settings.

2. Specify the variables. In the Linear Regression window, select the Variables tab. Fill the Y: Dependent Variable(s) box with the response Calculo by double-clicking and selecting it. Fill the X: Independent Variable(s) box with the explanatory variable Matematicas in the same way.
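The least-squares fit that NCSS reports can be checked by hand. A minimal sketch in Python (numpy is assumed to be available; the variable names are illustrative, not part of NCSS):

```python
import numpy as np

# The ten (Matematicas, Calculo) pairs typed into NCSS above.
x = np.array([39, 43, 21, 64, 57, 47, 28, 75, 34, 52], dtype=float)  # Matematicas
y = np.array([65, 78, 52, 82, 92, 89, 73, 98, 56, 75], dtype=float)  # Calculo

# Least-squares formulas: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx                      # slope
b0 = y.mean() - b1 * x.mean()       # intercept

print(round(b0, 4), round(b1, 4))   # 40.7842 0.7656, matching the report
print(round(b0 + b1 * 50, 4))       # 79.0622, the predicted Calculo at Matematicas = 50
```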
3. Specify the report items. Enter the X values at which predictions are wanted in the Predict Y at these X Values box, separating the values with a space; for this illustration type 50 60. Click the Reports tab and select the quantities you want in the analysis. Select all of them.

4. Run the procedure. On the Run menu, select Run Procedure. You will obtain the following output:

Linear Regression Report
Page/Date/Time    1    22/10/2010 10:32:31 a.m.
Database
Y = Calculo    X = Matematicas

Linear Regression Plot Section
[Scatter plot of Calculo (vertical axis, 50.0 to 100.0) vs Matematicas (horizontal axis, 20.0 to 80.0) with the fitted regression line.]

Run Summary Section
Parameter              Value        Parameter                  Value
Dependent Variable     Calculo      Rows Processed             10
Independent Variable   Matematicas  Rows Used in Estimation    10
Frequency Variable     None         Rows with X Missing        0
Weight Variable        None         Rows with Freq Missing     0
Intercept              40.7842      Rows Prediction Only       0
Slope                  0.7656       Sum of Frequencies         10
R-Squared              0.7052       Sum of Weights             10.0000
Correlation            0.8398       Coefficient of Variation   0.1145
Mean Square Error      75.75323     Square Root of MSE         8.703633
Summary Statement
The equation of the straight line relating Calculo and Matematicas (the regression equation) is estimated as:

Calculo = (40.7842) + (0.7656) * (Matematicas)

using the 10 observations in this dataset.

The y-intercept (β0), the estimated value of Calculo when Matematicas is zero, is 40.7842 with a standard error of 8.5069. The slope (β1), the estimated change in Calculo per unit change in Matematicas, is 0.7656 with a standard error of 0.1750. The value of R-Squared (R²), the proportion of the variation in Calculo that can be accounted for by variation in Matematicas, is 0.7052. The correlation between Calculo and Matematicas is 0.8398.

A significance test that the slope is zero resulted in a t-value of 4.3750. The significance level of this t-test is 0.0024. Since 0.0024 < 0.0500, the hypothesis that the slope is zero is rejected (H0 is rejected).

The estimated slope is 0.7656. The lower limit of the 95% (α = 0.05) confidence interval for the slope is 0.3620 and the upper limit is 1.1691; that is, (0.3620, 1.1691) is the confidence interval for the slope β1. The estimated intercept is 40.7842. The lower limit of the 95% confidence interval for the intercept is 21.1673 and the upper limit is 60.4010; that is, (21.1673, 60.4010) is the confidence interval for the intercept β0.

Descriptive Statistics Section
                      Dependent    Independent
Parameter Variable    Calculo      Matematicas
Count                 10           10
Mean                  76.0000      46.0000
Standard Deviation    15.1144      16.5798
Minimum               52.0000      21.0000
Maximum               98.0000      75.0000
Regression Estimation Section
                                    Intercept    Slope
Parameter                           B(0)         B(1)
Regression Coefficients             40.7842      0.7656
Lower 95% Confidence Limit          21.1673      0.3620
Upper 95% Confidence Limit          60.4010      1.1691
Standard Error                      8.5069       0.1750
Standardized Coefficient            0.0000       0.8398
T Value                             4.7943       4.3750
Prob Level                          0.0014       0.0024
Reject H0 (Alpha = 0.0500)          Yes          Yes      (hypothesis test)
Power (Alpha = 0.0500)              0.9863       0.9677
Regression of Y on X                40.7842      0.7656
Inverse Regression from X on Y      26.0655      1.0855
Orthogonal Regression of Y and X    34.7968      0.8957

Notes:
The above report shows the least-squares estimates of the intercept and slope, followed by the corresponding standard errors, confidence intervals, and hypothesis tests. Note that these results are based on several assumptions that should be validated before they are used.

Estimated Model
(40.784155214228) + (0.765561843168957) * (Matematicas)
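The slope's standard error, t-value, and confidence limits in this section follow from the usual formulas. A sketch, with the t critical value for 8 degrees of freedom (2.306) taken from a standard t table rather than computed:

```python
import numpy as np

x = np.array([39, 43, 21, 64, 57, 47, 28, 75, 34, 52], dtype=float)
y = np.array([65, 78, 52, 82, 92, 89, 73, 98, 56, 75], dtype=float)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
mse = np.sum(resid ** 2) / (len(x) - 2)    # 75.75323 in the report
se_b1 = np.sqrt(mse / sxx)                 # standard error of the slope, 0.1750

t = b1 / se_b1                             # test of H0: slope = 0, 4.3750
t_crit = 2.306                             # t(0.975, 8 df), from a t table
lower = b1 - t_crit * se_b1                # 0.3620, as in the report
upper = b1 + t_crit * se_b1                # 1.1691, as in the report
print(round(se_b1, 4), round(t, 4), round(lower, 4), round(upper, 4))
```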
Bootstrap Section

(The basic idea of the bootstrap is to treat the sample as if it were the population and resample from it. The method really estimates for a population, so it implicitly assumes a fairly large number of observations; hence the sample-size recommendation in the notes that follow the bootstrap tables.)

------------ Estimation Results ------------ | ------- Bootstrap Confidence Limits -------
Parameter                  Estimate         | Conf. Level    Lower        Upper
Intercept
Original Value             40.7842          | 0.9000         25.9293      54.0720
Bootstrap Mean             40.5099          | 0.9500         22.4576      56.5937
Bias (BM - OV)             -0.2743          | 0.9900         15.6398      65.6676
Bias Corrected             41.0584
Standard Error             8.5692
Slope
Original Value             0.7656           | 0.9000         0.4716       1.0306
Bootstrap Mean             0.7761           | 0.9500         0.3845       1.0980
Bias (BM - OV)             0.0105           | 0.9900         0.1654       1.2003
Bias Corrected             0.7551
Standard Error             0.1699
Correlation
Original Value             0.8398           | 0.9000         0.7338       1.0000
Bootstrap Mean             0.8263           | 0.9500         0.7189       1.0000
Bias (BM - OV)             -0.0135          | 0.9900         0.6959       1.0000
Bias Corrected             0.8533
Standard Error             0.0991
R-Squared
Original Value             0.7052           | 0.9000         0.5160       1.0000
Bootstrap Mean             0.6925           | 0.9500         0.4875       1.0000
Bias (BM - OV)             -0.0127          | 0.9900         0.4428       1.0000
Bias Corrected             0.7179
Standard Error             0.1496
Standard Error of Estimate
Original Value             8.7036           | 0.9000         7.5912       12.0945
Bootstrap Mean             7.7916           | 0.9500         7.3206       12.7867
Bias (BM - OV)             -0.9121          | 0.9900         6.7540       14.5573
Bias Corrected             9.6157
Standard Error             1.3796
Orthogonal Intercept
Original Value             34.7968          | 0.9000         19.6108      52.2714
Bootstrap Mean             33.6771          | 0.9500         14.9337      59.1995
Bias (BM - OV)             -1.1197          | 0.9900         9.5260       77.3036
Bias Corrected             35.9165
Standard Error             12.8285
Orthogonal Slope
Original Value             0.8957           | 0.9000         0.4728       1.1779
Bootstrap Mean             0.9269           | 0.9500         0.3089       1.2517
Bias (BM - OV)             0.0312           | 0.9900         -0.0725      1.3568
Bias Corrected             0.8646
Standard Error             0.3270
Bootstrap Section

------------ Estimation Results ------------ | ------- Bootstrap Confidence Limits -------
Parameter                  Estimate         | Conf. Level    Lower        Upper
Predicted Mean and Confidence Limits of Calculo when Matematicas = 50.0000
Original Value             79.0622          | 0.9000         74.2706      83.1174
Bootstrap Mean             79.3135          | 0.9500         73.1039      83.8730
Bias (BM - OV)             0.2512           | 0.9900         70.4090      85.4468
Bias Corrected             78.8110
Standard Error             2.7673
Predicted Mean and Confidence Limits of Calculo when Matematicas = 60.0000
Original Value             86.7179          | 0.9000         80.4961      91.1953
Bootstrap Mean             87.0742          | 0.9500         78.2697      92.1576
Bias (BM - OV)             0.3563           | 0.9900         73.7842      94.5974
Bias Corrected             86.3616
Standard Error             3.4398
Predicted Value and Prediction Limits of Calculo when Matematicas = 50.0000
Original Value             79.0622          | 0.9000         58.7435      97.2475
Bootstrap Mean             78.7827          | 0.9500         55.4098      100.4693
Bias (BM - OV)             -0.2796          | 0.9900         47.9876      109.9048
Bias Corrected             79.3418
Standard Error             12.0660
Predicted Value and Prediction Limits of Calculo when Matematicas = 60.0000
Original Value             86.7179          | 0.9000         66.2959      105.5009
Bootstrap Mean             86.6635          | 0.9500         61.8813      108.2358
Bias (BM - OV)             -0.0544          | 0.9900         55.0788      116.5996
Bias Corrected             86.7723
Standard Error             12.2026

Sampling Method = Observation, Confidence Limit Type = Reflection, Number of Samples = 3000.

Notes:
The main purpose of this report is to present the bootstrap confidence intervals of various parameters. All gross outliers should have been removed. The sample size should be at least 50, and the sample should be representative of the population it was drawn from.
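The resampling behind these tables can be sketched as follows. This is an illustrative observation-resampling bootstrap with simple percentile limits, not a reproduction of NCSS's "Reflection" confidence-limit method, so its intervals will differ somewhat from the report's:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for a reproducible illustration
x = np.array([39, 43, 21, 64, 57, 47, 28, 75, 34, 52], dtype=float)
y = np.array([65, 78, 52, 82, 92, 89, 73, 98, 56, 75], dtype=float)

def fit(xs, ys):
    """Least-squares intercept and slope for one (re)sample."""
    b1 = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)
    return ys.mean() - b1 * xs.mean(), b1

slopes = []
for _ in range(3000):                        # 3000 samples, as in the report
    idx = rng.integers(0, len(x), len(x))    # resample whole rows with replacement
    slopes.append(fit(x[idx], y[idx])[1])

lo, hi = np.percentile(slopes, [2.5, 97.5])  # percentile-style 95% limits
print(lo < 0.7656 < hi)                      # the interval should cover the estimate
```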
Correlation and R-Squared Section
                                     Pearson                     Spearman Rank
                                     Correlation                 Correlation
Parameter                            Coefficient    R-Squared    Coefficient
Estimated Value                      0.8398         0.7052       0.8788
Lower 95% Conf. Limit (r distn)      0.4233
Upper 95% Conf. Limit (r distn)      0.9540
Lower 95% Conf. Limit (Fisher's z)   0.4460                      0.5578
Upper 95% Conf. Limit (Fisher's z)   0.9612                      0.9711
Adjusted (Rbar)                                     0.6684
T-Value for H0: Rho = 0              4.3750         4.3750       5.2086
Prob Level for H0: Rho = 0           0.0024         0.0024       0.0008

Notes:
The confidence interval for the Pearson correlation assumes that X and Y follow the bivariate normal distribution. This is a different assumption from linear regression, which assumes that X is fixed and Y is normally distributed.

Two confidence intervals are given. The first is based on the exact distribution of Pearson's correlation. The second is based on Fisher's z transformation, which approximates the exact distribution using the normal distribution. Why are both provided? Because most books only mention Fisher's approximate method, it will often be needed for homework. However, the exact method should be used whenever possible.

The confidence limits can be used to test hypotheses about the correlation. To test the hypothesis that rho is a specific value, say r0, check whether r0 lies between the confidence limits. If it does, the null hypothesis that rho = r0 is not rejected. If r0 is outside the limits, the null hypothesis is rejected.

Spearman's rank correlation is calculated by replacing the original data with their ranks. This correlation is used when some of the assumptions may be invalid.
Analysis of Variance Section (the ANOVA table)
                      Sum of        Mean                    Prob      Power
Source       DF       Squares       Square       F-Ratio    Level     (5%)
Intercept    1        57760         57760
Slope        1        1449.974      1449.974     19.1408    0.0024    0.9677
Error        8        606.0259      75.75323
Adj. Total   9        2056          228.4444
Total        10       59816

s = Square Root(75.75323) = 8.703633

Notes:
The above report shows the F-Ratio for testing whether the slope is zero, the degrees of freedom, and the mean square error. The mean square error, which estimates the variance of the residuals, is used extensively in the calculation of hypothesis tests and confidence intervals.

Summary Matrices
             X'X           X'X           X'Y       X'X Inverse      X'X Inverse
Index        0             1             2         0                1
0            10            460           760       0.9552951        -1.859337E-02
1            460           23634         36854     -1.859337E-02    4.042037E-04
2 (Y'Y)                                  59816
Determinant  24740                                 4.042037E-05

Variance - Covariance Matrix of Regression Coefficients
             VC(b)         VC(b)
Index        0             1
0            72.36669      -1.408508
1            -1.408508     3.061974E-02
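The ANOVA quantities can be verified directly from the data; a sketch (numpy assumed):

```python
import numpy as np

x = np.array([39, 43, 21, 64, 57, 47, 28, 75, 34, 52], dtype=float)
y = np.array([65, 78, 52, 82, 92, 89, 73, 98, 56, 75], dtype=float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

ssr = np.sum((yhat - y.mean()) ** 2)   # Slope sum of squares
sse = np.sum((y - yhat) ** 2)          # Error sum of squares
mse = sse / 8                          # 8 error degrees of freedom
f = ssr / mse                          # F-Ratio; equals the slope t-value squared

print(round(ssr, 3), round(sse, 4))         # 1449.974 606.0259
print(round(f, 4), round(np.sqrt(mse), 6))  # 19.1408 8.703633
```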
Tests of Assumptions Section
                                                       Is the Assumption
                                   Test       Prob     Reasonable at the 0.2000
Assumption/Test                    Value      Level    Level of Significance?
Residuals follow Normal Distribution?
Shapiro Wilk                       0.9162     0.326279    Yes
Anderson Darling                   0.3987     0.365069    Yes
D'Agostino Skewness                0.5271     0.598118    Yes
D'Agostino Kurtosis                -1.3574    0.174648    No
D'Agostino Omnibus                 2.1204     0.346381    Yes
Constant Residual Variance?
Modified Levene Test               0.0089     0.927267    Yes
Relationship is a Straight Line?
Lack of Linear Fit F(0, 0) Test    0.0000     0.000000    No
No Serial Correlation?
Evaluate the Serial-Correlation report and the Durbin-Watson test if you have equal-spaced, time series data.

Notes:
A Yes means there is not enough evidence to make this assumption seem unreasonable. This lack of evidence may be because the sample size is too small, the assumptions of the test itself are not met, or the assumption is valid. A No means that the assumption is not reasonable. However, since these tests are related to sample size, you should assess the role of sample size in the tests by also evaluating the appropriate plots and graphs. A large dataset (say N > 500) will often fail at least one of the normality tests because it is hard to find a large dataset that is perfectly normal.

Normality and Constant Residual Variance:
Possible remedies for the failure of these assumptions include using a transformation of Y such as the log or square root, correcting data-recording errors found by looking into outliers, adding additional independent variables, using robust regression, or using bootstrap methods.

Straight-Line:
Possible remedies for the failure of this assumption include using nonlinear regression or polynomial regression.
Serial Correlation of Residuals Section
Lag    Serial Correlation
1      0.3611
2      -0.2875
3      -0.4307
4      -0.3158
(no serial correlations are reported for lags 5 through 24 in this small sample)

Notes:
Each serial correlation is the Pearson correlation calculated between the original series of residuals and the residuals lagged the specified number of periods. This feature of residuals is only meaningful for data obtained sorted in time order. One of the assumptions is that none of these serial correlations is significant. Starred correlations are those for which |Fisher's Z| > 1.645, which indicates that the serial correlation is large.

If serial correlation is detected in time series data, the remedy is to account for it either by replacing Y with first differences or by fitting the serial pattern using a method such as that proposed by Cochrane and Orcutt.

Durbin-Watson Test For Serial Correlation
                                                       Did the Test Reject
Parameter                                   Value      H0: Rho(1) = 0?
Durbin-Watson Value                         1.1737
Prob. Level: Positive Serial Correlation    0.1078     Yes
Prob. Level: Negative Serial Correlation    0.9474     No

Notes:
The Durbin-Watson test was created to test for first-order serial correlation in regression data taken over time. If the rows of your dataset do not represent successive time periods, you should ignore this test. This report gives the probability of rejecting the null hypothesis of no first-order serial correlation. Possible remedies for serial correlation were given in the Notes to the Serial Correlation report, above.
PRESS Section
                             From PRESS    From Regular
Parameter                    Residuals     Residuals
Sum of Squared Residuals     888.5684      606.0259
Sum of |Residuals|           84.87208      69.78011
R-Squared                    0.5678        0.7052

Notes:
A PRESS residual is found by estimating the regression equation without the observation, predicting the dependent variable, and subtracting the predicted value from the actual value. The PRESS values are calculated from these PRESS residuals. The Regular values are the corresponding calculations based on the regular residuals. The PRESS values are often used to compare models in multiple-regression variable selection. They show how well the model predicts observations that were not used in the estimation.

Predicted Values and Confidence Limits Section
              Predicted    Standard    Lower 95%      Upper 95%
Matematicas   Calculo      Error       Confidence     Confidence
(X)           (Yhat|X)     of Yhat     Limit of Y|X   Limit of Y|X
50.0000       79.0622      2.8399      72.5133        85.6112
60.0000       86.7179      3.6847      78.2210        95.2147

The confidence interval estimates the mean of the Y values in a large sample of individuals with this value of X. The interval is only accurate if all of the linear regression assumptions are valid.

Predicted Values and Prediction Limits Section
              Predicted    Standard    Lower 95%      Upper 95%
Matematicas   Calculo      Error       Prediction     Prediction
(X)           (Yhat|X)     of Yhat     Limit of Y|X   Limit of Y|X
50.0000       79.0622      9.1552      57.9502        100.1743
60.0000       86.7179      9.4515      64.9228        108.5130

The prediction interval estimates the predicted value of Y for a single individual with this value of X. The interval is only accurate if all of the linear regression assumptions are valid.
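A PRESS residual does not require refitting the model n times: for linear regression it equals the ordinary residual divided by (1 - h_i), where h_i is the hat diagonal. A sketch of the computation (numpy assumed):

```python
import numpy as np

x = np.array([39, 43, 21, 64, 57, 47, 28, 75, 34, 52], dtype=float)
y = np.array([65, 78, 52, 82, 92, 89, 73, 98, 56, 75], dtype=float)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Hat diagonal for simple regression, then the leave-one-out identity
# e_i / (1 - h_i): the residual from a fit that omits row i.
h = 1.0 / len(x) + (x - x.mean()) ** 2 / sxx
press_resid = resid / (1.0 - h)

press = np.sum(press_resid ** 2)
print(round(press, 2))   # 888.57, the report's PRESS value 888.5684
```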
Residual Plots Section
[Six diagnostic plots: Residuals of Calculo vs Matematicas; |Residuals of Calculo| vs Matematicas; RStudent of Calculo vs Matematicas; Residuals of Calculo vs Row; Serial Correlation of Residuals (residuals vs lagged residuals); Histogram of Residuals of Calculo.]
[Normal Probability Plot of Residuals of Calculo vs Expected Normals.]

Original Data Section
                                Predicted
      Matematicas   Calculo     Calculo
Row   (X)           (Y)         (Yhat|X)    Residual
1     39.0000       65.0000     70.6411     -5.6411
2     43.0000       78.0000     73.7033     4.2967
3     21.0000       52.0000     56.8610     -4.8610
4     64.0000       82.0000     89.7801     -7.7801
5     57.0000       92.0000     84.4212     7.5788
6     47.0000       89.0000     76.7656     12.2344
7     28.0000       73.0000     62.2199     10.7801
8     75.0000       98.0000     98.2013     -0.2013
9     34.0000       56.0000     66.8133     -10.8133
10    52.0000       75.0000     80.5934     -5.5934

This report provides a data list that may be used to verify whether the correct variables were selected.
Predicted Values and Confidence Limits of Means
                                 Predicted    Standard    Lower 95%      Upper 95%
      Matematicas   Calculo      Calculo      Error       Conf. Limit    Conf. Limit
Row   (X)           (Y)          (Yhat|X)     of Yhat     of Y Mean|X    of Y Mean|X
1     39.0000       65.0000      70.6411      3.0126      63.6940        77.5881
2     43.0000       78.0000      73.7033      2.8019      67.2420        80.1646
3     21.0000       52.0000      56.8610      5.1684      44.9425        68.7794
4     64.0000       82.0000      89.7801      4.1828      80.1345        99.4258
5     57.0000       92.0000      84.4212      3.3586      76.6762        92.1662
6     47.0000       89.0000      76.7656      2.7579      70.4059        83.1253
7     28.0000       73.0000      62.2199      4.1828      52.5742        71.8655
8     75.0000       98.0000      98.2013      5.7729      84.8889        111.5137
9     34.0000       56.0000      66.8133      3.4619      58.8302        74.7964
10    52.0000       75.0000      80.5934      2.9458      73.8004        87.3864

The confidence interval estimates the mean of the Y values in a large sample of individuals with this value of X. The interval is only accurate if all of the linear regression assumptions are valid.

Predicted Values and Prediction Limits
                                 Predicted    Standard    Lower 95%      Upper 95%
      Matematicas   Calculo      Calculo      Error       Prediction     Prediction
Row   (X)           (Y)          (Yhat|X)     of Yhat     Limit of Y|X   Limit of Y|X
1     39.0000       65.0000      70.6411      9.2103      49.4022        91.8800
2     43.0000       78.0000      73.7033      9.1435      52.6183        94.7883
3     21.0000       52.0000      56.8610      10.1225     33.5183        80.2036
4     64.0000       82.0000      89.7801      9.6566      67.5120        112.0482
5     57.0000       92.0000      84.4212      9.3292      62.9081        105.9343
6     47.0000       89.0000      76.7656      9.1301      55.7115        97.8197
7     28.0000       73.0000      62.2199      9.6566      39.9518        84.4880
8     75.0000       98.0000      98.2013      10.4441     74.1171        122.2855
9     34.0000       56.0000      66.8133      9.3668      45.2133        88.4132
10    52.0000       75.0000      80.5934      9.1886      59.4044        101.7824

The prediction interval estimates the predicted value of Y for a single individual with this value of X (a point prediction). The interval is only accurate if all of the linear regression assumptions are valid.
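The confidence and prediction limits in these tables come from the usual standard-error formulas. A sketch at X = 50, again using 2.306 for t(0.975, 8 df) from a t table:

```python
import numpy as np

x = np.array([39, 43, 21, 64, 57, 47, 28, 75, 34, 52], dtype=float)
y = np.array([65, 78, 52, 82, 92, 89, 73, 98, 56, 75], dtype=float)

n = len(x)
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
t_crit = 2.306                           # t(0.975, 8 df), from a t table

x0 = 50.0
yhat0 = b0 + b1 * x0
# Standard error of the mean response, and of one new individual:
se_mean = np.sqrt(mse * (1.0 / n + (x0 - x.mean()) ** 2 / sxx))
se_pred = np.sqrt(mse * (1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / sxx))

print(round(yhat0 - t_crit * se_mean, 1), round(yhat0 + t_crit * se_mean, 1))  # 72.5 85.6
print(round(yhat0 - t_crit * se_pred, 1), round(yhat0 + t_crit * se_pred, 1))  # 58.0 100.2
```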
Working-Hotelling Simultaneous Confidence Band
                                 Predicted    Standard    Lower 95%     Upper 95%
      Matematicas   Calculo      Calculo      Error       Conf. Band    Conf. Band
Row   (X)           (Y)          (Yhat|X)     of Yhat     of Y Mean|X   of Y Mean|X
1     39.0000       65.0000      70.6411      3.0126      43.7750       97.5072
2     43.0000       78.0000      73.7033      2.8019      48.7157       98.6909
3     21.0000       52.0000      56.8610      5.1684      10.7692       102.9527
4     64.0000       82.0000      89.7801      4.1828      52.4778       127.0824
5     57.0000       92.0000      84.4212      3.3586      54.4692       114.3731
6     47.0000       89.0000      76.7656      2.7579      52.1709       101.3602
7     28.0000       73.0000      62.2199      4.1828      24.9176       99.5222
8     75.0000       98.0000      98.2013      5.7729      46.7188       149.6838
9     34.0000       56.0000      66.8133      3.4619      35.9405       97.6860
10    52.0000       75.0000      80.5934      2.9458      54.3231       106.8637

This is a confidence band for the regression line for all possible values of X from -infinity to +infinity. The confidence coefficient is the proportion of time that this procedure yields a band that includes the true regression line when a large number of samples are taken using the same X values as in this sample.

Residual Section
                                 Predicted                               Percent
      Matematicas   Calculo      Calculo                  Standardized   Absolute
Row   (X)           (Y)          (Yhat|X)     Residual    Residual       Error
1     39.0000       65.0000      70.6411      -5.6411     -0.6908        8.6786
2     43.0000       78.0000      73.7033      4.2967      0.5214         5.5086
3     21.0000       52.0000      56.8610      -4.8610     -0.6941        9.3480
4     64.0000       82.0000      89.7801      -7.7801     -1.0193        9.4879
5     57.0000       92.0000      84.4212      7.5788      0.9439         8.2378
6     47.0000       89.0000      76.7656      12.2344     1.4820         13.7466
7     28.0000       73.0000      62.2199      10.7801     1.4124         14.7673
8     75.0000       98.0000      98.2013      -0.2013     -0.0309        0.2054
9     34.0000       56.0000      66.8133      -10.8133    -1.3541        19.3094
10    52.0000       75.0000      80.5934      -5.5934     -0.6830        7.4578

The residual is the difference between the actual and the predicted Y values. The formula is Residual = Y - Yhat. The Percent Absolute Error is 100 * |Residual| / Y.
Residual Diagnostics Section
      Matematicas                              Hat
Row   (X)           Residual     RStudent      Diagonal    Cooks D    MSEi
1     39.0000       -5.6411      -0.6664       0.1198      0.0325     81.4104
2     43.0000       4.2967       0.4963        0.1036      0.0157     83.6328
3     21.0000       -4.8610      -0.6698       0.3526      0.1312     81.3609
4     64.0000       -7.7801      -1.0222       0.2310      0.1560     75.3310
5     57.0000       7.5788       0.9366        0.1489      0.0779     76.9340
6     47.0000       12.2344      1.6277        0.1004      0.1226     62.8055
7     28.0000       10.7801      1.5249        0.2310      0.2995     64.9877
8     75.0000       -0.2013      -0.0289       0.4399      0.0004     86.5648
9     34.0000       -10.8133     -1.4427       0.1582      0.1723     66.7321
10    52.0000       -5.5934      -0.6583       0.1146      0.0302     81.5275

Outliers are rows that are separated from the rest of the data. Influential rows are those whose omission results in a relatively large change in the results. This report lets you see both. An outlier may be defined as a row in which |RStudent| > 2. A moderately influential row is one with a Cook's D > 0.5. A heavily influential row is one with a Cook's D > 1. MSEi is the value of the Mean Square Error (the average of the sum of squared residuals) calculated with each row omitted.

Leave One Row Out Section
Row   RStudent    DFFITS     Cooks D    CovRatio    DFBETAS(0)    DFBETAS(1)
1     -0.6664     -0.2459    0.0325     1.3121      -0.1673       0.1000
2     0.4963      0.1687     0.0157     1.3598      0.0835        -0.0316
3     -0.6698     -0.4943    0.1312     * 1.7819    -0.4811       0.4184
4     -1.0222     -0.5602    0.1560     1.2859      0.2799        -0.4218
5     0.9366      0.3918     0.0779     1.2119      -0.1086       0.2245
6     1.6277      0.5438     0.1226     0.7641      0.1429        0.0345
7     1.5249      0.8357     0.2995     0.9570      0.7733        -0.6293
8     -0.0289     -0.0256    0.0004     * 2.3315    0.0174        -0.0225
9     -1.4427     -0.6255    0.1723     0.9219      -0.5199       0.3794
10    -0.6583     -0.2368    0.0302     1.3081      0.0083        -0.0844

Each column gives the impact on some aspect of the linear regression of omitting that row. RStudent represents the size of the residual. DFFITS represents the change in the fitted value of a row. Cook's D summarizes the change in the fitted values of all rows. CovRatio represents the amount of change in the determinant of the covariance matrix. DFBETAS(0) and DFBETAS(1) give the amount of change in the intercept and slope.
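The leverage and Cook's D columns can be reproduced with the closed-form simple-regression formulas (p = 2 estimated parameters); a sketch (numpy assumed):

```python
import numpy as np

x = np.array([39, 43, 21, 64, 57, 47, 28, 75, 34, 52], dtype=float)
y = np.array([65, 78, 52, 82, 92, 89, 73, 98, 56, 75], dtype=float)

n = len(x)
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
mse = np.sum(resid ** 2) / (n - 2)

# Leverage (hat diagonal) and Cook's D with p = 2 parameters.
h = 1.0 / n + (x - x.mean()) ** 2 / sxx
cooks_d = resid ** 2 * h / (2 * mse * (1 - h) ** 2)

print(round(h[7], 4), round(cooks_d[6], 4))  # 0.4399 0.2995 (rows 8 and 7 in the report)
```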
Outlier Detection Chart
      Matematicas                                 Standardized
Row   (X)          Residual                       Residual                    RStudent
1     39.0000      -5.6411   |||............      -0.6908  ||.............    -0.6664  |..............
2     43.0000      4.2967    ||.............      0.5214   ||.............    0.4963   |..............
3     21.0000      -4.8610   |||............      -0.6941  ||.............    -0.6698  |..............
4     64.0000      -7.7801   |||||..........      -1.0193  ||||...........    -1.0222  |..............
5     57.0000      7.5788    ||||...........      0.9439   |||............    0.9366   |..............
6     47.0000      12.2344   |||||||........      1.4820   ||||||.........    1.6277   |..............
7     28.0000      10.7801   ||||||.........      1.4124   |||||..........    1.5249   |..............
8     75.0000      -0.2013   |..............      -0.0309  |..............    -0.0289  |..............
9     34.0000      -10.8133  ||||||.........      -1.3541  |||||..........    -1.4427  |..............
10    52.0000      -5.5934   |||............      -0.6830  ||.............    -0.6583  |..............

Outliers are rows that are separated from the rest of the data. Since outliers can have dramatic effects on the results, corrective action, such as elimination, must be carefully considered. Outlying rows should not automatically be removed unless a good reason for their removal can be given. An outlier may be defined as a row in which |RStudent| > 2. Rows with this characteristic have been starred.

Influence Detection Chart
      Matematicas
Row   (X)          DFFITS                         Cooks D                     DFBETAS(1)
1     39.0000      -0.2459   |..............      0.0325   |..............    0.1000   |..............
2     43.0000      0.1687    |..............      0.0157   |..............    -0.0316  |..............
3     21.0000      -0.4943   |..............      0.1312   |..............    0.4184   |..............
4     64.0000      -0.5602   |..............      0.1560   |..............    -0.4218  |..............
5     57.0000      0.3918    |..............      0.0779   |..............    0.2245   |..............
6     47.0000      0.5438    |..............      0.1226   |..............    0.0345   |..............
7     28.0000      0.8357    |..............      0.2995   ||.............    -0.6293  |..............
8     75.0000      -0.0256   |..............      0.0004   |..............    -0.0225  |..............
9     34.0000      -0.6255   |..............      0.1723   |..............    0.3794   |..............
10    52.0000      -0.2368   |..............      0.0302   |..............    -0.0844  |..............

Influential rows are those whose omission results in a relatively large change in the results. They are not necessarily harmful. However, they will distort the results if they are also outliers. The impact of influential rows should be studied very carefully. Their accuracy should be double-checked.

DFFITS is the standardized change in Yhat when the row is omitted. A row is influential when DFFITS > 1 for small datasets (N < 30) or when DFFITS > 2*SQR(1/N) for medium to large datasets. Cook's D gives the influence of each row on the Yhats of all the rows. Cook suggests investigating all rows having a Cook's D > 0.5. Rows in which Cook's D > 1.0 are very influential. DFBETAS(1) is the standardized change in the slope when this row is omitted. DFBETAS(1) > 1 for small datasets (N < 30) and DFBETAS(1) > 2/SQR(N) for medium and large datasets are indicative of influential rows.
Outlier & Influence Chart
      Matematicas                                 Cooks D                     Hat Diagonal
Row   (X)          RStudent (Outlier)             (Influence)                 (Leverage)
1     39.0000      -0.6664   |..............      0.0325   |..............    0.1198   |..............
2     43.0000      0.4963    |..............      0.0157   |..............    0.1036   |..............
3     21.0000      -0.6698   |..............      0.1312   |..............    0.3526   |||||||||||....
4     64.0000      -1.0222   |..............      0.1560   |..............    0.2310   |||||..........
5     57.0000      0.9366    |..............      0.0779   |..............    0.1489   ||.............
6     47.0000      1.6277    |..............      0.1226   |..............    0.1004   |..............
7     28.0000      1.5249    |..............      0.2995   ||.............    0.2310   |||||..........
8     75.0000      -0.0289   |..............      0.0004   |..............    0.4399   |||||||||||||||
9     34.0000      -1.4427   |..............      0.1723   |..............    0.1582   ||.............
10    52.0000      -0.6583   |..............      0.0302   |..............    0.1146   |..............

Outliers are rows that are separated from the rest of the data. Influential rows are those whose omission results in a relatively large change in the results. This report lets you see both. An outlier may be defined as a row in which |RStudent| > 2. A moderately influential row is one with a Cook's D > 0.5. A heavily influential row is one with a Cook's D > 1.

Inverse Prediction of X Means
                              Predicted                  Lower 95%       Upper 95%
      Calculo    Matematicas  Matematicas                Conf. Limit     Conf. Limit
Row   (Y)        (X)          (Xhat|Y)      X-Xhat|Y     of X Mean|Y     of X Mean|Y
1     65.0000    39.0000      31.6315       7.3685       11.7810         40.4270
2     78.0000    43.0000      48.6125       -5.6125      39.6772         59.5577
3     52.0000    21.0000      14.6505       6.3495       -22.2829        27.4640
4     82.0000    64.0000      53.8374       10.1626      45.5434         68.1613
5     92.0000    57.0000      66.8997       -9.8997      56.8331         93.0462
6     89.0000    47.0000      62.9810       -15.9810     53.7409         85.2860
7     73.0000    28.0000      42.0813       -14.0813     30.4075         50.7401
8     98.0000    75.0000      74.7371       0.2629       62.6604         108.9237
9     56.0000    34.0000      19.8754       14.1246      -11.5925        31.2433
10    75.0000    52.0000      44.6938       7.3062       34.3891         53.9934

This confidence interval estimates the mean of X in a large sample of individuals with this value of Y. This method of inverse prediction is also called calibration.
Inverse Prediction of X Individuals
                              Predicted                  Lower 95%       Upper 95%
      Calculo    Matematicas  Matematicas                Prediction      Prediction
Row   (Y)        (X)          (Xhat|Y)      X-Xhat|Y     Limit of X|Y    Limit of X|Y
1     65.0000    39.0000      31.6315       7.3685       -7.9089         60.1169
2     78.0000    43.0000      48.6125       -5.6125      17.2054         82.0295
3     52.0000    21.0000      14.6505       6.3495       -37.0380        42.2191
4     82.0000    64.0000      53.8374       10.1626      23.9947         89.7100
5     92.0000    57.0000      66.8997       -9.8997      39.1685         110.7108
6     89.0000    47.0000      62.9810       -15.9810     34.8652         104.1618
7     73.0000    28.0000      42.0813       -14.0813     8.0918          73.0559
8     98.0000    75.0000      74.7371       0.2629       47.2329         124.3511
9     56.0000    34.0000      19.8754       14.1246      -27.7306        47.3815
10    75.0000    52.0000      44.6938       7.3062       11.8213         76.5612

This prediction interval estimates the predicted value of X for a single individual with this value of Y. This method of inverse prediction is also called calibration.
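The point inverse prediction Xhat|Y is simply the fitted line solved for X; the confidence limits are more involved, so this sketch checks only the point values (numpy assumed):

```python
import numpy as np

x = np.array([39, 43, 21, 64, 57, 47, 28, 75, 34, 52], dtype=float)
y = np.array([65, 78, 52, 82, 92, 89, 73, 98, 56, 75], dtype=float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Calibration: solve y = b0 + b1*x for x at each observed y.
xhat = (y - b0) / b1

print(round(xhat[0], 4))   # 31.6315, matching Xhat|Y for row 1
```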
