This document discusses using PROC FREQ and PROC LOGISTIC in SAS to analyze a binary outcome variable using contingency tables and logistic regression. It analyzes data on smoking status and abnormal breathing tests. PROC FREQ generates contingency tables and finds that the odds of an abnormal test are about 2.8 times higher for current smokers compared to never smokers. PROC LOGISTIC fits a logistic regression model that estimates the increase in log odds of an abnormal test for current smokers is 1.01 compared to never smokers.
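The odds ratio in this kind of 2×2 analysis can be reproduced by hand. A minimal sketch, using made-up cell counts chosen to give an odds ratio of 2.8 (these are not the document's actual data):

```python
import math

# Hypothetical 2x2 contingency table: rows = smoking status,
# columns = breathing test result. Counts are illustrative only.
#                 abnormal  normal
# current smoker      28       10
# never smoker        10       10
a, b = 28, 10   # current smokers: abnormal, normal
c, d = 10, 10   # never smokers:   abnormal, normal

odds_ratio = (a * d) / (b * c)          # cross-product ratio: (28*10)/(10*10) = 2.8
log_odds_ratio = math.log(odds_ratio)   # ~1.03, the scale of a logistic coefficient
```

The log of the cross-product odds ratio (about 1.03 here) is on the same scale as the logistic regression coefficient, which is why the two summaries in the document agree closely.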
This document introduces several concepts in estimation theory, including Bayesian parameter estimation, non-Bayesian parameter estimation, maximum likelihood estimation, and the Cramér-Rao lower bound. It provides examples of estimating parameters for linear and nonlinear models from observed data using different cost functions and derivation of the mean square error, maximum a posteriori, and maximum likelihood estimates.
This document describes an orifice flow calibration experiment conducted by Jessica Catlin, Dylan Helm, and Yen Nguyen. The objective was to develop a model for air flow rates between 0 and 0.3 SCFM. Data was collected using various equipment and analyzed to determine constants a = 0.0575 and b = 0.592 for the model Q = a(i − i₀)^b. Testing of the model found errors within ±15.5%, and statistical analysis found the mean residual to be insignificant. Uncertainty analysis calculated the average error to be 0.00941 SCFM. The document concludes there may be unknown errors from volume measurements and operating limits.
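A power-law calibration of this form is commonly fitted by linearizing with logarithms: ln Q = ln a + b·ln(i − i₀), followed by ordinary least squares. A sketch under assumed values (the data below are synthetic, generated from the reported constants, and i₀ is treated as known):

```python
import math
import random

# Synthetic calibration data generated from the reported constants
# (not the experiment's measurements); 1% multiplicative noise added.
a_true, b_true, i0 = 0.0575, 0.592, 4.0
random.seed(1)
i_vals = [i0 + 0.5 * k for k in range(1, 21)]
q_vals = [a_true * (i - i0) ** b_true * (1 + random.gauss(0, 0.01)) for i in i_vals]

# Linearize and fit by least squares: ln Q = ln a + b * ln(i - i0)
x = [math.log(i - i0) for i in i_vals]
y = [math.log(q) for q in q_vals]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
a_hat = math.exp(ybar - b_hat * xbar)
```

With low noise the recovered `a_hat` and `b_hat` land very close to the generating constants, which is the usual sanity check for this kind of fit.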
Natural Gas Time Series Analysis
The author analyzes natural gas price data from 1996 to 2016 using R. After differencing to achieve stationarity, ARIMA models are fitted and the SARIMA(1,0,0)×(2,1,1)₁₂ model is identified as best, based on having the lowest AIC value and significant coefficients. Forecasting with this model shows the predicted values following a decreasing trend similar to the actual data from the later period. Diagnostic checks confirm that the residuals resemble white noise. The analysis provides a useful prediction of natural gas prices.
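Lowest-AIC selection of the kind described can be sketched as follows; the candidate models' log-likelihoods and parameter counts are invented for illustration, not taken from the analysis:

```python
# AIC = 2k - 2*lnL; the candidate with the smallest value is preferred.
# All fit results below are hypothetical.
candidates = {
    "SARIMA(1,0,0)x(2,1,1)12": {"k": 5, "loglik": -210.3},
    "SARIMA(1,0,0)x(1,1,1)12": {"k": 4, "loglik": -214.8},
    "SARIMA(0,0,1)x(2,1,1)12": {"k": 5, "loglik": -213.1},
}
aic = {name: 2 * m["k"] - 2 * m["loglik"] for name, m in candidates.items()}
best = min(aic, key=aic.get)   # smallest AIC wins
```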
ARCH/GARCH models. ARCH/GARCH is a method for measuring the volatility of a series by modeling the noise term of an ARIMA model. It incorporates new information and analyzes the series through the conditional variance, so users can forecast future values with updated information. Here an ARIMA-ARCH model was used for forecasting, with a forecast error of 0.9%.
The document discusses steps for identifying and building ARIMA models for time series data. It describes ARIMA model building as consisting of three stages: identification, estimation, and diagnostic checking. For identification, it explains how to determine the p, d, and q values by examining the autocorrelation and partial autocorrelation functions of the stationary, differenced time series. It then discusses using the method of moments to estimate ARIMA model parameters by equating sample statistics to population parameters.
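For the simplest case, an AR(1) model, the method of moments reduces to equating the lag-1 sample autocorrelation to the coefficient. A sketch on simulated data (the true value phi = 0.6 is assumed here):

```python
import random

# Simulate an AR(1) series x_t = phi * x_{t-1} + e_t with assumed phi = 0.6.
random.seed(42)
phi = 0.6
x = [0.0]
for _ in range(5000):
    x.append(phi * x[-1] + random.gauss(0, 1))

# Method of moments: the population lag-1 autocorrelation of AR(1) is phi,
# so set the sample lag-1 autocorrelation equal to it.
n = len(x)
mean = sum(x) / n
c0 = sum((v - mean) ** 2 for v in x) / n
c1 = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1)) / n
phi_hat = c1 / c0
```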
Pierre-Charles Bierly provides examples of quality control tools and methods he has developed for his work, including spreadsheets, graphs, and programs. These include quality control charts to monitor precision, mass and charge balance calculations to check analysis completeness, concentration calculation programs, and alkalinity and solubility calculation programs. The tools are intended to help establish quality control programs and solve various process problems.
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana (Amrinder Arora)
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana. Presentation for CS 6212 final project in GWU during Fall 2015 (Prof. Arora's class)
EES Procedures and Functions for Heat exchanger calculations (tmuliya)
This file contains notes on the important topic of EES Functions for Heat Exchanger calculations. Some solved problems are also included.
These notes were prepared while teaching Heat Transfer course to the M.Tech. students in Mechanical Engineering Dept. of St. Joseph Engineering College, Vamanjoor, Mangalore, India.
It is hoped that these notes will be useful to teachers, students, researchers and professionals working in this field.
Contents:
• Overall heat transfer coefficient
• Importance of ‘Fouling factor’
• Analysis of heat exchangers by ‘Logarithmic Mean Temp Difference (LMTD)’ method
• Correction factors for Cross-flow and Shell & Tube heat exchangers
• Analysis of heat exchangers by ‘No. of Transfer Units (NTU)– Effectiveness (ε)’ method
• Compact heat exchangers
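The LMTD and NTU-effectiveness routes listed above can be sketched numerically; the temperatures, NTU, and capacity ratio below are illustrative, not from the notes:

```python
import math

def lmtd(dt1, dt2):
    """Log-mean temperature difference from the two terminal differences."""
    return (dt1 - dt2) / math.log(dt1 / dt2)

def effectiveness_counterflow(ntu, cr):
    """Effectiveness of a counter-flow exchanger, Cr = Cmin/Cmax."""
    if abs(cr - 1.0) < 1e-12:
        return ntu / (1.0 + ntu)        # balanced-flow limit
    e = math.exp(-ntu * (1.0 - cr))
    return (1.0 - e) / (1.0 - cr * e)

dt_lm = lmtd(60.0, 20.0)                 # illustrative terminal differences, K
eps = effectiveness_counterflow(2.0, 0.5)
```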
This document discusses the impact of timer resolution on the efficiency optimization of synchronous buck converters. It presents the core trade-off: dead time is needed to avoid shoot-through short circuits, but dead time reduces efficiency. Solutions such as fixed, sensing-based, and sensorless dead-time adjustment are proposed. The analysis shows that timer resolution limits the achievable optimization resolution. An experimental implementation on a microcontroller demonstrates high-resolution PWM eliminating 98.6% of dead-time losses versus 72% for low resolution, improving efficiency.
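The resolution argument can be made concrete with simple arithmetic; the clock and switching frequencies below are assumed for illustration, not taken from the paper:

```python
# The smallest dead-time adjustment a PWM timer can realize is one timer tick,
# so the timer clock directly sets the optimization granularity.
f_timer_hz = 250e6      # assumed high-resolution timer clock
f_pwm_hz = 500e3        # assumed switching frequency

steps_per_period = f_timer_hz / f_pwm_hz     # discrete duty/dead-time steps
dead_time_quantum_ns = 1e9 / f_timer_hz      # smallest dead-time adjustment, ns
```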
Vapor Combustor Improvement Project LinkedIn Presentation February 2016 (Tim Krimmel, MEM)
This document presents a vapor combustor/flare system improvement project for Baker Hughes Chemical Facility. It describes the development of a nonlinear programming model in Excel to analyze the vapor system's performance over changing variables. Field data was collected from pressure transmitters and flowmeters to validate the model. A meeting with plant engineers helped determine production risk probabilities and vapor stream loadings to apply in a Monte Carlo simulation. The model was validated by comparing its descriptive statistics to field data statistics. The goal is to systematically analyze the system and propose solutions to improve performance using modeling, simulation, and data.
This presentation gives a short introduction to time series and the overall procedure required for time series modelling, including general terminology and algorithms. The detailed mathematics is excluded from the slides; the presentation is meant as a starting point for understanding time series modelling before going into the detailed statistics.
EES Functions and Procedures for Forced convection heat transfer (tmuliya)
This file contains notes on Engineering Equation Solver (EES) Functions and Procedures for Forced convection heat transfer calculations. Some problems are also included.
These notes were prepared while teaching Heat Transfer course to the M.Tech. students in Mechanical Engineering Dept. of St. Joseph Engineering College, Vamanjoor, Mangalore, India.
It is hoped that these notes will be useful to teachers, students, researchers and professionals working in this field.
Contents:
• Forced convection – Tables of formulas
• Boundary layer, flow over flat plates, across cylinders, spheres and tube banks
• Flow inside tubes and ducts
This document estimates various ARMA models, including AR(1), MA(3), and ARMA(1,1) models, on time series data. It then evaluates these models along with a random walk model on out-of-sample forecasting performance. Key metrics computed include AIC, SBIC, parameter estimates, standard errors, Ljung-Box and LM test statistics, mean squared error of forecasts, and Diebold-Mariano test statistics to compare the ARMA and random walk forecasts.
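A one-step Diebold-Mariano comparison under squared-error loss can be sketched as follows. The forecast errors are simulated stand-ins (in the document they would come from the ARMA and random-walk forecasts), and the HAC variance correction needed for multi-step forecasts is omitted:

```python
import math
import random

# Simulated forecast errors for two competing models; model 2 is noisier.
random.seed(7)
e1 = [random.gauss(0, 1.0) for _ in range(200)]
e2 = [random.gauss(0, 1.5) for _ in range(200)]

# Loss differential under squared-error loss.
d = [a * a - b * b for a, b in zip(e1, e2)]
n = len(d)
dbar = sum(d) / n
var_d = sum((v - dbar) ** 2 for v in d) / (n - 1)

# DM statistic: mean differential over its standard error;
# approximately N(0,1) under the null of equal forecast accuracy.
dm_stat = dbar / math.sqrt(var_d / n)
```

A strongly negative statistic here favors model 1, whose errors were generated with the smaller variance.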
This document discusses forecasting gasoline prices in the United States using an ARIMA model. It provides background on gasoline, including its consumption and retail prices. The objective is to understand price volatility due to supply and demand constraints. Data on US gasoline prices from 1993-2014 is obtained from the EIA. After checking for stationarity and transforming the data, an ARIMA(1,1,3) model is identified as best. This model reveals gasoline prices are significantly related to past prices and unobserved factors. The validated model is used to forecast future gasoline prices.
This document compares different methods for predicting the density of liquid refrigerants, including correlations by ISH, Rackett, Yamada & Gunn, Spencer & Danner, Hankinson & Thomson, Reidel, and NM. It finds that the NM correlation best predicts densities for R22, R32, R134a, R152a, R600, and R12, while the Hankinson & Thomson method works best for R290, R600a, and R1270. The Reidel correlation accurately models R143a and R125, and the Yamada & Gunn and Spencer & Danner modifications are suitable for R123 and R718, respectively. Overall, the document evaluates density-prediction methods for the various refrigerants.
Pressure research in kriss tilt effect 04122018 ver1.67 (Gigin Ginanjar)
The document discusses tilt effects in high-pressure pressure balances up to 500 MPa. It analyzes absolute and relative tilt effects through theoretical approaches, 2D and 3D FEA simulations, and experiments. The theoretical approach shows that tilt can change the effective area by around 3 ppm for 500 MPa balances and 1 ppm for 100 MPa balances. FEA simulations in perpendicular and tilted conditions were performed to investigate piston tilt effects on pressure distribution and effective-area calculations. Experiments on various pressure balances showed that 100 MPa balances follow a cosine behavior with tilt, while 500 MPa balances deviate more from ideal behavior.
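The cosine behavior mentioned for the 100 MPa balances implies a small-tilt relative error of roughly theta squared over two. A sketch with an illustrative tilt angle:

```python
import math

# If the generated force scales as cos(theta), the relative pressure error
# for a small tilt theta is 1 - cos(theta) ~ theta**2 / 2.
theta = math.radians(0.1)                 # illustrative 0.1 degree tilt
rel_error_ppm = (1 - math.cos(theta)) * 1e6   # parts per million
```

For a 0.1 degree tilt this gives roughly 1.5 ppm, the same order as the effective-area changes quoted above.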
This document provides an overview of ARMA, ARIMA, and SARIMA models. It describes the components of each model, including the autoregressive, integrated, and moving average parts. It also outlines the steps for identifying, estimating, and evaluating these models, including determining stationarity and selecting parameter values. The key assumption of these time series models is that the data must be stationary or be made stationary through differencing.
This document discusses thermodynamic properties of fluids, including:
1) Derivations of equations relating the primary thermodynamic properties of pressure, volume, temperature, internal energy, and entropy for homogeneous phases and fluids.
2) Calculations of changes in enthalpy, entropy, and internal energy based on changes in pressure and temperature.
3) The thermodynamic properties of Gibbs energy and residual properties.
4) An example problem calculating the enthalpy and entropy of saturated isobutane vapor at a specified temperature and pressure using compressibility factor and ideal gas heat capacity data.
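For the ideal-gas contribution in item 2, the property changes follow standard closed forms. A sketch with hypothetical numbers and a constant heat capacity assumed:

```python
import math

# Ideal-gas-state property changes between two (T, P) states.
# All numerical values are hypothetical; cp is assumed constant.
R = 8.314               # J/(mol K)
cp = 100.0              # J/(mol K), assumed constant heat capacity
T1, T2 = 300.0, 360.0   # K
P1, P2 = 1.0e5, 3.0e5   # Pa

dh = cp * (T2 - T1)                                  # ideal-gas enthalpy change
ds = cp * math.log(T2 / T1) - R * math.log(P2 / P1)  # ideal-gas entropy change
```

Residual properties (evaluated from compressibility-factor data, as in the document's example) would then be added to these ideal-gas changes to obtain the real-fluid values.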
This document discusses ARIMA (autoregressive integrated moving average) models for time series forecasting. It covers the basic steps for identifying and fitting ARIMA models, including plotting the data, identifying possible AR or MA components using the autocorrelation function (ACF) and partial autocorrelation function (PACF), estimating model parameters, checking the residuals to validate the model fit, and choosing the best model. An example analyzes quarterly US GNP data to demonstrate these steps.
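The ACF comparison in the identification step can be sketched in a few lines: compute sample autocorrelations and compare them with the approximate 95% bounds of ±1.96/√n. The series here is simulated white noise, so almost every lag should fall inside the bounds:

```python
import random

# Simulated white noise: no true autocorrelation at any lag.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
n = len(x)
mean = sum(x) / n
c0 = sum((v - mean) ** 2 for v in x) / n

def acf(lag):
    """Sample autocorrelation at the given lag."""
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / (n * c0)

# Approximate 95% significance bounds for a white-noise null.
bound = 1.96 / n ** 0.5
significant = [k for k in range(1, 21) if abs(acf(k)) > bound]
```

For real data, a slowly decaying ACF suggests differencing, a sharp ACF cutoff suggests an MA order, and a sharp PACF cutoff suggests an AR order.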
This document describes a conjugate heat transfer analysis of an electronics cooling system using OpenFOAM. It outlines the objectives to develop a CFD model for CHT analysis and validate it with experiments. The methodology section describes the governing equations solved for fluid and solid regions as well as the interface coupling. A simple circuit board cooling case is modeled and tested. Additionally, a server cooling case is proposed with details on geometry, meshing, boundary conditions and results showing temperature distributions.
This document describes the group method for calculating multistage countercurrent cascades used in absorption, stripping, liquid-liquid extraction, and leaching. It provides the key equations for calculating the recovery fraction and determining the number of stages needed based on inlet and outlet stream compositions and flow rates. An example application to the absorption of hydrocarbon gases with an oil absorbent is included to demonstrate using the method to estimate exit streams given inlet conditions and the number of stages.
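The recovery-fraction equation of the group (Kremser) method can be sketched directly; the absorption factor and stage counts below are illustrative:

```python
# Kremser group method for an absorber: with N equilibrium stages and
# absorption factor A = L/(K*V), the fraction of a solute absorbed is
#   phi = (A**(N+1) - A) / (A**(N+1) - 1)
def fraction_absorbed(A, N):
    return (A ** (N + 1) - A) / (A ** (N + 1) - 1)

phi6 = fraction_absorbed(1.4, 6)    # illustrative A and N
phi12 = fraction_absorbed(1.4, 12)  # more stages -> higher recovery
```

Inverting this relation for N given a target recovery is how the stage count is estimated from inlet and outlet compositions.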
This document provides an introduction to ARIMA (AutoRegressive Integrated Moving Average) models. It discusses key assumptions of ARIMA including stationarity. ARIMA models combine autoregressive (AR) terms, differences or integrations (I), and moving averages (MA). The document outlines the Box-Jenkins approach for ARIMA modeling including identifying a model through correlograms and partial correlograms, estimating parameters, and diagnostic checking to validate the model prior to forecasting.
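The differencing step (the "I" in ARIMA) can be illustrated on a simulated random walk, whose first difference recovers the stationary increments exactly:

```python
import random

# A random walk is non-stationary: its variance grows with time.
random.seed(3)
eps = [random.gauss(0, 1) for _ in range(1000)]   # stationary increments
walk = []
total = 0.0
for e in eps:
    total += e
    walk.append(total)

# First difference: walk[t] - walk[t-1] recovers the increments.
diffed = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
```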
This document contains analysis of stationarity and unit root tests for the S&P 500 Index (SPIndex) and Atlanta housing price index (AtlantaHPIndex) time series data. Optimal lags were selected using the Bayesian information criterion. Unit root tests using these lags show that the null hypothesis of non-stationarity cannot be rejected for the SPIndex, but can be rejected for the AtlantaHPIndex, indicating it is stationary.
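BIC-based lag selection of this kind can be sketched with a Levinson-Durbin fit of AR(p) models on the sample autocovariances; the series below is simulated AR(2), so the criterion should settle on a low order:

```python
import math
import random

# Simulated AR(2) series with assumed coefficients 0.5 and -0.3.
random.seed(11)
x = [0.0, 0.0]
for _ in range(2000):
    x.append(0.5 * x[-1] - 0.3 * x[-2] + random.gauss(0, 1))

n = len(x)
mean = sum(x) / n
c = [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
     for k in range(9)]   # sample autocovariances, lags 0..8

def innovation_variance(p):
    """Prediction-error variance of an AR(p) fit via Levinson-Durbin."""
    v, a = c[0], []
    for k in range(1, p + 1):
        lam = (c[k] - sum(a[j] * c[k - 1 - j] for j in range(k - 1))) / v
        a = [a[j] - lam * a[k - 2 - j] for j in range(k - 1)] + [lam]
        v *= (1 - lam * lam)
    return v

# BIC = n*ln(sigma_p^2) + p*ln(n); pick the minimizing lag order.
bic = {p: n * math.log(innovation_variance(p)) + p * math.log(n)
       for p in range(1, 9)}
best_p = min(bic, key=bic.get)
```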
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE (Chaoyi WU)
This document describes a study that uses multiple linear regression to model the rate of uninsured population in counties in Georgia. The study finds that the uninsured rate is closely related to demographic factors like age distribution, income levels, employment rates, gender distribution, and citizenship status. Specifically, counties with larger populations aged 18-24, higher median incomes, lower poverty rates, stronger job markets, and more native-born residents tended to have lower uninsured rates. The researchers used principal component analysis to address correlations between employment-related variables before selecting variables and building the regression model.
Churn Modeling-For-Mobile-Telecommunications (Salford Systems)
This document summarizes a study on predicting customer churn for a major mobile provider. TreeNet models were used to predict the probability of customers churning (switching providers) within a 30-60 day period. TreeNet models significantly outperformed other methods, increasing accuracy and the proportion of high-risk customers identified. Applying the most accurate TreeNet models could translate to millions in additional annual revenue by helping the provider preemptively retain more customers.
Applied Multivariable Modeling in Public Health: Use of CART and Logistic Reg... (Salford Systems)
This document discusses using logistic regression and classification and regression tree (CART) analysis to develop models for predicting undiagnosed HIV infection. The models were developed using data from over 10,000 patients. Logistic regression was used to create the Denver HIV Risk Score based on demographics, sexual behaviors, and other risk factors. CART analysis was then used to develop a decision tree to classify patients into risk groups. Both methods showed good ability to predict undiagnosed HIV. Future work includes external validation of the decision tree and comparing screening approaches.
Improve Your Regression with CART and RandomForests (Salford Systems)
Why You Should Watch: Learn the fundamentals of tree-based machine learning algorithms and how to easily fine tune and improve your Random Forest regression models.
Abstract: In this webinar we'll introduce you to two tree-based machine learning algorithms, CART® decision trees and RandomForests®. We will discuss the advantages of tree based techniques including their ability to automatically handle variable selection, variable interactions, nonlinear relationships, outliers, and missing values. We'll explore the CART algorithm, bootstrap sampling, and the Random Forest algorithm (all with animations) and compare their predictive performance using a real world dataset.
Predicting Hospital Readmission Using TreeNet (Salford Systems)
This document discusses using a TreeNet model to predict hospital readmissions using data from electronic medical records (EMRs). It provides an overview of the dataset used, which includes over 1,000 clinical and administrative variables for 1,612 heart failure patients. The document describes how TreeNet was used to create a predictive model from the EMR data, including feature selection and parameter tuning. It also discusses evaluating the model, assessing variable importance, and exploring model results to gain clinical insights. The goal is to develop an accurate model that can help reduce the risk of avoidable hospital readmissions.
Predictive Modeling in Insurance in the context of (possibly) big data (Arthur Charpentier)
This document discusses predictive modeling in insurance in the context of big data. It begins with an introduction to the speaker and outlines some key concepts in actuarial science from both American and European perspectives. It then provides examples of common actuarial problems involving ratemaking, pricing, and claims reserving. The document reviews the history of actuarial models and discusses issues around statistical learning, machine learning, and their relationship to statistics. It also covers model evaluation and various loss functions used in modeling.
The document discusses decision trees and random forest algorithms. It begins with an outline and defines the problem as determining target attribute values for new examples given a training data set. It then explains key requirements like discrete classes and sufficient data. The document goes on to describe the principles of decision trees, including entropy and information gain as criteria for splitting nodes. Random forests are introduced as consisting of multiple decision trees to help reduce variance. The summary concludes by noting out-of-bag error rate can estimate classification error as trees are added.
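The entropy and information-gain computation described can be sketched with toy class counts (not taken from the document):

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# Toy split: a balanced parent node and the two children a candidate
# split would produce.
parent = [8, 8]                 # 8 positive, 8 negative examples
children = [[7, 1], [1, 7]]     # class counts after the split

# Information gain = parent entropy minus weighted child entropy.
n = sum(parent)
gain = entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)
```

The split with the largest gain is chosen at each node; a random forest repeats this on bootstrap samples with random feature subsets to reduce variance.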
Logistic regression is a statistical model used to predict binary outcomes like disease presence/absence from several explanatory variables. It is similar to linear regression but for binary rather than continuous outcomes. The document provides an example analysis using logistic regression to predict risk of HHV8 infection from sexual behaviors and infections like HIV. The analysis found HIV and HSV2 history were associated with higher odds of HHV8 after adjusting for other variables, while gonorrhea history was not a significant independent predictor.
Logistic management and supply chain management are critical for businesses. Logistics involves planning, implementing, and controlling the flow of materials and finished goods from suppliers to customers. It has become more important with globalization, a focus on integrated supply chain management, and the outsourcing of non-core functions. Effective logistics can provide both cost leadership and differentiated products and services by optimizing activities across the value chain.
Logistic regression allows prediction of discrete outcomes from continuous and discrete variables. It addresses questions like discriminant analysis and multiple regression but without distributional assumptions. There are two main types: binary logistic regression for dichotomous dependent variables, and multinomial logistic regression for variables with more than two categories. Binary logistic regression expresses the log odds of the dependent variable as a function of the independent variables. Logistic regression assesses the effects of multiple explanatory variables on a binary outcome variable. It is useful when the dependent variable is non-parametric, there is no homoscedasticity, or normality and linearity are suspect.
Using CART For Beginners with A Teclo Example DatasetSalford Systems
Familiarize yourself with CART Decision Tree technology in this beginner's tutorial using a telecommunications example dataset from the 1990s. By the end of this tutorial you should feel comfortable using CART on your own with sample or real-world data.
Logistic Regression in Case-Control StudySatish Gupta
This document provides an introduction to using logistic regression in R to analyze case-control studies. It explains how to download and install R, perform basic operations and calculations, handle data, load libraries, and conduct both conditional and unconditional logistic regression. Conditional logistic regression is recommended for matched case-control studies as it provides unbiased results. The document demonstrates how to perform logistic regression on a lung cancer dataset to analyze the association between disease status and genetic and environmental factors.
This document provides an overview of statistical process control (SPC). It discusses key SPC concepts including:
1) SPC focuses on detecting and eliminating abnormal variations (assignable causes) to achieve consistent quality.
2) SPC requires knowledge of basic statistics, variation, histograms, process capability, and control charts. Control charts are used to monitor a process and detect when assignable causes result in variations outside the natural limits.
3) A histogram provides a visual representation of a process and can indicate if a process is capable and centered on the target, or if assignable causes are present.
Ch 03 MATLAB Applications in Chemical Engineering_陳奇中教授教學投影片Chyi-Tsong Chen
The slides of Chapter 3 of the book entitled "MATLAB Applications in Chemical Engineering": Interpolation, Differentiation, and Integration. Author: Prof. Chyi-Tsong Chen (陳奇中教授); Center for General Education, National Quemoy University; Kinmen, Taiwan; E-mail: chyitsongchen@gmail.com.
Ebook purchase: https://play.google.com/store/books/details/MATLAB_Applications_in_Chemical_Engineering?id=kpxwEAAAQBAJ&hl=en_US&gl=US
This document discusses various statistical tools used in decision making, including regression analysis, confidence intervals, comparison tests, and analysis of variance. It provides examples of how regression analysis can be used to determine correlations and unknown parameters. It also explains how confidence intervals are calculated and used to determine how reliable a sample statistic is in estimating an unknown population parameter. Comparison tests are outlined as a method to determine if one process or supplier is better than another.
The document describes an experiment examining the effect of different culturing conditions on the growth of Methicillin-resistant Staphylococcus aureus (MRSA) strains. Five MRSA strains were cultured under various time, temperature, and tryptone concentration levels. ANOVA and polynomial regression analyses found that time, temperature, and concentration all significantly affected bacterial counts, with some interaction effects. The optimal conditions estimated were 48 hours for time and 35°C for temperature based on maximizing counts in the regression models.
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W2 Correlation and Reg...J. García - Verdugo
The document discusses correlation and regression analysis. It provides an overview of key concepts like the regression coefficient, correlation coefficient, and fitted line plots. It also describes how to calculate regression using the method of least squares and how to validate factors using tools like t-tests, ANOVA, and regression. An example is shown analyzing the relationship between softening temperature measured at a supplier vs. a customer. The correlation between the two factors is calculated to be 0.834, indicating a strong positive correlation.
The document discusses control systems and feedback control. It provides examples of regulatory control using a siphon water level control system. It describes open loop and closed loop control structures. Key signals in a feedback control loop are defined including the set point, error, control, disturbance, and noise signals. Time domain analysis in MATLAB is demonstrated including step response evaluations. Stability concepts are explained for different pole locations including real poles, imaginary poles, and complex poles. Effects of zeros and additional poles on stability are also covered. The document concludes with a brief discussion of specifying controller requirements.
The document discusses process capability and statistical quality control. It provides information on different types of process variation and process capability indices. It also summarizes key concepts in statistical process control including control charts for attributes and variables as well as acceptance sampling plans. Examples are given for constructing control charts and solving acceptance sampling problems.
This document provides an introduction to correlation and regression analysis. It defines correlation as a measure of the association between two variables and regression as using one variable to predict another. The key aspects covered are:
- Calculating correlation using Pearson's correlation coefficient r to measure the strength and direction of association between variables.
- Performing simple linear regression to find the "line of best fit" to predict a dependent variable from an independent variable.
- Using a TI-83 calculator to graphically display scatter plots of data and calculate the regression equation and correlation coefficient.
This document provides an overview of key concepts in statistics for quantitative analysis, including:
- Statistics are mathematical tools used to describe and make judgments about data. The type of statistics discussed assumes data has a normal (bell-shaped) distribution.
- The normal distribution is characterized by a mean (μ) and standard deviation (σ or s). Standard deviation quantifies the spread of data around the mean.
- Common statistical tests covered include confidence intervals, comparing a measured value to a known value using a t-test, and comparing means of two data sets using an F-test and t-test.
- The F-test determines if the standard deviations of two data sets are significantly different before using
This frequency table summarizes data from 41 respondents. It shows that most respondents were aged 20-35 years old (95.1%), had a secondary education (56.1%), were not currently working (53.7%), were primiparous (had one child, 51.2%), and were between 13-28 weeks pregnant (58.5%). Descriptive statistics are also shown for pre- and post-test scores on knowledge (prePeng and postPeng) and attitudes (preSikap and postSikap). Normality tests and non-parametric tests revealed significant differences between pre- and post-test scores. General linear models found no significant effects of demographic variables on post-test scores.
The document discusses the steps for conducting a response surface methodology (RSM) experiment using central composite design (CCD). It involves determining independent and dependent variables, selecting an appropriate CCD, conducting the experiment runs according to the design, analyzing the data using statistical methods to develop a mathematical model and check its adequacy, and using the model to optimize responses. Key aspects of RSM and CCD covered include developing the design, analyzing results through ANOVA and regression, and checking model validity.
This document outlines statistical quality control techniques for evaluating manufacturing and service processes. It discusses measuring and controlling process variation using variables like mean, standard deviation and control charts. Key aspects covered include process capability analysis using metrics like Cpk, acceptance sampling plans to determine quality levels while balancing producer and consumer risks, and operating characteristic curves.
- Response surface methodology (RSM) is used to optimize processes with multiple variables to maximize or minimize a response. It uses experimental design and regression analysis.
- The method of steepest ascent is used to sequentially move from an initial guess towards the optimum region using a first-order model. Additional experiments are conducted to fit higher-order models closer to the optimum.
- A second-order model that includes interaction and quadratic terms can identify if the stationary point is a maximum, minimum, or saddle point. Canonical analysis of the eigenvalues further characterizes the stationary point.
This document discusses statistical process control techniques. It defines natural and special variation in processes, and how control charts can be used to distinguish between them. Specifically, it covers x-bar and R charts to monitor central tendency and dispersion of a process, p charts and c charts for attribute data to monitor defect rates, and how to calculate control limits for these different chart types. Examples are provided to illustrate how to construct and interpret x-bar, p, and c charts.
This document provides an overview of statistical process control (SPC) concepts including control charts, process capability, and applying SPC to services. It discusses control charts for attributes like p-charts and c-charts and control charts for variables like x-bar charts and R-charts. It also covers determining control limits, identifying patterns in control charts, and using Excel for SPC.
TMPA-2017: The Quest for Average Response TimeIosif Itkin
TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow
The Quest for Average Response Time
Thomas A. Henzinger (President, IST, Austria Institute of Science and Technology)
For video follow the link: https://youtu.be/bCMj2toH1b4
Would like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa
Follow us:
https://www.linkedin.com/company/exactpro-systems-llc?trk=biz-companies-cym
https://twitter.com/exactpro
The document discusses various models that have been used to model power markets, including models derived from finance like Black-Scholes and multifactor models. It notes that most early models simply transposed models from finance without considering factors specific to power markets like seasonality. More recently, models have started to incorporate external variables like temperature and better represent features of power prices like switching behavior and jumps. Overall, significant work remains to develop models that fully capture the complexity of power markets.
Statistical Process Control (SPC) uses statistical methods like control charts to monitor and control processes by distinguishing between common and assignable causes of variation, with the goal of keeping processes stable and within specification limits through the detection and elimination of assignable causes. SPC analyzes variables and attributes data through techniques such as x-charts, R-charts, and p-charts to measure factors like the central tendency and dispersion of a process. Process capability analysis compares the natural variation in a process to specification limits to determine if the process is capable of consistently meeting customer requirements.
The document discusses tests for evaluating pseudo-random number generators. It describes the Kolmogorov-Smirnov test and chi-square test, which compare the distribution of generated random numbers to a uniform distribution. The Kolmogorov-Smirnov test calculates a test statistic D and compares it to a critical value table to determine if the numbers are uniformly distributed. The chi-square test divides the range into classes and calculates a test statistic based on observed and expected frequencies in each class. An example demonstrates applying the Kolmogorov-Smirnov test to a set of numbers.
Similar to Analysis Of A Binary Outcome Variable (20)
1. Analysis of a Binary Outcome Variable Using the FREQ and the LOGISTIC Procedures Arthur Li
5. ODDS RATIO

                      Outcome (Y)
                       1        0
   Exposure (X)  1     A        B        Odds1 = A / B
                 0     C        D        Odds0 = C / D

   Odds Ratio = Odds1 / Odds0 = AD / BC
8. PROC FREQ

data breathTest;
   input test $ 1-8 neversmk $ 10-16 count;
   datalines;
abnormal current 131
normal   current 927
abnormal never   38
normal   never   741
;

                         BREATHING TEST (Y)
                         ABNORMAL (1)   NORMAL (0)
SMOKING STATUS (X)
   CURRENT (1)             131 (A)       927 (B)
   NEVER (0)                38 (C)       741 (D)
9. PROC FREQ

proc freq data=breathTest;
   weight count;        /* the data are entered as the cell counts of the table */
   tables neversmk*test;
run;

The FREQ Procedure
Table of neversmk by test
(cells show Frequency / Percent / Row Pct / Col Pct)

           abnormal    normal     Total
current        131        927      1058
              7.13      50.46     57.59
             12.38      87.62
             77.51      55.58

never           38        741       779
              2.07      40.34     42.41
              4.88      95.12
             22.49      44.42

Total          169       1668      1837
              9.20      90.80    100.00
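As a cross-check on the table above, the odds ratio can be computed directly from the cell counts. A minimal Python sketch (the variable names are mine; the counts are from the table):

```python
# 2x2 table: rows = smoking status, columns = breathing test result
a, b = 131, 927   # current smokers: abnormal, normal
c, d = 38, 741    # never smokers:   abnormal, normal

odds_current = a / b                    # odds of an abnormal test, current smokers
odds_never = c / d                      # odds of an abnormal test, never smokers
odds_ratio = odds_current / odds_never  # equals (a*d) / (b*c)

print(round(odds_ratio, 4))  # 2.7557, matching the PROC FREQ case-control estimate
```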
12. PROC FREQ - CHISQ

proc freq data=breathTest;
   weight count;
   tables neversmk*test / relrisk chisq;
run;

Statistics for Table of neversmk by test

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     30.2421    <.0001
Likelihood Ratio Chi-Square    1     32.3820    <.0001
Continuity Adj. Chi-Square     1     29.3505    <.0001
Mantel-Haenszel Chi-Square     1     30.2257    <.0001
Phi Coefficient                       0.1283
Contingency Coefficient               0.1273
Cramer's V                            0.1283
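The Pearson chi-square in this output can be reproduced by hand from the cell counts, comparing each observed count to the count expected under independence. A sketch (variable names are mine):

```python
# Observed 2x2 counts: rows [current, never] x columns [abnormal, normal]
observed = [[131, 927], [38, 741]]

row_totals = [sum(row) for row in observed]        # 1058, 779
col_totals = [sum(col) for col in zip(*observed)]  # 169, 1668
n = sum(row_totals)                                # 1837

# Pearson chi-square: sum of (O - E)^2 / E over the four cells,
# where E = row total * column total / n
chi_square = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed[i][j] - expected) ** 2 / expected

print(round(chi_square, 4))  # about 30.2421, matching the PROC FREQ output
```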
15. LOGISTIC REGRESSION MODEL

Reference cell coding. β is the increment in the log odds for current smokers compared to those who never smoked.

                    BREATHING TEST
                    ABNORMAL   NORMAL
SMOKING STATUS
   CURRENT             131       927
   NEVER                38       741
16. LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
   class neversmk / param=ref;
   weight count;
   model test = neversmk;
run;

The LOGISTIC Procedure
Model Information

Data Set                     WORK.BREATHTEST
Response Variable            test
Number of Response Levels    2
Weight Variable              count
Model                        binary logit
Optimization Technique       Fisher's scoring

Number of Observations Read       4
Number of Observations Used       4
Sum of Weights Read            1837
Sum of Weights Used            1837
26. LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
   class neversmk (ref="never") / param=ref;
   weight count;
   model test = neversmk;
run;

Type 3 Analysis of Effects

                       Wald
Effect     DF    Chi-Square    Pr > ChiSq
neversmk    1       28.2434        <.0001

Analysis of Maximum Likelihood Estimates

                                 Standard        Wald
Parameter          DF  Estimate     Error   Chi-Square   Pr > ChiSq
Intercept           1   -2.9704    0.1663     318.9365       <.0001
neversmk current    1    1.0136    0.1907      28.2434       <.0001

Current smokers have a 1.0136 increase in the log odds of an abnormal test compared to people who never smoked: OR = exp(1.0136) = 2.756.
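The odds ratio and its 95% Wald confidence interval follow directly from the estimate and its standard error; a Python sketch (1.96 is the usual normal critical value):

```python
import math

beta, se = 1.0136, 0.1907  # estimate and standard error for neversmk = current

odds_ratio = math.exp(beta)          # about 2.756
lower = math.exp(beta - 1.96 * se)   # about 1.896
upper = math.exp(beta + 1.96 * se)   # about 4.004

print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
```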
27. LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
   class neversmk (ref="never") / param=ref;
   weight count;
   model test = neversmk;
run;

Odds Ratio Estimates

                              Point          95% Wald
Effect                     Estimate    Confidence Limits
neversmk current vs never     2.756      1.896     4.004

Result from PROC FREQ (for comparison):

Estimates of the Relative Risk (Row1/Row2)

Type of Study                 Value    95% Confidence Limits
Case-Control (Odds Ratio)    2.7557      1.8962     4.0047
Cohort (Col1 Risk)           2.5383      1.7904     3.5987
Cohort (Col2 Risk)           0.9211      0.8960     0.9470

Sample Size = 1837
28. LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
   class neversmk (ref="never") / param=ref;
   weight count;
   model test = neversmk;
   oddsratio 'smoking' neversmk;
run;

ODDSRATIO <'label'> variable </options>;   (new in 9.2!)

Wald Confidence Interval for Odds Ratios

Label                                Estimate    95% Confidence Limits
smoking: neversmk current vs never      2.756      1.896     4.004
30. CONFOUNDING

(Diagram: Age is associated with both Smoking and Test.)

Not including Age can cause either over- or under-estimation of the relationship between Smoking and Test.
31. CONFOUNDING

(Plot: log odds of an abnormal test for smokers vs. non-smokers, shown separately for the < 40 and ≥ 40 age groups.)

Adjusting for age, you compare smokers and non-smokers at common values of age.
37. THE PURPOSES AND STRATEGIES FOR MODEL BUILDING

Is the association between "Smoking" & "Test" different in the 2 age groups?
   Y: There is an interaction. Report age-specific ORs.
   N: No interaction. Is "Age" a confounder?
      Y: Report the age-adjusted OR.
      N: Report the crude OR.
39. PROC FREQ: INTERACTION EFFECT

data breathTestAge;
   input test $ 1-8 neversmk $ 10-16 over40 $ 18-20 count;
   datalines;
normal   never   no  577
abnormal never   no  34
normal   current no  682
abnormal current no  57
normal   never   yes 164
abnormal never   yes 4
normal   current yes 245
abnormal current yes 74
;
41. PROC FREQ: INTERACTION EFFECT

proc freq data=breathTestAge;
   weight count;
   tables over40*neversmk*test / chisq relrisk cmh;
run;

Breslow-Day Test for Homogeneity of the Odds Ratios
---------------------------------------------------
Chi-Square      18.0829
DF                    1
Pr > ChiSq       <.0001

Total Sample Size = 1837

The association between smoking status and the breathing test is not the same across the age groups.
42. PROC FREQ: INTERACTION EFFECT

proc freq data=breathTestAge;
   weight count;
   tables over40*neversmk*test / chisq relrisk cmh;
run;

Statistics for Table 1 of neversmk by test
Controlling for over40=no

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      2.4559    0.1171
Likelihood Ratio Chi-Square    1      2.4893    0.1146
Continuity Adj. Chi-Square     1      2.1260    0.1448
Mantel-Haenszel Chi-Square     1      2.4541    0.1172
Phi Coefficient                       0.0427
Contingency Coefficient               0.0426
Cramer's V                            0.0427

Estimates of the Relative Risk (Row1/Row2)

Type of Study                 Value    95% Confidence Limits
Case-Control (Odds Ratio)    1.4184      0.9144     2.2000
Cohort (Col1 Risk)           1.3861      0.9190     2.0906
Cohort (Col2 Risk)           0.9772      0.9499     1.0054

Sample Size = 1350
43. PROC FREQ: INTERACTION EFFECT

proc freq data=breathTestAge;
   weight count;
   tables over40*neversmk*test / chisq relrisk cmh;
run;

Statistics for Table 2 of neversmk by test
Controlling for over40=yes

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     35.4510    <.0001
Likelihood Ratio Chi-Square    1     45.1246    <.0001
Continuity Adj. Chi-Square     1     33.9203    <.0001
Mantel-Haenszel Chi-Square     1     35.3782    <.0001
Phi Coefficient                       0.2698
Contingency Coefficient               0.2605
Cramer's V                            0.2698

Estimates of the Relative Risk (Row1/Row2)

Type of Study                 Value    95% Confidence Limits
Case-Control (Odds Ratio)   12.3837      4.4416    34.5272
Cohort (Col1 Risk)           9.7429      3.6253    26.1844
Cohort (Col2 Risk)           0.7868      0.7374     0.8394
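The two stratum-specific odds ratios can be verified from the raw counts in the data step (a Python sketch; variable names are mine):

```python
# cross-products of the 2x2 cells within each age stratum:
# (abnormal current * normal never) / (normal current * abnormal never)

# stratum 1: over40 = no
or_under40 = (57 * 577) / (682 * 34)   # about 1.4184
# stratum 2: over40 = yes
or_over40 = (74 * 164) / (245 * 4)     # about 12.3837

print(round(or_under40, 4), round(or_over40, 4))
```

The contrast (1.42 vs. 12.38) is the interaction the Breslow-Day test flagged.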
44. PROC FREQ: INTERACTION EFFECT

proc freq data=breathTestAge;
   weight count;
   tables over40*neversmk*test / chisq relrisk cmh;
run;

Summary Statistics for neversmk by test
Controlling for over40

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF      Value      Prob
--------------------------------------------------------------
1            Nonzero Correlation        1    25.2444    <.0001
2            Row Mean Scores Differ     1    25.2444    <.0001
3            General Association        1    25.2444    <.0001

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study     Method              Value    95% Confidence Limits
Case-Control      Mantel-Haenszel    2.5683      1.7618     3.7441
(Odds Ratio)      Logit              1.9840      1.3252     2.9702
Cohort            Mantel-Haenszel    2.4174      1.6754     3.4879
(Col1 Risk)       Logit              1.8475      1.2641     2.7001
Cohort            Mantel-Haenszel    0.9289      0.9046     0.9538
(Col2 Risk)       Logit              0.9437      0.9195     0.9686

These statistics and the adjusted OR are only useful if there is homogeneity of the OR across the categories of the adjusting variable.
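The Mantel-Haenszel common odds ratio in this output is the ratio of summed cross-products, Σ(aᵢ·dᵢ/nᵢ) / Σ(bᵢ·cᵢ/nᵢ) over the strata. A quick check in Python (labels are mine; counts are from the data step):

```python
# per-stratum 2x2 cells (a, b, c, d) and stratum totals n:
# a = abnormal current, b = normal current, c = abnormal never, d = normal never
strata = [
    (57, 682, 34, 577, 1350),  # over40 = no
    (74, 245, 4, 164, 487),    # over40 = yes
]

numerator = sum(a * d / n for a, b, c, d, n in strata)
denominator = sum(b * c / n for a, b, c, d, n in strata)
mh_odds_ratio = numerator / denominator

print(round(mh_odds_ratio, 4))  # 2.5683, matching the Mantel-Haenszel estimate
```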
45. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
run;
46. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
run;

Analysis of Maximum Likelihood Estimates

                                          Standard        Wald
Parameter                    DF  Estimate    Error   Chi-Square   Pr > ChiSq
Intercept                     1   -2.8315   0.1765     257.4193       <.0001
neversmk current              1    0.3495   0.2240       2.4355       0.1186
over40 yes                    1   -0.8820   0.5359       2.7086       0.0998
neversmk*over40 current yes   1    2.1668   0.5691      14.4985       0.0001

Wald Test:
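With reference coding, the age-specific odds ratios come straight from these coefficients: exp(β for smoking) in the under-40 group, and exp(β for smoking + β for the interaction) in the 40-and-over group. A Python sketch:

```python
import math

beta_smk = 0.3495          # neversmk current (main effect)
beta_interaction = 2.1668  # neversmk*over40 current yes

or_under40 = math.exp(beta_smk)                    # about 1.418
or_over40 = math.exp(beta_smk + beta_interaction)  # about 12.38

print(round(or_under40, 3), round(or_over40, 2))
```

These reproduce the stratum-specific ORs from PROC FREQ.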
47. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
run;

Likelihood Ratio Test:

48. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
run;

Model Fit Statistics

                  Intercept    Intercept and
Criterion              Only       Covariates
AIC                1130.417         1055.467
SC                 1130.497         1055.785
-2 Log L           1128.417         1047.467

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        80.9500     3        <.0001
Score                   95.7956     3        <.0001
Wald                    81.3305     3        <.0001

49. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40;
run;

Model Fit Statistics

                  Intercept    Intercept and
Criterion              Only       Covariates
AIC                1130.417         1074.123
SC                 1130.497         1074.361
-2 Log L           1128.417         1068.123

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        60.2942     2        <.0001
Score                   61.2515     2        <.0001
Wald                    56.4737     2        <.0001
50. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
   ods output FitStatistics=log2Ratio_full GlobalTests=df_full;
run;

data _null_;
   set log2Ratio_full;
   if Criterion = '-2 Log L';                        /* subsetting IF */
   call symput('neg2L_full', InterceptAndCovariates);
run;

data _null_;
   set df_full;
   if Test = 'Likelihood Ratio';
   call symput('df_full', DF);
run;
51. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40;
   ods output FitStatistics=log2Ratio_reduce GlobalTests=df_reduce;
run;

data _null_;
   set log2Ratio_reduce;
   if Criterion = '-2 Log L';
   call symput('neg2L_reduce', InterceptAndCovariates);
run;

data _null_;
   set df_reduce;
   if Test = 'Likelihood Ratio';
   call symput('df_reduce', DF);
run;
52. PROC LOGISTIC: INTERACTION EFFECT

data result;
   LR = &neg2L_reduce - &neg2L_full;
   df = &df_full - &df_reduce;
   p = 1 - probchi(LR, df);
   label LR = 'Likelihood Ratio';
run;

proc print data=result label noobs;
   title "Likelihood ratio test";
run;

Likelihood ratio test

Likelihood Ratio    df             p
         20.6558     1    .000005497
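The likelihood ratio statistic is just the difference in -2 Log L between the reduced and full models, with df equal to the number of extra parameters. For df = 1 the p-value can also be obtained without SAS from the standard-normal tail, since a chi-square with 1 df is a squared normal. A Python sketch using the -2 Log L values from the two fits above:

```python
import math

neg2logl_reduced = 1068.123  # model without the interaction term
neg2logl_full = 1047.467     # model with neversmk*over40
lr = neg2logl_reduced - neg2logl_full  # about 20.656
df = 3 - 2                             # one extra parameter in the full model

# For chi-square with 1 df: P(X > lr) = P(Z^2 > lr) = erfc(sqrt(lr / 2))
p_value = math.erfc(math.sqrt(lr / 2))

print(round(lr, 3), p_value)  # p is about 5.5e-06, matching the slide
```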
53. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
   oddsratio neversmk / at(over40='no');
   oddsratio neversmk / at(over40='yes');
run;

Wald Confidence Interval for Odds Ratios

Label                                      Estimate    95% Confidence Limits
neversmk current vs never at over40=no        1.418      0.914     2.200
neversmk current vs never at over40=yes      12.383      4.441    34.525
55. NURSE HEALTH STUDY

data nurse_study;
   input bc age oc count;
   datalines;
1 0 1 71
0 0 1 28418
1 0 0 35
0 0 0 12267
1 1 1 143
0 1 1 20661
1 1 0 321
0 1 0 44424
;

BREAST CANCER

AGE 30-39 (0)            CASE (1)    CONTROL (0)
   OC USE  YES (1)           71          28418
           NO (0)            35          12267

AGE 40-55 (1)            CASE (1)    CONTROL (0)
   OC USE  YES (1)          143          20661
           NO (0)           321          44424
56. NURSE HEALTH STUDY

proc freq data=nurse_study order=data;
   weight count;
   tables age*oc*bc / chisq relrisk cmh;
run;

Breslow-Day Test for Homogeneity of the Odds Ratios
---------------------------------------------------
Chi-Square       0.1521
DF                    1
Pr > ChiSq       0.6966

There is no interaction; check for confounding.
57. NURSE HEALTH STUDY

Summary Statistics for oc by bc
Controlling for age

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF     Value      Prob
-------------------------------------------------------------
1            Nonzero Correlation        1    0.4361    0.5090
2            Row Mean Scores Differ     1    0.4361    0.5090
3            General Association        1    0.4361    0.5090

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study     Method              Value    95% Confidence Limits
Case-Control      Mantel-Haenszel    0.9419      0.7882     1.1256
(Odds Ratio)      Logit              0.9415      0.7882     1.1246
Cohort            Mantel-Haenszel    0.9422      0.7897     1.1243
(Col1 Risk)       Logit              0.9419      0.7894     1.1238
Cohort            Mantel-Haenszel    1.0003      0.9994     1.0013
(Col2 Risk)       Logit              1.0003      0.9995     1.0012
58. NURSE HEALTH STUDY

proc freq data=nurse_study order=data;
   weight count;
   tables oc*bc / chisq relrisk;
run;

Statistics for Table of oc by bc

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     17.8881    <.0001
Likelihood Ratio Chi-Square    1     18.1401    <.0001
Continuity Adj. Chi-Square     1     17.5337    <.0001
Mantel-Haenszel Chi-Square     1     17.8879    <.0001
Phi Coefficient                      -0.0130
Contingency Coefficient               0.0130
Cramer's V                           -0.0130

Estimates of the Relative Risk (Row1/Row2)

Type of Study                 Value    95% Confidence Limits
Case-Control (Odds Ratio)    0.6944      0.5858     0.8230
Cohort (Col1 Risk)           0.6957      0.5874     0.8239
Cohort (Col2 Risk)           1.0019      1.0010     1.0028
60. NURSE HEALTH STUDY

proc logistic data=nurse_study descending;
   weight count;
   model bc = oc age;
run;

Analysis of Maximum Likelihood Estimates

                          Standard        Wald
Parameter   DF  Estimate     Error   Chi-Square   Pr > ChiSq
Intercept    1   -5.9083    0.1156    2612.5788       <.0001
oc           1   -0.0602    0.0911       0.4360       0.5090
age          1    0.9835    0.1133      75.3707       <.0001

Odds Ratio Estimates

            Point          95% Wald
Effect   Estimate    Confidence Limits
oc          0.942      0.788     1.126
age         2.674      2.141     3.338
61. NURSE HEALTH STUDY

proc logistic data=nurse_study descending;
   weight count;
   model bc = oc;
run;

Analysis of Maximum Likelihood Estimates

                          Standard        Wald
Parameter   DF  Estimate     Error   Chi-Square   Pr > ChiSq
Intercept    1   -5.0704    0.0532    9095.8096       <.0001
oc           1   -0.3646    0.0867      17.6834       <.0001

Odds Ratio Estimates

            Point          95% Wald
Effect   Estimate    Confidence Limits
oc          0.694      0.586     0.823
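Exponentiating the oc coefficients from the two fits above reproduces the crude and age-adjusted odds ratios; the shift from 0.694 to 0.942 after adjusting for age is the confounding the Breslow-Day and CMH analyses pointed to. A quick check in Python (variable names are mine):

```python
import math

# oc coefficients from the two PROC LOGISTIC fits above
beta_adjusted = -0.0602  # model bc = oc age
beta_crude = -0.3646     # model bc = oc

or_adjusted = math.exp(beta_adjusted)  # about 0.942
or_crude = math.exp(beta_crude)        # about 0.694

print(round(or_crude, 3), round(or_adjusted, 3))
```

Because the crude and adjusted ORs differ substantially while the stratum-specific ORs are homogeneous, the age-adjusted OR is the one to report.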