This document provides an overview of regression models and their uses in institutional research. It will introduce common regression techniques like linear regression, logistic regression, and hierarchical linear modeling. The workshop aims to educate participants on different types of regression, how to interpret results, and how regression can help answer research questions. Key topics that will be covered include scales of measurement, assumptions of different regression techniques, and practical applications of regression in institutional research. The goal is for participants to learn what regression models may help further their institutional research work.
This document discusses categorical data analysis and chi-square tests. It explains that categorical data analysis involves variables that are categorical or nominal. Chi-square tests can be used to examine relationships between categorical variables. The document provides an example of a contingency table and chi-square test using SPSS to analyze the relationship between gender and nutrition knowledge. Assumptions of the chi-square test are outlined and it is explained what to do if assumptions are not met, such as using Fisher's exact test for 2x2 tables.
First Learning Community Presentation Final 10-23-2012 Colby Stoever
This document summarizes retention data from Texas A&M University-Corpus Christi. It shows first-time in college retention rates have increased from 57% to 64% from 2005 to 2010. It then analyzes retention based on factors like gender (female retention is higher), admission type (normal admits retain at a higher rate), academic standing (students in good standing retain more), and high school quartile (top quarter retains most). Subsequent slides provide more detailed breakdowns of retention by characteristics of student cohorts from 2011 and 2012, identifying risk factors. The goal is to build better data resources to understand student retention.
Qualtrics is an advanced online survey platform that allows for robust survey design, distribution, and reporting. It offers over 100 question types, logic and randomization functions, multimedia integration, and collaboration tools. Surveys can be distributed via email, links, QR codes and social media. Reporting includes visualizations, statistics, and export options. The Health Science Center plans to implement Qualtrics in April to provide a new survey tool for the community.
Using Multiple Tools to Create Dashboards Colby Stoever
The document discusses creating dashboards using multiple tools by pulling data from various sources into databases. It describes using SAS macros and SQL stored procedures to automate dashboard updates by storing recurring reports and datasets. Key points covered include identifying frequently requested data, designing databases for measures and grouped/summarized data, moving data into formats for tools like Tableau, and scheduling automatic updates. Security, customization for different audiences, and the purpose of dashboards for stakeholders are also addressed.
The document provides an introduction to statistics, discussing the meaning, history, and applications of statistics. It defines key statistical concepts such as population and sample, descriptive and inferential statistics. It also discusses the different types of variables and levels of measurement. The document traces the history of statistics from ancient times to the present day, highlighting important contributors to the field. It provides examples of how statistics is used in different domains like education, business, research, and government.
This document discusses various statistical concepts including mean deviation, standard deviation, correlation, and regression analysis. It defines these terms and provides their formulas and applications. Mean deviation measures the average deviations from the mean while standard deviation is a measure of data dispersion. Correlation quantifies the relationship between two variables. Regression analysis is used to predict how independent variables affect dependent variables, with linear and logistic regression being discussed.
The document discusses model specification for multiple regression analysis, focusing on measures of fit including R-squared and standard error of regression, and how to properly interpret these statistics. It emphasizes the importance of random sampling to establish causal relationships and warns of potential biases from non-random samples, such as when evaluating mutual fund performance or estimating political support based on telephone and automobile owners.
This document provides an overview of key concepts in quantitative data analysis, including:
1. It describes four scales of measurement (nominal, ordinal, interval, ratio) and warns against using statistics inappropriate for the scale of data.
2. It distinguishes between parametric and non-parametric statistics, descriptive and inferential statistics, and the types of variables and analyses.
3. It explains important statistical concepts like hypotheses, one-tailed and two-tailed tests, distributions, significance, and avoiding type I and II errors in hypothesis testing.
Here are the steps to perform linear regression on the advertising dataset to predict sales based on TV spend:
1. Import necessary libraries
2. Split data into training and test sets
3. Create linear regression object
4. Fit the model on training data
5. Predict on test data
6. Calculate accuracy on test data
7. Print coefficient and intercept values
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# X holds the TV advertising spend and y holds the sales figures from the advertising dataset
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
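Continuing the steps above, a minimal sketch of steps 3 through 7 (assuming X and y have already been loaded from the advertising dataset):

# Create linear regression object and fit the model on training data
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test data and calculate accuracy (R^2) on test data
y_pred = model.predict(X_test)
print("Test R^2:", model.score(X_test, y_test))

# Print coefficient and intercept values
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)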
This document discusses key concepts in research methods and statistics. It covers the four scales of measurement (nominal, ordinal, interval, ratio), inferential statistics, stimuli, and the basic statistical symbols. It also discusses experimental design elements like independent and dependent variables, operational definitions, sampling methods, reliability and validity.
This document provides an overview of econometric modeling techniques. It discusses objectives of econometric modeling including empirical verification of economic theories and policy analysis. It also describes types of econometric models such as single-equation regression models, simultaneous-equation models, and time series models. Model building criteria and assumptions of single-equation regression models are outlined along with methods for dealing with violations of assumptions like multicollinearity and autocorrelation.
5. Identifying variables and constructing hypothesis Razif Shahril
This document discusses identifying variables and constructing hypotheses in research methodology. It defines a variable as a concept that can be measured on different scales and provides examples of converting concepts into measurable variables. The document outlines different types of variables based on their role in a cause model or study design. It also describes nominal, ordinal, interval and ratio measurement scales. Finally, the document defines a hypothesis as a testable statement about the relationship between two or more variables and lists some functions of a hypothesis in focusing and guiding a research study.
This document discusses multicollinearity in regression analysis. It defines multicollinearity as an exact or near-exact linear relationship between explanatory variables. In cases of perfect multicollinearity, individual regression coefficients cannot be estimated. Near or imperfect multicollinearity is more common in real data and can lead to less precise coefficient estimates with wider confidence intervals. The document discusses various methods for detecting multicollinearity, such as auxiliary regressions and variance inflation factors, and potential remedies like dropping or transforming variables. However, multicollinearity diagnosis depends on the specific data sample and goals of the analysis.
STAT 350 (Spring 2017) Homework 11 (20 points + 1 point BONUS).docx whitneyleman54422
STAT 350 (Spring 2017) Homework 11 (20 points + 1 point BONUS) 1
Practice Problems: 12.5 (p. 588), 12.9 (p.588)
(4 pts.) 1. For each of the following graphs, identify the form, direction (if possible) and relative
strength. In addition, state if you think that there is an association between X and Y. No
explanation is required.
[Four scatterplots, labeled a) through d), appear here.]
(14 pts.) 2. Deep-water (>300m) wave forecasts are important for large cargo ships. One
method of prediction suggests that the wind speed (x, in knots) is linearly related to the wave
height (y, in feet). A random sample of buoys was obtained, and the wind speed and wave
height was measured at each. The summary data is shown below.
n = 20, SXX = 91.75, SYY = 15.952, SXY= 36.4, x̄ = 9.25, ȳ = 1.68
The scatter plot of the data is shown below:
(2 pts.) a) Find the estimated regression line for the regression of Wave Height as a function of
Wind Speed.
(1 pt.) b) Does the y-intercept have any physical meaning?
(1 pt.) c) How much change in wave height is expected when the wind speed increases by one
knot? Please explain your answer.
(1 pt.) d) What is the expected value of wave height when the wind speed is 8.6 knots (10
mph)?
(6 pts.) e) Complete the following ANOVA table.
Source of variation Degrees of Freedom Sum of squares Mean square
Regression
Error
Total
(1 pt.) f) What is the estimated variance?
(1 pt.) g) What is the proportion of the wave height that is explained by wind speed?
(1 pt.) h) From the information in the previous parts of this question, do you believe that there is
an association between wave height and wind speed? Please explain your answer. No
additional calculations are required.
(2 pts.) 3. Some physicians use the cholesterol ratio (CR = total cholesterol/HDL cholesterol) as
a measure of a patient’s risk of heart disease. In addition, the triglyceride concentration (TG)
is associated with coronary artery disease in many patients. In a study of the relationship
between these two variables, a random sample of adults was obtained, and the triglyceride
level denoted as x1 in mg/dL and cholesterol ratio (y) was obtained for each person. The
scatterplot and regression line of ln(triglyceride level - 129) denoted as x2 vs. cholesterol ratio
is below.
The ANOVA summary table is
Source of Variation Sum of Squares Degrees of freedom Mean Square
Regression 103.16 1 103.16
Error 3.20 23 0.14
Total 106.36 24
(1 pt.) a) What is the coefficient of determination?
(1 pt.) b) Do you think that an increase in the triglyceride level causes an increase in the
cholesterol level? Please explain your answer.
(1 pt.) BONUS: Why do you think that they had to take the logarithm of the triglyceride level?
Additional Problems: Note, the book gives.
Pitfalls of multivariate pattern analysis (MVPA), fMRI Emily Yunha Shin
Two papers review:
* (Part of) A primer on pattern-based approaches to fMRI: principles, pitfalls, and perspectives
* The impact of study design on pattern estimation for single-trial multivariate pattern analysis
This document discusses conceptual frameworks, theories, and research questions/hypotheses in qualitative and quantitative research. It provides:
1. An overview of what conceptual frameworks are and how they are used to define variables and relationships in a study.
2. Descriptions of how theories are applied differently in qualitative versus quantitative research, such as testing theories deductively in quantitative research and generating theories inductively in qualitative research.
3. Guidelines for writing good qualitative research questions, quantitative research questions/hypotheses, and mixed methods research questions/hypotheses.
This document discusses laboratory errors, their causes, types, and impacts. It describes that errors can occur in the pre-analytical, analytical, and post-analytical phases of testing and provides examples of errors in each phase. Errors are categorized as either determinate (systematic) errors, which are reproducible and can be identified and corrected, or indeterminate (random) errors, which are caused by uncontrollable variables and cannot be eliminated. The key goals are improving precision by reducing indeterminate errors and improving accuracy by reducing determinate errors.
The item characteristic curve is a fundamental concept in item response theory that describes the relationship between a test-taker's ability and their probability of correctly answering an item. It is defined by two key parameters: item difficulty, which indicates where on the ability scale an item best discriminates, and item discrimination, which refers to how well an item can differentiate ability levels above and below the difficulty point. Together, these parameters determine the shape of the curve, with steeper curves indicating better discrimination.
1. Regression analysis is a statistical process for estimating relationships between variables, including linear regression, logistic regression, and other types.
2. It allows predicting a dependent or response variable's values based on the values of independent or input variables.
3. Multiple linear regression allows modeling relationships between a scalar dependent variable and two or more explanatory variables.
This document outlines the syllabus for a course titled "Predictive Analytics" taught by K. Mohanasundaram. The syllabus covers topics such as introduction to business analytics, mathematical modelling, data prediction techniques, regression analysis methods like simple linear regression, logistic regression, and forecasting techniques. It recommends textbooks and references for the course and provides an introduction to concepts like uncertainty modelling using probability distributions and random variables.
Lesson 2 introduction to research in language studies gabonetoby
This document discusses key aspects of the research process and developing a theoretical framework. It covers selecting a research area and questions, developing hypotheses, identifying variables, and determining appropriate measurement levels for variables. The role and importance of theory in research is explained. Guidelines are provided for constructing a theoretical framework, including examining the research problem and related literature to identify relevant theories. Examples of independent, dependent, and control variables are given. Finally, the document outlines the different levels of variable measurement - nominal, ordinal, interval, and ratio.
This document discusses various multivariate statistical methods including logistic regression, multinomial logistic regression, and multilevel modeling. It provides examples and comparisons of different types of regressions, discusses model building and interpretation. It also covers diagnostic tests, effect sizes, and hypothesis testing in multilevel modeling. An example article on patient experience ratings is summarized, highlighting the dependent and independent variables, confounders, and methods used including multinomial logistic regression and interpreting odds ratios.
This document provides an introduction to using directed acyclic graphs (DAGs) for confounder selection in nonexperimental studies. It discusses what DAGs are, their benefits over traditional covariate selection approaches, limitations, key terminology, and examples of how DAGs can identify when adjustment is needed or could induce bias. The document also introduces d-separation criteria for assessing open and closed paths in DAGs and overviews software tools for applying these rules to select minimum adjustment sets from complex causal diagrams.
The document discusses the scientific method or "Sistemang Harana" in Tagalog. It explains the key steps in the scientific method as identification, formulation of hypotheses, testing of hypotheses through repeated trials and observation of effects, and reaching a conclusion and recommendations. The overall process aims to identify and solve problems in a systematic way.
This document provides an overview of inferential statistics and statistical tests that can be used, including correlation tests, t-tests, and how to determine which tests are appropriate. It discusses the assumptions of parametric tests like Pearson's correlation and t-tests, and how to check assumptions graphically and using statistical tests. Specific procedures for conducting correlation analyses in Excel and SPSS are outlined, along with how to interpret and report the results.
This document provides an introduction to exploratory factor analysis (EFA). It discusses key concepts such as factors, factor loadings, communalities, assumptions of EFA, extraction and rotation methods. An example is provided applying EFA to anthropometric and physical performance data from 21 participants. Three factors were extracted accounting for over 80% of the variance: an anthropometric factor with high loadings for weight, height and leg length; a physical performance factor with high loadings for shuttle run, 50m dash and 12m run/walk; and a third factor with high loading for shoulder width only.
The document discusses inverse theory and modeling. It defines inversion as using observations to determine objects based on their properties. An inverse problem involves data/observations, a model relating the observations to model parameters, and determining the model parameters through either forward or inverse modeling. Forward modeling varies model parameters to match predictions to observations, while inverse modeling computes the model parameters needed to reproduce the observations. Inverse problems can be linear, involving independent parameters, or nonlinear, where parameters are interdependent; linear problems are often solved using least squares fitting.
Statistical concepts and their applications in various fields:
- Statistics involves collecting and analyzing numerical data to draw valid conclusions. It requires careful research planning and design.
- Descriptive statistics summarize data through measures of central tendency (mean, median, mode) and variability (range, standard deviation).
- Inferential statistics test hypotheses and make estimates about populations based on samples.
- Biostatistics is applied in community medicine, public health, cancer research, pharmacology, and demography to study disease trends, treatment effectiveness, and population attributes. It is also used in advanced biomedical technologies and ecology.
The Office of Institutional Analysis at UTHSCSA supports the academic mission by providing official student and faculty data to internal and external stakeholders in a timely manner. This includes developing reports, maintaining data warehouses, and verifying data accuracy for accreditation and surveys. The office also provides student information to academic departments for individual program accreditation.
This document provides a template for a program planning tool called a logic model. A logic model outlines the key components of a program or project, including the problem statement, strategies or interventions, inputs, outputs, short-term outcomes, and long-term outcomes. It also provides questions to consider at different stages of a project to ensure the strategies will effectively solve the problem and that objectives are being achieved. The logic model template can be used for a variety of projects to help conceptualize, plan, implement, and evaluate all aspects of the program.
Tracking Secondary AVID Students into Higher Education fy2009 Colby Stoever
This document analyzes data on high school graduates from fiscal year 2009 comparing those who participated in AVID (Advancement Via Individual Determination) programs and those who did not. It finds that AVID students were more likely to be Hispanic, at-risk, and of economic disadvantage. They also had higher rates of enrollment and persistence in higher education the following year compared to non-AVID students, with persistence rates increasing based on years of AVID participation.
Developmental Education Program Survey (DEPS) Colby Stoever
This document summarizes the results of a statewide survey of developmental education programs and policies in Texas for fiscal year 2011. Some key findings include:
- Developmental education plans were not standardized across institutions previously, so the Developmental Education Program Survey (DEPS) was created to collect consistent information across institutions.
- DEPS includes sections on general information, academic advising, college readiness assessments, course information, and faculty development.
- Most institutions require academic advising and monitoring of academic performance for developmental students, and many have early warning systems to identify struggling students.
- The most commonly used college readiness assessments are ACCUPLACER, COMPASS, and THEA for math, reading, and writing.
-
The Institutional Context For QEP Kick-Off Meeting Colby Stoever
1) The Institutional Context document provides data on the student population at UTHSCSA including demographics, satisfaction rates, and graduation/pass rates. It shows that the majority of students are in professional programs and the student body is diverse.
2) It also includes information on faculty including levels of effort in areas like instruction, research and service. Most faculty have responsibilities in multiple areas.
3) The data presented is complex and more information is needed in areas like alumni outcomes to fully understand student success after graduating from UTHSCSA programs.
The document describes a proposed initiative called the Texas Pathways to improve student outcomes through increased data sharing and use between secondary and postsecondary educational institutions. Key points:
- It would establish regional partnerships between schools and colleges to share student data and form teams to address issues.
- Teams of educators would receive data reports and training to identify problems and make curriculum/policy changes based on evidence.
- Goals are to improve coordination, enrollment, completion rates and other student outcomes by giving educators better access to data and skills to use it for continuous improvement.
- Outcomes would be evaluated to inform future statewide education decisions and policy.
The document provides an overview of using SAS for institutional research. It discusses importing and exporting data, manipulating data using the DATA step, and performing basic statistical analysis using common SAS procedures. The workshop goals are to teach basic SAS programming skills including importing, the DATA step, PROC SORT, exporting, PROC PRINT, PROC FREQ, PROC SUMMARY, and PROC TABULATE in three hours. If time allows, it will also cover macros, creating text files, and email reports.
This document summarizes the role and achievements of the author as a Data Analyst for Austin I.S.D. from December 2012 to May 2013. The author provided data and automated reporting processes for program evaluation, grants, and stakeholder reports. Major achievements included creating an automated parent survey report generation system that reduced production time from 4 weeks to 1 week, evaluating grants using statistical techniques to prove the ineffectiveness of a contractor's methods and increase funding for an attendance project, and decreasing ad-hoc reporting time for K-12 data.
Executive Director of Institutional Research June 2013 to October 2016 Colby Stoever
The document summarizes the experience of an Executive Director of Institutional Research at the University of Texas Health Science Center in San Antonio from 2013 to 2016. In this role, they managed reporting and data analysis projects, increased state funding by over $3 million by correcting reporting errors, removed the institution from a federal funding watch list through accurate reporting, and improved overall reporting efficiency. Some key accomplishments included creating the institution's first data warehouse, academic dashboard, faculty reporting system, and student satisfaction survey.
3. ABSTRACT
• WORKSHOP 5: REGRESSION MODELS AND THEIR USES IN INSTITUTIONAL RESEARCH
THE WORKSHOP WILL NOT BE A STATISTICAL LECTURE ON THE MATHEMATICS UNDERLYING
REGRESSION TECHNIQUES, AND IT WILL NOT BE ABOUT PROGRAMMING REGRESSION MODELS
INTO MY FAVORITE STATISTICAL PROGRAMMING LANGUAGE. (ALTHOUGH AT DINNER, I WOULD
LOVE TO DISCUSS THESE TOPICS). RATHER, THIS WORKSHOP WILL INTRODUCE PARTICIPANTS TO
THE WORLD OF REGRESSION BY DISCUSSING WHAT TYPES OF REGRESSION MODELS EXIST, WHAT
TYPES OF RESEARCH QUESTIONS CAN BE PARTIALLY ANSWERED WITH THESE REGRESSION
MODELS, WHERE A RESEARCHER CAN REALLY MESS UP, AND BASIC INTERPRETATION OF RESULTS
FOR END USERS.
THE GOAL OF THIS WORKSHOP IS TO EDUCATE PARTICIPANTS ON WHAT REGRESSION MODELS THEY MAY WANT TO INVESTIGATE FURTHER TO BETTER THEIR IR OFFICES.
4. WHAT WE WILL NOT DO
• MATH IS OUT.
• (SOMETIMES) FORMULAS, BUT YOU WILL NOT HAVE TO REMEMBER OR UNDERSTAND THEM.
• EACH OF THE TOPICS DISCUSSED IN THIS WORKSHOP COULD FILL A COURSE OR
AT LEAST TWO WEEKS OF LECTURES.
5. WHAT WE WILL DO
• LEARN ABOUT TYPES OF REGRESSION.
• BECOME AWARE OF WHAT CAN BE DONE. ******
• LEARN HOW TO INTERPRET INFORMATION FROM REGRESSION MODELS. (VERY
BASIC)
• LEARN WHAT RESEARCH QUESTIONS CAN AND CANNOT BE ANSWERED BY EACH
REGRESSION TYPE.
• LEARN ABOUT COMMON PROBLEMS.
• LEARN IR USES.
6. CONTENTS
• QUICK REVIEW OF INFORMATION
• LINEAR REGRESSION
• LOGISTIC REGRESSION
• MULTINOMIAL REGRESSION
• ORDINAL REGRESSION
• POISSON REGRESSION
• HIERARCHICAL LINEAR MODELING
• EVENT-HISTORY ANALYSIS
• REGRESSION-DISCONTINUITY DESIGN
• TIME SERIES REGRESSION (FORECASTING)
7. SCALES OF MEASUREMENT
• NOMINAL SCALE
• ORDINAL SCALE
• INTERVAL SCALE
• RATIO SCALE
• NEVER FORCE A METRIC INTO ANOTHER SCALE (NO MEDIAN SPLITS).
10. CORRELATION VERSUS CAUSATION
• MOST IR RESEARCH DOES NOT AND OFTEN CANNOT EXPLAIN A CAUSAL
RELATIONSHIP.
• STATISTICS (NO MATTER HOW WELL DONE) DO NOT BY THEMSELVES PROVE CAUSAL RELATIONSHIPS.
• ONLY RESEARCH METHODOLOGY CAN HELP PROVE CAUSATION.
• TRUE EXPERIMENTS ARE OFTEN NOT POSSIBLE IN IR BECAUSE RANDOM ASSIGNMENT IS NOT ALWAYS ETHICAL.
• CHOOSE YOUR LANGUAGE WISELY.
11. MEDIATION VERSUS MODERATION
• MEDIATION (MEDIATOR VARIABLES)- A MEDIATOR VARIABLE EXPLAINS THE RELATIONSHIP BETWEEN A PREDICTOR AND AN OUTCOME VARIABLE; THE PREDICTOR AFFECTS THE OUTCOME THROUGH THE MEDIATOR.
• MODERATION (MODERATOR VARIABLES)-INFLUENCES THE DIRECTION (SIGN) OR
STRENGTH OF A RELATIONSHIP BETWEEN A PREDICTOR AND OUTCOME
VARIABLE.
• SPECIAL TECHNIQUES ARE NEEDED TO DETERMINE MODERATION.
• BARON, R. M., & KENNY, D. A. (1986). THE MODERATOR-MEDIATOR VARIABLE DISTINCTION IN SOCIAL
PSYCHOLOGICAL RESEARCH: CONCEPTUAL, STRATEGIC, AND STATISTICAL CONSIDERATIONS. JOURNAL OF
PERSONALITY AND SOCIAL PSYCHOLOGY, 51, 1173-1182.
12. POWER
• TYPE I ERROR- INCORRECT REJECTION OF A TRUE NULL HYPOTHESIS
• TYPE II ERROR- THE FAILURE TO REJECT A FALSE NULL HYPOTHESIS
• POWER OF A STATISTICAL TEST IS THE PROBABILITY THAT THE TEST WILL REJECT
THE NULL HYPOTHESIS WHEN THE ALTERNATIVE HYPOTHESIS IS TRUE (I.E. THE
PROBABILITY OF NOT COMMITTING A TYPE II ERROR).
• IR HAS LARGE POTENTIAL TO COMMIT BOTH
14. PRACTICAL VERSUS STATISTICAL
SIGNIFICANCE
• P<0.05 OR P<.001 OR P<.00001 WHICH IS BETTER???
• P-VALUES ARE SAMPLE SIZE (N) DEPENDENT.
• THE LARGER THE N, THE GREATER THE CHANCE OF FINDING STATISTICAL SIGNIFICANCE.
• P-VALUES ARE NOT DIRECTLY RELATED TO THE SIZE OF A RELATIONSHIP OR EFFECT.
• USE EFFECT SIZE STATISTICS LIKE R2, COHEN’S D, AND OMEGA-SQUARED.
• IR RESEARCHERS NEED TO BE CAREFUL ABOUT OVERPOWERED MODELS.
16. ANOVA, ANCOVA, MANOVA VS REGRESSION
• ANOVA, ANCOVA, & MANOVA ARE SPECIAL FORMS OF REGRESSION.
• NO STATISTIC PROVES CAUSATION.
• ANOVA, ANCOVA, MANOVA DO NOT PROVE CAUSATION.
• REGRESSION IS NOT JUST CORRELATION. IT CAN HELP PROVE CAUSATION.
18. WHAT DOES LINEAR REGRESSION TELL US?
• LINEAR REGRESSION HELPS US UNDERSTAND HOW ONE VARIABLE RELATES TO ONE
OR MANY VARIABLES.
• FOR EXAMPLE, HOW DO STUDENTS’ SAT MATH SCORES RELATE TO SES, GENDER, AND HIGH SCHOOL GPA? (SEE THE SKETCH AFTER THIS LIST.)
• LINEAR REGRESSION CAN HELP PREDICT ONE VARIABLE GIVEN ONE OR MULTIPLE
VARIABLES.
• LINEAR REGRESSION CAN TELL US HOW MUCH SEVERAL VARIABLES RELATE TO ONE VARIABLE.
• LINEAR REGRESSION CAN TELL US HOW, HOW MUCH, AND WHETHER VARIABLES RELATE TO A VARIABLE.
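• A MINIMAL SKETCH OF THE SAT EXAMPLE ABOVE USING STATSMODELS (ALL DATA AND VARIABLE NAMES BELOW ARE HYPOTHETICAL, NOT FROM THE WORKSHOP):

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical student records
df = pd.DataFrame({
    "sat_math": [520, 480, 600, 650, 540, 700, 580, 610],
    "ses":      [1, 1, 2, 3, 2, 3, 2, 3],            # made-up SES coding
    "gender":   ["M", "F", "F", "M", "F", "M", "M", "F"],
    "hs_gpa":   [2.8, 2.5, 3.4, 3.8, 3.0, 3.9, 3.2, 3.5],
})

# How do SAT math scores relate to SES, gender, and high school GPA?
model = smf.ols("sat_math ~ ses + C(gender) + hs_gpa", data=df).fit()
print(model.summary())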
19. LINEAR REGRESSION
• OUTCOME VARIABLES (DEPENDENT VARIABLES)
MUST BE INTERVAL OR RATIO SCALE.
• PREDICTOR VARIABLES (INDEPENDENT VARIABLES)
CAN BE ALL TYPES OF VARIABLES.
20. DUMMY CODING AND REFERENCES
• PREDICTOR VARIABLES (INDEPENDENT VARIABLES) CAN BE ALL TYPES OF
VARIABLES.
• BUT…..
• NOMINAL VARIABLES LIKE GENDER ARE EASY BECAUSE THEY ARE DICHOTOMOUS (TWO CHOICES).
• ETHNICITY IS NOT SO EASY: IF YOU HAVE 5 ETHNICITIES, YOUR REGRESSION MODEL WILL NEED 4 DUMMY VARIABLES + 1 REFERENCE GROUP (SEE THE SKETCH BELOW).
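• AS AN ILLUSTRATION OF THE DUMMY-CODING IDEA ABOVE, A MINIMAL PANDAS SKETCH (THE CATEGORY LABELS ARE HYPOTHETICAL):

import pandas as pd

# Hypothetical nominal variable with 5 categories
df = pd.DataFrame({"ethnicity": ["A", "B", "C", "D", "E", "A", "C", "B"]})

# drop_first=True keeps 4 dummy variables; the dropped category ("A") is the reference group
dummies = pd.get_dummies(df["ethnicity"], prefix="eth", drop_first=True)
print(dummies)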
23. UNSTANDARDIZED EQUATION
• THE UNSTANDARDIZED EQUATION: Y = a + b1*X1 + b2*X2
• ALL NUMBERS ARE IN TERMS OF THE OUTCOME VARIABLE.
• YOU CAN PLUG ACTUAL VALUES OF THE PREDICTOR VARIABLES INTO THE EQUATION AND GET A PREDICTED VALUE OF THE OUTCOME VARIABLE.
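• A TINY WORKED ILLUSTRATION OF PLUGGING PREDICTOR VALUES INTO AN UNSTANDARDIZED EQUATION (THE COEFFICIENTS BELOW ARE MADE UP FOR ILLUSTRATION ONLY):

# Hypothetical unstandardized equation: Y = a + b1*X1 + b2*X2
a, b1, b2 = 400.0, 50.0, 10.0   # made-up intercept and slopes
x1, x2 = 3.0, 2.0               # actual values of the predictor variables

y_hat = a + b1 * x1 + b2 * x2   # predicted value of the outcome variable
print(y_hat)                    # 400 + 150 + 20 = 570.0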
29. EXAMPLE
• OVERALL MODEL USING THE ELA EXIT LEVEL TAKS AS THE VARIABLE OF INTEREST WAS SIGNIFICANT
(F(3,232687)=4551.81, P<.0001, ADJR2=.0554).
• MODEL USING ELA TAKS SCALE SCORES FOUND THAT MALES (𝛽 = .119, P<.0001, SR2=0.013)
• AND STUDENTS WHO PARTICIPATED IN FREE/REDUCED LUNCH PROGRAMS (𝛽 = −.205, P<.0001,
SR2=0.041) WERE PREDICTED TO HAVE STATISTICALLY LOWER SCORES ON THE ELA TAKS.
• GENDER ONLY EXPLAINED 1.3% OF THE VARIANCE IN THE ELA TAKS AND SHOULD BE VIEWED AS A USELESS VARIABLE IN PREDICTING ELA TAKS SCORES.
• STATISTICALLY, AVID H.S. GRADUATES WERE PREDICTED TO HAVE HIGHER ELA TAKS SCORES THAN
NON-AVID H.S. GRADUATES (𝛽 = .026, P<.0001, SR2=0.0006); HOWEVER SINCE ONLY 0.06% OF THE
VARIANCE IN THE TAKS TEST WAS EXPLAINED BY AVID PARTICIPATION, READERS SHOULD VIEW AVID
AND NON-AVID H.S. GRADUATES AS PERFORMING THE SAME ON THE ELA EXIT LEVEL TAKS.
30. LINEAR REGRESSION AND IR
• IT IS GREAT IF YOU HAVE AN OUTCOME VARIABLE THAT IS ON AN INTERVAL OR RATIO SCALE.
• MOST INDIVIDUALS CAN UNDERSTAND THE INFORMATION PRODUCED.
• HOWEVER, MANY IR OUTCOME VARIABLES ARE NOT INTERVAL OR RATIO SCALE.
31. WARNING: LINEAR REGRESSION
• SOMETIMES RELATIONSHIPS ARE NOT LINEAR.
• PREDICTOR VARIABLES THAT ARE HIGHLY RELATED TO EACH OTHER CAN BE A
PROBLEM.
• SMALLER IS BETTER. --- PARSIMONIOUS MODELS ARE BETTER.
• OUTCOME VARIABLES THAT ARE BINARY OR CATEGORICAL ARE A PROBLEM.
• OVERPOWERED MODELS.
32. LOGISTIC REGRESSION
• WE USE LOGISTIC REGRESSION TO PREDICT DICHOTOMOUS OUTCOMES.
• LOGIT(P) = B0 + B1X
• UNLIKE LINEAR REGRESSION, THERE IS NO STANDARDIZED MODEL.
• UNFORTUNATELY, THE UNSTANDARDIZED MODEL IS NOT AS EASY TO
UNDERSTAND AS LINEAR REGRESSION.
• ALSO, THE MAGIC OF R2 DOES NOT EXIST FOR LOGISTIC REGRESSION
• IN FACT, HYPOTHESIS TESTING IS DONE USING CHI-SQUARE TESTS.
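• A MINIMAL SKETCH OF FITTING A LOGISTIC REGRESSION IN PYTHON WITH STATSMODELS (THE RETENTION OUTCOME AND GPA PREDICTOR ARE HYPOTHETICAL, NOT FROM THE WORKSHOP):

import numpy as np
import statsmodels.api as sm

# Hypothetical data: 1 = retained, 0 = not retained, predicted from high school GPA
gpa = np.array([2.1, 2.5, 2.8, 3.0, 3.2, 3.5, 3.7, 3.9])
retained = np.array([0, 0, 0, 1, 0, 1, 1, 1])

X = sm.add_constant(gpa)                      # add the intercept term
model = sm.Logit(retained, X).fit()           # overall test is a likelihood-ratio chi-square test
print(model.summary())
print("Odds ratios:", np.exp(model.params))   # exponentiated coefficients are odds ratios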
36. PROBABILITIES
• PROBABILITY- IS THE LIKELIHOOD OF AN EVENT (OR THING) OCCURRING
• PROBABILITY = THE NUMBER OF TIMES THE EVENT CAN OCCUR / THE NUMBER OF POSSIBLE OUTCOMES
• HEADS ON A COIN FLIP: 1/2 = .5
• SIX ON A DIE: 1/6 = .167
• PROBABILITIES EXIST FROM 0 TO 1
• 0 = NO CHANCE OF THE EVENT OCCURRING (E.G., ROLLING A 7 ON A SINGLE DIE)
• 0.5 = EQUAL CHANCES OF THE EVENT OCCURRING OR NOT OCCURRING
• 1 = NO CHANCE OF THE EVENT NOT OCCURRING (E.G., THE ONLY TICKET IN A DRAWING)
37. LOGITS
• LOGITS ARE THE NATURAL EXPRESSION OF LOGISTIC REGRESSION.
• IT TURNS A NON-CONTINUOUS OUTCOME INTO A CONTINUOUS ONE.
• IT IS COMPUTED THROUGH A FORMULA, FOR EXAMPLE:
• LOGIT(PROMOTION) = .39(PUBS) - 6.00
• 4 PUBS: LOGIT = .39(4) - 6.00 = -4.44
• LOGITS EXIST FROM -∞ TO +∞.
• IT IS NOT EASY TO UNDERSTAND OR EXPLAIN
38. ODDS
• ODDS ARE THE EXP(LOGIT)
• ODDS EXIST FROM 0 TO ∞.
• FROM 0 TO 1: 0 MEANS NO CHANCE OF THE EVENT OCCURRING; 1 MEANS A 1-TO-1 CHANCE (A COIN FLIP).
• FROM 1 TO ∞: NOW WE ARE TALKING ABOUT THE INCREASED LIKELIHOOD OF AN EVENT OCCURRING.
• ABOVE 1 IS EASY TO INTERPRET
• BELOW 1 IS NOT EASY TO INTERPRET
39. LOGITS, ODDS, AND PROBABILITIES
• LOGITS ARE NOT LOGICAL TO INDIVIDUALS OUTSIDE OF STATISTICS.
• ODDS ARE EASY TO INTERPRET IF ABOVE ONE.
• PROBABILITIES CAN BE HARD FOR PEOPLE TO UNDERSTAND AS WELL.
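• USING THE PROMOTION EXAMPLE FROM THE LOGITS SLIDE, A SHORT SKETCH CONVERTING A LOGIT TO ODDS AND TO A PROBABILITY:

import math

pubs = 4
logit = 0.39 * pubs - 6.00     # logit(promotion) = .39(pubs) - 6.00 = -4.44
odds = math.exp(logit)         # odds = exp(logit), about 0.012
prob = odds / (1 + odds)       # probability = odds / (1 + odds), about 0.012
print(logit, odds, prob)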
40. COOL STUFF
• DIFFERENCE IN CHI-SQUARE BETWEEN MODELS
• COX AND SNELL INDEX
• NAGELKERKE INDEX
• NON-CENTRALITY PARAMETER
41. LOGISTIC REGRESSION AND IR
• MANY ISSUES IR WANTS TO STUDY ARE DICHOTOMOUS.
• PRETTY EASY FOR MOST INDIVIDUALS TO UNDERSTAND WHEN ODDS RATIOS
ARE USED.
42. MULTINOMIAL LOGISTIC REGRESSION
• WHAT IF YOUR OUTCOME VARIABLE IS NOMINAL BUT NOT DICHOTOMOUS?
• LETTER GRADES IN A COURSE.
• HIGH, MEDIUM, LOW
• MULTINOMIAL LOGISTIC REGRESSION CAN TELL YOU HOW LIKELY INDIVIDUALS ARE TO FALL INTO EACH GROUP OF YOUR OUTCOME VARIABLE.
• I JUST WANT YOU TO BE AWARE OF THE EXISTENCE OF MULTINOMIAL
REGRESSION.
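• A MINIMAL MULTINOMIAL LOGISTIC REGRESSION SKETCH WITH STATSMODELS (THE THREE-GROUP OUTCOME AND STUDY-HOURS PREDICTOR ARE HYPOTHETICAL):

import numpy as np
import statsmodels.api as sm

# Hypothetical outcome with three groups (0 = low, 1 = medium, 2 = high), predicted from study hours
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
group = np.array([0, 0, 1, 0, 1, 1, 2, 1, 2, 2])

X = sm.add_constant(hours)
model = sm.MNLogit(group, X).fit()
print(model.summary())
print(model.predict(X))   # predicted probability of membership in each group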
43. ORDINAL LOGISTIC REGRESSION
• WHAT IF YOUR OUTCOME VARIABLE IS ORDINAL (RANK)?
• CLASS RANK, PLACES IN A RACE OR TOURNAMENT
• I HAVE NEVER DONE THIS BEFORE
44. POISSON REGRESSION
• COUNT DATA.
• PREDICTS NUMBER OF EVENTS THAT OCCUR IN A SPECIFIC TIME PERIOD FROM
ONE OR MORE INDEPENDENT VARIABLES.
• WORKS EVEN WHEN EVENTS ARE RARE OR MANY PEOPLE HAVE ZERO EVENTS
• GREAT FOR ATTENDANCE DATA, WHEN DEATH IS AN EVENT
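• A MINIMAL POISSON REGRESSION SKETCH WITH STATSMODELS (THE ABSENCE COUNTS AND PREDICTOR ARE HYPOTHETICAL):

import numpy as np
import statsmodels.api as sm

# Hypothetical count data: number of absences in a semester, predicted from credit hours taken
credit_hours = np.array([6, 9, 12, 12, 15, 15, 18, 18])
absences = np.array([0, 1, 0, 2, 3, 1, 4, 5])

X = sm.add_constant(credit_hours)
model = sm.GLM(absences, X, family=sm.families.Poisson()).fit()
print(model.summary())
print("Rate ratios:", np.exp(model.params))   # exponentiated coefficients are incidence-rate ratios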
47. HIERARCHICAL LINEAR MODELS
• HIERARCHICAL LINEAR MODELS- REGRESSION ANALYSIS MODELS THAT
CONTAIN PREDICTORS MEASURED AT MORE THAN ONE LEVEL OF AGGREGATION
OF DATA (COHEN, ET AL. , 2003)
• ALSO CALLED MULTILEVEL MODELING.
• LINEAR AND LOGISTIC REGRESSION MODELS ASSUME INDEPENDENCE OF OBSERVATIONS.
• ARE ANY OBSERVATIONS EVER TRULY INDEPENDENT?
49. HIERARCHICAL LINEAR MODELS
• STUDENTS → CLASSES → COLLEGE →UNIVERSITY
• CLUSTERING OF OBSERVATIONS IS COMMON.
• WHAT ARE SOME CLUSTERS THAT ARE COMMON?
50. HIERARCHICAL LINEAR MODELS
• INTRACLASS CORRELATION
• ICC IS THE PROPORTION OF VARIANCE IN THE OUTCOME ATTRIBUTABLE TO THE
GROUPING.
• RANGE FROM 0 TO 1;
• 0 IS SMALL AND 1 IS LARGE.
• SOME RESEARCHERS SAY EVEN A SMALL ICC (0.05) MEANS DEPENDENCE OF OBSERVATIONS.
• THE FORMULA IS LARGE AND NOT NEEDED FOR THIS WORKSHOP.
51. REMEMBER – HLM IS ALSO CALLED
MULTILEVEL MODELING.
BASICALLY, HLM ASSUMES EACH
CLUSTER IN YOUR DATA HAS A
DIFFERENT REGRESSION LINE.
HLM TAKES ALL OF THESE
REGRESSION LINES AND AVERAGES
THEM TOGETHER.
52. HIERARCHICAL LINEAR MODELS
• FOR EXAMPLE – I WANT TO PREDICT MATH ACHIEVEMENT GIVEN MATH ANXIETY
• HLM—MORE THAN ONE LEVEL, MORE THAN ONE EQUATION
• LEVEL ONE IS LIKE NORMAL LINEAR REGRESSION
• IT REPRESENTS THE LOWEST LEVEL (THE TRUE UNIT OF ANALYSIS)—THE INDIVIDUAL
STUDENT –PARTICIPANT
• A ONE-LEVEL REGRESSION IS A SINGLE EQUATION CONTAINING COEFFICIENTS.
• LEVEL TWO IS STUDENTS NESTED WITHIN CLASSES.
• LEVEL TWO CONSISTS OF REGRESSION EQUATIONS FOR THE LEVEL-ONE REGRESSION COEFFICIENTS.
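• A MINIMAL TWO-LEVEL SKETCH OF THE MATH ACHIEVEMENT EXAMPLE USING A MIXED (MULTILEVEL) MODEL IN STATSMODELS; ALL DATA AND COLUMN NAMES ARE HYPOTHETICAL:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical student-level records nested within classes
df = pd.DataFrame({
    "math_achievement": [72, 65, 80, 77, 60, 68, 90, 85, 74, 79, 66, 70],
    "math_anxiety":     [3, 4, 2, 2, 5, 4, 1, 1, 3, 2, 4, 3],
    "class_id":         [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
})

# Level 1: student-level regression of achievement on anxiety
# Level 2: intercepts are allowed to vary across classes (random intercept)
model = smf.mixedlm("math_achievement ~ math_anxiety", df, groups=df["class_id"]).fit()
print(model.summary())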
53. HLM AND IR
• HLM IS DIFFICULT TO LEARN, EVEN FOR A SEASONED RESEARCHER.
• NESTING WITHIN AN ORGANIZATION CAN BE IMPORTANT.
• HOWEVER, THINK ABOUT EXPLAINING THE IDEA OF AN “AVERAGE OF REGRESSION COEFFICIENTS” TO END USERS.
55. EVENT HISTORY ANALYSIS
• LINEAR, LOGISTIC, AND HLM MODELS USE SNAPSHOT DATA.
• THEY ONLY LOOK AT ONE POINT IN TIME.
• WHAT IF YOU NEED TO KNOW WHEN AN EVENT OCCURS OR WHEN IT IS LIKELY
TO OCCUR?
• EVENT HISTORY ANALYSIS IS LOGISTIC REGRESSION WITH TIME AS A VARIABLE.
56. EVENT HISTORY ANALYSIS
• EVENTS ARE TRANSITIONS IN STATUS (CHANGES FROM ONE STATE TO
ANOTHER)
• METHODOLOGICAL FEATURES OF DATA AND EVENT OCCURRENCE DESIGNS
• TARGET EVENT
• OCCURRENCE OF A PARTICULAR (WELL DEFINED) EVENT IS THE FOCUS OF STUDY
• BEGINNING OF TIME/ BEFORE THE EVENT
• THE POINT THE STUDY STARTS OR WHEN NO ONE HAS YET EXPERIENCED THE TARGET
EVENT
• END OF TIME
• AN EVENT TIME FOR EACH SUBJECT
57. EVENT HISTORY ANALYSIS
• DISCRETE-TIME ANALYSIS-
• CONTINUOUS-TIME ANALYSIS-
• WE WILL TALK MAINLY ABOUT DISCRETE-TIME ANALYSIS BECAUSE IR WOULD MAINLY USE IT.
58. EVENT HISTORY ANALYSIS
• CENSORING
• LEFT CENSORING – EVENT OCCURRED BEFORE THE STUDY STARTED
• RIGHT CENSORING – EVENT OCCURRED AFTER THE STUDY ENDED
• INTERVAL CENSORING – THE INDIVIDUAL IS REMOVED FROM RISK AT THE TIME OF A DIFFERENT EVENT
• LIFE TABLES
• EXAMPLE DEVELOPMENTAL EDUCATION – THE MOVE FROM DEVELOPMENTAL
EDUCATION TO COLLEGE READINESS
• 100 DE STUDENTS START COLLEGE
59. EVENT HISTORY ANALYSIS
• DISCRETE-TIME HAZARD 𝑝 – THE ESTIMATED PROBABILITY OF DROPOUT IN AN INTERVAL
• 20 DROPPED OUT AND 80 REMAIN AFTER ONE SEMESTER: 20/100 = .20
• 20 DROPPED OUT AND 60 REMAIN AFTER TWO SEMESTERS: 20/80 = .25
• SURVIVOR FUNCTION – THE PROPORTION OF SURVIVORS IN THE PREVIOUS PERIOD TIMES THE PROPORTION WHO SURVIVE THE CURRENT PERIOD (.80 × .75 = .60) (SEE THE SKETCH BELOW)
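• A SHORT SKETCH COMPUTING THE DISCRETE-TIME HAZARD AND SURVIVOR FUNCTION FROM THE COUNTS IN THE EXAMPLE ABOVE:

# Students still at risk at the start of each semester and the number who drop out during it
at_risk   = [100, 80]
drop_outs = [20, 20]

survival = 1.0
for semester, (n, d) in enumerate(zip(at_risk, drop_outs), start=1):
    hazard = d / n              # estimated probability of dropout in this interval
    survival *= (1 - hazard)    # survivor function: product of (1 - hazard) across intervals
    print(f"Semester {semester}: hazard = {hazard:.2f}, survivor = {survival:.2f}")
# Semester 1: hazard = 0.20, survivor = 0.80
# Semester 2: hazard = 0.25, survivor = 0.60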
60. EVENT HISTORY ANALYSIS
• ONE CAN ALSO ADD OTHER VARIABLES TO SEE HOW THEY INFLUENCE SURVIVAL.
• FOR EXAMPLE, YOU CAN KNOW WHETHER A MALE IS MORE LIKELY TO DROP OUT IN THE SECOND SEMESTER, ABOVE AND BEYOND THE BASELINE SURVIVAL RATE.
61. EVENT HISTORY ANALYSIS AND IR
• EASY TO LEARN
• CAN ANSWER IMPORTANT QUESTIONS TO LEADERSHIP
• REGRESSION WITH TIME – NOT JUST ONE POINT IN TIME – WHEN IS THE CRITICAL
TIME POINT.