This document discusses design matrices and contrast statements in SAS regression procedures. It defines a design matrix as a matrix of explanatory variable values used in regression analyses. It describes three common coding schemes for design matrices in SAS - GLM, effect, and reference (REF) coding - and how they parameterize categorical variables. It provides examples of how these different coding schemes impact odds ratio interpretation and the formulation of contrast statements in logistic regression analyses.
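These coding schemes are easy to illustrate outside SAS as well. Below is a minimal Python sketch using pandas; the `dose` variable and its levels are hypothetical, and the point is only to contrast reference coding with GLM-style full dummy coding:

```python
import pandas as pd

# Hypothetical categorical predictor with three levels
df = pd.DataFrame({"dose": ["low", "medium", "high"]})

# Reference (REF) coding: one level is dropped; its rows are all zeros,
# so each remaining dummy is interpreted relative to that reference level.
ref = pd.get_dummies(df["dose"], drop_first=True, dtype=int)

# GLM-style coding: one indicator column per level; the columns sum to 1,
# which is why the GLM parameterization is less than full rank.
glm = pd.get_dummies(df["dose"], dtype=int)

print(ref)
print(glm)
```

In a logistic regression, the choice between these parameterizations changes how each coefficient (and hence each odds ratio) is interpreted, which is exactly why contrast statements must match the coding scheme in use.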
This document outlines the objectives and key aspects of qualitative research design. It discusses qualitative statement, types of qualitative designs including phenomenology, grounded theory, case study, narrative synthesis, ethnography, historical research and action research. It also covers sampling design, observational design, operational design, data analysis design and the differences between qualitative and quantitative designs. Key points covered include that qualitative research aims to understand and describe central experiences or processes for participants through methods like interviews and observations. Different qualitative designs have different focuses such as experiences for phenomenology or groups for ethnography. The document provides examples of studies for each design type.
The document provides an overview of univariate statistical analysis and inferential statistics, including key concepts like population and sample distributions, measures of central tendency and dispersion, the normal distribution, sampling distributions, confidence intervals, and how these statistical techniques are used to make inferences about populations based on samples. It also discusses important steps in the data analysis process like data preparation, selecting appropriate analysis strategies and techniques based on the research objectives and data types.
Science is not just a body of
knowledge, but knowledge assembled through the application of the scientific method.
The scientific method has led to the
discovery of some of the most important concepts in science today, such as evolution,
gravitation, and relativity, among many others. It helps to catch fraud and bring the
truth to light. It remains the standard against which all scientific discoveries are measured and
verified; it has stood the test of time, is used in every field of science, and has applications
in many other industries. The primary goal of scientific research is to describe and explain
reality. Research begins with defining and describing what is already known about a subject,
which requires reviewing the literature and synthesizing the findings generated by past
studies. In the scientific method, logic helps in stating propositions clearly and
precisely so that their possible alternatives become clear. Logic then develops the
consequences of these alternatives, and when the consequences are compared with observable phenomena, it
becomes possible for the researcher or scientist to state which alternative is most in harmony
with the observed facts. All of this is done through experimentation and survey investigation,
which are integral parts of the scientific method. Scientific research follows these steps:
1. Observe an event: The first step in the scientific method involves the observation
of a phenomenon, event, or "problem." The discovery of such a phenomenon may occur
due to an interest on the observer's part, a suggestion or assignment, or it may be an
annoyance that one wishes to resolve.
2. Develop a hypothesis: Observation leads to a question, born of human curiosity, about
why or how the event happened or what it is like. Developing this question may involve
taking measurements to quantify the observation and describe it better. Scientific questions
need to be answerable and lead to the formation of a hypothesis about the problem.
3. Test the prediction: A scientific hypothesis has to be testable, and its predictions
must be checked against evidence. If it falls short, another hypothesis may be tested,
usually one that takes into account the failure of the previous one. A prediction is a
statement about the way things will happen in the future, often but not always based on
experience or knowledge.
4. Design an experiment: An experiment is designed to support or refute the hypothesis.
5. Observe the result: All evidence and conclusions must be analyzed properly.
6. Revise the hypothesis.
7. Repeat as needed.
8. A successful hypothesis…
Chapter 11: Goodness-of-Fit and Contingency Tables
11.2: Contingency Tables
This document provides an introduction to correlation and regression analysis. It defines correlation as a measure of the association between two variables and regression as using one variable to predict another. The key aspects covered are:
- Calculating correlation using Pearson's correlation coefficient r to measure the strength and direction of association between variables.
- Performing simple linear regression to find the "line of best fit" to predict a dependent variable from an independent variable.
- Using a TI-83 calculator to graphically display scatter plots of data and calculate the regression equation and correlation coefficient.
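The same quantities the summary attributes to the TI-83 can be computed directly; here is a short Python sketch with hypothetical paired data:

```python
import numpy as np

# Hypothetical paired observations (x = hours studied, y = exam score)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 60.0, 63.0, 71.0, 74.0])

# Pearson's r measures the strength and direction of the linear association
r = np.corrcoef(x, y)[0, 1]

# Slope and intercept of the least-squares "line of best fit"
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

print(f"r = {r:.3f}; y-hat = {b0:.1f} + {b1:.1f}x")
```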
Research methodology provides guidance on conducting research systematically and scientifically. It explains both qualitative and quantitative research methods and the steps of the research process, including defining the problem, reviewing literature, developing hypotheses, collecting and analyzing data, and reporting findings. Key aspects of research methodology include formulating testable hypotheses, designing studies (e.g. experiments) to test hypotheses, and using statistical analysis to accept or reject null hypotheses.
This document summarizes quantitative data analysis techniques for summarizing data from samples and generalizing to populations. It discusses variables, simple and effect statistics, statistical models, and precision of estimates. Key points covered include describing data distribution through plots and statistics, common effect statistics for different variable types and models, ensuring model fit, and interpreting precision, significance, and probability to generalize from samples.
Research involves defining problems, formulating hypotheses, collecting and evaluating data, reaching conclusions, and testing those conclusions. It is a systematic process that requires accurate data collection and adherence to ethical standards. Research aims to generate new knowledge and insights through logical reasoning using both inductive and deductive methods. The purpose of research can be descriptive, explanatory, or exploratory. There are different types of research methodologies including basic vs applied, descriptive vs exploratory, correlational vs explanatory, qualitative vs quantitative, and conceptual vs empirical research.
Regression and correlation analysis allow researchers to assess relationships between variables. Regression fits a line to two variables that minimizes the sum of squared errors, representing how well the independent variable predicts the dependent variable. Correlation assesses the strength and direction of association, ranging from -1 to 1. R-squared indicates the proportion of variance in the dependent variable explained by the independent variable.
This document provides an overview of single linear regression. It explains that single linear regression extends the concept of correlation by using one variable to predict the value of another variable. It discusses using scatter plots to visualize the relationship between two variables and determine if the relationship is strong or weak, and whether it is positive or negative. Examples are provided to illustrate single linear regression concepts and how to interpret different types of relationships between variables.
This document defines research and discusses the key differences between basic (academic) research and applied (contract) research. It provides Creswell's definition of research as a process using steps to collect and analyze information to increase understanding of a topic. It also outlines 10 aspects of educational research by Gray Anderson, including using research to solve problems, gather new data, develop generalizations, and carefully record and report findings. The document then explains that basic research, also called pure or fundamental research, focuses on discovering truth or developing theories without practical goals in mind. Applied research deals with solving real-world problems and testing theories. Key differences between basic and applied research are discussed, such as ownership of results and focus on layperson versus specialized
The document discusses qualitative coding and memo writing. It provides an overview of coding approaches like descriptive, in vivo, and pattern coding. Codes are short phrases that symbolically represent portions of data. Memos are written reflections on codes, their relationships, and emerging ideas. The document emphasizes that coding and memo writing are iterative, cyclical processes to develop categories and analyze their connections for qualitative research.
This document provides an example of simple linear regression with one independent variable. It explains that linear regression finds the line of best fit by estimating values for the slope (b1) and y-intercept (b0) that minimize the sum of the squared errors between the observed data points and the regression line. It provides the formulas for calculating the least squares estimates of b1 and b0. The document includes a table of temperature and sales data and a corresponding scatter plot as an example of simple linear regression analysis.
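The least-squares formulas the summary refers to can be written out directly. The temperature and sales values below are hypothetical stand-ins for the document's table; the computation follows the standard sum formulas:

```python
# Least-squares estimates from the textbook formulas:
#   b1 = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²),   b0 = ȳ - b1*x̄
# Hypothetical temperature (°C) and sales data.
temps = [18, 22, 25, 28, 31]
sales = [150, 190, 230, 260, 300]

n = len(temps)
sx, sy = sum(temps), sum(sales)
sxy = sum(x * y for x, y in zip(temps, sales))
sxx = sum(x * x for x in temps)

b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
b0 = sy / n - b1 * sx / n                       # y-intercept

print(f"sales ≈ {b0:.1f} + {b1:.1f} * temp")
```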
Objectives
To provide an introduction to the statistical analysis of
failure time data
To discuss the impact of data censoring on data analysis
To demonstrate software tools for reliability data analysis
Organization
Reliability definition
Characteristics of reliability data
Statistical analysis of censored reliability data
Statistical Inference: Probability and Distribution (Eugene Yan Ziyou)
This deck was used in the IDA facilitation of the Johns Hopkins Data Science Specialization course for Statistical Inference. It covers the topics of week 1 (probability) and week 2 (distribution).
The document provides an introduction to statistics and probability. It discusses key concepts including population and sample, types of data and variables, measurement scales, errors in measurement, and statistical inference. The objectives are to develop a statistical thinking approach and teach basic statistical techniques. The course will cover descriptive statistics, probability, and inferential statistics over 15 lectures. Students will complete homework assignments and exams.
Categorical data analysis refers to methods for analyzing discrete or categorical response variables. Common distributions for categorical data include the Bernoulli, binomial, Poisson, and multinomial distributions. Chi-square tests can be used to test goodness of fit, independence, and homogeneity for categorical data. The chi-square test statistic compares observed and expected frequencies in one or more categories. A larger chi-square value provides more evidence to reject the null hypothesis of a good fit or independence between variables.
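The chi-square goodness-of-fit statistic described here is just the sum of squared observed-minus-expected differences scaled by the expected counts. A minimal Python sketch with hypothetical die-roll counts:

```python
# Chi-square goodness-of-fit statistic: Σ (observed - expected)² / expected.
# Hypothetical counts from 60 die rolls, tested against a fair-die null.
observed = [12, 8, 11, 9, 10, 10]
n = sum(observed)
expected = [n / 6] * 6          # 10 per face under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1          # k - 1 degrees of freedom

print(f"chi-square = {chi2:.2f} on {df} df")
```

A larger value of the statistic relative to the chi-square critical value at the chosen significance level gives more evidence against the null hypothesis of a good fit.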
Here are the key steps in systematic random sampling:
1. Number the sampling frame from 1 to N
2. Calculate the sampling interval K by dividing the total population by the desired sample size
3. Select a random number between 1 and K to determine the first sample unit
4. Then select every Kth unit after that to complete the sample
For example, if sampling 100 units from a population of 400, the sampling interval K would be 4. A random start between 1 and 4 would be selected, then every 4th unit after that. This ensures a systematically selected random sample.
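The steps above can be sketched in a few lines of Python, using the same hypothetical numbers (N = 400, sample size 100, so K = 4):

```python
import random

# Systematic random sampling: number the frame 1..N, compute interval K,
# pick a random start in 1..K, then take every Kth unit.
N, sample_size = 400, 100
frame = list(range(1, N + 1))
k = N // sample_size               # sampling interval K = 4

start = random.randint(1, k)       # random start between 1 and K
sample = frame[start - 1::k]       # every Kth unit thereafter

print(f"start = {start}, first units = {sample[:5]}")
```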
The Wilcoxon signed-rank test is a non-parametric statistical test used to compare two related samples or repeated measurements on a single sample to assess if their population mean ranks differ. It can be used as an alternative to the paired t-test when the population cannot be assumed to be normally distributed. The test involves ranking the differences between paired observations, ignoring the signs of the differences, and comparing the sum of the ranks of the positive or negative differences to critical values to determine if there are statistically significant differences between the samples. A limitation is that observations with a difference of zero are discarded, which can be a concern if samples come from a discrete distribution.
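The ranking procedure the summary describes can be sketched in pure Python. The before/after scores are hypothetical; zero differences are dropped and tied absolute differences receive average ranks, as the text notes:

```python
# Minimal sketch of the Wilcoxon signed-rank statistic (hypothetical data).
before = [72, 85, 68, 90, 77, 81, 64, 88]
after = [75, 83, 74, 91, 80, 86, 70, 87]

diffs = [a - b for a, b in zip(after, before) if a != b]  # discard zeros
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))

# Assign average ranks to tied absolute differences
ranks = [0.0] * len(diffs)
i = 0
while i < len(order):
    j = i
    while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
        j += 1
    avg = (i + j) / 2 + 1                     # 1-based average rank
    for t in range(i, j + 1):
        ranks[order[t]] = avg
    i = j + 1

# Sum the ranks of the positive and negative differences separately
w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
W = min(w_plus, w_minus)                      # test statistic

print(f"W+ = {w_plus}, W- = {w_minus}, W = {W}")
```

The statistic W is then compared against critical values (or a normal approximation for larger samples) to decide significance.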
3.2 Definition and Concepts
3.2.1 Hypothesis Testing
3.2.2 The Core Logic of Hypothesis Testing
3.2.3 The Hypothesis-Testing Process
3.2.4 Implications of Rejecting or Failing to Reject the Null Hypothesis
3.2.5 One-Tailed and Two-Tailed Hypothesis Tests
3.2.6 Decision Errors
3.3 Type I Error
3.4 Type II Error
3.5 Relationship between Type I and Type II Errors
This document provides an overview of using SPSS (Statistical Package for the Social Sciences) software. It introduces the main interfaces for working with data in SPSS, including the data view, variable view, output view, draft view, and syntax view. It also provides instructions for installing sample data files and demonstrates how to generate a basic cross-tabulation output of employment by gender using the automated features.
Quantitative Methods of Research: Intro to Research
Once a researcher has written the research question, the next step is to determine the appropriate research methodology necessary to study the question. The three main types of research design methods are qualitative, quantitative and mixed methods.
Quantitative research involves the systematic collection and analysis of data.
The document discusses simple linear regression. It defines key terms like regression equation, regression line, slope, intercept, residuals, and residual plot. It provides examples of using sample data to generate a regression equation and evaluating that regression model. Specifically, it shows generating a regression equation from bivariate data, checking assumptions visually through scatter plots and residual plots, and interpreting the slope as the marginal change in the response variable from a one unit change in the explanatory variable.
Chi-Square Test of Homogeneity by Pops P. Macalino (TSU-MAEd)
The chi-square test of homogeneity determines if two or more populations have the same distribution for a categorical variable. It compares the observed frequencies in each category to the expected frequencies if the null hypothesis of homogeneity is true. The test statistic is distributed as chi-square with degrees of freedom equal to (number of rows - 1) * (number of columns - 1). If the computed chi-square value exceeds the critical value, the null hypothesis of homogeneity is rejected.
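The expected-count and degrees-of-freedom calculations described here can be written out directly. A Python sketch on a hypothetical 2x3 contingency table (two populations, three response categories):

```python
# Chi-square test of homogeneity on a hypothetical 2x3 table.
# Expected counts come from row and column totals;
# df = (number of rows - 1) * (number of columns - 1).
table = [[30, 50, 20],
         [45, 35, 20]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(table[0]) - 1)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

If the computed statistic exceeds the chi-square critical value at df degrees of freedom, the null hypothesis of homogeneity is rejected.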
This document discusses logistic regression for categorical response variables. It provides examples of binary and ordinal categorical response variables like whether someone smokes (yes/no) or the success of a medical treatment (survives/dies). It then demonstrates how to perform binary logistic regression in R to predict a binary outcome like gender from height. Key aspects covered include interpreting the logistic regression coefficients, plotting the logistic curve, and calculating odds ratios to compare two groups.
The document compares linear regression fit by gradient descent and by the normal equations on two datasets. For the FRIED dataset, gradient descent without regularization gave the best results. Adding higher-degree polynomial terms and variable products increased model complexity but led to overfitting. For the ABALONE dataset, gradient descent with lambda = 0.03 performed best. The normal-equations approach was faster on the smaller ABALONE dataset but slower on the larger FRIED dataset because of its cubic runtime complexity. Increasing model complexity gave better fits to the training data but risked overfitting.
This document provides an overview of statistical concepts and analysis techniques in R, including measures of central tendency, data variability, correlation, regression, and time series analysis. Key points covered include mean, median, mode, variance, standard deviation, z-scores, quartiles, standard deviation vs variance, correlation, ANOVA, and importing/working with different data structures in R like vectors, lists, matrices, and data frames.
This document discusses algorithm-independent machine learning techniques. It introduces concepts like bias and variance, which can quantify how well a learning algorithm matches a problem without depending on a specific algorithm. Methods like cross-validation, bootstrapping, and resampling can be used with different algorithms. While no algorithm is inherently superior, such techniques provide guidance on algorithm use and help integrate multiple classifiers.
The document discusses algorithm-independent machine learning and some fundamental problems in machine learning. It introduces concepts like bias and variance, the no free lunch theorem, and minimum description length principle. Key ideas are that no learning algorithm is inherently superior, algorithms can be evaluated based on how well they match the learning problem, and assumptions are needed to determine similarity between patterns or features.
This document discusses algorithm-independent machine learning techniques. It introduces concepts like bias and variance which can be used to quantify how well a learning algorithm matches a problem, regardless of the specific algorithm used. It discusses techniques like cross-validation, resampling, and combining multiple classifiers that can improve performance in a way that is independent of the learning algorithm. The document also covers principles like minimum description length and no free lunch which provide theoretical foundations for algorithm-independent machine learning.
I am Hannah Lucy. Currently associated with statisticshomeworkhelper.com as statistics homework helper. After completing my master's from Kean University, USA, I was in search of an opportunity that expands my area of knowledge hence I decided to help students with their homework. I have written several statistics homework till date to help students overcome numerous difficulties they face.
This document summarizes a paper that adapts naive Bayes for label ranking problems. It applies this to algorithm recommendation and ranking financial analysts. It modifies naive Bayes to calculate prior and conditional probabilities based on similarity rather than classification. This outperforms baselines in experiments recommending algorithms and ranking financial analysts based on past performance data. Future work includes handling missing data and adapting it for continuous variables.
The document provides information about various bioinformatics lessons that will take place on Thursdays, including topics like biological databases, sequence alignments, database searching using FASTA and BLAST, phylogenetics, and protein structure. It also includes details about database searching methods like dynamic programming, FASTA, BLAST, and parameters that can be adjusted for BLAST searches.
Logistic Regression in Case-Control StudySatish Gupta
This document provides an introduction to using logistic regression in R to analyze case-control studies. It explains how to download and install R, perform basic operations and calculations, handle data, load libraries, and conduct both conditional and unconditional logistic regression. Conditional logistic regression is recommended for matched case-control studies as it provides unbiased results. The document demonstrates how to perform logistic regression on a lung cancer dataset to analyze the association between disease status and genetic and environmental factors.
Big Data analysis involves building predictive models from high-dimensional data using techniques like variable selection, cross-validation, and regularization to avoid overfitting. The document discusses an example analyzing web browsing data to predict online spending, highlighting challenges with large numbers of variables. It also covers summarizing high-dimensional data through dimension reduction and model building for prediction versus causal inference.
I am Hannah Lucy. Currently associated with excelhomeworkhelp.com as excel homework helper. After completing my master's from Kean University, USA, I was in search of an opportunity that expands my area of knowledge hence I decided to help students with their homework. I have written several excel homework till date to help students overcome numerous difficulties they face.
Calibrating Probability with Undersampling for Unbalanced ClassificationAndrea Dal Pozzolo
This study examines how undersampling affects posterior probability estimates in unbalanced classification tasks. It shows that undersampling warps the posterior probabilities away from the true probabilities. However, the study presents a method to correct the warped probabilities using a simple formula, which provides calibrated probabilities without loss of predictive performance. Experiments on real-world datasets demonstrate that the corrected probabilities have better calibration than uncorrected probabilities while maintaining ranking quality.
This document provides information on performing a one-way analysis of variance (ANOVA). It discusses the F-distribution, key terms used in ANOVA like factors and treatments, and how to calculate and interpret an ANOVA test statistic. An example demonstrates how to conduct a one-way ANOVA to determine if three golf clubs produce different average driving distances.
The document discusses the steps for conducting a response surface methodology (RSM) experiment using central composite design (CCD). It involves determining independent and dependent variables, selecting an appropriate CCD, conducting the experiment runs according to the design, analyzing the data using statistical methods to develop a mathematical model and check its adequacy, and using the model to optimize responses. Key aspects of RSM and CCD covered include developing the design, analyzing results through ANOVA and regression, and checking model validity.
Digital electronics k map comparators and their functionkumarankit06875
This document provides an overview of a digital electronics presentation covering K-maps, comparators, and their applications. The agenda includes an introduction to K-maps and how they are used to simplify Boolean expressions. It also covers comparators, their operation and function. Examples are given of using K-maps to minimize logic expressions and identify prime implicants. The applications of K-maps in digital circuit design optimization are discussed. Comparators and their use in examples is briefly outlined.
This document provides teaching suggestions for regression models:
1) It suggests emphasizing the difference between independent and dependent variables in a regression model using examples.
2) It notes that correlation does not necessarily imply causation and gives an example of variables that are correlated but changing one does not affect the other.
3) It recommends having students manually draw regression lines through data points to appreciate the least squares criterion.
4) It advises selecting random data values to generate a regression line in Excel to demonstrate determining the coefficient of determination and F-test.
5) It suggests discussing the full and shortcut regression formulas to provide a better understanding of the concepts.
This document provides teaching suggestions for regression models:
1) It suggests emphasizing the difference between independent and dependent variables in a regression model using examples.
2) It notes that correlation does not necessarily imply causation and gives an example of variables that are correlated but changing one does not affect the other.
3) It recommends having students manually draw regression lines through data points to appreciate the least squares criterion.
4) It advises selecting random data values to generate a regression line in Excel to demonstrate determining the coefficient of determination and F-test.
5) It suggests discussing the full and shortcut regression formulas to provide a better understanding of the concepts.
Asslam o alaikum dear students, my name is Nadeem Altaf. I am from Pakistan. I am a student & there isan topic about Graeco Latin Square Design and Other designs
1. Design Matrix and
Contrast Statement in SAS Regression
Procedures
Gang Cui, MPH
Sr. Biostatistician, CSCC, UNC
2. Outline
• Motivations
• Design Matrix definition
• Three commonly used design matrix coding schemes in
logistic regression
• Contrast statement under effect and GLM coding scheme
• Summary of design matrix choices in all regression procedures
3. Motivation
• Deep dive into certain elusive topics
• Mutual learning and knowledge sharing
• Lightweight but practical
• Engaging and enjoyable
4. Design Matrix
• A design matrix is a matrix of “values” of the explanatory variables for a set of observations. The values in the matrix are not the raw values in the dataset; this recoding is also called parameterization.
– Example: comparing means of y among three groups, with y the continuous response, x an explanatory variable with 3 groups (1, 2, 3), 𝜇𝑖 the mean of y in each group, and 𝜏𝑖 the difference between each group and the reference group.
𝑦𝑖𝑗 = 𝜇𝑖 + 𝜖𝑖𝑗 or, equivalently, 𝑦𝑖𝑗 = 𝜇1 + 𝜏𝑖 + 𝜖𝑖𝑗
In matrix form, with observations ordered 𝑦11, 𝑦12, 𝑦13, 𝑦21, 𝑦22, 𝑦31, 𝑦32:
Cell-means parameterization, y = X(𝜇1, 𝜇2, 𝜇3)' + 𝜖, with rows of X:
(1 0 0), (1 0 0), (1 0 0), (0 1 0), (0 1 0), (0 0 1), (0 0 1)
Reference parameterization, y = X(𝜇1, 𝜏2, 𝜏3)' + 𝜖, with rows of X:
(1 0 0), (1 0 0), (1 0 0), (1 1 0), (1 1 0), (1 0 1), (1 0 1)
Written out observation by observation, the cell-means model is:
𝑦11 = 𝜇1 + 𝜖11
𝑦12 = 𝜇1 + 𝜖12
𝑦13 = 𝜇1 + 𝜖13
𝑦21 = 𝜇2 + 𝜖21
𝑦22 = 𝜇2 + 𝜖22
𝑦31 = 𝜇3 + 𝜖31
𝑦32 = 𝜇3 + 𝜖32
and the reference model is:
𝑦11 = 𝜇1 + 𝜖11
𝑦12 = 𝜇1 + 𝜖12
𝑦13 = 𝜇1 + 𝜖13
𝑦21 = 𝜇1 + 𝜏2 + 𝜖21
𝑦22 = 𝜇1 + 𝜏2 + 𝜖22
𝑦31 = 𝜇1 + 𝜏3 + 𝜖31
𝑦32 = 𝜇1 + 𝜏3 + 𝜖32
5. Design Matrix
In a dataset, you may have explanatory variables Sex = “M”/“F” and Age_Group = 1/2/3. The SAS
class and model statements will be:
class Sex Age_Group/<options>;
model outcome_var=Sex Age_Group/<options>;
In the design matrix, SAS creates “dummy variables” for Sex and Age_Group, with values 1
and 0 only (GLM and REF coding). The underlying mathematical model is something like:
Outcome_var = 𝜇 + 𝛽1Sex1 + 𝛽2Sex2 + 𝛽3Age_Group1 + 𝛽4Age_Group2 + 𝛽5Age_Group3,
where 𝜇 is the mean of the reference group and each 𝛽 is the added effect corresponding to its group.
In GLM coding, the 𝛽 for the reference group is set to 0.
Data Design Matrix
Sex Age_Group
Sex Age_Group 𝜇 (intercept) Sex1 Sex2 Age_Group1 Age_Group2 Age_Group3
M 1 1 1 0 1 0 0
M 2 1 1 0 0 1 0
M 3 1 1 0 0 0 1
F 1 1 0 1 1 0 0
F 2 1 0 1 0 1 0
F 3 1 0 1 0 0 1
Dummy
variables
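SAS builds these dummy variables internally from the CLASS statement. As an illustration only, a minimal Python sketch of the same GLM-style expansion using pandas (the column names such as Sex_M follow pandas' convention, not SAS output):

```python
import pandas as pd

# The six data rows from the slide's "Data" columns.
data = pd.DataFrame({
    "Sex": ["M", "M", "M", "F", "F", "F"],
    "Age_Group": [1, 2, 3, 1, 2, 3],
})

# GLM-style (dummy) coding: one 0/1 indicator column per level,
# including the reference level, plus an intercept column of 1s.
dummies = pd.get_dummies(data, columns=["Sex", "Age_Group"], dtype=int)
dummies.insert(0, "intercept", 1)
print(dummies)
```

Note that pandas orders the dummy columns alphabetically (Sex_F before Sex_M), whereas SAS orders levels by the CLASS statement's ORDER= option.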
6. Design Matrix Choices in SAS
Three most common design matrix coding schemes:
• GLM (indicator or dummy coding) - reference value coded as “1” in its own column.
• EFFECT - reference value coded as “-1”. Beta estimates measure the difference
between the effect of each non-reference level and the average effect over all levels.
• REF - reference cell coding; reference value coded as “0”.
Specified via the CLASS statement global option PARAM=<GLM/EFFECT/REF>:
class <classvar1> <classvar2>/param=glm|effect|ref <or by default>;
*Note:
1. The design matrix option has a profound impact on CONTRAST/LSMEANS statements and
result interpretation.
2. Different procedures have different default design matrix options.
3. Not all procedures allow all three options.
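The three schemes can be written down directly. A small NumPy sketch (illustration, not SAS output) of the design variables for a 3-level class variable whose levels are ordered A, B, P, with P as the "LAST" (reference) level:

```python
import numpy as np

# Rows correspond to levels A, B, P; P is the reference ("LAST") level.

# GLM (dummy) coding: c columns, one 0/1 indicator per level.
glm = np.eye(3, dtype=int)

# REF coding: c-1 columns; the reference row is all zeros.
ref = np.array([[1, 0],
                [0, 1],
                [0, 0]])

# EFFECT coding: c-1 columns; the reference row is all -1s.
effect = np.array([[ 1,  0],
                   [ 0,  1],
                   [-1, -1]])
```

These three matrices match the "Class Level Information" tables that PROC LOGISTIC prints for PARAM=GLM, REF, and EFFECT in the slides that follow.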
7. PARAM=GLM in PROC Logistic
• Suppose having a dataset with binary outcome Pain(Yes/No), and
explanatory variable treatment (A, B, P) and sex (M, F).
proc logistic data=data1;
class Treatment Sex/param=GLM;
model Pain= Treatment Sex/ expb;
run;
Class Level Information
Class Value Design Variables
Treatment A 1 0 0
B 0 1 0
P 0 0 1
Sex F 1 0
M 0 1
*“Dummy coding”: c columns in the design matrix, where c is the number of levels of
the class variable. The reference value is coded as 1 in its own column.
The reference is always the “LAST” value with GLM; you have no choice unless you
apply a format that makes the intended reference sort “LAST”.
8. PARAM=GLM in PROC Logistic
Odds Ratio Estimates
Effect Point Estimate 95% Wald
Confidence Limits
Treatment A vs P 8.960 1.949 41.201
Treatment B vs P 11.822 2.453 56.972
Sex F vs M 4.354 1.206 15.719
With param=GLM, you can simply take exp(beta)
to get the OR, because each 𝛽 coefficient
estimates the effect relative to the reference
level.
OR of A vs P = exp(2.1928) = 8.960
OR of B vs P = exp(2.4700) = 11.822
OR of F vs M = exp(1.4711) = 4.354
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard
Error
Wald
Chi-Square
Pr > ChiSq Exp(Est)
Intercept 1 -1.9705 0.7030 7.8570 0.0051 0.139
Treatment A 1 2.1928 0.7784 7.9353 0.0048 8.960
Treatment B 1 2.4700 0.8023 9.4770 0.0021 11.822
Treatment P 0 0 . . . .
Sex F 1 1.4711 0.6550 5.0440 0.0247 4.354
Sex M 0 0 . . . .
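A quick numeric check of the exp(beta) = OR relationship, using the estimates from the output above:

```python
import math

# Beta estimates from the PARAM=GLM fit shown in the slide output.
betas = {"A vs P": 2.1928, "B vs P": 2.4700, "F vs M": 1.4711}

# With GLM coding, each OR is simply exp(beta).
odds_ratios = {label: math.exp(b) for label, b in betas.items()}
```

The results agree with the Odds Ratio Estimates table (8.960, 11.822, 4.354) up to rounding.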
9. PARAM=EFFECT in PROC Logistic
proc logistic data=data1;
class Treatment Sex/param=effect;
model Pain= Treatment Sex/ expb;
run;
Class Level Information
Class Value Design Variables
Treatment A 1 0
B 0 1
P -1 -1
Sex F 1
M -1
*Default is param=effect in PROC LOGISTIC.
*The default reference is the “LAST” value, but you
can change the reference.
*There are c-1 columns in the design matrix, where c is the
number of levels of the class variable. The reference value
is coded as -1 in every column.
10. PARAM=EFFECT in PROC Logistic
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard
Error
Wald
Chi-Square
Pr > ChiSq Exp(Est)
Intercept 1 0.3193 0.3089 1.0685 0.3013 1.376
Treatment A 1 0.6385 0.4323 2.1815 0.1397 1.894
Treatment B 1 0.9157 0.4467 4.2032 0.0403 2.499
Sex F 1 0.7355 0.3275 5.0440 0.0247 2.087
Odds Ratio Estimates
Effect Point Estimate 95% Wald
Confidence Limits
Treatment A vs P 8.960 1.949 41.201
Treatment B vs P 11.822 2.453 56.972
Sex F vs M 4.354 1.206 15.719
*With effect coding, one CANNOT simply calculate
OR = exp(beta), because each 𝛽 coefficient
estimates the effect relative to the average
over all levels.
*As a result, the p-value of a beta estimate
and the 95% CI of the corresponding OR may
NOT be consistent.
OR of A vs P = exp(2×0.6385 + 0.9157) = 8.960
OR of B vs P = exp(2×0.9157 + 0.6385) = 11.822
OR of F vs M = exp(2×0.7355) = 4.354
11. PARAM=EFFECT in PROC Logistic
According to the design matrix, treatment=P is coded as (-1, -1) and sex=M as (-1):
Design variables for treatment: A = (1, 0), B = (0, 1), P = (-1, -1)
Design variables for sex: F = (1), M = (-1)
Therefore the logit function of each group, the difference
between non-reference and reference, the odds, and the OR are:
Class Logit function Odds difference OR
Trt A L(A)=𝛽0+𝛽A Exp(𝛽0+𝛽A) A vs P: L(A)-L(P)=2𝛽A+𝛽B Exp(2𝛽A+𝛽B)
B L(B)=𝛽0+𝛽B Exp(𝛽0+𝛽B) B vs P: L(B)-L(P)=2𝛽B+𝛽A Exp(2𝛽B+𝛽A)
P L(P)=𝛽0-𝛽A-𝛽B Exp(𝛽0-𝛽A-𝛽B)
Sex F L(F)=𝛽0+𝛽F Exp(𝛽0+𝛽F) F vs M: L(F)-L(M)=2𝛽F Exp(2𝛽F)
M L(M)=𝛽0-𝛽F Exp(𝛽0-𝛽F)
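The table's linear combinations can be checked numerically against the effect-coded estimates from the earlier output (a Python check, not part of the SAS run):

```python
import math

# Beta estimates from the PARAM=EFFECT fit shown in the slide output.
b0, bA, bB, bF = 0.3193, 0.6385, 0.9157, 0.7355

# Under effect coding, L(P) = b0 - bA - bB and L(M) = b0 - bF,
# so reference comparisons need linear combinations, not exp(beta):
or_A_vs_P = math.exp(2 * bA + bB)   # L(A) - L(P) = 2*bA + bB
or_B_vs_P = math.exp(2 * bB + bA)   # L(B) - L(P) = 2*bB + bA
or_F_vs_M = math.exp(2 * bF)        # L(F) - L(M) = 2*bF
```

The computed values reproduce the Odds Ratio Estimates table (8.960, 11.822, 4.354) up to rounding, confirming the algebra in the table above.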
12. PARAM=REF in PROC Logistic
proc logistic data=data1;
class Treatment Sex/param=REF;
model Pain= Treatment Sex/ expb;
run;
*Reference value coded as 0.
Class Level Information
Class Value Design Variables
Treatment A 1 0
B 0 1
P 0 0
Sex F 1
M 0
13. PARAM=REF in PROC Logistic
Odds Ratio Estimates
Effect Point Estimate 95% Wald
Confidence Limits
Treatment A vs P 8.960 1.949 41.201
Treatment B vs P 11.822 2.453 56.972
Sex F vs M 4.354 1.206 15.719
With param=REF, the SAS output simply omits
the reference row; otherwise the estimates
are the same as with GLM coding, and the OR
is calculated in the same way.
OR of A vs P = exp(2.1928) = 8.960
OR of B vs P = exp(2.4700) = 11.822
OR of F vs M = exp(1.4711) = 4.354
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard
Error
Wald
Chi-Square
Pr > ChiSq Exp(Est)
Intercept 1 -1.9705 0.7030 7.8570 0.0051 0.139
Treatment A 1 2.1928 0.7784 7.9353 0.0048 8.960
Treatment B 1 2.4700 0.8023 9.4770 0.0021 11.822
Sex F 1 1.4711 0.6550 5.0440 0.0247 4.354
14. Mixed Coding System in PROC Logistic
One can specify a different param option for each class variable, though this is unusual.
An individual parameterization overrides the global option, unless the global option is GLM.
proc logistic data=data1;
class Treatment(param=effect) Sex(param=ref)/param=ref;
model Pain= Treatment Sex / expb;
run;
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard Error Wald Chi-Square Pr > ChiSq Exp(Est)
Intercept 1 -0.4163 0.4252 0.9586 0.3276 0.659
Treatment A 1 0.6385 0.4323 2.1815 0.1397 1.894
Treatment B 1 0.9157 0.4467 4.2032 0.0403 2.499
Sex F 1 1.4711 0.6550 5.0440 0.0247 4.354
Odds Ratio Estimates
Effect Point Estimate 95% Wald
Confidence Limits
Treatment A vs P 8.960 1.949 41.201
Treatment B vs P 11.822 2.453 56.972
Sex F vs M 4.354 1.206 15.719
OR of A vs P = exp(2×0.6385 + 0.9157) = 8.960
OR of B vs P = exp(2×0.9157 + 0.6385) = 11.822
OR of F vs M = exp(1.4711) = 4.354
15. Contrast Statement
What if you want to compare treatment A vs B, or the average of A and B vs P, without changing
the reference group and without rerunning the procedure?
The answer is the CONTRAST statement. However, different coding schemes have a profound
impact on how the CONTRAST statement is written.
General syntax:
CONTRAST 'label' <var-name> <coefficient_1 ... coefficient_n>/options;
16. Contrast Statement with Effect Coding in Logistic
Comparing A vs B: We know L(A)=𝛽0+𝛽A, and L(B)=𝛽0+𝛽B, therefore
L(A)-L(B)= 𝛽A - 𝛽B, and we can write contrast statement as:
Contrast “A vs B” treatment 1 -1;
Class Logit function Odds difference OR
Trt A L(A)=𝛽0+𝛽A Exp(𝛽0+𝛽A) A vs P: L(A)-L(P)=2𝛽A+𝛽B Exp(2𝛽A+𝛽B)
B L(B)=𝛽0+𝛽B Exp(𝛽0+𝛽B) B vs P: L(B)-L(P)=2𝛽B+𝛽A Exp(2𝛽B+𝛽A)
P L(P)=𝛽0-𝛽A-𝛽B Exp(𝛽0-𝛽A-𝛽B)
Sex F L(F)=𝛽0+𝛽F Exp(𝛽0+𝛽F) F vs M: L(F)-L(M)=2𝛽F Exp(2𝛽F)
M L(M)=𝛽0-𝛽F Exp(𝛽0-𝛽F)
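The contrast's value can be verified by hand from the effect-coded estimates (a Python check of the arithmetic, using the estimates shown in the slide output):

```python
import math

# Effect-coded treatment estimates from the slide output.
bA, bB = 0.6385, 0.9157

# CONTRAST "A vs B" treatment 1 -1  tests  L(A) - L(B) = bA - bB.
estimate = bA - bB
odds_ratio = math.exp(estimate)
```

These match the PARM row (-0.2772) and EXP row (0.7579) of the contrast output on the next slide.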
17. Contrast Statement with Effect Coding in Logistic
Contrast Estimation and Testing Results by Row
Contrast Type Row Estimate Standard
Error
Alpha Confidence Limits Wald
Chi-Square
Pr > ChiSq
A vs B PARM 1 -0.2772 0.7463 0.05 -1.7399 1.1855 0.1380 0.7103
A vs B EXP 1 0.7579 0.5656 0.05 0.1755 3.2725 0.1380 0.7103
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard
Error
Wald
Chi-Square
Pr > ChiSq Exp(Est)
Intercept 1 0.3193 0.3089 1.0685 0.3013 1.376
Treatment A 1 0.6385 0.4323 2.1815 0.1397 1.894
Treatment B 1 0.9157 0.4467 4.2032 0.0403 2.499
Sex F 1 0.7355 0.3275 5.0440 0.0247 2.087
proc logistic data=data1;
class Treatment Sex/param=effect;
model Pain= Treatment Sex / expb;
Contrast "A vs B" treatment 1 -1/ e estimate=both;
run;
Coefficients of Contrast A
vs B
Parameter Row1
TreatmentA 1
TreatmentB -1
18. Contrast Statement with Effect Coding in Logistic
Now compare average of (A and B) vs P:
½{L(A)+L(B)} - L(P) = 1/2(𝛽0+𝛽A+𝛽0+𝛽B) - (𝛽0-𝛽A-𝛽B) = 1.5𝛽A+1.5 𝛽B
proc logistic data=data1;
class Treatment Sex/param=effect;
model Pain= Treatment Sex / expb;
Contrast "average of (A and B) vs P" treatment 1.5 1.5/ e estimate=both;
run;
Coefficients of Contrast average of (A and B) vs P
Parameter Row1
Intercept 0
TreatmentA 1.5
TreatmentB 1.5
Contrast Estimation and Testing Results by Row
Contrast Type Row Estimate Standard
Error
Alpha Confidence Limits Wald
Chi-Square
Pr > ChiSq
average of (A and B) vs P PARM 1 2.3314 0.6969 0.05 0.9656 3.6972 11.1931 0.0008
average of (A and B) vs P EXP 1 10.2923 7.1722 0.05 2.6263 40.3341 11.1931 0.0008
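Again the contrast estimate can be reproduced by hand from the effect-coded betas (a Python check of the arithmetic shown above):

```python
import math

# Effect-coded treatment estimates from the earlier slide output.
bA, bB = 0.6385, 0.9157

# (1/2){L(A)+L(B)} - L(P) = 1.5*bA + 1.5*bB under effect coding.
estimate = 1.5 * bA + 1.5 * bB
odds_ratio = math.exp(estimate)
```

The values agree with the PARM row (2.3314) and EXP row (10.2923) of the contrast output, up to rounding of the betas.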
19. Contrast statement Coding in PROC GLM
• Unlike PROC LOGISTIC, GLM coding is the only coding scheme in PROC GLM.
• PROC GLM always uses the “LAST” value as the reference group: A=2 and B=3 in the following case.
Example: Y is a continuous response following a normal distribution with constant
variance. The model has two factors, A with 2 levels and B with 3 levels.
Main effect model: Yij = μ + αi + βj + εij, where i is the level of factor A and j is the level
of factor B.
The design matrix of main effect model:
Data Design Matrix
A B
A B 𝜇 (intercept) A1 A2 B1 B2 B3
1 1 1 1 0 1 0 0
1 2 1 1 0 0 1 0
1 3 1 1 0 0 0 1
2 1 1 0 1 1 0 0
2 2 1 0 1 0 1 0
2 3 1 0 1 0 0 1
20. Contrast Statement in PROC GLM
Test hypothesis 1: H0: μB1 = μB2
From the design matrix, we know μB1 = μ + αi + β1 and μB2 = μ + αi + β2, so
μB1 - μB2 = β1 - β2
Contrast statement: Contrast "B=1 vs B=2" B 1 -1 / e;
proc glm data=data3;
class a b;
model y=a b/solution;
contrast "B=1 vs B=2" B 1 -1/e;
estimate "B=1 vs B=2" B 1 -1/e; *Usually CONTRAST and ESTIMATE go hand-in-hand; ESTIMATE gives
the estimate of the difference and its SE;
run;
Contrast DF Contrast SS Mean Square F Value Pr > F
B=1 vs B=2 1 4.98877622 4.98877622 4.85 0.0328
Parameter Estimate Standard Error t Value Pr > |t|
B=1 vs B=2 -0.84927549 0.38544563 -2.20 0.0328
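The two output tables are consistent with each other: for a single-degree-of-freedom contrast, the F value equals the square of the t value, and t is the estimate divided by its standard error. A quick Python check using the numbers above:

```python
# Estimate and SE from the ESTIMATE output row for "B=1 vs B=2".
estimate, se = -0.84927549, 0.38544563

# For a 1-df contrast: t = estimate/SE, and the CONTRAST F value is t squared.
t = estimate / se
F = t ** 2
```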
21. Contrast Statement Coding in PROC GLM
Model with crossed effects: Yijk = μ + αi + βj + (αβ)ij + εijk, where i is the level of factor A,
j is the level of factor B, and k indexes observations within each A*B cell.
The design matrix of crossed effect model:
Data Design Matrix
A B μ A1 A2 B1 B2 B3 A1B1 A1B2 A1B3 A2B1 A2B2 A2B3
1 1 1 1 0 1 0 0 1 0 0 0 0 0
1 2 1 1 0 0 1 0 0 1 0 0 0 0
1 3 1 1 0 0 0 1 0 0 1 0 0 0
2 1 1 0 1 1 0 0 0 0 0 1 0 0
2 2 1 0 1 0 1 0 0 0 0 0 1 0
2 3 1 0 1 0 0 1 0 0 0 0 0 1
22. Contrast Statement in PROC GLM
Test hypothesis 2: H0: μAB11 = μAB12
From the design matrix, we know: μAB11 = μ + α1 + β1 + αβ11 and μAB12 = μ + α1 + β2 + αβ12, so
μAB11 - μAB12 = β1 - β2 + αβ11 - αβ12
The contrast statement now needs two sets of coefficients:
proc glm data=data3;
class a b;
model y=a b a*b/solution;
contrast "AB11 vs AB12" B 1 -1
A*B 1 -1; *This is equivalent to A*B 1 -1 0 0 0 0; trailing 0s can be omitted;
estimate "AB11 vs AB12" B 1 -1
A*B 1 -1;
run;
Contrast DF Contrast SS Mean Square F Value Pr > F
AB11 vs AB12 1 6.81411446 6.81411446 6.53 0.0143
Parameter Estimate Standard Error t Value Pr > |t|
AB11 vs AB12 -1.40976950 0.55182283 -2.55 0.0143
23. Steps to Construct Contrast Statement
Step 1. Write down the model, two crucial parts to this.
– Parameterization: how design variables in class statement
are coded.
– Parameter ordering: the order of parameters depends on
class statement and order option. Confirm from SAS
output.
Step 2. Write down the hypothesis to be tested.
Step 3. Write the CONTRAST.
24. Default and Alternative Coding in Regression
SAS Procedures
Default design matrix coding Regression procedures
GLM coding (indicator or dummy):
ref value coded as 1
GENMOD, GLM, GLMSELECT, GLIMMIX,
LIFEREG, MIXED, and SURVEYPHREG
EFFECT coding (deviation from
mean) : ref value coded as -1
CATMOD, LOGISTIC, and SURVEYLOGISTIC
REF coding: ref value coded as 0 PHREG and TRANSREG
Alternative coding allowed? Regression procedures
Allowed LOGISTIC, GENMOD, GLMSELECT, PHREG,
SURVEYLOGISTIC, and SURVEYPHREG.
Not allowed GLM, MIXED, GLIMMIX, and LIFEREG
25. Take Home Messages
• The design matrix has a profound impact on CONTRAST/LSMEANS statements and
result interpretation.
• First, know the design matrix you are using and the true linear combination
implied by the matrix.
• The order of variables in the CLASS statement matters.
• Effect coding estimates the effect relative to the average over all levels, not
relative to the reference.
• Follow the steps for constructing a CONTRAST statement:
– Step 1: Know the design matrix and the order of the parameters.
– Step 2: Write down the hypothesis tests.
– Step 3: Construct the CONTRAST and ESTIMATE statements.
• Different regression procedures have different default design matrix
schemes, and not all procedures allow alternative schemes.
Sometimes, when I read statistical and SAS references, I feel it is better to share my learning with others; it benefits both me and the center. When you study or read something, you think you understand it until you try to explain it to others clearly. That is what homework is for. Sometimes, explaining to others is the best way to really learn something, because it requires you to think deeper and more thoroughly.
In real work, everybody is doing unique work and using specific skills, so I also want to set an example with this mini-lunch presentation at the center, to encourage sharing learning experiences so we can learn from each other. Sometimes I wish I could just watch a video or presentation that puts together material related to a specific topic of interest.
For the mini-lunch presentation, the topic will be small and focused, directly related to our daily tasks, and hopefully engaging. By the way, I don't take credit for the material here; I just put together what I learned and try to present it clearly.
For this presentation, I will focus on three common parameterization choices in SAS regression procedures and how to write CONTRAST and LSMEANS statements to perform various hypothesis tests. In all examples here, all explanatory variables are categorical, with the response variable being either binary (as in the logistic model) or continuous (as in GLM).
Depending on the study interest and hypothesis, the design matrix can be different for the same model. Take this simple ANOVA example: y is a continuous response variable, with one explanatory variable having three groups.