Logistic regression is used to predict categorical outcomes. This document covers logistic regression, including its objectives, assumptions, and key terms, and applies it to predicting basketball match outcomes. Logistic regression uses maximum likelihood estimation to model the relationship between a binary dependent variable and a set of independent variables. The document walks through an illustrated example of running logistic regression in SPSS to predict match results from variables such as passes, rebounds, free throws, and blocks.
Logistic regression with SPSS
1. LOGISTIC REGRESSION
Presented by
Mr. Vijay Singh Rawat
Ms. Shweta
(Research Scholar)
Ph.D. Coursework 2017-18
Lakshmibai National Institute of Physical Education, Gwalior, India
(Deemed to be University)
2. INTRODUCTION
• Logistic regression is a predictive analysis technique.
• It is used when a researcher wants to predict the occurrence of an event.
3. Objective of Logistic Regression
• The objective of Logistic regression is to find the best fitting
model to describe the relationship between the dichotomous
characteristics of interest and a set of independent variables.
4. Continuous vs. Categorical variables
• Independent variables (x):
– Continuous: age, income, height – use numerical values.
– Categorical: gender, city, ethnicity – use dummies.
• Dependent variable (y):
– Continuous: consumption, time spent – use numerical values.
– Categorical: yes/no
5. Examples of Binary Outcomes
• Should a bank give a person a loan or not?
• What determines admittance into a school?
• Which consumers are more likely to buy a new product?
6. Uses of Logistic Regression
• Prediction of group membership
• It also provides knowledge of the relationships among the variables and their strength.
• Modelling the causal relationship between one or more independent variables and one binary dependent variable.
• Used to forecast the outcome of an event.
• Used to predict changes in probabilities.
7. Assumptions
• The relationship between the dependent and independent variables may be linear or non-linear.
• The outcome variable must be coded as 0 and 1.
• The independent variables do not need to be metric.
• The independent variables should be linearly related to the log odds.
• It requires a fairly large sample size.
8. Key terms in Logistic Regression
• Dependent variable
– It is binary in nature.
• Independent variable
– Select the different variables that you expect to influence
the dependent variable.
• Hosmer-Lemeshow test
– A commonly used measure of goodness of fit.
• Odds ratio
– It is the ratio of the probability of success to the probability
of failure.
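For example (a worked illustration of my own, not from the slides): if the probability of success is 0.8, the odds are 0.8 / (1 - 0.8) = 4, i.e. four successes for every failure.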
9. • Classification table
– In this table the observed values for the dependent outcome and
the predicted values are cross classified.
• Maximum likelihood
– Maximum likelihood is the method of finding the least possible deviation between the observed and predicted values, using calculus (specifically, derivatives).
• Logit
– The logit is a function equal to the log odds of a variable. If p is the probability that Y = 1 (the occurrence of an event), then p/(1 - p) is the corresponding odds. The logit of probability p is given by
Logit(p) = log( p/(1 - p) )
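As a quick illustration of this definition, here is a minimal Python sketch (my own addition, not part of the original slides) that computes the logit of a probability:

import math

def logit(p):
    # Log-odds of probability p: log(p / (1 - p))
    return math.log(p / (1 - p))

print(logit(0.5))    # 0.0 -- a probability of 0.5 means even odds
print(logit(0.985))  # about 4.18 -- high probabilities give large positive log-odds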
10. Predicting the Probability p
Z = b0 + b1x1 + b2x2 + ... + bnxn
• b0 is the intercept and b1, b2, ..., bn are the slope coefficients of the independent variables x1, ..., xn.
11. Predicting p with Log(Odds)
log( p̂/(1 - p̂) ) = b0 + b1x1 = z
p̂/(1 - p̂) = e^(b0 + b1x1) = e^z
p̂ = e^(b0 + b1x1) / (1 + e^(b0 + b1x1)) = e^z / (1 + e^z)
By knowing z, the probability p̂ can be estimated.
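The last line above is the logistic (inverse-logit) function. A minimal Python sketch of it (my own addition), which can be checked against the worked example on slide 33:

import math

def sigmoid(z):
    # Inverse logit: p = e^z / (1 + e^z), equivalently 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # 0.5
print(sigmoid(2.148))  # about 0.8955, as obtained on slide 33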
12. Advantage of using Logit Function
ln( p/(1 - p) ) = b0 + b1x1 + b2x2 + b3x3
[Figure 1 - Shape of the logistic function: an S-shaped curve in which p runs from 0 to 1 and passes through p = 0.5 at z = 0.]
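A short sketch to reproduce the curve in Figure 1 (assuming numpy and matplotlib are available; this plotting code is mine, not the authors'):

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-6, 6, 200)
p = 1.0 / (1.0 + np.exp(-z))   # logistic function

plt.plot(z, p)
plt.axhline(0.5, linestyle="--", linewidth=0.8)  # p = 0.5 at z = 0
plt.axvline(0.0, linestyle="--", linewidth=0.8)
plt.xlabel("z")
plt.ylabel("p")
plt.title("Shape of the logistic function")
plt.show()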
13. Application in Sports Research
• Predicting a successful free throw shot in basketball on the basis of independent variables such as the player's height, accuracy, arm strength and eye-hand coordination.
• Predicting a win in a football match on the basis of independent variables like number of passes, number of turnovers, penalty yardage and fouls committed.
• Finding the likelihood of a particular horse finishing first in a specific race.
14. Logistic Regression with SPSS - An Illustration
Objective: Predicting success in a basketball match
__________________________________________________________________
Match   Result   No. of   Offensive   Free     Blocks
                 passes   rebounds    throws
  1       1        0         1          1        1
  2       0        1         0          0        0
  3       1        0         1          1        0
  4       1        1         0          0        1
  5       0        1         1          1        0
  6       0        0         0          0        1
  7       1        1         0          1        0
  8       0        0         1          0        1
  9       1        1         0          1        1
 10       0        1         1          0        0
 11       1        0         0          1        0
 12       0        1         0          0        1
 13       1        1         1          1        0
 14       0        0         0          0        1
 15       1        1         1          1        0
 16       0        0         0          1        1
 17       0        1         1          0        0
 18       1        0         0          1        1
 19       0        1         1          0        0
 20       1        0         0          1        0
 21       0        1         1          0        1
 22       1        0         0          1        1
__________________________________________________________________
Dependent variable - Result of the basketball match: 1 = Win, 0 = Lose
Independent variables:
No. of passes: 1 = lower, 0 = higher
Offensive rebounds: 1 = lower, 0 = higher
Free throws: 1 = lower, 0 = higher
Blocks: 1 = lower, 0 = higher
A team whose average number of passes is less than its opponent's is coded as 1 and the other as 0. The other variables are coded similarly.
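For readers who prefer code to the SPSS menus that follow, here is a hedged sketch of fitting the same model in Python with statsmodels (the column names are my own; because SPSS's categorical recoding may differ, the estimates should be broadly comparable to Table 1.10 rather than identical):

import numpy as np
import pandas as pd
import statsmodels.api as sm

# The 22-match table above, with the 1/0 codings defined in the legend.
data = pd.DataFrame({
    "result":  [1,0,1,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,1,0,1],
    "passes":  [0,1,0,1,1,0,1,0,1,1,0,1,1,0,1,0,1,0,1,0,1,0],
    "rebound": [1,0,1,0,1,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0],
    "f_throw": [1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,1,0,1,0,1,0,1],
    "blocks":  [1,0,0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,1,0,0,1,1],
})

X = sm.add_constant(data[["passes", "rebound", "f_throw", "blocks"]])
fit = sm.Logit(data["result"], X).fit()

print(fit.summary())        # coefficients (B) and significance tests
print(np.exp(fit.params))   # odds ratios, comparable to Exp(B) in Table 1.10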
15. SPSS Commands for the logistic regression
Step-1 Preparation of Data file
Fig 1 – screen showing variable view for the logistic regression analysis in SPSS
16. Fig 2- screen showing data file for the logistic regression analysis in SPSS
17. Step-2 Initiating the command for logistic regression
Fig 3 - screen showing SPSS commands for logistic regression
Analyze → Regression → Binary Logistic
18. Step-3 Selecting variables for analysis
Fig 4 - screen showing selection of variables for logistic regression
Defining the variables: 1. Dependent box, 2. Covariate box, 3. Categorical covariates box
19. Step-4 Selecting options for computation
Fig 5 - screen showing options for generating the Hosmer-Lemeshow goodness of fit test and confidence intervals
Click CONTINUE, then OK
20. Step-5 Selecting the method for entering independent variables in logistic regression
A. Confirmatory study
B. Exploratory study
Step-6 Getting the output
• Click OK to get the output.
21. The logistic regression in SPSS is run in two steps
• First step (block 0)
– It includes no predictors and just the intercept.
• Second step (block 1)
– It includes the variables in the analysis and the coding of the independent and dependent variables.
22. INTERPRETATION OF FINDINGS
1. Case processing summary
2. Dependent variable encoding
3. Categorical variable coding
Block 0
1. Classification table (model without predictors)
2. Variables in the equation
3. Variables not in the equation
Block 1
1. Omnibus tests of model coefficients
2. Model summary
3. Hosmer-Lemeshow test
4. Classification table (model with predictors)
5. Variables in the equation (with predictors)
23. A. CASE PROCESSING AND CODING SUMMARY
TABLE 1.1 - Case Processing Summary
Unweighted Cases(a)                        N     Percent
Selected Cases    Included in Analysis    22     100.0
                  Missing Cases            0        .0
                  Total                   22     100.0
Unselected Cases                           0        .0
Total                                     22     100.0
a. If weight is in effect, see classification table for the total number of cases.
Table 1.1 shows the number of cases in each category
24. Table 1.2 - Dependent Variable Encoding
Original Value    Internal Value
Losing            0
Winning           1
Table 1.2 shows the coding of the dependent variable.
25. Table 1.3 - Categorical Variables Coding
                                 Frequency    Parameter coding (1)
number of blocks      lower      12           1.000
                      higher     10            .000
offensive rebound     lower      12           1.000
                      higher     10            .000
free throws           lower      10           1.000
                      higher     12            .000
number of pass        lower      10           1.000
                      higher     12            .000
Table 1.3 shows the coding of the categorical variables.
26. B. Analyzing the logistic model
1. Block 0: logistic model without predictors
Table 1.4 - Classification Table (model without predictors)
                                   Predicted
Observed                      losing    winning    Percentage Correct
Step 0    output    losing    0         11          .0
                    winning   0         11         100.0
          Overall Percentage                        50.0
a. Constant is included in the model.
b. The cut value is .500
Table 1.4 indicates that without the independent variables, one would simply guess that a particular team wins the match, and this guess would be correct 50% of the time.
27. Table 1.5 - Variables in the Equation
                       B       S.E.    Wald    df    Sig.     Exp(B)
Step 0    Constant     .000    .426    .000    1     1.000    1.000
Table 1.6 - Variables not in the Equation
                                    Score     df    Sig.
Step 0    Variables   pass(1)       .733      1     .392
                      rebound(1)    11.733    1     .001
                      f_throw(1)    .733      1     .392
                      blocks(1)     .000      1     1.000
          Overall Statistics        11.942    4     .018
Table 1.5 shows that the Wald statistic is not significant, as its significance value is 1.000, which is more than 0.05.
Table 1.6 indicates whether each independent variable may improve the model or not.
28. 2. Block 1: logistic model with predictors (testing the significance of the model)
Table 1.7 - Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       16.895(a)            .461                    .615
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
Table 1.7 shows the -2 log likelihood statistic and the proportion of variation in the dependent variable explained by the model.
Table 1.8 - Hosmer and Lemeshow Test
Step    Chi-square    df    Sig.
1       6.834         8     .555
Table 1.8 tests the goodness of fit of the model with the help of the chi-square value.
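The reported significance can be verified independently; a one-off check of my own (assuming scipy is installed):

from scipy.stats import chi2

# p-value for the Hosmer-Lemeshow statistic in Table 1.8: chi-square = 6.834, df = 8
p_value = chi2.sf(6.834, df=8)   # survival function, i.e. 1 - CDF
print(round(p_value, 3))         # about 0.555; > 0.05, so the model fits adequately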
29. Table 1.9 - Classification Table(a)
                                   Predicted
Observed                      losing    winning    Percentage Correct
Step 1    output    losing    9         2          81.8
                    winning   1         10         90.9
          Overall Percentage                       86.4
a. The cut value is .500
Table 1.9 shows the observed and predicted values of the dependent variable.
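The overall percentage can be recomputed from the cross-classification (my own arithmetic, not additional SPSS output):

correct = 9 + 10           # correctly predicted losses plus correctly predicted wins
total = 9 + 2 + 1 + 10     # all 22 matches
print(round(100.0 * correct / total, 1))  # 86.4, matching the Overall Percentage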
30. Developing the logistic model
Table 1.10 - Variables in the Equation
                          B        S.E.     Wald     df    Sig.    Exp(B)
Step 1(a)   pass(1)       -.337    1.452    .054     1     .817    .714
            rebound(1)    4.190    1.556    7.249    1     .007    65.990
            f_throw(1)    -.337    1.452    .054     1     .817    .714
            blocks(1)     .834     1.390    .360     1     .548    2.303
            Constant      -2.539   1.416    3.213    1     .073    .079
a. Variable(s) entered on step 1: pass, rebound, free throw, blocks.
Table 1.10 shows the regression coefficients (B), the Wald statistic and its significance, and the odds ratio Exp(B) for each variable in the model.
31. Developing the logistic model
log p/(1-p) = -2.539 + 0.834 * blocks - 0.337 * free throws + 4.190 * offensive rebounds - 0.337 * no. of passes
where p is the probability of winning the match.
Note - Only those variables found to be significant should be included in the model, but to describe the results comprehensively the other variables have been retained in this model.
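To make the fitted equation operational, here is a small sketch (my own wrapper around the Table 1.10 coefficients) that maps a coded covariate pattern to a winning probability:

import math

# Coefficients from Table 1.10 (slide 30)
B0, B_PASS, B_REBOUND, B_FTHROW, B_BLOCKS = -2.539, -0.337, 4.190, -0.337, 0.834

def win_probability(passes, rebound, f_throw, blocks):
    # Linear predictor (log-odds), then inverse logit
    z = B0 + B_PASS * passes + B_REBOUND * rebound + B_FTHROW * f_throw + B_BLOCKS * blocks
    return math.exp(z) / (1.0 + math.exp(z))

# The covariate pattern used on slide 33: passes = 0, rebound = 1, f_throw = 1, blocks = 1
print(round(win_probability(0, 1, 1, 1), 4))  # z = 2.148, p is about 0.8955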
32. Explanation of the odds ratio
In Table 1.10, Exp(B) represents the odds ratio for each predictor. If the value of the odds ratio is large, its predictive value is also large.
Since odds = p/(1-p), it follows that p = odds/(1+odds).
For offensive rebounds, p = 65.99/(1+65.99) = 0.985.
This indicates that if a team falls into the offensive-rebound category coded 1, its probability of winning would be 0.985.
33. Interpretation of the logistic regression model
log p/(1-p) = -2.539 + 0.834 * 1 - 0.337 * 1 + 4.190 * 1 - 0.337 * 0 = 2.148
Odds = p/(1-p) = e^2.148 = 8.5677
p = 8.5677/(1+8.5677) = 0.8955
Thus, it may be concluded that the probability of team A winning the match would be 0.8955.