NFL 2013 Combine Data Multivariate Analysis
John Michael Croft, Brian Ginburg, Gary Keller and William Ward
Kennesaw State University
Abstract
The purpose of this research is to examine differences in multiple response variables between groups of player positions using multivariate methods. Exploratory analyses and data cleansing reduced multicollinearity among the response variables. After these steps the final set of responses appears multivariate normal, which supports a multivariate analysis of variance. Compared with a series of univariate analyses of variance, this approach reduces the probability of Type I errors. The analysis provides strong evidence of significant differences between groups across multiple response variables. Contrasts highlight the most significant differences. The first is Group 1 (FS; SS; CB; WR) vs Group 3 (OT; OC; OG; DT) in the response variables Hands, Bench, and Vertical (-0.70 inches, -11.81 reps, and 8.70 inches, respectively, on average). The second is Group 3 (OT; OC; OG; DT) vs Group 4 (RB) in the response variable Height (5.97 inches on average).
Exploratory Multivariate Analysis of the NFL Combine Data
The purpose of this analysis is to report findings from the 2013 NFL Combine data using a multivariate approach. All charts, graphs, and figures can be found in the appendices at the end of the analysis. Some are also placed within the body to emphasize the importance of the topic being addressed. Since 1982, the NFL Combine (an invitation-only event) has evaluated college football players' physical abilities and mental awareness. NFL teams use the results to make targeted evaluations of draft prospects. Table 1 lists the original dataset variables with a brief description, general and specific types, and measurement units.
Player positions form the basis of this analysis. Kickers (K), Long snappers (LS), and
Punters (P) are not found in the 2013 data subset, while Quarterbacks (QB) have been omitted
due to lack of observations (n=14<20). Table A displays the initial groups (A - F) prior to the
exploratory analysis and final groups (1 - 4) after the exploratory analysis.
The initial groups are based on the assumption that players at similar positions have similar attributes. Tight Ends were assigned somewhat arbitrarily to Group C. The main reason was to keep group sample sizes consistent; their attributes (e.g., height and weight) were also expected to resemble those of the linebackers in that group. The final groups are discussed later. They reclassify certain positions to reflect expectations that were adjusted after the exploratory analysis.
Position   Initial Group   Final Group
FS         Group A         Group 1
SS         Group A         Group 1
CB         Group A         Group 1
WR         Group E         Group 1
DE         Group B         Group 2
LB         Group C         Group 2
TE         Group C         Group 2
DT         Group B         Group 3
OT         Group D         Group 3
OG         Group D         Group 3
OC         Group D         Group 3
RB         Group F         Group 4
Table A: Player Position Groupings
Significant differences in the response variables were expected because of perceived attribute differences between groups (e.g., big vs. small, fast vs. slow, short vs. tall). Figure 1 shows approximately equal initial group sizes. The global hypothesis is that the groups differ significantly in at least one response variable.
Data Cleansing
The following variables are considered redundant or inconsequential and have
been omitted from this analysis: College, FirstName, HeightFeet, HeightInches, LastName,
Name, Pick, PickRound, PickTotal, Round, and Year.
Zero values in the measured variables were recoded as missing, and all missing values are assumed to be missing at random. Tables 2 & 3 report the percent missing per variable and per observation. Variables missing more than 20% of their values were omitted from the analysis: Wonderlic, TwentyYD, ThreeCone, and TwentySS. Observations missing more than 33.34% of their values were also omitted: ID #'s 9225, 8984, 9107, and 9140. All remaining missing values were imputed via linear regression on position indicator variables. Because position is the only predictor, each imputed value equals the mean of the observed values at that player's position. Approximate normality was assumed on the basis of the Central Limit Theorem (n > 30).
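The screening and imputation rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not a port of the Appendix 3 SAS code. Records are assumed to be dicts with a `position` key, with `None` marking a missing value. The function name `screen_and_impute` and the toy records used to check it are hypothetical. The 20% and 33.34% cut-offs come from the text. A regression on position indicator variables alone predicts each position's mean, so the sketch imputes position means directly.

```python
from statistics import mean


def screen_and_impute(rows, variables, var_cut=0.20, obs_cut=0.3334):
    """Drop sparse variables, then sparse observations, then impute position means.

    rows: list of dicts with a "position" key; None marks a missing value.
    variables: names of the numeric variables to screen and impute.
    """
    # Treat recorded zeros as missing (new dicts, so the input is not modified).
    rows = [{k: (None if k in variables and v == 0 else v) for k, v in r.items()}
            for r in rows]
    n = len(rows)

    # Keep variables whose share of missing values is at most var_cut.
    kept_vars = [v for v in variables
                 if sum(r[v] is None for r in rows) / n <= var_cut]

    # Keep observations whose share of missing kept variables is at most obs_cut.
    width = max(len(kept_vars), 1)
    kept_rows = [r for r in rows
                 if sum(r[v] is None for v in kept_vars) / width <= obs_cut]

    # Fill each remaining gap with the mean of the observed values at that
    # position (the fitted value of a regression on position indicators).
    for v in kept_vars:
        observed = {}
        for r in kept_rows:
            if r[v] is not None:
                observed.setdefault(r["position"], []).append(r[v])
        pos_mean = {p: mean(vals) for p, vals in observed.items()}
        for r in kept_rows:
            if r[v] is None:
                r[v] = pos_mean.get(r["position"])  # stays None if nothing observed
    return kept_rows, kept_vars
```

The sketch computes each observation's missing share over the retained variables only. Table 3 instead counts missing values over all remaining numeric fields, so the two percentages can differ slightly.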
Moderate correlations among the response variables are desirable. High correlations (|r| > .7), however, were examined in order to reduce multicollinearity and increase the power of the analysis. Table 4 shows all pairwise correlations. All response variables other than Hands and Bench have a correlation above .7 in absolute value with at least one other response variable. Based on these correlations together with the standardized effect sizes (Figure 2), Broad and TenYD were omitted from further analysis. FortyYD has marginally higher correlations than TenYD, but it was retained on the assumption that the industry prefers it.
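To make the screening rule concrete, the short Python sketch below flags every pair of variables in Table 4 whose correlation exceeds .7 in absolute value. The coefficients are copied from the upper triangle of Table 4, and the helper name `high_pairs` is illustrative. Run as is, it lists twelve pairs, none of which involves Hands or bench_2. This matches the statement above.

```python
# Variable order and upper-triangle coefficients from Table 4.
NAMES = ["Weight", "Arms", "Hands", "fortyyd_2", "tenyd_2",
         "vertical_2", "broad_2", "bench_2", "HeightInchesTotal"]
UPPER = [
    [0.63631, 0.5426, 0.88516, 0.87387, -0.75372, -0.77584, 0.64951, 0.71823],
    [0.53112, 0.48541, 0.48194, -0.33836, -0.31332, 0.23887, 0.76487],
    [0.46659, 0.4461, -0.31323, -0.33992, 0.37235, 0.5269],
    [0.93863, -0.8223, -0.85232, 0.49038, 0.55655],
    [-0.81002, -0.83432, 0.48381, 0.57244],
    [0.89585, -0.36448, -0.41117],
    [-0.40607, -0.41073],
    [0.2994],
]


def high_pairs(names, upper, cutoff=0.7):
    """Return (var_a, var_b, r) for every pair with |r| > cutoff."""
    flagged = []
    for i, row in enumerate(upper):
        for offset, r in enumerate(row):
            j = i + 1 + offset  # row i of the upper triangle starts at column i + 1
            if abs(r) > cutoff:
                flagged.append((names[i], names[j], r))
    return flagged


for a, b, r in high_pairs(NAMES, UPPER):
    print(f"{a:>10} ~ {b:<18} r = {r:+.3f}")
```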
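[Figure 2: Initial Group Variable Profile Plot; x-axis: Weight, Arms, Hands, Forty, Ten, Vert, Broad, Bench, Height; y-axis: mean standardized value (-2 to 2); series: DB, DL, LB, OL, RB, WR]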
Assumptions
The initial Mardia's test (Table 5) indicates a departure from multivariate normality in skewness (p = .003). The kurtosis test does not reject multivariate normality (p = .133). To refine the analysis, individual variables were examined for univariate normality (Figures 3 - 9). Weight (bimodal), FortyYD (skewed), and Arms (skewed) were omitted from further analysis because of apparent departures from univariate normality. The final Mardia's test (Table 6) is consistent with multivariate normality in both skewness (p = .293) and kurtosis (p = .428).
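The Mardia statistics in Appendix 3 can be reproduced in Python for readers without SAS/IML. The sketch below ports the formulas of that IML program:
- the maximum-likelihood covariance;
- the small-sample correction k for skewness;
- a chi-square reference with p(p + 1)(p + 2)/6 degrees of freedom for skewness;
- a standard normal reference for kurtosis.

It uses only the standard library, so the matrix inverse and the chi-square tail are hand-written helpers. The function names are illustrative, and the sketch has not been run against the combine data. With p = 4 responses the skewness reference has 20 degrees of freedom. Feeding the Table 6 statistics into the two tail functions reproduces the reported p-values of about .293 and .428.

```python
import math


def mat_inv(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the regularized lower gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-16:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mardia(y):
    """Mardia's skewness and kurtosis tests, following the SAS/IML formulas."""
    n, p = len(y), len(y[0])
    means = [sum(row[j] for row in y) / n for j in range(p)]
    c = [[row[j] - means[j] for j in range(p)] for row in y]        # Q*Y (centered data)
    s = [[sum(r[i] * r[j] for r in c) / n for j in range(p)] for i in range(p)]  # (1/n) Y'QY
    s_inv = mat_inv(s)
    cs = [[sum(r[a] * s_inv[a][b] for a in range(p)) for b in range(p)] for r in c]
    g = [[sum(u[b] * v[b] for b in range(p)) for v in c] for u in cs]  # QY S^-1 Y'Q
    beta1 = sum(x ** 3 for row in g for x in row) / n ** 2
    beta2 = sum(g[i][i] ** 2 for i in range(n)) / n
    df = p * (p + 1) * (p + 2) / 6
    k = (p + 1) * (n + 1) * (n + 3) / (n * ((n + 1) * (p + 1) - 6))
    kappa1 = n * beta1 * k / 6
    kappa2 = (beta2 - p * (p + 2)) / math.sqrt(8 * p * (p + 2) / n)
    return {"beta1": beta1, "kappa1": kappa1, "df": df, "p_skew": chi2_sf(kappa1, df),
            "beta2": beta2, "kappa2": kappa2,
            "p_kurt": 2 * (1 - normal_cdf(abs(kappa2)))}
```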
                   Weight     Arms       Hands      fortyyd_2  tenyd_2    vertical_2 broad_2    bench_2    HeightInchesTotal
Weight             1          0.63631    0.5426     0.88516    0.87387    -0.75372   -0.77584   0.64951    0.71823
Arms               0.63631    1          0.53112    0.48541    0.48194    -0.33836   -0.31332   0.23887    0.76487
Hands              0.5426     0.53112    1          0.46659    0.4461     -0.31323   -0.33992   0.37235    0.5269
fortyyd_2          0.88516    0.48541    0.46659    1          0.93863    -0.8223    -0.85232   0.49038    0.55655
tenyd_2            0.87387    0.48194    0.4461     0.93863    1          -0.81002   -0.83432   0.48381    0.57244
vertical_2         -0.7537    -0.3384    -0.3132    -0.8223    -0.81      1          0.89585    -0.36448   -0.41117
broad_2            -0.7758    -0.3133    -0.3399    -0.85232   -0.8343    0.89585    1          -0.40607   -0.41073
bench_2            0.64951    0.23887    0.37235    0.49038    0.48381    -0.36448   -0.40607   1          0.2994
HeightInchesTotal  0.71823    0.76487    0.5269     0.55655    0.57244    -0.41117   -0.41073   0.2994     1
Table 4: Pearson Correlation Coefficients
Test Estimate Stat pval
Skewness 4.832383 220.3613 0.002581
Kurtosis 101.5743 1.503087 0.132817
Table 5: Initial Mardia's Test
Test Estimate Stat pval
Skewness 0.501283 22.90982 0.293245
Kurtosis 23.33105 -0.793283 0.427613
Table 6: Final Mardia's Test
The reader is encouraged to review Table A at this point; it distinguishes the initial groups (A - F) from the final groups (1 - 4). Figure 10 suggests heterogeneous variances across the initial groups, using the Vertical boxplot as an example. The boxplots for the other variables show similar patterns and are omitted as redundant. Table 7 confirms that the within-group covariance matrices of the initial groups are not homogeneous (p < .0001).
Players were reclassified into final groups (1 - 4) in an attempt to correct this heterogeneity:
- Group 1 combines Groups A and E.
- Group 2 combines Group C and DE.
- Group 3 combines Group D and DT.
- Group 4 is the same as Group F.

Group sample sizes remain similar (Figure 11). Table 8 supports homogeneity of the covariance matrices across the final groups (p = .552).
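Tables 7 and 8 come from PROC DISCRIM with POOL=TEST, which tests whether the within-group covariance matrices are equal. The sketch below illustrates the idea with Box's M statistic and its usual chi-square approximation, written in Python. It is closely related to, but not a verified reimplementation of, SAS's computation. Its degrees of freedom, (k - 1)p(p + 1)/2, reproduce the 50 and 30 reported for k = 6 initial and k = 4 final groups on p = 4 responses. The function names and the toy groups used to check it are hypothetical. A p-value would be the upper chi-square tail of the returned statistic.

```python
import math


def covariance(rows):
    """Sample covariance matrix with the n - 1 divisor."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def log_det(m):
    """Log-determinant of a positive-definite matrix via elimination without pivoting."""
    a = [[float(x) for x in row] for row in m]
    total = 0.0
    for col in range(len(a)):
        if a[col][col] <= 0:
            raise ValueError("matrix is not positive definite")
        total += math.log(a[col][col])
        for r in range(col + 1, len(a)):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return total


def box_m(groups):
    """Box's M test for equal covariance matrices; returns (chi_square, df)."""
    k, p = len(groups), len(groups[0][0])
    nus = [len(g) - 1 for g in groups]          # within-group degrees of freedom
    nu = sum(nus)                               # N - k
    covs = [covariance(g) for g in groups]
    pooled = [[sum(v * s[i][j] for v, s in zip(nus, covs)) / nu for j in range(p)]
              for i in range(p)]
    m_stat = nu * log_det(pooled) - sum(v * log_det(s) for v, s in zip(nus, covs))
    c = ((sum(1 / v for v in nus) - 1 / nu)
         * (2 * p * p + 3 * p - 1) / (6 * (p + 1) * (k - 1)))
    return m_stat * (1 - c), (k - 1) * p * (p + 1) // 2
```

Shifting a group leaves its covariance unchanged, so the statistic is essentially zero. Rescaling a group makes it positive.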
Observations are assumed to be independent because players are measured separately (i.e., one player's results do not influence another player's results). Each player's vector of responses is therefore treated as independent of every other player's vector.
Mahalanobis distances were calculated for each observation. Outliers were identified using an upper limit of 13, approximately the mean plus three standard deviations of the distances (3.9 + 3 × 2.9 = 12.6). Five outliers were detected but were not removed because of their low marginal impact on the analysis.
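The outlier screen can be sketched in Python. Each observation's squared Mahalanobis distance from the overall mean vector is computed. Values above the mean plus three standard deviations of those distances are then flagged, following the rule described above. The text does not say whether squared distances or the overall sample covariance (n - 1 divisor) were used, so both are assumptions here. The helper names are illustrative. The sum of the squared distances always equals p(n - 1) under this covariance, which makes a convenient sanity check.

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve a x = b for a small dense system by Gauss-Jordan elimination."""
    n = len(a)
    m = [[float(x) for x in row] + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of each row from the overall mean."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    out = []
    for r in rows:
        d = [r[j] - mu[j] for j in range(p)]
        x = solve(cov, d)  # S^-1 d without forming the inverse explicitly
        out.append(sum(di * xi for di, xi in zip(d, x)))
    return out


def flag_outliers(rows, k=3.0):
    """Return the cutoff (mean + k SD of the distances) and flagged row indices."""
    d2 = mahalanobis_sq(rows)
    cutoff = mean(d2) + k * stdev(d2)
    return cutoff, [i for i, v in enumerate(d2) if v > cutoff]
```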
Chi-Square DF Pr > ChiSq
113.146532 50 <.0001
Table 7: MVN Variance Test (Initial Groups)
Chi-Square DF Pr > ChiSq
28.352169 30 0.5518
Table 8: MVN Variance Test (Final Groups)
Results
Tables 9 & 10 contain the multivariate analysis of variance test criteria, their F approximations, and the characteristic roots. A Wilks' lambda of .113 indicates that at least one group differs significantly from another on at least one response variable (p < .0001), so the null hypothesis is rejected. The first characteristic root accounts for 89.63% of the total, so the group differences are largely one-dimensional: a single linear combination of the responses dominates. This suggests Roy's greatest root as the test criterion. Whichever criterion is chosen, all four tests reject the null hypothesis (p < .0001).
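All four test criteria in Table 9 are functions of the characteristic roots of E^-1 H listed in Table 10:
- Wilks' lambda = product of 1/(1 + root);
- Pillai's trace = sum of root/(1 + root);
- Hotelling-Lawley trace = sum of the roots;
- Roy's greatest root = the largest root.

The Python sketch below recomputes them from the Table 10 roots as a consistency check. The helper name is illustrative. It reproduces the Table 9 values to about five decimal places, along with the 89.63% share of the first root.

```python
import math

# Characteristic roots from Table 10.
ROOTS = [4.61606237, 0.4222601, 0.1120138, 0.0]


def manova_criteria(roots):
    """The four MANOVA test statistics written in terms of the characteristic roots."""
    return {
        "wilks": math.prod(1 / (1 + r) for r in roots),
        "pillai": sum(r / (1 + r) for r in roots),
        "hotelling_lawley": sum(roots),
        "roy": max(roots),
        "pct_first_root": 100 * max(roots) / sum(roots),
    }


for name, value in manova_criteria(ROOTS).items():
    print(f"{name:>17}: {value:.6f}")
```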
Univariate analyses of variance were conducted for each response variable (Table 11). Each indicates significant differences between groups (p < .0001), so contrasts were examined for each response variable.
Figure 12 shows a standardized profile plot of the final groups across all remaining
response variables to aid in determining which contrasts to examine.
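The profile plots (Figures 2 and 12) are built in two steps, following the Profile Analysis code in Appendix 3. First, each response is standardized across all players. Second, the standardized values are averaged within each group. The following is a minimal Python sketch. It assumes records are dicts and uses the sample standard deviation, which is the PROC MEANS default. The function name `profile` and the example records are hypothetical.

```python
from statistics import mean, stdev


def profile(rows, variables, group_key="group"):
    """Mean z-score of each variable within each group (one profile line per group)."""
    mu = {v: mean(r[v] for r in rows) for v in variables}
    sd = {v: stdev(r[v] for r in rows) for v in variables}
    sums, counts = {}, {}
    for r in rows:
        g = r[group_key]
        counts[g] = counts.get(g, 0) + 1
        acc = sums.setdefault(g, {v: 0.0 for v in variables})
        for v in variables:
            acc[v] += (r[v] - mu[v]) / sd[v]  # standardize over all players
    return {g: {v: acc[v] / counts[g] for v in variables} for g, acc in sums.items()}


example = [
    {"group": "RB", "bench": 20, "vert": 36},
    {"group": "RB", "bench": 22, "vert": 34},
    {"group": "OL/DT", "bench": 30, "vert": 28},
    {"group": "OL/DT", "bench": 32, "vert": 26},
]
print(profile(example, ["bench", "vert"]))
```

A group line above zero for a variable means that group averages above the overall mean on that variable. This is how Figure 12 suggests which contrasts to examine.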
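[Figure 12: Final Group Variable Profile Plot; x-axis: Hands, Vert, Bench, Height; y-axis: mean standardized value (-2 to 1); series (group_2): DB/WR, LB/DE/TE, OL/DT, RB]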
Statistic                Value       F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.1125846   74.43     12       696.12   <.0001
Pillai's Trace           1.2195636   45.38     12       795      <.0001
Hotelling-Lawley Trace   5.1503363   112.51    12       455.96   <.0001
Roy's Greatest Root      4.6160624   305.81    4        265      <.0001
Table 9: MANOVA Test Criteria & F Approximations
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
Table 12 summarizes all contrasts considered:
• Vertical: All contrasts are significant (all p values ≤ .01) except Group 2 vs Group 4 (p = .6245). Group 1 vs Group 3 is the most significant (SS = 2970.98, Estimate = 8.70).
• Bench: All contrasts are significant (all p values < .0001) except Group 2 vs Group 4 (p = .4677). Group 1 vs Group 3 is the most significant (SS = 5474.55, Estimate = -11.81).
• Hands: All contrasts are significant (all p values ≤ .0243) except Group 1 vs Group 4 (p = .6897). Group 1 vs Group 3 is the most significant (SS = 19.18, Estimate = -0.70).
• Height: All contrasts are significant (all p values ≤ .0012). Group 3 vs Group 4 is the most significant (SS = 874.20, Estimate = 5.97).
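For a pairwise contrast such as Group 1 vs Group 3, the estimate in Table 12 is the difference between the two group means, first group minus second. Its sum of squares is the squared estimate divided by the sum of c_i^2 / n_i over the groups. The F statistic is that sum of squares (1 degree of freedom) divided by the error mean square. Group sizes are not reported in the text, so the Python sketch below is checked only on hypothetical toy data, and the function name `contrast` is illustrative.

```python
from statistics import mean


def contrast(groups, coefs):
    """Estimate, sum of squares, and F for a linear contrast of group means (one-way layout).

    groups: list of lists of observations; coefs: coefficients that sum to zero.
    """
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    means = [mean(g) for g in groups]
    sizes = [len(g) for g in groups]
    estimate = sum(c * m for c, m in zip(coefs, means))
    ss = estimate ** 2 / sum(c * c / n for c, n in zip(coefs, sizes))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = sse / (sum(sizes) - len(groups))  # error mean square on N - k df
    return estimate, ss, ss / mse


# Hypothetical data: the coefficients (1, 0, -1) compare group 1 with group 3.
print(contrast([[30, 32, 34], [36, 38, 40], [24, 26, 28]], [1, 0, -1]))
```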
Response  Contrast            Contrast SS    Estimate      Pr > F
Vertical  DB/WR vs LB/DE/TE   149.132746     1.9346678     <.0001
Vertical  DB/WR vs OL/DT      2970.979953    8.69776183    <.0001
Vertical  DB/WR vs RB         72.29268       1.67462185    0.0014
Vertical  LB/DE/TE vs OL/DT   1692.05026     6.76309403    <.0001
Vertical  LB/DE/TE vs RB      1.675503       -0.260046     0.6245
Vertical  OL/DT vs RB         1211.140559    -7.02314      <.0001
Bench     DB/WR vs LB/DE/TE   2159.701381    -7.3623549    <.0001
Bench     DB/WR vs OL/DT      5474.545985    -11.806786    <.0001
Bench     DB/WR vs RB         1132.773171    -6.6288939    <.0001
Bench     LB/DE/TE vs OL/DT   730.726452     -4.4444314    <.0001
Bench     LB/DE/TE vs RB      13.329045      0.733461      0.4677
Bench     OL/DT vs RB         658.321367     5.1778925     <.0001
Hands     DB/WR vs LB/DE/TE   10.55147097    -0.5146078    <.0001
Hands     DB/WR vs OL/DT      19.17916754    -0.6988316    <.0001
Hands     DB/WR vs RB         0.03911035     0.03895072    0.6897
Hands     LB/DE/TE vs OL/DT   1.25549104     -0.1842237    0.0243
Hands     LB/DE/TE vs RB      7.59227804     0.55355856    <.0001
Hands     OL/DT vs RB         13.36559713    0.7377823     <.0001
Height    DB/WR vs LB/DE/TE   332.3727022    -2.8882353    <.0001
Height    DB/WR vs OL/DT      606.3480271    -3.9293312    <.0001
Height    DB/WR vs RB         107.0115459    2.03744038    <.0001
Height    LB/DE/TE vs OL/DT   40.0962606     -1.0410959    0.0012
Height    LB/DE/TE vs RB      601.1413339    4.92567568    <.0001
Height    OL/DT vs RB         874.1998387    5.96677157    <.0001
Table 12: Contrasts & Estimates
Conclusion
The analysis supports the hypothesized significant differences between groups of 2013 NFL Combine participants. The most significant differences are between Group 1 and Group 3 (Vertical, Bench, Hands). On average, defensive backs and wide receivers jump 8.7 inches higher, complete 11.81 fewer bench press reps, and have hands 0.7 inches shorter than offensive linemen and defensive tackles. This is expected given the nature of the positions within each group. Defensive backs and wide receivers must be more athletic overall, running faster for longer and jumping higher to catch passes. Offensive linemen and defensive tackles, by contrast, need stamina and stability to pass block and run block while in constant contact with the opposing team.
The most significant difference in height, however, is between Group 3 and Group 4: running backs are, on average, 5.97 inches shorter than offensive linemen and defensive tackles. This is also expected given the nature of the positions. Running backs must be mobile and agile to break tackles, hurdle defenders, and outrun the opposing team. The demands on offensive linemen and defensive tackles are discussed above. Additionally, defensive tackles try to disrupt passing attempts with maximum vertical extension, which their greater height supports.
Overall, the analysis provides strong evidence of significant differences between groups. We attribute these primarily to the athleticism commonly found within each group, which produces similar within-group performance across the response variables.
We recommend that offensive linemen and defensive tackles focus primarily on stamina and stability. Defensive backs, wide receivers, and running backs should focus more on mobility and agility. Linebackers, defensive ends, and tight ends should focus on some combination of stamina, stability, mobility, and agility, because those positions require versatility. Within these positions, we recommend that heavier players emphasize stamina and stability and lighter players emphasize mobility and agility. Linear combinations were not compared. Even so, the final groups partly achieve this organically by grouping positions whose players have similar size, weight, and athleticism.
Future Research
It may be of interest to compare the results of the current analysis with the same players' production over the first 2 - 5 years of their careers, for both drafted and undrafted participants. Predicting future combine participants' responses may also be of interest. We recommend that future studies examine differences between drafted and undrafted combine participants on the same response variables. Additionally, focusing only on drafted combine participants would allow draft pick to be evaluated as an additional response variable.
Appendix 1: Tables
Position   Initial Group   Final Group
FS         Group A         Group 1
SS         Group A         Group 1
CB         Group A         Group 1
WR         Group E         Group 1
DE         Group B         Group 2
LB         Group C         Group 2
TE         Group C         Group 2
DT         Group B         Group 3
OT         Group D         Group 3
OG         Group D         Group 3
OC         Group D         Group 3
RB         Group F         Group 4
Table A: Player Position Groupings
Variable Name  Description                        General Type  Specific Type        Measurement Units
Arms           Length of Arms                     Quantitative  Interval/Ratio       Inches
Bench          Number of 225 pound reps           Quantitative  Interval/Ratio       Number of reps
Broad          Broad Jump                         Quantitative  Interval/Ratio       Inches
College        College Attended                   Qualitative   Nominal              N/A
FirstName      First Name                         Categorical   Nominal              N/A
FortyYD        40 Yard Dash Time                  Quantitative  Interval/Ratio       Seconds
Hands          Length of Hands                    Quantitative  Interval/Ratio       Inches
HeightFeet     Height in Feet Only                Quantitative  Interval/Ratio       Feet
HeightInch     Height in Inches                   Quantitative  Interval/Ratio       Inches
HeightInches   Remaining Inches                   Quantitative  Interval/Ratio       Inches
ID             ID Number                          Quantitative  Identifier Variable  N/A
LastName       Last Name                          Categorical   Nominal              N/A
Name           Player's Name                      Categorical   Nominal              N/A
Pick           Pick Number in Round and Overall   Quantitative  Interval/Ratio       Pick in Round (Pick in Draft)
PickRound      Pick Number in Draft Round         Quantitative  Interval/Ratio       Pick Number in Round
PickTotal      Overall Draft Pick Number          Quantitative  Interval/Ratio       Pick Number in Overall Draft
Position       Primary Position                   Categorical   Nominal              N/A
Round          Draft Round Evaluated              Quantitative  Interval/Ratio       Round Number
TenYD          First 10 Yards                     Quantitative  Interval/Ratio       Seconds
ThreeCone      3 Cone Drill Time                  Quantitative  Interval/Ratio       Seconds
TwentySS       20 Yard Shuttle Time               Quantitative  Interval/Ratio       Seconds
TwentyYD       First 20 Yards                     Quantitative  Interval/Ratio       Seconds
Vertical       Vertical Jump                      Quantitative  Interval/Ratio       Inches
Weight         Weight in Pounds                   Quantitative  Interval/Ratio       Pounds
Wonderlic      Wonderlic Intelligence Score       Quantitative  Interval/Ratio       Score
Year           Combine Year                       Quantitative  Interval/Ratio       Year
Table 1: List of Variables in the NFL Combine Data
Variable           N     N Miss   % Miss
Wonderlic          0     287      100.00%
TwentyYD           8     279      97.21%
ThreeCone          205   82       28.57%
TwentySS           219   68       23.69%
Bench              230   57       19.86%
TenYD              248   39       13.59%
Vertical           248   39       13.59%
Broad              255   32       11.15%
FortyYD            272   15       5.23%
Arms               286   1        0.35%
Hands              286   1        0.35%
id                 287   0        0.00%
Year               287   0        0.00%
HeightFeet         287   0        0.00%
HeightInches       287   0        0.00%
Weight             287   0        0.00%
HeightInchesTotal  287   0        0.00%
Table 2: Variable Reduction (>20% missing)

Individual   N    N Miss   % Miss
9225         6    6        50.00%
8984         7    5        41.67%
9107         7    5        41.67%
9140         7    5        41.67%
9007         8    4        33.33%
9012         8    4        33.33%
9018         8    4        33.33%
9028         8    4        33.33%
9037         8    4        33.33%
9043         8    4        33.33%
9058         8    4        33.33%
9064         8    4        33.33%
9065         8    4        33.33%
9083         8    4        33.33%
9095         8    4        33.33%
9139         8    4        33.33%
9185         8    4        33.33%
8972         9    3        25.00%
8977         9    3        25.00%
9009         9    3        25.00%
9175         9    3        25.00%
8966         10   2        16.67%
8983         10   2        16.67%
9001         10   2        16.67%
Table 3: Observation Reduction (>33.34% missing)
                   Weight     Arms       Hands      fortyyd_2  tenyd_2    vertical_2 broad_2    bench_2    HeightInchesTotal
Weight             1          0.63631    0.5426     0.88516    0.87387    -0.75372   -0.77584   0.64951    0.71823
Arms               0.63631    1          0.53112    0.48541    0.48194    -0.33836   -0.31332   0.23887    0.76487
Hands              0.5426     0.53112    1          0.46659    0.4461     -0.31323   -0.33992   0.37235    0.5269
fortyyd_2          0.88516    0.48541    0.46659    1          0.93863    -0.8223    -0.85232   0.49038    0.55655
tenyd_2            0.87387    0.48194    0.4461     0.93863    1          -0.81002   -0.83432   0.48381    0.57244
vertical_2         -0.7537    -0.3384    -0.3132    -0.8223    -0.81      1          0.89585    -0.36448   -0.41117
broad_2            -0.7758    -0.3133    -0.3399    -0.85232   -0.8343    0.89585    1          -0.40607   -0.41073
bench_2            0.64951    0.23887    0.37235    0.49038    0.48381    -0.36448   -0.40607   1          0.2994
HeightInchesTotal  0.71823    0.76487    0.5269     0.55655    0.57244    -0.41117   -0.41073   0.2994     1
Table 4: Pearson Correlation Coefficients
Test Estimate Stat pval
Skewness 4.832383 220.3613 0.002581
Kurtosis 101.5743 1.503087 0.132817
Table 5: Initial Mardia's Test
Test Estimate Stat pval
Skewness 0.501283 22.90982 0.293245
Kurtosis 23.33105 -0.793283 0.427613
Table 6: Final Mardia's Test
Chi-Square DF Pr > ChiSq
113.146532 50 <.0001
Table 7: MVN Variance Test (Initial Groups)
Chi-Square DF Pr > ChiSq
28.352169 30 0.5518
Table 8: MVN Variance Test (Final Groups)
Statistic                Value       F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.1125846   74.43     12       696.12   <.0001
Pillai's Trace           1.2195636   45.38     12       795      <.0001
Hotelling-Lawley Trace   5.1503363   112.51    12       455.96   <.0001
Roy's Greatest Root      4.6160624   305.81    4        265      <.0001
Table 9: MANOVA Test Criteria & F Approximations
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
Characteristic   Percent   Characteristic Vector V'EV=1
Root                       vertical_2   bench_2      Hands        HeightInchesTotal
4.61606237       89.63     -0.0184358   0.00785156   0.0187705    0.01626727
0.4222601        8.2       0.01059568   -0.0034606   0.0141656    0.02524815
0.1120138        2.17      0.01125293   0.00875417   0.0206764    -0.0031582
0                0         -0.0011242   -0.0037193   0.1251432    -0.0135694
Table 10: Characteristic Roots and Vectors
Variable F Value Pr > F
Vertical 156.76 <.0001
Bench 75.01 <.0001
Hands 36.46 <.0001
HeightInchesTotal 109.42 <.0001
Table 11: Univariate Analysis of Variance
Response  Contrast            Contrast SS    Estimate      Pr > F
Vertical  DB/WR vs LB/DE/TE   149.132746     1.9346678     <.0001
Vertical  DB/WR vs OL/DT      2970.979953    8.69776183    <.0001
Vertical  DB/WR vs RB         72.29268       1.67462185    0.0014
Vertical  LB/DE/TE vs OL/DT   1692.05026     6.76309403    <.0001
Vertical  LB/DE/TE vs RB      1.675503       -0.260046     0.6245
Vertical  OL/DT vs RB         1211.140559    -7.02314      <.0001
Bench     DB/WR vs LB/DE/TE   2159.701381    -7.3623549    <.0001
Bench     DB/WR vs OL/DT      5474.545985    -11.806786    <.0001
Bench     DB/WR vs RB         1132.773171    -6.6288939    <.0001
Bench     LB/DE/TE vs OL/DT   730.726452     -4.4444314    <.0001
Bench     LB/DE/TE vs RB      13.329045      0.733461      0.4677
Bench     OL/DT vs RB         658.321367     5.1778925     <.0001
Hands     DB/WR vs LB/DE/TE   10.55147097    -0.5146078    <.0001
Hands     DB/WR vs OL/DT      19.17916754    -0.6988316    <.0001
Hands     DB/WR vs RB         0.03911035     0.03895072    0.6897
Hands     LB/DE/TE vs OL/DT   1.25549104     -0.1842237    0.0243
Hands     LB/DE/TE vs RB      7.59227804     0.55355856    <.0001
Hands     OL/DT vs RB         13.36559713    0.7377823     <.0001
Height    DB/WR vs LB/DE/TE   332.3727022    -2.8882353    <.0001
Height    DB/WR vs OL/DT      606.3480271    -3.9293312    <.0001
Height    DB/WR vs RB         107.0115459    2.03744038    <.0001
Height    LB/DE/TE vs OL/DT   40.0962606     -1.0410959    0.0012
Height    LB/DE/TE vs RB      601.1413339    4.92567568    <.0001
Height    OL/DT vs RB         874.1998387    5.96677157    <.0001
Table 12: Contrasts & Estimates
Appendix 2: Figures
Figure 1: Initial Group Frequency Distribution
Figure 2: Initial Group Variable Profile Plot
Figure 3: Forty Yard Time Histogram (in seconds)
Figure 4: Weight Histogram (in pounds)
Figure 5: Bench Press Histogram (# of reps)
Figure 6: Vertical Jump Histogram (in inches)
Figure 7: Hand Length Histogram (in inches)
Figure 8: Height Histogram (in inches)
Figure 9: Arms Histogram (in inches)
Figure 10: Vertical Jump Boxplot (in inches)
Figure 11: Final Group Frequency Distribution
Figure 12: Final Group Variable Profile Plot
Appendix 3: SAS Code
*============================================================================
============================================*
Create Library and
Read Data to the Library
*============================================================================
============================================*;
libname C13 "ClientF$Stat ClassesCurrentMultivariate Data AnalysisProject1";
proc import datafile="ClientF$Stat ClassesCurrentMultivariate Data AnalysisProject1combine.csv"
out=combine
dbms=csv
replace;
getnames=yes;
run;
data C13.combine;
set combine;
run;
*============================================================================
============================================*
Variable Audit
*============================================================================
============================================*;
proc means data = C13.combine;
run;
*============================================================================
============================================*
Set all other 0 Values to missing
*============================================================================
============================================*;
data C13.combine_2 (drop = i);
set C13.combine;
array var{*} arms hands fortyyd twentyyd tenyd twentyss threecone
vertical broad bench round pickround picktotal wonderlic;
/* Loop over every array element and treat recorded zeros as missing */
do i = 1 to dim(var);
if var{i} = 0 then var{i} = . ;
end;
run;
proc means data = C13.combine_2 n nmiss min max mean std;
run;
data C13.combine_2 (drop = wonderlic twentyyd threecone twentyss);
set C13.combine_2;
run;
*============================================================================
============================================*
Use a transpose to identify individuals
that have several missing values.
*============================================================================
============================================*;
data temp (drop = college firstname lastname name pick pickround picktotal
round year) ;
set C13.combine_2;
run;
proc transpose data = temp out = transpose;
run;
proc means data = transpose n nmiss;
run;
*============================================================================
============================================*
Remove Individuals with more than 33%
missing values.
*============================================================================
============================================*;
data C13.combine_3;
set C13.combine_2;
if id = 9225 or id = 8984 or id = 9107 or id = 9140 then delete;
run;
proc means data = C13.combine_3 n nmiss;
run;
*============================================================================
============================================*
Need to impute the following variables:
fortyyd tenyd vertical broad bench
Regression Imputation: use the position dummy
variables as the only predictors
Run Regression Imputation on all 5 to get in one dataset
*============================================================================
============================================*;
proc freq data = C13.combine_3;
tables position;
run;
*** Create Dummy Variables for Position with QB the base ***;
data C13.combine_3;
set C13.combine_3;
/* A logical comparison returns 1 (true) or 0 (false) */
CB = (position = "CB");
DE = (position = "DE");
DT = (position = "DT");
FS = (position = "FS");
IL = (position = "IL");
OC = (position = "OC");
OG = (position = "OG");
OL = (position = "OL");
OT = (position = "OT");
WR = (position = "WR");
RB = (position = "RB");
SS = (position = "SS");
TE = (position = "TE");
run;
*** Regression Imputation ***;
proc reg data = C13.combine_3;
model fortyyd = CB DE DT FS IL OC OG OL OT WR RB SS TE / VIF;
output out=impute_1 p=predicted_fortyyd;
run;
quit;
proc reg data = impute_1;
model tenyd = CB DE DT FS IL OC OG OL OT WR RB SS TE / VIF;
output out=impute_2 p=predicted_tenyd;
run;
quit;
proc reg data = impute_2;
model vertical = CB DE DT FS IL OC OG OL OT WR RB SS TE / VIF;
output out=impute_3 p=predicted_vertical;
run;
quit;
proc reg data = impute_3;
model Broad = CB DE DT FS IL OC OG OL OT WR RB SS TE / VIF;
output out=impute_4 p=predicted_broad;
run;
quit;
proc reg data = impute_4;
model Bench = CB DE DT FS IL OC OG OL OT WR RB SS TE / VIF;
output out=impute_5 p=predicted_bench;
run;
quit;
data C13.combine_Imputation_GK;
set impute_5;
/*=====================================================
fortyyd_2, vertical_2, etc. hold the observed value when present, otherwise the regression prediction
*=====================================================*/
if fortyyd = . then fortyyd_2 = predicted_fortyyd;
else fortyyd_2 = fortyyd;
if tenyd = . then tenyd_2 = predicted_tenyd;
else tenyd_2 = tenyd;
if vertical = . then vertical_2 = predicted_vertical;
else vertical_2 = vertical;
if broad = . then broad_2 = predicted_broad;
else broad_2 = broad;
if bench = . then bench_2 = predicted_bench;
else bench_2 = bench;
run;
*============================================================================
===============*
Remove unnecessary variables and create the groups.
*============================================================================
==============*;
data master;
set C13.combine_imputation_gk;
run;
proc freq data = master;
table position;
run;
data master_2 (keep= id name position group weight arms hands fortyYd tenyd
vertical broad bench
heightinchestotal fortyyd_2 tenyd_2
vertical_2 broad_2 bench_2);
set master;
if position = "QB" then delete;
else if Position = "DE" then Group = "DL";
else if Position = "DT" then Group = "DL";
else if Position = "IL" then Group = "LB";
else if Position = "OL" then Group = "LB";
else if Position = "CB" then Group = "DB";
else if Position = "SS" then Group = "DB";
else if Position = "FS" then Group = "DB";
else if Position = "OT" then Group = "OL";
else if Position = "OC" then Group = "OL";
else if Position = "OG" then Group = "OL";
else if Position = "TE" then Group = "LB";
else if Position = "RB" then Group = "RB";
else if Position = "WR" then Group = "WR";
else group = "";
run;
proc freq data = master_2;
tables position*group;
run;
data C13.master;
set master_2;
run;
*============================================================================
===============*
Profile Analysis
*============================================================================
==============*;
*** Standardize the values for each possible Y ***;
proc means data = C13.master;
var weight arms hands fortyyd_2 tenyd_2 vertical_2 broad_2 bench_2
heightinchestotal;
output out = standard mean = avg_weight avg_arms avg_hands avg_forty
avg_ten avg_vert avg_broad avg_bench avg_height
std = std_weight std_arms std_hands std_forty
std_ten std_vert std_broad std_bench std_height;
run;
proc sql;
create table standard_2 as
select *
from C13.master, standard;
quit;
data standard_3 (drop= weight arms hands fortyyd_2 tenyd_2 vertical_2 broad_2
bench_2 heightinchestotal
avg_weight avg_arms avg_hands avg_forty
avg_ten avg_vert avg_broad avg_bench avg_height
std_weight std_arms std_hands std_forty
std_ten std_vert std_broad std_bench std_height
_type_ _freq_ fortyyd tenyd vertical broad
bench name position id);
set standard_2;
s_weight = (weight-avg_weight)/std_weight;
s_arms = (arms-avg_arms)/std_arms;
s_hands = (hands-avg_hands)/std_hands;
s_forty = (fortyyd_2-avg_forty)/std_forty;
s_ten = (tenyd_2-avg_ten)/std_ten;
s_vert = (vertical_2-avg_vert)/std_vert;
s_broad = (broad_2-avg_broad)/std_broad;
s_bench = (bench_2-avg_bench)/std_bench;
s_height = (heightinchestotal-avg_height)/std_height;
run;
*** Obtain the average of the standardized values and plot per group ***;
proc means data = standard_3 nway; /* nway: output only the per-group means, not the overall row */
class group;
var s_weight s_arms s_hands s_forty s_ten s_vert s_broad s_bench
s_height;
output out = temp mean = avg_weight avg_arms avg_hands avg_forty
avg_ten avg_vert avg_broad avg_bench avg_height;
run;
data temp2 (drop= _freq_ _type_);
set temp;
run;
proc transpose data = temp2 out=trans;
by group;
run;
proc format;
value varfmt
1 = "Weight"
2 = "Arms"
3 = "Hands"
4 = "Forty"
5 = "Ten"
6 = "Vert"
7 = "Broad"
8 = "Bench"
9 = "Height";
run;
data temp3;
set trans;
if _name_ = "avg_weight" then name = 1;
else if _name_ = "avg_arms" then name = 2;
else if _name_ = "avg_hands" then name = 3;
else if _name_ = "avg_forty" then name = 4;
else if _name_ = "avg_ten" then name = 5;
else if _name_ = "avg_vert" then name = 6;
else if _name_ = "avg_broad" then name = 7;
else if _name_ = "avg_bench" then name = 8;
else if _name_ = "avg_height" then name = 9;
else name = 10;
format name varfmt.;
run;
symbol1 interpol=join value=dot;
proc gplot data = temp3;
plot col1*name=group;
run;
*** Check correlations for vert and broad and ten and forty ***;
proc corr data = C13.master;
var vertical_2 broad_2;
run;
proc corr data = C13.master;
var fortyyd_2 tenyd_2;
run;
*** Drop broad_2 and tenyd_2 (and the raw broad and tenyd) ***;
data C13.master_2 (drop= broad_2 tenyd_2 broad tenyd);
set C13.master;
run;
*============================================================================
============================================*
Multivariate Normality Check: Mardia's Kurtosis / Skewness
*============================================================================
============================================*;
%let newinpt= vertical_2 bench_2 hands heightinchestotal;
proc iml;
use C13.master_2;
read all var {&newinpt} into y;
n = nrow(y) ;
p = ncol(y) ;
dfchi = p*(p+1)*(p+2)/6 ;
q = i(n) - (1/n)*j(n,n,1);
s = (1/(n))*y`*q*y ; s_inv = inv(s) ;
g_matrix = q*y*s_inv*y`*q;
beta1hat = ( sum(g_matrix#g_matrix#g_matrix) )/(n*n);
beta2hat =trace( g_matrix#g_matrix )/n ;
k=(p+1)*(n+1)*(n+3)/(n*((n+1)*(p+1)-6));
kappa1 = n*beta1hat*k/6 ;
kappa2 = (beta2hat - p*(p+2) ) /sqrt(8*p*(p+2)/n) ;
pvalskew = 1 - probchi(kappa1,dfchi) ;
pvalkurt = 2*( 1 - probnorm(abs(kappa2)) );
print s ;
print s_inv ;
print 'TESTS:';
print 'Based on skewness: ' beta1hat kappa1 pvalskew ;
print 'Based on kurtosis: ' beta2hat kappa2 pvalkurt;
quit;
*** Macro to look at Univariate Normality ***;
%Macro Hist(var= );
proc univariate data = C13.master_2;
var &var;
histogram;
run;
%Mend;
%Hist (var=fortyyd_2);
%Hist (var=vertical_2);
%Hist (var=bench_2);
%Hist (var=heightinchestotal);
%Hist (var=weight);
%Hist (var=arms);
%Hist (var=hands);
*** Ran several iterations of this test to get a set of variables that
are multivariate normal ***;
data C13.master_3 (drop= fortyyd vertical bench fortyyd_2 weight arms);
set C13.master_2;
run;
*============================================================================
============================================*
Covariance Matrix Structure
*============================================================================
============================================*;
proc discrim data = C13.master_3 pool=test;
class group;
var vertical_2 bench_2 hands heightinchestotal;
run;
*** This assumption is highly violated. Try to group differently ***;
data regroup;
set C13.master_3;
if position = "QB" then delete;
else if Position = "DE" then group_2 = "LB/DE/TE";
else if Position = "DT" then group_2 = "OL/DT";
else if Position = "IL" then group_2 = "LB/DE/TE";
else if Position = "OL" then group_2 = "LB/DE/TE";
else if Position = "CB" then group_2 = "DB/WR";
else if Position = "SS" then group_2 = "DB/WR";
else if Position = "FS" then group_2 = "DB/WR";
else if Position = "OT" then group_2 = "OL/DT";
else if Position = "OC" then group_2 = "OL/DT";
else if Position = "OG" then group_2 = "OL/DT";
else if Position = "TE" then group_2 = "LB/DE/TE";
else if Position = "RB" then group_2 = "RB";
else if Position = "WR" then group_2 = "DB/WR";
else group_2 = "";
run;
proc discrim data = regroup pool=test;
class group_2;
var vertical_2 bench_2 hands heightinchestotal;
run;
data C13.master_4;
set regroup;
run;
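PROC DISCRIM with POOL=TEST reports a chi-square test of equal within-group covariance matrices. This Python sketch, not part of the SAS program, computes Box's M with its usual chi-square approximation to illustrate that kind of test. It is an assumption that this matches SAS's computation in every detail, so treat it as a check on the logic and the degrees of freedom rather than a replacement. The degrees of freedom, (k - 1)p(p + 1)/2, give 50 for the six initial groups and 30 for the four final groups with p = 4, matching the DF reported in Tables 7 and 8. The helper names are hypothetical.

```python
import math


def _det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in a]
    n, det = len(m), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[piv][c] == 0.0:
            return 0.0
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [rv - f * cv for rv, cv in zip(m[r], m[c])]
    return det


def _cov(rows):
    """Unbiased (n - 1 divisor) covariance matrix of a list of rows."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def box_m_df(k, p):
    """Degrees of freedom of the chi-square approximation: (k - 1) p (p + 1) / 2."""
    return (k - 1) * p * (p + 1) // 2


def box_m(groups):
    """Box's M chi-square for a list of groups (each a list of rows); returns (chi2, df)."""
    k, p = len(groups), len(groups[0][0])
    ns = [len(g) for g in groups]
    big_n = sum(ns)
    covs = [_cov(g) for g in groups]
    pooled = [[sum((n - 1) * s[a][b] for n, s in zip(ns, covs)) / (big_n - k)
               for b in range(p)] for a in range(p)]
    m_stat = (big_n - k) * math.log(_det(pooled)) - sum(
        (n - 1) * math.log(_det(s)) for n, s in zip(ns, covs))
    c = ((sum(1.0 / (n - 1) for n in ns) - 1.0 / (big_n - k))
         * (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (k - 1)))
    return (1.0 - c) * m_stat, box_m_df(k, p)


print(box_m_df(6, 4), box_m_df(4, 4))   # 50 30
```

M is zero when every group has the same sample covariance matrix and grows as the group matrices diverge. A large chi-square, as in Table 7, therefore argues against pooling.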
*============================================================================*
Redo Profile Analysis Based on New Groups
*============================================================================*;
data new_standard;
set c13.master;
if position = "QB" then delete;
else if Position = "DE" then group_2 = "LB/DE/TE";
else if Position = "DT" then group_2 = "OL/DT";
else if Position = "IL" then group_2 = "LB/DE/TE";
else if Position = "OL" then group_2 = "LB/DE/TE";
else if Position = "CB" then group_2 = "DB/WR";
else if Position = "SS" then group_2 = "DB/WR";
else if Position = "FS" then group_2 = "DB/WR";
else if Position = "OT" then group_2 = "OL/DT";
else if Position = "OC" then group_2 = "OL/DT";
else if Position = "OG" then group_2 = "OL/DT";
else if Position = "TE" then group_2 = "LB/DE/TE";
else if Position = "RB" then group_2 = "RB";
else if Position = "WR" then group_2 = "DB/WR";
else group_2 = "";
run;
*** Standardize the values for each possible Y ***;
proc means data = new_standard;
var weight arms hands fortyyd_2 tenyd_2 vertical_2 broad_2 bench_2
heightinchestotal;
output out = standard mean = avg_weight avg_arms avg_hands avg_forty
avg_ten avg_vert avg_broad avg_bench avg_height
std = std_weight std_arms std_hands std_forty
std_ten std_vert std_broad std_bench std_height;
run;
proc sql;
create table standard_2 as
select *
from new_standard, standard;
quit;
data standard_3 (drop= weight arms hands fortyyd_2 tenyd_2 vertical_2 broad_2
bench_2 heightinchestotal
avg_weight avg_arms avg_hands avg_forty
avg_ten avg_vert avg_broad avg_bench avg_height
std_weight std_arms std_hands std_forty
std_ten std_vert std_broad std_bench std_height
_type_ _freq_ fortyyd tenyd vertical broad
bench name position id);
set standard_2;
s_weight = (weight-avg_weight)/std_weight;
s_arms = (arms-avg_arms)/std_arms;
s_hands = (hands-avg_hands)/std_hands;
s_forty = (fortyyd_2-avg_forty)/std_forty;
s_ten = (tenyd_2-avg_ten)/std_ten;
s_vert = (vertical_2-avg_vert)/std_vert;
s_broad = (broad_2-avg_broad)/std_broad;
s_bench = (bench_2-avg_bench)/std_bench;
s_height = (heightinchestotal-avg_height)/std_height;
run;
*** Obtain the average of the standardized values and plot per group ***;
proc means data = standard_3 nway; /* NWAY keeps only the per-group rows, not the overall row */
class group_2;
var s_weight s_arms s_hands s_forty s_ten s_vert s_broad s_bench
s_height;
output out = temp mean = avg_weight avg_arms avg_hands avg_forty
avg_ten avg_vert avg_broad avg_bench avg_height;
run;
data temp2 (drop= _freq_ _type_);
set temp;
run;
proc transpose data = temp2 out=trans;
by group_2;
run;
proc format;
value varfmt
1 = "Weight"
2 = "Arms"
3 = "Hands"
4 = "Forty"
5 = "Ten"
6 = "Vert"
7 = "Broad"
8 = "Bench"
9 = "Height";
run;
data temp3;
set trans;
if _name_ = "avg_weight" then name = 1;
else if _name_ = "avg_arms" then name = 2;
else if _name_ = "avg_hands" then name = 3;
else if _name_ = "avg_forty" then name = 4;
else if _name_ = "avg_ten" then name = 5;
else if _name_ = "avg_vert" then name = 6;
else if _name_ = "avg_broad" then name = 7;
else if _name_ = "avg_bench" then name = 8;
else if _name_ = "avg_height" then name = 9;
else name = 10;
format name varfmt.;
run;
symbol1 interpol=join value=dot;
proc gplot data = temp3;
plot col1*name=group_2;
run;
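The standardize-then-average logic of this profile analysis can be sketched in Python, outside the SAS program. Each response is converted to z-scores with the sample (n - 1) standard deviation, which PROC MEANS reports by default. The z-scores are then averaged within each group, so every point on the profile plot is a group mean in standard-deviation units. The function names and the numbers in the usage line are hypothetical.

```python
import statistics
from collections import defaultdict


def standardize(values):
    """z-scores using the sample (n - 1) standard deviation, as PROC MEANS reports STD."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def profile(groups, columns):
    """Mean z-score per group for each response.

    groups  -- one group label per observation
    columns -- dict: response name -> list of raw values (same order as groups)
    Returns {group: {response: mean z-score}}.
    """
    out = defaultdict(dict)
    for name, values in columns.items():
        by_group = defaultdict(list)
        for g, z in zip(groups, standardize(values)):
            by_group[g].append(z)
        for g, zs in by_group.items():
            out[g][name] = statistics.mean(zs)
    return dict(out)


# Hypothetical numbers: two players per group, two responses.
groups = ["DB/WR", "DB/WR", "OL/DT", "OL/DT"]
print(profile(groups, {"bench": [10, 14, 30, 34], "vert": [38, 36, 28, 26]}))
```

Because every response is on the same standardized scale, groups whose lines run parallel differ only in level, while crossing lines mark responses on which the groups trade places.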
*** Profile analysis points to the same response variables to remove.
Move on to outlier detection and MANOVA ***;
*============================================================================*
Check for Outliers
*============================================================================*;
%INCLUDE "ClientF$Stat ClassesCurrentMultivariate Data AnalysisProject1mnorm.sas";
*EXAMPLE 1;
%MNORM(DATA=C13.master_4,CLASS=Group_2 ,RESPONSE=vertical_2 bench_2 hands
heightinchestotal ,ID=id)
proc means data = C13.master_4_mnorm mean median std;
var MNORM_SMD;
run;
*** Mean is about 3.94 and STD is about 3.07 ***;
data outlier;
set C13.master_4_mnorm;
if MNORM_SMD > 3.94 + (3*3.07) then Outlier = 1;
else outlier = 0;
run;
proc sort data = outlier;
by descending MNORM_SMD;
run;
proc print data = outlier (obs=20);
var ID name MNORM_SMD outlier;
run;
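This Python sketch, not part of the SAS program, illustrates the outlier rule above: an observation is flagged when its distance exceeds the mean plus three standard deviations of all distances. It assumes MNORM_SMD is a squared Mahalanobis distance and measures it from the overall mean with the (n - 1) sample covariance. The %MNORM macro run with CLASS= may use group-specific quantities instead. With this covariance, the average squared distance is exactly p(n - 1)/n, close to p = 4 here, which is in line with the observed mean of about 3.94. The cutoff hard-coded above, 3.94 + 3(3.07), is about 13.15. The function names are hypothetical.

```python
import statistics


def _inverse(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mahalanobis_sq(rows):
    """Squared distance of each row from the overall mean, using the (n - 1) covariance.
    Illustration only: the %MNORM macro may define its distances differently."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    z = [[r[j] - mu[j] for j in range(p)] for r in rows]
    s = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]
    s_inv = _inverse(s)
    return [sum(r[a] * s_inv[a][b] * r[b] for a in range(p) for b in range(p)) for r in z]


def flag_outliers(d2, k=3.0):
    """Flag distances above mean + k * SD (sample SD, as PROC MEANS reports)."""
    cut = statistics.mean(d2) + k * statistics.stdev(d2)
    return cut, [v > cut for v in d2]
```

Usage: `cut, flags = flag_outliers(mahalanobis_sq(rows))` returns the cutoff and one flag per observation. Hard-coding the mean and SD, as the SAS step does, works only as long as the data do not change; computing them in the same pass avoids a stale cutoff.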
*** Limited Outliers (only 5) Assumption met ***;
*============================================================================*
Profile Analysis Pre-MANOVA
*============================================================================*;
*** Standardize the values for each possible Y ***;
proc means data = C13.master_4;
var hands vertical_2 bench_2 heightinchestotal;
output out = standard mean = avg_hands avg_vert avg_bench avg_height
std = std_hands std_vert std_bench
std_height;
run;
proc sql;
create table standard_2 as
select *
from C13.master_4, standard;
quit;
data standard_3;
set standard_2;
s_hands = (hands-avg_hands)/std_hands;
s_vert = (vertical_2-avg_vert)/std_vert;
s_bench = (bench_2-avg_bench)/std_bench;
s_height = (heightinchestotal-avg_height)/std_height;
run;
*** Obtain the average of the standardized values and plot per group ***;
proc means data = standard_3 nway; /* NWAY keeps only the per-group rows, not the overall row */
class group_2;
var s_hands s_vert s_bench s_height;
output out = temp mean = avg_hands avg_vert avg_bench avg_height;
run;
data temp2 (drop= _freq_ _type_);
set temp;
run;
proc transpose data = temp2 out=trans;
by group_2;
run;
proc format;
value re_varfmt
1 = "Hands"
2 = "Vert"
3 = "Bench"
4 = "Height";
run;
data temp3;
set trans;
if _name_ = "avg_hands" then name = 1;
else if _name_ = "avg_vert" then name = 2;
else if _name_ = "avg_bench" then name = 3;
else if _name_ = "avg_height" then name = 4;
format name re_varfmt.;
run;
symbol1 interpol=join value=dot;
proc gplot data = temp3;
plot col1*name=group_2;
run;
*============================================================================*
MANOVA
*============================================================================*;
proc sort data = C13.master_4 out=test;
by group_2;
run;
/*==================*
Order of Groups
"DB/WR"
"LB/DE/TE"
"OL/DT"
"RB"
*==================*/
proc glm data = C13.master_4;
class group_2;
model vertical_2 bench_2 hands heightinchestotal = group_2;
manova h = group_2;
contrast "DB/WR vs LB/DE/TE" group_2 1 -1 0 0;
contrast "DB/WR vs OL/DT" group_2 1 0 -1 0;
contrast "DB/WR vs RB" group_2 1 0 0 -1;
contrast "LB/DE/TE vs OL/DT" group_2 0 1 -1 0;
contrast "LB/DE/TE vs RB" group_2 0 1 0 -1;
contrast "OL/DT vs RB" group_2 0 0 1 -1;
MANOVA H = _ALL_; /* also gives multivariate tests for the CONTRAST statements above */
estimate "DB/WR vs LB/DE/TE" group_2 1 -1 0 0;
estimate "DB/WR vs OL/DT" group_2 1 0 -1 0;
estimate "DB/WR vs RB" group_2 1 0 0 -1;
estimate "LB/DE/TE vs OL/DT" group_2 0 1 -1 0;
estimate "LB/DE/TE vs RB" group_2 0 1 0 -1;
estimate "OL/DT vs RB" group_2 0 0 1 -1;
run;
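Two relationships behind the PROC GLM output can be checked by hand with a short Python sketch that is not part of the SAS program. First, all four MANOVA criteria are functions of the eigenvalues of E^-1 H. The Roy root of 4.6160624, out of a Hotelling-Lawley trace of 5.1503363 in Table 9, is the 89.63% share cited in the Results. Second, a univariate contrast has estimate L = sum(c_i * ybar_i) and sum of squares L^2 / sum(c_i^2 / n_i). Because that divisor depends only on the group sizes, L^2/SS should be the same for a given contrast across all four responses; the DB/WR vs OL/DT rows of Table 12 each give about 0.02546. The function names are hypothetical.

```python
import math


def manova_criteria(eigs):
    """Wilks, Pillai, Hotelling-Lawley and Roy statistics from the eigenvalues of E^-1 H."""
    return {
        "wilks": math.prod(1.0 / (1.0 + lam) for lam in eigs),
        "pillai": sum(lam / (1.0 + lam) for lam in eigs),
        "hotelling_lawley": sum(eigs),
        "roy": max(eigs),
    }


def contrast(means, ns, c, mse=None):
    """Estimate, sum of squares and (optionally) F for one univariate contrast.

    means, ns -- group means and group sizes in CLASS level order
    c         -- contrast coefficients (summing to zero)
    mse       -- error mean square from the univariate ANOVA, if an F is wanted
    """
    est = sum(ci * m for ci, m in zip(c, means))
    ss = est ** 2 / sum(ci ** 2 / n for ci, n in zip(c, ns))
    f = ss / mse if mse is not None else None   # 1-df contrast: F = SS / MSE
    return est, ss, f


# Roy's share of the Hotelling-Lawley trace in Table 9.
print(4.6160624 / 5.1503363)

# Table 12, DB/WR vs OL/DT: estimate**2 / SS = 1/n(DB/WR) + 1/n(OL/DT) for every response.
table12 = {"Vertical": (8.69776183, 2970.979953), "Bench": (-11.806786, 5474.545985),
           "Hands": (-0.6988316, 19.17916754), "Height": (-3.9293312, 606.3480271)}
for name, (est, ss) in table12.items():
    print(name, round(est ** 2 / ss, 5))
```

The second check confirms that the four Table 12 rows for the same contrast were computed on the same observations. If the ratios disagreed, that would point to missing values differing across responses.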

NFL 2013 Combine Data Multivariate Analysis

  • 1. Page 1 of 31 NFL 2013 Combine Data Multivariate Analysis John Michael Croft, Brian Ginburg, Gary Keller and William Ward Kennesaw State University
  • 2. Page 2 of 31 Abstract The purpose of this research is to examine the difference in multiple response variables between groups of player positions via multivariate methods. Due to exploratory analyses and data cleansing seeking to reduce multicolinearity among response variables, the final analysis suggests multivariate normality reducing the probability of Type I errors when compared with a series of univariate analyses of variances. The analysis provides strong evidence of significant differences between groups across multiple response variables. Contrasts are utilized to highlight the most significant differences between Group1 (FS; SS; CB; WR) vs Group 3 (OT; OC; OG; DT) in response variables: Hands, Bench, Vertical (-.7inches, -11.87 reps, 8.7 inches, respectively, on average) and Group 3 (OLB, ILB, DE, TE) vs Group 4 (RB) in response variable: Height (5.97 inches on average).
  • 3. Page 3 of 31 Exploratory Multivariate Analysis of the NFL Combine Data The purpose of this analysis is to report findings from 2013 NFL Combine data using a multivariate approach. All charts, graphs, figures, &c… can be found in the appendices at the end of the analysis while some have been placed within the body to emphasize the importance of the topic being addressed. Since 1982, the NFL Combine (an invitation only event) evaluates college football players’ physical abilities and mental awareness. NFL teams use the results to make targeted evaluations of draft prospects. Table 1 contains the original dataset variables, a brief description, general and specific types, and measurement units. Player positions form the basis of this analysis. Kickers (K), Long snappers (LS), and Punters (P) are not found in the 2013 data subset, while Quarterbacks (QB) have been omitted due to lack of observations (n=14<20). Table A displays the initial groups (A - F) prior to the exploratory analysis and final groups (1 - 4) after the exploratory analysis. The initial groups above are based on an assumption that players at similar positions have similar attributes. Tight Ends have been arbitrarily assigned to Group C primarily for group sample size consistency as well as expecting similar attributes (e.g. height, weight, &c...). The final groups above will be discussed later but reclassify certain positions to better align with FS FS SS SS CB CB DE WR DT DE LB LB TE TE OT OT OG OG OC OC Group E WR DT Group F RB Group 4 RB Table A: Player Position Groupings Group 2 Group B Group C Group D Group 3 Group A Initial Groups Final Groups Group 1
  • 4. Page 4 of 31 adjusted expectations after the exploratory analysis. Significant differences in response variables due to perceived group attribute differences (e.g. big v. small; fast v. slow; short v. tall) were expected. Figure 1 shows approximately equal initial group sizes. The global hypothesis expects significant group differences in at least one response variable. Data Cleansing The following variables are considered redundant or inconsequential and have been omitted from this analysis: College, FirstName, HeightFeet, HeightInches, LastName, Name, Pick, PickRound, PickTotal, Round, and Year. Missing values are assumed missing at random and have been set to missing to observe percent missing per variable and per observation (see Tables 2 & 3). Variables missing more than 20% were omitted from the analysis: Wonderlic, TwentyYD, ThreeCone, TwentySS. Observations missing more than 33.34% were omitted from the analysis: ID #’s 9225, 8984, 9107, 9140. All remaining missing values were imputed via linear regression (by position) due to the Central Limit Theorem (n>30) assuming normality. While moderate response variable correlations are desirable, significant correlations (>.7) were examined to reduce multicolinearity and increase the power of the analysis. Table 4 shows all possible correlations with significant correlations highlighted. All response variables, other than Hands and Bench, are significantly correlated with at least one other response variable. In conjunction with evaluating standardized effect sizes (Figure 2), Broad and TenYd have been omitted from further analysis. Acknowledging FortyYD has marginally higher correlations than TenYD, assumed industry preference is to keep FortyYD in the analysis.
  • 5. Page 5 of 31 Figure 2: Initial Group Variable Profile Plot Assumptions The initial Mardia’s test (Table 5) suggests non-multivariate normality in the symmetry (p = .003) with marginal multivariate normality in the distributional spread (p = .133). Attempting to refine the analysis, individual variables were examined for univariate normality (Figures 3 - 9). Weight (bimodal), FortyYD (skewed), and Arms (skewed) were omitted from further analysis due to apparent non-univariate normality. The final Mardia’s test (Table 6) suggests multivariate normality in both symmetry (p = .293) and distributional spread (p = .428). Weight Arms Hands fortyyd_2 tenyd_2 vertical_2 broad_2 bench_2 HeightInc hesTotal Weight 1 0.63631 0.5426 0.88516 0.87387 -0.75372 -0.77584 0.64951 0.71823 Arms 0.63631 1 0.53112 0.48541 0.48194 -0.33836 -0.31332 0.23887 0.76487 Hands 0.5426 0.53112 1 0.46659 0.4461 -0.31323 -0.33992 0.37235 0.5269 fortyyd_2 0.88516 0.48541 0.46659 1 0.93863 -0.8223 -0.85232 0.49038 0.55655 tenyd_2 0.87387 0.48194 0.4461 0.93863 1 -0.81002 -0.83432 0.48381 0.57244 vertical_2 -0.7537 -0.3384 -0.3132 -0.8223 -0.81 1 0.89585 -0.36448 -0.41117 broad_2 -0.7758 -0.3133 -0.3399 -0.85232 -0.8343 0.89585 1 -0.40607 -0.41073 bench_2 0.64951 0.23887 0.37235 0.49038 0.48381 -0.36448 -0.40607 1 0.2994 HeightInchesTotal 0.71823 0.76487 0.5269 0.55655 0.57244 -0.41117 -0.41073 0.2994 1 Table 4: Pearson Correlation Coefficients COL1 -2 -1 0 1 2 name Weight Arms Hands Forty Ten Vert Broad Bench Height Group DB DL LB OL RB WR Test Estimate Stat pval Skewness 4.832383 220.3613 0.002581 Kurtosis 101.5743 1.503087 0.132817 Table 5: Initial Mardia's Test Test Estimate Stat pval Skewness 0.501283 22.90982 0.293245 Kurtosis 23.33105 -0.793283 0.427613 Table 6: Final Mardia's Test
  • 6. Page 6 of 31 At this time the reader is reminded of and encouraged to review Table A, delineating the initial groups (A - F) from the final groups (1 - 4). Figure 10 suggests concerns with variance homogeneity between the initial groups - the Vertical boxplot is provided as an example. Other variables’ boxplots suggest similar concerns but have been omitted as redundant. Table 7 supports nonhomogeneous variance between the initial groups (p < .001). Players were reclassified into final groups (1-4) attemptimg to correct for non- homogeneous variance. Group 1 is a combination of Groups A plus E; Group 2 is a combination of Group C plus DE; Group 3 is a combination of Group D plus DT; Group 4 is the same as Group F. Group sample sizes remain similar (Figure 11). Table 8 supports variance homogeneity between final groups (p < .552). Observations are assumed independent from each other as players are measured separately from one another (i.e. One player’s results do not influence another player’s results.) Univariate independence is assumed suggesting multivariate independence can be assumed. Mahalanobis distances were calculated per observation. An upper limit of 13 was approximated using the mean and adding three standard deviations (3.9 + 3*(2.9)) to determine outliers. Five outliers were detected but were not removed due to low marginal impact on the analysis. Chi-Square DF Pr > ChiSq 113.146532 50 <.0001 Table 7: MVN Variance Test Chi-Square DF Pr > ChiSq 28.352169 30 0.5518 Table 8: MVN Variance Test
  • 7. Page 7 of 31 Results Tables 9 & 10 contain multivariate analysis of variance test criteria, F-stat approximations, and characteristic roots. A Wilk’s lambda of .113 indicates at least one group is significantly different from another for at least one response variable (p < .0001), rejecting the null hypothesis. Consideration could be given to evaluating our model in one dimension with a single variable dominating the model (89.63% characteristic root) suggesting Roy’s greatest root should be the test criteria utilized. However, all test criteria are satisfied to support rejecting the null hypothesis (p<.0001). Univariate analyses of variances were analyzed per response variables (Table 11). The univariate results indicate significant differences between groups per response variable, suggesting contrasts be analyzed per response variable. Figure 12 shows a standardized profile plot of the final groups across all remaining response variables to aid in determining which contrasts to examine. Figure 12: Final Group Variable Profile Plot Statistic Value F Value Num DF Den DF Pr > F Wilks' Lambda 0.1125846 74.43 12 696.12 <.0001 Pillai's Trace 1.2195636 45.38 12 795 <.0001 Hotelling-Lawley Trace 5.1503363 112.51 12 455.96 <.0001 Roy's Greatest Root 4.6160624 305.81 4 265 <.0001 Table 9: MANOVA Test Criteria & F Approximations NOTE: F Statistic for Roy's Greatest Root is an upper bound. COL1 -2 -1 0 1 name Hands Vert Bench Height group_2 DB/WR LB/DE/TE OL/DT RB
  • 8. Page 8 of 31 Table 12 summarizes all contrasts consider:  Vertical: All contrasts significantly different (all p values ≤ .01) except Group 2 vs Group 4 (p = 0.6245) with Group 1 vs Group 3 being most significant (SS = 2970.98, Estimate = 8.70).  Bench: All contrasts significantly different (all p values < .0001) except Group 2 vs Group 3 (p = .468) with Group 1 vs Group 3 being most significant (SS = 5474.55, Estimate = -11.81).  Hands: All contrasts significantly different (all p values ≤ .0243) except Group 2 vs Group 3 (p = .6897) with Group 1 vs. Group 3 being most significant (SS = 19.18, Estimate = -0.70).  Height: All contrasts significantly different (all p values ≤ .0012) with Group 3 vs. Group 4 being most significant (SS = 874.20, Estimate = 5.97). Contrast Contrast SS Estimate Pr > F DB/WR vs LB/DE/TE 149.132746 1.9346678 <.0001 DB/WR vs OL/DT 2970.979953 8.69776183 <.0001 DB/WR vs RB 72.29268 1.67462185 0.0014 LB/DE/TE vs OL/DT 1692.05026 6.76309403 <.0001 LB/DE/TE vs RB 1.675503 -0.260046 0.6245 OL/DT vs RB 1211.140559 -7.02314 <.0001 DB/WR vs LB/DE/TE 2159.701381 -7.3623549 <.0001 DB/WR vs OL/DT 5474.545985 -11.806786 <.0001 DB/WR vs RB 1132.773171 -6.6288939 <.0001 LB/DE/TE vs OL/DT 730.726452 -4.4444314 <.0001 LB/DE/TE vs RB 13.329045 0.733461 0.4677 OL/DT vs RB 658.321367 5.1778925 <.0001 DB/WR vs LB/DE/TE 10.55147097 -0.5146078 <.0001 DB/WR vs OL/DT 19.17916754 -0.6988316 <.0001 DB/WR vs RB 0.03911035 0.03895072 0.6897 LB/DE/TE vs OL/DT 1.25549104 -0.1842237 0.0243 LB/DE/TE vs RB 7.59227804 0.55355856 <.0001 OL/DT vs RB 13.36559713 0.7377823 <.0001 DB/WR vs LB/DE/TE 332.3727022 -2.8882353 <.0001 DB/WR vs OL/DT 606.3480271 -3.9293312 <.0001 DB/WR vs RB 107.0115459 2.03744038 <.0001 LB/DE/TE vs OL/DT 40.0962606 -1.0410959 0.0012 LB/DE/TE vs RB 601.1413339 4.92567568 <.0001 OL/DT vs RB 874.1998387 5.96677157 <.0001 Height Table 12: Contrasts & Estimates VerticalBenchHands
Conclusion
The analysis supports the hypothesized significant differences between groups of 2013 NFL Combine participants. The most significant differences are between Group 1 and Group 3 (Vertical, Bench, Hands): on average, defensive backs and wide receivers jump 8.70 inches higher, complete 11.81 fewer bench press repetitions, and have hands 0.70 inches shorter than offensive linemen and defensive tackles. This is expected given the nature of the positions in each group: defensive backs and wide receivers must be more athletic overall, running faster for longer and jumping higher to catch passes, while offensive linemen and defensive tackles need stamina and stability to pass block and run block while in constant contact with the opposing team. The most significant difference in height, however, is between Group 3 and Group 4: on average, running backs are 5.97 inches shorter than offensive linemen and defensive tackles. This is also expected: running backs must be mobile and agile to break tackles, hurdle defenders, and outrun the opposing team, while the demands on offensive linemen and defensive tackles were discussed above. Additionally, defensive tackles look to disrupt passing attempts with maximum vertical extension, which the additional 5.97 inches of height supports. Overall, the analysis provides strong evidence of significant differences between groups, primarily due to the athleticism commonly shared within each group, which produces similar within-group performance across response variables. We recommend that offensive linemen and defensive tackles focus primarily on stamina and stability, while defensive backs, wide receivers, and running backs focus more on mobility and agility. Linebackers, defensive ends, and tight ends should focus on some combination of stamina, stability, mobility, and agility, since versatility is required at those positions; we recommend that heavier players focus on stamina and stability while lighter players focus on mobility and agility. Although linear combinations were not compared, the groups partly achieve this organically by combining positions whose players have similar size, weight, and athleticism.

Future Research
Comparing the results of the current analysis with the same players' production over the first 2-5 years of their careers (for both drafted and undrafted participants) may be of interest, as may predicting the responses of future combine participants. We recommend that future studies focus on differences between drafted and undrafted combine participants on the same response variables. Additionally, focusing only on drafted combine participants would allow draft pick to be evaluated as an additional response variable.
Appendix 1: Tables

Table A: Player Position Groupings
Position  Initial Group  Final Group
FS        Group A        Group 1
SS        Group A        Group 1
CB        Group A        Group 1
DE        Group B        Group 2
DT        Group B        Group 3
LB        Group C        Group 2
TE        Group C        Group 2
OT        Group D        Group 3
OG        Group D        Group 3
OC        Group D        Group 3
WR        Group E        Group 1
RB        Group F        Group 4

Table 1: List of Variables in the NFL Combine Data
Variable Name  Description                       General Type  Specific Type        Measurement Units
Arms           Length of Arms                    Quantitative  Interval/Ratio       Inches
Bench          Number of 225 pound reps          Quantitative  Interval/Ratio       Number of reps
Broad          Broad Jump                        Quantitative  Interval/Ratio       Inches
College        College Attended                  Qualitative   Nominal              N/A
FirstName      First Name                        Categorical   Nominal              N/A
FortyYD        40 Yard Dash Time                 Quantitative  Interval/Ratio       Seconds
Hands          Length of Hands                   Quantitative  Interval/Ratio       Inches
HeightFeet     Height in Feet Only               Quantitative  Interval/Ratio       Feet
HeightInch     Height in Inches                  Quantitative  Interval/Ratio       Inches
HeightInches   Remaining Inches                  Quantitative  Interval/Ratio       Inches
ID             ID Number                         Quantitative  Identifier Variable  N/A
LastName       Last Name                         Categorical   Nominal              N/A
Name           Player's Name                     Categorical   Nominal              N/A
Pick           Pick Number in Round and Overall  Quantitative  Interval/Ratio       Pick in Round (Pick in Draft)
PickRound      Pick Number in Draft Round        Quantitative  Interval/Ratio       Pick Number in Round
PickTotal      Overall Draft Pick Number         Quantitative  Interval/Ratio       Pick Number in Overall Draft
Position       Primary Position                  Categorical   Nominal              N/A
Round          Draft Round Evaluated             Quantitative  Interval/Ratio       Round Number
TenYD          First 10 Yards                    Quantitative  Interval/Ratio       Seconds
ThreeCone      3 Cone Drill Time                 Quantitative  Interval/Ratio       Seconds
TwentySS       20 Yard Shuttle Time              Quantitative  Interval/Ratio       Seconds
TwentyYD       First 20 Yards                    Quantitative  Interval/Ratio       Seconds
Vertical       Vertical Jump                     Quantitative  Interval/Ratio       Inches
Weight         Weight in Pounds                  Quantitative  Interval/Ratio       Pounds
Wonderlic      Wonderlic Intelligence Score      Quantitative  Interval/Ratio       Score
Year           Combine Year                      Quantitative  Interval/Ratio       Year
Table 2: Variable Reduction (>25% missing)
Variable            N     N Miss   % Miss
Wonderlic           0     287      100.00%
TwentyYD            8     279      97.21%
ThreeCone           205   82       28.57%
TwentySS            219   68       23.69%
Bench               230   57       19.86%
TenYD               248   39       13.59%
Vertical            248   39       13.59%
Broad               255   32       11.15%
FortyYD             272   15       5.23%
Arms                286   1        0.35%
Hands               286   1        0.35%
id                  287   0        0.00%
Year                287   0        0.00%
HeightFeet          287   0        0.00%
HeightInches        287   0        0.00%
Weight              287   0        0.00%
HeightInchesTotal   287   0        0.00%

Table 3: Observation Reduction (>.334 Missing)
Individual   N    N Miss   % Miss
9225         6    6        50.00%
8984         7    5        41.67%
9107         7    5        41.67%
9140         7    5        41.67%
9007         8    4        33.33%
9012         8    4        33.33%
9018         8    4        33.33%
9028         8    4        33.33%
9037         8    4        33.33%
9043         8    4        33.33%
9058         8    4        33.33%
9064         8    4        33.33%
9065         8    4        33.33%
9083         8    4        33.33%
9095         8    4        33.33%
9139         8    4        33.33%
9185         8    4        33.33%
8972         9    3        25.00%
8977         9    3        25.00%
9009         9    3        25.00%
9175         9    3        25.00%
8966         10   2        16.67%
8983         10   2        16.67%
9001         10   2        16.67%

Table 4: Pearson Correlation Coefficients
                  Weight     Arms       Hands      fortyyd_2  tenyd_2    vertical_2 broad_2    bench_2    HeightInchesTotal
Weight            1          0.63631    0.5426     0.88516    0.87387    -0.75372   -0.77584   0.64951    0.71823
Arms              0.63631    1          0.53112    0.48541    0.48194    -0.33836   -0.31332   0.23887    0.76487
Hands             0.5426     0.53112    1          0.46659    0.4461     -0.31323   -0.33992   0.37235    0.5269
fortyyd_2         0.88516    0.48541    0.46659    1          0.93863    -0.8223    -0.85232   0.49038    0.55655
tenyd_2           0.87387    0.48194    0.4461     0.93863    1          -0.81002   -0.83432   0.48381    0.57244
vertical_2        -0.7537    -0.3384    -0.3132    -0.8223    -0.81      1          0.89585    -0.36448   -0.41117
broad_2           -0.7758    -0.3133    -0.3399    -0.85232   -0.8343    0.89585    1          -0.40607   -0.41073
bench_2           0.64951    0.23887    0.37235    0.49038    0.48381    -0.36448   -0.40607   1          0.2994
HeightInchesTotal 0.71823    0.76487    0.5269     0.55655    0.57244    -0.41117   -0.41073   0.2994     1
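Tables 2 and 4 reflect two screening rules: drop variables whose share of missing values exceeds 25%, and look for pairs of responses that are nearly collinear. The Python sketch below applies both rules to the counts and coefficients reported in those tables. The helper names and the 0.89 correlation cutoff are hypothetical (the paper does not state a numeric correlation cutoff); with that cutoff only the fortyyd_2/tenyd_2 and vertical_2/broad_2 pairs are flagged, which are the two pairs examined with PROC CORR in Appendix 3. The 25% rule flags Wonderlic, TwentyYD, and ThreeCone; TwentySS (23.69%) falls just below it, although Appendix 3 drops it as well.

```python
N_PLAYERS = 287

# N Miss counts from Table 2 (variables with no missing values omitted)
N_MISS = {
    "Wonderlic": 287, "TwentyYD": 279, "ThreeCone": 82, "TwentySS": 68,
    "Bench": 57, "TenYD": 39, "Vertical": 39, "Broad": 32,
    "FortyYD": 15, "Arms": 1, "Hands": 1,
}

# Pearson correlations from Table 4 (only the upper triangle is used)
NAMES = ["Weight", "Arms", "Hands", "fortyyd_2", "tenyd_2",
         "vertical_2", "broad_2", "bench_2", "HeightInchesTotal"]
CORR = [
    [1, 0.63631, 0.5426, 0.88516, 0.87387, -0.75372, -0.77584, 0.64951, 0.71823],
    [0.63631, 1, 0.53112, 0.48541, 0.48194, -0.33836, -0.31332, 0.23887, 0.76487],
    [0.5426, 0.53112, 1, 0.46659, 0.4461, -0.31323, -0.33992, 0.37235, 0.5269],
    [0.88516, 0.48541, 0.46659, 1, 0.93863, -0.8223, -0.85232, 0.49038, 0.55655],
    [0.87387, 0.48194, 0.4461, 0.93863, 1, -0.81002, -0.83432, 0.48381, 0.57244],
    [-0.7537, -0.3384, -0.3132, -0.8223, -0.81, 1, 0.89585, -0.36448, -0.41117],
    [-0.7758, -0.3133, -0.3399, -0.85232, -0.8343, 0.89585, 1, -0.40607, -0.41073],
    [0.64951, 0.23887, 0.37235, 0.49038, 0.48381, -0.36448, -0.40607, 1, 0.2994],
    [0.71823, 0.76487, 0.5269, 0.55655, 0.57244, -0.41117, -0.41073, 0.2994, 1],
]


def too_sparse(n_miss, n_total, cutoff=0.25):
    """Variables whose share of missing values exceeds the cutoff, sorted by name."""
    return sorted(name for name, k in n_miss.items() if k / n_total > cutoff)


def highly_correlated(names, corr, cutoff):
    """Pairs (i < j) whose absolute Pearson correlation exceeds the cutoff."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i][j]) > cutoff:
                pairs.append((names[i], names[j], corr[i][j]))
    return pairs


print(too_sparse(N_MISS, N_PLAYERS))
print(highly_correlated(NAMES, CORR, cutoff=0.89))  # hypothetical cutoff
```

Which member of a flagged pair to keep is a separate judgment call; the paper keeps vertical_2 and fortyyd_2 and drops broad_2 and tenyd_2.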
Table 5: Initial Mardia's Test
Test       Estimate   Stat       pval
Skewness   4.832383   220.3613   0.002581
Kurtosis   101.5743   1.503087   0.132817

Table 6: Final Mardia's Test
Test       Estimate   Stat        pval
Skewness   0.501283   22.90982    0.293245
Kurtosis   23.33105   -0.793283   0.427613

Table 7: MVN Variance Test (homogeneity of within-group covariance matrices, initial groups)
Chi-Square   DF   Pr > ChiSq
113.146532   50   <.0001

Table 8: MVN Variance Test (homogeneity of within-group covariance matrices, final groups)
Chi-Square   DF   Pr > ChiSq
28.352169    30   0.5518

Table 9: MANOVA Test Criteria & F Approximations
Statistic                 Value       F Value   Num DF   Den DF   Pr > F
Wilks' Lambda             0.1125846   74.43     12       696.12   <.0001
Pillai's Trace            1.2195636   45.38     12       795      <.0001
Hotelling-Lawley Trace    5.1503363   112.51    12       455.96   <.0001
Roy's Greatest Root       4.6160624   305.81    4        265      <.0001
NOTE: F Statistic for Roy's Greatest Root is an upper bound.

Table 10: Characteristic Roots and Vectors (characteristic vectors scaled so that V'EV = 1)
Characteristic Root   Percent   vertical_2   bench_2      Hands       HeightInchesTotal
4.61606237            89.63     -0.0184358   0.00785156   0.0187705   0.01626727
0.4222601             8.2       0.01059568   -0.0034606   0.0141656   0.02524815
0.1120138             2.17      0.01125293   0.00875417   0.0206764   -0.0031582
0                     0         -0.0011242   -0.0037193   0.1251432   -0.0135694

Table 11: Univariate Analysis of Variance
Variable            F Value   Pr > F
Vertical            156.76    <.0001
Bench               75.01     <.0001
Hands               36.46     <.0001
HeightInchesTotal   109.42    <.0001
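The PROC IML step in Appendix 3 that produced Tables 5 and 6 can be mirrored in Python. This is a sketch under stated assumptions: it needs NumPy, the function names are hypothetical, and the chi-square tail probability uses a textbook series for the incomplete gamma function in place of SAS's PROBCHI. `mardia_coefficients` computes b1 and b2 the way the IML code does (centered data, covariance with divisor n), and `mardia_tests` applies the same small-sample skewness correction and normal approximation for kurtosis. The sample size n = 270, and p = 9 variables for the initial test, are not stated in this section; they are assumptions, used because they reproduce the reported Stat and pval columns from the reported estimates.

```python
import math

import numpy as np


def mardia_coefficients(y):
    """Mardia's multivariate skewness b1 and kurtosis b2, computed as in the
    PROC IML step: centered data and the ML covariance (divisor n)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    centered = y - y.mean(axis=0)
    s = centered.T @ centered / n
    g = centered @ np.linalg.inv(s) @ centered.T
    b1 = float((g**3).sum()) / n**2
    b2 = float((np.diag(g) ** 2).sum()) / n
    return b1, b2


def chi2_sf(x, df):
    """Upper-tail chi-square probability, 1 - P(df/2, x/2), using the series
    expansion of the regularized lower incomplete gamma function."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    denom = a
    for _ in range(100_000):
        denom += 1.0
        term *= z / denom
        total += term
        if term < total * 1e-16:
            break
    return 1.0 - total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def mardia_tests(b1, b2, n, p):
    """Corrected skewness statistic (chi-square) and kurtosis z statistic,
    with p-values, following the formulas in the IML code."""
    df = p * (p + 1) * (p + 2) / 6
    k = (p + 1) * (n + 1) * (n + 3) / (n * ((n + 1) * (p + 1) - 6))
    kappa1 = n * b1 * k / 6
    kappa2 = (b2 - p * (p + 2)) / math.sqrt(8 * p * (p + 2) / n)
    p_skew = chi2_sf(kappa1, df)
    p_kurt = math.erfc(abs(kappa2) / math.sqrt(2))  # two-sided normal p-value
    return kappa1, p_skew, kappa2, p_kurt


# Estimates from Table 5 (p = 9) and Table 6 (p = 4); n = 270 is an ASSUMPTION
print(mardia_tests(4.832383, 101.5743, n=270, p=9))
print(mardia_tests(0.501283, 23.33105, n=270, p=4))
```

A quick sanity check on `mardia_coefficients`: for the four points (±1, 0) and (0, ±1), which are symmetric about their mean, b1 is 0 and b2 equals p² = 4.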
Table 12: Contrasts & Estimates
Response  Contrast            Contrast SS   Estimate     Pr > F
Vertical  DB/WR vs LB/DE/TE   149.132746    1.9346678    <.0001
Vertical  DB/WR vs OL/DT      2970.979953   8.69776183   <.0001
Vertical  DB/WR vs RB         72.29268      1.67462185   0.0014
Vertical  LB/DE/TE vs OL/DT   1692.05026    6.76309403   <.0001
Vertical  LB/DE/TE vs RB      1.675503      -0.260046    0.6245
Vertical  OL/DT vs RB         1211.140559   -7.02314     <.0001
Bench     DB/WR vs LB/DE/TE   2159.701381   -7.3623549   <.0001
Bench     DB/WR vs OL/DT      5474.545985   -11.806786   <.0001
Bench     DB/WR vs RB         1132.773171   -6.6288939   <.0001
Bench     LB/DE/TE vs OL/DT   730.726452    -4.4444314   <.0001
Bench     LB/DE/TE vs RB      13.329045     0.733461     0.4677
Bench     OL/DT vs RB         658.321367    5.1778925    <.0001
Hands     DB/WR vs LB/DE/TE   10.55147097   -0.5146078   <.0001
Hands     DB/WR vs OL/DT      19.17916754   -0.6988316   <.0001
Hands     DB/WR vs RB         0.03911035    0.03895072   0.6897
Hands     LB/DE/TE vs OL/DT   1.25549104    -0.1842237   0.0243
Hands     LB/DE/TE vs RB      7.59227804    0.55355856   <.0001
Hands     OL/DT vs RB         13.36559713   0.7377823    <.0001
Height    DB/WR vs LB/DE/TE   332.3727022   -2.8882353   <.0001
Height    DB/WR vs OL/DT      606.3480271   -3.9293312   <.0001
Height    DB/WR vs RB         107.0115459   2.03744038   <.0001
Height    LB/DE/TE vs OL/DT   40.0962606    -1.0410959   0.0012
Height    LB/DE/TE vs RB      601.1413339   4.92567568   <.0001
Height    OL/DT vs RB         874.1998387   5.96677157   <.0001
Appendix 2: Figures
Figure 1: Initial Group Frequency Distribution
Figure 2: Initial Group Variable Profile Plot (standardized means of Weight, Arms, Hands, Forty, Ten, Vert, Broad, Bench, and Height for initial groups DB, DL, LB, OL, RB, and WR)
Figure 3: Forty Yard Time Histogram (in seconds)
Figure 4: Weight Histogram (in pounds)
Figure 5: Bench Press Histogram (# of reps)
Figure 6: Vertical Jump Histogram (in inches)
Figure 7: Hand Length Histogram (in inches)
Figure 8: Height Histogram (in inches)
Figure 9: Arms Histogram (in inches)
Figure 10: Vertical Jump Boxplot (in inches)
Figure 11: Final Group Frequency Distribution
Figure 12: Final Group Variable Profile Plot (standardized means of Hands, Vert, Bench, and Height for groups DB/WR, LB/DE/TE, OL/DT, and RB)
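Figures 2 and 12 plot, for each group, the average of each variable after it has been standardized across all players; the PROC MEANS, PROC SQL, and DATA steps in Appendix 3 carry this out in several passes. A compact Python sketch of the same computation follows. The function name, the row-of-dicts layout, and the toy values are hypothetical.

```python
from statistics import mean, stdev


def standardized_profile(rows, group_key, variables):
    """Group means of z-scores, as plotted in the profile figures.

    Each variable is standardized with its overall mean and sample standard
    deviation (n - 1 divisor, the PROC MEANS default), then averaged within groups.
    """
    centers = {}
    for v in variables:
        values = [row[v] for row in rows]
        centers[v] = (mean(values), stdev(values))

    by_group = {}
    for row in rows:
        by_group.setdefault(row[group_key], []).append(row)

    return {
        group: {v: mean([(row[v] - centers[v][0]) / centers[v][1] for row in members])
                for v in variables}
        for group, members in by_group.items()
    }


# Hypothetical toy data: two groups, one response (vertical jump in inches)
toy = [
    {"group": "A", "vert": 30.0}, {"group": "A", "vert": 32.0},
    {"group": "B", "vert": 34.0}, {"group": "B", "vert": 36.0},
]
print(standardized_profile(toy, "group", ["vert"]))
```

Standardizing first puts variables measured in inches, pounds, seconds, and repetitions on a common scale, so a single plot can show where each group sits relative to the overall mean.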
Appendix 3: SAS Code
*======================================================================*
 Create Library and Read Data to the Library
*======================================================================*;
libname C13 "\\Client\F$\Stat Classes\Current\Multivariate Data Analysis\Project1";

proc import datafile="\\Client\F$\Stat Classes\Current\Multivariate Data Analysis\Project1\combine.csv"
    out=combine dbms=csv replace;
  getnames=yes;
run;

data C13.combine;
  set combine;
run;

*======================================================================*
 Variable Audit
*======================================================================*;
proc means data = C13.combine;
run;

*======================================================================*
 Set all other 0 Values to missing
*======================================================================*;
data C13.combine_2 (drop = i);
  set C13.combine;
  array var{*} arms hands fortyyd twentyyd tenyd twentyss threecone vertical broad bench
               round pickround picktotal wonderlic;
  do i = 1 to 14;
    if var{i} = 0 then var{i} = . ;
  end;
run;

proc means data = C13.combine_2 n nmiss min max mean std;
run;

data C13.combine_2 (drop = wonderlic twentyyd threecone twentyss);
  set C13.combine_2;
run;
*======================================================================*
 Use a transpose to identify individuals that have several missing values.
*======================================================================*;
data temp (drop = college firstname lastname name pick pickround picktotal round year);
  set C13.combine_2;
run;

proc transpose data = temp out = transpose;
run;

proc means data = transpose n nmiss;
run;

*======================================================================*
 Remove Individuals with more than 33% missing values.
*======================================================================*;
data C13.combine_3;
  set C13.combine_2;
  if id = 9225 or id = 8984 or id = 9107 or id = 9140 then delete;
run;

proc means data = C13.combine_3 n nmiss;
run;

*======================================================================*
 Need to impute the following variables: fortyyd tenyd vertical broad bench
 Regression Imputation: intended predictors are height in inches, weight, and position
 (as written, the MODEL statements below use only the position dummies)
 Run Regression Imputation on all 5 to get in one dataset
*======================================================================*;
proc freq data = C13.combine_3;
  tables position;
run;

*** Create Dummy Variables for Position with QB the base ***;
data C13.combine_3;
  set C13.combine_3;
  if position = "CB" then CB = 1; else CB = 0;
  if position = "DE" then DE = 1; else DE = 0;
  if position = "DT" then DT = 1; else DT = 0;
  if position = "FS" then FS = 1; else FS = 0;
  if position = "IL" then IL = 1; else IL = 0;
  if position = "OC" then OC = 1; else OC = 0;
  if position = "OG" then OG = 1; else OG = 0;
  if position = "OL" then OL = 1; else OL = 0;
  if position = "OT" then OT = 1; else OT = 0;
  if position = "WR" then WR = 1; else WR = 0;
  if position = "RB" then RB = 1; else RB = 0;
  if position = "SS" then SS = 1; else SS = 0;
  if position = "TE" then TE = 1; else TE = 0;
run;

*** Regression Imputation ***;
proc reg data = C13.combine_3;
  model fortyyd = CB DE DT FS IL OC OG OL OT WR RB SS TE / VIF;
  output out=impute_1 p=predicted_fortyyd;
run;
quit;

proc reg data = impute_1;
  model tenyd = CB DE DT FS IL OC OG OL OT WR RB SS TE / VIF;
  output out=impute_2 p=predicted_tenyd;
run;
quit;

proc reg data = impute_2;
  model vertical = CB DE DT FS IL OC OG OL OT WR RB SS TE / VIF;
  output out=impute_3 p=predicted_vertical;
run;
quit;
proc reg data = impute_3;
  model Broad = CB DE DT FS IL OC OG OL OT WR RB SS TE / VIF;
  output out=impute_4 p=predicted_broad;
run;
quit;

proc reg data = impute_4;
  model Bench = CB DE DT FS IL OC OG OL OT WR RB SS TE / VIF;
  output out=impute_5 p=predicted_bench;
run;
quit;

data C13.combine_Imputation_GK;
  set impute_5;
  /*=====================================================
   fortyyd_2, vertical_2, etc. are the imputed values
  *=====================================================*/
  if fortyyd = . then fortyyd_2 = predicted_fortyyd; else fortyyd_2 = fortyyd;
  if tenyd = . then tenyd_2 = predicted_tenyd; else tenyd_2 = tenyd;
  if vertical = . then vertical_2 = predicted_vertical; else vertical_2 = vertical;
  if broad = . then broad_2 = predicted_broad; else broad_2 = broad;
  if bench = . then bench_2 = predicted_bench; else bench_2 = bench;
run;

*======================================================================*
 Remove unnecessary variable and create the groups.
*======================================================================*;
data master;
  set C13.combine_imputation_gk;
run;

proc freq data = master;
  table position;
run;

data master_2 (keep= id name position group weight arms hands fortyYd tenyd vertical broad bench
                     heightinchestotal fortyyd_2 tenyd_2 vertical_2 broad_2 bench_2);
  set master;
  if position = "QB" then delete;
  else if Position = "DE" then Group = "DL";
  else if Position = "DT" then Group = "DL";
  else if Position = "IL" then Group = "LB";
  else if Position = "OL" then Group = "LB";
  else if Position = "CB" then Group = "DB";
  else if Position = "SS" then Group = "DB";
  else if Position = "FS" then Group = "DB";
  else if Position = "OT" then Group = "OL";
  else if Position = "OC" then Group = "OL";
  else if Position = "OG" then Group = "OL";
  else if Position = "TE" then Group = "LB";
  else if Position = "RB" then Group = "RB";
  else if Position = "WR" then Group = "WR";
  else group = "";
run;

proc freq data = master_2;
  tables position*group;
run;

data C13.master;
  set master_2;
run;

*======================================================================*
 Profile Analysis
*======================================================================*;
*** Standardize the values for each possible Y ***;
proc means data = C13.master;
  var weight arms hands fortyyd_2 tenyd_2 vertical_2 broad_2 bench_2 heightinchestotal;
  output out = standard
    mean = avg_weight avg_arms avg_hands avg_forty avg_ten avg_vert avg_broad avg_bench avg_height
    std = std_weight std_arms std_hands std_forty std_ten std_vert std_broad std_bench std_height;
run;

proc sql;
  create table standard_2 as
    select * from C13.master, standard;
quit;

data standard_3 (drop= weight arms hands fortyyd_2 tenyd_2 vertical_2 broad_2 bench_2 heightinchestotal
                       avg_weight avg_arms avg_hands avg_forty avg_ten avg_vert avg_broad avg_bench avg_height
                       std_weight std_arms std_hands std_forty std_ten std_vert std_broad std_bench std_height
                       _type_ _freq_ fortyyd tenyd vertical broad bench name position id);
  set standard_2;
  s_weight = (weight-avg_weight)/std_weight;
  s_arms = (arms-avg_arms)/std_arms;
  s_hands = (hands-avg_hands)/std_hands;
  s_forty = (fortyyd_2-avg_forty)/std_forty;
  s_ten = (tenyd_2-avg_ten)/std_ten;
  s_vert = (vertical_2-avg_vert)/std_vert;
  s_broad = (broad_2-avg_broad)/std_broad;
  s_bench = (bench_2-avg_bench)/std_bench;
  s_height = (heightinchestotal-avg_height)/std_height;
run;

*** Obtain the average of the standardized values and plot per group ***;
proc means data = standard_3;
  class group;
  var s_weight s_arms s_hands s_forty s_ten s_vert s_broad s_bench s_height;
  output out = temp
    mean = avg_weight avg_arms avg_hands avg_forty avg_ten avg_vert avg_broad avg_bench avg_height;
run;

data temp2 (drop= _freq_ _type_);
  set temp;
run;

proc transpose data = temp2 out=trans;
  by group;
run;

proc format;
  value varfmt 1 = "Weight" 2 = "Arms" 3 = "Hands" 4 = "Forty" 5 = "Ten"
               6 = "Vert" 7 = "Broad" 8 = "Bench" 9 = "Height";
run;

data temp3;
  set trans;
  if _name_ = "avg_weight" then name = 1;
  else if _name_ = "avg_arms" then name = 2;
  else if _name_ = "avg_hands" then name = 3;
  else if _name_ = "avg_forty" then name = 4;
  else if _name_ = "avg_ten" then name = 5;
  else if _name_ = "avg_vert" then name = 6;
  else if _name_ = "avg_broad" then name = 7;
  else if _name_ = "avg_bench" then name = 8;
  else if _name_ = "avg_height" then name = 9;
  else name = 10;
  format name varfmt.;
run;

symbol1 interpol=join value=dot;
proc gplot data = temp3;
  plot col1*name=group;
run;

*** Check correlations for vert and broad and ten and forty ***;
proc corr data = C13.master;
  var vertical_2 broad_2;
run;

proc corr data = C13.master;
  var fortyyd_2 tenyd_2;
run;

*** Drop Broad_2 and Ten_2 ***;
data C13.master_2 (drop= broad_2 tenyd_2 broad tenyd);
  set C13.master;
run;

*======================================================================*
 Multivariate Normality Check: Mardia's Kurtosis / Skewness
*======================================================================*;
%let newinpt= vertical_2 bench_2 hands heightinchestotal;
proc iml;
  use C13.master_2;
  read all var {&newinpt} into y;
  n = nrow(y);
  p = ncol(y);
  dfchi = p*(p+1)*(p+2)/6;                              /* df of the skewness chi-square */
  q = i(n) - (1/n)*j(n,n,1);                            /* centering matrix */
  s = (1/(n))*y`*q*y;                                   /* ML covariance (divisor n) */
  s_inv = inv(s);
  g_matrix = q*y*s_inv*y`*q;
  beta1hat = ( sum(g_matrix#g_matrix#g_matrix) )/(n*n); /* Mardia skewness b1 */
  beta2hat = trace( g_matrix#g_matrix )/n;              /* Mardia kurtosis b2 */
  k = (p+1)*(n+1)*(n+3)/(n*((n+1)*(p+1)-6));            /* small-sample correction */
  kappa1 = n*beta1hat*k/6;
  kappa2 = (beta2hat - p*(p+2) ) /sqrt(8*p*(p+2)/n);
  pvalskew = 1 - probchi(kappa1,dfchi);
  pvalkurt = 2*( 1 - probnorm(abs(kappa2)) );
  print s;
  print s_inv;
  print 'TESTS:';
  print 'Based on skewness: ' beta1hat kappa1 pvalskew;
  print 'Based on kurtosis: ' beta2hat kappa2 pvalkurt;
quit;

*** Macro to look at Univariate Normality ***;
%Macro Hist(var= );
  proc univariate data = C13.master_2;
    var &var;
    histogram;
  run;
%Mend;

%Hist (var=fortyyd_2);
%Hist (var=vertical_2);
%Hist (var=bench_2);
%Hist (var=heightinchestotal);
%Hist (var=weight);
%Hist (var=arms);
%Hist (var=hands);

*** Ran several iterations of this test to get a set of variables that are multivariate normal ***;
data C13.master_3 (drop= fortyyd vertical bench fortyyd_2 weight arms);
  set C13.master_2;
run;

*======================================================================*
 Covariance Matrix Structure
*======================================================================*;
proc discrim data = C13.master_3 pool=test;
  class group;
  var vertical_2 bench_2 hands heightinchestotal;
run;

*** This assumption is highly violated. Try to group differently ***;
data regroup;
  set C13.master_3;
  if position = "QB" then delete;
  else if Position = "DE" then group_2 = "LB/DE/TE";
  else if Position = "DT" then group_2 = "OL/DT";
  else if Position = "IL" then group_2 = "LB/DE/TE";
  else if Position = "OL" then group_2 = "LB/DE/TE";
  else if Position = "CB" then group_2 = "DB/WR";
  else if Position = "SS" then group_2 = "DB/WR";
  else if Position = "FS" then group_2 = "DB/WR";
  else if Position = "OT" then group_2 = "OL/DT";
  else if Position = "OC" then group_2 = "OL/DT";
  else if Position = "OG" then group_2 = "OL/DT";
  else if Position = "TE" then group_2 = "LB/DE/TE";
  else if Position = "RB" then group_2 = "RB";
  else if Position = "WR" then group_2 = "DB/WR";
  else group_2 = "";
run;

proc discrim data = regroup pool=test;
  class group_2;
  var vertical_2 bench_2 hands heightinchestotal;
run;

data C13.master_4;
  set regroup;
run;

*======================================================================*
 Redo Profile Analysis Based on New Groups
*======================================================================*;
data new_standard;
  set c13.master;
  if position = "QB" then delete;
  else if Position = "DE" then group_2 = "LB/DE/TE";
  else if Position = "DT" then group_2 = "OL/DT";
  else if Position = "IL" then group_2 = "LB/DE/TE";
  else if Position = "OL" then group_2 = "LB/DE/TE";
  else if Position = "CB" then group_2 = "DB/WR";
  else if Position = "SS" then group_2 = "DB/WR";
  else if Position = "FS" then group_2 = "DB/WR";
  else if Position = "OT" then group_2 = "OL/DT";
  else if Position = "OC" then group_2 = "OL/DT";
  else if Position = "OG" then group_2 = "OL/DT";
  else if Position = "TE" then group_2 = "LB/DE/TE";
  else if Position = "RB" then group_2 = "RB";
  else if Position = "WR" then group_2 = "DB/WR";
  else group_2 = "";
run;

*** Standardize the values for each possible Y ***;
proc means data = new_standard;
  var weight arms hands fortyyd_2 tenyd_2 vertical_2 broad_2 bench_2 heightinchestotal;
  output out = standard
    mean = avg_weight avg_arms avg_hands avg_forty avg_ten avg_vert avg_broad avg_bench avg_height
    std = std_weight std_arms std_hands std_forty std_ten std_vert std_broad std_bench std_height;
run;

proc sql;
  create table standard_2 as
    select * from new_standard, standard;
quit;
data standard_3 (drop= weight arms hands fortyyd_2 tenyd_2 vertical_2 broad_2 bench_2 heightinchestotal
                       avg_weight avg_arms avg_hands avg_forty avg_ten avg_vert avg_broad avg_bench avg_height
                       std_weight std_arms std_hands std_forty std_ten std_vert std_broad std_bench std_height
                       _type_ _freq_ fortyyd tenyd vertical broad bench name position id);
  set standard_2;
  s_weight = (weight-avg_weight)/std_weight;
  s_arms = (arms-avg_arms)/std_arms;
  s_hands = (hands-avg_hands)/std_hands;
  s_forty = (fortyyd_2-avg_forty)/std_forty;
  s_ten = (tenyd_2-avg_ten)/std_ten;
  s_vert = (vertical_2-avg_vert)/std_vert;
  s_broad = (broad_2-avg_broad)/std_broad;
  s_bench = (bench_2-avg_bench)/std_bench;
  s_height = (heightinchestotal-avg_height)/std_height;
run;

*** Obtain the average of the standardized values and plot per group ***;
proc means data = standard_3;
  class group_2;
  var s_weight s_arms s_hands s_forty s_ten s_vert s_broad s_bench s_height;
  output out = temp
    mean = avg_weight avg_arms avg_hands avg_forty avg_ten avg_vert avg_broad avg_bench avg_height;
run;

data temp2 (drop= _freq_ _type_);
  set temp;
run;

proc transpose data = temp2 out=trans;
  by group_2;
run;

proc format;
  value varfmt 1 = "Weight" 2 = "Arms" 3 = "Hands" 4 = "Forty" 5 = "Ten"
               6 = "Vert" 7 = "Broad" 8 = "Bench" 9 = "Height";
run;

data temp3;
  set trans;
  if _name_ = "avg_weight" then name = 1;
  else if _name_ = "avg_arms" then name = 2;
  else if _name_ = "avg_hands" then name = 3;
  else if _name_ = "avg_forty" then name = 4;
  else if _name_ = "avg_ten" then name = 5;
  else if _name_ = "avg_vert" then name = 6;
  else if _name_ = "avg_broad" then name = 7;
  else if _name_ = "avg_bench" then name = 8;
  else if _name_ = "avg_height" then name = 9;
  else name = 10;
  format name varfmt.;
run;

symbol1 interpol=join value=dot;
proc gplot data = temp3;
  plot col1*name=group_2;
run;

*** Profile Analysis Leads to the Same Y's to remove Move on to Outlier Detection and MANOVA ***;

*======================================================================*
 Check for Outliers
*======================================================================*;
%INCLUDE "\\Client\F$\Stat Classes\Current\Multivariate Data Analysis\Project1\mnorm.sas";
*EXAMPLE 1;
%MNORM(DATA=C13.master_4, CLASS=Group_2,
       RESPONSE=vertical_2 bench_2 hands heightinchestotal, ID=id)

proc means data = C13.master_4_mnorm mean median std;
  var MNORM_SMD;
run;

*** Mean is about 3.94 and STD is about 3.07 ***;
data outlier;
  set C13.master_4_mnorm;
  if MNORM_SMD > 3.94 + (3*3.07) then Outlier = 1;
  else outlier = 0;
run;

proc sort data = outlier;
  by descending MNORM_SMD;
run;

proc print data = outlier (obs=20);
  var ID name MNORM_SMD outlier;
run;

*** Limited Outliers (only 5) Assumption met ***;

*======================================================================*
 Profile Analysis Pre-MANOVA
*======================================================================*;
*** Standardize the values for each possible Y ***;
proc means data = C13.master_4;
  var hands vertical_2 bench_2 heightinchestotal;
  output out = standard
    mean = avg_hands avg_vert avg_bench avg_height
    std = std_hands std_vert std_bench std_height;
run;

proc sql;
  create table standard_2 as
    select * from C13.master_4, standard;
quit;

data standard_3;
  set standard_2;
  s_hands = (hands-avg_hands)/std_hands;
  s_vert = (vertical_2-avg_vert)/std_vert;
  s_bench = (bench_2-avg_bench)/std_bench;
  s_height = (heightinchestotal-avg_height)/std_height;
run;

*** Obtain the average of the standardized values and plot per group ***;
proc means data = standard_3;
  class group_2;
  var s_hands s_vert s_bench s_height;
  output out = temp mean = avg_hands avg_vert avg_bench avg_height;
run;

data temp2 (drop= _freq_ _type_);
  set temp;
run;

proc transpose data = temp2 out=trans;
  by group_2;
run;

proc format;
  value re_varfmt 1 = "Hands" 2 = "Vert" 3 = "Bench" 4 = "Height";
run;

data temp3;
  set trans;
  if _name_ = "avg_hands" then name = 1;
  else if _name_ = "avg_vert" then name = 2;
  else if _name_ = "avg_bench" then name = 3;
  else if _name_ = "avg_height" then name = 4;
  format name re_varfmt.;
run;

symbol1 interpol=join value=dot;
proc gplot data = temp3;
  plot col1*name=group_2;
run;

*======================================================================*
 MANOVA
*======================================================================*;
proc sort data = C13.master_4 out=test;
  by group_2;
run;

/*==================*
 Order of Groups "DB/WR" "LB/DE/TE" "OL/DT" "RB"
*==================*/
proc glm data = C13.master_4;
  class group_2;
  model vertical_2 bench_2 hands heightinchestotal = group_2;
  manova h = group_2;
  contrast "DB/WR vs LB/DE/TE" group_2 1 -1 0 0;
  contrast "DB/WR vs OL/DT" group_2 1 0 -1 0;
  contrast "DB/WR vs RB" group_2 1 0 0 -1;
  contrast "LB/DE/TE vs OL/DT" group_2 0 1 -1 0;
  contrast "LB/DE/TE vs RB" group_2 0 1 0 -1;
  contrast "OL/DT vs RB" group_2 0 0 1 -1;
  MANOVA H = _ALL_;
  estimate "DB/WR vs LB/DE/TE" group_2 1 -1 0 0;
  estimate "DB/WR vs OL/DT" group_2 1 0 -1 0;
  estimate "DB/WR vs RB" group_2 1 0 0 -1;
  estimate "LB/DE/TE vs OL/DT" group_2 0 1 -1 0;
  estimate "LB/DE/TE vs RB" group_2 0 1 0 -1;
  estimate "OL/DT vs RB" group_2 0 0 1 -1;
run;
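As written, the five PROC REG imputation models use only the position dummies (QB as the base level). With a full set of group dummies plus an intercept, least-squares fitted values equal the group means, so each imputed value is simply the observed mean for that player's position. The Python sketch below (NumPy; the function name and toy values are hypothetical) reproduces that dummy-variable regression to show the equivalence. If height and weight were meant to be predictors, as the comment above the imputation models says, they would need to be added to each MODEL statement.

```python
import numpy as np


def dummy_regression_impute(y, positions, base):
    """Fill missing y values with predictions from an OLS fit of y on an
    intercept plus 0/1 dummies for every position except `base`.

    Mirrors the PROC REG imputation models, which use only the position dummies.
    """
    levels = sorted(set(positions) - {base})
    x = np.array([[1.0] + [1.0 if pos == level else 0.0 for level in levels]
                  for pos in positions])
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    # Fit only on rows where y is observed, as PROC REG does
    beta, *_ = np.linalg.lstsq(x[observed], y[observed], rcond=None)
    return np.where(observed, y, x @ beta)


# Hypothetical forty-yard times with two missing values (nan)
times = [4.4, 4.6, float("nan"), 5.2, 5.4, float("nan"), 4.9]
pos = ["CB", "CB", "CB", "OT", "OT", "OT", "QB"]
print(dummy_regression_impute(times, pos, base="QB"))
# Each missing value becomes its position's observed mean (4.5 for CB, 5.3 for OT)
```

Because the imputed values are position means, they add no within-position variation, which is worth keeping in mind when reading the variances of the imputed responses.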