Assessing Teamwork and Collaboration in High School Students: A Multimethod Approach
Lijuan Wang, Carolyn MacCann, Xiaohua Zhuang, Ou Lydia Liu, and Richard D. Roberts
Canadian Journal of School Psychology 2009; 24; 108
DOI: 10.1177/0829573509335470
The online version of this article can be found at: http://cjs.sagepub.com/cgi/content/abstract/24/2/108
Published by: http://www.sagepublications.com
On behalf of: Canadian Association of School Psychologists
Downloaded from http://cjs.sagepub.com at University of Sydney on January 13, 2010
[...] self-report, situational judgment, and teacher-rating assessments of teamwork in high school students. Various multivariate techniques, including factor analyses and latent class analyses, were used to determine the structure of the measures. The measures proved reasonably reliable, and their validity was satisfactory. Self-reports, situational judgments, and teacher ratings were intercorrelated, and all of these measures were also related to academic achievement. We discuss the pros and cons of each methodology, as well as possible uses of this assessment system (e.g., evaluating school programs whose curricula include teamwork-based modules).
Keywords: teamwork; situational judgment test; teacher ratings; academic achievement; latent class analysis; structural equation modeling
Teamwork has been touted as one of the major skills comprising workforce readiness in the 21st century (e.g., Barton, 2007) and has become an essential process in education, with teachers frequently assigning projects that require student collaboration (Ahles & Bosworth, 2004). Despite the perceived importance of teamwork in secondary education, reliable assessments of teamwork at the high school level are scarce, with existing assessments primarily targeting business organizations or college students (Loughry, Ohland, & Moore, 2007; Morgeson, Reider, & Campion, 2005; O’Neil, Wang, & Lee, 2003).
The current study aims to develop a teamwork assessment that (a) targets high school students, (b) uses multiple methods of measurement (self-report, other-report, and situational judgment test [SJT] procedures), (c) yields reliable measures, and (d) produces scores with demonstrable validity evidence. These instruments could be used to identify students’ teamwork skills, to design intervention programs around the assessment, and to provide career guidance for students. In addition, information derived from the assessment procedures may inform the development of teamwork-training curricula in high schools.
Individual Differences in Teamwork: Conceptual Models
Although models of teamwork differ in their details, conceptual correspondences between the components of teamwork models suggest five general content areas (e.g., O’Neil et al., 2003; Stevens & Campion, 1994). These content areas are (a) task-related process skills, (b) cooperation with other team members, (c) influencing team members through support and encouragement, (d) resolution of conflicts among team members via negotiation strategies, and (e) guidance and mentorship of other team members. As process skills appear to have a clear cognitive load (and hence might be most appropriately measured with knowledge assessments), we limit our definition of teamwork to the latter four content areas in the current study. That is, teamwork is defined as (a) cooperation with others, (b) influence through support and encouragement (hereafter referred to as advocate), (c) resolving conflict/negotiating, and (d) guiding others. A multimethod teamwork assessment system is developed to cover these content domains.
Assessing Teamwork
Individual differences in teamwork are commonly assessed with self- or other-reports (e.g., Loughry et al., 2007) and less commonly with SJTs (e.g., Stevens & Campion, 1999). There are practical concerns with each method of assessment in isolation: SJTs are difficult to score, self-report ratings may produce response distortion, and other-report ratings may be susceptible to halo effects. Thus, using multiple methods represents an innovative approach to teamwork assessment, ensuring that potential measurement issues are limited to one part of the assessment system. In addition, the relationship between different methods of measurement (self-reports, SJTs, and teacher-reports) can be examined as an important methodological issue.
Establishing Validity Evidence for Teamwork Scores
Several different types of validity evidence for the teamwork assessment system are examined. First, the multiple measures should converge to assess the same construct (teamwork), as evidence of convergent validity. Second, limited group differences (i.e., gender or ethnicity differences) in teamwork scores are expected, as group membership is conceptually unrelated to teamwork, constituting a form of discriminant validity evidence. Third, teamwork scores should predict students’ grades, as evidence of criterion validity. Fourth, latent class analyses of test takers’ responses to the SJT should converge with expert opinion, as evidence for the validity of the expert scoring rubric. Fifth, developmental and learning trends in teamwork are expected as a further form of validity evidence. Further background to these claims follows.
Convergent validity evidence. Scores on the three different measurement methods should converge, as all are based on the same definition of teamwork. Positive correlations between the different measurement methods are evidence of convergent validity.
Discriminant validity evidence. The distinctiveness of teamwork scores from ethnicity and gender constitutes evidence of discriminant validity. Although there is little literature examining group differences in teamwork, related literatures suggest that noncognitive assessments reduce adverse impact (e.g., McDaniel, Morgeson, Finnegan, Campion, & Braverman, 2001). Teamwork may relate to socialization processes in much the same ways as similar constructs, such as social and emotional competencies (Zeidner, Matthews, & Roberts, 2009). However, if the teamwork assessments are purely measures of socialization styles or types of experience (which differ across groups), then such measures are not truly assessing teamwork but rather social interaction norms. For this reason, we consider the absence of group differences on the teamwork assessments to be a form of discriminant validity.
Test-criterion evidence. Test-criterion relations of the teamwork assessments are evaluated against students’ grades. Given that students’ learning and achievement may relate to the social demands of the classroom, the cognitive demands of mastering academic material, and teachers stressing collaborative approaches to learning, teamwork skills gain in importance as components of academic success (Ahles & Bosworth, 2004). Teamwork scores should thus predict school grades. However, teamwork measures might predict grades especially well in classes where teamwork is pivotal (e.g., music, where ensembles or group performances are common) compared with those where it may sometimes be downplayed (e.g., history, where individual assessments are more common than team projects).
Validity evidence for the SJTs. Although expert scoring is recommended for SJTs (e.g., McDaniel & Nguyen, 2001), experts may disagree, criteria for teamwork expertise are not obvious, and multiple correct answers to situational items may be possible. For these reasons, latent class analysis (LCA) is used to check the validity of expert scoring of the teamwork SJT. LCA identifies qualitatively different groups of cases based on consistencies in response patterns. An LCA of SJT items can determine whether there are distinct groups of test takers showing different patterns of response on the SJT. If discrete groups of test takers have higher or lower expert-derived scores, this provides evidence that expert scoring is valid.
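The logic of LCA can be illustrated with a minimal sketch: an unrestricted latent class model for categorical responses, fitted by expectation-maximization (EM). This is an illustration under stated assumptions, not the estimation routine used in this study: ratings are treated as unordered categories coded 0 to 4, the data are simulated, and the function name `fit_lca`, the random start, the fixed number of iterations, and the small smoothing constant are hypothetical choices.

```python
import numpy as np


def fit_lca(X, n_classes, n_categories, n_iter=200, seed=0):
    """Fit an unrestricted latent class model by EM (illustrative sketch).

    X: (n, J) integer array, each response coded 0..n_categories-1.
    Returns class proportions, item-response probabilities, posterior
    class memberships, and the log-likelihood from the final E-step.
    """
    rng = np.random.default_rng(seed)
    n, J = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)  # class proportions
    # theta[c, j, k] = P(item j == k | class c); a random start breaks symmetry
    theta = rng.dirichlet(np.ones(n_categories), size=(n_classes, J))
    onehot = np.eye(n_categories)[X]  # (n, J, K) indicator of each response
    for _ in range(n_iter):
        # E-step: log P(responses, class) for every person and class
        log_joint = np.einsum("njk,cjk->nc", onehot, np.log(theta)) + np.log(pi)
        log_marginal = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        post = np.exp(log_joint - log_marginal)  # posterior class membership
        # M-step: update class proportions and category probabilities
        pi = post.mean(axis=0)
        counts = np.einsum("nc,njk->cjk", post, onehot) + 1e-6  # avoid log(0)
        theta = counts / counts.sum(axis=2, keepdims=True)
    return pi, theta, post, float(log_marginal.sum())


# Hypothetical data: 400 simulated test takers, 8 items, 5 rating categories
rng = np.random.default_rng(1)
n, J, K = 400, 8, 5
true_class = rng.integers(0, 2, size=n)
probs = np.array([[0.05, 0.05, 0.10, 0.30, 0.50],   # class 0 favors high ratings
                  [0.20, 0.30, 0.30, 0.15, 0.05]])  # class 1 favors lower ratings
X = np.array([rng.choice(K, size=J, p=probs[c]) for c in true_class])

pi, theta, post, loglik = fit_lca(X, n_classes=2, n_categories=K)
assigned = post.argmax(axis=1)
# Class labels are arbitrary, so agreement is scored under either labeling
agreement = max(np.mean(assigned == true_class), np.mean(assigned != true_class))
print(pi.round(2), round(agreement, 2))
```

In practice, several random starts would be compared to guard against local maxima of the likelihood, and the fitted classes would then be compared on their expert-keyed scores, as described above.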
Developmental trends in teamwork. Developmental and learning trends for teamwork are also examined (i.e., higher teamwork scores are expected for older students as a product of education and socialization).
Aims of the Study
This study serves three primary purposes: (a) to develop a multiple-method assessment system to measure teamwork in high school students, (b) to provide preliminary reliability and validity evidence for the teamwork assessment system by examining each measure in isolation and then by examining the relations among the three methods (i.e., convergent validity evidence), and (c) to provide additional validity evidence for the assessments by examining relationships between teamwork scores and age, ethnicity, gender, and academic achievement.
Method
Participants
Participants were 159 high school students (51.6% female) participating in Ford Partnership for Advanced Studies (PAS) courses whose teachers indicated a willingness to participate in the teamwork study. Ford PAS is a set of curriculum modules linking classroom education to real-world employment applications by developing links between schools and local colleges, universities, and businesses. The learning process is based on inquiry and collaborative, project-based learning experiences and aims to teach students four essential workplace skills: teamwork, critical thinking, problem solving, and communication (see http://www.fordpas.org). Participants’ mean age was 16.10 years (SD = 1.03). Participants’ self-identified ethnicity was 64.2% African American, 18.9% White, 3.1% Hispanic, and 13.8% American Indian, Asian, or Other. Although this sample is not representative of U.S. high school students, it is consistent with the student population generally participating in the Ford PAS program.
Measures
Teamwork
A self-report rating scale, an SJT, and a behaviorally anchored teacher-rating scale were developed to assess the four content domains of teamwork. All three measures were developed by specialists and reviewed by two expert panels: (a) content experts (curriculum developers and educators who have taught teamwork curricula) and (b) fairness and sensitivity experts (i.e., individuals trained to meet guidelines according to established standards designed to reduce bias and promote legal and public defensibility).
Self-report teamwork assessment. There were 57 items developed to assess Cooperation (15 items), Advocate (12 items), Negotiate (17 items), and Guiding Others (13 items). Students responded to these items on a 6-point scale from never to always (see Table 1 for sample items).
SJT assessment. Curriculum materials and student exercises supplied by Ford PAS were used to develop scenarios and response options. Two scenarios were developed for each of the four components, so that the SJT represented all four content components (however, because of limited testing time, we did not attempt to develop enough items to examine multiple teamwork factors). For each scenario, students rated the effectiveness of four response options on a 5-point scale from very ineffective to very effective. A panel of three assessment specialists in educational and psychological testing decided on the best response (with the final decision based on the group reaching consensus), and the test taker’s rating of this response was used to score each item. A sample item is described below.
Table 1
Factor Loadings of the Revised Self-Report Teamwork Scale

Item content                                        Loading(s)   M (SD)
Cooperate
  I enjoy bringing team members together (a)        .75          4.45 (1.29)
  Sharing ideas                                     .74          4.58 (1.20)
  Acknowledging peers’ accomplishments              .71          4.71 (1.19)
  Helping team                                      .67          4.77 (1.19)
  Valuing different perspectives                    .67          4.40 (1.26)
  Providing feedback                                .60          4.16 (1.18)
  Exchanging creative ideas                         .60          4.84 (1.26)
  Cooperating with students                         .53          4.72 (1.12)
  Enjoying team activities                          .46          4.43 (1.33)
  Inspired by others                                .46          4.09 (1.25)
  Contributing team’s goals                         .43          4.50 (1.22)
  Respecting peer opinions                          .41, .34     4.76 (1.11)
Advocate/Guide
  I like to be in charge of groups or projects (a)  .70          3.75 (1.42)
  Help others see things my way                     .65          4.03 (1.12)
  Convincing peers                                  .57          3.80 (1.19)
  Believe good leaders                              .56          4.68 (1.36)
  Comfortable providing criticism                   .44          3.80 (1.46)
  Persuading peers attentively                      .43          4.30 (1.29)
  Influencing peers                                 .41          3.61 (1.44)
  Suggesting solutions                              .30          3.99 (1.23)
  Constructively argue                              .23          4.15 (1.27)
Negotiate
  I am a good listener (a)                          .72          4.91 (1.11)
  Open to varying opinions                          .65          4.64 (1.23)
  Take others’ interests into account               .55          4.32 (1.28)
  Adaptable in team                                 .53          4.16 (1.15)
  Flexible in team                                  .51          4.68 (1.16)
  Find best solution                                .49          4.38 (1.37)
  Dislike people challenging views (b)              .38          3.92 (1.29)
  Understanding team member differences             .30, .31     5.33 (1.04)
  Consider team first                               .29          3.98 (1.36)

Note: Items are grouped by factor. For better readability, factor loadings that are less than .20 are not listed.
(a) Items are complete examples.
(b) Reverse-scored item.
You are part of a study group that has been assigned a large presentation for class. As you are all dividing up the workload, it becomes clear that both you and another member of the group are interested in researching the same aspect of the topic. Your colleague already has a great deal of experience in this area, but you have been extremely excited about working on this part of the project for several months. Rate the following approach to dealing with this situation: (a) Flip a coin to determine who gets to work on that particular aspect of the assignment, (b) insist that, for the good of the group, you should work on that aspect of the assignment because your interest in the area means you will do a particularly good job, (c) compromise your preferences for the good of the group and allow your friend to work on that aspect of the assignment [best response by expert judgment], (d) suggest to the other group member that you both share the research for that aspect of the assignment and also share the research on another less-desirable topic.
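The scoring rule described above (each item is scored as the test taker’s rating of the expert-keyed option) can be sketched as follows. The key is taken from the note to Figure 1; averaging the eight item scores into a composite, the function name `score_sjt`, and the example ratings are assumptions for illustration only.

```python
# Expert-keyed best option for Scenarios 1-8 (from the note to Figure 1)
EXPERT_KEY = "cddcddcd"
OPTIONS = "abcd"


def score_sjt(ratings):
    """Score one test taker's SJT responses (hypothetical helper).

    ratings: 8 sequences, one per scenario, each holding the 1-5
    effectiveness ratings given to options a, b, c, d in that order.
    Returns the 8 item scores (rating of the keyed option) and their mean.
    """
    if len(ratings) != len(EXPERT_KEY):
        raise ValueError("expected one set of ratings per scenario")
    item_scores = [scenario[OPTIONS.index(key)]
                   for scenario, key in zip(ratings, EXPERT_KEY)]
    return item_scores, sum(item_scores) / len(item_scores)


# Hypothetical respondent who rates a, b, c, d as 2, 3, 5, 4 in every scenario
items, composite = score_sjt([[2, 3, 5, 4]] * 8)
print(items, composite)  # [5, 4, 4, 5, 4, 4, 5, 4] 4.375
```

Ratings given to the nonkeyed options are ignored under this rule, which is why the latent class analysis of the full response profile provides a useful check on the key.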
Teacher-rating scale. Teachers evaluated each student’s level of teamwork
against ten behaviorally anchored items. A sample item follows:
When working on a group goal or project, this student
(1) (2) (3) (4) (5)
Low anchor: ignores or does not notice others’ ideas or suggestions
Middle anchor: listens to others’ contributions
High anchor: always listens to others and respects their contributions
Self-Reported Grades
Students reported their grades in reading/language arts, math, science, social science, art, and music from the previous semester. A total of 151 grades were reported for reading, 150 for math, 146 for science, 135 for social science, 49 for art, and 41 for music.
Procedure
Participants were tested in class (20 students per class), with teachers reporting on their students’ performance during this time. All tests were administered in paper-and-pencil format and were self-paced. The entire student protocol lasted 45 min. All protocols were approved by the Educational Testing Service human ethics review committee.
Data Analysis Steps
Exploratory factor analyses (EFAs). Separate EFAs using principal factor analysis with promax rotation were conducted for the student self-report scale, the SJT (the eight key items that experts selected as best), and the teacher-report scale. Problematic items (loadings < .30 or cross-loadings on multiple factors) were removed to reduce the item pool. Parallel analysis was conducted to determine the number of factors for each scale, using a SAS macro program that allows parallel analysis of ordinal data such as rating scale items (Liu & Rijmen, 2008).
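A simplified sketch of Horn’s parallel analysis is shown below. Unlike the SAS macro for ordinal data cited above, this hypothetical `parallel_analysis` function uses Pearson correlations, principal component eigenvalues, and normally distributed random data, with the 95th percentile of the random eigenvalues as the retention threshold. It is meant only to show the logic of retaining factors whose observed eigenvalues exceed those expected by chance.

```python
import numpy as np


def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
    """Horn's parallel analysis (simplified: Pearson correlations, PCA eigenvalues).

    Returns the number of leading factors whose observed eigenvalue exceeds
    the chosen percentile of eigenvalues from random normal data of the same shape.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # eigvalsh returns ascending eigenvalues; reverse to get the largest first
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        sim = rng.standard_normal((n, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    n_factors = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break  # stop at the first eigenvalue that does not exceed chance
        n_factors += 1
    return n_factors, observed, threshold


# Hypothetical data: two uncorrelated factors, each measured by four items
rng = np.random.default_rng(42)
n = 500
factors = rng.standard_normal((n, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8
loadings[1, 4:] = 0.8
X = factors @ loadings + 0.6 * rng.standard_normal((n, 8))
k, observed, threshold = parallel_analysis(X)
print(k)  # 2
```

For rating-scale items such as those used here, polychoric rather than Pearson correlations are usually preferred, which is the motivation for the ordinal procedure the study used.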
Confirmatory factor analyses (CFAs). CFAs were then used to compare structural models for the self-report scale (to determine whether highly correlated latent factors are sufficiently different from each other to constitute separate factors). The following rules of thumb were used to evaluate fit statistics: (a) acceptable fit: root mean square error of approximation (RMSEA) ≤ .08, comparative fit index (CFI) ≥ .90; (b) good fit: RMSEA ≤ .05, CFI ≥ .95 (e.g., Hu & Bentler, 1999; Marsh, Hau, & Wen, 2004). However, Hu and Bentler (1995) also caution that such rules of thumb are not absolute because they do not work equally well across types of indices, sample sizes, estimators, or distributions. Both EFAs and CFAs were conducted with Mplus (Muthén & Muthén, 1998-2007).
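As an illustration, the RMSEA point estimate can be computed from a model’s chi-square, degrees of freedom, and sample size, and then compared against the rules of thumb above. This sketch uses the common formula with N − 1 in the denominator (some programs use N instead); applied to the chi-squares in Table 2 with N = 159, it reproduces the two-decimal RMSEA values reported there. The helper names are hypothetical, and CFI is only rated, not computed, because it requires the baseline model’s chi-square, which is not reported.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def rate_rmsea(value):
    """Rules of thumb from the text: <= .05 good, <= .08 acceptable."""
    if value <= 0.05:
        return "good"
    if value <= 0.08:
        return "acceptable"
    return "below acceptable"


def rate_cfi(value):
    """Rules of thumb from the text: >= .95 good, >= .90 acceptable."""
    if value >= 0.95:
        return "good"
    if value >= 0.90:
        return "acceptable"
    return "below acceptable"


# Chi-square and df for the four models in Table 2 (N = 159)
models = {"three-factor": (651.5, 402), "F1=F2": (743.8, 404),
          "F1=F3": (716.1, 404), "F2=F3": (741.6, 404)}
for name, (chi_sq, df) in models.items():
    value = rmsea(chi_sq, df, 159)
    print(f"{name}: RMSEA = {value:.2f} ({rate_rmsea(value)})")
print(rate_cfi(0.84))  # below acceptable
```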
LCA. LCA was applied to responses on the eight expert-keyed SJT items to see whether the obtained latent classes converge with expert scoring. Only the eight key items were used in the LCA because the small sample size and the nested item-scenario data structure meant that a model based on all 32 responses would not be reliable. The Akaike information criterion (AIC), Bayesian information criterion (BIC), and profile plots were used to select the number of classes.
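Both information criteria penalize the log-likelihood by the number of parameters, with the BIC penalty growing with sample size. A sketch using the log-likelihoods and parameter counts reported in Table 3 follows. With these inputs, the AIC reproduces the reported values to within rounding. The BIC computed with N = 159 (an assumption about the N used) comes out roughly 1 to 3 points above the reported values, so the published BIC may rest on a slightly different effective sample size; both versions select the same number of classes.

```python
import math

# Log-likelihoods and parameter counts for the 1- to 4-class models (Table 3)
MODELS = {1: (-1541.1, 31), 2: (-1428.4, 63), 3: (-1382.9, 95), 4: (-1361.4, 127)}


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2p."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n):
    """Bayesian information criterion: -2 log L + p ln(N)."""
    return -2.0 * loglik + n_params * math.log(n)


n = 159  # assumption: full sample size used as N in the BIC penalty
aics = {k: aic(ll, p) for k, (ll, p) in MODELS.items()}
bics = {k: bic(ll, p, n) for k, (ll, p) in MODELS.items()}
# Lower is better for both criteria
print(min(aics, key=aics.get), min(bics, key=bics.get))  # 3 2
```

Because the two criteria disagree, the profile plots serve as the tiebreaker in the analyses reported below.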
Correlational analyses. Correlations among self-report, teacher-report, and SJT factor scores were calculated to determine whether the scores showed convergent validity evidence. Correlations of teamwork factor scores with demographic variables and course grades were also calculated, providing further validity evidence.
Results
Scale Dimensionality
Self-report scale. A four-factor CFA based on the theoretical assignment of items to factors fit the data poorly (e.g., CFI = .56; RMSEA = .08), suggesting that a four-factor solution was not the best fit to the data. An initial EFA was then conducted on the correlation matrix, using the four-factor solution suggested by parallel analysis. However, results indicated some potentially problematic items: items with negative factor loadings, items that did not load saliently (> .30) on any factor, and items that showed multiple loadings (> .40) on different factors. In addition, these four factors did not correspond to the four subscales originally postulated. A sequential deletion method was then used to eliminate problematic items: items with negative loadings were eliminated first, followed by items with low loadings and multiple loadings. On the basis of item analyses, 27 items (13 of which were reverse-coded) were dropped. An EFA conducted on these discarded items identified no clear factor pattern, and these 27 items were excluded from further analysis.¹ The remaining items were then reanalyzed using EFA. Results from the scree plot and parallel analysis indicated a three-factor solution. Factor loadings from this analysis are shown in Table 1.
The reduced self-report scale consisted of 30 items and captured three factors: (a) Cooperation (12 items), (b) Advocate/Guide (9 items), and (c) Negotiation (9 items). Cooperation includes tendencies to bring ideas together, seek solutions, and provide feedback to team members. Advocate/Guide refers to tendencies to direct others, provide appropriate suggestions and criticism, and persuade others (i.e., the factor primarily represents advocating content, although elements of guiding others, which did not emerge as a separate factor, also appear here). Negotiation includes students’ tendencies to listen, to adapt to change during conflicts, and to resolve conflicts. Cronbach’s alphas for the three retained factors were acceptable (.88, .80, and .78, respectively). Table 1 also provides the mean and standard deviation of each item.
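Cronbach’s alpha can be computed directly from the item variances and the variance of the total score, as in the hypothetical `cronbach_alpha` helper below. Reverse-scored items would need to be recoded first; on the 6-point self-report scale, assuming responses are coded 1 to 6, that recoding is 7 minus the response.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array of scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)


# Toy example with three respondents and two items (hypothetical data)
print(round(cronbach_alpha([[1, 1], [2, 3], [3, 2]]), 3))  # 0.667
```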
For the CFA, items were assigned to one of the three dimensions based on the factor loading matrix from the EFA shown in Table 1. Fit indices from a CFA based on these three factors (with no cross-loadings) are shown in Table 2. The RMSEA indicates acceptable fit, although the CFI is slightly below the conventional threshold (at .85 rather than .90). Correlations among the latent variables were relatively high: .66 (Cooperation, Advocate/Guide), .79 (Cooperation, Negotiation), and .59 (Advocate/Guide, Negotiation). For this reason, we fitted three alternative two-factor models to test whether the three factors were statistically distinct: (a) Cooperation and Advocate/Guide combined to form one factor (i.e., the correlation between them was fixed at 1.00, and the correlations of each with the Negotiation factor were constrained to be equal); (b) Cooperation and Negotiation combined to form one factor; and (c) Advocate/Guide and Negotiation combined to form one factor. Results of the likelihood ratio tests shown in Table 2 indicate that combining any two factors into one significantly lowers model fit.
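The likelihood ratio tests in Table 2 compare each constrained two-factor model with the three-factor model through the differences in chi-square and in degrees of freedom. The sketch below assumes the reported chi-squares are ordinary maximum-likelihood statistics, so that their difference is itself chi-square distributed; with robust estimators, a scaled difference test would be required instead. To stay within the standard library, the hypothetical `chi2_sf_even_df` helper uses the closed-form chi-square upper-tail probability, which holds only for even degrees of freedom (as here, where the df difference is 2).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only.

    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def chi_square_difference(chi_restricted, df_restricted, chi_full, df_full):
    """Difference test for nested models fitted to the same data."""
    delta_chi = chi_restricted - chi_full
    delta_df = df_restricted - df_full
    return delta_chi, delta_df, chi2_sf_even_df(delta_chi, delta_df)


# Table 2: three-factor model vs. Cooperation and Advocate/Guide merged
d_chi, d_df, p = chi_square_difference(743.8, 404, 651.5, 402)
print(round(d_chi, 1), d_df, p < 0.001)  # 92.3 2 True
```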
SJT. Scree and parallel analysis of the eight SJT items indicated a one-factor solution. The Cronbach alpha was .71. Fit indices from a one-factor CFA model indicated good fit (CFI = .96, RMSEA = .049).
Unrestricted LCA models with one to four classes were fitted to the eight key SJT items, with raw data used as input. Table 3 displays the goodness-of-fit indices for these LCA models. The AIC favors a three-class solution and the BIC a two-class solution. Profile plots (mean ratings of the 32 item responses for each class) were generated for both the two-class LCA model (Figure 1a) and the three-class LCA model (Figure 1b). The plots show that Classes 1 and 3 in the three-class model were not visually distinguishable, whereas the two classes from the two-class model discriminated very clearly. Combining the results from the AIC, the BIC, and profile discrimination, a two-class model was selected to represent the dimensionality of the SJT scale (one dimension, two classes). Figure 1a also shows that students in Class 1 rated all eight expert-endorsed response options more highly than those in Class 2. This result provides a form of validity evidence for the expert keys. There was also more variation across reactions within each scenario for Class 1 than for Class 2 (i.e., Class 1 students could better differentiate among reactions within each scenario than Class 2 students). Therefore, we label Class 1 the high-teamwork-skill group and Class 2 the low-teamwork-skill group.
Teacher-report scale. The scree plot and parallel analysis from the EFA showed that the teacher evaluation scale was unidimensional, with the first factor explaining 83% of the variance in the 10 items. The Cronbach alpha of the scale was .98. The mean of the composite scores on these 10 items was 3.35 (SD = 1.14).
Table 2
Fit Indices from the Confirmatory Factor Models for the Self-Report Teamwork Scores

                                                      Correlation between factors fixed to 1.00
Fit index                          Three-factor model   F1, F2    F1, F3    F2, F3
Chi-square                         651.5                743.8     716.1     741.6
df                                 402                  404       404       404
CFI                                .84                  .79       .81       .80
RMSEA                              .06                  .07       .07       .07
Likelihood ratio test (Δχ²/Δdf)                         92.3/2    64.6/2    90.1/2

Note: F1 = Cooperation; F2 = Advocate/Guide; F3 = Negotiation; CFI = comparative fit index; RMSEA = root mean square error of approximation.
Table 3
Goodness-of-Fit Indices for One- through Four-Class LCA Models for the SJT Teamwork Scores

Latent classes   Log likelihood   Number of parameters   AIC      BIC
1                -1541.1          31                     3144.2   3238.5
2                -1428.4          63                     2982.9   3174.6
3                -1382.9          95                     2955.9   3245.0
4                -1361.4          127                    2976.7   3363.2

Note: AIC = Akaike information criterion; BIC = Bayesian information criterion.
Figure 1
Mean Profile Plots With the Latent Class Analyses (Two vs. Three Classes)
[Figure: panel a, two-class model (Classes 1 and 2); panel b, three-class model (Classes 1, 2, and 3). y axis: mean rating (1 to 5); x axis: response options a to d for each of the eight scenarios.]
Note: The y axis represents the mean rated effectiveness of each reaction across latent classes. The x axis represents reaction (a, b, c, and d) and scenario number (1-8). For Scenarios 1-8, the most effective reactions rated by experts are c, d, d, c, d, d, c, and d.
Relationships between Three Evaluation Methods
Table 4 displays the correlations among the five teamwork scales (Cooperation, Advocate/Guide, Negotiation, SJT, and teacher report). The SJT was significantly correlated with all three self-report scores at roughly the same magnitude (as might be expected, since the scenarios covered the same teamwork subcomponents). The SJT and the teacher report also shared a moderate, positive relationship. Teacher-report scores were significantly correlated with two of the student self-report scores: Cooperation and Advocate/Guide.
To further investigate the relationships among the three assessment methods, self-report and teacher-report scores were also compared across the two SJT latent classes, and the results are given in Table 5. The high-teamwork-skill class scored higher on the three self-report scales than the low-teamwork class, although no significant difference between the two classes was observed for the teacher-report scale.
Table 4
Correlations between All Teamwork Scores

                   Student self-report
Teamwork score     Cooperation   Advocate/Guide   Negotiation   SJT     Teacher report
Cooperation        .88
Advocate/Guide     .66**         .80
Negotiation        .79**         .59**            .78
SJT                .52**         .47**            .60**         .71
Teacher report     .19*          .32**            .14           .33**   .98

Note: Cronbach’s alpha reliability shown on diagonal. SJT = situational judgment test.
*p < .05. **p < .01.
Table 5
Teamwork Scores Compared by Latent Classes

                   High teamwork skills (n = 64)   Low teamwork skills (n = 95)
Teamwork score     M (SD)                          M (SD)                          t
Cooperation        4.74 (0.74)                     4.32 (0.81)                     3.38**
Advocate/Guide     4.14 (0.81)                     3.88 (0.78)                     2.00*
Negotiation        4.59 (0.68)                     4.36 (0.78)                     2.01*
SJT                4.27 (0.35)                     3.50 (0.44)                     11.85**
Teacher report     3.43 (1.05)                     3.28 (1.25)                     0.81

Note: SJT = situational judgment test.
*p < .05. **p < .01.
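The t statistics in Table 5 can be approximated from the reported means, standard deviations, and class sizes. The sketch below assumes a pooled-variance (Student’s) t test, which the text does not specify, and the helper name `pooled_t` is hypothetical. For the SJT row it gives about 11.72 rather than the reported 11.85, a gap that plausibly reflects rounding of the published means and SDs.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t from summary statistics (pooled SD)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se


# SJT row of Table 5: high (n = 64) vs. low (n = 95) teamwork class
t = pooled_t(4.27, 0.35, 64, 3.50, 0.44, 95)
print(round(t, 2))  # 11.72
```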
Relationships between Teamwork Scores and Demographic Variables
No significant gender or ethnic differences were found for the self-report subscales, teacher-report scores, or SJT scores. Age was positively correlated with the three self-report scores (r = .31 to .35, p < .01) and SJT scores (r = .32, p < .01) but not significantly with the teacher-report score. Participants in the high-teamwork latent class were significantly older than participants in the low-teamwork latent class (M = 16.31, SD = 0.96 vs. M = 15.91, SD = 1.07; t = 2.42, p < .05). Age significantly predicted all teamwork measures when controlling for the number of Ford PAS modules a student had undertaken (partial r = .25 to .30 for self-reports, .28 for the SJT, and .18 for the teacher report, p < .05 in all cases). However, the number of Ford PAS modules undertaken was not a significant predictor of any teamwork measure after controlling for age.
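A partial correlation of this kind expresses the association between age and a teamwork score after removing the part of each that is linearly predictable from the number of Ford PAS modules taken. The standard first-order formula is sketched below with a hypothetical helper name; the zero-order correlations needed to apply it are not reported here, so the example inputs are purely illustrative.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    e.g., x = age, y = a teamwork score, z = number of Ford PAS modules
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical inputs only
print(round(partial_corr(0.50, 0.50, 0.50), 3))  # 0.333
```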
Relationships between Teamwork Scores and Course Grades
The SJT scores did not correlate significantly with students’ grades. The teacher report correlated significantly with math, science, and social studies grades (r = .21, .30, and .27, respectively, p < .01). Cooperation correlated with science and music grades (r = .18 and .38, p < .05), whereas Advocate/Guide correlated positively with science (r = .32, p < .01), social science (r = .19, p < .05), and music grades (r = .40, p < .01). Negotiation shared a positive correlation with music grades only (r = .50, p < .01). A grades composite was calculated by taking the mean of the different course grades. Only the Advocate/Guide self-report scores and the total teacher-report score were significantly correlated with the grades composite (both r = .25, p < .01).
Discussion
Scores on the three teamwork assessments (a) were related to each other, (b) were unrelated to ethnicity or gender, (c) increased with age, and (d) were related to school grades. Generally, the aims of the study were met: A multiple-method assessment system was developed for high school students, and scores from this assessment suite showed evidence of convergent and criterion-related validity. Although all teamwork assessments were related, the different methods had some distinguishing features, with implications for the uses and purposes of such assessments. These issues are discussed below.
Relationships among Teamwork Measures
The strongest relationship between self- and teacher-reports was for Advocate/Guide, perhaps because this factor is more obvious to external observers (Cooperation and Negotiation appear less open to observation). The SJT was less strongly related to the teacher report than to the self-report scales. Plausibly, if educators wish to measure constructs that are not obviously and frequently observable in students’ behaviors, they may need to supplement teacher ratings with reports from observers who know the student well (e.g., family members, peers). Generally, these results support the construct validity of the scores but indicate that different measurement methods may capture different aspects of the teamwork construct.
Developmental Trajectory of Teamwork over Late Adolescence
This study found significant positive correlations of age with self-report and SJT teamwork scores. As these students were involved in teamwork courses run by Ford PAS, the results may reflect learning effects (students in higher grades often take more courses than students in lower grades). However, correlations remained significant after controlling for the number of Ford PAS modules taken, indicating that the increases are due to maturation rather than course exposure. This outcome has important implications for using such assessments for program evaluation, as any program-related increase in teamwork would have to be compared against natural developmental gains. That said, this study was not a controlled intervention design. Future studies might investigate intervention effects using a pre–post control group design, with the multiple-method assessment of teamwork serving as the dependent variable(s).
Does Teamwork Predict Academic Achievement?
Self-reported teamwork correlated with grades in different courses to varying degrees. Only the teacher-report score and the Advocate/Guide self-report score predicted the grade composite, although several measures predicted individual subject grades, with the strongest relationship found for music. Although music had the smallest number of cases (n = 41), this relationship makes conceptual sense. Of all the subjects measured, academic performance in music depends most on teamwork: playing pieces as a group forms an essential part of the subject, and negotiating piece choice, solos, and group practice times plays a role in final performance and grade. Such a focus on team performance is not essential for many other subjects, though it is certainly relevant to the performing arts, debating, and team sports (note, too, that the importance of teamwork may differ with the pedagogical approach to a subject, as with competitive mathlete teams in mathematics). In short, overall grade point average may be too broad a variable for teamwork to predict; teamwork may be more useful in predicting those aspects of academic performance in which it is emphasized.
Comparison of Self-Report, SJT, and Teacher-Report Scores
There are advantages and disadvantages to each assessment method used in this study, and an argument could be made that these complement and offset each other. The primary problem with self-reports is that they may be susceptible to impression
management: students may fake good or even fake bad. Teacher-reports are relatively more objective and may reduce faking problems. However, it is unlikely that a teacher could observe all aspects of students’ teamwork skills, so teacher-ratings may not cover the entire spectrum of the teamwork construct (Youngstrom, Loeber, & Stouthamer-Loeber, 2000). In addition, teacher-ratings may be difficult to implement in practice, given the workload such an activity places on a teacher with a typical class of 20 to 30 students. Indeed, the zero variability found for 35 teacher-report cases in this study might indicate that teachers were fatigued by this assessment procedure.
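One simple way to screen for the zero-variability cases mentioned above is to flag any case whose teacher ratings are identical across all items. The sketch below assumes that "zero variability" means exactly this (the same rating on every teamwork item for a given student) and that ratings are stored per student; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Sequence


def flag_zero_variance(ratings: Dict[str, Sequence[int]]) -> List[str]:
    """Return the IDs of cases whose item ratings are all identical.

    `ratings` maps a (hypothetical) student ID to one teacher's ratings
    on each teamwork item. Cases with no ratings are skipped, not flagged.
    """
    flagged = []
    for student_id, items in ratings.items():
        # A single distinct value across all items means zero variance.
        if len(items) > 0 and len(set(items)) == 1:
            flagged.append(student_id)
    return flagged
```

Flagged cases could then be tabulated by rating teacher: if they concentrate among a few raters, that pattern fits rater fatigue better than genuinely uniform students.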
A further advantage of teacher assessments is that ratings are not confounded by the student’s verbal ability. Students who are English language learners, for example, may not understand some items and may thus give answers indicating poor teamwork because of limited language comprehension rather than poor teamwork. This potential confound might be particularly problematic for the SJT, which involves large amounts of text. However, presenting SJT items via video or audio may ameliorate this problem.
In addition, SJTs may detect subtle judgment processes by asking participants to provide intuitive judgments about ecologically valid scenarios. This ecological validity may also make SJT items more engaging than traditional self-reports, although such context-rich material makes it difficult to establish objective scoring standards. In this study, however, the LCA of students’ responses provided independent confirmation of expert judgments. Latent classes that disagreed or agreed with the expert scoring key were identified, providing evidence for the validity of the expert judgments. This novel approach to ascertaining the validity of expert opinion could usefully be extended to SJTs assessing workplace competencies, tacit knowledge, or emotional intelligence. Although the small sample size in this study made a multilevel analysis of all 32 responses (4 × 8 scenarios) impossible, multilevel LCA models could be developed in future research to accommodate all responses in a ratings-based SJT.
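The idea of checking an expert key with latent classes can be sketched as a two-class latent class model for binary items, fitted by expectation-maximization (EM). This sketch assumes each SJT response has been recoded as 1 if it matches the expert key and 0 otherwise; that recoding, the function names, and all parameter defaults are illustrative assumptions. The study itself analyzed ratings-based responses with dedicated software (the reference list cites the Mplus user's guide), so this is a simplified stand-in, not the authors' model.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _e_step(data: Sequence[Sequence[int]], weights: List[float],
            probs: Matrix) -> Tuple[Matrix, float]:
    """Posterior class probabilities for each respondent, plus the log-likelihood."""
    post: Matrix = []
    loglik = 0.0
    for row in data:
        # log P(class k) + sum_j log P(x_j | class k), assuming local independence.
        logs = []
        for w, p_k in zip(weights, probs):
            lp = math.log(w)
            for x, p in zip(row, p_k):
                lp += math.log(p) if x else math.log(1.0 - p)
            logs.append(lp)
        top = max(logs)  # log-sum-exp trick to avoid underflow
        log_marg = top + math.log(sum(math.exp(v - top) for v in logs))
        loglik += log_marg
        post.append([math.exp(v - log_marg) for v in logs])
    return post, loglik


def fit_binary_lca(data: Sequence[Sequence[int]], n_classes: int = 2,
                   n_iter: int = 200, seed: int = 0,
                   eps: float = 1e-6) -> Tuple[List[float], Matrix, float]:
    """Fit an unrestricted latent class model to 0/1 items by EM.

    data[i][j] is assumed to be 1 when respondent i's answer to item j
    matches the expert key and 0 otherwise (a hypothetical recoding).
    Returns (class weights, per-class item probabilities, log-likelihood),
    where probs[k][j] estimates P(item j matches the key | class k).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random, asymmetric starting values so the classes can separate.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    for _ in range(n_iter):
        post, _ = _e_step(data, weights, probs)
        # M-step: class sizes and item probabilities weighted by the posteriors.
        sizes = [max(sum(r[k] for r in post), eps) for k in range(n_classes)]
        total = sum(sizes)
        weights = [s / total for s in sizes]
        for k in range(n_classes):
            for j in range(n_items):
                pj = sum(r[k] * row[j] for r, row in zip(post, data)) / sizes[k]
                # Clamp away from 0 and 1 so the logarithms stay finite.
                probs[k][j] = min(max(pj, eps), 1.0 - eps)
    _, loglik = _e_step(data, weights, probs)
    return weights, probs, loglik
```

A class with mostly high item probabilities corresponds to respondents who tend to agree with the expert key, and a class with mostly low probabilities to those who tend to disagree. Because EM can stop at a local maximum, it is common practice to refit from several seeds and keep the solution with the highest log-likelihood.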
Limitations of the Current Study
As with any newly developed assessment, there is a clear need to conduct additional studies. In the present instance, these include (a) formally testing the factor structure across additional, disparate samples (e.g., non-Ford PAS students, ethnic groups), (b) gaining a better understanding of the extent to which group performance factors into grades in specific school subjects, and (c) formally conducting a multitrait-multimethod study (e.g., by including more SJTs to explore the dimensionality of the construct). Clearly, too, it would be especially important to conduct a longitudinal study of these teamwork assessments to more fully understand developmental trends and possible causal mechanisms.
Future Applications for Teamwork Research in High Schools
Subject to certain caveats, including the need for further validity studies, there are
several useful ways that this teamwork instrument might be applied in high schools.
First, this multiple-method assessment might be used for early identification and primary intervention, as deficits in teamwork can potentially harm students’ prospects in higher education, their career opportunities, and their quality of life. Certainly, it would be beneficial
to identify students with deficits in teamwork sooner rather than later and provide
appropriate remediation. Students high on teamwork might be selected as mentors
or role models for students low on teamwork, with study or project groups composed
accordingly. In addition, feedback and suggestions for improvement might be given
to students based on their own unique profile of teamwork skills.
Second, the instrument might be used to gauge the effects of training. Programs
emphasizing teamwork (e.g., Ford PAS) are already implemented in some schools.
The multiple-method assessment could help determine those aspects of teamwork that are most amenable to training and fine-tune the programs accordingly. Third, the instrument might be used for career guidance, in conjunction with cognitive tests and interest inventories. For example, students with very high negotiation skills might be directed toward courses or careers where these skills would prove valuable.
Overall, this study suggests some promising new directions in teamwork research
and its application in high schools. A reliable multiple-method teamwork assess-
ment system was developed with promising validity evidence. Such an instrument
might profitably be used for many purposes in high schools, with multiple methods offering a useful way to overcome the practical limitations of any single assessment given in isolation.
Note
1. After exclusion, only one reverse-keyed item remained. The phenomenon whereby reverse-keyed items do not load on the same factors as non-reverse-keyed items (and are hence removed from the item pool) is not isolated to the current dataset but is a measurement issue that has been commented on frequently in the literature (e.g., Barnette, 2000). It remains unresolved how to deal with this issue while also controlling for acquiescence or other response sets.
References
Ahles, C. B., & Bosworth, C. C. (2004). The perception and reality of student and workplace teams.
Journalism & Mass Communication Educator, 59, 42-59.
Barnette, J. J. (2000). Effects of stem and Likert-response option reversals on survey internal consistency:
If you feel the need, there is a better alternative to using those negatively worded stems. Educational
and Psychological Measurement, 60, 361-370.
Barton, P. E. (2007). What about those who don’t go? Educational Leadership, 64, 26-27.
Hu, L.-T., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation
modeling: Concepts, issues, and applications (pp. 76-99). Thousand Oaks, CA: SAGE.
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis:
Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary
Journal, 6, 1-55.
Liu, O. L., & Rijmen, F. (2008). A modified procedure for parallel analysis for ordered categorical data.
Behavior Research Methods, 40, 556-562.
Loughry, M. L., Ohland, M. W., & Moore, D. D. (2007). Development of a theory-based assessment of
team member effectiveness. Educational and Psychological Measurement, 67, 505-524.
Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing
approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler’s
(1999) findings. Structural Equation Modeling: A Multidisciplinary Journal, 11, 320-341.
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. P. (2001). Use of
situational judgment tests to predict job performance: A clarification of the literature. Journal of
Applied Psychology, 86, 730-740.
McDaniel, M. A., & Nguyen, N. T. (2001). Situational judgment tests: A review of practice and constructs
assessed. International Journal of Selection and Assessment, 9, 103-113.
Morgeson, F. P., Reider, M. H., & Campion, M. A. (2005). Selecting individuals in team settings: The
importance of social skills, personality characteristics, and teamwork knowledge. Personnel Psychology,
58, 583-611.
Muthén, L. K., & Muthén, B. O. (1998–2007). Mplus user’s guide (4th ed.). Los Angeles, CA: Author.
O’Neil, H. F., Jr., Wang, S., Jr., & Lee, C. (2003). Assessment of teamwork skills via a teamwork
questionnaire. In H. F. O’Neil Jr. & R. S. Perez (Eds.), Technology applications in education: A learning
view (pp. 283-303). Mahwah, NJ: Erlbaum.
Stevens, M. J., & Campion, M. A. (1994). The knowledge, skill, and ability requirements for teamwork:
Implications for human resource management. Journal of Management, 20, 503-530.
Stevens, M. J., & Campion, M. A. (1999). Staffing work teams: Development and validation of a selection
test for teamwork settings. Journal of Management, 25, 207-228.
Youngstrom, E. A., Loeber, R., & Stouthamer-Loeber, M. (2000). Patterns and correlates of agreement
between parent, teacher, and male youth behavior ratings. Journal of Consulting and Clinical Psychology,
68, 1-13.
Zeidner, M., Matthews, G., & Roberts, R. D. (2009). What we know about emotional intelligence: How
it affects learning, work, relationships, and our mental health. Cambridge, MA: MIT Press.
Lijuan Wang obtained her PhD in quantitative psychology from the University of Virginia in 2008 and is an assistant professor in the Department of Psychology at the University of Notre Dame.
Carolyn MacCann received her PhD in psychology from the University of Sydney in 2006, before
undertaking a 2-year postdoctoral fellowship at the Educational Testing Service, Princeton. Her research
focuses on social and emotional competencies and noncognitive constructs.
Xiaohua Zhuang is a PhD candidate in cognitive psychology at Rutgers University. Her research inter-
ests include visual attention, inattentional blindness, and visual consciousness.
Ou Lydia Liu received her PhD from the University of California, Berkeley, in quantitative methods and
evaluation. She is currently an associate research scientist at Educational Testing Service, Princeton. Her
research areas include educational and psychological instrument validation, accountability measures in
higher education, and science assessment.
Richard D. Roberts, PhD, is a principal research scientist in the Center for New Constructs in the
Educational Testing Service’s Research & Development Division, Princeton, NJ. His main areas of specialization are assessment and human individual differences, in which he has published more than 150 peer-reviewed works and received significant grants, contracts, and awards.