Team Formation Dynamics in Online Learning: Homophily and Heterophily

Team Formation Dynamics: A Study Using Online
Learning Data
Milad Eftekhar
Department of Computer
Science
University of Toronto
Toronto, ON, Canada
milad@cs.toronto.edu
Farnaz Ronaghi
∗
Department of Management
Science and Engineering
Stanford University
Palo Alto, CA, USA
farnaaz@stanford.edu
Amin Saberi
Department of Management
Science and Engineering
Stanford University
Palo Alto, CA, USA
saberi@stanford.edu
ABSTRACT
Using data from online courses, we study the dynamics of
team formation in online environments. In particular, we
observe that the teams formed by online students for com-
pleting course projects are homogeneous in terms of age,
location and education level but diverse in terms of pri-
mary skill. Motivated by the data, we propose a coalitional
game that captures the teaming preferences of individuals
and show that the core of the resulting game is always non-
empty. Even though our proof is constructive, it does not
always yield a polynomial-time algorithm. We show that it
is NP-hard to find a solution in the core in the general case
and propose polynomial-time algorithms for natural special
cases motivated by observations of online course data.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications–
Data Mining; K.3.1 [Computers and Education]: Com-
puter Uses in Education – Collaborative Learning; I.2.11
[Artificial Intelligence]: Distributed Artificial Intelligence
– Multiagent Systems; J.4 [Social and Behavioral Sci-
ences]: Economics
General Terms
Algorithms, Measurement, Theory
Keywords
Team Formation; Online Education; Social Learning; Core
of Collaborative Games
1. INTRODUCTION
In addition to increasing access to education at global
scale, the emergence of Massive Open Online Courses (MOOCs)
∗This work was done while authors were at NovoEd Inc.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from Permissions@acm.org.
COSN’15, November 2–3, 2015, Palo Alto, California, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3951-3/15/11 ...$15.00.
DOI: http://dx.doi.org/10.1145/2817946.2817967.
has given us a new view into how people learn, interact, and
collaborate. The availability of the MOOC data across re-
search institutions, their scale and granularity, as well as
their reach across countries and cultures make them an in-
valuable source for research.
This paper focuses on the dynamics of team formation
in an online environment. We studied team formation in
11 online courses with about 50 thousand participants from
over 150 different countries. The courses were offered on
NovoEd, where teamwork and collaboration are an integral
part of the coursework. Students were asked to form teams
to work on a business project. The teams were organic;
participants could search and browse each other’s profiles
and join each other’s teams.
We studied the teams formed by students and reviewed
the joint distribution of characteristics, e.g., age, location,
gender, and education level. We observed a high degree of
homogeneity across age, location, and education level but
surprisingly not across gender. This was consistent across
the courses as well as across different segments of the popu-
lation. The tendency of individuals to associate with similar
others - dubbed as homophily - is discovered in a vast array
of studies about social networks.
The more interesting observation here is that among suc-
cessful organic teams (teams that possess a high fraction
of members who acquired a statement of accomplishment
at the end of the course), we also observed a high degree
of diversity (heterophily) in terms of participants’ primary
skill sets. In other words, multidisciplinary teams with more
diverse skill sets were more successful than the rest: they
sought to form multidisciplinary teams by seeking team-
mates with complementary skills. Homophily (across age,
location, and education level) and heterophily (across skill
set) proved to be an effective strategy, leading to more suc-
cessful teams. Section 3 provides a detailed description of
our observations.
Based on the observations, we proposed a simple game
theoretic model to capture the preferences of individuals and
their dynamics. In our model, each individual (or “agent”)
was endowed with a vector of characteristics, like age, lo-
cation, or skill set. We partitioned agents into teams of a
certain size and assumed that team success depended on
the characteristics of the individuals present in that team.
Obviously, every agent would seek a team with the highest
quality but would have to be accepted into the team, as
well. To capture this, we used the notion of a coalitional
game core [10]. We define a partition of agents into teams

as begin is in the core if no subset of agents could improve
its score function by deviating from the proposed partition
and forming its own team.
A few aspects of this model merit comment. First, the
notion of core is a natural generalization of pairwise sta-
bility used in the context of widely studied stable marriage
problems [20]. Instead of pairwise stability, we focus on the
stability of groups of size larger than two. Second, like the
stable marriage problem, we do not allow coalition to com-
pensate each other with side payments. More precisely, we
use the notion of core in a game with non-transferable pay-
off.
The most important distinction of the games studied here
is that all team members benefit equally from the union.
In other words, the utility of an agent in the team equals
the value or quality score of that team. This is a natural
assumption in our setting since all students in a team re-
ceive the same grade for a joint project. But our model is
applicable elsewhere, too. For example, members of a group
benefit in a similar way when they form teams to undertake
a project that supports a community (like building a park,
library or school) or when there are social norms against
differentiation (like specifying authors in a theory paper).
We prove that under general assumptions, and as long as
members of a team equally benefit from its value, the core
of the game is non-empty. In other words, it is always pos-
sible to partition agents into sets such that no subset has
an incentive to secede. Our proof (presented in Section 5)
is constructive, but it does not always lead to a polynomial-
time algorithm. In fact, we prove that it is NP-hard to find
a stable partition in most cases. Section 6 offers polyno-
mial time algorithms for natural special cases motivated by
observations in Section 3.
2. RELATED WORK
The impact of homogeneity or diversity on team effective-
ness has been studied extensively. In homogeneous teams,
members share similar characteristics which results in eas-
ier collaboration, positive reactions, and extensive engage-
ment [17,23]. In diverse teams, the existence of varied abil-
ities and ideas sparks creativity and helps achieve a higher
final performance [18]. Another detailed study of team com-
position (including personalities, skills, team size, roles, etc.)
and its impact on team performance is found in Senior and
Swailes [21].
A specific line of related research focuses on the impact of
a founding team’s composition on the performance of firms
and ventures [5,7–9].
The team formation problem has been studied extensively
for scenarios that aim to form a single team for each exist-
ing task. Having a set of users and one or multiple tasks,
the goal is to find a team that contains all required skills
to complete the task. The problem has been studied in op-
erations research [25] and revisited in computer science by
adding compatibility constraints (in the form of communi-
cation overhead over a social network of users) [15], cost
constraints (the cost associated with adding each user to
a team) [3, 19], and time constraints (task work permitted
only during free time) [11]. Another extension of the prob-
lem considers a time-series of arriving tasks whereby users
are assigned to each task with a fair task allocation [4,16].
The concept of teams and forums performing collaborative
learning in MOOCs has been studied more recently [13,
24]. In one work, users are assumed to have a vector of
characteristics, and the goal is to group users into teams with
an upper-bound size such that team members have identical
values on all characteristics [6].
Another interesting line of work [1, 2] considers the vari-
ance in individual ability. Assuming the ability of a group of
users is the average of the ability levels of its members, the
goal is to partition users into groups such that the number
of users with an ability level less than that of their team
is maximized. In our context, users have complete control
over whom they choose to accept into their teams, therefore,
we chose game-theoretic modeling instead of optimization.
Combining the approach of the two papers poses an inter-
esting open problem.
The structure of teams as coalitions has been studied as a
network formation game in [22] and [12], where the strength
of a coalition is defined as the strength of the social network
structure creating it in a hedonic coalition formation game.
In many network coalition games, the network is used to
determine payoffs, or even network creation is considered
as a strategic process [14]. Studying the team formation
problem as a network formation game, one where network
structure affects the utility functions of agents, offers an
interesting future direction for our work. This paper defines
the utility function of a team based on the characteristics of
its members.
3. DATA OBSERVATIONS
This section describes the data, our methodology for mea-
suring homogeneity and diversity in teams, and our analysis.
3.1 Teams
NovoEd is a social learning platform used to offer expe-
riential and collaborative online classes of varying sizes. In
many courses offered on NovoEd, team work and collabora-
tion form an integral part of coursework.
The learning platform supports two types of team for-
mation processes: algorithmic and organic. Algorithmic
teams are formed automatically based on criteria set by
the instructor (e.g., size of the team and geographical loca-
tion of the members), and are often transient, being formed
around a particular assignment and dissolved afterwards.
Organic teams are formed by learners themselves and may
persist throughout the course, although students can leave
one team and join another at any time. Students find each
other through searching submitted assignments, answers to
course profile questions, locations and keywords in the pro-
file. They may be invited to join a team by the team leader
or individually ask to join a team.
This paper focuses on organic teams formed by the stu-
dents, analyzes student preferences when selecting a team,
and identifies success factors in these teams.
3.2 Dataset
Our dataset includes 11 courses offered in the summer of
2014. Enrollments ranged from 200 students to more than
25, 000 students per course. These courses were four- to
eight-weeks long and covered various business topics. Three
of the analyzed courses had tens of active students (students
who had done some course activities e.g., submitting an as-
signment), five had hundreds of them and three had thou-
sands.

Each student had a profile that included age, education,
gender, and location. Instructors could choose to introduce
extra profile questions such as those regarding skills. Profile
question responses were the main mechanism for searching
out team members.
3.3 Measuring Homogeneity and Diversity in
a Team
We propose a framework to determine whether students
prefer to work with others similar to them (i.e., homogeneity
is preferred) or with a diverse group (i.e., diversity is pre-
ferred). We describe this analytical framework by studying
age diversity in organic teams.
Students are required to select their age range in their
profile. Provided options are: 18−20, 21−25, 26−30, 31−
35, 36 − 40, 41 − 45, 46 − 50, Over 50. The age range
selected is not shown on their profiles and is not searchable
by other students, but it could be inferred from student
profile pictures and biographies.
We calculate the number of pairs of students in one team
that belong to age groups X and Y . To remove size bias, we
normalize this number by the total number of possible pairs
that could have been created from members of age group X
with members of age group Y (students are not uniformly
distributed in age groups). Let n(X) represent the number
of students in age group X and pair(X, Y ) represent the
number of pairs with one student from age group X and one
student from age group Y . Let temp(X, Y ) = pair(X,Y )
n(X)×n(Y )
.
For the special case of X = Y , the number of possible pairs
is n(X)
2
. Hence, we define temp(X, X) = pair(X,X)
(n(X)
2 )
.
Consequently, we define a metric to measure the prefer-
ence of students from age groups X and Y to be in one team
as:
pref(X, Y ) =
temp(X, Y )
average( Z temp(X, Z), Z temp(Y, Z))
Figure 1 shows age preferences in forming teams for a typ-
ical course.1
Darker colors represent larger values, showing
age groups that prefer to be in the same team. We observe
that cells on the diagonal (or close to it) have larger values.
This suggests a preference for age-homogeneous team cre-
ation. Students under 20 or students over 50 show a strong
preference to join same-age teams (viz., dark colors on the
bottom left and top right cells), while students between 26-
40 show more flexibility in age groups when forming teams.
Student profiles in our dataset also include education level
(i.e., Some high school, Graduated high school or equiva-
lent, Some college or university, Graduated with an asso-
ciate degree, Graduated with a bachelor’s degree, Gradu-
ated with a master’s degree, Graduated with a doctorate
degree); gender (male, female); location (latitude-longitude
pair); and skill set. Skill set, a course-specific question,
may have different values in different courses. For exam-
ple, skill set options defined by the instructor in a busi-
ness class included: Aerospace, Finance, Machinery, Archi-
tecture, Chemicals - Materials, Consumer products, Other
manufacturing, Telecommunications, Publishing- Schools, Ser-
vice primary secondary, Energy - Electric utilities, Software
- Internet - Mobile, Drugs - Biotech - Medical devices, Man-
agement and finance consulting, Electronics - Computers,
Law - Accounting - other business services, Other services.
1
We observed consistent behavior across all courses.
Homogeneity in teams with regard to age
18-20 21-25 26-30 31-35 36-40 41-45 46-50 over 50
18-2021-2526-3031-3536-4041-4546-50over50
Figure 1: Students prefer to form teams with those of the
same age group.
Figure 2: Preference for age-homogeneous team membership
We look deeper into team formation and analyze homo-
geneity preferences across different profile questions utilizing
the following analytical framework. For each profile ques-
tion, we plot the cumulative distribution of distances be-
tween responses given by team members in organic teams
and compare it to the same distribution in teams that could
be built completely at random. We consider all students
who responded to each of the profile questions in this analy-
sis. Profile questions can be represented as numerical (loca-
tion [latitude, longitude] and gender [0,1]), ordinal (age and
education), and categorical (skill set). We define pairwise
distances between two data points for each profile question
as follows:

Figure 3: Preference for longitude-homogeneous team mem-
bership
• Numerical profile questions: Location is defined
by two numeric values: latitude and longitude. Age is
defined as an integer by assigning binary values to male
and female. Pairwise distances of two data points for
age and location is defined as the Euclidean distance
between them.
• Ordinal profile questions: Ordinal features, such
as age range and education level, are represented with
consecutive categories. We define the pairwise distance
between two data points as the distance between the
categories to which they belong. For example, for ed-
ucation, the distance between “Some high school” and
“Graduated high school or equivalent” is 1.
• Categorical profile questions: We utilize entropy
to define the distance between categorical values. En-
tropy in a team is defined as m
i=1 pi log(pi), where pi
is the fraction of students who have selected i as their
response, and m is the number of categories.
3.4 Homogeneous and Diverse Teaming Pref-
erences
We calculate cumulative distribution functions for pair-
wise distances of student profile data in teams formed organ-
ically and compare them to random teams. Organic teams
are homogeneous with respect to a profile question if the
CDF of pairwise distances of team members for that pro-
file question appears higher than the same CDF for random
teams.
We also study the impact of homogeneity on the success
of students in organic teams by comparing the pairwise dis-
tance CDFs for all organic teams with the CDFs for the
organic teams that contain at least 50% (“successful”), 75%
(“very successful”), and 100% (“extremely successful”) suc-
cessful team members. A successful student is one who has
completed the course and received a statement of accom-
plishment. Criteria for receiving a statement of accomplish-
ment are highly dependent on the course instructor.
Figure 4: Preference for latitude-homogeneous team mem-
bership
As explained in the following subsections, we observe that
students prefer homogeneity with regard to location, dis-
tance, age range and education level. Interestingly, however,
students do not have any preference concerning the gender
of their team members. We also observe that students tend
to prefer heterogeneous skill sets in successful organic teams.
3.4.1 Homogeneous Teaming Preferences
Figure 2 shows the CDF of pairwise distances for age range
in a course. The CDF curve for the organic teams is always
above the CDF for the random case, attesting to the pref-
erence for age-homogeneity in team formation. This result
was reproduced consistently across all eleven courses in our
dataset.
Figures 3-6 show CDF curves for longitude (time zone),
latitude, geographical distance, and education level. The
CDF for organic teams appears above the CDF for random
teams for all these features consistently across all courses –
even for courses with supportive teams (i.e., students who
discuss assignments in teams but write answers individu-
ally). Teams show greater homogeneity with regard to lon-
gitude than latitude: longitude represents time zone, and
our results show that students organically form teams with
those in similar time zones.
We also reviewed invitations extended by team leaders
when they are growing their teams and observed similar ho-
mogeneous preferences for team invitations. This provides
us with a second resource to confirm that students prefer to
work with those with characteristic similar to their own.
We analyzed team formation preferences for successful or-
ganic teams with regard to age, education, and location
and reproduced the same homogeneous preferences for these
teams compared to the random case. However, it is impor-
tant to note that increasing homogeneity in a team does not
necessarily translate into making individual students more
successful in that team. In fact, in some courses we observed
that teams with more successful students depict a more di-
verse distribution compared to all other organic teams. This

Figure 5: Preference for distance-homogeneous team mem-
bership
Figure 6: Preference for education-level-homogeneous team
membership
suggests that teams with many successful students are more
homogeneous than random teams but not necessarily more
homogeneous than other organic teams.
We also analyzed teaming preferences with regard to gen-
der. We didn’t find any statistically significant result that
is consistent across all courses.
3.4.2 Diverse Teaming Preferences
Finally, we analyzed skill diversity CDF curves for organic
teams and for successful organic teams. Figure 7 shows that
the CDF curves for the organic teams with many success-
ful members are lower than the CDF curves for all organic
teams. Higher skill set entropy in a team is equivalent to
Figure 7: Skill diversity in high-profile organic teams versus
all organic teams
higher skill set diversity among that team’s members. Teams
with more successful members have diverse skill sets. This
result is consistent across all courses. Therefore, students
depict diverse teaming preferences with regard to skill set
and other similar profile questions (e.g., background, career
sector, and professional interests).
4. MODELING
Our observations in Section 3 suggest two main criteria
to consider when forming teams. Individuals prefer to form
teams with those who are in the same timezone and have
a similar location, age range and education level. We also
observed that members of teams with more diverse skill sets
achieve better learning outcomes in online courses.
To better understand the dynamics of team formation, we
employ tools and solution concepts from cooperative game
theory. This is appropriate because in our setting individu-
als have agency over team formation by choosing their own
teammates. Moreover, they benefit directly from the success
of their team and strive to form teams that have the highest
probability of success.
Let A be a set of n agents (e.g., students). Let c : A → Z
map each agent to a vector of l characteristics with discrete
values like age, education level, latitude, and longitude.2
For each a ∈ A and 1 ≤ j ≤ , cj(a) denotes the j-th
characteristic of a.
In addition, we assume that each agent has a primary skill
s : A → S, where S = {s1, ..., sm}.
Team and Team Allocation. Given a number k and a
set of predetermined thresholds {δj}1≤j≤ ≥ 0, a set T ⊆ A
is a team if 1 ≤ |T| ≤ k, and for each a, b ∈ A and 1 ≤ j ≤ ,
|cj(a) − cj(b)| ≤ δj.
We consider the above constraints as the homogeneity con-
straints, and we call {δj} the homogeneity thresholds. Here,
2
Without loss of generality, we assume the difference be-
tween two continuous values is 1 and the values are in Z.

k introduces a size constraint to avoid making very large
teams, and homogeneity constraints are introduced in accor-
dance with the observations in Section 3.4. We use T ⊆ 2A
to denote the set of all possible teams of agents. A team
allocation P is a partition of A into teams.
Value Function of a Team. The value function, v :
2A
→ R+, assigns a non-negative value to each set B ⊆ A.
One can think of v(T) as the quality score or the likelihood
of success of a team T. We assume that v(B) is a monotone
increasing function of the skill set present in B. In other
words, we assume there exists a function g : 2S
→ R+ such
that for any B ⊆ A,
v(B) = g (∪a∈Bs(a)) .
These assumptions simplify the problem significantly. In
a sense, we reduce each person to a vector of characteristics
and each team to a skill set. Obviously, the reality is far
more complex; many other factors affect the decision of in-
dividuals to form a team and whether the resulting team is
successful. On the other hand, in many situations, including
team formation in an online course, individuals must form
teams with limited information about each other. Our Sec-
tion 3 observations indicate that the diversity of skill set and
homogeneity of basic characteristics do play a strong role in
both formation and success of teams and hence the validity
of our assumptions.
Utility function. The utility of every agent is equal to
the value function or quality score of the team to which he
or she belongs. More formally, for any agent a ∈ T of a
partition P,
uP (a) = v(T).
In our model, utility function assumes that all members
benefit equally from the value created by the team. In other
words, team members cannot distribute the value freely or
compensate each other with side payments. This model
applies to a variety of settings. The most immediate ex-
ample is in the context of online courses, where all team
members receive the same grade for finishing the project.
Other scenarios include those where a large group benefits
almost equally from some projects, such as building a mu-
seum, park, or school. Our model also applies tp situations
where it is against social norms to differentiate explicitly be-
tween contributors (e.g., as when writing a paper in a theory
conference).
4.1 Stable Solutions and Core of a Coopera-
tive Game
Core of a game is a simple and intuitive notion defined in
cooperative game theory to study the stability of groups or
coalitions. The core consists of all configurations of agents
and payoff allocations that cannot be improved upon by a
coalition of agents. This means that once an agreement in
the core has been reached, no coalition has an incentive to
secede.
In our context, a team allocation P is stable (P ∈ core) if
and only if for all T ∈ T , there exists an agent a ∈ T such
that3
v(T) ≤ uP (a).
This means that for all possible teams T, the utility of at
least one agent (we call it a) under the current team allo-
3
Please recall that T denotes the set of all possible teams.
cation P is not less than its utility when it secedes from P
and forms team T. Hence, a is not willing to secede, and P
is stable.
Our goal is to identify a team allocation in which: (1) the
size of each team is at most k, (2) homogeneity constraints
are respected, and (3) the solution is in the core. In particu-
lar, after forming the teams, there is no group of agents that
could improve its utility by leaving its team and forming a
new team.
Problem 1. Given a set of agents, A, a value function
v : 2A
→ R+, a set of homogeneity thresholds {δj}1≤j≤ ≥ 0,
and a size limit k, find a team allocation P that is in the
core.
5. THEORETICAL ANALYSIS OF THE TEAM
ALLOCATION PROBLEM
This section partially answers Problem 1.4
We start by
showing that there is always a team allocation P in the core.
Consider the following optimization problem.
Problem 2. Given a set of agents, A, a value function
v : 2A
→ R+ and a set of homogeneity thresholds {δj}1≤j≤ ≥
0, find a team T ∈ T such that
v(T) ≥ v(T )
for all T ∈ T .
Our first theorem shows that, given access to an oracle that
solves Problem 2, there is an algorithm that can find a team
allocation in the core in polynomial time. Then, we show
that Problem 1 is equivalent to Problem 2 (up to a linear loss
in |A|). We continue by discussing the fact that Problem 2
is NP-hard for an adversarially chosen value function. This
suggests that Problem 1 is also NP-hard. We continue by
constructing a team allocation for a special family of value
functions.
Theorem 1. For any set of agents A, any value function
v : 2A
→ R+, and thresholds {δj}1≤j≤ ≥ 0, core = ∅.
Furthermore, given access to an oracle of Problem 2, there
is a polynomial time algorithm that finds P ∈ core.
Proof. We propose a simple algorithm. First, we use an
oracle of Problem 2 to find a team T with maximum value.
We add T to P, remove its members from A, and repeat the
procedure on the remaining set. The algorithm terminates
when A is empty. See Algorithm 1 for details.
Algorithm 1 Team Allocation Algorithm
while |A| > 0 do
Use an oracle of Problem 2 to find T ∈ T (where
T ⊆ A) of maximum value.
Add T to P and remove T from A, A = A T.
end while
Return P.
Let P be the output. For the sake of contradiction, assume
there is a team T ∈ T such that for all a ∈ T,
uP (a) < v(T).
4
A complete answer would be provided in the next sections.

Let a be the first agent in T that is assigned to another team
by the algorithm, and let T represent that team. Note that
v(T ) = uP (a) < v(T).
By the definition of a, when T is constructed, all members
of T were available, so T was also a feasible team at that
time. Since the oracle returned T, we must have
v(T ) ≥ v(T),
which is a contradiction. Hence, P ∈ core.
Observe that Algorithm 1 runs in polynomial time if the
oracle of Problem 2 runs in polynomial time. Unfortunately,
even where there is no homogeneity constraint and v(.) is a
submodular function, Problem 2 is NP-hard. In the fol-
lowing theorem, we show that this is not a limitation of
Algorithm 1 because Problem 1 is equivalent to Problem 2.
Theorem 2. Problems 1 and 2 are reducible to each other
in polynomial time.
Proof. Theorem 1 shows one direction: if we can solve
Problem 2 in polynomial time, Algorithm 1 solves Prob-
lem 1.
Conversely, given a polynomial time algorithm for Prob-
lem 1, let P be the output of this algorithm. Let T ∈ P be
the team with maximum value,
v(T) ≥ v(T ),
for all T ∈ P. We return T as the solution of Problem 2.
If T is not an optimal solution, there is a team T such that
v(T ) > v(T). But then P is not in the core because if all
agents of T withdraw from their allocated teams in P and
form T , they receive a higher utility.
Even without the homogeneity constraints, finding a set of
cardinality k maximizing a monotone function, even under
submodularity or supermodularity assumption, is NP-hard.
Therefore, Problem 1 is NP-hard.
Corollary 1. It is NP-hard to find a team allocation in
the core.
With a more intricate construction employing homogenity
constraints, we can also show that the maximization prob-
lem is NP-hard even when v is linear in the number of skills
present in the set. However, we leave that proof for a more
extensive version of this article.
6. TEAM ALLOCATION UNDER WEAKLY
SEPARABLE VALUE FUNCTIONS
Given Theorem 1, we focus on a specific class of value
functions. The following is a natural special case that admits
polynomial time algorithms.
Definition 1 (Weakly Separable Functions). A func-
tion g : 2S
→ R+ is weakly separable if for any B ⊆ A and
a, b ∈ A such that
g(B ∪ {a}) ≥ g(B ∪ {b}),
the following holds: For all B ⊃ B such that a, b /∈ B ,
g(B ∪ {a}) ≥ g(B ∪ {b}).
We say that a value function v : 2A
→ R+ is weakly sep-
arable if there is a weakly separable function g : 2S
→ R+
such that for any B ⊆ A,
v(B) = g(∪a∈Bs(a)).
Example 1. The following functions are weakly separa-
ble:
• g(V ) = (|V |)2
• g(V ) = |V |
• For any w : S → R+, g(V ) = s∈V w(s)
The rest of this section assumes that the value function
is monotone increasing and weakly separable. For each skill
s, we define the weight of s, w(s), to be the value of the
singleton containing s,
w(s) = g({s}).
First, we give a simple polynomial time algorithm for
Problem 1 when there is no homogeneity constraint. In Sub-
sections 6.1 and 6.2, we show that Problem 1 is polynomial
time solvable in the number of agents and the homogeneity
thresholds.
Theorem 3. If there is no homogeneity constraint and
the value function is weakly separable, then Problem 1 is
polynomial time solvable.
Proof. By Algorithm 1, we need only solve Problem 2
in polynomial time. Without loss of generality, assume that
there is at most one agent in A having each skill in S (if more
than one, keep one and remove the rest) and that |A| ≥ k
(if |A| < k, return A as the solution). Let A = {a1, . . . , an}
such that
w(s(a1)) ≥ w(s(a2)) ≥ · · · ≥ w(s(an)).
We show {a1, . . . , ak} is the optimum of Problem 2.
First, note that by monotonicity, the optimum set has size
k. Fix a set {b1, . . . , bk} such that
w(s(b1)) ≥ · · · ≥ w(s(bk)).
By definition of {a1, . . . , ak}, for all 1 ≤ i ≤ k,
w(s(ai)) ≥ w(s(bi)).
Therefore,
v({b1, . . . , bk}) ≤ v({b1, . . . , bk−1, ak})
since w(s(ak)) ≥ w(s(bk)) and v(.) is weakly separable. Sim-
ilarly,
v({b1, . . . , bk−1, ak}) ≤ v({b1, . . . , bk−2, ak−1, ak}).
Repeating this k times, we conclude that
v({b1, . . . , bk}) ≤ v({a1, . . . , ak}),
as desired.
Algorithm 2 outputs a team allocation in the core when
the value function is weakly separable and no homogeneity
constraint exists. It is easy to see that the algorithm runs
in time O((|S| + |A|) log(|S| + |A|)).
Building on the above analysis, Theorem 4 characterizes
all team allocations in the core of Problem 1 when no ho-
mogeneity constraint exists and the value function is weakly
separable.

Algorithm 2 Team Allocation Algorithm for weakly sepa-
rable value functions and no homogeneity constraint
1: Sort the skills in the decreasing order of weights, i.e.,
w(s1) ≥ w(s2) ≥ · · · ≥ w(sm).
Assume there is at least one agent with each skill.
2: while |A| > 0 do
3: Construct a new team T of size k by selecting one
agent from each of the highest weight k skills (if less
than k skills remain in the sequence, choose one agent
for each of them).
4: Add T to P and remove its agents from A. If there
are no more agents with skill si, remove si from the
sequence.
5: end while
6: Return P.
Theorem 4. 5
Consider a team allocation solution P.
Let ui be the minimum utility an agent with skill si receives
in P. Rename u values to u such that u1 ≥ ... ≥ um.
Team allocation P is in core if and only if there exists a
permutation of skills ˆs1, ..., ˆsm such that for all 1 ≤ j ≤ m:
uj ≥ g(AjBj),
where Aj = min(k+j−1,m)
i=1 {si}, Bj = j−1
i=1 {ˆsi}, and is
the set difference operator.6
Example 2. Consider a class of 15 students A = {a1, · · · , a15}
and 4 skills {s1, s2, s3, s4} . Students a1 − a3 have the pri-
mary skill s1, a4 − a8 have skill s2, a9 − a12 have skill s3,
and a13 − a15 have skill s4. Students are allowed to form
teams of size at most k = 3. Moreover, g(V ) = |V |.
Consider the following team allocation P:
T1 = {a1, a4, a9}, T2 = {a2, a5, a10}, T3 = {a3, a11, a13},
T4 = {a6, a12, a14}, T5 = {a7, a15}, T6 = {a8}.
The minimum utility of any student with skill s1 is u1 = 3.
Similarly, u2 = 1, u3 = 3, and u4 = 2. By sorting and
renaming ui values we get: u1 = u1 = 3 ≥ u2 = u3 = 3 ≥
u3 = u4 = 2 ≥ u4 = u2 = 1.
Consider the following permutation of skills (ˆs1 = s1, ˆs2 =
s3, ˆs3 = s4, ˆs4 = s2). The team allocation P is in core be-
cause
u1 = 3 ≥ g({s1, s2, s3}{}) = 3,
u2 = 3 ≥ g({s1, s2, s3, s4}{ˆs1}) = 3,
u3 = 2 ≥ g({s1, s2, s3, s4}{ˆs1, ˆs2}) = 3, and
u4 = 1 ≥ g({s1, s2, s3, s4}{ˆs1, ˆs2, ˆs3}) = 3.
The following theorem gives a second approach to charac-
terizing team allocations in the core.
Theorem 5. Let u1, ..., um be sorted minimum utilities
for different skills in a team allocation P. Let s1, ..., sm be
the corresponding skills. Solution P is in core if and only if
for all 1 ≤ i ≤ m
g({si, si+1, ..., smin(i+k−1,m)}) ≤ ui.
Example 3. Consider the setting in Example 2: 15 stu-
dents, 4 skills, k = 3, and g(V ) = |V |.
5
We discuss more complicated proofs, in the extended ver-
sion of this article.
6
Recall that without loss of generality, we assume that
w(s1) ≥ · · · ≥ w(sm).
Let team allocation P be: T1 = {a1, a2, a4}, T2 = {a3, a5, a6},
T3 = {a7, a8, a12}, T4 = {a9, a10, a11}, and T5 = {a13, a14, a15}.
We want to examine if P is in core.
The minimum utilities are u1 = 2, u2 = 2, u3 = 1, u4 = 1.
ui values are already sorted; hence u1 = u1, u2 = u2, u3 =
u3, u4 = u4 and s1 = s1, s2 = s2, s3 = s3, s4 = s4.
P is not in core because u1 = 2 < g({s1, s2, s3}) = 3.
6.1 Team Allocation Problem under One Ho-
mogeneity Constraint
This subsection extends Theorem 3 to the setting where
exactly one homogeneity constraint exists, i.e., we are look-
ing for a team allocation P such that for any team T,
max
a,b∈T
|c(a) − c(b)| ≤ δ.
Note that we are dropping the indices because there is just
one homogeneity constraint and, as usual, δ is the homo-
geneity threshold.
We prove the following theorem.
Theorem 6. If v is a weakly separable value function
and there is exactly one homogeneity constraint, then Al-
gorithm 3 finds a team allocation in the core in time poly-
nomial in |A|, |cmax − cmin|, where cmin = mina c(a) and
cmax = maxa c(a).
Proof. First, observe that an agent a can join agents
b ∈ A only where c(b) ∈ [c(a) − δ, c(a) + δ]. So, for any team
T, there exists an integer i ∈ [cmin, cmax] such that for all
a ∈ T,
c(a) ∈ [i, i + δ].
Without loss of generality, assume cmax − cmin > δ. For
any i ∈ [cmin, cmax − δ], let
Ai := {a : i ≤ c(a) ≤ i + δ}.
As before, we need only solve Problem 2. For each i, we find
a team Ti ⊆ Ai of maximum value using Algorithm 2. Then,
among all Ti, we simply return the one with the maximum
value T∗
i such that
i∗
= argmaxi v(Ti).
Such a set satisfies the homogeneity constraint and obviously
has the maximum weight among all teams.
Algorithm 3 finds a team allocation in the core when there
is only one homogeneity constraint and the value function is
weakly separable.
Algorithm 3 Team Allocation Algorithm for weakly sepa-
rable value functions under one homogeneity constraint
1: For each i ∈ [cmin, cmax − δ], let
Ai = {a : i ≤ c(a) ≤ i + δ}.
2: while |A| > 0 do
3: for i = cmin → cmax − δ do
4: Let Ti be the output of Algorithm 2 on Ai.
5: end for
6: Let T∗
be the Ti with maximum weight.
7: Add T∗
to P and remove it from A and all Ai.
8: end while
9: Return P.
The following theorem is immediate.

Theorem 7. A team allocation P is in core if and only
if Theorem 5 holds for all sets Ai = {a : i ≤ c(a) ≤ i + δ}
where cmin ≤ i ≤ cmax − δ.
6.2 Team Allocation under Multiple Homogene-
ity Constraints
Finally, we proceed to the case where l homogeneity con-
straints are present. For 1 ≤ j ≤ l,
∀ Team T, max
a,b∈T
|cj(a) − cj(b)| ≤ δj.
To identify a team allocation in core, again it is sufficient
to solve Problem 2 for this generalized case. We generalize
the approach described in the previous section. In partic-
ular, we create an l-dimensional cube based on cj values.
The cube contains e1 × · · · × el cells, where ej is the number
of possible integer values for characteristic cj between cjmin
and cjmax − δj:
ej = (cjmax − δj) − (cjmin ) + 1,
where cjmin = mina cj(a) and cjmax = maxa cj(a).
For any ij (1 ≤ j ≤ ) where cjmin ≤ ij ≤ cjmax − δj,
we construct a subset of A, denoted by Ai1,...,i (that corre-
sponds to a cell in the cube), as follows:
Ai1,...,i = {a ∈ A : ∀ 1 ≤ j ≤ , ij ≤ cj(a) ≤ ij + δj}.
We use Algorithm 2 to find the team with maximum
weight in each set Ai1,...,i and return the team with the
maximum weight among all as the solution of Problem 2.
The following theorem follows.
Theorem 8. If v is a weakly separable value function,
then the above algorithm finds a team allocation in the core
in time O(|A| · |S| · 1≤j≤ (cjmax − cjmin − δj)).
7. IMPLEMENTATION
Our algorithm builds teams based on teaming preferences
and δ parameters observed in organic teams. Algorithmic
teams can be used by instructors for brainstorming exer-
cises, case study discussions, and negotiation simulations.
Students are placed into teams to accomplish a collective
goal in a short period of time. The quality of suggested
teams has an immediate impact on learning outcomes in
these activities.
Organic teams are the main mechanism for creating an
intimate network of peers and learners in online courses. Or-
ganic team formation requires students to search for teams
they can join or to find new members to join their current
team. Both activities are time-consuming. Our algorithm
can suggest new team members to small teams and to stu-
dents without teams. This algorithm can implement a rec-
ommendation engine to reduce barriers of creating organic
teams and to increase the probability of success in resulting
teams.
This section describes how we implemented our proposed
algorithm and compares its stability and performance to two
baseline algorithms, Random and Exhaustive-Greedy. The
Random algorithm groups users randomly considering the
cardinality and homogeneity constraints. A student is ran-
domly selected and added to a team if homogeneity and size
constraints are satisfied. The Exhaustive-Greedy algorithm
creates teams as follows. It starts with an empty set. In each
step, it adds the student that maximizes the set’s value to
0
0.2
0.4
0.6
0.8
1
Random
Greedy
OurAlg
Random
Greedy
OurAlg
Random
Greedy
OurAlg
Random
Greedy
OurAlg
Random
Greedy
OurAlg
course 1 course 2 course 3 course 4 course 5
FractionofStudentsinStable
Teams
Figure 8: Fraction of students in stable teams formed by
Random, Exhaustive-Greedy, and our algorithms
this set as long as size and homogeneity constraints are sat-
isfied. A new empty set is created when the current set has
k members or when no other student can be added to it
without violating homogeneity constraints.
The run time of each algorithm is measured on a Mac-
Book Air with 8GB of RAM and a 1.7 GHz Intel Core i7
cpu running OS X version 10.9.5. The algorithms are im-
plemented with R and are single-threaded. The stability of
the resulting teams formed by each algorithm is measured
by computing the fraction of students in stable teams as
defined in Section 4.
We compared the performance of Random and Exhaustive-
Greedy algorithms with our algorithm for all courses, with
different values for key parameters. We reproduced con-
sistent results across courses and parameters. We report
stability and performance comparisons for creating teams
with maximum size constraint of k = 4 and two homogene-
ity constraints on age and education level, with parame-
ters δage = 3 and δeducation = 4. The selected δ values
for age and education are observed as the maximum differ-
ence between team members’ ages and education levels, re-
spectively, in 80% of all organic teams across eleven courses
present in our database.
Figure 8 shows the stability of the three algorithms for
four different courses. We observe that the quality of the
teams formed by the Random algorithm is very low, with
a stability fraction of about 0.1. The Exhaustive-Greedy
algorithm provides teams with much higher quality than
the Random algorithm, with an average stability fraction of
about 0.9. Our algorithm forms the highest quality teams,
with a stability fraction of 1. Figure 9 graphs the running
time in milliseconds displayed in logarithmic scale, demon-
strating the efficiency of our algorithm.
8. CONCLUSION AND FUTURE WORK
Teams are increasingly indispensable across organizations.
Whether in a large company or startup, a foundation, or a
research organization, interdisciplinary teams are essential
for solving problems or performing new and sophisticated
tasks. Despite our increasing dependency on high func-
tioning teams, our understanding of the dynamics of team
formation, people’s biases in choosing teammates, and the
composition of successful teams remains quite limited. This

1
10
100
1000
10000
100000
1000000
10000000
Random
Ouralg
Greedy
Random
Ouralg
Greedy
Random
Ouralg
Greedy
Random
Ouralg
Greedy
Random
Ouralg
Greedy
course 1 course 2 course 3 course 4 course 5
Time(ms)-logarithmic
10"
20'
Figure 9: Run time of Random, Exhaustive-Greedy, and our
algorithms for different courses
paper takes a small step towards using existing data to pro-
mote a better understanding of team formation dynamics
and success.
By studying 11 MOOCs and analyzing students’ prefer-
ences, we observed that students prefer to form teams that
are homogeneous across age, education level, distance and
time zone, but teams that are diverse with regard to skill set.
The same results hold for high-performance teams (teams
with many successful members). Our observations suggest
a strong positive correlation between skill diversity and team
success.
We proposed a game theoretic model for the team for-
mation problem based on these observations. In particular,
having a set of students with multiple characteristics and
skills, we defined the automatic team formation problem
as the partitioning of students into stable teams that ful-
fill predetermined homogeneity and cardinality constraints.
The problem was modeled as a collaborative game with non-
transferable utility, and the concept of core was utilized to
model stability. We showed that the core of the defined coali-
tional game is always non-empty and proposed a polynomial
time algorithm to form stable teams for a natural category
of value functions. A set of experiments were conducted that
show the performance and quality of our proposed algorithm
compared to random and greedy baseline algorithms.
An interesting direction of future work would be to study
the team formation problem when student history informa-
tion is present. Specifically, we are interested in considering
information on how students formed teams and how well
they performed in previous assignments and courses. This
data could help us define a proficiency score for students
across different skills and topics. This is a generalization
of [2], where students may have different proficiency levels
on different subjects considered together with profile ques-
tion constraints (such as age, education, and location). It
would be interesting to discover whether teams were ho-
mogeneous or heterogeneous across proficiency levels and
whether teams with more homogeneous proficiency levels
were more successful.
We are also interested in considering social connections
between users and whether these connections affect students’
preferences to form teams, generalizing our model based on
these observations.
Another interesting direction would generalize the prob-
lem for a dynamic setting, where the goal is to make rec-
ommendations to users to join teams or invite others and
update the recommendations if users decline.
Finally, we would like to observe how teams evolve over
time, how users behave in teams, when users leave their team
or get asked to leave, and how new members are selected.
9. REFERENCES
[1] R. Agrawal, B. Golshan, and E. Terzi. Forming
beneficial teams of students in massive online classes.
In Proceedings of the 1st ACM Conference on
Learning@ scale, pages 155–156. ACM, 2014.
[2] R. Agrawal, B. Golshan, and E. Terzi. Grouping
students in educational settings. In Proceedings of the
20th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pages
1017–1026. ACM, 2014.
[3] A. An, M. Kargar, and M. ZiHayat. Finding affordable
and collaborative teams from a network of experts. In
Proceedings of the 2013 SIAM International
Conference on Data Mining, pages 587–595, 2013.
[4] A. Anagnostopoulos, L. Becchetti, C. Castillo,
A. Gionis, and S. Leonardi. Online team formation in
social networks. In Proceedings of the 21st
International World Wide Web Conference, pages
839–848. ACM, 2012.
[5] C. M. Beckman. The influence of founding team
company affiliations on firm behavior. Academy of
Management Journal, 49(4):741–758, 2006.
[6] R. Bredereck, T. Köhler, A. Nichterlein,
R. Niedermeier, and G. Philip. Using patterns to form
homogeneous teams. Algorithmica, pages 1–22, 2013.
[7] C. E. Eesley, D. H. Hsu, and E. B. Roberts. The
contingent effects of top management teams on
venture performance: Aligning founding team
composition with innovation strategy and
commercialization environment. Strategic Management
Journal, 35(12):1798–1817, 2014.
[8] K. M. Eisenhardt and C. B. Schoonhoven.
Organizational growth: Linking founding team,
strategy, environment, and growth among U.S.
semiconductor ventures, 1978-1988. Administrative
Science Quarterly, pages 504–529, 1990.
[9] D. P. Forbes, P. S. Borchert, M. E. Zellmer-Bruhn,
and H. J. Sapienza. Entrepreneurial team formation:
An exploration of new member addition.
Entrepreneurship Theory and Practice, 30(2):225–248,
2006.
[10] D. B. Gillies. Solutions to general non-zero-sum
games. Contributions to the Theory of Games,
4(40):47–85, 1959.
[11] X. Han, Y. Liu, X. Guo, X. Wu, and X. Song. Time
constraint-based team formation in social networks. In
Proceedings of the 2013 International Conference on
Mechatronic Sciences, Electric Engineering and
Computer (MEC), pages 1600–1604. IEEE, 2013.
[12] M. Hoefer, D. Váz, and L. Wagner. Hedonic coalition
formation in networks. In Proceedings of the 29th
AAAI Conference on Artificial Intelligence, 2014.

[13] J. Huang, A. Dasgupta, A. Ghosh, J. Manning, and
M. Sanders. Superposter behavior in mooc forums. In
Proceedings of the 1st ACM conference on Learning@
scale, pages 117–126. ACM, 2014.
[14] M. O. Jackson et al. Social and economic networks,
volume 3. Princeton University Press, 2008.
[15] T. Lappas, K. Liu, and E. Terzi. Finding a team of
experts in social networks. In Proceedings of the 15th
ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pages
467–476. ACM, 2009.
[16] A. Majumder, S. Datta, and K. Naidu. Capacitated
team formation problem on social networks. In
Proceedings of the 18th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining,
pages 1005–1013. ACM, 2012.
[17] L. L. Martins, L. L. Gilson, and M. T. Maynard.
Virtual teams: What do we know and where do we go
from here? Journal of Management, 30(6):805–835,
2004.
[18] A. S. Mello and M. E. Ruckes. Team composition. The
Journal of Business, 79(3):1019–1039, 2006.
[19] S. S. Rangapuram, T. B¨uhler, and M. Hein. Towards
realistic team formation in social networks based on
densest subgraphs. In Proceedings of the 22nd
International World Wide Web Conference, pages
1077–1088. International World Wide Web
Conferences Steering Committee, 2013.
[20] A. E. Roth and M. A. O. Sotomayor. Two-sided
matching: A study in game-theoretic modeling and
analysis. Number 18. Cambridge University Press,
1992.
[21] B. Senior and S. Swailes. The dimensions of
management team performance: a repertory grid
study. International Journal of Productivity and
Performance Management, 53(4):317–333, 2004.
[22] L. Sless, N. Hazon, S. Kraus, and M. Wooldridge.
Forming coalitions and facilitating relationships for
completing tasks in social networks. In Proceedings of
the 2014 International Conference on Autonomous
Agents and Multi-agent Systems, pages 261–268.
International Foundation for Autonomous Agents and
Multi-agent Systems, 2014.
[23] W. E. Watson, K. Kumar, and L. K. Michaelsen.
Cultural diversity’s impact on interaction process and
performance: Comparing homogeneous and diverse
task groups. Academy of Management Journal,
36(3):590–602, 1993.
[24] B. A. Williams. Peers in moocs: Lessons based on the
education production function, collective action, and
an experiment. In Proceedings of the 2nd ACM
Conference on Learning@ Scale, pages 287–292. ACM,
2015.
[25] A. Zzkarian and A. Kusiak. Forming teams: an
analytical approach. IIE Transactions, 31(1):85–97,
1999.

Team Formation Dynamics in Online Learning: Homophily and Heterophily

Recommended

Recommended

More Related Content

Similar to Team Formation Dynamics in Online Learning: Homophily and Heterophily

Similar to Team Formation Dynamics in Online Learning: Homophily and Heterophily (20)

Recently uploaded

Recently uploaded (20)

Team Formation Dynamics in Online Learning: Homophily and Heterophily