SlideShare a Scribd company logo
1 of 8
Download to read offline
The Ties that Blog: Examining the Relationship Between
   Social Ties and Continued Participation in the Wallop
                    Weblogging System
             Thomas Lento                                Howard T. Welser                                     Lei Gu
            Cornell University                         University of Washington                          Microsoft News
               323 Uris Hall                         202 Savery Hall, Box 353340                          Redmond, WA
            Ithaca, NY 14850                              Seattle WA, 98195                              Shanghai, China
    thomas.lento@cornell.edu                              htwiii@gmail.com                          legu@microsoft.com
                                                             Marc Smith
                                                           Microsoft Research
                                                             Redmond, WA
                                                    masmith@microsoft.com

ABSTRACT                                                               posting content than others? Comparing social tie patterns with
Are people who remain active as webloggers more socially con-          activity over time can answer this question.
nected to other users? How are the number and nature of social         Technical developments and the growth of the user base have
ties related to people's willingness to continue contributing con-     made weblogs a common medium for social interaction [4], and
tent to a weblog? This study uses longitudinal data taken from         understanding this increasingly important element of network
Wallop, a weblogging system developed by Microsoft Research,           usage is critical to effective analysis of the medium. Recognizing
to explore patterns of user activity. In its year long operation       the importance of social interaction, recent research ranging from
Wallop hosted a naturally occurring opportunity for cultural com-      studies of political opinions [1] to analyses of the conversational
parison, as it developed a majority Chinese language using popu-       nature of weblogs [8] has focused on links and comments as indi-
lation (despite the English language focus of the system). This        cators of interaction between users. However, such research is
allows us to consider whether or not language communities have         necessarily limited to analysis of individuals who actively post
different social network characteristics that vary along different     content, and generalizing these results to the full user population
activity levels. Logistic regression models and network visualiza-     is difficult since the bulk of the content produced in Internet dis-
tions reveal two key findings. The first is that not all ties are      cussion groups [15] and some weblogging systems [6] is contrib-
equal. Although a count of incoming comments appears to be a           uted by a very small percentage of the user population. A system-
significant predictor of retention, it loses its predictive strength   atic study of the factors contributing to continued user activity
when strong ties created by repeated, reciprocal interaction and       will provide an important backdrop to future research on social
ties from other dedicated webloggers are considered. Second, the       behavior in weblogs, and could allow researchers to examine
higher rate of retention among the Chinese language users is           questions which were previously intractable due to the difficulty
partly explained by that population’s greater ability to draw in       of generalizing results to a broader population. Furthermore, un-
participants with pre-existing social ties. We conclude with con-      derstanding the factors which contribute to continued user activity
siderations for weblogs and directions for future research.            could allow the creators of new weblogging systems to make
                                                                       more informed design decisions.
Keywords                                                               This study draws on rich longitudinal data taken from Wallop, a
Weblogs, Social Ties, Social Networks, Retention, Wallop               personal publishing and social networking system designed by
                                                                       Microsoft Research, to compare users who remain active with
1. INTRODUCTION                                                        those who do not. The analysis focuses on the relationship be-
Why do people maintain weblogs? Although existing research [7,         tween social ties and continued activity as expressed through
14] provides excellent insight into individual motivations for cre-    various features of the system, and tests if people who remain
ating weblogs, such work rarely focuses on changes in behavior         active are more socially connected to other users within the sys-
over time. Work on time-series data tends to focus on linking          tem. An additional dimension of the Wallop user base adds a cul-
patterns between weblogs [10] and the flow of information              tural element to the analysis. Over the life of the Wallop system
through these link structures [2]. However, the question of why        65% of the active users were Chinese language users who, as we
some people stop updating their weblogs after a short period of        have shown elsewhere, contribute more content and remain active
time while others remain active for months or even years is not        in the system longer than non-Chinese language users [6]. This
well researched. Is this a phenomenon dependent upon social            suggests some difference between the two groups, which contrib-
factors, the behavior of others as much as individual qualities? Do    utes to differences in their levels of continued activity in the sys-
people who receive responses and interact with others remain           tem. Is this difference cultural? Is it social? How much of the
active for longer than those who do not? Or is it more personal,       variance between the two groups is explained by differences in
where some individuals are more internally motivated to continue       their respective social networks?
2. WEBLOGS AND SOCIAL NETWORKS                                          This study seeks to address these concerns by using a unique
As the popularity of weblogs has increased, interaction between         dataset to examine the relationship between social network ties
authors has become more important to users, developers, and re-         and continued participation in a weblogging system. The dataset,
searchers. The popularity of services targeted at promoting user        which is taken from the database of activity within the Wallop
interaction, such as LiveJournal, has led to the creation of alterna-   weblogging and social networking system, includes full informa-
tive weblogging systems which emphasize sharing content with            tion about each users' invitation and entry into the system, activity
others. Weblogging features have been added to popular social           in the system, and interactions with other users. These activity
networking sites like MySpace and Friendster, and photo sharing         logs can be used to generate data on the full network of social
tools like Flickr are increasing in popularity and beginning to         interactions within the Wallop system, and by combining the ac-
draw competition from similar products.1 Meanwhile, researchers         tivity network data with changes in activity over time it is possible
have been busily analyzing the link structure taken from collec-        to examine the relationship between social ties and continued
tions of weblogs, searching for clues to the social relationships       participation.
between weblog authors. Analyses of these network structures            It is important to remember that Wallop is a partially closed unit
have led to excellent studies of deliberation within the medium         of weblogging activity bounded by a snowball sample invitation
[1], and the creation of metrics for authority and influence [11].      mechanism. Although application of these findings may be appli-
Another study of the social structure of weblogs indicates that in      cable to weblogs and other content creation platforms even when
some cases weblog postings are conversational in nature [8],            they lack Wallop’s membership features, the features that enable
which may suggest a strong social connection between the par-           some aspects of this research simultaneously set the Wallop
ticipants.                                                              dataset apart from the wider “blogosphere.” Therefore, these
Research examining the nature of social links contained within the      findings may be most directly comparable to other similarly
weblog medium is rarely focused on explaining individual partici-       bounded social network systems such as Orkut.
pation. Yet understanding why people upload content to their            2.1 Wallop
weblogs, and why some continue to publish content months or             Wallop is an online personal publishing and social networking
even years later, may be a central element needed for under-            system. An individual gains access to the system when she re-
standing the nature of the social phenomena present within the          ceives an invitation from an existing Wallop user. It is not possi-
weblogging medium. Existing research on individual motivations,         ble to navigate to the site and create an account without receiving
based on content analysis [7] and interviews [14], suggests that        an invitation.
people publish weblogs for many different reasons, including a
desire to publish a diary, share knowledge, express a viewpoint, or
become part of a community based on a particular shared interest.
However, this research only touches on the social connections
between the users and how they affect individual decisions to
participate.
Outside of the weblog literature, research suggests that social ties
are critically important for participation. Recruitment to political
and social movements is often dependent on pre-existing social
ties [3, 13]. In e-learning settings, continued participation is de-
pendent on social interaction with the professors and other stu-
dents, even more so than the course contents and materials [12].
Studies of data taken from other online groups show that closely
connected groups are more supportive of members [16], and indi-
viduals with a strong sense of attachment to a group are more
likely to participate [9].
Despite the apparently strong connection between social connec-
tions and participation in online and offline groups, empirical
research on the relationship between social network structure and
individual activity is limited. This lack of empirical research is
largely a result of the difficulty of collecting network data. Even     Figure 1. The Wallop interface, where users can manage their
in the case of weblog research, where data on the links connecting                            Wallop network.
one weblog to another is both archived and publicly available, the
sheer number of weblogs, the variety of different links, and the
tendency of users to change hosts makes comprehensive data col-         Once she logs in, the user is presented with an interface which
lection difficult. Furthermore, defining different levels of partici-   allows her to publish content in the form of text, images, or music
pation while controlling for the enormous variation in the differ-      posted to her personal weblog. She can interact with other users
ent types of weblogs is a massive undertaking, making any study         by making comments on their content, or by viewing, navigating,
of social network attributes and continued participation either         and adjusting the network of other Wallop users connected to her.
exceedingly difficult or dramatically simplified.


1
    http://www.zooomr.com/
Three types of networks, a comment network (4,285 nodes,
                                                                       14,884 edges), an invitation network (4,514, nodes, 4,255 edges),
                                                                       and a combination network (3,119 nodes, 4,323 edges), were cre-
                                                                       ated for this study. Ties in the comment network, which is the
                                                                       primary network used in the analysis, are defined as comments
                                                                       sent from one user to another. The comment network was con-
                                                                       structed from data taken in a five week period spanning the entire
                                                                       month of November 2004. This time period was chosen because it
                                                                       was an interesting growth period in the Wallop lifecycle, occur-
                                                                       ring shortly after the system went public. It also has the advantage
                                                                       of producing networks which, while large, are substantially
                                                                       smaller than networks taken from similar periods later in the life
                                                                       cycle. A five week period was chosen as short enough to capture a
                                                                       relatively brief moment in time, while still being long enough to
                                                                       provide rich data. The invitation network contains all nodes in-
                                                                       cluded in the comment network, plus any additional inviters. An
                                                                       invitation (which did not need to be accepted) defines an edge,
                                                                       and each user could only be invited once. The combination net-
     Figure 2. The Wallop interface for uploading content              work is a subset of the comment network, where edges are defined
           including comments, images and music.                       for any two nodes connected by both a comment and an invitation.
2.2 Research Questions                                                 The network data generated from the raw Wallop data was used to
An interesting phenomenon occurred after Wallop was released to        generate variables, which were then included in a logistic regres-
the public in October 2004. Despite having an English-only user        sion model. The regression model also included individual level
interface, restricting access to invited users, and starting with an   variables, such as activity measures and language group affilia-
English-speaking population of Microsoft employees, Wallop             tion. Finally, the network data was used to generate visualizations,
attracted an enormous contingent of Chinese language users. The        which were used to illustrate general trends in user behavior and
Chinese language users became the dominant group in the system         show some distinctions between those who remained active and
after a period of explosive growth in mid-November 2004, and           those who did not.
continued to grow at a rapid rate, outpacing the growth of the non-    3.1 Network Variables
Chinese language group.
                                                                       Except where noted, network variables for the analysis were con-
Earlier work [6] shows higher levels of activity among Wallop's        structed from the comment network.
Chinese language user population than the non-Chinese language
                                                                       In-degree is the number of others who commented on a person’s
users. Even after adjusting for differences in population size, the
                                                                       weblog. This variable captures interaction received from an indi-
Chinese language users send more invitations, have a higher ac-
                                                                       vidual. Out-degree was not included because, as a measure of
ceptance rate for their invitations, and contribute more content
                                                                       messages sent, it may not be representative of social interaction
than the non-Chinese language users. Most importantly, much of
                                                                       from the sender's perspective. Receipt of a message is a slightly
the variance is explained by the Chinese language users' tendency
                                                                       stronger indication to the recipient that there is a social connec-
to remain active in the system longer.
                                                                       tion. Reciprocal ties would be the ideal measure, but due to diffi-
The fact that the Chinese language users remain active in the          culties with the database they were not available at the time of this
Wallop system for a longer period of time than the non-Chinese         analysis.
language users raises some interesting questions about the reasons
                                                                       Density is the ratio of the observed number of ties divided by the
for the variance between the two groups. This study, therefore,
                                                                       maximum possible ties [(n-1)*(n-2)]. Density is calculated for
focuses on two general research questions:
                                                                       each node as the density of a subset of the comment network,
1. How are social network statistics related to a user's tendency to   consisting of all nodes and edges within a two degree radius of
remain active over a relatively long period of time?                   ego. This is a measure of the connectedness of an individual's
2. How much of the variance between the Chinese and non-Chi-           local network. The more connected an individual's network, the
nese language groups can be explained by differences in social         more likely it is that she will feel she is part of a social group, as
network structure?                                                     opposed to an isolated voice.
                                                                       Betweeness centrality measures the degree to which a node in a
3. DATA AND METHODS                                                    graph is a necessary conduit on the shortest path between other
Data were taken from the Wallop back-end database, which in-           nodes. In these data the raw measure for betweeness is highly
cludes a record of all invitations and uploads for each user, along    right skewed. Therefore the natural log of the values is used,
with date and time information. The database also includes data        which resolves some of the skew. For a formula of betweeness
associating responses or comments with the original object. This       centrality see Brandes [5].
makes it possible to reconstruct the network of interactions cre-      Total degree in combined network measures the number of strong
ated by users sending comments to each other. It also makes it         inward and outward ties that a weblogger has, where a strong tie
possible to reconstruct the invitation tree created by existing        is defined as an edge between a dyad where both an invitation and
Wallop users inviting potential new users to join the system.          a comment has occurred. Such ties indicate that webloggers inter-
                                                                       acted after invitation, suggesting a pre-existing and therefore more
substantial relationship between the nodes. While Wallop users
did invite random strangers to join the system, we argue that those           Table 1. Means for Structural Variables, Grouped by
that then also engaged in commentary with their inviter have                                       Retention
stronger ties than those that did not.
Time one alters active at time two reports a count of connections
to others in time one who are also active during time two. Ties to
dedicated bloggers (those that will stick around) will likely have
an impact on an individual's decision to remain active. If social
ties are important to activity, and all of the people with whom a
person communicates leave the system, then we would expect that
person to leave the system as well.
Percent Time One Alters Active at Time Two is a normalized ver-
sion of time one alters active at time two. The original measure is
imperfect, because an individual with a few friends might remain
active if all of those friends also remain active. Conversely, an
individual with many friends might have a low percentage of her
friends remain active, but still have many active alters and there-
fore stay active herself. Therefore, both measures are necessary.
                                                                         Two of the most striking differences stem from ties from dedi-
3.2 Individual Variables                                                 cated bloggers. On average, those active at T2 had almost four
Active at Time 2 is the dependent variable, and is coded as a bi-        times as many ties to dedicated bloggers, and within their current
nary for active or not active. Time 2 represents a 5 week period         ties, dedicated bloggers represented almost half of their ties.
from January of 2005. Active users are defined as those who
posted comments during this time period. Content uploads would                Table 2. Means for Structural Variables, Grouped by
be a better measure, but unfortunately this data was not available                                 Retention
at the time of this work.
Language Group is a binary variable coded 1 for Chinese Lan-
guage Group, 0 otherwise. Any user with more than 5% of her
comments, or with 3 or more total comments, containing charac-
ters from the CJK (Chinese, Japanese, and Korean) Unicode
ranges was classified as a Chinese language user. All other users
were placed in the non-Chinese language user group. Close ex-
amination of the comments posted by a random sample of 200
users from each group suggests that the error rates for these classi-
fication systems are quite low. Less than 1% of the Chinese lan-
guage sample did not include Chinese comments, and all of these
were Japanese or Korean.2 Similarly, less than 3% of the non-
Chinese language group were using Chinese characters, and most
were communicating in English.
                                                                         Table 2 reports the same variables while dividing the sample by
4. ANALYSIS                                                              language group rather than retention. Table 1 showed that 84% of
Table 1 compares the means of structural variables for Active T2         those active at T2 were in the Chinese language group. Therefore
and Non Active T2 webloggers. The network attributes of long             variables associated with retention are expected to be higher for
lived webloggers are strikingly and significantly different from         the Chinese language group.
those who quit. On average, they have more than twice as many            Chinese language users send comments to more unique alters
people who send them comments, they are more central in the              (they have higher degree), are more central (higher betweeness),
comment network, they have one more person in their strong tie           and have more strong ties. They also have more ties to dedicated
network, and they are disproportionately connected to other blog-        bloggers and a higher percentage of their ties are to more dedi-
gers who themselves are long lived. These differences suggest            cated bloggers. Finally, their retention rate is more than double the
that social ties are positively related to activity within the system,   Non-Chinese language group.
particularly where those ties link to other active users.3
                                                                         Overall, these comparisons of means suggest both that social
                                                                         structure should enhance retention, and that these structural fea-
2                                                                        tures are more prevalent among the Chinese language users.
    We recognize the cultural differences between Japanese,
    Koreans, and Chinese, but due to the small number of Japanese
    and Korean users who did not use Chinese we felt justified in
    accepting these as a minimal classification error.                      penalizes larger networks. We include it here as a warning
3
                                                                            against using the standard definition of density to study online
    The one contradictory variable is density, which is higher for          social networks, especially because many online networks are
    those who quit. However, this is an artifact of how density is          large.
    conventionally standardized in network analysis, by essentially
    taking the square of the number of nodes. This unrealistically
4.1 Logistic Regression Predicting Retention                              Table 3. Means of network variables for Wallop webloggers
The primary analyses employ a logistic regression model to pre-
dict the probability of continued participation in the Wallop we-
blogging system. Table 3 reports three models which allow com-
parison of how the effects on retention of these social structural
variables differ between the full sample, the Non-Chinese, and the
Chinese language groups. The model coefficients have been ex-
ponentiated and thus reveal how a one unit change in that variable
would affect the odds of retention over the odds of quitting. Val-
ues greater than one indicate a change that increases the odds of
retention, values less than one indicate a decrease in the odds of
retention, and values equal to one indicate no change in the odds
of retention. In interpreting multivariate regression results it is
important to remember that the coefficient for each variable has
been calculated after conditioning on the other variables.
The general trends in the three models, combined with the high
significance levels for several of the variables, support the notion
that social ties can positively affect continued participation.

4.1.1 Model for Combined Sample
The model for the combined sample suggests that bloggers will be
more likely to remain active if they are more central (betweeness),
have larger networks of strong ties, and are tied to larger numbers
and higher percentages of bloggers who themselves remain active.
The coefficient for percent T1 alters active is an especially power-
ful predictor of continued activity. All other things being equal,
the odds of a user remaining active in the system increase 2.6
times when all alters in her network remain active.
The most interesting result of this model is that in-degree is not       For non-Chinese language users, a larger total degree for com-
significant once other variables are taken into account. Although it     bined network is related to a lower level of activity at T2. This
seems intuitive that the more connections an individual has, the         result, while surprising, is not significant. The effect of this vari-
longer she will remain active in the system, it turns out that not all   able for Chinese language users, by contrast, is strongly signifi-
ties are equal. Combined network ties, which consist of both             cant in the opposite direction. This suggests that the pre-existing
comment and invitation links, represent stronger ties and a higher       relationships and strong connections to other users are more im-
likelihood of a pre-existing relationship between the users. These       portant to the Chinese language users than the non-Chinese lan-
types of strong ties, along with ties to other active users, are much    guage users.
more important predictors of retention than simply the number of         Having a larger number, and a higher percentage, of alters active
comments one receives (In Degree).                                       at T2 is related to a higher level of retention, and is strongly sig-
Given earlier work studying the Wallop system [6], it is not sur-        nificant for both groups. However, the effects are substantially
prising that language group has a strong effect on retention. Even       stronger for non-Chinese language users than Chinese language
after controlling for other factors, a Chinese language user is more     users. This suggests that connections to dedictated bloggers are
than twice as likely to remain active than a Non-Chinese language        more important to the non-Chinese language users, whereas for
user.                                                                    the Chinese language users these factors are less important, as
                                                                         they are sustained both by these connections, and by their strong
                                                                         ties.
4.1.2 Comparison of Models for Chinese and Non
Chinese Language User Samples                                            4.1.3 Evaluating model fit
Comparing the model for Chinese and non-Chinese language                 An intuitive way to evaluate a fitted logistic regression model is to
users can provide some insight into the differences between the          compare the predicted probability of the outcome to the observed
two groups. In general, the effects of the network variables in-         outcome for all cases. The following plot does just this, by de-
cluded in this model are similar for both groups. The two striking       picting predicted probability against the observed presence of
differences are in the effects of the Total Degree for Combined          subsequent participation for the combined sample. All cases
Network and Active T1 Alters variables.                                  above the midline were active at time 2 and all cases below the
                                                                         midline were not. Vertical variation within quadrants is simply the
                                                                         result of random jittering introduced to reduce overlap of cases.
2004, the initial period of rapid increase in the Chinese language
                                                                         users.




Figure 3. A visual cross-classification of actual and predicted
                           retention
How well did the model used in this analysis predict actual reten-                    Figure 4. Core of the comment network.
tion? The first notable result is that the vast majority of cases are
in the lower left quadrant, meaning they both dropped out, and
had social networks unlikely to sustain continued activity. Thus         The visualizations in Figures 4, 5, and 6 report on these three
there appears to be a clear relationship between low values in the       datasets, while coding language through color and shape of the
measured social structural attributes and loss of activity. Further-     node: English (Triangle, Blue), Chinese (Circle, Red), and Un-
more, the webloggers who had high levels of these social struc-          categorizable4 (Square, White). Continued activity in Wallop is
tural variables were more likely to stay active. This can be seen by     indicated by depth of shading, with darker hues corresponding to
comparing the right hand quadrants to each other. Cases are              participants in T1(November) who posted a comment during T2
clearly more numerous in the upper right than the lower right, and       (January).
as predicted probability increases, the frequency of false positives
                                                                         The comment network, which represents the primary mode of
decreases.
                                                                         interaction in Wallop, is represented in Figure 4. Since the com-
However, the upper left quadrant can be thought of as cases of           ment network is large (over 4,000 nodes) this visualization pre-
‘false negatives’, i.e. they are results which the model predicted       sents a reduced graph that includes only those nodes with at least
would drop out of the system, yet they actually remained active.         5 neighbors. This graph has several interesting attributes: (1)
This suggests that a large amount of what keeps people involved          Commenting is highly, but not completely segregated by language
in weblogging is not included in this model. Therefore (1) we can        group. A notable exception is on the far right. (2) Intense ties
do a better job of measuring social structural variables, and (2) we     (indicating multiple comments within the dyad) are prevalent
can try to measure more individualistic characteristics that might       within the core of the population. This is true for both language
also explain sustained participation. Furthermore, although this         groups, although the intense Chinese language core seems larger.
analysis provides some insight into how social structure might           (3) There is some evidence that that those who remain active
sustain continued involvement in weblogging systems, there re-           during T2 are centrally located in their local network during T1.
mains a great deal of variation to explain. The next section pre-        This supports the effects of betweenness centrality observed in the
sents network visualizations, which help provide insight into how        regression models.
structural features might be related to continued participation, and
                                                                         Figure 5 illustrates a reduced graph of the invitation network
which point to some additional future directions for research.
                                                                         (nodes with at least 4 invitees).

4.2 Network Visualizations
Network visualizations are used to explore structural relationships
in the Wallop system and to generate new intuitions. Here are
three visualizations, which highlight the major language group
division in the system.
These visualizations are based on three network datasets con-
structed from three definitions of ties: invitation, where ego is tied
to alter when one invited the other to join Wallop; comment,
where ego or alter commented on the other's weblog; and com-
bined, where within an invitation relationship there was also the        4
                                                                             Users who did not upload any comments could not be classified
presence of a comment tie. The data includes ties from November              as Chinese or non-Chinese language users.
with at least 4 invitees who also commented on or were com-
                                                                       mented upon by their child, has two interesting characteristics: (1)
                                                                       There is no clear central component. While the reduced combined
                                                                       graph contains nearly the same number of nodes (332 to 267) as
                                                                       the invitation graph, the combined graph is highly fragmented.
                                                                       This result fits with qualitative data that suggests that among Chi-
                                                                       nese users, invitations would often be made available online to
                                                                       strangers, who would then invite a group of their friends. This
                                                                       would result in fragmented groupings. (2) Branching is less
                                                                       prevalent and ties are more fragile. It seems that the invite and
                                                                       comment relationship was seldom both intense and numerous.

                                                                       5. DISCUSSION
                                                                       This study provides some initial data on the relationship between
                                                                       social interaction and an individual's propensity to continue up-
                                                                       loading content. Interacting with active alters is a strong and sig-
                                                                       nificant predictor of continued user activity. Users who maintain a
                                                                       connection with the person who invited them are also more likely
                                                                       to stay active, which suggests that pre-existing networks are likely
                                                                       an important part of retention. Chinese users had more of these
                                                                       relationships, they were retained more, and the coefficient is
                                                                       strong and significant for them. However, non-Chinese had fewer
                                                                       of these relationships, they were not retained nearly as well, and
           Figure 5. Core of the invitation network.                   this coefficient is weak, negative and non-significant for them.
                                                                       One important point to take away from this study is that the
There are several interesting elements in this graph: (1) There is a   amount of ties does not matter as much as the strength of those
similar general tendency towards segregation as in the comment         ties, and the number of ties to other dedicated bloggers. That is,
network, although there are several notable exceptions. (2) While      after controlling for combined ties and ties to bloggers who them-
there is one major branch that is initially prevalent with Non-Chi-    selves remain active, in degree had no affect on retention. This
nese language users, it and all other branches are primarily Chi-      has implications for more general studies of weblogs: measuring
nese language users. (3) People who are branching points within        ties by simply counting them is not enough because not all ties are
this graph are disproportionately active during T2. (4) The high       equally effective indicators of a strong connection, which may
degree invitation network is composed largely of a single large        itself be an indicator of community. More important than meas-
component.                                                             uring ties is to measure ties that have some underlying reason to
                                                                       affect behavior, in this case, ties that were likely to represent pre-
                                                                       existing relationships and ties to dedicated participants.
                                                                       A second major point to take away from this study concerns the
                                                                       idea of pre-existing social networks. Organizations that can draw
                                                                       existing social networks into them will build stronger community
                                                                       than those that do not. This is demonstrated by the Chinese Lan-
                                                                       guage user group, which was composed of much more durable
                                                                       and active participants. Part of their success, we argue, is that they
                                                                       had a higher prevalence of "strong ties" that combined invitation
                                                                       and commenting, which can be thought of as a proxy indicator of
                                                                       pre-existing social relationships. More important then their
                                                                       prevalence, though, was their influential nature. In the group with
                                                                       greater cohesion, the presence of those strong ties had a strong
                                                                       positive effect on retention. This variable was an important and
                                                                       significant predictor of retention for the Chinese Language group,
                                                                       and not for the non-Chinese language group. The positive effect
                                                                       of pre-existing relationships was suggested by theory and prior
                                                                       research [13], and it is instructive to find evidence of this same
                                                                       mechanism in the context of Weblogs as well. This mechanism
                                                                       can also explain the meteoric growth of MySpace and FaceBook
                                                                       in the US, which are populated predominantly by students who
                                                                       can easily draw in their extensive and intense school-based social
     Figure 6. Absence of a core in the combined network.              networks into the system. Further, weblogs seeking growth need
                                                                       to look to populations within organizations for recruitment,
                                                                       bringing in users who may then pull in their already existing so-
What does an identical reduction of the combined invitation and        cial connections, in favor of individuals outside of institutions
comment network show? The graph above, which includes nodes            who may have less developed connections.
Although this study represents a promising start, it has only                  http://www.hpl.hp.com/research/idl/papers/politicalblogs/Ad
scratched the surface of the relationship between network struc-               amicGlanceBlogWWW.pdf
ture and retention in weblogs. More sophisticated network statis-          [2] Adar, E. and L. A. Adamic. 2005. Tracking Information
tics, such as triangle census measures, measures based on recipro-             Epidemics in Blogspace. Web Intelligence 2005.
cal ties, and a more thorough treatment of data available from
analysis of localized egocentric network data for each node might          [3] Anheier, H. 2003. Movement Development and
help explain more of the variance in the model. Creating continu-              Organizational Networks: The Role of 'Single Members' in
ous variables which measure decline in user activity would also be             the German Nazi Party, 1925-30. Social Movements and
beneficial. For example, a typical user might begin with a flurry              Networks: Relational Approaches to Collective Action. D.
of activity, which then declines until the user is no longer active in         McAdam and M. Diani. New York, Oxford University Press:
the system. Comparing the social structures around these users to              49-76.
users with dramatically different behavioral patterns might pro-           [4] Blood, R. 2004. How Blogging Software Reshapes the
vide further insight into the importance of social ties.                       Online Community. Communications of the ACM. 47: 53-55.
Furthermore, controls for individual behavior are noticeably ab-           [5] Brandes, U. 2001. A Faster Algorithm for Betweenness
sent from this analysis. Login, weblog posting, and image up-                  Centrality. Journal of Mathematical Sociology. 25(2):163-
loading behavior might shed some light on user activity patterns.              177.
People who rarely log in, even if they are positioned in an active
                                                                           [6] Gu, L., P. Johns, T. M. Lento and M. A. Smith. 2006. How
social network, are not as likely to remain active. If possible, con-
                                                                               Do Blog Gardens Grow? Language Community Correlates
trols for demographic features would also enhance this analysis.
                                                                               with Network Diffusion and Adoption of Blogging Systems.
Unfortunately, due to problems with searching against the data-
                                                                               Proceedings of the AAAI Symposium on Computational
base, the data for these variables was not available at the time of
                                                                               Approaches to Analyzing Weblogs.
this analysis, but future research along these lines should include
this data wherever possible.                                               [7] Herring, S., L. A. Scheidt, S. Bonus and E. Wright. 2004.
                                                                               Bridging the Gap: A Genre Analysis of Weblogs. Hawai'i
Despite the limitations of this study, it has brought some interest-
                                                                               International Conference on Systems Science.
ing findings and intriguing theoretical questions to light. Social
interaction does appear to be important to user activity in weblog-        [8] Herring, S., I. Kouper, J. C. Paolillo, L. A. Scheidt, M.
ging systems. Even simple network measures can explain some of                 Tyworth, P. Welsch, E. Wright and N. Yu. 2005.
the variance between the heavily active Chinese language user                  Conversations in the Blogosphere: An Analysis "From the
group and the rest of the Wallop user population. Further refine-              Bottom Up." Hawai'i International Conference on Systems
ment of these models might provide some insight into how people                Sciences.
from different cultural backgrounds use weblogging systems. Do             [9] Kollock, P. and M. Smith. 1996. Managing the Virtual
the Chinese language users remain active because they are more                 Commons: Cooperation and Conflict in Computer
willing to make social connections through this medium? Or is the              Communities. Computer-Mediated Communication. S.
higher activity level a result of differences in the demographic               Herring. Amsterdam, John Benjamins.
distribution of the two populations? Finally, further exploration of
the combination network, and the effects of those combination              [10] Kumar, R., J. Novak, P. Raghavan and A. Tomkins. 2003.
ties, might shed some light on the importance of pre-existing rela-             On the Bursty Evolution of Blogspace. WWW2003.
tionships. Do users who actively contribute to weblogging sys-             [11] Marlow, C. 2004. Audience, Structure and Authority in the
tems do so because they have offline friends there? What other                  Weblog Community. Presented at the International
reasons might motivate an individual to contribute? Finally, the                Communication Association Conference.
visualizations presented above prompt additional questions for
                                                                           [12] Martinez, M. 2003. High Attrition Rates in E-Learning:
further consideration. What is the significance of the centrality of
                                                                                Challenges, Predictors, and Solutions. The E-Learning
active users? Is this an artifact of their higher activity levels – i.e.
                                                                                Developer's Journal. July 14: 9 pages.
more comments and invitations sent tend to move an individual
towards the center of the graph – or is it indicative of something         [13] McAdam, D. 1988. Freedom Summer. New York, Oxford
else? Could this be related to status within the Wallop system?                 University Press.
                                                                           [14] Nardi, B. A., D. J. Schiano, M. Gumbrecht and L. Swartz.
6. ACKNOWLEDGMENTS                                                              2004. Why We Blog. Communications of the ACM. 47: 41-
Our thanks to Wallop Users, the Wallop Development Team,                        49.
MSR Social Computing and Community Technologies Groups,                    [15] Smith, M. 2004. Measures and Maps of Usenet. From Usenet
Shelly Farnham, and Danyel Fisher.                                              to Cowebs. C. Lueg and D. Fisher, Springer.
                                                                           [16] Wellman, B. 1996. An Electronic Group Is Virtually a Social
                                                                                Network. Culture of the Internet. S. Kiesler. Hillsdale, NJ,
7. REFERENCES                                                                   Lawrence Erlbaum: 179-208.
[1] Adamic, L. A. and N. Glance. 2005. The Political
    Blogosphere and the 2004 U.S. Election: Divided They Blog.
    Proceedings of the WWW2005 Conference's 2nd Annual
    Workshop on the Weblogging Ecosystem: Aggregation,
    Analysis, and Dynamics.

More Related Content

What's hot

Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...
Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...
Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...Curtis Rogers, MLIS, EdD
 
Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...
Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...
Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...Curtis Rogers, MLIS, EdD
 
Libraries social media
Libraries social mediaLibraries social media
Libraries social mediaJames Neal
 
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...ijwscjournal
 
Social Media for Scientists
Social Media for ScientistsSocial Media for Scientists
Social Media for Scientists3rhinomedia
 
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBO
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBOA COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBO
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBOijaia
 
The Credibility of Digital Identity Information on the Social Web: A User Study
The Credibility of Digital Identity Information on the Social Web: A User StudyThe Credibility of Digital Identity Information on the Social Web: A User Study
The Credibility of Digital Identity Information on the Social Web: A User StudyMatthew Rowe
 
Conversation graphs in Online Social Media
Conversation graphs in Online Social MediaConversation graphs in Online Social Media
Conversation graphs in Online Social MediaMarco Brambilla
 
Social media visualization for crisis management
Social media visualization for crisis managementSocial media visualization for crisis management
Social media visualization for crisis managementMustafa Alkhunni
 
POSITIONING OF THE NEW MEANS OF COMMUNICATION AND INFORMATION: EMPIRICAL STUD...
POSITIONING OF THE NEW MEANS OF COMMUNICATION AND INFORMATION: EMPIRICAL STUD...POSITIONING OF THE NEW MEANS OF COMMUNICATION AND INFORMATION: EMPIRICAL STUD...
POSITIONING OF THE NEW MEANS OF COMMUNICATION AND INFORMATION: EMPIRICAL STUD...IAEME Publication
 
20151001 charles university prague - marc smith - node xl-picturing political...
20151001 charles university prague - marc smith - node xl-picturing political...20151001 charles university prague - marc smith - node xl-picturing political...
20151001 charles university prague - marc smith - node xl-picturing political...Marc Smith
 
Preproc 03
Preproc 03Preproc 03
Preproc 03naeyam
 
Monitoring and Analysis of Online Communities
Monitoring and Analysis of Online CommunitiesMonitoring and Analysis of Online Communities
Monitoring and Analysis of Online CommunitiesThe Open University
 
Effects of Social Networking in Academic Literacy
Effects of Social Networking in Academic LiteracyEffects of Social Networking in Academic Literacy
Effects of Social Networking in Academic LiteracySteve Chilton
 
Detecting fake news_with_weak_social_supervision
Detecting fake news_with_weak_social_supervisionDetecting fake news_with_weak_social_supervision
Detecting fake news_with_weak_social_supervisionSuresh S
 
Joining the ‘buzz’ : the role of social media in raising research visibility ...
Joining the ‘buzz’ : the role of social media in raising research visibility ...Joining the ‘buzz’ : the role of social media in raising research visibility ...
Joining the ‘buzz’ : the role of social media in raising research visibility ...Eileen Shepherd
 
Visible Effort: A Social Entropy Methodology for Managing Computer-Mediated ...
Visible Effort: A Social Entropy Methodology for  Managing Computer-Mediated ...Visible Effort: A Social Entropy Methodology for  Managing Computer-Mediated ...
Visible Effort: A Social Entropy Methodology for Managing Computer-Mediated ...Sorin Adam Matei
 
On Social E Learning
On Social E LearningOn Social E Learning
On Social E LearningMegaVjohnson
 
Calrg2015 2015 06-15
Calrg2015 2015 06-15Calrg2015 2015 06-15
Calrg2015 2015 06-15Katy Jordan
 

What's hot (20)

Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...
Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...
Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...
 
Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...
Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...
Social Media, Libraries, and Web 2.0: How American Libraries are Using New To...
 
Libraries social media
Libraries social mediaLibraries social media
Libraries social media
 
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...
 
Social Media for Scientists
Social Media for ScientistsSocial Media for Scientists
Social Media for Scientists
 
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBO
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBOA COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBO
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBO
 
The Credibility of Digital Identity Information on the Social Web: A User Study
The Credibility of Digital Identity Information on the Social Web: A User StudyThe Credibility of Digital Identity Information on the Social Web: A User Study
The Credibility of Digital Identity Information on the Social Web: A User Study
 
SDoW2010 keynote
SDoW2010 keynoteSDoW2010 keynote
SDoW2010 keynote
 
Conversation graphs in Online Social Media
Conversation graphs in Online Social MediaConversation graphs in Online Social Media
Conversation graphs in Online Social Media
 
Social media visualization for crisis management
Social media visualization for crisis managementSocial media visualization for crisis management
Social media visualization for crisis management
 
POSITIONING OF THE NEW MEANS OF COMMUNICATION AND INFORMATION: EMPIRICAL STUD...
POSITIONING OF THE NEW MEANS OF COMMUNICATION AND INFORMATION: EMPIRICAL STUD...POSITIONING OF THE NEW MEANS OF COMMUNICATION AND INFORMATION: EMPIRICAL STUD...
POSITIONING OF THE NEW MEANS OF COMMUNICATION AND INFORMATION: EMPIRICAL STUD...
 
20151001 charles university prague - marc smith - node xl-picturing political...
20151001 charles university prague - marc smith - node xl-picturing political...20151001 charles university prague - marc smith - node xl-picturing political...
20151001 charles university prague - marc smith - node xl-picturing political...
 
Preproc 03
Preproc 03Preproc 03
Preproc 03
 
Monitoring and Analysis of Online Communities
Monitoring and Analysis of Online CommunitiesMonitoring and Analysis of Online Communities
Monitoring and Analysis of Online Communities
 
Effects of Social Networking in Academic Literacy
Effects of Social Networking in Academic LiteracyEffects of Social Networking in Academic Literacy
Effects of Social Networking in Academic Literacy
 
Detecting fake news_with_weak_social_supervision
Detecting fake news_with_weak_social_supervisionDetecting fake news_with_weak_social_supervision
Detecting fake news_with_weak_social_supervision
 
Joining the ‘buzz’ : the role of social media in raising research visibility ...
Joining the ‘buzz’ : the role of social media in raising research visibility ...Joining the ‘buzz’ : the role of social media in raising research visibility ...
Joining the ‘buzz’ : the role of social media in raising research visibility ...
 
Visible Effort: A Social Entropy Methodology for Managing Computer-Mediated ...
Visible Effort: A Social Entropy Methodology for  Managing Computer-Mediated ...Visible Effort: A Social Entropy Methodology for  Managing Computer-Mediated ...
Visible Effort: A Social Entropy Methodology for Managing Computer-Mediated ...
 
On Social E Learning
On Social E LearningOn Social E Learning
On Social E Learning
 
Calrg2015 2015 06-15
Calrg2015 2015 06-15Calrg2015 2015 06-15
Calrg2015 2015 06-15
 

Viewers also liked

Kick Off Proeve Van Bekwaamheid Ver2
Kick Off Proeve Van Bekwaamheid Ver2Kick Off Proeve Van Bekwaamheid Ver2
Kick Off Proeve Van Bekwaamheid Ver2Artanis12
 
Republica Dominicana
Republica DominicanaRepublica Dominicana
Republica Dominicanamcecilias
 
2009 June 22 Node Xl V88 Tutorial
2009   June   22   Node Xl V88 Tutorial2009   June   22   Node Xl V88 Tutorial
2009 June 22 Node Xl V88 TutorialMarc Smith
 
2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network AnalysisMarc Smith
 

Viewers also liked (6)

CM Management (www.cmcons.com)
CM Management (www.cmcons.com)CM Management (www.cmcons.com)
CM Management (www.cmcons.com)
 
Kick Off Proeve Van Bekwaamheid Ver2
Kick Off Proeve Van Bekwaamheid Ver2Kick Off Proeve Van Bekwaamheid Ver2
Kick Off Proeve Van Bekwaamheid Ver2
 
Abie Collage
Abie CollageAbie Collage
Abie Collage
 
Republica Dominicana
Republica DominicanaRepublica Dominicana
Republica Dominicana
 
2009 June 22 Node Xl V88 Tutorial
2009   June   22   Node Xl V88 Tutorial2009   June   22   Node Xl V88 Tutorial
2009 June 22 Node Xl V88 Tutorial
 
2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis
 

Similar to 2006 www - lento welser gu smith - ties thatblog

An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...Rick Vogel
 
Mining and Analyzing Academic Social Networks
Mining and Analyzing Academic Social NetworksMining and Analyzing Academic Social Networks
Mining and Analyzing Academic Social NetworksEditor IJCATR
 
Sampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkSampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkEditor IJCATR
 
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...inventionjournals
 
a modified weight balanced algorithm for influential users community detectio...
a modified weight balanced algorithm for influential users community detectio...a modified weight balanced algorithm for influential users community detectio...
a modified weight balanced algorithm for influential users community detectio...INFOGAIN PUBLICATION
 
Network Structure For Social Network
Network Structure For Social NetworkNetwork Structure For Social Network
Network Structure For Social NetworkVaibhav Sathe
 
Community Evolution in the Digital Space and Creation of Social Information C...
Community Evolution in the Digital Space and Creation of SocialInformation C...Community Evolution in the Digital Space and Creation of SocialInformation C...
Community Evolution in the Digital Space and Creation of Social Information C...Saptarshi Ghosh
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)theijes
 
A framework of Web Science
A framework of Web Science A framework of Web Science
A framework of Web Science vafopoulos
 
1. short answer question(200 words) [required] roger s. pr
1. short answer question(200 words) [required] roger s. pr1. short answer question(200 words) [required] roger s. pr
1. short answer question(200 words) [required] roger s. prjasmin849794
 
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONSSTRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONSijcsit
 
Multimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behaviorMultimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behaviorIAEME Publication
 
The Impacts of Social Networking and Its Analysis
The Impacts of Social Networking and Its AnalysisThe Impacts of Social Networking and Its Analysis
The Impacts of Social Networking and Its AnalysisIJMER
 

Similar to 2006 www - lento welser gu smith - ties thatblog (20)

An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
 
Mining and Analyzing Academic Social Networks
Mining and Analyzing Academic Social NetworksMining and Analyzing Academic Social Networks
Mining and Analyzing Academic Social Networks
 
Blogosphere
BlogosphereBlogosphere
Blogosphere
 
Blogosphere
BlogosphereBlogosphere
Blogosphere
 
Blogosphere
BlogosphereBlogosphere
Blogosphere
 
Jx2517481755
Jx2517481755Jx2517481755
Jx2517481755
 
Jx2517481755
Jx2517481755Jx2517481755
Jx2517481755
 
Sampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkSampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social Network
 
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
 
2_Doc5_2.pdf
2_Doc5_2.pdf2_Doc5_2.pdf
2_Doc5_2.pdf
 
a modified weight balanced algorithm for influential users community detectio...
a modified weight balanced algorithm for influential users community detectio...a modified weight balanced algorithm for influential users community detectio...
a modified weight balanced algorithm for influential users community detectio...
 
Network Structure For Social Network
Network Structure For Social NetworkNetwork Structure For Social Network
Network Structure For Social Network
 
Community Evolution in the Digital Space and Creation of Social Information C...
Community Evolution in the Digital Space and Creation of SocialInformation C...Community Evolution in the Digital Space and Creation of SocialInformation C...
Community Evolution in the Digital Space and Creation of Social Information C...
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)
 
Notes on mining social media updated
Notes on mining social media updatedNotes on mining social media updated
Notes on mining social media updated
 
A framework of Web Science
A framework of Web Science A framework of Web Science
A framework of Web Science
 
1. short answer question(200 words) [required] roger s. pr
1. short answer question(200 words) [required] roger s. pr1. short answer question(200 words) [required] roger s. pr
1. short answer question(200 words) [required] roger s. pr
 
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONSSTRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
 
Multimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behaviorMultimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behavior
 
The Impacts of Social Networking and Its Analysis
The Impacts of Social Networking and Its AnalysisThe Impacts of Social Networking and Its Analysis
The Impacts of Social Networking and Its Analysis
 

More from Marc Smith

How to use social media network analysis for amplification
How to use social media network analysis for amplificationHow to use social media network analysis for amplification
How to use social media network analysis for amplificationMarc Smith
 
Think link what is an edge - NodeXL
Think link   what is an edge - NodeXLThink link   what is an edge - NodeXL
Think link what is an edge - NodeXLMarc Smith
 
2017 05-26 NodeXL Twitter search #shakeupshow
2017 05-26 NodeXL Twitter search #shakeupshow2017 05-26 NodeXL Twitter search #shakeupshow
2017 05-26 NodeXL Twitter search #shakeupshowMarc Smith
 
2016 SocialMedia.Org Marc Smith-NodeXL-Social Media SNA
2016 SocialMedia.Org Marc Smith-NodeXL-Social Media SNA2016 SocialMedia.Org Marc Smith-NodeXL-Social Media SNA
2016 SocialMedia.Org Marc Smith-NodeXL-Social Media SNAMarc Smith
 
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...Marc Smith
 
2015 pdf-marc smith-node xl-social media sna
2015 pdf-marc smith-node xl-social media sna2015 pdf-marc smith-node xl-social media sna
2015 pdf-marc smith-node xl-social media snaMarc Smith
 
2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXL2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXLMarc Smith
 
Think Link: Network Insights with No Programming Skills
Think Link: Network Insights with No Programming SkillsThink Link: Network Insights with No Programming Skills
Think Link: Network Insights with No Programming SkillsMarc Smith
 
20130724 ted x-marc smith-digital health futures empowerment or coercion
20130724 ted x-marc smith-digital health futures empowerment or coercion20130724 ted x-marc smith-digital health futures empowerment or coercion
20130724 ted x-marc smith-digital health futures empowerment or coercionMarc Smith
 
2013 passbac-marc smith-node xl-sna-social media-formatted
2013 passbac-marc smith-node xl-sna-social media-formatted2013 passbac-marc smith-node xl-sna-social media-formatted
2013 passbac-marc smith-node xl-sna-social media-formattedMarc Smith
 
2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network AnalysisMarc Smith
 
20121010 marc smith - mapping collections of connections in social media with...
20121010 marc smith - mapping collections of connections in social media with...20121010 marc smith - mapping collections of connections in social media with...
20121010 marc smith - mapping collections of connections in social media with...Marc Smith
 
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...20121001 pawcon 2012-marc smith - mapping collections of connections in socia...
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...Marc Smith
 
2012 ona practitioner-courseflyer
2012 ona practitioner-courseflyer2012 ona practitioner-courseflyer
2012 ona practitioner-courseflyerMarc Smith
 
20120622 web sci12-won-marc smith-semantic and social network analysis of …
20120622 web sci12-won-marc smith-semantic and social network analysis of …20120622 web sci12-won-marc smith-semantic and social network analysis of …
20120622 web sci12-won-marc smith-semantic and social network analysis of …Marc Smith
 
20120301 strata-marc smith-mapping social media networks with no coding using...
20120301 strata-marc smith-mapping social media networks with no coding using...20120301 strata-marc smith-mapping social media networks with no coding using...
20120301 strata-marc smith-mapping social media networks with no coding using...Marc Smith
 
20111123 mwa2011-marc smith
20111123 mwa2011-marc smith20111123 mwa2011-marc smith
20111123 mwa2011-marc smithMarc Smith
 
20111103 con tech2011-marc smith
20111103 con tech2011-marc smith20111103 con tech2011-marc smith
20111103 con tech2011-marc smithMarc Smith
 
2011 IEEE Social Computing Nodexl: Group-In-A-Box
2011 IEEE Social Computing Nodexl: Group-In-A-Box2011 IEEE Social Computing Nodexl: Group-In-A-Box
2011 IEEE Social Computing Nodexl: Group-In-A-BoxMarc Smith
 
20110830 Introducing the Social Media Research Foundation
20110830 Introducing the Social Media Research Foundation20110830 Introducing the Social Media Research Foundation
20110830 Introducing the Social Media Research FoundationMarc Smith
 

More from Marc Smith (20)

How to use social media network analysis for amplification
How to use social media network analysis for amplificationHow to use social media network analysis for amplification
How to use social media network analysis for amplification
 
Think link what is an edge - NodeXL
Think link   what is an edge - NodeXLThink link   what is an edge - NodeXL
Think link what is an edge - NodeXL
 
2017 05-26 NodeXL Twitter search #shakeupshow
2017 05-26 NodeXL Twitter search #shakeupshow2017 05-26 NodeXL Twitter search #shakeupshow
2017 05-26 NodeXL Twitter search #shakeupshow
 
2016 SocialMedia.Org Marc Smith-NodeXL-Social Media SNA
2016 SocialMedia.Org Marc Smith-NodeXL-Social Media SNA2016 SocialMedia.Org Marc Smith-NodeXL-Social Media SNA
2016 SocialMedia.Org Marc Smith-NodeXL-Social Media SNA
 
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...
 
2015 pdf-marc smith-node xl-social media sna
2015 pdf-marc smith-node xl-social media sna2015 pdf-marc smith-node xl-social media sna
2015 pdf-marc smith-node xl-social media sna
 
2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXL2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXL
 
Think Link: Network Insights with No Programming Skills
Think Link: Network Insights with No Programming SkillsThink Link: Network Insights with No Programming Skills
Think Link: Network Insights with No Programming Skills
 
20130724 ted x-marc smith-digital health futures empowerment or coercion
20130724 ted x-marc smith-digital health futures empowerment or coercion20130724 ted x-marc smith-digital health futures empowerment or coercion
20130724 ted x-marc smith-digital health futures empowerment or coercion
 
2013 passbac-marc smith-node xl-sna-social media-formatted
2013 passbac-marc smith-node xl-sna-social media-formatted2013 passbac-marc smith-node xl-sna-social media-formatted
2013 passbac-marc smith-node xl-sna-social media-formatted
 
2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis
 
20121010 marc smith - mapping collections of connections in social media with...
20121010 marc smith - mapping collections of connections in social media with...20121010 marc smith - mapping collections of connections in social media with...
20121010 marc smith - mapping collections of connections in social media with...
 
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...20121001 pawcon 2012-marc smith - mapping collections of connections in socia...
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...
 
2012 ona practitioner-courseflyer
2012 ona practitioner-courseflyer2012 ona practitioner-courseflyer
2012 ona practitioner-courseflyer
 
20120622 web sci12-won-marc smith-semantic and social network analysis of …
20120622 web sci12-won-marc smith-semantic and social network analysis of …20120622 web sci12-won-marc smith-semantic and social network analysis of …
20120622 web sci12-won-marc smith-semantic and social network analysis of …
 
20120301 strata-marc smith-mapping social media networks with no coding using...
20120301 strata-marc smith-mapping social media networks with no coding using...20120301 strata-marc smith-mapping social media networks with no coding using...
20120301 strata-marc smith-mapping social media networks with no coding using...
 
20111123 mwa2011-marc smith
20111123 mwa2011-marc smith20111123 mwa2011-marc smith
20111123 mwa2011-marc smith
 
20111103 con tech2011-marc smith
20111103 con tech2011-marc smith20111103 con tech2011-marc smith
20111103 con tech2011-marc smith
 
2011 IEEE Social Computing Nodexl: Group-In-A-Box
2011 IEEE Social Computing Nodexl: Group-In-A-Box2011 IEEE Social Computing Nodexl: Group-In-A-Box
2011 IEEE Social Computing Nodexl: Group-In-A-Box
 
20110830 Introducing the Social Media Research Foundation
20110830 Introducing the Social Media Research Foundation20110830 Introducing the Social Media Research Foundation
20110830 Introducing the Social Media Research Foundation
 

2006 www - lento welser gu smith - ties thatblog

  • 1. The Ties that Blog: Examining the Relationship Between Social Ties and Continued Participation in the Wallop Weblogging System Thomas Lento Howard T. Welser Lei Gu Cornell University University of Washington Microsoft News 323 Uris Hall 202 Savery Hall, Box 353340 Redmond, WA Ithaca, NY 14850 Seattle WA, 98195 Shanghai, China thomas.lento@cornell.edu htwiii@gmail.com legu@microsoft.com Marc Smith Microsoft Research Redmond, WA masmith@microsoft.com ABSTRACT posting content than others? Comparing social tie patterns with Are people who remain active as webloggers more socially con- activity over time can answer this question. nected to other users? How are the number and nature of social Technical developments and the growth of the user base have ties related to people's willingness to continue contributing con- made weblogs a common medium for social interaction [4], and tent to a weblog? This study uses longitudinal data taken from understanding this increasingly important element of network Wallop, a weblogging system developed by Microsoft Research, usage is critical to effective analysis of the medium. Recognizing to explore patterns of user activity. In its year long operation the importance of social interaction, recent research ranging from Wallop hosted a naturally occurring opportunity for cultural com- studies of political opinions [1] to analyses of the conversational parison, as it developed a majority Chinese language using popu- nature of weblogs [8] has focused on links and comments as indi- lation (despite the English language focus of the system). This cators of interaction between users. However, such research is allows us to consider whether or not language communities have necessarily limited to analysis of individuals who actively post different social network characteristics that vary along different content, and generalizing these results to the full user population activity levels. Logistic regression models and network visualiza- is difficult since the bulk of the content produced in Internet dis- tions reveal two key findings. The first is that not all ties are cussion groups [15] and some weblogging systems [6] is contrib- equal. Although a count of incoming comments appears to be a uted by a very small percentage of the user population. A system- significant predictor of retention, it loses its predictive strength atic study of the factors contributing to continued user activity when strong ties created by repeated, reciprocal interaction and will provide an important backdrop to future research on social ties from other dedicated webloggers are considered. Second, the behavior in weblogs, and could allow researchers to examine higher rate of retention among the Chinese language users is questions which were previously intractable due to the difficulty partly explained by that population’s greater ability to draw in of generalizing results to a broader population. Furthermore, un- participants with pre-existing social ties. We conclude with con- derstanding the factors which contribute to continued user activity siderations for weblogs and directions for future research. could allow the creators of new weblogging systems to make more informed design decisions. Keywords This study draws on rich longitudinal data taken from Wallop, a Weblogs, Social Ties, Social Networks, Retention, Wallop personal publishing and social networking system designed by Microsoft Research, to compare users who remain active with 1. INTRODUCTION those who do not. The analysis focuses on the relationship be- Why do people maintain weblogs? Although existing research [7, tween social ties and continued activity as expressed through 14] provides excellent insight into individual motivations for cre- various features of the system, and tests if people who remain ating weblogs, such work rarely focuses on changes in behavior active are more socially connected to other users within the sys- over time. Work on time-series data tends to focus on linking tem. An additional dimension of the Wallop user base adds a cul- patterns between weblogs [10] and the flow of information tural element to the analysis. Over the life of the Wallop system through these link structures [2]. However, the question of why 65% of the active users were Chinese language users who, as we some people stop updating their weblogs after a short period of have shown elsewhere, contribute more content and remain active time while others remain active for months or even years is not in the system longer than non-Chinese language users [6]. This well researched. Is this a phenomenon dependent upon social suggests some difference between the two groups, which contrib- factors, the behavior of others as much as individual qualities? Do utes to differences in their levels of continued activity in the sys- people who receive responses and interact with others remain tem. Is this difference cultural? Is it social? How much of the active for longer than those who do not? Or is it more personal, variance between the two groups is explained by differences in where some individuals are more internally motivated to continue their respective social networks?
  • 2. 2. WEBLOGS AND SOCIAL NETWORKS This study seeks to address these concerns by using a unique As the popularity of weblogs has increased, interaction between dataset to examine the relationship between social network ties authors has become more important to users, developers, and re- and continued participation in a weblogging system. The dataset, searchers. The popularity of services targeted at promoting user which is taken from the database of activity within the Wallop interaction, such as LiveJournal, has led to the creation of alterna- weblogging and social networking system, includes full informa- tive weblogging systems which emphasize sharing content with tion about each users' invitation and entry into the system, activity others. Weblogging features have been added to popular social in the system, and interactions with other users. These activity networking sites like MySpace and Friendster, and photo sharing logs can be used to generate data on the full network of social tools like Flickr are increasing in popularity and beginning to interactions within the Wallop system, and by combining the ac- draw competition from similar products.1 Meanwhile, researchers tivity network data with changes in activity over time it is possible have been busily analyzing the link structure taken from collec- to examine the relationship between social ties and continued tions of weblogs, searching for clues to the social relationships participation. between weblog authors. Analyses of these network structures It is important to remember that Wallop is a partially closed unit have led to excellent studies of deliberation within the medium of weblogging activity bounded by a snowball sample invitation [1], and the creation of metrics for authority and influence [11]. mechanism. Although application of these findings may be appli- Another study of the social structure of weblogs indicates that in cable to weblogs and other content creation platforms even when some cases weblog postings are conversational in nature [8], they lack Wallop’s membership features, the features that enable which may suggest a strong social connection between the par- some aspects of this research simultaneously set the Wallop ticipants. dataset apart from the wider “blogosphere.” Therefore, these Research examining the nature of social links contained within the findings may be most directly comparable to other similarly weblog medium is rarely focused on explaining individual partici- bounded social network systems such as Orkut. pation. Yet understanding why people upload content to their 2.1 Wallop weblogs, and why some continue to publish content months or Wallop is an online personal publishing and social networking even years later, may be a central element needed for under- system. An individual gains access to the system when she re- standing the nature of the social phenomena present within the ceives an invitation from an existing Wallop user. It is not possi- weblogging medium. Existing research on individual motivations, ble to navigate to the site and create an account without receiving based on content analysis [7] and interviews [14], suggests that an invitation. people publish weblogs for many different reasons, including a desire to publish a diary, share knowledge, express a viewpoint, or become part of a community based on a particular shared interest. However, this research only touches on the social connections between the users and how they affect individual decisions to participate. Outside of the weblog literature, research suggests that social ties are critically important for participation. Recruitment to political and social movements is often dependent on pre-existing social ties [3, 13]. In e-learning settings, continued participation is de- pendent on social interaction with the professors and other stu- dents, even more so than the course contents and materials [12]. Studies of data taken from other online groups show that closely connected groups are more supportive of members [16], and indi- viduals with a strong sense of attachment to a group are more likely to participate [9]. Despite the apparently strong connection between social connec- tions and participation in online and offline groups, empirical research on the relationship between social network structure and individual activity is limited. This lack of empirical research is largely a result of the difficulty of collecting network data. Even Figure 1. The Wallop interface, where users can manage their in the case of weblog research, where data on the links connecting Wallop network. one weblog to another is both archived and publicly available, the sheer number of weblogs, the variety of different links, and the tendency of users to change hosts makes comprehensive data col- Once she logs in, the user is presented with an interface which lection difficult. Furthermore, defining different levels of partici- allows her to publish content in the form of text, images, or music pation while controlling for the enormous variation in the differ- posted to her personal weblog. She can interact with other users ent types of weblogs is a massive undertaking, making any study by making comments on their content, or by viewing, navigating, of social network attributes and continued participation either and adjusting the network of other Wallop users connected to her. exceedingly difficult or dramatically simplified. 1 http://www.zooomr.com/
  • 3. Three types of networks, a comment network (4,285 nodes, 14,884 edges), an invitation network (4,514, nodes, 4,255 edges), and a combination network (3,119 nodes, 4,323 edges), were cre- ated for this study. Ties in the comment network, which is the primary network used in the analysis, are defined as comments sent from one user to another. The comment network was con- structed from data taken in a five week period spanning the entire month of November 2004. This time period was chosen because it was an interesting growth period in the Wallop lifecycle, occur- ring shortly after the system went public. It also has the advantage of producing networks which, while large, are substantially smaller than networks taken from similar periods later in the life cycle. A five week period was chosen as short enough to capture a relatively brief moment in time, while still being long enough to provide rich data. The invitation network contains all nodes in- cluded in the comment network, plus any additional inviters. An invitation (which did not need to be accepted) defines an edge, and each user could only be invited once. The combination net- Figure 2. The Wallop interface for uploading content work is a subset of the comment network, where edges are defined including comments, images and music. for any two nodes connected by both a comment and an invitation. 2.2 Research Questions The network data generated from the raw Wallop data was used to An interesting phenomenon occurred after Wallop was released to generate variables, which were then included in a logistic regres- the public in October 2004. Despite having an English-only user sion model. The regression model also included individual level interface, restricting access to invited users, and starting with an variables, such as activity measures and language group affilia- English-speaking population of Microsoft employees, Wallop tion. Finally, the network data was used to generate visualizations, attracted an enormous contingent of Chinese language users. The which were used to illustrate general trends in user behavior and Chinese language users became the dominant group in the system show some distinctions between those who remained active and after a period of explosive growth in mid-November 2004, and those who did not. continued to grow at a rapid rate, outpacing the growth of the non- 3.1 Network Variables Chinese language group. Except where noted, network variables for the analysis were con- Earlier work [6] shows higher levels of activity among Wallop's structed from the comment network. Chinese language user population than the non-Chinese language In-degree is the number of others who commented on a person’s users. Even after adjusting for differences in population size, the weblog. This variable captures interaction received from an indi- Chinese language users send more invitations, have a higher ac- vidual. Out-degree was not included because, as a measure of ceptance rate for their invitations, and contribute more content messages sent, it may not be representative of social interaction than the non-Chinese language users. Most importantly, much of from the sender's perspective. Receipt of a message is a slightly the variance is explained by the Chinese language users' tendency stronger indication to the recipient that there is a social connec- to remain active in the system longer. tion. Reciprocal ties would be the ideal measure, but due to diffi- The fact that the Chinese language users remain active in the culties with the database they were not available at the time of this Wallop system for a longer period of time than the non-Chinese analysis. language users raises some interesting questions about the reasons Density is the ratio of the observed number of ties divided by the for the variance between the two groups. This study, therefore, maximum possible ties [(n-1)*(n-2)]. Density is calculated for focuses on two general research questions: each node as the density of a subset of the comment network, 1. How are social network statistics related to a user's tendency to consisting of all nodes and edges within a two degree radius of remain active over a relatively long period of time? ego. This is a measure of the connectedness of an individual's 2. How much of the variance between the Chinese and non-Chi- local network. The more connected an individual's network, the nese language groups can be explained by differences in social more likely it is that she will feel she is part of a social group, as network structure? opposed to an isolated voice. Betweeness centrality measures the degree to which a node in a 3. DATA AND METHODS graph is a necessary conduit on the shortest path between other Data were taken from the Wallop back-end database, which in- nodes. In these data the raw measure for betweeness is highly cludes a record of all invitations and uploads for each user, along right skewed. Therefore the natural log of the values is used, with date and time information. The database also includes data which resolves some of the skew. For a formula of betweeness associating responses or comments with the original object. This centrality see Brandes [5]. makes it possible to reconstruct the network of interactions cre- Total degree in combined network measures the number of strong ated by users sending comments to each other. It also makes it inward and outward ties that a weblogger has, where a strong tie possible to reconstruct the invitation tree created by existing is defined as an edge between a dyad where both an invitation and Wallop users inviting potential new users to join the system. a comment has occurred. Such ties indicate that webloggers inter- acted after invitation, suggesting a pre-existing and therefore more
  • 4. substantial relationship between the nodes. While Wallop users did invite random strangers to join the system, we argue that those Table 1. Means for Structural Variables, Grouped by that then also engaged in commentary with their inviter have Retention stronger ties than those that did not. Time one alters active at time two reports a count of connections to others in time one who are also active during time two. Ties to dedicated bloggers (those that will stick around) will likely have an impact on an individual's decision to remain active. If social ties are important to activity, and all of the people with whom a person communicates leave the system, then we would expect that person to leave the system as well. Percent Time One Alters Active at Time Two is a normalized ver- sion of time one alters active at time two. The original measure is imperfect, because an individual with a few friends might remain active if all of those friends also remain active. Conversely, an individual with many friends might have a low percentage of her friends remain active, but still have many active alters and there- fore stay active herself. Therefore, both measures are necessary. Two of the most striking differences stem from ties from dedi- 3.2 Individual Variables cated bloggers. On average, those active at T2 had almost four Active at Time 2 is the dependent variable, and is coded as a bi- times as many ties to dedicated bloggers, and within their current nary for active or not active. Time 2 represents a 5 week period ties, dedicated bloggers represented almost half of their ties. from January of 2005. Active users are defined as those who posted comments during this time period. Content uploads would Table 2. Means for Structural Variables, Grouped by be a better measure, but unfortunately this data was not available Retention at the time of this work. Language Group is a binary variable coded 1 for Chinese Lan- guage Group, 0 otherwise. Any user with more than 5% of her comments, or with 3 or more total comments, containing charac- ters from the CJK (Chinese, Japanese, and Korean) Unicode ranges was classified as a Chinese language user. All other users were placed in the non-Chinese language user group. Close ex- amination of the comments posted by a random sample of 200 users from each group suggests that the error rates for these classi- fication systems are quite low. Less than 1% of the Chinese lan- guage sample did not include Chinese comments, and all of these were Japanese or Korean.2 Similarly, less than 3% of the non- Chinese language group were using Chinese characters, and most were communicating in English. Table 2 reports the same variables while dividing the sample by 4. ANALYSIS language group rather than retention. Table 1 showed that 84% of Table 1 compares the means of structural variables for Active T2 those active at T2 were in the Chinese language group. Therefore and Non Active T2 webloggers. The network attributes of long variables associated with retention are expected to be higher for lived webloggers are strikingly and significantly different from the Chinese language group. those who quit. On average, they have more than twice as many Chinese language users send comments to more unique alters people who send them comments, they are more central in the (they have higher degree), are more central (higher betweeness), comment network, they have one more person in their strong tie and have more strong ties. They also have more ties to dedicated network, and they are disproportionately connected to other blog- bloggers and a higher percentage of their ties are to more dedi- gers who themselves are long lived. These differences suggest cated bloggers. Finally, their retention rate is more than double the that social ties are positively related to activity within the system, Non-Chinese language group. particularly where those ties link to other active users.3 Overall, these comparisons of means suggest both that social structure should enhance retention, and that these structural fea- 2 tures are more prevalent among the Chinese language users. We recognize the cultural differences between Japanese, Koreans, and Chinese, but due to the small number of Japanese and Korean users who did not use Chinese we felt justified in accepting these as a minimal classification error. penalizes larger networks. We include it here as a warning 3 against using the standard definition of density to study online The one contradictory variable is density, which is higher for social networks, especially because many online networks are those who quit. However, this is an artifact of how density is large. conventionally standardized in network analysis, by essentially taking the square of the number of nodes. This unrealistically
  • 5. 4.1 Logistic Regression Predicting Retention Table 3. Means of network variables for Wallop webloggers The primary analyses employ a logistic regression model to pre- dict the probability of continued participation in the Wallop we- blogging system. Table 3 reports three models which allow com- parison of how the effects on retention of these social structural variables differ between the full sample, the Non-Chinese, and the Chinese language groups. The model coefficients have been ex- ponentiated and thus reveal how a one unit change in that variable would affect the odds of retention over the odds of quitting. Val- ues greater than one indicate a change that increases the odds of retention, values less than one indicate a decrease in the odds of retention, and values equal to one indicate no change in the odds of retention. In interpreting multivariate regression results it is important to remember that the coefficient for each variable has been calculated after conditioning on the other variables. The general trends in the three models, combined with the high significance levels for several of the variables, support the notion that social ties can positively affect continued participation. 4.1.1 Model for Combined Sample The model for the combined sample suggests that bloggers will be more likely to remain active if they are more central (betweeness), have larger networks of strong ties, and are tied to larger numbers and higher percentages of bloggers who themselves remain active. The coefficient for percent T1 alters active is an especially power- ful predictor of continued activity. All other things being equal, the odds of a user remaining active in the system increase 2.6 times when all alters in her network remain active. The most interesting result of this model is that in-degree is not For non-Chinese language users, a larger total degree for com- significant once other variables are taken into account. Although it bined network is related to a lower level of activity at T2. This seems intuitive that the more connections an individual has, the result, while surprising, is not significant. The effect of this vari- longer she will remain active in the system, it turns out that not all able for Chinese language users, by contrast, is strongly signifi- ties are equal. Combined network ties, which consist of both cant in the opposite direction. This suggests that the pre-existing comment and invitation links, represent stronger ties and a higher relationships and strong connections to other users are more im- likelihood of a pre-existing relationship between the users. These portant to the Chinese language users than the non-Chinese lan- types of strong ties, along with ties to other active users, are much guage users. more important predictors of retention than simply the number of Having a larger number, and a higher percentage, of alters active comments one receives (In Degree). at T2 is related to a higher level of retention, and is strongly sig- Given earlier work studying the Wallop system [6], it is not sur- nificant for both groups. However, the effects are substantially prising that language group has a strong effect on retention. Even stronger for non-Chinese language users than Chinese language after controlling for other factors, a Chinese language user is more users. This suggests that connections to dedictated bloggers are than twice as likely to remain active than a Non-Chinese language more important to the non-Chinese language users, whereas for user. the Chinese language users these factors are less important, as they are sustained both by these connections, and by their strong ties. 4.1.2 Comparison of Models for Chinese and Non Chinese Language User Samples 4.1.3 Evaluating model fit Comparing the model for Chinese and non-Chinese language An intuitive way to evaluate a fitted logistic regression model is to users can provide some insight into the differences between the compare the predicted probability of the outcome to the observed two groups. In general, the effects of the network variables in- outcome for all cases. The following plot does just this, by de- cluded in this model are similar for both groups. The two striking picting predicted probability against the observed presence of differences are in the effects of the Total Degree for Combined subsequent participation for the combined sample. All cases Network and Active T1 Alters variables. above the midline were active at time 2 and all cases below the midline were not. Vertical variation within quadrants is simply the result of random jittering introduced to reduce overlap of cases.
  • 6. 2004, the initial period of rapid increase in the Chinese language users. Figure 3. A visual cross-classification of actual and predicted retention How well did the model used in this analysis predict actual reten- Figure 4. Core of the comment network. tion? The first notable result is that the vast majority of cases are in the lower left quadrant, meaning they both dropped out, and had social networks unlikely to sustain continued activity. Thus The visualizations in Figures 4, 5, and 6 report on these three there appears to be a clear relationship between low values in the datasets, while coding language through color and shape of the measured social structural attributes and loss of activity. Further- node: English (Triangle, Blue), Chinese (Circle, Red), and Un- more, the webloggers who had high levels of these social struc- categorizable4 (Square, White). Continued activity in Wallop is tural variables were more likely to stay active. This can be seen by indicated by depth of shading, with darker hues corresponding to comparing the right hand quadrants to each other. Cases are participants in T1(November) who posted a comment during T2 clearly more numerous in the upper right than the lower right, and (January). as predicted probability increases, the frequency of false positives The comment network, which represents the primary mode of decreases. interaction in Wallop, is represented in Figure 4. Since the com- However, the upper left quadrant can be thought of as cases of ment network is large (over 4,000 nodes) this visualization pre- ‘false negatives’, i.e. they are results which the model predicted sents a reduced graph that includes only those nodes with at least would drop out of the system, yet they actually remained active. 5 neighbors. This graph has several interesting attributes: (1) This suggests that a large amount of what keeps people involved Commenting is highly, but not completely segregated by language in weblogging is not included in this model. Therefore (1) we can group. A notable exception is on the far right. (2) Intense ties do a better job of measuring social structural variables, and (2) we (indicating multiple comments within the dyad) are prevalent can try to measure more individualistic characteristics that might within the core of the population. This is true for both language also explain sustained participation. Furthermore, although this groups, although the intense Chinese language core seems larger. analysis provides some insight into how social structure might (3) There is some evidence that that those who remain active sustain continued involvement in weblogging systems, there re- during T2 are centrally located in their local network during T1. mains a great deal of variation to explain. The next section pre- This supports the effects of betweenness centrality observed in the sents network visualizations, which help provide insight into how regression models. structural features might be related to continued participation, and Figure 5 illustrates a reduced graph of the invitation network which point to some additional future directions for research. (nodes with at least 4 invitees). 4.2 Network Visualizations Network visualizations are used to explore structural relationships in the Wallop system and to generate new intuitions. Here are three visualizations, which highlight the major language group division in the system. These visualizations are based on three network datasets con- structed from three definitions of ties: invitation, where ego is tied to alter when one invited the other to join Wallop; comment, where ego or alter commented on the other's weblog; and com- bined, where within an invitation relationship there was also the 4 Users who did not upload any comments could not be classified presence of a comment tie. The data includes ties from November as Chinese or non-Chinese language users.
  • 7. with at least 4 invitees who also commented on or were com- mented upon by their child, has two interesting characteristics: (1) There is no clear central component. While the reduced combined graph contains nearly the same number of nodes (332 to 267) as the invitation graph, the combined graph is highly fragmented. This result fits with qualitative data that suggests that among Chi- nese users, invitations would often be made available online to strangers, who would then invite a group of their friends. This would result in fragmented groupings. (2) Branching is less prevalent and ties are more fragile. It seems that the invite and comment relationship was seldom both intense and numerous. 5. DISCUSSION This study provides some initial data on the relationship between social interaction and an individual's propensity to continue up- loading content. Interacting with active alters is a strong and sig- nificant predictor of continued user activity. Users who maintain a connection with the person who invited them are also more likely to stay active, which suggests that pre-existing networks are likely an important part of retention. Chinese users had more of these relationships, they were retained more, and the coefficient is strong and significant for them. However, non-Chinese had fewer of these relationships, they were not retained nearly as well, and Figure 5. Core of the invitation network. this coefficient is weak, negative and non-significant for them. One important point to take away from this study is that the There are several interesting elements in this graph: (1) There is a amount of ties does not matter as much as the strength of those similar general tendency towards segregation as in the comment ties, and the number of ties to other dedicated bloggers. That is, network, although there are several notable exceptions. (2) While after controlling for combined ties and ties to bloggers who them- there is one major branch that is initially prevalent with Non-Chi- selves remain active, in degree had no affect on retention. This nese language users, it and all other branches are primarily Chi- has implications for more general studies of weblogs: measuring nese language users. (3) People who are branching points within ties by simply counting them is not enough because not all ties are this graph are disproportionately active during T2. (4) The high equally effective indicators of a strong connection, which may degree invitation network is composed largely of a single large itself be an indicator of community. More important than meas- component. uring ties is to measure ties that have some underlying reason to affect behavior, in this case, ties that were likely to represent pre- existing relationships and ties to dedicated participants. A second major point to take away from this study concerns the idea of pre-existing social networks. Organizations that can draw existing social networks into them will build stronger community than those that do not. This is demonstrated by the Chinese Lan- guage user group, which was composed of much more durable and active participants. Part of their success, we argue, is that they had a higher prevalence of "strong ties" that combined invitation and commenting, which can be thought of as a proxy indicator of pre-existing social relationships. More important then their prevalence, though, was their influential nature. In the group with greater cohesion, the presence of those strong ties had a strong positive effect on retention. This variable was an important and significant predictor of retention for the Chinese Language group, and not for the non-Chinese language group. The positive effect of pre-existing relationships was suggested by theory and prior research [13], and it is instructive to find evidence of this same mechanism in the context of Weblogs as well. This mechanism can also explain the meteoric growth of MySpace and FaceBook in the US, which are populated predominantly by students who can easily draw in their extensive and intense school-based social Figure 6. Absence of a core in the combined network. networks into the system. Further, weblogs seeking growth need to look to populations within organizations for recruitment, bringing in users who may then pull in their already existing so- What does an identical reduction of the combined invitation and cial connections, in favor of individuals outside of institutions comment network show? The graph above, which includes nodes who may have less developed connections.
  • 8. Although this study represents a promising start, it has only http://www.hpl.hp.com/research/idl/papers/politicalblogs/Ad scratched the surface of the relationship between network struc- amicGlanceBlogWWW.pdf ture and retention in weblogs. More sophisticated network statis- [2] Adar, E. and L. A. Adamic. 2005. Tracking Information tics, such as triangle census measures, measures based on recipro- Epidemics in Blogspace. Web Intelligence 2005. cal ties, and a more thorough treatment of data available from analysis of localized egocentric network data for each node might [3] Anheier, H. 2003. Movement Development and help explain more of the variance in the model. Creating continu- Organizational Networks: The Role of 'Single Members' in ous variables which measure decline in user activity would also be the German Nazi Party, 1925-30. Social Movements and beneficial. For example, a typical user might begin with a flurry Networks: Relational Approaches to Collective Action. D. of activity, which then declines until the user is no longer active in McAdam and M. Diani. New York, Oxford University Press: the system. Comparing the social structures around these users to 49-76. users with dramatically different behavioral patterns might pro- [4] Blood, R. 2004. How Blogging Software Reshapes the vide further insight into the importance of social ties. Online Community. Communications of the ACM. 47: 53-55. Furthermore, controls for individual behavior are noticeably ab- [5] Brandes, U. 2001. A Faster Algorithm for Betweenness sent from this analysis. Login, weblog posting, and image up- Centrality. Journal of Mathematical Sociology. 25(2):163- loading behavior might shed some light on user activity patterns. 177. People who rarely log in, even if they are positioned in an active [6] Gu, L., P. Johns, T. M. Lento and M. A. Smith. 2006. How social network, are not as likely to remain active. If possible, con- Do Blog Gardens Grow? Language Community Correlates trols for demographic features would also enhance this analysis. with Network Diffusion and Adoption of Blogging Systems. Unfortunately, due to problems with searching against the data- Proceedings of the AAAI Symposium on Computational base, the data for these variables was not available at the time of Approaches to Analyzing Weblogs. this analysis, but future research along these lines should include this data wherever possible. [7] Herring, S., L. A. Scheidt, S. Bonus and E. Wright. 2004. Bridging the Gap: A Genre Analysis of Weblogs. Hawai'i Despite the limitations of this study, it has brought some interest- International Conference on Systems Science. ing findings and intriguing theoretical questions to light. Social interaction does appear to be important to user activity in weblog- [8] Herring, S., I. Kouper, J. C. Paolillo, L. A. Scheidt, M. ging systems. Even simple network measures can explain some of Tyworth, P. Welsch, E. Wright and N. Yu. 2005. the variance between the heavily active Chinese language user Conversations in the Blogosphere: An Analysis "From the group and the rest of the Wallop user population. Further refine- Bottom Up." Hawai'i International Conference on Systems ment of these models might provide some insight into how people Sciences. from different cultural backgrounds use weblogging systems. Do [9] Kollock, P. and M. Smith. 1996. Managing the Virtual the Chinese language users remain active because they are more Commons: Cooperation and Conflict in Computer willing to make social connections through this medium? Or is the Communities. Computer-Mediated Communication. S. higher activity level a result of differences in the demographic Herring. Amsterdam, John Benjamins. distribution of the two populations? Finally, further exploration of the combination network, and the effects of those combination [10] Kumar, R., J. Novak, P. Raghavan and A. Tomkins. 2003. ties, might shed some light on the importance of pre-existing rela- On the Bursty Evolution of Blogspace. WWW2003. tionships. Do users who actively contribute to weblogging sys- [11] Marlow, C. 2004. Audience, Structure and Authority in the tems do so because they have offline friends there? What other Weblog Community. Presented at the International reasons might motivate an individual to contribute? Finally, the Communication Association Conference. visualizations presented above prompt additional questions for [12] Martinez, M. 2003. High Attrition Rates in E-Learning: further consideration. What is the significance of the centrality of Challenges, Predictors, and Solutions. The E-Learning active users? Is this an artifact of their higher activity levels – i.e. Developer's Journal. July 14: 9 pages. more comments and invitations sent tend to move an individual towards the center of the graph – or is it indicative of something [13] McAdam, D. 1988. Freedom Summer. New York, Oxford else? Could this be related to status within the Wallop system? University Press. [14] Nardi, B. A., D. J. Schiano, M. Gumbrecht and L. Swartz. 6. ACKNOWLEDGMENTS 2004. Why We Blog. Communications of the ACM. 47: 41- Our thanks to Wallop Users, the Wallop Development Team, 49. MSR Social Computing and Community Technologies Groups, [15] Smith, M. 2004. Measures and Maps of Usenet. From Usenet Shelly Farnham, and Danyel Fisher. to Cowebs. C. Lueg and D. Fisher, Springer. [16] Wellman, B. 1996. An Electronic Group Is Virtually a Social Network. Culture of the Internet. S. Kiesler. Hillsdale, NJ, 7. REFERENCES Lawrence Erlbaum: 179-208. [1] Adamic, L. A. and N. Glance. 2005. The Political Blogosphere and the 2004 U.S. Election: Divided They Blog. Proceedings of the WWW2005 Conference's 2nd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis, and Dynamics.