Why Do We Need Web Science Research?
Sangki Han, Ph.D.
Professor / GSCT
Web Science by Tim Berners-Lee
Creating a Science of the Web
Tim Berners-Lee, Wendy Hall, James Hendler, Nigel Shadbolt, Daniel J. Weitzner
Understanding and fostering the growth of the World Wide Web, both in engineering and societal terms, will require the development of a new interdisciplinary field.
Since its inception, the World Wide Web has changed the ways scientists communicate, collaborate, and educate. There is, however, a growing realization among many researchers that a clear research agenda aimed at understanding the current, evolving, and potential Web is needed. If we want to model the Web; if we want to understand the architectural principles that have provided for its growth; and if we want to be sure that it supports the basic social values of trustworthiness, privacy, and respect for social boundaries, then we must chart out a research agenda that targets the Web as a primary focus of attention.
When we discuss an agenda for a science of the Web, we use the term “science” in two ways. Physical and biological science analyzes the natural world, and tries to find microscopic laws that, extrapolated to the macroscopic realm, would generate the behavior observed. Computer science, by contrast, though partly analytic, is principally synthetic: It is concerned with the construction of new languages and algorithms in order to produce novel desired computer behaviors. Web science is a combination of these two features. The Web is an engineered space created through formally specified languages and protocols. However, because humans are the creators of Web pages and links between them, their interactions form emergent patterns in the Web at a macroscopic scale. These human interactions are, in turn, governed by social conventions and laws. Web science, therefore, must be inherently interdisciplinary; its goal is to both understand the growth of the Web and to create approaches that allow new powerful and more beneficial patterns to occur.
Unfortunately, such a research area does not yet exist in a coherent form. Within computer science, Web-related research has largely focused on information-retrieval algorithms and on algorithms for the routing of information through the underlying Internet. Outside of computing, researchers grow …
T. Berners-Lee and D. J. Weitzner are at the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. W. Hall and N. Shadbolt are in the School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK. J. Hendler is in the Computer Science Department, University of Maryland, College Park, MD 20742, USA.
Science, Vol. 313, 11 August 2006, p. 769. Published by AAAS.
A New Discipline
– Model the Web’s structure
– Articulate the architectural principles that have fueled its phenomenal growth
– Discover how online human interactions are driven by and can change social conventions
Web Science Trust
The Web Science Research Initiative (WSRI) is a joint endeavour between the Computer
Science and Artificial Intelligence Laboratory (CSAIL) at MIT and the School of Electronics
and Computer Science (ECS) at the University of Southampton. The goal of WSRI is to
facilitate and produce the fundamental scientific advances necessary to inform the future
design and use of the World Wide Web.
– Publication: Foundations and Trends in Web Science
– Web Science Summer Graduate School
– WebSci’09: Society On-Line
Directors of WSRI are establishing a charitable body, the Web Science Trust (WST)
– Working with the WWW Foundation
WebSci’09: Society On-Line
Understanding of both human behavior and technological design
– How do people and organisations behave on-line – what motivates them to shop, date, make friends, learn, participate in political life or manage their health or tax on-line?
– Which Web-based designs will they trust? To which on-line agents will they delegate?
– How can the dark side of the Web – such as cybercrime, pornography and terrorist networks – be both understood and held in check without compromising the experience of others?
– What are the effects of varying characteristics of Web-based technologies – such as security, privacy, network structure, the linking of data – on on-line behaviour, both criminal and non-criminal?
– And how can the design of the Web of the future ensure that a system on which – as Tim Berners-Lee put it – democracy and commerce depend remains ‘stable and pro-human’?
Identified the following areas of on-line society and Web development for particular attention:
– E-commerce
– Government and Political Life
– Social Relationships
– Cybercrime and/or the Prevention Thereof
– Culture On-Line
– E-Learning
The cross-cutting infrastructure issues on which these areas depend, including but not limited to:
– Linked Data and the Semantic Web
– Trust and Reputation
– Security and Privacy
– Networking (Social and Technical)
Web Science
Studying the Web will reveal better ways to exploit information, prevent identity theft, revolutionize industry and manage our ever-growing online lives
By Nigel Shadbolt and Tim Berners-Lee
KEY CONCEPTS
– The relentless rise in Web pages and links is creating emergent properties, from social networking to virtual identity theft, that are transforming society.
– A new discipline, Web science, aims to discover how Web traits arise and how they can be harnessed or held in check to benefit society.
– Important advances are beginning to be made; more work can solve major issues such as securing privacy and conveying trust.
The Editors
Since the World Wide Web blossomed in the mid-1990s, it has exploded to more than 15 billion pages that touch almost all aspects of modern life. Today more and more people’s jobs depend on the Web. Media, banking and health care are being revolutionized by it. And governments are even considering how to run their countries with it. Little appreciated, however, is the fact that the Web is more than the sum of its pages. Vast emergent properties have arisen that are transforming society. E-mail led to instant messaging, which has led to social networks such as Facebook. The transfer of documents led to file-sharing sites such as Napster, which have led to user-generated portals such as YouTube. And tagging content with labels is creating online communities that share everything from concert news to parenting tips.
But few investigators are studying how such emergent properties have actually blossomed, how we might harness them, what new phenomena may be coming or what any of this might mean for humankind. A new branch of science, Web science, aims to address such issues. The timing fits history: computers were built first, and computer science followed, which subsequently improved computing significantly. Web science was launched as a formal discipline in November 2006, when the two of us and our colleagues at the Massachusetts Institute of Technology and the University of Southampton in England announced the beginning of a Web Science Research Initiative. Leading researchers from 16 of the world’s top universities have since expanded on that effort.
This new discipline will model the Web’s structure, articulate the architectural principles that have fueled its phenomenal growth, and discover how online human interactions are driven by and can change social conventions. It will elucidate the principles that can ensure that the network continues to grow productively and settle complex issues such as privacy protection and intellectual-property rights. To achieve these ends, Web science will draw on mathematics, physics, computer science, psychology, ecology, sociology, law, political science, economics, and more.
Of course, we cannot predict what this nascent endeavor might reveal. Yet Web science has already generated crucial insights, some presented here. Ultimately, the pursuit aims to answer fundamental questions: What evolutionary patterns have driven the Web’s growth? Could they burn out? How do tipping points arise, and can that be altered?
Insights Already
Although Web science as a discipline is new, earlier research has revealed the potential value of such work. As the 1990s progressed, searching for information by looking for key words among the mounting number of pages was returning more and more irrelevant content. The founders of Google, Larry Page and Sergey Brin, realized they needed to prioritize the results.
Their big insight was that the importance of a page (how relevant it is) was best understood in terms of the number and importance of the pages linking to it. The difficulty was that part of this definition is recursive: the importance of a page is determined by the importance of the …
Scientific American, October 2008, p. 32
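The recursive definition of page importance in the excerpt above is usually resolved by iteration: start every page with an equal score and repeatedly let each page pass its current score to the pages it links to until the scores stop changing. The following is a minimal power-iteration sketch, not a description of Google’s production system. The damping factor 0.85 (the value suggested in Brin and Page’s original paper), the choice to spread the score of pages with no out-links evenly over all pages, and the four-page example web are illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    `links` maps each page to the list of pages it links to. A page's score
    is assembled from the scores of the pages that link to it (the recursive
    definition of importance); iterating until the scores settle resolves it.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # "Random surfer" share: with probability 1 - damping, jump to any page.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links.get(p, [])
            if out:
                # Split p's current score evenly among the pages it links to.
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Page without out-links (assumption): spread its score over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical four-page web, for illustration only.
    web = {
        "home": ["news", "about"],
        "news": ["home"],
        "about": ["home", "news"],
        "blog": ["home"],
    }
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page:6s} {score:.3f}")
```

In this toy web, "blog" has no incoming links and keeps only the baseline (1 − 0.85)/4 score, while "home", which every other page links to, ends up with the highest score. The scores always sum to 1, so they can be read as the long-run share of time a random surfer spends on each page.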
Model the Web’s Structure
PageRank by Page and Brin
Web is a scale-free network
– Northeastern University’s Albert-…
Web as having short paths and small worlds
– While at Cornell University in the 1990s, Duncan J. Watts and Steven H. …
– Even though the Web was huge, a user could get from one page to any other page in at most 14 clicks
Power-Law Distribution of the World Wide Web
Barabási and Albert (1) propose an improved version of the Erdős-Rényi (ER) theory of random networks to account for the scaling properties of a number of systems, including the link structure of the World Wide Web (WWW). The theory they present, however, is inconsistent with empirically observed properties of the Web link structure.
Barabási and Albert write that because “of the preferential attachment, a vertex that acquires more connections than another one will increase its connectivity at a higher rate; thus, an initial difference in the connectivity between two vertices will increase further as the network grows. . . . Thus older . . . vertices increase their connectivity at the expense of the younger . . . ones, leading over time to some vertices that are highly connected, a ‘rich-get-richer’ phenomenon” [figure 2C of (1)]. It is this prediction of the Barabási-Albert (BA) model, however, that renders it unable to account for the power-law distribution of links in the WWW [figure 1B of (1)].
We studied a crawl of 260,000 sites, each one representing a separate domain name. We counted how many links the sites received from other sites, and found that the distribution of links followed a power law (Fig. 1A). Next, we queried the InterNIC database (using the WHOIS search tool at www.networksolutions.com) for the date on which the site was originally registered. Whereas the BA model predicts that older sites have more time to acquire links and gather links at a faster rate than newer sites, the results of our search (Fig. 1B) suggest no correlation between the age of a site and its number of links.
The absence of correlation between age and the number of links is hardly surprising; all sites are not created equal. An exciting site that appears in 1999 will soon have more links than a bland site created in 1993. The rate of acquisition of new links is probably proportional to the number of links the site already has, because the more links a site has, the more visible it becomes and the more new links it will get. (There should, however, be an additional proportionality factor, or growth rate, that varies from site to site.)
Our recently proposed theory (2), which accounts for the power-law distribution in the number of pages per site, can also be applied to the number of links a site receives. In this model, the number of new links a site receives at each time step is a random fraction of the number of links the site already has. New sites, each with a different growth rate, appear at an exponential rate. This model yields scatter plots similar to Fig. 1B, and can produce any power-law exponent greater than 1.
Lada A. Adamic
Bernardo A. Huberman
Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA
References
1. A.-L. Barabási and R. Albert, Science 286, 509 (1999).
2. B. A. Huberman and L. A. Adamic, Nature 401, 131
10 November 1999; accepted 4 February 2000
Fig. 1. (A) The distribution function for the number of links, k, to Web sites (from crawl in spring 1997). The dashed line has slope −1.94. (B) Scatter plot of the number of links, k, versus age for 120,000 sites. The correlation coefficient is 0.03.
Response: Adamic and Huberman offer additional support for the evolutionary network model that we offered (1). The apparent mess in their fig. 1B is rooted in their choice not to average their data. We believe that taking the average over all points of the same age, and extracting the trends within those averages, would have unveiled the increasing tendency predicted by our model.
Although we do not have access to their data, we can illustrate the same procedure for the network of movie actors that we discussed (1). When the connectivity of the individual actors is plotted as a function of the release year of their first movie (Fig. 1A), the results are very similar to those shown in fig. 1B of Adamic and Huberman’s comment. The only difference is that the movie industry had its boom not 4 years ago, as did the WWW, but rather at the beginning of the century; thus, the apparently structureless regime persists much longer. When the connectivity of the actors that debuted in the same year is averaged, however, the average connectivity in the last 60 years increases with the actor’s age, in line with the predictions of our theory, and the curve follows a power law for almost a hundred years (Fig. 1B). We expect that a similar increasing tendency would appear for the WWW data after averaging, but the length of the scaling interval would be limited by the Web’s comparatively brief history.
The fluctuations that lead to the apparent randomness of Fig. 1A are due to the individual differences in the rate at which nodes increase their connectivity. It is easy to include such differences in the model and continuum theory proposed by …
Fig. 1. (A) Scatter plot of movie actor connectivity, k (the number of other actors with which he or she performed during his or her career), versus the year of debut. All actors from the Internet Movie Database were included; n = 392,340. (B) Average movie actor connectivity, ⟨k⟩, versus year of debut. To determine ⟨k⟩, k is averaged over all actors that debuted in the same year. The curve shows a systematic increase in the average connectivity with the actor’s professional lifetime, t (= 2000 − year of debut). The dotted line follows ⟨k(t)⟩ ∼ t^0.49, very close to the exponent 0.5 predicted in (1). Inset shows a log-log plot of k as a function of t, which illustrates the presence of scaling in the last century. The dotted line has slope 0.5.
Science, Vol. 287, 24 March 2000, p. 2115a
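The “rich-get-richer” prediction quoted in the comment, and the averaging step the response recommends, can be tried out on a simulated network. The sketch below is an illustration under our own assumptions, not either group’s code: it grows a Barabási-Albert network in which each new node attaches m = 2 links with probability proportional to existing degree, averages the degree of nodes that arrived at similar times t_i (using logarithmic bins), and fits the log-log slope of that average against T/t_i, where T is the final network size. The BA continuum theory gives k_i ≈ m(T/t_i)^(1/2), so the fitted slope should come out near 0.5. Network size, seed and bin count are arbitrary choices, and the function names are hypothetical.

```python
import math
import random


def barabasi_albert_degrees(n_nodes, m, seed=0):
    """Grow a network by preferential attachment; return every node's final degree.

    Node i arrives at time i; each newcomer links to m distinct existing nodes,
    picked with probability proportional to their current degree.
    """
    rng = random.Random(seed)
    degree = [0] * n_nodes
    # Each link end is stored once in `ends`, so a uniform draw from it is a
    # degree-proportional draw over nodes.
    ends = []
    # Start from a complete graph on m + 1 nodes (each has degree m).
    for i in range(m + 1):
        for j in range(i):
            degree[i] += 1
            degree[j] += 1
            ends += [i, j]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for old in targets:
            degree[new] += 1
            degree[old] += 1
            ends += [new, old]
    return degree


def averaged_connectivity(degree, n_bins=15):
    """Average degree over nodes with similar arrival times (log bins of T / t_i).

    Returns (mean log(T / t_i), log of mean degree) for every non-empty bin.
    """
    n = len(degree)
    span = math.log(n)
    log_x_sum, k_sum, count = [0.0] * n_bins, [0.0] * n_bins, [0] * n_bins
    for i, k in enumerate(degree):
        log_x = math.log(n / (i + 1))  # 0 for the newest node, log(n) for the oldest
        b = min(int(log_x / span * n_bins), n_bins - 1)
        log_x_sum[b] += log_x
        k_sum[b] += k
        count[b] += 1
    return [(log_x_sum[b] / count[b], math.log(k_sum[b] / count[b]))
            for b in range(n_bins) if count[b]]


def fit_slope(points):
    """Ordinary least-squares slope of y against x."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    degrees = barabasi_albert_degrees(20_000, m=2, seed=42)
    beta = fit_slope(averaged_connectivity(degrees))
    print(f"fitted exponent: {beta:.2f} (BA continuum prediction: 0.5)")
```

Individual nodes scatter widely around the averaged trend, which is the point of the response’s argument for averaging before fitting. Adamic and Huberman’s alternative, where growth rates differ from site to site, would change what the raw scatter looks like; the sketch above models only the BA side.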
Analysis on CyWorld
Analysis of Topological Characteristics of Huge Online Social Networking Services
Yong-Yeol Ahn (Department of Physics, KAIST, Daejeon, Korea)
Seungyeop Han* (NHN Corp., Korea)
Haewoon Kwak (Division of Computer Science, KAIST, Daejeon, Korea)
Sue Moon (Division of Computer Science, KAIST, Daejeon, Korea)
Hawoong Jeong (Department of Physics, KAIST, Daejeon, Korea)
*This work was conducted while Han was at KAIST.
WWW 2007, May 8–12, 2007, Banff, Alberta, Canada.
ABSTRACT
Social networking services are a fast-growing business in the Internet. However, it is unknown if online relationships and their growth patterns are the same as in real-life social networks. In this paper, we compare the structures of three online social networking services: Cyworld, MySpace, and orkut, each with more than 10 million users, respectively. We have access to complete data of Cyworld’s ilchon (friend) relationships and analyze its degree distribution, clustering property, degree correlation, and evolution over time. We also use Cyworld data to evaluate the validity of snowball sampling method, which we use to crawl and obtain partial network topologies of MySpace and orkut. Cyworld, the oldest of the three, demonstrates a changing scaling behavior over time in degree distribution. The latest Cyworld data’s degree distribution exhibits a multi-scaling behavior, while those of MySpace and orkut have simple scaling behaviors with different exponents. Very interestingly, each of the two exponents corresponds to the different segments in Cyworld’s degree distribution. Certain online social networking services encourage online activities that cannot be easily copied in real life; we show that they deviate from close-knit online social networks which show a similar degree correlation pattern to real-life social networks.
Categories and Subject Descriptors: J.4 [Computer Applications]: Social and behavioral sciences
General Terms: Human factors, Measurement
Keywords: Sampling, Social network
1. INTRODUCTION
The Internet has been a vessel to expand our social networks in many ways. Social networking services (SNSs) are one successful example of such a role. SNSs provide an online private space for individuals and tools for interacting with other people in the Internet. SNSs help people find others of a common interest, establish a forum for discussion, exchange photos and personal news, and many more.
Cyworld, the largest SNS in South Korea, had already 10 million users 2 years ago, one fourth of the entire population of South Korea. MySpace and orkut, similar social networking services, have also more than 10 million users each. Recently, the number of MySpace users exceeded 130 million with a growing rate of over a hundred thousand people per day. It is reported that these SNSs “attract nearly half of all web users” […]. The goal of these services is to help people establish an online presence and build social networks; and to eventually exploit the user base for commercial purposes. Thus the statistics and dynamics of these online social networks are of tremendous importance to social networking service providers and those interested in online commerce.
The notion of a network structure in social relations dates back about half a century. Yet, the focus of most sociological studies has been interactions in small groups, not structures of large and extensive networks. Difficulty in obtaining large data sets was one reason behind the lack of structural study. However, as reported in […] recently, missing data may distort the statistics severely and it is imperative to use large data sets in network structure analysis.
It is only very recently that we have seen research results from large networks. Novel network structures from human societies and communication systems have been unveiled; just to name a few are the Internet and WWW […] and the patents, Autonomous Systems (AS), and affiliation networks […]. Even in the short history of the Internet, SNSs are a fairly new phenomenon and their network structures are not yet studied carefully. The social networks of SNSs are believed to reflect the real-life social relationships of people more accurately than any other online networks. Moreover, because of their size, they offer an unprecedented opportunity to study human social networks.
In this paper, we pose and answer the following questions: What are the main characteristics of online social networks? Ever since the scale-free nature of the World-Wide Web network has been revealed, a large number of networks have been analyzed and found to have power-law scaling in degree distribution, large clustering coefficients, and small mean degrees of separation (so called the small-world phenomenon). The networks we are interested in this work are huge and those of this magnitude have not yet been analyzed …
How representative is a sample network? In most networks, …
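Snowball sampling, which the abstract names as the way the MySpace and orkut topologies were crawled, is essentially a breadth-first search from a starting user that stops after a fixed number of users has been found. The sketch below is an illustration under stated assumptions, not the authors’ crawler: a synthetic preferential-attachment graph stands in for a real service, and all function names are hypothetical. It compares the full-network degree of the sampled users with the network-wide average, the kind of check that a complete dataset such as Cyworld’s makes possible; breadth-first crawls tend to reach well-connected users early, so the two averages can differ.

```python
import random
from collections import deque


def toy_network(n, m, seed=0):
    """Hypothetical stand-in for a social network, grown by preferential attachment."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    ends = []  # one entry per link end, so uniform draws are degree-proportional
    for i in range(m + 1):  # small complete core so every node starts with degree m
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            ends += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for old in targets:
            adj[new].add(old)
            adj[old].add(new)
            ends += [new, old]
    return adj


def snowball_sample(adj, start, max_nodes):
    """Breadth-first "snowball" crawl from `start` that stops after `max_nodes` users."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(order) < max_nodes:
        user = queue.popleft()
        for friend in sorted(adj[user]):  # sorted only to make the crawl order reproducible
            if friend not in seen:
                seen.add(friend)
                order.append(friend)
                queue.append(friend)
                if len(order) == max_nodes:
                    break
    return order


def mean_degree(adj, nodes):
    """Average full-network degree of the given nodes."""
    nodes = list(nodes)
    return sum(len(adj[v]) for v in nodes) / len(nodes)


if __name__ == "__main__":
    network = toy_network(10_000, m=3, seed=7)
    sample = snowball_sample(network, start=9_999, max_nodes=1_000)
    print(f"mean degree, whole network:   {mean_degree(network, network):.2f}")
    print(f"mean degree, snowball sample: {mean_degree(network, sample):.2f}")
```

The same comparison can be repeated for other statistics (degree distribution, clustering) by computing them once on the full graph and once on the subgraph induced by the sampled users.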
invoked by our population of 61,168 active
An Experimental Study of Search senders. When passing messages, senders
Discover how online human
typically used friendships in preference to
in Global Social Networks business or family ties; however, almost half
of these friendships were formed through ei-
ther work or school affiliations. Furthermore,
Peter Sheridan Dodds,1 Roby Muhamad,2 Duncan J. Watts1,2* successful chains in comparison with incom-
plete chains disproportionately involved pro-
interactions are driven by and can
We report on a global social-search experiment in which more than 60,000 fessional ties (33.9 versus 13.2%) rather than
e-mail users attempted to reach one of 18 target persons in 13 countries by friendship and familial relationships (59.8
forwarding messages to acquaintances. We ﬁnd that successful social search is versus 83.4%) (table S3). Successful chains
conducted primarily through intermediate to weak strength ties, does not were also more likely to entail links that
require highly connected “hubs” to succeed, and, in contrast to unsuccessful originated through work or higher education
social search, disproportionately relies on professional relationships. By ac- (65.1 versus 39.6%) (table S4). Men passed
change social conventions
counting for the attrition of message chains, we estimate that social searches messages more frequently to other men
can reach their targets in a median of ﬁve to seven steps, depending on the (57%), and women to other women (61%),
separation of source and target, although small variations in chain lengths and and this tendency to pass to a same-sex con-
participation rates generate large differences in target reachability. We con- tact was strengthened by about 3% if the
clude that although global social networks are, in principle, searchable, actual target was the same gender as the sender and
success depends sensitively on individual incentives. similarly weakened in the opposite case. In-
dividuals in both successful and unsuccessful
It has become commonplace to assert that any Targets included a professor at an Ivy League chains typically used ties to acquaintances
– Social drivers-goals, desires, interests and
individual in the world can reach any other university, an archival inspector in Estonia, a they deemed to be “fairly close.” However, in
individual through a short chain of social ties technology consultant in India, a policeman successful chains “casual” and “not close”
(1, 2). Early experimental work by Travers in Australia, and a veterinarian in the Norwe- ties were chosen 15.7 and 5.9% more fre-
and Milgram (3) suggested that the average gian army. Participants were informed that quently than in unsuccessful chains (table
length of such chains is roughly six, and their task was to help relay a message to their S5), thus adding support, and some resolu-
attitudes-are fundamental aspects of how
recent theoretical (4) and empirical (4–9) allocated target by passing the message to a tion, to the longstanding claim that “weak”
work has generalized the claim to a wide social acquaintance whom they considered ties are disproportionately responsible for so-
range of nonsocial networks. However, much “closer” than themselves to the target. Of the cial connectivity (23).
about this “small world” hypothesis is poorly 98,847 individuals who registered, about Senders were also asked why they consid-
understood and empirically unsubstantiated. 25% provided their personal information and ered their nominated acquaintance a suit-
links are made
In particular, individuals in real social net- initiated message chains. Because subsequent able recipient (Table 2). Two reasons—
works have only limited, local information senders were effectively recruited by their geographical proximity of the acquaintance
about the global social network and, there- own acquaintances, the participation rate af- to the target and similarity of occupation—
fore, finding short paths represents a non- ter the first step increased to an average of accounted for at least half of all choices, in
trivial search effort (10–12). Moreover, and 37%. Including initial and subsequent send- general agreement with previous findings
contrary to accepted wisdom, experimental ers, data were recorded on 61,168 individuals (24, 25). Geography clearly dominated the
evidence for short global chain lengths is from 166 countries, constituting 24,163 dis- early stages of a chain (when senders were
– Understanding the Web requires insights
extremely limited (13–15). For example, tinct message chains (table S2). More than geographically distant) but after the third step
Travers and Milgram report 96 message half of all participants resided in North Amer- was cited less frequently than other charac-
chains (of which 18 were completed) initiated ica and were middle class, professional, teristics, of which occupation was the most
by randomly selected individuals from a city college educated, and Christian, reflecting often cited. In contrast with previous claims
other than the target’s (3). Almost all other commonly held notions of the Internet-using (3, 12), the presence of highly connected
from sociology and psychology every bit
empirical studies of large-scale networks population (22). individuals (hubs) appears to have limited
– Stanley Milgram (1967) vs. Duncan Watts
… (4–9, 16–19) have focused either on nonsocial networks or on crude proxies of social interaction such as scientific collaboration, and studies specific to e-mail networks have so far been limited to within single institutions (20).
We have addressed these issues by conducting a global, Internet-based social search experiment (21). Participants registered online (….edu) and were randomly allocated one of 18 target persons from 13 countries (table S1). …
In addition to providing his or her chosen contact’s name and e-mail address, each sender was also required to describe how he or she had come to know the person, along with the type and strength of the resulting relationship. Table 1 lists the frequencies with which different types of relationships (classified by type, origin, and strength) were …
… relevance to the kind of social search embodied by our experiment (social search with large associated costs/rewards or otherwise modified individual incentives may behave differently). Participants relatively rarely nominated an acquaintance primarily because he or she had many friends (Table 2, “Friends”), and individuals in successful …
Table 1. Type, origin, and strength of social ties used to direct messages. Only the top five categories in the first two columns have been listed. The most useful category of social tie is medium-strength friendships that originate in the workplace.
Type of relationship   %    Origin of relationship   %    Strength of relationship   %
Friend                67    Work                    25    Extremely close           18
Relatives             10    School/university       22    Very close                23
Co-worker              9    Family/relation         19    Fairly close              33
Sibling                5    Mutual friend            9    Casual                    22
Significant other      3    Internet                 6    Not close                  4
¹Institute for Social and Economic Research and Policy, Columbia University, 420 West 118th Street, New York, NY 10027, USA. ²Department of Sociology, Columbia University, 1180 Amsterdam Avenue, New York, NY 10027, USA.
*To whom correspondence should be addressed. E-…
www.sciencemag.org SCIENCE VOL 301 8 AUGUST 2003 827
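The caption's remark that only the top five categories are listed in the first two columns can be checked with a little arithmetic. This Python sketch simply re-enters the percentages from Table 1 and sums each column; the variable and function names are illustrative assumptions, not anything from the paper.

```python
# Percentages re-entered from Table 1 (Dodds, Muhamad, and Watts, Science 2003).
# Variable and function names are illustrative, not from the paper.
TIE_TYPE = {"Friend": 67, "Relatives": 10, "Co-worker": 9,
            "Sibling": 5, "Significant other": 3}
TIE_ORIGIN = {"Work": 25, "School/university": 22, "Family/relation": 19,
              "Mutual friend": 9, "Internet": 6}
TIE_STRENGTH = {"Extremely close": 18, "Very close": 23, "Fairly close": 33,
                "Casual": 22, "Not close": 4}


def coverage(column: dict[str, int]) -> int:
    """Percentage of messages accounted for by the listed categories."""
    return sum(column.values())


def modal_category(column: dict[str, int]) -> str:
    """Listed category with the largest share."""
    return max(column, key=column.get)


for name, column in [("type", TIE_TYPE), ("origin", TIE_ORIGIN),
                     ("strength", TIE_STRENGTH)]:
    covered = coverage(column)
    print(f"{name}: listed {covered}%, unlisted {100 - covered}%, "
          f"most frequent: {modal_category(column)}")
```

The listed types and origins cover 94% and 81% of messages respectively, while the strength column sums to 100%, which is consistent with the caption's note that truncation applies only to the first two columns.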
WHAT MOTIVATES WIKIPEDIANS?
By Oded Nov
In order to increase and enhance user-generated content contributions, it is important to understand the factors that lead people to freely share their time and knowledge with others.
The last few years have seen substantial growth in user-generated online content [7, 11] delivered through collaborative Internet outlets such as YouTube, Flickr, and Slashdot.org, as well as more traditional media outlets such as BBC News.com. Consistent with the Open Information Society’s vision of decreasing restrictions on the creation and delivery of previously protected information goods, user-generated content marks a new way for information to be … manipulated, and … Wikipedia, the Web-based user-created encyclopedia, is a prominent example of a collaborative, user-generated content outlet. With more than 1.9 million … Wikipedians … their time and knowledge for no monetary reward. Therefore, in order to understand what motivates … and identify which motivations … or low levels of contribution, …
Understanding. Through volunteering, individuals may have an opportunity to learn new things and exercise their knowledge, skills, and abilities. Thus, as contributing content to Wikipedia allows contributors to exercise their knowledge, skills, and abilities, we would expect to see higher contribution levels the more Wikipedia contributors are motivated by Understanding.
Career. Volunteering may provide an opportunity to achieve job-related benefits such as preparing for a new career or maintaining career-relevant skills. In the Wikipedia context, we would expect to find some correlation between contribution levels and the Career function, as Wikipedia offers contributors a way to signal their knowledge and writing skills to potential employers. However, we do not expect this to be a strong correlation, as most Wikipedians are not professional writers, or alternatively, because …
… associated with the Enhancement level.
Wikipedia relies on the open source model where people contribute their time, talent, and knowledge in a collaborative effort to create publicly available knowledge-based products. Therefore, in addition to the six general volunteering motivations, two other motivations used extensively in research on open source software development (for example, [8, 12]), fun and ideology, may also help to understand why people contribute to Wikipedia. In both cases we would expect to see higher contribution levels associated with higher motivation levels.
Table 1. Motivations and questionnaire items.
Motivation      Question example
Protective      “By writing/editing in Wikipedia I feel less lonely.”
Values          “I feel it is important to help others.”
Career          “I can make new contacts that might help my business or career.”
Social          “People I’m close to want me to write/edit in Wikipedia.”
Understanding   “Writing/editing in Wikipedia allows me to gain a new perspective on things.”
Enhancement     “Writing/editing in Wikipedia makes me feel needed.”
Fun             “Writing/editing in Wikipedia is fun.”
Ideology        “I think information should be free.”
THE SURVEY
Motivation was measured through the volunteering motivations scale adjusted to the Wikipedia context, as well as items adjusted from research on open source motivation measuring ideology and fun. All of the motivation items in the questionnaire were presented as statements to which Wikipedians were asked to state how strongly they agree or disagree on a scale of 1 to 7. Examples of questionnaire items are provided in Table 1. The contribution level was measured as hours spent on Wikipedia per week, a measure commonly used as a proxy … The English Wikipedia Alphabetical List of Wikipedians includes 2,847 people. These are not all the contributors, but rather only …
… the earliest 30% of respondents … 30% of the sample in terms of … No bias was found.
THE RESULTS
The average level of contribution … per week, a total that varied … (Table 2).
Table 2. … and correlations with contribution levels. Standard deviations in parentheses; Pearson coefficients in brackets.
Means: Ideology 5.59; Enhancement 2.97; Protective 1.97; Social 1.51
Standard deviations: (1.55), (1.39), (1.05), (0.94), (0.92)
Pearson coefficients: [0.322***], [0.110], [0.175*], [0.296***], [0.313***], [0.185*], [0.027]
*significant at 0.05 level; **significant at 0.01 level; ***significant at 0.001 level
… could be the effect of … responses to the questionnaire … however, is ruled out since … lowe’s scale was used … An alternative explanation … people have strong opinions … not translate into actual behavior … a case of “talk is cheap.” …
November 2007/Vol. 50, No. 11 COMMUNICATIONS OF THE ACM
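Table 2 reports Pearson correlation coefficients between each motivation score and the contribution level (hours per week). As a minimal sketch of what such a coefficient measures, the Python below computes r directly from its definition: the covariance divided by the product of the two standard deviations. The ratings and hours are invented placeholders, not the study's data or analysis code, and the significance testing behind the asterisks is left out.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical responses (NOT data from the study): agreement with a
# motivation item on the 1-7 scale, and self-reported hours per week.
fun_rating = [7, 6, 5, 7, 4, 6, 3, 5]
hours_per_week = [12, 9, 4, 15, 2, 8, 1, 6]

print(f"r = {pearson_r(fun_rating, hours_per_week):.3f}")
```

A value near +1 means higher motivation goes with more hours; a value near 0, like the coefficients without asterisks in Table 2, means little linear association.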
Small Technical Innovation
Explore how a small technical innovation
can launch a large social phenomenon
– Emergence of the blogosphere - TrackBack
– Twitterverse - ReTweet, Follow/Follower
– The growth of Facebook - Facebook Connect
Provenance of Information
Understand how the dissemination of an idea or information
might change our view of journalism and commentary
What mechanisms can assure blog readers that the facts
quoted are trustworthy?
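One possible mechanism for the trustworthiness question, sketched under stated assumptions: when a blogger quotes a source, they publish a small provenance record holding the source URL, the quoted text, and a SHA-256 digest of the source as it read at quoting time. A reader can later re-fetch the source and check whether it has changed and whether the quote still appears. The record format and the function names `cite` and `check` are hypothetical, and the check only shows that a quote matches its source, not that the source itself is accurate.

```python
import hashlib
from dataclasses import dataclass


def normalize(text: str) -> str:
    """Collapse runs of whitespace so that reflowing a page is not treated as an edit."""
    return " ".join(text.split())


def sha256_hex(text: str) -> str:
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class QuoteProvenance:
    """Hypothetical record a blog could publish next to each quotation."""
    source_url: str
    quoted_text: str
    source_digest: str  # SHA-256 of the normalized source text at quoting time


def cite(source_url: str, source_text: str, quoted_text: str) -> QuoteProvenance:
    """Create a record, refusing quotes that do not occur verbatim in the source."""
    if normalize(quoted_text) not in normalize(source_text):
        raise ValueError("quotation does not appear in the source")
    return QuoteProvenance(source_url, quoted_text, sha256_hex(source_text))


def check(record: QuoteProvenance, current_source_text: str) -> str:
    """Compare a record against the source as it reads now."""
    if sha256_hex(current_source_text) == record.source_digest:
        return "unchanged"
    if normalize(record.quoted_text) in normalize(current_source_text):
        return "edited, quote still present"
    return "edited, quote missing"


# Hypothetical usage; the URL and text are placeholders.
source = "Example source text.   It contains the sentence being quoted."
record = cite("https://example.org/post", source, "the sentence being quoted")
print(check(record, source))  # prints "unchanged"
```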
The blogosphere has certain patterns of power. Matthew Hurst tracked how blogs link to one another. A visualization of the result (left) displays each blog as a white dot; the few large dots are massively popular sites. Blogs that share numerous cross citations form distinct communities (purple). Isolated groups that communicate frequently among themselves but rarely with others appear as straight lines along the outer edges.
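The caption does not say how Hurst's communities were computed. As a toy illustration only, this Python sketch takes a hypothetical list of (citing blog, cited blog) links, uses in-degree as a stand-in for the "large dots" of popular sites, and groups blogs joined by chains of mutual citations with a union-find structure. Real community detection on the blogosphere would need richer methods; the link data and function names are invented.

```python
from collections import defaultdict

Link = tuple[str, str]  # (citing blog, cited blog)

# Hypothetical link data: a popular "hub", a cluster a/b/c, and an isolated pair x/y.
LINKS: list[Link] = [
    ("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "hub"),
    ("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"),
    ("d", "a"),
    ("x", "y"), ("y", "x"),
]


def in_degree(links: list[Link]) -> dict[str, int]:
    """Number of incoming citations per blog (a crude popularity measure)."""
    counts: dict[str, int] = defaultdict(int)
    for _, cited in links:
        counts[cited] += 1
    return dict(counts)


def reciprocal_clusters(links: list[Link]) -> list[set[str]]:
    """Groups of blogs joined by chains of mutual (two-way) citations."""
    edges = set(links)
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for citing, cited in edges:
        if (cited, citing) in edges:
            parent[find(citing)] = find(cited)  # union the two roots

    groups: dict[str, set[str]] = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


print(max(in_degree(LINKS).items(), key=lambda kv: kv[1]))
print(sorted(sorted(group) for group in reciprocal_clusters(LINKS)))
```

The isolated pair x/y plays the role of the groups that "communicate frequently among themselves but rarely with others".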
Influentials: Two Viewpoints
The Rise of the Semantic Web
The network of data on the Web
RDF (Resource Description Framework)
– Gives meaning to data through sets of ‘triples’
– The subjects, verbs and objects are each identified by a Uniform
Resource Identifier (URI)
Taxonomies and Ontologies
– DBpedia project by Chris Bizer
• As of November 2008, the DBpedia dataset consists of around 274 million
– Motivation to contribute to the Semantic Web? (Oded Nov, Polytechnic Institute of NYU)
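To make the triple idea concrete, here is a minimal Python sketch that uses no RDF library: each triple is a (subject, predicate, object) tuple of URI strings, a pattern query treats `None` as a wildcard, and a small forward-chaining loop applies two RDFS-style rules (subclass transitivity and type inheritance) of the kind a taxonomy or ontology enables. The `rdf:type` and `rdfs:subClassOf` URIs are the standard W3C ones; the `http://example.org/` resources and the function names are hypothetical. The naive loop is quadratic per pass and nowhere near Web scale.

```python
from typing import Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace for illustration

Triple = tuple[str, str, str]  # (subject, predicate, object), each a URI

TRIPLES: set[Triple] = {
    (EX + "TimBernersLee", RDF_TYPE, EX + "Scientist"),
    (EX + "TimBernersLee", EX + "invented", EX + "WorldWideWeb"),
    (EX + "Scientist", RDFS_SUBCLASS_OF, EX + "Person"),
    (EX + "Person", RDFS_SUBCLASS_OF, EX + "Agent"),
}


def match(store: set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def infer(store: set[Triple]) -> set[Triple]:
    """Apply two RDFS-style rules until no new triples appear:
    (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    (x type B),       (B subClassOf C) => (x type C)"""
    closed = set(store)
    while True:
        derived = {
            (a, p1, d)
            for (a, p1, b) in closed if p1 in (RDF_TYPE, RDFS_SUBCLASS_OF)
            for (c, p2, d) in closed if p2 == RDFS_SUBCLASS_OF and c == b
        }
        if derived <= closed:
            return closed
        closed |= derived


graph = infer(TRIPLES)
print(match(graph, s=EX + "TimBernersLee", p=RDF_TYPE))
```

After inference, the pattern query finds that the example resource is typed not only as a Scientist but also as a Person and an Agent, even though neither triple was stated directly.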
Corporate applications are well under way,
and consumer uses are emerging
Graph of Linked Data sets on the
Web, as at March 2009
By Nigel Shadbolt, Web Science Research Initiative - November, 2008
A Computational Perspective
– Linked Data Web or Semantic Web: how are we to browse, explore and
query such a Web at scale?
– Collective Intelligence with only light rules of social coordination
can lead to the emergence of large-scale, coherent resources such as Wikipedia.
What are the characteristics of such resources? Why do people contribute and how
do they maintain a highly stable core body of connected content?
– How do we support inference at a Web scale? What types of reasoning are
possible? How is context represented and supported in Web inference?
– How are concepts such as trust and provenance computationally
represented, maintained and repaired on the Web?
– As the Web has grown, substantial amounts of it have become disconnected,
atrophied, or in other ways redundant. How are we to identify such necrotic
and non-functional parts of the Web, and what should be done about them?
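What counts as "necrotic" is exactly the open question here, so the following is only a sketch of two crude, measurable signals. It assumes a crawl is available as a mapping from each successfully fetched page to the links it contains, and reports pages that no path from a chosen set of seed pages reaches (disconnected) and links whose target could not be fetched (dead). The function name `audit`, the data shape, and the example pages are hypothetical.

```python
from collections import deque


def audit(crawl: dict[str, list[str]],
          seeds: list[str]) -> tuple[set[str], list[tuple[str, str]]]:
    """Return (unreachable pages, dead links) for a crawled link graph.

    crawl maps each successfully fetched URL to the URLs it links to.
    unreachable: fetched pages not reachable from any seed by following links.
    dead links:  (page, target) pairs whose target is absent from the crawl.
    """
    reachable = {s for s in seeds if s in crawl}
    queue = deque(reachable)
    while queue:  # breadth-first search outward from the seeds
        page = queue.popleft()
        for target in crawl[page]:
            if target in crawl and target not in reachable:
                reachable.add(target)
                queue.append(target)
    unreachable = set(crawl) - reachable
    dead_links = [(page, target) for page, targets in crawl.items()
                  for target in targets if target not in crawl]
    return unreachable, dead_links


# Hypothetical crawl: "orphan" links to "home" but nothing links to it;
# "gone" is linked from "blog" but failed to fetch.
CRAWL = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post1", "gone"],
    "post1": ["blog"],
    "orphan": ["home"],
}
print(audit(CRAWL, seeds=["home"]))  # prints ({'orphan'}, [('blog', 'gone')])
```

Note that reachability depends on the seeds chosen: starting from "orphan" instead, every fetched page is reachable, which is one reason disconnection on the Web is a matter of viewpoint rather than a fixed property.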
A Mathematical Perspective
– How do we model the transient or ephemeral Web? How do
we model this graph beneath the graph that is the Web?
– How are Bayesian or other uncertainty representations best
used within the Web?
– What is the topological structure of the Web? Can connections always
be established between its various parts, or do particular dynamic and time-dependent
conditions create disconnected regions or sub-regions within it?
– The virtual “shape” of the Web: A particular query about a given subject
may organize Web pages, existing or virtual, according to “how close they
are” with respect to the given search criteria.
• The Web may take on a different structure for different users. It is a mathematical challenge to develop
tools to describe this structure.
– How do we measure the level of complexity of the Web?
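The idea that a query organizes pages according to "how close they are" can be sketched with one classic information-retrieval measure, cosine similarity between bag-of-words term vectors. That choice of measure is an assumption for illustration; the slide does not specify one. The example pages and function names are hypothetical. Each query induces its own ordering, so the same set of pages takes on a different "shape" for different questions.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts over lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def arrange_by_query(pages: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Order pages by closeness to the query, most similar first."""
    q = term_vector(query)
    scores = [(url, cosine(q, term_vector(text))) for url, text in pages.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical pages for illustration.
PAGES = {
    "volcano.html": "seismic tomography of an erupting volcano",
    "wiki.html": "why people contribute to wikipedia",
    "rdf.html": "rdf triples link data on the web",
}
for query in ["volcano seismic", "wikipedia contribute"]:
    print(query, "->", [url for url, _ in arrange_by_query(PAGES, query)])
```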
A Social Science Perspective
– How can we develop inter-disciplinary epistemologies that will enable us to
understand the Web as a complex socio-technical phenomenon?
– How can we do mixed methods research to explore the relations between
ethnographic insights into Web practice and the emergence of the Web at
the macro level?
– How can we draw on new data sources e.g. digital records of network use to develop
understanding of the sociological aspects of the Web?
– What are the on-going iterative relations between use and design of the Web?
– How and why do people use newly emergent forms of the Web in the way
that they do? What implications does this have for our understanding of key sociological
categories, e.g. kinship, gender, race, class and community, and vice versa? What implications
does this have for our understanding of psychological constructs, e.g. personal and group
identity, collaborative decision making, perception and attitudes?
– How is the Web situated within networks of power and in relation to social
inequalities? To what extent might the Web offer empowering political resources? How
might the Web change further as new populations access it?
An Economic Perspective
– What are the economics of Web 2.0 (+)?
– What are the economic forces that shape the formation of social
networks on the Web? What are the properties of those networks? What is the
relationship between the economic structure of the Web, its social and mathematical
– What are the commercial incentives created by the Web? What will be the
industrial structure? Or are there forces that will allow smaller-scale operations to
coexist with large firms?
– What are the economic arguments for and against open platforms in
the Web? Should policy (economic and public) play any role in shaping or determining
the openness of Web platforms?
– What (economic and social) mechanisms can be designed to improve the
performance of the Web? For example, are there mechanisms that can improve
the extent and quality of participation in online communities?
– Economics of piracy, privacy and identity?
A Legal Perspective
– Techniques for representing and reasoning over legal and social
rules – explore and understand the impact of law as a driver in shaping the Web
– Is the present intellectual property regulatory regime fit for purpose in
the Web 2.0 (+) environment? What is content in the Semantic Web and what rights
should attach to it particularly when much is likely to be “computer generated”?
– Which technologies within the Web should the law ensure remain “open” rather
than becoming the “property” of one or more commercial entities and what are the
consequences of the choices available?
– To what extent are the service providers going to become the legal gatekeepers
for public authorities in terms of delivering their public policy objectives e.g. Web
policing for what is judged to be “illegal and harmful content”?
– What privacy issues arise in a Web environment of increasingly sophisticated
Integrative Research Themes
– Technical, socio-economic, legal, psychological
The Openness of the Web
– Economic, Legal
The Dynamics of the Web
– Mathematical, sociological, legal, linguistic
Security, Privacy and Trust
– Economic, social, legal interaction
– Computational, psychological, linguistic
Thank you and meet me at