Why Do We Need
Web Science Research?
       December 2009

      Sangki Han, Ph.D.
      Professor / GSCT
Web Science by Tim Berners-Lee
            in 2006


A New Discipline
 Model the Web’s structure

 Articulate the architectural principles that have
  fueled its phenomenal ...
Interdisciplinary Approach

                      Web Science Trust
 The Web Science Research Initiative (WSRI) is a joint endeavour between t...
WebSci’09: Society On-Line
   Understanding of both human behavior and                         Identified the following a...

Web Science                                                                                    ...
Model the Web’s Structure
Analysis on CyWorld
                          Analysis of Topological Characteristics
                        of Huge Onli...
Social Drivers
ated with the Enhancement level.                                         earliest 30% respondents
Small Technical Innovation
  Explore how a small technical innovation
   can launch a large social phenomenon
   – Emerge...
Provenance of Information
 Understand on the dissemination of an idea or information
  might change our view of journalis...
Influentials: Two Viewpoints

The Rise of Semantic Web
 The network of data on the Web
 RDF (Resource Description Framework)
  – Gives meaning to data...
Corporate applications are well under way,
and consumer uses are emerging


Graph of Linked Data sets on the
    Web, as at March 2009

Research Roadmap
 By Nigel Shadbolt, Web Science Research Initiative - November, 2008
 A Computational Perspective
   – ...
Research Roadmap
 A Mathematical Perspective
  – How do we model the transient or ephemeral Web? How do
    we model this...
Research Roadmap
 A Social Science Perspective
  – How can we develop inter-disciplinary epistemologies that will enable ...
Research Roadmap
 An Economic Perspective
  – What are the economics of Web 2.0 (+)?
  – What are the economic forces tha...
Research Roadmap
 A Legal Perspective
  – Techniques for representing and reasoning over legal and social
    rules – exp...
Integrative Research Themes
 Collective Intelligence
  – Technical, socio-economic, legal, psychological
 The Openness o...
Thank you and meet me at
    Facebook: stevehan
     Twitter: steve3034
     me2day: steve3001
Published in: Technology

Why Do We Need Web Science Research?

  1. Why Do We Need Web Science Research? December 2009. Sangki Han, Ph.D., Professor / GSCT, KAIST
  2. Web Science by Tim Berners-Lee in 2006
  3. [Scanned article page] Tim Berners-Lee, Wendy Hall, James Hendler, Nigel Shadbolt, Daniel J. Weitzner, "Creating a Science of the Web," Science, Vol. 313, 11 August 2006, p. 769 (Perspectives: Computer Science). Published by AAAS.

     Understanding and fostering the growth of the World Wide Web, both in engineering and societal terms, will require the development of a new interdisciplinary field.

     Since its inception, the World Wide Web has changed the ways scientists communicate, collaborate, and educate. There is, however, a growing realization among many researchers that a clear research agenda aimed at understanding the current, evolving, and potential Web is needed. If we want to model the Web; if we want to understand the architectural principles that have provided for its growth; and if we want to be sure that it supports the basic social values of trustworthiness, privacy, and respect for social boundaries, then we must chart out a research agenda that targets the Web as a primary focus of attention.

     When we discuss an agenda for a science of the Web, we use the term "science" in two ways. Physical and biological science analyzes the natural world, and tries to find microscopic laws that, extrapolated to the macroscopic realm, would generate the behavior observed. Computer science, by contrast, though partly analytic, is principally synthetic: It is concerned with the construction of new languages and algorithms in order to produce novel desired computer behaviors. Web science is a combination of these two features. The Web is an engineered space created through formally specified languages and protocols. However, because humans are the creators of Web pages and links between them, their interactions form emergent patterns in the Web at a macroscopic scale. These human interactions are, in turn, governed by social conventions and laws. Web science, therefore, must be inherently interdisciplinary; its goal is to both understand the growth of the Web and to create approaches that allow new powerful and more beneficial patterns to occur.

     Unfortunately, such a research area does not yet exist in a coherent form. Within computer science, Web-related research has largely focused on information-retrieval algorithms and on algorithms for the routing of information through the underlying Internet. Outside of computing, researchers grow ...

     T. Berners-Lee and D. J. Weitzner are at the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. W. Hall and N. Shadbolt are in the School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK. J. Hendler is in the Computer Science Department, University of Maryland, College Park, MD 20742, USA.
  4. A New Discipline
     • Model the Web’s structure
     • Articulate the architectural principles that have fueled its phenomenal growth
     • Discover how online human interactions are driven by and can change social conventions
  5. Interdisciplinary Approach
  6. WSRI & Web Science Trust
     • The Web Science Research Initiative (WSRI) is a joint endeavour between the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT and the School of Electronics and Computer Science (ECS) at the University of Southampton. The goal of WSRI is to facilitate and produce the fundamental scientific advances necessary to inform the future design and use of the World Wide Web.
     • Publication: Foundations and Trends in Web Science
     • Events
       – Web Science Summer Graduate School
       – WebSci09 - Society On-Line
     • Directors of WSRI are establishing a charitable body - the Web Science Trust (WST)
       – Working with WWW Foundation
  7. WebSci’09: Society On-Line
     • Understanding of both human behavior and technological design
       – How do people and organisations behave on-line – what motivates them to shop, date, make friends, learn, participate in political life or manage their health or tax on-line?
       – Which Web-based designs will they trust? To which on-line agents will they delegate?
       – How can the dark side of the Web – such as cybercrime, pornography and terrorist networks – be both understood and held in check without compromising the experience of others?
       – What are the effects of varying characteristics of Web-based technologies – such as security, privacy, network structure, the linking of data – on on-line behaviour, both criminal and non-criminal?
       – And how can the design of the Web of the future ensure that a system on which – as Tim Berners-Lee put it – democracy and commerce depends remains 'stable and pro-human'?
     • Identified the following areas of on-line society and Web development for particular attention:
       – E-commerce
       – Government and Political Life
       – Social Relationships
       – Cybercrime and/or the Prevention Thereof
       – Health
       – Culture On-Line
       – E-Learning
     • The cross-cutting infrastructure issues on which these areas depend including, but not limited to:
       – Linked Data and the Semantic Web
       – Trust and Reputation
       – Security and Privacy
       – Networking (Social and Technical)
  8. [Scanned article page] Nigel Shadbolt and Tim Berners-Lee, "Web Science Emerges," Scientific American, October 2008, p. 32 (Information Technology).

     Studying the Web will reveal better ways to exploit information, prevent identity theft, revolutionize industry and manage our ever growing online lives.

     KEY CONCEPTS
     • The relentless rise in Web pages and links is creating emergent properties, from social networking to virtual identity theft, that are transforming society.
     • A new discipline, Web science, aims to discover how Web traits arise and how they can be harnessed or held in check to benefit society.
     • Important advances are beginning to be made; more work can solve major issues such as securing privacy and conveying trust.
     (The Editors)

     Since the World Wide Web blossomed in the mid-1990s, it has exploded to more than 15 billion pages that touch almost all aspects of modern life. Today more and more people’s jobs depend on the Web. Media, banking and health care are being revolutionized by it. And governments are even considering how to run their countries with it. Little appreciated, however, is the fact that the Web is more than the sum of its pages. Vast emergent properties have arisen that are transforming society. E-mail led to instant messaging, which has led to social networks such as Facebook. The transfer of documents led to file-sharing sites such as Napster, which have led to user-generated portals such as YouTube. And tagging content with labels is creating online communities that share everything from concert news to parenting tips.

     But few investigators are studying how such emergent properties have actually blossomed, how we might harness them, what new phenomena may be coming or what any of this might mean for humankind. A new branch of science, Web science, aims to address such issues. The timing fits history: computers were built first, and computer science followed, which subsequently improved computing significantly. Web science was launched as a formal discipline in November 2006, when the two of us and our colleagues at the Massachusetts Institute of Technology and the University of Southampton in England announced the beginning of a Web Science Research Initiative. Leading researchers from 16 of the world’s top universities have since expanded on that effort.

     This new discipline will model the Web’s structure, articulate the architectural principles that have fueled its phenomenal growth, and discover how online human interactions are driven by and can change social conventions. It will elucidate the principles that can ensure that the network continues to grow productively and settle complex issues such as privacy protection and intellectual-property rights. To achieve these ends, Web science will draw on mathematics, physics, computer science, psychology, ecology, sociology, law, political science, economics, and more.

     Of course, we cannot predict what this nascent endeavor might reveal. Yet Web science has already generated crucial insights, some presented here. Ultimately, the pursuit aims to answer fundamental questions: What evolutionary patterns have driven the Web’s growth? Could they burn out? How do tipping points arise, and can that be altered?

     Insights Already
     Although Web science as a discipline is new, earlier research has revealed the potential value of such work. As the 1990s progressed, searching for information by looking for key words among the mounting number of pages was returning more and more irrelevant content. The founders of Google, Larry Page and Sergey Brin, realized they needed to prioritize the results.

     Their big insight was that the importance of a page (how relevant it is) was best understood in terms of the number and importance of the pages linking to it. The difficulty was that part of this definition is recursive: the importance of a page is determined by the importance of the ...
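
The recursive definition of page importance quoted on slide 8 can be resolved by repeated substitution until the scores stop changing. The following is a minimal sketch of that idea (power iteration over a toy link graph), not Google's production algorithm; the function name `pagerank`, the damping value 0.85, and the four-page `toy_web` graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate page importance by power iteration.

    links maps each page to the list of pages it links to.
    damping, tol and max_iter are illustrative defaults, not values from the slides.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(max_iter):
        # Rank held by pages without out-links is shared evenly among all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Each page passes an equal share of its rank along every out-link.
                new[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical four-page web: A, B and D all link to C; C links back to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph the page with the most incoming links from other well-linked pages ends up on top, while a page nobody links to keeps only the baseline share, which is the behaviour the recursive definition is meant to capture.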
  9. Model the Web’s Structure
     • PageRank by Page and Brin
     • Web is a scale-free network - Northeastern University’s Albert-László Barabási
     • Web as having short paths and small worlds
       – While at Cornell University in the 1990s, Duncan J. Watts and Steven H. Strogatz
       – Even though the Web was huge, a user could get from one page to any other page in at most 14 clicks

     [Scanned article page] Technical Comment: "Power-Law Distribution of the World Wide Web," Science, Vol. 287, 24 March 2000, p. 2115a.

     Barabási and Albert (1) propose an improved version of the Erdős-Rényi (ER) theory of random networks to account for the scaling properties of a number of systems, including the link structure of the World Wide Web (WWW). The theory they present, however, is inconsistent with empirically observed properties of the Web link structure.

     Barabási and Albert write that because "of the preferential attachment, a vertex that acquires more connections than another one will increase its connectivity at a higher rate; thus, an initial difference in the connectivity between two vertices will increase further as the network grows. . . . Thus older . . . vertices increase their connectivity at the expense of the younger . . . ones, leading over time to some vertices that are highly connected, a 'rich-get-richer' phenomenon" [figure 2C of (1)]. It is this prediction of the Barabási-Albert (BA) model, however, that renders it unable to account for the power-law distribution of links in the WWW [figure 1B of (1)].

     We studied a crawl of 260,000 sites, each one representing a separate domain name. We counted how many links the sites received from other sites, and found that the distribution of links followed a power law (Fig. 1A). Next, we queried the InterNIC database (using the WHOIS search tool at www. ...) for the date on which the site was originally registered. Whereas the BA model predicts that older sites have more time to acquire links and gather links at a faster rate than newer sites, the results of our search (Fig. 1B) suggest no correlation between the age of a site and its number of links.

     The absence of correlation between age and the number of links is hardly surprising; all sites are not created equal. An exciting site that appears in 1999 will soon have more links than a bland site created in 1993. The rate of acquisition of new links is probably proportional to the number of links the site already has, because the more links a site has, the more visible it becomes and the more new links it will get. (There should, however, be an additional proportionality factor, or growth rate, that varies from site to site.)

     Our recently proposed theory (2), which accounts for the power-law distribution in the number of pages per site, can also be applied to the number of links a site receives. In this model, the number of new links a site receives at each time step is a random fraction of the number of links the site already has. New sites, each with a different growth rate, appear at an exponential rate. This model yields scatter plots similar to Fig. 1B, and can produce any power-law exponent γ > 1.

     Lada A. Adamic, Bernardo A. Huberman
     Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA

     References
     1. A.-L. Barabási and R. Albert, Science 286, 509 (1999).
     2. B. A. Huberman and L. A. Adamic, Nature 401, 131 (1999).

     10 November 1999; accepted 4 February 2000

     Fig. 1 (comment). (A) The distribution function for the number of links, k, to Web sites (from crawl in spring 1997). The dashed line has slope 1.94. (B) Scatter plot of the number of links, k, versus age for 120,000 sites. The correlation coefficient is 0.03.

     Response: Adamic and Huberman offer additional support for the evolutionary network model that we offered (1). The apparent mess in their fig. 1B is rooted in their choice not to average their data. We believe that taking the average over all points of the same age, and extracting the trends within those averages, would have unveiled the increasing tendency predicted by our model.

     Although we do not have access to their data, we can illustrate the same procedure for the network of movie actors that we discussed (1). When the connectivity of the individual actors is plotted as a function of the release year of their first movie (Fig. 1A), the results are very similar to those shown in fig. 1B of Adamic and Huberman’s comment. The only difference is that the movie industry had its boom not 4 years ago, as did the WWW, but rather at the beginning of the century; thus, the apparently structureless regime persists much longer. When the connectivity of the actors that debuted in the same year is averaged, however, the average connectivity in the last 60 years increases with the actor’s age, in line with the predictions of our theory, and the curve follows a power law for almost a hundred years (Fig. 1B). We expect that a similar increasing tendency would appear for the WWW data after averaging, but the length of the scaling interval would be limited by the Web’s comparatively brief history.

     The fluctuations that lead to the apparent randomness of Fig. 1A are due to the individual differences in the rate at which nodes increase their connectivity. It is easy to include such differences in the model and continuum theory proposed by ...

     Fig. 1 (response). (A) Scatter plot of movie actor connectivity, k (the number of other actors with which he or she performed during his or her career), versus the year of debut. All actors from the Internet Movie Database were included; n = 392,340. (B) Average movie actor connectivity, ⟨k⟩, versus year of debut. To determine ⟨k⟩, k is averaged over all actors that debuted in the same year. The curve shows a systematic increase in the average connectivity with the actor’s professional lifetime, t (2000 − year of debut). The dotted line follows ⟨k(t)⟩ ~ t^β, with β = 0.49, very close to the prediction β = 0.5 of (1). Inset shows a log-log plot of ⟨k⟩ as a function of t, which illustrates the presence of scaling in the last century. The dotted line has slope 0.5.
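
The "rich-get-richer" mechanism debated on slide 9 can be illustrated with a small simulation. The sketch below grows a network by preferential attachment in the spirit of the Barabási-Albert model and then averages connectivity by node age, which is the averaging step the Response argues for. It is a simplified illustration under assumed parameters (`n_nodes`, `m`, the random seed, and the helper name `grow_network` are all hypothetical), not a reproduction of either group's analysis.

```python
import random
from collections import Counter


def grow_network(n_nodes, m=2, seed=0):
    """Grow a network where each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    degree = Counter()
    stubs = []  # every node appears once per edge endpoint it owns
    # Start from a small fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            degree[i] += 1
            degree[j] += 1
            stubs += [i, j]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            # Drawing uniformly from stubs picks a node proportionally to its degree.
            targets.add(rng.choice(stubs))
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            stubs += [new, t]
    return degree


n = 5000
degree = grow_network(n)

# Node labels double as age: node 0 is the oldest, node n - 1 the youngest.
oldest = [degree[i] for i in range(n // 10)]
youngest = [degree[i] for i in range(n - n // 10, n)]
mean_old = sum(oldest) / len(oldest)
mean_young = sum(youngest) / len(youngest)
print("mean degree, oldest 10%:  ", round(mean_old, 2))
print("mean degree, youngest 10%:", round(mean_young, 2))
print("largest degree:", max(degree.values()))
```

In this pure preferential-attachment model, age-averaged connectivity rises with age. The Adamic and Huberman objection is that real sites also differ in intrinsic growth rate, which this sketch deliberately leaves out.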
  10. Analysis on CyWorld

     [Scanned paper page] "Analysis of Topological Characteristics of Huge Online Social Networking Services," WWW 2007, May 8–12, 2007, Banff, Alberta, Canada. ACM 978-1-59593-654-7/07/0005.

     Yong-Yeol Ahn (Department of Physics, KAIST, Daejeon, Korea); Seungyeop Han∗ (NHN Corp., Korea); Haewoon Kwak (Division of Computer Science, KAIST, Daejeon, Korea); Sue Moon (Division of Computer Science, KAIST, Daejeon, Korea); Hawoong Jeong (Department of Physics, KAIST, Daejeon, Korea)
     ∗ This work was conducted while Han was at KAIST.

     ABSTRACT
     Social networking services are a fast-growing business in the Internet. However, it is unknown if online relationships and their growth patterns are the same as in real-life social networks. In this paper, we compare the structures of three online social networking services: Cyworld, MySpace, and orkut, each with more than 10 million users, respectively. We have access to complete data of Cyworld’s ilchon (friend) relationships and analyze its degree distribution, clustering property, degree correlation, and evolution over time. We also use Cyworld data to evaluate the validity of snowball sampling method, which we use to crawl and obtain partial network topologies of MySpace and orkut. Cyworld, the oldest of the three, demonstrates a changing scaling behavior over time in degree distribution. The latest Cyworld data’s degree distribution exhibits a multi-scaling behavior, while those of MySpace and orkut have simple scaling behaviors with different exponents. Very interestingly, each of the two exponents corresponds to the different segments in Cyworld’s degree distribution. Certain online social networking services encourage online activities that cannot be easily copied in real life; we show that they deviate from close-knit online social networks which show a similar degree correlation pattern to real-life social networks.

     Categories and Subject Descriptors: J.4 [Computer Applications]: Social and behavioral sciences
     General Terms: Human factors, Measurement
     Keywords: Sampling, Social network

     1. INTRODUCTION
     The Internet has been a vessel to expand our social networks in many ways. Social networking services (SNSs) are one successful example of such a role. SNSs provide an online private space for individuals and tools for interacting with other people in the Internet. SNSs help people find others of a common interest, establish a forum for discussion, exchange photos and personal news, and many more. Cyworld, the largest SNS in South Korea, had already 10 million users 2 years ago, one fourth of the entire population of South Korea. MySpace and orkut, similar social networking services, have also more than 10 million users each. Recently, the number of MySpace users exceeded 130 million with a growing rate of over a hundred thousand people per day. It is reported that these SNSs "attract nearly half of all web users" [1]. The goal of these services is to help people establish an online presence and build social networks; and to eventually exploit the user base for commercial purposes. Thus the statistics and dynamics of these online social networks are of tremendous importance to social networking service providers and those interested in online commerce.

     The notion of a network structure in social relations dates back about half a century. Yet, the focus of most sociological studies has been interactions in small groups, not structures of large and extensive networks. Difficulty in obtaining large data sets was one reason behind the lack of structural study. However, as reported in [2] recently, missing data may distort the statistics severely and it is imperative to use large data sets in network structure analysis.

     It is only very recently that we have seen research results from large networks. Novel network structures from human societies and communication systems have been unveiled; just to name a few are the Internet and WWW [3] and the patents, Autonomous Systems (AS), and affiliation networks [4]. Even in the short history of the Internet, SNSs are a fairly new phenomenon and their network structures are not yet studied carefully. The social networks of SNSs are believed to reflect the real-life social relationships of people more accurately than any other online networks. Moreover, because of their size, they offer an unprecedented opportunity to study human social networks.

     In this paper, we pose and answer the following questions:

     What are the main characteristics of online social networks? Ever since the scale-free nature of the World-Wide Web network has been revealed, a large number of networks have been analyzed and found to have power-law scaling in degree distribution, large clustering coefficients, and small mean degrees of separation (so called the small-world phenomenon). The networks we are interested in this work are huge and those of this magnitude have not yet been analyzed.

     How representative is a sample network? In most networks, ...

     Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others.
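
The abstract on slide 10 mentions snowball sampling as the way partial topologies of MySpace and orkut were crawled. As a rough illustration of what such a crawl does, and why its validity needs checking, here is a minimal breadth-first sketch on a made-up friendship graph. The graph `toy`, the helper names, and the sample size are hypothetical and not taken from the paper, whose actual crawler is not described in the slide.

```python
from collections import deque


def snowball_sample(adj, seed, max_nodes):
    """Breadth-first crawl: start at seed, visit friends, then friends of
    friends, and stop once max_nodes distinct nodes have been collected."""
    visited = {seed}
    order = [seed]
    queue = deque([seed])
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        for neighbour in sorted(adj[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
                if len(order) == max_nodes:
                    break
    return order


def mean_degree(adj, nodes):
    """Average number of friends (in the full graph) over the given nodes."""
    return sum(len(adj[v]) for v in nodes) / len(nodes)


# Hypothetical toy graph: node 0 is a hub with five friends; 5-6-7-8 form a chain.
toy = {
    0: {1, 2, 3, 4, 5},
    1: {0}, 2: {0}, 3: {0}, 4: {0},
    5: {0, 6}, 6: {5, 7}, 7: {6, 8}, 8: {7},
}

sample = snowball_sample(toy, seed=6, max_nodes=4)
print("sampled nodes:", sample)
print("mean degree in sample:    ", mean_degree(toy, sample))
print("mean degree in whole graph:", round(mean_degree(toy, list(toy)), 2))
```

Even in this tiny example the crawl reaches the hub quickly, so the sampled nodes are better connected on average than the population. Biases of this kind are one reason a sampling method would be validated against a network for which complete data is available.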
  11. 11. Social Drivers REPORTS invoked by our population of 61,168 active An Experimental Study of Search senders. When passing messages, senders  Discover how online human typically used friendships in preference to in Global Social Networks business or family ties; however, almost half of these friendships were formed through ei- ther work or school affiliations. Furthermore, Peter Sheridan Dodds,1 Roby Muhamad,2 Duncan J. Watts1,2* successful chains in comparison with incom- plete chains disproportionately involved pro- interactions are driven by and can We report on a global social-search experiment in which more than 60,000 fessional ties (33.9 versus 13.2%) rather than e-mail users attempted to reach one of 18 target persons in 13 countries by friendship and familial relationships (59.8 forwarding messages to acquaintances. We find that successful social search is versus 83.4%) (table S3). Successful chains conducted primarily through intermediate to weak strength ties, does not were also more likely to entail links that require highly connected “hubs” to succeed, and, in contrast to unsuccessful originated through work or higher education social search, disproportionately relies on professional relationships. By ac- (65.1 versus 39.6%) (table S4). Men passed change social conventions counting for the attrition of message chains, we estimate that social searches messages more frequently to other men can reach their targets in a median of five to seven steps, depending on the (57%), and women to other women (61%), separation of source and target, although small variations in chain lengths and and this tendency to pass to a same-sex con- participation rates generate large differences in target reachability. We con- tact was strengthened by about 3% if the clude that although global social networks are, in principle, searchable, actual target was the same gender as the sender and success depends sensitively on individual incentives. 
 – Social drivers (goals, desires, interests and attitudes) are fundamental aspects of how links are made
 – Understanding the Web requires insights from sociology and psychology every bit as much as from mathematics and computer science
 – Stanley Milgram (1967) vs. Duncan Watts (2003)

Journal page shown on the slide (SCIENCE VOL 301, 8 AUGUST 2003, p. 827), columns restored to reading order:

It has become commonplace to assert that any individual in the world can reach any other individual through a short chain of social ties (1, 2). Early experimental work by Travers and Milgram (3) suggested that the average length of such chains is roughly six, and recent theoretical (4) and empirical (4–9) work has generalized the claim to a wide range of nonsocial networks. However, much about this “small world” hypothesis is poorly understood and empirically unsubstantiated. In particular, individuals in real social networks have only limited, local information about the global social network and, therefore, finding short paths represents a nontrivial search effort (10–12). Moreover, and contrary to accepted wisdom, experimental evidence for short global chain lengths is extremely limited (13–15). For example, Travers and Milgram report 96 message chains (of which 18 were completed) initiated by randomly selected individuals from a city other than the target’s (3). Almost all other empirical studies of large-scale networks (4–9, 16–19) have focused either on nonsocial networks or on crude proxies of social interaction such as scientific collaboration, and studies specific to e-mail networks have so far been limited to within single institutions (20).

We have addressed these issues by conducting a global, Internet-based social search experiment (21). Participants registered online (http://smallworld.sociology.columbia.edu) and were randomly allocated one of 18 target persons from 13 countries (table S1). Targets included a professor at an Ivy League university, an archival inspector in Estonia, a technology consultant in India, a policeman in Australia, and a veterinarian in the Norwegian army. Participants were informed that their task was to help relay a message to their allocated target by passing the message to a social acquaintance whom they considered “closer” than themselves to the target. Of the 98,847 individuals who registered, about 25% provided their personal information and initiated message chains. Because subsequent senders were effectively recruited by their own acquaintances, the participation rate after the first step increased to an average of 37%. Including initial and subsequent senders, data were recorded on 61,168 individuals from 166 countries, constituting 24,163 distinct message chains (table S2). More than half of all participants resided in North America and were middle class, professional, college educated, and Christian, reflecting commonly held notions of the Internet-using population (22).

In addition to providing his or her chosen contact’s name and e-mail address, each sender was also required to describe how he or she had come to know the person, along with the type and strength of the resulting relationship. Table 1 lists the frequencies with which different types of relationships (classified by type, origin, and strength) were […]

[…] similarly weakened in the opposite case. Individuals in both successful and unsuccessful chains typically used ties to acquaintances they deemed to be “fairly close.” However, in successful chains “casual” and “not close” ties were chosen 15.7 and 5.9% more frequently than in unsuccessful chains (table S5), thus adding support, and some resolution, to the longstanding claim that “weak” ties are disproportionately responsible for social connectivity (23).

Senders were also asked why they considered their nominated acquaintance a suitable recipient (Table 2). Two reasons, geographical proximity of the acquaintance to the target and similarity of occupation, accounted for at least half of all choices, in general agreement with previous findings (24, 25). Geography clearly dominated the early stages of a chain (when senders were geographically distant) but after the third step was cited less frequently than other characteristics, of which occupation was the most often cited. In contrast with previous claims (3, 12), the presence of highly connected individuals (hubs) appears to have limited relevance to the kind of social search embodied by our experiment (social search with large associated costs/rewards or otherwise modified individual incentives may behave differently). Participants relatively rarely nominated an acquaintance primarily because he or she had many friends (Table 2, “Friends”), and individuals in successful […]

Table 1. Type, origin, and strength of social ties used to direct messages. Only the top five categories in the first two columns have been listed. The most useful category of social tie is medium-strength friendships that originate in the workplace.

Type of relationship    %  | Origin of relationship   %  | Strength of relationship   %
Friend                  67 | Work                     25 | Extremely close            18
Relatives               10 | School/university        22 | Very close                 23
Co-worker                9 | Family/relation          19 | Fairly close               33
Sibling                  5 | Mutual friend             9 | Casual                     22
Significant other        3 | Internet                  6 | Not close                   4

1 Institute for Social and Economic Research and Policy, Columbia University, 420 West 118th Street, New York, NY 10027, USA. 2 Department of Sociology, Columbia University, 1180 Amsterdam Avenue, New York, NY 10027, USA.
*To whom correspondence should be addressed. E-mail:
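The "short chain of social ties" idea on this slide can be made concrete with a small sketch. The network, the names in it, and the function `chain_length` are all hypothetical and chosen only for illustration; they are not taken from the experiment. The sketch uses breadth-first search, which sees the whole network at once, whereas participants in the experiment had only local information, so this shows the shortest chain that exists rather than the chain people would actually find.

```python
from collections import deque


def chain_length(ties, source, target):
    """Return the number of ties on a shortest chain, or None if no chain exists."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        person, steps = queue.popleft()
        if person == target:
            return steps
        for friend in ties.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, steps + 1))
    return None


# Hypothetical toy network: each person maps to the acquaintances they can message.
ties = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
    "zed": [],  # isolated person: no chain can reach them
}

if __name__ == "__main__":
    print(chain_length(ties, "ann", "eve"))
    print(chain_length(ties, "ann", "zed"))
```

A search that forwards the message using only each sender's local guess about who is "closer" to the target would usually produce longer chains than this global search, which is part of what the experiment set out to measure.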
  12. 12. WHAT MOTIVATES WIKIPEDIANS? By Oded Nov (COMMUNICATIONS OF THE ACM, November 2007/Vol. 50, No. 11, p. 60)

Recoverable text of the article pages shown on the slide:

In order to increase and enhance user-generated content contributions, it is important to understand the factors that lead people to freely share their time and knowledge with others.

Wikipedia relies on the open source model [9] where people contribute their time, talent, and knowledge in a collaborative effort to create publicly available knowledge-based products. Therefore, in addition to the six general volunteering motivations, two other motivations (fun and ideology) used extensively in research on open source software development (for example, [8, 12]) may also help to understand why people contribute to Wikipedia. In both cases we would expect to see higher contribution levels associated with higher motivation levels.

Understanding. Through volunteering, individuals may have an opportunity to learn new things and exercise their knowledge, skills, and abilities. Thus, as contributing content to Wikipedia allows contributors to exercise their knowledge, skills, and abilities, we would expect to see higher contribution levels the more Wikipedia contributors are motivated by Understanding.

Career. Volunteering may provide an opportunity to achieve job-related benefits such as preparing for a new career or maintaining career-relevant skills. In the Wikipedia context, we would expect to find some correlation between contribution levels and the Career function, as Wikipedia offers contributors a way to signal their knowledge and writing skills to potential employers. However, we do not expect this to be a strong correlation, as most Wikipedians are not professional writers, or alternatively, because […]

THE SURVEY
The English Wikipedia Alphabetical List of Wikipedians includes 2,847 people. These are not all the contributors, but rather only those who have created […]
The contribution level was measured as hours spent on contributing per week, a measure commonly used as a proxy for participant contribution [10]. Motivation was measured through the volunteering motivations scale [2] adjusted to the Wikipedia context, as well as items adjusted from research on open source motivation measuring ideology [12] and fun [8]. All of the motivation items in the questionnaire were presented as statements to which Wikipedians were asked to state how strongly they agree or disagree on a scale of 1 to 7. Examples of questionnaire items are provided in Table 1.

Table 1. Motivations and questionnaire items.

Motivation      Question example
Protective      “By writing/editing in Wikipedia I feel less lonely.”
Values          “I feel it is important to help others.”
Career          “I can make new contacts that might help my business or career.”
Social          “People I'm close to want me to write/edit in Wikipedia.”
Understanding   “Writing/editing in Wikipedia allows me to gain a new perspective on things.”
Enhancement     “Writing/editing in Wikipedia makes me feel needed.”
Fun             “Writing/editing in Wikipedia is fun.”
Ideology        “I think information should be free.”

Table 2. Motivation levels and correlations with contribution levels. Standard deviations in parentheses. Pearson correlation coefficient in brackets.

Motivation      Mean (SD)      [Correlation]
Fun             6.10 (1.15)    [0.322***]
Ideology        5.59 (1.71)    [0.110]
Values          3.96 (1.55)    [0.175*]
Understanding   3.92 (1.48)    [0.296***]
Enhancement     2.97 (1.39)    [0.313***]
Protective      1.97 (1.05)    [0.306***]
Career          1.67 (0.94)    [0.185*]
Social          1.51 (0.92)    [0.027]

*significant at 0.05 level **significant at 0.01 level ***significant at 0.001 level
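The bracketed values in Table 2 are Pearson correlation coefficients between a motivation score and the contribution level. As a minimal sketch of how such a coefficient is computed, the function below implements the standard formula. The sample numbers are invented for illustration and are not the survey data; the variable names `hours_per_week` and `fun_score` are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented illustration data: weekly hours and a 1-to-7 agreement score.
hours_per_week = [1, 2, 4, 6, 9]
fun_score = [3, 5, 4, 6, 7]

if __name__ == "__main__":
    print(round(pearson(hours_per_week, fun_score), 3))
```

A coefficient near 0, like the one reported for Social, indicates no linear association with contribution level; the significance stars in the table come from a separate test that this sketch does not perform.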
  13. 13. Small Technical Innovation
 ▪ Explore how a small technical innovation can launch a large social phenomenon
   – Emergence of the blogosphere - TrackBack
   – Twitterverse - ReTweet, Follow/Follower
   – The growth of Facebook - Facebook Connect
  14. 14. Provenance of Information
 ▪ Understand how the dissemination of an idea or information might change our view of journalism and commentary
 ▪ What mechanisms can assure blog readers that the facts quoted are trustworthy?
Figure caption: BLOGOSPHERE has certain patterns of power. Matthew Hurst tracked how blogs link to one another. A visualization of the result (left) displays each blog as a white dot; the few large dots are massively popular sites. Blogs that share numerous cross citations form distinct communities (purple). Isolated groups that communicate frequently among themselves but rarely with others appear as straight lines along the outer edges.
  15. 15. Influentials: Two Viewpoints
  16. 16. The Rise of Semantic Web
 ▪ The network of data on the Web
 ▪ RDF (Resource Description Framework)
   – Gives meaning to data through sets of ‘triples’
   – The subjects, verbs and objects are each identified by a Uniform Resource Identifier (URI)
 ▪ Taxonomies and Ontologies
 ▪ Wiki
   – DBpedia project by Chris Bizer
     • As of November 2008, the DBpedia dataset consists of around 274 million RDF triples
   – Motivation to contribute to the Semantic Web? - Oded Nov of Polytechnic Institute of NYU
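To illustrate the triple model named on this slide, here is a minimal sketch that stores subject, verb (predicate) and object as URI strings and filters them by pattern. The `http://example.org/` URIs, the `Triple` type and the `match` helper are hypothetical stand-ins, not DBpedia data or the API of any RDF library.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement; each part is identified by a URI string."""
    subject: str
    predicate: str
    obj: str


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every part that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


EX = "http://example.org/"  # hypothetical namespace, for illustration only

triples = [
    Triple(EX + "TimBL", EX + "invented", EX + "Web"),
    Triple(EX + "TimBL", EX + "worksAt", EX + "W3C"),
    Triple(EX + "Web", EX + "studiedBy", EX + "WebScience"),
]

if __name__ == "__main__":
    for t in match(triples, subject=EX + "TimBL"):
        print(t.predicate, "->", t.obj)
```

Because every part is a URI, two datasets that use the same URI for the same thing can be merged by simply concatenating their triple lists, which is the basis of the "network of data" on the slide.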
  17. 17. Corporate applications are well under way, and consumer uses are emerging
  18. 18. Graph of Linked Data sets on the Web, as at March 2009
  19. 19. Research Roadmap
 ▪ By Nigel Shadbolt, Web Science Research Initiative - November, 2008
 ▪ A Computational Perspective
   – Linked Data Web or Semantic Web: how we are to browse, explore and query such a Web at scale.
   – Collective Intelligence with only light rules of social coordination can lead to the emergence of large-scale, coherent resources such as Wikipedia. What are the characteristics of such resources? Why do people contribute and how do they maintain a highly stable core body of connected content?
   – How do we support inference at a Web scale? What types of reasoning are possible? How is context represented and supported in Web inference?
   – How are concepts such as trust and provenance computationally represented, maintained and repaired on the Web?
   – As the Web has grown, substantial amounts of it have become disconnected, atrophied or in other ways redundant. How are we to identify such necrotic and non-functional parts of the Web and what should be done about them?
  20. 20. Research Roadmap
 ▪ A Mathematical Perspective
   – How do we model the transient or ephemeral Web? How do we model this graph beneath the graph that is the Web?
   – How are Bayesian or other uncertainty representations best used within the Web?
   – What is the topological structure of the Web? Can connections always be established between its various parts, or do particular dynamic and time-dependent conditions create disconnected or sub-regions within it?
   – The virtual “shape” of the Web: A particular query about a given subject may organize Web pages, existing or virtual, according to “how close they are” with respect to the given search criteria.
     • A different structure to different users. It is a mathematical challenge to develop tools to describe this structure.
   – How do we measure the level of complexity of the Web?
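The question of whether connections can always be established between parts of the Web can be phrased as finding connected components of a link graph. The sketch below treats links as undirected, which is a simplifying assumption (real hyperlinks are directed), and the page names and the `components` function are hypothetical.

```python
def components(links):
    """Group pages into connected components, treating each link as undirected."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = set()
    result = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            page = stack.pop()
            component.add(page)
            for nxt in neighbours[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(component)
    return result


# Hypothetical snapshot of links between pages at one moment in time.
links = [("a", "b"), ("b", "c"), ("x", "y")]

if __name__ == "__main__":
    for group in components(links):
        print(sorted(group))
```

Running the same function on snapshots taken at different times would give a rough view of the "dynamic and time-dependent" disconnection the slide asks about.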
  21. 21. Research Roadmap
 ▪ A Social Science Perspective
   – How can we develop inter-disciplinary epistemologies that will enable us to understand the Web as a complex socio-technical phenomenon?
   – How can we do mixed methods research to explore the relations between ethnographic insights into Web practice and the emergence of the Web at the macro level?
   – How can we draw on new data sources, e.g. digital records of network use, to develop understanding of the sociological aspects of the Web?
   – What are the on-going iterative relations between use and design of the Web?
   – How and why do people use newly emergent forms of the Web in the way that they do? What implications does this have for our understanding of key sociological categories, e.g. kinship, gender, race, class and community, and vice versa? What implications does this have for our understanding of psychological constructs, e.g. personal and group identity, collaborative decision making, perception and attitudes?
   – How is the Web situated within networks of power and in relation to social inequalities? To what extent might the Web offer empowering political resources? How might the Web change further as new populations access it?
  22. 22. Research Roadmap
 ▪ An Economic Perspective
   – What are the economics of Web 2.0 (+)?
   – What are the economic forces that shape the formation of social networks on the Web? What are the properties of those networks? What is the relationship between the economic structure of the Web and its social and mathematical structure?
   – What are the commercial incentives created by the Web? What will be the industrial structure? Or are there forces that will allow smaller scale operations to co-exist with large firms?
   – What are the economic arguments for and against open platforms in the Web? Should policy (economic and public) play any role in shaping or determining the openness of Web platforms?
   – What (economic and social) mechanisms can be designed to improve the performance of the Web? For example, are there mechanisms that can improve the extent and quality of participation in online communities?
   – Economics of piracy, privacy and identity?
  23. 23. Research Roadmap
 ▪ A Legal Perspective
   – Techniques for representing and reasoning over legal and social rules: explore and understand the impact of law as a driver in shaping the development of the Web.
   – Is the present intellectual property regulatory regime fit for purpose in the Web 2.0 (+) environment? What is content in the Semantic Web and what rights should attach to it, particularly when much is likely to be “computer generated”?
   – Which technologies within the Web should the law ensure remain “open” rather than becoming the “property” of one or more commercial entities and what are the consequences of the choices available?
   – To what extent are the service providers going to become the legal gatekeepers for public authorities in terms of delivering their public policy objectives, e.g. Web policing for what is judged to be “illegal and harmful content”?
   – What privacy issues arise in a Web environment of increasingly sophisticated information sharing?
  24. 24. Integrative Research Themes
 ▪ Collective Intelligence
   – Technical, socio-economic, legal, psychological
 ▪ The Openness of the Web
   – Economic, Legal
 ▪ The Dynamics of the Web
   – Mathematical, sociological, legal, linguistic
 ▪ Security, Privacy and Trust
   – Economic, social, legal interaction
 ▪ Inference
   – Computational, psychological, linguistic
  25. 25. Thank you and meet me at
 Facebook: stevehan
 Twitter: steve3034
 me2day: steve3001