Blogviz Thesis by Manuel Lima

Blogviz is a Flash-driven visualization model for mapping the transmission and internal structure of top links across the blogosphere. It explores the idea of meme propagation by assuming a parallel with the spreading of the most cited URLs in daily weblog entries.

Blogviz is currently a portrait of the blogosphere’s topic activity during the first 64 days of 2005. Nevertheless, the model was developed to easily incorporate different timeframes. Blogviz will continue to expand in the future, to the possible point of including real-time data.

found here: http://www.blogviz.com/blogviz/

Published in: Technology, Education

  1. 1. blogviz Mapping the dynamics of Information Diffusion in Blogspace by Manuel Lima A thesis document submitted in partial fulfillment of the requirements for the degree of Master of Fine Arts in Design and Technology. Parsons School of Design May 2005 Thesis Instructor: Christopher Kirwan Writing Instructor: Mark Stafford Manuel Lima lima@parsons.edu www.blogviz.com
  2. 2. blogviz Mapping the dynamics of Information Diffusion in Blogspace by Manuel Lima Abstract Blogviz is a visualization model for mapping the transmission and internal structure of top links across the blogosphere. It explores the idea of meme propagation by assuming a parallel with the spreading of most cited URLs in daily weblog entries. The main goal of Blogviz is to unravel hidden patterns in the topics diffusion process. What’s the life cycle of a topic? How does it start and how does it evolve through time? Are topics constrained to a specific community of users? Who are the most influential and innovative blogs in any topic? Are there any relationships amongst topic proliferators? Keywords Information Diffusion, Memetics, Weblogs, Online Social Communities, Complex Networks, Information Architecture, Information Visualization, Diffusion of Innovations, Epidemiology, Small Worlds
  3. 3. Acknowledgements − Scott Patterson Jared Schiffman David Kearford Fura Johannesdottir Thank you for your feedback − Christopher Kirwan Mark Stafford Thank you for your guidance, openness and continuous motivation − My dearest Parents Thank you for your eternal support and dedication
  4. 4. Table of Contents
     1 Introduction
       1.1 Concept
       1.2 Memetics
       1.3 Diffusion of Innovations
       1.4 Epidemiology
     2 Impetus
       2.1 Subject of Analysis
     3 Context
       3.1 Online Social Communities
       3.2 Weblogs
       3.3 Blogosphere
     4 Audience
     5 Precedents
     6 Methodology
       6.1 Summer Research
       6.2 Visual Explorations
       6.3 Prototype #1
       6.4 Prototype #2
       6.5 Prototype #3
       6.6 Prototype #4
       6.7 Final Application
     7 Technical Sources
       7.1 Blog Engines
       7.2 Blogviz Data
     8 Conclusion
     9 Bibliography
     Appendix A Summer Research Presentation
     Appendix B Complex Networks: Visual Explorations
  5. 5. 1 Introduction Blogging presents one of the most interesting social phenomena of our time. This change in the flow of online information might radically change the way we look at news providers and large media conglomerates. It also provides an extraordinary online laboratory to analyze how trends, ideas and information travel through social communities. 1.1 Concept Blogviz is a non-commercial research project developed with the intent of disentangling this highly complex network for further study, research and analysis. The main goal of Blogviz is to improve our understanding of the dynamics of information propagation among weblogs. An underlying question to Blogviz is: “How can we measure a meme as a unit of cultural evolution?” The answer is not easy. Because memes are so widespread and their evolutionary tracks are frequently untraceable, they are extremely hard to measure accurately. In contrast to this commonly undetectable meme pool, the blogosphere offers a discernible and documented map of thousands of memes, with clear trails of progression, structured by date and time. There are many possible ways of looking at information diffusion in blogspace. It can be based on conversation threads, comment threads, key sentences, themes, tags, or top links. Blogviz analyzes top links, occasionally called topics, which represent the most cited URLs appearing in blog entries on any given day. These popular links represent particular memes that provide an idea of the sources, stories and themes that have occupied the attention of bloggers over a certain period of time. By exploring the evolution of these topics through time, Blogviz will not only be able to track their popular dispatchers and key innovators, but also follow their dissemination pattern from the beginning to an eventual tipping point, where a topic might leap out of the blog community and reach the mainstream.
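The notion of a top link described above (the most cited URLs in blog entries on a given day) can be illustrated with a short Python sketch. This is not the Blogviz implementation; the record layout `(day, blog, url)`, the function name `top_links_by_day` and the sample URLs are assumptions made for illustration, as is the choice to count each blog at most once per URL per day.

```python
from collections import Counter, defaultdict
from datetime import date


def top_links_by_day(citations, n=3):
    """Return the n most cited URLs for each day.

    citations: iterable of (day, blog, url) tuples (hypothetical layout).
    A blog citing the same URL several times on one day counts once.
    """
    seen = set()
    per_day = defaultdict(Counter)
    for day, blog, url in citations:
        key = (day, blog, url)
        if key in seen:
            continue  # ignore repeated citations by the same blog
        seen.add(key)
        per_day[day][url] += 1
    return {day: counts.most_common(n) for day, counts in per_day.items()}


# Made-up sample data, not taken from the Blogviz database.
citations = [
    (date(2005, 1, 3), "blog-a", "http://example.org/story"),
    (date(2005, 1, 3), "blog-b", "http://example.org/story"),
    (date(2005, 1, 3), "blog-b", "http://example.org/story"),  # repeat
    (date(2005, 1, 3), "blog-c", "http://example.org/other"),
    (date(2005, 1, 4), "blog-a", "http://example.org/other"),
]
top = top_links_by_day(citations, n=1)
```

Counting distinct blogs rather than raw mentions is one plausible reading of "most cited"; a different counting rule would change the ranking.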
  6. 6. Blogviz embodies a Flash-driven interactive visualization model that makes extensive use of information visualization and information architecture. Why is Information Visualization central to Blogviz? Information Visualization can be defined as “the use of computer-supported, interactive, visual representations of abstract data to amplify cognition” (Card, Mackinlay & Shneiderman, 1999). Information Visualization not only makes data easier for humans to interpret, it also discovers and highlights relationships between data elements, usually reducing the effort of searching by gathering information in a small, rich space. Therefore, Blogviz employs Information Visualization with the key intent of uncovering hidden patterns in the data and deriving plausible conclusions, which promote an advanced knowledge of information dynamics in blogspace. By unraveling the modus operandi behind the blogosphere we might be able to improve our knowledge of the mechanics of online social communities and, to some extent, the mechanics of complex social networks. Blogviz is currently a portrait of the blogosphere’s topic activity during the months of January and February 2005. The selection of this time period was purely arbitrary: to make the project feasible within the time limits of the thesis, it was constrained to a specific time span. Nevertheless, the model was developed to easily incorporate different timeframes. Blogviz will continue to expand in the future, to the possible point of including real-time data. Blogviz uses existing data from three different blog search engines, organized in a database that will soon be available for public access (see Technical Sources for additional information).
  7. 7. 1.2 Memetics From a conversation with my Thesis Writing instructor, Mark Stafford, I came to understand how closely my thesis had become related to the concepts of memetics, or meme behavior. We came to the conclusion that I was developing a “topological model of meme activity”, even if until then I had been somewhat oblivious to it. That title actually remained for a while as a characterization of Blogviz. Later on I decided to change it, since the word meme was slightly audience limiting and the expression topological could lead to inadequate interpretations. I still question why the notion of Memetics didn’t come up earlier in my research, but what is particularly interesting is that it was there from the beginning, immersed in every iteration of my work. I think I was too concentrated on the idea of word-of-mouth behavior, an expression used by Malcolm Gladwell in “The Tipping Point” and by Duncan Watts in “Six Degrees: The Science of a Connected Age”. The vital point is that Memetics is the principal theory when contextualizing Blogviz, and because of that, understanding the theory of Memetics is crucial to comprehending the underlying concept of Blogviz. 1.2.1 What’s a Meme? The term was coined by Richard Dawkins in 1976, in his renowned book “The Selfish Gene”. In the words of Dawkins, the word “meme” refers to “a unit of cultural transmission, or a unit of imitation”. More specifically, a meme can be defined as a self-propagating unit of cultural evolution, a unit of information, held in an individual's memory or in an outside artifact (e.g. book, record or tool), which is likely to be communicated or copied to another individual's memory or retention system. Examples of memes are ideas, catch-phrases, melodies, technologies, icons, theories, inventions, languages, designs, fashions, and traditions.
This covers all forms of beliefs, values and behaviors that are normally taken over from others rather than discovered independently. A meme is basically a pattern of information that induces people to repeat it. People try to “infect” each other with the memes they find most appealing, regardless of the memes' objective value or truth.
  8. 8. 1.2.2 What is Memetics? Memetics is the study of evolutionary models of information transmission based on the concept of the meme. In spite of its roots in evolutionary biology and computer simulation, memetics has become more of a social science, focusing primarily on the spread of information within human society. Rather than debate the inherent “truth” or lack of “truth” of an idea, memetics is largely concerned with how that idea itself gets replicated. Another definition describes Memetics as the theoretical and empirical science that studies the replication, spread and evolution of memes. As portrayed in the Journal of Memetics*: “Its core idea is that memes differ in their degree of ‘fitness’, i.e. adaptation to the socio-cultural environment in which they propagate. Because of natural selection, fitter memes will be more successful in being communicated, ‘infecting’ a larger number of individuals and/or surviving for a longer time within the population. Memetics tries to understand what characterizes fit memes, and how they affect individuals, organizations, cultures and society at large”. Since the premise of Memetics is to investigate the evolutionary mechanisms that determine the propagation of information within a population of human, animal or artificial agents, we can easily perceive why this science is vital to the understanding of cults, ideologies, or marketing campaigns of all kinds. A meme is acknowledged as a self-propagating unit of cultural evolution, analogous to the gene (the unit of genetics). Because memes behave much like life forms, Memetics embraces the analytical techniques of diverse sciences, such as epidemiology, evolutionary science, immunology, diffusion of innovations, linguistics, and semiotics. * Journal of Memetics (http://jom-emit.cfpm.org)
  9. 9. 1.3 Diffusion of Innovations I believe any type of Information Diffusion Model (IDM) in Social Networks must derive extensive practical knowledge from the sciences of epidemiology and diffusion of innovations. These two domains help us understand many of the factors that characterize the spreading of information and the adoption process in social communities. Epidemiology and Diffusion of Innovations also share many similarities and are surprisingly closely linked. For these reasons I decided to include in this thesis a short description of these areas, since in addition to the concept of Memetics, they create an extraordinary context for the understanding of Blogviz. I do not explain each domain at length; instead I compare them and show how they relate to this thesis’s assertion. In order to delineate a common ground for the following definitions, this paper assumes that an innovation can be characterized as a new meme, given that it is also described as a new idea. In the context of information diffusion in the blogosphere, it assumes the process of adoption to be the process by which a blogger, aware of the existence of a new meme (or innovation), decides to mention it on his/her own personal blog, in the form of a post or part of a post. This action can be understood as an “adoption” by the blogger of this particular unit of information, therefore contributing to its replication. The study of innovation adoption and diffusion has its origins in the Midwestern United States. In an Iowa State University study, Ryan and Gross (1943) showed that the pattern of adoption and diffusion of a maize hybrid was systematic, hence opening the door for further research. Diffusion is the process by which an innovation is communicated through certain channels over time among the members of a social system (Everett M. Rogers, 1995).
The innovation includes “any thought, behavior, or thing that is new because it is qualitatively different from existing forms” (Jones, 1967). The characteristics of an innovation, as perceived by members of a social system, determine its rate of adoption. Just by analyzing these last statements one can easily grasp a series of similarities with the notion of Memetics, even to the point that the theory of Diffusion of Innovations also considers the unit of adoption not to be exclusive to an individual person, but to extend to other types of retention systems.
  10. 10. The four main elements in the diffusion of new ideas are: (1) The innovation (2) Communication channels (3) Time (4) The social system (context) 1.3.1 The Innovation These are the characteristics that determine an innovation’s rate of adoption, as perceived by the people within the social system: – Relative advantage – Compatibility – Complexity – Trialability – Observability 1.3.2 Communication Channels A communication channel is the means by which messages get from one individual to another. Mass media channels are more effective in creating knowledge of innovations, whereas interpersonal channels are more effective in forming and changing attitudes toward a new idea, and thus in influencing the decision to adopt or reject a new idea. Most individuals evaluate an innovation, not on the basis of scientific research by experts, but through the subjective evaluations of near-peers who have adopted the innovation. (Everett M. Rogers) In a broad sense, the communication channel in the context of Blogviz is indubitably the Internet. Without it there would be no communication between bloggers at all. However, without blogrolls and citations in the posts of each blog, the restricted channels among them would be very difficult to perceive. Blogrolls are the backbone of blog communities, the edges that keep all the nodes interconnected, and are therefore key factors in understanding how information develops across the blogosphere. In fact, a major characteristic of online social communities is that they are based on communication channels, not on physical co-location. A blogroll is a listing of websites that often appears as links on a weblog, usually in a left or right frame of the page. This list of links is used to express the site owner's interest in or affiliation with other webloggers.
  11. 11. 1.3.3 Time The Diffusion of Innovations theory divides the element of Time into three main dimensions, of which only two can be fully applied to the context of information diffusion in the blogosphere. > Innovation-decision – The innovation-decision process is the mental course of action in which an individual passes from first knowledge of an innovation to forming an attitude toward the innovation, to a decision to adopt or reject it, and if adopting it, to implementing this new idea and confirming the decision. In the case of a blogger deciding whether or not to post a specific meme on his/her weblog, this decision process is so fast that it’s almost impossible to measure. It applies to other memes, and definitely to other innovations, but it’s not relevant as a measurement in top links replication. > Innovativeness – Innovativeness is the degree to which an individual is relatively earlier in adopting new ideas than other members of a social system. Innovativeness, in contrast to the innovation-decision process, is an extremely significant measurement in top links replication, as in most information diffusion models. There are five adopter categories, or member classifications of a social system, based on their level of innovativeness: – Innovators – Early adopters – Early majority – Late majority – Laggards (Figure: Bell-shaped curve showing categories of individual innovativeness and percentages within each category)
  12. 12. Innovativeness among social systems is characterized by a bell-shaped curve where time and incidence of adoption are the two main axes. This concept, in the context of Blogviz, is further explored in the Methodology chapter of this thesis. Many search engines and community tools analyzing the blogosphere assume a direct correlation between a blog’s popularity and its innovativeness. I believe this assumption is incorrect. Their thinking is very simple: if a specific blog has a high number of inbound links and therefore a sizeable readership, it must be in the front line of finding and publishing original information. The HP Information Dynamics Lab study on the “Implicit Structure and the Dynamics of Blogspace” (Eytan Adar et al) showed exactly the opposite. The study demonstrated that popular blogs are rarely among the first ones to start a specific trend. Many popular blogs claim most of their “discoveries” by not citing their original sources, which are usually smaller, unfamiliar blogs. The level of popularity of each blog might be directly related to its scale of influence, but not necessarily to its level of innovativeness. So who are these unknown bloggers that bring fresh ideas to the blogspace? Who are these innovators or trendsetters? Blogviz will expose these anonymous sources, which are crucial in the dynamics of topic diffusion. > Rate of adoption – The rate of adoption describes how fast an innovation is adopted by members of a social system in a given time period. When mapping the cumulative adoption time path or temporal pattern of a diffusion process, the resulting distribution can generally be described as taking the form of an S-shaped (sigmoid) curve. Time and cumulative adoption (or infected population) are the plot’s two main axes.
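As a rough illustration of the two curves discussed above, the following Python sketch assigns adopter categories and builds a cumulative adoption series. It assumes the cut-offs commonly attributed to Rogers (one and two standard deviations around the mean adoption time); the thesis does not state how Blogviz classifies blogs, so the function names, the integer day values and the sample data are all hypothetical.

```python
from itertools import accumulate
from statistics import mean, pstdev


def adopter_category(t, mu, sigma):
    """Classify an adoption time t against the mean mu and std. dev. sigma.

    Cut-offs follow the usual reading of Rogers' bell curve (an assumption
    here, not something specified in the thesis).
    """
    if t < mu - 2 * sigma:
        return "innovator"
    if t < mu - sigma:
        return "early adopter"
    if t < mu:
        return "early majority"
    if t < mu + sigma:
        return "late majority"
    return "laggard"


def cumulative_adoption(times, horizon):
    """Cumulative number of adopters for each step 0..horizon."""
    per_step = [0] * (horizon + 1)
    for t in times:
        if 0 <= t <= horizon:  # ignore adoptions outside the window
            per_step[t] += 1
    return list(accumulate(per_step))


# Hypothetical days on which four blogs first cited the same URL.
adoption_days = [0, 2, 2, 3]
mu, sigma = mean(adoption_days), pstdev(adoption_days)
categories = [adopter_category(t, mu, sigma) for t in adoption_days]
curve = cumulative_adoption(adoption_days, horizon=4)
```

With real data, plotting `curve` against time would be expected to approximate the S-shaped pattern, while the per-step counts give the bell-shaped incidence curve.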
  13. 13. 1.3.4 The Social System The fourth main element in the diffusion of new ideas is the social system, which basically creates a boundary within which the diffusion and adoption of an innovation occur. A social system is defined as a set of interrelated units that are engaged in joint problem-solving to accomplish a common goal (Everett M. Rogers). The members or units of a social system may be individuals, informal groups, organizations, and/or subsystems. With regard to the replication of top links among weblogs, the social system is undoubtedly the blogosphere, depicted as a fertile network of endless social communities. This vast communication network consists of interconnected individuals (bloggers) who are linked by shared interests and patterned flows of information. At first glance, considering the highly interconnected web of links, connections and shared interests among bloggers, it might seem easy to understand the adoption process of a particular unit of information or innovation. However, another crucial conclusion of the HP Information Dynamics Lab study mentioned before was that “for URLs appearing on at least 2 blogs, 77% of blogs do not have a direct link to another blog mentioning the URL earlier. For those URL’s present on at least 10 blogs, 70% are not attributable to direct links”. There have been several studies on how the system’s social structure, and its norms or established behavior patterns, affect the diffusion of innovations within a particular social system. But another area of research that is closely linked to Blogviz relates to opinion leadership, which can be described as the degree to which an individual is able to informally influence other individuals' attitudes or explicit behavior in a desired way with relative frequency. Blogviz allows a broad understanding of opinion leadership in blogspace by tracking and exposing the most influential and innovative topic proliferators.
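The kind of measurement behind the quoted HP figures can be sketched in Python: for one URL, count the blogs that cite it without linking directly to any blog that cited it earlier. This is only an approximation of the idea under stated assumptions; the data structures, the function name `unattributed_fraction` and the decision to count the very first citing blog as unattributed are hypothetical and may differ from the study's actual method.

```python
def unattributed_fraction(first_mention, links_to):
    """Fraction of citing blogs with no direct link to an earlier citer.

    first_mention: dict mapping blog -> time it first cited the URL.
    links_to: dict mapping blog -> set of blogs it links to (e.g. blogroll).
    Both layouts are assumptions made for this sketch.
    """
    if not first_mention:
        return 0.0
    unattributed = 0
    for blog, t in first_mention.items():
        # Blogs that cited the same URL strictly earlier than this one.
        earlier = {b for b, tb in first_mention.items() if tb < t}
        if not (links_to.get(blog, set()) & earlier):
            unattributed += 1
    return unattributed / len(first_mention)


# Made-up example: four blogs citing one URL at times 1..4.
first_mention = {"a": 1, "b": 2, "c": 3, "d": 4}
links_to = {"b": {"a"}, "c": {"x"}, "d": {"b", "c"}}
share = unattributed_fraction(first_mention, links_to)
```

In this toy data, blogs "a" and "c" have no link to an earlier citer, so half of the adoptions cannot be explained by direct links.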
  14. 14. 1.4 Epidemiology Throughout this thesis I use the terms contamination and infection several times when describing the adoption process of memes. Even though this practice might lead to unwanted interpretations, its use is not arbitrary, and it actually facilitates the comprehension of information diffusion dynamics. Epidemiology in its broadest sense is the study of disease patterns in human populations (Wikipedia). Epidemiology can also be described as the study of the determinants, occurrence, and distribution of health and disease in a defined population. Infection is the replication of organisms in host tissue, which may cause disease. A carrier is an individual with no overt disease who harbors infectious organisms. And the notion of dissemination is understood as the spread of the organism in the environment. In the above description, regardless of the different terms, we start noticing several similarities with the domain of diffusion of innovations. This analogy is even more explicit when characterizing the three major elements in disease occurrence, the so-called chain of infection: (1) The etiologic agent (parallel to the innovation) (2) The method of transmission (parallel to the communication channel) (3) The host (parallel to a unit of a social system) Further along in characterizing the evolution of a disease, the epidemiologic descriptive study organizes data by time, place and person. It is unquestionably the closest approach to the concept of Information Diffusion. It divides the element of Time into four main trends: secular trends, periodic trends, seasonal trends and epidemics. What’s interesting in this typology of Time is that it applies equally well to the evolution of top links across the blogosphere. Because of that I assume a series of parallels between them. The secular trend describes the occurrence of disease over a prolonged period.
This continual development is less common than the seasonal trend in the context of blogspace. This trend usually describes commercial or very popular websites that never entirely lose the bloggers’ interest and as a result have a continuous existence among them.
  15. 15. The periodic trend basically expresses a temporary modification in the overall secular trend. It conveys a sudden new interest in a specific meme that is part of a continual trend. The seasonal trend reflects seasonal changes in disease occurrence following changes in environmental conditions that enhance the ability of the agent to replicate or be transmitted. This short transitory trend is the most common in blogspace. It describes a new meme that spreads quickly and then rapidly loses interest, dying in a short period of time. The epidemic incidence of a disease generally happens when it surpasses a threshold of 7% of the target population. An epidemic is a sudden boost in occurrence due to prevalent factors that support transmission. An information epidemic in blogspace might give rise to a tipping point, where a specific meme escalates and leaps out of the blogspace, reaching the mainstream.
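The 7% threshold mentioned above can be turned into a simple check on a cumulative adoption series. The Python sketch below is only an illustration of that idea: the function name `epidemic_onset`, the choice of the whole tracked population as the denominator and the sample numbers are assumptions, not part of Blogviz.

```python
def epidemic_onset(cumulative_adopters, population, threshold=0.07):
    """Return the first step at which the adopting share exceeds threshold.

    cumulative_adopters: cumulative adopter counts per step (e.g. per day).
    population: size of the target population (assumed known and fixed).
    Returns None if the threshold is never surpassed.
    """
    for step, adopters in enumerate(cumulative_adopters):
        if adopters / population > threshold:
            return step
    return None


# Hypothetical series for a population of 100 blogs.
onset = epidemic_onset([1, 3, 7, 8, 20], population=100)
```

Here a share of exactly 7% does not count as surpassing the threshold, so the onset is the step where the count reaches 8.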
  16. 16. 2 Impetus The main source of motivation for my thesis development is a solid cooperation between Information Diffusion, Information Architecture, Data Visualization, and the Science of Complex Networks. My curiosity about Information Architecture was initially fostered in Christopher Kirwan’s MFADT class in the Spring of 2004, and since then it has become a major subject of interest and awareness. I remember observing for the first time a diagram with four interconnected circles representing the continuous Understanding Spectrum. Data originates information, which leads to knowledge and ultimately to wisdom. This concept influenced my vision and made me reflect on the responsibility I had, as a designer, to contribute to this spectrum. (Figure: The Understanding Spectrum, Nathan Shedroff) We may have access to an abundance of information but I strongly believe we lack the ability to process it effectively. In the face of contemporary technological accomplishments, our ability to generate and acquire data has by far outpaced our ability to make sense of it. Neither raw data nor scattered information offers any level of meaningful understanding. This is where Information Architecture and Information Visualization undertake an important mission. If we are truly entering a fourth phase of humankind, a theory defended by a large number of anthropologists and sociologists, then Information
  17. 17. Architecture is going to be a golden key in the process. In a world increasingly driven by information, it rapidly assumes the form of power, and typifies society in terms of those who own it and those who don’t. Meaningful information is not a given, and particularly now, when our cultural artifacts are being measured in gigabytes and terabytes, organizing, sorting and displaying information in an efficient way is crucial for intelligence, knowledge and wisdom. In the Spring 2004 semester I was involved in two projects that were decisive in delineating my thesis domain of interest and in increasing my alertness towards Information Architecture and Information Visualization. The first one was a group project developed in the Information Architecture class, taught by Christopher Kirwan. Self-Replicating Cloners was a project aimed at producing visualizations of viruses, their progression through time and their world-scale dissemination. Two viruses were analyzed by comparison, SARS and MyDoom, each one representing its underlying field, human biology and computer technology. (Figure: Self-Replicating Cloners. Visualizations of viruses (biological/computer generated), their progression through time and world-scale dissemination)
  18. 18. The second point of awareness was a group project developed in a collaboration studio with Siemens Corporate Research Center. Aimed at Siemens Medical, DSS – Disease Surveillance System was a visualization and communication tool that shared symptomatological data between hospitals and health care professionals for detecting possible disease outbreaks and recognizing development patterns nationwide. (Figure: DSS – Disease Surveillance System) After these two particular experiences, I started my summer research with some clear interests in mind, but still scattered through distinct areas such as artificial life, virology, cognitive science, genetics, cyber biology, epidemiology, and pattern recognition. Emergence, by Steven Johnson, was the first book I read in my research and it was a surprising start. The paradigm of Emergence, which can be described as a “higher-level pattern arising out of parallel complex interactions between local agents”, was slowly flooding my mind with bright new discoveries. With increased motivation, I gradually started abandoning some initial ideas and, in other cases, finding common links between them, under the sciences of complexity and self-organization. The search for answers on how order can emerge from disorder, and organization from chaos, guided me to initiate a study of the individual parameters of emergent systems, such as collective/macro behavior, self-organizing communities and bottom-up hierarchy. This research led me inevitably to complex systems. Delving into this new area was even more thrilling. Finding each day a common structure in apparently distinct fields, or similarities between natural systems and human designs, was beyond doubt overwhelming. From that point on, I became extremely fascinated with the omnipresent
  19. 19. web of signals and interactions, nodes and links that shape modern complex networks, from social networks to corporations, cities, living organisms and the Internet. Complexity is a challenge by itself. Complex Networks are everywhere. Networks are a structural and organizational principle that reaches almost every field we can think of, from genes to power systems, from food webs to market shares. Paraphrasing Albert Barabasi, one of the leading researchers in this area, “the mystery of life begins with the intricate web of interactions, integrating the millions of molecules within each organism”. Humans, from birth, experience the effect of networks every day, from large complex systems like transportation routes and communication networks, to less conscious interactions, common in social networks. The Scale-Free network, the most common topology in either natural or human systems, is, curiously enough, a very recent breakthrough. Since its discovery, 6 years ago, dozens of researchers worldwide have been disentangling the networks around us at an amazing rate. This awareness is helping us understand not only the world around us but also the most intricate web of interactions that shape the human body. The global effort of constructing a general theory of complexity is tremendous and may lead us, not only to a structural understanding of networks, but to major improvements in the stability, robustness and security of most complex systems around the globe. As Barabasi puts it in Linked, “Once we stumble across the right vision of complexity, it will take little to bring it to fruition. When that will happen is one of the mysteries that keeps many of us going”. The feature that has always fascinated me the most in complex networks is the dynamics of Dissemination Patterns. The visualization of the path, and inherent duration, of a certain fad, idea, or virus in a social/biological or computer network has been, since the beginning, a critical point of awareness.
How does a particular contagion travel from point A to B, which nodes does it affect in its course, and how fast does it contaminate a large cluster or the entire network?
  20. 20. 2.1 Subject of Analysis After my summer research presentation, in the beginning of the Fall 2004 semester, where I showed all the collected knowledge in the domain of complex networks, I went even further on observing and collecting dozens of network visualization examples and trying several open-source applications. This investigation resulted on my second official presentation. Part of this research also coincided with the work I was developing as a design researcher at Parsons Institute of Information Mapping (PIIM). For additional information on this study please consult section 6.2 of chapter 6 – Methodology. After the second official presentation I was sure of two things: 1 – I wanted to continue my visual explorations exercise, by gathering problems and inconsistencies in complex network diagrams and proposing plausible solutions. 2 – I wanted to map a dissemination pattern in a specific network. By doing that, I intended, not only to be innovative and bring something new to the field, but also display a ‘showcase’ of my visual thinking in terms of complex networks visualization. The first objective was well defined, and best of all, already under development. The major problem was finding a solution for the second point. I had to hit upon a subject that represented all the research and knowledge I had gathered through the summer and the beginning of the Fall 2004 semester. Finding an answer to this quest seemed an impossible task, due to the vagueness of possible directions. At a certain point it was as if I had came back to the start, with the fearful blankness of June assaulting my mind once again. Time was urging and I knew whatever subject I chose, I was still facing an enormous workload ahead of me. The first thing I decided was to go back to my initial interest, the main cause that led me in this escalating exploration of complex networks. 
I quickly rediscovered my early motivations: virus dissemination and the relationships between social/biological and computer/technological systems. One thing I discovered in my summer research is that ideas, fads, trends and innovations show dissemination patterns in social networks similar to those of viruses. The concept of word-of-mouth is a fascinating diffusion behavior that has always intrigued psychologists, sociologists, anthropologists, and lately marketers. To be able to map a word-of-mouth epidemic in a specific social network is a blue-sky scenario. And that might be true, in relation to physical interactions in a physical world between physical
  21. 21. individuals. However, a flourishing movement on the Internet presents an interesting experimental laboratory to explore this behavior. Blogging embodies an incredible case of word-of-mouth, where news, ideas and fads travel through community clusters with high adoption rates. Because of their inherent nature, blogs became my ultimate fixation and the main framework for my Thesis. Their high interconnectivity and shared flow of information represent not only an obvious case study of meme propagation, but an outstanding example of a dissemination pattern in an increasingly complex network, estimated at over 8 million nodes. As an example, I’ll mention a topic that emerged from the blog community in the beginning of October 2004. During the first presidential debate of the 2004 US elections, on September 30, 2004, between President George W. Bush and Senator John Kerry, there was an episode that got the attention of a particular viewer. “You forgot Poland” was the abrupt statement made by George W. Bush while John Kerry was enumerating the allied forces present in the Iraq War. The presidential debate occurred on a Thursday evening, September 30, and by the following Monday night there was already a topic sharing 12 links among bloggers. This topic pointed to a specific URL – http://www.youforgotpoland.com. By that time, only four days after the debate, someone had already created a domain (youforgotpoland.com) and was selling t-shirts and stickers online with the same sentence. A new meme had been born and in a short period of time had “infected” several people. This intriguing example reveals the accelerating rate of information flow among bloggers and how fast it spreads through, or “contaminates”, online blog communities. Another issue raised by this example is the possibility of tracking a potential outbreak. Imagine this topic reaching the mainstream a week later, possibly a major newspaper or a particular TV show.
How interesting would it be to actually go back in time and discover where this outbreak first originated, the way it was adopted and how fast it grew? These last two queries have undoubtedly become a crucial motivation for the development of my thesis. Quoting Duncan Watts, in regard to the mechanics of social networks: “To understand the pattern, we need to delve further into the rules by which individuals make decisions, and how, in the process, our apparently independent choices become inextricably bound together.”
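The three questions above (where an outbreak originated, how it was adopted, and how fast it grew) can be illustrated with a minimal sketch. It assumes each citation of a URL is available as a (blog, timestamp) pair; the blog names and timestamps below are invented for illustration and are not Blogviz data, and `trace_outbreak` is a hypothetical helper, not part of the thesis project.

```python
from datetime import datetime

# Hypothetical citation records: (blog name, time the blog first linked to the URL).
# These values are made up for illustration only.
citations = [
    ("blog_c", datetime(2004, 10, 2, 9, 30)),
    ("blog_a", datetime(2004, 10, 1, 8, 15)),
    ("blog_d", datetime(2004, 10, 3, 22, 5)),
    ("blog_b", datetime(2004, 10, 1, 19, 40)),
]


def trace_outbreak(records):
    """Return the earliest citing blog, the adoption order, and cumulative citations per day."""
    ordered = sorted(records, key=lambda record: record[1])
    origin = ordered[0][0]
    adoption_order = [blog for blog, _ in ordered]
    growth = {}
    total = 0
    for _, when in ordered:
        total += 1
        # Later citations on the same day overwrite the running total for that day.
        growth[when.date().isoformat()] = total
    return origin, adoption_order, growth


origin, adoption_order, growth = trace_outbreak(citations)
print(origin)
print(adoption_order)
print(growth)
```

Sorting by timestamp is the simplest possible reading of "going back in time"; it says nothing about who actually influenced whom, which would require link or referral data between the blogs.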
  22. 22. 3 Context The contextual narrowing of my thesis proposal starts in the broad area of Complex Networks, tightens its limits to Social Networks and ends at its ultimate contextual boundary, Online Social Communities. Even though this Thesis proposition places itself at the center of a broad group of domains, I decided to explore in depth its closest and most direct domain – Online Social Communities, and the main subject of analysis – Blogs. Nevertheless, besides the omnipresent field of complex networks, the context of this thesis incorporates the domains of Information Diffusion, Memetics, Information Architecture, Data Visualization, Information Theory, Diffusion of Innovations, Epidemiology and Small Worlds. 3.1 Online Social Communities Online Social Communities, although a much narrower field than the Science of Complex Networks, is still a wide-ranging one that can include almost every type of online interpersonal communication medium, from e-mail listings/threads to Usenet groups, MUDs, chat environments, instant messaging, community forums, weblogs, online gaming and interest groups, among others. Online Communities offer an interesting change in the parameters that until now have defined social interaction. Several years after Milgram’s famous small-world test, Russell Bernard and Peter Killworth did what they called a “reverse small-world experiment”. They interviewed hundreds of individuals, explaining Milgram’s experiment and asking them what personal criteria they would use to get a specific package to someone they didn’t know. Bernard and Killworth’s study found that most of the subjects used only a couple of dimensions to get their message sent to the next recipient. The most predominant dimensions were geography and occupation. Jon Kleinberg, a computer scientist who attended Cornell and MIT, was also motivated by Milgram’s small-world study, and questioned how the individuals actually found the paths within the network.
Kleinberg concluded that people generally have a strong sense of distance, which they use to distinguish themselves from others. A notion of
  23. 23. distance can involve several factors, of which geographical distance is just one. Profession, race, religion, income, class and education are other elements added to the equation that describe how distant a specific person is from us. From the beginning of human existence, communities were created for the benefit of their own members. Usually for reasons of expediency, either the exchange of goods or improved security against enemies, these groups of people arose as emergent systems of social convenience. Geography always played an essential role, and without a common shared space most of these communities wouldn’t even exist. With the later development of mail and, more recently, telephone, telex, and fax, human communication was greatly enhanced and the influence of geography started to diminish. However, these new “technologies” only improved the way people communicated with each other, by giving them more tools and decreasing the time span and subsequently the distance; other than that, there were no major changes in the way social communities were formed. No matter how fast and easy it became for someone in Europe to talk with someone in America or China, there were never communities created on the basis of telephone calls. If we explore the word structure of most communication tools prior to the Internet, such as telegraph, telex, telegram, and telephone, we encounter the constant presence of the prefix tele-. Tele is a Greek word that means “at a distance”, usually implying “to be distant” or “over a distance”. The first use of the prefix tele was in the word telescope, which was actually adapted from Galileo’s Italian word telescopio, followed by the word telegraph, meaning “writing at a distance”. Therefore, Telecommunications is the field that embodies all the systems that intend to communicate “at a distance” or “over a distance”.
Once again we see the importance of geography as a crucial domain for human communication, where the advancement of technology has, since the beginning, been trying to diminish its constraints by allowing people to communicate over an ever-present and disturbing distance. I find this analysis particularly interesting because the Internet, and all features associated with it, has completely abandoned the prefix tele-, drastically assuming the medium, and replaced it with the prefix e-. From e-mail to e-commerce and e-business, the prefix e- is usually associated with the latest wave of technological revolution, an abbreviation of the word electronic and an obvious association with the word cyber.
  24. 24. The advent of the Internet and the World Wide Web changed these centuries-old communal constraints, possibly forever. The Internet became not just a medium for social gathering and communication, but it absorbed it, and the medium became truly the message. The transmission of information on the Internet is regularly measured in milliseconds, and the time it usually takes for a message to leave a computer in Tokyo and arrive at a computer in New York is more or less the same as for a message sent to you from your next-door neighbor. The difference is merely a few milliseconds, which is by itself a measurement difficult to perceive. Geography, as a crucial criterion for the birth of social communities, has been utterly disregarded by online social communities. Without the limitations of geography and physical interaction and identification, online communities had to rely on a more abstract, but equally distinguishing, criterion: interests. By analyzing most current online communities, from online players to chat rooms, blogs and newsgroups, we find that in the absence of physical recognition, social values like trust, confidence, respect and even friendship are ultimately based on a set of shared interests. And of course, this “virtual” interaction would not be possible without specific communication channels, portrayed as technological sub-systems of the larger medium, the Internet. Personal interests are a central element of our social identity and, subsequently, a highly considered factor in relationships. Paraphrasing Duncan Watts in regard to peer-to-peer networks, “social identity is what leads networks to be searchable”. The fabulous aspect of online communities is the possibility of not only searching these clusters of shared interests, but also tracking the exchange of conversations, ideas and messages between them. By analyzing this data, it’s possible to understand, to some extent, how information travels through these virtual environments.
Weblogs, in this context, represent units of a remarkable social laboratory. Not only is it relatively easy to track their connectivity but, due to their highly clustered nature, it’s also possible to examine, in specific communities, how news and trends travel through individual bloggers.
  25. 25. 3.2 Weblogs Weblogs (alternate: blogs) are not just a new fad among Internet users, and they are much more than a collection of online digital diaries of scattered interest groups. Blogs represent a change in online information flow and they are becoming a rising news source for many people. We might not even be aware of how influential blogs will be in the future, but one thing is sure: there are currently blogs with close to half a million visitors a day, more than many large newspapers, magazines and news broadcasters. Jorn Barger coined the term in 1997 and in 1999 Peter Merholz coined its alternative abbreviation “blog”. As Jorn Barger stated: “Weblogs are often-updated sites that point to articles elsewhere on the web, often with comments, and to on-site articles. A weblog is kind of a continual tour, with a human guide [whom] you get to know. There are many guides to choose from and each develops an audience. There's camaraderie and politics between the people who run weblogs. They point to each other in all kinds of structures, graphs, loops, etc.” The most common definition of a blog is that of an online diary of thoughts, links, events, or actions posted on a web page with a dated log format. These posts are often, but not necessarily, in reverse chronological order, and are updated on a daily or very frequent basis with new information about a particular subject or range of subjects. Despite this dry classification, the usefulness of a weblog is incredibly rich. Blogs are the vital elements of the personal publishing revolution. If we go back a few years, before the rise of online publishing, the only way someone could write something for the general public would be through a letter to the editor, hoping for their message to be published in the magazine’s next issue.
For the first time in the history of human communication, any single person has the opportunity to reach millions with their message, as the cliché proclaims, with “the touch of a button”. Instead of being passive consumers of information, Internet users are becoming active participants. Whether this power to the people is a positive trend is debatable, since many people consider that it merely adds to the existing “junk” flowing on the Web. Since most blogs aren’t subject to any kind of editorial process or peer review and sometimes “play” with anonymity, their public posts also raise legal concerns about intellectual property, defamation, and the like.
  26. 26. Controversies aside, blogs, like the World Wide Web, are free democratic resources that embody the concept of free speech, which is unquestionably a right for all. Blogs also exemplify the true concept of diversity. Besides being open to whoever might use this personal tool, blog content is as varied as the Web itself. The authors of Essential Blogging explain this diversity by pointing out that “creating a taxonomy of the blogiverse is a fruitless task”, since “there’s no good, central directory of blogs that puts each one in its own pigeonhole, because even the most topical blogger will stray from the subject from time to time to celebrate some personal victory or warn his readers off a terrible movie”. One might also argue that in fact, this personal publishing revolution started with the first website, and consequently with the birth of the Internet. This is obviously true; however, until the first blog publishing tools became available, anyone who wanted to circulate their own ideas online had to be fluent in HTML and web hosting, and aware of most web design applications available. Even after GeoCities’ launch in 1996, offering free web hosting to non-commercial personal pages, web pioneers had to be HTML-savvy people who would spend their evenings working on their websites. Also, the few personal webpages that started populating the Web in the mid 90’s were just a scattered collection of isolated opinions, with no regular updates and unconnected from each other. The big blog phenomenon started escalating in the summer of 1999, when a small web company called Pyra Labs released a product called Blogger. From that point on the blog community exploded, and the more bloggers came onto the scene, the more online blog tools became available. This was the beginning of the personal publishing revolution. The inclination towards personalization is reaching every industry, from clothing to cars, from software to medicine.
News and Information are just new elements added to the equation. In my opinion, many blogs are so successful because of two major factors: personalization and comforting lassitude. Blogs are usually maintained by a single person who filters the huge amount of available information according to his/her own preferences. For people who share common interests with the blogger, it’s not only exciting to get information from that source, since it’s going to match their inclination to some degree, but it also saves them a lot of time by avoiding the large, more abstract, and sometimes incongruent, news sources. In countries such as the US, where large media sources are becoming increasingly dry and biased, blogs might also represent an oasis of independent information.
  27. 27. 3.3 Blogosphere Blogosphere (alternate: blogsphere), or blogspace, is the collective term encompassing all weblogs (alternate: blogs). It’s almost impossible to determine with precision the existing number of weblogs, or even the ones currently active. Technorati is a leading search engine for the blogosphere, similar to Google or Yahoo, but exclusive to blogs. Technorati, as of February 2005, was tracking 7,245,866 blogs, and this number is far from stagnating. Out of curiosity, when reviewing this paper on April 6, 2005, I checked Technorati to see how the latest number had changed. To my not-so-surprised amazement, Technorati reported tracking 8,469,023 weblogs. This translates into an increase of more than 1 million blogs in less than two months. The latest Pew Internet study estimates that about 27%, or about 32 million, of American Internet users are regular blog readers. They say a new weblog is created every 2.2 seconds, which means there are about 38,000 new weblogs a day. Bloggers update their blogs regularly; there are about 500,000 posts daily, or about 5.8 posts per second. When we’re faced with a number of blogs higher than eight million (at least), it becomes hard to consider the whole as a single community. The blogosphere, in analogy to its medium, the Internet, does not represent a single community but a vast collection of endless communities. These communities shape a complex web of more than 8 million nodes and are key factors in the outburst and further development of trends, fads and innovations. Also, due to its inherent diversity, any kind of classification regarding the blogosphere is a mere exercise in oversimplification.
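The growth figures quoted above can be checked with a few lines of arithmetic. This is only a sanity check on the numbers as stated in the text; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Technorati counts quoted in the text.
blogs_feb_2005 = 7_245_866
blogs_apr_2005 = 8_469_023
increase = blogs_apr_2005 - blogs_feb_2005

# One new weblog every 2.2 seconds, expressed per day.
new_blogs_per_day = SECONDS_PER_DAY / 2.2

# About 500,000 posts per day, expressed per second.
posts_per_second = 500_000 / SECONDS_PER_DAY

print(increase)
print(round(new_blogs_per_day))
print(round(posts_per_second, 1))
```

The increase comes to a little over 1.2 million blogs and the posting rate to about 5.8 per second, both matching the text. A rate of one blog every 2.2 seconds works out to roughly 39,000 a day, slightly above the quoted 38,000; the gap is presumably due to rounding in the source figures.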
  28. 28. 4 Audience Scientists/Researchers on Complex Networks Hopefully, Blogviz will offer a significant step in this long scientific journey towards the understanding of the dynamics of complex networks. To all researchers, academics, and scientists that have been persistently and bravely disentangling the networks around us, I truly hope this model can produce one important footprint in this expedition. It doesn’t have to be gigantic, just one step forward. By bringing my visual expertise and interest in Information Architecture, Data Visualization and Interface Design, I expect to make a small corner of the vast Science of Complex Networks clearer and more understandable. This corner embodies the domain of Online Social Communities and the phenomenon of blogging. Sociologists Professionals, Researchers, Faculty and Students. Blogviz will offer an interesting case study for analyzing a dynamic, ever-changing and complex online social network – the Blogosphere. To map the spread of word-of-mouth in social communication has been, until now, an almost fruitless task. Blogs, on the other hand, offer an engaging experimental laboratory to better study and understand this occurrence. Memetics is an expanding field of study in the social sciences, which is being explored by a significant number of researchers. Blogviz, by making a parallel between meme propagation and topic diffusion in blogspace, aims to make an important contribution to the understanding of Memetics. Information Architects and Data Visualization enthusiasts Professionals, Researchers, Faculty and Students. I hope that my passion and fascination for the field of Information Architecture and Data Visualization will be reflected in my thesis project. I sincerely hope that Blogviz can be a relevant precedent in some of your projects, deserve a mention in your research, and inspire or influence you at some level.
  29. 29. Cultural Critics Blogging presents one of the most intriguing and captivating phenomena of our time. We might be in for a long ride in the adulteration of most publishing media conglomerates. We cannot really predict the ultimate result of this major drift in the flow of online information, but one thing is sure: it has already started. Blogviz will offer an enhanced insight into the mechanics of this contemporary revolution. Marketers Possibly, the only open door to an eventual commercial viability for the application is based on its relevance for the Marketing industry. Even though Blogviz is a non-commercial research project, it is reassuring to know that it’s potentially useful outside the research and academic realms. Like sociologists, marketers have become more and more interested in word-of-mouth behavior, even though the more traditional marketing strategists have barely explored this concept. In the blog community, most bloggers are incorporating the idea of syndication in their blogs, in the form of an XML data file, called RSS, which is basically a list of post summaries and links to them. These files can then be interpreted by a desktop application called an RSS Aggregator, and read by the user without the need to access the specific website. Some consider RSS to be the future of news distribution, and that might well be the case, which explains why, as in any communication medium, advertisement is now starting to infiltrate RSS Feeds. The potential use of Blogviz in this respect is huge. Marketers interested in investing in the best RSS blog sources for advertisement could easily track the most viewed blogs, locate the innovators, the followers and the major dispatchers of information, and then explore the conclusions accordingly. Bloggers Blogviz is a visualization model built to better understand the information dynamics within the blog community.
Accordingly, any interested blogger who feels the need to comprehend the underlying network that they are part of is a potential user of my research project.
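The description of RSS above, as an XML file listing post summaries and links, can be made concrete with a minimal sketch of what an aggregator does when it reads a feed. It uses Python's standard `xml.etree.ElementTree` module; the feed content, the `example.com` URLs and the `read_items` helper are all invented for illustration, and only the basic RSS 2.0 elements (`channel`, `item`, `title`, `link`, `description`) are assumed.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed, embedded as a string so the sketch is self-contained.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return the title, link and summary of every item in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items


items = read_items(SAMPLE_FEED)
for entry in items:
    print(entry["title"], "->", entry["link"])
```

A real aggregator would fetch the feed over HTTP and cope with other formats and malformed XML; the point here is only that the links a blog publishes are available in a structured, machine-readable form.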
  30. 30. 5 Precedents The chain of influences and inspiration for my thesis project is, as expected, extremely widespread and ranges from new media art, information architecture, data visualization, complex networks and interface design to so many other fields, and life in general. Even if I started enumerating the key thinkers whose work I admire and respect, and subsequently absorbed for myself, I expect many names would still be missing from the extensive list of people. In enunciating the key precedents for my thesis, I concentrated exclusively on projects developed in the area of Online Social Communities, my closest encircling thesis domain. Since the major goal of my thesis is to visually map a specific diffusion pattern and the connectivity among blog communities, I decided to establish as precedents projects that make extensive use of a visual structure to portray their field of research. 5.1 Blog Epidemic Analyzer Authors: Eytan Adar, Li Zhang, Lada Adamic, Rajan Lukose Institution: HP Information Dynamics Lab URL: http://www.hpl.hp.com/research/idl/papers/blogs/index.html Description: HP Information Dynamics Lab created the Blog Epidemic Analyzer as part of their research on information propagation. They released their paper “Implicit Structure and the Dynamics of Blogspace” as a result of this research. Eytan Adar, Li Zhang, Lada Adamic, and Rajan Lukose used the search engine BlogPulse to map the behavior of the blog community from May 11 to May 21, 2003. Relevance: This project is the closest to my thesis ambition and it obtained exciting results that became pertinent in selecting specific parameters for my work. Although highly useful as a research project, their few attempts at visualization were extremely poor. Their major breakthrough was announcing that the most popular blogs are not the most innovative, commonly “stealing” news and information from smaller, less-known blog sources.
I believe it’s a very significant claim that decisively influences the way we understand the mechanics of blog communities.
  31. 31. 5.2 Loom2 Authors: Danah Boyd, Hyun-Yeul Lee, Ethan Perry Institution: Sociable Media Group - MIT Media Lab URL: http://smg.media.mit.edu/projects/loom2/ Author’s Description: “The goal of our research is to use the salient features of social interaction to build a ‘legible’ interactive visual representation of Usenet. We started by exploring the Usenet environment, constructing a series of relevant questions. From the questions, we have started to explore how this information can be derived from the textual data available online. Simultaneously, we have started designing segments of visualization, under the assumption that the desired characteristics were ascertainable.” Relevance: This project is a major aesthetic inspiration. I believe the use they make of a radial structure fits the purpose of the project quite well, where specific degrees relate to a time dimension and nodes’ colors to specific theme categories. Usenet represents a subject of analysis closely related to blogging, since message/post threads in newsgroups have a pattern of contamination similar to that of topics across the blogosphere. Given their appealing visual models, the amount of work they had to undertake is not surprising: “To build our designs, we drew on a wide variety of theoretical and practical concepts from a range of fields, including graphic and interactive design, architecture, sociology, and computer animation.”
  32. 32. 5.3 Social Network Fragments Authors: Danah Boyd, Jeff Potter Institution: Sociable Media Group - MIT Media Lab URL: http://smg.media.mit.edu/projects/SocialNetworkFragments/index.html Description: “Social Network Fragments was developed as a self-awareness tool for individuals to explore the social networks that they create without structural consideration”. Its goal was to “help users examine their structure so as to unveil the structural holes that are built in such complex networks. These structural holes exist when users choose to fragment portions of their network, often revealing facets of their own identity. As an individual interacts with a diverse range of people, they are motivated to reveal different aspects of their identity, thereby creating a multi-faceted social identity, whereby different people know different things about the individual. In engaging in this behavior, individuals start to segment their social network into a variety of different clusters, or types of people.” Relevance: The visualization of social networks takes a major leap in many of the projects produced by the Sociable Media Group (SMG) at MIT Media Lab. With some amazing visual displays the SMG “investigates issues concerning society and identity in the networked world”, addressing questions such as “How do we perceive other people on-line? What does a virtual crowd look like? How do social conventions develop in the networked world?”. Social Network Fragments aims at something as extraordinary as mapping someone’s unnoticed social network. Although it may seem simple and intuitive to track any individual’s connections to others, this project tries to reach further than the immediate first-degree acquaintances, by reaching a friend-of-a-friend network.
  33. 33. This approach to small world theory has been pursued by some companies, which sell products focusing on social networking management. The idea is simple: don’t just get to the people you know, get to the people they know. Manage your friend-of-a-friend network in order to find the shortest path to whatever you’re looking for. Among the leading companies incorporating this concept are: Spoke Software, Visible Path, SRD and In-Q-Tel. Social Network Fragments offers a reasonable visual solution, where I believe some improvements could be implemented. By basing the visual criteria solely on text, color and depth (simulated 3rd dimension), the interface becomes somewhat limited in its ability to fully explore its content. 5.4 PostHistory Author: Fernanda Viégas Institution: Sociable Media Group - MIT Media Lab URL: http://web.media.mit.edu/~fviegas/posthistory/ Author’s Description: “Most of us deal with email on an everyday basis and some of us have been doing so for several years. Nevertheless, it is hard to perceive the accumulation of this frantic activity, it is hard to get a sense of the number of messages sent and received, not to mention how difficult it is keeping track of how many people have written to you or received messages from you. The aim is to provide users with a novel and hopefully richer experience of their email activities. PostHistory represents an opportunity for reflection and insightful monitoring of fundamental patterns of interactivity. The visualization aims at impressing on the user a sense of daily accumulation, of growth and scale – dimensions not normally conveyed on current email applications.”
  34. 34. Relevance: Fernanda Viégas, a Brazilian graduate student at MIT Media Lab, is a prolific new media designer who has been involved in many relevant projects. PostHistory is one of her best. What I find most interesting in this project is the series of new structures and features she proposes in order to better understand the pattern created by e-mail activity. This project is visually innovative and it’s quite an impressive contribution to the field of Information Visualization. Another project conceptually related to PostHistory is Thread Arcs, a fresh interactive visualization technique designed to help people use threads found in email. Thread Arcs, which resulted in a published paper, is a truly interesting visual approach to e-mail threads and even to small-sized graphs. This concept is part of a major e-mail application, ReMail, developed by the Collaborative User Experience team at IBM Research. ReMail has been in development for almost a decade and it aims at improving the knowledge of how people use e-mail, and also at making that experience more functional and straightforward. Some of its features are very encouraging. Figures: Thread Arcs; ReMail (IBM Research).
  35. 35. 5.5 Social Circles Author: Marcos Weskamp URL: http://marumushi.com/apps/socialcircles/ Author’s Description: “Social Circles intends to partially reveal the social networks that emerge in mailing lists. The idea was to visualize in near real-time the social hierarchies and the main subjects they address. When subscribing to a mailing you never know who the principals are, how many people are listening or what subjects they are talking about. It's like entering a meeting room with plenty of people in the darkness and then having to learn who is who by just listening to their voices. Social Circles does not pretend to be a statistical application, but rather aims to raise the lights in that room just enough to let you enhance your perception of what’s happening.” Relevance: Marcos Weskamp is a key thinker in digital information design and a major personal influence. Newsmap, Weskamp’s most famous project, and one of the best online examples of data visualization, gathers Google News headlines and displays them in an innovative treemap in several languages (http://www.marumushi.com/apps/newsmap). In Social Circles, even though Marcos Weskamp doesn’t push the project far from the most common network visualization schemas, its concept is very strong, particularly in a recent version of it, where users can map their own inbox of e-mail messages.
  36. 36. 5.6 WebFan Author: Rebecca Xiong Institution: Sociable Media Group - MIT Media Lab URL: http://www.sbox.tugraz.at/home/k/koebi/WebFan%20Description.htm Author’s Description: “WebFan visualizes user activities at WebBoards, or Web-based message boards, which contain messages posted by users. It uses the reply structure of the messages to lay them out using a fan-like hierarchical structure. This abstract structure allows a large set of Web pages with multiple levels to be represented at the same time for overview and comparison. Users can also interactively explore the fan structure to find out more about individual pages. Dynamic user activity is overlaid on top of this display.” Relevance: “Currently, Web users have little knowledge about the activities of fellow users. They cannot see the flow of on-line crowds or identify centers of on-line activity.” WebFan seeks to enrich this experience by visualizing the activity of other people in the message boards. I believe this is a very relevant project, particularly for the unconventional medium of WebBoards that Rebecca Xiong chose to map. WebFan relates to my thesis project by visualizing overall patterns of usage and answering questions such as: What are people looking at? What is hot? Where do clusters of similar interests form?
  37. 37. 5.7 Visual Who Author: Judith S. Donath Institution: Sociable Media Group - MIT Media Lab URL: http://smg.media.mit.edu/people/Judith/VisualWho/VisualWho.html Author’s Description: “The population of a real-world community creates many visual patterns. Some are patterns of activity: the ebb and flow of rush hour traffic or the swift appearance of umbrellas at the onset of a rain-shower. Others are patterns of affiliation, such as the sea of business suits streaming from a commuter train, or the bright t-shirts and sunglasses of tourists circling a historic site. Visual Who makes these patterns visible. It creates an interactive visualization of the members’ affiliations and animates their arrivals and departures. The visualization uses a spring model. The user chooses groups (for example, subscribers to a mailing-list) to place on the screen as anchor points. The names of the community members are pulled to each anchor by a spring, the strength of which is determined by the individual’s degree of affiliation with the group represented by the anchor”. Relevance: Visual Who, besides offering a motivating contextual precedent in relation to social networks, portrays a compelling method of mapping social connectivity among a set of individuals. It offers an interesting approach to pattern recognition and visualization, although I think it suffers from the same inconsistencies pointed out in the Social Network Fragments project.
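The spring model quoted above can be sketched in a few lines. This is not Visual Who's actual implementation, which is not described here; it assumes the simplest possible case, where each spring is linear with zero rest length and a stiffness equal to the member's degree of affiliation. Under that assumption the resting position of a name is the affiliation-weighted average of the anchor positions. The group names, coordinates and the `equilibrium_position` helper are hypothetical.

```python
def equilibrium_position(anchors, affinities):
    """Resting point of a name pulled toward each anchor by a zero-rest-length spring.

    anchors: group name -> (x, y) screen position of the anchor (hypothetical layout).
    affinities: group name -> spring strength (degree of affiliation, non-negative).
    """
    total = sum(affinities.values())
    if total == 0:
        raise ValueError("member has no affiliation with any anchor")
    # Forces balance where sum(k_i * (anchor_i - p)) = 0, i.e. at the weighted mean.
    x = sum(k * anchors[group][0] for group, k in affinities.items()) / total
    y = sum(k * anchors[group][1] for group, k in affinities.items()) / total
    return (x, y)


# Two anchors placed by the user, and one member three times more
# affiliated with the second group than with the first.
anchors = {"design_list": (0.0, 0.0), "networks_list": (10.0, 0.0)}
member_affinities = {"design_list": 1.0, "networks_list": 3.0}

position = equilibrium_position(anchors, member_affinities)
print(position)
```

Members with similar affiliations end up close together on screen, which is how such a layout makes patterns of affiliation visible; an animated version would move each name toward this resting point step by step rather than placing it there directly.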
  38. 38. 5.8 Avatars 2002 Authors: Katy Börner, William Hazlewood, Sy-Miaw Lin Institution: School of Library and Information Science, Indiana University URL: http://ella.slis.indiana.edu/%7Ekaty/gallery/ Description: This project gave rise to a research paper: “Visualizing the Spatial and Temporal Distribution of User Interaction Data Collected in Three-Dimensional Virtual Worlds”. The project is a visualization of the social patterns in the Culture virtual environment, part of the Quest Atlantis universe. The map shows user trails over time. It was produced using a visualization tool developed by Katy Börner and colleagues at the School of Library and Information Science, Indiana University. Relevance: The particular relevance of this project lies in its visual pattern analysis. I think the underlying concept of being able to visually recognize different user trails in a 3D online game is extremely captivating. In a virtual game, often played with unknown faces, the notions of time and space alter considerably, which makes this project particularly challenging, since it tries to recreate a defined user trail pattern throughout a physically undefined space. 34
  39. 39. 5.9 PeopleGarden Author: Rebecca Xiong Institution: Sociable Media Group - MIT Media Lab URL: http://www.infovis.net/E-zine/num_46.htm Description: PeopleGarden: Creating Data Portraits for Users proposes the “Data Portrait” as a graphical medium for the visualization of information related to individual users of interactive media. The visual metaphor that PeopleGarden uses is of flowers in a garden. Each data portrait is the trace of the user’s activities and takes the shape of a flower. Relevance: “On-line interaction environments such as Web-based message boards, chat rooms, and Usenet newsgroups have become widely popular. As the number of participants rises, it is increasingly difficult to distinguish individual users and to comprehend the overall interaction context.” In PeopleGarden the representation of a vague virtual space reaches its extreme by allowing it to be portrayed as a digital garden. The concept is that flowers represent individuals in a chat room, and the more time a user stays active in a conversation the more its flower can grow and expand. I think this project is conceptually very strong as it presents an innovative visual method for representing a vague unspecified space. 35
  40. 40. 5.10 History Flow Authors: Martin Wattenberg, Fernanda Viégas Institution: IBM Watson Research Center URL: http://researchweb.watson.ibm.com/history/index.htm Author’s Description: “The history flow application charts the evolution of a document as it is edited by many people using a very simple visualization technique. History flow provides answers at a glance to questions like, Has a community contributed to the text or has it been mostly written by a single author? How much has a particular contributor influenced the current version of the document? Is the text's evolution marked by spurts of intense revision activity or does it reflect a smooth transition from its beginning to the present? The current version of history flow visualizes the evolution of pages from Wikipedia”. Relevance: History Flow is truly one of the most significant projects in revealing hidden patterns in a set of data that would otherwise go unnoticed by the user. This ability is undoubtedly one of the key strengths of Information Visualization. Using available data from the Wikipedia website, the authors build an inventive visualization model for analyzing the evolutionary pattern of individual contributions to Wikipedia articles through time. This visualization method has some resemblance to Theme River™, developed by the Pacific Northwest National Laboratory (PNNL), but the number of conclusions history flow was able to facilitate is quite impressive. In a lecture given at Parsons D+T Lab on February 23, 2005, Martin Wattenberg, speaking on this project, mentioned that it takes an average of 2 minutes for any kind of article vandalism to be noticed and repaired. 36
  41. 41. 5.11 Listening Post Authors: Mark Hansen, Ben Rubin URL: http://www.earstudio.com/projects/listeningPost.html Author’s Description: “Listening Post is an art installation that culls text fragments in real time from thousands of unrestricted Internet chat rooms, bulletin boards and other public forums. The texts are read (or sung) by a voice synthesizer, and simultaneously displayed across a suspended grid of more than two hundred small electronic screens.” Relevance: Although the toolset and the medium of this project are quite different from the screen-based interactive application intended for my thesis, I believe this project is an amazing precedent and one of the best installations I have ever seen. Exhibited at the List Visual Arts Center, Cambridge, Mass, and the Whitney Museum of American Art, New York, Listening Post has recently been awarded a prize at the Ars Electronica 2004 Festival. Co-author Ben Rubin emphasizes the motivation for the project: “My starting place was simple curiosity: What do 100,000 people chatting on the Internet sound like?”. The significance of Listening Post is remarkable. It displays short messages, randomly picked from chat rooms according to a specific set of keywords, and it not only gives life to them by placing the messages in a specific spatial configuration, a “suspended grid of more than two hundred small electronic screens”, but also gives them a sound dimension, which makes the experience truly memorable. This large display of small screens resembles a “window” overseeing the activity in cyberspace. 37
  42. 42. 6 Methodology 6.1 Summer Research My first presentation, at the beginning of the Fall 2004 semester, covered some of the wide-ranging research done over the summer. It was entitled “Discovering Complex Networks”. My approach to this first assignment was to treat the presentation as a lecture, educating my audience about the engaging science of complex networks and narrating all the discoveries and knowledge gathered in this initial phase. The presentation contained explanations and diagrams about the specific properties of scale-free networks and took a holistic view by showing examples of complex networks in domains as diverse as Gene Networks and Airline Routes. All the images shown at this presentation can be seen in Appendix A – Summer Research Presentation, at the end of this paper. In order to better understand the successive steps that led me to the study of complex networks, one should consult the Impetus chapter of this thesis. There I describe in detail the evolution of my research inclination and motivation. I ended my Summer Research Presentation with a slide where I stated that my main interest was to “Visually map a dissemination/propagation pattern in a scale-free network”. I also made a short list of additional enquiries, where one could read: > How does an idea, innovation, fad, trend, disease or virus travel from A to B in a specific scale-free network? > How long does it take? > How many nodes are affected? > How do the hubs react? 38
  43. 43. I finally concluded the presentation by stating my future goals: “To choose an area and subject to analyze, where I can bring something new to the field and contribute to its development.” 6.2 Visual Explorations After extensive research on Complex Networks I started to delve into different ways of visualizing them. The main premise was that complex networks are difficult to visualize, but we don't need to make them more complex in the process of trying. On September 27, 2004, I wrote the following in my thesis diary blog: “My thesis assertion has always been the visualization of dissemination patterns in a particular scale-free network. (…) However, I quickly found out that this premise is based on the assumption that the target network displays a visual structure suitable for analysis. Naturally, most of the time, this assumption is incorrect. Since a visual representation of a dissemination pattern cannot exist without a functional visual representation of the underlying network, I decided to dedicate my time, for now, to the visualization of complex networks. I've been delving into a set of visual explorations, collecting problems and proposing solutions.” "Functional visualizations are more than innovative statistical analyses and computational algorithms. They must make sense to the user and require a visual language system that uses colour, shape, line, hierarchy and composition to communicate clearly and appropriately, much like the alphabetic and character-based languages used worldwide between humans." Matt Woolman, Digital Information Graphics 39
  44. 44. As acknowledged in another blog entry, also in September 2004: “I've tried several open-source network visualization tools and seen hundreds of visualization examples. I think I found a critical problem. In most tools I've seen, the user starts building the network from an initial node. The user places the first node in the center of the drawing board and then, node after node, link after link, the network starts expanding. Since there's no preceding method of organizing the nodes and links in the designated area, new nodes naturally start occupying any free space available. Unsurprisingly, after a certain threshold, the lattice of lines and nodes becomes unbearable. This problem happens so many times.” The difference between this method and Mark Lombardi's drawings, for example, is a question of organization. Instead of the bottom-up hierarchy described before, Lombardi used to plan his overall design with a holistic view of the entire network, knowing beforehand the amount of space he had and the exact number of nodes and links he needed to draw. Because of this, the cleanness of his drawings, where edges rarely overlap, is an excellent example of network visualization. What I cannot understand is why Lombardi's method, and others like it, aren't taken into consideration whenever someone decides to build a visual representation of a network. A macro approach to the problem is definitely more appropriate: a top-down hierarchy instead of a bottom-up one. And to say Lombardi's networks were not complex enough is to oversimplify his work. The beautiful and eloquent global networks of Mark Lombardi 40
  45. 45. Besides the problem mentioned above, I encountered two others in my research, which contribute drastically to the large number of poor visualization examples of complex networks. First, most visual applications are based on constructive algorithms that obey one rule: display the inputted data. How the data is displayed is rarely considered. For that reason, often stunning visual forms demonstrate a low level of clarity and function. Second, the programmers who build open-source applications and the scientists/researchers who use them usually have no visual sensibility or graph drawing knowledge. Many researchers produce a visual model of the analyzed network as a mere additional element for showing their research. Sometimes it adds nothing to it. In my second thesis presentation in the Fall 2004 semester, I applied many of my reflections and sketches to practical examples, proposing possible solutions to improve the visualization of complex networks. I divided my solutions into five major steps: The main slides of this presentation can be seen in Appendix B – Complex Networks: Visual Explorations, at the end of this paper. 41
  46. 46. 6.3 Prototype #1 This was my first visual prototype, shown at the Fall 2004 mid-term review. This review also marked the birth of the thesis title: Blogviz. The mid-term presentation was entitled Blogviz: An experimental social laboratory. The underlying concept was based on a major aspiration: nodes' local stability and links' global connectivity. The goal was to map the connectivity among blogs. I tried to position the nodes in a structured way, so they would remain fixed and, to some degree, under control. The links, however, would be in constant change and the outcome would be highly random and unpredictable. I chose to sort all the nodes in a precise manner in order to isolate the major hubs and have some control over the lattice resulting from the agglomeration of links. Looking at it now, the result seems too rigid and strict. The radial diagram, with its implosive structure, reinforces that rigidity by resembling a closed system, which probably doesn’t describe the fundamental openness of blogs very well. Blogviz Visual Studies – Prototype #1 I realized I had to take a different path. I was trying too hard to control the outcome and I believe the result showed exactly that. I had to lose some of my constant need for control and let the system be more self-sufficient, self-organizing and adaptive. As stated in my Thesis blog on October 24, 2004: “Another criticism I received during the presentation was that I was being too concerned with the visual aspect of it, and that I was thinking too much as a visual designer. Well, although I agree in part with the critique, 42
  47. 47. my thesis assertion has always been the visualization of a specific dissemination pattern, and from my extensive research in complex networks, I truly believe that the only way I can positively contribute to this field is by employing my visual and interface design knowledge. In my first prototype presentation I dissected several problems on the visualization of complex networks and proposed distinct solutions that might solve some of its inconsistencies. I believe there has to be a balance between highly complex network visualizations that offer a poor functionality and highly aesthetic/innovative visual representations that might suffer from the same dilemma. I just have to pursue that balance.” In this same presentation I also illustrated some of my initial studies regarding the linkage among blogs. Connectivity in the blogosphere is a very binary process; we only need to ask two questions. Is blog A connected to blog B? If so, who is linking whom? If neither of them is linking to the other, they become momentarily isolated islands. For that presentation I showed a few visual studies where I mainly explored the concept of directional linkage, by visualizing inbound or outbound links or, put simply, who is linking whom. The images below portray some of these explorations. 43
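The two questions about blog connectivity described above can be sketched in a few lines of Python. This is only an illustration, not the code behind Blogviz: the blog names, the `links` set and the function names are hypothetical, and the "mutual" case (both blogs linking to each other) is an added assumption that the text does not spell out.

```python
# Directed links between blogs, stored as (source, target) pairs.
# All names here are hypothetical placeholders.
links = {
    ("blog_a", "blog_b"),
    ("blog_c", "blog_b"),
    ("blog_b", "blog_c"),
}


def link_relation(links, a, b):
    """Answer: is a connected to b, and if so, who is linking whom?"""
    a_links_b = (a, b) in links
    b_links_a = (b, a) in links
    if a_links_b and b_links_a:
        return "mutual"  # assumption: both directions present
    if a_links_b:
        return f"{a} links to {b}"
    if b_links_a:
        return f"{b} links to {a}"
    return "isolated"  # neither blog links to the other


def inbound(links, blog):
    """Blogs that link to the given blog."""
    return sorted(src for src, dst in links if dst == blog)


def outbound(links, blog):
    """Blogs that the given blog links to."""
    return sorted(dst for src, dst in links if src == blog)
```

For example, `link_relation(links, "blog_a", "blog_c")` reports the two blogs as isolated from each other, while `inbound(links, "blog_b")` lists the blogs pointing at `blog_b`.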
  48. 48. 6.4 Prototype #2 While in my first prototype I sought a structured way to map connectivity among blogs, by isolating the hubs and sorting the nodes according to popularity, in my second prototype I explored possible ways of visualizing diffusion patterns over time. I tried several models based on a radial structure where time became the major imposing element. In most of these experiments I faced a common problem in representing a continuous flow of infected blogs. The underlying radial structure seemed to impose its rigidity by forcing fractures in the pattern, particularly whenever there was a day transition. Blogviz Visual Studies – Prototype #2 44
  49. 49. Blogviz Visual Studies – Prototype #2 Blogviz Visual Studies – Prototype #2 45
  50. 50. I quickly found out I had to change my visualization thinking, since a radial structure didn’t quite apply to my subject of analysis. Perhaps I was too influenced or distracted by the Radial Form of Organization Chart from the Alexander Hamilton Institute or Loom 2, by Danah Boyd (et al). Radial Form of Organization Chart (1924), Alexander Hamilton Institute. Loom2, Danah Boyd, Hyun-Yeul Lee, Ethan Perry, Sociable Media Group - MIT Media Lab. As I wrote in my thesis blog on November 16, 2004: “At the moment I’m becoming convinced that a horizontal array is truly the best way of representing the quantitative and temporal qualities of a pattern. Time is a crucial domain in a dissemination pattern, particularly in a word-of-mouth social behavior. The amazing potential of a horizontal assortment is the uninterrupted continuous flow of data and the possibility of collapsing time frames while still maintaining a sense of scale and understanding of the pattern dynamics.” Blogviz Visual Studies – Horizontal array of adopting units 46
  51. 51. Blogviz Visual Studies: different tryouts where adopting units (blogs) are structured in a vertical and horizontal array. After this critical change in my visualization studies I started doing a lot of sketching and writing. I built a few diagrams to get a full understanding of my system, built several taxonomies and dissected the mechanics of blogging. This examination helped me put my ideas straight and get a sense of what I was dealing with. 6.5 Prototype #3 In my third prototype I introduced Blogviz as a “topological model of meme behavior”. From the conclusions of my previous tryouts, I decided to explore in depth the notion of a horizontal array of adopting units (weblogs) to portray the propagation pattern of a specific topic. By doing that I would be constraining the Time element to the X axis. The following images represent a series of tryouts in this context. 47
  52. 52. 48
  53. 53. In this phase of the project I also introduced the first visual taxonomy of Blogviz, by dissecting the system and its intrinsic elements. The following image portrays a critical understanding of the inherent structure of Blogviz at that stage. At the same time, a list of goals was created (left image) in order to better understand the intent of Blogviz. 49
  54. 54. 6.6 Prototype #4 After the series of independent, scattered visual studies that characterized the initial trials, this fourth prototype was the first solid attempt at establishing Blogviz as an interactive visualization model. At the time I was pushing the concept of an application or tool of analysis, which according to some critics implied a need for commercial viability. Even though I’m convinced this thesis has several elements that could be successfully applied in commercial applications, my goal with this project is to elevate the understanding of Memetics in a specific social network and conduct a serious research experiment, which I believe fits more adequately within the academic realm. Another point worthy of consideration is that, when developing this prototype, Blogviz was intended to work with real-time data, in the form of hourly updated XML RSS feeds. Although this idea changed afterwards, it was a crucial consideration in the development of this prototype. Prototype #4 – Default First Page 50
  55. 55. A quick explanation of the previous image’s visual schema: circles represent topics; the diameter corresponds to the total number of adopting blogs; and the colors, pink and green, denote respectively a decreasing or increasing course. Time is again incorporated in the X-axis: the closer a circle is to the right edge of the window, the more recent was its last dispatch. The Y-axis position of each circle helps reinforce its level of adoption. The main interaction in this fourth prototype was based on a simple flow. The default first page would allow a swift view of the general pattern by showing the overall popularity of current topics. If one decided to investigate the structure and evolution of a particular topic more deeply, one would be taken to a sequence of examination methods. The following images illustrate some of the techniques proposed. Prototype #4 – Blogs’ evolutionary paths through time Prototype #4 – Plotting blogs according to time/popularity 51
  56. 56. Prototype #4 – Detailed View Prototype #4 – Detailed View Prototype #4 – Blogs’ adoption represented by a Tree Map Prototype #4 – Blogs’ analysis by Theme and Generator Prototype #4 – Blogs’ relationship analysis 52
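The visual schema of the Prototype #4 default first page (circles for topics, diameter for adopting blogs, pink or green for a decreasing or increasing course, X for recency, Y for adoption) can be sketched as a simple mapping. This is a minimal illustration in Python, not the prototype's actual code: the linear scales, the window dimensions, the 24-hour span and every name below are assumptions chosen for the example, as is the rule that compares the current count against a previous one to decide the trend.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    x: float         # horizontal position; right edge = most recent dispatch
    y: float         # vertical position; higher adoption sits nearer the top
    diameter: float  # proportional to the number of adopting blogs
    color: str       # "green" = increasing course, "pink" = decreasing


def topic_to_circle(adopting_blogs, previous_adopting_blogs,
                    hours_since_last_post, max_blogs,
                    window_width=800, window_height=400,
                    hours_shown=24, max_diameter=80):
    """Map one topic to a circle (all scales are hypothetical)."""
    share = adopting_blogs / max_blogs
    diameter = max_diameter * share
    # Assumption: a topic that is not losing adopters counts as increasing.
    color = "green" if adopting_blogs >= previous_adopting_blogs else "pink"
    # Clamp recency to the displayed span, then place recent topics rightmost.
    recency = min(hours_since_last_post, hours_shown) / hours_shown
    x = window_width * (1 - recency)
    y = window_height * (1 - share)
    return Circle(x=x, y=y, diameter=diameter, color=color)
```

Under these assumed scales, a topic adopted by half of the maximum number of blogs and posted about in the current hour would be drawn at the right edge, mid-height, with half the maximum diameter.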
  57. 57. 6.7 Final Application A major shift in the development of Blogviz was the decision not to incorporate real-time data in the backend of the application. As previously stated, in my fourth prototype I was mostly concentrated on developing a visualization schema that would expose current trends in the topics diffusion process, by reading data from hourly updated XML feeds. It would basically display the most adopted topics spreading in the blogosphere at any given time. Even if the application allowed an extended breakdown of each topic, beyond a quick view of the present information tendencies, it considered only a restricted number of topics. I believe Blogviz’s concept, at that phase, was trying to incorporate too many features, or levels of analysis, without being able to develop any one of them efficiently. It was also becoming a trend analysis tool rather than a comprehensive model of topics distribution. I wanted Blogviz to become a serious visualization study of information diffusion in blogspace, and not so much a marketing application. I still believe there’s enormous potential in visualizing popular topics with real-time data integration, and that might be something Blogviz will incorporate in the future. However, I first wanted to better understand the topics’ inner structure and evolution through time. This change in Blogviz's progress also coincided with a parallel immersion in the domains of Epidemiology and Diffusion of Innovations Theory. I never imagined that an apparently minor adjustment would require such a drastic turnaround in the project’s conceptualization. Until then, Blogviz had been dealing with a very restricted and manageable time span. Real-time data visualization was constrained to one day or, at the most, one week. By contrast, in aiming at an adaptive model, the critical goal was to come up with a visualization method that could easily include time variations and still be consistent. 
Another crucial problem was to visualize, in a very tight space, a high number of topics. I had to come up with a visualization model that would answer these last two problems accordingly. First, it had to be flexible enough to embrace distinct time spans, but at the same time maintain uniformity throughout the process. Second, it had to be able to include a high number of topics, and also, allow an immediate understanding of the overall pattern and the individual life cycle of each topic. 53
  58. 58. In the process of looking for inspiration in diverse sources, I came across an elucidating diagram by E. J. Marey, in Edward Tufte’s The Visual Display of Quantitative Information, that resolved particularly well many of the challenges I was facing. Original Image: E. J. Marey, La Méthode Graphique (Paris, 1885), p.20. Source: Tufte, Edward R., The Visual Display of Quantitative Information The preceding image illustrates Marey’s graphical train schedule for Paris and Lyon in the 1880s. The X-axis incorporates Time, measured in hours, and maintains the same scale on both the top edge (corresponding to departures and arrivals from Paris) and the bottom edge (for departures and arrivals from Lyon). The remaining horizontal lines represent other train stations between Paris and Lyon. The diagonal lines represent different trains, leaving from and arriving at the two main stations, and the horizontal line-breaks represent waiting time in secondary stations. This chart influenced me greatly in the following steps of my project. I believe it is an extraordinary example of information visualization, where time and pattern become one intrinsic entity, allowing a substantial understanding of the data dynamics in one brief look. I applied a modified version of this concept to Blogviz, where the lines became representative of topics, and the time scale was measured in days. Blogviz’s model doesn’t incorporate any type of constraint on the Y-axis, as Marey’s graph does; therefore the overall height of the main window is rather arbitrary. The following image represents the main visualization window for topics’ evolution within the Blogviz environment. 54
  59. 59. Blogviz’s topics visualization – Topic Lines and Time Scale The interesting characteristic of this model is that, as in the Paris/Lyon train schedule example, the angle of each line has a specific meaning. This happens because both the top and bottom edges of the window maintain the same time scale. Therefore, the steeper the line, the shorter the duration, in this case the topic’s duration. In the image above, for example, one may see a line, close to the center of the window, which seems to be almost vertical; this means that the life cycle of that particular topic was very short. This feature is even more relevant for topic lines that have either the starting or ending point outside the present timeframe. I conducted a small experiment within the same model, where the lines, instead of being placed diagonally, were drawn horizontally. This method was probably even more successful when the lines had both the starting and ending point inside the selected time span. However, when topic lines had a first or last day of spreading outside this frame, it was impossible to tell how many days extended beyond it. What the diagonal alignment facilitates is a full understanding of the topic’s life cycle, even when it spreads outside the present time span. To better understand the intricacies of this visualization model, the following images illustrate the four possible life cycles for every topic line, within each timeframe, and the way they are represented. 55
  60. 60. Topic with first and last day of spreading within the current time span Topic with first day of spreading outside the current time span Topic with last day of spreading outside the current time span 56
  61. 61. Topic with first and last day of spreading outside the current time span The angle of the prediction line, for dates outside the timeframe, is obtained through an equation that multiplies the number of days (topic duration) by the number of pixels of each day parcel. So if a specific topic line has its starting point (first day of spreading) within the present timeframe and its last day outside of it, and its total duration is 64 days, the system multiplies 64 by 12 (the number of pixels of a day parcel), measures that distance from the starting point, and dynamically draws a line to the resulting end point. Another feature of this visualization method, further explained in the following Blogviz Interface section, refers to the brightness or color saturation of each line. In Blogviz, the default setting for the lines’ brightness is a depiction of the total number of adopting blogs. This allows for a comprehensible insight when evaluating the overall pattern. In a brief look, one is able to identify the life cycle of each topic and also the number of blogs that adopted it. I like to consider the visual representation of this model as a metaphor of a window overlooking cyberspace, continuously crossed by lines of information flow. 57
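The end-point calculation described above, together with the relation between line angle and topic duration, can be sketched as follows. The 12-pixel day parcel and the 64-day example come from the text; everything else is an assumption made for illustration: the function names, the mirrored calculation for a topic whose first day lies outside the timeframe, and the window height used for the angle. Blogviz itself was built in Flash, so this Python is not its actual code.

```python
import math

PIXELS_PER_DAY = 12  # width of one day parcel, as stated in the text


def project_end_x(start_x, duration_days, pixels_per_day=PIXELS_PER_DAY):
    """End point (bottom edge) for a topic whose last day is off-screen."""
    return start_x + duration_days * pixels_per_day


def project_start_x(end_x, duration_days, pixels_per_day=PIXELS_PER_DAY):
    """Assumed mirror case: start point for a topic that began off-screen."""
    return end_x - duration_days * pixels_per_day


def line_angle_degrees(duration_days, window_height,
                       pixels_per_day=PIXELS_PER_DAY):
    """Angle between the topic line and the horizontal time scale.

    The line runs from the top edge to the bottom edge, so its horizontal
    run is the duration in pixels and its vertical rise is the window height.
    """
    run = duration_days * pixels_per_day
    return math.degrees(math.atan2(window_height, run))
```

With the figures from the text, a 64-day topic starting at x = 100 ends 64 × 12 = 768 pixels further right, at x = 868. For any fixed window height, a one-day topic yields a much steeper angle than a 64-day topic, which is why an almost vertical line signals a very short life cycle.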
