A fantastic map created by XKCD of the online landscape circa 2007. What is apparent is a landscape of fragmented continents, along with phenomena that were just emerging at the time (e.g. how small Wikipedia is). But the most interesting thing is how relatively small the individual environments, or cultures, are.
The same map, four years later. A totally different online world. Where once we could see a plethora of different cultures, now we have YouTube, Twitter, and Facebook overshadowing everything else. This brings up some interesting questions about how we do research and, especially, how we do good research in these environments.
When we think about what people are doing online, we find that they do quite a lot. They talk, communicate, ask questions, archive tidbits of their lives, “like” things, comment, curate, debate, fight, love, meet, bookmark, and protest, and these are just a few examples. All of these activities create fertile ground for research. We can gain a much better understanding of behaviors on a scale bigger than ever before. So from a research perspective it is no longer sufficient to look at interaction only through the narrow prism of individual or local small-group activity. It is time to understand what we can gain from those large scale spaces and how to do that.
The data that we have from those large scale spaces can be roughly divided into three archetypes. (1) Individually distinctive personal data: identities, demographics, personal preferences, and contributed content. Such data answer questions about who is online and who does what online. (2) Structural data: ties and relationships, shared interests, topical trendiness, and routes of interaction and information dissemination. Such data describe who is connected to whom and how they are connected. (3) Activity logs: actions and behaviors such as search queries, navigational click streams, reviews, and favoriting. Such data describe who does what, when, and where. While these data are rich and often easy to collect, no matter how rich the logs and metrics are, they do not present the full picture of the interaction that happens within those large spaces. This is where qualitative work, and specifically ethnography, can help us.
One of the elements missing from quantitative analysis is why people are doing what they’re doing. Questions of conceptualization and construction of understanding, motivation, interaction, closeness, and affection are not easily answered by structural data alone. To understand more than roles, positions, and information flow, we have to dig deeper. Ethnography, for example, offers us the ability to gain deeper understanding and to portray holistic descriptions of a phenomenon or a culture, including online environments.
Ethnography has evolved dramatically over the past century, through several phases. Cultural anthropology: austere, foreign cultures, stemming out of colonialism. Political interpretation: closer to home, critical, internal perspective. Ethnography in HCI: compressed in time, focused on systems rather than cultures. Virtual ethnography: utilizing online environments to explore different cultural properties. We are now facing a new phase in online research that necessitates renewed thinking about ethnography.
Why isn’t ethnography, or qualitative work in general, more popular in large scale online environments? Because these environments are thorny, messy, and problematic. There are the regular challenges of doing ethnography: going into the field, trying to find informants, getting over objections from gatekeepers, obtaining the right participant observation stance, gaining an intimate familiarity with and understanding of the field, etc. But there are quite a few challenges that are unique to, or enhanced in, large scale online spaces.
The immense scope of the data, comprising interactions between millions of users, requires researchers to make a very difficult a priori decision about the boundaries of the study and the scope of the field. Boundaries are difficult to chart as well: researchers are required to sift endlessly through hundreds of millions of pages or interactions as they decide on the exact boundaries of the field. Where will they visit? Who shall they talk to? How long can they be immersed in such environments? What roles are important and how? Which interactions are fundamental for understanding the social structures? Even in “regular” online ethnography researchers have to account for phenomena that happen in their absence or outside the boundaries of their field, but these challenges are amplified more than ever today.
Another challenge is selecting the appropriate unit of analysis. One of the major problems in large scale online spaces is the long tail. Although the long tail allows heterogeneity to flourish in the short run, in the long run it is homogeneity that prevails. Although theoretically such environments enable us to observe all kinds of online behavior, the selection will, in many cases, lead us to the most prominent, most talkative users or the most popular pages. The moderately connected, the singletons, and the small scale groups may be the most interesting ones to study, but they are very hard to find. They are marginalized and often not heard in research. And that, of course, is counterproductive to what ethnography sets out to find.
Many large online spaces have a matrix-like structure that augments layers of egocentric personal ties and content. This structure necessitates an a priori choice between content and users as the starting point for the inquiry. Since ethnography aims to provide a systematic understanding of a culture or a phenomenon through continuous immersion and observation, it may be necessary to observe all the layers of interaction, both personal and artifact-related. In small scale environments that’s possible. In large scale environments that’s more difficult, even impossible.
Interface, content, and platform instability is another problem. While interface change may lead to serendipitous findings and inject interest into the study, it is difficult to follow, especially in long term studies.
A different type of change is not the result of trends or technological advances, but the outcome of user decisions. Changes in content leave fragmented interactions, orphan responses, and social networks with missing links. The remaining content does not portray the full chronology and evolution of the interaction, only a very limited part of it, which again makes the context and the culture difficult to understand.
As in any research, there are ethical issues to be considered. Because the data come from so many participants in various countries, it is important to remember how difficult it is to apply commonly used ethical practices. It’s important to consider the identity of the participants and where they come from. It is also important to remember that digital information can very easily be modified or forwarded (Twitter is an excellent example of that). It’s crucial to attribute content to the right person and to uphold the relevant ethical standards and awareness.
Multi-sited ethnography also departed from traditional ethnography to discover meanings and objects in what multi-sited ethnographers called “diffuse time-space”, meaning that macro understandings emerge from piecemeal accounts of individual micro groups that are spatially and temporally separate. Researchers diligently put together observations that look at specific connections and translate them from one group to another, as long as these groups are relevant to each other or connected somehow. If we treat one large scale environment as a multi-faceted site, we can create micro accounts that build into a macro “world system”. Juxtaposing layers of interaction makes it easier to justify the selection of entry points and gatekeepers, and it can also work towards resolving the tension between content and ties by looking at each from the perspective of a micro account. This can offer researchers the benefit of both breadth and depth. But in this case significantly larger research teams are needed to overcome the spatial and temporal limitations simultaneously and comprehensively.
Social network analysis and other quantitative methods are excellent complementary methods that can direct ethnographers to points of interest within the researched landscape. They can highlight clusters of activity and interpersonal ties or personal networks. They can also surface phenomena that would not be observable through qualitative work alone. Using structural analysis to comb or scan the landscape and find where to drill down and take “ethnographic snapshots” helps both to present a more comprehensive picture and to focus on interesting places. However, structural analysis should be used with care. Not all research questions can benefit from it. Sometimes it may take researchers in the wrong direction, towards the more prominent points and users. In addition, it’s easy to become enamored with visually appealing renderings of structure.
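One way to picture this kind of scan is a stratified selection of entry points. The sketch below is only an illustration, not a method from the talk: the edge list, the user names, the three degree bands, and the `hub_threshold` cut-off are all assumptions made for the example. It counts each user's ties and then draws the same number of candidates from the poorly connected, moderately connected, and highly connected groups, so that the "snapshots" are not taken only among the most prominent users.

```python
import random
from collections import Counter


def degree_counts(edges, users):
    """Count how many ties each user has in an undirected edge list."""
    degrees = Counter({user: 0 for user in users})
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def band(degree, hub_threshold):
    """Assign a user to a (hypothetical) connectivity band by degree."""
    if degree <= 1:
        return "periphery"  # singletons and barely connected users
    if degree >= hub_threshold:
        return "hub"        # the prominent, well connected users
    return "moderate"


def stratified_entry_points(edges, users, per_band=2, hub_threshold=4, seed=0):
    """Pick up to `per_band` candidate entry points from every band."""
    degrees = degree_counts(edges, users)
    bands = {"periphery": [], "moderate": [], "hub": []}
    for user in sorted(users):
        bands[band(degrees[user], hub_threshold)].append(user)
    rng = random.Random(seed)  # seeded so the selection is repeatable
    return {
        name: sorted(rng.sample(members, min(per_band, len(members))))
        for name, members in bands.items()
    }


# Made-up toy network: "ana" is a hub, "hal" has no ties at all.
users = ["ana", "ben", "cy", "dee", "eli", "fay", "gus", "hal"]
edges = [("ana", "ben"), ("ana", "cy"), ("ana", "dee"),
         ("ana", "eli"), ("ben", "cy"), ("fay", "gus")]
picks = stratified_entry_points(edges, users)
print(picks)
```

Sampling evenly across bands is a deliberate design choice here: it trades statistical representativeness for visibility of the long tail, which is the part of the population that a purely popularity-driven scan tends to miss.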
Natural language processing (NLP) uses quantitative algorithms to focus researchers on points of interest. NLP and algorithmic analyses of text can be used to understand interactions and processes that are based in human language. We have not found studies that used NLP as a starting point for ethnography, but it has been used to track trending topics, sentiments, conversational motives, and opinions, and to extract situational awareness. Clustering data and using NLP offers the same benefits as structural analysis, but within a different context. It can be used across networks or within a specific environment. But, as even strong proponents of NLP admit, purely text-based NLP performs poorly at detecting subtle concepts such as motivation and meaning. Also, visual symbols that are prevalent in many online environments (“like”) are not amenable to NLP.
We need to rethink and understand what the role of ethnography is in large scale online environments. We need to think about what the acceptable, or even good, ethnographic practices are. We need to figure out how, and to what extent, we should combine qualitative and quantitative work. It’s not only about mixed methods. That’s a great idea, but we need more than that: we need to create a tool chest that will serve us in the coming years, when large scale environments will become even bigger and more important.
Extreme ethnography - challenges for conducting research in large scale online environments
<ul><li>Dana Rotman, Jennifer Preece, Yurong He, Allison Druin</li></ul><ul><li>The 7th iConference, February 9, 2012, Toronto</li></ul>
<table>
<tr><th></th><th>15 years ago</th><th>Today</th></tr>
<tr><td>Size and scope</td><td>Small settings, limited number of users (hundreds)</td><td>Vast distributed networks, unlimited number of users (hundreds of millions)</td></tr>
<tr><td>Unit of analysis</td><td>Selection of entry point relatively easy; user behavior relatively homogeneous; heterogeneous behavior salient</td><td>Selection of entry point affected by the size of the field; the “long tail” highlights extremely well connected users</td></tr>
<tr><td>Interface structure</td><td>Relatively simple</td><td>Multi-layered structure of ties and content</td></tr>
<tr><td>Interface stability</td><td>Infrequent and slow changes</td><td>Rapid changes to the tool, interface, and content</td></tr>
<tr><td>Ethics</td><td>Budding understanding of online research implications</td><td>Distributed user populations are subject to different ethics oversight policies</td></tr>
</table>