Studying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Questions About Objectivity & Bias
The Tetherless World Constellation
Rensselaer Polytechnic Institute, Troy, NY
With thanks to co-author John S. Erickson and the extended RPI Tetherless World Team
The Process of Web Science
Berners-Lee, T. (2007) W3C Keynote. http://www.w3.org/2007/Talks/0509-www-keynote-tbl/#(10)
Via the workshop call: how can we study the phenomena of cybercrime & cyberwarfare in a way that offers a different perspective from what other disciplines already offer?
• Begin with the cycle - where in the cycle does it make sense to start?
• Moving away from just one side of the cycle
How may a Web Scientist explore the topic of cybercrime
and cyberwarfare by offering an integrated study of both
the social and technical aspects of the phenomena?
I. Objectivity & Bias
II. Motivation and example
III. Open questions & future
Objectivity & Bias
Porter (1996) traces objectivity as having multiple interpretations
construed to include notions of fairness, mechanical objectivity, and
Latour’s (2000) critique goes even further suggesting that “objectivity
does not refer to a special quality of the mind . . . but to the
presence of objects which have been rendered ‘able’ to object to
what is told about them”.
The discourse of scientific objectivity and bias has a long, debated history, with varying definitions across multiple disciplines. Vocal critiques of objectivity and bias in social science have placed that discourse in a contentious position, challenging both the need for objectivity and implicit biases.
So again, I turn back to this cycle. What is unique about what Web Scientists may offer is this integrated, multi-disciplinary approach. However, with this integration comes the inheritance of the same critiques leveled at the disciplines that feed and influence the study.
Examples of bias
“Passive” data collection methods in digital social science research;
considered by some to be more “objective” - a more “natural” method.
“... [that] Facebook ‘big’ data is made by users unaware
of or unconcerned about social science researchers
doesn’t change the fact it is made through and around a
structure engineers have coded.”
Jurgenson, N (2014). “Short Comment on Facebook methodology ‘more natural’”. The Society
Pages. website. http://thesocietypages.org/cyborgology/2014/06/09/short-comment-on-facebook-as-
Current examples of critiques of bias even in technical execution: algorithms & bias; Twitter studies.
Particularly apparent at the sociology level; look at work by Kate Crawford, Nathan Jurgenson, danah boyd.
Kate Crawford’s example of tweets generated during Hurricane Sandy: biased, as the tweets did not present the whole picture. The greatest number of tweets came from Manhattan, while few tweets came from areas like Breezy Point, Coney Island and Rockaway. This is the “signal problem”: data are assumed to accurately reflect the social world, but there are significant gaps, with little or no signal coming from particular communities <http://blogs.hbr.org/2013/04/the-hidden-biases-in-big-data/>.
In reaction to Facebook’s sociology pre-conference ahead of the American Sociological Association, wherein the claim is that research on such a platform is more “natural”.
Methodological Issues:
1) Inadequate attention to implicit and explicit structural biases of the platform(s) most frequently used to generate datasets
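One rough way to make the “signal problem” concrete is to compare each area's share of collected tweets with its share of the population. The sketch below is only an illustration: the function name `representation_ratio`, the area labels, and all counts are hypothetical and invented for this example, not taken from Crawford's data or any real dataset.

```python
# Hypothetical illustration of the "signal problem".
# All names and numbers here are invented; they are NOT real data.

def representation_ratio(tweets, population):
    """Return, per area, (share of tweets) / (share of population).

    A ratio near 1 suggests the area is represented in proportion to
    its population; a ratio well below 1 suggests a weak signal.
    """
    total_tweets = sum(tweets.values())
    total_population = sum(population.values())
    return {
        area: (tweets[area] / total_tweets)
        / (population[area] / total_population)
        for area in tweets
    }


# Invented counts for two equally sized, hypothetical areas.
tweets = {"Area A": 9000, "Area B": 1000}
population = {"Area A": 5000, "Area B": 5000}

ratios = representation_ratio(tweets, population)
for area, ratio in ratios.items():
    print(f"{area}: representation ratio {ratio:.2f}")
```

A ratio like this only flags where a dataset is thin; it says nothing about why a community is quiet (power outages, platform adoption, self-selection), which is where qualitative context is still needed.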
Examples of bias
Twitter as the “model organism” for multiple research communities
• Due to: data availability, tools availability, simple & clean data
• Biased and influenced by message length, rapid turnover, public nature, directed graph interaction
• self-selection bias, signal problems, etc.
Twitter used as a means for population-level research versus selected
 Tufekci, Z. (2013). Big Data: Pitfalls, Methods and Concepts for an Emergent Field. SSRN
(March 2013). http://bit.ly/1jsN0u5
 Rivers, C. M., & Lewis, B. L. (2014). Ethical research standards in a world of big data.
F1000Research, 3. http://bit.ly/1i2eyLV
Noted by Zeynep Tufekci (Princeton):
- Twitter has emerged as a “model organism” (one selected for intensive examination by the research community) due to data availability, tools availability and popularity, and a simple, clean data structure
- However, not all model organisms are representative of their taxa
- Influenced by message length, rapid turnover, public nature, and a directed graph of social network interaction (where one can follow without consent)
- Hashtag usage is a form of self-selection bias, with multiple embedded layers of culture and meaning that are assumed
- Under non-digital circumstances, IRB/ethical guidelines suggest that collecting information from a public space where people could “reasonably expect to be observed by strangers” is considered appropriate even without informed consent. It could be reasoned, then, that tweets are texts published for the purpose of sharing with others/the public (question: should one still have a reasonable expectation of privacy?)
- It would be unethical for a researcher to follow one specific shopper around the mall and gather data exclusively on that shopper without his/her consent; however, if the observation is done in aggregate, then it is acceptable. What is this boundary online?
Rally Research Example
Fieldwork: November 2013 in Washington D.C.
Exploratory observational study on rally/protest behavior during last year’s StopWatching.US rally.
- Individual behavior and motivation; identification of authority, power and governance structures; and consideration of technology’s involvement as a propagator and facilitator of information flow.
- “Cybercrime” in this instance was defined not as an action of one nation-state against another nation-state, but rather as a single agent’s action against a nation-state, a definition motivated by the U.S. government’s own use of the term in identifying Edward Snowden’s act as a cybercrime.
Practice of reflexivity
Explicit biases such as the
construction of the initial interview
questions were first examined.
These questions focused on
gathering information related to
individual motivation, modes of
information propagation, levels of
topic comprehension, etc.
Implicit biases such as
organizational affiliation; self-selection
For this example, the researcher
was affiliated with one of the rally
organizing groups and had
access to non-public information.
One practice:
Explicit Bias
Implicit Bias
How to capture/incorporate/express these biases for later research?
What ethical standards will the WS community adopt
when exploring online platforms for insight into human
How do we identify and negotiate the intersections
between more descriptive, context-dependent (qualitative
data) with the stand-alone graph analysis or quantitative
How can we identify, express, capture and share both
implicit and explicit biases in our research?
Proponent of mixed-methods
Paper was meant to provoke some additional thought; questions that arise include: