Engines of Order. Social Media and the Rise of Algorithmic Knowing.


Talk given at the Social Media and the Transformation of Public Space Conference on June 19 at the University of Amsterdam. References and comments are in the notes section.

Speaker notes:
  • The question of classification is not new, obviously, and conflicts around classification have a long history.
  • Parameters: a little bit shorter
  • Image from Techcrunch: http://techcrunch.com/2014/04/03/the-filtered-feed-problem/
  • The lists can be seen as vectors as well and then treated with the full arsenal of geometry (e.g. to calculate a similarity coefficient between two such vectors)
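The note about treating lists of 'likes' as vectors can be made concrete. A minimal sketch (the user names and liked pages are invented for illustration): each user's likes become a vector over the space of liked objects, and the cosine of the angle between two such vectors serves as a similarity coefficient.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two like-vectors, each a dict
    mapping an object (e.g. a liked page) to a weight (here 0/1)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two hypothetical users sharing two of their three likes:
user1 = {"page_a": 1, "page_b": 1, "page_c": 1}
user2 = {"page_b": 1, "page_c": 1, "page_d": 1}
print(cosine_similarity(user1, user2))  # 2/3 ≈ 0.667
```

Once users are points in this vector space, the "full arsenal of geometry" applies: clustering, nearest neighbors, projection into two dimensions for mapping, and so on.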

    1. 1. Engines of Order Social Media and the Rise of Algorithmic Knowing Bernhard Rieder Universiteit van Amsterdam Mediastudies Department
    2. 2. Starting point "Algorithms play an increasingly important role in selecting what information is considered most relevant to us, a crucial feature of our participation in public life." (Gillespie 2015) From search engines to social media and beyond, the impression is that socially and culturally relevant tasks are delegated to and performed by algorithms. Because algorithms draw together many different things, there are many ways of beginning to address them. New forms of "knowing" that have quite different means of producing knowledge and of making it performative. Can we think of it as a "style of reasoning" (Hacking 1992)?
    3. 3. My approach to the question As researcher and software developer with the Digital Methods Initiative, I build and apply tools that contribute to "knowing" what is happening on social media, most recently: ☉ Netvizz (Facebook data extraction), Rieder 2013 https://apps.facebook.com/netvizz/ ☉ DMI-TCAT (DMI Twitter Capture and Analysis Toolkit), Borra & Rieder 2014 https://github.com/digitalmethodsinitiative/dmi-tcat/ This project is more closely aligned with a book project that investigates the conceptual content and history of algorithmic information processing. A critical approach is necessary both for my own role in algorithmic knowledge production and for understanding how social media make use of algorithms on various levels. Algorithms used by computational researchers and platforms are similar.
    4. 4. Algorithmic configurations [Diagram: input → algorithm → output. The system in use (interface elements, contents, users and uses) feeds the algorithm through capture, formalization, and semantics; the algorithm (techniques, parameters, internal states) turns latent order into revealed order; the output returns to the system in use via display, interactivity, and performativity. Example: users tweeting, clicking, navigating, reading, etc. produce loads of data; some math produces 10 trending phrases as results, with possible effects.]
    5. 5. Very large numbers and variety in users, contents, purposes, arrangements, etc. "[Commensuration] standardizes relations between disparate things and reduces the relevance of context." (Espeland & Stevens 1998)
    6. 6. Platforms like Twitter provide opportunities for creating connections between defined types of entities (users, messages, hashtags, resources, etc.). They formalize and channel expression, exchange, and coordination. "You cannot reply to a hashtag." "Simply put, a system can only track what it can capture, and it can only capture information that can be expressed within a grammar of action that has been imposed upon the activity." (Agre 1994)
    7. 7. Using social media and the Web is like living in a survey. Or rather, in an experiment, since so many parameters are controlled. Grammars need to become more pervasive or more explicit ("deeper") so that more semantic data can be captured.
    8. 8. Data pools in social media are centralized and searchable. Data is used by social media platforms at various instances for various goals. Data is made accessible to varying degrees to various actors for various reasons.
    9. 9. Taxonomy of the Encyclopédie (Diderot and d'Alembert ca. 1783)
    10. 10. United States Census Form, 1910
    11. 11. Knowing the many Similar experience of "too many" in different fields: ☉ Maxwell (1859): even if atoms are fully deterministic, we could never model the behavior of a gas by observing individual atoms; => statistical mechanics ☉ Foucault (2004): epidemics, economic dynamics, etc. cast doubt on the family as a model for understanding and governing society; => "population" and social sciences ☉ Bush (1945): "There is a growing mountain of research." => information retrieval Between 1850 and 1940 many techniques to think and analyze "the many" are introduced, looking at the structure and dynamics of interacting ensembles. The "erosion of determinism" (Hacking 1981) means that modes of description are increasingly probabilistic and oriented towards "acting in an uncertain world" (Callon, Lascoumes, Barthe 2001) that can be "tamed" (Hacking 1990) through statistical techniques.
    12. 12. Social media deal with various kinds of "the many" (users, messages, products, ideas, etc.) and strive to provide answers to questions like who to talk to, what to read, where to go, what to buy, etc. in the form of decisions. They make use of various techniques to algorithmically reduce complexity to allow continuous activity.
    13. 13. From classification to calculation Classifications as information infrastructures (cf. Bowker and Star 1999) that orient practice through normalization, standardization, selective discarding, reformulation, positioning, navigational structuring, etc. are still relevant. But various forms of process and calculation are making things much, much more complicated. We are currently seeing a race toward understanding the semantics of expression, behavior, and cultural artifacts.
    14. 14. There are different ways of producing "semantic" data. Users are not only filling up the fields, they are increasingly participating in shaping formalizations. From classifications to classification procedures.
    15. 15. "One of the simplest ways to derive information about a user is to look at the way he uses the system." (Rich 1983) Let's not forget that some of the valuable data are simply a byproduct of people using the system.
    16. 16. What are "personal data"? "Facebook Likes can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender." (Kosinski, Stillwell, Graepel 2013) The data used in this study does not even include friends' likes. Prediction is the determination of likelihood based on knowledge of previous events.
    17. 17. Data is analyzed and made performative immediately inside of the system. New categories can be derived from other data and are instantly made actionable.
    18. 18. Recapitulation By providing functionality through always more fine-grained grammars of action (and other data capturing techniques), social media platforms accumulate loads of structured and unstructured data. The semantization of data in relation to operational contexts (through formalization, derivation, etc.) begins early on. Classification is deeply caught up with calculation and process.
    19. 19. Algorithmic configurations [Same input → algorithm → output diagram as slide 4.] Algorithmic configurations imply "distributed calculative agencies" (Callon and Muniesa 2005) that run through the system and its users. The data arriving at the algorithm has both latent meaning and order: it is related to actual practices and not random noise.
    20. 20. Correlation Coefficient (Galton 1885) Linear Regression (Pearson 1901)
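The two techniques on this slide are old enough to fit in a few lines of code. A minimal sketch, with toy height data invented purely for illustration (not Galton's actual measurements): Pearson's correlation coefficient and the least-squares regression line.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx ** 0.5 * vy ** 0.5)

def least_squares(xs, ys):
    """Slope and intercept of the least-squares regression line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Invented, perfectly linear toy data (fathers' vs. sons' heights):
fathers = [64, 66, 68, 70, 72]
sons = [66, 67, 68, 69, 70]
print(pearson_r(fathers, sons))       # 1.0
print(least_squares(fathers, sons))   # (0.5, 34.0)
```

The slope below 1 in this toy example echoes Galton's original observation of "regression towards the mean", from which the technique takes its name.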
    21. 21. Sociogram (Moreno 1934) Sociometric Matrix (Forsyth and Katz 1946)
    22. 22. Word-Pair Linkages (Luhn 1959) Semantic Road Maps (Doyle 1961)
    23. 23. My Facebook Network Friendship connections
    24. 24. My Facebook Network Friends and their 'likes' My 290 Friends liked at least 20588 objects
    25. 25. My Facebook Network Mapping users according to 'likes'
    26. 26. My Facebook Network Classifying users according to 'likes'
    27. 27. My Facebook Network My post-demographic profile or sphere
    28. 28. Techniques There are many different algorithmic techniques that have complex histories. Each technique reveals the data from a specific angle, but they are highly plastic and can be easily combined. They may be reductionist (e.g. graph theory: everything is a point or line), but also very generative (unlimited number of "views"). Many techniques focus on the relationship between populations and individuals. In social media units can be qualified in terms of other units. All of these techniques are "revealing" (in the sense of Heidegger) the data: they show certain aspects of the latent order in certain ways; they make truth that is caught up in a position towards the world, a finality.
    29. 29. Random Network, Size: inDegree, Color (blue => yellow => red): PageRank (α = 0.25)
    30. 30. Random Network, Size: inDegree, Color (blue => yellow => red): PageRank (α = 0.40)
    31. 31. Random Network, Size: inDegree, Color (blue => yellow => red): PageRank (α = 0.55)
    32. 32. Random Network, Size: inDegree, Color (blue => yellow => red): PageRank (α = 0.70)
    33. 33. Random Network, Size: inDegree, Color (blue => yellow => red): PageRank (α = 0.85)
    34. 34. Random Network, Size: inDegree, Color (blue => yellow => red): PageRank (α = 1)
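The α sweep in the slides above can be reproduced with a minimal power-iteration PageRank. This is a sketch over a small hypothetical four-node graph, not the random network shown in the figures; α is the damping factor, i.e. the probability of following links rather than teleporting to a random node.

```python
def pagerank(links, alpha=0.85, iters=100):
    """Power-iteration PageRank. links: dict node -> list of targets."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # teleportation share, equal for everyone ("one person, one vote")
        new = {u: (1 - alpha) / n for u in nodes}
        for u, targets in links.items():
            if targets:
                share = alpha * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:  # dangling node: spread its rank evenly
                for v in nodes:
                    new[v] += alpha * rank[u] / n
        rank = new
    return rank

# Hypothetical graph: a cycle a -> b -> c -> a, plus d pointing into it.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
for alpha in (0.25, 0.55, 0.85):
    r = pagerank(graph, alpha)
    print(alpha, {k: round(v, 3) for k, v in r.items()})
```

At low α the teleportation term dominates and ranks stay close to uniform; at high α the link structure takes over and the well-connected cycle pulls away from the unlinked node d, illustrating how a single parameter shifts the technique between "one person, one vote" and "patronage of the powerful".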
    35. 35. Parameters Any somewhat complex technique reacts (strongly) to variation in parameters and data. This means that without knowledge of parameters and data, it is hard to understand/critique an algorithm. A single parameter can encode a commitment to a specific theory of power (PageRank at low α is "one person, one vote", at high α "patronage of the powerful"). Parameters are now often set through continuous testing. They are one of the places where empirical practices and operational goals can be brought to converge, automatically.
    36. 36. We move from "what should the formula be according to our ideas about relevance?" to "what has our testing engine identified as the optimal parameters given our operational goal of more user interaction?". Whenever you read "n000 factors", machine learning techniques are at work.
    37. 37. Machine learning techniques (e.g. Bayesian filters, maximum entropy classifiers, etc.) can learn to "interpret" any input signal in relation to categories, based on feedback ("supervision"). In these techniques, the state of the machine (i.e. the statistical model) becomes the algorithm. These self-optimizing, empirical machines are becoming increasingly common.
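One of the techniques named on the slide can be sketched from scratch. A minimal multinomial naive Bayes text classifier (the spam/ham training sentences are invented for illustration): after "supervised" training on labeled examples, the stored word counts, i.e. the state of the machine, are what does the classifying.

```python
from collections import Counter, defaultdict
from math import log

class NaiveBayes:
    """Minimal multinomial naive Bayes classifier with Laplace smoothing."""
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # per-class word counts
        self.class_counts = Counter()            # class frequencies

    def train(self, text, label):
        # "Supervision": each labeled example updates the statistical model.
        self.class_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        words = text.lower().split()
        total = sum(self.class_counts.values())
        vocab = {w for c in self.word_counts.values() for w in c}
        scores = {}
        for label in self.class_counts:
            score = log(self.class_counts[label] / total)  # prior
            n = sum(self.word_counts[label].values())
            for w in words:
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += log((self.word_counts[label][w] + 1) / (n + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

nb = NaiveBayes()
nb.train("win money now", "spam")
nb.train("cheap money offer", "spam")
nb.train("meeting agenda tomorrow", "ham")
nb.train("lunch meeting today", "ham")
print(nb.classify("money offer"))  # spam
```

Note that nothing in the code knows what "spam" means: any input signal receives its meaning purely from the feedback it was trained on, which is the point made on the slide.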
    38. 38. The "risk technology" is trained by associating "thousands of pieces of data" with a probability of defaulting or not defaulting. Every signal receives meaning as predictor for defaulting.
    39. 39. States In digital media, we often need to do precious little to "make things calculable", since everything already has been made so. Algorithms are increasingly empirical knowledge machines that tie the "real world" to operational modes of optimization and validation. The epistemological commitment, then, is no longer to a theory or model, but to a method for generating models. The difference is thus not just between the "editorial" and the "algorithmic" (Gillespie 2012), but also between "editorial algorithms" and "generated algorithms".
    40. 40. "To date, the complexity of mobile and the disparate, closed platforms that dominate it have caused most people to ignore the possibility and benefits of A/B testing. […] To us at Taplytics this is crazy. If you are developing on the web everything is calculated and optimized and viewed in terms of hypotheses, significance levels and confidence intervals. On mobile, however, for the past 6 years we have been living in the era of the 'artform' of mobile apps, where things are viewed in terms of gut feel and shooting from the hip." (Druxerman 2014)
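The "hypotheses, significance levels and confidence intervals" in the quote reduce, in the simplest A/B-testing case, to a two-proportion z-test. A minimal sketch with invented conversion numbers:

```python
from math import sqrt, erf

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate
    significantly different from variant A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)              # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))     # standard error
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical experiment: 2.0% vs. 2.6% conversion on 10,000 users each.
z, p = ab_test(conv_a=200, n_a=10000, conv_b=260, n_b=10000)
print(round(z, 2), p < 0.05)  # 2.83 True
```

Run continuously over live traffic, this machinery is exactly how parameters get set "automatically": the winning variant is whatever moves the operational metric, with no editorial theory of relevance required.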
    41. 41. Since the digital operational environment is fully integrated, data collection, analysis, decision- making, and execution are all folded into one. These are engines of order.
    42. 42. Conclusions Moving from classification to calculation implies a move from "thing concepts" (Dingbegriffe) to "relational concepts" (Relationsbegriffe), or from substance notions of knowledge to functional ones (cf. Cassirer 1910). A good analogue to algorithmic configurations on social media platforms is markets, and in particular multi-sided markets (Rochet and Tirole 2004). Just like markets, algorithmic configurations are "places of truth" (Foucault 2004), not in that they show "the truth" but in that truth is produced as a byproduct of their optimal functioning, e.g. the right price, the right trending topics, the right number and type of stories shown, etc. The right algorithm is the one that produces an optimal equilibrium between user satisfaction and value extraction through advertising.
    43. 43. Conclusions "The current mythology of big data is that with more data comes greater accuracy and truth. This epistemological position is so seductive that many industries, from advertising to automobile manufacturing, are repositioning themselves for massive data gathering." (Crawford 2014) This position is problematic and potentially dangerous if it frames proponents as either naïve ("they don’t know what they are saying") or cynical ("they don't believe what they are saying"). The danger is not that "big data" acolytes are wrong, but that they are right. We should consider this as a real possibility.
    44. 44. Conclusions If they are right, we face a series of really big problems: ☉ If better data + algorithms means better truth, we can expect further concentration and concentric diversification of large Internet companies through tipping markets; ☉ Operational concepts of knowledge and truth would become even more pervasive; ☉ Privacy issues pale compared to the threat of knowledge monopolization and the reconfiguration of publicness according to operational goals that are geared toward profit maximization; ☉ Political institutions and critical forces are woefully unprepared for dealing with algorithmic engines of order, both technically and normatively. "I will argue that democratic talk is not essentially spontaneous but essentially role- governed, essentially civil, and unlike the kinds of conversation often held in highest esteem for their freedom and their wit, it is essentially oriented to problem-solving." (Schudson 1997)
    45. 45. Thank You rieder@uva.nl @RiederB http://thepoliticsofsystems.net https://www.digitalmethods.net