Figures of the Many - Quantitative Concepts for Qualitative Thinking

Slides for a talk given at the Big Data Symposium at Parnassos in Utrecht on April 25 2013.

  1. Figures of the Many
     Quantitative Concepts for Qualitative Thinking
     Bernhard Rieder
     Universiteit van Amsterdam
     Mediastudies Department
  2. Context
     Terms like "big data", "computational social science", "digital humanities", "digital methods", etc. are receiving a lot of attention.
     They point to a set of practices for knowledge production: data analysis, visualization, modeling, etc.
     Instead of a totalizing search for a "logic" of data analysis, we could inquire into the vocabulary of analytical gestures that constitute the practice of data analysis.
     A twofold approach to methods:
     ☉ Engagement, development, application => digital methods
     ☉ Conceptual, historical, and political analysis and critique => software studies
  3. This presentation
     How do we talk about data? How do we analyze them? What is our frame of thought? How do we go further in terms of imagination and expressivity?
     ☉ 1 / Confronting "the many"
     ☉ 2 / Two kinds of mathematics
       ☉ Objects and their properties => Statistics
       ☉ Objects and their relations => Graph theory
     Engage the theory of knowledge (epistemology) mobilized in data analysis, but through the actual techniques rather than through generalizing concepts.
  4. What styles of reasoning?
     Hacking (1991) builds the concept of "style of reasoning" on A. C. Crombie's (1994) "styles of scientific thinking":
     ☉ postulation and deduction
     ☉ experiment and empirical research
     ☉ reasoning by analogy
     ☉ ordering by comparison and taxonomy
     ☉ statistical analysis of regularities and probabilities
     ☉ genetic development
     What kind of reasoning are we mobilizing in data analysis?
     Is the history of styles of reasoning simply intellectual progress, or adaptation to a changing world, or co-constitutive of that world?
     What is our world like?
  5. "It is hard to believe that we still have to absorb the same types of actors, the same number of entities, the same profiles of beings, and the same modes of existence into the same types of collectives as Comte, Durkheim, Weber, or Parson [sic], especially after science and technology have massively multiplied the participants to be cooked in the melting pot." (Latour 2005, 260)
  6. 1 / The many
     The proliferation of actors and the facilitation of transversal connectivity have led to large and complex forms of socio-technical grouping and structuring.
     Forms of organization take the shape of (multi-sided) markets based around technological platforms that facilitate transactions.
     Social media use simple but flexible grammars of connectivity (a combination of point-to-point and list forms), exchange, and aggregation that accommodate various practices and levels of scale.
     The diversity of practices, contents, geographies, topologies, intensities, motivations, etc. makes it hard to generalize and theorize dynamics of use.
  7. Platforms like Twitter boost opportunities for connectivity between various types of actors.
  8. At the same time, they produce detailed data traces that are highly centralized and searchable.
  9. Quality / quantity
     "One of my favorite fantasies is a dialogue between Mills and Lazarsfeld in which the former reads to the latter the first sentence of The Sociological Imagination: Nowadays men often feel that their private lives are a series of traps. Lazarsfeld immediately replies: How many men, which men, how long have they felt this way, which aspects of their private lives bother them, do their public lives bother them, when do they feel free rather than trapped, what kinds of traps do they experience, etc., etc., etc. If Mills succumbed, the two of them would have to apply to the National Institute of Mental Health for a million-dollar grant to check out and elaborate that first sentence. They would need a staff of hundreds, and when finished they would have written Americans View Their Mental Health rather than The Sociological Imagination, provided that they finished at all, and provided that either of them cared enough at the end to bother writing anything." (Maurice Stein, cit. in Gitlin 1978)
     Theory vs. empiricism, macro vs. micro, qualitative vs. quantitative, inductive vs. deductive, associative vs. formalistic, etc.
     The promise of data analysis tools, applied to exhaustive (and cheap) data, is to bridge the gap, to allow zooming, "quali-quanti" (Latour 2010).
  10. Define: data
      “facts and statistics collected together for reference or analysis. See also datum.
      - Computing: the quantities, characters, or symbols on which operations are performed by a computer, being stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
      - Philosophy: things known or assumed as facts, making the basis of reasoning or calculation.” (Oxford American Dictionary)
      Reasoning (OAD): "think rationally", "use one's mind", "calculate", "make sense of", "come to the conclusion", "judge", "persuade", etc.
      Reasoning as "giving reasons": what counts as a good reason? What counts as a good argument? As a proof? What is "good" knowledge?
      Reasoning as a series of techniques, e.g. science, engineering, etc.
  11. Why does the astronaut step into the space shuttle?
  12. A short history of reasoning the "more"
      Commercial Capitalism (13th +): calculating for trade, arithmetic, sharing risk and profit in long-distance commerce
      Rise of the Nation State (17th +): "art of the state", mercantilism, scientific revolution
      Industrialization (19th +): urbanization, scientific management, large bureaucracies
      ☉ Fibonacci, "Liber Abaci", calculating with Arabic numerals (Pisa, 1202)
      ☉ Unknown, "Arte dell'Abbaco", practical arithmetic (Venice, 1478)
      ☉ Pacioli, "Summa de arithmetica, geometria, proportioni et proportionalità", double-entry bookkeeping (Venice, 1494)
      ☉ William Petty & John Graunt, Political Arithmetick (17th century)
      ☉ Hermann Conring & Gottfried Achenwall, Statistik (17th & 18th century)
      ☉ Adolphe Quetelet, statistical regularities and the "average man" (19th century)
      ☉ Francis Galton & Karl Pearson, public health and eugenics (late 19th century)
  13. Liber Abaci, Fibonacci, 1202
      Calculation for accounting, money-changing, insurance, lending, measurement, etc.
  14. "Having proved that there die about 3,506 persons at Paris unnecessarily, to the damage of France, we come next to compute the value of the said damage, and of the remedy thereof, as follows, viz., the value of the said 3,506 at 60 livres sterling per head, being about the value of Algier slaves (which is less than the intrinsic value of people at Paris), the whole loss of the subjects of France in that hospital seems to be 60 times 3,506 livres sterling per annum, viz., 210,360 livres sterling, equivalent to about 2,524,320 French livres." (Petty 1655)
  15. The Assurance of Lives, Charles Babbage, 1826
      The first life tables were assembled in the 17th century by John Graunt.
      Babbage builds a machine to produce tables faster.
  16. Essai sur la statistique de la population française, Adolphe d'Angeville, 1836
      Population census, tax register, house numbers, etc.
      Modern statistics, large bureaucracies, quantitative social sciences, etc.
  17. Some conclusions for part 1
      Over the last centuries, scientific thinking has become the dominant way of producing knowledge and making decisions in most societies.
      Scientific thinking implies various styles of reasoning, different ways of "giving reasons", different analytical gestures, etc.
      Styles are intrinsically connected to our "lifeworld" (Husserl 1936).
      Two diagnoses:
      ☉ Our lifeworld is changing in significant ways => "the many"
      ☉ We need new ways of making sense of it => data analysis
      What is the style of data analysis? Its epistemology? One or many?
      What are its techniques, its analytical gestures?
  18. 2 / Two kinds of mathematics
      Can there be data analysis without math? No.
      Does this imply epistemological commitments? Yes.
      But there are choices, e.g. between:
      ☉ Confirmatory data analysis => deductive
      ☉ Exploratory data analysis (Tukey 1962) => inductive
      There is a fast-growing variety of analytical gestures focusing on large numbers of formalized and classed objects.
  19. 2 / Two kinds of mathematics
                              Statistics                   Graph theory
      Observed:               objects and properties       objects and relations
      Inferred:               relations                    structure
      Data representation:    the table                    the matrix
      Visual representation:  quantity charts              network diagrams
      Grouping:               class (similar properties)   clique (dense relations)
  20. Facebook Page "ElShaheeed", June 2010 – June 2011 (Poell / Rieder, forthcoming)
      7K posts, 700K users, 3.6M comments, 10M likes (tool: netvizz), work in progress!
  21. Statistics
      New media platforms funnel practices into reduced and largely formal "grammars of action" (Agre 1989); data is therefore very clean, very complete, and very detailed.
      It can be imported with great ease into standard packages that come with many analytical gestures built in (R, Excel, SPSS, RapidMiner, etc.).
      Tools are easy, concepts are hard.
  22. Facebook Page "ElShaheeed", June 2010 – June 2011: comment timescatter
  23. Facebook Page "ElShaheeed", June 2010 – June 2011: comment timescatter, log10 y scale
  24. Facebook Page "ElShaheeed", June 2010 – June 2011: comment timescatter, log10 y scale, likes on
  25. Facebook Page "ElShaheeed", June 2010 – June 2011: comment timeline, per day
  26. Facebook Page "ElShaheeed", June 2010 – June 2011: comment timeline, per month
  27. Facebook Page "ElShaheeed", June 2010 – June 2011: page posts by type, per month
  28. Facebook Page "ElShaheeed", June 2010 – June 2011: comparison timeline: comments, posts, comments per post
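Computationally, the per-day and per-month timelines and the comments-per-post comparison are simple aggregations of timestamps. The following is a minimal sketch. It assumes comments and posts are available as Python `datetime` values; the timestamps are invented for illustration and are not taken from the ElShaheeed data. The helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def per_period(timestamps, fmt):
    """Count events per period; fmt is a strftime pattern ('%Y-%m-%d' = day, '%Y-%m' = month)."""
    return Counter(ts.strftime(fmt) for ts in timestamps)


def comments_per_post(comment_times, post_times, fmt="%Y-%m"):
    """Comments divided by posts for every period that contains at least one post."""
    comments = per_period(comment_times, fmt)
    posts = per_period(post_times, fmt)
    return {period: comments[period] / n for period, n in sorted(posts.items())}


# Hypothetical timestamps standing in for page posts and user comments.
posts = [datetime(2010, 6, 10, 12), datetime(2010, 6, 20, 9), datetime(2010, 7, 1, 18)]
comments = [
    datetime(2010, 6, 10, 12, 5), datetime(2010, 6, 10, 13),
    datetime(2010, 6, 21, 8), datetime(2010, 7, 1, 19),
    datetime(2010, 7, 2, 7), datetime(2010, 7, 2, 8),
]

daily = per_period(comments, "%Y-%m-%d")
monthly = per_period(comments, "%Y-%m")
ratio = comments_per_post(comments, posts)
print(sorted(daily.items()))
print(sorted(monthly.items()))
print(ratio)  # {'2010-06': 1.5, '2010-07': 3.0}
```

Periods without any events do not appear in the counters at all. A plotted timeline usually needs those gaps filled with zeros; otherwise quiet days disappear from the chart instead of showing up as dips.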
  29. Facebook Page "ElShaheeed", June 2010 – June 2011: histogram of comment lengths in characters
  30. Facebook Page "ElShaheeed", June 2010 – June 2011: histogram of like count
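A histogram simply counts values per bin, but the choice of bins shapes what the chart shows. The sketch below uses invented values to contrast two binning schemes:

- fixed-width bins, as for comment lengths;
- order-of-magnitude bins, one common way to keep heavy-tailed counts such as likes readable, in the same spirit as the log10 y scale used in the timescatters.

The function names are hypothetical.

```python
from collections import Counter


def linear_histogram(values, bin_width):
    """Fixed-width bins: maps each bin's lower edge to the number of values falling in it."""
    return Counter((v // bin_width) * bin_width for v in values)


def log10_histogram(counts):
    """Order-of-magnitude bins for non-negative integers: bin k holds values in [10**k, 10**(k+1)).

    Zero gets its own bin (key None) because log10(0) is undefined.
    """
    hist = Counter()
    for c in counts:
        # For a positive integer, floor(log10(c)) equals its number of decimal digits minus one;
        # counting digits avoids floating-point rounding at exact powers of ten.
        hist[None if c == 0 else len(str(c)) - 1] += 1
    return hist


# Hypothetical comment lengths (characters) and like counts.
lengths = [12, 48, 50, 95, 140, 210]
likes = [0, 3, 7, 12, 45, 150, 2300]

by_length = linear_histogram(lengths, bin_width=50)
by_magnitude = log10_histogram(likes)
print(dict(by_length))     # {0: 2, 50: 2, 100: 1, 200: 1}
print(dict(by_magnitude))  # {None: 1, 0: 2, 1: 2, 2: 1, 3: 1}
```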
  31. Calculating relationships between variables
      Quetelet 1827, Galton 1885, Pearson 1901
      "Erosion of determinism" (Hacking 1991)
  32. Facebook Page "ElShaheeed", June 2010 – June 2011: scatterplot comments / likes, with standard error
  33. Facebook Page "ElShaheeed", June 2010 – June 2011: scatterplot comments / likes, per post type
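Two standard summaries describe the relationship shown in a comments/likes scatterplot: Pearson's correlation coefficient and a least-squares line. The standard error of the slope expresses how uncertain that line is. The following is a minimal sketch in plain Python on invented numbers. The slide does not say which software or model produced its fit, so this is only an illustration of the technique.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x and the standard error of the slope b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    # Residual sum of squares, with n - 2 degrees of freedom (two fitted parameters).
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ssr / (n - 2) / sxx)
    return a, b, se_b


# Hypothetical posts: comment counts (x) and like counts (y).
comments = [1, 2, 3, 4]
likes = [3, 4, 6, 7]
r = pearson_r(comments, likes)
a, b, se_b = ols_fit(comments, likes)
print(f"r = {r:.3f}, likes = {a:.2f} + {b:.2f} * comments (SE of slope {se_b:.3f})")
```

For the per-post-type view, the same two functions would be applied to each type's subset of posts. With heavy-tailed counts, a few extreme posts can dominate both r and the fitted line.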
  34. 2 / Two kinds of mathematics
                              Statistics                   Graph theory
      Observed:               objects and properties       objects and relations
      Inferred:               relations                    structure
      Data representation:    the table                    the matrix
      Visual representation:  quantity charts              network diagrams
      Grouping:               class (similar properties)   clique (dense relations)
  35. 3 / The mathematics of structure
      Graph theory has a long prehistory; social network analysis starts in the 1930s with Jacob Moreno's work.
      Graph theory is "a mathematical model for any system involving a binary relation" (Harary 1969); it makes relational structure calculable.
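Making a binary relation calculable starts with representation. The same relation can be stored as an edge list or as an adjacency matrix, which is the "matrix" that graph theory pairs against the statistical table. The following is a minimal sketch with an invented friendship relation; the function name is hypothetical. Summing a row turns a relational fact back into a node property, namely its degree.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Return a list-of-lists 0/1 matrix; entry [i][j] is 1 if nodes[i] relates to nodes[j]."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            # A symmetric relation (e.g. friendship) is recorded in both directions.
            matrix[index[v]][index[u]] = 1
    return matrix


# Hypothetical friendship relation among four people.
people = ["ana", "ben", "cem", "dia"]
friendships = [("ana", "ben"), ("ana", "cem"), ("cem", "dia")]

m = adjacency_matrix(people, friendships)
for name, row in zip(people, m):
    print(name, row)

# Row sums: each person's number of friends (degree), usable as a column in a statistics table.
degree = {name: sum(row) for name, row in zip(people, m)}
print(degree)  # {'ana': 2, 'ben': 1, 'cem': 2, 'dia': 1}
```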
  36. Three different force-based layouts of my FB profile: OpenOrd, ForceAtlas, Fruchterman-Reingold
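Force-based layouts treat the graph as a physical system and let it settle. The sketch below illustrates the principle using the Fruchterman-Reingold scheme; it does not describe OpenOrd or ForceAtlas, which use their own force models. The scheme works as follows:

- all nodes repel each other with force k^2/d;
- linked nodes attract with force d^2/k;
- a cooling "temperature" limits how far each node moves per iteration.

The frame size, iteration count, and linear cooling schedule are arbitrary choices for this sketch.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=100, seed=0):
    """Minimal Fruchterman-Reingold layout; returns {node: (x, y)} in a frame centered on (0, 0)."""
    nodes = list(nodes)
    rng = random.Random(seed)
    k = math.sqrt(width * height / len(nodes))  # ideal distance between linked nodes
    pos = {
        v: [rng.uniform(-width / 2, width / 2), rng.uniform(-height / 2, height / 2)]
        for v in nodes
    }
    start_temp = width / 10

    for it in range(iterations):
        temperature = start_temp * (1 - it / iterations)  # linear cooling
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsion between every pair of nodes: magnitude k^2 / d.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f

        # Attraction along edges: magnitude d^2 / k.
        for v, u in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f

        # Move each node along its displacement, at most `temperature` far, and keep it in the frame.
        for v in nodes:
            dx, dy = disp[v]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[v][0] = min(width / 2, max(-width / 2, pos[v][0] + dx / length * step))
            pos[v][1] = min(height / 2, max(-height / 2, pos[v][1] + dy / length * step))

    return {v: (x, y) for v, (x, y) in pos.items()}


# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
layout = fruchterman_reingold("abcdef", edges)
for node, (x, y) in sorted(layout.items()):
    print(f"{node}: ({x:+.2f}, {y:+.2f})")
```

The pairwise repulsion loop costs O(n^2) per iteration, so this naive version does not scale to large graphs. Because the result also depends on the random starting positions, the same graph can yield visibly different pictures.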
  37. Non-force-based layouts: circle diagram, parallel bubble lines, arc diagram
  38. Network statistics
      [Figure: betweenness centrality, degree]
      Relational elements of graphs can be represented as tables (nodes have properties) and analyzed through statistics.
      Network statistics bridge the gap between individual units and the structural forms they are embedded in.
      This is currently an extremely prolific field of research.
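Degree and betweenness centrality are two such node-level statistics derived from structure:

- degree counts a node's neighbours;
- betweenness sums, over all pairs of other nodes, the share of shortest paths between them that pass through the node.

The following is a minimal sketch of Brandes' (2001) algorithm for unweighted, undirected graphs. It returns raw scores, whereas some tools normalize these scores by default. The graph and helper names are hypothetical.

```python
from collections import deque


def undirected(edges):
    """Adjacency sets for an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    return {v: len(neighbours) for v, neighbours in adj.items()}


def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph (unnormalized)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma) and record predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency on s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical graph: a path a-b-c-d plus a pendant node e attached to c.
g = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")])
print(degree(g))
print(betweenness(g))
```

In this example, c has only one more neighbour than b, but it sits on far more shortest paths. This is the kind of difference between local and structural position that network statistics make visible.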
  39. Twitter 1% sample, 24 hours: 4.3M tweets, 3.4M users, 2M accounts mentioned, 227K unique hashtags
  40. Helpful: baseline sampling
      Twitter's API offers a random 1% statuses/sample endpoint that does not require privileged access.
      It provides datasets for researching certain types of questions and allows us to "contextualize" (baseline) other collections.
      We (Gerlitz / Rieder 2013) explored 24 hours of the 1% sample and captured 4,376,230 tweets, sent from 3,370,796 accounts, at an average rate of 50.65 tweets per second, leading to about 1.3GB of uncompressed and unindexed MySQL tables.
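The reported rate follows directly from the capture counts over one day (86,400 seconds). The same counts also give the average number of tweets per account in the sample:

```latex
\frac{4{,}376{,}230\ \text{tweets}}{24 \times 3{,}600\ \text{s}}
  = \frac{4{,}376{,}230}{86{,}400}\ \tfrac{\text{tweets}}{\text{s}}
  \approx 50.65\ \tfrac{\text{tweets}}{\text{s}},
\qquad
\frac{4{,}376{,}230\ \text{tweets}}{3{,}370{,}796\ \text{accounts}}
  \approx 1.30\ \tfrac{\text{tweets}}{\text{account}}
```

Like any mean, the figure of 1.30 tweets per account says nothing about how evenly tweets are spread across accounts.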
  41. A baseline provides reference points
      Beware of averages in non-normal distributions! But the 1% sample is sufficiently large to allow representative exploration of subsamples.
      We can qualify structures and individual elements with the help of statistics and graph theory.
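Averages mislead in skewed distributions. When most users post once and a few post a lot, the mean lands far from the typical case. The following is a small illustration with invented per-user counts, using Python's `statistics` module:

```python
from statistics import mean, median

# Hypothetical tweets per user: nine light users and one very active account.
tweets_per_user = [1, 1, 1, 1, 1, 1, 2, 2, 3, 87]

avg = mean(tweets_per_user)    # 10: pulled up by the single heavy user
mid = median(tweets_per_user)  # 1.0: the typical user
share_below_mean = sum(1 for t in tweets_per_user if t < avg) / len(tweets_per_user)

print(f"mean = {avg}, median = {mid}, share of users below the mean = {share_below_mean:.0%}")
```

In this example, nine out of ten users fall below the "average" user, so for such distributions medians, quantiles, or full histograms are usually more informative than a single mean.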
  42. Twitter 1% sample, co-hashtag analysis
      227,029 unique hashtags, 1,627 displayed (freq >= 50)
      Size: frequency; Color: modularity
  43. Twitter 1% sample, co-hashtag analysis
      227,029 unique hashtags, 1,627 displayed (freq >= 50)
      Size: frequency; Color: user diversity
  44. Twitter 1% sample, co-hashtag analysis
      227,029 unique hashtags, 1,627 displayed (freq >= 50)
      Size: frequency; Color: degree
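In a co-hashtag graph, hashtags are nodes, and two hashtags are linked when they appear in the same tweet, weighted by how often they do. The following is a minimal sketch of that construction. It includes a frequency threshold like the freq >= 50 filter and computes the degree used for coloring. The input format (one list of hashtags per tweet) and the lower-casing are assumptions for illustration, and the modularity and user-diversity colorings are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def co_hashtag_graph(tweets, min_freq=50):
    """Build a co-occurrence graph from hashtag lists (one list per tweet).

    Returns (frequency, edge weights, degree), restricted to hashtags used in at least min_freq tweets.
    """
    freq, pairs = Counter(), Counter()
    for tags in tweets:
        unique = sorted({t.lower() for t in tags})  # count each hashtag once per tweet
        freq.update(unique)
        pairs.update(combinations(unique, 2))       # every unordered pair in the tweet
    kept = {t for t, f in freq.items() if f >= min_freq}
    weights = {(a, b): w for (a, b), w in pairs.items() if a in kept and b in kept}
    degree = Counter()
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    return {t: freq[t] for t in kept}, weights, {t: degree[t] for t in kept}


# Hypothetical tweets; a threshold of 2 stands in for freq >= 50 on this tiny input.
tweets = [["#News", "#sport"], ["#news", "#music"], ["#sport", "#news", "#music"], ["#tbt"]]
frequency, weights, degree = co_hashtag_graph(tweets, min_freq=2)
print(frequency)
print(weights)
print(degree)
```

The threshold is applied to nodes before edges are kept, so a displayed hashtag's degree only counts neighbours that are themselves above the threshold.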
  45. Nine measures of centrality (Freeman 1979)
  46. Label  PR α=0.85  PR α=0.7  PR α=0.55  PR α=0.4  In-Degree  Out-Degree  Degree
      n34    0.0944     0.0743    0.0584     0.0460    4          1           5
      n1     0.0867     0.0617    0.0450     0.0345    1          2           3
      n17    0.0668     0.0521    0.0423     0.0355    2          1           3
      n39    0.0663     0.0541    0.0453     0.0388    5          1           6
      n22    0.0619     0.0506    0.0441     0.0393    5          1           6
      n27    0.0591     0.0451    0.0371     0.0318    1          0           1
      n38    0.0522     0.0561    0.0542     0.0486    6          0           6
      n11    0.0492     0.0372    0.0306     0.0274    3          1           4
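The table lists PageRank (PR) scores at several values of α, the probability that a random surfer follows a link rather than jumping to a random node. As α decreases, the random jump weighs more and, in the limit, all scores approach the uniform value 1/n. Rankings based on link structure can therefore shift: in the table, n38 (in-degree 6) overtakes n34 at α=0.4.

The following is a minimal power-iteration sketch. Nodes without outgoing links, such as n27 and n38, have their rank spread uniformly over all nodes. This is one common convention and an assumption here. The example graph is hypothetical.

```python
def pagerank(edges, alpha=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration on a directed edge list; returns scores summing to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for u, v in edges:
        out_links[u].append(v)

    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is redistributed uniformly (assumed convention).
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = dict.fromkeys(nodes, (1.0 - alpha) / n + alpha * dangling / n)
        for u in nodes:
            for v in out_links[u]:
                new[v] += alpha * rank[u] / len(out_links[u])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical graph: three nodes point to a hub, the hub points back to one of them.
edges = [("x", "hub"), ("y", "hub"), ("z", "hub"), ("hub", "x")]
for alpha in (0.85, 0.7, 0.55, 0.4):
    scores = pagerank(edges, alpha)
    print(alpha, {v: round(s, 4) for v, s in scores.items()})
```

The hub's score drops as α decreases because a growing share of every node's rank comes from random jumps rather than from incoming links.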
  47. Twitter 1% sample, co-hashtag analysis: degree vs. wordFrequency
  48. Twitter 1% sample, co-hashtag analysis: degree vs. userDiversity
  49. Facebook Page "ElShaheeed": 700K nodes, 11M connections; Color: type
  50. Facebook Page "ElShaheeed": 700K nodes, 11M connections; Color: outdegree
  51. Conclusions
      There is a lot of excitement about data analysis, but our understanding of styles and analytical gestures is still very poor.
      We need interrogation and critiques of methodology that are developed from engagement and historical/conceptual investigation.
      We need analytical gestures that are more closely tied to concepts from the humanities and social sciences; exploration rather than confirmation.
      Visualization and simpler tools are very interesting but require technical and conceptual literacy to deliver more than illustrations.
      This is probably not a fad.
  52. "Incite, induce, deviate, make easy or difficult, enlarge or limit, render more or less probable… These are the categories of power." (Deleuze 1986, 77)
  53. Thank You
      rieder@uva.nl
      https://www.digitalmethods.net
      http://thepoliticsofsystems.net
      "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (Tukey 1962)
