Escaping greatdivide coimbra

A long conference and a workshop that I gave (with Paul Girard) at the University of Coimbra in the framework of the project "The Importance of Being Digital". The theme of the conference was how digital methods help overcome several classic binary oppositions of traditional social sciences.

  • Even more importantly, the IssueCrawler marked a key turn in the relation between social sciences and digital media. Until a few years ago, social scientists conceived of electronic media as nothing more than new terrains for old methods. Notions such as “cyber-culture” (Negroponte, 1996), “virtual communities” (Rheingold, 2000) and “online identities” (Turkle, 1995) were introduced to channel the novelty of new media within the tradition of social sciences.
  • Yes, the cartography of controversy has a liking for digital techniques. Yet the enthusiasm for digital innovation should be tempered by three cautions that are crucial to keep in mind in order not to be carried away by the digital hype: 1. Google is not the world 2. More data means more noise 3. Digital data is not your data
  • The first caution is the easiest to understand and can be summarized as follows: “It takes more than a Google search to map a controversy”. When handling digital data, always consider what they represent and keep in mind these four simple facts (see the slide):
    (1) Even if portals and search engines are constantly expanding their databases, they cannot grow as fast as the web. Every day hundreds of thousands of new pages are created and only a fraction of them is reached by the engines' robots. Sometimes content remains invisible because it is too marginal or ephemeral, sometimes because it is concealed by its authors, sometimes because it is simply forgotten.
    (2) Even if more and more information is exchanged via the hypertext transfer protocol (HTTP) and in the form of HTML pages, a large slice of electronic traffic travels through other routes. E-mails, teleconferences, chats, peer-to-peer exchanges, document transfers and many other data do not transit via web protocols.
    (3) Not all digital information is shared on a computer network and not all networks are connected to the Internet. For every piece of information diffused on the Internet, hundreds of others are buried inside the memory of offline computers or limited to LANs.
    (4) Even if computers are more and more ubiquitous in western societies, important portions of collective life remain impermeable to digital mediation. No matter how pervasive technology becomes, face-to-face interactions will not lose their importance. Last but not least, the world is bigger than western societies (especially in an age of globalization) and other societies are proving to be much more resistant to digital penetration.
  • The second caution has to do with the fact that having more traces of collective life does not immediately mean having more data on collective life. As Bruno Latour (1993) has observed, despite the etymology of the word data (datum in Latin means given), “on ne devrait jamais parler de ‘données’ mais toujours de ‘obtenues’” (“one should never speak of ‘data’ but always of ‘obtained’”). Data are never given: they need to be extracted, cleaned, indexed and prepared for analysis. In other words, it is necessary to separate the information from the noise.
  • This work is sometimes called ‘data mining’ and this metaphor should be taken very seriously. Anyone who has ever visited a gold mine knows that what is striking about this type of landscape is the feeling of absence that dominates it. Where a mountain is supposed to be, there is instead a huge hole. Describing mining as the act of collecting gold and other precious materials is mistaking the aim for the practice. 0.1% of mining is about collecting precious substances; 99.9% of it is about removing tons and tons of rock, sand and earth. Gold is the product of such absence, what is left when everything else is gone.
  • The same is true for information mining: it is not about collecting as much data as possible (that should be called ‘compulsive hoarding’); it is about getting rid of most of it. This is important, because the current ‘data deluge’ ideology, obsessed as it is with the question of collecting, storing and exploiting data, forgets that the careful selection of data is the most important part of every scientific protocol.
  • A good example of the importance of selection comes from the comparison of two maps of the Web. The first is the so-called Internet map (http://internet-map.net). This impressive map is, to our knowledge, the largest publicly available map of the Web. Aiming at exhaustiveness, this map is both vain (because the Web is so big and changes so quickly that no map will ever capture more than a tiny fraction of it) and useless (because little knowledge can be extracted from it). All that we can see is that the Web is polarized by language (the color of the nodes) and that some nodes are (far) more connected than the others (the size of the nodes). None of this is a surprise.
  • A good map of the Web is always limited in its ambition: it tries to represent a limited portion of the Web, and the better this portion is delimited, the better the map. A convincing example of this strategy is the map of the French political blogosphere, realized by Linkfluence for Le Monde (politicosphere.blog.lemonde.fr).
  • Because the selection of the websites has been done carefully, it is possible to use this map as a research tool and discover, for example, that the extreme left and the extreme right have two very different positions in French online politics: the first is small, spread out and central; the second is massive, clustered and eccentric.
  • The third caution has to do with the fact that, unlike the data obtained through the traditional methods of social sciences, digital data are generally not collected for the sake of social science. In most cases, digital data are collected for the needs of marketing (as in the case of loyalty cards or credit cards), surveillance (as in the case of border crossings), administration (as in the case of cadastral maps) or technical optimization (as in the case of travel ticketing or server logs). In any case, these are second-hand data whose construction is not controlled by the researchers.
    Before using these data, it is therefore necessary to question the conditions of their production. If, for example, the World Bank publishes all its statistical data (data.worldbank.org), one should try to find out how these figures have been produced and why they have been made public. If WikiLeaks releases thousands of diplomatic cables from US embassies (--- ADD REFERENCE ---), one should ask whether the use of these data is morally correct. If Wikipedia opens an API that allows downloading the edit history of all its pages (mediawiki.org/wiki/API), one should reflect on the specific practices of this online community (Viegas, Wattenberg, Kriss, & Van Ham, 2007).
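As a rough illustration of what "downloading the edit history" looks like in practice, the sketch below only builds (and does not send) a request for the latest revisions of a page. It assumes the English Wikipedia endpoint and uses the standard query parameters of the MediaWiki Action API; the page title and the helper name `revision_history_url` are hypothetical choices made for this example.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: we query the English-language Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def revision_history_url(title, limit=50):
    """Build, without sending it, a query for the latest revisions of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|comment",  # who edited, when, and with what comment
        "rvlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Hypothetical page title, chosen only for illustration.
url = revision_history_url("Global warming")
query = parse_qs(urlparse(url).query)
print(url)
```

What such a request returns is a trace of the community's editing practices (bots, reverts, edit comments), which is exactly why those practices need to be understood before the numbers are interpreted.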
  • A few years ago, Chris Anderson published a controversial article in the magazine Wired, in which he argued for The End of Theory:
    “At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right. Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content”.
    This argument is misleading for the reason I gave above: learning something from digital traces requires separating information from noise. But things are even more complicated, because there is no way to tell what is information and what is noise without knowing how the traces have been constructed.
  • An example will make my argument clearer. Some years ago, I was struggling with some colleagues to make sense of Google Insights for Search data and use them for social research. Reading the literature, we stumbled on an amazing discussion paper by Askitas and Zimmermann (2011), in which the two economists claimed to have found a striking correlation between the unemployment rate and searches for the side effects of antidepressants. The result was compelling: when the unemployment rate began to rise because of the economic crisis of 2008, so did the queries for antidepressants' side effects.
  • Trying to reproduce these findings, however, we noticed something strange: it was not the names of the antidepressants that matched unemployment, but the expression ‘side effects’. At first we thought that people might take more medicines in general when they lose their jobs, but then we found out that other words had the same curve, in particular the word ‘template’, which also started being searched more at the end of 2008.
  • We were struggling to make sense of this when it occurred to us that in late 2008 Google enabled its ‘suggest’ feature by default. This feature is meant to auto-complete common search expressions: when you ask Google about a dish, it will ask you if you want to know about its recipe; when you ask about a motivational letter, it will ask you if you are looking for a template; and when you ask about a drug, it will ask you if you want to know about its side effects.
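The trap can be reproduced with a few lines of arithmetic. The sketch below uses synthetic monthly series invented for illustration (they are not the Askitas and Zimmermann data): an unemployment rate that starts rising in month 13, and a search volume for ‘template’ that simply jumps to a new level in the same month, as it would if a change in the search interface inflated it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic data (assumption): 12 flat months, then a steady rise.
unemployment = [5.0] * 12 + [5.0 + 0.4 * i for i in range(1, 13)]

# Synthetic data (assumption): searches jump once, then fluctuate at random.
template_searches = [40, 42, 39, 41, 40, 43, 41, 40, 42, 39, 41, 40,
                     60, 62, 61, 63, 60, 62, 64, 61, 63, 62, 60, 63]

# Over the whole period the two series look strongly related...
r_whole_period = pearson(unemployment, template_searches)

# ...but after the jump, searches do not follow unemployment at all closely.
r_after_jump = pearson(unemployment[12:], template_searches[12:])

print(round(r_whole_period, 2), round(r_after_jump, 2))
```

The correlation over the whole period is carried almost entirely by the single jump. Without knowing that the instrument changed at that moment, there is no way to tell this artifact from a social phenomenon.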
  • The main aim of this course is to teach you how to avoid jumping from the frying pan of positivism into the fire of relativism.
    Or, as they say in Thailand, escape a tiger, meet a crocodile.
  • In the next chapter, we will see how the power of networks as tools for computing, visualizing and manipulating information, combined with the growing availability of data brought by digital traceability, could transform the very roots of social sciences. The advantages of networks, however, should not lead us to neglect the many differences that exist between actor-network theory and network analysis. Four in particular make classic network analysis unfit to operationalize actor-network theory.
    The first and possibly the most important is that while in ANT ‘networks’ and ‘actors’ are the same thing, in network analysis they have completely different properties: while nodes are indivisible and impenetrable (as atoms were supposed to be in physics, before smaller elementary particles took their place), networks are by definition composite. The second and third difficulties come from the lack of differentiation of standard graph theory. In ANT different associations can have different effects (opposing someone does not have the same effect as supporting him/her), while in network analysis edges can be of different types but they all have the same mathematical effect (possibly with different weights or different directions). Likewise, whereas in ANT actors differ in their potential for association (remember the example of the shepherd, the dog and the fence, which are capable of associating with the sheep in very different ways), in network analysis all nodes connect in the same way. Finally, ANT is a theory of change: what counts in it is the transformation of the actors and their relations. Network analysis, at least in its standard form, has been developed for static networks and handles dynamics very badly.
  • But networks are also maps. One of the first proofs of this was provided in 1933, when the sociologist Jacob Moreno published this image in The New York Times. The network portrays the relations of friendship in an elementary school. The title of the article reads “Emotions Mapped by a New Geography”, explicitly stating that the purpose of the visualization is to represent social relations as in a geographical map. Once you know that the triangles in the image represent the boys of the class and the circles represent the girls, the gender separation becomes evident, as well as the first (romantic?) relationship within the class.
  • Networks can be interpreted as geographical maps because the proximity of their points is significant: it means something. Of course there is a capital difference between geographical maps and networks. In the former, the position of the points depends on a system of coordinates defined before and independently of the points. In the latter, on the contrary, it is the nodes and their relations that define a space that has no autonomous existence.
    The clearest illustration of this difference can be drawn from the history of underground maps. Until the 1930s, underground maps were designed by placing the stations according to their geographical coordinates and then drawing the lines that connected them.
  • Then came Harry Beck, who understood that he could improve legibility by positioning nodes according to their connectivity, rather than their coordinates. Nowadays all underground maps are designed this way. This does not mean, of course, that distance in underground maps has lost all meaning: only that its meaning has changed from a geographical distance to a distance in connectivity.
  • In this course, however, we will spatialize networks by using a set of algorithms called ‘force-vector’. These algorithms work by arranging the nodes in space through the simulation of a physical system in which nodes repulse each other while arcs bind them like springs.
  • When the algorithm is launched, the nodes are moved by the opposite forces until they reach a situation of equilibrium.
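To make the principle concrete, here is a minimal sketch of a force-vector layout. It is not ForceAtlas2: it assumes the simpler force formulas commonly attributed to Fruchterman and Reingold (repulsion between every pair of nodes, attraction along every edge) and a linearly decreasing step size, and the small two-cluster graph is invented for the example.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, seed=42):
    """Place nodes in 2D: all pairs repel, connected pairs attract like springs."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0  # assumed ideal distance between connected nodes
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes each other apart.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                push = k * k / d
                disp[u][0] += dx / d * push
                disp[u][1] += dy / d * push
                disp[v][0] -= dx / d * push
                disp[v][1] -= dy / d * push
        # Attraction: every edge pulls its two ends together.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            pull = d * d / k
            disp[u][0] -= dx / d * pull
            disp[u][1] -= dy / d * pull
            disp[v][0] += dx / d * pull
            disp[v][1] += dy / d * pull
        # Move each node, capping the step so the system settles down.
        max_step = 0.1 * (1.0 - step / iterations)
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, max_step)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return pos


def distance(pos, u, v):
    return math.hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])


# Hypothetical graph: two triangles joined by a single bridge edge (c, d).
nodes = ["a", "b", "c", "d", "e", "f"]
triangle_edges = [("a", "b"), ("b", "c"), ("a", "c"),
                  ("d", "e"), ("e", "f"), ("d", "f")]
edges = triangle_edges + [("c", "d")]

pos = force_layout(nodes, edges)
intra = sum(distance(pos, u, v) for u, v in triangle_edges) / len(triangle_edges)
cross_pairs = [("a", "e"), ("a", "f"), ("b", "e"), ("b", "f")]
cross = sum(distance(pos, u, v) for u, v in cross_pairs) / len(cross_pairs)
print(intra < cross)
```

Nothing in the input says where the nodes should go. Their final proximity is produced entirely by their connections, which is what allows distance in the resulting image to be read as distance in connectivity.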
  • Back in the ‘Overview’ window, there are three main palettes that we will employ in the analysis: 1. the ‘Layout’ palette, to change the position of the nodes; 2. the ‘Ranking’ palette, to change the size of the nodes; 3. the ‘Partitions’ palette, to change the color of the nodes.
  • To identify the clusters, therefore, the first thing to do is to spatialize the network using a force-vector algorithm, in our case the ForceAtlas 2 layout. This algorithm can be tweaked by changing several parameters, the most important of which are:
    - LinLog mode (maximizes the legibility of clusters)
    - Prevent overlap (enhances legibility, but distorts spatialization)
    - Scaling (increases/decreases all distances proportionally)
    - Gravity (pulls everything towards the center and prevents dispersion, but distorts spatialization)
    - Approximate repulsion (reduces the time required to spatialize large graphs, but distorts spatialization)
  • … it is easy to identify the areas that contain few or no nodes, also called structural holes …
  • - Central clusters (located in the middle of the network), because centrality in a spatialized graph is a sign of high and highly diverse connectivity.
    - Bridging clusters (located in between two clusters), because these clusters play a crucial role in allowing the circulation of things in the network.
  • - The in-degree, corresponding to the number of incoming edges (the number of connections pointing toward the node). The in-degree of a node is also loosely called its ‘authority score’, because receiving many connections is generally correlated with the fact that the node is considered ‘important’ or ‘remarkable’ by the other nodes of the network.
  • - The out-degree, corresponding to the number of outgoing edges (the number of connections starting from the node). The out-degree of a node is also loosely called its ‘hub score’. Hubs are important in networks because they play a crucial role in the circulation of information.
    Of course, in-degree and out-degree can only be computed in directed graphs (graphs in which the connections have a direction). In non-directed graphs (such as a graph of friendship, if we assume that friendship is always mutual), it is however possible to compute the degree of the nodes (the number of edges connected to each node).
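The three measures are simple counts, as the following sketch shows. The edge list is a hypothetical toy web of four sites, written as (source, target) pairs; the function name `degrees` is ours.

```python
from collections import Counter


def degrees(edges):
    """Count incoming and outgoing edges of each node in a directed edge list."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


# Hypothetical directed graph: each pair reads "source links to target".
edges = [
    ("blogA", "news"), ("blogB", "news"), ("blogC", "news"),
    ("portal", "blogA"), ("portal", "blogB"), ("portal", "news"),
]

in_degree, out_degree = degrees(edges)

authority = max(in_degree, key=in_degree.get)   # node receiving most links
hub = max(out_degree, key=out_degree.get)       # node originating most links

# Ignoring direction, the degree of a node is the sum of the two counts.
degree = {n: in_degree[n] + out_degree[n] for n in set(in_degree) | set(out_degree)}
print(authority, hub, degree["blogA"])
```

In this toy graph the site that receives the most links and the site that sends the most links are different nodes, which is why the two rankings are worth looking at separately.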

  • The second window is the ‘Data Laboratory’ where you have a table view of the nodes and the edges of your graph and their attributes.
  • But it is also interesting to observe if topology and classification are consistent (if most of the nodes of a given type are located within the same clusters and, conversely, if clusters are formed by nodes of the same type).
  • If topology and classification are consistent, it is then interesting to zoom in on the exceptions and have a closer look at the nodes that have an unusual position compared to the other nodes of the same type.
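In visual network analysis this check is done by eye, but the reasoning can be sketched in code. The example below assumes that each node has already been assigned to a cluster (for instance by reading the spatialized map) and to a type (the partition used for coloring). All node names, cluster labels and types are hypothetical.

```python
from collections import Counter, defaultdict


def misplaced_nodes(cluster_of, type_of):
    """Return the share of nodes matching their cluster's majority type,
    and the list of nodes that do not (the 'exceptions')."""
    members_by_cluster = defaultdict(list)
    for node, cluster in cluster_of.items():
        members_by_cluster[cluster].append(node)

    exceptions = []
    for members in members_by_cluster.values():
        type_counts = Counter(type_of[n] for n in members)
        majority_type = type_counts.most_common(1)[0][0]
        exceptions.extend(n for n in members if type_of[n] != majority_type)

    consistency = 1 - len(exceptions) / len(cluster_of)
    return consistency, sorted(exceptions)


# Hypothetical reading of a map: two clusters, two types of actors.
cluster_of = {"n1": "A", "n2": "A", "n3": "A", "n4": "B", "n5": "B", "n6": "B"}
type_of = {"n1": "ngo", "n2": "ngo", "n3": "company",
           "n4": "company", "n5": "company", "n6": "company"}

consistency, exceptions = misplaced_nodes(cluster_of, type_of)
print(consistency, exceptions)
```

A high consistency with a short list of exceptions is the interesting case: the exceptions are the ‘misplaced nodes’ that deserve a closer, qualitative look.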
  • The most important thing to understand about Table2Net is the three types of networks that it can generate. In the slides you see some examples of tables and the networks that can be extracted from them.
  • The heatmaps of references tend to be concentrated in specific zones of the graph (e.g. Turing’s famous article on morphogenesis, Fig. 2a).

    Specialized institutions are also extremely focused (e.g. the Imperial College Blackett Lab, Fig. 2b).

    Keywords polarize the network rather than clustering it: they differentiate fields but also contribute to the overall coherence of the network. Accordingly, their ego-centered maps, although denser in specific zones, tend to span the whole network (e.g. “magnetic properties”, Fig. 2c).

    Finally, non-specialized institutions play the role of bridges connecting to the farthest zones of the graph (e.g. ETH Zurich, Fig. 2d).
  • Escaping greatdivide coimbra

    1. 1. Escaping the Great Divide How actor-network theory, digital methods and network analysis can make us sensitive to the differences in the density of associations Tommaso Venturini
    2. 2. Today’s special menu 1. Beyond the intensive / extensive discontinuity 2. Beyond the aggregating / situating discontinuity 3. Beyond the micro / macro discontinuity 4. Feeling the density of association 5. Visual Network Analysis 6. The médialab’s toolbox
    3. 3. Follow the White Rabbit why controversy mapping (and digital methods) will change everything you know about sociology Tommaso Venturini tommaso.venturini@sciences-po.fr The strabismus of social sciences Photo credit – tarout_sun via Flickr - ©
    4. 4. 3 discontinuities • 1. In data: intensive data / extensive data • 2. In methods: situating / aggregating • 3. In theory: micro-interactions / macro-structure
    5. 5. Part I Data: intensive / extensive
    6. 6. The quali/quantitative divide poor data on large population extensive data intensive data rich data on small population
    7. 7. The media as an object of study Photo credit – Brandon Doran via Flickr - ©
    8. 8. The media as carbon paper Chris Harrison Internet connections
    9. 9. The rise of digital methods Virtual reality Late ’80s to early ’90s (Barlow, Turkle, Negroponte, Rheingold) Virtual society? 1997-2002 (Steve Woolgar et al.) Cultural analytics 2007 (Lev Manovich) Digital methods 2009 (Richard Rogers) https://soundcloud.com/mit-cmsw/richard-rogers-digital-methods
    10. 10. Extensive data Paul Butler, 2010 Visualizing Friendships
    11. 11. Intensive data AOL user 711391 search history www.minimovies.org/documentaires/view/ilovealaska
    12. 12. Extensive and intensive data Google Flu www.google.org/flutrends
    13. 13. Extensive and intensive data Google Flu www.google.org/flutrends
    14. 14. Extensive and intensive data Google Flu www.google.org/flutrends
    15. 15. Beware! 1. Google is not the world 2. More data means more noise 3. Digital data is not your data
    16. 16. It takes more than Google to map a controversy 1. search engines are not the web 2. the web is not the Internet 3. the Internet is not the digital 4. the digital is not the world
    17. 17. Beware: more data means more noise!
    18. 18. Taking “data mining” seriously Yanacocha Gold Mine, Cajamarca, Peru
    19. 19. Compulsive hoarding
    20. 20. An (pseudo-) exhaustive map of the Web http://internet-map.net
    21. 21. A good map of the Web politicosphere.blog.lemonde.fr
    22. 22. A good map of the Web politicosphere.blog.lemonde.fr
    23. 23. Beware: digital data is not your data!
    24. 24. The end of theory? “This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.” Chris Anderson http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
    25. 25. Beware: more data means more noise! Askitas, N., & Zimmermann, K. (2011). Health and Well-Being in the Crisis. IZA Discussion Paper
    26. 26. Beware: more data means more noise!
    27. 27. Beware: more data means more noise! http://googlesystem.blogspot.fr/2008/08/google-suggest-enabled-by-default.html
    28. 28. Part II Methods: situating / aggregating
    29. 29. (Collective) life is complicated Andreas Gursky 1999 Chicago, Board of Trade II
    30. 30. Situating VS aggregating
    31. 31. La fabrique de la loi http://www.lafabriquedelaloi.fr
    32. 32. http://contropedia.net/demo Contropedia
    33. 33. Borra, E., Weltevrede, E., Ciuccarelli, P., Kaltenbrunner, A., Laniado, D., Magni, G., Mauri, M., Rogers, R. and Venturini, T. (2014). Contropedia - the analysis and visualization of controversies in Wikipedia articles. In OpenSym ’14: The International Symposium on Open Collaboration Proceedings. http://contropedia.net/demo Contropedia
    34. 34. http://www.climaps.eu EMAPS (climaps.eu)
    35. 35. 2014 - Venturini, T., Baya-laffite, N., Cointet, J., Gray, I., Zabban, V., & De Pryck, K. Three Maps and Three Misunderstandings: A Digital Mapping of Climate Diplomacy. Big Data & Society, 1:1 EMAPS (climaps.eu) http://www.climaps.eu
    36. 36. Part III Theory: micro-interactions / macro-structure
    37. 37. The micro/macro distinction Merian & Jonston 1718 Folio Ants, Colony, Nest, Insects Thomas Hobbes, 1651 The Leviathan
    38. 38. What micro/macro means An ontological fracture The collective self is not a simple epiphenomenon of its morphologic base, precisely as the individual self is not a simple efflorescence of the nervous system. For the collective self to appear, a sui generis synthesis of individual self has to be produced. This synthesis creates a world of feelings, ideas, images that, once come to life, follow their own laws. An emergent fracture In certain historical periods, social interactions become much more frequent and active. Individuals seek one another out and come together more. The result is the general effervescence that is characteristic of revolutionary or creative epochs… This stimulating action of society is not felt in exceptional circumstances alone. There is virtually no instant of our lives in which a certain rush of energy fails to come to us from outside ourselves. Emile Durkheim, 1912 Les formes élémentaires de la vie religieuse
    39. 39. What micro/macro hides http://zgrossbart.github.io/hborecycling/ the ontological fracture hides other (more relevant) fractures
    40. 40. What micro/macro hides http://zgrossbart.github.io/hborecycling/ the ontological fracture hides other (more relevant) fractures The emergent fracture hides the work to build and maintain it http://en.wikipedia.org/wiki/Maxwell's_demon
    41. 41. What is disorder I am personally rather tolerant of disorder. But I always remember how unrelaxed I felt in a particular bathroom which was kept spotlessly clean in so far as the removal of grime and grease was concerned. It had been installed in an old house in a space created by the simple expedient of setting a door at each end of a corridor between two staircases. The decor remained unchanged: the engraved portrait of Vinogradoff, the books, the gardening tools, the row of gumboots. It all made good sense as the scene of a back corridor, but as a bathroom – the impression destroyed repose. Mary Douglas (1966) Purity and Danger
    42. 42. What is disorder In chasing dirt, in papering, decorating, tidying we are not governed by anxiety to escape disease, but are positively re-ordering our environment, making it conform to an idea. There is nothing fearful or unreasoning in our dirt-avoidance: it is a creative movement, an attempt to relate form to function, to make unity of experience. If this is so with our separating, tidying and purifying, we should interpret primitive purification and prophylaxis in the same light. Mary Douglas (1966) Purity and Danger
    43. 43. From boundaries to boundary work Fences make good neighbors Gieryn, Thomas F. (1983) Boundary-work and the demarcation of science from non-science. American Sociological Review 48(6): 781–795 Demarcation is as much a practical problem for scientists as an analytical problem for sociologists and philosophers
    44. 44. The lesson of ANT It is not that in collective life there are no boundaries (between micro and macro, science and politics…) It is that all boundaries are constantly constructed, de-constructed and re-constructed Social researchers cannot take social boundaries for granted, for their job is to study such work of (de-/re-)construction (Venturini, T. (2010). Diving in magma: how to explore controversies with actor-network theory. In Public Understanding of Science, 19(3), 258–273. )
    45. 45. In the Presence of the Holy See UNRWA photo archive image of Dheisheh refugee camp after the 1948 partition juxtaposed with T. Habjouqa’s 2012 photo of Israel’s wall near Beit Hanina, Jerusalem.
    46. 46. Part IV Becoming sensitive to the differences in the density of association
    47. 47. 3 discontinuities • 1. In data: intensive data / extensive data • 2. In methods: situating / aggregating • 3. In theory: micro-interactions / macro-structure
    48. 48. Overcoming the 3 discontinuities • 1. In data: intensive data / extensive data Digital traceability and computation (data scientists) • 2. In methods: situating / aggregating Datascape navigation (designers) • 3. In theory: micro-interactions / macro-structure A non-emergentist theory of action (actor-network theorist)
    49. 49. The fabric of (cooked) rice Roland Barthes (1970) The Empire of Signs Cooked rice (whose absolutely special identity is attested by a special name, which is not that of raw rice) can be defined only by a contradiction of substance; it is at once cohesive and detachable; its substantial destination is the fragment, the clump; the volatile conglomerate… it constitutes in the picture a compact whiteness, granular (contrary to that of our bread) and yet friable: what comes to the table, dense and stuck together, comes undone at a touch of the chopsticks, though without ever scattering, as if division occurred only to produce still another irreducible cohesion (pp. 12-14).
    50. 50. Why are we so fascinated by networks? Paul Butler, 2010 Visualizing Friendships
    51. 51. A network (graph) is not a network (actor-network). Actor-Network Theory ≠ Complex Network Analysis: (1) actors and networks have the same properties (they are the same) ≠ networks are composite while nodes are indivisible and uncombinable; (2) different mediations (can) have different effects ≠ all edges have the same effect (possibly with different weights); (3) different actors (can) have different association potential ≠ all nodes have equal linking potential; (4) actor-networks are always seen from one or more specific viewpoints ≠ networks are usually seen from above/outside; (5) what counts is change ≠ networks are static
    52. 52. A network (graph) is not a network (actor-network)
    53. 53. A question of resonance A diagram of a network, then, does not look like a network but maintain the same qualities of relations – proximities, degrees of separation, and so forth – that a network also requires in order to form. Resemblance should here be considered a resonating rather than a hierarchy (a form) that arranges signifiers and signified within a sign (p. 24). Munster, A. (2013). An Aesthesia of Networks Cambridge Mass.: MIT Press
    54. 54. Networks
    55. 55. Mathematical networks analysis Euler, 1736, Solutio problematis ad geometriam situs pertinentis
    56. 56. Visual networks analysis
    57. 57. The fabric of collective life Jacob L. Moreno, April 3, 1933 The New York Times Social life is continuous but not homogenous Doing social research is becoming sensitive to the differences in the density of association
    58. 58. Network as maps London Underground 1920 map homepage.ntlworld.com/clivebillson/tube/tube.html - www.fourthway.co.uk/tfl.html
    59. 59. Network as maps London Underground 1933 map (Harry Beck) homepage.ntlworld.com/clivebillson/tube/tube.html - www.fourthway.co.uk/tfl.html
    60. 60. Force-vector algorithms
    61. 61. Force-vectors’ magic trick
    62. 62. Force-vectors’ magic trick Jacomy, M., Venturini, T., Heyman, S. & Bastian, M. (2014) ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software. PlosONE, 9:6
    63. 63. Force-vector: Yes, but which?
    64. 64. Can we trust force-vectors? No / Yes
    65. 65. Can we trust force-vectors? No / Yes ?!?
    66. 66. Part V Visual Network Analysis
    67. 67. Semiology of graphics Bertin J., Sémiologie graphique, Paris, Mouton/Gauthier-Villars, 1967
    68. 68. Visual (aka preattentive) variables
    69. 69. Visual variables A B C
    70. 70. A. nodes position – layout B. nodes size – ranking C. nodes color – partitions 3 visual variables of analysis Gephi.org
    71. 71. Visual network analysis questions A. Position (force-vector spatialization) 1. Nodes density Where are structural holes (under-populated regions)? Where are clusters and sub-clusters (over-populated regions)? Which are the largest and most cohesive clusters? 2. Relative position Which nodes/clusters are globally and locally central? Which nodes/clusters are global and local bridges (between clusters)? B. Size (ranking by in-degree / out-degree) 3. Nodes connectivity Which nodes are the authorities (receive most connections)? Which nodes are the hubs (originate most connections)? C. Color (color by partition) 4. Distribution Is typology coherent with topology (partitions coincide with clusters)? Which are the exceptions (‘misplaced nodes’)?
    72. 72. Technical step: Spatialization with ForceAtlas 2 • LinLog mode (maximizes the legibility of clusters) • Prevent overlap (enhances legibility, but distorts spatialization) • Scaling (increases/decreases all distances proportionally) • Gravity (pulls everything towards the center, prevents dispersion, but distorts spatialization) • Approximate repulsion (accelerates spatialization on large graphs, but distorts spatialization)
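To make the scaling and gravity settings above more concrete, here is a minimal sketch of a generic force-directed layout in plain Python. It is not ForceAtlas 2 and not Gephi's code: it has no LinLog mode, no overlap prevention and no approximate repulsion, and every name in it (`force_layout`, `scaling`, `gravity`, `step`, `max_move`) is an assumption chosen for illustration. It only shows the general idea: all nodes repel each other, edges attract, and a weak gravity pulls everything towards the center.

```python
import math
import random
from itertools import combinations, product


def force_layout(nodes, edges, iterations=300, scaling=1.0, gravity=0.05,
                 step=0.05, max_move=0.1, seed=0):
    """Toy force-directed layout (illustrative only, not ForceAtlas 2)."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart, more strongly when close.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = scaling / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction: connected nodes pull together, proportionally to distance.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += dx
            force[a][1] += dy
            force[b][0] -= dx
            force[b][1] -= dy
        # Gravity pulls towards the origin; each move is capped for stability.
        for n in nodes:
            fx = force[n][0] - gravity * pos[n][0]
            fy = force[n][1] - gravity * pos[n][1]
            size = math.hypot(fx, fy)
            if size > 0:
                move = min(step * size, max_move)
                pos[n][0] += fx / size * move
                pos[n][1] += fy / size * move
    return pos


def mean_distance(pairs, pos):
    """Average Euclidean distance over a list of node pairs."""
    return sum(math.dist(pos[a], pos[b]) for a, b in pairs) / len(pairs)


# Two triangles joined by a single bridge edge (made-up example data).
group_a = ["a1", "a2", "a3"]
group_b = ["b1", "b2", "b3"]
edges = (list(combinations(group_a, 2)) + list(combinations(group_b, 2))
         + [("a3", "b1")])
pos = force_layout(group_a + group_b, edges)

intra_mean = mean_distance(
    list(combinations(group_a, 2)) + list(combinations(group_b, 2)), pos)
inter_mean = mean_distance(list(product(group_a, group_b)), pos)
```

In this toy setting, nodes of the same triangle should end up closer to each other than to nodes of the other triangle, which is the property that makes clusters readable as dense regions of the layout.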
    73. 73. Visual network analysis questions A. Position (force-vector spatialization) 1. Nodes density Where are structural holes (under-populated regions)? Where are clusters and sub-clusters (over-populated regions)? Which are the largest and most cohesive clusters? 2. Relative position Which nodes/clusters are globally and locally central? Which nodes/clusters are global and local bridges (between clusters)? B. Size (ranking by in-degree / out-degree) 3. Nodes connectivity Which nodes are the authorities (receive most connections)? Which nodes are the hubs (originate most connections)? C. Color (color by partition) 4. Distribution Is typology coherent with topology (partitions coincide with clusters)? Which are the exceptions (‘misplaced nodes’)?
    74. 74. Reading principle: Identify regions where the density of nodes is - lower (structural holes) - higher (clusters) Questions: - Where are structural holes? - Where are clusters and sub-clusters? - Which clusters are most represented in the network? - Which clusters are most cohesive? A.1. Position: nodes density
    75. 75. Main cluster and structural holes
    76. 76. Sub-clusters
    77. 77. Modularity
    78. 78. Denser and larger clusters
    79. 79. Visual network analysis questions A. Position (force-vector spatialization) 1. Nodes density Where are structural holes (under-populated regions)? Where are clusters and sub-clusters (over-populated regions)? Which are the largest and most cohesive clusters? 2. Relative position Which nodes/clusters are globally and locally central? Which nodes/clusters are global and local bridges (between clusters)? B. Size (ranking by in-degree / out-degree) 3. Nodes connectivity Which nodes are the authorities (receive most connections)? Which nodes are the hubs (originate most connections)? C. Color (color by partition) 4. Distribution Is typology coherent with topology (partitions coincide with clusters)? Which are the exceptions (‘misplaced nodes’)?
    80. 80. Reading principle: Identify what is in the center - of the graph - of each cluster Identify what is between clusters Questions: - Which nodes/clusters are globally and locally central? - Which nodes/clusters are global and local bridges? A.2. Position: relative position
    81. 81. Central nodes and clusters
    82. 82. Bridging nodes and clusters
    83. 83. Technical step: The ranking palette
    84. 84. Visual network analysis questions A. Position (force-vector spatialization) 1. Nodes density Where are structural holes (under-populated regions)? Where are clusters and sub-clusters (over-populated regions)? Which are the largest and most cohesive clusters? 2. Relative position Which nodes/clusters are globally and locally central? Which nodes/clusters are global and local bridges (between clusters)? B. Size (ranking by in-degree / out-degree) 3. Nodes connectivity Which nodes are the authorities (receive most connections)? Which nodes are the hubs (originate most connections)? C. Color (color by partition) 4. Distribution Is typology coherent with topology (partitions coincide with clusters)? Which are the exceptions (‘misplaced nodes’)?
    85. 85. Reading principle: Identify which nodes - receive more connections - originate more connections Questions: Which are the authorities of the network? Which are the hubs of the network? B.3. Size: node connectivity
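The reading principle above can be sketched in a few lines of Python. This is only an illustration on a made-up edge list, following the slide's own definitions (authorities receive the most connections, hubs originate the most); it is not Gephi's ranking palette, and the function name `degree_rankings` is hypothetical.

```python
from collections import Counter


def degree_rankings(edges):
    """Count incoming and outgoing connections for a directed edge list."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


# Made-up directed edges (source, target), e.g. hyperlinks between websites.
edges = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "b"), ("a", "d")]
in_degree, out_degree = degree_rankings(edges)

authority = in_degree.most_common(1)[0][0]  # node receiving most connections
hub = out_degree.most_common(1)[0][0]       # node originating most connections
```

In Gephi the same counts would typically drive node size, so that authorities or hubs stand out visually rather than being read from a table.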
    86. 86. Authorities
    87. 87. Hubs
    88. 88. Technical step: Data laboratory window Gephi.org
    89. 89. Technical step: the partition palette
    90. 90. Visual network analysis questions A. Position (force-vector spatialization) 1. Nodes density Where are structural holes (under-populated regions)? Where are clusters and sub-clusters (over-populated regions)? Which are the largest and most cohesive clusters? 2. Relative position Which nodes/clusters are globally and locally central? Which nodes/clusters are global and local bridges (between clusters)? B. Size (ranking by in-degree / out-degree) 3. Nodes connectivity Which nodes are the authorities (receive most connections)? Which nodes are the hubs (originate most connections)? C. Color (color by partition) 4. Distribution Is typology coherent with topology (partitions coincide with clusters)? Which are the exceptions (‘misplaced nodes’)?
    91. 91. Reading principle: - Evaluate if nodes of the same color are close - Identify ‘misplaced’ nodes Questions: - Is typology coherent with topology? - Which are the exceptions? C.4. Color: distribution
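A rough numerical counterpart of this visual reading can be sketched as follows. The slides evaluate typology against topology by looking at colors in the layout; the sketch below instead uses edges as a proxy, which is an assumption of this example and not the method of the slides. The helper names (`same_color_share`, `misplaced_nodes`) and the data are hypothetical.

```python
from collections import Counter, defaultdict


def same_color_share(edges, partition):
    """Share of edges whose two endpoints belong to the same partition."""
    same = sum(1 for a, b in edges if partition[a] == partition[b])
    return same / len(edges)


def misplaced_nodes(edges, partition):
    """Nodes with more neighbours in some other partition than in their own."""
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    misplaced = []
    for node, linked in neighbours.items():
        counts = Counter(partition[n] for n in linked)
        own = counts[partition[node]]
        best_other = max(
            (c for color, c in counts.items() if color != partition[node]),
            default=0,
        )
        if best_other > own:
            misplaced.append(node)
    return sorted(misplaced)


# Made-up undirected network: two triangles, one bridge, one odd node "x".
edges = [
    ("r1", "r2"), ("r2", "r3"), ("r1", "r3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("r3", "b1"), ("x", "r1"), ("x", "r2"),
]
partition = {"r1": "red", "r2": "red", "r3": "red",
             "b1": "blue", "b2": "blue", "b3": "blue", "x": "blue"}

share = same_color_share(edges, partition)
exceptions = misplaced_nodes(edges, partition)
```

Here node `x` is colored blue but only connects to red nodes, so it is reported as an exception; a high share of same-color edges suggests that typology and topology broadly coincide.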
    92. 92. Typology and topology
    93. 93. Exceptions
    94. 94. Polarization
    95. 95. Polarization
    96. 96. Visual network analysis
    97. 97. Visual network analysis Venturini, T., Jacomy, M, De Carvalho Pereira, D. Visual Network Analysis: The example of the rio+20 online debate (working paper)
    98. 98. Part VI The médialab toolbox
    99. 99. The médialab toolkit http://tools.medialab.sciences-po.fr
    100. 100. The médialab toolkit https://github.com/medialab
    101. 101. The médialab toolkit
    102. 102. The médialab toolkit
    103. 103. Sciencescape http://tools.medialab.sciences-po.fr/sciencescape/
    104. 104. Sciencescape http://tools.medialab.sciences-po.fr/sciencescape/ Sciencescape is a simple, client-side JavaScript tool intended to extract • time-curves • sankey-diagrams • co-occurrence networks from bibliographical notices exported from • ISI Web of Science • Scopus
    105. 105. Sciencescape Journal over time (ANT from Scopus)
    106. 106. Sciencescape Keyword over time (ANT from Scopus)
    107. 107. Sciencescape Authors-Keywords-Journals sankey (ANT from Scopus)
    108. 108. Sciencescape Keywords’ network (ANT from Scopus)
    109. 109. Sciencescape Future developments
    110. 110. Table2Net http://tools.medialab.sciences-po.fr/table2net/
    111. 111. Table2Net http://tools.medialab.sciences-po.fr/table2net/ Table2Net is a generic, client-side JavaScript tool intended to extract (Gephi) networks from any data table. The tool is able to produce • mono-partite and bi-partite networks • weighted and non-weighted networks • static and dynamic networks
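The idea of extracting mono-partite and bi-partite networks from a table can be illustrated with a short Python sketch. This is not Table2Net's code (the tool itself is a JavaScript application); the column names, the `;` separator and the function names are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Made-up table: one row per document, multi-valued cells separated by ";".
rows = [
    {"author": "Ana", "keywords": "networks; maps"},
    {"author": "Ben", "keywords": "networks; controversies"},
    {"author": "Cleo", "keywords": "maps"},
]


def split_cell(cell, separator=";"):
    """Split a multi-valued cell into clean, non-empty values."""
    return [value.strip() for value in cell.split(separator) if value.strip()]


def bipartite_edges(rows, column_a, column_b):
    """Link each value of column_a to each value of column_b in the same row."""
    edges = Counter()
    for row in rows:
        for a in split_cell(row[column_a]):
            for b in split_cell(row[column_b]):
                edges[(a, b)] += 1
    return edges


def monopartite_edges(rows, node_column, link_column):
    """Link two node_column values when they share a link_column value.

    The edge weight is the number of shared link values.
    """
    members = defaultdict(set)
    for row in rows:
        for link in split_cell(row[link_column]):
            members[link].update(split_cell(row[node_column]))
    edges = Counter()
    for nodes in members.values():
        for a, b in combinations(sorted(nodes), 2):
            edges[(a, b)] += 1
    return edges


author_keyword = bipartite_edges(rows, "author", "keywords")
author_author = monopartite_edges(rows, "author", "keywords")
```

The bi-partite version keeps authors and keywords as two kinds of nodes, while the mono-partite version connects authors directly whenever they share a keyword. Ignoring the counts would give the non-weighted variant.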
    112. 112. Table2Net http://tools.medialab.sciences-po.fr/table2net/ Normal Bipartite
    113. 113. Hyphe http://hyphe.medialab.sciences-po.fr/demo/
    114. 114. Hyphe http://hyphe.medialab.sciences-po.fr/demo/ Hyphe is a powerful, server-side tool intended to assist scholars in building web corpora. Compared to previous tools (issuecrawler, navicrawler) • it allows a more flexible definition of ‘web-entities’ • it implements semi-automatic, semi-manual crawling
    115. 115. Hyphe Flexible definition of ‘web-entities’
    116. 116. Hyphe Flexible definition of ‘web-entities’
    117. 117. Hyphe Semi-automatic semi-manual crawling
    118. 118. Hyphe Semi-automatic semi-manual crawling
    119. 119. Hyphe The future interface (under construction)
    120. 120. ANTA actor-network text analyzer http://jiminy.medialab.sciences-po.fr/anta_dev/
    121. 121. ANTA is an experimental, server-side tool intended to assist scholars in extracting networks of occurrence of noun-phrases in textual corpora. The tool allows users to • create a corpus of textual documents • extract noun-phrases from the corpus (entities) • select the most relevant entities • generate a bi-partite network of documents and entities ANTA actor-network text analyzer http://jiminy.medialab.sciences-po.fr/anta_dev/
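The last two steps listed above (selecting entities and generating a bi-partite network of documents and entities) can be sketched in Python. This is not ANTA's implementation: noun-phrase extraction is skipped and replaced by a hand-made list of candidate entities, and the relevance rule (keep entities found in at least `min_docs` documents) is an assumption of this example, as are all names and texts.

```python
from collections import Counter


def document_entity_network(docs, candidate_entities, min_docs=2):
    """Build weighted (document, entity) edges from plain-text documents.

    The weight is the number of times the entity string occurs in the text.
    Entities found in fewer than min_docs documents are dropped.
    """
    occurrences = {}
    for doc_id, text in docs.items():
        lowered = text.lower()
        for entity in candidate_entities:
            count = lowered.count(entity.lower())
            if count:
                occurrences[(doc_id, entity)] = count
    # Number of distinct documents in which each entity appears.
    doc_frequency = Counter(entity for _, entity in occurrences)
    return {
        edge: weight
        for edge, weight in occurrences.items()
        if doc_frequency[edge[1]] >= min_docs
    }


# Made-up corpus and candidate entities (stand-ins for extracted noun-phrases).
docs = {
    "d1": "Climate migration is contested. Climate migration policy is new.",
    "d2": "Sea level rise drives climate migration.",
    "d3": "Sea level rise threatens coastal cities.",
}
entities = ["climate migration", "sea level rise", "coastal cities"]

network = document_entity_network(docs, entities)
```

The resulting dictionary is an edge list that could be exported to a graph format and opened in Gephi, with documents and entities as the two node types.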
    122. 122. ANTA http://jiminy.medialab.sciences-po.fr/anta_dev/
    123. 123. ANTA actor-network text analyzer http://jiminy.medialab.sciences-po.fr/anta_dev/
    124. 124. ANTA http://jiminy.medialab.sciences-po.fr/anta_dev/
    125. 125. Venturini, T., Gemenne, F., & Severo, M. (2013). Des Migrants et des Mots. Une analyse numérique des débats médiatiques sur les migrations et l’environnement. In Cultures & Conflits, 88(4). Venturini, T., & Guido, D. (2012). Once Upon a Text : an ANT Tale in Text Analysis. In Sociologica ANTA http://jiminy.medialab.sciences-po.fr/anta_dev/
    126. 126. The médialab toolkit
    127. 127. Gephi https://gephi.github.io/
    128. 128. Gephi https://gephi.github.io/ Gephi is a powerful, stand-alone tool for network analysis. Compared to other tools, Gephi • is more user-friendly • translates graph mathematics into visual variables • allows direct network manipulation
    129. 129. Heatgraph From networks to heatmaps
    130. 130. REFERENCE Turing AM, 1952, Phil. Trans. of the Royal Society of Bio. Sciences SPECIALIZED INSTITUTION Blackett Lab. Imperial College KEYWORD Magnetic properties NON-SPECIALIZED INSTITUTION Ecole Polytechnique de Zurich Heatgraph Ego-centered heatgraphs
    131. 131. Heatgraph http://tools.medialab.sciences-po.fr/heatgraph/
    132. 132. Heatgraph http://tools.medialab.sciences-po.fr/heatgraph/ Severo, M. & Venturini, T. (forthcoming) Intangible Cultural Heritage Webs: Comparing national networks through digital methods. In New Media & Society
    133. 133. http://www.tommasoventurini.it/
