Sprinting with Data



Published in: Education

  • Murder on the Orient Express
    Sidney Lumet, 1974 adaptation of Agatha Christie's novel, in which Albert Finney's Hercule Poirot solves the murder of a wealthy American aboard the titular train
  • Adaptation is delayed, because it is turned to the future. Unlike the quarrels on the existence of global warming and its link with human activities, the debate on adaptation does not concern how we have messed up the climate (in the past) or what we can do to fix it (in the present). It concerns what we will do when the effects of climate change eventually hit us. And, though we know that such effects will be massive, we don’t know exactly what they will look like. Not only because climate is an amazingly complex (in fact quasi-chaotic) system, but because its change is due to the interference of social systems whose evolutions are even more unpredictable. As scenarios replace predictions, different actors choose the futures that they like the best or fear the most. Controversies, therefore, concern not only how we should prepare for the future, but also what future we should prepare for.
  • Adaptation is dispersed, because it concerns the localization of the effects of global warming. Unlike the discussion on the mitigation of climate change, the debate on adaptation does not concern global tendencies but local effects. Unlike the causes of climate change (the greenhouse gases that mix and spread in the atmosphere), its effects are highly localized. Adaptation is not concerned with the transformation of the global climate system, but with how such transformation will trickle down to local microclimates. And even if we are still largely incapable of downscaling our models, we know that global warming will strike differently in various parts of the world, unsettling different natural and social sectors and often in opposite directions. Though it derives from an international tragedy of the commons (mitigation can only be multilateral), adaptation can only be tackled at a national or even local level.
  • Adaptation is diffused, because it concerns the transformation of pre-existing sectors. By definition, adaptation concerns the re-organization of non-climatic systems. It is the adaptation of agricultural practices, of infrastructures, of rural and urban settlements, of industrial installations. Adaptation discussions, therefore, are always mixed with pre-existing socio-technical debates. Unlike the quarrels on the existence and the mitigation of climate change, adaptation does not create a new arena, but displaces existing socio-technical debates. When a rich country helps a poorer one to set up a new water distribution system, is this climate adaptation, or development aid, or humanitarian intervention?
  • Adaptation is opportunistic, because it mobilizes an increasing amount of human and financial resources. If IPCC projections are right, the efforts to deal with climate will consume a greater proportion of our social and economic resources in the near to long-term future. Social and institutional actors know this all too well and use climate adaptation as leverage to advance other interests and other interests as leverage to advance climate adaptation. They hide adaptation under other policies and other policies under adaptation. They strive to increase their present resources as much as possible while reducing their future margins of action as little as possible.
  • The heterogeneity of the actors involved. Hackathons and barcamps are generally organized so as to be open to many different types of actors. In part, this comes from the need to achieve deliverable results at the end of the event, which requires gathering all the competences necessary to go through all the phases of the project. In development marathons, this translates into having experts from the entire programming stack: from setting up the server infrastructure to designing the wireframes, from scraping the data to implementing the front-end. The push for heterogeneity also derives from the necessity to exchange with the potential end users of the projects, who should be at hand during the development dash.
    The effort to convene participants physically. The unity of time and place that characterizes hackathons and barcamps seemed an appropriate counterbalance to the dispersion of research efforts often observed in international and interdisciplinary projects. One of the problems of working across disciplines is that the experts of one field have a blurred appreciation of what experts of other fields may need as an input for their work. Such misunderstandings are normal in interdisciplinary projects and can become disastrous if discovered too late, a risk particularly salient for international projects. Yes, technologies for distant cooperative work can ease some of these difficulties, but nothing facilitates mutual supervision and speeds up collaboration more than direct presence. Once more, ‘digital’ turned out to be the opposite of ‘virtual’. Exploiting digital inscriptions demands coordinating the efforts of many different disciplines, and this in turn demands convening in the same space and time.
    The “quick and dirty” (or “design to cost”) approach. Though thriving on the same increase in the availability of digital inscriptions, hackathons and barcamps are somewhat opposite to ‘big data’ approaches. The short and intensive nature of these events shields them from the dream of exhaustivity often associated with ‘big data’. Participants know that they will only be able to treat a limited amount of digital traces and that they will achieve imperfect results, but they accept such constraints more as a challenge than as a weakness. Making the most out of light infrastructures, simple logistics and agile organization methods, participants are well aware that their work should hack code and information gathered in earlier projects and that their outcomes should become the bases for further ventures. Not only do hackathons and barcamps foster iterations within their meetings, but they are also explicitly conceived as intermediary steps of a larger development cycle.
  • Data sprints are always preceded by a long and intense work of preparation. When participants meet up, most of the research infrastructure should have already been collected and prepared for treatment. Time-consuming operations such as data cleansing or infrastructure set-up should be accomplished beforehand, so that the days of the sprint can be dedicated entirely to the operations that require a more direct collaboration. Also, participation in data-sprints is not open: the sprinting lineup and team formation are taken care of in advance, to make sure that the working groups contain all the competences needed to achieve significant results.
    Data-sprints are also generally longer and more structured than their antecedents. While hackathons and barcamps are usually organized over two or three days, sprints work better when they extend over a full working week.
    Finally, data-sprints require a greater follow-up than hackathons and barcamps. The ‘quick and dirty’ approach that characterizes the five days of a sprint should be complemented by an extensive work of refinement and documentation, in order to endow the results with the precision and robustness demanded by scientific research.
  • 1) Posing research questions. Research questions are posed on the first day of the sprint by the invited issue experts. Besides suggesting a number of research questions, issue experts are also invited to help the other participants (most of whom have little previous knowledge of the issue at stake) to get to grips with the topic of the meeting. This can be done through Q&A sessions or panel discussions, but also (and often more fruitfully) through informal consultations as part of the running feedback on data visualizations.
    2) Operationalizing research questions into feasible digital methods projects. In a sense, this process begins already before the sprint, when the organizers try to anticipate what type of projects the sprint might lead to. We found that an excellent way of doing this initial vetting is to ask issue experts to suggest interesting datasets. This provides a chance to get back to the experts, explaining why the proposed dataset may be unsuitable for certain research questions and thus getting them attuned to what a digital methods project can and cannot achieve.
    3) Procuring and preparing datasets. As mentioned above, while it is desirable to have datasets available in advance, this is sometimes at odds with the agility of the sprint, and it is not uncommon that complementary data have to be searched for and collected in the first days of the sprint.
    4) Writing and adapting code. Sprints are issue-specific (meant to address particular needs of the controversy actors) and their aim is less to develop generic tools than to adapt existing code to the research questions raised by the issue experts. This does not mean, however, that effort shouldn’t be invested in making datasets, scripts and visualizations re-usable beyond their original projects. Sprints should remain faithful to their communitarian roots and ensure that all the data, code and contents produced are released under open-source, copyleft and open-publishing licenses.
    5) Designing data visualizations and interface. One of the driving forces of sprints is that they deliver tangible outcomes. These outcomes may have different forms, but they always share the characteristic of being directly usable by actors of the controversy. In many cases, this translates into the issue experts leaving the sprints with tangible results that they immediately mobilize in their debates.
    6) Eliciting engagement and co-production of knowledge. Data-sprints abide by the ‘co-production of knowledge model’ of social sciences advocated for by Callon, Lascoumes, & Barthe (1999). Such an approach assumes that scientific activities should be pursued in a constant and genuine dialogue with their publics. If data-sprints are organized according to the five phases described above, it is precisely so that this final phase can be achieved. If sprints fail in creating a common space for social scientists and social actors, they will have failed in all other respects as well.
  • The issue experts/alpha users: Regardless of the subject matter of the sprint, the first order of business is always to formulate research questions. This is done with the help of people that have something at stake in the topic of the sprint (either because they are affected by it, produce knowledge about it, intervene politically in it, or a combination of the above). They are at once the issue experts, who are able to deploy their matters of concern for the rest of the participants, and the alpha users, who will be able to provide feedback and commentary on the intermediary results. The selection of these issue experts/alpha users does not presume to be representative (as is for example the case in citizen conferences) but is driven by the research collective's need to acquire stakes in the controversy.
    The developers: Sprints are supposed to be agile. They must be able to adapt not only to what the issue experts/alpha users bring to the table, but to what the research collective as a whole makes of these contributions. The one asset that more than anything ensures this agility (or hampers it if neglected) is developers. Successful sprints are fundamentally anathema to the idea that development needs can be fully anticipated, much less serviced, in advance. The job of the developers is to adapt tools and scripts for particular analysis needs, harvest missing data, and help the designers build applications for exploring the datasets.
    The project managers: Research questions must be asked in such a way that they are both feasible with the available digital methods and pertinent to the concerns of the issue experts. This requires a translational competence. Project managers must be sufficiently knowledgeable of the controversy to understand the questions posed by the issue experts and sufficiently adept with digital methods to see the potentials and constraints flagged by designers and developers. The project managers become, in a sense, the stewards of the alpha users, helping them to express their needs and to make sense of the results produced.
    The designers: Competences in information visualization are mobilized throughout the sprint process. Visualizations are essential to facilitate the iterative exploratory data analysis (Tukey, 1977) on which the sprint is based. They are also crucial for making sure that the results of the sprint are delivered in a form that makes them widely available to controversy publics (Latour, 2005b).
    The sprint organizers: Besides making the necessary practical arrangements for the sprint to take place (booking rooms and accommodation, organizing food, distributing programs and practical info, etc.) the organizers play a key role in the preparatory phase leading up to the sprint. The most obvious occasion for this is the decision on the overall sprint theme. Although sprints should be agile enough to accommodate evolving research questions, thematic framing is necessary for an effective mobilization of issue experts. Such experts are likely to be dedicated people with busy agendas. It falls to the sprint organizers to provide them with an incentive by giving them a sense of what their stake in the sprint could be. Thematic framing is also necessary for pre-collecting datasets. Again, although sprints should be agile enough to accommodate the harvesting of new datasets, the organizers should do what they can to anticipate relevant datasets.

    1. Sprinting With Data Tommaso Venturini tommaso.venturini@kcl.ac.uk tommasoventurini.it University of Bath, Institute for Policy Research, 14 September, 2016 ‘Evidence and the Politics of Policymaking: where next?’
    2. Thinking in the presence of the victim "The critic is not the one who debunks, but the one who assembles. The critic is not the one who lifts the rugs from under the feet of the naive believers, but the one who offers the participants arenas in which to gather" (p. 246) Bruno Latour, 2004 Why has critique run out of steam? From matters of fact to matters of concern Critical inquiry, 30(2), 225-248 "We don’t know what a researcher who today affirms the legitimacy or even the necessity of experiments on animals is capable of becoming in an oikos that demands that he or she think “in the presence of” the victims of his or her decision. Of importance is the fact that an eventual becoming will be the researcher’s own becoming" (p. 997) Isabelle Stengers, 2005 The cosmopolitical proposal In Making things public Atmospheres of democracy, 994.
    3. But who is the victim?
    4. EMAPS (EU project 2011-2014) Mapping Climate Change Adaptation Venturini, Tommaso. 2010. “Diving in Magma: How to Explore Controversies with Actor-Network Theory.” Public Understanding of Science 19(3): 258–73. Venturini, Tommaso. 2012. “Building on Faults: How to Represent Controversies with Digital Methods.” Public Understanding of Science 21(7): 796–812. Venturini, Tommaso et al. 2014. “Climaps by Emaps in 2 Pages (A Summary for Policy Makers and Busy People in General).” Social Science Research Network (ID 2532946).
    5. Adaptation is delayed A first set of disagreements concerns our images of the future: - how bad climate change will be; - how fast it will unfold; - where and whom it will strike first. http://climaps.eu/#/map/mapping-cli-fi-scenarios-book-covers-with-landscapes-issues-and-personal-narratives
    6. Adaptation is dispersed A second set of disagreements concerns the priorities of adaptation: - which regions will be more vulnerable; - which sectors will be more affected; - which arrangements will make our societies more flexible or resistant. http://climaps.eu#!/map/who-is-vulnerable-according-to-whom
    7. Adaptation is diffused http://climaps.eu#!/map/distribution-of-adaptation-funds-by-risk-and-adaptation-strategy-in-bangladesh A third set of disagreements concerns the boundaries of adaptation: - how global warming will interfere with natural and social equilibria; - whether adaptation generates additional actions or merely re-labelling; - whether previous problems and opportunities are taken into account.
    8. Adaptation is opportunistic http://climaps.eu#!/map/the-rise-of-adaptation-funding A fourth set of disagreements therefore concerns the wealth of adaptation: - who will provide resources for adaptation and who will use them; - through which channels these resources will flow; - who will decide how to employ them and who will assess the results.
    9. The conundrum of controversy mapping Had we had a clear view of how the adaptation debate was organized, we could have sampled its actors or reached for the most relevant ones. But the fluidity of the adaptation debate offered no clear landmarks for navigation. We were trapped in a vicious circle: since we had no informants, we could not improve our understanding of the controversy and, since we had only a vague appreciation of the debate, we did not know whom to engage with. We were lost because isolated, and isolated because lost. Venturini, T., Munk, A., & Meunier, A. (2016). Data-Sprint: a Public Approach to Digital Research. (C. Lury, P. Clough, M. Michael, R. Fensham, S. Lammes, A. Last, & E. Uprichard, Eds.) Interdisciplinary Research Methods (forthcoming).
    10. A classic (but unrealistic) workflow
    11. A more realistic workflow
    12. And an even more realistic one
    13. The agile shift and hackathons and barcamps
    14. hackathons and barcamps
    15. 3 things we like about hackathons and barcamps 1. The heterogeneity of the actors involved (“full-stack” competencies) 2. The effort to convene participants physically 3. The “quick and dirty” approach (or “design to cost”)
    16. 3 things we changed about hackathons and barcamps 1. Data sprints are preceded by a long and intense work of preparation 2. Data-sprints are also generally longer and more structured than their antecedents 3. Data-sprints require a greater follow-up
    17. Enter the data sprint (Paris)
    18. Enter the data sprint (Oxford)
    19. Enter the data sprint (Milan)
    20. What’s in a data sprint 1. Posing research questions 2. Operationalizing research questions into feasible digital methods projects 3. Procuring and preparing datasets 4. Writing and adapting code 5. Designing data visualizations and interface 6. Eliciting engagement and co-production of knowledge
    21. Who’s in a data sprint • The issue experts/alpha users • The developers • The project managers • The designers • The sprint organizers
    22. Tommaso Venturini Venturini, Tommaso, Anders Munk, and Axel Meunier. 2016. “Data-Sprint: A Public Approach to Digital Research” In Interdisciplinary Research Methods (forthcoming) eds. Celia Lury et al. Munk, Anders Kristian, Axel Meunier, and Tommaso Venturini. 2016. “Data Sprints: A Collaborative Format in Digital Controversy Mapping” In DigitalSTS: A Handbook and Fieldguide (Forthcoming), eds. David Ribes and Janet Vertesi. Venturini, Tommaso et al. 2015. “Designing Controversies and Their Publics” Design Issues 31(3): 74–87. tommasoventurini.it