The Politics of Big Data The Digital Future of Marxism SAMLA 2011
Digital Humanities
What is Big Data?
- Patricia Cohen The New York Times, 11-16-2010 'A history of the humanities in the 20th century could be chronicled in  “...
- Patricia Cohen The New York Times, 11-16-2010 The next big idea in language, history and the arts?    Data.
- Patricia Cohen The New York Times, 11-16-2010 'Members of a new generation of digitally savvy humanists argue it is time...
"philosophers don't use data."
There is no "pure" data.  It has a context within (and is produced by) ideological struggles. 
Case 1: Math for Artists: N-Grams and Distance Reading
"distance is … not an obstacle, but a specific form of knowledge: fewer elements, hence a sharper sense of the overal...
- Franco Moretti,  Graphs, Maps, Trees
Google's N-Gram Viewer
Case 2: Digital Harlem:  The Political Economy of Data
Digital Harlem: Everyday Life 1915-1930
Case 3:   GapMinder:  The Political Life of Facts
Hans Rosling's Gapminder
Conclusion
This is a talk I gave at SAMLA this year. I was on the Digital Future of Marxism panel, chaired by Walter Kalaidjian. My panel partners were Vincente Rubio, Anthony Cooke and Derek Woods.

Published in: News & Politics, Business
  • The Politics of Big Data Introduction: Over the past two years, the idea of digital scholarship and, in particular, digital humanities has attracted a lot of attention. The tweediest of the academic conferences, the MLA and the AHA, have each added numerous panels focusing on digital scholarship, much to the delight of some technophiles and to the consternation of some traditionalists. Indeed, this very panel seems to be part of this story of increasing interest in all things digital. I am not complaining. Trend or not, I owe my job to all of this excitement. As the digital scholarship coordinator in the library at Emory, I work with faculty and grad students who are trying to take advantage of emerging technology. Before that, I became interested in what the digital can bring to scholarship because of my engagement with Marxist thought, which encouraged me to look for ways to make scholarship socially engaged outside the academy. Though not entirely unproblematic, digital technology does allow scholars to speak in a direct and timely manner. However, while I am very excited about the potential of many of the tools that are now available, I am also aware of the dangers.
  • But terms like “digital scholarship” and “digital humanities” cover an awful lot of ground; some of it terribly complex and futuristic; some of it, almost mundane. So, just to keep the scope of this talk somewhat manageable, let me mark off some limits. In this talk, I will not be talking about the following things: organizing new social movements with social media; online publishing; creating digital archives. Not that those things are not important; but for this talk I want to focus on Big Data, a new-ish and relatively exciting trend in digital humanities, and try to suss out some of the challenges this work presents to us as critically aware and socially engaged scholars.
  • What is Big Data? This imprecise, almost flippant, term refers to the fact that humanities scholars are beginning to figure out how to use digital information, often very large piles of it, to answer questions they had never been able to answer before, or to represent research that would have been impossible, or at least very difficult, to show before.
  • Big Data (and digital humanities more generally) had its coming-out party in the pages of the New York Times. In November of 2010, Patricia Cohen wrote: “A history of the humanities in the 20th century could be chronicled in “isms” — formalism, Freudianism, structuralism, postcolonialism — grand intellectual cathedrals from which assorted interpretations of literature, politics and culture spread.
  • The next big idea in language, history and the arts? Data.
  • Members of a new generation of digitally savvy humanists argue it is time to stop looking for inspiration in the next political or philosophical “ism” and start exploring how technology is changing our understanding of the liberal arts. This latest frontier is about method, they say, using powerful technologies and vast stores of digitized materials that previous humanities scholars did not have.” No matter what you think about digital humanities, there is a lot that is uncomfortable about that statement.
  • One, many humanities scholars are not comfortable with the idea of data. At a recent meeting to discuss digital scholarship, a professor from the philosophy department stated very flatly that “Philosophers don’t use data.” We like to think that what we do cannot be quantified so easily. Quantification may be fine for the sciences, the thinking goes, but there is no place for it in the humanities.
  • Two, data is political. It has a context and is embedded in and produced by struggles for power and meaning. A turn toward data should not mean an end to ideology; rather, our study of how ideology functions needs to include the data that is deployed in arguments, both ours and those of others. It is my contention that data as a tool of humanities scholarship need not be seen as a great threat that will hollow out or overly simplify humanities work. On the other hand, it is important to be mindful of what data is and is not, who produces it and where it comes from. To keep this from becoming too abstract, I want to focus on a few examples of this data-driven scholarship. I hope these examples will give you both a sense of the possibilities and an appreciation for the challenges of digital scholarship.
  • Case 1: Math for Artists: N-Grams and Distant Reading Franco Moretti, Professor of English and Comparative Literature at Stanford, describes what he playfully calls “distant reading” in his book Graphs, Maps, Trees.
  • He says it is an approach to literary research “where distance is … not an obstacle, but a specific form of knowledge: fewer elements, hence a sharper sense of the overall interconnection” (1). To perform distant reading, Moretti steps back from an individual novel, back from the canon, back even from a national bibliography to ask about the novel in general.
  • In this graph we can see his argument that “the rise of the novel” is a common occurrence in the literary history of a nation. As Moretti points out, this kind of work can be challenging because “it takes forever to gather data” (5).  However, new tools are emerging that significantly lower the barrier to entry.
  • Last year, shortly after the New York Times piece came out, Google released a tool called the N-Gram Viewer, which lets anyone with an Internet connection instantly perform distant reading on millions of books. [At this point I referenced the analysis of American and British spellings I did a few months ago. You can see it elsewhere in this blog.] This is a powerful tool, but humanists need to be careful about how they use it. Quantitative questions are new for many of us, and we need to educate ourselves about how to ask them. For example, how much data counts as interesting? What is a statistically significant number? What is “smoothing”? Also, instead of “search lots of books,” wouldn’t it be more honest to say “search lots of books by privileged European men”? Not a terribly insightful criticism, but the idea of searching millions of books has a tendency to mystify some people.
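To make the “smoothing” question a little less mysterious, here is a minimal sketch of one common kind of smoothing, a centered moving average. As I understand it, the N-Gram Viewer's smoothing setting works roughly this way (each year is averaged with its neighbors on either side), but treat that as an assumption rather than a description of Google's code. The function name `smooth` and the sample numbers are invented for illustration.

```python
def smooth(values, smoothing):
    """Centered moving average.

    Each point is replaced by the mean of itself and up to `smoothing`
    neighbors on each side. Near the edges the window simply shrinks.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - smoothing)
        end = min(len(values), i + smoothing + 1)
        window = values[start:end]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical yearly counts for one word, invented for illustration only:
# a single spike in the middle year.
raw = [0.0, 0.0, 9.0, 0.0, 0.0]

print(smooth(raw, 0))  # a smoothing of 0 leaves the data untouched
print(smooth(raw, 1))  # a smoothing of 1 spreads the spike over three years
```

The point for humanists is that a smoothed line is an interpretation of the counts, not the counts themselves: a one-year spike (say, a single widely reprinted book) can look like a gentle three-year trend.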
  • Case 2: The Political Economy of Data: Digital Harlem Creating digital maps has become a really popular way to use digital tools to control large amounts of data.  If you are asking a spatial question, sometimes a map can help you see the answer.  It can also help you present important though strikingly dull data in an engaging way.
  • The Digital Harlem project is a good example. This project uses a map to present all kinds of data about Harlem with the goal of creating a full picture of everyday life in the neighborhood. ## site demonstration ## Unfortunately, it can give you a strange impression of everyday life because of where the data comes from. Most of the data that has so far been entered into the site has come from police records. Not only does this create a skewed view of what everyday life was like in Harlem, but it also leaves the authority of this data unquestioned. Crime statistics are not neutral. Crime statistics taken from a predominantly black neighborhood between the years 1915 and 1930 are not only not neutral but must be treated with extreme skepticism. I suspect that they tell us more about policing blackness than they do about everyday life. It's one thing to deal with primary sources, but once they become abstracted as raw data there is a danger of losing their context.
  • Case 3: GapMinder: The Political Life of Facts Hans Rosling is a medical doctor from Sweden who has become kind of famous for a series of TED Talks he has presented over the years. He is the founder of the Gapminder Foundation, which developed a piece of software called “Trendalyzer” that seeks, according to the website, “to unveil the beauty of statistical time series by converting boring numbers into enjoyable, animated and interactive graphics” (http://www.gapminder.org/about-gapminder/our-mission/).
  • Essentially, Rosling uses Trendalyzer to visualize public health data, and his presentations are really striking. Gapminder’s motto is “For A Fact Based World-View,” and much of the rhetoric it deploys is about getting past the political conflicts that stand in the way of rational thinking. ## demonstrate Trendalyzer ## This is a very complex tool that allows for some surprisingly sophisticated analyses. However, it is hard to avoid the simple teleology of development implied by Trendalyzer: all countries inevitably follow the same path toward prosperity, defined by greater earning power. These visualizations also do little to help users sort out what is cause and what is correlation. For example, is lower infant mortality the result of higher income, or are they both the result of something else? The problem is that facts do not exist in a vacuum. They have a political life and can easily be pressed into the service of any number of narratives. Rosling’s visualizations do exactly what they say they will; they make boring numbers simple. However, the stories they tell are not simple. It seems to me that complex public health issues are inevitably caught up in local, national and global political struggles. Yes, there is clearly a connection between ability to access health care and wellness. Was that actually in question? Why are some people unable to access the health care they need, and what can be done about it? I’m not sure how to rationally diagram those answers.
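The worry about cause and correlation can be made concrete with a small sketch. The numbers below are entirely invented (they are not Gapminder data), and the variable names `third_factor`, `income` and `mortality` are hypothetical. I have built the data so that a hidden third factor drives both of the other series; the sketch then compares the plain correlation with the partial correlation once that third factor is held fixed.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_correlation(a, b, control):
    """Correlation of a and b after removing the linear effect of control."""
    r_ab = pearson(a, b)
    r_ac = pearson(a, control)
    r_bc = pearson(b, control)
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Invented data: a hidden third factor rises steadily across eight cases.
third_factor = [1, 2, 3, 4, 5, 6, 7, 8]
# Small invented wobbles, chosen to be unrelated to the third factor
# and to each other.
noise_income = [1, -1, -1, 1, 1, -1, -1, 1]
noise_mortality = [1, 1, -1, -1, -1, -1, 1, 1]

# "Income" goes up with the third factor; "mortality" goes down with it.
income = [t + e for t, e in zip(third_factor, noise_income)]
mortality = [10 - t + e for t, e in zip(third_factor, noise_mortality)]

print(pearson(income, mortality))                            # strongly negative
print(partial_correlation(income, mortality, third_factor))  # essentially zero
```

In this toy case, income and mortality move together only because both follow the third factor; a chart of the two alone would look just as persuasive either way. Which third factors get measured at all is, of course, a political question and not a statistical one.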
  • I have been critical of these projects, but please don’t take this as a critique of data-driven humanities projects in general. The existence of so much digital data, and of tools that allow us to dig into it, presents amazing opportunities for scholars. However, in order to take advantage of it, humanities scholars will need to learn some new skills. Some will be technical skills and some will be analytical skills. Another option is to work collaboratively with other kinds of experts (which, in and of itself, is a skill). And I think that the kinds of scholars who are willing to come to a talk at 8:30 on Saturday morning about the digital future of Marxism are probably the scholars who would be most interested in the power of these tools to break out of the academic echo chamber in which we spend so much time. There are challenges to using these tools responsibly, but the benefits are probably worth it.