
How and why study big cultural data



Lev Manovich.
How and why study big cultural data.

Presentation at Data Mining and Visualization for the Humanities symposium, NYU, March 19, 2012.

Published in: Education, Technology


  1. How and why study big cultural data. Lev Manovich. manovich@
  2. New York Times (November 16, 2010): “The next big idea in language, history and the arts? Data.” NEH/NSF Digging into Data competition (2009): “How does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitized data - far more than they could read in a lifetime - what does that mean for research?”
  3. Why study big cultural data?
  4. (1) study societies through social media traces - social computing (but do we study society or only social media itself?); (2) more inclusive understanding of history and the present (using much larger samples); (3) detect large-scale cultural patterns; (4) the best way to follow global professionally produced digital culture; understand newly developed cultural fields (“X” design); (5) map cultural variability and diversity
  5. Data: 3,724 18th century volumes, using 10,000 most frequent words (excluding proper nouns). Ted Underwood. The Differentiation of Literary and nonliterary diction, 1700-1900.
  6. Growth of a global culture space after 1990: Cumulative number of new art biennales, 1895-2008.
  7. Modern (19th-20th centuries) social and cultural theory: describe what is similar (classes, structures, types) / statistics (reduction). Computational humanities and social science should focus on describing what is different / variability / diversity. Not “from data to knowledge” but from (incomplete) knowledge to actual cultural data.
  8. We are no longer interested in the conformity of an individual to an ideal type; we are now interested in the relation of an individual to the other individuals with which it interacts... Relations will be more important than categories; functions, which are variable, will be more important than purposes; transitions will be more important than boundaries; sequences will be more important than hierarchies. Louis Menand on Darwin, 2001.
  9. Visualization: Thinking without “large” categories. “The ontological status of assemblages, large and small, is always that of unique, singular individuals.” “Unlike taxonomic essentialism in which genus, species and individuals are separate ontological categories, the ontology of assemblages is flat since it contains nothing but differently scaled individual singularities.” Manuel DeLanda. A New Philosophy of Society.
  10. Bruno Latour: The “whole” is now nothing more than a provisional visualization which can be modified and reversed at will, by moving back to the individual components, and then looking for yet other tools to regroup the same elements into alternative assemblages.
  11. How to study big cultural data? How to explore massive visual collections (exploratory media analysis)? Which data analysis and visualization techniques are appropriate for non-technical users? How to democratize data analysis?
  12. Our approach: media visualization (visualizing media directly rather than only using abstract infovis language)
  13. visualizing large non-visual data using abstraction
  14. media visualization: showing visual data directly. Every cover of Time magazine, 1923-2009 (4535 images). X-axis = publication date. Y-axis = saturation mean.
  15. our media visualization software on 287 megapixel display (image: 1 million manga pages)
  16. our software on new display wall with thin bezels (data: 4535 Time magazine covers)
  17. Our methods: (1) media visualization using existing metadata - show complete collection; (2) media visualization using existing metadata - use samples to better reveal patterns; (3) digital image processing + media visualization (use simple image features which have direct perceptual meaning - and gradually introduce humanists to image processing)
  18. (1) media visualization / existing metadata: montage
  19. (2) media visualization / existing metadata / sample
  20. (3) digital image processing + media visualization. Image plots of selected paintings by six impressionist artists. X-axis = mean saturation. Y-axis = median hue. Megan O’Rourke, 2012.
  21. Advantages: replacing discrete categories with continuous attributes
  22. (1) from timelines to curves; (2) better represent analog cultural attributes; (3) understand cultural landscapes (fuzzy / overlapping / hard clusters?); (4) visualize cultural variability; (5) discover new groupings
  23. (1) from timelines to curves
  24. (2) better represent analog attributes
  25. (3) our maps of cultural landscapes reveal fuzzy/overlapping clusters - rather than discrete categories with hard boundaries
  26. (4) visualize cultural variability
  27. (5) discover new groupings
  28. Studying large cultural data challenges our existing theoretical concepts and assumptions. Example: what is “style”?
  29. one million manga pages
  30. single short manga series (>1000 pages)
  31. 776 Vincent van Gogh paintings
  32. Selected current projects: 7000 year old stone arrowheads (with UCSD anthropologist and CS postdoc at University of Washington); comparing Art Now & Graphic design Flickr groups (340,000 images) (with CS collaborator from Lawrence Berkeley National Laboratory); one million images (+ metadata) from deviantArt (with an art historian / DH collaborator from Netherlands Academy of Arts and Sciences)
  33. 4.7 million newspaper pages from Library of Congress (UCSD undergraduate students); virtual world / game analytics (NSF Eager, with UCSD Experimental Games Lab); SEASR tools and workflows for working with image and video data (with NCSA at University of Illinois, Urbana-Champaign)
  34. Conclusion: Computational humanities
  35. “The capacity to collect and analyze massive amounts of data has transformed such fields as biology and physics. But the emergence of a data-driven computational social science has been much slower. Leading journals in economics, sociology, and political science show little evidence of this field. But computational social science is occurring in Internet companies such as Google and Yahoo, and in government agencies such as the U.S. National Security Agency.” “Computational Social Science.” Science, vol. 323, no. 6, February 2009. Digital humanities: scholars are mostly working with digitized historical cultural archives which were created by libraries and universities with funding from NEH and other institutions.
  36. Computational humanities: Analyzing massive amounts of cultural content and people's conversations, opinions, and cultural activities online - personal and professional web sites, general and specialized social media networks and sites. This data offers us unprecedented opportunities to understand cultural processes and their dynamics and develop new concepts and models which can also be used to better understand the past. Current players in computational humanities: - Google, Facebook, YouTube, Bluefin Labs, The Echo Nest, and many other companies which analyze social media signals (blogs, Twitter, etc.) and the content of media on social networks. - Computer scientists who are working with this data.
  37. manovich@
  38. Appendix: visualizing video collections. Use media visualization with a set of keyframes; automatic selection of key frames (for example, using free shot detection software).
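
Slides 14 and 20 place each image on a plot according to simple color features (saturation mean, median hue). The sketch below shows one way such features could be computed; it is not the software used in the talk. The function names and the toy pixel data are hypothetical, pixels are assumed to be plain (r, g, b) tuples, and the median hue is taken naively on a 0-360 scale even though hue is a circular quantity.

```python
import colorsys
from statistics import mean, median


def saturation_hue_features(pixels):
    """Return mean saturation (0-1) and median hue (0-360 degrees).

    `pixels` is a sequence of (r, g, b) tuples with 0-255 channels.
    Assumption: a plain median is used for hue, which ignores the
    wrap-around at 0/360 degrees.
    """
    hues, saturations = [], []
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hues.append(h * 360.0)
        saturations.append(s)
    return {"mean_saturation": mean(saturations), "median_hue": median(hues)}


def plot_positions(images):
    """Map each image name to (x, y) = (mean saturation, median hue)."""
    positions = {}
    for name, pixels in images.items():
        features = saturation_hue_features(pixels)
        positions[name] = (features["mean_saturation"], features["median_hue"])
    return positions


# Hypothetical toy data: two 4-pixel "images".
toy_images = {
    "red_patch": [(255, 0, 0)] * 4,
    "gray_patch": [(128, 128, 128)] * 4,
}
positions = plot_positions(toy_images)
```

In an image plot, each image would then be drawn at its (x, y) position, so a fully saturated image lands far to the right and a gray one at the origin.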
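
Slide 18 names the montage as a way to show a complete collection using existing metadata. A minimal sketch of the idea, assuming the images are sorted by one metadata field and tiled left to right, top to bottom; the function name, file names, and tile sizes are hypothetical.

```python
def montage_layout(items, columns, tile_w, tile_h, sort_key):
    """Sort items by a metadata key and assign each a grid position.

    Returns a list of (item, x, y) where (x, y) is the top-left corner
    of the tile in pixels.
    """
    ordered = sorted(items, key=sort_key)
    layout = []
    for index, item in enumerate(ordered):
        row, col = divmod(index, columns)
        layout.append((item, col * tile_w, row * tile_h))
    return layout


# Hypothetical records: file names and years are placeholders.
covers = [
    {"file": "cover_c.jpg", "year": 1990},
    {"file": "cover_a.jpg", "year": 1923},
    {"file": "cover_b.jpg", "year": 1950},
]
layout = montage_layout(
    covers, columns=2, tile_w=100, tile_h=130, sort_key=lambda c: c["year"]
)
```

Sorting by date before tiling is what lets temporal patterns show up as visual gradients across the grid.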
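
Slide 38 mentions automatic selection of key frames via shot detection software. The sketch below illustrates the general idea with simple frame differencing, which is an assumption and not the tool the slide refers to: a new shot is declared when consecutive frames differ strongly, and the first frame of each shot is kept as its key frame. Frames are assumed to be flat lists of grayscale values, and the threshold is an arbitrary illustrative choice.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_keyframes(frames, threshold=30.0):
    """Return indices of the first frame of each detected shot.

    Assumption: a shot boundary is any pair of consecutive frames whose
    mean absolute difference exceeds `threshold`.
    """
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes


# Hypothetical toy video: two dark frames followed by two bright frames.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 210, 190, 205],
    [201, 209, 191, 204],
]
keyframe_indices = select_keyframes(frames)
```

The selected key frames can then be treated like any other image set and fed into a montage or image plot.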