How and why study big cultural data


Lev Manovich.
How and why study big cultural data.

Presentation at Data Mining and Visualization for the Humanities symposium, NYU, March 19, 2012.

Published in: Education, Technology
  • 1. The exponential growth in the number of both non-professional and professional media producers over the last decade has created a fundamentally new cultural situation and a challenge to our normal ways of tracking and studying culture. Hundreds of millions of people are routinely creating and sharing cultural content: blogs, photos, videos, online comments and discussions, and so on. 2. The rapid growth of professional educational and cultural institutions in many newly globalized countries, along with the instant availability of cultural news over the web and the ubiquity of media and design software, has also dramatically increased the number of culture professionals who participate in global cultural production and discussions.
  • In summary, the availability of large digitized collections of humanities data certainly creates the case for humanists to use computational tools. However, the rise of social media and globalization of professional cultures leave us no other choice. But how can we explore patterns and relations in sets of photographs, designs, or video, which may number in hundreds of thousands, millions, or billions? (FB: 7 billion photos uploaded per month.)
  • We are situated inside Calit2, which is working on creating next-generation cyberinfrastructure: grid computing, super-high-resolution displays, and optical networks which support real-time uncompressed streaming of 4K cinema and 4K teleconferencing.
  • (Can instead show my Time covers animations video.) Images are unique for big data analytics. Other media types such as music and text take time to process. If we display many of them in a visualization, it does not work, and we have to resort to information visualization. However, with images, a user can see many patterns instantly. At this point, show the true scale of the image in Photoshop.
  • illustration: use of simple perceptually meaningful image features
  • 3. Instead of starting with labels (genres, styles, authors), as in supervised machine learning, map cultural landscapes (and their evolution) using content properties. We may or may not find clusters, and we may discover many new groupings we did not think of before.
  • 1. From timelines to curves: normally a book or an exhibition divides an artist's work into discrete periods. Visualization allows us to study the gradual changes, and it may reveal that there are no discrete categories.
  • 2. Better represent analog dimensions: film scholars describe motion in a shot using only half a dozen categories; image analysis + visualization allows us to map “amount of visual change” as a continuous value and discover patterns which were hidden by the use of categories.
  • fuzzy overlapping clusters
  • map space of variations
  • discover new groupings
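The "simple perceptually meaningful image features" mentioned in these notes can be illustrated with a minimal sketch. It assumes each image is already available as a list of 8-bit RGB tuples, and it uses only the Python standard library. The function names `image_features` and `plot_positions` are hypothetical and are not taken from the lab's software. Hue is a circular quantity, so the plain median used here is only a rough summary.

```python
import colorsys
from statistics import mean, median


def image_features(pixels):
    """Return (mean saturation, median hue) for a list of 8-bit RGB tuples.

    Both values are on a 0.0-1.0 scale, as produced by colorsys.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    hues = [h for h, _s, _v in hsv]
    sats = [s for _h, s, _v in hsv]
    return mean(sats), median(hues)


def plot_positions(collection):
    """Map each image name to an (x, y) point: x = mean saturation, y = median hue."""
    return {name: image_features(pixels) for name, pixels in collection.items()}


# Two tiny synthetic "images" (assumed data, for illustration only).
collection = {
    "grey": [(128, 128, 128)] * 4,  # no colour at all
    "red": [(255, 0, 0)] * 4,       # fully saturated red
}
positions = plot_positions(collection)
```

Because every image receives a continuous coordinate rather than a label, the resulting plot can show gradual change and overlapping groups instead of forcing items into discrete categories.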
  • How and why study big cultural data

    1. How and why study big cultural data. Lev Manovich. manovich@
    2. New York Times (November 16, 2010): “The next big idea in language, history and the arts? Data.” NEH/NSF Digging into Data competition (2009): “How does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitized data, far more than they could read in a lifetime, what does that mean for research?”
    3. Why study big cultural data?
    4. (1) Study societies through their social media traces: social computing (but do we study society or only social media itself?). (2) A more inclusive understanding of history and the present (using much larger samples). (3) Detect large-scale cultural patterns. (4) The best way to follow global professionally produced digital culture; understand newly developed cultural fields (“X” design). (5) Map cultural variability and diversity.
    5. Data: 3,724 18th century volumes, using the 10,000 most frequent words (excluding proper nouns). Ted Underwood. The Differentiation of Literary and Nonliterary Diction, 1700-1900.
    6. Growth of a global culture space after 1990: cumulative number of new art biennales, 1895-2008.
    7. Modern (19th-20th centuries) social and cultural theory: describe what is similar (classes, structures, types) / statistics (reduction). Computational humanities and social science should focus on describing what is different / variability / diversity. Not “from data to knowledge” but from (incomplete) knowledge to actual cultural data.
    8. We are no longer interested in the conformity of an individual to an ideal type; we are now interested in the relation of an individual to the other individuals with which it interacts... Relations will be more important than categories; functions, which are variable, will be more important than purposes; transitions will be more important than boundaries; sequences will be more important than hierarchies. Louis Menand on Darwin, 2001.
    9. Visualization: Thinking without “large” categories. “The ontological status of assemblages, large and small, is always that of unique, singular individuals.” “Unlike taxonomic essentialism in which genus, species and individuals are separate ontological categories, the ontology of assemblages is flat since it contains nothing but differently scaled individual singularities.” Manuel DeLanda. A New Philosophy of Society.
    10. Bruno Latour: The “whole” is now nothing more than a provisional visualization which can be modified and reversed at will, by moving back to the individual components, and then looking for yet other tools to regroup the same elements into alternative assemblages.
    11. How to study big cultural data? How to explore massive visual collections (exploratory media analysis)? Which data analysis and visualization techniques are appropriate for non-technical users? How to democratize data analysis?
    12. Our approach: media visualization (visualizing media directly rather than only using abstract infovis language)
    13. visualizing large non-visual data using abstraction
    14. media visualization: showing visual data directly. Every cover of Time magazine, 1923-2009 (4535 images). X-axis = publication date. Y-axis = saturation mean.
    15. our media visualization software on 287 megapixel display (image: 1 million manga pages)
    16. our software on new display wall with thin bezels (data: 4535 Time magazine covers)
    17. Our methods: 1. media visualization using existing metadata - show complete collection; 2. media visualization using existing metadata - use samples to better reveal patterns; 3. digital image processing + media visualization (use simple image features which have direct perceptual meaning - and gradually introduce humanists to image processing)
    18. 1. media visualization / existing metadata: montage
    19. 2. media visualization / existing metadata / sample
    20. 3. digital image processing + media visualization. Image plots of selected paintings by six impressionist artists. X-axis = mean saturation. Y-axis = median hue. Megan O’Rourke, 2012.
    21. Advantages: replacing discrete categories with continuous attributes
    22. 1. from timelines to curves; 2. better represent analog cultural attributes; 3. understand cultural landscapes (fuzzy / overlapping / hard clusters?); 4. visualize cultural variability; 5. discover new groupings
    23. 1. from timelines to curves
    24. 2. better represent analog attributes
    25. 3. our maps of cultural landscapes reveal fuzzy/overlapping clusters - rather than discrete categories with hard boundaries
    26. 4. visualize cultural variability
    27. 5. discover new groupings
    28. Studying large cultural data challenges our existing theoretical concepts and assumptions. Example: what is “style”?
    29. one million manga pages
    30. single short manga series (>1000 pages)
    31. 776 Vincent van Gogh paintings
    32. Selected current projects: 7000 year old stone arrowheads (with UCSD anthropologist and CS postdoc at University of Washington); comparing Art Now & Graphic design Flickr groups (340,000 images) (with CS collaborator from Lawrence Berkeley National Laboratory); one million images (+ metadata) from deviantArt (with an art historian / DH collaborator from Netherlands Academy of Arts and Sciences)
    33. 4.7 million newspaper pages from Library of Congress (UCSD undergraduate students); virtual world / game analytics (NSF EAGER, with UCSD Experimental Games Lab); SEASR tools and workflows for working with image and video data (with NCSA at University of Illinois, Urbana-Champaign)
    34. Conclusion: Computational humanities
    35. “The capacity to collect and analyze massive amounts of data has transformed such fields as biology and physics. But the emergence of a data-driven computational social science has been much slower. Leading journals in economics, sociology, and political science show little evidence of this field. But computational social science is occurring in Internet companies such as Google and Yahoo, and in government agencies such as the U.S. National Security Agency.” “Computational Social Science.” Science, vol. 323, no. 6, February 2009. Digital humanities: scholars are mostly working with archives of digitized historical cultural materials which were created by libraries and universities with funding from NEH and other institutions.
    36. Computational humanities: Analyzing massive amounts of cultural content and people's conversations, opinions, and cultural activities online - personal and professional web sites, general and specialized social media networks and sites. This data offers us unprecedented opportunities to understand cultural processes and their dynamics and develop new concepts and models which can also be used to better understand the past. Current players in computational humanities: - Google, Facebook, YouTube, Bluefin Labs, Echo Nest, and many other companies which analyze social media signals (blogs, Twitter, etc.) and the content of media on social networks. - Computer scientists who are working with this data.
    37. manovich@
    38. Appendix: visualizing video collections. Use media visualization with a set of keyframes; automatic selection of key frames (for example, using free shot detection software)
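As a rough illustration of automatic key frame selection, the sketch below treats the mean absolute difference between consecutive frames as the "amount of visual change" and starts a new shot whenever that value exceeds a threshold. This is one simple approach and not necessarily what any particular shot detection software does. It assumes frames are equal-length lists of greyscale values (0-255). The names `visual_change` and `select_keyframes` and the default threshold of 30.0 are hypothetical choices made for this example.

```python
def visual_change(frame_a, frame_b):
    """Mean absolute difference between two equal-length greyscale frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_keyframes(frames, threshold=30.0):
    """Return indices of frames that open a new shot.

    Frame 0 is always a key frame; any later frame whose change from the
    previous frame exceeds `threshold` (an assumed value) is treated as
    the first frame of a new shot.
    """
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        if visual_change(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes


# Four tiny synthetic frames: a dark shot followed by a bright shot.
frames = [[10] * 4, [12] * 4, [200] * 4, [198] * 4]
keyframe_indices = select_keyframes(frames)
```

The selected frames can then be laid out with the same media visualization techniques used for still image collections, and the per-frame change values can be plotted directly as a continuous curve.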