London 2013 rgs_arrowsmith

Visualising large cinematic data

Speaker notes
  • Thanks, James. I’d like to start by acknowledging my co-authors on this presentation: Deb Verhoeven, a media and film expert and cinema historian with whom I have worked for probably the past six or seven years, and Alwyn Davidson, a former PhD student of mine who is working with us as a researcher on this project. The two disciplines are quite different, so the project blends the “QUALITATIVE” with the “QUANTITATIVE”.
    My presentation will take you through a project funded by the Australian Research Council (ARC) that aims to understand the spatial patterns of film diffusion throughout the world. The project is still in its development phase, but today I want to run through some of the methods we’ve used to analyse and visualise film movement and cinema venue activity. These methods may prove useful for understanding film movements that have been collected and stored as “Big Data”.
  • We’ve called the project “Kinomatics”, from “kino”, the Russian pronunciation of “cine” (cinema), a term often used in the cinema literature (e.g. the “Kino Cinema” in Melbourne). The name is also a play on kinetic energy, i.e. energy pertaining to movement.
  • We have a number of research questions, of which these are just a few.
    How has digitization affected film distribution? Films no longer move “physically” from venue to venue.
  • Let’s start by reviewing IBM’s dimensions of Big Data: VARIETY (data can be structured or unstructured, such as text, video and audio); VELOCITY (data are time-sensitive and can require streaming); and of course VOLUME, which comes in one size: LARGE.
    We would also add VISUALIZATION to that list, as a means of analysing patterns.
  • We’re downloading (via daily streaming) screenings of films across 48 countries around the world. The data are collected by a US company for commercial purposes such as advertising. Other large databases also hold some of this data (for example, the Internet Movie Database, or IMDb, at www.imdb.com), but they only cover films for specific regions.
    Compressed data files are downloaded automatically via a Perl synchronisation service.
    The project database is MySQL 5.1.67, the standard version for RHEL 6.
    It is stored on a virtual server at Deakin running Red Hat Enterprise Linux (RHEL).
    We currently have 63.5 million records and estimate that we will have 100 million by the end.
  • This is an outline of the database schema – the link between MOVIE and VENUE is the SHOWTIME (or screening date and time).
  • Geographic methods treat geographic location as true or near-to-true. They show geographic relationships between venues and the geographic movement of screenings.
    Non-geographic methods show relationships between distributors and venues.
    Much of the earlier project data came from project-based databases such as CAARP.
    Other visualisations: the “Information is beautiful” website (www.informationisbeautiful.net), which includes “Hollywood Visualizations” (http://www.informationisbeautiful.net/2012/hollywood-visualizations/).
  • The first example came from an earlier ARC Discovery Grant.
  • These maps are based on “Hot Spot” (Getis-Ord) analysis, which identifies statistically significant hot spots (high values, i.e. increases in cinema numbers) and cold spots (low values, i.e. losses in cinema numbers). Small polygons are an issue.
    This shows that state boundaries were not that significant, but topography was: compare the hilly terrain of north-eastern Victoria with the flat areas of NSW and Queensland. Distance was not important, but travel time was.
    Cars became influential in the late 1950s.
  • The cartogram was generated using the Gastner-Newman diffusion-based method, which equalises density across a set of polygons. It assigns the mean density to the area outside the polygons of interest so that their shape is maintained.
  • Screenings for “Skyfall” shown on 10 January 2013
  • Screenings of different films shown on 24 December 2012.
    Approximately 1,500 different films were screened more than 300,000 times at 82,000 venues.
  • Screenings for “Life of Pi”
  • The radial axis represents time: concentric circles increase outward from 1948 in 5-year intervals.
    Lines indicate how long each cinema venue operated.
    Colours indicate the venue operator.
    The centre is the Melbourne GPO (General Post Office).
    This could be used in a similar fashion to a weather map, with a petal diagram at each location.
  • Using “Tableau”
  • The Greek cinema circuit operated by staggering the release of films. The study period was one in which hundreds of thousands of Greeks migrated to Australia (250,000 Greeks came to Australia between 1952 and 1974).
    This uncovers the relationships between the cinemas themselves. Anecdotal evidence suggested that films tended to follow a predictable pathway, and we wanted to test this. Each film had a single release: one physical copy that moved from venue to venue.
    Markov chains: a stochastic process in which the next state depends only on the current state, so an initial condition can lead to a number of alternative outcomes.
    CAARP = Cinema and Audience Research Project
  • Much of the data came from the Greek-language newspaper “Neos Kosmos”.
  • 1 film went to “A”
    15 films went to “B” (6 went only to B)
  • A = Melbourne Town Hall (Melbourne)
    B = Lawson Theatre (Redfern)
    C = Doncaster Theatre (Sydney)
    D = Nicholas Hall (Melbourne)
  • Circos is used to visualise genome sequences (e.g. A, C, G and T are bases, and a sequence of three of these codes for one amino acid).
  • Produced using the “Circos” software, originally developed for identifying and analysing similarities and differences in genome structure and for comparing the sequences of multiple genomes.
    The parallels between visualising genome sequences and cinema venue sequences were evident. The circular layout made the connections between venues easier to organise than a linear layout.
    Hence it could be concluded that Finos Films had a much broader, more eclectic venue repertoire than Anzervos, whose films were more constrained to venues A, B and C.
  • Acknowledge Michelle Mantsio, the research assistant who collected the data, entered it into CAARP and drew these diagrams. The olive tree is a metaphor for Greek film distribution.
  • Multiple sources and types of data: publications, third-party commercial data and external databases, often collected for differing purposes. There is also a need for socio-demographic, meteorological/seasonal and other data.
    Multiple scales: local and global, with differing levels of spatial and attribute accuracy, so triangulation is needed to confirm validity. Our big data project is not truly global (48 countries): Hollywood is still in the process of signing agreements for digital distribution and still mails out hard copies. Some countries will be left out due to internet restrictions.
    Historic data: as above, there are often gaps in the data that cannot be filled.
    Multiple definitions: for example, the meaning of a “venue”. In country Australia, a venue may be a town hall or a travelling cinema (caravans).
    Finally, there is a need to visualise in different ways to build a collective story.
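The database-schema note above describes SHOWTIME as the link between MOVIE and VENUE. A minimal sketch of that many-to-many link follows. It assumes hypothetical table and column names (the real schema is only shown as a diagram) and uses Python's built-in sqlite3 in place of the project's MySQL server.

```python
import sqlite3

# Hypothetical, simplified schema. Table and column names are illustrative
# assumptions, not the project's actual MySQL definitions.
SCHEMA = """
CREATE TABLE movie (
    movie_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE venue (
    venue_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    country  TEXT NOT NULL
);
-- SHOWTIME links MOVIE and VENUE: one row per screening.
CREATE TABLE showtime (
    showtime_id INTEGER PRIMARY KEY,
    movie_id    INTEGER NOT NULL REFERENCES movie(movie_id),
    venue_id    INTEGER NOT NULL REFERENCES venue(venue_id),
    starts_at   TEXT NOT NULL  -- screening date and time, 'YYYY-MM-DD HH:MM'
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO movie VALUES (1, 'Skyfall')")
    conn.executemany("INSERT INTO venue VALUES (?, ?, ?)",
                     [(1, "Venue One", "AU"), (2, "Venue Two", "GB")])
    conn.executemany(
        "INSERT INTO showtime (movie_id, venue_id, starts_at) VALUES (?, ?, ?)",
        [(1, 1, "2013-01-10 19:00"),
         (1, 1, "2013-01-10 21:30"),
         (1, 2, "2013-01-10 20:00"),
         (1, 2, "2013-01-11 20:00")])
    return conn


def screenings_per_country(conn, title, day):
    """Count one movie's screenings on one day, grouped by venue country."""
    query = """
        SELECT v.country, COUNT(*)
        FROM showtime AS s
        JOIN movie AS m ON m.movie_id = s.movie_id
        JOIN venue AS v ON v.venue_id = s.venue_id
        WHERE m.title = ? AND date(s.starts_at) = ?
        GROUP BY v.country
        ORDER BY v.country
    """
    return conn.execute(query, (title, day)).fetchall()


if __name__ == "__main__":
    print(screenings_per_country(build_demo_db(), "Skyfall", "2013-01-10"))
```

Keeping screenings in their own table lets one movie appear at many venues and one venue show many movies. A single-film, single-day map such as the Skyfall example then reduces to two joins and a filter.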
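The hot-spot maps in the notes rely on Getis-Ord analysis. As an illustration only, the sketch below computes the Gi* z-score for each region i: Gi* = (Σj wij xj − x̄ Σj wij) / (S √[(n Σj wij² − (Σj wij)²) / (n − 1)]), where x̄ and S are the mean and standard deviation of all n values. The notes do not say which variant or which spatial weights were used, so the Gi* form and the simple neighbour weights in the example are assumptions.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return a Gi* z-score for each region.

    values  -- attribute per region, e.g. change in cinema numbers
    weights -- n x n spatial weights; for Gi* the diagonal w[i][i] is
               included, so each region counts as its own neighbour
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of all values (S in the formula).
    s = math.sqrt(sum(x * x for x in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        local_sum = sum(wij * xj for wij, xj in zip(w, values))
        numerator = local_sum - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


if __name__ == "__main__":
    # Six regions in a row; neighbours are the region itself and adjacent regions.
    weights = [[1 if abs(i - j) <= 1 else 0 for j in range(6)] for i in range(6)]
    changes = [8, 7, 1, 0, -6, -7]  # invented changes in cinema numbers
    print([round(z, 2) for z in getis_ord_gi_star(changes, weights)])
```

Large positive scores mark clusters of high values (hot spots, such as gains in cinemas) and large negative scores mark cold spots. A common rule of thumb treats |z| > 1.96 as significant at the 5% level.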
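The radial-diagram note describes rings at 5-year intervals from 1948, centred on the Melbourne GPO, with a line for each venue's period of operation. One way to place those lines is sketched below. It assumes that each venue's angle is its compass bearing from the GPO (the notes do not say) and uses approximate GPO coordinates.

```python
import math

# Approximate coordinates of the Melbourne GPO (assumption, for illustration).
CENTRE_LAT, CENTRE_LON = -37.8136, 144.9631
BASE_YEAR = 1948   # innermost ring
RING_YEARS = 5     # one ring per 5 years


def initial_bearing(lat1, lon1, lat2, lon2):
    """Great-circle initial bearing from point 1 to point 2, degrees from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def year_to_radius(year):
    """Radius measured in rings: 0 at 1948, 1 at 1953, 2 at 1958, ..."""
    return (year - BASE_YEAR) / RING_YEARS


def venue_segment(lat, lon, opened, closed):
    """Angle and radial extent (start, end) of one venue's line of operation."""
    bearing = initial_bearing(CENTRE_LAT, CENTRE_LON, lat, lon)
    return bearing, year_to_radius(opened), year_to_radius(closed)


if __name__ == "__main__":
    # Hypothetical venue due north of the GPO, operating 1950 to 1965.
    print(venue_segment(-37.0, CENTRE_LON, 1950, 1965))
```

Converting each (bearing, radius) pair to x, y plot coordinates is then a standard polar-to-Cartesian step in whatever charting tool is used.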
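The Markov-chain note treats each film's run through the circuit as a sequence of venue states. The sketch below shows a minimal first-order estimate. It assumes that transition probabilities are simply observed transition counts normalised per origin venue, since the notes do not give the estimation details. It is applied to venue sequences such as BCBCAB from the Fort of Freedom slide.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities between venues.

    sequences -- iterable of venue sequences, e.g. strings like "BCBCAB"
    Returns {from_venue: {to_venue: probability}}; each row sums to 1.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each move from one venue to the next within a film's run.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: n / total for nxt, n in sorted(row.items())}
    return probs


if __name__ == "__main__":
    print(transition_probabilities(["BCBCAB"]))
```

With more films pooled into `sequences`, the rows become estimates of where a print was likely to go next from each venue. This is the kind of "predictable pathway" the notes set out to test.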

Transcript

  • 1. Visualising heterogeneous cinema data sets. Big, Open Data and the Practice of GIScience, RGS-IBG Annual Conference, London, 29 August 2013. Colin Arrowsmith, School of Mathematical and Geospatial Sciences, RMIT University, Melbourne, Victoria, Australia; Deb Verhoeven and Alwyn Davidson, School of Communication and Creative Arts, Deakin University, Melbourne, Victoria, Australia
  • 2. A big data project: “Only at the movies: Kinomatics”. School of Mathematical and Geospatial Sciences
  • 3. Objective: To investigate spatial patterns of film diffusion across the world. – How do films circulate around the world? – Does spatial clustering affect film screening? – How does seasonality affect screening?
  • 4. Dimensions of “Big data”: • Variety • Velocity • Volume (IBM, “Bringing big data to the Enterprise”, http://www-01.ibm.com/software/au/data/bigdata/) • Visualization
  • 5. Working with “Big data” • Database downloaded from a commercial film data collector • 2 to 2.5 million showtime records per week • 30,000 movies downloaded after seven months • 28,000 cinema venues and 118,000 screens • 63.5 million records, equating to 4.8 GB of data
  • 6. Database schema
  • 7. Projects exploring approaches for visualising and analysing big film data • Geographic methods – Post-war cinema venues in Australia (change-over-time) – Global cartograms for cinema (point-in-time) – Global patterns of movement • Non-geographic (conceptual) – Multivariate visualisations (change-over-time) – Film circulation (Markov-Chains)
  • 8. Geographic examples • Post-war cinema venues in Australia (change-over-time) • Global cartograms for cinema (point-in-time) • Global patterns
  • 9. Static maps of post-war cinema venues in Australia • Basis for data was scanned “Film Weekly” summaries • Base year of 1948 derived • New and closed cinemas determined • Significant post-processing
  • 10. Film Weekly scan
  • 11. Rural-scale changes: 1948 to 1953, 1953 to 1958, 1958 to 1963, 1963 to 1968, 1968 to 1971
  • 12. Rural-scale changes: 1948 to 1953, 1953 to 1958, 1958 to 1963, 1963 to 1968, 1968 to 1971
  • 13. Urban-scale changes (Melbourne): 1948 to 1953, 1953 to 1958, 1958 to 1963, 1963 to 1968, 1968 to 1971
  • 14. Global cinema cartograms • A cartogram is a map in which a thematic variable is substituted for area (or distance) • Population substituted for area
  • 15. Cartograms: global cinema numbers
  • 16. Global screen numbers
  • 17. Continent-wide patterns
  • 18. Global patterns
  • 19. Life of Pi: 30 November 2012, 7 December 2012, 14 December 2012, 21 December 2012
  • 20. Life of Pi: 28 December 2012, 4 January 2013, 11 January 2013, 17 January 2013
  • 21. Life of Pi (November 2012 to January 2013)
  • 22. Life of Pi (November 2012 to January 2013)
  • 23. Non-geographic examples • Multivariate visualisations (change-over-time) • Film circulation (Markov-Chains)
  • 24.
  • 25. Visualisations
  • 26. Movement approaches: The Greek cinema circuit • Objective – To explore historical changes in the diasporic Greek cinema distribution of Finos and Anzervos films during the period 1956 to 1963 • Rationale – To demonstrate the role of geographic analysis in understanding cinema circuit behaviour
  • 27. Data acquisition • Archival newspaper and oral history research • Government records – censorship records – theatre licence and company records • Geo-location using street address or via GPS
  • 28. Anzervos
  • 29. Finos
  • 30. Anzervos (section)
  • 31. Finos (section)
  • 32. Key chains identified (number of venues in chain: Anzervos | Finos)
    1: B, C | B, A
    2: BC, CB | BC, AD
    3: BCB, CBC | BCB, BCA
    4: BCBC, MGPC | BCBC, BCBA
  • 33. Circos – circular visualisations • Film sequence (Fort of Freedom): – BCBBBBBCAABBBBBB by screening, or – BCBCAB by venue sequence
  • 34. Change in sequence (Anzervos): Ali Pasha and Mrs Frossini; The Fort of Freedom
  • 35. Change in sequence (Finos): Music, Poverty and Pride; Astero
  • 36. Change of venue date
  • 37. Change of venue date [charts of venue codes plotted by days (x-axis) against months (y-axis): Ali Pasha and Mrs Frosini; The Fort of Freedom]
  • 38. Change of venue date [charts of venue codes plotted by days (x-axis) against months (y-axis): Music, Poverty and Pride; Astero]
  • 39. OLIVE TREES • The olives are where films finished: green = Sydney venue, purple = Melbourne venue • Leaves are screenings: yellow is QLD, light green is NSW, darker green is VIC, dark brown is SA • Distance represents the days between screenings, drawn to scale • Panels: Anzervos; Finos
  • 40. Issues working with “big” complex cinema data • Multiple sources of data • Working at multiple scales • Working with historic data • Multiple definitions • Need for visualising both geographic and conceptual relationships
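The figures on slide 5 can be sanity-checked with a little arithmetic. The sketch assumes decimal gigabytes (10^9 bytes), a constant average record size and a constant download rate. None of these is stated on the slide.

```python
GB = 10**9  # assumption: decimal gigabytes; the slide does not say GB or GiB


def average_record_size(total_bytes, n_records):
    """Average stored size of one showtime record, in bytes."""
    return total_bytes / n_records


def projected_size(n_records, bytes_per_record):
    """Projected database size if the average record size stays constant."""
    return n_records * bytes_per_record


def weeks_to_accumulate(n_records, records_per_week):
    """Weeks of downloading needed to collect n_records at a fixed weekly rate."""
    return n_records / records_per_week


if __name__ == "__main__":
    per_record = average_record_size(4.8 * GB, 63.5e6)
    print(f"{per_record:.1f} bytes per record")
    print(f"{projected_size(100e6, per_record) / GB:.2f} GB at 100 million records")
    fast = weeks_to_accumulate(63.5e6, 2.5e6)  # upper rate on the slide
    slow = weeks_to_accumulate(63.5e6, 2.0e6)  # lower rate on the slide
    print(f"{fast:.1f} to {slow:.1f} weeks")
```

Under these assumptions the average record is about 75.6 bytes, and 100 million records would occupy roughly 7.6 GB. At 2 to 2.5 million records per week, 63.5 million records correspond to about 25 to 32 weeks of downloading, which is consistent with the "seven months" on the slide.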
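Slide 14 defines a cartogram as a map in which a thematic variable replaces area. The first step of any area cartogram is to work out each region's target area, as in the hedged sketch below. The Gastner-Newman method mentioned in the notes is one way to deform the map towards those targets, and that diffusion step is not reproduced here. Region names and numbers in the example are invented.

```python
def cartogram_target_areas(areas, values):
    """Return each region's target area, proportional to its thematic value.

    areas  -- dict mapping region -> current map area (any consistent unit)
    values -- dict mapping region -> thematic variable, e.g. cinema count
    The total map area is preserved, so regions only trade area with each other.
    """
    total_area = sum(areas.values())
    total_value = sum(values.values())
    return {region: total_area * values[region] / total_value for region in areas}


def scale_factors(areas, targets):
    """Ratio of target to current area: above 1 the region grows, below 1 it shrinks."""
    return {region: targets[region] / areas[region] for region in areas}


if __name__ == "__main__":
    # Invented numbers for illustration only.
    areas = {"Region X": 50.0, "Region Y": 50.0}
    cinemas = {"Region X": 300, "Region Y": 100}
    targets = cartogram_target_areas(areas, cinemas)
    print(targets)
    print(scale_factors(areas, targets))
```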
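Slide 33 shows the same film both as a screening sequence (BCBBBBBCAABBBBBB) and as a venue sequence (BCBCAB). Slide 32 lists key chains of one to four venues. A sketch of both steps follows. Collapsing consecutive screenings at one venue reproduces the slide's example. Counting every run of k consecutive venues, however, is only an assumption about how key chains were identified.

```python
from collections import Counter
from itertools import groupby


def venue_sequence(screenings):
    """Collapse consecutive screenings at the same venue into a single visit."""
    return "".join(venue for venue, _ in groupby(screenings))


def key_chains(sequences, length):
    """Count every run of `length` consecutive venues across all films.

    Assumption: this is one plausible reading of "key chains", not
    necessarily the procedure used in the study.
    """
    counts = Counter()
    for seq in sequences:
        for start in range(len(seq) - length + 1):
            counts[seq[start:start + length]] += 1
    return counts


if __name__ == "__main__":
    seq = venue_sequence("BCBBBBBCAABBBBBB")
    print(seq)
    print(key_chains([seq], 2).most_common())
```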
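The olive-tree and change-of-venue-date diagrams scale distances by the number of days between screenings. The sketch below shows that calculation with Python's datetime module. The dates are hypothetical and the 2 mm-per-day drawing scale is an arbitrary assumption.

```python
from datetime import date


def days_between_screenings(dates):
    """Gaps in days between successive screening dates, in chronological order."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def branch_lengths(gaps, mm_per_day=2.0):
    """Convert day gaps into drawing lengths at a fixed, assumed scale."""
    return [gap * mm_per_day for gap in gaps]


if __name__ == "__main__":
    # Hypothetical screening dates for one film, deliberately out of order.
    screenings = [date(1960, 2, 14), date(1960, 1, 1), date(1960, 1, 15)]
    gaps = days_between_screenings(screenings)
    print(gaps, branch_lengths(gaps))
```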