1. Film Big Data Visualization
Based on D3.js
By
ABDUL VAHED SHAIK
016452540
SAN JOSE STATE UNIVERSITY
2. Abstract
• D3.js is used to visualize a large body of film data so that it is
easier to understand and mine. Data on the films released in China in
2019 is collected with a crawler and presented with D3.js as a
histogram, doughnut chart, force-directed graph, map, and word cloud.
Rich interactive functions help people find the data they need quickly.
The resulting analysis can guide users in choosing films and offer some
decision support for the Chinese film industry. The work indicates that
D3.js is flexible and reliable for visualization in big data processing
tasks, at high speed and low cost.
3. Introduction
• As cultural and living standards have risen, people's demand for film viewing has grown,
which reflects an inner need to record everyday life. Beyond rest and entertainment,
watching movies offers people a subjective experience. According to the National Film
Administration, the total box office for movies in 2019 was 64.266 billion yuan, an
increase of 5.4% over the previous year; the total box office for domestic movies was
41.175 billion yuan, an increase of 8.65%. China's film industry is evidently thriving.
• The growth of resources and information in the big data era is astounding, and
visualization is the most efficient way for researchers to take in data from the outside
world. Analysts can often grasp the underlying information quickly when data is
presented visually [1,2]. Data visualization uses computer graphics, image processing,
and other technologies to transform data into images, interact with them, and explain
information and trends more easily [3]. To support more in-depth observation and
analysis, this work takes film-related data from Douban Movie and Endata and visualizes
it along several dimensions with D3.js.
4. Introduction to D3.js
• D3.js stands for Data-Driven Documents. It is a JavaScript library for
displaying data dynamically. D3.js can load data into the browser's
memory and bind it to the DOM (Document Object Model). It then applies
data-driven transformations to the document, manipulating it through
HTML, SVG, and CSS (the page elements rendered in the browser) to
produce the visualization.
• D3.js complies with Web standards and has excellent browser
compatibility. Users simply import the D3.js source file in the HTML
<head> tag; no proprietary framework is required. They can combine
data-driven DOM manipulation with dependable visualization
components [5]. Drawing with D3.js is simpler than working directly with
SVG or Canvas, and D3.js is more flexible and extensible than ECharts
and other open source visualization tools.
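The idea of binding data to DOM elements can be sketched in plain JavaScript. This is an illustration only, not the library's source: the helper name `join` is hypothetical, and it assumes the by-index matching that D3 applies when `selection.data` is called without a key function. The three groups it returns mirror D3's update, enter, and exit selections.

```javascript
// Hypothetical helper: split a dataset against a number of existing elements,
// matching by index (no key function).
function join(existingCount, data) {
  const shared = Math.min(existingCount, data.length);
  return {
    update: data.slice(0, shared),     // data that lands on elements already in the page
    enter: data.slice(shared),         // data that still needs a new element
    exitCount: existingCount - shared, // elements left without data
  };
}

// Two elements on the page, three values to show: one element must be added.
console.log(join(2, [10, 20, 30]));
// Three elements on the page, one value to show: two elements are surplus.
console.log(join(3, [10]));
```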
5. Overall design
• Visualization: After extracting and cleaning the necessary data from
the MySQL database, use D3.js to display it as a histogram, doughnut
chart, map, word cloud, and force-directed graph.
• Graphical interaction: Visual presentation and interaction are the
two main components of data visualization. Interaction does more
than ease the conflict between data overload and limited viewing
space: users take an active part in building mental models, which
helps them comprehend data and spot trends. This work provides
interaction through hover tooltips, search, button switching, link
jumps, and other techniques.
6. Implementation
• Python web crawlers gather information from Douban Movie and
Endata about the movies released between January 1 and December 31,
2019, as well as the 100 highest-grossing movies in the Chinese
mainland, and store it all in a MySQL database. Figure 1 shows the
raw data. The visualization system draws its charts with D3.js, using
the Bootstrap framework on the front end and Flask, a lightweight
Python web framework, on the back end.
7. Annual box office review
The annual box office review has two parts, shown as a histogram and a doughnut chart.
The first part covers the top 10 movies at the box office in 2019; the second covers the
box office of theater chains for the entire year. The histogram of the top 10 box office
hits of 2019 is created with D3.js in the following steps:
1. Load the dataset: Extract the names of the qualifying movies and their box office
figures from the database and send them to the front end. Place the dataset in the
browser's memory and name it "dataset1".
2. Bind data: Join "dataset1" to the designated document elements. Make the number of
elements match the length of the bound data, adding new elements where necessary.
Create a <rect> tag with the class "cube" for each datum so that each value corresponds
to a rectangle on the page.
3. Transform attributes: Control how each element appears by setting its attributes. Set
the width and height of "rect.cube" dynamically, and set the fill color and other
attributes, such as the rectangle's x and y coordinates, at the same time.
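The arithmetic behind the three steps above can be sketched without any library. This is a minimal sketch, not the author's code: the function name `layoutBars`, the field names `name` and `boxOffice`, the sample values, and the chart dimensions are all assumptions made for illustration. In a D3.js page the resulting values would be written onto the bound "rect.cube" elements with `attr`.

```javascript
// Hypothetical helper: compute the attributes of one <rect class="cube"> per datum.
function layoutBars(dataset1, width, height, padding) {
  const maxValue = Math.max(...dataset1.map((d) => d.boxOffice));
  const step = width / dataset1.length; // horizontal slot reserved for each bar
  return dataset1.map((d, i) => {
    const barHeight = (d.boxOffice / maxValue) * height; // tallest bar fills the chart
    return {
      name: d.name,
      x: i * step + padding / 2,
      y: height - barHeight, // SVG y grows downward, so bars hang from the baseline
      width: step - padding,
      height: barHeight,
    };
  });
}

// Placeholder values, not the real 2019 box office figures.
const dataset1 = [
  { name: "Film A", boxOffice: 50 },
  { name: "Film B", boxOffice: 25 },
];
const bars = layoutBars(dataset1, 200, 100, 10);
console.log(bars);
```

Keeping the geometry in one function makes it easy to check the numbers before any element is drawn.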
8.
9. The doughnut chart is built much like the histogram, but
it concentrates on the layout and on converting values into
angles and arc paths. Here it illustrates the box office of
theater chains. Figure 2 shows the finished doughnut chart.
In 2019, cinemas earned 64.123 billion yuan in total. Wanda
Cinemas took the largest share, followed closely by
Shanghai United, Southern Shinkansen, Jinyi Zhujiang,
Hengdian, Omnijoi, Huaxia United, Guangdong Dadi, China
Film Group Corporation, and China Film Stellar. Together,
these ten well-known theater chains made up 67.68% of the
total.
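The conversion of values into angles and arc paths can be sketched in plain JavaScript. D3 provides its own generators for this; the code here only shows the underlying arithmetic. The function names `toAngles` and `arcPath`, the radii, and the sample shares are assumptions for illustration, and angles are measured clockwise from the 12 o'clock position.

```javascript
// Hypothetical helper: turn a list of values into start/end angles (radians).
function toAngles(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  let start = 0;
  return values.map((v) => {
    const end = start + (v / total) * 2 * Math.PI;
    const slice = { startAngle: start, endAngle: end };
    start = end;
    return slice;
  });
}

// Hypothetical helper: SVG path for one doughnut slice centered on (0, 0).
// A single slice covering the full circle is not handled by this sketch.
function arcPath(slice, innerR, outerR) {
  const point = (r, a) =>
    `${(r * Math.sin(a)).toFixed(2)},${(-r * Math.cos(a)).toFixed(2)}`;
  const large = slice.endAngle - slice.startAngle > Math.PI ? 1 : 0;
  return [
    `M${point(outerR, slice.startAngle)}`,
    `A${outerR},${outerR} 0 ${large} 1 ${point(outerR, slice.endAngle)}`, // outer arc, clockwise
    `L${point(innerR, slice.endAngle)}`,
    `A${innerR},${innerR} 0 ${large} 0 ${point(innerR, slice.startAngle)}`, // inner arc, back
    "Z",
  ].join(" ");
}

// Placeholder shares, not the real theater chain figures.
const slices = toAngles([1, 1, 2]);
console.log(slices.map((s) => arcPath(s, 60, 100)));
```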
10. Box office in China's provinces
First define the map projection, using the
Mercator projection. Then define the path
generator. Using the configured projection, it
converts the GeoJSON map data described next into
sequences of pixel coordinates that form closed map
regions on the web page.
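What the projection and the path generator do can be sketched with the standard spherical Mercator formula. This is an illustration, not D3's implementation: the function names `mercator` and `ringToPath`, the center point, the scale, the translate offset, and the sample ring are all assumptions.

```javascript
// Hypothetical helper: project longitude/latitude (degrees) to pixel coordinates.
function mercator(lonDeg, latDeg, opts) {
  const rad = Math.PI / 180;
  const raw = (lon, lat) => [
    lon * rad,
    Math.log(Math.tan(Math.PI / 4 + (lat * rad) / 2)),
  ];
  const [x, y] = raw(lonDeg, latDeg);
  const [cx, cy] = raw(opts.center[0], opts.center[1]);
  return [
    opts.translate[0] + opts.scale * (x - cx),
    opts.translate[1] - opts.scale * (y - cy), // screen y grows downward
  ];
}

// Hypothetical helper: turn one ring of [lon, lat] pairs into a closed SVG path.
function ringToPath(ring, opts) {
  const pts = ring.map(([lon, lat]) =>
    mercator(lon, lat, opts).map((v) => v.toFixed(1)).join(",")
  );
  return `M${pts.join("L")}Z`;
}

// Assumed settings: a center near the middle of China, placed at pixel (400, 300).
const opts = { center: [105, 35], scale: 600, translate: [400, 300] };
// Placeholder ring, not a real province outline.
const ring = [[105, 35], [110, 35], [110, 40], [105, 35]];
console.log(ringToPath(ring, opts));
```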
To obtain the map objects, use the 'd3.json'
method to load the GeoJSON data, which contains
outline information for all of China's provinces,
municipalities, and counties. So that China's
territory is shown in full, the 'd3.xml' method
requests an SVG file of a map of the South China
Sea, which is added to the drawing area. Each
province corresponds to a path object: a path
element is created in the SVG to describe its
outline, and the data is bound by entering each
element. When filling the regions, use the
'linear()' method so that the color varies linearly
with the corresponding box office. Set the font size,
centering, and other attributes, then use CSS to
style the map's borders.
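The linear relation between box office and fill color can be sketched as a plain interpolation. This is a sketch under stated assumptions, not the deck's code: the function name `linearColor`, the domain, and the two end colors are hypothetical, and clamping out-of-range values is a choice made here for safety.

```javascript
// Hypothetical helper: map a value in [d0, d1] linearly onto a color between two RGB triples.
function linearColor(domain, range) {
  const [d0, d1] = domain;
  return (value) => {
    // Position of the value inside the domain, clamped to [0, 1] in this sketch.
    const t = Math.min(1, Math.max(0, (value - d0) / (d1 - d0)));
    const channel = (i) =>
      Math.round(range[0][i] + t * (range[1][i] - range[0][i]));
    return `rgb(${channel(0)}, ${channel(1)}, ${channel(2)})`;
  };
}

// Assumed domain and colors: white for the lowest box office, red for the highest.
const color = linearColor([0, 100], [[255, 255, 255], [255, 0, 0]]);
console.log(color(0), color(50), color(100));
```

Each province's path would then take its fill from `color(boxOffice)`.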
11. To see how box office changed in
2019 compared with 2018, users can toggle
the map view with a button. Figure 4 shows
that every province except the three
northeastern provinces ("Hei Ji Liao") is
growing, although the size of the increase
varies. In Heilongjiang, Jilin, and Liaoning,
slow economic growth, which is closely
tied to the heavy outflow of young people,
is the main cause of the negative box
office growth.
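The growth view comes down to a year-over-year rate per province. A minimal sketch, assuming each record carries both years' totals; the function name `growthByProvince`, the field names, and the sample values are hypothetical placeholders, not the real provincial figures.

```javascript
// Hypothetical helper: year-over-year growth rate for each province.
function growthByProvince(rows) {
  return rows.map((r) => ({
    province: r.province,
    growth: (r.boxOffice2019 - r.boxOffice2018) / r.boxOffice2018,
  }));
}

// Placeholder values for illustration only.
const rows = [
  { province: "Province A", boxOffice2018: 100, boxOffice2019: 110 },
  { province: "Province B", boxOffice2018: 200, boxOffice2019: 190 },
];
const growth = growthByProvince(rows);
// Provinces with negative growth are the ones the map would need to highlight.
const shrinking = growth.filter((g) => g.growth < 0).map((g) => g.province);
console.log(growth, shrinking);
```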
The growth of the Chinese film
industry has been very uneven, with large
regional differences. All of the top ten
provinces have substantial populations
and economies. This suggests that the
growth of the film market is influenced by
the state of the economy: if the economy
lags, the movie market may not be able to
grow steadily.
12. Box Office Ranking in the Chinese mainland
Figure 6 shows the word cloud for the top
100 films at the mainland China box office.
Text size is directly proportional to box
office. The films in Figure 6 are colored blue,
orange, green, red, purple, and brown to
indicate box office of 5 billion, 4 billion,
3 billion, 2 billion, 1 billion, and less than
1 billion. Clicking a film links interactively to
a histogram of its weekly box office revenue,
showing its official release window and each
week's box office. Figure 7 shows the weekly
box office of "Wolf Warrior 2".
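The sizing and coloring rule for the word cloud can be sketched as follows. This assumes the colors map to the box office tiers in the order they are listed and that each tier means "at least that amount"; the names `TIERS` and `styleWord`, the maximum font size, and the sample figures are hypothetical.

```javascript
// Assumed tier table: lower bound of each box office tier and its color.
const TIERS = [
  { min: 5e9, color: "blue" },
  { min: 4e9, color: "orange" },
  { min: 3e9, color: "green" },
  { min: 2e9, color: "red" },
  { min: 1e9, color: "purple" },
  { min: 0, color: "brown" },
];

// Hypothetical helper: font size proportional to box office, color taken from the tier.
function styleWord(boxOffice, maxBoxOffice, maxFontPx) {
  const tier = TIERS.find((t) => boxOffice >= t.min);
  return {
    fontSize: (boxOffice / maxBoxOffice) * maxFontPx,
    color: tier.color,
  };
}

// Placeholder figures, not real box office totals.
console.log(styleWord(5.5e9, 5.5e9, 60));
console.log(styleWord(1.1e9, 5.5e9, 60));
```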
13. An analysis of the weekly box office for
these 100 movies shows that the highest box
office usually occurs in the first or second
week after a movie's official release. This is
closely tied to the film's release schedule
and content as well as its promotion. In
general, the highest openings belong to
series movies and well-known IP movies such
as "Fast & Furious" and the Marvel series,
and to movies released for special occasions,
such as "My People, My Country" and "The
Captain," which were made to commemorate
the 70th anniversary of the founding of the
People's Republic of China.
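The peak-week analysis described above can be sketched as a small tally. This is an illustration, not the analysis code used for the deck: the function names `peakWeek` and `peakWeekCounts`, the `weekly` field, and the sample numbers are assumptions.

```javascript
// Hypothetical helper: 1-based index of the week with the highest box office.
function peakWeek(weekly) {
  let best = 0;
  for (let i = 1; i < weekly.length; i++) {
    if (weekly[i] > weekly[best]) best = i;
  }
  return best + 1;
}

// Hypothetical helper: count how many films peak in each week.
function peakWeekCounts(films) {
  const counts = {};
  for (const film of films) {
    const week = peakWeek(film.weekly);
    counts[week] = (counts[week] || 0) + 1;
  }
  return counts;
}

// Placeholder weekly figures for illustration only.
const sample = [
  { title: "Film A", weekly: [120, 80, 30] },
  { title: "Film B", weekly: [60, 90, 40, 10] },
  { title: "Film C", weekly: [200, 50] },
];
const counts = peakWeekCounts(sample);
console.log(counts);
```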
14. Figure 8 shows a word cloud of the top 100
films at the mainland China box office,
colored by Douban rating. Blue, orange,
green, red, purple, and brown distinguish
ratings above 9.0, 8.0, 7.0, 6.0, and 5.0
points and ratings below 5.0 points. Movies
with successful box office performances
evidently tend to have a good reputation.
They are not only commercially successful
but also appeal to a broad audience through
their compelling plots and artistic value.
15. "Film-Director-Actor" relationship
Figure 9 shows the finished
force-directed graph of the
"Film-Director-Actor" relationship, which
makes it quick to associate films, directors,
and actors. The nodes are colored green,
blue, and red to distinguish actors,
directors, and movies. The graph has three
display modes: node, text, and hidden. The
first shows the related films and people as
nodes, and the second shows them as text.
The third reveals only the data related to
the node currently under the pointer and
hides the rest.
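The data behind such a graph, and the filtering needed for the hidden mode, can be sketched in plain JavaScript. This is a sketch under assumptions, not the deck's implementation: the record shape (`title`, `directors`, `actors`), the function names `buildGraph` and `neighborhood`, and the sample records are hypothetical, and names are assumed to be unique across films and people.

```javascript
// Hypothetical helper: build node and link lists from film records.
function buildGraph(films) {
  const nodes = new Map(); // id -> { id, roles }
  const links = [];
  const touch = (id, role) => {
    if (!nodes.has(id)) nodes.set(id, { id, roles: new Set() });
    nodes.get(id).roles.add(role);
  };
  for (const film of films) {
    touch(film.title, "film");
    for (const d of film.directors) {
      touch(d, "director");
      links.push({ source: d, target: film.title, role: "director" });
    }
    for (const a of film.actors) {
      touch(a, "actor");
      links.push({ source: a, target: film.title, role: "actor" });
    }
  }
  return { nodes: [...nodes.values()], links };
}

// Hypothetical helper: ids that stay visible when one node is pointed at.
function neighborhood(graph, id) {
  const keep = new Set([id]);
  for (const l of graph.links) {
    if (l.source === id) keep.add(l.target);
    if (l.target === id) keep.add(l.source);
  }
  return keep;
}

// Placeholder records, not taken from the real dataset.
const graph = buildGraph([
  { title: "Film X", directors: ["Person P"], actors: ["Person P", "Person Q"] },
  { title: "Film Y", directors: ["Person R"], actors: ["Person Q"] },
]);
console.log([...neighborhood(graph, "Person P")]);
```

The node and link lists are the usual input to a force layout, which then computes the positions.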
Figure 10 shows the node
information related to "Huang Bo". It is
evident that in 2019 Huang Bo worked as
both director and actor on two movies,
"My People, My Country" and "Gone With
the Light".
16. CONCLUSION
• This document describes the use of D3.js in data visualization and
analysis. By displaying and evaluating visualizations of the 2019 film
data and of the top 100 films in the history of the Chinese mainland
box office, it can suggest production directions for film companies
and provide data references for film producers.
17. REFERENCES
• 1. Ren L. Research on interaction techniques in information visualization [Ph.D.
Thesis]. Beijing: The Chinese Academy of Sciences, 2009. (in Chinese)
• 2. Card SK, Mackinlay JD, Shneiderman B. Readings in Information Visualization:
Using Vision To Think. San Francisco: Morgan Kaufmann Publishers, 1999. 1-712.
• 3. Keim D, Andrienko G, Fekete J, Görg C, Kohlhammer J, Melancon G. Visual
analytics: Definition, process, and challenges. In: Kerren A, ed. Proc. of the
Information Visualization. LNCS 4950, Berlin: Springer-Verlag, 2008. 154-175.
[doi: 10.1007/978-3-540-70956-5_7]
• 4. Zhao C. Application research of visualization library D3.js [J]. Information
Technology and Informatization, 2015(02): 107-109. (in Chinese)
• 5. Krill P. D3.js JavaScript data visualization package goes modular [J].
InfoWorld.com, 2016.