[ICMR 2020] Automatic Color Scheme Extraction from Movies
1. Automatic Color Scheme Extraction from Movies
Suzi Kim and Sunghee Choi
Geometric Computing Lab.
School of Computing, KAIST
Oral Session 4: Semantic Enrichment
3. Color Scheme from Movies
Why do people extract color schemes from movies?
A color scheme can be a very simple and accurate descriptor
to quantify a movie’s mise-en-scène.
(Diagram: roles shaping a movie's color: Director, Cinematographer, Gaffer, Colorist, Dresser, Make-up Artist.)
4. Color Scheme from Movies
Characteristics of the Color Scheme from Movies
(1) Combination of colors appears over several scenes.
https://thedesigninspirationalist.com/color-in-films-the-design-inspirationalist/
5. Color Scheme from Movies
Characteristics of the Color Scheme from Movies
(2) Even in the same movie, the combination of colors appears differently depending on the scene.
https://twitter.com/CINEMAPALETTES/
6. Problem Definition
Automatic Color Scheme Extraction from a Movie
Although it is a challenge to extract major colors from so many images
with complicated contents, there exists a dominant color scheme.
A movie:
- is carefully edited with many different objects and heterogeneous contents
- is generally longer than general videos
- includes 200,000–250,000 images
7. Problem Definition
Color Schemes over the Metadata
La La Land (2016)
Damien Chazelle
Comedy, Drama, Music
Whiplash (2014)
Damien Chazelle
Drama, Music
Léon: The Professional (1994)
Luc Besson
Action, Crime, Drama
Constantine (2005)
Francis Lawrence
Action, Fantasy, Horror
Similar metadata
Similar colors
11. Methods: (1) Semi-master-shot Boundary Detection
Segmentation Techniques
Shot Boundary Detection: detection of shot change; image (frame) grouping; similarity comparison of images
Scene Boundary Detection: detection of scene change; semantic grouping; similarity comparison of videos (shots)
12. Methods: (1) Semi-master-shot Boundary Detection
Why is the scene segmentation so difficult?
Dynamic camera moves
Unclear transitions
- Abrupt transition
- Gradual transition: dissolve, fade-in, fade-out, wipe
Semantic analysis
“Since a scene is based on human understanding of
the meaning of a video segment, it is very difficult to give an objective
and concise scene definition that covers all possible scenes judged by humans”
- Tavanapong and Zhou, TMM, 2004
13. Methods: (1) Semi-master-shot Boundary Detection
Master Shot
A single shot that contains all the characters, representing the atmosphere of all the space being filmed.
Not included in the actual movie due to its unappealing style.
Semi-master-shot
Combines contiguous shots taken in the same location with similar colors.
14. Methods: (1) Semi-master-shot Boundary Detection
Imagelab Shot Detector (ILSD)
Shot and Scene Detection via Hierarchical Clustering for Re-using Broadcast Video [Baraldi et al., 2015, CAIP]
Pipeline: Input Video → Feature Extraction for Each Frame → Similarity Measurement between Frames → Clustering of Similar Frames (Segmented Shots) → Hierarchical Clustering (Segmented Scenes)
15. Methods: (1) Semi-master-shot Boundary Detection
ILSD: Feature Selection
Candidate features for each frame:
RGB colors
Visual features (SIFT, HOG, etc.)
Audio
Edges
Object detection
16. Methods: (1) Semi-master-shot Boundary Detection
ILSD: Similarity Measurement
Squared difference between every corresponding pixel in two frames
Chi-squared distance of RGB color histograms
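The two color-difference metrics can be sketched in a few lines of Python. This is an illustrative sketch, not ILSD's actual implementation: the function names are mine, frames are assumed to be lists of RGB tuples, and the equal weighting of the two terms is an assumption.

```python
def pixel_sq_diff(frame_a, frame_b):
    """Sum of squared differences between corresponding RGB pixels."""
    return sum(
        (ca - cb) ** 2
        for pa, pb in zip(frame_a, frame_b)
        for ca, cb in zip(pa, pb)
    )

def chi2_hist_dist(hist_a, hist_b, eps=1e-9):
    """Chi-squared distance between two RGB color histograms."""
    return 0.5 * sum(
        (a - b) ** 2 / (a + b + eps)
        for a, b in zip(hist_a, hist_b)
    )

def frame_difference(frame_a, frame_b, hist_a, hist_b, w=1.0):
    """ILSD-style frame difference: sum of the two metrics (weight w assumed)."""
    return pixel_sq_diff(frame_a, frame_b) + w * chi2_hist_dist(hist_a, hist_b)
```

The pixel term is sensitive to motion and cuts; the histogram term is robust to small camera moves, which is why ILSD combines both.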
17. Methods: (1) Semi-master-shot Boundary Detection
ILSD: Clustering Method
A sliding window, shifting in one direction, compares frame differences centered on the current frame.
Abrupt transition:
(1) The difference between two adjacent frames exceeds 𝑇.
(2) The difference between neighboring shots exceeds 𝑇/2.
Gradual transition:
(1) Repeat the abrupt-transition process while increasing the sliding-window size up to a maximum of 𝑊.
18. Methods: (1) Semi-master-shot Boundary Detection
ILSD: Hierarchical Clustering
Groups adjacent shots into scenes.
To prevent duplicate detection of the same transition, a safe zone separates two adjacent transitions by more than 𝑇𝑠 frames.
19. Methods: (1) Semi-master-shot Boundary Detection
Issues in Scene Clustering
It assumes a pre-known number of scenes.
A semi-master-shot does not require perfect scene segmentation, so full scene clustering adds needless computational overhead.
21. Methods: (1) Semi-master-shot Boundary Detection
Our Pipeline (stages SBD1–SBD4): Input Video → Feature Extraction for Each Frame → Similarity Measurement between Frames → Clustering of Similar Frames → Hierarchical Clustering → Segmented Semi-master-shots
22. Methods: (1) Semi-master-shot Boundary Detection
Mitigating 𝑇
𝑇 is relaxed to determine the color difference between shots.
Use 𝑇𝑠*, proportional to the average length of shots, in place of the fixed safe zone 𝑇𝑠.
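As a rough sketch of the adaptive safe zone, 𝑇𝑠* can be derived from the detected shot boundaries. The proportionality constant `k` is a hypothetical parameter for illustration; the slide does not give the actual factor.

```python
def adaptive_safe_zone(shot_boundaries, k=0.5):
    """Safe-zone interval T_s* proportional to the average shot length.

    shot_boundaries: sorted frame indices delimiting shots.
    k: hypothetical proportionality constant (illustrative, not from the paper).
    """
    lengths = [b - a for a, b in zip(shot_boundaries, shot_boundaries[1:])]
    avg_len = sum(lengths) / len(lengths)
    return k * avg_len
```

A movie cut at a fast pace thus gets a short safe zone, while slowly paced footage gets a long one, which is the point of making 𝑇𝑠* adaptive.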
23. Methods: (2) Base Palette Extraction
Pipeline (stages BPE1, BPE2): Segmented Semi-master-shot → Keyframe Selector (BPE1) → Keyframes and Saliency Maps → Color Clustering (BPE2) → Base Palette from a semi-master-shot
24. Methods: (2) Base Palette Extraction
Importance of Saliency
(Figure: input image, its saliency map, and the color schemes extracted without saliency (top) and with saliency (bottom).)
25. Methods: (2) Base Palette Extraction
Keyframe Selector
Select the top 𝑚 frames of a shot by the cost 𝐶:
𝐶(𝑓) = 𝛼𝑠 𝐶saliency(𝑓) + 𝛼𝑐 𝐶clarity(𝑓) + 𝛼𝑟 𝐶representativeness(𝑓)
𝐶saliency (how important a frame is): average of the saliency values of its pixels, 𝐶saliency(𝑓) = (1/|𝑓|) Σ𝑝∈𝑓 𝜇𝑝
𝐶clarity (how clear a frame is, without blur): 𝐶clarity(𝑓) = 1 − 0.01 · BRISQUE(𝑓)
𝐶representativeness (how representative a frame is among all frames in the shot): 𝐶representativeness(𝑓) = (1/|𝑆|) Σ𝑓*∈𝑆 Sim(𝑓, 𝑓*)
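The top-𝑚 selection above can be sketched as follows. The three per-frame scores are assumed precomputed (saliency averaging, BRISQUE, and pairwise similarity are outside this sketch), and the function name and the unit weights are my own choices.

```python
def select_keyframes(frames, saliency, clarity, representativeness,
                     m=3, alpha_s=1.0, alpha_c=1.0, alpha_r=1.0):
    """Pick the top-m frames of a shot by the combined cost
    C(f) = a_s*C_saliency + a_c*C_clarity + a_r*C_representativeness.
    The three per-frame score lists are assumed precomputed elsewhere."""
    cost = [
        alpha_s * s + alpha_c * c + alpha_r * r
        for s, c, r in zip(saliency, clarity, representativeness)
    ]
    # Rank frame indices by descending cost and keep the first m.
    ranked = sorted(range(len(frames)), key=lambda i: cost[i], reverse=True)
    return [frames[i] for i in ranked[:m]]
```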
26. Methods: (2) Base Palette Extraction
Color Clustering
k-means clustering of a color set 𝑆 using RGB Euclidean distance.
Clusters are weighted according to each pixel's visual attention using saliency maps.
- The color of pixel 𝑝 is included in the color set 𝑆 with probability 𝜓(𝜇𝑝), a weighted random function generating 0 or 1 with weight 𝜇𝑝, the pixel saliency.
- The higher the saliency of pixel 𝑝, the higher the probability that the clustering will include the color of 𝑝.
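The saliency-weighted sampling step, the distinctive part of this stage, can be sketched as below; the resulting color set would then be fed to any standard k-means routine. The function name is mine, and ψ is modeled as a simple Bernoulli draw with probability 𝜇𝑝.

```python
import random

def saliency_weighted_sample(pixels, saliency, rng=None):
    """Build the color set S: pixel p's RGB enters S with probability mu_p.

    pixels:   list of (r, g, b) tuples.
    saliency: matching list of mu_p values in [0, 1].
    Highly salient pixels are more likely to be kept, biasing the
    subsequent k-means clustering toward visually attended colors.
    """
    rng = rng or random.Random()
    return [c for c, mu in zip(pixels, saliency) if rng.random() < mu]
```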
27. Methods: (3) Color Scheme Merge
Problem Definition
Merge the base palettes from semi-master-shots into a single color scheme.
28. Methods: (3) Color Scheme Merge
Two Issues
Base palettes still retain a large number of colors.
Overlapping palettes exist.
29. Methods: (3) Color Scheme Merge
We use convex hull enclosing to include all colors in the base palettes.
30. Methods: (3) Color Scheme Merge
Finding Center Point 𝒗 (CSM1)
31. Methods: (3) Color Scheme Merge
Convex Hull Enclosing (CSM2)
33. Methods: (3) Color Scheme Merge
Finding Weighted Center of Each Sub-hull (CSM4)
Each color enclosed in a sub-hull is weighted according to the length of the semi-master-shot its node belongs to.
The weighted center of the sub-hull becomes a color of the final color scheme.
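The weighted-center step can be sketched as a shot-length-weighted mean over the colors in one sub-hull. This is a minimal sketch assuming plain RGB tuples; the convex-hull partitioning itself is omitted, and the function name is mine.

```python
def weighted_center(colors, shot_lengths):
    """Weighted mean of the RGB colors enclosed in one sub-hull.

    colors:       list of (r, g, b) tuples from the base palettes.
    shot_lengths: matching list of semi-master-shot lengths, used as
                  weights so long shots dominate the final color.
    Returns the weighted center as an (r, g, b) tuple of floats.
    """
    total = sum(shot_lengths)
    return tuple(
        sum(c[i] * w for c, w in zip(colors, shot_lengths)) / total
        for i in range(3)
    )
```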
34. Experiments
Datasets
OVSD
- Description: open dataset of CCL videos
- Composition: 21 short or full-length movies from various genres
- Pros: ground truth exists for scene detection
- Cons: aesthetically insufficient to evaluate the color scheme extraction
Commercial Movie Dataset
- Description: manually collected to compare current works by artists or descriptor roles
- Composition: 53 commercial movies from various genres
- Pros: contains richer narrative patterns of shots and scenes
- Cons: difficult to use freely due to copyright issues
35. Experiments
(1) Semi-master-shot Boundary Detection Accuracy
Dataset: OVSD
Method | F1 score | Precision | Recall
ILSD | 0.738 | 0.623 | 0.906
Ours w/o 𝑇𝑠* | 0.771 | 0.642 | 0.964
Ours w/ 𝑇𝑠* | 0.793 | 0.662 | 0.990
▶ Highest F1 score with 𝑇 = 50, 𝑊 = 2.5
(Figure: average F1 score over 𝑊 and 𝑇 on OVSD.)
45. Experiments
(5) Color Scheme Extraction: Wes Anderson’s
https://shoptwentyseven.com/products/wes-anderson-procreate-palette
https://www.fortyclothing.com/who-is-wes-anderson/wes-anderson-colour-palette/
46. Experiments
(5) Color Scheme Extraction: Wes Anderson’s
The Life Aquatic
with Steve Zissou
Fantastic Mr. Fox
Bottle Rocket
The Royal Tenenbaums
The Darjeeling Limited
Moonrise Kingdom
The Grand Budapest Hotel
Rushmore
Isle of Dogs
47. Experiments
(5) Color Scheme Extraction: Makoto Shinkai’s
https://dreamaction.co/color-palette-from-kimi-no-nawa/
48. Experiments
(5) Color Scheme Extraction: Makoto Shinkai’s
The Place Promised in Our Early Days
Your Name
5 Centimeters per Second
49. Experiments
(5) Color Scheme Extraction: Marvel Cinematic Universe Films
Iron Man 3
Iron Man
Iron Man 2
Captain America: The Winter Soldier
Captain America: Civil War
Captain America: The First Avenger
The Avengers
Avengers: Infinity War
Avengers: Age of Ultron
50. Experiments
(5) Color Scheme Extraction: Retrieval
Query Movie
The Matrix Reloaded (2003)
Lana Wachowski, Lilly Wachowski
Action, Sci-Fi
Top-5 Results
The Matrix (1999)
Lana Wachowski, Lilly Wachowski
Action, Sci-Fi
The Matrix Revolutions (2003)
Lana Wachowski, Lilly Wachowski
Action, Sci-Fi
Fight Club (1999)
David Fincher
Drama
The Lord of the Rings: The Fellowship of the Ring (2001)
Peter Jackson
Adventure, Drama, Fantasy
Days of Being Wild (1990)
Kar-Wai Wong
Crime, Drama, Romance
51. Experiments
(5) Color Scheme Extraction: Retrieval
Query Movie
The Grand Budapest Hotel (2014)
Wes Anderson
Adventure, Comedy, Crime
Top-5 Results
The Wolf of Wall Street (2013)
Martin Scorsese
Biography, Crime, Drama
Avengers: Infinity War (2018)
Anthony Russo, Joe Russo
Action, Adventure, Sci-Fi
5 Centimeters per Second (2007)
Makoto Shinkai
Animation, Drama, Family
Punch-Drunk Love (2002)
Paul Thomas Anderson
Comedy, Drama, Romance
Moulin Rouge! (2001)
Baz Luhrmann
Drama, Musical, Romance
52. Conclusion
Contributions
To the best of our knowledge, this is the first work to generate a color
scheme from a video.
We split a video into small units and choose the final color scheme in a
bottom-up manner.
We measure importance at three levels: each pixel in the frame, each frame in the shot, and each shot in the movie.
We define a semi-master-shot, which is a new unit to combine
contiguous shots taken in the same location with similar colors.
We demonstrate the proposed color scheme’s plausibility and
functionality as a descriptor using real movie videos.
53. Conclusion
Applications
Our color scheme extraction is useful for video processing such as
video recoloring, vectorization, and segmentation.
It can also be used to check the color tone during the actual film
production process.
54. Discussion
Future Work
Improvement of the semi-master-shot to include non-contiguous shots.
Weighting the relationship between colors in the same base palette
during merging.
Using hidden information to extract movie descriptors aside from
colors, e.g. time series editing, costume design, cinematography,
production design, etc.
55. Thank you
Automatic Color Scheme Extraction from Movies
Suzi Kim and Sunghee Choi
Geometric Computing Lab., School of Computing, KAIST
https://github.com/SuziKim/ICMR2020-MovieColorSchemer
A color scheme can be a very simple and accurate descriptor to quantify a movie’s mise-en-scène.
Film production strongly considers the colors that dominate a movie, with a cinema colorist adjusting the overall movie color.
Directors leverage the colors to support the narrative of the movie and generate a unified fictional space.
According to the Cinematography for Directors [12], the color scheme is an interpretation of the scenario by the cinematographer and it can convey a mood or feeling that stays with the viewer after the movie has ended.
This is because the color scheme is not just a result shot by a camera, but a combination of various elements of film production, including backgrounds and sets created by a production designer, lighting set by the gaffer [15], and costumes created by a wardrobe designer [28].
Several previous studies have considered color scheme extraction
from images [20, 21, 24, 37] but little attention has been paid to
the extraction from a video [75], and particularly from a movie. A
movie is an elaborate compilation by the director, embodying their
message and values. In contrast to general videos, which are
filmed continuously without shot or scene distinctions, movies are
carefully edited with many different objects and heterogeneous contents.
Movies are generally longer than general videos, although
not usually exceeding three hours, and include 200,000–250,000
images (assuming 24 fps). Although it is a challenge to extract major
colors from so many images with complicated contents, there
exists a dominant color scheme, by design, as one can imagine the
color palettes after watching Wes Anderson's movies.
Each field of metadata, such as genre and director, cannot be
a primary key that separates all movies individually. In the same
manner, we do not intend to distinguish all movies only with the
proposed color scheme. We attempt to show that the color scheme
is not a unique characteristic for each movie, but a contributing
factor to cluster the movies. For example, La La Land (2016) and
Whiplash (2014) are drama musical films written and directed by
Damien Chazelle. They share similar metadata, i.e., director, genre,
and casting, but give different impressions due to the intensity
of colors dominating the whole duration, as shown in Figure 2.
Whiplash should also be linked to Constantine (2005) and Léon:
The Professional (1994), which maintain similar color tones, but for
now there is no common metadata to connect these films. A color
scheme can be a very simple and accurate descriptor to quantify a
movie’s mise-en-scène.
Key-frame: representative frame of the shot
Shot: segment of audio-visual data filmed in a single camera take
≈ 900 shots in a movie
Scene: sequence of interrelated shots that share a common semantic thread
≈ 120 scenes in a movie
A master shot is a single shot that contains all the characters,
representing the atmosphere of all the space being filmed. Modern
movies use master shots during the production stage, but they tend
not to be included in the actual movie due to their unappealing
style. Therefore, we define a semi-master-shot rather than a true
master shot, combining contiguous shots taken in the same location
with similar colors.
We define a semi-master-shot, which is a new unit to combine
contiguous shots taken in the same location with similar
colors. The semi-master-shot can be used in video processing,
which has been actively studied for decades, such as
video highlight detection and video thumbnail generation.
Semi-master-shots are generally clustered by color difference
using local descriptors for similarity factors, such as SIFT or SURF,
which requires considerable computational overhead. In contrast,
we adopt the Imagelab Shot Detector (ILSD) [7] segmentation
method, which only considers RGB colors. ILSD measures the similarity
between frames as the sum of two color difference metrics:
squared difference between every corresponding pixel in two
frames, and chi-squared distance of RGB color histograms. Similar
frames are clustered using a sliding window to compare frame
differences centered on the current frame, shifting in one direction.
Generally, ILSD detects abrupt and gradual transitions separately.
The 𝑖-th frame, 𝑓𝑖 , is regarded as an abrupt transition if
the difference between 𝑓𝑖 and 𝑓𝑖+1 exceeds some threshold, 𝑇 , and
differences between neighboring shots exceed 𝑇 /2. Gradual transitions
are identified by repeating the process for detecting abrupt
transition with increasing window size up to the maximum size of
𝑊. After shot detection, ILSD groups adjacent shots into scenes
using hierarchical clustering. To prevent duplicate detection of the
same transition, the two adjacent transitions are separated at frame
intervals of more than a constant 𝑇𝑠 , which is called the safe zone.
Although Baraldi et al. [7] group shots, which are segmented
by ILSD, into a scene by clustering based on the color comparison,
we cannot use the scene as a semi-master-shot for two reasons.
First, they perform scene clustering using a fixed number of clusters,
i.e., assuming it already knows the total number of scenes.
Second, since the semi-master-shot does not require perfect scene
segmentation, scene clustering increases computational overhead.
Therefore, we acquire semi-master-shots mitigating 𝑇 to determine
the color difference between shots. To enhance the function of the
safe zone, we use 𝑇𝑠*, which is proportional to the average length
of shots, instead of the fixed value of 𝑇𝑠 (see Section 7 for details).
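The abrupt-cut rule and the safe zone described above can be sketched as follows. This is a simplification with names of my own choosing: the neighboring-shot 𝑇/2 check and the gradual-transition sweep over window sizes up to 𝑊 are omitted, and frame differences are assumed precomputed.

```python
def abrupt_transitions(diffs, T, safe_zone):
    """Detect abrupt cuts from precomputed frame differences.

    diffs[i] is the difference between frames i and i+1. Frame i is a
    cut candidate when diffs[i] exceeds T; candidates closer than
    `safe_zone` frames to the previous accepted cut are discarded,
    which prevents duplicate detection of the same transition.
    """
    cuts, last = [], -safe_zone - 1
    for i, d in enumerate(diffs):
        if d > T and i - last > safe_zone:
            cuts.append(i)
            last = i
    return cuts
```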
Semi-master-shots are generally clustered by color difference
using local descriptors for similarity factors, such as SIFT or SURF,
which requires considerable computational overhead. In contrast,
we adopt the Imagelab Shot Detector (ILSD) [7] segmentation
method, which only considers RGB colors. ILSD measures the similarity
between frames as the sum of two color difference metrics:
squared difference between every corresponding pixel in two
frames, and chi-squared distance of RGB color histograms. Similar
frames are clustered using a sliding window to compare frame
differences centered on the current frame, shifting in one direction.
Generally, ILSD detects abrupt and gradual transitions separately.
The 𝑖-th frame, 𝑓𝑖 , is regarded as an abrupt transition if
the difference between 𝑓𝑖 and 𝑓𝑖+1 exceeds some threshold, 𝑇 , and
differences between neighboring shots exceed 𝑇 /2. Gradual transitions
are identified by repeating the process for detecting abrupt
transition with increasing window size up to the maximum size of
𝑊. After shot detection, ILSD groups adjacent shots into scenes
using hierarchical clustering. To prevent duplicate detection of the
same transition, the two adjacent transitions are separated at frame
intervals of more than a constant 𝑇𝑠 , which is called the safe zone.
Although Baraldi et al. [7] group shots, which are segmented
by ILSD, into a scene by clustering based on the color comparison,
we cannot use the scene as a semi-master-shot for two reasons.
First, they perform scene clustering using a fixed number of clusters,
i.e., assuming it already knows the total number of scenes.
Second, since the semi-master-shot does not require perfect scene
segmentation, scene clustering increases computational overhead.
Therefore, we acquire semi-master-shots mitigating 𝑇 to determine
the color difference between shots. To enhance the function of the
safe zone, we use 𝑇 ∗
𝑠 , which is proportional to the average length
of shots, instead of the fixed value of 𝑇𝑠 (see Section 7 for details).
Semi-master-shots are generally clustered by color difference
using local descriptors for similarity factors, such as SIFT or SURF,
which requires considerable computational overhead. In contrast,
we adopt the Imagelab Shot Detector (ILSD) [7] segmentation
method, which only considers RGB colors. ILSD measures the similarity
between frames as the sum of two color difference metrics:
squared difference between every corresponding pixel in two
frames, and chi-squared distance of RGB color histograms. Similar
frames are clustered using a sliding window to compare frame
differences centered on the current frame, shifting in one direction.
Generally, ILSD detects abrupt and gradual transitions separately.
The 𝑖-th frame, 𝑓𝑖 , is regarded as an abrupt transition if
the difference between 𝑓𝑖 and 𝑓𝑖+1 exceeds some threshold, 𝑇 , and
differences between neighboring shots exceed 𝑇 /2. Gradual transitions
are identified by repeating the process for detecting abrupt
transition with increasing window size up to the maximum size of
𝑊. After shot detection, ILSD groups adjacent shots into scenes
using hierarchical clustering. To prevent duplicate detection of the
same transition, the two adjacent transitions are separated at frame
intervals of more than a constant 𝑇𝑠 , which is called the safe zone.
Although Baraldi et al. [7] group shots, which are segmented
by ILSD, into a scene by clustering based on the color comparison,
we cannot use the scene as a semi-master-shot for two reasons.
First, they perform scene clustering using a fixed number of clusters,
i.e., assuming it already knows the total number of scenes.
Second, since the semi-master-shot does not require perfect scene
segmentation, scene clustering increases computational overhead.
Therefore, we acquire semi-master-shots mitigating 𝑇 to determine
the color difference between shots. To enhance the function of the
safe zone, we use 𝑇 ∗
𝑠 , which is proportional to the average length
of shots, instead of the fixed value of 𝑇𝑠 (see Section 7 for details).
Semi-master-shots are generally clustered by color difference
using local descriptors for similarity factors, such as SIFT or SURF,
which requires considerable computational overhead. In contrast,
we adopt the Imagelab Shot Detector (ILSD) [7] segmentation
method, which only considers RGB colors. ILSD measures the similarity
between frames as the sum of two color difference metrics:
squared difference between every corresponding pixel in two
frames, and chi-squared distance of RGB color histograms. Similar
frames are clustered using a sliding window to compare frame
differences centered on the current frame, shifting in one direction.
Generally, ILSD detects abrupt and gradual transitions separately.
The 𝑖-th frame, 𝑓𝑖 , is regarded as an abrupt transition if
the difference between 𝑓𝑖 and 𝑓𝑖+1 exceeds some threshold, 𝑇 , and
differences between neighboring shots exceed 𝑇 /2. Gradual transitions
are identified by repeating the process for detecting abrupt
transition with increasing window size up to the maximum size of
𝑊. After shot detection, ILSD groups adjacent shots into scenes
using hierarchical clustering. To prevent duplicate detection of the
same transition, the two adjacent transitions are separated at frame
intervals of more than a constant 𝑇𝑠 , which is called the safe zone.
Although Baraldi et al. [7] group shots, which are segmented
by ILSD, into a scene by clustering based on the color comparison,
we cannot use the scene as a semi-master-shot for two reasons.
First, they perform scene clustering using a fixed number of clusters,
i.e., assuming it already knows the total number of scenes.
Second, since the semi-master-shot does not require perfect scene
segmentation, scene clustering increases computational overhead.
Therefore, we acquire semi-master-shots mitigating 𝑇 to determine
the color difference between shots. To enhance the function of the
safe zone, we use 𝑇 ∗
𝑠 , which is proportional to the average length
of shots, instead of the fixed value of 𝑇𝑠 (see Section 7 for details).
Semi-master-shots are generally clustered by color difference
using local descriptors for similarity factors, such as SIFT or SURF,
which requires considerable computational overhead. In contrast,
we adopt the Imagelab Shot Detector (ILSD) [7] segmentation
method, which only considers RGB colors. ILSD measures the similarity
between frames as the sum of two color difference metrics:
squared difference between every corresponding pixel in two
frames, and chi-squared distance of RGB color histograms. Similar
frames are clustered using a sliding window to compare frame
differences centered on the current frame, shifting in one direction.
Generally, ILSD detects abrupt and gradual transitions separately.
The 𝑖-th frame, 𝑓𝑖 , is regarded as an abrupt transition if
the difference between 𝑓𝑖 and 𝑓𝑖+1 exceeds some threshold, 𝑇 , and
differences between neighboring shots exceed 𝑇 /2. Gradual transitions
are identified by repeating the process for detecting abrupt
transition with increasing window size up to the maximum size of
𝑊. After shot detection, ILSD groups adjacent shots into scenes
using hierarchical clustering. To prevent duplicate detection of the
same transition, the two adjacent transitions are separated at frame
intervals of more than a constant 𝑇𝑠 , which is called the safe zone.
Although Baraldi et al. [7] group shots, which are segmented
by ILSD, into a scene by clustering based on the color comparison,
we cannot use the scene as a semi-master-shot for two reasons.
First, they perform scene clustering using a fixed number of clusters,
i.e., assuming it already knows the total number of scenes.
Second, since the semi-master-shot does not require perfect scene
segmentation, scene clustering increases computational overhead.
Therefore, we acquire semi-master-shots mitigating 𝑇 to determine
the color difference between shots. To enhance the function of the
safe zone, we use 𝑇 ∗
𝑠 , which is proportional to the average length
of shots, instead of the fixed value of 𝑇𝑠 (see Section 7 for details).
Semi-master-shots are generally clustered by color difference
using local descriptors for similarity factors, such as SIFT or SURF,
which requires considerable computational overhead. In contrast,
we adopt the Imagelab Shot Detector (ILSD) [7] segmentation
method, which only considers RGB colors. ILSD measures the similarity
between frames as the sum of two color difference metrics:
squared difference between every corresponding pixel in two
frames, and chi-squared distance of RGB color histograms. Similar
frames are clustered using a sliding window to compare frame
differences centered on the current frame, shifting in one direction.
Generally, ILSD detects abrupt and gradual transitions separately.
The 𝑖-th frame, 𝑓𝑖 , is regarded as an abrupt transition if
the difference between 𝑓𝑖 and 𝑓𝑖+1 exceeds some threshold, 𝑇 , and
differences between neighboring shots exceed 𝑇 /2. Gradual transitions
are identified by repeating the process for detecting abrupt
transition with increasing window size up to the maximum size of
𝑊. After shot detection, ILSD groups adjacent shots into scenes
using hierarchical clustering. To prevent duplicate detection of the
same transition, the two adjacent transitions are separated at frame
intervals of more than a constant 𝑇𝑠 , which is called the safe zone.
Although Baraldi et al. [7] group shots, which are segmented
by ILSD, into a scene by clustering based on the color comparison,
we cannot use the scene as a semi-master-shot for two reasons.
First, they perform scene clustering using a fixed number of clusters,
i.e., assuming it already knows the total number of scenes.
Second, since the semi-master-shot does not require perfect scene
segmentation, scene clustering increases computational overhead.
Therefore, we acquire semi-master-shots mitigating 𝑇 to determine
the color difference between shots. To enhance the function of the
safe zone, we use 𝑇 ∗
𝑠 , which is proportional to the average length
of shots, instead of the fixed value of 𝑇𝑠 (see Section 7 for details).
Semi-master-shots are generally clustered by color difference
using local descriptors for similarity factors, such as SIFT or SURF,
which requires considerable computational overhead. In contrast,
we adopt the Imagelab Shot Detector (ILSD) [7] segmentation
method, which only considers RGB colors. ILSD measures the similarity
between frames as the sum of two color difference metrics:
squared difference between every corresponding pixel in two
frames, and chi-squared distance of RGB color histograms. Similar
frames are clustered using a sliding window to compare frame
differences centered on the current frame, shifting in one direction.
Generally, ILSD detects abrupt and gradual transitions separately.
The 𝑖-th frame, 𝑓𝑖 , is regarded as an abrupt transition if
the difference between 𝑓𝑖 and 𝑓𝑖+1 exceeds some threshold, 𝑇 , and
differences between neighboring shots exceed 𝑇 /2. Gradual transitions
are identified by repeating the process for detecting abrupt
transition with increasing window size up to the maximum size of
𝑊. After shot detection, ILSD groups adjacent shots into scenes
using hierarchical clustering. To prevent duplicate detection of the
same transition, the two adjacent transitions are separated at frame
intervals of more than a constant 𝑇𝑠 , which is called the safe zone.
Although Baraldi et al. [7] group shots, which are segmented
by ILSD, into a scene by clustering based on the color comparison,
we cannot use the scene as a semi-master-shot for two reasons.
First, they perform scene clustering using a fixed number of clusters,
i.e., assuming it already knows the total number of scenes.
Second, since the semi-master-shot does not require perfect scene
segmentation, scene clustering increases computational overhead.
Therefore, we acquire semi-master-shots mitigating 𝑇 to determine
the color difference between shots. To enhance the function of the
safe zone, we use 𝑇 ∗
𝑠 , which is proportional to the average length
of shots, instead of the fixed value of 𝑇𝑠 (see Section 7 for details).
Semi-master-shots could be obtained by clustering shots on color difference using local descriptors, such as SIFT or SURF, as similarity features, but these descriptors require considerable computational overhead. In contrast, we adopt the Imagelab Shot Detector (ILSD) [7] segmentation method, which considers only RGB colors. ILSD measures the similarity between two frames as the sum of two color-difference metrics: the squared difference between every pair of corresponding pixels, and the chi-squared distance between the frames' RGB color histograms. Similar frames are clustered using a sliding window, centered on the current frame and shifted in one direction, to compare frame differences.
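The two-term frame difference can be sketched as follows; the histogram bin count and the normalization of each term are assumptions for illustration, not values taken from ILSD:

```python
import numpy as np

def frame_difference(f1, f2, bins=8):
    """Sketch of the two-term ILSD frame difference: a per-pixel
    squared-difference term plus a chi-squared distance between RGB
    histograms.  Bin count and normalization are assumptions."""
    a = f1.astype(np.float64) / 255.0
    b = f2.astype(np.float64) / 255.0
    # Term 1: squared difference over corresponding pixels.
    pixel_term = np.mean((a - b) ** 2)
    # Term 2: chi-squared distance between per-channel RGB histograms.
    chi2 = 0.0
    for c in range(3):
        h1, _ = np.histogram(a[..., c], bins=bins, range=(0.0, 1.0), density=True)
        h2, _ = np.histogram(b[..., c], bins=bins, range=(0.0, 1.0), density=True)
        denom = h1 + h2
        mask = denom > 0
        chi2 += 0.5 * np.sum((h1[mask] - h2[mask]) ** 2 / denom[mask])
    return pixel_term + chi2
```

The pixel term is sensitive to motion within a shot, while the histogram term is robust to it; their sum balances the two failure modes.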
Generally, ILSD detects abrupt and gradual transitions separately.
The 𝑖-th frame, 𝑓𝑖, is regarded as an abrupt transition if the difference between 𝑓𝑖 and 𝑓𝑖+1 exceeds a threshold 𝑇 and the differences between neighboring shots exceed 𝑇/2. Gradual transitions are identified by repeating the abrupt-transition detection with an increasing window size, up to a maximum size 𝑊. After shot detection, ILSD groups adjacent shots into scenes using hierarchical clustering. To prevent duplicate detection of the same transition, two adjacent transitions must be separated by an interval of more than a constant number of frames 𝑇𝑠, which is called the safe zone.
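The thresholding and safe-zone logic can be sketched as follows; this is an assumed simplification, not ILSD's exact algorithm:

```python
def detect_abrupt_transitions(diffs, T, safe_zone):
    """Simplified sketch of abrupt-transition detection (assumed
    behavior): diffs[i] is the color difference between frame i and
    frame i+1.  A frame is a candidate transition when its difference
    exceeds T; candidates closer than `safe_zone` frames to the
    previously accepted transition are discarded, so the same
    transition is not reported twice."""
    transitions = []
    for i, d in enumerate(diffs):
        if d > T and (not transitions or i - transitions[-1] > safe_zone):
            transitions.append(i)
    return transitions
```

The full detector additionally checks neighboring-shot differences against 𝑇/2 and re-runs the procedure with growing windows up to 𝑊 to catch gradual transitions; those steps are omitted here.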
Although Baraldi et al. [7] group the shots segmented by ILSD into scenes by color-based clustering, we cannot use these scenes as semi-master-shots for two reasons. First, they perform scene clustering with a fixed number of clusters, i.e., they assume that the total number of scenes is known in advance. Second, since semi-master-shots do not require perfect scene segmentation, scene clustering adds unnecessary computational overhead.
Therefore, we acquire semi-master-shots by relaxing the threshold 𝑇 used to determine the color difference between shots. To enhance the function of the safe zone, we use 𝑇𝑠*, which is proportional to the average length of shots, instead of the fixed value 𝑇𝑠 (see Section 7 for details).
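The adaptive safe zone reduces to a one-liner; the proportionality constant `k` below is a placeholder assumption, since the actual value is derived in Section 7:

```python
def adaptive_safe_zone(shot_lengths, k=0.5):
    """Safe zone T_s* proportional to the average shot length.
    The constant k is a placeholder; the paper determines the
    actual proportionality in Section 7."""
    return k * sum(shot_lengths) / len(shot_lengths)
```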
To the best of our knowledge, this is the first work to generate a color scheme from a video. Extracting a color scheme from images has been well studied in computer graphics because the color scheme is the most basic unit for image recoloring and vectorization. Color scheme extraction is difficult for video because it must consider the overall colors of the video while avoiding being dominated by a long but less significant shot. Therefore, we split a video into small units and choose the final color scheme in a bottom-up manner.
We define the semi-master-shot, a new unit that combines contiguous shots taken in the same location with similar colors. The semi-master-shot can be used in video-processing tasks that have been actively studied for decades, such as video highlight detection and video thumbnail generation.
Beyond simply adopting saliency, we carefully consider how to use the saliency map properly. We measure importance at three levels: the importance of each pixel within a frame, of each frame within a shot, and of each shot within the movie.
We demonstrate the proposed color scheme’s plausibility and functionality as a descriptor using real movie videos.
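As an illustration of the three importance levels (an assumed formulation, not the paper's exact method), the per-pixel, per-frame, and per-shot weights can be combined multiplicatively to weight pixel colors:

```python
import numpy as np

def weighted_mean_color(shots, shot_weights):
    """Illustrative combination of the three importance levels
    (assumed formulation): each shot is a list of
    (pixels, saliency, frame_weight) tuples, where pixels has shape
    (H, W, 3) and saliency has shape (H, W).  Pixel colors are
    weighted by per-pixel saliency, by the frame's weight within its
    shot, and by the shot's weight within the movie."""
    total = np.zeros(3)
    total_weight = 0.0
    for shot, shot_w in zip(shots, shot_weights):
        for pixels, saliency, frame_w in shot:
            w = saliency.reshape(-1) * frame_w * shot_w  # combined per-pixel weight
            total += (pixels.reshape(-1, 3) * w[:, None]).sum(axis=0)
            total_weight += w.sum()
    return total / total_weight  # importance-weighted mean color
```

A weighted mean color is used here only to make the weighting concrete; the proposed method extracts a full color scheme rather than a single mean color.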