Dr. Mohieddin Moradi
mohieddinmoradi@gmail.com
1
Dream
Idea
Plan
Implementation
Section I
− Perceptual Artifacts in Compressed Video
− Quality Assessment in Compressed Video
− Objective Assessment of Compressed Video
− Objective Assessment of Compressed Video, Codec Assessment for Production
Section II
− Subjective Assessment of Compressed Video
− Subjective Assessment of Compressed Video, Subjective Assessment by Expert Viewing
− Performance Comparison of Video Coding Standards: An Adaptive Streaming Perspective
− Subjective Assessment by Visualizer™ Test Pattern
2
Outline
3
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
4
Blockiness, Blurriness, Exposure, Interlace, Noisiness
Framing & Pillar-/Letter-Boxing, Flickering, Blackout, Ringing, Ghosting
Brightness, Contrast, Freezing, Block Loss, Slicing
Some Video Artifacts (Baseband and Compressed)
− Consumers' expectations for better Quality-of-Experience (QoE) have been higher than ever before.
− Constraints on available resources in codec optimization often lead to degradation of perceptual quality by introducing compression artifacts into the decoded video.
− Objective VQA techniques have also been designed to automatically evaluate the perceptual quality of compressed video streams.
Codec Optimal Compromise
Availability of Resources
(Bandwidth, Power, and Time)
Perceptual
Quality
5
Compression Artifacts
6
Compression Artifacts
Compression Artifacts
− Spatial Artifacts: Blurring, Blocking, Ringing, Basis Pattern Effect, Color Bleeding
− Temporal Artifacts: Flickering (Fine-granularity, Coarse-granularity), Mosquito Noise, Jerkiness, Floating (Texture Floating, Edge Neighborhood Floating)
− Location-based (spatial): If you can see the artifact when the video is paused, then it’s probably a spatial
artifact.
− Time/sequence-based (temporal): If it’s much more visible while the video plays, then it’s likely temporal.
• The origin of many temporal artifacts in inter-frame coding algorithms → the propagation of compression losses to subsequent frame predictions and "rounding on rounding".
− These include those artifacts generated
I. during video acquisition (e.g., camera noise, camera motion blur, and line/frame jittering)
II. during video transmission in error-prone networks (e.g., video freezing, jittering, and erroneously
decoded blocks caused by packet loss and delay)
III. during video post-processing and display (e.g., post deblocking and noise filtering, spatial scaling,
retargeting, chromatic aberration, and pincushion distortion)
7
Temporal vs. Spatial Artifacts
− Block-based video coding schemes create various spatial artifacts due to block partitioning and
quantization. These artifacts include
• Blurring
• Blocking
• Ringing
• Basis Pattern Effect
• Color Bleeding
− They are detected without reference to temporally neighboring frames, and thus can be better identified when the video is paused.
− Due to the complexity of modern compression techniques, these artifacts are interrelated with each
other, and the classification here is mainly based on their visual appearance.
8
Spatial Artifacts
Spatial Artifacts
Blurring
Blocking
Ringing
Basis Pattern
Effect
Color Bleeding
9
Blurring (Fuzziness or Unsharpness)
− Blurring of an image refers to a smoothing of its details and edges (Reduction in sharpness of edges, spatial details)
• Caused by quantization/truncation of high-frequency transform (DCT/DWT) coefficients during compression
− More noticeable around edges, textured regions
10
Blurring
Reference frame
11
Blurring
Compressed frame with blurring artifact
1- Removing high spatial frequencies after transform and quantization
− Blurring is a result of loss of high spatial frequency image detail, typically at sharp edges.
− Colloquially referred to as “fuzziness” or “unsharpness”.
− It makes discrete objects – as opposed to the entire video – appear out of focus.
− Since the energy of natural visual signals concentrates at low frequencies, quantization reduces high-frequency energy in such signals, resulting in a significant blurring effect in the reconstructed signals.
12
Blurring
2- Smoothing by the in-loop de-blocking filtering
− Another source of blurring is in-loop de-blocking filtering, which is employed to reduce the blocking artifact across block boundaries and is adopted by state-of-the-art video coding standards such as H.264/AVC and HEVC.
− The de-blocking operators are essentially spatially adaptive low-pass filters that smooth the block boundaries, and thus produce a perceptual blurring effect.
Note:
− Sometimes, blurring is intentionally introduced by using a Gaussian function to reduce image noise or to
enhance image structures at different scales.
− Typically, this is done as a pre-processing step before compression algorithms may be applied,
attenuating high-frequency signals and resulting in more efficient compression.
13
Blurring
a) Reference frame
b) Compressed frame with de-blocking filter turned off
c) Compressed frame with de-blocking filter turned on
14
Blurring
Motion Blur
− It appears in the direction of motion corresponding to rapidly moving objects in a still image or a video.
− It happens when the image being recorded changes position (or the camera moves) during the recording of a
single frame, because of either rapid movement of objects or long exposure of slow-moving objects.
− One way to avoid motion blur is by panning the camera to track the moving objects, so the object remains
sharp but the background is blurred instead.
− Graphics, image, or video editing tools may also generate the motion blur effect for artistic reasons; (ex.
computer-generated imagery (CGI)).
15
Blurring
16
Blocking or Blockiness
− Visibility of underlying block encoding structure (false discontinuities across block boundaries)
• Caused by coarse quantization, with different quantization applied to neighboring blocks.
− More visible in smoother areas of picture
17
Blocking or Blockiness
Reference frame
18
Blocking or Blockiness
Compressed frame with Blockiness artifact
19
Blocking or Blockiness
Blocking
DCT blocks are not recreated properly, due either to errors or to a high compression ratio.
20
Blocking or Blockiness
− Blocking is known by several names: tiling, jaggies, mosaicing, pixelating, quilting and checkerboarding.
− It is frequently seen in video compression standards, which use blocks of various sizes as the basic units for
frequency transformation, quantization and motion estimation/compensation, thus producing false
discontinuities across block boundaries.
− It occurs whenever a complex (compressed) image is streamed over a low bandwidth connection.
− At decompression, the output of certain decoded blocks makes surrounding pixels appear averaged
together and look like larger blocks.
− As displays increase in size, blocking typically becomes more visible.
− However, an increase in resolution makes blocking artifacts smaller in terms of the image size and
therefore less visible at a given viewing distance.
21
Blocking or Blockiness
− The lower the bit rate, the more coarsely the block is quantized,
producing blurry, low-resolution versions of the block.
− In the extreme case, only the DC coefficient, representing the
average of the data, is left for a block, so that the reconstructed
block is only a single color region.
− The DC values vary from block to block.
− The block boundary artifact is the result of independently
quantizing the blocks of transform coefficients.
− Neighboring blocks quantize the coefficients separately, leading
to discontinuities in the reconstructed block boundaries.
− These block-boundary discontinuities are usually visible, especially in flat color regions such as the sky and faces, where there is little detail to mask the discontinuity.
22
Blocking or Blockiness
Blocking: Mosaic Effect, Staircase Effect, False Edge
− Although all blocking effects are generated because of similar reasons, their visual appearance may be
different, depending on the region where blockiness occurs.
− Therefore, here we further classify the blocking effects into three subcategories.
23
Blocking or Blockiness
Reference frame
False Edge
Staircase Effect
Mosaic Effect
− The mosaic effect usually occurs when there are luminance transitions in large low-energy regions (e.g., walls, black/white boards, and desk surfaces).
− Due to quantization within each block, nearly all AC coefficients are quantized to zero, and thus each block is reconstructed as a constant DC block, where the DC values vary from block to block.
− When all blocks are put together, the mosaic effect manifests as abrupt luminance changes from one block to another across the space.
− The mosaic effect is highly visible and annoying to the visual system, because the visual masking effect is weakest in smooth regions.
Note: Visual Masking Effect
• The reduced visibility of one image component due to the existence of
another neighboring image component.
24
Mosaic Effect
− Staircase effect typically happens along a diagonal line or curve, which, when mixed with the false
horizontal and vertical edges at block boundaries, creates fake staircase structures.
− Depending on root cause, staircasing can be categorized as
• A compression artifact (insufficient sampling rates)
• A scaler artifact (spatial resolution is too low)
25
Staircase Effect
− False edge is a fake edge that appears near a true edge.
− This is often created by a combination of
• motion estimation/compensation based inter-frame prediction
• blocking effect in the previous frame
The blockiness in the previous frame is propagated to the current frame via motion compensation, appearing as artificial edges.
26
False Edge
− Halo surrounding objects and edges
• Caused by quantization/truncation of high-frequency transform (DCT/DWT) coefficients during compression
− Doesn’t move around frame-to-frame (unlike mosquito noise)
27
Ringing (Echoing, Ghosting)
28
Ringing
Reference frame
29
Ringing
Compressed frame with Ringing artifact
30
Reference frame
Compressed frame with ringing artifact
Ringing
Ringing is unwanted oscillation of an output signal in response to a sudden change in
the input.
− The output signal oscillates at a fading rate, similar to a bell ringing after being
struck, inspiring the name of the ringing artifact.
− Image and video signals in digital data compression and processing are band
limited.
− When they undergo frequency domain techniques such as Fourier or wavelet
transforms, or non-monotone filters such as deconvolution, a spurious and visible
ghosting or echo effect is produced near the sharp transitions or object contours.
− This is due to the well-known Gibbs phenomenon: an oscillating behavior of the filter's impulse response near discontinuities, in which the output takes higher values (overshoots) or lower values (undershoots) than the corresponding input values, with decreasing magnitude until a steady state is reached.
31
Ringing (Echoing, Ghosting)
− The ringing takes the form of a “halo,” band, or “ghost” near sharp edges.
− Sharp transitions in images such as strong edges and lines are transformed to many coefficients in
frequency domain representations.
− The quantization process results in partial loss or distortion of these coefficients.
− So during image reconstruction (decompression), there’s insufficient data to form as sharp an edge as in
the original.
− When the remaining coefficients are combined to reconstruct the edges or lines, artificial wave-like or
ripple structures are created in nearby regions, known as the ringing artifacts.
− Mathematically, this causes both over- and undershooting to occur at the samples around the original
edge.
− It’s the over- and undershooting that typically introduces the halo effect, creating a silhouette-like shade
parallel to the original edge.
32
Ringing (Echoing, Ghosting)
− The ringing effect is restricted to sharp edges or lines.
− Such ringing artifacts are most significant when the edges or lines are sharp and strong, and when the
regions near the edges or lines are smooth, where the visual masking effect is the weakest.
− Ringing doesn’t move around frame to frame (Unlike mosquito noise).
Note
− When the ringing effect is combined with object motion in consecutive video frames, a special temporal
artifact called mosquito noise is observed.
33
Ringing (Echoing, Ghosting)
− The basis pattern effect takes its name from basis functions (mathematical transforms) endemic to all
compression algorithms. The artifact appears similar to the ringing effect.
− However, whereas the ringing effect is restricted to sharp edges or lines, the basis pattern is not.
− It usually occurs in regions that have texture, like trees, fields of grass, waves, etc.
− Typically, if viewers notice a basis pattern, it has a strong negative impact on perceived video quality.
− If the region is in the background and does not attract visual attention, then the effect is often ignored by
human observers.
34
Basis Pattern Effect
Reference frame Compressed frame with basis pattern effect
− Color bleeding occurs when the edges of one color in the image unintentionally bleed or overlap into another color.
− Colors of contrasting hue/saturation bleed across sharp brightness boundaries, looks like “sloppy painting”
• Caused by chroma subsampling (result of inconsistent image rendering across the luminance and chromatic
channels)
− Worse in images with high color detail
35
Color Bleeding (Smearing)
36
Color Bleeding
Reference frame
37
Color Bleeding
Compressed frame with Color Bleeding artifact
For example, in the most popular YCbCr 4:2:0 video format, the color channels Cb and Cr have half the resolution of the luminance channel Y in both the horizontal and vertical dimensions.
Inconsistent Distortions Across Color Channels
− After compression, all luminance and chromatic channels exhibit various types of distortions (such as
blurring, blocking and ringing described earlier), and more importantly, these distortions are
inconsistent across color channels.
Interpolation Operations
− Moreover, because of the lower resolution in the chromatic channels, the rendering processes
inevitably involve interpolation operations, leading to additional inconsistent color spreading in the
rendering result.
− In the literature, it was shown that chromatic distortion is helpful in color image quality assessment, but
how color bleeding affects the overall perceptual quality of compressed video is still an unsolved
problem.
38
Color Bleeding (Smearing)
Temporal artifacts refer to those distortion effects that are not observed when the video is paused, but only during video playback.
39
Temporal Artifacts
Temporal Artifacts: Flickering (Fine-granularity, Coarse-granularity), Mosquito Noise, Jerkiness, Floating (Texture Floating, Edge Neighborhood Floating)
Temporal artifacts are of particular interest to us for two reasons
I. As compared to spatial artifacts, temporal artifacts evolve more significantly with the development
of video coding techniques.
• For example, texture floating did not appear to be a significant issue in early video coding standards, and is
more manifest in H.264/AVC video, but is largely reduced in the latest HEVC coded video.
II. The objective evaluation of such artifacts is more challenging, and popular VQA models often fail to
account for these artifacts.
• More importantly, it has been pointed out that such failures are largely due to the lack of proper assessment of temporal artifacts such as flickering and floating (ghosting).
40
Temporal Artifacts
41
Flickering
Reference frame
42
Flickering
Compressed frame with Flickering artifact
Frequent luminance or chrominance changes along the temporal dimension
The flickering artifact generally refers to frequent luminance or chrominance changes along the temporal dimension that do not appear in the uncompressed reference video.
− It can be very eye-catching and annoying to viewers and has been identified as an important temporal artifact with a significant impact on perceived quality.
− The most likely cause of this type of flickering is the use of GOP structures in the compression algorithm.
− I-frame-based algorithms are not susceptible to this type of artifact.
43
Flickering
Flickering: Mosquito Noise, Coarse-granularity Flickering, Fine-granularity Flickering
− Haziness, shimmering, blotchy noise around objects/edges
• Varies from frame to frame, like mosquitos flying around a person's head.
− Caused by the addition of edges, incorrect DCT block reconstruction, and pixels of the opposite color being created.
44
Mosquito Noise (Gibbs Effect, Edge Busyness)
− Mosquito noise is a joint effect of object motion and time-varying spatial artifacts (such as ringing and motion prediction error) near sharp object boundaries.
− Specifically, the ringing and motion prediction errors are most manifest in the regions near the boundaries of objects.
− When the objects move, such noise-like time-varying artifacts move together with the objects, and thus look like mosquitos flying around the objects.
− Since moving objects attract visual attention and the plain regions near object boundaries have a weak visual masking effect on the noise, mosquito noise is usually easily detected and has a strong negative impact on perceived video quality.
− A variant of flickering, it’s typified as haziness and/or shimmering around high-frequency content (sharp
transitions between foreground entities and the background or hard edges), and can sometimes be
mistaken for ringing.
45
Mosquito Noise (Gibbs Effect, Edge Busyness)
Coarse-granularity flickering refers to low-frequency sudden luminance changes in large spatial regions that
could extend to the entire video frame.
− The most likely reason for such flickering is the use of group-of-pictures (GoP) structures in standard video compression techniques.
− When a new GoP starts, there is no dependency between the last P-frame in the previous GoP and the I-frame in the current GoP.
− Thus a sudden luminance change is likely to be observed, especially when these two frames depict the same scene.
46
Coarse-granularity Flickering
GOP = 6: no dependency across GoP boundaries
− The frequency of coarse-granularity flickering is typically determined by the size of GoP.
− Advanced video encoders may not use fixed GoP lengths or structures, and an I-frame may be
employed only when scene change occurs, and thus coarse-granularity flickering may be avoided or
significantly reduced.
47
Coarse-granularity Flickering
GOP = 6: no dependency across GoP boundaries
GOP = 12: no dependency across GoP boundaries
Fine-granularity flickering is typically observed in large low-energy to mid-energy regions with significant
blocking effect and slow motion.
− In these regions, significant blocking effect occurs at each frame.
− The levels of blockiness and the DC values in corresponding blocks change frame by frame.
− Consequently, these regions appear to be flashing at high frequencies, frame-by-frame (as opposed to
GoP-by-GoP in coarse granularity flickering).
− Such flashing effect is highly eye-catching and perceptually annoying, especially when the associated
moving regions are of interest to the human observers.
48
Fine-granularity flickering
− A flicker-like artifact, jerkiness (also known as choppiness) describes the perception of individual still images in a motion picture.
− Jerkiness is the perceived uneven or wobbly motion due to frame sampling.
− Jerkiness occurs when the temporal resolution is not high enough to catch up with the speed of moving
objects, and thus the object motion appears to be discontinuous.
− The highly visible jerkiness is typically observed only when there is strong object motion in the frame.
− The resulting unsmooth object movement may cause significant perceptual quality degradation.
− It may be noted that the frequency at which flicker and jerkiness are perceived is dependent upon many
conditions, including ambient lighting conditions.
49
Jerkiness (Choppiness)
− Jerkiness is not discernible for normal playback of video at typical frame rates of 24 frames per second or
above.
− However, in visual communication systems, if a video frame is dropped by the decoder owing to its late
arrival, or if the decoding is unsuccessful owing to network errors, the previous frame would continue to be
displayed.
− Upon successful decoding of the next error-free frame, the scene on the display would suddenly be
updated. This would cause a visible jerkiness artifact.
50
Jerkiness
Encoder → Network Channel → Decoder
• Dropped by the decoder owing to its late arrival
• Unsuccessful decoding owing to network errors
− Traditionally, jerkiness is not considered a compression artifact, but an effect of the low temporal
resolution of video acquisition device, or a video transmission issue when the available bandwidth is not
enough to transmit all frames and some frames have to be dropped or delayed.
Telecine Judder
− Another flicker-like artifact is the telecine judder.
− It’s often caused by the conversion of 24 fps movies to a 30 or 60 fps video format. The process, known as
"3:2 pulldown" or "2:3 pulldown," can’t create a flawless copy of the original movie because 24 does not
divide evenly into 30 or 60.
− Jerkiness also sometimes is called Judder.
51
Jerkiness
52
Floating
Reference frame
53
Floating
Compressed frame with Floating artifact
Floating refers to the appearance of illusive motion in certain regions as opposed to their surrounding
background.
− Visually, these regions appear as if they were floating on top of the surrounding background.
− Typically, video encoders associate the Skip coding mode with these regions (where the encoding of the motion-compensated prediction residue is skipped), and thus the structural details within the regions remain unchanged across frames.
− This is erroneous, as the actual details in the regions evolve over time; it is the result of the encoder erroneously selecting the Skip mode for these regions.
54
Floating
Encoder → Skip Coding Mode → Floating Region
The floating region shows illusive motion relative to its surrounding background.
Texture floating typically occurs at large mid-energy texture regions.
− The most common case is when a scene with large textured regions, such as a water surface or trees, is captured with a slowly moving camera.
− Despite the actual shifting of image content due to camera motion, many video encoders choose to
encode the blocks in the texture regions with zero motion and Skip mode.
− These are reasonable choices to save bandwidth without significantly increasing the mean squared error
or mean absolute error between the reference and reconstructed frames, but often create strong texture
floating illusion.
− Not surprisingly, such floating illusive motion is usually in the opposite direction to the camera motion, with the same absolute speed.
− It is sometimes also referred to as "ghosting".
55
Texture Floating
Large mid-energy texture regions
a) The 200th frame in the original video
b) The 200th frame in the compressed video (visual texture floating regions are marked manually)
c) The generated texture floating map (black regions indicate where texture floating is detected)
56
Texture Floating Detection
− Among all types of temporal artifacts, texture floating is perhaps the least identified in the literature, but is
found to be highly eye-catching and visually annoying when it exists.
The following factors are relevant in detecting and assessing texture floating:
1. Global motion:
• Texture floating is typically observed in the video frames with global camera motion, including translation, rotation
and zooming. The relative motion between the floating regions and the background creates the floating illusion in
the visual system.
• A robust global motion estimation method is employed which uses the statistical distribution of motion vectors from
the compressed bitstream.
2. Skip mode:
• The use of Skip mode is the major source of temporal floating effect. When the compressed video stream is
available, the Skip mode can be easily detected in the syntax information.
57
Texture Floating Detection
3. Local energy:
• In high-energy texture and edge regions, erroneous motion estimation and Skip mode selection are unlikely in most high-performance video encoders. On the other hand, there is no visible texture in low-energy regions.
• Texture floating is therefore most likely seen in mid-energy regions, so we can define two threshold energy parameters E1 and E2 to constrain the energy range for texture floating detection.
4. Local luminance:
• The visibility of texture floating is also limited by the luminance levels of the floating regions.
• Because human eyes are less sensitive to textures in very bright or dark regions, we can define two threshold luminance parameters L1 and L2 to consider only mid-luminance regions for texture floating identification (see the sketch after this list).
5. Temporal variation similarity:
• Temporal floating is often associated with erroneous motion estimation. In the reconstruction of video frames,
erroneous motion estimation/compensation leads to significant distortions along temporal direction.
• In the case that the original uncompressed video is available to compare with, it is useful to evaluate the similarity of
temporal variation between the reference video frames and the compressed video frames as a factor to detect
temporal floating effect.
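A toy Python sketch of the energy and luminance gating in factors 3 and 4 above. The 16×16 block size and the threshold values E1, E2, L1, L2 are illustrative assumptions, not values from this deck, and the global-motion, Skip-mode, and temporal-variation checks of factors 1, 2 and 5 (which come from the bitstream and the reference video) are omitted:

import numpy as np

def texture_floating_candidates(y, block=16, E1=20.0, E2=400.0, L1=40.0, L2=200.0):
    # y: luminance plane (H x W, 8-bit). Returns a per-block boolean map where
    # True marks blocks whose local energy (variance) and mean luminance are
    # both mid-range -- the only blocks where texture floating can be visible.
    h, w = y.shape
    mask = np.zeros((h // block, w // block), dtype=bool)
    for by in range(h // block):
        for bx in range(w // block):
            blk = y[by * block:(by + 1) * block,
                    bx * block:(bx + 1) * block].astype(np.float64)
            mask[by, bx] = (E1 < blk.var() < E2) and (L1 < blk.mean() < L2)
    return mask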
58
Texture Floating Detection
Edge neighborhood floating is observed in stationary regions that are next to the boundaries of moving
objects.
− Rather than remaining stationary, these regions may move together with the boundaries of the objects.
− Different from texture floating, edge neighborhood floating may appear without global motion.
− It is often visually unpleasant because it looks like there exists a wrapped package surrounding and
moving together with the object boundaries.
− This effect was also called stationary area temporal fluctuations.
59
Edge Neighborhood Floating
Stationary regions next to the boundaries of moving objects
− Looks like random noise process (snow); can be grey or colored, but not uniform over image
• Caused by quantization of DCT coefficients
60
Quantization Noise
− Contouring is a frequent problem, caused by the limits of 8-bit color depth and (especially at low bitrates) truncation.
• Not enough colours to define the image correctly.
• Not noticeable in high detail areas of the image.
• Affects flat areas and smooth transitions badly.
61
Contouring (Banding)
− Similar reason to contouring
− Makes video look like a poster
− Colors look flat
− Makeup looks badly applied
62
Posterization
63
Posterization
− Data incorrectly saved or retrieved. Affects still images as a loss of part of the image …
… or a block shift of part of the image.
64
Data Corruption
65
Data Corruption
66
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
Milestones in Video Coding
67
Milestones in Video Coding
68
69
Audio and Video Visual Quality
− Audiovisual Quality Control
• Audio
• Video
• Interaction between audio and video: e.g., lip-sync
− File based AV Quality Control
• Part of an automated or manual workflow
• Diagnose
• Repair / Redo
− Technical Quality Control
• Container
• Metadata
• Interaction between container, audio and video: e.g., duration of tracks
− File Based Technical Quality Control
• Part of an automated or manual workflow
• Application specifications
70
Video Quality Control in a File-based World
Safeguarding Audiovisual Quality
− Maintaining quality throughout the production chain
– Choose material as close to source as possible
• Prevent unneeded multi-generation
– Try to produce with the shortest/’most apt’ chain
• Prevent unneeded multi-generation
• Prevent transcoding
– Check quality
− Carefully design the production chain
– Choose the right codecs
– Choose the right equipment
71
Video Quality Control in a File-based World
− Video compression algorithm factors
• Decoder concealment, packetization, GOP structure, …
− Network-specific factors
• Delay, Delay variation, Bit-rate, Packet loss rate (PLR)
− Network independent factors
• Sequence
− Content, amount of motion, amount of texture, spatial and temporal resolution
− User
• Eyesight, interest, experience, involvement, expectations
− Environmental viewing conditions
• Background and room lighting; display sensitivity, contrast, and characteristics; viewing distance
72
Factors that Affect Video Quality
73
Image and Video Processing Chain
− Acquisition: aliasing, blurring, ringing, noise, contouring, distortions
− Compression: blocking/tiling, MC edges, aliasing, blurring, ringing, flicker, jerkiness
− Transmission: noise, jerkiness, blackout, packet loss, macroblocking
− Display: aliasing, blurring, ringing, color contrast artifacts, interlacing, overscan
The Spectrum of Visual Quality
Perfect (Lossless)
Awful
Visually Lossless?
Good enough?
74
What Affects Visual Quality? (1)
System Issues
• Compression Ratio
• Computation
• Delay
• Codec Type
• Codec Version
• Transmission Errors
75
What Affects Visual Quality? (2)
Content Issues
• Detail
• Motion
• Masking
• Spatial Masking
76
What Affects Visual Quality? (4)
Human Issues
• Environment
• Experience
• Task
• Attention
77
78
Absolute Category Rating and Pair-wise Comparison
− Absolute Category Rating: Excellent / Good / Fair / Poor / Bad
− Pair-wise Comparison: A vs. B
− In Video/image processing, the pixel values may change leading to distortions
− Examples of added distortions
• Filtering/Interpolation distortions
• Compression distortions
• Watermarking distortions
• Transmission distortions
− It is important to know if added distortions are acceptable
Objective and Subjective Measurements/Assessment
79
80
Objective and Subjective Measurements/Assessment
Video Quality Measurement: Subjective / Objective
Objective Metrics
• Peak Signal to Noise Ratio (PSNR)
• Structural Similarity Index (SSIM)
• Just Noticeable Difference (JND)
• … and many more
Subjective Metrics
• MOS: Mean Opinion Score
• DMOS: Differential Mean Opinion Score (defined as the difference between the raw quality scores of the reference and test images)
81
Objective and Subjective Measurements/Assessment
Video Quality Measurement
− Subjective: Goldeneye, Multiple Viewer; Absolute, Comparison
− Objective
82
Objective and Subjective Measurements/Assessment
Video Quality Measurement
− Subjective
− Objective: Full Reference, Reduced Reference, No Reference; Model Based, Distortion Based, Hybrid
83
Objective and Subjective Measurements/Assessment
Video Quality Measurement
− Subjective: Goldeneye, Multiple Viewer; Absolute, Comparison
− Objective: Full Reference, Reduced Reference, No Reference; Model Based, Distortion Based, Hybrid
− Formation: 1997
− Experts of ITU-T study group 6 and ITU-T study group 9
− The VQEG has defined three methods for objective video quality measurement:
• Full Reference Method (FR)
• Reduced Reference Method (RR)
• No Reference Method (NR)
− All these models should be validated with the SSCQE (Single Stimulus Continuous Quality Evaluation) method for various video segments.
− Early results indicate that these methods, compared with the SSCQE, perform satisfactorily, with a correlation coefficient between 0.8 and 0.9.
Video Quality Experts Group (VQEG)
84
ITU-T study
group 9
ITU-T study
group 6
Video Quality
Experts Group
(VQEG)
85
The VQEG Three Methods as an Objective Video Quality Meter
− These three models try to mimic the perceptual model of the HVS
− They try to
• assess specific distortions (such as blocking, blurring, texture distortion, jerkiness, freezing, etc.)
• pool (combine features, spatially and temporally)
• map the contributions of these artifacts into an overall video quality score
− FR is the best and NR the worst
VQEG Meters and HVS Model
86
Measure → Pool → Map to Quality
− A full-reference (FR) quality measurement makes a comparison between a (known) reference video
signal at the input of the system and the processed video signal at the output of the system
Full-reference (FR) Method
87
− In a reduced-reference (RR) quality measurement, specific parameters (features) are extracted from both
the (known) reference and processed signals.
− Reference data relating to these parameters are sent using a side-channel to the measurement system.
− The measurement system extracts similar features to those in the reference data to make a comparison
and produce a quality measurement.
Reduced-reference (RR) Method
88
Similar features extraction
− A no-reference (NR) quality measurement analyses only the processed video without the need to access
the (full or partial) reference information.
No-reference (NR) Method
89
Image/Spatial
• Blurring
• Ringing
• Snow; Noise
• Aliasing; HD on SD and vice versa
• Colorfulness; color shifts
• PSNR due to quantization
• Contrast
• Ghosting
• Interlacing
• Motion-compensated edge artifacts!
No-reference (NR) Method
90
Video/Temporal
• Flicker
• Block flashing artifacts (mosquito noise?)
• Telecine
• Jerkiness
• Blackness
FR MOS vs PSNR
91
Pooling: combining features; combining
in space, frequency, orientation, time
Typical Video Quality Estimator (QE) or Video Quality Assessment System
92
Reference image + Distorted test image → Quality Estimator (Measure → Pool → Map to Quality) → Absolute QE score
Methodology
− Measure individual artifacts
− Combine many artifacts
• Combine linearly
• Combine non-linearly
− Measure physical artifacts
− Measure perceptual artifacts
− Incorporate viewing distance
(cyc/degree) into NR
− Combine NR with FR-HVS
93
Three scatter plots of objective quality vs. subjective quality:
− The Ideal Quality Estimator
− A Quality Estimator (QE) With a Systematic Weakness (for a specific type of processing, or a specific type of image or video)
− A Typical QE On a Typical Dataset
− Algorithm optimization
• Automated in-the-loop assessment
− Product benchmarks
• Vendor comparison to decide what product to buy
• Product marketing to convince customer to give you $$
− System provisioning
• Determine how many servers, how much bandwidth, etc.
− Content acquisition and delivery (and SLAs)
• Enter into legal agreements with other parties
− Outage detection and troubleshooting
Applications Of Video Quality Estimators
94
Absolute QE scores
− Absolute QE scores are useful for
• product benchmarking
• content acquisition
• system provisioning
− Absolute QE scores still depend on context. They are not truly “absolute”.
Relative QE scores
− Relative QE scores are useful for
• algorithm optimization
• product benchmarking
Absolute vs. Relative Quality Estimator (QE) Scores
95
− Full reference (FR)
• Most available info; requires original and decoded pixels
• ITU-T standards J.247, J.341
− Reduced reference (RR)
• Less available info; requires features extracted from the original and decoded pixels
• ITU-T standards J.246, J.342
− No-Reference Pixel-based methods (NR-P)
• Requires decoded pixels: a decoder for each video stream
− No-Reference Bitstream-based methods (NR-B)
• Processes packets containing bitstream, without decoder
• ITU-T standards: P.1201 (packet headers only); P.1202 (packet headers and bitstream info)
− Hybrid Methods (Hybrid NR-P-B)
• Hybrid models combine parameters extracted from the bitstream with a decoded video signal.
• They are therefore a mix between NR-P and NR-B models.
Quality Estimator (QE) Categorization
96
Quality Estimator (QE) Categorization
97
Use information from a collection of “similar enough” images to estimate quality of them all
− Applications
• Super-resolution
• Downsampling
• image fusion
• images collected of the same scene from different angles, etc
• egocentric video
− A relative quality estimation
− Uses effective information from overlapping regions; does not require perfect pixel alignment
Quality Estimator (QE) with Mutual Reference
98
− Sequence
• Content
• amount of motion, amount of texture
• spatial and temporal resolution
− User
• Eyesight
• Interest
• Experience
• Involvement
• expectations
− Environmental viewing conditions
• Background and room lighting
• display sensitivity, contrast, and characteristics
• viewing distance
− The processing may improve, not degrade, quality!
Why FR Quality Estimator (QE) are Challenging to Design?
99
− All the reasons FR QE are challenging plus…
− Many types of processing
• Encoding
• transmission errors
• Sampling
• backhoe, …
− Many desired signals may “look like” distortion
− Limited input information – no vref and often no vtest!
− Nonetheless, some applications require NR
Why NR Quality Estimator (QE) are Challenging to Design?
100
Approach 1: Model Perception and Perceptual Attributes
− Psychology community, Martens, Kayyargadde, Allnatt, Goodman,…
Approach 2: Model System Quality
− The entire photography/camera community, Keelan,…
Approach 3:
− The image processing community
− H.R. Wu, Winkler, Marziliano, Bovik, Sheikh, Wang, Ghanbari, Reibman, Wolf
Existing Approaches to NR Quality Estimator (QE)
101
Original video X
Encoding parameters E(.)
Complete encoded bitstream E(X)
Network impairments (losses, jitter) L(.)
Lossy bitstream L(E(X))
Decoder (concealment, buffer, jitter) D(.)
Decoded pixels D(L(E(X)))
What Information Can You Gather?
102
Encoder → Network → Decoder (measurement points A, B, C, D)
Full-Reference Quality Estimator (QE)
103
Encoder → Network → Decoder (measurement points A, B, C, D)
Quality Estimator (QE) Using Network Measurements
104
Encoder → Network → Decoder (measurement points A, B, C, D)
Quality Estimator (QE) Using Lossy Bitstream
105
Encoder → Network → Decoder (measurement points A, B, C, D)
106
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
− Mean Squared Error (MSE) of a quantizer for a continuous valued signal
• Where 𝑝(𝑓) is the probability density function of 𝑓
− MSE for a specific image
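The formulas on this slide were images in the original deck; a standard reconstruction consistent with the surrounding definitions (with $Q(f)$ the quantizer output and $\hat{f}$ the decoded image of size $M \times N$):

$$MSE = \int_{-\infty}^{+\infty} \big(f - Q(f)\big)^2 \, p(f) \, df$$

$$MSE = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \big(f(m,n) - \hat{f}(m,n)\big)^2$$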
107
Mean Squared Error (MSE)
− Signal to Noise Ratio (SNR)
− Peak SNR or PSNR
• For the error measure to be "independent of the signal energy," take signal energy ≡ square of the dynamic range of the image
• For an 8-bit image, peak = 255
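The SNR/PSNR formulas were also images in the deck; the standard forms consistent with the bullets above, where $\sigma_f^2$ is the signal power (variance):

$$SNR = 10 \log_{10} \frac{\sigma_f^2}{MSE} \ \text{dB}, \qquad PSNR = 10 \log_{10} \frac{\text{peak}^2}{MSE} = 10 \log_{10} \frac{255^2}{MSE} \ \text{dB (8-bit)}$$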
108
Signal to Noise Ratio (SNR) and Peak Signal to Noise Ratio (PSNR)
109
MSE of a Uniform Quantizer for a Uniform Source
• Uniform quantization into L levels: 𝑞 = 𝐵/𝐿
• Same error in each bin
• Error is uniformly distributed in (−𝑞/2, +𝑞/2)

$$\sigma_q^2 = \int_{f_{\min}}^{f_{\max}} \big(f - Q(f)\big)^2 \, \frac{1}{B} \, df = \frac{q^2}{12} = \frac{B^2}{12L^2}$$
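A step the slide implies but does not show: for the uniform source itself $\sigma_f^2 = B^2/12$, so with $L = 2^n$ levels

$$SQNR = 10 \log_{10} \frac{\sigma_f^2}{\sigma_q^2} = 10 \log_{10} L^2 = 20 \log_{10} L \approx 6.02\, n \ \text{dB},$$

i.e., about 6 dB per bit of quantizer resolution.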
110
PSNR for a Sine Waveform

For a full-scale sinusoid with peak-to-peak amplitude $2A$, quantized with $B$ bits:

$$SQNR = 10 \log_{10} \frac{\text{RMS signal power}}{\text{RMS quantization noise power}} = 6B + 1.78 \ \text{dB}$$

$$PSNR = 10 \log_{10} \frac{\text{peak signal power}}{\text{RMS quantization noise power}} = \, ?$$

$$\frac{\text{peak signal power}}{\text{RMS quantization noise power}} = \frac{\text{peak signal power}}{\text{RMS signal power}} \times \frac{\text{RMS signal power}}{\text{RMS quantization noise power}}$$

With peak signal power $(2A)^2$ and RMS signal power $(A/\sqrt{2})^2$:

$$\frac{\text{peak signal power}}{\text{RMS signal power}} = \frac{(2A)^2}{(A/\sqrt{2})^2} = 8$$

$$PSNR = 10 \log_{10} 8 + (6B + 1.78) \approx 6B + 11 \ \text{dB}$$
− Due to the limitations of subjective assessment in terms of cost and time, researchers are focusing on automated quality assessment.
− Computerized quality metrics are mainly based on mathematical calculations, which can also incorporate properties of the human visual system (HVS).
− Ideally we want to measure performance by how close the quantized image is to the original image: the "perceptual difference".
− But it is very hard to come up with an objective measure that correlates well with perceptual quality.
− Frequently used objective measures:
− Mean Squared Error (MSE) between original and quantized samples
− Signal to Noise Ratio (SNR)
− Peak SNR (PSNR)
111
Objective Measurement of Performance
− The Mean Squared Error (MSE) is the simplest objective measure and is calculated as shown below,
− where 𝑌𝑟(𝑥, 𝑦) and 𝑌𝑑(𝑥, 𝑦) are the luminance levels of the pixels in the reference image and the coded image of size 𝑚 × 𝑛, respectively.
− The PSNR is calculated using MSE as:
Objective Assessment by PSNR and MSE
112
$$MSE = \frac{1}{m \times n} \sum_{x=1}^{m} \sum_{y=1}^{n} \big(Y_r(x,y) - Y_d(x,y)\big)^2$$

$$PSNR = 10 \log_{10} \frac{255^2}{MSE} \ \text{dB}$$
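A minimal Python sketch of these two formulas (assuming 8-bit luminance planes as NumPy arrays; mse and psnr are illustrative helper names, not from the slides):

import numpy as np

def mse(y_ref, y_dist):
    # Mean squared error over the luminance plane
    diff = y_ref.astype(np.float64) - y_dist.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(y_ref, y_dist, peak=255.0):
    # PSNR in dB; identical frames give infinite PSNR
    m = mse(y_ref, y_dist)
    return float('inf') if m == 0 else 10.0 * np.log10(peak ** 2 / m)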
113
PSNR
− The difference between PSNR and APSNR lies in how the average PSNR for a sequence is calculated.
− The correct way to calculate average PSNR for a sequence is to compute the average MSE over all frames (the arithmetic mean of the per-frame MSE values) and then calculate PSNR from it using the ordinary PSNR equation:
APSNR (Average Peak Signal to Noise Ratio)
− But sometimes a simple average of all the per-frame PSNR values is needed.
− APSNR is implemented for this case and calculates average PSNR by simply averaging the per-frame PSNR values.
− APSNR is usually about 1 dB higher than PSNR.
PSNR and APSNR
$$PSNR = 10 \log_{10} \frac{255^2}{MSE} \ \text{dB}$$
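A short Python sketch contrasting the two averaging orders (frames given as lists of NumPy arrays; function names are illustrative):

import numpy as np

def frame_mse(r, d):
    # Per-frame mean squared error on the luminance plane
    return np.mean((r.astype(np.float64) - d.astype(np.float64)) ** 2)

def sequence_psnr(refs, dists, peak=255.0):
    # PSNR: average the per-frame MSEs first, then convert once to dB
    avg_mse = np.mean([frame_mse(r, d) for r, d in zip(refs, dists)])
    return 10.0 * np.log10(peak ** 2 / avg_mse)

def sequence_apsnr(refs, dists, peak=255.0):
    # APSNR: convert each frame to dB first, then average the dB values
    per_frame_db = [10.0 * np.log10(peak ** 2 / frame_mse(r, d))
                    for r, d in zip(refs, dists)]
    return float(np.mean(per_frame_db))

Because −log10 is convex, averaging in the dB domain can only raise the result (Jensen's inequality), which is consistent with APSNR coming out above PSNR, typically by about 1 dB as noted above.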
114
− Usually PSNR refers to Y-PSNR, because human eyes are more sensitive to luminance differences.
− However, some customers calculate PSNR by combining the Y-, U-, and V-PSNR in a customized ratio, for example:
PSNR Channel

$$PSNR = \frac{6 \cdot PSNR_Y + PSNR_U + PSNR_V}{8}$$
115
Encoding y4m with the H.264 video codec, MP4 container, 2 Mbps bitrate, with PSNR measurement:
• ffmpeg -i source.y4m -codec h264 -b 2000000 destination.mp4 -psnr
Note: other codecs: -codec mpeg2video, -codec hevc, -codec vp9, …
Decoding from coded media file (any) to y4m:
• ffmpeg -i source.mp4 destination.y4m
Encoding, Decoding and PSNR measurement by FFmpeg
116
PSNR
Without saving the results in a log file:
− ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi psnr -f null -
With saving the results in a log file:
− ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi psnr=psnr.log -f null -
SSIM
Without saving the results in a log file:
− ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi ssim -f null -
With saving the results in a log file:
− ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi ssim=ssim.log -f null -
PSNR and SSIM measurement by FFmpeg
117
Original vs. Processed → MSE
Objective Assessment by MSE

$$MSE = \frac{1}{m \times n} \sum_{x=1}^{m} \sum_{y=1}^{n} \big(Y_r(x,y) - Y_d(x,y)\big)^2$$
118
Objective Assessment by MSE
$$MSE = \frac{1}{m \times n} \sum_{x=1}^{m} \sum_{y=1}^{n} \big(Y_r(x,y) - Y_d(x,y)\big)^2$$
(a) original image
(b) luminance offset, MSE = 309
(c) contrast stretch, MSE = 306
(d) impulse noise, MSE = 313
(e) Gaussian noise, MSE = 309
(f) blur, MSE = 308
(g) JPEG compression, MSE = 309
(h) spatial shift (to the left), MSE = 871
(i) spatial scaling (zoom out), MSE = 694
(j) rotation (CCW), MSE = 590
Images (b)-(g) have nearly identical MSEs but very different visual quality.
119
PSNR=19.09 dB PSNR=25.25 dB
Objective Assessment by PSNR
PSNR for Image Encoding: JPEG-2000
120
Original
Objective Assessment by PSNR
121
PSNR 45.53 [dB]
Objective Assessment by PSNR
122
PSNR 36.81 [dB]
Objective Assessment by PSNR
123
PSNR 31.45 [dB]
Objective Assessment by PSNR
124
Objective Assessment by MSE and PSNR
125
Plot: PSNR (dB, 20 to 50) vs. Root MSE (0 to 20)
• 40 dB: every pixel off by 1%
• 30 dB: every pixel off by 3.2%
• 20 dB: every pixel off by 10%
An Example of PSNR Relationship to Root MSE
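The plot's anchor points follow from inverting the PSNR formula:

$$\text{Root MSE} = 255 \cdot 10^{-PSNR/20}$$

so 40 dB gives 2.55 (1% of 255), 30 dB gives 8.06 (≈ 3.2%), and 20 dB gives 25.5 (10%).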
126
PSNR 25.8
0.63 bpp (bit per pixel)
12.8 : 1
PSNR Reflects Fidelity (1)
127
PSNR 24.2
0.31 bpp (bit per pixel)
25.6 : 1
PSNR Reflects Fidelity (2)
128
PSNR 23.2
0.16 bpp (bit per pixel)
51.2 : 1
PSNR Reflects Fidelity (3)
PSNR = 25.12 dB PSNR = 25.11 dB
Q. Huynh-Thu and M. Ghanbari, "Scope of validity of PSNR in image/video quality assessment," Electronics Letters, vol. 44, no. 13, pp. 800-801, June 2008.
129
PSNR is not Everything
130
PSNR = 25.36 dB PSNR = 25.25 dB
PSNR is not Everything
131
PSNR is not Everything
132
PSNR = 25.8 dB PSNR = 25.8 dB
PSNR is not Everything
− Take a natural image
− Give more bits to areas you look at more
(Saliency map)
− Give less bits to areas you look at less
− Subjective rating will be high, PSNR low
Ex. 1: How to Trick PSNR?
133
Original
Attention Map Example Test (High subjective rating, low PSNR)
Ex: A small part of a picture in a video is severely degraded
− This type of distortion is very common in video, where due to a single bit error, blocks of 16×16 pixels might
be erroneously decoded.
− This has almost no significant effect on PSNR but can be viewed as an annoying artefact.
Ex. 2: How to Trick PSNR?
134
A Picture
Severely Degraded
(Distorted)
It hardly affects the PSNR or any objective model
parameters (depending on the area of distortion)
It attracts the observer's attention, and the video looks bad if a larger part of the picture is distorted.
− In comparing codecs, PSNR or any objective measure should be used with great care.
− PSNR does not correlate accurately with the picture quality and thus it would be misleading to directly
compare PSNR from very different algorithms.
“Ensure that the types of coding distortions are not significantly different from each other”
Ex. 3: How to Trick PSNR?
135
Coder 1 (block-based) → Blockiness Distortion, PSNR₁
Coder 2 (filter-based) → Smearing Distortion, PSNR₂
Same input, but the objective results can be different.
Expert viewers prefer the blockiness distortion to the smearing, while non-experts' views are the opposite!
1. Not shift invariant
2. Rates enhanced images as degraded
3. Difficult to handle different spatial resolutions
4. Difficult to handle different temporal resolutions
5. Does not consider underlying image
• Masking in human visual system (temporal, luminance, texture)
• Will clearly fail to accurately compare different source material
6. Does not consider relationship among pixels
• Compares different types of impairments poorly
• Blocking vs. blurring; Wavelet vs. DCT; High vs. low spatial frequency
Six Situations in Which PSNR Fails
136
− The main criticism against the PSNR
“The human interpretation of the distortions at different parts of the video can be different”
− Although it is hoped that the variety of interpretations can be included in the objective models, there are
still some issues that not only the simple PSNR but also more sophisticated objective models may fail to
address.
Objective Assessment by PSNR
137
Example of PSNR interpretation in terms of quality for a specific video codec and streamed video:
PSNR Value → Quality
PSNR > 33 dB → Excellent Quality
30 dB < PSNR < 33 dB → Fair Quality
PSNR < 30 dB → Poor Quality
What is the main reason that PSNR is still used in comparing the performance of various video codecs?
− Under similar conditions if one system has a better PSNR than the other, then the subjective quality can be
better but not worse.
− PSNR can provide information about the behaviour of the compression algorithm through the multi-
generation process.
Objective Assessment by PSNR
138
Same input → System 1: PSNR₁; System 2: PSNR₂
MSE and PSNR are widely used because they are simple and easy to calculate and mathematically easy to
deal with for optimization purpose.
− Mathematically, it’s very tractable
• Easy to model, differentiable
− Many times, increasing PSNR improves visual quality
− Experimentally, it’s easy to optimize
• Reducing error in one pixel increases PSNR
• Can reduce the error in each pixel independently
− Rate distortion optimization successes
• Search over all possible strategies, the one that minimizes the distortion (D=MSE) at the decoder,
subject to a constraint on encoding rate R.
• Shows dramatic improvement in images, video
Good Things about MSE and PSNR
139
A number of reasons why MSE or PSNR may not correlate well with the human perception of quality.
• Digital pixel values, on which the MSE is typically computed, may not exactly represent the light stimulus
entering the eye.
• Simple error summation, like the one implemented in the MSE formulation, may be markedly different from the way the HVS and the brain arrive at an assessment of the perceived distortion.
• Two distorted image signals with the same amount of error energy may have very different structure of
errors, and hence different perceptual quality.
Some PSNR Facts
140
− PSNR is inaccurate in measuring video quality of a video content encoded at different frame rates
because it is not capable of assessing the perceptual trade-off between the spatial and temporal
qualities.
− PSNR follows a monotonic relationship with subjective quality in the case of full frame rate encoding
(without the presence of frame freezing or dropping) when the video content and codec are fixed.
− So PSNR can be used as an indicator of the variation of the video quality when the content and codec are
fixed across the test conditions, and when the encoding is done at full frame rate without the presence of
frame freezing or dropping (it leads to frame rate change).
− PSNR becomes an unreliable and inaccurate quality metric when several videos with different content are
jointly assessed.
Some PSNR Facts
141
142
PSNR over time for different GOP structures:
− Long GOP: B I B B P B B I B …
− I-Frame Only: I I I I I I I I I
PSNR in Different Moving Picture Types
143
PSNR in Different Moving Picture Types
144
PSNR, GOP, Intra and Inter Coding
Plot: PSNR vs. generation (1st, 5th, 10th) for a Long GOP codec, AVC-Intra100 (cut edit), and AVC-Intra50 (cut edit), across content types: still pictures, fast motion, confetti fall, flashing lights, landscape.
Long GOP quality is content dependent.
− This measure gives the difference between the color components of the original frame and the compressed frame. The value of this metric is the mean absolute difference of the color components at corresponding points of the image.
− The values are in 0..1. The value 0 means equal frames; lower values are better.
− It can be used to identify which part of a search image is most similar to a template image.
Mean Sum of Absolute Difference (MSAD)
145
Original vs. Processed → MSAD

$$d(X, Y) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} \left| X_{i,j} - Y_{i,j} \right|}{mn}$$
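A one-function Python sketch; the division by 255 is an assumption made here so the result lands in the 0..1 range quoted above (the formula itself is unnormalized):

import numpy as np

def msad(x, y):
    # Mean absolute difference per pixel, scaled to 0..1 for 8-bit data
    return float(np.mean(np.abs(x.astype(np.float64) - y.astype(np.float64))) / 255.0)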
The conventional PSNR shows measurement inaccuracy when applied to video streamed over wireless and mobile networks.
− This is due to packet loss in wireless and mobile networks.
− A concept of dynamic window size is used to improve the accuracy of frame-loss detection.
− The concept is named Aligned-PSNR (APSNR).
Aligned-PSNR (APSNR)
146
Illustration of conventional PSNR.
− The window size depends on the total number of frame losses in the S-frame (streamed) sequence.
− In this case, there are a total of five frame losses, so the window size is six (window size = SumFL + 1).
− The window limits the frame-loss search, so the algorithm only needs to look for corresponding frames within the window size.
− The processed S-frame (S-frame number x) should correspond with O-frame (original frame) number eight.
Aligned-PSNR (APSNR)
147
Original
Streamed
Two phenomena demonstrate that perceived brightness is not a simple function of intensity.
− Mach Band Effect: The visual system tends to undershoot or overshoot around the boundary of regions of
different intensities.
− Simultaneous Contrast: a region’s perceived brightness does not depend only on its intensity.
Perceived Brightness Relation with Intensity
148
Mach band effect.
Perceived intensity is
not a simple function
of actual intensity.
Examples of simultaneous contrast.
All the inner squares have the same intensity, but they appear
progressively darker as the background becomes lighter
− The term masking usually refers to a destructive interaction or
interference among stimuli that are closely coupled in time or
space.
− This may result in a failure in detection or errors in recognition.
− Here, we are mainly concerned with the detectability of one
stimulus when another stimulus is present simultaneously.
− The effect of one stimulus on the detectability of another,
however, does not have to decrease detectability.
Masking Recall
149
I: gray level (intensity value)
Masker: background I2 (one stimulus); Disk: another stimulus I1
At ΔI = I2 − I1, the object can be noticed by the HVS with a 50% chance.
− Under what circumstances can the disk-shaped object be
discriminated from the background (as a masker stimulus) by
the HVS? Weber’s law:
− Weber’s law states that for a relatively very wide range of I
(Masker), the threshold for disc discrimination, ∆𝑰, is directly
proportional to the intensity I.
• Bright Background: a larger difference in gray levels is needed
for the HVS to discriminate the object from the background.
• Dark Background: the intensity difference required could be
smaller.
Masking Recall
150
Contrast Sensitivity Function (CSF)
I: gray level (intensity value)
Masker: background I2 (one stimulus); Disk: another stimulus I1
At ΔI = I2 − I1, the object can be noticed by the HVS with a 50% chance.

$$\frac{\Delta I}{I} = \text{constant} \ (\approx 0.02)$$
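A quick worked instance of the 0.02 constant: on a background of I = 50, a difference of ΔI ≈ 1 gray level is at threshold, while on a background of I = 200 it takes ΔI ≈ 4 gray levels; the same one-level difference that is visible on a dark background falls below threshold on a bright one.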
− The HVS demonstrates light adaptation characteristics and as a consequence of that it is sensitive to
relative changes in brightness. This effect is referred to as “Luminance masking”.
“ Luminance Masking: The perception of brightness is not a linear function of the luminance”
− In fact, the threshold of visibility of a brightness pattern is a linear function of the background luminance.
− In other words, brighter regions in an image can tolerate more noise due to distortions before it becomes
visually annoying.
− The direct impact that luminance masking has on image and video compression is related to
quantization.
− Luminance masking suggests a nonuniform quantization scheme that takes the contrast sensitivity function
into consideration.
Luminance Masking
151
− It can be observed that the noise is more visible in the dark area than in the bright area if comparing, for
instance, the dark portion and the bright portion of the cloud above the bridge.
152The bridge in Vancouver: (a) Original and (b) Uniformly corrupted by AWGN.
Luminance Masking
Luminance Masking
− The perception of brightness is not a linear function of the luminance.
− The HVS demonstrates light adaptation characteristics and as a consequence of that it is sensitive to
relative changes in brightness.
Contrast Masking
− The changes in contrast are less noticeable when the base contrast is higher than when it is low.
− The visibility of certain image components is reduced due to the presence of other strong image
components with similar spatial frequencies and orientations at neighboring spatial locations.
Contrast Masking
153
With same MSE:
• The distortions are clearly visible in the ‘‘Caps’’ image.
• The distortions are hardly noticeable in the ‘‘Buildings’’ image.
• The strong edges and structure in the ‘‘Buildings’’ image
effectively mask the distortion, while it is clearly visible in the
smooth ‘‘Caps’’ image.
This is a consequence of the contrast masking property of the HVS i.e.
• The visibility of certain image components is reduced due to
the presence of other strong image components with similar
spatial frequencies and orientations at neighboring spatial
locations.
Contrast Masking
154
(a) Original ‘‘Caps’’ image (b) Original ‘‘Buildings’’ image
(c) JPEG compressed image, MSE = 160 (d) JPEG compressed image, MSE = 165
(e) JPEG 2000 compressed image, MSE =155 (f) AWGN corrupted image, MSE = 160.
− In developing a quality metric, a signal is first decomposed into several frequency bands and the HVS
model specifies the maximum possible distortion that can be introduced in each frequency component
before the distortion becomes visible.
− This is known as the Just Noticeable Difference (JND).
− The final stage in the quality evaluation involves combining the errors in the different frequency components, after normalizing them with the corresponding sensitivity thresholds, using some metric such as the Minkowski error (a common formulation is shown after this list).
− The final output of the algorithm is either
• a spatial map showing the image quality at different spatial locations
• a single number describing the overall quality of the image.
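A common formulation of the pooling step described above; the per-component errors $e_k$ are normalized by their JND thresholds $t_k$, and the pooling exponent $\beta$ is a model parameter (often between 2 and 4):

$$E = \left( \sum_{k} \left| \frac{e_k}{t_k} \right|^{\beta} \right)^{1/\beta}$$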
Developing a Quality Metric Using Just Noticeable Difference (JND)
155
− The Contrast Sensitivity Function (CSF) provides a description of the frequency response of the HVS, which
can be thought of as a band-pass filter.
Weber's law:

$$\frac{\Delta I}{I} = \text{constant} \approx 0.02 \;\Rightarrow\; CSF = \frac{I}{\Delta I_{\min}}$$
− The HVS is less sensitive to higher spatial frequencies and this fact is exploited by most compression
algorithms to encode images at low bit rates, with minimal degradation in visual quality.
− Most HVS based approaches use some kind of modeling of the luminance masking and contrast
sensitivity properties of the HVS.
Contrast Sensitivity Function (CSF)
156
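One widely used closed-form approximation of the CSF is the Mannos–Sakrison model; a small sketch (constants vary slightly across the literature, so treat these as assumptions) shows its band-pass shape, peaking at roughly 8 cycles/degree:

```python
import numpy as np

def csf_mannos_sakrison(f):
    # f: spatial frequency in cycles/degree.
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

freqs = np.array([1.0, 4.0, 8.0, 16.0, 32.0])
print(np.round(csf_mannos_sakrison(freqs), 3))  # rises, peaks near 8, falls
```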
[Block diagram of HVS-based quality metrics] The reference and the distorted image/video each pass through a frequency decomposition followed by an HVS model (luminance masking, contrast sensitivity, contrast masking); the per-band errors are then combined in an error-pooling stage.
− Structural information is defined as those aspects of the image that are independent of luminance and contrast.
− The structure of the various objects in a scene is independent of the brightness and contrast of the image.
− Structural approaches to image quality assessment, in contrast to HVS-based approaches, take a top-
down view of the problem.
− It is hypothesized that the HVS has evolved to extract structural information from a scene and hence,
quantifying the loss in structural information can accurately predict the quality of an image.
− The structural philosophy overcomes certain limitations of HVS-based approaches such as:
• computational complexity
• inaccuracy of HVS models.
Image Structural Information
157
− PSNR and MSE are inconsistent with human eye perception.
− The distorted versions of the ‘‘Buildings’’ and ‘‘Caps’’ images
have the same MSE with respect to the references.
− The bad visual quality of the ‘‘Caps’’ image can be
attributed to the structural distortions in both the background
and the objects in the image.
− The structural philosophy can also accurately predict the
good visual quality of the ‘‘Buildings’’ image, since the
structure of the image remains almost intact in both distorted
versions.
Image Structural Information
158
(a) Original ‘‘Caps’’ image (b) Original ‘‘Buildings’’ image
(c) JPEG compressed image, MSE = 160 (d) JPEG compressed image, MSE = 165
(e) JPEG 2000 compressed image, MSE =155 (f) AWGN corrupted image, MSE = 160.
− The HVS demonstrates luminance and contrast masking, which SSIM (also called the Wang-Bovik index) takes into account while PSNR does not.
− The SSIM index performs three comparisons:
• l(x,y): luminance comparison measurement
• c(x,y): contrast comparison measurement
• s(x,y): structure comparison measurement
− where x and y are the original and processed pictures.
− Its value lies in [0, 1].
Structural SIMilarity (SSIM)
159
$$SSIM(x,y) = f\big(l(x,y),\, c(x,y),\, s(x,y)\big)$$
160
Structural SIMilarity (SSIM)
− First the mean luminance of each signal is calculated.
− Then a luminance comparison is computed.
− $C_1$ is used to stabilize the division when the denominator is weak.
− $C_1 = (K_1 L)^2$, where $K_1 \ll 1$ is a small constant.
− The luminance comparison attains its maximum possible value (1) if and only if the means of the two images are equal.
Luminance comparison: l(x,y)
161
$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad\qquad l(x,y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$$
− The base contrast of each signal is estimated by its standard deviation (the average luminance is removed from the signal amplitude).
− A contrast comparison is then computed.
− $C_2$ is used to stabilize the division when the denominator is weak.
− $C_2 = (K_2 L)^2$, where $K_2 \ll 1$ is a small constant.
− L is the dynamic range of the pixel values.
Contrast Comparison c(x,y)
162
$$\sigma_x = \left(\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu_x)^2\right)^{1/2} \qquad \sigma_y = \left(\frac{1}{N-1}\sum_{i=1}^{N}(y_i-\mu_y)^2\right)^{1/2}$$

$$c(x,y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$$
− The structural comparison is performed between the luminance and contrast normalized signals.
− Let 𝒙 and 𝒚 represent vectors containing pixels from the reference and distorted images respectively.
− Each image is normalized by subtracting its average luminance and dividing by its base contrast (standard deviation).
− The correlation or inner product between luminance and contrast normalized signals is an effective
measure of the structural similarity.
− The correlation between the normalized vectors is equal to the correlation coefficient between the
original signals 𝒙 and 𝒚.
Structure Comparison S(x,y)
163
$$\frac{\vec{x}-\mu_x}{\sigma_x} \qquad\qquad \frac{\vec{y}-\mu_y}{\sigma_y}$$
− $\sigma_{xy}$ is the covariance of x and y, a measure of their joint variability.
− Pearson's correlation coefficient is a statistic that measures the statistical relationship, or association, between two continuous variables.
− A Pearson correlation coefficient is calculated as the measure of structural similarity.
− $C_3$ is used to stabilize the division when the denominator is weak.
Structure Comparison S(x,y)
164
$$s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3} \qquad\qquad \sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)$$
− The SSIM output is a combination of all three components
− This SSIM model is parameterized by 𝜶, 𝜷, 𝜸 where typically the parameter values are all equal to 1.
− 𝑪 𝟏, 𝑪 𝟐 and 𝑪 𝟑 are small constants added to avoid numerical instability when the denominators of the
fractions are small.
Final Value of SSIM
165
$$SSIM(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \cdot \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$

$$SSIM(x,y) = [l(x,y)]^{\alpha} \cdot [c(x,y)]^{\beta} \cdot [s(x,y)]^{\gamma} = \left[\frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}\right]^{\alpha} \cdot \left[\frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}\right]^{\beta} \cdot \left[\frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}\right]^{\gamma}$$
− Two constants $C_1$ and $C_2$ are used to stabilize the division when the denominator is weak:
• $C_1 = (K_1 L)^2$, where $K_1 \ll 1$ is a small constant (by default $K_1 = 0.01$).
• $C_2 = (K_2 L)^2$, where $K_2 \ll 1$ is a small constant (by default $K_2 = 0.03$).
• $L = 2^{\text{number of bits per pixel}} - 1$, the dynamic range of the pixel values.
− In order to simplify the expression, $C_3 = C_2/2$.
− Consequently we get
Final Value of SSIM
166
$$SSIM(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \cdot \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} = \frac{\left(2\mu_x\mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)}$$
MSSIM: this SSIM index is averaged over the image, so it is known as the Mean SSIM (MSSIM).
167
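A minimal sketch of the simplified SSIM above, computed from global image statistics. Note that the reference implementation computes the same quantities over a sliding 11×11 Gaussian window and averages the resulting map to obtain MSSIM; the global version here is for brevity.

```python
import numpy as np

def ssim_global(x, y, L=255.0, K1=0.01, K2=0.03):
    x, y = x.astype(np.float64), y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)
    cov_xy = np.sum((x - mu_x) * (y - mu_y)) / (x.size - 1)
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```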
Original
Gaussian Blurring
SSIM = 0.85
Objective Assessment by SSIM
168
JPEG2000 compression
SSIM = 0.78
Objective Assessment by SSIM
Original
169
Salt and Pepper noise
SSIM = 0.87
Objective Assessment by SSIM
Original
170
Objective Assessment by MSE and SSIM
MSE=226.80 SSIM =0.4489 MSE = 225.91 SSIM =0.4992
171
Objective Assessment by MSE and SSIM
MSE = 213.55 SSIM = 0.3732 MSE = 225.80 SSIM = 0.7136
172
Objective Assessment by MSE and SSIM
MSE = 226.80 SSIM = 0.4489 MSE = 406.87 SSIM =0.910
SSIM vs. MOS
173
On a broad database of images distorted by JPEG, JPEG2000, white noise, Gaussian blur, and fast-fading noise.
Each scatter plot is fitted with a logistic function of the form
$$Q(t) = \frac{a}{1 + b\,e^{-t/\tau}}$$
The curve is the best fit; what is important is that the data cluster closely about the curve.
MOS vs PSNR and SSIM
174
− Displaying $SSIM(x,y)(i,j)$ as an image is called an SSIM map.
− It is an effective way of visualizing where the
images 𝑥, 𝑦 differ.
− The SSIM map depicts where the quality of one
image is flawed relative to the other.
SSIM Map
175
(a) Reference Image; (b) JPEG Compressed; (c) Absolute Difference; (d) SSIM Map.
$$SSIM(x,y)(i,j) = \frac{2\mu_x(i,j)\,\mu_y(i,j) + C_1}{\mu_x^2(i,j) + \mu_y^2(i,j) + C_1} \cdot \frac{2\sigma_{xy}(i,j) + C_2}{\sigma_x^2(i,j) + \sigma_y^2(i,j) + C_2}$$
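A windowed sketch of the map above (a uniform window stands in for the usual 11×11 Gaussian; assumes SciPy is available):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ssim_map(x, y, win=8, L=255.0, K1=0.01, K2=0.03):
    x, y = x.astype(np.float64), y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = uniform_filter(x, win), uniform_filter(y, win)
    var_x = uniform_filter(x * x, win) - mu_x ** 2
    var_y = uniform_filter(y * y, win) - mu_y ** 2
    cov   = uniform_filter(x * y, win) - mu_x * mu_y
    return ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))

# MSSIM is simply ssim_map(x, y).mean()
```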
SSIM Map
176
(a) Reference Image; (b) JPEG Compressed; (c) Absolute Difference; (d) SSIM Map.
An example of perceptual masking!
Two Images with Equal SSIM
177
False Ordering SSIM: 0.67 vs. 0.80
178
− Some good aspects
• Considers correlation between signal and error
• Considers local region
− Some bad aspects
• Very hard to incorporate into an optimization framework
• Still requires the original signal
• Does not incorporate many features of the HVS
− It has received much attention on ways to improve it
SSIM Specifications
179
− The MSE/PSNR can be a poor predictor of visual fidelity.
− VSNR is an efficient metric for quantifying the visual fidelity of natural images, based on near-threshold and suprathreshold properties of human vision.
− It is efficient both in terms of low computational complexity and low memory requirements.
− It operates on physical luminances and visual angle (rather than on digital pixel values and pixel-based dimensions) to accommodate different viewing conditions.
− This metric estimates visual fidelity by computing:
1. contrast thresholds for detection of the distortions
2. a measure of the perceived contrast of the distortions
3. a measure of the degree to which the distortions disrupt global precedence and, therefore, degrade
the image’s structure.
Visual SNR (VSNR)
180
Chandler and Hemami, “A wavelet-based visual signal-to-noise ratio for natural images”, IEEE Trans. Image Processing, 2007
It operates via a two-stage approach.
1- Computing contrast thresholds for detection of distortions in the presence of natural images (many
subjective tests)
− The low-level HVS properties of contrast sensitivity and visual masking (visual summation) are used via a wavelet-based model to determine whether the distortions are below the threshold of visual detection (i.e., whether the distortions in the distorted image are visible).
− If the distortions are below the threshold of detection, the distorted image is deemed to be of perfect
visual fidelity (VSNR =∞ ).
Visual SNR (VSNR)
181
It operates via a two-stage approach.
2- If the distortions are suprathreshold, the following are taken into account as an alternative measure of structural degradation.
• Low-level visual property of perceived contrast including
I. Low-level HVS property of contrast sensitivity
II. Low-level HVS property of visual masking
• Mid-level visual property of global precedence (i.e., the visual system’s preference for integrating
edges in a coarse-to-fine-scale fashion)
− These two properties are modeled as Euclidean distances in distortion-contrast space of a multiscale
wavelet decomposition, and VSNR is computed based on a simple linear sum of these distances.
Visual SNR (VSNR)
182
Two Images with Equal VSNR
183
VSNR, Noise Most Visible
184
VSNR: 32.7
SSIM: 0.83
VSNR: 21.2
SSIM: 0.95
− Relies on modeling of the statistical image source, the
image distortion channel and the human visual
distortion channel.
− VIF was developed for image and video quality
measurement based on natural scene statistics (NSS).
− Images come from a common class: the class of natural scenes.
Objective Assessment by Visual Information Fidelity (VIF)
185
[Diagram] Natural image source → C → distortion channel → D. C passes through an HVS channel to give E; D passes through an HVS channel to give F. Mutual information (information content) is then measured between C and E and between C and F.
− Image quality assessment is done based on information
fidelity where the channel imposes fundamental limits
on how much information could flow from the source
(the reference image), through the channel (the image
distortion process) to the receiver (the human observer).
Objective Assessment by Visual Information Fidelity (VIF)
186
Mutual information between C and E quantifies the
information that the brain could ideally extract from the
reference image, whereas the mutual information
between C and F quantifies the corresponding
information that could be extracted from the test image.
$$VIF = \frac{\text{Distorted Image Information}}{\text{Reference Image Information}}$$
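A toy, single-scale pixel-domain sketch of that information ratio. The published VIF uses a multi-scale wavelet Gaussian-scale-mixture model; here the block size and `sigma_n2` (the variance of the HVS "neural noise") are illustrative assumptions.

```python
import numpy as np

def vif_sketch(ref, dist, block=8, sigma_n2=2.0):
    ref, dist = ref.astype(np.float64), dist.astype(np.float64)
    num = den = 0.0
    H, W = ref.shape
    for i in range(0, H - block + 1, block):
        for j in range(0, W - block + 1, block):
            c = ref[i:i+block, j:j+block].ravel()
            d = dist[i:i+block, j:j+block].ravel()
            var_c = c.var()
            cov = np.cov(c, d)[0, 1]
            g = cov / (var_c + 1e-10)              # estimated channel gain
            var_v = max(d.var() - g * cov, 1e-10)  # additive noise power
            num += np.log2(1 + g * g * var_c / (var_v + sigma_n2))
            den += np.log2(1 + var_c / sigma_n2)
    return num / (den + 1e-10)  # distorted info / reference info
```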
Objective Assessment by Visual Information Fidelity (VIF)
187
VIF = 0.5999 SSIM = 0.8558 VIF = 1.11 SSIM = 0.9272
− VIF has a distinctive property over traditional quality assessment methods: a linear contrast enhancement of the reference image that does not add noise will result in a VIF value larger than unity, signifying that the enhanced image has superior visual quality to the reference image.
− No other quality assessment algorithm can predict whether the visual image quality has been enhanced by a contrast enhancement operation.
Objective Assessment by Visual Information Fidelity (VIF)
188
VIF = 0.6045 SSIM = 0.8973 VIF = 0.5944 SSIM = 0.7054
Objective Assessment by Visual Information Fidelity (VIF)
189
VIF = 0.60 SSIM = 0.7673 VIF = 0.6043 SSIM = 0.8695
Blur vs. AWG Noise
190
PSNR: 26.6
SSIM: 0.89
VSNR: 20.7
VIF: 0.47
PSNR: 25.4
SSIM: 0.74
VSNR: 29.5
VIF: 0.58
Blur vs. AWG Noise
191
PSNR: 26.6
SSIM: 0.89
VSNR: 20.7
VIF: 0.47
PSNR: 21.3
SSIM: 0.60
VSNR: 29.5
VIF: 0.44
192
JPEG-2000 vs. Blur
PSNR: 35.4
VSNR: 35.2
SSIM: 0.93
VIF: 0.76
PSNR: 34.3
VSNR: 33.5
SSIM: 0.97
VIF: 0.76
193
JPEG-2000 vs. Blur
PSNR: 35.4
VSNR: 35.2
SSIM: 0.93
VIF: 0.59
PSNR: 30.8
VSNR: 26.8
SSIM: 0.94
VIF: 0.5 vs. 0.59
− The advantage of multi-scale methods over single-scale methods like SSIM is that image details at different resolutions and viewing conditions are incorporated into the quality assessment algorithm.
Multi-scale Structural Similarity Index (MS-SSIM)
194
i  i th scale
L: low-pass filter; ↓2:downsampling by factor of 2.
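A sketch of the multi-scale computation. Uniform windows stand in for the Gaussian window of the reference implementation; the five per-scale weights are the published MS-SSIM defaults.

```python
import numpy as np
from scipy.ndimage import uniform_filter

WEIGHTS = [0.0448, 0.2856, 0.3001, 0.2363, 0.1333]

def _stats(x, y, win=8):
    mu_x, mu_y = uniform_filter(x, win), uniform_filter(y, win)
    var_x = uniform_filter(x * x, win) - mu_x ** 2
    var_y = uniform_filter(y * y, win) - mu_y ** 2
    cov = uniform_filter(x * y, win) - mu_x * mu_y
    return mu_x, mu_y, var_x, var_y, cov

def ms_ssim(x, y, L=255.0, K1=0.01, K2=0.03):
    x, y = x.astype(np.float64), y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    score = 1.0
    for i, w in enumerate(WEIGHTS):
        mu_x, mu_y, var_x, var_y, cov = _stats(x, y)
        cs = ((2 * cov + C2) / (var_x + var_y + C2)).mean()
        if i < len(WEIGHTS) - 1:
            score *= cs ** w
            x = uniform_filter(x, 2)[::2, ::2]  # low-pass, then downsample by 2
            y = uniform_filter(y, 2)[::2, ::2]
        else:  # luminance term enters only at the coarsest scale
            l = ((2 * mu_x * mu_y + C1) / (mu_x**2 + mu_y**2 + C1)).mean()
            score *= (l * cs) ** w
    return score
```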
MAD assumes that the HVS employs different strategies when judging the quality of images.
Detection-based Strategy
• When viewing images containing near-threshold distortions, the HVS tries to look past the image, searching for the distortions.
• For estimating distortions in the detection-based strategy, local luminance and contrast masking are used.
Appearance-based Strategy
• When viewing images containing clearly visible distortions, the HVS tries to look past the distortions, searching for the image's subject matter.
• For estimating distortions in the appearance-based strategy, variations in the local statistics of spatial frequency components are employed.
MAD (Most Apparent Distortion) Algorithm
195
196
MAD (Most Apparent Distortion) Algorithm
Detection-based and Appearance-based Strategies
The block diagram of the detection-based strategy in the MAD algorithm
The block diagram of the appearance-based strategy in the MAD algorithm.
− The FSIM algorithm is based on the fact that the HVS understands an image mainly through its low-level features, e.g., edges and zero crossings.
− In order to assess the quality of an image, FSIM algorithm uses two kinds of features.
1. Phase Congruency (PC):
• Physiological and psychophysical experiments have demonstrated that at points with high phase
congruency (PC), HVS can extract highly informative features.
2. Gradient Magnitude (GM):
• Phase congruency (PC) is contrast invariant, yet our perception of an image's quality is also affected by the local contrast of that image.
• Because of this dependency, the image gradient magnitude (GM) is used as the secondary feature in the FSIM algorithm.
Feature Similarity Index (FSIM)
197
Feature Similarity Index (FSIM)
198
Calculating FSIM measure consists of two stages:
• Computing image’s PC and GM
• Computing the similarity measure between the reference and test images.
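A sketch of the second stage. The phase-congruency maps are assumed to be precomputed (PC itself requires a log-Gabor filter bank, omitted here), and the constants T1 and T2 follow commonly quoted defaults but should be treated as assumptions.

```python
import numpy as np
from scipy.ndimage import sobel

def gradient_magnitude(img):
    return np.hypot(sobel(img, axis=1), sobel(img, axis=0))

def fsim_from_pc(ref, dist, pc_ref, pc_dist, T1=0.85, T2=160.0):
    g1, g2 = gradient_magnitude(ref), gradient_magnitude(dist)
    s_pc = (2 * pc_ref * pc_dist + T1) / (pc_ref**2 + pc_dist**2 + T1)
    s_g  = (2 * g1 * g2 + T2) / (g1**2 + g2**2 + T2)
    pc_m = np.maximum(pc_ref, pc_dist)   # weight perceptually salient points
    return np.sum(s_pc * s_g * pc_m) / np.sum(pc_m)
```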
1. The detection thresholds are predicted and a perceptually normalized response map is generated.
2. The perceptually normalized response is decomposed into several bands of different orientations and
scales (using cortex transform, i.e., the collection of the band-pass and orientation selective filters)
3. The conditional probability of each distortion type is calculated for prediction of three distortion types
separately for each band.
4. The probability of detecting a distortion in any subband is calculated.
Quality assessment of high dynamic range (HDR) images
Dynamic range independent quality measure (DRIM)
199
The block diagram of the DRIM algorithm.
− This metric is a combination of
• multi-scale structural fidelity measure
• statistical naturalness measure
− The TMQI algorithm consists of two stages
• structural fidelity measurement
• statistical naturalness measurement.
− At each scale, the local structural fidelity map is computed and averaged in order to obtain a single score S.
Quality assessment of high dynamic range (HDR) images
Tone-mapped images quality index (TMQI)
200
The block diagram of the TMQI algorithm.
VMAF – Video Multi-Method Assessment Fusion
201
PSNR: 31
DMOS: 82
PSNR: 31
DMOS: 27
PSNR: 34
DMOS: 96
PSNR: 34
DMOS: 58
− VMAF is a perceptual video quality metric that models the human visual system.
− It predicts subjective quality by ‘fusing’ elementary metrics into a final quality metric using a machine-learning algorithm that assigns a weight to each elementary metric (a Support Vector Machine (SVM) regressor).
− The machine-learning model is trained and tested using opinion scores obtained through a subjective experiment (e.g., the NFLX Video Dataset).
− It correlates better than PSNR with subjective opinion over a wide quality range and content set (Netflix).
− VMAF enables reliable codec comparisons across the broad range of bitrates and resolutions occurring in adaptive streaming.
Elementary quality metrics
− Visual Information Fidelity (VIF): quality is complementary to the measure of information fidelity loss
− Detail Loss Metric (DLM): separately measure the loss of details which affects the content visibility, and the
redundant impairment which distracts viewer attention
− Motion: temporal difference between adjacent frames
VMAF – Video Multi-Method Assessment Fusion
202
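A sketch of the fusion step with scikit-learn. The feature values and MOS labels below are hypothetical; the real VMAF trains its SVM regressor on per-frame VIF, DLM and motion features against subjective datasets such as the NFLX Video Dataset.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical per-clip features: [VIF, DLM, mean temporal difference]
features = np.array([[0.92, 0.95, 1.2],
                     [0.71, 0.80, 4.5],
                     [0.55, 0.62, 9.8],
                     [0.85, 0.90, 2.1]])
mos = np.array([88.0, 65.0, 34.0, 79.0])   # hypothetical opinion scores

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(features, mos)                    # 'fuse' the elementary metrics
print(model.predict(np.array([[0.78, 0.84, 3.0]])))
```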
Metric Comparison
203
204
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
205
Ex: Video Quality due to De-interlacing
De-interlacing: difficult to perform
Good quality converters exist (price!)
720p - DNxHD (Gen0) 1080i - DNxHD (Gen0) – de-interlace – 7
source de-interlaced
206
Ex: Distribution Encoder and Bit-rate
Which bit-rate to choose?
Distribution channel ‘defines’ available bit-rate
MPEG-4 does a fine job
Motion in video is important
Uncompressed | AVC/DNxHD – 10 Mbit/s H.264
AVC/DNxHD – 16 Mbit/s H.264 | AVC/DNxHD – 8 Mbit/s H.264
207
Ex: Distribution Encoder and Production Codec
Uncompressed HD ~ 1 500 Mbit/s
Production bitrate ~ 100 Mbit/s
Distribution encoder ~ 10 Mbit/s
Do we actually see the influence of the production codec?
AVC/AVC + 8 Mbit/s H.264 DVC/DVC + 8 Mbit/s H.264
Example of Simulation Chain
208
Video Quality in Production Chain
Without Pixel shift
With Pixel shift (+2H, +6V)
Camera
Encoding
Post Production Encoding (4 Generations)
Example of Test Setup
209
Video Quality in Production Chain
Encode Decode
HD-SDI HD-SDI
Uncompressed
Source
Gen 0
(Cam)
Gen 1
(PP1)
Gen 2
(PP2)
Gen 3
(PP3)
Gen 3
shifted
Gen 4
(PP4)
Uncompressed YCbCr Storage
HD-SDI Ingest & Playout
VRT-medialab: research and innovation
− The multi-generation codec assessment simulates how the production chain affects the images as a result of multiple compression and decompression stages.
− The multi-generation codec assessment has two steps:
1. The agreed method of multi-generation testing was to visually compare the 1st, 4th and 7th generations (including pixel shifts after each generation) with the original image under defined conditions (reference video monitor, particular viewing environment settings, and expert viewers).
2. In addition, an objective measurement, the PSNR, was calculated to give a general indication of the trend of the multi-generation performance of the individual codecs.
210
Multi-generation Codec Assessment for Production Codec Tests
211
Standalone Chain (without Spatial Shift) for INTRA Codec
− The standalone chain without processing simply consisted of several cascaded generations of the codec
under test, without any other modifications to the picture content apart from those applied by the codec
under test.
− This process accurately simulates the effect of a simple dubbing of the sequence and is usually not very
challenging for the compression algorithm.
− This simple chain can provide useful information about
• the performance of the sub-sampling filtering that is applied.
• the precision of the mathematical implementation of the codec.
212
Standalone Chain (without Spatial Shift) for INTRA Codec
[Diagram] Input video → encoder/decoder (first generation) → encoder/decoder (second generation) → … → encoder/decoder (seventh generation)
213
Standalone Chain (without Spatial Shift) for INTRA Codec
In fact, the most important
impact on the picture quality
should be incurred at the first
generation, when the encoder
has to eliminate some
information.
The effect of the subsequent
generations should be minimal
as the encoder should basically
eliminate the same information
already deleted in the first
compression step.
[Diagram] Input video → encoder/decoder (first generation) → encoder/decoder (second generation) → … → encoder/decoder (seventh generation)
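A minimal way to reproduce such a cascade is sketched below. libx264 via ffmpeg is used purely as a stand-in; the EBU tests cascaded the actual production codecs under test (DNxHD, etc.), and the file names and QP value are illustrative assumptions.

```python
import subprocess

def multi_generation(src, generations=7, qp=20):
    current = src
    for gen in range(1, generations + 1):
        out = f"gen{gen}.mp4"
        # Each pass decodes the previous generation and re-encodes it.
        subprocess.run(["ffmpeg", "-y", "-i", current,
                        "-c:v", "libx264", "-qp", str(qp), out],
                       check=True)
        current = out
    return current  # the seventh-generation file
```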
214
Standalone Chain (with Spatial Shift) for INTRA Codec
215
Standalone Chain (with Spatial Shift) for INTRA Codec
− In a real production chain, several manipulations are applied to the picture to produce the master
• Editing
• Zoom
• NLE
• Colour correction
− A realistic simulation has to take into account this issue.
− As all these processes are currently feasible only in the uncompressed domain, the effect of the
processing is simulated by spatially shifting the image horizontally (pixel) or vertically (lines) in between
each compression step.
− Obviously, this shift makes the task of the coder more challenging, especially for algorithms based on a division of the picture into blocks (e.g. N×N DCT blocks), since in any later generation the content of each block differs from that in the previous generation.
216
Standalone Chain (with Spatial Shift) for INTRA Codec
− The shift process introduces black pixels on the edges of the frame if/when necessary.
− The shifts were applied variously using software or hardware, but the method used was exactly the same
for all the algorithms under test.
• Horizontal shift (H): “only even shifts” to take into account the chroma subsampling of the 4:2:2 format.
• Vertical shift (V): shift is applied on a frame basis and is always an “even value”.
Progressive formats:
− The whole frame is shifted by a number of lines corresponding to the vertical shift applied.
• For example, a shift equal to +2V means two lines down for progressive formats.
Interlaced formats:
− Each field is shifted by a number of lines corresponding to half the vertical shift applied.
• For example, a shift equal to +2V means 1 line down for each field of an interlaced format.
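A sketch of the shift rules above, assuming a single-plane frame stored as a NumPy array; only positive (right/down) shifts are handled, and the uncovered edges are filled with black.

```python
import numpy as np

def spatial_shift(frame, h_shift=2, v_shift=6, interlaced=False):
    # h_shift must be even (4:2:2 chroma); v_shift is frame-based and even.
    assert h_shift % 2 == 0 and v_shift % 2 == 0
    H, W = frame.shape
    out = np.zeros_like(frame)                 # black fill on the edges
    if interlaced:
        s = v_shift // 2                       # half the shift per field
        for field in (0, 1):
            f = frame[field::2]
            o = np.zeros_like(f)
            o[s:, h_shift:] = f[:f.shape[0] - s, :W - h_shift]
            out[field::2] = o
    else:
        out[v_shift:, h_shift:] = frame[:H - v_shift, :W - h_shift]
    return out
```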
217
Standalone Chain (with Spatial Shift) for INTRA Codec
218
Standalone Chain with GoP Alignment (without Spatial Shift) for INTER Codec
− The GoP structure has some important implications for how the standalone chain has to be realized, and it introduces a further variable in how the multi-generation test can be performed, depending on whether GoP alignment is guaranteed between generations (GoP-aligned) or not (GoP mis-aligned).
− The GoP is considered aligned if a frame of the original picture that is encoded at the first generation with one of the three possible frame types (Intra, Predicted or Bidirectional) is encoded with that same frame type in all the following generations.
− It is therefore possible to have only one multi-generation chain with “GoP alignment”.
219
Standalone Chain with GoP Alignment (without Spatial Shift) for INTER Codec
[Diagram] Input video → encoder/decoder (first generation) → encoder/decoder (second generation) → … → encoder/decoder (seventh generation)
Ex: frame n of the original sequence is always encoded as Intra and frame n+6 as Predicted in all generations.
220
Standalone Chain with GoP Alignment and Spatial Shift for INTER Codec
− If GoP alignment is not guaranteed, several conditions of GoP mis-alignment are possible:
• if the GoP length is L = 12, then for the second generation 11 different GoP mis-alignments are possible;
• for the third generation, 11 × 11 different GoP mis-alignments are possible, and so on;
→ making the testing of all the possible conditions unrealistic.
− It was therefore agreed to apply one “temporal shift” equal to one frame between each generation, so
that the frame that is encoded in Intra mode in the first generation is encoded in Bidirectional mode in the
second generation and, in general, in a different mode for each following generation.
− It is interesting to underline that the alignment of the GoP in the different generations was under control
(not random) and that this was considered the likely worst case as far as the mis-alignment effect is
concerned, and was referred to in the documents as “chain without GoP alignment”.
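A small illustration of how the one-frame temporal shift changes a frame's coding type across generations. The GoP pattern below (L = 12 with an IBBP structure in display order) is an assumption for illustration, not the pattern of any specific codec under test.

```python
GOP = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "P", "B", "B"]

def frame_type(n, generation, shift_per_gen=1):
    # Each generation shifts the sequence by one frame relative to the GoP.
    offset = (generation - 1) * shift_per_gen
    return GOP[(n + offset) % len(GOP)]

# Frame 0: Intra in generation 1, Bidirectional in generation 2.
print(frame_type(0, 1), frame_type(0, 2))  # -> I B
```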
221
Standalone Chain without GoP Alignment
222
Standalone Chain without GoP Alignment and with Spatial Shift for INTER Codec
− Four different possible standalone chains up to the seventh generation:
• Multigeneration chain with GoP alignment (without spatial shift)
• Multigeneration chain without GoP alignment (without spatial shift)
• Multigeneration chain with GoP alignment and spatial shift
• Multigeneration chain without GoP alignment and spatial shift
− A procedure to re-establish the spatial alignment between the original and the de-compressed version of the
test sequence is applied.
− 16 pixels on each edge of the picture are skipped to avoid taking measurements on the black pixels introduced during the shift.
− PSNR does not correlate accurately with the picture quality and thus it would be misleading to directly compare
PSNR from very different algorithms.
− PSNR can provide information about the behaviour of the compression algorithm through the multi-generation
process.
223
Objective Measurements by PSNR
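A sketch of that measurement, assuming the sequences have already been spatially re-aligned:

```python
import numpy as np

def psnr_cropped(ref, test, border=16, peak=255.0):
    # Skip a 16-pixel border so the black pixels introduced by the
    # spatial shifts are excluded from the measurement.
    r = ref[border:-border, border:-border].astype(np.float64)
    t = test[border:-border, border:-border].astype(np.float64)
    mse = np.mean((r - t) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)
```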
− Both objective measurements (PSNR) and visual scrutiny of the picture (i.e. expert viewing) are used for
video quality evaluation.
− They are considered to be complementary.
224
Ex. 1, EBU R124-2008
− The viewing distance is 3H (HDTV).
− Sometimes a closer viewing distance, e.g. 1H, was used to closely observe small details and artefacts and,
when used, this condition was clearly noted in the report.
225
Ex. 1, EBU R124-2008
Compressed (Impaired) version
(e.g. Seventh generation
with spatial shift)
Original
The following displays were used during the tests (HDTV):
• CRT 32” Sony Type BVM-A32E1WM
• CRT 20” Sony Type BVM-A20F1M
• Plasma Full HD 50” Type TH50PK9EK Panasonic
• LCD 47” Type Focus
The displays and the room conditions were aligned
according to the conditions described in ITU-R BT.500-11.
• For acquisition applications, an HDTV format with 4:2:2 sampling should be used; no further horizontal or vertical sub-sampling should be applied.
• An 8-bit bit-depth is sufficient for mainstream programmes.
• A 10-bit bit-depth is preferred for high-end acquisition.
− For production applications of mainstream HD, the EBU tests found no reason to relax the requirement placed on SDTV studio codecs that “quasi-transparent quality” must be maintained after 7 cycles of encoding and recoding with horizontal and vertical pixel-shifts applied.
− All tested codecs have shown quasi-transparent quality up to at least 4 to 5 multi-generations, but have also shown a few impairments, such as noise or loss of resolution, with critical images at the 7th generation.
→ Thus EBU Members are required to carefully design the production workflow and to avoid 7 multi-generation steps.
226
Ex. 1, EBU R124-2008
In document R124-2008, the EBU recommends that:
− If the production/archiving format is to be based on I-frames only, the bitrate should not be less than 100
Mbit/s.
− If the production/archiving format is to be based on long-GoP MPEG-2, the bitrate should not be less than
50 Mbit/s.
− Furthermore, the expert viewing tests have revealed that:
• A 10-bit bit-depth in production is only significant for post-production with graphics and after
transmission encoding and decoding at the consumer end, if the content has been generated using
advanced colour grading, etc (e.g. graphics or animation).
• For normal moving pictures, an 8-bit bit-depth in production will not significantly degrade the HD
picture quality at the consumer’s premises.
227
Ex. 1, EBU R124-2008
− A contribution link was simulated by passing a signal
twice through the codec under test, with a spatial pixel
shift introduced between codec passes.
− This is equivalent to a signal passing through a pair of
cascaded codecs that would typically be
encountered on a contribution link.
− 4:2:2 colour sampling is required for professional
contribution applications.
228
Ex. 2, Simulation of a Typical Contribution Link
Encoder
Decoder
Pixel
Shift
Final
Output
2 Pixels horizontally
2 Pixels vertically
1st Pass
2nd Pass
Source
− The equivalent quality using H.264/AVC was found subjectively by viewing the H.264/AVC sequences at
different bit-rates, starting at half the MPEG-2 bit-rate and increasing it by steps of 10% (of the MPEG-2
reference bit rate) until the quality of the MPEG-2 encoded sequence and that of the H.264/AVC
encoded sequence was judged to be equivalent.
− The subjective evaluation mainly tested the 2nd generation sequences, which can actually allow better
discrimination of the feeds than the 1st generation.
229
Ex. 2, Simulation of a Typical Contribution Link (Cont.)
EBU Triple Stimulus Continuous Evaluation Scale
(TSCES) quality assessment rig
Questions??
Discussion!!
Suggestions!!
Criticism!!
230

More Related Content

What's hot

An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1    An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1 Dr. Mohieddin Moradi
 
To Understand Video
To Understand VideoTo Understand Video
To Understand Videoadil raja
 
An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2Dr. Mohieddin Moradi
 
MPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video EncodingMPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video EncodingChristian Kehl
 
Modern broadcast camera techniques, set up & operation
Modern broadcast camera techniques, set up & operationModern broadcast camera techniques, set up & operation
Modern broadcast camera techniques, set up & operationDr. Mohieddin Moradi
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2Dr. Mohieddin Moradi
 
Understanding video technologies
Understanding video technologiesUnderstanding video technologies
Understanding video technologiesfionayoung
 
Multimedia basic video compression techniques
Multimedia basic video compression techniquesMultimedia basic video compression techniques
Multimedia basic video compression techniquesMazin Alwaaly
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Dr. Mohieddin Moradi
 
Multimedia fundamental concepts in video
Multimedia fundamental concepts in videoMultimedia fundamental concepts in video
Multimedia fundamental concepts in videoMazin Alwaaly
 
Video signal-ppt
Video signal-pptVideo signal-ppt
Video signal-pptDeepa K C
 
Video Compression Basics
Video Compression BasicsVideo Compression Basics
Video Compression BasicsSanjiv Malik
 
Canon XF100 & XF105
Canon XF100 & XF105Canon XF100 & XF105
Canon XF100 & XF105AV ProfShop
 

What's hot (20)

An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1    An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1
 
To Understand Video
To Understand VideoTo Understand Video
To Understand Video
 
An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2
 
HDR and WCG Principles-Part 5
HDR and WCG Principles-Part 5HDR and WCG Principles-Part 5
HDR and WCG Principles-Part 5
 
HDR and WCG Principles-Part 2
HDR and WCG Principles-Part 2HDR and WCG Principles-Part 2
HDR and WCG Principles-Part 2
 
HDR and WCG Principles-Part 3
HDR and WCG Principles-Part 3HDR and WCG Principles-Part 3
HDR and WCG Principles-Part 3
 
MPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video EncodingMPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video Encoding
 
Unit iv
Unit ivUnit iv
Unit iv
 
HDR and WCG Principles-Part 1
HDR and WCG Principles-Part 1HDR and WCG Principles-Part 1
HDR and WCG Principles-Part 1
 
Modern broadcast camera techniques, set up & operation
Modern broadcast camera techniques, set up & operationModern broadcast camera techniques, set up & operation
Modern broadcast camera techniques, set up & operation
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
 
Understanding video technologies
Understanding video technologiesUnderstanding video technologies
Understanding video technologies
 
Multimedia basic video compression techniques
Multimedia basic video compression techniquesMultimedia basic video compression techniques
Multimedia basic video compression techniques
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
 
Multimedia fundamental concepts in video
Multimedia fundamental concepts in videoMultimedia fundamental concepts in video
Multimedia fundamental concepts in video
 
Jpeg standards
Jpeg   standardsJpeg   standards
Jpeg standards
 
SECURICO CCTV BOOK
SECURICO CCTV BOOK SECURICO CCTV BOOK
SECURICO CCTV BOOK
 
Video signal-ppt
Video signal-pptVideo signal-ppt
Video signal-ppt
 
Video Compression Basics
Video Compression BasicsVideo Compression Basics
Video Compression Basics
 
Canon XF100 & XF105
Canon XF100 & XF105Canon XF100 & XF105
Canon XF100 & XF105
 

Similar to Video Compression, Part 4 Section 1, Video Quality Assessment

Video compression
Video compressionVideo compression
Video compressionnnmaurya
 
The motion estimation
The motion estimationThe motion estimation
The motion estimationsakshij91
 
Video Compression Basics - MPEG2
Video Compression Basics - MPEG2Video Compression Basics - MPEG2
Video Compression Basics - MPEG2VijayKumarArya
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Video Compression Basics by sahil jain
Video Compression Basics by sahil jainVideo Compression Basics by sahil jain
Video Compression Basics by sahil jainSahil Jain
 
Compression of Compound Images Using Wavelet Transform
Compression of Compound Images Using Wavelet TransformCompression of Compound Images Using Wavelet Transform
Compression of Compound Images Using Wavelet TransformDR.P.S.JAGADEESH KUMAR
 
Technologies Used In Graphics Rendering
Technologies Used In Graphics RenderingTechnologies Used In Graphics Rendering
Technologies Used In Graphics RenderingBhupinder Singh
 
Motion graphics and_compositing_video_analysis_worksheet 2
Motion graphics and_compositing_video_analysis_worksheet 2Motion graphics and_compositing_video_analysis_worksheet 2
Motion graphics and_compositing_video_analysis_worksheet 2smashingentertainment
 
Compression: Video Compression (MPEG and others)
Compression: Video Compression (MPEG and others)Compression: Video Compression (MPEG and others)
Compression: Video Compression (MPEG and others)danishrafiq
 
White Paper - Mpeg 4 Toolkit Approach
White Paper - Mpeg 4 Toolkit ApproachWhite Paper - Mpeg 4 Toolkit Approach
White Paper - Mpeg 4 Toolkit ApproachAmos Kohn
 

Similar to Video Compression, Part 4 Section 1, Video Quality Assessment (20)

WT in IP.ppt
WT in IP.pptWT in IP.ppt
WT in IP.ppt
 
Video compression
Video compressionVideo compression
Video compression
 
The motion estimation
The motion estimationThe motion estimation
The motion estimation
 
No-reference Video Quality Assessment on Mobile Devices
No-reference Video Quality Assessment on Mobile DevicesNo-reference Video Quality Assessment on Mobile Devices
No-reference Video Quality Assessment on Mobile Devices
 
Video Compression Basics - MPEG2
Video Compression Basics - MPEG2Video Compression Basics - MPEG2
Video Compression Basics - MPEG2
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Dsc
DscDsc
Dsc
 
Dsc
DscDsc
Dsc
 
Dsc
DscDsc
Dsc
 
Dsc
DscDsc
Dsc
 
Video Compression Basics by sahil jain
Video Compression Basics by sahil jainVideo Compression Basics by sahil jain
Video Compression Basics by sahil jain
 
Compression of Compound Images Using Wavelet Transform
Compression of Compound Images Using Wavelet TransformCompression of Compound Images Using Wavelet Transform
Compression of Compound Images Using Wavelet Transform
 
Deblocking_Filter_v2
Deblocking_Filter_v2Deblocking_Filter_v2
Deblocking_Filter_v2
 
056-2004 SID Manuscript
056-2004 SID Manuscript056-2004 SID Manuscript
056-2004 SID Manuscript
 
NMSL_2017summer
NMSL_2017summerNMSL_2017summer
NMSL_2017summer
 
Technologies Used In Graphics Rendering
Technologies Used In Graphics RenderingTechnologies Used In Graphics Rendering
Technologies Used In Graphics Rendering
 
C04841417
C04841417C04841417
C04841417
 
Motion graphics and_compositing_video_analysis_worksheet 2
Motion graphics and_compositing_video_analysis_worksheet 2Motion graphics and_compositing_video_analysis_worksheet 2
Motion graphics and_compositing_video_analysis_worksheet 2
 
Compression: Video Compression (MPEG and others)
Compression: Video Compression (MPEG and others)Compression: Video Compression (MPEG and others)
Compression: Video Compression (MPEG and others)
 
White Paper - Mpeg 4 Toolkit Approach
White Paper - Mpeg 4 Toolkit ApproachWhite Paper - Mpeg 4 Toolkit Approach
White Paper - Mpeg 4 Toolkit Approach
 

More from Dr. Mohieddin Moradi

An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1   An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1 Dr. Mohieddin Moradi
 
An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4Dr. Mohieddin Moradi
 
An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3Dr. Mohieddin Moradi
 
Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2Dr. Mohieddin Moradi
 
Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Dr. Mohieddin Moradi
 
An Introduction to Audio Principles
An Introduction to Audio Principles An Introduction to Audio Principles
An Introduction to Audio Principles Dr. Mohieddin Moradi
 
Video Compression, Part 3-Section 2, Some Standard Video Codecs
Video Compression, Part 3-Section 2, Some Standard Video CodecsVideo Compression, Part 3-Section 2, Some Standard Video Codecs
Video Compression, Part 3-Section 2, Some Standard Video CodecsDr. Mohieddin Moradi
 
Video Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video CodecsVideo Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video CodecsDr. Mohieddin Moradi
 
Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts Dr. Mohieddin Moradi
 
Video Compression, Part 2-Section 1, Video Coding Concepts
Video Compression, Part 2-Section 1, Video Coding Concepts Video Compression, Part 2-Section 1, Video Coding Concepts
Video Compression, Part 2-Section 1, Video Coding Concepts Dr. Mohieddin Moradi
 
Video Compression Part 1 Video Principles
Video Compression Part 1 Video Principles Video Compression Part 1 Video Principles
Video Compression Part 1 Video Principles Dr. Mohieddin Moradi
 

More from Dr. Mohieddin Moradi (17)

Video Quality Control
Video Quality ControlVideo Quality Control
Video Quality Control
 
HDR and WCG Principles-Part 6
HDR and WCG Principles-Part 6HDR and WCG Principles-Part 6
HDR and WCG Principles-Part 6
 
SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2
 
SDI to IP 2110 Transition Part 1
SDI to IP 2110 Transition Part 1SDI to IP 2110 Transition Part 1
SDI to IP 2110 Transition Part 1
 
Broadcast Lens Technology Part 2
Broadcast Lens Technology Part 2Broadcast Lens Technology Part 2
Broadcast Lens Technology Part 2
 
Broadcast Lens Technology Part 1
Broadcast Lens Technology Part 1Broadcast Lens Technology Part 1
Broadcast Lens Technology Part 1
 
An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1   An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1
 
An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4
 
An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3
 
Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2
 
Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1
 
An Introduction to Audio Principles
An Introduction to Audio Principles An Introduction to Audio Principles
An Introduction to Audio Principles
 
Video Compression, Part 3-Section 2, Some Standard Video Codecs
Video Compression, Part 3-Section 2, Some Standard Video CodecsVideo Compression, Part 3-Section 2, Some Standard Video Codecs
Video Compression, Part 3-Section 2, Some Standard Video Codecs
 
Video Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video CodecsVideo Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video Codecs
 
Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts
 
Video Compression, Part 2-Section 1, Video Coding Concepts
Video Compression, Part 2-Section 1, Video Coding Concepts Video Compression, Part 2-Section 1, Video Coding Concepts
Video Compression, Part 2-Section 1, Video Coding Concepts
 
Video Compression Part 1 Video Principles
Video Compression Part 1 Video Principles Video Compression Part 1 Video Principles
Video Compression Part 1 Video Principles
 

Recently uploaded

Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 

Recently uploaded (20)

Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 

Video Compression, Part 4 Section 1, Video Quality Assessment

  • 2. Section I − Perceptual Artifacts in Compressed Video − Quality Assessment in Compressed Video − Objective Assessment of Compressed Video − Objective Assessment of Compressed Video, Codec Assessment for Production Section II − Subjective Assessment of Compressed Video − Subjective Assessment of Compressed Video, Subjective Assessment by Expert Viewing − Performance Comparison of Video Coding Standards: An Adaptive Streaming Perspective − Subjective Assessment by Visualizer™ Test Pattern 2 Outline
  • 4. 4 Blockiness Bluriness Exposure Interlace Noisiness Framing & Pillar-/Letter-Boxing Flickering Blackout Ringing Ghosting Brightness Contrast Freezing Block Loss Slicing Some Video Artifacts (Baseband and Compressed)
  • 5. − Consumers' expectations for better Quality-of-Experience (QoE) has been higher than ever before. − The constraint in available resources in codec optimization often leads to degradations of perceptual quality by introducing compression artifacts in the decoded video. − Objective VQA techniques had also been designed to automatically evaluate the perceptual quality of compressed video streams. Codec Optimal Compromise Availability of Resources (Bandwidth, Power, and Time) Perceptual Quality 5 Compression Artifacts
  • 6. 6 Compression Artifacts Compression Artifacts Spatial Artifacts Blurring Blocking Ringing Basis Pattern Effect Color Bleeding Temporal Artifacts Flickering Mosquito Noise Fine-granularity Flickering Coarse-granularity FlickeringJerkiness Floating Texture Floating Edge Neighborhood Floating
  • 7. − Location-based (spatial): If you can see the artifact when the video is paused, then it’s probably a spatial artifact. − Time/sequence-based (temporal): If it’s much more visible while the video plays, then it’s likely temporal. • The origin of many temporal artifacts in inter-frame coding algorithms → The propagation compression losses to subsequent frame predictions and “rounding on rounding”. − These include those artifacts generated I. during video acquisition (e.g., camera noise, camera motion blur, and line/frame jittering) II. during video transmission in error-prone networks (e.g., video freezing, jittering, and erroneously decoded blocks caused by packet loss and delay) III. during video post-processing and display (e.g., post deblocking and noise filtering, spatial scaling, retargeting, chromatic aberration, and pincushion distortion) 7 Temporal vs. Spatial Artifacts
  • 8. − Block-based video coding schemes create various spatial artifacts due to block partitioning and quantization. These artifacts include • Blurring • Blocking • Ringing • Basis Pattern Effect • Color Bleeding − They are detected without referencing to temporally neighboring frames, and thus can be better identified when the video is paused. − Due to the complexity of modern compression techniques, these artifacts are interrelated with each other, and the classification here is mainly based on their visual appearance. 8 Spatial Artifacts Spatial Artifacts Blurring Blocking Ringing Basis Pattern Effect Color Bleeding
  • 9. − More noticeable around edges, textured regions
  • 12. 1- Removal of high spatial frequencies after transform and quantization − Blurring is a result of the loss of high spatial frequency image detail, typically at sharp edges. − Colloquially referred to as “fuzziness” or “unsharpness”. − It makes discrete objects – as opposed to the entire video – appear out of focus. − Since the energy of natural visual signals concentrates at low frequencies, quantization reduces the high-frequency energy in such signals, resulting in a significant blurring effect in the reconstructed signals. 12 Blurring
  • 13. 2- Smoothing by the in-loop de-blocking filtering − Another source of the blurring effect is in-loop de-blocking filtering, which is employed to reduce the blocking artifact across block boundaries and is adopted as an option by state-of-the-art video coding standards such as H.264/AVC and HEVC. − The de-blocking operators are essentially spatially adaptive low-pass filters that smooth the block boundaries, and thus produce a perceptual blurring effect. Note: − Sometimes, blurring is intentionally introduced by using a Gaussian function to reduce image noise or to enhance image structures at different scales. − Typically, this is done as a pre-processing step before compression algorithms are applied, attenuating high-frequency signals and resulting in more efficient compression. 13 Blurring
  • 14. a) Reference frame b) Compressed frame with de-blocking filter turned off c) Compressed frame with de-blocking filter turned on 14 Blurring
  • 15. Motion Blur − It appears in the direction of motion corresponding to rapidly moving objects in a still image or a video. − It happens when the image being recorded changes position (or the camera moves) during the recording of a single frame, because of either rapid movement of objects or long exposure of slow-moving objects. − One way to avoid motion blur is by panning the camera to track the moving objects, so the object remains sharp but the background is blurred instead. − Graphics, image, or video editing tools may also generate the motion blur effect for artistic reasons (e.g., computer-generated imagery (CGI)). 15 Blurring
  • 16. 16 Blocking or Blockiness − Visibility of underlying block encoding structure (false discontinuities across block boundaries) • Caused by coarse quantization, with different quantization applied to neighboring blocks. − More visible in smoother areas of picture
  • 18. 18 Blocking or Blockiness Compressed frame with Blockiness artifact
  • 19. 19 Blocking or Blockiness Blocking: DCT blocks are not reconstructed properly, due either to errors or to a high compression ratio.
  • 21. − Blocking is known by several names: tiling, jaggies, mosaicing, pixelating, quilting and checkerboarding. − It is frequently seen in video compression standards, which use blocks of various sizes as the basic units for frequency transformation, quantization and motion estimation/compensation, thus producing false discontinuities across block boundaries. − It occurs whenever a complex (compressed) image is streamed over a low bandwidth connection. − At decompression, the output of certain decoded blocks makes surrounding pixels appear averaged together and look like larger blocks. − As displays increase in size, blocking typically becomes more visible. − However, an increase in resolution makes blocking artifacts smaller in terms of the image size and therefore less visible at a given viewing distance. 21 Blocking or Blockiness
  • 22. − The lower the bit rate, the more coarsely the block is quantized, producing blurry, low-resolution versions of the block. − In the extreme case, only the DC coefficient, representing the average of the data, is left for a block, so that the reconstructed block is only a single color region. − The DC values vary from block to block. − The block boundary artifact is the result of independently quantizing the blocks of transform coefficients. − Neighboring blocks quantize the coefficients separately, leading to discontinuities in the reconstructed block boundaries. − These block-boundary discontinuities are usually visible, especially in the flat color regions such as the sky, faces, and so on, where there are little details to mask the discontinuity. 22 Blocking or Blockiness
  • 23. Blocking Mosaic Effect Staircase Effect False Edge − Although all blocking effects are generated because of similar reasons, their visual appearance may be different, depending on the region where blockiness occurs. − Therefore, here we further classify the blocking effects into 3 subcategories. 23 Blocking or Blockiness Reference frame False Edge Staircase Effect Mosaic Effect
  • 24. − Mosaic effect usually occurs when there are luminance transitions in large low-energy regions (e.g., walls, black/white boards, and desk surfaces). − Due to quantization within each block, nearly all AC coefficients are quantized to zero, and thus each block is reconstructed as a constant DC block, where the DC values vary from block to block. − When all blocks are put together, the mosaic effect manifests as an abrupt luminance change from one block to another across the space. − The mosaic effect is highly visible and annoying to the visual system, where the visual masking effect is the weakest at smooth regions. Note: Visual Masking Effect • The reduced visibility of one image component due to the existence of another neighboring image component. 24 Mosaic Effect
  • 25. − Staircase effect typically happens along a diagonal line or curve, which, when mixed with the false horizontal and vertical edges at block boundaries, creates fake staircase structures. − Depending on root cause, staircasing can be categorized as • A compression artifact (insufficient sampling rates) • A scaler artifact (spatial resolution is too low) 25 Staircase Effect
  • 26. − False edge is a fake edge that appears near a true edge. − This is often created by a combination of • motion estimation/compensation based inter-frame prediction • blocking effect in the previous frame The blockiness in the previous frame is transformed to the current frame via motion compensation as artificial edges. 26 False Edge
  • 27. − Halo surrounding objects and edges • Caused by quantization/truncation of high-frequency transform (DCT/DWT) coefficients during compression − Doesn’t move around frame-to-frame (unlike mosquito noise) 27 Ringing (Echoing, Ghosting)
  • 30. 30 Reference frame Compressed frame with ringing artifact Ringing
  • 31. Ringing is unwanted oscillation of an output signal in response to a sudden change in the input. − The output signal oscillates at a fading rate, similar to a bell ringing after being struck, inspiring the name of the ringing artifact. − Image and video signals in digital data compression and processing are band limited. − When they undergo frequency domain techniques such as Fourier or wavelet transforms, or non-monotone filters such as deconvolution, a spurious and visible ghosting or echo effect is produced near the sharp transitions or object contours. − This is due to the well-known Gibbs phenomenon: an oscillating behavior of the filter’s impulse response near discontinuities, in which the output takes higher values (overshoots) or lower values (undershoots) than the corresponding input values, with decreasing magnitude until a steady state is reached. 31 Ringing (Echoing, Ghosting)
  • 32. − The ringing takes the form of a “halo,” band, or “ghost” near sharp edges. − Sharp transitions in images such as strong edges and lines are transformed to many coefficients in frequency domain representations. − The quantization process results in partial loss or distortion of these coefficients. − So during image reconstruction (decompression), there’s insufficient data to form as sharp an edge as in the original. − When the remaining coefficients are combined to reconstruct the edges or lines, artificial wave-like or ripple structures are created in nearby regions, known as the ringing artifacts. − Mathematically, this causes both over- and undershooting to occur at the samples around the original edge. − It’s the over- and undershooting that typically introduces the halo effect, creating a silhouette-like shade parallel to the original edge. 32 Ringing (Echoing, Ghosting)
  • 33. − The ringing effect is restricted to sharp edges or lines. − Such ringing artifacts are most significant when the edges or lines are sharp and strong, and when the regions near the edges or lines are smooth, where the visual masking effect is the weakest. − Ringing doesn’t move around frame to frame (Unlike mosquito noise). Note − When the ringing effect is combined with object motion in consecutive video frames, a special temporal artifact called mosquito noise is observed. 33 Ringing (Echoing, Ghosting)
  • 34. − The basis pattern effect takes its name from basis functions (mathematical transforms) endemic to all compression algorithms. The artifact appears similar to the ringing effect. − However, whereas the ringing effect is restricted to sharp edges or lines, the basis pattern is not. − It usually occurs in regions that have texture, like trees, fields of grass, waves, etc. − Typically, if viewers notice a basis pattern, it has a strong negative impact on perceived video quality. − If the region is in the background and does not attract visual attention, then the effect is often ignored by human observers. 34 Basis Pattern Effect Reference frame Compressed frame with basis pattern effect
  • 35. − When the edges of one color in the image unintentionally bleed or overlap into another color. − Colors of contrasting hue/saturation bleed across sharp brightness boundaries; it looks like “sloppy painting” • Caused by chroma subsampling (the result of inconsistent image rendering across the luminance and chromatic channels) − Worse in images with high color detail 35 Color Bleeding (Smearing)
  • 37. 37 Color Bleeding Compressed frame with Color Bleeding artifact
  • 38. For example, in the most popular YCbCr 4:2:0 video format, the color channels Cb and Cr have half the resolution of the luminance channel Y in both the horizontal and vertical dimensions. Inconsistent Distortions Across Color Channels − After compression, all luminance and chromatic channels exhibit various types of distortions (such as the blurring, blocking and ringing described earlier), and more importantly, these distortions are inconsistent across color channels. Interpolation Operations − Moreover, because of the lower resolution of the chromatic channels, the rendering process inevitably involves interpolation operations, leading to additional inconsistent color spreading in the rendering result. − In the literature, it was shown that accounting for chromatic distortion is helpful in color image quality assessment, but how color bleeding affects the overall perceptual quality of compressed video is still an unsolved problem. 38 Color Bleeding (Smearing)
  • 39. Temporal artifacts refer to those distortion effects that are not observed when the video is paused but appear during video playback. 39 Temporal Artifacts Temporal Artifacts Flickering Mosquito Noise Fine-granularity Flickering Coarse-granularity Flickering Jerkiness Floating Texture Floating Edge Neighborhood Floating
  • 40. Temporal artifacts are of particular interest to us for two reasons I. As compared to spatial artifacts, temporal artifacts evolve more significantly with the development of video coding techniques. • For example, texture floating did not appear to be a significant issue in early video coding standards, is more manifest in H.264/AVC video, but is largely reduced in the latest HEVC-coded video. II. The objective evaluation of such artifacts is more challenging, and popular VQA models often fail to account for these artifacts. • More importantly, it was pointed out that such performance drops are largely due to the lack of proper assessment of temporal artifacts such as flickering and floating (ghosting). 40 Temporal Artifacts
  • 42. 42 Flickering Compressed frame with Flickering artifact Frequent luminance or chrominance changes along temporal dimension
  • 43. Flickering artifact generally refers to frequent luminance or chrominance changes along the temporal dimension that do not appear in the uncompressed reference video. − It can be very eye-catching and annoying to viewers and has been identified as an important temporal artifact that has a significant impact on perceived quality. − The most likely cause of this type of flickering is the use of GOP structures in the compression algorithm. − I-frame-based algorithms are not susceptible to this type of artifact. 43 Flickering Flickering Mosquito noise Coarse-granularity flickering Fine-granularity flickering
  • 44. − Haziness, shimmering, blotchy noise around objects/edges • Varies from frame to frame, like mosquitos flying around a person’s head. − Addition of edges, incorrect DCT block reconstruction; pixels of the opposite colour are created. 44 Mosquito Noise (Gibbs Effect, Edge Busyness)
  • 45. − Mosquito noise is a joint effect of object motion and time-varying spatial artifacts (such as ringing and motion prediction error) near sharp object boundaries. − Specifically, the ringing and motion prediction errors are most manifest in the regions near the boundaries of objects. − When the objects move, such noise-like time-varying artifacts move together with the objects, and thus look like mosquitos flying around the objects. − Since moving objects attract visual attention and the plain regions near object boundaries have a weak visual masking effect on the noise, mosquito noise is usually easily detected and has a strong negative impact on perceived video quality. − A variant of flickering, it is typified as haziness and/or shimmering around high-frequency content (sharp transitions between foreground entities and the background, or hard edges), and can sometimes be mistaken for ringing. 45 Mosquito Noise (Gibbs Effect, Edge Busyness)
  • 46. Coarse-granularity flickering refers to low-frequency sudden luminance changes in large spatial regions that could extend to the entire video frame. − The most likely reason for such flickering is the use of group-of-pictures (GoP) structures in standard video compression techniques. − When a new GoP starts, there is no dependency between the last P-frame in the previous GoP and the I-frame in the current GoP. − Thus a sudden luminance change is likely to be observed, especially when these two frames are of the same scene. 46 Coarse-granularity Flickering GOP = 6 No Dependency No Dependency
  • 47. − The frequency of coarse-granularity flickering is typically determined by the size of GoP. − Advanced video encoders may not use fixed GoP lengths or structures, and an I-frame may be employed only when scene change occurs, and thus coarse-granularity flickering may be avoided or significantly reduced. 47 Coarse-granularity Flickering GOP = 6 No Dependency No Dependency GOP = 12 No Dependency
  • 48. Fine-granularity flickering is typically observed in large low-energy to mid-energy regions with significant blocking effect and slow motion. − In these regions, significant blocking effect occurs at each frame. − The levels of blockiness and the DC values in corresponding blocks change frame by frame. − Consequently, these regions appear to be flashing at high frequencies, frame-by-frame (as opposed to GoP-by-GoP in coarse granularity flickering). − Such flashing effect is highly eye-catching and perceptually annoying, especially when the associated moving regions are of interest to the human observers. 48 Fine-granularity flickering
  • 49. − A flicker-like artifact, jerkiness (also known as choppiness) describes the perception of individual still images in a motion picture. − Jerkiness is the perceived uneven or wobbly motion due to frame sampling. − Jerkiness occurs when the temporal resolution is not high enough to catch up with the speed of moving objects, and thus the object motion appears to be discontinuous. − Highly visible jerkiness is typically observed only when there is strong object motion in the frame. − The resulting unsmooth object movement may cause significant perceptual quality degradation. − It may be noted that the frequency at which flicker and jerkiness are perceived depends upon many conditions, including ambient lighting conditions. 49 Jerkiness (Choppiness)
  • 50. − Jerkiness is not discernible for normal playback of video at typical frame rates of 24 frames per second or above. − However, in visual communication systems, if a video frame is dropped by the decoder owing to its late arrival, or if the decoding is unsuccessful owing to network errors, the previous frame would continue to be displayed. − Upon successful decoding of the next error-free frame, the scene on the display would suddenly be updated. This would cause a visible jerkiness artifact. 50 Jerkiness Encoder Network Channel Decoder • Dropped by the decoder owing to its late arrival • Unsuccessful decoding owing to network errors
  • 51. − Traditionally, jerkiness is not considered a compression artifact, but an effect of the low temporal resolution of the video acquisition device, or a video transmission issue when the available bandwidth is not enough to transmit all frames and some frames have to be dropped or delayed. Telecine Judder − Another flicker-like artifact is the telecine judder. − It is often caused by the conversion of 24 fps movies to a 30 or 60 fps video format. The process, known as "3:2 pulldown" or "2:3 pulldown," cannot create a flawless copy of the original movie because 24 does not divide evenly into 30 or 60. − Jerkiness is also sometimes called judder. 51 Jerkiness
  • 54. Floating refers to the appearance of illusive motion in certain regions as opposed to their surrounding background. − Visually, these regions appear as if they were floating on top of the surrounding background. − Typically, the video encoders associate the Skip coding mode with these regions (where the encoding of the motion-compensated prediction residue is skipped), and thus the structural details within the regions remain unchanged across frames. − This is erroneous, as the actual details in the regions are evolving over time. − It is the result of the encoder erroneously selecting the Skip mode for these regions. 54 Floating Encoder Floating Region Skip Coding Mode It has illusive motion as opposed to the surrounding background.
  • 55. Texture floating typically occurs in large mid-energy texture regions. − The most common case is when a scene with large textured regions such as a water surface or trees is captured with a slowly moving camera. − Despite the actual shifting of image content due to camera motion, many video encoders choose to encode the blocks in the texture regions with zero motion and Skip mode. − These are reasonable choices to save bandwidth without significantly increasing the mean squared error or mean absolute error between the reference and reconstructed frames, but they often create a strong texture floating illusion. − Not surprisingly, such floating illusive motion is usually in the opposite direction with respect to the camera motion, with the same absolute speed. − It was sometimes also referred to as “ghosting”. 55 Texture Floating Floating Texture floating Edge neighborhood floating Large Mid-energy Texture Regions
  • 56. a) The 200th frame in the original video b) The 200th frame in the compressed video (visible texture floating regions are marked manually) c) The texture floating map generated (black regions indicate texture floating is detected) 56 Texture Floating Detection
  • 57. − Among all types of temporal artifacts, texture floating is perhaps the least identified in the literature, but it is found to be highly eye-catching and visually annoying when it exists. The following factors are relevant in detecting and assessing texture floating 1. Global motion: • Texture floating is typically observed in video frames with global camera motion, including translation, rotation and zooming. The relative motion between the floating regions and the background creates the floating illusion in the visual system. • A robust global motion estimation method is employed which uses the statistical distribution of motion vectors from the compressed bitstream. 2. Skip mode: • The use of Skip mode is the major source of the temporal floating effect. When the compressed video stream is available, the Skip mode can be easily detected from the syntax information. 57 Texture Floating Detection
  • 58. 3. Local energy: • In high-energy texture and edge regions, erroneous motion estimation and Skip mode selection are unlikely in most high-performance video encoders. On the other hand, there is no visible texture in low-energy regions. • Therefore, texture floating is most likely seen in mid-energy regions, and we can define two threshold energy parameters E1, E2 to constrain the energy range for texture floating detection. 4. Local luminance: • The visibility of texture floating is also limited by the luminance levels of the floating regions. • Because human eyes are less sensitive to textures in very bright or dark regions, we can define two threshold luminance parameters L1, L2 to consider only mid-luminance regions for texture floating identification. 5. Temporal variation similarity: • Temporal floating is often associated with erroneous motion estimation. In the reconstruction of video frames, erroneous motion estimation/compensation leads to significant distortions along the temporal direction. • In the case that the original uncompressed video is available to compare with, it is useful to evaluate the similarity of temporal variation between the reference video frames and the compressed video frames as a factor to detect the temporal floating effect. 58 Texture Floating Detection
  • 59. Edge neighborhood floating is observed in stationary regions that are next to the boundaries of moving objects. − Rather than remaining stationary, these regions may move together with the boundaries of the objects. − Different from texture floating, edge neighborhood floating may appear without global motion. − It is often visually unpleasant because it looks like there exists a wrapped package surrounding and moving together with the object boundaries. − This effect was also called stationary area temporal fluctuations. 59 Edge Neighborhood Floating Floating Texture floating Edge neighborhood floating Stationary Region Boundaries of Moving Objects
  • 60. − Looks like random noise process (snow); can be grey or colored, but not uniform over image • Caused by quantization of DCT coefficients 60 Quantization Noise
  • 61. − It is a frequent problem, caused by the limits of 8-bit color depth and (especially at low bitrates) truncation. • Not enough colours to define the image correctly. • Not noticeable in high-detail areas of the image. • Affects flat areas and smooth transitions badly. 61 Contouring (Banding)
  • 62. − Similar reason to contouring − Make video look like a poster − Colors look flat − Makeup looks badly applied 62 Posterization
  • 64. − Data incorrectly saved or retrieved. Affects still images as a loss of part of the image … … or a block shift of part of the image. 64 Data Corruption
  • 67. Milestones in Video Coding 67
  • 68. Milestones in Video Coding 68
  • 69. 69 Audio and Video Visual Quality
  • 70. − Audiovisual Quality Control • Audio • Video • Interaction between audio and video : e.g. lip-sync − File based AV Quality Control • Part of an automated or manual workflow • Diagnose • Repair / Redo − Technical Quality Control • Container • Metadata • Interaction between container, audio and video : e.g. duration of tracks − File Based Technical Quality Control • Part of an automated or manual workflow • Application specifications 70 Video Quality Control in a File-based World
  • 71. Safeguarding Audiovisual Quality − Maintaining quality throughout the production chain – Choose material as close to source as possible • Prevent unneeded multi-generation – Try to produce with the shortest/’most apt’ chain • Prevent unneeded multi-generation • Prevent transcoding – Check quality − Carefully design the production chain – Choose the right codecs – Choose the right equipment 71 Video Quality Control in a File-based World
  • 72. − Video compression algorithm factors • Decoder concealment, packetization, GOP structure, … − Network-specific factors • Delay, Delay variation, Bit-rate, Packet loss rate (PLR) − Network independent factors • Sequence − Content, amount of motion, amount of texture, spatial and temporal resolution − User • Eyesight, interest, experience, involvement, expectations − Environmental viewing conditions • Background and room lighting; display sensitivity, contrast, and characteristics; viewing distance 72 Factors that Affect Video Quality
  • 73. 73 Image and Video Processing Chain: Acquisition → Compression → Transmission → Display, with typical artifacts at each stage. Acquisition: aliasing, blurring, ringing, noise, contouring, distortions. Compression: blocking/tiling, motion-compensated edge artifacts, aliasing, blurring, ringing, flicker, jerkiness. Transmission: noise, jerkiness, blackout, packet loss, macroblocking. Display: aliasing, blurring, ringing, color contrast artifacts, interlacing, overscan.
  • 74. The Spectrum of Visual Quality Perfect (Lossless) Awful Visually Lossless? Good enough? 74
  • 75. What Affects Visual Quality? (1) Compression Ratio Computation Delay Codec Type Codec Version Transmission Errors System Issues 75
  • 76. What Affects Visual Quality? (2) Detail Motion Masking Spatial Masking Content Issues 76
  • 77. What Affects Visual Quality? (3) Environment Experience Human Issues Task Attention 77
  • 78. 78 Excellent Good Fair Poor Bad Absolute Category Rating A B Pair-wise comparison Absolute Category Rating and Pair-wise Comparison
  • 79. − In Video/image processing, the pixel values may change leading to distortions − Examples of added distortions • Filtering/Interpolation distortions • Compression distortions • Watermarking distortions • Transmission distortions − It is important to know if added distortions are acceptable Objective and Subjective Measurements/Assessment 79
  • 80. 80 Objective and Subjective Measurements/Assessment Objective Metrics • Peak Signal to Noise Ratio (PSNR) • Structural Similarity Index (SSIM) • Just Noticeable Difference (JND) • … and Many More. Video Quality Measurement Subjective Objective Subjective Metrics • MOS: Mean Opinion Score • DMOS: Differential Mean Opinion Score (defined as the difference between the raw quality scores of the reference and test images)
  • 81. 81 Objective and Subjective Measurements/Assessment Video Quality Measurement Subjective Objective Goldeneye Multiple Viewer Absolute Comparison
  • 82. 82 Objective and Subjective Measurements/Assessment Video Quality Measurement Subjective Objective Reduced Reference Model Based Hybrid Full Reference No Reference Distortion Based
  • 83. 83 Objective and Subjective Measurements/Assessment Video Quality Measurement Subjective Objective Reduced Reference Model Based Hybrid Full Reference No Reference Distortion Based Goldeneye Multiple Viewer Absolute Comparison
  • 84. − Formation: 1997 − Experts of ITU-T Study Group 9 and ITU-R Study Group 6 − They are considering three methods for the development of the video quality metric. − The VQEG has defined three methods for an objective video quality meter: • Full Reference Method (FR) • Reduced Reference Method (RR) • No Reference Method (NR) − All these models should be validated with the SSCQE (Single Stimulus Continuous Quality Evaluation) method for various video segments. − Early results indicate that these methods, compared with the SSCQE, perform satisfactorily, with a correlation coefficient between 0.8 and 0.9. Video Quality Experts Group (VQEG) 84 ITU-T Study Group 9 ITU-R Study Group 6 Video Quality Experts Group (VQEG)
  • 85. 85 The VQEG Three Methods as an Objective Video Quality Meter
  • 86. − These three models try to mimic the perceptual model of the HVS − They try to • assess specific distortions (such as blocking, blurring and texture distortion, jerkiness, freeze, etc.) • pool (combine features, spatially and temporally) • map the contributions of these artifacts into an overall video quality score. − FR is the best and NR the worst VQEG Meters and HVS Model 86 Measure Pool Map to Quality
  • 87. − A full-reference (FR) quality measurement makes a comparison between a (known) reference video signal at the input of the system and the processed video signal at the output of the system Full-reference (FR) Method 87
  • 88. − In a reduced-reference (RR) quality measurement, specific parameters (features) are extracted from both the (known) reference and processed signals. − Reference data relating to these parameters are sent using a side-channel to the measurement system. − The measurement system extracts similar features to those in the reference data to make a comparison and produce a quality measurement. Reduced-reference (RR) Method 88 Similar features extraction
  • 89. − A no-reference (NR) quality measurement analyses only the processed video without the need to access the (full or partial) reference information. No-reference (NR) Method 89
  • 90. Image/Spatial • Blurring • Ringing • Snow; Noise • Aliasing; HD on SD and vice versa • Colorfulness; color shifts • PSNR due to quantization • Contrast • Ghosting • Interlacing • Motion-compensated edge artifacts! No-reference (NR) Method 90 Video/Temporal • Flicker • Block flashing artifacts (mosquito noise?) • Telecine • Jerkiness • Blackness
  • 91. FR MOS vs PSNR 91
  • 92. Pooling: combining features; combining in space, frequency, orientation, time Typical Video Quality Estimator (QE) or Video Quality Assessment System 92 Quality Estimator Measure Pool Map to Quality Reference image Distorted test image Absolute QE score Methodology − Measure individual artifacts − Combine many artifacts • Combine linearly • Combine non-linearly − Measure physical artifacts − Measure perceptual artifacts − Incorporate viewing distance (cyc/degree) into NR − Combine NR with FR-HVS
  • 93. 93 The Ideal Quality Estimator; A Quality Estimator (QE) With a Systematic Weakness; A Typical QE On a Typical Dataset (scatterplots of objective quality versus subjective quality; the systematic weakness shows up for a specific type of processing, or a specific type of image or video)
  • 94. − Algorithm optimization • Automated in-the-loop assessment − Product benchmarks • Vendor comparison to decide what product to buy • Product marketing to convince customer to give you $$ − System provisioning • Determine how many servers, how much bandwidth, etc. − Content acquisition and delivery (and SLAs) • Enter into legal agreements with other parties − Outage detection and troubleshooting Applications Of Video Quality Estimators 94
  • 95. Absolute QE scores − Absolute QE scores are useful for • product benchmarking • content acquisition • system provisioning − Absolute QE scores still depend on context. They are not truly “absolute”. Relative QE scores − Relative QE scores are useful for • algorithm optimization • product benchmarking Absolute vs. Relative Quality Estimator (QE) Scores 95
  • 96. − Full reference (FR) • Most available info; requires original and decoded pixels • ITU-T standards J.247, J.341 − Reduced reference (RR) • Less available info; requires features extracted from the original and decoded pixels • ITU-T standards J.246, J.342 − No-Reference Pixel-based methods (NR-P) • Requires decoded pixels: a decoder for each video stream − No-Reference Bitstream-based methods (NR-B) • Processes packets containing the bitstream, without a decoder • ITU-T standards: P.1201 (packet headers only); P.1202 (packet headers and bitstream info) − Hybrid Methods (Hybrid NR-P-B) • Hybrid models combine parameters extracted from the bitstream with a decoded video signal. • They are therefore a mix between NR-P and NR-B models. Quality Estimator (QE) Categorization 96
  • 97. Quality Estimator (QE) Categorization 97
  • 98. Use information from a collection of “similar enough” images to estimate quality of them all − Applications • Super-resolution • Downsampling • image fusion • images collected of the same scene from different angles, etc • egocentric video − A relative quality estimation − Uses effective information from overlapping regions; does not require perfect pixel alignment Quality Estimator (QE) with Mutual Reference 98
  • 99. − Sequence • Content • amount of motion, amount of texture • spatial and temporal resolution − User • Eyesight • Interest • Experience • Involvement • Expectations − Environmental viewing conditions • Background and room lighting • display sensitivity, contrast, and characteristics • viewing distance − The processing may improve, not degrade, quality! Why FR Quality Estimators (QE) Are Challenging to Design 99
  • 100. − All the reasons FR QE are challenging, plus… − Many types of processing • Encoding • transmission errors • Sampling • backhoe, … − Many desired signals may “look like” distortion − Limited input information – no vref and often no vtest! − Nonetheless, some applications require NR Why NR Quality Estimators (QE) Are Challenging to Design 100
  • 101. Approach 1: Model Perception and Perceptual Attributes − Psychology community, Martens, Kayyargadde, Allnatt, Goodman,… Approach 2: Model System Quality − The entire photography/camera community, Keelan,… Approach 3: − The image processing community − H.R. Wu, Winkler, Marziliano, Bovik, Sheikh, Wang, Ghanbari, Reibman, Wolf Existing Approaches to NR Quality Estimator (QE) 101
  • 102. Original video X Encoding parameters E(.) Complete encoded bitstream E(X) Network impairments (losses, jitter) L(.) Lossy bitstream L(E(X)) Decoder (concealment, buffer, jitter) D(.) Decoded pixels D(L(E(X))) What Information Can You Gather? 102 Encoder Network Decoder A B C D
  • 103. Original video X Encoding parameters E(.) Complete encoded bitstream E(X) Network impairments (losses, jitter) L(.) Lossy bitstream L(E(X)) Decoder (concealment, buffer, jitter) D(.) Decoded pixels D(L(E(X))) Full-Reference Quality Estimator (QE) 103 Encoder Network Decoder A B C D
  • 104. Original video X Encoding parameters E(.) Complete encoded bitstream E(X) Network impairments (losses, jitter) L(.) Lossy bitstream L(E(X)) Decoder (concealment, buffer, jitter) D(.) Decoded pixels D(L(E(X))) Quality Estimator (QE) Using Network Measurements 104 Encoder Network Decoder A B C D
  • 105. Original video X Encoding parameters E(.) Complete encoded bitstream E(X) Network impairments (losses, jitter) L(.) Lossy bitstream L(E(X)) Decoder (concealment, buffer, jitter) D(.) Decoded pixels D(L(E(X))) Quality Estimator (QE) Using Lossy Bitstream 105 Encoder Network Decoder A B C D
  • 107. − Mean Squared Error (MSE) of a quantizer for a continuous valued signal: $\mathrm{MSE} = E[(f - Q(f))^2] = \int (f - Q(f))^2 \, p(f) \, df$ • Where $p(f)$ is the probability density function of $f$ − MSE for a specific image: $\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (f_i - Q(f_i))^2$ 107 Mean Squared Error (MSE)
  • 108. − Signal to Noise Ratio (SNR): $\mathrm{SNR} = 10 \log_{10} \frac{\sigma_f^2}{\mathrm{MSE}}$ − Peak SNR or PSNR: $\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{peak}^2}{\mathrm{MSE}}$ • For the error measure to be “independent of the signal energy” → signal energy ≡ square of the dynamic range of the image • For an 8-bit image, peak = 255 108 Signal to Noise Ratio (SNR) and Peak Signal to Noise Ratio (PSNR)
  • 109. 109 MSE of a Uniform Quantizer for a Uniform Source • Uniform quantization into L levels: $q = B/L$ • Same error statistics in each bin • Error is uniformly distributed in $(-q/2, +q/2)$ $\sigma_q^2 = \int_{f_{min}}^{f_{max}} (f - Q(f))^2 \, \frac{1}{B} \, df = \frac{q^2}{12}$
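The q²/12 result follows from a standard one-line derivation (shown here for completeness, not taken from the slides): with the error $e = f - Q(f)$ uniformly distributed over one bin,
$\sigma_q^2 = \int_{-q/2}^{+q/2} e^2 \, \frac{1}{q} \, de = \frac{1}{q} \left[ \frac{e^3}{3} \right]_{-q/2}^{+q/2} = \frac{q^2}{12}$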
  • 110. 110 PSNR for a Sine Waveform (peak-to-peak amplitude $2A$) $\mathrm{SQNR} = 10 \log_{10} \frac{\text{RMS signal power}}{\text{RMS quantization noise power}} = 6B + 1.78$ $\mathrm{PSNR} = 10 \log_{10} \frac{\text{peak signal power}}{\text{RMS quantization noise power}} = \,?$ $\frac{\text{peak signal power}}{\text{RMS quantization noise power}} = \frac{\text{peak signal power}}{\text{RMS signal power}} \times \frac{\text{RMS signal power}}{\text{RMS quantization noise power}} = \frac{(2A)^2}{(A/\sqrt{2})^2} \times \frac{\text{RMS signal power}}{\text{RMS quantization noise power}} = 8 \times \frac{\text{RMS signal power}}{\text{RMS quantization noise power}}$ $\mathrm{PSNR} = 10 \log_{10} 8 + (6B + 1.78) \approx 6B + 11 \ \mathrm{(dB)}$
  • 111. − Due to the limitations of subjective assessments in terms of cost and time, researchers are focusing on automated quality assessment. − The computerized assessment of a quality metric is mainly based on mathematical calculations which also include the properties of the human visual system (HVS). − Ideally we want to measure the performance by how close the quantized image is to the original image (the “perceptual difference”) − But it is very hard to come up with an objective measure that correlates very well with the perceptual quality − Frequently used objective measures − Mean Squared Error (MSE) between original and quantized samples − Signal to Noise Ratio (SNR) − Peak SNR (PSNR) 111 Objective Measurement of Performance
  • 112. − The Mean Squared Error (MSE) is the simplest objective measure and is calculated as: $\mathrm{MSE} = \frac{1}{m \times n} \sum_{x=1}^{m} \sum_{y=1}^{n} (Y_r(x,y) - Y_d(x,y))^2$ − Where $Y_r(x,y)$ and $Y_d(x,y)$ are the luminance levels of the pixels in the reference image and the coded image of size $m \times n$, respectively. − The PSNR is calculated using the MSE as: $\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ \mathrm{dB}$ Objective Assessment by PSNR and MSE 112
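For illustration, a minimal NumPy sketch of these two formulas (the function names and the use of NumPy are our own choices, not from the slides):

    import numpy as np

    def mse(ref, dist):
        # Mean squared error over all pixels; inputs are images of equal shape
        ref = ref.astype(np.float64)
        dist = dist.astype(np.float64)
        return np.mean((ref - dist) ** 2)

    def psnr(ref, dist, peak=255.0):
        # PSNR in dB; peak is the maximum pixel value (255 for 8-bit images)
        m = mse(ref, dist)
        if m == 0:
            return float("inf")  # identical images
        return 10.0 * np.log10(peak ** 2 / m)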
  • 113. 113 PSNR − The difference between PSNR and APSNR is in the way the average PSNR is calculated for a sequence. − The correct way to calculate the average PSNR for a sequence is to calculate the average MSE over all frames (the average MSE is the arithmetic mean of the per-frame MSE values) and after that to calculate PSNR using the ordinary equation for PSNR: $\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\overline{\mathrm{MSE}}} \ \mathrm{dB}$ APSNR (Average Peak Signal to Noise Ratio) − But sometimes it is needed to take a simple average of all the per-frame PSNR values. − APSNR is implemented for this case and calculates the average PSNR by simply averaging the per-frame PSNR values. − APSNR is usually about 1 dB higher than PSNR. PSNR and APSNR
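A small sketch of the two averaging orders (our own illustration; the per-frame MSE values are assumed to be precomputed):

    import numpy as np

    def sequence_psnr(frame_mses, peak=255.0):
        # Correct order: average the per-frame MSEs first, then convert to dB
        return 10.0 * np.log10(peak ** 2 / np.mean(frame_mses))

    def sequence_apsnr(frame_mses, peak=255.0):
        # APSNR: convert each frame's MSE to PSNR first, then average the dB values
        per_frame = 10.0 * np.log10(peak ** 2 / np.asarray(frame_mses))
        return float(np.mean(per_frame))

    # Example: APSNR >= PSNR (by Jensen's inequality), here by roughly 1 dB
    mses = [20.0, 40.0, 80.0]
    print(sequence_psnr(mses), sequence_apsnr(mses))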
  • 114. 114 − Usually our PSNR refers to Y-PSNR, because human eyes are more sensitive to luminance differences. − However, some customers calculate PSNR by combining Y-, U- and V-PSNR in a customized ratio, like: PSNR Channel $\mathrm{PSNR} = \frac{6 \times \mathrm{PSNR}_Y + \mathrm{PSNR}_U + \mathrm{PSNR}_V}{8}$
  • 115. 115 Encoding y4m with the H.264 video codec, MP4 container, 2 Mbps bitrate, with PSNR measurement: • ffmpeg -i source.y4m -codec h264 -b 2000000 destination.mp4 -psnr Note: Other codecs: -codec mpeg2video, -codec hevc, -codec vp9, … Decoding from a coded media file (any) to y4m: • ffmpeg -i source.mp4 destination.y4m Encoding, Decoding and PSNR Measurement with FFmpeg
  • 116. 116 PSNR Without saving the results in a log file: − ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi psnr -f null - With saving the results in a log file: − ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi psnr=psnr.log -f null - SSIM Without saving the results in a log file: − ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi ssim -f null - With saving the results in a log file: − ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi ssim=ssim.log -f null - PSNR and SSIM Measurement with FFmpeg
  • 117. 117 Original Processed MSE Objective Assessment by MSE $\mathrm{MSE} = \frac{1}{m \times n} \sum_{x=1}^{m} \sum_{y=1}^{n} (Y_r(x,y) - Y_d(x,y))^2$
  • 118. 118 Objective Assessment by MSE $\mathrm{MSE} = \frac{1}{m \times n} \sum_{x=1}^{m} \sum_{y=1}^{n} (Y_r(x,y) - Y_d(x,y))^2$ (a) original image (b) luminance offset (c) contrast stretch (d) impulse noise (e) Gaussian noise (f) blur (g) JPEG compression (h) spatial shift (to the left) (i) spatial scaling (zoom out) (j) rotation (CCW). Images (b)-(g) have nearly identical MSEs but very different visual quality. (b) MSE = 309 (a) (c) MSE = 306 (d) MSE = 313 (e) MSE = 309 (f) MSE = 308 (g) MSE = 309 (h) MSE = 871 (i) MSE = 694 (j) MSE = 590
  • 119. 119 PSNR=19.09 dB PSNR=25.25 dB Objective Assessment by PSNR PSNR for Image Encoding: JPEG-2000
  • 121. 121 PSNR 45.53 [dB] Objective Assessment by PSNR
  • 122. 122 PSNR 36.81 [dB] Objective Assessment by PSNR
  • 123. 123 PSNR 31.45 [dB] Objective Assessment by PSNR
  • 125. 125 An Example of the PSNR Relationship to Root MSE (plot of PSNR [dB], from 20 to 50, versus root MSE, from 0 to 20): 40 dB means every pixel is off by 1%; 30 dB means every pixel is off by 3.2%; 20 dB means every pixel is off by 10%
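These anchor points follow directly from the definition (a quick check, not from the slides): since $\mathrm{PSNR} = 20 \log_{10}(255 / \mathrm{RMSE})$, an average error of 1% of full scale gives
$\mathrm{RMSE} = 0.01 \times 255 = 2.55, \qquad \mathrm{PSNR} = 20 \log_{10} \frac{255}{2.55} = 20 \log_{10} 100 = 40 \ \mathrm{dB}$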
  • 126. 126 PSNR 25.8 0.63 bpp (bit per pixel) 12.8 : 1 PSNR Reflects Fidelity (1)
  • 127. 127 PSNR 24.2 0.31 bpp (bit per pixel) 25.6 : 1 PSNR Reflects Fidelity (2)
  • 128. 128 PSNR 23.2 0.16 bpp (bit per pixel) 51.2 : 1 PSNR Reflects Fidelity (3)
  • 129. PSNR = 25.12 dB PSNR = 25.11 dB Q. Huynh-Thu and M. Ghanbari, “Scope of validity of PSNR in image/video quality assessment”, Electronics Letters, 44:13 (June 2008), pp. 800-801 129 PSNR is not Everything
  • 130. 130 PSNR = 25.36 dB PSNR = 25.25 dB PSNR is not Everything
  • 131. 131 PSNR is not Everything
  • 132. 132 PSNR = 25.8 dB PSNR = 25.8 dB PSNR is not Everything
  • 133. − Take a natural image − Give more bits to areas you look at more (Saliency map) − Give less bits to areas you look at less − Subjective rating will be high, PSNR low Ex. 1: How to Trick PSNR? 133 Original Attention Map Example Test (High subjective rating, low PSNR)
  • 134. Ex: A small part of a picture in a video is severely degraded − This type of distortion is very common in video, where due to a single bit error, blocks of 16×16 pixels might be erroneously decoded. − This has almost no significant effect on PSNR but can be viewed as an annoying artefact. Ex. 2: How to Trick PSNR? 134 A Picture Severely Degraded (Distorted) It hardly affects the PSNR or any objective model parameters (depending on the area of distortion) It attracts the observers attention, and the video looks bad if a larger part of the picture was distorted.
  • 135. − In comparing codecs, PSNR or any objective measure should be used with great care. − PSNR does not correlate accurately with the picture quality, and thus it would be misleading to directly compare PSNR values from very different algorithms. “Ensure that the types of coding distortions are not significantly different from each other” Ex. 3: How to Trick PSNR? 135 Coder 1 (block-based) Coder 2 (filter-based) Blockiness Distortion, PSNR,1 Smearing Distortion, PSNR,2 Same Input Objective results can be different The expert viewers prefer the blockiness distortion to the smearing, and non-experts’ views are the opposite!
  • 136. 1. Not shift invariant 2. Rates enhanced images as degraded 3. Difficult to handle different spatial resolutions 4. Difficult to handle different temporal resolutions 5. Does not consider underlying image • Masking in human visual system (temporal, luminance, texture) • Will clearly fail to accurately compare different source material 6. Does not consider relationship among pixels • Compares different types of impairments poorly • Blocking vs. blurring; Wavelet vs. DCT; High vs. low spatial frequency Six Situations in Which PSNR Fails 136
  • 137. − The main criticism against the PSNR: “The human interpretation of the distortions at different parts of the video can be different” − Although it is hoped that the variety of interpretations can be included in the objective models, there are still some issues that not only the simple PSNR but also more sophisticated objective models may fail to address. Objective Assessment by PSNR 137 Example of PSNR interpretation in terms of quality for a specific video codec and streamed video: PSNR > 33 dB: Excellent Quality; 30 dB < PSNR < 33 dB: Fair Quality; PSNR < 30 dB: Poor Quality
  • 138. What is the main reason that PSNR is still used in comparing the performance of various video codecs? − Under similar conditions if one system has a better PSNR than the other, then the subjective quality can be better but not worse. − PSNR can provide information about the behaviour of the compression algorithm through the multi- generation process. Objective Assessment by PSNR 138 System 1 System 2 PSNR,1 PSNR,2 Same Input
  • 139. MSE and PSNR are widely used because they are simple and easy to calculate and mathematically easy to deal with for optimization purpose. − Mathematically, it’s very tractable • Easy to model, differentiable − Many times, increasing PSNR improves visual quality − Experimentally, it’s easy to optimize • Reducing error in one pixel increases PSNR • Can reduce the error in each pixel independently − Rate distortion optimization successes • Search over all possible strategies, the one that minimizes the distortion (D=MSE) at the decoder, subject to a constraint on encoding rate R. • Shows dramatic improvement in images, video Good Things about MSE and PSNR 139
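The rate-distortion search mentioned above is commonly written as an unconstrained Lagrangian minimization (a standard formulation, spelled out here for clarity rather than quoted from the slide):
$\min_{s} \; J(s) = D(s) + \lambda R(s)$
where $s$ ranges over the candidate coding strategies (modes, motion vectors, quantizers), $D$ is the distortion (here the MSE) at the decoder, $R$ is the encoding rate, and $\lambda \ge 0$ sets the operating trade-off point.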
  • 140. A number of reasons why MSE or PSNR may not correlate well with the human perception of quality: • Digital pixel values, on which the MSE is typically computed, may not exactly represent the light stimulus entering the eye. • Simple error summation, like the one implemented in the MSE formulation, may be markedly different from the way the HVS and the brain arrive at an assessment of the perceived distortion. • Two distorted image signals with the same amount of error energy may have very different structures of errors, and hence different perceptual quality. Some PSNR Facts 140
  • 141. − PSNR is inaccurate in measuring video quality of a video content encoded at different frame rates because it is not capable of assessing the perceptual trade-off between the spatial and temporal qualities. − PSNR follows a monotonic relationship with subjective quality in the case of full frame rate encoding (without the presence of frame freezing or dropping) when the video content and codec are fixed. − So PSNR can be used as an indicator of the variation of the video quality when the content and codec are fixed across the test conditions, and when the encoding is done at full frame rate without the presence of frame freezing or dropping (it leads to frame rate change). − PSNR becomes an unreliable and inaccurate quality metric when several videos with different content are jointly assessed. Some PSNR Facts 141
  • 142. 142 PSNR in Different Moving Picture Types (plots of PSNR over time for a long-GOP sequence, e.g. I B B P B B I …, and for an I-frame-only sequence, I I I I …)
  • 143. 143 PSNR in Different Moving Picture Types
  • 144. 144 PSNR, GOP, Intra and Inter Coding (PSNR over the 1st, 5th and 10th generations for a long-GOP codec versus AVC-Intra100 and AVC-Intra50 with cut edits, across content ranging from still pictures and landscape to fast motion, confetti fall and flashing lights) Long GOP quality is content dependent
  • 145. − This measure gives the difference between the color components of the original frame and the compressed frame. The value of this metric is the mean absolute difference of the color components at the corresponding points of the images: $d(X,Y) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} |X_{i,j} - Y_{i,j}|}{mn}$ − For pixel values normalized to [0, 1], the values are in 0..1. The value 0 means equal frames; lower values are better. − It can be used to identify which part of a search image is most similar to a template image. Mean Sum of Absolute Difference (MSAD) 145 Original Processed MSAD
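A minimal NumPy sketch of this metric (the normalization to [0, 1] and the function name are our own choices):

    import numpy as np

    def msad(ref, dist):
        # Mean absolute difference of corresponding pixels, normalized to [0, 1]
        ref = ref.astype(np.float64) / 255.0
        dist = dist.astype(np.float64) / 255.0
        return float(np.mean(np.abs(ref - dist)))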
  • 146. The conventional PSNR demonstrates measurement inaccuracy when applied to video streamed over wireless and mobile networks. − This is due to the packet loss issue in wireless and mobile networks. − A concept of dynamic window size is used to improve the accuracy of frame loss detection. − The concept is named Aligned-PSNR (APSNR). Aligned-PSNR (APSNR) 146 Illustration of conventional PSNR.
  • 147. − The window size depends on the sum of frame losses in the S-frames. − In this case, there is a total of five frame losses, so the window size is six (Window size = SumFL + 1). − The window determines the limit of the frame loss search, so that the algorithm only needs to find corresponding frames within the window size. − The processed S-frame (S-frame number x) should correspond with O-frame number eight. Aligned-PSNR (APSNR) 147 Original Streamed
  • 148. Two phenomena demonstrate that perceived brightness is not a simple function of intensity. − Mach Band Effect: The visual system tends to undershoot or overshoot around the boundary of regions of different intensities. − Simultaneous Contrast: a region’s perceived brightness does not depend only on its intensity. Perceived Brightness Relation with Intensity 148 Mach band effect. Perceived intensity is not a simple function of actual intensity. Examples of simultaneous contrast. All the inner squares have the same intensity, but they appear progressively darker as the background becomes lighter
  • 149. − The term masking usually refers to a destructive interaction or interference among stimuli that are closely coupled in time or space. − This may result in a failure in detection or errors in recognition. − Here, we are mainly concerned with the detectability of one stimulus when another stimulus is present simultaneously. − The effect of one stimulus on the detectability of another, however, does not have to decrease detectability. Masking Recall 149 I: Gray level (intensity value) Masker: Background $I_2$ (one stimulus) Disk: Another stimulus $I_1$ At $\Delta I = I_2 - I_1$, the object can be noticed by the HVS with a 50% chance.
  • 150. − Under what circumstances can the disk-shaped object be discriminated from the background (as a masker stimulus) by the HVS? Weber’s law: − Weber’s law states that for a relatively very wide range of I (the masker), the threshold for disc discrimination, $\Delta I$, is directly proportional to the intensity I. • Bright background: a larger difference in gray levels is needed for the HVS to discriminate the object from the background. • Dark background: the intensity difference required could be smaller. Masking Recall 150 Contrast Sensitivity Function (CSF) I: Gray level (intensity value) Masker: Background $I_2$ (one stimulus) Disk: Another stimulus $I_1$ At $\Delta I = I_2 - I_1$, the object can be noticed by the HVS with a 50% chance. $\frac{\Delta I}{I} = \mathrm{Constant} \ (\approx 0.02)$
  • 151. − The HVS demonstrates light adaptation characteristics and as a consequence of that it is sensitive to relative changes in brightness. This effect is referred to as “Luminance masking”. “ Luminance Masking: The perception of brightness is not a linear function of the luminance” − In fact, the threshold of visibility of a brightness pattern is a linear function of the background luminance. − In other words, brighter regions in an image can tolerate more noise due to distortions before it becomes visually annoying. − The direct impact that luminance masking has on image and video compression is related to quantization. − Luminance masking suggests a nonuniform quantization scheme that takes the contrast sensitivity function into consideration. Luminance Masking 151
  • 152. − It can be observed that the noise is more visible in the dark area than in the bright area if comparing, for instance, the dark portion and the bright portion of the cloud above the bridge. 152 The bridge in Vancouver: (a) Original and (b) Uniformly corrupted by AWGN. Luminance Masking
  • 153. Luminance Masking − The perception of brightness is not a linear function of the luminance. − The HVS demonstrates light adaptation characteristics and as a consequence of that it is sensitive to relative changes in brightness. Contrast Masking − The changes in contrast are less noticeable when the base contrast is higher than when it is low. − The visibility of certain image components is reduced due to the presence of other strong image components with similar spatial frequencies and orientations at neighboring spatial locations. Contrast Masking 153
  • 154. With the same MSE: • The distortions are clearly visible in the ‘‘Caps’’ image. • The distortions are hardly noticeable in the ‘‘Buildings’’ image. • The strong edges and structure in the ‘‘Buildings’’ image effectively mask the distortion, while it is clearly visible in the smooth ‘‘Caps’’ image. This is a consequence of the contrast masking property of the HVS, i.e. • The visibility of certain image components is reduced due to the presence of other strong image components with similar spatial frequencies and orientations at neighboring spatial locations. Contrast Masking 154 (a) Original ‘‘Caps’’ image (b) Original ‘‘Buildings’’ image (c) JPEG compressed image, MSE = 160 (d) JPEG compressed image, MSE = 165 (e) JPEG 2000 compressed image, MSE = 155 (f) AWGN corrupted image, MSE = 160.
  • 155. − In developing a quality metric, a signal is first decomposed into several frequency bands and the HVS model specifies the maximum possible distortion that can be introduced in each frequency component before the distortion becomes visible. − This is known as the Just Noticeable Difference (JND). − The final stage in the quality evaluation involves combining the errors in the different frequency components, after normalizing them with the corresponding sensitivity thresholds, using some metric such as the Minkowski error. − The final output of the algorithm is either • a spatial map showing the image quality at different spatial locations • a single number describing the overall quality of the image. Developing a Quality Metric Using Just Noticeable Difference (JND) 155
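As a concrete illustration of the pooling step, here is a minimal sketch of Minkowski error pooling over JND-normalized frequency bands (the band structure, the exponent p = 4 and the names are assumptions for illustration, not a specific standardized metric):

    import numpy as np

    def minkowski_pool(band_errors, jnd_thresholds, p=4.0):
        # band_errors: list of per-band error maps (same shape); jnd_thresholds: one
        # visibility threshold per band. Errors are normalized by their JND before pooling.
        normalized = np.stack([e / t for e, t in zip(band_errors, jnd_thresholds)])
        spatial_map = np.mean(np.abs(normalized) ** p, axis=0) ** (1.0 / p)  # quality map
        overall = np.mean(np.abs(normalized) ** p) ** (1.0 / p)              # single score
        return spatial_map, overall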
  • 156. − The Contrast Sensitivity Function (CSF) provides a description of the frequency response of the HVS, which can be thought of as a band-pass filter. Weber’s law $\frac{\Delta I}{I} = \mathrm{Constant} \approx 0.02 \ \rightarrow \ \mathrm{CSF} = \frac{I}{\Delta I_{min}}$ − The HVS is less sensitive to higher spatial frequencies, and this fact is exploited by most compression algorithms to encode images at low bit rates with minimal degradation in visual quality. − Most HVS-based approaches use some kind of modeling of the luminance masking and contrast sensitivity properties of the HVS. Contrast Sensitivity Function (CSF) 156 Block diagram of HVS-based quality metrics: the reference and distorted image/video each pass through a frequency decomposition and an HVS model (luminance masking, contrast sensitivity, contrast masking), followed by error pooling.
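As a concrete example of such a band-pass response, one widely used empirical fit is the Mannos-Sakrison CSF model (given here for illustration; it is not necessarily the model behind these slides), with f the spatial frequency in cycles/degree:
$A(f) = 2.6 \, (0.0192 + 0.114 f) \, e^{-(0.114 f)^{1.1}}$
which peaks at roughly 8 cycles/degree and falls off at both low and high frequencies.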
  • 157. − Structural information is defined as those aspects of the image that are independent of the luminance and contrast − The structure of various objects in the scene is independent of the brightness and contrast of the image. − Structural approaches to image quality assessment, in contrast to HVS-based approaches, take a top- down view of the problem. − It is hypothesized that the HVS has evolved to extract structural information from a scene and hence, quantifying the loss in structural information can accurately predict the quality of an image. − The structural philosophy overcomes certain limitations of HVS-based approaches such as: • computational complexity • inaccuracy of HVS models. Image Structural Information 157
  • 158. − PSNR and MSE are inconsistent with human eye perception. − The distorted versions of the ‘‘Buildings’’ and ‘‘Caps’’ images have the same MSE with respect to the references. − The bad visual quality of the ‘‘Caps’’ image can be attributed to the structural distortions in both the background and the objects in the image. − The structural philosophy can also accurately predict the good visual quality of the ‘‘Buildings’’ image, since the structure of the image remains almost intact in both distorted versions. Image Structural Information 158 (a) Original ‘‘Caps’’ image (b) Original ‘‘Buildings’’ image (c) JPEG compressed image, MSE = 160 (d) JPEG compressed image, MSE = 165 (e) JPEG 2000 compressed image, MSE =155 (f) AWGN corrupted image, MSE = 160.
  • 159. − The HVS demonstrates luminance and contrast masking, which SSIM (also called Wang-Bovik Index) takes into account while PSNR does not. − The SSIM index executes three comparisons, in terms of • l(x,y): Luminance Comparison Measurement • c(x,y): Contrast Comparison Measurement • s(x,y): Structure Comparison Measurement − Where x and y are the original and processed pictures − Value lies between [0,1] Structural SIMilarity (SSIM) 159 𝑆𝑆𝐼𝑀 𝑥, 𝑦 = 𝑓(𝑙 𝑥, 𝑦 , 𝑐 𝑥, 𝑦 , 𝑠 𝑥, 𝑦 )
  • 161. − First the mean luminance is calculated − Then a luminance comparison is executed − $C_1$ is used to stabilize the division with a weak denominator − The value of the parameter $C_1$ is set to $(K_1 L)^2$, where $K_1 \ll 1$ is a small constant − The luminance comparison attains the maximum possible value if and only if the means of the two images are equal. Luminance comparison: l(x,y) 161 $\mu_x = \frac{1}{N} \sum_{i=1}^{N} x_i$ $l(x,y) = \frac{2 \mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$
• 162. − The base contrast of each signal is computed using its standard deviation (the average luminance is removed from the signal amplitude):
$\sigma_x = \left(\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)^2\right)^{1/2} \qquad \sigma_y = \left(\frac{1}{N-1}\sum_{i=1}^{N}(y_i - \mu_y)^2\right)^{1/2}$
− A contrast comparison is computed as:
$c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$
− $C_2$ is used to stabilize the division when the denominator is weak.
− The value of the parameter $C_2$ is set to $(K_2 L)^2$, where $K_2 \ll 1$ is a small constant.
− L is the dynamic range of the pixel values.
Contrast Comparison c(x,y) 162
• 163. − The structural comparison is performed between the luminance- and contrast-normalized signals.
− Let $\vec{x}$ and $\vec{y}$ represent vectors containing pixels from the reference and distorted images respectively.
− Each image is normalized by subtracting its average luminance and dividing by its base contrast:
$\frac{\vec{x} - \mu_x}{\sigma_x} \qquad \frac{\vec{y} - \mu_y}{\sigma_y}$
− The correlation (inner product) between the luminance- and contrast-normalized signals is an effective measure of structural similarity.
− The correlation between the normalized vectors equals the correlation coefficient between the original signals $\vec{x}$ and $\vec{y}$.
Structure Comparison s(x,y) 163
• 164. − $\sigma_{xy}$ is the covariance of $x$ and $y$, a measure of the joint variability of $x$ and $y$:
$\sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$
− Pearson's correlation coefficient is a statistic that measures the statistical relationship, or association, between two continuous variables.
− A Pearson correlation coefficient is calculated as the measure of structural similarity:
$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$
− $C_3$ is used to stabilize the division when the denominator is weak.
Structure Comparison s(x,y) 164
• 165. − The SSIM output is a combination of all three components:
$SSIM(x, y) = [l(x, y)]^\alpha \cdot [c(x, y)]^\beta \cdot [s(x, y)]^\gamma$
$SSIM(x, y) = \left[\frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}\right]^\alpha \cdot \left[\frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}\right]^\beta \cdot \left[\frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}\right]^\gamma$
− This SSIM model is parameterized by $\alpha, \beta, \gamma$; typically all parameter values are set to 1, giving
$SSIM(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \cdot \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$
− $C_1$, $C_2$ and $C_3$ are small constants added to avoid numerical instability when the denominators of the fractions are small.
Final Value of SSIM 165
• 166. − Two constants $C_1$ and $C_2$ are used to stabilize the division when the denominator is weak:
• $C_1 = (K_1 L)^2$, where $K_1 \ll 1$ is a small constant (by default $K_1 = 0.01$)
• $C_2 = (K_2 L)^2$, where $K_2 \ll 1$ is a small constant (by default $K_2 = 0.03$)
• $L = 2^{\text{bits per pixel}} - 1$, the dynamic range of the pixel values
− To simplify the expression, $C_3 = C_2/2$ is chosen.
− Consequently:
$SSIM(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \cdot \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$
− MSSIM: this SSIM is averaged over the image, so it is known as Mean SSIM or MSSIM (a sketch follows below).
Final Value of SSIM 166
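A minimal Python/NumPy sketch of the formulas above, assuming 8-bit grayscale inputs. It computes the three comparisons over the whole image at once; the published index instead computes them in a local sliding window and averages the results (MSSIM), as shown later for the SSIM map.

import numpy as np

def ssim_global(x, y, L=255, K1=0.01, K2=0.03):
    # Whole-image SSIM with C3 = C2/2, so s() folds into the product.
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(ddof=1), y.std(ddof=1)
    sigma_xy = ((x - mu_x) * (y - mu_y)).sum() / (x.size - 1)
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)            # luminance
    c = (2 * sigma_x * sigma_y + C2) / (sigma_x ** 2 + sigma_y ** 2 + C2)  # contrast
    s = (sigma_xy + C2 / 2) / (sigma_x * sigma_y + C2 / 2)               # structure
    return l * c * s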
  • 167. 167 Original Gaussian Blurring SSIM = 0.85 Objective Assessment by SSIM
  • 168. 168 JPEG2000 compression SSIM = 0.78 Objective Assessment by SSIM Original
  • 169. 169 Salt and Pepper noise SSIM = 0.87 Objective Assessment by SSIM Original
• 170. 170 Objective Assessment by MSE and SSIM MSE = 226.80 SSIM = 0.4489 MSE = 225.91 SSIM = 0.4992
• 171. 171 Objective Assessment by MSE and SSIM MSE = 213.55 SSIM = 0.3732 MSE = 225.80 SSIM = 0.7136
• 172. 172 Objective Assessment by MSE and SSIM MSE = 226.80 SSIM = 0.4489 MSE = 406.87 SSIM = 0.910
• 173. SSIM vs. MOS 173
− Evaluated on a broad database of images distorted by JPEG, JPEG2000, white noise, Gaussian blur, and fast-fading noise.
− The best-fitting curve is a logistic function of the form $\frac{a}{1 + b\,e^{-t/\tau}}$.
− What is important is that the data cluster closely about the curve.
• 174. MOS vs. PSNR and SSIM 174
• 175. − Displaying $SSIM(x, y)$ at every pixel position $(i, j)$ as an image is called an SSIM Map.
− It is an effective way of visualizing where the images $x$ and $y$ differ.
− The SSIM map depicts where the quality of one image is flawed relative to the other (a sketch follows below).
SSIM Map 175
(a) Reference Image; (b) JPEG Compressed; (c) Absolute Difference; (d) SSIM Map.
$SSIM(x, y)(i, j) = \frac{2\mu_x(i,j)\,\mu_y(i,j) + C_1}{\mu_x^2(i,j) + \mu_y^2(i,j) + C_1} \cdot \frac{2\sigma_{xy}(i,j) + C_2}{\sigma_x^2(i,j) + \sigma_y^2(i,j) + C_2}$
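A sketch of the per-pixel map above, assuming a square uniform local window for simplicity (the reference implementation uses an 11×11 Gaussian window instead):

import numpy as np
from scipy.ndimage import uniform_filter

def ssim_map(x, y, win=11, L=255, K1=0.01, K2=0.03):
    # Local statistics in a win x win window at every pixel position.
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = uniform_filter(x, win), uniform_filter(y, win)
    # Var[x] = E[x^2] - E[x]^2, with the same local window throughout.
    sigma_x2 = uniform_filter(x * x, win) - mu_x ** 2
    sigma_y2 = uniform_filter(y * y, win) - mu_y ** 2
    sigma_xy = uniform_filter(x * y, win) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x2 + sigma_y2 + C2)
    return num / den  # averaging this map gives the MSSIM score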
• 176. SSIM Map 176
(a) Reference Image; (b) JPEG Compressed; (c) Absolute Difference; (d) SSIM Map.
An example of perceptual masking!
  • 177. Two Images with Equal SSIM 177
  • 178. False Ordering SSIM: 0.67 vs. 0.80 178
• 179. − Some good aspects:
• Considers the correlation between signal and error
• Considers local regions
− Some bad aspects:
• Very hard to use inside an optimization (e.g., as an encoder objective)
• Still requires an original signal
• Does not incorporate many features of the HVS
− Has received much attention on ways to improve it.
SSIM Specifications 179
• 180. − The MSE/PSNR can be a poor predictor of visual fidelity.
− VSNR is an efficient metric for quantifying the visual fidelity of natural images, based on near-threshold and suprathreshold properties of human vision.
− It is efficient both in terms of low computational complexity and low memory requirements.
− It operates on physical luminances and visual angle (rather than on digital pixel values and pixel-based dimensions) to accommodate different viewing conditions.
− This metric estimates visual fidelity by computing:
1. contrast thresholds for detection of the distortions
2. a measure of the perceived contrast of the distortions
3. a measure of the degree to which the distortions disrupt global precedence and, therefore, degrade the image's structure.
Visual SNR (VSNR) 180
Chandler and Hemami, "A wavelet-based visual signal-to-noise ratio for natural images", IEEE Trans. Image Processing, 2007
• 181. It operates via a two-stage approach.
1- Computing contrast thresholds for detection of distortions in the presence of natural images (many subjective tests)
− The low-level HVS properties of contrast sensitivity and visual masking (visual summation) are used via a wavelet-based model to determine whether the distortions in the distorted image are below the threshold of visual detection (i.e., whether they are visible).
− If the distortions are below the threshold of detection, the distorted image is deemed to be of perfect visual fidelity (VSNR = ∞).
Visual SNR (VSNR) 181
• 182. It operates via a two-stage approach.
2- If the distortions are suprathreshold, the following are taken into account as an alternative measure of structural degradation:
• the low-level visual property of perceived contrast, including
I. the low-level HVS property of contrast sensitivity
II. the low-level HVS property of visual masking
• the mid-level visual property of global precedence (i.e., the visual system's preference for integrating edges in a coarse-to-fine-scale fashion).
− These two properties are modeled as Euclidean distances in the distortion-contrast space of a multiscale wavelet decomposition, and VSNR is computed as a simple linear sum of these distances.
Visual SNR (VSNR) 182
  • 183. Two Images with Equal VSNR 183
  • 184. VSNR, Noise Most Visible 184 VSNR: 32.7 SSIM: 0.83 VSNR: 21.2 SSIM: 0.95
• 185. − Relies on modeling of the statistical image source, the image distortion channel, and the human visual distortion channel.
− VIF was developed for image and video quality measurement based on natural scene statistics (NSS).
− Images come from a common class: the class of natural scenes.
Objective Assessment by Visual Information Fidelity (VIF) 185
[Diagram: the natural image source emits C; the distortion channel turns C into D; C and D each pass through an HVS channel to give E and F; mutual information and information content are measured on E and F.]
• 186. − Image quality assessment is done based on information fidelity: the channel imposes fundamental limits on how much information can flow from the source (the reference image), through the channel (the image distortion process), to the receiver (the human observer).
Objective Assessment by Visual Information Fidelity (VIF) 186
− The mutual information between C and E quantifies the information that the brain could ideally extract from the reference image, whereas the mutual information between C and F quantifies the corresponding information that could be extracted from the test image.
$VIF = \frac{\text{Distorted image information}}{\text{Reference image information}}$
[Diagram as on the previous slide: source C → distortion channel → D; C and D → HVS channels → E and F.]
(A simplified pixel-domain sketch follows below.)
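The published VIF is computed from a Gaussian scale mixture model of wavelet subbands; below is a rough sketch of the widely used pixel-domain simplification, assuming grayscale inputs. sigma_nsq (the variance of the assumed HVS internal noise) and the per-scale Gaussian window widths are assumptions, not the paper's exact values.

import numpy as np
from scipy.ndimage import gaussian_filter

def vif_pixel(ref, dist, sigma_nsq=2.0):
    # Information the observer can extract from the distorted image,
    # divided by the information extractable from the reference.
    ref = ref.astype(np.float64)
    dist = dist.astype(np.float64)
    num = den = 0.0
    for scale in range(1, 5):
        sd = (2.0 ** (4 - scale + 1) + 1) / 5.0  # window width per scale
        if scale > 1:  # low-pass and downsample between scales
            ref = gaussian_filter(ref, sd)[::2, ::2]
            dist = gaussian_filter(dist, sd)[::2, ::2]
        mu1, mu2 = gaussian_filter(ref, sd), gaussian_filter(dist, sd)
        s1 = np.maximum(gaussian_filter(ref * ref, sd) - mu1 ** 2, 0)
        s2 = np.maximum(gaussian_filter(dist * dist, sd) - mu2 ** 2, 0)
        s12 = gaussian_filter(ref * dist, sd) - mu1 * mu2
        g = s12 / (s1 + 1e-10)                 # gain of the distortion channel
        sv = np.maximum(s2 - g * s12, 1e-10)   # additive-noise variance
        num += np.log2(1 + (g ** 2) * s1 / (sv + sigma_nsq)).sum()
        den += np.log2(1 + s1 / sigma_nsq).sum()
    return num / den

Because the numerator can exceed the denominator when contrast is enhanced, this ratio can exceed 1, matching the behaviour described on the next slide.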
• 187. Objective Assessment by Visual Information Fidelity (VIF) 187
VIF = 0.5999, SSIM = 0.8558 / VIF = 1.11, SSIM = 0.9272
− The VIF has a distinction over traditional quality assessment methods: a linear contrast enhancement of the reference image that does not add noise to it will result in a VIF value larger than unity, signifying that the enhanced image has a visual quality superior to that of the reference image.
− No other quality assessment algorithm has the ability to predict whether the visual image quality has been enhanced by a contrast enhancement operation.
• 188. Objective Assessment by Visual Information Fidelity (VIF) 188 VIF = 0.6045 SSIM = 0.8973 VIF = 0.5944 SSIM = 0.7054
• 189. Objective Assessment by Visual Information Fidelity (VIF) 189 VIF = 0.60 SSIM = 0.7673 VIF = 0.6043 SSIM = 0.8695
• 190. Blur vs. AWGN 190 PSNR: 26.6 SSIM: 0.89 VSNR: 20.7 VIF: 0.47 PSNR: 25.4 SSIM: 0.74 VSNR: 29.5 VIF: 0.58
• 191. Blur vs. AWGN 191 PSNR: 26.6 SSIM: 0.89 VSNR: 20.7 VIF: 0.47 PSNR: 21.3 SSIM: 0.60 VSNR: 29.5 VIF: 0.44
  • 192. 192 JPEG-2000 vs. Blur PSNR: 35.4 VSNR: 35.2 SSIM: 0.93 VIF: 0.76 PSNR: 34.3 VSNR: 33.5 SSIM: 0.97 VIF: 0.76
  • 193. 193 JPEG-2000 vs. Blur PSNR: 35.4 VSNR: 35.2 SSIM: 0.93 VIF: 0.59 PSNR: 30.8 VSNR: 26.8 SSIM: 0.94 VIF: 0.5 vs. 0.59
• 194. − The advantage of multi-scale methods over single-scale methods like SSIM is that image details at different resolutions and viewing conditions are incorporated into the quality assessment algorithm (a sketch follows below).
Multi-scale Structural Similarity Index (MS-SSIM) 194
[Diagram: at each scale i, L denotes a low-pass filter and ↓2 denotes downsampling by a factor of 2.]
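A global-statistics sketch of MS-SSIM, assuming the published five-scale exponents of Wang et al.; the contrast and structure terms are evaluated at every scale, the luminance term only at the coarsest. Real implementations use local windows, as with SSIM.

import numpy as np

# Exponents from the published five-scale MS-SSIM (Wang et al.).
WEIGHTS = [0.0448, 0.2856, 0.3001, 0.2363, 0.1333]

def _downsample2(img):
    # 2x2 average pooling: the 'low-pass filter + downsample by 2' box.
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def ms_ssim(x, y, L=255, K1=0.01, K2=0.03):
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    score = 1.0
    for i, w in enumerate(WEIGHTS):
        xf, yf = x.ravel(), y.ravel()
        mx, my = xf.mean(), yf.mean()
        sx, sy = xf.std(ddof=1), yf.std(ddof=1)
        sxy = ((xf - mx) * (yf - my)).sum() / (xf.size - 1)
        # Contrast * structure term, evaluated at every scale.
        score *= ((2 * sxy + C2) / (sx ** 2 + sy ** 2 + C2)) ** w
        if i == len(WEIGHTS) - 1:
            # Luminance term enters only at the coarsest scale.
            score *= ((2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)) ** w
        else:
            x, y = _downsample2(x), _downsample2(y)
    return score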
• 195. MAD assumes that the HVS employs different strategies when judging the quality of images.
Detection-based Strategy
• When the HVS views images containing near-threshold distortions, it looks through the image content in search of the distortions.
• For estimating distortions in the detection-based strategy, local luminance and contrast masking are used.
Appearance-based Strategy
• When the HVS views images containing clearly visible distortions, it looks past the distortions in search of the image's subject matter.
• For estimating distortions in the appearance-based strategy, variations in the local statistics of spatial frequency components are employed.
MAD (Most Apparent Distortion) Algorithm 195
  • 196. 196 MAD (Most Apparent Distortion) Algorithm Detection-based and Appearance-based Strategies The block diagram of the detection-based strategy in the MAD algorithm The block diagram of the appearance-based strategy in the MAD algorithm.
• 197. − The FSIM algorithm is based on the fact that the HVS understands an image mainly through its low-level characteristics, e.g., edges and zero crossings.
− In order to assess the quality of an image, the FSIM algorithm uses two kinds of features:
1. Phase Congruency (PC):
• Physiological and psychophysical experiments have demonstrated that at points with high phase congruency (PC), the HVS can extract highly informative features.
2. Gradient Magnitude (GM):
• Phase congruency (PC) is contrast invariant, yet our perception of an image's quality is also affected by the local contrast of that image.
• As a result of this dependency, the image gradient magnitude (GM) is used as the secondary feature in the FSIM algorithm.
Feature Similarity Index (FSIM) 197
• 198. Feature Similarity Index (FSIM) 198
Calculating the FSIM measure consists of two stages:
• computing the image's PC and GM maps
• computing the similarity measure between the reference and test images (the GM half is sketched below).
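A sketch of the gradient-magnitude half of the FSIM similarity, with Sobel operators standing in for the Scharr filters of the original, and a stabilizing constant T assumed (T = 160 follows the FSIM paper's choice for the GM term, though it was tuned for Scharr gradients). The phase-congruency half requires a PC transform and is omitted here.

import numpy as np
from scipy.ndimage import sobel

def gradient_similarity(ref, dist, T=160.0):
    # Per-pixel similarity of gradient magnitudes, SSIM-style ratio.
    def grad_mag(img):
        img = img.astype(np.float64)
        return np.hypot(sobel(img, axis=1), sobel(img, axis=0))
    g1, g2 = grad_mag(ref), grad_mag(dist)
    return (2 * g1 * g2 + T) / (g1 ** 2 + g2 ** 2 + T)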
• 199. Quality assessment of high dynamic range (HDR) images – Dynamic range independent quality measure (DRIM) 199
1. The detection thresholds are predicted and a perceptually normalized response map is generated.
2. The perceptually normalized response is decomposed into several bands of different orientations and scales (using the cortex transform, i.e., a collection of band-pass and orientation-selective filters).
3. The conditional probability of each distortion type is calculated, predicting the three distortion types separately for each band.
4. The probability of detecting a distortion in any subband is calculated.
The block diagram of the DRIM algorithm.
• 200. Quality assessment of high dynamic range (HDR) images – Tone-mapped images quality index (TMQI) 200
− This metric is a combination of:
• a multi-scale structural fidelity measure
• a statistical naturalness measure
− The TMQI algorithm accordingly consists of two stages:
• structural fidelity measurement
• statistical naturalness measurement
− At each scale, the local structural fidelity map is computed and averaged in order to obtain a single score S.
The block diagram of the TMQI algorithm.
  • 201. VMAF – Video Multi-Method Assessment Fusion 201 PSNR: 31 DMOS: 82 PSNR: 31 DMOS: 27 PSNR: 34 DMOS: 96 PSNR: 34 DMOS: 58
• 202. − VMAF is a perceptual video quality metric that models the human visual system.
− It predicts subjective quality by 'fusing' elementary metrics into a final quality metric using a machine-learning algorithm that assigns a weight to each elementary metric (a Support Vector Machine (SVM) regressor).
− The machine-learning model is trained and tested using opinion scores obtained through a subjective experiment (e.g., the NFLX Video Dataset).
− It correlates better than PSNR with subjective opinion over a wide range of quality and content (Netflix).
− VMAF enables reliable codec comparisons across the broad range of bitrates and resolutions occurring in adaptive streaming.
Elementary quality metrics
− Visual Information Fidelity (VIF): quality is complementary to the measure of information fidelity loss
− Detail Loss Metric (DLM): separately measures the loss of details that affects content visibility, and the redundant impairment that distracts viewer attention
− Motion: temporal difference between adjacent frames
(A toy sketch of the fusion stage follows below.)
VMAF – Video Multi-Method Assessment Fusion 202
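A toy sketch of the fusion idea using scikit-learn. The feature rows and MOS labels below are made-up placeholders, not Netflix data; real VMAF ships with pre-trained models and is normally run through the libvmaf library or ffmpeg's libvmaf filter rather than retrained by hand.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder training set: one row per training clip,
# columns = elementary metrics [VIF, DLM, motion] (made-up values).
features = np.array([[0.92, 0.88, 4.1],
                     [0.55, 0.60, 12.3],
                     [0.75, 0.81, 7.8],
                     [0.40, 0.45, 15.0]])
mos = np.array([85.0, 35.0, 62.0, 20.0])  # made-up opinion scores

# Scale the features, then fit the SVM regressor that 'fuses' them.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(features, mos)

# Predict a quality score for a new clip's elementary metrics.
print(model.predict([[0.80, 0.77, 6.5]]))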
• 205. 205 Ex: Video Quality due to De-interlacing
− De-interlacing is difficult to perform well.
− Good-quality converters exist (at a price!).
[Image labels: 720p – DNxHD (Gen 0); 1080i – DNxHD (Gen 0) – de-interlaced – 7; source de-interlaced]
• 206. 206 Ex: Distribution Encoder and Bit-rate
− Which bit-rate to choose? The distribution channel 'defines' the available bit-rate.
− MPEG-4 does a fine job; motion in the video is important.
[Image labels: Uncompressed; AVC/DNxHD – 8 Mbit/s H.264; AVC/DNxHD – 10 Mbit/s H.264; AVC/DNxHD – 16 Mbit/s H.264]
• 207. 207 Ex: Distribution Encoder and Production Codec
− Uncompressed HD ~ 1 500 Mbit/s; production bitrate ~ 100 Mbit/s; distribution encoder ~ 10 Mbit/s.
− Do we actually see the influence of the production codec?
[Image labels: AVC/AVC + 8 Mbit/s H.264; DVC/DVC + 8 Mbit/s H.264]
• 208. Example of Simulation Chain 208 – Video Quality in Production Chain
[Diagram: Camera Encoding → Post-Production Encoding (4 Generations), evaluated both without pixel shift and with a pixel shift of (+2H, +6V).]
• 209. Example of Test Setup 209 – Video Quality in Production Chain
[Diagram: Uncompressed Source → Gen 0 (Cam) → Gen 1 (PP1) → Gen 2 (PP2) → Gen 3 (PP3) → Gen 3 shifted → Gen 4 (PP4); encode/decode over HD-SDI, with uncompressed YCbCr storage and HD-SDI ingest & playout.]
(VRT-medialab: research and innovation)
• 210. − The multi-generation codec assessment simulates how the production chain affects the images as a result of multiple compression and decompression stages.
− The multi-generation codec assessment has two steps:
1. The agreed method of multi-generation testing was to visually compare the 1st, 4th and 7th generations (including pixel shifts after each generation) with the original image under defined conditions (reference video monitor, particular viewing environment settings, and expert viewers).
2. In addition, an objective measurement – the PSNR – was calculated to give a general trend indication of the multi-generation performance of the individual codecs.
Multi-generation Codec Assessment for Production Codec Tests 210
  • 211. 211 Standalone Chain (without Spatial Shift) for INTRA Codec
• 212. − The standalone chain without processing simply consisted of several cascaded generations of the codec under test, without any other modifications to the picture content apart from those applied by the codec under test.
− This process accurately simulates the effect of a simple dubbing of the sequence and is usually not very challenging for the compression algorithm.
− This simple chain can provide useful information about:
• the performance of the sub-sampling filtering that is applied
• the precision of the mathematical implementation of the codec.
Standalone Chain (without Spatial Shift) for INTRA Codec 212
[Diagram: Input Video → Encoder → Decoder (First Generation) → Encoder → Decoder (Second Generation) → … → Encoder → Decoder (Seventh Generation)]
• 213. 213 Standalone Chain (without Spatial Shift) for INTRA Codec
− In fact, the most important impact on the picture quality should be incurred at the first generation, when the encoder has to eliminate some information.
− The effect of the subsequent generations should be minimal, as the encoder should basically eliminate the same information already deleted in the first compression step (a small experiment is sketched below).
[Diagram: Input Video → Encoder → Decoder (First Generation) → Encoder → Decoder (Second Generation) → … → Encoder → Decoder (Seventh Generation)]
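A small experiment that illustrates this behaviour, with JPEG re-encoding via Pillow standing in for the intra production codec under test (the codec choice, quality setting, and input file name are assumptions). PSNR typically drops sharply at generation 1 and then stays nearly flat across later generations.

import io
import numpy as np
from PIL import Image

def psnr(a, b, peak=255.0):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def cascade(src, generations=7, quality=75):
    # Encode/decode the same picture N times with no processing in
    # between: the 'simple dubbing' chain described above.
    img = src.convert("L")
    original = np.array(img)
    for gen in range(1, generations + 1):
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        img = Image.open(buf)
        img.load()  # force the decode before the buffer is reused
        print(f"generation {gen}: PSNR = {psnr(original, np.array(img)):.2f} dB")

# cascade(Image.open("test_frame.png"))  # hypothetical input file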
  • 214. 214 Standalone Chain (with Spatial Shift) for INTRA Codec
  • 215. 215 Standalone Chain (with Spatial Shift) for INTRA Codec
• 216. − In a real production chain, several manipulations are applied to the picture to produce the master:
• Editing
• Zoom
• NLE
• Colour correction
− A realistic simulation has to take this into account.
− As all these processes are currently feasible only in the uncompressed domain, the effect of the processing is simulated by spatially shifting the image horizontally (pixels) or vertically (lines) between each compression step.
− Obviously, this shift makes the task of the coder more challenging, especially for those algorithms based on a division of the picture into blocks (e.g., N×N DCT blocks), as in any later generation the content of each block differs from that in the previous generation.
Standalone Chain (with Spatial Shift) for INTRA Codec 216
• 217. − The shift process introduces black pixels on the edges of the frame if/when necessary.
− The shifts were applied variously using software or hardware, but the method used was exactly the same for all the algorithms under test (a sketch follows below).
• Horizontal shift (H): only even shifts, to take into account the chroma subsampling of the 4:2:2 format.
• Vertical shift (V): the shift is applied on a frame basis and is always an even value.
Progressive formats:
− The whole frame is shifted by a number of lines corresponding to the vertical shift applied.
• For example, a shift equal to +2V means two lines down for progressive formats.
Interlaced formats:
− Each field is shifted by a number of lines corresponding to half the vertical shift applied.
• For example, a shift equal to +2V means 1 line down for each field of an interlaced format.
Standalone Chain (with Spatial Shift) for INTRA Codec 217
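A sketch of these rules applied to a single luma plane stored as a NumPy array; only positive (right/down) even shifts are handled, and the exposed edges are filled with black as described above.

import numpy as np

def shift_frame(frame, h=2, v=2, interlaced=False):
    # Shift right by h pixels and down by v lines, filling the exposed
    # edges with black (value 0), per the even-shift rules above.
    assert h % 2 == 0 and v % 2 == 0, "only even shifts are allowed"
    out = np.zeros_like(frame)
    if interlaced:
        # Each field moves by half the nominal vertical shift.
        for field in (0, 1):
            f = frame[field::2]
            o = np.zeros_like(f)
            fv = v // 2
            o[fv:, h:] = f[:f.shape[0] - fv, :f.shape[1] - h]
            out[field::2] = o
    else:
        out[v:, h:] = frame[:frame.shape[0] - v, :frame.shape[1] - h]
    return out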
  • 218. 218 Standalone Chain with GoP Alignment (without Spatial Shift) for INTER Codec
• 219. − The GoP structure has some important implications for the way the standalone chain has to be realized, and introduces a further variable in the way the multi-generation can be performed, depending on whether GoP alignment is guaranteed between generations (GoP aligned) or not (GoP mis-aligned).
− The GoP is considered to be aligned if a frame of the original picture that is encoded at the first generation using one of the three possible frame types (Intra, Predicted or Bidirectional) is encoded using that same frame type in all the following generations.
− It is therefore possible to have only one multi-generation chain with "GoP alignment".
Standalone Chain with GoP Alignment (without Spatial Shift) for INTER Codec 219
[Diagram: Input Video → Encoder → Decoder (First Generation) → Encoder → Decoder (Second Generation) → … → Encoder → Decoder (Seventh Generation)]
Ex: frame n of the original sequence is always encoded as Intra and frame n+6 as Predicted in all generations.
  • 220. 220 Standalone Chain with GoP Alignment and Spatial Shift for INTER Codec
• 221. − If GoP alignment is not guaranteed, several conditions of GoP mis-alignment are possible:
with GoP length L = 12, the second generation allows 11 different GoP mis-alignments, the third generation 11 × 11, and so on, making the testing of all possible conditions unrealistic.
− It was therefore agreed to apply one "temporal shift" equal to one frame between each generation, so that the frame encoded in Intra mode in the first generation is encoded in Bidirectional mode in the second generation and, in general, in a different mode in each following generation (see the sketch below).
− It is interesting to underline that the alignment of the GoP in the different generations was under control (not random), that this was considered the likely worst case as far as the mis-alignment effect is concerned, and that it was referred to in the documents as the "chain without GoP alignment".
Standalone Chain without GoP Alignment 221
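A tiny sketch of how the one-frame temporal shift moves a source frame to a different GoP position at every generation. The 12-frame pattern IBBPBBPBBPBB is an assumption for illustration; the real pattern of the codec under test may differ.

# Assumed 12-frame GoP pattern (one letter per frame).
GOP = "IBBPBBPBBPBB"

def frame_type(n, generation):
    # With a one-frame temporal shift added at every generation, source
    # frame n lands on a different position of the GoP each time.
    return GOP[(n + (generation - 1)) % len(GOP)]

for gen in range(1, 8):
    print(f"generation {gen}: frame 0 coded as {frame_type(0, gen)}")
# Frame 0 is Intra in generation 1 and Bidirectional in generation 2.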
  • 222. 222 Standalone Chain without GoP Alignment and with Spatial Shift for INTER Codec
• 223. − Four different possible standalone chains up to the seventh generation:
• Multi-generation chain with GoP alignment (without spatial shift)
• Multi-generation chain without GoP alignment (without spatial shift)
• Multi-generation chain with GoP alignment and spatial shift
• Multi-generation chain without GoP alignment and spatial shift
− A procedure to re-establish the spatial alignment between the original and the decompressed version of the test sequence is applied.
− A 16-pixel border at the edges of the picture is skipped to avoid taking measurements on the black pixels introduced during the shift (see the sketch below).
− PSNR does not correlate accurately with picture quality, so it would be misleading to directly compare PSNR across very different algorithms.
− PSNR can, however, provide information about the behaviour of the compression algorithm through the multi-generation process.
Objective Measurements by PSNR 223
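A sketch of the measurement rule above: after realignment, a 16-pixel border is cropped from every edge before the PSNR is computed, assuming 8-bit grayscale frames.

import numpy as np

def psnr_cropped(ref, test, border=16, peak=255.0):
    # Exclude the frame border so the black pixels introduced by the
    # spatial shifts never enter the measurement.
    r = ref[border:-border, border:-border].astype(np.float64)
    t = test[border:-border, border:-border].astype(np.float64)
    mse = np.mean((r - t) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)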
  • 224. − Both objective measurements (PSNR) and visual scrutiny of the picture (i.e. expert viewing) are used for video quality evaluation. − They are considered to be complementary. 224 Ex. 1, EBU R124-2008
  • 225. − The viewing distance is 3H (HDTV). − Sometimes a closer viewing distance, e.g. 1H, was used to closely observe small details and artefacts and, when used, this condition was clearly noted in the report. 225 Ex. 1, EBU R124-2008 Compressed (Impaired) version (e.g. Seventh generation with spatial shift) Original The following displays were used during the tests (HDTV): • CRT 32” Sony Type BVM-A32E1WM • CRT 20” Sony Type BVM-A20F1M • Plasma Full HD 50” Type TH50PK9EK Panasonic • LCD 47” Type Focus The displays and the room conditions were aligned according to the conditions described in ITU-R BT.500-11.
• 226. − For acquisition applications, an HDTV format with 4:2:2 sampling should be used; no further horizontal or vertical sub-sampling should be applied.
• An 8-bit bit-depth is sufficient for mainstream programmes.
• A 10-bit bit-depth is preferred for high-end acquisition.
− For production applications of mainstream HD, the EBU tests found no reason to relax the requirement placed on SDTV studio codecs that "quasi-transparent quality" must be maintained after 7 cycles of encoding and recoding with horizontal and vertical pixel-shifts applied.
− All tested codecs showed quasi-transparent quality up to at least 4 to 5 generations, but also showed a few impairments, such as noise or loss of resolution, with critical images at the 7th generation.
→ Thus EBU Members are required to carefully design the production workflow and to avoid 7 multi-generation steps.
Ex. 1, EBU R124-2008 226
• 227. What the EBU recommends in document R124-2008 is that:
− If the production/archiving format is to be based on I-frames only, the bitrate should not be less than 100 Mbit/s.
− If the production/archiving format is to be based on long-GoP MPEG-2, the bitrate should not be less than 50 Mbit/s.
− Furthermore, the expert viewing tests revealed that:
• A 10-bit bit-depth in production is only significant for post-production with graphics, and after transmission encoding and decoding at the consumer end, if the content has been generated using advanced colour grading, etc. (e.g., graphics or animation).
• For normal moving pictures, an 8-bit bit-depth in production will not significantly degrade the HD picture quality at the consumer's premises.
Ex. 1, EBU R124-2008 227
  • 228. − A contribution link was simulated by passing a signal twice through the codec under test, with a spatial pixel shift introduced between codec passes. − This is equivalent to a signal passing through a pair of cascaded codecs that would typically be encountered on a contribution link. − 4:2:2 colour sampling is required for professional contribution applications. 228 Ex. 2, Simulation of a Typical Contribution Link Encoder Decoder Pixel Shift Final Output 2 Pixels horizontally 2 Pixels vertically 1st Pass 2nd Pass Source
  • 229. − The equivalent quality using H.264/AVC was found subjectively by viewing the H.264/AVC sequences at different bit-rates, starting at half the MPEG-2 bit-rate and increasing it by steps of 10% (of the MPEG-2 reference bit rate) until the quality of the MPEG-2 encoded sequence and that of the H.264/AVC encoded sequence was judged to be equivalent. − The subjective evaluation mainly tested the 2nd generation sequences, which can actually allow better discrimination of the feeds than the 1st generation. 229 Ex. 2, Simulation of a Typical Contribution Link (Cont.) EBU Triple Stimulus Continuous Evaluation Scale (TSCES) quality assessment rig