Dr. Mohieddin Moradi
mohieddinmoradi@gmail.com
1
Dream
Idea
Plan
Implementation
Section I
− Perceptual Artifacts in Compressed Video
− Quality Assessment in Compressed Video
− Objective Assessment of Compressed Video
− Objective Assessment of Compressed Video, Codec Assessment for Production
Section II
− Subjective Assessment of Compressed Video
− Subjective Assessment of Compressed Video, Subjective Assessment by Expert Viewing
− Performance Comparison of Video Coding Standards: An Adaptive Streaming Perspective
− Subjective Assessment by Visualizer™ Test Pattern
2
Outline
3
Video Source → Compress (Encode) → Coded video → Decompress (Decode) → Video Display
ENCODER + DECODER = CODEC
4
Upper anchor: source material
Material to be evaluated
Lower anchor: worst case
(VRT-medialab: research and innovation)
Subjective Assessment
EBU Triple Stimulus Continuous Evaluation Scale
(TSCES) quality assessment rig
In subjective evaluation, a panel of human observers is given the task of rating the quality of an image or video
sequence.
− There are normally 15 to 25 subjects; the minimum is 15.
− Subjects should have a normal human visual system (HVS) (age 18-30).
− They should NOT be familiar with image degradations (although this has now changed!)
− The average value of the scores is called the Mean Opinion Score (MOS).
− MOS is normally defined in the range of 1-5.
Subjective Assessment
5
− Gold Standard
• Provides the best predictive value of actual subjective ratings
− Absolute Category Rating (ACR)
• Human subjects view the sequences one at a time (without a reference) and are asked to rate the
overall quality of each sequence on a categorical scale
− Mean Opinion Score (MOS)
• Numerical measure of the human-judged overall quality
• On a scale of 1 (bad), 2 (poor), 3 (fair), 4 (good), 5 (excellent)
− DMOS (Differential Mean Opinion Scores)
• It is defined as the difference between the raw quality score of the reference and test images.
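As a minimal illustration (not part of any Recommendation), the MOS and DMOS computations described above can be sketched in Python; the array contents are hypothetical ratings on the 1-5 scale:

import numpy as np

def mos(scores):
    # Mean Opinion Score: average of the raw ratings (1 = bad ... 5 = excellent).
    return float(np.mean(scores))

def dmos(reference_scores, test_scores):
    # DMOS: difference between the raw quality scores of the reference and test images,
    # averaged over the subjects (larger DMOS = larger perceived degradation).
    ref = np.asarray(reference_scores, dtype=float)
    tst = np.asarray(test_scores, dtype=float)
    return float(np.mean(ref - tst))

# Example: ratings from five subjects for one test image and its reference.
print(mos([4, 5, 4, 3, 4]))                       # 4.0
print(dmos([5, 5, 4, 5, 5], [4, 5, 4, 3, 4]))     # 0.8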
Mean Opinion Score (MOS)
6
Laboratory Viewing Environment
− The laboratory viewing environment is intended to provide critical conditions to check systems.
− The laboratory viewing environment parameters have been selected to define an environment slightly
more critical than the typical home viewing situations.
Home Viewing Environment
− The home viewing environment is intended to provide a means to evaluate quality at the consumer side
of the TV chain.
General Viewing Conditions
7
In a Laboratory Environment
The assessors’ viewing conditions should be arranged as follows:
− Room illumination: low
− Chromaticity of background: D65
− Peak luminance (it should be adjusted according to the room illumination): 70-250 cd/m²
− Ratio of luminance of inactive screen to peak luminance (monitor contrast ratio): ≤ 0.02
− Ratio of luminance of background behind picture monitor to peak luminance of picture: ≈ 0.15
General Viewing Conditions for Subjective Assessments
8
In Home Environment
The assessors’ viewing conditions should be arranged as follows:
− Environmental illuminance on the screen (incident light from the environment falling on the screen, should
be measured perpendicularly to the screen): 200 lux
− Peak luminance (it should be adjusted according to the room illumination): 70-500 cd/m²
− Ratio of luminance of inactive screen to peak luminance (monitor contrast ratio): ≤ 0.02
General Viewing Conditions for Subjective Assessments
9
Preferred Viewing Distance (PVD)
− The PVD is based upon viewers’ preferences, which have been determined empirically.
− The published PVD data comprise a number of data sets collected from available sources.
− This information may be referred to when designing a subjective assessment test.
Viewing Distance
10
Design Viewing Distance (DVD)
− The design viewing distance (DVD), or optimal viewing distance, for a digital system is the distance at
which two adjacent pixels subtend an angle of 1 arc-min at the viewer’s eye; the optimal horizontal
viewing angle is the angle under which an image is seen at its optimal viewing distance.
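As a small worked example of the 1 arc-min rule (an illustration, not taken from the Recommendation; the function name and the screen dimensions are assumptions):

import math

def design_viewing_distance(screen_height_m, vertical_pixels):
    # Distance at which two adjacent pixels subtend 1 arc-minute at the viewer's eye.
    pixel_pitch = screen_height_m / vertical_pixels
    one_arc_min = math.radians(1.0 / 60.0)
    distance = pixel_pitch / (2.0 * math.tan(one_arc_min / 2.0))
    return distance, distance / screen_height_m      # metres, and picture heights

# A 1080-line display works out to about 3.2 picture heights,
# a 2160-line (UHD) display to about 1.6 picture heights.
print(design_viewing_distance(0.67, 1080))           # ~2.1 m, ~3.2 H for a ~55-inch panel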
Viewing Distance
11
− Using monitors with different characteristics will yield different subjective image qualities.
− It is therefore strongly recommended that characteristics of the monitors used should be checked
beforehand.
• Recommendation ITU-R BT.1886 – Reference electro-optical transfer function for Flat Panel Displays
(FPD) used in HDTV studio production
• Report ITU-R BT.2129 – User requirements for a Flat Panel Display (FPD) as a Master monitor in an HDTV
programme production environment may be referred to when professional FPD monitors are used for
subjective assessment.
• Report ITU-R BT.2390 provides information concerning laboratory and home displays and viewing
environments for the assessment of High Dynamic Range (HDR) images.
The Monitor
12
− Monitor processing such as image scaling, frame-rate conversion, and image enhancement, if implemented,
should be done in such a way as to avoid introducing artefacts.
− HDR processing should be appropriate to the HDR system being assessed or being used during the
assessment. For consumer environment or distribution assessments, this may include the use of the
appropriate static or dynamic metadata.
− Full details of such metadata should be included in the notes of the assessments in order that other
laboratories can accurately repeat the assessments.
Monitor Processing
13
− When using consumer displays for subjective image assessments, it is important that all image-processing
options are disabled unless the impact of such image processing is the subject of the assessment(s).
− When assessing interlaced images, the test report should indicate whether a de-interlacer has been used or
not. It is preferable not to use a de-interlacer if interlaced signals can be displayed without it.
Monitor Processing
14
− The resolution of professional monitors usually complies with the required standards for subjective
assessments in their luminance operating range.
− It is suggested to check and report the maximum and minimum resolutions (center and corners of the
screen) at the luminance value used.
− If consumer FPD TV displays are used for subjective assessments, it is strongly recommended to check and
report the maximum and minimum resolutions (center and corners of the screen) at the used luminance
value.
− At present, the most practical way for those performing subjective assessments to check the resolution of
monitors or consumer TV sets is to use an electronically generated swept test pattern.
Monitor resolution
15
Contrast
− Contrast could be strongly influenced by the environment illuminance.
− Professional monitors seldom use technologies to improve their contrast in a high illuminance environment,
so it is possible they do not comply with the requested contrast standard if used in a high illuminance
environment.
− Consumer monitors typically use technologies to get a better contrast in a high illuminance environment.
Brightness
− When adjusting the LCD monitor brightness, it is preferable to use backlight intensity control rather than
using “signal level” scaling to retain the bit precision.
− In the case of other display technologies that do not use a backlight, the white level should be adjusted
by means other than signal level scaling.
Monitor Contrast and Brightness
16
− The display itself should not introduce motion artefacts (i.e. artefacts caused by specific display technologies).
− On the other hand, the motion effects included in the input signal should be represented on the display.
− When using consumer displays, it is vital that ALL motion processing options are disabled.
Monitor Motion Processing Options
17
Assessment problem → Material used
− Overall performance with average material → General, “critical but not unduly so”
− Capacity, critical applications (e.g. contribution, post-processing, etc.) → Range, including very critical
material for the application tested
− Performance of “adaptive” systems → Material very critical for the “adaptive” scheme used
− Identify weaknesses and possible improvements → Critical, attribute-specific material
− Identify factors on which systems are seen to vary → Wide range of very rich material
− Conversion among different standards → Critical for differences (e.g. field rate)
A Survey of Typical Assessment Problems and Test Materials Used to Address Them
ITU-R Test Sequences
− A number of organizations have developed test still images and sequences.
− Report ITU-R BT.2245 - HDTV and UHDTV including HDR-TV test materials for assessment of picture quality,
gives details of HDTV and UHDTV test material that can be used for subjective assessment.
18
− Observers may be expert or non-expert depending on the objectives of the assessment.
− An expert observer is an observer that has expertise in the image artefacts that may be introduced by the
system under test.
− A non-expert (“naive”) observer is an observer that has no expertise in the image artefacts that may be
introduced by the system under test.
− In any case, observers should not be, or have been, directly involved in the development of the system
under study, i.e. involved enough to have acquired specific and detailed knowledge of it.
Expert and Non-expert Observers
19
− At least 15 observers should be used.
− The number of assessors needed depends upon the sensitivity and reliability of the test procedure
adopted and upon the anticipated size of the effect sought.
− For studies with limited scope, e.g. of exploratory nature, fewer than 15 observers may be used. In this
case, the study should be identified as ‘informal’.
− The level of expertise in television image quality assessment of the observers should be reported.
Observers Quantity
20
− Prior to a session, the observers should be screened for (corrected-to-) normal visual acuity on the Snellen
or Landolt chart, and for normal colour vision using specially selected charts (Ishihara, for instance).
− A study of consistency between results at different testing laboratories has found that systematic
differences can occur between results obtained from different laboratories.
− A possible explanation for the differences between different laboratories is that there may be different skill
levels amongst different groups of assessors.
− Suggested data to be provided could include an occupation category (e.g. broadcast organization
employee, university student, office worker, ...), gender, and age range.
Observer Screening and Test Consistency
21
− Assessors should be carefully introduced to
• the method of assessment
• the types of impairment or quality factors likely to occur
• the grading scale
• the sequence and timing
− Training sequences demonstrating the range and the type of the impairments to be assessed should be
used with illustrating images other than those used in the test, but of comparable sensitivity.
− In the case of quality assessments, quality may be defined as consisting of specific perceptual attributes.
Instructions for the Assessment
22
The Test Session
23
− A session should not last more than half an hour.
− At the beginning of the first session, about five “dummy presentations” should be introduced to stabilize
the observers’ opinion.
− The data issued from these presentations must not be considered in the results of the test.
− If several sessions are necessary, only about three dummy presentations are needed at the beginning of
each subsequent session.
− A random order should be used for the presentations (for example, derived from Graeco-Latin squares);
but the test condition order should be arranged so that any effects of tiredness or adaptation on the
grading are balanced out from session to session.
− Some of the presentations can be repeated from session to session to check coherence.
The Test Session
24
− For each test parameter, the mean and 95% confidence interval of the statistical distribution of the
assessment grades must be given.
− The results must be given together with the following information:
• details of the test configuration;
• details of the test materials;
• type of image source and display monitors;
• number and type of assessors;
• reference systems used;
• the grand mean score for the experiment;
• original and adjusted mean scores and 95% confidence interval if one or more observers have been
eliminated according to the procedure given below.
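A brief sketch of the per-condition mean and 95% confidence interval in Python (illustrative only; the usual 1.96·s/√N half-width is assumed, and the sample scores are hypothetical):

import numpy as np

def mean_and_ci95(scores):
    # Mean opinion score and 95 % confidence interval for one test condition.
    s = np.asarray(scores, dtype=float)
    mean = s.mean()
    half_width = 1.96 * s.std(ddof=1) / np.sqrt(len(s))
    return mean, (mean - half_width, mean + half_width)

# Example with 15 observers rating one test condition on the 1-5 scale:
print(mean_and_ci95([4, 4, 5, 3, 4, 4, 5, 4, 3, 4, 4, 5, 4, 4, 3]))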
Presentation of the Results
25
When the time-varying picture quality of the processed video without reference is evaluated by the subjects,
the SSCQE is preferred.
− Viewers are shown the video sequences one by one and asked to vote.
− Each sequence is shown only once and lasts from 30 seconds to several minutes.
− Since the video sequences are long, they are segmented into 10 s shots, and an MOS is calculated for each
video segment.
− During the presentation of a sequence, viewers continuously provide their rating scores by using a slider.
− The slider position is sampled every 500 ms, which provides the continuous-scale reading.
− The rating scale is divided into five equal lengths as explained for DSCQS.
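A minimal sketch of how the per-segment MOS values described above could be derived, assuming slider readings sampled every 500 ms and stored per viewer (the array shape and function name are assumptions, not part of the method):

import numpy as np

def sscqe_segment_mos(slider_samples, sample_period_s=0.5, segment_s=10.0):
    # slider_samples: array of shape (viewers, samples), one reading every 500 ms.
    samples = np.asarray(slider_samples, dtype=float)
    per_segment = int(segment_s / sample_period_s)           # 20 samples per 10 s segment
    n_segments = samples.shape[1] // per_segment
    trimmed = samples[:, :n_segments * per_segment]
    # One MOS per 10 s segment: average over viewers and over samples within the segment.
    return trimmed.reshape(samples.shape[0], n_segments, per_segment).mean(axis=(0, 2))

# Example: 3 viewers, 60 s of continuous ratings -> 6 segment MOS values.
ratings = np.random.uniform(40, 80, size=(3, 120))
print(sscqe_segment_mos(ratings).shape)                       # (6,)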
Single Stimulus Continuous Quality Evaluation (SSCQE)
26
− A typical assessment might call for an evaluation of either a new system, or the effect of a transmission
path impairment.
− The double-stimulus (EBU) method is cyclic in that the assessor is first presented with an unimpaired
reference, then with the same image impaired.
− Following this, the assessor is asked to vote on the second, keeping in mind the first.
The Double-stimulus Impairment Scale (DSIS)
27
− In sessions, which last up to half an hour, the assessor is presented with a series of images or sequences in
random order and with random impairments covering all required combinations.
− The unimpaired image is included in the images or sequences to be assessed.
− At the end of the series of sessions, the mean score for each test condition and test image is calculated.
− The method uses the impairment scale, for which it is usually found that the stability of the results is greater
for small impairments than for large impairments.
− Although the method sometimes has been used with limited ranges of impairments, it is more properly
used with a full range of impairments.
The Double-stimulus Impairment Scale (DSIS)
28
− Assessors are presented with a series of test images or sequences. They are arranged in pairs such that the
first in the pair comes direct from the source, and the second is the same image via the system under test.
The Double-stimulus Impairment Scale (DSIS)
Grading scales
• Imperceptible
• Perceptible, But Not Annoying
• Slightly Annoying
• Annoying
• Very Annoying
29
Source signal or system under test
Presentation of the Test Material
− A test session comprises a number of presentations.
− There are two variants to the structure of presentations, I and II outlined below.
• Variant I: The reference image or sequence and the test image or sequence are presented only once.
• Variant II: The reference image or sequence and the test image or sequence are presented twice. It is
more time consuming, may be applied if the discrimination of very small impairments is required or
moving sequences are under test.
The Double-stimulus Impairment Scale (DSIS)
T1 = 10s Reference image
T2 = 3s Mid-grey produced by video level of around 200 mV
T3 = 10s Test condition
T4 = 5 to 11s Mid-grey
Experience suggests that extending the periods T1 and T3 beyond 10 s does not
improve the assessor’s ability to grade the image sequences.
30
The Introduction to the Assessments
− At the beginning of each session, an explanation is given to the observers about the type of assessment,
the grading scale, the sequence and timing (reference image, grey, test image, voting period).
− The range and type of the impairments to be assessed should be illustrated on images other than those
used in the tests, but of comparable sensitivity.
− It must not be implied that the worst quality seen necessarily corresponds to the lowest subjective grade.
− Observers should be asked to base their judgement on the overall impression given by the image, and to
express these judgements in terms of the wordings used to define the subjective scale.
− The observers should be asked to look at the image for the whole of the duration of T1 and T3.
− Voting should be permitted only during T4.
The Double-stimulus Impairment Scale (DSIS)
31
The Test Session
− The images and impairments should be presented in a pseudo-random sequence and, preferably in a
different sequence for each session.
− In any case, the same test image or sequences should never be presented on two successive occasions
with the same or different levels of impairment.
− The range of impairments should be chosen so that all grades are used by the majority of observers; a
grand mean score (averaged overall judgements made in the experiment) close to three should be
aimed at.
− A session should not last more than roughly half an hour, including the explanations and preliminaries; the
test sequence could begin with a few images indicative of the range of impairments; judgements of these
images would not be taken into account in the final results.
The Double-stimulus Impairment Scale (DSIS)
32
− Video sequences are presented to observers in pairs; the first video is always the reference, followed by the
coded video sequence, and impairment rather than quality is rated.
− For voting, there are two methods:
1st method: the reference video is followed by the coded video and viewers are asked to vote.
2nd method: both the reference and coded video sequences are presented twice to the observers, who are then
asked to vote as illustrated in the figure.
Double Stimulus Impairment Scale (DSIS), Summary
33
− A typical assessment might call for evaluation of a new system or of the effects of transmission paths on
quality.
− The double-stimulus method is thought to be especially useful when it is not possible to provide test
conditions that exhibit the full range of quality.
− The method is cyclic in that the assessor is asked to view a pair of images, each from the same source, but
one via the process under examination, and the other one directly from the source.
− The assessor is asked to assess the quality of both.
− In sessions which last up to half an hour, the assessor is presented with a series of image pairs (internally
random) in random order, and with random impairments covering all required combinations.
− At the end of the sessions, the mean scores for each test condition and test image are calculated.
The Double-stimulus Continuous Quality-scale (DSCQS)
34
General arrangement for test system for DSCQS method
The Double-stimulus Continuous Quality-scale (DSCQS)
T1 = 10 s Reference image
T2 = 3 s Mid-grey produced by video level of around 200 mV
T3 = 10 s Test condition
T4 = 5 to 11 s Mid-grey
35
Presentation of the Test Material
Variant I (a single observer):
− The assessor is free to switch between the A and B signals until the assessor has the mental measure of
the quality associated with each signal.
− The assessor may typically choose to do this two or three times for periods of up to 10s.
Variant II (use a number of observers simultaneously):
− Prior to recording results, the pair of conditions is shown one or more times for an equal length of time
to allow the assessor to gain the mental measure of the qualities associated with them, then the pair is
shown again one or more times while the results are recorded.
− The number of repetitions depends on the length of the test sequences.
− For still images, a 3-4 s sequence and five repetitions (voting during the last two) may be appropriate.
− For moving images with time-varying artefacts, a 10s sequence with two repetitions (voting during the
second) may be appropriate.
The Double-stimulus Continuous Quality-scale (DSCQS)
36
Grading Scale
− The method requires the assessment of two versions of each test image.
− One of each pair of test images is unimpaired while the other presentation might or might
not contain an impairment.
− The unimpaired image is included to serve as a reference, but the observers are not told
which the reference image is.
− In the series of tests, the position of the reference image is changed in pseudo-random
fashion.
− The observers are simply asked to assess the overall image quality of each presentation by
inserting a mark on a vertical scale.
− The vertical scales are printed in pairs to accommodate the double presentation of each
test image.
− The scales provide a continuous rating system to avoid quantizing errors, but they are divided
into five equal lengths which correspond to the normal ITU-R five-point quality scale.
The Double-stimulus Continuous Quality-scale (DSCQS)
37
Analysis of the Results
− The pairs of assessments (reference and test) for each test condition are converted from
measurements of length on the score sheet to normalized scores in the range 0 to 100.
− Then, the differences between the assessment of the reference and the test condition are
calculated.
− Experience has shown that the scores obtained for different test sequences are dependent
on the criticality of the test material used.
− A more complete understanding of codec performance can be obtained by presenting
results for different test sequences separately, rather than only as aggregated averages
across all the test sequences used in the assessment.
− If results for individual test sequences are arranged in a rank order of test sequence
criticality on an abscissa it is possible to present a crude graphical description of the image
content failure characteristic of the system under test.
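A brief sketch of the normalisation and differencing step described above, assuming marks measured in millimetres on a 100 mm scale (the scale length and example values are assumptions):

import numpy as np

def dscqs_difference_scores(reference_marks_mm, test_marks_mm, scale_length_mm=100.0):
    # Convert marks on the continuous scale to normalised 0-100 scores,
    # then take the reference-minus-test difference for each presentation pair.
    ref = np.asarray(reference_marks_mm, dtype=float) / scale_length_mm * 100.0
    tst = np.asarray(test_marks_mm, dtype=float) / scale_length_mm * 100.0
    return ref - tst

# Example: three presentation pairs scored by one observer.
print(dscqs_difference_scores([82, 75, 90], [60, 71, 55]))    # [22.  4. 35.]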
The Double-stimulus Continuous Quality-scale (DSCQS)
38
Interpretation of the Results
− When using this DSCQS method, it could be hazardous, and even wrong, to derive conclusions about
the quality of the conditions under test by associating numerical DSCQS values with adjectives coming
from other test protocols (e.g. imperceptible, perceptible but not annoying, ... coming from the DSIS
method).
− It is noted that results obtained from the DSCQS method should not be treated as absolute scores but as
differences of scores between a reference condition and a test condition.
− Thus, it is erroneous to associate the scores with a single quality description term even with those which
come from the DSCQS protocol itself (e.g. excellent, good, fair, ...).
− In any test procedure it is important to decide acceptability criteria before the assessment is
commenced.
− This is especially important when using the DSCQS method because of the tendency for inexperienced
users to misunderstand the meaning of the quality scale values produced by the method.
The Double-stimulus Continuous Quality-scale (DSCQS)
39
− Each assessment consists of a pair of video sequences A then B followed by the same sequences A and B.
− The order of the reference and coded video sequences is shuffled and is unknown to the viewers.
− Viewers are asked to rate the quality of both video sequences.
− Each video sequence has typically 8 to 10 seconds duration.
− The voting pattern of DSCQS is illustrated in figure below.
Double Stimulus Continuous Quality Scale (DSCQS), Summary
40
− Viewers rate the quality on continuous scale which is divided into five equal
intervals:
– Excellent for (100-80),
– Good for (79-60),
– Fair for (59-40),
– Poor for (39-20)
– Bad for (19-0)
− The final score is rated from 1 (Bad) to 5 (Excellent)
− For distribution applications, quality specifications can be expressed in terms of the subjective judgement
of observers.
− Such codecs can in theory therefore be assessed subjectively against these specifications.
− The quality of a codec designed for contribution applications however, could not in theory be specified in
terms of subjective performance parameters because its output is destined not for immediate viewing, but
for studio post-processing, storing and/or coding for further transmission.
− Because of the difficulty of defining this performance for a variety of post-processing operations, the
approach preferred has been to specify the performance of a chain of equipment, including a post-
processing function, which is thought to be representative of a practical contribution application.
Subjective Assessment for Distribution or Contribution Codec
41
− This chain might typically consist of a codec, followed by a studio post-processing function (or another
codec in the case of basic contribution quality assessment), followed by yet another codec before the
signal reaches the observer.
− Adoption of this strategy for the specification of codecs for contribution applications means that the
measurement procedures given in this Recommendation can also be used to assess them.
− The most completely reliable method of evaluating the ranking order of high-quality codecs is to assess
all the candidate systems at the same time under identical conditions.
− Tests made independently, where fine differences of quality are involved, should be used for guidance
rather than as indisputable evidence of superiority.
Subjective Assessment for Distribution or Contribution Codec
42
− Where a codec is being assessed for distribution applications, this quality refers to pictures decoded after
a single pass through a codec pair.
− For contribution codecs, basic quality may be assessed after several codecs in series, in order to simulate
a typical contribution application.
− Where the range of quality to be assessed is small, as will normally be the case for television codecs, the
testing methodology to be used is variant II of the double-stimulus continuous quality-scale.
− The original source sequence will be used as the reference condition. Further consideration is being given
to the duration of presentation sequences.
− In the recent tests on codecs for 4:2:2 component video, it was considered advantageous to modify the
presentation from that given in this Recommendation.
− Composite pictures were used as an additional reference to provide a lower quality level against which to
judge the codec performance.
Evaluations of Basic Picture Quality
43
− It is recommended that at least six picture sequences be used in the assessment, plus an additional one
to be used for training purposes prior to the start of the trial.
− The sequences should range between moderately critical and critical in the context of the bit-rate
reduction application being considered.
− The importance of testing digital codecs with picture sequences which are critical in the context of
television bit-rate reduction should be stressed.
• It is therefore reasonable to ask:
• how critical a particular image sequence is for a particular bit-rate reduction task,
or
• whether one sequence is more critical than another.
Evaluations of Basic Picture Quality
44
− A simple but not especially helpful answer is that “criticality” means very different things to different
codecs.
− For example:
• To an intrafield codec a still picture containing much detail could well be critical.
• To an interframe codec a still picture containing much detail would present no difficulty at all.
Critical Sequences to All Classes of Codec
− Moving Texture + Complex motion → These types of sequences are most useful to generate or identify.
• Complex motion may take the form of movements which are predictable to an observer but not to coding
algorithms, such as tortuous periodic motion.
Evaluations of Basic Picture Quality
45
Fixed Word-length Intra-field Codecs
− While it is possible and valid to assess these codecs on still images, the use of moving sequences is
recommended since coding noise processes are easier to observe and this is more realistic of television
applications.
− If still images are used in computer simulations of codecs, processing should be performed over the entire
assessment sequence in order to preserve temporal aspects of any source noise, for example.
− The scenes chosen should contain as many as possible of the following details:
• static and moving textured areas (some with coloured texture)
• static and moving objects with sharp, high contrast edges at various orientations (some with colour)
• static plain mid-grey areas.
− At least one sequence in the ensemble should exhibit just perceptible source noise
− At least one sequence should be synthetic (i.e. computer generated) so that it is free from camera
imperfections such as scanning aperture and lag.
Evaluations of Basic Picture Quality
46
Fixed Word-length Interframe Codecs
− The test scenes chosen should all contain movement and as many as possible of the following details:
• moving textured areas (some coloured)
• objects with sharp, high contrast edges moving in a direction perpendicular to these edges and at
various orientations (some coloured)
− At least one sequence in the ensemble should exhibit just perceptible source noise
− At least one sequence should be synthetic.
Evaluations of Basic Picture Quality
47
Variable Word-length Intra-field Codecs
− It is recommended that these codecs be tested with moving image sequence material for the same
reasons as the fixed word-length codecs.
− It should be noted that, by virtue of their variable word-length coding and associated buffer stores, these
codecs can dynamically distribute coding bit-capacity throughout the image.
− Thus, for example, if half of a picture consists of a featureless sky which does not require many bits to
code, capacity is saved for the other parts of the picture which can therefore be reproduced with high
quality even if they are critical.
− The important conclusion from this is that if a picture sequence is to be critical for such a codec, the
content of every part of the screen should be detailed.
Evaluations of Basic Picture Quality
48
Variable Word-length Intra-field Codecs
− So if a picture sequence is to be critical for such a codec, the content of every part of the screen should
be detailed.
− It should be filled with:
• moving and static texture
• as much colour variation as possible
• objects with sharp, high contrast edges
− At least one sequence in the test ensemble should exhibit just perceptible source noise
− At least one sequence should be synthetic.
Evaluations of Basic Picture Quality
49
Variable Word-length Interframe Codecs
− This is the most sophisticated class of codec and the kind which requires the most demanding material to
stress it.
− Not only should every part of the scene be filled with detail as in the intra-field variable word-length case,
but this detail should also exhibit motion.
− Furthermore, since many codecs employ motion compensation methods, the motion throughout the
sequence should be complex.
Evaluations of Basic Picture Quality
50
Variable Word-length Interframe Codecs
− Examples of complex motion are:
• scenes employing simultaneous zooming and panning of a camera
• a scene which has as a background a textured or detailed curtain blowing in the wind
• a scene containing objects which are rotating in the three-dimensional world
• scenes containing detailed objects which accelerate across the screen.
− All scenes should contain substantial motion of objects with different velocities, textures and high contrast
edges as well as a varied colour content.
− At least one sequence in the test ensemble should exhibit just perceptible source noise
− At least one sequence should have complex computer-generated camera motion from a natural still
picture (so that it is free from noise and camera lag)
− At least one sequence should be entirely computer generated.
Evaluations of Basic Picture Quality
51
− This assessment is intended to permit judgement to be made on the suitability of a codec for contribution
applications with respect to a particular post-process e.g. colour matte, slow motion, electronic zoom.
− The minimum arrangement of equipment for such an assessment is a single pass through the codec under
test, followed by the post-process of interest, followed by the viewer.
− It may, however, be more representative of a contribution application to employ further codecs after the
post-process.
− The test methodology to be used is variant II of the double-stimulus continuous quality-scale method (use
a number of observers simultaneously).
− Here however the reference condition will be the source subjected to the same post-processing as the
decoded pictures.
− If inclusion of a lower quality reference is considered to be advantageous then it too should be subjected
to the same post-process.
Evaluations of Picture Quality after Downstream Processing
52
− Test sequences required for post-processing assessments are subject to exactly the same criticality criteria
as sequences for other digital applications.
− Because of the practical constraints of possibly having to assess a codec with several post-processes, the
number of test picture sequences used may be a minimum of three with an additional one available for
demonstration purposes.
− The nature of the sequences will be dependent upon the post-processing task being studied but should
range between moderately critical and critical in the context of television bit-rate reduction and for the
process under consideration.
− For slow motion assessment a display rate of 1/10th of the source rate may be suitable.
Evaluations of Picture Quality after Downstream Processing
53
− In subjective assessments of impairments in codec images, due to imperfections in the transmission or
emission channel, a minimum of five, but preferably more, bit-error ratios or selected
transmission/emission conditions should be chosen, approximately logarithmically spaced and
adequately sampling the range which gives rise to codec impairments from “imperceptible” to “very
annoying”.
− It is possible that codec assessments could be required at transmission bit error ratios which result in visible
transients so infrequent that they may not be expected to occur during a 10s test sequence period.
− The presentation timing suggested here is clearly not suitable for such tests.
Evaluations of Failure Characteristics
54
− If recordings of a codec output under fairly low bit error ratio conditions (resulting in a small number of
visible transients within a 10s period) are to be made for later editing into subjective assessment
presentations, care should be taken to ensure that the recording used is typical of the codec output
viewed over a longer time-span.
− Because of the need to explore codec performance over a range of transmission bit error ratios, practical
constraints suggest that three test image sequences with an additional demonstration sequence will
probably be adequate.
− Sequences should be of the order of 10s in duration but it should be noted that test viewers may prefer a
duration of 15-30 s.
− They should range between moderately critical and critical in the context of television bit-rate reduction.
− As the tests will span the full range of impairment, the double-stimulus impairment scale (DSIS) method is
appropriate and should be used.
Evaluations of Failure Characteristics
55
56
Video Source → Compress (Encode) → Coded video → Decompress (Decode) → Video Display
ENCODER + DECODER = CODEC
Display selection and set-up
− The display used should be a flat panel display featuring performances typical of professional applications
(e.g. broadcasting studios or vans)
− The display diagonal dimension may vary from 22″ (minimum) to 40″ (suggested), but it may extend to 50″
or higher, when image systems with a resolution of HDTV or higher are assessed.
− It is allowed to use a reduced portion of the active viewing area of a display; in this case the area around
the active part of the display should be set to mid-grey. In this condition of use, the monitor should not be
set to a resolution different from its native one.
− The display should allow a proper set-up and calibration for luminance and colour, using a professional
light-meter instrument. The calibration of the display should comply with the parameters specified in the
relevant Recommendation for the test being undertaken.
57
Subjective Assessment by Expert Viewing
Viewing distance
− The viewing distance at which the experts are seated should be chosen according to the resolution of the
screen and the height of its active part, following the design viewing distance described in Recommendation
ITU-R BT.2022, or a shorter viewing distance if more critical viewing conditions are required.
Viewing conditions
− An expert viewing protocol (EVP) experiment need not necessarily be run in a test laboratory, but it is
important that the testing location is protected from audible and/or visible disturbances (e.g. a quiet
office or meeting room may be used as well).
− Any direct or reflected source of light falling on the screen should be eliminated; other ambient light
should be low, kept to the minimum level that still allows filling in the scoring sheets (if used).
− The number of experts seated in front of the monitor may vary according to the screen size, in order to
guarantee the same image rendering and stimulus presentation for all the viewers.
58
Subjective Assessment by Expert Viewing
Viewers
− The viewers participating in an EVP experiment should be experts in the domain of study.
− Viewers need not necessarily be screened for visual acuity or colour blindness, since they should be
chosen from among qualified persons.
− The minimum number of different viewers should be nine.
− To reach the minimum number of viewers, the same experiment may be conducted at the same location
by repeating the test, or at more than one location. The scores from the different locations participating in
an expert viewing session may be statistically processed together.
59
Subjective Assessment by Expert Viewing
The basic test cell
− A basic test cell (BTC) is defined for each pair of coding conditions to be assessed.
− The source reference sequence (SRC) and the processed video sequence (PVS) clips considered in a BTC
should always be derived from the same video sequence, so that the experts are able to identify any
improvement in visual quality provided by the compression algorithms under test.
60
Subjective Assessment by Expert Viewing
The BTC should be organised as follows:
− 0.5 seconds with the screen set to a mid-grey (mean value in the luminance scale);
− 10 seconds presentation of the reference uncompressed video clip;
− 0.5 seconds showing the message “A” (first video to assess) on a mid-grey background;
− 10 seconds presentation of an impaired version of the video clip;
− 0.5 seconds showing the message “B” (second video to assess) on a mid-grey background;
− 10 seconds presentation of an impaired version of the video clip;
− 5 seconds showing a message that asks the viewers to express their opinion.
The message “Vote” should be followed by a number that helps the viewers stay synchronised with the scoring sheet.
Scoring sheet and rating scale
− The presentation of the video clips should be arranged in such a way that the unimpaired reference (SRC)
is shown at first, followed by two impaired video sequences (PVS).
− The order of presentation of the PVS should be randomly changed for each BTC and the viewers should
not know the order of presentation.
61
Subjective Assessment by Expert Viewing
Test design and session creation
− The BTCs should be arranged in a random order by the test designer, in such a way that neither the same
video clip nor the same impaired clip is shown twice in succession.
− Any viewing session should begin with a “stabilization phase” including the “best”, the “worst” and two
“mid quality” BTCs among those included in each test session.
− This allows the viewers to get an immediate impression of the quality range right at the beginning of
the test session.
− If the viewing session is longer than 20 minutes, the test designer should split it into two (or more) separate
viewing sessions, each of them not exceeding 20 minutes. In this case, the “stabilization phase” should be
provided before each viewing session.
62
Subjective Assessment by Expert Viewing
Training
− Even though this procedure is intended for use with the participation of experts, a short (5-6 BTC) training
viewing session should preferably be organised prior to each experiment.
− The video material used in the training session may be the same as that used during the actual
sessions, but the order of presentation should be different.
− The viewers should be trained on the use of the 11-grade scale by asking them to look carefully at the
video clips shown immediately after the messages “A” and “B” on the screen, and check whether they
can see any difference from the video clip shown first (the SRC).
63
Subjective Assessment by Expert Viewing
Data collection and processing
− The scores should be collected at the end of each session and logged on an electronic spreadsheet to
compute the MEAN values.
− A “post-screening” of the viewers should preferably be performed, using the linear Pearson correlation.
− The “correlation” function should be applied considering all the scores of each subject in relation to the
mean opinion scores (MOS); a threshold may be set to define each viewer as “acceptable” or “rejected”
(Recommendation ITU-T P.913 suggests the use of a “reject” threshold value equal to 0.75).
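A minimal post-screening sketch in Python, assuming a matrix of individual scores (viewers × test conditions); the function name and data layout are assumptions, while the 0.75 threshold follows the ITU-T P.913 suggestion quoted above:

import numpy as np
from scipy.stats import pearsonr

def screen_viewers(scores, threshold=0.75):
    # scores: array of shape (viewers, conditions) holding the raw individual ratings.
    scores = np.asarray(scores, dtype=float)
    mos = scores.mean(axis=0)                        # mean opinion score per condition
    decisions = []
    for viewer_scores in scores:
        r, _ = pearsonr(viewer_scores, mos)          # linear Pearson correlation with the MOS
        decisions.append("acceptable" if r >= threshold else "rejected")
    return decisions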
Terms of use of the expert viewing protocol results
− The expert viewing protocol (EVP) may be used when time and resources do not allow running a formal
subjective assessment experiment.
− EVP requires less time than a formal subjective assessment and may be executed in an “informal”
environment. The only mandatory conditions are related to the ambient illumination and to the viewing
conditions (display, angle of observation and viewing distance) as described in the above paragraphs.
64
Subjective Assessment by Expert Viewing
Limitations of use of the EVP results
− Even though the EVP has been shown to provide acceptable results with only nine viewers, the MOS
provided by an EVP experiment cannot be considered a replacement for the results obtainable with a
formal subjective assessment experiment.
− The MOS data obtained using EVP might be used to make a preliminary ranking of the video processing
schemes under evaluation.
− Where considered convenient or necessary, an EVP experiment can be run in parallel at more locations,
assuming the viewing conditions, viewing distance and the test design are identical.
− If the number of expert viewers involved in the same EVP experiment, even when the experiment is run at
different locations, is equal to or higher than 15, the raw subjective data might be processed to obtain MOS,
standard deviation and confidence interval data, which may help to perform a more accurate ranking of
the cases under test. In this last case, more rigorous inferential statistical analysis may be performed, e.g.
Student’s t-test.
65
Subjective Assessment by Expert Viewing
66
EVP in the Presence of a Large Number of Expert Assessors
67
EVP in the Presence of a Large Number of Expert Assessors
− Observing the results, it can be noted that:
• The 15- and 18-viewer graphs show a homogeneous slope from high-quality to low-quality MOS
values.
• The 9- and 12-viewer graphs show some “inversions” of ranking compared with the 18-viewer graph,
even though the variations of scores are rather limited in extent.
− In conclusion, the EVP experiments described here show very good performance of the EVP protocol,
confirming that, even though it cannot be considered a full replacement for a formal subjective
experiment, it may be regarded as a stable evaluation procedure providing results very close to those
obtained when many more viewers are available and a formal subjective assessment is done.
68
EVP in the Presence of a Large Number of Expert Assessors
69
Video Source → Compress (Encode) → Coded video → Decompress (Decode) → Video Display
ENCODER + DECODER = CODEC
Distribution Applications
− The quality specifications can be expressed in terms of the subjective judgement of observers.
− Such codecs can in theory therefore be assessed subjectively against these specifications.
Contribution Applications
− The quality could not in theory be specified in terms of subjective performance parameters because its
output is destined not for immediate viewing, but for
• studio post-processing
• storing
• and/or coding for further transmission.
70
Distribution Applications and Contribution Application Codecs Test
Contribution Applications
− Because of the difficulty of defining this performance for a variety of post-processing operations, the
approach preferred has been to specify the performance of a chain of equipment, including a post-
processing function, which is thought to be representative of a practical contribution application.
− This chain might typically consist of a codec, followed by a studio post-processing function (or another
codec in the case of basic contribution quality assessment), followed by yet another codec before the
signal reaches the observer.
71
Distribution Applications and Contribution Application Codecs Test
Input Video → Encoder → Decoder (First Generation) → Studio Post-processing Function → Encoder →
Decoder (Second Generation)
Representative of a practical contribution application
Contribution Applications
− The most completely reliable method of evaluating the ranking order of high-quality codecs is to assess
all the candidate systems at the same time under identical conditions.
− Tests made independently, where fine differences of quality are involved, should be used for guidance
rather than as indisputable evidence of superiority.
− A useful subjective measure may be impairment determined as a function of the bit error ratio which
occurs in the contribution link between coder and decoder.
− At present there is insufficient experimental knowledge of true transmission error statistics to recommend
parameters for a model which accounts for error clustering or bursts. Until this information becomes
available Poisson-distributed errors may be used.
72
Distribution Applications and Contribution Application Codecs Test
− One article might claim that codec A is 15% better than codec B, while the next one might assert that
codec B is 10% better than codec A. Why?
− Because the testing methodology and content play a crucial role in the evaluation of video codecs.
Several factors impact the assessment of video codecs:
• Encoder Implementation
• Encoder Settings
• Methodology
• Content
• Metrics
Factors in Codec Comparisons
73
− Video coding standards are instantiated in software or hardware with goals as varied as research,
broadcasting, or streaming.
− A ‘reference encoder’ is a software implementation used during the video standardization process and for
research, and as a reference by implementers.
− Generally, this is the first implementation of a standard and not used for production. Afterward, production
encoders developed by the open-source community or commercial entities come along.
− These are practical implementations deployed by most companies for their encoding needs and are
subject to stricter speed and resource constraints. Therefore, the performance of reference and
production encoders might be substantially different.
− Besides, the standard profile and specific version influence the observed performance, especially for a
new standard with still immature implementations.
− Netflix deploys production encoders tuned to achieve the highest subjective quality for streaming.
Encoder Implementation
74
− Encoding parameters such as the number of coding passes, parallelization tools, rate-control, visual
tuning, and others introduce a high degree of variability in the results.
− The selection of these encoding settings is mainly application-driven.
− Standardization bodies tend to use test conditions that let them compare one tool to another, often
maximizing a particular objective metric and reducing variability over different experiments.
− For example, rate-control and visual tunings are generally disabled, to focus on the effectiveness of core
coding tools.
− Netflix encoding recipes focus on achieving the best quality, enabling the available encoder tools that
boost visual appearance, and thus, giving less weight to indicators like speed or encoder footprint that
are crucial in other applications.
Encoding Settings
75
− Testing methodology in codec standardization establishes well-defined “common test conditions” to
assess new coding tools and to allow for reproducibility of the experiments.
− Common test conditions consist of a relatively small set of test sequences (single shots of 1 to 10 seconds)
that are encoded only at the input resolution with a fixed set of quality parameters.
− Quality (PSNR) and bitrate are gathered for each of these quality points and used to compute the
average bitrate savings, the so-called BD-rate (Bjontegaard Delta (BD) rate savings).
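For reference, a compact Python sketch of one common Bjontegaard computation variant (a cubic fit of log-rate against quality, then averaging the gap over the overlapping quality range); the function name and input layout are assumptions:

import numpy as np

def bd_rate(ref_rates, ref_psnr, test_rates, test_psnr):
    # Fit log10(bitrate) as a cubic polynomial of quality for each codec.
    p_ref  = np.polyfit(ref_psnr,  np.log10(ref_rates),  3)
    p_test = np.polyfit(test_psnr, np.log10(test_rates), 3)
    # Integrate both fits over the overlapping quality interval.
    low  = max(min(ref_psnr), min(test_psnr))
    high = min(max(ref_psnr), max(test_psnr))
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    avg_ref  = (np.polyval(int_ref,  high) - np.polyval(int_ref,  low)) / (high - low)
    avg_test = (np.polyval(int_test, high) - np.polyval(int_test, low)) / (high - low)
    # Average log-rate difference -> percentage bitrate change (negative = savings).
    return (10 ** (avg_test - avg_ref) - 1) * 100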
Methodology
76
− While the methodology in standards has been suitable for its intended purpose, other considerations
come into play in the adaptive streaming world.
− Notably, the option to offer multiple representations of the same video at different bitrates and resolutions
to match network bandwidth and client processing and display capabilities.
− The content, encoding, and display resolutions are not necessarily tied together. Removing this constraint
implies that quality can be optimized by encoding at different resolutions.
Methodology
77
− Per-resolution rate-quality curves cross, so there is a range of rates for which each encoding resolution
gives the best quality.
− The ‘convex hull’ is derived by selecting the optimal curve at each rate for the entire range.
− Then, the BD-rate difference is computed on the convex hulls, instead of using the single resolution curves.
− The flexibility of delivering bitstreams on the convex hull, as opposed to just the single-resolution ones, leads
to remarkable quality improvements.
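A simplified Python sketch of deriving such a convex hull from pooled per-resolution rate-quality points (an illustration of the idea, not Netflix's code; the function name and point values are assumptions):

def rate_quality_convex_hull(points):
    # points: (bitrate, quality) pairs pooled over all encoding resolutions.
    pts = sorted(points)                              # ascending bitrate
    # Keep Pareto-efficient points: quality must improve as bitrate increases.
    pareto, best_q = [], float("-inf")
    for rate, quality in pts:
        if quality > best_q:
            pareto.append((rate, quality))
            best_q = quality
    # Keep only points on the upper convex hull (slopes must decrease with rate).
    hull = []
    for rate, quality in pareto:
        while len(hull) >= 2:
            (r1, q1), (r2, q2) = hull[-2], hull[-1]
            if (q2 - q1) * (rate - r2) <= (quality - q2) * (r2 - r1):
                hull.pop()                            # middle point falls below the hull
            else:
                break
        hull.append((rate, quality))
    return hull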
Methodology
78
− The dynamic optimizer (DO) methodology generalizes this concept to sequences with multiple shots.
− DO operates on the convex hull of all the shots in a video to jointly optimize the overall rate-distortion by
finding the optimal compression path for an encode across qualities, resolutions, and shots.
− It showed about 25% BD-rate savings for multiple-shot videos.
− Three characteristics that make DO particularly well-suited for adaptive streaming and codec comparisons
are:
• It is codec agnostic since it can be applied in the same way to any encoder.
• It can use any metric to guide its optimization process.
• It eliminates the need for high-level rate control across shots in the encoder. Lower-level rate control, like
adaptive quantization within a frame, is still useful, because DO does not operate below the shot level.
− DO, as originally presented, is a non-real-time, computationally expensive algorithm.
− However, because of the exhaustive search, DO could be seen as the upper-bound performance for
high-level rate control algorithms.
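The shot-level selection idea can be sketched with a simple Lagrangian sweep in Python; this is a toy illustration of the concept under assumed data structures, not Netflix's actual dynamic optimizer implementation:

import numpy as np

def select_encodes(shot_hulls, rate_budget):
    # shot_hulls: for each shot, its convex-hull points as (bitrate, quality) pairs.
    best_choice, best_quality = None, float("-inf")
    for lam in np.logspace(-4, 2, 400):               # sweep the Lagrange multiplier
        # Independently per shot, pick the point maximising quality - lam * rate.
        picks = [max(hull, key=lambda p: p[1] - lam * p[0]) for hull in shot_hulls]
        total_rate = sum(rate for rate, _ in picks)
        total_quality = sum(quality for _, quality in picks)
        if total_rate <= rate_budget and total_quality > best_quality:
            best_choice, best_quality = picks, total_quality
    return best_choice, best_quality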
Methodology
79
− For a fair comparison, the testing content should be balanced, covering a variety of distinct types of
video (natural vs. animation, slow vs. high motion, etc.) or reflect the kind of content for the application at
hand.
− The testing content should not have been used during the development of the codec.
− Netflix has produced and made public long video sequences with multiple shots, such as ‘El Fuente’ or
‘Chimera’, to extend the available videos for R&D and mitigate the problem of conflating training and
test content.
− Internally, we extensively evaluate algorithms using full titles from our catalog.
Content
80
− Traditionally, PSNR has been the metric of choice given its simplicity, and it
reasonably matches subjective opinion scores.
− Other metrics, such as VIF or SSIM, better correlate with the subjective
scores.
− Metrics have commonly been computed at the encoding resolution.
− Netflix highly relies on VMAF (Video Multi-Method Assessment Fusion)
throughout its video pipeline.
− VMAF is a perceptual video quality metric that models the human visual
system.
− It correlates better than PSNR with subjective opinion over a wide quality
range and content.
− VMAF enables reliable codec comparisons across the broad range of
bitrates and resolutions occurring in adaptive streaming.
Metrics
81
Approximate correspondence
between VMAF values and
subjective opinion
Two relevant aspects when employing metrics are the resolution at which they are computed and the
temporal averaging:
Scaled Metric:
− VMAF is not computed at the encoding resolution, but at the display resolution, which better emulates our
members’ viewing experience. This is not unique to VMAF, as PSNR and other metrics can be applied at any
desired resolution by appropriately scaling the video.
Temporal Average:
− Metrics are calculated on a per-frame basis.
− Normally, the arithmetic mean has been the method of choice to obtain the temporal average across the entire
sequence.
− We employ the harmonic mean, which gives more weight to outliers than the arithmetic mean.
− The rationale for using the harmonic mean is that if there are few frames that look really bad in a shot of a show
you are watching, then your experience is not that great, no matter how good the quality of the rest of the shot
is. The acronym for the harmonic VMAF is HVMAF.
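A small sketch of the harmonic temporal average in Python (the +1/-1 offset guarding against zero-valued frames is an assumption about the exact formula; the frame values are hypothetical):

import numpy as np

def harmonic_vmaf(per_frame_vmaf):
    # Harmonic mean of per-frame VMAF: a few very bad frames pull the score
    # down much more than they would with the arithmetic mean.
    v = np.asarray(per_frame_vmaf, dtype=float)
    return len(v) / np.sum(1.0 / (v + 1.0)) - 1.0

frames = [95, 94, 96, 20, 95]             # one badly degraded frame in the shot
print(np.mean(frames))                    # arithmetic mean: 80.0
print(harmonic_vmaf(frames))              # harmonic mean: ~55, reflecting the bad frame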
Metrics
82
− Putting in practice the factors mentioned above, we show the results drawn from the two distinct codec
comparison approaches, the traditional and the adaptive streaming one.
− Three commonly used video coding standards are tested
• H.264/AVC by ITU-T and ISO/MPEG
• H.265/HEVC by ITU-T and ISO/MPEG
• VP9 by Google.
− For each standard, we use the reference encoder and a production encoder.
Codec Comparison Results by Netflix
83
Results with the traditional approach
− The traditional approach uses fixed QP (quality) encoding for a set of short sequences.
− Encoder settings are listed in the table below.
Codec Comparison Results by Netflix
84
Codec Comparison Results by Netflix
85
Results with the traditional approach
− Methodology: Five fixed quality encodings per sequence at the content resolution are generated.
− Content: 14 standard sequences from the MPEG Common Test Conditions set (mostly from JVET) and 14
from the Alliance for Open Media (AOM) set. All sequences are 1080p. These are short clips: about 10
seconds for the MPEG set and 1 second for the AOM set. Mostly, they are single-shot sequences.
− Metrics: BD-rate savings are computed using the Classic PSNR for the luma component.
• BD-rates are given in percentage with respect to x264.
• Positive numbers indicate an average increase of bitrate, while negative numbers indicate bitrate
reduction.
− Interestingly, using the MPEG set doesn’t seem to favour the HEVC encoders, nor does using the AOM set
favour the VP9 encoders.
Codec Comparison Results by Netflix
86
Results from an adaptive streaming perspective
− It builds on top of the traditional approach, modifying some aspects in each of the factors:
− Encoder settings: Settings are changed to incorporate perceptual tunings. The rest of the settings remain
as previously defined.
Codec Comparison Results by Netflix
87
Results from an adaptive streaming perspective
− Methodology:
• Encoding is done at 10 different resolutions for each shot, from 1920x1080 down to 256x144.
• DO performs the overall encoding optimization using HVMAF (harmonic VMAF).
− Content:
• A set of 8 Netflix full titles (like Orange is the New Black, House of Cards or Bojack) is added to the other two
test sets.
• Netflix titles are 1080p, 30fps, 8 bits/component. They contain a wide variety of content in about 8 hours of
video material.
− Metrics:
• HVMAF is employed to assess these perceptually-tuned encodings.
• The metric is computed over the relevant quality range of the convex hull.
• HVMAF is computed after scaling the encodes to the display resolution (assumed to be 1080p), which also
matches the resolution of the source content.
Codec Comparison Results by Netflix
88
Results from an adaptive streaming perspective
− Additionally, we split results into two ranges to visualize performance at different qualities.
• The low range refers to HVMAF between 30 and 63
• The high range refers to HVMAF between 63 and 96, which correlates with high subjective quality.
− The highlighted rows in the HVMAF BD-rates table are the most relevant operation points for Netflix.
− Encoder, encoding settings, methodology, testing content, and metrics should be thoroughly described in
any codec comparison since they greatly influence results.
− A different selection of the testing conditions leads to different conclusions.
• For example, EVE-VP9 is about 1% worse than x265 in terms of PSNR for the traditional approach, but
about 12% better for the HVMAF high range case.
− Given the vast amount of video compressed and delivered by services like Netflix, and given that the
traditional and adaptive streaming approaches do not necessarily lead to the same conclusions, it would
be beneficial for the video coding community to also consider the adaptive streaming perspective in its
comparisons.
− For example, it is relatively easy to compute metrics on the convex hull or to add the HVMAF numbers to
the reported metrics.
Codec Comparison Results by Netflix
89
90
[Block diagram: Video Source → Compress (Encode) → coded video → Decompress (Decode) → Video Display. ENCODER + DECODER = CODEC]
Visualizer™ SDR/HDR+WCG Digital Video Test Pattern
91
− It enables accurate evaluation and calibration of a wide range of digital video systems (more than 20 key
parameters of video quality can be evaluated from a single screen).
− The pattern is available as an uncompressed or compressed video sequence. In streaming form (as
HEVC/H.265, H.264/MPEG-4 AVC, and MPEG-2 video) it is suitable for testing both file-based and streaming
systems.
• Quantifies compression fidelity
• Identifies color matrix mismatch
• Determines bit depth and chroma subsampling
• Reveals skipped frames
• Quantifies lip sync errors
• Provides 18 additional tests
− HDR+WCG Version
• Reveals tone mapping and clipping
• Shows color gamut mismatch
• Indicates gamut mapping
Visualizer™ SDR/HDR+WCG Digital Video Test Pattern
92
− At left, horizontal and vertical linear frequency sweeps run from zero to the Nyquist limit.
− At right, the same sweeps after conversion from 1920x1080 to 1280x720 and back.
− Note the aliasing at the top that should have been filtered away to gray.
− Note the strong Moiré pattern two-thirds of the way up (2/3 = 720/1080).
− Use the pattern's 12-unit grid scale to measure frequency ratios (8/12 = 720/1080).
Frequency Response
93
− Right: Some regions have missing detail.
− At each frequency, note the highest bit depth with visible signal.
− Left: Uncompressed shows visible signal from bottom to top in every frequency burst. (Contrast
exaggerated.)
Compression Fidelity
94
− Check skin tones and detail in hair and jewelry; check for highlight clipping and black clipping.
− The image is color-managed for gamma and RGB primaries.
Reference Image
95
− Calibrated linear-light stairstep from 100% (10,000 nits) down 6 decades in 24 steps, with ±1/3 stop
deltas (the step values are worked out below). Check monitor tone mapping and clipping, and the darkest
visible light level.
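Assuming only what the slide states (24 equal logarithmic steps spanning 6 decades from 10,000 nits), the nominal step luminances work out as below; each step is a factor of 10^(6/24), roughly 1.78x or about 0.83 stop:

```python
# Nominal step luminances for a ladder spanning 6 decades down from 10,000 nits
# in 24 equal logarithmic steps (illustrative values derived only from the
# description on the slide).
peak_nits = 10_000.0
levels = [peak_nits * 10 ** (-6.0 * k / 24) for k in range(25)]
print(round(levels[0]), round(levels[12], 2), round(levels[24], 4))  # 10000 10.0 0.01
# Each step is a factor of 10**0.25, about 1.78x (0.83 stop); the +/- 1/3 stop
# deltas around each step are factors of 2**(1/3) ~ 1.26 and 2**(-1/3) ~ 0.79.
```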
Jacob’s Ladder
96
− See User Guide Appendix for RGB values for each chip.
− Note six dim blocks in the fifth row. They are 1/200, 1/400, 1/800, 1/1600, 1/3200, and 1/6400 of the
linear light range.
ST-303M
97
− Right half: luma only.
− Left half: chroma only.
− Look for jumps or stutter indicating dropped or repeated frames.
− A quarter of the way up, the speed is 1 line per frame or field: look for jaggies there.
− Look for differences between luma and chroma resolution and
motion smoothness.
Lava Lamp
98
Chroma Downsampling
99
Chroma 4:4:4 Chroma 4:2:2
Chroma 4:2:0i Chroma 4:2:0p
Color Matrix
100
Correct
Rec.2020 decoded as Rec.709
Rec.709 YUV decoded as Rec.601
− 1002, 940, 932, 914 (107%, 100%, 99%, 97%); the code-to-percent conversion is sketched below.
− Headroom is clipped if you can’t see the difference between 107 and 100.
− You should see a difference between 100 and 99 (adjust monitor contrast down).
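The percentages above follow from the standard 10-bit narrow-range mapping (code 64 = 0%, code 940 = 100%). The short sketch below is only illustrative, but it reproduces the slide's numbers:

```python
# 10-bit narrow-range luma: code 64 maps to 0% and code 940 to 100%, so codes
# above 940 are super-white headroom.
def code_to_percent(code, black=64, white=940):
    return (code - black) / (white - black) * 100.0

for code in (1002, 940, 932, 914):
    print(code, round(code_to_percent(code)))   # 107, 100, 99, 97
```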
White Pluge
101
Correct Blown out
Chroma Upsampling Back to 4:4:4
102
4:4:4 4:2:2 to 4:4:4
4:2:0p to 4:4:4 4:2:0i to 4:4:4
− ± 4, 2, 1%.
− Left image: monitor brightness too high.
− Adjust down until 1st and 2nd columns merge (middle image).
− All third-column chips are different from the corresponding second-column chips.
− Right: monitor brightness too high, but column 1 (super black) has been clipped off upstream or in the
monitor.
Pluge
103
− Left: 10-bit shows no contour lines.
− Dithered 8-bit may look almost as smooth.
− Right: 8-bit video has rotating contour lines.
− Compression artifacts may also appear. (Contrast exaggerated; a minimal quantization sketch follows below.)
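The sketch below is a purely illustrative Python example (not the pattern's actual generator) of why coarse quantization produces contour lines on a smooth ramp and why dithering trades them for fine noise:

```python
import numpy as np

# Quantize a smooth ramp to 8 and 10 bits; dithering the 8-bit version breaks
# the steps up into noise instead of contour lines.
ramp = np.linspace(0.0, 1.0, 1920)                   # ideal continuous ramp
q8 = np.round(ramp * 255) / 255                      # 8 bits: coarse steps -> banding
q10 = np.round(ramp * 1023) / 1023                   # 10 bits: 4x finer steps
rng = np.random.default_rng(0)
dq8 = np.round(ramp * 255 + rng.uniform(-0.5, 0.5, ramp.size)) / 255  # dithered 8-bit

print(np.unique(q8).size, np.unique(q10).size)       # 256 vs 1024 distinct levels
print(round(1 / 255, 5), round(1 / 1023, 5))         # step sizes: 0.00392 vs 0.00098
```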
Bit Depth
104
− 12 color points around each gamut, with small chips desaturated by 5 delta-E.
− Reveals gamut mapping.
Gamut Bars
105
− Upper-field dominance means the odd lines come first; lower-field (even-field) dominance means the
even lines come first.
− Field dominance refers to which field of an interlaced video signal is chosen as the point at which video
edits or switches occur.
− NTSC and PAL are both lower-field (or even-field) dominant. This means they record all the even-
numbered lines of your image first, then the odd-numbered lines a fraction of a second later.
− Most HD formats are upper-field dominant, which means they record the odd-numbered lines first.
Field Dominance
106
Correct
Reversed
− Check skin tones and detail in hair and jewelry; check for highlight clipping and black clipping.
− The image is color-managed for gamma and RGB primaries.
Lip Sync
107
Correct: Tick is heard when center mark flashes.
Incorrect: Audio is 3 frames/fields early
− Demonstrates incorrect processing of chroma in interlaced 4:2:0 systems.
Chroma Motion Error
108
Correct
Incorrect
− When center-cropped to the indicated aspect ratio, the associated dashed lines will be the edge pixels
of the new image.
Border Marquee & Crop Lines
109
Correct: Top white dashed line is showing.
Incorrect: Top line is missing.
Questions??
Discussion!!
Suggestions!!
Criticism!!
110
More Related Content

What's hot

HDR and WCG Principles-Part 3
HDR and WCG Principles-Part 3HDR and WCG Principles-Part 3
HDR and WCG Principles-Part 3
Dr. Mohieddin Moradi
 
SECURICO CCTV BOOK
SECURICO CCTV BOOK SECURICO CCTV BOOK
SECURICO CCTV BOOK
Saurabh Tripathi
 
An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1    An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1
Dr. Mohieddin Moradi
 
HDR and WCG Principles-Part 1
HDR and WCG Principles-Part 1HDR and WCG Principles-Part 1
HDR and WCG Principles-Part 1
Dr. Mohieddin Moradi
 
3d tv broadcasting and distribution systems
3d tv broadcasting and distribution systems3d tv broadcasting and distribution systems
3d tv broadcasting and distribution systems
Abhiram Subhagan
 
HDR and WCG Principles-Part 2
HDR and WCG Principles-Part 2HDR and WCG Principles-Part 2
HDR and WCG Principles-Part 2
Dr. Mohieddin Moradi
 
Modern broadcast camera techniques, set up & operation
Modern broadcast camera techniques, set up & operationModern broadcast camera techniques, set up & operation
Modern broadcast camera techniques, set up & operation
Dr. Mohieddin Moradi
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Dr. Mohieddin Moradi
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Dr. Mohieddin Moradi
 
To Understand Video
To Understand VideoTo Understand Video
To Understand Video
adil raja
 
HDR and WCG Principles-Part 6
HDR and WCG Principles-Part 6HDR and WCG Principles-Part 6
HDR and WCG Principles-Part 6
Dr. Mohieddin Moradi
 
Understanding video technologies
Understanding video technologiesUnderstanding video technologies
Understanding video technologies
fionayoung
 
Dukane image pro 6540 hd
Dukane image pro 6540 hdDukane image pro 6540 hd
Dukane image pro 6540 hd
SchoolVision Inc.
 
What is In side a Television Broadcasting Station
What is In side a Television Broadcasting StationWhat is In side a Television Broadcasting Station
What is In side a Television Broadcasting Station
Anvar sha S
 
Unit iv
Unit ivUnit iv
Unit iv
swapnasalil
 
An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1   An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1
Dr. Mohieddin Moradi
 
Multimedia fundamental concepts in video
Multimedia fundamental concepts in videoMultimedia fundamental concepts in video
Multimedia fundamental concepts in video
Mazin Alwaaly
 
MPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video EncodingMPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video Encoding
Christian Kehl
 
Video signal-ppt
Video signal-pptVideo signal-ppt
Video signal-ppt
Deepa K C
 
DIC_video_coding_standards_07
DIC_video_coding_standards_07DIC_video_coding_standards_07
DIC_video_coding_standards_07
aniruddh Tyagi
 

What's hot (20)

HDR and WCG Principles-Part 3
HDR and WCG Principles-Part 3HDR and WCG Principles-Part 3
HDR and WCG Principles-Part 3
 
SECURICO CCTV BOOK
SECURICO CCTV BOOK SECURICO CCTV BOOK
SECURICO CCTV BOOK
 
An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1    An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1
 
HDR and WCG Principles-Part 1
HDR and WCG Principles-Part 1HDR and WCG Principles-Part 1
HDR and WCG Principles-Part 1
 
3d tv broadcasting and distribution systems
3d tv broadcasting and distribution systems3d tv broadcasting and distribution systems
3d tv broadcasting and distribution systems
 
HDR and WCG Principles-Part 2
HDR and WCG Principles-Part 2HDR and WCG Principles-Part 2
HDR and WCG Principles-Part 2
 
Modern broadcast camera techniques, set up & operation
Modern broadcast camera techniques, set up & operationModern broadcast camera techniques, set up & operation
Modern broadcast camera techniques, set up & operation
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
 
To Understand Video
To Understand VideoTo Understand Video
To Understand Video
 
HDR and WCG Principles-Part 6
HDR and WCG Principles-Part 6HDR and WCG Principles-Part 6
HDR and WCG Principles-Part 6
 
Understanding video technologies
Understanding video technologiesUnderstanding video technologies
Understanding video technologies
 
Dukane image pro 6540 hd
Dukane image pro 6540 hdDukane image pro 6540 hd
Dukane image pro 6540 hd
 
What is In side a Television Broadcasting Station
What is In side a Television Broadcasting StationWhat is In side a Television Broadcasting Station
What is In side a Television Broadcasting Station
 
Unit iv
Unit ivUnit iv
Unit iv
 
An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1   An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1
 
Multimedia fundamental concepts in video
Multimedia fundamental concepts in videoMultimedia fundamental concepts in video
Multimedia fundamental concepts in video
 
MPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video EncodingMPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video Encoding
 
Video signal-ppt
Video signal-pptVideo signal-ppt
Video signal-ppt
 
DIC_video_coding_standards_07
DIC_video_coding_standards_07DIC_video_coding_standards_07
DIC_video_coding_standards_07
 

Similar to Video Compression, Part 4 Section 2, Video Quality Assessment

Presentación Tesis 08022016
Presentación Tesis 08022016Presentación Tesis 08022016
Presentación Tesis 08022016
Universidad Politécnica de Madrid
 
NY Radiation Safety/Quality Assurance Program
NY Radiation Safety/Quality Assurance ProgramNY Radiation Safety/Quality Assurance Program
NY Radiation Safety/Quality Assurance Program
qubyx
 
Article on Remote Visual Inspection | Integrated NDE Solution.pdf
Article on Remote Visual Inspection | Integrated NDE Solution.pdfArticle on Remote Visual Inspection | Integrated NDE Solution.pdf
Article on Remote Visual Inspection | Integrated NDE Solution.pdf
integratedndesolutio
 
Machine Vision: The Key Considerations for Successful Visual Inspection
Machine Vision: The Key Considerations for Successful Visual InspectionMachine Vision: The Key Considerations for Successful Visual Inspection
Machine Vision: The Key Considerations for Successful Visual Inspection
Optima Control Solutions
 
Iec 62676 5 standardized spezification of image quality
Iec 62676 5 standardized spezification of image qualityIec 62676 5 standardized spezification of image quality
Iec 62676 5 standardized spezification of image quality
Henry Rutjes
 
Yole Intel RealSense 3D camera module and STM IR laser 2015 teardown reverse ...
Yole Intel RealSense 3D camera module and STM IR laser 2015 teardown reverse ...Yole Intel RealSense 3D camera module and STM IR laser 2015 teardown reverse ...
Yole Intel RealSense 3D camera module and STM IR laser 2015 teardown reverse ...
Yole Developpement
 
EYE GAZE SYSTEM
EYE GAZE SYSTEMEYE GAZE SYSTEM
EYE GAZE SYSTEM
PullareddyC
 
BTC302: Interim Report Sample
BTC302: Interim Report SampleBTC302: Interim Report Sample
BTC302: Interim Report Sample
Martin Uren
 
seminarppt-180313082529.pdf
seminarppt-180313082529.pdfseminarppt-180313082529.pdf
seminarppt-180313082529.pdf
Sathwika7
 
NMSL_2017summer
NMSL_2017summerNMSL_2017summer
NMSL_2017summer
Wen-Chih Lo
 
MVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
MVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly DetectionMVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
MVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
LEE HOSEONG
 
A survey on Measurement of Objective Video Quality in Social Cloud using Mach...
A survey on Measurement of Objective Video Quality in Social Cloud using Mach...A survey on Measurement of Objective Video Quality in Social Cloud using Mach...
A survey on Measurement of Objective Video Quality in Social Cloud using Mach...
IRJET Journal
 
qualification of instrument(UV & FTIR) BY Bhumi Suratiya, M.Pharm sem 2.pptx
qualification of instrument(UV & FTIR) BY Bhumi Suratiya, M.Pharm sem 2.pptxqualification of instrument(UV & FTIR) BY Bhumi Suratiya, M.Pharm sem 2.pptx
qualification of instrument(UV & FTIR) BY Bhumi Suratiya, M.Pharm sem 2.pptx
BhumiSuratiya
 
Barco medical display systems
Barco medical display systems   Barco medical display systems
Barco medical display systems
Santiago Scarsi
 
Proposal to supply, installation and testing of CCTV Camera
Proposal to supply, installation and testing of CCTV Camera Proposal to supply, installation and testing of CCTV Camera
Proposal to supply, installation and testing of CCTV Camera
Denis kisina
 
University roll (Sub code).pptx
University roll (Sub code).pptxUniversity roll (Sub code).pptx
University roll (Sub code).pptx
SKILL2021
 
Medical display calibration and qa
Medical display calibration and qaMedical display calibration and qa
Medical display calibration and qa
qubyx
 
Dualis Contour Brochure
Dualis Contour BrochureDualis Contour Brochure
Dualis Contour Brochure
ifm electronic gmbh
 
FDA - ELECTRONIC SECURITY- AFWEST INT. SECURITY - EMMANUEL ANIM-
FDA - ELECTRONIC SECURITY- AFWEST INT. SECURITY - EMMANUEL ANIM-FDA - ELECTRONIC SECURITY- AFWEST INT. SECURITY - EMMANUEL ANIM-
FDA - ELECTRONIC SECURITY- AFWEST INT. SECURITY - EMMANUEL ANIM-
Emmanuel Anim
 
Kingpak Resume
Kingpak ResumeKingpak Resume
Kingpak Resume
Ethan Juan
 

Similar to Video Compression, Part 4 Section 2, Video Quality Assessment (20)

Presentación Tesis 08022016
Presentación Tesis 08022016Presentación Tesis 08022016
Presentación Tesis 08022016
 
NY Radiation Safety/Quality Assurance Program
NY Radiation Safety/Quality Assurance ProgramNY Radiation Safety/Quality Assurance Program
NY Radiation Safety/Quality Assurance Program
 
Article on Remote Visual Inspection | Integrated NDE Solution.pdf
Article on Remote Visual Inspection | Integrated NDE Solution.pdfArticle on Remote Visual Inspection | Integrated NDE Solution.pdf
Article on Remote Visual Inspection | Integrated NDE Solution.pdf
 
Machine Vision: The Key Considerations for Successful Visual Inspection
Machine Vision: The Key Considerations for Successful Visual InspectionMachine Vision: The Key Considerations for Successful Visual Inspection
Machine Vision: The Key Considerations for Successful Visual Inspection
 
Iec 62676 5 standardized spezification of image quality
Iec 62676 5 standardized spezification of image qualityIec 62676 5 standardized spezification of image quality
Iec 62676 5 standardized spezification of image quality
 
Yole Intel RealSense 3D camera module and STM IR laser 2015 teardown reverse ...
Yole Intel RealSense 3D camera module and STM IR laser 2015 teardown reverse ...Yole Intel RealSense 3D camera module and STM IR laser 2015 teardown reverse ...
Yole Intel RealSense 3D camera module and STM IR laser 2015 teardown reverse ...
 
EYE GAZE SYSTEM
EYE GAZE SYSTEMEYE GAZE SYSTEM
EYE GAZE SYSTEM
 
BTC302: Interim Report Sample
BTC302: Interim Report SampleBTC302: Interim Report Sample
BTC302: Interim Report Sample
 
seminarppt-180313082529.pdf
seminarppt-180313082529.pdfseminarppt-180313082529.pdf
seminarppt-180313082529.pdf
 
NMSL_2017summer
NMSL_2017summerNMSL_2017summer
NMSL_2017summer
 
MVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
MVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly DetectionMVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
MVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
 
A survey on Measurement of Objective Video Quality in Social Cloud using Mach...
A survey on Measurement of Objective Video Quality in Social Cloud using Mach...A survey on Measurement of Objective Video Quality in Social Cloud using Mach...
A survey on Measurement of Objective Video Quality in Social Cloud using Mach...
 
qualification of instrument(UV & FTIR) BY Bhumi Suratiya, M.Pharm sem 2.pptx
qualification of instrument(UV & FTIR) BY Bhumi Suratiya, M.Pharm sem 2.pptxqualification of instrument(UV & FTIR) BY Bhumi Suratiya, M.Pharm sem 2.pptx
qualification of instrument(UV & FTIR) BY Bhumi Suratiya, M.Pharm sem 2.pptx
 
Barco medical display systems
Barco medical display systems   Barco medical display systems
Barco medical display systems
 
Proposal to supply, installation and testing of CCTV Camera
Proposal to supply, installation and testing of CCTV Camera Proposal to supply, installation and testing of CCTV Camera
Proposal to supply, installation and testing of CCTV Camera
 
University roll (Sub code).pptx
University roll (Sub code).pptxUniversity roll (Sub code).pptx
University roll (Sub code).pptx
 
Medical display calibration and qa
Medical display calibration and qaMedical display calibration and qa
Medical display calibration and qa
 
Dualis Contour Brochure
Dualis Contour BrochureDualis Contour Brochure
Dualis Contour Brochure
 
FDA - ELECTRONIC SECURITY- AFWEST INT. SECURITY - EMMANUEL ANIM-
FDA - ELECTRONIC SECURITY- AFWEST INT. SECURITY - EMMANUEL ANIM-FDA - ELECTRONIC SECURITY- AFWEST INT. SECURITY - EMMANUEL ANIM-
FDA - ELECTRONIC SECURITY- AFWEST INT. SECURITY - EMMANUEL ANIM-
 
Kingpak Resume
Kingpak ResumeKingpak Resume
Kingpak Resume
 

More from Dr. Mohieddin Moradi

Video Quality Control
Video Quality ControlVideo Quality Control
Video Quality Control
Dr. Mohieddin Moradi
 
SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2
Dr. Mohieddin Moradi
 
SDI to IP 2110 Transition Part 1
SDI to IP 2110 Transition Part 1SDI to IP 2110 Transition Part 1
SDI to IP 2110 Transition Part 1
Dr. Mohieddin Moradi
 
Broadcast Lens Technology Part 2
Broadcast Lens Technology Part 2Broadcast Lens Technology Part 2
Broadcast Lens Technology Part 2
Dr. Mohieddin Moradi
 
Broadcast Lens Technology Part 1
Broadcast Lens Technology Part 1Broadcast Lens Technology Part 1
Broadcast Lens Technology Part 1
Dr. Mohieddin Moradi
 
An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2
Dr. Mohieddin Moradi
 
An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4
Dr. Mohieddin Moradi
 
An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3
Dr. Mohieddin Moradi
 
Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2
Dr. Mohieddin Moradi
 
Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1
Dr. Mohieddin Moradi
 
An Introduction to Audio Principles
An Introduction to Audio Principles An Introduction to Audio Principles
An Introduction to Audio Principles
Dr. Mohieddin Moradi
 
Video Compression, Part 3-Section 2, Some Standard Video Codecs
Video Compression, Part 3-Section 2, Some Standard Video CodecsVideo Compression, Part 3-Section 2, Some Standard Video Codecs
Video Compression, Part 3-Section 2, Some Standard Video Codecs
Dr. Mohieddin Moradi
 
Video Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video CodecsVideo Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video Codecs
Dr. Mohieddin Moradi
 
Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts
Dr. Mohieddin Moradi
 
Video Compression, Part 2-Section 1, Video Coding Concepts
Video Compression, Part 2-Section 1, Video Coding Concepts Video Compression, Part 2-Section 1, Video Coding Concepts
Video Compression, Part 2-Section 1, Video Coding Concepts
Dr. Mohieddin Moradi
 
Video Compression Part 1 Video Principles
Video Compression Part 1 Video Principles Video Compression Part 1 Video Principles
Video Compression Part 1 Video Principles
Dr. Mohieddin Moradi
 

More from Dr. Mohieddin Moradi (16)

Video Quality Control
Video Quality ControlVideo Quality Control
Video Quality Control
 
SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2
 
SDI to IP 2110 Transition Part 1
SDI to IP 2110 Transition Part 1SDI to IP 2110 Transition Part 1
SDI to IP 2110 Transition Part 1
 
Broadcast Lens Technology Part 2
Broadcast Lens Technology Part 2Broadcast Lens Technology Part 2
Broadcast Lens Technology Part 2
 
Broadcast Lens Technology Part 1
Broadcast Lens Technology Part 1Broadcast Lens Technology Part 1
Broadcast Lens Technology Part 1
 
An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2
 
An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4
 
An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3
 
Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2
 
Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1
 
An Introduction to Audio Principles
An Introduction to Audio Principles An Introduction to Audio Principles
An Introduction to Audio Principles
 
Video Compression, Part 3-Section 2, Some Standard Video Codecs
Video Compression, Part 3-Section 2, Some Standard Video CodecsVideo Compression, Part 3-Section 2, Some Standard Video Codecs
Video Compression, Part 3-Section 2, Some Standard Video Codecs
 
Video Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video CodecsVideo Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video Codecs
 
Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts
 
Video Compression, Part 2-Section 1, Video Coding Concepts
Video Compression, Part 2-Section 1, Video Coding Concepts Video Compression, Part 2-Section 1, Video Coding Concepts
Video Compression, Part 2-Section 1, Video Coding Concepts
 
Video Compression Part 1 Video Principles
Video Compression Part 1 Video Principles Video Compression Part 1 Video Principles
Video Compression Part 1 Video Principles
 

Recently uploaded

Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
ijaia
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
Engineering Standards Wiring methods.pdf
Engineering Standards Wiring methods.pdfEngineering Standards Wiring methods.pdf
Engineering Standards Wiring methods.pdf
edwin408357
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
PIMR BHOPAL
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
upoux
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
nedcocy
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
shadow0702a
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
Paris Salesforce Developer Group
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
Yasser Mahgoub
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
morris_worm_intro_and_source_code_analysis_.pdf
morris_worm_intro_and_source_code_analysis_.pdfmorris_worm_intro_and_source_code_analysis_.pdf
morris_worm_intro_and_source_code_analysis_.pdf
ycwu0509
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
AjmalKhan50578
 

Recently uploaded (20)

Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
Engineering Standards Wiring methods.pdf
Engineering Standards Wiring methods.pdfEngineering Standards Wiring methods.pdf
Engineering Standards Wiring methods.pdf
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
morris_worm_intro_and_source_code_analysis_.pdf
morris_worm_intro_and_source_code_analysis_.pdfmorris_worm_intro_and_source_code_analysis_.pdf
morris_worm_intro_and_source_code_analysis_.pdf
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
 

Video Compression, Part 4 Section 2, Video Quality Assessment

  • 2. Section I − Perceptual Artifacts in Compressed Video − Quality Assessment in Compressed Video − Objective Assessment of Compressed Video − Objective Assessment of Compressed Video, Codec Assessment for Production Section II − Subjective Assessment of Compressed Video − Subjective Assessment of Compressed Video, Subjective Assessment by Expert Viewing − Performance Comparison of Video Coding Standards: An Adaptive Streaming Perspective − Subjective Assessment by Visualizer™ Test Pattern 2 Outline
  • 4. 4 Upper anchor, Source material Material to be evaluated Lower anchor , Worst case VRT-medialab: onderzoek en innovatie Subjective Assessment EBU Triple Stimulus Continuous Evaluation Scale (TSCES) quality assessment rig
  • 5. In subjective evaluation, a panel of human observers are given the task to rate the quality of image or video sequence. − There are normally 15 to 25 subjects, but the minimum is 15 − Subjects should have normal HVS (age 18-30) − They should NOT be familiar with Image degradations (but now it is changed!) − The average values of the score is called: Mean Opinion Score (MOS) − MOS is normally defined in the range of 1-5 Subjective Assessment 5
  • 6. − Golden Standard • Provides the best predictive value of actual subjective ratings − Absolute Category Rating (ACR) • Human subjects are shown two video sequences (original and processed) and are asked to assess the overall quality of the processed sequence with respect to the original (reference) sequence − Mean Opinion Score (MOS) • Numerical measure of the human-judged overall quality • On a scale of 1 (bad), 2 (poor), 3 (fair), 4 (good), 5 (excellent) − DMOS (Differential Mean Opinion Scores) • It is defined as the difference between the raw quality score of the reference and test images. Mean Opinion Score (MOS) 6
  • 7. Laboratory Viewing Environment − The laboratory viewing environment is intended to provide critical conditions to check systems. − The laboratory viewing environment parameters have been selected to define an environment slightly more critical than the typical home viewing situations. Home Viewing Environment − The home viewing environment is intended to provide a means to evaluate quality at the consumer side of the TV chain. General Viewing Conditions 7
  • 8. In a Laboratory Environment The assessors’ viewing conditions should be arranged as follows: − Room illumination: low − Chromaticity of background: D65 − Peak luminance (it should be adjusted according to the room illumination): 70-250 𝒄𝒅/𝒎 𝟐 − Monitor contrast ratio: ≤ 0.02 − Ratio of luminance of background behind picture monitor to peak luminance of picture:≃0.15 General Viewing Conditions for Subjective Assessments 8
  • 9. In Home Environment The assessors’ viewing conditions should be arranged as follows: − Environmental illuminance on the screen (incident light from the environment falling on the screen, should be measured perpendicularly to the screen): 200 lux − Peak luminance (it should be adjusted according to the room illumination): 70-500 𝒄𝒅/𝒎 𝟐 − Ratio of luminance of inactive screen to peak luminance monitor contrast ratio: ≤ 0.02 General Viewing Conditions for Subjective Assessments 9
  • 10. Preferred Viewing Distance (PVD) − The PVD is based upon viewers’ preferences which have been determined empirically. − The PVD contains a number of data sets collected from available sources. − This information may be referred to for designing a subjective assessment test. Viewing Distance 10
  • 11. Design Viewing Distance (DVD). − The design viewing distance (DVD), or optimal viewing distance, for a digital system is the distance at which two adjacent pixels subtend an angle of 1 arc-min at the viewer’s eye; and the optimal horizontal viewing angle as the angle under which an image is seen at its optimal viewing distance. Viewing Distance 11
  • 12. − Using monitors with different characteristics will yield different subjective image qualities. − It is therefore strongly recommended that characteristics of the monitors used should be checked beforehand. • Recommendation ITU-R BT.1886 – Reference electro-optical transfer function for Flat Panel Displays (FPD) used in HDTV studio production • Report ITU-R BT.2129 – User requirements for a Flat Panel Display (FPD) as a Master monitor in an HDTV programme production environment may be referred to when professional FPD monitors are used for subjective assessment. • Report ITU-R BT.2390 provides information concerning laboratory and home displays and viewing environments for the assessment of High Dynamic Range (HDR) images. The Monitor 12
  • 13. − Monitor processing such as image scaling, frame rate conversion, image enhancer, if implemented, should be done in such a way as to avoid introducing artefacts. − HDR processing should be appropriate to the HDR system being assessed or being used during the assessment. For consumer environment or distribution assessments, this may include the use of the appropriate static or dynamic metadata. − Full details of such metadata should be included in the notes of the assessments in order that other laboratories can accurately repeat the assessments. Monitor Processing 13
  • 14. − When using consumer displays for subject image assessments, it is important that all image-processing options are disabled unless the impact of such image processing is the subject of the assessment(s). − When accessing interlace images, the test report should indicate whether de-interlacer has been used or not. It is preferable not to use de-interlacer if interlaced signals can be displayed without it. Monitor Processing 14
  • 15. − The resolution of professional monitors usually complies with the required standards for subjective assessments in their luminance operating range. − To check and report the maximum and minimum resolutions (center and corners of the screen) at the used luminance value might be suggested. − If consumer FPD TV displays are used for subjective assessments, it is strongly recommended to check and report the maximum and minimum resolutions (center and corners of the screen) at the used luminance value. − At present the most practical system available to subjective assessments performers, in order to check monitors or consumer TV sets resolutions, is the use of a swept test pattern electronically generated. Monitor resolution 15
  • 16. Contrast − Contrast could be strongly influenced by the environment illuminance. − Professional monitors seldom use technologies to improve their contrast in a high illuminance environment, so it is possible they do not comply with the requested contrast standard if used in a high illuminance environment. − Consumer monitors typically use technologies to get a better contrast in a high illuminance environment. Brightness − When adjusting the LCD monitor brightness, it is preferable to use backlight intensity control rather than using “signal level” scaling to retain the bit precision. − In the case of other display technologies that do not use a backlight, the white level should be adjusted by means other than signal level scaling. Monitor Contrast and Brightness 16
  • 17. − The display should not introduce motion artefacts that are introduced by specific display technologies. − On the other hand, the motion effects included in the input signal should be represented on the display. − When using consumer displays, it is vital that ALL motion processing options are disabled. Monitor Motion Processing Options 17
  • 18. Assessment problem Material used Overall performance with average material General, “critical but not unduly so” Capacity, critical applications (e.g. contribution, post- processing, etc.) Range, including very critical material for the application tested Performance of “adaptive” systems Material very critical for “adaptive” scheme used Identify weaknesses and possible improvements Critical, attribute-specific material Identify factors on which systems are seen to vary Wide range of very rich material Conversion among different standards Critical for differences (e.g. field rate) A Survey of Typical Assessment Problems and Test Materials Used to Address Them ITU-R Test Sequences − A number of organizations have developed test still images and sequences. − Report ITU-R BT.2245 - HDTV and UHDTV including HDR-TV test materials for assessment of picture quality, gives details of HDTV and UHDTV test material that can be used for subjective assessment. 18
  • 19. − Observers may be expert or non-expert depending on the objectives of the assessment. − An expert observer is an observer that has expertise in the image artefacts that may be introduced by the system under test. − A non-expert (“naive”) observer is an observer that has no expertise in the image artefacts that may be introduced by the system under test. − In any case, observers should not be, or have been, directly involved, i.e. enough to acquire specific and detailed knowledge, in the development of the system under study. Expert and Non-expert Observers 19
  • 20. − At least 15 observers should be used. − The number of assessors needed depends upon the sensitivity and reliability of the test procedure adopted and upon the anticipated size of the effect sought. − For studies with limited scope, e.g. of exploratory nature, fewer than 15 observers may be used. In this case, the study should be identified as ‘informal’. − The level of expertise in television image quality assessment of the observers should be reported. Observers Quantity 20
  • 21. − Prior to a session, the observers should be screened for (corrected-to-) normal visual acuity on the Snellen or Landolt chart, and for normal colour vision using specially selected charts (Ishihara, for instance). − A study of consistency between results at different testing laboratories has found that systematic differences can occur between results obtained from different laboratories. − A possible explanation for the differences between different laboratories is that there may be different skill levels amongst different groups of assessors. − Suggested data to be provided could include an occupation category (e.g. broadcast organization employee, university student, office worker, ...), gender, and age range. Observer Screening and Test Consistency 21
  • 22. − Assessors should be carefully introduced to • the method of assessment • the types of impairment or quality factors likely to occur • the grading scale • the sequence and timing − Training sequences demonstrating the range and the type of the impairments to be assessed should be used with illustrating images other than those used in the test, but of comparable sensitivity. − In the case of quality assessments, quality may be defined as to consist of specific perceptual attributes. Instructions for the Assessment 22
  • 24. − A session should not last more than half an hour. − At the beginning of the first session, about five “dummy presentations” should be introduced to stabilize the observers’ opinion. − The data issued from these presentations must not be considered in the results of the test. − If several sessions are necessary, about three dummy presentations are only necessary at the beginning of the following session. − A random order should be used for the presentations (for example, derived from Graeco-Latin squares); but the test condition order should be arranged so that any effects on the grading of tiredness or adaptation are balanced out from session to session. − Some of the presentations can be repeated from session to session to check coherence. The Test Session 24
  • 25. − For each test parameter, the mean and 95% confidence interval of the statistical distribution of the assessment grades must be given. − The results must be given together with the following information: • details of the test configuration; • details of the test materials; • type of image source and display monitors; • number and type of assessors; • reference systems used; • the grand mean score for the experiment; • original and adjusted mean scores and 95% confidence interval if one or more observers have been eliminated according to the procedure given below. Presentation of the Results 25
  • 26. When the time-varying picture quality of the processed video without reference is evaluated by the subjects, the SSCQE is preferred. − Viewers are shown each video sequence one by one and asked them to vote. − Each sequence is shown only once and it consists of 30 seconds to several minutes of duration. − Since video sequences are long, they are segmented into 10s shots, and for each video segment an MOS is calculated. − During the presentation of sequence, viewers continuously provide their rating scores by using a slider. − The variations in slider are observed after every 500 msec which provides the continuous scale reading. − The rating scale is divided into five equal lengths as explained for DSCQS. Single Stimulus Continuous Quality Evaluation (SSCQE) 26
  • 27. − A typical assessment might call for an evaluation of either a new system, or the effect of a transmission path impairment. − The double-stimulus (EBU) method is cyclic in that the assessor is first presented with an unimpaired reference, then with the same image impaired. − Following this, he is asked to vote on the second, keeping in mind the first. The Double-stimulus Impairment Scale (DSIS) 27
  • 28. − In sessions, which last up to half an hour, the assessor is presented with a series of images or sequences in random order and with random impairments covering all required combinations. − The unimpaired image is included in the images or sequences to be assessed. − At the end of the series of sessions, the mean score for each test condition and test image is calculated. − The method uses the impairment scale, for which it is usually found that the stability of the results is greater for small impairments than for large impairments. − Although the method sometimes has been used with limited ranges of impairments, it is more properly used with a full range of impairments. The Double-stimulus Impairment Scale (DSIS) 28
  • 29. − Assessors are presented with a series of test images or sequences. They are arranged in pairs such that the first in the pair comes direct from the source, and the second is the same image via the system under test. The Double-stimulus Impairment Scale (DSIS) Grading scales • Imperceptible • Perceptible, But Not Annoying • Slightly Annoying • Annoying • Very Annoying 29 Source Signal or System Under Test.
  • 30. Presentation of the Test Material − A test session comprises a number of presentations. − There are two variants to the structure of presentations, I and II outlined below. • Variant I: The reference image or sequence and the test image or sequence are presented only once. • Variant II: The reference image or sequence and the test image or sequence are presented twice. It is more time consuming, may be applied if the discrimination of very small impairments is required or moving sequences are under test. The Double-stimulus Impairment Scale (DSIS) T1 = 10s Reference image T2 = 3s Mid-grey produced by video level of around 200 mV T3 = 10s Test condition T4 = 5 to 11s Mid-grey Experience suggests that extending the periods T1 and T3 beyond 10 s does not improve the assessor’s ability to grade the image sequences. 30
  • 31. The Introduction to the Assessments − At the beginning of each session, an explanation is given to the observers about the type of assessment, the grading scale, the sequence and timing (reference image, grey, test image, voting period). − The range and type of the impairments to be assessed should be illustrated on images other than those used in the tests, but of comparable sensitivity. − It must not be implied that the worst quality seen necessarily corresponds to the lowest subjective grade. − Observers should be asked to base their judgement on the overall impression given by the image, and to express these judgements in terms of the wordings used to define the subjective scale. − The observers should be asked to look at the image for the whole of the duration of T1 and T3. − Voting should be permitted only during T4. The Double-stimulus Impairment Scale (DSIS) 31
  • 32. The Test Session − The images and impairments should be presented in a pseudo-random sequence and, preferably in a different sequence for each session. − In any case, the same test image or sequences should never be presented on two successive occasions with the same or different levels of impairment. − The range of impairments should be chosen so that all grades are used by the majority of observers; a grand mean score (averaged overall judgements made in the experiment) close to three should be aimed at. − A session should not last more than roughly half an hour, including the explanations and preliminaries; the test sequence could begin with a few images indicative of the range of impairments; judgements of these images would not be taken into account in the final results. The Double-stimulus Impairment Scale (DSIS) 32
  • 33. − Video sequences are presented to observers in pair and the first video will always be reference video followed by the coded video sequence, and instead of quality, impairment is rated. − For voting: there are two methods used: 1st method: the reference video is followed by coded video and viewers are asked to vote. 2nd method: both reference and coded video sequences are presented twice to the observers and then asked them for vote as illustrated in figure. Double Stimulus Impairment Scale (DSIS), Summary 33
  • 34. − A typical assessment might call for evaluation of a new system or of the effects of transmission paths on quality. − The double-stimulus method is thought to be especially useful when it is not possible to provide test stimulus test conditions that exhibit the full range of quality. − The method is cyclic in that the assessor is asked to view a pair of images, each from the same source, but one via the process under examination, and the other one directly from the source. − He is asked to assess the quality of both. − In sessions which last up to half an hour, the assessor is presented with a series of image pairs (internally random) in random order, and with random impairments covering all required combinations. − At the end of the sessions, the mean scores for each test condition and test image are calculated. The Double-stimulus Continuous Quality-scale (DSCQS) 34
  • 35. General arrangement for test system for DSCQS method The Double-stimulus Continuous Quality-scale (DSCQS) T1 = 10 s Reference image T2 = 3 s Mid-grey produced by video level of around 200 mV T3 = 10 s Test condition T4 = 5 to 11 s Mid-grey 35
  • 36. Presentation of the Test Material Variant I (a single observer): − The assessor is free to switch between the A and B signals until the assessor has the mental measure of the quality associated with each signal. − The assessor may typically choose to do this two or three times for periods of up to 10s. Variant II (use a number of observers simultaneously): − Prior to recording results, the pair of conditions is shown one or more times for an equal length of time to allow the assessor to gain the mental measure of the qualities associated with them, then the pair is shown again one or more times while the results are recorded. − The number of repetitions depends on the length of the test sequences. − For still images, a 3-4 s sequence and five repetitions (voting during the last two) may be appropriate. − For moving images with time-varying artefacts, a 10s sequence with two repetitions (voting during the second) may be appropriate. The Double-stimulus Continuous Quality-scale (DSCQS) 36
  • 37. Grading Scale − The method requires the assessment of two versions of each test image. − One of each pair of test images is unimpaired while the other presentation might or might not contain an impairment. − The unimpaired image is included to serve as a reference, but the observers are not told which the reference image is. − In the series of tests, the position of the reference image is changed in pseudo-random fashion. − The observers are simply asked to assess the overall image quality of each presentation by inserting a mark on a vertical scale. − The vertical scales are printed in pairs to accommodate the double presentation of each test image. − The scales provide a continuous rating system to avoid quantizing errors, but they are divided into five equal lengths which correspond to the normal ITU-R five-point quality scale. The Double-stimulus Continuous Quality-scale (DSCQS) 37
  • 38. Analysis of the Results − The pairs of assessments (reference and test) for each test condition are converted from measurements of length on the score sheet to normalized scores in the range 0 to 100. − Then, the differences between the assessment of the reference and the test condition are calculated. − Experience has shown that the scores obtained for different test sequences are dependent on the criticality of the test material used. − A more complete understanding of codec performance can be obtained by presenting results for different test sequences separately, rather than only as aggregated averages across all the test sequences used in the assessment. − If results for individual test sequences are arranged in a rank order of test sequence criticality on an abscissa it is possible to present a crude graphical description of the image content failure characteristic of the system under test. The Double-stimulus Continuous Quality-scale (DSCQS) 38
  • 39. Interpretation of the Results − When using this DSCQS method, it could be hazardous, and even wrong, to derive conclusions about the quality of the conditions under test by associating numerical DSCQS values with adjectives coming from other tests protocols (e.g. imperceptible, perceptible but not annoying, ... coming from the DSIS method). − It is noted that results obtained from the DSCQS method should not be treated as absolute scores but as differences of scores between a reference condition and a test condition. − Thus, it is erroneous to associate the scores with a single quality description term even with those which come from the DSCQS protocol itself (e.g. excellent, good, fair, ...). − In any test procedure it is important to decide acceptability criteria before the assessment is commenced. − This is especially important when using the DSCQS method because of the tendency for inexperienced users to misunderstand the meaning of the quality scale values produced by the method. The Double-stimulus Continuous Quality-scale (DSCQS) 39
• 40. − Each assessment consists of a pair of video sequences A then B followed by the same sequences A and B. − The order of the reference and coded video sequences is shuffled and is unknown to the viewers. − Viewers are asked to rate the quality of both video sequences. − Each video sequence typically has a duration of 8 to 10 seconds. − The voting pattern of DSCQS is illustrated in the figure below. Double Stimulus Continuous Quality Scale (DSCQS), Summary 40 − Viewers rate the quality on a continuous scale which is divided into five equal intervals: – Excellent for (100-80), – Good for (79-60), – Fair for (59-40), – Poor for (39-20) – Bad for (19-0) − The final score is rated from 1 (Bad) to 5 (Excellent)
• 41. − For distribution applications, quality specifications can be expressed in terms of the subjective judgement of observers. − Such codecs can therefore, in theory, be assessed subjectively against these specifications. − The quality of a codec designed for contribution applications, however, cannot in theory be specified in terms of subjective performance parameters because its output is destined not for immediate viewing, but for studio post-processing, storing and/or coding for further transmission. − Because of the difficulty of defining this performance for a variety of post-processing operations, the approach preferred has been to specify the performance of a chain of equipment, including a post-processing function, which is thought to be representative of a practical contribution application. Subjective Assessment for Distribution or Contribution Codec 41
  • 42. − This chain might typically consist of a codec, followed by a studio post-processing function (or another codec in the case of basic contribution quality assessment), followed by yet another codec before the signal reaches the observer. − Adoption of this strategy for the specification of codecs for contribution applications means that the measurement procedures given in this Recommendation can also be used to assess them. − The most completely reliable method of evaluating the ranking order of high-quality codecs is to assess all the candidate systems at the same time under identical conditions. − Tests made independently, where fine differences of quality are involved, should be used for guidance rather than as indisputable evidence of superiority. Subjective Assessment for Distribution or Contribution Codec 42
  • 43. − Where a codec is being assessed for distribution applications, this quality refers to pictures decoded after a single pass through a codec pair. − For contribution codecs, basic quality may be assessed after several codecs in series, in order to simulate a typical contribution application. − Where the range of quality to be assessed is small, as will normally be the case for television codecs, the testing methodology to be used is variant II of the double-stimulus continuous quality-scale. − The original source sequence will be used as the reference condition. Further consideration is being given to the duration of presentation sequences. − In the recent tests on codecs for 4:2:2 component video, it was considered advantageous to modify the presentation from that given in this Recommendation. − Composite pictures were used as an additional reference to provide a lower quality level against which to judge the codec performance. Evaluations of Basic Picture Quality 43
• 44. − It is recommended that at least six picture sequences be used in the assessment, plus an additional one to be used for training purposes prior to the start of the trial. − The sequences should range between moderately critical and critical in the context of the bit-rate reduction application being considered. − The importance of testing digital codecs with picture sequences which are critical in the context of television bit-rate reduction is stressed. • It is therefore reasonable to ask: • how critical a particular image sequence is for a particular bit-rate reduction task, or • whether one sequence is more critical than another. Evaluations of Basic Picture Quality 44
  • 45. − A simple but not especially helpful answer is that “criticality” means very different things to different codecs. − For example: • To an intrafield codec a still picture containing much detail could well be critical. • To an interframe codec a still picture containing much detail would present no difficulty at all. Critical Sequences to All Classes of Codec − Moving Texture + Complex motion → These types of sequences are most useful to generate or identify. • Complex motion may take the form of movements which are predictable to an observer but not to coding algorithms, such as tortuous periodic motion. Evaluations of Basic Picture Quality 45
• 46. Fixed Word-length Intra-field Codecs − While it is possible and valid to assess these codecs on still images, the use of moving sequences is recommended since coding noise processes are easier to observe and this is more representative of television applications. − If still images are used in computer simulations of codecs, processing should be performed over the entire assessment sequence in order to preserve temporal aspects of any source noise, for example. − The scenes chosen should contain as many as possible of the following details: • static and moving textured areas (some with coloured texture) • static and moving objects with sharp high contrast edges at various orientations (some with colour) • static plain mid-grey areas. − At least one sequence in the ensemble should exhibit just perceptible source noise − At least one sequence should be synthetic (i.e. computer generated) so that it is free from camera imperfections such as scanning aperture and lag. Evaluations of Basic Picture Quality 46
  • 47. Fixed Word-length Interframe Codecs − The test scenes chosen should all contain movement and as many as possible of the following details: • moving textured areas (some coloured) • objects with sharp, high contrast edges moving in a direction perpendicular to these edges and at various orientations (some coloured) − At least one sequence in the ensemble should exhibit just perceptible source noise − At least one sequence should be synthetic. Evaluations of Basic Picture Quality 47
  • 48. Variable Word-length Intra-field Codecs − It is recommended that these codecs be tested with moving image sequence material for the same reasons as the fixed word-length codecs. − It should be noted that by virtue of its variable word-length coding and associated buffer store, these codecs can dynamically distribute coding bit-capacity throughout the image. − Thus, for example, if half of a picture consists of a featureless sky which does not require many bits to code, capacity is saved for the other parts of the picture which can therefore be reproduced with high quality even if they are critical. − The important conclusion from this is that if a picture sequence is to be critical for such a codec, the content of every part of the screen should be detailed. Evaluations of Basic Picture Quality 48
• 49. Variable Word-length Intra-field Codecs − So if a picture sequence is to be critical for such a codec, the content of every part of the screen should be detailed. − It should be filled with: • moving and static texture • as much colour variation as possible • objects with sharp, high contrast edges − At least one sequence in the test ensemble should exhibit just perceptible source noise − At least one sequence should be synthetic. Evaluations of Basic Picture Quality 49
  • 50. Variable Word-length Interframe Codecs − This is the most sophisticated class of codec and the kind which requires the most demanding material to stress it. − Not only should every part of the scene be filled with detail as in the intra-field variable word-length case, but this detail should also exhibit motion. − Furthermore, since many codecs employ motion compensation methods, the motion throughout the sequence should be complex. Evaluations of Basic Picture Quality 50
  • 51. Variable Word-length Interframe Codecs − Examples of complex motion are: • scenes employing simultaneous zooming and panning of a camera • a scene which has as a background a textured or detailed curtain blowing in the wind • a scene containing objects which are rotating in the three-dimensional world • scenes containing detailed objects which accelerate across the screen. − All scenes should contain substantial motion of objects with different velocities, textures and high contrast edges as well as a varied colour content. − At least one sequence in the test ensemble should exhibit just perceptible source noise − At least one sequence should have complex computer-generated camera motion from a natural still picture (so that it is free from noise and camera lag) − At least one sequence should be entirely computer generated. Evaluations of Basic Picture Quality 51
  • 52. − This assessment is intended to permit judgement to be made on the suitability of a codec for contribution applications with respect to a particular post-process e.g. colour matte, slow motion, electronic zoom. − The minimum arrangement of equipment for such an assessment is a single pass through the codec under test, followed by the post-process of interest, followed by the viewer. − It may, however, be more representative of a contribution application to employ further codecs after the post-process. − The test methodology to be used is variant II of the double-stimulus continuous quality-scale method (use a number of observers simultaneously). − Here however the reference condition will be the source subjected to the same post-processing as the decoded pictures. − If inclusion of a lower quality reference is considered to be advantageous then it too should be subjected to the same post-process. Evaluations of Picture Quality after Downstream Processing 52
  • 53. − Test sequences required for post-processing assessments are subject to exactly the same criticality criteria as sequences for other digital applications. − Because of the practical constraints of possibly having to assess a codec with several post-processes, the number of test picture sequences used may be a minimum of three with an additional one available for demonstration purposes. − The nature of the sequences will be dependent upon the post-processing task being studied but should range between moderately critical and critical in the context of television bit-rate reduction and for the process under consideration. − For slow motion assessment a display rate of 1/10th of the source rate may be suitable. Evaluations of Picture Quality after Downstream Processing 53
  • 54. − In subjective assessments of impairments in codec images, due to imperfections in the transmission or emission channel, a minimum of five, but preferably more, bit-error ratios or selected transmission/emission conditions should be chosen, approximately logarithmically spaced and adequately sampling the range which gives rise to codec impairments from “imperceptible” to “very annoying”. − It is possible that codec assessments could be required at transmission bit error ratios which result in visible transients so infrequent that they may not be expected to occur during a 10s test sequence period. − The presentation timing suggested here is clearly not suitable for such tests. Evaluations of Failure Characteristics 54
• 55. − If recordings of a codec output under fairly low bit error ratio conditions (resulting in a small number of visible transients within a 10s period) are to be made for later editing into subjective assessment presentations, care should be taken to ensure that the recording used is typical of the codec output viewed over a longer time-span. − Because of the need to explore codec performance over a range of transmission bit error ratios, practical constraints suggest that three test image sequences with an additional demonstration sequence will probably be adequate. − Sequences should be of the order of 10s in duration but it should be noted that test viewers may prefer a duration of 15-30 s. − They should range between moderately critical and critical in the context of television bit-rate reduction. − As the tests will span the full range of impairment, the double-stimulus impairment scale (DSIS) method is appropriate and should be used. Evaluations of Failure Characteristics 55
• 57. Display selection and set-up − The display used should be a flat panel display with performance typical of professional applications (e.g. broadcasting studios or vans) − The display diagonal may vary from 22 inches (minimum) to 40 inches (suggested), but it may extend to 50 inches or more when image systems with a resolution of HDTV or higher are assessed. − It is allowed to use a reduced portion of the active viewing area of a display; in this case the area around the active part of the display should be set to mid-grey. In this condition of use the display should not be set to a resolution different from its native one. − The display should allow proper set-up and calibration for luminance and colour, using a professional light-meter instrument. The calibration of the display should comply with the parameters specified in the relevant Recommendation for the test being undertaken. 57 Subjective Assessment by Expert Viewing
• 58. Viewing distance − The viewing distance at which the experts are seated should be chosen according to the resolution of the screen and to the height of the active part of the screen, following the design viewing distance described in Recommendation ITU-R BT.2022, or a shorter viewing distance where more critical viewing conditions are required. Viewing conditions − An expert viewing protocol (EVP) experiment need not necessarily be run in a test laboratory, but it is important that the testing location is protected from audible and/or visible disturbances (e.g. a quiet office or meeting room may be used as well). − Any direct or reflected source of light falling on the screen should be eliminated; other ambient light should be low, kept to the minimum level that still allows scoring sheets (if used) to be filled in. − The number of experts seated in front of the monitor may vary according to the screen size, in order to guarantee the same image rendering and stimulus presentation for all the viewers. 58 Subjective Assessment by Expert Viewing
• 59. Viewers − The viewers participating in an EVP experiment should be experts in the domain of study. − Viewers do not necessarily need to be screened for visual acuity or colour blindness, since they should be chosen from among qualified persons. − The minimum number of different viewers should be nine. − To reach the minimum number of viewers, the same experiment may be repeated at the same location, or conducted in more than one location. The scores from the different locations participating in an expert viewing session may be statistically processed together. 59 Subjective Assessment by Expert Viewing
• 60. The basic test cell − A basic test cell (BTC) is built for each pair of coding conditions to be assessed. − The source reference sequence (SRC) and the processed video sequence (PVS) clips considered in a BTC should always be related to the same video sequence, so that the experts are able to identify any improvement in visual quality provided by the compression algorithms under test. 60 Subjective Assessment by Expert Viewing The BTC should be organised as follows: − 0.5 seconds with the screen set to mid-grey (the mean value of the luminance scale); − 10 seconds presentation of the reference uncompressed video clip; − 0.5 seconds showing the message “A” (first video to assess) on a mid-grey background; − 10 seconds presentation of an impaired version of the video clip; − 0.5 seconds showing the message “B” (second video to assess) on a mid-grey background; − 10 seconds presentation of an impaired version of the video clip; − 5 seconds showing a message that asks the viewers to express their opinion. The message “Vote” should be followed by a number that helps viewers stay synchronised on the scoring sheet.
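A minimal sketch of how one BTC could be laid out as a presentation playlist (Python, with placeholder clip names; the timings are those listed above):

```python
def build_btc(src_clip, pvs_a, pvs_b, vote_number):
    """Return one basic test cell as a list of (item, duration_in_seconds) pairs."""
    return [
        ("mid-grey", 0.5),
        (src_clip, 10.0),                       # uncompressed reference
        ("message: A", 0.5),
        (pvs_a, 10.0),                          # first impaired version
        ("message: B", 0.5),
        (pvs_b, 10.0),                          # second impaired version
        (f"message: Vote {vote_number}", 5.0),  # voting prompt with sync number
    ]
```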
• 61. Scoring sheet and rating scale − The presentation of the video clips should be arranged in such a way that the unimpaired reference (SRC) is shown first, followed by the two impaired video sequences (PVSs). − The order of presentation of the PVSs should be randomly changed for each BTC and the viewers should not know the order of presentation. 61 Subjective Assessment by Expert Viewing
• 62. Test design and session creation − The BTCs are arranged in a random order by the test designer, in such a way that the same video clip is not shown in two consecutive BTCs, and neither is the same impaired clip. − Any viewing session should begin with a “stabilization phase” including the “best”, the “worst” and two “mid quality” BTCs among those included in each test session. − This allows the viewers to get an immediate impression of the quality range right at the beginning of the test session. − If the viewing session is longer than 20 minutes, the test designer should split it into two (or more) separate viewing sessions, each of them not exceeding 20 minutes. In this case, the “stabilization phase” should be provided before each viewing session. 62 Subjective Assessment by Expert Viewing
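One possible way to build such an ordering is simple rejection sampling, sketched below in Python; the only constraint checked is the one stated above (the same source content never appears in two consecutive BTCs), and all names are illustrative:

```python
import random

def order_btcs(btcs, max_tries=1000):
    """btcs: list of (content_id, btc) pairs. Returns a shuffled order in which
    no two consecutive BTCs share the same source content."""
    for _ in range(max_tries):
        candidate = btcs[:]
        random.shuffle(candidate)
        if all(a[0] != b[0] for a, b in zip(candidate, candidate[1:])):
            return candidate
    raise RuntimeError("no valid ordering found; relax the constraint or retry")
```

The stabilization BTCs would then be prepended to the session ahead of this shuffled list.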
• 63. Training − Even though this procedure is intended for use with expert participants, a short (5-6 BTC) training session should preferably be organised prior to each experiment. − The video material used in the training session may be the same as that used during the actual sessions, but the order of presentation should be different. − The viewers should be trained on the use of the 11-grade scale by asking them to look carefully at the video clips shown immediately after the messages “A” and “B” on the screen, and check whether they can see any difference from the video clip shown first (the SRC). 63 Subjective Assessment by Expert Viewing
• 64. Data collection and processing − The scores should be collected at the end of each session and logged in an electronic spreadsheet to compute the MEAN values. − A “post screening” of the viewers should preferably be performed, using a linear Pearson correlation. − The “correlation” function should be applied considering all the scores of each subject in relation to the mean opinion scores (MOS); a threshold may be set to define each viewer as “acceptable” or “rejected” (Recommendation ITU-T P.913 suggests the use of a “reject” threshold value equal to 0.75). Terms of use of the expert viewing protocol results − The expert viewing protocol (EVP) may be used when time and resources do not allow running a formal subjective assessment experiment. − EVP requires less time than a formal subjective assessment and may be executed in an “informal” environment. The only mandatory conditions are related to the ambient illumination and to the viewing conditions (display, angle of observation and viewing distance) as described in the above paragraphs. 64 Subjective Assessment by Expert Viewing
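A minimal sketch of the post-screening step, assuming the scores are collected in a viewers-by-conditions matrix; the 0.75 reject threshold follows the ITU-T P.913 suggestion quoted above:

```python
import numpy as np

def screen_viewers(scores, threshold=0.75):
    """scores: 2-D array, one row per viewer, one column per test condition.
    A viewer is 'rejected' when the Pearson correlation between their scores
    and the panel MOS falls below the threshold."""
    scores = np.asarray(scores, dtype=float)
    mos = scores.mean(axis=0)   # mean opinion score per condition
    return ["acceptable" if np.corrcoef(row, mos)[0, 1] >= threshold else "rejected"
            for row in scores]
```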
• 65. Limitations of use of the EVP results − Even though the EVP has been shown to provide acceptable results with only nine viewers, the MOS provided by an EVP experiment cannot be considered a replacement for the results obtainable with a formal subjective assessment experiment. − The MOS data obtained using EVP might be used to make a preliminary ranking of the video processing schemes under evaluation. − Where considered convenient or necessary, an EVP experiment can be run in parallel in more than one location, provided the viewing conditions, viewing distance and test design are identical. − If the number of expert viewers involved in the same EVP experiment, even when the experiment is run in different locations, is equal to or higher than 15, the raw subjective data may be processed to obtain MOS, standard deviation and confidence interval data, which can help to perform a more accurate ranking of the cases under test. In this last case more accurate inferential statistical analysis may be performed, e.g. a Student's t-test. 65 Subjective Assessment by Expert Viewing
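When 15 or more expert viewers are available, MOS, standard deviation and a confidence interval can be computed per test condition; a sketch using the Student t distribution (the 95% confidence level is an assumption for illustration):

```python
import numpy as np
from scipy import stats

def mos_with_ci(scores, confidence=0.95):
    """scores: raw opinion scores of all viewers for one test condition.
    Returns the MOS, the sample standard deviation and the half-width of the
    confidence interval based on the Student t distribution."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    mos = scores.mean()
    std = scores.std(ddof=1)
    half_width = stats.t.ppf((1 + confidence) / 2, df=n - 1) * std / np.sqrt(n)
    return mos, std, half_width
```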
  • 66. 66 EVP in the Presence of a Large Number of Expert Assessors
  • 67. 67 EVP in the Presence of a Large Number of Expert Assessors
• 68. − Observing the results it can be noted that: • the 15- and 18-viewer graphs show a homogeneous slope from high-quality to low-quality MOS values; • the 9- and 12-viewer graphs show some “inversions” of ranking compared to the 18-viewer graph, although the variations in scores are rather limited in extent. − In conclusion, the EVP experiments described here show very good performance of the EVP protocol: although it cannot be considered a full replacement for a formal subjective experiment, it is a stable evaluation procedure that provides results very close to those obtained when many more viewers are available and a formal subjective assessment is carried out. 68 EVP in the Presence of a Large Number of Expert Assessors
• 70. Distribution Applications − The quality specifications can be expressed in terms of the subjective judgement of observers. − Such codecs can therefore, in theory, be assessed subjectively against these specifications. Contribution Applications − The quality cannot, in theory, be specified in terms of subjective performance parameters because the output is destined not for immediate viewing, but for • studio post-processing • storing • and/or coding for further transmission. 70 Distribution Applications and Contribution Application Codecs Test
• 71. Contribution Applications − Because of the difficulty of defining this performance for a variety of post-processing operations, the approach preferred has been to specify the performance of a chain of equipment, including a post-processing function, which is thought to be representative of a practical contribution application. − This chain might typically consist of a codec, followed by a studio post-processing function (or another codec in the case of basic contribution quality assessment), followed by yet another codec before the signal reaches the observer. 71 Distribution Applications and Contribution Application Codecs Test [Block diagram: Input Video → Encoder → Decoder (first generation) → Studio Post-processing Function → Encoder → Decoder (second generation), representative of a practical contribution application]
  • 72. Contribution Applications − The most completely reliable method of evaluating the ranking order of high-quality codecs is to assess all the candidate systems at the same time under identical conditions. − Tests made independently, where fine differences of quality are involved, should be used for guidance rather than as indisputable evidence of superiority. − A useful subjective measure may be impairment determined as a function of the bit error ratio which occurs in the contribution link between coder and decoder. − At present there is insufficient experimental knowledge of true transmission error statistics to recommend parameters for a model which accounts for error clustering or bursts. Until this information becomes available Poisson-distributed errors may be used. 72 Distribution Applications and Contribution Application Codecs Test
• 73. − One article might claim that codec A is 15% better than codec B, while the next one might assert that codec B is 10% better than codec A. Why? − Because the testing methodology and content play a crucial role in the evaluation of video codecs. Several factors impact the assessment of video codecs: • Encoder Implementation • Encoder Settings • Methodology • Content • Metrics Factors in Codec Comparisons 73
• 74. − Video coding standards are instantiated in software or hardware with goals as varied as research, broadcasting, or streaming. − A ‘reference encoder’ is a software implementation used during the video standardization process and for research, and as a reference by implementers. − Generally, this is the first implementation of a standard and it is not used for production. Afterwards, production encoders developed by the open-source community or commercial entities come along. − These are practical implementations deployed by most companies for their encoding needs and are subject to stricter speed and resource constraints. Therefore, the performance of reference and production encoders might be substantially different. − In addition, the standard profile and the specific version influence the observed performance, especially for a new standard with still immature implementations. − Netflix deploys production encoders tuned to achieve the highest subjective quality for streaming. Encoder Implementation 74
  • 75. − Encoding parameters such as the number of coding passes, parallelization tools, rate-control, visual tuning, and others introduce a high degree of variability in the results. − The selection of these encoding settings is mainly application-driven. − Standardization bodies tend to use test conditions that let them compare one tool to another, often maximizing a particular objective metric and reducing variability over different experiments. − For example, rate-control and visual tunings are generally disabled, to focus on the effectiveness of core coding tools. − Netflix encoding recipes focus on achieving the best quality, enabling the available encoder tools that boost visual appearance, and thus, giving less weight to indicators like speed or encoder footprint that are crucial in other applications. Encoding Settings 75
  • 76. − Testing methodology in codec standardization establishes well-defined “common test conditions” to assess new coding tools and to allow for reproducibility of the experiments. − Common test conditions consist of a relatively small set of test sequences (single shots of 1 to 10 seconds) that are encoded only at the input resolution with a fixed set of quality parameters. − Quality (PSNR) and bitrate are gathered for each of these quality points and used to compute the average bitrate savings, the so-called BD-rate (Bjontegaard Delta (BD) rate savings). Methodology 76
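The BD-rate calculation fits a low-order polynomial to log-bitrate as a function of quality for each codec and compares the average log-bitrate over the overlapping quality range. A common implementation sketch (cubic fit, as in the original Bjontegaard method; negative results mean the test codec saves bitrate relative to the anchor):

```python
import numpy as np

def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Average bitrate difference (in percent) of the test codec vs the anchor,
    computed over the overlapping quality range of the two rate-quality curves."""
    p_anchor = np.polyfit(anchor_psnr, np.log(anchor_rates), 3)
    p_test = np.polyfit(test_psnr, np.log(test_rates), 3)
    lo = max(min(anchor_psnr), min(test_psnr))     # overlapping quality interval
    hi = min(max(anchor_psnr), max(test_psnr))

    def mean_log_rate(poly):
        integral = np.polyint(poly)
        return (np.polyval(integral, hi) - np.polyval(integral, lo)) / (hi - lo)

    return (np.exp(mean_log_rate(p_test) - mean_log_rate(p_anchor)) - 1) * 100.0
```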
  • 77. − While the methodology in standards has been suitable for its intended purpose, other considerations come into play in the adaptive streaming world. − Notably, the option to offer multiple representations of the same video at different bitrates and resolutions to match network bandwidth and client processing and display capabilities. − The content, encoding, and display resolutions are not necessarily tied together. Removing this constraint implies that quality can be optimized by encoding at different resolutions. Methodology 77
• 78. − Per-resolution rate-quality curves cross, so there is a range of rates for which each encoding resolution gives the best quality. − The ‘convex hull’ is derived by selecting the optimal curve at each rate for the entire range. − Then, the BD-rate difference is computed on the convex hulls, instead of using the single-resolution curves. − The flexibility of delivering bitstreams on the convex hull, as opposed to just the single-resolution ones, leads to remarkable quality improvements. Methodology 78
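A sketch of how the convex hull could be derived from per-encode (bitrate, quality) points gathered across resolutions: keep only the Pareto-optimal points, then prune any point that falls below the line joining its neighbours. Function and variable names are illustrative, not taken from any particular toolchain:

```python
def rate_quality_hull(points):
    """points: list of (bitrate, quality, resolution) tuples from all encodes.
    Returns the convex hull used to pick the best resolution at each bitrate."""
    pts = sorted(points)                           # sort by increasing bitrate
    pareto, best_q = [], float("-inf")
    for rate, quality, res in pts:                 # quality must increase with rate
        if quality > best_q:
            pareto.append((rate, quality, res))
            best_q = quality
    hull = []
    for p in pareto:                               # enforce non-increasing slope
        while len(hull) >= 2:
            (r1, q1, _), (r2, q2, _) = hull[-2], hull[-1]
            r3, q3, _ = p
            if (q2 - q1) * (r3 - r2) <= (q3 - q2) * (r2 - r1):
                hull.pop()                         # middle point lies below the chord
            else:
                break
        hull.append(p)
    return hull
```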
• 79. − The dynamic optimizer (DO) methodology generalizes this concept to sequences with multiple shots. − DO operates on the convex hull of all the shots in a video to jointly optimize the overall rate-distortion by finding the optimal compression path for an encode across qualities, resolutions, and shots. − It showed about 25% BD-rate savings for multiple-shot videos. − Three characteristics that make DO particularly well-suited for adaptive streaming and codec comparisons are: • It is codec agnostic since it can be applied in the same way to any encoder. • It can use any metric to guide its optimization process. • It eliminates the need for high-level rate control across shots in the encoder. Lower-level rate control, like adaptive quantization within a frame, is still useful, because DO does not operate below the shot level. − DO, as originally presented, is a non-real-time, computationally expensive algorithm. − However, because of the exhaustive search, DO could be seen as the upper-bound performance for high-level rate control algorithms. Methodology 79
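A much simplified sketch of the idea behind DO, assuming each shot already has a convex hull of encode options: for a given Lagrange multiplier, every shot independently picks the option that minimises quality loss plus weighted rate, and sweeping the multiplier traces a jointly optimal rate-quality curve for the whole title. Duration weighting and the other refinements of the published algorithm are omitted here:

```python
def dynamic_optimizer_sketch(shot_hulls, lambdas):
    """shot_hulls: {shot_id: [(bitrate, quality), ...]} with each list a convex hull.
    Returns (total_bitrate, mean_quality) points, one per Lagrange multiplier."""
    curve = []
    for lam in lambdas:
        total_rate, qualities = 0.0, []
        for options in shot_hulls.values():
            rate, quality = min(options, key=lambda o: lam * o[0] - o[1])
            total_rate += rate
            qualities.append(quality)
        curve.append((total_rate, sum(qualities) / len(qualities)))
    return sorted(curve)

# Example sweep: large multipliers favour low bitrate, small ones favour quality.
# curve = dynamic_optimizer_sketch(hulls, [2.0 ** -k for k in range(16)])
```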
  • 80. − For a fair comparison, the testing content should be balanced, covering a variety of distinct types of video (natural vs. animation, slow vs. high motion, etc.) or reflect the kind of content for the application at hand. − The testing content should not have been used during the development of the codec. − Netflix has produced and made public long video sequences with multiple shots, such as ‘El Fuente’ or ‘Chimera’, to extend the available videos for R&D and mitigate the problem of conflating training and test content. − Internally, we extensively evaluate algorithms using full titles from our catalog. Content 80
  • 81. − Traditionally, PSNR has been the metric of choice given its simplicity, and it reasonably matches subjective opinion scores. − Other metrics, such as VIF or SSIM, better correlate with the subjective scores. − Metrics have commonly been computed at the encoding resolution. − Netflix highly relies on VMAF (Video Multi-Method Assessment Fusion) throughout its video pipeline. − VMAF is a perceptual video quality metric that models the human visual system. − It correlates better than PSNR with subjective opinion over a wide quality range and content. − VMAF enables reliable codec comparisons across the broad range of bitrates and resolutions occurring in adaptive streaming. Metrics 81 Approximate correspondence between VMAF values and subjective opinion
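For reference, PSNR is computed from the mean squared error between reference and test frames; a minimal sketch for a single plane (e.g. luma) of 8-bit video:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```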
• 82. Two relevant aspects when employing metrics are the resolution at which they are computed and the temporal averaging: Scaled Metric: − VMAF is not computed at the encoding resolution, but at the display resolution, which better emulates our members' viewing experience. This is not unique to VMAF, as PSNR and other metrics can be applied at any desired resolution by appropriately scaling the video. Temporal Average: − Metrics are calculated on a per-frame basis. − Normally, the arithmetic mean has been the method of choice to obtain the temporal average across the entire sequence. − We employ the harmonic mean, which gives more weight to low-quality outliers than the arithmetic mean. − The rationale for using the harmonic mean is that if there are a few frames that look really bad in a shot of a show you are watching, then your experience is not that great, no matter how good the quality of the rest of the shot is. The acronym for the harmonic VMAF is HVMAF. Metrics 82
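A sketch of the harmonic temporal average of per-frame VMAF scores; the +1 offset, used here only to keep the mean defined when a frame scores zero, is an assumption about the exact convention:

```python
import numpy as np

def harmonic_vmaf(per_frame_vmaf, offset=1.0):
    """Harmonic mean of per-frame VMAF scores (HVMAF). A few very low scoring
    frames pull the result down far more than with the arithmetic mean."""
    v = np.asarray(per_frame_vmaf, dtype=float) + offset
    return len(v) / np.sum(1.0 / v) - offset
```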
• 83. − Putting the factors mentioned above into practice, we show the results drawn from two distinct codec comparison approaches, the traditional one and the adaptive streaming one. − Three commonly used video coding standards are tested • H.264/AVC by ITU-T and ISO/MPEG • H.265/HEVC by ITU-T and ISO/MPEG • VP9 by Google. − For each standard, we use the reference encoder and a production encoder. Codec Comparison Results by Netflix 83
  • 84. Results with the traditional approach − The traditional approach uses fixed QP (quality) encoding for a set of short sequences. − Encoder settings are listed in the table below. Codec Comparison Results by Netflix 84
• 85. Codec Comparison Results by Netflix 85 Results with the traditional approach − Methodology: Five fixed quality encodings per sequence at the content resolution are generated. − Content: 14 standard sequences from the MPEG Common Test Conditions set (mostly from JVET) and 14 from the Alliance for Open Media (AOM) set. All sequences are 1080p. These are short clips: about 10 seconds for the MPEG set and 1 second for the AOM set. Mostly, they are single-shot sequences. − Metrics: BD-rate savings are computed using the classic PSNR for the luma component. • BD-rates are given in percentage with respect to x264. • Positive numbers indicate an average increase of bitrate, while negative numbers indicate bitrate reduction. − Interestingly, using the MPEG set doesn't seem to benefit the HEVC encoders, nor does using the AOM set seem to help the VP9 encoders.
  • 86. Codec Comparison Results by Netflix 86 Results from an adaptive streaming perspective − It builds on top of the traditional approach, modifying some aspects in each of the factors: − Encoder settings: Settings are changed to incorporate perceptual tunings. The rest of the settings remain as previously defined.
  • 87. Codec Comparison Results by Netflix 87 Results from an adaptive streaming perspective − Methodology: • Encoding is done at 10 different resolutions for each shot, from 1920x1080 down to 256x144. • DO performs the overall encoding optimization using HVMAF (harmonic VMAF). − Content: • A set of 8 Netflix full titles (like Orange is the New Black, House of Cards or Bojack) is added to the other two test sets. • Netflix titles are 1080p, 30fps, 8 bits/component. They contain a wide variety of content in about 8 hours of video material. − Metrics: • HVMAF is employed to assess these perceptually-tuned encodings. • The metric is computed over the relevant quality range of the convex hull. • HVMAF is computed after scaling the encodes to the display resolution (assumed to be 1080p), which also matches the resolution of the source content.
  • 88. Codec Comparison Results by Netflix 88 Results from an adaptive streaming perspective − Additionally, we split results into two ranges to visualize performance at different qualities. • The low range refers to HVMAF between 30 and 63 • The high range refers to HVMAF between 63 and 96, which correlates with high subjective quality. − The highlighted rows in the HVMAF BD-rates table are the most relevant operation points for Netflix.
  • 89. − Encoder, encoding settings, methodology, testing content, and metrics should be thoroughly described in any codec comparison since they greatly influence results. − A different selection of the testing conditions leads to different conclusions. • For example, EVE-VP9 is about 1% worse than x265 in terms of PSNR for the traditional approach, but about 12% better for the HVMAF high range case. − Given the vast amount of video compressed and delivered by services like Netflix and that traditional and adaptive streaming approaches do not necessarily converge to the same outcome, it would be beneficial if the video coding community considered the adaptive streaming perspective in the comparisons. − For example, it is relatively easy to compute metrics on the convex hull or to add the HVMAF numbers to the reported metrics. Codec Comparison Results by Netflix 89
  • 91. Visualizer™ SDR/HDR+WCG Digital Video Test Pattern 91
• 92. − It enables accurate evaluation and calibration of a wide range of digital video systems (to evaluate more than 20 key parameters of video quality from a single screen). − The pattern is available as an uncompressed or compressed video sequence. In streaming form (as HEVC/H.265, H.264/MPEG-4 AVC and MPEG-2 video) it is suitable for testing both file-based and streaming systems. • Quantifies compression fidelity • Identifies color matrix mismatch • Determines bit depth and chroma subsampling • Reveals skipped frames • Quantifies lip sync errors • Provides 18 additional tests − HDR+WCG Version • Reveals tone mapping and clipping • Shows color gamut mismatch • Indicates gamut mapping Visualizer™ SDR/HDR+WCG Digital Video Test Pattern 92
• 93. − At the left, H and V linear frequency sweeps from zero to the Nyquist limit. − At right, after conversion from 1920x1080 to 1280x720 and back. − Note the aliasing at the top that should have been filtered away to gray. − Note the strong Moiré pattern 2/3 of the way up. 2/3 = 720/1080. − Use the pattern's 12-unit grid scale to measure frequency ratios (8/12 = 720/1080). Frequency Response 93
  • 94. − Right: Some regions have missing detail. − At each frequency, note the highest bit depth with visible signal. − Left: Uncompressed shows visible signal from bottom to top in every frequency burst. (Contrast exaggerated.) Compression Fidelity 94
• 95. − Check skin tones, detail in hair and jewelry; check for highlight clipping, black clipping. − The image is color-managed for gamma and RGB primaries. Reference Image 95
  • 96. − Calibrated linear light stairstep from 100% (10,000 nits) down 6 decades in 24 steps, with +/- 1/3 stop deltas. Check monitor tone mapping and clipping, and darkest visible light level. Jacob’s Ladder 96
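Assuming the 24 steps are spaced uniformly in log luminance over the 6 decades, the step levels can be worked out directly (each step is a factor of 10 to the power 6/24, roughly 1.78x):

```python
# Illustrative calculation of the stairstep levels, from 10 000 nits down 6 decades.
peak_nits = 10_000
levels = [peak_nits / 10 ** (6 * n / 24) for n in range(25)]
print(levels[0], levels[4], levels[-1])   # 10000.0, 1000.0, 0.01 nits
```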
• 97. − See the User Guide Appendix for the RGB values of each chip. − Note the six dim blocks in the fifth row. They are 1/200, 1/400, 1/800, 1/1600, 1/3200, 1/6400 of the linear light range. ST-303M 97
• 98. − Right half: luma only. − Left half: chroma only. − Look for jumps or stutter indicating dropped or repeated frames. − 1/4 of the way up, the speed is 1 line per frame or field: look for jaggies there. − Look for differences between luma and chroma resolution and motion smoothness. Lava Lamp 98
• 99. Chroma Downsampling 99 Chroma 4:4:4 Chroma 4:2:2 Chroma 4:2:0i Chroma 4:2:0p
• 100. Color Matrix 100 Correct Rec.2020 decoded as Rec.709 Rec.709 YUV decoded as Rec.601
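The visible colour shift comes from mismatched luma coefficients. The sketch below encodes a pure red with Rec.709 coefficients and decodes it with Rec.601 coefficients; the result is no longer pure red (normalised RGB, no quantisation, for illustration only):

```python
def rgb_to_ycbcr(rgb, kr, kb):
    """Encode normalised R'G'B' (0-1) to Y'CbCr with the given luma coefficients."""
    r, g, b = rgb
    y = kr * r + (1 - kr - kb) * g + kb * b
    return y, (b - y) / (2 * (1 - kb)), (r - y) / (2 * (1 - kr))

def ycbcr_to_rgb(ycbcr, kr, kb):
    """Decode Y'CbCr back to R'G'B' with the given luma coefficients."""
    y, cb, cr = ycbcr
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    return r, g, b

REC709 = (0.2126, 0.0722)   # Kr, Kb
REC601 = (0.299, 0.114)
print(ycbcr_to_rgb(rgb_to_ycbcr((1.0, 0.0, 0.0), *REC709), *REC601))
# roughly (0.91, -0.10, 0.01): red is desaturated and the other channels shift
```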
  • 101. − 1002, 940, 932, 914 (107%, 100, 99, 97). − Headroom is clipped if you can’t see the difference between 107 and 100. − You should see a difference between 100 and 99 (adjust monitor contrast down). White Pluge 101 Correct Blown out
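The percentages quoted above follow from the narrow-range mapping of 10-bit code values, where 64 is nominal black and 940 is nominal white; a quick check:

```python
def narrow_range_percent(code, bit_depth=10):
    """Convert a narrow-range ('video level') luma code value to percent of white."""
    black = 16 << (bit_depth - 8)   # 64 in 10 bit
    white = 235 << (bit_depth - 8)  # 940 in 10 bit
    return 100.0 * (code - black) / (white - black)

for code in (1002, 940, 932, 914):
    print(code, round(narrow_range_percent(code)))   # 107, 100, 99, 97
```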
  • 102. Chroma Upsampling Back to 4:4:4 102 4:4:4 4:2:2 to 4:4:4 4:2:0p to 4:4:4 4:2:0i to 4:4:4
• 103. − ± 4, 2, 1%. − Left image: monitor brightness too high. − Adjust down until the 1st and 2nd columns merge (middle image). − All 3rd column chips are different from the 2nd column chips. − Right: monitor brightness too high, but column 1 (super black) has been clipped off upstream or in the monitor. Pluge 103
  • 104. − Left: 10-bit shows no contour lines. − Dithered 8-bit may look almost as smooth. − Right: 8-bit video has rotating contour lines. − Compression artifacts may also appear. (Contrast exaggerated.) Bit Depth 104
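A sketch of why 8-bit video shows contour lines while dithered 8-bit can look almost as smooth: quantising a smooth ramp to fewer levels creates visible steps, and adding about half an LSB of noise before rounding trades those steps for fine noise. Names and parameters are illustrative:

```python
import numpy as np

def quantise(ramp, bits, dither=False, seed=0):
    """Quantise a 0-1 ramp to the given bit depth, optionally with dithering."""
    levels = 2 ** bits - 1
    x = ramp * levels
    if dither:
        x = x + np.random.default_rng(seed).uniform(-0.5, 0.5, size=x.shape)
    return np.clip(np.round(x), 0, levels) / levels

ramp = np.linspace(0.0, 1.0, 4096)
print(np.unique(quantise(ramp, 8)).size)    # ~256 distinct levels: visible banding
print(np.unique(quantise(ramp, 10)).size)   # ~1024 levels: appears smooth
```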
  • 105. − 12 color points around each gamut, with small chips desaturated by 5 delta-E. − Reveals gamut mapping. Gamut Bars 105
  • 106. − Upper field means the odd line comes first. Even field means the even line comes first. − The field dominance refers to the choice of which field of an interlaced video signal is chosen as the point at which video edits or switches occur. − NTSC and PAL are both lower-field (or even-field) dominant. This means they record all the even- numbered lines of your image first, then the odd-numbered lines a fraction of a second later. − Most HD formats are upper-field dominant, which means they record the odd-numbered lines first. Field Dominance 106 Correct Reversed
• 107. − An audio tick should coincide with the flash of the center mark; any offset between the tick and the flash indicates a lip sync error, measured in frames/fields. Lip Sync 107 Correct: Tick is heard when center mark flashes. Incorrect: Audio is 3 frames/fields early
  • 108. − Demonstrates incorrect processing of Chroma in interlaced 4:2:0 systems. Chroma Motion Error 108 Correct Incorrect
  • 109. − When center-cropped to the indicated aspect ratio, the associated dashed lines will be the edge pixels of the new image. Border Marquee & Crop Lines 109 Correct: Top white dashed line is showing Top line is missing