Subjective study of the influence of color information on visual quality assessment of high resolution pictures
Francesca De Simone (a), Frederic Dufaux (a), Touradj Ebrahimi (a), Cristina Delogu (b), Vittorio Baroncini (b)
(a) École Polytechnique Fédérale de Lausanne, Switzerland
(b) Fondazione Ugo Bordoni, Italy

Outline
- Introduction
  - State of the art
  - Motivations behind this study
  - Goal and main challenges
- Proposed test methodology
  - Test material presentation and rating
  - Test room set up
- Test material selection
  - Starting point
  - Selection of original images
  - Selection of compressed images
  - Final test dataset
- Results
  - Subjective data processing
  - Presentation of the results
  - Results for different images
- Conclusions
  - Achievements and future work

Introduction
Observation
- In most applications, pictures and video data are in color.
- BUT Full Reference (FR) visual quality metrics usually take into account only the luminance channel of the picture under analysis (“luminance only metrics”).
[Diagram labels: color world; capture and processing; color pictures; end user (experience, expectations, ...); quality rate; objective quality metric; only luminance component]

State of the art
- Some “color metrics” have been proposed in the literature [1][2][3].
- BUT “is there a significant improvement with respect to much simpler luminance only metrics?”
  - few verification results
  - quite complex algorithms
  - no publicly available implementations
- Consequence: they are not widely known and used.
[1] C. Chou et al., “A Fidelity Metric for Assessing Visual Quality of Color Images”, Aug. 2007.
[2] P. Le Callet et al., “A robust quality metric for color image quality assessment”, Sept. 2003.
[3] C. Charrier et al., “A psychovisual color image quality metric integrating both intra and inter channel masking effect”, Sept. 2004.

Motivations behind this study
- Quality evaluation and optimization of image processing algorithms are usually done by means of luminance only metrics.
- “Is this approach limiting a priori the correlation with subjective judgment of perceived quality?”
- Can we measure how much the color information influences a human observer’s quality judgment?
- Proposed: psycho-visual experiments.

Goal and main challenges
Goal of the experiment:
- Show that the quality perceived by a human subject is different when observing a color picture versus its luminance-only version.
- Quantify this difference.
Main challenges:
- Lack of a standard which describes how to assess:
  - mono-channel (luminance) vs. multi-channel (color) images
  - high resolution images

Proposed test methodology
Test material presentation and rating (I)
Double Stimulus Continuous Quality Scale (DSCQS) method [ITU-R Rec. BT.500-11], adapted to deal with the evaluation of still pictures:
- The test picture and its reference are shown at the same time.
- The assessor is not told about the presence of a reference picture.
- The positions of the reference and test pictures are systematically switched.
- Test pairs related to different original contents are always alternated.
[Figure: Reference Image | Test Image]

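The slide states that reference/test positions are switched and that contents alternate, but it does not give the actual schedule. The following is only a minimal sketch of one schedule that satisfies both constraints; the function name `build_schedule`, the round-robin layout, and the left/right rule are assumptions, not the procedure used in the experiment.

```python
import random

def build_schedule(contents, samples, seed=0):
    """Return a list of (content, sample, reference_side) trials.

    Assumption (not from the slides): trials are dealt round-robin over the
    contents, so consecutive pairs never share an original, and the side of
    the reference flips between neighbouring trials and between successive
    rounds for a given content.
    """
    rng = random.Random(seed)
    # Independent random sample order for each original content.
    order = {c: rng.sample(samples, len(samples)) for c in contents}
    schedule = []
    for round_idx in range(len(samples)):
        for pos, content in enumerate(contents):
            side = "left" if (round_idx + pos) % 2 == 0 else "right"
            schedule.append((content, order[content][round_idx], side))
    return schedule

contents = ["image 09", "image 12", "image 23", "image 17"]
samples = ["J_01", "J_02", "J_03", "J2_01", "J2_02", "J2_03", "J2_04"]
schedule = build_schedule(contents, samples)
```

With 4 contents and 7 samples this yields 28 trials, one per test pair, and the fixed `seed` makes the order reproducible across subjects if that is desired.
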
Test material presentation and rating (II)
- When the subject clicks in the active area of the screen, a rating input window is shown.
- The subject has to rate the quality of the two pictures, choosing for each a value between 0 (Very bad quality) and 100 (Excellent quality).
[Figure: Rating window]

Test material presentation and rating (III)
- In a first session (“color data session”), only color images are displayed.
- In a second session (“luminance data session”), only the luminance component of each color image of the previously used dataset is displayed.
- Each subject performs both sessions, on two different days.
- Before each session, instructions are provided to the subjects and a training session is performed:
  - contents shown for training are not used for testing
  - data gathered during the training are not included in the final test results

Test room set up
The experiment was conducted at the Fondazione Ugo Bordoni test laboratory:
- high performance PC
- video board connected to two identical high resolution monitors (Samsung 226 CW) to obtain an extended desktop area of 3200 x 1200 pixels
- a dedicated GUI presented the reference image on one screen and the test image on the other screen, as two adjacent windows of equal size
- monitor calibration using a color calibration device (EyeOne Display2)

Test material selection
Starting point
- Database of 23 natural images: 24 bit-per-pixel (bpp), high resolution (2560x1600 pixels) pictures [4].
- Distortions introduced by 2 compression algorithms:
  - 4:4:4 JPEG coding (IJG software)
  - 4:4:4 JPEG 2000 coding (Kakadu software)
- The need to reduce the duration of each test session strongly limits the number of original and compressed images that can be included in the test material.
[4] Dataset established by Microsoft and available for JPEG members at http://www.jpeg.org.

Selection of original images
1) Classification of the 23 pictures by estimating “how difficult each image is for coding”:
   - analysis of the slope of the Rate-Distortion (RD) curves
   - computation of the Spatial Information (SI) index
   - visual verification of variations in the quality of compressed images
2) 4 original images were selected, which represent different coding complexity levels: image 09, image 12, image 23, image 17.

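The slide does not say how the Spatial Information (SI) index was computed. A common definition (ITU-T Rec. P.910) is the standard deviation of the Sobel-filtered luminance plane; the sketch below assumes that definition and a plain 2D list of luminance values, and the function names are illustrative only.

```python
import math

def sobel_magnitudes(luma):
    """Sobel gradient magnitude at every interior pixel of a 2D luminance list."""
    height, width = len(luma), len(luma[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")
    mags = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (luma[y - 1][x + 1] + 2 * luma[y][x + 1] + luma[y + 1][x + 1]
                  - luma[y - 1][x - 1] - 2 * luma[y][x - 1] - luma[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (luma[y + 1][x - 1] + 2 * luma[y + 1][x] + luma[y + 1][x + 1]
                  - luma[y - 1][x - 1] - 2 * luma[y - 1][x] - luma[y - 1][x + 1])
            mags.append(math.hypot(gx, gy))
    return mags

def spatial_information(luma):
    """SI of a still image: population standard deviation of the Sobel magnitudes."""
    mags = sobel_magnitudes(luma)
    mean = sum(mags) / len(mags)
    return math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))

flat = [[128] * 5 for _ in range(5)]              # no detail at all
edge = [[0, 0, 255, 255, 255] for _ in range(5)]  # one sharp vertical edge
```

A flat image gives an SI of zero, while any image with edges gives a positive value, so a higher SI suggests more spatial detail and, typically, a picture that is harder to code.
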
Selection of compressed images
1) For each content and coding algorithm, 5 compressed versions were selected:
   - identification of the “visual area” (i.e. the minimum and maximum bpp values below which and above which, respectively, the quality does not change significantly)
   - selection of significant samples in the visual area (overall picture quality Just Noticeable Difference (JND) test)
2) For each image, further reduction of the compressed versions to retain only those with quality levels reasonably distinguishable from each other: 7 compressed images were selected for each original image (3 JPEG and 4 JPEG 2000 compressed images).

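Since the compressed versions are characterized by their bit rate in bpp, it can help to see how a bpp value maps to file size and compression ratio for the 2560x1600, 24 bpp originals. The sketch below is plain arithmetic; the 0.5 bpp figure is a hypothetical example and not one of the rates used in the study.

```python
WIDTH, HEIGHT = 2560, 1600   # resolution of the originals (from the slides)
ORIGINAL_BPP = 24            # bit depth of the originals (from the slides)

def compressed_size_bytes(bpp, width=WIDTH, height=HEIGHT):
    """File size in bytes of an image coded at the given bits per pixel."""
    return width * height * bpp / 8

def compression_ratio(bpp, original_bpp=ORIGINAL_BPP):
    """Ratio between the original and the compressed bit rate."""
    return original_bpp / bpp

example_bpp = 0.5  # hypothetical rate, for illustration only
size = compressed_size_bytes(example_bpp)
ratio = compression_ratio(example_bpp)
```

Because the ratio is inversely proportional to the bit rate, samples listed with descending compression ratios have ascending bpp values.
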
Final test dataset
- Color test dataset with 28 images (i.e. 28 different test pairs): 4 different originals x 7 differently compressed images.
- Luminance dataset simply obtained by applying the RGB to Y’CbCr color transform to the 28 color test pictures and keeping only the luminance component.
- To show the reference and the test image at the same time, only a portion of each test picture is shown, corresponding to half of the screen resolution (i.e. two 1600x1200 images).

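The slide does not specify which RGB to Y’CbCr matrix was applied. As a minimal sketch, the code below assumes the full-range BT.601 weights used by JFIF/JPEG and 8-bit RGB tuples; the function names are illustrative.

```python
def rgb_to_luma(r, g, b):
    """Y' from 8-bit R'G'B' using BT.601 full-range weights (an assumption)."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def luminance_image(rgb_rows):
    """Keep only the Y' component of an image given as rows of (r, g, b) tuples."""
    return [[round(rgb_to_luma(*pixel)) for pixel in row] for row in rgb_rows]

tiny = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
luma = luminance_image(tiny)
```

Saturated red and blue map to fairly dark gray levels, which illustrates how much chromatic contrast can disappear when only the luminance component is displayed.
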
Results
Subjective data processing
- 6 expert subjects rather than naïve viewers [5].
- For each original content, the Differential Mean Opinion Score (DMOS) is computed for each test condition:
  DMOS_{test condition} = MOS_{reference picture} - MOS_{test picture}
- Higher DMOS values mean lower visual quality scores.
- Two sets of DMOSs are obtained: DMOS-Luma and DMOS-Color.
[5] ITU-R Rec. BT.1663 “Expert viewing methods to assess the quality of systems for the digital display of LSDI in theatres”.

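The DMOS definition above can be sketched in a few lines. The ratings below are hypothetical values on the 0 to 100 scale for 6 subjects, not data from the experiment, and any outlier screening of subjects is left out because the slides do not describe one.

```python
from statistics import mean

def mos(ratings):
    """Mean Opinion Score: average rating over all subjects."""
    return mean(ratings)

def dmos(reference_ratings, test_ratings):
    """Differential MOS: MOS of the reference minus MOS of the test picture."""
    return mos(reference_ratings) - mos(test_ratings)

# Hypothetical ratings from 6 subjects (illustration only, not measured data).
reference = [90, 85, 95, 88, 92, 90]
test = [60, 55, 70, 62, 58, 55]
score = dmos(reference, test)
```

Running the same computation once on the color session ratings and once on the luminance session ratings gives the DMOS-Color and DMOS-Luma values for a test condition.
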
Presentation of results
- Results are grouped for each test image and show the DMOS-Luma and DMOS-Color values of the seven compressed test samples: J_01, J_02, J_03, J2_01, J2_02, J2_03, J2_04.
- Test samples J_01 to J_03 are JPEG compressed images, with descending compression ratios.
- Test samples J2_01 to J2_04 are JPEG 2000 compressed images, with descending compression ratios.
- The values on the abscissa are ordered to show ascending DMOS-Color values in the graphs.

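The abscissa ordering described above amounts to sorting the test samples by their DMOS-Color value and reading off the matching DMOS-Luma value. A minimal sketch follows; the function name is an assumption and the numbers are placeholders, not results of the study.

```python
def order_by_color_dmos(dmos_color, dmos_luma):
    """Return (label, DMOS-Color, DMOS-Luma) rows sorted by ascending DMOS-Color."""
    labels = sorted(dmos_color, key=dmos_color.get)
    return [(label, dmos_color[label], dmos_luma[label]) for label in labels]

# Placeholder values for illustration only; they are not results of the study.
dmos_color = {"J_01": 40.0, "J_02": 25.0, "J2_01": 30.0}
dmos_luma = {"J_01": 35.0, "J_02": 20.0, "J2_01": 28.0}
rows = order_by_color_dmos(dmos_color, dmos_luma)
```

Plotting both columns of `rows` against the sorted labels gives a monotonically increasing DMOS-Color curve, which makes any deviation of the DMOS-Luma curve easy to see.
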
Results for image 09
Coding complexity level = 1 (simple)

Results for image 12
Coding complexity level = 2

Results for image 23
Coding complexity level = 3

Results for image 17
Coding complexity level = 4 (difficult)

Conclusions
Achievements and future work
- Evidence of a relevant difference in the perceived visual quality when only the luminance channel of an image is assessed, instead of its color version.
- Assessing the luminance component instead of the color version often leads to optimistic assessment results.
- This evidence should encourage the design of objective color visual quality metrics!
Future work:
- Extending the panel of subjects, including naïve viewers.
- Further characterization of the original images:
  - analysis of artifacts masking effects
- Use of subjective data to create multi-channel extensions of existing single-channel metrics.

Thank you for your attention! Questions?
