Research and activity report

This presentation depicts my research and teaching activities since the completion of my PhD. It is mainly based on the talk I gave to defend my HDR (Habilitation à Diriger des Recherches).


  1. 1. Activity & research report Marco Cagnazzo Paris, September 2013
  2. 2. Overview • Activity report – Teaching and PhD supervision – Projects and other activities – Bibliometrics • Research themes – Video coding optimization • Motion representation • 3D video coding – Adaptive image compression – Distributed video coding • Multiview DVC • SI effectiveness evaluation – Robust video streaming • Streaming protocols • Network coding • Conclusions 2
  3. 3. Timeline • 2002-2005: PhD @ University of Naples & University of Nice-Sophia Antipolis (cotutelle) • 2005-2006: Post-doc @ National Multimedia Lab (Naples) & Assistant professor @ University of Naples • 2006-2008: Post-doc @ I3S Lab, Sophia Antipolis • Since February 2008: Maître de conférences in Digital Video @ Telecom-ParisTech 3 (timeline graphic spanning 2002-2014)
  4. 4. Teaching 4
     Name | Institution | Years | Role
     Information Theory | “Parthenope” University of Naples | 2004-2005 | Responsible
     Multimedia signal processing | “Federico II” University of Naples | 2005-2007 | Responsible
     Compression techniques | Telecom-ParisTech | 2008- … | Co-responsible
     Digital video and multimedia | Telecom-ParisTech | 2009- … | Responsible
     Digital television | Telecom-ParisTech | 2009- … | Co-responsible, CE
     Video over mobile | Telecom-ParisTech | 2009- … | Co-responsible, CE
     3D Video | Telecom-ParisTech | 2010- … | Co-responsible, CE
  5. 5. Teaching • Collaborative Learning Thematic Project • Tools and applications for signals, images and sound • Image processing and analysis • Advanced methods for image processing • Computer vision • Web Mining • Introduction to image processing (ATHENS) • Multimedia Indexing and Retrieval (ATHENS) • Short and long student projects (“projets libres” and “stages”) • Image and video compression • Video over IP • Signal and image processing • Wavelet and signal processing Total: ≈1100 hours (heures équivalentes TD) 5
  6. 6. PhD students
     Name | Years | Subject
     Marwa Meddeb | 2013 - | Video-conference with HEVC
     Marco Calemme | 2012 - | 3D Video and Depth coding
     Aniello Fiengo | 2012 - | Rate allocation for video
     Giovanni Chierchia | 2011 - | Convex optimization
     Elie Gabriel Mora | 2011 - | 3D Video Compression
     Giovanni Petrazzuoli | 2009 - 2013 | DVC and IMVS
     Abdel-Bassir Abou El Ailah | 2009 - 2012 | DVC and FRI signals
     Claudio Greco | 2008 - 2012 | Robust video streaming
     Thomas Maugey | 2007 - 2010 | Multiview DVC
     In addition, I have supervised about a dozen MSc students 6
  7. 7. Research projects
     Name | Period | Subject
     LABNET | 2001-2002 | Low-complexity video coding
     CNRAED | 2004-2005 | Hyper-spectral image coding
     CPRE 46 04 06 11 | 2006-2007 | Region-based motion vector coding
     Secure Media SIM | 2007-2008 | Secure video coding over SIM card
     AIBER | 2008 | Wavelet-based scalable video coding
     DIVINE | 2007-2009 | Robust video coding
     DITEMOI | 2007-2010 | Video streaming over wireless networks (*)
     PERSEE | 2009-2013 | Perceptual 2D and 3D video coding (*)
     SWAN | 2011-2013 | Network coding
     SURICATE | Approved | Video protection
     WOW | Submitted | Interactive 3D streaming (**)
     (*) Responsible for Telecom-ParisTech (**) Project coordinator
     Moreover: smaller contributions to ACDC, Pingo, Sebastian 2, NeVEx 7
  8. 8. Other responsibilities • 8 PhD thesis committees (4 as examiner, 4 as co-supervisor) • Area editor for 2 Elsevier journals (SPIC, SIGPRO) • Reviewer for the main journals and conferences in the field • Participation in conference organization (organizing committees of MMSP’10, EUVIP’11, EUSIPCO’12, ICIP’14) • Special session co-organization (EUSIPCO’10, DSP’11, WIAMIS’13, ASILOMAR’13) • Correspondant académique between Telecom-ParisTech and the University of Naples • Yearly Erasmus lessons at the University of Naples • Invited lesson at the Winter Doctoral School, University of Naples (2010) • IEEE Senior Member (’11), IEEE SPS Member, EURASIP Member 8
  9. 9. Bibliometrics • 15 journal papers: 13 published, 2 to appear – One paper selected as “High quality paper” by the IEEE MMTC-R Letter board, and included in the January 2013 issue • 4 submitted journal papers: 2 in the first round, 2 in preparation for the second round • 3 journal papers in preparation • 59 conference papers: 56 published and 3 to appear – Two MMSP Top 10% awards • One standardization contribution • One co-edited book – F. Dufaux, B. Pesquet-Popescu, M. Cagnazzo (eds.): Emerging Technologies for 3D Video. Wiley, 2013 • 9 book chapters: 3 published and 6 to appear • According to Google Scholar, my h-index is 13 (as of August 31, 2013) 9
  10. 10. VIDEO CODING OPTIMIZATION 1 Standardization contribution 8 Conference papers 1 Submitted journal paper 4 Journal papers
  11. 11. Motion vector representation • Quantization of motion vectors to reduce their coding cost • Motion vector refinement and dense motion vector representation generated at the decoder • Lossless coding of segmented motion fields • Motion estimation for wavelet-based video coding 11
  12. 12. Motion vector quantization (encoder/decoder block diagram: motion estimation, motion compensation, DCT/IDCT, frame buffers and texture quantization with step Q_p and Lagrangian λ; the motion vectors v* are transmitted unquantized and the decoder reconstructs B(Q_p)) • M. Cagnazzo, M. Agostini, M. Antonini, G. Laroche, and J. Jung, “Motion vector quantization for efficient low-bitrate video coding,” in SPIE Visual Communications and Image Processing Conference, vol. 7257, (San Jose, California), 2009. • S. Corrado, M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “Improving H.264 performances by quantization of motion vectors,” in Picture Coding Symposium, (Chicago, IL), 2009. • M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “A new coding mode for hybrid video coders based on quantized motion vectors,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, pp. 946–956, July 2011. 12
  13. 13. Motion vector quantization (modified block diagram: an additional quantizer with step Q_v is applied to the motion vectors, so the decoder receives v(Q_v) and reconstructs B(Q_p, Q_v)) 13 • M. Cagnazzo, M. Agostini, M. Antonini, G. Laroche, and J. Jung, “Motion vector quantization for efficient low-bitrate video coding,” in SPIE Visual Communications and Image Processing Conference, vol. 7257, (San Jose, California), 2009. • S. Corrado, M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “Improving H.264 performances by quantization of motion vectors,” in Picture Coding Symposium, (Chicago, IL), 2009. • M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “A new coding mode for hybrid video coders based on quantized motion vectors,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, pp. 946–956, July 2011.
  14. 14. Quantization step for motion vectors • Double-pass approach – Estimation of the best step over a frame – Actual encoding with the selected step • Estimation: – Sum of distortions – Oracle (used as reference) • Results: average rate reduction ≈ 4% with respect to H.264 and ≈ 8% with respect to H.264 1/8-pel NB: All rate reductions for video are measured using the Bjontegaard metric (approximated average rate reduction for the same PSNR over a given interval) 14
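Since all the rate reductions quoted in this report use the Bjontegaard metric, the following sketch recalls how such a delta-rate figure is typically computed (cubic fit of log-rate versus PSNR, integrated over the overlapping PSNR range). It is a generic illustration, not code from the cited works, and the RD points in the example are made up.

```python
import numpy as np

def bjontegaard_delta_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Approximate average rate difference (%) of 'test' vs 'ref' at equal PSNR:
    fit log10(rate) as a cubic polynomial of PSNR for each curve and integrate
    the difference over the common PSNR interval."""
    p_ref = np.polyfit(psnr_ref, np.log10(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100  # negative value = rate saving

# Hypothetical RD points (rate in kbps, PSNR in dB) for reference and test codecs
print(bjontegaard_delta_rate([200, 400, 800, 1600], [32.0, 35.1, 38.0, 40.6],
                             [185, 375, 750, 1500], [32.0, 35.1, 38.0, 40.6]))
```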
  15. 15. Differential techniques for ME (block diagram: BMA motion estimation and a hybrid coder produce the H.264 base-layer stream, residual and MVs from the input video; a differential MV refinement provides side information, and the refined MVs and residual feed an enhancement-layer hybrid coder) M. Cagnazzo and B. Pesquet-Popescu, “Introducing differential motion estimation into hybrid video coders,” in SPIE Visual Communications and Image Processing Conference, vol. 1, (Huang Shan, An Hui, China), pp. 1–4, 2010. 15
  16. 16. Differential ME in hybrid video coding • Layered representation of video • Base layer compatible with any hybrid technique • Enhancement layer uses refined vectors obtained at no extra rate cost: δv(n, m) = −e(n, m) φ(n, m) / (λ + ‖φ(n, m)‖²) • The refinement depends on the motion-compensated error image e and on the gradient φ of the motion-compensated reference image • Proof of principle, small improvements (up to almost 1% rate reduction) 16
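To make the refinement formula above concrete, here is a minimal numpy sketch of the pixelwise update it describes: the refined vector moves against the motion-compensated error e, along the compensated reference gradient φ, damped by λ. The array names, the damping value and the way the gradient is obtained are illustrative assumptions, not details from the cited paper.

```python
import numpy as np

def refine_motion_field(error, grad_x, grad_y, lam=100.0):
    """Pixelwise refinement delta_v(n,m) = -e(n,m) * phi(n,m) / (lam + ||phi(n,m)||^2),
    where e is the motion-compensated error image and phi = (grad_x, grad_y) is the
    gradient of the motion-compensated reference image."""
    scale = -error / (lam + grad_x ** 2 + grad_y ** 2)
    return scale * grad_x, scale * grad_y  # horizontal and vertical refinements

# Toy usage: random error image and reference gradients of the same size
e = np.random.randn(64, 64)
gx, gy = np.gradient(np.random.randn(64, 64))
dvx, dvy = refine_motion_field(e, gx, gy)
```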
  17. 17. Context quantization • Target: exploit high-order statistical dependencies in segmented motion fields to reduce the coding rate (lossless coding) • Tool: context-based lossless encoder – Implemented with an arithmetic coder • Problem: high-order dependencies → large contexts → context dilution – i.e. too many contexts, making the conditional probabilities difficult to estimate • Solution: context quantization • M. Cagnazzo, M. Antonini, and M. Barlaud, “Mutual information-based context quantization,” Signal Proc.: Image Comm. (Elsevier Science), pp. 64–74, Jan. 2010. 17
  18. 18. Context quantization • Contexts (i.e. sequences of already encoded symbols) are grouped into classes • Rate increase: the average information loss of including a context into a class: ℒ(f) = Σ_{x∈𝒳} p(x) D( p(Y|x) ‖ p(Y|f(x)) ), where x is a generic context, Y the symbol to encode, and f the context quantization function, i.e. the context label 18
  19. 19. Context quantization • Problem: finding the optimum f • Classical approach – Start with a set of classes – Move a context from a class c_i to a class c_j as long as the relative entropy D( p(Y|x) ‖ p(Y|c_i) ) is larger than D( p(Y|x) ‖ p(Y|c_j) ) – Stopping criterion on the relative improvement of the objective function ℒ(f) or on the number of iterations 19
  20. 20. Context quantization • Classical approach – Intuitive, very popular, good results – Some open questions: • Does the basic step actually reduce the cost function at each iteration? • Is it the largest possible reduction? • If not, what is the largest possible reduction, and can we achieve it? • Contribution: answers to these questions 20
  21. 21. Context quantization • We found the expression of the cost-function variation Δℒ associated with moving a context from one class to another • We proved that with the classical approach, each iteration actually reduces the cost function… • … but not as much as actually possible • We found the best step • Rate reductions: up to 3.6% on motion data and up to a further 5% on synthetic data (global minimization based on dynamic programming) 21
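For readers who want to connect the cost ℒ(f) with the classical reassignment step discussed on slides 18-20, here is a small numpy sketch under simplifying assumptions: contexts and symbols are indexed by integers, p(x) and p(Y|x) are given as arrays, and one sweep reassigns each context to the class whose conditional distribution is closest in relative entropy. The improved step derived in the paper is not reproduced here.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Relative entropy D(p || q) in bits."""
    return float(np.sum(p * np.log2((p + eps) / (q + eps))))

def class_distributions(p_x, p_y_given_x, labels, n_classes):
    """p(Y|c) for each class c, obtained by mixing its contexts weighted by p(x)."""
    p_y_given_c = np.full((n_classes, p_y_given_x.shape[1]), 1e-12)
    for c in range(n_classes):
        w = p_x * (labels == c)
        if w.sum() > 0:
            p_y_given_c[c] = (w[:, None] * p_y_given_x).sum(axis=0) / w.sum()
    return p_y_given_c

def cost(p_x, p_y_given_x, labels, n_classes):
    """L(f) = sum_x p(x) D( p(Y|x) || p(Y|f(x)) )."""
    p_y_given_c = class_distributions(p_x, p_y_given_x, labels, n_classes)
    return sum(p_x[x] * kl(p_y_given_x[x], p_y_given_c[labels[x]])
               for x in range(len(p_x)))

def classical_sweep(p_x, p_y_given_x, labels, n_classes):
    """One sweep of the classical step: each context joins the class minimizing
    D(p(Y|x) || p(Y|c)), with class distributions computed before the sweep."""
    p_y_given_c = class_distributions(p_x, p_y_given_x, labels, n_classes)
    return np.array([np.argmin([kl(p_y_given_x[x], p_y_given_c[c])
                                for c in range(n_classes)])
                     for x in range(len(p_x))])
```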
  22. 22. ME criterion for WT-based video coding • WT video coding is based on a temporal transform rather than on classical temporal prediction • Therefore MSE-based ME is not guaranteed to be optimal • The optimal criterion is the maximization of the coding gain: CG = ( Σ_{i=1}^{M} a_i w_i σ_i² ) / ( Π_{i=1}^{M} (w_i σ_i²)^{a_i} ) • where i is the subband index, σ_i² the variance, a_i the relative number of coefficients, and w_i the normalization factor of the i-th subband • M. Cagnazzo, F. Castaldo, T. André, M. Antonini, and M. Barlaud, “Optimal motion estimation for wavelet video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, pp. 907–911, July 2007. 22
  23. 23. ME criterion for WT-based video coding • We showed that we only have to minimize ρ² = Π_{i=1}^{M} (σ_i²)^{a_i} • In general each MV influences all the subbands, so the problem is still complex • However, the CG can be analytically maximized for a particular class of MC-ed lifting schemes, the (N, 0) LS: (v_B*, v_F*) = argmin_{v_B, v_F} ℰ(ε_B) + ℰ(ε_F) + 2⟨ε_B, ε_F⟩ • Average rate reduction: 8% 23 (diagram: (2,0) lifting scheme mapping input frames x_1 … x_9 to high-frequency subbands h_1 … h_4 and low-frequency subbands l_1 … l_4) NB: All rate reductions for video are measured using the Bjontegaard metric
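As a side note, the two quantities on slides 22-23 are easy to compute from subband statistics; the sketch below does exactly that, with a_i the fraction of coefficients, w_i the synthesis weight and σ_i² the variance of subband i. The toy numbers are made up and only illustrate the definitions.

```python
import numpy as np

def coding_gain(a, w, var):
    """CG = (sum_i a_i * w_i * sigma_i^2) / (prod_i (w_i * sigma_i^2) ** a_i)."""
    a, w, var = map(np.asarray, (a, w, var))
    return np.sum(a * w * var) / np.prod((w * var) ** a)

def rho2(a, var):
    """Weighted product rho^2 = prod_i (sigma_i^2) ** a_i minimized by the ME criterion."""
    a, var = np.asarray(a), np.asarray(var)
    return np.prod(var ** a)

# Toy example: one temporal low band and one high band of equal size
a, w, var = [0.5, 0.5], [1.0, 1.0], [120.0, 9.0]
print(coding_gain(a, w, var), rho2(a, var))
```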
  24. 24. 3D video coding • MVD format : multiple views plus depth • Inter-view and inter-component redundancy • Three contributions for the upcoming standard 3D-HEVC 24
  25. 25. Modification of the Merge candidate list for 3D-VC • In the Merge mode, a block is predicted using a vector from a short list (Merge list) • Coding the list index is much less costly than coding the vector itself • The candidate can be a motion vector (MV) or a disparity vector (DV) • In 3D-HEVC, MVs are much more frequently selected than DVs • We have proposed to insert a further DV in the Merge list • Several positions in the primary and secondary lists have been tested • Best results obtained with the first position of the secondary list • We obtained both a rate reduction (0.6%) and a complexity reduction (4%) • Contribution accepted into the standard • E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. "Modification of the merge candidate list for dependent views in 3D-HEVC". In IEEE International Conference on Image Processing, September 2013. Melbourne, Australia. • E. Mora, B. Pesquet, M. Cagnazzo and J. Jung. Modification of the Merge Candidate List for Dependent Views in 3DV-HTM. Document JCT3V-B0069 for Shanghai meeting (MPEG number m26793). Shanghai (PRC), October 2012. 25
  26. 26. Intra mode inheritance for 3D-HEVC • Observation: blocks with strong contours and one dominant direction tend to be encoded with the same Intra directional mode in Texture and Depth • Idea: when coding Depth, add the co-located Intra mode to the Most Probable Mode list when a dominant direction is detected • Dominant direction is revealed by the presence of a single peak in the histogram of the gradient angle for the current block • E. Mora, J. Jung, M. Cagnazzo, and B. Pesquet-Popescu, “Codage de vidéos de profondeur basé sur l’héritage des modes intra de texture,” in Compression et Représentation des Signaux Audiovisuels, vol. 1, (Lille, France), pp. 1–4, 2012. • E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. “Depth Video Coding Based on Intra Mode Inheritance From Texture”. Submitted to APSIPA Transactions on Signal and Information Processing (2013) 26 (figure steps: 1. Find the reference texture block; 2. Compute the gradient statistics; 3. If a dominant direction is detected, add the texture Intra mode to the MPM list)
  27. 27. Intra mode inheritance for 3D-HEVC • The dominant angle proved to be an effective feature for detecting blocks where inheritance is effective • Inserting the inherited mode in the MPM list allows an average coding-rate reduction of ≈1% • Tests performed on MPEG sequences under Common Test Conditions 27
  28. 28. Enhanced quad-tree coding for 3D-HEVC • The 3D-HEVC codec uses quad-trees for encoding texture and depth • These trees are quite correlated • We propose an inter-component coding tool that reduces both complexity and rate by exploiting the quad-tree redundancy • Two variants, according to the component that is encoded first (texture or depth) • Contribution to the 3D-HEVC working draft and reference software • E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. “Initialization, limitation and predictive coding of the depth and texture quad-tree in 3D-HEVC Video Coding”. Accepted in IEEE Transactions on Circuits and Systems for Video Technology 28
  29. 29. Enhanced quad-tree coding for 3D-HEVC • Observation: texture coding units are very often partitioned at least as much as the depth ones • Therefore we can limit the depth-map partitioning level if we know the texture… • … or we can initialize the texture partitioning if we know the depth • Complexity reduction (fewer configurations to examine): up to 31% encoder time saving • Rate reduction (easier prediction of coding modes): up to 1.8% 29
  30. 30. Don’t Care Regions • A depth pixel only needs to be reconstructed such that the resulting geometric error leads to an acceptable distortion in the synthesized view (plot: error in the synthesized pixel value as a function of the disparity value, with the DCR interval highlighted) • G. Valenzise, G. Cheung, R. Galvao, M. Cagnazzo, B. Pesquet-Popescu, and A. Ortega, “Motion prediction of depth video for depth-image-based rendering using Don’t Care Regions,” in Picture Coding Symposium, vol. 1, (Krakow, Poland), pp. 1–4, 2012. 30
  31. 31. DCR Example (Kendo, frame 10, t = 5) 31
  32. 32. Don’t Care Regions We embedded DCR into a H.264/AVC encoder, changing three basic aspects: 1. Motion estimation 2. Residual coding 3. Skip mode 32
  33. 33. Don’t Care Regions 33
  34. 34. Don’t Care Regions • We compute and encode prediction residuals wrt the DCRs • For SKIP mode, no prediction residuals are coded – The reconstructed values could be far outside the DCR, leading to an arbitrarily high distortion in the synthesized view – We adopt a conservative policy: prevent SKIP selection when any reconstructed pixel is outside its DCR • Results: average rate saving of 7% • High preprocessing complexity 34
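A minimal sketch of the two DCR-aware encoder changes described above, under the assumption that each depth pixel's Don't Care Region is available as a per-pixel [low, high] interval: the residual is computed with respect to the interval (zero when the prediction already falls inside), and SKIP is allowed only when every reconstructed pixel stays inside its DCR. The helper names are hypothetical and this is not the actual H.264/AVC modification from the cited paper.

```python
import numpy as np

def dcr_residual(predicted, dcr_low, dcr_high):
    """Residual w.r.t. the DCR: zero when the predicted depth already lies inside
    the interval, otherwise the distance to the nearest DCR bound."""
    return np.clip(predicted, dcr_low, dcr_high) - predicted

def skip_allowed(reconstructed, dcr_low, dcr_high):
    """Conservative SKIP policy: allow SKIP only if every reconstructed pixel
    of the block falls inside its Don't Care Region."""
    return bool(np.all((reconstructed >= dcr_low) & (reconstructed <= dcr_high)))

# Toy usage on a 4x4 depth block with a +/-2 tolerance around the true depth
true_depth = np.full((4, 4), 100.0)
low, high = true_depth - 2, true_depth + 2
pred = true_depth + np.array([[0, 1, 3, -4]] * 4)  # some pixels fall outside the DCR
print(dcr_residual(pred, low, high))
print(skip_allowed(pred, low, high))  # False: SKIP would be prevented here
```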
  35. 35. Other work in 3D video coding • Dense disparity field for MVV and MVD coding • Depth coding using elastic curve model • I. Daribo, M. Kaaniche, W. Miled, M. Cagnazzo, and B. Pesquet-Popescu, “Dense disparity estimation in multiview video coding,” in IEEE Workshop on Multimedia Signal Processing, (Rio de Janeiro, Brazil), 2009. • M. Cagnazzo and B. Pesquet-Popescu, “Depth map coding by dense disparity estimation for MVD compression,” in IEEE Digital Signal Processing, (Corfu, Greece), 2011. • E. Mora, J. Jung, B. Pesquet-Popescu, M. Cagnazzo. "Modification of the disparity vector derivation process in 3D-HEVC". In IEEE Workshop on Multimedia Signal Processing, vol. 1, September 2013. Cagliari, Italy. 35
  36. 36. OBJECT-BASED IMAGE CODING 6 Conference papers 2 Journal papers
  37. 37. Region-based hyperspectral image coding Multispectral / Hyperspectral Image Map Segmentation (TS-VQ) Map Coding Region Coding • M. Cagnazzo, R. Gaetano, S. Parrilli, and L. Verdoliva, “Region based compression of multispectral images by classified KLT,” in EUSIPCO. 2006. • M. Cagnazzo, R. Gaetano, S. Parrilli, and L. Verdoliva, “Adaptive region-based compression of multispectral images,” in Proceed. of IEEE Intern. Conf. Image Proc., (Atlanta, GA), pp. 3249–3252, Oct. 2006 • M. Cagnazzo, S. Parrilli, G. Poggi, and L. Verdoliva, “Costs and advantages of object-based image coding with shape-adaptive wavelet transform,” EURASIP J. Image Video Proc., 2007 37
  38. 38. Region-based hyperspectral image coding • Spectral transform: WT, global KLT, class-based KLT, region-based KLT • Spatial transform: WT, SA-WT • Encoder: SA-SPIHT with optimal rate allocation among objects • Results: – 0.5 dB better than JP2K-Multicomponent – Better post-processing (i.e. classification) results • M. Cagnazzo, G. Poggi, and L. Verdoliva, “Region-based transform coding of multispectral images,” IEEE Trans. on Image Processing, vol. 16, pp. 2916–2926, Dec. 2007. 38
  39. 39. Region-based hyperspectral image coding AVIRIS image 32 bands, 0.3 bps (original @16bps) Landsat TM image 6 bands, 0.6 bps (original @8bps) 39
  40. 40. Adaptive wavelet and rate allocation • Adaptive wavelets (implemented via lifting schemes) allow changing the filters according to the signal characteristics • Further constraint: reconstruction without sending side information (lifting-scheme diagram: the input x(k) is split, then the prediction P and update U steps produce the detail signal x_d(k) = y_01(k) and the approximation x_a(k) = y_00(k)) • S. Parrilli, M. Cagnazzo, and B. Pesquet-Popescu, “Distortion evaluation in transform domain for adaptive lifting schemes,” in IEEE Workshop on Multimedia Signal Processing, (Cairns, Australia), pp. 200–205, 2008. • S. Parrilli, M. Cagnazzo, and B. Pesquet-Popescu, “Estimation of quantization noise for adaptive-prediction lifting schemes,” in IEEE Workshop on Multimedia Signal Processing, (Rio de Janeiro, Brazil), 2009. 40
  41. 41. Adaptive wavelet and rate allocation • The resulting transform is highly non-orthogonal • Problem: distortion evaluation in the transform domain in order to perform rate allocation • Solutions for uncorrelated noise – Good error energy evaluation – Performance improvement for ALS up to 3dB – Improved SSIM (+3%) 41 • M. Cagnazzo and B. Pesquet-Popescu, “Perceptual impact of transform coefficients quantization for adaptive lifting schemes,” in International Workshop on Video Processing and Quality Metrics for Consumer Electronics, (Scottsdale, AZ), 2010. • M. Abid, M. Cagnazzo, and B. Pesquet-Popescu, “Image denoising by adaptive lifting schemes,” in European Workshop on Visual Information Processing, vol. 1, (Paris, France), 2010
  42. 42. DISTRIBUTED VIDEO CODING 17 Conference papers 2 Submitted journal papers 3 Journal papers
  43. 43. Distributed video coding • Coding of many correlated sources • Encoders do not communicate with one another • Same RD performance as centralized coding (in theory only!) (architecture diagram: at the encoder, WZ frames are quantized and Slepian-Wolf coded with a turbo encoder and buffer, while key frames are Intra coded; at the decoder, image interpolation of the decoded key frames generates the side information, the turbo decoder corrects it, and minimum-distortion reconstruction yields the decoded WZ frames) 43
  44. 44. Image interpolation: High-order trajectories for ME in DVC • G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu. "High order motion interpolation for side information improvement in DVC". In International Conference on Acoustics, Speech and Signal Processing, March 2010. Dallas, TX • G. Petrazzuoli, M. Cagnazzo, and B. Pesquet-Popescu, “Fast and efficient side information generation in distributed video coding by using dense motion representation,” in European Signal Processing Conference, (Aalborg, Denmark), 2010. • G. Petrazzuoli, T. Maugey, M. Cagnazzo, and B. Pesquet-Popescu, “Side information refinement for long duration GOPs in DVC,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010. 44 Rate −3.3%
  45. 45. Image interpolation: Pel-based motion estimation • Block-based object trajectory used as initialization • Within each block, pixel-by-pixel vectors are obtained by refining the initialization (Cafforio-Rocca algorithm) • The refinement equations have been re-written and solved, since in this case the reference image does not exist • Rate reductions: 3.5% to 6% • M. Cagnazzo, T. Maugey, and B. Pesquet-Popescu, “A differential motion estimation method for image interpolation in distributed video coding,” in International Conference on Acoustics, Speech and Signal Processing, vol. 1, (Taiwan), pp. 1861–1864, 2009. • W. Miled, T. Maugey, M. Cagnazzo, and B. Pesquet-Popescu, “Image interpolation with dense disparity estimation in multiview distributed video coding,” in International Conference on Distributed Smart Cameras, (Como, Italy), 2009. • T. Maugey, W. Miled, M. Cagnazzo, and B. Pesquet-Popescu, “Méthodes denses d’interpolation de mouvement pour le codage vidéo distribué monovue et multivue,” in Colloque GRETSI - Traitement du Signal et des Images, (Dijon, France), 2009. • M. Cagnazzo, W. Miled, T. Maugey, and B. Pesquet-Popescu, “Image interpolation with edge-preserving differential motion refinement,” in IEEE International Conference on Image Processing, vol. 1, (Cairo, Egypt), pp. 361–364, 2009. 45
  46. 46. The Cafforio-Rocca algorithm: Sample results 46
  47. 47. Local and global SI fusion • Given the WZF, feature points on the reference frames are extracted by SIFT • Matching features allow performing a global motion compensation (first SI) • Local motion compensation (traditional method) is also performed (second SI) • The two SIs are merged using partial channel decoding and motion re-estimation • Experiments show an average rate reduction of ≈ 25% with respect to literature references • A. Abou-El Ailah, F. Dufaux, M. Cagnazzo, B. Pesquet-Popescu, and J. Farah, “Successive refinement of side information using adaptive search area for long duration GOPs in distributed video coding,” in International Conference on Telecommunications, (Beirut), 2012. • A. Abou-El Ailah, F. Dufaux, M. Cagnazzo, and J. Farah, “Fusion of global and local side information using support vector machine in transform-domain DVC,” in EUSIPCO, vol. 1, (Bucharest, Romania), pp. 1–5, Aug. 2012. • A. Abou-El Ailah, G. Petrazzuoli, J. Farah, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Side Information Improvement in Transform-Domain Distributed Video Coding". In SPIE - Applications of Digital Image Processing. San Diego, CA (USA), Aug. 2012 • A. Abou-El Ailah, F. Dufaux, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Fusion of global and local motion estimation for distributed video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, n. 1, pp. 158-172, Jan. 2013. 47
  48. 48. Multiview DVC • Motion models for temporal image interpolations – High-order motion interpolation – Pixel-based motion vector refinement • Multi-hypothesis SI fusion based on observed parity bits and Bayesian classification 48 (diagram: multiview GOP structure with key frames (KF) and Wyner-Ziv frames (WZ) arranged along the time and view axes) • G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu. “Novel solutions for side information generation and fusion in multiview distributed video coding”. Submitted to Eurasip Journal of Advances in Signal Processing
  49. 49. Multiview DVC • Step 1: produce a temporal estimation with HOMI • Step 2: produce an inter-view estimation with occlusion reduction (use disparity to estimate foreground objects) • Step 3: produce a fusion of the two estimations using a Left-Right Consistency Check to remove residual occlusions • Step 4: Select one out of these three images as side information 49
  50. 50. Multiview DVC • For one image out of N we ask for parity bits for both the temporal and the inter-view estimation • We compare the number of bits needed for correcting the two estimations: – If they are close, we choose the fusion image – If not, we select the image with the least rate • Equivalent to a Bayesian decision: D = arg max_d P(D = d | δ_R) = arg max_d p(δ_R | d) P(d) = arg max_d f_d(δ_R) 50
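A minimal sketch of the decision rule just described, in its thresholded form (compare the parity-bit counts of the two estimations and fall back to the fusion image when they are close). The closeness threshold is an illustrative placeholder; the actual rule in the work above is the Bayesian classifier on the rate difference δ_R, with likelihoods f_d learned from data.

```python
def select_side_information(bits_temporal, bits_interview, close_ratio=0.15):
    """Choose the SI for the following images: 'fusion' when the two parity-bit
    counts are close, otherwise the estimation that needed fewer bits."""
    delta = abs(bits_temporal - bits_interview)
    if delta <= close_ratio * max(bits_temporal, bits_interview):
        return "fusion"
    return "temporal" if bits_temporal < bits_interview else "interview"

# Toy usage: the temporal estimation needed clearly fewer parity bits
print(select_side_information(bits_temporal=1200, bits_interview=2100))  # -> 'temporal'
```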
  51. 51. Multiview DVC • Experiments show that the Bayesian classifier very often selects the best SI • It may be wrong only when the decoding rates are very close to each other; in that case, selecting a suboptimal SI does not degrade performance • Cumulated gain w.r.t. the state of the art: ≈ 9.1% rate reduction 51
  52. 52. Side information effectiveness • Side information is corrected with parity bits to produce the decoded WZ frame • Intuitively, the more similar the SI is to the original image, the fewer parity bits are needed • Traditionally, the PSNR between SI and WZF has been used to evaluate the SI quality • However, it is easy to build toy examples where two iso-PSNR images require very different numbers of correction bits (example: two SIs with the same PSNR of 29.1 dB, one requiring 137 kb of parity bits for a decoded quality of 39.3 dB, the other 192 kb for 35.4 dB) • T. Maugey, J. Gauthier, M. Cagnazzo, B. Pesquet. “Evaluation of side information effectiveness in distributed video coding”. IEEE TCSVT, accepted 52
  53. 53. Side information effectiveness • Questions: why is PSNR not always reliable? Can we find better metrics? • Applications: hash-based DVC systems, Witsenhausen-Wyner video coding systems, … • New framework for metric comparison based on end-to-end RD performance • Proposed metrics: SIQ_a(I_0, I_1) = 10 log10( 255² / Σ_p |I_0(p) − I_1(p)|^a ) and HSIQ(I_0, I_1) = 10 log10( N_bits / d_H(I_0, I_1) ) • SIQ_1 and HSIQ improve on PSNR for both theoretical and practical effectiveness measures (hash-based system: 20% rate reduction) • PSNR works well for homogeneous errors and starts failing for large but spatially concentrated errors • T. Maugey, C. Yaacoub, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Side information enhancement using an adaptive hash-based genetic algorithm in a Wyner-Ziv context,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010 53
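The following sketch implements the two metrics as reconstructed above. The exponent a and the use of per-pixel bit planes for the Hamming distance d_H are illustrative assumptions, since the slide does not fully specify them; this is not the reference implementation from the cited work.

```python
import numpy as np

def siq(si, wzf, a=1.0, eps=1e-12):
    """SIQ_a(I0, I1) = 10 log10( 255^2 / sum_p |I0(p) - I1(p)|^a )."""
    err = np.sum(np.abs(si.astype(float) - wzf.astype(float)) ** a)
    return 10 * np.log10(255.0 ** 2 / (err + eps))

def hsiq(si, wzf, eps=1e-12):
    """HSIQ(I0, I1) = 10 log10( N_bits / d_H(I0, I1) ), with d_H the Hamming
    distance between the bit-plane representations of the two images."""
    b0 = np.unpackbits(si.astype(np.uint8).ravel())
    b1 = np.unpackbits(wzf.astype(np.uint8).ravel())
    d_h = np.count_nonzero(b0 != b1)
    return 10 * np.log10(b0.size / (d_h + eps))

# Toy usage on two similar random 8-bit images
rng = np.random.default_rng(0)
i0 = rng.integers(0, 256, (64, 64), dtype=np.uint8)
i1 = np.clip(i0.astype(int) + rng.integers(-3, 4, i0.shape), 0, 255).astype(np.uint8)
print(siq(i0, i1), hsiq(i0, i1))
```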
  54. 54. IMVS using DVC • All frames are Intra coded • Each image is coded and stored only once • Large bandwidth required • Relatively low server space required 54
  55. 55. IMVS using DVC • P-frames are used: all possible frame dependencies are coded • Each image is coded many times • Smallest bandwidth required • Very large server space required 55
  56. 56. IMVS using DVC • WZ-frames are used: only parity bits are coded • Each image is coded and stored only once • Trade-off between server space and bandwidth 56
  57. 57. IMVS using DVC 57 (plot: bandwidth vs. server space trade-off, showing the Intra-only point, predictive coding where each image is coded many times, the ideal case where the path is known at encoding time, and the operating region of WZ coding)
  58. 58. IMVS for MVD using DVC • We proposed several strategies for view-switching • The best (adaptive) strategy achieves a rate reduction of more than 15% w.r.t. reference methods G. Petrazzuoli, M. Cagnazzo, F. Dufaux, and B. Pesquet-Popescu, “Using distributed source coding and depth image based rendering to improve interactive multiview video access,” in IEEE International Conference on Image Processing, vol. 1, (Bruxelles, Belgium), pp. 605–608, 2011. G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Enabling Immersive Visual Communications through Distributed Video Coding". IEEE MMTC E-Letter (May 2013). 58
  59. 59. Other work on DVC • Fusion schemes for multiview DVC • Iterative methods for SI refinement • DVC for multiple-view-plus-depth video • DVC and interactive multiview streaming • Local and global SI fusion • Nine further conference papers • A. Abou-El Ailah, F. Dufaux, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Fusion of global and local motion estimation for distributed video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, n. 1, pp. 158-172, Jan. 2013. • G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Enabling Immersive Visual Communications through Distributed Video Coding". IEEE MMTC E-Letter (May 2013). 59
  60. 60. ROBUST VIDEO DISTRIBUTION 7 Conference papers 1 Submitted journal paper + 2 in preparation 2 Journal papers
  61. 61. ABCD protocol • Problem: reliable diffusion of video over wireless networks • Construction of overlays to carry MDC video • Minimization of the number of sent packets (both video and management packets) • First contribution: a reliable extension of the IEEE 802.11 broadcast communication, using a control peer • Once a reliable broadcast channel is provided, the nodes attach to the stream as soon as they hear about it • C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “H.264-based multiple description coding using motion compensated temporal interpolation,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010 • C. Greco, G. Petrazzuoli, M. Cagnazzo, and B. Pesquet-Popescu, “An MDC-based video streaming architecture for mobile networks,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Hangzhou, China), pp. 1–4, 2011. • C. Greco and M. Cagnazzo, “A cross-layer protocol for cooperative content discovery over mobile ad-hoc networks,” International Journal of Communication Networks and Distributed Systems, vol. 6, July 2011. • C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “ABCD : Un protocole cross-layer pour la diffusion vidéo dans des réseaux sans fil ad-hoc,” in Colloque GRETSI - Traitement du Signal et des Images, (Bordeaux, France), 2011. 61
  62. 62. ABCD protocol • Once a reliable broadcast channel is provided, the nodes attach to the stream as soon as they hear about it (diagram: the source s advertises the stream and peers p1 and p2 send attachment messages) 62
  63. ABCD protocol • Once a reliable broadcast channel is provided, the nodes attach to the stream as soon as they hear about it (diagram: the source s sends video data to p1 and p2, which forward video data and attachment messages; peers p3 and p4 attach in turn) 63
  64. 64. ABCD protocol: parent switch • p* = arg min_p [ w_h h(p) + w_a a(p) + w_d d(p) − w_g g(p) ] 64
  65. 65. ABCD: simulation results (ns2) 65
  66. 66. ABCD/CoDiO • ABCD may suffer from high delay in large, crowded networks • To reduce the delay, we introduced a Congestion-Distortion Optimization (CoDiO) in the per-hop wireless broadcast transmission • We adjust the RTS/CTS retry limit k of each packet in a congestion-distortion optimized fashion • Small values of k reduce the congestion but increase the distortion, as the probability of obtaining the channel is lower • High values of k lower the distortion, but congestion increases due to the channel occupation • Cost function: J(k) = D(k) + λ C(k) • C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “Low-latency video streaming with congestion control in mobile ad-hoc networks,” IEEE Transactions on Multimedia, vol. 14, n. 4, pp. 1337-1350, Aug. 2012. Paper selected as “High quality paper” by the IEEE MMTC-R Letter board 66
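The optimization itself reduces to a small one-dimensional search once models (or measurements) of distortion and congestion as functions of the retry limit are available. The sketch below only illustrates that search; the placeholder model functions are not the ones used in the cited work.

```python
def codio_retry_limit(distortion, congestion, lam, k_candidates=range(8)):
    """Return the retry limit k minimizing J(k) = D(k) + lambda * C(k).
    'distortion' and 'congestion' are callables k -> value, e.g. models fitted
    from measurements on the network."""
    return min(k_candidates, key=lambda k: distortion(k) + lam * congestion(k))

# Toy usage with monotone placeholder models: distortion falls, congestion grows with k
best_k = codio_retry_limit(lambda k: 50.0 / (1 + k), lambda k: 2.0 * k, lam=1.0)
print(best_k)
```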
  67. 67. ABCD/CoDiO Challenges: • Model the effects of a single-node decision on the entire network • Even if a node switches off, alternative paths may be formed • Information about alternative paths is gathered at leaves and conveyed upstream • The information is refined where it actually matters, i.e. near the root – where a single decision affects a lot of nodes 67
  68. 68. ABCD/CoDiO: simulation results (ns2) 68
  69. 69. Network coding for video delivery • Network coding increases the network throughput by letting intermediate nodes process packets instead of simply relaying them • NC can easily be extended to wireless networks 69
  70. 70. Network coding • Using ABCD as an overlay to implement NC in wireless networks • Optimized scheduling for MDC in Expanded Window NC • Optimized scheduling for multiview video over NC • Blind source separation for reducing the NC overhead 70
  71. 71. Network coding for video delivery • RDO-scheduling in NC-based delivery • A generation is composed of the frames of a multi-view GOP or an MDC GOP • Each node must decide the schedule of frames • I. Nemoianu, C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “A framework for joint multiple description coding and network coding over wireless ad-hoc networks,” in International Conference on Acoustics, Speech and Signal Processing, (Kyoto, Japan), 2012 • I. Nemoianu, C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “A network coding scheduling for multiple description video streaming over wireless networks,” in EUSIPCO, vol. 1, (Bucharest, Romania), pp. 1–5, Aug. 2012. • I. Nemoianu, C. Greco, M. Cagnazzo, B. Pesquet-Popescu. "Multi-View Video Streaming over Wireless Networks with RD-Optimized Scheduling of Network Coded Packets". In SPIE Visual Communications and Image Processing Conference, San Diego, CA (USA), Nov. 2012. 71
  72. 72. Network coding for video delivery • RDO calls for a unique scheduling (send first the frame that maximally reduces the RD cost function) • NC calls for a different scheduling at each node (pseudo-random selection) in order to maximize the throughput • Solution: collect frames into groups with “similar” RD characteristics, and randomly select within a group 72
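A toy sketch of that compromise: the frames of a generation are ranked by RD importance, split into groups of similar importance, and each node draws its next frame at random inside the most important group, keeping per-node diversity for NC while still favouring the frames that matter most. The grouping rule and the importance values are illustrative, not the RD-optimized schedule of the cited papers.

```python
import random

def schedule_next_frame(frames, rd_gain, n_groups=3):
    """frames: iterable of frame ids still to send; rd_gain[f]: RD-cost reduction
    obtained by delivering frame f. Frames are sorted by decreasing gain, split
    into n_groups groups, and one frame is drawn at random from the top group."""
    ordered = sorted(frames, key=lambda f: rd_gain[f], reverse=True)
    group_size = max(1, len(ordered) // n_groups)
    return random.choice(ordered[:group_size])

# Toy usage: 8 frames of a generation with made-up RD gains
gains = dict(zip(range(8), [9.0, 8.6, 8.4, 5.1, 4.8, 2.0, 1.6, 1.1]))
print(schedule_next_frame(gains.keys(), gains))
```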
  73. 73. BSS for NC • In NC the intermediate nodes of a network send linear combinations of the packets they have previously received, with random coefficients taken from a finite field • The random coefficients must be added to the packet as headers, incurring an overhead • In a blind source separation (BSS) based approach, it could be possible to relieve the nodes from the need to include the coefficients in the packets • BSS consists in recovering a set of source signals 𝑆 from a set of mixed signals 𝑋 = 𝑓(𝑆), also referred to as observations, without knowing the sources themselves nor the mixing process parameters; in NC we have linear mixing, 𝑋 = 𝐴𝑆 • I. Nemoianu, C. Greco, M. Castella, B. Pesquet-Popescu, M. Cagnazzo. "On a practical approach to source separation over finite fields for network coding applications". In International Conference on Acoustics, Speech and Signal Processing, May 2013. Vancouver, Canada. 73
  74. 74. BSS for NC • BSS approaches in the literature for finite fields: – Iterative scan of packet combinations – Minimization of a contrast function • Our idea: add to the packets a signature that is degraded by linear combination • Then, the contrast function can be computed only on candidates carrying a valid signature • Problem: how to choose the signature so as to reduce the probability that a linear combination of packets still carries a valid signature • Simple solution: odd-parity bit • Drastic reduction of the search space 74
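A toy sketch of the odd-parity signature on binary packets: each source packet is padded so its total parity is odd, and a candidate produced during separation is examined further only if its parity is still odd. Any GF(2) mixture of an even number of sources fails the test, so roughly half of the invalid candidates are discarded immediately. The bit-list packet representation is purely illustrative.

```python
def add_odd_parity(bits):
    """Append a parity bit so that the total number of 1s in the packet is odd."""
    return bits + [1 - (sum(bits) % 2)]

def has_valid_signature(packet):
    """A candidate source packet is kept only if its parity is still odd."""
    return sum(packet) % 2 == 1

# Toy usage: two odd-parity sources and their XOR (a GF(2) mixture of the two)
s1 = add_odd_parity([1, 0, 1, 1, 0, 0, 1, 0])
s2 = add_odd_parity([0, 1, 1, 0, 0, 1, 0, 0])
mix = [a ^ b for a, b in zip(s1, s2)]
print(has_valid_signature(s1), has_valid_signature(s2), has_valid_signature(mix))
# -> True True False: the mixture is rejected without evaluating the contrast function
```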
  75. 75. CONCLUSION
  76. 76. Perspectives • “Classical” video coding: advanced models for rate control • 3D VC: – combined use of motion and disparity compensation to produce improved reference frames; – elastic deformation model for lossless coding of depth contours • DVC: – Improved SI generation using an elastic deformation model for estimating object shapes; – Geometry-based DVC system for MVD (no backward channel, no channel coding) • NC and streaming: use of “social” information to optimize interactive multiview streaming with a NC approach 76
  77. 77. New themes • Forensics, forgery detection • Feature representation and compression • Video protection • Immersive communications: holoscopy / holography, high dynamic range 77
