Machine learning approaches are being explored for video compression. Conservative approaches replace individual MPEG blocks with deep learning blocks, while disruptive end-to-end approaches aim to replace the entire MPEG chain. Optical flow networks can exploit temporal redundancy by estimating motion between frames. Fully neural network-based video compression models jointly learn motion estimation, motion compression, and residual compression in an end-to-end optimized framework. However, performance gains must be balanced against increased complexity, and neural network approaches are not yet mature enough to be included in video compression standards.
PR-340: DVC: An End-to-end Deep Video Compression Framework – Hyeongmin Lee
This PR12 talk, on the series' 340th paper, covers video compression using deep learning. The previous paper in the series covered deep-learning-based image compression; if you have time, I recommend watching that video first :)
Previous video: https://www.youtube.com/watch?v=rtuJqQDWmIA
paper link: https://arxiv.org/abs/1812.00101
youtube link: https://youtu.be/Dd8Gj2ZITkA
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/10/person-re-identification-and-tracking-at-the-edge-challenges-and-techniques-a-presentation-from-the-university-of-auckland/
Morteza Biglari-Abhari, Senior Lecturer at the University of Auckland, presents the “Person Re-Identification and Tracking at the Edge: Challenges and Techniques” tutorial at the May 2021 Embedded Vision Summit.
Numerous video analytics applications require understanding how people are moving through a space, including the ability to recognize when the same person has moved outside of the camera’s view and then back into the camera’s view, or when a person has passed from the view of one camera to the view of another. This capability is referred to as person re-identification and tracking. It’s an essential technique for applications such as surveillance for security, health and safety monitoring in healthcare and industrial facilities, intelligent transportation systems and smart cities. It can also assist in gathering business intelligence such as monitoring customer behavior in shopping environments. Person re-identification is challenging.
In this talk, Biglari-Abhari discusses the key challenges and current approaches for person re-identification and tracking, as well as his initial work on multi-camera systems and techniques to improve accuracy, especially fusing appearance and spatio-temporal models. He also briefly discusses privacy-preserving techniques, which are critical for some applications, as well as challenges for real-time processing at the edge.
Occlusion and Abandoned Object Detection for Surveillance Applications – Editor IJCATR
Object detection is an important step in any video analysis. The main difficulties in object detection are finding hidden objects and finding unrecognized objects. Although many algorithms have been developed to discard occluded regions as outliers, occlusion boundaries can potentially provide useful information about the scene's structure and composition. A novel framework for blob-based occluded object detection is proposed, along with a technique for detecting occlusion. It detects and tracks occluded objects in video sequences captured by a fixed camera in crowded environments. First, the background is modeled with a Mixture of Gaussians (MOG) technique for background subtraction. Pedestrians are then detected by computing Histogram of Oriented Gradients (HOG) descriptors and classifying them with a linear Support Vector Machine (SVM). On this basis, a recognition and tracking system is built to detect abandoned objects in public transportation areas such as train stations and airports. Several experiments were conducted to demonstrate the effectiveness of the proposed approach, and the results show its robustness and effectiveness.
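The background-subtraction step above can be sketched with a much-simplified, single-Gaussian stand-in for MOG (a real MOG model keeps several Gaussians per pixel; HOG+SVM detection would then run on the resulting foreground regions). Frames are flattened to 1-D pixel lists for brevity, and all parameter values are illustrative.

```python
# Simplified background subtraction: one running Gaussian per pixel.

class RunningGaussianBG:
    def __init__(self, alpha=0.1, k=2.5):
        self.alpha, self.k = alpha, k   # learning rate, foreground threshold (in stddevs)
        self.mean, self.var = None, None

    def apply(self, frame):
        """Return a foreground mask (1 = foreground) and update the model."""
        if self.mean is None:           # first frame initializes the background
            self.mean = [float(p) for p in frame]
            self.var = [15.0 ** 2] * len(frame)
            return [0] * len(frame)
        mask = []
        for i, p in enumerate(frame):
            d = p - self.mean[i]
            fg = int(d * d > (self.k ** 2) * self.var[i])
            mask.append(fg)
            if not fg:                  # adapt the model only where we see background
                self.mean[i] += self.alpha * d
                self.var[i] += self.alpha * (d * d - self.var[i])
        return mask

bg = RunningGaussianBG()
for _ in range(20):                     # learn a static background
    bg.apply([50, 50, 50, 50])
mask = bg.apply([50, 200, 200, 50])     # an object now covers the middle pixels
print(mask)                             # → [0, 1, 1, 0]
```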
Thesis presentation slides for the Ph.D. defense of Aliouat Ahcen, defended on 31-05-2023 at the LASA Laboratory, Electronics Department, Faculty of Technology, Badji Mokhtar - Annaba University, Algeria.
The presentation title is: Study and Implementation of an Object-based Video Encoder for Embedded Wireless Video Surveillance Systems.
Research Question: How can we detect regions of interest (ROI) in a captured video to ensure high-quality encoding and transmission over a wireless multimedia sensor network (WMSN) while minimizing bitrate and energy consumption?
Data compression has improved by leaps and bounds over the years due to technical innovation, enabling the proliferation of streamed digital multimedia and voice over IP. For example, a regular cadence of technical advancement in video codecs has led to massive reductions in file size – in fact, up to a 1000x reduction when comparing a raw video file to a VVC-encoded file. However, with the rise of machine learning techniques and diverse data types to compress, AI may be a compelling tool for next-generation compression, offering a variety of benefits over traditional techniques. In this presentation we discuss:
- Why the demand for improved data compression is growing
- Why AI is a compelling tool for compression in general
- Qualcomm AI Research’s latest AI voice and video codec research
- Our future AI codec research work and challenges
Video coding is an essential component of video streaming, digital TV, video chat and many other technologies. This presentation, an invited lecture to the US Patent and Trademark Office, describes some of the key developments in the history of video coding.
Many of the components of present-day video codecs were originally developed before 1990. From 1990 onwards, developments in video coding were closely associated with industry standards such as MPEG-2, H.264 and H.265/HEVC.
The presentation covers:
- Basic concepts of video coding
- Fundamental inventions prior to 1990
- Industry standards from 1990 to 2014
- Video coding patents and patent pools.
In October 2017, ISO/IEC JTC1 SC29/WG11 (MPEG) and ITU-T SG16/Q6 (VCEG) jointly published a Call for Proposals on video compression with capability beyond HEVC and its current extensions. It targets a new generation of video compression technology with substantially higher compression capability than the existing HEVC standard. The responses to the call were evaluated in April 2018, kicking off a new standardization activity in the Joint Video Experts Team (JVET) of VCEG and MPEG, with a target of finalization by the end of 2020. Three categories of video are addressed: standard dynamic range (SDR) video, high dynamic range (HDR) video, and 360° video. While SDR and HDR cover variants of conventional video to be displayed, e.g., on a suitable TV screen at very high resolution (UHD), the 360° category targets videos capturing a full surround view of the scene. This enables an immersive video experience with the possibility to look around in the rendered scene, e.g., when viewed using a head-mounted display. This application raises various technical challenges in compression, encoding, transport, and rendering. The talk summarizes the current state of the complete standardization project. Focusing on the SDR and 360° video categories, it highlights the development of selected coding tools compared to the state of the art. Representative examples of the new technological challenges as well as corresponding proposed solutions are presented.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic I... – Joonhyung Lee
A presentation introducing DeepLab V3+, the state-of-the-art architecture for semantic segmentation. It also includes detailed descriptions of how 2D multi-channel convolutions function, as well as a detailed explanation of depthwise separable convolutions.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Surveillance systems are expected to record video 24/7, which obviously requires huge storage space. Even though hard disks are cheaper today, the number of CCTV cameras is also increasing rapidly in order to boost security. Video compression is the best option for reducing the required storage space; however, existing video compression techniques are not adequate for modern digital surveillance monitoring, as these systems generate huge video streams. In this paper, a novel video compression technique is presented with a critical analysis of the experimental results.
Distributed Video Coding (DVC) has recently become increasingly popular among video coding researchers due to its attractive and promising features. DVC has a modified complexity balance between the encoder and decoder, in contrast to conventional video codecs. However, most reported DVC schemes have a high time delay at the decoder, which hinders their practical application in real-time systems. In this work, we focus on speeding up the Side Information (SI) generation module in DVC, which is a major function in the DVC coding algorithm and one of the time-consuming factors at the decoder. By implementing it with the Compute Unified Device Architecture (CUDA) on a General-Purpose Graphics Processing Unit (GPGPU), the experimental results show that a considerable speedup can be obtained with the proposed parallelized SI generation algorithm.
ETPS_Efficient_Two_pass_Encoding_Scheme_for_Adaptive_Streaming.pdf – Vignesh V Menon
In two-pass encoding, also known as multi-pass encoding, the input video content is analyzed in the first pass to help the second-pass encoding make better encoding decisions and improve overall compression efficiency. In live streaming applications, a single-pass encoding scheme is mainly used to avoid the additional first-pass encoding run-time needed to analyze the complexity of every video content. This paper introduces an Efficient low-latency Two-Pass encoding Scheme (ETPS) for live video streaming applications. In this scheme, Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for every video segment are extracted in the first pass to predict each target bitrate's optimal constant rate factor (CRF) for the second-pass constrained variable bitrate (cVBR) encoding. Experimental results show that, on average, ETPS compared to a traditional two-pass average bitrate encoding scheme yields encoding time savings of 43.78% without any noticeable drop in compression efficiency. Additionally, compared to a single-pass constant bitrate (CBR) encoding, it yields bitrate savings of 10.89% and 8.60% to maintain the same PSNR and VMAF, respectively.
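The first-pass idea behind ETPS can be sketched roughly as follows. The DCT-energy feature below is plain DCT-II arithmetic on a 1-D block, but the feature-to-CRF mapping (`predict_crf`) and its coefficients are entirely made up for illustration; the paper derives such a mapping from data.

```python
import math

# Hypothetical sketch: a cheap first pass extracts a DCT-energy texture
# feature, and a toy model maps (feature, target bitrate) to a second-pass CRF.

def dct_energy(block):
    """Sum of squared AC coefficients of a 1-D DCT-II (a crude texture measure)."""
    n = len(block)
    coeffs = [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                  for i, x in enumerate(block)) for k in range(n)]
    return sum(c * c for c in coeffs[1:])   # skip the DC coefficient

def predict_crf(energy, target_kbps):
    """Toy mapping: more texture or a lower bitrate target -> higher CRF (more compression)."""
    crf = 10 + 2.0 * math.log10(1 + energy) - 3.0 * math.log10(target_kbps / 1000)
    return max(0, min(51, round(crf)))      # CRF range of x264/x265-style encoders

flat    = [128] * 8                          # smooth block: little AC energy
texture = [0, 255, 0, 255, 0, 255, 0, 255]   # busy block: high AC energy
print(predict_crf(dct_energy(flat), 3000), predict_crf(dct_energy(texture), 3000))
```

The point of the sketch is only the ordering: a textured segment gets a higher predicted CRF than a flat one at the same target bitrate, which is what lets the second pass hit the bitrate without a full trial encode.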
NEW IMPROVED 2D SVD BASED ALGORITHM FOR VIDEO CODING – cscpconf
Video compression is one of the most important blocks of an image acquisition system. Compressing video reduces the required transmission bandwidth. In real-time video compression, the incoming video data is compressed directly without being stored first, so a real-time video compression system operates under stringent timing constraints. Current video compression standards such as MPEG and the H.26x series involve motion estimation and compensation blocks that are highly computationally expensive, making them unsuitable for real-time applications on resource-scarce systems. Applications such as video calling and video conferencing require low-complexity video compression algorithms so that they can run in environments with scarce computational resources (such as mobile phones). A low-complexity video compression algorithm based on 2D SVD exists; in this paper, a modification to that algorithm which provides higher PSNR at the same bit rate is presented.
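The low-complexity idea behind SVD-based coding can be sketched as follows: treat a frame as a matrix and keep only its dominant singular component, which needs far fewer numbers than the raw frame. This toy uses pure-Python power iteration on a small hypothetical "frame"; a real codec would use an optimized 2D-SVD routine and keep more components.

```python
# Rank-1 approximation of a frame via power iteration (toy, pure Python).

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def top_singular_triplet(A, iters=100):
    """Approximate the dominant singular triplet (sigma, u, v) by power iteration."""
    v = [1.0] * len(A[0])
    for _ in range(iters):
        u = matvec(A, v)
        n = sum(x * x for x in u) ** 0.5
        u = [x / n for x in u]
        v = matvec(transpose(A), u)
        n = sum(x * x for x in v) ** 0.5
        v = [x / n for x in v]
    Av = matvec(A, v)
    sigma = sum(x * x for x in Av) ** 0.5
    return sigma, [x / sigma for x in Av], v

def rank1(sigma, u, v):
    return [[sigma * ui * vj for vj in v] for ui in u]

frame = [[10, 10, 10, 10],      # hypothetical 4x4 "frame"
         [10, 10, 10, 10],
         [10, 10, 20, 20],
         [10, 10, 20, 20]]

sigma, u, v = top_singular_triplet(frame)
approx = rank1(sigma, u, v)
err = sum(abs(a - b) for ra, rb in zip(frame, approx) for a, b in zip(ra, rb))
raw = sum(abs(a) for row in frame for a in row)
print(err < 0.2 * raw)   # rank-1 already captures most of this frame's energy
```

Storing `sigma`, `u`, and `v` costs 9 numbers here versus 16 for the raw frame; for realistic frame sizes the savings grow quadratically per retained component.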
Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low ... – Vignesh V Menon
Abstract: In HTTP adaptive live streaming applications, video segments are encoded at a fixed set of bitrate-resolution pairs known as a bitrate ladder. Live encoders use the fastest available encoding configuration, referred to as a preset, to ensure the minimum possible latency in video encoding. However, an optimized preset and an optimized number of CPU threads for each encoding instance may result in (i) increased quality and (ii) efficient CPU utilization while encoding. For low-latency live encoders, the encoding speed is expected to be greater than or equal to the video framerate. In this light, this paper introduces a Just Noticeable Difference (JND)-Aware Low latency Encoding Scheme (JALE), which uses random forest-based models to jointly determine the optimized encoder preset and thread count for each representation, based on video complexity features, the target encoding speed, the total number of available CPU threads, and the target encoder. Experimental results show that, on average, JALE yields a quality improvement of 1.32 dB PSNR and 5.38 VMAF points at the same bitrate, compared to the fastest-preset encoding of the HTTP Live Streaming (HLS) bitrate ladder using the x265 open-source HEVC encoder with eight CPU threads per representation. These enhancements are achieved while maintaining the desired encoding speed. Furthermore, on average, JALE results in an overall storage reduction of 72.70%, a reduction in the total number of CPU threads used by 63.83%, and a 37.87% reduction in the overall encoding time, considering a JND of six VMAF points.
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil... – Ijripublishers Ijri
Global interconnect planning becomes a challenge as semiconductor technology continues to scale. Because of increasing wire resistance and higher capacitive coupling at smaller feature sizes, the delay of global interconnects becomes large compared with the delay of a logic gate, introducing a large performance gap that needs to be resolved. A novel equalized global link architecture and driver–receiver co-design flow are proposed for high-speed, low-energy on-chip communication by utilizing a continuous-time linear equalizer (CTLE). The proposed global link is analyzed using a linear-system method, and a formula for the CTLE eye opening is derived to provide high-level design guidelines and insights. Compared with a separate driver–receiver design flow, over 50% energy reduction is observed.
Optimal coding unit decision for early termination in high efficiency video c... – IJECEIAES
Video compression is an emerging research topic in the field of block-based video encoders. Due to the growth of video coding technologies, High Efficiency Video Coding (HEVC) delivers superior coding performance. With the increased encoding complexity, HEVC enhances the rate-distortion (RD) performance. In video compression, oversized coding units (CUs) have higher encoding complexity. Therefore, the computational encoding cost and complexity remain vital concerns, which need to be treated as an optimization task. In this manuscript, an enhanced whale optimization algorithm (EWOA) is implemented to reduce the computational time and complexity of HEVC. In the EWOA, a cosine function is incorporated with the controlling parameter A, and two correlation factors are included in the WOA for controlling the position of whales and regulating the movement of the search mechanism during the optimization and search processes. The bit streams in the luma coding tree block are selected using the EWOA, which defines the CU neighbors and is used in the HEVC. The results indicate that the EWOA achieves the best bit rate (BR), time saving, and peak signal-to-noise ratio (PSNR). The EWOA showed 0.006-0.012 dB higher PSNR than the existing models on real-time videos.
Deep learning-based switchable network for in-loop filtering in high efficie... – IJECEIAES
Video codecs are undergoing a smart transition in this era. The effect of deep learning on video compression is a research area that has not yet been fully investigated. The paper's goal is to reduce the ringing and artifacts that loop filtering causes in high-efficiency video compression. Even though much research is being done to lessen this effect, there are still many possible improvements. In this paper we focus on an intelligent solution for improving in-loop filtering in High Efficiency Video Coding (HEVC) using a deep convolutional neural network (CNN). The paper proposes the design and implementation of deep CNN-based loop filtering using a series of 15 CNN networks followed by a combine-and-squeeze network that improves feature extraction. The resulting output is free from double enhancement, and the peak signal-to-noise ratio is improved by 0.5 dB compared to existing techniques. The experiments then demonstrate that improving coding efficiency by pipelining this network to the current network and using it for higher quantization parameters (QP) is more effective than using it separately. Coding efficiency is improved by an average of 8.3% with the switching-based deep CNN in-loop filtering.
IEEE MMSP'21: INCEPT: Intra CU Depth Prediction for HEVC – Vignesh V Menon
Abstract: High Efficiency Video Coding (HEVC) improves encoding efficiency by utilizing sophisticated tools such as flexible Coding Tree Unit (CTU) partitioning. Coding Units (CUs) can be split recursively into four equally sized CUs ranging from 64×64 to 8×8 pixels. At each depth level (or CU size), intra prediction via exhaustive mode search is exploited in HEVC to improve the encoding efficiency, resulting in very high encoding time complexity. This paper proposes an Intra CU Depth Prediction (INCEPT) algorithm, which limits Rate-Distortion Optimization (RDO) for each CTU in HEVC by utilizing the spatial correlation with the neighboring CTUs, computed using a DCT-energy-based feature. Thus, INCEPT reduces the number of candidate CU sizes that must be considered for each CTU in HEVC intra coding. Experimental results show that the INCEPT algorithm achieves a better trade-off between encoding efficiency and encoding time saving (i.e., BDR/∆T) than the benchmark algorithms. While BDR/∆T is 12.35% and 9.03% for the benchmark algorithms, it is 5.49% for the proposed algorithm. As a result, INCEPT achieves a 23.34% reduction in encoding time on average while incurring only a 1.67% increase in bitrate compared to the original coding in the x265 HEVC open-source encoder.
High Efficiency Video Coding (HEVC) improves the encoding efficiency by utilizing sophisticated tools such as flexible Coding Tree Unit (CTU) partitioning. The Coding Unit (CU) can be split recursively into four equally sized CUs ranging from 64×64 to 8×8 pixels. At each depth level (or CU size), intra prediction via exhaustive mode search was exploited in HEVC to improve the encoding efficiency and result in a very high encoding time complexity. This paper proposes an Intra CU Depth Prediction (INCEPT) algorithm, which limits Rate-Distortion Optimization (RDO) for each CTU in HEVC by utilizing the spatial correlation with the neighboring CTUs, which is computed using a DCT energy-based feature. Thus, INCEPT reduces the number of candidate CU sizes required to be considered for each CTU in HEVC intra coding. Experimental results show that the INCEPT algorithm achieves a better trade-off between the encoding efficiency and encoding time saving (i.e., BDR/∆T) than the benchmark algorithms. While BDR/∆T is 12.35% and 9.03% for the benchmark algorithms, it is 5.49% for the proposed algorithm. As a result, INCEPT achieves a 23.34% reduction in encoding time on average while incurring only a 1.67% increase in bit rate than the original coding in the x265 HEVC open-source encoder.
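INCEPT's spatial-correlation feature is described as DCT energy based. As an illustration only (the paper's exact feature definition is not reproduced here), a hedged sketch of computing the 2-D DCT energy of a square block with an orthonormal DCT-II:

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    c = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        c.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n)])
    return c

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct_energy(block):
    """Sum of squared 2-D DCT-II coefficients of a square block."""
    n = len(block)
    c = dct_matrix(n)
    coeffs = matmul(matmul(c, block), transpose(c))
    return sum(v * v for row in coeffs for v in row)
```

Because this DCT-II is orthonormal, the total energy equals the block's pixel energy (Parseval); what distinguishes blocks is how that energy distributes across coefficients, which is what a DCT-based texture feature would exploit.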
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms like PageRank operate on graph representations such as Compressed Sparse Row (CSR), an adjacency-list based format.
Multiply with different modes (map)
1. Performance of sequential vs OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs OpenMP-based vector element sum.
2. Performance of memcpy-based vs in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
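The float-vs-bfloat16 storage comparison above can be illustrated with a hedged, stdlib-only sketch: bfloat16 is not a built-in Python type, so here it is simulated by truncating a float32 bit pattern to its top 16 bits, which is what bfloat16 storage effectively does.

```python
import struct

def to_bfloat16(x):
    """Truncation-based bfloat16 simulation: keep the top 16 bits of the
    float32 encoding (sign, 8 exponent bits, 7 mantissa bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def element_sum(values, storage=float):
    """Sum after converting each element to the chosen storage type."""
    return sum(storage(v) for v in values)

values = [0.1 * i for i in range(1000)]
exact = element_sum(values)                 # full-precision float storage
coarse = element_sum(values, to_bfloat16)   # simulated bfloat16 storage
# bfloat16 keeps the float32 exponent range but only ~3 decimal digits,
# so the reduced-precision sum drifts from the full-precision one.
```
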
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group ("MCG") expects demand to keep growing and supply to keep evolving, driven by institutional investment rotating out of offices as work from home ("WFH") expands, and by the ever-growing need for data storage as global internet usage rises, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as the progress of cloud services and edge sites, positioning the industry for strong expected annual growth of 13% over the next four years.
Whilst competitive headwinds remain, as illustrated by the recent second bankruptcy filing of Sungard, which blames "COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services", the industry has seen key adjustments; MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x by value by 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: "Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand."
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economy is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in the sophistication of cyberattacks aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Machine Learning approaches at video compression
1. Machine Learning approaches at
video compression
Roberto Iacoviello
RAI - Radiotelevisione Italiana
Centre for Research, Technological Innovation and
Experimentation (CRITS)
2. Machine Learning is like sex in high school: everyone is talking about it, a few know what to do, and only your teacher is doing it.
There are two or three recurring topics around AI. One is ethics, which to me sounds like: if we don't teach ethics to the machines, Skynet will kill us all.
Another is academic papers full of mathematics and inconsistent notation; after you read them you feel like: OK, and now?
Then there is real life: sometimes it's good and sometimes it's bad.
3. The dear old typical hybrid block-based approach. There are many new tools in VVC (Versatile Video Coding). In 30 years the MPEG group has developed many useful standards, but all based on the same schema. Now the group is moving towards new horizons: neural networks.
4. Two approaches:
Non-video approach: coded representation of neural networks
Neural network video approach, in two flavors:
• Conservative (one-to-one): replace one MPEG block with one deep learning block
• Disruptive (end-to-end): replace the entire MPEG chain
5. Non-video approach: coded representation of neural networks
Scope: representation of weights and parameters, no architecture
N18162 Marrakech
7. Coded representation of neural networks
• Represent different artificial neural networks
• Enable faster inference
• Enable use under resource limitations
8. Use cases
• Inference may be performed on a large number of devices
• The NNs used in an application can be improved incrementally
• Devices have limitations in terms of processing power and memory
• Several apps would otherwise need to store the same base neural network on the device multiple times
W17924 Macao
Type                                             Parameter size
Media content analysis                           From a few KB to several hundred MB
Translational app                                Currently around 200 MB
Compact Descriptors for Video Analysis (CDVA)    About 500-600 MB
9. MPEG Use cases
• UC10 Distributed training and evaluation of neural networks
for media content analysis
• UC11 Compact Descriptors for Video Analysis (CDVA)
• UC12 Image/Video Compression
• UC13 Distribution of neural networks for content processing
W17924 Macao
10. Methods
• Dropping connections
• Dropping layers
• Replacing convolutions with lower-dimensional ones
• Matrix decomposition
• Changing stride in convolutions without increasing output size
• Quantization (rate-distortion based)
• Quantization using a codebook
• Entropy coding
Summary: cut something, somewhere.
11. Methods
• Uniform Quantization
• Sequential Quantization
• Nonuniform Quantization
• Low-Rank Approximation
M47704, Geneva
[Diagram: two-stage sequential quantization. The original 32-bit weights pass through quantization stage 1 (10 bits) and dequantization 1, then quantization stage 2 (8 bits) to produce the compressed model; dequantization 2 is applied for inference.]
[Diagram: low-rank approximation. A W x H convolution is replaced by a W x 1 convolution followed by a 1 x H convolution, each followed by a ReLU.]
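The two-stage quantization pipeline above (32-bit weights, a 10-bit first stage, then an 8-bit second stage) can be sketched in a hedged, stdlib-only way; this only illustrates uniform quantize/dequantize round trips, not the details of the MPEG proposal:

```python
def quantize(weights, bits):
    """Uniform quantization: map floats to integer levels plus (lo, step)."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (2 ** bits - 1) or 1.0  # guard against constant input
    levels = [round((w - lo) / step) for w in weights]
    return levels, lo, step

def dequantize(levels, lo, step):
    """Reconstruct approximate float weights from integer levels."""
    return [lo + q * step for q in levels]

weights = [-0.73, -0.12, 0.0, 0.25, 0.91]

# Stage 1: 10-bit quantization, then dequantize.
q1, lo1, s1 = quantize(weights, 10)
stage1 = dequantize(q1, lo1, s1)

# Stage 2: re-quantize the stage-1 output at 8 bits (the compressed model);
# dequantization 2 would run on the decoder side for inference.
q2, lo2, s2 = quantize(stage1, 8)
restored = dequantize(q2, lo2, s2)

# Each uniform round trip is bounded by half a quantization step.
assert all(abs(w - r) <= s1 / 2 + s2 / 2 + 1e-12
           for w, r in zip(weights, restored))
```
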
12. Methods
• An "importance" estimation step
• With proper retraining of the model under fixed-point weight constraints, its precision can come very close to that of the floating-point model
• Quantize the coefficients with different precision for different layers
13. Video approach: Conservative
Neural Network based Filter for Video Coding
Core Experiment 13 on a neural network based filter for video coding investigates the following problems:
• The impact of the NN filter's position in the filter chain
• The generalization capability of the NN: how performance changes when the test QP is not the same as the training QP
JVET-N0840-v1
14. CE13-2.1: Convolutional Neural Network Filter (CNNF) for Intra Frame
JVET-N0169
Over VTM-4.0, All Intra:
Configuration       Y        U        V        EncT   DecT
DF+CNNF+SAO+ALF    -3.48%   -5.18%   -6.77%   142%   38414%
CNNF+ALF           -4.65%   -6.73%   -7.92%   149%   37956%
CNNF               -4.14%   -5.49%   -6.70%   140%   38411%
Pay attention to the decoding time.
16. CE13-1.1: Convolutional neural network loop filter
JVET-N0110-v1
Over VTM-4.0, Random Access:
Y        U         V         EncT   DecT
-1.36%   -14.96%   -14.91%   100%   142%
17. Neural Network based Filter for Video Coding
Each category investigated the following problems:
• The impact of the NN filter's position in the filter chain: there is always an objective gain
• The generalization capability of the NN: results indicate that the difference is minor
JVET-N_Notes_dD
What MPEG decided at the March meeting (25/3/2019): the performance/complexity trade-off indicates that the NN technology is currently not mature enough to be included in a standard.
As I said… sometimes life is bad.
19. Neural Network Video approach: Disruptive
Videos are temporally highly redundant. No deep image compression can compete with state-of-the-art video compression, which exploits this redundancy.
Optical Flow
20. Optical Flow
In computer vision tasks, optical flow is widely used to exploit temporal relationships. Learning-based optical flow methods can provide accurate motion information at the pixel level, but they are trained only on artificial/synthetic datasets.
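To show how a flow field exploits temporal redundancy, here is a hedged, minimal sketch (not the paper's network) of backward warping: each pixel of the current frame is predicted by sampling the previous frame at the position the flow points to, with nearest-neighbor sampling for simplicity.

```python
def warp(prev_frame, flow):
    """Predict the current frame by sampling prev_frame at p + flow(p).

    prev_frame: 2-D list of pixel values.
    flow: 2-D list of (dx, dy) displacements, same shape.
    """
    h, w = len(prev_frame), len(prev_frame[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            # Nearest-neighbor sample, clamped to the frame borders.
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            row.append(prev_frame[sy][sx])
        out.append(row)
    return out

# A uniform flow of (+1, 0) predicts a frame shifted one pixel left,
# so only the residual (here, the clamped border column) needs coding.
prev = [[1, 2, 3],
        [4, 5, 6]]
shift_left = [[(1, 0)] * 3 for _ in range(2)]
pred = warp(prev, shift_left)
```
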
22. • Learning-based optical flow estimation is utilized to obtain the motion information and reconstruct the current frame
• An end-to-end deep video compression model jointly learns motion estimation, motion compression, and residual compression
DVC: An End-to-end Deep Video Compression Framework
23. DVC: An End-to-end Deep Video Compression Framework
MPEG NN: Architecture² = Architecture of NN Architectures
26. MV Encoder and Decoder Network
DVC: An End-to-end Deep Video Compression Framework
27. DVC: An End-to-end Deep Video Compression Framework
Motion Compensation Network
28. DVC: An End-to-end Deep Video Compression Framework
Residual Encoder Net
Bit Rate Estimation Net
29. Loss Function
DVC: An End-to-end Deep Video Compression Framework
The whole compression system is end-to-end optimized via rate-distortion optimization: just one end-to-end formula that jointly learns motion estimation, motion compression, and residual compression. The rate term comprises both the motion entropy and the residuals entropy.
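A hedged sketch of the rate-distortion training objective described above, assuming the common form loss = λ·D + R, where the rate R is the sum of the estimated motion and residual entropies (the exact DVC formulation may differ in detail):

```python
def mse(a, b):
    """Mean squared error between two equal-length sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def rd_loss(original, reconstructed, motion_bits, residual_bits, lam=0.01):
    """Rate-distortion objective: lam * D + R.

    motion_bits and residual_bits stand in for the entropy estimates
    produced by the bit-rate estimation network.
    """
    distortion = mse(original, reconstructed)
    rate = motion_bits + residual_bits
    return lam * distortion + rate

frame = [100.0, 120.0, 130.0]
recon = [101.0, 119.0, 131.0]
# Trading more bits for lower distortion (or vice versa) moves this single
# number; end-to-end training tunes every module against it jointly.
loss = rd_loss(frame, recon, motion_bits=50.0, residual_bits=120.0)
```
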
30. Advantages of Neural Networks
• Excellent content adaptivity
• Improved coding efficiency by leveraging samples from distant locations
• Neural networks can represent both texture and features well
• The whole compression system is end-to-end optimized
31. RAI R&D: what we are doing
End-to-end chain
Issues: residual compression
32. New EBU Distribution Codecs activity
Please join the EBU Video Group, we'll have lots of fun! https://tech.ebu.ch/video
33. Machine Learning approaches at video compression
Roberto Iacoviello
roberto.iacoviello@rai.it
Grazie per l'attenzione (thank you for your attention)
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
On your left there is reinforcement learning; that means: this is the reward if you contact me.