Video processing
Introduction
• Video processing is the manipulation and analysis of digital video
sequences.
• Basic video processing techniques include trimming, image resizing,
brightness and contrast adjustment, fade-in and fade-out effects, and content analysis.
• These tasks can be performed using a variety of ML techniques, including
deep learning, computer vision, and natural language processing.
Formats
• MP4 - a widely used container format with efficient compression.
• AVI - a container format developed by Microsoft.
• MOV - a container format developed by Apple.
• AVCHD - commonly used for video recorded by digital camcorders
and DSLR cameras.
• FLV - used for streaming video over the internet; historically common
for videos on websites such as YouTube and Vimeo.
Key Concepts
• Compression - Compression is the process of reducing the size of a video file
while maintaining its quality. Video compression algorithms remove redundant
information.
• Frame - a single still image; a video is a sequence of frames that, when
played back in rapid succession, creates the illusion of motion.
• Frame rate - the number of frames displayed per second. It determines the
smoothness of the video.
• Resolution - the number of pixels in a video frame. A higher resolution
means more pixels and better quality.
• Aspect ratio - the ratio of the width of a video frame to its height.
Common aspect ratios include 4:3 and 16:9.
Video Processing Techniques
• Compression: to reduce the size of a video file while maintaining its
quality.
• Enhancement: to improve the visual quality of a video, such as noise
reduction, color correction, and sharpening.
• Restoration: to repair or improve the quality of a video that has been
degraded by noise, blur, or other factors.
• Analysis: to extract information from video sequences, such as object
tracking, facial recognition, and scene analysis.
Video Compression
• Inter-frame compression is a technique that reduces the amount of data
needed to represent a video by only storing the differences between
consecutive frames, instead of storing each frame in its entirety.
• This is done by comparing each frame to the preceding one and only
storing the changes, rather than the entire frame.
• The most commonly used codecs are H.264, VP9, and HEVC.
Video Compression
• Intra-frame compression, also known as intra-coded compression,
works by compressing each frame individually.
• It uses techniques such as Discrete Cosine Transform (DCT) and
Color Quantization to compress the data of each frame.
• Discrete Cosine Transform (DCT) - a transform applied to image pixels in
the spatial domain to convert them into the frequency domain, where
redundancy can be identified.
• Quantization - the process of mapping a continuous, infinite range of
values to a smaller set of discrete, finite values.
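To make the two steps concrete, here is a minimal sketch of intra-frame compression on a single 8x8 block using OpenCV's DCT; the file name and the quantization step are illustrative assumptions, not values from any particular codec.

```python
# Minimal intra-frame compression sketch on one 8x8 block (illustrative only).
import numpy as np
import cv2

# "frame.png" is a placeholder grayscale frame.
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
block = frame[:8, :8]                    # one 8x8 block of pixels

coeffs = cv2.dct(block)                  # spatial domain -> frequency domain
step = 16                                # illustrative quantization step
quantized = np.round(coeffs / step)      # many high-frequency coeffs become 0

# Decoder side: dequantize and invert the DCT to approximate the block.
restored = cv2.idct(quantized * step)
```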
Video Compression
• Lossy compression is a technique that removes some of the data from
the video in order to reduce its file size.
• This can be done in various ways, such as by removing redundant data
or by removing information that the human eye is less likely to notice.
• The result is a lower quality video but with a smaller file size.
• It's worth noting that most existing video compression standards use
inter-frame compression, as it is more efficient than intra-frame
compression alone.
Video Compression
• Fractal Compression: This technique uses fractal mathematics to
compress an image. The image is broken down into smaller fractal
patterns, which can be used to recreate the original image with a
smaller file size.
• Vector Quantization: This technique groups similar image features
together and replaces them with a single symbol. This reduces the
amount of data needed to represent the image and can be applied on
both grayscale and color images.
• Run-Length Encoding: This technique is used for images with large
areas of uniform color. It replaces repeating pixels with a single
symbol, reducing the amount of data needed to represent the image.
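Run-length encoding is simple enough to show end to end; the following is a minimal sketch in plain Python (not tied to any codec) of how runs of identical pixels compress to (value, count) pairs.

```python
# Minimal run-length encoder/decoder for a row of pixel values.
def rle_encode(pixels):
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return runs

def rle_decode(runs):
    return [value for value, count in runs for _ in range(count)]

row = [255] * 6 + [0] * 3 + [255]
print(rle_encode(row))                      # [[255, 6], [0, 3], [255, 1]]
assert rle_decode(rle_encode(row)) == row   # lossless round trip
```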
Video Enhancement Techniques
• Super-resolution: increases the resolution of a video by estimating and
reconstructing high-resolution details from low-resolution frames.
• Denoising: reduces noise in a video by removing or reducing random
variations in pixel intensity.
• Color correction: improves the color accuracy of a video by adjusting the color
balance, saturation, and brightness.
• Deblurring: removes blur from a video caused by camera shake or fast-moving
objects.
• Stabilization: removes jitter or shake from a video caused by camera
movement.
Video Super Resolution
• Video super-resolution (VSR) is a technique used to increase the resolution of a video by
estimating and reconstructing high-resolution details from low-resolution frames. Some
common video super-resolution techniques include:
• Interpolation-based methods: These methods use interpolation algorithms such as bicubic
or Lanczos to estimate missing pixels in the high-resolution version of the video.
• Reconstruction-based methods: These methods use image or video processing techniques
to reconstruct the high-resolution version of the video. Examples include
optical flow-based and spatial-temporal super-resolution methods.
• Deep learning-based methods: These methods use deep neural networks (DNNs) to learn
the mapping between low-resolution and high-resolution images. Examples of these
methods include deep convolutional neural networks (CNNs) and generative adversarial
networks (GANs).
• Hybrid methods: These methods combine multiple techniques to achieve the best results,
for example combining deep learning with interpolation- or reconstruction-based
methods.
Interpolation
• Interpolation-based methods for video super-resolution (VSR) use interpolation algorithms to
estimate missing pixels in the high-resolution version of the video. Some common
interpolation-based VSR methods include:
• Nearest-neighbor interpolation: This method replicates the value of the nearest pixel to fill in
missing pixels. It is simple to implement but can introduce "blocky" artifacts in the output.
• Bilinear interpolation: This method uses the weighted average of the four closest pixels to estimate
the value of missing pixels. It is a more sophisticated method than nearest-neighbor interpolation
but can still introduce some artifacts in the output.
• Bicubic interpolation: This method uses the weighted average of the 16 closest pixels to estimate
the value of missing pixels. It is more sophisticated than bilinear interpolation and typically
produces better results, but it is also more computationally expensive.
• Lanczos interpolation: This method uses a windowed sinc function to estimate the value of
missing pixels. It is a highly sophisticated method and is known for producing the best results
among interpolation-based methods, but it is also the most computationally expensive.
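As a concrete illustration, all four kernels are available as interpolation flags in OpenCV's resize; here is a minimal sketch upscaling one frame 2x (the input path is a placeholder).

```python
# Upscale one frame 2x with each interpolation kernel discussed above.
import cv2

lowres = cv2.imread("frame.png")          # placeholder low-resolution frame
h, w = lowres.shape[:2]

methods = {
    "nearest": cv2.INTER_NEAREST,
    "bilinear": cv2.INTER_LINEAR,
    "bicubic": cv2.INTER_CUBIC,
    "lanczos": cv2.INTER_LANCZOS4,
}
for name, flag in methods.items():
    upscaled = cv2.resize(lowres, (2 * w, 2 * h), interpolation=flag)
    cv2.imwrite(f"upscaled_{name}.png", upscaled)
```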
Reconstruction-based Methods
• Reconstruction-based methods for video super-resolution (VSR) use
image or video processing techniques to reconstruct the
high-resolution version of the video.
• These methods typically involve building a model of the image or
video and using that model to generate the high-resolution version.
• Some common reconstruction-based VSR methods include:
• Optical flow-based methods: These methods use the motion information
between the frames to estimate the high-resolution frame.
• Spatial-Temporal Super-Resolution methods: These methods use a
combination of spatial and temporal information in order to increase the
resolution of the video.
Optical Flow-based Methods
• Optical flow-based methods for video super-resolution (VSR) use the motion information
between frames to estimate the high-resolution frame.
• These methods work by estimating the motion vectors between low-resolution frames and
using these vectors to warp the pixels of one frame to match the position of the pixels in
another frame.
• The high-resolution frame can then be reconstructed by combining the warped frames.
• Optical flow-based VSR methods typically involve the following steps:
• Estimating the optical flow: This step involves estimating the motion vectors between
low-resolution frames using techniques such as Lucas-Kanade, Horn-Schunck, or deep
learning-based optical flow estimation.
• Warping frames: This step involves using the motion vectors to warp the pixels of one frame to
match the position of the pixels in another frame.
• Combining frames: This step involves combining the warped frames to form the high-resolution
frame. This can be done by averaging, weighted averaging, or median filtering the warped frames.
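A minimal sketch of these three steps, assuming two consecutive grayscale frames on disk and using Farneback dense optical flow as one interchangeable choice for the estimation step:

```python
# Estimate flow, warp the previous frame, and fuse with the current frame.
import cv2
import numpy as np

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # placeholder frames
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# 1. Estimate per-pixel motion vectors between the two frames.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

# 2. Warp the previous frame toward the current one using the flow field.
h, w = prev.shape
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x + flow[..., 0]).astype(np.float32)
map_y = (grid_y + flow[..., 1]).astype(np.float32)
warped = cv2.remap(prev, map_x, map_y, cv2.INTER_LINEAR)

# 3. Combine aligned frames (simple averaging here).
fused = cv2.addWeighted(warped, 0.5, curr, 0.5, 0)
```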
Spatial-Temporal Super-Resolution
• Spatial-Temporal Super-Resolution (STSR) is a method that combines spatial and
temporal information in order to increase the resolution of a video.
• These methods utilize both the spatial information of the individual frames and the
temporal information between frames to generate high-resolution video.
• STSR methods typically involve the following steps:
• Spatial resolution enhancement: This step involves enhancing the resolution of each individual
frame of the video using interpolation-based methods, SISR or DNN-based methods.
• Temporal information extraction: This step involves extracting temporal information from the
video, such as motion vectors or optical flow, that can be used to align the frames and improve the
resolution of the video.
• Temporal resolution enhancement: This step involves using the extracted temporal information to
align and fuse the frames to generate the high-resolution video.
• STSR methods can be effective for VSR, especially when applied to videos with complex
temporal dynamics such as fast-moving objects or complex backgrounds. These methods
can also be robust to occlusions and motion discontinuities. However, they can
be computationally expensive, especially when extracting temporal information.
Denoising in Video Enhancement
• Denoising in video enhancement is the process of removing noise from a video in
order to improve its visual quality.
• Noise in videos can be caused by various factors such as low-light conditions,
electronic noise in the camera sensor, or compression artifacts.
• There are several methods for denoising videos, including:
• Spatial filtering: This method involves applying a filter to each frame of the video to reduce
noise. Examples of spatial filters include median filters and Gaussian filters.
• Temporal filtering: This method involves using information from multiple frames of the video
to reduce noise. Examples of temporal filters include Kalman filters and recursive filters.
• Non-local Means filter: This method is a spatial-temporal filter that uses information from
similar pixels in other frames to remove noise.
• Deep learning-based methods: These methods use deep neural networks (DNNs) to learn the
mapping between noisy and denoised videos. Examples of DNN-based denoising methods
include autoencoder-based and UNet-based methods.
• Hybrid methods: These methods combine spatial, temporal, and deep learning-based methods
to denoise videos.
Spatial Filtering
• Spatial filtering is a method for denoising videos that involves applying a filter to each
frame of the video to reduce noise.
• These filters operate on the spatial domain, meaning that they process the pixels in each
frame independently of the pixels in other frames.
• Spatial filtering methods are fast and easy to implement, and can be useful for removing
noise such as sensor noise, impulse noise, or salt-and-pepper noise.
• Some examples of spatial filters include:
• Median filter: This filter replaces the value of a pixel with the median value of the pixels in a
neighborhood around it. It is effective at removing salt-and-pepper noise but can blur fine details.
• Gaussian filter: This filter replaces the value of a pixel with a weighted average of the pixels in a
neighborhood around it, where the weighting is determined by a Gaussian function. It is effective at
removing Gaussian noise but can blur fine details.
• Mean filter: This filter replaces the value of a pixel with the mean value of the pixels in a
neighborhood around it. It is effective at reducing random noise but can blur fine details.
• Bilateral filter: This filter combines a spatial Gaussian weighting with a weighting on
intensity differences. It smooths the image while preserving edges (see the sketch below).
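All four spatial filters above are one-liners in OpenCV; the kernel sizes and parameters below are illustrative choices, not recommendations.

```python
# Classic spatial denoising filters applied to a single frame.
import cv2

noisy = cv2.imread("noisy_frame.png")             # placeholder noisy frame

median    = cv2.medianBlur(noisy, 5)              # salt-and-pepper noise
gaussian  = cv2.GaussianBlur(noisy, (5, 5), 1.5)  # Gaussian noise
mean      = cv2.blur(noisy, (5, 5))               # simple box average
bilateral = cv2.bilateralFilter(noisy, 9, 75, 75) # edge-preserving smoothing
```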
Temporal Filtering
• Temporal filtering is a method for denoising videos that involves using information from multiple
frames of the video to reduce noise.
• These filters operate in the temporal domain, meaning that they process the pixels in each frame in
relation to the pixels in other frames.
• Temporal filtering methods can be more effective at removing noise such as temporal noise (noise
that changes over time), camera shake, or compression artifacts.
• Some examples of temporal filters include:
• Kalman filter: This filter uses a mathematical model to estimate the state of a system over time, and is used to
predict the current frame based on the previous frames. It can be effective at removing temporal noise, but can
be computationally expensive.
• Recursive filter: This filter uses recursive algorithms to estimate the current frame based on the previous
frames. It is similar to the Kalman filter but more computationally efficient.
• Optical flow-based filter: This filter uses optical flow to align frames, and then uses spatial filtering to remove
noise. It can be effective at removing noise caused by camera shake but can be sensitive to occlusions and
motion discontinuities.
• Recurrent neural networks (RNN): This filter uses a recurrent neural network to estimate the current frame
based on the previous frames. It is similar to the Kalman filter but uses a deep learning approach, can be more
powerful in removing noise and can be computationally expensive.
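As an illustration of the recursive-filter idea, here is a minimal exponential moving-average sketch over a video stream; the source path and the value of alpha are assumptions, not part of any standard.

```python
# Recursive temporal filter: blend each frame with the running estimate.
import cv2

cap = cv2.VideoCapture("input.mp4")   # placeholder video source
alpha = 0.2                           # smaller alpha = stronger smoothing
estimate = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = frame.astype("float32")
    if estimate is None:
        estimate = frame              # initialize with the first frame
    else:
        estimate = alpha * frame + (1 - alpha) * estimate
    denoised = estimate.astype("uint8")   # current denoised output frame
cap.release()
```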
The Non-local Means Filter
• The Non-local Means (NLM) filter is a method for denoising videos that uses
information from similar pixels in other frames to remove noise.
• It is a spatial-temporal filter, meaning that it processes the pixels in each frame in
relation to the pixels in other frames and in a neighborhood around them.
• The NLM filter operates in the following steps:
• For each pixel in the current frame, it searches for similar pixels in other frames.
• It computes a weighted average of the similar pixels, where the weighting is determined by a
similarity metric such as the Euclidean distance.
• It replaces the value of the current pixel with the computed weighted average.
• The NLM filter is effective at removing noise such as temporal noise, camera
shake, and compression artifacts. It can also preserve edges and fine details better
than spatial filters. However, it can be computationally expensive, as it requires
searching for similar pixels in other frames.
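OpenCV ships a multi-frame non-local means implementation; a minimal sketch denoising the middle frame of a five-frame window (the video path and filter strengths are placeholders):

```python
# Multi-frame non-local means denoising with OpenCV.
import cv2

cap = cv2.VideoCapture("input.mp4")               # placeholder source
frames = [cap.read()[1] for _ in range(5)]        # five consecutive frames
cap.release()

# Denoise the middle frame (index 2) using a temporal window of 5 frames.
denoised = cv2.fastNlMeansDenoisingColoredMulti(
    frames, imgToDenoiseIndex=2, temporalWindowSize=5,
    h=4, hColor=4, templateWindowSize=7, searchWindowSize=21)
```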
Color correction
• Color correction is the process of adjusting the colors of a video to improve its visual
quality. It can be used to correct color imbalances, fix exposure issues, and improve the
overall color and tone of the video. There are several techniques that can be used for
color correction:
1. White balance: This technique is used to correct the color cast of a video caused by
different lighting conditions. It can be done by adjusting the color temperature of the video
to make it appear more neutral.
2. Color grading: This technique is used to adjust the overall color and tone of a video. It can
be done by adjusting the brightness, contrast, saturation, and hue of the video.
3. Curves: This technique allows for fine-grained color correction by adjusting the brightness
levels of individual color channels.
4. LUTs (Lookup tables): A LUT is a predefined table that maps input colors to output
colors. Using a LUT allows for fast and consistent color correction across multiple shots.
5. Color matching: This technique is used to match the colors of different shots or scenes. It
can be done by adjusting the colors of one shot to match the colors of another shot.
6. Machine Learning-based methods: These methods use machine learning algorithms to
learn the underlying structure of the video, and then use this knowledge to correct the
color.
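As one concrete example of automatic white balance, here is a minimal sketch of the gray-world method (my choice of algorithm, since the slide does not name one): each channel is scaled so the average color of the frame becomes neutral gray.

```python
# Gray-world white balance: scale channels so the mean color is neutral.
import cv2
import numpy as np

frame = cv2.imread("shot.png").astype(np.float32)   # placeholder frame
means = frame.reshape(-1, 3).mean(axis=0)           # per-channel B, G, R means
gray = means.mean()                                 # target neutral level
balanced = np.clip(frame * (gray / means), 0, 255).astype(np.uint8)
```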
Deblurring
• Deblurring, a form of image restoration, is the process of removing blur from an image or video
caused by factors such as camera shake, fast motion, or a small aperture. There are several
techniques that can be used for deblurring:
1. Inverse Filtering: This technique uses a known blur function to reverse the blurring effect. This
method is highly sensitive to noise and is usually not used in practice.
2. Wiener Filtering: This technique uses a statistical model of the image and the blur function to
estimate the original image. This method is less sensitive to noise but can still produce poor results.
3. Blind Deconvolution: This technique is used when the blur function is not known. It attempts to
estimate both the blur function and the original image simultaneously. This method can produce
good results but is highly sensitive to noise and initialization.
4. Regularization-based methods: These methods add a regularization term to the objective function
to prevent overfitting. Examples include Tikhonov regularization, Total Variation regularization, and
Sparse Representation based methods.
5. Machine Learning-based methods: These methods use machine learning algorithms such as Deep
Learning to learn the underlying structure of the image and then use this knowledge to deblur the
image.
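A frequency-domain Wiener deconvolution sketch in NumPy, under the assumptions that the blur kernel (PSF) is known and that the constant K approximates the noise-to-signal power ratio:

```python
# Wiener deconvolution: F_hat = conj(H) / (|H|^2 + K) * G in the Fourier domain.
import numpy as np

def wiener_deblur(blurred, psf, K=0.01):
    # Zero-pad the PSF to the image size so the spectra align.
    psf_padded = np.zeros_like(blurred, dtype=np.float64)
    kh, kw = psf.shape
    psf_padded[:kh, :kw] = psf
    H = np.fft.fft2(psf_padded)     # transfer function of the blur
    G = np.fft.fft2(blurred)        # spectrum of the blurred image
    F_hat = np.conj(H) / (np.abs(H) ** 2 + K) * G
    # The estimate is circularly shifted by the kernel offset; roll it back.
    est = np.real(np.fft.ifft2(F_hat))
    return np.roll(est, (-(kh // 2), -(kw // 2)), axis=(0, 1))
```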
Stabilization
• Video stabilization is the process of removing the unwanted camera shake or jitter from a
video. It is used to make the video appear smoother and more stable. There are several
techniques that can be used for video stabilization:
1. Optical Flow: This technique uses the motion of the pixels between consecutive frames to
estimate the camera motion. The video is then compensated for this motion by aligning
the frames.
2. Feature-based: This technique uses features such as points, edges, or corners in the
video to estimate the camera motion. These features are tracked between consecutive
frames to estimate the motion (see the sketch below).
3. Hybrid methods: These methods combine the above techniques. They first use feature-based
methods to estimate the motion, then use optical flow to refine the motion estimate.
4. Gyroscopic stabilization: This technique uses a gyroscopic sensor to measure the rotation
of the camera. The video is then compensated for this rotation by aligning the frames.
5. Machine Learning-based methods: These methods use machine learning algorithms such
as Deep Learning to learn the underlying structure of the video and then use this
knowledge to stabilize the video.
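A minimal feature-based sketch for one pair of frames, using corner tracking and a rigid transform; the frame paths and parameter values are illustrative.

```python
# Feature-based stabilization: track corners, estimate motion, compensate.
import cv2

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # placeholder frames
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Detect corners in the previous frame and track them into the current one.
pts_prev = cv2.goodFeaturesToTrack(prev, maxCorners=200,
                                   qualityLevel=0.01, minDistance=30)
pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, pts_prev, None)
good_prev = pts_prev[status.flatten() == 1]
good_curr = pts_curr[status.flatten() == 1]

# Estimate camera motion and warp the current frame back to cancel it.
m, _ = cv2.estimateAffinePartial2D(good_curr, good_prev)
h, w = curr.shape
stabilized = cv2.warpAffine(curr, m, (w, h))
```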
Segmentation
• The two most widely used segmentation techniques are:
• Semantic segmentation: This involves dividing a video into segments
based on semantic content, such as by identifying and separating
different objects or regions in a video, and then classifying them into
semantic categories.
• Motion segmentation: This involves dividing a video into segments
based on motion, such as by identifying and separating different
moving objects or regions in a video.
R-CNN
• Generate an initial sub-segmentation to produce many candidate regions.
• Use a greedy algorithm to recursively combine similar regions into larger ones.
• Use the generated regions to produce the final candidate region proposals.
Fast R-CNN
• The author of the previous paper (R-CNN) solved some of the drawbacks of R-CNN to build a faster object detection algorithm, called Fast R-CNN.
• The approach is similar to the R-CNN algorithm.
• But instead of feeding the region proposals to the CNN, we feed the input image to the CNN to generate a convolutional feature map.
Faster R-CNN
• R-CNN and Fast R-CNN use selective search to find the region proposals.
• Faster R-CNN instead lets the network learn the region proposals, via a Region Proposal Network (RPN).
Background Subtraction
• Frame differencing: Compares each frame to the
previous frame and detects changes.
• Running average: Keeps a running average of the
background and detects changes that deviate
from the average.
• Gaussian mixture model: Uses a statistical model
to represent the background and detect changes.
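The Gaussian mixture approach is built into OpenCV; a minimal sketch that extracts a foreground mask per frame (the video path and parameters are placeholders):

```python
# Background subtraction with a Gaussian mixture model (MOG2).
import cv2

cap = cv2.VideoCapture("input.mp4")               # placeholder source
subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)   # 255 = foreground, 127 = shadow, 0 = background
cap.release()
```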
Optical flow and Clustering-based methods
• Optical flow algorithms compute the motion of each pixel in the
image by analyzing the changes in the pixel's position and color
from one frame to the next.
• Clustering-based methods group pixels or regions by using a set of
features, such as color, texture, or motion information, to represent
them, and then applying a clustering algorithm to group similar
features together.
• Popular clustering algorithms used for motion segmentation
include k-means, mean-shift, and Gaussian mixture models; a k-means sketch follows below.
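A minimal sketch of motion segmentation by clustering dense optical-flow vectors with k-means; the choice of two clusters and the Farneback parameters are illustrative assumptions.

```python
# Cluster per-pixel flow vectors to separate differently moving regions.
import cv2
import numpy as np

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # placeholder frames
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

samples = flow.reshape(-1, 2).astype(np.float32)   # one (dx, dy) per pixel
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, _ = cv2.kmeans(samples, 2, None, criteria, 5,
                          cv2.KMEANS_RANDOM_CENTERS)
segmentation = labels.reshape(prev.shape)          # per-pixel motion cluster
```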
Video content Analysis
• Video content analysis deals with the extraction of metadata
from raw video to be used as components for further processing
in applications such as search, summarization, classification or
event detection.
• The main goal of video analytics is to automatically recognize
temporal and spatial events in videos.
• This technical capability is used in a wide range of domains including
entertainment, video retrieval and video browsing, healthcare, retail,
automotive, transport, home automation, flame and smoke
detection, safety, and security.
How does video analytics work?
• Video content analysis can be done in two different ways:
i. In real time, by configuring the system to trigger alerts for specific events
and incidents that unfold in the moment.
ii. In post processing, by performing advanced searches to facilitate forensic
analysis tasks.
• Feeding the system: The data being analyzed can come from various
streaming video sources. The most common are CCTV cameras, traffic
cameras, and online video feeds.
• A key goal is coverage: we need a clear view of the entire area, from
various angles.
Central processing vs. edge processing
• Video analysis software can be run centrally on servers that are generally
located in the monitoring station, which is known as central processing.
• Or, it can be embedded in the cameras themselves, a strategy known as edge
processing.
• With a hybrid approach, the processing performed by the cameras reduces
the data being processed by the central servers.
Classification of Video Analysis
Tools and Techniques for Video Analysis
Facial Recognition in Video Analysis
• Facial recognition systems that can identify or
verify a person from a digital image or video
find application in a variety of contexts.
• Facial recognition works in two parts: face
detection and face identification.
i. In the first stage, the system detects faces
in the input data using methods like
background subtraction.
ii. Next, it measures the facial features to
define facial landmarks and tries to match
them with a known dataset. Based on the
percentage of accuracy of match, the faces
can be recognized or classified as unknown.
• Dlib’s face landmark predictor can be used to detect a face
and extract features such as the eyes, mouth,
brows, nose, and jawline.
• The image was standardized by cropping to
include just these features and aligning it
based on the location of eyes and the bottom
lip.
• The preprocessed image was then mapped to a
numerical vector representation. An algorithmic
comparison of these vectors made facial
recognition possible, as in the sketch below.
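A minimal sketch of this detect-align-embed-compare pipeline using the face_recognition library (a wrapper around dlib's detector, landmark predictor, and embedding network); the image paths are placeholders.

```python
# Face recognition: embed two faces as vectors and compare them.
import face_recognition

known = face_recognition.load_image_file("known_person.jpg")
query = face_recognition.load_image_file("query.jpg")

# Detect, align, and map each face to a 128-dimensional vector.
known_enc = face_recognition.face_encodings(known)[0]
query_enc = face_recognition.face_encodings(query)[0]

# Compare vectors; below the distance threshold counts as a match.
match = face_recognition.compare_faces([known_enc], query_enc)[0]
dist = face_recognition.face_distance([known_enc], query_enc)[0]
print("match" if match else "unknown", dist)
```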
Detecting Motion
• We compare each frame of a video stream to the previous one
and detect all spots that have changed.
• We convert the image to grayscale and smooth it out a bit by blurring
it. Converting to grayscale maps every RGB pixel to a single value
between 0 and 255, where 0 is black and 255 is white.
• We’ll compare the previous frame with the current one by
examining the pixel values. Remember that since we’ve
converted the image to grayscale, all pixels are represented by a
single value between 0 and 255.
• We use the threshold function cv2.threshold to convert each
pixel to either 0 (black) or 255 (white). The threshold for this is 20.
• Finding areas and contouring: We want to find the area that
has changed since the last frame, not each pixel. In order to do
so, we first need to find an area.
• cv2.findContours retrieves the contours, or outer boundaries, of each
white region from the step above.
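Putting the steps together, a minimal sketch of the whole pipeline; the video path and the minimum contour area are my illustrative choices, while the threshold of 20 comes from the slide.

```python
# Motion detection: difference consecutive frames, threshold, find contours.
import cv2

cap = cv2.VideoCapture("input.mp4")                    # placeholder source
prev_gray = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)         # smooth sensor noise

    if prev_gray is None:
        prev_gray = gray
        continue

    delta = cv2.absdiff(prev_gray, gray)               # per-pixel difference
    _, thresh = cv2.threshold(delta, 20, 255, cv2.THRESH_BINARY)
    thresh = cv2.dilate(thresh, None, iterations=2)    # fill small holes

    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:                   # ignore tiny changes
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    prev_gray = gray
cap.release()
```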
Application of Video Analysis