This document describes the development of a 3D convolutional neural network (CNN) model to recognize human activities using moderate computation capabilities. The model is trained on the KTH dataset, which contains activities like walking, running, jogging, handwaving, handclapping, and boxing. The proposed model uses 3D CNN layers and max pooling layers to extract both spatial and temporal features from video frames. Testing achieved an accuracy of 93.33% for activity recognition. The numbers of model parameters and operations are also calculated to show that the model can perform human activity recognition with computational requirements suitable for devices with moderate capabilities.
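The core operations the summary names, 3D convolution followed by 3D max pooling over a stack of frames, can be sketched in plain NumPy. This is an illustrative toy, not the paper's actual KTH model; the kernel size and pooling stride here are assumptions:

```python
import numpy as np

def conv3d(video, kernel):
    """Valid 3D cross-correlation over a (time, height, width) volume,
    extracting a joint spatio-temporal feature map."""
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i+t, j:j+h, k:k+w] * kernel)
    return out

def maxpool3d(x, s=2):
    """Non-overlapping 3D max pooling with stride s in every dimension."""
    T, H, W = (d // s for d in x.shape)
    return x[:T*s, :H*s, :W*s].reshape(T, s, H, s, W, s).max(axis=(1, 3, 5))
```

A real recognizer stacks several such layers (with learned kernels) before a classifier; the point of the 3D kernel is that it spans frames as well as pixels, so motion and appearance are captured together.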
This document presents a method for image upscaling using a fuzzy ARTMAP neural network. It begins with an introduction to image upscaling and interpolation techniques. It then provides background on ARTMAP neural networks and fuzzy logic. The proposed method uses a linear interpolation algorithm trained with an ARTMAP network. Results show the method performs better than nearest neighbor interpolation in terms of peak signal-to-noise ratio, mean squared error, and structural similarity, though not as high as bicubic interpolation. Overall, the fuzzy ARTMAP network provides an effective way to perform image upscaling with fewer artifacts than traditional methods.
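The nearest-neighbour baseline the abstract compares against is the simplest upscaler: every output pixel copies its closest source pixel. A minimal sketch (the fuzzy ARTMAP method itself is not reproduced here):

```python
import numpy as np

def nearest_upscale(img, s):
    """Nearest-neighbour upscaling by integer factor s:
    each source pixel is replicated into an s-by-s block."""
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)
```

The blocky artifacts of this replication are exactly what learned or higher-order interpolators (fuzzy ARTMAP, bicubic) try to avoid.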
This document discusses data hiding techniques for images. It begins by introducing steganography and some common image steganography methods like LSB substitution, blocking, and palette modification. It then reviews related work on minimizing distortion in steganography, modifying matrix encoding for minimal distortion, and designing adaptive steganographic schemes. The document proposes using a universal distortion measure to evaluate embedding changes independently of the domain. It presents a system for reversible data hiding in encrypted images that partitions the image, encrypts it, hides data in the encrypted image, and allows extraction from the decrypted or encrypted image. Least significant bit substitution is discussed as an approach for hiding data in the encrypted image.
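The LSB substitution mentioned above can be sketched in a few lines: the payload bit replaces the least significant bit of each cover pixel, changing its value by at most 1.

```python
import numpy as np

def lsb_embed(pixels, bits):
    """Replace the least significant bit of the first len(bits) pixels
    with the payload bits."""
    out = pixels.copy()
    out[:len(bits)] = (out[:len(bits)] & 0xFE) | bits
    return out

def lsb_extract(pixels, n):
    """Read back the n embedded payload bits."""
    return pixels[:n] & 1
```

This is the plain spatial-domain form; the distortion-minimizing and adaptive schemes the document reviews choose *which* pixels to modify rather than writing sequentially like this sketch.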
MULTIPLE RECONSTRUCTION COMPRESSION FRAMEWORK BASED ON PNG IMAGE (ijcsity)
Neural networks (NNs) have been shown to achieve excellent performance in image compression and reconstruction. However, many shortcomings remain in practical applications, which ultimately limit the image processing ability of neural networks. To address this, a joint framework based on a neural network and scale compression is proposed in this paper. The framework first encodes the incoming PNG image information; the image is then converted into a binary input for the decoder, which reconstructs an intermediate-state image; next, the intermediate-state image is imported into a zooming compressor, recompressed, and the final image is reconstructed. Experimental results show that this method processes digital images better and suppresses the reverse expansion problem, and the compression effect can be improved by 4 to 10 times compared with using an RNN alone, showing better ability in application. When the method is used to transmit a digital image, the effect is far better than that of existing compression methods alone, and the human visual system cannot perceive the change.
Image Compression and Reconstruction Using Artificial Neural Network (IRJET Journal)
1) The document presents a neural network based method for image compression and reconstruction. An artificial neural network is used to compress image data for storage or transmission and then restore the image when desired.
2) The neural network accepts image data as input, compresses it by generating an internal representation, and then decompresses the data to reconstruct the original image.
3) The performance of the neural network method for image compression and reconstruction is evaluated using standard test images. Results show that it achieves high compression ratios and low distortion while remaining robust and able to generalize.
Artificial Neural Networks (ANNs) are a fast-growing method that has been used in different industries in recent years. The main idea behind ANNs, a subset of artificial intelligence, is to provide a simple model of the human brain in order to solve complex scientific and industrial problems. ANNs are high-value, low-cost tools for modelling, simulation, control, condition monitoring, sensor validation, and fault diagnosis of different systems. They have high flexibility and robustness in modelling, simulating, and diagnosing the behaviour of rotating machines, even in the presence of inaccurate input data. They can provide high computational speed for complicated tasks that require rapid response, such as real-time processing of several simultaneous signals. ANNs can also be used to improve the efficiency and productivity of energy in rotating equipment.
Image compression and reconstruction using a new approach by artificial neura... (Hưng Đặng)
This document describes a neural network approach to image compression and reconstruction. It discusses using a backpropagation neural network with three layers (input, hidden, output) to compress an image by representing it with fewer hidden units than input units, then reconstructing the image from the hidden unit values. It also covers preprocessing steps like converting images to YCbCr color space, downsampling chrominance, normalizing pixel values, and segmenting images into blocks for the neural network. The neural network weights are initially randomized and then trained using backpropagation to learn the image compression.
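The three-layer backpropagation compressor described above can be sketched with NumPy: a hidden layer narrower than the input acts as the compressed code, and training minimizes reconstruction error on pixel blocks. The block size, hidden width, learning rate, and iteration count here are illustrative assumptions, not the document's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 64 inputs (an 8x8 pixel block, normalized to [0, 1]) squeezed to 16 hidden units
n_in, n_hid = 64, 16
W1 = rng.normal(0.0, 0.1, (n_in, n_hid))   # randomized initial weights
W2 = rng.normal(0.0, 0.1, (n_hid, n_in))

X = rng.random((32, n_in))                 # 32 training blocks

def forward(X):
    H = sigmoid(X @ W1)                    # compressed representation (what is stored)
    return H, sigmoid(H @ W2)              # reconstruction from the hidden values

mse_before = np.mean((forward(X)[1] - X) ** 2)
lr = 1.0
for _ in range(500):
    H, Y = forward(X)
    dY = (Y - X) * Y * (1 - Y)             # backprop through output sigmoid
    dH = (dY @ W2.T) * H * (1 - H)         # backprop through hidden sigmoid
    W2 -= lr * H.T @ dY / len(X)
    W1 -= lr * X.T @ dH / len(X)
mse_after = np.mean((forward(X)[1] - X) ** 2)
```

Transmitting only the 16 hidden values per 64-pixel block gives a 4:1 ratio in this toy; the YCbCr conversion and chrominance downsampling the document mentions happen before blocks reach the network.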
This document provides an exploratory review of soft computing techniques for image segmentation. It discusses various segmentation techniques including discontinuity-based techniques like point, line and edge detection using spatial filtering. Thresholding techniques like global, adaptive and multi-level thresholding are also covered. Region-based techniques such as region growing, region splitting/merging and morphological watersheds are summarized. The document concludes that future work can focus on developing genetic segmentation filters using a genetic algorithm approach for medical image segmentation.
Internet data almost doubles every year. Multimedia communication demands less storage space and fast transmission, so the large volume of video data has driven the need for video compression. The aim of this paper is to achieve temporal compression for three-dimensional (3D) videos using motion estimation-compensation and wavelets. Instead of performing a two-dimensional (2D) motion search, as is common in conventional video codecs, a 3D motion search is proposed that can better exploit the temporal correlations of 3D content. This leads to more accurate motion prediction and a smaller residual. A discrete wavelet transform (DWT) compression scheme is added for a better compression ratio; the DWT's high energy-compaction property has greatly influenced the field of compression. The quality parameters peak signal-to-noise ratio (PSNR) and mean square error (MSE) are calculated, and the simulation results show that the proposed work improves the PSNR over existing work.
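The PSNR and MSE quality metrics used above (and in several of the abstracts below) have standard definitions and are easy to compute:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of the same shape."""
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE).
    Identical images give infinite PSNR."""
    m = mse(a, b)
    return float('inf') if m == 0 else 10.0 * np.log10(peak ** 2 / m)
```

Higher PSNR (lower MSE) means the reconstruction is closer to the original; for 8-bit images, `peak` is 255.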
This document discusses parallelizing graph algorithms on GPUs for optimization. It summarizes previous work on parallel Breadth-First Search (BFS), All Pair Shortest Path (APSP), and Traveling Salesman Problem (TSP) algorithms. It then proposes implementing BFS, APSP, and TSP on GPUs using optimization techniques like reducing data transfers between CPU and GPU and modifying the algorithms to maximize GPU computing power and memory usage. The paper claims this will improve performance and speedup over CPU implementations. It focuses on optimizing graph algorithms for parallel GPU processing to accelerate applications involving large graph analysis and optimization problems.
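The BFS formulation that maps well onto GPUs is the level-synchronous one: each iteration expands an entire frontier at once, so all vertices in a level can be processed in parallel. A sequential Python sketch of that structure (a CPU reference, not the paper's GPU kernel):

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency-list dict.
    Returns the distance (level) of every reachable vertex."""
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        nxt = []
        for u in frontier:            # on a GPU, this loop runs in parallel
            for v in adj.get(u, []):
                if v not in dist:
                    dist[v] = level
                    nxt.append(v)
        frontier = nxt
    return dist
```

The optimizations the paper discusses (keeping the frontier on the device, minimizing CPU-GPU transfers) target exactly the per-level loop above.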
Compression is the representation of information in a reduced form compared with the original. Image compression is extremely important today because of the increased demand for sharing and storing multimedia data. Compression removes redundant or superfluous information from a file to reduce its size; the reduction saves both memory and the time required to transmit and store data. Compression techniques are broadly divided into lossless and lossy categories. This paper focuses on literature studies of various compression techniques and the comparisons between them.
IRJET- Handwritten Decimal Image Compression using Deep Stacked Autoencoder (IRJET Journal)
This document proposes using a deep stacked autoencoder neural network for compressing handwritten decimal image data. It involves training multiple autoencoders in sequence to form a deep network that can compress the high-dimensional input images into lower-dimensional encoded representations while minimizing information loss. The autoencoders are trained one layer at a time using scaled conjugate gradient descent. Testing on the MNIST handwritten digits dataset showed the deep stacked autoencoder achieved compression by encoding the 400-dimensional input images down to a 25-dimensional representation while maintaining good reconstruction accuracy, as measured by minimizing the mean squared error at each layer.
IMAGE COMPRESSION AND DECOMPRESSION SYSTEM (Vishesh Banga)
Image compression is the application of Data compression on digital images. In effect, the objective is to reduce redundancy of the image data in order to be able to store or transmit data in an efficient form.
Thesis on Image compression by Manish Myst (Manish Myst)
The document discusses using neural networks for image compression. It describes how previous neural network methods divided images into blocks and achieved limited compression. The proposed method applies edge detection, thresholding, and thinning to images first to reduce their size. It then uses a single-hidden layer feedforward neural network with an adaptive number of hidden neurons based on the image's distinct gray levels. The network is trained to compress the preprocessed image block and reconstruct the original image at the receiving end. This adaptive approach aims to achieve higher compression ratios than previous neural network methods.
International Journal of Research in Engineering and Science is an open access peer-reviewed international forum for scientists involved in research to publish quality and refereed papers. Papers reporting original research or experimentally proved review work are welcome. Papers for publication are selected through peer review to ensure originality, relevance, and readability.
High Speed Data Exchange Algorithm in Telemedicine with Wavelet based on 4D M... (Dr. Amarjeet Singh)
Existing medical imaging techniques such as fMRI, positron emission tomography (PET), dynamic 3D ultrasound, and dynamic computerized tomography yield large four-dimensional data sets. 4D medical data sets are series of volumetric images captured over time; they are large and demand considerable resources for storage and transmission. In this paper, we present a method in which a 3D image is taken, the Discrete Wavelet Transform (DWT) and Dual-Tree Complex Wavelet Transform (DTCWT) techniques are applied to it separately, and the image is split into sub-bands. Encoding and decoding are done using 3D-SPIHT at different bits per pixel (bpp). The reconstructed image is synthesized using the inverse DWT. The quality of the compressed image is evaluated using factors such as Mean Square Error (MSE) and Peak Signal-to-Noise Ratio (PSNR).
A Review of Comparison Techniques of Image Steganography (IOSR Journals)
This document reviews and compares three common techniques for hiding information in digital images: Least Significant Bit (LSB) steganography, Discrete Cosine Transform (DCT) steganography, and Discrete Wavelet Transform (DWT) steganography. LSB is implemented in the spatial domain by replacing the least significant bits of cover image pixels with payload bits. DCT and DWT are implemented in the frequency domain by transforming the cover image and embedding payload bits in the transformed coefficients. The document evaluates and compares the performance of these three techniques based on metrics like mean squared error, peak signal-to-noise ratio, embedding capacity, and robustness.
NEURAL NETWORKS FOR HIGH PERFORMANCE TIME-DELAY ESTIMATION AND ACOUSTIC SOURC... (csandit)
Time-delay estimation is an essential building block of many signal processing applications. This paper follows up on earlier work on acoustic source localization and time-delay estimation using pattern recognition techniques in adverse environments such as reverberant rooms or underwater; it presents high-performance results obtained with supervised training of neural networks that challenge the state of the art, and compares their performance to that of well-known methods such as Generalized Cross-Correlation and Adaptive Eigenvalue Decomposition.
A ROBUST CHAOTIC AND FAST WALSH TRANSFORM ENCRYPTION FOR GRAY SCALE BIOMEDICA... (sipij)
In this work, a new scheme of image encryption based on chaos and the Fast Walsh Transform (FWT) is proposed. We used two chaotic logistic maps and combined chaotic encryption methods with the two-dimensional FWT of images. The encryption process involves two steps: first, chaotic sequences generated by the chaotic logistic maps are used to permute and mask the intermediate results (arrays) of the FWT; second, the chaotic sequences, or the initial conditions of the chaotic logistic maps, are changed between two intermediate results of the same row or column. Changing the encryption key several times on the same row or column makes the cipher more robust against attack. We tested our algorithms on many biomedical images, and we also used images from databases to compare our algorithm with those in the literature. Statistical analysis and key sensitivity tests show that the proposed image encryption scheme provides an efficient and secure way for real-time encryption and transmission of biomedical images.
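The chaotic-masking half of such a scheme can be sketched briefly: a logistic map `x -> r*x*(1-x)` (chaotic for `r` near 4) generates a keystream that is XOR-ed with the data; running the same function again decrypts. This shows only the masking step, not the FWT or permutation stages of the paper, and the seed and parameter values here are arbitrary:

```python
import numpy as np

def logistic_seq(x0, r, n):
    """Iterate the chaotic logistic map x -> r*x*(1-x) n times."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        xs[i] = x
    return xs

def chaotic_mask(data, x0=0.3141, r=3.9999):
    """XOR-mask a uint8 array with a logistic-map keystream.
    Applying the same call with the same key (x0, r) decrypts."""
    ks = (logistic_seq(x0, r, data.size) * 256).astype(np.uint8)
    return data ^ ks
```

The key-sensitivity property the paper tests comes from the map's sensitivity to initial conditions: a tiny change to `x0` yields a completely different keystream.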
A new image steganography algorithm based (IJNSA Journal)
In recent years, with the rapid growth of information technology and digital communication, it has become very important to secure information transmission between sender and receiver. Steganography is therefore introduced to hide information and to communicate secret data in an appropriate multimedia carrier, e.g., image, audio, and video files. In this paper, a new algorithm for image steganography is proposed to hide a large amount of secret data, represented by a secret color image. The algorithm is based on different-size image segmentation (DSIS) and modified least significant bits (MLSB); the DSIS algorithm is applied before the embedding process so that the secret image is embedded randomly instead of sequentially. The number of bits replaced in each byte is non-uniform: it depends on the byte's characteristics, via an effective hypothesis. The simulation results show that the proposed approach is efficient and achieves high imperceptibility with a high payload capacity of up to four bits per byte.
Applying Deep Learning with Weak and Noisy labels (Darian Frajberg)
Scientific seminar at Politecnico di Milano
Como, Italy
September 2018
In recent years, Deep Learning has achieved outstanding results outperforming previous techniques and even humans, thus becoming the state-of-the-art in a wide range of tasks, among which Computer Vision has been one of the most benefited areas.
Nonetheless, most of this success is tightly coupled to strongly supervised learning tasks, which require highly accurate, expensive and labor-intensive defined ground truth labels.
In this presentation, we will introduce diverse alternatives to deal with this problem and support the training of Deep Learning models for Computer Vision tasks by simplifying the process of data labelling or exploiting the unlimited supply of publicly available data in Internet (such as user-tagged images from Flickr). Such alternatives rely on data comprising noisy and weak labels, which are much easier to collect but require special care to be used.
A presentation on Image Recognition, the basic definition and working of Image Recognition, Edge Detection, Neural Networks, use of Convolutional Neural Network in Image Recognition, Applications, Future Scope and Conclusion
USING BIAS OPTIMIZATION FOR REVERSIBLE DATA HIDING USING IMAGE INTERPOLATION (IJNSA Journal)
In this paper, we propose a reversible data hiding method in the spatial domain for compressed grayscale images. The proposed method embeds secret bits into a compressed thumbnail of the original image by using a novel interpolation method and the Neighbour Mean Interpolation (NMI) technique as scaling up to the original image occurs. Experimental results presented in this paper show that the proposed method has significantly improved embedding capacities over the approach proposed by Jung and Yoo.
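Neighbour Mean Interpolation scales an image up by filling each new pixel with the mean of its original neighbours; the interpolated pixels then carry the payload. The sketch below shows the neighbour-mean idea only; the exact NMI formulation of Jung and Yoo weights the centre pixel slightly differently, and the embedding step is omitted:

```python
import numpy as np

def neighbour_mean_upscale(img):
    """2x upscale in the spirit of NMI: keep original pixels on the
    even grid and fill the gaps with means of adjacent originals."""
    h, w = img.shape
    out = np.zeros((2 * h - 1, 2 * w - 1))
    out[::2, ::2] = img                                   # originals, untouched
    out[::2, 1::2] = (img[:, :-1] + img[:, 1:]) / 2       # horizontal neighbour means
    out[1::2, ::2] = (img[:-1, :] + img[1:, :]) / 2       # vertical neighbour means
    out[1::2, 1::2] = (img[:-1, :-1] + img[:-1, 1:] +
                       img[1:, :-1] + img[1:, 1:]) / 4    # centre: mean of 4 corners
    return out
```

Reversibility comes from leaving the original (even-grid) pixels untouched: the receiver can recompute every interpolated value, and any deviation from it is the hidden payload.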
Fuzzy Type Image Fusion Using SPIHT Image Compression Technique (IJERA Editor)
This paper presents a fuzzy-type image fusion technique using Set Partitioning in Hierarchical Trees (SPIHT). It is concluded that fusion at higher levels provides better fusion quality. The technique can be used for fusion of fuzzy images as well as multi-modal image fusion. The proposed algorithm is very simple, easy to implement, and could be used for real-time applications. The paper also provides a comparative study between the proposed and previously existing techniques, and validates the proposed algorithm using Peak Signal-to-Noise Ratio (PSNR) and Root Mean Square Error (RMSE).
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/10/applying-the-right-deep-learning-model-with-the-right-data-for-your-application-a-presentation-from-vision-elements/
Hila Blecher-Segev, Computer Vision and AI Research Associate at Vision Elements, presents the “Applying the Right Deep Learning Model with the Right Data for Your Application” tutorial at the May 2021 Embedded Vision Summit.
Deep learning has made a huge impact on a wide variety of computer vision applications. But while the capabilities of deep neural networks are impressive, understanding how to best apply them is not straightforward. In this talk, Blecher-Segev highlights key questions that must be answered when considering incorporating a deep neural network into a vision application.
What type of data will be most beneficial for the task? Should the DNN use other types of data in addition to images? How should the data be annotated? What classes should be defined? What is the minimum amount of data needed for the network to be generalized and robust? What algorithmic approach should we use for our task (classification, regression or segmentation)? What type of network should we choose (FCN, DCNN, RNN, GAN)? Blecher-Segev explains the options and trade-offs, and maps out a process for making good choices for a specific application.
The document proposes a method for face recognition using deep learning and data augmentation. It cleans and pre-processes existing face datasets to remove noise and extracts faces. It then uses image processing techniques to add masks to the faces to create a new masked face dataset. An Inception Resnet-v1 model is trained on the new dataset. The method is applied to build a face recognition application for employee timekeeping that achieves high accuracy even when faces are masked.
This document provides an overview of a project that implemented image filtering using VHDL on an FPGA board. It discusses designing filters like average, Sobel, Gaussian, and Laplacian filters. Cache memory and a processing unit were developed to hold pixel values and apply filter kernels. Different methods for multiplication in the convolution process were evaluated. Results showed the output images after applying each filter both in software and on the FPGA board. In conclusion, FPGAs provide reconfigurable, accelerated processing for image applications like filtering compared to general purpose computers.
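The filter kernels named above (Sobel, Gaussian, Laplacian) are all applied the same way: a sliding 3x3 window multiply-accumulate, which is what the FPGA's cache memory and processing unit implement. A software reference of that operation, with the horizontal Sobel kernel as the example:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])   # responds to vertical edges

def filter2d(img, kernel):
    """Apply a 3x3 kernel as a sliding-window multiply-accumulate,
    producing only the valid (fully overlapped) region."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * kernel)
    return out
```

On the FPGA, the line cache feeds the 3x3 window and the nine multiplications run in parallel each clock cycle, which is where the acceleration over a general-purpose CPU comes from.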
IRJET- A Survey on Medical Image Interpretation for Predicting Pneumonia (IRJET Journal)
This document summarizes research on using machine learning and deep learning techniques to interpret medical images and predict pneumonia. It first discusses how medical image analysis is an active field for machine learning. It then reviews several related studies on using convolutional neural networks (CNNs) and transfer learning to classify chest x-rays and detect pneumonia. Specifically, it examines research on developing CNN models for pneumonia classification and using pre-trained CNN architectures like VGG16, VGG19, and ResNet with transfer learning. The document concludes that computer-aided diagnosis systems using deep learning can provide accurate predictions to assist radiologists in pneumonia diagnosis from chest x-rays.
An optimized discrete wavelet transform compression technique for image trans... (IJECEIAES)
Transferring images in a wireless multimedia sensor network (WMSN) is developing rapidly in both research and application fields. Nevertheless, this area of research faces many problems, such as the low quality of received images after decompression, the limited number of reconstructed images at the base station, and the high energy consumption of the compression and decompression process. To fix these problems, we propose a compression method based on the classic discrete wavelet transform (DWT). Our method applies the wavelet compression technique multiple times to the same image. As a result, we found that the number of received images is higher than with the classic DWT; in addition, the quality of the received images is much higher, and the energy consumption is lower. Therefore, our proposed compression technique is better adapted to the WMSN environment.
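One level of the DWT underlying such schemes can be written with the Haar wavelet in a few lines of NumPy; repeating it on the low-low (LL) sub-band gives the multi-level (repeated) decomposition. This is a generic Haar sketch, not the paper's specific filter bank:

```python
import numpy as np

def haar2d(img):
    """One level of the 2D Haar wavelet transform.
    Returns the LL (approximation) and LH, HL, HH (detail) sub-bands,
    each a quarter the size of the input (dimensions must be even)."""
    a = (img[:, ::2] + img[:, 1::2]) / 2   # row averages
    d = (img[:, ::2] - img[:, 1::2]) / 2   # row details
    LL = (a[::2] + a[1::2]) / 2
    LH = (a[::2] - a[1::2]) / 2
    HL = (d[::2] + d[1::2]) / 2
    HH = (d[::2] - d[1::2]) / 2
    return LL, LH, HL, HH
```

Energy compaction is visible directly: for smooth images the detail sub-bands are near zero and compress cheaply, while LL retains the picture and can be decomposed again.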
Efficient Point Cloud Pre-processing using The Point Cloud Library (CSCJournals)
Robotics, video games, environmental mapping, and medicine are some of the fields that use 3D data processing. In this paper we propose a novel optimization approach for the open-source Point Cloud Library (PCL), which is frequently used for processing 3D data. Three main aspects of the PCL are discussed: point cloud creation from the disparity of color image pairs; voxel grid downsample filtering to simplify point clouds; and passthrough filtering to adjust the size of the point cloud. Additionally, OpenGL shader-based rendering is examined. An optimization technique based on CPU cycle measurement is proposed and applied in order to optimize the parts of the pre-processing chain where measured performance is slowest. Results show that with the optimized modules, the performance of the pre-processing chain increased 69-fold.
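The voxel-grid downsampling step mentioned above replaces every group of points that fall into the same cubic voxel with their centroid, which is the idea behind PCL's `VoxelGrid` filter. A NumPy sketch of that idea (not PCL's implementation):

```python
import numpy as np

def voxel_downsample(points, voxel):
    """Voxel-grid downsampling: bucket points by voxel index and
    return one centroid per occupied voxel."""
    keys = np.floor(points / voxel).astype(np.int64)      # voxel index per point
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = np.asarray(inv).reshape(-1)
    n = inv.max() + 1
    sums = np.zeros((n, points.shape[1]))
    np.add.at(sums, inv, points)                          # accumulate per voxel
    counts = np.bincount(inv, minlength=n).reshape(-1, 1)
    return sums / counts                                  # centroids
```

A larger `voxel` size removes more points, trading geometric detail for the faster downstream processing the paper's optimization targets.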
Efficient Point Cloud Pre-processing using The Point Cloud LibraryCSCJournals
Robotics, video games, environmental mapping and medical are some of the fields that use 3D data processing. In this paper we propose a novel optimization approach for the open source Point Cloud Library (PCL) that is frequently used for processing 3D data. Three main aspects of the PCL are discussed: point cloud creation from disparity of color image pairs; voxel grid downsample filtering to simplify point clouds; and passthrough filtering to adjust the size of the point cloud. Additionally, OpenGL shader based rendering is examined. An optimization technique based on CPU cycle measurement is proposed and applied in order to optimize those parts of the pre-processing chain where measured performance is slowest. Results show that with optimized modules the performance of the pre-processing chain has increased 69 fold.
This document discusses parallelizing graph algorithms on GPUs for optimization. It summarizes previous work on parallel Breadth-First Search (BFS), All Pair Shortest Path (APSP), and Traveling Salesman Problem (TSP) algorithms. It then proposes implementing BFS, APSP, and TSP on GPUs using optimization techniques like reducing data transfers between CPU and GPU and modifying the algorithms to maximize GPU computing power and memory usage. The paper claims this will improve performance and speedup over CPU implementations. It focuses on optimizing graph algorithms for parallel GPU processing to accelerate applications involving large graph analysis and optimization problems.
Compression can be defined as an art form that involves the representation of information in a reduced form when compared to the original information. Image compression is extremely important in this day and age because of the increased demand for sharing and storing multimedia data. Compression is concerned with removing redundant or superfluous information from a file to reduce the size of the file. The reduction of the file size saves both memory and the time required to transmit and store data. Lossless compression techniques are distinguished from lossy compression techniques, which are distinguished from one another. This paper focuses on the literature studies on various compression techniques and the comparisons between them.
IRJET - Handwritten Decimal Image Compression using Deep Stacked Autoencoder (IRJET Journal)
This document proposes using a deep stacked autoencoder neural network for compressing handwritten decimal image data. It involves training multiple autoencoders in sequence to form a deep network that can compress the high-dimensional input images into lower-dimensional encoded representations while minimizing information loss. The autoencoders are trained one layer at a time using scaled conjugate gradient descent. Testing on the MNIST handwritten digits dataset showed the deep stacked autoencoder achieved compression by encoding the 400-dimensional input images down to a 25-dimensional representation while maintaining good reconstruction accuracy, as measured by minimizing the mean squared error at each layer.
IMAGE COMPRESSION AND DECOMPRESSION SYSTEM (Vishesh Banga)
Image compression is the application of Data compression on digital images. In effect, the objective is to reduce redundancy of the image data in order to be able to store or transmit data in an efficient form.
Thesis on Image compression by Manish Myst
The document discusses using neural networks for image compression. It describes how previous neural network methods divided images into blocks and achieved limited compression. The proposed method applies edge detection, thresholding, and thinning to images first to reduce their size. It then uses a single-hidden layer feedforward neural network with an adaptive number of hidden neurons based on the image's distinct gray levels. The network is trained to compress the preprocessed image block and reconstruct the original image at the receiving end. This adaptive approach aims to achieve higher compression ratios than previous neural network methods.
High Speed Data Exchange Algorithm in Telemedicine with Wavelet based on 4D M... (Dr. Amarjeet Singh)
Existing medical imaging techniques such as fMRI, positron emission tomography (PET), dynamic 3D ultrasound and dynamic computerized tomography yield large amounts of four-dimensional data. 4D medical data sets are series of volumetric images captured over time; they are large and demand substantial resources for storage and transmission. In this paper, we present a method in which a 3D image is taken and the Discrete Wavelet Transform (DWT) and Dual-Tree Complex Wavelet Transform (DTCWT) techniques are applied to it separately, splitting the image into sub-bands. Encoding and decoding are done using 3D-SPIHT at different bits per pixel (bpp). The reconstructed image is synthesized using the inverse DWT. The quality of the compressed image is evaluated using factors such as Mean Square Error (MSE) and Peak Signal-to-Noise Ratio (PSNR).
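The two quality factors named above have simple definitions: MSE averages the squared pixel differences, and PSNR = 10·log10(MAX²/MSE) with MAX = 255 for 8-bit images. A minimal sketch on hypothetical flattened pixel lists:

```python
import math

def mse(original, reconstructed):
    """Mean Square Error between two equal-length pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def psnr(original, reconstructed, max_val=255):
    """Peak Signal-to-Noise Ratio in dB; higher means less distortion."""
    err = mse(original, reconstructed)
    return float("inf") if err == 0 else 10 * math.log10(max_val ** 2 / err)

orig = [52, 55, 61, 66]     # hypothetical original pixels
recon = [54, 55, 60, 66]    # hypothetical reconstruction
assert mse(orig, recon) == 1.25
```

On real images the same formulas are applied over all pixels of all slices of the reconstructed volume.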
A Review of Comparison Techniques of Image Steganography (IOSR Journals)
This document reviews and compares three common techniques for hiding information in digital images: Least Significant Bit (LSB) steganography, Discrete Cosine Transform (DCT) steganography, and Discrete Wavelet Transform (DWT) steganography. LSB is implemented in the spatial domain by replacing the least significant bits of cover image pixels with payload bits. DCT and DWT are implemented in the frequency domain by transforming the cover image and embedding payload bits in the transformed coefficients. The document evaluates and compares the performance of these three techniques based on metrics like mean squared error, peak signal-to-noise ratio, embedding capacity, and robustness.
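The spatial-domain LSB technique in that comparison is the simplest of the three; a minimal sketch on a hypothetical list of 8-bit grayscale pixel values (the surveyed papers operate on full images):

```python
def lsb_embed(pixels, bits):
    """Replace the least significant bit of each pixel with one payload bit."""
    assert len(bits) <= len(pixels), "payload exceeds embedding capacity"
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit   # clear the LSB, set the payload bit
    return stego

def lsb_extract(pixels, n_bits):
    """Read back the first n_bits least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]

cover = [142, 93, 200, 17, 64]   # hypothetical cover pixels
payload = [1, 0, 1]
stego = lsb_embed(cover, payload)
assert lsb_extract(stego, 3) == payload   # each pixel changes by at most 1
```

The at-most-±1 change per pixel is why LSB scores well on MSE/PSNR but is fragile: any lossy re-compression destroys the payload, which is what motivates the DCT and DWT variants.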
NEURAL NETWORKS FOR HIGH PERFORMANCE TIME-DELAY ESTIMATION AND ACOUSTIC SOURC... (csandit)
Time-delay estimation is an essential building block of many signal processing applications. This paper follows up on earlier work on acoustic source localization and time-delay estimation using pattern recognition techniques in adverse environments such as reverberant rooms or underwater. It presents high-performance results obtained with supervised training of neural networks that challenge the state of the art, and compares their performance to that of well-known methods such as the Generalized Cross-Correlation and Adaptive Eigenvalue Decomposition.
A ROBUST CHAOTIC AND FAST WALSH TRANSFORM ENCRYPTION FOR GRAY SCALE BIOMEDICA... (sipij)
In this work, a new scheme of image encryption based on chaos and the Fast Walsh Transform (FWT) is proposed. We use two chaotic logistic maps and combine chaotic encryption methods with the two-dimensional FWT of images. The encryption process involves two steps: first, chaotic sequences generated by the logistic maps permute and mask the intermediate FWT arrays; second, the chaotic sequences or the initial conditions of the logistic maps are changed between two intermediate results of the same row or column. Changing the encryption key several times on the same row or column makes the cipher more robust against attack. We tested our algorithms on many biomedical images, and also used images from public databases to compare our algorithm with those in the literature. Statistical analysis and key-sensitivity tests show that the proposed image encryption scheme provides an efficient and secure way to encrypt and transmit biomedical images in real time.
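The chaotic half of such schemes is easy to illustrate in isolation: a logistic map x_{n+1} = r·x_n·(1 − x_n) with r close to 4 behaves chaotically, and its trajectory can be turned into a key stream that masks data by XOR. This sketch covers only that masking idea, not the paper's FWT permutation or its key-switching schedule, and all parameter values are illustrative:

```python
def logistic_keystream(x0, n, r=3.99):
    """Generate n key bytes from the chaotic logistic map x -> r*x*(1-x)."""
    x, stream = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)              # chaotic iteration
        stream.append(int(x * 256) & 0xFF)
    return stream

def mask(data, x0):
    """XOR the data with the key stream; XOR is its own inverse."""
    ks = logistic_keystream(x0, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))

plain = b"biomedical image row"
cipher = mask(plain, 0.3141592653)          # illustrative secret key
assert mask(cipher, 0.3141592653) == plain  # the same key decrypts exactly
```

Key sensitivity comes from chaos itself: trajectories started from minutely different initial conditions diverge exponentially, so a wrong key produces an unrelated key stream.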
A new image steganography algorithm based... (IJNSA Journal)
In recent years, with the rapid growth of information technology and digital communication, securing information transmission between sender and receiver has become very important. Steganography addresses this by hiding information and communicating secret data in an appropriate multimedia carrier, e.g., image, audio, or video files. In this paper, a new algorithm for image steganography is proposed to hide a large amount of secret data, presented as a secret color image. The algorithm is based on different-size image segmentation (DSIS) and modified least significant bits (MLSB); the DSIS algorithm embeds the secret image randomly rather than sequentially and is applied before the embedding process. The number of bits replaced in each byte is non-uniform, determined from the byte's characteristics through an effective hypothesis. Simulation results show that the proposed approach is efficient and achieves high imperceptibility with a high payload capacity of up to four bits per byte.
Applying Deep Learning with Weak and Noisy Labels (Darian Frajberg)
Scientific seminar at Politecnico di Milano
Como, Italy
September 2018
In recent years, Deep Learning has achieved outstanding results outperforming previous techniques and even humans, thus becoming the state-of-the-art in a wide range of tasks, among which Computer Vision has been one of the most benefited areas.
Nonetheless, most of this success is tightly coupled to strongly supervised learning tasks, which require highly accurate ground-truth labels that are expensive and labor-intensive to define.
In this presentation, we will introduce diverse alternatives to deal with this problem and support the training of Deep Learning models for Computer Vision tasks by simplifying the process of data labelling or by exploiting the unlimited supply of publicly available data on the Internet (such as user-tagged images from Flickr). Such alternatives rely on data comprising noisy and weak labels, which are much easier to collect but require special care to use.
A presentation on image recognition: the basic definition and working of image recognition, edge detection, neural networks, the use of convolutional neural networks in image recognition, applications, future scope, and conclusions.
USING BIAS OPTIMIZATION FOR REVERSIBLE DATA HIDING USING IMAGE INTERPOLATION (IJNSA Journal)
In this paper, we propose a reversible data hiding method in the spatial domain for compressed grayscale images. The proposed method embeds secret bits into a compressed thumbnail of the original image by using a novel interpolation method and the Neighbour Mean Interpolation (NMI) technique as scaling up to the original image occurs. Experimental results presented in this paper show that the proposed method has significantly improved embedding capacities over the approach proposed by Jung and Yoo.
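Neighbour Mean Interpolation, as used in the Jung and Yoo scheme the paper improves on, scales an image up by inserting pixels whose value is the mean of their already-known neighbours; payload bits are then hidden in the differences between interpolated pixels and their neighbours. A sketch of the interpolation step only, for a single image row (the 2-D grid version interpolates along rows, columns and diagonals):

```python
def nmi_upscale_row(row):
    """Insert the mean of each adjacent pixel pair between them (1-D NMI)."""
    out = []
    for left, right in zip(row, row[1:]):
        out.append(left)
        out.append((left + right) // 2)   # interpolated pixel
    out.append(row[-1])
    return out

thumb = [100, 120, 90]                    # hypothetical thumbnail row
assert nmi_upscale_row(thumb) == [100, 110, 120, 105, 90]
```

In the full scheme, each interpolated pixel can additionally absorb payload bits bounded by the local difference between its neighbours before the original pixels are restored exactly; that embedding step is omitted here.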
Fuzzy Type Image Fusion Using SPIHT Image Compression Technique (IJERA Editor)
This paper presents a fuzzy-type image fusion technique using Set Partitioning in Hierarchical Trees (SPIHT). It concludes that fusion at higher levels provides better fusion quality. The technique can be used for fusion of fuzzy images as well as multi-modal image fusion. The proposed algorithm is very simple, easy to implement, and could be used in real-time applications. The paper also provides a comparative study between the proposed and previously existing techniques and validates the proposed algorithm using Peak Signal-to-Noise Ratio (PSNR) and Root Mean Square Error (RMSE).
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/10/applying-the-right-deep-learning-model-with-the-right-data-for-your-application-a-presentation-from-vision-elements/
Hila Blecher-Segev, Computer Vision and AI Research Associate at Vision Elements, presents the “Applying the Right Deep Learning Model with the Right Data for Your Application” tutorial at the May 2021 Embedded Vision Summit.
Deep learning has made a huge impact on a wide variety of computer vision applications. But while the capabilities of deep neural networks are impressive, understanding how to best apply them is not straightforward. In this talk, Blecher-Segev highlights key questions that must be answered when considering incorporating a deep neural network into a vision application.
What type of data will be most beneficial for the task? Should the DNN use other types of data in addition to images? How should the data be annotated? What classes should be defined? What is the minimum amount of data needed for the network to be generalized and robust? What algorithmic approach should we use for our task (classification, regression or segmentation)? What type of network should we choose (FCN, DCNN, RNN, GAN)? Blecher-Segev explains the options and trade-offs, and maps out a process for making good choices for a specific application.
The document proposes a method for face recognition using deep learning and data augmentation. It cleans and pre-processes existing face datasets to remove noise and extracts faces. It then uses image processing techniques to add masks to the faces to create a new masked face dataset. An Inception Resnet-v1 model is trained on the new dataset. The method is applied to build a face recognition application for employee timekeeping that achieves high accuracy even when faces are masked.
This document provides an overview of a project that implemented image filtering using VHDL on an FPGA board. It discusses designing filters like average, Sobel, Gaussian, and Laplacian filters. Cache memory and a processing unit were developed to hold pixel values and apply filter kernels. Different methods for multiplication in the convolution process were evaluated. Results showed the output images after applying each filter both in software and on the FPGA board. In conclusion, FPGAs provide reconfigurable, accelerated processing for image applications like filtering compared to general purpose computers.
IRJET - A Survey on Medical Image Interpretation for Predicting Pneumonia (IRJET Journal)
This document summarizes research on using machine learning and deep learning techniques to interpret medical images and predict pneumonia. It first discusses how medical image analysis is an active field for machine learning. It then reviews several related studies on using convolutional neural networks (CNNs) and transfer learning to classify chest x-rays and detect pneumonia. Specifically, it examines research on developing CNN models for pneumonia classification and using pre-trained CNN architectures like VGG16, VGG19, and ResNet with transfer learning. The document concludes that computer-aided diagnosis systems using deep learning can provide accurate predictions to assist radiologists in pneumonia diagnosis from chest x-rays.
An optimized discrete wavelet transform compression technique for image trans... (IJECEIAES)
Transferring images in a wireless multimedia sensor network (WMSN) is developing fast in both research and application. Nevertheless, this area of research faces many problems, such as the low quality of received images after decompression, the limited number of reconstructed images at the base station, and the high energy consumption of compression and decompression. To fix these problems, we propose a compression method based on the classic discrete wavelet transform (DWT). Our method applies the wavelet compression technique multiple times to the same image. As a result, the number of received images is higher than with the classic DWT, the quality of the received images is much higher, and energy consumption is lower. The proposed compression technique is therefore better adapted to the WMSN environment.
Efficient Point Cloud Pre-processing using The Point Cloud Library (CSCJournals)
Machine learning based augmented reality for improved learning application th... (IJECEIAES)
Detection of objects and their location in an image are important elements of current research in computer vision. In May 2020, Meta released its state-of-the-art object-detection model based on a transformer architecture, called the detection transformer (DETR). There are several object-detection models, such as the region-based convolutional neural network (R-CNN), you only look once (YOLO) and single shot detectors (SSD), but none had used a transformer for this task. The models mentioned earlier use all sorts of hyperparameters and layers, whereas the transformer pattern makes the architecture simple and easy to implement. In this paper, we determine the name of a chemical experiment in two steps: first, we build a DETR model trained on a customized dataset, and then we integrate it into an augmented reality mobile application. By detecting the objects used during the realization of an experiment, we can predict the name of the experiment using a multi-class classification approach. The combination of various computer vision techniques with augmented reality is indeed promising and offers a better user experience.
Improving AI Surveillance using Edge Computing (IRJET Journal)
This document proposes using edge computing and multiple deep learning models for improved AI surveillance. The models include face detection, landmarks recognition, face re-identification, and Mask R-CNN for object detection. These models would be deployed on edge devices using the Intel OpenVino toolkit to perform real-time surveillance with low latency. Experimental results show the edge computing approach can process video frames at 25 FPS for smart classroom monitoring, compared to 10 FPS for cloud-based approaches. Initial testing of the Mask R-CNN model achieved a validation loss of 0.2294 for weapon detection. The proposed system aims to enhance security monitoring while reducing resources required compared to cloud-based solutions.
Video captioning in Vietnamese using deep learning (IJECEIAES)
With the development of today's society, demand for applications using digital cameras grows year by year. However, analyzing large amounts of video data is one of the most challenging issues. In addition to storing the data captured by the camera, intelligent systems are required to quickly analyze the data and react to important situations. In this paper, we use deep learning techniques to build automatic models that describe movements in video. To solve the problem, we use three deep learning models: a sequence-to-sequence model based on a recurrent neural network, a sequence-to-sequence model with attention, and a transformer model. We evaluate the effectiveness of the approaches based on the results of the three models. To train them, we use the Microsoft research video description corpus (MSVD) dataset, comprising 1,970 videos and 85,550 captions translated into Vietnamese. To ensure the descriptions are in Vietnamese, we also combine the models with a natural language processing (NLP) model for Vietnamese.
Beginner's Guide to Diffusion Models.pptx (Ishaq Khan)
1. Diffusion models are a new powerful family of deep generative models inspired by physics that destroy structure in data using a diffusion process and train a model to reverse the process.
2. Denoising Diffusion Probabilistic Models (DDPMs) were introduced to generate high-quality samples using a U-Net to predict noise levels and recover data from pure noise.
3. Improved DDPMs achieved competitive log-likelihoods while maintaining high sample quality through modifications like an improved noise schedule and optimization techniques.
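The "destroy structure" half of point 1 has a closed form that makes DDPM training cheap: with a noise schedule β_t and ᾱ_t = ∏_{s≤t}(1 − β_s), one can jump straight to x_t = √ᾱ_t·x₀ + √(1 − ᾱ_t)·ε with ε ~ N(0, 1). A one-dimensional toy sketch (the schedule values are illustrative, not from the slides):

```python
import math, random

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for b in betas[:t + 1]:
        prod *= 1.0 - b
    return prod

def q_sample(x0, t, betas, rng):
    """Sample x_t directly from x_0: sqrt(ab)*x0 + sqrt(1-ab)*noise."""
    ab = alpha_bar(betas, t)
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)

betas = [0.01 * (i + 1) for i in range(50)]   # toy linear noise schedule
rng = random.Random(0)
x_noisy = q_sample(1.0, 49, betas, rng)
# alpha_bar at the last step is near zero, so x_49 is almost pure noise
```

The reverse model (the U-Net in point 2) is then trained to predict ε from x_t and t, which is what allows sampling to start from pure noise.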
(Im2col) Accelerating deep neural networks on low power heterogeneous architec... (Bomm Kim)
This document discusses accelerating deep neural networks on low power heterogeneous architectures. Specifically, it focuses on accelerating the inference time of the VGG-16 neural network on the ODROID-XU4 board, which contains an ARM CPU and Mali GPU. The authors develop parallel versions of VGG-16 using OpenMP for the CPU and OpenCL for the GPU. Several optimizations are explored in OpenCL, including work groups, vector data types, and the CLBlast library. The best OpenCL implementation achieves a 9.4x speedup over the original serial version.
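The im2col trick in the title is the standard way such convolutions are mapped onto GPU matrix-multiply kernels (for example via CLBlast's GEMM routines): every sliding window of the input is unrolled into a row, so convolution with a flattened filter becomes one large matrix product. A single-channel pure-Python sketch:

```python
def im2col(image, k):
    """Unroll every k x k window of a 2-D list into one row of the output.

    Convolution with a k*k filter then reduces to a matrix product of
    this output with the flattened filter.
    """
    h, w = len(image), len(image[0])
    cols = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            window = [image[i + di][j + dj] for di in range(k) for dj in range(k)]
            cols.append(window)
    return cols

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
cols = im2col(img, 2)
# four 2x2 windows, one per output pixel:
# [[1, 2, 4, 5], [2, 3, 5, 6], [4, 5, 7, 8], [5, 6, 8, 9]]
```

Convolving is then a dot product of each row with the flattened kernel, which is exactly the memory layout GEMM libraries are optimized for.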
ON THE PERFORMANCE OF INTRUSION DETECTION SYSTEMS WITH HIDDEN MULTILAYER NEUR... (IJCNCJournal)
Deep learning applications, especially multilayer neural network models, achieve network intrusion detection with high accuracy. This study proposes a model that combines a multilayer neural network with Dense-Sparse-Dense (DSD) multi-stage training to simultaneously improve the criteria related to the performance of intrusion detection systems on the comprehensive UNSW-NB15 dataset. We conduct experiments on many neural network models, such as the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), to evaluate the combined efficiency with each model across criteria such as accuracy, detection rate, false alarm rate, precision, and F1-score.
This document discusses the layers of convolutional neural networks (CNNs). It provides an overview of common CNN layers including convolutional layers, max pooling layers, padding, rectified linear unit (ReLU) nonlinearity, and fully connected layers. Convolutional layers extract features from input images using small filter matrices in a sliding window approach. Max pooling layers reduce the dimensionality of feature maps. Padding handles edge effects when filters are smaller than inputs. ReLU introduces nonlinearity. Fully connected layers flatten feature maps into vectors for classification. The document reviews the functions of these key CNN layers.
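The sliding-window convolution and max-pooling operations described above can be sketched in a few lines; this toy single-channel version uses stride 1, 'valid' padding, and non-overlapping pooling, with an illustrative vertical-edge kernel not taken from the document:

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in CNN layers)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]

def relu(fmap):
    """Element-wise max(0, x) nonlinearity."""
    return [[max(0, v) for v in row] for row in fmap]

def maxpool2d(fmap, p=2):
    """Non-overlapping p x p max pooling."""
    return [[max(fmap[i + di][j + dj] for di in range(p) for dj in range(p))
             for j in range(0, len(fmap[0]) - p + 1, p)]
            for i in range(0, len(fmap) - p + 1, p)]

img = [[1, 0, 2, 1],
       [0, 1, 3, 0],
       [2, 1, 0, 1],
       [1, 0, 1, 2]]
edge = [[1, -1], [1, -1]]          # illustrative vertical-edge filter
out = maxpool2d(relu(conv2d(img, edge)))
```

A fully connected classifier head would then flatten `out` into a vector, exactly as the document describes.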
This document discusses using artificial neural networks and MATLAB 7.10 to develop an efficient system for sorting mechanical spare parts. It involves using wavelet transforms to extract features from images of parts, which are then used to train an artificial neural network. The neural network can accurately recognize parts based on their wavelet features with high efficiency. Simulation results show the system can successfully identify the name of a selected spare part from its image with a graphical output.
Applying convolutional neural networks for limited-memory application (TELKOMNIKA JOURNAL)
Currently, convolutional neural networks (CNNs) are considered the most effective tool in image diagnosis and processing. In this paper, we studied and applied a modified SSDLite_MobileNetV2 and proposed a solution that keeps total memory consumption within a robust bound, applied to the bridge navigational watch and alarm system (BNWAS). The hardware was designed around a Raspberry Pi 3, an embedded single-board computer with a smartphone-class CPU and limited RAM, without a CUDA GPU. Experimental results showed that the deep learning model on an embedded single-board computer is highly effective in this application.
Deep Learning based Multi-class Brain Tumor Classification (IRJET Journal)
The document discusses a study that aims to improve the classification of brain tumors on MRI images using deep learning techniques. It compares several convolutional neural network architectures (Custom CNN, DenseNet169, MobileNet, VGG-16, and ResNet152) for multi-class brain tumor classification using MRI data. The models are trained on a dataset of approximately 5,000 brain MR images and their performance at tumor detection is evaluated and compared. Transfer learning techniques are also discussed for applying knowledge from one task to improve predictions for new tasks.
DISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERAS (cscpconf)
This article describes the design and development of a system for remote indoor 3D monitoring using an undetermined number of Microsoft® Kinect sensors. In the proposed client-server system, the Kinect cameras can be connected to different computers, addressing this way the hardware limitation of one sensor per USB controller. The reason behind this limitation is the high bandwidth needed by the sensor, which becomes also an issue for the distributed system TCP/IP communications. Since traffic volume is too high, 3D data has to be compressed before it can be sent over the network. The solution consists in self-coding the Kinect data into RGB images and then using a standard multimedia codec to compress color maps. Information from different sources is collected into a central client computer, where point clouds are transformed to reconstruct the scene in 3D. An algorithm is proposed to conveniently merge the skeletons detected locally by each Kinect, so that monitoring of people is robust to self and inter-user occlusions. Final skeletons are labeled and trajectories of every joint can be saved for event reconstruction or further analysis.
JOINT IMAGE WATERMARKING, COMPRESSION AND ENCRYPTION BASED ON COMPRESSED SENS... (ijma)
ABSTRACT
Image usage over the internet becomes more important each day: over 3 billion images are shared daily, which raises concerns about how to protect image copyrights and how to improve the image-sharing experience. This paper proposes a new robust image watermarking algorithm based on compressed sensing (CS) and quantization index modulation (QIM) watermark embedding. The algorithm capitalizes on CS to compress and encrypt images jointly with entropy coding, Arnold's Cat Map, pseudo-random numbers, and the Advanced Encryption Standard (AES), and works under the JPEG standard umbrella. Watermark embedding is done at 3 different locations inside the image using QIM; those locations differ for each 8-by-8 image block. Choosing which combination of coefficients to use in QIM watermark embedding depends on selecting a combination from a combinations table, generated at the same time as the projection matrices using a 10-digit pseudo-random secret key SK1. After the quantization phase, the algorithm shuffles image blocks using Arnold's Cat Map with a 10-digit pseudo-random secret key SK2, followed by a unique method for splitting every 8x8 block into two unequal parts. Part one acts as the host for two QIM watermarks and then goes through an encoding phase using Run-Length Encoding (RLE) followed by Huffman encoding, while part two goes through sparse watermark embedding followed by a third QIM watermark embedding and a compression phase using CS, after which a Huffman encoder encodes this part. The algorithm aims to combine image watermarking, compression, and encryption in one algorithm while balancing how these capabilities work with each other, to achieve significant improvement in all three.
15 different images usually used in image processing benchmarking were used for testing the algorithm capabilities and experiments show that our proposed algorithm achieves robust watermarking jointly with encryption and compression under the JPEG standard framework.
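Quantization Index Modulation, the embedding primitive used at the three locations in each block, hides one bit by quantizing a coefficient onto one of two interleaved lattices of step Δ; the detector re-quantizes onto both lattices and picks the closer one. A single-coefficient sketch with an illustrative Δ:

```python
def qim_embed(x, bit, delta=8.0):
    """Quantize x onto the lattice selected by the watermark bit."""
    offset = bit * delta / 2.0            # lattice 0 or lattice 1
    return delta * round((x - offset) / delta) + offset

def qim_detect(y, delta=8.0):
    """Decide which lattice y lies closer to."""
    d0 = abs(y - qim_embed(y, 0, delta))
    d1 = abs(y - qim_embed(y, 1, delta))
    return 0 if d0 <= d1 else 1

coeff = 13.7                              # hypothetical transform coefficient
for bit in (0, 1):
    marked = qim_embed(coeff, bit)
    assert qim_detect(marked) == bit      # the bit survives quantization
```

Because every point of lattice 0 lies Δ/2 away from lattice 1, the embedded bit survives any later distortion smaller than Δ/4, which is the source of the scheme's robustness.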
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in
edge detection tasks, but their large number of parameters often leads to high memory and energy
costs for implementation on lightweight devices. In this paper, we propose a new architecture, called
Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (DeformableConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing
network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge
detection while significantly reducing complexity. Experimental results on BSDS500 and NYUDv2
datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only
500k parameters, without relying on pre-trained weights.
Square transposition: an approach to the transposition process in block cipher (journalBEEI)
The transposition process is needed in cryptography to create a diffusion effect in the data encryption standard (DES) and advanced encryption standard (AES) algorithms, the standard information-security algorithms of the National Institute of Standards and Technology. The problem with the DES and AES algorithms is that their transposition index values form patterns rather than random values. This condition makes it easier for a cryptanalyst to look for relationships between ciphertexts, because some processes are predictable. This research designs a transposition algorithm called square transposition. Each process uses an 8 × 8 square as the place to insert and retrieve 64 bits. Pairing an insertion scheme with a retrieval scheme of unequal flow is an important factor in producing a good transposition. The square transposition can generate random, pattern-free indices, so transposition is done better than in DES and AES.
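The insert/retrieve idea can be illustrated with the classical special case of an 8 × 8 square whose two schemes have "unequal flow": bits are written row by row but read out column by column. The paper's actual square-transposition schedules are more involved; this sketch shows only the general mechanism:

```python
SIZE = 8

def transpose_64(bits):
    """Write 64 bits into an 8x8 square row-wise, read them out column-wise."""
    assert len(bits) == SIZE * SIZE
    square = [bits[r * SIZE:(r + 1) * SIZE] for r in range(SIZE)]    # insertion
    return [square[r][c] for c in range(SIZE) for r in range(SIZE)]  # retrieval

def untranspose_64(bits):
    """Invert the permutation: write column-wise, read row-wise."""
    square = [[bits[c * SIZE + r] for c in range(SIZE)] for r in range(SIZE)]
    return [b for row in square for b in row]

block = [i % 2 for i in range(64)]
assert untranspose_64(transpose_64(block)) == block   # lossless permutation
```

Any such scheme is a fixed permutation of the 64 bit positions; the paper's contribution is choosing insertion/retrieval schedules whose index sequence shows no exploitable pattern.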
Hyper-parameter optimization of convolutional neural network based on particl... (journalBEEI)
The document proposes using a particle swarm optimization (PSO) algorithm to optimize the hyperparameters of a convolutional neural network (CNN) for image classification. The PSO algorithm is used to find optimal values for CNN hyperparameters like the number and size of convolutional filters. In experiments on the MNIST handwritten digit dataset, the optimized CNN achieved a testing error rate of 0.87%, which is competitive with state-of-the-art models. The proposed approach finds optimized CNN architectures automatically without requiring manual design or encoding strategies during training.
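A minimal PSO loop over a stand-in objective (a quadratic with a known minimum instead of CNN validation error, since training a network per particle is expensive); the inertia w and acceleration constants c1, c2 are common textbook values, not those of the paper:

```python
import random

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Particle swarm minimization over the box [-10, 10]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-10, 10) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal bests
    gbest = min(pbest, key=objective)[:]        # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # velocity update: inertia + pull toward personal/global bests
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if objective(pos[i]) < objective(pbest[i]):
                pbest[i] = pos[i][:]
                if objective(pbest[i]) < objective(gbest):
                    gbest = pbest[i][:]
    return gbest

# stand-in for "validation error as a function of two hyper-parameters"
err = lambda h: (h[0] - 3.0) ** 2 + (h[1] - 5.0) ** 2
best = pso(err, dim=2)
```

In the paper's setting, `objective` would train and validate a CNN with the hyper-parameters encoded in the particle position, which is why each evaluation is costly and the swarm is kept small.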
Supervised machine learning based liver disease prediction approach with LASS... (journalBEEI)
In this contemporary era, machine learning techniques are used increasingly in medical science for detecting diseases such as liver disease (LD). Around the globe, a large number of people die of this deadly disease; diagnosing it at a primary stage allows early treatment that can help cure the patient. In this paper, a method is proposed to diagnose LD using supervised machine learning classification algorithms, namely logistic regression, decision tree, random forest, AdaBoost, KNN, linear discriminant analysis, gradient boosting and support vector machine (SVM). We also deployed a least absolute shrinkage and selection operator (LASSO) feature selection technique on the dataset to identify the attributes most highly correlated with LD. The predictions, made with 10-fold cross-validation (CV), are evaluated in terms of accuracy, sensitivity, precision and F1-score. The decision tree algorithm has the best performance, with accuracy, precision, sensitivity and F1-score of 94.295%, 92%, 99% and 96% respectively with the inclusion of LASSO. Furthermore, a comparison with recent studies demonstrates the significance of the proposed system.
A secure and energy saving protocol for wireless sensor networks (journalBEEI)
The wireless sensor network (WSN) domain has been extensively researched thanks to innovative technologies and research directions addressing the usability of WSNs under various schemes. WSNs permit dependable monitoring of a diversity of environments for both military and civil applications. The key-management mechanism is a primary protocol for keeping the privacy and confidentiality of the data transmitted among the sensor nodes. Since nodes are small, they are intrinsically limited by inadequate resources such as battery lifetime and memory capacity. The proposed secure and energy-saving protocol (SESP) for wireless sensor networks has a significant impact on overall network lifetime and energy dissipation. To encrypt sent messages, SESP uses the concept of public-key cryptography. It depends on the sensor nodes' identities (IDs) to prevent messages from being replayed, achieving the security goals of authentication, confidentiality, integrity, availability, and freshness. Finally, simulation results show that the proposed approach yields better energy consumption and network lifetime than the LEACH protocol: sensors die after 900 rounds under the proposed SESP protocol, whereas under the low-energy adaptive clustering hierarchy (LEACH) scheme they die after 750 rounds.
Plant leaf identification system using convolutional neural network
This paper proposes a leaf identification system using a convolutional neural network (CNN). The proposed system can identify five types of local Malaysian leaves: acacia, papaya, cherry, mango and rambutan. Using CNN from deep learning, the network is trained for image classification on a database of leaf images captured by mobile phone. ResNet-50 was the architecture used for neural network image classification and for training the network for leaf identification. Recognizing leaf photographs requires several steps, starting with image pre-processing, feature extraction, plant identification, matching and testing, and finally extracting the results, all achieved in MATLAB. The testing set of the system consists of three types of images: white background, noise added, and random background images. Finally, an interface for the leaf identification system was developed as the end software product using MATLAB App Designer. As a result, the accuracy achieved for each training set on the five leaf classes is above 98%, thus the recognition process was successfully implemented.
Customized moodle-based learning management system for socially disadvantaged...
This study aims to develop a Moodle-based LMS with customized learning content and a modified user interface to facilitate pedagogical processes during the COVID-19 pandemic, and to investigate how teachers of socially disadvantaged schools perceived its usability and technology acceptance. A co-design process was conducted with two activities: 1) a need assessment phase using an online survey and interview session with the teachers, and 2) the development phase of the LMS. The system was evaluated by 30 teachers from socially disadvantaged schools for relevance to their distance learning activities. We employed the computer software usability questionnaire (CSUQ) to measure perceived usability and the technology acceptance model (TAM) with 3 original variables (i.e., perceived usefulness, perceived ease of use, and intention to use) and 5 external variables (i.e., attitude toward the system, perceived interaction, self-efficacy, user interface design, and course design). The average CSUQ rating exceeded 5.0 on a 7-point scale, indicating that teachers agreed that the information quality, interaction quality, and user interface quality were clear and easy to understand. TAM results concluded that the LMS design was judged to be usable, interactive, and well developed. Teachers reported an effective user interface that allows effective teaching operations and leads to adoption of the system in a short time.
Understanding the role of individual learner in adaptive and personalized e-l...
The dynamic learning environment has emerged as a powerful platform in modern e-learning systems. Constantly changing learning situations have forced learning platforms to adapt and personalize their learning resources for students. Evidence suggests that adaptation and personalization of e-learning systems (APLS) can be achieved by utilizing learner modeling, domain modeling, and instructional modeling. In the APLS literature, questions have been raised about the role of individual characteristics that are relevant for adaptation. With several options, a new problem arises: the attributes of students in APLS often overlap and are not related between studies. Therefore, this study proposes a list of learner model attributes in dynamic learning to support adaptation and personalization. The study was conducted by exploring concepts from literature selected based on the best criteria. We then describe the important concepts in student modeling and provide definitions and examples of data values that researchers have used. Besides, we also discuss the implementation of the selected learner model in providing adaptation in dynamic learning.
Prototype mobile contactless transaction system in traditional markets to sup...
1) Researchers developed a prototype contactless transaction system using QR codes and digital payments to support physical distancing during the COVID-19 pandemic in traditional markets.
2) The system allows sellers and buyers in traditional markets to conduct fast, secure transactions via smartphones without direct cash exchange. Buyers scan sellers' QR codes to view product details and make e-wallet payments.
3) Testing showed the system's functions worked properly and users found it easy to use and useful for supporting contactless transactions and digital transformation of traditional markets. However, further development is needed to increase trust in digital payments for users unfamiliar with the technology.
Wireless HART stack using multiprocessor technique with laxity algorithm
A real-time operating system (RTOS) is required for the demarcation of industrial wireless sensor network (IWSN) stacks. In the industrial world, a vast number of sensors are utilised to gather various types of data. The data gathered by the sensors cannot be prioritised ahead of time, because all of the information is equally essential. As a result, a protocol stack is employed to guarantee that data is acquired and processed fairly. In IWSN, the protocol stack is implemented using an RTOS. The data collected from IWSN sensor nodes is processed using non-preemptive scheduling and the protocol stack, and then sent in parallel to the IWSN's central controller. The RTOS mediates between hardware and software. Packets must be sent at a certain time, and some packets may collide during transmission; this project is undertaken to get around such collisions. As a prototype, this project is divided into two parts. The first uses an RTOS and the LPC2148 as a master node, while the second serves as a standard data collection node to which sensors are attached. Any controller may be used in the second part, depending on the situation. WirelessHART allows the two nodes to communicate with each other.
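One plausible reading of the laxity algorithm named in the title is least-laxity-first task selection. The sketch below is illustrative only (task names and numbers are invented, not from the paper): laxity is the time left until a task's deadline minus its remaining execution time, and the scheduler picks the ready task with the smallest laxity.

```python
# Illustrative least-laxity-first selection for an RTOS-style scheduler.
# Laxity = (deadline - now) - remaining execution time; the smaller the
# laxity, the more urgent the task.

def laxity(task, now):
    """Slack time of a task at time `now`; negative means the deadline
    can no longer be met."""
    return (task["deadline"] - now) - task["remaining"]

def pick_next(tasks, now):
    """Non-preemptive choice: return the ready task with minimum laxity."""
    ready = [t for t in tasks if t["remaining"] > 0]
    return min(ready, key=lambda t: laxity(t, now)) if ready else None

# Hypothetical sensor-processing tasks (units are arbitrary ticks)
tasks = [
    {"name": "temperature", "deadline": 10, "remaining": 2},
    {"name": "vibration",   "deadline": 5,  "remaining": 3},
    {"name": "pressure",    "deadline": 12, "remaining": 1},
]
chosen = pick_next(tasks, now=0)
# "vibration" has laxity 5 - 0 - 3 = 2, the smallest, so it runs first
```

The design point is that laxity, unlike a raw deadline, accounts for how much work is still pending, so a long task with a moderate deadline can outrank a short task with a nearer one.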
Implementation of double-layer loaded on octagon microstrip yagi antenna
This document describes the implementation of a double-layer structure on an octagon microstrip yagi antenna (OMYA) to improve its performance at 5.8 GHz. The double-layer consists of two double positive (DPS) substrates placed above the OMYA. Simulation and experimental results show that the double-layer configuration increases the gain of the OMYA by 2.5 dB compared to without the double-layer. The measured bandwidth of the OMYA with double-layer is 14.6%, indicating the double-layer can increase both the gain and bandwidth of the OMYA.
The calculation of the field of an antenna located near the human head
In this work, a numerical calculation was carried out in one of the universal programs for automatic electrodynamic design. The calculation is aimed at obtaining numerical values of the specific absorption rate (SAR). It is the SAR value that can be used to determine the effect of the antenna of a wireless device on biological objects; the dipole parameters were selected for GSM1800. Investigation of the influence of the distance from a cell phone shows that the electromagnetic radiation absorbed in a person's head decreases with distance: the SAR value decreased by almost three times, which is a very important and acceptable result.
Exact secure outage probability performance of uplink-downlink multiple access...
In this paper, we study uplink-downlink non-orthogonal multiple access (NOMA) systems by considering secure performance at the physical layer. In the considered system model, the base station acts as a relay to allow two users on the left side to communicate with two users on the right side. Considering imperfect channel state information (CSI), the secure performance needs to be studied, since an eavesdropper wants to overhear signals processed at the downlink. To provide a secure performance metric, we derive exact expressions of the secrecy outage probability (SOP) and evaluate the impacts of the main parameters on the SOP metric. The important finding is that higher secrecy performance can be achieved at high signal-to-noise ratio (SNR). Moreover, the numerical results demonstrate that the SOP tends to a constant at high SNR. Finally, our results show that the power allocation factors and target rates are the main factors affecting the secrecy performance of the considered uplink-downlink NOMA systems.
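The quantity behind SOP analysis can be illustrated with a toy computation (this is the standard secrecy-rate definition, not the paper's exact derivation, and the SNR values are invented): the secrecy rate is the legitimate link's rate minus the eavesdropper's rate, floored at zero, and a secrecy outage occurs when it falls below the target rate.

```python
# Toy secrecy-rate illustration (standard definition; values hypothetical).
import math

def secrecy_rate(snr_user, snr_eve):
    """Achievable secrecy rate in bits/s/Hz: the positive part of the
    rate gap between the legitimate user and the eavesdropper."""
    return max(0.0, math.log2(1 + snr_user) - math.log2(1 + snr_eve))

r = secrecy_rate(snr_user=15.0, snr_eve=3.0)  # log2(16) - log2(4) = 2.0
# When the eavesdropper's channel is the stronger one, the rate clamps to 0,
# which is why SOP saturates instead of vanishing at high SNR.
weak = secrecy_rate(snr_user=1.0, snr_eve=3.0)
```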
Design of a dual-band antenna for energy harvesting application
This report presents an investigation of how to improve a current dual-band antenna to enhance its parameters for energy harvesting applications, and to develop a new design and validate antenna operation at 2.4 GHz and 5.4 GHz. At 5.4 GHz, more data can be transmitted compared to 2.4 GHz; however, 2.4 GHz has a longer radiation range, so it can be used far from the antenna module, whereas 5 GHz has a shorter range. The development of this project includes designing and testing the antenna using computer simulation technology (CST) 2018 software and vector network analyzer (VNA) equipment. In the design process, the fundamental antenna parameters are measured and validated in order to identify the better antenna performance.
Transforming data-centric eXtensible markup language into relational database...
eXtensible markup language (XML) emerged internationally as the format for data representation over the web. Yet, most organizations still utilise relational databases as their database solutions. As such, it is crucial to provide seamless integration via effective transformation between these database infrastructures. In this paper, we propose XML-REG to bridge these two technologies based on node-based and path-based approaches. The node-based approach is well suited to annotating each positional node uniquely, while the path-based approach provides summarised path information to join the nodes. On top of that, a new range labelling is also proposed to annotate nodes uniquely while ensuring the structural relationships between nodes are maintained. If a new node is added to the document, re-labelling is not required, as a new label will be assigned to the node via the proposed labelling scheme. Experimental evaluations indicated that the performance of XML-REG exceeded XMap, XRecursive, XAncestor and Mini-XML concerning storage time, query retrieval time and scalability. This research produces a core framework for XML to relational databases (RDB) mapping, which could be adopted in various industries.
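The range-labelling idea can be sketched in a few lines (a generic illustration, not XML-REG's actual scheme: the published labelling additionally leaves gaps so inserts avoid relabelling). Each node receives a (start, end) interval from a depth-first walk, so ancestor/descendant relationships reduce to interval containment, which a relational database can test with plain comparisons.

```python
# Generic range labelling for tree-shaped (XML-like) data.
# A node's interval encloses the intervals of all its descendants.

def assign_ranges(tree, counter=None, labels=None, path="root"):
    """tree: nested dict {child_name: subtree}. Returns {path: (start, end)}."""
    if counter is None:
        counter, labels = [0], {}
    start = counter[0]
    counter[0] += 1
    for name, sub in tree.items():
        assign_ranges(sub, counter, labels, path + "/" + name)
    labels[path] = (start, counter[0])  # end covers every descendant
    counter[0] += 1
    return labels

def is_ancestor(labels, a, b):
    """True when node `a` is a proper ancestor of node `b`."""
    sa, ea = labels[a]
    sb, eb = labels[b]
    return sa < sb and eb <= ea

# Hypothetical mini-document: <root><book><title/><author/></book><price/></root>
doc = {"book": {"title": {}, "author": {}}, "price": {}}
labels = assign_ranges(doc)
```

In an RDB mapping, the (start, end) pair becomes two integer columns, and structural XPath axes translate into range predicates in SQL.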
Key performance requirement of future next wireless networks (6G)
The document provides an overview of the key performance indicators (KPIs) for 6G wireless networks compared to 5G networks. Some of the major KPIs discussed for 6G include: achieving data rates of up to 1 Tbps and individual user data rates up to 100 Gbps; reducing latency below 10 milliseconds; supporting up to 10 million connected devices per square kilometer; improving spectral efficiency by up to 100 times through technologies like terahertz communications and smart surfaces; and achieving an energy efficiency of 1 pico-joule per bit transmitted through techniques like wireless power transmission and energy harvesting. The document outlines how 6G aims to integrate terrestrial, aerial and maritime communications into a single network to provide ubiquitous connectivity with higher
Noise resistance territorial intensity-based optical flow using inverse confi...
This paper presents the use of the inverse confidential technique on a bilateral function with territorial intensity-based optical flow to prove its effectiveness in noisy environments. In general, an image's motion vector is coded by the technique called optical flow, where sequences of images are used to determine the motion vector. However, the accuracy of the motion vector is reduced when the source image sequences are interfered with by noise. This work proves that the inverse confidential technique on a bilateral function can increase the accuracy of motion vector determination by territorial intensity-based optical flow in noisy environments. We tested with several kinds of non-Gaussian noise on several standard image sequence patterns, analyzing the resulting motion vectors in the form of the error vector magnitude (EVM) and comparing against several noise-resistant techniques in the territorial intensity-based optical flow method.
Modeling climate phenomenon with software grids analysis and display system i...
This study aims to model climate change based on rainfall, air temperature, pressure, humidity and wind with the grADS software and to create a global warming module. This research uses the 3D model: define, design, and develop. The modeling results for the five climate elements are as follows: the annual average temperature in Indonesia in 2009-2015 was between 29°C and 30.1°C; the horizontal distribution of the annual average pressure in Indonesia in 2009-2018 was between 800 mBar and 1000 mBar; the horizontal distribution of the average annual humidity in Indonesia ranged between 27-57 in 2009 and 2011, and between 30-60 in 2012-2015, 2017 and 2018; during the east monsoon, wind circulation moved from northern Indonesia to southern Indonesia, and during the west monsoon it moved from southern Indonesia to northern Indonesia. The global warming module produced for SMA/MA is feasible to use; this accords with the validator's score of 69, which is in the appropriate category, and with the 91% questionnaire response of teachers and students.
An approach of re-organizing input dataset to enhance the quality of emotion ...
The purpose of this paper is to propose an approach of re-organizing input data to recognize emotion based on short signal segments and increase the quality of emotional recognition using physiological signals. MIT's long physiological signal set was divided into two new datasets, with shorter and overlapped segments. Three different classification methods (support vector machine, random forest, and multilayer perceptron) were implemented to identify eight emotional states based on statistical features of each segment in these two datasets. By re-organizing the input dataset, the quality of recognition results was enhanced. The random forest shows the best classification result among three implemented classification methods, with an accuracy of 97.72% for eight emotional states, on the overlapped dataset. This approach shows that, by re-organizing the input dataset, the high accuracy of recognition results can be achieved without the use of EEG and ECG signals.
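The re-organization idea described above can be sketched with a simple windowing function (segment lengths here are illustrative, not the MIT dataset's actual values): a long physiological recording is cut into shorter windows, and overlapping the windows multiplies the number of training samples.

```python
# Sketch of re-organizing a long signal into shorter (optionally
# overlapping) segments for emotion classification.
import numpy as np

def segment(signal, length, step):
    """Return an array of windows of `length` samples taken every `step`
    samples; step < length produces overlapping segments."""
    starts = range(0, len(signal) - length + 1, step)
    return np.stack([signal[s:s + length] for s in starts])

sig = np.arange(100.0)                        # dummy 100-sample signal
non_overlap = segment(sig, length=20, step=20)  # 5 disjoint windows
overlap = segment(sig, length=20, step=10)      # 9 overlapping windows
```

With a 50% overlap (step = length/2), the same recording yields nearly twice the segments, each of which gets its own statistical feature vector for the classifier.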
Parking detection system using background subtraction and HSV color segmentation
A manual vehicle parking system makes finding vacant parking lots difficult, since drivers have to check vacant spaces directly. When many people park, this takes a long time or requires many people to handle it. This research develops a real-time parking system to detect parking occupancy. The system is designed using the HSV color segmentation method to determine the background image, and the detection process uses the background subtraction method. Applying these two methods requires image preprocessing using several methods such as grayscaling and blurring (low-pass filter), followed by thresholding and filtering to get the best image for the detection process. In the process, an ROI is determined to set the focus area of objects identified as empty parking. The parking detection process produces a best average accuracy of 95.76%, with a minimum threshold of 0.4 on the 255-pixel scale. This value is the best value from 33 test data over several criteria, such as time of capture, composition and color of the vehicle, the shape of shadows in the object's environment, and the intensity of light. This parking detection system can be implemented in real time to determine the position of an empty place.
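The core detection step can be sketched in numpy (a minimal illustration of background subtraction only; the paper's pipeline additionally uses HSV segmentation, blurring, and OpenCV preprocessing, and the threshold/ratio values below are hypothetical): subtract the background image from the current frame, threshold the absolute difference, and declare an ROI occupied when enough pixels changed.

```python
# Minimal background-subtraction occupancy check for one parking ROI.
import numpy as np

def roi_occupied(frame, background, threshold=0.4, min_ratio=0.1):
    """frame/background: grayscale ROIs scaled to [0, 1]. Returns True
    when more than `min_ratio` of pixels differ by more than `threshold`."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    changed = diff > threshold
    return changed.mean() > min_ratio

bg = np.zeros((8, 8))                   # empty, uniformly dark space
empty = bg + 0.05                       # only small lighting noise
car = bg.copy()
car[2:6, 2:6] = 0.9                     # bright object covering the ROI
```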
Quality of service performances of video and voice transmission in universal ...
The universal mobile telecommunications system (UMTS) has distinct benefits in that it supports the wide range of quality of service (QoS) criteria that users require. Real-time transmission of video and audio places a high demand on the cellular network, so QoS is a major problem in these applications. Providing QoS in the UMTS backbone network necessitates an active QoS mechanism to maintain the necessary level of convenience on UMTS networks. Investigation models for end-to-end QoS, total transmitted and received data, packet loss, and throughput-providing techniques are run and assessed for UMTS networks, and the simulation results are examined. According to the results, appropriate QoS adaptation allows for specific voice and video transmission. Finally, by analyzing existing QoS parameters, the QoS performance of 4G/UMTS networks may be improved.
Development of 3D convolutional neural network to recognize human activities using moderate computation machine
Bulletin of Electrical Engineering and Informatics
Vol. 10, No. 6, December 2021, pp. 3137~3146
ISSN: 2302-9285, DOI: 10.11591/eei.v10i6.2802 3137
Journal homepage: http://beei.org
Development of 3D convolutional neural network to recognize
human activities using moderate computation machine
Malik A. Alsaedi, Abdulrahman S. Mohialdeen, Baraa M. Albaker
College of Engineering, Al-Iraqia University, Sabe’ Abkar, Adhamiya, Baghdad, Iraq
Article Info ABSTRACT
Article history:
Received Jan 12, 2021
Revised May 20, 2021
Accepted Oct 9, 2021
Human activity recognition (HAR) is recently used in numerous applications
including smart homes to monitor human behavior, automate homes
according to human activities, entertainment, falling detection, violence
detection, and people care. Vision-based recognition is the most powerful
method widely used in HAR systems implementation due to its characteristics
in recognizing complex human activities. This paper addresses the design of a
3D convolutional neural network (3D-CNN) model that can be used in smart
homes to identify several activities. The model is trained using the
KTH dataset, which contains activities such as walking, running, jogging,
handwaving, handclapping, and boxing. Despite the challenges of this method due
to the effects of illumination, background variation, and human body
variety, the proposed model reached an accuracy of 93.33%. The model was
implemented, trained and tested using a moderate computation machine, and the
results show that the proposal was successfully capable of recognizing human
activities with reasonable computations.
Keywords:
3D-CNN
Convolutional neural network
Deep learning
HAR
Smart home
Vision
This is an open access article under the CC BY-SA license.
Corresponding Author:
Abdulrahman S. Mohialdeen
College of Engineering
Al-Iraqia University
Sabe’ Abkar, Adhamiya, Baghdad, Iraq
Email: abd_saeed@aliraqia.edu.iq
1. INTRODUCTION
HAR is a challenging subject because of the huge number of human activities: some
activities can be easily noticed, some are confusing, and some require interaction with other
objects or humans. Besides the diversity of the activities, the recognition methods are also diverse. There are
many types of data that can be used to recognize a human activity. Some methods use ambient sensors like
accelerometers, gyroscopes, humidity and temperature sensors [1], [2]. Some take advantage of a
smartphone's sensors like the accelerometer and gyroscope [3], [4], and others use radio frequency [5]. But the
most popular recognition methods are vision-based [6]-[14].
Vision-based methods use images or videos to recognize the activity. There are also many challenges in
recognizing human activities because of the effects of illumination and background variance. Still, the question
is how to process these visual data to recognize the activity. The answer is that many techniques exist, most of
them using machine learning, and deep learning has shown excellent results for recognizing human
activities, especially CNNs, which are very useful for vision-based data recognition.
In this paper, we design a neural network architecture (model) that can be used for human
activity monitoring. The proposed model is a 3D CNN (3D-CNN); the purpose of using
3D-CNN is to extract spatial and temporal features rather than only spatial features, since an activity consists
of multiple movements that can be identified by extracting temporal features across several frames.
Our proposed model uses a small number of 3D-CNN layers to reduce processing time,
so that a computer with low computational ability can recognize human activity online without delay.
CNN was introduced by Fukushima [12]. At the beginning it was mainly used for image classification and
image feature extraction, and it is one of the most attractive models invented for the image classification
challenge [13], [14]. Meanwhile, many video human activity datasets were published, and KTH [15] is
one of the most popular small-size datasets. The CNN architecture encouraged researchers to use it in HAR with
image data taken from video datasets [10], [16]. Other researchers proposed two-stream CNNs where the
first stream is image data and the second is optical flow [16], [17]. Among the most cited 3D-CNN articles for HAR,
[18] proposed a model for the TRECVID dataset, and [19] proposed a 3D-CNN model for the UCF-101 dataset [20],
by which our model is inspired.
2. RESEARCH METHOD
This paper proposes to use deep neural network (DNN) models that consist of many different
layers, each with its own purpose during training and testing. Still, the dominant layer on which this article
focuses is the convolutional layer, one of the most widely used neural network layers;
its central idea is applying filters to (convolving over) the input data and transferring the convolved data to the next
layer. CNN was primarily used to deal with image data. Now there are 1D, 2D, and 3D CNNs to extend
this architecture to data with different numbers of dimensions. 3D-CNN is used for
three-dimensional data, which suits our project well because we deal with video data. The
reason for using video rather than still images is that an activity is made of several consecutive movements of
body parts; this continuous movement can be noticed across successive images, which is a video.
The pooling layers used in this paper are max-pooling, which returns the maximum value within a kernel
as it moves across the data; average-pooling, which returns the average value within a kernel;
and global-average-pooling, which returns the average value of each
feature map of the CNN layer, which is why it is useful at the final part of the model to reduce
the number of parameters. To help the model overcome overfitting, dropout layers are used.
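As a concrete illustration of the three pooling operations (a 2D, single-channel numpy sketch for brevity; the model itself uses their 3D counterparts):

```python
# Max-, average-, and global-average-pooling on a toy 4x4 feature map.
import numpy as np

def pool2d(x, k, op):
    """Non-overlapping k*k pooling with `op` (np.max or np.mean)."""
    h, w = x.shape[0] // k, x.shape[1] // k
    x = x[:h * k, :w * k].reshape(h, k, w, k)
    return op(x, axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 0., 1., 1.],
              [0., 4., 1., 1.]])

max_pooled = pool2d(x, 2, np.max)    # [[4, 8], [4, 1]]
avg_pooled = pool2d(x, 2, np.mean)   # [[2.5, 6.5], [1, 1]]
global_avg = x.mean()                # one scalar per feature map: 2.75
```

Note how global average pooling collapses a whole feature map to a single number, which is exactly why it shrinks the input to the following fully connected layer.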
2.1. Proposed models
The first suggested model, shown in Figure 1, is influenced by the model proposed by Tran et al.
[19]. The model consists of three connected 3D convolutional layers with a 3*3*3 kernel size for all
convolutional layers, each followed by a max-pooling layer with a 2*2*2 kernel size.
Zero-padding is used for all convolutional layers, and a ReLU activation function is added after each
convolutional and fully connected (FC) layer except the last, which uses SoftMax.
Figure 1. Initial proposed model
In the second attempt, additional convolutional and max-pooling layers were added before the flatten layer to
increase the accuracy. Then dropouts were added at different places in the model over five attempts to get
high accuracy, where all dropouts used a factor of 0.5. To reduce the
number of parameters (weights and biases), we returned to the first model with dropouts. The accuracy
increased using the dropouts, but the number of parameters was huge, because the output of the layer before
the flatten layer was larger than before removing the convolutional and pooling layers; as a pooling
layer was removed, the number of parameters increased. Equations (1)-(3) show how the number of parameters is calculated.
Figure 1 layout: Input (30*40*40*1) → Conv3D 64 → MaxPool → Conv3D 128 → MaxPool → Conv3D 256 → MaxPool → Flatten → FC 128 → FC 32 → FC 6 → SoftMax output (6 classes)
3. Bulletin of Electr Eng & Inf ISSN: 2302-9285
Development of 3D convolutional neural network to recognize human … (Malik A. Alsaedi)
3139
So, to solve this problem, a GlobalAveragePooling3D layer replaced the flatten layer. This replacement reduced
the number of parameters to about two million, which is very helpful for machines with low or moderate
computation capabilities to deal with online activity recognition.
No. of parameters in a 3D-CNN layer = (filter_width * filter_height * filter_depth * input_channels + 1) * no. of filters (1)

No. of parameters in an FC layer = (neurons of previous layer + 1) * neurons of current layer (2)

No. of values output by flatten = product of all previous layer dimensions (3)
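As a back-of-the-envelope check, the formulas above can be turned into code. The sketch below applies (1)-(3) to the Figure 1 layer sizes and also shows why replacing the flatten layer with global average pooling shrinks the first FC layer so dramatically (layer sizes follow Figure 1; the paper's exact totals may differ):

```python
# Parameter-count sketch for the Figure 1 model (illustrative only).

def conv3d_params(kd, kh, kw, in_ch, filters):
    # Equation (1), with the input-channel factor and bias included
    return (kd * kh * kw * in_ch + 1) * filters

def fc_params(prev_neurons, cur_neurons):
    # Equation (2), bias terms included
    return (prev_neurons + 1) * cur_neurons

p1 = conv3d_params(3, 3, 3, 1, 64)      # first conv layer:  1,792
p2 = conv3d_params(3, 3, 3, 64, 128)    # second conv layer: 221,312
p3 = conv3d_params(3, 3, 3, 128, 256)   # third conv layer:  884,992

# After three 2*2*2 max-pools, the 30*40*40 input shrinks to 3*5*5.
flatten_size = 3 * 5 * 5 * 256          # equation (3): 19,200 values
gap_size = 256                          # global average: one per feature map

fc_with_flatten = fc_params(flatten_size, 128)  # 2,457,728 parameters
fc_with_gap = fc_params(gap_size, 128)          # 32,896 parameters
```

The convolutional layers themselves stay around a million parameters either way; it is the first FC layer that the GlobalAveragePooling3D swap reduces by nearly two orders of magnitude.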
3. RESULTS AND DISCUSSION
The neural network model is trained on the KTH dataset. The dataset is doubled before use by
adding a flipped copy of it; 20% of the data is taken for testing after training, another 20% for validation
during training, and 60% for the training process. The model was trained using Tensorflow-v2.1 [21]
as backend and Keras-v2.3.1 [22] with Python-v3.7.5 as front-end, with batch size=16.
The frames taken from the videos were 40x40 pixels with one channel (grayscale); from each
video, 30 frames were taken, and the four frames between each pair of taken frames were
discarded.
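The frame-sampling scheme just described can be sketched as follows (a simplified illustration: video decoding, grayscale conversion, and resizing to 40x40 are omitted, and the dummy clip length is arbitrary):

```python
# Take 30 frames per clip, discarding 4 frames between each pair of
# taken frames, i.e. keep every 5th frame.
import numpy as np

def sample_frames(video, n_frames=30, skip=4):
    """video: array of shape (total_frames, H, W).
    Returns every (skip+1)-th frame until n_frames are collected."""
    idx = np.arange(n_frames) * (skip + 1)
    return video[idx]

video = np.zeros((200, 40, 40))   # dummy 200-frame, already-resized clip
clip = sample_frames(video)       # shape (30, 40, 40), the model's input
```

Spacing the frames out this way lets 30 frames cover a longer time span, so the temporal features the 3D convolutions extract describe more of the activity.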
The optimizer of the model was Adam [23] with a learning rate of 0.001 and a
categorical cross-entropy loss function. The machine specification was: HP 15 Notebook, 12288 MB
RAM, Intel Core i5-3230M processor with four cores (two of them physical) and a maximum frequency
of 2.6 GHz, running Windows 8.1 Enterprise 64-bit (6.3, build 9600).
Validation data controls the training operation: after each update to the model, the validation
data is applied, and if the validation loss did not improve for three epochs, the learning rate was
multiplied by a half, with a minimum of 0.0001. If the loss did not improve for 15 consecutive epochs,
training was finished before reaching the given maximum of 100 epochs.
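This training control mirrors the logic of Keras's ReduceLROnPlateau and EarlyStopping callbacks; a plain-Python sketch of that logic on a recorded validation-loss history (illustrative, not the paper's code, and the loss values are invented):

```python
# Simulate the paper's training control: halve the learning rate after
# 3 epochs without improvement (floor 1e-4), stop after 15.

def train_control(val_losses, lr=0.001, min_lr=0.0001,
                  plateau=3, patience=15, max_epochs=100):
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best % plateau == 0:   # 3 epochs with no improvement
                lr = max(lr * 0.5, min_lr)  # halve, floored at 0.0001
            if since_best >= patience:      # 15 epochs with no improvement
                return epoch, lr            # early stop
    return min(len(val_losses), max_epochs), lr

# Loss improves twice, then stalls: training stops 15 epochs later.
losses = [0.9, 0.8] + [0.85] * 20
stopped_at, final_lr = train_control(losses)
```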
3.1. Calculating results
After the training operation finishes, test samples are fed to the model to get its response; the
accuracy, precision, recall, and F1_score are calculated using (4)-(7), which use the confusion matrix annotation shown in
Figure 2 [24]. The average loss is calculated using the categorical cross-entropy formula (8) [25].
Figure 2. Confusion matrix annotation
Accuracy = Σ_{i=1..l} (tp_i + tn_i) / Σ_{i=1..l} (tp_i + fn_i + fp_i + tn_i) (4)
Equation (4) shows how accuracy is calculated; it defines the overall effectiveness of the model.
\[
\text{Precision} = \frac{\sum_{i=1}^{l} tp_i}{\sum_{i=1}^{l} (tp_i + fp_i)} \quad (5)
\]
Equation (5) gives the precision, which measures the agreement between the true class labels and the calculated labels. Equation (6) gives the recall, which measures the model's effectiveness at identifying the class labels. Equation (7) gives the F1_score (with β = 1), which relates the model's output on the test data to the positive labels.
Equation (8) gives the average loss, where N is the number of samples, M is the number of classes, d is the true label (desired output), and y is the calculated output of the model. Table 1 shows the calculated results together with each model's figure number. Table 2 compares the accuracies of several studies on the KTH dataset for human activity recognition with ours; our method shows a remarkable improvement in accuracy.
Bulletin of Electr Eng & Inf, Vol. 10, No. 6, December 2021: 3137 - 3146 (ISSN: 2302-9285)
\[
\text{Recall} = \frac{\sum_{i=1}^{l} tp_i}{\sum_{i=1}^{l} (tp_i + fn_i)} \quad (6)
\]
\[
\text{F1\_score} = \frac{(\beta^2 + 1) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}} \quad (7)
\]
\[
\text{Loss} = \frac{1}{N} \sum_{i=1}^{N} \left( -\frac{1}{M} \sum_{j=1}^{M} d_{i,j} \log(y_{i,j}) \right) \quad (8)
\]
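The metric and loss formulas above can be computed directly from a confusion matrix; a small sketch (the function names and the illustrative 2-class matrix are our own, not the paper's data):

```python
def per_class_counts(cm):
    """cm[i][j] = samples of true class i predicted as class j.
    Returns one (tp, fp, fn, tn) tuple per class."""
    total = sum(sum(row) for row in cm)
    counts = []
    for i in range(len(cm)):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(len(cm))) - tp
        tn = total - tp - fn - fp
        counts.append((tp, fp, fn, tn))
    return counts

def metrics(cm, beta=1.0):
    """Micro-averaged accuracy, precision, recall, and F-score,
    following the summed-over-classes form of the equations above."""
    c = per_class_counts(cm)
    acc = sum(tp + tn for tp, fp, fn, tn in c) / sum(sum(t) for t in c)
    prec = sum(tp for tp, _, _, _ in c) / sum(tp + fp for tp, fp, _, _ in c)
    rec = sum(tp for tp, _, _, _ in c) / sum(tp + fn for tp, _, fn, _ in c)
    f = (beta**2 + 1) * prec * rec / (beta**2 * prec + rec)
    return acc, prec, rec, f

acc, prec, rec, f1 = metrics([[8, 2], [1, 9]])  # toy 2-class matrix
```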
Table 1. Calculated results for all models

No.  Figure    Accuracy %  Loss  Precision %  Recall %  F1_score %
1    Figure 3  85.83       0.39  86.30        85.89     86.09
2    Figure 4  88.75       0.39  88.97        88.97     88.97
3    Figure 5  92.08       0.23  92.67        91.90     92.28
4    Figure 6  92.92       0.22  93.08        92.95     93.02
Table 2. Comparison of accuracies of research done on KTH

No.  Method              Accuracy %
1    Ahmad and Lee [26]  84.83
2    Taylor et al. [27]  88.00
3    Qian et al. [28]    88.69
4    Our method          93.33
3.2. Calculating the number of operations for layers
We calculate an approximate number of operations for each layer; the calculations do not include control operations. The calculations are detailed below:
a. 3D-CNN: each kernel convolves over the entire input. For a 3D-CNN with kernel size (Kd, Kh, Kw), input data of (frames, height, width), and strides (strided, strideh, stridew), we have:

No. of operations per node = (Kd * Kh * Kw + 1)^2 * no. of previous kernels   (9)

Output nodes = ((frames - Kd)/strided + 1) * ((height - Kh)/strideh + 1) * ((width - Kw)/stridew + 1) * no. of current kernels   (10)

Output nodes = ceil(frames/strided) * ceil(height/strideh) * ceil(width/stridew) * no. of current kernels   (11)
Equation (9) gives the number of operations per output node of the convolution; the power of two accounts for the multiplications. Equation (10) [29] gives the number of output nodes of a 3D-CNN layer without zero padding. With "same" padding, as used in our proposed model, the number of output nodes is given by (11) [29], so with a stride of one the input dimensions are preserved.
b. MaxPooling3D performs comparison operations. For a pool window of size (Pd, Ph, Pw), input data of (frames, height, width), and strides (strided, strideh, stridew), we have:

No. of operations per node = Pd * Ph * Pw   (12)

Output nodes = ((frames - Pd)/strided + 1) * ((height - Ph)/strideh + 1) * ((width - Pw)/stridew + 1) * no. of current kernels   (13)

Output nodes = ceil(frames/strided) * ceil(height/strideh) * ceil(width/stridew) * no. of current kernels   (14)

Equation (12) gives the number of comparisons per pool window. Equation (13) gives the number of output nodes without padding, which is what our model uses, with the stride equal to the pool window size. Equation (14) gives the number of output nodes with zero padding.
c. A fully connected layer has a vast number of parameters compared with a CNN layer. If the previous layer has Nprevious neurons and the current layer has Ncurrent neurons, (15) gives the number of operations for a fully connected layer; the power of two accounts for the multiplications.
Number of operations = (Nprevious * Ncurrent)^2   (15)
d. Flatten costs only one operation, reshaping the dimensions into one dimension.
e. Dropout works only during training; it simply hides a randomly chosen portion of nodes so that they do not participate in producing the output at some point of training, so it costs no operations at test time.
f. GlobalAveragePooling3D averages the nodes of each channel: for a previous-layer output of shape (frames, height, width, channels), it sums all the values across frames, height, and width for a particular channel and divides by (frames * height * width), so the total number of operations is as shown in (16).

Number of operations = (frames * height * width) * channels   (16)
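The per-layer counts above can be written down directly; a sketch implementing the paper's formulas (the function names are ours, and the padded forms are written with ceil so that "same" padding at stride one preserves the input size, matching the shapes in the model tables):

```python
from math import ceil

def conv3d_ops_per_node(kd, kh, kw, prev_kernels):
    """Operations per output node of a 3D convolution, per eq. (9)."""
    return (kd * kh * kw + 1) ** 2 * prev_kernels

def output_nodes_valid(dims, window, strides, kernels):
    """Output nodes without zero padding, per eqs. (10)/(13)."""
    n = kernels
    for d, k, s in zip(dims, window, strides):
        n *= (d - k) // s + 1
    return n

def output_nodes_same(dims, strides, kernels):
    """Output nodes with zero ('same') padding, per eqs. (11)/(14)."""
    n = kernels
    for d, s in zip(dims, strides):
        n *= ceil(d / s)
    return n

def gap3d_ops(frames, height, width, channels):
    """GlobalAveragePooling3D operations, per eq. (16)."""
    return frames * height * width * channels

# MaxPooling3D with a 2x2x2 window and stride 2 on a (30, 40, 40) input
# with 64 channels gives the (15, 20, 20, 64) shape seen in the tables:
nodes = output_nodes_valid((30, 40, 40), (2, 2, 2), (2, 2, 2), 64)
```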
Table 3 shows the number of operations for each proposed model according to its figure. The model with the fewest operations is also the model with the fewest parameters, at a high accuracy of 92.92%.
Table 3. Number of operations for each model according to its figure

Figure        No. of operations
Figure 3      1.08 * 10^13
Figures 4, 5  5.77 * 10^12
Figure 6      4.77 * 10^12
4. DISCUSSION
Each result is discussed according to its figure number.
Figure 3 shows the model architecture, confusion matrix, accuracy, and loss curves for the earlier proposed model. Because the layer before the flatten was not small, the fully connected layer had a massive number of parameters; the model's learning also saturated early, completing within 30 epochs. Figure 4 shows the same plots for the model after adding a Conv3D and a MaxPooling3D layer before the flatten to reduce the number of parameters and the training time; training again finished within 30 epochs. Figure 5 shows the plots for the model after adding dropouts after the fourth, sixth, and eighth layers; the accuracy of this model showed remarkable improvement, and training finished at epoch 100, the final requested epoch. Figure 6 shows the plots for the model after replacing the flatten layer with GlobalAveragePooling3D, which reduced the number of parameters and therefore the training time per epoch. It also reduced the number of operations during testing and achieved an accuracy of 92.92%; training finished at epoch 82.
Layer (type)   Output shape     Parameters
Conv3D         30, 40, 40, 64   1792
MaxPooling3D   15, 20, 20, 64   0
Conv3D         15, 20, 20, 128  221312
MaxPooling3D   7, 10, 10, 128   0
Conv3D         7, 10, 10, 256   884992
MaxPooling3D   3, 5, 5, 256     0
Flatten        19200            0
Dense          128              2457728
Dense          32               4128
Dense          6                198
Total Parameters                3,570,150
Figure 3. Earlier proposed model; (a) model architecture, (b) confusion matrix
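The parameter counts in the architecture table of Figure 3 can be checked with a short sketch of the standard Conv3D/Dense parameter formulas, assuming 3x3x3 kernels throughout (the kernel size is inferred from the counts, not stated in the table):

```python
def conv3d_params(kd, kh, kw, in_ch, out_ch):
    """Kernel weights plus one bias per output channel."""
    return (kd * kh * kw * in_ch + 1) * out_ch

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output neuron."""
    return (n_in + 1) * n_out

# Reproduce the table entries:
assert conv3d_params(3, 3, 3, 1, 64) == 1792
assert conv3d_params(3, 3, 3, 64, 128) == 221312
assert conv3d_params(3, 3, 3, 128, 256) == 884992
assert dense_params(19200, 128) == 2457728
assert dense_params(128, 32) == 4128
assert dense_params(32, 6) == 198
```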
Figure 3. Earlier proposed model; (c) accuracy of training and validation, (d) losses of training and validation (continued)
Layer (type) Output shape Parameters
Conv3D 30, 40, 40, 64 1792
MaxPooling3D 15, 20, 20, 64 0
Conv3D 15, 20, 20, 128 221312
MaxPooling3D 7, 10, 10, 128 0
Conv3D 7, 10, 10, 256 884992
MaxPooling3D 3, 5, 5, 256 0
Conv3D 3, 5, 5, 256 1769728
MaxPooling3D 1, 2, 2, 256 0
Flatten 1024 0
Dense 128 131200
Dense 32 4128
Dense 6 198
Total Parameters 3,013,350
Figure 4. Adding fourth convolutional and max-pooling layers; (a) model architecture, (b) confusion matrix, (c) accuracy of training and validation, (d) losses of training and validation
Layer (type) Output shape Parameters
Conv3D 30, 40, 40, 64 1792
MaxPooling3D 15, 20, 20, 64 0
Conv3D 15, 20, 20, 128 221312
MaxPooling3D 7, 10, 10, 128 0
Dropout (0.5) 7, 10, 10, 128 0
Conv3D 7, 10, 10, 256 884992
MaxPooling3D 3, 5, 5, 256 0
Dropout (0.5) 3, 5, 5, 256 0
Conv3D 3, 5, 5, 256 1769728
MaxPooling3D 1, 2, 2, 256 0
Dropout (0.5) 1, 2, 2, 256 0
Flatten 1024 0
Dense 128 131200
Dense 32 4128
Dense 6 198
Total Parameters 3,013,350
Figure 5. Dropouts before the third and fourth convolutional layers and also before the flatten layer; (a) model architecture, (b) confusion matrix, (c) accuracy of training and validation, (d) losses of training and validation
Layer (type)            Output shape     Parameters
Conv3D                  30, 40, 40, 64   1792
MaxPooling3D            15, 20, 20, 64   0
Conv3D                  15, 20, 20, 128  221312
MaxPooling3D            7, 10, 10, 128   0
Conv3D                  7, 10, 10, 256   884992
MaxPooling3D            3, 5, 5, 256     0
Dropout (0.5)           3, 5, 5, 256     0
GlobalAveragePooling3D  256              0
Dropout (0.5)           256              0
Dense                   128              32896
Dense                   32               4128
Dense                   6                198
Total parameters                         1,145,318
Figure 6. Replacing flatten with global average pooling; (a) model architecture, (b) confusion matrix
Figure 6. Replacing flatten with global average pooling; (c) accuracy of training and validation, (d) losses of training and validation (continued)
We can see that dropout has great benefit, but only when it is placed correctly. Several changes were made to the number and placement of the dropouts: accuracy increased when dropout was added before the third convolutional layer and before the flatten layer, but decreased slightly when it was added before the fourth convolutional layer. The dropouts were then moved to before and after the flatten layer, which gave the maximum accuracy. This improvement was then tested on a model with fewer layers with the aim of decreasing the number of parameters. The results were helpful; to complete the parameter reduction, the flatten layer was replaced by global average pooling, which reduced the number of parameters of the model by about two million.
5. CONCLUSION
We have designed a model that can be used for online human activity recognition on machines with moderate computation capabilities. The accuracy of our proposed model reached 93.33%, and 92.92% for the model with a reduced number of parameters. The last presented model suits machines with moderate computation capabilities, owing to its low numbers of parameters and mathematical operations. We reached this high accuracy by exploiting dropout and by decreasing the learning rate during training when there is no improvement. The model with a low number of mathematical operations could be used for online human activity recognition in smart houses, helping monitor human activities there. We intend to apply further data augmentation to increase the overall accuracy, since only flipping augmentation has been applied so far.
REFERENCES
[1] X. Zhou, W. Liang, K. I. Wang, H. Wang, L. T. Yang and Q. Jin, "Deep-Learning-Enhanced Human Activity Recognition for Internet of Healthcare Things," in IEEE Internet of Things Journal, vol. 7, no. 7, pp. 6429-6438, July 2020, doi: 10.1109/JIOT.2020.2985082.
[2] V. Bianchi, M. Bassoli, G. Lombardo, P. Fornacciari, M. Mordonini and I. De Munari, "IoT Wearable Sensor and
Deep Learning: An Integrated Approach for Personalized Human Activity Recognition in a Smart Home
Environment," in IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8553-8562, Oct. 2019, doi:
10.1109/JIOT.2019.2920283.
[3] A. K. M. Masum, A. Barua, E. H. Bahadur, M. R. Alam, M. A. U. Z. Chowdhury and M. S. Alam, "Human
Activity Recognition Using Multiple Smartphone Sensors," 2018 International Conference on Innovations in
Science, Engineering and Technology (ICISET), 2018, pp. 468-473, doi: 10.1109/ICISET.2018.8745628.
[4] M. M. Hassan, M. Z. Uddin, A. Mohamed and A. Almogren, “A robust human activity recognition system using
smartphone sensors and deep learning,” Future Generation Computer Systems, vol. 81, pp. 307-313, 2018, doi:
10.1016/j.future.2017.11.029.
[5] X. Wu, Z. Chu, P. Yang, C. Xiang, X. Zheng and W. Huang, "TW-See: Human Activity Recognition Through the
Wall With Commodity Wi-Fi Devices," in IEEE Transactions on Vehicular Technology, vol. 68, no. 1, pp. 306-
319, Jan. 2019, doi: 10.1109/TVT.2018.2878754.
[6] A. Diba et al., “Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification,”
Computer Science, 2017.
[7] T. Lima, B. Fernandes and P. Barros, "Human action recognition with 3D convolutional neural network," 2017
IEEE Latin American Conference on Computational Intelligence (LA-CCI), 2017, pp. 1-6, doi: 10.1109/LA-
CCI.2017.8285700.
[8] J. Carreira and A. Zisserman, "Quo Vadis, action recognition? A new model and the kinetics dataset," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4724-4733, 2017.
[9] R. Singh, A. K. S. Kushwaha and R. Srivastava, “Multi-view recognition system for human activity based on
multiple features for video surveillance system,” Multimedia Tools and Applications, vol. 78, no. 12, pp. 17165-
17196, 2019, doi: 10.1007/s11042-018-7108-9.
[10] H. D. Mehr and H. Polat, "Human Activity Recognition in Smart Home With Deep Learning Approach," 2019 7th
International Istanbul Smart Grids and Cities Congress and Fair (ICSG), 2019, pp. 149-153, doi:
10.1109/SGCF.2019.8782290.
[11] Z. Tu et al., “Multi-stream CNN: Learning representations based on human-related regions for action recognition,”
Pattern Recognition, vol. 79, pp. 32-43, 2018, doi: 10.1016/j.patcog.2018.01.020.
[12] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition
unaffected by shift in position,” Biological Cybernetics, vol. 36, no. 4, pp. 193-202, 1980, doi:
10.1007/BF00344251.
[13] J. Deng, W. Dong, R. Socher, L. Li, K. Li and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248-255, doi: 10.1109/CVPR.2009.5206848.
[14] A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,”
Advances in neural information processing systems, vol. 60, no. 6, pp. 84-90, 2017, doi: 10.1145/3065386.
[15] “KTH dataset,” 2005. [Online]. Available: https://www.csc.kth.se/cvap/actions/. [Accessed: 27-May-2020].
[16] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, "Large-Scale Video Classification
with Convolutional Neural Networks," 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014,
pp. 1725-1732, doi: 10.1109/CVPR.2014.223.
[17] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in
Proceedings of the 27th International Conference on Neural Information Processing Systems, vol. 1, no. 1, pp. 568-
576, 2014, doi: 10.5555/2968826.2968890.
[18] S. Ji, W. Xu, M. Yang and K. Yu, "3D Convolutional Neural Networks for Human Action Recognition," in IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, Jan. 2013, doi:
10.1109/TPAMI.2012.59.
[19] D. Tran, L. Bourdev, R. Fergus, L. Torresani and M. Paluri, "Learning spatiotemporal features with 3D convolutional networks," Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4489-4497, 2015.
[20] K. Soomro, A. R. Zamir and M. Shah, "UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild," Computer Vision and Pattern Recognition, 2012.
[21] M. Abadi et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems,” Distributed,
Parallel, and Cluster Computing, 2016.
[22] François Chollet, “Keras,” 2015. [Online]. Available: https://keras.io/. [Accessed: 08-Jun-2020].
[23] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” Computer Science, Mathematics, pp. 1-
15, 2015.
[24] M. Sokolova and G. Lapalme, “A systematic analysis of performance measures for classification tasks,”
Information Processing & Management, vol. 45, no. 4, pp. 427-437, 2009, doi: 10.1016/j.ipm.2009.03.002.
[25] Z. Zhang and M. R. Sabuncu, "Generalized cross entropy loss for training deep neural networks with noisy labels," 32nd Conference on Neural Information Processing Systems (NeurIPS), pp. 8778-8788, 2018, doi: 10.5555/3327546.3327555.
[26] M. Ahmad and S. W. Lee, “Human action recognition using shape and CLG-motion flow from multi-view image
sequences,” Pattern Recognition, vol. 41, no. 7, pp. 2237-2252, 2008, doi: 10.1016/j.patcog.2007.12.008.
[27] G. W. Taylor, R. Fergus, Y. LeCun, and C. Bregler, “Convolutional learning of spatio-temporal features,”
European conference on computer vision, Springer, Berlin, Heidelberg, 2010, vol. 6316 LNCS, no. PART 6, pp.
140-153, doi: 10.1007/978-3-642-15567-3_11.
[28] H. Qian, Y. Mao, W. Xiang and Z. Wang, “Recognition of human activities using SVM multi-class classifier,”
Pattern Recognition Letters, vol. 31, no. 2, pp. 100-111, 2010, doi: 10.1016/j.patrec.2009.09.019.
[29] I. Vasilev, D. Slater, G. Spacagna, P. Roelants and V. Zocca, Python Deep Learning, 2nd ed. Birmingham: Packt Publishing, 2019.
BIOGRAPHIES OF AUTHORS
Malik Alsaedi is an Asst. Prof. of electrical engineering. He received his B.Sc. degree from the University of Technology, Baghdad, the M.Tech. degree from JNTU University, India, and the Ph.D. degree from UTM University, Malaysia. He is currently deputy dean of the engineering college, Al-Iraqia University, Iraq. He is interested in optical communication and IoT technology.
Abdulrahman S. Mohialdeen holds a Bachelor's degree in Electrical Engineering from the University of Baghdad and a Master's degree in Computer Engineering from Al-Iraqia University. His research interests are deep learning, human activity recognition, and computer vision.
Baraa Munqith Albaker received the B.Sc. degree in electrical engineering and the M.Sc. degree in computer and control engineering from the University of Baghdad, Iraq, and the Ph.D. degree in control engineering from the University of Malaya, Malaysia. He worked in industry on data acquisition systems and radar signal processing and analysis for over three years. He was a lecturer at the University of Baghdad for four years, then a senior lecturer at the UMPEDAC Research Centre, University of Malaya, for two years. Currently, he heads the Networks Engineering department at Al-Iraqia University. His research interests focus on contemporary developments in computer and control applications.