Due to the extensive use of video-sharing platforms and services, the amount of such content on the web has become massive. This abundance makes it difficult to control the kind of content a video may contain. Beyond deciding whether a video is suitable for children and sensitive viewers, it is also important to determine which parts of it contain such content, so that portions that would be discarded in a simple broad analysis can be preserved. To tackle this problem, popular deep learning image models (MobileNetV2, Xception, InceptionV3, VGG16, VGG19, ResNet101, and ResNet50) were compared to find the one most suitable for the required application. A system was then developed that automatically censors inappropriate content, such as violent scenes, with the help of deep learning. The system uses transfer learning based on the VGG16 model. The experiments suggest that the model performs very well for the automatic censoring application and could also be used in other, similar applications.
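The segment-level idea described above (preserving the clean parts of a video rather than discarding the whole thing) can be sketched as a post-processing step over per-frame classifier scores. The threshold and frame rate below are illustrative assumptions, not values from the paper:

```python
def censor_intervals(frame_scores, fps=25.0, threshold=0.5):
    """Group consecutive frames whose 'inappropriate' score exceeds
    the threshold into (start_sec, end_sec) intervals to blur or cut."""
    intervals, start = [], None
    for i, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = i                     # interval opens
        elif score < threshold and start is not None:
            intervals.append((start / fps, i / fps))
            start = None                  # interval closes
    if start is not None:                 # interval still open at end of video
        intervals.append((start / fps, len(frame_scores) / fps))
    return intervals

# Frames 2-3 are flagged; the rest of the video is preserved.
print(censor_intervals([0.1, 0.2, 0.9, 0.8, 0.1], fps=1.0))  # [(2.0, 4.0)]
```

Only the flagged intervals need censoring; everything outside them survives a broad per-video verdict.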
Detecting anomalies in security cameras with 3D-convolutional neural network ... IJECEIAES
This paper presents a novel deep learning-based approach for anomaly detection in surveillance videos. The foundation of the proposed approach is a deep network trained to recognize objects and human activity in videos. To detect anomalies, the method combines the strengths of a 3D-convolutional neural network (3DCNN) and convolutional long short-term memory (ConvLSTM): the 3DCNN extracts spatiotemporal features from the video frames, while the ConvLSTM captures temporal relationships between frames. The technique was evaluated on five large-scale real-world datasets (UCFCrime, XD-Violence, UBIFights, CCTVFights, UCF101) containing both indoor and outdoor clips, as well as on synthetic datasets with a range of object shapes, sizes, and behaviors. The results demonstrate that combining the 3DCNN with ConvLSTM can increase precision and reduce false positives, achieving high accuracy and area under the receiver operating characteristic curve (ROC-AUC) in both indoor and outdoor scenarios compared to the state-of-the-art techniques included in the comparison.
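The spatiotemporal feature extraction the abstract attributes to the 3DCNN can be illustrated with a naive, unoptimized 3D convolution over a clip tensor: a single filter mixes time and space in one response. The clip and kernel shapes below are arbitrary assumptions for illustration:

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution: slides a (kt, kh, kw) kernel over a
    (T, H, W) clip, producing one spatiotemporal feature map."""
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(clip[t:t+kt, i:i+kh, j:j+kw] * kernel)
    return out

clip = np.random.rand(8, 16, 16)       # 8 frames of 16x16 pixels
kernel = np.ones((3, 3, 3)) / 27.0     # spatiotemporal averaging filter
features = conv3d_valid(clip, kernel)
print(features.shape)                  # (6, 14, 14)
```

Real 3DCNN layers stack many such filters and learn the kernel weights; the point here is only that each output value depends on a small window in both time and space.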
Framework for Contextual Outlier Identification using Multivariate Analysis a... IJECEIAES
The majority of existing commercial video surveillance applications only capture event frames, and the accuracy of those captures is poor. Our review of existing systems found that, at present, no research technique offers contextual scene-based identification of outliers. We therefore present a framework that uses an unsupervised learning approach to precisely identify outliers in given video frames with respect to the contextual information of the scene. The proposed system uses a matrix decomposition method based on multivariate analysis to balance faster response time against higher accuracy when detecting an abnormal event or object as an outlier. Using an analytical methodology, the system applies a blocking operation followed by a sparsity step to perform detection. The study outcome shows that the proposed system offers a higher level of accuracy than existing systems, with a faster response time.
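A common way to realize the "matrix decomposition plus sparsity" idea mentioned above is to split a frame matrix into a low-rank part (the stable background) and a sparse residual (candidate outliers). This numpy sketch uses a truncated SVD; the rank, threshold, and planted anomaly are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def sparse_outliers(frames, rank=1, thresh=3.0):
    """Split a (n_frames, n_pixels) matrix into a low-rank background via
    truncated SVD and flag large residual entries as sparse outliers."""
    U, s, Vt = np.linalg.svd(frames, full_matrices=False)
    low_rank = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]  # background model
    residual = frames - low_rank
    return np.abs(residual) > thresh * residual.std()       # outlier mask

rng = np.random.default_rng(0)
frames = np.tile(rng.random(20), (10, 1))   # 10 identical frames (rank 1)
frames[7, 3] += 5.0                          # planted abnormal pixel
mask = sparse_outliers(frames)
print(bool(mask[7, 3]))                      # True
```

The residual is near zero wherever the background model explains the scene, so thresholding it yields a sparse map of abnormal pixels.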
A fully integrated violence detection system using CNN and LSTM IJECEIAES
Recently, the number of violence-related incidents in places such as remote roads, pathways, shopping malls, elevators, sports stadiums, and liquor shops has increased drastically, and they are unfortunately often discovered only after it is too late. The aim is to create a complete system that performs real-time video analysis to recognize the presence of violent activity and notify the concerned authority, such as the police department of the corresponding area. Using the deep learning networks CNN and LSTM within a well-defined system architecture, we achieved an efficient solution for real-time analysis of video footage, so that the concerned authority can monitor the situation through a mobile application that immediately notifies them when a violent event occurs.
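The notification step described above usually sits behind a debouncing rule so a single noisy frame does not page the authorities. This sketch shows one such rule applied to per-clip CNN+LSTM predictions; the window size and vote count are assumptions, not from the paper:

```python
from collections import deque

def alert_stream(predictions, window=5, min_votes=3):
    """Yield each clip index at which an alert should fire: at least
    `min_votes` of the last `window` clips classified as violent (1)."""
    recent = deque(maxlen=window)
    for i, label in enumerate(predictions):
        recent.append(label)
        if sum(recent) >= min_votes:
            yield i
            recent.clear()        # reset so one event fires one alert

# One isolated positive is ignored; a burst of positives raises an alert.
print(list(alert_stream([0, 1, 0, 0, 1, 1, 1, 0])))  # [5]
```

In a deployed pipeline, the yielded index would trigger the push notification to the mobile application.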
The ubiquitous and connected nature of camera-equipped mobile devices has greatly increased the value and importance of the visual information they capture. Today, videos uploaded from the camera phones of unknown users are broadcast on news networks, and banking customers expect to be able to deposit checks using mobile devices. In this paper we present Movee, a system that addresses the fundamental question of whether the visual stream shared by a user was captured live on a mobile device and has not been tampered with by an adversary. Movee leverages the mobile device's motion sensors and the user's inherent movements during the shooting of the video, exploiting the observation that the movement of the scene recorded in the video stream should be related to the movement of the device simultaneously captured by the accelerometer. We model the distribution of the correlation of temporal noise residue in a forged video as a Gaussian mixture model (GMM) and propose a two-step scheme to estimate the model parameters. A Bayesian classifier is then used to find the optimal threshold value based on the estimated parameters. Cyrus Deboo | Shubham Kshatriya | Rajat Bhat, "Video Liveness Verification", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-3, April 2018. URL: http://www.ijtsrd.com/papers/ijtsrd12772.pdf http://www.ijtsrd.com/computer-science/other/12772/video-liveness-verification/cyrus-deboo
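The Bayesian thresholding step described above reduces, for two one-dimensional Gaussian components, to finding where the weighted class densities cross. This numpy sketch scans for that crossover; the component parameters are invented for illustration, not estimated from real noise residues:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def bayes_threshold(mu0, s0, w0, mu1, s1, w1):
    """Scan between the two means for the point where the weighted class
    densities cross; samples below it are assigned to class 0."""
    xs = np.linspace(mu0, mu1, 10_000)
    diff = w0 * gaussian_pdf(xs, mu0, s0) - w1 * gaussian_pdf(xs, mu1, s1)
    return float(xs[np.argmin(np.abs(diff))])  # crossover ≈ optimal threshold

# Equal weights and equal variances: threshold falls midway between the means.
t = bayes_threshold(mu0=0.2, s0=0.1, w0=0.5, mu1=0.8, s1=0.1, w1=0.5)
print(round(t, 2))                             # 0.5
```

With the GMM parameters estimated from data, the same crossover search yields the decision boundary between genuine and forged correlation values.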
Video Data Visualization System : Semantic Classification and Personalization ijcga
We present in this paper an intelligent video data visualization tool, based on semantic classification, for retrieving and exploring a large-scale corpus of videos. Our work is based on the semantic classes resulting from semantic analysis of the videos, which are projected into the visualization space as a graph of nodes and edges: the nodes are the keyframes of the video documents, and the edges are the relations between documents and between documents and their classes. Finally, we construct the user's profile, based on their interaction with the system, to adapt the system to their preferences.
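The node/edge structure described above can be held in a plain adjacency mapping, with keyframes and classes as the two kinds of nodes. The class and keyframe names below are made up for illustration:

```python
def build_graph(doc_classes):
    """Build an adjacency map linking each document's keyframe node to its
    class node, so documents sharing a class become mutually reachable."""
    adj = {}
    for doc, cls in doc_classes.items():
        adj.setdefault(doc, set()).add(cls)   # document -> class edge
        adj.setdefault(cls, set()).add(doc)   # class -> document edge
    return adj

graph = build_graph({"keyframe_a": "sports", "keyframe_b": "sports",
                     "keyframe_c": "news"})
# Both sports keyframes hang off the same class node.
print(sorted(graph["sports"]))               # ['keyframe_a', 'keyframe_b']
```

A visualization layer would then place class nodes as hubs and draw the document keyframes around them.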
Novel framework for optimized digital forensic for mitigating complex image ... IJECEIAES
Digital image forensics is becoming significantly more popular owing to the increasing use of images as a medium of information propagation. However, the availability of various image-editing tools and software also poses an increasing threat to image content security. A review of existing approaches for identifying traces or artifacts shows that there is large scope for optimization to further enhance processing. This paper therefore presents a novel framework that performs cost-effective optimization of a digital forensic technique, with the aim of accurately localizing the tampered area while also offering the capability to mitigate attacks of various forms. The study outcome shows that the proposed system offers significantly better results than existing systems, demonstrating that a minor novelty in the design attributes can yield improvements in both accuracy and resilience toward potential image threats.
On the fly porn video blocking using distributed multi gpu and data mining ap... ijdpsjournal
Preventing users from accessing adult videos, while at the same time allowing them to access good educational videos and other materials through a campus-wide network, is a big challenge for organizations. Most existing web filtering systems are based on textual content or link analysis. As a result, potential users cannot access qualitative and informative video content that is available online. Adult content detection in video based on motion features or skin detection requires significant computing power and time. The judgment that identifies a pornographic video is based on processing every chunk of the video, each consisting of a specific number of frames, sequentially one after another. This is not feasible in real time, when the user has already started watching the video and a blocking decision must be taken within a few seconds.
In this paper, we propose a model where the user is allowed to start watching any video; at the backend, a porn detection process using extracted video and image features runs on distributed nodes with multiple GPUs (Graphics Processing Units). The video is processed on a parallel and distributed platform in the shortest possible time, and the decision about filtering the video is taken in real time. A track record of blocked content and websites is also cached; for every new video download, the cache is checked to prevent repetitive content analysis. On-the-fly blocking is feasible thanks to the latest GPU architectures, CUDA (Compute Unified Device Architecture), and CUDA-aware MPI (Message Passing Interface). Both coarse-grained and fine-grained parallelism are possible: video chunks are processed in parallel on distributed nodes, and the detection algorithm applied to the frames of a chunk can itself be parallelized using the GPUs on a single node. This ultimately results in blocking pornographic videos on the fly while allowing educational and informative videos.
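The coarse-grained, chunk-parallel decision described above can be sketched with a thread pool standing in for the distributed GPU nodes. The classifier here is a trivial stand-in, and the chunking and voting parameters are assumptions, not the paper's configuration:

```python
from concurrent.futures import ThreadPoolExecutor

def classify_chunk(chunk):
    """Stand-in for the GPU detector: score one chunk of frame features.
    Here, a chunk is flagged if its mean frame score exceeds 0.5."""
    return sum(chunk) / len(chunk) > 0.5

def decide_blocking(chunks, max_workers=4, min_flagged=2):
    """Classify all chunks in parallel; block if enough chunks are flagged."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        flags = list(pool.map(classify_chunk, chunks))
    return sum(flags) >= min_flagged

chunks = [[0.1, 0.2], [0.9, 0.8], [0.7, 0.9], [0.1, 0.1]]
print(decide_blocking(chunks))    # True (two chunks flagged)
```

Because every chunk is scored independently, the same structure maps directly onto MPI ranks or GPU streams, which is what makes the few-second decision budget reachable.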
System analysis and design for multimedia retrieval systems ijma
Due to the extensive use of information technology and the recent developments in multimedia systems, the amount of multimedia data available to users has increased exponentially. Video is an example of multimedia data, as it contains several kinds of data such as text, images, meta-data, and visual and audio streams. Content-based video retrieval is an approach for facilitating the searching and browsing of large multimedia collections over the WWW. In order to create an effective video retrieval system, visual perception must be taken into account. We conjectured that a technique employing multiple features for indexing and retrieval would be more effective in the discrimination and search tasks for videos. In order to validate this, content-based indexing and retrieval systems were implemented using the color histogram, a texture feature (GLCM), edge density, and motion.
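Of the features listed above, the color histogram is the simplest to sketch. This numpy example ranks stored frames by histogram intersection with a query frame; the bin count and the synthetic frames are illustrative assumptions:

```python
import numpy as np

def color_hist(frame, bins=8):
    """Normalized intensity histogram of a grayscale frame in [0, 1]."""
    hist, _ = np.histogram(frame, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

def rank_by_intersection(query, corpus):
    """Histogram intersection similarity: higher means more similar."""
    q = color_hist(query)
    scores = [np.minimum(q, color_hist(f)).sum() for f in corpus]
    return np.argsort(scores)[::-1]      # best match first

rng = np.random.default_rng(1)
dark = rng.uniform(0.0, 0.3, (16, 16))   # dark frame
bright = rng.uniform(0.7, 1.0, (16, 16)) # bright frame
query = rng.uniform(0.0, 0.3, (16, 16))  # another dark frame
ranking = rank_by_intersection(query, [bright, dark])
print(ranking[0])                        # 1 (the dark frame matches best)
```

A multi-feature system would compute one such score per feature (histogram, GLCM texture, edge density, motion) and combine them before ranking.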
CRIMINAL IDENTIFICATION FOR LOW RESOLUTION SURVEILLANCE vivatechijri
A criminal identification system allows the user to identify a certain criminal based on their biometrics. With advancements in security technology, CCTV cameras have been installed in many public and private areas to provide surveillance. CCTV footage is crucial for understanding the criminal activities that take place and for detecting suspects. Additionally, once a criminal has been identified, it is difficult to locate and track them from just an image if they are on the run. Currently, this procedure consists of finding such people in CCTV surveillance footage manually, which is time consuming. It is also a tedious process, as the resolution of such CCTV cameras is quite low. As a solution to these issues, the proposed system is developed to go through real-time surveillance footage and detect and recognize criminals based on reference datasets of criminals. The use of facial recognition for identifying criminals proves to be beneficial. Once the best match is found, the real-time cropped image of the recognized criminal is saved, where it can be accessed by authorized officials for locating and tracking criminals or for further investigative use.
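The "best match against a reference dataset" step described above is commonly implemented as a nearest-neighbor search over face embeddings. This sketch uses cosine similarity; the embedding vectors, names, and threshold are all hypothetical, not the paper's pipeline:

```python
import numpy as np

def best_match(probe, gallery, threshold=0.8):
    """Compare a probe face embedding against a gallery of labeled reference
    embeddings using cosine similarity; return the best label, or None
    if no gallery entry is similar enough."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    score, name = max((cos(probe, emb), name) for name, emb in gallery.items())
    return name if score >= threshold else None

gallery = {"suspect_a": np.array([1.0, 0.0, 0.0]),
           "suspect_b": np.array([0.0, 1.0, 0.0])}
probe = np.array([0.9, 0.1, 0.0])       # close to suspect_a's embedding
print(best_match(probe, gallery))       # suspect_a
```

The threshold guards against false accusations on low-resolution footage: a probe that matches nothing well enough returns None rather than the least-bad gallery entry.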
Analysis of student sentiment during video class with multi-layer deep learni... IJECEIAES
The modern education system is an essential part of the rise of technology, and over the last few months the e-learning education system has gone from an experimental system to a vital learning system for the whole world. In our research, we develop a more effective and modern learning method for students and teachers, implemented with convolutional neural networks and advanced data classifiers. The main focus of our study is the analysis of a student's expression and mood during an online class. For the output measure, we divide the final result into attentive, inattentive, understanding, and neutral. To show the output in a real-time online class and for the sentiment analysis, we used a support vector machine (SVM) and OpenCV, and a 5x4-layer neural network was created for this work. Our study proposes an advanced learning medium through which teachers can monitor the live class and a student's different feelings during the class period.
An Analysis of Various Deep Learning Algorithms for Image Processing vivatechijri
The various applications of image processing have given it a wide scope in data analysis. Machine learning algorithms provide a powerful environment for effectively training modules to identify the various entities in images and segment them accordingly. One can observe that, although image classifiers such as support vector machines (SVM) or random forest algorithms do justice to the task, deep learning algorithms such as artificial neural networks (ANN) and their descendants, in particular the well-known and extremely powerful convolutional neural network (CNN), provide a new dimension to the image processing domain. They offer far higher accuracy and computational power for classifying images and for segregating their entities as individual components of the image working region. The major focus is on the region-based convolutional neural network (R-CNN) algorithm and how well its better successors, the Fast, Faster, and Mask R-CNN versions, provide pixel-level segmentation.
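The pixel-level segmentation masks produced by Mask R-CNN-style models are typically scored against ground truth with intersection-over-union (IoU). This small numpy utility illustrates that measure; the toy masks are made up for the example:

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two boolean pixel masks, the standard
    measure of pixel-level segmentation quality."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

a = np.zeros((4, 4), dtype=bool); a[0:2, 0:2] = True   # 4-pixel mask
b = np.zeros((4, 4), dtype=bool); b[0:2, 0:3] = True   # 6-pixel mask
print(round(mask_iou(a, b), 4))   # 0.6667 (4 overlapping / 6 total pixels)
```

An IoU near 1.0 means the predicted mask covers the object tightly, which is exactly the property that distinguishes Mask R-CNN's pixel masks from the coarser boxes of its predecessors.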
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption IJAEMSJORNAL
In recent years, the modeling of human behaviors and activity patterns for the recognition or detection of special events has attracted considerable research interest. Various methods abound for building intelligent vision systems aimed at understanding a scene and drawing correct semantic inferences from the observed dynamics of moving targets. Many systems include detection, storage of video information, and human-computer interfaces. Here we present not only an update that expands previous similar surveys but also an emphasis on the contextual detection of abnormal human activity, especially in video surveillance applications. The main purpose of this survey is to identify existing methods extensively and to characterize the literature in a manner that brings the key challenges to attention.
APPLICATION OF VARIOUS DEEP LEARNING MODELS FOR AUTOMATIC TRAFFIC VIOLATION D... ijitcs
Rapid population and economic growth has resulted in an increasing number of vehicles on the road every year. Traffic congestion is a big problem in every metropolitan city, and to reach their destinations faster and to avoid traffic, some people violate traffic rules and regulations. Violation of traffic rules puts everyone in danger, and maintaining traffic rules manually has become difficult over time due to the rapid increase in population. This alarming situation has to be taken care of at the earliest opportunity. To overcome it, we need a real-time violation detection system to help maintain the traffic rules. The approach is to detect traffic violations in real time using edge computing, which reduces the detection time. Different machine learning models and algorithms were applied to detect traffic violations such as traveling without a helmet, line crossing, parking violations, violating the one-way rule, etc. The implemented model gave an accuracy of around 85%; the frame rate is quite low due to the memory constraints of the edge device, in this case an NVIDIA Jetson Nano.
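Of the violations listed above, line crossing has a particularly compact geometric core: a vehicle centroid crossing a boundary line shows up as a sign change of a 2D cross product between consecutive positions. The line and trajectories below are illustrative assumptions:

```python
def side_of_line(p, a, b):
    """Sign of the cross product (b-a) x (p-a): tells which side of the
    line through a and b the point p lies on (0 means on the line)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crossed_line(track, a, b):
    """True if any two consecutive centroids fall on opposite sides."""
    sides = [side_of_line(p, a, b) for p in track]
    return any(s1 * s2 < 0 for s1, s2 in zip(sides, sides[1:]))

lane = ((0.0, 0.0), (10.0, 0.0))                     # lane boundary y = 0
violating = [(2.0, -1.0), (2.5, -0.2), (3.0, 0.5)]   # centroid crosses the line
staying = [(2.0, -1.0), (4.0, -0.8), (6.0, -0.3)]    # stays on one side
print(crossed_line(violating, *lane), crossed_line(staying, *lane))  # True False
```

On an edge device like the Jetson Nano, this check is essentially free; the expensive part of the pipeline is the detector that produces the centroids.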
Java Implementation based Heterogeneous Video Sequence Automated Surveillance... CSCJournals
Automated video-based surveillance monitoring is an essential and computationally challenging task for resolving security issues in access-controlled localities. This paper deals with some of the issues encountered when integrating surveillance monitoring into real-life circumstances. We employ video frames extracted from heterogeneous video formats; each frame is examined to identify the anomalous events that occur in the time-driven sequence. Background subtraction is performed based on an optimal threshold and a reference frame: the remaining frames are subtracted from the reference image, yielding all the foreground image paradigms. The coordinates in the subtracted images are found by scanning the images horizontally until the first black pixel is encountered, and each obtained coordinate is matched with the existing coordinates in the primary images. The matched coordinate in the primary image is treated as an active region of interest. Finally, the marked images are converted back into a temporal video that scrutinizes the moving silhouettes of human behavior against a static background. The proposed model is implemented in Java; results and performance analysis are carried out in real-life environments.
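The background subtraction and horizontal scan described above can be sketched in a few lines (in Python here, though the paper's model is implemented in Java); the threshold and the toy frames are illustrative assumptions:

```python
import numpy as np

def foreground_mask(frame, reference, thresh=0.2):
    """Background subtraction: pixels differing from the reference frame
    by more than the threshold are foreground."""
    return np.abs(frame - reference) > thresh

def first_foreground_per_row(mask):
    """Scan each row left to right; return the column of the first
    foreground pixel, or -1 if the row has none."""
    cols = []
    for row in mask:
        hits = np.flatnonzero(row)
        cols.append(int(hits[0]) if hits.size else -1)
    return cols

reference = np.zeros((3, 5))            # static background
frame = reference.copy()
frame[1, 2] = 1.0                       # a moving object in row 1, col 2
mask = foreground_mask(frame, reference)
print(first_foreground_per_row(mask))   # [-1, 2, -1]
```

The returned columns give the left edge of the moving silhouette per row, which is what gets matched back into the primary image as the active region of interest.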
Embedded machine learning-based road conditions and driving behavior monitoring IJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us... IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
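Of the three controllers compared above, the PI controller is the simplest to sketch. This discrete-time implementation tracks a reference; the gains, timestep, and setpoint are illustrative, not the paper's DFIG tuning:

```python
def pi_controller(setpoint, measurements, kp=0.5, ki=0.2, dt=0.1):
    """Discrete PI control law: u = kp*e + ki*integral(e).
    Returns the control signal computed at each measurement step."""
    integral, outputs = 0.0, []
    for y in measurements:
        error = setpoint - y
        integral += error * dt          # accumulate tracking error
        outputs.append(kp * error + ki * integral)
    return outputs

# The control effort shrinks as the measurement approaches the reference.
u = pi_controller(setpoint=1.0, measurements=[0.0, 0.5, 0.9, 1.0])
print([round(x, 3) for x in u])         # [0.52, 0.28, 0.082, 0.032]
```

SMC and SOSMC replace this linear law with switching terms for better robustness to parameter changes, which is the trade-off the paper's simulations compare.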
More Related Content
Similar to Automatic video censoring system using deep learning
The ubiquitous and connected nature of camera loaded mobile devices has greatly estimated the value and importance of visual information they capture. Today, sending videos from camera phones uploaded by unknown users is relevant on news networks, and banking customers expect to be able to deposit checks using mobile devices. In this paper we represent Movee, a system that addresses the fundamental question of whether the visual stream exchange by a user has been captured live on a mobile device, and has not been tampered with by an adversary. Movee leverages the mobile device motion sensors and the inherent user movements during the shooting of the video. Movee exploits the observation that the movement of the scene recorded on the video stream should be related to the movement of the device simultaneously captured by the accelerometer. the last decade e-lecturing has become more and more popular. We model the distribution of correlation of temporal noise residue in a forged video as a Gaussian mixture model (GMM). We propose a twostep scheme to estimate the model parameters. Consequently, a Bayesian classifier is used to find the optimal threshold value based on the estimated parameters. Cyrus Deboo | Shubham Kshatriya | Rajat Bhat"Video Liveness Verification" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-3 , April 2018, URL: http://www.ijtsrd.com/papers/ijtsrd12772.pdf http://www.ijtsrd.com/computer-science/other/12772/video-liveness-verification/cyrus-deboo
Video Data Visualization System : Semantic Classification and Personalization ijcga
We present in this paper an intelligent video data visualization tool, based on semantic classification, for
retrieving and exploring a large scale corpus of videos. Our work is based on semantic classification
resulting from semantic analysis of video. The obtained classes will be projected in the visualization space.
The graph is represented by nodes and edges, the nodes are the keyframes of video documents and the
edges are the relation between documents and the classes of documents. Finally, we construct the user’s
profile, based on the interaction with the system, to render the system more adequate to its preferences.
Video Data Visualization System : Semantic Classification and Personalization ijcga
We present in this paper an intelligent video data visualization tool, based on semantic classification, for retrieving and exploring a large scale corpus of videos. Our work is based on semantic classification resulting from semantic analysis of video. The obtained classes will be projected in the visualization space. The graph is represented by nodes and edges, the nodes are the keyframes of video documents and the
edges are the relation between documents and the classes of documents. Finally, we construct the user’s profile, based on the interaction with the system, to render the system more adequate to its preferences.
Novel framework for optimized digital forensic for mitigating complex image ...IJECEIAES
Digital Image Forensic is significantly becoming popular owing to the increasing usage of the images as a media of information propagation. However, owing to the presence of various image editing tools and software, there is also an increasing threat to image content security. Reviewing the existing approaches to identify the traces or artifacts states that there is a large scope of optimization to be implemented to enhance the processing further. Therefore, this paper presents a novel framework that performs cost-effective optimization of digital forensic technique with an idea of accurately localizing the area of tampering as well as offers a capability to mitigate the attacks of various forms. The study outcome shows that the proposed system offers better outcomes in contrast to the existing system to a significant scale to prove that minor novelty in design attributes could induce better improvement with respect to accuracy as well as resilience toward all potential image threats.
On the Fly Porn Video Blocking Using Distributed Multi-Gpu and Data Mining Ap...ijdpsjournal
Preventing users from accessing adult videos and at the same time allowing them to access good educational videos and other materials through campus wide network is a big challenge for organizations. Major existing web filtering systems are textual content or link analysis based. As a result, potential users
cannot access qualitative and informative video content which is available online. Adult content detection
in video based on motion features or skin detection requires significant computing power and time. Judgment to identify pornography videos is taken based on processing of every chunk from video, consisting specific number of frames, sequentially one after another. This solution is not feasible in real time when user has started watching the video and decision about blocking needs to be taken within few seconds.
On the fly porn video blocking using distributed multi gpu and data mining ap...ijdpsjournal
Preventing users from accessing adult videos and at the same time allowing them to access good
educational videos and other materials through campus wide network is a big challenge for organizations.
Major existing web filtering systems are textual content or link analysis based. As a result, potential users
cannot access qualitative and informative video content which is available online. Adult content detection
in video based on motion features or skin detection requires significant computing power and time.
Judgment to identify pornography videos is taken based on processing of every chunk from video,
consisting specific number of frames, sequentially one after another. This solution is not feasible in real
time when user has started watching the video and decision about blocking needs to be taken within few
seconds.
In this paper, we propose a model where user is allowed to start watching any video; at the backend porn
detection process using extracted video and image features shall run on distributed nodes with multiple
GPUs (Graphics Processing Units). The video is processed on parallel and distributed platform in shortest
time and decision about filtering the video is taken in real time. Track record of blocked content and
websites is cached, too. For every new video downloads, cache is verified to prevent repetitive content
analysis. On the fly blocking is feasible due to latest GPU architecture, CUDA (Compute Unified Device
Architecture) and CUDA aware MPI (Message Passing Interface). It is possible to achieve coarse grained
as well as fine grained parallelism. Video Chunks are processed parallel on distributed nodes. Porn
detection algorithm on frames of chunks of videos can also achieve parallelism using GPUs on single node.
It ultimately results into blocking porn video on the fly and allowing educational and informative videos.
System analysis and design for multimedia retrieval systemsijma
Due to the extensive use of information technology and the recent developments in multimedia systems, the
amount of multimedia data available to users has increased exponentially. Video is an example of
multimedia data as it contains several kinds of data such as text, image, meta-data, visual and audio.
Content-based video retrieval is an approach for facilitating the searching and browsing of large multimedia collections over the WWW. In order to create an effective video retrieval system, visual perception must be taken into account. We conjectured that a technique which employs multiple features for indexing and retrieval would be more effective in the discrimination and search tasks of videos. In order to validate this, content-based indexing and retrieval systems were implemented using color histogram, texture features (GLCM), edge density, and motion.
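The color histogram feature mentioned above can be sketched as follows; the bin count and the L1 distance are illustrative choices, not necessarily those used in the implemented system.

```python
def color_histogram(pixels, bins=4):
    # Quantize each RGB channel into `bins` levels and count occurrences,
    # yielding a normalized bins**3-dimensional feature vector per frame.
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist]

def histogram_distance(h1, h2):
    # L1 distance between histograms; smaller means more similar frames.
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Indexing stores one such vector per keyframe; retrieval ranks stored frames by distance to the query frame's vector.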
CRIMINAL IDENTIFICATION FOR LOW RESOLUTION SURVEILLANCE (vivatechijri)
The Criminal Identification System allows the user to identify a certain criminal based on their biometrics. With advancements in security technology, CCTV cameras have been installed in many public and private areas to provide surveillance. CCTV footage is crucial for understanding the criminal activities that take place and for detecting suspects. Additionally, when a criminal is found, it is difficult to locate and track him with just his image if he is on the run. Currently, this procedure consists of finding such people in CCTV surveillance footage manually, which is time-consuming. It is also a tedious process, as the resolution of such CCTV cameras is quite low. As a solution to these issues, the proposed system is developed to go through real-time surveillance footage and detect and recognize criminals based on reference datasets of criminals. The use of facial recognition for identifying criminals proves to be beneficial. Once the best match is found, the real-time cropped image of the recognized criminal is saved, which can be accessed by authorized officials for locating and tracking criminals or for further investigative use.
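A common way to realize the face-matching step is to compare embedding vectors by cosine similarity and accept the best reference above a threshold. The sketch below is a generic illustration; the embedding model, the threshold, and the database layout are assumptions, not details from the paper.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two face-embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_match(probe, reference_db, threshold=0.8):
    # Return the criminal id whose reference embedding is most similar
    # to the probe, or None if nothing clears the threshold.
    best_id, best_sim = None, threshold
    for cid, emb in reference_db.items():
        sim = cosine_similarity(probe, emb)
        if sim > best_sim:
            best_id, best_sim = cid, sim
    return best_id
```

Returning `None` below the threshold is what prevents a low-resolution frame from being forced onto the nearest, possibly wrong, identity.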
Analysis of student sentiment during video class with multi-layer deep learni... (IJECEIAES)
The modern education system is an essential part of the rise of technology. Over the last few months, the e-learning education system has moved from an experimental system to a vital learning system for the whole world. In our research, we have developed our learning method in a more effective and modern way for students and teachers. For the implementation, we use convolutional neural networks and advanced data classifiers. The expression and mood analysis of a student during an online class is the main focus of our study. For the output measure, we divide the final result into attentive, inattentive, understanding, and neutral. To show the output in the real-time online class and for sensory analysis, we have used a support vector machine (SVM) and OpenCV. A 5*4-level neural network is created for this work. An advanced learning medium is proposed through our study. Teachers can monitor the live class and the different feelings of a student during the class period through this system.
An Analysis of Various Deep Learning Algorithms for Image Processing (vivatechijri)
The various applications of image processing have given it a wider scope when it comes to data analysis. Various machine learning algorithms provide a powerful environment for effectively training modules to identify the entities in images and segment them accordingly. One can observe that though image classifiers like support vector machines (SVM) or random forest algorithms do justice to the task, deep learning algorithms like artificial neural networks (ANN) and their descendants, above all the well-known and extremely powerful convolutional neural network (CNN), can provide a new dimension to the image processing domain. They offer far higher accuracy and computational power for classifying images and segregating their various entities as individual components of the image working region. The major focus is on the region-based convolutional neural network (R-CNN) algorithm and how well its successors, the Fast, Faster, and Mask R-CNN variants, provide pixel-level segmentation.
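Region-based detectors such as R-CNN and its successors match region proposals to ground-truth boxes using intersection over union (IoU); a minimal implementation of that standard criterion:

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2). IoU = intersection area / union area,
    # the overlap criterion used when matching R-CNN proposals to truth.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```

A proposal is typically counted as a positive when its IoU with a ground-truth box exceeds a threshold such as 0.5.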
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption (IJAEMSJORNAL)
In recent years, the modeling of human behaviors and patterns of activity for the recognition or detection of special events has attracted considerable research interest. Various methods abound for building intelligent vision systems aimed at understanding the scene and making correct semantic inferences from the observed dynamics of moving targets. Many systems include detection, storage of video information, and human-computer interfaces. Here we present not only an update that expands previous similar surveys but also an emphasis on contextual abnormal detection of human activity, especially in video surveillance applications. The main purpose of this survey is to identify existing methods extensively and to characterize the literature in a manner that brings key challenges to attention.
APPLICATION OF VARIOUS DEEP LEARNING MODELS FOR AUTOMATIC TRAFFIC VIOLATION D... (ijitcs)
A rapid growth in the population and economic growth has resulted in an increasing number of vehicles on
road every year. Traffic congestion is a big problem in every metropolitan city. To reach their destination
faster and to avoid traffic, some people are violating traffic rules and regulations. Violation of traffic rules
puts everyone in danger. Maintaining traffic rules manually has become difficult over time due to the rapid increase in population. This alarming situation has to be taken care of at the earliest. To overcome this, we need a real-time violation detection system to help maintain the traffic rules. The approach is to detect traffic violations in real time using edge computing, which reduces the time to detect. Different machine learning models and algorithms were applied to detect traffic violations such as traveling without a helmet, line crossing, parking violations, violating the one-way rule, etc. The implemented model gave an accuracy of around 85%; due to memory constraints of the edge device, in this case an NVIDIA Jetson Nano, the fps is quite low.
Java Implementation based Heterogeneous Video Sequence Automated Surveillance... (CSCJournals)
Automated video-based surveillance monitoring is an essential and computationally challenging task for resolving issues in secure-access localities. This paper deals with some of the issues encountered when integrating surveillance monitoring into real-life circumstances. We employ video frames extracted from heterogeneous video formats. Each video frame is chosen to identify the anomalous events that occur in the sequence of a time-driven process. Background subtraction is essentially required, based on an optimal threshold and a reference frame. The remaining frames are subtracted from the reference image, and hence all the foreground image paradigms are obtained. The coordinate existing in the subtracted images is found by scanning the images horizontally until the occurrence of the first black pixel. The obtained coordinate is matched with the existing coordinates in the primary images. The matched coordinate in the primary image is considered an active region of interest. At the end, the marked images are converted to a temporal video that scrutinizes the moving silhouettes of human behavior against a static background. The proposed model is implemented in Java. Results and performance analysis are carried out in real-life environments.
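The background subtraction and horizontal-scan steps described above can be sketched roughly as follows (in Python rather than the paper's Java); the threshold value and the single-channel frame representation are simplifying assumptions.

```python
def subtract_background(frame, reference, threshold=30):
    # Mark a pixel as foreground (1) when it differs from the reference
    # frame by more than the chosen threshold; frames are 2D grayscale.
    return [[1 if abs(p - r) > threshold else 0
             for p, r in zip(frow, rrow)]
            for frow, rrow in zip(frame, reference)]

def first_foreground_pixel(mask):
    # Scan the mask horizontally, row by row, and return the coordinate
    # of the first foreground pixel, mirroring the paper's scan step.
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                return (x, y)
    return None
```

The returned coordinate is then matched against the primary image to locate the active region of interest.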
Similar to Automatic video censoring system using deep learning
Embedded machine learning-based road conditions and driving behavior monitoring (IJECEIAES)
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
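The reported accuracy, precision, and recall follow the standard definitions and can be computed from confusion-matrix counts as below; the counts in the example are illustrative, not the paper's data.

```python
def classification_metrics(tp, fp, fn, tn):
    # Standard metrics from true/false positive and negative counts.
    # The paper reports 91.9% accuracy, 93.6% precision, and 92% recall
    # for the embedded driving-behavior model.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall
```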
Advanced control scheme of doubly fed induction generator for wind turbine us... (IJECEIAES)
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par... (IJECEIAES)
The wide application of the proportional-integral-differential (PID) regulator in industry requires constant improvement of the methods for adjusting its parameters. The paper deals with the optimization of PID regulator parameters using neural network methods. A methodology for choosing the architecture (structure) of the neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, and the form and type of activation function. Algorithms for training the neural network based on minimizing the mismatch between the regulated value and the target value are developed. The method of backpropagation of gradients is proposed to select the optimal training rate of the neurons of the neural network. The neural network optimizer, which is a superstructure over the linear PID controller, reduces the regulation error from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
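For context, a minimal discrete PID regulator looks like the following sketch; the paper's neural superstructure, which tunes the gains online, is not reproduced here, and the gains used in the example are arbitrary.

```python
class PID:
    # Minimal discrete PID regulator: u = kp*e + ki*integral(e) + kd*de/dt.
    def __init__(self, kp, ki, kd, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        # One control update per sampling period dt.
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

The neural optimizer described in the abstract would adjust `kp`, `ki`, and `kd` between steps to minimize the mismatch between the regulated and target values.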
An improved modulation technique suitable for a three level flying capacitor ... (IJECEIAES)
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zone (IJECEIAES)
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features like sea surface temperature (SST) and sea surface height (SSH), along with the classification methods used to classify the data. This study underscores the importance of examining potential fishing zones using advanced analytical techniques and thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms to the classification of potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines (SVM), naïve Bayes, and artificial neural networks (ANN). In a previous result, the SVM was more accurate at 97.6% than naïve Bayes at 94.2% in classifying test data for fisheries classification. By considering the recent works in this area, several recommendations for future work are presented to further improve the performance of potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f... (IJECEIAES)
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make small devices that boost productivity, and the goal is to optimize device density. Scientists are reducing connection delays to improve circuit performance. This led them to three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and shorten interconnects. Electrical coupling is a big worry with 3D integrated circuits. Researchers have developed and tested through-silicon vias (TSV) and substrates to decrease electrical wave coupling. This study illustrates a novel noise coupling reduction method using several electrical coupling models. This new paradigm achieves a 22% drop in electrical coupling from wave-carrying to victim TSVs and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p... (IJECEIAES)
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployment of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces a hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram, which sets the priorities and requirements of the system, is presented. The proposed approach allows setups to advance their power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The result of a case study representing a dairy milk farmer supports the theoretical work and highlights its advanced benefits to existing plants. The short return on investment supports the paper's novel approach to a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Bibliometric analysis highlighting the role of women in addressing climate ch... (IJECEIAES)
Fossil fuel consumption increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have succeeded in playing a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The analysis's findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results consider the contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. This study's results highlight how women have influenced policies and actions related to climate change, point out areas of research deficiency, and give recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter... (IJECEIAES)
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition from grid-connected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
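The droop control method mentioned above couples frequency to active power and voltage to reactive power so that parallel distributed generators share load without communication; a minimal sketch, where the nominal values and droop gains are illustrative assumptions:

```python
def droop_setpoints(p_load, q_load, f_nom=50.0, v_nom=1.0,
                    kp=0.01, kq=0.05):
    # Conventional droop law: frequency sags linearly with active power
    # and voltage sags linearly with reactive power drawn from the DG.
    f = f_nom - kp * p_load
    v = v_nom - kq * q_load
    return f, v
```

At no load, each generator sits at nominal frequency and voltage; as load rises, the shared droop slopes determine how the active and reactive power split among the DGs.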
Enhancing battery system identification: nonlinear autoregressive modeling fo... (IJECEIAES)
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
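The NARX regression form, where the next output is a nonlinear function of lagged outputs and lagged inputs, can be sketched as follows; the toy map stands in for the trained network and is purely illustrative, not the paper's fitted battery model.

```python
def narx_step(f, u_lags, y_lags):
    # y[k] = f(y[k-1..k-ny], u[k-1..k-nu]) — the NARX regression form,
    # with f a nonlinear map (a neural network in the paper).
    return f(y_lags + u_lags)

# Toy nonlinear map standing in for the trained network (an assumption;
# the paper fits this map to measured battery voltage/current data).
toy_f = lambda r: 0.5 * r[0] + 0.2 * r[1] ** 2

# One-step-ahead simulation with one output lag and one input lag.
y = [0.0]
u = [1.0, 1.0, 1.0]
for k in range(len(u) - 1):
    y.append(narx_step(toy_f, [u[k]], [y[k]]))
```

Feeding predictions back as lagged outputs is what lets the identified model reproduce the battery's dynamic voltage response under a current profile.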
Smart grid deployment: from a bibliometric analysis to a survey (IJECEIAES)
Smart grids are one of the last decades' innovations in electrical energy. They bring relevant advantages compared to the traditional grid and attract significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ... (IJECEIAES)
One of the problems associated with power systems is the islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis yields overall weights of passive methods (24.7%), active methods (7.8%), hybrid methods (5.6%), remote methods (14.5%), signal processing-based methods (26.6%), and computational intelligence-based methods (20.8%) when all criteria are compared together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
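Using the overall weights quoted in the abstract, the final AHP ranking step reduces to a simple sort over the method classes:

```python
# Overall weights (%) reported by the AHP comparison in the abstract.
weights = {
    "passive": 24.7,
    "active": 7.8,
    "hybrid": 5.6,
    "remote": 14.5,
    "signal processing": 26.6,
    "computational intelligence": 20.8,
}

def rank_methods(w):
    # Sort islanding-detection method classes by overall AHP weight,
    # most suitable first.
    return sorted(w, key=w.get, reverse=True)
```

The sort reproduces the abstract's conclusion: signal processing-based methods rank first and hybrid methods last.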
Enhancing single-stage grid-connected photovoltaic system using fuzzy logi... (IJECEIAES)
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar irradiances of 500 and 1,000 W/m², the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
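Total harmonic distortion, the quantity the proposed method reduces, is defined as the RMS of the harmonic components relative to the fundamental; a minimal computation, with illustrative magnitudes in the example:

```python
import math

def thd(harmonic_rms):
    # THD = sqrt(sum of squared harmonic magnitudes) / fundamental,
    # with harmonic_rms = [fundamental, 2nd, 3rd, ...] current components.
    fundamental, *harmonics = harmonic_rms
    return math.sqrt(sum(h * h for h in harmonics)) / fundamental
```

Standards such as IEEE 519 bound this ratio for current injected into the grid, which is why the reported 46% and 38% THD reductions matter for grid compatibility.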
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b... (IJECEIAES)
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
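The classic P&O update rule, before the modifications proposed in the paper, can be sketched as follows; the fixed step size is an illustrative constant, and the paper's contribution is precisely an improved step behavior.

```python
def perturb_and_observe(v, p, v_prev, p_prev, step=0.01):
    # Classic P&O rule: keep perturbing the operating voltage in the
    # same direction while power rises, reverse direction when it falls.
    if p >= p_prev:
        direction = 1 if v >= v_prev else -1
    else:
        direction = -1 if v >= v_prev else 1
    return v + direction * step
```

The fixed step is what causes the steady-state oscillation around the maximum power point that the modified P&O and FLC techniques aim to suppress.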
Adaptive synchronous sliding control for a robot manipulator based on neural ... (IJECEIAES)
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed, the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de... (IJECEIAES)
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools that could resolve errors requiring internal signal acquisition. This paper proposes a novel remote embedded-system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback offered by existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA designs. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
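One simple way to compress a slowly varying debug signal before transmission, in the spirit of the adaptive compression described above, is delta plus run-length encoding; this is a generic sketch, not the paper's actual codec, and the compression-ratio definition is an assumption for illustration.

```python
def delta_compress(samples):
    # Delta encoding: store the first sample plus successive differences.
    # Slowly varying debug signals yield many small, repeated deltas.
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def run_length_encode(values):
    # Collapse runs of identical values into (value, count) pairs.
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

def compression_ratio(samples):
    # Raw sample count divided by the number of encoded pairs.
    return len(samples) / len(run_length_encode(delta_compress(samples)))
```

The achievable ratio depends entirely on the signal: a linear ramp or a constant compresses extremely well, while noise does not, which is why the paper reports an average ratio over benchmark signals.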
Detecting and resolving feature envy through automated machine learning and m... (IJECEIAES)
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba... (IJECEIAES)
Rapidly and remotely monitoring and receiving the status parameters of solar cell systems (solar irradiance, temperature, and humidity) are critical issues in enhancing their efficiency. Hence, in the present article, an improved smart prototype of an internet of things (IoT) technique based on an embedded system using the NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions in Egypt (the cities of Luxor, Cairo, and El-Beheira) were chosen to study their solar irradiance profiles, temperature, and humidity with the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were visualized live directly by Ubidots through the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m² respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system make it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly recommend Luxor and Cairo, rather than El-Beheira, as suitable places to build a solar cell system station.
An efficient security framework for intrusion detection and prevention in int... (IJECEIAES)
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threats to data security in IoT operations. Therefore, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to misclassification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data for training. Such complex datasets are impossible to source in a large network like IoT. To address this problem, this study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well on a benchmark dataset, accomplishing an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
This research is developing an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. Then, the data is sent in real-time to the internet of things (IoT) broker eclipse mosquito using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory network (LSTM) method and displayed in a web application using an application programming interface (API) service. Furthermore, the experimental results produce as many as 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and other attributes ranges from 0.23 to 0.48. Next, several experiments were conducted to evaluate the model-predicted value on the test data. The best results are obtained using a two-layer LSTM configuration model, each with 60 neurons and a lookback setting 6. This model produces an R 2 value of 0.934, with a root mean square error (RMSE) value of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R 2 value was also evaluated for each attribute used as input, with a result of values between 0.590 and 0.845.
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...ssuser7dcef0
Power plants release a large amount of water vapor into the
atmosphere through the stack. The flue gas can be a potential
source for obtaining much needed cooling water for a power
plant. If a power plant could recover and reuse a portion of this
moisture, it could reduce its total cooling water intake
requirement. One of the most practical way to recover water
from flue gas is to use a condensing heat exchanger. The power
plant could also recover latent heat due to condensation as well
as sensible heat due to lowering the flue gas exit temperature.
Additionally, harmful acids released from the stack can be
reduced in a condensing heat exchanger by acid condensation. reduced in a condensing heat exchanger by acid condensation.
Condensation of vapors in flue gas is a complicated
phenomenon since heat and mass transfer of water vapor and
various acids simultaneously occur in the presence of noncondensable
gases such as nitrogen and oxygen. Design of a
condenser depends on the knowledge and understanding of the
heat and mass transfer processes. A computer program for
numerical simulations of water (H2O) and sulfuric acid (H2SO4)
condensation in a flue gas condensing heat exchanger was
developed using MATLAB. Governing equations based on
mass and energy balances for the system were derived to
predict variables such as flue gas exit temperature, cooling
water outlet temperature, mole fraction and condensation rates
of water and sulfuric acid vapors. The equations were solved
using an iterative solution technique with calculations of heat
and mass transfer coefficients and physical properties.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on Binary Heap data structure. It is similar to the selection sort where we first find the minimum element and place the minimum element at the beginning. Repeat the same process for the remaining elements.
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Automatic video censoring system using deep learning
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 12, No. 6, December 2022, pp. 6744~6755
ISSN: 2088-8708, DOI: 10.11591/ijece.v12i6.pp6744-6755 6744
Journal homepage: http://ijece.iaescore.com
Automatic video censoring system using deep learning
Yash Verma1, Madhulika Bhatia2, Poonam Tanwar3, Shaveta Bhatia4, Mridula Batra4
1 Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University, Noida, India
2 Department of Computer Science and Engineering, Manav Rachna International Institute of Research and Studies, Faridabad, India
3 Faculty of Engineering and Technology, Manav Rachna International Institute of Research and Studies, Faridabad, India
4 Faculty of Computer Applications, Manav Rachna International Institute of Research and Studies, Faridabad, India
Article Info
ABSTRACT
Article history:
Received Jun 9, 2021
Revised Jun 21, 2022
Accepted Jul 18, 2022
Due to the extensive use of video-sharing platforms and services, the amount
of all kinds of content on the web has become massive. This abundance of
information makes it difficult to control the kind of content that may be
present in a video. Beyond deciding whether content is suitable for children
and sensitive viewers, it is also important to identify which parts of a
video contain such content, so that parts that would be discarded in a
simple broad analysis can be preserved. To tackle this problem, popular
deep learning image models were compared: MobileNetV2, Xception,
InceptionV3, VGG16, VGG19, ResNet101, and ResNet50, to find the one most
suitable for the required application. A system was also developed that
automatically censors inappropriate content, such as violent scenes, with
the help of deep learning. The system uses a transfer learning mechanism
based on the VGG16 model. The experiments suggest that the model shows
excellent performance for the automatic censoring application and could
also be used in other similar applications.
Keywords:
Automatic video censoring
Computer vision
Convolutional neural network
Deep learning
Machine learning
This is an open access article under the CC BY-SA license.
Corresponding Author:
Madhulika Bhatia
Department of Computer Science and Engineering, Amity School of Engineering and Technology
Amity University
Amity Rd, Sector 125, Noida, Uttar Pradesh 201301, India
Email: madhulikabhatia@gmail.com
1. INTRODUCTION
“We become what we see” is an ancient quote that holds even today. It points out how
profoundly ingrained the effect of visual information on people is. The internet is full of
material that is inappropriate for children and even teenagers and that can affect their young
minds in a very negative way. Violent scenes may encourage young viewers to imitate the roles
performed by characters on television, at great personal risk. Films that depict drug use, even
while claiming it destroys health, can end up promoting it, as young minds become curious to
try it in real life. Such scenes should be censored more tightly.
The internet's media presence is vast because of the widespread use of video sharing sites and cloud
facilities. This data volume makes it impossible to monitor the content of those videos manually. One of
the most crucial questions about video content is whether it contains an objectionable topic, for
example, violence or abuse in any form. Beyond deciding whether a video is appropriate or
inappropriate, it is also important to identify which parts of it contain such content, so that parts
that would be discarded in a simple broad analysis can be preserved.
Censoring boards and committees exist for large-scale movies and television shows, but people
are moving away from movie theatres and television toward smartphones, tablets, and computers as their
primary source of entertainment. There are several sources of information and
entertainment like Netflix, Amazon Prime Video, YouTube, Instagram, Facebook, and Twitter. These can
be very convenient owing to their portability and their reach to a major share of the world's population.
However, no mechanism has been put in place to censor the content shared through these platforms.
Especially in developing countries like India, this is a big issue. Unlike mainstream films, which are
certified by the central board of film certification (CBFC) and overseen by the broadcasting content
complaints council (BCCC), over-the-top (OTT) platforms have no regulatory authority monitoring the
content they stream and therefore enjoy broad latitude. The content on these websites is also subject to
observation by the supreme court when it directly contradicts the various laws of the country, but legal
loopholes and grey areas remain troubling [1]. Also, many notorious groups deliberately put
inappropriate content on social media. There have been many instances of content depicting violence and
bloodshed that is unbearable even for fully grown adults. The proposed system will help filter such content.
Censoring content in videos may be framed as a binary classification problem with two
classes: appropriate and inappropriate. Some popular machine learning (ML) algorithms are
logistic regression, k-nearest neighbors, decision trees, support vector machines (SVM), and the
naïve Bayes algorithm. However, the data to be classified here consists of complex, highly variable
images. Because of this complexity, traditional ML algorithms are not sufficient, as discussed in the
literature review below, so the problem has to be tackled with deep learning (DL). The best-known
and most effective deep learning technique for images is the convolutional neural network. In this
work, different deep learning models are compared to find the one best suited to automatic censoring
of videos and images by classifying frames as violent or non-violent. The complete architecture of
the system is provided in this paper along with the results, and conclusions are drawn at the end.
A lot of research has been conducted on video censoring using machine learning and deep
learning; some of the most notable work relevant to this project is mentioned here. Nievas et al. [2]
evaluated various state-of-the-art video descriptors for detecting fights on datasets
containing action movie scenes and National Hockey League footage. The popular bag-of-words approach
was found to classify the frames with about 90% accuracy. For the hockey dataset, the choice of
low-level feature descriptor and vocabulary size was not important; on the movie dataset, however,
descriptor choice was critical, and in either case the motion SIFT (MoSIFT)
features performed significantly better than the best spatio-temporal interest points (STIP).
Deniz et al. [3] proposed a new approach for detecting violent actions in which the major
discriminatory attribute is extreme acceleration patterns. The proposed algorithm yielded a 12% accuracy
improvement over state-of-the-art methods in surveillance scenarios, including sports, where complex
actions take place. The experiments suggested that motion alone is sufficient for categorization, and
that appearance can be a source of confusion in detection while costing additional computation. The
method was found to be up to 15-fold faster while using far fewer features, and it can also be used as
the first stage of a maximum-accuracy system with STIP or MoSIFT features.
Fu et al. [4] presented an efficient approach for violence detection in videos based on motion
analysis, without recognizing action events, gestures, or complex behaviors. A heuristic framework
built on a decision tree structure was suggested to derive more accurate motion information from
optical flow images, distinguishing movement types by the motion, count, size, and direction of the
extracted movement regions. A motion-attraction measure was also proposed to quantify the intensity
between two motion zones, from which various statistics are computed as classification features.
Bilinski and Bremond [5] proposed a technique based on improved Fisher vectors (IFV)
that uses both spatio-temporal positions and local features to represent videos for recognizing
violence. Compared with IFV without spatio-temporal positions, the proposed extension achieved similar
or better accuracy, and the new strategy was more effective than previous approaches for violence
detection. A sliding-window approach was also studied, and IFV was reformulated to increase the speed
of the violence detection framework on four state-of-the-art datasets.
Nar et al. [6] showed that posture recognition could be used for abnormal-behavior detection in
front of automatic teller machines (ATMs). The experiments were done on data captured using a
Kinect 3D camera. Logistic regression was used as the classification technique, and gradient descent
was used to calculate the optimal parameters.
Xie et al. [7] proposed a motion-vector-based algorithm for violence detection in
surveillance videos. The system first extracts motion vectors from compressed videos, then analyses
the spatio-temporal distribution of their magnitudes and directions to obtain region motion vectors
(RMV), and finally uses a radial-basis SVM to classify the RMVs and determine violent behavior in the
monitored videos. It can improve the real-time detection of violent activities in video-monitoring
systems and improve retrieval of the position of violence in historical videotapes or other media
sources. The methodology is highly appropriate for front-end video
encoders operating on an integrated digital signal processor (DSP) network, since it avoids the motion
identification and tracking steps normally applied in conventional behavior analysis. The
provided methodology classified the UCF sports dataset with 91.9% accuracy.
Ribeiro et al. [8] addressed violence detection in scenes made complex by external factors
such as cluttered, dynamic backgrounds and occlusion. They proposed a rotation-invariant feature
modelling motion coherence, specific to the violence detection problem, that is used to discriminate
structured from unstructured movements. Histograms of optical flow vectors, obtained from
second-order statistics over time instants, are computed locally and densely and embedded in a
Riemannian spherical manifold. Experiments showed that the accuracy of the approach is comparable to
state-of-the-art models in both laboratory and real-world settings.
Senst et al. [9] presented an automatic violence detection technique based on Lagrangian
methods. They proposed a new spatio-temporal feature that uses appearance, background-motion
compensation, and longer-term motion information, together with an extended bag-of-words procedure as
a per-video classification scheme at suitable spatial and temporal scales. The proposed
architecture was thoroughly reviewed by the London Metropolitan Police on multiple challenging
datasets and real-world non-public data.
Zhou et al. [10] proposed FightNet, a convolutional neural network (CNN) that models long-term
temporal structure to detect violent interactions. The model was trained on the UCF101 dataset.
Acceleration fields were investigated for capturing motion features and were found to be responsible
for a 0.3% increase in accuracy. The system achieved higher accuracy with decreased computational
cost. First, each video is decomposed into red, green, blue (RGB) frames. Second, the optical flow
field is computed from consecutive images, and the acceleration field is obtained from the optical
flow field. Third, FightNet takes three types of input: RGB images for the spatial network, and
optical flow and acceleration images for the temporal networks. The predictions from all inputs are
fused to infer whether a video contains a violent incident.
Chaudhary et al. [11] put forward a technique for automatically detecting violence in surveillance
videos. The method contains three key steps: moving-object identification, object tracking, and
behavior understanding for movement recognition. Key features (speed, direction, center, and
dimensions) are obtained using a feature extraction method, which helps trace objects across video
frames. A Gaussian mixture model was used for foreground extraction. This method could categorize
the videos with 90% accuracy.
Zhou et al. [12] also proposed a different algorithm involving optical flow fields. First, the
movement regions are segmented based on the distribution of the optical flow field. Next, within the
motion regions, two kinds of low-level features are extracted to represent the appearance and
dynamics of aggressive actions: a local histogram of oriented gradients (LHOG) descriptor extracted
from RGB images and a local histogram of optical flow (LHOF) descriptor extracted from optical flow
images. The extracted features are coded using a bag-of-words (BoW) model to remove redundant
information, yielding a fixed-length vector for each video. Finally, SVMs are used to classify the
video-level vectors.
Khan et al. [13] proposed a model to classify cartoon content as inappropriate for children. The
model used a transfer learning approach based on the MobileNet model. The system proved to be
97% accurate.
Song et al. [14] proposed 3D ConvNets with modified pre-processing steps for the detection of
violence. Instead of uniform sampling, keyframes are identified and used for sequence
categorization rather than every frame. Alongside excellent precision on benchmark datasets, this
greatly reduces computation, which results in faster classification. An important point is that
short clips are still handled with uniform sampling; only lengthy clips use the new method, which
maintains both accuracy and speed, since for longer videos fixed sampling introduces redundancy and
discontinuity of motion. For three public violence detection datasets (hockey fight, movies, and
crowd violence), individualized strategies are applied to suit the varied clip lengths. The proposed
scheme obtains competitive results: 99.62% on hockey fight, 99.97% on movies, and 94.3% on crowd
violence.
Khan et al. [15] proposed a model based on an approach similar to Song et al. [14]. Initially, the
entire film is divided into shots, and an individual frame is chosen from each shot depending on
salience intensity. These images are then classified by a lightweight model, fine-tuned using a
transfer learning technique to distinguish violence from non-violence in a film. Finally, all
non-violent scenes are combined into a sequence to produce a non-violent film that children and
sensitive viewers can enjoy. The model was tested on benchmark datasets and achieved good accuracy.
Freitas et al. [16] proposed a multi-modal method that classifies scenes as appropriate or
inappropriate using both visual and audio features with convolutional neural networks. InceptionV3
was used for video and AudioVGG for audio classification; principal component analysis was then used
for feature selection, and finally an SVM performs the last categorization. The model achieved F1
scores of 98.94% and 98.95% for inappropriate and appropriate content, respectively.
Gkountakos et al. [17] proposed a framework for crowd-based violence detection using ResNet and
3D ConvNets. The framework was compared with several state-of-the-art models on the Violent-Flows
dataset, and in those experiments the 3D-ResNet50 framework proved to be both more accurate and
quicker than the compared models.
Roman and Chávez [18] proposed an approach for violence detection and localization in
surveillance videos based on ConvNets and dynamic images. Instead of using computationally costly
optical flow, they used dynamic images, which, besides reducing the cost of computation, also
supported the analysis of long-range temporal information. This yielded good accuracies at lower
computational cost.
Li et al. [19] proposed a multi-stream method for violence detection in video surveillance. The
suggested solution improves the detection of violent acts by merging three separate streams: a
spatial RGB stream, a temporal stream, and a local spatial stream. The attention-based spatial RGB
stream uses soft-attention mechanisms to discover the spatial regions of people that are most likely
to be action regions. The temporal stream uses optical flow as input to retrieve temporal
characteristics, and the local spatial stream uses block images as input to learn local spatial
characteristics. The proposed algorithm was tested on a self-compiled elevator surveillance dataset
and found to be satisfactory.
Accattoli et al. [20] used C3D, a 3D ConvNet architecture that computes video descriptor
features to capture motion without any prior information. These descriptors were then used to
characterize videos as either aggressive or peaceful by passing them as input to a linear support
vector machine. The model's performance was better than state-of-the-art models for both crowd and
individual violent-action recognition.
The technique developed by Sharma et al. [21] was composed of three phases. First, the whole film is
split into shots, and a random sample from each shot is chosen. Next, transfer learning is applied
to fine-tune the classification of violence and non-violence. Finally, all the non-violence segments
are stitched together to create a violence-free movie that can be seen by children and sensitive
individuals. The model was evaluated on the Violence in Movies, Hockey Fights, and VSD benchmark
datasets and obtained an overall accuracy of 96.3%. MobileNetV2 [22] has been a popular lightweight
model for object detection. Xception [23], InceptionV3 [24], VGG16 [25], and ResNet50 [26] are among
the winners of ImageNet challenges in different years that showed excellent accuracy on the ImageNet
dataset.
2. METHOD
2.1. Dataset
The movies fight detection dataset [27] has been used for comparison purposes. The dataset
comprises 200 videos of variable length taken from various international movies. For computation,
individual frame images have been extracted from these videos. A few samples of non-violent and
violent images from the dataset are shown in Figures 1 and 2.
Figure 1. Non-violent images from the dataset
2.2. Hardware and software used
The implementation was done on a computer with an CPU Intel(R) Core(TM) i7-9750H CPU @
2.60 GHz, GPU NVIDIA GeForce GTX 1660 Ti, RAM 8 GB, and operating system Windows 10. This
research used language Python 3.7.10. And used Framework Keras with TensorFlow 2.4.1.
Figure 2. Violent images from the dataset
2.3. System methodology
The overall methodology is explained in the following steps, and the complete workflow of the
system is illustrated in the flowchart in Figure 3. Raw input video is taken from the dataset.
Frames are split from the uploaded video, and image processing techniques are applied. The frames
are fed to a deep neural network built on a pretrained model and classified as violent or not;
violent frames are Gaussian-blurred while non-violent frames pass through unchanged.
2.3.1. Input raw video
The first step is the input step. The system accepts both image and video input, in one of the
common formats such as JPG or PNG for images and MP4 for videos. The minimum expected resolution of
the input is 240×240 pixels.
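As a sketch of this input check (the helper name and parameters are hypothetical; the paper does not specify the implementation, and in practice the width and height would be read from the file itself, e.g. with OpenCV), the format and minimum-resolution constraints might be validated like this:

```python
import os

# Accepted containers, per the input step described above.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTS = {".mp4"}
MIN_SIDE = 240  # minimum expected resolution is 240x240 pixels

def check_input(path, width, height):
    """Return 'image' or 'video' if the input is acceptable, else raise."""
    ext = os.path.splitext(path)[1].lower()
    if ext in IMAGE_EXTS:
        kind = "image"
    elif ext in VIDEO_EXTS:
        kind = "video"
    else:
        raise ValueError(f"unsupported format: {ext}")
    if width < MIN_SIDE or height < MIN_SIDE:
        raise ValueError("input below minimum 240x240 resolution")
    return kind
```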
Figure 3. Methodology flowchart
2.3.2. Splitting frames from video
Every video consists of a series of frames, so if the input is a video, it is first
separated into individual frames. The system is capable of handling a high frame rate, but for testing purposes
and to decrease processing time (high-specification hardware being unavailable), the sampling rate
here has been set to ten frames per second.
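The paper does not give code for this sampling step; one way to pick which frames to keep when downsampling a native frame rate to ten frames per second is to step through the timeline at fixed intervals (a hypothetical helper, assuming a constant native frame rate):

```python
def sample_indices(total_frames, native_fps, target_fps=10):
    """Indices of the frames kept when resampling to `target_fps`.

    Walks the timeline in steps of native_fps/target_fps frames and keeps
    the nearest earlier frame, assuming a constant native frame rate.
    """
    step = native_fps / target_fps  # source frames per kept sample
    indices = []
    t = 0.0
    while int(t) < total_frames:
        indices.append(int(t))
        t += step
    return indices
```

In practice the returned indices would drive a video reader such as OpenCV's `VideoCapture`, decoding only the listed frames.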
2.3.3. Image processing
To give more flexibility, the input may have any resolution greater than 240×240 pixels, but since
the system is designed to work on 240×240-pixel input, all frames are converted to this dimension.
This resolution was chosen owing to the hardware bottleneck, so with higher-specification hardware
the system can easily be adjusted to work on higher-resolution frames.
2.3.4. Processing using proposed deep neural network
The processed frames are fed to the DL model as illustrated below, and the output is one of two
categories, i.e., violent or non-violent. The network combines a pretrained VGG16 base with
convolution 2D and max-pooling layers. The architecture of the model is explained below and
illustrated in Figure 4.
Figure 4. Proposed deep neural network architecture
− VGG16
The model utilizes pre-trained weights to save time and other resources. VGG16 is an
excellent object recognition model, so the image input is first passed to VGG16, and the output of
its second-to-last layer (rather than the output layer) is passed to the convolutional 2D (Conv2D)
layer of the new model.
− Convolutional 2D layer
Conv2D detects various features in frames via the convolution operation: a kernel is passed over the
whole image or feature map, and the result of the convolution is stored in the output feature map.
The features extracted from the frames are used to classify the video content as violent or
non-violent.
− Max pooling layer
Max pooling is one of the pooling techniques used for dimensionality reduction in CNNs. In the
max-pooling layer, the maximum value is selected from each specified neighborhood of the feature
map, so only the strongest activations are passed on.
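The max-pooling operation just described can be illustrated with a generic NumPy sketch (this is not the authors' code, only an example of 2×2 pooling with stride 2):

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 over a (H, W) feature map.

    Each output cell keeps only the largest of the four values it covers,
    halving both spatial dimensions."""
    h, w = fmap.shape
    # Group the map into non-overlapping 2x2 blocks, then reduce each block.
    blocks = fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 2, 2],
              [3, 1, 0, 4]])
print(max_pool_2x2(x))  # [[4 5]
                        #  [3 4]]
```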
− Dense layer
In this layer, every neuron is connected to every neuron of the previous layer, which is why it is
also called a fully connected layer. Matrix multiplication takes place in this layer in the
background, and learning happens via adjustment of the weights by the backpropagation mechanism.
− Dropout layer
This layer is introduced to reduce overfitting while training the model, acting as
regularization. A dropout factor of 0.3 has been set here, which means 30% of the inputs are
randomly discarded during training while the remaining inputs are passed on to the following dense
layer.
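The dropout mechanism with a factor of 0.3 can be sketched in NumPy as follows. This is a generic illustration of inverted dropout as commonly applied at training time, not the framework's internal code:

```python
import numpy as np

def dropout(x, rate=0.3, rng=None, training=True):
    """Inverted dropout: zero out `rate` of the inputs at training time and
    rescale the survivors so the expected activation is unchanged.
    At inference time the input passes through untouched."""
    if not training:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate        # keep ~70% of the units
    return np.where(mask, x / (1.0 - rate), 0.0)

y = dropout(np.ones(1000), rate=0.3)
# Roughly 30% of the entries are zeroed, and the mean stays near 1.0.
```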
− Dense layer
This layer functions in the same way as the dense layer above, enforcing further learning in the
model after the dropout layer. It is the last layer that processes multi-dimensional inputs and
outputs.
− Flatten layer
The flatten layer converts multidimensional feature maps into a single long one-dimensional vector
so that the final classification can be performed quickly by the fully connected output layer. It
handles the transition from the convolutional layers to the fully connected layers.
− Dense output layer
This is the last layer in the proposed model where actual classification takes place. Here, the sigmoid
activation function is used for classification. The output of this layer is one of the two classes that is violent
or non-violent.
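Putting the layers above together, the head on top of the VGG16 base might be assembled as follows. This is a sketch reconstructed from the layer descriptions, not the authors' released code: the filter counts, kernel sizes, and dense unit counts are illustrative assumptions (the paper does not list them), and `weights=None` is used here only to avoid downloading the ImageNet weights that the actual system loads.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# VGG16 base; the paper uses pretrained ImageNet weights (weights="imagenet")
# and freezes the base during transfer learning.
base = VGG16(weights=None, include_top=False, input_shape=(240, 240, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.Conv2D(64, (3, 3), activation="relu"),  # illustrative filter count
    layers.MaxPooling2D((2, 2)),
    layers.Dense(64, activation="relu"),           # illustrative unit count
    layers.Dropout(0.3),                           # 30% dropout, as in the paper
    layers.Dense(32, activation="relu"),           # illustrative unit count
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),         # violent vs non-violent
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

A sigmoid output with binary cross-entropy matches the two-class (violent/non-violent) setup; training would then call `model.fit` on the extracted frames.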
2.3.5. Applying blur to violent images
Based on the output of the model, Gaussian blur is applied: if the frame is classified as violent,
it is blurred; otherwise no change is made. Violent frames in the output are thus distinguishable
from non-violent frames by their blurred texture.
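A minimal version of the blurring step, using a separable Gaussian kernel in NumPy (in practice a library routine such as OpenCV's `GaussianBlur` would be used; the kernel size and sigma here are illustrative, not values from the paper):

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.5):
    """1D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(size) - size // 2
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(frame, size=5, sigma=1.5):
    """Blur a 2D grayscale frame with a separable Gaussian filter:
    convolve every row, then every column, with the 1D kernel."""
    k = gaussian_kernel(size, sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, frame)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)
```

Frames the model labels violent would be passed through `gaussian_blur`; non-violent frames are forwarded unchanged.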
2.3.6. Merge to form video
The processing so far is done on individual frames. In this step, all the frames are combined
again and encoded to form the video output (or an image in the case of image input).
2.3.7. Saving to disk
In this step, the output video or image file is stored on the storage medium at the specified path,
which by default is the same path as the input file.
3. RESULTS AND DISCUSSION
It has been inferred from the literature review that deep learning approaches are more
effective than the alternative machine learning approaches. Therefore, several popular deep learning models are assessed for
the task of classifying violent and non-violent scenes. For a fair comparison, the last layer of every network
compared was removed and the base model layers were set as non-trainable. All the models used for
transfer learning were concatenated with an identical head architecture, so that the results are not biased
by anything except the base model architecture. Tables 1 to 4 show the training accuracies,
validation accuracies, training losses, and validation losses, respectively, for all the considered models on the
same dataset.
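The comparison protocol above (frozen base, identical trainable head) can be illustrated framework-free; in this sketch a fixed random projection stands in for a pretrained base, a logistic head is the only part updated, and all sizes and data are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W_base = rng.normal(size=(8, 4))       # stand-in for pretrained base weights (frozen)
W_base_before = W_base.copy()
W_head = np.zeros(4)                   # freshly attached trainable head

x = rng.normal(size=(16, 8))           # synthetic inputs
y = rng.integers(0, 2, size=16)        # synthetic violent / non-violent labels

# Forward pass: frozen feature extractor, then a sigmoid head.
feats = np.maximum(x @ W_base, 0.0)
p = 1.0 / (1.0 + np.exp(-(feats @ W_head)))

# One gradient step: only the head is updated, the base is never touched,
# mirroring "base model layers set as non-trainable".
grad = feats.T @ (p - y) / len(y)
W_head -= 0.1 * grad
```

Because every compared network gets the same frozen-base treatment and the same head, differences in the tables below can be attributed to the base architectures themselves.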
Table 1. Training accuracies
Epoch VGG16 ResNet50 ResNet101 Xception InceptionV3 MobileNetV2 VGG19
1 0.4308 0.5325 0.4286 0.6099 0.4009 0.3329 0.4433
2 0.6145 0.6699 0.5435 0.7211 0.4043 0.3607 0.5779
3 0.6286 0.7788 0.5546 0.7373 0.4378 0.3874 0.5627
4 0.6259 0.6919 0.5710 0.7079 0.4533 0.3342 0.5877
5 0.6259 0.7120 0.5736 0.7673 0.4715 0.3936 0.5976
6 0.6708 0.7923 0.6016 0.7770 0.5085 0.4334 0.6051
7 0.6708 0.7993 0.6174 0.7452 0.5264 0.4489 0.5783
8 0.7814 0.8766 0.5928 0.7200 0.5438 0.4456 0.5879
9 0.8476 0.9482 0.5917 0.7800 0.6046 0.4433 0.6116
10 0.9521 0.7746 0.5958 0.7771 0.5867 0.4530 0.7018
11 0.9743 0.8658 - - - - -
12 0.9962 0.9887 - - - - -
13 0.9968 0.9969 - - - - -
14 0.9984 0.9963 - - - - -
15 0.9971 0.9962 - - - - -
Int J Elec & Comp Eng ISSN: 2088-8708
Automatic video censoring system using deep learning… (Yash Verma)
Table 2. Validation accuracies
Epoch VGG16 ResNet50 ResNet101 Xception InceptionV3 MobileNetV2 VGG19
1 0.3811 0.4325 0.1022 0.2596 0.0899 0.0688 0.1527
2 0.3802 0.6518 0.1178 0.2362 0.0899 0.0651 0.1633
3 0.3848 0.5495 0.1165 0.2495 0.0899 0.0894 0.1458
4 0.3926 0.3527 0.1220 0.2614 0.0912 0.0880 0.2183
5 0.2472 0.3100 0.1293 0.2665 0.0958 0.0889 0.1853
6 0.2495 0.3729 0.2004 0.2596 0.1073 0.0894 0.272
7 0.5504 0.3743 0.2266 0.2160 0.1178 0.0894 0.2192
8 0.6307 0.4752 0.2243 0.2798 0.1688 0.0889 0.2087
9 0.7834 0.4325 0.2220 0.2807 0.2137 0.0889 0.2692
10 0.8853 0.5330 0.1422 0.3036 0.3582 0.0889 0.2917
11 0.9706 0.7853 - - - - -
12 0.9486 0.8426 - - - - -
13 0.9559 0.9220 - - - - -
14 0.9559 0.7261 - - - - -
15 0.9522 0.7775 - - - - -
Table 3. Training losses
Epoch VGG16 ResNet50 ResNet101 Xception InceptionV3 MobileNetV2 VGG19
1 0.0846 0.0974 0.6127 3.9513 2.0313 7.8345e-05 0.2149
2 0.0020 0.0023 0.0002 0.0255 0.3080 7.4334e-05 0.0002
3 2.1427e-05 0.0027 0.0013 0.6266 0.2705 0.0002 0.0004
4 7.4089e-05 0.0108 7.5943 0.0084 0.2483 0.0013 0.0021
5 0.0009 0.0043 0.0002 0.0021 0.2207 2.8252e-05 3.23e-05
6 6.3740e-06 0.0001 0.0004 0.0014 0.1909 1.8697e-05 0.0002
7 0.0023 5.2149 0.0017 0.0194 0.1526 2.3616e-05 6.62e-05
8 0.0116 0.0010 2.1748 0.0053 0.1409 8.5036e-06 2.65e-05
9 0.0200 0.0054 3.5433 0.0057 0.0953 1.9839e-05 3.90e-03
10 0.0169 0.0001 0.0008 0.0010 0.0884 2.2081e-05 2.50e-03
11 0.2106 0.0566 - - - - -
12 0.1233 0.0756 - - - - -
13 0.0009 0.0361 - - - - -
14 1.6412e-12 0.1541 - - - - -
15 0.0088 0.2933 - - - - -
Table 4. Validation losses
Epoch VGG16 ResNet50 ResNet101 Xception InceptionV3 MobileNetV2 VGG19
1 0.1721 0.2322 0.9444 8.4396 1.1195 2.7507 0.5384
2 0.5360 0.2199 1.1235 8.1314 2.6441 3.0738 0.2601
3 0.4174 0.0651 1.3611 9.5290 5.0485 1.4225 0.7005
4 0.349 0.2968 1.6463 15.6250 13.4026 2.5173 0.1774
5 0.9826 1.9352 2.5149 16.8107 6.4081 2.8827 0.3428
6 1.1410 1.5655 3.1745 11.7113 13.0884 2.8260 0.1015
7 4.2067 1.5548 0.4659 6.7274 21.7036 2.7518 0.1844
8 1.8556 4.4436 0.4944 8.6459 5.3421 2.9729 0.2076
9 1.4086 2.1964 0.5038 13.4710 13.4546 2.8615 4.56
10 0.5593 2.4719 1.2660 9.0726 2.1627 3.2106 1.1646
11 0.1909 1.9767 - - - - -
12 0.4748 2.9610 - - - - -
13 1.1006 3.2689 - - - - -
14 1.1067 25.5439 - - - - -
15 4.8550 6.9018 - - - - -
The tables above show the model training and validation metrics. Both VGG16 and ResNet50 showed
considerably lower training losses than the other compared models, but the validation losses of
VGG16 were found to be significantly lower than those of ResNet50.
The time required for training and validation of ResNet101 and VGG19, the most complex of the
compared models, is more than double that of ResNet50 and VGG16, yet their accuracy is worse than that of
ResNet50 and VGG16. Xception lies somewhere between VGG16 and ResNet101. MobileNetV2, being the
lightest model, also has the shortest training time, but its accuracy is likewise worse than that of
VGG16 and ResNet50. The training times for VGG16, ResNet50, and InceptionV3 are similar.
Since VGG16 and ResNet50 showed the best accuracies among all the models, they were trained for 5 more
epochs to obtain the most accurate models. Both models start to overfit after epoch 13, and the validation
and training losses are at their optimum at epoch 13, as observed from Figures 5 to 12. So, the model training is
Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6744-6755
stopped at epoch 13 at 99.68% training and 95.59% validation accuracy for VGG16, and 99.69% training and
92.20% validation accuracy for ResNet50.
VGG16 therefore performed better in terms of validation accuracy (95.59%) than ResNet50 (92.20%),
while both models reached nearly 99% training accuracy.
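The stopping rule described above (halt training where the validation loss bottoms out, before overfitting sets in) reduces to picking the argmin of the per-epoch validation losses; a sketch with illustrative loss values, not the paper's measured ones:

```python
# Hypothetical per-epoch validation losses; index 0 corresponds to epoch 1.
val_losses = [0.90, 0.60, 0.45, 0.38, 0.41, 0.52, 0.70]

# Stop at the epoch whose validation loss is lowest.
best_epoch = min(range(len(val_losses)), key=lambda i: val_losses[i]) + 1
```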
Figure 5. ResNet50 training accuracy
Figure 6. ResNet50 training loss
Figure 7. ResNet50 validation accuracy
Figure 8. ResNet50 validation loss
Figure 9. VGG16 training accuracy
Figure 10. VGG16 training loss
Figure 11. VGG16 validation accuracy
Figure 12. VGG16 validation loss
4. CONCLUSION
VGG16 and ResNet50 have shown the best results among the compared models (MobileNetV2, Xception,
InceptionV3, VGG16, VGG19, ResNet101, and ResNet50). It is concluded from the experiments that VGG16 is the most
appropriate model for designing a system to classify violent and non-violent scenes in movies,
achieving 99.68% training accuracy and 95.59% validation accuracy. Due to the absence of high-specification
hardware, the system reduces the output resolution to 240×240 pixels and the
sampling rate to 10 frames per second. The system is capable of working at higher sampling rates and
higher output resolutions if better hardware is used for processing.
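The stated downscaling (240×240 output at 10 frames per second) amounts to keeping every k-th frame of the input and resizing each kept frame; a sketch assuming a 30 fps source, with crude stride-based nearest-neighbor resizing standing in for a proper resize call:

```python
import numpy as np

def downsample(frames, in_fps=30, out_fps=10, size=240):
    """Keep every (in_fps // out_fps)-th frame and resize by index striding."""
    step = in_fps // out_fps
    kept = frames[::step]
    out = []
    for f in kept:
        ys = np.linspace(0, f.shape[0] - 1, size).astype(int)
        xs = np.linspace(0, f.shape[1] - 1, size).astype(int)
        out.append(f[np.ix_(ys, xs)])   # nearest-neighbor resize to size x size
    return out

frames = [np.zeros((480, 640), dtype=np.uint8) for _ in range(30)]  # 1 s at 30 fps
small = downsample(frames)
```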
BIOGRAPHIES OF AUTHORS
Yash Verma is a student pursuing Master of Technology degree in Computer
Science and Engineering from Amity University, Noida, Uttar Pradesh, India. He completed
his Bachelor of Technology degree in Computer Science and Engineering from SRM Institute
of Science and Technology, Chennai, Tamil Nadu, India. His primary interest is in “Image and
Video processing using Machine Learning and Deep Learning Techniques”. His research is
focused on Security and Surveillance and automatic censorship in videos. He has also worked
on a system to train ground security personnel for surveillance in different environments which
is compatible with the new age Virtual Reality systems. He can be contacted at email:
yashver96@gmail.com.
Madhulika Bhatia is working as an Associate Professor in the Department of
Computer Science and Engineering at Amity School of Engineering and Technology, Amity
University, Noida. She holds a Diploma in Computer Science and Engineering, a B.E. in
Computer Science and Engineering, an MBA in Information Technology, an M.Tech in Computer
Science, and a PhD from Amity University, Noida. She has a total of 15 years of teaching
experience and has published almost 32 research papers in national and international conferences
and journals. She is also the author of two books and has filed two provisional patents. She has
attended and organized many workshops, guest lectures, and seminars. She is a member of
many technical societies such as IET, ACM, UACEE, and IEEE. She has reviewed for Elsevier-Heliyon,
IGI, and the Indian Journal of Science and Technology, and has served as an editor for Springer
Nature, Switzerland, for a book chapter in Data Visualization and Knowledge Engineering. She
has guided 5 M.Tech. theses and around 50 B.Tech. major and minor projects, and is guiding
PhD scholars. She can be contacted at email: madhulikabahtia@gmail.com.
Poonam Tanwar has 18 years of teaching experience and is working as a Professor at Manav
Rachna International Institute of Research and Studies, Faridabad. She has filed 6 patents. She
was Guest Editor for a special issue on “Advancement in Machine Learning (ML) and
Knowledge Mining (KM)” for the International Journal of Recent Patents in Engineering (UAE).
She has organized various science and technology awareness programs for rural development.
Besides this, she has published more than 40 research papers in various international
journals and conferences, and 3 edited books with international publishers are to her credit. She is
a technical program committee member for various international conferences. She can be
contacted at email: poonamtanwar.fet@mriu.edu.in.
Shaveta Bhatia is a professor at the Department of Computer Applications, MRIIRS,
and Deputy Director of Manav Rachna Online Education. She has been awarded her Ph.D. degree in
Computer Applications and completed her master’s in computer applications (MCA) from
Kurukshetra University. She has 18 years of academic and research experience. She is a
member of various professional bodies such as ACM, IAENG, and CSI. She has participated in
various national and international conferences and is actively involved in various projects, with
more than 40 publications to her credit in reputed national and international journals
and conferences. She is also a member of the editorial boards of various highly indexed journals. Her
specialized domains include mobile computing, web applications, data mining, and software
engineering, and she guides research scholars in these areas. She can be contacted at email:
shaveta.fca@mriu.edu.in.
Mridula Batra is a faculty member at the Faculty of Computer Applications, MRIIRS. She has participated
in various national and international conferences and is actively involved in various projects, with
more than 20 publications to her credit in reputed national and international journals
and conferences. She is also a member of the editorial boards of various highly indexed journals. She
can be contacted at email: Mridula.fca@mriu.edu.in.