The need for reliable automated object detection and classification in the maritime sphere has increased significantly over the last decade. The increased use of unmanned marine vehicles necessitates the development and use of autonomous object detection and classification systems that utilize onboard cameras and operate reliably, efficiently, and accurately without the need for human intervention. As the autonomous vehicle realm moves further away from active human supervision, the requirements for a detection and classification suite call for a higher level of accuracy in the classification models. This paper presents a comparative study using different classification models on a large maritime vessel image dataset. For this study, additional images were annotated using Roboflow, focusing on the types of subclasses that had the lowest detection performance in a previous work by Brown et al. (1). The present study uses a dataset of over 5,000 images. Using the enhanced set of image annotations, models were created in the cloud with Google Colab using YOLOv5, YOLOv7, and YOLOv8, as well as in Amazon Web Services using Amazon Rekognition. The models were tuned and run for five runs of 150 epochs each to collect efficiency and performance metrics. Comparison of these metrics from the YOLO models yielded interesting improvements in classification accuracies for the subclasses that previously had lower scores, but at the expense of the overall model accuracy. Furthermore, training time efficiency was improved drastically using the newer YOLO APIs. In contrast, using Amazon Rekognition yielded superior accuracy results across the board, but at the expense of the lowest training efficiency.
Using a public dataset of images of maritime vessels provided by Analytics Vidhya, manual annotations were made on a subsample of images with Roboflow using the ground truth classifications provided by the dataset. YOLOv5, a prominent open-source family of object detection models that comes with out-of-the-box pre-training on the Common Objects in Context (COCO) dataset, was used to train on annotations of sub-classifications of maritime vessels. YOLOv5 already produces strong results in detecting a generic boat class. The training, validation, and test sets of images were used to train YOLOv5 in the cloud using Google Colab. Three of our five subclasses, namely cruise ships, ROROs (Roll On Roll Off, typically car carriers), and military ships, have very distinct shapes and features and yielded positive results. Two of our subclasses, the tanker and the cargo ship, have similar characteristics when the cargo ship is unloaded and not carrying any cargo containers. This yielded interesting misclassifications that could be improved in future work. Our trained model achieved a validation mean Average Precision (mAP@0.5) of 0.932 across all subclassifications of ships.
This document describes research using YOLOv5, an object detection model, to classify images of ships into five subclasses (cargo, military, carrier, cruise, tanker). The researchers trained YOLOv5 on manually annotated images from a public dataset. They achieved a mean Average Precision of 0.932 for detecting the ship subclasses. Some subclasses like cruise ships and military ships were detected accurately due to distinct shapes, but cargo and tanker ships were sometimes misclassified when cargo ships were empty. The researchers concluded YOLOv5 showed significant results but future work could improve classifications of similar subclasses.
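The mAP@0.5 figure reported above counts a predicted box as correct only when it overlaps a ground-truth box with Intersection-over-Union (IoU) of at least 0.5. A minimal pure-Python sketch of that check (all box coordinates below are invented for illustration, not from the paper's dataset):

```python
# Minimal sketch of the IoU test behind the mAP@0.5 metric.
# Boxes are (x1, y1, x2, y2) pixel tuples; values are hypothetical.

def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (empty if the boxes do not intersect).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts toward mAP@0.5 only if IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold

# A predicted ship box that mostly overlaps its ground truth...
print(is_true_positive((10, 10, 110, 60), (15, 12, 112, 62)))    # True
# ...versus one that barely touches it.
print(is_true_positive((10, 10, 110, 60), (100, 55, 200, 120)))  # False
```

Precision/recall curves are then built per class from these true/false positives, and mAP averages the per-class average precisions.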
EMERGENCY VEHICLE SOUND DETECTION SYSTEMS IN TRAFFIC CONGESTION (IRJET Journal)
This document summarizes research on using YOLOv3 object detection to optimize CCTV storage efficiency by selectively recording relevant footage. It proposes integrating YOLOv3, trained on the COCO dataset, with OpenCV's DNN module to enable real-time object detection and classification in traffic camera video. This would allow identifying significant changes between frames and enclosing detected objects with bounding boxes, improving data storage and surveillance analysis. The methodology calculates frame differences, uses YOLOv3 for object detection, and cosine similarity for motion tracking, with integration into OpenCV for efficient deployment. This provides an effective solution for optimizing CCTV storage and object detection.
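The two signals the CCTV approach above relies on are a frame difference (did anything change?) and cosine similarity between frames (is it essentially the same scene?). A toy pure-Python sketch, with frames as hypothetical flat lists of grayscale pixel values rather than real video:

```python
# Toy frame-difference and cosine-similarity checks, as used for
# selective CCTV recording. Frames are hypothetical pixel lists.
import math

def frame_difference(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def cosine_similarity(frame_a, frame_b):
    """Cosine of the angle between the two frames viewed as vectors."""
    dot = sum(a * b for a, b in zip(frame_a, frame_b))
    norm = (math.sqrt(sum(a * a for a in frame_a))
            * math.sqrt(sum(b * b for b in frame_b)))
    return dot / norm if norm else 0.0

still = [10, 10, 10, 200, 200, 10]   # parked scene
moved = [10, 10, 200, 10, 200, 10]   # same pixels, object shifted
# A recorder might only keep frames whose difference exceeds a threshold:
print(frame_difference(still, moved) > 20)        # True -> worth recording
print(round(cosine_similarity(still, still), 3))  # 1.0 -> identical frames
```

In the actual pipeline these checks would run on OpenCV frames, with YOLOv3 applied only to frames that pass the change threshold.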
Vehicle detection and classification are essential for advanced driver assistance systems (ADAS) and even traffic camera surveillance. Yet it is challenging due to complex backgrounds, varying illumination intensities, occlusions, and variations in vehicle size and type. This paper applies You Only Look Once (YOLO), since it has been proven to produce high object detection and classification accuracy. There are various versions of YOLO, and their performances differ; an investigation of the detection and classification performance of YOLOv3, YOLOv4, and YOLOv5 has been conducted. The training images were from Common Objects in Context (COCO) and Open Images, two publicly available datasets. The testing input images were captured on a few highways in two main cities in Malaysia, namely Shah Alam and Kuala Lumpur, using a mobile phone camera with different backgrounds during the day and night, representing different illuminations and varying types and sizes of vehicles. The accuracy and speed of detecting and classifying cars, trucks, buses, motorcycles, and bicycles have been evaluated. The experimental results show that YOLOv5 detects vehicles more accurately but more slowly than its predecessors, YOLOv4 and YOLOv3. Future work includes experimenting with newer versions of YOLO.
Real time pedestrian detection with deformable part models [H. Cho, P. Rybski, ...]
This document summarizes a real-time pedestrian detection system intended for automotive applications. The system uses a deformable part-based model approach and achieves superior detection performance compared to state-of-the-art methods, running at 14 fps. It uses geometric constraints from camera calibration to efficiently search feature pyramids. Evaluation on the Caltech Pedestrian dataset shows a detection rate of 61% with 1 false positive per image, outperforming other methods. The system provides practical pedestrian detection for autonomous and driver-assisted vehicles.
UNDERWATER DETECTION OF ANCIENT POTTERY SHERDS USING DEEP LEARNING (IJCI Journal)
This paper outlines the creation of a machine learning model designed to identify ancient pottery
fragments near a submerged shipwreck off Modi Island, Greece. We trained multiple iterations of the
YOLOv8 model using a custom dataset comprised of underwater videos taken during diving expeditions at
the wreck site. The primary goal of this research is to integrate the resulting object detection system into a
remotely operated vehicle (ROV) for automated pottery shard recognition, thereby aiding archaeological
excavations. The paper elaborates on the model's development methodology and presents comprehensive
experimental and evaluative results. These findings underscore the model's potential to significantly
enhance the efficiency and accuracy of underwater archaeological exploration and analysis.
This document compares the performance of the YOLOv3, YOLOv4, and YOLOv5 object detection algorithms for detecting safe landing spots for faulty UAVs using aerial images. It trains the algorithms on the DOTA dataset, which contains images of objects like planes, ships, and sports fields. The algorithms are evaluated on a PC and a companion computer based on accuracy metrics like mAP and F1 score as well as inference speed measured in FPS. Preliminary results show YOLOv5 achieves the highest accuracy while maintaining a slightly slower speed than YOLOv4 and YOLOv3. The best performing algorithm will be used for emergency landing spot detection on actual faulty UAVs.
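The UAV study above ranks detectors by mAP and F1. F1 is just the harmonic mean of precision and recall; the counts below are made-up illustrations, not results from that paper:

```python
# F1 from raw detection counts: harmonic mean of precision and recall.
# The example counts are hypothetical.

def f1_score(tp, fp, fn):
    """F1 from true-positive / false-positive / false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 90 correct detections, 10 spurious boxes, 30 missed objects:
print(round(f1_score(90, 10, 30), 3))  # 0.818
```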
Vigilance: Vehicle Detector and Tracker (IRJET Journal)
The document proposes a vehicle detection and tracking system using YOLO and DeepSort algorithms. It first discusses challenges in vehicle detection from video frames and reviews existing methods. The proposed system uses YOLO to detect vehicles in frames and DeepSort to track detected vehicles across frames. The system was tested on a dataset with results showing it can detect vehicles in complex traffic environments with potential for cloud-based processing. Future applications discussed include real-time traffic monitoring and intelligent parking management.
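DeepSort's job in the system above is to associate each new detection with an existing track. The real algorithm combines appearance embeddings with Kalman-predicted motion; the core assignment step can be illustrated with a toy greedy IoU matcher (this is a simplification, not DeepSort itself, and all boxes and track IDs are hypothetical):

```python
# Toy detection-to-track association by greedy best-IoU matching.
# DeepSort proper also uses appearance features and motion prediction.

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Greedily match detections to tracks by best IoU."""
    matches, free = {}, set(range(len(detections)))
    for tid, tbox in tracks.items():
        best, best_j = min_iou, None
        for j in free:
            score = iou(tbox, detections[j])
            if score > best:
                best, best_j = score, j
        if best_j is not None:
            matches[tid] = best_j
            free.remove(best_j)
    return matches, free  # free = unmatched detections -> new tracks

tracks = {1: (0, 0, 50, 30), 2: (100, 100, 160, 140)}
detections = [(102, 103, 161, 141), (5, 2, 54, 31), (300, 300, 340, 330)]
matches, new = associate(tracks, detections)
print(matches)  # {1: 1, 2: 0}
print(new)      # {2} -> a newly appeared vehicle
```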
The document discusses using AI and computer vision for automating testing. Specifically, it explores using the YOLO object detection model and BDD testing framework for vehicle detection. YOLO can detect objects like vehicles in images and video in real-time with high accuracy. BDD allows testing YOLO's vehicle detection capabilities through user stories and steps defined in a feature file. While computer vision automation provides benefits like improved accuracy and speed, challenges include ensuring high quality training data and integrating new techniques with existing processes.
The document introduces the Universal-Scale Object Detection Benchmark (USB) which aims to evaluate object detection methods on their ability to detect objects of various scales across different image domains. USB consists of the COCO, Waymo Open, and Manga109-s datasets to cover natural, traffic, and manga image domains with diverse object scales. The document proposes USB protocols for training and evaluation, including multiple divisions for training epochs and test resolutions, to enable fair comparisons while allowing strong results from longer training.
The document discusses 6 research papers related to vehicle detection using deep learning algorithms. The papers cover topics like real-time vehicle detection using YOLOv5, object detection for autonomous vehicles using YOLOv4, optimized YOLOv2 for tiny vehicle detection, vehicle detection from UAV imagery using Faster R-CNN and RetinaNet, instance segmentation using Mask R-CNN, and vehicle tracking with YOLO and DeepSORT. Accuracy rates between 67-94% are reported for the different algorithms on datasets like KITTI and BDD100K. Limitations mentioned include small datasets and inherent constraints of the algorithms.
You only look once model-based object identification in computer vision (IAESIJAI)
You Only Look Once version 4 (YOLOv4) is a deep-learning object detection algorithm. It decreases parameters and simplifies network structures, making it suited for mobile and embedded device development. The YOLO detector predicts an object's class, its bounding box, and the probability of that object's class being found inside that bounding box; a probability value for each bounding box represents the likelihood of a given item class in that box. Global features, channel attention, and spatial attention are also applied to extract more compelling information. Finally, the model combines the auxiliary and backbone networks to create YOLOv4's entire network topology. Using custom functions developed on top of YOLOv4, we obtain the count of the objects and a crop around each detected object, together with a confidence score that specifies the probability that the detected object belongs to the class predicted by YOLOv4. A confidence threshold is implemented to eliminate detections with low confidence.
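The confidence-threshold and crop step described above reduces to filtering a detection list and clamping crop boxes to the image bounds. A hedged sketch, with invented (class, confidence, box) tuples rather than actual YOLOv4 output parsing:

```python
# Sketch of post-detection filtering and cropping. Detection tuples
# are hypothetical, not real YOLOv4 output.

def filter_and_crop(detections, image_w, image_h, conf_threshold=0.5):
    """Drop low-confidence detections and clamp crop boxes to the image."""
    kept = []
    for cls, conf, (x1, y1, x2, y2) in detections:
        if conf < conf_threshold:
            continue  # eliminate detections with low confidence
        crop = (max(0, x1), max(0, y1), min(image_w, x2), min(image_h, y2))
        kept.append((cls, conf, crop))
    return kept

dets = [("car", 0.91, (10, 20, 120, 90)),
        ("car", 0.32, (300, 40, 380, 100)),   # below threshold -> dropped
        ("person", 0.77, (-5, 10, 60, 200))]  # box clamped to image bounds
kept = filter_and_crop(dets, image_w=640, image_h=180)
print(len(kept))   # 2
print(kept[1][2])  # (0, 10, 60, 180)
```

The object count the abstract mentions is then simply `len(kept)`, and each clamped box gives the pixel region to crop.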
This document discusses using YOLO (You Only Look Once) models for object detection from images and video captured by drones. It begins with an introduction to road accidents and the need for improved monitoring of roads. Next, it reviews previous work using YOLO and other methods for object detection tasks. The document then discusses using YOLOv4 and a deep learning approach with Python and OpenCV to perform real-time object detection from drone footage to help identify accidents. In the conclusion, it restates that the goal is to use YOLO models to enable improved monitoring of roads and faster emergency response through computer vision-based object detection.
IRJET - A Real Time Yolo Human Detection in Flood Affected Areas based on Vide... (IRJET Journal)
This document proposes a method for real-time human detection in flood-affected areas using video content analysis and the YOLO object detection algorithm. It trains YOLO on the COCO Human dataset to detect and localize humans in video frames from surveillance cameras. The results show that YOLO can accurately detect multiple humans, even with occlusion, and single humans under varying illumination. This approach aims to help rescue operations quickly identify affected areas and prioritize aid.
Application of improved you only look once model in road traffic monitoring ... (IJECEIAES)
The present research focuses on developing an intelligent traffic management solution for tracking vehicles on roads. Our proposed work focuses on an improved you only look once (YOLOv4) traffic monitoring system that uses the CSPDarknet53 architecture as its foundation. A Deep SORT learning methodology for vehicle multi-target detection from traffic video is also part of our research study. We have included features like the Kalman filter, which estimates the state of unknown objects and can track moving targets, while the Hungarian technique identifies the correct frame for each object. We use an enhanced object detection network design and new data augmentation techniques with YOLOv4, which ultimately aids traffic monitoring. Until recently, object identification models could either perform accurately or draw conclusions quickly, so YOLOv4's remarkably good performance at a very high frames per second (FPS) was a big improvement. The current study develops an intelligent video-surveillance-based vehicle tracking system that tracks vehicles using a neural network, image-based tracking, and YOLOv4. Real video sequences of road traffic are used to test the effectiveness of the suggested method. Through simulations, it is demonstrated that the suggested technique significantly increases graphics processing unit (GPU) speed and FPS as compared to baseline algorithms.
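The tracker above leans on a Kalman filter to predict where a vehicle will be in the next frame. A minimal 1-D constant-velocity filter (position along the lane) shows the predict/update cycle; the noise values and measurements are illustrative assumptions, not the paper's parameters:

```python
# Minimal 1-D constant-velocity Kalman filter: predict, then correct
# with each noisy position measurement. All numbers are illustrative.

def kalman_1d(measurements, dt=1.0, q=1e-3, r=1.0):
    """Track [position, velocity]; returns the filtered positions."""
    x, v = measurements[0], 0.0          # state estimate
    p00, p01, p11 = 1.0, 0.0, 1.0        # covariance entries
    out = []
    for z in measurements:
        # Predict: constant-velocity motion model.
        x = x + v * dt
        p00 = p00 + dt * (2 * p01 + dt * p11) + q
        p01 = p01 + dt * p11
        p11 = p11 + q
        # Update: we observe position only.
        innovation = z - x
        s = p00 + r
        k0, k1 = p00 / s, p01 / s        # Kalman gains
        x += k0 * innovation
        v += k1 * innovation
        p00, p01, p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
        out.append(x)
    return out

# A vehicle moving ~2 px/frame with noisy position readings:
noisy = [0.0, 2.3, 3.8, 6.1, 8.2, 9.9, 12.1]
smoothed = kalman_1d(noisy)
print(all(abs(s - z) < 2.0 for s, z in zip(smoothed, noisy)))  # True
```

In a 2-D tracker the same cycle runs on box centers (and often box scale), with the predicted position feeding the detection-to-track association step.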
Custom Object Detection Using YOLO Integrated with a Segment Anything Model (IRJET Journal)
This document presents a methodology for custom object detection and segmentation that integrates the YOLO object detection model with the Segment Anything Model (SAM).
The proposed approach combines the real-time performance of YOLO with the pixel-level accuracy of SAM. It trains these models on a custom dataset containing images and annotations of specific objects. Experimental results demonstrate that this integrated system achieves higher accuracy than the individual models. The qualitative analysis of video and image outputs also validates the ability of this approach to accurately detect and segment custom objects.
An Analysis of Various Deep Learning Algorithms for Image Processing (vivatechijri)
The various applications of image processing have given it a wide scope in data analysis. Machine learning algorithms provide a powerful environment for effectively training models to identify the entities in images and segment them accordingly. While image classifiers like Support Vector Machines (SVM) or Random Forest algorithms do justice to the task, deep learning algorithms such as Artificial Neural Networks (ANN) and their descendants, notably the well-known and extremely powerful Convolutional Neural Network (CNN), can provide a new dimension to the image processing domain, with far higher accuracy and the computational power to classify images and segregate their entities as individual components of the image working region. The major focus is on the Region-based Convolutional Neural Network (R-CNN) algorithm and how well its successors, the Fast, Faster, and Mask R-CNN versions, provide pixel-level segmentation.
Intelligent Traffic Light Control System (IRJET Journal)
This document proposes an Intelligent Traffic Light Control System (ITLCS) that uses cameras and deep learning to classify vehicles and dynamically adjust traffic light timings based on real-time traffic conditions. The system aims to reduce average wait times and account for changes in traffic to ensure optimal traffic flow and safety. It would require object detection using data acquisition and training a deep learning model to identify vehicle classes. Implementing ITLCS could address traffic congestion issues and reduce accidents at intersections by providing a more efficient alternative to traditional static traffic control systems.
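The ITLCS abstract above does not spell out its timing rule. One plausible way to turn per-approach vehicle counts into dynamic signal timings is proportional allocation with minimum and maximum green times; the sketch below is an assumption for illustration, not the paper's method, and all numbers are invented:

```python
# Hypothetical proportional green-time allocation for a fixed signal cycle.
# Not the ITLCS paper's actual timing rule.

def green_times(counts, cycle=120, g_min=10, g_max=60):
    """Split a fixed cycle among approaches in proportion to demand."""
    total = sum(counts.values())
    times = {}
    for approach, n in counts.items():
        share = cycle * n / total if total else cycle / len(counts)
        # Clamp so no approach is starved or monopolizes the cycle.
        times[approach] = max(g_min, min(g_max, round(share)))
    return times

# Detected vehicles per approach from the camera/classifier stage:
print(green_times({"north": 24, "south": 6, "east": 18, "west": 12}))
# {'north': 48, 'south': 12, 'east': 36, 'west': 24}
```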
Real Time Object Detection System with YOLO and CNN Models: A Review (Springer)
Object detection techniques are foundational to the field of artificial intelligence. This research survey briefly describes the You Only Look Once (YOLO) algorithm and its more evolved versions, together with convolutional neural networks (CNNs), in the direction of real-time object detection. YOLO performs generalized object representation more effectively, without precision losses, than other object detection models. CNN architecture models have the ability to extract features and identify objects in any given image. When implemented appropriately, CNN models can address issues like deformity diagnosis and creating educational or instructive applications. This article reaches a number of observations and perspective findings through its analysis. It also provides support for focused visual information and feature extraction in the financial and other industries, highlights methods of target detection and feature selection, and briefly describes the development process of the YOLO algorithm.
This document discusses the development of a face mask detection system using YOLOv4. The system uses a deep learning model with YOLOv4 to detect faces in real-time video and determine if each person is wearing a mask or not. It is trained on images of faces with and without masks. The model uses CSPDarknet53 as the backbone network and PANet for feature aggregation. It is implemented with OpenCV and a Python GUI for a user interface. The goal is to help enforce mask mandates and alert authorities if too many people in an area are not wearing masks.
IRJET - Object Detection and Recognition using Single Shot Multi-Box Detector (IRJET Journal)
This document summarizes research on object detection and recognition using the Single Shot Multi-Box Detector (SSD) deep learning model. SSD improves on existing object detection systems by eliminating the need for generating object proposals and resampling pixels or features, thereby making detection faster and encapsulating all computation in a single neural network. The researchers applied SSD to standard datasets like PASCAL VOC, COCO, and ILSVRC and achieved competitive accuracy compared to methods using additional proposal steps, with SSD running significantly faster at 59 FPS. Experimental results on PASCAL VOC using SSD achieved a mean average precision of 74.3%, outperforming a comparable Faster R-CNN model.
VEHICLE DETECTION USING YOLO V3 FOR COUNTING THE VEHICLES AND TRAFFIC ANALYSIS (IRJET Journal)
This document discusses using YOLOv3 for vehicle detection and counting from video to analyze traffic. Video frames are used to identify moving vehicles and background extraction is applied to each frame to detect and count vehicles. YOLOv3 with a pre-trained model is used for object detection and classification of vehicles into classes like car, bus, motorcycle. Classification is shown for vehicles and individual types to analyze traffic levels. The analysis of vehicle levels is displayed using a pie chart.
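The counting-and-pie-chart step above reduces to tallying per-class detections and converting them to percentage shares. A short sketch with hypothetical (class, confidence) pairs rather than real YOLOv3 output:

```python
# Tally vehicle detections per class and compute the percentage shares
# that would feed a pie chart. Detections are hypothetical.
from collections import Counter

def traffic_breakdown(detections, conf_threshold=0.5):
    """Count detected vehicles per class and report percentage shares."""
    counts = Counter(cls for cls, conf in detections if conf >= conf_threshold)
    total = sum(counts.values())
    shares = {cls: round(100 * n / total, 1) for cls, n in counts.items()}
    return counts, shares

dets = [("car", 0.9), ("car", 0.8), ("bus", 0.7),
        ("motorcycle", 0.6), ("car", 0.4)]  # last one below threshold
counts, shares = traffic_breakdown(dets)
print(dict(counts))  # {'car': 2, 'bus': 1, 'motorcycle': 1}
print(shares)        # {'car': 50.0, 'bus': 25.0, 'motorcycle': 25.0}
```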
Person Detection in Maritime Search And Rescue Operations (IRJET Journal)
This document discusses recent research on using computer vision and machine learning techniques for person detection in maritime search and rescue operations from images and video captured by drones. Specifically, it summarizes 12 research papers on this topic, covering approaches such as training convolutional neural networks on bird's eye view datasets to detect people from aerial images, using multiple detection methods like sliding windows and precise localization, combining data from multiple drones and sensors to optimize search efforts, and evaluating models on both RGB and thermal image datasets. The goal of this research is to automate part of the search process to make maritime rescue operations more efficient and effective.
Person Detection in Maritime Search And Rescue Operations (IRJET Journal)
1) The document discusses using machine learning and computer vision techniques for person detection in maritime search and rescue operations using drones/UAVs. It aims to automatically detect people in images/videos captured by drones to help with search efforts.
2) A key challenge is that people appear small in drone footage and are often obscured by vegetation or terrain. The models need to be trained on similar bird's eye view data to achieve high accuracy. The document reviews different person detection models and their use in search and rescue.
3) It discusses recent work involving using efficient neural networks like MobileNet for object detection from drones. Other work involves using depth sensors and pose estimation for person tracking, as well as using distributed deep learning
Pothole Detection for Safer Commutes with the help of Deep learning and IOT D... (IRJET Journal)
This document describes a proposed approach to detect potholes in real-time using deep learning and IoT devices. The approach involves 1) developing a device integrated into vehicles to continuously scan road surfaces and detect potholes, 2) using GPS to determine locations of detected potholes, and 3) linking the database of pothole locations to mapping software for access by authorities and users. The method uses a YOLO v8 deep learning algorithm to identify potholes from images captured by onboard cameras.
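The abstract above links detected potholes to GPS locations but does not describe how duplicate reports are merged. A common approach, assumed here for illustration, is to treat two reports within a few metres as the same pothole, using the haversine distance; the coordinates below are invented:

```python
# Haversine distance between GPS fixes, used to decide whether two
# pothole reports refer to the same pothole. Coordinates are invented.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def same_pothole(fix_a, fix_b, radius_m=5.0):
    """Merge reports closer than radius_m metres."""
    return haversine_m(*fix_a, *fix_b) <= radius_m

a = (12.97160, 77.59460)
b = (12.97162, 77.59461)   # a couple of metres away -> same pothole
c = (12.97300, 77.59460)   # ~150 m away -> a different one
print(same_pothole(a, b))  # True
print(same_pothole(a, c))  # False
```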
YOLOv5 BASED WEB APPLICATION FOR INDIAN CURRENCY NOTE DETECTION (IRJET Journal)
This document presents a web application designed using YOLOv5 and Flask for detecting Indian currency notes to aid visually impaired people. The researchers trained a YOLOv5 model on a dataset of Indian currency note images. They evaluated the model's performance using metrics like precision, recall, and mean average precision (mAP). They then built a web app with front-end components for uploading images and back-end components using Flask and YOLOv5 for detecting notes in images. The app detects notes with over 90% probability and outputs the label and an audio file of the label in English and Hindi. Testing showed the model and app could accurately detect currency notes in single and multiple denomination images on both laptop and
Partial Object Detection in Inclined Weather Conditions (IRJET Journal)
This document provides a comprehensive analysis of imbalance problems in object detection. It presents a taxonomy to classify different types of imbalances and discusses solutions proposed in literature. The analysis highlights significant gaps including existing imbalances that require further attention, as well as entirely new imbalances that have never been addressed before. A survey of imbalance problems caused by weather conditions and common object imbalances is conducted. Methods for addressing imbalances include data augmentation using GANs and balancing training based on class performance.
The document discusses implementing deep learning algorithms for object detection and scene perception in self-driving cars. It compares the YOLO and Faster R-CNN models, finding that Faster R-CNN has higher accuracy (mAP of 41.8) but lower speed (17.1 FPS), while YOLO has lower accuracy (mAP of 18.6) but higher speed (212.4 FPS). The authors conclude that achieving both high accuracy and high speed remains a goal for future work, which could explore using newer versions of YOLO or other models.
Air quality forecasting using convolutional neural networks (BIJIAM Journal)
Air pollution is now one of the biggest environmental risks, causing more than 6 million premature deaths each year from heart disease, stroke, diabetes, respiratory disease, and so on. Protecting humans from the damage caused by air pollution is one of the major issues for the global community. The prediction of air pollution can be done by machine learning (ML) algorithms. ML combines statistics and computer science to maximize prediction power, and can also be used to predict the air quality index (AQI). The aim of this research is to develop a convolutional neural network (CNN) model to predict air quality from an unseen data set, which includes concentrations of nitrogen dioxide (NO2), carbon monoxide (CO), and sulfur dioxide (SO2). The proposed system is implemented in two steps: the first step focuses on data analysis and pre-processing, including filtering, feature extraction, constructing the convolutional neural network layers, and optimizing the parameters of each layer, while the second step evaluates the model's accuracy. The output is predicted as the AQI for the developed CNN model. The developed CNN model achieves a root-mean-square error of 13.4150 and a high accuracy of 86.585%. The overall model is implemented using MATLAB software.
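The CNN abstract above reports a root-mean-square error of 13.4150 on AQI predictions. RMSE itself is straightforward to compute; the predicted and actual AQI values below are invented for illustration, not the paper's data:

```python
# Root-mean-square error between predicted and actual AQI sequences.
# The sample values are hypothetical.
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

pred = [52.0, 61.0, 48.0, 95.0]
true = [50.0, 64.0, 45.0, 90.0]
print(round(rmse(pred, true), 3))  # 3.428
```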
Prevention of fire and hunting in forests using machine learning for sustaina... (BIJIAM Journal)
Deforestation, illegal hunting, and forest fires are a few current issues that have an impact on the diversity and ecosystem of forests. To increase the biodiversity of species and ecosystems, it becomes imperative to preserve the forest. The conventional techniques employed to prevent these issues are costly, less effective, and insecure. The current systems are unreliable and use more energy. By utilizing an Internet of Things (IoT) system, this technology offers a more practical and economical method of continuously maintaining and monitoring the status of the forest. To guarantee excellent security, this system combines a number of sensors, alarms, cameras, lights, and microphones. It aids in reducing forest loss, animal trafficking, and forest fires. In the suggested system, sensors are used for monitoring, and cloud storage is used for data storage. Through the use of machine learning, the Raspberry Pi camera module significantly aids in the prevention of unlawful wildlife hunting as well as the detection and prevention of forest fires.
Similar to Object detection and ship classification using YOLO and Amazon Rekognition
The document discusses using AI and computer vision for automating testing. Specifically, it explores using the YOLO object detection model and BDD testing framework for vehicle detection. YOLO can detect objects like vehicles in images and video in real-time with high accuracy. BDD allows testing YOLO's vehicle detection capabilities through user stories and steps defined in a feature file. While computer vision automation provides benefits like improved accuracy and speed, challenges include ensuring high quality training data and integrating new techniques with existing processes.
The document introduces the Universal-Scale Object Detection Benchmark (USB) which aims to evaluate object detection methods on their ability to detect objects of various scales across different image domains. USB consists of the COCO, Waymo Open, and Manga109-s datasets to cover natural, traffic, and manga image domains with diverse object scales. The document proposes USB protocols for training and evaluation, including multiple divisions for training epochs and test resolutions, to enable fair comparisons while allowing strong results from longer training.
The document reviews six research papers on vehicle detection using deep learning algorithms. The papers cover real-time vehicle detection using YOLOv5, object detection for autonomous vehicles using YOLOv4, an optimized YOLOv2 for tiny-vehicle detection, vehicle detection from UAV imagery using Faster R-CNN and RetinaNet, instance segmentation using Mask R-CNN, and vehicle tracking with YOLO and DeepSORT. Reported accuracy rates range from 67% to 94% on datasets such as KITTI and BDD100K. Limitations mentioned include small datasets and inherent constraints of the algorithms.
You only look once model-based object identification in computer vision (IAESIJAI)
You Only Look Once version 4 (YOLOv4) is a deep-learning object detection algorithm. It reduces parameters and simplifies the network structure, making it suited for mobile and embedded development. The YOLO detector predicts an object's class, its bounding box, and the probability that an object of that class is found inside the bounding box; each bounding box carries a probability value representing the likelihood of a given object class within it. Global features, channel attention, and spatial attention are also applied to extract more compelling information. Finally, the model combines the auxiliary and backbone networks to create YOLOv4's complete network topology. Using custom functions built on YOLOv4, the authors obtain a count of the objects and a crop around each detected object, with a confidence score giving the probability that the detected object belongs to the predicted class. A confidence threshold is applied to eliminate detections with low confidence.
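The confidence-threshold, counting, and cropping steps described above can be sketched in a few lines. The detection tuples below are hypothetical stand-ins for a detector's output (real YOLOv4 output layouts differ); boxes are (x1, y1, x2, y2) pixel corners.

```python
# Minimal sketch of post-processing detector outputs: filter by confidence,
# count per class, and clamp a crop region to the image bounds.

CONF_THRESHOLD = 0.5  # detections below this confidence are discarded (assumed value)

def filter_detections(detections, threshold=CONF_THRESHOLD):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d[1] >= threshold]

def count_by_class(detections):
    """Count surviving detections per class, as the custom counting function does."""
    counts = {}
    for name, _conf, _box in detections:
        counts[name] = counts.get(name, 0) + 1
    return counts

def crop_box(image_w, image_h, box):
    """Clamp a detection box to the image bounds, yielding the crop region."""
    x1, y1, x2, y2 = box
    return (max(0, x1), max(0, y1), min(image_w, x2), min(image_h, y2))

dets = [("car", 0.92, (10, 20, 110, 90)), ("car", 0.31, (0, 0, 30, 30)),
        ("person", 0.77, (-5, 40, 60, 200))]
kept = filter_detections(dets)
print(count_by_class(kept))            # {'car': 1, 'person': 1}
print(crop_box(320, 180, kept[1][2]))  # (0, 40, 60, 180)
```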
This document discusses using YOLO (You Only Look Once) models for object detection from images and video captured by drones. It begins with an introduction to road accidents and the need for improved monitoring of roads. Next, it reviews previous work using YOLO and other methods for object detection tasks. The document then discusses using YOLOv4 and a deep learning approach with Python and OpenCV to perform real-time object detection from drone footage to help identify accidents. In the conclusion, it restates that the goal is to use YOLO models to enable improved monitoring of roads and faster emergency response through computer vision-based object detection.
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide... (IRJET Journal)
This document proposes a method for real-time human detection in flood-affected areas using video content analysis and the YOLO object detection algorithm. It trains YOLO on the COCO Human dataset to detect and localize humans in video frames from surveillance cameras. The results show that YOLO can accurately detect multiple humans, even with occlusion, and single humans under varying illumination. This approach aims to help rescue operations quickly identify affected areas and prioritize aid.
Application of improved you only look once model in road traffic monitoring ... (IJECEIAES)
The present research focuses on developing an intelligent traffic-management solution for tracking vehicles on roads. The proposed work builds an improved You Only Look Once (YOLOv4) traffic monitoring system that uses the CSPDarknet53 architecture as its foundation. A DeepSORT learning methodology for multi-target vehicle detection from traffic video is also part of the study. Included features are a Kalman filter, which estimates the state of unknown objects and can track moving targets, while the Hungarian algorithm associates each object with the correct frame. Enhanced object-detection network design and new data augmentation techniques are used with YOLOv4, which ultimately aids traffic monitoring. Until recently, object detection models could be either fast or accurate, but not both; YOLOv4 was a big improvement, with astoundingly good performance at a very high frame rate (FPS). The current study develops an intelligent video-surveillance-based vehicle tracking system that tracks vehicles using a neural network, image-based tracking, and YOLOv4. Real video sequences of road traffic are used to test the effectiveness of the suggested method. Simulations demonstrate that the technique significantly increases graphics processing unit (GPU) speed and FPS compared to baseline algorithms.
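The Kalman-filter tracking idea above can be illustrated with a minimal one-dimensional constant-velocity filter, showing the predict/update cycle that trackers such as DeepSORT apply per target. All noise parameters and measurements below are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: 1-D constant-velocity Kalman filter, one predict + update cycle.
# State is (position, velocity); p is the scalar uncertainty of the position.

def kalman_step(x, v, p, z, dt=1.0, q=1e-3, r=0.5):
    """x, v: current estimate; z: new measurement; q, r: assumed noise levels."""
    # Predict: move the state forward under the constant-velocity model.
    x_pred = x + v * dt
    p_pred = p + q
    # Update: blend prediction and measurement by the Kalman gain.
    k = p_pred / (p_pred + r)           # gain in [0, 1]
    x_new = x_pred + k * (z - x_pred)   # corrected position
    v_new = v + k * (z - x_pred) / dt   # crude velocity correction
    p_new = (1 - k) * p_pred
    return x_new, v_new, p_new

x, v, p = 0.0, 1.0, 1.0
for z in [1.1, 2.0, 2.9, 4.2]:  # noisy positions of a target moving ~1 unit/frame
    x, v, p = kalman_step(x, v, p, z)
print(round(x, 2), round(v, 2))
```

After four noisy measurements, the position estimate tracks the target closely and the uncertainty `p` shrinks; a full 2-D tracker applies the same cycle to box coordinates.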
Custom Object Detection Using YOLO Integrated with a Segment Anything Model (IRJET Journal)
This document presents a methodology for custom object detection and segmentation that integrates the YOLO object detection model with the Segment Anything Model (SAM).
The proposed approach combines the real-time performance of YOLO with the pixel-level accuracy of SAM. It trains these models on a custom dataset containing images and annotations of specific objects. Experimental results demonstrate that this integrated system achieves higher accuracy than the individual models. The qualitative analysis of video and image outputs also validates the ability of this approach to accurately detect and segment custom objects.
An Analysis of Various Deep Learning Algorithms for Image Processing (vivatechijri)
The various applications of image processing give it a wide scope in data analysis. Machine learning algorithms provide a powerful environment for effectively training models to identify the various entities of images and segment them accordingly. While image classifiers such as Support Vector Machines (SVM) and Random Forest algorithms do justice to the task, deep learning algorithms such as Artificial Neural Networks (ANN) and, in particular, the well-known and extremely powerful Convolutional Neural Network (CNN) give a new dimension to the image processing domain, with far higher accuracy and computational power for classifying images and segregating their entities as individual components of the working region. The major focus is on the Region-based Convolutional Neural Network (R-CNN) algorithm and how well it provides pixel-level segmentation through its better successors, the Fast, Faster, and Mask R-CNN versions.
Intelligent Traffic Light Control System (IRJET Journal)
This document proposes an Intelligent Traffic Light Control System (ITLCS) that uses cameras and deep learning to classify vehicles and dynamically adjust traffic light timings based on real-time traffic conditions. The system aims to reduce average wait times and account for changes in traffic to ensure optimal traffic flow and safety. It would require object detection using data acquisition and training a deep learning model to identify vehicle classes. Implementing ITLCS could address traffic congestion issues and reduce accidents at intersections by providing a more efficient alternative to traditional static traffic control systems.
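The dynamic green-time idea above can be sketched as allocating each approach's green phase in proportion to its detected vehicle count, within fixed bounds. The timing constants and approach names are illustrative assumptions, not values from the ITLCS proposal.

```python
# Hedged sketch: proportional green-time allocation from per-approach vehicle counts.

MIN_GREEN, MAX_GREEN, CYCLE = 10, 60, 120  # seconds (assumed values)

def green_times(counts):
    """Split a fixed cycle across approaches proportionally to vehicle counts,
    clamped to [MIN_GREEN, MAX_GREEN] per approach."""
    total = sum(counts.values())
    if total == 0:  # no traffic detected: share the cycle equally
        return {k: CYCLE // len(counts) for k in counts}
    return {k: max(MIN_GREEN, min(MAX_GREEN, round(CYCLE * c / total)))
            for k, c in counts.items()}

print(green_times({"north": 12, "south": 4, "east": 2, "west": 2}))
# {'north': 60, 'south': 24, 'east': 12, 'west': 12}
```

The busy northbound approach is capped at `MAX_GREEN`, keeping minor approaches from being starved; a deployed controller would add pedestrian phases and safety interlocks.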
Real Time Object Detection System with YOLO and CNN Models: A Review (Springer)
Object detection techniques are foundational to the field of artificial intelligence. This research survey briefly describes the You Only Look Once (YOLO) algorithm and its more evolved versions, alongside convolutional neural networks (CNN), in the direction of real-time object detection. YOLO performs generalized object representation more effectively, and without precision losses, than other object detection models. CNN architecture models have the ability to extract features and identify objects in any given image; when implemented appropriately, they can address issues such as deformity diagnosis and educational or instructive applications. The article reaches a number of observations and perspective findings through its analysis. It also provides support for focused visual information and feature extraction in the financial and other industries, highlights methods of target detection and feature selection, and briefly describes the development process of the YOLO algorithm.
This document discusses the development of a face mask detection system using YOLOv4. The system uses a deep learning model with YOLOv4 to detect faces in real-time video and determine if each person is wearing a mask or not. It is trained on images of faces with and without masks. The model uses CSPDarknet53 as the backbone network and PANet for feature aggregation. It is implemented with OpenCV and a Python GUI for a user interface. The goal is to help enforce mask mandates and alert authorities if too many people in an area are not wearing masks.
IRJET- Object Detection and Recognition using Single Shot Multi-Box Detector (IRJET Journal)
This document summarizes research on object detection and recognition using the Single Shot Multi-Box Detector (SSD) deep learning model. SSD improves on existing object detection systems by eliminating the need for generating object proposals and resampling pixels or features, thereby making detection faster and encapsulating all computation in a single neural network. The researchers applied SSD to standard datasets like PASCAL VOC, COCO, and ILSVRC and achieved competitive accuracy compared to methods using additional proposal steps, with SSD running significantly faster at 59 FPS. Experimental results on PASCAL VOC using SSD achieved a mean average precision of 74.3%, outperforming a comparable Faster R-CNN model.
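The mean-average-precision figures quoted above rest on intersection-over-union (IoU), the overlap criterion used when matching detections such as SSD's against ground truth (commonly, IoU >= 0.5 counts as a match). A minimal sketch, with boxes as (x1, y1, x2, y2) corners:

```python
# Minimal IoU sketch: the box-overlap measure underlying detection evaluation.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # 0.3333... (50 / 150)
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0
```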
VEHICLE DETECTION USING YOLO V3 FOR COUNTING THE VEHICLES AND TRAFFIC ANALYSIS (IRJET Journal)
This document discusses using YOLOv3 for vehicle detection and counting from video to analyze traffic. Video frames are used to identify moving vehicles and background extraction is applied to each frame to detect and count vehicles. YOLOv3 with a pre-trained model is used for object detection and classification of vehicles into classes like car, bus, motorcycle. Classification is shown for vehicles and individual types to analyze traffic levels. The analysis of vehicle levels is displayed using a pie chart.
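The counting step above can be sketched by tallying detected vehicle classes per frame and aggregating them into the totals a pie chart would display. The detection tuples below are illustrative stand-ins for YOLOv3 outputs, and the 0.5 confidence cutoff is an assumption.

```python
# Hedged sketch: aggregate per-frame vehicle detections into per-class totals.

from collections import Counter

def count_vehicles(frames, min_conf=0.5):
    """frames: list of per-frame detection lists [(class_name, confidence), ...]."""
    totals = Counter()
    for detections in frames:
        totals.update(name for name, conf in detections if conf >= min_conf)
    return dict(totals)

frames = [[("car", 0.9), ("bus", 0.8)], [("car", 0.7), ("motorcycle", 0.4)]]
print(count_vehicles(frames))  # {'car': 2, 'bus': 1}
```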
Person Detection in Maritime Search And Rescue Operations (IRJET Journal)
This document discusses recent research on using computer vision and machine learning techniques for person detection in maritime search and rescue operations from images and video captured by drones. Specifically, it summarizes 12 research papers on this topic, covering approaches such as training convolutional neural networks on bird's eye view datasets to detect people from aerial images, using multiple detection methods like sliding windows and precise localization, combining data from multiple drones and sensors to optimize search efforts, and evaluating models on both RGB and thermal image datasets. The goal of this research is to automate part of the search process to make maritime rescue operations more efficient and effective.
Person Detection in Maritime Search And Rescue Operations (IRJET Journal)
1) The document discusses using machine learning and computer vision techniques for person detection in maritime search and rescue operations using drones/UAVs. It aims to automatically detect people in images/videos captured by drones to help with search efforts.
2) A key challenge is that people appear small in drone footage and are often obscured by vegetation or terrain. The models need to be trained on similar bird's eye view data to achieve high accuracy. The document reviews different person detection models and their use in search and rescue.
3) It discusses recent work involving using efficient neural networks like MobileNet for object detection from drones. Other work involves using depth sensors and pose estimation for person tracking, as well as using distributed deep learning
Pothole Detection for Safer Commutes with the help of Deep learning and IOT D... (IRJET Journal)
This document describes a proposed approach to detect potholes in real-time using deep learning and IoT devices. The approach involves 1) developing a device integrated into vehicles to continuously scan road surfaces and detect potholes, 2) using GPS to determine locations of detected potholes, and 3) linking the database of pothole locations to mapping software for access by authorities and users. The method uses a YOLO v8 deep learning algorithm to identify potholes from images captured by onboard cameras.
YOLOv5 BASED WEB APPLICATION FOR INDIAN CURRENCY NOTE DETECTION (IRJET Journal)
This document presents a web application designed using YOLOv5 and Flask for detecting Indian currency notes to aid visually impaired people. The researchers trained a YOLOv5 model on a dataset of Indian currency note images. They evaluated the model's performance using metrics like precision, recall, and mean average precision (mAP). They then built a web app with front-end components for uploading images and back-end components using Flask and YOLOv5 for detecting notes in images. The app detects notes with over 90% probability and outputs the label and an audio file of the label in English and Hindi. Testing showed the model and app could accurately detect currency notes in single and multiple denomination images on both laptop and
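The evaluation metrics named above can be sketched directly: precision and recall from true/false positive and false negative counts, plus mAP as the mean of per-class average precisions. The counts and class names below are illustrative only, not from the currency-note study.

```python
# Hedged sketch of detection evaluation metrics: precision, recall, and mAP.

def precision(tp, fp):
    """Fraction of predicted positives that were correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Fraction of actual positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0

def mean_ap(per_class_ap):
    """mAP is the mean of the per-class average-precision values."""
    return sum(per_class_ap.values()) / len(per_class_ap)

print(precision(90, 10))  # 0.9
print(recall(90, 30))     # 0.75
# Hypothetical per-class AP values for two denominations.
print(round(mean_ap({"10_rupee": 0.95, "50_rupee": 0.85}), 2))  # 0.9
```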
Partial Object Detection in Inclined Weather Conditions (IRJET Journal)
Machine learning for autonomous online exam fraud detection: A concept design (BIJIAM Journal)
E-learning (EL) has emerged as one of the most valuable means for continuing education around the world, especially in the aftermath of the global pandemic that included a variety of obstacles. Real-time online assessments have become a significant concern for educational organizations. Instances of fraudulent behavior during online exams (OEs) have created considerable challenges for exam invigilators, who are unable to identify and remove such dishonest behavior. In response to this significant issue, educational institutions have used a variety of manual procedures to alleviate the situation, but none of these measures have shown to be particularly innovative or effective. The current study presents a novel strategy for detecting fraudulent actions in real time during OEs that uses convolutional neural network (CNN) algorithms and image processing. The development model will be trained using the CK and CK++ datasets. The training procedure will use 80% of the selected dataset, with the remaining 20% used for model testing to confirm the model’s efficacy and generalization capacity. This project intends to revolutionize the monitoring and prevention of fraudulent actions during online tests by integrating CNN techniques and image processing. The use of CK and CK++ datasets, as well as an 80–20 split for training and testing, contributes to the study’s thorough and rigorous approach. Educational institutions can improve their assessment procedures and maintain the credibility of EL as a credible and equitable way of continuing education by successfully using this unique technique.
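The 80-20 split described above can be sketched as shuffling the sample indices with a fixed seed and cutting at 80% for training. The dataset here is a placeholder list of file names, not the CK/CK++ images themselves.

```python
# Hedged sketch: reproducible 80/20 train-test split over a list of samples.

import random

def train_test_split(samples, train_frac=0.8, seed=42):
    """Shuffle indices with a fixed seed, then cut at train_frac."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)   # seeded for reproducibility
    cut = int(len(samples) * train_frac)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test

data = [f"frame_{i:03d}.png" for i in range(100)]  # hypothetical file names
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

Seeding the shuffle keeps the split identical across runs, which matters when comparing model variants on the same held-out set.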
Prognostication of the placement of students applying machine learning algori... (BIJIAM Journal)
Placement is the process of connecting the selected candidate with the employer. Every student hopes to have a job offer by the time he or she completes the course, and all educational institutions aim to have their students well placed in good organizations. The reputation of any institution depends on the placement of its students; hence, many institutions try hard to have a good placement cell. Classification using machine learning may be utilized to retrieve data from student databases. A prediction model is proposed that can foretell the eligibility of students based on their academic and extracurricular achievements. Related data was collected from many institutions for which the placement prediction is made. This paradigm is weighed against the existing algorithms, and findings have been made regarding the accuracy of predictions. The proposed algorithm was found to perform significantly better and to yield good results.
Drug recommendation using recurrent neural networks augmented with cellular a... (BIJIAM Journal)
Drug recommendation systems are systems that have the capability to recommend drugs. On a daily basis, a huge amount of data is being generated by the patients. All this valuable data can be properly utilized to create a reliable drug recommendation system. In this paper, we recommend a system for drug recommendations. The main scope of our system is to predict the correct medication based on reviews and ratings. Our proposed system uses natural language processing techniques (NLP), recurrent neural networks (RNN), and cellular automata (CA). We also considered various metrics like precision, recall, accuracy, F1 score, and ROC curve as measures of our system’s performance. NLP techniques are being used for gathering useful information from patient data, and RNN is a machine learning methodology that works really well in analyzing textual data. The system considers various patient data attributes like age, gender, dosage, medical history, and symptoms in order to make appropriate predictions. The proposed system has the potential to help medical professionals make informed drug recommendations.
Discrete clusters formulation through the exploitation of optimized k-modes a... (BIJIAM Journal)
This article focuses on the results of self-funded quantitative research conducted by social workers working in the “refugee” crisis and social services in Greece (1). The research, among other findings, argues that front-line professionals possess specific characteristics regarding their working profile. The statistical methods in that research performed significance tests to validate the initial hypotheses concerning the correlation between dataset variables. In contrast, this work presents an alternative approach to validating initial hypotheses through the exploitation of clustering algorithms. Toward that goal, several frequently used clustering algorithms are evaluated for their efficiency in feature selection processes, and a modified k-Modes algorithm is proposed for efficient feature subset selection.
Emotion Recognition Based on Speech Signals by Combining Empirical Mode Decom... (BIJIAM Journal)
This paper proposes a novel method for speech emotion recognition. Empirical mode decomposition (EMD) is applied in this paper for the extraction of emotional features from speeches, and a deep neural network (DNN) is used to classify speech emotions. This paper enhances the emotional components in speech signals by using EMD with acoustic feature Mel-Scale Frequency Cepstral Coefficients (MFCCs) to improve the recognition rates of emotions from speeches using the classifier DNN. In this paper, EMD is first used to decompose the speech signals, which contain emotional components into multiple intrinsic mode functions (IMFs), and then emotional features are derived from the IMFs and are calculated using MFCC. Then, the emotional features are used to train the DNN model. Finally, a trained model that could recognize the emotional signals is then used to identify emotions in speeches. Experimental results reveal that the proposed method is effective.
Enhanced 3D Brain Tumor Segmentation Using Assorted Precision Training (BIJIAM Journal)
A brain tumor is a medical disorder faced by individuals of all demographics. Medically, it is described as the spread of nonessential cells close to or throughout the brain. Symptoms of this ailment include headaches, seizures, and sensory changes. This research explores the two main categories of brain tumors: benign and malignant. Benign tumors spread steadily, while the rapid growth of malignant tumors makes them dangerous. Early identification of brain tumors is a crucial factor in patient survival. This research provides a state-of-the-art approach to the early identification of tumors within the brain. We implemented the SegResNet architecture, widely adopted for three-dimensional segmentation, and trained it using the automatic mixed-precision method. We incorporated the Dice loss function and the Dice metric for evaluating the model, obtaining Dice scores of 0.84 for the tumor core, 0.90 for the whole tumor, and 0.79 for the enhanced tumor.
Hybrid Facial Expression Recognition (FER2013) Model for Real-Time Emotion Cl... (BIJIAM Journal)
Facial expression recognition is a vital research topic in fields ranging from artificial intelligence and gaming to human–computer interaction (HCI) and psychology. This paper proposes a hybrid model for facial expression recognition comprising a deep convolutional neural network (DCNN) and a Haar Cascade deep learning architecture. The objective is to classify real-time and digital facial images into one of the seven facial emotion categories considered. The DCNN employed in this research has more convolutional layers, ReLU activation functions, and multiple kernels to enhance filtering depth and facial feature extraction. In addition, a Haar Cascade model was used to detect facial features in real-time images and video frames. Grayscale images were taken from the Kaggle repository (FER-2013), and graphics processing unit (GPU) computation was exploited to expedite the training and validation process. Pre-processing and data augmentation techniques are applied to improve training efficiency and classification performance. The experimental results show a significantly improved classification performance compared to state-of-the-art (SoTA) experiments and research. Compared to other conventional models, this paper validates that the proposed architecture is superior in classification performance, with an improvement of up to 6%, totaling up to 70% accuracy, and a shorter execution time of 2,098.8 s.
Role of Artificial Intelligence in Supply Chain Management (BIJIAM Journal)
The term “supply chain” refers to a network of facilities that includes a variety of companies. To minimise the entire cost of the supply chain, these entities must collaborate. This research focuses on the use of artificial intelligence techniques in supply chain management. It includes supply chain management examples such as demand forecasting, supply forecasting, text analytics, price planning, and more to help companies improve their processes, lower costs and risk, and boost revenue. It gives a quick rundown of the key principles of economics and how to comprehend and use them effectively.
An Internet-of-Things Enabled Smart Fire Fighting System (BIJIAM Journal)
A fire accident is a mishap that can be either man-made or natural. Fire accidents occur frequently and can usually be controlled, but they may at times result in severe loss of life and property. Firefighters often struggle to locate the exact source of a fire, as it is continuously flammable and spreads across the whole area. To address this, we have designed an Internet-of-Things (IoT) based device that locates the exact source of fire through software and hardware devices; complete details of an area, preinstalled in the software, can also be visualized by firefighters. Because of the system’s intelligent decision making during firefighting, the proposed system is called a ‘Smart Fire Fighting System’. Its implementation lets firefighters analyze the current situation immediately and make decisions quicker and more effectively. With this system, fire losses in a building can be greatly reduced, many lives can be rescued immediately, and fire spread can be restricted.
Evaluate Sustainability of Using Autonomous Vehicles for the Last-mile Delive... (BIJIAM Journal)
This study aims to determine whether autonomous vehicles (AV) for last-mile deliveries are sustainable from three perspectives: social, environmental, and economic. Because of the relevance of AV application for last-mile delivery, safety was addressed only for the sake of societal sustainability. According to this study, it is rather safe to deploy AVs for delivery, since current speed restrictions and decent road conditions give the foundation for AVs to run safely for last-mile delivery in metropolitan areas. Furthermore, AVs have a distinct advantage in the event of a pandemic. The biggest worry for environmental sustainability is emissions; across a series of emission types, AVs show a substantial benefit in emission reduction, mostly due to the differences in driving behaviour between autonomous and human-driven vehicles. Because of AVs’ commercial character, this research used a quantitative approach to highlight the economic sustainability of AVs, and the study demonstrates their economic benefit for various carrying capacities.
Automation Attendance Systems Approaches: A Practical Review (BIJIAM Journal)
Accounting for people is the first step for every manpower-based organization in today’s world. Hence, it takes a significant amount of energy and money for organizations both to implement a suitable system for manpower management and to maintain that system. Although this expenditure is next to nothing for big organizations, rather just a formality, the same does not hold for small organizations such as schools, colleges, and, to a certain degree, even universities. That is the first point. The second is that much work has been done to solve this issue: various technologies such as biometrics, RFID, Bluetooth, GPS, and QR codes have been used to tackle attendance collection. This study paves the path for researchers by reviewing practical methods and technologies used in existing attendance systems.
Transforming African Education Systems through the Application of IOTBIJIAM Journal
The project sought to provide a paradigm for improving African education systems through the use of the IOT. The created IOT model for Africa will enable African countries, notably Namibia, to exchange educational content and resources with other African countries. The objective behind the IOT paradigm in Africa’s education sectors is to provide open access to knowledge and information. The study revealed that there are no recognised platforms in African education systems that are utilised by African governments to interact, communicate, and share educational material directly with African institutions. As a result, the current research developed a model for
transforming African education systems using the IOT in the Namibian context, which will serve as a centralised online platform for self-study, new skill acquisition, and self-improvement using materials provided by African institutions of higher learning. Everyone is welcome to use the platform, including students, instructors, and
members of the general public.
Deep Learning Analysis for Estimating Sleep Syndrome Detection Utilizing the ...BIJIAM Journal
Manual sleep stage scoring is frequently performed by sleep specialists by visually evaluating the patient’s neurophysiological signals acquired in sleep laboratories. This is a difficult, time-consuming, & laborious process. Because of the limits of human sleep stage scoring, there is a greater need for creating Automatic Sleep Stage Classification (ASSC) systems. Sleep stage categorization is the process of distinguishing the distinct stages of sleep & is an important step in assisting physicians in the diagnosis & treatment of associated sleep disorders. In this research, we offer a unique method & a practical strategy to predicting early onsets of sleep disorders, such as restless leg syndrome & insomnia, using the Twin Convolutional Model FTC2, based on an algorithm composed of two modules. To provide localised time-frequency information, 30 second long epochs of EEG recordings are subjected to a Fast Fourier Transform, & a deep convolutional LSTM neural network is trained for sleep stage categorization. Automating sleep stages detection from EEG data offers a great potential to tackling sleep irregularities on a
daily basis. Thereby, a novel approach for sleep stage classification is proposed which combines the best of signal
processing & statistics. In this study, we used the PhysioNet Sleep European Data Format (EDF) Database. The code
evaluation showed impressive results, reaching accuracy of 90.43, precision of 77.76, recall of 93,32, F1-score of 89.12 with the final mean false error loss 0.09. All the source code is availlable at https://github.com/timothy102/eeg.
Robotic Mushroom Harvesting by Employing Probabilistic Road Map and Inverse K...BIJIAM Journal
A collision-free path to a destination position in a random farm is determined using a probabilistic roadmap (PRM) that can manage static and dynamic obstacles. The position of ripening mushrooms is a result of picture processing. A mushroom harvesting robot is explored that uses inverse kinematics (IK) at the target
position to compute the state of a robotic hand for grasping a ripening mushroom and plucking it. The DenavitHartenberg approach was used to create a kinematic model of a two-finger dexterous hand with three degrees of freedom for mushroom picking. Unlike prior experiments in mushroom harvesting, mushrooms are not planted in a grid or design, but are randomly scattered. At any point throughout the harvesting process, no human interaction is necessary
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Object detection and ship classification using YOLO and Amazon Rekognition
BOHR International Journal of Internet of Things,
Artificial Intelligence and Machine Learning
2023, Vol. 2, No. 1, pp. 47–57
DOI: 10.54646/bijiam.2023.16
www.bohrpub.com
RESEARCH
Object detection and ship classification using YOLO
and Amazon Rekognition
Alejandro Perez and Sikha Bagui*
Hal Marcus College of Science and Engineering, University of West Florida, Pensacola, FL, United States
*Correspondence:
Sikha Bagui,
bagui@uwf.edu
Received: 26 July 2023; Accepted: 08 August 2023; Published: 31 August 2023
The need for reliable automated object detection and classification in the maritime sphere has increased
significantly over the last decade. The increased use of unmanned marine vehicles necessitates the development
and use of autonomous object detection and classification systems that utilize onboard cameras and operate
reliably, efficiently, and accurately without the need for human intervention. As the autonomous vehicle realm
moves further away from active human supervision, the requirements for a detection and classification suite call
for a higher level of accuracy in the classification models. This paper presents a comparative study using different
classification models on a large maritime vessel image dataset. For this study, additional images were annotated
using Roboflow, focusing on the subclasses that had the lowest detection performance in a previous work
by Brown et al. (1). The present study uses a dataset of over 5,000 images. Using the enhanced set of image
annotations, models were created in the cloud with Google Colab using YOLOv5, YOLOv7, and YOLOv8, as well
as in Amazon Web Services using Amazon Rekognition. The models were tuned and run for five runs of 150 epochs
each to collect efficiency and performance metrics. Comparison of these metrics from the YOLO models showed
improvements in classification accuracy for the subclasses that previously had lower scores, but at
the expense of overall model accuracy. Furthermore, training time was reduced drastically by the
newer YOLO APIs. In contrast, Amazon Rekognition yielded superior accuracy results across the board, but
at the cost of the lowest training efficiency.
Keywords: YOLO, Amazon Rekognition, classification models, image processing, maritime vehicles
1. Introduction
The demand for automated object detection and
classification in the maritime sphere has increased
significantly over the last decade. Unmanned marine vehicle
use has grown rapidly as humans increasingly strive to
automate jobs in open waters that were previously dangerous
or impossible to execute. These tasks necessitate the
development and use of autonomous object detection and
classification systems that utilize onboard cameras and
operate reliably, efficiently, and accurately without the need
for human intervention. As the autonomous vehicle realm
moves further away from active human supervision, the
requirements for a detection and classification suite call for a
higher level of accuracy in the classification models.
Brown et al. (1) created an object detection model
for large maritime vessels using a publicly available
dataset of maritime images from (2) and (3). This dataset
was imported into Roboflow (4), where a subset was
annotated with bounding boxes and labeled using five
different classification labels for large seafaring vessels
(“Container Ships,” “Cruise Ships,” “Military Ships,”
“Tankers,” and Roll-On-Roll-Off ships, or “RORO”).
This annotated subset was used to train an object
recognition and classification model by utilizing the
open-source Ultralytics (5, 6) vision AI library YOLOv5
(7, 8). Their results produced a model that yielded an
overall 0.892 average precision rate and 0.902 average
recall rate; however, while the model proved accurate in
detecting vessels with “Cruise Ship,” “Military Ship,” and
“RORO” labels with 93, 98, and 98% correct classification
percentages, respectively, it performed considerably worse
in classifications of “Container Ships” and “Tankers”
with a correct classification percentage of 86 and 72%,
respectively (1).
The aim of this paper is to explore methods to improve the
efficiency and accuracy of this classification model, making
special targets for the improvement of the two classification
labels that had the lowest correct classification percentages.
To achieve this:
First, Roboflow was used to improve the quality of
the training data. This was done by enhancing the
annotated subset with more bounded and labeled images
from the Analytics Vidhya dataset. Special emphasis was
placed on images that were labeled with the Container
Ship and Tanker labels as those represented the largest
target for improvement. Furthermore, the entire enhanced
annotated subset would also be subject to preprocessing
methods like auto orientation, image resizing, grayscale
transformation, and image augmentation methods like
brightness adjustments and noise reduction.
Second, in addition to training the model again
with YOLOv5 (8) to establish a baseline and to test
whether improvements were achieved by improving
the training data quality, models were trained using
YOLOv7 (9) and YOLOv8 (10) in an attempt to
translate the improvements in the YOLO libraries over
to the creation of more efficient and accurate models.
Furthermore, the same enhanced annotated set would
be used with Amazon Rekognition (11) computer vision
API from Amazon Web Services (11) to train another
detection and classification model, which would serve as
a baseline comparison against the models created from
the YOLO libraries.
Section 2 of this paper will explore some background
on the YOLO family of algorithms and describe the key
differences between the three versions, as well as the
improvements achieved in the latest version as it pertains
to the goals of this paper. Furthermore, this section
also includes some background on Amazon Rekognition
computer vision API and its uses as they relate to the work
presented in this paper.
This paper is unique in its comparative approach to
different classification models using a large maritime
vessel image database. In contrast to the original paper
(1), where a single YOLO model was utilized, it creates
three additional classification models and provides
an efficiency and accuracy comparison between them
to arrive at the best possible results. Furthermore, it
utilizes a premium service in Amazon Rekognition to
provide a different alternative to the YOLO approach.
The “Related Works” Section 3 of this paper goes into
further details about the difference of this paper with
other published work. The rest of this paper is presented
as follows: Section 4 presents the methodology; Section
5 explores the results; and finally, Section 6 details
the conclusions.
2. Background on YOLO and
Amazon Rekognition
YOLO is a family of compound-scaled object detection
algorithms, which stands for “You Only Look Once” (5,
12). The YOLO object detection models are trained using
the “Common Objects in Context” COCO (13) dataset of
pre-trained objects and weights. These neural network-based
models are used to detect objects in images or videos in
real time by making predictions of bounding boxes and
class probabilities with the use of a single fully connected
layer, in a single iteration. In essence, YOLO works by
dividing the input image into a grid of cells and predicting
bounding boxes and object scores for each cell. It then refines
the predictions and applies non-maximum suppression to
eliminate duplicate detections.
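The grid prediction and non-maximum suppression steps described above can be sketched in a few lines of Python. This is a generic greedy NMS over (box, score) pairs for illustration only, not YOLO's actual implementation; the boxes and scores below are made up.

```python
# Generic IoU + greedy non-maximum suppression sketch (illustrative,
# not the YOLO source). Boxes are (x1, y1, x2, y2) corner tuples.
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, iou_threshold=0.45):
    """Keep the highest-scoring box and drop overlapping duplicates."""
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in detections:
        if all(iou(box, k) < iou_threshold for k, _ in kept):
            kept.append((box, score))
    return kept

# Two overlapping detections of the same vessel plus one distinct vessel.
dets = [((0, 0, 100, 100), 0.9),
        ((5, 5, 105, 105), 0.8),
        ((200, 200, 300, 300), 0.7)]
kept = nms(dets)  # the 0.8 duplicate is suppressed
```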
This paper uses three different evolutions of the YOLO
family of algorithms as a comparative study on how the
differences among them work to create more efficient and
accurate object detection and classification models for large
maritime vessels.
YOLOv5 (8) is the fifth iteration of the YOLO family
of object detection algorithms and is the natural extension
of the YOLOv3 PyTorch repository (14). YOLOv5 ported
the previous YOLO version Darknet weights into PyTorch
(12) and made further improvements in speed, performance,
and ease of use.
YOLOv7 (9) was released two years later and promised
to deliver increased efficiency and model performance with
key improvements in the way it handles the COCO (13)
pre-trained weight object dataset (15). YOLOv7 performs more
floating point operations than YOLOv5; this heavier arithmetic
workload is optimized for state-of-the-art GPUs, delivering
faster processing speeds than YOLOv5 when training on
high-performance GPUs.
YOLOv8 (10) is the latest iteration of the YOLO family
of object detection algorithms. Developed by Ultralytics (5)
as their official successor to YOLOv5 and released in the
beginning of 2023, it was designed with a strong focus on
speed, size, and accuracy. It introduces new features like a
new backbone network, new loss function, and new anchor-
free detection head. In direct benchmarks against YOLOv5
and v7 on high-end GPUs, it trains faster and produces more
accurate models (16).
Amazon Rekognition is Amazon Web Services' (AWS) (11,
17) solution for cloud-based image and video analysis and object
detection. It provides a simple and friendly API for users to
introduce image and video detection into their applications
without requiring a deep understanding of machine learning
and computer vision. Since its release in 2016, it has received
several updates and it is now widely used in face recognition
and detection, surveillance, content moderation, retail, and
healthcare (17). Amazon Rekognition uses Custom Labels
(11) to create classifications for objects and leverages the
strong AWS cloud toolkit to create detection models.
3. Related works
Object detection and classification research has been done
using a varied number of techniques, algorithms, and models.
This paper is a continuation of the work started by Brown
et al. (1) on maritime object detection and classification using
YOLOv5. Brown et al. (1) used the same maritime dataset
and created the five vessel subclasses used in this paper, using
them to train a detection model with YOLOv5 and achieving
an overall mAP (0.5) of 0.919. In this paper, we expand the
scope of this research and introduce YOLOv7, YOLOv8, and
Amazon Rekognition models to form a comparative study
and attempt to arrive at an improved maritime detection and
classification model. Kim et al. (18) also provide research on
maritime object detection and classification using YOLOv5.
Their research uses a different image dataset and different
classification subcategories to train their model, achieving
an overall mAP (0.5) value of 0.898. As with Brown et al.
(1) however, they do not expand their study to different
classification models.
Tang et al. (19) provide research on comparing different
classification models; however, they do not use a traditional
maritime image dataset, opting instead to focus on a GaoFen-
3 dataset of synthetic aperture radar images and other
satellite images of vessels. While technically it is a maritime
image dataset, the top-down angle of their images makes the
nature of their detection models completely different from
the ones created by our research. Furthermore, their models
are based on YOLOv3, YOLOv5, and their modified YOLO
algorithm called N-YOLO (19), which makes for a different
comparative study than the one undertaken in this paper.
Zhao et al. (20) provide a similar comparative study using
top-down maritime images in their detection algorithms, but
instead of a satellite, they use a drone image dataset called
SeaDronesee. Their comparative study is also based on their
own custom YOLO model called YOLOv7-sea, which, as the
name suggests, is based on YOLOv7 instead of YOLOv5 like
Tang et al. (19).
Similarly, Olorunshola et al. (21) also provide a
comparative study of object detection models using YOLO
models, and this time, it is using YOLOv5 and YOLOv7.
However, they do not use maritime images but a custom
dataset for Remote Weapon Station with four classification
subclasses of “Persons,” “Handguns,” “Rifles,” and “Knives.”
Their models achieve an mAP (0.5) of 0.515 and 0.553 for
YOLOv7 and YOLOv5, respectively.
Other comparative studies are done by Jiang et al. (22)
using YOLOv7 and CBAM-YOLOv7 and Gillani et al.
(23) using YOLOv5 and YOLOv7, but the former uses
a Hemp Duck image dataset, while the latter uses video
as their training data. Notably, Jiang et al. (22) use a
Convolutional Block Attention Module (CBAM) in their
custom YOLO algorithm to improve the feature extraction
rate of their model. Both these papers, however, do not
explore the maritime vessel realm like some of the other
papers mentioned.
Multiclass detector and recognition studies are done by
Ghahremani et al. (24), using CNN algorithms to train
maritime detection models on two multiclass vessel image
datasets, and by Yang et al. (25), using KPE YOLOv5 to train
a model using top-down drone images from the VisDrone-
2020 dataset. Both these papers show results for object
detection and classification models, but neither offers a
comparative study on multiple algorithms.
Finally, Sharma (26) and Mohanta and Sethi (27) provide
research on multiclass object detection and classification
models using Amazon Rekognition. The former focuses on
multiobject detection using custom labels, whereas the latter
bases their study on using Amazon Rekognition for image
pattern detection. Neither of these papers uses maritime
image datasets, nor do they provide a comparative study with
other detection models.
4. Methodology
The process of setting up the methods to implement our
desired model requires several steps, summarized as follows:
(i) The Roboflow environment of the original work
performed by Brown et al. (1) was recreated. This
means the dataset of 8,000 vessel images from (2) and (3) was
uploaded and the original annotated subset, consisting
of roughly 1,500 annotated images, was imported into
the Roboflow project. Following this, using Google
Colab (28) cloud computing, the original YOLOv5
model was recreated to be used as baseline comparison
against the efficiency and accuracy of the rest of the
models used in this paper.
(ii) To improve the quality of the training data, the original
annotated subset was expanded with more annotated
images using the remaining available images from the
Analytics Vidhya dataset. This process consisted of
copying over the original annotated subset in Roboflow
and annotating another 1000 images, focusing on
more annotations for the Container Ship and Tanker
labels. Once the annotated subset was expanded, the
preprocessing and image augmentation steps were
taken to prepare the subset for model training.
(iii) The prepared data were then independently imported
into the Google Colab environment set up for each
of the YOLO models and imported into Amazon S3
(29) cloud storage to begin model training on each
of the respective models. From here, each model was
tuned and the data and results were collected for
comparative analysis.
4.1. Original project setup
Establishing the original model as a baseline is key for
the goal of this paper. Having a baseline of efficiency
and performance will give the new models a comparison
point with which to tangibly establish improvement. The
original project must be recreated to retain not only the
original annotated subset but also the original classification
labels. Importing the Analytics Vidhya image dataset and
the annotated subset provided by Brown et al. (1) into
a new Roboflow environment allows for the recreation of
the annotation environment, the splitting of the annotated
subset into training, test, and validation sets, and provides
the export tools necessary to export the data over to
the model training environment. Figure 1 presents an
example of the bounding box annotation and labeling of a
Container using Roboflow.
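The splitting of the annotated subset into training, test, and validation sets can be sketched as follows. The 70/20/10 ratio, seed, and file names are illustrative assumptions; the actual split is configured through Roboflow's export tools.

```python
import random

# Sketch of a train/validation/test split of annotated images.
# The 70/20/10 ratio and file names are illustrative assumptions.
def split_dataset(items, train_frac=0.7, valid_frac=0.2, seed=42):
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle
    n_train = int(len(items) * train_frac)
    n_valid = int(len(items) * valid_frac)
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])

# 1,460 stands in for the size of the original annotated subset.
images = [f"vessel_{i:04d}.jpg" for i in range(1460)]
train_set, valid_set, test_set = split_dataset(images)
```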
With the data now available, the Google Colab (28) cloud
computing environment provided by Brown et al. (1) can
now be used to import the annotated data and begin training
the model using the YOLOv5 library. As with Brown et al.
(1), training is started by passing the arguments shown in
Figure 2.
The training parameters are as follows:
• img: input image size in pixels length/width
• batch: batch size, hardware dependent on the available
GPU memory
• epochs: number of training epochs
FIGURE 1 | Bounding Box Annotation and Labeling of Container
using Roboflow.
FIGURE 2 | YOLOv5 model training arguments.
• data: location where training data are saved
• weights: pre-trained weights provided by YOLO
• cache: cache images for faster training
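Put together, the parameters above form a train invocation like the sketch below. The 416-pixel image size matches the resize preprocessing used in this work and the epoch count matches the 150-epoch runs; the batch size and weights file name are illustrative assumptions rather than the exact values in Figure 2.

```python
# Sketch of assembling the YOLOv5 train.py invocation from the
# parameters listed above. Batch size and weights file are assumptions.
def build_train_command(img=416, batch=16, epochs=150,
                        data="data.yaml", weights="yolov5s.pt"):
    return ["python", "train.py",
            "--img", str(img),
            "--batch", str(batch),
            "--epochs", str(epochs),
            "--data", data,
            "--weights", weights,
            "--cache"]

cmd = build_train_command()
```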
4.2. Data quality enhancement
Having the baseline data and model established, the process
now shifts to enhancing the annotated subset to improve the
training data. For this, the annotated subset is duplicated,
and more annotations are performed on the available images
from the dataset. Special focus is made to annotate more
images with the Container Ship and Tanker labels as those
represent the lowest-performing classifications in the original
model. In total, an additional 1000 images were annotated
and added to the extended annotated subset. Table 1 presents
the original annotations versus the extended annotations.
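As a quick arithmetic check, the per-class counts in Table 1 can be totaled in a few lines (the numbers are transcribed from the table):

```python
# Per-class annotation counts transcribed from Table 1.
original = {"Container ship": 292, "Cruise ship": 292,
            "Military ship": 292, "RORO": 292, "Tanker": 292}
extended = {"Container ship": 581, "Cruise ship": 443,
            "Military ship": 448, "RORO": 451, "Tanker": 584}

# Per-class increase: the emphasis on Container Ship and Tanker shows
# up as the two largest additions.
added = {cls: extended[cls] - original[cls] for cls in original}
totals = (sum(original.values()), sum(extended.values()))  # (1460, 2507)
```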
The extended annotated subset is then further enhanced
by applying image transformations and augmentation
preprocessing techniques. The goal here is to standardize all
the images and remove errant instances that might cause the
model to train slowly and produce inaccurate results.
All annotations are subject to the following preprocessing:
• Auto-orientation: Discards EXIF rotations and
standardizes pixel ordering
• Resize: Downsizes all images to 416 × 416 pixels
• Grayscale Transformation: Merges color channels to
make images insensitive to color.
Figure 3 presents an example of grayscale transformation.
All annotations are also subject to the following augmentations:
• Brightness: Apply −25%/+25% brightness variability
to standardize images against lighting and camera
setting changes. Figure 4 presents an example of
brightness variability.
• Noise smoothing: Image smoothing transformation
to remove image noise. Figure 5 demonstrates the
noise reduction.
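The per-pixel arithmetic behind two of these steps can be sketched on a toy image. Real preprocessing here is performed by Roboflow on full image files; the luminance weights below are the common ITU-R BT.601 coefficients, an assumption about the exact grayscale formula used.

```python
# Toy illustration of grayscale merging and brightness variability on a
# nested-list "image" of [R, G, B] pixels (values 0-255).
def to_grayscale(image):
    """Merge color channels using common luminance weights (assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in image]

def adjust_brightness(image, factor):
    """Scale pixels by factor (0.75-1.25 covers -25%/+25%), clamping
    to the valid 0-255 range."""
    return [[[min(255, max(0, c * factor)) for c in px] for px in row]
            for row in image]

img = [[[200, 100, 50], [0, 0, 0]]]  # one row, two pixels
bright = adjust_brightness(img, 1.25)  # +25% brightness
gray = to_grayscale(img)
```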
4.3. Model training
For YOLO model training, the steps are very similar across
all three libraries. First, the annotated sets of train, test,
TABLE 1 | Original annotated subset [created by (1)] vs.
extended dataset.
Class Original annotations Extended annotations
All 1,460 2,507
Container ship 292 581
Cruise ship 292 443
Military ship 292 448
RORO 292 451
Tanker 292 584
FIGURE 3 | Grayscale transformation.
FIGURE 4 | Brightness variability.
FIGURE 5 | Noise reduction.
FIGURE 6 | YOLOv5 model training with an extended dataset.
and validation data are imported into the respective training
environment; then the training is started by executing the
train command with the familiar arguments. YOLOv5 model
training is as in Brown et al. (1). Figure 6 presents the
parameters used to train the extended dataset with YOLOv5.
YOLOv7 model training is similar; however, there are two key differences: the weights are specified as YOLOv7 training weights with the --weights argument, and the --device 0 argument is added to specify which CUDA (30) device to use for training, taking advantage of YOLOv7's greater CUDA optimization [(5), (21), and (26)]. Figure 7 presents the model training parameters for the extended dataset with YOLOv7.
The YOLOv8 (10) model train command syntax changed
slightly from the previous two versions; however, the core
FIGURE 7 | YOLOv7 model training with an extended dataset.
FIGURE 8 | YOLOv8 model training with an extended dataset.
meaning behind the arguments remains the same. YOLOv8 has standardized CUDA use, so there is no need to specify a CUDA device in the train command. The weights are specified as YOLOv8 weights. Figure 8 presents YOLOv8 model training with the extended dataset.
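The three invocations follow the pattern below. The exact hyperparameters are those shown in Figures 6–8; the image and batch sizes here are illustrative assumptions, not a reproduction of the figures.

```shell
# YOLOv5 / YOLOv7 repo-style training (illustrative arguments)
python train.py --img 416 --batch 16 --epochs 150 \
    --data data.yaml --weights yolov5s.pt            # YOLOv5
python train.py --img 416 --batch 16 --epochs 150 \
    --data data.yaml --weights yolov7.pt --device 0  # YOLOv7: explicit CUDA device

# YOLOv8 uses the ultralytics CLI; CUDA use is handled automatically
yolo detect train data=data.yaml model=yolov8s.pt epochs=150 imgsz=416
```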
For Amazon Rekognition (11), the setup is a little different,
requiring manual Roboflow export to Amazon S3 bucket
(29) cloud storage as Amazon Custom Labels and setting
the S3 bucket permissions to allow Rekognition access.
Once the S3 bucket is set with the labeled images, the
Rekognition classification project can be created and pointed
to that S3 bucket, and the model training can start. This
work was performed on cloud-based infrastructure provided
by AWS, using dedicated runners on AWS servers that
provide premium CPUs and GPUs for training machine learning models.
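Programmatically, this setup can be sketched with boto3. The project name, bucket, and manifest key below are hypothetical placeholders, and the call sequence is an illustrative outline of the Rekognition Custom Labels API, not the exact configuration used in this work.

```python
def training_data_asset(bucket: str, manifest_key: str) -> dict:
    """Build the TrainingData structure Rekognition Custom Labels expects
    for a Ground Truth manifest stored in S3."""
    return {
        "Assets": [{
            "GroundTruthManifest": {
                "S3Object": {"Bucket": bucket, "Name": manifest_key}
            }
        }]
    }

def start_training(bucket: str = "my-vessel-bucket",            # hypothetical bucket
                   manifest_key: str = "labels/output.manifest"):  # hypothetical key
    import boto3  # AWS SDK for Python; credentials and region come from the environment
    client = boto3.client("rekognition")
    # Create the Custom Labels classification project...
    project_arn = client.create_project(ProjectName="maritime-vessels")["ProjectArn"]
    # ...then train a model version against the labeled images in the S3 bucket.
    client.create_project_version(
        ProjectArn=project_arn,
        VersionName="extended-dataset-v1",
        OutputConfig={"S3Bucket": bucket, "S3KeyPrefix": "training-output/"},
        TrainingData=training_data_asset(bucket, manifest_key),
        TestingData={"AutoCreate": True},  # let Rekognition hold out a test split
    )
```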
5. Results and discussion
The results are split into three distinct sections:
(i) Model training efficiency is analyzed across all the
models and iterations. Seven total training time data
points are captured representing the three YOLO
models done with the original annotated subset, three
YOLO models with the enhanced annotated subset,
and the Amazon Rekognition model with the enhanced
annotated subset.
(ii) YOLO models’ accuracy utilizing the original
annotated subset is analyzed. This is the true baseline
comparison as the original model created by Brown
et al. (1) trained with the original annotated subset is
compared against the other YOLO models also trained
with the original annotated subset.
(iii) The results of all three YOLO models and Amazon
Rekognition model utilizing the enhanced annotated
subset are analyzed. This represents the true
comparison point as all improvement steps are
represented in this part and should provide the most
interesting results.
6. 52 Perez and Bagui
TABLE 2 | Training time efficiency per model.
Model Runs Epochs Training time/epoch (Avg) Total training time (Avg)
YOLOv5 original 5 150 0:01:21 3:22:30
YOLOv7 original 5 150 0:00:50 2:05:00
YOLOv8 original 5 150 0:00:17 0:42:30
YOLOv5 enhanced 5 150 0:01:25 3:32:30
YOLOv7 enhanced 5 150 0:00:54 2:15:00
YOLOv8 enhanced 5 150 0:00:18 0:45:00
Rekognition enhanced 5 n/a n/a 5:13:00
FIGURE 9 | Training efficiency graph (smaller bars are better).
TABLE 3 | YOLOv5 model performance metrics.
Class Images Labels Precision Recall mAP@.5 mAP@.5:.95
All 292 322 0.897 0.887 0.919 0.591
Container ship 292 69 0.903 0.841 0.916 0.569
Cruise ship 292 59 0.896 0.879 0.923 0.57
Military ship 292 65 0.884 0.908 0.961 0.584
RORO 292 60 1 0.994 0.995 0.712
Tanker 292 69 0.802 0.812 0.798 0.518
5.1. Model training efficiency
Steps were taken to standardize the training environment
used for all the YOLO models. The Google Colab (28)
premium cloud computing environment was set to create
a stable training environment with access to the same
baseline computing resource across all Colab notebooks.
This included premium GPU and RAM access and enough
computing units to permit uninterrupted training for all
instances of runs and epochs.
Overall, each YOLO model was run 5 times for 150 epochs
and training times were recorded for all models, split by
training time per epoch. Amazon Rekognition does not
display epoch data, so only run data are recorded along with
the recorded training time for this model. Table 2 presents
the training time efficiency per model.
TABLE 4 | YOLOv7 model performance metrics.
Class Images Labels Precision Recall mAP@.5 mAP@.5:.95
All 292 322 0.801 0.749 0.824 0.437
Container ship 292 69 0.816 0.71 0.826 0.41
Cruise ship 292 59 0.682 0.712 0.73 0.454
Military ship 292 65 0.822 0.708 0.837 0.361
RORO 292 60 0.949 0.933 0.978 0.613
Tanker 292 69 0.734 0.78 0.747 0.348
TABLE 5 | YOLOv8 model performance metrics.
Class Images Labels Precision Recall mAP@.5 mAP@.5:.95
All 292 322 0.908 0.885 0.934 0.621
Container ship 292 69 0.897 0.882 0.952 0.609
Cruise ship 292 59 0.863 0.847 0.91 0.6
Military ship 292 65 0.934 0.908 0.95 0.586
RORO 292 60 0.983 0.992 0.995 0.761
Tanker 292 69 0.864 0.854 0.887 0.575
FIGURE 10 | YOLOv5 mAP (0.5) graph.
As shown in Table 2, YOLOv8 had the most efficient
training time among all seven data points, followed by
the YOLOv7 model, the YOLOv5 model, and last the
FIGURE 11 | YOLOv7 mAP (0.5) graph.
FIGURE 12 | YOLOv8 mAP (0.5) graph.
Amazon Rekognition model. YOLOv8 posts a remarkably fast 18 seconds of training time per epoch. Not surprisingly, the training of the models with
the enhanced annotated subset took longer given the fact
that it introduced approximately 700 more images into the
training subset. Interestingly, however, the increased training
time is minimal, especially in the YOLOv8 model, which
in part must be attributed to our steps to better prepare
the data with different preprocessing techniques. Figure 9
presents a graphical representation of the training time
efficiency per model.
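As a quick sanity check on Table 2, the average per-epoch times multiplied by 150 epochs reproduce the reported totals:

```python
from datetime import timedelta

def hms(text: str) -> timedelta:
    """Parse an H:MM:SS duration as reported in Table 2."""
    h, m, s = map(int, text.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)

# (avg per-epoch time, avg total time) for each YOLO row of Table 2
runs = {
    "YOLOv5 original": ("0:01:21", "3:22:30"),
    "YOLOv7 original": ("0:00:50", "2:05:00"),
    "YOLOv8 original": ("0:00:17", "0:42:30"),
    "YOLOv5 enhanced": ("0:01:25", "3:32:30"),
    "YOLOv7 enhanced": ("0:00:54", "2:15:00"),
    "YOLOv8 enhanced": ("0:00:18", "0:45:00"),
}
for name, (per_epoch, total) in runs.items():
    # per-epoch average x 150 epochs equals the reported total
    assert hms(per_epoch) * 150 == hms(total), name
```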
5.2. YOLO baseline model results
For setting up the baseline, all three YOLO models were
trained using the original annotated dataset. This was set up
to produce a true comparison baseline with which to rate the
models, keeping the training data in the same state as they
were when Brown et al. (1) trained their YOLOv5 model.
Metrics used are True Positive, True Negative, False
Positive, False Negative, Precision, Recall, Mean Average
TABLE 6 | YOLOv5 performance metrics for enhanced datasets.
Class Images Labels Precision Recall mAP@.5 mAP@.5:.95
All 584 389 0.898 0.89 0.921 0.592
Container ship 581 95 0.907 0.84 0.917 0.571
Cruise ship 443 64 0.898 0.880 0.928 0.572
Military ship 448 74 0.89 0.906 0.965 0.586
RORO 451 62 0.991 0.993 0.995 0.721
Tanker 584 94 0.811 0.812 0.811 0.528
TABLE 7 | YOLOv7 performance metrics for enhanced datasets.
Class Images Labels Precision Recall mAP@.5 mAP@.5:.95
All 584 389 0.9141 0.92 0.915 0.601
Container ship 581 95 0.89 0.889 0.898 0.569
Cruise ship 443 64 0.917 0.922 0.923 0.57
Military ship 448 74 0.923 0.928 0.961 0.584
RORO 451 62 0.987 0.994 0.995 0.712
Tanker 584 94 0.854 0.867 0.854 0.568
TABLE 8 | YOLOv8 performance metrics for enhanced datasets.
Class Images Labels Precision Recall mAP@.5 mAP@.5:.95
All 584 389 0.912 0.916 0.944 0.648
Container ship 581 95 0.911 0.905 0.951 0.621
Cruise ship 443 64 0.938 0.922 0.923 0.688
Military ship 448 74 0.919 0.917 0.961 0.632
RORO 451 62 0.968 0.967 0.995 0.712
Tanker 584 94 0.864 0.861 0.891 0.601
Precision, and F1 score. Using the Tanker label as an example,
these metrics are defined as:
• True Positive (TP): Data points labeled as Tanker,
which are actually Tanker.
• True Negative (TN): Data points labeled as not Tanker,
which are actually not Tanker.
• False Positive (FP): Data points labeled as Tanker,
which are actually not Tanker.
• False Negative (FN): Data points labeled as not Tanker,
which are actually Tanker.
Precision: Number of true positives divided by the total number of positive labels.

Precision = TP / (TP + FP)

Recall: Number of true positives divided by the sum of true positives and false negatives.

Recall = TP / (TP + FN)
FIGURE 13 | YOLOv5 confusion matrix for enhanced datasets.
FIGURE 14 | YOLOv8 confusion matrix for enhanced datasets.
Mean average precision (mAP): The average precision of each class, averaged over all N classes.

mAP = (1/N) Σᵢ APᵢ, for i = 1, …, N

F1 score: The harmonic mean of precision and recall.

F1 = 2 · (Precision · Recall) / (Precision + Recall)
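These definitions translate directly into code; a minimal sketch:

```python
def precision(tp: int, fp: int) -> float:
    # True positives over all positive predictions
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # True positives over all actual positives
    return tp / (tp + fn)

def f1_score(p: float, r: float) -> float:
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)

def mean_average_precision(ap_per_class: list) -> float:
    # Average the per-class AP values over all N classes
    return sum(ap_per_class) / len(ap_per_class)
```

Averaging the five per-class mAP@.5 values from Table 3 recovers the reported overall score: mean_average_precision([0.916, 0.923, 0.961, 0.995, 0.798]) ≈ 0.919.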
The following tables show the performance metrics of
the baseline models using the original annotated dataset.
Tables 3–5 show the training results for the YOLOv5,
YOLOv7, and YOLOv8 models, respectively.
These results already offer some interesting observations,
even without introducing our enhanced subset into the
equation. First, it can be observed that YOLOv5 still
performed better than YOLOv7 on every single classification
FIGURE 15 | Combined recall score of YOLO models.
FIGURE 16 | Combined precision score of YOLO models.
label. Even though YOLOv7 trained its model 60% faster than
YOLOv5, the superior accuracy of the YOLOv5 model more
than makes up for that difference.
In contrast, YOLOv8 showed improvement in half of the classification labels, with marked gains in the two labels where the original YOLOv5 model performed weakest. The YOLOv8 model also improved the accuracy of the overall model, with a mean average precision at an intersection-over-union threshold of 0.5 (mAP@.5) of 0.934 versus 0.919 on the original model. Furthermore, as seen in section 5.1, the YOLOv8 model offered approximately a 440% training efficiency improvement over the YOLOv5 model. These two facts make YOLOv8 the clear candidate among the three YOLO models.
Figures 10–12 show the graphical representation of
the mAP@.5 for the YOLOv5, YOLOv7, and YOLOv8
models, respectively.
5.3. Enhanced data models
This section presents the central results of the paper. Using the enhanced annotated subset with the three YOLO
TABLE 9 | Amazon Rekognition performance metrics for
enhanced datasets.
Class Images Labels Precision Recall F1 score
All 584 389 0.972 0.97 0.971
Container ship 581 95 0.984 0.968 0.976
Cruise ship 443 64 1 0.983 0.991
Military ship 448 74 0.967 0.951 0.959
RORO 451 62 0.984 0.968 0.976
Tanker 584 94 0.925 0.98 0.951
TABLE 10 | Comparison of metric scores for Container Ship and Tanker classification labels.
Class YOLOv5 original (mAP@.5) YOLOv8 (mAP@.5) Amazon Rekognition (F1 score)
All 0.919 0.944 0.971
Container ship 0.916 0.951 0.976
Tanker 0.798 0.891 0.951
models and introducing the Amazon Rekognition model, the
results here should clearly identify the superior model.
First, we explore the training results of the YOLO models.
As before, Tables 6–8 show the training results using the
enhanced annotated subset for the YOLOv5, YOLOv7, and
YOLOv8 models, respectively.
It can be observed that the YOLOv5 model achieved
a slight improvement over the original model by using
the enhanced annotated set. The improvement in overall mAP@.5 from 0.919 to 0.921 is nominal, but it holds across the board: every single classification label improves, as does the overall model. This shows that the efforts undertaken to improve the training data with preprocessing techniques had a positive effect on overall model accuracy, even while still using the same underlying model training library.
Second, it can also be observed that the YOLOv7 model
achieved a significant improvement using the enhanced
annotated set over the same model using the original
annotated set. An improvement of 0.824 to 0.915 mAP@.5
is quite substantial. It still falls short of the YOLOv5 model on overall accuracy, but notably, the Tanker classification label improves markedly over the original model, reaching 0.854 versus the original 0.798 mAP@.5. This shows the YOLOv7 model to be more receptive to data quality improvements and makes us postulate that further efforts in this area could push it ahead of the original model.
Finally, as also observed in section 5.2, the YOLOv8 model
again achieved the largest improvement over the original
model. Not only does it achieve significant improvement on
the overall model accuracy with a 0.944 versus 0.919 mAP@.5
FIGURE 17 | Amazon Rekognition classification model: mislabel of
Tanker instead of Container Ship.
improvement, but it also achieves noticeable gains in our targeted classification labels of Container Ship and Tanker, particularly the Tanker label, which improves from 0.798 to 0.891 mAP@.5. This model also improved when trained with the enhanced annotated set versus the original, showing, just like the YOLOv7 model, that it is more receptive to training data quality improvements.
Figures 13, 14 show the confusion matrices for the YOLOv5 and YOLOv8 models, the two best performing models at this stage of the results.
Confusion matrices for both models show the biggest
area of mislabeling occurs between Container Ships and
Tankers, which coincides with the original conclusions
reached by Brown et al. (1). Notably, the YOLOv8 model
manages to reduce the amount of incorrectly labeled images
between these two labels by a significant amount, hence the
reason for the overall improvement in this model against
the YOLOv5 model.
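Per-class precision and recall can be read directly off such a confusion matrix. A small sketch with a hypothetical two-class excerpt; the counts below are invented for illustration, not taken from Figures 13 or 14:

```python
import numpy as np

# Rows are true classes, columns are predicted classes (hypothetical counts).
labels = ["Container ship", "Tanker"]
cm = np.array([[80, 15],   # true Container ship
               [10, 84]])  # true Tanker

tp = cm.diagonal()
precision = tp / cm.sum(axis=0)  # per column: TP / (TP + FP)
recall = tp / cm.sum(axis=1)     # per row:    TP / (TP + FN)

for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.3f}, recall={r:.3f}")
```

The off-diagonal entries are exactly the Container Ship/Tanker confusions discussed above; shrinking them raises both per-class scores at once.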
Figures 15, 16 show the combined recall and precision
graphs of all three YOLO models trained with the
annotated enhanced set.
The combined figures again reinforce the YOLOv8 model's superior performance, even at the raw precision and recall levels. Interestingly, YOLOv7 manages to keep up with the other models on these raw scores, despite its lower overall mAP@.5 performance.
Amazon Rekognition does not provide epoch data as part
of the result set; thus, the results are shown as averages of the
five training runs. Furthermore, it uses the F1 score as the
FIGURE 18 | Amazon Rekognition classification model: mislabel of
Container Ship instead of Tanker.
overall model accuracy score instead of mAP. Table 9 shows
these results averaged out across the five training runs.
The Amazon Rekognition model achieved improved results across the board. Its overall model F1 score of 0.971 surpasses the best overall YOLO result, the 0.944 mAP@.5 achieved by the YOLOv8 model. Furthermore, it also substantially increased accuracy on both of our target classification labels, Container Ship and Tanker, compared to both the original YOLOv5 model and our best performing YOLOv8 model.
On the other hand, as shown in section 5.1, this model offered
the lowest training efficiency. This shows a clear tradeoff in
model classification accuracy improvement at the expense of
training efficiency and speed.
Table 10 shows the overall scores for our target labels for
the original YOLOv5 model, the best performing YOLOv8
model, and the Amazon Rekognition model.
Interestingly, the Amazon Rekognition model had mislabeling issues similar to the YOLO models when it came to confusing Container Ships and Tankers. Given its scores, the mislabeling occurred on a smaller scale, but it offers an interesting data point. Figures 17, 18 show examples of mislabeling by the Rekognition model.
This continued mislabeling issue across all trained models
tested exposes a potential issue with the ground truth labels of
the vessels. Either there is cross-labeling between the Container Ship and Tanker images, or these vessel types are too similar for the models to distinguish in some special cases. While annotating images, we noticed that some vessels had profiles matching both classifications despite the ground truth assigning them a specific label.
6. Conclusion and future work
Building on the previous work by Brown et al. (1), the Analytics Vidhya marine vessel image set and the original annotated and labeled subset of images were recreated in a Roboflow environment. This allowed their original YOLOv5 model to be recreated to serve as a comparison baseline for the research conducted in this paper. The original annotated set was used with
two additional YOLO models, YOLOv7 and YOLOv8, to
compare the performance against the original model. This
model comparison began to yield interesting results. First,
an overall significant increase in training efficiency for the
new YOLO models was achieved, with the YOLOv8 model
achieving a nearly 440% training efficiency improvement
over the original YOLOv5 model (1). Second, while
the YOLOv7 model failed to achieve similar levels of
classification accuracy, falling short in both overall model
mAP scores and single-label scores, the YOLOv8 achieved
classification accuracy improvements across the board in
both overall and single-label scores.
Further steps were taken to explore ways to improve
these models and obtain an improved object detection and
classification model. To achieve this, the original annotated
set was enhanced by adding an additional 1000 image
annotations from images extracted from the same Analytics
Vidhya image dataset, and the entire expanded annotated
set was subject to several image preprocessing steps to
improve the quality of the training data. These new annotated
data were used to retrain the same three YOLO models,
but additionally, they were also used to train an Amazon
Rekognition classification model to introduce a non-YOLO
classification model library for comparison.
Results from the training of these models showed
improvements across the board in classification accuracy for
all three YOLO models. Improvements ranged from a modest
2% classification accuracy increase for the YOLOv5 model
to a significant 10% improvement for the YOLOv8 model.
This showed the steps taken to improve the quality of the
training data had significant effects on the accuracy of each
of the models. Furthermore, the Amazon Rekognition model
achieved the highest improvement in classification accuracy.
It achieved a nearly 15% improvement in overall classification over the original model and a nearly 20% improvement on the lowest performing classification label, Tanker. The Rekognition model, however, had
the lowest training efficiency of them all, clearly showing
a conscious tradeoff of model classification accuracy over
training speed and efficiency.
Interestingly, all models struggled mostly with mislabels
between Container Ships and Tankers, illustrating a potential
issue with the ground truth labels for these two vessel types.
There seems to be a high overlap in image labels between
these two types, leading models to significantly mislabel
these types above any other. Future work on enhancing
these models will most definitely need to delve into this
issue. Both YOLOv7 and YOLOv8 models showed good
improvement with the enhanced annotated set, giving good
evidence of the model’s receptiveness to improved training
data. This, together with the possible ground truth issues with
Container Ship and Tanker labels, gives an excellent starting
point for potential improvement with these specific image
classifications, which, if improved further, would push the
models to a very high level of accuracy across the board.
Author contributions
AP prepared the dataset by annotating the images, worked
on applying the machine learning algorithms, and wrote the
initial draft of the manuscript. SB provided guidance and
oversight on the whole project and did the overseeing and
final edits of the manuscript to bring it to publication form.
Both authors contributed to the article and approved the
submitted version.
Conflict of interest
The authors declare that the research was conducted in the
absence of any commercial or financial relationships that
could be construed as a potential conflict of interest.
References
1. Brown S, Hall C, Galliera R, Bagui S. Object detection and ship
classification using YOLOv5. BOHR Int J Comput Sci. (2022) 1:124–133.
2. Analytics Vidhya. Game of deep learning: computer vision Hackathon.
(n.d.). Available online at: https://datahack.analyticsvidhya.com/
contest/game-of-deep-learning/ (accessed January 22, 2023).
3. Kaggle. Game of deep learning: ship datasets. (n.d.). Available online at:
https://www.kaggle.com/datasets/arpitjain007/game-of-deep-learning-
ship-datasets (accessed January 22, 2023).
4. Roboflow. Roboflow: go from raw images to a trained computer vision
model in minutes. (n.d.). Available online at: https://roboflow.com/
(accessed January 22, 2023).
5. Joseph R, Divvala S, Girshick R, Farhadi A. You only look once: unified,
real-time object detection. arXiv [preprint] (2016): arXiv:1506.02640
6. Ultralytics. Ultralytics | Revolutionizing the world of vision AI. Available
online at: https://ultralytics.com/yolov8 (accessed January 22, 2023).
7. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al.
Ultralytics/yolov5: v7.0 - YOLOv5 SOTA realtime instance segmentation.
(2022). Available online at: https://zenodo.org/record/7347926 (accessed
February 28, 2023).
8. Ultralytics. Ultralytics/yolov5. (2020). Available online at: https://github.
com/ultralytics/yolov5 (accessed 2023).
9. Wong KY. Official YOLOv7. (2022). Available online at: https://github.
com/WongKinYiu/yolov7 (accessed 2023).
10. Ultralytics. YOLOv8 by Ultralytics. (2023). Available online at: https:
//github.com/ultralytics/ultralytics (accessed 2023).
11. Amazon. Amazon rekognition – video and image - AWS. Seattle, WA:
Amazon Web Services, Inc (2017).
12. PyTorch. (n.d.). Available online at: https://pytorch.org/hub/ultralytics_
yolov5/ (accessed January 22, 2023).
13. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al.
Microsoft COCO: common objects in context. In: Fleet D, Pajdla T,
Schiele B, Tuytelaars T editors. Computer vision – ECCV 2014. Lecture
notes in computer science. (Vol. 8693), Cham: Springer (2014). p. 740–
755. doi: 10.1007/978-3-319-10602-1_48
14. Ultralytics/Yolov3. ultralytics/yolov3. (2020). Available online at: https:
//github.com/ultralytics/yolov3 (accessed 2023).
15. Wang C-Y, Bochkovskiy A, Liao H-YM. YOLOv7: trainable bag-of-
freebies sets new state-of-the-art for real-time object detectors. arXiv [preprint] (2022): arXiv:2207.02696
16. Gabriel. Performance benchmark of YOLO v5, v7 and v8. Stereolabs
(2023). Available online at: https://www.stereolabs.com/blog/
performance-of-yolo-v5-v7-and-v8/ (accessed 2023).
17. Mishra A. Machine learning in the AWS cloud. Hoboken, NJ: John Wiley
and Sons (2019).
18. Kim J-H, Kim N, Park YW, Won CS. Object detection and classification
based on YOLO-V5 with improved maritime dataset. J Mar Sci Eng.
(2022) 10:377. doi: 10.3390/jmse10030377
19. Tang G, Zhuge Y, Claramunt C, Men S. N-YOLO: a SAR ship detection
using noise-classifying and complete-target extraction. Remote Sens.
(2021) 13:871. doi: 10.3390/rs13050871
20. Zhao H, Zhang H, Zhao Y. YOLOv7-sea: object detection of maritime
UAV images based on improved YOLOv7. Proceedings of the IEEE/CVF
winter conference on applications of computer vision workshops
(WACVW). Waikoloa, HI: (2023). doi: 10.1109/WACVW58289.2023.
00029
21. Olorunshola OE, Irhebhude ME, Evwiekpaefe AE. A comparative study
of YOLOv5 and YOLOv7 object detection algorithms. J Comput Soc Inf.
(2023) 2:1–12. doi: 10.33736/jcsi.5070.2023
22. Jiang K, Xie T, Yan R, Wen X, Li D, Jiang H, et al. An attention
mechanism-improved YOLOv7 object detection algorithm for hemp
duck count estimation. Agriculture. (2022) 12:1659. doi: 10.3390/
agriculture12101659
23. Gillani IS, Munawar MR, Talha M, Azhar S, Mashkoor Y, Uddin MS,
et al. Yolov5, Yolo-x, Yolo-r, Yolov7 performance comparison: a survey.
Comput Sci Inf Technol (CS and IT). (2022) 12:17. doi: 10.5121/csit.2022.
121602
24. Ghahremani A, Kong Y, Bondarau Y, de With PHN. Multi-
class detection and orientation recognition of vessels in maritime
surveillance. Paper presented at IS&T international symposium on
electronic imaging 2019, image processing: algorithms and systems XVII.
Eindhoven University of Technology Research Portal (2019). doi: 10.
2352/ISSN.2470-1173.2019.11.IPAS-266
25. Yang R, Li W, Shang X, Zhu D, Man X. KPE-YOLOv5: an improved
small target detection algorithm based on YOLOv5. Electronics. (2023)
12:817. doi: 10.3390/electronics12040817
26. Sharma V. Object detection and recognition using Amazon Rekognition
with Boto3. Proceedings of the 6th international conference on trends in
electronics and informatics (ICOEI). Tirunelveli: (2022). doi: 10.1109/
icoei53556.2022.9776884
27. Mohanta R, Sethi B. Amazon rekognition for pattern recognition. J Eng
Sci. (2020) 11:197–200.
28. Google. Google colaboratory. (2019). Available online at: https://colab.
research.google.com/ (accessed 2023).
29. AWS. Cloud object storage | Store and retrieve data anywhere | Amazon
simple storage service. Seattle, WA: Amazon Web Services, Inc (2018).
30. CUDA Toolkit. CUDA toolkit, NVIDIA developer. (2013). Available
online at: https://developer.nvidia.com/cuda-toolkit (accessed 2023).